Improve Data

Once you have imported your data, it is time to understand and clean it. Arkangel AI performs an automatic analysis and suggests best practices for it.

Preparing your data is an iterative process. Even if you clean and prep your training data prior to uploading it to Arkangel AI, you can still improve its quality by assessing features during EDA (Exploratory Data Analysis).

For categorical variables with numerical labels, like 0, 1, 2, it's advisable to represent these categories with descriptive terms such as "bad," "medium," and "good," or even "category1," "category2," and "category3." This helps models to effectively recognize and treat the variable as categorical.
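If you prefer to relabel such a column before uploading, the idea can be sketched in plain Python. This is only an illustration: the column name `quality`, the `QUALITY_LABELS` mapping, and the `relabel_column` helper are hypothetical and not part of Arkangel AI.

```python
# Hypothetical mapping from numeric codes to descriptive labels (an assumption
# for illustration; use the terms that fit your own data).
QUALITY_LABELS = {0: "bad", 1: "medium", 2: "good"}


def relabel_column(rows, column, labels):
    """Return copies of `rows` with codes in `column` replaced by descriptive labels.

    Codes that are missing from `labels` are left unchanged so they can be reviewed.
    """
    relabeled = []
    for row in rows:
        new_row = dict(row)  # copy so the original rows are not modified
        new_row[column] = labels.get(row[column], row[column])
        relabeled.append(new_row)
    return relabeled


# Made-up example rows, as they might look after reading a CSV file.
rows = [
    {"patient_id": 1, "quality": 0},
    {"patient_id": 2, "quality": 2},
    {"patient_id": 3, "quality": 1},
]
relabeled_rows = relabel_column(rows, "quality", QUALITY_LABELS)
```

Because the relabeled values are text rather than numbers, a tool reading the column has no reason to interpret it as a numeric quantity.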

Stages of EDA

During EDA, Arkangel AI performs a Data Quality Assessment. The assessment provides information about data quality issues that are relevant to the stage of model building you are performing. The two EDA stages are described below.

EDA1 (data ingest)

EDA1 occurs after you upload your data. It assesses the All Features list and detects issues such as:

  • Outliers

  • Inliers

  • Excess zeros

  • Disguised missing values
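To get a feel for what some of these checks mean, the sketch below flags outliers, excess zeros, and disguised missing values in a single column. It does not reproduce how Arkangel AI detects them: the 1.5 × IQR rule, the set of sentinel strings, and all function names are assumptions chosen for illustration.

```python
import statistics

# Assumed sentinel values that often stand in for missing data (illustrative only).
DISGUISED_MISSING = {"", "na", "n/a", "null", "none", "unknown", "?", "-999", "9999"}


def iqr_outliers(values):
    """Return values lying more than 1.5 * IQR outside the quartiles (Tukey's rule)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    spread = q3 - q1
    low, high = q1 - 1.5 * spread, q3 + 1.5 * spread
    return [v for v in values if v < low or v > high]


def zero_fraction(values):
    """Return the share of values that are exactly zero."""
    return sum(1 for v in values if v == 0) / len(values)


def disguised_missing(raw_values):
    """Return raw values that match one of the assumed sentinel strings."""
    return [v for v in raw_values if str(v).strip().lower() in DISGUISED_MISSING]
```

A column with a high `zero_fraction` or many `disguised_missing` hits is a candidate for closer review before modeling; the threshold that counts as "excess" depends on your data.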

Load and view your dataset

As soon as you load your dataset, Arkangel AI performs EDA1. In this phase, Arkangel AI generates summary statistics based on a sample of your data.
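As a rough illustration of sample-based summary statistics, the sketch below draws a random sample from a numeric column and summarizes it. The sample size, the seed, the chosen statistics, and the `summarize_sample` name are all assumptions; the platform's actual sampling strategy is not described here.

```python
import random
import statistics


def summarize_sample(values, sample_size=1000, seed=0):
    """Summarize a random sample of `values` (the whole list if it is small enough)."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    if len(values) <= sample_size:
        sample = list(values)
    else:
        sample = rng.sample(values, sample_size)
    return {
        "count": len(sample),
        "mean": statistics.fmean(sample),
        "median": statistics.median(sample),
        "min": min(sample),
        "max": max(sample),
        # The sample standard deviation needs at least two points.
        "stdev": statistics.stdev(sample) if len(sample) > 1 else 0.0,
    }


summary = summarize_sample(list(range(1, 11)), sample_size=100)
```

Working from a sample keeps the summary fast on large datasets, at the cost of statistics that may differ slightly from those of the full data.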

Investigate feature importance

Arkangel AI automatically calculates the significance of each feature and its correlation with the selected prediction target. To get these calculations, select a prediction target.

You might want to remove features that are unrelated to the target. To learn how to use this feature, check our Correlation and Significance tutorial.
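To illustrate what "unrelated to the target" can look like, here is a minimal sketch that ranks numeric features by the absolute value of their Pearson correlation with the target. Pearson correlation is an assumption made for this example, not necessarily the measure Arkangel AI uses, and the `pearson` and `rank_features` names are hypothetical.

```python
import math


def pearson(xs, ys):
    """Return the Pearson correlation of two equal-length numeric lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        # A constant column has no defined correlation; treat it as uninformative.
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def rank_features(features, target):
    """Return (name, correlation) pairs sorted from strongest to weakest relation."""
    scores = {name: pearson(column, target) for name, column in features.items()}
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)


# Made-up example data: "constant" never varies, so it ranks last.
features = {"age": [1, 2, 3, 4], "constant": [5, 5, 5, 5], "inverse": [4, 3, 2, 1]}
ranking = rank_features(features, target=[2, 4, 6, 8])
```

Features that end up at the bottom of such a ranking are candidates for removal, although a low linear correlation does not rule out a non-linear relationship with the target.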

Assess data and improve it at EDA2

Once you have all your cleaning commands, press the yellow button and move to the final step, Create AI Model.
