Improve Data

Once you have imported your data, it is time to understand and clean it. Arkangel AI performs an automatic analysis and suggests best practices for it.


Last updated 1 year ago


Preparing your data is an iterative process. Even if you clean and prep your training data prior to uploading it to Arkangel AI, you can still improve its quality by assessing features during EDA (Exploratory Data Analysis).

For categorical variables with numerical labels, like 0, 1, 2, it's advisable to represent these categories with descriptive terms such as "bad," "medium," and "good," or even "category1," "category2," and "category3." This helps models to effectively recognize and treat the variable as categorical.
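As a rough illustration of the relabeling described above, the sketch below rewrites a numerically coded column before upload. The column name `quality`, the sample rows, and the helper `relabel_column` are hypothetical; this is ordinary Python preprocessing done on your side, not an Arkangel AI feature.

```python
# Hypothetical example: relabel a numerically coded categorical column
# before uploading the dataset. Column name and labels are assumptions.
QUALITY_LABELS = {0: "bad", 1: "medium", 2: "good"}


def relabel_column(rows, column, labels):
    """Return copies of rows with numeric codes in `column` replaced by labels.

    Codes that are not in `labels` are left unchanged so they can be reviewed.
    """
    relabeled = []
    for row in rows:
        new_row = dict(row)  # copy so the original rows stay untouched
        new_row[column] = labels.get(row[column], row[column])
        relabeled.append(new_row)
    return relabeled


rows = [
    {"record_id": "a1", "quality": 0},
    {"record_id": "a2", "quality": 2},
    {"record_id": "a3", "quality": 1},
]
relabeled_rows = relabel_column(rows, "quality", QUALITY_LABELS)
print([row["quality"] for row in relabeled_rows])
```

Leaving unknown codes untouched, rather than dropping them, keeps unexpected values visible for later review.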

Stages of EDA

During EDA, Arkangel AI performs a Data Quality Assessment. The assessment reports data quality issues that are relevant to the stage of model building you are performing. EDA has two stages, described below.

EDA1 (data ingest)

EDA1 occurs after you upload your data. It assesses the All Features list and detects issues such as:

  • Outliers

  • Inliers

  • Excess zeros

  • Disguised missing values
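To make some of these issue types concrete, here is a minimal sketch of common heuristics for three of them. The page does not describe the rules Arkangel AI applies internally, so everything here is an assumption: the outlier check uses Tukey's IQR rule, the sentinel values in `DISGUISED_MISSING` are illustrative, and inliers are not covered.

```python
import statistics

# Sentinel values often used to stand in for missing data.
# This set is an assumption; adjust it to your own dataset.
DISGUISED_MISSING = {-999, -99, 9999}


def iqr_outliers(values, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's rule)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


def zero_fraction(values):
    """Return the share of entries that are exactly zero."""
    return sum(1 for v in values if v == 0) / len(values)


def disguised_missing(values, sentinels=DISGUISED_MISSING):
    """Return entries that match a known placeholder value."""
    return [v for v in values if v in sentinels]


ages = [34, 29, 41, 38, 36, 33, 40, 250]
print(iqr_outliers(ages))                         # the 250 entry is flagged
print(zero_fraction([0, 0, 0, 5, 0]))             # share of zeros in the column
print(disguised_missing([12, -999, 7, 9999]))     # placeholder values found
```

A high zero fraction or a handful of sentinel values does not prove a problem. It only marks the column as worth a closer look.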

Load and view your dataset

As soon as you load your dataset, Arkangel AI performs EDA1. In this phase, Arkangel AI generates summary statistics based on a sample of your data.
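For intuition, summary statistics computed on a sample can be sketched as follows. The sample size, the fixed seed, and the set of statistics are assumptions chosen for illustration; the page does not say which statistics are reported or how the sample is drawn.

```python
import random
import statistics


def summarize_sample(values, sample_size, seed=0):
    """Compute summary statistics on a random sample of a numeric column.

    If the column has no more than `sample_size` entries, all of it is used.
    """
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    if len(values) <= sample_size:
        sample = list(values)
    else:
        sample = rng.sample(values, sample_size)
    return {
        "count": len(sample),
        "mean": statistics.mean(sample),
        "median": statistics.median(sample),
        "min": min(sample),
        "max": max(sample),
        "stdev": statistics.stdev(sample) if len(sample) > 1 else 0.0,
    }


heart_rates = list(range(50, 150))  # hypothetical numeric column
summary = summarize_sample(heart_rates, sample_size=20)
print(summary)
```

Because the figures come from a sample, they approximate the full column and can differ slightly from statistics computed on every row.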

Investigate feature importance

Arkangel AI automatically calculates the significance of each feature and its correlation with the selected prediction target. To see the calculations, select a prediction target.
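The idea of ranking features by their relationship to the target can be sketched with a plain Pearson correlation. This is an assumption made for illustration: the page does not state which correlation or significance measure Arkangel AI uses, and the feature names and values below are made up.

```python
import math


def pearson(xs, ys):
    """Return the Pearson correlation between two equal-length numeric lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / math.sqrt(var_x * var_y)


def rank_features(features, target):
    """Rank features by absolute correlation with the target, strongest first."""
    scores = {name: pearson(values, target) for name, values in features.items()}
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)


target = [1, 2, 3, 4, 5]
features = {
    "noise": [3, 1, 4, 1, 5],
    "dose": [2, 4, 6, 8, 10],
    "constant": [7, 7, 7, 7, 7],
}
ranked = rank_features(features, target)
for name, score in ranked:
    print(f"{name}: {score:.3f}")
```

Features near the bottom of such a ranking are candidates for removal, although a low linear correlation does not rule out a nonlinear relationship with the target.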

Assess data and improve it at EDA2

You might want to remove features that are unrelated to the target. To learn how to use this feature, check our Correlation and Significance tutorial.

Once you have set up all your cleaning commands, press the yellow button to move to the final step.

Arkangel AI provides a detailed analysis of each column in your dataset.
Scroll to the bottom to find the Send to preprocess data button, which takes you to the last stage.