Handling of Outliers

How to detect and handle anomalous data that could degrade the quality of the algorithm's analysis.

What are outliers?

Outliers are data points that differ significantly from the rest of the dataset; they can be either abnormally high or abnormally low. They can be caused by incorrect observations or inconsistent data entry, and they often skew the results of statistical analyses on the dataset.
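As a small illustration of that skew, the sketch below compares the mean and median of a set of readings with and without one abnormally high value. The numbers are made up for the example and do not come from any real dataset.

```python
import statistics

# Hypothetical readings, invented for illustration only.
readings = [9.8, 10.1, 10.0, 9.9, 10.2]
with_outlier = readings + [42.0]  # one abnormally high entry

mean_clean = statistics.mean(readings)
mean_skewed = statistics.mean(with_outlier)
median_clean = statistics.median(readings)
median_skewed = statistics.median(with_outlier)

# The mean shifts noticeably, while the median barely moves.
print(f"mean:   {mean_clean:.2f} -> {mean_skewed:.2f}")
print(f"median: {median_clean:.2f} -> {median_skewed:.2f}")
```

A single bad entry is enough to pull the mean well away from the typical value, which is why such rows can distort any analysis built on top of them.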

It is important to remove the rows considered outliers, since keeping them could lead to less effective and less useful models.

How does Arkangel detect outliers?

Outliers are detected using Isolation Forests. This method partitions the data by making repeated cuts along the dataset, separating points according to how far apart they are from one another, until each partition contains only a single point.

Anomalous entries are identified by how easily they can be isolated from the rest of the dataset: the fewer cuts an entry needs, the more likely it is to be an outlier.
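The idea of isolating points with cuts can be sketched in a few lines of Python. This is a simplified, one-dimensional illustration, not Arkangel's implementation: a full Isolation Forest also picks random features, works on subsamples, and converts path lengths into a normalized anomaly score. The function names (`isolation_depth`, `average_depth`) and the sample data are hypothetical.

```python
import random

def isolation_depth(values, target, rng):
    """Count the random cuts needed until `target` is alone in its partition."""
    current = list(values)
    depth = 0
    while len(current) > 1:
        lo, hi = min(current), max(current)
        if lo == hi:
            # Remaining values are identical and cannot be split further.
            break
        cut = rng.uniform(lo, hi)
        # Keep only the side of the cut that contains the target.
        if target < cut:
            current = [v for v in current if v < cut]
        else:
            current = [v for v in current if v >= cut]
        depth += 1
    return depth

def average_depth(values, target, n_trees=200, seed=0):
    """Average the isolation depth over many independent random trials."""
    rng = random.Random(seed)
    total = sum(isolation_depth(values, target, rng) for _ in range(n_trees))
    return total / n_trees

# Hypothetical data: a tight cluster plus one abnormally high value.
data = [9.8, 10.1, 10.0, 9.9, 10.2, 10.3, 9.7, 10.0, 42.0]
depths = {v: average_depth(data, v) for v in set(data)}

# The entry that is easiest to isolate has the smallest average depth.
most_anomalous = min(depths, key=depths.get)
print(most_anomalous, depths[most_anomalous])
```

In this sketch the value far from the cluster is usually separated by the very first cut, while values inside the cluster need several cuts, so a low average depth flags the likely outlier.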
