IsolationForest Decision Function vs. Anomaly Prediction Question

I'm currently working on an unsupervised anomaly detection project, using IsolationForest from scikit-learn. My question is: why/how can the model predict a point to be an anomaly when that point sits inside the region of the decision function that corresponds to inliers?

I've attached my results here:

Could the shape of the decision function region in the plot be an artifact of my full input dimensionality versus this 2-dimensional projection?

I also made a quick plot of anomaly score vs. prediction (0 = inlier, 1 = anomaly):

As seen, there are points flagged as outliers even though their scores are above the threshold, which doesn't make sense to me. Can someone explain?
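
For reference, this is roughly how I produce the scores and predictions (a minimal sketch; the make_blobs data below is only a stand-in for my real, higher-dimensional features):

```python
# Sketch of the workflow: fit IsolationForest, then compare
# decision_function scores with the predicted labels.
import matplotlib.pyplot as plt
from sklearn.datasets import make_blobs
from sklearn.ensemble import IsolationForest

# Placeholder data standing in for the real features.
X, _ = make_blobs(n_samples=500, n_features=8, centers=3, random_state=0)

iso = IsolationForest(contamination=0.05, random_state=0).fit(X)

scores = iso.decision_function(X)   # > 0 means inlier region, < 0 means anomaly region
preds = iso.predict(X)              # +1 = inlier, -1 = anomaly
labels = (preds == -1).astype(int)  # remap to 0 = inlier, 1 = anomaly as in my plot

# Note: in scikit-learn, predict() thresholds decision_function() at 0
# (equivalently, score_samples() at iso.offset_).
plt.scatter(scores, labels, s=10)
plt.axvline(0.0, color="red", linestyle="--", label="threshold (decision_function = 0)")
plt.xlabel("decision_function score")
plt.ylabel("prediction (0 = inlier, 1 = anomaly)")
plt.legend()
plt.show()
```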

Topic anomaly-detection random-forest scikit-learn machine-learning

Category Data Science


It looks like your PCA is not mapping the data well. I recommend looking at other dimensionality-reduction techniques, such as UMAP or t-SNE, to see whether you can get a better representation. If you do not have too much data, you could also use MDS, since it tries to preserve pairwise distances (however, it is computationally very expensive):

https://scikit-learn.org/stable/modules/generated/sklearn.manifold.MDS.html
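
As a rough sketch of this idea (the make_blobs data and variable names below are placeholders, not your data), you would fit the forest on the full feature space and use the 2-D embedding only for plotting:

```python
# Sketch: fit IsolationForest on the full feature space, then compare
# 2-D views from PCA and t-SNE, used purely for visualization.
import matplotlib.pyplot as plt
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
from sklearn.ensemble import IsolationForest
from sklearn.manifold import TSNE

X, _ = make_blobs(n_samples=500, n_features=8, centers=3, random_state=0)

iso = IsolationForest(contamination=0.05, random_state=0).fit(X)
scores = iso.decision_function(X)   # computed in the original feature space

embeddings = {
    "PCA": PCA(n_components=2).fit_transform(X),
    "t-SNE": TSNE(n_components=2, random_state=0).fit_transform(X),
}

fig, axes = plt.subplots(1, 2, figsize=(10, 4))
for ax, (name, emb) in zip(axes, embeddings.items()):
    sc = ax.scatter(emb[:, 0], emb[:, 1], c=scores, cmap="coolwarm", s=10)
    ax.set_title(f"{name} projection, colored by decision_function")
    fig.colorbar(sc, ax=ax)
plt.tight_layout()
plt.show()
```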

If the visualization is all you are after, you could simply run the Isolation Forest on the first 2 principal components; then the decision boundary in the first picture should be much more obvious, because the predictions and the plot live in the same space. Of course, the actual results may be misleading, since you are discarding information from the remaining dimensions.
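
A minimal sketch of that approach, again with placeholder data, assuming you fit on the two leading principal components and draw the zero level of decision_function:

```python
# Sketch: fit the Isolation Forest directly on the first 2 principal components,
# so the plotted decision boundary and the predictions are in the same space.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
from sklearn.ensemble import IsolationForest

X, _ = make_blobs(n_samples=500, n_features=8, centers=3, random_state=0)
X2 = PCA(n_components=2).fit_transform(X)

iso2 = IsolationForest(contamination=0.05, random_state=0).fit(X2)
preds = iso2.predict(X2)            # +1 = inlier, -1 = anomaly

# Evaluate the decision function on a grid covering the 2-D projection.
xx, yy = np.meshgrid(
    np.linspace(X2[:, 0].min() - 1, X2[:, 0].max() + 1, 200),
    np.linspace(X2[:, 1].min() - 1, X2[:, 1].max() + 1, 200),
)
zz = iso2.decision_function(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)

plt.contour(xx, yy, zz, levels=[0.0], colors="black")   # boundary at score 0
plt.scatter(X2[:, 0], X2[:, 1], c=(preds == -1), cmap="coolwarm", s=10)
plt.title("Isolation Forest fitted on 2 principal components")
plt.show()
```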
