IsolationForest Decision Function vs. Anomaly Prediction Question

I'm currently working on an unsupervised anomaly detection project, using IsolationForest from scikit-learn. My question is: why/how can the model predict a point to be an anomaly when that point sits inside the region of the decision function that corresponds to inliers?

I've attached my results here:

Could the shape of the decision function region in the plot be an artifact of my full input dimensionality versus this 2-dimensional projection?

I also made a quick plot of anomaly score vs. prediction (0 = inlier, 1 = anomaly):

As seen, there are points flagged as outliers even though their scores are above the threshold, which doesn't make sense to me. Can someone explain?
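
For reference, this is roughly how I produce the scores and predictions (a minimal sketch; the make_blobs data below is only a stand-in for my real, higher-dimensional features):

```python
# Sketch of the workflow: fit IsolationForest, then compare
# decision_function scores with the predicted labels.
import matplotlib.pyplot as plt
from sklearn.datasets import make_blobs
from sklearn.ensemble import IsolationForest

# Placeholder data standing in for the real features.
X, _ = make_blobs(n_samples=500, n_features=8, centers=3, random_state=0)

iso = IsolationForest(contamination=0.05, random_state=0).fit(X)

scores = iso.decision_function(X)   # > 0 means inlier region, < 0 means anomaly region
preds = iso.predict(X)              # +1 = inlier, -1 = anomaly
labels = (preds == -1).astype(int)  # remap to 0 = inlier, 1 = anomaly as in my plot

# Note: in scikit-learn, predict() thresholds decision_function() at 0
# (equivalently, score_samples() at iso.offset_).
plt.scatter(scores, labels, s=10)
plt.axvline(0.0, color="red", linestyle="--", label="threshold (decision_function = 0)")
plt.xlabel("decision_function score")
plt.ylabel("prediction (0 = inlier, 1 = anomaly)")
plt.legend()
plt.show()
```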

Topic anomaly-detection random-forest scikit-learn machine-learning

Category Data Science


It looks like your PCA is not mapping the data well. I recommend looking at other dimensionality-reduction techniques, such as UMAP or t-SNE, to see whether you can get a better representation. If you do not have too much data, you could also use MDS, since it tries to preserve pairwise distances (however, it is computationally very expensive):

https://scikit-learn.org/stable/modules/generated/sklearn.manifold.MDS.html
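
As a rough sketch of this idea (the make_blobs data and variable names below are placeholders, not your data), you would fit the forest on the full feature space and use the 2-D embedding only for plotting:

```python
# Sketch: fit IsolationForest on the full feature space, then compare
# 2-D views from PCA and t-SNE, used purely for visualization.
import matplotlib.pyplot as plt
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
from sklearn.ensemble import IsolationForest
from sklearn.manifold import TSNE

X, _ = make_blobs(n_samples=500, n_features=8, centers=3, random_state=0)

iso = IsolationForest(contamination=0.05, random_state=0).fit(X)
scores = iso.decision_function(X)   # computed in the original feature space

embeddings = {
    "PCA": PCA(n_components=2).fit_transform(X),
    "t-SNE": TSNE(n_components=2, random_state=0).fit_transform(X),
}

fig, axes = plt.subplots(1, 2, figsize=(10, 4))
for ax, (name, emb) in zip(axes, embeddings.items()):
    sc = ax.scatter(emb[:, 0], emb[:, 1], c=scores, cmap="coolwarm", s=10)
    ax.set_title(f"{name} projection, colored by decision_function")
    fig.colorbar(sc, ax=ax)
plt.tight_layout()
plt.show()
```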

If the visualization is all you are after, you could simply run the Isolation Forest on the first 2 principal components; then the decision boundary in the first picture should be much more obvious, because the predictions and the plot live in the same space. Of course, the actual results may be misleading, since you are discarding information from the remaining dimensions.
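
A minimal sketch of that approach, again with placeholder data, assuming you fit on the two leading principal components and draw the zero level of decision_function:

```python
# Sketch: fit the Isolation Forest directly on the first 2 principal components,
# so the plotted decision boundary and the predictions are in the same space.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
from sklearn.ensemble import IsolationForest

X, _ = make_blobs(n_samples=500, n_features=8, centers=3, random_state=0)
X2 = PCA(n_components=2).fit_transform(X)

iso2 = IsolationForest(contamination=0.05, random_state=0).fit(X2)
preds = iso2.predict(X2)            # +1 = inlier, -1 = anomaly

# Evaluate the decision function on a grid covering the 2-D projection.
xx, yy = np.meshgrid(
    np.linspace(X2[:, 0].min() - 1, X2[:, 0].max() + 1, 200),
    np.linspace(X2[:, 1].min() - 1, X2[:, 1].max() + 1, 200),
)
zz = iso2.decision_function(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)

plt.contour(xx, yy, zz, levels=[0.0], colors="black")   # boundary at score 0
plt.scatter(X2[:, 0], X2[:, 1], c=(preds == -1), cmap="coolwarm", s=10)
plt.title("Isolation Forest fitted on 2 principal components")
plt.show()
```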
