Dynamic clustering

Question

Lou65

April 24, 2022 19:06

I am performing anomaly detection (AD) on different datasets and plan to first cluster each dataset, then pass each cluster to its own AD model. I am using HDBSCAN. On my test dataset I get between 10 and 20 clusters, but the first production run produced 3500. How can I apply the AD models dynamically across all the clusters?

Topic anomaly-detection machine-learning

Category Data Science

Arpit Sisodia · Accepted Answer · June 15, 2018 13:33

Considering your objective, I would suggest using a Local Outlier Factor (LOF) based approach. LOF scores each point against the density of its local neighborhood, so it gives you outliers relative to their surrounding cluster, not only the global outliers. The LOF score of every data point can then be used to identify abnormalities, and you don't have to worry about the number of clusters.

https://en.wikipedia.org/wiki/Local_outlier_factor
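As a rough illustration of the LOF suggestion, here is a minimal sketch using scikit-learn's `LocalOutlierFactor`. The synthetic data, the `n_neighbors=20` setting, and the variable names are assumptions made for the example, not something taken from the question.

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

# Synthetic data (assumption): a tight group, a loose group, and one point
# that sits close to the tight group but well outside its local density.
rng = np.random.default_rng(0)
tight = rng.normal(loc=0.0, scale=0.1, size=(100, 2))
loose = rng.normal(loc=5.0, scale=1.0, size=(100, 2))
isolated = np.array([[0.0, 2.0]])
X = np.vstack([tight, loose, isolated])

# n_neighbors controls how "local" the density comparison is.
lof = LocalOutlierFactor(n_neighbors=20)
labels = lof.fit_predict(X)              # -1 = outlier, 1 = inlier
scores = -lof.negative_outlier_factor_   # higher = more anomalous

outlier_idx = np.flatnonzero(labels == -1)
print(outlier_idx, scores[outlier_idx])
```

No cluster count appears anywhere: each point is compared only with its own neighborhood, so the same call works whether the data naturally forms 10 groups or 3500.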

Also, I would question the need for multiple algorithms. Each one is intended for specific scenarios. It is better to look at the underlying distribution and pick the AD algorithm that fits it best.
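If separate per-cluster models are still wanted, the loop the question asks about could be sketched roughly as below. Everything here is hypothetical: `ZScoreDetector` is a toy stand-in for a real AD model, `detect_per_cluster` and `min_size` are illustrative names, and the data is one-dimensional to keep the example dependency-free. The labels are assumed to come from something like HDBSCAN's `labels_`, where `-1` marks noise.

```python
from collections import defaultdict
from statistics import mean, pstdev


class ZScoreDetector:
    """Toy stand-in for a real AD model (hypothetical, for illustration)."""

    def __init__(self, threshold=2.0):
        self.threshold = threshold

    def fit(self, values):
        self.mu = mean(values)
        self.sigma = pstdev(values) or 1.0  # avoid division by zero
        return self

    def predict(self, values):
        # True marks a value more than `threshold` std devs from the cluster mean.
        return [abs(v - self.mu) / self.sigma > self.threshold for v in values]


def detect_per_cluster(values, labels, make_model, min_size=5):
    """Fit one fresh model per cluster label; return a per-point anomaly flag."""
    groups = defaultdict(list)
    for idx, label in enumerate(labels):
        groups[label].append(idx)

    flags = [False] * len(values)
    for label, idxs in groups.items():
        if label == -1 or len(idxs) < min_size:
            # Design choice (assumption): noise and tiny clusters are flagged outright.
            for i in idxs:
                flags[i] = True
            continue
        cluster_values = [values[i] for i in idxs]
        model = make_model().fit(cluster_values)
        for i, is_anomaly in zip(idxs, model.predict(cluster_values)):
            flags[i] = is_anomaly
    return flags


# Made-up data: two clusters plus one noise point labelled -1.
values = [1.0, 1.1, 0.9, 1.0, 1.05, 0.95, 9.0, 50.0, 51.0, 49.0, 50.5, 49.5, 7.0]
labels = [0] * 7 + [1] * 5 + [-1]
flags = detect_per_cluster(values, labels, lambda: ZScoreDetector(threshold=2.0))
print([i for i, f in enumerate(flags) if f])
```

Because the loop is driven by whatever labels are present, it does not care whether there are 20 clusters or 3500. With thousands of clusters, though, many will be too small to train on, which is why the sketch includes a `min_size` cutoff.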
