Dynamic clustering

I am performing anomaly detection on several datasets, and my plan is to cluster each dataset first and then pass each cluster to its own AD model. I am using HDBSCAN. On my test dataset I get between 10 and 20 clusters, but on the first production run I got 3500. How can I apply the AD models dynamically across all the clusters?

Topic anomaly-detection machine-learning

Category Data Science


Given your objective, I would suggest using LOF (Local Outlier Factor), a density-based outlier detection method. It scores each point against the density of its nearest neighbors, so it finds outliers relative to their local cluster and not only the global outliers. The LOF score of every data point is then used to identify abnormalities, and you don't have to worry about the number of clusters.

https://en.wikipedia.org/wiki/Local_outlier_factor
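A minimal sketch of this idea, assuming scikit-learn is available and the data is a numeric array. The helper name `lof_outliers`, the synthetic data, and the `n_neighbors=20` setting are illustrative assumptions, not something from the original setup:

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor


def lof_outliers(X, n_neighbors=20, contamination="auto"):
    """Return (boolean outlier mask, LOF scores) for the rows of X.

    Hypothetical helper for illustration; tune n_neighbors for real data.
    """
    lof = LocalOutlierFactor(n_neighbors=n_neighbors, contamination=contamination)
    labels = lof.fit_predict(X)              # -1 = outlier, 1 = inlier
    scores = -lof.negative_outlier_factor_   # higher = more anomalous
    return labels == -1, scores


# Synthetic data (an assumption): one tight group, one loose group,
# and one planted point that sits just outside the tight group.
rng = np.random.default_rng(0)
dense = rng.normal(loc=0.0, scale=0.3, size=(200, 2))
sparse = rng.normal(loc=10.0, scale=2.0, size=(200, 2))
planted = np.array([[0.0, 3.0]])
X = np.vstack([dense, sparse, planted])

is_outlier, scores = lof_outliers(X)
print("flagged points:", int(is_outlier.sum()))
print("LOF score of planted point:", round(float(scores[-1]), 2))
```

The planted point is only a few units from the tight group, which a single global distance threshold could miss, but its LOF score is high because its neighbors are much denser than it is. No cluster count appears anywhere in the call.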

I would also question the need for multiple algorithms. Each one is intended for specific scenarios, so look at the underlying distribution and pick the AD algorithm that best fits it.
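If you still want one model per cluster, the loop can be driven by the label array itself, so it works the same for 10 clusters or 3500. The following is only a sketch under assumptions: `detect_per_cluster`, `make_model`, and `min_size` are hypothetical names, the labels are hard-coded here where HDBSCAN's output would normally go, and treating noise (label -1) and very small clusters as anomalous is a design choice you may not want:

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor


def detect_per_cluster(X, labels, make_model, min_size=10):
    """Fit a fresh model on each cluster and return a boolean anomaly mask.

    make_model(n) must return an object with fit_predict() that yields -1
    for outliers (the scikit-learn convention).
    """
    flags = np.zeros(len(X), dtype=bool)
    for label in np.unique(labels):
        idx = np.flatnonzero(labels == label)
        if label == -1 or len(idx) < min_size:
            # Assumption: noise points and tiny clusters count as anomalies.
            flags[idx] = True
            continue
        model = make_model(len(idx))
        flags[idx] = model.fit_predict(X[idx]) == -1
    return flags


# Synthetic stand-in for clustered data: two clusters plus two noise points.
rng = np.random.default_rng(1)
a = rng.normal(0.0, 1.0, size=(50, 2))
b = rng.normal(8.0, 1.0, size=(50, 2))
noise = np.array([[30.0, 30.0], [-30.0, 30.0]])
X = np.vstack([a, b, noise])
labels = np.array([0] * 50 + [1] * 50 + [-1, -1])

flags = detect_per_cluster(
    X,
    labels,
    make_model=lambda n: LocalOutlierFactor(n_neighbors=min(20, n - 1)),
)
print("anomalies per label:", {int(l): int(flags[labels == l].sum()) for l in np.unique(labels)})
```

Passing a factory keeps the choice of model separate from the loop, so a different detector can be swapped in per run without changing the iteration logic.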
