How can the labels of AgglomerativeClustering be re-computed?
I am using scikit-learn's AgglomerativeClustering on a large data set. I would like to modify the distance_threshold after the model has already been computed. Computing the model is slow (quadratic time), but it should easily be possible to re-compute the labels for a new distance_threshold in linear time because the model stores the children_ and distances_ arrays permanently. But how can the labels be re-computed for a different distance_threshold?
It can be assumed that distance_threshold was originally set to 0, i.e. the entire tree was computed.
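To make the goal concrete, here is a minimal sketch of the kind of re-cut I have in mind. It is not a scikit-learn API: the function name relabel is hypothetical, and it only assumes the documented layout of children_ (row i merges two node ids, ids below n_samples are leaves, and id n_samples + i is the node created by row i) together with merge distances that do not decrease from a node to its parent. Merges with a distance strictly below the threshold are applied, which follows the documented "at or above which clusters will not be merged" wording.

```python
def relabel(children, distances, threshold):
    """Cut a stored merge tree at `threshold`; return one label per sample."""
    n_samples = len(children) + 1
    parent = list(range(n_samples))  # union-find forest over the leaves

    def find(x):
        # Follow parent links to the root, halving the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # rep[k] is one leaf contained in node k (leaves represent themselves).
    rep = list(range(n_samples))
    for (a, b), d in zip(children, distances):
        leaf_a, leaf_b = rep[int(a)], rep[int(b)]
        rep.append(leaf_a)  # the node created by this row gets the next id
        if d < threshold:
            # Apply the merge only when it lies below the new threshold.
            parent[find(leaf_b)] = find(leaf_a)

    # Number the clusters consecutively in order of first appearance.
    labels, ids = [], {}
    for leaf in range(n_samples):
        labels.append(ids.setdefault(find(leaf), len(ids)))
    return labels
```

With a fitted model this would be called as relabel(model.children_, model.distances_, new_threshold). The cluster numbering is arbitrary and will generally not match the numbering in labels_, so results should be compared as partitions rather than element by element.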
Topic scikit-learn clustering
Category Data Science