Description
Hi,
I am having a problem clustering a data set. The data was extensively filtered before clustering to remove uninteresting samples, so I would expect the vast majority of samples to cluster with at least some other samples. When I try other clustering algorithms that simply partition the dataset into a reasonable number of clusters, all the clusters make sense and there is (almost) no noise. However, with every combination of `min_cluster_size` and `min_samples` I have tried, HDBSCAN labels a large fraction of the samples (~1/4 to 1/3) as noise, even though I can clearly see structure in that "noise" by eye. Is there anything else I can do about it?
I'm attaching a seaborn clustermap to show that there is no real noise in the data, along with the result I get from HDBSCAN (the leftmost cluster is the set of points detected as noise).


Something closer to what I expect is produced by agglomerative clustering:

Is it possible, for example, to force every point into its nearest cluster?
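One way to do this is to post-process the labels. The following is a minimal sketch, assuming the labels follow the HDBSCAN convention of `-1` for noise and that Euclidean distance is meaningful for the data. It reassigns each noise point to the label of its nearest non-noise point. This is plain nearest-neighbour reassignment in NumPy, not a feature of the hdbscan library, and the helper name `assign_noise_to_nearest` is hypothetical.

```python
import numpy as np


def assign_noise_to_nearest(X, labels, noise_label=-1):
    """Hypothetical helper: give every noise point the label of its
    nearest non-noise point, using Euclidean distance."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels).copy()  # do not modify the caller's array
    noise = labels == noise_label
    if not noise.any() or noise.all():
        # Nothing to reassign, or no clustered points to borrow labels from.
        return labels
    clustered_X = X[~noise]
    clustered_labels = labels[~noise]
    # Squared distances from each noise point to each clustered point,
    # shape (n_noise, n_clustered).
    diff = X[noise][:, None, :] - clustered_X[None, :, :]
    d2 = np.einsum("ijk,ijk->ij", diff, diff)
    # Copy the label of the closest clustered point.
    labels[noise] = clustered_labels[np.argmin(d2, axis=1)]
    return labels


X = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0],
              [5.1, 5.0], [0.3, 0.1], [4.8, 5.2]])
labels = np.array([0, 0, 1, 1, -1, -1])
print(assign_noise_to_nearest(X, labels))  # [0 0 1 1 0 1]
```

This builds the full noise-by-clustered distance array, so for large datasets a k-d tree lookup such as `sklearn.neighbors.NearestNeighbors` fitted on the clustered points would use far less memory.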