Too much noise found #72

@Phlya

Hi,

I am having a problem clustering a data set. The data was extensively filtered before clustering to remove uninteresting samples, so I would expect the vast majority of samples to cluster with some other samples. When I try other clustering algorithms that simply partition the dataset into a reasonable number of clusters, all the clusters make sense and there is (almost) no noise. However, with every combination of min_cluster_size and min_samples I have tried, hdbscan labels a lot of samples (~1/4-1/3) as noise. I can clearly see by eye that there is structure in that noise too... Is there anything else I can do about it?

I'm attaching a seaborn clustermap to show that there is no real noise in the data, along with what I get from HDBSCAN (the leftmost cluster is what is detected as noise):
clustermap
hdbscan_clusters

Agglomerative Clustering produces something closer to what I expect:
agglomerativeclustring_fromcormatrix_11_clusters

Is it possible to force all points to the nearest cluster, for example?
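
To make the question concrete, here is a minimal sketch of the kind of post-processing I have in mind. It is not part of hdbscan; it only assumes the usual convention that noise points carry the label -1. The helper name `assign_noise_to_nearest_cluster`, the Euclidean metric, and the toy data are all made up for illustration. Each noise point simply takes the label of its nearest clustered point:

```python
import numpy as np

def assign_noise_to_nearest_cluster(X, labels):
    """Relabel noise points (label -1) with the label of their nearest clustered point.

    Hypothetical helper for illustration; uses plain Euclidean distance.
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    noise = labels == -1
    # Nothing to do if there is no noise, or no cluster to assign to.
    if not noise.any() or noise.all():
        return labels.copy()
    clustered = ~noise
    # Pairwise distances: rows are noise points, columns are clustered points.
    diffs = X[noise][:, None, :] - X[clustered][None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    nearest = dists.argmin(axis=1)
    new_labels = labels.copy()
    new_labels[noise] = labels[clustered][nearest]
    return new_labels

# Toy data: two tight groups plus two points flagged as noise (-1).
X = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0], [0.5, 0.2], [4.5, 4.8]])
labels = np.array([0, 0, 1, 1, -1, -1])
filled = assign_noise_to_nearest_cluster(X, labels)
print(filled)  # [0 0 1 1 0 1]
```

This builds the full noise-by-clustered distance array in memory, so it is only reasonable for modest data sizes, and it ignores the density information that HDBSCAN used to flag those points in the first place.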
