Density Based Clustering Algorithms
Kehkashan Fatima 202090202
Amulya Viswambharan 202090007
Sruthi Krishnan 202090333
Outline
• Introduction
• DBSCAN
• OPTICS
• DENCLUE
• Conclusion
Introduction
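• Clustering is a data mining technique that groups similar objects together.
• Density-based clustering forms clusters based on the density of objects in a region.
• The ε-neighborhood of a point $p$ in a data set $D$ is $N_\varepsilon(p) = \{q \in D \mid d(p, q) \le \varepsilon\}$,
where $d$ is some distance measure and $\varepsilon \in \mathbb{R}^+$. Note that the point $p$ is always in its own
ε-neighborhood, i.e., $p \in N_\varepsilon(p)$ always holds.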
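As a minimal sketch of the ε-neighborhood described above, the following assumes points are coordinate tuples and that the distance measure is Euclidean. The helper names `euclidean` and `eps_neighborhood` are illustrative, not from the slides.

```python
import math

def euclidean(p, q):
    """Euclidean distance between two points given as coordinate tuples."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def eps_neighborhood(points, i, eps, dist=euclidean):
    """Indices of all points within distance eps of points[i].

    The point itself is always included, because dist(p, p) == 0 <= eps.
    """
    return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]

points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0)]
neighbors = eps_neighborhood(points, 0, eps=1.5)
```

Any other distance function can be passed through the `dist` argument, since the definition only requires some distance measure.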
DBSCAN
• Point classes:
A point $p$ is classified as:
• Core point: if $p$ has high density, i.e., its ε-neighborhood satisfies $|N_\varepsilon(p)| \ge minPts$, where minPts is a user-specified density threshold.
Figure: (a) the three point classes: core, border, and noise points; (b) the concepts of density-reachability and density-connectivity.
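The three point classes named in the figure can be sketched in code. The slide only defines the core point, so the border and noise rules below are assumptions that follow the usual DBSCAN convention: a border point is a non-core point with a core point in its ε-neighborhood, and a noise point is everything else. The function names are illustrative.

```python
import math

def region_query(points, i, eps):
    """Indices of the points in the eps-neighborhood of points[i]."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

def classify_points(points, eps, min_pts):
    """Label every point as 'core', 'border', or 'noise'."""
    neighborhoods = [region_query(points, i, eps) for i in range(len(points))]
    core = {i for i, nb in enumerate(neighborhoods) if len(nb) >= min_pts}
    labels = []
    for i, nb in enumerate(neighborhoods):
        if i in core:
            labels.append("core")
        elif any(j in core for j in nb):
            labels.append("border")   # assumed rule: near a core point
        else:
            labels.append("noise")    # assumed rule: neither core nor border
    return labels

points = [(0, 0), (0, 1), (1, 0), (1, 1), (2, 1), (8, 8)]
labels = classify_points(points, eps=1.5, min_pts=4)
```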
DBSCAN
•Procedure:
• Initially, all objects are marked as unvisited.
• Randomly select an unvisited object m and mark it as visited.
• If the ε-neighbourhood of m contains fewer than minPts objects, mark m as a noise point.
• Otherwise, form a new cluster C and add m to C.
• Add the remaining objects in the neighbourhood of m to a candidate set N, and expand C from the objects in N.
• Repeat the procedure until every object has been visited.
• The run time of the DBSCAN algorithm is $O(n^2)$ for $n$ objects.
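The procedure above can be sketched as follows. This is an illustrative version, not a reference implementation: it visits objects in index order rather than randomly, uses a brute-force neighborhood search (which is where the quadratic run time comes from), and uses `-1` as an assumed noise label.

```python
import math

NOISE = -1  # assumed label for noise points

def region_query(points, i, eps):
    """Brute-force eps-neighborhood: O(n) per query, O(n^2) overall."""
    return [j for j in range(len(points)) if math.dist(points[i], points[j]) <= eps]

def dbscan(points, eps, min_pts):
    labels = [None] * len(points)          # None means "unvisited"
    cluster_id = -1
    for m in range(len(points)):           # index order instead of random choice
        if labels[m] is not None:
            continue
        neighbors = region_query(points, m, eps)
        if len(neighbors) < min_pts:
            labels[m] = NOISE              # may later become a border point
            continue
        cluster_id += 1                    # form a new cluster C and add m
        labels[m] = cluster_id
        seeds = [j for j in neighbors if j != m]   # the candidate set N
        while seeds:
            q = seeds.pop()
            if labels[q] == NOISE:
                labels[q] = cluster_id     # noise reached from a core point: border
            if labels[q] is not None:
                continue                   # already assigned to a cluster
            labels[q] = cluster_id
            q_neighbors = region_query(points, q, eps)
            if len(q_neighbors) >= min_pts:
                seeds.extend(q_neighbors)  # q is a core point: keep expanding
    return labels

points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (5, 50)]
labels = dbscan(points, eps=1.5, min_pts=3)
```

Here the first four points form one cluster, the next four form a second cluster, and the isolated last point is labelled as noise.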
DBSCAN
•Advantages:
• Fast.
• Finds clusters of arbitrary shape.
• No prior knowledge.
• Only two input parameters (ε and minPts).
• No need to define the number of clusters in advance.
•Disadvantages:
• Sensitive to the initial values of its parameters.
• The computational complexity is $O(n^2)$; with spatial indexing it is $O(n \log n)$.
• Struggles with clusters of varying density.
• Performs poorly on high-dimensional data.
OPTICS
Reachability-distance_{ε,MinPts}(p, q):
“smallest distance such that q is directly density-reachable from p”
(Figure: illustration of the reachability distance.)
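A small sketch of the reachability distance may help here. It assumes the usual OPTICS formulation, which is not spelled out on the slide: the core distance of p is the distance to its MinPts-th nearest point within ε (counting p itself), and the reachability distance is the larger of that core distance and d(p, q). It is undefined (`None` here) when p is not a core point. Function names are illustrative.

```python
import math

def core_distance(points, p, eps, min_pts):
    """Distance from points[p] to its min_pts-th nearest point (p counts as the first).

    Returns None when fewer than min_pts points lie within eps, i.e. when
    points[p] is not a core point.
    """
    dists = sorted(math.dist(points[p], x) for x in points)  # starts with 0.0 for p itself
    within = [d for d in dists if d <= eps]
    if len(within) < min_pts:
        return None
    return within[min_pts - 1]

def reachability_distance(points, p, q, eps, min_pts):
    """Smallest distance at which points[q] is directly density-reachable from points[p]."""
    cd = core_distance(points, p, eps, min_pts)
    if cd is None:
        return None
    return max(cd, math.dist(points[p], points[q]))

points = [(0, 0), (1, 0), (2, 0), (5, 0)]
r_near = reachability_distance(points, 0, 1, eps=3, min_pts=3)
r_far = reachability_distance(points, 0, 3, eps=3, min_pts=3)
r_undefined = reachability_distance(points, 3, 0, eps=3, min_pts=3)
```

For a close neighbour the core distance dominates, while for a distant point the actual distance d(p, q) dominates.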
Disadvantages:
• High memory cost.
• It expects some kind of density decline to find cluster borders.
• It is less sensitive to erroneous data.
DENCLUE
(DENsity-based CLUstEring)
Proposed by
A. Hinneburg and D. A. Keim
DENCLUE
• It can be considered a special case of Kernel Density Estimation (KDE).
• KDE is a non-parametric estimation technique that aims to find dense regions of points.
• DENCLUE was developed to cluster large multimedia databases, because such databases contain a high volume of noise and require clustering of high-dimensional feature vectors.
STEPS in DENCLUE
PRE-CLUSTERING
1. Construct the hyper-rectangle.
2. Determine the populated cubes.
3. Construct a map by connecting the populated cubes.
CLUSTERING
4. Find the density attractors using the hill climbing algorithm.
5. Connect density attractors having the same path to form the final clusters.
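The pre-clustering steps can be sketched roughly as below. The details are assumptions made for illustration only: the cube side length is a free parameter `side`, cubes are keyed by integer grid coordinates measured from the lower corner of the bounding hyper-rectangle, and two populated cubes are "connected" when they are adjacent in the grid. The slides do not specify the exact connection rule.

```python
from collections import defaultdict
from itertools import product

def populated_cubes(points, side):
    """Map each populated cube (integer grid key) to the indices of its points."""
    dims = len(points[0])
    # Lower corner of the bounding hyper-rectangle.
    mins = [min(p[k] for p in points) for k in range(dims)]
    cubes = defaultdict(list)
    for i, p in enumerate(points):
        key = tuple(int((p[k] - mins[k]) // side) for k in range(dims))
        cubes[key].append(i)
    return dict(cubes)

def connect_cubes(cubes):
    """Map each populated cube to the populated cubes adjacent to it (assumed rule)."""
    links = {}
    for key in cubes:
        neighbors = []
        for off in product((-1, 0, 1), repeat=len(key)):
            nb = tuple(k + o for k, o in zip(key, off))
            if any(off) and nb in cubes:   # skip the cube itself
                neighbors.append(nb)
        links[key] = neighbors
    return links

points = [(0.1, 0.2), (0.3, 0.4), (1.2, 0.5), (9.0, 9.0)]
cubes = populated_cubes(points, side=1.0)
links = connect_cubes(cubes)
```

Only cubes that actually contain points are stored, so empty regions of the hyper-rectangle cost nothing.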
Steps…
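DENCLUE is based on calculating the influence that points exert on one another. With a Gaussian kernel, the influence of a point y on a point x is
$f_{Gauss}(x, y) = e^{-\frac{d(x, y)^2}{2\sigma^2}}$
where
d(x, y) is the Euclidean distance between x and y,
σ represents the radius of the neighbourhood containing x.
Fig. A hyper-rectangle with cubes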
DENSITY FUNCTION & DENSITY ATTRACTOR
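The density function is calculated by summing the previous equation over all points of the database:
$f^D(x) = \sum_{i=1}^{N} f_{Gauss}(x, x_i)$
where $f_{Gauss}(x, x_i)$ is the influence of the point $x_i$ on $x$, D represents the set of points in the database, and N its cardinality.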
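As an illustration of the density function, the sketch below assumes a Gaussian influence function with Euclidean distance, which matches the symbols d(x, y) and σ used on the slide. The function names are illustrative.

```python
import math

def gaussian_influence(x, y, sigma):
    """Influence of point y on point x under an assumed Gaussian kernel."""
    return math.exp(-math.dist(x, y) ** 2 / (2 * sigma ** 2))

def density(x, data, sigma):
    """Density at x: the sum of the influences of all N points in the data set D."""
    return sum(gaussian_influence(x, xi, sigma) for xi in data)

data = [(0.0, 0.0), (1.0, 0.0)]
d_at_origin = density((0.0, 0.0), data, sigma=1.0)
```

A point contributes an influence of 1 at its own location, and the contribution decays with distance at a rate controlled by σ.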
• To determine the clusters, DENCLUE calculates the density attractor for each point in the database.
• This attractor is a local maximum of the density function. The maximum is found by the Hill Climbing algorithm, which is based on a gradient ascent approach.
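The hill climbing step can be sketched as a fixed-step gradient ascent on the density function. This is a simplified illustration under stated assumptions: a Gaussian kernel, a normalised gradient direction, a hypothetical step size `step`, and a stop rule that halts as soon as the density no longer increases.

```python
import math

def density(x, data, sigma):
    """Sum of Gaussian influences of all data points at x (assumed kernel)."""
    return sum(math.exp(-math.dist(x, xi) ** 2 / (2 * sigma ** 2)) for xi in data)

def gradient(x, data, sigma):
    """Direction of the density gradient at x.

    The constant factor 1 / sigma^2 is dropped because only the direction
    is used (the step is normalised below).
    """
    grad = [0.0] * len(x)
    for xi in data:
        w = math.exp(-math.dist(x, xi) ** 2 / (2 * sigma ** 2))
        for k in range(len(x)):
            grad[k] += (xi[k] - x[k]) * w
    return grad

def hill_climb(x, data, sigma, step=0.05, max_iter=1000):
    """Follow the gradient from x until the density stops increasing."""
    current = list(x)
    for _ in range(max_iter):
        g = gradient(current, data, sigma)
        norm = math.sqrt(sum(c * c for c in g))
        if norm == 0.0:
            break
        candidate = [c + step * gc / norm for c, gc in zip(current, g)]
        if density(candidate, data, sigma) <= density(current, data, sigma):
            break                      # no improvement: treat current as the attractor
        current = candidate
    return tuple(current)

data = [(0.0, 0.0), (0.2, 0.0), (0.0, 0.2),
        (5.0, 5.0), (5.2, 5.0), (5.0, 5.2)]
attractor_a = hill_climb((0.5, 0.5), data, sigma=0.5)
attractor_b = hill_climb((4.5, 4.5), data, sigma=0.5)
```

Points started near different dense regions end at different attractors, and the result is only accurate to within roughly one step length.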
Procedure..
The points that form a path to a density attractor are called attracted points.
Clusters are formed from the density attractors and their attracted points.
Overall..
i. A statistical density estimation method is used to estimate the kernel density. This results in local density maxima.
ii. Clusters can be formed from these local density maxima.
iii. If the local density value is very small, the objects of the cluster are discarded as noise.
iv. In this method, objects under consideration are added to a cluster through density attractors, using a step-wise hill-climbing procedure.
ADVANTAGES:
• It detects erroneous data very well.
• It finds clusters of non-spherical shape in high-dimensional data sets.
• Its processing is much faster than that of DBSCAN.
DISADVANTAGES:
• It requires many parameters to be set.
• It is less sensitive to outliers.
Problem with Hill Climbing Algorithm
• The result can be either optimal or complete when using the hill climbing algorithm.
• DENCLUE suffers in terms of execution time due to the hill climbing algorithm.
Figure: objective function y plotted over the state space X, showing the global maximum and a local maximum.
CONCLUSION :
DENCLUE-IM (An improvement to DENCLUE)
• The idea is to speed up the calculation by avoiding the crucial step in DENCLUE, which is the hill climbing step.
• This step is based on gradient calculations.
• These calculations are done for each point in order to find its density attractor.
• Performing these calculations for every point makes it difficult to obtain results in a reasonable time, especially on large databases.
• DENCLUE-IM instead finds an equivalent of the density attractor, $x_{Hcube}$, which represents all the points contained in a hyper-cube.
Conclusion..
• $x_{Hcube}$ is taken to be the point with the highest density in this hyper-cube: $x_{Hcube} = \arg\max_{x \in Hcube} f^D(x)$, where $f^D$ is the density function.
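Choosing the hyper-cube representative can be sketched as a simple maximum over the points of the cube. This is only an illustration of the idea stated above, under assumptions: a Gaussian kernel, density evaluated against the whole data set, and a hypothetical function name `cube_representative`.

```python
import math

def density(x, data, sigma):
    """Sum of Gaussian influences of all data points at x (assumed kernel)."""
    return sum(math.exp(-math.dist(x, xi) ** 2 / (2 * sigma ** 2)) for xi in data)

def cube_representative(cube_points, data, sigma):
    """Return the point of the hyper-cube with the highest density.

    No gradient steps are needed: one density evaluation per point in the cube.
    """
    return max(cube_points, key=lambda x: density(x, data, sigma))

data = [(0.0, 0.0), (0.1, 0.0), (0.2, 0.0), (0.9, 0.9)]
cube_points = data                     # assume all four points fall in one cube
representative = cube_representative(cube_points, data, sigma=0.3)
```

The middle point of the three close neighbours is selected, since it receives the strongest combined influence from the others.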