
Lecture 4 - Density Based Methods

The document discusses different types of density-based clustering methods. It describes DBSCAN, which finds clusters of arbitrary shape by discovering density-connected points based on a density parameter. OPTICS is presented as an improvement over DBSCAN that produces a cluster-ordering to represent the intrinsic clustering structure. Finally, DENCLUE is summarized as using statistical density functions with a solid mathematical foundation to allow description of arbitrarily shaped clusters.

Uploaded by

Manikandan M

Chapter 7.

Cluster Analysis
1. What is Cluster Analysis?
2. Types of Data in Cluster Analysis
3. A Categorization of Major Clustering Methods
4. Partitioning Methods
5. Hierarchical Methods
6. Density-Based Methods
7. Grid-Based Methods
8. Model-Based Methods
9. Clustering High-Dimensional Data
10. Constraint-Based Clustering
11. Outlier Analysis
12. Summary

11/1/22 Data Mining: Concepts and Techniques 1


Density-Based Clustering Methods
• Clustering based on density (a local cluster criterion), such as density-connected points
• Major features:
  ◦ Discover clusters of arbitrary shape
  ◦ Handle noise
  ◦ One scan
  ◦ Need density parameters as a termination condition
• Several interesting studies:
  ◦ DBSCAN: Ester, et al. (KDD’96)
  ◦ OPTICS: Ankerst, et al. (SIGMOD’99)
  ◦ DENCLUE: Hinneburg & Keim (KDD’98)
  ◦ CLIQUE: Agrawal, et al. (SIGMOD’98) (more grid-based)
Density-Based Clustering: Basic Concepts
• Two parameters:
  ◦ Eps: maximum radius of the neighbourhood
  ◦ MinPts: minimum number of points in an Eps-neighbourhood of that point
• N_Eps(p): {q belongs to D | dist(p, q) <= Eps}
• Directly density-reachable: a point p is directly density-reachable from a point q w.r.t. Eps, MinPts if
  ◦ p belongs to N_Eps(q)
  ◦ core point condition: |N_Eps(q)| >= MinPts
[Figure: points p and q; MinPts = 5, Eps = 1 cm]
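The definitions on this slide can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: Euclidean distance, points stored as tuples, and the function names and sample data are hypothetical rather than taken from the lecture.

```python
from math import dist

def eps_neighbourhood(points, p, eps):
    """N_Eps(p): every q in the data set with dist(p, q) <= eps (p itself included)."""
    return [q for q in points if dist(p, q) <= eps]

def is_core(points, q, eps, min_pts):
    """Core point condition: |N_Eps(q)| >= MinPts."""
    return len(eps_neighbourhood(points, q, eps)) >= min_pts

def directly_density_reachable(points, p, q, eps, min_pts):
    """p is directly density-reachable from q if p is in N_Eps(q) and q is a core point."""
    return dist(p, q) <= eps and is_core(points, q, eps, min_pts)

# Hypothetical data: a small cross around the origin plus one point on its edge.
points = [(0.0, 0.0), (0.5, 0.0), (0.0, 0.5), (-0.5, 0.0), (0.0, -0.5), (1.4, 0.0)]
core = (0.5, 0.0)
border = (1.4, 0.0)

print(directly_density_reachable(points, border, core, eps=1.0, min_pts=5))
print(directly_density_reachable(points, core, border, eps=1.0, min_pts=5))
```

The two calls differ because the relation is not symmetric: the edge point lies in the neighbourhood of a core point, but its own neighbourhood is too sparse for it to be a core point.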



Density-Reachable and Density-Connected
• Density-reachable:
  ◦ A point p is density-reachable from a point q w.r.t. Eps, MinPts if there is a chain of points p1, …, pn, p1 = q, pn = p such that pi+1 is directly density-reachable from pi
• Density-connected:
  ◦ A point p is density-connected to a point q w.r.t. Eps, MinPts if there is a point o such that both p and q are density-reachable from o w.r.t. Eps and MinPts
[Figures: a chain from q through p1 to p; points p and q both reached from a point o]
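Both relations can be checked with a breadth-first search that only continues through core points. The sketch below assumes Euclidean distance and refers to points by index; the helper names and the sample data are illustrative, and the brute-force search is meant for tiny inputs only.

```python
from collections import deque
from math import dist

def neighbours(points, i, eps):
    """Indices of all points within eps of point i (i itself included)."""
    return [j for j in range(len(points)) if dist(points[i], points[j]) <= eps]

def reachable_from(points, q, eps, min_pts):
    """Indices of all points that are density-reachable from point q."""
    reached = set()
    expanded = {q}
    frontier = deque([q])
    while frontier:
        i = frontier.popleft()
        nbrs = neighbours(points, i, eps)
        if len(nbrs) < min_pts:
            continue  # i is not a core point, so no chain continues through it
        for j in nbrs:
            reached.add(j)
            if j not in expanded:
                expanded.add(j)
                frontier.append(j)
    return reached

def density_connected(points, p, q, eps, min_pts):
    """True if some point o exists from which both p and q are density-reachable."""
    return any({p, q} <= reachable_from(points, o, eps, min_pts)
               for o in range(len(points)))

# Hypothetical data: five evenly spaced points and one far-away point.
points = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0), (4.0, 0.0), (10.0, 0.0)]

print(reachable_from(points, 1, eps=1.0, min_pts=3))
print(density_connected(points, 0, 4, eps=1.0, min_pts=3))
```

In this data the two end points of the chain are border points: neither is density-reachable from the other, yet they are density-connected through the core points between them.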
DBSCAN: Density Based Spatial Clustering of Applications with Noise
• Relies on a density-based notion of cluster: a cluster is defined as a maximal set of density-connected points
• Discovers clusters of arbitrary shape in spatial databases with noise
[Figure: core, border, and outlier points; Eps = 1 cm, MinPts = 5]



DBSCAN: The Algorithm
• Arbitrarily select a point p
• Retrieve all points density-reachable from p w.r.t. Eps and MinPts
• If p is a core point, a cluster is formed
• If p is a border point, no points are density-reachable from p, and DBSCAN visits the next point of the database
• Continue the process until all of the points have been processed
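The steps above can be sketched as a short, brute-force Python function. It is an illustration under assumptions, not the original implementation: distances are Euclidean, neighbourhoods are found by a linear scan rather than a spatial index, and the label convention (-1 for noise) and sample data are choices made here.

```python
from math import dist

NOISE = -1

def dbscan(points, eps, min_pts):
    """Return one label per point: 0, 1, ... for clusters, NOISE (-1) for outliers."""
    labels = [None] * len(points)  # None means "not yet visited"

    def region(i):
        return [j for j in range(len(points)) if dist(points[i], points[j]) <= eps]

    cluster = -1
    for p in range(len(points)):
        if labels[p] is not None:
            continue
        nbrs = region(p)
        if len(nbrs) < min_pts:
            labels[p] = NOISE  # not a core point; may become a border point later
            continue
        cluster += 1  # p is a core point, so a new cluster is formed
        labels[p] = cluster
        seeds = list(nbrs)
        while seeds:
            q = seeds.pop()
            if labels[q] == NOISE:
                labels[q] = cluster  # border point reached from a core point
            if labels[q] is not None:
                continue
            labels[q] = cluster
            q_nbrs = region(q)
            if len(q_nbrs) >= min_pts:
                seeds.extend(q_nbrs)  # q is also a core point: keep expanding
    return labels

# Hypothetical data: two dense runs of points and one isolated point.
points = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0), (4.0, 0.0),
          (10.0, 0.0), (20.0, 0.0), (21.0, 0.0), (22.0, 0.0)]

for eps in (0.5, 1.0, 10.0):
    print(eps, dbscan(points, eps, min_pts=3))
```

Running the loop with three different Eps values on the same data gives three quite different labelings, which is a small-scale view of how strongly the result depends on the parameters.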



DBSCAN: Sensitive to Parameters



CHAMELEON (Clustering Complex Objects)



OPTICS: A Cluster-Ordering Method (1999)
• OPTICS: Ordering Points To Identify the Clustering Structure
  ◦ Ankerst, Breunig, Kriegel, and Sander (SIGMOD’99)
• Produces a special order of the database with respect to its density-based clustering structure
• This cluster-ordering contains information equivalent to the density-based clusterings corresponding to a broad range of parameter settings
• Good for both automatic and interactive cluster analysis, including finding the intrinsic clustering structure
• Can be represented graphically or using visualization techniques



OPTICS: Some Extensions from DBSCAN
• Index-based:
  ◦ k = number of dimensions
  ◦ N = 20
  ◦ p = 75%
  ◦ M = N(1-p) = 5
• Complexity: O(kN^2)
• Core distance
• Reachability distance: max(core-distance(o), d(o, p))
[Figure: object o with points p1 and p2; MinPts = 5, ε = 3 cm; r(p1, o) = 2.8 cm, r(p2, o) = 4 cm]
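The two distances can be computed directly from their definitions. The sketch below assumes Euclidean distance and, in line with the Eps-neighbourhood used for DBSCAN, counts an object as a member of its own neighbourhood; that counting convention, the function names, and the sample coordinates are assumptions. The coordinates were chosen so that the numbers echo the figure (MinPts = 5, ε = 3).

```python
from math import dist

def core_distance(points, o, eps, min_pts):
    """Distance from o to its MinPts-th nearest neighbour within eps,
    or None (undefined) if o is not a core object."""
    d = sorted(dist(o, q) for q in points if dist(o, q) <= eps)
    return d[min_pts - 1] if len(d) >= min_pts else None

def reachability_distance(points, p, o, eps, min_pts):
    """max(core-distance(o), d(o, p)), or None if o is not a core object."""
    cd = core_distance(points, o, eps, min_pts)
    return None if cd is None else max(cd, dist(o, p))

# Hypothetical layout: o at the origin, p1 close to it, p2 further away.
o, p1, p2 = (0.0, 0.0), (1.0, 0.0), (4.0, 0.0)
points = [o, p1, (0.0, 2.0), (-2.5, 0.0), (0.0, -2.8), p2]

print(core_distance(points, o, eps=3.0, min_pts=5))
print(reachability_distance(points, p1, o, eps=3.0, min_pts=5))
print(reachability_distance(points, p2, o, eps=3.0, min_pts=5))
```

A point that is closer to o than the core distance (p1 here) is assigned the core distance itself, while a more distant point (p2) is assigned its actual distance to o.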
[Figure: reachability plot. Vertical axis: reachability-distance (including "undefined" values); horizontal axis: cluster-order of the objects]
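A cluster-ordering and its reachability values can be produced with a compact, quadratic-time sketch of the OPTICS main loop. This is a simplified illustration, not the published implementation: it uses a plain set instead of a priority queue, represents "undefined" as infinity, counts an object in its own neighbourhood, and the function name and data are hypothetical.

```python
from math import dist, inf

def optics_order(points, eps, min_pts):
    """Return (order, reach): the cluster-ordering as a list of indices and the
    reachability-distance of every point (inf stands for "undefined")."""
    n = len(points)
    reach = [inf] * n
    processed = [False] * n
    order = []

    def neighbours(i):
        return [j for j in range(n) if dist(points[i], points[j]) <= eps]

    def core_dist(i, nbrs):
        if len(nbrs) < min_pts:
            return None  # i is not a core object
        return sorted(dist(points[i], points[j]) for j in nbrs)[min_pts - 1]

    for start in range(n):
        if processed[start]:
            continue
        seeds = {start}
        while seeds:
            # Always continue with the seed that is currently easiest to reach.
            i = min(seeds, key=lambda j: (reach[j], j))
            seeds.remove(i)
            processed[i] = True
            order.append(i)
            nbrs = neighbours(i)
            cd = core_dist(i, nbrs)
            if cd is None:
                continue
            for j in nbrs:
                if not processed[j]:
                    new_reach = max(cd, dist(points[i], points[j]))
                    reach[j] = min(reach[j], new_reach)
                    seeds.add(j)
    return order, reach

# Hypothetical data: two well-separated groups of three points each.
points = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (10.0, 0.0), (11.0, 0.0), (12.0, 0.0)]
order, reach = optics_order(points, eps=2.0, min_pts=2)
print(order)
print([reach[i] for i in order])
```

Plotting the reachability values in cluster order gives the reachability plot: each group shows up as a run of small values, and the first object of each group has an undefined value because nothing before it could reach it.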
Density-Based Clustering: OPTICS & Its Applications



DENCLUE: Using Statistical Density Functions
• DENsity-based CLUstEring by Hinneburg & Keim (KDD’98)
• Using statistical density functions:
  ◦ Influence function: $f_{\mathrm{Gaussian}}(x, y) = e^{-\frac{d(x, y)^2}{2\sigma^2}}$
  ◦ Density function: $f^{D}_{\mathrm{Gaussian}}(x) = \sum_{i=1}^{N} e^{-\frac{d(x, x_i)^2}{2\sigma^2}}$
  ◦ Gradient: $\nabla f^{D}_{\mathrm{Gaussian}}(x, x_i) = \sum_{i=1}^{N} (x_i - x) \cdot e^{-\frac{d(x, x_i)^2}{2\sigma^2}}$
• Major features
  ◦ Solid mathematical foundation
  ◦ Good for data sets with large amounts of noise
  ◦ Allows a compact mathematical description of arbitrarily shaped clusters in high-dimensional data sets
  ◦ Significantly faster than existing algorithms (e.g., DBSCAN)
  ◦ But needs a large number of parameters
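The three Gaussian formulas translate almost literally into code. The sketch below assumes Euclidean distance for d, points stored as tuples, and a user-chosen sigma; the function names and the sample data are illustrative only.

```python
from math import dist, exp

def influence(x, y, sigma):
    """Gaussian influence of data point y at location x."""
    return exp(-dist(x, y) ** 2 / (2 * sigma ** 2))

def density(x, data, sigma):
    """Overall density at x: the sum of the influences of all data points."""
    return sum(influence(x, xi, sigma) for xi in data)

def gradient(x, data, sigma):
    """Gradient of the density at x: sum of (xi - x) weighted by the influence of xi."""
    grad = [0.0] * len(x)
    for xi in data:
        w = influence(x, xi, sigma)
        for k in range(len(x)):
            grad[k] += (xi[k] - x[k]) * w
    return grad

# Hypothetical data: three nearby points and one distant point.
data = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0)]

print(density((0.0, 0.0), data, sigma=1.0))
print(density((5.0, 5.0), data, sigma=1.0))
print(gradient((-1.0, -1.0), data, sigma=1.0))
```

The density is higher inside the group of three than at the isolated point, and the gradient evaluated outside the data points back towards it.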


DENCLUE: Technical Essence
• Uses grid cells but only keeps information about grid cells that actually contain data points, and manages these cells in a tree-based access structure
• Influence function: describes the impact of a data point within its neighborhood
• Overall density of the data space can be calculated as the sum of the influence functions of all data points
• Clusters can be determined mathematically by identifying density attractors
• Density attractors are local maxima of the overall density function
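One simple way to locate a density attractor is to climb the density function from a starting point until it stops increasing. The sketch below uses a fixed-step, normalised-gradient hill climb with a Gaussian influence function; the step size, stopping rule, function names, and data are assumptions made for illustration, and the grid and tree structures mentioned above are left out.

```python
from math import dist, exp, sqrt

def density(x, data, sigma):
    """Sum of Gaussian influences of all data points at x."""
    return sum(exp(-dist(x, xi) ** 2 / (2 * sigma ** 2)) for xi in data)

def gradient(x, data, sigma):
    """Gradient of the Gaussian density at x."""
    grad = [0.0] * len(x)
    for xi in data:
        w = exp(-dist(x, xi) ** 2 / (2 * sigma ** 2))
        for k in range(len(x)):
            grad[k] += (xi[k] - x[k]) * w
    return grad

def climb_to_attractor(x, data, sigma, step=0.05, max_iter=1000):
    """Move uphill in fixed-size steps until the density no longer increases."""
    x = tuple(x)
    for _ in range(max_iter):
        g = gradient(x, data, sigma)
        norm = sqrt(sum(c * c for c in g))
        if norm == 0.0:
            break  # flat spot: nowhere to climb
        nxt = tuple(xk + step * gk / norm for xk, gk in zip(x, g))
        if density(nxt, data, sigma) <= density(x, data, sigma):
            break  # the next step would not improve the density
        x = nxt
    return x

# Hypothetical data: two small groups of points.
data = [(0.0, 0.0), (0.2, 0.0), (0.0, 0.2), (5.0, 5.0), (5.2, 5.0), (5.0, 5.2)]

a1 = climb_to_attractor((0.5, 0.5), data, sigma=0.5)
a2 = climb_to_attractor((4.5, 4.5), data, sigma=0.5)
print(a1, a2)
```

Starting points that climb to the same attractor would be placed in the same cluster; here the two starts end near different groups, so they belong to different clusters.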



Density Attractor



Center-Defined and Arbitrary

