Lecture 4 - Density Based Methods

The document discusses different types of density-based clustering methods. It describes DBSCAN, which finds clusters of arbitrary shape by discovering density-connected points based on a density parameter. OPTICS is presented as an improvement over DBSCAN that produces a cluster-ordering to represent the intrinsic clustering structure. Finally, DENCLUE is summarized as using statistical density functions with a solid mathematical foundation to allow description of arbitrarily shaped clusters.

Uploaded by

Manikandan M


Chapter 7. Cluster Analysis
1. What is Cluster Analysis?
2. Types of Data in Cluster Analysis
3. A Categorization of Major Clustering Methods
4. Partitioning Methods
5. Hierarchical Methods
6. Density-Based Methods
7. Grid-Based Methods
8. Model-Based Methods
9. Clustering High-Dimensional Data
10. Constraint-Based Clustering
11. Outlier Analysis
12. Summary

11/1/22 Data Mining: Concepts and Techniques 1

Density-Based Clustering Methods
- Clustering based on density (a local cluster criterion), such as density-connected points
- Major features:
  - Discovers clusters of arbitrary shape
  - Handles noise
  - Needs only one scan
  - Needs density parameters as a termination condition
- Several interesting studies:
  - DBSCAN: Ester et al. (KDD'96)
  - OPTICS: Ankerst et al. (SIGMOD'99)
  - DENCLUE: Hinneburg & Keim (KDD'98)
  - CLIQUE: Agrawal et al. (SIGMOD'98) (more grid-based)

Density-Based Clustering: Basic Concepts
- Two parameters:
  - Eps: maximum radius of the neighbourhood
  - MinPts: minimum number of points in an Eps-neighbourhood of that point
- N_Eps(p) = {q ∈ D | dist(p, q) ≤ Eps}
- Directly density-reachable: a point p is directly density-reachable from a point q w.r.t. Eps, MinPts if
  - p belongs to N_Eps(q)
  - core point condition: |N_Eps(q)| ≥ MinPts

[Figure: points p and q, with MinPts = 5 and Eps = 1 cm]
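
The definitions above translate almost directly into code. The following is a minimal sketch, assuming Euclidean distance and points stored as coordinate tuples; the function names and the toy data are hypothetical and not part of the original slides.

```python
import math

def eps_neighbourhood(points, p, eps):
    """Return N_Eps(p): every point q in the data set with dist(p, q) <= eps."""
    return [q for q in points if math.dist(p, q) <= eps]

def is_core_point(points, q, eps, min_pts):
    """Core point condition: |N_Eps(q)| >= MinPts (q counts as its own neighbour)."""
    return len(eps_neighbourhood(points, q, eps)) >= min_pts

def directly_density_reachable(points, p, q, eps, min_pts):
    """p is directly density-reachable from q if p is in N_Eps(q) and q is a core point."""
    return math.dist(p, q) <= eps and is_core_point(points, q, eps, min_pts)

# Hypothetical toy data: five points close together plus one far away.
points = [(0.0, 0.0), (0.5, 0.0), (0.0, 0.5), (-0.5, 0.0), (0.0, -0.5), (5.0, 5.0)]

print(is_core_point(points, (0.0, 0.0), eps=1.0, min_pts=5))
print(directly_density_reachable(points, (0.5, 0.0), (0.0, 0.0), eps=1.0, min_pts=5))
```

Note that the relation is not symmetric in general: a border point can be directly density-reachable from a core point without the reverse being true.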

Density-Reachable and Density-Connected
- Density-reachable:
  - A point p is density-reachable from a point q w.r.t. Eps, MinPts if there is a chain of points p1, …, pn, with p1 = q and pn = p, such that pi+1 is directly density-reachable from pi
- Density-connected:
  - A point p is density-connected to a point q w.r.t. Eps, MinPts if there is a point o such that both p and q are density-reachable from o w.r.t. Eps and MinPts

[Figure: points q, p1, p illustrating density-reachability; points p, q, o illustrating density-connectivity]

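
Both relations can be checked with a breadth-first search that only expands core points. This is a rough sketch under assumptions: Euclidean distance, points addressed by index, and a point counted as its own neighbour. All names and the toy data are hypothetical.

```python
import math
from collections import deque

def neighbours(points, i, eps):
    """Indices of all points within eps of points[i] (including i itself)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

def density_reachable_from(points, q, eps, min_pts):
    """Indices of all points density-reachable from points[q].

    Follows chains q = p1, ..., pn in which every step starts at a core point.
    """
    if len(neighbours(points, q, eps)) < min_pts:
        return set()          # q is not a core point, so no chain can start at q
    reached, queue = {q}, deque([q])
    while queue:
        i = queue.popleft()
        nbrs = neighbours(points, i, eps)
        if len(nbrs) < min_pts:
            continue          # border point: reachable, but the chain stops here
        for j in nbrs:
            if j not in reached:
                reached.add(j)
                queue.append(j)
    return reached

def density_connected(points, p, q, eps, min_pts):
    """True if some point o has both p and q density-reachable from it."""
    return any({p, q} <= density_reachable_from(points, o, eps, min_pts)
               for o in range(len(points)))

# Hypothetical toy data: five points on a line, one unit apart, plus an outlier.
points = [(float(x), 0.0) for x in range(5)] + [(10.0, 0.0)]

print(density_reachable_from(points, 1, eps=1.0, min_pts=3))
print(density_connected(points, 0, 4, eps=1.0, min_pts=3))
```

In this toy data the two end points of the line are border points: neither is density-reachable from the other, yet they are density-connected through the core points between them.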
DBSCAN: Density-Based Spatial Clustering of Applications with Noise
- Relies on a density-based notion of cluster: a cluster is defined as a maximal set of density-connected points
- Discovers clusters of arbitrary shape in spatial databases with noise

[Figure: core, border, and outlier points, with Eps = 1 cm and MinPts = 5]


DBSCAN: The Algorithm

- Arbitrarily select a point p
- Retrieve all points density-reachable from p w.r.t. Eps and MinPts
- If p is a core point, a cluster is formed
- If p is a border point, no points are density-reachable from p, and DBSCAN visits the next point of the database
- Continue the process until all of the points have been processed
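
The steps above can be sketched as a short program. This is an illustrative version only, assuming Euclidean distance and a brute-force neighbourhood search (no spatial index); the names `region_query`, `dbscan`, and `NOISE` are hypothetical, and points are visited in input order rather than arbitrarily.

```python
import math

NOISE = -1

def region_query(points, i, eps):
    """Indices of all points within eps of points[i] (including i itself)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

def dbscan(points, eps, min_pts):
    """Return one label per point: 0, 1, ... for clusters, NOISE for outliers."""
    labels = [None] * len(points)       # None means "not yet processed"
    cluster_id = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        nbrs = region_query(points, i, eps)
        if len(nbrs) < min_pts:
            labels[i] = NOISE           # may later be relabelled as a border point
            continue
        labels[i] = cluster_id          # i is a core point: start a new cluster
        seeds = list(nbrs)
        while seeds:
            j = seeds.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # previously noise, now a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster_id
            j_nbrs = region_query(points, j, eps)
            if len(j_nbrs) >= min_pts:  # only core points extend the cluster
                seeds.extend(j_nbrs)
        cluster_id += 1
    return labels

# Hypothetical toy data: two tight squares and one isolated point.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (5, 5)]
print(dbscan(points, eps=1.5, min_pts=3))
```

The brute-force `region_query` makes this sketch quadratic in the number of points; a spatial index would normally replace it for larger data sets.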

DBSCAN: Sensitive to Parameters


CHAMELEON (Clustering Complex Objects)


OPTICS: A Cluster-Ordering Method (1999)

- OPTICS: Ordering Points To Identify the Clustering Structure
  - Ankerst, Breunig, Kriegel, and Sander (SIGMOD'99)
- Produces a special order of the database with respect to its density-based clustering structure
- This cluster-ordering contains information equivalent to the density-based clusterings corresponding to a broad range of parameter settings
- Good for both automatic and interactive cluster analysis, including finding the intrinsic clustering structure
- Can be represented graphically or using visualization techniques

OPTICS: Some Extensions from DBSCAN
- Index-based:
  - k = number of dimensions
  - N = 20
  - p = 75%
  - M = N(1 - p) = 5
- Complexity: O(kN^2)
- Core distance
- Reachability distance: max(core-distance(o), d(o, p))

[Figure: object o with neighbours p1 and p2; MinPts = 5, ε = 3 cm; r(p1, o) = 2.8 cm, r(p2, o) = 4 cm]

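
The two distances can be sketched in a few lines. The slide names the core distance without defining it; the sketch below assumes the usual OPTICS definition (the distance from o to its MinPts-th nearest neighbour, undefined if o has fewer than MinPts points within ε) and assumes that o counts as its own neighbour. Function names and data are hypothetical.

```python
import math

def core_distance(points, o, eps, min_pts):
    """Distance from points[o] to its MinPts-th nearest neighbour (counting o itself),
    or None (undefined) if fewer than MinPts points lie within eps."""
    within = sorted(d for d in (math.dist(points[o], q) for q in points) if d <= eps)
    if len(within) < min_pts:
        return None
    return within[min_pts - 1]

def reachability_distance(points, p, o, eps, min_pts):
    """max(core-distance(o), d(o, p)), or None if o is not a core object."""
    cd = core_distance(points, o, eps, min_pts)
    if cd is None:
        return None
    return max(cd, math.dist(points[o], points[p]))

# Hypothetical toy data: points on a line at x = 0, 1, 2, 3, 4 and 10.
points = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0), (4.0, 0.0), (10.0, 0.0)]

print(core_distance(points, 0, eps=3.0, min_pts=3))
print(reachability_distance(points, 1, 0, eps=3.0, min_pts=3))
print(reachability_distance(points, 3, 0, eps=3.0, min_pts=3))
```

A point closer to o than the core distance is assigned the core distance itself, which is what smooths the reachability values inside a dense region.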
[Figure: reachability plot. Vertical axis: reachability-distance (may be undefined); horizontal axis: cluster-order of the objects]

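
The data behind a reachability plot is a cluster-ordering together with one reachability-distance per object. The following is a simplified sketch of how such an ordering might be computed, assuming Euclidean distance, a brute-force neighbour search, and a plain set instead of a priority queue for the seed list; it is not the authors' implementation, and all names are hypothetical.

```python
import math

def optics(points, eps, min_pts):
    """Return (order, reach): the cluster-ordering as a list of indices and the
    reachability-distance of each point (None where it is undefined)."""
    n = len(points)
    reach = [None] * n
    processed = [False] * n
    order = []

    def neighbours(i):
        return [j for j in range(n) if math.dist(points[i], points[j]) <= eps]

    def core_dist(i, nbrs):
        if len(nbrs) < min_pts:
            return None
        return sorted(math.dist(points[i], points[j]) for j in nbrs)[min_pts - 1]

    def update(i, nbrs, seeds):
        # Lower the reachability of unprocessed neighbours of core object i.
        cd = core_dist(i, nbrs)
        for j in nbrs:
            if processed[j]:
                continue
            new_reach = max(cd, math.dist(points[i], points[j]))
            if reach[j] is None or new_reach < reach[j]:
                reach[j] = new_reach
                seeds.add(j)

    for start in range(n):
        if processed[start]:
            continue
        processed[start] = True
        order.append(start)
        nbrs = neighbours(start)
        if core_dist(start, nbrs) is None:
            continue
        seeds = set()
        update(start, nbrs, seeds)
        while seeds:
            # Always continue with the seed that is currently easiest to reach.
            j = min(seeds, key=lambda s: (reach[s], s))
            seeds.remove(j)
            processed[j] = True
            order.append(j)
            j_nbrs = neighbours(j)
            if core_dist(j, j_nbrs) is not None:
                update(j, j_nbrs, seeds)
    return order, reach

# Hypothetical toy data: two groups of three points on a line.
points = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (10.0, 0.0), (11.0, 0.0), (12.0, 0.0)]
order, reach = optics(points, eps=1.5, min_pts=2)
print(order, reach)
```

Plotting `reach` in the sequence given by `order` yields the valleys of a reachability plot; in this toy data each group starts with an undefined value, which marks a jump to a new dense region.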
Density-Based Clustering: OPTICS & Its Applications


DENCLUE: Using Statistical Density Functions
- DENsity-based CLUstEring by Hinneburg & Keim (KDD'98)
- Using statistical density functions:
  - Influence function: $f_{\text{Gaussian}}(x, y) = e^{-\frac{d(x, y)^2}{2\sigma^2}}$
  - Density function: $f^{D}_{\text{Gaussian}}(x) = \sum_{i=1}^{N} e^{-\frac{d(x, x_i)^2}{2\sigma^2}}$
  - Gradient: $\nabla f^{D}_{\text{Gaussian}}(x, x_i) = \sum_{i=1}^{N} (x_i - x) \cdot e^{-\frac{d(x, x_i)^2}{2\sigma^2}}$
- Major features
  - Solid mathematical foundation
  - Good for data sets with large amounts of noise
  - Allows a compact mathematical description of arbitrarily shaped clusters in high-dimensional data sets
  - Significantly faster than existing algorithms (e.g., DBSCAN)
  - But needs a large number of parameters
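
The three Gaussian formulas can be evaluated directly. Below is a minimal sketch assuming Euclidean distance for d and points stored as coordinate tuples; the function names and the two-point data set are hypothetical.

```python
import math

def gaussian_influence(x, y, sigma):
    """f_Gaussian(x, y) = exp(-d(x, y)^2 / (2 sigma^2))."""
    return math.exp(-math.dist(x, y) ** 2 / (2 * sigma ** 2))

def density(x, data, sigma):
    """Overall density at x: the sum of the influence of every data point."""
    return sum(gaussian_influence(x, xi, sigma) for xi in data)

def density_gradient(x, data, sigma):
    """Gradient at x: sum over i of (x_i - x) * exp(-d(x, x_i)^2 / (2 sigma^2))."""
    grad = [0.0] * len(x)
    for xi in data:
        w = gaussian_influence(x, xi, sigma)
        for k in range(len(x)):
            grad[k] += (xi[k] - x[k]) * w
    return grad

# Hypothetical toy data: two points two units apart.
data = [(0.0, 0.0), (2.0, 0.0)]

print(density((1.0, 0.0), data, sigma=1.0))
print(density_gradient((0.0, 0.0), data, sigma=1.0))
```

At the midpoint between the two points the contributions to the gradient cancel, while at either data point the gradient points toward the other one.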

DENCLUE: Technical Essence

- Uses grid cells, but only keeps information about grid cells that actually contain data points, and manages these cells in a tree-based access structure
- Influence function: describes the impact of a data point within its neighborhood
- Overall density of the data space can be calculated as the sum of the influence functions of all data points
- Clusters can be determined mathematically by identifying density attractors
- Density attractors are local maxima of the overall density function
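
One way to locate a density attractor is to climb the density function along its gradient. The sketch below is an assumption-laden illustration rather than the method from the slides: it uses a Gaussian influence function, a fixed step size `delta`, and stops as soon as a step no longer increases the density. The grid and tree structure mentioned above are omitted, and all names and data are hypothetical.

```python
import math

def density(x, data, sigma):
    """Sum of Gaussian influences of all data points at position x."""
    return sum(math.exp(-math.dist(x, xi) ** 2 / (2 * sigma ** 2)) for xi in data)

def gradient(x, data, sigma):
    """Direction of steepest increase of the density at x."""
    grad = [0.0] * len(x)
    for xi in data:
        w = math.exp(-math.dist(x, xi) ** 2 / (2 * sigma ** 2))
        for k in range(len(x)):
            grad[k] += (xi[k] - x[k]) * w
    return grad

def find_attractor(x, data, sigma, delta=0.05, max_steps=1000):
    """Hill-climb from x with fixed-length steps until the density stops rising."""
    x = tuple(x)
    for _ in range(max_steps):
        g = gradient(x, data, sigma)
        norm = math.hypot(*g)
        if norm == 0.0:
            break
        candidate = tuple(xk + delta * gk / norm for xk, gk in zip(x, g))
        if density(candidate, data, sigma) <= density(x, data, sigma):
            break                      # no improvement: x approximates a local maximum
        x = candidate
    return x

# Hypothetical toy data: two small groups of points around (0, 0) and (5, 5).
offsets = [(0.0, 0.0), (0.2, 0.0), (-0.2, 0.0), (0.0, 0.2), (0.0, -0.2)]
data = [(dx, dy) for dx, dy in offsets] + [(5 + dx, 5 + dy) for dx, dy in offsets]

print(find_attractor((0.6, 0.4), data, sigma=0.5))
print(find_attractor((4.5, 5.3), data, sigma=0.5))
```

Points whose climbs end at the same attractor would be assigned to the same cluster; the fixed step size limits how precisely the attractor is located.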

Density Attractor


Center-Defined and Arbitrary

