DM Lecture 06
Lecture-07
Dr. Waqas Haider Khan Bangyal
Data Mining Overview
Data Mining
Data warehouses and OLAP (On-Line Analytical Processing)
Clustering: Hierarchical and Partitioned approaches
Classification: Decision Trees, ANN, Bayesian classifiers, and G.A.
Association Rules Mining
Advanced topics: outlier detection, web mining
What is a natural grouping among these objects?
Clustering is subjective
Partitioning methods
k-Means (and EM), k-Medoids
Hierarchical methods
agglomerative, divisive, BIRCH
Model-based clustering methods
K-MEANS CLUSTERING
Simply put, k-means clustering is an algorithm that groups objects into K groups (clusters) based on their attributes/features.
K is a positive integer.
The grouping is done by minimizing the sum of squared distances between each data point and its corresponding cluster centroid.
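To make the quantity being minimized concrete, here is a minimal Python sketch that computes the sum of squared distances for a given assignment. The function name, the toy points, and the use of squared Euclidean distance in 2-D are assumptions made for illustration, not part of the lecture.

```python
def sum_of_squared_distances(points, labels, centroids):
    """Sum of squared Euclidean distances from each point to its assigned centroid."""
    total = 0.0
    for (x, y), k in zip(points, labels):
        cx, cy = centroids[k]
        total += (x - cx) ** 2 + (y - cy) ** 2
    return total

# Hypothetical toy data, chosen only for illustration.
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
centroids = [(0.0, 1.0), (10.0, 1.0)]

# Grouping nearby points together gives a much smaller value
# than a grouping that mixes the two ends.
good = sum_of_squared_distances(points, [0, 0, 1, 1], centroids)
bad = sum_of_squared_distances(points, [0, 1, 0, 1], centroids)
print(good, bad)
```

A lower value means the points sit closer to their centroids, which is what k-means tries to achieve.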
Common Distance measures:
The distance measure determines how the similarity of two elements is calculated, and it influences the shape of the clusters.
They include:
1. The Euclidean distance (also called 2-norm distance) is given by:
d(a, b) = sqrt((x2 – x1)^2 + (y2 – y1)^2)
where a = (x1, y1) and b = (x2, y2).
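The Euclidean distance between two 2-D points can be sketched in Python as follows. The function name and the sample points are hypothetical, chosen so the result is easy to check by hand.

```python
import math

def euclidean(a, b):
    """Euclidean (2-norm) distance between 2-D points a = (x1, y1) and b = (x2, y2)."""
    (x1, y1), (x2, y2) = a, b
    return math.sqrt((x2 - x1) ** 2 + (y2 - y1) ** 2)

# Hypothetical points forming a 3-4-5 right triangle.
d = euclidean((0, 0), (3, 4))
print(d)  # 5.0
```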
Example data set (four objects with two coordinates each):
Object        x   y
Medicine A    1   1
Medicine B    2   1
Medicine C    4   3
Medicine D    5   4
Step 1:
Initial value of centroids: Suppose we use medicine A and medicine B as the first centroids.
Let c1 and c2 denote the coordinates of the centroids; then c1 = (1,1) and c2 = (2,1).
Objects-centroids distance: we calculate the distance from each cluster centroid to each object.
Let us use the Manhattan (city-block) distance ρ(a, b) = |x2 – x1| + |y2 – y1|, where a = (x1, y1) and b = (x2, y2).
The distance matrix at iteration 0 is:
Object   ρ to c1   ρ to c2   Assigned cluster
(1,1)    0         1         Cluster-1
(2,1)    1         0         Cluster-2
(4,3)    5         4         Cluster-2
(5,4)    7         6         Cluster-2
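The distance matrix above can be reproduced with a short Python sketch. It uses the distance ρ and the initial centroids c1 = (1,1) and c2 = (2,1) from the text; the variable and function names are illustrative, and breaking ties in favour of the lower-numbered cluster is an assumption (no ties occur in this data).

```python
def manhattan(a, b):
    """rho(a, b) = |x2 - x1| + |y2 - y1| for 2-D points a and b."""
    return abs(b[0] - a[0]) + abs(b[1] - a[1])

objects = [(1, 1), (2, 1), (4, 3), (5, 4)]
centroids = [(1, 1), (2, 1)]  # c1, c2

rows = []
for obj in objects:
    dists = [manhattan(obj, c) for c in centroids]
    # Assign the object to the nearest centroid (1-based cluster number).
    cluster = dists.index(min(dists)) + 1
    rows.append((obj, dists, cluster))
    print(obj, dists, f"Cluster-{cluster}")
```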
In Iteration 2, we repeat the process from Iteration 1, this time using the new means (centroids) we computed.
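The repeat-until-stable loop can be sketched in Python as below. It follows the worked example's choices (the distance ρ, the four objects, and medicine A and B as starting centroids). The function names, the stopping rule (stop when assignments no longer change), and the assumption that no cluster ever becomes empty are illustrative additions.

```python
def manhattan(a, b):
    """rho(a, b) = |x2 - x1| + |y2 - y1| for 2-D points a and b."""
    return abs(b[0] - a[0]) + abs(b[1] - a[1])

def assign(objects, centroids):
    """Index of the nearest centroid for each object (ties go to the lower index)."""
    return [
        min(range(len(centroids)), key=lambda k: manhattan(obj, centroids[k]))
        for obj in objects
    ]

def update(objects, labels, k):
    """New centroid of each cluster = mean of its members (assumes no empty cluster)."""
    new_centroids = []
    for j in range(k):
        members = [o for o, lab in zip(objects, labels) if lab == j]
        mean_x = sum(x for x, _ in members) / len(members)
        mean_y = sum(y for _, y in members) / len(members)
        new_centroids.append((mean_x, mean_y))
    return new_centroids

objects = [(1, 1), (2, 1), (4, 3), (5, 4)]
centroids = [(1.0, 1.0), (2.0, 1.0)]  # medicine A and B as initial centroids

labels = None
while True:
    new_labels = assign(objects, centroids)
    if new_labels == labels:  # assignments unchanged: stop
        break
    labels = new_labels
    centroids = update(objects, labels, len(centroids))

print(labels)     # [0, 0, 1, 1]
print(centroids)  # [(1.5, 1.0), (4.5, 3.5)]
```

On this data the loop settles with medicines A and B in one cluster and medicines C and D in the other.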