Clustering

1. Hierarchical clustering begins by treating each observation as a separate cluster and then combines the nearest pairs of clusters step-by-step until all observations are in one cluster.
2. K-means clustering places k points (cluster centroids) in the data space and assigns each observation to the nearest centroid, then recalculates the centroid positions and reassigns observations in an iterative process until convergence is reached.
3. There are various techniques for choosing the optimal number of clusters k in k-means clustering, including the elbow method, information criteria, and cross-validation.


1.
______ of a set of points is defined using a distance measure.
Similarity

2.
Each point is a cluster in itself. We then combine the two nearest clusters into
one. What type of clustering does this represent?
Agglomerative

3.
Unsupervised learning focuses on understanding the data and its underlying patterns.
True

4.
Members of the same cluster are far away / distant from each other.
False

5.
The ______ is a visual representation of how the data points are merged to form
clusters.
Dendrogram

6.
A centroid is a valid point in a non-Euclidean space.
False

7.
______ measures the goodness of a cluster.
Cohesion

8.
The ______ of two points is the average of the two points in Euclidean space.
Centroid

9.
Sampling is one technique to pick the initial k points in K-Means clustering.
True

10.
The K-Means algorithm assumes Euclidean space/distance.
True

11.
______ is when points don't move between clusters and centroids stabilize.
Convergence

12.
What is the overall complexity of the Agglomerative Hierarchical Clustering algorithm?
O(N²)

13.
The number of rounds for convergence in K-Means clustering can be large.
True

14.
The ______ is the data point that is closest to the other points in the cluster.
Clusteroid

15.
What is the R function to divide a dataset into k clusters?
kmeans()

16.
___________ is a way of finding the k value for K-Means clustering.
Cross-validation

Distance Measure
Distance measure is a very important aspect of clustering. Knowing how close or how
far apart each data point is with respect to the others helps in grouping them.

Jaccard Distance

The Jaccard index is used to compare elements of two sets to identify which of the
members are shared and not shared.
The Jaccard Distance is a measure of how different the two given sets are.

Jaccard Distance = 1-(Jaccard Index)
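
As a quick illustration (a minimal sketch, not part of the original material; the two
sets are made-up examples), the Jaccard index and distance can be computed in base R:

a <- c("apple", "banana", "cherry")
b <- c("banana", "cherry", "durian", "elderberry")

jaccard_index    <- length(intersect(a, b)) / length(union(a, b))   # shared / total members
jaccard_distance <- 1 - jaccard_index

jaccard_index      # 2 / 5 = 0.4
jaccard_distance   # 0.6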

Euclidean Distance

Euclidean distance is the shortest (straight-line) distance between two given points
in Euclidean space.
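
A minimal sketch (the two points are made-up examples): base R's dist() computes the
Euclidean distance by default.

p <- rbind(c(1, 2), c(4, 6))           # two example points
dist(p, method = "euclidean")          # sqrt((4-1)^2 + (6-2)^2) = 5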

Cosine Distance

The cosine similarity of two given vectors u and v is the cosine of the angle between
them; the cosine distance is commonly defined as 1 minus this similarity.
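
A minimal sketch (the vectors are made-up examples) of the cosine distance in base R:

u <- c(1, 2, 3)
v <- c(2, 4, 6)
cos_sim  <- sum(u * v) / (sqrt(sum(u^2)) * sqrt(sum(v^2)))   # cosine of the angle
cos_dist <- 1 - cos_sim                                      # 0 here: u and v point the same way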

Manhattan Distance

Manhattan distance is calculated along strictly horizontal and vertical paths, i.e.
the sum of the absolute differences of the coordinates (as on a city-block grid).
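
A minimal sketch, reusing the same made-up example points as above:

p <- rbind(c(1, 2), c(4, 6))
dist(p, method = "manhattan")          # |4-1| + |6-2| = 7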

Hierarchical Clustering Explained

Begin by allotting each item to its own cluster. If you have N items, you start with
N clusters, each containing one item. The similarities (distances) between the
clusters are then simply the similarities (distances) between the items they contain.

Find the closest (most similar) pair of clusters and merge them into a single
cluster, reducing the number of clusters by one.

Calculate the similarities (distances) between each of the old clusters and the new
cluster.

Repeat steps 2 and 3 until all items are finally grouped into a single cluster of
size N.
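
A minimal sketch (using the built-in iris data, as in the code later in this material;
complete linkage is an assumed choice): R's hclust() carries out exactly these steps,
and its $merge and $height components record which clusters were combined at each
step and at what distance.

d  <- dist(iris[, 3:4])                # step 1: pairwise distances between items
hc <- hclust(d, method = "complete")   # steps 2-4: repeatedly merge the closest clusters

head(hc$merge)                         # which items/clusters were merged at each step
head(hc$height)                        # the distance at which each merge happened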

Dendrogram

A dendrogram is a branching diagram that represents the relationships of similarity
among a group of entities.
Each branch is called a clade.
The terminal end of each clade is called a leaf.
There is no limit to the number of leaves in a clade.
The arrangement of the clades tells us which leaves are most similar to each other.
The height of the branch points indicates how similar or different they are from
each other: the greater the height, the greater the difference between the points.
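
A minimal sketch (iris data assumed, and k = 3 is an assumed cut): plotting a
dendrogram and highlighting the clusters a cut would produce.

hc <- hclust(dist(iris[, 3:4]))
plot(hc)                               # the dendrogram: clades, leaves, and merge heights
rect.hclust(hc, k = 3, border = "red") # boxes around the 3 clusters a cut would give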

Disadvantages of Agglomerative Clustering


If data points are wrongly grouped at the inception, they cannot be reallocated.
If different similarity measures are used to calculate the similarity between
clusters, the results may differ altogether.
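
To illustrate the second point, here is a minimal sketch (iris data and k = 3 assumed;
complete and single linkage are chosen only for the comparison) showing how different
similarity measures between clusters can yield different groupings:

d <- dist(iris[, 3:4])
table(cutree(hclust(d, method = "complete"), k = 3),
      cutree(hclust(d, method = "single"),   k = 3))   # cross-tabulate the two groupings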

Tips for Hierarchical Clustering


There is no particular one-size-fits-all rule to determine how many clusters you
need; it depends on what you intend to do with them.
For a better solution, look at the basic characteristics of the clusters at
successive steps and make a decision when you have a solution
that can be interpreted.

Hierarchical Clustering - Standardization


Standardizing the variables is a good practice to follow while clustering data, so
that variables measured on larger scales do not dominate the distance calculations.
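
A minimal sketch (not prescribed by the original material; it reuses the same iris
columns as the later code): base R's scale() centres each variable and divides it by
its standard deviation before the distances are computed.

iris_std <- scale(iris[, 3:4])         # centre and scale each variable
clusters <- hclust(dist(iris_std))     # cluster on the standardized variables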

Summary on Hierarchical Clustering


In this module, you have learnt about hierarchical clustering in detail. You have also
learnt how to read a dendrogram and some tips to follow when fitting hierarchical
clustering to a dataset.

K-Means Algorithm Simplified


Place k points in the space represented by the objects that are being clustered.
These points represent the initial group centroids.
Assign each object to the group that has the closest centroid.
When all objects have been assigned, recalculate the positions of the k centroids.
Repeat Steps 2 and 3 until the centroids no longer move (see the sketch below).
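
A minimal sketch of this loop (simple_kmeans is a hypothetical toy implementation, not
the kmeans() function used later; Euclidean distance and a numeric matrix x are
assumed):

simple_kmeans <- function(x, k, max_iter = 100) {
  # step 1: pick k of the observations as the initial centroids
  centroids <- x[sample(nrow(x), k), , drop = FALSE]
  for (i in seq_len(max_iter)) {
    # step 2: assign each object to the group with the closest centroid
    cl <- apply(x, 1, function(p) which.min(colSums((t(centroids) - p)^2)))
    # step 3: recalculate each centroid as the mean of its assigned points
    # (empty clusters are not handled in this toy sketch)
    new_centroids <- t(sapply(seq_len(k), function(j)
      colMeans(x[cl == j, , drop = FALSE])))
    # stop once the centroids no longer move
    if (all(abs(new_centroids - centroids) < 1e-8)) break
    centroids <- new_centroids
  }
  list(cluster = cl, centers = centroids)
}

fit <- simple_kmeans(as.matrix(iris[, 3:4]), k = 3)
table(fit$cluster, iris$Species)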

Tips for K Means Clustering


For large datasets, random sampling can be used to determine the k value for
clustering.
Hierarchical clustering can also be used for the same purpose (see the sketch below).
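
A minimal sketch of the second tip (the sample size and the cut height are made-up
choices): run hierarchical clustering on a random sample and inspect the dendrogram
to suggest a value of k.

set.seed(42)
idx    <- sample(nrow(iris), 50)               # random sample of the data
hc     <- hclust(dist(iris[idx, 3:4]))
plot(hc)                                       # look for a natural number of branches
k_hint <- length(unique(cutree(hc, h = 2)))    # clusters obtained by cutting at height 2
k_hint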

Choosing Right K-value


Other ways to choose the right k value:

By rule of thumb
Elbow method (see the sketch after this list)
Information Criterion Approach
An Information Theoretic Approach
Choosing k using the Silhouette
Cross-validation
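
A minimal sketch of the elbow method (the range of k values and nstart are
assumptions): plot the total within-cluster sum of squares against k and look for
the bend ("elbow") where the curve flattens.

wss <- sapply(1:10, function(k)
  kmeans(iris[, 3:4], centers = k, nstart = 20)$tot.withinss)
plot(1:10, wss, type = "b",
     xlab = "Number of clusters k",
     ylab = "Total within-cluster sum of squares")
# the k at the elbow of this curve is a reasonable choice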

Code Snippet
K-Means Clustering in R

Loading and exploring the dataset

library(datasets)
head(iris)
Visualizing the data

library(ggplot2)
ggplot(iris, aes(Petal.Length, Petal.Width, color = Species)) + geom_point()
Setting the seed and creating the cluster

set.seed(20)
irisCluster <- kmeans(iris[, 3:4], 3, nstart = 20)
irisCluster
Comparing the clusters with the species
table(irisCluster$cluster, iris$Species)
Plotting the dataset to view the clusters

irisCluster$cluster <- as.factor(irisCluster$cluster)


ggplot(iris, aes(Petal.Length, Petal.Width, color = irisCluster$cluster)) + geom_point()

Code Snippet
Hierarchical Clustering in R

Loading and exploring the dataset

library(datasets)
head(iris)
Visualizing the data

library(ggplot2)
ggplot(iris, aes(Petal.Length, Petal.Width, color = Species)) + geom_point()
Calculating the distance and plotting the dendrogram

clusters <- hclust(dist(iris[, 3:4]))


plot(clusters)
Cutting the desired number of clusters and comparing it with the data

clusterCut <- cutree(clusters, 3)

table(clusterCut, iris$Species)
Visualizing the clusters

ggplot(iris, aes(Petal.Length, Petal.Width, color = Species)) +
  geom_point(alpha = 0.4, size = 3.5) + geom_point(col = clusterCut) +
  scale_color_manual(values = c('black', 'red', 'green'))

Model Validation Tips and Tricks


Clustering is an unsupervised learning technique. Here are a few tips to validate the
model.
