
K-Means Clustering


K-means clustering is an unsupervised machine learning algorithm used for
partitioning a dataset into a set of clusters. The goal of K-means clustering is to
group similar data points together and discover underlying patterns in the data.

Suppose you want to cluster data points that lie on a straight line into 3 groups,
so K is 3 here.
Step 1: Decide how many clusters you want to identify.

Step 2: Select the K initial clusters (there are various ways to do it). A simple
option is to pick K of the data points at random.

Step 3: Measure the distance between the 1st point and the three initial clusters.
Step 4: Assign the 1st point to the nearest cluster. Now we do the same thing for
all the points: measure the distances and assign each point to its nearest cluster.

Step 5: Calculate the mean (the average) of each cluster.


We then repeat what we did earlier using the distance formula, but now measuring
from the cluster means instead of the initial points. That is, the dataset is
divided into K clusters, each with a mean value, and every data point is placed
into the cluster whose mean it is closest to.
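
The assign-then-average pass described above can be sketched in a few lines of Python. This is only an illustration: the nine data values, the three starting centers, and the helper names assign and update are all made up for the example and do not come from the original notes.

```python
# Hypothetical 1-D data (points on a straight line) and K = 3 starting centers.
points = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0, 20.0, 21.0, 22.0]
centers = [2.0, 11.0, 12.0]  # assumed: three data points picked as initial clusters


def assign(points, centers):
    """Return, for each point, the index of the nearest center."""
    return [min(range(len(centers)), key=lambda k: abs(p - centers[k]))
            for p in points]


def update(points, labels, centers):
    """Return the mean of each cluster; an empty cluster keeps its old center."""
    new_centers = []
    for k, old in enumerate(centers):
        members = [p for p, lab in zip(points, labels) if lab == k]
        new_centers.append(sum(members) / len(members) if members else old)
    return new_centers


labels = assign(points, centers)           # measure distances, pick nearest cluster
centers = update(points, labels, centers)  # replace each center with its cluster mean
relabels = assign(points, centers)         # re-cluster using the means
```

With these made-up values, the point 12.0 starts in the third cluster but moves to the second one after the means are computed, which is the kind of reassignment the repeated pass is meant to produce.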

The resulting clustering is very different from what we assumed initially.


We can assess the quality of the clustering by adding up the variation within each
cluster. Since K-means clustering can't see the best clustering, its only option is
to keep track of these clusters and their total variation, and then do the whole
thing again with different starting points.
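
As a rough sketch of "adding up the variation", the function below sums the squared distances from each point to its cluster mean. The notes do not say which formula is meant, so the sum of squares is an assumption here (it is one common choice), and the function name, data, and label lists are hypothetical.

```python
def total_variation(points, labels, k):
    """Sum over clusters of squared distances from each point to its cluster mean.

    Assumption: 'variation' is taken to be the within-cluster sum of squares.
    """
    total = 0.0
    for cluster in range(k):
        members = [p for p, lab in zip(points, labels) if lab == cluster]
        if not members:
            continue  # an empty cluster contributes nothing
        mean = sum(members) / len(members)
        total += sum((p - mean) ** 2 for p in members)
    return total


# Hypothetical 1-D data with two candidate clusterings into K = 3 groups.
points = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0, 20.0, 21.0, 22.0]
tight = [0, 0, 0, 1, 1, 1, 2, 2, 2]
loose = [0, 0, 0, 0, 0, 0, 1, 1, 2]

tight_score = total_variation(points, tight, 3)
loose_score = total_variation(points, loose, 3)
```

A lower total means the points sit closer to their cluster means, so the first labeling scores better than the second.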
So it starts again from the beginning: it randomly selects 3 points, calculates
the distances, and clusters the data. It then calculates the mean of each cluster,
re-clusters based on the new means, and repeats these steps until the clusters no
longer change. After several such runs, the clustering with the smallest total
variation is kept.
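
The restart procedure can be sketched as below. This is a minimal illustration for 1-D data, not a reference implementation: the function names, the number of restarts, the iteration cap, and the seed are all assumptions, and the total variation is taken to be the sum of squared distances to the cluster means.

```python
import random


def kmeans_1d(points, k, rng, max_iter=100):
    """One run: random starting points, then assign/update until labels stop changing."""
    centers = rng.sample(points, k)  # pick k data points as the initial clusters
    labels = None
    for _ in range(max_iter):  # cap is a safety net; runs normally stop earlier
        new_labels = [min(range(k), key=lambda j: abs(p - centers[j]))
                      for p in points]
        if new_labels == labels:
            break  # clusters no longer change
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous center
                centers[j] = sum(members) / len(members)
    return centers, labels


def best_of(points, k, restarts=10, seed=0):
    """Repeat the whole run several times and keep the lowest total variation."""
    rng = random.Random(seed)
    best = None
    for _ in range(restarts):
        centers, labels = kmeans_1d(points, k, rng)
        score = sum((p - centers[lab]) ** 2 for p, lab in zip(points, labels))
        if best is None or score < best[0]:
            best = (score, centers, labels)
    return best


# Hypothetical data: three obvious groups on a line.
points = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0, 20.0, 21.0, 22.0]
score, centers, labels = best_of(points, k=3, restarts=50)
```

Fixing the seed only makes the sketch repeatable; in practice the starting points would differ from run to run.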

How to pick K?
How K-means Clustering Works:

1. Initialization:
- Choose the number of clusters (K) you want to partition the data into.
- Randomly initialize K centroids (cluster centers) in the feature space.

2. Assign Data Points to Clusters:
- For each data point, calculate the distance to each centroid.
- Assign the data point to the cluster whose centroid is closest (usually using
Euclidean distance).

3. Update Cluster Centroids:
- Recalculate the centroid of each cluster by taking the mean of all data points
assigned to that cluster.

4. Repeat:
- Repeat steps 2 and 3 until the centroids no longer change significantly or until a
specified number of iterations is reached.
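
Steps 1 to 4 above can be sketched in plain Python for points with any number of features. Treat this as an illustration under stated assumptions: the function name kmeans and the parameters max_iter, tol, and seed are made up for the example, and the centroids are initialized by picking K data points at random, which is only one of several possible initialization schemes.

```python
import math
import random


def kmeans(points, k, max_iter=100, tol=1e-6, seed=0):
    """Minimal K-means sketch using Euclidean distance (math.dist)."""
    rng = random.Random(seed)
    # Step 1: initialize K centroids (assumption: K randomly chosen data points).
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = []
    for _ in range(max_iter):
        # Step 2: assign each point to the cluster with the closest centroid.
        labels = [min(range(k), key=lambda j: math.dist(p, centroids[j]))
                  for p in points]
        # Step 3: recompute each centroid as the mean of its assigned points.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append([sum(c) / len(members) for c in zip(*members)])
            else:
                new_centroids.append(centroids[j])  # empty cluster: keep old centroid
        # Step 4: stop once no centroid moved by more than tol.
        shift = max(math.dist(a, b) for a, b in zip(centroids, new_centroids))
        centroids = new_centroids
        if shift < tol:
            break
    return centroids, labels


# Hypothetical 2-D data: two well-separated groups.
data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, labels = kmeans(data, k=2)
```

The loop ends either when the centroids stop moving (within tol) or when max_iter passes have been made, matching the two stopping conditions in step 4.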

Real-life Applications:

1. Customer Segmentation:
- Segment customers based on their purchasing behavior, demographics, or
website interactions. This helps businesses target specific customer groups with
personalized marketing strategies.

2. Image Compression:
- Cluster similar pixels together in an image to reduce redundancy and compress
the image size without significant loss of information.

3. Anomaly Detection:
- Identify outliers or anomalies in large datasets by clustering normal data points
together. Any data points that do not fit well into any cluster may be considered
anomalies.

4. Document Clustering:
- Group similar documents together based on their content for tasks such as topic
modeling, information retrieval, and recommendation systems.

5. Genetic Clustering:
- Cluster genes based on their expression patterns to identify groups of genes that
are co-regulated or functionally related, aiding in biological research.

6. Market Segmentation:
- Divide markets into segments based on geographical, demographic, or
behavioral characteristics of consumers, helping businesses tailor their products
and services to different market segments.

7. Social Network Analysis:
- Identify communities or groups of individuals with similar interests or
interactions in social networks, enabling targeted advertising or content
recommendations.
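
As one concrete illustration of the applications above, the anomaly detection idea (item 3) can be sketched as follows: points that lie far from every centroid are flagged. The centroids are assumed to come from an earlier K-means run, and the function name, sample values, and the distance threshold are hypothetical choices for this example only.

```python
import math


def flag_anomalies(points, centroids, threshold):
    """Return the points whose distance to the nearest centroid exceeds threshold."""
    flagged = []
    for p in points:
        nearest = min(math.dist(p, c) for c in centroids)
        if nearest > threshold:
            flagged.append(p)
    return flagged


# Assumed centroids from a previous clustering, plus a few new observations.
centroids = [(0.0, 0.0), (10.0, 10.0)]
observations = [(0.2, -0.1), (9.8, 10.3), (5.0, 5.0), (0.5, 0.4)]
suspects = flag_anomalies(observations, centroids, threshold=2.0)
```

Here only (5.0, 5.0) is flagged, because it sits roughly halfway between the two centroids. Choosing the threshold is a judgment call that depends on the data.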

Overall, K-means clustering is a versatile and widely used algorithm with
applications across various domains where data needs to be grouped into clusters
based on similarity.
