CSF 344

Machine Learning

Faculty & Course Coordinator:


Dr. Anushikha Singh
SYLLABUS: Unit 2: Regression & Clustering

 Regression: Linear Regression, Ridge Regression, Sensitivity Analysis, Multivariate Regression.
 Clustering: Distance measures, different clustering methods (distance, density, hierarchical), iterative distance-based clustering, dealing with continuous and categorical values in K-Means, constructing a hierarchical cluster, K-Medoids, K-Mode and density-based clustering, measures of quality of clustering, Hidden Markov Model.
Regression
Regression analysis is a statistical method for modelling the relationship between a dependent (target) variable and one or more independent (predictor) variables.
Regression analysis helps us to understand how the value of the dependent variable changes with respect to one independent variable while the other independent variables are held fixed. It predicts continuous/real values such as temperature, age, salary, price, etc.
• Example: Suppose there is a marketing company A, which runs various advertisements every year and gets sales in return. The list below shows the advertisements made by the company in the last 5 years and the corresponding sales:

Now the company wants to run a $200 advertisement in the year 2019 and wants to know the predicted sales for this year. To solve such prediction problems in machine learning, we need regression analysis.
Types of Regression

• Linear Regression
• Logistic Regression
• Polynomial Regression
• Support Vector Regression
• Decision Tree Regression
• Random Forest Regression
• Ridge Regression
• Lasso Regression
Linear Regression in Machine Learning

• Linear regression is one of the easiest and most popular Machine Learning algorithms. It is a statistical method that is used for predictive analysis. Linear regression makes predictions for continuous/real or numeric variables such as sales, salary, age, product price, etc.
• The linear regression algorithm shows a linear relationship between a dependent (y) variable and one or more independent (x) variables, hence the name linear regression. Since linear regression shows a linear relationship, it finds how the value of the dependent variable changes according to the value of the independent variable.
The linear regression model provides a sloped straight line representing the relationship between the variables. Mathematically, we can represent a linear regression as:

y = a0 + a1x + ε

Here,
y = dependent variable (target variable)
x = independent variable (predictor variable)
a0 = intercept of the line (gives an additional degree of freedom)
a1 = linear regression coefficient (scale factor applied to each input value)
ε = random error

The values of the x and y variables are the training dataset for the linear regression model representation.
COST FUNCTION

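A standard choice, consistent with the model y = a0 + a1x + ε above, is the Mean Squared Error (MSE) cost:

$$J(a_0, a_1) = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - (a_0 + a_1 x_i)\right)^2$$

Minimizing J with respect to a0 and a1 gives the best-fit line.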
GRADIENT DESCENT

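As an illustration, here is a minimal NumPy sketch of batch gradient descent for the simple linear model above; the learning rate, iteration count, and data are arbitrary demonstration values:

```python
import numpy as np

def gradient_descent(x, y, lr=0.01, n_iters=1000):
    # Fit y = a0 + a1*x by batch gradient descent on the MSE cost
    a0, a1 = 0.0, 0.0
    n = len(x)
    for _ in range(n_iters):
        error = (a0 + a1 * x) - y
        # Partial derivatives of the MSE cost with respect to a0 and a1
        grad_a0 = (2.0 / n) * error.sum()
        grad_a1 = (2.0 / n) * (error * x).sum()
        a0 -= lr * grad_a0
        a1 -= lr * grad_a1
    return a0, a1

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([3.1, 4.9, 7.2, 9.1, 10.8])   # roughly y = 1 + 2x
a0, a1 = gradient_descent(x, y)
print(f"intercept={a0:.2f}, slope={a1:.2f}")
```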
Multiple Linear Regression

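In multiple linear regression, the model extends to several predictors. A standard form is:

$$y = b_0 + b_1 x_1 + b_2 x_2 + \dots + b_n x_n + \varepsilon$$

where x1, …, xn are the independent variables, b0 is the intercept, b1, …, bn are the regression coefficients, and ε is the random error.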
How to perform Multiple
Linear Regression

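A minimal sketch of how multiple linear regression can be performed in Python with scikit-learn; the two features and all numbers are hypothetical:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical data: two features (e.g., ad spend and store count) vs. sales
X = np.array([[90, 10], [120, 12], [150, 15], [100, 11], [130, 14]])
y = np.array([1000, 1300, 1800, 1200, 1380])

model = LinearRegression().fit(X, y)

print("intercept:", model.intercept_)
print("coefficients:", model.coef_)
print("prediction for [200, 18]:", model.predict([[200, 18]]))
```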
Types of Linear Regression

• Linear regression can be further divided into two types of algorithm:
• Simple Linear Regression:
If a single independent variable is used to predict the value
of a numerical dependent variable, then such a Linear
Regression algorithm is called Simple Linear Regression.
• Multiple Linear regression:
If more than one independent variable is used to predict the
value of a numerical dependent variable, then such a Linear
Regression algorithm is called Multiple Linear Regression.
Multivariate Regression
• Multivariate Regression is a supervised machine learning algorithm
involving multiple data variables for analysis. Multivariate
regression is an extension of multiple regression with one dependent
variable and multiple independent variables. Based on the number of
independent variables, we try to predict the output.
• Praneeta wants to estimate the price of a house. She will collect details such as the location of the house, the number of bedrooms, the size in square feet, and whether amenities are available or not. Based on these details, the price of the house can be predicted, along with how the variables are interrelated.
• An agricultural scientist wants to predict the total crop yield expected for the summer. He collects details of the expected amount of rainfall, the fertilizers to be used, and the soil conditions. By building a multivariate regression model, the scientist can predict the crop yield. Along with the crop yield, the scientist also tries to understand the relationships among the variables.
Multivariate Regression

Mathematical equation
The simple linear regression model represents a straight line, meaning y is a function of x. When we have an extra dimension (z), the straight line becomes a plane. Here, the plane is the function that expresses y as a function of x and z. The linear regression equation can now be expressed as:

y = m1·x + m2·z + c

y is the dependent variable, that is, the variable that needs to be predicted.
x is the first independent variable (the first input).
m1 is the slope of x. It lets us know the angle of the line with respect to x.
z is the second independent variable (the second input).
m2 is the slope of z. It lets us know the angle of the line with respect to z.
c is the intercept: a constant that gives the value of y when x and z are 0.
Regularization
• Regularization is one of the most important concepts of machine
learning. It is a technique to prevent the model from overfitting by
adding extra information to it.
• Sometimes a machine learning model performs well on the training data but not on the test data. This means the model cannot predict the output when dealing with unseen data, because noise has been introduced into the output; such a model is called overfitted.
• This technique can be applied in such a way that all variables or features are kept in the model while the magnitudes of their coefficients are reduced. Hence, it maintains accuracy as well as the generalization of the model. The regularized linear model has the form:

y = β0 + β1x1 + β2x2 + β3x3 + ⋯ + βnxn + b
Bias
• A machine learning model analyses the data, finds patterns in it, and makes predictions. While training, the model learns these patterns in the dataset and applies them to the test data for prediction. While making predictions, a difference occurs between the values predicted by the model and the actual/expected values; this difference is known as bias error, or error due to bias.
A model has either:
• Low Bias: A low bias model will make fewer assumptions about
the form of the target function.
• High Bias: A model with a high bias makes more assumptions,
and the model becomes unable to capture the important features
of our dataset. A high bias model also cannot perform well on
new data.
Variance

BIAS-VARIANCE TRADEOFF

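As a standard summary of the tradeoff, the expected prediction error of a model decomposes into three parts:

$$\text{Total Error} = \text{Bias}^2 + \text{Variance} + \text{Irreducible Error}$$

Reducing bias (a more flexible model) typically increases variance, and vice versa; the goal is to balance the two.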
Performance Metrics

R Square/Adjusted R Square

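As standard definitions: R² measures the proportion of variance in the target explained by the model, and Adjusted R² corrects R² for the number of predictors p given n samples:

$$R^2 = 1 - \frac{\sum_{i}(y_i - \hat{y}_i)^2}{\sum_{i}(y_i - \bar{y})^2}, \qquad R^2_{\text{adj}} = 1 - \frac{(1 - R^2)(n - 1)}{n - p - 1}$$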
Mean Absolute Error (MAE)

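The standard definition: MAE is the average absolute difference between the predicted and actual values:

$$\text{MAE} = \frac{1}{n}\sum_{i=1}^{n} |y_i - \hat{y}_i|$$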
Mean Square Error (MSE) / Root Mean Square Error (RMSE)

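Standard definitions: MSE averages the squared errors, and RMSE is its square root, which brings the error back to the original units of the target:

$$\text{MSE} = \frac{1}{n}\sum_{i=1}^{n} (y_i - \hat{y}_i)^2, \qquad \text{RMSE} = \sqrt{\text{MSE}}$$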
Techniques of Regularization
There are mainly two types of regularization techniques, which are given
below:
• Ridge Regression
• Lasso Regression
Ridge Regression
• Ridge regression is one of the types of linear regression in which a small
amount of bias is introduced so that we can get better long-term
predictions.
• Ridge regression is a regularization technique which is used to reduce the complexity of the model. It is also called L2 regularization.
• In this technique, the cost function is altered by adding a penalty term to it. The amount of bias added to the model is called the Ridge regression penalty. It is calculated by multiplying lambda by the squared weight of each individual feature.
• A general linear or polynomial regression will fail if there is high collinearity between the independent variables; to solve such problems, Ridge regression can be used.
• It also helps to solve problems where we have more parameters than samples.
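Putting this together, a standard form of the Ridge (L2) cost function, consistent with the description above, is:

$$J(\beta) = \sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2 + \lambda \sum_{j=1}^{p} \beta_j^2$$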

Lasso Regression:
• Lasso regression is another regularization technique to reduce the complexity of the model.
• It is similar to Ridge regression, except that the penalty term contains the absolute values of the weights instead of their squares.
• Since it takes absolute values, it can shrink a slope all the way to 0, whereas Ridge regression can only shrink it close to 0.
• It is also called L1 regularization. The equation for Lasso regression is given below.
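A standard form of the Lasso (L1) cost function is:

$$J(\beta) = \sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2 + \lambda \sum_{j=1}^{p} |\beta_j|$$

As a minimal scikit-learn sketch comparing the two penalties (alpha plays the role of λ here; the data and alpha values are arbitrary illustrations):

```python
import numpy as np
from sklearn.linear_model import Ridge, Lasso

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5))
# Only the first two features actually matter in this toy data
y = 3 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.1, size=50)

ridge = Ridge(alpha=1.0).fit(X, y)
lasso = Lasso(alpha=0.1).fit(X, y)

# Lasso tends to drive irrelevant coefficients exactly to 0;
# Ridge only shrinks them towards 0.
print("Ridge coefficients:", ridge.coef_.round(3))
print("Lasso coefficients:", lasso.coef_.round(3))
```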
Sensitivity Analysis
• Sensitivity analysis determines how different values of an
independent variable affect a particular dependent variable under a
given set of assumptions. In other words, sensitivity analyses study
how various sources of uncertainty in a mathematical model
contribute to the model's overall uncertainty. This technique is used
within specific boundaries that depend on one or more input
variables.
• Sensitivity analysis is a method to explore the impact of feature changes on the model. In this method, we change one feature while keeping the others constant, and check the impact on the model output. The main goal of sensitivity analysis is to observe the effects of feature changes on the optimal solutions of the model. It can provide additional insights or information about the optimal solutions of a model.
We can perform sensitivity analysis in 3 ways:

• A change in the value of the objective function coefficients.
• A change in the right-hand-side value of a constraint.
• A change in a coefficient of a constraint.
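As a simple illustration of the change-one-feature idea described above, here is a minimal sketch (model and data are hypothetical): perturb one input feature of a fitted model while holding the others fixed, and observe the change in the output.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 3))
y = 2 * X[:, 0] - 1 * X[:, 1] + 0.1 * rng.normal(size=100)
model = LinearRegression().fit(X, y)

baseline = X.mean(axis=0)          # hold every feature at its mean value
for j in range(X.shape[1]):
    perturbed = baseline.copy()
    perturbed[j] += 1.0            # change one feature by one unit
    delta = model.predict([perturbed])[0] - model.predict([baseline])[0]
    print(f"feature {j}: output changes by {delta:+.3f} per unit increase")
```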
Clustering
Clustering in Machine Learning

• A way of grouping the data points into different clusters consisting of similar data points. Objects with possible similarities remain in a group that has little or no similarity with another group.

• It does this by finding similar patterns in the unlabelled dataset, such as shape, size, colour, or behaviour, and divides the data points according to the presence and absence of those patterns.

• It is an unsupervised learning method, hence no supervision is provided to the algorithm, and it deals with an unlabelled dataset.
Clustering in Machine Learning

Example: Let's understand the clustering technique with a real-world example of a mall. When we visit any shopping mall, we can observe that things with similar usage are grouped together: t-shirts are grouped in one section and trousers in another; similarly, in the vegetable section, apples, bananas, mangoes, etc., are grouped separately, so that we can easily find things.

Applications:
• Market Segmentation
• Statistical data analysis
• Social network analysis
• Image segmentation
• Anomaly detection, etc.
Types of Clustering Methods
The clustering methods are broadly divided into hard clustering (a data point belongs to only one group) and soft clustering (a data point can also belong to another group). Various other approaches to clustering exist as well. Below are the main clustering methods used in machine learning:

• Partitioning Clustering
• Density-Based Clustering
• Distribution Model-Based Clustering
• Hierarchical Clustering
• Fuzzy Clustering
Partitioning Clustering
• It is a type of clustering that divides the data into non-hierarchical groups. It is also known as the centroid-based method. The most common example of partitioning clustering is the K-Means clustering algorithm.

• In this type, the dataset is divided into a set of k groups, where K defines the number of pre-defined groups. The cluster centres are created in such a way that the distance between data points and their own cluster centroid is minimal compared to the distance to other cluster centroids.
K-Means Clustering
• K-Means clustering is an unsupervised learning algorithm that is used to solve clustering problems in machine learning or data science.

• It groups the unlabelled dataset into different clusters. Here K defines the number of pre-defined clusters that need to be created in the process: if K=2, there will be two clusters; for K=3, there will be three clusters; and so on.

• It is an iterative algorithm that divides the unlabelled dataset into k different clusters in such a way that each data point belongs to only one group with similar properties.
K-Means Clustering
• It allows us to cluster the data into different groups and is a convenient way to discover the categories of groups in an unlabelled dataset on its own, without the need for any training.

• It is a centroid-based algorithm, where each cluster is associated with a centroid. The main aim of this algorithm is to minimize the sum of distances between the data points and their corresponding cluster centroids.

• The algorithm takes the unlabelled dataset as input, divides the dataset into k clusters, and repeats the process until it cannot find better clusters. The value of k should be predetermined in this algorithm.
Steps involved in K-Means
Clustering
• The first step when using k-means clustering is to indicate the
number of clusters (k) that will be generated in the final solution.

• The algorithm starts by randomly selecting k objects from the


data set to serve as the initial centres for the clusters. The
selected objects are also known as cluster means or centroids.

• Next, each of the remaining objects is assigned to its closest centroid, where closest is defined using the Euclidean distance between the object and the cluster mean. This step is called the "cluster assignment step".
Steps involved in K-Means
Clustering
• After the assignment step, the algorithm computes the new mean value of each cluster. The term "centroid update" is used to describe this step. Now that the centres have been recalculated, every observation is checked again to see whether it might be closer to a different cluster. All the objects are reassigned using the updated cluster means.

• The cluster assignment and centroid update steps are iteratively


repeated until the cluster assignments stop changing (i.e
until convergence is achieved). That is, the clusters formed in
the current iteration are the same as those obtained in the
previous iteration.
• K-Means clustering solved example: see the sketch below.
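A minimal scikit-learn sketch of the steps above on made-up 2-D data (the three groups and all values are illustrative):

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical 2-D points forming three loose groups
rng = np.random.default_rng(42)
X = np.vstack([
    rng.normal(loc=[0, 0], scale=0.5, size=(30, 2)),
    rng.normal(loc=[5, 5], scale=0.5, size=(30, 2)),
    rng.normal(loc=[0, 5], scale=0.5, size=(30, 2)),
])

# n_init repeats the random initialisation and keeps the best run
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)

print("cluster centres:\n", kmeans.cluster_centers_.round(2))
print("labels of first 10 points:", kmeans.labels_[:10])
```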
Density-Based Clustering

• The density-based clustering method connects highly dense areas into clusters, and arbitrarily shaped distributions are formed as long as the dense regions can be connected. The algorithm does this by identifying different clusters in the dataset and connecting the areas of high density into clusters. The dense areas in the data space are separated from each other by sparser areas.
Parameters Required For DBSCAN
Algorithm
• eps: It defines the neighbourhood around a data point: if the distance between two points is less than or equal to eps, they are considered neighbours. If the eps value is chosen too small, a large part of the data will be considered outliers. If it is chosen very large, the clusters will merge and the majority of the data points will end up in the same cluster.

• One way to find the eps value is based on the k-distance graph.

• MinPts: The minimum number of neighbours (data points) within the eps radius. The larger the dataset, the larger the value of MinPts that should be chosen. As a general rule, the minimum MinPts can be derived from the number of dimensions D in the dataset as MinPts >= D + 1.

• As a minimum, MinPts must be chosen to be at least 3.


Types of Points in the DBSCAN Algorithm

In this algorithm, we have 3 types of data points:

• Core Point: A point is a core point if it has more than MinPts points within
eps.

• Border Point: A point which has fewer than MinPts within eps but it is in
the neighborhood of a core point.


• Noise or outlier: A point which is neither a core point nor a border point.
Steps Used in the DBSCAN Algorithm
• Find all the neighbour points within eps and identify the core points, i.e. the points with more than MinPts neighbours.
• For each core point, if it is not already assigned to a cluster, create a new cluster.
• Find recursively all its density-connected points and assign them to the same cluster as the core point.
Points a and b are said to be density-connected if there exists a point c which has a sufficient number of points in its neighbourhood and both a and b are within the eps distance of it. This is a chaining process: if b is a neighbour of c, c is a neighbour of d, and d is a neighbour of e, which in turn is a neighbour of a, then b is connected to a.

• Iterate through the remaining unvisited points in the dataset. Those points
that do not belong to any cluster are noise.
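A minimal scikit-learn sketch of DBSCAN on made-up data; eps and min_samples (scikit-learn's name for MinPts) are illustrative choices:

```python
import numpy as np
from sklearn.cluster import DBSCAN

rng = np.random.default_rng(0)
# Two dense blobs plus a few scattered outliers
X = np.vstack([
    rng.normal(loc=[0, 0], scale=0.3, size=(40, 2)),
    rng.normal(loc=[4, 4], scale=0.3, size=(40, 2)),
    rng.uniform(low=-2, high=6, size=(5, 2)),
])

db = DBSCAN(eps=0.5, min_samples=5).fit(X)

# The label -1 marks noise points (outliers)
print("cluster labels found:", set(db.labels_))
print("number of noise points:", int((db.labels_ == -1).sum()))
```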
Distribution Model-Based Clustering

• In the distribution model-based clustering method, the data is divided based on the probability that a data point belongs to a particular distribution. The grouping is done by assuming some distribution, commonly the Gaussian distribution.
• An example of this type is the Expectation-Maximization clustering algorithm, which uses Gaussian Mixture Models (GMM).
Expectation-Maximization Algorithm

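A minimal scikit-learn sketch: GaussianMixture fits a mixture of Gaussians using the EM algorithm internally (the data here is made up):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(7)
X = np.vstack([
    rng.normal(loc=[0, 0], scale=0.6, size=(60, 2)),
    rng.normal(loc=[5, 1], scale=0.6, size=(60, 2)),
])

# GaussianMixture alternates the E and M steps until convergence
gmm = GaussianMixture(n_components=2, random_state=0).fit(X)

print("estimated means:\n", gmm.means_.round(2))
# Soft clustering: each point gets a probability for each component
print("membership probabilities of first point:", gmm.predict_proba(X[:1]).round(3))
```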
Hierarchical Clustering
• Hierarchical clustering can be used as an alternative to partitioned clustering, as there is no requirement to pre-specify the number of clusters to be created. In this technique, the dataset is divided into clusters to create a tree-like structure, also called a dendrogram. The observations, or any desired number of clusters, can be selected by cutting the tree at the appropriate level. The most common example of this method is the Agglomerative Hierarchical algorithm.
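A minimal SciPy sketch that builds an agglomerative hierarchy and then "cuts the tree" into a chosen number of flat clusters (data is made up):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(3)
X = np.vstack([
    rng.normal(loc=[0, 0], scale=0.4, size=(20, 2)),
    rng.normal(loc=[3, 3], scale=0.4, size=(20, 2)),
])

# Build the hierarchy bottom-up (agglomerative) with Ward linkage;
# Z encodes the dendrogram as a sequence of merges
Z = linkage(X, method="ward")

# Cut the dendrogram so that at most 2 clusters remain
labels = fcluster(Z, t=2, criterion="maxclust")
print("cluster labels:", labels)
```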
K-Medoids clustering

• Medoid: A medoid is a point in the cluster from which the sum of distances to the other data points is minimal.

• K-Medoids is an unsupervised method for clustering unlabelled data. It is an improved version of the K-Means algorithm, designed mainly to deal with K-Means' sensitivity to outliers. Compared to other partitioning algorithms, the algorithm is simple, fast, and easy to implement.

The partitioning is carried out such that:
• Each cluster must have at least one object.
• An object must belong to only one cluster.
K-Medoids:

• A medoid is a point in the cluster whose dissimilarity to all the other points in the cluster is minimal.

• Instead of the centroids used as reference points in the K-Means algorithm, the K-Medoids algorithm takes a medoid as its reference point.

There are three types of algorithms for K-Medoids clustering:
• PAM (Partitioning Around Medoids)
• CLARA (Clustering Large Applications)
• CLARANS (Clustering Large Applications based upon Randomized Search)
Algorithm
Given the value of k and unlabelled data:
• Choose k random points from the data and assign these k points to k clusters. These are the initial medoids.
• For all the remaining data points, calculate the distance from each medoid and assign each point to the cluster with the nearest medoid.
• Calculate the total cost (the sum of the distances from all the data points to their medoids).
• Select a random point as a new medoid, swap it with a previous medoid, and repeat steps 2 and 3.
• If the total cost with the new medoid is less than that with the previous medoid, make the new medoid permanent and repeat step 4.
• If the total cost with the new medoid is greater than the cost with the previous medoid, undo the swap and repeat step 4.
• The repetitions continue until the medoids no longer change, and the final medoids are used to classify the data points.
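A minimal NumPy sketch of this swap-based procedure (a simplified PAM-style loop; the data and all names are illustrative):

```python
import numpy as np

def k_medoids(X, k, n_iters=100, seed=0):
    # Simplified PAM: keep swapping medoids with non-medoids while cost drops
    rng = np.random.default_rng(seed)
    n = len(X)
    medoids = rng.choice(n, size=k, replace=False)

    def total_cost(meds):
        # Sum of distances from every point to its nearest medoid
        d = np.linalg.norm(X[:, None, :] - X[meds][None, :, :], axis=2)
        return d.min(axis=1).sum()

    cost = total_cost(medoids)
    for _ in range(n_iters):
        improved = False
        for i in range(k):
            for candidate in range(n):
                if candidate in medoids:
                    continue
                trial = medoids.copy()
                trial[i] = candidate              # swap one medoid out
                trial_cost = total_cost(trial)
                if trial_cost < cost:             # keep the swap only if it helps
                    medoids, cost = trial, trial_cost
                    improved = True
        if not improved:                          # stop when no swap improves cost
            break

    d = np.linalg.norm(X[:, None, :] - X[medoids][None, :, :], axis=2)
    return medoids, d.argmin(axis=1)

rng = np.random.default_rng(1)
X = np.vstack([rng.normal([0, 0], 0.5, (20, 2)), rng.normal([4, 4], 0.5, (20, 2))])
medoid_idx, labels = k_medoids(X, k=2)
print("medoid points:\n", X[medoid_idx].round(2))
```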
K-mode clustering

• K-Mode clustering is an unsupervised machine-learning technique used to group a set of data objects into a specified number of clusters based on their categorical attributes. The algorithm is called "K-Mode" because it uses modes (i.e. the most frequent values) instead of means or medians to represent the clusters.
• K-Means can be applied to categorical data after converting it into numerical form, but it does not give good results for such high-dimensional data.
So, some changes are made for categorical data:
• Replace the Euclidean distance with a dissimilarity metric.
• Replace the mean with the mode for cluster centres.
• Apply a frequency-based method in each iteration to update the modes.
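As a tiny sketch of the dissimilarity metric mentioned above, K-Mode typically uses simple matching: the number of attributes on which two objects disagree (the category values below are hypothetical):

```python
import numpy as np

def matching_dissimilarity(a, b):
    # K-Mode dissimilarity: count of attributes on which two objects differ
    return int(np.sum(a != b))

x1 = np.array(["red", "small", "round"])
x2 = np.array(["red", "large", "round"])
print(matching_dissimilarity(x1, x2))  # 1 -> they differ only in size
```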
Measures for Quality of Clustering:
If all the data objects in a cluster are highly similar, then the cluster has high quality. In most situations we can measure the quality of clustering using a dissimilarity/similarity metric, but there are also other methods to measure the quality of clustering when clusters are alike.
• 1. Dissimilarity/Similarity metric: The similarity between the
clusters can be expressed in terms of a distance function, which
is represented by d(i, j). Distance functions are different for
various data types and data variables. Distance function
measure is different for continuous-valued variables, categorical
variables, and vector variables. Distance function can be
expressed as Euclidean distance, Mahalanobis distance, and
Cosine distance for different types of data.
• 2. Cluster completeness: Cluster completeness is an essential parameter for good clustering: if any two data objects have similar characteristics, they should be assigned to the same cluster according to the ground truth. Cluster completeness is high when objects of the same category end up in the same cluster.
3. Ragbag: In some situations, there can be a few categories
in which the objects of those categories cannot be merged
with other objects. Then the quality of those cluster
categories is measured by the Rag Bag method. According
to the rag bag method, we should put the heterogeneous
object into a rag bag category.
4. Small cluster preservation: If a small category of clustering is further split into small pieces, those small pieces become noise to the entire clustering, and it becomes difficult to identify the small category from the clustering. The small cluster preservation criterion states that splitting a small category into pieces is not advisable, as it further decreases the quality of the clusters, since the pieces are treated as distinct. Suppose, for example, that cluster C1 is split into three clusters: C11 = {d1, . . . , dn}, C12 = {dn+1}, and C13 = {dn+2}.
Density Based Clustering
• Locate areas of high density that are separated
from low density areas
• DBSCAN (Density based Spatial Clustering of
Applications with Noise)
• Based on density of data points
• Consider outliers as noise

2 input parameters:
• The radius around each point (eps).
• The minimum number of data points that should be around that point within the radius (MinPts).
Algorithm
• Step 1: Select values for eps and MinPts.
• Step 2: For a particular point X, calculate its distance from every other point.
• Step 3: Find all the neighbour points of X (those which fall inside the circle of radius eps around X). More neighbours means higher density.
• Step 4: X is called a "Core Point" if the number of neighbours of X > MinPts; otherwise it is a "Border Point".
• Step 5: Repeat this process for all points.
Hidden Markov Model
• A Hidden Markov Model (HMM) is a statistical model used to describe systems whose unobservable states change over time. It is predicated on the idea that there is an underlying process with hidden states, each of which produces an observable output. The model defines probabilities for switching between hidden states and for emitting observable symbols.
• Because of their ability to capture uncertainty and temporal dependencies, HMMs are used in a wide range of fields, including finance, bioinformatics, and speech recognition. This flexibility makes HMMs useful for modelling dynamic systems and forecasting future states based on observed sequences.

HMM
• An HMM consists of two types of variables: hidden
states and observations.
• The hidden states are the underlying variables that
generate the observed data, but they are not
directly observable.
• The observations are the variables that are
measured and observed.

• The relationship between the hidden states and the observations is modelled using a probability distribution. The Hidden Markov Model (HMM) describes this relationship using two sets of probabilities: the transition probabilities and the emission probabilities.
• The transition probabilities describe the probability
of transitioning from one hidden state to another.
• The emission probabilities describe the probability
of observing an output given a hidden state.

Hidden Markov
Model Algorithm
• The Hidden Markov Model (HMM) algorithm can be
implemented using the following steps:
• Step 1: Define the state space and observation space
• The state space is the set of all possible hidden states, and
the observation space is the set of all possible observations.
• Step 2: Define the initial state distribution
• This is the probability distribution over the initial state.
• Step 3: Define the state transition probabilities
• These are the probabilities of transitioning from one state
to another. This forms the transition matrix, which describes
the probability of moving from one state to another.
• Step 4: Define the observation likelihoods:
• These are the probabilities of generating each observation
from each state. This forms the emission matrix, which
describes the probability of generating each observation
from each state.
• Step 5: Train the model
• The parameters of the state transition probabilities and the
observation likelihoods are estimated using the Baum-
Welch algorithm, or the forward-backward algorithm. This is
done by iteratively updating the parameters until
convergence.

• Step 6: Decode the most likely sequence of hidden states
• Given the observed data, the Viterbi algorithm is used to
compute the most likely sequence of hidden states. This can
be used to predict future observations, classify sequences,
or detect patterns in sequential data.
• Step 7: Evaluate the model
• The performance of the HMM can be evaluated using
various metrics, such as accuracy, precision, recall, or F1
score.
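As a concrete illustration of Step 6, here is a minimal NumPy sketch of the Viterbi algorithm; the two hidden states, two observation symbols, and all probabilities are made-up values:

```python
import numpy as np

# Hypothetical HMM: 2 hidden states, 2 observable symbols
start = np.array([0.6, 0.4])               # initial state distribution
trans = np.array([[0.7, 0.3],              # transition matrix A[i, j] = P(state j | state i)
                  [0.4, 0.6]])
emit = np.array([[0.9, 0.1],               # emission matrix B[i, o] = P(symbol o | state i)
                 [0.2, 0.8]])

obs = [0, 1, 1, 0]                         # an observed symbol sequence

# Viterbi: dynamic programming over log-probabilities of the best path
n_states, T = len(start), len(obs)
delta = np.zeros((T, n_states))            # best log-prob of any path ending in each state
psi = np.zeros((T, n_states), dtype=int)   # back-pointers to the best previous state

delta[0] = np.log(start) + np.log(emit[:, obs[0]])
for t in range(1, T):
    scores = delta[t - 1][:, None] + np.log(trans)   # scores[i, j]: best path via i into j
    psi[t] = scores.argmax(axis=0)
    delta[t] = scores.max(axis=0) + np.log(emit[:, obs[t]])

# Backtrack from the best final state to recover the full state sequence
path = [int(delta[-1].argmax())]
for t in range(T - 1, 0, -1):
    path.append(int(psi[t][path[-1]]))
path.reverse()
print("most likely hidden states:", path)
```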

Summary
• The HMM algorithm involves defining the state
space, observation space, and the parameters of
the state transition probabilities and observation
likelihoods, training the model using the Baum-
Welch algorithm or the forward-backward
algorithm, decoding the most likely sequence of
hidden states using the Viterbi algorithm, and
evaluating the performance of the model.

Thank You
