CSF 344

Machine Learning

Faculty & Course Coordinator:


Dr. Anushikha Singh
SYLLABUS: Unit 2: Regression & Clustering

 Regression: Linear Regression, Ridge Regression, Sensitivity Analysis, Multivariate Regression.
 Clustering: Distance measures, different clustering methods (distance, density, hierarchical), iterative distance-based clustering, dealing with continuous and categorical values in K-Means, constructing a hierarchical cluster, K-Medoids, K-Mode and density-based clustering, measures of quality of clustering, Hidden Markov Model.
Regression
Regression analysis is a statistical method for modelling the relationship between a dependent (target) variable and one or more independent (predictor) variables.
Regression analysis helps us to understand how the value of the dependent variable changes with respect to one independent variable while the other independent variables are held fixed. It predicts continuous/real values such as temperature, age, salary, price, etc.
• Example: Suppose there is a marketing company A, which runs various advertisements every year and gets sales in return. The list below shows the advertisements made by the company in the last 5 years and the corresponding sales:

Now the company wants to run a $200 advertisement in the year 2019 and wants to know the predicted sales for this year. To solve such prediction problems in machine learning, we need regression analysis.
Types of Regression

• Linear Regression
• Logistic Regression
• Polynomial Regression
• Support Vector Regression
• Decision Tree Regression
• Random Forest Regression
• Ridge Regression
• Lasso Regression
Linear Regression in Machine Learning

• Linear regression is one of the easiest and most popular Machine Learning algorithms. It is a statistical method that is used for predictive analysis. Linear regression makes predictions for continuous/real or numeric variables such as sales, salary, age, product price, etc.
• The linear regression algorithm shows a linear relationship between a dependent (y) variable and one or more independent (x) variables, hence the name linear regression. Since linear regression shows a linear relationship, it finds how the value of the dependent variable changes according to the value of the independent variable.
The linear regression model provides a sloped straight line representing the relationship between the variables. Mathematically, we can represent a linear regression as:

y = a0 + a1x + ε

Here,
y = dependent variable (target variable)
x = independent variable (predictor variable)
a0 = intercept of the line (gives an additional degree of freedom)
a1 = linear regression coefficient (scale factor applied to each input value)
ε = random error

The values of the x and y variables are the training dataset for the linear regression model representation.
COST FUNCTION

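A standard choice, consistent with the model y = a0 + a1x + ε above, is the Mean Squared Error (MSE) cost:

$$J(a_0, a_1) = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - (a_0 + a_1 x_i)\right)^2$$

Minimizing J with respect to a0 and a1 gives the best-fit line.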
GRADIENT DESCENT

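As an illustration, here is a minimal NumPy sketch of batch gradient descent for the simple linear model above; the learning rate, iteration count, and data are arbitrary demonstration values:

```python
import numpy as np

def gradient_descent(x, y, lr=0.01, n_iters=1000):
    # Fit y = a0 + a1*x by batch gradient descent on the MSE cost
    a0, a1 = 0.0, 0.0
    n = len(x)
    for _ in range(n_iters):
        error = (a0 + a1 * x) - y
        # Partial derivatives of the MSE cost with respect to a0 and a1
        grad_a0 = (2.0 / n) * error.sum()
        grad_a1 = (2.0 / n) * (error * x).sum()
        a0 -= lr * grad_a0
        a1 -= lr * grad_a1
    return a0, a1

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([3.1, 4.9, 7.2, 9.1, 10.8])   # roughly y = 1 + 2x
a0, a1 = gradient_descent(x, y)
print(f"intercept={a0:.2f}, slope={a1:.2f}")
```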
Multiple Linear Regression

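In multiple linear regression, the model extends to several predictors. A standard form is:

$$y = b_0 + b_1 x_1 + b_2 x_2 + \dots + b_n x_n + \varepsilon$$

where x1, …, xn are the independent variables, b0 is the intercept, b1, …, bn are the regression coefficients, and ε is the random error.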
How to perform Multiple
Linear Regression

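A minimal sketch of how multiple linear regression can be performed in Python with scikit-learn; the two features and all numbers are hypothetical:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical data: two features (e.g., ad spend and store count) vs. sales
X = np.array([[90, 10], [120, 12], [150, 15], [100, 11], [130, 14]])
y = np.array([1000, 1300, 1800, 1200, 1380])

model = LinearRegression().fit(X, y)

print("intercept:", model.intercept_)
print("coefficients:", model.coef_)
print("prediction for [200, 18]:", model.predict([[200, 18]]))
```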
Types of Linear Regression

• Linear regression can be further divided into two types of algorithm:
• Simple Linear Regression:
If a single independent variable is used to predict the value
of a numerical dependent variable, then such a Linear
Regression algorithm is called Simple Linear Regression.
• Multiple Linear regression:
If more than one independent variable is used to predict the
value of a numerical dependent variable, then such a Linear
Regression algorithm is called Multiple Linear Regression.
Multivariate Regression
• Multivariate Regression is a supervised machine learning algorithm
involving multiple data variables for analysis. Multivariate
regression is an extension of multiple regression with one dependent
variable and multiple independent variables. Based on the number of
independent variables, we try to predict the output.
• Praneeta wants to estimate the price of a house. She will collect details such as the location of the house, the number of bedrooms, the size in square feet, and whether amenities are available or not. Based on these details, the price of the house can be predicted, along with how the variables are interrelated.
• An agricultural scientist wants to predict the total crop yield expected for the summer. He collects details of the expected amount of rainfall, the fertilizers to be used, and the soil conditions. By building a multivariate regression model, the scientist can predict the crop yield. Along with the crop yield, the scientist also tries to understand the relationships among the variables.
Multivariate Regression

Mathematical equation
The simple linear regression model represents a straight line, meaning y is a function of x. When we have an extra dimension (z), the straight line becomes a plane. Here, the plane is the function that expresses y as a function of x and z. The linear regression equation can now be expressed as:

y = m1·x + m2·z + c

y is the dependent variable, that is, the variable that needs to be predicted.
x is the first independent variable (the first input).
m1 is the slope of x. It lets us know the angle of the line with respect to x.
z is the second independent variable (the second input).
m2 is the slope of z. It lets us know the angle of the line with respect to z.
c is the intercept: a constant that gives the value of y when x and z are 0.
Regularization
• Regularization is one of the most important concepts of machine
learning. It is a technique to prevent the model from overfitting by
adding extra information to it.
• Sometimes a machine learning model performs well on the training data but not on the test data. This means the model cannot predict the output when dealing with unseen data, because noise has been introduced into the output; such a model is called overfitted.
• This technique can be applied in such a way that all variables or features are kept in the model while the magnitudes of their coefficients are reduced. Hence, it maintains accuracy as well as the generalization of the model. The regularized linear model has the form:

y = β0 + β1x1 + β2x2 + β3x3 + ⋯ + βnxn + b
Bias
• A machine learning model analyses the data, finds patterns in it, and makes predictions. While training, the model learns these patterns in the dataset and applies them to the test data for prediction. While making predictions, a difference occurs between the values predicted by the model and the actual/expected values; this difference is known as bias error, or error due to bias.
A model has either:
• Low Bias: A low bias model will make fewer assumptions about
the form of the target function.
• High Bias: A model with a high bias makes more assumptions,
and the model becomes unable to capture the important features
of our dataset. A high bias model also cannot perform well on
new data.
Variance

BIAS-VARIANCE TRADEOFF

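As a standard summary of the tradeoff, the expected prediction error of a model decomposes into three parts:

$$\text{Total Error} = \text{Bias}^2 + \text{Variance} + \text{Irreducible Error}$$

Reducing bias (a more flexible model) typically increases variance, and vice versa; the goal is to balance the two.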
Performance Metrics

R Square/Adjusted R Square

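As standard definitions: R² measures the proportion of variance in the target explained by the model, and Adjusted R² corrects R² for the number of predictors p given n samples:

$$R^2 = 1 - \frac{\sum_{i}(y_i - \hat{y}_i)^2}{\sum_{i}(y_i - \bar{y})^2}, \qquad R^2_{\text{adj}} = 1 - \frac{(1 - R^2)(n - 1)}{n - p - 1}$$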
Mean Absolute Error (MAE)

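The standard definition: MAE is the average absolute difference between the predicted and actual values:

$$\text{MAE} = \frac{1}{n}\sum_{i=1}^{n} |y_i - \hat{y}_i|$$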
Mean Square Error (MSE) / Root Mean Square Error (RMSE)

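Standard definitions: MSE averages the squared errors, and RMSE is its square root, which brings the error back to the original units of the target:

$$\text{MSE} = \frac{1}{n}\sum_{i=1}^{n} (y_i - \hat{y}_i)^2, \qquad \text{RMSE} = \sqrt{\text{MSE}}$$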
Techniques of Regularization
There are mainly two types of regularization techniques, which are given
below:
• Ridge Regression
• Lasso Regression
Ridge Regression
• Ridge regression is one of the types of linear regression in which a small
amount of bias is introduced so that we can get better long-term
predictions.
• Ridge regression is a regularization technique which is used to reduce the complexity of the model. It is also called L2 regularization.
• In this technique, the cost function is altered by adding a penalty term to it. The amount of bias added to the model is called the Ridge regression penalty. It is calculated by multiplying lambda by the squared weight of each individual feature.
• A general linear or polynomial regression will fail if there is high collinearity between the independent variables; to solve such problems, Ridge regression can be used.
• It also helps to solve problems where we have more parameters than samples.
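Putting this together, a standard form of the Ridge (L2) cost function, consistent with the description above, is:

$$J(\beta) = \sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2 + \lambda \sum_{j=1}^{p} \beta_j^2$$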

Lasso Regression:
• Lasso regression is another regularization technique to reduce the complexity of the model.
• It is similar to Ridge regression, except that the penalty term contains the absolute values of the weights instead of their squares.
• Since it takes absolute values, it can shrink a slope all the way to 0, whereas Ridge regression can only shrink it close to 0.
• It is also called L1 regularization. The equation for Lasso regression is given below.
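A standard form of the Lasso (L1) cost function is:

$$J(\beta) = \sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2 + \lambda \sum_{j=1}^{p} |\beta_j|$$

As a minimal scikit-learn sketch comparing the two penalties (alpha plays the role of λ here; the data and alpha values are arbitrary illustrations):

```python
import numpy as np
from sklearn.linear_model import Ridge, Lasso

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5))
# Only the first two features actually matter in this toy data
y = 3 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.1, size=50)

ridge = Ridge(alpha=1.0).fit(X, y)
lasso = Lasso(alpha=0.1).fit(X, y)

# Lasso tends to drive irrelevant coefficients exactly to 0;
# Ridge only shrinks them towards 0.
print("Ridge coefficients:", ridge.coef_.round(3))
print("Lasso coefficients:", lasso.coef_.round(3))
```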
Sensitivity Analysis
• Sensitivity analysis determines how different values of an
independent variable affect a particular dependent variable under a
given set of assumptions. In other words, sensitivity analyses study
how various sources of uncertainty in a mathematical model
contribute to the model's overall uncertainty. This technique is used
within specific boundaries that depend on one or more input
variables.
• Sensitivity analysis is a method to explore the impact of feature changes on the model. In this method, we change one feature while keeping the others constant, and check the impact on the model output. The main goal of sensitivity analysis is to observe the effects of feature changes on the optimal solutions of the model. It can provide additional insights or information about the optimal solutions of a model.
We can perform sensitivity analysis in 3 ways:

• A change in the value of the objective function coefficients.
• A change in the right-hand-side value of a constraint.
• A change in a coefficient of a constraint.
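As a simple illustration of the change-one-feature idea described above, here is a minimal sketch (model and data are hypothetical): perturb one input feature of a fitted model while holding the others fixed, and observe the change in the output.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 3))
y = 2 * X[:, 0] - 1 * X[:, 1] + 0.1 * rng.normal(size=100)
model = LinearRegression().fit(X, y)

baseline = X.mean(axis=0)          # hold every feature at its mean value
for j in range(X.shape[1]):
    perturbed = baseline.copy()
    perturbed[j] += 1.0            # change one feature by one unit
    delta = model.predict([perturbed])[0] - model.predict([baseline])[0]
    print(f"feature {j}: output changes by {delta:+.3f} per unit increase")
```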
Clustering
Clustering in Machine Learning

• A way of grouping the data points into different clusters consisting of similar data points. Objects with possible similarities remain in a group that has little or no similarity with another group.

• It does this by finding similar patterns in the unlabelled dataset, such as shape, size, colour, or behaviour, and divides the data points according to the presence and absence of those patterns.

• It is an unsupervised learning method, hence no supervision is provided to the algorithm, and it deals with an unlabelled dataset.
Clustering in Machine Learning

Example: Let's understand the clustering technique with a real-world example of a mall. When we visit any shopping mall, we can observe that things with similar usage are grouped together: t-shirts are grouped in one section and trousers in another; similarly, in the vegetable section, apples, bananas, mangoes, etc., are grouped separately, so that we can easily find things.

Applications:
• Market Segmentation
• Statistical data analysis
• Social network analysis
• Image segmentation
• Anomaly detection, etc.
Types of Clustering Methods
The clustering methods are broadly divided into hard clustering (a data point belongs to only one group) and soft clustering (a data point can also belong to another group). Various other approaches to clustering exist as well. Below are the main clustering methods used in machine learning:

• Partitioning Clustering
• Density-Based Clustering
• Distribution Model-Based Clustering
• Hierarchical Clustering
• Fuzzy Clustering
Partitioning Clustering
• It is a type of clustering that divides the data into non-hierarchical groups. It is also known as the centroid-based method. The most common example of partitioning clustering is the K-Means clustering algorithm.

• In this type, the dataset is divided into a set of k groups, where K defines the number of pre-defined groups. The cluster centres are created in such a way that the distance between data points and their own cluster centroid is minimal compared to the distance to other cluster centroids.
K-Means Clustering
• K-Means clustering is an unsupervised learning algorithm that is used to solve clustering problems in machine learning or data science.

• It groups the unlabelled dataset into different clusters. Here K defines the number of pre-defined clusters that need to be created in the process: if K=2, there will be two clusters; for K=3, there will be three clusters; and so on.

• It is an iterative algorithm that divides the unlabelled dataset into k different clusters in such a way that each data point belongs to only one group with similar properties.
K-Means Clustering
• It allows us to cluster the data into different groups and is a convenient way to discover the categories of groups in an unlabelled dataset on its own, without the need for any training.

• It is a centroid-based algorithm, where each cluster is associated with a centroid. The main aim of this algorithm is to minimize the sum of distances between the data points and their corresponding cluster centroids.

• The algorithm takes the unlabelled dataset as input, divides the dataset into k clusters, and repeats the process until it cannot find better clusters. The value of k should be predetermined in this algorithm.
Steps involved in K-Means
Clustering
• The first step when using k-means clustering is to indicate the
number of clusters (k) that will be generated in the final solution.

• The algorithm starts by randomly selecting k objects from the


data set to serve as the initial centres for the clusters. The
selected objects are also known as cluster means or centroids.

• Next, each of the remaining objects is assigned to its closest centroid, where closest is defined using the Euclidean distance between the object and the cluster mean. This step is called the "cluster assignment step".
Steps involved in K-Means
Clustering
• After the assignment step, the algorithm computes the new mean value of each cluster. The term "centroid update" is used to describe this step. Now that the centres have been recalculated, every observation is checked again to see whether it might be closer to a different cluster. All the objects are reassigned using the updated cluster means.

• The cluster assignment and centroid update steps are iteratively


repeated until the cluster assignments stop changing (i.e
until convergence is achieved). That is, the clusters formed in
the current iteration are the same as those obtained in the
previous iteration.
• K-Means clustering solved example: see the sketch below.
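A minimal scikit-learn sketch of the steps above on made-up 2-D data (the three groups and all values are illustrative):

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical 2-D points forming three loose groups
rng = np.random.default_rng(42)
X = np.vstack([
    rng.normal(loc=[0, 0], scale=0.5, size=(30, 2)),
    rng.normal(loc=[5, 5], scale=0.5, size=(30, 2)),
    rng.normal(loc=[0, 5], scale=0.5, size=(30, 2)),
])

# n_init repeats the random initialisation and keeps the best run
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)

print("cluster centres:\n", kmeans.cluster_centers_.round(2))
print("labels of first 10 points:", kmeans.labels_[:10])
```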
Density-Based Clustering

• The density-based clustering method connects highly dense areas into clusters, and arbitrarily shaped distributions are formed as long as the dense regions can be connected. The algorithm does this by identifying different clusters in the dataset and connecting the areas of high density into clusters. The dense areas in the data space are separated from each other by sparser areas.
Parameters Required For DBSCAN
Algorithm
• eps: It defines the neighbourhood around a data point: if the distance between two points is less than or equal to eps, they are considered neighbours. If the eps value is chosen too small, a large part of the data will be considered outliers. If it is chosen very large, the clusters will merge and the majority of the data points will end up in the same cluster.

• One way to find the eps value is based on the k-distance graph.

• MinPts: The minimum number of neighbours (data points) within the eps radius. The larger the dataset, the larger the value of MinPts that should be chosen. As a general rule, the minimum MinPts can be derived from the number of dimensions D in the dataset as MinPts >= D + 1.

• As a minimum, MinPts must be chosen to be at least 3.


Types of Points in the DBSCAN Algorithm

In this algorithm, we have 3 types of data points:

• Core Point: A point is a core point if it has more than MinPts points within
eps.

• Border Point: A point which has fewer than MinPts within eps but it is in
the neighborhood of a core point.


• Noise or outlier: A point which is neither a core point nor a border point.
Steps Used in the DBSCAN Algorithm
• Find all the neighbour points within eps and identify the core points, i.e. the points with more than MinPts neighbours.
• For each core point, if it is not already assigned to a cluster, create a new cluster.
• Find recursively all its density-connected points and assign them to the same cluster as the core point.
Points a and b are said to be density-connected if there exists a point c which has a sufficient number of points in its neighbourhood and both a and b are within the eps distance of it. This is a chaining process: if b is a neighbour of c, c is a neighbour of d, and d is a neighbour of e, which in turn is a neighbour of a, then b is connected to a.

• Iterate through the remaining unvisited points in the dataset. Those points
that do not belong to any cluster are noise.
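A minimal scikit-learn sketch of DBSCAN on made-up data; eps and min_samples (scikit-learn's name for MinPts) are illustrative choices:

```python
import numpy as np
from sklearn.cluster import DBSCAN

rng = np.random.default_rng(0)
# Two dense blobs plus a few scattered outliers
X = np.vstack([
    rng.normal(loc=[0, 0], scale=0.3, size=(40, 2)),
    rng.normal(loc=[4, 4], scale=0.3, size=(40, 2)),
    rng.uniform(low=-2, high=6, size=(5, 2)),
])

db = DBSCAN(eps=0.5, min_samples=5).fit(X)

# The label -1 marks noise points (outliers)
print("cluster labels found:", set(db.labels_))
print("number of noise points:", int((db.labels_ == -1).sum()))
```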
Distribution Model-Based Clustering

• In the distribution model-based clustering method, the data is divided based on the probability that a data point belongs to a particular distribution. The grouping is done by assuming some distribution, commonly the Gaussian distribution.
• An example of this type is the Expectation-Maximization clustering algorithm, which uses Gaussian Mixture Models (GMM).
Expectation-Maximization Algorithm

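A minimal scikit-learn sketch: GaussianMixture fits a mixture of Gaussians using the EM algorithm internally (the data here is made up):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(7)
X = np.vstack([
    rng.normal(loc=[0, 0], scale=0.6, size=(60, 2)),
    rng.normal(loc=[5, 1], scale=0.6, size=(60, 2)),
])

# GaussianMixture alternates the E and M steps until convergence
gmm = GaussianMixture(n_components=2, random_state=0).fit(X)

print("estimated means:\n", gmm.means_.round(2))
# Soft clustering: each point gets a probability for each component
print("membership probabilities of first point:", gmm.predict_proba(X[:1]).round(3))
```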
Hierarchical Clustering
• Hierarchical clustering can be used as an alternative to partitioned clustering, as there is no requirement to pre-specify the number of clusters to be created. In this technique, the dataset is divided into clusters to create a tree-like structure, also called a dendrogram. The observations, or any desired number of clusters, can be selected by cutting the tree at the appropriate level. The most common example of this method is the Agglomerative Hierarchical algorithm.
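A minimal SciPy sketch that builds an agglomerative hierarchy and then "cuts the tree" into a chosen number of flat clusters (data is made up):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(3)
X = np.vstack([
    rng.normal(loc=[0, 0], scale=0.4, size=(20, 2)),
    rng.normal(loc=[3, 3], scale=0.4, size=(20, 2)),
])

# Build the hierarchy bottom-up (agglomerative) with Ward linkage;
# Z encodes the dendrogram as a sequence of merges
Z = linkage(X, method="ward")

# Cut the dendrogram so that at most 2 clusters remain
labels = fcluster(Z, t=2, criterion="maxclust")
print("cluster labels:", labels)
```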
K-Medoids clustering

• Medoid: A medoid is a point in the cluster from which the sum of distances to the other data points is minimal.

• K-Medoids is an unsupervised method for clustering unlabelled data. It is an improved version of the K-Means algorithm, designed mainly to deal with K-Means' sensitivity to outliers. Compared to other partitioning algorithms, the algorithm is simple, fast, and easy to implement.

The partitioning is carried out such that:
• Each cluster must have at least one object.
• An object must belong to only one cluster.
K-Medoids:

• A medoid is a point in the cluster whose dissimilarity to all the other points in the cluster is minimal.

• Instead of the centroids used as reference points in the K-Means algorithm, the K-Medoids algorithm takes a medoid as its reference point.

There are three types of algorithms for K-Medoids clustering:
• PAM (Partitioning Around Medoids)
• CLARA (Clustering Large Applications)
• CLARANS (Clustering Large Applications based upon Randomized Search)
Algorithm
Given the value of k and unlabelled data:
• Choose k random points from the data and assign these k points to k clusters. These are the initial medoids.
• For all the remaining data points, calculate the distance from each medoid and assign each point to the cluster with the nearest medoid.
• Calculate the total cost (the sum of the distances from all the data points to their medoids).
• Select a random point as a new medoid, swap it with a previous medoid, and repeat steps 2 and 3.
• If the total cost with the new medoid is less than that with the previous medoid, make the new medoid permanent and repeat step 4.
• If the total cost with the new medoid is greater than the cost with the previous medoid, undo the swap and repeat step 4.
• The repetitions continue until the medoids no longer change, and the final medoids are used to classify the data points.
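A minimal NumPy sketch of this swap-based procedure (a simplified PAM-style loop; the data and all names are illustrative):

```python
import numpy as np

def k_medoids(X, k, n_iters=100, seed=0):
    # Simplified PAM: keep swapping medoids with non-medoids while cost drops
    rng = np.random.default_rng(seed)
    n = len(X)
    medoids = rng.choice(n, size=k, replace=False)

    def total_cost(meds):
        # Sum of distances from every point to its nearest medoid
        d = np.linalg.norm(X[:, None, :] - X[meds][None, :, :], axis=2)
        return d.min(axis=1).sum()

    cost = total_cost(medoids)
    for _ in range(n_iters):
        improved = False
        for i in range(k):
            for candidate in range(n):
                if candidate in medoids:
                    continue
                trial = medoids.copy()
                trial[i] = candidate              # swap one medoid out
                trial_cost = total_cost(trial)
                if trial_cost < cost:             # keep the swap only if it helps
                    medoids, cost = trial, trial_cost
                    improved = True
        if not improved:                          # stop when no swap improves cost
            break

    d = np.linalg.norm(X[:, None, :] - X[medoids][None, :, :], axis=2)
    return medoids, d.argmin(axis=1)

rng = np.random.default_rng(1)
X = np.vstack([rng.normal([0, 0], 0.5, (20, 2)), rng.normal([4, 4], 0.5, (20, 2))])
medoid_idx, labels = k_medoids(X, k=2)
print("medoid points:\n", X[medoid_idx].round(2))
```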
K-mode clustering

• K-Mode clustering is an unsupervised machine-learning technique used to group a set of data objects into a specified number of clusters based on their categorical attributes. The algorithm is called "K-Mode" because it uses modes (i.e. the most frequent values) instead of means or medians to represent the clusters.
• K-Means can be applied to categorical data after converting it into numerical form, but it does not give good results for such high-dimensional data.
So, some changes are made for categorical data:
• Replace the Euclidean distance with a dissimilarity metric.
• Replace the mean with the mode for cluster centres.
• Apply a frequency-based method in each iteration to update the modes.
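As a tiny sketch of the dissimilarity metric mentioned above, K-Mode typically uses simple matching: the number of attributes on which two objects disagree (the category values below are hypothetical):

```python
import numpy as np

def matching_dissimilarity(a, b):
    # K-Mode dissimilarity: count of attributes on which two objects differ
    return int(np.sum(a != b))

x1 = np.array(["red", "small", "round"])
x2 = np.array(["red", "large", "round"])
print(matching_dissimilarity(x1, x2))  # 1 -> they differ only in size
```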
Measures for Quality of Clustering:
If all the data objects in a cluster are highly similar, then the cluster has high quality. In most situations we can measure the quality of clustering using a dissimilarity/similarity metric, but there are also other methods to measure the quality of clustering when clusters are alike.
• 1. Dissimilarity/Similarity metric: The similarity between the
clusters can be expressed in terms of a distance function, which
is represented by d(i, j). Distance functions are different for
various data types and data variables. Distance function
measure is different for continuous-valued variables, categorical
variables, and vector variables. Distance function can be
expressed as Euclidean distance, Mahalanobis distance, and
Cosine distance for different types of data.
• 2. Cluster completeness: Cluster completeness is an essential parameter for good clustering: if any two data objects have similar characteristics, they should be assigned to the same cluster according to the ground truth. Cluster completeness is high when objects of the same category end up in the same cluster.
3. Ragbag: In some situations, there can be a few categories
in which the objects of those categories cannot be merged
with other objects. Then the quality of those cluster
categories is measured by the Rag Bag method. According
to the rag bag method, we should put the heterogeneous
object into a rag bag category.
4. Small cluster preservation: If a small category of clustering is further split into small pieces, those small pieces become noise to the entire clustering, and it becomes difficult to identify the small category from the clustering. The small cluster preservation criterion states that splitting a small category into pieces is not advisable, as it further decreases the quality of the clusters, since the pieces are treated as distinct. Suppose, for example, that cluster C1 is split into three clusters: C11 = {d1, . . . , dn}, C12 = {dn+1}, and C13 = {dn+2}.
Density Based Clustering
• Locate areas of high density that are separated
from low density areas
• DBSCAN (Density based Spatial Clustering of
Applications with Noise)
• Based on density of data points
• Consider outliers as noise

2 input parameters:
• The radius around each point (eps).
• The minimum number of data points that should be around that point within the radius (MinPts).
Algorithm
• Step 1: Select values for eps and MinPts.
• Step 2: For a particular point X, calculate its distance from every other point.
• Step 3: Find all the neighbour points of X (those which fall inside the circle of radius eps around X). More neighbours means higher density.
• Step 4: X is called a "Core Point" if the number of neighbours of X > MinPts; otherwise it is a "Border Point".
• Step 5: Repeat this process for all points.
Hidden Markov Model
• A Hidden Markov Model (HMM) is a statistical model used to describe systems whose unobservable states change over time. It is predicated on the idea that there is an underlying process with hidden states, each of which produces an observable output. The model defines probabilities for switching between hidden states and for emitting observable symbols.
• Because of their ability to capture uncertainty and temporal dependencies, HMMs are used in a wide range of fields, including finance, bioinformatics, and speech recognition. This flexibility makes HMMs useful for modelling dynamic systems and forecasting future states based on observed sequences.

HMM
• An HMM consists of two types of variables: hidden
states and observations.
• The hidden states are the underlying variables that
generate the observed data, but they are not
directly observable.
• The observations are the variables that are
measured and observed.

• The relationship between the hidden states and the observations is modelled using a probability distribution. The Hidden Markov Model (HMM) describes this relationship using two sets of probabilities: the transition probabilities and the emission probabilities.
• The transition probabilities describe the probability
of transitioning from one hidden state to another.
• The emission probabilities describe the probability
of observing an output given a hidden state.

Hidden Markov
Model Algorithm
• The Hidden Markov Model (HMM) algorithm can be
implemented using the following steps:
• Step 1: Define the state space and observation space
• The state space is the set of all possible hidden states, and
the observation space is the set of all possible observations.
• Step 2: Define the initial state distribution
• This is the probability distribution over the initial state.
• Step 3: Define the state transition probabilities
• These are the probabilities of transitioning from one state
to another. This forms the transition matrix, which describes
the probability of moving from one state to another.
• Step 4: Define the observation likelihoods:
• These are the probabilities of generating each observation
from each state. This forms the emission matrix, which
describes the probability of generating each observation
from each state.
• Step 5: Train the model
• The parameters of the state transition probabilities and the
observation likelihoods are estimated using the Baum-
Welch algorithm, or the forward-backward algorithm. This is
done by iteratively updating the parameters until
convergence.

• Step 6: Decode the most likely sequence of hidden states
• Given the observed data, the Viterbi algorithm is used to
compute the most likely sequence of hidden states. This can
be used to predict future observations, classify sequences,
or detect patterns in sequential data.
• Step 7: Evaluate the model
• The performance of the HMM can be evaluated using
various metrics, such as accuracy, precision, recall, or F1
score.
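As a concrete illustration of Step 6, here is a minimal NumPy sketch of the Viterbi algorithm; the two hidden states, two observation symbols, and all probabilities are made-up values:

```python
import numpy as np

# Hypothetical HMM: 2 hidden states, 2 observable symbols
start = np.array([0.6, 0.4])               # initial state distribution
trans = np.array([[0.7, 0.3],              # transition matrix A[i, j] = P(state j | state i)
                  [0.4, 0.6]])
emit = np.array([[0.9, 0.1],               # emission matrix B[i, o] = P(symbol o | state i)
                 [0.2, 0.8]])

obs = [0, 1, 1, 0]                         # an observed symbol sequence

# Viterbi: dynamic programming over log-probabilities of the best path
n_states, T = len(start), len(obs)
delta = np.zeros((T, n_states))            # best log-prob of any path ending in each state
psi = np.zeros((T, n_states), dtype=int)   # back-pointers to the best previous state

delta[0] = np.log(start) + np.log(emit[:, obs[0]])
for t in range(1, T):
    scores = delta[t - 1][:, None] + np.log(trans)   # scores[i, j]: best path via i into j
    psi[t] = scores.argmax(axis=0)
    delta[t] = scores.max(axis=0) + np.log(emit[:, obs[t]])

# Backtrack from the best final state to recover the full state sequence
path = [int(delta[-1].argmax())]
for t in range(T - 1, 0, -1):
    path.append(int(psi[t][path[-1]]))
path.reverse()
print("most likely hidden states:", path)
```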

Summary
• The HMM algorithm involves defining the state
space, observation space, and the parameters of
the state transition probabilities and observation
likelihoods, training the model using the Baum-
Welch algorithm or the forward-backward
algorithm, decoding the most likely sequence of
hidden states using the Viterbi algorithm, and
evaluating the performance of the model.

Thank You
