Noida Institute of Engineering and Technology, Greater Noida

Artificial Intelligence & Machine Learning

Unit: 3
Supervised Learning

Dr. Raju
Assistant Professor & HoD, Department of CSE (AIML)

Course Details
B.Tech 3rd Sem. (Sec A), Online & Offline

Faculty Introduction

• Name: Dr. Raju
• Qualification: Ph.D.
• Experience: More than 9 years
• Subjects Taught: Neural Networks, DBMS, Object-Oriented Programming, Computer Graphics, COA, Digital Image Processing, Computer Applications

Course Outcomes (CO) and Bloom's Knowledge Level (KL)

After completion of this course, students will be able to:

CO1: Choose and apply the most suitable search algorithm for a given problem to find the goal state. (K3)
CO2: Comprehend and apply feature engineering and data visualization concepts. (K3)
CO3: Critically analyze the strengths and weaknesses of various regression and classification algorithms. (K5)
CO4: Develop approaches that incorporate appropriate clustering algorithms to solve a specific data clustering problem. (K3)
CO5: Analyze efficiency using ensemble learning techniques, probabilistic learning, and reinforcement learning algorithms. (K4)

Syllabus

Unit-I: Introduction to AI and Problem-Solving Methods
Introduction to AI and intelligent agents, different approaches to AI. Problem solving by searching techniques: Uninformed search (BFS, DFS, iterative deepening, bidirectional search); Informed search (heuristic search, Greedy Best-First Search, A* search); Local search algorithms (Hill Climbing and Simulated Annealing); Adversarial search and game playing (minimax, alpha-beta pruning); constraint satisfaction problems.

Unit-II: Machine Learning & Feature Engineering
Introduction to machine learning, types of machine learning. Feature engineering: features and their types, handling missing data, dealing with categorical features. Working with features: feature scaling, feature selection. Feature extraction: Principal Component Analysis (PCA) algorithm.

Unit-III: Supervised Learning
Regression and classification: types of regression (univariate, multivariate, polynomial), Mean Squared Error, R-squared, logistic regression. Regularization: bias and variance, overfitting and underfitting, L1 and L2 regularization, regularized linear regression. Decision trees (ID3, C4.5, CART), confusion matrix, k-fold cross-validation, K-Nearest Neighbour, Support Vector Machine.

Unit-IV: Unsupervised Machine Learning
Introduction to clustering. Types of clustering: K-means, K-mode, K-medoid; hierarchical clustering, single linkage, multiple linkage, AGNES and DIANA algorithms; Gaussian mixture models, density-based clustering, DBSCAN.

Unit-V: Ensemble & Reinforcement Learning
Probabilistic learning: Bayesian learning, Naive Bayes classifier, Bayesian belief networks. Ensemble learning: Random Forest, Gradient Boosting, XGBoost. Reinforcement learning: introduction to reinforcement learning, models of reinforcement learning (Markov decision process), Q-learning.
Course Contents / Syllabus

UNIT-III Supervised Learning 8 Hours


Regression & Classification: Types of regression (Univariate, Multivariate,
Polynomial), Mean Square Error, R square error, Logistic Regression,
Regularization: Bias and Variance, Overfitting and Underfitting, L1 and L2
Regularization, Regularized Linear Regression, Decision Trees (ID3, C4.5, CART),
Confusion matrix, k-folds cross-validation, K Nearest Neighbour, Support vector
machine.

Regression
• Regression is a supervised learning technique that finds the correlation between variables and enables us to predict a continuous output variable based on one or more predictor variables.
• Regression refers to a type of predictive modeling technique used to estimate the relationships among variables.
• It involves predicting a continuous outcome variable based on one or more predictor variables (features).
• It is a statistical method for modeling the relationship between a dependent (target) variable and one or more independent (predictor) variables.
• It predicts continuous/real values such as temperature, age, salary, price, etc.
• It is mainly used for prediction, forecasting, time-series modeling, and determining cause-and-effect relationships between variables.
Regression
• "Regression shows a line or curve that passes through all the datapoints on
target-predictor graph in such a way that the vertical distance between the
datapoints and the regression line is minimum."

Terminologies Related to the Regression Analysis
• Dependent Variable: The main factor in regression analysis that we want to predict or understand is called the dependent variable. It is also called the target variable.
• Independent Variable: The factors that affect the dependent variable, or that are used to predict its values, are called independent variables, also called predictors.
• Outliers: An outlier is an observation with a very low or very high value in comparison to the other observed values. Outliers can distort the fitted model, so they should be handled carefully.

Terminologies Related to the Regression Analysis
• Multicollinearity: If the independent variables are highly correlated with each other, the condition is called multicollinearity. It should not be present in the dataset because it creates problems when ranking the most influential variables.
• Underfitting and Overfitting: If our algorithm works well on the training dataset but not on the test dataset, the problem is called overfitting. If our algorithm does not perform well even on the training dataset, the problem is called underfitting.

Types of Regression

Type of Regression: Univariate
• Univariate regression refers to a statistical technique that analyzes the
relationship between a single independent variable (predictor) and a single
dependent variable (outcome). The goal is to model how changes in the
independent variable affect the dependent variable.
• Simple Linear Regression
Y = a + bX + ε
Use Case: Predicting outcomes like sales based on advertising spend
• Polynomial Regression
Y = a + b₁X + b₂X² + b₃X³ + ... + bₙXⁿ + ε
Use Case: Modeling relationships where the effect of the independent
variable changes at different levels, such as growth patterns that are
quadratic.
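
The two univariate forms above can be fitted directly with scikit-learn; the sketch below uses made-up data purely for illustration, fitting a simple linear model and a degree-2 polynomial model on the same single predictor.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

# Illustrative data: one predictor X, one target Y (values are invented)
X = np.array([[1], [2], [3], [4], [5], [6]], dtype=float)
Y = np.array([2.1, 4.3, 9.2, 16.1, 25.3, 36.2])

# Simple linear regression: Y = a + b*X
lin = LinearRegression().fit(X, Y)
print("linear: intercept a =", lin.intercept_, "slope b =", lin.coef_)

# Polynomial regression: Y = a + b1*X + b2*X^2 (degree-2 expansion of the same X)
X_poly = PolynomialFeatures(degree=2, include_bias=False).fit_transform(X)
quad = LinearRegression().fit(X_poly, Y)
print("quadratic: intercept a =", quad.intercept_, "coefficients b1, b2 =", quad.coef_)
```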
Type of Regression: Univariate
• Logarithmic Regression
Y = a + b·log(X) + ε
Use Case: Analyzing phenomena like the relationship between income and
consumption, where increases in income lead to smaller increases in
consumption.
• Exponential Regression
Y = a·e^(bX)
Use Case: Modeling population growth or radioactive decay.

Type of Regression: Multivariate Regression
• Multivariate regression involves the analysis of multiple independent
variables to predict a single dependent variable.
• Multiple Linear Regression
Y = a + b₁X₁ + b₂X₂ + ... + bₙXₙ + ε
Use Case: Predicting a person’s weight based on height, age, and
exercise frequency.
• Ridge Regression
Minimize: Σᵢ (yᵢ − ŷᵢ)² + λ · Σⱼ bⱼ², where λ is the regularization parameter.


Use Case: Useful when there are many predictors, and you want to reduce model
complexity.
Type of Regression: Multivariate Regression
• Lasso Regression
Minimize: Σᵢ (yᵢ − ŷᵢ)² + λ · Σⱼ |bⱼ|
Use Case: Effective for variable selection in models with a large number of
predictors.
• Elastic Net Regression
Minimize: Σᵢ (yᵢ − ŷᵢ)² + λ₁ · Σⱼ |bⱼ| + λ₂ · Σⱼ bⱼ²
• Use Case: Useful when there are many correlated variables and you want
a more robust model.

Mean Square Error (MSE)
• Mean Squared Error (MSE) is a common metric used to evaluate the performance of regression models. It measures the average squared difference between the actual (observed) values and the predicted values generated by a model. The formula for MSE is:

  MSE = (1/n) · Σᵢ (yᵢ − ŷᵢ)²

  where yᵢ is the actual value, ŷᵢ is the predicted value, and n is the number of observations.

Mean Square Error (MSE)

Observation | Actual (y) | Predicted (ŷ) | y − ŷ | (y − ŷ)²
1 | 3 | 2.5 | 0.5 | 0.25
2 | −0.5 | 0 | −0.5 | 0.25
3 | 2 | 2 | 0 | 0
4 | 7 | 8.5 | −1.5 | 2.25

MSE = (0.25 + 0.25 + 0 + 2.25) / 4 = 0.6875

• The Mean Squared Error (MSE) for this dataset is 0.6875. This value gives us an
indication of how well the predicted values match the actual values, with a lower MSE
representing better model performance.
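
As a cross-check, the MSE in the table above can be reproduced in a few lines, either from the formula directly or with scikit-learn's mean_squared_error:

```python
import numpy as np
from sklearn.metrics import mean_squared_error

y_actual = np.array([3.0, -0.5, 2.0, 7.0])   # actual values from the table
y_pred   = np.array([2.5,  0.0, 2.0, 8.5])   # predicted values from the table

# Direct formula: mean of the squared differences
mse_manual = np.mean((y_actual - y_pred) ** 2)

# Library version
mse_sklearn = mean_squared_error(y_actual, y_pred)

print(mse_manual, mse_sklearn)   # both print 0.6875
```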

Importance of MSE
• Performance Evaluation:
• MSE provides a quantitative measure of how well a regression model predicts
outcomes. A lower MSE indicates a better fit to the data, meaning the model's
predictions are closer to the actual values.
• Sensitivity to Outliers:
• Since MSE squares the errors, it gives greater weight to larger errors. This
sensitivity makes it effective for detecting models that may not perform well on
extreme values, although it can also make the metric overly influenced by outliers.
• Optimization Objective:
• Many machine learning algorithms, particularly those based on gradient descent
(e.g., linear regression, neural networks), use MSE as the loss function to minimize
during training. By minimizing MSE, models learn to make better predictions.

Importance of MSE
• Comparative Analysis:
• MSE allows for easy comparison between different models or algorithms. By
evaluating multiple models using MSE, practitioners can select the one with the best
performance based on this metric.
• Interpretable Metric:
• Although MSE itself is in squared units of the target variable, it is straightforward to
interpret. When paired with the square root (resulting in Root Mean Squared Error,
RMSE), it can be expressed in the same units as the target variable, enhancing
interpretability.

R-squared
• R-squared, also known as the coefficient of determination, is a statistical measure that
indicates how well the independent variables in a regression model explain the
variability of the dependent variable.
• It provides an indication of the goodness of fit of the model:

  R² = 1 − (SS_res / SS_tot)

  where SS_res = Σᵢ (yᵢ − ŷᵢ)² is the residual sum of squares and SS_tot = Σᵢ (yᵢ − ȳ)² is the total sum of squares.

Interpretation of R-squared
• Range: R-squared values range from 0 to 1.
• 0: Indicates that the model does not explain any variability in the
dependent variable (the mean of the dependent variable is the best
predictor).
• 1: Indicates that the model explains all the variability in the dependent
variable (perfect prediction).
• Value Interpretation:
• An R² value of 0.70, for example, suggests that 70% of the variability in the dependent variable can be explained by the independent variables in the model, while the remaining 30% is attributed to other factors or random noise.
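
A short sketch (with arbitrary example values) of how R-squared is computed in practice, both from the SS_res/SS_tot definition above and with scikit-learn's r2_score:

```python
import numpy as np
from sklearn.metrics import r2_score

y_actual = np.array([3.0, -0.5, 2.0, 7.0])
y_pred   = np.array([2.5,  0.0, 2.0, 8.5])

ss_res = np.sum((y_actual - y_pred) ** 2)            # residual sum of squares
ss_tot = np.sum((y_actual - y_actual.mean()) ** 2)   # total sum of squares
r2_manual = 1 - ss_res / ss_tot

print(r2_manual, r2_score(y_actual, y_pred))         # identical values
```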

Linear Regression (Beyond The Syllabus)
• Linear regression is a statistical regression method which is used for predictive analysis.
• It is one of the simplest algorithms; it works on regression and shows the relationship between continuous variables.
• It is used for solving regression problems in machine learning.
• Linear regression shows the linear relationship between the independent variable (X-axis) and the dependent variable (Y-axis), hence the name linear regression.
• If there is only one input variable (x), it is called simple linear regression; if there is more than one input variable, it is called multiple linear regression.
• For example, a linear regression model can predict the salary of an employee on the basis of years of experience.

Linear Regression:
• Some popular applications of linear regression are:
• Analyzing trends and sales estimates
• Salary forecasting
• Real estate prediction
• Arriving at ETAs in traffic.

Colab Link for Linear Regression:

[Link]
DDNR4vngO2EkZcfFaAM4jItw#scrollTo=0X7hGyLc11EZ

Logistic Regression:
• Logistic regression is a supervised learning algorithm used to solve classification problems.
• In classification problems, the dependent variable is in a binary or discrete format, such as 0 or 1.
• The logistic regression algorithm works with categorical variables such as 0 or 1, Yes or No, True or False, Spam or Not Spam, etc.
• It is a predictive analysis algorithm that works on the concept of probability.
• Logistic regression is a type of regression, but it differs from linear regression in how it is used.
• Logistic regression uses the sigmoid (logistic) function to map the model output to a probability between 0 and 1.
• The sigmoid function used to model the data in logistic regression can be represented as:

  f(x) = 1 / (1 + e^(−x))
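
As an illustrative sketch (the hours-studied vs. pass/fail data below is invented), the sigmoid and a binary logistic-regression fit with scikit-learn might look like this:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def sigmoid(z):
    """Logistic function: maps any real value into the (0, 1) range."""
    return 1.0 / (1.0 + np.exp(-z))

# Toy data: hours studied -> pass (1) / fail (0)
X = np.array([[0.5], [1.0], [1.5], [2.0], [2.5], [3.0], [3.5], [4.0]])
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])

model = LogisticRegression().fit(X, y)

# Predicted probability of passing for a student who studied 2.2 hours
print(model.predict_proba([[2.2]])[0, 1])
print(sigmoid(0.0))   # 0.5 -- the midpoint of the sigmoid
```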

Logistic Regression:

Types of Logistic Regression:

• Binary (0/1, pass/fail)
• Multinomial (cats, dogs, lions)
• Ordinal (low, medium, high)

Binary logistic regression
• Binary logistic regression predicts the relationship between the
independent and binary dependent variables.
• Some examples of the output of this regression type may be,
success/failure, 0/1, or true/false.
• Examples:
• Deciding on whether or not to offer a loan to a bank customer:
Outcome = yes or no.
• Evaluating the risk of cancer: Outcome = high or low.
• Predicting a team’s win in a football match: Outcome = yes or no.

Multinomial logistic regression
• In multinomial logistic regression, the categorical dependent variable has three or more discrete, unordered outcomes. This implies that this regression type has more than two possible outcomes.
• Examples:
• Let’s say you want to predict the most popular transportation type for
2040. Here, transport type equates to the dependent variable, and the
possible outcomes can be electric cars, electric trains, electric buses, and
electric bikes.
• Predicting whether a student will join a college, vocational/trade school,
or corporate industry.
• Estimating the type of food consumed by pets, the outcome may be wet
food, dry food, or junk food.
Ordinal logistic regression

• Ordinal logistic regression applies when the dependent variable is in an ordered state (i.e., ordinal).
• The dependent variable (y) specifies an order with two or
more categories or levels.
• Examples: Dependent variables represent,
• Formal shirt size: Outcomes = XS/S/M/L/XL
• Survey answers: Outcomes = Agree/Disagree/Unsure
• Scores on a math test: Outcomes = Poor/Average/Good

Key advantages of logistic regression

Colab Link for Logistic Regression:

• Case Study: Predicting Diabetes Using Logistic Regression


• [Link]
I_T6uSz2CphiMcLKDYo#scrollTo=IHo9w80zCKoR

Regularization

• Regularization is a technique used in machine learning and statistics to prevent overfitting, which occurs when a model learns the noise in the training data instead of the underlying patterns.
• Purpose:
• To improve model generalization and performance on unseen data by reducing
overfitting.
• Common Types:
• L1 Regularization (Lasso): Adds a penalty equal to the absolute value of the
coefficients. It can produce sparse models by driving some coefficients to zero,
effectively selecting features.
• L2 Regularization (Ridge): Adds a penalty equal to the square of the coefficients. It
tends to shrink coefficients evenly, preventing any one feature from having too
much influence.
• Elastic Net: Combines both L1 and L2 penalties, allowing for feature selection and
coefficient shrinkage.
Regularization

• Techniques:
• Dropout: Randomly sets a fraction of neurons to zero during training in
neural networks, which helps prevent co-adaptation.
• Early Stopping: Involves monitoring the model's performance on a
validation set and stopping training when performance begins to
degrade.
• Benefits:
• Helps to avoid overfitting.
• Encourages simpler models, which are often more interpretable.
• Can improve prediction accuracy on new data.

Regularization: Bias and Variance

• An error is a measure of how accurately an algorithm can make predictions for a previously unknown dataset.
• Reducible errors: These errors can be reduced to improve model accuracy. They can be further classified into bias and variance.
• Irreducible errors: These errors will always be present in the model, regardless of the algorithm used, because they arise from noise inherent in the data.

Regularization: Bias and Variance

• Bias: The difference between the model's average predictions and the actual (expected) values is known as bias error, or error due to bias.
• Low Bias: A low-bias model makes fewer assumptions about the form of the target function.
• High Bias: A high-bias model makes more assumptions and becomes unable to capture the important features of the dataset. A high-bias model also cannot perform well on new data.
• Some examples of machine learning algorithms with low bias are Decision Trees, k-Nearest Neighbours, and Support Vector Machines.
• Algorithms with high bias include Linear Regression, Linear Discriminant Analysis, and Logistic Regression.

Regularization: Bias and Variance

• Ways to reduce High Bias:


• Increase the input features as the model is underfitted.
• Decrease the regularization term.
• Use more complex models, such as including some polynomial
features.

Regularization: Bias and Variance

• Variance
• Variance specifies how much the prediction of the target function would change if a different training dataset were used.
• It tells how much a random variable differs from its expected value.
• Ideally, a model should not vary too much from one training dataset to another, which means the algorithm should be good at understanding the hidden mapping between input and output variables.
• Variance errors are either low variance or high variance.
• Low variance means there is only a small variation in the prediction of the target function when the training dataset changes.
• High variance shows a large variation in the prediction of the target function with changes in the training dataset.

Regularization: Bias and Variance

• A model with high variance has the following problems:
• A high-variance model leads to overfitting.
• It increases model complexity.
• Ways to Reduce High Variance:
• Reduce the number of input features or parameters, as the model is overfitted.
• Do not use an overly complex model.
• Increase the training data.
• Increase the regularization term.

Regularization: Bias and Variance
• Different Combinations of Bias-Variance
• Low-Bias, Low-Variance:
• The combination of low bias and low variance shows an ideal machine learning model.
However, it is not possible practically.
• Low-Bias, High-Variance:
• With low bias and high variance, model predictions are inconsistent but accurate on average. This case occurs when the model learns with a large number of parameters, which leads to overfitting.
• High-Bias, Low-Variance:
• With high bias and low variance, predictions are consistent but inaccurate on average. This case occurs when a model does not learn well from the training dataset or uses too few parameters. It leads to underfitting.
• High-Bias, High-Variance:
• With high bias and high variance, predictions are inconsistent and also inaccurate on average.
Regularization

Regularization: Overfitting

• Overfitting
• Overfitting is an undesirable machine learning behavior that occurs when the
machine learning model gives accurate predictions for training data but not
for new data.
• High variance and low bias.
• Reasons for Overfitting
• The training data size is too small and does not contain enough data samples
to accurately represent all possible input data values.
• The training data contains large amounts of irrelevant information, called
noisy data.
• The model trains for too long on a single sample set of data.
• The model complexity is high, so it learns the noise within the training data.
Regularization: Overfitting

• Symptom of Overfitting:
• Low training error but high validation error.
• The model fits the training data very well but fails to generalize to new data.

Regularization: Underfitting

• Underfitting
• It occurs when a model is too simple to capture the complexities of the data.
• It represents the inability of the model to learn the training data effectively, resulting in poor performance on both the training and testing data.
• In simple terms, an underfit model's predictions are inaccurate, especially when applied to new, unseen examples.
• It mainly happens when we use a very simple model with overly simplified assumptions.
• To address the underfitting problem, we need to use more complex models, with enhanced feature representation and less regularization.
• An underfitting model has high bias and low variance.

Regularization: Underfitting

• Reasons for Underfitting


• The model is too simple, so it may not be capable of representing the complexities in the data.
• The input features used to train the model are not adequate representations of the underlying factors influencing the target variable.
• The size of the training dataset is not large enough.
• Excessive regularization is used to prevent overfitting, which constrains the model from capturing the data well.
• Features are not scaled.

Regularization: Underfitting

• Techniques to Reduce Underfitting


• Increase model complexity.
• Increase the number of features, performing feature engineering.
• Remove noise from the data.
• Increase the number of epochs or increase the duration of training
to get better results.
• Symptoms:
• High training error and high validation error.
• The model does not fit the training data well, leading to poor
predictions even on known data.
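
To make the overfitting/underfitting symptoms above concrete, here is an illustrative sketch on synthetic data (not from the slides): polynomials of increasing degree are fitted to noisy samples, and training vs. validation MSE are compared.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

rng = np.random.RandomState(0)
X = np.sort(rng.uniform(0, 1, 60)).reshape(-1, 1)
y = np.sin(2 * np.pi * X).ravel() + rng.normal(scale=0.2, size=60)  # noisy sine curve

X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.3, random_state=0)

for degree in (1, 4, 15):   # underfit, reasonable fit, overfit
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X_train, y_train)
    train_err = mean_squared_error(y_train, model.predict(X_train))
    val_err = mean_squared_error(y_val, model.predict(X_val))
    # Underfit: both errors high. Overfit: low training error, higher validation error.
    print(f"degree={degree:2d}  train MSE={train_err:.3f}  validation MSE={val_err:.3f}")
```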
Regularization: Underfitting vs. Overfitting

Regularization: L1 (Lasso)

• LASSO (Least Absolute Shrinkage and Selection Operator) is a regression analysis method that incorporates L1 regularization.
• It's particularly useful for models that can benefit from feature selection and addressing multicollinearity.

Key Components of L1 (Lasso)

• L1 Regularization: LASSO adds a penalty equal to the absolute value of the coefficients, multiplied by a tuning parameter (λ), to the loss function. The objective function becomes:

  Minimize: Σᵢ (yᵢ − ŷᵢ)² + λ · Σⱼ |βⱼ|

• Feature Selection: LASSO tends to shrink some coefficients to exactly zero when the tuning parameter λ is sufficiently large. This means it effectively selects a simpler model by excluding some features, which can improve interpretability.

Key Components of L1 (Lasso)

• Bias-Variance Tradeoff: By introducing the penalty, LASSO can help reduce overfitting, making the model more generalizable to unseen data. However, this can introduce some bias.

• Tuning Parameter (λ): The choice of λ is crucial. A smaller λ leads to a model closer to ordinary least squares (OLS), while a larger λ increases the amount of shrinkage and feature selection. Techniques like cross-validation are commonly used to find an optimal λ.

• Applications: LASSO is widely used in scenarios with many predictors, especially when some of them might be irrelevant or redundant. It's popular in fields like genetics, finance, and machine learning.
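
A brief sketch of the feature-selection behaviour described above, using scikit-learn's Lasso on synthetic "house-price" data with hypothetical feature names: as alpha (the λ of the slides) grows, more coefficients are driven exactly to zero.

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.preprocessing import StandardScaler

rng = np.random.RandomState(42)
n = 200
# Synthetic house features: only size and bedrooms truly drive the price here
size      = rng.normal(1500, 300, n)
bedrooms  = rng.randint(1, 6, n)
age       = rng.uniform(0, 50, n)
noise_col = rng.normal(0, 1, n)          # an irrelevant feature

X = np.column_stack([size, bedrooms, age, noise_col])
y = 200 * size + 10000 * bedrooms + rng.normal(0, 20000, n)   # "price"

X_scaled = StandardScaler().fit_transform(X)   # LASSO is sensitive to feature scale

for alpha in (10, 1000, 50000):
    coefs = Lasso(alpha=alpha, max_iter=100000).fit(X_scaled, y).coef_
    print(f"alpha={alpha:>6}: coefficients = {np.round(coefs, 1)}")
# Larger alpha -> more coefficients shrink to exactly 0 (irrelevant features dropped)
```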
Example of L1 (Lasso)

• Example Scenario

• Imagine you have a dataset with information about houses, and you
want to predict the price based on various features:

• Features
• Size (square feet)
• Number of bedrooms
• Age of the house
• Number of bathrooms
• Proximity to downtown
• Garage size
Example of L1 (Lasso)

• Dataset

• [Link]
1jeqwzMN5hpfasIFu6J9vub0BSBfonQEm#scrollTo=Dw5LvNSEy0nY
Regularization: L1 (Lasso)

• [Link]
1pD_uCPkmYla71GerEoN56izmX_ubaol0#scrollTo=3LMS-djesEHk

Regularization: L2 (Ridge)
• L2 regularization is a machine learning technique that avoids overfitting by introducing a penalty term into the model's loss function based on the squares of the model's parameters.
• The goal of L2 regularization is to keep the model's parameters small and prevent them from growing too large.
• To achieve L2 regularization, a term proportional to the squares of the model's parameters is added to the loss function.
• This term acts as a constraint on the parameters' size, preventing them from growing out of control.
• The size of the penalty term is controlled by a hyperparameter called lambda (λ), which sets the regularization's intensity: the larger the lambda, the stronger the regularization and the smaller the parameters.
Regularization: L2 (Ridge)

• A regression model that uses the L2 regularization technique is called Ridge regression.
• Ridge regression adds the "squared magnitude" of the coefficients as a penalty term to the loss function (L):

  Minimize: L + λ · Σⱼ βⱼ²
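
A minimal Ridge sketch on synthetic data (assumed values, not from the slides): increasing alpha (λ) shrinks all coefficients toward zero but, unlike LASSO, typically does not set any of them exactly to zero.

```python
import numpy as np
from sklearn.linear_model import Ridge, LinearRegression

rng = np.random.RandomState(0)
X = rng.normal(size=(100, 5))
true_coefs = np.array([5.0, -3.0, 2.0, 0.0, 0.0])
y = X @ true_coefs + rng.normal(scale=0.5, size=100)

print("OLS             :", np.round(LinearRegression().fit(X, y).coef_, 3))
for alpha in (1, 10, 100):
    print(f"Ridge (alpha={alpha:>3}):", np.round(Ridge(alpha=alpha).fit(X, y).coef_, 3))
# Coefficients get smaller as alpha grows, but none become exactly zero
```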

Regularization: L1(Lasso) and L2 (Ridge)

• Link for Code:


• [Link]
1YkOjpnMJdYWvn0oqjJUCKebbqredDlFH#scrollTo=ityALR23ZF-R

• Link for Dataset:


• [Link]

Benefits of Regularization

• Reduces Overfitting: Regularization helps prevent models from learning noise and irrelevant details in the training data.
• Improves Generalization: By discouraging complex models,
regularization ensures better performance on unseen data.
• Enhances Stability: Regularization stabilizes model training by
penalizing large weights.
• Enables Feature Selection: L1 regularization can zero out some
coefficients, effectively selecting more relevant features.
• Manages Multicollinearity: Reduces the problem of high correlations
among features, particularly useful in linear models.

Benefits of Regularization

• Encourages Simplicity: Promotes simpler models that are easier to interpret and less likely to overfit.
• Controls Model Complexity: Provides a mechanism to balance the
complexity of the model with its performance on the training and
test data.
• Facilitates Robustness: Makes models less sensitive to individual
peculiarities in the training set.
• Improves Convergence: Helps optimization algorithms converge
more quickly and reliably by smoothing the error landscape.
• Adjustable Complexity: The strength of regularization can be tuned
to fit the data's specific needs and desired model complexity.

Confusion Matrix
• A confusion matrix is a tool used in machine learning to evaluate the performance of
a classification model.
• It provides a visual representation of the actual versus predicted classifications,
helping to understand the types of errors made by the model.

Confusion Matrix

• True Positive (TP): The number of correct predictions that an instance is positive.
• True Negative (TN): The number of correct predictions that an
instance is negative.
• False Positive (FP): The number of incorrect predictions where an
instance is predicted as positive, but it is actually negative (Type I
error).
• False Negative (FN): The number of incorrect predictions where an
instance is predicted as negative, but it is actually positive (Type II
error).
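
A small sketch of obtaining these four counts with scikit-learn's confusion_matrix; the label vectors below are invented for illustration.

```python
from sklearn.metrics import confusion_matrix

# Hypothetical actual and predicted labels (1 = positive, 0 = negative)
y_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0, 1, 0]

# For binary labels ordered [0, 1], confusion_matrix returns [[TN, FP], [FN, TP]]
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(f"TP={tp}  TN={tn}  FP={fp}  FN={fn}")
```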

Confusion Matrix

                     Predicted Positive   Predicted Negative
Actual Positive            TP                   FN
Actual Negative            FP                   TN
Key Metrics Derived from the Confusion Matrix

• Accuracy = (TP + TN) / (TP + TN + FP + FN)
• Precision = TP / (TP + FP)
• Recall (Sensitivity) = TP / (TP + FN)
• F1-Score = 2 · (Precision · Recall) / (Precision + Recall)
Case Study: Medical Diagnosis for Diabetes

• Context: A healthcare provider develops a machine learning model to predict whether patients have diabetes based on various health metrics. After training and validating the model, they evaluate its performance using a confusion matrix.
• Data Overview
• The model was tested on a dataset of 1,000 patients, with the following results:
• Actual Diabetic Patients: 150
• Actual Non-Diabetic Patients: 850
• Confusion Matrix:

                              Predicted Diabetic   Predicted Non-Diabetic
  Actual Diabetic (150)            120 (TP)               30 (FN)
  Actual Non-Diabetic (850)         50 (FP)              800 (TN)
Case Study: Medical Diagnosis for Diabetes

• Interpretation of the Confusion Matrix


• True Positives (TP): 120 patients were correctly identified as diabetic.
• False Negatives (FN): 30 patients were diabetic but were incorrectly
classified as non-diabetic.
• False Positives (FP): 50 patients were classified as diabetic, but they were
actually non-diabetic.
• True Negatives (TN): 800 patients were correctly identified as non-diabetic

Case Study: Medical Diagnosis for Diabetes
• Key Metrics
• Accuracy = 92%
• Precision = 70.6%
• Recall = 80%
• F1-Score = 75%
• Insights and Analysis
• High Accuracy: The model has a high accuracy of 92%, which might seem promising, but
accuracy alone does not give a complete picture, especially in medical diagnosis where the
costs of false negatives can be significant.
• Precision vs. Recall: With a precision of 70.6%, the model does reasonably well when it predicts diabetes. However, a recall of 80% means that 20% of diabetic patients (30 out of 150) are being missed, which can be critical in healthcare settings.
• Focus on Recall: Given that missing a diabetic patient can lead to serious health
complications, healthcare providers might prioritize improving recall, even if it results in a
lower precision (increased false positives).
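
The key metrics quoted above follow directly from the counts TP = 120, TN = 800, FP = 50, FN = 30; a quick sketch to reproduce them:

```python
tp, tn, fp, fn = 120, 800, 50, 30

accuracy  = (tp + tn) / (tp + tn + fp + fn)                  # 0.92
precision = tp / (tp + fp)                                   # ~0.706
recall    = tp / (tp + fn)                                    # 0.80
f1        = 2 * precision * recall / (precision + recall)     # ~0.75

print(f"Accuracy={accuracy:.1%}  Precision={precision:.1%}  "
      f"Recall={recall:.1%}  F1={f1:.1%}")
```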
K-fold cross-validation
• Data Partitioning: K-fold cross-validation divides the dataset into
k equal-sized subsets (or "folds"), allowing the model to be trained
and validated multiple times. Each fold serves as the validation set
once while the remaining folds are used for training.
• Performance Estimation: The model is trained and evaluated k
times, and the performance metrics (like accuracy, precision, recall,
etc.) are averaged across all folds. This provides a more reliable
estimate of the model's performance on unseen data.
• Mitigation of Overfitting: By using multiple train-test splits, k-
fold cross-validation helps reduce overfitting, ensuring that the
model generalizes well to new data rather than just memorizing the
training set.
K-fold cross-validation
• Flexibility in Choice of k: The value of k can be adjusted based on the size of
the dataset and the computational resources available. Common choices are k
= 5 or k = 10, but smaller or larger values can be used depending on specific
needs.
• Stratification Option: In cases of imbalanced datasets, stratified k-fold cross-
validation can be employed to ensure that each fold has a similar distribution
of classes, which helps maintain the representativeness of each fold.

How K-Fold Cross-Validation Works
• Data Splitting:
• The entire dataset is randomly divided into k subsets or "folds" of
approximately equal size.
• Common values for k are 5 or 10, but it can be adjusted based on the size
of the dataset.
• Model Training and Validation:
• The model is trained and validated k times.
• In each iteration:
• One fold is used as the validation set, while the remaining k-1 folds are used for
training.
• This process ensures that every instance in the dataset is used for both training and
validation at least once.

How K-Fold Cross-Validation Works
• Performance Measurement:
• After completing all k iterations, the performance metrics (such as
accuracy, precision, recall, F1 score, etc.) are averaged across all
iterations.
• This average provides a more comprehensive view of the model's
performance.
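
A short sketch of 5-fold and stratified 5-fold cross-validation with scikit-learn; the dataset and model here (breast-cancer data, logistic regression with scaling) are chosen only for illustration.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# Plain k-fold: 5 folds, each used once as the validation set
kf = KFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(model, X, y, cv=kf, scoring="accuracy")
print("k-fold accuracies:", scores.round(3), "mean:", scores.mean().round(3))

# Stratified k-fold keeps the class ratio similar in every fold (useful for imbalance)
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(model, X, y, cv=skf, scoring="accuracy")
print("stratified accuracies:", scores.round(3), "mean:", scores.mean().round(3))
```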

Advantages of K-Fold Cross-Validation
• Better Generalization: By using multiple train-test splits, k-fold
cross-validation reduces the risk of overfitting and provides a better
estimate of how the model will perform on unseen data.
• Efficient Use of Data: Since each instance in the dataset is used
for both training and validation, it maximizes the utilization of
available data, which is particularly important for smaller datasets.
• Variance Reduction: Averaging the results over multiple folds
helps to smooth out the variability that can occur with a single
train-test split.

Disadvantages of K-Fold Cross-Validation

• Computationally Intensive: K-fold cross-validation can be computationally expensive, especially for large datasets or complex models, as the model must be trained k times.
• Choice of K: Selecting an appropriate value for k is crucial. A very
small k may lead to high variance in the performance estimate,
while a very large k can be computationally expensive.

Regularization

• Regularization: Bias and Variance, Overfitting and Underfitting, L1 and L2 Regularization, Regularized Linear Regression, Decision Trees (ID3, C4.5, CART), Confusion matrix, k-fold cross-validation, K Nearest Neighbour, Support Vector Machine.
