Classification: Logistic Regression
Adama Science and Technology University
School of Electrical Engineering and Computing
Department of Computer Science & Engineering
Dr. Mesfin Abebe Haile (2022)
Underfitting and Overfitting
Can we prevent overfitting while still using a complicated model?
Preventing Under- and Overfitting (Regularization)
Overfitting occurs when a machine learning algorithm captures the noise in the data (low bias but high variance).
Underfitting occurs when a machine learning algorithm cannot capture the underlying trend of the data (high bias but low variance).
Preventing Under- and Overfitting (Regularization)
Regularization is a form of regression that constrains/regularizes, or shrinks, the coefficient estimates towards zero.
This technique discourages learning a more complex or flexible model, so as to avoid the risk of overfitting.
λ is the tuning parameter that decides how much we want to penalize the flexibility of our model, as sketched below.
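As a sketch, the regularized least-squares objective adds a penalty term, weighted by λ, to the usual squared error (β denotes the vector of model coefficients):
\[
J(\beta) = \sum_{i=1}^{n} \bigl(y_i - \hat{y}_i\bigr)^2 + \lambda\,\Omega(\beta)
\]
where Ω(β) is the penalty, e.g. the sum of squared coefficients for ridge or the sum of their absolute values for lasso.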
Ridge Regularization (L2)
Penalizing big coefficients in our 9th-degree polynomial.
The resulting algorithm is called “Ridge Regression (L2)”.
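A sketch of the ridge (L2) objective, assuming coefficients β_j and penalty weight λ:
\[
J_{\text{ridge}}(\beta) = \sum_{i=1}^{n} \bigl(y_i - \hat{y}_i\bigr)^2 + \lambda \sum_{j=1}^{p} \beta_j^{2}
\]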
Effect of Ridge Regression on Parameters
Penalizing big coefficients in our 9th-degree polynomial.
The resulting algorithm is called “Ridge Regression (L2)”.
Lasso Regression (L1)
A different regularization term can be the L1 norm instead of L2; it takes the sum of absolute values (sketched below).
Larger values are still penalized, but not as strongly.
The effect is that it selectively shrinks some coefficients, some of them all the way to zero.
It is slower to converge than ridge regression.
It can be used for feature selection.
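A sketch of the lasso (L1) objective, which penalizes the sum of absolute coefficient values:
\[
J_{\text{lasso}}(\beta) = \sum_{i=1}^{n} \bigl(y_i - \hat{y}_i\bigr)^2 + \lambda \sum_{j=1}^{p} |\beta_j|
\]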
Effect of Lasso Regression on Parameters
Some coefficients can stay high, while others are pushed close to zero.
Elastic Net Regression
A compromise between ridge and lasso regression.
The idea is to get the best of both techniques.
The mixture is determined by λ1 and λ2, as sketched below.
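A sketch of the elastic net objective, mixing the two penalties with weights λ1 and λ2:
\[
J_{\text{EN}}(\beta) = \sum_{i=1}^{n} \bigl(y_i - \hat{y}_i\bigr)^2 + \lambda_1 \sum_{j=1}^{p} |\beta_j| + \lambda_2 \sum_{j=1}^{p} \beta_j^{2}
\]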
Regularization Coefficients
Regularization coefficients (λ1 and λ2) have to be determined empirically.
We cannot use the test data for tuning.
Create an additional split of the data for tuning, called the validation set.
Train and cross-validate with several lambda choices and find the best one.
Then test on the test set to make sure the results generalize (a sketch of this workflow follows).
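A minimal sketch of this workflow, assuming a ridge model; the data, split sizes, and alpha grid are illustrative.

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error

# Illustrative data; in practice X and y come from your dataset.
X = np.random.randn(200, 5)
y = X @ np.array([1.0, 0.0, -2.0, 0.5, 0.0]) + 0.1 * np.random.randn(200)

# Hold out a test set, then split the remainder into train and validation sets.
X_rest, X_test, y_rest, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X_rest, y_rest, test_size=0.25, random_state=0)

# Train with several lambda (alpha) choices and pick the best one on the validation set.
best_alpha, best_mse = None, np.inf
for alpha in [0.01, 0.1, 1.0, 10.0, 100.0]:
    model = Ridge(alpha=alpha).fit(X_train, y_train)
    mse = mean_squared_error(y_val, model.predict(X_val))
    if mse < best_mse:
        best_alpha, best_mse = alpha, mse

# Refit with the chosen alpha and check that the results generalize on the test set.
final_model = Ridge(alpha=best_alpha).fit(X_rest, y_rest)
print("best alpha:", best_alpha, "test MSE:", mean_squared_error(y_test, final_model.predict(X_test)))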
Ridge Regression: the Syntax
Import the class containing the regression method.
from sklearn.linear_model import Ridge
Create an instance of the class.
RR = Ridge(alpha=1.0) # alpha is the regularization parameter
Fit the instance on the data and then predict the expected value.
RR = RR.fit(x_train, y_train)
y_predict = RR.predict(x_test)
The RidgeCV class will perform cross-validation on a set of values for alpha (a brief example follows).
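A minimal RidgeCV sketch, assuming the x_train and y_train arrays above; the alpha grid is illustrative.

from sklearn.linear_model import RidgeCV
RRcv = RidgeCV(alphas=[0.01, 0.1, 1.0, 10.0]) # candidate regularization strengths
RRcv = RRcv.fit(x_train, y_train)
print(RRcv.alpha_) # alpha selected by cross-validation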
Lasso Regression: the Syntax
Import the class containing the regression method.
from sklearn.linear_model import Lasso
Create an instance of the class.
LR = Lasso(alpha=1.0) # alpha is the regularization parameter
Fit the instance on the data and then predict the expected value.
LR = LR.fit(x_train, y_train)
y_predict = LR.predict(x_test)
The LassoCV class will perform cross-validation on a set of values for alpha.
Elastic Net Regression: the Syntax
Import the class containing the regression method.
from sklearn.linear_model import ElasticNet
Create an instance of the class.
EN = ElasticNet(alpha=1.0, l1_ratio=0.5) # alpha is the regularization parameter; l1_ratio distributes alpha between L1 and L2 (l1_ratio=0 is ridge, l1_ratio=1 is lasso)
Fit the instance on the data and then predict the expected value.
EN = EN.fit(x_train, y_train)
y_predict = EN.predict(x_test)
The ElasticNetCV class will perform cross-validation on a set of values for l1_ratio and alpha (a brief example follows).
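A minimal ElasticNetCV sketch, assuming the same x_train and y_train; the grids are illustrative.

from sklearn.linear_model import ElasticNetCV
ENcv = ElasticNetCV(alphas=[0.01, 0.1, 1.0], l1_ratio=[0.2, 0.5, 0.8], cv=5)
ENcv = ENcv.fit(x_train, y_train)
print(ENcv.alpha_, ENcv.l1_ratio_) # values chosen by cross-validation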
Feature Selection
Regularization performs feature selection by shrinking the contribution of features.
It reduces the contribution of features to the model so that the model does not overfit.
For L1 regularization, this is accomplished by driving some coefficients to zero (see the sketch after this list).
Feature selection can also be performed by removing features.
This would have to be done qualitatively, say by removing one feature at a time and measuring the predictive results via cross-validation.
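A minimal sketch of L1-driven feature selection on illustrative synthetic data; the features with nonzero coefficients are the ones the model keeps.

import numpy as np
from sklearn.linear_model import Lasso

X = np.random.randn(100, 6)
y = 3.0 * X[:, 0] - 2.0 * X[:, 3] + 0.1 * np.random.randn(100)  # only features 0 and 3 matter

lasso = Lasso(alpha=0.1).fit(X, y)
print(lasso.coef_)                      # several coefficients are driven to exactly zero
selected = np.flatnonzero(lasso.coef_)  # indices of the features the lasso kept
print(selected)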
Why is Feature Selection Important?
Reducing the number of features is another way to prevent overfitting (similar to regularization), since each feature carries some signal and some noise.
For some models, fewer features can improve fitting time and/or results, particularly models that do not have regularization built in.
Identifying critical features can improve model interpretability.
Feature elimination can also be used to identify the most important features, which can improve model interpretability.
Recursive Feature Elimination (RFE): the Syntax
Import the class containing the feature selection method.
from sklearn.feature_selection import RFE
Create an instance of the class.
rfeMod = RFE(est, n_features_to_select=5) # est is an instance of the model to use (e.g. Lasso); 5 is the final number of features
Fit the instance on the data and then predict the expected value.
rfeMod = rfeMod.fit(x_train, y_train)
y_predict = rfeMod.predict(x_test)
The RFECV class will perform feature elimination using cross-validation (a brief example follows).
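A minimal RFECV sketch, assuming the same x_train and y_train; the estimator and cv value are illustrative.

from sklearn.feature_selection import RFECV
from sklearn.linear_model import Lasso
rfecvMod = RFECV(Lasso(alpha=0.1), cv=5) # cross-validation chooses how many features to keep
rfecvMod = rfecvMod.fit(x_train, y_train)
print(rfecvMod.n_features_)              # number of features selected
print(rfecvMod.support_)                 # boolean mask of the kept features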
Logistic Regression
Logistic Regression
Many of the problems we face in real life require an optimized solution.
Logistic regression finds best-fit parameters for a nonlinear function called the sigmoid (shown below).
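The sigmoid (logistic) function maps any real input to the interval (0, 1):
\[
\sigma(z) = \frac{1}{1 + e^{-z}}
\]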
Logistic Regression
We’ll use optimization algorithms to find these best-fit parameters.
Among optimization algorithms, one of the most common is gradient descent/ascent.
Gradient ascent can be simplified with stochastic gradient descent/ascent, which updates the parameters one example at a time.
These optimization algorithms will be used to train our classifier (a sketch follows).
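A minimal sketch of batch gradient ascent on the logistic log-likelihood, assuming a feature matrix X with a bias column and binary labels y; the learning rate and iteration count are illustrative.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gradient_ascent(X, y, lr=0.01, n_iter=1000):
    # Maximize the logistic log-likelihood; its gradient is X^T (y - sigmoid(Xw)).
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        error = y - sigmoid(X @ w)  # difference between labels and predicted probabilities
        w += lr * (X.T @ error)     # step in the direction of the gradient (ascent)
    return w

# Illustrative usage with a bias column prepended to a single feature.
X = np.column_stack([np.ones(4), [0.0, 1.0, 2.0, 3.0]])
y = np.array([0, 0, 1, 1])
print(gradient_ascent(X, y))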
Logistic Regression
General approach to logistic regression:
Collect: any method.
Prepare: numeric values are needed for a distance calculation; a structured data format is best.
Analyze: any method.
Train: take time to find the optimal coefficients to classify the data.
Test: classification is quick and easy once the training step is done.
Use: this application needs to get some input data and output structured numeric values.
Logistic Regression
Stochastic gradient ascent can do as well as gradient ascent while using far fewer computing resources.
Logistic regression:
Pros: computationally inexpensive, easy to implement, knowledge representation is easy to interpret.
Cons: prone to underfitting; may have low accuracy.
Works with: numeric values, nominal values.
Logistic Regression
We can treat a binary classification problem as a regression problem.
Encode the binary classes as 1/0 and fit a regression algorithm.
When a new unlabeled record comes in, guess its value with the regression algorithm.
If the value is higher than the midpoint (0.5), declare the prediction to be class 1, and vice versa (a sketch follows).
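A minimal sketch of this thresholding idea using ordinary linear regression; the data here are illustrative.

import numpy as np
from sklearn.linear_model import LinearRegression

X = np.array([[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]])
y = np.array([0, 0, 0, 1, 1, 1])            # binary labels encoded as 0/1

reg = LinearRegression().fit(X, y)
scores = reg.predict(np.array([[1.5], [3.5]]))
labels = (scores > 0.5).astype(int)         # declare class 1 when above the 0.5 midpoint
print(scores, labels)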
Logistic Regression
Linear regression for classification:
Logistic Regression
What if our data looks different, as shown below? The regression line becomes more slanted.
Logistic Regression
Linear regression for classification:
Logistic Regression
This function (the sigmoid) always takes values between 0 and 1.
It smooths out the effect of very high or very low x values.
Logistic Regression
This way, our algorithm is not skewed by the sample and it manages to find the obvious visual threshold.
Logistic Regression
So instead of trying to fit y = b0 + b1x, we can try to fit y = f(b0 + b1x), where f is the sigmoid (written out below).
The resulting algorithm is called “logistic regression”.
It is not a regression algorithm; it is a classification algorithm.
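Written out with the sigmoid, the model estimates the probability of class 1:
\[
\hat{y}(x) = \sigma(b_0 + b_1 x) = \frac{1}{1 + e^{-(b_0 + b_1 x)}}
\]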
Logistic Regression
ŷ(x) is always going to be between 0 and 1.
This is as opposed to the linear regression approach, which can take on any value.
Logistic Regression: the Syntax
Import the class containing the classification method.
from sklearn.linear_model import LogisticRegression
Create an instance of the class.
LR = LogisticRegression(penalty='l2', C=10.0) # regularization parameters [penalty is the norm of the penalty term; higher C means less regularization]
Fit the instance on the data and then predict the expected value.
LR = LR.fit(x_train, y_train)
y_predict = LR.predict(x_test)
Tune the regularization parameters with cross-validation: LogisticRegressionCV (a brief example follows).
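A minimal LogisticRegressionCV sketch, assuming the same x_train and y_train; the grid of C values is illustrative.

from sklearn.linear_model import LogisticRegressionCV
LRcv = LogisticRegressionCV(Cs=[0.01, 0.1, 1.0, 10.0], cv=5, penalty='l2')
LRcv = LRcv.fit(x_train, y_train)
print(LRcv.C_) # C value(s) selected by cross-validation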
Classification Error Metrics
Error Metrics
The choice of the “right” error metric depends heavily on the question and the data.
For example, assume we are classifying patients likely to get leukemia.
In our training data, a large majority (99%) of patients are healthy; only 1% of patients have leukemia.
Accuracy is often not the right metric for a binary classification problem (see the example below).
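A minimal sketch of why accuracy can mislead on skewed classes; the data here are synthetic and illustrative.

import numpy as np
from sklearn.metrics import accuracy_score, recall_score

y_true = np.array([0] * 99 + [1])  # 99 healthy patients, 1 patient with leukemia
y_pred = np.zeros(100, dtype=int)  # a "classifier" that always predicts healthy

print(accuracy_score(y_true, y_pred))  # 0.99 -- looks great
print(recall_score(y_true, y_pred))    # 0.0  -- misses the leukemia case entirely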
Error Metrics
When thinking about error in classification, we often talk about a confusion matrix, sketched below.
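For a binary problem, the confusion matrix counts the four possible outcomes: true positives (TP), false negatives (FN), false positives (FP), and true negatives (TN):
\[
\begin{array}{c|cc}
 & \text{Predicted positive} & \text{Predicted negative} \\ \hline
\text{Actual positive} & TP & FN \\
\text{Actual negative} & FP & TN
\end{array}
\]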
Error Metrics
Accuracy: the fraction of predictions that are correct. It is the most common error metric, but it can be deceiving when the class populations are skewed.
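In terms of the confusion matrix counts:
\[
\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}
\]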
Error Metrics
Recall: measures the percentage of the actual positive class that is correctly predicted (another common error metric).
You can get 100% recall simply by predicting everything to be positive.
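\[
\text{Recall} = \frac{TP}{TP + FN}
\]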
Error Metrics
Precision: measures the percentage of the predicted positive class that is correct (it rewards identifying only truly positive instances).
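\[
\text{Precision} = \frac{TP}{TP + FP}
\]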
Error Metrics
Specificity: measures how correctly the actual negative class is predicted (it rewards avoiding false alarms).
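\[
\text{Specificity} = \frac{TN}{TN + FP}
\]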
Error Metrics
F1 score: the harmonic mean of precision and recall.
It is a nice metric because it uses both precision and recall.
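\[
F_1 = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}
\]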
Error Metrics
Receiver Operating Characteristic (ROC) curve:
The diagonal of this plot represents the performance that can be obtained by random guessing.
Error Metrics
Area Under the Curve (AUC):
It measures how well we are separating the two classes.
0.5 is random (a useless model); 1 is perfect classification.
Error Metrics
Precision-Recall Curve (PR Curve):
Plot the precision and recall values for various score thresholds.
Error Metrics
Multiple Class Error Metrics:
Classification Error Metrics: the Syntax
Import the desired error function.
from sklearn.metrics import accuracy_score
Calculate the error on the test and predicted data sets.
accuracy_value = accuracy_score(y_test, y_pred)
Lots of other error metrics and diagnostic tools are available (a brief example follows):
from sklearn.metrics import precision_score, recall_score, f1_score, roc_auc_score, confusion_matrix, roc_curve, precision_recall_curve, etc.
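A minimal sketch putting a few of these together, assuming the fitted LogisticRegression model LR and the x_test/y_test arrays from the earlier slides.

from sklearn.metrics import confusion_matrix, precision_score, recall_score, f1_score, roc_auc_score

y_pred = LR.predict(x_test)               # predicted class labels
y_score = LR.predict_proba(x_test)[:, 1]  # predicted probability of the positive class

print(confusion_matrix(y_test, y_pred))
print(precision_score(y_test, y_pred), recall_score(y_test, y_pred), f1_score(y_test, y_pred))
print(roc_auc_score(y_test, y_score))     # AUC uses scores, not hard labels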
Summary
What the logistic function is and how it is used in logistic regression.
The key representation in logistic regression is the coefficients, just like linear regression.
The coefficients in logistic regression are estimated using a process called maximum-likelihood estimation.
Making predictions using logistic regression is easy.
Question & Answer
Thank You !!!