Logistic Regression and Overfitting Prevention

The document discusses various techniques for preventing underfitting and overfitting in machine learning models, including regularization methods like ridge regression, lasso regression, and elastic net regression. It explains that regularization adds a penalty for complex models to avoid overfitting, and describes how different regularization methods like L1 and L2 regularization work. The document also covers logistic regression and how it can be used for classification problems by fitting data to a sigmoid function using techniques like gradient descent.

Uploaded by

rediet
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PPT, PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
30 views48 pages

Logistic Regression and Overfitting Prevention

The document discusses various techniques for preventing underfitting and overfitting in machine learning models, including regularization methods like ridge regression, lasso regression, and elastic net regression. It explains that regularization adds a penalty for complex models to avoid overfitting, and describes how different regularization methods like L1 and L2 normalization work. The document also covers logistic regression and how it can be used for classification problems by fitting data to a sigmoid function using techniques like gradient descent.

Uploaded by

rediet
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PPT, PDF, TXT or read online on Scribd

Classification: Logistic Regression
Adama Science and Technology University
School of Electrical Engineering and Computing
Department of Computer Science & Engineering
Dr. Mesfin Abebe Haile (2022)

Under and Overfitting

 Can we prevent overfitting and still use a complicated model at the same time?

Preventing Under and Overfitting (Regularization)
 Over-fitting occurs when a Machine Learning algorithm captures the noise of the data. (low bias but high variance)
 Under-fitting occurs when a Machine Learning algorithm cannot capture the underlying trend of the data. (high bias but low variance)

Preventing Under and Overfitting (Regularization)

 Regularization is a form of regression that constrains, regularizes, or shrinks the coefficient estimates towards zero.
 This technique discourages learning a more complex or flexible model, so as to avoid the risk of over-fitting.
 λ is the tuning parameter that decides how much we want to penalize the flexibility of our model.

Ridge Regularization (L2)

 Penalizing big coefficients in our 9th-degree polynomial.
 The resulting algorithm is called "Ridge Regression (L2)".

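For reference (not shown in the extracted slide), the standard form of the ridge objective adds a squared-coefficient penalty to the usual least-squares loss:

    minimize  Σ (y_i − ŷ_i)²  +  λ Σ β_j²
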
Effect of Ridge Regression on Parameters

 Penalizing big coefficients in our 9th-degree polynomial.
 The resulting algorithm is called "Ridge Regression (L2)".

Lasso Regression (L1)

 A different regularization term can be the L1 norm instead of L2. (It takes the sum of absolute values.)
 Larger values are still penalized, but not as strongly.
 The effect is that it selectively shrinks some coefficients.
 Slower to converge than Ridge regression.
 Can be used for feature selection.

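For reference, the lasso objective uses the absolute values of the coefficients as the penalty:

    minimize  Σ (y_i − ŷ_i)²  +  λ Σ |β_j|
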
Effect of Lasso Regression on Parameters

 Some coefficients can stay high, while others are driven close to zero.

Elastic Net Regression

 A compromise between Ridge and Lasso regression.
 The idea is to get the best of both techniques.
 The mixture is determined by λ1 and λ2.

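For reference, the elastic net objective combines both penalties, with λ1 and λ2 controlling the mixture:

    minimize  Σ (y_i − ŷ_i)²  +  λ1 Σ |β_j|  +  λ2 Σ β_j²
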
Regularization Coefficients

 Regularization coefficients (λ1 and λ2) have to be determined empirically.
 We cannot use the test data for tuning.
 Create an additional split of the data for tuning, called the validation set.
 Train and cross-validate with several lambda choices and find the best one.
 Then test on the test set to make sure the results generalize.

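A minimal sketch of this workflow, assuming a feature matrix X and target y (the variable names and the alpha grid are illustrative, not from the slides):

    from sklearn.linear_model import Ridge
    from sklearn.model_selection import train_test_split

    # Hold out a test set, then carve a validation set out of the remaining data.
    x_temp, x_test, y_temp, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
    x_train, x_val, y_train, y_val = train_test_split(x_temp, y_temp, test_size=0.25, random_state=0)

    best_alpha, best_score = None, -float("inf")
    for alpha in [0.01, 0.1, 1.0, 10.0, 100.0]:    # candidate lambda values
        model = Ridge(alpha=alpha).fit(x_train, y_train)
        score = model.score(x_val, y_val)          # R^2 on the validation set
        if score > best_score:
            best_alpha, best_score = alpha, score

    # Refit with the chosen lambda and check generalization on the untouched test set.
    final_model = Ridge(alpha=best_alpha).fit(x_train, y_train)
    print(best_alpha, final_model.score(x_test, y_test))
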
Ridge Regression: the Syntax

 Import the class containing the regression method.
 from sklearn.linear_model import Ridge
 Create an instance of the class.
 RR = Ridge(alpha=1.0)  # alpha is the regularization parameter
 Fit the instance on the data and then predict the expected value.
 RR = RR.fit(x_train, y_train)
 y_predict = RR.predict(x_test)

 The RidgeCV class will perform cross validation on a set of values for alpha.

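A brief sketch of the RidgeCV variant mentioned above (the alpha grid is illustrative):

    from sklearn.linear_model import RidgeCV

    RRcv = RidgeCV(alphas=[0.1, 1.0, 10.0]).fit(x_train, y_train)  # selects alpha by cross validation
    print(RRcv.alpha_)  # the chosen regularization strength
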
Lasso Regression: the Syntax

 Import the class containing the regression method.
 from sklearn.linear_model import Lasso
 Create an instance of the class.
 LR = Lasso(alpha=1.0)  # alpha is the regularization parameter
 Fit the instance on the data and then predict the expected value.
 LR = LR.fit(x_train, y_train)
 y_predict = LR.predict(x_test)

 The LassoCV class will perform cross validation on a set of values for alpha.

Elastic Net Regression: the Syntax

 Import the class containing the regression method.
 from sklearn.linear_model import ElasticNet
 Create an instance of the class.
 EN = ElasticNet(alpha=1.0, l1_ratio=0.5)  # alpha is the regularization parameter; l1_ratio distributes alpha between L1 and L2 (l1_ratio=0 gives ridge, l1_ratio=1 gives lasso)
 Fit the instance on the data and then predict the expected value.
 EN = EN.fit(x_train, y_train)
 y_predict = EN.predict(x_test)
 The ElasticNetCV class will perform cross validation on a set of values for l1_ratio and alpha.

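A brief ElasticNetCV sketch under the same assumptions (the grids and fold count are illustrative):

    from sklearn.linear_model import ElasticNetCV

    ENcv = ElasticNetCV(l1_ratio=[0.1, 0.5, 0.9], alphas=[0.1, 1.0, 10.0], cv=5)
    ENcv.fit(x_train, y_train)
    print(ENcv.alpha_, ENcv.l1_ratio_)  # the cross-validated choices
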
Feature Selection

 Regularization performs feature selection by shrinking the contribution of features.
 It reduces the contribution of features to the model so that the model does not over-fit.
 For L1-regularization, this is accomplished by driving some coefficients to zero.
 Feature selection can also be performed by removing features.
 This would have to be done qualitatively, say by removing one feature at a time and measuring the predictive results via cross-validation.

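A small sketch of L1-driven feature selection, assuming x_train and y_train as before (the alpha value is illustrative):

    import numpy as np
    from sklearn.linear_model import Lasso

    lasso = Lasso(alpha=1.0).fit(x_train, y_train)
    selected = np.flatnonzero(lasso.coef_)   # indices of coefficients not driven to zero
    print(len(selected), "features kept out of", x_train.shape[1])
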
Why is Feature Selection Important?

 Reducing the number of features is another way to prevent over-fitting (similar to regularization) [some signal, some noise].
 For some models, fewer features can improve fitting time and/or results (particularly those that do not have regularization built in).
 Identifying critical features can improve model interpretability.
 Feature elimination can also be used to identify the most important features, which can improve model interpretability.

Recursive Feature Elimination (RFE): the Syntax

 Import the class containing the feature selection method.
 from sklearn.feature_selection import RFE
 Create an instance of the class.
 rfeMod = RFE(est, n_features_to_select=5)  # est is an instance of the model to use (e.g., Lasso); n_features_to_select is the final number of features
 Fit the instance on the data and then predict the expected value.
 rfeMod = rfeMod.fit(x_train, y_train)
 y_predict = rfeMod.predict(x_test)

 The RFECV class will perform feature elimination using cross validation.

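A short usage sketch, assuming the same x_train and y_train (the choice of Lasso as the estimator is illustrative):

    from sklearn.feature_selection import RFE
    from sklearn.linear_model import Lasso

    rfeMod = RFE(Lasso(alpha=1.0), n_features_to_select=5).fit(x_train, y_train)
    print(rfeMod.support_)   # boolean mask of the selected features
    print(rfeMod.ranking_)   # rank 1 = selected; larger ranks were eliminated earlier
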
Logistic Regression

Logistic Regression

 Many real-life problems require an optimized solution.
 Logistic regression finds the best-fit parameters of a nonlinear function called the sigmoid.

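For reference, the sigmoid (logistic) function is

    σ(z) = 1 / (1 + e^(−z))

which maps any real input z to a value between 0 and 1.
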
Logistic Regression

 We'll use optimization algorithms to find these best-fit parameters.
 Among the optimization algorithms, one of the most common is gradient descent/ascent.
 Gradient ascent can be simplified with stochastic gradient descent/ascent.
 These optimization algorithms will be used to train our classifier.

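A minimal NumPy sketch of batch gradient ascent on the log-likelihood, assuming X is an n-by-d feature matrix (including a bias column) and y holds 0/1 labels; the names, step size, and iteration count are illustrative:

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def grad_ascent(X, y, lr=0.01, n_iter=500):
        w = np.zeros(X.shape[1])            # start with all-zero weights
        for _ in range(n_iter):
            error = y - sigmoid(X @ w)      # difference between labels and predicted probabilities
            w += lr * X.T @ error           # step along the gradient of the log-likelihood
        return w

    # weights = grad_ascent(X, y)
    # predictions = (sigmoid(X @ weights) > 0.5).astype(int)
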
Logistic Regression

 General approach to logistic regression:
  Collect: any method.
  Prepare: numeric values are needed for a distance calculation. A structured data format is best.
  Analyze: any method.
  Train: take time to find the optimal coefficients to classify the data.
  Test: classification is quick and easy once the training step is done.
  Use: this application needs to get some input data and output structured numeric values.

Logistic Regression

 Stochastic gradient ascent can do as well as gradient ascent using far fewer computing resources.
 Logistic regression:
  Pros: computationally inexpensive, easy to implement, knowledge representation easy to interpret.
  Cons: prone to under-fitting, may have low accuracy.
  Works with: numeric values, nominal values.

Logistic Regression

 We can treat a binary classification problem as a regression problem.
 Encode the binary classes as 1/0 and fit a regression algorithm.
 When a new unlabeled record comes in, try to guess its value with the regression algorithm.
 If the value is higher than the midpoint (0.5), declare the prediction to be class 1, and vice versa.

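A tiny sketch of this idea (illustrative names; this is the naive baseline the slides contrast with logistic regression, not the recommended approach):

    from sklearn.linear_model import LinearRegression

    reg = LinearRegression().fit(x_train, y_train)        # y_train holds 0/1 labels
    y_predict = (reg.predict(x_test) > 0.5).astype(int)   # threshold the regression output at 0.5
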
Logistic Regression

 Linear regression for classification:

Logistic Regression

 What if our data looks different, as shown below? The fitted line becomes more slanted.

Logistic Regression

 Linear regression for classification:

Logistic Regression

 This function (the sigmoid) always takes values between 0 and 1.
 It smooths out the effect of very high or very low x values.

Logistic Regression

 This way, our algorithm is not skewed by the sample, and it manages to find the obvious visual threshold.

Logistic Regression

 So instead of trying to fit y = b0 + b1x, we can try to fit y = f(b0 + b1x), where f is the sigmoid.
 The resulting algorithm is called "logistic regression".
 It is not a regression algorithm; it is a classification algorithm.

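Written out with the sigmoid, the fitted quantity is the probability of class 1:

    P(y = 1 | x) = 1 / (1 + e^(−(b0 + b1x)))
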
Logistic Regression

 The predicted value ŷ(x) is always going to be between 0 and 1.
 This is as opposed to the linear regression approach, which can take on any value.

Logistic Regression: the Syntax

 Import the class containing the classification method.
 from sklearn.linear_model import LogisticRegression
 Create an instance of the class.
 LR = LogisticRegression(penalty='l2', C=10.0)  # regularization parameters [penalty is the L2 norm; a higher C means less penalty]
 Fit the instance on the data and then predict the expected value.
 LR = LR.fit(x_train, y_train)
 y_predict = LR.predict(x_test)
 Tune regularization parameters with cross validation: LogisticRegressionCV.

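A brief sketch of LogisticRegressionCV (the grid size and fold count are illustrative):

    from sklearn.linear_model import LogisticRegressionCV

    LRcv = LogisticRegressionCV(Cs=10, cv=5, penalty='l2').fit(x_train, y_train)
    print(LRcv.C_)  # the C value(s) selected by cross validation
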
Classification Error Metrics

Error Metrics

 The choice of the "right" error metric depends heavily on the question and the data.
 For example, assume we are classifying patients likely to get leukemia.
 In our training data, a large majority (99%) of patients are healthy; only 1% have leukemia.
 A classifier that simply predicts "healthy" for everyone is 99% accurate yet never detects a single case.
 Accuracy is often not the right metric for a binary classification problem.

Error Metrics

 When thinking about error with classification, we often talk about a confusion matrix.

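The slide's figure is not reproduced here; the standard 2x2 layout it refers to is:

                        Predicted positive      Predicted negative
    Actual positive     True Positive (TP)      False Negative (FN)
    Actual negative     False Positive (FP)     True Negative (TN)
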
Error Metrics

 Accuracy: the fraction of predictions that are correct. (The most common error metric, but it can be deceiving when the class populations are skewed.)

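In terms of the confusion-matrix counts:

    Accuracy = (TP + TN) / (TP + TN + FP + FN)
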
Error Metrics

 Recall: measures the % of the actual positive class that is correctly predicted. (Another common error metric.)
 Note that 100% recall can be achieved trivially by predicting everything to be positive.

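In terms of the confusion-matrix counts:

    Recall = TP / (TP + FN)
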
Error Metrics

 Precision: measures the % of the predicted positive class that is correct (identify only positive instances).

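In terms of the confusion-matrix counts:

    Precision = TP / (TP + FP)
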
Error Metrics

 Specificity: measures how correctly the actual negative class is predicted (avoid false alarms).

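In terms of the confusion-matrix counts:

    Specificity = TN / (TN + FP)
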
Error Metrics

 F1 score: the harmonic mean of precision and recall.
 It is a nice metric because it uses both precision and recall.

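As a formula:

    F1 = 2 · (Precision · Recall) / (Precision + Recall)
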
Error Metrics

 Receiver Operating Characteristic (ROC) curve:
 The diagonal of this plot represents the performance that can be obtained by random guessing.

Error Metrics

 Area Under the Curve (AUC):
 Measures how well we are separating the two classes.
 0.5 is random (a useless model); 1 is perfect classification.

Error Metrics

 Precision-Recall Curve (PR Curve):
 Plot the precision and recall values for various score thresholds.

Error Metrics

 Multiple Class Error Metrics:

Classification Error Metrics: the Syntax

 Import the desired error function.
 from sklearn.metrics import accuracy_score
 Calculate the error on the test and predicted data sets.
 accuracy_value = accuracy_score(y_test, y_pred)

 Lots of other error metrics and diagnostic tools:
 from sklearn.metrics import precision_score, recall_score, f1_score, roc_auc_score, confusion_matrix, roc_curve, precision_recall_curve, etc.

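A short sketch putting a few of these together, assuming y_test and y_pred as above:

    from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, confusion_matrix

    print(confusion_matrix(y_test, y_pred))   # 2x2 matrix of actual vs. predicted counts
    print(accuracy_score(y_test, y_pred))
    print(precision_score(y_test, y_pred))
    print(recall_score(y_test, y_pred))
    print(f1_score(y_test, y_pred))
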
Summary

 What the logistic function is and how it is used in logistic regression.
 The key representation in logistic regression is the coefficients, just like in linear regression.
 The coefficients in logistic regression are estimated using a process called maximum-likelihood estimation.
 Making predictions with logistic regression is straightforward once the coefficients are learned.

Question & Answer

Thank You !!!

