Classification: Logistic Regression
Adama Science and Technology University
School of Electrical Engineering and Computing
Department of Computer Science & Engineering
Dr. Mesfin Abebe Haile (2022)
Underfitting and Overfitting
Can we prevent overfitting while still using a complicated model?
Preventing Under- and Overfitting (Regularization)
Overfitting occurs when a machine learning algorithm captures the noise in the data (low bias but high variance).
Underfitting occurs when a machine learning algorithm cannot capture the underlying trend of the data (high bias but low variance).
Preventing Under- and Overfitting (Regularization)
Regularization is a form of regression that constrains/regularizes, or shrinks, the coefficient estimates towards zero.
This technique discourages learning a more complex or flexible model, so as to avoid the risk of overfitting.
λ is the tuning parameter that decides how much we want to penalize the flexibility of our model, as sketched below.
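As a sketch, the regularized least-squares objective adds a penalty term, weighted by λ, to the usual squared error (β denotes the vector of model coefficients):
\[
J(\beta) = \sum_{i=1}^{n} \bigl(y_i - \hat{y}_i\bigr)^2 + \lambda\,\Omega(\beta)
\]
where Ω(β) is the penalty, e.g. the sum of squared coefficients for ridge or the sum of their absolute values for lasso.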
Ridge Regularization (L2)
Penalizing big coefficients in our 9th-degree polynomial.
The resulting algorithm is called “Ridge Regression (L2)”.
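A sketch of the ridge (L2) objective, assuming coefficients β_j and penalty weight λ:
\[
J_{\text{ridge}}(\beta) = \sum_{i=1}^{n} \bigl(y_i - \hat{y}_i\bigr)^2 + \lambda \sum_{j=1}^{p} \beta_j^{2}
\]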
Effect of Ridge Regression on Parameters
Penalizing big coefficients in our 9th-degree polynomial.
The resulting algorithm is called “Ridge Regression (L2)”.
Lasso Regression (L1)
A different regularization term can be the L1 norm instead of L2; it takes the sum of absolute values (sketched below).
Larger values are still penalized, but not as strongly.
The effect is that it selectively shrinks some coefficients, some of them all the way to zero.
It is slower to converge than ridge regression.
It can be used for feature selection.
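A sketch of the lasso (L1) objective, which penalizes the sum of absolute coefficient values:
\[
J_{\text{lasso}}(\beta) = \sum_{i=1}^{n} \bigl(y_i - \hat{y}_i\bigr)^2 + \lambda \sum_{j=1}^{p} |\beta_j|
\]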
Effect of Lasso Regression on Parameters
Some coefficients can stay high, while others are pushed close to zero.
Elastic Net Regression
A compromise between ridge and lasso regression.
The idea is to get the best of both techniques.
The mixture is determined by λ1 and λ2, as sketched below.
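A sketch of the elastic net objective, mixing the two penalties with weights λ1 and λ2:
\[
J_{\text{EN}}(\beta) = \sum_{i=1}^{n} \bigl(y_i - \hat{y}_i\bigr)^2 + \lambda_1 \sum_{j=1}^{p} |\beta_j| + \lambda_2 \sum_{j=1}^{p} \beta_j^{2}
\]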
Regularization Coefficients
Regularization coefficients (λ1 and λ2) have to be determined empirically.
We cannot use the test data for tuning.
Create an additional split of the data for tuning, called the validation set.
Train and cross-validate with several lambda choices and find the best one.
Then test on the test set to make sure the results generalize (a sketch of this workflow follows).
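A minimal sketch of this workflow, assuming a ridge model; the data, split sizes, and alpha grid are illustrative.

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error

# Illustrative data; in practice X and y come from your dataset.
X = np.random.randn(200, 5)
y = X @ np.array([1.0, 0.0, -2.0, 0.5, 0.0]) + 0.1 * np.random.randn(200)

# Hold out a test set, then split the remainder into train and validation sets.
X_rest, X_test, y_rest, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X_rest, y_rest, test_size=0.25, random_state=0)

# Train with several lambda (alpha) choices and pick the best one on the validation set.
best_alpha, best_mse = None, np.inf
for alpha in [0.01, 0.1, 1.0, 10.0, 100.0]:
    model = Ridge(alpha=alpha).fit(X_train, y_train)
    mse = mean_squared_error(y_val, model.predict(X_val))
    if mse < best_mse:
        best_alpha, best_mse = alpha, mse

# Refit with the chosen alpha and check that the results generalize on the test set.
final_model = Ridge(alpha=best_alpha).fit(X_rest, y_rest)
print("best alpha:", best_alpha, "test MSE:", mean_squared_error(y_test, final_model.predict(X_test)))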
Ridge Regression: the Syntax
Import the class containing the regression method.
from sklearn.linear_model import Ridge
Create an instance of the class.
RR = Ridge(alpha=1.0) # alpha is the regularization parameter
Fit the instance on the data and then predict the expected value.
RR = RR.fit(x_train, y_train)
y_predict = RR.predict(x_test)
The RidgeCV class will perform cross-validation on a set of values for alpha (a brief example follows).
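A minimal RidgeCV sketch, assuming the x_train and y_train arrays above; the alpha grid is illustrative.

from sklearn.linear_model import RidgeCV
RRcv = RidgeCV(alphas=[0.01, 0.1, 1.0, 10.0]) # candidate regularization strengths
RRcv = RRcv.fit(x_train, y_train)
print(RRcv.alpha_) # alpha selected by cross-validation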
Lasso Regression: the Syntax
Import the class containing the regression method.
from sklearn.linear_model import Lasso
Create an instance of the class.
LR = Lasso(alpha=1.0) # alpha is the regularization parameter
Fit the instance on the data and then predict the expected value.
LR = LR.fit(x_train, y_train)
y_predict = LR.predict(x_test)
The LassoCV class will perform cross-validation on a set of values for alpha.
Elastic Net Regression: the Syntax
Import the class containing the regression method.
from sklearn.linear_model import ElasticNet
Create an instance of the class.
EN = ElasticNet(alpha=1.0, l1_ratio=0.5) # alpha is the regularization parameter; l1_ratio distributes alpha between L1 and L2 (l1_ratio=0 is ridge, l1_ratio=1 is lasso)
Fit the instance on the data and then predict the expected value.
EN = EN.fit(x_train, y_train)
y_predict = EN.predict(x_test)
The ElasticNetCV class will perform cross-validation on a set of values for l1_ratio and alpha (a brief example follows).
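A minimal ElasticNetCV sketch, assuming the same x_train and y_train; the grids are illustrative.

from sklearn.linear_model import ElasticNetCV
ENcv = ElasticNetCV(alphas=[0.01, 0.1, 1.0], l1_ratio=[0.2, 0.5, 0.8], cv=5)
ENcv = ENcv.fit(x_train, y_train)
print(ENcv.alpha_, ENcv.l1_ratio_) # values chosen by cross-validation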
Feature Selection
Regularization performs feature selection by shrinking the contribution of features.
It reduces the contribution of features to the model so that the model does not overfit.
For L1 regularization, this is accomplished by driving some coefficients to zero (see the sketch after this list).
Feature selection can also be performed by removing features.
This would have to be done qualitatively, say by removing one feature at a time and measuring the predictive results via cross-validation.
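A minimal sketch of L1-driven feature selection on illustrative synthetic data; the features with nonzero coefficients are the ones the model keeps.

import numpy as np
from sklearn.linear_model import Lasso

X = np.random.randn(100, 6)
y = 3.0 * X[:, 0] - 2.0 * X[:, 3] + 0.1 * np.random.randn(100)  # only features 0 and 3 matter

lasso = Lasso(alpha=0.1).fit(X, y)
print(lasso.coef_)                      # several coefficients are driven to exactly zero
selected = np.flatnonzero(lasso.coef_)  # indices of the features the lasso kept
print(selected)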
Why is Feature Selection Important?
Reducing the number of features is another way to prevent overfitting (similar to regularization), since each feature carries some signal and some noise.
For some models, fewer features can improve fitting time and/or results, particularly models that do not have regularization built in.
Identifying critical features can improve model interpretability.
Feature elimination can also be used to identify the most important features, which can improve model interpretability.
Recursive Feature Elimination (RFE): the Syntax
Import the class containing the feature selection method.
from sklearn.feature_selection import RFE
Create an instance of the class.
rfeMod = RFE(est, n_features_to_select=5) # est is an instance of the model to use (e.g. Lasso); 5 is the final number of features
Fit the instance on the data and then predict the expected value.
rfeMod = rfeMod.fit(x_train, y_train)
y_predict = rfeMod.predict(x_test)
The RFECV class will perform feature elimination using cross-validation (a brief example follows).
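A minimal RFECV sketch, assuming the same x_train and y_train; the estimator and cv value are illustrative.

from sklearn.feature_selection import RFECV
from sklearn.linear_model import Lasso
rfecvMod = RFECV(Lasso(alpha=0.1), cv=5) # cross-validation chooses how many features to keep
rfecvMod = rfecvMod.fit(x_train, y_train)
print(rfecvMod.n_features_)              # number of features selected
print(rfecvMod.support_)                 # boolean mask of the kept features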
Logistic Regression
Logistic Regression
Many of the problems we face in real life require an optimized solution.
Logistic regression finds best-fit parameters for a nonlinear function called the sigmoid (shown below).
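The sigmoid (logistic) function maps any real input to the interval (0, 1):
\[
\sigma(z) = \frac{1}{1 + e^{-z}}
\]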
Logistic Regression
We’ll use optimization algorithms to find these best-fit parameters.
Among optimization algorithms, one of the most common is gradient descent/ascent.
Gradient ascent can be simplified with stochastic gradient descent/ascent, which updates the parameters one example at a time.
These optimization algorithms will be used to train our classifier (a sketch follows).
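A minimal sketch of batch gradient ascent on the logistic log-likelihood, assuming a feature matrix X with a bias column and binary labels y; the learning rate and iteration count are illustrative.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gradient_ascent(X, y, lr=0.01, n_iter=1000):
    # Maximize the logistic log-likelihood; its gradient is X^T (y - sigmoid(Xw)).
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        error = y - sigmoid(X @ w)  # difference between labels and predicted probabilities
        w += lr * (X.T @ error)     # step in the direction of the gradient (ascent)
    return w

# Illustrative usage with a bias column prepended to a single feature.
X = np.column_stack([np.ones(4), [0.0, 1.0, 2.0, 3.0]])
y = np.array([0, 0, 1, 1])
print(gradient_ascent(X, y))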
Logistic Regression
General approach to logistic regression:
Collect: any method.
Prepare: numeric values are needed for a distance calculation; a structured data format is best.
Analyze: any method.
Train: take time to find the optimal coefficients to classify the data.
Test: classification is quick and easy once the training step is done.
Use: this application needs to get some input data and output structured numeric values.
Logistic Regression
Stochastic gradient ascent can do as well as gradient ascent while using far fewer computing resources.
Logistic regression:
Pros: computationally inexpensive, easy to implement, knowledge representation is easy to interpret.
Cons: prone to underfitting; may have low accuracy.
Works with: numeric values, nominal values.
Logistic Regression
We can treat a binary classification problem as a regression problem.
Encode the binary classes as 1/0 and fit a regression algorithm.
When a new unlabeled record comes in, guess its value with the regression algorithm.
If the value is higher than the midpoint (0.5), declare the prediction to be class 1, and vice versa (a sketch follows).
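A minimal sketch of this thresholding idea using ordinary linear regression; the data here are illustrative.

import numpy as np
from sklearn.linear_model import LinearRegression

X = np.array([[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]])
y = np.array([0, 0, 0, 1, 1, 1])            # binary labels encoded as 0/1

reg = LinearRegression().fit(X, y)
scores = reg.predict(np.array([[1.5], [3.5]]))
labels = (scores > 0.5).astype(int)         # declare class 1 when above the 0.5 midpoint
print(scores, labels)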
Logistic Regression
Linear regression for classification:
Logistic Regression
What if our data looks different, as shown below? The regression line becomes more slanted.
Logistic Regression
Linear regression for classification:
Logistic Regression
This function (the sigmoid) always takes values between 0 and 1.
It smooths out the effect of very high or very low x values.
Logistic Regression
This way, our algorithm is not skewed by the sample and it manages to find the obvious visual threshold.
Logistic Regression
So instead of trying to fit y = b0 + b1x, we can try to fit y = f(b0 + b1x), where f is the sigmoid (written out below).
The resulting algorithm is called “logistic regression”.
It is not a regression algorithm; it is a classification algorithm.
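Written out with the sigmoid, the model estimates the probability of class 1:
\[
\hat{y}(x) = \sigma(b_0 + b_1 x) = \frac{1}{1 + e^{-(b_0 + b_1 x)}}
\]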
Logistic Regression
ŷ(x) is always going to be between 0 and 1.
This is as opposed to the linear regression approach, which can take on any value.
Logistic Regression: the Syntax
Import the class containing the classification method.
from sklearn.linear_model import LogisticRegression
Create an instance of the class.
LR = LogisticRegression(penalty='l2', C=10.0) # regularization parameters [penalty is the norm of the penalty term; higher C means less regularization]
Fit the instance on the data and then predict the expected value.
LR = LR.fit(x_train, y_train)
y_predict = LR.predict(x_test)
Tune the regularization parameters with cross-validation: LogisticRegressionCV (a brief example follows).
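A minimal LogisticRegressionCV sketch, assuming the same x_train and y_train; the grid of C values is illustrative.

from sklearn.linear_model import LogisticRegressionCV
LRcv = LogisticRegressionCV(Cs=[0.01, 0.1, 1.0, 10.0], cv=5, penalty='l2')
LRcv = LRcv.fit(x_train, y_train)
print(LRcv.C_) # C value(s) selected by cross-validation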
Classification Error Metrics
Error Metrics
The choice of the “right” error metric depends heavily on the question and the data.
For example, assume we are classifying patients likely to get leukemia.
In our training data, a large majority (99%) of patients are healthy; only 1% of patients have leukemia.
Accuracy is often not the right metric for a binary classification problem (see the example below).
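A minimal sketch of why accuracy can mislead on skewed classes; the data here are synthetic and illustrative.

import numpy as np
from sklearn.metrics import accuracy_score, recall_score

y_true = np.array([0] * 99 + [1])  # 99 healthy patients, 1 patient with leukemia
y_pred = np.zeros(100, dtype=int)  # a "classifier" that always predicts healthy

print(accuracy_score(y_true, y_pred))  # 0.99 -- looks great
print(recall_score(y_true, y_pred))    # 0.0  -- misses the leukemia case entirely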
Error Metrics
When thinking about error in classification, we often talk about a confusion matrix, sketched below.
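For a binary problem, the confusion matrix counts the four possible outcomes: true positives (TP), false negatives (FN), false positives (FP), and true negatives (TN):
\[
\begin{array}{c|cc}
 & \text{Predicted positive} & \text{Predicted negative} \\ \hline
\text{Actual positive} & TP & FN \\
\text{Actual negative} & FP & TN
\end{array}
\]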
Error Metrics
Accuracy: the fraction of predictions that are correct. It is the most common error metric, but it can be deceiving when the class populations are skewed.
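In terms of the confusion matrix counts:
\[
\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}
\]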
Error Metrics
Recall: measures the percentage of the actual positive class that is correctly predicted (another common error metric).
You can get 100% recall simply by predicting everything to be positive.
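\[
\text{Recall} = \frac{TP}{TP + FN}
\]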
Error Metrics
Precision: measures the percentage of the predicted positive class that is correct (it rewards identifying only truly positive instances).
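\[
\text{Precision} = \frac{TP}{TP + FP}
\]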
Error Metrics
Specificity: measures how correctly the actual negative class is predicted (it rewards avoiding false alarms).
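\[
\text{Specificity} = \frac{TN}{TN + FP}
\]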
Error Metrics
F1 score: the harmonic mean of precision and recall.
It is a nice metric because it uses both precision and recall.
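\[
F_1 = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}
\]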
Error Metrics
Receiver Operating Characteristic (ROC) curve:
The diagonal of this plot represents the performance that can be obtained by random guessing.
Error Metrics
Area Under the Curve (AUC):
It measures how well we are separating the two classes.
0.5 is random (a useless model); 1 is perfect classification.
Error Metrics
Precision-Recall Curve (PR Curve):
Plot the precision and recall values for various score thresholds.
Error Metrics
Multiple Class Error Metrics:
Classification Error Metrics: the Syntax
Import the desired error function.
from sklearn.metrics import accuracy_score
Calculate the error on the test and predicted data sets.
accuracy_value = accuracy_score(y_test, y_pred)
Lots of other error metrics and diagnostic tools are available (a brief example follows):
from sklearn.metrics import precision_score, recall_score, f1_score, roc_auc_score, confusion_matrix, roc_curve, precision_recall_curve, etc.
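A minimal sketch putting a few of these together, assuming the fitted LogisticRegression model LR and the x_test/y_test arrays from the earlier slides.

from sklearn.metrics import confusion_matrix, precision_score, recall_score, f1_score, roc_auc_score

y_pred = LR.predict(x_test)               # predicted class labels
y_score = LR.predict_proba(x_test)[:, 1]  # predicted probability of the positive class

print(confusion_matrix(y_test, y_pred))
print(precision_score(y_test, y_pred), recall_score(y_test, y_pred), f1_score(y_test, y_pred))
print(roc_auc_score(y_test, y_score))     # AUC uses scores, not hard labels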
Summary
What the logistic function is and how it is used in logistic regression.
The key representation in logistic regression is the coefficients, just like linear regression.
The coefficients in logistic regression are estimated using a process called maximum-likelihood estimation.
Making predictions using logistic regression is easy.
Question & Answer
Thank You !!!