Artificial Intelligence II (CS4442 & CS9542)
Overfitting, Cross-Validation, and Regularization
Boyu Wang
Department of Computer Science
University of Western Ontario
Motivation examples: polynomial regression
I As the degree of the polynomial increases, there are more degrees of
freedom, and the (training) error approaches zero.
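As a small illustration of this point (not from the original slides), the following Python sketch fits polynomials of increasing degree M to a toy data set and prints the training error; the sine-plus-noise data and the specific degrees are arbitrary assumptions.

import numpy as np

# Hypothetical setup (assumption): 10 noisy samples from a sine curve.
rng = np.random.default_rng(0)
x = np.linspace(0, 1, 10)
y = np.sin(2 * np.pi * x) + 0.1 * rng.standard_normal(x.size)

for degree in [0, 1, 3, 9]:
    w = np.polyfit(x, y, degree)                      # least-squares polynomial fit
    train_err = np.mean((np.polyval(w, x) - y) ** 2)  # training MSE
    print(f"degree M = {degree}: training MSE = {train_err:.2e}")
# As M grows, the training error approaches zero (with M = 9 and 10 points the fit
# interpolates the data), even though the curve oscillates wildly between the points.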
Figure credit: Christopher Bishop
1
Motivation examples: polynomial regression
I Minimizing the training/empirical loss does NOT guarantee good
test/generalization performance.
I Overfitting: Very low training error, very high test error.
Figure credit: Christopher Bishop
2
Overfitting – general phenomenon
I Too simple (e.g., small M) → underfitting
I Too complex (e.g., large M) → overfitting
Figure credit: Ian Goodfellow
3
Overfitting
I Training loss and test loss are different
I The larger the hypothesis class, the easier it is to find a hypothesis that
fits the training data
- but it may have a large test error (overfitting)
I Prevent overfitting:
- Use a larger data set
- Choose an appropriate hypothesis class (model selection)
- Control model complexity (regularization)
4
Larger data set
I Overfitting is mostly due to sparsity of the data.
I For the same model complexity, more data ⇒ less overfitting. With more
data, more complex (i.e., more flexible) models can be used.
Figure credit: Christopher Bishop
5
Model selection
I How to choose the optimal model complexity/hyper-parameter
(e.g., choose the best degree for polynomial regression)
I Cannot be done by training data alone
I We can use our prior knowledge or expertise (e.g., somehow we
know that the degree should not exceed 4)
I Create a held-out data set to approximate the test error (i.e., mimic
the test data)
- called the validation data set
6
Model selection: cross-validation
For each order of polynomial M:
1. Randomly split the training data into K groups, and repeat the following
procedure K times:
i. Leave out the k-th group from the training set as a validation set
ii. Use the other K − 1 groups to find the best parameter vector wk
iii. Measure the error of wk on the validation set; call this Jk
2. Compute the average error: J = (1/K) ∑_{k=1}^{K} Jk
Choose the order of polynomial M with the lowest average error J (a code
sketch follows).
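A minimal Python/NumPy sketch of this procedure, assuming np.polyfit as the fitting routine, squared error as the loss, and K = 5; the toy data set is a placeholder.

import numpy as np

def cv_error(x, y, degree, K=5, seed=0):
    """Average validation MSE of a degree-`degree` polynomial under K-fold CV."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(x))                  # 1. random split into K groups
    folds = np.array_split(idx, K)
    errors = []
    for k in range(K):
        val = folds[k]                             # i. leave out the k-th group
        trn = np.concatenate([folds[j] for j in range(K) if j != k])
        w_k = np.polyfit(x[trn], y[trn], degree)   # ii. fit on the other K-1 groups
        J_k = np.mean((np.polyval(w_k, x[val]) - y[val]) ** 2)  # iii. validation error
        errors.append(J_k)
    return np.mean(errors)                         # 2. average error J

# Placeholder data (assumption): noisy samples of a sine curve.
rng = np.random.default_rng(1)
x = np.linspace(0, 1, 30)
y = np.sin(2 * np.pi * x) + 0.2 * rng.standard_normal(x.size)

best_M = min(range(10), key=lambda M: cv_error(x, y, M))  # choose M with lowest J
print("selected polynomial order M =", best_M)

np.array_split is used so that the K groups may differ in size by one element when the data do not divide evenly.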
7
Model selection: cross-validation
Figure: K-fold cross-validation for the case of K = 4
Figure credit: Christopher Bishop
8
General learning procedure
Given a training set and a test set
1. Use cross-validation to choose the hyper-parameter/hypothesis
class.
2. Once the hyper-parameter is selected, use the entire training set
to find the best model parameters w.
3. Evaluate the performance of w on the test set.
These sets must be disjoint! – you should never touch the test data
until the final evaluation of your model (a sketch of this workflow follows).
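For illustration only, the same three steps expressed with scikit-learn (a library not mentioned in the slides); Ridge is scikit-learn's ℓ2-regularized linear regression (introduced a few slides later), and the data, parameter grid, and split size are placeholder assumptions. By default GridSearchCV refits the best model on the entire training set, which corresponds to step 2.

import numpy as np
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.linear_model import Ridge

# Placeholder data (assumption): 200 samples, 5 features, linear ground truth plus noise.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 5))
y = X @ np.array([1.0, -2.0, 0.0, 0.5, 3.0]) + 0.1 * rng.standard_normal(200)

# Disjoint training and test sets -- the test set is not touched until the end.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# Step 1: cross-validation on the training set only, to pick the hyper-parameter.
search = GridSearchCV(Ridge(), {"alpha": [0.01, 0.1, 1.0, 10.0]}, cv=5)
search.fit(X_train, y_train)   # step 2: the best model is refit on all training data

# Step 3: a single evaluation on the held-out test set.
test_mse = np.mean((search.predict(X_test) - y_test) ** 2)
print("chosen alpha:", search.best_params_["alpha"], " test MSE:", test_mse)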
9
Summary of cross-validation
I Can also be used for selecting other hyper-parameters of a
model/algorithm (e.g., the number of hidden layers of a neural
network, the learning rate of gradient descent, or even to choose
between different machine learning models)
I Very straightforward algorithm to implement
I Provides a good estimate of the true (generalization) error of a model
I Leave-one-out cross-validation: number of groups = number of
training instances
I Computationally expensive; even more so when there are more
hyper-parameters to tune
10
Regularization
I Intuition: complicated hypotheses lead to overfitting
I Idea: penalize model complexity (e.g., large values of wj):
L(w) = J(w) + λR(w)
where J(w) is the training loss, R(w) is the regularization
function/regularizer, and λ ≥ 0 is a regularization parameter that
controls the tradeoff between data fitting and model complexity.
11
ℓ2-norm regularization for linear regression
Objective function:
L(w) = (1/2) ∑_{i=1}^{m} ( w0 + ∑_{j=1}^{n} wj · xi,j − yi )^2 + (λ/2) ∑_{j=1}^{n} wj^2
I No regularization on w0!
Equivalently, we have
L(w) = (1/2) ||Xw − y||_2^2 + (λ/2) w^T Î w
where w = [w0, w1, . . . , wn]^T and Î = diag(0, 1, . . . , 1) is the
(n+1)×(n+1) identity matrix with its top-left entry set to 0.
12
ℓ2-norm regularization for linear regression
Objective function:
L(w) = (1/2) ||Xw − y||_2^2 + (λ/2) w^T Î w
     = (1/2) ( w^T (X^T X + λÎ) w − w^T X^T y − y^T X w + y^T y )
Optimal solution (by solving ∇L(w) = 0):
w = (X^T X + λÎ)^{-1} X^T y
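A direct NumPy sketch of this closed-form solution, under the assumption that X already contains a leading column of ones for the bias w0 (so that Î removes the penalty on w0); the toy data are placeholders.

import numpy as np

def ridge_solution(X, y, lam):
    """w = (X^T X + lam * I_hat)^{-1} X^T y, with no penalty on the bias w0.

    Assumes X has a leading column of ones (the bias feature)."""
    I_hat = np.eye(X.shape[1])
    I_hat[0, 0] = 0.0                      # do not regularize w0
    return np.linalg.solve(X.T @ X + lam * I_hat, X.T @ y)

# Placeholder data (assumption): 50 samples, 3 features, plus a bias column of ones.
rng = np.random.default_rng(0)
A = rng.standard_normal((50, 3))
X = np.hstack([np.ones((50, 1)), A])
y = X @ np.array([0.5, 1.0, -2.0, 3.0]) + 0.1 * rng.standard_normal(50)

for lam in [0.0, 1.0, 100.0]:
    w = ridge_solution(X, y, lam)
    print(f"lambda = {lam:6.1f}  ||w[1:]||_2 = {np.linalg.norm(w[1:]):.3f}")
# Larger lambda shrinks the weights towards 0 (the bias w0 is not penalized).

np.linalg.solve is used rather than forming the inverse explicitly; this is mathematically equivalent to the formula above but numerically more stable.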
13
More on ℓ2-norm regularization
arg min_w (1/2) ||Xw − y||_2^2 + (λ/2) w^T Î w = (X^T X + λÎ)^{-1} X^T y
I ℓ2-norm regularization pushes the parameters towards 0.
I λ = 0 ⇒ same as regular linear regression
I λ → ∞ ⇒ w → 0
I 0 < λ < ∞ ⇒ the magnitude of the weights will be smaller than in
regular linear regression
Figure credit: Christopher Bishop
14
Another view of ℓ2-norm regularization
I From optimization theory¹, we know that
min_w J(w) + λR(w)
is equivalent to
min_w J(w)
such that R(w) ≤ η
for some η ≥ 0 (for convex J and R).
I Hence, ℓ2-regularized linear regression can be re-formulated as (we
only consider wj, j > 0 here)
min_w ||Xw − y||_2^2
such that ||w||_2^2 ≤ η
¹ e.g., Boyd and Vandenberghe. Convex Optimization. 2004.
15
Visualizing ℓ2-norm regularization (2 features)
Figure: w* = (X^T X + λÎ)^{-1} X^T y
Figure credit: Christopher Bishop
16
ℓ1-norm regularization
I Instead of using the ℓ2-norm, we use the ℓ1-norm to control the model
complexity:
min_w (1/2) ∑_{i=1}^{m} ( w0 + ∑_{j=1}^{n} wj · xi,j − yi )^2 + λ ∑_{j=1}^{n} |wj|
which is equivalent to
min_w (1/2) ∑_{i=1}^{m} ( w0 + ∑_{j=1}^{n} wj · xi,j − yi )^2
such that ∑_{j=1}^{n} |wj| ≤ η
I Also called LASSO (least absolute shrinkage and selection operator).
I No analytical solution anymore! The problem must be solved numerically
(see the sketch below).
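One common numerical approach (a choice made here for illustration, not prescribed by the slides) is proximal gradient descent, also known as ISTA, which alternates a gradient step on the squared-error term with soft-thresholding of the penalized weights; the toy data, λ, and iteration count are placeholder assumptions.

import numpy as np

def lasso_ista(X, y, lam, n_iters=500):
    """Minimize (1/2)||Xw - y||^2 + lam * sum_{j>=1} |wj| by ISTA.

    Assumes X has a leading column of ones; the bias w0 is not penalized."""
    step = 1.0 / np.linalg.norm(X, 2) ** 2       # 1/L, L = largest eigenvalue of X^T X
    w = np.zeros(X.shape[1])
    for _ in range(n_iters):
        w = w - step * X.T @ (X @ w - y)         # gradient step on the squared error
        # soft-thresholding (proximal operator of lam*|.|), skipping the bias w0
        w[1:] = np.sign(w[1:]) * np.maximum(np.abs(w[1:]) - step * lam, 0.0)
    return w

# Placeholder data (assumption): only 2 of the 8 features are truly relevant.
rng = np.random.default_rng(0)
A = rng.standard_normal((100, 8))
X = np.hstack([np.ones((100, 1)), A])
y = X @ np.array([0.5, 3.0, 0, 0, -2.0, 0, 0, 0, 0]) + 0.1 * rng.standard_normal(100)

w_hat = lasso_ista(X, y, lam=5.0)
print(np.round(w_hat, 2))   # several coefficients are driven exactly to 0

The soft-thresholding step is what sets small coefficients exactly to 0, which is the feature-selection behaviour discussed on the next slides.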
17
Visualizing ℓ1-norm regularization (2 features)
I If λ is large enough, the circle (the error contour) is very likely to
intersect the diamond (the ℓ1 constraint region) at one of its corners.
I This makes ℓ1-norm regularization much more likely to make some
weights exactly 0.
I In other words, we essentially perform feature selection!
18
Comparison of ℓ2 and ℓ1
Figure credit: Bishop; Hastie, Tibshirani & Friedman
19
Summary of regularization
I Both are commonly used approaches to avoid overfitting.
I Both push the weights towards 0.
I ℓ2 produces small but non-zero weights, while ℓ1 is likely to make some
weights exactly 0.
I ℓ1 optimization is computationally more expensive than ℓ2.
I To choose an appropriate λ, cross-validation is often used.
20