Module 8
Learning Best Practices for Model
Evaluation and Hyperparameter Tuning
Using k-fold cross-validation to assess
model performance
• One of the key steps in building a machine learning model is to estimate its
performance on data that the model hasn't seen before.
• When we fit our model on a training dataset and use the same data to estimate
how well it performs, a model can either suffer from underfitting (high bias) if the
model is too simple, or it can overfit the training data (high variance) if the model
is too complex for the underlying training data.
• To find an acceptable bias-variance trade-off, we need to evaluate our model
carefully.
• The common cross-validation techniques, holdout cross-validation and k-fold cross-validation, can help us obtain reliable estimates of the model's generalization performance, that is, how well the model performs on unseen data.
The holdout method
• A classic and popular approach for estimating the generalization performance of machine
learning models is holdout cross-validation.
• In typical machine learning applications, we are also interested in tuning and comparing different parameter settings to further improve the performance for making predictions on unseen data.
• This process is called model selection; the term refers to selecting the optimal values of the tuning parameters (also called hyperparameters) for a given classification problem.
• However, if we reuse the same test dataset over and over again during model selection, it effectively becomes part of our training data, and the model is therefore more likely to overfit; this is not good machine learning practice.
• A better way of using the holdout method for model selection is to separate the data into
three parts: a training set, a validation set, and a test set.
• The training set is used to fit the different models, and the performance on the
validation set is then used for the model selection.
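As a minimal sketch of such a three-way split (assuming scikit-learn and, purely for illustration, the breast cancer dataset bundled with it), train_test_split can be called twice:

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

# Load an example dataset; any labeled dataset would work the same way
X, y = load_breast_cancer(return_X_y=True)

# First split off a held-out test set (20% of the data) ...
X_temp, X_test, y_temp, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=1)

# ... then carve a validation set out of the remaining 80%,
# which yields roughly a 60/20/20 train/validation/test split
X_train, X_valid, y_train, y_valid = train_test_split(
    X_temp, y_temp, test_size=0.25, stratify=y_temp, random_state=1)

print(len(X_train), len(X_valid), len(X_test))

The different models are then fit on the training set, compared on the validation set, and only the finally selected model is evaluated once on the test set.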
The Concept of Holdout Method
• The validation set is used to repeatedly evaluate the performance of the model after training it with different hyperparameter values. Once we are satisfied with the tuning of the hyperparameter values, we estimate the model's generalization performance on the test dataset.
• A disadvantage of the holdout method is that the performance estimate may be very sensitive to how we partition the training set into the training and validation subsets; the estimate will vary for different samples of the data. In the next subsection, we will take a look at a more robust technique for performance estimation, k-fold cross-validation, where we repeat the holdout method k times on k subsets of the training data.
(Figure: Concept of holdout cross-validation)
K-fold cross-validation
• In k-fold cross-validation, we randomly split the training dataset into k folds without replacement, where k - 1 folds are used for the model training, and one fold is used for performance evaluation.
• This procedure is repeated k times so that we obtain k models and performance
estimates.
• A good standard value for k in k-fold cross-validation is 10, as empirical evidence
shows.
• However, if we are working with relatively small training sets, it can be useful to
increase the number of folds.
• On the other hand, if we are working with large datasets, we can choose a
smaller value for k, for example, k = 5, and still obtain an accurate estimate of the
average performance of the model while reducing the computational cost of
refitting and evaluating the model on the different folds.
Since k-fold cross-validation is a resampling
technique without replacement, the advantage
of this approach is that each sample point will
be used for training and validation (as part of a
test fold) exactly once, which yields a lower-
variance estimate of the model performance
than the holdout method. The following figure
summarizes the concept behind k-fold cross-
validation with k = 10. The training dataset is
divided into 10 folds, and during the 10
iterations, nine folds are used for training, and
one fold is used as the test set for the model evaluation. The estimated performances E_i (for example, classification accuracy or error) from the k folds are then used to calculate the estimated average performance E of the model:
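E = (1/k) Σ_{i=1}^{k} E_i, that is, the arithmetic mean of the k per-fold performance estimates.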
SAMPLE CODES
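A sketch of how this could look in scikit-learn is shown below; the logistic regression pipeline and the breast cancer dataset are illustrative assumptions, not fixed by the slides. The manual StratifiedKFold loop makes the k iterations explicit, and cross_val_score performs the same evaluation in one call:

import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split, StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=1)

pipe_lr = make_pipeline(StandardScaler(),
                        LogisticRegression(max_iter=10000, random_state=1))

# Manual 10-fold stratified cross-validation: each iteration trains on
# nine folds and evaluates on the remaining fold
kfold = StratifiedKFold(n_splits=10, shuffle=True, random_state=1)
scores = []
for k, (train_idx, test_idx) in enumerate(kfold.split(X_train, y_train)):
    pipe_lr.fit(X_train[train_idx], y_train[train_idx])
    score = pipe_lr.score(X_train[test_idx], y_train[test_idx])
    scores.append(score)
    print('Fold %2d, accuracy: %.3f' % (k + 1, score))

print('CV accuracy: %.3f +/- %.3f' % (np.mean(scores), np.std(scores)))

# The same evaluation in a single call
scores = cross_val_score(estimator=pipe_lr, X=X_train, y=y_train, cv=10)
print('CV accuracy: %.3f +/- %.3f' % (np.mean(scores), np.std(scores)))

Each per-fold accuracy plays the role of one E_i above, and the reported mean is the average performance E.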
Debugging algorithms with
learning and validation curves
• Learning curves and validation curves are simple yet powerful diagnostic tools that can help us improve the performance of a learning algorithm.
• They can be used to diagnose whether a learning algorithm has a problem with overfitting (high variance) or underfitting (high bias).
• High bias: both the training and cross-validation accuracy are low, which indicates that the model underfits the training data. Possible solutions: increase the number of parameters of the model, for example, by collecting or constructing additional features, or by decreasing the degree of regularization.
• High variance: indicated by a large gap between the training and cross-validation accuracy. Possible solutions: collect more training data, reduce the complexity of the model, or increase the regularization parameter.
Diagnosing bias and variance problems with learning curves
SAMPLE CODES
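A sketch of how such a learning curve could be produced with scikit-learn's learning_curve function follows; the standardized logistic regression pipeline and the breast cancer dataset are illustrative assumptions:

import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split, learning_curve
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=1)

pipe_lr = make_pipeline(StandardScaler(),
                        LogisticRegression(max_iter=10000, random_state=1))

# 10 evenly spaced training-set sizes, each evaluated with 10-fold cross-validation
train_sizes, train_scores, test_scores = learning_curve(
    estimator=pipe_lr, X=X_train, y=y_train,
    train_sizes=np.linspace(0.1, 1.0, 10), cv=10)

train_mean, train_std = np.mean(train_scores, axis=1), np.std(train_scores, axis=1)
test_mean, test_std = np.mean(test_scores, axis=1), np.std(test_scores, axis=1)

# Plot the mean accuracies with +/- one standard deviation bands
plt.plot(train_sizes, train_mean, marker='o', label='Training accuracy')
plt.fill_between(train_sizes, train_mean + train_std, train_mean - train_std, alpha=0.15)
plt.plot(train_sizes, test_mean, marker='s', linestyle='--', label='Validation accuracy')
plt.fill_between(train_sizes, test_mean + test_std, test_mean - test_std, alpha=0.15)
plt.xlabel('Number of training samples')
plt.ylabel('Accuracy')
plt.legend(loc='lower right')
plt.grid()
plt.show()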
(Figure: learning curve plot)
Via the train_sizes parameter in the learning_curve function, we can control the
absolute or relative number of training samples that are used to generate the
learning curves. Here, we set train_sizes=np.linspace(0.1, 1.0, 10) to use 10
evenly spaced, relative intervals for the training set sizes. By default, the
learning_curve function uses stratified k-fold cross-validation to calculate the
cross-validation accuracy of a classifier, and we set k=10 via the cv parameter
for 10-fold stratified cross-validation. Then, we simply calculated the
average accuracies from the returned cross-validated training and test scores
for the different sizes of the training set, which we plotted using Matplotlib's
plot function. Furthermore, we added the standard deviation of the average
accuracy to the plot using the fill_between function to indicate the variance of
the estimate. As we can see in the preceding learning curve plot, our model performs quite well on both the training and validation datasets if it has seen more than 250 samples during training. We can also see that the training accuracy increases for training sets with fewer than 250 samples, and the gap between validation and training accuracy widens, which is an indicator of an increasing degree of overfitting.
Addressing over- and underfitting with
validation curves
• Validation curves are a useful tool for improving the performance of a
model by addressing issues such as overfitting or underfitting.
Validation curves are related to learning curves, but instead of
plotting the training and test accuracies as functions of the sample
size, we vary the values of the model parameters, for example, the
inverse regularization parameter C in logistic regression. Let's go
ahead and see how we create validation curves via scikit-learn:
Sample codes
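A sketch of how such a validation curve could be generated is shown below; it reuses the same illustrative pipeline and data setup as the learning-curve sketch and varies the logisticregression__C parameter over an assumed value range:

import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split, validation_curve
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression

# Same illustrative setup as in the learning-curve sketch
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=1)
pipe_lr = make_pipeline(StandardScaler(),
                        LogisticRegression(max_iter=10000, random_state=1))

# Vary the inverse regularization parameter C of the classifier inside the pipeline
param_range = [0.001, 0.01, 0.1, 1.0, 10.0, 100.0]
train_scores, test_scores = validation_curve(
    estimator=pipe_lr, X=X_train, y=y_train,
    param_name='logisticregression__C',
    param_range=param_range, cv=10)

train_mean, train_std = np.mean(train_scores, axis=1), np.std(train_scores, axis=1)
test_mean, test_std = np.mean(test_scores, axis=1), np.std(test_scores, axis=1)

plt.plot(param_range, train_mean, marker='o', label='Training accuracy')
plt.fill_between(param_range, train_mean + train_std, train_mean - train_std, alpha=0.15)
plt.plot(param_range, test_mean, marker='s', linestyle='--', label='Validation accuracy')
plt.fill_between(param_range, test_mean + test_std, test_mean - test_std, alpha=0.15)
plt.xscale('log')  # the C values span several orders of magnitude
plt.xlabel('Parameter C')
plt.ylabel('Accuracy')
plt.legend(loc='lower right')
plt.grid()
plt.show()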
Similar to the learning_curve function, the validation_curve function uses
stratified k-fold cross-validation by default to estimate the performance of the
classifier. Inside the validation_curve function, we specified the parameter that
we wanted to evaluate. In this case, it is C, the inverse regularization parameter of
the LogisticRegression classifier, which we wrote as 'logisticregression__C' to
access the LogisticRegression object inside the scikit-learn pipeline for a specified
value range that we set via the param_range parameter. Similar to the learning
curve example in the previous section, we plotted the average training and cross-
validation accuracies and the corresponding standard deviations. Although the
differences in the accuracy for varying values of C are subtle, we can see that the
model slightly underfits the data when we increase the regularization strength
(small values of C). However, large values of C mean lowering the strength of regularization, so the model tends to slightly overfit the data. In this case, the
sweet spot appears to be between 0.01 and 0.1 of the C value.