DS605: Fundamentals of Machine Learning
Lecture 09
Evaluation - I
[Schemes for Data Split and Handling Bias-Variance]
Arpit Rana
8th August 2024
Learning = Representation + Evaluation + Optimization
Representation ✔
Choosing a representation of the learner: the hypotheses
space or the model class — the set of models that it can
possibly learn.
Evaluation
Choosing an evaluation function (also called objective
function, utility function, loss function, or scoring
function) is needed to distinguish good classifiers from
bad ones.
Optimization
Choosing a method to search among the models in the
hypothesis space for the highest-scoring one.
Experimental Evaluation of Learning Algorithms
The overall objective of the Learning Algorithm is to find a hypothesis that:
● is consistent (i.e., fits the training data), but more importantly,
● generalizes well for previously unseen data.
Experimental Evaluation defines ways to measure the generalizability of a final hypothesis or learning algorithm.
Given a representation, data, and a bias, the learning algorithm returns a final hypothesis.
[Figure: Hypothesis Space 𝓗 → Learner (𝚪: S → h) → Model (h)]
Experimental Evaluation of Learning Algorithms
Sample Error
The sample error of hypothesis h with respect to the target function f and data sample S is the fraction of examples in S that h misclassifies:

    error_S(h) = (1/|S|) Σ_{x ∈ S} δ(f(x) ≠ h(x)),   where δ(true) = 1 and δ(false) = 0.

True Error
The true error of hypothesis h with respect to the target function f and the distribution D is the probability that h will misclassify an instance drawn at random according to D:

    error_D(h) = Pr_{x ~ D}[f(x) ≠ h(x)]

It is impossible to assess the true error, so we try to estimate it using the sample error.
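As a minimal sketch (in Python, with NumPy assumed available), the sample error is just the misclassification rate over the sample; the names y_true and y_pred below stand in for the target values f(x) and the hypothesis's predictions h(x):

```python
import numpy as np

def sample_error(y_true, y_pred):
    """Fraction of examples in the sample that the hypothesis misclassifies."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    return np.mean(y_true != y_pred)

# Example: 2 mistakes out of 5 examples -> sample error = 0.4
print(sample_error([0, 1, 1, 0, 1], [0, 1, 0, 1, 1]))
```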
Generalizing to Unseen Data
The error on the training set is called the training error (a.k.a. resubstitution error and
in-sample error).
● The training error is not, in general, a good indicator of performance on unseen data. It's
often too optimistic.
● Why?
Generalizing to Unseen Data
To predict future performance, we need to measure error on an independent dataset:
● We want a dataset that has played no part in creating the model.
● This second dataset is called the test set.
● The error on the test set is called the test error (a.k.a. out-of-sample error and
extra-sample error).
Given a data sample S, there are methodologies to better approximate the true error of the model.
Holdout Method
● Shuffle the dataset and partition it into two disjoint sets:
○ training set (e.g., 80% of the full dataset); and
○ test set (the rest of the full dataset).
● Train the estimator on the training set.
● Test the model (evaluate the predictions) on the test set.
[Figure: Dataset → Shuffled Dataset → Train | Test split]
It is essential that the test set is not used in any way to create the model. Don't even look at it!
● 'Cheating' is called leakage.
● 'Cheating' is one cause of overfitting.
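A minimal sketch of the holdout procedure above using scikit-learn (the Iris dataset and the k-nearest-neighbours classifier are illustrative stand-ins, not part of the lecture material):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

# A small example dataset, standing in for any labelled dataset.
X, y = load_iris(return_X_y=True)

# Shuffle and split: 80% for training, 20% held out for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, shuffle=True, random_state=42)

# Fit the estimator on the training set only; never touch the test set here.
model = KNeighborsClassifier(n_neighbors=5)
model.fit(X_train, y_train)

# Evaluate on the untouched test set.
test_error = 1 - accuracy_score(y_test, model.predict(X_test))
print(f"Test error: {test_error:.3f}")
```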
Holdout Method: Class Exercise
Standardization, as we know, is about scaling the data. It requires calculation of the mean and
standard deviation.
When should the mean and standard deviation be calculated, and why?
(a) before splitting, on the entire dataset, or
(b) after splitting, on just the training set, or
(c) after splitting, on just the test set, or
(d) after splitting, on the training and test sets separately?
What to do when the model is deployed?
Facts about Holdout Method
● The disadvantages of this method are:
○ Results can vary quite a lot across different runs.
○ Informally, you might get lucky or unlucky, i.e., in any one split, the data used for training or testing might not be representative.
● We are training on only a subset of the available dataset, perhaps as little as 50% of it.
From so little data, we may learn a worse model and so our error measurement may be
pessimistic.
● In practice, we only use the holdout method when we have a very large dataset. The size
of the dataset mitigates the above problems.
● When we have a smaller dataset, we use a resampling method:
○ The examples get re-used for training and testing.
K-fold Cross-Validation Method
The most-used resampling method is k-fold cross-validation:
● Shuffle the dataset and partition it into k disjoint subsets of equal size.
○ Each of the partitions is called a fold.
○ Typically, k=10, so you have 10 folds.
● You take each fold in turn and use it as the test set, training the learner on the remaining
folds.
● Clearly, you can do this k times, so that each fold gets 'a turn' at being the test set.
○ By this method, each example is used exactly once for testing, and k-1 times for
training.
K-fold Cross-Validation: Pseudocode
● Shuffle the dataset D and partition it into k disjoint equal-sized subsets, D1, ..., Dk
● for i = 1 to k:
○ train on D \ Di
○ make predictions for Di
○ measure the error (e.g., MAE)
● Report the mean of the errors
[Figure: k = 5 folds; in each of the k iterations a different fold serves as the test set.]
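The pseudocode above can be sketched with scikit-learn's KFold as follows (the diabetes dataset, linear regression model, and MAE metric are illustrative assumptions):

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import KFold

X, y = load_diabetes(return_X_y=True)

# Shuffle the dataset and partition it into k = 5 disjoint folds.
kf = KFold(n_splits=5, shuffle=True, random_state=42)
errors = []

for train_idx, test_idx in kf.split(X):
    # Train on D \ Di ...
    model = LinearRegression().fit(X[train_idx], y[train_idx])
    # ... then make predictions for Di and measure the error (MAE here).
    preds = model.predict(X[test_idx])
    errors.append(mean_absolute_error(y[test_idx], preds))

# Report the mean of the k per-fold errors.
print(f"Mean CV error (MAE): {np.mean(errors):.2f}")
```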
Facts about K-fold Cross-Validation
● The disadvantages of this method are:
○ The number of folds is constrained by the size of the dataset and by the common statistical preference for folds of at least 30 examples.
○ It can be costly to train the learning algorithm k times.
○ There may still be some variability in the results due to 'lucky'/'unlucky' splits.
● The extreme is k = n, also known as leave-one-out cross-validation or LOOCV.
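As a sketch, LOOCV is the same procedure with k equal to the number of examples n; scikit-learn exposes it directly as LeaveOneOut (the dataset and classifier below are illustrative):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import LeaveOneOut, cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# One fold per example: n training runs, each tested on a single held-out point.
scores = cross_val_score(KNeighborsClassifier(), X, y, cv=LeaveOneOut())
print(f"LOOCV error: {1 - scores.mean():.3f}")
```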
Nested K-fold Cross-Validation Method
When tuning hyperparameters (parameters of the model class, not of the individual model), we partition the whole dataset into three disjoint sets:
● A training set to train candidate models.
● A validation set (a.k.a. a development set or dev set) to evaluate the candidate models and choose the best one.
● A test set to do a final unbiased evaluation of the best model.
K-fold Cross-Validation can be applied to the validation set (inner CV) and the test set (outer CV) in a nested way.
[Figure: Dataset → Shuffled Dataset → Train | Dev | Test. Train candidate models and select the best one on Train/Dev; merge Train and Dev, retrain, and test the model on Test; finally merge everything, retrain, and deploy the model.]
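A sketch of nested cross-validation with scikit-learn: GridSearchCV plays the role of the inner CV (hyperparameter selection) and cross_val_score the outer CV (final performance estimate). The SVC model and its C grid are illustrative choices only:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, KFold, cross_val_score
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Inner CV: choose the best hyperparameters for each outer training split.
inner_cv = KFold(n_splits=3, shuffle=True, random_state=0)
search = GridSearchCV(SVC(), param_grid={"C": [0.1, 1, 10]}, cv=inner_cv)

# Outer CV: estimate the performance of the whole tuning procedure.
outer_cv = KFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(search, X, y, cv=outer_cv)

print(f"Nested CV accuracy: {scores.mean():.3f} ± {scores.std():.3f}")
```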
Model’s Performance
Reading the three errors together tells us where the model stands:
● High training error → Underfitting.
● Low training error, high validation error → Overfitting.
● Low training and validation error, high test error → I.I.D. assumption violated.
● Low training, validation, and test error → Good Model.
Model’s Performance
The same diagnosis points to a remedy:

Underfitting (high training error):
● Need a more complex model
● Need less regularization
● Need more features
● More data doesn't work

Overfitting (low training error, high validation error):
● Need a simpler model
● Need more regularization
● Remove extra features
● Need more data

As before, low training and validation error with a high test error indicates an I.I.D. violation, while low error on all three indicates a good model.
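The diagnosis above can be summarised as a small rule of thumb; the sketch below is illustrative only, and the error threshold is an arbitrary assumption rather than part of the lecture:

```python
def diagnose(train_err, val_err, test_err, threshold=0.1):
    """Rule-of-thumb reading of the error pattern described above.

    `threshold` is an illustrative cut-off for a 'high' error, not a fixed rule.
    """
    if train_err > threshold:
        return "underfitting: more complex model, less regularization, more features"
    if val_err > threshold:
        return "overfitting: simpler model, more regularization, fewer features, more data"
    if test_err > threshold:
        return "high test error only: possible violation of the i.i.d. assumption"
    return "good model"

# Example: low training error but high validation error -> overfitting.
print(diagnose(train_err=0.02, val_err=0.25, test_err=0.24))
```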
Next lecture Evaluation - II
11th August 2024