
SURIGAO STATE COLLEGE OF TECHNOLOGY

LEARNING MODULE NO. 4


Title MODEL EVALUATION

Topic 4.1 Classification Evaluation

4.2 Regression Evaluation

Time Frame 10 hrs.

Introduction Model evaluation is an essential part of the model development process. It is the
phase in which we decide whether the model performs well enough. Therefore, it is critical
to consider the model's outcomes under every applicable evaluation method, since
applying different methods can provide different perspectives. There are different
metrics (or methods) such as accuracy, recall, precision, and the F1 score. These are the
most widely used and best-known metrics for model evaluation in classification, and each
of them evaluates the model in a different way. Model evaluation aims to estimate
the generalization accuracy of a model on future (unseen/out-of-sample) data.

Objectives In this module, learners will be able to:

1. Comprehensively predict how well the final model will work in the future, as
an integral part of many learning methods.

2. Accurately evaluate data using the cross-validation technique for classification
and regression evaluation.

Learning Activities
(to include MODEL EVALUATION
Content/Discussion
of the Topic) Model evaluation is an integral part of the model development process. It helps
us find the best model that represents our data and tells us how well the chosen model
will work in the future. Evaluating model performance with the data used for
training is not acceptable in data science because it can easily produce
overoptimistic and overfitted models. There are two methods of evaluating
models in data science: Hold-Out and Cross-Validation. To avoid overfitting, both
methods use a test set (not seen by the model) to evaluate model performance.

Hold-Out

In this method, the dataset (usually a large one) is randomly divided into three subsets; a code sketch of such a split follows the list:

1. Training set: a subset of the dataset used to build predictive models.

2. Validation set: a subset of the dataset used to assess the performance of the
model built in the training phase. It provides a test platform for fine-tuning
the model's parameters and selecting the best-performing model. Not
all modeling algorithms need a validation set.
3. Test set (unseen examples): a subset of the dataset used to assess the likely
future performance of a model. If a model fits the training set much
better than it fits the test set, overfitting is probably the cause.
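
A minimal sketch of a hold-out split using scikit-learn. The synthetic data and the 60/20/20 proportions are illustrative assumptions, not values prescribed by this module:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Synthetic data stands in for whatever dataset is being modeled.
X, y = make_classification(n_samples=1000, random_state=42)

# Split off the test set (20%), then carve a validation set out of the rest.
X_tmp, X_test, y_tmp, y_test = train_test_split(X, y, test_size=0.20, random_state=42)
X_train, X_val, y_train, y_val = train_test_split(X_tmp, y_tmp, test_size=0.25, random_state=42)  # 0.25 * 0.80 = 0.20

print(len(X_train), len(X_val), len(X_test))  # 600 200 200
```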

Cross-Validation

When only a limited amount of data is available, we use k-fold cross-validation
to achieve an unbiased estimate of the model's performance. In k-fold cross-
validation, we divide the data into k subsets of equal size. We build
models k times, each time leaving out one of the subsets from training and using it
as the test set. If k equals the sample size, this is called "leave-one-out".
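
A minimal sketch of k-fold cross-validation with scikit-learn, reusing the same kind of synthetic data as above; the choice of logistic regression and k = 5 is an illustrative assumption:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=1000, random_state=42)

# Each of the k = 5 folds serves exactly once as the held-out test set.
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5)
print(scores.mean(), scores.std())  # average accuracy across folds and its spread
```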

Model evaluation can be divided into two sections:

 Classification Evaluation
 Regression Evaluation

4.1 Model Evaluation – Classification


In data mining, classification involves the problem of predicting which category or
class a new observation belongs to. The derived model (classifier) is based on the
analysis of a set of training data where each data point is given a class label. The trained
model (classifier) is then used to predict the class label for new, unseen data.

To understand classification metrics, one of the most important concepts is the
confusion matrix.

Confusion Matrix

A confusion matrix shows the number of correct and incorrect predictions made
by the classification model compared to the actual outcomes (target value) in the
data. The matrix is NxN, where N is the number of target values (classes).
Performance of such models is commonly evaluated using the data in the matrix.

The following table displays a 2x2 confusion matrix for two classes (Positive and
Negative).

Confusion Matrix                      Target
                             Positive      Negative
         Positive               a             b        Positive Predictive Value = a/(a+b)
Model
         Negative               c             d        Negative Predictive Value = d/(c+d)

                            Sensitivity   Specificity
                            = a/(a+c)     = d/(b+d)    Accuracy = (a+d)/(a+b+c+d)

 Accuracy: the proportion of the total number of predictions that were
correct.
 Positive Predictive Value or Precision: the proportion of predicted positive
cases that were truly positive.
 Negative Predictive Value: the proportion of predicted negative cases that
were truly negative.
 Sensitivity or Recall: the proportion of actual positive cases that were
correctly identified.
 Specificity: the proportion of actual negative cases that were correctly
identified.

Example

Confusion Matrix                      Target
                             Positive      Negative
         Positive               70            20       Positive Predictive Value = 0.78
Model
         Negative               30            80       Negative Predictive Value = 0.73

                            Sensitivity   Specificity
                               0.70          0.80      Accuracy = 0.75
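
A minimal sketch that reproduces the example's metrics from the four cell counts; the variable names follow the a/b/c/d layout of the matrix above:

```python
# Cell counts from the example: rows = model prediction, columns = target.
a, b, c, d = 70, 20, 30, 80  # TP, FP, FN, TN

accuracy    = (a + d) / (a + b + c + d)  # 0.75
precision   = a / (a + b)                # 0.78 (positive predictive value)
npv         = d / (c + d)                # 0.73 (negative predictive value)
sensitivity = a / (a + c)                # 0.70 (recall)
specificity = d / (b + d)                # 0.80

print(accuracy, precision, npv, sensitivity, specificity)
```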

Gain and Lift Charts


Gain or lift is a measure of the effectiveness of a classification model, calculated
as the ratio between the results obtained with and without the model. Gain and
lift charts are visual aids for evaluating the performance of classification models.
However, in contrast to the confusion matrix, which evaluates the model on the whole
population, a gain or lift chart evaluates model performance on a portion of the
population.

Lift Chart
The lift chart shows how much more likely we are to receive positive responses
by using the model than by contacting a random sample of customers. For example, by
contacting only 10% of customers, selected using the predictive model, we may reach 3
times as many respondents as we would with no model. A minimal computation of this
idea is sketched below.
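
This sketch computes the lift at a 10% contact depth from model scores. The labels, scores, and correlation between them are illustrative stand-ins, not data from this module:

```python
import numpy as np

rng = np.random.default_rng(42)
# Stand-ins: true response labels and model scores for 1000 customers.
y_true = rng.integers(0, 2, 1000)
y_score = y_true * 0.3 + rng.random(1000)  # scores loosely correlated with labels

# Sort customers by score, take the top 10%, and compare their response
# rate with the overall response rate: that ratio is the lift at 10%.
order = np.argsort(-y_score)
top = y_true[order][: len(y_true) // 10]
lift_at_10 = top.mean() / y_true.mean()
print(lift_at_10)  # a lift of 3 would mean 3x as many respondents as random contact
```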

K-S Chart

The K-S or Kolmogorov-Smirnov chart measures the performance of classification models.
More accurately, K-S is a measure of the degree of separation between the
positive and negative distributions. The K-S is 100 if the scores partition the
population into two separate groups, one containing all the positives
and the other all the negatives. On the other hand, if the model cannot
differentiate between positives and negatives, it is as if the model selects
cases randomly from the population, and the K-S is 0. In most classification
models the K-S falls between 0 and 100; the higher the value, the
better the model is at separating the positive from the negative cases.
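
A minimal sketch of the K-S statistic computed from scores. The two score distributions are illustrative assumptions in the module's 0-1000 range; scipy's two-sample K-S test returns the maximum distance between the two cumulative distributions, multiplied here by 100 to match the 0-100 scale described above:

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(42)
# Illustrative scores in the 0-1000 range used by the example below.
target_scores = rng.normal(650, 120, 500).clip(0, 1000)      # positives (Target)
non_target_scores = rng.normal(450, 120, 500).clip(0, 1000)  # negatives (Non-Target)

# K-S = maximum vertical distance between the two cumulative score
# distributions; 100 means perfect separation, 0 means none.
ks = ks_2samp(target_scores, non_target_scores).statistic * 100
print(ks)
```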

Example:

The following example shows the results from a classification model. The model
assigns a score between 0 and 1000 to each positive (Target) and negative
(Non-Target) outcome.

ROC Chart

ROC curves are a useful visual tool for comparing two classification models. The
name ROC stands for Receiver Operating Characteristic. ROC curves come from
signal detection theory that was developed during World War II for the analysis
of radar images. An ROC curve shows the trade-off between the true positive rate
or sensitivity (proportion of positive tuples that are correctly identified) and the
false-positive rate (proportion of negative tuples that are incorrectly identified as
positive) for a given model. That is, given a two-class problem, it allows us to
visualize the trade-off between the rate at which the model can accurately
recognize ‘yes’ cases versus the rate at which it mistakenly identifies ‘no’ cases as
‘yes’ for different “portions” of the test set. Any increase in the true positive rate
occurs at the cost of an increase in the false-positive rate. The area under the ROC
curve is a measure of the accuracy of the model.

The ROC chart is similar to the gain and lift charts in that it provides a means of
comparison between classification models. The ROC chart shows the false positive
rate (1 - specificity) on the X-axis, the probability of target = 1 when its true value is 0,
against the true positive rate (sensitivity) on the Y-axis, the probability of target = 1 when
its true value is 1. Ideally, the curve climbs quickly toward the top-left, meaning
the model correctly predicted the cases. The diagonal red line represents a random
model.
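
A minimal sketch of an ROC curve with scikit-learn and matplotlib; the labels and scores are illustrative stand-ins for the output of a fitted classifier:

```python
import matplotlib.pyplot as plt
import numpy as np
from sklearn.metrics import roc_curve

rng = np.random.default_rng(42)
y_true = rng.integers(0, 2, 1000)          # stand-in true labels
y_score = y_true * 0.5 + rng.random(1000)  # stand-in model scores

fpr, tpr, _ = roc_curve(y_true, y_score)
plt.plot(fpr, tpr, label="model")
plt.plot([0, 1], [0, 1], "r--", label="random model")  # the diagonal red line
plt.xlabel("False positive rate (1 - specificity)")
plt.ylabel("True positive rate (sensitivity)")
plt.legend()
plt.show()
```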

Area Under the Curve (AUC)

The area under the ROC curve is often used as a measure of the quality of a classification
model. A random classifier has an area under the curve of 0.5, while the AUC for a
perfect classifier is equal to 1. In practice, most classification models have
an AUC between 0.5 and 1.

An area under the ROC curve of 0.8, for example, means that a randomly selected
case from the group with target = 1 has a larger score than a
randomly chosen case from the group with target = 0 about 80% of the time.
When a classifier cannot distinguish between the two groups, the area is
equal to 0.5 (the ROC curve coincides with the diagonal). When there is a
perfect separation of the two groups, i.e., no overlap of the distributions, the
area under the ROC curve reaches 1 (the ROC curve reaches the upper left
corner of the plot).
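
This sketch checks that probability interpretation numerically, reusing the same stand-in scores as the ROC sketch: the fraction of (positive, negative) pairs where the positive scores higher matches roc_auc_score (up to ties):

```python
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(42)
y_true = rng.integers(0, 2, 1000)
y_score = y_true * 0.5 + rng.random(1000)

auc = roc_auc_score(y_true, y_score)

# Empirical check: P(score of a random positive > score of a random negative).
pos, neg = y_score[y_true == 1], y_score[y_true == 0]
pairwise = (pos[:, None] > neg[None, :]).mean()
print(auc, pairwise)  # the two values agree
```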

Activity #1:

Using your activity on Classification in Module 3, accurately evaluate your data
using the cross-validation technique for classification. Make a
comprehensive prediction and interpretation of how well the final
model will work in the future and as an integral part of many learning
methods.

4.2 Model Evaluation - Regression

After building a number of different regression models, there is a wealth of
criteria by which they can be evaluated and compared.

Root Mean Squared Error

RMSE is a popular formula for measuring the error rate of a regression model.
However, it can only be compared between models whose errors are measured
in the same units.
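
In standard notation, with actual values $y_i$, predicted values $\hat{y}_i$, and $n$ cases:

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}$$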

Relative Squared Error

Unlike RMSE, the relative squared error (RSE) can be compared
between models whose errors are measured in different units.
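
In standard notation, where $\bar{y}$ is the mean of the actual values; dividing by the squared error of the always-predict-the-mean baseline is what makes RSE unit-free:

$$\mathrm{RSE} = \frac{\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2}$$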

Mean Absolute Error

The mean absolute error (MAE) has the same unit as the original data, and it
can only be compared between models whose errors are measured in the
same units. It is usually similar in magnitude to RMSE, but slightly smaller.
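
In standard notation:

$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|y_i - \hat{y}_i\right|$$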

Relative Absolute Error

Like RSE, the relative absolute error (RAE) can be compared between models
whose errors are measured in different units.
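
In standard notation, again relative to the always-predict-the-mean baseline:

$$\mathrm{RAE} = \frac{\sum_{i=1}^{n}\left|y_i - \hat{y}_i\right|}{\sum_{i=1}^{n}\left|y_i - \bar{y}\right|}$$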

Coefficient of Determination

The coefficient of determination (R2) summarizes the explanatory power of the
regression model and is computed from the sums-of-squares terms.

R2 describes the proportion of variance of the dependent variable explained by
the regression model. If the regression model is "perfect", SSE is zero and R2 is 1.
If the regression model is a total failure, SSE is equal to SST, no variance is
explained by the regression, and R2 is zero.
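
In terms of the sums of squares named above:

$$R^2 = 1 - \frac{\mathrm{SSE}}{\mathrm{SST}}, \qquad \mathrm{SSE} = \sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2, \qquad \mathrm{SST} = \sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2$$

A minimal sketch computing all five criteria with NumPy; the actual and predicted values are small illustrative arrays, not data from this module:

```python
import numpy as np

y = np.array([3.0, 5.0, 7.5, 10.0, 12.0])      # actual values (illustrative)
y_hat = np.array([2.8, 5.4, 7.0, 10.5, 11.6])  # model predictions (illustrative)

err = y - y_hat     # prediction errors
dev = y - y.mean()  # deviations from the mean (the trivial baseline's errors)

rmse = np.sqrt(np.mean(err**2))
rse = np.sum(err**2) / np.sum(dev**2)
mae = np.mean(np.abs(err))
rae = np.sum(np.abs(err)) / np.sum(np.abs(dev))
r2 = 1 - rse  # R^2 = 1 - SSE/SST

print(rmse, rse, mae, rae, r2)
```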

Standardized Residuals (Errors) Plot

The standardized residual plot is a useful visualization tool for showing
the residual dispersion patterns on a standardized scale. There are no
substantial differences between the pattern of a standardized residual
plot and the pattern of a regular residual plot. The only difference is the
standardized scale on the y-axis, which allows us to easily detect potential
outliers.
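
A minimal sketch of a standardized residual plot with matplotlib, reusing the illustrative y and y_hat from above; standardizing by the residuals' standard deviation and flagging |residual| > 2 as potential outliers are common but assumed conventions here:

```python
import matplotlib.pyplot as plt
import numpy as np

y = np.array([3.0, 5.0, 7.5, 10.0, 12.0])
y_hat = np.array([2.8, 5.4, 7.0, 10.5, 11.6])

residuals = y - y_hat
standardized = residuals / residuals.std()  # simple standardization

plt.scatter(y_hat, standardized)
plt.axhline(0, color="gray")
plt.axhline(2, color="red", linestyle="--")   # common outlier threshold
plt.axhline(-2, color="red", linestyle="--")
plt.xlabel("Predicted value")
plt.ylabel("Standardized residual")
plt.show()
```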

Activity #2:

Using your activity on Regression in Module 3, accurately evaluate your data
using the cross-validation technique for regression. Make a
comprehensive prediction and interpretation of how well the final
model will work in the future and as an integral part of many learning
methods.

Self-Evaluation Name: ______________________________________ Date: __________
Program/Yr/Section: __________________________ Score: _________

Try to answer the questions below to test your knowledge of this lesson.

1. TABLE 4.1: Data for Height Classification

Name        Gender   Height    Output1   Output2
Kristina    F        1.6 m     Short     Medium
Jim         M        2 m       Tall      Medium
Maggie      F        1.9 m     Medium    Tall
Martha      F        1.88 m    Medium    Tall
Stephanie   F        1.7 m     Short     Medium
Bob         M        1.85 m    Medium    Medium
Kathy       F        1.6 m     Short     Medium
Dave        M        1.7 m     Short     Medium
Worth       M        2.2 m     Tall      Tall
Steven      M        2.1 m     Tall      Tall
Debbie      F        1.8 m     Medium    Medium
Todd        M        1.95 m    Medium    Medium
Kim         F        1.9 m     Medium    Tall
Amy         F        1.8 m     Medium    Medium
Wynette     F        1.75 m    Medium    Medium

a. Using the data in Table 4.1, draw ROC curves assuming that the Output2
column is the correct classification and Output1 is what is seen. You will
need to draw three curves, one for each class.
b. Using the data in Table 4.1, construct a confusion matrix assuming Output2
is the correct assignment and Output1 is the assignment actually made.

We now look at predicting the class using the short and medium data as input
and looking at the Output1 classification. Since regression assumes numeric
data, we assume that the value for the short class is 0 and the value for the
medium class is 1.

Table 4.1(a) shows the data for this example: {(1.6, 0), (1.9, 1), (1.88, 1), (1.7, 0),
(1.85, 1), (1.6, 0), (1.7, 0), (1.8, 1), (1.95, 1), (1.9, 1), (1.8, 1), (1.75, 1)}.

c. In this case, you may use all the techniques in model evaluation for
regression.

Review of Concepts Model evaluation aims to estimate the generalization accuracy of a model on
future (unseen/out-of-sample) data. Methods for evaluating a model's
performance fall into two categories: hold-out and cross-validation.
Both methods use a test set (i.e., data not seen by the model) to evaluate model
performance.

Thoroughly evaluate the model, and review the steps executed to construct it,
to be certain it properly achieves the business objectives. Determine whether
any important business issue has not been sufficiently considered.

In particular, the results have to be evaluated in the context of the business
goal. In this phase, new patterns may be discovered in the model results, and new
business requirements may arise from other factors; gaining
business understanding is an iterative process in data mining. The go or no-go
decision to move to the deployment phase must be made in this step.

References Han, J., Kamber, M. and Pei, J. (2011). Data Mining: Concepts and Techniques,
3rd edition. Morgan Kaufmann.

Sayad, S. (2010-2021). An Introduction to Data Mining. http://www.saedsayad.com/data_mining

Dunham, M.H. (2003). Data Mining Introductory and Advanced Topics. Pearson
Education Inc., Upper Saddle River, New Jersey.

https://www.javatpoint.com/data-mining

https://www.pluralsight.com/guides/evaluating-a-data-mining-model
