
SURIGAO STATE COLLEGE OF TECHNOLOGY

LEARNING MODULE NO. 4


Title MODEL EVALUATION

Topic 4.1 Classification Evaluation

4.2 Regression Evaluation

Time Frame 10 hrs.

Introduction Model evaluation is an essential part of the model development process. It is the
phase in which we decide whether the model performs well enough. Therefore, it is critical
to consider the model's outcomes under every applicable evaluation method, since
applying different methods can provide different perspectives. There are different
metrics (or methods) such as accuracy, recall, precision, and the F1 score. These are the
most widely used and best-known metrics for model evaluation in classification, and each
of them evaluates the model in a different way. Model evaluation aims to estimate
the generalization accuracy of a model on future (unseen/out-of-sample) data.

Objectives In this module, learners will be able to:

1. Comprehensively predict how well the final model will work in the future, as
an integral part of many learning methods.

2. Accurately evaluate data using the cross-validation technique for classification
and regression evaluation.

Learning Activities
(to include MODEL EVALUATION
Content/Discussion
of the Topic) Model evaluation is an integral part of the model development process. It helps
us find the best model that represents our data and tells us how well the chosen model
will work in the future. Evaluating model performance with the data used for
training is not acceptable in data science because it can easily produce
overoptimistic and overfitted models. There are two methods of evaluating
models in data science: Hold-Out and Cross-Validation. To avoid overfitting, both
methods use a test set (not seen by the model) to evaluate model performance.

Hold-Out

In this method, the dataset (usually a large one) is randomly divided into three subsets; a code sketch of such a split follows the list:

1. Training set: a subset of the dataset used to build predictive models.

2. Validation set: a subset of the dataset used to assess the performance of the
model built in the training phase. It provides a test platform for fine-tuning
the model's parameters and selecting the best-performing model. Not
all modeling algorithms need a validation set.
3. Test set (unseen examples): a subset of the dataset used to assess the likely
future performance of a model. If a model fits the training set much
better than it fits the test set, overfitting is probably the cause.
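
A minimal sketch of a hold-out split using scikit-learn. The synthetic data and the 60/20/20 proportions are illustrative assumptions, not values prescribed by this module:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Synthetic data stands in for whatever dataset is being modeled.
X, y = make_classification(n_samples=1000, random_state=42)

# Split off the test set (20%), then carve a validation set out of the rest.
X_tmp, X_test, y_tmp, y_test = train_test_split(X, y, test_size=0.20, random_state=42)
X_train, X_val, y_train, y_val = train_test_split(X_tmp, y_tmp, test_size=0.25, random_state=42)  # 0.25 * 0.80 = 0.20

print(len(X_train), len(X_val), len(X_test))  # 600 200 200
```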

Cross-Validation

When only a limited amount of data is available, we use k-fold cross-validation
to achieve an unbiased estimate of the model's performance. In k-fold cross-
validation, we divide the data into k subsets of equal size. We build
models k times, each time leaving out one of the subsets from training and using it
as the test set. If k equals the sample size, this is called "leave-one-out".
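
A minimal sketch of k-fold cross-validation with scikit-learn, reusing the same kind of synthetic data as above; the choice of logistic regression and k = 5 is an illustrative assumption:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=1000, random_state=42)

# Each of the k = 5 folds serves exactly once as the held-out test set.
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5)
print(scores.mean(), scores.std())  # average accuracy across folds and its spread
```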

Model evaluation can be divided into two sections:

 Classification Evaluation
 Regression Evaluation

4.1 Model Evaluation – Classification


In data mining, classification involves the problem of predicting which category or
class a new observation belongs to. The derived model (classifier) is based on the
analysis of a set of training data where each data point is given a class label. The trained
model (classifier) is then used to predict the class label for new, unseen data.

To understand classification metrics, one of the most important concepts is the
confusion matrix.

Confusion Matrix

A confusion matrix shows the number of correct and incorrect predictions made
by the classification model compared to the actual outcomes (target value) in the
data. The matrix is NxN, where N is the number of target values (classes).
Performance of such models is commonly evaluated using the data in the matrix.

The following table displays a 2x2 confusion matrix for two classes (Positive and
Negative).

Confusion Matrix                      Target
                             Positive      Negative
         Positive               a             b        Positive Predictive Value = a/(a+b)
Model
         Negative               c             d        Negative Predictive Value = d/(c+d)

                            Sensitivity   Specificity
                            = a/(a+c)     = d/(b+d)    Accuracy = (a+d)/(a+b+c+d)

 Accuracy: the proportion of the total number of predictions that were
correct.
 Positive Predictive Value or Precision: the proportion of predicted positive
cases that were truly positive.
 Negative Predictive Value: the proportion of predicted negative cases that
were truly negative.
 Sensitivity or Recall: the proportion of actual positive cases that were
correctly identified.
 Specificity: the proportion of actual negative cases that were correctly
identified.

Example

Confusion Matrix                      Target
                             Positive      Negative
         Positive               70            20       Positive Predictive Value = 0.78
Model
         Negative               30            80       Negative Predictive Value = 0.73

                            Sensitivity   Specificity
                               0.70          0.80      Accuracy = 0.75
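
A minimal sketch that reproduces the example's metrics from the four cell counts; the variable names follow the a/b/c/d layout of the matrix above:

```python
# Cell counts from the example: rows = model prediction, columns = target.
a, b, c, d = 70, 20, 30, 80  # TP, FP, FN, TN

accuracy    = (a + d) / (a + b + c + d)  # 0.75
precision   = a / (a + b)                # 0.78 (positive predictive value)
npv         = d / (c + d)                # 0.73 (negative predictive value)
sensitivity = a / (a + c)                # 0.70 (recall)
specificity = d / (b + d)                # 0.80

print(accuracy, precision, npv, sensitivity, specificity)
```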

Gain and Lift Charts


Gain or lift is a measure of the effectiveness of a classification model, calculated
as the ratio between the results obtained with and without the model. Gain and
lift charts are visual aids for evaluating the performance of classification models.
However, in contrast to the confusion matrix, which evaluates the model on the whole
population, a gain or lift chart evaluates model performance on a portion of the
population.

Lift Chart
The lift chart shows how much more likely we are to receive positive responses
by using the model than by contacting a random sample of customers. For example, by
contacting only 10% of customers, selected using the predictive model, we may reach 3
times as many respondents as we would with no model. A minimal computation of this
idea is sketched below.
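
This sketch computes the lift at a 10% contact depth from model scores. The labels, scores, and correlation between them are illustrative stand-ins, not data from this module:

```python
import numpy as np

rng = np.random.default_rng(42)
# Stand-ins: true response labels and model scores for 1000 customers.
y_true = rng.integers(0, 2, 1000)
y_score = y_true * 0.3 + rng.random(1000)  # scores loosely correlated with labels

# Sort customers by score, take the top 10%, and compare their response
# rate with the overall response rate: that ratio is the lift at 10%.
order = np.argsort(-y_score)
top = y_true[order][: len(y_true) // 10]
lift_at_10 = top.mean() / y_true.mean()
print(lift_at_10)  # a lift of 3 would mean 3x as many respondents as random contact
```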

K-S Chart

The K-S or Kolmogorov-Smirnov chart measures the performance of classification models.
More accurately, K-S is a measure of the degree of separation between the
positive and negative distributions. The K-S is 100 if the scores partition the
population into two separate groups, one containing all the positives
and the other all the negatives. On the other hand, if the model cannot
differentiate between positives and negatives, it is as if the model selects
cases randomly from the population, and the K-S is 0. In most classification
models the K-S falls between 0 and 100; the higher the value, the
better the model is at separating the positive from the negative cases.
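
A minimal sketch of the K-S statistic computed from scores. The two score distributions are illustrative assumptions in the module's 0-1000 range; scipy's two-sample K-S test returns the maximum distance between the two cumulative distributions, multiplied here by 100 to match the 0-100 scale described above:

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(42)
# Illustrative scores in the 0-1000 range used by the example below.
target_scores = rng.normal(650, 120, 500).clip(0, 1000)      # positives (Target)
non_target_scores = rng.normal(450, 120, 500).clip(0, 1000)  # negatives (Non-Target)

# K-S = maximum vertical distance between the two cumulative score
# distributions; 100 means perfect separation, 0 means none.
ks = ks_2samp(target_scores, non_target_scores).statistic * 100
print(ks)
```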

Example:

The following example shows the results from a classification model. The model
assigns a score between 0 and 1000 to each positive (Target) and negative
(Non-Target) outcome.

ROC Chart

ROC curves are a useful visual tool for comparing two classification models. The
name ROC stands for Receiver Operating Characteristic. ROC curves come from
signal detection theory that was developed during World War II for the analysis
of radar images. An ROC curve shows the trade-off between the true positive rate
or sensitivity (proportion of positive tuples that are correctly identified) and the
false-positive rate (proportion of negative tuples that are incorrectly identified as
positive) for a given model. That is, given a two-class problem, it allows us to
visualize the trade-off between the rate at which the model can accurately
recognize ‘yes’ cases versus the rate at which it mistakenly identifies ‘no’ cases as
‘yes’ for different “portions” of the test set. Any increase in the true positive rate
occurs at the cost of an increase in the false-positive rate. The area under the ROC
curve is a measure of the accuracy of the model.

The ROC chart is similar to the gain and lift charts in that it provides a means of
comparison between classification models. The ROC chart shows the false positive
rate (1 - specificity) on the X-axis, the probability of target = 1 when its true value is 0,
against the true positive rate (sensitivity) on the Y-axis, the probability of target = 1 when
its true value is 1. Ideally, the curve climbs quickly toward the top-left, meaning
the model correctly predicted the cases. The diagonal red line represents a random
model.
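
A minimal sketch of an ROC curve with scikit-learn and matplotlib; the labels and scores are illustrative stand-ins for the output of a fitted classifier:

```python
import matplotlib.pyplot as plt
import numpy as np
from sklearn.metrics import roc_curve

rng = np.random.default_rng(42)
y_true = rng.integers(0, 2, 1000)          # stand-in true labels
y_score = y_true * 0.5 + rng.random(1000)  # stand-in model scores

fpr, tpr, _ = roc_curve(y_true, y_score)
plt.plot(fpr, tpr, label="model")
plt.plot([0, 1], [0, 1], "r--", label="random model")  # the diagonal red line
plt.xlabel("False positive rate (1 - specificity)")
plt.ylabel("True positive rate (sensitivity)")
plt.legend()
plt.show()
```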

Area Under the Curve (AUC)

The area under the ROC curve is often used as a measure of the quality of a classification
model. A random classifier has an area under the curve of 0.5, while the AUC for a
perfect classifier is equal to 1. In practice, most classification models have
an AUC between 0.5 and 1.

An area under the ROC curve of 0.8, for example, means that a randomly selected
case from the group with target = 1 has a larger score than a
randomly chosen case from the group with target = 0 about 80% of the time.
When a classifier cannot distinguish between the two groups, the area is
equal to 0.5 (the ROC curve coincides with the diagonal). When there is a
perfect separation of the two groups, i.e., no overlap of the distributions, the
area under the ROC curve reaches 1 (the ROC curve reaches the upper left
corner of the plot).
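
This sketch checks that probability interpretation numerically, reusing the same stand-in scores as the ROC sketch: the fraction of (positive, negative) pairs where the positive scores higher matches roc_auc_score (up to ties):

```python
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(42)
y_true = rng.integers(0, 2, 1000)
y_score = y_true * 0.5 + rng.random(1000)

auc = roc_auc_score(y_true, y_score)

# Empirical check: P(score of a random positive > score of a random negative).
pos, neg = y_score[y_true == 1], y_score[y_true == 0]
pairwise = (pos[:, None] > neg[None, :]).mean()
print(auc, pairwise)  # the two values agree
```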

Activity #1:

Using your activity on Classification in Module 3, accurately evaluate your data
using the cross-validation technique for classification. Make a
comprehensive prediction and interpretation of how well the final
model will work in the future and as an integral part of many learning
methods.

4.2 Model Evaluation - Regression

After building a number of different regression models, there is a wealth of
criteria by which they can be evaluated and compared.

Root Mean Squared Error

RMSE is a popular formula for measuring the error rate of a regression model.
However, it can only be compared between models whose errors are measured
in the same units.
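
In standard notation, with actual values $y_i$, predicted values $\hat{y}_i$, and $n$ cases:

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}$$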

Relative Squared Error

Unlike RMSE, the relative squared error (RSE) can be compared
between models whose errors are measured in different units.
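
In standard notation, where $\bar{y}$ is the mean of the actual values; dividing by the squared error of the always-predict-the-mean baseline is what makes RSE unit-free:

$$\mathrm{RSE} = \frac{\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2}$$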

Mean Absolute Error

The mean absolute error (MAE) has the same unit as the original data, and it
can only be compared between models whose errors are measured in the
same units. It is usually similar in magnitude to RMSE, but slightly smaller.
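
In standard notation:

$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|y_i - \hat{y}_i\right|$$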

Relative Absolute Error

Like RSE, the relative absolute error (RAE) can be compared between models
whose errors are measured in different units.
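
In standard notation, again relative to the always-predict-the-mean baseline:

$$\mathrm{RAE} = \frac{\sum_{i=1}^{n}\left|y_i - \hat{y}_i\right|}{\sum_{i=1}^{n}\left|y_i - \bar{y}\right|}$$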

Coefficient of Determination

The coefficient of determination (R2) summarizes the explanatory power of the
regression model and is computed from the sums-of-squares terms.

R2 describes the proportion of variance of the dependent variable explained by
the regression model. If the regression model is "perfect", SSE is zero and R2 is 1.
If the regression model is a total failure, SSE is equal to SST, no variance is
explained by the regression, and R2 is zero.
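
In terms of the sums of squares named above:

$$R^2 = 1 - \frac{\mathrm{SSE}}{\mathrm{SST}}, \qquad \mathrm{SSE} = \sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2, \qquad \mathrm{SST} = \sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2$$

A minimal sketch computing all five criteria with NumPy; the actual and predicted values are small illustrative arrays, not data from this module:

```python
import numpy as np

y = np.array([3.0, 5.0, 7.5, 10.0, 12.0])      # actual values (illustrative)
y_hat = np.array([2.8, 5.4, 7.0, 10.5, 11.6])  # model predictions (illustrative)

err = y - y_hat     # prediction errors
dev = y - y.mean()  # deviations from the mean (the trivial baseline's errors)

rmse = np.sqrt(np.mean(err**2))
rse = np.sum(err**2) / np.sum(dev**2)
mae = np.mean(np.abs(err))
rae = np.sum(np.abs(err)) / np.sum(np.abs(dev))
r2 = 1 - rse  # R^2 = 1 - SSE/SST

print(rmse, rse, mae, rae, r2)
```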

Standardized Residuals (Errors) Plot

The standardized residual plot is a useful visualization tool for showing
the residual dispersion patterns on a standardized scale. There are no
substantial differences between the pattern of a standardized residual
plot and the pattern of a regular residual plot. The only difference is the
standardized scale on the y-axis, which allows us to easily detect potential
outliers.
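
A minimal sketch of a standardized residual plot with matplotlib, reusing the illustrative y and y_hat from above; standardizing by the residuals' standard deviation and flagging |residual| > 2 as potential outliers are common but assumed conventions here:

```python
import matplotlib.pyplot as plt
import numpy as np

y = np.array([3.0, 5.0, 7.5, 10.0, 12.0])
y_hat = np.array([2.8, 5.4, 7.0, 10.5, 11.6])

residuals = y - y_hat
standardized = residuals / residuals.std()  # simple standardization

plt.scatter(y_hat, standardized)
plt.axhline(0, color="gray")
plt.axhline(2, color="red", linestyle="--")   # common outlier threshold
plt.axhline(-2, color="red", linestyle="--")
plt.xlabel("Predicted value")
plt.ylabel("Standardized residual")
plt.show()
```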

Activity #2:

Using your activity on Regression in Module 3, accurately evaluate your data
using the cross-validation technique for regression. Make a
comprehensive prediction and interpretation of how well the final
model will work in the future and as an integral part of many learning
methods.

Self-Evaluation Name: ______________________________________ Date: __________
Program/Yr/Section: __________________________ Score: _________

Try to answer the questions below to test your knowledge of this lesson.

1. TABLE 4.1: Data for Height Classification

Name        Gender   Height    Output1   Output2
Kristina    F        1.6 m     Short     Medium
Jim         M        2 m       Tall      Medium
Maggie      F        1.9 m     Medium    Tall
Martha      F        1.88 m    Medium    Tall
Stephanie   F        1.7 m     Short     Medium
Bob         M        1.85 m    Medium    Medium
Kathy       F        1.6 m     Short     Medium
Dave        M        1.7 m     Short     Medium
Worth       M        2.2 m     Tall      Tall
Steven      M        2.1 m     Tall      Tall
Debbie      F        1.8 m     Medium    Medium
Todd        M        1.95 m    Medium    Medium
Kim         F        1.9 m     Medium    Tall
Amy         F        1.8 m     Medium    Medium
Wynette     F        1.75 m    Medium    Medium

a. Using the data in Table 4.1, draw ROC curves assuming that the Output2
column is the correct classification and Output1 is what is seen. You will
need to draw three curves, one for each class.
b. Using the data in Table 4.1, construct a confusion matrix assuming Output2
is the correct assignment and Output1 is the assignment actually made.

We now look at predicting the class using the short and medium data as input
and looking at the Output1 classification. Since regression assumes numeric
data, we assume that the value for the short class is 0 and the value for the
medium class is 1.

Table 4.1(a) shows the data for this example: {(1.6, 0), (1.9, 1), (1.88, 1), (1.7, 0),
(1.85, 1), (1.6, 0), (1.7, 0), (1.8, 1), (1.95, 1), (1.9, 1), (1.8, 1), (1.75, 1)}.

c. In this case, you may use all the techniques in model evaluation for
regression.

Review of Concepts Model evaluation aims to estimate the generalization accuracy of a model on
future (unseen/out-of-sample) data. Methods for evaluating a model's
performance fall into two categories: hold-out and cross-validation.
Both methods use a test set (i.e., data not seen by the model) to evaluate model
performance.

Thoroughly evaluate the model, and review the steps executed to construct it,
to be certain it properly achieves the business objectives. Determine whether
any important business issue has not been sufficiently considered.

In particular, the results have to be evaluated in the context of the business
goal. In this phase, new patterns may be discovered in the model results, and new
business requirements may arise from other factors; gaining
business understanding is an iterative process in data mining. The go or no-go
decision to move to the deployment phase must be made in this step.

References Han, J., Kamber, M. and Pei, J. (2011). Data Mining: Concepts and Techniques,
3rd edition. Morgan Kaufmann.

Sayad, S. (2010-2021). An Introduction to Data Mining. http://www.saedsayad.com/data_mining

Dunham, M.H. (2003). Data Mining Introductory and Advanced Topics. Pearson
Education Inc., Upper Saddle River, New Jersey.

https://www.javatpoint.com/data-mining

https://www.pluralsight.com/guides/evaluating-a-data-mining-model
