Module 4
Introduction Model evaluation is an essential part of the model development process: it is the phase in which we decide whether the model performs well. Therefore, it is important to consider the model outcomes under every applicable evaluation method, since different methods can provide different perspectives. Common metrics for evaluating classification models include accuracy, recall, precision, and the F1 score, each of which evaluates the model in a different way. Model evaluation aims to estimate the generalization accuracy of a model on future (unseen/out-of-sample) data.
Learning Activities (to include Content/Discussion of the Topic)

MODEL EVALUATION

Model Evaluation is an integral part of the model development process. It helps to find the best model that represents our data and shows how well the chosen model will work in the future. Evaluating model performance with the data used for training is not acceptable in data science, because it can easily generate overoptimistic and overfitted models.
Hold-Out
In this method, the (usually large) dataset is randomly divided into three subsets: a training set used to build the model, a validation set used to assess and tune the model during training, and a test set (unseen by the model) used to estimate its future performance.
Cross-Validation
When only a limited amount of data is available, k-fold cross-validation gives a less biased estimate of performance: the data are divided into k equal-sized subsets, and the model is built k times, each time holding out one subset as the test set and training on the rest.
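To make the two methods concrete, here is a minimal sketch using scikit-learn (an assumption; the module itself does not name a library). train_test_split performs a hold-out split and cross_val_score runs k-fold cross-validation; the dataset and classifier are placeholders.

    from sklearn.datasets import load_iris
    from sklearn.model_selection import train_test_split, cross_val_score
    from sklearn.tree import DecisionTreeClassifier

    X, y = load_iris(return_X_y=True)          # placeholder dataset

    # Hold-out: reserve 30% of the data as an unseen test set.
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
    model = DecisionTreeClassifier().fit(X_train, y_train)
    print("hold-out accuracy:", model.score(X_test, y_test))

    # 5-fold cross-validation: five models, each tested on its held-out fold.
    scores = cross_val_score(DecisionTreeClassifier(), X, y, cv=5)
    print("cross-validation accuracy:", scores.mean())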
Model evaluation can be divided into two sections, Classification Evaluation and Regression Evaluation; the classification measures are covered first.
Confusion Matrix
A confusion matrix shows the number of correct and incorrect predictions made
by the classification model compared to the actual outcomes (target value) in the
data. The matrix is NxN, where N is the number of target values (classes).
Performance of such models is commonly evaluated using the data in the matrix.
The following table displays a 2x2 confusion matrix for two classes (Positive and Negative), where a = true positives, b = false positives, c = false negatives, and d = true negatives.

                                      Target
    Confusion Matrix          Positive     Negative
    Model     Positive           a             b        Positive Predictive Value = a/(a+b)
              Negative           c             d        Negative Predictive Value = d/(c+d)
                             Sensitivity  Specificity
                               a/(a+c)      d/(b+d)     Accuracy = (a+d)/(a+b+c+d)
Example

                                      Target
    Confusion Matrix          Positive     Negative
    Model     Positive           70            20       Positive Predictive Value = 0.78
              Negative           30            80       Negative Predictive Value = 0.73
                             Sensitivity  Specificity
                                0.70          0.80      Accuracy = 0.75
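As a check on the arithmetic, here is a minimal sketch in plain Python that recomputes every figure in the example table from the four cell counts.

    # Cell counts from the example: rows are the model's predictions,
    # columns are the actual target values.
    a, b = 70, 20   # predicted Positive: true positives, false positives
    c, d = 30, 80   # predicted Negative: false negatives, true negatives

    ppv         = a / (a + b)                # Positive Predictive Value = 0.78
    npv         = d / (c + d)                # Negative Predictive Value = 0.73
    sensitivity = a / (a + c)                # = 0.70
    specificity = d / (b + d)                # = 0.80
    accuracy    = (a + d) / (a + b + c + d)  # = 0.75

    print(ppv, npv, sensitivity, specificity, accuracy)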
Lift Chart
The lift chart shows how much more likely we are to receive positive responses by using the predictive model than by contacting a random sample of customers. For example, by contacting only 10% of customers, ranked by the predictive model, we may reach three times as many respondents as we would with no model.
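The lift calculation itself is simple; the following sketch (with made-up scores and labels, purely for illustration) compares the response rate in the top 10% of model-ranked customers against the overall response rate.

    import numpy as np

    rng = np.random.default_rng(0)
    scores = rng.random(1000)                            # hypothetical model scores
    labels = (rng.random(1000) < scores).astype(int)     # toy responses, correlated with scores

    top = np.argsort(-scores)[:100]                      # the 10% of customers ranked best
    lift_at_10 = labels[top].mean() / labels.mean()      # top-decile rate vs. baseline rate
    print(f"lift at 10% = {lift_at_10:.2f}")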
K-S Chart
The K-S (Kolmogorov-Smirnov) chart measures how well a classification model separates the positive and negative populations; the K-S statistic is the maximum difference between the cumulative score distributions of the two groups.
Example:
The following example shows the results from a classification model. The model assigns a score between 0 and 1000 to each positive (Target) and negative (Non-Target) outcome.
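One way to compute the K-S statistic for such scores is with scipy (an assumption; the scores below are illustrative, not the ones from the module's example).

    import numpy as np
    from scipy.stats import ks_2samp

    target_scores     = np.array([850, 720, 910, 640, 780, 560, 830])   # hypothetical
    non_target_scores = np.array([210, 340, 150, 470, 290, 380, 520])   # hypothetical

    # K-S: the maximum gap between the two cumulative score distributions.
    stat, p_value = ks_2samp(target_scores, non_target_scores)
    print(f"K-S statistic = {stat:.2f}")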
ROC Chart
ROC curves are a useful visual tool for comparing two classification models. The
name ROC stands for Receiver Operating Characteristic. ROC curves come from
signal detection theory that was developed during World War II for the analysis
of radar images. An ROC curve shows the trade-off between the true positive rate
or sensitivity (proportion of positive tuples that are correctly identified) and the
false-positive rate (proportion of negative tuples that are incorrectly identified as
positive) for a given model. That is, given a two-class problem, it allows us to
visualize the trade-off between the rate at which the model can accurately
recognize ‘yes’ cases versus the rate at which it mistakenly identifies ‘no’ cases as
‘yes’ for different “portions” of the test set. Any increase in the true positive rate
occurs at the cost of an increase in the false-positive rate. The area under the ROC
curve is a measure of the accuracy of the model.
The ROC chart is similar to the gain or lift charts in that it provides a means of comparison between classification models. The ROC chart shows the false positive rate (1 − specificity) on the X-axis, i.e., the probability of predicting target = 1 when the true value is 0, against the true positive rate (sensitivity) on the Y-axis, i.e., the probability of predicting target = 1 when the true value is 1. Ideally, the curve climbs quickly toward the top left, meaning the model predicted the cases correctly. The diagonal red line represents a random model.
The area under the ROC curve (AUC) is often used as a measure of the quality of a classification model. A random classifier has an AUC of 0.5, while the AUC for a perfect classifier is 1. In practice, most classification models have an AUC between 0.5 and 1.
An area under the ROC curve of 0.8, for example, means that a randomly selected case from the group with target = 1 has a larger score than a randomly chosen case from the group with target = 0 about 80% of the time. When a classifier cannot distinguish between the two groups, the area equals 0.5 (the ROC curve coincides with the diagonal). When there is perfect separation of the two groups, i.e., no overlap of the distributions, the area under the ROC curve reaches 1 (the ROC curve reaches the upper left corner of the plot).
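For reference, here is a minimal sketch of computing an ROC curve and its AUC with scikit-learn (an assumption; the labels and scores are made up).

    from sklearn.metrics import roc_curve, roc_auc_score

    y_true  = [0, 0, 1, 1, 0, 1, 0, 1, 1, 0]                       # hypothetical labels
    y_score = [0.1, 0.4, 0.35, 0.8, 0.2, 0.9, 0.5, 0.7, 0.6, 0.3]  # hypothetical scores

    fpr, tpr, thresholds = roc_curve(y_true, y_score)  # points along the ROC curve
    auc = roc_auc_score(y_true, y_score)               # area under that curve
    print(f"AUC = {auc:.2f}")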
Activity #1:
The mean absolute error (MAE) has the same unit as the original data, and it can only be compared between models whose errors are measured in the same units. It is usually similar in magnitude to the RMSE, but slightly smaller. Like the relative squared error (RSE), the relative absolute error (RAE) can be compared between models whose errors are measured in different units.
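The standard formulas behind these metrics are easy to state in code; this sketch uses made-up actual and predicted values and the usual textbook definitions (MAE, RMSE, and RAE relative to the mean predictor).

    import numpy as np

    y      = np.array([3.0, 5.0, 2.5, 7.0, 4.5])   # hypothetical actual values
    y_pred = np.array([2.5, 5.5, 2.0, 8.0, 4.0])   # hypothetical predictions

    mae  = np.mean(np.abs(y - y_pred))              # mean absolute error
    rmse = np.sqrt(np.mean((y - y_pred) ** 2))      # root mean squared error
    rae  = np.sum(np.abs(y - y_pred)) / np.sum(np.abs(y - y.mean()))  # relative absolute error

    print(f"MAE={mae:.3f} RMSE={rmse:.3f} RAE={rae:.3f}")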
Coefficient of Determination
The coefficient of determination (R²) summarizes how much of the variance in the actual values is explained by the model; it ranges from 0 (the model does no better than always predicting the mean) to 1 (a perfect fit).
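The standard definition, with actual values y_i, predictions \hat{y}_i, and mean \bar{y}, is:

    R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}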
Activity #2:
a. Using the data in Table 4.1, draw ROC curves assuming that the Output2 column is the correct classification and Output1 is what is seen. You will need to draw three curves, one for each class.
b. Using the data in Table 4.1, construct a confusion matrix assuming Output2 is the correct assignment and Output1 is what is actually made.
We now look at predicting the class using the short and medium data as input
and looking at the Output1 classification. Since regression assumes numeric
data, we assume that the value for the short class is 0 and the value for the
medium class is 1.
Table 4.1(a) shows the data for this example: {(1.6, 0), (1.9, 1), (1.88, 1), (1.7, 0), (1.85, 1), (1.6, 0), (1.7, 0), (1.8, 1), (1.95, 1), (1.9, 1), (1.8, 1), (1.75, 1)}.
c. In this case, you may use any of the model evaluation techniques for regression discussed above (MAE, RMSE, RAE, and the coefficient of determination).
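As a starting point for part (c), here is a minimal sketch, assuming scikit-learn; it fits a simple linear regression to the (height, class) pairs above and computes two of the metrics.

    import numpy as np
    from sklearn.linear_model import LinearRegression

    # (height, class) pairs from Table 4.1(a); short = 0, medium = 1.
    data = [(1.6, 0), (1.9, 1), (1.88, 1), (1.7, 0), (1.85, 1), (1.6, 0),
            (1.7, 0), (1.8, 1), (1.95, 1), (1.9, 1), (1.8, 1), (1.75, 1)]
    X = np.array([[h] for h, _ in data])
    y = np.array([c for _, c in data])

    model  = LinearRegression().fit(X, y)
    y_pred = model.predict(X)

    mae = np.mean(np.abs(y - y_pred))   # mean absolute error
    r2  = model.score(X, y)             # coefficient of determination
    print(f"MAE={mae:.3f}  R^2={r2:.3f}")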
Thoroughly evaluate the model, and review the steps executed to construct it, to be certain it properly achieves the business objectives. Determine whether some important business issue has not been sufficiently considered. In particular, the results must be evaluated in the context of the business goals. In this phase, new patterns discovered in the model results, or other factors, may give rise to new business requirements; gaining business understanding is an iterative process in data mining. The go or no-go decision to move to the deployment phase must be made in this step.
References Han, J., Kamber, M. and Pei, J. (2011). Data Mining: Concepts and Techniques, 3rd edition. Morgan Kaufmann.
Dunham, M.H. (2003). Data Mining Introductory and Advanced Topics. Pearson Education Inc., Upper Saddle River, New Jersey.
https://www.javatpoint.com/data-mining
https://www.pluralsight.com/guides/evaluating-a-data-mining-model