EVALUATION OF CLASSIFICATION ALGORITHMS

Confusion Matrix
A confusion matrix is a performance measurement for machine learning classification problems where the output can be two or more classes. It is a table of all combinations of predicted and actual values.
•True Positive: we predicted positive and the actual value is positive.
•True Negative: we predicted negative and the actual value is negative.
•False Positive (Type 1 Error): we predicted positive but the actual value is negative.
•False Negative (Type 2 Error): we predicted negative but the actual value is positive.
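To make the four cells concrete, here is a minimal Python sketch (an illustration added here, not from the original slides) that tallies them from a pair of invented label lists:

    # Tally the four confusion-matrix cells for a binary problem.
    # The y_true / y_pred lists are invented for illustration only.
    y_true = [1, 0, 1, 1, 0, 0, 1, 0]   # actual labels (1 = positive)
    y_pred = [1, 0, 0, 1, 0, 1, 1, 0]   # model predictions

    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))  # true positives
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))  # true negatives
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))  # false positives (Type 1)
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))  # false negatives (Type 2)

    print(f"TP={tp} TN={tn} FP={fp} FN={fn}")   # TP=3 TN=3 FP=1 FN=1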

Precision
Precision tells us how many of the predicted positive cases actually turned out to be positive. It is useful where a False Positive is a higher concern than a False Negative. Precision for a label is defined as the number of true positives divided by the number of predicted positives: Precision = TP / (TP + FP).

Recall (Sensitivity)
Recall tells us how many of the actual positive cases the model predicted correctly. It is a useful metric where a False Negative is of higher concern than a False Positive. Recall for a label is defined as the number of true positives divided by the total number of actual positives: Recall = TP / (TP + FN).
F1 Score
The F1 Score combines Precision and Recall into a single metric: it is the harmonic mean of the two, F1 = 2 × Precision × Recall / (Precision + Recall). For a given average of Precision and Recall, it is highest when the two are equal.

The F1 Score can be an effective evaluation metric when:
•FP and FN are equally costly;
•adding more data doesn't effectively change the outcome;
•the number of True Negatives is high, so accuracy alone would be misleading.
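As a quick illustration (again an addition, not from the slides), the sketch below computes precision, recall, and F1 from the invented counts in the earlier confusion-matrix example:

    # Precision, recall, and F1 from confusion-matrix counts.
    # Counts reuse the invented example above: TP=3, FP=1, FN=1.
    tp, fp, fn = 3, 1, 1

    precision = tp / (tp + fp)    # TP / predicted positives
    recall = tp / (tp + fn)       # TP / actual positives
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean

    print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
    # precision=0.75 recall=0.75 f1=0.75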

AUC-ROC
The Receiver Operating Characteristic (ROC) is a probability curve that plots the TPR (True Positive Rate) against the FPR (False Positive Rate) at various threshold values, separating the 'signal' from the 'noise'.

The Area Under the Curve (AUC) measures the ability of a classifier to distinguish between classes.

The greater the AUC, the better the model distinguishes between the positive and negative classes across threshold points.

•When AUC is 1, the classifier can perfectly distinguish between all Positive and Negative class points.
•When AUC is 0, the classifier predicts all Negatives as Positives and vice versa.
•When AUC is 0.5, the classifier cannot distinguish between the Positive and Negative classes.
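A small sketch of the threshold sweep (an added illustration; it assumes scikit-learn is available and uses invented scores):

    # ROC points and AUC from predicted probabilities (requires scikit-learn).
    from sklearn.metrics import roc_curve, roc_auc_score

    y_true = [0, 0, 1, 1, 0, 1, 0, 1]                      # actual labels
    y_score = [0.1, 0.4, 0.35, 0.8, 0.2, 0.7, 0.6, 0.9]    # invented model scores

    fpr, tpr, thresholds = roc_curve(y_true, y_score)  # one (FPR, TPR) per threshold
    print("AUC =", roc_auc_score(y_true, y_score))     # area under the ROC curve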
In a ROC curve, the X-axis shows the False Positive Rate (FPR) and the Y-axis shows the True Positive Rate (TPR).
•A higher X value means a higher number of False Positives (FP) relative to True Negatives (TN).
•A higher Y value means a higher number of True Positives (TP) relative to False Negatives (FN).
So the choice of threshold depends on how we want to balance FP against FN; one common heuristic is sketched below.
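One common heuristic for choosing that threshold (my suggestion here, not named in the slides) is Youden's J statistic, which picks the ROC point that maximizes TPR - FPR:

    # Pick the threshold maximizing Youden's J = TPR - FPR (requires scikit-learn).
    import numpy as np
    from sklearn.metrics import roc_curve

    y_true = [0, 0, 1, 1, 0, 1, 0, 1]                      # invented labels
    y_score = [0.1, 0.4, 0.35, 0.8, 0.2, 0.7, 0.6, 0.9]    # invented scores

    fpr, tpr, thresholds = roc_curve(y_true, y_score)
    best = int(np.argmax(tpr - fpr))                # index of the best trade-off point
    print(f"threshold={thresholds[best]:.2f} TPR={tpr[best]:.2f} FPR={fpr[best]:.2f}")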

K-fold cross-validation
In K-fold cross-validation, the dataset is divided into K folds, which are used to estimate how well the model will perform on unseen data.
K is the number of groups into which the data sample is divided; for example, with k = 5 the procedure is called 5-fold cross-validation.
Each of the k folds is used as the validation set in turn, while the remaining k-1 folds are used for training.
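A minimal sketch of the procedure with scikit-learn (the logistic-regression model and the bundled iris dataset are illustrative choices, not from the slides):

    # 5-fold cross-validation of a classifier (requires scikit-learn).
    from sklearn.datasets import load_iris
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score

    X, y = load_iris(return_X_y=True)
    model = LogisticRegression(max_iter=1000)

    # Each of the k=5 folds serves once as the validation set
    # while the remaining 4 folds are used for training.
    scores = cross_val_score(model, X, y, cv=5)
    print("fold accuracies:", scores, "mean:", scores.mean())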