Intermediate Analytics
ALY6015
Northeastern University
By: Behzad Abdi
Week Three
Standard Binary Classification Algorithms:
• K-Nearest Neighbors (KNN)
• Logistic Regression
• Support Vector Machine (SVM)
• Decision Trees
• Random Forest
• Gradient Boosting
• Neural Networks
• Naive Bayes
• AdaBoost
• Stochastic Gradient Descent (SGD)
Illustration: scores from 0 to 10 with a decision threshold at 5. Scores below the threshold are classified as 0, scores above it as 1, and the score sitting exactly at the threshold (shown as "?") depends on where the cutoff is placed.
What is a Threshold?
How to Use the ROC Curve for Optimal Threshold Selection?
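A minimal sketch of one common approach (assuming Python with scikit-learn, which the slides do not prescribe): compute the ROC curve, then pick the threshold that maximizes Youden's J statistic (TPR minus FPR).

```python
# A minimal sketch (not from the slides): choosing a classification threshold
# from the ROC curve by maximizing Youden's J statistic (TPR - FPR).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_curve

# Simulated, imbalanced binary data (illustrative only)
X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=0)
model = LogisticRegression(max_iter=1000).fit(X, y)
scores = model.predict_proba(X)[:, 1]        # predicted probability of class 1

fpr, tpr, thresholds = roc_curve(y, scores)  # one (FPR, TPR) point per threshold
best = np.argmax(tpr - fpr)                  # Youden's J: maximize TPR - FPR
print("chosen threshold:", thresholds[best])

# Apply the chosen threshold instead of the default 0.5
y_pred = (scores >= thresholds[best]).astype(int)
```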
Generalized Linear Models
Introduction to Generalized Linear Models (GLM)
Structure of a GLM
Common Types of GLM
Applications of GLMs
Advantages and Limitations of GLM
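To make the structure concrete, here is a minimal sketch (assuming Python with statsmodels; the library and the simulated data are illustrative, not from the slides) of fitting one common GLM, logistic regression, i.e. a Binomial family with a logit link.

```python
# A minimal sketch of fitting a GLM with statsmodels: logistic regression,
# i.e. Binomial family with the (default) logit link. Data are simulated
# purely for illustration.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2))
# True model: the log-odds of y = 1 are linear in the predictors
log_odds = 0.5 + 1.5 * X[:, 0] - 1.0 * X[:, 1]
y = rng.binomial(1, 1 / (1 + np.exp(-log_odds)))

X_design = sm.add_constant(X)                                # intercept column
model = sm.GLM(y, X_design, family=sm.families.Binomial())   # logit link by default
result = model.fit()
print(result.summary())                                      # coefficients, deviance, etc.
```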
Classifier Evaluation Methods
Introducing the Confusion Matrix
What is a Confusion Matrix?
• A table used to evaluate the performance of a classification model.
• Compares predicted labels with actual labels.
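A minimal sketch (assuming Python with scikit-learn; the labels below are made up for illustration) of comparing predicted labels with actual labels in a confusion matrix:

```python
# A minimal sketch: compare predicted labels with actual labels in a
# 2x2 confusion matrix (rows = actual class, columns = predicted class).
from sklearn.metrics import confusion_matrix

y_actual    = [1, 0, 1, 1, 0, 0, 1, 0, 0, 1]   # ground-truth labels
y_predicted = [1, 0, 0, 1, 0, 1, 1, 0, 0, 0]   # model output

cm = confusion_matrix(y_actual, y_predicted)
tn, fp, fn, tp = cm.ravel()                     # unpack the four cells
print(cm)
print("TN:", tn, "FP:", fp, "FN:", fn, "TP:", tp)
```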
Accuracy
Definition: Proportion of correctly predicted samples (both positive and negative) out of the total samples.
Accuracy = (TP + TN) / (TP + TN + FP + FN)
Example: If a spam filter correctly classifies 90 out of 100 emails, its accuracy is 90%.
Use Case:
• When the dataset is balanced (i.e., the number of samples in each class is roughly equal).
• Example: Predicting whether an email is spam when spam and non-spam emails are equally
represented.
Limitations:
• Misleading for imbalanced datasets (e.g., predicting all samples as the majority class may still result
in high accuracy).
Example: Classifying transactions as fraudulent or non-fraudulent:
Dataset:
Total transactions: 10,000
• 9500 non-fraudulent transactions (majority class)
• 500 fraudulent transactions (minority class)
Model Prediction:
Let’s assume the model predicts that all transactions are non-fraudulent. In this case:
• It correctly predicts 9500 non-fraudulent transactions (True Negatives).
• It incorrectly predicts 500 fraudulent transactions as non-fraudulent (False Negatives).
Model Accuracy (high but misleading):
• Correct predictions: 9,500
• Total transactions: 10,000
• Accuracy = Correct predictions / Total samples = 9,500 / 10,000 = 95%
Dataset Recap:
• Total transactions: 10,000
• Non-fraudulent transactions (majority class): 9,500
• Fraudulent transactions (minority class): 500
Model Behavior:
• The model predicts all transactions as non-fraudulent.
• True Negatives (TN): 9,500 (correctly predicted non-fraudulent transactions).
• False Negatives (FN): 500 (fraudulent transactions incorrectly predicted as non-fraudulent).
Model Accuracy Calculation:
Accuracy = (TP + TN) / Total samples = (0 + 9,500) / 10,000 = 95%
The model has 95% accuracy, which seems high and impressive at first glance.
Example: Classifying transactions as fraudulent or non-fraudulent:
Why is this Misleading?
1. Model Ignores the Minority Class (Fraudulent Transactions):
• The model does not detect any fraudulent transactions (500 missed cases).
• False negatives are critical in fraud detection. Missing these cases could lead to serious financial losses
or risks.
2. Accuracy Favors the Majority Class:
• Since 95% of the data is non-fraudulent, predicting "non-fraudulent" for every transaction results in
high accuracy.
• However, the model fails to address the minority class (fraud).
3. Critical Errors Are Overlooked:
• A fraud detection model must prioritize detecting fraudulent transactions (minority class), even if it
sacrifices some accuracy for the majority class.
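A minimal sketch reproducing this accuracy paradox (assuming scikit-learn; the counts mirror the example above): the all-non-fraudulent model reaches 95% accuracy yet catches no fraud at all.

```python
# A minimal sketch of the accuracy paradox: 9,500 non-fraudulent (0) and
# 500 fraudulent (1) transactions, with a "model" that predicts
# non-fraudulent for every transaction.
import numpy as np
from sklearn.metrics import accuracy_score, recall_score

y_actual = np.array([0] * 9500 + [1] * 500)   # true labels
y_predicted = np.zeros_like(y_actual)          # always predict non-fraudulent

print("accuracy:", accuracy_score(y_actual, y_predicted))   # 0.95
print("recall:  ", recall_score(y_actual, y_predicted))     # 0.0 -- no fraud detected
```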
Precision
Definition: Proportion of true positive predictions out of all positive predictions.
Precision = TP / (TP + FP)
Use Case:
• When false positives (FP) are costly or critical to minimize.
• Example: Predicting whether a patient has cancer (false positives can lead to unnecessary treatments).
Strengths:
• Focuses on the quality of positive predictions.
• Useful when positive predictions should be highly reliable.
Recall (Sensitivity or True Positive Rate)
Definition: Proportion of true positive predictions out of all actual positive samples.
Recall = TP / (TP + FN)
Use Case:
• When false negatives (FN) are costly or critical to minimize.
• Example: Detecting fraud or diagnosing a rare disease (missing a positive case is unacceptable).
Strengths:
• Ensures that as many positive cases as possible are detected.
• Useful in applications where missing true positives has severe consequences.
F1-Score
Definition: Harmonic mean of Precision and Recall, balancing both metrics.
F1 = 2 × (Precision × Recall) / (Precision + Recall)
Use Case:
• When there is an uneven class distribution and a balance between Precision and Recall is desired.
• Example: Spam detection, where both false positives (wrongly classifying a legitimate email as spam)
and false negatives (missing a spam email) are important.
Strengths:
• Useful for imbalanced datasets.
• Provides a single score that balances the trade-off between Precision and Recall.
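A minimal sketch (assuming scikit-learn; the label vectors are hypothetical) computing Precision, Recall, and F1-Score side by side:

```python
# A minimal sketch computing Precision, Recall, and F1-Score for one set
# of (hypothetical) predictions on an imbalanced label vector.
from sklearn.metrics import precision_score, recall_score, f1_score

y_actual    = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
y_predicted = [0, 0, 0, 0, 1, 1, 1, 1, 1, 0]

print("precision:", precision_score(y_actual, y_predicted))  # TP / (TP + FP) = 3/5
print("recall:   ", recall_score(y_actual, y_predicted))     # TP / (TP + FN) = 3/4
print("f1-score: ", f1_score(y_actual, y_predicted))         # harmonic mean of the two
```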
When to Use Each Metric
Summary:
• Use Accuracy for balanced datasets or when all errors are equally important.
• Use Precision to avoid false alarms in applications where positive predictions must be highly reliable.
• Use Recall when failing to detect true positives is costly.
• Use F1-Score when the dataset is imbalanced, and you need to balance Precision and Recall.
Introduction to ROC Curve
Understanding AUC
Advantages and Limitations of ROC and AUC
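To accompany these topics, a minimal sketch (assuming scikit-learn; the model and simulated data are illustrative) of computing the AUC from predicted probabilities on held-out data:

```python
# A minimal sketch: fit a classifier on training data and report the area
# under its ROC curve (AUC) on a test split. 0.5 ~ random guessing,
# 1.0 = perfect ranking of positives above negatives.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

X, y = make_classification(n_samples=2000, weights=[0.9, 0.1], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
probs = model.predict_proba(X_test)[:, 1]      # probability of the positive class

print("AUC:", roc_auc_score(y_test, probs))
```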