FEM 2063 - Data Analytics

CHAPTER 4: Classifications
4.1 Logistic Regression
4.2 Naïve Bayesian
4.3 Discriminant Analysis

Overview
➢Logistic Regression

➢Naïve Bayesian

➢Discriminant Analysis
➢Linear Discriminant Analysis
➢Quadratic Discriminant Analysis

Classification

Supervised learning or classification: attribution of a class or label to an
observation by exploiting the availability of a training set (labeled data)

Unsupervised learning or clustering: representation of input data in
clusters/classes based on some inherent similarity measures (no training set)
Classification Performance

TP: True Positive    FP: False Positive
TN: True Negative    FN: False Negative

Confusion Matrix

                        Actual class
                        P      N
Predicted class   P     TP     FP
                  N     FN     TN

Specificity = TN / (TN + FP)

Sensitivity = TP / (TP + FN) = recall = r

Accuracy = (TP + TN) / (TP + TN + FP + FN)
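
A minimal Python sketch of these formulas (hypothetical 0/1 label arrays, assuming scikit-learn is available; the metric definitions are exactly the ones above):

import numpy as np
from sklearn.metrics import confusion_matrix

# Hypothetical labels: 1 = positive class, 0 = negative class
y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])
y_pred = np.array([1, 0, 0, 1, 0, 1, 1, 0])

# For binary 0/1 labels, ravel() returns TN, FP, FN, TP in this order
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

specificity = tn / (tn + fp)
sensitivity = tp / (tp + fn)   # also called recall
accuracy = (tp + tn) / (tp + tn + fp + fn)
print(specificity, sensitivity, accuracy)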
Overview
➢Logistic Regression

➢Naïve Bayesian

➢Discriminant Analysis
➢Linear Discriminant Analysis
➢Quadratic Discriminant Analysis

G. James, D. Witten, T. Hastie, R. Tibshirani, “An Introduction to Statistical Learning with Applications in R”, Springer,
ISBN 978-1-4614-7137-0, ISBN 978-1-4614-7138-7 (eBook)
Logistic Regression
Example classifications

• Qualitative variables take values in an unordered set C, such as:


eye color {brown, blue, green}
email {spam, ham}.

• Given a feature vector X and a qualitative response Y taking values in the set C, the
classification task is to build a function C(X) that takes as input the feature vector X and
predicts its value for Y.

• Interested in estimating the probabilities that X belongs to each category in C.

• For example, the probability that an insurance claim is fraudulent, rather than just a classification into fraudulent or not.


Logistic Regression
Example: Credit Card Default
Logistic Regression
Example: Credit Card Default
Simulated Default data set: Individuals with annual income, monthly credit card balance, default on
payment (Yes or No)
Objective: predict whether an individual will default on his or her credit card payment based on
annual income and monthly credit card balance
Logistic Regression
Example: Credit Card Default
Can we use Linear Regression?
• Suppose for the Default classification task that we code Y = 0 if No and Y = 1 if Yes.

• Can we simply perform a linear regression of Y on X and classify as Yes if the
predicted value is greater than 0.5?
Logistic Regression
Can we use Linear Regression?

For example, Default in terms of Balance.

Default   Income   Balance
1         25000    3000
0         35000    1000
1         23000    2700
0         28000    1500
0         30000    1200
0         26000    1400
1         43000    4200
0         34000    580
1         42000    7390
0         26000    245
0         23000    1970
1         29000    2845
1         31000    4656
1         42000    5800
1         30000    900
Logistic Regression
Can we use Linear Regression?

Linear regression does not estimate Pr(Y = 1|X) well (could generate negative values
or values greater than 1 as probability!)
Logistic Regression
What could be used?

Needed: a function that takes values between 0 and 1.

For example, the logistic function --> logistic regression.


Logistic Regression
• Let’s write p(X) = Pr(Y = 1|X) for short and consider using balance to
predict default in the previous example.

• Logistic regression uses the form

  p(X) = e^(β0 + β1 X) / (1 + e^(β0 + β1 X))

• After rearrangement

  log( p(X) / (1 - p(X)) ) = β0 + β1 X

• This monotone transformation is called the log odds or logit
transformation of p(X).
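
A short numpy sketch of the logistic form and the logit (the coefficient values below are illustrative only, not fitted to any data):

import numpy as np

beta0, beta1 = -4.0, 0.002   # hypothetical coefficients for illustration

def p(x):
    # Logistic form: p(X) = e^(b0 + b1*X) / (1 + e^(b0 + b1*X))
    z = beta0 + beta1 * x
    return np.exp(z) / (1 + np.exp(z))

def logit(prob):
    # Log odds: log(p / (1 - p)), which equals b0 + b1*X
    return np.log(prob / (1 - prob))

x = np.array([500.0, 1000.0, 2000.0])
print(p(x))          # always strictly between 0 and 1
print(logit(p(x)))   # recovers beta0 + beta1 * x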
Logistic Regression
Maximum Likelihood
• We use maximum likelihood to estimate the parameters:

  ℓ(β0, β1) = ∏_{i: yi = 1} p(xi)  ×  ∏_{i: yi = 0} (1 - p(xi))

• This likelihood gives the probability of the observed zeros and ones in
the data. Find β0 and β1 that maximize the likelihood of the observed
data.
Logistic Regression
Maximum Likelihood
Example: Flipping a coin. The probability of getting a head (H) is p
(--> the probability of getting a tail (T) is (1 - p)).
After 3 flips the following result is obtained: HTH

Use maximum likelihood to estimate the parameter p that best fits the data.

l(p) = p(1 - p)p = p^2 - p^3

l'(p) = p(2 - 3p), so l'(p) = 0 for p = 0 or p = 2/3

p = 2/3 is the best value for the data
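
The same maximum can be checked numerically; a minimal sketch using a plain grid search over p:

import numpy as np

def likelihood(p):
    # Probability of the observed sequence HTH as a function of p
    return p * (1 - p) * p   # = p**2 - p**3

p_grid = np.linspace(0.0, 1.0, 10001)
best_p = p_grid[np.argmax(likelihood(p_grid))]
print(best_p)   # approximately 0.6667, matching the analytical answer p = 2/3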
Logistic Regression
Maximum Likelihood

Using software (e.g. Python, R) to find the Logistic Regression model

Example of output
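
The output table itself is not reproduced here. As an illustration of what such output typically contains, a sketch using the statsmodels package (an assumption; any tool that reports coefficients, standard errors and p-values would do) on small hypothetical balance data:

import numpy as np
import statsmodels.api as sm

# Hypothetical (non-separable) balance/default data for illustration only
balance = np.array([300, 2100, 900, 2600, 1700, 500, 2900, 1200, 2300, 800], dtype=float)
default = np.array([0, 1, 0, 1, 0, 0, 1, 1, 0, 0])

X = sm.add_constant(balance)          # adds the intercept column
result = sm.Logit(default, X).fit()   # maximum likelihood fit
print(result.summary())               # coefficients, std errors, z-values, p-values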
Logistic Regression
Making Predictions

• What is our estimated probability of default for someone with a balance of $1000?

• With a balance of $2000?
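
A sketch of the calculation, assuming the coefficient estimates reported in ISLR for the full Default data set (β̂0 ≈ -10.6513, β̂1 ≈ 0.0055); with other estimates the numbers change, but the formula is the same:

import numpy as np

beta0, beta1 = -10.6513, 0.0055   # ISLR estimates for default vs. balance (assumed here)

def p_default(balance):
    z = beta0 + beta1 * balance
    return np.exp(z) / (1 + np.exp(z))

print(p_default(1000))   # about 0.006 -> well under 1% probability of default
print(p_default(2000))   # about 0.586 -> roughly 59% probability of default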


Logistic Regression
Example 1: Default in terms of Balance

Use the first 10 values to build the model.

Check the performance of the model using the last 5 values.

Default   Income   Balance
1         25000    3000
0         35000    1000
1         23000    2700
0         28000    1500
0         30000    1200
0         26000    1400
1         43000    4200
0         34000    580
1         42000    7390
0         26000    245
0         23000    1970
1         29000    2845
1         31000    4656
1         42000    5800
1         30000    900
Logistic Regression
Example 1: Default in terms of Balance
import numpy as np
from sklearn.linear_model import LogisticRegression

# input the training (first 10) and testing (last 5) data
x = np.array([3000, 1000, 2700, 1500, 1200, 1400, 4200, 580, 7390, 245]).reshape(-1, 1)
y = np.array([1, 0, 1, 0, 0, 0, 1, 0, 1, 0])
xtest = np.array([1970, 2845, 4656, 5800, 900]).reshape(-1, 1)
ytest = np.array([0, 1, 1, 1, 1])

# fit the logistic regression model on the training data
model = LogisticRegression()
model.fit(x, y)
beta0 = model.intercept_
beta1 = model.coef_
ypred = model.predict(xtest)

# get the accuracy on the test data
model.score(xtest, ytest)
Logistic Regression
Example 1: Default in terms of Balance

xtest = np.array([1970, 2845, 4656, 5800, 900]).reshape(-1, 1)

ytest = np.array([0, 1, 1, 1, 1])

ypred=model.predict(xtest): [0 1 1 1 0]

Accuracy=4/5=80%
Logistic Regression
Example 2: Default in terms of Income
import numpy as np
from sklearn.linear_model import LogisticRegression

# input the training (first 10) and testing (last 5) data
x = np.array([25000, 35000, 23000, 28000, 30000, 26000, 43000, 34000, 42000, 26000]).reshape(-1, 1)
y = np.array([1, 0, 1, 0, 0, 0, 1, 0, 1, 0])
xtest = np.array([23000, 29000, 31000, 42000, 30000]).reshape(-1, 1)
ytest = np.array([0, 1, 1, 1, 1])

# fit the logistic regression model on income alone
model = LogisticRegression()
model.fit(x, y)
beta0 = model.intercept_
beta1 = model.coef_
ypred = model.predict(xtest)

# get the accuracy on the test data
model.score(xtest, ytest)
Logistic Regression
Example 2: Default in terms of Income

ytest = np.array([0, 1, 1, 1, 1])

ypred=model.predict(xtest): [0 0 0 0 0]

Accuracy=1/5=20%
Logistic Regression
More than 2 independent variables

With p predictors X = (X1, ..., Xp), the model keeps the same form:

  p(X) = e^(β0 + β1 X1 + ... + βp Xp) / (1 + e^(β0 + β1 X1 + ... + βp Xp))

Example of output
Logistic Regression
Example 3: Default in terms of Income and Balance
import numpy as np
from sklearn.linear_model import LogisticRegression

# training data: [income, balance] pairs for the first 10 customers
x = np.array([[25000, 3000], [35000, 1000], [23000, 2700], [28000, 1500], [30000, 1200],
              [26000, 1400], [43000, 4200], [34000, 580], [42000, 7390], [26000, 245]])
y = np.array([1, 0, 1, 0, 0, 0, 1, 0, 1, 0])
# test data: last 5 customers
xtest = np.array([[23000, 1970], [29000, 2845], [31000, 4656], [42000, 5800], [30000, 900]])
ytest = np.array([0, 1, 1, 1, 1])

model = LogisticRegression()
model.fit(x, y)
beta0 = model.intercept_
beta1 = model.coef_
ypred = model.predict(xtest)

# get the accuracy on the test data
model.score(xtest, ytest)
Logistic Regression
Example 3: Default in terms of Income and Balance

ytest = np.array([0, 1, 1, 1, 1])

ypred=model.predict(xtest): [1 1 1 1 0]

Accuracy=3/5=60%
Logistic Regression
Example: South African Heart Disease

• 160 cases of MI (myocardial infarction) and 302 controls (all male, in the
age range 15-64), from the Western Cape, South Africa, in the early 80s.

• Overall prevalence very high in this region: 5.1%.

• Goal is to identify relative strengths and directions of risk factors.


Logistic Regression
Example: South African Heart Disease
Logistic Regression
Example: South African Heart Disease

Scatterplot matrix of the


South African Heart
Disease data.
The cases (MI) are red,
the controls turquoise.
Logistic Regression
Example: South African Heart Disease
Example of output
Logistic Regression
More than two classes
• A patient presents at the emergency room, and we must classify them according to
their symptoms into one of three conditions: stroke, drug overdose, or epileptic seizure.

• Coding these as Y = 1, 2, 3 suggests an ordering; it implies that the difference between stroke and
drug overdose is the same as between drug overdose and epileptic seizure.

• Use Multiclass Logistic Regression instead.


Logistic Regression
More than two classes
• It is easily generalized to more than two classes.

• Multiclass logistic regression is also referred to as multinomial regression.
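
A minimal scikit-learn sketch with hypothetical three-class data (the labels 0/1/2 stand for stroke, drug overdose and epileptic seizure only for illustration); recent versions of LogisticRegression fit a multinomial (softmax) model for more than two classes:

import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical two-feature observations with three class labels (0, 1, 2)
X = np.array([[1.0, 0.2], [0.9, 0.1], [0.2, 1.1], [0.1, 0.9],
              [0.5, 0.5], [0.6, 0.4], [0.2, 0.1], [0.1, 0.2]])
y = np.array([0, 0, 1, 1, 2, 2, 2, 2])

model = LogisticRegression()
model.fit(X, y)

print(model.predict([[0.8, 0.2]]))         # most likely class
print(model.predict_proba([[0.8, 0.2]]))   # one probability per class, summing to 1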


Logistic Regression
More than two classes

Other option:

• Use 2-class Logistic Regression twice.

• For example, first (stroke or drug overdose) vs epileptic seizure, followed by
stroke vs drug overdose.
FEM 2063 - Data Analytics

CHAPTER 4: Classifications

4.1 Logistic Regression


4.2 Naïve Bayes
4.3 Discriminant Analysis

Naïve Bayes

Learning objectives:

Understand Naïve Bayes Classifier

Some references http://www3.cs.stonybrook.edu/~cse634/ch6book.pdf


https://www3.cs.stonybrook.edu/~cse634/T14.pdf
Naïve Bayes
Example “Antenna Length” of insects
Naïve Bayes
Example

Histogram of “Antenna Length”


Naïve Bayes
Example

The histograms

The corresponding
normal distributions
Naïve Bayes
Example

• If an antenna is 3 units long, to which kind of insect does it belong?

• Is it more probable to be a Grasshopper or a Katydid?
Naïve Bayes
Example

(Figures comparing the two class-conditional densities at an antenna length of 3; see the sketch below.)
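
A sketch of this comparison (the means and standard deviations below are hypothetical stand-ins for what would be estimated from the two histograms):

from scipy.stats import norm

# Hypothetical class-conditional normal distributions for antenna length
grasshopper = norm(loc=3.0, scale=0.8)   # assumed mean/std for grasshoppers
katydid = norm(loc=7.0, scale=1.2)       # assumed mean/std for katydids

x = 3.0   # observed antenna length

# With equal priors, comparing the class-conditional densities is enough
if grasshopper.pdf(x) > katydid.pdf(x):
    print("Grasshopper")
else:
    print("Katydid")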
Naïve Bayes
The Naïve Bayes classifier (also called Bayes, Idiot's Bayes, or Simple Bayes) is a statistical classifier.

It performs probabilistic prediction.

Idea: Find the probability of a previously unseen instance belonging to each


class, then classify it based on the highest probability.
Bayes Classifiers
• Let X be a data sample (“evidence”): class label is unknown

• Let H be a hypothesis that X belongs to class C

• The classification task is to determine P(H|X), the probability that the hypothesis
holds given the observed data sample X

• P(H) : prior probability (initial probability)

• P(X): probability that sample data is observed

• P(X|H) : likelihood (the probability of observing the sample X, given
that the hypothesis holds)
Bayes Classifiers
Given data X, the posterior probability of a hypothesis H, denoted P(H|X),
follows from Bayes theorem:

P(H|X) = P(X|H) P(H) / P(X)

Informally: posterior = likelihood x prior / evidence

Predict that X belongs to Ci iff the probability P(Ci|X) is the highest among the
P(Ck|X) for all K classes.
Bayes Classifiers
Example

P(H|X) = P(X|H) P(H) / P(X)
Bayes Classifiers
Example

Given a small database with names and sex, we can apply Bayes theorem
using 1 attribute: Name.
Bayes Classifiers
Example

P(H|X) = P(X|H) P(H) / P(X)
Bayes Classifiers
Example
Officer Drew is a female!
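
A sketch of the calculation with a hypothetical name/sex database (the records below are illustrative, not the slide's table):

# Hypothetical training database of (name, sex) records
records = [("drew", "male"), ("claudia", "female"), ("drew", "female"),
           ("drew", "female"), ("alberto", "male"), ("karin", "female"),
           ("nina", "female"), ("sergio", "male")]

name = "drew"

def posterior_score(sex):
    # P(sex | name) is proportional to P(name | sex) * P(sex); P(name) cancels out
    in_class = [n for n, s in records if s == sex]
    p_name_given_sex = in_class.count(name) / len(in_class)
    p_sex = len(in_class) / len(records)
    return p_name_given_sex * p_sex

print("female" if posterior_score("female") > posterior_score("male") else "male")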
Bayes Classifiers

Suppose there are m classes Ci, i = 1, ..., m.

P(Ci|X) = P(X|Ci) P(Ci) / P(X)

Since P(X) is constant for all classes, only P(X|Ci) P(Ci) needs to be compared.
Bayes Classifiers
More than one attribute
Example: Height, Eye Color, Hair Length

P(male | Height, Eye, Hair Length) ∝ P(Height, Eye, Hair Length | male) P(male)

Challenge: computing the joint probability P(Height, Eye, Hair Length | male)!


Bayes Classifiers
More than one attribute
Assumption: the attributes (Height, Eye Color, Hair Length) are conditionally
independent given the class.

In that case:
P(Height, Eye, Hair Length | male) = P(Height | male) * P(Eye | male) * P(Hair Length | male)

Using the training set, all these probabilities can be calculated and stored in a table.
Bayes Classifiers
Example

Sex      Over 170cm   Prob   Eye   Prob   Hair length   Prob
Male     Yes          1/3    Yes   ...    Yes           ...
         No           2/3    No    ...    No            ...
Female   Yes          2/5    Yes   ...    Yes           ...
         No           3/5    No    ...    No            ...
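
A sketch of how such a table is used at prediction time (the probabilities and the observed attribute values below are placeholders, not the slide's numbers):

# Hypothetical conditional probability tables P(attribute value | sex) and class priors
p_given = {
    "male":   {("over170", "yes"): 1/3, ("eye", "blue"): 1/2, ("hair", "long"): 1/4},
    "female": {("over170", "yes"): 2/5, ("eye", "blue"): 1/3, ("hair", "long"): 3/4},
}
prior = {"male": 0.5, "female": 0.5}

# Observed attributes of a new person
observation = [("over170", "yes"), ("eye", "blue"), ("hair", "long")]

def score(sex):
    # Naive Bayes: prior times the product of per-attribute conditional probabilities
    p = prior[sex]
    for key in observation:
        p *= p_given[sex][key]
    return p

scores = {sex: score(sex) for sex in prior}
print(max(scores, key=scores.get), scores)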
Bayes Classifiers
• Advantages:
–Fast to train. Fast to classify

–Not sensitive to irrelevant features

–Handles real and discrete data

–Handles streaming data well

• Disadvantage: Assumes independence of features


FEM 2063 - Data Analytics

CHAPTER 4: Classifications

4.3 Discriminant Analysis

Discriminant Analysis

Learning objectives:

Understand how Discriminant Analysis works as a classifier

G. James, D. Witten, T. Hastie, R. Tibshirani, “An Introduction to Statistical Learning with Applications in R”, Springer,
ISBN 978-1-4614-7137-0, ISBN 978-1-4614-7138-7 (eBook)
Discriminant Analysis
Aim:

Let X represent the predictors (p of them) and Y the classes (K of them).

For a given input x of X, we aim to find the probability that it belongs to a particular
class k of Y:

Pr(Y = k | X = x)
Discriminant Analysis
Idea:

• Model the distribution of X in each of the classes separately, and then use Bayes
theorem to obtain
Pr( 𝑌 = 𝑘|𝑋 = 𝑥)

• Use normal (Gaussian) distributions for each class, this leads to linear or
quadratic discriminant analysis.

• Remark: it could be done with other distributions.


Discriminant Analysis
Bayes theorem

Using Bayes theorem, we get

p_k(x) = Pr(Y = k | X = x) = π_k f_k(x) / ( Σ_{l=1}^{K} π_l f_l(x) ), where

f_k(x) = Pr(X = x | Y = k) is the (normal) density for X in class k.

π_k = Pr(Y = k) is the marginal or prior probability for class k.


Discriminant Analysis

Classify to the highest density


Discriminant Analysis

Linear Discriminant Analysis when there is only 1 predictor (p=1)

• The Gaussian (normal) density has the form

  f_k(x) = (1 / (√(2π) σ_k)) exp( -(x - μ_k)² / (2σ_k²) )

• Here μ_k is the mean and σ_k² the variance (in class k).

• We will assume that all the σ_k = σ are the same.


Discriminant Analysis
Linear Discriminant Analysis when there is only 1 predictor (p=1)

Plugging this into the Bayes formula, we get:

p_k(x) = π_k (1/(√(2π) σ)) exp( -(x - μ_k)² / (2σ²) ) / Σ_{l=1}^{K} π_l (1/(√(2π) σ)) exp( -(x - μ_l)² / (2σ²) )


Discriminant Analysis
Discriminant functions

• To classify the value X = x, we need to find the k which gives the largest p_k(x).

• After simplifications, this is equivalent to finding the largest discriminant score:

  δ_k(x) = x · μ_k / σ² - μ_k² / (2σ²) + log(π_k)

Note that δ_k(x) is a linear function of x.


Discriminant Analysis

The decision boundary is the boundary where all classes have the same
discriminant score.

Example: If K = 2 (2 classes) and we assume that π_1 = π_2 = 0.5, then the decision
boundary is at

  x = (μ_1 + μ_2) / 2
Discriminant Analysis
Example of decision boundaries:
Discriminant Analysis
Parameter Estimation

n = size of the training set, n_k = size of class k in the training set.

The parameters are estimated from the training data:

π̂_k = n_k / n

μ̂_k = (1/n_k) Σ_{i: y_i = k} x_i

σ̂² = (1/(n - K)) Σ_{k=1}^{K} Σ_{i: y_i = k} (x_i - μ̂_k)²
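
A minimal sketch of these estimates and the resulting discriminant scores for two classes (hypothetical one-dimensional data, not from the slides):

import numpy as np

# Hypothetical 1-D training data with class labels 0 and 1
x = np.array([1.2, 0.8, 1.5, 1.0, 3.1, 2.8, 3.4, 2.9])
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])

n = len(x)
classes = np.unique(y)

pi_hat = {k: np.mean(y == k) for k in classes}        # n_k / n
mu_hat = {k: x[y == k].mean() for k in classes}       # class means
# Pooled variance estimate (common sigma^2 across classes)
sigma2_hat = sum(((x[y == k] - mu_hat[k]) ** 2).sum() for k in classes) / (n - len(classes))

def delta(k, x0):
    # Linear discriminant score: x*mu_k/sigma^2 - mu_k^2/(2*sigma^2) + log(pi_k)
    return x0 * mu_hat[k] / sigma2_hat - mu_hat[k] ** 2 / (2 * sigma2_hat) + np.log(pi_hat[k])

x0 = 2.0
print(max(classes, key=lambda k: delta(k, x0)))   # class with the largest score wins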


Discriminant Analysis
Linear Discriminant Analysis when p > 1

X = (X1, ..., Xp) follows a multivariate Gaussian distribution with

• a class-specific mean vector μ_k

• a common covariance matrix Σ

Notation: X ~ N(μ_k, Σ)
Discriminant Analysis

Linear Discriminant Analysis when p > 1

The density:

f(x) = (1 / ((2π)^(p/2) |Σ|^(1/2))) exp( -(1/2) (x - μ)ᵀ Σ⁻¹ (x - μ) )
Discriminant Analysis

Linear Discriminant Analysis when p > 1

The discriminant function:

δ_k(x) = xᵀ Σ⁻¹ μ_k - (1/2) μ_kᵀ Σ⁻¹ μ_k + log π_k

This is again a linear function of x.


Discriminant Analysis
Illustration: p = 2 and K = 3 classes

Decision boundaries

Solid : LDA

Dashed : Bayes
Discriminant Analysis
Illustration: Fisher's Iris Data

• 4 variables
• 3 species
• 50 samples/class
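
A sketch using scikit-learn's LDA on the iris data (the dataset ships with scikit-learn; the score printed is training accuracy only, not a proper out-of-sample evaluation):

from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

iris = load_iris()
X, y = iris.data, iris.target      # 4 variables, 3 species, 50 samples per class

lda = LinearDiscriminantAnalysis()
lda.fit(X, y)
print(lda.score(X, y))             # training accuracy

# Projection onto the 2 discriminant directions used in Fisher's discriminant plot
X_proj = lda.transform(X)
print(X_proj.shape)                # (150, 2)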
Discriminant Analysis
Fisher's Discriminant Plot
Discriminant Analysis
Other forms of Discriminant Analysis

• When the f_k(x) are Gaussian densities with the same covariance matrix Σ in
each class, this leads to linear discriminant analysis.

• With Gaussians but a different Σ_k in each class, we get quadratic
discriminant analysis.
Discriminant Analysis

Quadratic Discriminant Analysis

The discriminant function:

δ_k(x) = -(1/2) (x - μ_k)ᵀ Σ_k⁻¹ (x - μ_k) - (1/2) log |Σ_k| + log π_k

Because Σ_k differs by class, δ_k(x) is a quadratic function of x.
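
A sketch contrasting QDA with LDA in scikit-learn (hypothetical 2-D data; the point is only that QDA estimates a separate covariance matrix per class, giving a quadratic boundary):

import numpy as np
from sklearn.discriminant_analysis import (LinearDiscriminantAnalysis,
                                            QuadraticDiscriminantAnalysis)

rng = np.random.default_rng(0)
# Two hypothetical classes with clearly different covariance structures
X0 = rng.multivariate_normal([0, 0], [[1.0, 0.0], [0.0, 1.0]], size=100)
X1 = rng.multivariate_normal([2, 2], [[2.0, 1.5], [1.5, 2.0]], size=100)
X = np.vstack([X0, X1])
y = np.array([0] * 100 + [1] * 100)

lda = LinearDiscriminantAnalysis().fit(X, y)      # pooled covariance -> linear boundary
qda = QuadraticDiscriminantAnalysis().fit(X, y)   # per-class covariance -> quadratic boundary
print(lda.score(X, y), qda.score(X, y))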


Discriminant Analysis
Quadratic Discriminant Analysis

Decision boundaries

Green : QDA
Purple : Bayes
Black : LDA
Summary (chapter 4)

Logistic Regression (LR):               Logistic function; maximum likelihood
Naïve Bayes:                            Independence of attributes
Linear Discriminant Analysis (LDA):     Normal distribution; same covariance matrix for all classes
Quadratic Discriminant Analysis (QDA):  Normal distribution; different covariance matrix per class