FEM 2063 - Data Analytics: CHAPTER 4: Classifications
CHAPTER 4: Classifications
4.1 Logistic Regression
4.2 Naïve Bayesian
4.3 Discriminant Analysis
Overview
➢Logistic Regression
➢Naïve Bayesian
➢Discriminant Analysis
➢Linear Discriminant Analysis
➢Quadratic Discriminant Analysis
Classification
Confusion matrix (rows: predicted class, columns: actual class):

                 Actual P    Actual N
Predicted P         TP          FP
Predicted N         FN          TN

Accuracy = (TP + TN) / (TP + TN + FP + FN)
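These quantities can also be computed with scikit-learn; a minimal sketch (not from the slides), using hypothetical label vectors:

import numpy as np
from sklearn.metrics import confusion_matrix, accuracy_score

ytrue = np.array([1, 0, 1, 1, 0, 0, 1, 0])   # actual classes (hypothetical)
ypred = np.array([1, 0, 0, 1, 0, 1, 1, 0])   # predicted classes (hypothetical)

# With labels=[1, 0] the matrix is ordered with the positive class first;
# in scikit-learn rows are actual classes and columns are predicted classes.
cm = confusion_matrix(ytrue, ypred, labels=[1, 0])
tp, fn, fp, tn = cm.ravel()
print((tp + tn) / (tp + tn + fp + fn))       # same value as accuracy_score(ytrue, ypred)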
Overview
➢Logistic Regression
➢Naïve Bayesian
➢Discriminant Analysis
➢Linear Discriminant Analysis
➢Quadratic Discriminant Analysis
G. James, D. Witten, T. Hastie, R. Tibshirani, “An Introduction to Statistical Learning with Applications in R”, Springer,
ISBN 978-1-4614-7137-0, ISBN 978-1-4614-7138-7 (eBook)
Logistic Regression
Example classifications
• Given a feature vector X and a qualitative response Y taking values in the set C, the
classification task is to build a function C(X) that takes the feature vector X as input and
predicts the value of Y.
Linear regression does not estimate Pr(Y = 1|X) well: it can produce negative values or
values greater than 1, which are not valid probabilities.
Logistic Regression
What could be used? The logistic function, which always lies between 0 and 1:
p(X) = e^(β0 + β1X) / (1 + e^(β0 + β1X))
After rearrangement, the log-odds (logit) is linear in X:
log( p(X) / (1 − p(X)) ) = β0 + β1X
The parameters are estimated by maximum likelihood, using the likelihood
l(β0, β1) = Π_{i: yi=1} p(xi) · Π_{i: yi=0} (1 − p(xi))
• This likelihood gives the probability of the observed zeros and ones in
the data. Find β0 and β1 that maximize the likelihood of the observed data.
Logistic Regression
Maximum Likelihood
Example: Flipping a coin. The probability of getting a head (H) is p
(so the probability of getting a tail (T) is 1 − p).
After 3 flips the following result is obtained: HTH
Use maximum likelihood to estimate the parameter p that best fits the data.
l(p) = p(1 − p)p = p^2 − p^3
Setting dl/dp = 2p − 3p^2 = 0 gives p = 2/3, the observed proportion of heads.
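As a quick numerical check (a sketch, not part of the original slides), the same maximum can be found on a grid in Python:

import numpy as np

p = np.linspace(0, 1, 1001)
likelihood = p**2 * (1 - p)         # l(p) = p^2 - p^3 for the sequence HTH
print(p[np.argmax(likelihood)])     # about 0.667, i.e. p = 2/3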
Example of output
Logistic Regression
Making Predictions
• What is our estimated probability of default for someone with a balance of $1000?
from sklearn.linear_model import LogisticRegression

# x, y: training data (balance, default); xtest: test balances
model = LogisticRegression()
model.fit(x, y)
beta0 = model.intercept_    # estimated β0
beta1 = model.coef_         # estimated β1
ypred = model.predict(xtest)

Output: ypred = [0 1 1 1 0]
Accuracy=4/5=80%
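The estimated probability asked for above can be read off the fitted model with predict_proba; a minimal sketch, assuming model has been fitted on the balance data as above:

# Estimated probability of default (class 1) for a balance of $1000
prob_default = model.predict_proba([[1000]])[0, 1]
print(prob_default)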
Logistic Regression
Example 2: Default in terms of Income
# Input the training (first 10) and testing (last 5) data
import numpy as np
from sklearn.linear_model import LogisticRegression

x = np.array([25000, 35000, 23000, 28000, 30000, 26000, 43000, 34000, 42000, 26000]).reshape(-1, 1)
y = np.array([1, 0, 1, 0, 0, 0, 1, 0, 1, 0])
xtest = np.array([23000, 29000, 31000, 42000, 30000]).reshape(-1, 1)
ytest = np.array([0, 1, 1, 1, 1])

model = LogisticRegression()
model.fit(x, y)
beta0 = model.intercept_    # estimated β0
beta1 = model.coef_         # estimated β1
ypred = model.predict(xtest)

Output: ypred = [0 0 0 0 0]
Accuracy=1/5=20%
Logistic Regression
More than 2 independent variables: with p predictors the model becomes
log( p(X) / (1 − p(X)) ) = β0 + β1X1 + … + βpXp
Example of output
Logistic Regression
Example 3: Default in terms of Income and Balance
import numpy as np
from sklearn.linear_model import LogisticRegression

# Training data: [income, balance] pairs and default labels
x = np.array([[25000, 3000], [35000, 1000], [23000, 2700], [28000, 1500], [30000, 1200],
              [26000, 1400], [43000, 4200], [34000, 580], [42000, 7390], [26000, 245]])
y = np.array([1, 0, 1, 0, 0, 0, 1, 0, 1, 0])
# Test data
xtest = np.array([[23000, 1970], [29000, 2845], [31000, 4656], [42000, 5800], [30000, 900]])
ytest = np.array([0, 1, 1, 1, 1])

model = LogisticRegression()
model.fit(x, y)
beta0 = model.intercept_    # estimated β0
beta1 = model.coef_         # one coefficient per predictor
ypred = model.predict(xtest)

Output: ypred = [1 1 1 1 0]
Accuracy=3/5=60%
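The accuracies above are computed by hand; a minimal sketch of getting the same number with scikit-learn, using ytest and ypred from the example above:

from sklearn.metrics import accuracy_score
print(accuracy_score(ytest, ypred))   # 3/5 = 0.6 for Example 3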
Logistic Regression
Example: South African Heart Disease
More than 2 classes: suppose the response takes three values, coded e.g. 1 = stroke, 2 = drug overdose, 3 = epileptic seizure.
• This coding suggests an ordering: it implies that the difference between stroke and
drug overdose is the same as the difference between drug overdose and epileptic seizure.
• Another option is a sequence of binary comparisons: for example, first (stroke or drug overdose) vs
epileptic seizure, followed by stroke vs drug overdose.
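In scikit-learn, LogisticRegression handles more than two classes directly by fitting a multinomial model, so no ordering of the labels is implied; a minimal sketch with hypothetical data:

import numpy as np
from sklearn.linear_model import LogisticRegression

x = np.array([[1.0], [2.0], [3.0], [6.0], [7.0], [8.0], [11.0], [12.0], [13.0]])
y = np.array(['stroke', 'stroke', 'stroke',
              'overdose', 'overdose', 'overdose',
              'seizure', 'seizure', 'seizure'])

model = LogisticRegression()          # multinomial for more than 2 classes in recent scikit-learn
model.fit(x, y)
print(model.predict(np.array([[2.5], [7.5]])))   # expected: ['stroke' 'overdose']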
Naïve Bayes
Learning objectives:
Figure: histograms of the data and the corresponding fitted normal distributions.
Naïve Bayes
Example
• The classification task is to determine P(H|X), the probability that the hypothesis
holds given the observed data sample X
Bayes' theorem:
P(H|X) = P(X|H) P(H) / P(X)
X is predicted to belong to class Ci iff the probability P(Ci|X) is the highest among
P(Ck|X) over all k classes.
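A minimal sketch of this decision rule with scikit-learn's GaussianNB (hypothetical data); GaussianNB models each attribute within each class by a normal distribution, as in the histogram figure above:

import numpy as np
from sklearn.naive_bayes import GaussianNB

x = np.array([[150], [160], [165], [175], [180], [185]])   # e.g. heights in cm (hypothetical)
y = np.array(['female', 'female', 'female', 'male', 'male', 'male'])

nb = GaussianNB()
nb.fit(x, y)
print(nb.predict_proba([[170]]))   # posterior P(class | X = 170) for each class
print(nb.predict([[170]]))         # the class with the highest posterior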
Bayes Classifiers
Example
P(H|X) = P(X|H) P(H) / P(X)
Bayes Classifiers
Example
1 Attribute: Name
Bayes Classifiers
Example
P(H|X) = P(X|H) P(H) / P(X)
Bayes Classifiers
Example
Officer Drew is a female!
Bayes Classifiers
P(Hi|X) = P(X|Hi) P(Hi) / P(X)
Since P(X) is constant for all classes, only P(X|Hi) P(Hi) needs to be compared.
Bayes Classifiers
More than one attribute
Example: Height, Eye Color, Hair Length
P(male | Height, Eye, Hair length) ∝ P(Height, Eye, Hair length | male) P(male)
Naïve Bayes assumes the attributes are conditionally independent given the class. In that case:
P(Height, Eye, Hair length | male) = P(Height | male) · P(Eye | male) · P(Hair length | male)
Using the training set, all these probabilities can be calculated and stored in a table.
Bayes Classifiers
Example
Sex      | Over 170cm | Prob | Eye | Prob | Hair length | Prob
Male     | Yes        | 1/3  | Yes |      | Yes         |
         | No         | 2/3  | No  |      | No          |
Female   | Yes        | 2/5  | Yes |      | Yes         |
         | No         | 3/5  | No  |      | No          |
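A minimal sketch of the same idea with scikit-learn's CategoricalNB; the attribute encodings and training rows below are hypothetical, not the counts from the table above:

import numpy as np
from sklearn.naive_bayes import CategoricalNB

# Columns: over 170cm (1 = yes), eye (1 = blue), hair length (1 = long) -- hypothetical encoding
x = np.array([[1, 0, 0],
              [1, 1, 0],
              [0, 0, 0],
              [0, 1, 1],
              [0, 0, 1],
              [1, 1, 1]])
y = np.array(['male', 'male', 'male', 'female', 'female', 'female'])

nb = CategoricalNB()
nb.fit(x, y)
print(nb.predict_proba(np.array([[1, 0, 1]])))   # posterior for each class
print(nb.predict(np.array([[1, 0, 1]])))         # most probable class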
Bayes Classifiers
• Advantages:
–Fast to train. Fast to classify
Discriminant Analysis
Learning objectives:
G. James, D. Witten, T. Hastie, R. Tibshirani, “An Introduction to Statistical Learning with Applications in R”, Springer,
ISBN 978-1-4614-7137-0, ISBN 978-1-4614-7138-7 (eBook)
Discriminant Analysis
Aim:
Let X represent the predictors (p of them) and Y the response with K classes.
For each class k of Y, estimate
Pr(Y = k | X = x)
Discriminant Analysis
Idea:
• Model the distribution of X in each of the classes separately, and then use Bayes'
theorem to obtain
Pr(Y = k | X = x)
• Using normal (Gaussian) distributions for each class leads to linear or
quadratic discriminant analysis.
Bayes' theorem gives
pk(x) = Pr(Y = k | X = x) = πk fk(x) / Σl πl fl(x)
where fk(x) is the density of X in class k and πk = Pr(Y = k) is the prior probability of class k.
• To classify the value X = x, we find the k which gives the largest pk(x).
The decision boundary is the set of points where the competing classes have the same
discriminant score.
Discriminant Analysis
The density: within each class, X is modeled with a Gaussian density; with one predictor,
fk(x) = 1/(√(2π) σk) · exp( −(x − μk)² / (2σk²) )
LDA additionally assumes a common standard deviation σ1 = … = σK = σ. Plugging fk(x) into
pk(x) and taking logs, the class with the largest pk(x) is the one with the largest
discriminant score
δk(x) = x μk/σ² − μk²/(2σ²) + log(πk)
which is linear in x (hence "linear" discriminant analysis).
Discriminant Analysis
Figure: decision boundaries (solid: LDA, dashed: Bayes).
Discriminant Analysis
Illustration: Fisher's Iris Data
• 4 variables
• 3 species
• 50 samples/class
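A minimal sketch (not from the slides) of LDA on Fisher's Iris data with scikit-learn:

from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, y = load_iris(return_X_y=True)     # 4 variables, 3 species, 50 samples per class
lda = LinearDiscriminantAnalysis()
lda.fit(X, y)
print(lda.score(X, y))                # training accuracy

Z = lda.transform(X)                  # projection onto the 2 discriminant directions
print(Z.shape)                        # (150, 2), the basis of Fisher's discriminant plot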
Discriminant Analysis
Fisher's Discriminant Plot
Discriminant Analysis
Other forms of Discriminant Analysis
Figure: decision boundaries (green: QDA, purple: Bayes, black: LDA).
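A minimal sketch comparing LDA (one shared covariance matrix) with QDA (a separate covariance matrix per class) in scikit-learn, again on the Iris data:

from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import (LinearDiscriminantAnalysis,
                                            QuadraticDiscriminantAnalysis)

X, y = load_iris(return_X_y=True)
print(LinearDiscriminantAnalysis().fit(X, y).score(X, y))     # training accuracy, shared covariance
print(QuadraticDiscriminantAnalysis().fit(X, y).score(X, y))  # training accuracy, per-class covariance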
Summary (chapter 4)

Method                                   Key assumptions / ingredients
Logistic Regression (LR)                 Logistic function; maximum likelihood
Naïve Bayes                              Independence of the attributes
Linear Discriminant Analysis (LDA)       Normal distribution; same covariance matrix for all classes
Quadratic Discriminant Analysis (QDA)    Normal distribution; different covariance matrix per class