Lecture 12

Naïve Bayes Classification

Introduction
• The Bayes classifier requires the probability
structure of the problem to be known.
• Density estimation (using non-parametric or
parametric methods) is one way to handle the
problem.
• There are, however, several problems with this approach.

Problems with density estimation
• Large datasets are needed.
• Numeric-valued features are required.

• In practice, these two requirements may not be satisfied.

How to overcome the problem
• One has to work with the given data set.
• So, probability estimation needs to be done
using the given data only.
• Often, marginal probabilities can be estimated
more reliably than joint probabilities.
• Also, marginal probabilities are easy to
compute.

Play-tennis data

• P(<sunny,cool,high,false>|N) = 0
• But, P(sunny|N) = 3/5, P(cool|N) = 1/5,
P(high|N) = 4/5, P(false|N) = 2/5.

• P(<sunny, cool, high, false>|N) = 0
• This may be because of the smaller dataset.
• If we increase the dataset size, this may
become a positive number.

• This problem is often referred to as “the curse of dimensionality”.

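To see this numerically, here is a minimal sketch that counts, within class N, how often the full pattern <sunny, cool, high, false> occurs versus how often each value occurs on its own. The slides do not reproduce the underlying 14-example table, so the rows below are an assumption: they are the standard play-tennis data commonly used with this example, chosen to match the marginals quoted above.

```python
# A minimal sketch, assuming the standard 14-example play-tennis table
# (the slides' data table is not reproduced here; these rows are chosen
# to be consistent with the marginals quoted for class N).
data = [
    # (outlook, temperature, humidity, windy, play)
    ("sunny",    "hot",  "high",   "false", "N"),
    ("sunny",    "hot",  "high",   "true",  "N"),
    ("overcast", "hot",  "high",   "false", "P"),
    ("rain",     "mild", "high",   "false", "P"),
    ("rain",     "cool", "normal", "false", "P"),
    ("rain",     "cool", "normal", "true",  "N"),
    ("overcast", "cool", "normal", "true",  "P"),
    ("sunny",    "mild", "high",   "false", "N"),
    ("sunny",    "cool", "normal", "false", "P"),
    ("rain",     "mild", "normal", "false", "P"),
    ("sunny",    "mild", "normal", "true",  "P"),
    ("overcast", "mild", "high",   "true",  "P"),
    ("overcast", "hot",  "normal", "false", "P"),
    ("rain",     "mild", "high",   "true",  "N"),
]

query = ("sunny", "cool", "high", "false")
neg = [row[:4] for row in data if row[4] == "N"]

# Joint estimate: how many class-N rows match the whole pattern?
joint = sum(row == query for row in neg) / len(neg)

# Marginal estimates: how many class-N rows match each value separately?
marginals = [sum(row[i] == v for row in neg) / len(neg)
             for i, v in enumerate(query)]

print(joint)      # 0.0 -> the joint estimate vanishes
print(marginals)  # [0.6, 0.2, 0.8, 0.4], i.e. 3/5, 1/5, 4/5, 2/5
```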
Assumption
• Make the assumption that for a given class,
features are independent of each other.

• In practice, this assumption works well very often, even if it does not hold exactly.

• Then P(<sunny, cool, high, false>|N) =
P(sunny|N) · P(cool|N) · P(high|N) · P(false|N)
= 3/5 · 1/5 · 4/5 · 2/5 = 24/625.

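As a quick arithmetic check of the product above, using only Python's standard library:

```python
from fractions import Fraction

# P(sunny|N) * P(cool|N) * P(high|N) * P(false|N)
product = Fraction(3, 5) * Fraction(1, 5) * Fraction(4, 5) * Fraction(2, 5)
print(product)         # 24/625
print(float(product))  # 0.0384
```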
Naïve Bayesian Classification

• Naïve assumption: for a given class, features are independent of each other:
P(<x1,…,xk>|C) = P(x1|C) · … · P(xk|C)
• P(xi|C) is estimated as the relative frequency of
samples having value xi as the i-th attribute in
class C (a counting sketch follows below).
• This often makes the problem a feasible and
easy one to solve.
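A minimal sketch of this counting step, reusing the assumed `data` table from the earlier snippet: it estimates the class priors P(C) and the conditionals P(xi|C) as relative frequencies. With that table, the numbers it produces should match those on the next slide.

```python
from collections import Counter, defaultdict

def estimate(data):
    """Estimate class priors P(C) and per-feature conditionals P(x_i|C)
    as relative frequencies over the training rows."""
    priors = Counter(row[-1] for row in data)   # class label -> count
    cond = defaultdict(Counter)                 # (class, feature index) -> value counts
    for *features, label in data:
        for i, value in enumerate(features):
            cond[(label, i)][value] += 1

    n = len(data)
    p_class = {c: count / n for c, count in priors.items()}
    p_cond = {key: {v: count / priors[key[0]] for v, count in counts.items()}
              for key, counts in cond.items()}
    return p_class, p_cond

p_class, p_cond = estimate(data)   # `data` is the assumed table from the previous sketch
print(p_class)                     # {'N': 5/14 ≈ 0.357, 'P': 9/14 ≈ 0.643}
print(p_cond[("N", 0)]["sunny"])   # 0.6 (= 3/5, as in the table)
```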
Play-tennis example: estimating P(xi|C)

P(p) = 9/14          P(n) = 5/14

outlook
P(sunny|p) = 2/9     P(sunny|n) = 3/5
P(overcast|p) = 4/9  P(overcast|n) = 0
P(rain|p) = 3/9      P(rain|n) = 2/5

temperature
P(hot|p) = 2/9       P(hot|n) = 2/5
P(mild|p) = 4/9      P(mild|n) = 2/5
P(cool|p) = 3/9      P(cool|n) = 1/5

humidity
P(high|p) = 3/9      P(high|n) = 4/5
P(normal|p) = 6/9    P(normal|n) = 1/5

windy
P(true|p) = 3/9      P(true|n) = 3/5
P(false|p) = 6/9     P(false|n) = 2/5
Play-tennis example: classifying X
• An unseen sample X = <rain, hot, high, false>

• P(X|p)·P(p) = P(rain|p)·P(hot|p)·P(high|p)·P(false|p)·P(p)
  = 3/9 · 2/9 · 3/9 · 6/9 · 9/14 = 0.010582
• P(X|n)·P(n) = P(rain|n)·P(hot|n)·P(high|n)·P(false|n)·P(n)
  = 2/5 · 2/5 · 4/5 · 2/5 · 5/14 = 0.018286

• Sample X is classified into class n (don’t play); a sketch of this step follows below.

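Continuing the same sketch, classification multiplies the estimated conditionals and the class prior for each class and picks the larger product. With the assumed table and the `p_class`/`p_cond` computed above, it reproduces the two scores shown on this slide.

```python
def classify(x, p_class, p_cond):
    """Return the class maximising P(x|C)·P(C) under the naive assumption,
    together with the per-class scores."""
    scores = {}
    for c, prior in p_class.items():
        score = prior
        for i, value in enumerate(x):
            # A value never seen for this class gets relative frequency 0
            # (no smoothing, exactly as in the slides).
            score *= p_cond[(c, i)].get(value, 0.0)
        scores[c] = score
    return max(scores, key=scores.get), scores

label, scores = classify(("rain", "hot", "high", "false"), p_class, p_cond)
print(scores["P"])  # ≈ 0.010582
print(scores["N"])  # ≈ 0.018286
print(label)        # 'N' -> don't play
```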
With Continuous features
• In order to use the Naïve Bayes classifier, the features have to
be discretized appropriately (otherwise, what happens?).
• A value such as Height = 4.234 will not occur anywhere in that column, but
nearby values such as 4.213 or 4.285 may occur. If you discretize (e.g., by
rounding), the frequency ratios become meaningful (see the sketch below).
• Clustering the values of a feature may also be done to
achieve a better discretization.

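Here is a minimal sketch of the rounding idea. The particular heights and the bin width of 1.0 are illustrative assumptions; the point is only that after binning, several training values share the same discrete symbol, so their relative frequencies are no longer all 0 or 1.

```python
def to_bin(value, width=1.0):
    """Map a continuous value to the centre of its bin, so that nearby values
    (e.g. 4.213, 4.234 and 4.285) become the same discrete symbol."""
    return round(value / width) * width

heights = [4.213, 4.285, 4.234, 5.02, 4.76, 5.10]
print([to_bin(h) for h in heights])   # [4.0, 4.0, 4.0, 5.0, 5.0, 5.0]
```

After this mapping, the binned values can be fed to the same counting-based estimator sketched earlier.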
