Lecture 12
Introduction
• The Bayes classifier requires the probability structure of the problem to be known.
• Density estimation (using non-parametric or
parametric methods) is one way to handle the
problem.
• There are several problems with this approach, discussed below.
Problems with density estimation
• Large datasets are needed for reliable density estimates.
• Numeric-valued features are required.
How to overcome the problem
• One has to work with the given data set.
• So, probability estimation needs to be done using the given data only.
• Often marginal probabilities can be estimated more reliably than joint probabilities, since each marginal is supported by many more samples.
• Also, marginal probabilities are easy to compute.
Play-tennis data
• P(<sunny, cool, high, false> | N) = 0, since this exact feature combination never occurs with class N in the data.
• But the marginals are all positive: P(sunny|N) = 3/5, P(cool|N) = 1/5, P(high|N) = 4/5, P(false|N) = 2/5.
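As a concrete check, here is a minimal Python sketch; the five class-N rows below are an assumption, taken to be those of the standard 14-example play-tennis data, and they reproduce the zero joint frequency and the positive marginals above.

    from collections import Counter

    # The five class-N ("don't play") rows, assuming the standard
    # 14-example play-tennis data (outlook, temp, humidity, windy).
    n_rows = [
        ("sunny", "hot",  "high",   "false"),
        ("sunny", "hot",  "high",   "true"),
        ("rain",  "cool", "normal", "true"),
        ("sunny", "mild", "high",   "false"),
        ("rain",  "mild", "high",   "true"),
    ]

    # Joint estimate: relative frequency of the exact feature vector.
    x = ("sunny", "cool", "high", "false")
    print(n_rows.count(x) / len(n_rows))      # 0.0 -- never occurs

    # Marginal estimates: relative frequency of each individual value.
    for i, value in enumerate(x):
        counts = Counter(row[i] for row in n_rows)
        print(value, counts[value], "/", len(n_rows))
    # sunny 3/5, cool 1/5, high 4/5, false 2/5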
• This zero may simply be an artifact of the small dataset.
• If the dataset size were increased, the estimate might become a positive number.
Assumption
• Make the assumption that, for a given class, the features are independent of each other.
• Then P(<sunny, cool, high, false> | N)
  = P(sunny|N) · P(cool|N) · P(high|N) · P(false|N)
  = 3/5 · 1/5 · 4/5 · 2/5 = 24/625.
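• In general (the general form is implied rather than written out on the slide), this naïve conditional-independence assumption says the class-conditional joint probability factorizes into a product of marginals:
  P(x1, x2, …, xd | C) = P(x1|C) · P(x2|C) · … · P(xd|C),
  and classification then picks the class C that maximizes P(C) · P(x1|C) · … · P(xd|C).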
Naïve Bayesian Classification
• Consider an unseen sample X = <rain, hot, high, false>.
• P(X|p) · P(p) = P(rain|p) · P(hot|p) · P(high|p) · P(false|p) · P(p)
  = 3/9 · 2/9 · 3/9 · 6/9 · 9/14 = 0.010582
• P(X|n) · P(n) = P(rain|n) · P(hot|n) · P(high|n) · P(false|n) · P(n)
  = 2/5 · 2/5 · 4/5 · 2/5 · 5/14 = 0.018286
• Since 0.018286 > 0.010582, X is classified as n (don't play).
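As a cross-check, here is a minimal Python sketch of this decision, treating the class priors and the per-feature conditional probabilities from the slide as given constants:

    # Naive Bayes decision for X = <rain, hot, high, false>, using the
    # priors and conditional probabilities quoted on the slide
    # (features: outlook, temp, humidity, windy).
    priors = {"p": 9 / 14, "n": 5 / 14}
    cond = {
        "p": {"rain": 3 / 9, "hot": 2 / 9, "high": 3 / 9, "false": 6 / 9},
        "n": {"rain": 2 / 5, "hot": 2 / 5, "high": 4 / 5, "false": 2 / 5},
    }

    x = ["rain", "hot", "high", "false"]

    scores = {}
    for c in ("p", "n"):
        score = priors[c]
        for value in x:
            score *= cond[c][value]   # naive independence: multiply marginals
        scores[c] = score

    print(scores)                       # {'p': 0.01058..., 'n': 0.01828...}
    print(max(scores, key=scores.get))  # 'n' -- don't play

Note that both products omit the common denominator P(X), so comparing them is equivalent to comparing the posteriors P(p|X) and P(n|X).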
With Continuous features
• In order to use the Naïve Bayes classifier with continuous features, the features have to be discretized appropriately (otherwise what happens?)
• Height = 4.234 will not occur anywhere in that column, but 4.213 or 4.285 may occur. If you discretize (e.g., by rounding), then the frequency ratios become meaningful.
• Clustering of a feature's values may also be done to achieve a better discretization, as sketched below.
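Here is a minimal Python sketch of the discretization idea; the heights column is made up for illustration, and the equal-width binning is only a crude stand-in for clustering the feature values:

    # Exact continuous values such as 4.234 almost never repeat, so their
    # frequency estimates are 0 or 1/N; rounding or binning groups nearby
    # values so that frequency ratios become meaningful.
    heights = [4.213, 4.285, 5.91, 6.02, 4.234, 5.88]

    # Option 1: round to a fixed precision.
    rounded = [round(h, 1) for h in heights]   # 4.213 and 4.234 both -> 4.2

    # Option 2: bin into k equal-width intervals (a crude form of
    # clustering the feature values).
    lo, hi, k = min(heights), max(heights), 3
    width = (hi - lo) / k
    bins = [min(int((h - lo) / width), k - 1) for h in heights]

    print(rounded)  # [4.2, 4.3, 5.9, 6.0, 4.2, 5.9]
    print(bins)     # [0, 0, 2, 2, 0, 2]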