Lecture10 - Bayesian Classifier
Naïve Bayesian rule: intuition
Let's talk about a factory that makes spanners.
• We have two machines in the factory.
• Each machine has different characteristics, i.e., they work at different rates and consume different amounts of power.
• But they produce the same spanners.
• The spanners produced by machine 1 and machine 2 are labeled m1 and m2, respectively.
• In the figure we can see some defective spanners, marked in black, among the good ones.
• What is the probability of machine 2 producing a defective spanner?
• We can answer this using Bayes' theorem.
Introduction to Naïve Bayes Classifier
It is a classification technique based on Bayes' Theorem.
• The fundamental Naïve Bayes assumption is that each feature makes an independent and equal contribution to the outcome.
• In the multivariate Bernoulli event model, features are independent booleans (binary variables)
describing inputs. Like the multinomial model, this model is popular for document
classification tasks, where binary term occurrence (i.e. a word occurs in a document or not)
features are used rather than term frequencies (i.e. frequency of a word in the document).
• In Gaussian Naïve Bayes, continuous values associated with each feature are assumed to be
distributed according to a Gaussian distribution (normal distribution). When plotted, this gives a
bell-shaped curve that is symmetric about the mean of the feature values.
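The Gaussian likelihood used by this variant can be written out directly. A minimal sketch (the function name is ours, not from the lecture):

```python
import math

def gaussian_likelihood(x, mean, var):
    # P(x | class) under a normal distribution parameterized by the class's
    # feature mean and variance: (1 / sqrt(2*pi*var)) * exp(-(x - mean)^2 / (2*var))
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)

# The curve peaks at the mean and is symmetric around it.
print(round(gaussian_likelihood(0.0, 0.0, 1.0), 4))  # 0.3989, the standard normal peak
```

At prediction time the classifier evaluates this density once per feature per class, using the mean and variance estimated from the training data for that class.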
Bayes' Theorem:
P(A|B) = P(B|A) * P(A) / P(B)
Naïve Bayes Classifier
Example
The Frequency table shows how often labels appear for each feature. These tables will
assist us in calculating the prior and posterior probabilities.
• Now let us divide the above frequency table into prior and posterior
probabilities. The prior probability is the probability of an event before
new data is collected. In contrast, the posterior probability is the
probability that the hypothesis is true in light of the relevant
observations.
• Table 1 shows the prior probabilities of the labels, and Table 2 shows the
posterior probabilities.
• We want to calculate the probability of playing when the weather is
overcast.
• Solution:
Probability of Playing:
We have the formula for Naive Bayes classification, which is
P(Yes | Overcast) = P(Overcast | Yes) * P(Yes) / P(Overcast)
Now, let’s calculate the prior probabilities:
P(Overcast) = 4/14 = 0.29
P(Yes)= 9/14 = 0.64
• The next step is to find the posterior probability, which can be easily
calculated by:
• P(Overcast | Yes) = 4/9 = 0.44
• Once we have the posterior and prior probabilities, we can put them
back in our main formula to calculate the probability of playing when
the weather is overcast.
• P(Yes | Overcast) = 0.44 * 0.64 / 0.29 ≈ 0.97
• Probability of Not Playing:
• Similarly, we can calculate the probability of not playing any sport when the weather is overcast.
• First, let us calculate the prior probabilities
• P(Overcast) = 4/14 = 0.29
• P(No)= 5/14 = 0.36
• The next step is to calculate the posterior probability, which is:
• P(Overcast | No) = 0/5 = 0
• By putting these probabilities in the main formula, we get:
• P(No | Overcast) = 0 * 0.36 / 0.29 = 0
• We can see that the probability of the playing class is higher, so if the weather is overcast,
players will play sports.
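The whole calculation above can be reproduced in a few lines of Python, with the counts taken straight from the frequency table. Note that with exact fractions the result comes out as exactly 1.0; the slide's value differs only because of intermediate rounding.

```python
# Counts from the 14-day frequency table in the example above.
total, yes_total, no_total = 14, 9, 5
overcast_yes, overcast_no = 4, 0

p_yes = yes_total / total                          # P(Yes) = 9/14 ≈ 0.64
p_overcast = (overcast_yes + overcast_no) / total  # P(Overcast) = 4/14 ≈ 0.29
p_overcast_given_yes = overcast_yes / yes_total    # P(Overcast | Yes) = 4/9 ≈ 0.44

# Bayes' theorem: P(Yes | Overcast) = P(Overcast | Yes) * P(Yes) / P(Overcast)
p_yes_given_overcast = p_overcast_given_yes * p_yes / p_overcast
print(round(p_yes_given_overcast, 2))  # 1.0 with exact fractions

p_no_given_overcast = (overcast_no / no_total) * (no_total / total) / p_overcast
print(round(p_no_given_overcast, 2))   # 0.0: overcast never coincides with "No"
```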
Example # 02
Naïve Bayes Classifier
• Outlook and temperature are the features; play is the output.
• Today = (Sunny, Hot). Play or not?
Outlook:
           Yes   No   P(Y)   P(N)
Sunny       2     3   2/9    3/5
Overcast    4     0   4/9    0/5
Rainy       3     2   3/9    2/5
Total       9     5   100%   100%

Temperature:
           Yes   No   P(Y)   P(N)
Hot         2     2   2/9    2/5
Mild        4     2   4/9    2/5
Cool        3     1   3/9    1/5
Total       9     5   100%   100%

Play:
           Count      P(Y) & P(N)
Yes          9        9/14
No           5        5/14
Total       14        100%
Naïve Bayes Classifier
P(Yes | Sunny, Hot) ∝ P(Sunny | Yes) * P(Hot | Yes) * P(Yes) = (2/9)(2/9)(9/14) ≈ 0.0317
P(No | Sunny, Hot) ∝ P(Sunny | No) * P(Hot | No) * P(No) = (3/5)(2/5)(5/14) ≈ 0.0857
Normalize:
P(Yes) = 0.0317 / (0.0317 + 0.0857) ≈ 0.27
P(No) = 1 - 0.27 = 0.73
Output = argmax(P) => No (0.73)
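The two-feature calculation can be checked with a short script, with the likelihoods and priors read straight off the tables above:

```python
# Likelihoods and priors from the Outlook, Temperature, and Play tables.
p_sunny_yes, p_hot_yes, p_yes = 2/9, 2/9, 9/14
p_sunny_no,  p_hot_no,  p_no  = 3/5, 2/5, 5/14

# Naive Bayes multiplies the per-feature likelihoods with the class prior
# (the independence assumption lets us treat each feature separately).
score_yes = p_sunny_yes * p_hot_yes * p_yes   # ~0.0317
score_no  = p_sunny_no  * p_hot_no  * p_no    # ~0.0857

# Normalize so the two posteriors sum to 1, then pick the larger one.
p_yes_post = score_yes / (score_yes + score_no)
p_no_post = 1 - p_yes_post
print(round(p_yes_post, 2), round(p_no_post, 2))  # 0.27 0.73
```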
Python Implementation
Problem
The problem is the same as before.
You are given data by your manager about customers who have previously bought some older makes of your company's
SUV. The data includes a total of 400 instances.
Independent variables
• Age
• Estimated salary
Dependent variable
• Purchased
You are asked to predict who will buy the new SUV vehicle.
Steps
• Import the libraries
• Import the dataset
• Split the dataset into training and testing samples
• Feature scaling
• Visualize the training set results
• Visualize the test set results
Importing libraries and dataset +
dividing the dataset into train and test sets
Scaling the data
• We do not need to scale the dependent variable ‘purchased’ as it is already in the
form of ‘0’ and ‘1’.
• Remember, scaling is always applied after splitting the dataset into test and train sets,
because we want to avoid data leakage.
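The slides show this step as code screenshots; below is a runnable sketch under stated assumptions. The lecture loads the 400-customer dataset from a CSV whose filename is not given here, so a synthetic stand-in with the same shape (and a toy label rule of our own) is used; the split-then-scale logic is the part the slide is about.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Stand-in for the 400-customer dataset (Age, EstimatedSalary -> Purchased).
# In the lecture this is read from a CSV; the label rule below is ours.
rng = np.random.default_rng(0)
X = np.column_stack([rng.uniform(18, 60, 400),            # Age
                     rng.uniform(15_000, 150_000, 400)])  # Estimated salary
y = (X[:, 0] / 60 + X[:, 1] / 150_000 > 1.0).astype(int)  # toy "purchased" label

# Split first, then scale: fitting the scaler on the full dataset would leak
# test-set statistics into training, which is exactly what we want to avoid.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

sc = StandardScaler()
X_train = sc.fit_transform(X_train)  # fit on the training data only
X_test = sc.transform(X_test)        # reuse the training-set mean and std
print(X_train.shape, X_test.shape)   # (300, 2) (100, 2)
```

The dependent variable y is left unscaled, since it is already 0/1.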
Training the dataset
Predicting unknown values
Predicting results on the test data
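A hedged sketch of the training and prediction steps, again using synthetic data in place of the lecture's CSV (the single "unknown customer" values below are illustrative, not from the slides):

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.naive_bayes import GaussianNB

# Synthetic stand-in for the 400-row (Age, EstimatedSalary) dataset.
rng = np.random.default_rng(0)
X = np.column_stack([rng.uniform(18, 60, 400), rng.uniform(15_000, 150_000, 400)])
y = (X[:, 0] / 60 + X[:, 1] / 150_000 > 1.0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)
sc = StandardScaler().fit(X_train)
X_train, X_test = sc.transform(X_train), sc.transform(X_test)

# Gaussian Naive Bayes: estimates a per-class mean and variance for each feature.
clf = GaussianNB()
clf.fit(X_train, y_train)

# Predict a single unknown customer (e.g. age 30, salary 87,000);
# it must pass through the same scaler before prediction.
new_customer = sc.transform([[30, 87_000]])
print(clf.predict(new_customer))  # array with a single 0/1 label

# Predict the whole test set.
y_pred = clf.predict(X_test)
print(y_pred[:10])
```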
Confusion matrix and accuracy
• Accuracy = 90%
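The evaluation step can be sketched as below. On the synthetic stand-in data the exact accuracy will differ from the 90% the lecture reports on the real dataset; the mechanics are the same.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import confusion_matrix, accuracy_score

# Same synthetic stand-in for the customer dataset as in the earlier sketches.
rng = np.random.default_rng(0)
X = np.column_stack([rng.uniform(18, 60, 400), rng.uniform(15_000, 150_000, 400)])
y = (X[:, 0] / 60 + X[:, 1] / 150_000 > 1.0).astype(int)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)
sc = StandardScaler().fit(X_train)
clf = GaussianNB().fit(sc.transform(X_train), y_train)

y_pred = clf.predict(sc.transform(X_test))

# Rows are true classes, columns are predicted classes;
# the diagonal holds the correct predictions.
cm = confusion_matrix(y_test, y_pred)
acc = accuracy_score(y_test, y_pred)
print(cm)
print(f"Accuracy: {acc:.0%}")
```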
Decision boundary: training set
Decision boundary: test set
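The decision-boundary plots on these slides can be reproduced by classifying every point of a fine grid over the feature space. A sketch, with the matplotlib calls and synthetic scaled data as our assumptions:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # render off-screen; remove this line for interactive use
import matplotlib.pyplot as plt
from sklearn.naive_bayes import GaussianNB

# Synthetic stand-in for the scaled training set.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(int)
clf = GaussianNB().fit(X, y)

# Predict the class at every point of a fine grid covering the feature space.
xx, yy = np.meshgrid(np.arange(X[:, 0].min() - 1, X[:, 0].max() + 1, 0.02),
                     np.arange(X[:, 1].min() - 1, X[:, 1].max() + 1, 0.02))
Z = clf.predict(np.column_stack([xx.ravel(), yy.ravel()])).reshape(xx.shape)

# Color the regions by predicted class and overlay the training points.
plt.contourf(xx, yy, Z, alpha=0.3)
plt.scatter(X[:, 0], X[:, 1], c=y, s=10)
plt.xlabel("Age (scaled)")
plt.ylabel("Estimated salary (scaled)")
plt.title("Naive Bayes decision boundary (training set)")
plt.savefig("decision_boundary.png")
```

For the test-set plot, the same grid predictions are reused and only the scattered points change.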