Naive Bayes Algorithm
The Naive Bayes Algorithm learns, from data, the probability of each object, its features, and the group it belongs to. It is also known as a probabilistic classifier. The Naive Bayes Algorithm comes under supervised learning and is mainly used to solve classification problems.
For example, you often cannot identify a bird with certainty from its features and color, because many birds share similar attributes. You can, however, make a probabilistic prediction, and that is where the Naive Bayes Algorithm comes in.
Probability is the foundation of the Naive Bayes algorithm: the algorithm predicts a class by estimating how probable each class is given the observed features. You can learn more about probability, Bayes' theorem, and conditional probability below:
Probability
Probability helps to predict an event's occurrence out of all the potential outcomes. The mathematical equation for probability is as follows:
Probability of an event = Number of favorable outcomes / Total number of possible outcomes
Here, the favorable outcomes are the outcomes in which the event of interest occurs. Probability always lies between 0 and 1 (0 <= probability of an event <= 1), where 0 means the event cannot happen and 1 means the event is certain to happen.
For a better understanding, consider a case where you predict a fruit based on its color and texture. You can either choose the correct fruit that you have in mind or confuse it with a similar fruit and make a mistake. With only these two outcomes and no further information, the probability of choosing the right fruit is 50%.
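As a quick illustration of the formula above, the following sketch computes a simple probability as favorable outcomes divided by total outcomes (the counts are made up for illustration):

# Hypothetical example: a basket with 5 apples and 5 similar-looking pears.
favorable_outcomes = 5    # picks that give the fruit you had in mind (apples)
total_outcomes = 10       # all possible picks

probability = favorable_outcomes / total_outcomes
print(probability)  # 0.5, i.e. a 50% chance of choosing the right fruit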
Bayes' Theorem
Bayes' Theorem is about arriving at a hypothesis (H) from a given set of evidence (E). It relates two quantities: the probability of the hypothesis before seeing the evidence, P(H), and the probability after seeing the evidence, P(H|E). The theorem is expressed by the following equation:
P(H|E) = [P(E|H) × P(H)] / P(E)
Here, P(E|H) is the probability of observing the evidence when the hypothesis is true, and P(E) is the overall probability of the evidence. Bayes' rule is thus a method for determining P(H|E) from P(E|H). In short, it provides you with a way of calculating the probability of a hypothesis from the available evidence.
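A small numerical sketch of the rule, using made-up values for a hypothetical medical-test scenario (all numbers are assumptions for illustration):

# Hypothetical values: H = "patient has the disease", E = "test is positive".
p_h = 0.01          # prior probability of the hypothesis, P(H)
p_e_given_h = 0.95  # probability of the evidence if H is true, P(E|H)
p_e = 0.06          # overall probability of the evidence, P(E)

# Bayes' rule: posterior probability of the hypothesis given the evidence.
p_h_given_e = (p_e_given_h * p_h) / p_e
print(round(p_h_given_e, 3))  # ~0.158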
Conditional Probability
Conditional probability is a subset of probability: it is the probability of one event occurring given that another event has already occurred. You can compute the conditional probability for two or more events.
When you take events X and Y, the conditional probability of event Y is defined as the probability that Y occurs given that event X has already occurred. It is written as P(Y|X). The mathematical formula for this is as follows:
P(Y|X) = P(X and Y) / P(X), provided P(X) > 0
Here, P(X and Y) is the probability that both events occur together.
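A quick sketch of this formula with made-up probabilities (the numbers are assumptions for illustration only):

# Hypothetical example: X = "it is cloudy", Y = "it rains".
p_x = 0.4         # P(X): probability that it is cloudy
p_x_and_y = 0.1   # P(X and Y): probability that it is cloudy and it rains

# Conditional probability of rain given that it is cloudy.
p_y_given_x = p_x_and_y / p_x
print(p_y_given_x)  # 0.25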
Bayesian Probability
Bayes' rule is used in probability theory for computing conditional probabilities. What is important is that you do not just learn how the evidence changes the probability of an event occurring; you can compute the exact updated probability.
You use training data to train your model and make it functional. You then need validation data to evaluate the model and make new predictions. In the training data, the input attributes are called the "evidence" and the labels are called the "outputs".
Using conditional probability, denoted by P(E|O), you can calculate the probability of the evidence given the outputs. Your ultimate goal is to compute P(O|E), the probability of an output based on the current attributes.
When the problem has two outputs, you can calculate the probability of each outcome and pick the one with the higher value. When there are several input attributes, the Naive Bayes Algorithm is needed: it assumes the attributes are independent of each other given the output, so the probability of the evidence becomes a product of per-attribute probabilities, as sketched below.
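The following is a minimal sketch of that computation for a problem with two outputs; the priors and per-attribute likelihoods are made-up numbers, and the unnormalized scores are simply compared to pick the winning output:

# Hypothetical per-output prior probabilities P(O).
priors = {"purchased": 0.4, "not_purchased": 0.6}

# Hypothetical per-attribute likelihoods P(E_i | O) for the observed evidence.
likelihoods = {
    "purchased":     [0.7, 0.5],   # P(age_group | purchased), P(clicked_ad | purchased)
    "not_purchased": [0.2, 0.4],   # P(age_group | not purchased), P(clicked_ad | not purchased)
}

# Naive Bayes: score each output as P(O) times the product of P(E_i | O),
# then pick the output with the highest score.
scores = {}
for output, prior in priors.items():
    score = prior
    for p in likelihoods[output]:
        score *= p
    scores[output] = score

print(max(scores, key=scores.get))  # "purchased" (0.14 vs 0.048)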
For instance, suppose you are using the social_media_ads dataset. With this problem, you can predict whether a user purchased a product after clicking on the ad, depending on their age and other attributes. You can understand the working of the Naive Bayes Classifier by following the steps below:
You can use the below command for importing the basic libraries required.
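A typical set of imports for this kind of workflow might look as follows (the specific libraries shown are an assumption):

# Basic libraries for numerical work, data handling, and plotting.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt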
The below command will help you with the data preprocessing.
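A minimal preprocessing sketch, assuming the data is a CSV file named social_media_ads.csv with Age, EstimatedSalary, and Purchased columns (the file and column names are assumptions):

# Load the dataset and separate the input attributes (evidence) from the label (output).
dataset = pd.read_csv("social_media_ads.csv")
X = dataset[["Age", "EstimatedSalary"]].values  # input attributes
y = dataset["Purchased"].values                 # output label (assumed: 0 = not purchased, 1 = purchased)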
Feature Scaling
In this step, you have to split the dataset into a training dataset (70%) and a testing dataset (30%). Next, you have to do some basic feature scaling with the help of a standard scaler. It transforms each feature so that its mean is 0 and its standard deviation is 1.
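A sketch of this step using scikit-learn's train_test_split and StandardScaler (the 70/30 split follows the text; the random_state value is an arbitrary assumption):

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# 70% training data, 30% testing data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# Scale features to zero mean and unit standard deviation.
# The scaler is fitted on the training data only, then applied to both sets.
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)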
You should then write the following command for training the model.
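Since the attributes here are continuous, one natural choice is scikit-learn's GaussianNB; the sketch below assumes that variant:

from sklearn.naive_bayes import GaussianNB

# Train the Naive Bayes classifier on the scaled training data.
classifier = GaussianNB()
classifier.fit(X_train, y_train)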
A confusion matrix helps to understand the quality of the model. It describes the performance of a classification model on a set of test data for which you know the true values. Every row in a confusion matrix portrays an actual class, and every column portrays a predicted class.
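A sketch of evaluating the trained model this way with scikit-learn (its confusion_matrix puts actual classes on rows and predicted classes on columns):

from sklearn.metrics import accuracy_score, confusion_matrix

# Predict the outputs for the test data and compare them with the true values.
y_pred = classifier.predict(X_test)

print(confusion_matrix(y_test, y_pred))  # rows: actual class, columns: predicted class
print(accuracy_score(y_test, y_pred))    # fraction of correct predictions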
In some cases, these steps might not be strictly necessary, but the example above gives a clear idea of how data points can be classified.
There are several types of the Naive Bayes Model; the main ones are explained below:
Gaussian Naive Bayes
It is a straightforward algorithm used when the attributes are continuous. The attributes present in the data should follow a Gaussian (normal) distribution. It remarkably quickens the search, and under lenient conditions, its error will be at most twice that of Optimal Naive Bayes.
Optimal Naive Bayes
Optimal Naive Bayes selects the class that has the greatest posterior probability. As the name suggests, it is optimal, but it goes through all the possibilities, which makes it very slow and time-consuming.
Bernoulli Naive Bayes
Bernoulli Naive Bayes is an algorithm that is useful for data that has binary or boolean
attributes. The attributes will have a value of yes or no, useful or not, granted or rejected, etc.
Disadvantages of the Naive Bayes Classifier
The Naive Bayes Algorithm has trouble with the ‘zero-frequency problem’. It happens when a categorical variable takes a value in the test data that never appeared in the training dataset, so the model assigns it zero probability. Using a smoothing technique, such as Laplace smoothing, overcomes this problem (a sketch is given after this list of limitations).
It assumes that all the attributes are independent of each other, which rarely happens in real life. This limits the applicability of the algorithm in real-world situations.
Its probability estimates can sometimes be off, so you shouldn't take its predicted probabilities too literally.
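As a sketch of the smoothing fix mentioned above, scikit-learn's BernoulliNB (like its other discrete variants) exposes an alpha parameter for Laplace smoothing; the toy data below is made up for illustration:

import numpy as np
from sklearn.naive_bayes import BernoulliNB

# Toy binary data: the second feature is always 0 for class 0 in training,
# which would give it zero frequency for that class without smoothing.
X_train = np.array([[1, 0], [1, 0], [0, 1], [0, 1]])
y_train = np.array([0, 0, 1, 1])

# alpha=1.0 applies Laplace smoothing, so unseen feature values never get
# exactly zero probability at prediction time.
model = BernoulliNB(alpha=1.0)
model.fit(X_train, y_train)
print(model.predict_proba([[1, 1]]))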