Classification Algorithms in
Artificial Intelligence
WHAT IS CLASSIFICATION?
• Classification is the data mining task of predicting the value of a categorical
variable (the target or class) by building a model based on one or more
numerical and/or categorical variables (predictors, attributes or features).
• It is an instance of supervised learning.
• The classification algorithms used in AI combine statistical analysis and
algebra, and some are arranged as flowcharts and decision trees. Some approaches
predate the idea of creating machine intelligence, having emerged from the
fields of statistics, calculus and numerical analysis.
Classification learning
• Training phase: learning the classifier from the available data (the labeled
‘Training set’).
• Testing phase: testing how well the classifier performs (on the ‘Testing set’).
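The two phases can be sketched in Python as a simple split of labeled data into a training set and a testing set. This is only an illustration: the function name `train_test_split`, the 80/20 ratio, the seed and the toy data are assumptions, not something taken from the slides.

```python
import random

def train_test_split(rows, test_fraction=0.2, seed=0):
    """Shuffle labeled rows and split them into (training set, testing set)."""
    rng = random.Random(seed)          # fixed seed so the split is repeatable
    shuffled = rows[:]                 # copy, so the caller's list is untouched
    rng.shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

# Hypothetical labeled data: (feature vector, class label)
data = [([float(i)], "A" if i < 5 else "B") for i in range(10)]
train_set, test_set = train_test_split(data, test_fraction=0.2)
print(len(train_set), len(test_set))   # 8 2
```

The classifier is fitted on `train_set` only; `test_set` is held back to measure how well it performs on data it has not seen.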
CLASSIFICATION BASED ALGORITHMS
• Naive Bayesian
• Logistic Regression
• K-Nearest Neighbor
• Decision Tree
• Random Forest
• Artificial Neural Network
• Support Vector Machine
NAIVE BAYESIAN
It is a classification algorithm based on Bayes’ theorem with an assumption of
independence among predictors. In simple terms, a Naïve Bayes classifier
assumes that the presence of a particular feature in a class is unrelated to
the presence of any other feature.
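A minimal sketch of the idea for categorical features: each class is scored by its prior multiplied by the per-feature likelihoods, and that multiplication is exactly where the independence assumption enters. The function names, the toy weather data and the add-one smoothing (which here assumes every feature has two possible values) are illustrative assumptions.

```python
from collections import Counter, defaultdict

def train_naive_bayes(rows):
    """rows: list of (features_tuple, label). Returns the counts used for prediction."""
    class_counts = Counter(label for _, label in rows)
    # feature_counts[label][feature_index][value] = number of occurrences
    feature_counts = defaultdict(lambda: defaultdict(Counter))
    for features, label in rows:
        for i, value in enumerate(features):
            feature_counts[label][i][value] += 1
    return class_counts, feature_counts

def predict_naive_bayes(model, features):
    """Return the class with the highest prior * product of feature likelihoods."""
    class_counts, feature_counts = model
    total = sum(class_counts.values())
    best_label, best_score = None, -1.0
    for label, count in class_counts.items():
        score = count / total                      # prior P(class)
        for i, value in enumerate(features):
            # Add-one smoothing; ASSUMES each feature takes 2 possible values.
            score *= (feature_counts[label][i][value] + 1) / (count + 2)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical training data: (outlook, windy) -> decision
rows = [
    (("sunny", "no"), "play"),
    (("sunny", "no"), "play"),
    (("rainy", "yes"), "stay"),
    (("rainy", "no"), "stay"),
    (("sunny", "yes"), "play"),
]
model = train_naive_bayes(rows)
print(predict_naive_bayes(model, ("rainy", "yes")))   # stay
```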
LOGISTIC REGRESSION
It is a classification algorithm in machine learning that uses one or more
independent variables to determine an outcome. The outcome is measured with a
dichotomous variable, meaning it has only two possible values.
The goal of logistic regression is to find the best-fitting relationship between
the dependent variable and a set of independent variables.
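As a rough sketch, a logistic regression with a single independent variable can be fitted by gradient descent on the log-loss; the sigmoid turns the linear score into a probability of the positive outcome. The data (hours studied vs. pass/fail), the learning rate and the number of epochs are assumptions chosen for illustration.

```python
import math

def sigmoid(z):
    """Map any real score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, ys, lr=0.1, epochs=5000):
    """Fit weight w and bias b by plain gradient descent on the mean log-loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        errors = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
        grad_w = sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = sum(errors) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

def predict_probability(w, b, x):
    """Probability that the dichotomous outcome equals 1 for input x."""
    return sigmoid(w * x + b)

# Hypothetical data: hours studied -> passed (1) or failed (0)
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [0, 0, 0, 1, 1, 1]
w, b = fit_logistic(xs, ys)
print(predict_probability(w, b, 1.0) < 0.5, predict_probability(w, b, 6.0) > 0.5)
```

A predicted probability above 0.5 is usually read as the positive class; the threshold is a choice that can be adjusted.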
K-NEAREST NEIGHBORS
• The K-Nearest Neighbors (KNN) algorithm uses ‘feature similarity’ to predict
the values of new data points: a new data point is assigned a value based on how
closely it matches the points in the training set.
• It is a lazy learning algorithm: rather than constructing a general internal
model, it stores all training instances as points in n-dimensional space.
• It is a powerful classification algorithm used in pattern recognition.
The figure shows the three nearest neighbors of the data point marked with a
cross. Two of the three lie in the Triangle class, hence the cross-marked point
is also assigned to the Triangle class.
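The majority vote described for the figure can be sketched as follows. The coordinates are made up to reproduce the same situation (two triangles and one circle among the three nearest neighbors), and Euclidean distance is an assumption; other distance measures are possible.

```python
import math
from collections import Counter

def knn_predict(training_set, query, k=3):
    """Return the majority class among the k training points closest to query."""
    by_distance = sorted(training_set, key=lambda item: math.dist(item[0], query))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

# Hypothetical stored training instances: (point, class)
training_set = [
    ((1.0, 1.0), "triangle"),
    ((1.5, 2.0), "triangle"),
    ((2.0, 1.0), "triangle"),
    ((2.5, 2.5), "circle"),
    ((4.0, 4.0), "circle"),
    ((5.0, 4.5), "circle"),
]
cross_mark = (2.0, 2.0)
print(knn_predict(training_set, cross_mark, k=3))   # triangle
```

Note that no model is built ahead of time: all the work happens at prediction time, which is what makes the method "lazy".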
DECISION TREE
• A Decision Tree is a flowchart-like structure in which each internal node
represents a feature (or attribute), each branch represents a decision rule,
and each leaf node represents an outcome.
• The Decision Tree Algorithm builds the classification model in the form of
a tree structure. It uses if-then rules that are mutually exclusive and
collectively exhaustive.
• It trains faster than a neural network. It is a distribution-free
(non-parametric) method, which does not depend on probability distribution
assumptions.
• It can handle high-dimensional data with good accuracy and can also model
non-linear relationships well.
• Common terms in the Decision Tree Algorithm are Root Node, Splitting,
Decision Node, Leaf/Terminal Node, Pruning, Branch/Sub-Tree and Parent
& Child Node.
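To make the node/branch/leaf vocabulary concrete, here is a small sketch of a tree stored as nested dictionaries and applied as a chain of if-then rules. The features and thresholds are hypothetical and hand-written for illustration; they are not learned from data, and learning the splits is not shown.

```python
def classify(tree, sample):
    """Walk from the root node to a leaf, following one branch per decision node."""
    node = tree
    while "label" not in node:                       # decision node: test a feature
        passed = sample[node["feature"]] < node["threshold"]
        node = node["yes"] if passed else node["no"]
    return node["label"]                             # leaf node: the outcome

# Hypothetical tree: root node splits on petal length, child node on petal width.
tree = {
    "feature": "petal_length", "threshold": 2.5,
    "yes": {"label": "setosa"},
    "no": {
        "feature": "petal_width", "threshold": 1.8,
        "yes": {"label": "versicolor"},
        "no": {"label": "virginica"},
    },
}

print(classify(tree, {"petal_length": 1.4, "petal_width": 0.2}))   # setosa
print(classify(tree, {"petal_length": 5.5, "petal_width": 2.1}))   # virginica
```

Every sample follows exactly one path to exactly one leaf, which is what it means for the rules to cover all cases without overlapping.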
RANDOM FOREST
• Random Forests (or Random Decision Forests) are an ensemble learning method
for classification, regression, etc. A random forest operates by constructing a
multitude of decision trees at training time and outputs the class that is the
mode of the classes (classification) or the mean prediction (regression) of the
individual trees.
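The aggregation step can be sketched on its own: given the outputs of the individual trees, take the mode for classification or the mean for regression. The tree outputs below are hypothetical, and how the trees are built (for example, by training each on a random sample of the data) is deliberately left out.

```python
from collections import Counter
from statistics import mean

def forest_classify(tree_votes):
    """Classification: the forest outputs the most common class among the trees."""
    return Counter(tree_votes).most_common(1)[0][0]

def forest_regress(tree_outputs):
    """Regression: the forest outputs the mean of the trees' predictions."""
    return mean(tree_outputs)

# Hypothetical predictions from five trees for one sample
votes = ["virginica", "versicolor", "virginica", "virginica", "versicolor"]
print(forest_classify(votes))            # virginica

# Hypothetical numeric predictions from three trees
print(forest_regress([2.0, 3.0, 4.0]))   # 3.0
```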
ARTIFICIAL NEURAL NETWORK
• A Neural Network consists of neurons arranged in layers, which take an input
vector and convert it into an output. Each neuron takes its inputs, applies a
function (often a non-linear one) to them, and passes the result to the next
layer.
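A forward pass through such a network can be sketched in a few lines: each neuron computes a weighted sum of its inputs plus a bias and applies a non-linear function, and each layer feeds the next. The layer sizes, weights, biases and the choice of the sigmoid function are all illustrative assumptions; training the weights is not shown.

```python
import math

def sigmoid(z):
    """A common non-linear activation function."""
    return 1.0 / (1.0 + math.exp(-z))

def dense_layer(inputs, weights, biases, activation):
    """One layer: each neuron applies activation(weighted sum of inputs + bias)."""
    return [
        activation(sum(w * x for w, x in zip(neuron_weights, inputs)) + bias)
        for neuron_weights, bias in zip(weights, biases)
    ]

# Hypothetical network: 2 inputs -> 3 hidden neurons -> 1 output neuron
hidden_weights = [[0.5, -0.2], [0.1, 0.4], [-0.3, 0.8]]
hidden_biases = [0.0, 0.1, -0.1]
output_weights = [[0.7, -0.5, 0.2]]
output_biases = [0.05]

input_vector = [1.0, 2.0]
hidden = dense_layer(input_vector, hidden_weights, hidden_biases, sigmoid)
output = dense_layer(hidden, output_weights, output_biases, sigmoid)
print(output)   # a single value between 0 and 1
```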
SUPPORT VECTOR MACHINE
• The Support Vector Machine is a classifier that represents the training data as
points in space, separated into categories by a gap that is as wide as possible.
New points are then mapped into the same space and assigned a category according
to which side of the gap they fall on.
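For the linear, two-class case, prediction reduces to checking on which side of a separating hyperplane a point lies. The sketch below assumes the hyperplane w·x + b = 0 is already known; the values of `w` and `b` are hypothetical, and the training step that finds the widest gap is not shown. With the usual scaling, where the closest training points satisfy |w·x + b| = 1, the width of the gap is 2/||w||.

```python
import math

def decision_value(w, b, x):
    """Signed score w·x + b; its sign tells which side of the hyperplane x is on."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def predict(w, b, x):
    """Assign category +1 or -1 according to the side of the hyperplane."""
    return 1 if decision_value(w, b, x) >= 0 else -1

def margin_width(w):
    """Width of the gap between the two categories, assuming the usual scaling."""
    return 2.0 / math.sqrt(sum(wi * wi for wi in w))

# Hypothetical hyperplane: x1 + x2 - 3 = 0
w, b = (1.0, 1.0), -3.0
print(predict(w, b, (3.0, 2.0)))    # 1
print(predict(w, b, (0.5, 1.0)))    # -1
print(round(margin_width(w), 3))    # 1.414
```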
COMPARISON OF DIFFERENT
CLASSIFICATION ALGORITHMS IN R-STUDIO
DECISION TREE
The adjoining image depicts
the Decision Tree.
Actual \ Predicted    Setosa    Versicolor    Virginica
Setosa                     8             0            1
Versicolor                 0            13            0
Virginica                  0             2            6
The above table is the confusion matrix for the iris testing dataset using the
Decision Tree. 8 of the 9 Setosa specimens are correctly classified (88.9%
accuracy), 13 of the 13 Versicolor specimens are correctly classified (100%
accuracy) and 6 of the 8 Virginica specimens are correctly classified (75%
accuracy). The overall accuracy on the testing dataset is 90% (27 of 30).
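The percentages follow directly from the confusion matrix. The sketch below recomputes them in Python, reading each row as the actual species and each column as the predicted species, which is the reading the row totals of 9, 13 and 8 support. The function names are illustrative; the per-class figure computed here is what is often called recall.

```python
classes = ["Setosa", "Versicolor", "Virginica"]
# Decision Tree confusion matrix from the table: rows = actual, columns = predicted
matrix = [
    [8, 0, 1],
    [0, 13, 0],
    [0, 2, 6],
]

def per_class_accuracy(matrix):
    """Fraction of each actual class (row) that was predicted correctly."""
    return [row[i] / sum(row) for i, row in enumerate(matrix)]

def overall_accuracy(matrix):
    """Correct predictions (the diagonal) divided by all predictions."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total

for name, acc in zip(classes, per_class_accuracy(matrix)):
    print(f"{name}: {acc:.1%}")
print(f"Overall: {overall_accuracy(matrix):.1%}")   # Overall: 90.0%
```

The same two functions can be applied to any of the confusion matrices in this comparison.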
ARTIFICIAL NEURAL NETWORK
The adjoining results were obtained using R-Studio for the “iris” dataset. The
figure shows a neural network with 3 layers: an input layer, a hidden layer and
an output layer. The input layer has the independent variables (Sepal Length,
Sepal Width, Petal Length and Petal Width) as nodes; the hidden layer has 3
nodes; and the output layer has the 3 species (Setosa, Versicolor and
Virginica) as output nodes.
Actual \ Predicted    Setosa    Versicolor    Virginica
Setosa                     9             0            0
Versicolor                 0            11            2
Virginica                  0             0            8
The above table is the confusion matrix for the iris testing dataset using the
ANN. 9 of the 9 Setosa specimens are correctly classified (100% accuracy),
11 of the 13 Versicolor specimens are correctly classified (84.6% accuracy) and
8 of the 8 Virginica specimens are correctly classified (100% accuracy). The
overall accuracy on the testing dataset is 93.3% (28 of 30).
SUPPORT VECTOR MACHINE
The adjoining image shows
the separating hyperplane for
the iris testing dataset using
a support vector machine.
Actual \ Predicted    Setosa    Versicolor    Virginica
Setosa                     9             0            0
Versicolor                 1            12            0
Virginica                  0             0            8
The above table is the confusion matrix for the iris testing dataset using the
Support Vector Machine (SVM). 9 of the 9 Setosa specimens are correctly
classified (100% accuracy), 12 of the 13 Versicolor specimens are correctly
classified (92.3% accuracy) and 8 of the 8 Virginica specimens are correctly
classified (100% accuracy). The overall accuracy on the testing dataset is
96.7% (29 of 30).