Classification
(Supervised Learning)
COE 544
Intelligent Engineering Algorithms
Prepared by:
Jason Sakr
Elias Saliba
Marc Mansour
Instructor: Dr. Joe Tekli
10/1/2016 1
Introduction
Machine Learning
“A field of study that gives computers the ability to learn without being
explicitly programmed”
- Arthur Lee Samuel
Introduction
Supervised vs Unsupervised
Supervised
• Prior knowledge is built in as input/output pairs
• These pairs are called labeled training data
• The training data is used to generate a desired output
• The output in each training pair is called the supervisory signal
[Diagram: labeled training data feeding Input → Machine → Output]
Introduction
Supervised vs Unsupervised
Unsupervised
• No prior knowledge on the data
• No training data set
• No idea of what the results should look like
• No input/output training pairs
Introduction
Classification vs Regression
Classification
• One of the most common problems in supervised learning.
• When the machine encounters new samples, it should be able to identify their class.
• Classification algorithms are called classifiers.
• Classifiers assign inputs to discrete categories.
Introduction
Classification vs Regression
Classification example
[Diagram: training set of input images (cow, cow, dog, ...) paired with output labels from the discrete set {COW, DOG, CAT, HORSE}]
Introduction
Classification vs Regression
Classification example
• With enough training data, the machine should be able to classify the animal correctly as a dog.
[Diagram: actual input image mapped to one class in {COW, DOG, CAT, HORSE}]
Introduction
Classification vs Regression
Regression
• The machine has some information about the data in hand.
• It tries to predict certain outcomes.
• Regression maps inputs to continuous outputs.
Introduction
Classification vs Regression
Regression example
[Diagram: training set of house images paired with prices ($10,000 ... $1,000,000), mapped into the continuous range [10,000 ; 1,000,000]]
Introduction
Classification vs Regression
Regression example
• With enough information on the size of the house, its price should be approximated accurately.
[Diagram: actual input house mapped to a price estimate, e.g. $150,000, $500,000, or $800,000]
Introduction
Binary vs Multiclass
• Binary:
• Samples must be classified into one of two categories
• Patient is Sick/Healthy, price is Higher/Lower than a certain value.
• Multiclass:
• Samples must be classified into one of many categories
• Animal is a Dog/Cat/Horse, fruit is an Apple/Orange/Banana/Strawberry.
Introduction
Classifiers
There are many algorithms for classification:
• Linear Classifiers:
  • Naïve Bayes Classifier
  • Perceptron
• Support Vector Machines
• Quadratic Classifiers
• Decision Trees:
  • ID3
• Random Forests
• Kernel Estimation
• K-Nearest Neighbor
Outline
• Introduction
• Support Vector Machine (SVM)
• Algorithm
• Applications
• K-Nearest Neighbors (KNN)
• Algorithm
• Applications
• Demo
• Decision Trees
• ID3 Algorithm
• Demo
• Conclusion
SVM
Algorithm
• This classifier is given labeled training data (supervised learning); the algorithm outputs an optimal hyperplane that categorizes new examples into classes.
SVM
Algorithm
• The goal is to find a hyperplane that separates all vectors into 2 classes.
SVM
Algorithm
• There exists more than one hyperplane that can split the two classes.
SVM
Algorithm
• The best choice is the hyperplane that has the maximum margin from both classes.
SVM
Algorithm
• The aim is to maximize the total margin by minimizing the norm of the weight vector.
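The margin-maximization idea above can be sketched with scikit-learn (an assumption on our part, since the slides do not name a library; the data is a made-up toy example). An `SVC` with a linear kernel finds the separating hyperplane w·x + b = 0 that maximizes the margin by minimizing ||w||:

```python
# Minimal linear SVM sketch: fit a maximum-margin hyperplane on toy data.
from sklearn.svm import SVC

# Toy 2-D training data: two linearly separable classes.
X = [[0, 0], [1, 1], [1, 0], [4, 4], [5, 5], [4, 5]]
y = [0, 0, 0, 1, 1, 1]

clf = SVC(kernel="linear", C=1.0)
clf.fit(X, y)

# New points near each cluster are assigned to that cluster's class.
print(clf.predict([[0.5, 0.5], [4.5, 4.5]]))
# The support vectors are the training points that lie on the margin.
print(clf.support_vectors_)
```

The margin is determined only by the support vectors; removing any other training point leaves the hyperplane unchanged.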
SVM
Applications
• Spam filtering [spam, not spam]
• Sentiment classification [positive, negative]
• For example, classifying reviews
• Customer service message classification [urgent, not urgent]
• Information retrieval [relevant, not relevant]
KNN
Algorithm
• Learning: memorize all training data.
• Labeling a new example:
  • Compute the distance from the input to each training example.
  • Select the K closest instances.
  • Return the class with the most instances among those selected.
• If K = 1, the case is simply assigned to the class of its nearest neighbor.
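The steps above can be sketched from scratch (function and variable names are illustrative, not from the slides), using Euclidean distance:

```python
# From-scratch K-Nearest Neighbors: distance, select K closest, majority vote.
from collections import Counter
import math

def knn_classify(train, new_point, k=3):
    """train: list of (features, label) pairs; returns the predicted label."""
    # 1. Compute the distance from the input to each training example.
    dists = sorted((math.dist(x, new_point), label) for x, label in train)
    # 2. Select the K closest instances.
    nearest = [label for _, label in dists[:k]]
    # 3. Return the class with the most instances among those selected.
    return Counter(nearest).most_common(1)[0][0]

train = [((1, 1), "red"), ((1, 2), "red"), ((8, 8), "blue"), ((9, 8), "blue")]
print(knn_classify(train, (2, 1), k=3))  # prints "red"
```

Note that there is no training phase beyond storing the data; all the work happens at prediction time, which is why KNN is called a lazy learner.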
KNN
Algorithm
Distance functions
Small Demo
• [Link]
KNN
Algorithm
• If k = 3 (solid line circle) it is assigned to the red
class because there are 2 triangles and only 1
square inside the inner circle.
• If k = 5 (dashed line circle) it is assigned to the
blue class (3 squares vs. 2 triangles inside the
outer circle).
KNN
Applications
• Radar target classification
• GPS
KNN
Demo
[Link]
Decision Trees
• Decision trees are among the oldest and most widely used techniques in machine learning. They go back decades, and they are extremely robust.
• Decision trees use a trick that lets you do non-linear decision making with simple linear decision surfaces.
Decision Trees
• We can use a tree of questions as a representation language; each node of the tree is a test on an attribute.
• To learn, we perform a search in the space of trees of questions.
Decision Trees
Example
[Diagram: example decision tree splitting on Sunny and Windy?, with O/X class labels at the leaves]
Decision Trees
Example
[Diagram: decision tree anatomy, labeling a decision node and a leaf node]
Decision Trees
ID3 Algorithm
• One of the first algorithms for building decision trees.
• It employs a top down, greedy search through the space of possible
branches.
• ID3 uses Entropy and Information Gain to construct a decision tree.
• Entropy to calculate the homogeneity of a sample.
• Information gain is based on the decrease in entropy after a dataset is split on an attribute.
Decision Trees
ID3 Algorithm
• Entropy:
  • If the sample is completely homogeneous, the entropy is zero; if the sample is equally divided, the entropy is one.
• Formulas:
  • 1. Entropy of the target: E(S) = Σᵢ −pᵢ log₂(pᵢ)
  • 2. Entropy of a split on attribute X: E(T, X) = Σ_c∈X P(c) · E(c)
Decision Trees
ID3 Algorithm
• Entropy:
• Formula 1 (entropy of the target): E(S) = Σᵢ −pᵢ log₂(pᵢ), where pᵢ is the proportion of samples in class i.
Decision Trees
ID3 Algorithm
• Entropy:
• Formula 2 (entropy of a split on attribute X): E(T, X) = Σ_c∈X P(c) · E(c), the weighted average entropy of the subsets produced by the split.
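The two entropy formulas can be sketched in a few lines (function names are illustrative): formula 1 gives the entropy of a label sequence, and formula 2 weights the entropy of each subset produced by a split by that subset's proportion of the data.

```python
# Entropy (formula 1) and weighted split entropy (formula 2) for ID3.
from collections import Counter
import math

def entropy(labels):
    """Formula 1: E(S) = sum over classes i of -p_i * log2(p_i)."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def split_entropy(subsets):
    """Formula 2: E(T, X) = sum over subsets c of P(c) * E(c)."""
    total = sum(len(s) for s in subsets)
    return sum(len(s) / total * entropy(s) for s in subsets)

print(entropy(["yes", "yes", "no", "no"]))  # equally divided: prints 1.0
print(entropy(["yes"] * 3) == 0)            # homogeneous: prints True
```

These match the boundary cases on the slide: a homogeneous sample has entropy zero, and a two-class sample split 50/50 has entropy one.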
Decision Trees
ID3 Algorithm
• Information gain:
  • The information gain is based on the decrease in entropy after a dataset is split on an attribute.
  • Constructing a decision tree is all about finding the attribute that returns the highest information gain.
Decision Trees
ID3 Algorithm
• Step 1: Calculate the entropy of the target.
• Step 2: Calculate the information gain for each attribute.
• Step 3: Choose the attribute with the largest information gain as the decision node.
• Step 4: Check the entropy of each branch:
  • 4a: A branch with entropy of 0 is a leaf node.
  • 4b: A branch with entropy greater than 0 is split further.
• Step 5: Go back to step 1.
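Step 2 can be sketched as an information-gain helper (names and the toy weather data are illustrative assumptions, not from the slides): Gain(S, A) = E(S) − Σ_v |S_v|/|S| · E(S_v), where v ranges over the values of attribute A.

```python
# Information gain: entropy of the target minus the weighted split entropy.
from collections import Counter, defaultdict
import math

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(rows, labels, attr_index):
    """Decrease in entropy after splitting the rows on one attribute."""
    groups = defaultdict(list)
    for row, label in zip(rows, labels):
        groups[row[attr_index]].append(label)
    split_e = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - split_e

# Toy weather data: attributes are (outlook, windy), label is play yes/no.
rows = [("sunny", "no"), ("sunny", "yes"), ("rain", "no"), ("rain", "yes")]
labels = ["yes", "yes", "no", "no"]
print(information_gain(rows, labels, 0))  # outlook predicts the label: 1.0
print(information_gain(rows, labels, 1))  # windy is uninformative: 0.0
```

Step 3 then simply picks the attribute index with the largest gain as the decision node.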
Decision Trees
ID3 Algorithm
• Step 1: Calculate entropy of the target.
Decision Trees
ID3 Algorithm
• Step 2: Calculate the information gain for each attribute:
  • The dataset is split on different attributes.
  • Calculate the entropy of each split (using formula 2).
  • Calculate the information gain for each split.
Decision Trees
ID3 Algorithm
• Step 3: Choose the attribute with the largest information gain as the decision node.
Decision Trees
ID3 Algorithm
• Step 4: Check the entropy of each branch
• 4a: A branch with entropy of 0 is a leaf node
Decision Trees
ID3 Algorithm
• Step 4: Check the entropy of each branch
• 4b: A branch with entropy greater than 0 is split further.
Decision Trees
ID3 Algorithm
• Step 5:
  • The ID3 algorithm is run recursively on the non-leaf branches until all data is classified.
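The five steps, including the recursion, fit in a compact from-scratch sketch (function and variable names are illustrative assumptions; the toy data is made up):

```python
# Recursive ID3: pick the best attribute, split, recurse until leaves.
from collections import Counter, defaultdict
import math

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def id3(rows, labels, attrs):
    # Leaf: branch entropy is 0 (all labels agree) or no attributes remain.
    if entropy(labels) == 0 or not attrs:
        return Counter(labels).most_common(1)[0][0]

    # Choose the attribute with the largest information gain, i.e. the
    # smallest weighted entropy after the split (E(S) is constant here).
    def split_entropy(a):
        groups = defaultdict(list)
        for row, lab in zip(rows, labels):
            groups[row[a]].append(lab)
        return sum(len(g) / len(labels) * entropy(g) for g in groups.values())

    best = min(attrs, key=split_entropy)
    tree = {}
    for value in {row[best] for row in rows}:
        sub = [(r, l) for r, l in zip(rows, labels) if r[best] == value]
        sub_rows, sub_labels = zip(*sub)
        tree[(best, value)] = id3(
            list(sub_rows), list(sub_labels), [a for a in attrs if a != best]
        )
    return tree

rows = [("sunny", "no"), ("sunny", "yes"), ("rain", "no"), ("rain", "yes")]
labels = ["O", "O", "X", "X"]
print(id3(rows, labels, [0, 1]))  # splits on attribute 0 only
```

On this toy data attribute 0 alone separates the classes, so both branches reach entropy 0 after one split and become leaves.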
Decision Trees
Demo
Decision Tree Demonstration
[Diagram: example decision tree splitting on Sunny and Windy?, with O/X class labels at the leaves]
Conclusion
• We introduced ML and reviewed its core concepts and their differences:
  • SL vs UL - Classification vs Regression - Binary vs Multiclass
• We then dug deeper into supervised classification:
  • Its main algorithms
  • The reasoning behind each algorithm
  • Their applications
  • Some demos
THANK YOU!