
18CSE357T – BIOMETRICS

Unit –2 : Session –4 : SLO -2

SRM Institute of Science and Technology


PATTERN CLASSIFICATION

An Example

• “Sorting incoming fish on a conveyor according to species using optical sensing”

• Species: sea bass or salmon


• Problem Analysis

• Set up a camera and take some sample images to extract features:
  • Length
  • Lightness
  • Width
  • Number and shape of fins
  • Position of the mouth, etc.

• This is the set of all suggested features to explore for use in our classifier!

• Preprocessing

• Use a segmentation operation to isolate each fish from the other fish and from the background

• Information from a single fish is sent to a feature extractor, whose purpose is to reduce the data by measuring certain features

• The features are passed to a classifier



• Classification

• Select the length of the fish as a possible feature for discrimination


The length is a poor feature alone!

Select the lightness as a possible feature.


• Threshold decision boundary and cost relationship

• Move our decision boundary toward smaller values of lightness in order to minimize the cost (i.e., reduce the number of sea bass that are classified as salmon!)

Task of decision theory
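
To make this concrete, here is a rough Python sketch that scans candidate lightness thresholds and keeps the one with the lowest total cost. The sample values and the 2:1 cost ratio are invented for illustration:

# Hypothetical cost-sensitive threshold search on the lightness feature.
salmon   = [2.1, 2.8, 3.0, 3.5, 4.0]    # lightness of labelled salmon
sea_bass = [3.2, 4.1, 4.5, 5.0, 5.8]    # lightness of labelled sea bass

COST_BASS_AS_SALMON = 2.0    # selling sea bass as salmon is the costly error
COST_SALMON_AS_BASS = 1.0

def total_cost(threshold):
    # Decision rule: lightness below the threshold means "salmon".
    bass_as_salmon = sum(1 for x in sea_bass if x < threshold)
    salmon_as_bass = sum(1 for x in salmon if x >= threshold)
    return (COST_BASS_AS_SALMON * bass_as_salmon +
            COST_SALMON_AS_BASS * salmon_as_bass)

best = min(sorted(salmon + sea_bass), key=total_cost)
print(best, total_cost(best))    # the cheapest threshold and its cost

Because the bass-as-salmon error is weighted more heavily, the cheapest threshold lands at a smaller lightness value than plain error counting would pick, which is exactly the shift described above.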


• Adopt the lightness and add the width of the fish as a second feature:

Fish: xᵀ = [x₁, x₂], where x₁ = lightness and x₂ = width
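
With two features, each fish becomes a point in the plane, and one simple decision boundary is a line. A minimal hypothetical sketch (the weights are arbitrary; it assumes, for illustration only, that sea bass tend to be lighter and wider):

# Hypothetical linear decision rule on x = [lightness, width].
# The weights and bias are arbitrary illustrative values.
def classify(x, w=(0.9, 0.8), b=-6.0):
    score = w[0] * x[0] + w[1] * x[1] + b    # sign tells the side of the line
    return "sea bass" if score > 0 else "salmon"

print(classify([5.0, 4.0]))    # light and wide  -> sea bass
print(classify([2.5, 2.0]))    # dark and narrow -> salmon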


• We might add other features that are not correlated with the ones we already have. Take care not to reduce performance by adding such “noisy features”

• Ideally, the best decision boundary is the one that provides optimal performance, such as in the following figure:


• However, our satisfaction is premature, because the central aim of designing a classifier is to correctly classify novel input

Issue of generalization!



CLASSIFICATION ALGORITHMS
(Supervised Learning)

• Decision trees
• Kernel estimation & k-nearest neighbours (k-NN)
• Linear discriminant analysis (LDA)
• Quadratic discriminant analysis (QDA)
• Maximum entropy classifier (multinomial logistic regression)
• Naive Bayes classifier
• Artificial neural networks
• Support vector machines (SVM)


Decision Trees

“Splitting datasets one feature at a time”

The decision tree is one of the most commonly used classification techniques; recent surveys claim that it is the most commonly used technique.

Advantages: “Major focus on insights about the data.”

Decision tree–building algorithms use information theory to decide how to split the dataset.
Steps:
1. To build a decision tree, we first need to decide which feature to use to split the dataset.
2. To determine this, we try every feature and measure which split gives the best results.
3. After that, we split the dataset into subsets.
4. The subsets then traverse down the branches of the first decision node. If the data on a branch is all of the same class, it has been properly classified and that branch is done.
5. If the data is not all of the same class, we repeat the splitting process on that subset, deciding how to split it in the same way as for the original dataset, until we have classified all the data (see the sketch below).
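
The five steps above amount to a short recursion: choose the best feature, split, and recurse on each impure subset. A minimal hypothetical Python sketch; for brevity it scores splits with a crude purity measure rather than the information gain introduced next, and the toy fish dataset is invented:

# Hypothetical recursive splitting, following steps 1-5 above.
from collections import Counter

def purity(samples):
    """Fraction of samples carrying the majority label."""
    counts = Counter(s["label"] for s in samples)
    return max(counts.values()) / len(samples)

def split(samples, feature):
    """Group the samples by their value of the given feature."""
    subsets = {}
    for s in samples:
        subsets.setdefault(s[feature], []).append(s)
    return subsets

def build_tree(samples, features):
    labels = {s["label"] for s in samples}
    if len(labels) == 1 or not features:    # pure subset, or nothing left to try
        return Counter(s["label"] for s in samples).most_common(1)[0][0]
    # Step 2: try every feature, keep the one whose worst subset is purest.
    best = max(features,
               key=lambda f: min(purity(sub) for sub in split(samples, f).values()))
    rest = [f for f in features if f != best]
    # Steps 3-5: split on it and recurse down each branch.
    return {best: {value: build_tree(sub, rest)
                   for value, sub in split(samples, best).items()}}

data = [
    {"fins": "many", "mouth": "low",  "label": "sea bass"},
    {"fins": "many", "mouth": "high", "label": "sea bass"},
    {"fins": "few",  "mouth": "low",  "label": "salmon"},
    {"fins": "few",  "mouth": "high", "label": "salmon"},
]
print(build_tree(data, ["fins", "mouth"]))
# {'fins': {'many': 'sea bass', 'few': 'salmon'}}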
Information gain
• We choose to split our dataset in the way that makes our unorganized data most organized. One way to quantify this is to measure the information.
• Using information theory, we can measure the information before and after the split.
• The change in information before and after the split is known as the information gain.

Note:
The split with the highest information gain organizes the data best; the attribute with the highest information gain is chosen as the splitting attribute.

Information gain = entropy before the split − entropy after the split

What is entropy?
Entropy is defined as the expected value of the information.
(Here, it is measured on each attribute.)
To calculate the entropy, we need the expected value of the information over all possible values of our class. This is given by:

H = −Σᵢ p(xᵢ) log₂ p(xᵢ)
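
In Python, the formula translates directly (a minimal sketch; the example labels are invented):

# Shannon entropy of a list of class labels:
# H = sum over classes of p_i * log2(1/p_i), i.e. -sum p_i * log2(p_i).
from collections import Counter
from math import log2

def entropy(labels):
    n = len(labels)
    return sum((c / n) * log2(n / c) for c in Counter(labels).values())

print(entropy(["salmon", "salmon", "sea bass", "sea bass"]))   # 1.0 bit (50/50 mix)
print(entropy(["salmon", "salmon", "salmon", "salmon"]))       # 0.0 (pure, no surprise)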
Example: calculating information gain
• Next, we need to calculate the expected information gain for each attribute.
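
As a small stand-in worked example (the dataset below is invented), this sketch computes the gain of each attribute and shows why the one with the highest gain becomes the splitting attribute:

# Hypothetical worked example: information gain of each attribute.
from collections import Counter
from math import log2

def entropy(labels):
    n = len(labels)
    return sum((c / n) * log2(n / c) for c in Counter(labels).values())

def information_gain(rows, attribute):
    """Entropy before the split minus the weighted entropy after it."""
    before = entropy([r["label"] for r in rows])
    subsets = {}
    for r in rows:
        subsets.setdefault(r[attribute], []).append(r["label"])
    after = sum(len(sub) / len(rows) * entropy(sub) for sub in subsets.values())
    return before - after

rows = [
    {"lightness": "dark",  "width": "narrow", "label": "salmon"},
    {"lightness": "dark",  "width": "wide",   "label": "salmon"},
    {"lightness": "light", "width": "narrow", "label": "sea bass"},
    {"lightness": "light", "width": "wide",   "label": "sea bass"},
]
for attr in ("lightness", "width"):
    print(attr, information_gain(rows, attr))
# lightness 1.0  (splits the classes perfectly)
# width 0.0      (tells us nothing, so lightness is chosen for the split)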
CLUSTERING ALGORITHMS
(Unsupervised Learning)

• Hierarchical clustering
• K-means clustering
• KPCA (Kernel Principal Component Analysis)
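
Of these, k-means is the simplest to sketch: assign each point to its nearest centroid, recompute each centroid as the mean of its cluster, and repeat. A minimal pure-Python sketch on invented one-dimensional lightness values:

# Minimal k-means sketch on 1-D values (data invented for illustration).
def kmeans(points, k, iters=10):
    centroids = points[:k]                       # naive initialisation
    clusters = []
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:                         # assignment step
            nearest = min(range(k), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        centroids = [sum(c) / len(c) if c else centroids[i]   # update step
                     for i, c in enumerate(clusters)]
    return centroids, clusters

points = [2.1, 2.5, 2.8, 5.9, 6.2, 6.8]
centroids, clusters = kmeans(points, k=2)
print(centroids)    # two cluster centres, roughly 2.47 and 6.30
print(clusters)     # the points grouped around each centre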

Pattern Classification, by Ranjan Ganguli