TRAINING METHODS
Calvin Abonga
Training/Test phases
Training/Test data
• How do we know that we have collected an
adequately large and representative set of
examples for training/testing the system?
[Diagram: the collected examples partitioned into a training set and a test set]
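A minimal sketch (not from the original slides) of holding out part of the collected examples for testing, assuming scikit-learn; the feature matrix X and labels y below are synthetic placeholders.

```python
# Hedged sketch: hold out a test set with scikit-learn's train_test_split.
# X and y are synthetic stand-ins for real fish measurements and labels.
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 2))       # 1000 hypothetical feature vectors
y = rng.integers(0, 2, size=1000)    # binary labels: 0 = salmon, 1 = sea bass

# Hold out 30% for testing; stratify so both classes keep the same
# proportions in each split.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)
print(X_train.shape, X_test.shape)   # (700, 2) (300, 2)
```

Whether the split is adequately large and representative still has to be checked, e.g., with cross-validation or learning curves.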
Pre-processing Step
Example
(1) Image enhancement
(2) Separate touching or occluding fish
(3) Find the boundary of each fish
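One way these three steps might look in code, assuming OpenCV; the file name "fish.png" is a hypothetical placeholder.

```python
# Hedged sketch of the three pre-processing steps with OpenCV;
# 'fish.png' is a placeholder file name, not from the slides.
import cv2

img = cv2.imread("fish.png", cv2.IMREAD_GRAYSCALE)

# (1) Image enhancement: smooth out sensor noise.
enhanced = cv2.GaussianBlur(img, (5, 5), 0)

# (2) Separate fish from background with a global (Otsu) threshold;
# truly touching or occluding fish would need a stronger method.
_, mask = cv2.threshold(enhanced, 0, 255,
                        cv2.THRESH_BINARY + cv2.THRESH_OTSU)

# (3) Find the boundary of each fish as a contour.
contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                               cv2.CHAIN_APPROX_SIMPLE)
print(f"found {len(contours)} candidate fish boundaries")
```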
Sensors & Preprocessing
• Sensing:
• Use a sensor (camera or microphone) for data capture.
• PR performance depends on the bandwidth, resolution, sensitivity, and distortion of the sensor.
• Pre-processing:
• Removal of noise in data.
• Segmentation (i.e., isolation of patterns of interest from
background).
Feature Extraction
• Assume a fisherman told us that a sea bass is
generally longer than a salmon.
• We can use length as a feature and decide
between sea bass and salmon according to a
threshold on length.
• How should we choose the threshold?
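As a toy illustration, the length rule in code; the threshold value 18.0 is a hypothetical placeholder, and choosing it well is exactly the question above.

```python
# Minimal threshold rule on a single feature (length);
# l_star = 18.0 is a hypothetical value, not from the slides.
def classify_by_length(length, l_star=18.0):
    """Return 'sea bass' if the fish is longer than the threshold."""
    return "sea bass" if length > l_star else "salmon"

print(classify_by_length(22.0))  # sea bass
print(classify_by_length(15.0))  # salmon
```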
“Length” Histograms
[Histograms of fish length for the two classes, split at a decision threshold l*]
• Even though sea bass are longer than salmon on average, there are many fish for which this observation does not hold.
“Average Lightness” Histograms
• Consider a different feature such as “average
lightness”
[Histograms of average lightness for the two classes, split at a decision threshold x*]
• It seems easier to choose the threshold x*, but we still cannot make a perfect decision.
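One simple way to pick such a threshold is a brute-force search minimizing the empirical error on training data; a hedged sketch on synthetic Gaussian samples (the class means below are invented).

```python
# Hedged sketch: choose x* by minimizing empirical error on training
# data. The two Gaussian samples are synthetic, not real measurements.
import numpy as np

rng = np.random.default_rng(1)
salmon = rng.normal(3.0, 1.0, 500)     # hypothetical "average lightness"
sea_bass = rng.normal(6.0, 1.0, 500)

candidates = np.linspace(0.0, 10.0, 1001)
# Error at threshold t: salmon classified as sea bass (> t) plus
# sea bass classified as salmon (<= t), with equal class sizes.
errors = [np.mean(salmon > t) + np.mean(sea_bass <= t) for t in candidates]
x_star = candidates[int(np.argmin(errors))]
print(f"x* = {x_star:.2f}")  # close to the midpoint of the two means
```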
Multiple Features
• To improve recognition accuracy, we might have to
use more than one feature.
• Single features might not yield the best performance.
• Using combinations of features might yield better
performance.
[Scatter plot in the two-dimensional feature space: x1 = lightness, x2 = width]
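A hedged sketch of combining the two plotted features in a simple linear classifier, assuming scikit-learn; the training data is synthetic.

```python
# Sketch: linear classifier on two features (lightness, width).
# The (mean, spread) values below are invented for illustration.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(2)
salmon = rng.normal([3.0, 4.0], 0.8, size=(200, 2))    # (lightness, width)
sea_bass = rng.normal([6.0, 6.5], 0.8, size=(200, 2))
X = np.vstack([salmon, sea_bass])
y = np.array([0] * 200 + [1] * 200)  # 0 = salmon, 1 = sea bass

clf = LogisticRegression().fit(X, y)   # learns a linear decision boundary
print(clf.predict([[4.0, 5.0], [6.5, 7.0]]))  # e.g. [0 1]
```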
• How many features should we choose?
How Many Features?
• Does adding more features always improve
performance?
• It might be difficult and computationally expensive to
extract certain features.
• Correlated features might not improve performance (i.e., they add redundancy).
• “Curse” of dimensionality.
Curse of Dimensionality
• Adding too many features can, paradoxically, lead to a
worsening of performance.
• Divide each of the input features into a number of intervals, so
that the value of a feature can be specified approximately by
saying in which interval it lies.
• If each input feature is divided into M intervals, then the total number of cells is M^d (d: number of features).
• Since each cell must contain at least one training point, the amount of training data needed grows exponentially with d.
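The exponential growth of M^d is easy to check numerically:

```python
# Number of cells M**d for M = 10 intervals per feature.
M = 10
for d in (1, 2, 5, 10, 20):
    print(f"d = {d:2d}: at least {M**d:,} cells to populate")
# d = 20 already demands 10^20 cells, far beyond any realistic dataset.
```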
Missing Features
• Certain features might be missing (e.g., due to
occlusion).
• How should we train the classifier with missing features?
• How should the classifier make the best decision with missing features?
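One common workaround (among several) is to impute missing values before training; a hedged sketch assuming scikit-learn's SimpleImputer.

```python
# Sketch: fill missing feature values with per-feature means.
# The tiny matrix below is invented; NaN marks a missing measurement.
import numpy as np
from sklearn.impute import SimpleImputer

X = np.array([[20.0, 5.0],
              [np.nan, 4.5],    # length missing, e.g. due to occlusion
              [25.0, np.nan]])  # width missing

imputer = SimpleImputer(strategy="mean")
X_filled = imputer.fit_transform(X)
print(X_filled)  # NaNs replaced by the per-feature means
```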
“Quality” of Features
• How to choose a good set of features?
• Discriminative features
• Invariant features (e.g., invariant to geometric
transformations such as translation, rotation and scale)
• Are there ways to automatically learn which features are best?
Classification
• Partition the feature space into two regions by finding
the decision boundary that minimizes the error.
• How should we find the optimal decision boundary?
Complexity of Model
• We can get perfect classification performance on the
training data by choosing a more complex model.
• Complex models are tuned to the particular training samples, rather than to the characteristics of the true model.
[Figure: an overly complex decision boundary that perfectly separates the training samples (overfitting)]
How well can the model generalize to unknown samples?
Generalization
• Generalization is defined as the ability of a classifier to
produce correct results on novel patterns.
• How can we improve generalization performance?
• More training examples (i.e., better model estimates).
• Simpler models usually yield better performance.
[Figure: decision boundaries of a complex model vs. a simpler model]
Understanding model complexity:
function approximation
• Approximate a function from a set of samples
o The green curve is the true function
o Ten noisy sample points are shown by the blue circles
Understanding model complexity:
function approximation (cont’d)
Polynomial curve fitting: polynomials of various orders, shown as red curves, fitted to the set of 10 sample points.
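A minimal numerical version of this experiment, assuming NumPy; sin(2πx) stands in for the green curve.

```python
# Fit polynomials of increasing order to 10 noisy samples of a
# known function (sin here is a stand-in for the slide's green curve).
import numpy as np

rng = np.random.default_rng(3)
x = np.linspace(0.0, 1.0, 10)
t = np.sin(2 * np.pi * x) + rng.normal(0.0, 0.2, 10)  # noisy samples

for order in (1, 3, 9):
    coeffs = np.polyfit(x, t, order)
    rms = np.sqrt(np.mean((np.polyval(coeffs, x) - t) ** 2))
    print(f"order {order}: training RMS error = {rms:.4f}")
# The 9th-order fit drives training error to ~0 by passing through
# every point, yet oscillates wildly between them (overfitting).
```

Rerunning the same loop with 15 or 100 samples, as on the next slide, shows the 9th-order fit becoming much better behaved.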
Understanding model complexity:
function approximation (cont’d)
Polynomial curve fitting: 9th-order polynomials fitted to 15 and 100 sample points.
Improve Classification Performance
through Post-processing
• Consider the problem of character recognition
• Exploit context to improve performance.
How m_ch info_mation are y_u mi_sing?
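A toy sketch of one way such context could be exploited: match each gapped word against a lexicon. The tiny lexicon and the recover function are hypothetical, purely for illustration.

```python
# Toy post-processing: recover a character the recognizer missed by
# matching the gapped word against a (hypothetical) small lexicon.
import re

LEXICON = {"how", "much", "information", "are", "you", "missing"}

def recover(word_with_gap):
    """Replace '_' with whatever makes the word a lexicon entry."""
    pattern = re.compile("^" + word_with_gap.replace("_", ".") + "$")
    matches = [w for w in LEXICON if pattern.match(w)]
    return matches[0] if len(matches) == 1 else word_with_gap

print(recover("m_ch"), recover("info_mation"), recover("mi_sing"))
# much information missing
```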
Improve Classification Performance
through Ensembles of Classifiers
• Performance can be improved using a "pool" of classifiers.
• How should we build and combine different classifiers?
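A hedged sketch of one way to build such a pool, assuming scikit-learn's VotingClassifier; the dataset is synthetic.

```python
# Sketch: a "pool" of three different classifiers combined by
# majority vote; make_classification generates synthetic data.
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, n_features=4, random_state=0)
pool = VotingClassifier(
    estimators=[("lr", LogisticRegression()),
                ("knn", KNeighborsClassifier()),
                ("tree", DecisionTreeClassifier(random_state=0))],
    voting="hard")  # each classifier gets one vote
pool.fit(X, y)
print(f"training accuracy: {pool.score(X, y):.3f}")
```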
Cost of Misclassifications
• Fish classification: two possible classification
errors:
(1) Deciding the fish was a sea bass when it was a
salmon.
(2) Deciding the fish was a salmon when it was a sea
bass.
• Are both errors equally important?
Cost of Misclassifications
(cont’d)
• Suppose that:
• Customers who buy salmon will object vigorously if
they see sea bass in their cans.
• Customers who buy sea bass will not be unhappy if
they occasionally see some expensive salmon in their
cans.
• How does this knowledge affect our decision?
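It suggests shifting the decision toward the cheaper kind of error; a sketch of a minimum-expected-cost rule, with invented cost values that reflect the story above.

```python
# Sketch: decide the class that minimizes expected cost rather than
# the most probable class. The cost values are hypothetical.
COST = {  # COST[decision][true_class]
    "salmon":   {"salmon": 0.0, "sea_bass": 10.0},  # bass in a salmon can: costly
    "sea_bass": {"salmon": 1.0, "sea_bass": 0.0},   # salmon in a bass can: mild
}

def decide(p_salmon):
    """p_salmon: the classifier's posterior probability of 'salmon'."""
    expected = {
        d: COST[d]["salmon"] * p_salmon + COST[d]["sea_bass"] * (1 - p_salmon)
        for d in COST
    }
    return min(expected, key=expected.get)

print(decide(0.60))  # 'sea_bass': even at 60% salmon, the risk of bass
                     # in a salmon can outweighs the reverse error
```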
Computational Complexity
• How does an algorithm scale with the number of:
• features
• patterns
• categories
• Need to consider tradeoffs between computational
complexity and performance.
Would it be possible to build a
“general purpose” PR system?
• It would be very difficult to design a system that is
capable of performing a variety of classification
tasks.
• Different problems require different features.
• Different features might yield different solutions.
• Different tradeoffs exist for different problems.