
05 Ensemble Learning

Ensemble learning involves combining multiple simple classifiers to obtain an ensemble classifier with better performance. There are two key components to ensemble learning: 1) Creating a set of diverse classifiers using techniques like bagging, boosting, and AdaBoost that generate different training datasets or parameters, and 2) Combining the predictions of individual classifiers, with techniques like majority voting that strengthen correct predictions and weaken incorrect ones. AdaBoost in particular iteratively trains classifiers on weighted versions of the training data, where misclassified examples receive higher weight, to focus newer classifiers on correcting mistakes.


Ensemble learning

Ensembles of simple classifiers

Ensembles: Boosting, Weak and Strong Learning, AdaBoost
Ensemble learning
◼ ENSEMBLE
❑ group, set (of classifiers)

◼ ENSEMBLE LEARNING
❑ learning a set of classifiers

◼ also called ENSEMBLE BASED SYSTEM

AIDP, M.Oravec, ÚIM FEI STU


Main literature
◼ R. Polikar: Ensemble Based Systems in Decision Making
❑ IEEE Circuits and Systems Magazine, vol.6, no.3, pp. 21-45, 2006

◼ http://users.rowan.edu/~polikar/RESEARCH/PUBLICATIONS/csm06.pdf



Ensembles (in Slovak: súbory)
◼ „ensemble based systems“
❑ we often seek a second opinion before making a decision,
sometimes a third, and sometimes many more
❑ we weigh the individual opinions and combine them through
some thought process to reach a final decision (the process of
consulting “several experts” before making a final decision)

❑ other names:
◼ multiple classifier systems
◼ committee of classifiers
◼ mixture of experts



Principle
◼ one of the possibilities:

The training data is split into Data 1, Data 2, …, Data m; each subset is given to Learner 1, Learner 2, …, Learner m, producing Model 1, Model 2, …, Model m; the combination of the models yields the final model.
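
A minimal sketch of this principle; the synthetic dataset, the decision-tree learners and the plain majority-vote combiner are illustrative choices, not prescribed by the slides:

# Ensemble principle: split the training data, train one simple learner per
# part, then combine the resulting models by majority voting.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=600, n_features=10, random_state=0)
rng = np.random.default_rng(0)

m = 5                                                     # number of learners
models = []
for chunk in np.array_split(rng.permutation(len(X)), m):  # Data 1 ... Data m
    learner = DecisionTreeClassifier(max_depth=3, random_state=0)
    models.append(learner.fit(X[chunk], y[chunk]))        # Learner i -> Model i

# Combination of the models: majority vote over the m predictions
votes = np.stack([model.predict(X) for model in models])  # shape (m, n_samples)
final = np.apply_along_axis(lambda v: np.bincount(v).argmax(), 0, votes)
print("ensemble training accuracy:", (final == y).mean())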



Why to use ensemble based systems
◼ statistical reasons
❑ good classification of training data does not mean good behavior for test data
(we also know from neural networks)
❑ a combination (averaging) of several classifiers will help

◼ large amounts of data


❑ dividing the data into smaller parts, training the classifiers, then combining
their outputs

◼ small amounts of data


❑ „resampling" - creation of several randomly overlapping subsets of data,
training of classifiers, then a combination of their outputs



Why to use ensemble based systems
◼ too demanding a task for one classifier
❑ divide and rule (divide et impera) - dividing space into smaller (and less
demanding) parts, classifiers for these simpler parts, then their combination



Why to use ensemble based systems

◼ data fusion
◼ several data sets from different sources (heterogeneous data)
❑ data from each modality to the appropriate classifier, then a
combination



Generating individual classifiers
◼ generally two types of combination

1. classifier selection
◼ each classifier is an expert for a certain subspace
◼ combination:
❑ the classifier closest (based on the metric) to the input vector has the
highest weight
❑ several such local experts will be allowed to vote

2. classifier fusion
◼ the whole set of classifiers learns the whole space
◼ combination:
❑ the combination of individual (WEAK) classifiers creates one (STRONG)
expert with the best performance
❑ e.g. bagging, boosting, ...



Diversity

◼ diversity
❑ strategy for ensemble based systems:
◼ create many classifiers, combine their outputs
◼ the overall performance will be better than for one classifier
❑ individual classifiers must make errors on different examples (each
classifier should be unique)

◼ ensemble of diverse classifiers:
❑ classifiers whose decision boundaries are different



How to achieve classifier diversity?
1) using different training datasets to train individual classifiers
◼ a) resampling techniques – bootstrapping, bagging (training data subsets are
drawn randomly with replacement)

How to achieve classifier diversity?
◼ b) k-fold data split (in Slovak: k-násobné delenie dát)
❑ k different overlapping data sets; the subsets are drawn without replacement
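
A small sketch contrasting the two subset-generation schemes above; the sizes and names are illustrative only:

# Diverse training subsets: bootstrap samples are drawn WITH replacement,
# k-fold style subsets are built WITHOUT replacement.
import numpy as np

rng = np.random.default_rng(1)
n = 12                                    # pretend we have 12 training examples
indices = np.arange(n)

# a) bootstrapping / bagging: each subset has n indices, duplicates allowed
bootstrap_subsets = [rng.choice(indices, size=n, replace=True) for _ in range(3)]

# b) k-fold data split: k overlapping subsets, each one leaving a different
#    fold out; within a subset every index appears at most once
k = 3
folds = np.array_split(rng.permutation(indices), k)
kfold_subsets = [np.concatenate(folds[:i] + folds[i + 1:]) for i in range(k)]

print(bootstrap_subsets[0])               # duplicates are likely here
print(kfold_subsets[0])                   # no duplicates here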
How to achieve classifier diversity?
2) using different training parameters for individual classifiers
❑ e.g. set of MLPs, different initializations, different configurations, different
required errors ...
❑ the instability of individual MLPs can be exploited to obtain diversity

3) a combination of completely different types of classifiers


❑ e.g. MLP + SVM + decision trees + nearest neighbor classifiers

4) by using different features

❑ so-called random subspace method

◼ Diversity is most often achieved through 1)



Two key components of ensemble based systems

1. strategy for creating a set of classifiers with maximum diversity
◼ bagging, boosting, AdaBoost, ...

2. strategy for combining classifier outputs
◼ the right decisions are strengthened, the wrong ones are discarded



1) Strategy for creating a set of classifiers with maximum diversity

bagging, boosting, AdaBoost, ...



Weak learner, base classifier

◼ weak learner
❑ classifier which is to be learnt

◼ Base Classifier (BC)


❑ a simple classifier that is able to classify any input sample better
than randomly (probability of success greater than 50%)



Bagging
◼ bagging = bootstrap aggregating
◼ bootstrapped replicas of the training
data
◼ combination of outputs - by majority
voting
◼ suitable for small datasets
◼ large portions of the samples (75% to
100%) are drawn into each subset
◼ neural networks and decision trees are
suitable classifiers
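
A minimal bagging sketch along these lines; the decision-tree base classifier, the 80% replica size and the helper name bagging_predict are my choices, not fixed by the slide:

# Bagging = bootstrap aggregating: train each base classifier on a bootstrapped
# replica of the training data, combine the outputs by majority voting.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=400, n_features=10, random_state=0)
rng = np.random.default_rng(0)

n_classifiers = 11
replica_size = int(0.8 * len(X))          # "large portions of the samples"
ensemble = []
for _ in range(n_classifiers):
    idx = rng.choice(len(X), size=replica_size, replace=True)  # bootstrap replica
    tree = DecisionTreeClassifier(max_depth=3, random_state=0)
    ensemble.append(tree.fit(X[idx], y[idx]))

def bagging_predict(models, X_new):
    """Majority vote over the predictions of all base classifiers."""
    votes = np.stack([m.predict(X_new) for m in models])
    return np.apply_along_axis(lambda v: np.bincount(v).argmax(), 0, votes)

print("training accuracy:", (bagging_predict(ensemble, X) == y).mean())

scikit-learn's BaggingClassifier implements essentially this scheme.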



Boosting
◼ boosting - 3 weak classifiers:
❑ C1 - training on a random subset
❑ C2 - training set: ½ correctly
classified examples from C1 and
½ misclassified
❑ C3 - training on examples for
which C1 and C2 disagree

◼ C1, C2 and C3 are combined by majority voting into a strong classifier
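
A rough sketch of this three-classifier scheme, assuming a binary problem, decision stumps as the weak classifiers and arbitrary subset sizes (real implementations choose the set sizes more carefully):

# Boosting with three weak classifiers, as described above: C1 on a random
# subset, C2 on a half-correct / half-misclassified mix w.r.t. C1,
# C3 on the examples where C1 and C2 disagree.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=10, random_state=0)
rng = np.random.default_rng(0)

# C1: trained on a random subset of the data
idx1 = rng.choice(len(X), size=300, replace=False)
C1 = DecisionTreeClassifier(max_depth=1).fit(X[idx1], y[idx1])

# C2: half correctly classified and half misclassified examples (w.r.t. C1)
pred1 = C1.predict(X)
correct, wrong = np.where(pred1 == y)[0], np.where(pred1 != y)[0]
half = min(len(correct), len(wrong), 150)
idx2 = np.concatenate([rng.choice(correct, half, replace=False),
                       rng.choice(wrong, half, replace=False)])
C2 = DecisionTreeClassifier(max_depth=1).fit(X[idx2], y[idx2])

# C3: examples on which C1 and C2 disagree
idx3 = np.where(C1.predict(X) != C2.predict(X))[0]
if len(idx3) == 0:                         # degenerate case: C1 and C2 always agree
    idx3 = idx1
C3 = DecisionTreeClassifier(max_depth=1).fit(X[idx3], y[idx3])

# strong classifier: majority vote of C1, C2, C3
votes = np.stack([c.predict(X) for c in (C1, C2, C3)])
strong = np.apply_along_axis(lambda v: np.bincount(v).argmax(), 0, votes)
print("majority-vote training accuracy:", (strong == y).mean())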



Adaboost.M1

◼ AdaBoost - more versions


❑ AdaBoost.M1 (multiclass)
❑ AdaBoost.R (regression)

◼ training of a weak classifier on examples selected from an iteratively
updated distribution over the training data
❑ the update ensures that examples that were misclassified by the previous
classifier are more likely to be included in the training data of the new
classifier

◼ combination by weighted majority vote



Adaboost.M1
• weight distribution Dt(i) for training
samples xi, i = 1, . . . , N from which training
data subsets St are chosen for each
consecutive classifier (hypothesis) ht

• during initialization the distribution is uniform, so all examples have
the same chance to be selected into the first training set
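
In the standard AdaBoost.M1 notation (the formula itself is not reproduced on the slide), the uniform initialization reads:

D_1(i) = \frac{1}{N}, \qquad i = 1, \dots, N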



Adaboost.M1
• the training error εt of the classifier ht is also weighted by this
distribution

• the error εt must be less than ½

• computation of the normalized error βt: for 0 < εt < ½ we have 0 < βt < 1
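
The slide states the properties but not the formulas; in the textbook AdaBoost.M1 form they are:

\varepsilon_t = \sum_{i:\, h_t(x_i) \neq y_i} D_t(i), \qquad \beta_t = \frac{\varepsilon_t}{1 - \varepsilon_t}

which indeed gives 0 < \beta_t < 1 whenever 0 < \varepsilon_t < 1/2.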



Adaboost.M1

distribution update rule:

• the weights of examples correctly classified by the current hypothesis
are reduced by the factor βt

• the weights of misclassified examples do not change

• thus, after normalization, the weights of misclassified examples increase
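
Written out (textbook AdaBoost.M1 form; the normalization constant Z_t is not named explicitly on the slide):

D_{t+1}(i) = \frac{D_t(i)}{Z_t} \cdot \begin{cases} \beta_t, & \text{if } h_t(x_i) = y_i \\ 1, & \text{otherwise} \end{cases}

where Z_t is chosen so that D_{t+1} sums to one.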



Adaboost.M1
• combination by weighted majority voting (as opposed to bagging and boosting)

• classifiers that classify well during training are rewarded with higher
voting weights than the others

• 1/βt is a measure of performance: for a small error it is large - sometimes
too large, so to avoid possible stability problems the logarithm log(1/βt)
is used
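
The corresponding decision rule, in the usual AdaBoost.M1 form (not written out on the slide):

H_{\text{final}}(x) = \arg\max_{\omega_j} \sum_{t:\, h_t(x) = \omega_j} \log\frac{1}{\beta_t}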



Adaboost.M1 - block diagram
◼ algorithm is sequential:
◼ classifier CK is created before CK+1, i.e. βK and DK are available
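
A compact AdaBoost.M1 training sketch following the quantities above (Dt, εt, βt, weighted majority voting); using sample weights instead of literally resampling St from Dt, and the decision-stump base classifier, are simplifications on my part:

# AdaBoost.M1: iteratively reweight the training examples, train a weak
# classifier against each distribution D_t, combine by weighted majority voting.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
N, T = len(X), 20
D = np.full(N, 1.0 / N)                  # D_1(i) = 1/N (uniform initialization)
hypotheses, betas = [], []

for t in range(T):
    h = DecisionTreeClassifier(max_depth=1, random_state=t)
    h.fit(X, y, sample_weight=D)         # weak classifier trained w.r.t. D_t
    miss = h.predict(X) != y
    eps = D[miss].sum()                  # weighted training error eps_t
    if eps == 0 or eps >= 0.5:           # degenerate: already perfect or too weak
        break
    beta = eps / (1.0 - eps)             # normalized error, 0 < beta_t < 1
    D[~miss] *= beta                     # reduce weights of well-classified examples
    D /= D.sum()                         # normalize -> misclassified weights grow
    hypotheses.append(h)
    betas.append(beta)

def adaboost_predict(X_new):
    """Weighted majority vote with voting weights log(1/beta_t)."""
    classes = np.unique(y)
    scores = np.zeros((len(X_new), len(classes)))
    for h, beta in zip(hypotheses, betas):
        pred = h.predict(X_new)
        for k, c in enumerate(classes):
            scores[pred == c, k] += np.log(1.0 / beta)
    return classes[scores.argmax(axis=1)]

print("AdaBoost.M1 training accuracy:", (adaboost_predict(X) == y).mean())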



Adaboost.M1
◼ the training error E (ensemble error) is bounded above

◼ since εt <1/2, E is guaranteed to decrease with each new classifier

◼ resistance against overtraining!


◼ relation to margin theory
❑ we studied margins with SVM (maximizing the margin between classes)
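
The upper bound mentioned above is, in the standard AdaBoost analysis (the slide does not reproduce it, so this is the textbook statement):

E \le \prod_{t=1}^{T} 2\sqrt{\varepsilon_t (1 - \varepsilon_t)}

Each factor is smaller than 1 whenever \varepsilon_t < 1/2, so the bound on the ensemble training error shrinks with every added classifier.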



! AdaBoost and SVM
◼ support vectors are said to define the margin that separates the classes
◼ both SVM and AdaBoost maximize margin
◼ AdaBoost also boosts the margins



Good material about Adaboost:
◼ http://www.comp.leeds.ac.uk/scsjso/adaboost_talk.pdf



2) Strategy for combining classifier outputs



Strategy for combining classifier outputs

◼ 2 taxonomies

1) trainable vs. non-trainable combination rules,


◼ trainable (dynamic)
❑ parameters of combiner (weights) are determined by a separate training algorithm (e.g. EM
– expectation maximization)
◼ nontrainable
❑ e.g. weighted majority voting

2) combination rules that apply either to class labels or to class-specific
continuous outputs

◼ application to class labels ωj, j = 1, . . . , C
◼ application directly to continuous outputs of individual classifiers
❑ e.g. to continuous outputs of an MLP or RBF network



Strategy for combining classifier outputs

(2) combination rules that apply either to class labels or to class-specific
continuous outputs

◼ combining class labels


❑ Majority Voting
❑ Weighted Majority Voting
❑ Behavior Knowledge Space (BKS)
❑ Borda Count

◼ combining continuous outputs


❑ Algebraic combiners
▪ mean rule, weighted average, trimmed mean, minimum/maximum/median rule,
product rule, generalized mean
❑ Decision Templates
❑ Dempster-Shafer Based Combination
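
A small sketch of two of these combiners; the three classifiers, four samples and soft outputs below are made-up numbers for illustration:

# Combining classifier outputs: majority voting on class labels vs. the mean
# rule on class-specific continuous outputs (e.g. class probabilities).
import numpy as np

# predicted labels of 3 classifiers for 4 samples (classes 0, 1, 2)
labels = np.array([[0, 1, 2, 1],
                   [0, 2, 2, 1],
                   [1, 1, 2, 0]])
majority = np.apply_along_axis(lambda v: np.bincount(v, minlength=3).argmax(), 0, labels)
print("majority vote:", majority)                   # -> [0 1 2 1]

# continuous (soft) outputs, shape (n_classifiers, n_samples, n_classes)
soft = np.array([[[0.7, 0.2, 0.1], [0.1, 0.6, 0.3]],
                 [[0.5, 0.3, 0.2], [0.2, 0.5, 0.3]],
                 [[0.4, 0.4, 0.2], [0.3, 0.3, 0.4]]])
mean_rule = soft.mean(axis=0).argmax(axis=1)        # average the supports, pick the max
print("mean rule:", mean_rule)                      # -> [0 1]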



Which ensemble generation or
combination rule is the best?

◼ there is no best classifier for all classification problems


❑ the best algorithm depends on the data structure and a priori knowledge

◼ similar applies to combination rules



! dropout ensembles

◼ close relationship between dropout in deep NNs and ensemble learning:
❑ switching off different neurons yields a set of different NN architectures!

https://cv-tricks.com/cnn/understand-resnet-alexnet-vgg-inception/
Illustrations
◼ The graph shows 2 classes of 100 objects. Banana shaped classes were
used to generate data. 40% of the data was used for training, the rest
was used for testing.
◼ bpxnc classifier (back-propagation), MLP classifier.

Left: Banana set, MLP classifier with 3 neurons in the hidden layer. Right: Banana set, MLP classifiers with 5, 15 and 50 neurons in the hidden layer.



◼ Result of an ensemble of classifiers

Ensembles of classifiers with mean, voting and maximum combiners.



◼ Separate classifiers and the result of an ensemble of classifiers

Left: various classifiers (linear, quadratic, Parzen and back-propagation). Right: ensemble of these classifiers with mean, voting, maximum and product combiners.
