SVM and Multi-class Classifiers in CV

The document outlines a lecture on Advanced Computer Vision, focusing on Support Vector Machines (SVM), kernels, and multi-class linear classifiers. It discusses the statistical learning approach for image classification, including feature extraction, training, and testing processes. Additionally, it covers concepts like overfitting, underfitting, and the comparison between neural networks and linear classifiers.


EECE 7370 Advanced Computer Vision

Lecture 3: SVM, Kernels, Multi-class Linear Classifiers

Next Class: Loss Functions, Regularization, Optimization

Robust Systems Lab, Northeastern University

Image Classification

Statistical Learning Approach

f([apple image]) = “apple”
f([tomato image]) = “tomato”
f([cow image]) = “cow”

We want a “prediction” function that is applied to a “feature representation” of the image to get the “label” of the image.
Slide adapted from S. Lazebnik

Statistical Learning Approach

Feature extraction maps the image to x; the prediction function maps x to the label y:  y = f(x)

Statistical Learning Approach

Training: Given a set of labeled training samples
{(x_1, y_1), (x_2, y_2), ..., (x_N, y_N)},
estimate the prediction function f that minimizes the prediction error on the training set.

Statistical Learning Approach

Testing: Apply the prediction function to a never-before-seen sample:  y = f(x)

Steps

Training: training images + training labels → image features → training → learned model
Testing: test image → image features → learned model → prediction
Slide credit: D. Hoiem
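The pipeline above can be sketched in Python; the color-statistics features and the `NearestMeanClassifier` are illustrative stand-ins for this sketch, not methods from the lecture:

```python
import numpy as np

def extract_features(image):
    # Toy feature: per-channel mean and standard deviation of the pixels.
    img = image.astype(np.float64)
    return np.concatenate([img.mean(axis=(0, 1)), img.std(axis=(0, 1))])

class NearestMeanClassifier:
    """Tiny stand-in for a learned model: one mean feature vector per class."""
    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.means_ = np.stack([X[y == c].mean(axis=0) for c in self.classes_])
        return self

    def predict(self, X):
        # Distance of every sample to every class mean; pick the closest class.
        d = np.linalg.norm(X[:, None, :] - self.means_[None, :, :], axis=2)
        return self.classes_[d.argmin(axis=1)]

# Training: training images + labels -> image features -> learned model
rng = np.random.default_rng(0)
train_images = [rng.random((8, 8, 3)) for _ in range(20)]
train_labels = np.array([i % 2 for i in range(20)])
X_train = np.stack([extract_features(im) for im in train_images])
model = NearestMeanClassifier().fit(X_train, train_labels)

# Testing: test image -> image features -> learned model -> prediction
test_image = rng.random((8, 8, 3))
pred = model.predict(extract_features(test_image)[None, :])
```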

“Shallow Learning”

Handcrafted features
Off-the-shelf trainable classifier

Features: raw pixels



Features: raw pixels; Classifier: Nearest Neighbor (NN)

NN Distance Metrics
L1 (Manhattan) distance:  d1(I1, I2) = Σ_p |I1(p) − I2(p)|
L2 (Euclidean) distance:  d2(I1, I2) = √( Σ_p (I1(p) − I2(p))² )
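A minimal NumPy sketch of both metrics, summing over all pixels p:

```python
import numpy as np

def l1_distance(I1, I2):
    # d1(I1, I2) = sum_p |I1(p) - I2(p)|
    return np.abs(I1.astype(np.float64) - I2.astype(np.float64)).sum()

def l2_distance(I1, I2):
    # d2(I1, I2) = sqrt( sum_p (I1(p) - I2(p))^2 )
    diff = I1.astype(np.float64) - I2.astype(np.float64)
    return np.sqrt((diff ** 2).sum())

I1 = np.array([[0.0, 1.0], [2.0, 3.0]])
I2 = np.array([[1.0, 1.0], [2.0, 0.0]])
print(l1_distance(I1, I2))  # 4.0  (|0-1| + 0 + 0 + |3-0|)
print(l2_distance(I1, I2))  # sqrt(1 + 9) = sqrt(10) ≈ 3.162
```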

Comparing Images

L1 distance: d1(I1, I2) = Σ_p |I1(p) − I2(p)|
NN: Visualization

CIFAR example

NN: Visualization

NN decision boundaries can be rough; isolated points create “islands”.

K-NN: Visualization

Use a majority vote over the K nearest neighbors.
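A minimal sketch of K-NN prediction with an L1 metric and a majority vote; the tiny dataset is illustrative:

```python
import numpy as np

def knn_predict(X_train, y_train, x, k=3):
    dists = np.abs(X_train - x).sum(axis=1)   # L1 distance to every training point
    nearest = np.argsort(dists)[:k]           # indices of the K closest samples
    labels, counts = np.unique(y_train[nearest], return_counts=True)
    return labels[counts.argmax()]            # majority vote over the K neighbors

X_train = np.array([[0.0, 0.0], [0.1, 0.2], [5.0, 5.0], [5.2, 4.9]])
y_train = np.array([0, 0, 1, 1])
print(knn_predict(X_train, y_train, np.array([0.2, 0.1]), k=3))  # 0
```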

Hyperparameters

Goal: obtain a classifier with good generalization, i.e. good performance on never-before-seen data.

• Learn parameters (the model) on the training set
• Tune hyperparameters (implementation choices) on a held-out validation set
• Evaluate performance on the test set (not seen in the previous steps)
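The three-way protocol above can be sketched as follows; the synthetic data, the candidate k values, and the L1-based k-NN scorer are all illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
# Two synthetic classes, 50 points each
X = rng.normal(size=(100, 2)) + np.repeat([[0, 0], [3, 3]], 50, axis=0)
y = np.repeat([0, 1], 50)

# Split indices into train / validation / test
idx = rng.permutation(100)
train, val, test = idx[:60], idx[60:80], idx[80:]

def knn_accuracy(k, fit_idx, eval_idx):
    correct = 0
    for i in eval_idx:
        d = np.abs(X[fit_idx] - X[i]).sum(axis=1)      # L1 distances
        votes = y[fit_idx][np.argsort(d)[:k]]
        correct += (np.bincount(votes).argmax() == y[i])
    return correct / len(eval_idx)

# Tune the hyperparameter k on the validation set only
best_k = max([1, 3, 5, 7], key=lambda k: knn_accuracy(k, train, val))
# Touch the test set exactly once, at the very end
test_acc = knn_accuracy(best_k, train, test)
```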

No cheating!

Why do we need a validation set?

Underfitting and Overfitting

Underfitting: training and test error are both high.
Overfitting: training error is low and test error is high.
k-NN

In practice, k-NN is not used for images because:
• It is very slow at test time
• Distance metrics on raw pixels are not informative
• The curse of dimensionality

Linear Classifiers

Find a LINEAR FUNCTION to separate the classes.
Linear Classifiers: Geometry

The separating line l has unit normal w/∥w∥. Points x on l satisfy
   x · w/∥w∥ − ρ = 0,   i.e.   x · w/∥w∥ = ρ,
where ρ is the distance of l from the origin.

A point P = x_p on the positive side at distance d from l lies on the parallel line
   x_p · w/∥w∥ − (ρ + d) = 0,   so   d = x_p · w/∥w∥ − ρ.

A point N = x_n on the negative side at distance d from l lies on
   x_n · w/∥w∥ − (ρ − d) = 0,   so   −d = x_n · w/∥w∥ − ρ.
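The signed-distance formula above in code; the example normal and ρ are made-up values:

```python
import numpy as np

def signed_distance(x, w, rho):
    # d = x . w/||w|| - rho ; positive on the side the normal points to
    return x @ (w / np.linalg.norm(w)) - rho

w = np.array([3.0, 4.0])   # normal vector (||w|| = 5)
rho = 2.0                  # distance of the hyperplane from the origin
print(signed_distance(np.array([3.0, 4.0]), w, rho))   # 3.0  (positive side)
print(signed_distance(np.array([0.0, 0.0]), w, rho))   # -2.0 (origin, negative side)
```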
Linear Classifiers

Find a LINEAR FUNCTION to separate the classes:
   f(x) = sgn(w^T x + b)
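A sketch of this decision rule in NumPy; the weights here are hand-picked for illustration, not trained:

```python
import numpy as np

def linear_classify(x, w, b):
    # f(x) = sgn(w^T x + b): +1 or -1 (0 only exactly on the boundary)
    return np.sign(w @ x + b)

w = np.array([1.0, -2.0])
b = 0.5
print(linear_classify(np.array([2.0, 0.0]), w, b))   # 1.0  (2 + 0.5 > 0)
print(linear_classify(np.array([0.0, 2.0]), w, b))   # -1.0 (-4 + 0.5 < 0)
```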
NN vs. Linear Classifiers

NN Pros:
• Simple to implement
• Decision boundaries not necessarily linear
• Works for any number of classes
• Nonparametric method
NN Cons:
• Needs a good distance function
• Slow at test time

Linear Pros:
• Low-dimensional parametric representation
• Very fast at test time
Linear Cons:
• Works for two classes
• How to train the linear function?
• What if the data is not linearly separable?
Linear Classifiers

• If the data is linearly separable, there might be more than one separating hyperplane.
• This is an “extra degree of freedom”. What can we do with it?
Linear Support Vector Machine (SVM)

Labeled training points: y_1 = +1, y_i = +1, y_2 = −1, y_N = −1

Linear SVM

1. Maximize the margin 2/∥w∥
2. Correctly classify all training data:
   Positive samples:  w^T x_i + b ≥ +1
   Negative samples:  w^T x_i + b ≤ −1
   Equivalently:  y_i (w^T x_i + b) ≥ 1

Quadratic optimization problem:
   min_{w,b} ½∥w∥²   subject to   y_i (w^T x_i + b) ≥ 1

C. Burges, A Tutorial on Support Vector Machines for Pattern Recognition, Data Mining and Knowledge Discovery, 1998

Linear SVM

Linearly separable data:
   min_{w,b} ½∥w∥²   subject to   y_i (w^T x_i + b) ≥ 1

Non-linearly separable data (soft margin):
   min_{w,b} ½∥w∥² + C Σ_{i=1}^{n} max(0, 1 − y_i (w^T x_i + b))

Linear SVM with Hinge Loss

   min_{w,b} ½∥w∥² + C Σ_{i=1}^{n} max(0, 1 − y_i (w^T x_i + b))

The hinge term is zero exactly when y_i (w^T x_i + b) ≥ 1, i.e. when the sample is on the correct side of the margin.

Demo: http://cs.stanford.edu/people/karpathy/svmjs/demo
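A small subgradient-descent sketch of this soft-margin objective; the toy data, C, learning rate, and epoch count are illustrative choices, and in practice one would use an off-the-shelf QP or SMO solver instead:

```python
import numpy as np

def train_linear_svm(X, y, C=1.0, lr=0.01, epochs=200):
    # Minimize 1/2 ||w||^2 + C * sum_i max(0, 1 - y_i (w^T x_i + b))
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        margins = y * (X @ w + b)
        active = margins < 1          # samples inside the margin or misclassified
        grad_w = w - C * (y[active][:, None] * X[active]).sum(axis=0)
        grad_b = -C * y[active].sum()
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

X = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
w, b = train_linear_svm(X, y)
print(np.sign(X @ w + b))   # recovers the training labels on this separable toy set
```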

Non-Linear SVMs

If the data is not linearly separable, map the original input to a higher dimensional space where it is separable.

Kernel Trick

Instead of explicitly computing the lifting transformation φ, we define a kernel function:
   K(x_i, x_j) = φ(x_i) · φ(x_j)

Polynomial Kernel

Example: a 3D input lifted to a 9D feature space.

Kernel Trick

Linear SVM decision function:
   w^T x + b = Σ_i α_i y_i x_i^T x + b

Kernel SVM decision function:
   Σ_i α_i y_i φ(x_i) · φ(x) + b = Σ_i α_i y_i K(x_i, x) + b

This gives a nonlinear decision boundary in the original feature space.

Polynomial Kernel

   K(x, y) = (c + x^T y)^d

Gaussian (Radial Basis Function) Kernel

   K(x, y) = exp( −∥x − y∥² / (2σ²) )

K(x, y) decays from 1 toward 0 as the distance ∥x − y∥ grows.
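Both kernels as plain functions; the values of c, d, and σ are illustrative hyperparameter choices:

```python
import numpy as np

def polynomial_kernel(x, y, c=1.0, d=2):
    # K(x, y) = (c + x^T y)^d
    return (c + x @ y) ** d

def rbf_kernel(x, y, sigma=1.0):
    # K(x, y) = exp(-||x - y||^2 / (2 sigma^2))
    return np.exp(-np.sum((x - y) ** 2) / (2 * sigma ** 2))

x = np.array([1.0, 0.0])
y = np.array([0.0, 1.0])
print(polynomial_kernel(x, y))   # (1 + 0)^2 = 1.0
print(rbf_kernel(x, x))          # exp(0) = 1.0 (identical points)
print(rbf_kernel(x, y))          # exp(-2/2) = exp(-1) ≈ 0.368
```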

SVMs Pros and Cons

Pros:
• The kernel-based framework is very powerful and flexible
• Training is a convex optimization problem
• Amenable to theoretical analysis
• Works very well in practice, even with small training datasets

Cons:
• No direct multi-class SVM
• Computation and memory cost

Linear Classifiers as Building Blocks

Parametric “Prediction” Function

Parametric Linear “Prediction” Function

   f(x, W) = W x        (x: 3072 × 1,  W: 10 × 3072,  output: 10 × 1)

With a bias term:
   f(x, W) = W x + b    (b: 10 × 1,  output: 10 × 1)

Parametric Linear “Prediction” Function

Bias trick: append a constant 1 to x so the bias folds into W:
   f(x, W) = W [x; 1]   (augmented x: 3073 × 1,  W: 10 × 3073,  output: 10 × 1)
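A quick NumPy check of the bias trick with the slide's shapes (the values are random):

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(10, 3072))
b = rng.normal(size=10)
x = rng.normal(size=3072)

scores = W @ x + b                      # f(x, W) = Wx + b

W_aug = np.hstack([W, b[:, None]])      # fold b into W: 10 x 3073
x_aug = np.append(x, 1.0)               # augmented input: 3073 x 1
scores_aug = W_aug @ x_aug              # f(x, W) = W [x; 1]

print(np.allclose(scores, scores_aug))  # True: both forms give the same scores
```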

Example with 4 pixels, 3 classes

Linear Classifier Visualization
Source: Andrej Karpathy, http://cs231n.github.io/linear-classify/

Linear Classifier, Geometric Viewpoint
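A concrete instance of the 4-pixel, 3-class scoring; since the slide's figures did not survive, the specific numbers below are illustrative (in the spirit of the cited cs231n page):

```python
import numpy as np

W = np.array([[ 0.2, -0.5,  0.1,  2.0],   # weight row for class 0
              [ 1.5,  1.3,  2.1,  0.0],   # weight row for class 1
              [ 0.0,  0.25, 0.2, -0.3]])  # weight row for class 2
b = np.array([1.1, 3.2, -1.2])
x = np.array([56.0, 231.0, 24.0, 2.0])    # 4 flattened pixel values

scores = W @ x + b                         # one score per class
print(scores.argmax())                     # 1: the second row scores highest here
```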

How do we choose W, b?

Are these scores good?

We need a multiclass “loss” function and a way to find the best parameters for this loss.

Choosing W

Training data: {(x_i, y_i)}_{i=1,...,N}, where x_i is an image and y_i is an integer label.

Average loss:
   L = (1/N) Σ_i L_i( f(x_i, W), y_i )

Choosing W Lab

Hinge Loss for sample i:

syi
Score for other class sj Score for true class

1
Margin of 1

Multiclass SVM: s = f (xi , W )


<latexit sha1_base64="(null)">(null)</latexit>

Northeastern 71
UNIVERSITY Advanced Computer Vision
Robust
Systems

Choosing W Lab

Hinge Loss for sample i:

syi
Score for other class sj Score for true class

1
Margin of 1

Multiclass SVM: s = f (xi , W )


<latexit sha1_base64="(null)">(null)</latexit>

Northeastern 72
UNIVERSITY Advanced Computer Vision
Robust
Systems

Choosing W Lab

Hinge Loss for sample i:

syi
Score for other class sj Score for true class

1
Margin of 1

Multiclass SVM: s = f (xi , W )


<latexit sha1_base64="(null)">(null)</latexit>

Northeastern 73
UNIVERSITY Advanced Computer Vision
Robust
Systems

Choosing W Lab

Hinge Loss for sample i:

syi
Score for other class sj Score for true class

1
Margin of 1

s = f (xi , W )
Multiclass SVM:
X ⇢ Sum over all classes j <latexit sha1_base64="(null)">(null)</latexit>

0 if syi sj + 1
Li =
sj syi + 1 otherwise
j6=yi
X
<latexit sha1_base64="(null)">(null)</latexit>

= max(0, sj syi + 1)
⌘6=yi
Northeastern 74
Advanced Computer Vision
<latexit sha1_base64="(null)">(null)</latexit>

UNIVERSITY
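The per-sample loss above, vectorized with NumPy:

```python
import numpy as np

def svm_loss_sample(scores, y):
    # L_i = sum_{j != y} max(0, s_j - s_y + 1)
    margins = np.maximum(0.0, scores - scores[y] + 1.0)
    margins[y] = 0.0                  # exclude the true class from the sum
    return margins.sum()

scores = np.array([3.2, 5.1, -1.7])   # one score per class; class 0 is the true class
print(svm_loss_sample(scores, y=0))   # max(0, 5.1-3.2+1) + max(0, -1.7-3.2+1) ≈ 2.9
```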
Choosing W

   L_i = Σ_{j ≠ y_i} max(0, s_j − s_{y_i} + 1)

Example scores for one sample: 3.2 for the true class, 5.1 and −1.7 for the other two classes:
   L_i = max(0, 5.1 − 3.2 + 1) + max(0, −1.7 − 3.2 + 1) = 2.9 + 0 = 2.9
Choosing W

   L_i = Σ_{j ≠ y_i} max(0, s_j − s_{y_i} + 1),    L = (1/N) Σ_i L_i

Properties of this loss:
• If the true-class score is already well beyond the margin, small changes to the other scores do not change the loss, since the score is really good.
• Minimum loss = 0; maximum = infinity.
• With C classes and near-zero initial scores, L_i ≈ C − 1 (each of the C − 1 terms is ≈ 1), a useful sanity check at initialization.
• Averaging over the classes instead of summing gives the same optimum ("same as before").
• A squared hinge, max(0, s_j − s_{y_i} + 1)², penalizes larger errors more.
Multi-class Hinge Loss

   f(x, W) = W x

   L = (1/N) Σ_{i=1}^{N} Σ_{j ≠ y_i} max(0, f(x_i, W)_j − f(x_i, W)_{y_i} + 1)

The best loss is 0. Suppose we found W such that the loss is 0. Is this W unique?
Choosing W

Compare W with 2W: if every difference inside the max term is less than −1 (so each max term is 0), we can scale W up and the max terms will still be 0!
Multi-class Hinge Loss

NO! W is not unique (2W also achieves zero loss), so we have extra degrees of freedom.

Regularization

   L(W) = (1/N) Σ_{i=1}^{N} Σ_{j ≠ y_i} max(0, f(x_i, W)_j − f(x_i, W)_{y_i} + 1) + λ R(W)

The data term makes the model (W) match the training data well. The regularizer R(W) uses the extra degrees of freedom to prevent overfitting and encourage generalization; λ is a hyperparameter that controls the effect of the regularization.
Regularization

Simple examples:
   L2:                  R(W) = ∥W∥² = Σ_{k,l} W_{kl}²
   L1 (approx. to L0):  R(W) = ∥W∥₁ = Σ_{k,l} |W_{kl}|
   Elastic net:         R(W) = Σ_{k,l} ( W_{kl}² + |W_{kl}| )
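The three regularizers above as functions of the weight matrix W:

```python
import numpy as np

def l2_reg(W):
    return (W ** 2).sum()                  # R(W) = sum_kl W_kl^2

def l1_reg(W):
    return np.abs(W).sum()                 # R(W) = sum_kl |W_kl|

def elastic_net_reg(W):
    return (W ** 2 + np.abs(W)).sum()      # R(W) = sum_kl (W_kl^2 + |W_kl|)

W = np.array([[1.0, -2.0], [0.0, 3.0]])
print(l2_reg(W))           # 1 + 4 + 0 + 9 = 14.0
print(l1_reg(W))           # 1 + 2 + 0 + 3 = 6.0
print(elastic_net_reg(W))  # 14 + 6 = 20.0
```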
Regularization

Other examples:
• Dropout
• Batch normalization
• Stochastic depth, fractional pooling, etc.