SVM and Multi-class Classifiers in CV
Lecture 3
SVM, Kernels, Multi-class Linear Classifiers
Next Class
Loss Functions, Regularization, Optimization
Northeastern University, Advanced Computer Vision, Robust Systems Lab
f(image) = “apple”
f(image) = “tomato”
f(image) = “cow”
Feature $x$ → $f(\cdot)$ → $y$:   $y = f(x)$
Steps
Training: Training Images → Image Features (+ Training Labels) → Training → Learned model
Testing: Test Image → Image Features → Learned model → Prediction
Slide credit: D. Hoiem
“Shallow Learning”
Feature $x$ → $f(\cdot)$ → $y$
Handcrafted features
Off-the-shelf trainable classifier
NN Distance Metrics
L1: $d_1(I_1, I_2) = \sum_p |I_1(p) - I_2(p)|$
L2: $d_2(I_1, I_2) = \sqrt{\sum_p (I_1(p) - I_2(p))^2}$
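A minimal sketch of both metrics in plain Python, assuming each image is given as a flat list of pixel values of the same length (the function names are illustrative, not from the lecture):

```python
import math

def l1_distance(img1, img2):
    """d1: sum over pixels p of |I1(p) - I2(p)|."""
    if len(img1) != len(img2):
        raise ValueError("images must have the same number of pixels")
    return sum(abs(a - b) for a, b in zip(img1, img2))

def l2_distance(img1, img2):
    """d2: square root of the sum over pixels p of (I1(p) - I2(p))^2."""
    if len(img1) != len(img2):
        raise ValueError("images must have the same number of pixels")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(img1, img2)))

# Two tiny 2x2 grayscale "images", flattened row by row.
I1 = [10, 20, 30, 40]
I2 = [13, 16, 30, 40]
print(l1_distance(I1, I2))  # 3 + 4 + 0 + 0 = 7
print(l2_distance(I1, I2))  # sqrt(9 + 16) = 5.0
```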
Comparing Images
L1 Distance: $d_1(I_1, I_2) = \sum_p |I_1(p) - I_2(p)|$
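The nearest-neighbor classifier built on this L1 distance can be sketched as follows. The class name `NearestNeighbor` and its `train`/`predict` methods are hypothetical; images are again assumed to be flat pixel lists:

```python
def l1_distance(img1, img2):
    return sum(abs(a - b) for a, b in zip(img1, img2))

class NearestNeighbor:
    """1-NN: memorize the training set, predict the label of the closest image."""

    def train(self, images, labels):
        # "Training" only stores the data; no parameters are fitted.
        self.images = list(images)
        self.labels = list(labels)

    def predict(self, image):
        # Each query is compared against every stored image.
        best = min(range(len(self.images)),
                   key=lambda i: l1_distance(self.images[i], image))
        return self.labels[best]

nn = NearestNeighbor()
nn.train([[0, 0, 0, 0], [255, 255, 255, 255]], ["dark", "bright"])
print(nn.predict([30, 10, 20, 0]))  # dark
```

Training is cheap while every prediction scans all stored images, which is the "slow at test time" trade-off listed for NN.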
NN: Visualization
CIFAR-10 example
NN: Visualization
K-NN: Visualization
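k-NN generalizes 1-NN by taking a majority vote over the k closest training points. A sketch under stated assumptions: L1 distance, points as lists of numbers, and vote ties broken in favor of the label seen first among the nearest neighbors (one possible convention, not necessarily the lecture's):

```python
from collections import Counter

def l1_distance(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

def knn_predict(train_x, train_y, query, k):
    """Majority vote among the k training points closest to `query` (L1 distance)."""
    # Stable sort: equally distant points keep their original training order.
    order = sorted(range(len(train_x)), key=lambda i: l1_distance(train_x[i], query))
    votes = Counter(train_y[i] for i in order[:k])
    # On a vote tie, most_common keeps first-inserted order, so the label
    # that appears first among the nearest neighbors wins.
    return votes.most_common(1)[0][0]

train_x = [[0, 0], [1, 0], [0, 1], [5, 5], [6, 5]]
train_y = ["a", "a", "a", "b", "b"]
query = [4, 4]
for k in (1, 3, 5):
    print(k, knn_predict(train_x, train_y, query, k))  # 1 b, 3 b, 5 a
```

In this toy example the prediction flips from "b" to "a" as k grows, which is why k is treated as a hyperparameter.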
Hyperparameters
No cheating!
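One standard way to honor this rule is to choose hyperparameters such as k on a held-out validation split and to evaluate on the test split only once, at the end. A minimal sketch; the toy data, the split, and the candidate values of k are illustrative assumptions:

```python
from collections import Counter

def l1_distance(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

def knn_predict(train_x, train_y, query, k):
    # Majority vote among the k nearest training points under L1 distance.
    order = sorted(range(len(train_x)), key=lambda i: l1_distance(train_x[i], query))
    return Counter(train_y[i] for i in order[:k]).most_common(1)[0][0]

def accuracy(train_x, train_y, eval_x, eval_y, k):
    hits = sum(knn_predict(train_x, train_y, x, k) == y for x, y in zip(eval_x, eval_y))
    return hits / len(eval_x)

def choose_k(train_x, train_y, val_x, val_y, candidates):
    """Pick k by validation accuracy; the test split is not used here."""
    return max(candidates, key=lambda k: accuracy(train_x, train_y, val_x, val_y, k))

# Toy 1-D data; the point [4] is a deliberately noisy "b" among the "a" points.
train_x = [[0], [1], [2], [4], [10], [11], [12]]
train_y = ["a", "a", "a", "b", "b", "b", "b"]
val_x, val_y = [[3], [5]], ["a", "a"]
test_x, test_y = [[8]], ["b"]

best_k = choose_k(train_x, train_y, val_x, val_y, candidates=[1, 3, 5])
print(best_k)                                              # 3
print(accuracy(train_x, train_y, test_x, test_y, best_k))  # evaluated once, at the end
```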
k-NN
[Figure: line l in the $(x_1, x_2)$ plane with normal vector $w$; point P on one side, point N on the other]
Line l: $\frac{w}{\|w\|} \cdot x = \rho$, i.e. $\frac{w}{\|w\|} \cdot x - \rho = 0$, where $\rho$ is the offset of l from the origin along the unit normal $\frac{w}{\|w\|}$
Point P at distance $d$ on the side the normal points to: $\frac{w}{\|w\|} \cdot x_p - (\rho + d) = 0$, so $d = \frac{w}{\|w\|} \cdot x_p - \rho$
Point N at distance $d$ on the opposite side: $\frac{w}{\|w\|} \cdot x_n - (\rho - d) = 0$, so $-d = \frac{w}{\|w\|} \cdot x_n - \rho$
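The signed-distance formula can be checked numerically. A small sketch, assuming a 2D example with $w = (3, 4)$ and $\rho = 2$ (values chosen only for illustration); the sign tells which side of l a point lies on:

```python
import math

def signed_distance(w, rho, x):
    """Signed distance from x to the line/hyperplane (w/||w||) . x = rho.

    Positive on the side the normal w points to (like P),
    negative on the opposite side (like N).
    """
    norm = math.sqrt(sum(wi * wi for wi in w))
    return sum(wi * xi for wi, xi in zip(w, x)) / norm - rho

w = [3.0, 4.0]   # ||w|| = 5, unit normal (0.6, 0.8)
rho = 2.0        # offset of the line from the origin along the unit normal
P = [3.0, 4.0]   # 25 / 5 - 2 = 3
N = [0.0, 0.0]   # 0 / 5 - 2 = -2
print(signed_distance(w, rho, P))  # 3.0
print(signed_distance(w, rho, N))  # -2.0
```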
$f(x) = \mathrm{sgn}(w^T x + b)$
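A minimal sketch of this decision rule. Note that $w^T x + b = 0$ describes the same line as $\frac{w}{\|w\|} \cdot x = \rho$ when $b = -\rho\|w\|$; mapping a score of exactly 0 to +1 is an arbitrary convention chosen here:

```python
def sgn(v):
    # Sign function; a score of exactly 0 is mapped to +1 (arbitrary convention).
    return 1 if v >= 0 else -1

def linear_classify(w, b, x):
    """f(x) = sgn(w^T x + b)."""
    return sgn(sum(wi * xi for wi, xi in zip(w, x)) + b)

# Same line as (w/||w||) . x = 2 with w = (3, 4): b = -rho * ||w|| = -2 * 5.
w, b = [3.0, 4.0], -10.0
print(linear_classify(w, b, [3.0, 4.0]))  # 1   (25 - 10 > 0)
print(linear_classify(w, b, [0.0, 0.0]))  # -1  (0 - 10 < 0)
```

Only a dot product and a sign are needed per query, which is why linear classifiers are very fast at test time.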
NN Pros:
Simple to implement
Decision boundaries not necessarily linear
Works for any number of classes
Nonparametric method
NN Cons:
Needs a good distance function
Slow at test time (each query is compared with every training example)
Linear Pros:
Low-dimensional parametric representation
Very fast at test time
Linear Cons:
Works only for two classes
How to train the linear function?
What if the data is not linearly separable?
[Figure: labeled training examples, e.g. $y_1 = +1$, $y_i = +1$, $y_2 = -1$, $y_N = -1$]
1. Maximize the margin $\frac{2}{\|w\|}$
2. Correctly classify all training data:
Positive samples: $w^T x_i + b \ge 1$
Negative samples: $w^T x_i + b \le -1$
C. Burges, A Tutorial on Support Vector Machines for Pattern Recognition, Data Mining and Knowledge Discovery, 1998
Hard margin: $\min_{w, b} \frac{1}{2}\|w\|^2$ subject to $y_i(w^T x_i + b) \ge 1$
Soft margin: $\min_{w, b} \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{n} \max\left(0, 1 - y_i(w^T x_i + b)\right)$
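The soft-margin objective can be minimized directly with subgradient descent on the hinge loss. This is just one simple solver chosen for illustration (dedicated SVM libraries typically use other methods), and the step size, iteration count, and toy data are assumptions:

```python
def dot(u, v):
    return sum(a * c for a, c in zip(u, v))

def svm_objective(w, b, X, y, C):
    """0.5 * ||w||^2 + C * sum_i max(0, 1 - y_i (w^T x_i + b))."""
    reg = 0.5 * dot(w, w)
    hinge = sum(max(0.0, 1.0 - yi * (dot(w, xi) + b)) for xi, yi in zip(X, y))
    return reg + C * hinge

def train_svm(X, y, C=1.0, lr=0.01, steps=2000):
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(steps):
        # Subgradient: w from the regularizer, plus -C*y_i*x_i (and -C*y_i for b)
        # for every point that violates the margin, i.e. y_i (w^T x_i + b) < 1.
        gw, gb = list(w), 0.0
        for xi, yi in zip(X, y):
            if yi * (dot(w, xi) + b) < 1.0:
                gw = [g - C * yi * xj for g, xj in zip(gw, xi)]
                gb -= C * yi
        w = [wj - lr * g for wj, g in zip(w, gw)]
        b -= lr * gb
    return w, b

X = [[2.0, 2.0], [3.0, 3.0], [2.0, 3.0], [-2.0, -2.0], [-3.0, -3.0], [-2.0, -3.0]]
y = [1, 1, 1, -1, -1, -1]
w, b = train_svm(X, y, C=1.0)
print(w, b, svm_objective(w, b, X, y, C=1.0))
```

Points with $y_i(w^T x_i + b) \ge 1$ contribute nothing to the subgradient; only margin violators pull on $w$ and $b$.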
[Figure: margin levels +1, 0, −1, with $y_i(w^T x_i + b)$ compared against 1]
Demo: http://cs.stanford.edu/people/karpathy/svmjs/demo
If the data is not linearly separable, map the original input to a higher-dimensional space where it is separable.
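A toy illustration of this idea, assuming 1D data whose positive examples sit on both sides of the negatives: no single threshold on x separates them, but after the hypothetical map $\varphi(x) = (x, x^2)$ a straight line does:

```python
def phi(x):
    """Map a scalar x to the 2D point (x, x^2)."""
    return (x, x * x)

xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [+1, +1, -1, -1, -1, +1, +1]   # positives on both ends

def separable_1d(xs, ys):
    """Brute force: does any threshold t with orientation s give sgn(s*(x - t)) == y everywhere?"""
    pts = sorted(xs)
    candidates = [pts[0] - 1] + [(a + c) / 2 for a, c in zip(pts, pts[1:])] + [pts[-1] + 1]
    for t in candidates:
        for s in (1, -1):
            if all((1 if s * (x - t) > 0 else -1) == y for x, y in zip(xs, ys)):
                return True
    return False

# In the lifted space, the linear rule sgn(0*x + 1*x^2 - 2.5) separates the classes.
w, b = (0.0, 1.0), -2.5

def classify(point):
    score = w[0] * point[0] + w[1] * point[1] + b
    return 1 if score > 0 else -1

print(separable_1d(xs, ys))                   # False
print([classify(phi(x)) for x in xs] == ys)   # True
```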
[Figure: mapping from a 3D input space to a 9D feature space]
$w^T x + b = \sum_i \alpha_i y_i x_i^T x + b$
$\sum_i \alpha_i y_i \varphi(x_i)^T \varphi(x) + b = \sum_i \alpha_i y_i K(x_i, x) + b$
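The dual form only needs kernel values between the training points and the query. A sketch of the decision function; the $\alpha_i$, labels, and points are made-up illustrative values, not the output of an SVM solver. The check shows that with the linear kernel $K(u, v) = u^T v$ the score equals $w^T x + b$ with $w = \sum_i \alpha_i y_i x_i$:

```python
def linear_kernel(u, v):
    return sum(a * c for a, c in zip(u, v))

def decision_function(alphas, ys, points, b, kernel, x):
    """sum_i alpha_i y_i K(x_i, x) + b  (terms with alpha_i = 0 drop out)."""
    return sum(a * yi * kernel(xi, x) for a, yi, xi in zip(alphas, ys, points)) + b

# Illustrative values only.
alphas = [0.5, 0.25]
ys = [1, -1]
points = [[1.0, 2.0], [2.0, 0.0]]
b = 0.5
x = [3.0, 1.0]

# Explicit weight vector for the linear kernel: w = sum_i alpha_i y_i x_i = (0, 1).
w = [sum(a * yi * xi[d] for a, yi, xi in zip(alphas, ys, points)) for d in range(2)]
print(decision_function(alphas, ys, points, b, linear_kernel, x))  # 1.5
print(linear_kernel(w, x) + b)                                     # 1.5
```

Swapping `linear_kernel` for any other kernel function changes the feature space without ever computing $\varphi$ explicitly.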
Polynomial kernel: $K(x, y) = (c + x^T y)^d$
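For $c = 0$ and $d = 2$ this kernel equals an ordinary dot product between explicit quadratic features: a 3D input maps to the 9 products $x_i x_j$. A quick numerical check with arbitrary example vectors:

```python
import itertools

def poly_kernel(x, y, c=0.0, d=2):
    """K(x, y) = (c + x^T y)^d."""
    return (c + sum(a * b for a, b in zip(x, y))) ** d

def quadratic_features(x):
    """Explicit map for c = 0, d = 2: all ordered products x_i * x_j (3D -> 9D)."""
    return [a * b for a, b in itertools.product(x, x)]

x = [1.0, 2.0, 3.0]
y = [4.0, -1.0, 0.5]
phi_x, phi_y = quadratic_features(x), quadratic_features(y)
print(len(phi_x))                                # 9
print(poly_kernel(x, y))                         # (4 - 2 + 1.5)^2 = 12.25
print(sum(a * b for a, b in zip(phi_x, phi_y)))  # same value, computed in 9D
```

The kernel needs one 3D dot product and a square, while the explicit route builds two 9D vectors; the gap grows quickly with input dimension and degree.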
Gaussian kernel: $K(x, y) = \exp\left(-\frac{1}{2\sigma^2}\|x - y\|^2\right)$
[Figure: $K(x, y)$ plotted against $\|x - y\|$]
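A direct transcription of the Gaussian kernel, assuming $\sigma = 1$ for the example. The value is 1 for identical points and decays toward 0 as $\|x - y\|$ grows; $\sigma$ controls how fast:

```python
import math

def gaussian_kernel(x, y, sigma=1.0):
    """K(x, y) = exp(-||x - y||^2 / (2 sigma^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))

origin = [0.0, 0.0]
for other in ([0.0, 0.0], [1.0, 0.0], [3.0, 0.0]):
    # Similarity decays with distance: 1.0 for identical points, near 0 far away.
    print(other, gaussian_kernel(origin, other, sigma=1.0))
```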
Pros:
Kernel-based framework is very powerful and flexible
Training is a convex optimization problem
Amenable to theoretical analysis
Works very well in practice, even with small training datasets
Cons:
No direct multi-class SVM formulation
Computation and memory costs, especially for kernel SVMs on large training sets
$f(x, W) = Wx$, where $f(x, W)$ is 10 × 1, $W$ is 10 × 3072, and $x$ is 3072 × 1
Bias: $f(x, W) = Wx + b$, where $f(x, W)$ and $b$ are both 10 × 1
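A sketch of the score computation with these shapes, assuming (as the 3072 suggests) a flattened 32 × 32 × 3 color image and 10 classes; the random weights and input only stand in for a trained W and a real image:

```python
import random

NUM_CLASSES, DIM = 10, 32 * 32 * 3   # 3072 = a 32x32 RGB image, flattened

def linear_scores(W, x, b):
    """f(x, W) = Wx + b: one score per class (10 x 3072 times 3072 x 1, plus 10 x 1)."""
    return [sum(w_kj * x_j for w_kj, x_j in zip(row, x)) + b_k
            for row, b_k in zip(W, b)]

rng = random.Random(0)
W = [[rng.gauss(0.0, 0.001) for _ in range(DIM)] for _ in range(NUM_CLASSES)]
b = [0.0] * NUM_CLASSES
x = [rng.random() for _ in range(DIM)]   # stand-in for normalized pixel values

scores = linear_scores(W, x, b)
predicted_class = max(range(NUM_CLASSES), key=lambda k: scores[k])
print(len(scores), predicted_class)
```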
$f(x, W) = Wx + b = \begin{bmatrix} W & b \end{bmatrix} \begin{bmatrix} x \\ 1 \end{bmatrix}$, where the result is 10 × 1, $[W\ b]$ is 10 × 3073, and $[x;\ 1]$ is 3073 × 1
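A small check that appending $b$ as an extra column of $W$ and a constant 1 to $x$ gives the same scores as $Wx + b$; the 2 × 3 toy matrix is illustrative:

```python
def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def with_bias_column(W, b):
    """[W b]: append b_k to row k, so a C x D matrix becomes C x (D + 1)."""
    return [row + [b_k] for row, b_k in zip(W, b)]

W = [[1.0, 2.0, 0.0], [0.0, -1.0, 3.0]]   # toy 2 x 3 weights
b = [0.5, -2.0]
x = [1.0, 1.0, 2.0]

plain = [s + b_k for s, b_k in zip(matvec(W, x), b)]    # Wx + b
augmented = matvec(with_bias_column(W, b), x + [1.0])   # [W b][x; 1]
print(plain, augmented)  # [3.5, 3.0] [3.5, 3.0]
```

Folding $b$ into $W$ means only one matrix has to be learned.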
Choosing W
Training data: $\{(x_i, y_i)\}_{i=1,\dots,N}$, where $x_i$ is an image and $y_i$ is an integer label
Average loss: $L = \frac{1}{N} \sum_i L_i(f(x_i, W), y_i)$
Choosing W
[Figure: score $s_j$ for another class versus score $s_{y_i}$ for the true class, with a margin of 1]
$s = f(x_i, W)$
Multiclass SVM loss (sum over all classes $j \ne y_i$):
$L_i = \sum_{j \ne y_i} \begin{cases} 0 & \text{if } s_{y_i} \ge s_j + 1 \\ s_j - s_{y_i} + 1 & \text{otherwise} \end{cases} = \sum_{j \ne y_i} \max(0, s_j - s_{y_i} + 1)$
Choosing W
$L_i = \sum_{j \ne y_i} \max(0, s_j - s_{y_i} + 1)$
Example scores: 3.2; 5.1, −1.7
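A sketch of the per-example loss. It assumes, as the layout suggests, that 3.2 is the score of the true class and 5.1 and −1.7 are the scores of the other two classes:

```python
def multiclass_svm_loss(scores, y, margin=1.0):
    """L_i = sum over j != y of max(0, s_j - s_y + margin)."""
    s_y = scores[y]
    return sum(max(0.0, s_j - s_y + margin)
               for j, s_j in enumerate(scores) if j != y)

scores = [3.2, 5.1, -1.7]   # assumption: index 0 is the true class
loss = multiclass_svm_loss(scores, y=0)
# max(0, 5.1 - 3.2 + 1) + max(0, -1.7 - 3.2 + 1) = 2.9 + 0, up to float rounding
print(loss)
```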
Choosing W
$L_i = \sum_{j \ne y_i} \max(0, s_j - s_{y_i} + 1)$
Loss over the full training set: $L = \frac{1}{N} \sum_i L_i$
C − 1 (C = number of classes)
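The value C − 1 is what the loss gives when all class scores are equal, for instance all close to 0 because W starts with small values: each of the C − 1 terms is $\max(0, 0 - 0 + 1) = 1$. A quick check, which can also serve as a sanity test at the start of training:

```python
def multiclass_svm_loss(scores, y, margin=1.0):
    s_y = scores[y]
    return sum(max(0.0, s_j - s_y + margin) for j, s_j in enumerate(scores) if j != y)

for num_classes in (3, 10):
    scores = [0.0] * num_classes   # all scores equal, as with a tiny initial W
    # Each of the C - 1 terms is max(0, 0 - 0 + 1) = 1.
    print(num_classes, multiclass_svm_loss(scores, y=0))   # 3 2.0, then 10 9.0
```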
Same as before
$f(x, W) = Wx$
$L = \frac{1}{N} \sum_{i=1}^{N} \sum_{j \ne y_i} \max(0, s_j - s_{y_i} + 1)$
$L = \frac{1}{N} \sum_{i=1}^{N} \sum_{j \ne y_i} \max(0, f(x_i, W)_j - f(x_i, W)_{y_i} + 1)$
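A plain-Python sketch of the full loss with $f(x, W) = Wx$ (no bias), using a hypothetical 3-class, 2D toy problem:

```python
def scores_for(W, x):
    """f(x, W) = Wx (no bias)."""
    return [sum(w * xj for w, xj in zip(row, x)) for row in W]

def full_svm_loss(W, X, Y):
    """L = (1/N) sum_i sum_{j != y_i} max(0, f(x_i, W)_j - f(x_i, W)_{y_i} + 1)."""
    total = 0.0
    for x, y in zip(X, Y):
        s = scores_for(W, x)
        total += sum(max(0.0, s[j] - s[y] + 1.0) for j in range(len(s)) if j != y)
    return total / len(X)

W = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]   # 3 classes, 2-D inputs (toy values)
X = [[2.0, 0.0], [0.0, 0.5]]
Y = [0, 1]
print(full_svm_loss(W, X, Y))  # (0.0 + 0.5) / 2 = 0.25
```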
Choosing W
[Figure: loss computed with W and with 2W]
If every difference $s_j - s_{y_i}$ inside the max is already less than −1, every max term is 0. Scaling W up (for example to 2W) scales these differences by the same factor, so they stay below −1 and the loss remains 0: the loss alone does not single out one W.
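A numerical version of this observation, using a hypothetical toy setup where every difference $s_j - s_{y_i}$ is already below −1:

```python
def full_svm_loss(W, X, Y):
    """Multiclass SVM data loss with f(x, W) = Wx."""
    total = 0.0
    for x, y in zip(X, Y):
        s = [sum(w * xj for w, xj in zip(row, x)) for row in W]
        total += sum(max(0.0, s[j] - s[y] + 1.0) for j in range(len(s)) if j != y)
    return total / len(X)

W = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
X = [[2.0, 0.0], [0.0, 3.0]]
Y = [0, 1]
W2 = [[2.0 * w for w in row] for row in W]   # the scaled-up weights 2W

print(full_svm_loss(W, X, Y), full_svm_loss(W2, X, Y))  # 0.0 0.0
```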
Regularization
$L(W) = \frac{1}{N} \sum_{i=1}^{N} \sum_{j \ne y_i} \max(0, f(x_i, W)_j - f(x_i, W)_{y_i} + 1) + \lambda R(W)$
$\lambda$: hyperparameter that controls the effect of the regularization
Simple examples:
L2: $R(W) = \|W\|^2 = \sum_{k,l} W_{kl}^2$
L1 (approximation to L0): $R(W) = \|W\|_1 = \sum_{k,l} |W_{kl}|$
Elastic net: $R(W) = \sum_{k,l} \left(W_{kl}^2 + |W_{kl}|\right)$
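A sketch of the three regularizers and the regularized objective, with an illustrative $\lambda = 0.1$ (in practice a hyperparameter tuned on validation data). If W and 2W reach the same data loss, the L2 penalty now prefers W:

```python
def l2_reg(W):
    """R(W) = sum_{k,l} W_kl^2."""
    return sum(w * w for row in W for w in row)

def l1_reg(W):
    """R(W) = sum_{k,l} |W_kl|."""
    return sum(abs(w) for row in W for w in row)

def elastic_net_reg(W):
    """R(W) = sum_{k,l} (W_kl^2 + |W_kl|), i.e. the L2 and L1 terms added."""
    return l2_reg(W) + l1_reg(W)

def regularized_loss(data_loss, W, lam, reg=l2_reg):
    """L(W) = data loss + lambda * R(W)."""
    return data_loss + lam * reg(W)

W = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
W2 = [[2.0 * w for w in row] for row in W]
lam = 0.1
# Suppose W and 2W fit the training data equally well (same data loss, here 0.0):
print(regularized_loss(0.0, W, lam), regularized_loss(0.0, W2, lam))  # 0.4 1.6
```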
Other Examples:
Dropout
Batch normalization
Stochastic depth, fractional pooling, etc.