We know it is good to learn a small model. Starting from a fully connected model, do we really need all the edges? Can some of the parameters be shared?

Some patterns are much smaller than the whole image, so a neuron can detect them by looking at only a small region, which can be represented with fewer parameters (e.g., a "beak" detector).

The same pattern can also appear in different positions of the image: an "upper-left beak" detector and a "middle beak" detector do almost the same job, so they can be compressed to the same parameters.
A CNN is a neural network with some convolutional layers (and some other layers). A convolutional layer has a number of filters that perform the convolution operation.
A filter acts as a small pattern detector (e.g., a beak detector). The values in the filters are the network parameters to be learned.

Example: a 6 x 6 image and two 3 x 3 filters. Each filter detects a small (3 x 3) pattern.

6 x 6 image:
1 0 0 0 0 1
0 1 0 0 1 0
0 0 1 1 0 0
1 0 0 0 1 0
0 1 0 0 1 0
0 0 1 0 1 0

Filter 1:
 1 -1 -1
-1  1 -1
-1 -1  1

Filter 2:
-1  1 -1
-1  1 -1
-1  1 -1
Stride = 1: place Filter 1 on the top-left 3 x 3 region of the 6 x 6 image and take the inner (dot) product, which gives 3. Move the filter one pixel to the right; the inner product there is -1.
If stride = 2: move the filter two pixels at a time. The first two values along the top row are 3 and -3.
With stride = 1, sliding Filter 1 over the whole 6 x 6 image gives a 4 x 4 map of inner products:

 3 -1 -3 -1
-3  1  0 -3
-3 -3  0  1
 3 -2 -2 -1
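As a quick check, here is a minimal NumPy sketch of this convolution (the helper name convolve is just for illustration):

# Minimal NumPy sketch of the 2-D convolution above (no padding).
import numpy as np

image = np.array([[1, 0, 0, 0, 0, 1],
                  [0, 1, 0, 0, 1, 0],
                  [0, 0, 1, 1, 0, 0],
                  [1, 0, 0, 0, 1, 0],
                  [0, 1, 0, 0, 1, 0],
                  [0, 0, 1, 0, 1, 0]])

filter_1 = np.array([[ 1, -1, -1],
                     [-1,  1, -1],
                     [-1, -1,  1]])

def convolve(img, flt, stride=1):
    k = flt.shape[0]
    out_size = (img.shape[0] - k) // stride + 1
    out = np.zeros((out_size, out_size), dtype=int)
    for i in range(out_size):
        for j in range(out_size):
            patch = img[i*stride:i*stride+k, j*stride:j*stride+k]
            out[i, j] = np.sum(patch * flt)   # inner product of patch and filter
    return out

print(convolve(image, filter_1, stride=1))  # 4 x 4 map, first row: 3 -1 -3 -1
print(convolve(image, filter_1, stride=2))  # first row: 3 -3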
Repeat this for each filter. With stride = 1, Filter 2 gives another 4 x 4 map:

-1 -1 -1 -1
-1 -1 -2  1
-1 -1 -2  1
-1  0 -4  3

Together the two 4 x 4 images form the feature map, a 2 x 4 x 4 matrix.
Color image: each pixel now has three values (RGB), so the image is a stack of three 6 x 6 channels, and each filter is correspondingly a 3 x 3 x 3 cube that covers all channels at once.
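A rough sketch of what that means for the filter's size (the shapes and random values are only for illustration):

import numpy as np

# for a color image the filter spans all 3 input channels,
# so one 3 x 3 filter actually holds 3 * 3 * 3 = 27 weights
rgb_image = np.random.rand(6, 6, 3)      # 6 x 6 image, 3 channels (RGB)
color_filter = np.random.rand(3, 3, 3)   # 3 x 3 window, depth 3

patch = rgb_image[0:3, 0:3, :]           # top-left 3 x 3 region, all channels
value = np.sum(patch * color_filter)     # one output value of the feature map
print(color_filter.size, value)          # 27 weights per filter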
Convolution vs. Fully Connected

Doing convolution is like using a specially designed, sparsely connected layer. Flatten the 6 x 6 image into a 36-dimensional vector x1, x2, ..., x36. In a fully connected layer, every output neuron would be connected to all 36 inputs.

The first value of the feature map (3) is the output of a neuron that is connected to only 9 of the inputs (pixels 1, 2, 3, 7, 8, 9, 13, 14, 15), with the 9 values of Filter 1 as its weights. Connecting to 9 inputs instead of 36 means fewer parameters.

The second value (-1) is the output of a neuron connected to pixels 2, 3, 4, 8, 9, 10, 14, 15, 16, and it reuses the same 9 weights. These shared weights mean even fewer parameters.
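To make the saving concrete, a back-of-the-envelope count for this 6 x 6 example (bias terms ignored):

inputs = 6 * 6                      # 36 pixels after flattening
outputs = 4 * 4                     # one 4 x 4 feature map per filter

fully_connected = inputs * outputs  # every output connected to every pixel
locally_connected = 9 * outputs     # each output connected to only 9 pixels
shared_weights = 9                  # all 16 outputs reuse the same 9 weights

print(fully_connected, locally_connected, shared_weights)   # 576 144 9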
The whole CNN: Convolution -> Max Pooling -> Convolution -> Max Pooling -> ... (convolution and max pooling can be repeated many times). The result is then flattened and fed into a fully connected feedforward network, which produces the output (cat, dog, ...).
Max Pooling: start from the two 4 x 4 feature maps produced by Filter 1 and Filter 2 above.
Why pooling? Subsampling the pixels does not change the object: a subsampled bird is still a bird. So we can subsample the pixels to make the image smaller, which means fewer parameters to characterize the image.

In summary, a CNN keeps complexity low by reducing the number of connections, sharing weights on the edges, and using max pooling to shrink the image further.
Max pooling: divide each 4 x 4 map into 2 x 2 blocks and keep only the maximum of each block. After one convolution followed by max pooling, the 6 x 6 image becomes a new but smaller image: each channel is now a 2 x 2 image.
Each filter contributes one channel of the new image. Here the new image is 2 x 2 with two channels:

Channel from Filter 1:
3 0
3 1

Channel from Filter 2:
-1 1
 0 3

Convolution and max pooling can be repeated many times. Each round produces a new image that is smaller than the original, and the number of channels equals the number of filters.
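A small NumPy sketch of 2 x 2 max pooling, applied to the 4 x 4 map from Filter 1:

import numpy as np

feature_map = np.array([[ 3, -1, -3, -1],
                        [-3,  1,  0, -3],
                        [-3, -3,  0,  1],
                        [ 3, -2, -2, -1]])

def max_pool_2x2(fm):
    h, w = fm.shape
    out = np.zeros((h // 2, w // 2), dtype=fm.dtype)
    for i in range(0, h, 2):
        for j in range(0, w, 2):
            out[i // 2, j // 2] = fm[i:i+2, j:j+2].max()   # keep the largest value
    return out

print(max_pool_2x2(feature_map))   # [[3 0]
                                   #  [3 1]]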
Flatten: after the last max pooling layer, the 2 x 2 x 2 output (the values 3, 0, 3, 1 and -1, 1, 0, 3) is flattened into a single vector and fed into a fully connected feedforward network, which produces the output (cat, dog, ...).
CNN in Keras

Compared with an ordinary fully connected network, only the network structure and the input format change: the input becomes a 3-D tensor instead of a vector.

input_shape = (28, 28, 1): 28 x 28 pixels, 1 channel for black/white (3 for RGB).

The structure: a convolution layer with 25 filters of size 3 x 3, followed by max pooling, then another convolution layer and another max pooling.
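A minimal Keras sketch of this structure (written with the tf.keras Sequential API, which is an assumption; the hidden-layer and output sizes below are also assumptions, not from the slides):

from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    # 25 filters of size 3 x 3 on a 28 x 28 black/white image
    layers.Conv2D(25, (3, 3), input_shape=(28, 28, 1)),
    layers.MaxPooling2D((2, 2)),
    # 50 filters of size 3 x 3 on the 25-channel result
    layers.Conv2D(50, (3, 3)),
    layers.MaxPooling2D((2, 2)),
    # flatten and feed into a fully connected feedforward network
    layers.Flatten(),
    layers.Dense(100, activation='relu'),    # hidden size: an assumption
    layers.Dense(10, activation='softmax'),  # e.g. 10 classes: an assumption
])

# layer output shapes match the numbers in the next slide:
# 26 x 26 x 25, 13 x 13 x 25, 11 x 11 x 50, 5 x 5 x 50, then 1250 after Flatten
model.summary()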
CNN in Keras: output shapes and parameters per filter

Input: 1 x 28 x 28
Convolution (25 filters, 3 x 3): output 25 x 26 x 26. How many parameters for each filter? 9.
Max Pooling: output 25 x 13 x 13
Convolution (50 filters, 3 x 3): output 50 x 11 x 11. How many parameters for each filter? 225 = 25 x 9, because each filter now spans all 25 input channels.
Max Pooling: output 50 x 5 x 5
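A quick arithmetic check of these shapes and counts (valid convolution, i.e. no padding; biases ignored):

def conv_out(size, kernel=3):    # width/height after a 3 x 3 convolution
    return size - kernel + 1

def pool_out(size, window=2):    # width/height after 2 x 2 max pooling
    return size // window

s = 28
s = conv_out(s); print(25, s, s)   # 25 x 26 x 26
s = pool_out(s); print(25, s, s)   # 25 x 13 x 13
s = conv_out(s); print(50, s, s)   # 50 x 11 x 11
s = pool_out(s); print(50, s, s)   # 50 x 5 x 5

print(1 * 3 * 3)    # 9 parameters per filter in the first convolution
print(25 * 3 * 3)   # 225 parameters per filter in the second convolution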
CNN in Keras: the fully connected part

Input: 1 x 28 x 28
Convolution: 25 x 26 x 26
Max Pooling: 25 x 13 x 13
Convolution: 50 x 11 x 11
Max Pooling: 50 x 5 x 5
Flattened: 50 x 5 x 5 = 1250 values, which are fed into a fully connected feedforward network to produce the output.
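A small sketch of this last stage (random values only to illustrate the shapes; the 10 output classes are an assumption):

import numpy as np

pooled = np.random.rand(50, 5, 5)    # stand-in for the last max pooling output
flattened = pooled.reshape(-1)       # 50 * 5 * 5 = 1250 values
print(flattened.shape)               # (1250,)

# one fully connected layer on top, just to show the shapes
W = np.random.rand(10, 1250)
b = np.random.rand(10)
output = W @ flattened + b
print(output.shape)                  # (10,)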
CNN in playing Go (AlphaGo)

The neural network takes the current board as input, a 19 x 19 matrix (Black: 1, White: -1, none: 0), and outputs the next move (a score over the 19 x 19 positions). A fully connected feedforward network can be used, but a CNN performs much better.

The following is a quotation from their Nature article (shown as an image on the original slide). Note: AlphaGo does not use max pooling.
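A hedged sketch of how a board position could be encoded as this 19 x 19 input (the stone coordinates below are made up):

import numpy as np

board = np.zeros((19, 19), dtype=int)    # 0 everywhere: empty points
black_stones = [(3, 3), (15, 15)]        # hypothetical Black stones
white_stones = [(3, 15)]                 # hypothetical White stone

for r, c in black_stones:
    board[r, c] = 1                      # Black: 1
for r, c in white_stones:
    board[r, c] = -1                     # White: -1

# this matrix (with a channel axis added) is the CNN input; the output is a
# score for each of the 19 x 19 candidate positions for the next move
print(board.shape)                       # (19, 19)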
CNN in speech

The input is the spectrogram of the speech signal, treated as an image (frequency along one axis, time along the other). In this application the CNN filters move only in the frequency direction.

Source of image: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.703.6858&rep=rep1&type=pdf