VGG Convolutional Neural Networks Practical
Convolutional neural networks are an important class of learnable representations applicable, among others, to
numerous computer vision problems. Deep CNNs, in particular, are composed of several layers of processing, each
involving linear as well as non-linear operators, that are learned jointly, in an end-to-end manner, to solve a
particular task. These methods are now the dominant approach for feature extraction from audiovisual and
textual data.
This practical explores the basics of learning (deep) CNNs. The first part introduces typical CNN building blocks,
such as ReLU units and linear filters, with a particular emphasis on understanding back-propagation. The second
part looks at learning two basic CNNs. The first one is a simple non-linear filter capturing particular image
structures, while the second one is a network that recognises typewritten characters (using a variety of different
fonts). These examples illustrate the use of stochastic gradient descent with momentum, the definition of an
objective function, the construction of mini-batches of data, and data jittering. The last part shows how powerful
CNN models can be downloaded off-the-shelf and used directly in applications, bypassing the expensive training
process.
Contents
Getting started
Part 1: CNN building blocks
Part 1.1: convolution
Part 1.2: non-linear gating
Part 1.3: pooling
Part 1.4: normalisation
Part 2: back-propagation and derivatives
Part 2.1: the theory of back-propagation
Part 2.2: using back-propagation in practice
Part 3: learning a tiny CNN
Part 3.1: training data and labels
Part 3.2: image preprocessing
Part 3.3: learning with gradient descent
Part 3.4: experimenting with the tiny CNN
Part 4: learning a character CNN
Part 4.1: prepare the data
Part 4.2: initialize a CNN architecture
Part 4.3: train and evaluate the CNN
Part 4.4: visualise the learned filters
Part 4.5: apply the model
Part 4.6: training with jitter
Part 4.7: training using the GPU
Part 5: using pretrained models
Part 5.1: load a pre-trained model
Part 5.2: use the model to classify an image
Links and further work
Acknowledgements
History
Getting started
Read and understand the requirements and installation instructions. The download links for this practical are:
Code and data: practical-cnn-2015a.tar.gz
Code only: practical-cnn-2015a-code-only.tar.gz
Data only: practical-cnn-2015a-data-only.tar.gz
Git repository (for lab setters and developers)
After the installation is complete, open and edit the script exercise1.m in the MATLAB editor. The script contains
commented code and a description for all steps of this exercise, for Part 1 of this document. You can cut and paste
this code into the MATLAB window to run it, and will need to modify it as you go through the session. The other files
exercise2.m , exercise3.m , and exercise4.m are given for Parts 2, 3, and 4.
Each part contains several Questions (that require pen and paper) and Tasks (that require experimentation or
coding) to be answered/completed before proceeding further in the practical.
Part 1: CNN building blocks

Part 1.1: convolution

A convolutional layer is a function $f : \mathbb{R}^{M \times N \times K} \to \mathbb{R}^{M' \times N' \times K'}$ mapping an input feature map $\mathbf{x}$ to an output feature map $\mathbf{y}$.
Open the exercise1.m file, select the following part of the code, and execute it in MATLAB (right button >
Evaluate selection , or Shift+F7 ).
% Read an example image
x = imread('peppers.png') ;
% Convert to single format
x = im2single(x) ;
% Visualize the input x
figure(1) ; clf ; imagesc(x)
Use the MATLAB size command to obtain the size of the array x . Note that the array x is converted to single
precision format. This is because the underlying MatConvNet library assumes that data is in single precision.
Question. The third dimension of x is 3. Why?
Now we will create a bank of 10 filters, each of size 5 × 5 × 3.
% Create a bank of linear filters
w = randn(5,5,3,10,'single') ;
The filters are in single precision as well. Note that w has four dimensions, packing 10 filters. Note also that each
filter is not flat, but rather a volume with three layers. The next step is applying the filter to the image. This uses the
vl_nnconv function from MatConvNet:
% Apply the convolution operator
y = vl_nnconv(x, w, []) ;
Remark: You might have noticed that the third argument to the vl_nnconv function is the empty matrix [] . It
can otherwise be used to pass a vector of bias terms to add to the output of each filter.
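For instance, a minimal sketch of using the bias argument (the values here are illustrative, one bias per filter):

% Add a random bias term to the output of each of the 10 filters
b = randn(10, 1, 'single') ;
y_bias = vl_nnconv(x, w, b) ;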
The variable y contains the output of the convolution. Note that the filters are three-dimensional, in the sense
that each operates on a map $\mathbf{x}$ with $K$ channels. Furthermore, there are $K'$ such filters, generating a $K'$-dimensional map $\mathbf{y}$ as follows:

$$ y_{i'j'k'} = \sum_{ijk} w_{ijkk'} \, x_{i+i',\, j+j',\, k} $$

Since each filter has spatial dimension $M_f \times N_f$, the output map $\mathbf{y}$ has dimension $(M - M_f + 1) \times (N - N_f + 1) \times K'$.

Question: Note that $\mathbf{x}$ is indexed by $i + i'$ and $j + j'$, but that there is no plus sign between $k$ and $k'$. Why?
Task: check that the size of the variable y matches your calculations.
We can now visualise the output y of the convolution. In order to do this, use the vl_imarraysc function to
display an image for each feature channel in y :
% Visualize the output y
figure(2) ; clf ; vl_imarraysc(y) ; colormap gray ;
Question: Study the feature channels obtained. Most will likely contain a strong response in correspondence
of edges in the input image x . Recall that w was obtained by drawing random numbers from a Gaussian
distribution. Can you explain this phenomenon?
So far the filters preserve the resolution of the input feature map. However, it is often useful to downsample the output.
This can be obtained by using the stride option in vl_nnconv :
% Try again, downsampling the output
y_ds = vl_nnconv(x, w, [], 'stride', 16) ;
figure(3) ; clf ; vl_imarraysc(y_ds) ; colormap gray ;
As you should have noticed in a question above, applying a filter to an image or feature map interacts with the
boundaries, making the output map smaller by an amount proportional to the size of the filters. If this is
undesirable, then the input array can be padded with zeros by using the pad option:
% Try padding
y_pad = vl_nnconv(x, w, [], 'pad', 4) ;
figure(4) ; clf ; vl_imarraysc(y_pad) ; colormap gray ;
Task: Convince yourself that the previous code's output has different boundaries compared to the code that
does not use padding. Can you explain the result?
In order to consolidate what has been learned so far, we will now design a filter by hand:
w = [0 1 0 ;
1 -4 1 ;
0 1 0 ] ;
w = single(repmat(w, [1, 1, 3])) ;
y_lap = vl_nnconv(x, w, []) ;
figure(5) ; clf ; colormap gray ;
subplot(1,2,1) ;
imagesc(y_lap) ; title('filter output') ;
subplot(1,2,2) ;
imagesc(-abs(y_lap)) ; title('- abs(filter output)') ;
Questions:
What filter have we implemented?
How are the RGB colour channels processed by this filter?
What image structures are detected?
Part 1.2: non-linear gating

CNNs are obtained by composing several different functions. In addition to linear filters, there are non-linear operators as well. The simplest non-linearity is given by the Rectified Linear Unit (ReLU), applied identically to each component of a feature map:

$$ y_{ijk} = \max\{0,\, x_{ijk}\} $$

This function is implemented by vl_nnrelu . To see its effect, apply a derivative filter and its negation, and then gate the responses (see exercise1.m for the exact code; the snippet below reconstructs the step):

w = single(repmat([1 0 -1], [1, 1, 3])) ;
w = cat(4, w, -w) ;
y = vl_nnconv(x, w, []) ;
z = vl_nnrelu(y) ;
Tasks:
Run the code above and understand what the filter w is doing.
Explain the final result z.
Part 1.3: pooling

Another important CNN building block is max-pooling, which computes the maximum of each feature channel over small spatial windows. Apply it with vl_nnpool (see exercise1.m for the exact code):

y = vl_nnpool(x, 15) ;

Question: look at the resulting image. Can you interpret the result?
The function vl_nnpool supports subsampling and padding just like vl_nnconv . However, for max-pooling,
feature maps are padded with the value $-\infty$ instead of 0. Why?
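For instance, a minimal sketch combining these options (the parameter values are illustrative):

% Max-pool over 15x15 windows with subsampling and padding
y_pool = vl_nnpool(x, 15, 'stride', 16, 'pad', 4) ;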
Part 1.4: normalisation

The last building block explored in this part is channel-wise normalisation. This operator normalises the vector of feature channels at each spatial location in the input map:

$$ y_{ijk} = \frac{x_{ijk}}{\left( \kappa + \alpha \sum_{t \in G(k)} x_{ijt}^2 \right)^{\beta}} $$

where $G(k)$ is a group of consecutive feature channels in the input map.

Task: Understand what this operator is doing. How would you set $\kappa$, $\alpha$ and $\beta$ to obtain simple $L^2$
normalisation?
Tasks:
Inspect the figure just obtained. Can you interpret it?
Compute the $L^2$ norm of the feature channels in the output map y_nrm . What do you notice? (A sketch of the relevant calls is given below.)
Explain this result in relation to the particular choice of the parameters $\kappa$, $\alpha$, and $\beta$.
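A minimal sketch, assuming vl_nnnormalize with parameters chosen to approximate plain $L^2$ normalisation over all channels (the group size, kappa, alpha and beta shown here are assumptions):

% Normalise each pixel across feature channels: [group, kappa, alpha, beta]
% kappa = 0 assumes the activations are not all zero at any pixel
K = size(x, 3) ;
y_nrm = vl_nnnormalize(x, [2*K, 0, 1, 0.5]) ;
% The L2 norm of the channels at each pixel should now be (close to) one
nrm = sqrt(sum(y_nrm.^2, 3)) ;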
Part 2: back-propagation and derivatives

The parameters of a CNN $\mathbf{w} = (\mathbf{w}_1, \dots, \mathbf{w}_L)$ should be learned in such a manner that the overall CNN function
$\mathbf{z} = f(\mathbf{x}; \mathbf{w})$ achieves a desired goal. In some cases, the goal is to model the distribution of the data, which leads
to a generative objective. Here, however, we will use $f$ as a regressor and obtain it by minimising a discriminative
objective. In simple terms, we are given:

examples of the desired input-output relations $(\mathbf{x}_1, \mathbf{z}_1), \dots, (\mathbf{x}_n, \mathbf{z}_n)$, where $\mathbf{x}_i$ are input data and $\mathbf{z}_i$ the
corresponding output values;
and a loss $\ell(\mathbf{z}, \hat{\mathbf{z}})$ that expresses the penalty for predicting $\hat{\mathbf{z}}$ instead of $\mathbf{z}$.

We use those to write the empirical loss of the CNN $f$ by averaging over the examples:

$$ L(\mathbf{w}) = \frac{1}{n} \sum_{i=1}^{n} \ell(\mathbf{z}_i, f(\mathbf{x}_i; \mathbf{w})) $$

Note that the composition of the function $f$ with the loss $\ell$ can be thought of as a CNN with one more layer (called a
loss layer). Hence, with a slight abuse of notation, in the rest of this part we incorporate the loss in the function $f$
(which therefore is a map $\mathcal{X} \to \mathbb{R}$) and do not talk about it explicitly anymore.
Part 2.1: the theory of back-propagation

The simplest algorithm to minimise $L$, and in fact one that is used in practice, is gradient descent. The idea is
simple: compute the gradient of the objective $L$ at a current solution $\mathbf{w}^t$ and then update the latter along the
direction of fastest descent of $L$:

$$ \mathbf{w}^{t+1} = \mathbf{w}^t - \eta_t \, \frac{\partial f}{\partial \mathbf{w}}(\mathbf{w}^t) $$

where $\eta_t$ is the learning rate.

The core computation is therefore the derivative of $f$ with respect to the parameters. Since the CNN is a composition
$f(\mathbf{x}; \mathbf{w}) = f_L(\cdots f_2(f_1(\mathbf{x}; \mathbf{w}_1); \mathbf{w}_2) \cdots; \mathbf{w}_L)$, the chain rule gives, for each layer $l$:

$$ \frac{\partial f}{\partial (\operatorname{vec} \mathbf{w}_l)^\top} = \frac{\partial \operatorname{vec} f_L}{\partial (\operatorname{vec} \mathbf{x}_L)^\top} \times \cdots \times \frac{\partial \operatorname{vec} f_{l+1}}{\partial (\operatorname{vec} \mathbf{x}_{l+1})^\top} \times \frac{\partial \operatorname{vec} f_l}{\partial (\operatorname{vec} \mathbf{w}_l)^\top} $$

where $\mathbf{x}_l$ denotes the input of layer $f_l$ and the $\operatorname{vec}$ operator stacks a tensor into a column vector; note that the
last factor, the derivative of $f_l$ with respect to its own parameters $\mathbf{w}_l$, is already in the required form.
Questions: Make sure you understand the structure of this formula and answer the following:
The factor $\partial \operatorname{vec} f_l / \partial (\operatorname{vec} \mathbf{x}_l)^\top$ is a matrix. What are its dimensions?
The formula can be rewritten with a slightly different notation by replacing the symbols $f_l$ with the
symbols $\mathbf{x}_{l+1}$. If you do so, do you notice any formal cancellation?
The formula only includes the derivative symbols. However, these derivatives must be computed at a
well-defined point. What is this point?
To apply the chain rule we must be able to compute, for each function $f_l$, its derivative with respect to the
parameters $\mathbf{w}_l$ as well as its input $\mathbf{x}_l$. While this could be done naively, a problem is the very high dimensionality
of the matrices involved in this calculation, as these are $M'N'K' \times MNK$ arrays. We will now introduce a
trick that allows this to be reduced to working with $MNK$ numbers only and which will yield the
back-propagation algorithm.
The trick consists in factoring the chain-rule product around the composition $g_{l+1} = f_L \circ f_{L-1} \circ \cdots \circ f_{l+1}$ of all the layers that follow $f_l$:

$$ \frac{\partial f}{\partial (\operatorname{vec} \mathbf{w}_l)^\top} = \frac{\partial \operatorname{vec} g_{l+1}}{\partial (\operatorname{vec} \mathbf{x}_{l+1})^\top} \times \frac{\partial \operatorname{vec} f_l}{\partial (\operatorname{vec} \mathbf{w}_l)^\top} $$

where $g_{l+1}$ maps the intermediate feature map $\mathbf{x}_{l+1}$ to the final output.
Question: Explain why the dimensions of the vectors $\partial g_{l+1} / \partial \operatorname{vec} \mathbf{x}_{l+1}$ and $\partial f / \partial \operatorname{vec} \mathbf{w}_l$ equal the number
of elements in $\mathbf{x}_{l+1}$ and $\mathbf{w}_l$ respectively. Hence, in particular, the symbol $\partial g_{l+1} / \partial \mathbf{x}_{l+1}$ (without
vectorisation) denotes an array with the same size as $\mathbf{x}_{l+1}$.
Hint: recall that the last layer is the loss.
Hence the algorithm can focus on computing the derivatives of $g_l$ instead of $f_l$, which are far lower-dimensional. To
see how this can be done iteratively, decompose $g_l$ as $g_l = g_{l+1} \circ f_l$, so that the data flows as:

x_l → [ f_l(·; w_l) ] → x_{l+1} → [ g_{l+1} ] → x_L
Then the key step of the iteration is obtaining the derivatives for layer $l$ given the ones for layer $l + 1$:

Input:
the derivative $\partial g_{l+1} / \partial \mathbf{x}_{l+1}$.
Output:
the derivative $\partial g_l / \partial \mathbf{x}_l$;
the derivative $\partial g_l / \partial \mathbf{w}_l$.

Question: Suppose that $f_l$ is the linear function $\mathbf{x}_{l+1} = A \mathbf{x}_l$, where $\mathbf{x}_l$ and $\mathbf{x}_{l+1}$ are column vectors and the
matrix $A$ contains the parameters $\mathbf{w}_l$. Suppose that $B = \partial g_{l+1} / \partial \mathbf{x}_{l+1}$ is given. Derive an expression for
$C = \partial g_l / \partial \mathbf{x}_l$ and an expression for $D = \partial g_l / \partial A$.
Part 2.2: using back-propagation in practice

Consider a computation block $f$ with parameters $\mathbf{w}$, followed by the rest of the network $g$, which eventually produces a scalar output $z$:

x → [ f(·; w) ] → y → [ g ] → z

where $z$ is assumed to be a scalar. Then each computation block (for example vl_nnconv or vl_nnpool ) can
compute $\partial z / \partial \mathbf{x}$ and $\partial z / \partial \mathbf{w}$ given as input $\mathbf{x}$ and $\partial z / \partial \mathbf{y}$. Let's put this into practice:
% Read an example image
x = im2single(imread('peppers.png')) ;
% Create a bank of linear filters and apply them to the image
w = randn(5,5,3,10,'single') ;
y = vl_nnconv(x, w, []) ;
% Create the derivative dz/dy
dzdy = randn(size(y), 'single') ;
% Back-propagation
[dzdx, dzdw] = vl_nnconv(x, w, [], dzdy) ;
Task: Run the code above and check the dimensions of dzdx and dzdw . Do these match your
expectations?
An advantage of this modular view is that new building blocks can be coded and added to the architecture in a
simple manner. However, it is easy to make mistakes in the calculation of complex derivatives. Hence, it is a good
idea to verify results numerically. Consider the following piece of code:
% Check the derivative numerically
ex = randn(size(x), 'single') ;
eta = 0.0001 ;
xp = x + eta * ex ;
yp = vl_nnconv(xp, w, []) ;
dzdx_empirical = sum(dzdy(:) .* (yp(:) - y(:)) / eta) ;
dzdx_computed = sum(dzdx(:) .* ex(:)) ;
fprintf(...
'der: empirical: %f, computed: %f, error: %.2f %%\n', ...
dzdx_empirical, dzdx_computed, ...
abs(1 - dzdx_empirical/dzdx_computed)*100) ;
Questions:
What is the meaning of ex in the code above?
What are the derivatives dzdx_empirical and dzdx_computed ?
Tasks:
Run the code and convince yourself that the derivatives computed by vl_nnconv are (probably) correct.
Create a new version of this code to test the derivative calculation with respect to w (one possible variant is sketched below).
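A minimal sketch of such a variant, directly mirroring the check above (this is one possible solution, not necessarily the intended one):

% Check the derivative with respect to the filters w numerically
ew = randn(size(w), 'single') ;
eta = 0.0001 ;
wp = w + eta * ew ;
yp = vl_nnconv(x, wp, []) ;
dzdw_empirical = sum(dzdy(:) .* (yp(:) - y(:)) / eta) ;
dzdw_computed = sum(dzdw(:) .* ew(:)) ;
fprintf(...
'der: empirical: %f, computed: %f, error: %.2f %%\n', ...
dzdw_empirical, dzdw_computed, ...
abs(1 - dzdw_empirical/dzdw_computed)*100) ;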
We are now ready to build our first elementary CNN, composed of just two layers, and to compute its derivatives:
% Parameters of the CNN
w1 = randn(5,5,3,10,'single') ;
rho2 = 10 ;
% Run the CNN forward
x1 = im2single(imread('peppers.png')) ;
x2 = vl_nnconv(x1, w1, []) ;
x3 = vl_nnpool(x2, rho2) ;
% Create the derivative dz/dx3
dzdx3 = randn(size(x3), 'single') ;
% Run the CNN backward
dzdx2 = vl_nnpool(x2, rho2, dzdx3) ;
[dzdx1, dzdw1] = vl_nnconv(x1, w1, [], dzdx2) ;
Question: Note that the last derivative in the CNN is dzdx3 . Here, for the sake of the example, this derivative
is initialised randomly. In a practical application, what would this derivative represent?
We can now use the same technique as before to check that the derivatives computed through back-propagation are
correct.
% Check the derivative numerically
ew1 = randn(size(w1), 'single') ;
eta = 0.0001 ;
w1p = w1 + eta * ew1 ;
x1p = x1 ;
x2p = vl_nnconv(x1p, w1p, []) ;
x3p = vl_nnpool(x2p, rho2) ;
dzdw1_empirical = sum(dzdx3(:) .* (x3p(:) - x3(:)) / eta) ;
dzdw1_computed = sum(dzdw1(:) .* ew1(:)) ;
fprintf(...
'der: empirical: %f, computed: %f, error: %.2f %%\n', ...
dzdw1_empirical, dzdw1_computed, ...
abs(1 - dzdw1_empirical/dzdw1_computed)*100) ;
Part 3: learning a tiny CNN

In this part we will learn a very simple CNN, composed of exactly two layers: a convolution layer and a max-pooling layer:

$$ \mathbf{x}_2 = W * \mathbf{x}_1 + b, \qquad \mathbf{x}_3 = \operatorname{maxpool} \mathbf{x}_2. $$

Here $W$ contains a single 3 × 3 square filter, $b$ is a scalar, and the input image $\mathbf{x} = \mathbf{x}_1$ has a single channel (a sketch of this forward pass is given after the task below).

Task
Open the file tinycnn.m and inspect the code. Convince yourself that the code computes the CNN
just described.
Look at the paddings used in the code. If the input image $\mathbf{x}_1$ has dimensions $M \times N$, what is the
dimension of the output feature map $\mathbf{x}_3$?
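For orientation only, a plausible sketch of the forward computation (the actual tinycnn.m may differ, for example in its choice of padding):

% Hypothetical sketch of the tiny two-layer CNN
x2 = vl_nnconv(x1, w, b, 'pad', 1) ;  % 3x3 filter, zero-padding preserves size
x3 = vl_nnpool(x2, 3, 'pad', 1) ;     % 3x3 max-pooling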
In the rest of the section we will learn the CNN parameters in order to extract blob-like structures from images, such
as the ones in the following image:
The arrays pos and neg now contain pixel labels and will be used as annotations for the supervised training of
the CNN. These annotations can be visualised as follows:
figure(1) ; clf ;
subplot(1,3,1) ; imagesc(im) ; axis equal ; title('image') ;
Are there pixels for which both pos and neg evaluate to false?
The CNN is trained by minimising an objective combining a regulariser and hinge losses on the positive and negative pixels:

$$ E(\mathbf{w}, b) = \frac{\lambda}{2} \|\mathbf{w}\|^2 + \frac{1}{|P|} \sum_{(u,v) \in P} \max\{0,\, 1 - f(\mathbf{x}; \mathbf{w}, b)_{(u,v)}\} + \frac{1}{|N|} \sum_{(u,v) \in N} \max\{0,\, f(\mathbf{x}; \mathbf{w}, b)_{(u,v)}\} $$
Questions:
What can you say about the score of each pixel if $\lambda = 0$ and $E(\mathbf{w}, b) = 0$?
Note that the objective enforces a margin between the scores of the positive and negative pixels.
How much is this margin?
We can now train the CNN by minimising the objective function with respect to $\mathbf{w}$ and $b$. We do so by using an
algorithm called gradient descent with momentum. Given the current solution $(\mathbf{w}_t, b_t)$ and update direction, this is
updated to $(\mathbf{w}_{t+1}, b_{t+1})$ by following the direction of fastest descent as given by the negative gradient
$-\nabla E(\mathbf{w}_t, b_t)$ of the objective. However, gradient updates are smoothed by considering a momentum term
$(\bar{\mathbf{w}}_t, \bar{b}_t)$, yielding the update equations

$$ \bar{\mathbf{w}}_{t+1} \leftarrow \mu \bar{\mathbf{w}}_t + \eta \frac{\partial E}{\partial \mathbf{w}_t}, \qquad \mathbf{w}_{t+1} \leftarrow \mathbf{w}_t - \bar{\mathbf{w}}_{t+1}, $$

and similarly for the bias term. Here $\mu$ is the momentum rate and $\eta$ the learning rate.
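In MATLAB terms, one update step looks roughly like this (a sketch; the variable names follow the parameter block below, w_momentum is assumed initialised to zero, and dEdw stands for the gradient computed by back-propagation):

% One step of gradient descent with momentum (sketch)
w_momentum = momentum * w_momentum + rate * dEdw ;
w = w - w_momentum ;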
Questions:
Explain why the momentum rate must be smaller than 1. What is the effect of having a momentum
rate close to 1?
The learning rate establishes how fast the algorithm will try to minimise the objective function. Can
you see any problem with a large learning rate?
The parameters of the algorithm are set as follows:
numIterations = 500 ;  % number of gradient descent iterations
rate = 5 ;             % learning rate
momentum = 0.9 ;       % momentum rate
shrinkRate = 0.0001 ;  % shrinkage (regularisation) strength
plotPeriod = 10 ;      % iterations between diagnostic plots
Tasks:
Inspect the code in the file exercise3.m . Convince yourself that the code is implementing the
algorithm described above. Pay particular attention to the forward and backward passes as well as
to how the objective function and its derivatives are computed.
Run the algorithm and observe the results. Then answer the following questions:
The learned filter should resemble the discretisation of a well-known differential operator.
Which one?
What is the average of the filter values compared to the average of the absolute values?
Run the algorithm again and observe the evolution of the histograms of the score of the positive and
negative pixels in relation to the values 0 and 1. Answer the following:
Is the objective function minimised monotonically?
As the histograms evolve, can you identify at least two phases in the optimisation?
Once converged, do the scores distribute in the manner that you would expect?

Hint: the plotPeriod option can be changed to plot the diagnostic figure with a higher or lower frequency;
this can significantly affect the speed of the algorithm.
Task: Train the tiny CNN again, without smoothing the input image in preprocessing. Answer the following
questions:
Is the learned filter very different from the one learned before?
If so, can you figure out what went wrong?
Look carefully at the output of the first layer, magnifying with the loupe tool. Is the maximal filter
response attained in the middle of each blob?

Hint: The Laplacian of Gaussian operator responds maximally at the centre of a blob only if the size of the
Gaussian matches the size of the blob. Relate this fact to the combination of pre-smoothing the image and
applying the learned 3 × 3 filter.
Now restore the smoothing but switch off subtracting the median from the input image.

Task: Train the tiny CNN again, without subtracting the median value in preprocessing. Answer the following
questions:
Does the algorithm converge?
Reduce the learning rate a hundred-fold and increase the maximum number of iterations by an equal
amount. Does it get better?
Explain why adding a constant to the input image can have such a dramatic effect on the
performance of the optimisation.

Hint: What constraint should the filter w satisfy if the filter output should be zero when (i) the input image is
zero or (ii) the input image is a large constant? Do you think that it would be easy for gradient descent to
enforce (ii) at all times?

What you have just witnessed is actually a fairly general principle: centring the data usually makes learning
problems much better conditioned.
Now we will explore several parameters in the algorithm:

Task: Restore the preprocessing as given in exercise3.m . Try the following:
Try increasing the learning rate rate . Can you achieve a better value of the energy in the 500
iterations?
Disable momentum by setting momentum = 0 . Now try to beat the result obtained above by
choosing rate . Can you succeed?

Finally, consider the regularisation effect of shrinking:

Task: Restore the learning rate and momentum as given in exercise3.m . Then increase the shrinkage
factor tenfold and a hundred-fold.
What is the effect on the convergence speed?
What is the effect on the final value of the total objective function and of the average loss part of it?
Part 4: learning a character CNN

In this part we will learn a CNN that recognises typewritten characters. The data is stored in an imdb structure:
imdb.images.id is a 24,206-dimensional vector of numeric IDs for each of the
24,206 character images in the dataset. imdb.images.data contains a 32 × 32 image for each character, stored
as a slice of a 32 × 32 × 24,206 array. imdb.images.label is a vector of image labels, denoting
which one of the 26 possible characters each image contains. imdb.images.set is equal to 1 for each image that should be used
to train the CNN and to 2 for each image that should be used for validation.
Task: look at the Figure 1 generated by the code and at the code itself and make sure that you understand
what you are looking at.
The softmax operator turns the vector of class scores at each spatial location into a probability distribution:

$$ y_{ijk} = \frac{e^{x_{ijk}}}{\sum_{t} e^{x_{ijt}}} $$
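As a cross-check, the softmax can be computed directly in MATLAB (a sketch; subtracting the per-location maximum is a standard trick for numerical stability and does not change the result):

% Softmax over the third (channel) dimension of a score map x
ex = exp(bsxfun(@minus, x, max(x, [], 3))) ;
y = bsxfun(@rdivide, ex, sum(ex, 3)) ;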
trainOpts.batchSize = 100 ;
trainOpts.continue = true ;
trainOpts.useGpu = false ;
trainOpts.learningRate = 0.001 ;
trainOpts.numEpochs = 15 ;
trainOpts.expDir = 'data/chars-experiment' ;

This says that the function will operate on SGD mini-batches of 100 elements, it will run for 15 epochs (passes
through the data), it will continue from the last epoch if interrupted, it will not use the GPU, it will use a learning
rate of 0.001, and it will save its files in the data/chars-experiment subdirectory.
Before the training starts, the average image value is subtracted:
% Take the average image out
imageMean = mean(imdb.images.data(:)) ;
imdb.images.data = imdb.images.data - imageMean ;
Here the key, in addition to the trainOpts structure, is the @getBatch function handle. This is how
cnn_train obtains a copy of the data to operate on. Examine this function (see the bottom of the exercise4.m
file):
function [im, labels] = getBatch(imdb, batch)
im = imdb.images.data(:,:,batch) ;
im = 256 * reshape(im, 32, 32, 1, []) ;
labels = imdb.images.label(1,batch) ;
The function extracts the m images corresponding to the vector of indexes batch . It also reshapes them as a
32 × 32 × 1 × m array (as this is the format expected by the MatConvNet functions) and multiplies the values
by 256 (the resulting values match the network initialisation and learning parameters). Finally, it also returns a
vector of labels, one for each image in the batch.
Task: Run the learning code and examine the plots that are produced. As training completes, answer the
following questions:
1. How many images per second can you process? (Look at the output in the MATLAB screen.)
2. There are two sets of curves: energy and prediction error. What do you think is the difference? What is
the energy?
3. Some curves are labelled train and some other val. Should they be equal? Which one should be
lower than the other?
4. Both the top-1 and top-5 prediction errors are plotted. What do they mean? What is the difference?
Once training is finished, the model is saved back:
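The saving step is roughly as follows (a sketch; the exact code and file name are in exercise4.m ):

% Drop the softmax loss layer and store the image mean for later use
net.layers(end) = [] ;
net.imageMean = imageMean ;
save('data/chars-experiment/charscnn.mat', '-struct', 'net') ;  % file name illustrative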
Note that we remember the imageMean for later use. Note also that the softmaxloss layer is removed from the
network before saving.
Part 4.5: apply the model

Question: The image is much wider than 32 pixels. Why can the CNN, learned before on 32 × 32 patches, be
applied to it?
Task: examine the size of the CNN output using size(res(end).x) . Does this match your expectation?
Now use the decodeCharacters() function to visualise the results:
% Visualize the results
figure(3) ; clf ;
decodeCharacters(net, imdb, im, res) ;
Tasks: inspect the output of the decodeCharacters() function and answer the following:
1. Is the quality of the recognition any good?
2. Does this match your expectation given the recognition rate in your validation set (as reported by
cnn_train during training)?
Part 4.7: training using the GPU

Task: Try training the model of exercise4.m again, this time setting the useGpu flag to true. Note the
speed of training. How many images per second can you process now?
For these small images, the GPU speedup is probably modest (perhaps 2-5 fold). However, for larger models it
becomes really dramatic (>10 fold).
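Part 5: using pretrained models

Part 5.1: load a pre-trained model

The first step is to load one of the powerful pre-trained models. A minimal sketch, assuming the model file has been downloaded to the data directory referenced in the tasks below:

% Load the pre-trained VGG very-deep-16 model and display its structure
net = load('data/imagenet-vgg-verydeep-16.mat') ;
vl_simplenn_display(net) ;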
Tasks:
1. Look at the output of vl_simplenn_display and understand the structure of the model. Can you
understand why it is called very deep?
2. Look at the size of the file data/imagenet-vgg-verydeep-16.mat on disk. This is just the model.
We can now use the model to classify an image. We start from peppers.png , a MATLAB stock image:
% obtain and preprocess an image
im = imread('peppers.png') ;
im_ = single(im) ; % note: 255 range
im_ = imresize(im_, net.normalization.imageSize(1:2)) ;
im_ = im_ - net.normalization.averageImage ;
The code normalises the image into a format compatible with the model net . This amounts to: converting the
image to single format (but with range [0, 255] rather than [0, 1] as typical in MATLAB), resizing the image to a
fixed size, and then subtracting an average image.
It is now possible to call the CNN:
% run the CNN
res = vl_simplenn(net, im_) ;
As usual, res contains the results of the computation, including all intermediate layers. The last one can be used
to perform the classification:
% show the classification result
scores = squeeze(gather(res(end).x)) ;
[bestScore, best] = max(scores) ;
figure(1) ; clf ; imagesc(im) ;
title(sprintf('%s (%d), score %.3f',...
net.classes.description{best}, best, bestScore)) ;
Acknowledgements
Beta testing by: Karel Lenc and Carlos Arteta.
Bugfixes/typos by: Sun Yushi.
History
Used in the Oxford AIMS CDT, 2014-15.