Image Category Classification Using Deep Learning
This example shows how to use a pretrained Convolutional Neural Network (CNN) as a feature
extractor for training an image category classifier.
Contents
Overview
Download Image Data
Load Images
Load pretrained Network
Prepare Training and Test Image Sets
Pre-process Images For CNN
Extract Training Features Using CNN
Train A Multiclass SVM Classifier Using CNN Features
Evaluate Classifier
Try the Newly Trained Classifier on Test Images
References
Overview
A Convolutional Neural Network (CNN) is a powerful machine learning technique from the field
of deep learning. CNNs are trained using large collections of diverse images. From these large
collections, CNNs can learn rich feature representations for a wide range of images. These
feature representations often outperform hand-crafted features such as HOG, LBP, or SURF.
An easy way to leverage the power of CNNs, without investing time and effort into training, is to
use a pretrained CNN as a feature extractor.
In this example, images from Caltech 101 are classified into categories using a multiclass linear
SVM trained with CNN features extracted from the images. This approach to image category
classification follows the standard practice of training an off-the-shelf classifier using features
extracted from images. For example, the Image Category Classification Using Bag Of
Features example uses SURF features within a bag of features framework to train a multiclass
SVM. The difference here is that instead of using image features such as HOG or SURF,
features are extracted using a CNN. And, as this example will show, the classifier trained using
CNN features provides close to 100% accuracy, which is higher than the accuracy achieved
using bag of features and SURF.
Note: This example requires Neural Network Toolbox™, Statistics and Machine Learning
Toolbox™, and Neural Network Toolbox™ Model for ResNet-50 Network.
Using a CUDA-capable NVIDIA™ GPU with compute capability 3.0 or higher is highly
recommended for running this example. Use of a GPU requires the Parallel Computing
Toolbox™.
Download Image Data
Note: Download time of the data depends on your internet connection. The commands that
download the data run inside MATLAB and block it until the download finishes. Alternatively, you
can use your web browser to first download the dataset to your local disk. To use the file you
downloaded from the web, change the 'outputFolder' variable to the location of the
downloaded file.
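A minimal sketch of the download step this note describes, assuming the dataset is distributed as a .tar.gz archive. The url value is a placeholder rather than the dataset's actual address, and the folder name caltech101 is an assumption; only the outputFolder variable is named in the text.

```matlab
% Placeholder: set this to the address of the Caltech 101 archive (.tar.gz).
url = '';

% Assumed location for the extracted data. Change it if you downloaded
% and extracted the archive yourself.
outputFolder = fullfile(tempdir, 'caltech101');

% Download and extract only once. untar blocks MATLAB until it finishes.
if ~isempty(url) && ~exist(outputFolder, 'dir')
    disp('Downloading Caltech 101 data set...');
    untar(url, outputFolder);
end
```

With url left empty, the block only defines outputFolder and skips the download, which is the behavior you want when the data is already on disk.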
Load Images
Instead of operating on all of Caltech 101, which is time consuming, use three of the categories:
airplanes, ferry, and laptop. The image category classifier will be trained to distinguish among
these three categories.
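The datastore for these categories could be created along the following lines. This is a sketch: the 101_ObjectCategories subfolder name and the caltech101 output folder are assumptions about how the archive was extracted, and the datastore is only built when those folders are actually present.

```matlab
% Assumed layout: <outputFolder>/101_ObjectCategories/<category>/<images>
outputFolder = fullfile(tempdir, 'caltech101');
rootFolder = fullfile(outputFolder, '101_ObjectCategories');
categories = {'airplanes', 'ferry', 'laptop'};
categoryFolders = fullfile(rootFolder, categories);

% Build the datastore only if the data is present.
% 'LabelSource','foldernames' takes each image's label from its folder name.
if all(cellfun(@isfolder, categoryFolders))
    imds = imageDatastore(categoryFolders, 'LabelSource', 'foldernames');
end
```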
The imds variable now contains the images and the category labels associated with each
image. The labels are automatically assigned from the folder names of the image files.
Use countEachLabel to summarize the number of images per category.
tbl = countEachLabel(imds)
tbl =
3×2 table
Label Count
_________ _____
airplanes 800
ferry 67
laptop 81
Because imds above contains an unequal number of images per category, let's first adjust it, so
that the number of images in the training set is balanced.
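minSetCount = min(tbl{:,2}); % determine the smallest number of images in a category
% Use splitEachLabel to trim every category to minSetCount images.
imds = splitEachLabel(imds, minSetCount, 'randomize');
% Notice that each set now has exactly the same number of images.
countEachLabel(imds)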
ans =
3×2 table
Label Count
_________ _____
airplanes 67
ferry 67
laptop 67
Below, you can see example images from three of the categories included in the dataset.
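% Find the index of the first image in each category
airplanes = find(imds.Labels == 'airplanes', 1);
ferry = find(imds.Labels == 'ferry', 1);
laptop = find(imds.Labels == 'laptop', 1);
figure
subplot(1,3,1);
imshow(readimage(imds,airplanes))
subplot(1,3,2);
imshow(readimage(imds,ferry))
subplot(1,3,3);
imshow(readimage(imds,laptop))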
Load pretrained Network
There are several pretrained networks that have gained popularity. Most of these have been
trained on the ImageNet dataset, which has 1000 object categories and 1.2 million training
images [1]. "ResNet-50" is one such model and can be loaded using the resnet50 function
from Neural Network Toolbox™. Using resnet50 requires that you first install Neural Network
Toolbox™ Model for ResNet-50 Network.
% Load pretrained network
net = resnet50();
Other popular networks trained on ImageNet include AlexNet, GoogLeNet, VGG-16, and
VGG-19 [3], which can be loaded using alexnet, googlenet, vgg16, and vgg19 from
Neural Network Toolbox™.
Use plot to visualize the network. Because this is a large network, adjust the display window to
show just the first section.
% Visualize the first section of the network.
figure
plot(net)
title('First section of ResNet-50')
set(gca,'YLim',[150 170]);
The first layer defines the input dimensions. Each CNN has different input size requirements.
The one used in this example requires image input that is 224-by-224-by-3.
% Inspect the first layer
net.Layers(1)
ans =
Name: 'input_1'
Hyperparameters
DataAugmentation: 'none'
Normalization: 'zerocenter'
The intermediate layers make up the bulk of the CNN. These are a series of convolutional
layers, interspersed with rectified linear units (ReLU) and pooling layers [2]. In ResNet-50,
these layers are followed by a single fully connected layer, fc1000.
The final layer is the classification layer and its properties depend on the classification task. In
this example, the CNN model that was loaded was trained to solve a 1000-way classification
problem. Thus the classification layer has 1000 classes from the ImageNet dataset.
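% Inspect the last layer
net.Layers(end)
ans =
Name: 'ClassificationLayer_fc1000'
OutputSize: 1000
Hyperparameters
LossFunction: 'crossentropyex'
% Number of class names for the ImageNet classification task
numel(net.Layers(end).ClassNames)
ans =
1000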
Note that the CNN model is not going to be used for the original classification task. It is going to
be re-purposed to solve a different classification task on the Caltech 101 dataset.
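The steps listed in the contents as Prepare Training and Test Image Sets, Pre-process Images For CNN, and Extract Training Features Using CNN can be sketched as follows. The sketch assumes that net and imds exist from the earlier steps. The 30/70 split ratio, the 'gray2rgb' color conversion, the feature layer name 'fc1000', and the variable names trainingSet, augmentedTrainingSet, and imageSize are assumptions chosen to match the names used later in this example.

```matlab
% Assumes net (ResNet-50) and imds (the balanced datastore) already exist.

% Split each category into training and test images. The 0.3 ratio is an
% assumption; 'randomize' avoids biasing the split by file order.
[trainingSet, testSet] = splitEachLabel(imds, 0.3, 'randomize');

% Resize images to the network input size and convert grayscale images
% to RGB on the fly, without rewriting the files on disk.
imageSize = net.Layers(1).InputSize;
augmentedTrainingSet = augmentedImageDatastore(imageSize(1:2), trainingSet, ...
    'ColorPreprocessing', 'gray2rgb');
augmentedTestSet = augmentedImageDatastore(imageSize(1:2), testSet, ...
    'ColorPreprocessing', 'gray2rgb');

% Extract features from the layer just before the classification layer
% (assumed to be named 'fc1000'). Each column holds one image's features.
featureLayer = 'fc1000';
trainingFeatures = activations(net, augmentedTrainingSet, featureLayer, ...
    'MiniBatchSize', 32, 'OutputAs', 'columns');

% Labels that go with the training features.
trainingLabels = trainingSet.Labels;
```

Resizing through augmentedImageDatastore rather than saving resized copies keeps the original image files untouched.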
Note that the activations function automatically uses a GPU for processing if one is available;
otherwise, it uses the CPU.
In the call to activations, 'MiniBatchSize' is set to 32 so that the CNN and image data fit into
GPU memory. You may need to lower 'MiniBatchSize' if your GPU runs out of memory.
Also, the activations output is arranged as columns. This helps speed up the multiclass linear
SVM training that follows.
Train A Multiclass SVM Classifier Using CNN Features
% Train multiclass SVM classifier using a fast linear solver, and set
% 'ObservationsIn' to 'columns' to match the arrangement used for training
% features.
classifier = fitcecoc(trainingFeatures, trainingLabels, ...
    'Learners', 'Linear', 'Coding', 'onevsall', 'ObservationsIn', 'columns');
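To see the column-wise layout in isolation, the same fitcecoc call can be tried on synthetic data. This is a toy sketch, not part of the original example: the feature values, class centers, and all variable names prefixed with toy are made up for illustration.

```matlab
rng(0);                                  % reproducible synthetic data
names = {'airplanes', 'ferry', 'laptop'};
numPerClass = 20;
numFeatures = 10;
centers = 5 * eye(3, numFeatures);       % one well-separated center per class

toyFeatures = zeros(numFeatures, 0);
toyLabels = cell(0, 1);
for k = 1:3
    % Each observation is a column, so this chunk is numFeatures-by-numPerClass.
    chunk = centers(k, :)' + 0.1 * randn(numFeatures, numPerClass);
    toyFeatures = [toyFeatures, chunk];                         %#ok<AGROW>
    toyLabels = [toyLabels; repmat(names(k), numPerClass, 1)];  %#ok<AGROW>
end
toyLabels = categorical(toyLabels);

% Same options as above: linear learners, one-vs-all coding, column observations.
toyClassifier = fitcecoc(toyFeatures, toyLabels, ...
    'Learners', 'Linear', 'Coding', 'onevsall', 'ObservationsIn', 'columns');

% predict must be told about the column layout as well.
toyPredictions = predict(toyClassifier, toyFeatures, 'ObservationsIn', 'columns');
toyAccuracy = mean(toyPredictions == toyLabels);
```

Here toyFeatures is 10-by-60, one column per observation, which is the same orientation that 'OutputAs', 'columns' produces for the CNN features.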
Evaluate Classifier
Repeat the procedure used earlier to extract image features from testSet. The test features
can then be passed to the classifier to measure the accuracy of the trained classifier.
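Accuracy is read from a confusion matrix whose rows are normalized to sum to 1. As a small worked sketch of that arithmetic, the raw counts below are hypothetical: they assume 47 test images per category with a single airplane image mistaken for a laptop.

```matlab
% Hypothetical raw counts (rows: true class, columns: predicted class).
rawConfMat = [46  0  1;
               0 47  0;
               0  0 47];

% Divide each row by its total so that every row sums to 1.
normConfMat = rawConfMat ./ sum(rawConfMat, 2);

% Mean per-class accuracy is the average of the diagonal.
meanAccuracy = mean(diag(normConfMat));
fprintf('%.4f\n', meanAccuracy);   % prints 0.9929
```

The first row works out to 46/47 ≈ 0.9787 and 1/47 ≈ 0.0213, and the mean of the diagonal is (0.9787 + 1 + 1)/3 ≈ 0.9929.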
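% Extract test features using the CNN
testFeatures = activations(net, augmentedTestSet, featureLayer, ...
    'MiniBatchSize', 32, 'OutputAs', 'columns');
% Pass CNN image features to the trained classifier
predictedLabels = predict(classifier, testFeatures, 'ObservationsIn', 'columns');
% Get the known labels
testLabels = testSet.Labels;
% Tabulate the results using a confusion matrix
confMat = confusionmat(testLabels, predictedLabels);
% Normalize each row so that it sums to 1
confMat = bsxfun(@rdivide, confMat, sum(confMat, 2))
confMat =
0.9787 0 0.0213
0 1.0000 0
0 0 1.0000
% Display the mean accuracy
mean(diag(confMat))
ans =
0.9929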
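The trained classifier can now be tried on a single image. The sketch below assumes that net, classifier, featureLayer, and testSet exist from the earlier steps; taking the first image of testSet is an arbitrary choice for illustration, and the names newImage, ds, and imageFeatures are hypothetical.

```matlab
% Assumes net, classifier, featureLayer, and testSet already exist.
% Picking the first test image is an arbitrary choice for illustration.
newImage = readimage(testSet, 1);

% Apply the same preprocessing used for training, then extract features.
imageSize = net.Layers(1).InputSize;
ds = augmentedImageDatastore(imageSize(1:2), newImage, ...
    'ColorPreprocessing', 'gray2rgb');
imageFeatures = activations(net, ds, featureLayer, 'OutputAs', 'columns');

% Predict the category with the trained SVM and display it.
label = predict(classifier, imageFeatures, 'ObservationsIn', 'columns')
```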
label =
categorical
airplanes
References
[1] Deng, Jia, et al. "Imagenet: A large-scale hierarchical image database." Computer Vision
and Pattern Recognition, 2009. CVPR 2009. IEEE Conference on. IEEE, 2009.
[2] Krizhevsky, Alex, Ilya Sutskever, and Geoffrey E. Hinton. "Imagenet classification with deep
convolutional neural networks." Advances in neural information processing systems. 2012.
[3] Simonyan, Karen, and Andrew Zisserman. "Very deep convolutional networks for large-scale
image recognition." arXiv preprint arXiv:1409.1556 (2014).
[4] Donahue, Jeff, et al. "Decaf: A deep convolutional activation feature for generic visual
recognition." arXiv preprint arXiv:1310.1531 (2013).