Deep Learning Unit-II
Unit-II
Convolutional Neural Networks
Dr Rajesh Thumma
Assoc. Professor
Anurag University
Syllabus
Deep Learning: Activation functions, initialization,
regularization, batch normalization, model selection,
ensembles.
Convolutional neural networks: Fundamentals, architectures,
striding and padding, pooling layers, CNN - Case study with
MNIST, CNN vs Fully Connected.
Syllabus
Batch Normalization
CNN: Introduction
Striding and Padding
Pooling layers
Structure
Operations and prediction of CNN with layers
CNN - Case study with MNIST
CNN vs Fully Connected
Weight Initialization
Its main objective is to prevent layer activation outputs from exploding or
vanishing during forward propagation. If either problem occurs, the loss
gradients become either too large or too small, and the network takes longer
to converge.
Training the network without a useful weight initialization can lead to very
slow convergence or an inability to converge at all.
The most used weight initialization techniques are:
1. Zero Initialization (Initialized all weights to 0)
2. Random initialization
Why Initialization Matters
Avoiding Symmetry:
If all the weights are initialized to the same value, neurons in each layer will
learn the same features during training, preventing the model from capturing
complex patterns.
Speed of Convergence:
Proper initialization can lead to faster convergence by ensuring that
gradients during backpropagation do not vanish or explode.
Ensuring Effective Learning:
Good initialization helps in utilizing the capacity of the network efficiently,
enabling it to learn from the data properly.
Types of Weight Initialization
• Zero Initialization (Initialized all weights to 0)
All weights are set to zero.
Problem: Leads to symmetry, where all neurons in the layer learn
the same features and gradients, making the model ineffective.
• Random Initialization:
Weights are initialized randomly, typically using a small range of
values.
Example: Uniform distribution between a small range (e.g., -0.05
to 0.05) or a normal distribution with a small standard deviation.
Issue: Random initialization can still lead to issues if not done
carefully, such as vanishing or exploding gradients.
Other common weight initialization techniques:
• Xavier (Glorot) Initialization: weights are drawn with a variance scaled by the
number of input and output units of the layer (fan-in and fan-out); works well
with tanh/sigmoid activations.
• He Initialization: variance scaled by the fan-in only; recommended for ReLU
activations.
• LeCun Initialization: variance 1/fan-in; commonly paired with SELU activations.
• Orthogonal Initialization: the weight matrix is initialized to an orthogonal
matrix, which preserves the norm of signals passed through the layer.
• Pre-trained Initialization: weights are copied from a model already trained on a
related task (transfer learning) instead of being initialized randomly.
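A minimal Keras sketch (not from the slides; the layer sizes are arbitrary examples) showing how these initializers are selected by name when a layer is created:

import tensorflow as tf
from tensorflow.keras import layers, models

model = models.Sequential([
    # Xavier/Glorot: variance scaled by fan-in and fan-out (Keras default for Dense)
    layers.Dense(128, activation='tanh', kernel_initializer='glorot_uniform', input_shape=(784,)),
    # He: variance scaled by fan-in only, recommended with ReLU
    layers.Dense(64, activation='relu', kernel_initializer='he_normal'),
    # LeCun: variance 1/fan-in, commonly paired with SELU
    layers.Dense(64, activation='selu', kernel_initializer='lecun_normal'),
    # Orthogonal: weight matrix initialized to an orthogonal matrix
    layers.Dense(32, activation='relu', kernel_initializer=tf.keras.initializers.Orthogonal()),
    layers.Dense(10, activation='softmax'),
])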
Batch Normalization
Batch normalization normalizes the outputs of a layer over each mini-batch (zero mean
and unit variance per feature/channel) and then scales and shifts them with learnable
parameters. This keeps the distribution of activations stable across layers, which
stabilizes training, allows higher learning rates, and speeds up convergence.
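A minimal Keras sketch (assumed, not from the slides) showing where a BatchNormalization layer is typically placed, between a convolution and its activation:

from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Conv2D(32, kernel_size=3, padding='same', input_shape=(28, 28, 1)),
    layers.BatchNormalization(),   # normalize each channel over the mini-batch
    layers.Activation('relu'),
    layers.MaxPooling2D(pool_size=2),
    layers.Flatten(),
    layers.Dense(10, activation='softmax'),
])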
Computers cannot see things as we do; for a computer, an image is nothing but a matrix of pixel values.
Convolutional Neural Network (CNN)
Order to be followed:
1. Convolutional Layer
2. Pooling Layer
3. Flattening (unrolling)
4. Fully Connected Layer
Note: At the end we can take any number of fully connected layers.
Convolutional Neural Network (Structure)
Why Convolutions?
• Convolutions reduce the number of parameters and speed up the training of the
model significantly
• For example, about 14 million parameters in a fully connected layer can be reduced
to just 156 parameters in the case of a convolutional layer (see the worked comparison below)
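The 14 million / 156 figure matches the standard illustration of a 32x32x3 input mapped to a 28x28x6 output (the sizes are an assumption; the slide does not state them), as the small calculation below shows:

# Hypothetical sizes assumed for the comparison above: 32x32x3 input, 28x28x6 output
fc_params = (32 * 32 * 3) * (28 * 28 * 6)   # fully connected: every input to every output
conv_params = (5 * 5 + 1) * 6               # six 5x5 filters, one bias each
print(fc_params)    # 14450688 ~ 14 million
print(conv_params)  # 156 (counting the 3 input channels per filter would give (5*5*3+1)*6 = 456, still tiny)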
Advantages of convolutional layers over fully connected layers:
1. Parameter sharing: In convolutions, a single filter is convolved over the entire
input. Due to this, the parameters are shared between input and output nodes
2. Sparsity of connections: for each layer, each output value depends on only a small
number of inputs instead of taking all the inputs into account, i.e., the weights of
most of the connections are zero
ANN: the input is a vector. CNN: the input is an image.
Hyperparameters of the convolution operation:
1. Filter (kernel) size: the dimensions of the filter that is convolved over the input matrix.
2. Stride: the number of steps (pixels) the filter is moved horizontally or vertically over the input
matrix. When the stride is 1, we move the filter 1 step (pixel) at a time; when the stride is 2, we
move the filter 2 steps (pixels) at a time, and so on.
3. Padding: the process of adding a border of zeros around the input matrix (default padding is 0).
4. Number of filters: for 3-D (multi-channel) input.
• For 1-Dimensional input (signal): parameters I, II are used
• For 2-Dimensional input (gray image): parameters I, II, III are used
• For 3-Dimensional input (color image): parameters I, II, III, IV are used
2-Dimensional Convolution
Gray Scale Image (1 Channel)
3 0 1 2 7 4
1 5 8 9 3 1
2 7 23 50 250 3
5 55 34 3 1 89
67 45 4 56 34 23
17 13 17 20 23 16
6X6 Matrix
Convolving the 6x6 gray-scale image with a 3x3 filter (stride 1, no padding) gives a 4x4 output:

6x6 image:
3 0 1 2 7 4
1 5 8 9 3 1
2 7 2 5 1 3
0 1 3 1 7 8
4 2 1 6 2 8
2 4 5 2 3 9

3x3 filter:
1 0 -1
1 0 -1
1 0 -1

4x4 output:
 -5  -4   0   8
-10  -2   2   3
  0  -2  -4  -7
 -3  -2  -3 -16
So, we take the first 3x3 matrix from the 6x6 image and multiply it element-wise with the filter. The first
element of the 4x4 output is the sum of these element-wise products, i.e. 3*1 + 0*0 + 1*(-1) + 1*1 + 5*0 +
8*(-1) + 2*1 + 7*0 + 2*(-1) = -5. To calculate the second element, we shift the filter one step to the right
and again take the sum of the element-wise products.
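The same example can be checked with a few lines of NumPy; this is a sketch of the convolution (cross-correlation) loop, not code from the slides:

import numpy as np

image = np.array([[3, 0, 1, 2, 7, 4],
                  [1, 5, 8, 9, 3, 1],
                  [2, 7, 2, 5, 1, 3],
                  [0, 1, 3, 1, 7, 8],
                  [4, 2, 1, 6, 2, 8],
                  [2, 4, 5, 2, 3, 9]])
kernel = np.array([[1, 0, -1],
                   [1, 0, -1],
                   [1, 0, -1]])

out = np.zeros((4, 4), dtype=int)
for i in range(4):           # slide the 3x3 window over all valid positions
    for j in range(4):
        out[i, j] = np.sum(image[i:i+3, j:j+3] * kernel)

print(out[0, 0])  # -5, as computed by hand above
print(out)        # the full 4x4 feature map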
Convolution Operation - Gray Scale Image
(Repeated step-by-step slides: the 3x3 filter slides over the 6x6 image one stride at a time, and each
position produces one element of the 4x4 output shown above.)
3-Dimensional convolution
Colour Image (3 Channels)
Keep in mind that the number of channels in the input and in the filter should be the same.
After convolution, the output is a 4x4 matrix (a single channel).
The first element of the output is the sum of the element-wise products of the 27 values from the input
(9 values from each channel) and the 27 values of the filter.
We then convolve the filter over the entire image in the same way.
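As a small sketch (with assumed random values, not from the slides), one output element of a colour-image convolution is just the sum of 27 products:

import numpy as np

window = np.random.randint(0, 256, size=(3, 3, 3))  # 3x3 patch across the 3 channels
kernel = np.random.randn(3, 3, 3)                   # filter depth must equal input depth
first_output_element = np.sum(window * kernel)      # sum of the 27 element-wise products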
Convolution operation
• Each convolutional layer contains one or more convolutional filters. The number of filters in
each CONV layer determines the depth of the next layer, because each filter produces its own
feature map.
• The CONV layers are the hidden layers. To increase the number of neurons in the hidden
layers, we increase the number of kernels in the CONV layers.
• Each kernel unit is considered a neuron.
• kernel_size is one of the hyperparameters that you set when building a convolutional layer.
CONV layers in Keras
• model.add(Conv2D(filters=16, kernel_size=3, strides=1, padding='same', activation='relu'))
Padding
• Convolving a 6x6 input with a 3x3 filter results in 4x4 output i.e the input size is not
retained after convolution.
Disadvantages:
1. The size of the image shrinks after every convolution operation.
2. Pixels at the corners of the image are used in only a few convolution windows compared
to the central pixels, so the corners contribute less, which can lead to information loss.
To overcome these issues, we use padding, i.e., we pad the image with an additional border
of zeros (one pixel all around the edges). There are two common choices for padding:
• valid: no padding, i.e., p = 0
• same: padding is chosen so that the output size equals the input size, i.e., n+2p-f+1 = n, so p = (f-1)/2

Example: a 3x3 input padded with a one-pixel border of zeros becomes 5x5:
0 0 0 0 0
0 0 1 2 0
0 3 4 5 0
0 6 7 8 0
0 0 0 0 0
Padding
For the input matrix, we add one pixel all around the edges.
This means that the input will be an 8 x 8 matrix (instead of a 6x6 matrix )
Applying convolution of 3 x 3 on 8 x 8 will result in a 6 x 6 matrix which is the original shape of the
image
Input: n x n = 6 x 6
Filter size: f x f = 3 x 3
Padding: p = (f-1)/2 = (3-1)/2 = 1
Output: (n+2p-f+1) x (n+2p-f+1) = (6+2-3+1) x (6+2-3+1) = 6 x 6
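A small helper (an illustration, not from the slides) that implements this output-size formula, with stride included for the general case:

def conv_output_size(n, f, p=0, s=1):
    # output width/height for an n x n input, f x f filter, padding p, stride s
    return (n + 2 * p - f) // s + 1

print(conv_output_size(6, 3, p=0))  # 4 -> 'valid' convolution, 6x6 shrinks to 4x4
print(conv_output_size(6, 3, p=1))  # 6 -> 'same' convolution, the input size is preserved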
Padded 8x8 input (the 6x6 image with a one-pixel border of zeros):
0 0 0 0 0 0 0 0
0 3 0 1 2 7 4 0
0 1 5 8 9 3 1 0
0 2 7 2 5 1 3 0
0 0 1 3 1 7 8 0
0 4 2 1 6 2 8 0
0 2 4 5 2 3 9 0
0 0 0 0 0 0 0 0

3x3 filter:
1 0 -1
1 0 -1
1 0 -1

6x6 output (same size as the original image):
 -5  -5  -6  -1   6  10
-12  -5  -4   0   8  11
-13 -10  -2   2   3  11
-10   0  -2  -4  -7  10
 -7  -3  -2  -3 -16  12
 -6   0  -2   1  -9   5
2. Pooling layer (Pooling operation)
Pooling operation: It is responsible for reducing the spatial size of the Convolved Feature.
This is to decrease the computational power required to process the data by reducing
the dimensions.
Types of Pooling:
1. Max pooling: the maximum element in the pooling window is selected.
2. Min pooling: the minimum element in the pooling window is selected.
3. Avg pooling: the average of all the elements in the pooling window is selected.
Generally we use max pooling or avg pooling.
Hyperparameters for the pooling operation:
1. Filter (window) size - no filter elements/weights are required, only the window size
2. Stride
3. Max pooling or average pooling
2. Pooling layer
• With the usual setting (2x2 window, stride 2), the pooling layer reduces each
spatial dimension of the feature map by a factor of 2.
Ex 1: If the size of the input is 4x4, then after applying the pooling operation
the size of the output is 2x2.
Ex 2: If the size of the input is 6x6, then after applying the pooling operation
the size of the output is 3x3.
Hyperparameters:
1. Filter size -2x2
2. Stride - 2
3. Max pooling or average pooling
Note: Max/Avg pooling preserves the important features
2. Pooling Layer
Max pooling with a 2x2 window and stride 2 applied to the 6x6 feature map:

6x6 input:
 -5  -5  -6  -1   6  10
-12  -5  -4   0   8  11
-13 -10  -2   2   3  11
-10   0  -2  -4  -7  10
 -7  -3  -2  -3 -16  12
 -6   0  -2   1  -9   5

3x3 output (max pooling):
-5  0 11
 0  2 11
 0  1 12
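The pooling result above can be reproduced with a short NumPy sketch (an illustration, not code from the slides): 2x2 window, stride 2, followed by flattening:

import numpy as np

feature_map = np.array([[ -5,  -5,  -6,  -1,   6,  10],
                        [-12,  -5,  -4,   0,   8,  11],
                        [-13, -10,  -2,   2,   3,  11],
                        [-10,   0,  -2,  -4,  -7,  10],
                        [ -7,  -3,  -2,  -3, -16,  12],
                        [ -6,   0,  -2,   1,  -9,   5]])

pooled = np.zeros((3, 3), dtype=int)
for i in range(3):
    for j in range(3):
        pooled[i, j] = feature_map[2*i:2*i+2, 2*j:2*j+2].max()

print(pooled)            # [[-5  0 11] [ 0  2 11] [ 0  1 12]]
print(pooled.flatten())  # the unrolled 9-element vector fed to the fully connected layers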
The POOL layer has the following attributes that we need to configure: the pooling window size, the stride, and the pooling type (max or average).
3. Flattening (Unrolling)
Ex: the 3x3 output of the pooling layer
-5  0 11
 0  2 11
 0  1 12
is unrolled (flattened) into a single column vector of 9 values:
-5, 0, 11, 0, 2, 11, 0, 1, 12
This flattened vector becomes the input layer of the fully connected part of the network
(input layer → hidden layers → output layer).
4. Fully Connected Layer
Fully-Connected Layer (FC layer): the flattened vector is fed as input to the
fully connected layer to classify the image. CNNs use fully-connected layers in
which each element of the flattened vector is treated as a separate neuron, just
like in a regular neural network. The last fully-connected layer contains as many
neurons as the number of classes to be predicted.
Applications:
• Image Classification
• Image Recognition
• Document Classification
• Medical Image Analysis
• Automatic Image Captioning
Digit Classification
Brain Tumor Classification
CNN Architecture
IMAGE → CONV → NON-LINEARITY (ReLU) → POOLING → CONV → NON-LINEARITY (ReLU) → POOLING → FULLY CONNECTED LAYER → SOFTMAX → CLASS
1. Handwritten Digit Recognition - MNIST dataset
(multi-class classification, gray-scale images, 2D convolution)
• This dataset consists of 70000 images where each image is a handwritten digit,
out of which 60000 are for training and 10000 are for testing.
• The 70000 handwritten digits are divided into 10 classes: the digits
0, 1, 2, 3, 4, 5, 6, 7, 8, 9.
• The size of each image is a 28x28 pixel square (784 pixels in total).
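These shapes can be confirmed with the built-in Keras loader (a sketch, assuming TensorFlow/Keras is installed):

import numpy as np
from tensorflow.keras.datasets import mnist

(x_train, y_train), (x_test, y_test) = mnist.load_data()
print(x_train.shape)       # (60000, 28, 28) - 60000 training images of 28x28 pixels
print(x_test.shape)        # (10000, 28, 28) - 10000 test images
print(np.unique(y_train))  # [0 1 2 3 4 5 6 7 8 9] - the 10 digit classes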
MNIST(Sample dataset)
2. Object Recognition in Photographs - CIFAR-10 dataset
(multi-class classification, color images, 3D convolution)
• CIFAR (Canadian Institute For Advanced Research) dataset: a large database of
photographs of objects that is used for training various image processing systems.
• The CIFAR-10 consists of tiny colour images
• The database is also widely used for training and testing in the field of
deep learning.
• This dataset consists of 60000 photos where each object is a photograph, out of
which 50000 are for training and 10000 are for testing.
• 60000 photographs are divided into 10 classes. Classes include objects such as
airplanes, automobiles, birds, cats, frog, horse, ship, truck etc
• Size of each image is a 32x32 pixel square (1024 pixels per channel, with 3 color channels)
CIFAR10 ( Sample Dataset )
3. Predict Sentiment from Movie Reviews-IMDB dataset
( Binary classification, 1D convolution)
• The Internet Movie Database (IMDB) is a huge repository for image and text
data.
• The database is an excellent source for data analytics and deep learning
practice and research
• This is a dataset for binary sentiment classification consisting of
50,000 highly polar movie reviews.
• The 50,000 movie reviews are divided into 2 classes: positive reviews
and negative reviews.
• The dataset provides 25,000 highly polar movie reviews for training and
25,000 for testing.
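Keras also ships a loader for this dataset; in the sketch below (an illustration, not from the slides) each review comes back as a sequence of word indices and each label is 0 (negative) or 1 (positive):

from tensorflow.keras.datasets import imdb

(x_train, y_train), (x_test, y_test) = imdb.load_data(num_words=10000)  # keep the 10000 most frequent words
print(len(x_train), len(x_test))  # 25000 25000
print(x_train[0][:10])            # first review as a list of word indices
print(y_train[0])                 # 1 = positive, 0 = negative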
IMDB ( Sample Dataset )
S.No | Review | Sentiment
1 | This movie features Charlie Spradling dancing in a strip club. Beyond that, it features a truly bad script with dull, unrealistic dialogue. That it got as many positive votes suggests some people may be joking. | Negative
2 | If you like original gut wrenching laughter you will like this movie. If you are young or old then you will love this movie, hell even my mom liked it. Great Camp!!! | Positive
3 | It's terrific when a funny movie doesn't make smile you. What a pity!! This film is very boring and so long. It's simply painfull. The story is staggering without goal and no fun. You feel better when it's finished. | Negative
4 | I have seen this film at least 100 times and I am still excited by it, the acting is perfect and the romance between Joe and Jean keeps me on the edge of my seat, plus I still think Bryan Brown is the tops. Brilliant Film. | Positive
Datasets - MNIST / CIFAR10 / IMDB
S.No | Dataset | Type of dataset | No. of classes | Class labels | Classification | Convolution
1 | MNIST | images of handwritten digits (gray-scale images) | 10 | digits 0-9 | multiclass | 2D
2 | CIFAR10 | images of objects (color images) | 10 | airplane, automobile, dog, deer, cat, truck, ship, horse, bird, frog | multiclass | 3D
3 | IMDB | movie reviews (text) | 2 | positive, negative | binary | 1D
# FC_2: output a softmax to squash the matrix into output probabilities for the 10 classes
model.add(Dense(10, activation='softmax'))
number of params = (kernel height x kernel width x depth of the previous layer + 1) x number of filters (the +1 accounts for the bias of each filter)
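The parameter counts worked out below correspond to a model of the following shape. This is a reconstruction (a sketch, assuming padding='same' so that the spatial size is preserved before each pooling step), not the exact code from the slides:

from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Conv2D(32, kernel_size=3, padding='same', activation='relu',
                  input_shape=(28, 28, 1)),   # ((3*3*1)+1)*32 = 320 params
    layers.MaxPooling2D(pool_size=2),         # 28x28 -> 14x14, no params
    layers.Conv2D(64, kernel_size=3, padding='same', activation='relu'),
                                              # ((3*3*32)+1)*64 = 18496 params
    layers.MaxPooling2D(pool_size=2),         # 14x14 -> 7x7, no params
    layers.Flatten(),                         # 7*7*64 = 3136 values
    layers.Dense(64, activation='relu'),      # 3136*64 + 64 = 200768 params
    # FC_2: output a softmax to squash the matrix into output probabilities for the 10 classes
    layers.Dense(10, activation='softmax'),   # 64*10 + 10 = 650 params
])
model.summary()                               # Total params: 220,234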
Model summary (parameter counts and output sizes):
Conv2d_1: ((3*3*1) + 1) * 32 = 320 parameters
MaxPooling2d_1: output size = ((28-2)/2) + 1 = 14, i.e., 14x14 (no parameters)
Conv2d_2: ((3*3*32) + 1) * 64 = 18496 parameters
MaxPooling2d_2: output size = ((14-2)/2) + 1 = 7, i.e., 7x7 (no parameters)
Flatten: 7*7*64 = 3136 values (no parameters)
Dense_1: 64*3136 + 1*64 = 200768 parameters
Dense_2: 10*64 + 1*10 = 650 parameters
Total params: 320 + 18496 + 200768 + 650 = 220234
CNN vs Fully Connected
The basic difference between the two types of layers is the density of the
connections. The FC layers are densely connected, meaning that every neuron
in the output is connected to every input neuron. On the other hand, in a Conv
layer, the neurons are not densely connected but are connected only to
neighboring neurons within the width of the convolutional kernel.