DeepLearning Unit-II

Uploaded by

D44 SREETEJA
Copyright © All Rights Reserved
Deep Learning

Unit-II
Convolutional Neural Networks
Dr Rajesh Thumma
Assoc. Professor
Anurag University
Syllabus
Deep Learning: Activation functions, initialization,
regularization, batch normalization, model selection,
ensembles.
Convolutional neural networks: Fundamentals, architectures,
striding and padding, pooling layers, CNN case study with
MNIST, CNN vs Fully Connected.
Syllabus
• Batch Normalization
• CNN: Introduction
• Striding and Padding
• Pooling layers
• Structure
• Operations and prediction of CNN with layers
• CNN - Case study with MNIST
• CNN vs Fully Connected
Weight Initialization
Its main objective is to prevent layer activation outputs from exploding or
vanishing during forward propagation (and, as a consequence, gradients from exploding
or vanishing during backpropagation). If either problem occurs, loss gradients will be
either too large or too small, and the network will take longer to converge.
Training the network without a suitable weight initialization can lead to very
slow convergence or an inability to converge.
The most common weight initialization techniques are:
1. Zero initialization (all weights initialized to 0)
2. Random initialization
Why Initialization Matters

Avoiding Symmetry:
If all the weights are initialized to the same value, neurons in each layer will
learn the same features during training, preventing the model from capturing
complex patterns.
Speed of Convergence:
Proper initialization can lead to faster convergence by ensuring that
gradients during backpropagation do not vanish or explode.
Ensuring Effective Learning:
Good initialization helps in utilizing the capacity of the network efficiently,
enabling it to learn from the data properly.
Types of Weight Initialization
• Zero Initialization (all weights initialized to 0)
All weights are set to zero.
Problem: this creates symmetry, so all neurons in a layer receive the same
gradients and learn the same features, making the model ineffective.
• Random Initialization
Weights are initialized randomly, typically from a small range of values.
Example: a uniform distribution over a small range (e.g., -0.05 to 0.05) or a
normal distribution with a small standard deviation.
Issue: if the scale is not chosen carefully, random initialization can still
cause vanishing or exploding gradients.
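The symmetry problem can be checked numerically. The sketch below assumes a toy one-hidden-layer tanh network without biases (invented purely for illustration, with a hypothetical helper `hidden_weight_grads`) and compares the gradients that zero, constant and small random initializations give to the hidden weights.

```python
import math
import random


def hidden_weight_grads(W, v, x, target):
    """Gradients of L = 0.5 * (y - target)**2 w.r.t. the hidden weights W of a
    toy network y = sum_j v[j] * tanh(W[j] . x) (hypothetical, no biases)."""
    h = [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in W]
    y = sum(vj * hj for vj, hj in zip(v, h))
    err = y - target
    # Chain rule: dL/dW[j][i] = err * v[j] * (1 - h[j]**2) * x[i]
    return [[err * v[j] * (1 - h[j] ** 2) * xi for xi in x] for j in range(len(W))]


x, target = [1.0, -2.0, 0.5], 1.0
n_hidden, n_in = 4, len(x)

# Zero initialization: the output weights are 0, so the hidden weights get no gradient at all.
g_zero = hidden_weight_grads([[0.0] * n_in for _ in range(n_hidden)], [0.0] * n_hidden, x, target)

# Any constant initialization: every hidden unit gets exactly the same gradient,
# so the units stay identical after every update (symmetry is never broken).
g_const = hidden_weight_grads([[0.5] * n_in for _ in range(n_hidden)], [0.5] * n_hidden, x, target)

# Small random initialization in [-0.05, 0.05] gives each unit a different gradient.
rng = random.Random(0)
W_rand = [[rng.uniform(-0.05, 0.05) for _ in range(n_in)] for _ in range(n_hidden)]
v_rand = [rng.uniform(-0.05, 0.05) for _ in range(n_hidden)]
g_rand = hidden_weight_grads(W_rand, v_rand, x, target)

print(all(g == 0 for row in g_zero for g in row))   # True
print(all(row == g_const[0] for row in g_const))    # True
print(all(row == g_rand[0] for row in g_rand))      # False
```

Real networks also have biases and deeper stacks, but the same reasoning applies: identical starting weights produce identical updates, which is why some randomness is needed.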
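weight initialization
• Xavier (Glorot) Initialization: weights are drawn with variance 2/(fan_in + fan_out),
where fan_in and fan_out are the numbers of input and output units of the layer;
commonly used with tanh or sigmoid activations.
• He Initialization: weights are drawn with variance 2/fan_in; designed for ReLU activations.
• LeCun Initialization: weights are drawn with variance 1/fan_in.
• Orthogonal Initialization: the weight matrix is initialized as a random (optionally
scaled) orthogonal matrix.
• Pre-trained Initialization: the weights are copied from a model already trained on a
related task (transfer learning).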
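A minimal stdlib sketch of the variance-scaling rules listed above, plus an orthogonal initializer built with Gram-Schmidt. The function names are hypothetical; this version draws from a plain (untruncated) normal distribution, whereas library versions (for example in Keras) typically use truncated-normal or uniform variants with the same target variance.

```python
import math
import random


def _gaussian_matrix(fan_in, fan_out, std, rng):
    # Weight matrix with fan_out rows (neurons) and fan_in columns (inputs).
    return [[rng.gauss(0.0, std) for _ in range(fan_in)] for _ in range(fan_out)]


def xavier_normal(fan_in, fan_out, rng):
    # Glorot/Xavier: Var(W) = 2 / (fan_in + fan_out)
    return _gaussian_matrix(fan_in, fan_out, math.sqrt(2.0 / (fan_in + fan_out)), rng)


def he_normal(fan_in, fan_out, rng):
    # He: Var(W) = 2 / fan_in, intended for ReLU layers
    return _gaussian_matrix(fan_in, fan_out, math.sqrt(2.0 / fan_in), rng)


def lecun_normal(fan_in, fan_out, rng):
    # LeCun: Var(W) = 1 / fan_in
    return _gaussian_matrix(fan_in, fan_out, math.sqrt(1.0 / fan_in), rng)


def orthogonal(n, rng):
    # Modified Gram-Schmidt on random Gaussian vectors -> n orthonormal rows.
    rows = []
    while len(rows) < n:
        v = [rng.gauss(0.0, 1.0) for _ in range(n)]
        for q in rows:
            dot = sum(a * b for a, b in zip(v, q))
            v = [a - dot * b for a, b in zip(v, q)]
        norm = math.sqrt(sum(a * a for a in v))
        if norm > 1e-8:  # skip a (practically impossible) degenerate draw
            rows.append([a / norm for a in v])
    return rows


def variance(matrix):
    values = [w for row in matrix for w in row]
    mean = sum(values) / len(values)
    return sum((w - mean) ** 2 for w in values) / len(values)


rng = random.Random(42)
W = he_normal(256, 128, rng)
print(len(W), len(W[0]))      # 128 256
print(variance(W) * 256)      # should be close to 2 (He: Var = 2 / fan_in)
Q = orthogonal(4, rng)
print(sum(a * b for a, b in zip(Q[0], Q[1])))  # close to 0: rows are orthogonal
```

The only difference between Xavier, He and LeCun here is the target variance; which one fits best depends mainly on the activation function of the layer.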
Batch Normalization

Batch normalization is a technique used in machine learning, especially in training
deep neural networks. Its main goal is to improve the speed, performance, and stability
of a neural network. Here is a simple explanation.
Batch Normalization
Imagine you're in a classroom with students taking a test. Some
students might finish very quickly, while others might take a lot
longer. If the teacher gives some help to the slower students and
keeps the faster students from going too fast, everyone can stay on
track and learn better together.
In batch normalization, the "students" are the individual units (or
neurons) in a neural network. During training, these neurons can
sometimes produce outputs (answers) that are too high or too low,
which can slow down the learning process. Batch normalization helps
by adjusting the outputs to be more balanced, so the network can
learn faster and more efficiently.
Batch Normalization
In technical terms, batch normalization works by:
Calculating the mean and variance of the outputs of neurons for a mini-
batch (a small group of data points).
Using these statistics to normalize the outputs, so they have a mean of 0 and
a variance of 1.
Scaling and shifting these normalized outputs with learnable parameters to
ensure the network can still learn complex patterns.
This helps the neural network train faster and can improve its overall
performance.
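The three steps above can be sketched in plain Python for one mini-batch. This is a training-time forward pass only, with a hypothetical function name; the small constant `eps` (an assumption here, commonly around 1e-5 or 1e-3) avoids division by zero. At inference time, frameworks normally use running averages of the mean and variance instead of the current batch statistics.

```python
import math


def batch_norm_forward(batch, gamma, beta, eps=1e-5):
    """Training-time batch normalization.

    batch: list of samples; each sample is a list of neuron outputs (features).
    gamma, beta: learnable scale and shift, one value per feature.
    """
    n = len(batch)
    n_features = len(batch[0])
    out = [[0.0] * n_features for _ in range(n)]
    for j in range(n_features):
        column = [sample[j] for sample in batch]
        mean = sum(column) / n                              # mini-batch mean
        var = sum((v - mean) ** 2 for v in column) / n      # mini-batch variance
        for i, v in enumerate(column):
            x_hat = (v - mean) / math.sqrt(var + eps)       # normalize: mean 0, variance ~1
            out[i][j] = gamma[j] * x_hat + beta[j]          # learnable scale and shift
    return out


# A mini-batch of 4 samples from 2 neurons whose outputs have very different scales.
batch = [[1.0, 200.0], [2.0, 220.0], [3.0, 240.0], [4.0, 260.0]]
normalized = batch_norm_forward(batch, gamma=[1.0, 1.0], beta=[0.0, 0.0])
for row in normalized:
    print([round(v, 3) for v in row])  # both neurons now lie on the same scale
```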
Convolutional Neural Networks (Conv Net)
• Convolutional Neural Network (CNN) is a type of Deep Learning neural network
architecture commonly used in Computer Vision. Computer vision is a field of Artificial
Intelligence that enables a computer to understand and interpret the image or visual data.
• The role of a ConvNet is to transform images into a form that is easier to process, without
losing features that are critical for prediction.
• It uses a special technique called Convolution.
Convolutional Neural Networks (Conv Net)
In a CNN, every image is represented as an array of pixel values.
• Computers cannot see things as we do; for a computer, an image is nothing but a matrix of numbers.
Convolutional Neural Network(CNN)

Order to be followed
1. Convolutional layer
2. Pooling layer (optional)
3. Flattening (unrolling)
4. Fully connected layer
Note 1: The first layer must be a convolutional layer.
Note 2: At the end, we can use any number of fully connected layers.
Convolutional Neural Network (Structure)
Why Convolutions?
• Convolutions reduce the number of parameters and significantly speed up the training
of the model.
• For example, a layer that needs about 14 million parameters when fully connected can
need as few as 156 parameters as a convolutional layer.
Advantages of convolutional layers over fully connected layers:
1. Parameter sharing: in a convolution, a single filter is convolved over the entire
input, so the same filter weights are reused at every position instead of each
input-output pair having its own weight.
2. Sparsity of connections: in each layer, each output value depends only on a small
number of inputs (the window under the filter) instead of on all the inputs; i.e., most
of the connections of the equivalent fully connected layer would have zero weight.
ANN (input is a vector):
In an ANN, all the layers are fully connected, so there are millions of
parameters/weights (a separate weight on each connection), which increases
computational complexity.
CNN (input is an image):
In a CNN, the number of parameters/weights is independent of the size of the image;
it depends on the filter size.
Advantages of CNN:
1. Sparsity of connections
2. Parameter sharing
(In the figure, the bolded connections have 0 weights.)
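To make the comparison concrete, the sketch below counts parameters for a hypothetical layer that maps a gray-scale image to 32 feature maps of the same size, once as a fully connected layer and once as 32 'same'-padded 3x3 convolution filters. The helper names are illustrative, and the sizes are chosen only for the example.

```python
def dense_params(n_inputs, n_outputs):
    # every input is connected to every output (one weight each) + one bias per output
    return n_inputs * n_outputs + n_outputs


def conv_params(kernel_size, in_channels, n_filters):
    # each filter: kernel_size x kernel_size x in_channels shared weights + one bias
    return (kernel_size * kernel_size * in_channels + 1) * n_filters


for side in (28, 224):
    n_in = side * side           # gray-scale image flattened into a vector
    n_out = side * side * 32     # as many outputs as 32 'same'-padded 3x3 filters produce
    print(f"{side}x{side}: dense={dense_params(n_in, n_out):,}  conv={conv_params(3, 1, 32)}")
# first line: 28x28: dense=19,694,080  conv=320
```

The convolutional count stays at 320 for both image sizes, which is the "independent of the size of the image" property stated above.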
1. Convolution Layer
Convolution operation:
• Filter/Kernel: when this kernel (K) is convolved with the input
image F(x,y), it creates a new convolved image, also known as a
feature map; with an edge-detecting kernel, the feature map
amplifies the edges.
• Other filters can be applied to detect different types of features.
For example, some filters detect horizontal edges, others detect
vertical edges, and others detect more complex shapes such as
corners.
Convolution Layer
Convolution operation: hyperparameters
1. Filter/Kernel: a weight matrix.
2. Stride: the number of steps (pixels) by which the filter is moved horizontally or vertically over the input
matrix. With stride 1 we move the filter 1 step (pixel) at a time, with stride 2 we move it 2 steps (pixels)
at a time, and so on.
3. Padding: adding a border of zeros around the input matrix (default padding is 0).
4. Number of filters: used for 3D input.
• For 1-dimensional input (signal): hyperparameters 1 and 2 are used
• For 2-dimensional input (gray image): hyperparameters 1, 2 and 3 are used
• For 2-dimensional input (color image): hyperparameters 1, 2, 3 and 4 are used
2-Dimensional Convolution
Gray Scale Image (1 Channel)

3 0 1 2 7 4
1 5 8 9 3 1
2 7 23 50 250 3
5 55 34 3 1 89
67 45 4 56 34 23
17 13 17 20 23 16

6X6 Matrix

Each Pixel value is in the range of 0-255

• Conversion of gray scale image into a 2-dimensional matrix of pixels


2-Dimensional Convolution
Convolution Operation: Gray Scale Image (2D)

Quantity | General form | Gray scale image example
input size (2D) | nx x ny | 6 x 6
filter size | f x f | 3 x 3
stride | s = 1, 2, 3, … | s = 1 (default)
padding | p = 0, 1, 2, … | p = 0 (default)
output size (2D) | ((nx + 2p - f)/s + 1) x ((ny + 2p - f)/s + 1) | ((6 + 2*0 - 3)/1 + 1) x ((6 + 2*0 - 3)/1 + 1) = 4 x 4
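The output-size formula from the table can be written as a one-line helper (hypothetical name). When (n + 2p - f) is not divisible by s, common frameworks round down, so this sketch uses integer floor division; the same formula also gives the size after pooling.

```python
def conv_output_size(n, f, s=1, p=0):
    """Output length along one axis: floor((n + 2p - f) / s) + 1."""
    return (n + 2 * p - f) // s + 1


print(conv_output_size(6, 3))        # 4  -> 6x6 input, 3x3 filter, s=1, p=0 gives 4x4
print(conv_output_size(6, 3, p=1))   # 6  -> 'same' padding, p = (f - 1) / 2
print(conv_output_size(6, 3, s=2))   # 2
print(conv_output_size(28, 2, s=2))  # 14 -> 2x2 pooling with stride 2 on 28x28
```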
Convolution Operation - Gray Scale Image
Gray scale image (6x6) * Filter (3x3) = Output (4x4)

Gray scale image (6x6):
3 0 1 2 7 4
1 5 8 9 3 1
2 7 2 5 1 3
0 1 3 1 7 8
4 2 1 6 2 8
2 4 5 2 3 9

Filter (3x3):
1 0 -1
1 0 -1
1 0 -1

Output (4x4):
-5 -4 0 8
-10 -2 2 3
0 -2 -4 -7
-3 -2 -3 -16

So, we take the first 3 x 3 window of the 6 x 6 image and multiply it element-wise with the filter. The first element of the
4 x 4 output is the sum of these element-wise products, i.e. 3*1 + 0*0 + 1*(-1) + 1*1 + 5*0 + 8*(-1) +
2*1 + 7*0 + 2*(-1) = -5. To calculate the second element, we shift the filter one step to the right and
again take the sum of the element-wise products; sliding the filter across every row in the same way fills the rest of the output.
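The worked example can be reproduced with a short 'valid' convolution in plain Python (the function name is hypothetical). Like deep-learning libraries, it slides the kernel without flipping it, which is strictly a cross-correlation.

```python
def conv2d_valid(image, kernel, stride=1):
    """'Valid' (no padding) 2D convolution as computed in CNN layers (no kernel flip)."""
    f = len(kernel)
    out_h = (len(image) - f) // stride + 1
    out_w = (len(image[0]) - f) // stride + 1
    output = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            # sum of the element-wise product of the kernel and the f x f window
            row.append(sum(image[r * stride + i][c * stride + j] * kernel[i][j]
                           for i in range(f) for j in range(f)))
        output.append(row)
    return output


image = [
    [3, 0, 1, 2, 7, 4],
    [1, 5, 8, 9, 3, 1],
    [2, 7, 2, 5, 1, 3],
    [0, 1, 3, 1, 7, 8],
    [4, 2, 1, 6, 2, 8],
    [2, 4, 5, 2, 3, 9],
]
kernel = [
    [1, 0, -1],
    [1, 0, -1],
    [1, 0, -1],
]
for row in conv2d_valid(image, kernel):
    print(row)
# [-5, -4, 0, 8]
# [-10, -2, 2, 3]
# [0, -2, -4, -7]
# [-3, -2, -3, -16]
```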
3-Dimensional convolution
Colour Image (3 Channels)

Each pixel value is in the range 0-255.

Convolution operation - Color image
input: 6 x 6 x 3 (3D)
filter: 3 x 3 x 3 (3D)
output: 4 x 4 (2D)

Keep in mind that the number of channels in the input and in the filter must be the same.
After convolution, the output is a 4 x 4 matrix.
The first element of the output is the sum of the element-wise products of the 27 values from the input (9
values from each channel) and the 27 values from the filter.
After that, we slide the filter over the entire image in the same way.
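A sketch of the color-image case, under stated assumptions: the input and filter are stored channel-first (C x H x W) for readability (Keras uses channels-last by default), the pixel and filter values are random placeholders, and the helper names are hypothetical. It checks that one 3x3x3 filter yields a single 4x4 map equal to the sum of the three per-channel 2D convolutions.

```python
import random


def conv2d_valid(channel, kernel):
    # single-channel 'valid' convolution (stride 1, no kernel flip)
    f = len(kernel)
    return [[sum(channel[r + i][c + j] * kernel[i][j] for i in range(f) for j in range(f))
             for c in range(len(channel[0]) - f + 1)]
            for r in range(len(channel) - f + 1)]


def conv_multichannel(volume, kernel):
    """volume: C x H x W input, kernel: C x f x f filter -> one 2D feature map."""
    if len(volume) != len(kernel):
        raise ValueError("input and filter must have the same number of channels")
    f = len(kernel[0])
    height, width = len(volume[0]), len(volume[0][0])
    # each output element sums C * f * f products (27 for a 3 x 3 x 3 filter)
    return [[sum(volume[ch][r + i][c + j] * kernel[ch][i][j]
                 for ch in range(len(volume)) for i in range(f) for j in range(f))
             for c in range(width - f + 1)]
            for r in range(height - f + 1)]


rng = random.Random(0)
image = [[[rng.randint(0, 255) for _ in range(6)] for _ in range(6)] for _ in range(3)]  # 3 x 6 x 6
filt = [[[rng.randint(-1, 1) for _ in range(3)] for _ in range(3)] for _ in range(3)]   # 3 x 3 x 3

out = conv_multichannel(image, filt)
print(len(out), len(out[0]))   # 4 4  -> a single 2D feature map

# Same result as convolving each channel separately and adding the three maps.
per_channel = [conv2d_valid(image[ch], filt[ch]) for ch in range(3)]
summed = [[sum(m[r][c] for m in per_channel) for c in range(4)] for r in range(4)]
print(out == summed)           # True
```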
Convolution operation
• Each convolutional layer contains one or more convolutional filters. The number of filters in
each CONV layer determines the depth of the next layer, because each filter produces its own
feature map.
• The CONV layers are the hidden layers. To increase the number of neurons in the hidden
layers, we increase the number of kernels in the CONV layers.
• Each kernel unit is considered a neuron.
• kernel_size is one of the hyperparameters that you set when building a
convolutional layer.
CONV layers in Keras
from keras.layers import Conv2D
model.add(Conv2D(filters=16, kernel_size=3, strides=1, padding='same', activation='relu'))
Padding
• Convolving a 6x6 input with a 3x3 filter results in a 4x4 output, i.e., the input size is not
retained after convolution.
Disadvantages:
1. The image shrinks after every convolution operation.
2. Pixels in the corners of the image are used only a few times compared with central
pixels, so the corners receive little attention, which can lead to information loss.
To overcome these issues, we use padding, i.e., we pad the image with an additional border:
we add one pixel all around the edges. For example, a 3x3 input padded with p = 1 becomes:
0 0 0 0 0
0 0 1 2 0
0 3 4 5 0
0 6 7 8 0
0 0 0 0 0
There are two common choices for padding:
• valid: no padding, i.e., p = 0
• same: we pad so that the output size is the same as the input size, i.e., n + 2p - f + 1 = n, so p = (f - 1)/2
Padding
For the input matrix, we add one pixel all around the edges.
This means that the input becomes an 8 x 8 matrix (instead of a 6 x 6 matrix).
Applying a 3 x 3 convolution to the 8 x 8 matrix results in a 6 x 6 matrix, which is the original shape of the
image.
Input: n x n = 6 x 6
Filter size: f x f = 3 x 3
Padding: p = (f-1)/2 = (3-1)/2 = 1
Output: (n+2p-f+1) x (n+2p-f+1) = (6+2-3+1) x (6+2-3+1) = 6 x 6

Padded input (8x8):
0 0 0 0 0 0 0 0
0 3 0 1 2 7 4 0
0 1 5 8 9 3 1 0
0 2 7 2 5 1 3 0
0 0 1 3 1 7 8 0
0 4 2 1 6 2 8 0
0 2 4 5 2 3 9 0
0 0 0 0 0 0 0 0

Filter (3x3):
1 0 -1
1 0 -1
1 0 -1

Output (6x6):
-5 -5 -6 -1 6 10
-12 -5 -4 0 8 11
-13 -10 -2 2 3 11
-10 0 -2 -4 -7 10
-7 -3 -2 -3 -16 12
-6 0 -2 1 -9 5
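A small stdlib sketch of 'same' padding (helper names are hypothetical): pad with p = (f - 1)/2 zeros, then apply a 'valid' convolution. It reproduces the 6 x 6 output above, and its central 4 x 4 block equals the unpadded result from the earlier example.

```python
def zero_pad(image, p):
    """Surround a 2D list with p rows/columns of zeros on every side."""
    width = len(image[0]) + 2 * p
    return ([[0] * width for _ in range(p)]
            + [[0] * p + list(row) + [0] * p for row in image]
            + [[0] * width for _ in range(p)])


def conv2d_valid(image, kernel):
    # 'valid' convolution, stride 1, no kernel flip
    f = len(kernel)
    return [[sum(image[r + i][c + j] * kernel[i][j] for i in range(f) for j in range(f))
             for c in range(len(image[0]) - f + 1)]
            for r in range(len(image) - f + 1)]


image = [
    [3, 0, 1, 2, 7, 4],
    [1, 5, 8, 9, 3, 1],
    [2, 7, 2, 5, 1, 3],
    [0, 1, 3, 1, 7, 8],
    [4, 2, 1, 6, 2, 8],
    [2, 4, 5, 2, 3, 9],
]
kernel = [[1, 0, -1], [1, 0, -1], [1, 0, -1]]

f = len(kernel)
p = (f - 1) // 2                  # 'same' padding: p = (3 - 1) / 2 = 1
padded = zero_pad(image, p)       # 8 x 8
output = conv2d_valid(padded, kernel)
print(len(padded), len(output))   # 8 6
for row in output:
    print(row)
# [-5, -5, -6, -1, 6, 10]
# [-12, -5, -4, 0, 8, 11]
# [-13, -10, -2, 2, 3, 11]
# [-10, 0, -2, -4, -7, 10]
# [-7, -3, -2, -3, -16, 12]
# [-6, 0, -2, 1, -9, 5]
```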
2. Pooling layer (Pooling operation)
Pooling operation: it reduces the spatial size of the convolved feature map.
This decreases the computational power required to process the data by reducing
its dimensions.
Types of pooling:
1. Max pooling: the maximum element in the pooling window is selected
2. Min pooling: the minimum element in the pooling window is selected
3. Average pooling: the average of all the elements in the pooling window is selected
Generally, we use max pooling or average pooling.
Hyperparameters for the pooling operation:
1. Filter size (the filter has no elements/weights to learn)
2. Stride
3. Max pooling or average pooling
2. Pooling layer
• With the usual settings (2x2 filter, stride 2), the pooling layer reduces the height
and width of each feature map by a factor of 2.
Ex 1: If the input is 4x4, the output after pooling is 2x2.
Ex 2: If the input is 6x6, the output after pooling is 3x3.
Hyperparameters:
1. Filter size - 2x2
2. Stride - 2
3. Max pooling or average pooling
Note: max/average pooling preserves the important features.
2. Pooling Layer
Input feature map (6x6):
-5 -5 -6 -1 6 10
-12 -5 -4 0 8 11
-13 -10 -2 2 3 11
-10 0 -2 -4 -7 10
-7 -3 -2 -3 -16 12
-6 0 -2 1 -9 5

Max pooling (2x2 filter, stride 2) gives a 3x3 output:
-5 0 11
0 2 11
0 1 12
The POOL layer has the following attributes that we need to configure:
from keras.layers import MaxPooling2D
model.add(MaxPooling2D(pool_size=(2, 2), strides=2))
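A plain-Python sketch of the pooling types described above (the function name and the `mode` strings are hypothetical), applied to the 6x6 feature map from the example. Note that pooling has no weights: it only takes the max, min or average of each window.

```python
def pool2d(feature_map, size=2, stride=2, mode="max"):
    """Max / min / average pooling over size x size windows (no weights)."""
    reducers = {"max": max, "min": min, "avg": lambda xs: sum(xs) / len(xs)}
    reduce_window = reducers[mode]
    out_h = (len(feature_map) - size) // stride + 1
    out_w = (len(feature_map[0]) - size) // stride + 1
    return [[reduce_window([feature_map[r * stride + i][c * stride + j]
                            for i in range(size) for j in range(size)])
             for c in range(out_w)]
            for r in range(out_h)]


fmap = [
    [-5, -5, -6, -1, 6, 10],
    [-12, -5, -4, 0, 8, 11],
    [-13, -10, -2, 2, 3, 11],
    [-10, 0, -2, -4, -7, 10],
    [-7, -3, -2, -3, -16, 12],
    [-6, 0, -2, 1, -9, 5],
]
for row in pool2d(fmap):
    print(row)
# [-5, 0, 11]
# [0, 2, 11]
# [0, 1, 12]
print(pool2d(fmap, mode="avg")[0][0])  # -6.75
```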


3. Flattening Layer
Flattening layer: Flattening layer is used to convert the resultant 2-
Dimensional arrays from pooled feature maps into a single continuous linear
vector.
3. Flattening Layer
Flattening (unrolling): conversion of a matrix into a vector.
Ex: the output of the pooling layer
-5 0 11
0 2 11
0 1 12
is unrolled row by row into the input to the fully connected layer:
[-5, 0, 11, 0, 2, 11, 0, 1, 12]
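Unrolling row by row is a single list comprehension in Python; this short sketch simply reproduces the example above.

```python
pooled = [
    [-5, 0, 11],
    [0, 2, 11],
    [0, 1, 12],
]
# Unroll row by row: every element becomes one input neuron of the FC layer.
flat = [value for row in pooled for value in row]
print(flat)       # [-5, 0, 11, 0, 2, 11, 0, 1, 12]
print(len(flat))  # 9
```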


4. Fully Connected Layer
[Figure: the flattened output of the pooling layer (-5, 0, 11, …, 12) forms the input layer, followed by hidden layers and the output layer]
4. Fully Connected Layer
Fully-Connected Layer (FC layer): the flattened vector is fed as input to
the fully connected layers to classify the image. In these layers, each element
of the flattened vector is treated as a separate input neuron, just as in a
regular neural network. The last fully connected layer contains as many
neurons as there are classes to be predicted.
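A minimal sketch of the final classification step, assuming 10 classes and hypothetical, untrained random weights (a real network learns them by backpropagation). Each output neuron is connected to every element of the flattened vector, and softmax turns the 10 scores into probabilities that sum to 1.

```python
import math
import random


def dense(x, weights, biases):
    # output neuron j = sum_i weights[j][i] * x[i] + biases[j]  (every input feeds every neuron)
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, biases)]


def softmax(z):
    m = max(z)                                  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


flat = [-5, 0, 11, 0, 2, 11, 0, 1, 12]          # flattened pooling output from the example
n_classes = 10                                   # e.g. digits 0-9
rng = random.Random(0)
weights = [[rng.gauss(0.0, 0.1) for _ in flat] for _ in range(n_classes)]  # hypothetical, untrained
biases = [0.0] * n_classes

probs = softmax(dense(flat, weights, biases))
print(len(probs), round(sum(probs), 6))         # 10 1.0
print("predicted class:", probs.index(max(probs)))
```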
Applications:
• Image Classification
• Image Recognition
• Document Classification
• Medical Image Analysis
• Automatic image captioning
[Figure: example applications: digit classification; brain tumor classification (Y/N)]
CNN Architecture
[Figure: CNN architecture: IMAGE → CONV → NON-LINEARITY (ReLU) → POOLING → CONV → NON-LINEARITY (ReLU) → POOLING → FULLY CONNECTED LAYER → SOFTMAX → CLASS]

*ReLU = Rectified Linear Unit

Steps to be followed to train the CNN
1. Feed the input image into the convolutional layer.
2. Convolve it with the feature-detecting kernels/filters.
3. Apply a pooling layer to reduce the dimensions.
4. Repeat these layers multiple times.
5. Flatten the output and feed it into a fully connected layer.
6. Train the model with backpropagation, typically minimizing a softmax cross-entropy loss (the multi-class generalization of logistic regression).
Some well known convolution networks
• LeNet: developed by Yann LeCun to recognize handwritten digits; it is the pioneering CNN.
• AlexNet: developed by Alex Krizhevsky, Ilya Sutskever and Geoff Hinton; it won the 2012
ImageNet challenge. It was one of the first CNNs to stack multiple convolutional layers directly on top of one another.
• GoogLeNet: developed by Google; it won the 2014 ImageNet competition. Its main
advantage over the other networks was that it required far fewer parameters to
train, making it faster and less prone to overfitting.
• VGGNet: another popular network, whose most popular version is VGG16.
VGG16 has 16 weight layers (13 convolutional and 3 fully connected).
• ResNet: developed by Kaiming He and colleagues; this network won the 2015 ImageNet competition.
The two most popular variants of ResNet are ResNet50 and ResNet34. A more complex
variation of ResNet is the ResNeXt architecture.
Projects on CNN

1. Handwritten Digit Recognition - MNIST dataset (2D)
2. Object Recognition in Photographs - CIFAR 10/100 dataset (3D)
3. Predict Sentiment from Movie Reviews - IMDB dataset (1D)


1. Handwritten Digit Recognition - MNIST dataset
(multi-class classification, gray scale images, 2D convolution)
• MNIST dataset (Modified National Institute of Standards and Technology database): a
large database of handwritten digits that is used for training various image processing
systems.
• The database is also widely used for training and testing in the fields of machine learning
and deep learning.
• The dataset consists of 70,000 images, each a handwritten digit; 60,000 are for training
and 10,000 for testing.
• The 70,000 handwritten digits are divided into 10 classes: the digits
0, 1, 2, 3, 4, 5, 6, 7, 8, 9.
• Each image is a 28x28 pixel square (784 pixels in total).
MNIST(Sample dataset)
2. Object Recognition in Photographs - CIFAR-10 dataset
(multi-class classification, color images, 3D convolution)
• CIFAR dataset (Canadian Institute For Advanced Research): a large database of
photographs of objects that is used for training various image processing systems.
• CIFAR-10 consists of tiny colour images.
• The database is also widely used for training and testing in the field of
deep learning.
• The dataset consists of 60,000 photographs, each showing one object; 50,000 are
for training and 10,000 for testing.
• The 60,000 photographs are divided into 10 classes: airplane, automobile, bird, cat,
deer, dog, frog, horse, ship and truck.
• Each image is a 32x32 pixel square (1,024 pixels per channel, 3,072 values across the 3 colour channels).
CIFAR10 ( Sample Dataset )
3. Predict Sentiment from Movie Reviews - IMDB dataset
(binary classification, 1D convolution)
• The Internet Movie Database (IMDB) is a huge repository of image and text
data about movies.
• The database is an excellent source for data analytics and deep learning
practice and research.
• The IMDB dataset is a binary sentiment classification dataset consisting of
50,000 highly polar movie reviews.
• The 50,000 movie reviews are divided into 2 classes: positive reviews and
negative reviews.
• 25,000 reviews are used for training and 25,000 for testing.
IMDB ( Sample Dataset )
S.No | Review | Sentiment
1 | This movie features Charlie Spradling dancing in a strip club. Beyond that, it features a truly bad script with dull, unrealistic dialogue. That it got as many positive votes suggests some people may be joking. | negative
2 | If you like original gut wrenching laughter you will like this movie. If you are young or old then you will love this movie, hell even my mom liked it. Great Camp!!! | positive
3 | It's terrific when a funny movie doesn't make smile you. What a pity!! This film is very boring and so long. It's simply painfull. The story is staggering without goal and no fun. You feel better when it's finished. | negative
4 | I have seen this film at least 100 times and I am still excited by it, the acting is perfect and the romance between Joe and Jean keeps me on the edge of my seat, plus I still think Bryan Brown is the tops. Brilliant Film. | positive
Datasets - MNIST / CIFAR10 / IMDB

S.No | Dataset | Type of dataset | No. of classes | Class labels | Classification | Convolution
1 | MNIST | Handwritten digits (gray scale images) | 10 | 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 | multiclass | 2D
2 | CIFAR10 | Images of objects (color images) | 10 | airplane, automobile, dog, deer, cat, truck, ship, horse, bird, frog | multiclass | 3D
3 | IMDB | Movie reviews | 2 | positive, negative | binary | 1D


Steps to develop Convolutional Neural Networks (CNN) with Keras
Step 1: Load dataset
Step 2: Define model
Step 3: Compile model
Step 4: Fit model
Step 5: Evaluate model
Handwritten Digit Recognition
Build the model architecture
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense
# build the model object
model = Sequential()
# CONV_1: add a CONV layer with ReLU activation and depth = 32 kernels
model.add(Conv2D(32, kernel_size=(3, 3), strides=1, padding='same', activation='relu', input_shape=(28, 28, 1)))
# POOL_1: downsample the feature maps, keeping the strongest activations
model.add(MaxPooling2D(pool_size=(2, 2)))
# CONV_2: increase the depth to 64
model.add(Conv2D(64, (3, 3), strides=1, padding='same', activation='relu'))
# POOL_2: more downsampling
model.add(MaxPooling2D(pool_size=(2, 2)))
# flatten the 3D feature maps into a vector, since the classifier expects 1D input
model.add(Flatten())
# FC_1: fully connected layer that combines the extracted features
model.add(Dense(64, activation='relu'))
# FC_2: softmax output layer giving probabilities for the 10 classes
model.add(Dense(10, activation='softmax'))
# print model architecture summary
model.summary()
Model summary

number of params = (kernel height x kernel width x depth of the previous layer + 1) x number of filters (the + 1 is each filter's bias)
Conv2d_1: ((3*3*1)+1)*32 = 320
MaxPooling_1: output size ((28-2)/2)+1 = 14, i.e. 14x14x32 (no parameters)
Conv2d_2: ((3*3*32)+1)*64 = 18496
MaxPooling_2: output size ((14-2)/2)+1 = 7, i.e. 7x7x64 (no parameters)
Flatten: 7*7*64 = 3136 values (no parameters)
Dense_1: 64*3136 + 1*64 = 200768
Dense_2: 10*64 + 1*10 = 650
Total parameters: 320 + 18496 + 200768 + 650 = 220234
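The per-layer arithmetic of the model summary can be checked with a few lines of plain Python. The helper names are hypothetical; the layer sizes follow the Keras model above ('same' padding, 2x2 pooling with stride 2).

```python
def conv2d_params(f, in_channels, filters):
    return (f * f * in_channels + 1) * filters     # weights + one bias per filter


def dense_params(n_in, n_out):
    return n_in * n_out + n_out                     # weights + one bias per neuron


def pool_output(n, size=2, stride=2):
    return (n - size) // stride + 1


side = 28
params = {}
params["conv2d_1"] = conv2d_params(3, 1, 32)       # 'same' padding keeps 28 x 28 x 32
side = pool_output(side)                            # 14 x 14 x 32, no parameters
params["conv2d_2"] = conv2d_params(3, 32, 64)      # 'same' padding keeps 14 x 14 x 64
side = pool_output(side)                            # 7 x 7 x 64, no parameters
flat_size = side * side * 64                        # Flatten: 3136 values, no parameters
params["dense_1"] = dense_params(flat_size, 64)
params["dense_2"] = dense_params(64, 10)

for name, count in params.items():
    print(name, count)
print("total", sum(params.values()))                # total 220234
```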
CNN vs Fully Connected
• The basic difference between the two types of layers is the density of the
connections. FC layers are densely connected: every output neuron is connected
to every input neuron. In a Conv layer, on the other hand, the neurons are not
densely connected but are connected only to neighboring neurons within the width
of the convolutional kernel.
• A second main difference between them is weight sharing. In an FC layer,
every output neuron is connected to every input neuron through a different
weight. In a Conv layer, however, the weights are shared among different
neurons. This is another characteristic that makes Conv layers practical when
the number of neurons is large.
CNN vs Fully Connected
Aspect | CNN Layer | Fully Connected Layer
Purpose | Process grid-like data (e.g., images) | Make final decisions or classifications
Operation | Uses filters to produce feature maps | Each neuron is connected to every neuron in the previous layer
Parameters | Filter size, number of filters, stride, padding | Number of neurons
Connectivity | Small region of input connected to next layer | All neurons in one layer connected to all neurons in the next
Weight sharing | Weights are shared among neurons | Each output neuron is connected through a different weight
Connection density | Neurons connected only to neighboring neurons | Neurons are densely connected
Advantages | Reduces parameters, captures spatial hierarchies | Combines features for final predictions
Best for | Detecting local features like edges, textures, patterns | High-level understanding and decision-making
https://www.youtube.com/watch?v=vw6nCSDDekc
