Deep Learning
Subject Code – EC37T
Course Pre-requisite: EC37P
Dr. Nayana Mahajan
Module II: Convolutional Neural Networks (CNNs)
Basics of CNNs (Convolution, Pooling, Padding, Stride)
Modern Deep Learning Architectures: LeNet Architecture,
AlexNet Architecture
Advanced Architectures: ResNet, DenseNet, EfficientNet
Transfer Learning and Fine-tuning CNNs
Applications: Image Classification, Object Detection
LeNet-5:
Purpose:
• Primarily designed for handwritten digit recognition,
specifically the MNIST dataset.
• The MNIST dataset (Modified National Institute of
Standards and Technology database) is a classic
benchmark in the field of machine learning and computer
vision.
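As a concrete starting point, here is a minimal sketch of loading MNIST with torchvision (the library choice is an assumption; MNIST images are 28×28 grayscale and are padded here to the 32×32 size LeNet-5 expects):

import torch
from torchvision import datasets, transforms

# Pad 28x28 MNIST digits to the 32x32 input size used by LeNet-5
transform = transforms.Compose([
    transforms.Pad(2),       # 28x28 -> 32x32
    transforms.ToTensor(),   # [0, 255] -> [0.0, 1.0] tensor of shape 1x32x32
])

train_set = datasets.MNIST(root="./data", train=True, download=True, transform=transform)
loader = torch.utils.data.DataLoader(train_set, batch_size=64, shuffle=True)

images, labels = next(iter(loader))
print(images.shape)  # torch.Size([64, 1, 32, 32])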
LeNet-5
LeNet-5, also known as the classic convolutional neural
network, was designed by Yann LeCun, Leon Bottou,
Yoshua Bengio, and Patrick Haffner in the 1990s for
handwritten and machine-printed character recognition.
The architecture was designed to identify handwritten
digits in the MNIST dataset.
LeNet-5
The architecture is straightforward and easy to
understand.
The input images are grayscale with dimensions of 32×32×1,
followed by two pairs of a convolution layer with stride 1 and
an average pooling layer with stride 2.
Finally, fully connected layers lead to a softmax-activated
output layer.
In total, the network has roughly 60,000 trainable parameters.
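A minimal PyTorch sketch of this layout (an assumption, not the original implementation; tanh activations stand in for the paper's squashing functions, and the softmax is typically folded into the loss during training):

import torch
import torch.nn as nn

class LeNet5(nn.Module):
    """Sketch of LeNet-5: two conv/avg-pool pairs, then fully connected layers."""
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 6, kernel_size=5), nn.Tanh(),   # C1: 1x32x32 -> 6x28x28
            nn.AvgPool2d(kernel_size=2, stride=2),       # S2: 6x28x28 -> 6x14x14
            nn.Conv2d(6, 16, kernel_size=5), nn.Tanh(),  # C3: 6x14x14 -> 16x10x10
            nn.AvgPool2d(kernel_size=2, stride=2),       # S4: 16x10x10 -> 16x5x5
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(16 * 5 * 5, 120), nn.Tanh(),       # C5
            nn.Linear(120, 84), nn.Tanh(),               # F6
            nn.Linear(84, num_classes),                  # output: 10 digit classes
        )

    def forward(self, x):
        return self.classifier(self.features(x))

model = LeNet5()
print(sum(p.numel() for p in model.parameters()))  # 61706 -- about 60k, as stated above
print(model(torch.randn(1, 1, 32, 32)).shape)      # torch.Size([1, 10])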
LeNet-5:
Architecture:
A relatively shallow CNN with 7 layers (not counting the input).
• Input Layer: Accepts 32×32 pixel images.
• Convolutional Layers (C1, C3): Use 5×5 filters with 6 and 16 feature
maps, respectively.
• Pooling Layers (S2, S4): Employ 2×2 average pooling to reduce
spatial dimensions.
• Fully Connected Layers (C5, F6): Connect all neurons in the
preceding layer to each neuron in the current layer.
• Output Layer: 10 units; the original paper used Gaussian (RBF)
connections, while modern implementations use a softmax activation.
Number of Kernels (Filters)
• Each kernel learns a different feature (e.g., edge, texture,
shape).
• More kernels → more types of features learned.
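A short sketch illustrating this: the number of kernels in a convolutional layer directly sets the number of output feature maps (the kernel counts below are illustrative):

import torch
import torch.nn as nn

x = torch.randn(1, 1, 32, 32)           # one 32x32 grayscale image
few = nn.Conv2d(1, 6, kernel_size=5)    # 6 kernels -> 6 feature maps
many = nn.Conv2d(1, 16, kernel_size=5)  # 16 kernels -> 16 feature maps
print(few(x).shape)   # torch.Size([1, 6, 28, 28])
print(many(x).shape)  # torch.Size([1, 16, 28, 28])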
LeNet's Architecture
The LeNet architecture consists of several layers
that progressively extract and condense
information from input images.
Here is a description of each layer of the LeNet
architecture:
1. Input Layer: Accepts 32×32 pixel images, often
zero-padded if the original images are smaller.
2. First Convolutional Layer (C1): Consists of six 5×5
filters, producing six feature maps of 28×28 each.
3. First Pooling Layer (S2): Applies 2×2 average
pooling, reducing the feature maps to 14×14.
4. Second Convolutional Layer (C3): Uses sixteen
5×5 filters, but with sparse connections, outputting
sixteen 10×10 feature maps.
5. Second Pooling Layer (S4): Further reduces the
feature maps to 5×5 using 2×2 average pooling.
6. First Fully Connected Layer (C5): Fully connected with
120 nodes.
7. Second Fully Connected Layer (F6): Comprises 84
nodes.
8. Output Layer: Softmax or Gaussian (RBF) activation that
outputs probabilities across the 10 classes (digits 0-9).
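As a sanity check on the roughly 60,000-parameter figure quoted earlier, a quick count in Python (this assumes the common variant where C3 is fully connected to all S2 maps; the paper's sparse C3 has somewhat fewer parameters):

# parameters per layer = (weights per output unit + 1 bias) * number of output units
c1  = (5 * 5 * 1 + 1) * 6       # 156
c3  = (5 * 5 * 6 + 1) * 16      # 2,416 (fully connected C3 variant)
c5  = (5 * 5 * 16 + 1) * 120    # 48,120
f6  = (120 + 1) * 84            # 10,164
out = (84 + 1) * 10             # 850
print(c1 + c3 + c5 + f6 + out)  # 61706 -- about 60k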
AlexNet:
Purpose:
Won the 2012 ImageNet Large Scale Visual Recognition
Challenge (ILSVRC), demonstrating the potential of deep
learning for large-scale image recognition.
Architecture
A deeper CNN with 8 learnable layers (5 convolutional and
3 fully connected).
Convolutional Layers: Employs 5 convolutional layers with
varying filter sizes (11×11, 5×5, 3×3).
Pooling Layers: Uses max pooling to reduce spatial
dimensions.
Fully Connected Layers: Includes three fully connected layers,
with the final layer producing 1000 outputs (one per ImageNet
class).
AlexNet architecture (figure)
Input Layer
AlexNet takes RGB input images of size 227×227×3.
Convolutional Layers
• First Layer: The first layer uses 96 kernels of size
11×11 with a stride of 4, applies the ReLU
activation function, and then performs a max
pooling operation.
• Second Layer: The second layer takes the output of
the first layer as input, with 256 kernels of size
5×5×48.
• Third Layer: 384 kernels of size 3×3×256.
Convolutional Layers
• The third, fourth, and fifth convolutional layers are
connected to one another without any intervening
pooling or normalization layers.
• Fourth Layer: 384 kernels of size 3×3×192.
• Fifth Layer: 256 kernels of size 3×3×192.
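Putting the five convolutional layers together as a single stream (a sketch, not the original two-GPU code: the 48- and 192-channel kernel depths above come from the paper's GPU split, which merges to layer widths 96, 256, 384, 384, 256; the padding values are the commonly used ones and are assumptions here):

import torch
import torch.nn as nn

features = nn.Sequential(
    nn.Conv2d(3, 96, kernel_size=11, stride=4), nn.ReLU(),    # layer 1
    nn.MaxPool2d(kernel_size=3, stride=2),
    nn.Conv2d(96, 256, kernel_size=5, padding=2), nn.ReLU(),  # layer 2
    nn.MaxPool2d(kernel_size=3, stride=2),
    nn.Conv2d(256, 384, kernel_size=3, padding=1), nn.ReLU(), # layer 3
    nn.Conv2d(384, 384, kernel_size=3, padding=1), nn.ReLU(), # layer 4
    nn.Conv2d(384, 256, kernel_size=3, padding=1), nn.ReLU(), # layer 5
    nn.MaxPool2d(kernel_size=3, stride=2),
)

x = torch.randn(1, 3, 227, 227)
print(features(x).shape)  # torch.Size([1, 256, 6, 6])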
Fully Connected Layers
The first two fully connected layers have 4096 neurons each.
Output Layer
The output layer is a softmax layer that outputs
probabilities over the 1000 class labels.
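In practice AlexNet is rarely built by hand; here is a minimal sketch using the pretrained model shipped with torchvision (the weights enum assumes torchvision 0.13 or newer):

import torch
from torchvision import models

model = models.alexnet(weights=models.AlexNet_Weights.IMAGENET1K_V1)
model.eval()

x = torch.randn(1, 3, 227, 227)       # one dummy RGB image
with torch.no_grad():
    logits = model(x)                 # raw scores, shape [1, 1000]
    probs = logits.softmax(dim=1)     # softmax over the 1000 ImageNet classes
print(probs.shape)  # torch.Size([1, 1000])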
Example Calculation (without padding)
For an input of 227×227×3, applying:
Kernel size = 11
Stride = 4
No padding
Output size (per channel) = ((227 − 11) / 4) + 1 = 55
So the output feature map is 55×55×96 (AlexNet uses
96 filters in the first conv layer).
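The same arithmetic can be verified directly (a minimal sketch; the use of PyTorch here is an assumption):

import torch
import torch.nn as nn

conv1 = nn.Conv2d(3, 96, kernel_size=11, stride=4)  # no padding
x = torch.randn(1, 3, 227, 227)
print(conv1(x).shape)  # torch.Size([1, 96, 55, 55]) -- matches (227 - 11)/4 + 1 = 55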