Module - 2.2

The document outlines a course on Deep Learning, focusing on Convolutional Neural Networks (CNNs) including architectures like LeNet and AlexNet. It details the structure and purpose of each architecture, emphasizing their applications in image classification and object detection. Key concepts such as transfer learning, pooling, and fully connected layers are also discussed.

Deep Learning

Subject Code – EC37T


Course Pre-requisite: EC37P

Dr. Nayana Mahajan

9/19/2025 Dr. Nayana Mahajan 1


Module II: Convolutional Neural Networks
(CNNs)
 Basics of CNNs (Convolution, Pooling, Padding, Stride)

 Modern Deep Learning Architectures: LeNet: Architecture, AlexNet: Architecture

 Advanced Architectures: ResNet, DenseNet, EfficientNet

 Transfer Learning and Fine-tuning CNNs

 Applications: Image Classification, Object Detection


LeNet-5:
Purpose:
• Primarily designed for handwritten digit recognition, specifically the MNIST dataset.

• The MNIST dataset (Modified National Institute of Standards and Technology database) is a classic benchmark in the field of machine learning and computer vision.



LeNet-5

 This is also known as the Classic Neural Network. It was designed by Yann LeCun, Leon Bottou, Yoshua Bengio and Patrick Haffner for handwritten and machine-printed character recognition in the 1990s, and they called it LeNet-5.

 The architecture was designed to identify handwritten digits in the MNIST dataset.



LeNet-5

 The architecture is pretty straightforward and simple to understand.

 The input images were grayscale with dimensions 32x32x1, followed by two pairs of a convolution layer (stride 1) and an average pooling layer (stride 2).

 Finally, fully connected layers with Softmax activation in the output layer.

 In total, the network has about 60,000 parameters.

LeNet-5:
Architecture:
A relatively shallow CNN with 7 layers (not counting the input).
Input Layer: Accepts 32x32 pixel images.

• Convolutional Layers (C1, C3): Use 5x5 filters with 6 and 16 feature maps, respectively.

• Pooling Layers (S2, S4): Employ 2x2 average pooling to reduce spatial dimensions.

• Fully Connected Layers (C5, F6): Connect all neurons in the preceding layer to each neuron in the current layer.

• Output Layer: Uses a Softmax (or, in the original paper, Gaussian) activation over the 10 digit classes.


Number of Kernels (Filters)
• Each kernel learns a different feature (e.g., edge, texture,
shape).
• More kernels → more types of features learned.
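This relationship between kernel count and output depth can be sketched in a few lines of framework-free Python (the helper name conv_output_shape is illustrative, not from the slides):

```python
def conv_output_shape(h, w, kernel, stride=1, padding=0, num_kernels=1):
    # Spatial size after a convolution: floor((n + 2p - k) / s) + 1.
    # The number of kernels becomes the depth (channel count) of the output volume.
    out_h = (h + 2 * padding - kernel) // stride + 1
    out_w = (w + 2 * padding - kernel) // stride + 1
    return out_h, out_w, num_kernels

# Six 5x5 kernels on a 32x32 grayscale image:
print(conv_output_shape(32, 32, kernel=5, num_kernels=6))   # (28, 28, 6)
# Same input with sixteen kernels: only the depth changes.
print(conv_output_shape(32, 32, kernel=5, num_kernels=16))  # (28, 28, 16)
```

Note that adding kernels changes only the output depth, not the spatial size — each kernel produces one feature map.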



LeNet's Architecture
 The LeNet architecture consists of several layers
that progressively extract and condense
information from input images.

Here is a description of each layer of the LeNet architecture:

1. Input Layer: Accepts 32x32 pixel images, often zero-padded if original images are smaller.
2. First Convolutional Layer (C1): Consists of six 5x5 filters, producing six feature maps of 28x28 each.

3. First Pooling Layer (S2): Applies 2x2 average pooling, reducing the feature maps to 14x14.

4. Second Convolutional Layer (C3): Uses sixteen 5x5 filters, but with sparse connections, outputting sixteen 10x10 feature maps.

5. Second Pooling Layer (S4): Further reduces feature maps to 5x5 using 2x2 average pooling.



6. First Fully Connected Layer (C5): Fully connected with 120 nodes.

7. Second Fully Connected Layer (F6): Comprises 84 nodes.

8. Output Layer: Softmax or Gaussian activation that outputs probabilities across 10 classes (digits 0-9).
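The shape progression through these layers can be traced with a short framework-free script (helper names conv and pool are illustrative; sizes only, no learned weights):

```python
def conv(n, k, s=1, p=0):
    # Output spatial size of a convolution: floor((n + 2p - k) / s) + 1
    return (n + 2 * p - k) // s + 1

def pool(n, k=2, s=2):
    # Output spatial size of a 2x2 pooling window with stride 2
    return (n - k) // s + 1

n = 32              # input: 32x32 grayscale image
n = conv(n, k=5)    # C1: six 5x5 filters      -> 28x28x6
n = pool(n)         # S2: 2x2 average pooling  -> 14x14x6
n = conv(n, k=5)    # C3: sixteen 5x5 filters  -> 10x10x16
n = pool(n)         # S4: 2x2 average pooling  -> 5x5x16
print(n)            # 5; the flattened 5*5*16 = 400 values feed C5 (120) -> F6 (84) -> 10 outputs
```

Each printed size matches the feature-map dimensions listed above, which is a quick way to sanity-check the architecture.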



AlexNet:

Purpose:
 Won the 2012 ImageNet competition, demonstrating the potential of deep learning for large-scale image recognition.



Architecture
 A deeper CNN with 8 learned layers (5 convolutional and 3 fully connected).

 Convolutional Layers: Employs 5 convolutional layers with varying filter sizes (11x11, 5x5, 3x3).

 Pooling Layers: Uses max-pooling to reduce spatial dimensions.

 Fully Connected Layers: Includes three fully connected layers, with the final layer producing 1000 outputs (for the ImageNet classes).



AlexNet architecture



Input Layer
 AlexNet takes RGB input images of size 227x227x3.



 Convolutional Layers
• First Layer: The first layer uses 96 kernels of size
11×11 with a stride of 4, activates them with the
ReLU activation function, and then performs a Max
Pooling operation.

• Second Layer: The second layer takes the output of the first layer as input, with 256 kernels of size 5x5x48 (48 input channels, because the original network split the 96 conv1 feature maps across two GPUs).

• Third Layer: 384 kernels of size 3x3x256.


 Convolutional Layers

• No pooling or normalization operations are performed between the third, fourth, and fifth layers; a max-pooling layer follows the fifth.

• Fourth Layer: 384 kernels of size 3x3x192.

• Fifth Layer: 256 kernels of size 3x3x192.



Fully Connected Layers
 The fully connected layers have 4096 neurons each.

Output Layer
 The output layer is a Softmax layer that outputs probabilities over the 1000 class labels.
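The spatial sizes quoted for AlexNet on these slides can be verified with the same kind of shape trace (a sketch assuming the commonly quoted hyperparameters: padding 2 on conv2, padding 1 on conv3-5, and 3x3 max pooling with stride 2 after conv1, conv2, and conv5; helper names are illustrative):

```python
def conv(n, k, s=1, p=0):
    # Output spatial size of a convolution: floor((n + 2p - k) / s) + 1
    return (n + 2 * p - k) // s + 1

def maxpool(n, k=3, s=2):
    # AlexNet uses overlapping 3x3 max pooling with stride 2
    return (n - k) // s + 1

n = 227                   # input: 227x227x3 RGB image
n = conv(n, k=11, s=4)    # conv1: 96 kernels, 11x11, stride 4 -> 55x55x96
n = maxpool(n)            #                                    -> 27x27x96
n = conv(n, k=5, p=2)     # conv2: 256 kernels, 5x5            -> 27x27x256
n = maxpool(n)            #                                    -> 13x13x256
n = conv(n, k=3, p=1)     # conv3: 384 kernels, 3x3            -> 13x13x384
n = conv(n, k=3, p=1)     # conv4: 384 kernels, 3x3            -> 13x13x384
n = conv(n, k=3, p=1)     # conv5: 256 kernels, 3x3            -> 13x13x256
n = maxpool(n)            #                                    -> 6x6x256
print(n)                  # 6; the flattened 6*6*256 = 9216 values feed the 4096-4096-1000 FC stack
```

Running this reproduces the 55 and 13 feature-map sizes mentioned in the slides and shows where the 4096-neuron fully connected layers attach.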



Example Calculation (without padding)
 For an input of 227×227×3, applying:
 Kernel size = 11
 Stride = 4
 No padding
 Output size (per channel) = (227 − 11)/4 + 1 = 55
 So the output feature map is 55×55×96 (AlexNet uses 96 filters in the first conv layer).
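The arithmetic can be checked in one line of Python:

```python
# Output size with no padding: (n - k) // s + 1
print((227 - 11) // 4 + 1)  # 55
```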
