
Assignment 5: Implementing Image Classification using Deep Learning

Aim:
Using the MNIST and CIFAR-10 datasets, perform image classification using an ANN and a CNN, respectively.
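A minimal end-to-end sketch of one possible setup is shown below. It assumes TensorFlow/Keras (the assignment does not prescribe a framework), and the layer sizes, filter counts, and epoch counts are illustrative choices rather than requirements.

```python
# Sketch only: assumes TensorFlow/Keras; hyperparameters are illustrative.
from tensorflow.keras import layers, models, datasets

# --- ANN on MNIST (28x28 grayscale digits, 10 classes) ---
(x_train, y_train), (x_test, y_test) = datasets.mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0  # scale pixel values to [0, 1]

ann = models.Sequential([
    layers.Flatten(input_shape=(28, 28)),    # flatten each image into a 784-vector
    layers.Dense(128, activation="relu"),    # hidden layer of artificial neurons
    layers.Dense(10, activation="softmax"),  # class probabilities for digits 0-9
])
ann.compile(optimizer="adam",
            loss="sparse_categorical_crossentropy",
            metrics=["accuracy"])
ann.fit(x_train, y_train, epochs=5, validation_data=(x_test, y_test))

# --- CNN on CIFAR-10 (32x32 RGB images, 10 classes) ---
(cx_train, cy_train), (cx_test, cy_test) = datasets.cifar10.load_data()
cx_train, cx_test = cx_train / 255.0, cx_test / 255.0

cnn = models.Sequential([
    layers.Conv2D(32, (3, 3), activation="relu", input_shape=(32, 32, 3)),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(64, activation="relu"),
    layers.Dense(10, activation="softmax"),
])
cnn.compile(optimizer="adam",
            loss="sparse_categorical_crossentropy",
            metrics=["accuracy"])
cnn.fit(cx_train, cy_train, epochs=10, validation_data=(cx_test, cy_test))
```

Both models use a softmax output over the 10 classes and are trained with sparse categorical cross-entropy, since the labels are integer class indices.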

Theory:

1. What is an Artificial Neural Network? Explain the structure of an artificial neuron.

Artificial Neural Networks (ANNs) are computational models inspired by the structure
and function of biological neural networks in the human brain. They consist of
interconnected nodes, known as artificial neurons, which process information in a
manner similar to how biological neurons communicate. ANNs are widely used in
various applications, including image recognition, natural language processing, and
predictive modeling.

Structure of Artificial Neurons


An artificial neuron is the fundamental building block of an ANN. It receives input
signals, processes them, and produces an output signal. The structure of an artificial
neuron can be broken down into several key components:
1. Inputs: Each neuron receives multiple inputs, which can be raw data features
(e.g., pixel values in an image) or outputs from other neurons in the network.
These inputs are typically represented as a vector x = [x1, x2, …, xn].
2. Weights: Each input is associated with a weight wi, which determines the
influence of the corresponding input on the neuron's output. Weights are adjusted
during the training process to minimize the error in the network's predictions.
3. Weighted Sum: The neuron computes a weighted sum of its inputs, which can be
expressed mathematically as:

z = w1*x1 + w2*x2 + … + wn*xn + b

Here, b represents the bias term, which allows the model to fit the data better by
shifting the activation function.
4. Activation Function: The weighted sum z is passed through a non-linear activation
function f(z) to produce the neuron's output. Common activation functions
include:
● Sigmoid: f(z) = 1 / (1 + e^(-z))

● Hyperbolic Tangent (tanh): f(z) = (e^z - e^(-z)) / (e^z + e^(-z))

● ReLU (Rectified Linear Unit): f(z) = max(0, z)

5. Output: The output of the neuron can be sent to other neurons in the next layer or
can serve as the final output of the network. The output can be interpreted based
on the specific task, such as classification or regression, as illustrated in the sketch below.
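As a concrete illustration of these components, the following is a small sketch in plain Python/NumPy of a single artificial neuron computing a weighted sum and passing it through a sigmoid activation; the input, weight, and bias values are made up for the example.

```python
import numpy as np

def sigmoid(z):
    # squashes the weighted sum into the range (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def neuron_output(x, w, b):
    # weighted sum: z = w1*x1 + w2*x2 + ... + wn*xn + b
    z = np.dot(w, x) + b
    # non-linear activation applied to the weighted sum
    return sigmoid(z)

# illustrative values only
x = np.array([0.5, 0.2, 0.8])   # inputs
w = np.array([0.4, -0.6, 0.9])  # weights (adjusted during training)
b = 0.1                         # bias term
print(neuron_output(x, w, b))   # the neuron's single output signal
```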

2. What is the Convolution operation in CNN?

Convolution is a fundamental operation in Convolutional Neural Networks (CNNs) that
plays a crucial role in feature extraction from input data, particularly images. Here's a
detailed explanation of the convolution operation in CNNs:

What is Convolution in CNN?


Convolution is a mathematical operation that combines two sets of information. In the
context of CNNs, it involves applying a filter (also known as a kernel) to an input image
to extract features such as edges, textures, and patterns. The convolution operation
helps in reducing the dimensionality of the data while preserving important features,
making it easier for the network to learn.
How Convolution Works
1. Filter (Kernel): A filter is a small matrix (e.g., 3x3 or 5x5) that slides over the input
image. Each filter is designed to detect specific features. For instance, one filter
might detect horizontal edges, while another might detect vertical edges.
2. Sliding the Filter: The filter is slid over the input image, performing element-wise
multiplication with the portion of the image it covers at each position. The results
of these multiplications are summed to produce a single output value.
3. Feature Map: The output of the convolution operation is a feature map (or
activation map) that represents the presence of the detected feature in different
parts of the image. Each value in the feature map corresponds to the response of
the filter at a specific location in the input image.
4. Stride and Padding: Stride refers to the number of pixels the filter moves at each
step. A stride of 1 means the filter moves one pixel at a time, while a stride of 2
means it moves two pixels.

Padding is the addition of extra pixels around the input image. It helps preserve
the spatial dimensions of the output feature map. Common types of padding
include "valid" (no padding) and "same" (padding so that the output size is the
same as the input size).
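To make the effect of stride and padding concrete, the helper below (an illustrative sketch, not part of the assignment code) computes the spatial size of the output feature map for "valid" and "same" padding:

```python
def conv_output_size(input_size, kernel_size, stride=1, padding="valid"):
    # "valid": no padding; "same": pad so the output size is ceil(input / stride)
    if padding == "valid":
        return (input_size - kernel_size) // stride + 1
    if padding == "same":
        return -(-input_size // stride)  # ceiling division
    raise ValueError("padding must be 'valid' or 'same'")

# e.g. a 3x3 filter over a 28x28 image
print(conv_output_size(28, 3, stride=1, padding="valid"))  # 26
print(conv_output_size(28, 3, stride=1, padding="same"))   # 28
print(conv_output_size(28, 3, stride=2, padding="valid"))  # 13
```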

Mathematical Representation
For a 2D convolution operation, the output at position (i, j) can be expressed
mathematically as:

Z(i, j) = Σm Σn X(i + m, j + n) · K(m, n)

Where:
● Z(i, j) is the output feature map at position (i, j).
● X is the input image.
● K is the filter (kernel).
● m and n index the rows and columns of the kernel.
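A direct NumPy translation of this formula (a sketch for illustration; deep learning frameworks use far more optimized implementations) might look like the following, using a hypothetical vertical-edge filter:

```python
import numpy as np

def conv2d(X, K):
    # X: 2D input image, K: 2D filter (kernel); "valid" convolution with stride 1
    h, w = X.shape
    m, n = K.shape
    Z = np.zeros((h - m + 1, w - n + 1))
    for i in range(Z.shape[0]):
        for j in range(Z.shape[1]):
            # element-wise multiply the covered patch with the kernel and sum
            Z[i, j] = np.sum(X[i:i + m, j:j + n] * K)
    return Z

# illustrative example: a vertical-edge filter applied to a small image
X = np.array([[0, 0, 1, 1],
              [0, 0, 1, 1],
              [0, 0, 1, 1],
              [0, 0, 1, 1]], dtype=float)
K = np.array([[-1, 0, 1],
              [-1, 0, 1],
              [-1, 0, 1]], dtype=float)
print(conv2d(X, K))  # strong responses where intensity changes from left to right
```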

Applications of Convolution in CNNs


● Edge Detection: Convolution helps in detecting edges in images, which is crucial
for identifying objects.
● Feature Extraction: It extracts various features from images, allowing the network
to learn hierarchical representations (from low-level features like edges to
high-level features like shapes).
● Dimensionality Reduction: By reducing the spatial dimensions of the input while
retaining important information, convolution helps in making the model more
efficient.

3. Components of a CNN.
Convolutional Neural Networks (CNNs) are a class of deep learning models primarily
used for analyzing visual data. They are particularly effective in tasks like image
classification, object detection, and more due to their ability to automatically extract
features from images. Here’s an overview of the key components of CNNs and how
they work.

Key Components of CNNs

1. Convolutional Layers
The convolutional layer is the core component of a CNN. It applies a set of filters (also
known as kernels) to the input image. Each filter is a small matrix that slides over the
image, performing element-wise multiplication and summing the results to produce a
feature map. This operation allows CNNs to detect patterns such as edges, textures,
and shapes. For example, a filter might be designed to detect horizontal edges, while
another might detect vertical edges. The process of convolution helps in extracting local
features from the input image, which are crucial for understanding the overall structure.

2. Activation Functions
After the convolution operation, an activation function is applied to introduce
non-linearity into the model. The most commonly used activation function in CNNs is the
Rectified Linear Unit (ReLU), which replaces negative values with zero, allowing the
network to learn complex patterns. Other variants include Leaky ReLU and Parametric
ReLU, which help mitigate issues like the "dying ReLU" problem.

3. Pooling Layers
Pooling layers are used to downsample the feature maps produced by the convolutional
layers. This reduces the spatial dimensions of the data, which helps to decrease
computational load and mitigate overfitting. The most common pooling operation is max
pooling, which takes the maximum value from a specified window (e.g., 2x2) of the
feature map. This process retains the most significant features while discarding less
important information, leading to a more compact representation of the data.
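A minimal NumPy sketch of 2x2 max pooling is shown below (illustrative only; framework layers such as Keras's MaxPooling2D do this efficiently):

```python
import numpy as np

def max_pool2d(feature_map, size=2, stride=2):
    # slide a size x size window over the feature map and keep the maximum value
    h, w = feature_map.shape
    out_h = (h - size) // stride + 1
    out_w = (w - size) // stride + 1
    pooled = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            window = feature_map[i * stride:i * stride + size,
                                 j * stride:j * stride + size]
            pooled[i, j] = window.max()
    return pooled

fm = np.array([[1, 3, 2, 0],
               [4, 6, 1, 2],
               [0, 2, 5, 7],
               [1, 3, 2, 8]], dtype=float)
print(max_pool2d(fm))  # [[6. 2.]
                       #  [3. 8.]]
```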

4. Fully Connected Layers


After several convolutional and pooling layers, the output is flattened and passed to one
or more fully connected layers. These layers function similarly to traditional neural
networks, where each neuron is connected to every neuron in the previous layer. The
fully connected layers are responsible for making the final classification based on the
features extracted in the earlier layers. The output layer typically uses a softmax
activation function to produce probabilities for each class in a multi-class classification
problem.

Working Example: Handwritten Digit Recognition


A common example of CNN application is in the classification of handwritten digits (e.g.,
the MNIST dataset).

1. Input Layer: The input is a grayscale image of a handwritten digit, typically
represented as a 28x28 pixel matrix.

2. Convolution Layer: Multiple filters slide over the image to extract features such as
edges and curves. For instance, one filter might detect horizontal lines while another
detects vertical lines.

3. Activation Function: After convolution, the ReLU function is applied to introduce
non-linearity.

4. Pooling Layer: Max pooling is applied to reduce the dimensionality of the feature
maps, retaining the most important features.

5. Fully Connected Layer: The pooled feature maps are flattened and passed through
one or more fully connected layers, culminating in an output layer that predicts the digit
class (0-9) based on the learned features.

This hierarchical structure allows CNNs to effectively learn and recognize patterns in
images, making them powerful tools in computer vision tasks.
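The five steps above map naturally onto a small Keras model. The sketch below assumes TensorFlow/Keras; the filter count and layer sizes are illustrative choices, not a prescribed architecture.

```python
from tensorflow.keras import layers, models

mnist_cnn = models.Sequential([
    # 1. Input layer: 28x28 grayscale image (one channel)
    layers.Input(shape=(28, 28, 1)),
    # 2. Convolution layer: 32 filters of size 3x3 extract edges and curves
    # 3. Activation function: ReLU introduces non-linearity
    layers.Conv2D(32, (3, 3), activation="relu"),
    # 4. Pooling layer: 2x2 max pooling retains the strongest responses
    layers.MaxPooling2D((2, 2)),
    # 5. Fully connected layers: flatten, then classify into digit classes 0-9
    layers.Flatten(),
    layers.Dense(64, activation="relu"),
    layers.Dense(10, activation="softmax"),
])
mnist_cnn.summary()
```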

4. Various Activation functions in ANN/CNN (used in your CNN architecture for image classification).

Activation Functions in ANNs and CNNs


1. Sigmoid Function:
● The sigmoid function is a commonly used activation function in neural networks.
● It maps any input value to the range (0, 1), making it suitable for binary
classification tasks.
● Mathematically, the sigmoid function is defined as σ(x) = 1 / (1 + e^(-x)).
● However, the sigmoid function is not commonly used in modern neural networks
due to the vanishing gradient problem, where the gradients become very small
during backpropagation, making it difficult for the network to learn.

2. Tanh (Hyperbolic Tangent) Function:


● The tanh function is similar to the sigmoid function but maps input values to the
range (-1, 1).
● It is defined as tanh(x) = (e^x - e^(-x)) / (e^x + e^(-x)).
● Tanh is zero-centered, which can help in faster convergence of the gradient
descent algorithm compared to the sigmoid function.

3. ReLU (Rectified Linear Unit) Function:


● ReLU is the most commonly used activation function in modern neural networks,
especially in CNNs.
● It is defined as f(x) = max(0, x).

● ReLU introduces non-linearity by setting all negative values to zero while keeping
positive values unchanged.
● ReLU is computationally efficient and helps in alleviating the vanishing gradient
problem.
4. Leaky ReLU Function:
● Leaky ReLU is a variant of the ReLU function that addresses the problem of
"dying ReLU" where some neurons never activate.
● It is defined as f(x) = x for x > 0, and f(x) = αx for x ≤ 0.

● Here, α is a small positive constant (e.g., 0.01) that prevents the neuron from
completely dying.
5. Softmax Function:
● The softmax function is commonly used as the activation function in the output
layer of neural networks for multi-class classification tasks.
● It converts the output logits into probabilities for each class, ensuring that the
sum of all probabilities is equal to 1.
● The softmax function is defined as softmax(zi) = e^(zi) / Σj e^(zj), where zi is the logit for class i.
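For reference, these activation functions can be sketched directly in NumPy (illustrative implementations; deep learning frameworks provide their own, more robust versions):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))        # output in (0, 1)

def tanh(x):
    return np.tanh(x)                       # output in (-1, 1), zero-centered

def relu(x):
    return np.maximum(0.0, x)               # negative values set to zero

def leaky_relu(x, alpha=0.01):
    return np.where(x > 0, x, alpha * x)    # small slope for negative inputs

def softmax(z):
    e = np.exp(z - np.max(z))               # subtract max for numerical stability
    return e / e.sum()                      # probabilities that sum to 1

logits = np.array([2.0, 1.0, 0.1])
print(softmax(logits))  # approximately [0.66, 0.24, 0.10]
```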

5. Conclusion
In this assignment, we explored image classification using Artificial Neural Networks
(ANN) and Convolutional Neural Networks (CNN) on the MNIST and CIFAR-10
datasets, respectively. ANNs, inspired by the human brain, consist of interconnected
neurons that process inputs through weighted sums and activation functions. CNNs, on
the other hand, leverage convolution operations to extract spatial features from images,
making them particularly effective for image classification tasks.

By implementing these models, we demonstrated the power of deep learning in
recognizing patterns and classifying images with high accuracy. The use of various
activation functions, such as ReLU and Softmax, further enhanced the models' ability to
learn complex patterns and make accurate predictions. This assignment highlights the
significance of neural networks in modern machine learning and their potential to drive
advancements in various fields.
