Assignment 5: Implementing Image Classification using Deep Learning
Aim:
Using the MNIST and CIFAR-10 datasets, perform image classification using ANN and CNN
respectively.
Theory:
Artificial Neural Networks (ANNs) are computational models inspired by the structure
and function of biological neural networks in the human brain. They consist of
interconnected nodes, known as artificial neurons, which process information in a
manner similar to how biological neurons communicate. ANNs are widely used in
various applications, including image recognition, natural language processing, and
predictive modeling.
5. Output: The output of the neuron can be sent to other neurons in the next layer or
can serve as the final output of the network. The output can be interpreted based
on the specific task, such as classification or regression.
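As a sketch of this behaviour, the following few lines of Python (NumPy assumed; the inputs, weights, and bias are made-up illustrative values) compute a single neuron's output as a weighted sum passed through an activation:

```python
import numpy as np

# A single artificial neuron: weighted sum of inputs plus a bias,
# passed through an activation function (ReLU here).
def neuron(x, w, b):
    z = np.dot(w, x) + b      # weighted sum of inputs
    return max(0.0, z)        # ReLU activation

x = np.array([0.5, -1.0, 2.0])   # illustrative inputs from the previous layer
w = np.array([0.4, 0.3, -0.2])   # illustrative learned weights
print(neuron(x, w, b=0.1))       # 0.0: the weighted sum (-0.4) is clipped by ReLU
```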
Padding is the addition of extra pixels around the input image. It helps preserve
the spatial dimensions of the output feature map. Common types of padding
include "valid" (no padding) and "same" (padding so that the output size is the
same as the input size).
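To make the two padding modes concrete, here is a minimal sketch, assuming TensorFlow/Keras (this write-up does not fix a framework), that compares output shapes on an MNIST-sized input:

```python
import tensorflow as tf

x = tf.zeros([1, 28, 28, 1])  # one 28x28 single-channel image (MNIST-sized)

valid = tf.keras.layers.Conv2D(8, kernel_size=3, padding="valid")(x)
same = tf.keras.layers.Conv2D(8, kernel_size=3, padding="same")(x)

print(valid.shape)  # (1, 26, 26, 8): no padding, so 28 - 3 + 1 = 26
print(same.shape)   # (1, 28, 28, 8): padded so output size matches the input
```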
Mathematical Representation
For a 2D convolution operation, the output at position (i,j) can be expressed
mathematically as:

$Z(i, j) = \sum_{m} \sum_{n} X(i + m,\ j + n)\, K(m, n)$

Where:
● Z(i,j) is the output feature map at position (i,j).
● X is the input image.
● K is the filter (kernel).
● m and n index the rows and columns of the kernel.
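The formula can be implemented directly as a sketch in NumPy (note that the X(i+m, j+n) indexing describes cross-correlation, which is what deep learning libraries actually compute; the kernel is not flipped):

```python
import numpy as np

def conv2d(X, K):
    # Z(i, j) = sum over m, n of X(i + m, j + n) * K(m, n)
    m, n = K.shape
    H, W = X.shape
    Z = np.zeros((H - m + 1, W - n + 1))
    for i in range(Z.shape[0]):
        for j in range(Z.shape[1]):
            Z[i, j] = np.sum(X[i:i + m, j:j + n] * K)
    return Z

X = np.arange(16, dtype=float).reshape(4, 4)  # toy 4x4 "image"
K = np.array([[1.0, 0.0],
              [0.0, -1.0]])                   # toy 2x2 kernel
print(conv2d(X, K))                           # 3x3 output feature map
```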
3. Components of CNN.
Convolutional Neural Networks (CNNs) are a class of deep learning models primarily
used for analyzing visual data. They are particularly effective in tasks like image
classification, object detection, and more due to their ability to automatically extract
features from images. Here’s an overview of the key components of CNNs and how
they work.
1. Convolutional Layers
The convolutional layer is the core component of a CNN. It applies a set of filters (also
known as kernels) to the input image. Each filter is a small matrix that slides over the
image, performing element-wise multiplication and summing the results to produce a
feature map. This operation allows CNNs to detect patterns such as edges, textures,
and shapes. For example, a filter might be designed to detect horizontal edges, while
another might detect vertical edges. The process of convolution helps in extracting local
features from the input image, which are crucial for understanding the overall structure.
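As an illustration, the sketch below (TensorFlow assumed; the Sobel-style kernel is a classic hand-designed example, whereas a CNN learns its kernels during training) applies a horizontal-edge filter to a synthetic image:

```python
import numpy as np
import tensorflow as tf

# Sobel-style kernel that responds to horizontal edges; its transpose
# would respond to vertical edges.
horizontal = np.array([[-1, -2, -1],
                       [ 0,  0,  0],
                       [ 1,  2,  1]], dtype=np.float32)

image = np.zeros((1, 8, 8, 1), dtype=np.float32)
image[0, 4:, :, 0] = 1.0  # bottom half bright: a horizontal edge at row 4

kernel = horizontal.reshape(3, 3, 1, 1)  # (height, width, in_ch, out_ch)
response = tf.nn.conv2d(image, kernel, strides=1, padding="VALID")
print(response[0, :, :, 0].numpy())  # nonzero only in rows spanning the edge
```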
2. Activation Functions
After the convolution operation, an activation function is applied to introduce
non-linearity into the model. The most commonly used activation function in CNNs is the
Rectified Linear Unit (ReLU), which replaces negative values with zero, allowing the
network to learn complex patterns. Other variants include Leaky ReLU and Parametric
ReLU, which help mitigate issues like the "dying ReLU" problem.
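A quick sketch of both activations, assuming TensorFlow (the input values are illustrative):

```python
import tensorflow as tf

x = tf.constant([-2.0, -0.5, 0.0, 1.5])

print(tf.nn.relu(x).numpy())                    # [ 0.    0.    0.  1.5]: negatives zeroed
print(tf.nn.leaky_relu(x, alpha=0.01).numpy())  # [-0.02 -0.005 0.  1.5]: small slope kept
```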
3. Pooling Layers
Pooling layers are used to downsample the feature maps produced by the convolutional
layers. This reduces the spatial dimensions of the data, which helps to decrease
computational load and mitigate overfitting. The most common pooling operation is max
pooling, which takes the maximum value from a specified window (e.g., 2x2) of the
feature map. This process retains the most significant features while discarding less
important information, leading to a more compact representation of the data.
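A minimal sketch of max pooling in Keras (assumed framework), showing a 2x2 window halving each spatial dimension:

```python
import tensorflow as tf

feature_maps = tf.random.normal([1, 28, 28, 8])   # a batch of 8 feature maps
pooled = tf.keras.layers.MaxPooling2D(pool_size=(2, 2))(feature_maps)
print(pooled.shape)  # (1, 14, 14, 8): each 2x2 window reduced to its maximum
```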
For example, a CNN classifying handwritten digits passes each image through the following stages:
1. Input Layer: The raw image (e.g., a 28x28 grayscale digit) is fed into the network.
2. Convolution Layer: Multiple filters slide over the image to extract features such as
edges and curves. For instance, one filter might detect horizontal lines while another
detects vertical lines.
3. Activation: A non-linear function such as ReLU is applied to the resulting feature
maps.
4. Pooling Layer: Max pooling is applied to reduce the dimensionality of the feature
maps, retaining the most important features.
5. Fully Connected Layer: The pooled feature maps are flattened and passed through
one or more fully connected layers, culminating in an output layer that predicts the digit
class (0-9) based on the learned features.
This hierarchical structure allows CNNs to effectively learn and recognize patterns in
images, making them powerful tools in computer vision tasks.
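Putting these stages together, a minimal Keras sketch of such a digit classifier might look as follows; the layer sizes and epoch count are illustrative choices, not an architecture prescribed by this assignment:

```python
import tensorflow as tf

# Load MNIST: 60,000 training and 10,000 test images of handwritten digits.
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train = x_train[..., None] / 255.0   # add a channel axis, scale to [0, 1]
x_test = x_test[..., None] / 255.0

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(28, 28, 1)),         # 1. input layer
    tf.keras.layers.Conv2D(32, 3, activation="relu"), # 2./3. convolution + ReLU
    tf.keras.layers.MaxPooling2D(2),                  # 4. pooling
    tf.keras.layers.Conv2D(64, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(2),
    tf.keras.layers.Flatten(),                        # flatten feature maps
    tf.keras.layers.Dense(64, activation="relu"),     # 5. fully connected
    tf.keras.layers.Dense(10, activation="softmax"),  # one probability per digit
])

model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(x_train, y_train, epochs=3, validation_data=(x_test, y_test))
```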
3. ReLU Function:
● ReLU introduces non-linearity by setting all negative values to zero while keeping
positive values unchanged.
● ReLU is computationally efficient and helps in alleviating the vanishing gradient
problem.
4. Leaky ReLU Function:
● Leaky ReLU is a variant of the ReLU function that addresses the problem of
"dying ReLU" where some neurons never activate.
● It is defined as: $f(x) = x$ for $x > 0$, and $f(x) = \alpha x$ for $x \le 0$.
● Here, α is a small positive constant (e.g., 0.01) that prevents the neuron from
completely dying.
5. Softmax Function:
● The softmax function is commonly used as the activation function in the output
layer of neural networks for multi-class classification tasks.
● It converts the output logits into probabilities for each class, ensuring that the
sum of all probabilities is equal to 1.
● The softmax function is defined as: $\mathrm{softmax}(z_i) = \frac{e^{z_i}}{\sum_{j=1}^{K} e^{z_j}}$, where $z_i$ is the logit for class $i$ and $K$ is the number of classes.
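The three activation functions above can be written directly as a NumPy sketch (subtracting the maximum logit inside softmax is a standard numerical-stability trick, not part of the definition):

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)             # negatives -> 0, positives unchanged

def leaky_relu(x, alpha=0.01):
    return np.where(x > 0, x, alpha * x)  # small slope alpha for x <= 0

def softmax(z):
    e = np.exp(z - np.max(z))             # shift by max for numerical stability
    return e / e.sum()

logits = np.array([2.0, 1.0, 0.1])
probs = softmax(logits)
print(probs, probs.sum())                 # class probabilities summing to 1.0
```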
5. Conclusion
In this assignment, we explored image classification using Artificial Neural Networks
(ANN) and Convolutional Neural Networks (CNN) on the MNIST and CIFAR-10
datasets, respectively. ANNs, inspired by the human brain, consist of interconnected
neurons that process inputs through weighted sums and activation functions. CNNs, on
the other hand, leverage convolution operations to extract spatial features from images,
making them particularly effective for image classification tasks.