Image Classification using MNIST Dataset
Problem Statement:
The task is to build and train a deep learning model that can classify images from a given dataset.
Specifically, we are working with the MNIST dataset, which contains handwritten digits (0-9). The goal
is to create a Convolutional Neural Network (CNN) to classify images into the appropriate digit class
(from 0 to 9). The model will be evaluated on its ability to predict the correct class for test images, and
the performance will be measured using accuracy and loss metrics. The CNN will use techniques such
as convolution layers, max pooling, and dropout for regularization.
Learning Outcomes:
o Understand the key components of a deep learning model for image classification,
including CNNs and their layers (Convolutional, Pooling, Dense).
o Learn how to load, preprocess, and normalize image data for neural network models.
o Construct a Convolutional Neural Network (CNN) using Keras for classifying images.
o Use techniques such as convolution, max pooling, and dense layers in CNNs.
o Evaluate the model using test data and report on the loss and accuracy.
o Create visualizations to track the training and validation accuracy/loss over epochs
using Matplotlib.
o Use the trained model to make predictions on new images and visualize the results.
o Learn about the softmax activation function and how it is used to output probabilities
for each class.
Technologies Used:
1. Python:
o The core programming language used to implement the project and tie the libraries together.
2. Keras:
o A high-level neural networks API, written in Python and capable of running on top of
TensorFlow, used to construct the deep learning model.
3. TensorFlow:
o The deep learning framework that serves as the backend for Keras, performing the underlying numerical computations.
4. NumPy:
o A library for numerical computations, used for handling arrays and data manipulation.
5. Matplotlib:
o A Python plotting library used to visualize the performance of the model by plotting
training/validation accuracy and loss graphs.
6. MNIST Dataset:
o A dataset of 70,000 28x28 grayscale images of handwritten digits (0-9) used for
training and testing the model.
7. Image Preprocessing:
o Techniques like normalization and reshaping used to prepare images for feeding into
the neural network.
Key Concepts:
1. Convolutional Neural Network (CNN):
o A specialized deep learning architecture designed to process structured grid data like
images.
2. Convolutional Layers:
o Layers that apply a convolution operation to the input to extract features from the
image.
3. Max Pooling:
o A downsampling operation that reduces the spatial dimensions (width and height) of
the image, helping the model learn more abstract features.
4. Flattening:
o Converting the multi-dimensional outputs from the convolutional layers into a one-
dimensional vector before feeding it into fully connected (dense) layers.
5. Dense Layers:
o Fully connected layers that allow the network to learn complex representations from
the extracted features.
6. Dropout:
o A regularization technique that randomly deactivates a fraction of neurons during training to reduce overfitting.
7. Softmax Activation:
o A mathematical function used to produce a probability distribution over the classes
(0-9), which is used to predict the most likely class for an image (see the brief NumPy
illustration after the steps listed below).
8. One-Hot Encoding:
o A representation of each label as a binary vector; for example, the digit 3 becomes
[0, 0, 0, 1, 0, 0, 0, 0, 0, 0].
Implementation Steps:
o Reshape the images to match the input shape expected by the CNN.
o Compile the model with the Adam optimizer and categorical cross-entropy loss
function.
o Train the model using the training data and validate it using the test data.
o Plot the training and validation accuracy and loss graphs to assess how well the model
has learned over time.
o Display the image and print the predicted class along with the softmax probabilities.
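To make the softmax activation above concrete, here is a tiny NumPy illustration (the logit values are invented purely for demonstration):

import numpy as np

logits = np.array([1.2, 0.3, 0.1, 4.5, 0.0, 0.2, 0.1, 0.3, 0.5, 0.8])  # raw scores for digits 0-9
probs = np.exp(logits) / np.sum(np.exp(logits))  # softmax: exponentiate, then normalize
print(probs.sum())       # 1.0 -- the outputs form a probability distribution
print(np.argmax(probs))  # 3  -- the most likely digit class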
Conclusion:
o This project demonstrates how to build, train, and evaluate a simple Convolutional
Neural Network (CNN) for image classification using the MNIST dataset. The CNN
successfully classifies handwritten digits based on features extracted through
convolutional layers.
Future Work:
o You can experiment with different architectures, including more convolutional layers
or other types of neural networks like Fully Connected Networks (FNNs) or Recurrent
Neural Networks (RNNs) for different types of tasks.
o The model can be extended to classify more complex datasets like CIFAR-10 or custom
datasets.
Overfitting: It's important to monitor both training and validation performance to avoid
overfitting. Dropout layers and regularization techniques help in mitigating overfitting.
Hyperparameter Tuning: Experimenting with different architectures, learning rates, and other
hyperparameters can lead to better model performance.
Deployment: Once the model is trained and evaluated, you can deploy it using various
frameworks like Flask, Django, or Streamlit for real-time image classification applications.
Importing Necessary Libraries
Code Explanation
import numpy as np
import keras
import tensorflow as tf
Step-by-Step Breakdown
1. import numpy as np
o Why It's Needed: It is widely used in machine learning and deep learning for
handling arrays, performing mathematical operations, and manipulating image data
(as image data is often represented as arrays).
2. import keras
o Why It's Needed: Keras simplifies building and training deep learning models by
providing intuitive APIs.
3. import tensorflow as tf
o Why It's Needed: TensorFlow acts as the backend for Keras. It is responsible for
performing computations, especially on GPUs for faster processing.
4. The Keras layers module (keras.layers):
o Purpose: This module provides building blocks (e.g., dense layers, convolutional
layers, etc.) for constructing neural networks.
o Why It's Needed: Layers are the core components of any neural network. For
instance, a Convolutional Neural Network (CNN) is built by stacking convolutional,
pooling, and dense layers.
5. The Keras models module (keras.models):
o Why It's Needed: A model represents the structure of the neural network, specifying
how layers are connected and interact.
6. The MNIST dataset (keras.datasets.mnist):
o Purpose: The MNIST dataset is a benchmark dataset for image classification,
consisting of 28x28 grayscale images of handwritten digits (0-9).
It includes 60,000 training images and 10,000 test images, making it an ideal
starting point for beginners.
Ease of Development: Libraries like TensorFlow and Keras reduce the complexity of writing
deep learning models from scratch.
Standard Practice: These tools are industry-standard and widely accepted in academia and
research.
Efficient and Scalable: They enable the development of models that can scale to handle real-
world problems.
Ensure that these libraries are installed before running the code. Use the following
commands to install them if necessary:
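For example, the packages can typically be installed with pip (package names assumed from the libraries listed above):

pip install tensorflow keras numpy matplotlib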
Familiarize yourself with the MNIST dataset, as it is a common starting point for image
classification projects.
Data Preparation
Code Explanation
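A minimal sketch of the data-preparation code, consistent with the steps explained below (the variable names X_train, Y_train, X_test, and Y_test match the shapes quoted later in this section):

from keras.datasets import mnist
from keras.utils import to_categorical

# Load the MNIST training and test splits
(X_train, Y_train), (X_test, Y_test) = mnist.load_data()

# Reshape to 4D tensors (samples, 28, 28, 1) and scale pixel values to [0, 1]
X_train = X_train.reshape(-1, 28, 28, 1).astype('float32') / 255
X_test = X_test.reshape(-1, 28, 28, 1).astype('float32') / 255

# One-hot encode the labels into 10 classes (digits 0-9)
Y_train = to_categorical(Y_train, 10)
Y_test = to_categorical(Y_test, 10)

print(X_train.shape)  # (60000, 28, 28, 1)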
Step-by-Step Breakdown
o Why It's Needed: Neural networks work better when categorical labels are
represented as vectors (one-hot encoding). For example, the digit 3 becomes [0, 0, 0,
1, 0, 0, 0, 0, 0, 0].
o Purpose: Loads the MNIST dataset into training and testing sets.
o Key Details:
o Dataset Size: 60,000 training images and 10,000 test images.
o Output Example: (60000, 28, 28) shows there are 60,000 images of size 28x28.
o Key Steps:
Normalization: Divides each pixel value by 255 to scale them between 0 and
1. This helps the model converge faster during training.
o Purpose: Applies the same reshaping and normalization to the test dataset.
o Purpose: Converts the training labels into one-hot encoded format with 10 classes
(digits 0-9).
1. Reshaping:
o Neural networks expect input data to be in a specific shape, especially convolutional
layers that require 4D tensors.
o Adding the channel dimension makes the data compatible with the input format for
CNNs.
2. Normalization:
o Scaling pixel values to the range [0, 1] ensures numerical stability during
computations and faster convergence of the model.
3. One-Hot Encoding:
o This format is ideal for classification tasks, where the model outputs probabilities for
each class.
Output Shapes After Preprocessing
X_train: (60000, 28, 28, 1) — 60,000 images of 28x28 size with 1 channel.
X_test: (10000, 28, 28, 1) — 10,000 images of 28x28 size with 1 channel.
Validation: Always validate the shape of the data before proceeding to model building.
Generalization: The same preprocessing steps (reshaping, normalization, and encoding)
should be applied to any dataset when working with neural networks.
Model Building
Code Explanation
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, Dropout

model = Sequential()
model.add(Conv2D(16, kernel_size=(3, 3), activation='relu', input_shape=(28, 28, 1)))
model.add(MaxPooling2D(pool_size=(1,1)))
model.add(Flatten())
model.add(Dense(128, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(10, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam',
metrics=['accuracy'])
Step-by-Step Breakdown
1. Importing the Required Modules
Purpose: These modules provide the building blocks to define the CNN architecture:
o Conv2D: Convolutional layer to extract features from the input image.
o Flatten: Converts 2D feature maps into a 1D vector for the dense layers.
o Sequential: A linear stack of layers, where each layer has exactly one input tensor
and one output tensor.
2. Defining the CNN Architecture
Purpose: The Sequential model is used to build a neural network by stacking layers
sequentially.
Details:
o The network starts with a Conv2D layer (16 filters of size 3x3 applied to the 28x28x1
input), which produces the (None, 26, 26, 16) feature maps shown in the model
summary later in this document.
model.add(MaxPooling2D(pool_size=(1,1)))
Details:
o pool_size=(1,1): A pool size of (1,1) leaves the spatial dimensions of the feature map
unchanged; a larger pool size such as (2,2) is what is normally used to downsample.
model.add(Flatten())
Purpose: Converts the 2D feature maps into a 1D vector to feed into the fully connected
layers.
model.add(Dense(128, activation='relu'))
Details:
o 128: Number of neurons in this dense layer.
model.add(Dropout(0.5))
Details:
o 0.5: 50% of the neurons will be randomly deactivated during training.
Details:
o activation='softmax': Converts the outputs into probabilities for each class.
model.compile(loss='categorical_crossentropy', optimizer='adam',
metrics=['accuracy'])
Details:
o metrics=['accuracy']: Tracks the model's accuracy during training and evaluation.
o Convolutional layers extract relevant patterns and features from the input image.
o Pooling layers reduce computational complexity and help prevent overfitting.
3. Classification:
o The dense layers combine the extracted features, and the softmax output layer assigns
a probability to each of the 10 digit classes.
Experimentation: Try changing the number of filters, kernel size, or activation functions to
observe their impact.
Visualization: Plot the model architecture using model.summary() or external tools to
understand the layer-wise structure.
Code Explanation
history = model.fit(X_train, Y_train, batch_size=128,
epochs=30, validation_data=(X_test, Y_test))
Purpose of model.fit()
The fit() method is used to train the CNN model on the training dataset while evaluating its
performance on the validation dataset. Here's a breakdown of the parameters:
Parameters in model.fit()
1. X_train, Y_train:
o The preprocessed training images and their one-hot encoded labels.
2. batch_size=128:
o The number of samples processed before the model updates its weights.
o Larger batch size: Faster training but higher memory consumption.
3. epochs=30:
o Increasing epochs can improve performance but may lead to overfitting if too high.
history is an object that stores training and validation metrics (e.g., loss and accuracy) for
each epoch.
This can be used later for visualization or analysis of the training process.
2. Key Concepts
1. Batch Training:
o The training data is divided into smaller batches to efficiently train the model.
o Helps with memory constraints and smooths out updates to weights.
2. Epochs:
o Each epoch represents one full cycle through the training dataset.
o At the end of each epoch, the model is evaluated on the validation data.
3. Validation Data:
o Data not used for weight updates; here the test set is passed as validation data so that
generalization can be monitored after every epoch.
3. Monitoring Progress
o Loss: A measure of how far off the model's predictions are from the true labels.
o Validation Loss and Accuracy: Corresponding metrics for the validation dataset.
1. Batch Size:
o A batch size of 128 is a good starting point, but it can be adjusted depending on the
dataset size and system memory.
2. Epoch Count:
o 30 epochs work well for this task but can be increased if the model hasn't converged.
3. Analyze history:
o Use the history object to visualize the training and validation curves after training.
Epoch 1/30
Epoch 2/30
...
Epoch 30/30
What’s Next?
After training, visualize the training and validation curves to ensure the model is performing
well.
Code Explanation
model.summary()
1. Purpose of model.summary()
It outlines the layers of the model, the number of parameters, and the shapes of the
input/output tensors for each layer.
1. Layer Type:
o Displays the name and type of each layer (e.g., Conv2D, MaxPooling2D, Dense).
2. Output Shape:
o The shape of the tensor each layer produces (the leading None dimension is the batch size).
3. Number of Parameters:
o The count of trainable weights and biases in that layer.
4. Total Parameters:
o The sum over all layers, split into trainable and non-trainable parameters.
3. Example Output
_________________________________________________________________
 Layer (type)                 Output Shape              Param #
=================================================================
 conv2d (Conv2D)              (None, 26, 26, 16)        160
 max_pooling2d (MaxPooling2D) (None, 26, 26, 16)        0
 flatten (Flatten)            (None, 10816)             0
 dense (Dense)                (None, 128)               1,384,576
 dropout (Dropout)            (None, 128)               0
 dense_1 (Dense)              (None, 10)                1,290
=================================================================
Total params: 1,386,026
Trainable params: 1,386,026
Non-trainable params: 0
_________________________________________________________________
4. Detailed Breakdown
o The reduction from 28x28 to 26x26 is due to the filter size (3x3) and no padding
('valid').
Param #: 160:
o Computed as (filter width × filter height × input channels + 1) × number of filters,
i.e., (3 × 3 × 1 + 1) × 16 = 160.
Pooling Layer
Param #: 0:
o Pooling layers perform a fixed operation and have no trainable weights.
Flatten Layer
o Converts the 26x26x16 tensor into a flat vector with 26 × 26 × 16 = 10,816 features.
Dense Layer
o Param #: 1,384,576: computed as (10,816 + 1) × 128, i.e., one weight per flattened
input feature plus a bias for each of the 128 neurons.
Dropout Layer
Output Shape: (None, 128):
o The shape remains unchanged as dropout only deactivates neurons during training
without altering dimensions.
Output Layer
o Fully connected layer with 10 neurons (one for each digit class).
o Param #: 1,290: computed as (128 + 1) × 10, i.e., one weight per input from the
128-neuron dense layer plus a bias for each of the 10 output neurons.
1. Understanding Parameters:
o Trainable parameters are the weights and biases adjusted during training.
o Non-trainable parameters are fixed values, often in layers like batch normalization.
2. Total Parameters:
o The sum over all layers: 160 + 1,384,576 + 1,290 = 1,386,026, all of which are
trainable in this model.
3. Debugging:
o Use the summary to ensure layer outputs match the expected dimensions.
For larger models, visualize the architecture using external tools like TensorBoard or
plot_model from Keras.
Use the parameter count to assess the model’s complexity relative to the available
computational resources.
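Building on the plot_model tip above, a minimal sketch (plot_model additionally requires the optional pydot and graphviz packages to be installed):

from keras.utils import plot_model

# Write a diagram of the architecture, including layer output shapes, to an image file
plot_model(model, to_file='model.png', show_shapes=True)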
The output includes a breakdown of the model's parameters and their memory usage:
1. Total Parameters:
o All weights and biases in the model (1,386,026 for this architecture).
2. Trainable Parameters:
o Parameters (weights and biases) that the model learns during training.
3. Non-Trainable Parameters:
o Parameters that are not updated during training (0 in this model).
4. Optimizer Parameters:
o Extra state maintained by the optimizer; Adam keeps moment estimates for every
trainable parameter.
Memory Breakdown
The memory usage is calculated based on the number of parameters and their precision:
1. Trainable Parameters:
o Stored as 32-bit floats by default, so their memory footprint is roughly the parameter
count × 4 bytes.
2. Optimizer Parameters:
o Adam's two per-parameter moment estimates add roughly twice that amount again.
3. Model Complexity:
o Ensure the hardware (GPU/CPU) can handle the memory requirements, especially
for larger datasets or deeper models.
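As a rough worked example, assuming the default 32-bit (4-byte) floating-point precision and the parameter count from the model summary above:

total_params = 1_386_026               # total parameters reported by model.summary()
weight_mb = total_params * 4 / 1e6     # float32 weights: about 5.5 MB
adam_mb = total_params * 2 * 4 / 1e6   # Adam's two moment estimates: roughly 11.1 MB more
print(f"Weights: {weight_mb:.1f} MB, Adam optimizer state: {adam_mb:.1f} MB")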
Code Explanation
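A minimal sketch of the evaluation step described below (assuming the preprocessed X_test and Y_test arrays from the data-preparation section):

score = model.evaluate(X_test, Y_test, verbose=0)   # returns [test loss, test accuracy]
print("Test loss: {:.2f}%".format(score[0] * 100))
print("Test accuracy: {:.2f}%".format(score[1] * 100))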
1. Purpose of Evaluation
model.evaluate(): Measures how well the trained model performs on data it has never seen,
returning the test loss and accuracy.
1. model.evaluate():
o Parameters: The preprocessed test images (X_test) and their one-hot encoded labels
(Y_test).
o Output: A list containing the test loss followed by the metrics specified when the
model was compiled (here, accuracy).
o Multiplying the loss and accuracy by 100 converts them to percentages for better
readability.
3. Printing Results:
Loss:
o Example: 2.53% loss means the model performs well on unseen data.
Accuracy:
o Example: 98.14% accuracy means the model correctly classifies ~98% of test images.
o A significant gap between training and test accuracy suggests overfitting.
o Accuracy focuses on correctness, while loss gives a more nuanced view of how
confident the predictions are.
3. Performance Metric:
o High test accuracy reflects that the model generalizes well to unseen data.
1. Improving Accuracy:
o Experiment with deeper architectures, more filters, or additional training epochs (see
Future Work above).
2. Analyzing Loss:
o Monitor if the loss remains consistent across training, validation, and testing phases.
o Check for a widening gap between training and validation loss, which is a sign of
overfitting.
The provided code plots the accuracy achieved during training and validation over all epochs.
Visualization is essential to evaluate the model's performance and detect potential issues like
overfitting or underfitting.
1. Code Explanation
train_acc = history.history['accuracy']
val_acc = history.history['val_accuracy']
history.history:
o Keys: 'accuracy', 'val_accuracy', 'loss', and 'val_loss', each holding one value per epoch.
1. X-Axis: Epochs
2. Y-Axis: Accuracy
3. plt.plot():
o Plots the training and validation accuracy values against the epoch number.
o Labels: Each call is given a label so the legend can tell the two curves apart.
4. plt.title():
o Adds a title to the graph: "Training and Validation Accuracy".
5. plt.xlabel() / plt.ylabel():
o Label the axes ("Epochs" and "Accuracy").
6. plt.legend():
o Adds a legend to differentiate the training and validation curves.
7. plt.show():
o Displays the plot.
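Putting these calls together, a minimal sketch of the accuracy plot (the label strings are illustrative):

import matplotlib.pyplot as plt

train_acc = history.history['accuracy']
val_acc = history.history['val_accuracy']

plt.plot(train_acc, label='Training Accuracy')
plt.plot(val_acc, label='Validation Accuracy')
plt.title('Training and Validation Accuracy')
plt.xlabel('Epochs')
plt.ylabel('Accuracy')
plt.legend()
plt.show()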
4. Example Output
X-Axis (Epochs): Ranges from 1 to the total number of epochs (e.g., 30).
Two Lines: One for training accuracy and one for validation accuracy.
1. Ideal Scenario:
o Training and validation accuracy curves align closely, indicating good generalization.
2. Overfitting:
o Training accuracy keeps increasing while validation accuracy plateaus or drops.
3. Underfitting:
o Both training and validation accuracy remain low, suggesting the model has not
learned enough.
Always monitor training and validation accuracy to ensure the model is learning effectively.
Compare accuracy with the loss values to ensure consistent performance across metrics.
This code generates a plot comparing the training and validation loss throughout the training
process. Visualizing the loss helps evaluate model performance and identify issues like overfitting or
underfitting.
1. Code Explanation
train_loss = history.history['loss']
val_loss = history.history['val_loss']
history.history:
o This object stores the loss and accuracy values for both training and validation during
the training process.
o Keys: 'loss' and 'val_loss' hold the training and validation loss for each epoch.
plt.plot(train_loss, label='Training Loss')
plt.plot(val_loss, label='Validation Loss')
plt.title('Training and Validation Loss')
plt.xlabel('Epochs')
plt.ylabel('Loss')
plt.legend()
plt.show()
plt.title(): Adds a title to the plot: "Training and Validation Loss".
plt.legend(): Adds a legend to differentiate the lines for training and validation loss.
5. Example Output
X-Axis (Epochs): Ranges from 1 to the total number of epochs (e.g., 30).
Two Lines: One for training loss and one for validation loss.
1. Ideal Scenario:
o Validation loss should ideally be close to training loss, indicating good generalization.
2. Overfitting:
o Training loss continues to decrease, but validation loss starts increasing after a point.
3. Underfitting:
o Both training and validation loss remain high, indicating the model has not learned
the underlying patterns.
4. Convergence:
o If both training and validation loss converge to similar low values, it indicates the
model is generalizing well to the test data.
Always monitor both training and validation loss to detect overfitting or underfitting early.
If the validation loss starts increasing while training loss decreases, it’s a sign to stop training
or use regularization techniques.
Document and save these plots to assess model performance for future reference or in
project reports.
Summary:
Visualizing the loss for both training and validation data is crucial for understanding how well the
model is performing and detecting potential issues like overfitting.