Deep Learning Assignment: CNN & Clustering
EE769 Introduction to Machine Learning (Jan 2022 edition)

Electrical Engineering, Indian Institute of Technology Bombay


Programming Assignment 3: Deep Learning and Unsupervised Learning

Instructions:

a) Submit four IPython notebooks named <RollNo>_<i>.ipynb, where i is the question
number. Each notebook should be a complete code-plus-report with copious comments,
references and URLs, outputs, critical observations, and your reasoning for choosing next steps.
b) Use good coding practices, such as avoiding hard-coding, using self-explanatory variable
names, and using functions where applicable. This will also be graded.
c) Cite your sources if you use code from the internet, and clarify what you have modified.
Ensure that the code has a permissive license, or that academic use can reasonably be
assumed to fall under ‘fair use’.

Problem statements:

1. Convolutional Neural Networks:


a. Copy and study the starter code (up to “ConvNet as fixed feature extractor”) given by
Sasank (CTO of [Link], PyTorch contributor, and an alumnus of IITB) for classifying ants
vs. bees: [Link] The
key feature of this code is that it does not train a model from scratch; instead, it uses
transfer learning on a ResNet-18 architecture pre-trained on a large dataset
(ImageNet), and only fine-tunes it for the problem at hand.
b. Modify the code to run on Colab without adding any new features. [1]
c. Modify the code to plot validation loss and accuracy after every training epoch. [2]
d. Change the learning rate, momentum, and number of epochs at least three times to
see the net effect on the final validation loss and accuracy, and on the speed of
convergence. [Link] [1]
e. Introduce weight decay (an L2 penalty on the weights) and find a good value for the
weight-decay factor. [1]
2. Clustering:
a. Visualize and pre-process the data from the file [Link] as appropriate.
You might have to use a power, an exponential, or a log transformation. [1]
b. Train k-means and find an appropriate value of k. [1]
c. Using the cluster assignments as labels, visualize the t-SNE embedding. [1]
3. PCA:
a. Visualize the data from the file [Link]. [1]
b. Train PCA. [1]
c. Plot the variance explained versus PCA dimensions. [1]
d. Reconstruct the data with various numbers of PCA dimensions, and compute the
MSE. [1]
4. Non-linear dimension reduction:
a. Visualize the data from the file [Link]. [1]
b. Train KPCA. [1]
c. Plot the variance explained versus KPCA dimensions for up to 10 dimensions. [1]

Common questions


Plotting validation loss and accuracy after each training epoch provides real-time feedback on the model's performance during training. This process is important for monitoring overfitting or underfitting, as it helps assess whether the model is generalizing well to unseen data. Sudden changes or stability in these metrics can guide adjustments to hyperparameters or model structure, ensuring that model improvements align with the data's characteristics and desired performance metrics.
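As a minimal sketch of this idea, the loop below records validation loss and accuracy after every epoch. It uses scikit-learn's MLPClassifier with `partial_fit` and a synthetic dataset as stand-ins for the PyTorch model and ant/bee data in the assignment; the hyperparameters are illustrative assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score, log_loss
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=500, random_state=0)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, random_state=0)

clf = MLPClassifier(hidden_layer_sizes=(10,), random_state=0)
val_losses, val_accs = [], []
for epoch in range(20):
    # one partial_fit call plays the role of one training epoch
    clf.partial_fit(X_tr, y_tr, classes=np.unique(y))
    proba = clf.predict_proba(X_val)
    val_losses.append(log_loss(y_val, proba))                    # validation loss
    val_accs.append(accuracy_score(y_val, clf.predict(X_val)))   # validation accuracy
# val_losses and val_accs can now be plotted against epoch number.
```

If validation loss starts rising while training loss keeps falling, that divergence is the classic sign of overfitting.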

Weight decay, also known as L2 regularization, adds a penalty proportional to the square of the magnitude of the model's weights to the loss function. This discourages overly complex models by penalizing large weight values, thus preventing overfitting. By controlling model complexity, weight decay improves the model’s ability to generalize to unseen data, enhancing its robustness and performance on held-out samples. Properly tuning the weight decay factor is crucial to balance model simplicity and predictive power.
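The shrinkage effect of the L2 penalty can be seen directly by comparing ordinary least squares with ridge regression, whose `alpha` plays the role of the weight-decay factor (in PyTorch the analogous knob is the `weight_decay` argument of `torch.optim.SGD`). The data here is synthetic and purely illustrative.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))
y = X @ rng.normal(size=10) + 0.1 * rng.normal(size=100)

ols = LinearRegression().fit(X, y)
ridge = Ridge(alpha=10.0).fit(X, y)   # alpha is the weight-decay factor

# The penalized weights are systematically smaller in norm.
print(np.linalg.norm(ols.coef_), np.linalg.norm(ridge.coef_))
```

Tuning `alpha` (or `weight_decay`) on the validation set is the usual way to find the balance the paragraph above describes.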

t-SNE is advantageous for visualizing cluster assignments as it captures non-linear structures in data and embeds high-dimensional data into low-dimensional spaces while preserving local similarities, making it suitable for identifying clusters. Unlike traditional methods like PCA, which focuses on global variance, t-SNE emphasizes revealing complex data relationships and patterns that are not easily visible through other linear techniques, providing more intuitive visual insights into clustering outcomes.
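A minimal sketch of question 2c, assuming synthetic blob data in place of the assignment's file: cluster with k-means, then embed with t-SNE and color points by their cluster assignment.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.manifold import TSNE

X, _ = make_blobs(n_samples=150, centers=3, random_state=0)

# Cluster assignments serve as the "labels" for the visualization.
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

# 2-D t-SNE embedding; perplexity is a tunable neighborhood-size parameter.
emb = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(X)
# Scatter-plot emb[:, 0] vs. emb[:, 1], colored by `labels`.
```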

Non-linear dimension reduction techniques like Kernel PCA can model complex, non-linear relationships among data points that linear methods like PCA cannot capture. Kernel PCA projects data into high-dimensional spaces using kernel functions, uncovering structure and patterns in data with non-linear characteristics, which are pivotal in enhancing model performance in tasks where linear assumptions are inadequate. The choice between Kernel PCA and linear methods depends on the data characteristics and task requirements, with non-linear techniques generally providing richer insights at the cost of increased computational complexity.
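The concentric-circles dataset is the standard illustration of this point: linear PCA can only rotate the data, while Kernel PCA with an RBF kernel unfolds the non-linear structure. The kernel choice and `gamma` value below are illustrative assumptions.

```python
from sklearn.datasets import make_circles
from sklearn.decomposition import PCA, KernelPCA

X, y = make_circles(n_samples=300, factor=0.3, noise=0.05, random_state=0)

X_pca = PCA(n_components=2).fit_transform(X)                  # linear: circles stay nested
X_kpca = KernelPCA(n_components=2, kernel="rbf",
                   gamma=10).fit_transform(X)                 # non-linear: circles separate
```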

Plotting variance explained versus PCA dimensions allows the identification of the optimal number of components that capture the most substantial data variation. This plot, often referred to as a scree plot or elbow curve, helps determine the point of diminishing returns where additional components contribute minimally to explained variance. Making informed decisions on dimensionality reduction enables reduced computational costs and improved model performance, as it simplifies the dataset while retaining meaningful information.
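A minimal scree-plot computation, using the Iris dataset as an illustrative stand-in for the assignment's file: fit PCA with all components and inspect the per-component and cumulative explained-variance ratios.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X = load_iris().data
pca = PCA().fit(X)

ratios = pca.explained_variance_ratio_   # fraction of variance per component
cum = np.cumsum(ratios)                  # cumulative variance explained
# Plot `cum` against component index; the "elbow" suggests how many to keep.
```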

Reconstructing data using varying numbers of PCA dimensions involves balancing the trade-off between data reconstruction accuracy and dimensionality reduction. Lowering dimensions may increase computation efficiency and reduce storage needs, but it can lead to higher Mean Squared Error (MSE) as critical data information might be lost. Conversely, using more dimensions preserves more information and results in lower MSE, but at increased computational and storage expenses. The challenge lies in selecting a dimensional threshold that minimizes MSE while maximizing the benefits of dimensionality reduction.
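This trade-off can be measured directly: project onto k components, invert the projection, and compute the MSE against the original data. Iris again stands in for the assignment's file.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X = load_iris().data
mses = []
for k in range(1, X.shape[1] + 1):
    pca = PCA(n_components=k).fit(X)
    X_rec = pca.inverse_transform(pca.transform(X))  # reconstruct from k components
    mses.append(np.mean((X - X_rec) ** 2))
# mses is non-increasing: each added component can only reduce the error.
```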

Modifications to the learning rate, momentum, and the number of epochs can significantly impact a neural network's training effectiveness and efficiency. A properly selected learning rate facilitates rapid convergence, whereas a too-high rate may cause divergence or oscillation. Momentum helps in accelerating the gradients during training, especially in regions with shallow gradients, reducing convergence time. Furthermore, adjusting the number of epochs can ensure sufficient training duration for convergence but needs to be balanced to avoid overfitting. Experimenting with these parameters allows practitioners to optimize the training process for better performance and faster convergence.
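The update rule behind these two hyperparameters fits in a few lines. The sketch below implements SGD with momentum in numpy (the same scheme `torch.optim.SGD` uses) and minimizes a simple quadratic so the effect of `lr` and `momentum` can be studied in isolation; the values chosen are illustrative.

```python
import numpy as np

def sgd_momentum(grad, w0, lr=0.1, momentum=0.9, epochs=300):
    """Minimal SGD-with-momentum loop: velocity accumulates past gradients."""
    w = np.array(w0, dtype=float)
    v = np.zeros_like(w)
    for _ in range(epochs):
        v = momentum * v + grad(w)   # momentum smooths/accelerates the step
        w = w - lr * v               # learning rate scales the step size
    return w

# Minimize f(w) = ||w||^2 / 2, whose gradient is w; the minimum is at 0.
w_final = sgd_momentum(lambda w: w, w0=[5.0, -3.0])
```

Re-running with, say, `lr=2.5` shows divergence, and `momentum=0.0` shows slower convergence, mirroring the observations asked for in question 1d.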

Selecting the appropriate number of clusters (k) in k-means clustering is crucial for accurately reflecting inherent data groupings and ensuring model validity. An incorrect k can lead to overfitting or underfitting, misrepresenting data structure. Effective methods to determine k include the Elbow Method, which analyzes the within-cluster sum of squares to identify diminishing returns, and the Silhouette Coefficient, which measures cluster separation and cohesion. These methods help achieve a balance between accuracy and simplicity, crucial for meaningful cluster interpretations.
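Both methods mentioned above can be computed in one loop. The sketch uses synthetic blobs with three well-separated centers (an illustrative assumption), so the silhouette score should peak at k = 3.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

X, _ = make_blobs(n_samples=300, centers=[[0, 0], [6, 0], [3, 5]],
                  cluster_std=0.7, random_state=0)

inertias, sils = {}, {}
for k in range(2, 7):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    inertias[k] = km.inertia_                    # for the elbow plot
    sils[k] = silhouette_score(X, km.labels_)    # cohesion vs. separation

best_k = max(sils, key=sils.get)
```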

Transforming data using power, exponential, or log transformations helps stabilize variance, improve linear relationships among features, and normalize data distribution, which are critical for effective clustering. These transformations can make patterns more discernible by adjusting skewness and reducing the impact of outliers, thus enhancing the accuracy and reliability of clustering algorithms that assume certain statistical properties of the data, such as k-means.
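The effect on skewness is easy to quantify. The sketch below draws heavily right-skewed (lognormal) data as an illustrative stand-in for the assignment's file and applies `log1p`, which is safe even when zeros are present.

```python
import numpy as np

def skewness(x):
    """Sample skewness: mean cubed z-score."""
    x = np.asarray(x, dtype=float)
    z = (x - x.mean()) / x.std()
    return np.mean(z ** 3)

rng = np.random.default_rng(0)
raw = rng.lognormal(mean=0.0, sigma=1.0, size=5000)  # heavily right-skewed
logged = np.log1p(raw)                               # log(1 + x) transformation

print(skewness(raw), skewness(logged))  # skew shrinks dramatically after the log
```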

Transfer learning with a pre-trained model like ResNet-18 involves using a model that has been previously trained on a large and diverse dataset (such as ImageNet). This approach utilizes the existing patterns and features learned by the model, thus requiring only fine-tuning on a smaller, task-specific dataset, such as the ants vs. bees classification. This significantly reduces computation time and the amount of data needed for training, compared to training a CNN from scratch, where the model must learn all patterns from the ground up, often necessitating a large dataset and more computational resources.
