Assignment 1
Monsoon 2020
Total Marks: 100. Due Date: 27 Sept, 2020
Instructions:
(1) The assignment is to be attempted individually.
(2) You can use only Python as the programming language.
(3) You are free to use math libraries such as NumPy and Pandas, and plotting libraries
such as Matplotlib and Seaborn.
(4) Usage instructions for the other libraries are provided in the questions. Do not use
any ML module that is not allowed.
(5) Create a ‘.pdf’ report that contains your approach, pre-processing, assumptions, etc.
Add all the analysis related to each question, in written form, to the report; anything
not in the report will not be marked. Use plots wherever required.
(6) Implement code that is modular in nature. Only Python (*.py) files should be submitted.
(7) Submit the code, readme, and analysis files in ZIP format with the naming convention
‘A1 rollno name.zip’. This nomenclature has to be followed strictly.
(8) You should be able to replicate your results during the demo, failing which will fetch
zero marks.
(9) There will be no deadline extension under any circumstances. According to course
policies, no late submissions will be considered. So, start early.
(1) In this question, you will explore data visualization, an essential task in machine
learning.
Datasets: ‘dataset 1’, ‘dataset 2’ (attached with the assignment)
You can use pyplot or other related libraries for this question. You have to include
legends in the plots to denote the class labels and other relevant information.
(a) Visualize 10 samples of each class from ‘dataset 1’ in the form of images. What
are your observations? 5 marks
(b) Usually, to explore data complexity, we visualize a scatter plot of the data. In
a scatter plot, the samples of all the classes are visualized simultaneously, which
provides information about the class separation. Visualize ‘dataset 2’ in the
form of a scatter plot. What is your inference about this dataset? 10 marks
(c) We can visualize only 2D or 3D data using scatter plots. For feature dimensions
higher than three, we may use t-distributed Stochastic Neighbor Embedding
(t-SNE) to reduce the number of features. Use t-SNE to reduce ‘dataset 1’
to 2 dimensions and visualize the scatter plot. What is your inference regarding the
class separation? 10 marks
(d) Now, reduce the original ‘dataset 1’ to three dimensions and visualize the scatter
plot again. Is there any important distinction in inference as compared to (c) above?
10 marks
You can use t-SNE from the sklearn library for parts (c) and (d).
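For parts (c) and (d), the sklearn workflow could look like the sketch below. The data here is a random synthetic stand-in for ‘dataset 1’ (the sample count, feature count, and class labels are assumptions for illustration); the legend-per-class plotting pattern is the part that carries over to the real dataset.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend so the script also runs without a display
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

# Synthetic stand-in for 'dataset 1': 30 samples, 16 features, 3 classes
rng = np.random.RandomState(0)
X = rng.rand(30, 16)
y = np.repeat([0, 1, 2], 10)

# Reduce to 2 dimensions with t-SNE (perplexity must be < n_samples)
emb = TSNE(n_components=2, perplexity=5, random_state=0).fit_transform(X)

# Scatter plot with one legend entry per class label, as the instructions require
for label in np.unique(y):
    mask = y == label
    plt.scatter(emb[mask, 0], emb[mask, 1], label=f"class {label}")
plt.legend()
plt.savefig("tsne_2d.png")
```

For part (d), changing `n_components=2` to `n_components=3` and plotting on a 3D axis (`fig.add_subplot(projection="3d")`) gives the three-dimensional version.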
(2) For this question, you can use the decision tree classifier from sklearn.
Dataset: ‘dataset 2’ (attached with the assignment).
Use the first 70% of the samples for training and the remaining 30% for testing. Implement
the functions to split the dataset and to calculate accuracy; you cannot use any inbuilt
version. However, you may use NumPy, Pandas, Random, etc.
(a) Take depth as a hyperparameter, and perform a grid search to find its optimal
value. Plot a curve of depth versus testing accuracy. Comment on the effect
of depth on the performance of the classifier. You have to perform the grid search
for at least 15 values of the depth. Is your best performance consistent with the
visualization in (1.b)? You have to implement the grid search and cannot use any
inbuilt implementation of the algorithm. 15 marks
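A hand-rolled grid search over depth is just a loop, as sketched below. The synthetic data is a stand-in for ‘dataset 2’ (its shape and labels are assumptions), and the helper name is our own; only `DecisionTreeClassifier` comes from sklearn, as allowed.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def grid_search_depth(X_tr, y_tr, X_te, y_te, depths):
    """Fit one tree per candidate depth and record its test accuracy."""
    scores = {}
    for d in depths:
        clf = DecisionTreeClassifier(max_depth=d, random_state=0)
        clf.fit(X_tr, y_tr)
        scores[d] = float(np.mean(clf.predict(X_te) == y_te))
    best = max(scores, key=scores.get)  # depth with the highest test accuracy
    return best, scores

# Synthetic separable stand-in data: label depends on x0 + x1
rng = np.random.RandomState(0)
X = rng.rand(200, 2)
y = (X[:, 0] + X[:, 1] > 1).astype(int)
best, scores = grid_search_depth(X[:140], y[:140], X[140:], y[140:],
                                 depths=range(1, 16))
print(best, scores[best])
```

The `scores` dictionary feeds directly into the required depth-versus-accuracy plot, e.g. `plt.plot(list(scores), list(scores.values()))`.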
(b) For part (a), prepare a table representing the train accuracy and validation accuracy
for each value of the depth. Comment on overfitting and underfitting for each entry
in the table. 10 marks
(c) Replicate part (b) with sklearn’s accuracy function. Is there any deviation between
the results from your implementation and the inbuilt function? 5 marks
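For part (c), the comparison amounts to evaluating the same predictions with both implementations; assuming the inbuilt function meant here is `sklearn.metrics.accuracy_score`, a minimal check looks like:

```python
import numpy as np
from sklearn.metrics import accuracy_score

y_true = np.array([0, 1, 1, 0, 1])
y_pred = np.array([0, 1, 0, 0, 1])

custom = float(np.mean(y_true == y_pred))   # hand-rolled accuracy from question 2
builtin = accuracy_score(y_true, y_pred)    # sklearn's inbuilt implementation
print(custom, builtin)                      # both 0.8 for this example
```

Since both compute the same fraction of matching labels, any deviation in your results would point to a bug in the custom implementation rather than a definitional difference.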
(3) For this question, you can use the decision tree classifier from sklearn.
Dataset: PM2.5 Data UCI Archive
Target Variable: Month
You will have to handle null values in the data.
(Remove the "No" column, as it is an index. This information is shared because this might
be the first ML model for many people. From the next assignment, data analysis and feature
selection will be a part of the exercise.)
Split the data into training and testing sets (80:20 ratio) using the function you created
in the previous question. Use the same training set for training the following models. You
cannot use sklearn for splitting the dataset.
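The pre-processing described above (drop the "No" index column, handle nulls) could be sketched as below. Dropping null rows is one simple strategy, not the only valid one; the toy DataFrame merely mimics the PM2.5 layout and its column names beyond "No" are assumptions.

```python
import numpy as np
import pandas as pd

def preprocess(df):
    """Drop the 'No' index column and remove rows containing null values."""
    df = df.drop(columns=["No"])  # 'No' is just a row index, not a feature
    return df.dropna()            # simplest null-handling choice; imputation also works

# Toy frame mimicking the PM2.5 layout (column names are illustrative)
df = pd.DataFrame({"No": [1, 2, 3],
                   "month": [1, 2, 3],
                   "pm2.5": [12.0, np.nan, 30.0]})
clean = preprocess(df)
print(clean.shape)  # (2, 2): one null row dropped, 'No' column removed
```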
(a) Train a decision tree using both the Gini index and entropy. Do not change any other
default values of the classifier. In the following models, use the criterion that gives
better accuracy on the test set. 5 marks
(b) Train decision trees with different maximum depths [2, 4, 8, 10, 15, 30]. Find the
best value of depth using the testing and training accuracy. Plot the curves of
training and testing accuracy versus depth to support your analysis. 10 marks
(c) Ensembling is a method to combine multiple not-so-good models to get a better-
performing model (more in upcoming lectures). Create 100 different decision stumps
(max depth 3). For each stump, train it on a randomly selected 50% of the training
data, i.e., select the data for each stump separately. Now, predict the test samples’
labels by taking a majority vote of the outputs of the stumps. How is the performance
affected as compared to parts (a) and (b)? 10 marks
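The stump ensemble in part (c) could be sketched as below. The function name and synthetic data are our own; the sketch assumes non-negative integer class labels (true for the month target) so that `np.bincount` can tally the votes.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def bagged_stumps_predict(X_tr, y_tr, X_te, n_stumps=100, frac=0.5,
                          max_depth=3, seed=0):
    """Train n_stumps trees on random 50% subsets; majority-vote on X_te."""
    rng = np.random.default_rng(seed)
    n = len(X_tr)
    all_preds = np.empty((n_stumps, len(X_te)), dtype=int)
    for i in range(n_stumps):
        # Fresh random 50% subset of the training data for each stump
        idx = rng.choice(n, size=int(frac * n), replace=False)
        stump = DecisionTreeClassifier(max_depth=max_depth, random_state=0)
        stump.fit(X_tr[idx], y_tr[idx])
        all_preds[i] = stump.predict(X_te)
    # Majority vote per test sample (labels assumed non-negative integers)
    return np.apply_along_axis(lambda col: np.bincount(col).argmax(),
                               0, all_preds)

# Synthetic separable stand-in data
rng = np.random.RandomState(0)
X = rng.rand(200, 2)
y = (X[:, 0] > 0.5).astype(int)
pred = bagged_stumps_predict(X[:160], y[:160], X[160:])
print(float(np.mean(pred == y[160:])))
```

For part (d), the same function can be reused by sweeping `max_depth` and `n_stumps` over the given grids.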
(d) Now, try to tune the decision stumps by changing the max depth [4, 8, 10, 15, 20,
best achieved from (b)] and the number of trees. Analyze the effect on the training and
testing accuracy. Use a majority vote for the final prediction on the test data. 10 marks
Compare the results of the classification models created above on the test set. Rank the
models and analyze whether there is a statistically significant difference.
Add all the analysis to the report.