Assignment 6
Introduction to Machine Learning
Prof. B. Ravindran
1. When building models using decision trees we essentially split the entire input space using
(a) axis-parallel hyper-rectangles
(b) polynomial curves of order greater than two
(c) polynomial curves of the same order as the depth of the decision tree
(d) none of the above
Sol. (a)
2. In building a decision tree model, to control the size of the tree, we need to control the number
of regions. One approach to do this would be to split tree nodes only if the resultant decrease
in the sum of squares error exceeds some threshold. For the described method, which among
the following are true?
(a) it would, in general, help restrict the size of the trees
(b) it has the potential to affect the performance of the resultant regression/classification
model
(c) it is computationally infeasible
Sol. (a), (b)
While this approach may restrict the eventual number of regions produced, the main problem
with this approach is that it is too restrictive and may result in poor performance. It is very
common for splits at one level, which themselves are not that good (i.e., they do not decrease
the error significantly), to lead to very good splits (i.e., where the error is significantly reduced)
down the line. Think about the XOR problem.
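The XOR caveat can be made concrete: on the XOR dataset, no single axis-parallel split reduces the entropy at all, so an error-decrease threshold would stop splitting immediately, even though a depth-2 tree classifies the data perfectly. A minimal sketch in plain Python (no tree library assumed):

```python
from math import log2

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    n = len(labels)
    return -sum((labels.count(c) / n) * log2(labels.count(c) / n)
                for c in set(labels))

# XOR data: label = x1 XOR x2
X = [(0, 0), (0, 1), (1, 0), (1, 1)]
y = [0, 1, 1, 0]

# Any first split on x1 leaves both children with a 50/50 class mix,
# so the information gain of that split is zero:
left  = [label for point, label in zip(X, y) if point[0] == 0]  # [0, 1]
right = [label for point, label in zip(X, y) if point[0] == 1]  # [1, 0]
gain = entropy(y) - (len(left) / 4 * entropy(left)
                     + len(right) / 4 * entropy(right))
print(gain)  # zero gain: any positive pre-pruning threshold stops here,
             # yet splitting on x1 and then x2 classifies XOR perfectly
```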
3. Suppose we use the decision tree model for solving a multi-class classification problem. As we
continue building the tree, w.r.t. the generalisation error of the model,
(a) the error due to bias increases
(b) the error due to bias decreases
(c) the error due to variance increases
(d) the error due to variance decreases
Sol. (b) & (c)
As we continue to build the decision tree model, it is possible that we overfit the data. In
this case, the model is sufficiently complex, i.e., the error due to bias is low. However, due to
overfitting, the error due to variance starts increasing.
4. (2 marks) Having built a decision tree, we are using reduced error pruning to reduce the size
of the tree. We select a node to collapse. For this particular node, on the left branch, there are
3 training data points with the following outputs: 5, 7, 9.6 and for the right branch, there are
four training data points with the following outputs: 8.7, 9.8, 10.5, 11. The average value of the
outputs of data points denotes the response of a branch. The original responses for data points
along the two branches (left and right respectively) were response left and response right, and the new response after collapsing the node is response new. What are the values of response left, response right and response new (numbers in the options are given in the same order)?
(a) 21.6, 40, 61.6
(b) 7.2, 10, 8.8
(c) 3, 4, 7
(d) depends on the tree height.
Sol. (b)
Original responses:
Left: (5 + 7 + 9.6)/3 = 7.2
Right: (8.7 + 9.8 + 10.5 + 11)/4 = 10
New response: 7.2 × (3/7) + 10 × (4/7) = 8.8
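The arithmetic above can be sketched in a few lines of plain Python (a minimal illustration, not tied to any tree library): each branch predicts the mean of its training outputs, and the collapsed node predicts the mean over all seven points, which equals the size-weighted average of the two branch responses.

```python
left_outputs  = [5, 7, 9.6]
right_outputs = [8.7, 9.8, 10.5, 11]

# each branch predicts the mean of its own training outputs
response_left  = sum(left_outputs) / len(left_outputs)      # ≈ 7.2
response_right = sum(right_outputs) / len(right_outputs)    # ≈ 10.0

# the collapsed node predicts the mean over all seven points
n_total = len(left_outputs) + len(right_outputs)
response_new = (sum(left_outputs) + sum(right_outputs)) / n_total  # ≈ 8.8

print(response_left, response_right, response_new)
```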
5. (2 marks) Consider the following dataset:
feature1 feature2 output
11.7 183.2 a
12.8 187.6 a
15.3 177.4 a
13.9 198.6 a
17.2 175.3 a
16.8 151.1 b
17.5 171.4 b
23.6 162.8 b
16.9 179.5 b
19.1 173.8 b
Which among the following split-points for feature1 would give the best split according to
the information gain measure?
(a) 14.6
(b) 16.05
(c) 16.85
(d) 17.35
Sol. (b)
info_feature1(14.6)(D) = (3/10)(−(3/3) log2(3/3) − (0/3) log2(0/3)) + (7/10)(−(2/7) log2(2/7) − (5/7) log2(5/7)) = 0.6042
info_feature1(16.05)(D) = (4/10)(−(4/4) log2(4/4) − (0/4) log2(0/4)) + (6/10)(−(1/6) log2(1/6) − (5/6) log2(5/6)) = 0.39
info_feature1(16.85)(D) = (5/10)(−(4/5) log2(4/5) − (1/5) log2(1/5)) + (5/10)(−(1/5) log2(1/5) − (4/5) log2(4/5)) = 0.7219
info_feature1(17.35)(D) = (7/10)(−(5/7) log2(5/7) − (2/7) log2(2/7)) + (3/10)(−(0/3) log2(0/3) − (3/3) log2(3/3)) = 0.6042
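These figures can be checked programmatically. A short sketch in plain Python (splitting as feature1 ≤ split-point, which matches the class counts above; lower weighted child entropy means higher information gain):

```python
from math import log2

# feature1 values and class labels from the table in question 5
feature1 = [11.7, 12.8, 15.3, 13.9, 17.2, 16.8, 17.5, 23.6, 16.9, 19.1]
labels   = ['a'] * 5 + ['b'] * 5

def entropy(ys):
    """Shannon entropy of a list of class labels (0 log 0 taken as 0)."""
    n = len(ys)
    return -sum((ys.count(c) / n) * log2(ys.count(c) / n) for c in set(ys))

def weighted_entropy(split):
    """Weighted child entropy for the split feature1 <= split."""
    left  = [l for x, l in zip(feature1, labels) if x <= split]
    right = [l for x, l in zip(feature1, labels) if x > split]
    n = len(labels)
    return len(left) / n * entropy(left) + len(right) / n * entropy(right)

for s in [14.6, 16.05, 16.85, 17.35]:
    print(s, round(weighted_entropy(s), 4))
# 16.05 yields the lowest weighted entropy (≈ 0.39), hence the highest gain
```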
6. (2 marks) For the same dataset, which among the following split-points for feature2 would
give the best split according to the gini index measure?
(a) 172.6
(b) 176.35
(c) 178.45
(d) 185.4
Sol. (a)
gini_feature2(172.6)(D) = (7/10) × 2 × (5/7) × (2/7) + (3/10) × 2 × (0/3) × (3/3) = 0.2857
gini_feature2(176.35)(D) = (5/10) × 2 × (1/5) × (4/5) + (5/10) × 2 × (4/5) × (1/5) = 0.32
gini_feature2(178.45)(D) = (6/10) × 2 × (2/6) × (4/6) + (4/10) × 2 × (3/4) × (1/4) = 0.4167
gini_feature2(185.4)(D) = (2/10) × 2 × (2/2) × (0/2) + (8/10) × 2 × (3/8) × (5/8) = 0.375
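As with the entropy case, these gini figures can be verified with a short plain-Python sketch (splitting as feature2 ≤ split-point; for two classes, the gini impurity of a node is 2 · p_a · p_b):

```python
# feature2 values and class labels from the table in question 5
feature2 = [183.2, 187.6, 177.4, 198.6, 175.3,
            151.1, 171.4, 162.8, 179.5, 173.8]
labels   = ['a'] * 5 + ['b'] * 5

def gini(ys):
    """Gini impurity for two classes: 2 * p_a * p_b."""
    p_a = ys.count('a') / len(ys)
    return 2 * p_a * (1 - p_a)

def weighted_gini(split):
    """Weighted child gini for the split feature2 <= split."""
    left  = [l for x, l in zip(feature2, labels) if x <= split]
    right = [l for x, l in zip(feature2, labels) if x > split]
    n = len(labels)
    return len(left) / n * gini(left) + len(right) / n * gini(right)

for s in [172.6, 176.35, 178.45, 185.4]:
    print(s, round(weighted_gini(s), 4))
# 172.6 yields the lowest weighted gini (≈ 0.2857), hence the best split
```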
7. In which of the following situations is it appropriate to introduce a new category ‘Missing’ for
missing values? (multiple options may be correct)
(a) When values are missing because the 108 emergency operator is sometimes attending a
very urgent distress call.
(b) When values are missing because the attendant spilled coffee on the papers from which
the data was extracted.
(c) When values are missing because the warehouse storing the paper records went up in
flames and burnt parts of it.
(d) When values are missing because the nurse/doctor finds the patient’s situation too urgent.
Sol. (a),(d)
We typically introduce a ‘Missing’ value when the fact that a value is missing can also be a
relevant feature. In the case of (a), it can imply that the call was so urgent that the operator
couldn’t note the value down. This urgency could potentially be useful for determining the target.
But a coffee spill corrupting the records is likely to be completely random and we glean no
new information from it. In this case, a better method is to try to predict the missing data
from the available data.