
Classification and Regression Trees (CART) Algorithm
Prakash P
Classification and Regression Trees
• Classification and Regression Trees (CART) is a modern term for what are otherwise known as Decision Trees.
• Decision Trees have been around for a very long time and are important for predictive modelling in Machine Learning.
• As the name suggests, these trees are used for both classification and regression (prediction) problems.
• They also serve as the basis for more modern ensemble classifiers such as Random Forests.
Classification
• Generally, a classification problem can be described as follows:
Data: A set of records (instances), each described by:
  * k attributes: A1, A2, ..., Ak
  * A class: one of a discrete set of labels
Goal: To learn a classification model from the data that can be used to predict the classes of new (future, or test) instances.
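As a minimal sketch of this setup (not from the slides - the feature values, labels, and attribute meanings below are invented for illustration), here is a toy classification problem with k = 2 attributes and a discrete class, fitted with scikit-learn's CART-style tree:

# Minimal sketch of a classification problem: instances with k attributes
# and a discrete class label. The data is invented purely for illustration.
from sklearn.tree import DecisionTreeClassifier

# Instances described by k = 2 attributes (say, A1 = owns_house, A2 = age_group)
X = [[1, 0], [1, 1], [0, 0], [0, 1], [1, 2], [0, 2]]
# Discrete class labels (1 = loan approved, 0 = not approved)
y = [1, 1, 0, 0, 1, 0]

# scikit-learn's DecisionTreeClassifier uses the Gini index by default,
# in line with the CART algorithm described in these slides.
clf = DecisionTreeClassifier(criterion="gini")
clf.fit(X, y)

# Predict the class of a new (test) instance
print(clf.predict([[1, 1]]))  # -> [1]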
Classification
However, we must note that there can be many other possible decision trees for a given problem - we want the shortest (simplest) one.
We also want it to be better in terms of accuracy (prediction error measured in terms of misclassification cost).
CART Algorithm for Classification
• The tree is constructed in a top-down approach as follows:

Step 1: Start at the root node with all training instances.

Step 2: Select an attribute on the basis of a splitting criterion (CART uses the Gini Index; related algorithms use other impurity metrics such as Gain Ratio).

Step 3: Partition the instances according to the selected attribute, and recurse on each partition.
Partitioning stops when (see the sketch below):

 There are no examples left
 All examples at a given node belong to the same class
 There are no remaining attributes for further partitioning - the majority class becomes the leaf
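The following sketch (an illustration, not the slides' exact procedure) shows how these steps and stopping conditions fit together. The impurity measure here is a simple placeholder - CART's actual measure, the Gini Index, is defined on a later slide:

# Sketch of top-down CART-style construction.
# Instances are (feature_dict, label) pairs.
from collections import Counter

def impurity(labels):
    # Placeholder impurity: fraction of labels outside the majority class.
    # (CART uses the Gini Index instead, defined on a later slide.)
    n = len(labels)
    return 1 - Counter(labels).most_common(1)[0][1] / n

def weighted_impurity(instances, attr):
    # Size-weighted impurity over the partitions induced by attr
    n = len(instances)
    score = 0.0
    for value in {f[attr] for f, _ in instances}:
        part = [label for f, label in instances if f[attr] == value]
        score += len(part) / n * impurity(part)
    return score

def build_tree(instances, attributes):
    # Stop: there are no examples left at this node
    if not instances:
        return None
    labels = [label for _, label in instances]
    # Stop: all examples belong to the same class -> pure leaf
    if len(set(labels)) == 1:
        return labels[0]
    # Stop: no remaining attributes -> majority class becomes the leaf
    if not attributes:
        return Counter(labels).most_common(1)[0][0]
    # Step 2: choose the attribute with the lowest weighted impurity
    best = min(attributes, key=lambda a: weighted_impurity(instances, a))
    # Step 3: partition on that attribute and recurse on each subset
    rest = [a for a in attributes if a != best]
    return (best, {
        v: build_tree([(f, l) for f, l in instances if f[best] == v], rest)
        for v in {f[best] for f, _ in instances}
    })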
What is Impurity?
• In our dataset, we can see that a loan is always approved when the applicant owns their own house. This attribute is very informative (and certain), and it is hence set as the root node of the alternative decision tree shown previously.
• Classifying a lot of future applicants will then be easy.
• Selecting the age attribute is not as informative - there is a degree of uncertainty (or impurity), since a person's age does not seem to affect the final class as much.
• Based on the above discussion:
• A subset of data is pure if all instances belong to the same class.
• Our objective is to reduce impurity or uncertainty in the data as much as possible.
• The metric (or heuristic) used in CART to measure impurity is the Gini Index, and we select the attributes with lower Gini Indices first.
Gini Index
• The Gini index (or Gini impurity) measures the probability that a randomly chosen element would be wrongly classified if it were labelled randomly according to the class distribution.
• But what is actually meant by 'impurity'? If all the elements belong to a single class, then the subset can be called pure.
• The Gini index varies between 0 and 1, where 0 denotes that all elements belong to a single class, and values approaching 1 denote that the elements are distributed randomly across many classes (for k classes the maximum is 1 - 1/k).
• A Gini index of 0.5 denotes elements equally distributed across two classes.
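Concretely, for a node where a fraction p_i of the instances belongs to class i, the Gini index is Gini = 1 - sum(p_i^2). A minimal Python sketch (not from the slides):

# Gini impurity: 1 - sum of squared class proportions
from collections import Counter

def gini(labels):
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

print(gini(["yes", "yes", "yes"]))       # 0.0  (pure: a single class)
print(gini(["yes", "no", "yes", "no"]))  # 0.5  (two classes, evenly split)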
Example of Gini Index
Let's start by calculating the Gini Index for 'Past Trend'.
Calculation of Gini Index for Open Interest
Calculation of Gini Index for Trading Volume
Example
From the above table, we observe that 'Past Trend' has the lowest Gini Index, and hence it will be chosen as the root node of the decision tree.
We will repeat the same procedure to determine the sub-nodes or branches of the decision tree.
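As a sketch of this attribute-selection step (the slides' table is not reproduced here, so the dataset below is invented; only the attribute and class names follow the slides' stock example): the Gini of a split is the size-weighted average of the Gini of each branch, and the attribute with the lowest value becomes the node.

# Weighted Gini of a split, reusing gini() from the sketch above.
# instances: list of (feature_dict, label) pairs.
from collections import Counter

def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def weighted_gini(instances, attr):
    n = len(instances)
    score = 0.0
    for value in {f[attr] for f, _ in instances}:
        part = [label for f, label in instances if f[attr] == value]
        score += len(part) / n * gini(part)
    return score

# Hypothetical stand-in for the slides' stock dataset
data = [
    ({"past_trend": "Positive", "open_interest": "Low",  "trading_volume": "High"}, "Up"),
    ({"past_trend": "Positive", "open_interest": "High", "trading_volume": "Low"},  "Up"),
    ({"past_trend": "Negative", "open_interest": "High", "trading_volume": "High"}, "Down"),
    ({"past_trend": "Negative", "open_interest": "Low",  "trading_volume": "Low"},  "Down"),
]
for attr in ("past_trend", "open_interest", "trading_volume"):
    print(attr, weighted_gini(data, attr))
# past_trend 0.0, open_interest 0.5, trading_volume 0.5
# -> 'past_trend' has the lowest weighted Gini, so it becomes the root.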
Gini Index for the ‘Positive’ branch
Calculation of Gini Index of Open Interest for Positive Past Trend
Calculation of Gini Index of Trading Volume for Positive Past Trend
Gini Index for the ‘Positive’ branch
We will split the node further using the ‘Trading Volume’ feature, as it has the minimum Gini index.
Unlike information gain, the Gini Index is not as computationally intensive, since it avoids the logarithm function used to calculate entropy in information gain; this is why the Gini Index is often preferred over information gain.
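To make the comparison concrete, here is a side-by-side sketch of the two measures: entropy needs one logarithm per class, while Gini needs only a square.

import math
from collections import Counter

def entropy(labels):
    # Entropy: -sum(p * log2(p)) - one logarithm per class
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def gini(labels):
    # Gini: 1 - sum(p^2) - only squaring, no logarithm
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

labels = ["Up", "Up", "Down", "Down"]
print(entropy(labels))  # 1.0
print(gini(labels))     # 0.5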
