Classification and Regression Trees (CART) Algorithm
Prakash P
Classification and Regression Trees
• Classification and Regression Trees (CART) is simply a modern term for what are otherwise known as Decision Trees.
• Decision Trees have been around for a very long time and are important for predictive modelling in Machine Learning.
• As the name suggests, these trees are used for classification and regression (prediction) problems.
• They also serve as the basis for other modern classifiers such as Random Forest.
Classification
• Generally, a classification problem can be described as follows:
Data: A set of records (instances) that are described by:
* k attributes: A1, A2, ..., Ak
* A class: a discrete set of labels
Goal: To learn a classification model from the data that can be used to predict the classes of new (future, or test) instances.
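As a concrete illustration of this setup, here is a minimal sketch using scikit-learn's DecisionTreeClassifier, which builds a CART-style tree; the tiny attribute matrix and class labels below are made-up toy values, not data from the slides.

from sklearn.tree import DecisionTreeClassifier

# Toy data: 4 records described by k = 2 attributes (A1, A2) and a discrete class label
X = [[0, 1], [1, 1], [0, 0], [1, 0]]   # attribute values
y = ["yes", "yes", "no", "no"]         # class labels

# Learn a classification model from the data (Gini impurity is the default criterion)
model = DecisionTreeClassifier(criterion="gini")
model.fit(X, y)

# Predict the class of a new (future / test) instance
print(model.predict([[0, 1]]))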
Classification
However, note that there can be many possible decision trees for a given problem; we want the shortest one.
We also want it to be good in terms of accuracy (prediction error, measured in terms of misclassification cost).
CART Algorithm for Classification
• The tree will be constructed in a top-down approach as follows:
Step 2: Select an attribute on the basis of a splitting criterion (Gini index, Gain Ratio, or another impurity metric)
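A rough sketch of this top-down construction is given below, assuming the usual surrounding steps (start with all records at the root, then partition and recurse on each branch) and using the Gini index as the splitting criterion. The record format (dicts with a "class" key) and the function names are illustrative assumptions, not taken from the slides.

def gini(records):
    # Gini impurity of a set of records: 1 - sum of squared class proportions
    total = len(records)
    counts = {}
    for r in records:
        counts[r["class"]] = counts.get(r["class"], 0) + 1
    return 1.0 - sum((c / total) ** 2 for c in counts.values())

def best_attribute(records, attributes):
    # Choose the attribute whose split gives the lowest weighted Gini impurity
    def weighted_gini(attr):
        total = len(records)
        score = 0.0
        for value in set(r[attr] for r in records):
            part = [r for r in records if r[attr] == value]
            score += len(part) / total * gini(part)
        return score
    return min(attributes, key=weighted_gini)

def build_tree(records, attributes):
    classes = [r["class"] for r in records]
    if len(set(classes)) == 1 or not attributes:
        # Leaf node: return the majority class
        return max(set(classes), key=classes.count)
    attr = best_attribute(records, attributes)        # select attribute by splitting criterion
    tree = {attr: {}}
    for value in set(r[attr] for r in records):       # partition the records and recurse
        subset = [r for r in records if r[attr] == value]
        tree[attr][value] = build_tree(subset, [a for a in attributes if a != attr])
    return tree

Each recursive call stops when a node is pure or no attributes remain, returning the majority class as a leaf.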
The Gini index (or Gini impurity) measures the probability that a randomly chosen element would be wrongly classified if it were labelled at random according to the class distribution.
But what is actually meant by 'impurity'? If all the elements in a node belong to a single class, the node is called pure.
The Gini index varies between 0 and 1, where 0 denotes that all elements belong to a single class (or that only one class exists), and values close to 1 denote that the elements are randomly distributed across many classes.
A Gini index of 0.5 denotes that the elements are equally distributed between two classes.
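For a node whose records fall into classes with proportions p1, ..., pk, the Gini index is Gini = 1 - (p1^2 + ... + pk^2). The short sketch below (with made-up example labels) checks the two endpoints described above.

def gini_impurity(labels):
    # Gini = 1 - sum(p_i^2) over the class proportions p_i
    n = len(labels)
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

print(gini_impurity(["up", "up", "up", "up"]))      # 0.0 -> all elements in one class (pure)
print(gini_impurity(["up", "up", "down", "down"]))  # 0.5 -> two classes, equally distributed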
Example of Gini Index
Let’s start by calculating the Gini Index for ‘Past Trend’.
Calculation of Gini Index for ‘Open Interest’
Calculation of Gini Index for ‘Trading Volume’
Example
From the above table, we observe that ‘Past Trend’ has the lowest Gini Index, and hence it will be chosen as the root node of the decision tree.
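To make this attribute-selection step concrete, here is a hypothetical version of the same comparison. The records below are made-up toy values, not the actual table from the slides; the attribute with the lowest weighted Gini index is chosen as the root.

records = [
    {"Past Trend": "Positive", "Open Interest": "Low",  "Trading Volume": "High", "Return": "Up"},
    {"Past Trend": "Positive", "Open Interest": "High", "Trading Volume": "Low",  "Return": "Up"},
    {"Past Trend": "Positive", "Open Interest": "Low",  "Trading Volume": "Low",  "Return": "Up"},
    {"Past Trend": "Negative", "Open Interest": "High", "Trading Volume": "High", "Return": "Down"},
    {"Past Trend": "Negative", "Open Interest": "Low",  "Trading Volume": "Low",  "Return": "Down"},
    {"Past Trend": "Positive", "Open Interest": "High", "Trading Volume": "High", "Return": "Down"},
]

def gini(rows):
    # Gini impurity of the class label ("Return") within a subset of rows
    n = len(rows)
    labels = [r["Return"] for r in rows]
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def weighted_gini(rows, attr):
    # Weighted average of the Gini impurity of the branches produced by splitting on attr
    n = len(rows)
    total = 0.0
    for value in set(r[attr] for r in rows):
        part = [r for r in rows if r[attr] == value]
        total += len(part) / n * gini(part)
    return total

attributes = ["Past Trend", "Open Interest", "Trading Volume"]
for attr in attributes:
    print(attr, round(weighted_gini(records, attr), 3))

root = min(attributes, key=lambda a: weighted_gini(records, a))
print("Root node:", root)   # "Past Trend" has the lowest weighted Gini index on this toy data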