Decision Tree

The document discusses decision trees, which are classification models that recursively partition data into purer subsets based on the values of predictor variables. It describes the ID3 algorithm, which uses information gain to choose the variable that best splits the data at each step. The algorithm calculates the entropy of data subsets and chooses the split with the highest information gain, or largest decrease in entropy. It then recursively partitions the resulting subsets until reaching leaf nodes containing homogeneous data or stopping criteria are met. The final result is a tree that can be converted to classification rules.


9/23/2017 Decision Tree


Decision Tree - Classification

Decision tree builds classification or regression models in the form of a tree structure. It breaks a dataset down into smaller and smaller subsets while, at the same time, an associated decision tree is incrementally developed. The final result is a tree with decision nodes and leaf nodes. A decision node (e.g., Outlook) has two or more branches (e.g., Sunny, Overcast, and Rainy). A leaf node (e.g., Play) represents a classification or decision. The topmost decision node in a tree, which corresponds to the best predictor, is called the root node. Decision trees can handle both categorical and numerical data.

Algorithm

The core algorithm for building decision trees is called ID3 and was developed by J. R. Quinlan. It employs a top-down, greedy search through the space of possible branches, with no backtracking. ID3 uses entropy and information gain to construct a decision tree.

Entropy
A decision tree is built top-down from a root node and involves partitioning the data into subsets that contain instances with similar values (homogeneous). The ID3 algorithm uses entropy to calculate the homogeneity of a sample. If the sample is completely homogeneous, the entropy is zero; if the sample is equally divided between classes, it has an entropy of one.

To build a decision tree, we need to calculate two types of entropy using frequency tables as follows:

Source: http://www.saedsayad.com/decision_tree.htm

a) Entropy using the frequency table of one attribute:

E(S) = Σᵢ −pᵢ log₂(pᵢ)

where pᵢ is the proportion of instances belonging to class i.

b) Entropy using the frequency table of two attributes:

E(T, X) = Σ_{c ∈ X} P(c) · E(c)

where the entropy of the target T is averaged over the values c of the split attribute X, weighted by their probabilities P(c).
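As a concrete illustration, both kinds of entropy can be computed from frequency counts in a few lines of Python. This is a minimal sketch (the function names and the 9-Yes/5-No target column, taken from the classic 14-row weather data commonly used with ID3, are illustrative choices, not part of the original page):

```python
from collections import Counter
from math import log2

def entropy(labels):
    """a) Entropy of one attribute: E(S) = sum over classes of -p_i * log2(p_i)."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())

def entropy_given(attribute, labels):
    """b) Entropy of the target given a second attribute: the entropy of each
    branch, weighted by the fraction of instances falling into that branch."""
    total = len(labels)
    by_value = {}
    for a, y in zip(attribute, labels):
        by_value.setdefault(a, []).append(y)
    return sum(len(ys) / total * entropy(ys) for ys in by_value.values())

# Target column with 9 "Yes" and 5 "No" instances (classic weather data).
play = ["No", "No", "Yes", "Yes", "Yes", "No", "Yes", "No",
        "Yes", "Yes", "Yes", "Yes", "Yes", "No"]
print(round(entropy(play), 3))  # 0.940
```

A completely homogeneous sample gives `entropy(...) == 0.0`, and a 50/50 split gives exactly 1.0, matching the two limiting cases described above.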

Information Gain
The information gain is based on the decrease in entropy after a dataset is split on an attribute. Constructing a decision tree is all about finding the attribute that returns the highest information gain (i.e., the most homogeneous branches).

Step 1: Calculate the entropy of the target.

Step 2: The dataset is then split on the different attributes. The entropy for each branch is calculated, then added proportionally (weighted by branch size) to get the total entropy for the split. The resulting entropy is subtracted from the entropy before the split. The result is the information gain, or decrease in entropy: Gain(T, X) = E(T) − E(T, X).


Step 3: Choose the attribute with the largest information gain as the decision node, divide the dataset by its branches, and repeat the same process on every branch.

Step 4a: A branch with entropy of 0 is a leaf node.


Step 4b: A branch with entropy of more than 0 needs further splitting.

Step 5: The ID3 algorithm is run recursively on the non-leaf branches until all data is classified.
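Steps 3 to 5 together form the recursive core of ID3. The following is a minimal sketch of that recursion, assuming each row is a dict of attribute values and representing the tree as nested dicts with class labels at the leaves (a hypothetical representation chosen for brevity; real implementations add stopping criteria such as a maximum depth):

```python
from collections import Counter
from math import log2

def entropy(labels):
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())

def gain(rows, attr, target):
    labels = [r[target] for r in rows]
    by_value = {}
    for r in rows:
        by_value.setdefault(r[attr], []).append(r[target])
    after = sum(len(ys) / len(rows) * entropy(ys) for ys in by_value.values())
    return entropy(labels) - after

def id3(rows, attributes, target):
    labels = [r[target] for r in rows]
    # Step 4a: a pure branch (entropy 0) becomes a leaf node.
    if len(set(labels)) == 1:
        return labels[0]
    # No attributes left to split on: fall back to the majority class.
    if not attributes:
        return Counter(labels).most_common(1)[0][0]
    # Step 3: pick the attribute with the largest information gain.
    best = max(attributes, key=lambda a: gain(rows, a, target))
    tree = {best: {}}
    for value in set(r[best] for r in rows):
        subset = [r for r in rows if r[best] == value]
        remaining = [a for a in attributes if a != best]
        # Steps 4b/5: recurse on impure branches until all data is classified.
        tree[best][value] = id3(subset, remaining, target)
    return tree
```

Because the greedy search never backtracks, the attribute chosen at each node is final even if a different split would have produced a smaller tree overall.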

Decision Tree to Decision Rules

A decision tree can easily be transformed into a set of rules by tracing each path from the root node to a leaf node, one by one.
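This root-to-leaf mapping can be sketched as a short traversal. The snippet assumes the tree is stored as nested dicts (inner nodes map an attribute name to its branches; leaves are class labels) and hardcodes "Play" as the target name; both the representation and the example tree for the weather data are illustrative assumptions:

```python
def tree_to_rules(tree, conditions=()):
    """Emit one IF-THEN rule per root-to-leaf path of a nested-dict tree."""
    if not isinstance(tree, dict):  # leaf node: a class label
        yield "IF " + " AND ".join(conditions) + f" THEN Play = {tree}"
        return
    (attr, branches), = tree.items()  # decision node: exactly one attribute
    for value, subtree in branches.items():
        yield from tree_to_rules(subtree, conditions + (f"{attr} = {value}",))

# Illustrative tree for the weather data.
tree = {"Outlook": {"Overcast": "Yes",
                    "Sunny": {"Humidity": {"High": "No", "Normal": "Yes"}},
                    "Rainy": {"Windy": {"True": "No", "False": "Yes"}}}}
for rule in tree_to_rules(tree):
    print(rule)  # e.g. "IF Outlook = Overcast THEN Play = Yes"
```

Each rule's conditions are simply the branch labels encountered on the way down, so the rule set is exactly equivalent to the tree.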

Decision Trees - Issues


Working with continuous attributes (binning)
Avoiding overfitting
Super attributes (attributes with many values)
Working with missing values

Exercise

Try to invent a new algorithm to construct a decision tree from data using the chi-square (χ²) test.
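As a starting point for this exercise (not a full solution), the split-selection criterion could be the chi-square statistic itself: compare the observed class counts in each branch against the counts expected if the split were independent of the class, which is the idea behind CHAID-style trees. The function name and interface here are illustrative assumptions:

```python
from collections import Counter

def chi_square(attribute, labels):
    """Chi-square statistic for a candidate split: sum over every
    (branch value, class) cell of (observed - expected)^2 / expected,
    where expected assumes the split is independent of the class."""
    n = len(labels)
    class_totals = Counter(labels)
    branch_totals = Counter(attribute)
    observed = Counter(zip(attribute, labels))
    stat = 0.0
    for value in branch_totals:
        for cls in class_totals:
            expected = branch_totals[value] * class_totals[cls] / n
            stat += (observed[(value, cls)] - expected) ** 2 / expected
    return stat
```

A higher statistic means the split separates the classes more strongly, so one possible algorithm chooses the attribute with the largest chi-square value at each node, in place of information gain.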

