Iterative Dichotomizer 3 (ID3)
Data Mining – Solved Example 02
Which feature will be at the root node of the decision tree trained for
the following data? Then draw the Decision Tree (DT) for this training
data.
Day Outlook Temp Humidity Wind PlayTennis
D01 Sunny Hot High Weak No
D02 Sunny Hot High Strong No
D03 Overcast Hot High Weak Yes
D04 Rain Mild High Weak Yes
D05 Rain Cool Normal Weak Yes
D06 Rain Cool Normal Strong No
D07 Overcast Cool Normal Strong Yes
D08 Sunny Mild High Weak No
D09 Sunny Cool Normal Weak Yes
D10 Rain Mild Normal Weak Yes
D11 Sunny Mild Normal Strong Yes
D12 Overcast Mild High Strong Yes
D13 Overcast Hot Normal Weak Yes
D14 Rain Mild High Strong No
Use your DT to classify the following:
Outlook Temp Humidity Wind Play Tennis
Rain Hot High Strong ?
Overcast Mild Normal Weak ?
Solution:
Calculating the entropy and information gain gives:
No Yes Total Entropy_PlayTennis
5 9 14 0.94
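The entropy value above follows from the usual definition H = -sum(p_i * log2(p_i)) over the class proportions. A minimal sketch of that arithmetic, assuming base-2 logarithms (the function name entropy is illustrative, not taken from the original):

```python
from math import log2

def entropy(counts):
    """Shannon entropy in bits for a list of class counts."""
    total = sum(counts)
    # Classes with a zero count contribute nothing, so they are skipped.
    return sum(-(c / total) * log2(c / total) for c in counts if c > 0)

# Class counts from the table: 5 "No" and 9 "Yes" out of 14 days.
h_play = entropy([5, 9])
print(f"{h_play:.3f}")  # about 0.940
```

A pure subset, such as the four Overcast days (0 No, 4 Yes), gives an entropy of 0 with the same function.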
Feature   Value     No  Yes  Total  Entropy  IG
Outlook   Overcast  0   4    4      0.000    0.247
Outlook   Rain      2   3    5      0.971
Outlook   Sunny     3   2    5      0.971
Temp      Cool      1   3    4      0.811    0.029
Temp      Hot       2   2    4      1.000
Temp      Mild      2   4    6      0.918
Humidity  High      4   3    7      0.985    0.152
Humidity  Normal    1   6    7      0.592
Wind      Strong    3   3    6      1.000    0.048
Wind      Weak      2   6    8      0.811
As shown, Outlook has the highest information gain (0.247), so we assign it to the root node.
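The information gain of a feature is the entropy of the whole set minus the weighted average entropy of the subsets produced by splitting on that feature. The sketch below recomputes the four gains from the training table; the names ROWS, ATTRS, entropy and info_gain are illustrative choices, not part of the original solution.

```python
from collections import Counter
from math import log2

ATTRS = ["Outlook", "Temp", "Humidity", "Wind"]
# Training data from the table; the last field is the PlayTennis label.
ROWS = [
    ("Sunny", "Hot", "High", "Weak", "No"),
    ("Sunny", "Hot", "High", "Strong", "No"),
    ("Overcast", "Hot", "High", "Weak", "Yes"),
    ("Rain", "Mild", "High", "Weak", "Yes"),
    ("Rain", "Cool", "Normal", "Weak", "Yes"),
    ("Rain", "Cool", "Normal", "Strong", "No"),
    ("Overcast", "Cool", "Normal", "Strong", "Yes"),
    ("Sunny", "Mild", "High", "Weak", "No"),
    ("Sunny", "Cool", "Normal", "Weak", "Yes"),
    ("Rain", "Mild", "Normal", "Weak", "Yes"),
    ("Sunny", "Mild", "Normal", "Strong", "Yes"),
    ("Overcast", "Mild", "High", "Strong", "Yes"),
    ("Overcast", "Hot", "Normal", "Weak", "Yes"),
    ("Rain", "Mild", "High", "Strong", "No"),
]

def entropy(labels):
    """Shannon entropy in bits of a list of class labels."""
    total = len(labels)
    return sum(-(n / total) * log2(n / total) for n in Counter(labels).values())

def info_gain(rows, col):
    """Entropy of rows minus the weighted entropy after splitting on column col."""
    remainder = 0.0
    for value in set(r[col] for r in rows):
        subset = [r[-1] for r in rows if r[col] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return entropy([r[-1] for r in rows]) - remainder

gains = {name: info_gain(ROWS, i) for i, name in enumerate(ATTRS)}
best = max(gains, key=gains.get)  # feature with the largest gain

for name, gain in gains.items():
    print(f"{name:<9}{gain:.3f}")
print("Root:", best)
```

Rounded to three decimals, the gains agree with the IG column, and the feature selected for the root is Outlook.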
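Moving to the next step: the Overcast branch is pure (entropy 0), so it becomes a Yes leaf. For the Sunny branch we repeat the calculation on the five days with Outlook = Sunny:
Day Temp Humidity Wind PlayTennis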
D01 Hot High Weak No
D02 Hot High Strong No
D08 Mild High Weak No
D09 Cool Normal Weak Yes
D11 Mild Normal Strong Yes
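E_Sunny{no=3, yes=2}  0.97

Entropy by Temp value within the Sunny subset:
E_Hot{no=2, yes=0}    0
E_Mild{no=1, yes=1}   1
E_Cool{no=0, yes=1}   0

Information gain within the Sunny subset:
IG_Temp      0.57
IG_Humidity  0.97
IG_Wind      0.02

Humidity has the highest gain, so it is the split under the Sunny branch.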
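The same calculation can be repeated on the Sunny subset only. The sketch below is a rough check of the three gains, using hypothetical names (SUNNY, SUB_ATTRS, info_gain) that do not appear in the original. Computed without rounding the intermediate entropies, the Wind gain comes out at about 0.020.

```python
from collections import Counter
from math import log2

SUB_ATTRS = ["Temp", "Humidity", "Wind"]
# The five Sunny days (D01, D02, D08, D09, D11); last field is PlayTennis.
SUNNY = [
    ("Hot", "High", "Weak", "No"),
    ("Hot", "High", "Strong", "No"),
    ("Mild", "High", "Weak", "No"),
    ("Cool", "Normal", "Weak", "Yes"),
    ("Mild", "Normal", "Strong", "Yes"),
]

def entropy(labels):
    """Shannon entropy in bits of a list of class labels."""
    total = len(labels)
    return sum(-(n / total) * log2(n / total) for n in Counter(labels).values())

def info_gain(rows, col):
    """Entropy of rows minus the weighted entropy after splitting on column col."""
    remainder = 0.0
    for value in set(r[col] for r in rows):
        subset = [r[-1] for r in rows if r[col] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return entropy([r[-1] for r in rows]) - remainder

sunny_gains = {name: info_gain(SUNNY, i) for i, name in enumerate(SUB_ATTRS)}
for name, gain in sunny_gains.items():
    print(f"IG_{name:<9}{gain:.3f}")
```

Splitting on Humidity leaves two pure subsets (High is all No, Normal is all Yes), which is why its gain equals the full entropy of the Sunny subset.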
The Decision Tree will look like:
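One way to reproduce the tree is to apply the same gain-based split recursively until every branch is pure. The following is a minimal sketch under that assumption, not a full ID3 implementation: it represents the tree as nested dictionaries, and the names id3 and classify are illustrative. It also classifies the two query rows from the problem statement.

```python
from collections import Counter
from math import log2

ATTRS = ["Outlook", "Temp", "Humidity", "Wind"]
# Training data from the table; the last field is the PlayTennis label.
ROWS = [
    ("Sunny", "Hot", "High", "Weak", "No"),
    ("Sunny", "Hot", "High", "Strong", "No"),
    ("Overcast", "Hot", "High", "Weak", "Yes"),
    ("Rain", "Mild", "High", "Weak", "Yes"),
    ("Rain", "Cool", "Normal", "Weak", "Yes"),
    ("Rain", "Cool", "Normal", "Strong", "No"),
    ("Overcast", "Cool", "Normal", "Strong", "Yes"),
    ("Sunny", "Mild", "High", "Weak", "No"),
    ("Sunny", "Cool", "Normal", "Weak", "Yes"),
    ("Rain", "Mild", "Normal", "Weak", "Yes"),
    ("Sunny", "Mild", "Normal", "Strong", "Yes"),
    ("Overcast", "Mild", "High", "Strong", "Yes"),
    ("Overcast", "Hot", "Normal", "Weak", "Yes"),
    ("Rain", "Mild", "High", "Strong", "No"),
]

def entropy(labels):
    total = len(labels)
    return sum(-(n / total) * log2(n / total) for n in Counter(labels).values())

def info_gain(rows, col):
    remainder = 0.0
    for value in set(r[col] for r in rows):
        subset = [r[-1] for r in rows if r[col] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return entropy([r[-1] for r in rows]) - remainder

def id3(rows, cols):
    """Return a class label (leaf) or {feature: {value: subtree}}."""
    labels = [r[-1] for r in rows]
    if len(set(labels)) == 1:          # pure subset: make a leaf
        return labels[0]
    if not cols:                       # no features left: majority vote
        return Counter(labels).most_common(1)[0][0]
    best = max(cols, key=lambda c: info_gain(rows, c))
    rest = [c for c in cols if c != best]
    return {ATTRS[best]: {
        value: id3([r for r in rows if r[best] == value], rest)
        for value in sorted(set(r[best] for r in rows))
    }}

def classify(tree, sample):
    """Walk the tree; raises KeyError for a value not seen in training."""
    while isinstance(tree, dict):
        feature = next(iter(tree))
        tree = tree[feature][sample[feature]]
    return tree

tree = id3(ROWS, list(range(len(ATTRS))))
print(tree)

queries = [
    {"Outlook": "Rain", "Temp": "Hot", "Humidity": "High", "Wind": "Strong"},
    {"Outlook": "Overcast", "Mild": "Mild", "Temp": "Mild",
     "Humidity": "Normal", "Wind": "Weak"},
]
answers = [classify(tree, q) for q in queries]
print(answers)
```

On this data the sketch puts Outlook at the root, with Overcast as a Yes leaf, Humidity under Sunny (High gives No, Normal gives Yes) and Wind under Rain (Strong gives No, Weak gives Yes). The first query row is therefore classified as No and the second as Yes.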