Assignment 4: Decision Tree - Classification

Consider a data set (take a reasonable number of observations) from the literature or research
papers or some other source to construct a Decision Tree using the ID3 algorithm.

Use Entropy and Information Gain to perform the calculations...

Using the following medical diagnostic data, I will construct the decision tree with the ID3 algorithm.

Sore Throat | Fever | Swollen Glands | Congestion | Headache | Diagnosis
Yes         | Yes   | Yes            | Yes        | Yes      | Strep Throat
No          | No    | No             | Yes        | Yes      | Allergy
Yes         | Yes   | No             | Yes        | No       | Cold
Yes         | No    | Yes            | No         | No       | Strep Throat
No          | Yes   | No             | Yes        | No       | Cold
No          | No    | No             | Yes        | No       | Allergy
No          | No    | Yes            | No         | No       | Strep Throat
Yes         | No    | No             | Yes        | Yes      | Allergy
No          | Yes   | No             | Yes        | Yes      | Cold
Yes         | No    | No             | Yes        | Yes      | Cold

First of all, we need the two formulas ID3 relies on: information (entropy) and information gain.

I(p, n, c) = -(p/s) log2(p/s) - (n/s) log2(n/s) - (c/s) log2(c/s)

where s = p + n + c is the total number of examples and p, n, c are the counts of the three classes.

Entropy E(A) = Σv (sv/s) · I(branch v), summing over the values v of attribute A, where sv is the number of examples in branch v.

Gain(A) = I(p, n, c) - E(A)

By using these formulas, we will build the decision tree. All logarithms in this example are base 2.

First of all, I find the information content of the whole data set.

The sample space is S = ST + A + C = 10:

Strep Throat = 3

Allergy = 3

Cold = 4

I(3, 3, 4) = -[(3/10) log2(3/10) + (3/10) log2(3/10) + (4/10) log2(4/10)]

= -[0.3 log2(0.3) + 0.3 log2(0.3) + 0.4 log2(0.4)]

= -[-0.521 - 0.521 - 0.529]

= 1.571
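As a quick check, this small Python sketch (the helper name "info" is my own, not part of the assignment) computes the same value from the class counts:

import math

def info(counts):
    # Expected information I(p1, ..., pk) of a class distribution,
    # skipping zero counts since 0*log2(0) is taken as 0.
    s = sum(counts)
    return -sum((c / s) * math.log2(c / s) for c in counts if c > 0)

print(round(info([3, 3, 4]), 3))  # 1.571 (3 Strep Throat, 3 Allergy, 4 Cold)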

Next we find the gain of each candidate splitting attribute.

(i) Sore Throat

    | Strep Throat | Allergy | Cold
Yes | 2            | 1       | 2
No  | 1            | 2       | 2

For the entropy we first find the information of the Yes branch and of the No branch, then weight each by its share of the examples:

E(sore throat) = p(yes) · I(yes) + p(no) · I(no)

I(yes) = -[(2/5) log2(2/5) + (1/5) log2(1/5) + (2/5) log2(2/5)]

= -[-0.53 - 0.46 - 0.53]

= 1.52

I(no) = -[(1/5) log2(1/5) + (2/5) log2(2/5) + (2/5) log2(2/5)]

= -[-0.46 - 0.53 - 0.53]

= 1.52

Now we can calculate the entropy of Sore Throat:

E(sore throat) = (5/10)(1.52) + (5/10)(1.52)

= 0.76 + 0.76

= 1.52

Now we have to calculate the gain:

Gain(A) = I(p, n, c) - E(A)

I(p, n, c) was already found above; it was 1.571.

Gain(sore throat) = 1.571 - 1.52 = 0.05

The first attribute's gain has been found.
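The same check can be scripted. This sketch (again with my own helper name, "gain") reuses the info function from above to compute the weighted branch entropy and the resulting gain from the class counts of each branch:

def gain(total_counts, branch_counts):
    # Information gain of a split: I(total) minus the weighted entropy E(A).
    s = sum(total_counts)
    expected = sum(sum(b) / s * info(b) for b in branch_counts)
    return info(total_counts) - expected

# Sore Throat: Yes branch (2 Strep, 1 Allergy, 2 Cold), No branch (1, 2, 2)
print(round(gain([3, 3, 4], [[2, 1, 2], [1, 2, 2]]), 2))  # 0.05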

Now we find the second attribute, which is Fever.


(ii) Fever

    | Strep Throat | Allergy | Cold
Yes | 1            | 0       | 3
No  | 2            | 3       | 1
I have to find the entropy of Fever. For the entropy I first need the information of the Yes branch and of the No branch:

E(fever) = p(yes) · I(yes) + p(no) · I(no)

I(yes) = -[(1/4) log2(1/4) + (0/4) log2(0/4) + (3/4) log2(3/4)]

= -[-0.5 - 0.0 - 0.311]

= 0.811

(the 0 log2(0) term is taken as 0)

I(no) = -[(2/6) log2(2/6) + (3/6) log2(3/6) + (1/6) log2(1/6)]

= -[-0.53 - 0.5 - 0.43]

= 1.46

E(fever) = (4/10)(0.811) + (6/10)(1.46)

= 0.32 + 0.88

= 1.20

Gain(fever) = I(p, n, c) - E(fever)

= 1.571 - 1.20

= 0.37

(iii) Swollen Glands

    | Strep Throat | Allergy | Cold
Yes | 3            | 0       | 0
No  | 0            | 3       | 4

We have to calculate the entropy of Swollen Glands.

I(yes) = -[(3/3) log2(3/3)]

= 0    (the Yes branch is pure: every example is Strep Throat)

I(no) = -[(3/7) log2(3/7) + (4/7) log2(4/7)]

= -[-0.52 - 0.46]

= 0.99

E(swollen glands) = (3/10)(0) + (7/10)(0.99)

= 0.69

Gain(swollen glands) = 1.571 - 0.69 = 0.88
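The pure Yes branch is exactly what drives this attribute's high gain; checking with the gain helper above:

# Swollen Glands: the Yes branch is pure (3 Strep Throat),
# the No branch holds 3 Allergy and 4 Cold
print(round(gain([3, 3, 4], [[3, 0, 0], [0, 3, 4]]), 2))  # 0.88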

(iv) Congestion

    | Strep Throat | Allergy | Cold
Yes | 1            | 3       | 4
No  | 2            | 0       | 0

We calculate the entropy of Congestion.

I(yes) = -[(1/8) log2(1/8) + (3/8) log2(3/8) + (4/8) log2(4/8)]

= -[-0.38 - 0.53 - 0.5]

= 1.41

I(no) = -[(2/2) log2(2/2)]

= 0    (the No branch is pure: both examples are Strep Throat)

E(congestion) = (8/10)(1.41) + (2/10)(0)

= 1.12

Gain(congestion) = 1.571 - 1.12 = 0.45

(v) Headache

    | Strep Throat | Allergy | Cold
Yes | 1            | 2       | 2
No  | 2            | 1       | 2
Now I have to calculate the entropy of Headache.

I(yes) = -[(1/5) log2(1/5) + (2/5) log2(2/5) + (2/5) log2(2/5)]

= -[-0.46 - 0.53 - 0.53]

= 1.52

I(no) = -[(2/5) log2(2/5) + (1/5) log2(1/5) + (2/5) log2(2/5)]

= -[-0.53 - 0.46 - 0.53]

= 1.52

E(headache) = (5/10)(1.52) + (5/10)(1.52)

= 0.76 + 0.76

= 1.52

Gain(headache) = 1.571 - 1.52 = 0.05

We now have all the gains, summarized below:

Attribute      | Gain
Sore Throat    | 0.05
Fever          | 0.37
Swollen Glands | 0.88
Congestion     | 0.45
Headache       | 0.05
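As a final check, the whole table can be reproduced from the data set using the info and gain helpers defined above (the row encoding and variable names here are my own):

# Rows: (sore throat, fever, swollen glands, congestion, headache, diagnosis)
rows = [
    ("Yes", "Yes", "Yes", "Yes", "Yes", "Strep Throat"),
    ("No",  "No",  "No",  "Yes", "Yes", "Allergy"),
    ("Yes", "Yes", "No",  "Yes", "No",  "Cold"),
    ("Yes", "No",  "Yes", "No",  "No",  "Strep Throat"),
    ("No",  "Yes", "No",  "Yes", "No",  "Cold"),
    ("No",  "No",  "No",  "Yes", "No",  "Allergy"),
    ("No",  "No",  "Yes", "No",  "No",  "Strep Throat"),
    ("Yes", "No",  "No",  "Yes", "Yes", "Allergy"),
    ("No",  "Yes", "No",  "Yes", "Yes", "Cold"),
    ("Yes", "No",  "No",  "Yes", "Yes", "Cold"),
]
classes = ["Strep Throat", "Allergy", "Cold"]
attributes = ["Sore Throat", "Fever", "Swollen Glands", "Congestion", "Headache"]

total = [sum(r[5] == c for r in rows) for c in classes]  # [3, 3, 4]
for i, name in enumerate(attributes):
    # Class counts in the Yes branch and the No branch of attribute i
    branches = [[sum(r[i] == v and r[5] == c for r in rows) for c in classes]
                for v in ("Yes", "No")]
    print(f"{name}: {gain(total, branches):.2f}")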

Now we have to create the decision tree.

We choose the attribute with the highest gain, Swollen Glands, as the root node and split the data on it.
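Continuing the sketch, the root split can be checked directly: every example with Swollen Glands = Yes is Strep Throat, so that branch becomes a leaf, and ID3 repeats the same entropy and gain calculations recursively on the seven remaining rows of the No branch:

# Root split on Swollen Glands (column index 2)
yes_branch = [r for r in rows if r[2] == "Yes"]
no_branch = [r for r in rows if r[2] == "No"]
print({r[5] for r in yes_branch})  # {'Strep Throat'} -> pure, becomes a leaf
print([sum(r[5] == c for r in no_branch) for c in classes])  # [0, 3, 4] -> recurse here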
