Classification - Part 2

[Attribute Selection Measures]


Shubham Kumar
Dept. of CS&IT
MGCUB
Attribute Selection Measures
• An attribute selection measure is a heuristic for selecting the splitting criterion that best separates a given data partition, D, of class-labeled training tuples into individual classes.

• A splitting criterion is considered the best when, after splitting, each resulting partition is pure.

• A partition is pure when all the tuples that fall into it belong to the same class.

• Attribute selection measures are also known as splitting rules because they determine how the tuples at a given node are to be split.

• The measure assigns a rank (score) to each attribute describing the training tuples, and the attribute with the best score is chosen as the splitting attribute for the given tuples.

• If the splitting attribute is continuous-valued, or if we are restricted to binary trees, then a split point or a splitting subset, respectively, must also be determined as part of the splitting criterion.
Partitioning Scenarios (Examples)

[Figure: the three partitioning scenarios for an attribute A: (1) a multiway split on a discrete-valued A, (2) a binary split on a continuous-valued A at a split point, and (3) a binary split on a discrete-valued A using a splitting subset SA.]
1. A is discrete-valued: In this case, the outcomes of the test at node N correspond directly to the known values of A. A branch is created for each known value, aj, of A and labeled with that value (as in the figure). Partition Dj is the subset of class-labeled tuples in D having value aj of A.

2. A is continuous-valued: In this case, the test at node N has two possible outcomes, corresponding to the conditions A ≤ split point and A > split point, respectively, where split point is the split-point returned by the attribute selection method as part of the splitting criterion.

3. A is discrete-valued and a binary tree must be produced: In this case, the test is of the form A ∈ SA, where SA is the splitting subset for A. (A small code sketch of these three test types follows.)
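To make the three test types concrete, here is a minimal Python sketch (not from the notes; the function names and example values are illustrative only) of how each test at node N could be evaluated for a tuple's value of A:

```python
def discrete_test(value, known_values):
    """Multiway split: one branch per known value a_j of a discrete attribute A."""
    return known_values.index(value)          # branch index j -> partition D_j

def continuous_test(value, split_point):
    """Binary split for a continuous A: A <= split_point vs. A > split_point."""
    return value <= split_point               # True -> left branch, False -> right

def binary_subset_test(value, splitting_subset):
    """Binary split for a discrete A when a binary tree is required: is A in S_A?"""
    return value in splitting_subset          # True -> left branch, False -> right

# Illustrative calls (hypothetical values):
print(discrete_test("Youth", ["Youth", "Middle_aged", "Senior"]))   # 0
print(continuous_test(37, split_point=40))                          # True
print(binary_subset_test("High", {"High", "Medium"}))               # True
```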
• According to the algorithm, the tree node created for partition D is labeled with the splitting criterion, and the tuples are partitioned accordingly [also shown in the figure].
• There are three popular attribute selection measures: Information Gain, Gain Ratio, and Gini Index.
• Information Gain:
The attribute with the highest information gain is chosen as the splitting attribute.
This attribute minimizes the information needed to classify the tuples in the resulting partitions.
Let D, the data partition, be a training set of class-labeled tuples.
Let the class label attribute have m distinct values defining m distinct classes, Ci (for i = 1, ..., m). Let Ci,D be the set of tuples of class Ci in D, and let |D| and |Ci,D| denote the number of tuples in D and Ci,D, respectively.
• Then the expected information needed to classify a tuple in D is given by:

Info(D) = − Σ (i = 1 to m) pi log2(pi)

• where pi is the nonzero probability that an arbitrary tuple in D belongs to class Ci and is estimated by |Ci,D| / |D|. Info(D) is the average amount of information needed to identify the class label of a tuple in D. Info(D) is also known as the entropy of D.
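As a small illustration (a sketch, not part of the notes), Info(D) can be computed directly from the per-class tuple counts; using the 9 yes / 5 no split of the AllElectronics example that appears later gives roughly 0.940 bits:

```python
from math import log2

def info(class_counts):
    """Info(D) = -sum_i p_i * log2(p_i), with p_i = |C_i,D| / |D|
    estimated from the per-class tuple counts."""
    total = sum(class_counts)
    return -sum((c / total) * log2(c / total) for c in class_counts if c > 0)

# Example: 9 "yes" and 5 "no" tuples (the training set used later).
print(round(info([9, 5]), 3))   # 0.94
```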
• Now, suppose we have to partition the tuples in D on some attribute A having v distinct values, {a1, a2, ..., av}.
• Then the expected information required to classify a tuple from D based on the partitioning by A is:

InfoA(D) = Σ (j = 1 to v) (|Dj| / |D|) × Info(Dj)

• The term |Dj| / |D| acts as the weight of the j-th partition. InfoA(D) is the expected information required to classify a tuple from D based on the partitioning by A.
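A corresponding sketch for InfoA(D), again illustrative and reusing the info() helper from the sketch above; each inner list holds the class counts of one partition Dj:

```python
def info_a(partition_class_counts):
    """Info_A(D) = sum_j (|D_j| / |D|) * Info(D_j)."""
    total = sum(sum(counts) for counts in partition_class_counts)
    return sum((sum(counts) / total) * info(counts)
               for counts in partition_class_counts)

# Example: partitioning the later training set by age
# (youth: 2 yes / 3 no, middle_aged: 4 yes / 0 no, senior: 3 yes / 2 no).
print(round(info_a([[2, 3], [4, 0], [3, 2]]), 3))   # 0.694
```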

• Information gain is defined as the difference between the original information requirement and the new requirement (i.e., the one obtained after partitioning on A):

Gain(A) = Info(D) – InfoA(D)

• The attribute A with the highest information gain is chosen as the splitting attribute.
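Putting the two together, still as a sketch that assumes the info() and info_a() helpers above are in scope, the gain of an attribute and the choice of splitting attribute look like this; `candidates` is a hypothetical mapping from attribute name to its per-partition class counts:

```python
def gain(total_class_counts, partition_class_counts):
    """Gain(A) = Info(D) - Info_A(D)."""
    return info(total_class_counts) - info_a(partition_class_counts)

def best_split_attribute(total_class_counts, candidates):
    """Return the attribute whose partitioning gives the highest information gain."""
    return max(candidates, key=lambda a: gain(total_class_counts, candidates[a]))
```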
Example:
This is a training set, D, of class-labeled tuples randomly selected from the AllElectronics customer database.
RID Age Income Student Credit rating Class: buys computer
1 Youth High No Fair No
2 Youth High No Excellent No
3 Middle_aged High No Fair Yes
4 Senior Medium No Fair Yes
5 Senior Low Yes Fair Yes
6 Senior Low Yes Excellent No
7 Middle_aged Low Yes Excellent Yes
8 Youth Medium No Fair No
9 Youth Low Yes Fair Yes
10 Senior Medium Yes Fair Yes
11 Youth Medium Yes Excellent Yes
12 Middle_aged Medium No Excellent Yes
13 Middle_aged High Yes Fair Yes
14 Senior Medium No Excellent No
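For the worked computations that follow, the table above can be encoded directly in Python; the field names below (age, income, student, credit, buys) are shorthand chosen for this sketch, not identifiers from the notes:

```python
# The 14 training tuples from the table above, as a list of dicts.
TRAINING_SET = [
    {"age": "Youth",       "income": "High",   "student": "No",  "credit": "Fair",      "buys": "No"},
    {"age": "Youth",       "income": "High",   "student": "No",  "credit": "Excellent", "buys": "No"},
    {"age": "Middle_aged", "income": "High",   "student": "No",  "credit": "Fair",      "buys": "Yes"},
    {"age": "Senior",      "income": "Medium", "student": "No",  "credit": "Fair",      "buys": "Yes"},
    {"age": "Senior",      "income": "Low",    "student": "Yes", "credit": "Fair",      "buys": "Yes"},
    {"age": "Senior",      "income": "Low",    "student": "Yes", "credit": "Excellent", "buys": "No"},
    {"age": "Middle_aged", "income": "Low",    "student": "Yes", "credit": "Excellent", "buys": "Yes"},
    {"age": "Youth",       "income": "Medium", "student": "No",  "credit": "Fair",      "buys": "No"},
    {"age": "Youth",       "income": "Low",    "student": "Yes", "credit": "Fair",      "buys": "Yes"},
    {"age": "Senior",      "income": "Medium", "student": "Yes", "credit": "Fair",      "buys": "Yes"},
    {"age": "Youth",       "income": "Medium", "student": "Yes", "credit": "Excellent", "buys": "Yes"},
    {"age": "Middle_aged", "income": "Medium", "student": "No",  "credit": "Excellent", "buys": "Yes"},
    {"age": "Middle_aged", "income": "High",   "student": "Yes", "credit": "Fair",      "buys": "Yes"},
    {"age": "Senior",      "income": "Medium", "student": "No",  "credit": "Excellent", "buys": "No"},
]
```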
• Here, the class label attribute, buys computer, has two distinct values: yes &
no.
• Therefore, there are two distinct classes (i.e., m = 2). Let class C1 correspond
to yes and class C2 correspond to no.
• There are nine tuples of class yes and five tuples of class no.
• A (root) node N is created for the tuples in D.
• To find the splitting criterion for these tuples, we must compute the information gain of each attribute.
• First, the expected information needed to classify a tuple in D is:

Info(D) = −(9/14) log2(9/14) − (5/14) log2(5/14) = 0.940 bits

• Next, we compute the expected information requirement for each attribute.
(1) Age: We need to look at the distribution of yes and no tuples for each category of age. For the category “youth,” there are two yes tuples and three no tuples. For the category “middle_aged,” there are four yes tuples and zero no tuples. For the category “senior,” there are three yes tuples and two no tuples.
Therefore, the expected information needed to classify a tuple in D if the tuples are partitioned according to age is:

InfoAge(D) = (5/14) × (−(2/5) log2(2/5) − (3/5) log2(3/5))
           + (4/14) × (−(4/4) log2(4/4))
           + (5/14) × (−(3/5) log2(3/5) − (2/5) log2(2/5))
           = 0.694 bits
Hence, the gain in information from such a partitioning would be:

Gain(age) = Info(D) – InfoAge(D) = 0.940 – 0.694 = 0.246 bits

Similarly,
Gain(income) = 0.029 bits, Gain(student) = 0.151 bits, and Gain(credit_rating) = 0.048 bits.
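These gains can be checked with a short, self-contained sketch over the TRAINING_SET list defined above (small differences in the third decimal place come from rounding intermediate values in the notes):

```python
from collections import Counter, defaultdict
from math import log2

def info(counts):
    total = sum(counts)
    return -sum((c / total) * log2(c / total) for c in counts if c > 0)

def gain(data, attribute, label="buys"):
    """Gain(A) = Info(D) - Info_A(D), computed from raw tuples."""
    overall = Counter(t[label] for t in data)
    partitions = defaultdict(Counter)
    for t in data:
        partitions[t[attribute]][t[label]] += 1
    info_a = sum(sum(p.values()) / len(data) * info(list(p.values()))
                 for p in partitions.values())
    return info(list(overall.values())) - info_a

for a in ("age", "income", "student", "credit"):
    print(a, round(gain(TRAINING_SET, a), 3))
# "age" comes out highest (about 0.247 exactly; 0.246 with the rounded
# intermediate values used in the notes), followed by student, credit, and income.
```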
• Since age has the highest information gain among the attributes, it is selected as the splitting attribute.

• According to the decision tree algorithm, node N is labeled with age, branches are grown for each of the attribute's values, and the tuples are then partitioned accordingly.
[Shown in the next figure]

• Here, the tuples falling into the partition for age = middle_aged all belong to the same class. Since they all belong to class “yes,” a leaf should therefore be created at the end of this branch and labeled “yes.”
• For the other resulting partitions, where the classes are not all the same, the decision tree algorithm applies the same splitting process recursively to grow the tree.
The final decision tree (using information gain as the attribute selection measure) is shown in the figure; a minimal recursive sketch of this process follows.
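The recursive splitting described above can be sketched as follows: a minimal ID3-style illustration, reusing gain() and TRAINING_SET from the sketches above. It handles only the basic stopping case of a pure or unsplittable partition, not every terminating condition of the full algorithm:

```python
from collections import Counter

def build_tree(data, attributes, label="buys"):
    """Label a leaf when the partition is pure (or no attributes remain);
    otherwise split on the attribute with the highest information gain
    and recurse on each resulting partition."""
    classes = Counter(t[label] for t in data)
    if len(classes) == 1 or not attributes:
        return classes.most_common(1)[0][0]      # leaf: (majority) class label
    best = max(attributes, key=lambda a: gain(data, a, label))
    branches = {}
    for value in {t[best] for t in data}:
        subset = [t for t in data if t[best] == value]
        branches[value] = build_tree(subset, [a for a in attributes if a != best], label)
    return {best: branches}

# Expected to split on "age" at the root, mirroring the example above.
print(build_tree(TRAINING_SET, ["age", "income", "student", "credit"]))
```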
Reference
• Jiawei Han, Micheline Kamber, and Jian Pei, Data Mining: Concepts and Techniques, 3rd ed., Elsevier, 2012.
