Decision Trees and Ensemble Learning

#+BEAMER_HEADER_EXTRA \beamerdefaultoverlayspecification{<+->}

1 Review

1.1 Decision Trees Review

1.1.1 <1-> What is a decision tree?

1.1.1.1 <2-> a function that takes as input a vector of attribute values and returns an output value

1.1.1.2 <2-> assumption: inputs are discrete and the output is Boolean

1.1.1.3 <2-> learned from a set of examples: each example assigns values to the input attributes and gives the corresponding output value
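
As a minimal sketch in Python (a hand-made, hypothetical tree for the lunch domain used later in these slides, not one learned from the data):

#+BEGIN_SRC python
def will_lunch_be_good(price, food_type, rush):
    """A decision tree is just a function: a vector of discrete
    attribute values in, an output value ("Good"/"Bad") out."""
    if food_type == "Gruel":          # each internal node tests one attribute
        return "Bad"
    else:
        if rush == "Hurry":
            if price == "High":
                return "Bad"
            else:
                return "Good"
        else:
            return "Good"

print(will_lunch_be_good("Low", "Pizza", "Hurry"))  # -> "Good"
#+END_SRC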

1.1.2 <3-> Entropy of a random variable?

1.1.2.1 <4-> a measure of the uncertainty of a random variable

1.1.2.2 <4-> acquisition of information corresponds to reduction in entropy

1.1.2.3 <4-> \[H(V) = -\sum_k P(v_k)\log_2 P(v_k)\]

1.1.2.4 <5-> $H(\text{fair coin})$?

1.1.2.5 <6-> $H(\text{tail-only coin})$?

1.1.3 <7-> $B(q)$

1.1.3.1 <8-> entropy of a Boolean random variable that is true with probability $q$

1.1.3.2 <8-> $B(q) = -(q \log_2 q + (1-q) \log_2 (1-q))$
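
A quick numeric sketch of both formulas in Python (the names =entropy= and =B= are mine, not from the slides); it confirms the two coin questions above:

#+BEGIN_SRC python
import math

def entropy(probs):
    """H(V) = -sum_k P(v_k) log2 P(v_k); zero-probability terms contribute 0."""
    return sum(-p * math.log2(p) for p in probs if p > 0)

def B(q):
    """Entropy of a Boolean random variable that is true with probability q."""
    return entropy([q, 1 - q])

print(entropy([0.5, 0.5]))  # fair coin      -> 1.0 bit
print(entropy([1.0]))       # tail-only coin -> 0.0 bits
print(B(0.5), B(0.0))       # B mirrors the two coin cases
#+END_SRC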

2 Decision Tree

2.1 Learning to Lunch

| Price  | Food Type | Rush? | Results | Index |
|--------+-----------+-------+---------+-------|
| High   | Sushi     | Hurry | Good    | x1    |
| Low    | Pizza     | Hurry | Good    | x2    |
| Medium | Hamburger | Hurry | Good    | x3    |
| Medium | Pizza     | Relax | Good    | x4    |
| High   | Gruel     | Relax | Bad     | x5    |
| High   | Hamburger | Relax | Good    | x6    |
| Low    | Gruel     | Hurry | Bad     | x7    |
| Low    | Sushi     | Relax | Bad     | x8    |
| High   | Pizza     | Hurry | Bad     | x9    |
| High   | Hamburger | Hurry | Bad     | x10   |
| Medium | Hamburger | Hurry | Bad     | x11   |
| Low    | Gruel     | Relax | Bad     | x12   |

2.1.1 <2-> What is the entropy of the result attribute of the whole example set?

2.1.1.1 <3-> $B(\frac{5}{12})$

2.1.1.2 <4-> $-\frac{5}{12}\log_2\frac{5}{12} - \frac{7}{12}\log_2\frac{7}{12} = 0.9799$
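
Checking that value numerically (a minimal sketch):

#+BEGIN_SRC python
import math

q = 5 / 12  # 5 of the 12 examples are Good
print(-(q * math.log2(q) + (1 - q) * math.log2(1 - q)))  # -> 0.9799...
#+END_SRC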

2.2 Learning to Lunch

\small

| Price  | Food Type | Rush? | Results | Index |
|--------+-----------+-------+---------+-------|
| High   | Sushi     | Hurry | Good    | x1    |
| Low    | Pizza     | Hurry | Good    | x2    |
| Medium | Hamburger | Hurry | Good    | x3    |
| Medium | Pizza     | Relax | Good    | x4    |
| High   | Gruel     | Relax | Bad     | x5    |
| High   | Hamburger | Relax | Good    | x6    |
| Low    | Gruel     | Hurry | Bad     | x7    |
| Low    | Sushi     | Relax | Bad     | x8    |
| High   | Pizza     | Hurry | Bad     | x9    |
| High   | Hamburger | Hurry | Bad     | x10   |
| Medium | Hamburger | Hurry | Bad     | x11   |
| Low    | Gruel     | Relax | Bad     | x12   |

2.2.1 What is the first attribute to split on?

2.2.1.1 <2-> The attribute that gives us the most information gain, i.e., the one that reduces the entropy by the largest amount.

2.2.1.2 <3-> \[Remainder(A) = \sum_{k=1}^{d}\frac{p_k+n_k}{p+n}B\left(\frac{p_k}{p_k+n_k}\right)\]

2.2.1.3 <4-> \[Gain(A) = B\left(\frac{p}{p+n}\right) - Remainder(A)\]
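
A sketch of these two formulas in Python, assuming each value $k$ of attribute $A$ contributes $p_k$ positive and $n_k$ negative examples (the names =remainder= and =gain= are mine):

#+BEGIN_SRC python
import math

def B(q):
    return sum(-p * math.log2(p) for p in (q, 1 - q) if p > 0)

def remainder(splits, p, n):
    """splits: one (p_k, n_k) pair per value of attribute A."""
    return sum((pk + nk) / (p + n) * B(pk / (pk + nk)) for pk, nk in splits)

def gain(splits, p, n):
    """Gain(A) = B(p / (p + n)) - Remainder(A)."""
    return B(p / (p + n)) - remainder(splits, p, n)

print(round(gain([(3, 4), (2, 3)], 5, 7), 4))  # Gain(Rush) -> 0.0006
#+END_SRC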

2.3 Learning to Lunch

\small

| Price  | Food Type | Rush? | Results | Index |
|--------+-----------+-------+---------+-------|
| High   | Sushi     | Hurry | Good    | x1    |
| Low    | Pizza     | Hurry | Good    | x2    |
| Medium | Hamburger | Hurry | Good    | x3    |
| Medium | Pizza     | Relax | Good    | x4    |
| High   | Gruel     | Relax | Bad     | x5    |
| High   | Hamburger | Relax | Good    | x6    |
| Low    | Gruel     | Hurry | Bad     | x7    |
| Low    | Sushi     | Relax | Bad     | x8    |
| High   | Pizza     | Hurry | Bad     | x9    |
| High   | Hamburger | Hurry | Bad     | x10   |
| Medium | Hamburger | Hurry | Bad     | x11   |
| Low    | Gruel     | Relax | Bad     | x12   |

2.3.1 What is the first attribute to split on?

2.3.1.1 <2-> Remainder(Type)?

2.3.1.1.1 <3-> $\frac{2}{12}B(\frac{1}{2}) + \frac{3}{12}B(\frac{2}{3}) + \frac{4}{12}B(\frac{1}{2}) + \frac{3}{12}B(0) = 0.7296$

2.3.1.2 <4-> Remainder(Price)?

2.3.1.2.1 <5-> $\frac{5}{12}B(\frac{2}{5}) + \frac{3}{12}B(\frac{2}{3}) + \frac{4}{12}B(\frac{1}{4}) = 0.9046$

2.3.1.3 <5-> Remainder(Rush)?

2.3.1.3.1 <6-> $\frac{7}{12}B(\frac{3}{7}) + \frac{5}{12}B(\frac{2}{5}) = 0.9793$ (the table has 7 Hurry and 5 Relax examples)

2.3.1.4 <7-> Gain(Type)? Gain(Price)? Gain(Rush)?

2.3.1.4.1 <8-> 0.2503, 0.0753, 0.0006, so Food Type has the largest gain and is the first split.
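
Plugging the counts from the table into the formulas above (a self-contained sketch; the =splits= dictionary is just the $(p_k, n_k)$ tallies read off the table, with $p = 5$ Good and $n = 7$ Bad overall):

#+BEGIN_SRC python
import math

def B(q):
    return sum(-p * math.log2(p) for p in (q, 1 - q) if p > 0)

def remainder(splits, p, n):
    return sum((pk + nk) / (p + n) * B(pk / (pk + nk)) for pk, nk in splits)

# (p_k, n_k) = (# Good, # Bad) per attribute value
splits = {
    "Type":  [(1, 1), (2, 1), (2, 2), (0, 3)],  # Sushi, Pizza, Hamburger, Gruel
    "Price": [(2, 3), (2, 1), (1, 3)],          # High, Medium, Low
    "Rush":  [(3, 4), (2, 3)],                  # Hurry, Relax
}
for attr, s in splits.items():
    print(attr, round(B(5 / 12) - remainder(s, 5, 7), 4))
# Type 0.2503, Price 0.0753, Rush 0.0006 -> split on Food Type first
#+END_SRC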

3 Ensemble Learning

3.1 Ensemble of Stumps

\small

| Price  | Food Type | Rush? | Results | Index |
|--------+-----------+-------+---------+-------|
| High   | Sushi     | Hurry | Good    | x1    |
| Low    | Pizza     | Hurry | Good    | x2    |
| Medium | Hamburger | Hurry | Good    | x3    |
| Medium | Pizza     | Relax | Good    | x4    |
| High   | Gruel     | Relax | Bad     | x5    |
| High   | Hamburger | Relax | Good    | x6    |
| Low    | Gruel     | Hurry | Bad     | x7    |
| Low    | Sushi     | Relax | Bad     | x8    |
| High   | Pizza     | Hurry | Bad     | x9    |
| High   | Hamburger | Hurry | Bad     | x10   |
| Medium | Hamburger | Hurry | Bad     | x11   |
| Low    | Gruel     | Relax | Bad     | x12   |

3.1.1 <2-> What is the initial weight of each example?

3.1.1.1 <3-> $\frac{1}{12}$

3.1.2 <4-> What is the initial decision stump (use information gain)?

3.1.2.1 <5-> Food type

3.1.3 <6-> What is the classification error?

3.1.3.1 <7-> the stump predicts Sushi → Good, Hamburger → Good, Pizza → Good, Gruel → Bad, so x8, x9, x10, and x11 are misclassified

3.1.3.2 <7-> $\frac{4}{12}=0.333$

3.1.4 <8-> What is the weight of the hypothesis?

3.1.4.1 <9-> $z[1] = \log(\frac{1-0.333}{0.333}) = \log 2 = 0.693$
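
Numerically (a sketch; natural log, matching the $z$ above):

#+BEGIN_SRC python
import math

error = 4 / 12                      # x8, x9, x10, x11 are misclassified
z1 = math.log((1 - error) / error)  # weight of the first stump
print(z1)                           # ln 2 = 0.693...
#+END_SRC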

3.2 Ensemble of Stumps

\small

| Price  | Food Type | Rush? | Results | Index |
|--------+-----------+-------+---------+-------|
| High   | Sushi     | Hurry | Good    | x1    |
| Low    | Pizza     | Hurry | Good    | x2    |
| Medium | Hamburger | Hurry | Good    | x3    |
| Medium | Pizza     | Relax | Good    | x4    |
| High   | Gruel     | Relax | Bad     | x5    |
| High   | Hamburger | Relax | Good    | x6    |
| Low    | Gruel     | Hurry | Bad     | x7    |
| Low    | Sushi     | Relax | Bad     | x8    |
| High   | Pizza     | Hurry | Bad     | x9    |
| High   | Hamburger | Hurry | Bad     | x10   |
| Medium | Hamburger | Hurry | Bad     | x11   |
| Low    | Gruel     | Relax | Bad     | x12   |

3.2.1 How will the weights change?

3.3 Ensemble of Stumps

\small

| Price  | Food Type | Rush? | Results | Index | Weight |
|--------+-----------+-------+---------+-------+--------|
| High   | Sushi     | Hurry | Good    | x1    | 0.0625 |
| Low    | Pizza     | Hurry | Good    | x2    | 0.0625 |
| Medium | Hamburger | Hurry | Good    | x3    | 0.0625 |
| Medium | Pizza     | Relax | Good    | x4    | 0.0625 |
| High   | Gruel     | Relax | Bad     | x5    | 0.0625 |
| High   | Hamburger | Relax | Good    | x6    | 0.0625 |
| Low    | Gruel     | Hurry | Bad     | x7    | 0.0625 |
| Low    | Sushi     | Relax | Bad     | x8    | 0.125  |
| High   | Pizza     | Hurry | Bad     | x9    | 0.125  |
| High   | Hamburger | Hurry | Bad     | x10   | 0.125  |
| Medium | Hamburger | Hurry | Bad     | x11   | 0.125  |
| Low    | Gruel     | Relax | Bad     | x12   | 0.0625 |

3.3.1 <1-> How will the weights change?

3.3.1.1 <2-> $w[j] \gets w[j] \cdot \frac{error}{1-error}$ for each correctly classified example $x_j$

3.3.1.2 <2-> NORMALIZE(w)
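
A sketch of one round of this update (the misclassified indices come from the stump above); it reproduces the 0.0625 / 0.125 weights in the table:

#+BEGIN_SRC python
error = 4 / 12
w = [1 / 12] * 12                    # initial, uniform weights
misclassified = {8, 9, 10, 11}       # x8, x9, x10, x11

for j in range(12):
    if (j + 1) not in misclassified:
        w[j] *= error / (1 - error)  # correctly classified examples shrink

total = sum(w)                       # NORMALIZE(w)
w = [wj / total for wj in w]
print(w)  # -> 0.0625 for the correct examples, 0.125 for x8..x11
#+END_SRC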

3.3.2 <3-> New entropy of the sample

3.3.2.1 <4-> $-0.3125\log_2(0.3125) - 0.6875\log_2(0.6875) = 0.8960$

3.3.3 <5-> Generate the next stump.

3.3.3.1 <6-> \[Remainder(A) = \sum_{k=1}^{d}\frac{p_k+n_k}{p+n}B\left(\frac{p_k}{p_k+n_k}\right)\] where $p_k$ and $n_k$ are now sums of example weights rather than raw counts

4 PAC

4.1 PAC Hypothesis

4.1.1 <2-> Size of the hypothesis space of this problem.

4.1.1.1 <3-> $2^{3 \times 4 \times 2} = 2^{24}$ (there are $3 \times 4 \times 2 = 24$ distinct inputs, and each hypothesis assigns one of two outputs to every input)

4.1.2 <4-> If a hypothesis classifies more than 3 samples wrong, it is a bad hypothesis. What is $\epsilon$?

4.1.2.1 <5-> $\frac{3}{24}$

4.1.3 <6-> After 12 examples are observed, what is the size of the remaining (consistent) hypothesis space?

4.1.3.1 <7-> $2^{12}$

4.1.4 <8-> How many of these are approximately correct?

4.1.4.1 <9-> $1 + 12 + {12\choose2} + {12\choose3}$ (hypotheses wrong on at most 3 of the 12 unseen inputs)

4.1.5 <10-> What is the probability that a randomly-chosen hypothesis that is consistent with the 12 examples seen so far will be approximately correct?

4.1.5.1 <11-> $\frac{1 + 12 + {12\choose2} + {12\choose3}}{2^{12}}$
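
Working this out numerically (using the count above, which admits up to three mistakes on the unseen inputs):

\[\frac{1 + 12 + 66 + 220}{4096} = \frac{299}{4096} \approx 0.073\]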