Decision Trees and Ensemble Learning

#+BEAMER_HEADER_EXTRA \beamerdefaultoverlayspecification{<+->}

1 Review

1.1 Decision Trees Review

1.1.1 <1-> What is a decision tree?

1.1.1.1 <2-> a function that takes as input a vector of attribute values and returns an output value

1.1.1.2 <2-> assumption: inputs are discrete and the output is Boolean

1.1.1.3 <2-> learned from a set of examples: each example assigns values to the input attributes and gives the corresponding output value
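
As a minimal sketch in Python (a hand-made, hypothetical tree for the lunch domain used later in these slides, not one learned from the data):

#+BEGIN_SRC python
def will_lunch_be_good(price, food_type, rush):
    """A decision tree is just a function: a vector of discrete
    attribute values in, an output value ("Good"/"Bad") out."""
    if food_type == "Gruel":          # each internal node tests one attribute
        return "Bad"
    else:
        if rush == "Hurry":
            if price == "High":
                return "Bad"
            else:
                return "Good"
        else:
            return "Good"

print(will_lunch_be_good("Low", "Pizza", "Hurry"))  # -> "Good"
#+END_SRC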

1.1.2 <3-> Entropy of a random variable?

1.1.2.1 <4-> a measure of the uncertainty of a random variable

1.1.2.2 <4-> acquisition of information corresponds to reduction in entropy

1.1.2.3 <4-> \[H(V) = -\sum_k P(v_k)\log_2 P(v_k)\]

1.1.2.4 <5-> $H(\text{fair coin})$?

1.1.2.5 <6-> $H(\text{tail-only coin})$?

1.1.3 <7-> $B(q)$

1.1.3.1 <8-> entropy of a Boolean random variable that is true with probability $q$

1.1.3.2 <8-> $B(q) = -(q \log_2 q + (1-q) \log_2 (1-q))$
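
A quick numeric sketch of both formulas in Python (the names =entropy= and =B= are mine, not from the slides); it confirms the two coin questions above:

#+BEGIN_SRC python
import math

def entropy(probs):
    """H(V) = -sum_k P(v_k) log2 P(v_k); zero-probability terms contribute 0."""
    return sum(-p * math.log2(p) for p in probs if p > 0)

def B(q):
    """Entropy of a Boolean random variable that is true with probability q."""
    return entropy([q, 1 - q])

print(entropy([0.5, 0.5]))  # fair coin      -> 1.0 bit
print(entropy([1.0]))       # tail-only coin -> 0.0 bits
print(B(0.5), B(0.0))       # B mirrors the two coin cases
#+END_SRC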

2 Decision Tree

2.1 Learning to Lunch

| Price  | Food Type | Rush? | Results | Index |
|--------+-----------+-------+---------+-------|
| High   | Sushi     | Hurry | Good    | x1    |
| Low    | Pizza     | Hurry | Good    | x2    |
| Medium | Hamburger | Hurry | Good    | x3    |
| Medium | Pizza     | Relax | Good    | x4    |
| High   | Gruel     | Relax | Bad     | x5    |
| High   | Hamburger | Relax | Good    | x6    |
| Low    | Gruel     | Hurry | Bad     | x7    |
| Low    | Sushi     | Relax | Bad     | x8    |
| High   | Pizza     | Hurry | Bad     | x9    |
| High   | Hamburger | Hurry | Bad     | x10   |
| Medium | Hamburger | Hurry | Bad     | x11   |
| Low    | Gruel     | Relax | Bad     | x12   |

2.1.1 <2-> What is the entropy of the result attribute of the whole example set?

2.1.1.1 <3-> $B(\frac{5}{12})$

2.1.1.2 <4-> $-\frac{5}{12}\log_2\frac{5}{12} - \frac{7}{12}\log_2\frac{7}{12} = 0.9799$
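
Checking that value numerically (a minimal sketch):

#+BEGIN_SRC python
import math

q = 5 / 12  # 5 of the 12 examples are Good
print(-(q * math.log2(q) + (1 - q) * math.log2(1 - q)))  # -> 0.9799...
#+END_SRC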

2.2 Learning to Lunch

\small

| Price  | Food Type | Rush? | Results | Index |
|--------+-----------+-------+---------+-------|
| High   | Sushi     | Hurry | Good    | x1    |
| Low    | Pizza     | Hurry | Good    | x2    |
| Medium | Hamburger | Hurry | Good    | x3    |
| Medium | Pizza     | Relax | Good    | x4    |
| High   | Gruel     | Relax | Bad     | x5    |
| High   | Hamburger | Relax | Good    | x6    |
| Low    | Gruel     | Hurry | Bad     | x7    |
| Low    | Sushi     | Relax | Bad     | x8    |
| High   | Pizza     | Hurry | Bad     | x9    |
| High   | Hamburger | Hurry | Bad     | x10   |
| Medium | Hamburger | Hurry | Bad     | x11   |
| Low    | Gruel     | Relax | Bad     | x12   |

2.2.1 What is the first attribute to split on?

2.2.1.1 <2-> The attribute that gives us the most information gain, i.e., the one that reduces the entropy by the largest amount.

2.2.1.2 <3-> \[Remainder(A) = \sum_{k=1}^{d}\frac{p_k+n_k}{p+n}B\left(\frac{p_k}{p_k+n_k}\right)\]

2.2.1.3 <4-> \[Gain(A) = B\left(\frac{p}{p+n}\right) - Remainder(A)\]
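
A sketch of these two formulas in Python, assuming each value $k$ of attribute $A$ contributes $p_k$ positive and $n_k$ negative examples (the names =remainder= and =gain= are mine):

#+BEGIN_SRC python
import math

def B(q):
    return sum(-p * math.log2(p) for p in (q, 1 - q) if p > 0)

def remainder(splits, p, n):
    """splits: one (p_k, n_k) pair per value of attribute A."""
    return sum((pk + nk) / (p + n) * B(pk / (pk + nk)) for pk, nk in splits)

def gain(splits, p, n):
    """Gain(A) = B(p / (p + n)) - Remainder(A)."""
    return B(p / (p + n)) - remainder(splits, p, n)

print(round(gain([(3, 4), (2, 3)], 5, 7), 4))  # Gain(Rush) -> 0.0006
#+END_SRC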

2.3 Learning to Lunch

\small

| Price  | Food Type | Rush? | Results | Index |
|--------+-----------+-------+---------+-------|
| High   | Sushi     | Hurry | Good    | x1    |
| Low    | Pizza     | Hurry | Good    | x2    |
| Medium | Hamburger | Hurry | Good    | x3    |
| Medium | Pizza     | Relax | Good    | x4    |
| High   | Gruel     | Relax | Bad     | x5    |
| High   | Hamburger | Relax | Good    | x6    |
| Low    | Gruel     | Hurry | Bad     | x7    |
| Low    | Sushi     | Relax | Bad     | x8    |
| High   | Pizza     | Hurry | Bad     | x9    |
| High   | Hamburger | Hurry | Bad     | x10   |
| Medium | Hamburger | Hurry | Bad     | x11   |
| Low    | Gruel     | Relax | Bad     | x12   |

2.3.1 What is the first attribute to split on?

2.3.1.1 <2-> Remainder(Type)?

2.3.1.1.1 <3-> $\frac{2}{12}B(\frac{1}{2}) + \frac{3}{12}B(\frac{2}{3}) + \frac{4}{12}B(\frac{1}{2}) + \frac{3}{12}B(0) = 0.7296$

2.3.1.2 <4-> Remainder(Price)?

2.3.1.2.1 <5-> $\frac{5}{12}B(\frac{2}{5}) + \frac{3}{12}B(\frac{2}{3}) + \frac{4}{12}B(\frac{1}{4}) = 0.9046$

2.3.1.3 <5-> Remainder(Rush)?

2.3.1.3.1 <6-> $\frac{7}{12}B(\frac{3}{7}) + \frac{5}{12}B(\frac{2}{5}) = 0.9793$ (the table has 7 Hurry and 5 Relax examples)

2.3.1.4 <7-> Gain(Type)? Gain(Price)? Gain(Rush)?

2.3.1.4.1 <8-> 0.2503, 0.0753, 0.0006, so Food Type has the largest gain and is the first split.
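
Plugging the counts from the table into the formulas above (a self-contained sketch; the =splits= dictionary is just the $(p_k, n_k)$ tallies read off the table, with $p = 5$ Good and $n = 7$ Bad overall):

#+BEGIN_SRC python
import math

def B(q):
    return sum(-p * math.log2(p) for p in (q, 1 - q) if p > 0)

def remainder(splits, p, n):
    return sum((pk + nk) / (p + n) * B(pk / (pk + nk)) for pk, nk in splits)

# (p_k, n_k) = (# Good, # Bad) per attribute value
splits = {
    "Type":  [(1, 1), (2, 1), (2, 2), (0, 3)],  # Sushi, Pizza, Hamburger, Gruel
    "Price": [(2, 3), (2, 1), (1, 3)],          # High, Medium, Low
    "Rush":  [(3, 4), (2, 3)],                  # Hurry, Relax
}
for attr, s in splits.items():
    print(attr, round(B(5 / 12) - remainder(s, 5, 7), 4))
# Type 0.2503, Price 0.0753, Rush 0.0006 -> split on Food Type first
#+END_SRC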

3 Ensemble Learning

3.1 Ensemble of Stumps

\small

| Price  | Food Type | Rush? | Results | Index |
|--------+-----------+-------+---------+-------|
| High   | Sushi     | Hurry | Good    | x1    |
| Low    | Pizza     | Hurry | Good    | x2    |
| Medium | Hamburger | Hurry | Good    | x3    |
| Medium | Pizza     | Relax | Good    | x4    |
| High   | Gruel     | Relax | Bad     | x5    |
| High   | Hamburger | Relax | Good    | x6    |
| Low    | Gruel     | Hurry | Bad     | x7    |
| Low    | Sushi     | Relax | Bad     | x8    |
| High   | Pizza     | Hurry | Bad     | x9    |
| High   | Hamburger | Hurry | Bad     | x10   |
| Medium | Hamburger | Hurry | Bad     | x11   |
| Low    | Gruel     | Relax | Bad     | x12   |

3.1.1 <2-> What is the initial weight of each example?

3.1.1.1 <3-> $\frac{1}{12}$

3.1.2 <4-> What is the initial decision stump (use information gain)?

3.1.2.1 <5-> Food type

3.1.3 <6-> What is the classification error?

3.1.3.1 <7-> the stump predicts Sushi → Good, Hamburger → Good, Pizza → Good, Gruel → Bad, so x8, x9, x10, and x11 are misclassified

3.1.3.2 <7-> $\frac{4}{12}=0.333$

3.1.4 <8-> What is the weight of the hypothesis?

3.1.4.1 <9-> $z[1] = \log(\frac{1-0.333}{0.333}) = \log 2 = 0.693$
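
Numerically (a sketch; natural log, matching the $z$ above):

#+BEGIN_SRC python
import math

error = 4 / 12                      # x8, x9, x10, x11 are misclassified
z1 = math.log((1 - error) / error)  # weight of the first stump
print(z1)                           # ln 2 = 0.693...
#+END_SRC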

3.2 Ensemble of Stumps

\small

| Price  | Food Type | Rush? | Results | Index |
|--------+-----------+-------+---------+-------|
| High   | Sushi     | Hurry | Good    | x1    |
| Low    | Pizza     | Hurry | Good    | x2    |
| Medium | Hamburger | Hurry | Good    | x3    |
| Medium | Pizza     | Relax | Good    | x4    |
| High   | Gruel     | Relax | Bad     | x5    |
| High   | Hamburger | Relax | Good    | x6    |
| Low    | Gruel     | Hurry | Bad     | x7    |
| Low    | Sushi     | Relax | Bad     | x8    |
| High   | Pizza     | Hurry | Bad     | x9    |
| High   | Hamburger | Hurry | Bad     | x10   |
| Medium | Hamburger | Hurry | Bad     | x11   |
| Low    | Gruel     | Relax | Bad     | x12   |

3.2.1 How will the weights change?

3.3 Ensemble of Stumps

\small

| Price  | Food Type | Rush? | Results | Index | Weight |
|--------+-----------+-------+---------+-------+--------|
| High   | Sushi     | Hurry | Good    | x1    | 0.0625 |
| Low    | Pizza     | Hurry | Good    | x2    | 0.0625 |
| Medium | Hamburger | Hurry | Good    | x3    | 0.0625 |
| Medium | Pizza     | Relax | Good    | x4    | 0.0625 |
| High   | Gruel     | Relax | Bad     | x5    | 0.0625 |
| High   | Hamburger | Relax | Good    | x6    | 0.0625 |
| Low    | Gruel     | Hurry | Bad     | x7    | 0.0625 |
| Low    | Sushi     | Relax | Bad     | x8    | 0.125  |
| High   | Pizza     | Hurry | Bad     | x9    | 0.125  |
| High   | Hamburger | Hurry | Bad     | x10   | 0.125  |
| Medium | Hamburger | Hurry | Bad     | x11   | 0.125  |
| Low    | Gruel     | Relax | Bad     | x12   | 0.0625 |

3.3.1 <1-> How will the weights change?

3.3.1.1 <2-> $w[j] \gets w[j] \cdot \frac{error}{1-error}$ for each correctly classified example $x_j$

3.3.1.2 <2-> NORMALIZE(w)
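
A sketch of one round of this update (the misclassified indices come from the stump above); it reproduces the 0.0625 / 0.125 weights in the table:

#+BEGIN_SRC python
error = 4 / 12
w = [1 / 12] * 12                    # initial, uniform weights
misclassified = {8, 9, 10, 11}       # x8, x9, x10, x11

for j in range(12):
    if (j + 1) not in misclassified:
        w[j] *= error / (1 - error)  # correctly classified examples shrink

total = sum(w)                       # NORMALIZE(w)
w = [wj / total for wj in w]
print(w)  # -> 0.0625 for the correct examples, 0.125 for x8..x11
#+END_SRC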

3.3.2 <3-> New entropy of the sample

3.3.2.1 <4-> $-0.3125\log_2(0.3125) - 0.6875\log_2(0.6875) = 0.8960$

3.3.3 <5-> Generate the next stump.

3.3.3.1 <6-> \[Remainder(A) = \sum_{k=1}^{d}\frac{p_k+n_k}{p+n}B\left(\frac{p_k}{p_k+n_k}\right)\] where $p_k$ and $n_k$ are now sums of example weights rather than raw counts

4 PAC

4.1 PAC Hypothesis

4.1.1 <2-> Size of the hypothesis space of this problem.

4.1.1.1 <3-> $2^{3 \times 4 \times 2} = 2^{24}$ (there are $3 \times 4 \times 2 = 24$ distinct inputs, and each hypothesis assigns one of two outputs to every input)

4.1.2 <4-> If a hypothesis classifies more than 3 samples wrong, it is a bad hypothesis. What is $\epsilon$?

4.1.2.1 <5-> $\frac{3}{24}$

4.1.3 <6-> After 12 examples are observed, what is the size of the remaining (consistent) hypothesis space?

4.1.3.1 <7-> $2^{12}$

4.1.4 <8-> How many of these are approximately correct?

4.1.4.1 <9-> $1 + 12 + {12\choose2} + {12\choose3}$ (hypotheses wrong on at most 3 of the 12 unseen inputs)

4.1.5 <10-> What is the probability that a randomly-chosen hypothesis that is consistent with the 12 examples seen so far will be approximately correct?

4.1.5.1 <11-> $\frac{1 + 12 + {12\choose2} + {12\choose3}}{2^{12}}$
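
Working this out numerically (using the count above, which admits up to three mistakes on the unseen inputs):

\[\frac{1 + 12 + 66 + 220}{4096} = \frac{299}{4096} \approx 0.073\]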