Lectures on Machine Learning
Jesús Fernández-Villaverde (University of Pennsylvania)
December 28, 2018
What is machine learning?
• Set of algorithms to detect and learn from patterns in the data and use them
for decision making or to forecast future realizations of random variables.
Turing test
Why now?
• Many of the ideas of machine learning (e.g., basic neural network by McCulloch
and Pitts, 1943, and perceptron by Rosenblatt, 1958) are decades old.
• Previous waves of excitement: late 1980s and early 1990s. Those decades were
followed by a backlash.
1. Big data.
2. Long tails.
• Likely that these four forces will become stronger over time.
[Figure 1.8 from Goodfellow et al. (2016): dataset sizes (number of examples) have increased greatly over time, from hundreds or thousands of manually compiled measurements in the early 1900s (e.g., Iris) to datasets with 10^8–10^9 examples by 2015 (e.g., ImageNet, SVHN, WMT, Sports-1M, Canadian Hansard).]
[Figure: number of transistors per chip by year, 1971–2016, illustrating the exponential growth known as Moore's law.]
Relation with other fields
• Link with computer science, statistical learning, data science, data mining,
predictive analytics, and optimization: frontiers are often fuzzy.
1. No unified approach.
Economics and machine learning I
• Most important lesson for economists from data science: Everything is data.
Library data
Parish and probate data
Satellite imagery
Luminosity
Cell use
[Table I from Gentzkow and Shapiro (2010): most partisan phrases from the 2005 Congressional Record. The top 60 Democratic and Republican phrases are shown ranked by χ²; phrases are classified as two or three word after dropping common “stopwords” such as “for” and “the.” See their Section 3 for details and their online Appendix B for a more extensive phrase list.]
Economics and machine learning II
1. Counterfactuals.
2. Welfare.
4. New changes.
5. Less data.
• Another example by Athey (2017): hotel prices and occupancy rates. In the
data, prices and occupancy rates are strongly positively correlated, but what is
the expected impact of a hotel raising its prices on a given day?
References I
References II
1. Machine Learning: The Art and Science of Algorithms that Make Sense of Data,
Peter Flach (2012).
2. Deep Learning, Ian Goodfellow, Yoshua Bengio, and Aaron Courville (2016).
A wide range of algorithms
• Main division:
1. Supervised learning.
2. Unsupervised learning.
3. Reinforcement learning.
Supervised Learning
Jesús Fernández-Villaverde (University of Pennsylvania)
December 28, 2018
Supervised learning I
[Figure from Hastie et al.: the curse of dimensionality in the unit cube. Each curve (p = 1, 2, 3, 10) plots the edge length (distance) of the subcube needed to capture a given fraction of the data volume; in high dimensions, even small neighborhood fractions require edge lengths close to 1.]
• Learn the mapping:
y = f(x)
• Then:
\hat{y} = \hat{f}(x) = \arg\max_{c \in C} \, p(y = c \mid x, D)
Example I: K-nearest algorithm
• Formally:
p(y = c \mid x, D, K) = \frac{1}{K} \sum_{i \in N_K(x, D)} I(y_i = c)
where N_K(x, D) is the set of the K nearest neighbors of x in D.
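To make the estimator concrete, here is a minimal Python/NumPy sketch (the function name, the toy data, and K = 5 are our own choices, not the lecture's): it returns the empirical class frequencies among the K nearest neighbors of a query point.

```python
import numpy as np

def knn_class_probs(x, X_train, y_train, K, classes):
    """Estimate p(y = c | x, D, K) by the share of each class among the
    K nearest neighbors N_K(x, D) of the query point x."""
    dist = np.linalg.norm(X_train - x, axis=1)      # Euclidean distances to x
    nearest = np.argsort(dist)[:K]                  # indices of N_K(x, D)
    return {c: np.mean(y_train[nearest] == c) for c in classes}

# Toy usage: two Gaussian clouds labeled 0 and 1.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(3, 1, (50, 2))])
y = np.repeat([0, 1], 50)
print(knn_class_probs(np.array([2.5, 2.5]), X, y, K=5, classes=[0, 1]))
```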
Bias-variance trade-off
• We have:
y = f(x) + \varepsilon, \quad \varepsilon \sim \text{iid}(0, \sigma^2)
• Then, for a fixed x:
E\left[\left(y - \hat{f}(x)\right)^2\right] = \underbrace{\left(E\hat{f}(x) - f(x)\right)^2}_{\text{Bias}^2} + \underbrace{E\left[\left(\hat{f}(x) - E\hat{f}(x)\right)^2\right]}_{\text{Variance}} + \sigma^2
[Figure from Hastie et al.: schematic of the bias-variance trade-off. The distance between the truth and the closest fit in model space is the model bias; the spread of the estimates around the closest fit is the estimation variance. A restricted model space raises bias but lowers variance.]
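A small Monte Carlo illustrates the decomposition (a sketch with our own choices of f, noise level, and evaluation point): at a fixed x₀, the rigid linear model has high bias and low variance, while the flexible degree-9 polynomial has low bias and high variance.

```python
import numpy as np

rng = np.random.default_rng(1)
f = lambda x: np.sin(2 * np.pi * x)         # true regression function
x0, sigma, n, reps = 0.3, 0.3, 30, 2000     # evaluation point, noise, sample size

for degree in (1, 9):
    preds = np.empty(reps)
    for r in range(reps):
        x = rng.uniform(0, 1, n)
        y = f(x) + rng.normal(0, sigma, n)  # fresh training sample each replication
        coef = np.polyfit(x, y, degree)     # least-squares polynomial fit
        preds[r] = np.polyval(coef, x0)     # prediction at the fixed point x0
    bias2 = (preds.mean() - f(x0)) ** 2
    print(f"degree {degree}: bias^2 = {bias2:.4f}, variance = {preds.var():.4f}")
```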
Linear Regression of 0/1 Response
[Figure from Hastie et al.: a two-class classification example in two dimensions. The 0/1 response is fit by linear regression, and the resulting linear decision boundary misclassifies a substantial number of training points.]
1−Nearest Neighbor Classifier
[Figure from Hastie et al.: the same data classified with a 1-nearest-neighbor rule. The decision boundary is highly irregular and achieves zero training error, overfitting the sample.]
15-Nearest Neighbor Classifier
[Figure from Hastie et al.: the same data classified with a 15-nearest-neighbor rule. Averaging over 15 neighbors produces a far smoother decision boundary than 1-NN.]
Bayes Optimal Classifier
[Figure from Hastie et al.: the optimal Bayes decision boundary for the same simulated example, computed from the true data-generating distribution.]
[Figure from Hastie et al.: misclassification curves for the simulated example. Train and test error of k-NN are plotted against k (equivalently, the degrees of freedom N/k), together with the linear-regression fit and the Bayes error; test error is U-shaped in model complexity.]
[Figure 5.2 from Goodfellow et al. (2016): three models fit to a synthetic training set generated by randomly sampling x values and choosing y deterministically from a quadratic function. A linear fit underfits; the quadratic model is perfectly matched to the true structure and generalizes well to new data; a high-degree polynomial overfits.]
Example II: Regularized regression
• Zvi Griliches: “never trust OLS with more than five regressors.”
• Estimator:
\hat{\beta}_{reg} = \arg\min_{\beta} \sum_{i=1}^{N} (y_i - x_i'\beta)^2 + \lambda \|\beta\|_p^p
where
\|\beta\|_p = \left( \sum_{i=1}^{K} |\beta_i|^p \right)^{1/p}
L0 norm
• Once we select the relevant covariates, we can perform OLS, but standard
errors are not correct since they do not account for the model-selection step.
• Thus, not widely used in practice.
Ridge regression
• Estimator with p = 2.
• Closed form:
\hat{\beta} = (X'X + \lambda I)^{-1} X'Y
• It does not enforce sparsity and, thus, we do not achieve model selection.
• Bayesian interpretation: \lambda = \sigma^2 / \tau^2, where \beta_{prior} \sim N(0, \tau^2 I) and \varepsilon_i \sim N(0, \sigma^2).
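The closed form maps directly into code. A minimal NumPy sketch (function name and toy data are ours): the coefficients shrink toward zero as λ grows, but none becomes exactly zero.

```python
import numpy as np

def ridge(X, Y, lam):
    """Closed-form ridge estimator: (X'X + lam*I)^{-1} X'Y."""
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ Y)

# Toy usage: shrinkage, but no sparsity.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
Y = X @ np.array([2.0, 0.0, -1.0, 0.0, 0.5]) + rng.normal(size=100)
for lam in (0.0, 10.0, 100.0):
    print(lam, np.round(ridge(X, Y, lam), 3))
```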
LASSO
• LASSO shrinks some coefficients exactly to zero (model selection) and shrinks
the others towards zero (the relaxed LASSO variation refits on the selected
covariates to undo this shrinkage).
[Figure 3.11 from Hastie et al.: estimation picture for the lasso (left) and ridge regression (right). Shown are contours of the least-squares error function and the constraint regions |β₁| + |β₂| ≤ t and β₁² + β₂² ≤ t², respectively; the corners of the lasso constraint region produce exact zeros.]
[Excerpt and Figure 3.12 from Hastie et al.: contours of constant value of Σⱼ|βⱼ|^q for several q. The ridge estimate is also the posterior mean under a Gaussian prior, but the lasso and best subset selection are not. One might use values of q other than 0, 1, or 2, or even estimate q from the data, but the extra variance is rarely worth the effort; values of q ∈ (1, 2) compromise between the lasso and ridge regression, yet since |βⱼ|^q is differentiable at 0 for q > 1, such penalties do not share the lasso's (q = 1) ability to set coefficients exactly to zero.]
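A minimal cyclic coordinate descent sketch shows how the L1 penalty generates exact zeros through soft-thresholding (our own implementation of the unscaled objective Σᵢ(yᵢ − xᵢ'β)² + λ‖β‖₁; the function names and the toy λ are arbitrary choices):

```python
import numpy as np

def soft_threshold(z, t):
    """Soft-thresholding operator: the source of exact zeros under the L1 penalty."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_cd(X, Y, lam, iters=200):
    """LASSO by cyclic coordinate descent on sum_i (y_i - x_i'beta)^2 + lam*||beta||_1."""
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        for j in range(X.shape[1]):
            r = Y - X @ beta + X[:, j] * beta[j]   # partial residual excluding j
            beta[j] = soft_threshold(X[:, j] @ r, lam / 2) / (X[:, j] @ X[:, j])
    return beta

# Toy usage: truly-zero coefficients are typically estimated as exactly zero.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
Y = X @ np.array([2.0, 0.0, -1.0, 0.0, 0.5]) + rng.normal(size=100)
print(np.round(lasso_cd(X, Y, lam=50.0), 3))
```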
Which penalty?
Adaptive LASSO
• The LASSO penalty screens out irrelevant variables, but it also biases the
estimates of relevant variables.
• Ideally, we want to weaken the penalty on relevant covariates and vice versa.
• Estimator:
\min_{\beta} \sum_{i=1}^{N} (y_i - x_i'\beta)^2 + \sum_{j} \omega_j |\beta_j|
where
\omega_j = \frac{1}{|\hat{\beta}_j^{OLS}|^{\gamma}}
• Or, the elastic net:
\min_{\beta} \sum_{i=1}^{N} (y_i - x_i'\beta)^2 + \lambda \left( \|\beta\|_1 + \alpha \|\beta\|_2^2 \right)
• Particularly useful because we can use available software for support vector
machines.
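One standard implementation trick (a sketch that assumes the lasso_cd function from the sketch above is in scope; γ = 1 and λ are arbitrary choices) is to rescale each regressor by its weight, solve an ordinary LASSO, and undo the rescaling, so covariates with large first-stage OLS estimates are penalized less:

```python
import numpy as np

# Assumes lasso_cd from the LASSO sketch above is in scope.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
Y = X @ np.array([2.0, 0.0, -1.0, 0.0, 0.5]) + rng.normal(size=100)

beta_ols = np.linalg.lstsq(X, Y, rcond=None)[0]   # first-stage OLS estimates
w = 1.0 / np.abs(beta_ols) ** 1.0                 # weights with gamma = 1
b = lasso_cd(X / w, Y, lam=50.0)                  # ordinary LASSO on rescaled X
beta_adaptive = b / w                             # undo the rescaling
print(np.round(beta_adaptive, 3))
```

The rescaling works because substituting βⱼ = bⱼ/wⱼ turns the weighted penalty λΣⱼ wⱼ|βⱼ| into the plain penalty λΣⱼ|bⱼ| while leaving the fit Xβ unchanged.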
[Figure 3.13 from Hastie et al.: contours of constant value of the L_q penalty for q = 1.2 (left) and of the elastic-net penalty with α = 0.2 (right). They look similar, but the elastic net has sharp (non-differentiable) corners that allow coefficients to be set exactly to zero.]
More background
• References:
Example III: Support vector machines
[Figure 12.1 from Hastie et al.: support vector classifiers. The left panel shows the separable case: the decision boundary is the solid line x'β + β₀ = 0, and the broken lines bound the shaded maximal margin of width 2M = 2/‖β‖. The right panel shows the nonseparable (overlap) case; the points labeled ξⱼ* are on the wrong side of their margin.]
Linear boundaries
• Classify:
y_i = 1 \text{ if } \beta_0 + \beta_1 x_i \geq 0, \qquad y_i = -1 \text{ otherwise}
• Find:
\min_{\beta_0, \beta_1} \frac{1}{N} \sum_{i=1}^{N} \max\left[0, 1 - y_i (\beta_0 + \beta_1 x_i)\right] + \lambda \|\beta\|_2^2
• The first term, \max\left[0, 1 - y_i (\beta_0 + \beta_1 x_i)\right], is known as the hinge loss; \lambda \|\beta\|_2^2 is the regularization term.
• You can solve the primal problem with a sub-gradient descent method or the
dual problem with a coordinate descent algorithm.
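A minimal sub-gradient descent sketch for the primal problem (our own code, written for multivariate x; the learning rate, λ, and number of epochs are arbitrary choices):

```python
import numpy as np

def linear_svm_sgd(X, y, lam=0.01, epochs=200, lr=0.05):
    """Minimize (1/N) sum_i max[0, 1 - y_i (beta0 + x_i'beta)] + lam*||beta||^2
    by full-batch sub-gradient descent. y must take values in {-1, +1}."""
    N, K = X.shape
    beta, beta0 = np.zeros(K), 0.0
    for _ in range(epochs):
        margins = y * (X @ beta + beta0)
        active = margins < 1                    # points violating the margin
        g_beta = -(y[active] @ X[active]) / N + 2 * lam * beta  # sub-gradient wrt beta
        g_beta0 = -y[active].sum() / N                          # sub-gradient wrt beta0
        beta -= lr * g_beta
        beta0 -= lr * g_beta0
    return beta, beta0

# Toy usage: two overlapping Gaussian classes.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-1, 1, (50, 2)), rng.normal(1, 1, (50, 2))])
y = np.repeat([-1, 1], 50).astype(float)
beta, beta0 = linear_svm_sgd(X, y)
print("training accuracy:", np.mean(np.sign(X @ beta + beta0) == y))
```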
[Figure from Hastie et al.: linear support vector boundary for the mixture data with C = 10000 (a narrow margin). Training error: 0.270; test error: 0.288; Bayes error: 0.210.]
[Figure from Hastie et al.: the same mixture data with C = 0.01 (a wide margin). Training error: 0.26; test error: 0.30; Bayes error: 0.21.]
38
Example III: Regression trees
• Main idea: recursively subdivide the space of regressors into subspaces and compute the mean of the dependent variable within each subspace.
• Works surprisingly well in practice (very popular in data mining), but with nearly no theoretical results (asymptotic normality?).
• Compute
$$f(x) = \bar{y}$$
with
$$MSE(f(x)) = \sum_{i=1}^{N} (y_i - f(x))^2$$
and define
$$f_{k,t}(x) = \begin{cases} \bar{y}\,|\,x_k \leq t & \text{if } x_k \leq t \\ \bar{y}\,|\,x_k > t & \text{if } x_k > t \end{cases}$$
40
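• A minimal sketch (ours, not from the lecture) of the greedy split search at the heart of a regression tree; a full tree applies this search recursively to each resulting subspace:

  import numpy as np

  def best_split(X, y):
      """Greedy search over (k, t) for the split minimizing the sum of squared errors."""
      best = (None, None, np.inf)
      for k in range(X.shape[1]):                  # candidate regressor
          for t in np.unique(X[:, k])[:-1]:        # candidate threshold
              left, right = y[X[:, k] <= t], y[X[:, k] > t]
              sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
              if sse < best[2]:
                  best = (k, t, sse)
      return best

  rng = np.random.default_rng(0)
  X = rng.uniform(size=(200, 2))
  y = np.where(X[:, 0] <= 0.5, 1.0, 3.0) + 0.1 * rng.standard_normal(200)
  print(best_split(X, y))                          # recovers a split on x_0 near 0.5

• Libraries such as scikit-learn (DecisionTreeRegressor) implement this recursive partitioning with stopping and pruning rules.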
Recursive partitions II
41
[Figure: a recursive binary partition of the (X1, X2) space into regions R1–R5 via splits at t1–t4, together with the corresponding decision tree (X1 ≤ t1, X2 ≤ t2, X1 ≤ t3, X2 ≤ t4).]
42
Improvements
• Random forest.
43
Other algorithms
• Multi-task LASSO.
• LARS LASSO.
• Dantzig Selector.
• Ensemble methods.
44
Unsupervised and Reinforcement Learning
Jesús Fernández-Villaverde1
December 28, 2018
1
University of Pennsylvania
Unsupervised learning
• Use a sample:
$$D = \{x_i\}_{i=1}^{N}$$
to:
3. Dimensionality reduction.
• Example: what can we learn about the loan book of a bank without imposing
too much a priori structure?
1
[Figure 14.5 (Hastie et al.): Simulated data. On the left, K-means clustering (with K = 2) has been applied to the raw data; the two colors indicate the cluster memberships. On the right, the features were first standardized before clustering, which is equivalent to using feature weights 1/[2 · var(Xj)]; the standardization has obscured the two well-separated groups.]
2
Cluster discovery
1. Select K clusters
$$K^* = \arg\max_K p(K|D)$$
• Related variations:
3. k-SVD.
4
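• A short scikit-learn sketch of the feature-scaling issue behind Figure 14.5 (the data-generating process is our own; in this variant standardization helps rather than hurts):

  import numpy as np
  from sklearn.cluster import KMeans
  from sklearn.preprocessing import StandardScaler

  rng = np.random.default_rng(0)
  # Two groups separated along x1; x2 is pure noise with a much larger scale.
  X = np.vstack([rng.normal([-4, 0], [1, 8], (100, 2)),
                 rng.normal([4, 0], [1, 8], (100, 2))])
  truth = np.repeat([0, 1], 100)

  for name, data in [("raw", X), ("standardized", StandardScaler().fit_transform(X))]:
      labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(data)
      agree = max((labels == truth).mean(), (labels != truth).mean())  # handle label switching
      print(name, agree)   # the implicit feature weights decide which direction K-means splits on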
5
6
Other algorithms
1. Agglomerative clustering.
2. DBSCAN.
3. Birch.
• Density estimation.
• Association rules and the Apriori algorithm (Agrawal and Srikant, 1994).
7
8
[Figure: simulated data in (X1, X2) with the largest and smallest principal component directions indicated.]
9
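• A minimal sketch (ours) of extracting the two principal components via the singular value decomposition:

  import numpy as np

  rng = np.random.default_rng(0)
  # Correlated two-dimensional data, as in the figure.
  X = rng.multivariate_normal([0, 0], [[3.0, 2.0], [2.0, 2.0]], size=200)
  Xc = X - X.mean(axis=0)                   # center the data first

  U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
  print("largest PC direction:", Vt[0])     # direction of maximal variance
  print("smallest PC direction:", Vt[1])    # orthogonal direction of minimal variance
  print("component variances:", s ** 2 / (len(X) - 1))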
Reinforcement learning
• Algorithms that use training information that evaluates the actions taken, rather than instructing the learner with the correct actions.
• Purely evaluative feedback assesses how good the action taken was, but not whether it was the best feasible action.
10
Example: Multi-armed bandit problem
• But you do not know which action is best; you only have estimates (a dual control problem of identification and optimization).
11
12
Theory vs. practice
1. Follow greedy actions: actions with the highest expected value. This is known as exploiting.
• This should remind you of a basic dynamic programming problem: what is the
optimal mix of pure strategies?
• But these structures are too restrictive for practical purposes outside the pages
of Econometrica.
13
An action-value method I
• We start with $Q_0(a) = 0$ for all $a$. Here (and later), we randomize among ties.
14
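• A compact sketch of the sample-average action-value method with ε-greedy exploration on a 10-armed testbed, in the spirit of Sutton and Barto (the implementation is our own):

  import numpy as np

  rng = np.random.default_rng(0)
  q_star = rng.standard_normal(10)           # true (unknown) action values
  Q = np.zeros(10)                           # estimates: Q_0(a) = 0 for all a
  N = np.zeros(10)                           # how often each action was taken
  eps, total = 0.1, 0.0

  for t in range(1000):
      if rng.random() < eps:                 # explore
          a = rng.integers(10)
      else:                                  # exploit, randomizing among ties
          a = rng.choice(np.flatnonzero(Q == Q.max()))
      r = q_star[a] + rng.standard_normal()  # noisy reward
      N[a] += 1
      Q[a] += (r - Q[a]) / N[a]              # incremental sample-average update
      total += r

  print("average reward:", total / 1000)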
An action-value method II
• In particular, we can connect with genetic algorithms (we will discuss AlphaGo
later on).
15
[Figure 2.1 (Sutton and Barto): An example 10-armed bandit testbed, showing the true action values q∗(a), a = 1, . . . , 10, and the reward distribution around each.]
16
[Figure 2.2 (Sutton and Barto): Average performance of ε-greedy action-value methods (ε = 0, 0.01, 0.1) on the 10-armed testbed: average reward and % optimal action over 1000 steps.]
17
A more general update rule
18
Improving the algorithm
[Figure 2.3 (Sutton and Barto): The effect of optimistic initial action-value estimates (optimistic greedy: Q1 = 5, ε = 0) on the 10-armed testbed. Both methods used a constant step-size parameter α = 0.1.]
20
[Figure 2.4 (Sutton and Barto): Average performance of UCB action selection (c = 2) on the 10-armed testbed. UCB generally performs better than ε-greedy action selection (ε = 0.1), except in the first k steps, when it selects randomly among the as-yet-untried actions.]
21
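• The selection rule behind Figure 2.4 fits in a few lines; a hedged sketch of the UCB rule as stated in Sutton and Barto (call it with t = 1, 2, ...):

  import numpy as np

  def ucb_action(Q, N, t, c=2.0):
      # Untried actions (N = 0) get an infinite bonus, so they are selected first.
      with np.errstate(divide="ignore"):
          bonus = c * np.sqrt(np.log(t + 1) / N)
      return int(np.argmax(Q + bonus))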
[Figure 2.6 (Sutton and Barto): A parameter study of the bandit algorithms presented in this chapter (ε-greedy, UCB, gradient bandit, and greedy with optimistic initialization, α = 0.1). Each point is the average reward obtained over the first 1000 steps with a particular algorithm at a particular setting of its parameter (ε, α, c, or Q0). Each algorithm performs best at an intermediate value of its parameter, neither too large nor too small.]
22
Other algorithms
• Actor-critic methods.
23
Deep Learning
Jesús Fernández-Villaverde
December 28, 2018
1
University of Pennsylvania
Deep learning
• Some of the most active research right now thanks to the rediscovery of
artificial neural networks (ANN; also known as connectionist systems) in the
2000s.
• It can deal with very abstract set-ups and, thus, has considerable option value.
• Key advances:
1
2
AlphaGo
• Check also:
1. https://deepmind.com/research/alphago/.
2. https://www.alphagomovie.com/
3
[Figure 1 (Silver et al., 2016): Neural network training pipeline and architecture. a, A fast rollout policy pπ and a supervised learning (SL) policy network pσ are trained to predict human expert moves in a data set of positions. A reinforcement learning (RL) policy network pρ is initialized to the SL policy network, and is then improved by policy gradient learning to maximize the outcome (that is, winning more games) against previous versions of the policy network. A new data set is generated by playing games of self-play with the RL policy network. Finally, a value network vθ is trained by regression to predict the expected outcome (that is, whether the current player wins) in positions from the self-play data set. b, Schematic representation of the neural network architecture used in AlphaGo. The policy network takes a representation of the board position s as its input, passes it through many convolutional layers with parameters σ (SL policy network) or ρ (RL policy network), and outputs a probability distribution pσ(a|s) or pρ(a|s) over legal moves a, represented by a probability map over the board. The value network similarly uses many convolutional layers with parameters θ, but outputs a scalar value vθ(s′) that predicts the expected outcome in position s′.]
4
RESEARCH ARTICLE
QT
P P Q Q
Q + u(P) max Q + u(P)
QT QT
Q Q
P P
Q + u(P) max Q + u(P)
pV QT QT QT
P P
pS
r r r r
igure 3 | Monte Carlo tree search in AlphaGo. a, Each simulation is evaluated in two ways: using the value network vθ; and by running
averses the tree by selecting the edge with maximum action value Q, a rollout to the end of the game with the fast rollout policy pπ, then
us a bonus u(P) that depends on a stored prior probability P for that computing the winner with function r. d, Action values Q are updated to
dge. b, The leaf node may be expanded; the new node is processed once track the mean value of all evaluations r(·) and vθ(·) in the subtree below
y the policy network pσ and the output probabilities are stored as prior that action.
robabilities P for each action. c, At the end of a simulation, the leaf node
arning of convolutional networks, won 11% of games against Pachi23 (s, a) of the search tree stores an action value Q(s, a), visit count N(s, a
nd 12% against a slightly weaker program, Fuego24. and prior probability P(s, a). The tree is traversed by simulation (th
is, descending the tree in complete games without backup), startin
einforcement learning of value networks from the root state. At each time step t of each simulation, an action
he final stage of the training pipeline focuses on position evaluation, is selected from state st 5
[Fig. 1 (Silver et al., 2018): Training AlphaZero for 700,000 steps. Elo ratings were computed from games between different players, where each player was given 1 s per move. (A) Performance of AlphaZero in chess compared with the 2016 TCEC world champion program Stockfish. (B) Performance of AlphaZero in shogi compared with the 2017 CSA world champion program Elmo. (C) Performance of AlphaZero in Go compared with AlphaGo Lee and AlphaGo Zero (20 blocks over 3 days).]
6
Why deep learning?
• Moreover, ANNs often require less “inside knowledge” from experts in the area.
7
[Figure (Goodfellow et al.): A Venn diagram of nested fields: deep learning (example: MLPs) lies within representation learning (example: shallow autoencoders), which lies within machine learning (example: logistic regression), which lies within AI (example: knowledge bases).]
8
9
Limitations of deep learning
• While deep learning can work extremely well, there is no such thing as a silver bullet.
• A rule of thumb in the industry is that one needs around $10^7$ labeled observations to properly train a complex ANN, with around $10^4$ observations in each relevant group.
10
Neural networks
• We will follow a more sober formal treatment (which, in any case, agrees with the approach of state-of-the-art researchers).
11
A neuron
y = g (x; θ) = φ (z)
[Figure: a single neuron (perceptron). Inputs $x_1, \ldots, x_n$ are multiplied by weights $\theta_1, \ldots, \theta_n$ and summed into the net input $\sum_{i=1}^{n} \theta_i x_i$, which the activation function maps into the classification output.]
13
The biological analog
14
Activation functions
• Traditionally:
1. A sigmoidal function:
$$\phi(v) = \frac{1}{1 + e^{-v}}$$
2. Hyperbolic tangent.
3. Softplus:
$$\phi(v) = \log(1 + e^{v})$$
15
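• The three classic activation functions plus a single-neuron forward pass in a few lines of numpy (a sketch; here θ stacks the bias as θ0):

  import numpy as np

  def sigmoid(v):
      return 1.0 / (1.0 + np.exp(-v))

  def softplus(v):
      return np.log1p(np.exp(v))      # log(1 + e^v)

  def neuron(x, theta, phi=sigmoid):
      """y = g(x; theta) = phi(z) with z = theta_0 + sum_i theta_i x_i."""
      z = theta[0] + x @ theta[1:]
      return phi(z)

  x = np.array([0.5, -1.0, 2.0])
  theta = np.array([0.1, 0.4, 0.3, -0.2])   # bias plus three weights
  for phi in (sigmoid, np.tanh, softplus):
      print(phi.__name__, neuron(x, theta, phi))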
16
17
Interpretation
• The levels of the θ’s control the activation rate (the higher the θ, the harder the activation).
• Some textbooks separate the activation threshold and a scaling parameter from
θ as different coefficients in φ, but such separation moves notation farther away
from standard econometrics.
20
Two classic (yet remarkable) results II
• Assume, as well, that we are dealing with the class of functions for which the
Fourier transform of their gradient is integrable.
• We actually rely on the theorems by Leshno et al. (1993) and Bach (2017).
21
22
Training the network
23
Minimization
• You can offload the algorithm to a graphics processing unit (GPU) or a tensor processing unit (TPU) instead of a standard CPU.
24
[Figure 2-6: Batch gradient descent is sensitive to saddle points, which can lead to premature convergence.]
25
[Figure 2-7: The stochastic error surface fluctuates with respect to the batch error surface, enabling saddle-point avoidance: instead of a single static error surface, the error surface is dynamic, and descending on this stochastic surface significantly improves the ability to navigate flat regions.]
26
Stochastic gradient descent
• θ is recursively updated:
$$\theta_{t+1} = \theta_t - \epsilon_t \nabla E(\theta; y_j, \hat{y}_j)$$
where
$$\nabla E(\theta; y_j, \hat{y}_j) \equiv \left[\frac{\partial E(\theta; y_j, \hat{y}_j)}{\partial \theta_0^2}, \frac{\partial E(\theta; y_j, \hat{y}_j)}{\partial \theta_1^2}, \ldots, \frac{\partial E(\theta; y_j, \hat{y}_j)}{\partial \theta_{N,M}^1}\right]^{\top}$$
is the gradient of the error function with respect to θ evaluated at $(y_j, \hat{y}_j)$, until:
$$\|\theta_{t+1} - \theta_t\| < \varepsilon$$
28
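• A minimal stochastic gradient descent loop for a least-squares error function (the quadratic loss, the decaying step size, and the tolerance are our illustrative choices):

  import numpy as np

  rng = np.random.default_rng(0)
  X = rng.standard_normal((1000, 3))
  theta_true = np.array([1.0, -2.0, 0.5])
  y = X @ theta_true + 0.1 * rng.standard_normal(1000)

  theta = np.zeros(3)
  for t in range(1, 20001):
      j = rng.integers(len(y))                     # draw one observation (y_j, x_j)
      err = X[j] @ theta - y[j]                    # yhat_j - y_j
      grad = err * X[j]                            # gradient of E = (1/2) err^2
      theta_new = theta - grad / t ** 0.6          # decaying step size eps_t
      if np.linalg.norm(theta_new - theta) < 1e-8: # stopping rule from the slide
          break
      theta = theta_new

  print(theta)                                     # close to theta_true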
Alternative minimization algorithms
2. MCMC/simulated annealing.
3. Genetic algorithms:
• In fact, much of the research in deep learning incorporates some flavor of genetic
selection.
• Basic idea.
29
Multiple layers I
and
$$z_m^2 = \theta_{0,m}^2 + \sum_{m=1}^{M} \theta_m^2\, \phi\!\left(z_m^1\right)$$
...
$$y = g(x; \theta) = \theta_0^K + \sum_{m=1}^{M} \theta_m^K\, \phi\!\left(z_m^{K-1}\right)$$
30
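• The recursion above as a forward pass in numpy (the weights are random for illustration; we allow a different M in each layer):

  import numpy as np

  def forward(x, layers, phi=np.tanh):
      """Each layer is (theta0, Theta): z^k = theta0^k + Theta^k phi(z^{k-1})."""
      a = x                                   # the inputs play the role of phi(z^0)
      for theta0, Theta in layers[:-1]:
          a = phi(theta0 + Theta @ a)         # hidden layers
      theta0, Theta = layers[-1]
      return theta0 + Theta @ a               # output layer: y = theta_0^K + ...

  rng = np.random.default_rng(0)
  sizes = [3, 5, 4, 1]                        # x in R^3, two hidden layers, scalar y
  layers = [(rng.standard_normal(m), rng.standard_normal((m, n)))
            for n, m in zip(sizes[:-1], sizes[1:])]
  print(forward(np.array([0.5, -1.0, 2.0]), layers))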
Flow representation
[Figure: flow representation of a multilayer network, with inputs x1, x2, x3 feeding into hidden layers and an output.]
1. It works! Our brains have six layers. AlphaGo has 12 layers with ReLUs.
• We can have different M’s in each layer ⇒ fewer neurons in higher layers allow
for compression of learning into fewer features.
32
Alternative ANNs
33
[Figure (Goodfellow et al.): An example of 2-D convolution. Input (3 × 4): rows (a b c d), (e f g h), (i j k l). Kernel (2 × 2): rows (w x), (y z). Output (2 × 3):
aw + bx + ey + fz, bw + cx + fy + gz, cw + dx + gy + hz
ew + fx + iy + jz, fw + gx + jy + kz, gw + hx + ky + lz]
34
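• The output table above is a “valid” cross-correlation (convolution without kernel flipping); scipy reproduces it directly once we substitute numbers for the letters:

  import numpy as np
  from scipy.signal import correlate2d

  a, b, c, d, e, f, g, h, i, j, k, l = range(1, 13)   # stand-ins for the letters
  w, x, y, z = 1, 2, 3, 4

  inp = np.array([[a, b, c, d], [e, f, g, h], [i, j, k, l]])
  ker = np.array([[w, x], [y, z]])
  out = correlate2d(inp, ker, mode="valid")           # 2 x 3 output, no flipping
  print(out)
  print(out[0, 0] == a * w + b * x + e * y + f * z)   # matches the first entry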
35
36
Appendix A
Direction set methods
Ax = b
• We can, in general, calculate f (P) and ∇f (P) for a given N-dimensional point
P.
37
Steepest descent method
• Start at a point P0 . As many times as needed, move from point Pi to the point
Pi+1 by minimizing along the line from Pi in the direction of the local downhill
gradient −∇f (Pi ).
• Risk of over-shooting.
• To avoid it: perform many small steps (perhaps with line search) ⇒ not very efficient!
38
Steepest descent method
39
Conjugate gradient method
• A better way.
• In $\mathbb{R}^N$, take N steps, each of which attains the minimum along one direction, without undoing the previous steps’ progress.
40
Conjugate gradient method
41
Algorithm - linear
1. Let $d_0 = r_0 = b - Ax_0$.
2. For $i = 0, 1, \ldots, N-1$:
• $\alpha_i = \frac{r_i^{\top} r_i}{d_i^{\top} A d_i}$.
• $x_{i+1} = x_i + \alpha_i d_i$.
• $r_{i+1} = r_i - \alpha_i A d_i$.
• $\beta_{i+1} = \frac{r_{i+1}^{\top} r_{i+1}}{r_i^{\top} r_i}$.
• $d_{i+1} = r_{i+1} + \beta_{i+1} d_i$.
3. Return $x_N$.
42
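• The linear algorithm, line for line, in numpy (a sketch for a small symmetric positive-definite A):

  import numpy as np

  def conjugate_gradient(A, b, x0):
      """Solve Ax = b for symmetric positive-definite A in at most N steps."""
      x = x0.copy()
      d = r = b - A @ x0
      for _ in range(len(b)):
          alpha = (r @ r) / (d @ A @ d)
          x = x + alpha * d
          r_new = r - alpha * (A @ d)
          beta = (r_new @ r_new) / (r @ r)
          d = r_new + beta * d                # new conjugate search direction
          r = r_new
      return x

  A = np.array([[4.0, 1.0], [1.0, 3.0]])
  b = np.array([1.0, 2.0])
  print(conjugate_gradient(A, b, np.zeros(2)))   # compare with np.linalg.solve(A, b)

• The non-linear version on the next slide replaces the exact αi with a line minimization of f along di.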
Algorithm - non-linear
1. Let $d_0 = r_0 = -f'(x_0)$.
2. For $i = 0, 1, \ldots, N-1$:
• Choose $\alpha_i$ by line minimization of $f(x_i + \alpha d_i)$.
• $x_{i+1} = x_i + \alpha_i d_i$.
• $r_{i+1} = -f'(x_{i+1})$.
• $\beta_{i+1} = \frac{r_{i+1}^{\top} r_{i+1}}{r_i^{\top} r_i}$ or $\beta_{i+1} = \max\left(\frac{r_{i+1}^{\top}(r_{i+1} - r_i)}{r_i^{\top} r_i}, 0\right)$.
• $d_{i+1} = r_{i+1} + \beta_{i+1} d_i$.
3. Return $x_N$.
43
Text Analysis
Jesús Fernández-Villaverde1
December 28, 2018
1
University of Pennsylvania
Text analysis
2. Documents in libraries.
3. Electronic news.
4. Verbal surveys.
• Large area with many other applications that, at the moment, are of less
interest in economics.
1
[Figure I (Baker, Bloom, and Davis, QJE): EPU Index for the United States.]
2
3
What is text?
• Some of these may be from the Latin alphabet – ‘a’, ‘A’, ‘p’ – but there may also be:
5. Numbers.
• The first step for non-editable files is conversion to an editable format, usually with optical character recognition (OCR) software.
• With raw text files, we can use regular expressions to identify relevant patterns.
5
Raw text files
• Issues:
6
Regular expressions I
7
Regular expressions II
• Tye Rattenbury et al. claim that between 50% and 80% of real-life data analysis is spent on data wrangling.
1. Tutorial: https://www.regular-expressions.info/reference.html.
8
Regular expressions III
• In Python: https:
//www.tutorialspoint.com/python/python_reg_expressions.htm.
• In R: https://www.rstudio.com/wp-content/uploads/2016/09/
RegExCheatsheet.pdf.
• In particular, learn to use the piping command from dplyr to make code more
readable.
9
Pre-processing I: tokenization
• Often these elements are words, but we may also want to keep numbers or
punctuation as well.
• Simple rules work well, but not perfectly. For example, splitting on white space and punctuation will separate hyphenated phrases as in ‘risk-averse agent’ and contractions as in ‘aren’t’.
10
Pre-processing II: stopword removal
• Examples from English are ‘a’, ‘the’, ‘to’, ‘for,’ and so on.
• Also common to drop rare words, for example those that appear in less than some fixed percentage of documents.
11
Pre-processing III: linguistic roots
• For many applications, the relevant information in tokens is their linguistic root,
not their grammatical form. We may want to treat ‘prefer’, ‘prefers’,
‘preferences’ as equivalent tokens.
• Two options:
1. Stemming : deterministically strip suffixes to reduce each token to its root.
2. Stem need not be an English word: Porter stemmer maps ‘inflation’ to ‘inflat’.
3. Lemmatizing : Tag each token with its part of speech, then look up each (word,
POS) pair in a dictionary to find linguistic root.
4. E.g., ‘saw’ tagged as verb would be converted to ‘see’, ‘saw’ tagged as noun left
unchanged.
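• The three pre-processing steps chained with NLTK (a sketch; it assumes the usual one-time nltk.download() calls, and the sample sentence is ours):

  import nltk
  from nltk.corpus import stopwords
  from nltk.stem import PorterStemmer

  # One-time downloads: nltk.download("punkt"); nltk.download("stopwords")
  text = "The agents' preferences imply risk-averse behavior."
  tokens = nltk.word_tokenize(text.lower())          # tokenization
  stop = set(stopwords.words("english"))
  tokens = [t for t in tokens if t.isalpha() and t not in stop]
  print([PorterStemmer().stem(t) for t in tokens])   # e.g., ['agent', 'prefer', 'impli', 'behavior']

• Note that ‘risk-averse’ is dropped by the isalpha() filter: exactly the hyphenation caveat raised above.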
• In FOMC data, most common bigrams include ‘interest rate’, ‘labor market’,
‘basi point’; most common trigrams include ‘feder fund rate’, ‘real interest
rate’, ‘real gdp growth’, ‘unit labor cost’.
13
More systematic approach
• Some phrases have meaning because they stand in for specific names, like
“Bank Indonesia”. One can used named-entity recognition software applied to
raw, tokenized text data to identify these.
• Other phrases have meaning because they denote a recurring concept, like
“housing bubble”. To find these, one can apply part-of-speech tagging, then
tabulate the frequency of the following tag patterns:
AN/NN/AAN/ANN/NAN/NNN/NPN.
14
Some terminology
15
Notation
• Let $w = (w_1, \ldots, w_D)$ be a list of all terms in the corpus, and let $N \equiv \sum_d N_d$ be the total number of terms in the corpus.
• We can map each term in the corpus into this index, so that $w_{d,n} \in \{1, \ldots, V\}$.
• Let $x_{d,v} \equiv \sum_n 1(w_{d,n} = v)$ be the count of term $v$ in document $d$.
16
Example
1. ‘En un lugar’
• Index:
en un lugar muchos años después del
1 2 3 4 5 6 7
• We then have w1 = (1, 2, 3); w2 = (4, 5, 6); w3 = (6, 2, 3).
17
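• The same bookkeeping in a few lines of Python (documents 2 and 3 are reconstructed from the index so that w2 = (4, 5, 6) and w3 = (6, 2, 3)):

  from collections import Counter

  docs = [["en", "un", "lugar"], ["muchos", "años", "después"],
          ["después", "un", "lugar"]]
  vocab = {}                                   # term -> index, in order of first appearance
  for doc in docs:
      for term in doc:
          vocab.setdefault(term, len(vocab) + 1)

  w = [[vocab[t] for t in doc] for doc in docs]
  print(w)                                     # [[1, 2, 3], [4, 5, 6], [6, 2, 3]]
  x = [Counter(doc) for doc in docs]           # x_{d,v}: term counts per document
  print(x[0])                                  # Counter({'en': 1, 'un': 1, 'lugar': 1})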
Multinomial distribution I
• If $X = (X_1, \ldots, X_K) \sim \text{Multinomial}(n, \beta)$, then:
$$p(x_1, \ldots, x_K | n, \beta) = \frac{n!}{x_1! \cdots x_K!} \prod_{i=1}^{K} \beta_i^{x_i}$$
where
$$\sum_{i=1}^{K} x_i = n, \qquad \sum_{i=1}^{K} \beta_i = 1$$
18
Multinomial distribution II
• Moments:
$$E(X_i) = n\beta_i$$
$$Var(X_i) = n\beta_i(1 - \beta_i)$$
$$Cov(X_i, X_j) = -n\beta_i\beta_j \quad \text{for } i \neq j$$
19
Dirichlet distribution I
[Figure 1.1: Plots of the densities of Z ∼ Beta(a1, a2) with various parameter values: (i) a1 = a2 = 0.9; (ii) a1 = a2 = 20; (iii) a1 = 0.5 and a2 = 2; (iv) a1 = 5 and a2 = 0.4.]
21
[Figure 2.1: Plots of the densities of (x1, x2) ∼ Dirichlet(a1, a2; a3) on V2 with various parameter values: (i) a1 = a2 = a3 = 2; (ii) a1 = 1, a2 = 5, a3 = 10; (iii) a1 = 10, a2 = 3, a3 = 8; (iv) a1 = 2, a2 = 10, a3 = 4.]
22
Dirichlet distribution II
• Moments:
$$E(X_i) = \frac{\beta_i}{\sum_{i=1}^{K} \beta_i}$$
$$Var(X_i) = \frac{\beta_i \left(\sum_{i=1}^{K} \beta_i - \beta_i\right)}{\left(\sum_{i=1}^{K} \beta_i\right)^2 \left(\sum_{i=1}^{K} \beta_i + 1\right)}$$
$$Cov(X_i, X_j) = \frac{-\beta_i \beta_j}{\left(\sum_{i=1}^{K} \beta_i\right)^2 \left(\sum_{i=1}^{K} \beta_i + 1\right)} \quad \text{for } i \neq j$$
23
Simple probability model
• Suppose that each term is i.i.d., and that p(wn = v ) = βv ∈ [0, 1].
• Note that term counts are a sufficient statistic for w in the estimation of β.
The independence assumption provides statistical foundations for the
bag-of-words model.
24
Bayesian inference
• Why?
25
26
Posterior distribution
• We can add the term counts to the prior distribution’s parameters to form the posterior:
$$p(\beta|w) \propto p(w|\beta)\,p(\beta) \propto \prod_{v=1}^{V} \beta_v^{x_v} \prod_{v=1}^{V} \beta_v^{\eta_v - 1} = \prod_{v=1}^{V} \beta_v^{x_v + \eta_v - 1}$$
• Let $\beta = (\beta^1, \ldots, \beta^K)$ and $\theta = (\theta^1, \ldots, \theta^D)$.
28
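• Conjugacy makes this a one-line update; a sketch with a symmetric prior and illustrative counts:

  import numpy as np

  rng = np.random.default_rng(0)
  eta = np.ones(5)                    # symmetric Dirichlet prior over V = 5 terms
  x = np.array([10, 0, 3, 1, 6])      # observed term counts x_v
  post = eta + x                      # posterior is Dirichlet(eta_v + x_v)
  print(post / post.sum())            # posterior mean of beta
  print(rng.dirichlet(post, size=3))  # three posterior draws of beta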
Topics as urns
[Figure: topics depicted as urns over terms, e.g., one urn weighted toward ‘wage’ and ‘employ’, another toward ‘price’ and ‘increase’.]
• Two approaches:
• We then have
$$\theta_{d,k} = \begin{cases} 1 & \text{if } z_d = k \\ 0 & \text{otherwise} \end{cases}$$
30
Mixture model
[Figure: a mixture model for a document: a single topic indicator $z_d$ (here $z_d = 1$ or $z_d = 2$) generates every token $w_{d,n}$.]
31
Mixed-membership model
[Figure: a mixed-membership model: document-level topic shares $\theta_d$ (here 0.25 and 0.75) generate a separate topic indicator $z_{d,n}$ for each token $w_{d,n}$.]
32
A canonical model
• Structure:
33
A modification of the Latent Dirichlet allocation
• We can slightly modify the previous model to ease, later on, the implementation of a Gibbs sampler.
• Structure:
34
LDA I
[Figure (Blei, Ng, and Jordan): Graphical model representation of LDA, α → θ → z → w, with the N word-level variables nested in M document-level plates. The boxes are “plates” representing replicates: the outer plate represents documents, while the inner plate represents the repeated choice of topics and words within a document.]
35
LDA II
[Figure (Blei, Ng, and Jordan): Geometric interpretation of LDA: the topic simplex for three topics embedded in the word simplex for three words; crosses mark documents.]
36
LDA III
37
Example statement: Yellen, March 2006, #51
Allocation (the leading characters of each word are replaced by the number of its assigned topic):
We have 17ticed a 39ange in the 39lationship 1etween the 25re 25I and the 41ained 25re 25I, which 25ggested to us that 36ybe 36mething is 38ing on 43lating to 25bstitution 20as at the 25per 39vel of the 16dex. You 23cused on the 25nmarket 25mponent of the 25E, and I 32ndered if 38mething 16usual might be 4appening with the 25re 25I 16lative to other 25asures.
41
Distribution of topics
[Figure: distribution of attention across topics for the example statement.]
42
Topic 25
[Figure: top terms in Topic 25.]
43
Topic 11
[Figure: top terms in Topic 11.]
44
Advantage of Flexibility
• ‘measur’ has probability 0.026 in topic 25, and probability 0.021 in topic 11.
45
Topics vs. BBD
[Figure (external validation): the fraction of FOMC1 and FOMC2 devoted to Topic 35 and Topic 12 plotted against the BBD economic policy uncertainty index, 1987m7–2007m7.]
46
Pro-cyclical topics
[Figure: fraction of each speaker’s time in FOMC2 devoted to pro-cyclical topics (min, median, and max across speakers), 1987m7–2008m7.]
47
Counter-cyclical topics
[Figure: fraction of each speaker’s time in FOMC2 devoted to counter-cyclical topics (min, median, and max across speakers), 1987m7–2008m7.]
48
Time series
[Figure: distribution of topics in Iraq articles over time (topic0, topic7, topic9), 1980–2015.]
49
Larsen (2018)
[Figure (Larsen, 2018): topic word clouds. The 150 words with the highest probabilities are shown; the size of each word corresponds to the probability of that word occurring in the topic distribution. All the word clouds are available at http://www.vegardlarsen.com/Word_clouds/.]
50
[Figure 2 (Larsen, 2018): Aggregate newspaper uncertainty. The black line plots the 300-day backward-looking rolling mean. The series gives the share of uncertainty terms per 1,000,000 words in the newspaper.]
53
[Figure (Larsen, 2018): the four principal components extracted from 80 topic-based uncertainty measures.]
54
[Figure 8 (Larsen, 2018): Correlations with alternative measures.]
• The posterior distribution of the latent variables, taking the parameters as given, is:
$$p(z = z'|w, \theta, \beta) = \frac{p(w|z = z', \theta, \beta)\,p(z = z'|\theta, \beta)}{\sum_{z'} p(w|z = z', \theta, \beta)\,p(z = z'|\theta, \beta)}$$
• The denominator is intractable: for example, a 100-word corpus with 50 topics has $\approx 7.88 \times 10^{169}$ terms in the sum.
56
Gibbs sampler
• Outline:
1. Sample from a multinomial distribution N times for the topic allocation variables.
• We can improve upon the basic Gibbs sampler with collapsed sampling, i.e., analytically integrating out some variables in the joint likelihood and sampling the remainder (Griffiths and Steyvers, 2004, and Hansen, McMahon, and Prat, 2015).
57
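• In practice, off-the-shelf implementations are common; a sketch with scikit-learn’s LatentDirichletAllocation (note that it fits by online variational Bayes rather than the Gibbs sampler described here, and the toy corpus is ours):

  from sklearn.feature_extraction.text import CountVectorizer
  from sklearn.decomposition import LatentDirichletAllocation

  corpus = ["inflation prices rose core cpi",
            "labor market employment wages",
            "cpi core inflation measures",
            "wages employment labor costs"]
  X = CountVectorizer().fit_transform(corpus)     # document-term counts x_{d,v}
  lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X)
  print(lda.transform(X))                         # theta_d: topic shares per document
  print(lda.components_)                          # beta_k (unnormalized): term weights per topic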
Sampling θ d
• Then
$$p(z_d|\theta_d) = \prod_n \sum_k 1(z_{d,n} = k)\,\theta_{d,k} = \prod_k \theta_{d,k}^{n_{d,k}}.$$
58
Sampling β k
$$p(z_{d,n} = k|w_{d,n} = v, \theta_d, \beta) = \frac{p(w_{d,n} = v|z_{d,n} = k, \theta_d, \beta)\,p(z_{d,n} = k|\theta_d, \beta)}{\sum_k p(w_{d,n} = v|z_{d,n} = k, \theta_d, \beta)\,p(z_{d,n} = k|\theta_d, \beta)} = \frac{\theta_d^k \beta_k^v}{\sum_k \theta_d^k \beta_k^v}$$
60
Priors
• There are three parameters to set to run the Gibbs sampling algorithm:
number of topics K and hyperparameters α, η.
• Methods to choose K:
1. Cross-validation.
2. Information criteria.
61
Cross validation
• For test data, obtain θd distributions via sampling as above, or else use uniform
distribution.
62
Information criteria
• Erosheva et al. (2007) compare several in the context of an LDA-like model for
survey data, and find that AICM is optimal.
• Let $\mu_\ell = \frac{1}{S}\sum_s \ell\left(w|\hat{\Theta}_s\right)$ be the average value of the log-likelihood across $S$ draws of a Markov chain.
• Let $\sigma_\ell^2 = \frac{1}{S}\sum_s \left(\ell\left(w|\hat{\Theta}_s\right) - \mu_\ell\right)^2$ be the variance.
63
Formalizing interpretability
• Chang et al. (2009), in “Reading Tea Leaves: How Humans Interpret Topic
Models,” propose an objective way of determining whether topics are
interpretable.
• Two tests:
1. Word intrusion. Form a set of words consisting of the top five words from topic k plus a word with low probability in topic k. Ask subjects to identify the inserted word.
• Estimate LDA and other topic models on NYT and Wikipedia articles for
K = 50, 100, 150.
64
Results
Distribution of topics
65