Linear Regression Methods Overview

Chapter 3: Linear Regression
EEE 485/585 Statistical Learning and Data Analytics
Cem Tekin, Bilkent University

Outline: Regression; Simple linear regression; (Multiple) linear regression; Classification with linear regression; Non-linear relationships with linear regression
"All models are wrong, but some are useful." - George E. P. Box, 1978
Regression

Model:
$$ Y \approx f(X, \beta^{\text{true}}) $$
$X = [X_1, X_2, \ldots, X_p]^T$: independent variable vector
$Y$: dependent variable
$\beta^{\text{true}}$: unknown parameter vector

The form of $f$ is assumed to be known. Usually,
$$ E[Y \mid X] = f(X, \beta^{\text{true}}) $$

Some applications:
Stock market prediction given macroeconomic variables
Customer satisfaction vs. waiting time
Smart grid load prediction
Auto car sales prediction
Simple linear regression

Model: $Y$ is the response variable, $X$ is the predictor variable.
$$ Y \approx \beta_0^{\text{true}} + \beta_1^{\text{true}} X $$

Example:
$$ \text{sales} \approx \underbrace{\beta_0^{\text{true}}}_{\text{bias}} + \underbrace{\beta_1^{\text{true}}}_{\text{slope}} \times \text{TV ad budget} $$

How to estimate $\beta_0^{\text{true}}$ and $\beta_1^{\text{true}}$?

[Figure: Sales vs. TV ad budget scatter plot. Figure 3.1 from "An Introduction to Statistical Learning" by James et al.]
The least squares approach

$$ \mathrm{RSS}(\beta_0, \beta_1) = \sum_{i=1}^{n} \big( \underbrace{y_i - \overbrace{(\beta_0 + \beta_1 x_i)}^{\hat y_i}}_{\text{residual}} \big)^2 $$

Least squares method:
$$ (\hat\beta_0, \hat\beta_1) := \arg\min_{(\beta_0, \beta_1) \in \mathbb{R}^2} \mathrm{RSS}(\beta_0, \beta_1) $$
The least squares approach

Least squares solution:
$$ \hat\beta_1 = \frac{\sum_{i=1}^{n} (x_i - \bar x)(y_i - \bar y)}{\sum_{i=1}^{n} (x_i - \bar x)^2}, \qquad \hat\beta_0 = \bar y - \hat\beta_1 \bar x $$
where $\bar y = \frac{1}{n} \sum_{i=1}^{n} y_i$ and $\bar x = \frac{1}{n} \sum_{i=1}^{n} x_i$. Note that $\hat\beta_1$ is the slope obtained after centering the predictor and the response.
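These formulas translate directly into code. Below is a minimal NumPy sketch; the toy data stand in for the advertising example, and all numbers are made up for illustration:

```python
import numpy as np

def simple_least_squares(x, y):
    """Closed-form least squares fit of y ~ b0 + b1 * x."""
    x_bar, y_bar = x.mean(), y.mean()
    # Slope: covariance of (x, y) over variance of x, after centering
    b1 = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)
    b0 = y_bar - b1 * x_bar  # intercept from the means
    return b0, b1

# Toy data standing in for (TV ad budget, sales); not the real dataset
rng = np.random.default_rng(0)
x = rng.uniform(0, 300, size=100)
y = 7.0 + 0.05 * x + rng.normal(0, 3, size=100)
b0, b1 = simple_least_squares(x, y)
print(f"estimated bias {b0:.3f}, slope {b1:.4f}")
```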
The least squares approach - Example

[Figure: left, Sales vs. TV with the least squares fit; right, contour and surface plots of $\mathrm{RSS}$ as a function of $(\beta_0, \beta_1)$, whose minimizer is the pair $(\hat\beta_0, \hat\beta_1)$. Figures 3.1 and 3.2 from "An Introduction to Statistical Learning" by James et al.]
How accurate is $(\hat\beta_0, \hat\beta_1)$?

[Figure: two panels of $Y$ vs. $X$ comparing the population regression line with least squares lines fit on different random samples.]

Unbiasedness:
$$ E[\hat\beta_0] = \beta_0^{\text{true}}, \qquad E[\hat\beta_1] = \beta_1^{\text{true}} $$

Variance: let $\sigma^2 = \mathrm{Var}(\epsilon)$, where the $\epsilon_i$'s are uncorrelated. Then
$$ \mathrm{Var}(\hat\beta_0) = \sigma^2 \left[ \frac{1}{n} + \frac{\bar x^2}{\sum_{i=1}^{n} (x_i - \bar x)^2} \right], \qquad \mathrm{Var}(\hat\beta_1) = \frac{\sigma^2}{\sum_{i=1}^{n} (x_i - \bar x)^2} $$
Both variances shrink as the sample variance of the $x_i$'s grows.
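These properties can be verified empirically. The following is a small Monte Carlo sketch under the stated assumptions, with made-up true parameters and noise level:

```python
import numpy as np

rng = np.random.default_rng(1)
b0_true, b1_true, sigma, n = 2.0, 0.5, 1.0, 50
x = rng.uniform(-2, 2, size=n)            # fixed design, reused each trial
denom = np.sum((x - x.mean()) ** 2)

estimates = []
for _ in range(20000):
    y = b0_true + b1_true * x + rng.normal(0, sigma, size=n)
    b1 = np.sum((x - x.mean()) * (y - y.mean())) / denom
    b0 = y.mean() - b1 * x.mean()
    estimates.append((b0, b1))
est = np.array(estimates)

print("mean of estimates:", est.mean(axis=0))    # ~ (2.0, 0.5): unbiased
print("empirical Var(b1):", est[:, 1].var())
print("theoretical Var(b1):", sigma**2 / denom)  # matches the formula above
```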
Confidence intervals for $\hat\beta_0$ and $\hat\beta_1$

Example:
[Figure: Sales vs. TV ad budget with the least squares fit.]

Sales can also be plotted against several different predictors, which motivates regression with more than one predictor:
[Figure: Sales vs. TV, Radio, and Newspaper ad budgets, each with a simple least squares fit. Figure 2.1 from "An Introduction to Statistical Learning" by James et al.]
(Multiple) linear regression

Given $[X_1, X_2, \ldots, X_p]$, predict $Y$ via a linear model:
$$ \hat Y = \underbrace{\hat\beta_0}_{\text{bias}} + \sum_{j=1}^{p} \hat\beta_j X_j $$
There are $p + 1$ parameters to learn.

Let $X = [1, X_1, \ldots, X_p]^T$ and $\hat\beta = [\hat\beta_0, \hat\beta_1, \ldots, \hat\beta_p]^T$. Then,
$$ \hat Y = X^T \hat\beta = \hat\beta^T X $$
The least squares approach:
$$ \hat\beta^{\mathrm{RSS}} = \arg\min_{\beta \in \mathbb{R}^{p+1}} \mathrm{RSS}(\beta) $$
The least squares solution

Some notation:
$$ y = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}, \quad \hat y = \begin{bmatrix} \hat y_1 \\ \hat y_2 \\ \vdots \\ \hat y_n \end{bmatrix}, \quad \mathbf{X} = \begin{bmatrix} 1 & x_{11} & x_{12} & \ldots & x_{1p} \\ 1 & x_{21} & x_{22} & \ldots & x_{2p} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & x_{n1} & x_{n2} & \ldots & x_{np} \end{bmatrix} $$

Least squares solution:
$$ \hat\beta^{\mathrm{RSS}} = \arg\min_{\beta \in \mathbb{R}^{p+1}} \mathrm{RSS}(\beta) = \arg\min_{\beta \in \mathbb{R}^{p+1}} \sum_{i=1}^{n} (y_i - x_i^T \beta)^2 = \arg\min_{\beta \in \mathbb{R}^{p+1}} \lVert y - \mathbf{X}\beta \rVert^2 $$

$$ \hat\beta^{\mathrm{RSS}} = (\mathbf{X}^T \mathbf{X})^{-1} \mathbf{X}^T y $$
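A minimal NumPy sketch of this solution on synthetic data (all names, coefficients, and dimensions are illustrative). Note that solving the least squares problem directly is numerically preferable to forming the inverse:

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 200, 3
X_raw = rng.normal(size=(n, p))
X = np.hstack([np.ones((n, 1)), X_raw])       # prepend the all-ones bias column
beta_true = np.array([1.0, -2.0, 0.5, 3.0])   # made-up true parameters
y = X @ beta_true + rng.normal(0, 0.1, size=n)

# Textbook formula (fine for small, well-conditioned problems)
beta_normal = np.linalg.inv(X.T @ X) @ X.T @ y
# Numerically preferred: solve the least squares problem directly
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

print(np.allclose(beta_normal, beta_lstsq))   # True
```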
How to get the least squares solution?

Rules of matrix differentiation:
Let $g = [g_1, g_2, \ldots, g_n]^T$ and $f : \mathbb{R}^n \to \mathbb{R}^m$. Let $h = f(g)$, with $h = [h_1, h_2, \ldots, h_m]^T$, and define the Jacobian
$$ \frac{\partial h}{\partial g} = \begin{bmatrix} \frac{\partial h_1}{\partial g_1} & \frac{\partial h_1}{\partial g_2} & \ldots & \frac{\partial h_1}{\partial g_n} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial h_m}{\partial g_1} & \frac{\partial h_m}{\partial g_2} & \ldots & \frac{\partial h_m}{\partial g_n} \end{bmatrix} $$

1. $h = Ag \Rightarrow \frac{\partial h}{\partial g} = A$.
2. $\alpha = y^T A g$, where $y : m \times 1$, $g : n \times 1$, and $A : m \times n$:
$$ \frac{\partial \alpha}{\partial g} = y^T A, \qquad \frac{\partial \alpha}{\partial y} = g^T A^T $$
3. $\alpha = y^T A y$, where $y : m \times 1$ and $A : m \times m$:
$$ \frac{\partial \alpha}{\partial y} = y^T (A + A^T) $$
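Applying rules 2 and 3 with $A = \mathbf{X}^T \mathbf{X}$ gives the normal equations; here is the short derivation the slide title asks for, assuming $\mathbf{X}^T \mathbf{X}$ is invertible:
$$ \mathrm{RSS}(\beta) = (y - \mathbf{X}\beta)^T (y - \mathbf{X}\beta) = y^T y - 2\, y^T \mathbf{X} \beta + \beta^T \mathbf{X}^T \mathbf{X}\, \beta $$
$$ \frac{\partial\, \mathrm{RSS}}{\partial \beta} = -2\, y^T \mathbf{X} + 2\, \beta^T \mathbf{X}^T \mathbf{X} = 0 \;\Longrightarrow\; \mathbf{X}^T \mathbf{X}\, \beta = \mathbf{X}^T y \;\Longrightarrow\; \hat\beta^{\mathrm{RSS}} = (\mathbf{X}^T \mathbf{X})^{-1} \mathbf{X}^T y $$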
Example

[Figure: the least squares regression plane fitting Sales as a function of TV and Radio ad budgets. Figure 3.5 from "An Introduction to Statistical Learning" by James et al.]
Linear algebra interpretation of the solution

$\hat y$ is the orthogonal projection of $y$ onto the subspace spanned by the column vectors
$$ w_0 = \begin{bmatrix} 1 \\ 1 \\ \vdots \\ 1 \end{bmatrix}, \quad w_1 = \begin{bmatrix} x_{11} \\ x_{21} \\ \vdots \\ x_{n1} \end{bmatrix}, \quad \ldots, \quad w_p = \begin{bmatrix} x_{1p} \\ x_{2p} \\ \vdots \\ x_{np} \end{bmatrix} $$

[Figure: $y$ projected onto the plane spanned by $w_1$ and $w_2$; the error $y - \hat y$ is orthogonal to the subspace.]
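A quick numerical check of this orthogonality on synthetic data (names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)
n, p = 100, 2
X = np.hstack([np.ones((n, 1)), rng.normal(size=(n, p))])
y = rng.normal(size=n)                        # any y works for the geometry

beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
y_hat = X @ beta_hat                          # projection of y onto col(X)
residual = y - y_hat

# The residual is orthogonal to every column w_0, ..., w_p of X
print(np.allclose(X.T @ residual, 0, atol=1e-10))   # True
```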
Gauss-Markov Theorem

Assumptions:
$y_i = (\beta^{\text{true}})^T x_i + \epsilon_i$, $i = 1, \ldots, n$
($\beta^{\text{true}}$ fixed & not observed; $x_i$ fixed & observed; $\epsilon_i$: noise, random, not observed)
$E[\epsilon_i] = 0$, $\mathrm{Var}(\epsilon_i) = \sigma^2$, $\mathrm{Cov}(\epsilon_i, \epsilon_j) = 0$ for $i \neq j$

Linear estimator of $\beta_j^{\text{true}}$:
$$ \hat\beta_j = c_{1j} y_1 + \cdots + c_{nj} y_n $$
where each $c_{kj}$ is a possibly non-linear function of $\mathbf{X}$.

Theorem: The least squares estimate of $\beta^{\text{true}}$ has the smallest variance among all linear unbiased estimators.
Unbiased estimator: $E[\hat\beta] = \beta^{\text{true}}$.
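Unbiasedness of the least squares estimate follows in one line from these assumptions, since $\mathbf{X}$ is fixed, $E[y] = \mathbf{X} \beta^{\text{true}}$, and $E[\epsilon] = 0$:
$$ E[\hat\beta^{\mathrm{RSS}}] = (\mathbf{X}^T \mathbf{X})^{-1} \mathbf{X}^T E[y] = (\mathbf{X}^T \mathbf{X})^{-1} \mathbf{X}^T \mathbf{X}\, \beta^{\text{true}} = \beta^{\text{true}} $$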
Probabilistic interpretation of the solution

$y_i = (\beta^{\text{true}})^T x_i + \epsilon_i$ ($\beta^{\text{true}}$ fixed, unknown)
i.i.d. $\epsilon_i \sim \mathcal{N}(0, \sigma^2)$
We have access to $D := \{(x_i, y_i)\}_{i=1}^{n}$.
Let's compute the MLE of $\beta$.

Likelihood (under Gaussian noise):
$$ L(\beta) = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left( -\frac{(y_i - \beta^T x_i)^2}{2\sigma^2} \right) $$

Log-likelihood:
$$ \ell(\beta) := \log L(\beta) = -n \log \sqrt{2\pi\sigma^2} - \frac{1}{2\sigma^2} \sum_{i=1}^{n} (y_i - \beta^T x_i)^2 $$

Since $\log$ is monotone,
$$ \arg\max_{\beta} L(\beta) = \arg\max_{\beta} \ell(\beta) = \arg\min_{\beta} \sum_{i=1}^{n} (y_i - \beta^T x_i)^2 = \arg\min_{\beta} \mathrm{RSS}(\beta) = \hat\beta^{\mathrm{RSS}} $$
Note: the MLE of $\beta$ is independent of $\sigma^2$.
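A small sketch confirming this numerically, assuming SciPy is available: minimizing the negative log-likelihood returns the same $\hat\beta$ for any fixed $\sigma^2$ (synthetic data and illustrative values):

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(4)
n, p = 150, 2
X = np.hstack([np.ones((n, 1)), rng.normal(size=(n, p))])
y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(0, 0.5, size=n)

def neg_log_likelihood(beta, sigma2):
    resid = y - X @ beta
    return n * np.log(np.sqrt(2 * np.pi * sigma2)) + resid @ resid / (2 * sigma2)

beta_rss, *_ = np.linalg.lstsq(X, y, rcond=None)
for sigma2 in (0.1, 1.0, 10.0):   # sigma^2 only shifts/scales the objective
    beta_mle = minimize(neg_log_likelihood, x0=np.zeros(p + 1), args=(sigma2,)).x
    print(sigma2, np.allclose(beta_mle, beta_rss, atol=1e-4))   # True each time
```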
Binary classification

$D = \{(x_i, y_i)\}_{i=1}^{n}$, $y_i \in \{0, 1\}$: $y_i = 0 \Rightarrow$ BLUE and $y_i = 1 \Rightarrow$ ORANGE.
$$ \hat y_i = (\hat\beta^{\mathrm{RSS}})^T x_i $$

[Figure: two-class scatter plot with a linear decision boundary. Figure 2.1 from "The Elements of Statistical Learning" by Hastie et al.]
The regression coefficients $\hat\beta^{\mathrm{RSS}}$ can be used to define a decision boundary:
$$ \{ x : (\hat\beta^{\mathrm{RSS}})^T x = 0.5 \} $$
With $0.5$ as the decision threshold,
$$ \widehat{\text{Class}} = \begin{cases} \text{ORANGE}, & \text{if } (\hat\beta^{\mathrm{RSS}})^T x > 0.5 \\ \text{BLUE}, & \text{if } (\hat\beta^{\mathrm{RSS}})^T x \le 0.5 \end{cases} $$
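A minimal sketch of this classifier on synthetic two-class data; the Gaussian blobs below stand in for the BLUE/ORANGE points of the figure:

```python
import numpy as np

rng = np.random.default_rng(5)
n = 100
# Two Gaussian blobs: class 0 (BLUE) and class 1 (ORANGE)
X0 = rng.normal(loc=[-1, -1], scale=1.0, size=(n, 2))
X1 = rng.normal(loc=[+1, +1], scale=1.0, size=(n, 2))
X = np.vstack([X0, X1])
y = np.concatenate([np.zeros(n), np.ones(n)])

Xb = np.hstack([np.ones((2 * n, 1)), X])       # add bias column
beta, *_ = np.linalg.lstsq(Xb, y, rcond=None)  # regress 0/1 labels on features

y_hat = Xb @ beta
pred = np.where(y_hat > 0.5, 1, 0)             # threshold at 0.5
print("training accuracy:", (pred == y).mean())
```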
Non-linear relationships with linear regression

[Figure: the least squares regression plane for Sales as a function of TV and Radio ad budgets. Figure 3.5 from "An Introduction to Statistical Learning" by James et al.]

New model: let $X_3 = X_1 \times X_2$ (a crossed/interaction feature). Then
$$ Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \epsilon $$
The model is non-linear in the inputs $(X_1, X_2)$, but it is still linear in the parameters, so least squares applies unchanged.
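A sketch of fitting this interaction model by least squares; the budgets and coefficients below are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(6)
n = 200
x1 = rng.uniform(0, 300, size=n)              # e.g., TV budget (made up)
x2 = rng.uniform(0, 50, size=n)               # e.g., Radio budget (made up)
y = 3 + 0.02 * x1 + 0.1 * x2 + 0.001 * x1 * x2 + rng.normal(0, 1, size=n)

# Crossed feature X3 = X1 * X2; the model stays linear in the parameters
X = np.column_stack([np.ones(n), x1, x2, x1 * x2])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta)   # ~ [3, 0.02, 0.1, 0.001]
```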
Polynomial regression

Model:
$$ Y = \beta_0 + \beta_1 X + \beta_2 X^2 + \ldots + \beta_p X^p + \epsilon $$
$D := \{(x_i, y_i)\}_{i=1}^{n}$, $x_i \in \mathbb{R}$

$$ \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix} = \underbrace{\begin{bmatrix} 1 & x_1 & x_1^2 & \ldots & x_1^p \\ 1 & x_2 & x_2^2 & \ldots & x_2^p \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & x_n & x_n^2 & \ldots & x_n^p \end{bmatrix}}_{\text{Vandermonde matrix } \mathbf{X}} \begin{bmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_p \end{bmatrix} + \begin{bmatrix} \epsilon_1 \\ \epsilon_2 \\ \vdots \\ \epsilon_n \end{bmatrix} $$

Least squares solution: $\hat\beta = (\mathbf{X}^T \mathbf{X})^{-1} \mathbf{X}^T y$
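A sketch using NumPy's Vandermonde helper (the degree, data, and coefficients are illustrative):

```python
import numpy as np

rng = np.random.default_rng(7)
n, p = 80, 3
x = rng.uniform(-2, 2, size=n)
y = 1 - x + 0.5 * x**3 + rng.normal(0, 0.3, size=n)   # made-up cubic target

# Columns [1, x, x^2, ..., x^p], matching the matrix above
X = np.vander(x, N=p + 1, increasing=True)
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta)   # ~ [1, -1, 0, 0.5]
```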
Example

What degree $p$ should we choose? It is a hyperparameter.

[Figure: Miles per gallon vs. Horsepower with linear, degree-2, and degree-5 polynomial fits. Figure 3.8 from "An Introduction to Statistical Learning" by James et al.]
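One common recipe, not developed on this slide, is to hold out a validation set and pick the degree with the smallest validation error; a minimal sketch under that assumption, with a made-up curve standing in for the mpg data:

```python
import numpy as np

rng = np.random.default_rng(8)
n = 200
x = rng.uniform(20, 230, size=n)                              # stand-in for horsepower
y = 50 - 0.3 * x + 0.0006 * x**2 + rng.normal(0, 2, size=n)   # made-up curve

xs = (x - x.min()) / (x.max() - x.min())   # rescale to [0, 1] for numerical stability
idx = rng.permutation(n)
train, val = idx[:150], idx[150:]

for p in (1, 2, 5):
    X = np.vander(xs, N=p + 1, increasing=True)
    beta, *_ = np.linalg.lstsq(X[train], y[train], rcond=None)
    val_mse = np.mean((y[val] - X[val] @ beta) ** 2)
    print(f"degree {p}: validation MSE {val_mse:.3f}")
```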
Linear basis function models

$$ y = \sum_{j} \beta_j \phi_j(x) $$
where each $\phi_j : \mathbb{R} \to \mathbb{R}$ is a basis function.

Polynomial regression: $\phi_j(x) = x^j$
Gaussian basis functions: $\phi_j(x) = \exp\left( -\frac{(x - \mu_j)^2}{2s^2} \right)$, with the centers $\mu_j$ and the width $s$ as hyperparameters
Sigmoid basis functions: $\phi_j(x) = \sigma\left( \frac{x - \mu_j}{s} \right)$, where $\sigma(a) = \frac{1}{1 + e^{-a}}$
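A sketch of least squares with Gaussian basis features; the centers $\mu_j$ and width $s$ below are hand-picked hyperparameters, purely for illustration:

```python
import numpy as np

def gaussian_features(x, centers, s):
    """phi_j(x) = exp(-(x - mu_j)^2 / (2 s^2)), plus a constant column."""
    phi = np.exp(-(x[:, None] - centers[None, :]) ** 2 / (2 * s**2))
    return np.hstack([np.ones((len(x), 1)), phi])

rng = np.random.default_rng(9)
x = rng.uniform(0, 1, size=100)
y = np.sin(2 * np.pi * x) + rng.normal(0, 0.1, size=100)   # toy target

centers = np.linspace(0, 1, 9)   # mu_j: evenly spaced (a hand-picked choice)
s = 0.1                          # shared width (also hand-picked)
Phi = gaussian_features(x, centers, s)
beta, *_ = np.linalg.lstsq(Phi, y, rcond=None)
print("fitted", len(beta), "coefficients")   # bias + one per basis function
```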