
Chapter 3
Linear Regression

EEE 485/585 Statistical Learning and Data Analytics

Outline:
Regression
Simple linear regression
(Multiple) linear regression
Classification with linear regression
Non-linear relationships with linear regression

Cem Tekin
Bilkent University

Cannot be distributed outside this class without the permission of the instructor.
"All models are wrong, but some are useful." - George E. Box, 1978
Regression

Model:
$Y \approx f(X, \beta^{\text{true}})$
$X = [X_1, X_2, \ldots, X_p]^T$: independent variable vector
$Y$: dependent variable
$\beta^{\text{true}}$: unknown parameter vector
The form of $f$ is assumed to be known.
Usually $E[Y \mid X] = f(X, \beta^{\text{true}})$

Some applications:
Stock market prediction given macroeconomic variables
Customer satisfaction vs. waiting time
Smart grid - load prediction
Auto car sales prediction
Simple linear regression

Model:
$Y$: response variable, $X$: predictor variable
$Y \approx \beta_0^{\text{true}} + \beta_1^{\text{true}} X$

Example:
$\text{sales} \approx \underbrace{\beta_0^{\text{true}}}_{\text{bias}} + \underbrace{\beta_1^{\text{true}}}_{\text{slope}} \times \text{TV ad budget}$

How to estimate $\beta_0^{\text{true}}$ and $\beta_1^{\text{true}}$?

[Figure: scatter plot of Sales vs. TV ad budget]

Figure 3.1 from "An introduction to statistical learning" by James et al.
The least squares approach

Data: $D = \{(x_i, y_i)\}_{i=1}^n$ (in the ad example $n = 200$)

RSS: Residual sum of squares

$RSS(\beta_0, \beta_1) = \sum_{i=1}^n \big( \underbrace{y_i - \overbrace{(\beta_0 + \beta_1 x_i)}^{\hat{y}_i}}_{\text{residual}} \big)^2$

Least squares method:

$(\hat\beta_0, \hat\beta_1) := \arg\min_{(\beta_0, \beta_1) \in \mathbb{R}^2} RSS(\beta_0, \beta_1)$

Least squares solution:

$\hat\beta_1 = \frac{\sum_{i=1}^n (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^n (x_i - \bar{x})^2}$ (the slope after centering the predictor and the response)

$\hat\beta_0 = \bar{y} - \hat\beta_1 \bar{x}$

where $\bar{y} = \frac{1}{n} \sum_{i=1}^n y_i$ and $\bar{x} = \frac{1}{n} \sum_{i=1}^n x_i$
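As a concrete illustration (not from the slides), here is a minimal NumPy sketch of these closed-form estimates; the function and variable names are my own:

```python
import numpy as np

def simple_least_squares(x, y):
    """Closed-form least squares estimates for Y ~ beta0 + beta1 * X."""
    x_bar, y_bar = x.mean(), y.mean()
    # Slope: sum((x_i - x_bar)(y_i - y_bar)) / sum((x_i - x_bar)^2)
    beta1_hat = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)
    # Intercept: y_bar - beta1_hat * x_bar
    beta0_hat = y_bar - beta1_hat * x_bar
    return beta0_hat, beta1_hat
```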
The least squares approach - Example

[Figure: Sales vs. TV scatter with the least squares line; contour and surface plots of $RSS(\beta_0, \beta_1)$ over the $(\beta_0, \beta_1)$ plane, minimized at the least squares pair]

$\hat\beta_0 = 7.03$ and $\hat\beta_1 = 0.0475$, so $\hat{Y} = 7.03 + 0.0475 X$.

Figures 3.1 and 3.2 from "An introduction to statistical learning" by James et al.
How accurate is $(\hat\beta_0, \hat\beta_1)$?

True model: $Y = \beta_0^{\text{true}} + \beta_1^{\text{true}} X + \epsilon$
$\epsilon$: zero mean random variable

Example:
$Y = 2 + 3X + \epsilon$ (unknown true model), $\{x_i\}_{i=1}^{100}$ given
Generate: 100 random $\epsilon_i$ values from $\mathcal{N}(0, 1)$ (independently),
$y_i = 2 + 3 x_i + \epsilon_i$
Data: $D = \{(x_i, y_i)\}_{i=1}^{100}$ (only this is known)

[Figure: two scatter plots of Y vs. X with fitted lines; the red line is the true (unknown) model]

Figure 3.3 from "An introduction to statistical learning" by James et al.
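A sketch of this synthetic experiment in NumPy, reusing simple_least_squares from above (the uniform choice of the given $x_i$'s is my assumption; the slide only says they are given):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100
x = rng.uniform(-2, 2, size=n)      # the given {x_i} (assumed uniform here)
eps = rng.normal(0, 1, size=n)      # 100 independent draws from N(0, 1)
y = 2 + 3 * x + eps                 # y_i = 2 + 3 x_i + eps_i

beta0_hat, beta1_hat = simple_least_squares(x, y)
print(beta0_hat, beta1_hat)         # should be close to (2, 3)
```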
Properties of $\hat\beta_0$ and $\hat\beta_1$

Unbiasedness:
$E[\hat\beta_0] = \beta_0^{\text{true}}$
$E[\hat\beta_1] = \beta_1^{\text{true}}$

Variance:
$\sigma^2 = Var(\epsilon)$, and the $\epsilon_i$'s are uncorrelated
$Var(\hat\beta_0) = \sigma^2 \left[ \frac{1}{n} + \frac{\bar{x}^2}{\sum_{i=1}^n (x_i - \bar{x})^2} \right]$
$Var(\hat\beta_1) = \frac{\sigma^2}{\sum_{i=1}^n (x_i - \bar{x})^2}$

(Note that $\frac{1}{n} \sum_{i=1}^n (x_i - \bar{x})^2$ is the sample variance of the $x_i$'s.)
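A quick Monte Carlo check of these formulas (a sketch, not from the slides; it repeats the synthetic experiment many times with the design held fixed):

```python
import numpy as np

rng = np.random.default_rng(1)
n, sigma, trials = 100, 1.0, 10_000
x = rng.uniform(-2, 2, size=n)      # fixed design across trials
b1_hats = np.empty(trials)
for t in range(trials):
    y = 2 + 3 * x + rng.normal(0, sigma, size=n)
    b1_hats[t] = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)

print(b1_hats.mean())                         # approx 3: unbiasedness
print(b1_hats.var())                          # empirical variance, approx ...
print(sigma**2 / np.sum((x - x.mean())**2))   # ... the formula for Var(beta1_hat)
```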
Confidence intervals for $\hat\beta_0$ and $\hat\beta_1$

95% confidence interval for $\beta_0^{\text{true}}$:
$\left[ \hat\beta_0 - 2\sqrt{Var(\hat\beta_0)},\; \hat\beta_0 + 2\sqrt{Var(\hat\beta_0)} \right]$

95% confidence interval for $\beta_1^{\text{true}}$:
$\left[ \hat\beta_1 - 2\sqrt{Var(\hat\beta_1)},\; \hat\beta_1 + 2\sqrt{Var(\hat\beta_1)} \right]$

Example:
[Figure: Sales vs. TV scatter with the least squares line]

95% confidence interval for $\beta_0^{\text{true}}$: [6.130, 7.935]
95% confidence interval for $\beta_1^{\text{true}}$: [0.042, 0.053]
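A sketch of these intervals in code. One caveat: $\sigma^2$ is unknown in practice; replacing it with the residual-based estimate $RSS/(n-2)$ is a standard choice but an assumption beyond this slide:

```python
import numpy as np

def confidence_intervals(x, y, beta0_hat, beta1_hat):
    n = len(x)
    resid = y - (beta0_hat + beta1_hat * x)
    sigma2_hat = np.sum(resid ** 2) / (n - 2)   # plug-in estimate of Var(eps) (assumption)
    sxx = np.sum((x - x.mean()) ** 2)
    var_b0 = sigma2_hat * (1.0 / n + x.mean() ** 2 / sxx)
    var_b1 = sigma2_hat / sxx
    ci_b0 = (beta0_hat - 2 * np.sqrt(var_b0), beta0_hat + 2 * np.sqrt(var_b0))
    ci_b1 = (beta1_hat - 2 * np.sqrt(var_b1), beta1_hat + 2 * np.sqrt(var_b1))
    return ci_b0, ci_b1
```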
(Multiple) linear regression

Model:
$Y$: response variable, $X = [X_1, \ldots, X_p]^T$: predictor variable vector
$Y \approx \beta_0^{\text{true}} + \beta_1^{\text{true}} X_1 + \ldots + \beta_p^{\text{true}} X_p$

Example:
$\text{sales} \approx \underbrace{\beta_0^{\text{true}}}_{\text{bias}} + \beta_1^{\text{true}} \times \text{TV} + \beta_2^{\text{true}} \times \text{radio} + \beta_3^{\text{true}} \times \text{newspaper}$

How to estimate $\beta^{\text{true}} = [\beta_0^{\text{true}}, \beta_1^{\text{true}}, \ldots, \beta_p^{\text{true}}]^T$?

[Figure: three scatter plots of Sales vs. TV, Radio, and Newspaper budgets]

Figure 2.1 from "An introduction to statistical learning" by James et al.
(Multiple) linear regression

Given $[X_1, X_2, \ldots, X_p]$, predict $Y$ via a linear model:

$\hat{Y} = \underbrace{\hat\beta_0}_{\text{bias}} + \sum_{j=1}^p \hat\beta_j X_j$

($p + 1$ parameters need to be learned.)

Let $X = [1, X_1, \ldots, X_p]^T$ and $\hat\beta = [\hat\beta_0, \hat\beta_1, \ldots, \hat\beta_p]^T$. Then,

$\hat{Y} = X^T \hat\beta = \hat\beta^T X$
The least squares approach

Data: $D = \{(\boldsymbol{x}_i, y_i)\}_{i=1}^n$
$\boldsymbol{x}_i = [1, x_{i1}, x_{i2}, \ldots, x_{ip}]^T$

RSS: Residual sum of squares

$RSS(\beta) = \sum_{i=1}^n (y_i - \boldsymbol{x}_i^T \beta)^2$

Least squares method:

$\hat\beta_{RSS} = \arg\min_{\beta \in \mathbb{R}^{p+1}} RSS(\beta)$
The least squares solution

Some notation:

$\boldsymbol{y} = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}, \quad \hat{\boldsymbol{y}} = \begin{bmatrix} \hat{y}_1 \\ \hat{y}_2 \\ \vdots \\ \hat{y}_n \end{bmatrix}, \quad \mathbf{X} = \begin{bmatrix} 1 & x_{11} & x_{12} & \ldots & x_{1p} \\ 1 & x_{21} & x_{22} & \ldots & x_{2p} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & x_{n1} & x_{n2} & \ldots & x_{np} \end{bmatrix}$

Least squares solution:

$\hat\beta_{RSS} = \arg\min_{\beta \in \mathbb{R}^{p+1}} RSS(\beta) = \arg\min_{\beta \in \mathbb{R}^{p+1}} \sum_{i=1}^n (y_i - \boldsymbol{x}_i^T \beta)^2$

$\hat\beta_{RSS} = (\mathbf{X}^T \mathbf{X})^{-1} \mathbf{X}^T \boldsymbol{y}$

Assumption: $\mathbf{X}$ has full column rank!

Then $\hat{\boldsymbol{y}} = \mathbf{X} \hat\beta_{RSS} = \underbrace{\mathbf{X} (\mathbf{X}^T \mathbf{X})^{-1} \mathbf{X}^T}_{\text{projection matrix } \mathbf{H}} \boldsymbol{y}$
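A NumPy sketch of this solution (solving the normal equations with np.linalg.solve rather than forming the explicit inverse, which is the numerically preferable route):

```python
import numpy as np

def least_squares(X, y):
    """beta_hat = (X^T X)^{-1} X^T y, assuming X has full column rank."""
    return np.linalg.solve(X.T @ X, X.T @ y)

# Usage sketch: prepend a column of ones for the bias term.
# X = np.column_stack([np.ones(len(y)), X_raw])   # X_raw: n x p predictors
# beta_hat = least_squares(X, y)
# y_hat = X @ beta_hat    # equals H @ y with H = X (X^T X)^{-1} X^T
```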
How to get the least squares solution?

Rules of matrix differentiation:
Let $g = [g_1, g_2, \ldots, g_n]^T$ and $f: \mathbb{R}^n \to \mathbb{R}^m$
Let $h = f(g)$, where $h = [h_1, h_2, \ldots, h_m]^T$

$\frac{\partial h}{\partial g} = \begin{bmatrix} \frac{\partial h_1}{\partial g_1} & \frac{\partial h_1}{\partial g_2} & \ldots & \frac{\partial h_1}{\partial g_n} \\ \frac{\partial h_2}{\partial g_1} & \frac{\partial h_2}{\partial g_2} & \ldots & \frac{\partial h_2}{\partial g_n} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial h_m}{\partial g_1} & \frac{\partial h_m}{\partial g_2} & \ldots & \frac{\partial h_m}{\partial g_n} \end{bmatrix}$ (Jacobian matrix)

1. $h = Ag \Rightarrow \frac{\partial h}{\partial g} = A$
2. $\alpha = y^T A g$ where $y: m \times 1$, $g: n \times 1$ and $A: m \times n$
   $\Rightarrow \frac{\partial \alpha}{\partial g} = y^T A, \quad \frac{\partial \alpha}{\partial y} = g^T A^T$
3. $\alpha = y^T A y$ where $y: m \times 1$ and $A: m \times m$
   $\Rightarrow \frac{\partial \alpha}{\partial y} = y^T (A + A^T)$
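These rules are easy to sanity-check numerically; a sketch using finite differences for rule 2 (the dimensions are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(2)
m, n, h = 3, 4, 1e-6
A = rng.normal(size=(m, n))
yv = rng.normal(size=m)
g = rng.normal(size=n)

alpha = lambda g: yv @ A @ g     # alpha = y^T A g (scalar)
# Finite-difference gradient of alpha with respect to g
grad_num = np.array([(alpha(g + h * e) - alpha(g)) / h for e in np.eye(n)])
print(np.allclose(grad_num, yv @ A, atol=1e-4))   # rule 2: d(alpha)/dg = y^T A
```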
Example

Linear regression fit to sales using TV and radio as predictors.

[Figure: least squares plane fit to Sales over the (TV, Radio) plane]

Figure 3.5 from "An introduction to statistical learning" by James et al.
Linear algebra interpretation of the solution

$\hat{\boldsymbol{y}}$ is the orthogonal projection of $\boldsymbol{y}$ onto the subspace spanned by the column vectors:

$\boldsymbol{w}_0 = \begin{bmatrix} 1 \\ 1 \\ \vdots \\ 1 \end{bmatrix}, \quad \boldsymbol{w}_1 = \begin{bmatrix} x_{11} \\ x_{21} \\ \vdots \\ x_{n1} \end{bmatrix}, \quad \ldots, \quad \boldsymbol{w}_p = \begin{bmatrix} x_{1p} \\ x_{2p} \\ \vdots \\ x_{np} \end{bmatrix}$

[Figure: $\boldsymbol{y}$ projected onto the plane spanned by $\boldsymbol{w}_1$ and $\boldsymbol{w}_2$; the error vector $\boldsymbol{y} - \hat{\boldsymbol{y}}$ is orthogonal to the subspace]
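The orthogonality is easy to verify numerically: the residual $\boldsymbol{y} - \hat{\boldsymbol{y}}$ must be orthogonal to every column of $\mathbf{X}$. A sketch on random data:

```python
import numpy as np

rng = np.random.default_rng(3)
n, p = 50, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])
y = rng.normal(size=n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
resid = y - X @ beta_hat
print(np.allclose(X.T @ resid, 0, atol=1e-8))   # residual orthogonal to span{w_0, ..., w_p}
```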
Gauss-Markov Theorem

Assumptions:
$y_i = (\beta^{\text{true}})^T \boldsymbol{x}_i + \epsilon_i$, $i = 1, \ldots, n$
($\beta^{\text{true}}$ fixed & not observed, $\boldsymbol{x}_i$ fixed & observed, $\epsilon_i$: noise, random, not observed)
$E[\epsilon_i] = 0$, $Var(\epsilon_i) = \sigma^2$, $Cov(\epsilon_i, \epsilon_j) = 0$ for $i \neq j$

Linear estimator of $\beta_j^{\text{true}}$:
$\hat\beta_j = c_{1j} y_1 + \cdots + c_{nj} y_n$
$c_{kj}$: possibly non-linear function of $\mathbf{X}$

Theorem: The least squares estimate of $\beta^{\text{true}}$ has the smallest variance among all linear unbiased estimators.
(Unbiased estimator: $E[\hat\beta] = \beta^{\text{true}}$.)
"Best linear unbiased estimator" (BLUE) = $\hat\beta_{RSS}$
Probabilistic interpretation of the solution

$y_i = (\beta^{\text{true}})^T \boldsymbol{x}_i + \epsilon_i$ ($\beta^{\text{true}}$ unknown)
i.i.d. $\epsilon_i \sim \mathcal{N}(0, \sigma^2)$
We have access to $D := \{(\boldsymbol{x}_i, y_i)\}_{i=1}^n$
Let's compute the MLE of $\beta$.

Likelihood (under the Gaussian noise model):

$L(\beta) = \prod_{i=1}^n \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left( -\frac{(y_i - \beta^T \boldsymbol{x}_i)^2}{2\sigma^2} \right)$

Log likelihood:

$l(\beta) := \log L(\beta) = -n \log \sqrt{2\pi\sigma^2} - \frac{1}{2\sigma^2} \sum_{i=1}^n (y_i - \beta^T \boldsymbol{x}_i)^2$
Probabilistic interpretation of the solution

Maximizing $L(\beta)$ is equivalent to maximizing $l(\beta)$, since $l(\beta)$ is a strictly increasing function of $L(\beta)$!

$\arg\max_\beta l(\beta) = \arg\min_\beta RSS(\beta) \implies \hat\beta_{MLE} = \hat\beta_{RSS}$

Note: the MLE is independent of $\sigma^2$.
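A numerical illustration of this equivalence (a sketch, not from the slides): minimize the negative log likelihood directly and compare against the normal-equations solution.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(4)
n, sigma2 = 200, 0.5
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(0, np.sqrt(sigma2), size=n)

# Negative log likelihood under Gaussian noise (additive constants dropped)
nll = lambda b: np.sum((y - X @ b) ** 2) / (2 * sigma2)
beta_mle = minimize(nll, x0=np.zeros(3)).x
beta_rss = np.linalg.solve(X.T @ X, X.T @ y)
print(np.allclose(beta_mle, beta_rss, atol=1e-4))   # beta_MLE = beta_RSS
```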
Binary classification

$D = \{(\boldsymbol{x}_i, y_i)\}_{i=1}^n$
$y_i \in \{0, 1\}$. $y_i = 0 \implies$ BLUE and $y_i = 1 \implies$ ORANGE
$\hat{y}_i = \hat\beta_{RSS}^T \boldsymbol{x}_i$

The regression coefficients $\hat\beta_{RSS}$ can be used to define a decision boundary:
$\{x : \hat\beta_{RSS}^T x = 0.5\}$ (0.5 is the decision threshold)

$\widehat{\text{Class}} = \begin{cases} \text{ORANGE}, & \text{if } \hat\beta_{RSS}^T x > 0.5 \\ \text{BLUE}, & \text{if } \hat\beta_{RSS}^T x \le 0.5 \end{cases}$

Note: Fitting by RSS may not be the best approach for classification, since $\hat{y}$ is not a class probability; plain linear regression is not recommended for binary classification.

Figure 2.1 from "The elements of statistical learning" by Hastie et al.
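A sketch of this regression-as-classifier recipe (with the slide's caveat in mind; labels 0/1 stand for BLUE/ORANGE, and the function names are my own):

```python
import numpy as np

def fit_linear_classifier(X, y01):
    """Least squares fit to 0/1 labels; X must include a leading column of ones."""
    return np.linalg.solve(X.T @ X, X.T @ y01)

def predict_class(X, beta_hat, threshold=0.5):
    # ORANGE (1) if beta_hat^T x > 0.5, BLUE (0) otherwise
    return (X @ beta_hat > threshold).astype(int)
```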
Non-additive models with linear regression

Limitation of linear regression: the effects of the variables are additive (independent of each other) and the effect of each variable is linear.

[Figure: least squares plane fit to Sales over the (TV, Radio) plane]

New model:
Let $X_3 = X_1 \times X_2$ (an interaction feature)

$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \epsilon$

Figure 3.5 from "An introduction to statistical learning" by James et al.
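Adding the interaction term is just one more column in the design matrix; a minimal sketch (the function name is my own):

```python
import numpy as np

def interaction_design(x1, x2):
    """Design matrix [1, X1, X2, X1*X2] for the interaction model."""
    return np.column_stack([np.ones(len(x1)), x1, x2, x1 * x2])

# Fit as before: D = interaction_design(tv, radio)
# beta_hat = np.linalg.solve(D.T @ D, D.T @ y)
```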
Polynomial regression

Model:
$Y = \beta_0 + \beta_1 X + \beta_2 X^2 + \ldots + \beta_p X^p + \epsilon$

$D := \{(x_i, y_i)\}_{i=1}^n$, $x_i \in \mathbb{R}$

$\underbrace{\begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}}_{\boldsymbol{y}} = \underbrace{\begin{bmatrix} 1 & x_1 & x_1^2 & \ldots & x_1^p \\ 1 & x_2 & x_2^2 & \ldots & x_2^p \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & x_n & x_n^2 & \ldots & x_n^p \end{bmatrix}}_{\mathbf{X} \text{ (Vandermonde matrix)}} \begin{bmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_p \end{bmatrix} + \begin{bmatrix} \epsilon_1 \\ \epsilon_2 \\ \vdots \\ \epsilon_n \end{bmatrix}$

Least squares solution: $\hat\beta = (\mathbf{X}^T \mathbf{X})^{-1} \mathbf{X}^T \boldsymbol{y}$

The solution is unique as long as at least $p + 1$ of the $x_i$'s are distinct!
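In NumPy, np.vander builds this matrix directly; a sketch (increasing=True orders the columns as $1, x, x^2, \ldots, x^p$, matching the slide):

```python
import numpy as np

def polyfit_ls(x, y, p):
    """Degree-p polynomial regression via the Vandermonde matrix."""
    V = np.vander(x, N=p + 1, increasing=True)   # columns: 1, x, x^2, ..., x^p
    return np.linalg.solve(V.T @ V, V.T @ y)     # needs at least p+1 distinct x_i
```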
Example

How should the degree $p$ be chosen? $p$ is a hyperparameter.

[Figure: Miles per gallon vs. Horsepower with linear, degree-2, and degree-5 polynomial fits]

Figure 3.8 from "An introduction to statistical learning" by James et al.
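The slide does not spell out how to choose $p$; one standard option (my suggestion, not from the slides) is to compare validation error across degrees. A sketch reusing polyfit_ls from above:

```python
import numpy as np

def choose_degree(x_train, y_train, x_val, y_val, max_p=10):
    """Pick the polynomial degree with the smallest validation RSS."""
    best_p, best_err = None, np.inf
    for p in range(1, max_p + 1):
        beta = polyfit_ls(x_train, y_train, p)
        y_pred = np.vander(x_val, N=p + 1, increasing=True) @ beta
        err = np.sum((y_val - y_pred) ** 2)
        if err < best_err:
            best_p, best_err = p, err
    return best_p
```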
Linear basis function models

$y = \sum_j \beta_j \phi_j(x)$, where each $\phi_j$ is a basis function

What basis functions can be used?
Polynomial regression: $\phi_j(x) = x^j$
Gaussian basis function: $\phi_j(x) = \exp\left( -\frac{(x - \mu_j)^2}{2 s^2} \right)$ ($\mu_j$ and $s$ are hyperparameters)
Sigmoid basis function: $\phi_j(x) = \sigma\left( \frac{x - \mu_j}{s} \right)$, where $\sigma(a) = \frac{1}{1 + e^{-a}}$
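A sketch of Gaussian and sigmoid basis features (the centers $\mu_j$ and width $s$ are hyperparameters; any concrete grid of centers would be an arbitrary choice):

```python
import numpy as np

def gaussian_features(x, centers, s):
    """phi_j(x) = exp(-(x - mu_j)^2 / (2 s^2)) for each center mu_j."""
    return np.exp(-(x[:, None] - centers[None, :]) ** 2 / (2 * s ** 2))

def sigmoid_features(x, centers, s):
    """phi_j(x) = sigma((x - mu_j) / s) with the logistic sigmoid."""
    return 1.0 / (1.0 + np.exp(-(x[:, None] - centers[None, :]) / s))

# Fit as before, with a bias column:
# Phi = np.column_stack([np.ones(len(x)), gaussian_features(x, centers, s)])
# beta_hat = np.linalg.solve(Phi.T @ Phi, Phi.T @ y)
```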
