Lecture 9: Linear Regression
Goals
• Develop basic concepts of linear regression from
a probabilistic framework
• Estimating parameters and hypothesis testing
with linear models
• Linear regression in R
Regression
• Technique used for the modeling and analysis of
numerical data
• Exploits the relationship between two or more
variables so that we can gain information about one of
them through knowing values of the other
• Regression can be used for prediction, estimation,
hypothesis testing, and modeling causal relationships
Regression Lingo
Y = X1 + X2 + X3

Y:                    X1, X2, X3:
Dependent Variable    Independent Variable
Outcome Variable      Predictor Variable
Response Variable     Explanatory Variable
Why Linear Regression?
• Suppose we want to model the dependent variable Y in terms
of three predictors, X1, X2, X3
Y = f(X1, X2, X3)
• Typically we will not have enough data to estimate f directly
• Therefore, we usually have to assume that it has some
restricted form, such as linear
Y = X1 + X2 + X3
Linear Regression is a Probabilistic Model
• Much of mathematics is devoted to studying variables
that are deterministically related to one another
y = " 0 + "1 x
y "y
! "0
"x
"1 =
#y
#x
!
! x
!
!
!
• But we’re interested in understanding the relationship
between variables related in a nondeterministic fashion
A Linear Probabilistic Model
• Definition: There exist parameters $\beta_0$, $\beta_1$, and $\sigma^2$ such that for any fixed value of the independent variable x, the dependent variable is related to x through the model equation
$y = \beta_0 + \beta_1 x + \varepsilon$
• $\varepsilon$ is a random variable assumed to be $N(0, \sigma^2)$
True Regression Line
[Figure: true regression line $y = \beta_0 + \beta_1 x$, with observed points deviating from it by random amounts]
Implications
• The expected value of Y is a linear function of X, but for fixed
x, the variable Y differs from its expected value by a random
amount
• Formally, let x* denote a particular value of the independent
variable x, then our linear probabilistic model says:
$E(Y \mid x^*) = \mu_{Y|x^*}$ = mean value of Y when x is $x^*$
$V(Y \mid x^*) = \sigma^2_{Y|x^*}$ = variance of Y when x is $x^*$
Graphical Interpretation
[Figure: regression line $y = \beta_0 + \beta_1 x$, with conditional means $\mu_{Y|x_1} = \beta_0 + \beta_1 x_1$ and $\mu_{Y|x_2} = \beta_0 + \beta_1 x_2$ marked at $x_1$ and $x_2$]
• For example, if x = height and y = weight, then $\mu_{Y|x=60}$ is the average weight for all individuals 60 inches tall in the population
One More Example
Suppose the relationship between the independent variable height
(x) and dependent variable weight (y) is described by a simple
linear regression model with true regression line
$y = 7.5 + 0.5x$ and $\sigma = 3$
• Q1: What is the interpretation of $\beta_1 = 0.5$?
The expected change in weight associated with a 1-unit increase in height
• Q2: If x = 20, what is the expected value of Y?
$\mu_{Y|x=20} = 7.5 + 0.5(20) = 17.5$
• Q3: If x = 20, what is P(Y > 22)?
$P(Y > 22 \mid x = 20) = P\left(Z > \frac{22 - 17.5}{3}\right) = 1 - \Phi(1.5) = 0.067$
Estimating Model Parameters
• Point estimates $\hat{\beta}_0$ and $\hat{\beta}_1$ are obtained by the principle of least squares, which minimizes the sum of squared deviations
$f(\beta_0, \beta_1) = \sum_{i=1}^{n} [y_i - (\beta_0 + \beta_1 x_i)]^2$
[Figure: vertical deviations of the observed points from a candidate line with intercept $\beta_0$]
• The resulting estimators are
$\hat{\beta}_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}$
$\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$
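A minimal sketch of these estimates in R, using small illustrative vectors x and y (not data from the lecture):

```r
# Least-squares estimates by hand and via lm()
x <- c(1, 2, 3, 4, 5)
y <- c(2.1, 3.9, 6.2, 8.1, 9.8)

b1_hat <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
b0_hat <- mean(y) - b1_hat * mean(x)

fit <- lm(y ~ x)   # gives the same estimates
coef(fit)
```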
Predicted and Residual Values
• Predicted, or fitted, values are values of y predicted by the least-squares regression line, obtained by plugging $x_1, x_2, \ldots, x_n$ into the estimated regression line
$\hat{y}_1 = \hat{\beta}_0 + \hat{\beta}_1 x_1$
$\hat{y}_2 = \hat{\beta}_0 + \hat{\beta}_1 x_2$
• Residuals are the deviations of the observed values from the predicted values
$e_1 = y_1 - \hat{y}_1$
$e_2 = y_2 - \hat{y}_2$
[Figure: residuals $e_1, e_2, e_3$ shown as vertical distances between observed $y_i$ and fitted $\hat{y}_i$]
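In R, fitted values and residuals come directly from the fitted model object; a short sketch, assuming the fit from the previous example:

```r
# Fitted values and residuals from the least-squares fit
# (assumes fit <- lm(y ~ x) as in the previous sketch)
y_hat <- fitted(fit)       # y_hat_i = b0_hat + b1_hat * x_i
e     <- residuals(fit)    # e_i = y_i - y_hat_i
cbind(y, y_hat, e)         # observed, fitted, and residual for each point
```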
Residuals Are Useful!
• They allow us to calculate the error sum of squares (SSE):
$SSE = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$
• Which in turn allows us to estimate $\sigma^2$:
$\hat{\sigma}^2 = \frac{SSE}{n - 2}$
• As well as an important statistic referred to as the coefficient of determination:
$r^2 = 1 - \frac{SSE}{SST}$, where $SST = \sum_{i=1}^{n} (y_i - \bar{y})^2$
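These quantities are easy to compute in R; a sketch continuing the simple regression example above:

```r
# SSE, estimate of sigma^2, and r^2 (continuing the earlier sketch)
e          <- residuals(fit)
SSE        <- sum(e^2)
sigma2_hat <- SSE / (length(y) - 2)     # compare with summary(fit)$sigma^2
SST        <- sum((y - mean(y))^2)
r2         <- 1 - SSE / SST             # compare with summary(fit)$r.squared
```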
Multiple Linear Regression
• Extension of the simple linear regression model to two or
more independent variables
y = " 0 + "1 x1 + " 2 x 2 + ...+ " n x n + #
Expression = Baseline + Age + Tissue + Sex + Error
!
• Partial Regression Coefficients: βi ≡ effect on the
dependent variable when increasing the ith independent
variable by 1 unit, holding all other predictors
constant
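A minimal sketch of such a fit in R for the expression example (the data frame `dat` and its columns are hypothetical):

```r
# Multiple linear regression: Expression ~ Age + Tissue + Sex
# (hypothetical data frame `dat` with these columns)
fit <- lm(Expression ~ Age + Tissue + Sex, data = dat)
summary(fit)   # partial regression coefficients, standard errors, t-tests
```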
Categorical Independent Variables
• Qualitative variables are easily incorporated in regression
framework through dummy variables
• Simple example: sex can be coded as 0/1
• What if my categorical variable contains three levels:
$x_i = \begin{cases} 0 & \text{if AA} \\ 1 & \text{if AG} \\ 2 & \text{if GG} \end{cases}$
Categorical Independent Variables
• The previous coding forces an ordering and equal spacing on the three genotypes rather than treating them as distinct categories
• The solution is to set up a series of dummy variables; in general, for k levels you need k − 1 dummy variables
$x_1 = \begin{cases} 1 & \text{if AA} \\ 0 & \text{otherwise} \end{cases}$
$x_2 = \begin{cases} 1 & \text{if AG} \\ 0 & \text{otherwise} \end{cases}$

        x1   x2
AA       1    0
AG       0    1
GG       0    0
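In R, this coding is handled automatically by factors; a short sketch with a hypothetical genotype vector:

```r
# Dummy (treatment) coding of a three-level genotype in R
geno <- factor(c("AA", "AG", "GG", "AG", "AA"), levels = c("GG", "AG", "AA"))

model.matrix(~ geno)    # shows the k - 1 dummy columns R builds automatically
# In a regression, just include the factor and R handles the coding:
# fit <- lm(trait ~ geno)
```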
Hypothesis Testing: Model Utility Test (or
Omnibus Test)
• The first thing we want to know after fitting a model is whether
any of the independent variables (X’s) are significantly related to
the dependent variable (Y):
$H_0: \beta_1 = \beta_2 = \ldots = \beta_k = 0$
$H_A: \text{at least one } \beta_i \neq 0$
• Test statistic:
$f = \frac{R^2 / k}{(1 - R^2) / [n - (k + 1)]}$
Rejection region: $F_{\alpha, k, n-(k+1)}$
Equivalent ANOVA Formulation of Omnibus Test
• We can also frame this in our now familiar ANOVA framework
- partition total variation into two components: SSE (unexplained
variation) and SSR (variation explained by linear model)
Source of Variation | df | Sum of Squares | MS | F
Regression | $k$ | $SSR = \sum (\hat{y}_i - \bar{y})^2$ | $MSR = SSR / k$ | $MSR / MSE$
Error | $n - (k+1)$ | $SSE = \sum (y_i - \hat{y}_i)^2$ | $MSE = SSE / [n - (k+1)]$ |
Total | $n - 1$ | $SST = \sum (y_i - \bar{y})^2$ | |

Rejection region: $F_{\alpha, k, n-(k+1)}$
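In R, the omnibus F test is reported by summary(); a minimal sketch with a hypothetical data frame `dat`:

```r
# Omnibus (model utility) test
# (hypothetical data frame `dat` with response y and predictors x1, x2, x3)
fit <- lm(y ~ x1 + x2 + x3, data = dat)
summary(fit)    # last line reports the F statistic, its df, and the p-value

# Equivalent comparison against the intercept-only model
null <- lm(y ~ 1, data = dat)
anova(null, fit)
```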
F Test For Subsets of Independent Variables
• A powerful tool in multiple regression analyses is the ability to
compare two models
• For instance say we want to compare:
Full Model: $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_4 + \varepsilon$
Reduced Model: $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \varepsilon$
• Again, another example of ANOVA:
$f = \frac{(SSE_R - SSE_F) / (k - l)}{SSE_F / [n - (k + 1)]}$
where $SSE_R$ is the error sum of squares for the reduced model with $l$ predictors and $SSE_F$ is the error sum of squares for the full model with $k$ predictors
Example of Model Comparison
• We have a quantitative trait and want to test the effects at two
markers, M1 and M2.
Full Model: Trait = Mean + M1 + M2 + (M1*M2) + error
Reduced Model: Trait = Mean + M1 + M2 + error
(SSE R " SSE F ) /(3 " 2) (SSE R " SSE F )
f = =
SSE F /([100 " (3 + 1)] SSE F /96
Rejection Region : Fa, 1, 96
!
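This comparison is a one-line partial F test in R; a sketch assuming a hypothetical data frame `dat` with columns Trait, M1, and M2:

```r
# Partial F test comparing reduced and full models
full    <- lm(Trait ~ M1 * M2, data = dat)    # M1 + M2 + M1:M2
reduced <- lm(Trait ~ M1 + M2, data = dat)

anova(reduced, full)    # F test for the interaction term
```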
Hypothesis Tests of Individual Regression
Coefficients
• Hypothesis tests for each $\hat{\beta}_i$ can be done with simple t-tests:
$H_0: \beta_i = 0$
$H_A: \beta_i \neq 0$
$T = \frac{\hat{\beta}_i - \beta_i}{se(\hat{\beta}_i)}$
Critical value: $t_{\alpha/2, n-(k+1)}$
• Confidence intervals are equally easy to obtain:
$\hat{\beta}_i \pm t_{\alpha/2, n-(k+1)} \cdot se(\hat{\beta}_i)$
Checking Assumptions
• Critically important to examine data and check assumptions
underlying the regression model
Outliers
Normality
Constant variance
Independence among residuals
• Standard diagnostic plots include:
scatter plots of y versus xi (outliers)
qq plot of residuals (normality)
residuals versus fitted values (independence, constant variance)
residuals versus xi (outliers, constant variance)
• We’ll explore diagnostic plots in more detail in R
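As a preview, the standard diagnostic plots are built into R; a minimal sketch assuming a fitted model `fit`:

```r
# Standard diagnostic plots for a fitted model `fit`
par(mfrow = c(2, 2))
plot(fit)    # residuals vs fitted, normal QQ, scale-location, residuals vs leverage

# Or build individual plots directly:
qqnorm(resid(fit)); qqline(resid(fit))    # normality of residuals
plot(fitted(fit), resid(fit))             # constant variance
```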
Fixed -vs- Random Effects Models
• In ANOVA and Regression analyses our independent variables can
be treated as Fixed or Random
• Fixed Effects: variables whose levels are either sampled
exhaustively or are the only ones considered relevant to the
experimenter
• Random Effects: variables whose levels are randomly sampled
from a large population of levels
• Example from our recent AJHG paper:
Expression = Baseline + Population + Individual + Error
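One possible way to fit such a model in R is with the lme4 package, treating Individual as a random effect; this is a hedged sketch with a hypothetical data frame `expr_dat`, not the analysis from the paper:

```r
# Mixed model: Population as a fixed effect, Individual as a random intercept
# (hypothetical data frame `expr_dat` with columns Expression, Population, Individual)
library(lme4)

fit <- lmer(Expression ~ Population + (1 | Individual), data = expr_dat)
summary(fit)
```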