Multiple Regression Inference Guide

1) The lecture discusses inference for the multiple regression model, including assessing the significance of variables. Standard errors and confidence intervals allow estimating the variability and accuracy of regression coefficients.
2) Hypothesis tests examine whether individual variables are needed in the model and whether the overall regression relationship is statistically significant. The t-statistic and p-values assess individual predictors, while the F-statistic and p-value judge the entire model.
3) An example analyzes physical measures to predict biochemical levels in children. A full model is compared to submodels excluding variables, to see whether a more parsimonious relationship can adequately describe the data. Confounding must be considered when variables are removed.


STATS 330: Lecture 6

Inference for the Multiple Regression Model

31.07.2014
Getting RStudio

[Link]

[Link]
Inference for the regression model

Aim of today's lecture

- To discuss how we assess the significance of variables in the regression.
- Key concepts:
  - Standard errors
  - Confidence intervals for the coefficients
  - Tests of significance
Variability of the regression coefficients

- Imagine that we keep the xs fixed, but resample the errors and refit the plane. How much would the plane (the estimated coefficients) change?
- This gives us an idea of the variability (accuracy) of the estimated coefficients as estimates of the coefficients of the true regression plane.
[Figure: points scattered about a fitted regression plane, with axes Y, X1, and X2]
Variability of the regression coefficients

- Variability depends on
  - the arrangement of the xs (the more correlation, the more change);
  - the error variance (the more scatter about the true plane, the more the fitted plane changes).
- We measure variability by the standard error of the coefficients.
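The thought experiment above (hold the xs fixed, resample the errors, refit) can be simulated directly. Below is a minimal sketch in Python rather than R, using a made-up one-predictor model y = 1 + 2x + error with sigma = 3, purely for illustration:

```python
import random

# Hypothetical setup (not from the lecture): y = 1 + 2*x + error,
# errors ~ N(0, 3^2), with the xs held fixed across resamples.
random.seed(1)
x = [float(i) for i in range(20)]
b0_true, b1_true, sigma = 1.0, 2.0, 3.0

def ls_slope(y):
    """Least-squares slope Sxy/Sxx for the fixed xs."""
    xbar = sum(x) / len(x)
    ybar = sum(y) / len(y)
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    return sxy / sxx

# Resample the errors many times, refit, record the fitted slope.
slopes = []
for _ in range(2000):
    y = [b0_true + b1_true * xi + random.gauss(0, sigma) for xi in x]
    slopes.append(ls_slope(y))

mean_b1 = sum(slopes) / len(slopes)
sd_b1 = (sum((b - mean_b1) ** 2 for b in slopes) / (len(slopes) - 1)) ** 0.5

# Theoretical standard error for one predictor: sigma / sqrt(Sxx)
xbar = sum(x) / len(x)
sxx = sum((xi - xbar) ** 2 for xi in x)
theoretical_se = sigma / sxx ** 0.5
```

The empirical spread of the refitted slopes should land close to the theoretical standard error sigma/sqrt(Sxx); the Std. Error column in an lm summary is the estimated version of exactly this quantity.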


Example: Cherries

Call:
lm(formula = volume ~ diameter + height)

Residuals:
Min 1Q Median 3Q Max
-6.4065 -2.6493 -0.2876 2.2003 8.4847

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -57.9877 8.6382 -6.713 2.75e-07 ***
diameter 56.4979 3.1712 17.816 < 2e-16 ***
height 0.3393 0.1302 2.607 0.0145 *
---

Residual standard error: 3.882 on 28 degrees of freedom


Multiple R-squared: 0.948, Adjusted R-squared: 0.9442
F-statistic: 255 on 2 and 28 DF, p-value: < 2.2e-16
Confidence intervals

CI: estimated coefficient ± t × standard error

t: the 97.5% point of the t distribution with df degrees of freedom.

df = n - k - 1.

n: number of observations.

k: number of covariates (assuming we have a constant term).
Confidence intervals
Example: Cherries

Use the stats function confint

> confint([Link])
2.5 % 97.5 %
(Intercept) -75.68226247 -40.2930554
diameter 50.00206788 62.9937842
height 0.07264863 0.6058538
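These intervals can be reproduced by hand from the CI formula. For the cherries data n = 31 and k = 2, so df = 28; the 97.5% point of t with 28 df is about 2.0484 (hard-coded below, since computing it exactly needs a stats library). A quick check for the diameter coefficient, in Python:

```python
# Coefficient and standard error for diameter, read off the R summary above.
estimate, se = 56.4979, 3.1712
t_crit = 2.0484  # approximate 97.5% point of the t distribution with 28 df

lower = estimate - t_crit * se
upper = estimate + t_crit * se
# Should agree with confint's (50.002, 62.994) up to rounding.
```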
Hypothesis test

- Often we ask: do we need a particular variable, given that the others are in the model?
- Note that this is not the same as asking: is a particular variable related to the response?
- We can test the former by examining the ratio of the coefficient to its standard error.
Hypothesis test

- This ratio is the t-statistic.
- The bigger |t| is, the more we need the variable.
- Equivalently, the smaller the p-value, the more we need the variable.
Example: Cherries

Call:
lm(formula = volume ~ diameter + height)

Residuals:
Min 1Q Median 3Q Max
-6.4065 -2.6493 -0.2876 2.2003 8.4847

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -57.9877 8.6382 -6.713 2.75e-07 ***
diameter 56.4979 3.1712 17.816 < 2e-16 ***
height 0.3393 0.1302 2.607 0.0145 *
---

Residual standard error: 3.882 on 28 degrees of freedom


Multiple R-squared: 0.948, Adjusted R-squared: 0.9442
F-statistic: 255 on 2 and 28 DF, p-value: < 2.2e-16
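Each t value in the summary is simply the estimate divided by its standard error. A quick arithmetic check (Python, with the values copied from the output above):

```python
# t value for height: estimate / standard error
t_height = 0.3393 / 0.1302     # about 2.606; R prints 2.607 from unrounded values

# t value for diameter
t_diameter = 56.4979 / 3.1712  # about 17.816, as printed
```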
Recall: p-value
[Figure: density of the t distribution with df = 28; the two tails beyond ±2.607 are shaded, with total area equal to the p-value of 0.0145]
Other hypotheses

- Overall significance of the regression: do none of the variables have a relationship with the response?
- Use the F-statistic: the bigger F, the more evidence that at least one variable has a relationship.
- Equivalently, the smaller the p-value, the more evidence that at least one variable has a relationship.
Example: Cherries

Call:
lm(formula = volume ~ diameter + height)

Residuals:
Min 1Q Median 3Q Max
-6.4065 -2.6493 -0.2876 2.2003 8.4847

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -57.9877 8.6382 -6.713 2.75e-07 ***
diameter 56.4979 3.1712 17.816 < 2e-16 ***
height 0.3393 0.1302 2.607 0.0145 *
---

Residual standard error: 3.882 on 28 degrees of freedom


Multiple R-squared: 0.948, Adjusted R-squared: 0.9442
F-statistic: 255 on 2 and 28 DF, p-value: < 2.2e-16
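For a model with an intercept, the overall F statistic is related to R-squared by F = (R^2/k) / ((1 - R^2)/(n - k - 1)). This is a standard identity, not how R computes it, but it makes a handy sanity check against the printed output:

```python
r2, n, k = 0.948, 31, 2  # Multiple R-squared, observations, covariates

F = (r2 / k) / ((1 - r2) / (n - k - 1))
# about 255, matching the printed F-statistic up to rounding of R-squared
```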
Testing if a subset is required

- Often we want to test whether a subset of variables is unnecessary.
- Terminology:
  - Full model: the model containing all variables.
  - Submodel: the model with a set of variables removed.
- The test is based on comparing the RSS of the submodel with the RSS of the full model. The full model RSS is always smaller (why?)
Testing if a subset is required

- If the full model RSS is not much smaller than the submodel RSS, the submodel is adequate: we do not need the extra variables.
- To do the test, we
  - fit both models and get the RSS for both;
  - calculate the test statistic.
- If the test statistic is large (equivalently, the p-value is small), the submodel is not adequate.
Testing if a subset is required

- The test statistic is

      F = (RSS_sub - RSS_full) / (s^2 × (df_sub - df_full))

- df_sub - df_full is the number of variables dropped (df here denotes residual degrees of freedom, so the submodel has more).
- s^2 is the estimate of σ^2 from the full model (the residual mean square, RSS_full / (n - k - 1)).
- R has a function, anova, to do the calculation.


p-values

- If the submodel is correct, the test statistic has an F-distribution with df_sub - df_full and n - k - 1 degrees of freedom.
- We assess whether the value of F calculated from the sample is a plausible value from this distribution by means of a p-value.
- If the p-value is too small, we have evidence against the hypothesis that the submodel is OK.
p-values
[Figure: density of the F distribution with 2 and 16 degrees of freedom; the area to the right of the observed F value is shaded and equals the p-value]
Example: Free fatty acid data

- Use physical measures to model a biochemical parameter in overweight children.
- Variables are:
  - FFA: free fatty acid level in blood (the response variable)
  - Age: months
  - Weight: pounds
  - Skinfold thickness: inches


Analysis

Call:
lm(formula = ffa ~ age + weight + skinfold, data = [Link])

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 3.95777 1.40138 2.824 0.01222 *
age -0.01912 0.01275 -1.499 0.15323
weight -0.02007 0.00613 -3.274 0.00478 **
skinfold -0.07788 0.31377 -0.248 0.80714

This suggests:
- age is not required if weight and skinfold are retained;
- skinfold is not required if weight and age are retained.
- Can we get away with just weight?


Analysis

> [Link] <- lm(ffa~weight,data=[Link])


> anova([Link],[Link])
Analysis of Variance Table

Model 1: ffa ~ weight


Model 2: ffa ~ age + weight + skinfold
Res.Df RSS Df Sum of Sq F Pr(>F)
1 18 0.91007
2 16 0.79113 2 0.11895 1.2028 0.3261

- The small F and large p-value suggest that weight alone is adequate.
- But the test should be interpreted with caution: could there be confounding?
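The F value in the anova table comes straight from the subset-test formula: the drop in RSS per variable removed, divided by the full model's residual mean square. A check in Python, with the numbers copied from the table above:

```python
rss_sub, df_sub = 0.91007, 18    # ffa ~ weight
rss_full, df_full = 0.79113, 16  # ffa ~ age + weight + skinfold
n_dropped = df_sub - df_full     # residual df gained = 2 variables dropped

s2 = rss_full / df_full                       # residual mean square of full model
F = ((rss_sub - rss_full) / n_dropped) / s2   # about 1.2028, as printed
```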
Confounding?
- Confounding: a non-causal relation due to a missing variable.
- Its effect can be checked by comparing the coefficients in the full model and the submodel (where both are available).
> summary([Link])
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 3.95777 1.40138 2.824 0.01222 *
age -0.01912 0.01275 -1.499 0.15323
weight -0.02007 0.00613 -3.274 0.00478 **
skinfold -0.07788 0.31377 -0.248 0.80714

> summary([Link])
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 2.01651 0.37578 5.366 4.23e-05 ***
weight -0.02162 0.00608 -3.555 0.00226 **

The coefficient of weight changes only slightly when age and skinfold are dropped (-0.02007 to -0.02162), so there is little sign of confounding here.
