Chapter 3 - Multiple Linear Regression

Chapter 3 - Multiple Linear Regression
1. Introduction
A regression model that involves more than one regressor variable is called a multiple regression
model.
In multiple regressions, the mean of the response variable is a function of two or more explanatory
variables.
In Chapter 2 we examined the relationship between HSGPA and College GPA. There are some
other possible factors that may be related to College GPA, such as ACT Scores, Rank in high school
class, etc.
The corresponding multiple linear regression model in this case would look like this:
College GPA = 𝛽 + 𝛽 (HSGPA) + 𝛽 (ACT Scores) + 𝛽 (Rank) + 𝜀
where the parameters 𝛽 , 𝛽 , 𝛽 , and 𝛽 would be estimated from the data.
2. Multiple Linear Regression Model and Assumptions
The multiple linear regression model with 𝑘 regressors or predictor variables is:
𝑦 = 𝛽 + 𝛽 𝑥 + 𝛽 𝑥 +⋯+ 𝛽 𝑥 + 𝜀
where
1. 𝑦 is the response variable that we want to predict.
2. 𝑥 , 𝑥 , . . . , 𝑥 are 𝑘 predictor variables.
3. 𝛽 , 𝛽 , 𝛽 , … , 𝛽 are unknown parameters.
4. 𝛽 is the intercept – the average value of 𝒀 when 𝑿𝟏 , 𝑿𝟐 and 𝑿𝒌 are all zeros.
5. 𝛽 ’s are called the (partial) regression coefficients which represent the expected change in
the response 𝑦 per unit change in 𝑥 when all the remaining regressor variables,
𝑥 (𝑖 ≠ 𝑗), are held constant. For this reason, the parameters, 𝛽 ; 𝑗 = 1, 2, … , 𝑘 , are often
called partial regression coefficients.
6. 𝜀 is the random error.
Assume that 𝜀 ~ 𝑁𝐼𝐷(0, 𝜎 ) and thus the mean of 𝑦 is 𝐸(𝑌) = 𝛽 + 𝛽 𝑋 + 𝛽 𝑋 +. . . +𝛽 𝑋 .

(approximate, not real)
Multiple linear regression models are often used as empirical models or approximating functions.
That is, the true functional relationship between y and 𝑥 , 𝑥 , … , 𝑥 is unknown, but over certain
ranges of the regressor variables the linear regression model is an adequate approximation to the
true unknown function.
UECM2253 Applied Regression Analysis Chapter 3 - 1

Models with complex structure may often still be analyzed by multiple linear regression techniques.
For example:
𝑦 = 𝛽 + 𝛽 𝑥 + 𝛽 𝑥 + 𝛽 𝑥 + 𝜀 can be written as 𝑦 = 𝛽 + 𝛽 𝑥 + 𝛽 𝑥 + 𝛽 𝑥 + 𝜀
where 𝑥 = 𝑥 , 𝑥 = 𝑥 , 𝑥 = 𝑥 .
2.1. Matrix Form of Multiple Linear Regression Model
The multiple linear regression model can be written in matrix notation as 𝑦 = 𝑋𝛽 + 𝜀
𝑦 1 𝑥 𝑥 ⋯ 𝑥 𝛽 𝜀
where 𝑦 = , 𝑋= , 𝛽= , 𝜀=
⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮
In general, 𝑦 is an 𝑛 × 1 vector of the observations, 𝑋 is an 𝑛 × 𝑝 matrix of the levels of the

regressor’s variables, 𝛽 is a 𝑝 × 1 vector of the regression coefficients, and 𝜀 is an 𝑛 × 1 vector of
random errors. Note that 𝑝 = 𝑘 + 1.
2.2. Estimation of the Model Parameter
Least Squares Estimation of the Regression Coefficients
The method of least squares can be used to estimate the regression coefficients, 𝛽 ’s. Suppose that
𝑛 > 𝑘 observations are available with sample regression model
𝑦 = 𝛽 + 𝛽 𝑥 + 𝛽 𝑥 + ⋯+ 𝛽 𝑥 + 𝜀 = 𝛽 + 𝛽 𝑥 + 𝜀 ; 𝑖 = 1, … , 𝑛 .
Data for Multiple Linear Regression

Regressors
Observation, i Response, y 𝑥 𝑥 … 𝑥
1 𝑦 𝑥 𝑥 … 𝑥
2 𝑦 𝑥 𝑥 … 𝑥
⋮ ⋮ ⋮ ⋮ ⋮
n 𝑦 𝑥 𝑥 … 𝑥

Suppose that 𝜀 ~ 𝑁𝐼𝐷(0, 𝜎 ) then the least square function is given as
𝑆(𝛽 , 𝛽 , … , 𝛽 ) = 𝜀 = 𝑦 − 𝛽 − 𝛽𝑥 ,
And in matrix form
𝑆(𝜷) = 𝜺 = (𝒚 − 𝑿𝜷) (𝒚 − 𝑿𝜷) .
Note that 𝑆(𝜷) may be expressed as
𝑆(𝜷) = 𝒚 𝒚 − 𝜷 𝑿 𝒚 − 𝒚 𝑿𝜷 + 𝜷 𝑿 𝑿𝜷 = 𝒚 𝒚 − 2𝜷 𝑿 𝒚 + 𝜷 𝑿 𝑿𝜷
The least-square estimator must satisfy

𝜕𝑆
= −2𝑿 𝒚 + 2𝑿 𝑿 𝜷 = 0 .
𝜕𝜷 𝜷
The least-square estimator of 𝜷 is
𝜷 = (𝑿 𝑿) 𝑿𝒚
provided that the inverse matrix (𝑿 𝑿)– exists. The (𝑿 𝑿)– will always exists if the regressors
are linearly independent, that is, if none of the columns of the X matrix is a linear combination of
the other columns.
The fitted multiple linear regression model is
𝒚 = 𝑿𝜷 = 𝑿(𝑿 𝑿) 𝑿 𝒚 = 𝑯𝒚 ,
where 𝑯 = 𝑿(𝑿 𝑿) 𝑿 is an 𝑛 × 𝑛 matrix and is called the hat matrix. It maps the vector of
observed values into a vector of fitted values. The hat matrix and its properties play a central role
in regression analysis.
LSE Properties:
1. 𝐸(𝛽 ) = 𝐸[(𝑋 ′ 𝑋) 𝑋 ′ 𝑦] = 𝛽.
2. 𝛽 = (𝑋 ′ 𝑋) 𝑋 ′ 𝑦 is the best linear unbiased estimator (BLUE) of 𝛽.
3. cov 𝜷 = 𝐸 𝜷 − 𝐸 𝜷 𝜷−𝐸 𝜷 = var 𝜷 = var[(𝑿 𝑿) 𝑿 𝒚] = 𝝈 (𝑿 𝑿) .
which is a 𝑝 × 𝑝 symmetric matrix whose jth diagonal element is the variance of 𝛽 and
(ij)th off-diagonal element is the covariance between 𝛽 and 𝛽 .
Let 𝑪 = (𝑿 𝑿) , then the variance of 𝛽 , 𝑗 = 0, … , 𝑘 , is 𝜎 𝐶 and the covariance of 𝛽 and 𝛽 is

𝜎 𝐶 .

Residuals
The difference between the observed value 𝑦 and the corresponding fitted value 𝑦 is the residual
𝑒 = 𝑦 – 𝑦 . The n residuals may be conveniently written in matrix notation as
𝒆 = 𝒚 − 𝒚 = 𝒚 − 𝑿𝜷 = 𝒚 − 𝑯𝒚 = (𝑰 − 𝑯)𝒚
where I is a 𝑛 × 𝑛 identity matrix.
Example 3.1:
In a small-scale experimental study of the relation between degree of brand liking (𝑌) and moisture
content (𝑋 ) and sweetness (𝑋 ) of the product, the following results were obtained from the
experiment based on a completely randomized design. Fit a multiple linear regression model
relating the brand liking to the content and sweetness of the product.
𝑖 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
𝑥 4 4 4 4 6 6 6 6 8 8 8 8 10 10 10 10
𝑥 2 4 2 4 2 4 2 4 2 4 2 4 2 4 2 4
𝑦 64 73 61 76 72 80 71 83 83 89 86 93 88 95 94 100
Solution:
1 4 2 64 1 4 2
1 1 ⋯ 1 16 112 48
1 4 4 73 1 4 4
𝑋= ,𝑦= ,𝑋 𝑋= 4 4 ⋯ 10 = 112 864 336
⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮
2 4 ⋯ 4 48 336 160
1 10 4 100 1 10 4
64
1 1 ⋯ 1 1308
73
𝑋 𝑦= 4 4 ⋯ 10 = 9510
⋮
2 4 ⋯ 4 3994
100
𝛽 = (𝑋 ′ 𝑋) 𝑋 ′ 𝑦
16 112 48 1308 1.2375 −0.0875 −0.1875 1308 37.65
= 112 864 336 9510 = −0.0875 0.0125 0 9510 = 4.425
48 336 160 3994 −0.1875 0 0.0625 3994 4.375
𝒚= 37.65+4.425x1+4.375x2
R-Code for the matrix notation:

# To enter the data and put it into matrix form
X1<- c(4,4,4,4,6,6,6,6,8,8,8,8,10,10,10,10)
X2<- c(2,4,2,4,2,4,2,4,2,4,2,4,2,4,2,4)
Y <- c(64,73,61,76,72,80,71,83,83,89,86,93,88,95,94,100)
X <- cbind(1, X1,X2)

# To obtain matrix for X’X

𝑡(𝑋) = 𝑋
XPX<- t(X)%*%X
XPX
X1 X2
16 112 48
X1 112 864 336
X2 48 336 160
# To obtain matrix for X`Y

XPY<- t(X)%*%Y
XPY
[,1]
1308
X1 9510
X2 3994
# To compute the inverse of X`X

IXPX <- solve(XPX)
IXPX
X1 X2
1.2375 -8.750000e-02 -0.1875
X1 -0.0875 1.250000e-02 0.0000
X2 -0.1875 2.602085e-17 0.0625
# To compute the solution for beta

Beta <- IXPX%*%XPY
Beta
[,1]
37.650
X1 4.425
X2 4.375
R-Codes to obtain the solutions for 𝛽

# To enter the data
content<- c(4,4,4,4,6,6,6,6,8,8,8,8,10,10,10,10)
sweetness<- c(2,4,2,4,2,4,2,4,2,4,2,4,2,4,2,4)
B.liking <- c(64,73,61,76,72,80,71,83,83,89,86,93,88,95,94,100)
#Use lm( ) function to fit a linear regression

lm(formula = B.liking~content+sweetness)

Output
Call:
lm(formula = B.liking ~ content + sweetness) B0
B1 B2
Coefficients:
(Intercept) content sweetness
37.650 4.425 4.375
Example 3.2: The Delivery Time

A soft drink bottler is analyzing the vending machine service routes in his distribution system. He
is interested in predicting the amount of time required by the route driver to service the vending
machines in an outlet. This service activity includes stocking the machine with beverage products
and minor maintenance or housekeeping. The industrial engineer responsible for the study has
suggested that the two most important variables affecting the delivery time (𝑦) are the number of
cases of product stocked (𝑥 ) and the distance walked by the route driver (𝑥 ). 25 observations on
delivery time are collected. Data file: C02EX2.2Delivery.txt
Observation No. Delivery Time, y (min) Number of Cases, 𝑥 Distance, 𝑥

1 16.68 7 560
2 11.50 3 220
3 12.03 3 340
⋮ ⋮ ⋮ ⋮
25 10.75 4 150
a. How would you interpret the values of 𝛽 and 𝛽 ?

b. Predict the delivery time for a machine with 15 cases in distance of 800ft.
R-Codes
#read data from file
setwd("E:/… ")
Ex2.2delivery.dat<-read.table(file = "C02EX2.2Delivery.txt", header=TRUE)
#Print data set

Ex2.2delivery.dat
#Use the names command to know what columns are contained:

names(Ex2.2delivery.dat)

Delivery.Reg<-lm(Delivery~NoCases+Distance, data=Ex2.2delivery.dat)
summary(Delivery.Reg)

Output:
Call:
lm(formula = Delivery ~ NoCases + Distance, data = Ex2.2delivery.dat)
Residuals:
Min 1Q Median 3Q Max
-5.7880 -0.6629 0.4364 1.1566 7.4197
B0
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 2.341231 1.096730 2.135 0.044170 * B1
NoCases 1.615907 0.170735 9.464 3.25e-09 ***
Distance 0.014385 0.003613 3.981 0.000631 *** B2
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
MSE
Residual standard error: 3.259 on 22 degrees of freedom
Multiple R-squared: 0.9596, Adjusted R-squared: 0.9559
df=n-p
F-statistic: 261.2 on 2 and 22 DF, p-value: 4.687e-16
Solution:
The multiple regression model is
𝑦 =𝛽 +𝛽 𝑥 +𝛽 𝑥 + 𝜀 or 𝐸(𝒀) = 𝛽 + 𝛽 𝑿 + 𝛽 𝑿
The fitted regression equation is

𝒚= 2.3412+1.6159X1+0.0144X2
a.
𝛽= : The delivery time of a machine is expected to increase by1.62 minsfor each additional
case (𝑋 ) when the distance (𝑋 ) is held constant.
𝛽= : The delivery time of a machine is expected to increase by ________

0.0144 ft for every 1ft
increase in distance (𝑋 ) when number of cases (𝑋 ) is held constant.
b.
𝑥 = 15 , 𝑥 = 800𝑓𝑡 .
𝑦= 38.0997

Example 3.1.1: Inadequacy of Scatter Diagrams in Multiple Regression
Consider the data shown in figure below. These data were generated from the equation
𝑦 = 8 − 5𝑥 + 12𝑥
y X1 X2
10 2 1
17 3 2
48 4 5
27 1 2
55 5 6
26 6 4
9 7 3
16 8 4

The matrix of scatterplots is shown in the figure. The y-versus-x1, plot does not exhibit any apparent
relationship between the two variables. The y-versus-x2 plot indicates that a linear relationship
exists, with a slope of approximately 8. Note that both scatter diagrams convey erroneous
information.
This example illustrates that constructing scatter diagrams of y versus 𝑥 (𝑗 = 1, … , 𝑘) can be
misleading, even in the case of only two regressors operating in a perfectly additive fashion with
no noise.
Estimation of 𝝈𝟐
As in simple linear regression, we may develop an estimator of 𝝈𝟐 from the residual sum of squares
𝑆𝑆 = (𝑦 − 𝑦) = 𝑒 = 𝑒 𝑒 = 𝒚 − 𝑿𝜷 𝒚 − 𝑿𝜷
𝑆𝑆 = 𝑆𝑆 =𝒚 𝒚−𝜷 𝑿 𝒚.
This residual sum of squares has 𝑛 − 𝑘 − 1 = 𝑛 − 𝑝 degrees of freedom associate with it since 𝑘 +
1 parameters are estimated in the regression model.
The residual mean square is

𝑆𝑆
𝜎 = 𝑀𝑆 =
𝑛−𝑘−1
Note: 𝑀𝑆 is an unbiased estimator.
Example 3.3:
Estimate the error variance 𝜎 for Ex. 3.1, Brand liking of product.
Solution:
64 1308
1 1 ⋯ 1
73 9510
𝑋 𝑦= 4 4 ⋯ 10 = 3994
⋮
2 4 ⋯ 4
100
𝛽 = (𝑋 ′ 𝑋) 𝑋 ′ 𝑦
16 112 48 1308 1.2375 −0.0875 −0.1875 1308 37.65
4.425
= 112 864 336 9510 = −0.0875 0.0125 0 9510 = 4.375
48 336 160 3994 −0.1875 0 0.0625 3994
𝒚= 37.65+4.425X1+4.375X2

64
73
𝒚 𝒚 = [64 73 ⋯ 100] = 108896
⋮
100
1308
𝜷 𝑿 𝒚 = [37.65 4.425 4.375] 9510 = 108801.7
3994
𝒚𝒚 𝜷𝑿𝒚 108896-108801.7
𝑀𝑆 = = ------------------------ =7.2538
16-2-1
R-Code for the matrix notation:

# To enter the data and put it into matrix form
X1<- c(4,4,4,4,6,6,6,6,8,8,8,8,10,10,10,10)
X2<- c(2,4,2,4,2,4,2,4,2,4,2,4,2,4,2,4)
Y <- c(64,73,61,76,72,80,71,83,83,89,86,93,88,95,94,100)
X <- cbind(1, X1,X2)
# To obtain matrix for X’X

XPX<- t(X)%*%X
XPX
X1 X2
16 112 48
X1 112 864 336
X2 48 336 160
# To obtain matrix for X`Y

XPY<- t(X)%*%Y
XPY
[,1]
1308
X1 9510
X2 3994
# To compute the inverse of X`X

IXPX <- solve(XPX)
IXPX
X1 X2
1.2375 -8.750000e-02 -0.1875
X1 -0.0875 1.250000e-02 0.0000
X2 -0.1875 2.602085e-17 0.0625
# To compute the solution for beta

Beta <- IXPX%*%XPY
Beta
[,1]
37.650
X1 4.425
X2 4.375

# To compute y'y
YPY <- t(Y)%*%Y
YPY
[,1]
[1,] 108896
# To compute B'X'Y
BPXPY <- t(Beta)%*%XPY
BPXPY
[,1]
[1,] 108801.7
# To compute MSE
MSE <- (YPY - BPXPY)/13
MSE
[,1]
[1,] 7.253846
R-Codes
Brand.Reg <- lm(formula = B.liking~content+sweetness)
summary(Brand.Reg)
Output:
Call:
lm(formula = B.liking ~ content + sweetness)
Residuals:
-4.400 -1.762 0.025 1.587 4.200
Coefficients:
(Intercept) 37.6500 2.9961 12.566 1.20e-08 ***
content 4.4250 0.3011 14.695 1.78e-09 ***
sweetness 4.3750 0.6733 6.498 2.01e-05 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 MSE=(2.639)^2
=7.252249
Exercise: Find MSE for Ex. 3.2, delivery time.

=3.259^2=10.621081

2.3. Hypothesis Testing in Multiple Linear Regression
Once we have estimated the parameters in the model, we face two immediate questions:
1. What is the overall adequacy of the model?

2. Which specific regressors seem important?
Several hypothesis testing procedures prove useful for addressing these questions. The formal tests
require that our random errors be 𝑁𝐼𝐷(0, 𝜎 ) .
The Analysis of Variance (ANOVA) for Multiple Linear Regression
The test procedure is a generalization of the analysis of variance used in simple linear regression.
𝑆𝑆 = ∑(𝑦 − 𝑦̄ ) = 𝑦 𝑦 − 𝑦 𝐽𝑦 is measure of "how good" 𝑦̄ does.
where 𝐽 is an 𝑛 × 𝑛 square matrix with all elements 1.
∑
Computational formula: 𝑆𝑆 = 𝑦 ′ 𝑦 − .
SS E  ee    yi  yˆ i 
2
′
= 𝑦 − 𝑋𝛽 𝑦 − 𝑋𝛽
= 𝑦 𝑦−𝛽 𝑋 𝑦 is a measure of "how good" 𝑦 does
𝑆𝑆 = 𝑆𝑆 − 𝑆𝑆
=∑(𝑦 − 𝑦̄ ) − ∑(𝑦 − 𝑦 )
= (𝑦 𝑦 − 𝑦 𝐽𝑦) – (𝑦 ′ 𝑦 − 𝛽 ′ 𝑋 ′ 𝑦)
=𝛽 𝑋 𝑦− 𝑦 ′ 𝐽𝑦 is the amount “gained” by doing the regression.
∑
Computational formula: 𝑆𝑆 = 𝛽 ′ 𝑋 ′ 𝑦 −
Note that:
/
~𝜒 ; ~𝜒 ; 𝐹= /( )
= ~𝐹 ,

The ANOVA Table

Source of Degrees of Mean
Sum of Squares 𝐹
Variation Freedom Square
Regression 𝑆𝑆 𝑘 𝑀𝑆 𝑀𝑆
Residual 𝑆𝑆 𝑛−𝑘−1 𝑀𝑆 𝑀𝑆
Total 𝑆𝑆 𝑛−1
Notes:
1) 𝑆𝑆 has 𝑘 degrees of freedom. (Note: When there is only 1 independent variable, the degree
of freedom is 1).
2) 𝑆𝑆 has 𝑛 − 𝑘 − 1 degrees of freedom.
3) 𝐻 rejects if 𝐹 > 𝐹( , , ).
Test for Significance of Regression (Global Test of Model Adequacy)
Is the regression equation that uses information provided by the predictor variables 𝑥 , 𝑥 , … , 𝑥
substantially better than the simple predictor 𝑦̄ that does not rely on any of the 𝑋 −values?
The test for significance of regression is a test to determine if there is a linear relationship
between the response 𝑦 and any of the regressor variables 𝑥 , 𝑥 , … , 𝑥 . This procedure is often
thought of as an overall or global test of model adequacy.
𝐻 : 𝛽 =𝛽 =⋯=𝛽 =0
𝐻 : 𝛽 ≠ 0 for at least one 𝑗
Test statistics:
𝐹= ~𝐹 , ; Reject 𝐻 when 𝐹 > 𝐹 ; ,
Rejection of 𝐻 implies that at least one of the regressors 𝑥 , 𝑥 , … , 𝑥 contributes

significantly to the model.
Example 3.4: Delivery Time (refer Example 3.2)

Determine if the model:
𝒚 = 2.3412 + 1.6159 𝑿 + 0.0144 𝑿 .
or
Delivery time = 2.3412 +1.6159No.Cases + 0.0144 Distance
is significant for estimating delivery time? Use 𝛼 = 0.01 in the hypothesis test.

R-Codes:
#Use lm( ) function to obtain the ANOVA table
anv.delivery <- anova(Delivery.Reg)
anv.delivery
SSR <- sum(anv.delivery$"Sum Sq"[1:2])
SSR
Output:
Analysis of Variance Table
Response: Delivery
Df Sum Sq Mean Sq F value Pr(>F)

NoCases 1 5382.4 5382.4 506.619 < 2.2e-16 ***
MSE
Distance 1 168.4 168.4 15.851 0.0006312 ***
Residuals 22 233.7 10.6
--- 𝑆𝑆 = 5382.4+
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 168.4
SSR
[1] 5550.811
Solution:
𝐻 : 𝛽 =𝛽 =⋯=𝛽 =0
𝐻 : 𝛽 ≠ 0 for at least one 𝑗
Test statistics:
𝑀𝑆
𝐹= = 5550.8/2 ~ 𝐹( , ) .
𝑀𝑆 -------------=261.8302
10.6
Reject H0
Decision Rule: _______________ when 𝐹 > 𝐹 ; ,
With α = 0.01, F(2, 22, 0.01) = ______________, and F = _______________ > F2, 22, 0.01.
Reject H0, we have sufficient evidence to conclude that

______________________________________________________________________________
____________________.
Output for summary of regression model:


Example 3.5: Brand liking

Test for significance of regression using brand liking data in Ex. 3.1.
Solution:
∑ ∑
𝑆𝑆 = 𝑦 𝑦 − , 𝑆𝑆 = 𝛽 ′ 𝑋 ′ 𝑦 − , 𝑆𝑆 = 𝑆𝑆 − 𝑆𝑆 ,
64
73 (∑ 𝑦) (1308)
𝒚 𝒚 = [64 73 ⋯ 100] = 108896 , = = 106929 ,
⋮ 𝑛 16
100
1308
𝜷 𝑿 𝒚 = [37.65 4.425 4.375] 9510 = 108801.7 ,
3994
𝑆𝑆 = 108896-106929=1967 , 𝑆𝑆 = 108801.7-106929=1872.7 ,
1872.7
𝑀𝑆 2 936.35
𝑆𝑆 = 94.3 ;𝐹= = = = 129.0841
𝑀𝑆 94.3 7.2538
13
𝐹( . , , ) = 3.8056 ; 𝐹 = 129.08 > 𝐹 .
Reject 𝑯𝟎 , we have sufficient evidence to conclude that the model using content 𝑋 , and sweetness
𝑋 , of product as predictor variables is useful for estimating the brand liking.
R-Output:
# To compute ANOVA table
anv.Brand <- anova(Brand.Reg)
anv.Brand
Response: B.liking

content 1 1566.45 1566.45 215.947 1.778e-09 ***
sweetness 1 306.25 306.25 42.219 2.011e-05 ***
Residuals 13 94.30 7.25
--------
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
--------------------------------------------------------------------------

Testing the Significance of individual Regression Coefficients (Partial /

Marginal Tests)
Once we have determined that the model is useful for predicting 𝑌, we should explore the nature of
the “usefulness” in more detail. Do all the predictor variables add important information for
prediction in the presence of other predictors already in the model?
The null and alternative hypothesis are

𝐻 :𝛽 = 0
𝐻 :𝛽 ≠ 0
Test statistics:
𝑡= = ~𝑡 ; Reject 𝐻 when |𝑡 | > 𝑡 / ; ,
𝐶 is the (𝑗 + 1)𝑡ℎ diagonal element of the (𝑋 ′ 𝑋) matrix.
If 𝑯𝟎 is not rejected, this indicates that the regressor 𝒙𝒋 does not contribute significantly to
the model, in another word, 𝒙𝒋 can be deleted from the model.
Example 3.6: Delivery Time

Suppose the estimated regression model is:
𝒚 = 2.3412 + 1.6159 𝑿 + 0.0144 𝑿 .
or
Delivery time = 2.3412 +1.6159No.Cases + 0.0144 Distance
Is number of cases (𝑋 ) significantly related to delivery time in the model, given that distance (𝑋 )
is already in the model? Use 𝛼 = 0.01 for the hypothesis test.
(i.e. Should cases be used in the model (with distance) to estimate delivery time?)
Solution:
𝐻 :𝛽 = 0
𝐻 :𝛽 ≠ 0
𝛽 1.615907
𝑇= = = 9.4644 ~ t ( ) ; t( . , ) = 2.819 .
𝑠𝑒 𝛽 0.170735
𝑝 − 𝑣𝑎𝑙𝑢𝑒 = 3.25 × 10 < 𝛼 = 0.01 ,

𝑯𝟎 is rejected, this indicates that the regressor 𝑥 contribute significantly to the model, in another
word, 𝑥 can be in the model.

Output:
Call:
Residuals:
-5.7880 -0.6629 0.4364 1.1566 7.4197
Coefficients: B1
(Intercept) 2.341231 1.096730 2.135 0.044170 * se(B1)
NoCases 1.615907 0.170735 9.464 3.25e-09 ***
Distance 0.014385 0.003613 3.981 0.000631 *** t-value for
--- Ho:B1=0
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
p-value for
Residual standard error: 3.259 on 22 degrees of freedom H1:B1=0
Notes:
Since number of cases and distance have a p-value < 0.01 = , both variables should used in the
model.
Example 3.6.1:
Call:
lm(formula = Price ~ Area + HValue + LValue, data = Home.dat)
Coefficients:
(Intercept) 1470.2759 5746.3246 0.256 0.80132
Area 13.5286 6.5857 2.054 0.05666 .
HValue 0.8204 0.2112 3.885 0.00131 **
LValue 0.8145 0.5122 1.590 0.13137
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 7919 on 16 degrees of freedom

Multiple R-Squared: 0.8974,Adjusted R-squared: 0.8782

Recommendation for checking a model’s variables usefulness in estimating 𝑌: 2 steps

1) Do a 𝐹 −test for regression relation
a) If 𝐻 is not rejected : Stop. Model variables may not be useful in estimating 𝑌.
b) If 𝐻 is rejected: There is at least one variable that is useful in estimating 𝒀. Go to step
2.
2) Do 𝑡 − tests on individual 𝛽 s to determine which variables are useful in estimating 𝑌

a) If 𝐻 is rejected for a 𝛽 : The corresponding regressor 𝑋 is useful in estimating 𝑌.
b) If 𝐻 is not rejected for a 𝛽 : The corresponding regressor 𝑋 may not be useful in
estimating 𝑌.
Testing Sets of Regression Coefficient (Partial 𝐹 −test or “Extra Sum of

Squares”)
Suppose a company suspects that the demand 𝑌 for a product could be related to as many as five
predictor variables, 𝑋 , 𝑋 , 𝑋 , 𝑋 , 𝑋 . The cost of obtaining measurements on the variables
𝑋 , 𝑋 and 𝑋 is very high. If, in a small pilot study, the company could show that these three
variables contribute little or no information for predicting 𝑌, they can be removed for the study at
great savings to the company.
If 𝑋 , 𝑋 and 𝑋 contribute little or no information for predicting 𝑌, then you want to test
𝐻 :𝛽 = 𝛽 = 𝛽 = 0
𝐻 : At least one of 𝛽 , 𝛽 or 𝛽 differs from 0.
This is a test to investigate the contribution of a subset of the regressor variables to the model.
Let the vector of regression coefficients be partitioned into 2 groups:
𝛽= where 𝛽 = (𝛽 , 𝛽 , . . . , 𝛽 ) and 𝛽 = (𝛽 , 𝛽 , . . . , 𝛽 ).
A test of hypothesis concerning a set of parameters involves two models:

Full Model
𝑦 = 𝑋𝛽 + 𝜀 = 𝑋 𝛽 + 𝑋 𝛽 + 𝜀
𝐸(𝑌) = 𝛽 + 𝛽 𝑋 +. . . +𝛽 𝑋 + 𝛽 𝑋 . . . +𝛽 𝑋
∑ ′ ′ ′
𝛽 = (𝑋 𝑋) 𝑋 𝑦; 𝑆 = 𝛽 ′ 𝑋 ′ 𝑦 − , 𝑘 degrees of freedom; 𝑀𝑆 = .
where 𝑋 and 𝑋 are orthogonal.

Hypothesis Test for sets of Parameters

𝐻 :𝛽 = 𝛽 =. . . = 𝛽 = 0
𝐻 : 𝐴𝑡 𝑙𝑒𝑎𝑠𝑡 𝑜𝑛𝑒 𝑜𝑓 𝛽 , 𝛽 , . . . 𝑜𝑟 𝛽 𝑑𝑖𝑓𝑓𝑒𝑟𝑠 𝑓𝑟𝑜𝑚 0.
To find the contribution of the 𝛽 to the regression, we assume that 𝐻 is true.
The Reduced Model

𝑦 =𝑋 𝛽 +𝜀
𝐸(𝑌) = 𝛽 + 𝛽 𝑋 +. . . +𝛽 𝑋
∑
𝛽 = (𝑋 𝑋 ) 𝑋 𝑦; 𝑆𝑆 (𝜷 ) = 𝜷 𝑿 𝒚 − , 𝑟 − 1 degrees of freedom.
The regression sum of squares due to 𝛽 given that 𝛽 is already in the model is
𝑆𝑆 (𝜷𝟐 |𝜷𝟏 ) = 𝑆𝑆 (𝑅𝑒𝑑𝑢𝑐𝑒𝑑) − 𝑆𝑆 (𝐹𝑢𝑙𝑙) = 𝑆𝑆 (𝜷𝟏 ) − 𝑺𝑺𝑬 (𝜷) , =SSR(F)-SSR(R)
with (𝑘 − 𝑟 + 1) degrees of freedom.
This is called “extra sum of squares” because it measures the increase in the regression sum of
squares that results from adding the regressors 𝑋 , . . . , 𝑋 to a model that already contains
𝑋 ,...,𝑋 .
Test statistics:
( | )/( )
𝐹= ~𝐹 , , Reject 𝐻 when 𝐹 > 𝐹 ,
MSE for full model
Remark:
The Partial 𝐹 −test on a single variable 𝑋 is equivalent to the 𝑡 − test.
Example 3.7: Real Estate Agent

Suppose a real estate agent relate the listed selling price 𝑌 to the square feet of living area 𝑋 , the
number of bathrooms 𝑋 , the number of floors 𝑋 , and the number of bedrooms 𝑋 . The realtor
suspects that the square footage of living area and number of bathrooms are the most important
predictor variable and that the other variables might be eliminated from the model without loss of
much prediction information. Test this claim with 𝛼 = 0.05.(Data file: C02EX2.7RealEs.txt)

R-Codes:
# Read data from file
setwd("E:/… ")
Real.dat <-read.table("C02EX2.7RealEs.txt", header=TRUE)
#Use lm( ) function to fit a linear regression to reduced and full models
PropertyR.Reg <- lm(Price~Area+Bath, data=Real.dat) (reduced only left important variables)
PropertyF.Reg <- lm(Price~Area+Bath+Floor+Bedroom, data=Real.dat)
# To obtain ANOVA Tables
ANVR <- anova(PropertyR.Reg)
ANVR
ANVF <- anova(PropertyF.Reg)
ANVF
# To compute the Test Statistics and P-value

SSER <- ANVR$"Sum Sq"[3]
SSER
SSEF <- ANVF$"Sum Sq"[5]
SSEF
NDF <- sum(ANVF$"Df"[1:4])-sum(ANVR$"Df"[1:2])
DDF <- ANVF$"Df"[5]
FStat = ((SSER-SSEF)/(NDF))/(SSEF/DDF)
FStat
P.Value = 1-pf(FStat,NDF, DDF)
P.Value

Output:
SSR
Response: Price
Area 1 14829.3 14829.3 221.870 4.21e-09 ***
Bath 1 750.8 750.8 11.233 0.005763 **
Residuals 12 802.1 66.8
---
Analysis of Variance Table SSE
Response: Price

Area 1 14829.3 14829.3 316.1025 6.76e-09 ***
Bath 1 750.8 750.8 16.0046 0.002516 **
Floor 1 316.3 316.3 6.7428 0.026642 *
Bedroom 1 16.6 16.6 0.3538 0.565189
Residuals 10 469.1 46.9(MSE)
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
SSE ( reduced model )
SSER = 802.0537
SSE ( full model) ---include all variables
SSEF = 469.1295
FStat = 3.548319 p-value for H1: at least one of β3

or β4 is not zero.
P.Value = 0.06846151
when p-value test =0, it is two tail test
Solution:
𝐻 :𝛽 = 𝛽 = 0
𝐻 : 𝐴𝑡 𝑙𝑒𝑎𝑠𝑡 𝑜𝑛𝑒 𝑜𝑓 𝛽 𝑜𝑟 𝛽 𝑖𝑠 𝑛𝑜𝑡 𝑧𝑒𝑟𝑜
Reduced Model: SSE(Reduced) = 802.1

Full Model : SSE(Full) = 469.1
SSR(β2| β1) = SSE(Reduced) – SSE(Full) = 333
/
𝐹= . = 3.55 , F0.05, 2, 10 = 4.10
With α = 0.05, we ___________________and

do not reject conclude that there is no statistical significant
evidence that at least one of the two variables, number of hours and number of bedrooms in
contributing significant information for predicting the listed selling price, thus those predictors can
be eliminated from the model.

2.4. Confidence Intervals in Multiple Linear Regression
Confidence intervals on individual regression coefficients and confidence intervals on the mean
response given specific levels of the regressors play the same important role in multiple regression
that they do in simple linear regression.
Confidence Intervals on Regression Coefficients
(1 − 𝛼)100% confidence interval (CI) for 𝛽 is

𝛽 ±𝑡 ;
𝑠𝑒 𝛽 ; 𝑗 = 0, 1, … , 𝑘
where
′
𝑠𝑒 𝛽 = 𝑀𝑆 ∗ 𝐶 , 𝐶 is the (𝑗 + 1)𝑡ℎ diagonal element of the (𝑋 𝑋) matrix.

Refer to Ex. 3.2, Construct a 95% confidence interval for 1.
R Output
Call:
Residuals:
-5.7880 -0.6629 0.4364 1.1566 7.4197
Coefficients:
(Intercept) 2.341231 1.096730 2.135 0.044170 *
NoCases 1.615907 0.170735 9.464 3.25e-09 ***
Distance 0.014385 0.003613 3.981 0.000631 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1


Solution:
We have found that:
𝛽 = 1.6159, 𝑠𝑒 𝛽 = 0.1707, 𝑡 . , = 2.074
95%confidence interval (C.I.) for 𝛽 is

𝛽 ± 𝑡 . ; 𝑠𝑒(𝛽 ) = 1.6159 ± 2.074(0.1707) = ( 1.2618 , 1.97 )
With 95% confidence, we estimate that the change in the mean of delivery time when No of cases
increase by one unit, holding the distance value constant, is somewhere between ____________
1.26mins
and ______________________.
1.97mins
R-Codes for CI
# Compute CI for regresstion parameters
confint(Delivery.Reg, level=0.95)
---------------------
Output:
2.5 % 97.5 %
(Intercept) 0.066751987 4.61571030
NoCases 1.261824662 1.96998976
Distance 0.006891745 0.02187791
Using the Model for Estimation and Prediction
Recall that the least squares line yielded the same value for both the estimate for 𝐸(𝑌 ) and the
prediction of some future value of 𝑦 . The confidence interval for the mean 𝐸(𝑌 ) is narrower than
the prediction interval for 𝑦 because of the additional uncertainty attributable to the random error
𝜀 when predicting some future value of 𝑦ℎ .
These same concepts carry over to multiple regression models.
The fitted value at point 𝑥 is.
𝐸(𝑦ℎ ) = 𝛽 𝑥 var 𝐸(𝑌 ) = 𝜎 𝒙 (𝑿 𝑿) 𝒙 .
Confidence Interval for 𝐸(𝑌 ) and Prediction Interval for 𝑦ℎ .
(1 − 𝛼)100% CI on the mean response at the point 𝑥 , 𝐸(𝑌ℎ ) is

𝐸 (𝑌ℎ ) − 𝑡 / ; 𝑠𝑒[𝐸 (𝑌ℎ )] ≤ 𝐸(𝑌ℎ ) ≤ 𝐸 (𝑌ℎ ) + 𝑡 / ; 𝑠𝑒[𝐸 (𝑌ℎ )]
where and se[ Eˆ (Yh )]  MS E * x h (XX) 1 x h .

(1 − 𝛼)100% prediction interval (PI) on the future observation 𝑦 at a specified value of 𝑥 is

𝑦ℎ − 𝑡 / ; 𝑠𝑒(𝑦ℎ ) ≤ 𝑦ℎ ≤ 𝑦ℎ + 𝑡 / ; 𝑠𝑒(𝑦ℎ )
where se yˆ h   MS E (1  x h (XX) 1 x h ) .
Note:
se yˆ h   MS E (1  xh (XX) 1 x h )  MS E  MS E xh (XX) 1 x h  MS E  ( se[ Eˆ (Yh )]) 2

Reconsider Ex. 3.2. Construct a 95% CI on the mean delivery time for an outlet requiring 𝑥 = 8
cases and where the distance 𝑥 = 275 feet.
Solution: t0,025,22=2.074
1 2.34123
𝒙 = 8 , 𝑦 = 𝒙 𝜷 = [1 8 275] 1.6159 = 19.22 min;
275 0.01438
Variance of 𝑦 :
0.11322 −0.00445 −0.00008 1
𝜎 𝒙 (𝑋 𝑋) 𝒙 = 10.6239[1 8 275] −0.00445 0.00274 −0.00005 8
−0.00008 −0.00005 0.000001 275
= 10.6239(0.05346) = 0.56794
_________________
So, a 95% CI on the mean delivery time at this point is:
19.22 − 2.074√0.56794 ≤ 𝐸(𝑦|𝑥 ) ≤ 19.22 + 2.074√0.56794

→ ( 17.66 , 20.78 ).
Means that 95% of such intervals will contain the true delivery time.
R-Codes
#read data from file
setwd("E:/… ")
Ex2.2delivery.dat<-read.table(file = "C02EX2.2Delivery.txt", header=TRUE)

Delivery.Reg<-lm(Delivery~NoCases+Distance, data=Ex2.2delivery.dat)
summary(Delivery.Reg)
# To construct CI for mean response and PI

# Input the new data xh
ND <- data.frame(NoCases=c(8), Distance=c(275))
CIM <-predict(object = Delivery.Reg, newdata = ND, se.fit = TRUE, interval =
c("confidence"), level=0.95)
PI <- predict(object = Delivery.Reg, newdata = ND, se.fit = TRUE,
interval=c("prediction"), level=0.95)
CIM
PI

Output:
> CIM
$fit
fit lwr upr
[1,] 19.22432 17.6539 20.79474
$se.fit se(E(yh)
[1] 0.7572407
$df
[1] 22
$residual.scale
[1] 3.259473
> PI Prediction Interval

$fit
fit lwr upr

[1,] 19.22432 12.28456 26.16407
$se.fit
[1] 0.7572407 it is not se(yh)
$df
[1] 22
df for error
$residual.scale
[1] 3.259473
Example 3.10:
When performing a regression of 𝑌 on 𝑋 and 𝑋 , we find that
i. 𝑦 = 20 − 1.5𝑥 + 1.8𝑥
ii. Source DF SS MS
Regression 2 42 21
Error 3 12 4
4/3 −1/4 −1/3
iii. (𝑋′𝑋) = −1/4 1/16 0
−1/3 0 2/3
a. Find 𝑠𝑒 𝛽 .
b. Calculate the value of the test statistic for testing 𝐻 : 𝛽 = 1.
c. Suppose 𝑥 = 2, and 𝑥 = 3, find 𝑠𝑒[𝐸 (𝑌 )] and 𝑠𝑒(𝑦ℎ ).

Solutions:
(a) 𝑠𝑒 𝛽 = 𝑀𝑆𝐸 𝐶 = 4(2/3)=1.633
(b) H0 = 𝛽 = 1
𝛽 −𝛽 1.8-1
𝑡= =
𝑠𝑒 𝛽 ---------- = 0.49
1.633
(c)
4 1 1
⎡ − − ⎤
⎢ 3 4 3⎥
1 55
1 1
𝑥ℎ′ (𝑥′𝑥) 𝑥ℎ = (1 2 3) ⎢ − 0⎥ 2 =
⎢ 4 16 ⎥ 12
3
⎢ 1 2⎥
⎣− 3 0
3⎦
𝑠𝑒(𝑦ℎ ) = 𝑀𝑆𝐸(1 + 𝑥′ℎ (𝑥′𝑥) 𝑥ℎ ) = 4(1+55/12) = 67/3 =4.73
𝑠𝑒 𝐸(𝑦ℎ ) = 𝑀𝑆𝐸(𝑥′ℎ (𝑥′𝑥) 𝑥ℎ ) = 4(55/12) = 4.28
2.5. Coefficient of determination, 𝑹𝟐
Two other ways to assess the overall adequacy of the model are 𝑅 and the adjusted 𝑅 , denoted
𝑅 . Recall that
𝑆𝑆 𝑆𝑆
𝑅 = = 1−
𝑆𝑆 𝑆𝑆
𝑅 has the same interpretation as before, but with respect to 𝑘 independent variables.
(i.e. 𝑹𝟐 𝟏𝟎𝟎%of the variation in 𝒀 can be explained by using the independent variables to
predict 𝒀)
Notes:
1) Use 𝑅 as a measure of fit when the sample size is substantially larger than the number of
variables in the model; otherwise, 𝑅 may be artificially high.
2) As more variables are added to the model, 𝑹𝟐 will always increase even if the additional
variables do a poor job of estimating 𝑌 (i.e. 𝑆𝑆 can never become larger with more predictor
variables and 𝑆𝑆 is always the same for a given set of responses). Therefore, some regression
model builders prefer to use adjusted R2.

The Adjusted 𝑹𝟐 , 𝑹𝟐𝑨𝒅𝒋
/( )
𝑅 =1− /( )
= 1− (1 − 𝑅 )
Note:
1. Since 𝑆𝑆 /(𝑛 − 𝑘 − 1) is the residual mean square and 𝑆𝑆 /(𝑛 − 1) is constant regardless
of how many variables are in the model, the 𝑅 will only increase on adding a variable
to the model if the addition of the variable reduces the residual mean square.
2. The interpretation of 𝑅 is about the same as 𝑅 .
3. 𝑅 ≤𝑅 .
4. 𝑅 can be less than 0.
Relationship between 𝑭 and 𝑹𝟐
/ ( )
𝐹=( )/( )
= ( )

Reconsider Ex. 3.2. Calculate 𝑅 and the adjusted 𝑅 .
Solution:
𝑆𝑆 5382.4+168.4
𝑅 = = ------------------------------ = 0.9596
𝑆𝑆
5382.4+168.4+233.7
𝑛−1
𝑅 = 1− (1 − 𝑅 ) = 1- 24/22 (1-0.9596)=0.9559
𝑛−𝑘−1
Since 𝑅 0.9559
= ____________, 95.6%
approximately __________ of the variation in delivery time can be
explained by using cases and distance to predict delivery time.
R Output
Call:
Coefficients:
(Intercept) 2.341231 1.096730 2.135 0.044170 *
NoCases 1.615907 0.170735 9.464 3.25e-09 ***
Distance 0.014385 0.003613 3.981 0.000631 ***

---
----------------------------------------------
Response: Delivery
NoCases 1 5382.4 5382.4 506.619 < 2.2e-16 ***
Distance 1 168.4 168.4 15.851 0.0006312 ***
Residuals 22 233.7 10.6
Example 3.12:
Examine what happens to 𝑅 and 𝑅 , when additional variables are added to the model
Consider the model
𝑌 = 𝛽 +𝛽 𝑋 +𝛽 𝑋 +𝛽 𝑋 +𝛽 𝑋 +𝜀
The data for 𝑌, 𝑋 , 𝑋 , 𝑋 , 𝑋 for 𝑛 = 15 observations were input into R program below.
# To enter the data

y <- c(69, 118.5, 116.5, 125, 129.9, 135, 139.9, 147.9,160, 169.9, 134.9, 155, 169.9, 194.5, 209.9)
x1 <- c(6, 10, 10, 11, 13, 13, 13, 17, 19, 18, 13,18,17,20,21)
x2 <- c(1,2,2,2,1.7,2.5,2,2.5,2, 2,2,2,3,3,3)
x3 <- c(1,1,1,1,1,2,1,2,2,1,1,1,2,2,2)
x4 <- c(2,2,3,3,3,3,3,3,3,3,4,4,4,4,4)
#Use lm( ) function to fit a linear regressions
mod.fit1<-lm(formula = y~x1)
sum.fit1 <- summary(mod.fit1)
mod.fit2<-lm(formula = y~x1+x2)
mod.fit3<-lm(formula = y~x1+x2+x3)
mod.fit4<-lm(formula = y~x1+x2+x3+x4)
R.sq.values<-data.frame(model = c("x1", "x1, x2","x1, x2, x3","x1, x2, x3, x4" ), R.sq =
c(sum.fit1$r.squared, sum.fit2$r.squared, sum.fit3$r.squared, sum.fit4$r.squared ),
adj.R.sq = c(sum.fit1$adj.r.squared,sum.fit2$adj.r.squared, sum.fit3$adj.r.squared,
sum.fit4$adj.r.squared))
mod.fit1
mod.fit2
mod.fit3
mod.fit4
R.sq.values

Output:
model R.sq adj.R.sq
1 x1 0.9052093 0.8979178
2 x1, x2 0.9510411 0.9428813
3 x1, x2, x3 0.9703503 0.9622640
4 x1, x2, x3, x4 0.9713634 0.9599088
>mod.fit1
lm(formula = y ~ x1)
Coefficients:
(Intercept) x1
35.55 7.50
> mod.fit2
lm(formula = y ~ x1 + x2)
Coefficients:
(Intercept) x1 x2
18.62 5.75 19.49
> mod.fit3
lm(formula = Price ~ x1 + x2 + x3)
Coefficients:
(Intercept) x1 x2 x3
15.811 6.044 28.222 -14.664
> mod.fit4
lm(formula = y ~ x1 + x2 + x3 + x4)
Coefficients:
(Intercept) x1 x2 x3 x4
18.763 6.270 30.271 -16.203 -2.673
Model 𝑅 𝑅
𝑌 = 35.5 + 7.5𝑋 0.9052 0.8979
𝑌 = 18.62 + 5.75𝑋 + 19.49𝑋 0.9510 0.9429
𝑌 = 15.81 + 6.04𝑋 + 128.22𝑋 − 14.66𝑋 0.9704 0.9623
𝑌 = 18.76 + 6.27𝑋 + 30.3𝑋 − 16.2𝑋 − 2.67𝑋 0.9714 0.9599
a) Note that 𝑋 , 𝑋 and 𝑋 were significantly related to 𝑌 in the model, but 𝑋 was not.
When a variable is added to the model that “may not” be useful, the 𝑅 decreased. Thus,
the decrease in 𝑅 after 𝑋 is added to the model suggests that 𝑿𝟒 may not be useful in
estimating 𝒀.
b) Notice that 𝑅 increased after each variable was added to the model.

Note:
(the highest adjusted R the best)
𝑅 is mainly used to compare two or more models that use different numbers of predictor
variables.
Example 3.13:
𝑅 and 𝑅 were calculated for all possible subsets of three independent variables. The results are
as follow:
Subsets of Regression: 𝑌 versus 𝑋 , 𝑋 , 𝑋
Independent Variable 𝑅 𝑅
𝑋 0.9052 0.8979
𝑋 0.6948 0.6713
𝑋 0.5565 0.5223
𝑋 ,𝑋 0.9510 0.9429
𝑋 ,𝑋 0.9150 0.9008
𝑋 ,𝑋 0.7565 0.7159
𝑋 ,𝑋 ,𝑋 0.9519 0.9388 after adding X3, the
value decreases, X3
may not be useful
If you had to compare these models and choose the best one, which model would you choose?
Explain.
Solution:
X1,X2, since adjusted R^2 is the highest.
2.6. Polynomial in one Variable

Polynomial regression models may contain one, two or more than two predictor variables, and each
predictor variable may be present in various powers.
If you suspect that one independent variable 𝑿 affects the response 𝒀, but that relationship is
curvilinear rather than linear, then you might choose to fit a quadratic model:
𝑌 =𝛽 +𝛽 𝑥 +𝛽 𝑥 +𝜀
(need to calculate the xi)
where 𝑥 = 𝑋 − 𝑋 (xi – centered value)
𝛽 is the 𝑦 −intercept of the curve (i.e. when 𝑋 = 𝑋)
𝛽 is a linear effect coefficient/parameter
𝛽 is a quadratic effect coefficient/rate of curvature.

The term involving 𝑥 , called a quadratic term (or second-order term).When the curve opens
upward, the sign of 𝛽 is positive (see figure 2.2a); when the curve opens downward, the sign of
𝛽 is negative (See figure 2.2b). This polynomial model is a second-order model with one
predictor variable.
The response function for regression model is

𝐸(𝑌) = 𝛽 + 𝛽 𝑥 + 𝛽 𝑥
Figure 2.2a: Graphs for Quadratic Models when 11 > 0 (Concave up)
5
Y
0 1 2 3
Figure 2.2b: Graphs for Quadratic Models when 11 < 0 (Concave down)
-1
-2
-3
-4
Y
-5
-6
-7
-8
-9
0 1 2 3
Example 3.14:
In all-electric homes, the amount of electricity expended is of interest to consumers. Suppose we
wish to investigate the monthly electric usage, 𝑌, in all-electric homes and its relationship to the
size, 𝑋, of the home. Moreover, suppose we think that monthly electrical usage in all-electric homes
is related to the size of the home by the quadratic model
𝑦 =𝛽 +𝛽 𝑥 +𝛽 𝑥 +𝜀
To fit the model, the values of 𝑌 and 𝑋 are collected for 10 homes during a particular month. The
data are shown below:

Size of Home, 𝑋 (Sq. ft) Monthly Usage 𝑌 (Kilowatt-hours)

1,290 1,180
1,350 1,172
1,470 1,264
1,600 1,493
1,710 1,571
1,840 1,711
1,980 1,804
2,230 1,840
2,400 1,956
2,930 1954
a. Fit the data to a regression model. Plot the fitted regression function and the data? Does the
quadratic regression function appear to be a good fit here? Find 𝑅 .
b. Explain why the value 𝛽 = 1703 has no practical interpretation.
c. Explain why the value 𝛽 = 0.7068 should not be interpret as a slope.
d. Examine the value of 𝛽 to determine the nature of the curvature (concave upward or
downward) in the sample data.
e. Test whether or not there is a regression relation; use 𝛼 = 0.01.
f. Is there sufficient evidence of concave down curvature in the electric-home size relationship?
Test with 𝛼 = 0.01.
g. Estimate the mean electric usage for all 1200 sq ft. houses with 95% confidence interval.
Interpret your interval.
h. Predict the electric usage for a 1200 sq ft. house with 95% confidence interval. Interpret your
interval.
R-codes
# To enter the data
Size <- c(1290,1350,1470,1600,1710,1840,1980,2230,2400,2930)
Usage <-c(1182,1172,1264,1493,1571,1711,1804,1840,1956,1954)
sSize <- Size-mean(Size)
# To fit the regression function. I() = identity function, use to prevent special # interpretation of
operators in a model formula.
Electric.Reg <- lm(Usage ~ sSize + I(sSize^2))
summary(Electric.Reg)
# Plot the fitted regression function and the data
plot(x = sSize, y = Usage, xlab = “Size”, ylab =
“Usage”, main = “Usage vs. Size”, col = “red”, pch = 19,cex=1.5)

curve(expr = predict(object = Electric.Reg, newdata =

data.frame(sSize = x)), col = “blue”, lty = “solid”, (solid line)
(line size)lwd = 1, add = TRUE, from = min(sSize), to =
max(sSize))
Output:
lm(formula = Usage ~ sSize + I(sSize^2))
Residuals:
-73.792 -22.426 5.886 31.689 52.436
Coefficients:
(Intercept) 1.703e+03 2.054e+01 82.914 9.77e-12 *** p value for i test
sSize 7.068e-01 3.723e-02 18.985 2.80e-07 ***
I(sSize^2) -4.500e-04 5.908e-05 -7.618 0.000124 ***
Signif. Codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ‘ 1

Multiple R-Squared: 0.9819, Adjusted R-squared: 0.9767
F-statistic: 189.7 on 2 and 7 DF, p-value: 8e-07
Note:
Notice how the I() (identity) function is used in the formula statements of lm(). The I() function
helps to protect the meaning of what is inside of it. Note that just saying age^2 without the function
will not work properly. Thus, age^2 just means age to R because there are no other terms with it.
Solution:
a. Fit the data to a regression model. Plot the fitted regression function and the data?
Does the quadratic regression function appear to be a good fit here? Find 𝑹𝟐 .
Fitted regression equation: 𝑦 = 1703+0.7068-0.00045X^2

Usage vs. Size

1800
1600
Usage
1400
1200
-500 0 500 1000
Size
The figure above illustrates that the electrical usage appears to __________________________
increase the curvillinear matter
with the size of the home. This provides some support for the inclusion of the quadratic term 𝑥
in the model and hence provides a good fit to the data.
R2 =0.9819
This implies that almost ____________
98.19% of the sample variation in electrical usage (𝑌) can be
explained by the quadratic model.
b. Explain why the value 𝜷𝟎 = 1703 has no practical interpretation.

Since the range of x does not include 0 (a home with 0 square feet), thus the interpretation
of B0 is not meaningful.
c.
d. Explain why the value 𝜷𝟏 = 0.7068 should not be interpret as a slope.
𝛽 = 0.7068 is no longer representing a slope in the presence of the quadratic term 𝑥 .
The estimated coefficient of the first-order 𝑥 __________________________________
will not have meaningful interpretation in the
quadratic model.
e. Examine the value of 𝜷𝟏𝟏 to determine the nature of the curvature (concave upward
or downward) in the sample data.
𝛽 = -0.000454 . The negative sign of 𝛽 downwards
indicates that the curve is concave _________.

f. Test whether or not there is a regression relation; use 𝜶 = 𝟎. 𝟎𝟏.
H0: β1 = β11= 0
H1: at least one of the β1 or β11 is not zero.
F = 189.7 , v1 = 2, v7 = 7
p-value = 0,0000008 < 𝛼 = 0.01
Reject H0
________________, we have sufficient evidence to say that the overall model is a useful
model to predict the electrical usage.
g. Is there sufficient evidence of concave down curvature in the electric-home size

relationship? Test with 𝜶 = 𝟎. 𝟎𝟏
H0: B11 =0
H1: B11<0
t = -7.618 , p – value =0.000124/2=0.000062 < 0.01

_____________________.
Reject H0 There is a statistical significant evidence of the concave down
curvature in the electric-home size relationship.
h. Estimate the mean electric usage for all 1200 sq ft. houses with 95% confidence
interval. Interpret your interval.
# Part (g) : Construct a 95% CI for the mean for yh

ND <- data.frame(sSize=c(1200-mean(Size)))
CIM <-predict(object = Electric.Reg, newdata = ND, se.fit = TRUE,interval=c(“confidence”),
level=0.95)
CIM
Output:
> CIM
$fit
fit lwr upr
[1,] 1014.514 925.4332 1103.596
$se.fit
[1] 37.67245
$df
[1] 7
$residual.scale
[1] 46.80133

With 95% confidence, we could conclude that the mean electric usage for all 1200 sq ft. houses
925.43
falls between __________ Kilowatt-hours and _____________
1103.60 Kilowatt-hours.
i. Predict the electric usage for a 1200 sq ft. house with 95% confidence interval.
Interpret your interval.
# Part (h):Construct a 95% PI for yh
PI <-predict(object = Electric.Reg, newdata = ND, se.fit = TRUE,interval=c("prediction"),
level=0.95)
PI
Output:
> PI
$fit
fit lwr upr
[1,] 1014.514 872.4483 1156.580
$se.fit
[1] 37.67245
$df
[1] 7
$residual.scale
[1] 46.80133
With 95% confidence, we could conclude that the predicted electric usage for a 1200 sq ft. house
falls somewhere between ____________
872.45 Kilowatt-hours and ______________
1156.58 Kilowatt-hours.
2.7. Interaction Models

In first-order model, the relationship between 𝐸(𝑌) and any one independent variable does not
depend on the values of the other independent variables in the models.
However, if the relationship between 𝐸(𝑌) and 𝑋 does, in fact, depend on the values of the
remaining 𝑋’s held fixed, then the first-order model is not appropriate for predicting 𝑌. In this case,
we need another model that will take into account this dependence. Such model includes the cross
products of two or more 𝑋’s.
An Interaction Model relating 𝐸(𝑌) to two Quantitative Independent Variables is
𝐸(𝑌) = 𝛽 + 𝛽 𝑋 + 𝛽 𝑋 + 𝛽 𝑋 𝑋
where 𝛽 represents the interaction effect coefficient for interaction between pair of predictor
variable 𝑋 and 𝑋 .

𝛽 + 𝛽 𝑋 represents the change in 𝑌 for every 1-unit increase in 𝑋 , holding 𝑿𝟐 fixed.

𝛽 + 𝛽 𝑋 represents the change in 𝑌 for every 1-unit increase in 𝑋 , holding 𝑿𝟏 fixed.
The interaction model traces a ruled (twisted plane) in three dimensional space (shown in Figure
2.3).
Figure 2.3: Computer-Generated Graph for an Interaction Model
3D regression plane
22
-10
3
1
-26
3 -1 x2
1
-1 -3
x1 -3
Example 3.15:
A collector of antique clocks knows that the price received for the clocks increases with the age of
the clocks. Moreover, the collector believes that the rate of increase of the auction price with age
will be driven upward by a large number of bidders. Consequently, the interaction model is
proposed:
𝑌 =𝛽 +𝛽 𝑋 +𝛽 𝑋 +𝛽 𝑋 𝑋 +𝜀
A sample of 32 auction prices of clocks, along with their age and the number of bidders, is given
below.
Age, x1 127 115 127 150 156 182 156 132 137 113 137
# of bidders, x2 13 12 7 9 6 11 12 10 9 9 15
Auction Price, y 1235 1080 845 1522 1047 1979 1822 1253 1297 946 1713
Age, x1 117 137 153 117 126 170 182 162 184 143 159
# of bidders, x2 11 8 6 13 10 14 8 11 10 6 9
Auction Price, y 1024 1147 1092 1152 1336 2131 1550 1884 2041 845 1483
Age, x1 108 175 108 179 111 187 111 115 194 168
# of bidders, x2 14 8 6 9 15 8 7 7 5 7
Auction Price, y 1055 1545 729 1792 1175 1593 785 744 1356 1262
The 32 data points were used to fit the model with interaction.

R Codes:
# to enter the data
Age <- c(127, 115, 127, 150, 156, 182, 156, 132, 137, 113, 137, 117, 137, 153, 117, 126, 170,
182, 162, 184, 143, 159, 108, 175, 108, 179, 111, 187, 111, 115, 194, 168)
Bidder <- c(13, 12, 7, 9, 6, 11, 12, 10, 9, 9, 15, 11, 8, 6, 13, 10, 14, 8, 11, 10, 6, 9, 14, 8, 6, 9, 15,
8, 7, 7, 5, 7)
Price <- c(1235, 1080, 845, 1522, 1047, 1979, 1822, 1253, 1297, 946, 1713, 1024, 1147, 1092,
1152, 1336, 2131, 1550, 1884, 2041, 845, 1483, 1055, 1545, 729, 1792, 1175, 1593, 785, 744,
1356, 1262)
# To fit the model
auc.reg <- lm(Price ~ Age+Bidder+I(Age*Bidder))
summary(auc.reg)
R Output:
Call:
lm(formula = Price ~ Age + Bidder + I(Age * Bidder))
Residuals:
-154.995 -70.431 2.069 47.880 202.259
Coefficients:
(Intercept) 320.4580 295.1413 1.086 0.28684
Age 0.8781 2.0322 0.432 0.66896
Bidder -93.2648 29.8916 -3.120 0.00416
I(Age * Bidder) 1.2978 0.2123 6.112 1.35e-06

Multiple R-Squared: 0.9539, Adjusted R-squared: 0.9489
F-statistic: 193 on 3 and 28 DF, p-value: < 2.2e-16
R Codes:
auc.anv <-anova(auc.reg)
SSR <- auc.reg$"Sum Sq"[1:3]
Fstat <- (SSR/sum(auc.anv$"Df"[1:3]))/ (auc.anv$"Sum Sq"[4]/sum(auc.anv$"Df"[4]))
auc.anv
Fstat
#p-value for Fstat

1-pf(Fstat,3,28)

R Output:
Response: Price
Age 1 2555224 2555224 323.209 < 2.2e-16 ***
Bidder 1 1727838 1727838 218.554 9.382e-15 ***
I(Age * Bidder) 1 295364 295364 37.361 1.353e-06 ***
Residuals 28 221362 7906
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#F-statistic
[1] 193.0411
#p-value for Fstat

[1] 0
a. Test the usefulness of the regression model using 𝐹 −test at 𝛼 = 0.05

b. Test the hypothesis (at 𝛼 = 0.05) that the price-age slope increases as the number of bidders
increase, i.e. the age and number of bidders interact positively.
c. Estimate the change in auction price of a 150-year-old clock, 𝑌, for each additional bidder.
Solution:
(a) H0: 𝛽 = 𝛽 = 𝛽 =0
H1: At least one of the β’j is not 0.
F= , p – value = < 0.05
Reject H0, we can conclude that there is statistically evidence that the regression model is useful in
estimating auction price.
(b) H0:
H1:
t= , p- value < 0.05
Reject H0. We have sufficient evidence to prove that the price-age slope increases as the number of
bidders increase, i.e. the age and number of bidders interact positively.

( )
(c) x1 = 150, =𝛽 +𝛽 𝑥
Note:𝑌 = 𝛽 + 𝛽 𝑥 − 𝛽 𝑥 + 𝛽 𝑥 𝑥 = 320.4580 + 0.8781𝑥 − 93.2648𝑥 + 1.2978𝑥 𝑥

β2 = , β12 =
The estimate change in auction price,
Note:
Once interaction effect is significant in the model 𝐸(𝑌) = 𝛽 + 𝛽 𝑋 + 𝛽 𝑋 + 𝛽 𝑋 𝑋 , do not
conduct 𝒕 −test on the 𝜷 coefficients of the first order terms 𝑿𝟏 and 𝑿𝟐 . These terms should
be kept in the model regardless of the magnitude of their associated 𝑝 −values.
Example 3.15.1:
Refer to Ex. 2.7. Real Estate
R-Output
Call:
lm(formula = Price ~ Area + Bath + Floor + Bedroom, data = Real.dat)
Residuals:
-12.700 -1.616 0.984 2.510 11.759
Coefficients:
(Intercept) 18.7633 9.2074 2.038 0.06889 .
Area 6.2698 0.7252 8.645 5.93e-06 ***
Bath 30.2705 6.8487 4.420 0.00129 **
Floor -16.2033 6.2121 -2.608 0.02611 *
Bedroom -2.6730 4.4939 -0.595 0.56519
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1


-----------------------------------------------------------------------
Call:
lm(formula = Price ~ Area + Bath + Floor + Bedroom + I(Area *
Bath) + I(Area * Floor) + I(Area * Bedroom), data = Real.dat)
Residuals:
-12.6810 -1.7298 0.2067 2.5028 6.8279
Coefficients:
(Intercept) 46.1270 31.4595 1.466 0.1860
Area 4.8124 2.3829 2.020 0.0832 .
Bath 11.6684 22.1825 0.526 0.6151
Floor -40.3080 26.5576 -1.518 0.1729
Bedroom 9.3871 17.3444 0.541 0.6051
I(Area * Bath) 1.4876 1.6551 0.899 0.3986
I(Area * Floor) 1.2242 1.7087 0.716 0.4969
I(Area * Bedroom) -0.9592 1.2965 -0.740 0.4835
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1


2.8. Using Quantitative and Qualitative predictor variables in a

Regression Model
Multiple regression models can also be written to include qualitative predictor variables. Qualitative
variables, unlike quantitative variables, cannot be measured on a numerical scale. Therefore, we
must code the values of the qualitative variable (called levels) as number before we can fit the
model. These coded qualitative variables are called dummy (or indicator) variables.
Example 3.16:
To enter gender as a variable, use
1, if male
𝑋 =
0, if female
Qualitative variables that involve 𝑘 categories are entered into the model by using 𝑘 − 1 dummy
variables.
Example 3.17:
In a model that relates the mean salary of group of employees to a number of predictor variables,
you may want to include the employee’s ethnic background. If each employee included in your
study belongs to one of the three ethnic groups – say, A, B, or C –you can enter the qualitative
variable “ethnicity” into your model using two dummy variables:
1, if group B 1, if group C
𝑋 = ,𝑋 =
0, if not 0, if not
The model is 𝐸(𝑌) = 𝛽 + 𝛽 𝑋 + 𝛽 𝑋 .
For employees in group A:
𝐸(𝑌) = 𝛽 + 𝛽 (0) + 𝛽 (0) = 𝛽
For employees in group B:
𝐸(𝑌) = 𝛽 + 𝛽 (1) + 𝛽 (0) = 𝛽 + 𝛽
For employees in group C:
𝐸(𝑌) = 𝛽 + 𝛽 (0) + 𝛽 (1) = 𝛽 + 𝛽
The model allows a different average response for each group.
𝛽 measures the average response for group A.
𝛽 measures the difference in the average responses between groups B and A.
𝛽 measures the difference in the average responses between groups C and A.

2.9. Models with Both Quantitative and Qualitative Variables
A model may contain predictor variables – qualitative or quantitative as well as cross-products

(interaction) of the dummy variables with other variables that appear in the model.
Example 3.18:
Consider the following model:
𝐸(𝑌) = 𝛽 + 𝛽 𝑋 + 𝛽 𝑋
where 𝑌 is the annual salary of a college lecturer,
𝑋 is the number of years of teaching experience,
1, if male college lecturer
𝑋 =
0, otherwise
Model above contains one quantitative variable (years of teaching experience) and one qualitative
variable (gender) which has two categories, i.e. male and female.
Mean salary for female college lecturer: 𝐸(𝑌) = 𝛽 + 𝛽 𝑋 + 𝛽 (0) = 𝛽 + 𝛽 𝑋
Mean salary for male college lecturer: 𝐸(𝑌) = 𝛽 + 𝛽 𝑋 + 𝛽 (1) = (𝛽 + 𝛽 ) + 𝛽 𝑋
Geometrically, the graph of 𝐸(𝑌) = 𝛽 + 𝛽 𝑋 + 𝛽 𝑋 is shown in Figure 2.4. In words, 𝐸(𝑌) =

𝛽 + 𝛽 𝑋 + 𝛽 𝑋 assumed that the level of male lecturer’s mean salary is different from that of
the female lecturer’s mean salary by 𝛽 but the rate of change in the mean annual salary by years
of experience is the same for both genders.
If the assumption of common slope is valid (can be tested later), a test of the hypothesis that the
two regressions have same intercept (i.e., there is no gender discrimination) can be conducted
based 𝐻 : 𝛽 = 0 using 𝑡 −test.
Figure 2.4
Y (Salary)
Male Teacher
Female Teacher
2
0
Years of Experience

The fact that the slopes of the two lines may differ means that the two predictor variables interact;
that is, the change in 𝐸(𝑌) corresponding to a change in 𝑿𝟏 depends on whether the lecturer is
a man or a woman. To allow for this interaction, the interaction term 𝑋 𝑋 is introduced into the
model.
𝐸(𝑌) = 𝛽 + 𝛽 𝑋 + 𝛽 𝑋 + 𝛽 𝑋 𝑋
Mean salary for female college lecturer:
𝐸(𝑌) = 𝛽 + 𝛽 𝑋 + 𝛽 (0) + 𝛽 𝑋 (0) = 𝛽 + 𝛽 𝑋
which is a straight line with slope 𝜷𝟏 and intercept 𝜷𝟎 (see Figure 2.4).
Mean salary for male college lecturer:

𝐸(𝑌) = 𝛽 + 𝛽 𝑋 + 𝛽 (1) + 𝛽 𝑋 (1) = 𝛽 + 𝛽 + (𝛽 + 𝛽 )𝑋
which is a straight line with slope 𝜷𝟏 + 𝜷𝟏𝟐 and intercept 𝜷𝟎 + 𝜷𝟐 (see Figure 2.5).
Figure 2.5:
Y (Salary)
Male Teacher
 1 +  12
Female Teacher
1
2
0
Years of Experience
The two lines have different slopes and different intercepts, which allows the relationship between
salary 𝑌 and years of experience 𝑋 to behave differently for men and women.

Example 3.19:
Table below gives hypothetical data on starting annual salaries and years of experience of 10 college
lecturers.
Years of Experience, 𝑋 Salary for Men (in RM1000) Salary for women (in RM1000)
5 27 24
4 26.7 23
3 26 23.5
2 25.5 22
1 26.2 22.5
a. Write a model for mean salary, 𝐸(𝑌).

𝐸(𝑌) = 𝛽 + 𝛽 𝑋 + 𝛽 𝑋 + 𝛽 𝑋 𝑋
Y= annual salary
1, if male college lecturer
𝑋 = Years of Experience, 𝑋 =
0, otherwise
R-Codes:
# to enter the data
Year <- c(5,5,4,4,3,3,2,2,1,1)
Gender <- c(1,0,1,0,1,0,1,0,1,0)
Salary <- c(27,24,26.7,23,26,23.5,25.5,22,26.2,22.5)
# Fit the model
Pay.Reg <- lm(Salary~Year+Gender+I(Year*Gender))
summary(Pay.Reg)
curve(expr = Pay.Reg$coefficients[1] +
Pay.Reg$coefficients[2]*x +
Pay.Reg$coefficients[3] +
Pay.Reg$coefficients[4]*x, col = "red", lty =
"solid", lwd = 2, xlim = c(1,6), ylim = c(22,27), xlab = "Year", ylab = "Salary", main =
"Salary vs. Years", panel.first = grid(col = "gray", lty = "dotted"))
curve(expr = Pay.Reg$coefficients[1] +
Pay.Reg$coefficients[2]*x, col = "blue", lty =
"solid", lwd = 2,xlim = c(1,6), ylim = c(22,27), add = TRUE)
legend(x=4.55, y=26.5, legend = c("Male", "Female"), col = c("red", "blue"), lty ="solid", bty =
"n", cex = 1, lwd = 2)

lm(formula = Salary ~ Year + Gender + I(Year * Gender))

Coefficients:
(Intercept) 21.8000 0.5251 41.516 1.31e-08
Year 0.4000 0.1583 2.526 0.04489
Gender 3.6400 0.7426 4.902 0.00271
I(Year * Gender) -0.1200 0.2239 -0.536 0.61127
b. Fit the model and graph the prediction equations for Men and Woman lecturer
Salary vs. Years
27
Male
26
Female
25
Salary
24
23
22
1 2 3 4 5 6
Year
𝑦=
The prediction equation for woman X2= 0,𝑦 = 21.8 + 0.4𝑋 + 3.64(0) − 0.12𝑋 (0)
=
The prediction equation for men X2=1 , 𝑦 = 21.8 + 0.4𝑋 + 3.64(1) − 0.12𝑋 (1)
=
c. Use the prediction equation to find the mean salary for male with 3.5 years of
experience.
𝑦=
ND<- data.frame(Year = c(3.5), Gender = c(1))

CIM <-predict(object = Pay.Reg, newdata = ND, se.fit = TRUE,interval=c("confidence"),
level=0.95)
CIM
$fit
fit lwr upr
[1,] 26.42 25.83889 27.00111
$se.fit
[1] 0.2374868
$df

[1] 6
d. Find a 95% confidence interval for the mean salary of all male lecturers with 3.5 years of
experience.
With 95% confidence, the mean salary of all male lecturers with 3.5 years of experience is
between ______________ and ________________.
i) A CI represents an inference on a parameter and is an interval that is intended to cover

the value of the parameter.
ii) A PI is a statement about the value to be taken by a random variable, the new observation.
R-Codes:
ND<- data.frame(Year = c(3.5), Gender = c(1))
PI <- predict(object = Pay.Reg, newdata = ND, se.fit = TRUE,interval=c("prediction"),
level=0.95)
PI
$fit
fit lwr upr
1 26.42 25.06408 27.77592
$se.fit
[1] 0.2374868
$df
e. Find a 95% prediction interval for the salary of a male lecturer with 3.5 years of
experience.
With 95% confidence, the salary of a male lecturer with 3.5 years of experience is between
____________ and ________________.
f. Do the data provide sufficient evidence to indicate that the annual rate of increase in female
lecturer salaries exceeds the annual rate of increase in male lecturer salaries? Test at 𝛼 =
0.1
Coefficients:
(Intercept) 21.8000 0.5251 41.516 1.31e-08 ***
Year 0.4000 0.1583 2.526 0.04489 *
Gender 3.6400 0.7426 4.902 0.00271 **

I(Year * Gender) -0.1200 0.2239 -0.536 0.61127
Since β12 measures the difference in slopes, the slope of the 2 lines will be identified if β12= 0.
H0: β12 = 0 H1: β12 < 0
t= , p – value = > 0.10
____________________, the data do not provide sufficient evidence to indicate that the annual rate
of increase in female lecturer salaries exceeds the annual rate of increase in male lecturer salaries.
Remark:
If the indicator variable is defined to be
1, if female college lecturer
𝑋 =
0, otherwise
with the same model for mean salary, E(Y) as
𝐸(𝑌) = 𝛽 + 𝛽 𝑋 + 𝛽 𝑋 + 𝛽 𝑋 𝑋
where 𝑌is the annual salary of a college lecturer,
𝑋 is the number of years of teaching experience,
R-code
Year <- c(5,5,4,4,3,3,2,2,1,1)
Gender <- c(0,1,0,1,0,1,0,1,0,1)
Salary <- c(27,24,26.7,23,26,23.5,25.5,22,26.2,22.5)
Pay.Reg <- lm(Salary~Year+Gender+I(Year*Gender))
summary(Pay.Reg)
R-output:
Residuals:
-0.600 -0.370 0.150 0.275 0.500
Coefficients:
(Intercept) 25.4400 0.5251 48.448 5.19e-09 ***
Year 0.2800 0.1583 1.769 0.12738
Gender -3.6400 0.7426 -4.902 0.00271 **
I(Year * Gender) 0.1200 0.2239 0.536 0.61127
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1


F-statistic: 38.94 on 3 and 6 DF, p-value: 0.0002504
Then the fitted equation will be

𝑦= 25.44 + 0.28 𝑋 – 3.64𝑋 + 0.12 𝑋 𝑋
The prediction equation for women can be obtained by substituting 𝑋 = 1.
𝑦 = 25.44 + 0.28 𝑋 – 3.64(1) + 0.12𝑋 (1)
= 21.8 + 0.4 𝑋 (same result as previous one)
The prediction equation for men can be obtained by substituting 𝑋 = 0.
𝑦 = 25.44 + 0.28 𝑋 – 3.64(0) + 0.12 𝑋 (0)
= 25.44 +0.28 𝑋 (same result as previous one)

Chapter 3 - Multiple Linear Regression

Uploaded by

Chapter 3 - Multiple Linear Regression

Uploaded by

Chapter 3 - Multiple Linear Regression

College GPA = 𝛽 + 𝛽 (HSGPA) + 𝛽 (ACT Scores) + 𝛽 (Rank) + 𝜀

where the parameters 𝛽 , 𝛽 , 𝛽 , and 𝛽 would be estimated from the data.

2. Multiple Linear Regression Model and Assumptions

Assume that 𝜀 ~ 𝑁𝐼𝐷(0, 𝜎 ) and thus the mean of 𝑦 is 𝐸(𝑌) = 𝛽 + 𝛽 𝑋 + 𝛽 𝑋 +. . . +𝛽 𝑋 .

UECM2253 Applied Regression Analysis Chapter 3 - 1

2.1. Matrix Form of Multiple Linear Regression Model

The multiple linear regression model can be written in matrix notation as 𝑦 = 𝑋𝛽 + 𝜀

In general, 𝑦 is an 𝑛 × 1 vector of the observations, 𝑋 is an 𝑛 × 𝑝 matrix of the levels of the

2.2. Estimation of the Model Parameter

Least Squares Estimation of the Regression Coefficients

Data for Multiple Linear Regression

UECM2253 Applied Regression Analysis Chapter 3 - 2

Suppose that 𝜀 ~ 𝑁𝐼𝐷(0, 𝜎 ) then the least square function is given as

And in matrix form

𝑆(𝜷) = 𝜺 = (𝒚 − 𝑿𝜷) (𝒚 − 𝑿𝜷) .

Note that 𝑆(𝜷) may be expressed as

The least-square estimator must satisfy

The least-square estimator of 𝜷 is

The fitted multiple linear regression model is

Let 𝑪 = (𝑿 𝑿) , then the variance of 𝛽 , 𝑗 = 0, … , 𝑘 , is 𝜎 𝐶 and the covariance of 𝛽 and 𝛽 is

UECM2253 Applied Regression Analysis Chapter 3 - 3

where I is a 𝑛 × 𝑛 identity matrix.

R-Code for the matrix notation:

UECM2253 Applied Regression Analysis Chapter 3 - 4

# To obtain matrix for X’X

# To obtain matrix for X`Y

# To compute the inverse of X`X

# To compute the solution for beta

R-Codes to obtain the solutions for 𝛽

#Use lm( ) function to fit a linear regression

UECM2253 Applied Regression Analysis Chapter 3 - 5

Example 3.2: The Delivery Time

Observation No. Delivery Time, y (min) Number of Cases, 𝑥 Distance, 𝑥

a. How would you interpret the values of 𝛽 and 𝛽 ?

#Print data set

#Use the names command to know what columns are contained:

#Use lm( ) function to fit a linear regression

UECM2253 Applied Regression Analysis Chapter 3 - 6

The fitted regression equation is

𝛽= : The delivery time of a machine is expected to increase by ________

UECM2253 Applied Regression Analysis Chapter 3 - 7

Example 3.1.1: Inadequacy of Scatter Diagrams in Multiple Regression

UECM2253 Applied Regression Analysis Chapter 3 - 8

The residual mean square is

Note: 𝑀𝑆 is an unbiased estimator.

UECM2253 Applied Regression Analysis Chapter 3 - 9

R-Code for the matrix notation:

# To obtain matrix for X’X

# To obtain matrix for X`Y

# To compute the inverse of X`X

# To compute the solution for beta

UECM2253 Applied Regression Analysis Chapter 3 - 10

Exercise: Find MSE for Ex. 3.2, delivery time.

UECM2253 Applied Regression Analysis Chapter 3 - 11

2.3. Hypothesis Testing in Multiple Linear Regression

1. What is the overall adequacy of the model?

The Analysis of Variance (ANOVA) for Multiple Linear Regression

where 𝐽 is an 𝑛 × 𝑛 square matrix with all elements 1.

UECM2253 Applied Regression Analysis Chapter 3 - 12

The ANOVA Table

Test for Significance of Regression (Global Test of Model Adequacy)

Rejection of 𝐻 implies that at least one of the regressors 𝑥 , 𝑥 , … , 𝑥 contributes

Example 3.4: Delivery Time (refer Example 3.2)

UECM2253 Applied Regression Analysis Chapter 3 - 13

Df Sum Sq Mean Sq F value Pr(>F)

Reject H0, we have sufficient evidence to conclude that

Output for summary of regression model:

UECM2253 Applied Regression Analysis Chapter 3 - 14

Example 3.5: Brand liking

Analysis of Variance Table

Df Sum Sq Mean Sq F value Pr(>F)

UECM2253 Applied Regression Analysis Chapter 3 - 15

Testing the Significance of individual Regression Coefficients (Partial /

The null and alternative hypothesis are

𝐶 is the (𝑗 + 1)𝑡ℎ diagonal element of the (𝑋 ′ 𝑋) matrix.