Module 11 Unit 3 Multiple Linear Regression
Learning Outcomes:
(1) Develop an estimated multiple linear regression model to predict the value of
a dependent variable based on more than one independent variable.
(2) Interpret the constants in the estimated multiple linear regression equation.
Often a simple linear regression model is not adequate to explain the behavior of a
dependent or response variable. In reality, the dependent or response variable may be
better explained by more than one independent or predictor variable, hence the need for a
multiple linear regression model:
𝑌 = 𝛽0 + 𝛽1 𝑋1 + 𝛽2 𝑋2 + ⋯ + 𝛽𝑘 𝑋𝑘 + 𝜀
The intercept or regression constant, 𝛽0, of the regression model is the 𝑦-intercept of the
regression hyperplane (the counterpart of the straight line in simple linear regression). It
gives the mean value of 𝑌 when the independent variables are all equal to zero (provided zero is
within the scope of all the independent variables). The regression coefficients,
𝛽𝑖 for 𝑖 = 1, 2, 3, … , 𝑘, represent the change in the mean of the dependent variable
corresponding to a unit increase in 𝑋𝑖 when all the other independent variables are held
constant or fixed at some value.
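To make the "held constant" interpretation concrete, here is a minimal R sketch. The coefficients are made up for illustration only (they are not estimates from any data in this unit), and `predict_y` is a hypothetical helper name. It shows that raising 𝑋1 by one unit while 𝑋2 stays fixed changes the value of the regression equation by exactly the coefficient of 𝑋1.

```r
# Hypothetical coefficients, chosen only for illustration
b <- c(b0 = 10, b1 = 2.5, b2 = -4)

# Value of the regression equation at given X1 and X2
predict_y <- function(x1, x2) {
  unname(b["b0"] + b["b1"] * x1 + b["b2"] * x2)
}

y_base <- predict_y(x1 = 3, x2 = 7)   # X1 = 3, X2 = 7
y_plus <- predict_y(x1 = 4, x2 = 7)   # X1 raised by one unit, X2 held at 7

y_plus - y_base   # equals b1 = 2.5, whatever fixed value X2 takes
```

Repeating the comparison with a different fixed value of 𝑋2 gives the same difference, which is what "holding the other variables constant" means in a model without interaction terms.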
Property of and for the exclusive use of SLU. Reproduction, storing in a retrieval system, distributing, uploading or posting online, or transmitting in any form or by any
means, electronic, mechanical, photocopying, recording, or otherwise of any part of this document, without the prior written permission of SLU, is strictly prohibited. 205
Estimating the Regression Coefficients (By Method of Least Squares)
Given a sample of 𝑛 observations, the estimated multiple linear regression equation is
Ŷ = 𝑏0 + 𝑏1 𝑋1 + 𝑏2 𝑋2 + ⋯ + 𝑏𝑘 𝑋𝑘
where 𝑏0, 𝑏1, … , 𝑏𝑘 are the sample estimates of 𝛽0, 𝛽1, … , 𝛽𝑘, and 𝑌𝑖 is the observed
response to the values 𝑋1𝑖, 𝑋2𝑖, … , 𝑋𝑘𝑖 of the 𝑘 independent variables 𝑋1, 𝑋2, … , 𝑋𝑘.
The values of 𝑏0, 𝑏1, … , 𝑏𝑘 can be obtained by applying the method of least
squares, which generates the following set of 𝒌 + 𝟏 normal equations for multiple linear
regression:
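\[
\begin{aligned}
n b_0 + b_1 \sum_{i=1}^{n} X_{1i} + b_2 \sum_{i=1}^{n} X_{2i} + \cdots + b_k \sum_{i=1}^{n} X_{ki} &= \sum_{i=1}^{n} Y_i \\
b_0 \sum_{i=1}^{n} X_{1i} + b_1 \sum_{i=1}^{n} X_{1i}^2 + b_2 \sum_{i=1}^{n} X_{1i} X_{2i} + \cdots + b_k \sum_{i=1}^{n} X_{1i} X_{ki} &= \sum_{i=1}^{n} X_{1i} Y_i \\
&\;\;\vdots \\
b_0 \sum_{i=1}^{n} X_{ki} + b_1 \sum_{i=1}^{n} X_{ki} X_{1i} + b_2 \sum_{i=1}^{n} X_{ki} X_{2i} + \cdots + b_k \sum_{i=1}^{n} X_{ki}^2 &= \sum_{i=1}^{n} X_{ki} Y_i
\end{aligned}
\]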
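These equations can be solved for 𝑏0, 𝑏1, … , 𝑏𝑘 by any appropriate method for solving
systems of linear equations. In the case of two independent variables, the regression
constant and coefficients of the estimated multiple linear regression equation
Ŷ = 𝑏0 + 𝑏1 𝑋1 + 𝑏2 𝑋2
are obtained from the normal equations
\[
\begin{aligned}
n b_0 + b_1 \sum_{i=1}^{n} X_{1i} + b_2 \sum_{i=1}^{n} X_{2i} &= \sum_{i=1}^{n} Y_i \\
b_0 \sum_{i=1}^{n} X_{1i} + b_1 \sum_{i=1}^{n} X_{1i}^2 + b_2 \sum_{i=1}^{n} X_{1i} X_{2i} &= \sum_{i=1}^{n} X_{1i} Y_i \\
b_0 \sum_{i=1}^{n} X_{2i} + b_1 \sum_{i=1}^{n} X_{1i} X_{2i} + b_2 \sum_{i=1}^{n} X_{2i}^2 &= \sum_{i=1}^{n} X_{2i} Y_i
\end{aligned}
\]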
What results is a system of linear equations in three unknowns 𝑏0 , 𝑏1 , and 𝑏2 which can be
solved algebraically.
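As an illustration, the normal equations for two independent variables can be set up and solved numerically in R. The sketch below uses simulated data (an assumption made purely for demonstration, not the OmniPower data) and compares the solution with what `lm()` reports.

```r
# Simulated data, for illustration only
set.seed(1)
n  <- 20
X1 <- runif(n, 50, 100)
X2 <- runif(n, 200, 600)
Y  <- 5000 - 50 * X1 + 3 * X2 + rnorm(n, sd = 100)

# Coefficient matrix and right-hand side of the three normal equations
A <- rbind(
  c(n,       sum(X1),      sum(X2)),
  c(sum(X1), sum(X1^2),    sum(X1 * X2)),
  c(sum(X2), sum(X1 * X2), sum(X2^2))
)
rhs <- c(sum(Y), sum(X1 * Y), sum(X2 * Y))

# Solve the 3 x 3 linear system for b0, b1, b2
b <- solve(A, rhs)
names(b) <- c("b0", "b1", "b2")
b

# lm() should give the same estimates, up to rounding error
coef(lm(Y ~ X1 + X2))
```

In practice `lm()` is preferred because it uses a numerically more stable decomposition, but both approaches minimize the same sum of squared errors.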
Example:
Omni Foods, Inc. is planning a nationwide introduction of OmniPower, a new high-energy
bar. Originally marketed to runners, mountain climbers, and other athletes, high-energy
bars are now popular with the general public. Omni Foods is anxious to capture a share of
this thriving market. Because the marketplace already contains several successful energy
bars, the marketing manager needs to develop an effective marketing strategy. In
particular, he needs to determine the effect that price and in-store promotions will have on
sales of OmniPower. Before marketing the bar nationwide, he conducts a test-market study
of OmniPower sales using a sample of 34 stores in a supermarket chain. The following table
shows the price of the bar in cents (of a dollar), the monthly budget for in-store
promotional expenditures in dollars, and the number of OmniPower bars sold in a month.
Predict the monthly sales volume as a function of the price and in-store promotions
budget.
We first encode our data set in Excel, ensuring that the columns for the independent
variables, in this case Price (𝑋1) and Promotion (𝑋2), are side by side. The column for the
dependent variable, Sales (𝑌), may appear before or after the columns for the
independent variables. Make sure you remember which is which! For the
purpose of this example, however, the file is available as “salesvolume.csv”. Now, let us take a look
at the analysis using R.
R Script and Output
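The script that produced the output in this section does not survive in this copy. The following is a minimal sketch consistent with the reported `Call:` line and with the object names `sales` and `model` that appear in the output. The helper name `fit_sales_model` is hypothetical, and the sketch assumes that salesvolume.csv sits in the working directory with the columns store, Y, X1, and X2.

```r
# Assumed reconstruction of the analysis; not the original script.
# fit_sales_model is a hypothetical helper name.
fit_sales_model <- function(path = "salesvolume.csv") {
  sales <- read.csv(path)           # expected columns: store, Y, X1, X2
  print(head(sales))                # first six rows of the data
  lm(Y ~ X1 + X2, data = sales)     # Y = sales, X1 = price, X2 = promotion
}

# Run only if the data file is in the working directory
if (file.exists("salesvolume.csv")) {
  model <- fit_sales_model()
  print(summary(model))
}
```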
  store    Y X1  X2
1     1 4141 59 200
2     2 3842 59 200
3     3 3056 59 200
4     4 3519 59 200
5     5 4226 59 400
6     6 4630 59 400
# R Output
Call:
lm(formula = Y ~ X1 + X2, data = sales)

Residuals:
     Min       1Q   Median       3Q      Max
-1680.96  -406.40    53.45   297.48  1342.43

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) 5837.5208   628.1502   9.293 1.79e-10 ***
X1           -53.2173     6.8522  -7.766 9.20e-09 ***
X2             3.6131     0.6852   5.273 9.82e-06 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# To examine the coefficients table only
summary(model)$coefficients
# R Output
The coefficients table shows that all the coefficients of the estimated multiple linear
regression model are significant at the 5% significance level, since the p-value for each
coefficient is less than 0.05. In fact, each p-value is also less than 0.01, so the
coefficients are significant even at the 1% significance level.
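The t values and p-values in the coefficients table can be checked by hand from the reported estimates and standard errors. The sketch below assumes the usual residual degrees of freedom, 𝑛 − 𝑘 − 1 = 34 − 2 − 1 = 31, and a two-sided test of the hypothesis that each coefficient is zero.

```r
# Estimates and standard errors as reported in the coefficients table
est <- c("(Intercept)" = 5837.5208, X1 = -53.2173, X2 = 3.6131)
se  <- c("(Intercept)" = 628.1502,  X1 = 6.8522,   X2 = 0.6852)

n  <- 34            # stores in the sample
k  <- 2             # independent variables
df <- n - k - 1     # residual degrees of freedom (assumed: 31)

t_value <- est / se                     # t statistic for H0: coefficient = 0
p_value <- 2 * pt(-abs(t_value), df)    # two-sided p-value

round(t_value, 3)   # compare with the "t value" column
p_value < 0.05      # TRUE where the coefficient is significant at the 5% level
```

Because the inputs are rounded to four decimal places, the recomputed values agree with the table only up to rounding.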
To assess the significance of the multiple linear regression model as a whole, we refer to the
last row of the summary output, which is labeled F-statistic. The reported p-value for the model
is less than the 0.05 significance level; hence the model is significant, that is, it can be
used to predict or estimate monthly sales based on the price and the monthly in-store
promotional expenditure.
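With the estimates reported in the coefficients table, the estimated regression equation is Ŷ = 5837.5208 − 53.2173 𝑋1 + 3.6131 𝑋2. As a minimal sketch of how it is used for prediction, the R code below evaluates it at a price of 59 cents and a promotion budget of $400, values that appear in the first rows of the data. The variable names are illustrative.

```r
# Coefficients as reported in the R output
b <- c(b0 = 5837.5208, b1 = -53.2173, b2 = 3.6131)

price     <- 59    # X1, in cents
promotion <- 400   # X2, in dollars

# Estimated monthly sales from the fitted equation
predicted_sales <- unname(b["b0"] + b["b1"] * price + b["b2"] * promotion)
predicted_sales    # about 4143 bars

# If the fitted object `model` is available, the equivalent call would be
# predict(model, newdata = data.frame(X1 = 59, X2 = 400))
```

Read this way, each one-cent increase in price is associated with an estimated decrease of about 53.22 bars in monthly sales when the promotion budget is held fixed, and each additional dollar of promotion is associated with an estimated increase of about 3.61 bars when price is held fixed.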
As to the goodness of fit of the multiple linear regression model, we look at the
adjusted R-squared value, a modified version of R-squared that takes into account the
number of predictors in the model. In our output, the adjusted R-squared value is 0.7421.
That is, after adjusting for the number of predictors and the sample size, about 74.21% of the
variation in monthly sales is explained by the linear relationship between sales and the
independent variables price and in-store promotion.
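The standard relationship between the two measures is adjusted R² = 1 − (1 − R²)(𝑛 − 1)/(𝑛 − 𝑘 − 1). The sketch below codes this formula and its inverse. The function names are hypothetical, and the unadjusted R-squared is backed out from the reported 0.7421 rather than read from the output.

```r
# Adjusted R-squared from R-squared, sample size n, and number of predictors k
adjusted_r2 <- function(r2, n, k) {
  1 - (1 - r2) * (n - 1) / (n - k - 1)
}

# Inverse: the R-squared implied by a given adjusted R-squared
implied_r2 <- function(adj_r2, n, k) {
  1 - (1 - adj_r2) * (n - k - 1) / (n - 1)
}

r2 <- implied_r2(0.7421, n = 34, k = 2)
r2                               # about 0.758
adjusted_r2(r2, n = 34, k = 2)   # recovers 0.7421
```

The adjusted value is always at most the unadjusted one, and the gap widens as predictors are added without a matching gain in explained variation.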
Practice Exercise 11-3
Suppose that for a sample of 12 students taking MathBA 111, the following data were
recorded:
Use the R software to solve the following problem as directed. Construct a .csv file for the
given data. Present the problem followed by the R output on a .docx file. Give your
discussion/interpretation of the outputs. Save your work as LRA11-3<LASTNAME>.docx and
save your R script as LRA11-3<LASTNAME>.R.
The owner of Showtime Movie Theaters, Inc. would like to estimate weekly gross revenue as
a function of advertising expenditures. Historical data for a sample of 8 weeks follow.
a. Develop an estimated simple linear regression equation to predict the weekly gross
revenue from the amount of television advertising. Interpret the regression
coefficients and check their significance in the model. What do the values of 𝑟 and
𝑟² say about the association between the variables? (10 points)
b. Develop an estimated multiple linear regression equation to predict the weekly
gross revenue with both television advertising and newspaper advertising as the
independent variables. Interpret the regression coefficients and assess their
significance in the model. What does the adjusted 𝑟² say about the association
between the variables? (10 points)
c. What can you say about the estimated regression coefficient for television
advertising expenditures in (a) and in (b)? Interpret. (5 points)
d. What is the estimated weekly gross revenue for a week when $3500 is spent on
television advertising and $1800 is spent on newspaper advertising? (5 points)
Congratulations! You just completed all the modules and units for the Finals.
You are now ready to take the last examination for AE 311.
There are fewer topics to study now, so be confident and get a high score!
What else can we say but CONGRATULATIONS!
You have just completed all the modules in AE311. We know you have
learned a lot from them. You did your part, so you deserve the
knowledge you acquired and the skills you developed.
We want to say Thank You for being a part of this new normal of
teaching and learning this term. We hope you can help us improve the
future versions of this module by accomplishing our feedback form.