Module 11 Unit 3 Multiple Linear Regression

This document discusses multiple linear regression analysis. Multiple linear regression allows a dependent variable to be predicted from two or more independent variables. It extends simple linear regression to use additional predictor variables. The multiple linear regression equation and method of least squares for estimating the regression coefficients are presented. An example illustrates predicting monthly sales volume from price and in-store promotion budget.

Uploaded by

Beatriz Lorezco
MODULE 11: CORRELATION AND REGRESSION ANALYSIS

UNIT 3: MULTIPLE LINEAR REGRESSION


(For DECEMBER 14)

Learning Outcomes:
(1) Develop an estimated multiple linear regression model to predict the value of
a dependent variable based on more than one independent variable.
(2) Interpret the constants in the estimated multiple linear regression equation.

Many times a simple linear regression model is not adequate to explain the behavior of a
dependent or response variable. In reality, the dependent or response variable may be
better explained by more than one independent or predictor variable, hence the need for a
multiple linear regression model.

Multiple linear regression is an extension of simple linear regression. It is appropriate for
research questions where the relationship between two or more independent variables
and one dependent variable is of interest. Multiple regression allows the researcher to
make predictions of the dependent variable based on several independent variables.

In general, the multiple linear regression model can be written as:

𝑌 = 𝛽0 + 𝛽1 𝑋1 + 𝛽2 𝑋2 + ⋯ + 𝛽𝑘 𝑋𝑘 + 𝜀

where the variables are defined as follows:


𝑌 = the dependent variable
𝑋1 , 𝑋2 , . . . , 𝑋𝑘 = the explanatory variables or independent variables
𝛽0 = the regression constant
𝛽1 , 𝛽2 , . . ., 𝛽𝑘 = the regression coefficients or partial regression coefficients
𝜀 = the error term

The intercept or regression constant, 𝛽0 , of the regression model is the 𝑦-intercept of the
regression hyperplane (the counterpart of the straight line in simple linear regression). It
gives the mean value of 𝑌 when all the independent variables are equal to zero, provided
that zero is within the scope of all the independent variables. On the other hand, each
regression coefficient 𝛽𝑖 , for 𝑖 = 1, 2, 3, … , 𝑘, represents the change in the mean of the
dependent variable corresponding to a unit increase in 𝑋𝑖 when all the other independent
variables are held constant or fixed at some value.
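
The "unit increase" reading of a regression coefficient can be checked with one line of algebra. The step below is a sketch that assumes the error term 𝜀 has mean zero (a standard assumption that this module does not state explicitly). Comparing the mean response at two settings that differ only by one unit in 𝑋𝑖 , every other term cancels:

$$
\begin{aligned}
E(Y \mid X_1, \dots, X_i + 1, \dots, X_k) - E(Y \mid X_1, \dots, X_i, \dots, X_k)
&= \beta_i (X_i + 1) - \beta_i X_i \\
&= \beta_i
\end{aligned}
$$

The difference does not depend on the values at which the other independent variables are fixed, which is why the coefficient can be interpreted without specifying those values.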

Property of and for the exclusive use of SLU. Reproduction, storing in a retrieval system, distributing, uploading or posting online, or transmitting in any form or by any
means, electronic, mechanical, photocopying, recording, or otherwise of any part of this document, without the prior written permission of SLU, is strictly prohibited.

Estimating the Regression Coefficients (By Method of Least Squares)

We wish to fit the estimated multiple linear regression equation

Ŷ = 𝑏0 + 𝑏1 𝑋1 + 𝑏2 𝑋2 + ⋯ + 𝑏𝑘 𝑋𝑘

to the data points

(𝑋1𝑖 , 𝑋2𝑖 , … , 𝑋𝑘𝑖 , 𝑌𝑖 ), 𝑖 = 1, 2, … , 𝑛, with 𝑛 > 𝑘

where 𝑌𝑖 is the observed response to the values 𝑋1𝑖 , 𝑋2𝑖 , … , 𝑋𝑘𝑖 of the 𝑘 independent variables
𝑋1 , 𝑋2 , … , 𝑋𝑘 . The values of 𝑏0 , 𝑏1 , … , 𝑏𝑘 can be obtained by applying the method of least
squares, which generates the following set of 𝒌 + 𝟏 normal equations for multiple linear
regression:

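$$
\begin{aligned}
n b_0 + b_1 \sum_{i=1}^{n} X_{1i} + b_2 \sum_{i=1}^{n} X_{2i} + \cdots + b_k \sum_{i=1}^{n} X_{ki} &= \sum_{i=1}^{n} Y_i \\
b_0 \sum_{i=1}^{n} X_{1i} + b_1 \sum_{i=1}^{n} X_{1i}^{2} + b_2 \sum_{i=1}^{n} X_{1i} X_{2i} + \cdots + b_k \sum_{i=1}^{n} X_{1i} X_{ki} &= \sum_{i=1}^{n} X_{1i} Y_i \\
&\;\;\vdots \\
b_0 \sum_{i=1}^{n} X_{ki} + b_1 \sum_{i=1}^{n} X_{ki} X_{1i} + b_2 \sum_{i=1}^{n} X_{ki} X_{2i} + \cdots + b_k \sum_{i=1}^{n} X_{ki}^{2} &= \sum_{i=1}^{n} X_{ki} Y_i
\end{aligned}
$$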

These equations can be solved for 𝑏0 , 𝑏1 , … , 𝑏𝑘 by any appropriate method for solving
systems of linear equations. In the case of two independent variables, the regression
constant and coefficients of the estimated multiple linear regression equation

Ŷ = 𝑏0 + 𝑏1 𝑋1 + 𝑏2 𝑋2

can be found using the following normal equations:

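$$
\begin{aligned}
n b_0 + b_1 \sum_{i=1}^{n} X_{1i} + b_2 \sum_{i=1}^{n} X_{2i} &= \sum_{i=1}^{n} Y_i \\
b_0 \sum_{i=1}^{n} X_{1i} + b_1 \sum_{i=1}^{n} X_{1i}^{2} + b_2 \sum_{i=1}^{n} X_{1i} X_{2i} &= \sum_{i=1}^{n} X_{1i} Y_i \\
b_0 \sum_{i=1}^{n} X_{2i} + b_1 \sum_{i=1}^{n} X_{2i} X_{1i} + b_2 \sum_{i=1}^{n} X_{2i}^{2} &= \sum_{i=1}^{n} X_{2i} Y_i
\end{aligned}
$$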

What results is a system of linear equations in three unknowns 𝑏0 , 𝑏1 , and 𝑏2 which can be
solved algebraically.
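
As an illustration of solving the three normal equations numerically, the following R sketch builds the sums for a small data set and solves the resulting 3 × 3 system with solve(). The data here are hypothetical toy values, not taken from this module: the response is constructed to equal 1 + 2𝑋1 + 3𝑋2 exactly, so the expected solution is known in advance. The variable names (x1, x2, y, A, rhs) are likewise chosen only for this example.

```r
# Hypothetical toy data (NOT from the module): y is built to equal
# 1 + 2*x1 + 3*x2 exactly, so the least squares solution is known in advance.
x1 <- c(1, 2, 3, 4, 5, 6)
x2 <- c(2, 1, 4, 3, 6, 8)
y  <- 1 + 2 * x1 + 3 * x2
n  <- length(y)

# Left-hand side of the three normal equations
# (sample size, sums, sums of squares, and sums of cross-products)
A <- matrix(c(n,       sum(x1),      sum(x2),
              sum(x1), sum(x1^2),    sum(x1 * x2),
              sum(x2), sum(x1 * x2), sum(x2^2)),
            nrow = 3, byrow = TRUE)

# Right-hand side of the three normal equations
rhs <- c(sum(y), sum(x1 * y), sum(x2 * y))

# Solve the 3 x 3 linear system for b0, b1, and b2
b <- solve(A, rhs)
names(b) <- c("b0", "b1", "b2")
print(round(b, 4))

# lm() minimizes the same sum of squares, so it returns the same estimates
fit <- lm(y ~ x1 + x2)
print(round(coef(fit), 4))
```

Because the toy response lies exactly on a plane, both approaches should recover the constants 1, 2, and 3 up to rounding error. With real data the two sets of estimates still agree, but the residuals are no longer zero.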

The regression coefficients 𝑏0 , 𝑏1 , … , 𝑏𝑘 estimate the population regression coefficients
𝛽0 , 𝛽1 , … , 𝛽𝑘 and are interpreted in the same way: 𝑏𝑖 , for 𝑖 = 1, 2, 3, … , 𝑘, represents an estimate
of the change in the dependent variable corresponding to a unit increase in 𝑋𝑖 , provided
that the other independent variables are held constant or fixed at some value.

Example:
Omni Foods, Inc. is planning a nationwide introduction of OmniPower, a new high-energy
bar. Originally marketed to runners, mountain climbers, and other athletes, high-energy
bars are now popular with the general public. Omni Foods is anxious to capture a share of
this thriving market. Because the marketplace already contains several successful energy
bars, the marketing manager needs to develop an effective marketing strategy. In
particular, he needs to determine the effect that price and in-store promotions will have on
sales of OmniPower. Before marketing the bar nationwide, he conducts a test-market study
of OmniPower sales using a sample of 34 stores in a supermarket chain. The following table
shows the price of the bar in cents (of a dollar), the monthly budget for in-store
promotional expenditures in dollars, and the number of OmniPower bars sold in a month.
Predict the monthly sales volume as a function of the price and in-store promotions
budget.

We first encode our data set into Excel, ensuring that the columns for the independent
variables, in this case Price (𝑋1 ) and Promotion (𝑋2 ), are side by side. The column for the
dependent variable, Sales (𝑌), may appear before or after the columns for the
independent variables. Make sure you remember which is which! For the
purpose of this example, however, the file is already available as “salesvolume.csv”. Now, let us
take a look at the analysis using R.

R Script and Output

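# read.csv() is part of base R, so no additional package needs to be loaded.
# (The readr package is only needed if read_csv() is used instead.)

# Import the "salesvolume.csv" file and assign it to "sales".
sales <- read.csv("salesvolume.csv")

# Display the first six rows of the data
head(sales)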

  store    Y X1  X2
1     1 4141 59 200
2     2 3842 59 200
3     3 3056 59 200
4     4 3519 59 200
5     5 4226 59 400
6     6 4630 59 400
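
For readers who need to build such a file themselves, the sketch below shows one way the layout of the .csv could look. It uses only the six rows printed by head(sales) above, so it is not the full data set of 34 stores and should not be used to fit the model. The temporary file path is a hypothetical location chosen for this example.

```r
# The six rows exactly as printed by head(sales); the full file has 34 stores
sales_head <- data.frame(
  store = 1:6,
  Y  = c(4141L, 3842L, 3056L, 3519L, 4226L, 4630L),
  X1 = c(59L, 59L, 59L, 59L, 59L, 59L),
  X2 = c(200L, 200L, 200L, 200L, 400L, 400L)
)

# Write to a temporary file (hypothetical location), without row names
csv_path <- tempfile(fileext = ".csv")
write.csv(sales_head, csv_path, row.names = FALSE)

# Show the raw lines of the file: a header row, then one line per store
cat(readLines(csv_path), sep = "\n")

# Reading the file back gives a data frame with the same four columns
read_back <- read.csv(csv_path)
str(read_back)
```

The header row supplies the column names (store, Y, X1, X2) that the model formula refers to later in the script.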

# Build the linear regression model
model <- lm(Y ~ X1 + X2, data = sales)
summary(model)

# R Output

Call:
lm(formula = Y ~ X1 + X2, data = sales)

Residuals:
     Min       1Q   Median       3Q      Max
-1680.96  -406.40    53.45   297.48  1342.43

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) 5837.5208   628.1502   9.293 1.79e-10 ***
X1           -53.2173     6.8522  -7.766 9.20e-09 ***
X2             3.6131     0.6852   5.273 9.82e-06 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 638.1 on 31 degrees of freedom
Multiple R-squared:  0.7577,    Adjusted R-squared:  0.7421
F-statistic: 48.48 on 2 and 31 DF,  p-value: 2.863e-10

# To examine the coefficients table only
summary(model)$coefficients

# R Output

               Estimate  Std. Error   t value     Pr(>|t|)
(Intercept) 5837.520759 628.1502250  9.293192 1.791009e-10
X1           -53.217336   6.8522206 -7.766437 9.200160e-09
X2             3.613058   0.6852221  5.272828 9.821961e-06

At the 5% significance level, the coefficients table shows that all the coefficients of the
estimated multiple linear regression model are significant, since the p-value for each
coefficient is less than 0.05. In fact, each p-value is also less than 0.01, so the
coefficients remain significant even at the 1% significance level.
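
The t values and p-values in the coefficients table can be reproduced from the estimates and standard errors alone. The sketch below copies those two columns from the R output and assumes the usual two-sided t test with 𝑛 − 𝑘 − 1 = 34 − 2 − 1 = 31 degrees of freedom (the residual degrees of freedom reported in the summary). The object names are chosen only for this example.

```r
# Estimates and standard errors copied from the coefficients table
est <- c(Intercept = 5837.520759, X1 = -53.217336, X2 = 3.613058)
se  <- c(Intercept = 628.1502250, X1 = 6.8522206,  X2 = 0.6852221)

# Residual degrees of freedom: n - k - 1 = 34 - 2 - 1
df_resid <- 31

# t value = estimate divided by its standard error
t_value <- est / se

# Two-sided p-value from the t distribution
p_value <- 2 * pt(-abs(t_value), df = df_resid)

print(round(t_value, 3))
print(signif(p_value, 3))

# Check significance at the 5% and 1% levels
print(p_value < 0.05)
print(p_value < 0.01)
```

Each comparison should come out TRUE for all three coefficients, which is the basis for calling them significant at both levels.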

Hence, the estimated multiple linear regression equation is Ŷ = 5837.5208 − 53.2173𝑋1 +
3.6131𝑋2 , where Ŷ is the predicted monthly sales of OmniPower bars, 𝑋1 is the price of
OmniPower bars (in cents), and 𝑋2 is the monthly in-store promotional expenditure (in
dollars). Let us interpret the regression coefficients. From the equation, we see that if the
monthly budget for in-store promotions is held constant, the number of OmniPower bars
sold is estimated to decrease by about 53 bars for every one-cent increase in price. On the
other hand, if the price is held constant, increasing the in-store promotion budget by one
dollar is estimated to increase monthly sales by only about 3.6 bars. The intercept
(5837.5208) has no practical meaning since the price of the energy bars cannot be zero,
although in-store promotion can have a budget of $0.
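
To show how the estimated equation is used for prediction, the sketch below plugs values into it directly. The scenario (a price of 79 cents and a promotion budget of $400) is hypothetical and is assumed to lie within the range covered by the test-market data; the function name predict_sales is also made up for this example. When the fitted model object is available, predict(model, newdata = ...) gives the same kind of result.

```r
# Coefficients of the estimated regression equation (from the R output)
b <- c(b0 = 5837.5208, b1 = -53.2173, b2 = 3.6131)

# Hypothetical helper: predicted monthly sales for a given price (in cents)
# and monthly in-store promotion budget (in dollars)
predict_sales <- function(price_cents, promo_dollars) {
  unname(b["b0"] + b["b1"] * price_cents + b["b2"] * promo_dollars)
}

# Hypothetical scenario: price of 79 cents, promotion budget of $400
pred <- predict_sales(79, 400)
print(pred)

# Raising the price by one cent, with promotion held fixed,
# changes the prediction by exactly b1
diff_price <- predict_sales(80, 400) - predict_sales(79, 400)
print(diff_price)
```

By hand, the first prediction is 5837.5208 − 53.2173(79) + 3.6131(400) = 3078.5941, or about 3,079 bars for the month. The second printed value equals the price coefficient, which matches the interpretation given above.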

To assess the significance of the multiple linear regression model as a whole, we refer to
the last line of the summary output, which is labeled F-statistic. The reported p-value for
the model (2.863e-10) is less than the 0.05 significance level. This implies that the model is
significant, that is, it can be used to predict or estimate monthly sales based on the price
and the monthly in-store promotional expenditure.
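
The F-statistic and its p-value can be cross-checked from other numbers in the same output. The sketch below uses the standard relationship between the F-statistic and R-squared for a model with an intercept, F = (R²/𝑘) / ((1 − R²)/(𝑛 − 𝑘 − 1)), which this module does not state. Because the inputs are rounded values copied from the output, small rounding differences are expected.

```r
# Values copied from the summary output
r2     <- 0.7577   # Multiple R-squared
n      <- 34       # number of stores
k      <- 2        # number of independent variables
f_reported <- 48.48

# F-statistic recomputed from R-squared (approximate, since r2 is rounded)
f_stat <- (r2 / k) / ((1 - r2) / (n - k - 1))
print(round(f_stat, 2))

# Upper-tail p-value of the reported F-statistic with 2 and 31 degrees of freedom
p_model <- pf(f_reported, df1 = k, df2 = n - k - 1, lower.tail = FALSE)
print(signif(p_model, 4))

# Decision at the 5% significance level
print(p_model < 0.05)
```

The recomputed F-statistic should land very close to the reported 48.48, and the p-value should be of the same order as the reported 2.863e-10.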

As to the goodness of fit of the multiple linear regression model, we look at the
adjusted R-squared value. The adjusted R-squared is a modified version of R-squared
that takes into account the number of predictors in the model and the sample size. In our
output, the adjusted R-squared value is 0.7421. This means that 74.21% of the
variation in monthly sales is explained by the linear relationship between sales and the
independent variables price and in-store promotion, after adjusting for the number of
predictors and the sample size.
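
The adjustment can be reproduced from the unadjusted R-squared. The sketch below uses the standard formula, adjusted R² = 1 − (1 − R²)(𝑛 − 1)/(𝑛 − 𝑘 − 1), which is not written out in this module; the inputs are copied from the R output.

```r
# Values copied from the summary output
r2 <- 0.7577   # Multiple R-squared
n  <- 34       # number of stores
k  <- 2        # number of independent variables

# Adjusted R-squared penalizes R-squared for the number of predictors
adj_r2 <- 1 - (1 - r2) * (n - 1) / (n - k - 1)
print(round(adj_r2, 4))
```

The result should agree with the 0.7421 reported by R. The adjusted value is slightly lower than 0.7577 because two predictors were used with only 34 observations.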

Practice Exercise 11-3
Suppose that for a sample of 12 students taking MathBA 111, the following data were
recorded:

Student   Grade, 𝒀   Test Score, 𝑿𝟏   Classes Missed, 𝑿𝟐
1         85         65               1
2         74         50               7
3         76         55               5
4         90         65               2
5         85         55               6
6         87         70               3
7         94         65               2
8         98         70               5
9         81         55               4
10        91         70               3
11        76         50               1
12        74         55               4

Using the Regression tool in Microsoft Excel,

a. Find the estimated simple linear regression equation Ŷ = 𝑎 + 𝑏𝑋1 that will predict the
student’s grade from his/her test score. What do the values of 𝑟 and 𝑟² say about
the association between the variables?
b. Find the estimated multiple linear regression equation Ŷ = 𝑏0 + 𝑏1 𝑋1 + 𝑏2 𝑋2 that will
predict the student’s grade from his/her test score and number of classes missed. What
do the values of 𝑟 and 𝑟² say about the association between the variables?
c. Estimate the grade for a MathBA 111 student who has a test score of 60 and missed
4 classes.

Learning Reinforcement Activity No. 11-3: MULTIPLE LINEAR REGRESSION


Accomplish by December 14, 2020

Use the R software to solve the following problem as directed. Construct a .csv file for the
given data. Present the problem followed by the R output on a .docx file. Give your
discussion/interpretation of the outputs. Save your work as LRA11-3<LASTNAME>.docx and
save your R script as LRA11-3<LASTNAME>.R.

The owner of Showtime Movie Theaters, Inc. would like to estimate weekly gross revenue as
a function of advertising expenditures. Historical data for a sample of 8 weeks follow.

Weekly Gross Revenue   Television Advertising   Newspaper Advertising
($1000s)               ($1000s)                 ($1000s)
96                     5.0                      1.5
90                     2.0                      2.0
95                     4.0                      1.5
92                     2.5                      2.5
95                     3.0                      3.3
94                     3.5                      2.3
94                     2.5                      4.2
94                     3.0                      2.5

a. Develop an estimated simple linear regression equation to predict the weekly gross
revenue from the amount of television advertising. Interpret the regression
coefficients and check their significance in the model. What do the values of 𝑟 and
𝑟² say about the association between the variables? (10 points)
b. Develop an estimated multiple linear regression equation to predict the weekly
gross revenue with both television advertising and newspaper advertising as the
independent variables. Interpret the regression coefficients and assess their
significance in the model. What does the adjusted 𝑟² say about the association
between the variables? (10 points)
c. What can you say about the estimated regression coefficient for television
advertising expenditures from (a) and (b)? Interpret. (5 points)
d. What is the estimated weekly gross revenue for a week when $3500 is spent on
television advertising and $1800 is spent on newspaper advertising? (5 points)

Congratulations! You just completed all the modules and units for the Finals.
You are now ready to take the last examination for AE 311.
There are fewer topics to study now, so be confident and get a high score!

What else can we say but CONGRATULATIONS!

You have just completed all the modules in AE311. We know you have
learned a lot from them. You did your part, so you deserve those
learnings you acquired and the skills you developed.

In the future, when you are working in the business world,
you may need that knowledge and those skills.

We want to say Thank You for being a part of this new normal of
teaching and learning this term. We hope you can help us improve the
future versions of this module by accomplishing our feedback form.

May God bless you and your whole family.


Keep safe and stay healthy always.
