Multiple Linear Regression
Problem 1:
The dataset “Exam revision.sav” contains 40 observations, one per student, recording the exam score, hours spent revising, anxiety level and A-level entry points. Construct a multiple linear regression model to explain the effect of the three independent variables (hours spent revising, anxiety level and A-level entry points) on exam scores.
Problem 2:
The dataset “Net sales.sav” contains 27 observations recording annual net sales, number of square feet, inventory, amount spent on advertising, size of sales district and number of competing stores in the district. Construct a multiple linear regression model to show the effect of the five independent variables (number of square feet, inventory, amount spent on advertising, size of sales district and number of competing stores in the district) on annual net sales.
Problem 1
AIM:
To fit a regression model for the given data using SPSS
PROCEDURE:
Identify the dependent and the independent variables from the given data set
Independent variables: hours spent revising, anxiety, A-level entry points
Dependent variable: Exam scores
Multiple Linear Regression Model:
$$\text{Exam score} = \beta_0 + \beta_1(\text{hours spent revising}) + \beta_2(\text{anxiety}) + \beta_3(\text{A-level entry points}) + \varepsilon$$
Draw scatter plots of the given data.
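SPSS estimates β0, β1, β2 and β3 in the model above by ordinary least squares. As a minimal sketch of what that computation does, assuming the data have already been read into plain Python lists (loading the .sav file is not shown), the coefficients solve the normal equations (XᵀX)b = Xᵀy, where X has a leading column of ones. The function name `fit_ols` and the synthetic data are hypothetical illustrations, not the exam dataset.

```python
from typing import List


def fit_ols(predictors: List[List[float]], y: List[float]) -> List[float]:
    """Return [b0, b1, ..., bk] that minimise the residual sum of squares.

    Each item of `predictors` holds one observation's predictor values,
    e.g. [hours, anxiety, a_level]. A column of ones is prepended for the
    intercept, and the normal equations (X^T X) b = X^T y are solved by
    Gaussian elimination with partial pivoting.
    """
    X = [[1.0] + [float(v) for v in row] for row in predictors]
    p = len(X[0])
    # Augmented matrix [X^T X | X^T y], one row per coefficient.
    A = [
        [sum(r[i] * r[j] for r in X) for j in range(p)]
        + [sum(r[i] * yi for r, yi in zip(X, y))]
        for i in range(p)
    ]
    # Forward elimination.
    for col in range(p):
        pivot = max(range(col, p), key=lambda row: abs(A[row][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for row in range(col + 1, p):
            factor = A[row][col] / A[col][col]
            for c in range(col, p + 1):
                A[row][c] -= factor * A[col][c]
    # Back substitution.
    b = [0.0] * p
    for i in reversed(range(p)):
        b[i] = (A[i][p] - sum(A[i][j] * b[j] for j in range(i + 1, p))) / A[i][i]
    return b


# Synthetic data (NOT the exam data): y = 2 + 0.5*x1 - 1*x2 + 3*x3 exactly,
# so the fitted coefficients should reproduce [2, 0.5, -1, 3].
rows = [[1, 4, 2], [2, 3, 5], [3, 8, 1], [4, 1, 7], [5, 6, 3], [6, 2, 9]]
ys = [2 + 0.5 * x1 - 1 * x2 + 3 * x3 for x1, x2, x3 in rows]
print([round(v, 6) for v in fit_ols(rows, ys)])
```

Because the synthetic responses contain no noise, exact recovery of the generating coefficients is a quick sanity check; with real data the residuals are non-zero and SPSS additionally reports standard errors and p-values.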
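Correlations (Pearson correlation coefficients; N = 40 for every pair)
                        exam     hours     anxiety   A-level
                        score    revising            points
exam score              1        .832**    -.112     .902**
hours spent revising    .832**   1         -.333*    .778**
anxiety                 -.112    -.333*    1         -.230
A-level entry points    .902**   .778**    -.230     1
Sig. (2-tailed):
hours spent revising vs exam score .000, vs anxiety .036, vs A-level entry points .000
anxiety vs exam score .493, vs A-level entry points .153
** Correlation is significant at the 0.01 level (2-tailed). * Correlation is significant at the 0.05 level (2-tailed).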
Interpretation
The above table gives the following correlation coefficients:
0.832 between exam score and hours spent revising, a strong positive correlation (significant at the 0.01 level)
0.902 between exam score and A-level entry points, a strong positive correlation (significant at the 0.01 level)
-0.112 between exam score and anxiety, a weak negative correlation that is not significant (p = 0.493)
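Each entry in the correlation table is a sample Pearson coefficient, and its Sig. (2-tailed) value comes from a t-test of zero population correlation, t = r√(n−2)/√(1−r²) with n−2 degrees of freedom. The sketch below, assuming only the Python standard library, computes r and that t statistic; since the standard library has no t distribution, the two-tailed 5% critical value for 38 df (about 2.024) is hard-coded from t tables as an assumption rather than computed.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Sample Pearson correlation coefficient between x and y."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def t_for_r(r: float, n: int) -> float:
    """t statistic (df = n - 2) for H0: population correlation = 0."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)


T_CRIT_38 = 2.024  # two-tailed 5% critical t with 38 df, taken from t tables

# Coefficients from the correlation table above (N = 40).
for label, r in [("exam score vs anxiety", -0.112), ("hours vs anxiety", -0.333)]:
    t = t_for_r(r, 40)
    verdict = "significant" if abs(t) > T_CRIT_38 else "not significant"
    print(f"{label}: r = {r}, t = {t:.3f}, {verdict} at the 5% level")
```

The verdicts agree with the table: r = -0.112 gives |t| of about 0.69 (p = .493, not significant), while r = -0.333 gives |t| of about 2.18 (p = .036, significant).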
Model Summary
ANOVA
Total: sum of squares = 5062.775, df = 39
Interpretation
Since the p-value of the F-test in the ANOVA table is < 0.05, the overall effect of the three independent variables on exam scores is significant.
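The p-value referred to here comes from the ANOVA F-test of H0: β1 = β2 = β3 = 0. The F statistic can be written in terms of R², the sample size n and the number of predictors k as F = (R²/k) / ((1 − R²)/(n − k − 1)), on (k, n − k − 1) degrees of freedom. The sketch below plugs in R² ≈ 0.88 (the rounded figure quoted in the conclusions), so the result is an illustrative approximation, not the F value printed by SPSS.

```python
def f_from_r2(r2: float, n: int, k: int) -> float:
    """Overall regression F statistic from R^2, sample size n and k predictors.

    Under H0 (all slope coefficients are zero) F has (k, n - k - 1) df.
    """
    return (r2 / k) / ((1 - r2) / (n - k - 1))


# Problem 1: n = 40 students, k = 3 predictors, R^2 taken as roughly 0.88.
f = f_from_r2(0.88, 40, 3)
print(f"F(3, 36) is approximately {f:.1f}")
```

An F of this size is far above the 5% critical value of F(3, 36), which is below 3, consistent with the significant overall effect reported above.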
Coefficients
Interpretation
From the Coefficients table, the estimated regression coefficients are β0 = -15.270, β1 = 0.489, β2 = 0.101 and β3 = 2.234.
The p-value of β0 (0.010) is less than 0.05, so β0 is significant.
The p-value of β1 (0.000) is less than 0.05, so β1 is significant.
The p-value of β2 (0.010) is less than 0.05, so β2 is significant.
The p-value of β3 (0.000) is less than 0.05, so β3 is significant.
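Each slope is the expected change in exam score for a one-unit increase in that predictor, holding the other two fixed. A minimal sketch of using the reported coefficients for prediction follows; the function name and the student's values (20 hours, anxiety 50, 20 A-level points) are hypothetical, chosen only to show the arithmetic, and predictions are only meaningful within the range of the observed data.

```python
# Coefficients reported in the interpretation above (Problem 1).
B0, B_HOURS, B_ANXIETY, B_ALEVEL = -15.270, 0.489, 0.101, 2.234


def predict_exam_score(hours: float, anxiety: float, a_level_points: float) -> float:
    """Predicted exam score from the fitted Problem 1 regression equation."""
    return B0 + B_HOURS * hours + B_ANXIETY * anxiety + B_ALEVEL * a_level_points


# Hypothetical student, for illustration only.
print(round(predict_exam_score(hours=20, anxiety=50, a_level_points=20), 2))  # 44.24

# One extra hour of revision changes the prediction by B_HOURS (0.489 points).
print(round(predict_exam_score(21, 50, 20) - predict_exam_score(20, 50, 20), 3))
```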
CONCLUSIONS:
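The fitted regression model is
$$\text{Predicted exam score} = -15.270 + 0.489(\text{hours spent revising}) + 0.101(\text{anxiety}) + 2.234(\text{A-level entry points})$$
The overall effect of the three independent variables (hours spent revising, anxiety and A-level entry points) on exam scores is significant at the 5% level of significance.
The model explains about 88% of the variation in exam scores.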
Problem 2
AIM
To fit a regression model for the given data using SPSS
PROCEDURE
Identify the dependent and the independent variables from the given data set
Independent variables: number sq. ft./1000, inventory/$1000, amount spent on advertising/$1000, size of sales district/1000 families, number of competing stores in district
Dependent variable: annual net sales/$1000
Multiple Linear Regression Model:
$$\text{Annual net sales} = \beta_0 + \beta_1(\text{number of sq. ft.}) + \beta_2(\text{inventory}) + \beta_3(\text{amount spent on advertising}) + \beta_4(\text{size of sales district}) + \beta_5(\text{number of competing stores in district}) + \varepsilon$$
Draw scatter plots of the given data.
Correlations (Pearson correlation coefficients; N = 27 for every pair)
Variables: (1) annual net sales/$1000, (2) number sq. ft./1000, (3) inventory/$1000, (4) amount spent on advertising/$1000, (5) size of sales district/1000 families, (6) number of competing stores in district
       (1)      (2)      (3)      (4)      (5)      (6)
(1)    1        .873**   .945**   .920**   .955**   -.912**
(2)    .873**   1        .808**   .726**   .820**   -.761**
(3)    .945**   .808**   1        .902**   .859**   -.807**
(4)    .920**   .726**   .902**   1        .807**   -.856**
(5)    .955**   .820**   .859**   .807**   1        -.880**
(6)    -.912**  -.761**  -.807**  -.856**  -.880**  1
Sig. (2-tailed) = .000 for every pair of distinct variables.
** Correlation is significant at the 0.01 level (2-tailed).
Model Summary
Interpretation:
R² = 0.994, which shows that the regression model explains 99.4% of the variation in annual net sales.
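The SPSS Model Summary table also reports an Adjusted R Square, which penalises R² for the number of predictors: adjusted R² = 1 − (1 − R²)(n − 1)/(n − k − 1). This is worth checking here because five predictors are fitted to only 27 observations. The sketch below recomputes it from the rounded R² = 0.994, so the last digit may differ from the value SPSS prints.

```python
def adjusted_r2(r2: float, n: int, k: int) -> float:
    """Adjusted R^2 = 1 - (1 - R^2)(n - 1)/(n - k - 1)."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


# Problem 2: n = 27 observations, k = 5 predictors, R^2 = 0.994 (rounded).
print(round(adjusted_r2(0.994, 27, 5), 4))
```

The result (about 0.993) is only slightly below R², so the high explanatory power is not merely an artefact of the number of predictors.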
ANOVA
Total: sum of squares = 959366.667, df = 26
Interpretation:
Since the p-value of the F-test in the ANOVA table is < 0.05, the overall effect of the five independent variables on annual net sales is significant.
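The ANOVA table splits the total sum of squares into a regression (explained) part and a residual part, with R² = SS_regression / SS_total. As a sketch, assuming only the two figures available in this section (SS_total = 959366.667 with 26 df, and the rounded R² = 0.994), the implied sums of squares, mean squares and F statistic can be reconstructed; because R² is rounded, these are approximations, not the values in the SPSS output.

```python
SS_TOTAL = 959366.667  # total sum of squares from the ANOVA table (df = 26)
R2 = 0.994             # R^2 from the Model Summary, rounded to 3 decimals
N, K = 27, 5           # observations and predictors

ss_reg = R2 * SS_TOTAL          # explained (regression) sum of squares
ss_res = SS_TOTAL - ss_reg      # unexplained (residual) sum of squares
df_reg, df_res = K, N - K - 1   # 5 and 21; together they give the total df of 26
ms_reg, ms_res = ss_reg / df_reg, ss_res / df_res  # mean squares
print(f"SS_reg ~ {ss_reg:.1f}, SS_res ~ {ss_res:.1f}, F ~ {ms_reg / ms_res:.0f}")
```

The resulting F (several hundred on 5 and 21 df) makes the very small p-value of the overall test unsurprising.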
Coefficients
Predictor                                 B         Std. Error   Beta     t         Sig.
amount spent on advertising/$1000         12.145    2.512        .237     4.835     .000
size of sales district/1000 families      13.992    1.730        .377     8.088     .000
number of competing stores in district    -3.581    1.772        -.091    -2.021    .056
Interpretation
From the Coefficients table:
β0 = -48.507. Since the p-value of β0 is > 0.05, β0 is not significant.
β1 = 13.851. Since the p-value of β1 is < 0.05, β1 is significant.
β2 = 0.214. Since the p-value of β2 is < 0.05, β2 is significant.
β3 = 12.145. Since the p-value of β3 (0.000) is < 0.05, β3 is significant.
β4 = 13.992. Since the p-value of β4 (0.000) is < 0.05, β4 is significant.
β5 = -3.581. Since the p-value of β5 (0.056) is slightly greater than 0.05, β5 is not significant at the 5% level (it is significant only at the 10% level).
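Each t value in the Coefficients table is the estimate divided by its standard error, tested on n − k − 1 = 27 − 5 − 1 = 21 degrees of freedom. The sketch below recomputes t from the B and Std. Error columns and compares it with the two-tailed 5% critical value for 21 df (about 2.080), which is hard-coded from t tables as an assumption because the Python standard library has no t distribution.

```python
def t_statistic(b: float, se: float) -> float:
    """t value for H0: coefficient = 0, i.e. the estimate divided by its standard error."""
    return b / se


T_CRIT_21 = 2.080  # two-tailed 5% critical t with 21 df, taken from t tables

# (B, Std. Error) pairs copied from the Coefficients table.
coefficients = {
    "amount spent on advertising": (12.145, 2.512),
    "size of sales district": (13.992, 1.730),
    "number of competing stores in district": (-3.581, 1.772),
}
for name, (b, se) in coefficients.items():
    t = t_statistic(b, se)
    verdict = "significant" if abs(t) > T_CRIT_21 else "not significant"
    print(f"{name}: t = {t:.3f} ({verdict} at the 5% level)")
```

The recomputed values match the printed t column (4.835, 8.088 and -2.021), and |−2.021| < 2.080 confirms that the competing-stores coefficient just misses significance at the 5% level.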
CONCLUSIONS
The fitted regression model is
$$\text{Predicted annual net sales} = -48.507 + 13.851(\text{number of sq. ft.}) + 0.214(\text{inventory}) + 12.145(\text{amount spent on advertising}) + 13.992(\text{size of sales district}) - 3.581(\text{number of competing stores in district})$$
The overall effect of the independent variables on annual net sales is significant at the 5% level of significance, although the coefficient of the number of competing stores (p = 0.056) is not individually significant at that level.
The model explains 99.4% of the variation in annual net sales.
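The fitted equation can be used to predict annual net sales (in $1000) for a store. In the sketch below the coefficients are those of the fitted Problem 2 model, while the function name and the store's values (3 thousand sq. ft., $400k inventory, $9k advertising, a district of 10 thousand families, 8 competing stores) are hypothetical, chosen only to show the arithmetic; predictions are only meaningful within the range of the observed data.

```python
# Coefficients of the fitted Problem 2 equation (units follow the SPSS variable labels).
COEFS = {
    "intercept": -48.507,
    "sq_ft_thousands": 13.851,
    "inventory_thousand_usd": 0.214,
    "advertising_thousand_usd": 12.145,
    "district_size_thousand_families": 13.992,
    "competing_stores": -3.581,
}


def predict_net_sales(sq_ft: float, inventory: float, advertising: float,
                      district_size: float, competitors: float) -> float:
    """Predicted annual net sales (in $1000) from the fitted regression equation."""
    return (
        COEFS["intercept"]
        + COEFS["sq_ft_thousands"] * sq_ft
        + COEFS["inventory_thousand_usd"] * inventory
        + COEFS["advertising_thousand_usd"] * advertising
        + COEFS["district_size_thousand_families"] * district_size
        + COEFS["competing_stores"] * competitors
    )


# Hypothetical store, for illustration only.
print(round(predict_net_sales(3, 400, 9, 10, 8), 3))  # 299.223
```

Holding the other predictors fixed, each additional competing store lowers predicted annual net sales by about 3.581 ($1000), although that coefficient is not significant at the 5% level.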