
MULTIPLE LINEAR REGRESSION

The objective of this section is to introduce multiple linear regression. By the end of the chapter you will be able to:

• Define a multiple linear regression model,
• Fit a multiple regression model manually (using matrices) and using SPSS,
• Test the significance of regression parameters,
• Use ANOVA to test for the significance of the fitted regression model, and
• Interpret SPSS output.

1.1 Introduction
Up to now, we have been dealing with regression relationships in which only two variables were involved: one dependent and one independent variable. Multiple linear regression analysis is
merely an extension of simple linear regression. In multiple linear regression, there are 𝑝 − 1
explanatory variables, and the relationship between the dependent variable and the explanatory
variables is represented by the following equation:

𝑦𝑖 = 𝛽0 + 𝛽1 𝑥1𝑖 + 𝛽2 𝑥2𝑖 + ⋯ + 𝛽𝑝−1 𝑥𝑝−1,𝑖 + 𝜀𝑖

where 𝛽0 is a constant term and 𝛽1 to 𝛽𝑝−1 are the coefficients relating the 𝑝 − 1 explanatory variables (𝑋1𝑖 , 𝑋2𝑖 , … , 𝑋𝑝−1,𝑖 ) to the response variable of interest.

Note: The multiple linear regression model has 𝑝 parameters, that is, 𝛽0 to 𝛽𝑝−1.

So, multiple linear regression can be thought of as an extension of simple linear regression, where
there are 𝑝 parameters, or simple linear regression can be thought of as a special case of multiple
linear regression, where 𝑝 = 2.
The term ‘linear’ is used because in multiple linear regression we assume that 𝑌 is directly
related to a linear combination of the explanatory variables.

Examples where multiple linear regression may be used include:

(i) Trying to predict an individual’s income given several socio-economic characteristics.
(ii) Trying to predict the overall examination performance of pupils in Grade 12, given the values of a set of exam scores at age 16.
(iii) Trying to estimate systolic or diastolic blood pressure, given a variety of socio-economic and behavioral characteristics (occupation, drinking, smoking, age, etc.).
(iv) Trying to predict crop yield given the amount of rainfall received, the amount of fertilizer applied, soil type, temperature, etc.
As was the case in simple linear regression, our main task will be estimating the 𝑝 parameters of the
multiple linear regression model.

1.2 Estimation of Parameters

There are many ways of estimating the parameters of a regression model. As we did in simple linear regression, we shall focus attention on the least squares method. There are two ways we can apply the least squares method:

(i) Matrix approach
(ii) Estimation by substitution

In multiple linear regression, the matrix approach is more appropriate than estimation by substitution because, as the number of explanatory variables increases, the substitution method becomes complex.

Consider now writing an equation for each observation:

$$\begin{aligned}
y_1 &= \beta_0 + \beta_1 x_{11} + \beta_2 x_{21} + \cdots + \beta_{p-1} x_{p-1,1} + \varepsilon_1 \\
y_2 &= \beta_0 + \beta_1 x_{12} + \beta_2 x_{22} + \cdots + \beta_{p-1} x_{p-1,2} + \varepsilon_2 \\
&\;\;\vdots \\
y_n &= \beta_0 + \beta_1 x_{1n} + \beta_2 x_{2n} + \cdots + \beta_{p-1} x_{p-1,n} + \varepsilon_n
\end{aligned}$$

In matrix notation, our model is given by

$$\begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix} =
\begin{bmatrix}
\beta_0 + \beta_1 x_{11} + \beta_2 x_{21} + \cdots + \beta_{p-1} x_{p-1,1} \\
\beta_0 + \beta_1 x_{12} + \beta_2 x_{22} + \cdots + \beta_{p-1} x_{p-1,2} \\
\vdots \\
\beta_0 + \beta_1 x_{1n} + \beta_2 x_{2n} + \cdots + \beta_{p-1} x_{p-1,n}
\end{bmatrix} +
\begin{bmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{bmatrix}$$

$$\begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix} =
\begin{bmatrix}
1 & x_{11} & \cdots & x_{p-1,1} \\
1 & x_{12} & \cdots & x_{p-1,2} \\
\vdots & \vdots & & \vdots \\
1 & x_{1n} & \cdots & x_{p-1,n}
\end{bmatrix}
\begin{bmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_{p-1} \end{bmatrix} +
\begin{bmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{bmatrix}$$

The above can be written as $\mathbf{Y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon}$, where

$\mathbf{Y}$ is the response vector, given by $\mathbf{Y} = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}$,

$\mathbf{X}$ is the design matrix, given by $\mathbf{X} = \begin{bmatrix} 1 & x_{11} & \cdots & x_{p-1,1} \\ 1 & x_{12} & \cdots & x_{p-1,2} \\ \vdots & \vdots & & \vdots \\ 1 & x_{1n} & \cdots & x_{p-1,n} \end{bmatrix}$,

$\boldsymbol{\beta}$ is the vector of parameters, given by $\boldsymbol{\beta} = \begin{bmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_{p-1} \end{bmatrix}$, and

$\boldsymbol{\varepsilon}$ is the error vector, given by $\boldsymbol{\varepsilon} = \begin{bmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{bmatrix}$.

Assumptions in Matrix Form

$\boldsymbol{\varepsilon} \sim N(\mathbf{0}, \sigma^2 \mathbf{I})$, where $\mathbf{I}$ is the $n \times n$ identity matrix. The ones on the diagonal of $\mathbf{I}$ specify that the variance of each $\varepsilon_i$ is $\sigma^2$. The zeros in the off-diagonal elements of $\mathbf{I}$ specify that the covariance between different $\varepsilon_i$ is zero, implying that the correlations are zero.

As in simple linear regression, the normal equations in matrix form are given by

$$\mathbf{X}'\mathbf{X}\boldsymbol{\beta} = \mathbf{X}'\mathbf{Y}$$

Solving this equation for $\boldsymbol{\beta}$ gives the least squares solution $\hat{\boldsymbol{\beta}} = (\hat{\beta}_0, \hat{\beta}_1, \ldots, \hat{\beta}_{p-1})'$. Pre-multiplying both sides by the inverse of $\mathbf{X}'\mathbf{X}$ (assuming it exists), that is, $[\mathbf{X}'\mathbf{X}]^{-1}\mathbf{X}'\mathbf{X}\boldsymbol{\beta} = [\mathbf{X}'\mathbf{X}]^{-1}\mathbf{X}'\mathbf{Y}$, we have

$$\hat{\boldsymbol{\beta}} = [\mathbf{X}'\mathbf{X}]^{-1}\mathbf{X}'\mathbf{Y}.$$
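The computation is straightforward in software. Below is a minimal sketch in Python with NumPy (not part of the original text); the data used are those of Example 4.1 later in this chapter, so the printed estimates can be checked against the hand computation there.

```python
import numpy as np

# Design matrix: a column of ones for the intercept, then x1 and x2
# (data taken from Example 4.1 below).
X = np.column_stack([
    np.ones(6),
    [7, 4, 16, 3, 21, 8],    # x1
    [33, 41, 7, 49, 5, 31],  # x2
])
Y = np.array([42, 33, 75, 28, 91, 55], dtype=float)

# Least squares via the normal equations X'X beta = X'Y.
# np.linalg.solve is numerically preferable to forming the inverse.
beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)
print(beta_hat)  # approximately [33.93, 2.78, -0.26]
```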

Having estimated the regression coefficients, we have to test for the significance of each parameter. If the coefficient of a given variable is insignificant, it implies that that variable should be removed from the model.

1.3 Hypothesis Testing on the Parameters

We can test the following hypotheses:

(A) 𝐻0 : 𝛽𝑖 = 𝑏 versus 𝐻1 : 𝛽𝑖 ≠ 𝑏 for 𝑖 = 0,1,2, … , 𝑝 − 1.
(B) 𝐻0 : 𝛽𝑖 ≥ 𝑏 versus 𝐻1 : 𝛽𝑖 < 𝑏 for 𝑖 = 0,1,2, … , 𝑝 − 1.
(C) 𝐻0 : 𝛽𝑖 ≤ 𝑏 versus 𝐻1 : 𝛽𝑖 > 𝑏 for 𝑖 = 0,1,2, … , 𝑝 − 1.
Test statistic:

$$t = \frac{\hat{\beta}_i - b}{\sqrt{\widehat{Var}(\hat{\beta}_i)}} \sim t(n - p)$$

The estimate of the variance of $\hat{\beta}_i$ is given by

$$\widehat{Var}(\hat{\beta}_i) = s^2 \times [(i+1),(i+1)]\text{th element of } [\mathbf{X}'\mathbf{X}]^{-1}, \qquad \text{where } s^2 = \frac{SSE}{n-p} = \frac{\sum_{i=1}^{n}(y_i - \hat{y}_i)^2}{n-p}$$

Note: In most cases 𝑏 = 0, because we will be testing for the significance of a regression parameter.

It therefore follows that for the above hypotheses,

(A) We reject $H_0$ if

$$|t| = \left| \frac{\hat{\beta}_i - b}{\sqrt{\widehat{Var}(\hat{\beta}_i)}} \right| > t_{\alpha/2}(n - p)$$

(B) We reject $H_0$ if

$$t = \frac{\hat{\beta}_i - b}{\sqrt{\widehat{Var}(\hat{\beta}_i)}} < -t_{\alpha}(n - p)$$

(C) We reject $H_0$ if

$$t = \frac{\hat{\beta}_i - b}{\sqrt{\widehat{Var}(\hat{\beta}_i)}} > t_{\alpha}(n - p)$$
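As a sketch of how these tests can be carried out in software (continuing the Python/NumPy setting above, with SciPy assumed available; the helper name is hypothetical):

```python
import numpy as np
from scipy import stats

def coefficient_t_tests(X, Y, b=0.0, alpha=0.05):
    """t statistics for H0: beta_i = b against a two-sided alternative."""
    n, p = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta_hat = XtX_inv @ X.T @ Y
    resid = Y - X @ beta_hat
    s2 = resid @ resid / (n - p)          # s^2 = SSE / (n - p)
    se = np.sqrt(s2 * np.diag(XtX_inv))   # sqrt of Var(beta_hat_i)
    t = (beta_hat - b) / se
    t_crit = stats.t.ppf(1 - alpha / 2, n - p)  # two-sided critical value
    return beta_hat, se, t, t_crit
```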

1.4 Confidence Intervals of Parameters

The $(1 - \alpha)100\%$ confidence interval for $\beta_i$ is given by

$$\left( \hat{\beta}_i - t_{\alpha/2}(n - p)\sqrt{\widehat{Var}(\hat{\beta}_i)}\,,\; \hat{\beta}_i + t_{\alpha/2}(n - p)\sqrt{\widehat{Var}(\hat{\beta}_i)} \right)$$
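Using the standard errors and critical value from the sketch above, the intervals follow directly (again a hypothetical helper, not from the original text):

```python
from scipy import stats

def confidence_intervals(beta_hat, se, n, p, alpha=0.05):
    """(1 - alpha)100% confidence intervals for all p coefficients."""
    t_crit = stats.t.ppf(1 - alpha / 2, n - p)
    return beta_hat - t_crit * se, beta_hat + t_crit * se
```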
1.5 Analysis of Variance Approach to Multiple Linear Regression

Analysis of Variance (ANOVA) is a highly useful and flexible mode of analysis for regression models. We will use ANOVA to compute $s^2 = SSE/(n - p)$ (an estimate of $\sigma^2$) and to check if there is a regression relationship.

Sum of Squares

The total sum of squares can be partitioned into two components: the regression and error sums of squares. In matrix terms they are defined as

$$SST = \mathbf{Y}'\mathbf{Y} - n\bar{y}^2$$
$$SSR = \hat{\boldsymbol{\beta}}'\mathbf{X}'\mathbf{Y} - n\bar{y}^2$$
$$SSE = \mathbf{Y}'\mathbf{Y} - \hat{\boldsymbol{\beta}}'\mathbf{X}'\mathbf{Y}$$

Thus, once $\boldsymbol{\beta}$ has been estimated, the sums of squares can easily be computed.

Degrees of freedom

𝑆𝑆𝑇 has 𝑛 − 1 degrees of freedom.

𝑆𝑆𝑅 has 𝑝 − 1 degrees of freedom.

𝑆𝑆𝐸 has 𝑛 − 𝑝 degrees of freedom.

Mean Squares

A sum of squares divided by its degrees of freedom is called a mean square. The two important mean squares are the regression mean square ($MSR$) and the error mean square ($MSE$), and these are given by

$$MSR = \frac{SSR}{p - 1}$$

$$MSE = \frac{SSE}{n - p}$$

F-ratio

$$F = \frac{MSR}{MSE} \sim F(p - 1,\, n - p)$$
Table 4.1: ANOVA table (Multiple regression)

Source of variation   Sum of squares   d.f.    MS     F
Regression            SSR              p − 1   MSR    F = MSR/MSE
Error                 SSE              n − p   MSE
Total                 SST              n − 1

To test for the significance of regression using ANOVA, our hypotheses are of the form

𝐻0 : 𝛽1 = 𝛽2 = ⋯ = 𝛽𝑝−1 = 0

𝐻1 : 𝛽𝑖 ≠ 0 for at least one 𝑖, 𝑖 = 1,2, … , 𝑝 − 1.

Test statistic: F-ratio

Rejection criteria: Testing at α level of significance, we reject 𝐻0 if 𝐹 > 𝐹𝛼 (𝑝 − 1, 𝑛 − 𝑝).

Failing to reject 𝐻0 implies that there is no regression relationship between the response variable 𝑌 and the 𝑝 − 1 explanatory variables. If 𝐻0 has been rejected, it implies that there is a regression relationship. However, we should go on to test the significance of each parameter to find out which variable(s) led to the rejection of the null hypothesis.
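The full ANOVA computation is easy to script. The following sketch (a hypothetical helper, in the same Python/NumPy setting as before) assembles the table entries and the F test:

```python
import numpy as np
from scipy import stats

def regression_anova(X, Y, alpha=0.05):
    """ANOVA quantities for the multiple linear regression Y = X beta + eps."""
    n, p = X.shape
    beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)
    ybar = Y.mean()
    SST = Y @ Y - n * ybar**2
    SSR = beta_hat @ X.T @ Y - n * ybar**2
    SSE = Y @ Y - beta_hat @ X.T @ Y
    MSR, MSE = SSR / (p - 1), SSE / (n - p)
    F = MSR / MSE
    F_crit = stats.f.ppf(1 - alpha, p - 1, n - p)  # upper alpha critical value
    return {"SST": SST, "SSR": SSR, "SSE": SSE,
            "MSR": MSR, "MSE": MSE, "F": F, "reject_H0": F > F_crit}
```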

Example 4.1 In a small-scale regression study, the following data were obtained:

𝑋1𝑖   7    4    16   3    21   8
𝑋2𝑖   33   41   7    49   5    31
𝑌𝑖    42   33   75   28   91   55

Suppose the data can be modeled by a multiple linear regression model.

(a) Express the regression model in matrix form, defining all the terms.
(b) Find the least squares estimates of 𝜷, given that

$$[\mathbf{X}'\mathbf{X}]^{-1} = \begin{bmatrix} 34.5785574 & -1.65089268 & -0.65704022 \\ -1.65089268 & 0.08030796 & 0.03112763 \\ -0.65704022 & 0.03112763 & 0.01268501 \end{bmatrix}$$

(c) Construct the ANOVA table and test for the significance of the regression line using
α=0.05.
(d) Test the hypothesis H0:β2=0 versus H1: 𝛽2 ≠ 0 at α=0.05.
(e) Find the 95% confidence interval for the intercept and test whether it is significant.
Solution

(a) $\mathbf{Y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon}$, where

$$\mathbf{Y} = \begin{bmatrix} 42 \\ 33 \\ 75 \\ 28 \\ 91 \\ 55 \end{bmatrix}, \quad \mathbf{X} = \begin{bmatrix} 1 & 7 & 33 \\ 1 & 4 & 41 \\ 1 & 16 & 7 \\ 1 & 3 & 49 \\ 1 & 21 & 5 \\ 1 & 8 & 31 \end{bmatrix}, \quad \boldsymbol{\beta} = \begin{bmatrix} \beta_0 \\ \beta_1 \\ \beta_2 \end{bmatrix} \quad \text{and} \quad \boldsymbol{\varepsilon} = \begin{bmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \varepsilon_3 \\ \varepsilon_4 \\ \varepsilon_5 \\ \varepsilon_6 \end{bmatrix}$$

(b) $\hat{\boldsymbol{\beta}} = [\mathbf{X}'\mathbf{X}]^{-1}\mathbf{X}'\mathbf{Y}$, with

$$\mathbf{X}'\mathbf{Y} = \begin{bmatrix} 324 \\ 4061 \\ 6796 \end{bmatrix}.$$

Thus,

$$\hat{\boldsymbol{\beta}} = \begin{bmatrix} 34.5785574 & -1.65089268 & -0.65704022 \\ -1.65089268 & 0.08030796 & 0.03112763 \\ -0.65704022 & 0.03112763 & 0.01268501 \end{bmatrix} \begin{bmatrix} 324 \\ 4061 \\ 6796 \end{bmatrix} = \begin{bmatrix} 33.9321 \\ 2.7848 \\ -0.2644 \end{bmatrix}$$

(c) $SST = \mathbf{Y}'\mathbf{Y} - n\bar{y}^2 = 20568 - 6(54)^2 = 3072$

$$SSR = \hat{\boldsymbol{\beta}}'\mathbf{X}'\mathbf{Y} - n\bar{y}^2 = \begin{bmatrix} 33.9321 & 2.7848 & -0.2644 \end{bmatrix} \begin{bmatrix} 324 \\ 4061 \\ 6796 \end{bmatrix} - 6(54)^2 = 3010.2108$$

$$SSE = \mathbf{Y}'\mathbf{Y} - \hat{\boldsymbol{\beta}}'\mathbf{X}'\mathbf{Y} = SST - SSR = 61.7892$$

𝑆𝑆𝑇 has 𝑛 − 1 = 6 − 1 = 5 degrees of freedom.

𝑆𝑆𝑅 has 𝑝 − 1 = 3 − 1 = 2 degrees of freedom.

𝑆𝑆𝐸 has 𝑛 − 𝑝 = 6 − 3 = 3 degrees of freedom.

$$MSR = \frac{SSR}{p - 1} = \frac{3010.2108}{2} = 1505.1054$$

$$MSE = \frac{SSE}{n - p} = \frac{61.7892}{3} = 20.5964$$

$$F = \frac{MSR}{MSE} = 73.0761$$
Table 4.2: ANOVA table

Source of variation   Sum of squares   d.f.   MS          F
Regression            3010.2108        2      1505.1054   73.0761
Error                 61.7892          3      20.5964
Total                 3072             5

𝐻0 : 𝛽1 = 𝛽2 = 0

𝐻1 : 𝛽𝑖 ≠ 0 for at least one 𝑖, 𝑖 = 1,2.


Test statistic: $F = \frac{MSR}{MSE} \sim F(p - 1, n - p)$

Rejection criteria: Testing at 𝛼 = 0.05 level of significance, we reject 𝐻0 if

𝐹 > 𝐹0.05 (2,3) = 9.55.

Test statistic: 𝐹 = 73.0761 (from the ANOVA table)

Conclusion: Since 𝐹 > 9.55, we reject 𝐻0 and conclude that the regression relationship is
significant.

(d) 𝐻0 : 𝛽2 = 0 versus 𝐻1 : 𝛽2 ≠ 0

Test statistic: $t = \frac{\hat{\beta}_2}{\sqrt{\widehat{Var}(\hat{\beta}_2)}} \sim t(n - p)$

Rejection criteria: Testing at the 𝛼 = 0.05 level of significance, we reject 𝐻0 if

$$|t| = \left| \frac{\hat{\beta}_2}{\sqrt{\widehat{Var}(\hat{\beta}_2)}} \right| > t_{\alpha/2}(n - p) = t_{0.025}(3) = 3.182$$

Test statistic:

$$\widehat{Var}(\hat{\beta}_2) = s^2 \times [3,3]\text{th element of } [\mathbf{X}'\mathbf{X}]^{-1} = 20.5964 \times 0.01268501 = 0.26126554$$

$$|t| = \left| \frac{\hat{\beta}_2}{\sqrt{\widehat{Var}(\hat{\beta}_2)}} \right| = \left| \frac{-0.2644}{\sqrt{0.26126554}} \right| = 0.5173$$

Conclusion: Since $|t| < 3.182$, we fail to reject 𝐻0 and conclude that the variable 𝑋2 is not significant; that is, it should be removed from the model.

(e) The 95% confidence interval for 𝛽0 is given by

$$\left( \hat{\beta}_0 - t_{0.025}(3)\sqrt{\widehat{Var}(\hat{\beta}_0)}\,,\; \hat{\beta}_0 + t_{0.025}(3)\sqrt{\widehat{Var}(\hat{\beta}_0)} \right)$$

where $\widehat{Var}(\hat{\beta}_0) = s^2 \times [1,1]\text{th element of } [\mathbf{X}'\mathbf{X}]^{-1} = 20.5964 \times 34.5785574 = 712.1937$. Thus the interval is

$$\left( 33.9321 - 3.182\sqrt{712.1937}\,,\; 33.9321 + 3.182\sqrt{712.1937} \right) = (-50.9324,\; 118.7966)$$

Since the interval contains zero, the intercept is not significant at the 5% level.
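These hand computations can be checked with the helper sketches from earlier in the chapter (hypothetical usage; outputs rounded):

```python
import numpy as np

# Data of Example 4.1 (assumes coefficient_t_tests, confidence_intervals
# and regression_anova from the sketches above are in scope).
X = np.column_stack([np.ones(6), [7, 4, 16, 3, 21, 8], [33, 41, 7, 49, 5, 31]])
Y = np.array([42, 33, 75, 28, 91, 55], dtype=float)

print(regression_anova(X, Y))   # SST = 3072, SSR ≈ 3010.21, F ≈ 73.08
beta_hat, se, t, t_crit = coefficient_t_tests(X, Y)
print(abs(t[2]), t_crit)        # ≈ 0.517 and 3.182: beta_2 not significant
lo, hi = confidence_intervals(beta_hat, se, n=6, p=3)
print(lo[0], hi[0])             # intercept CI ≈ (-50.93, 118.80)
```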

Activity 4.1

Consider the following data set, where 𝑌 is the dependent variable and 𝑋1𝑖 and 𝑋2𝑖 are the
regressors.

𝑌     4.1   8.5   5.2   9.6   8.7
𝑋1𝑖   2.5   3.7   2.6   5.5   4.0
𝑋2𝑖   3.5   4.4   3.9   4.3   4.9

Suppose the data can be described by the model 𝑦𝑖 = 𝛽0 + 𝛽1 𝑥1𝑖 + 𝛽2 𝑥2𝑖 + 𝜀𝑖 , where
𝜀𝑖 ~𝑁(0, 𝜎 2 ) and 𝐶𝑜𝑣(𝜀𝑖 , 𝜀𝑗 ) = 0 for 𝑖 ≠ 𝑗.

(a) Express the above model in matrix form.


(b) Find the least squares estimates of 𝜷 given that

$$[\mathbf{X}'\mathbf{X}]^{-1} = \begin{bmatrix} 17.2124 & 0.5764 & -4.5529 \\ 0.5764 & 0.2632 & -0.3666 \\ -4.5529 & -0.3666 & 1.4035 \end{bmatrix}$$
(c) Construct the ANOVA table and test for the significance of the regression line using
α=0.05.
(d) Test the hypothesis 𝐻0 : 𝛽0 = 0 versus 𝐻1 : 𝛽0 ≠ 0 at α=0.05.
(e) Estimate 𝑌 at 𝑋1 = 3 and 𝑋2 = 4.5.

1.6 Coefficient of Determination, 𝑅²

The coefficient of determination can be used to assess how good our model is. It gives the proportion of the variation in the response variable that is explained (accounted for) by the model. A high value of 𝑅² indicates a good model, and 𝑅² is given by

$$R^2 = \frac{SSR}{SST} = 1 - \frac{SSE}{SST}$$

As we add more and more variables to the model (even random ones), $R^2$ will increase toward 1. Adjusted $R^2$ tries to take this into account by replacing sums of squares with mean squares:

$$R^2(adj) = 1 - \frac{SSE/(n - p)}{SST/(n - 1)}$$
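Both quantities are one-liners once the sums of squares are available (a hypothetical helper, consistent with the earlier sketches):

```python
def r_squared(SSE, SST, n, p):
    """R^2 and adjusted R^2 from the ANOVA sums of squares."""
    r2 = 1 - SSE / SST
    r2_adj = 1 - (SSE / (n - p)) / (SST / (n - 1))
    return r2, r2_adj

print(r_squared(61.7892, 3072, n=6, p=3))  # ≈ (0.9799, 0.9665); cf. Example 4.2 below
```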

Example 4.2 Referring to Example 4.1, calculate 𝑅² and 𝑅²(adj) and comment.

Solution: $R^2 = \frac{SSR}{SST} = 0.9799$ and $R^2(adj) = 1 - \frac{SSE/(n - p)}{SST/(n - 1)} = 0.9665$

Comment: The two values 𝑅² and 𝑅²(adj) are almost the same and are quite high, indicating that the model is quite good.

In Example 4.1 we dealt with two explanatory variables. As the number of explanatory variables increases, the estimation of parameters becomes more complex, so we would want to use statistical software to estimate the parameters rather than doing it manually. The following example illustrates how to use SPSS for multiple linear regression.

Example 4.3: An auctioneer of rugs kept records of his weekly auctions in order to determine
the relationships among price, age of carpet or rug, number of people attending the auction, and
number of times the winning bidder had previously attended his auctions. He felt that, with this
information, he could plan his auctions better, serve his steady customers better and make a
higher overall profit for himself. The results shown in the table below were obtained.

Price Age Audience size Previous attendance


1080 80 40 1
2540 150 80 12
1490 85 55 3
960 55 45 0
2100 140 70 8
1820 95 65 5
2230 140 80 7
1490 80 60 9
1620 90 65 10
1260 60 55 8
1880 90 70 7
2080 100 100 5
2150 120 85 3
1940 95 80 0
1860 90 80 6
2240 135 90 8
2950 175 120 10
2370 150 115 10
1240 55 55 3
1620 70 75 5
2120 120 100 0
1090 50 50 8
1850 65 65 9
2220 125 95 7

Fit an appropriate multiple linear regression model to the above data.

Solution: Using SPSS

To perform a multiple linear regression analysis, go to Analyze > Regression > Linear. You will be presented with a dialog box.
Choose the dependent and independent (explanatory) variables you require; in this case price is the dependent variable and age, audience size and previous attendance are the independent variables. The default ‘Enter’ method puts all explanatory variables you specify in the model, in the order that you specify them. Note that the order is unimportant in terms of the modeling process. There are other methods available for model building based on statistical significance, such as backward elimination or forward selection, but when building the model on a substantive basis the Enter method is best: variables are included in the regression equation regardless of whether or not they are statistically significant. Pressing ‘OK’ produces the following output.

Variables Entered/Removed(b)

Model   Variables Entered                            Variables Removed   Method
1       Previous attendance, Audience size, Age(a)   .                   Enter

a. All requested variables entered.
b. Dependent Variable: Price


Model Summary

Model   R         R Square   Adjusted R Square   Std. Error of the Estimate
1       .961(a)   .924       .912                147.676

a. Predictors: (Constant), Previous attendance, Audience size, Age

ANOVA(b)

Model 1      Sum of Squares   df   Mean Square   F        Sig.
Regression   5288169.883      3    1762723.294   80.829   .000(a)
Residual     436163.451       20   21808.173
Total        5724333.333      23

a. Predictors: (Constant), Previous attendance, Audience size, Age
b. Dependent Variable: Price

Coefficients(a)

                      Unstandardized Coefficients   Standardized Coefficients
Model 1               B          Std. Error         Beta      t       Sig.
(Constant)            207.729    115.939                      1.792   .088
Age                   7.484      1.501              .524      4.988   .000
Audience size         10.547     2.395              .446      4.403   .000
Previous attendance   15.336     9.458              .108      1.621   .121

a. Dependent Variable: Price

The first table confirms that price is the dependent variable and age, audience size and previous attendance are the independent variables.

The second table, the model summary, shows that we have explained about 92.4% of the variation in price with the three explanatory variables, so our model is quite good.

The third table, the ANOVA, indicates that the model is highly significant, since the 𝑝-value = 0.000 < 0.05. The table of coefficients shows us that not all parameters are significant: neither the constant term 𝛽0 nor the coefficient of previous attendance is significant. This basically means that previous attendance should be removed as an explanatory variable. However, the other two explanatory variables, age and audience size, are significant. The following output is obtained if age and audience size are our independent variables:

Variables Entered/Removed(b)

Model   Variables Entered       Variables Removed   Method
1       Audience size, Age(a)   .                   Enter

a. All requested variables entered.
b. Dependent Variable: Price

Model Summary

Model   R         R Square   Adjusted R Square   Std. Error of the Estimate
1       .956(a)   .914       .906                153.297

a. Predictors: (Constant), Audience size, Age

ANOVA(b)

Model 1      Sum of Squares   df   Mean Square   F         Sig.
Regression   5230832.462      2    2615416.231   111.294   .000(a)
Residual     493500.871       21   23500.041
Total        5724333.333      23

a. Predictors: (Constant), Audience size, Age
b. Dependent Variable: Price


Coefficients(a)

                Unstandardized Coefficients   Standardized Coefficients
Model 1         B          Std. Error         Beta      t       Sig.
(Constant)      247.097    117.684                      2.100   .048
Age             8.153      1.498              .571      5.445   .000
Audience size   10.351     2.483              .437      4.168   .000

a. Dependent Variable: Price

Using the ANOVA table, our model is significant. Also, from the coefficients table above, all the 𝑝-values are less than 0.05 (the default level of significance), implying that all the parameters are significant. Hence the fitted model is

$$\widehat{Price} = 247.097 + 8.153\,Age + 10.351\,Audience\ size$$
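For readers without SPSS, the same fit can be reproduced in Python with the statsmodels package (a sketch under the assumption that the package is installed; the data are transcribed from the table above):

```python
import numpy as np
import statsmodels.api as sm

age = [80, 150, 85, 55, 140, 95, 140, 80, 90, 60, 90, 100,
       120, 95, 90, 135, 175, 150, 55, 70, 120, 50, 65, 125]
audience = [40, 80, 55, 45, 70, 65, 80, 60, 65, 55, 70, 100,
            85, 80, 80, 90, 120, 115, 55, 75, 100, 50, 65, 95]
price = [1080, 2540, 1490, 960, 2100, 1820, 2230, 1490, 1620, 1260, 1880, 2080,
         2150, 1940, 1860, 2240, 2950, 2370, 1240, 1620, 2120, 1090, 1850, 2220]

X = sm.add_constant(np.column_stack([age, audience]))  # intercept + regressors
model = sm.OLS(price, X).fit()                         # ordinary least squares
print(model.summary())  # coefficients should match the SPSS output above
```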

Activity 4.2
In an effort to model annual company executive salaries for the year 2010, thirty-three firms were selected and data were gathered on salaries, sales, profits and employment. The following table shows the data:
Firm   Annual salary (thousands)   Sales (thousands)   Profits (thousands)   Employment
1 45 460.6 128.1 480
2 38.7 925.5 783.9 559
3 36.8 152.6 80.2 137
4 27.7 168.3 79.0 277
5 67.6 752.8 231.5 340
6 45.4 205.8 129.5 265
7 50.7 384.6 281.8 308
8 49.6 746.0 237.9 410
9 48.7 434.0 222.3 259
10 38.3 470.6 63.7 860
11 31.1 508 149.5 210
12 27.1 464.4 62.0 680
13 52.4 329.3 277.3 390
14 49.8 377.5 250.7 343
15 84.3 1174.3 820.6 940
16 34.3 174.3 82.6 194
17 32.4 724.7 190.8 400
18 22.5 178.9 63.3 56
19 25.4 66.8 42.8 139
20 20.8 191 48.5 106
21 51.8 933.1 310.6 392
22 40.6 613.2 491.6 400
23 33.2 457.8 228.0 96
24 34.0 545.3 254.6 78
25 69.8 2286.2 1011.3 571
26 30.6 361.0 203.1 52
27 61.3 614.1 201.0 500
28 30.2 101.3 81.3 47
29 54.0 560.3 194.6 300
30 29.3 855.7 260.3 123
31 52.8 421.6 352.1 180
32 45.6 544.04 455.2 177
33 41.7 229.9 97.5 146

Using SPSS, fit an appropriate multiple linear regression model to the data.
