Multiple Linear Regression (Continued)
EXAMPLE: The following data represent the performance of a chemical process as a
function of several controllable process variables:
Y = CO2 Product, X1 = Solvent Total, X2 = Hydrogen Consumption. The last row gives the column totals.
     Y        X1      X2       Y^2       X1^2     X2^2      X1Y      X2Y     X1X2
 36.98   2227.25    2.06   1367.52    4960643   4.2436    82364   76.179   4588.1
 13.74    434.90    1.33    188.79     189138   1.7689     5976   18.274    578.4
 10.08    481.19    0.97    101.61     231544   0.9409     4850    9.778    466.8
  8.53    247.14    0.62     72.76      61078   0.3844     2108    5.289    153.2
 36.42   1645.89    0.22   1326.42    2708954   0.0484    59943    8.012    362.1
 26.59    907.59    0.76    707.03     823720   0.5776    24133   20.208    689.8
 19.07    608.05    1.71    363.66     369725   2.9241    11596   32.610   1039.8
  5.96    380.55    3.93     35.52     144818  15.4449     2268   23.423   1495.6
 15.52    213.40    1.97    240.87      45540   3.8809     3312   30.574    420.4
 56.61   2043.36    5.08   3204.69    4175320  25.8064   115675  287.579  10380.3
-----------------------------------------------------------------------------------
229.50   9189.32   18.65   7608.87   13710479  56.0201   312224  511.926  20174.4
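The derived columns and the totals in the last row can be checked mechanically. The following is a minimal Python sketch that assumes only the ten raw (Y, X1, X2) rows of the table; the names `data` and `totals` are illustrative and are not part of the original solution.

```python
# Raw observations copied from the table as (Y, X1, X2) tuples.
data = [
    (36.98, 2227.25, 2.06),
    (13.74, 434.90, 1.33),
    (10.08, 481.19, 0.97),
    (8.53, 247.14, 0.62),
    (36.42, 1645.89, 0.22),
    (26.59, 907.59, 0.76),
    (19.07, 608.05, 1.71),
    (5.96, 380.55, 3.93),
    (15.52, 213.40, 1.97),
    (56.61, 2043.36, 5.08),
]

# Column totals; the dictionary keys are illustrative labels only.
totals = {
    "Y": sum(y for y, x1, x2 in data),
    "X1": sum(x1 for y, x1, x2 in data),
    "X2": sum(x2 for y, x1, x2 in data),
    "Y^2": sum(y * y for y, x1, x2 in data),
    "X1^2": sum(x1 * x1 for y, x1, x2 in data),
    "X2^2": sum(x2 * x2 for y, x1, x2 in data),
    "X1Y": sum(x1 * y for y, x1, x2 in data),
    "X2Y": sum(x2 * y for y, x1, x2 in data),
    "X1X2": sum(x1 * x2 for y, x1, x2 in data),
}

for name, value in totals.items():
    print(f"{name:>5} total = {value:.4f}")
```

Small differences from the printed totals are expected because the table rounds each product before summing.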
1. Fit a simple linear regression relating CO2 product to total solvent and calculate the value of
R². (Assignment)
2. Fit a multiple linear regression relating CO2 product to total solvent and hydrogen
consumption, calculate the value of R², compare it with the value of R² in part (1), and
comment.
3. Test the significance of the partial regression coefficients, and construct 95% C.I. for the
regression parameters.
4. Test the significance of the multiple regression coefficients.
5. Can we conclude that total solvent and hydrogen consumption are a sufficient set of
independent variables for explaining the variability in CO2 product?
6. Which explanatory variable has the greater effect on the response variable?
SOLUTION:
[Figure: matrix plot of Y, X1 and X2]
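$$\bar{X}_1 = 918.93, \qquad \bar{X}_2 = 1.865, \qquad \bar{Y} = 22.95$$
$$S(X_1, Y) = \sum X_1 Y - \frac{(\sum X_1)(\sum Y)}{n} = 312224 - \frac{(9189.32)(229.5)}{10} = 101329.106$$
$$S(X_2, Y) = \sum X_2 Y - \frac{(\sum X_2)(\sum Y)}{n} = 511.926 - \frac{(18.65)(229.5)}{10} = 83.91$$
$$S(X_1, X_1) = \sum X_1^2 - \frac{(\sum X_1)^2}{n} = 5266118.8$$
$$S(X_2, X_2) = \sum X_2^2 - \frac{(\sum X_2)^2}{n} = 21.24$$
$$S(X_1, X_2) = \sum X_1 X_2 - \frac{(\sum X_1)(\sum X_2)}{n} = 3036.32$$
$$S(Y, Y) = \sum Y^2 - \frac{(\sum Y)^2}{n} = 2341.84$$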
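The corrected sums above all follow one pattern, the raw sum of products minus the product of the totals divided by n. A short Python sketch of that pattern, starting from the column totals in the data table; the helper name `corrected` is hypothetical.

```python
n = 10

# Column totals taken from the last row of the data table.
sum_y, sum_x1, sum_x2 = 229.50, 9189.32, 18.65
sum_yy, sum_x1x1, sum_x2x2 = 7608.87, 13710479, 56.0201
sum_x1y, sum_x2y, sum_x1x2 = 312224, 511.926, 20174.4


def corrected(sum_uv, sum_u, sum_v, n):
    """Corrected sum of products: sum(UV) - sum(U) * sum(V) / n."""
    return sum_uv - sum_u * sum_v / n


s_x1y = corrected(sum_x1y, sum_x1, sum_y, n)
s_x2y = corrected(sum_x2y, sum_x2, sum_y, n)
s_x1x1 = corrected(sum_x1x1, sum_x1, sum_x1, n)
s_x2x2 = corrected(sum_x2x2, sum_x2, sum_x2, n)
s_x1x2 = corrected(sum_x1x2, sum_x1, sum_x2, n)
s_yy = corrected(sum_yy, sum_y, sum_y, n)

print(f"S(X1,Y)  = {s_x1y:.3f}")
print(f"S(X2,Y)  = {s_x2y:.2f}")
print(f"S(X1,X1) = {s_x1x1:.1f}")
print(f"S(X2,X2) = {s_x2x2:.2f}")
print(f"S(X1,X2) = {s_x1x2:.2f}")
print(f"S(Y,Y)   = {s_yy:.2f}")
```

Passing the same total twice, as in `corrected(sum_yy, sum_y, sum_y, n)`, gives the corrected sum of squares as a special case of the corrected sum of products.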
$$b_1 = \frac{S(X_2, X_2)\,S(X_1, Y) - S(X_1, X_2)\,S(X_2, Y)}{S(X_1, X_1)\,S(X_2, X_2) - [S(X_1, X_2)]^2} = \frac{1897452.6}{102633124.2} = 0.0185$$
$$b_2 = \frac{S(X_1, X_1)\,S(X_2, Y) - S(X_1, X_2)\,S(X_1, Y)}{S(X_1, X_1)\,S(X_2, X_2) - [S(X_1, X_2)]^2} = \frac{134212437.4}{102633124.2} = 1.31$$
$$b_0 = \bar{Y} - b_1 \bar{X}_1 - b_2 \bar{X}_2 = 3.52$$
Fitted regression equation is
$$\hat{Y} = 3.52 + 0.0185 X_1 + 1.31 X_2$$
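The coefficient formulas can be evaluated directly from the corrected sums. The sketch below is one way to do so in Python; it assumes the corrected sums and column totals quoted in this solution, and the variable names are illustrative.

```python
n = 10

# Corrected sums of squares and products from the solution.
s11, s22, s12 = 5266118.8, 21.24, 3036.32   # S(X1,X1), S(X2,X2), S(X1,X2)
s1y, s2y = 101329.106, 83.91                # S(X1,Y), S(X2,Y)

# Column totals, used only to form the means.
sum_y, sum_x1, sum_x2 = 229.50, 9189.32, 18.65

# Common denominator of b1 and b2.
det = s11 * s22 - s12 ** 2

b1 = (s22 * s1y - s12 * s2y) / det
b2 = (s11 * s2y - s12 * s1y) / det
b0 = sum_y / n - b1 * sum_x1 / n - b2 * sum_x2 / n

print(f"b0 = {b0:.2f}, b1 = {b1:.4f}, b2 = {b2:.2f}")


def predict(x1, x2):
    """Fitted value from the estimated regression equation."""
    return b0 + b1 * x1 + b2 * x2
```

The hypothetical `predict` helper returns the fitted CO2 product for a given pair (X1, X2); evaluated at the two means it returns the mean of Y, which is a quick sanity check on b0.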
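Standard Error of estimate
$$S_e = \sqrt{\frac{\sum e^2}{n-k}} = \sqrt{\frac{\sum Y^2 - b_0 \sum Y - b_1 \sum X_1 Y - b_2 \sum X_2 Y}{n-3}} = \sqrt{\frac{358.77}{7}} = 7.16$$
where k = 3 is the number of estimated parameters (b0, b1, b2).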
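As a numerical check on the standard error of estimate, the sketch below evaluates the residual sum of squares from the raw totals. It assumes the totals and corrected sums quoted in this solution and recomputes the coefficients without rounding, since the subtraction is sensitive to rounded values of b0, b1 and b2.

```python
import math

n, k = 10, 3  # observations and estimated parameters (b0, b1, b2)

# Column totals and corrected sums quoted in the solution.
sum_y, sum_x1, sum_x2 = 229.50, 9189.32, 18.65
sum_yy, sum_x1y, sum_x2y = 7608.87, 312224, 511.926
s11, s22, s12 = 5266118.8, 21.24, 3036.32
s1y, s2y = 101329.106, 83.91

# Unrounded coefficients; rounding them first shifts S_e in the second decimal.
det = s11 * s22 - s12 ** 2
b1 = (s22 * s1y - s12 * s2y) / det
b2 = (s11 * s2y - s12 * s1y) / det
b0 = (sum_y - b1 * sum_x1 - b2 * sum_x2) / n

# Residual sum of squares and standard error of estimate.
sse = sum_yy - b0 * sum_y - b1 * sum_x1y - b2 * sum_x2y
se_est = math.sqrt(sse / (n - k))

print(f"SSE = {sse:.2f}, S_e = {se_est:.2f}")
```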
Test of hypothesis about significance of the partial regression
coefficients:
Test of hypothesis for $\beta_1$
1) Construction of hypotheses
$$H_0: \beta_1 = 0$$
$$H_1: \beta_1 \neq 0$$
2) Level of significance
$$\alpha = 5\%$$
3) TEST STATISTIC
$$t = \frac{b_1 - \beta_1}{SE(b_1)} = \frac{0.0185 - 0}{0.003257} = 5.68$$
where
$$SE(b_1) = S_e \sqrt{\frac{S(X_2, X_2)}{S(X_1, X_1)\,S(X_2, X_2) - [S(X_1, X_2)]^2}} = 7.16 \sqrt{\frac{21.24}{(5266118.8)(21.24) - (3036.32)^2}} = 0.003257$$
4) Decision Rule: Reject $H_0$ if $|t_{cal}| \geq t_{\alpha/2}(n-3) = t_{0.025}(7) = 2.365$
5) Result: Since $|t_{cal}| = 5.68 > 2.365$, reject $H_0$ and conclude that there is a significant relationship between CO2
Product and Solvent Total.
95% C.I. for $\beta_1$
$$b_1 \pm t_{\alpha/2}(n-3)\, SE(b_1)$$
$$0.0185 \pm t_{0.025}(7) \times 0.003257$$
$$0.0185 \pm (2.365)(0.003257)$$
$$(0.011,\ 0.026)$$
Test of hypothesis for $\beta_2$
1) Construction of hypotheses
$$H_0: \beta_2 = 0$$
$$H_1: \beta_2 \neq 0$$
2) Level of significance
$$\alpha = 5\%$$
3) TEST STATISTIC
$$t = \frac{b_2 - \beta_2}{SE(b_2)} = \frac{1.31 - 0}{1.622} = 0.81$$
where
$$SE(b_2) = S_e \sqrt{\frac{S(X_1, X_1)}{S(X_1, X_1)\,S(X_2, X_2) - [S(X_1, X_2)]^2}} = 7.16 \sqrt{\frac{5266118.8}{(5266118.8)(21.24) - (3036.32)^2}} = 1.622$$
4) Decision Rule: Reject $H_0$ if $|t_{cal}| \geq t_{\alpha/2}(n-3) = t_{0.025}(7) = 2.365$
5) Result: Since $|t_{cal}| = 0.81 < 2.365$, do not reject $H_0$; the relationship between
CO2 Product and Hydrogen Consumption is not significant once Solvent Total is in the model.
95% C.I. for $\beta_2$
$$b_2 \pm t_{\alpha/2}(n-3)\, SE(b_2)$$
$$1.31 \pm t_{0.025}(7) \times 1.622$$
$$1.31 \pm (2.365)(1.622)$$
$$(-2.53,\ 5.15)$$
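Both partial t-tests and their confidence intervals follow the same recipe, so they can be sketched with one small function. This Python sketch assumes the values of S_e, the corrected sums and the rounded coefficients quoted in this solution. The critical value 2.365 is the standard t-table entry for a two-sided 5% test with 7 degrees of freedom; it is hard-coded here as an assumption rather than computed. The function name `partial_t` is hypothetical.

```python
import math

se_est = 7.16                               # standard error of estimate, S_e
s11, s22, s12 = 5266118.8, 21.24, 3036.32   # S(X1,X1), S(X2,X2), S(X1,X2)
b1, b2 = 0.0185, 1.31                       # rounded coefficients from the text
T_CRIT = 2.365                              # assumed table value, t_0.025 with 7 df

det = s11 * s22 - s12 ** 2


def partial_t(b, s_other):
    """Return (SE, t, CI, reject) for one coefficient under H0: beta = 0.

    s_other is the corrected sum of squares of the *other* regressor,
    which is the numerator inside the square root of the SE formula.
    """
    se_b = se_est * math.sqrt(s_other / det)
    t_cal = b / se_b
    ci = (b - T_CRIT * se_b, b + T_CRIT * se_b)
    return se_b, t_cal, ci, abs(t_cal) >= T_CRIT


se_b1, t1, ci1, reject1 = partial_t(b1, s22)
se_b2, t2, ci2, reject2 = partial_t(b2, s11)

print(f"b1: SE = {se_b1:.6f}, t = {t1:.2f}, reject H0 = {reject1}")
print(f"b2: SE = {se_b2:.3f}, t = {t2:.2f}, reject H0 = {reject2}")
print(f"95% CI for beta1: ({ci1[0]:.3f}, {ci1[1]:.3f})")
print(f"95% CI for beta2: ({ci2[0]:.2f}, {ci2[1]:.2f})")
```

The interval for the second coefficient contains zero, which agrees with the decision not to reject the null hypothesis for it.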
ANALYSIS OF VARIANCE IN MULTIPLE LINEAR REGRESSION
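The hypothesis $\beta_1 = \beta_2 = 0$ may be tested by the analysis of variance procedure.
Total SS = S(Y,Y) = 2341.84
Reg. SS = b1 S(X1,Y) + b2 S(X2,Y) = (0.0185)(101329.106) + (1.31)(83.91) = 1983.07 (evaluated with the unrounded values of b1 and b2)
Error SS = Total SS - Reg. SS = 2341.84 - 1983.07 = 358.77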
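ANOVA TABLE
Source of Variation   Degrees of Freedom   Sum of Squares   Mean Sum of Squares   Fcal     Ftab
(S.O.V)               (DF)                 (SS)             (MSS = SS/DF)
Regression            2                    1983.07          991.54                19.35*   F.05(2,7) = 4.74
Error                 7                    358.77           51.25
TOTAL                 9                    2341.84
Since Fcal = 19.35 > Ftab = 4.74, reject the hypothesis $\beta_1 = \beta_2 = 0$ at the 5% level and conclude that the regression of Y on X1 and X2 is significant.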
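The entries of the ANOVA table can be rebuilt from two numbers, Total SS = S(Y,Y) = 2341.84 and Reg. SS = 1983.07, both stated at the start of this section. The Python sketch below does that under those assumptions; the tabulated value F.05(2,7) = 4.74 is taken from the table rather than computed.

```python
n, p = 10, 2          # observations and number of regressors
total_ss = 2341.84    # S(Y,Y)
reg_ss = 1983.07      # b1*S(X1,Y) + b2*S(X2,Y)
F_TAB = 4.74          # F.05(2,7) as quoted in the ANOVA table

# Error sum of squares by subtraction, then degrees of freedom.
error_ss = total_ss - reg_ss
df_reg, df_err = p, n - p - 1

# Mean squares and the F ratio.
ms_reg = reg_ss / df_reg
ms_err = error_ss / df_err
f_cal = ms_reg / ms_err
reject_h0 = f_cal > F_TAB

print(f"Error SS = {error_ss:.2f}, MS(Reg) = {ms_reg:.2f}, MS(Error) = {ms_err:.2f}")
print(f"F = {f_cal:.2f}, reject H0 = {reject_h0}")
```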
Coefficient of Determination
The coefficient of determination tells us the proportion of variation in the dependent variable
explained by the independent variables:
$$R^2 = \frac{\text{Reg. SS}}{\text{Total SS}} \times 100 = \frac{1983.07}{2341.84} \times 100 = 84.7\%$$
The value of R² indicates that about 85% of the variation in the dependent variable is explained
by the linear relationship with X1 and X2; the remainder is due to other, unknown factors.
Relative importance of independent variables
Standardized regression coefficients are useful for measuring the relative importance of the
independent variables because they are unit-free quantities.
$$b_1^* = b_1 \sqrt{\frac{S(X_1, X_1)}{S(Y, Y)}} = 0.0185 \sqrt{\frac{5266118.8}{2341.84}} = 0.877$$
$$b_2^* = b_2 \sqrt{\frac{S(X_2, X_2)}{S(Y, Y)}} = 1.31 \sqrt{\frac{21.24}{2341.84}} = 0.125$$
Since $|b_1^*| > |b_2^*|$, Solvent Total (X1) is a more important variable than Hydrogen Consumption (X2) in predicting
the CO2 Product.
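The standardized coefficients rescale each slope by the ratio of the spread of its regressor to the spread of Y. A brief Python sketch of that rescaling, assuming the rounded slopes and the corrected sums from the solution, with S(Y,Y) = 2341.84 as obtained in the list of corrected sums; the function name `standardized` is illustrative.

```python
import math

b1, b2 = 0.0185, 1.31                      # fitted slopes (rounded)
s11, s22, syy = 5266118.8, 21.24, 2341.84  # S(X1,X1), S(X2,X2), S(Y,Y)


def standardized(b, s_xx, s_yy):
    """Unit-free slope: b * sqrt(S(X,X) / S(Y,Y))."""
    return b * math.sqrt(s_xx / s_yy)


b1_star = standardized(b1, s11, syy)
b2_star = standardized(b2, s22, syy)

# The regressor with the larger absolute standardized slope ranks first.
more_important = "X1" if abs(b1_star) > abs(b2_star) else "X2"

print(f"b1* = {b1_star:.3f}, b2* = {b2_star:.3f}, more important: {more_important}")
```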