A Passing Reference on Multiple Regression Analysis

Dibyojyoti Bhattacharjee
Department of Statistics, Assam University, Silchar. Email: djb.stat@gmail.com

1. Introduction
Multiple regression is an extension of bivariate (simple linear) regression. It includes several independent
variables (also called predictors) instead of only one, as in bivariate regression. A multiple
regression model is used when a relationship is to be established between a dependent variable Y and
its potential predictors (X1, X2, X3, …). In general, the response (dependent) variable may be related to the
k regressors (or predictors) by the relation,

y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k + \varepsilon    …(1)

where,
β0 = the constant of regression, also called the intercept,
β1, β2, …, βk = the partial regression coefficients associated with the k independent variables,
ε = the residual of regression, i.e. the error term associated with the regression model,
x1, x2, etc. = the independent variables (regressors or predictors), and
y = the dependent or response variable.
The coefficient βj gives the expected change in the response variable (Y) for a unit change
in xj when all the other regressor variables xi (i ≠ j) are held constant.
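
For readers who want to see model (1) estimated numerically, the following is a minimal sketch (not part of the original note) that fits the coefficients by ordinary least squares in Python; the simulated data and the chosen "true" coefficients are purely illustrative assumptions.

# Minimal sketch: estimating the coefficients of model (1) by ordinary least
# squares. The simulated data below are an illustrative assumption.
import numpy as np

rng = np.random.default_rng(0)
n, k = 40, 3                                   # 40 observations, 3 predictors
X = rng.normal(size=(n, k))                    # regressors x1, x2, x3
beta_true = np.array([2.0, 1.5, -0.8, 0.5])    # beta0, beta1, beta2, beta3 (made up)
y = beta_true[0] + X @ beta_true[1:] + rng.normal(scale=0.3, size=n)

X_design = np.column_stack([np.ones(n), X])    # prepend the intercept column
beta_hat, *_ = np.linalg.lstsq(X_design, y, rcond=None)
print("estimated coefficients:", beta_hat)     # should be close to beta_true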

2. Assumptions of Multiple Regression


• The population from which the values of the dependent variable are drawn should be normally distributed.
• The independent variables are not related to each other.
• Each observation is recorded independently of the others.
• The error terms are independently distributed with zero expected value and constant
variance, i.e. ε ~ N(0, σ²).
These assumptions are important; it is only under these assumptions that the ordinary least squares
estimates are unbiased, consistent and efficient estimates of the unknown parameters βi (i = 1, 2, …, k).

3. Understanding the Strength of Regression


The general practice is that the regression's strength is explained by R², the square of
the multiple correlation coefficient of Y on X1, X2, X3, …, Xk. The higher the value of R², the better the
prediction the regressors make about the response variable. But this is not always true. Sometimes a
model with a low R² may give useful predictions, while a model with a high R² may conceal problems. The
principle of 'Occam's Razor' says that a complex model that is only slightly better should not be preferred
if a simpler model can do the job.
If there is sufficient data, one should use half of it to estimate the model and the other
half to test the model's predictive capability. Before considering an independent variable as a regressor in
the regression model, the following criteria should be evaluated:
Logic: Is there any reason to expect a relation between the predictor and the response variable?

Fit: Does the overall regression show any improvement with the inclusion of the new variable?
Parsimony: Does each predictor contribute significantly to the relation?
Stability: Are the predictors strongly related to each other?
3.1 Coefficient of Determination (R2)
The most common measure for overall fit is the coefficient of determination or R2.

R^2 = 1 - \frac{\sum_i (y_i - \hat{y}_i)^2}{\sum_i (y_i - \bar{y})^2}    …(2)
The value of R² lies between 0 and 1. Multiplying it by 100 gives the percentage of variation in
the response variable that is explained by the regressors.
3.2 Adjusted R2
With an increase in the number of predictors in the model, the value of the coefficient of determination
R² keeps increasing. This may make us feel that the more predictors we use, the better the
regression fit. To discourage this, an adjustment can be made to the R² statistic by penalizing the
inclusion of useless predictors. The adjusted coefficient of determination using n observations and k
predictors is,

R^2_{adj} = 1 - (1 - R^2)\left(\frac{n-1}{n-k-1}\right)    …(3)

For a given set of data, R²adj ≤ R². With an increase in the number of predictors in the model the value of R²
cannot decline and generally rises, but the value of R²adj may rise, remain the same or fall,
depending on whether the added predictors increase R² sufficiently.
3.3 Standard Error
Another important measure of fit is the standard error (SE) of the regression, which is a function of the
sum of squares of the residuals (SSE). For n observations and k predictors, SE is given by,

SE = \sqrt{\frac{\sum_{i=1}^{n}(y_i - \hat{y}_i)^2}{n-k-1}} = \sqrt{\frac{SSE}{n-k-1}}    …(4)

If all predictions were perfect, SE would be equal to zero. Thus, the smaller the value of SE, the better the
regression fit. Whenever we report the result of a multiple regression, we should spell out the estimated
values of the parameters along with the values of R² and SE.
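
As a quick illustration of formulas (2)-(4), the sketch below computes R², the adjusted R² and the standard error of the regression from a vector of observed values and a vector of fitted values; the toy numbers in the usage line are assumptions, not data from this note.

# Sketch: R-squared (2), adjusted R-squared (3) and standard error (4)
# computed from observed responses y, fitted values y_hat and k predictors.
import numpy as np

def fit_measures(y, y_hat, k):
    y, y_hat = np.asarray(y, dtype=float), np.asarray(y_hat, dtype=float)
    n = y.size
    sse = np.sum((y - y_hat) ** 2)                    # error sum of squares
    tss = np.sum((y - y.mean()) ** 2)                 # total sum of squares
    r2 = 1 - sse / tss                                # equation (2)
    r2_adj = 1 - (1 - r2) * (n - 1) / (n - k - 1)     # equation (3)
    se = np.sqrt(sse / (n - k - 1))                   # equation (4)
    return r2, r2_adj, se

print(fit_measures([1, 2, 3, 4, 5, 6], [1.1, 1.9, 3.2, 3.8, 5.1, 5.9], k=2))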

4. How many predictors shall we use in the model?


When there is more than one regressor we can have several models. Suppose we have three
regressors X1, X2 and X3 and one response variable Y. We can have several regression equations, such as,

Y = \beta_0 + \beta_1 X_1 + \varepsilon
Y = \beta_0 + \beta_2 X_2 + \varepsilon
Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \varepsilon
Y = \beta_0 + \beta_1 X_1 + \beta_3 X_3 + \varepsilon
Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \varepsilon, and so on.

Statistically, the regression is possible as long as the number of observations (sample size) is more than
the number of predictors. But according to Evan's rule, if n is the sample size and k is the number of
predictors, then n/k ≥ 10 (at least 10 observations per predictor), and as per Doane's rule, n/k ≥ 5 (at least
5 observations per predictor).

5. F test for Significance of the Regression Equation


The overall quality of fit of a regression is assessed using the F test. For a regression with k predictors,
the hypothesis to be tested is,
H0: All the partial regression coefficients are zero, i.e. β1 = β2 = … = βk = 0
to be tested against the alternative hypothesis,
H1: At least one of the coefficients is non-zero.
The basis of this F test is the ANOVA (analysis of variance) table, which decomposes the variation in the
response variable into two main parts, viz.,
Total variation (TSS) = Variation explained by the regression (SSR) + Unexplained error (SSE)
Mathematically,

\sum_{i=1}^{n}(y_i - \bar{y})^2 = \sum_{i=1}^{n}(\hat{y}_i - \bar{y})^2 + \sum_{i=1}^{n}(y_i - \hat{y}_i)^2    …(5)

i.e. Total sum of squares (TSS) = Sum of squares due to regression (SSR) + Error sum of squares (SSE)
The ANOVA calculations can be summarized in the table as follows:

Source of variation   Sum of squares   df        Mean sum of squares   F value       p-value
Regression            SSR              k         MSR = SSR/k           F = MSR/MSE
Error                 SSE              n-k-1     MSE = SSE/(n-k-1)
Total                 TSS              n-1

The p-value depends on the calculated value of F, i.e. MSR/MSE. The p-value is the probability, computed
under the assumption that the null hypothesis (H0) is true, of obtaining a test statistic at least as extreme
as the one calculated. If the p-value is less than the chosen significance level (generally 0.05), then you
reject the null hypothesis, i.e. accept that the sample gives reasonable evidence to believe that at least
one of the regression coefficients is different from zero.
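
The decomposition in (5) and the F statistic of the ANOVA table can be computed directly; the sketch below is an illustrative helper (the toy numbers are assumptions) that returns F = MSR/MSE together with its right-tail p-value from the F distribution with k and n − k − 1 degrees of freedom.

# Sketch: overall F test for the regression, based on the decomposition in (5).
import numpy as np
from scipy import stats

def overall_f_test(y, y_hat, k):
    y, y_hat = np.asarray(y, dtype=float), np.asarray(y_hat, dtype=float)
    n = y.size
    ssr = np.sum((y_hat - y.mean()) ** 2)      # variation explained by the regression
    sse = np.sum((y - y_hat) ** 2)             # unexplained error
    msr, mse = ssr / k, sse / (n - k - 1)
    f = msr / mse
    p = stats.f.sf(f, k, n - k - 1)            # p-value from the F(k, n-k-1) distribution
    return f, p

f_val, p_val = overall_f_test([1, 2, 3, 4, 5, 6], [1.2, 1.8, 3.1, 3.9, 5.2, 5.8], k=2)
print(f_val, p_val)                            # reject H0 if p_val < 0.05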

6. Test of Significance of Individual Regression Coefficients


As defined earlier, each of the partial regression coefficients gives the expected change in the value of Y
for a unit change in the corresponding explanatory variable, holding the other explanatory variables
constant. Thus, we are generally interested in testing whether the value of a partial regression coefficient is
significantly different from zero, i.e. in testing H0: βj = 0. Accepting the null hypothesis would mean that βj =
0 and so the variable associated with βj, i.e. Xj, has no significant impact on the response variable Y.
The test statistic for the purpose is given by,

t = \frac{\hat{\beta}_j}{s_j} \sim t_{n-k-1}    …(6)

where sj is the standard error associated with the estimated value of the jth regression coefficient.
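
The t test in (6) is easy to reproduce once the estimated coefficient and its standard error are known; the sketch below is a small helper, with the numbers in the example chosen only for illustration.

# Sketch: two-sided t test for a single partial regression coefficient, as in (6).
from scipy import stats

def coef_t_test(beta_hat_j, s_j, n, k):
    t = beta_hat_j / s_j                        # test statistic
    p = 2 * stats.t.sf(abs(t), df=n - k - 1)    # two-sided p-value
    return t, p

print(coef_t_test(0.0033, 0.0012, n=30, k=3))   # illustrative values only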

7. Computing Regression in Excel


Let us consider the following data set, taken from Kadiyala (1970)¹, which comprises ice cream
consumption (Y, pints per capita) measured over 30 weeks; X1 refers to the price of ice cream per pint in $,
X2 refers to weekly family income in $, and X3 refers to the mean temperature in °F. This is an abridged view
of the data used for computation.
Sl. No. Y X1 X2 X3
1 0.386 0.27 78 41
2 0.374 0.282 79 56
3 0.393 0.277 81 63
… … … … …
30 0.548 0.26 90 71
To fit a regression model to this data we need to estimate the β's as given in (1) and carry out the other
relevant tests discussed earlier. Fortunately, we can use Microsoft Excel to do so, provided the 'Data Analysis
ToolPak' is loaded. Under the Data menu, click the Data Analysis sub-menu and choose Regression². The
Regression tool will ask for the cells containing the Y [response variable] data and all the cells
containing the X [explanatory variable] data. The latter may be contained in several [adjacent]
columns, but the former should have its data in one column only. The rest of the tool is self-
explanatory.
On completing the inputs in the dialog box shown in 'Screenshot 1', we click 'OK' to get the results.
Once the results are available it is important to interpret them. The results are displayed in three tables;
the first is titled Regression Statistics. This table, seen below, provides the value of the
coefficient of determination (R Square) and the adjusted R² value. For this example we can comment
that about 71.9% of the variation in the dependent variable, i.e. per capita consumption of ice cream (Y),
is explained by the independent variables X1, X2 and X3 mentioned earlier.
The next table is the ANOVA table, from which we can find the overall significance of the
regression equation. The most important value in the table is the one under 'Significance F', which is
actually the p-value of the test. If the value under 'Significance F' is less than 0.05, then we say that the
regression is significant, i.e. the independent variables (or predictors) are able to explain the dependent
(response) variable, as is the case here.

¹ Koteswara Rao Kadiyala (1970). Testing for the independence of regression disturbances. Econometrica, 38, 97-117.
² In case the Data Analysis sub-menu is not present in the Data menu, you have to activate it: click the
Office button, then Excel Options → Add-Ins → Analysis ToolPak → Go → OK.

The next table helps us to get the regression equation, as it spells out the values of the partial
regression coefficients. The P-value column helps us to understand which of the regression coefficients
are significant. From the table below we see that the estimated regression equation is:

ŷ = 0.1973 − 1.0444 X1 + 0.0033 X2 + 0.0034 X3    …(7)

Screenshot 1: Multiple regression dialog box

Regression Statistics
Multiple R 0.847935052
R Square 0.718993852
Adjusted R Square 0.686570066
Standard Error 0.036832698
Observations 30

ANOVA
               df     SS        MS       F       Significance F
Regression     3      0.09025   0.0301   22.17   2.45×10⁻⁷
Residual       26     0.0353    0.0014
Total          29     0.1255

               Coefficients    Standard Error    t Stat      P-value
Intercept      0.197315072     0.270216157       0.730212    0.471789405
X1             -1.04441399     0.834357321       -1.25176    0.22180273
X2             0.00330776      0.001171418       2.823722    0.008988729
X3             0.00345843      0.000445547       7.762213    3.10002×10⁻⁸

Also, from the P-value column we can see that the variables X2 and X3 have a significant
impact in the regression equation, as in both cases the P-values are less than 0.05.
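
For readers without Excel, the same three output tables can be reproduced with a statistics library; the sketch below uses statsmodels, and the file name icecream.csv and its column names are assumptions standing in for however the Kadiyala data happen to be stored.

# Sketch: reproducing the Excel regression output with statsmodels.
# 'icecream.csv' and the column names Y, X1, X2, X3 are assumed for illustration.
import pandas as pd
import statsmodels.api as sm

df = pd.read_csv("icecream.csv")
X = sm.add_constant(df[["X1", "X2", "X3"]])    # add the intercept term
fit = sm.OLS(df["Y"], X).fit()
print(fit.summary())                           # R-squared, ANOVA F test and coefficient table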

8. What if a predictor is Binary?


A binary predictor is one which signifies the presence or absence of an attribute. For example, a
subject identified by its gender can be denoted by,
X = 1 for male
= 0, for female
For investment habits we can have,
X = 1 if the respondent invests in share market
= 0, otherwise
For electronic equipment we can have,
X = 1 if the equipment is produced by a branded company
= 0, otherwise
A binary predictor is also called a shift variable because it shifts the regression plane up or down.
Suppose we have the regression equation

Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \varepsilon    …(8)

Let X1 be a binary predictor. If X1 = 1, i.e. the attribute is present, the regression equation becomes

Y = \beta_0 + \beta_1 \cdot 1 + \beta_2 X_2 + \varepsilon = (\beta_0 + \beta_1) + \beta_2 X_2 + \varepsilon    …(9)

Otherwise, if X1 = 0, then

Y = \beta_0 + \beta_1 \cdot 0 + \beta_2 X_2 + \varepsilon = \beta_0 + \beta_2 X_2 + \varepsilon    …(10)

Testing the significance of the partial regression coefficient associated with a binary predictor is the same
as for any other predictor. It does not require any special treatment.

9. In Case of Multinomial Predictor


Sometimes a predictor may have more than two categories, like smoking habit. Here we may
have a few categories like non-smoker, occasional smoker and chain smoker; or, for source of drinking
water, the available categories can be pond, spring, river, tap, etc. In such a case we require
several binary variables to represent one multinomial predictor. In the smoker case we shall require
two binary predictors, i.e. one less than the total number of categories. The binary predictors in this case
shall be,
Non-smoker = 1 if the respondent is a non-smoker
= 0, otherwise
Occasional smoker = 1 if the respondent is an occasional smoker
= 0, otherwise
There is no need to include 'chain smoker' as a binary predictor, since 'non-smoker = 0' and 'occasional
smoker = 0' together imply that the respondent is a chain smoker. However, it is better not to include many
such multinomial variables in the regression equation, as each multinomial variable gives rise to
several binary predictors to be included in the regression model. Inclusion of several such variables
may lead to violation of both Evan's rule and Doane's rule discussed earlier.
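
If the data are handled in Python, the binary columns for a multinomial predictor do not have to be created by hand; the sketch below (with made-up responses) uses pandas to generate one dummy per category and drop a reference category, matching the "one less than the number of categories" rule described above.

# Sketch: dummy coding a three-category predictor with pandas.
import pandas as pd

smoking = pd.Series(["non-smoker", "chain smoker", "occasional smoker",
                     "non-smoker", "occasional smoker"], name="smoking")
dummies = pd.get_dummies(smoking, prefix="smoking", drop_first=True, dtype=int)
print(dummies)   # two 0/1 columns; both zero identifies the dropped reference category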

10. Including Non-linearity and Interaction in a Regression Model


We generally assume that all the predictors in a regression model are linearly related to the response
variable, but sometimes there may be no reason for such a supposition. In such a case, with two
predictors, instead of assuming a model of the form

Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \varepsilon    …(11)

we can start with

Y = \beta_0 + \beta_1 X_1 + \beta_2 X_1^2 + \beta_3 X_2 + \beta_4 X_2^2 + \varepsilon    …(12)

One can draw a scatter plot of (X1, Y) or (X2, Y) and decide whether a supposition such as (12) is
necessary at all. Here, X1² and X2² can be treated as new variables in the model and
their values used to fit the regression in the usual manner. If the hypotheses H0: β2 = 0 and H0:
β4 = 0 are accepted, there is no reason to include X1² and X2² in the model. However, squares of
predictors add complexity to the model, as they decrease the degrees of freedom for significance tests (1 df
for each added predictor). But if such inclusion improves the model specification, then the choice is justified.
Interaction between two predictors can be included in the model by introducing their product into the
regression model. For example, the equation

Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_1 X_2 + \varepsilon    …(13)

includes the interaction between X1 and X2 via X1X2.
If the t test for the partial regression coefficient β3 fails to reject the hypothesis H0: β3 = 0, then it is pointless
to carry the interaction term in the model. Interaction effects lead to a loss of degrees of freedom, at the
cost of 1 df per interaction. However, if inclusion of the interaction improves the model fit, then the loss
is worth the cost.
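
Squared and interaction terms of the kind shown in (12) and (13) can be added through a model formula; the following sketch uses the statsmodels formula interface on simulated data (the data frame, column names and coefficient values are illustrative assumptions).

# Sketch: polynomial and interaction terms via a statsmodels formula.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
df = pd.DataFrame({"x1": rng.normal(size=50), "x2": rng.normal(size=50)})
df["y"] = 1 + 2 * df.x1 - 0.5 * df.x1**2 + df.x1 * df.x2 + rng.normal(size=50)

# I(x**2) adds a squared term; x1:x2 adds the interaction term
fit = smf.ols("y ~ x1 + I(x1**2) + x2 + I(x2**2) + x1:x2", data=df).fit()
print(fit.summary())   # t tests on the extra terms guide whether to keep them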

11. Multicollinearity
When the independent variables of the regression model, i.e. X1, X2, …, Xk, are inter-correlated instead
of being independent of each other, we have what is known as multicollinearity. If any two independent
variables are correlated, then we have collinearity. Multicollinearity influences the variances of the
estimated regression coefficients. This inflates the confidence intervals for the true coefficients β1, β2, …,
βk, making the t statistics less reliable. Thus, the role of each of the regression coefficients gets mixed up.
It is a good practice to calculate the correlation coefficient between all pairs of predictors and test whether
the values are significant, i.e. whether the correlations differ
from zero at α = 0.05 in a two-tailed test. If the correlation between any pair comes out to be significant,
then only one of the variables of the pair should be retained in the model. Klein's rule states that when the
correlation between a pair of predictors exceeds the multiple correlation coefficient (R), then dropping
one of the variables of the pair from the model may be suggested. However, the correlation coefficient
only looks at the pairwise relation between variables; a general test of multicollinearity should look at
more complex relationships among the predictors. For example, X2 might be a linear function of X1, X3
and X4 even though its pairwise correlations are not very large. The variance inflation factor (VIF) for
each predictor provides a more comprehensive test. The VIF for the jth predictor is given by,

VIF_j = \frac{1}{1 - R_j^2}    …(14)

where Rj² is the coefficient of determination when predictor j is regressed on all the other
predictors (excluding Y). The minimum value of VIF is 1. As a rule of thumb, any value of VIF > 10 may be
considered strong evidence that the jth variable is related to the other k − 1 predictors. Removing
a relevant predictor from a model should be considered seriously, as it may lead to misspecification of
the model.
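
The VIF in (14) is available as a ready-made helper in statsmodels; the sketch below builds a small simulated data set in which one predictor is deliberately close to a linear combination of the others, so its VIF comes out large.

# Sketch: variance inflation factors for each predictor, as in (14).
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(2)
X = pd.DataFrame({"x1": rng.normal(size=60), "x3": rng.normal(size=60)})
X["x2"] = 0.8 * X["x1"] + 0.5 * X["x3"] + rng.normal(scale=0.2, size=60)  # near-collinear

Xc = sm.add_constant(X)                       # VIF is computed with an intercept present
for j, name in enumerate(Xc.columns):
    if name != "const":
        print(name, variance_inflation_factor(Xc.values, j))   # VIF > 10 is a warning sign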

12. Non-normal Errors


One important assumption of regression analysis is that the errors are normally distributed. In the
absence of major outliers, violation of this assumption is not that serious. Such violation may, however,
lead to unreliable confidence intervals for the estimated parameters. One can draw a probability plot or a
histogram of the residuals to check the normality of the residuals visually.
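
The visual check mentioned above takes only a couple of plotting calls; in the sketch below the residuals are simulated stand-ins, whereas in practice they would be taken from the fitted model (e.g. fit.resid in statsmodels).

# Sketch: normal probability (Q-Q) plot and histogram of residuals.
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

residuals = np.random.default_rng(3).normal(size=100)   # stand-in for model residuals

fig, axes = plt.subplots(1, 2, figsize=(8, 3))
stats.probplot(residuals, dist="norm", plot=axes[0])     # points near the line suggest normality
axes[1].hist(residuals, bins=15)                         # roughly bell-shaped if normal
plt.show()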

13. Non constant Variance


Another important assumption of regression analysis is that of constant variance, i.e. that the errors
are homoscedastic. Violation of this assumption leads to bias in the estimated variances of the
least squares estimates of the regression parameters. This can lead to overstated values of the t
statistics for the estimated regression parameters, making their confidence intervals artificially narrow. A
visual procedure for checking constant variance is to draw a scatter plot with the residuals along the Y axis
and the corresponding fitted values of Y along the X axis. If the points in the scatter
plot show a fan-out pattern or a funnel-in pattern, then there is evidence of non-constant variance.
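
A residual-versus-fitted plot of the kind described above takes only a few lines; the fitted values and residuals below are simulated with a deliberately growing error spread so that the fan-out pattern is visible (with a real model they would come from fit.fittedvalues and fit.resid).

# Sketch: residuals plotted against fitted values to check for constant variance.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(4)
fitted = np.linspace(1, 10, 80)
residuals = rng.normal(scale=0.2 * fitted)    # error spread grows with the fitted value

plt.scatter(fitted, residuals)
plt.axhline(0, linestyle="--")
plt.xlabel("Fitted values of Y")
plt.ylabel("Residuals")
plt.show()                                    # a fan or funnel shape signals heteroscedasticity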

14. Auto Correlation


Autocorrelation is a pattern of non-independent errors, which mainly occurs in time-series data.
In a regression involving a time series, each residual et should be independent of its predecessors
et-1, et-2, …. In the case of autocorrelated errors, the estimates of the regression coefficients are unbiased but
the estimated variances are biased. This leads to confidence intervals for the regression coefficients
that are too narrow and t statistics that are too large. Thus, the model fit may be overstated. The most
widely used test for the detection of autocorrelation is the Durbin-Watson test.
Here, H0: Errors are not autocorrelated.
H1: Errors are autocorrelated.
The test statistic is given by,

DW = \frac{\sum_{t=2}^{n}(e_t - e_{t-1})^2}{\sum_{t=1}^{n} e_t^2}

In general,
DW < 2 suggests positive autocorrelation (commonly seen)
DW ≈ 2 suggests no autocorrelation (ideal)
DW > 2 suggests negative autocorrelation (rare)
A way to get rid of first-order autocorrelation is to transform both variables by taking their first differences.
Thus, we have,

\Delta x_t = x_t - x_{t-1}
\Delta y_t = y_t - y_{t-1}

The transformation reduces the number of observations by 1. Some other possible
methods of eliminating autocorrelated errors can be studied in books on econometrics.
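
The Durbin-Watson statistic defined above is one line of arithmetic, and statsmodels also ships a helper for it; in the sketch below the residual series is simulated purely for illustration.

# Sketch: Durbin-Watson statistic from a series of time-ordered residuals.
import numpy as np
from statsmodels.stats.stattools import durbin_watson

e = np.random.default_rng(5).normal(size=50)            # stand-in residuals e_1, ..., e_n
dw_manual = np.sum(np.diff(e) ** 2) / np.sum(e ** 2)    # the definition given above
print(dw_manual, durbin_watson(e))                      # values near 2 suggest no autocorrelation
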
15. Standardized Residuals
By dividing each residual by its standard error we get the standardized residual. As a rule of
thumb, any standardized residual whose absolute value is more than 2 is unusual, and any residual
whose absolute value is 3 or more would be considered an outlier.
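
Standardized (internally studentized) residuals are available from a fitted statsmodels result; the sketch below fits a small simulated model with one planted outlier and flags observations by the |2| and |3| thresholds mentioned above.

# Sketch: flagging unusual observations and outliers via standardized residuals.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(6)
x = rng.normal(size=40)
y = 1 + 2 * x + rng.normal(size=40)
y[5] += 8                                                # plant one gross outlier

fit = sm.OLS(y, sm.add_constant(x)).fit()
std_resid = fit.get_influence().resid_studentized_internal
print(np.where(np.abs(std_resid) > 2)[0])                # unusual observations
print(np.where(np.abs(std_resid) > 3)[0])                # likely outliers (index 5 expected)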

16. Outliers
Outliers cause several problems in regression, including loss of fit. An outlier may be an error in
recording the data. In the case of an error, it is advisable to check and enter the correct data. But often it is
difficult to identify such an error.
17. Variable Transformation
Sometimes a relationship cannot be captured adequately using linear regression. Some variable transformation
can be performed to improve the fit. In the case of an exponential trend in the scatter plot, one can use a log
transformation, as it reduces heteroscedasticity and improves the fit.
18. Conclusion
Another interesting area of discussion is how the regression equation behaves when the assumptions on
which it is based are violated. That is a very appealing area and introduces several other terminologies like
autocorrelation, multicollinearity, variance inflation factor, homoscedasticity, etc. Some other time the
discussion shall be continued. I hope this note shall help young research scholars, students and
teachers to understand multiple regression in a nutshell. The entire discussion is based on the chapter
'Multiple Regression Analysis' that appeared in the book 'Applied Statistics in Business and Economics' by
Doane and Seward, published by Tata McGraw-Hill Publishing Company Limited, New Delhi, in 2007,
pages 558-603.

