Simple Linear Regression Analysis
Introduction to Regression Analysis
Regression analysis is one of the most important and widely
used statistical techniques in business and economic analysis for
examining the functional relationship between two or more
variables. One variable is specified as the dependent/response
variable (DV), denoted by Y, and the other one or more variables
are called the independent/predictor/explanatory variables (IV),
denoted by Xi, i = 1, 2, …, k.
There are two different situations:
(a) Y is a random variable and the Xi are fixed, non-random
variables, e.g. to predict the sales for a company, Year is
the fixed Xi variable.
(b) Both Xi and Y are random variables, e.g. all survey data
are of this type; in this situation, cases are selected randomly
from the population, and both Xi and Y are measured.
Main Purposes
Regression analysis can be used for either of two main
purposes:
(1) Descriptive: The kind of relationship and its strength
are examined. This examination can be done
graphically or by the use of descriptive equations.
Tests of hypotheses and confidence intervals can serve
to draw inferences regarding the relationship.
(2) Predictive: The equation relating Y and Xi can be used
to predict the value of Y for a given value of Xi .
Prediction intervals can also be used to indicate a likely
range of the predicted value of Y.
Description of Methods of Regression:
The general form of a probabilistic model is
Y = deterministic component + random error.
For simple linear regression, the model is Yi = β0 + β1Xi + εi, where
the error terms εi have mean 0 and constant variance σ².
As you will see, the random error plays an important role in
testing hypotheses and finding confidence intervals for the
parameters in the model.
Important Features of the Model
(1) The response Yi in the ith trial is the sum of two components: (1)
the constant term β0 + β1Xi and (2) the random term εi. Hence,
Yi is a random variable.
(2) E(Yi) = β0 + β1Xi
(3) Yi in the ith trial exceeds or falls short of the value of the
regression function by the error term amount εi.
(4) var(Yi) = var(εi) = σ². Thus, the regression model assumes that the
probability distributions of Y have the same variance σ²,
regardless of the level of the predictor variable X.
(5) Since the error terms εi and εj are uncorrelated, so are Yi and Yj.
(6) In summary, the regression model implies that the responses Yi
come from probability distributions whose means are E(Yi) = β0 +
β1Xi and var(Yi) = σ², the same for all levels of X. Further, any
two Yi and Yj are uncorrelated.
Estimating the Model Parameters
E(ε) = 0 is equivalent to saying that E(Y) equals the deterministic
component of the model. That is,
E(Y) = β0 + β1X,
where the constants β0 and β1 are the population parameters. This
is called the population regression equation (line).
The Least Squares Method
The principle of least squares is illustrated in the
following Figure.
[Figure: observed Y values scattered around the estimated line Ŷ = b0 + b1x, with vertical deviations e1, e2, e3, e4 between the actual points and the fitted line.]
The Least Squares Method (Cont.)
For every observed Yi in a sample of points, there is a
corresponding predicted value Ŷi, equal to b0 + b1xi. The
sample deviation of the observed value Yi from the
predicted Ŷi is
ei = Yi - Ŷi,
called a residual; that is,
ei = Yi - b0 - b1Xi.
We shall find b0 and b1 so that the sum of the squares of the
errors (residuals), SSE = Σei² = Σ(Yi - Ŷi)² = Σ(Yi - b0 - b1Xi)², is
a minimum.
The Least Squares Method (Cont.)
Differentiating SSE with respect to b0 and b1, we have
∂(SSE)/∂b0 = -2Σ(Yi - b0 - b1Xi)
∂(SSE)/∂b1 = -2Σ(Yi - b0 - b1Xi)Xi.
Setting the partial derivatives equal to zero and rearranging the
terms, we obtain the equations (called the normal equations)
n b0 + b1 ΣXi = ΣYi and b0 ΣXi + b1 ΣXi² = ΣXiYi,
which may be solved simultaneously to yield computing formulas
for b0 and b1 as follows:
b1 = SSxy/SSxx (= r × sy/sx), b0 = Ȳ - b1X̄
where
SSxy = Σ(Xi - X̄)(Yi - Ȳ) = ΣXiYi - (ΣXi)(ΣYi)/n
SSxx = Σ(Xi - X̄)² = ΣXi² - (ΣXi)²/n
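The computing formulas above translate directly into code. Below is a minimal Python/NumPy sketch using a small made-up data set (the arrays x and y are purely hypothetical):

import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

n = len(x)
ss_xy = np.sum(x * y) - np.sum(x) * np.sum(y) / n   # SSxy
ss_xx = np.sum(x ** 2) - np.sum(x) ** 2 / n         # SSxx

b1 = ss_xy / ss_xx              # slope
b0 = y.mean() - b1 * x.mean()   # intercept

print(f"b0 = {b0:.4f}, b1 = {b1:.4f}")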
Properties of Least Squares Estimators
(1) Gauss-Markov Theorem
Under the conditions of the regression model, the least
squares estimators b0 and b1 are unbiased estimators
(i.e., E(b0) = β0 and E(b1) = β1) and have minimum variance
among all unbiased linear estimators.
(2) The estimated value of Y (i.e. Ŷ = b0 + b1X) is an unbiased
estimator of E(Y) = β0 + β1X, with minimum variance in the
class of unbiased linear estimators.
Properties of Fitted Regression Line
(1) The sum of the residuals is zero: Σei = 0.
(2) The sum of the squared residuals, Σei², is a minimum.
(3) ΣŶi = ΣYi
(4) ΣXiei = 0
(5) ΣŶiei = 0
(6) The regression line always goes through the point (X̄, Ȳ).
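These properties are easy to verify numerically. The following sketch (hypothetical data again) checks properties (1), (4), (5) and (6) for a fitted line:

import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()

y_hat = b0 + b1 * x
e = y - y_hat

print(np.isclose(e.sum(), 0))                     # property (1): sum of residuals is 0
print(np.isclose((x * e).sum(), 0))               # property (4): sum of Xi*ei is 0
print(np.isclose((y_hat * e).sum(), 0))           # property (5): sum of Yhat_i*ei is 0
print(np.isclose(b0 + b1 * x.mean(), y.mean()))   # property (6): line passes through (xbar, ybar)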
Example 1
A random sample of 42 firms was chosen from the S&P500 firms
listed in the Spring 2003 Special Issue of Business Week (The
Business Week Fifty Best Performers).
The dividend yield (DIVYIELD) and the 2002 earnings per share
(EPS) were recorded for the 42 firms. These data are in a file named
DIV3. Using dividend yield as the DV and EPS as the IV, plot the
scatter diagram and run a regression using SPSS.
(a) Find the estimated regression line.
(b) Find the predicted values of the DV given EPS = 1 and EPS = 2.
Example 1 – Solution
SPSS Coefficients output (Dependent Variable: DIVYIELD):

Model        B       Std. Error   Standardized Beta   t       Sig.
(Constant)   2.034
EPS          .374    .239         .240                1.562   .126

(a) The estimated regression line is Ŷ = 2.034 + 0.374x.
(b) For EPS = 1, Ŷ = 2.034 + 0.374(1) = 2.408; for EPS = 2, Ŷ = 2.034 + 0.374(2) = 2.782.
Example 1 - Scatter Diagram
[Scatter diagram of DIVYIELD versus EPS with the fitted line Ŷ = 2.034 + 0.374x.]
Normal Error Regression Model
No matter what the form of the distribution of the error
terms εi (and hence of the Yi) may be, the LSM provides unbiased point
estimators of β0 and β1 that have minimum variance among all
unbiased linear estimators.
To set up interval estimates and make tests, however, we need to
make an assumption about the form of the distribution of the εi.
The standard assumption is that the error terms εi are normally
distributed, and we will adopt it here.
Since the functional form of the probability distribution of the
error terms is now specified, we can use the maximum likelihood
method to obtain estimators of the parameters β0, β1 and σ². In
fact, the MLE and LSE for β0 and β1 are the same. The MLE for σ² is
biased: σ̂² = Σei²/n = SSE/n = s²(n - 2)/n.
A normal error term greatly simplifies the theory of regression
analysis (see the comments on page 32).
Normality & Constant Variance
Assumptions
[Figure: at each level of X (e.g. X1 and X2), the distribution f(ε) of Y is normal with the same variance, centered on the regression line E(Y) = β0 + β1X.]
Inferences Concerning the Regression
Coefficients
Inferences Concerning β1
(a) b1 is a normal random variable for the normal error model.
(b) E(b1) = β1. That is, b1 is an unbiased estimator of β1.
(c) Var(b1) = σ²/SSxx, which is estimated by s²(b1) = s²/SSxx, where s²
is the unbiased estimator of σ².
(d) The (1 - α)100% confidence interval for β1 (σ² unknown) is
b1 - tα/2 s(b1) < β1 < b1 + tα/2 s(b1),
where tα/2 is a value of the t distribution with (n - 2) degrees of
freedom, and s(b1) is the standard error of b1, i.e. s(b1) = s/(SSxx)^1/2.
(e) Hypothesis test of β1
To test the null hypothesis H0: β1 = 0 against a suitable alternative,
we can use the t distribution with n - 2 degrees of freedom to
establish a critical region and then base our decision on the value of
t = b1/s(b1).
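As an illustration, here is a minimal Python sketch of the interval in (d) and the test in (e), computed from first principles with SciPy's t distribution (the data are hypothetical):

import numpy as np
from scipy import stats

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])
n = len(x)

ss_xx = np.sum((x - x.mean()) ** 2)
b1 = np.sum((x - x.mean()) * (y - y.mean())) / ss_xx
b0 = y.mean() - b1 * x.mean()

e = y - (b0 + b1 * x)
s2 = np.sum(e ** 2) / (n - 2)          # unbiased estimator of sigma^2
s_b1 = np.sqrt(s2 / ss_xx)             # standard error of b1

alpha = 0.05
t_crit = stats.t.ppf(1 - alpha / 2, df=n - 2)
ci = (b1 - t_crit * s_b1, b1 + t_crit * s_b1)   # (1 - alpha)100% CI for beta1

t_stat = b1 / s_b1                              # test of H0: beta1 = 0
p_value = 2 * stats.t.sf(abs(t_stat), df=n - 2)
print(f"95% CI for beta1: {ci}, t = {t_stat:.3f}, p = {p_value:.4f}")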
Inferences Concerning 0
(a) b0 is a normal random variable for the normal error model.
(b) E(b0) = β0. That is, b0 is an unbiased estimator of β0.
(c) Var(b0) = σ²ΣXi²/(nSSxx), which is estimated by s²(b0) = s²ΣXi²/(nSSxx),
where s² is the unbiased estimator of σ².
(d) The (1 - α)100% confidence interval for β0 (σ² unknown) is
b0 - tα/2 s(b0) < β0 < b0 + tα/2 s(b0),
where tα/2 is a value of the t distribution with (n - 2) degrees of
freedom, and s(b0) = s(ΣXi²/(nSSxx))^1/2.
(e) Hypothesis test of β0
To test the null hypothesis H0: β0 = 0 against a suitable alternative,
we can use the t distribution with n - 2 degrees of freedom to
establish a critical region and then base our decision on the value of
t = b0/s(b0).
Some Considerations
Effects of Departures From Normality
If the probability distributions of Y are not exactly
normal but do not depart seriously, the sampling
distributions of b0 and b1 will be approximately
normal. Even if the distributions of Y are far from
normal, the estimators b0 and b1 generally have the
property of asymptotic normality as the sample
size increases. Thus, with sufficiently large
samples, the confidence intervals and decision rules
given earlier still apply even if the probability
distributions of Y depart far from normality.
Inferences Concerning E(Y)
(1) The sampling distribution of Ŷi is normal for the normal error
model.
(2) Ŷi is an unbiased estimator of E(Yi),
because E(Yi) = β0 + β1Xi and
E(Ŷi) = E(b0 + b1Xi) = β0 + β1Xi = E(Yi).
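This slide does not show the variance of Ŷi, but a confidence interval for E(Y) at a given X can be sketched using the standard result var(Ŷi) = σ²[1/n + (Xi - X̄)²/SSxx], the same quantity that appears inside s²(predi) on a later slide. A minimal Python sketch (hypothetical data; x_new is an assumed value of interest):

import numpy as np
from scipy import stats

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])
n = len(x)

ss_xx = np.sum((x - x.mean()) ** 2)
b1 = np.sum((x - x.mean()) * (y - y.mean())) / ss_xx
b0 = y.mean() - b1 * x.mean()
s2 = np.sum((y - (b0 + b1 * x)) ** 2) / (n - 2)   # MSE

x_new = 3.5
y_hat = b0 + b1 * x_new
# standard result (not shown on the slide): var(Yhat) = sigma^2 * (1/n + (x-xbar)^2/SSxx)
s_yhat = np.sqrt(s2 * (1 / n + (x_new - x.mean()) ** 2 / ss_xx))

t_crit = stats.t.ppf(0.975, df=n - 2)
print(f"95% CI for E(Y) at x={x_new}: "
      f"({y_hat - t_crit * s_yhat:.3f}, {y_hat + t_crit * s_yhat:.3f})")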
Prediction Interval
[Figure: two possible locations of the distribution of Y, each with its own prediction limits, illustrating that the limits must allow for uncertainty about where E(Yi) lies.]
Prediction Interval (cont.)
Since we cannot be certain of the location of the distribution
of Y, prediction limits for Yi(new) clearly must take account of
two elements: (a) variation in the possible location of the
distribution of Y; and (b) variation within the probability
distribution of Y. That is,
var(predi) = var(Yi(new) - Ŷi) = var(Yi(new)) + var(Ŷi) = σ² + var(Ŷi).
An unbiased estimator of var(predi) is as follows:
s²(predi) = s² + s²(Ŷi) = s²[1 + (1/n) + (Xi - X̄)²/SSxx]
The (1 - α)100% prediction interval for Yi(new) is as follows:
Ŷi - tα/2,(n-2) s(predi) < Yi(new) < Ŷi + tα/2,(n-2) s(predi)
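A minimal Python sketch of the prediction interval above; compared with the interval for the mean response, the only change is the extra "1" inside s²(predi) (hypothetical data again):

import numpy as np
from scipy import stats

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])
n = len(x)

ss_xx = np.sum((x - x.mean()) ** 2)
b1 = np.sum((x - x.mean()) * (y - y.mean())) / ss_xx
b0 = y.mean() - b1 * x.mean()
s2 = np.sum((y - (b0 + b1 * x)) ** 2) / (n - 2)   # MSE

x_new = 3.5
y_hat = b0 + b1 * x_new
# s^2(pred) = s^2 * [1 + 1/n + (x-xbar)^2 / SSxx]
s_pred = np.sqrt(s2 * (1 + 1 / n + (x_new - x.mean()) ** 2 / ss_xx))

t_crit = stats.t.ppf(0.975, df=n - 2)
print(f"95% prediction interval at x={x_new}: "
      f"({y_hat - t_crit * s_pred:.3f}, {y_hat + t_crit * s_pred:.3f})")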
Comments on Prediction Interval
The prediction limits, unlike the confidence limits for a
mean response E(Yi), are sensitive to departures from
normality of the error term distribution.
Hyperbolic Interval Bands
[Figure: confidence and prediction bands around the fitted line are hyperbolic, narrowest at X = X̄ and widening as Xgiven moves farther from X̄.]
Example 2
The vice-president of marketing for a large firm is concerned about the
effect of advertising on sales of the firm’s major product. To investigate
the relationship between advertising and sales, data on the two variables
were gathered from a random sample of 20 sales districts. These data
are available in a file named SALESAD3. Sales (DV) and advertising
(IV) are both expressed in hundreds of dollars.
From the SPSS Coefficients output (the full table appears with the standardized-coefficient slide below), the estimated regression line is
Ŷ = -57.281 + 17.57x
Example 2 – Scatter Plot
[Scatter plot of sales versus advertising with the fitted line Ŷ = -57.281 + 17.57x.]
The Coefficient of Determination
In many regression problems, the major reason for constructing
the regression equation is to obtain a tool that is useful in
predicting the value of the dependent variable Y from some known
value of the independent variable X. Thus, we often wish to assess
the accuracy of the regression line in predicting the Y values.
Partitioning Variation (Cont.)
The total variation in the dependent variable Y can be partitioned into
two parts: variation explained by the regression and unexplained variation.
Computing Formulas
The various sums of squares may be found more simply by
using the following formulas.
SST = SSyy = Σ(Yi - Ȳ)² = ΣYi² - (ΣYi)²/n
SSR = Σ(Ŷi - Ȳ)² = b1·SSxy
SSE = Σ(Yi - Ŷi)² = SS(Total) - SSR.
Solution (SPSS ANOVA output for Example 2; dependent variable: sales):

Source        Sum of Squares   df   Mean Square   F         Sig.
Regression    4.052E7          1    4.052E7       114.539   .000
Residual      6368342.383      18   353796.799
Total         4.689E7          19

Predictors: (Constant), adv
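A small Python sketch of the sum-of-squares partition and the F test reported in an ANOVA table like the one above (the x and y arrays are hypothetical; the table itself comes from SPSS on the SALESAD3 data):

import numpy as np
from scipy import stats

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])
n = len(x)

ss_xy = np.sum((x - x.mean()) * (y - y.mean()))
ss_xx = np.sum((x - x.mean()) ** 2)
b1 = ss_xy / ss_xx

sst = np.sum((y - y.mean()) ** 2)     # SST = SSyy
ssr = b1 * ss_xy                      # SSR = b1 * SSxy
sse = sst - ssr                       # SSE = SST - SSR

f_stat = (ssr / 1) / (sse / (n - 2))  # F = MSR / MSE with 1 and n-2 df
p_value = stats.f.sf(f_stat, 1, n - 2)
print(f"SST={sst:.3f}, SSR={ssr:.3f}, SSE={sse:.3f}, "
      f"F={f_stat:.2f}, p={p_value:.4f}")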
Description of Methods of Regression:
Case When X is Random
For the variable-X case, both X and Y are random variables
measured on cases that are randomly selected from a
population.
The fixed-X regression model applies in this case when we treat
the X values as if they were pre-selected. This technique is
justifiable theoretically by conditioning on the X values that
happened to be obtained in the sample (textbook page 83).
Therefore all the previous discussion and formulas are
precisely the same for this case as for the fixed-X case.
Since both X and Y are considered random variables, other
parameters can be useful for describing the model, namely the
covariance of X and Y, denoted by σXY (or Cov(X, Y)), and the
correlation coefficient, denoted by ρ, which are measures of
how the two variables vary together.
Correlation Coefficient
The correlation coefficient ρ = σXY/(σXσY) is a measure of the
direction and the strength of the linear association between two
variables. It is dimensionless, and it may take any value between
-1 and 1, inclusive.
Test of Coefficient of Correlation
Note that tests of hypotheses and confidence intervals for the
variable-X case require that X and Y be jointly normally
distributed, that is, that X and Y follow a bivariate normal
distribution. Under H0: ρ = 0, the test statistic
t = r√(n - 2)/√(1 - r²) follows a t distribution with n - 2 degrees of freedom.
Example 2 (cont.)
Use the data in the example to test whether there is a significant linear
relationship between sales and advertising expense (both in
hundreds of dollars). Use α = 0.05.
Solution:
(1) H0: ρ = 0 against Ha: ρ ≠ 0
(2) α = 0.05, n = 20, df = n - 2 = 18 and t0.025,18 = 2.101
(3) The rejection rule: if |t| > 2.101, then reject H0.
(4) Computations: r = SSxy/(SSxxSSyy)^1/2 = 0.9296, so t = r√(n - 2)/√(1 - r²) = 10.701.
(5) We reject H0 at α = 0.05 since t = 10.701 > 2.101 and conclude that
there is a significant linear relationship between sales and
advertising expense.
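A Python sketch of the same correlation test on hypothetical data (the slide's numbers come from the SALESAD3 file, which is not reproduced here):

import numpy as np
from scipy import stats

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])
n = len(x)

r = np.corrcoef(x, y)[0, 1]                       # sample correlation coefficient
t_stat = r * np.sqrt(n - 2) / np.sqrt(1 - r ** 2) # t = r*sqrt(n-2)/sqrt(1-r^2)
p_value = 2 * stats.t.sf(abs(t_stat), df=n - 2)

print(f"r = {r:.4f}, t = {t_stat:.3f}, p = {p_value:.4f}")
# scipy.stats.pearsonr(x, y) gives the same r and two-sided p-value.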
Further Examination of Computer Output
Standardized Regression Coefficient
The standardized regression coefficient is the slope in the
regression equation when X and Y are standardized. After
standardization, the intercept in the regression equation is
zero, and for simple linear regression the standardized
slope is equal to the correlation coefficient r (a small numerical
check follows the table below). In multiple regression, the
standardized regression coefficients help quantify the relative
contribution of each X variable.
SPSS Coefficients output (Dependent Variable: sales):

Model         B         Std. Error   Standardized Beta   t        Sig.   95% CI Lower   95% CI Upper
(Constant)    -57.281   509.750                          -.112    .912   -1128.227      1013.665
adv           17.570    1.642        .930                10.702   .000   14.121         21.019
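As promised above, a quick numerical check (hypothetical data) that the standardized intercept is 0 and the standardized slope equals r:

import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

zx = (x - x.mean()) / x.std(ddof=1)   # standardized X
zy = (y - y.mean()) / y.std(ddof=1)   # standardized Y

b1_std = np.sum((zx - zx.mean()) * (zy - zy.mean())) / np.sum((zx - zx.mean()) ** 2)
b0_std = zy.mean() - b1_std * zx.mean()

r = np.corrcoef(x, y)[0, 1]
print(np.isclose(b0_std, 0), np.isclose(b1_std, r))   # both True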
Checking for Violations of Assumptions
We usually do not know in advance whether a linear regression
model is appropriate for our data set. Therefore, it is necessary to
check whether the necessary assumptions are violated. The
analysis of the residuals is a frequently helpful and useful tool
for this purpose.
The basic principles apply to all statistical models discussed in this
course.
Residuals: In model building, a residual is what is left after the
model is fit. It is the difference between an observed value of Y
and the predicted value of Y, i.e. Residuali = ei = Yi - Ŷi. In
regression analysis, the true errors are assumed to be independent
normal variables with a mean of 0 and a constant variance of σ².
If the model is appropriate for the data, the residuals ei, which are
estimates of the true errors, should have similar characteristics.
(Refer to pages 102-103.)
Checking for Violations of Assumptions
Identification of equality of variance
Scatter plots can also be used to detect whether the assumption of
constant variance of Y for all values of X is being violated. If the
spread of the residuals increases or decreases with the values of
the independent variable or with the predicted values, then the
assumption of homogeneity of variance is being violated.
Identification of independence
Usually this assumption is relatively easy to meet, since observations
are collected in a random order and hence successive error terms are
also likely to be random. However, in time series data or repeated
measures data, this problem of dependence between successive
error terms often occurs.
Checking for Violations of Assumptions (Cont.)
Identification of normality
A critical assumption of the simple linear regression model is that
the error terms associated with each Xi have a normal
distribution. Note that it is unreasonable to expect the observed
residuals to be exactly normal; some deviation is expected because
of sampling variation. Even if the errors are normally distributed
in the population, sample residuals are only approximately
normal.
Identification of outliers
In combination with a scatter plot of the observed dependent and
independent variables, the plot of residuals can be used to identify
observations which appear to fall a long way from the main
cluster of observations (a residual larger than 3s in absolute value is
commonly flagged as an outlier).
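A Python sketch of these residual checks (hypothetical data; matplotlib and SciPy are assumed to be available): a residuals-versus-fitted plot for unequal variance, a normal Q-Q plot, and a |ei| > 3s outlier flag.

import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8, 12.2, 13.9, 16.3])
n = len(x)

b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
y_hat = b0 + b1 * x
e = y - y_hat
s = np.sqrt(np.sum(e ** 2) / (n - 2))

print("possible outliers (|e| > 3s):", np.where(np.abs(e) > 3 * s)[0])

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(9, 4))
ax1.scatter(y_hat, e)                       # look for a fan shape (unequal variance)
ax1.axhline(0, color="grey")
ax1.set(xlabel="fitted values", ylabel="residuals")
stats.probplot(e, dist="norm", plot=ax2)    # normal Q-Q plot of the residuals
plt.show()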
Overview of Tests Involving Residuals
Tests for Randomness in the Residuals
Runs Test
Tests for Autocorrelation in the Residuals in Time Order
Durbin-Watson Test
Tests for Normality
Correlation Test (Shapiro-Wilk Test)
Chi-Square Test
Kolmogorov Test
Tests for Constancy of Error Variance
Brown-Forsythe (Modified Levene) Test*
Cook-Weisberg (Breusch-Pagan) Test*
F-test for Lack Of Fit
Test whether a linear regression function is a good fit for the data*.
(Note that the tests with * are valid only for large samples or under strong
assumptions)
Overview of Remedial Measures
If the linear regression normal error model is not
appropriate for a data set, there are two basic choices:
(1) Abandon the model and develop and use a more appropriate one.
(2) Employ a transformation of the data so that the model is
appropriate for the transformed data.
Transformations
Transformations for a nonlinear relation
Transformations for nonnormality and unequal error variances
Box-Cox transformations
What to Watch Out For
In the development of the theory for linear regression, the
sample is assumed to be obtained randomly in such a way
that it represents the whole population you are studying.
Often, convenience samples, which are samples of easily
available cases, are taken for economic or other reasons. These
can lead to an underestimate of the variance and possibly to
bias in the regression line.
What to Watch Out For (Cont.)
Association versus Causality - A common mistake made
when using regression analysis is to assume that a strong
fit (high R²) of a regression of Y on X automatically
means that "X causes Y".
(1) The reverse could be true: Y causes X.
(2) There may be a third variable related to both X and Y.
Matrix Approach to Simple Linear
Regression Analysis
yi = β0 + β1xi + εi,  i = 1, 2, …, n
This implies
y1 = β0 + β1x1 + ε1,
y2 = β0 + β1x2 + ε2,
…
yn = β0 + β1xn + εn.
Let Y(n×1) = (y1, y2, …, yn)', X(n×2) = [1(n×1), (x1, x2, …, xn)'],
β(2×1) = (β0, β1)' and ε(n×1) = (ε1, ε2, …, εn)'.
Then the normal model in matrix terms is as follows:
Y(n×1) = X(n×2) β(2×1) + ε(n×1), or simply Y = Xβ + ε,
where ε is a vector of independent normal random variables with
E(ε) = 0 and Var(ε) = Var(Y) = σ²I.
LS Estimation in Matrix Terms
Normal Equations
n b0 + b1 ΣXi = ΣYi
b0 ΣXi + b1 ΣXi² = ΣXiYi
in matrix terms are X'Xb = X'Y, where b = (b0, b1)'.
Estimated Regression Coefficients
(X'X)⁻¹X'Xb = (X'X)⁻¹X'Y
b = (X'X)⁻¹X'Y
LSM in Matrix Notation
Q = Σ[Yi - (β0 + β1Xi)]² = (Y - Xβ)'(Y - Xβ)
= Y'Y - β'X'Y - Y'Xβ + β'X'Xβ = Y'Y - 2β'X'Y + β'X'Xβ
∂Q/∂β = -2X'Y + 2X'Xβ = [∂Q/∂β0, ∂Q/∂β1]'
Equating this to the zero vector, dividing by 2, and substituting b for β,
we obtain b = (X'X)⁻¹X'Y.
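A minimal NumPy sketch of the matrix computation b = (X'X)⁻¹X'Y (hypothetical data; np.linalg.solve is used rather than forming the inverse explicitly):

import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

X = np.column_stack([np.ones_like(x), x])   # n x 2 design matrix [1, x]
b = np.linalg.solve(X.T @ X, X.T @ y)       # solves the normal equations X'Xb = X'Y

print(f"b0 = {b[0]:.4f}, b1 = {b[1]:.4f}")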
Fitted Values and Residuals in Matrix Terms
Fitted Values
Ŷ = Xb = X(X'X)⁻¹X'Y = HY, where H = X(X'X)⁻¹X' is the hat matrix.
Residuals
e = Y - Ŷ = Y - HY = (I - H)Y
Variance-Covariance Matrix
Var(e) = Var[(I - H)Y] = (I - H) Var(Y) (I - H)'
= (I - H) σ²I (I - H)' = σ²(I - H)
and is estimated by s²(e) = MSE (I - H)
ANOVA in Matrix Terms
SS(Total) = ΣYi² - (ΣYi)²/n = Y'Y - Y'JY/n, where J is the n×n matrix of 1s
SSE = e'e = (Y - Xb)'(Y - Xb) = Y'Y - b'X'Y
SSR = b'X'Y - Y'JY/n
Note that Xb = HY and b'X' = (Xb)' = (HY)' = Y'H; then
SS(Total) = Y'(I - J/n)Y = Y'A1Y
SSE = Y'(I - H)Y = Y'A2Y
SSR = Y'(H - J/n)Y = Y'A3Y
Since A1, A2 and A3 are symmetric, SS(Total), SSE and SSR are
quadratic forms of the Yi.
Quadratic forms play an important role in statistics because
all sums of squares in the ANOVA for linear statistical
models can be expressed as quadratic forms.
Inferences in Matrix Terms
The variance-covariance matrix of b is
Var(b) = σ²(X'X)⁻¹.
The estimated variance-covariance matrix of b is
s²(b) = MSE (X'X)⁻¹.
Mean Response
Let Xh = (1, xh)'. Then
Var(Ŷh) = σ² Xh'(X'X)⁻¹Xh.
The estimated variance of Ŷh in matrix notation is
s²(Ŷh) = MSE (Xh'(X'X)⁻¹Xh).
Prediction of New Observation
s²(pred) = MSE (1 + Xh'(X'X)⁻¹Xh)
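A NumPy sketch of these matrix-form variance estimates (hypothetical data; x_h is an assumed value of interest):

import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])
n = len(x)

X = np.column_stack([np.ones_like(x), x])
XtX_inv = np.linalg.inv(X.T @ X)
b = XtX_inv @ X.T @ y

e = y - X @ b
mse = (e @ e) / (n - 2)

s2_b = mse * XtX_inv                          # s^2(b) = MSE (X'X)^-1
x_h = np.array([1.0, 3.5])                    # X_h = (1, x_h)'
s2_mean = mse * (x_h @ XtX_inv @ x_h)         # estimated var of the mean response at X_h
s2_pred = mse * (1 + x_h @ XtX_inv @ x_h)     # estimated var for predicting a new Y at X_h

print(s2_b, s2_mean, s2_pred)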