
Simple Linear Regression Analysis

• Introduction to Regression Analysis
• Simple Linear Regression Model
• Inferences in Regression Analysis
• Diagnostics and Remedial Measures
• Matrix Approach to Linear Regression Analysis

1
Introduction to Regression Analysis
• Regression analysis is one of the most important and widely
used statistical techniques in business and economic analysis for
examining the functional relationship between two or more
variables. One variable is specified to be the dependent/response
variable (DV), denoted by Y, and the other one or more variables
are called the independent/predictor/explanatory variables (IV),
denoted by Xi, i = 1, 2, …, k.
• There are two different situations:
(a) Y is a random variable and the Xi are fixed, non-random
variables, e.g. when predicting the sales of a company, Year is
the fixed Xi variable.
(b) Both Xi and Y are random variables, e.g. all survey data
are of this type; cases are selected randomly
from the population, and both Xi and Y are measured.
2
Main Purposes
• Regression analysis can be used for either of two main
purposes:
(1) Descriptive: The kind of relationship and its strength
are examined. This examination can be done
graphically or by the use of descriptive equations.
Tests of hypotheses and confidence intervals can serve
to draw inferences regarding the relationship.
(2) Predictive: The equation relating Y and the Xi can be used
to predict the value of Y for a given value of Xi.
Prediction intervals can also be used to indicate a likely
range of the predicted value of Y.

3
Description of Methods of Regression:
• The general form of a probabilistic model is
Y = Deterministic component + random error.
As you will see, the random error plays an important role in
testing hypotheses and finding confidence intervals for the
parameters in the model.
• Simple regression analysis means that the value of the
dependent variable Y is estimated on the basis of only one
independent variable:
Y = f(X) + ε .
• On the other hand, multiple regression is concerned with
estimating the value of the dependent variable Y on the basis of
two or more independent variables:
Y = f(X1, X2, …, Xk) + ε , where k ≥ 2 .
4
Simple Linear Regression Model
• We begin with the simplest of probabilistic models - the simple linear
regression model. That is, f(X) is a simple linear function of X,
f(X) = β0 + β1 X .
• The model can be stated as follows:
Yi = β0 + β1 Xi + εi ,  i = 1, 2, …, n
where
Yi is the value of the response variable in the ith trial;
Xi is a non-random variable, the value of the predictor variable
in the ith trial;
εi is a random error with E(εi) = 0, var(εi) = σ², and cov(εi, εj) = 0, i ≠ j;
β0 and β1 are parameters.

5
Important Features of the Model
(1) The response Yi in the ith trial is the sum of two components: (i)
the constant term β0 + β1 Xi and (ii) the random term εi. Hence,
Yi is a random variable.
(2) E(Yi) = β0 + β1 Xi.
(3) Yi in the ith trial exceeds or falls short of the value of the
regression function by the error term amount εi.
(4) var(Yi) = var(εi) = σ². Thus, the regression model assumes that the
probability distributions of Y have the same variance σ²,
regardless of the level of the predictor variable X.
(5) Since the error terms εi and εj are uncorrelated, so are Yi and Yj.
(6) In summary, the regression model implies that the responses Yi
come from probability distributions whose means are E(Yi) = β0 +
β1 Xi and whose variances are σ², the same for all levels of X. Further, any
two responses Yi and Yj are uncorrelated.

6
Estimating the Model Parameters
• E(ε) = 0 is equivalent to saying that E(Y) equals the deterministic
component of the model. That is,
E(Y) = β0 + β1 X ,
where the constants β0 and β1 are the population parameters. It
is called the population regression equation (line).
• Denoting the estimates of β0 and β1 by β̂0 = b0 and β̂1 = b1
respectively, we can then estimate E(Y) by Ŷ from the sample
regression equation (or the fitted regression line)
Ŷ = b0 + b1 X .
• The problem of fitting a line to a sample of points is essentially the
problem of efficiently estimating the parameters β0 and β1 by b0
and b1 respectively. The best known method for doing this is
called the least squares method (LSM).

7
The Least Squares Method
• The principle of least squares is illustrated in the
following figure.

[Figure: observed Y values scattered around the estimated line Ŷ = b0 + b1x; the residuals e1, e2, e3, e4 are the vertical deviations of the actual Y values from the estimated values.]

8
The Least Squares Method (Cont.)
• For every observed Yi in a sample of points, there is a
corresponding predicted value Ŷi, equal to b0 + b1 Xi. The
sample deviation of the observed value Yi from the
predicted Ŷi is
ei = Yi - Ŷi ,
called a residual; that is,
ei = Yi - b0 - b1 Xi .
• We shall find b0 and b1 so that the sum of the squares of the
errors (residuals), SSE = Σei² = Σ(Yi - Ŷi)² = Σ(Yi - b0 - b1 Xi)², is
a minimum.
• This minimization procedure for estimating the parameters is
called the method of least squares.

9
The Least Squares Method (Cont.)
• Differentiating SSE with respect to b0 and b1, we have
∂(SSE)/∂b0 = -2Σ(Yi - b0 - b1 Xi)
∂(SSE)/∂b1 = -2Σ(Yi - b0 - b1 Xi)Xi .
• Setting the partial derivatives equal to zero and rearranging the
terms, we obtain the equations (called the normal equations)
n b0 + b1 ΣXi = ΣYi  and  b0 ΣXi + b1 ΣXi² = ΣYi Xi
which may be solved simultaneously to yield computing formulas
for b0 and b1 as follows:
b1 = SSxy / SSxx (= r × sy/sx),  b0 = Ȳ - b1 X̄
where
SSxy = Σ(Xi - X̄)(Yi - Ȳ) = ΣXi Yi - (ΣXi)(ΣYi)/n
SSxx = Σ(Xi - X̄)² = ΣXi² - (ΣXi)²/n

10
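As an illustration of these computing formulas, here is a minimal Python sketch (hypothetical data; NumPy assumed available) that computes SSxy, SSxx and the least squares estimates b1 and b0 exactly as given above.

import numpy as np

# Hypothetical sample data (X treated as fixed, Y observed)
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Y = np.array([2.1, 2.9, 3.8, 5.2, 5.9])
n = len(Y)

# Corrected sums of squares and cross products
SSxy = np.sum(X * Y) - X.sum() * Y.sum() / n   # Sum(Xi - Xbar)(Yi - Ybar)
SSxx = np.sum(X**2) - X.sum()**2 / n           # Sum(Xi - Xbar)^2

# Least squares estimates
b1 = SSxy / SSxx
b0 = Y.mean() - b1 * X.mean()
print(f"b0 = {b0:.4f}, b1 = {b1:.4f}")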
Properties of Least Squares Estimators
(1) Gauss-Markov Theorem
Under the conditions of the regression model, the least
squares estimators b0 and b1 are unbiased estimators
(i.e., E(b0) = β0 and E(b1) = β1) and have minimum variance
among all unbiased linear estimators.
(2) The estimated value of Y (i.e. Ŷ = b0 + b1 X) is an unbiased
estimator of E(Y) = β0 + β1 X, with minimum variance in the
class of unbiased linear estimators.

• Note that the common variance σ² cannot be estimated by the LSM.
We can prove that the following statistic is an unbiased point
estimator of σ² (you should try to prove it):
s² = SSE/(n-2) = (SSyy - b1 SSxy)/(n-2)

11
Properties of Fitted Regression Line
(1) The sum of the residuals is zero: Σei = 0.
(2) The sum of the squared residuals, Σei², is a minimum.
(3) ΣŶi = ΣYi
(4) ΣXi ei = 0
(5) ΣŶi ei = 0
(6) The regression line always goes through the point (X̄, Ȳ).

• All properties can be proved directly by using the normal equations,
Σ(Yi - b0 - b1 Xi) = 0  and  Σ(Yi - b0 - b1 Xi)Xi = 0,
or
n b0 + b1 ΣXi = ΣYi  and  b0 ΣXi + b1 ΣXi² = ΣYi Xi .

12
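These properties are easy to verify numerically. A minimal sketch, assuming the same kind of hypothetical data as before and NumPy:

import numpy as np

X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])          # hypothetical data
Y = np.array([2.1, 2.9, 3.8, 5.2, 5.9])
n = len(Y)
b1 = (np.sum(X*Y) - X.sum()*Y.sum()/n) / (np.sum(X**2) - X.sum()**2/n)
b0 = Y.mean() - b1 * X.mean()

Yhat = b0 + b1 * X          # fitted values
e = Y - Yhat                # residuals

print(np.isclose(e.sum(), 0))                    # (1) sum of residuals is zero
print(np.isclose(Yhat.sum(), Y.sum()))           # (3) sum of fitted values equals sum of Yi
print(np.isclose((X * e).sum(), 0))              # (4) sum of Xi*ei is zero
print(np.isclose((Yhat * e).sum(), 0))           # (5) sum of Yhat_i*ei is zero
print(np.isclose(b0 + b1 * X.mean(), Y.mean()))  # (6) line passes through (Xbar, Ybar)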
Example 1
A random sample of 42 firms was chosen from the S&P500 firms
listed in the Spring 2003 Special Issue of Business Week (The
Business Week Fifty Best Performers).

The dividend yield (DIVYIELD) and the 2002 earnings per share
(EPS) were recorded for the 42 firms. These data are in a file named
DIV3. Using dividend yield as the DV and EPS as the IV, plot the
scatter diagram and run a regression using SPSS.
(a) Find the estimated regression line.
(b) Find the predicted values of the DV given EPS = 1 and EPS = 2.

13
Example 1 – Solution
Coefficients(a)

                        Unstandardized Coefficients    Standardized Coefficients
Model                   B          Std. Error          Beta          t        Sig.
1   (Constant)          2.034      .541                              3.762    .001
    EPS                 .374       .239                .240          1.562    .126

a. Dependent Variable: Divyield

Ŷ = 2.034 + 0.374x

14
Example 1 - Scatter Diagram

[Scatter plot of DIVYIELD against EPS with the fitted line Ŷ = 2.034 + 0.374x.]

15
Normal Error Regression Model
• No matter what the form of the distribution of the error
terms εi (and hence of the Yi) may be, the LSM provides unbiased point
estimators of β0 and β1 that have minimum variance among all
unbiased linear estimators.
• To set up interval estimates and make tests, however, we need to
make an assumption about the form of the distribution of the εi.
The standard assumption is that the error terms εi are normally
distributed, and we will adopt it here.
• Since the functional form of the probability distribution of the
error terms is now specified, we can use the maximum likelihood
method to obtain estimators of the parameters β0, β1 and σ². In
fact, the MLE and the LSE of β0 and β1 are the same. The MLE of σ² is
biased: σ̂² = Σei²/n = SSE/n = s²(n-2)/n.
• A normal error term greatly simplifies the theory of regression
analysis (see the comments on page 32).

16
Normality & Constant Variance
Assumptions

[Figure: at each level of X (e.g. X1 and X2), the distribution of Y is normal with density f(ε) and the same variance, centered on the regression line E(Y) = β0 + β1 X.]
17
Inferences Concerning the Regression Coefficients

• Aside from merely estimating the linear relationship
between X and Y for purposes of prediction, we may
also be interested in drawing certain inferences about
the population parameters β0 and β1.

• To make inferences or test hypotheses concerning these
parameters, we must know the sampling distributions
of b0 and b1. (Note that b0 and b1 are statistics, i.e.,
functions of the random sample; therefore, they are
random variables.)

18
Inferences Concerning β1
(a) b1 is a normal random variable for the normal error model.
(b) E(b1) = β1. That is, b1 is an unbiased estimator of β1.
(c) Var(b1) = σ²/SSxx, which is estimated by s²(b1) = s²/SSxx, where s²
is the unbiased estimator of σ².
(d) The (1 - α)100% confidence interval for β1 (σ² unknown):
b1 - tα/2 s(b1) < β1 < b1 + tα/2 s(b1)
where tα/2 is a value of the t-distribution with (n - 2) degrees of
freedom, and s(b1) is the standard error of b1, i.e. s(b1) = s/(SSxx)^(1/2).
(e) Hypothesis test of β1
To test the null hypothesis H0: β1 = 0 against a suitable alternative,
we can use the t distribution with n - 2 degrees of freedom to
establish a critical region and then base our decision on the value of
t = b1/s(b1) .
19
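A sketch of the interval estimate and t-test for β1, assuming hypothetical data and SciPy's t distribution:

import numpy as np
from scipy import stats

X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # hypothetical data
Y = np.array([2.1, 2.9, 3.8, 5.2, 5.9])
n = len(Y)

SSxx = np.sum((X - X.mean())**2)
SSxy = np.sum((X - X.mean()) * (Y - Y.mean()))
SSyy = np.sum((Y - Y.mean())**2)
b1 = SSxy / SSxx

s2 = (SSyy - b1 * SSxy) / (n - 2)          # unbiased estimator of sigma^2
s_b1 = np.sqrt(s2 / SSxx)                  # standard error of b1

alpha = 0.05
t_crit = stats.t.ppf(1 - alpha / 2, df=n - 2)
ci = (b1 - t_crit * s_b1, b1 + t_crit * s_b1)      # (1 - alpha)100% CI for beta1

t_stat = b1 / s_b1                                  # test of H0: beta1 = 0
p_value = 2 * stats.t.sf(abs(t_stat), df=n - 2)
print(ci, t_stat, p_value)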
Inferences Concerning β0
(a) b0 is a normal random variable for the normal error model.
(b) E(b0) = β0. That is, b0 is an unbiased estimator of β0.
(c) Var(b0) = σ² ΣXi²/(n SSxx), which is estimated by s²(b0) = s² ΣXi²/(n SSxx),
where s² is the unbiased estimator of σ².
(d) The (1 - α)100% confidence interval for β0 (σ² unknown):
b0 - tα/2 s(b0) < β0 < b0 + tα/2 s(b0)
where tα/2 is a value of the t-distribution with (n - 2) degrees of
freedom, and s(b0) = s (ΣXi²/(n SSxx))^(1/2).
(e) Hypothesis test of β0
To test the null hypothesis H0: β0 = 0 against a suitable alternative,
we can use the t distribution with n - 2 degrees of freedom to
establish a critical region and then base our decision on the value of
t = b0/s(b0) .
20
Some Considerations
• Effects of Departures From Normality
If the probability distributions of Y are not exactly
normal but do not depart seriously, the sampling
distributions of b0 and b1 will be approximately
normal. Even if the distributions of Y are far from
normal, the estimators b0 and b1 generally have the
property of asymptotic normality as the sample
size increases. Thus, with sufficiently large
samples, the confidence intervals and decision rules
given earlier still apply even if the probability
distributions of Y depart far from normality.

21
Inferences Concerning E(Y)
(1) The sampling distribution of Ŷi is normal for the normal error
model.
(2) Ŷi is an unbiased estimator of E(Yi), because E(Yi) = β0 + β1 Xi and
E(Ŷi) = E(b0 + b1 Xi) = β0 + β1 Xi = E(Yi).
(3) The variance of Ŷi: var(Ŷi) = σ² [(1/n) + (Xi - X̄)²/SSxx]
and the estimated variance of Ŷi: s²(Ŷi) = s² [(1/n) + (Xi - X̄)²/SSxx]
(4) The (1 - α)100% confidence interval for the mean response E(Yi)
is as follows: Ŷi - tα/2, (n-2) s(Ŷi) < E(Yi) < Ŷi + tα/2, (n-2) s(Ŷi)
• Note that the confidence limits for E(Yi) are not sensitive to
moderate departures from the assumption that the error terms are
normally distributed. Indeed, the limits are not sensitive to
substantial departures from normality if the sample size is large.
22
Prediction of New Observation
• The distinction between estimation of the mean response E(Yi),
discussed in the preceding section, and prediction of a new
response Yi(new), discussed now, is basic. In the former case, we
estimate the mean of the distribution of Y. In the present case,
we predict an individual outcome drawn from the distribution
of Y.
• Prediction Interval for Yi(new)
When the regression parameters are unknown, they must be
estimated. The mean of the distribution of Y is estimated by Ŷ,
as usual, and the variance of the distribution of Y is estimated
by MSE (i.e. s²). From the figure on the next page, we can see that
there are two probability distributions of Y, corresponding to
the upper and lower limits of a confidence interval for E(Y).

23
Prediction Interval
[Figure: two possible probability distributions of Y, one centered at each of the lower and upper confidence limits for E(Yi); each distribution carries its own prediction limits for Yi(new).]

24
Prediction Interval (cont.)
• Since we cannot be certain of the location of the distribution
of Y, prediction limits for Yi(new) clearly must take account of
two elements: (a) variation in the possible location of the
distribution of Y; and (b) variation within the probability
distribution of Y. That is,
var(predi) = var(Yi(new) - Ŷi) = var(Yi(new)) + var(Ŷi) = σ² + var(Ŷi).
• An unbiased estimator of var(predi) is as follows:
s²(predi) = s² + s²(Ŷi) = s²[1 + (1/n) + (Xi - X̄)²/SSxx]
• The (1 - α)100% prediction interval for Yi(new) is as follows:
Ŷi - tα/2, (n-2) s(predi) < Yi(new) < Ŷi + tα/2, (n-2) s(predi)

25
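The confidence interval for the mean response (previous section) and the prediction interval above differ only in the standard error used. A sketch with hypothetical data, where x0 = 3.5 is an arbitrarily chosen level of X and SciPy is assumed:

import numpy as np
from scipy import stats

X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # hypothetical data
Y = np.array([2.1, 2.9, 3.8, 5.2, 5.9])
n = len(Y)
SSxx = np.sum((X - X.mean())**2)
b1 = np.sum((X - X.mean()) * (Y - Y.mean())) / SSxx
b0 = Y.mean() - b1 * X.mean()
s2 = np.sum((Y - (b0 + b1 * X))**2) / (n - 2)   # MSE

x0 = 3.5
y0_hat = b0 + b1 * x0
se_mean = np.sqrt(s2 * (1/n + (x0 - X.mean())**2 / SSxx))       # s(Yhat) for E(Y)
se_pred = np.sqrt(s2 * (1 + 1/n + (x0 - X.mean())**2 / SSxx))   # s(pred) for a new Y
t_crit = stats.t.ppf(0.975, df=n - 2)

ci_mean = (y0_hat - t_crit * se_mean, y0_hat + t_crit * se_mean)   # 95% CI for E(Y) at x0
pi_new = (y0_hat - t_crit * se_pred, y0_hat + t_crit * se_pred)    # 95% PI for Y(new) at x0
print(ci_mean, pi_new)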
Comments on Prediction Interval
• The prediction limits, unlike the confidence limits for a
mean response E(Yi), are sensitive to departures from
normality of the error term distribution.

• Prediction intervals resemble confidence intervals.
However, they differ conceptually. A confidence
interval represents an inference on a parameter and is
an interval that is intended to cover the value of the
parameter. A prediction interval, on the other hand, is a
statement about the value to be taken by a random
variable, the new observation Yi(new).

26
Hyperbolic Interval Bands

[Figure: hyperbolic confidence and prediction bands around the fitted line, narrowest at Xgiven = X̄ and widening as Xgiven moves away from X̄.]

27
Example 2
The vice-president of marketing for a large firm is concerned about the
effect of advertising on sales of the firm’s major product. To investigate
the relationship between advertising and sales, data on the two variables
were gathered from a random sample of 20 sales districts. These data
are available in a file named SALESAD3. Sales (DV) and advertising
(IV) are both expressed in hundreds of dollars.

(a) What is the sample regression equation relating sales to advertising?


(b) Is there a linear relationship between sales and advertising?
(c) What conclusion can be drawn from the test result?
(d) Find the 95% confidence interval estimate for the mean value of
DV given that IV = 410.
(e) Find the 95% prediction interval for the individual value of DV
given that IV = 410.
(f) Construct a 95% confidence interval estimate of β1.
28
Example 2 – SPSS OUTPUTS
Model Summary(b)

Model    R       R Square    Adjusted R Square    Std. Error of the Estimate
1        .930a   .864        .857                 594.80820

a. Predictors: (Constant), adv
b. Dependent Variable: sales

Coefficients(a)

                     Unstandardized          Standardized                          95% Confidence Interval for B
                     Coefficients            Coefficients
Model                B           Std. Error  Beta           t          Sig.       Lower Bound      Upper Bound
1    (Constant)      -57.281     509.750                    -.112      .912       -1128.2          1013.7
     Adv             17.570      1.642       .930           10.702     .000       14.121           21.019

a. Dependent Variable: sales

Ŷ = -57.281 + 17.57x
29
Example 2 – Scatter Plot

[Scatter plot of sales against advertising with the fitted line Ŷ = -57.281 + 17.57x.]

30
The Coefficient of Determination
• In many regression problems, the major reason for constructing
the regression equation is to obtain a tool that is useful in
predicting the value of the dependent variable Y from some known
value of the independent variable X. Thus, we often wish to assess
the accuracy of the regression line in predicting the Y values.

• R², called the coefficient of determination, provides a
summary measure of how well the regression line fits the sample.
It has a proportional-reduction-in-error interpretation. That is,

• R² is the proportion of the variability in the dependent variable that
is explained by the independent variable (see the figure), namely,

R² = (Sum of squares due to regression) / (Total sum of squares)
31
Partitioning Variation

[Figure: the total deviation Yi - Ȳ is split into the explained deviation Ŷi - Ȳ and the unexplained deviation Yi - Ŷi about the fitted line.]

32
Partitioning Variation (Cont.)
The variation in the dependent variable Y can be partitioned into two parts -
variation explained by the regression and unexplained variation:

Total Variation = Explained Variation + Unexplained Variation

The total sum of squares is SST = Σ(Yi - Ȳ)².

SS(Total) can be subdivided into two components:
SSR = the sum of squares due to regression (explained variation)
SSE = the sum of squares due to error (unexplained variation).
That is, SST = SSR + SSE, namely,
Σ(Yi - Ȳ)² = Σ(Ŷi - Ȳ)² + Σ(Yi - Ŷi)²

33
Computing Formulas
• The various sums of squares may be found more simply by
using the following formulas:
SST = SSyy = Σ(Yi - Ȳ)² = ΣYi² - (ΣYi)²/n
SSR = Σ(Ŷi - Ȳ)² = b1 SSxy
SSE = Σ(Yi - Ŷi)² = SS(Total) - SSR .

• Now we can calculate R² by using the following equation
R² = SSR/SS(Total) = 1 - SSE/SS(Total) = b1 SSxy/SSyy
and 0 ≤ R² ≤ 1.

• The computations are usually summarized in tabular form
(ANOVA Table).
34
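A sketch of these computing formulas in Python (hypothetical data; NumPy assumed):

import numpy as np

X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # hypothetical data
Y = np.array([2.1, 2.9, 3.8, 5.2, 5.9])

SSxy = np.sum((X - X.mean()) * (Y - Y.mean()))
SSxx = np.sum((X - X.mean())**2)
b1 = SSxy / SSxx
b0 = Y.mean() - b1 * X.mean()
Yhat = b0 + b1 * X

SST = np.sum((Y - Y.mean())**2)    # total sum of squares (SSyy)
SSR = b1 * SSxy                    # explained (regression) sum of squares
SSE = SST - SSR                    # unexplained (error) sum of squares
R2 = SSR / SST

print(np.isclose(SSE, np.sum((Y - Yhat)**2)))   # check: SSE equals the sum of squared residuals
print(SST, SSR, SSE, R2)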
ANOVA Table
ANOVA Table for Simple Regression

• While the t-test is used to test the significance of individual
independent variables, the ANOVA table provides an overall
test of the significance of the whole set of independent
variables. The test is an F-test with d.f. (k, n-k-1), where k is
the number of independent variables in the model:
F = MSR/MSE = [R²/k]/[(1-R²)/(n-k-1)], which for simple regression (k = 1)
reduces to (n-2)R²/(1-R²).
• For the simple linear regression model, the F-test is
equivalent to the t-test for the parameter β1. But this is not the
case for the multiple regression model.
35
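A sketch computing the overall F statistic from R² for the simple regression case (k = 1) and confirming numerically that it equals the square of the t statistic for b1; the data are hypothetical and SciPy is assumed:

import numpy as np
from scipy import stats

X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # hypothetical data
Y = np.array([2.1, 2.9, 3.8, 5.2, 5.9])
n, k = len(Y), 1

SSxx = np.sum((X - X.mean())**2)
SSxy = np.sum((X - X.mean()) * (Y - Y.mean()))
b1 = SSxy / SSxx
SST = np.sum((Y - Y.mean())**2)
SSR = b1 * SSxy
SSE = SST - SSR
R2 = SSR / SST

F = (R2 / k) / ((1 - R2) / (n - k - 1))            # = MSR/MSE
p_value = stats.f.sf(F, k, n - k - 1)

t_stat = b1 / np.sqrt((SSE / (n - 2)) / SSxx)
print(np.isclose(F, t_stat**2))                    # F-test equivalent to the t-test when k = 1
print(F, p_value)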
Example 2 (Cont.)
(a) Find SST, SSR, SSE, and R².
(b) Present an ANOVA summary table.
(c) Test the hypothesis H0: β1 = 0 against Ha: β1 ≠ 0 by
using an F-statistic. Let α = 0.05.

Solution:
ANOVA(b)

Model           Sum of Squares    df    Mean Square    F          Sig.
1   Regression  4.052E7           1     4.052E7        114.539    .000a
    Residual    6368342.383       18    353796.799
    Total       4.689E7           19

a. Predictors: (Constant), adv
b. Dependent Variable: sales

36
Description of Methods of Regression:
Case When X is Random
• For the variable-X case, both X and Y are random variables
measured on cases that are randomly selected from a
population.
• The fixed-X regression model applies in this case when we treat
the X values as if they were pre-selected. This technique is
justifiable theoretically by conditioning on the X values that
happened to be obtained in the sample (textbook page 83).
Therefore all the previous discussion and formulas are
precisely the same for this case as for the fixed-X case.
• Since both X and Y are considered random variables, other
parameters can be useful for describing the model, say, the
covariance of X and Y, denoted by σXY (or Cov(X, Y)), and the
correlation coefficient, denoted by ρ, which are measures of
how the two variables vary together.
37
Correlation Coefficient
• The correlation coefficient ρ = σXY/(σX σY) is a measure of the
direction and the strength of linear association between two
variables. It is dimensionless, and it may take any value between
-1 and 1, inclusive.

• A positive correlation (i.e. ρ > 0) means that as one variable
increases, the other likewise increases.

• A negative correlation (i.e. ρ < 0) means that as one variable
increases, the other decreases.

• If ρ = 0 for two variables, then we say that the variables are
uncorrelated and that there is no linear association between them.

• Note that ρ measures only linear relationships. The variables may
be perfectly related in a curvilinear fashion even when ρ = 0.
38
Correlation Coefficient and R²
• The sample correlation coefficient r is an estimator of ρ. The
equation for the sample correlation coefficient is given as follows:
r = SSxy / [(SSxx)(SSyy)]^(1/2) .

• Simple regression techniques and correlation methods are related.
In correlation, r is an estimator of the population correlation
coefficient ρ. In regression, r² = R² is simply a measure of
closeness of fit.

• Thus the sample correlation coefficient r is used to estimate the
direction and the strength of the linear relationship between two
variables, whereas the coefficient of determination r² = R² is the
proportion by which the squared prediction error is reduced when we
use the regression equation rather than the sample mean as a predictor.

39
Test of Coefficient of Correlation
• Note that tests of hypotheses and confidence intervals for the
variable-X case require that X and Y be jointly normally
distributed. That is, X and Y follow a bivariate normal
distribution.

• Under the assumption mentioned above, we can test whether
there is a linear relationship between the X and Y variables (i.e.
whether ρ = 0) by using the following t-test. The same conclusion
as testing the population slope β1 will be drawn.
(1) H0: ρ = 0 against Ha: ρ ≠ 0 (or ρ > 0 or ρ < 0)
(2) The test statistic is t = r (n-2)^(1/2) / (1-r²)^(1/2).
Under H0 the statistic t has the t-distribution with (n-2)
degrees of freedom.

40
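A sketch of this t-test for H0: ρ = 0 with hypothetical data; scipy.stats.pearsonr reports the same two-sided p-value directly:

import numpy as np
from scipy import stats

X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])    # hypothetical data
Y = np.array([2.1, 2.9, 3.8, 5.2, 5.9])
n = len(Y)

r = np.corrcoef(X, Y)[0, 1]                         # sample correlation coefficient
t_stat = r * np.sqrt(n - 2) / np.sqrt(1 - r**2)     # test statistic with n-2 d.f.
p_value = 2 * stats.t.sf(abs(t_stat), df=n - 2)
print(r, t_stat, p_value)

# The same test in one call
print(stats.pearsonr(X, Y))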
Example 2 (cont.)
Use the data in the example to test whether there is a significant linear
relationship between sales and advertising expense (both in
hundreds of dollars). Use α = 0.05.
Solution:
(1) H0: ρ = 0 against Ha: ρ ≠ 0
(2) α = 0.05, n = 20, df = n - 2 = 18 and t0.025, 18 = 2.101
(3) The rejection rule: if |t| > 2.101, then reject H0.
(4) Computations: r = SSxy/(SSxx SSyy)^(1/2) = 0.9296

The test statistic is t = r (n-2)^(1/2) / (1-r²)^(1/2) = 10.701

(5) We reject H0 at α = 0.05 since t = 10.701 > 2.101 and conclude that
there is a significant linear relationship between sales and
advertising expense.

41
Further Examination of Computer Output
• Standardized Regression Coefficient
The standardized regression coefficient is the slope in the
regression equation when X and Y are standardized. After
standardization the intercept in the regression equation will
be zero, and for simple linear regression the standardized
slope will be equal to the correlation coefficient r. In multiple
regression, the standardized regression coefficients help
quantify the relative contribution of each X variable.

Coefficients(a)

                     Unstandardized          Standardized                          95% Confidence Interval for B
                     Coefficients            Coefficients
Model                B           Std. Error  Beta           t          Sig.       Lower Bound       Upper Bound
1    (Constant)      -57.281     509.750                    -.112      .912       -1128.227         1013.665
     Adv             17.570      1.642       .930           10.702     .000       14.121            21.019

a. Dependent Variable: sales

42
Checking for Violations of Assumptions
• We usually do not know in advance whether a linear regression
model is appropriate for our data set. Therefore, it is necessary to
conduct a search to check whether the necessary assumptions are
violated. The analysis of the residuals is frequently a helpful and
useful tool for this purpose.
• The basic principles apply to all statistical models discussed in this
course.
• Residuals: In model building, a residual is what is left after the
model is fit. It is the difference between an observed value of Y
and the predicted value of Y, i.e. Residuali = ei = Yi - Ŷi. In
regression analysis, the true errors are assumed to be independent
normal variables with a mean of 0 and a constant variance of σ².
If the model is appropriate for the data, the residuals ei, which are
estimates of the true errors, should have similar characteristics.
(Refer to pages 102-103.)
43
Checking for Violations of Assumptions
• Identification of equality of variance
Scatter plots can also be used to detect whether the assumption of
constant variance of Y for all values of X is being violated. If the
spread of the residuals increases or decreases with the values of
the independent variable or with the predicted values, then the
assumption of homogeneity of variance is being violated.

• Identification of independence
Usually this assumption is relatively easy to meet since observations
appear in random positions, and hence successive error terms are
also likely to be random. However, in time series data or repeated
measures data, this problem of dependence between successive
error terms often occurs.

44
Checking for Violations of Assumptions (Cont.)
• Identification of normality
A critical assumption of the simple linear regression model is that
the error terms associated with each Xi have a normal
distribution. Note that it is unreasonable to expect the observed
residuals to be exactly normal - some deviation is expected because
of sampling variation. Even if the errors are normally distributed
in the population, sample residuals are only approximately
normal.

Another way to compare the observed distribution of residuals to
that expected under the assumption of normality is to plot the two
cumulative distributions against each other for a series of points.
If the two distributions are identical, a straight line results. This is
called a P-P plot (a cumulative probability plot).
45
Checking for Violations of Assumptions (Cont.)
• Identification of linearity
For simple regression, a scatter plot gives a good indication of
how well a straight line fits the data. Another convenient method
is to plot the residuals against the predicted values. If the
assumptions of linearity and homogeneity of variance are met,
there should be no relationship between the predicted and residual
values, i.e. the residuals should be randomly distributed around
the horizontal line through zero. You should be suspicious of any
observable pattern.

• Identification of outliers
In combination with a scatter plot of the observed dependent and
independent variables, the plot of residuals can be used to identify
observations which appear to fall a long way from the main
cluster of observations (a residual that is larger than 3s is an outlier).
46
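The residual plots described on the last few slides can be produced with a few lines of code. A sketch assuming hypothetical data and the NumPy, SciPy and Matplotlib libraries:

import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

X = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])   # hypothetical data
Y = np.array([2.1, 2.9, 3.8, 5.2, 5.9, 7.1, 7.8, 9.2])
n = len(Y)

b1 = np.sum((X - X.mean()) * (Y - Y.mean())) / np.sum((X - X.mean())**2)
b0 = Y.mean() - b1 * X.mean()
Yhat = b0 + b1 * X
e = Y - Yhat
s = np.sqrt(np.sum(e**2) / (n - 2))

fig, axes = plt.subplots(1, 2, figsize=(10, 4))
axes[0].scatter(Yhat, e)                       # residuals vs predicted: look for non-random patterns
axes[0].axhline(0, color="grey")               # or a funnel shape (unequal variance)
axes[0].set_xlabel("Predicted values")
axes[0].set_ylabel("Residuals")
stats.probplot(e, dist="norm", plot=axes[1])   # normal probability plot of the residuals
plt.show()

print(np.where(np.abs(e) > 3 * s)[0])          # flag residuals larger than 3s as potential outliers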
Overview of Tests Involving Residuals
• Tests for Randomness in the Residuals
  - Runs Test
• Tests for Autocorrelation in the Residuals in Time Order
  - Durbin-Watson Test
• Tests for Normality
  - Correlation Test (Shapiro-Wilk Test)
  - Chi-Square Test
  - Kolmogorov Test
• Tests for Constancy of Error Variance
  - Brown-Forsythe (Modified Levene) Test*
  - Cook-Weisberg (Breusch-Pagan) Test*
• F-test for Lack of Fit
  - Tests whether a linear regression function is a good fit for the data*.
(Note that the tests with * are valid only for large samples or under strong
assumptions.)
47
Overview of Remedial Measures
• If the linear regression normal error model is not
appropriate for a data set, there are two basic choices:
  - Abandon the model and develop and use a more
    appropriate model (non-normal, nonlinear models)
  - Employ some transformation(s) on the data.
• Transformations
  - Transformations for nonlinear relations
  - Transformations for nonnormality and unequal variances
  - Box-Cox transformations

48
What to Watch Out For
• In the development of the theory for linear regression, the
sample is assumed to be obtained randomly in such a way
that it represents the whole population you are studying.
Often, convenience samples, which are samples of easily
available cases, are taken for economic or other reasons. This
is likely to lead to an underestimate of the variance and possibly
bias in the regression line.

• A lack of randomness in the sample can seriously
invalidate our inferences. Confidence intervals are often
optimistically narrow because the sample is not truly a
random one from the whole population to which we wish to
generalize.

49
What to Watch Out For (Cont.)
• Association versus Causality – A common mistake made
when using regression analysis is to assume that a strong
fit (high R²) of a regression of Y on X automatically
means that "X causes Y".
(1) The reverse could be true: Y causes X.
(2) There may be a third variable related to both X and Y.

• Forecasting outside the range of the explanatory
variables.

50
Matrix Approach to Simple Linear Regression Analysis
• yi = β0 + β1 xi + εi ,  i = 1, 2, …, n
This implies
y1 = β0 + β1 x1 + ε1 ,
y2 = β0 + β1 x2 + ε2 ,
……………………
yn = β0 + β1 xn + εn .
Let Y(n×1) = (y1, y2, …, yn)', X(n×2) = [1(n×1), (x1, x2, …, xn)'],
β(2×1) = (β0, β1)' and ε(n×1) = (ε1, ε2, …, εn)' .
Then the normal model in matrix terms is as follows:
Y(n×1) = X(n×2) β(2×1) + ε(n×1), or simply Y = Xβ + ε,
where ε is a vector of independent normal variables with
E(ε) = 0 and Var(ε) = Var(Y) = σ² I.
51
LS Estimation in Matrix Terms
• Normal Equations
n b0 + b1 ΣXi = ΣYi
b0 ΣXi + b1 ΣXi² = ΣYi Xi
in matrix terms are X'Xb = X'Y, where b = (b0, b1)'.
• Estimated Regression Coefficients
(X'X)^(-1) X'Xb = (X'X)^(-1) X'Y
b = (X'X)^(-1) X'Y
• LSM in Matrix Notation
Q = Σ[Yi - (β0 + β1 Xi)]² = (Y - Xβ)'(Y - Xβ)
= Y'Y - β'X'Y - Y'Xβ + β'X'Xβ = Y'Y - 2β'X'Y + β'X'Xβ
∂Q/∂β = -2X'Y + 2X'Xβ = [∂Q/∂β0, ∂Q/∂β1]'
Equating this to the zero vector, dividing by 2, and substituting b for β,
we obtain b = (X'X)^(-1) X'Y.
52
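A sketch of the matrix computations with NumPy (hypothetical data; the design matrix X has a column of ones for the intercept):

import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])     # hypothetical predictor values
Y = np.array([2.1, 2.9, 3.8, 5.2, 5.9])     # hypothetical responses
X = np.column_stack([np.ones_like(x), x])   # design matrix [1, x]

XtX = X.T @ X
b = np.linalg.solve(XtX, X.T @ Y)           # b = (X'X)^(-1) X'Y
H = X @ np.linalg.inv(XtX) @ X.T            # hat matrix H = X (X'X)^(-1) X'
Yhat = H @ Y                                # fitted values Xb = HY
e = Y - Yhat                                # residuals (I - H)Y
print(b, e)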
Fitted Values and Residuals in Matrix Terms
• Fitted Values
Ŷ = Xb = X(X'X)^(-1) X'Y = HY, where H = X(X'X)^(-1) X' is the hat matrix.
• Residuals
e = Y - Ŷ = Y - HY = (I - H)Y
• Variance-Covariance Matrix
Var(e) = Var[(I - H)Y] = (I - H) Var(Y) (I - H)'
= (I - H) σ²I (I - H)' = σ² (I - H)
and is estimated by s²(e) = MSE (I - H).
53
ANOVA in Matrix Terms
• SS(Total) = ΣYi² - (ΣYi)²/n = Y'Y - Y'JY/n, where J is the n×n matrix of ones.
SSE = e'e = (Y - Xb)'(Y - Xb) = Y'Y - b'X'Y
SSR = b'X'Y - Y'JY/n
Note that Xb = HY and b'X' = (Xb)' = (HY)' = Y'H; then
SS(T) = Y'(I - J/n)Y = Y'A1Y
SSE = Y'(I - H)Y = Y'A2Y
SSR = Y'(H - J/n)Y = Y'A3Y
Since A1, A2 and A3 are symmetric, SS(T), SSE and SSR are
quadratic forms of the Yi.
• Quadratic forms play an important role in statistics because
all sums of squares in the ANOVA for linear statistical
models can be expressed as quadratic forms.
54
Inferences in Matrix Terms
• The variance-covariance matrix of b is
Var(b) = σ² (X'X)^(-1).
The estimated variance-covariance matrix of b is
s²(b) = MSE (X'X)^(-1).
• Mean Response
Let Xh = (1, xh)'. Then
Var(Ŷh) = σ² Xh'(X'X)^(-1) Xh.
The estimated variance of Ŷh in matrix notation is
s²(Ŷh) = MSE (Xh'(X'X)^(-1) Xh).
• Prediction of New Observation
s²(pred) = MSE (1 + Xh'(X'X)^(-1) Xh)
55
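A sketch of these matrix-form inferences (hypothetical data; NumPy and SciPy assumed; xh = 3.5 is an arbitrary level of the predictor):

import numpy as np
from scipy import stats

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])     # hypothetical data
Y = np.array([2.1, 2.9, 3.8, 5.2, 5.9])
n = len(Y)
X = np.column_stack([np.ones_like(x), x])

XtX_inv = np.linalg.inv(X.T @ X)
b = XtX_inv @ X.T @ Y
e = Y - X @ b
MSE = e @ e / (n - 2)

cov_b = MSE * XtX_inv                          # estimated variance-covariance matrix of b
Xh = np.array([1.0, 3.5])                      # (1, xh)'
yh_hat = Xh @ b
s2_mean = MSE * (Xh @ XtX_inv @ Xh)            # s^2(Yhat_h) for the mean response
s2_pred = MSE * (1 + Xh @ XtX_inv @ Xh)        # s^2(pred) for a new observation
t_crit = stats.t.ppf(0.975, df=n - 2)

print(cov_b)
print(yh_hat - t_crit * np.sqrt(s2_mean), yh_hat + t_crit * np.sqrt(s2_mean))   # 95% CI for E(Yh)
print(yh_hat - t_crit * np.sqrt(s2_pred), yh_hat + t_crit * np.sqrt(s2_pred))   # 95% PI for Yh(new)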
