Simple Linear Regression Analysis
Introduction to Regression Analysis
Regression analysis is one of the most important and widely
used statistical techniques in business and economic analysis for
examining the functional relationship between two or more
variables. One variable is specified as the dependent/response
variable (DV), denoted by Y, and the other one or more variables
are called the independent/predictor/explanatory variables (IV),
denoted by Xi, i = 1, 2, …, k.
There are two different situations:
(a) Y is a random variable and the Xi are fixed, non-random
variables, e.g. to predict the sales for a company, Year is
the fixed Xi variable.
(b) Both Xi and Y are random variables, e.g. all survey data
are of this type; in this situation, cases are selected randomly
from the population, and both Xi and Y are measured.
Main Purposes
Regression analysis can be used for either of two main
purposes:
(1) Descriptive: The kind of relationship and its strength
are examined. This examination can be done
graphically or by the use of descriptive equations.
Tests of hypotheses and confidence intervals can serve
to draw inferences regarding the relationship.
(2) Predictive: The equation relating Y and Xi can be used
to predict the value of Y for a given value of Xi .
Prediction intervals can also be used to indicate a likely
range of the predicted value of Y.
Description of Methods of Regression:
The general form of a probabilistic model is
Y = deterministic component + random error.
For simple linear regression, the model is Yi = β0 + β1Xi + εi, where
the error terms εi have mean 0 and constant variance σ².
As you will see, the random error plays an important role in
testing hypotheses and finding confidence intervals for the
parameters in the model.
Important Features of the Model
(1) The response Yi in the ith trial is the sum of two components: (1)
the constant term β0 + β1Xi and (2) the random term εi. Hence,
Yi is a random variable.
(2) E(Yi) = β0 + β1Xi
(3) Yi in the ith trial exceeds or falls short of the value of the
regression function by the error term amount εi.
(4) var(Yi) = var(εi) = σ². Thus, the regression model assumes that the
probability distributions of Y have the same variance σ²,
regardless of the level of the predictor variable X.
(5) Since the error terms εi and εj are uncorrelated, so are Yi and Yj.
(6) In summary, the regression model implies that the responses Yi
come from probability distributions whose means are E(Yi) = β0 +
β1Xi and var(Yi) = σ², the same for all levels of X. Further, any
two Yi and Yj are uncorrelated.
Estimating the Model Parameters
E(ε) = 0 is equivalent to saying that E(Y) equals the deterministic
component of the model. That is,
E(Y) = β0 + β1X,
where the constants β0 and β1 are the population parameters. This
is called the population regression equation (line).
The Least Squares Method
The principle of least squares is illustrated in the
following Figure.
[Figure: observed Y values scattered around the estimated line Ŷ = b0 + b1x, with vertical deviations e1, e2, e3, e4 between the actual points and the fitted line.]
The Least Squares Method (Cont.)
For every observed Yi in a sample of points, there is a
corresponding predicted value Ŷi, equal to b0 + b1xi. The
sample deviation of the observed value Yi from the
predicted Ŷi is
ei = Yi - Ŷi,
called a residual; that is,
ei = Yi - b0 - b1Xi.
We shall find b0 and b1 so that the sum of the squares of the
errors (residuals), SSE = Σei² = Σ(Yi - Ŷi)² = Σ(Yi - b0 - b1Xi)², is
a minimum.
The Least Squares Method (Cont.)
Differentiating SSE with respect to b0 and b1, we have
∂(SSE)/∂b0 = -2Σ(Yi - b0 - b1Xi)
∂(SSE)/∂b1 = -2Σ(Yi - b0 - b1Xi)Xi.
Setting the partial derivatives equal to zero and rearranging the
terms, we obtain the equations (called the normal equations)
n b0 + b1 ΣXi = ΣYi and b0 ΣXi + b1 ΣXi² = ΣXiYi,
which may be solved simultaneously to yield computing formulas
for b0 and b1 as follows:
b1 = SSxy/SSxx (= r × sy/sx), b0 = Ȳ - b1X̄
where
SSxy = Σ(Xi - X̄)(Yi - Ȳ) = ΣXiYi - (ΣXi)(ΣYi)/n
SSxx = Σ(Xi - X̄)² = ΣXi² - (ΣXi)²/n
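The computing formulas above translate directly into code. Below is a minimal Python/NumPy sketch using a small made-up data set (the arrays x and y are purely hypothetical):

import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

n = len(x)
ss_xy = np.sum(x * y) - np.sum(x) * np.sum(y) / n   # SSxy
ss_xx = np.sum(x ** 2) - np.sum(x) ** 2 / n         # SSxx

b1 = ss_xy / ss_xx              # slope
b0 = y.mean() - b1 * x.mean()   # intercept

print(f"b0 = {b0:.4f}, b1 = {b1:.4f}")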
Properties of Least Squares Estimators
(1) Gauss-Markov Theorem
Under the conditions of the regression model, the least
squares estimators b0 and b1 are unbiased estimators
(i.e., E(b0) = β0 and E(b1) = β1) and have minimum variance
among all unbiased linear estimators.
(2) The estimated value of Y (i.e. Ŷ = b0 + b1X) is an unbiased
estimator of E(Y) = β0 + β1X, with minimum variance in the
class of unbiased linear estimators.
Properties of Fitted Regression Line
(1) The sum of the residuals is zero: Σei = 0.
(2) The sum of the squared residuals, Σei², is a minimum.
(3) ΣŶi = ΣYi
(4) ΣXiei = 0
(5) ΣŶiei = 0
(6) The regression line always goes through the point (X̄, Ȳ).
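These properties are easy to verify numerically. The following sketch (hypothetical data again) checks properties (1), (4), (5) and (6) for a fitted line:

import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()

y_hat = b0 + b1 * x
e = y - y_hat

print(np.isclose(e.sum(), 0))                     # property (1): sum of residuals is 0
print(np.isclose((x * e).sum(), 0))               # property (4): sum of Xi*ei is 0
print(np.isclose((y_hat * e).sum(), 0))           # property (5): sum of Yhat_i*ei is 0
print(np.isclose(b0 + b1 * x.mean(), y.mean()))   # property (6): line passes through (xbar, ybar)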
Example 1
A random sample of 42 firms was chosen from the S&P500 firms
listed in the Spring 2003 Special Issue of Business Week (The
Business Week Fifty Best Performers).
The dividend yield (DIVYIELD) and the 2002 earnings per share
(EPS) were recorded for the 42 firms. These data are in a file named
DIV3. Using dividend yield as the DV and EPS as the IV, plot the
scatter diagram and run a regression using SPSS.
(a) Find the estimated regression line.
(b) Find the predicted values of the DV given EPS = 1 and EPS = 2.
Example 1 – Solution
SPSS Coefficients output (Dependent Variable: DIVYIELD):

Model        B       Std. Error   Standardized Beta   t       Sig.
(Constant)   2.034
EPS          .374    .239         .240                1.562   .126

(a) The estimated regression line is Ŷ = 2.034 + 0.374x.
(b) For EPS = 1, Ŷ = 2.034 + 0.374(1) = 2.408; for EPS = 2, Ŷ = 2.034 + 0.374(2) = 2.782.
Example 1 - Scatter Diagram
[Scatter diagram of DIVYIELD versus EPS with the fitted line Ŷ = 2.034 + 0.374x.]
Normal Error Regression Model
No matter what the form of the distribution of the error
terms εi (and hence of the Yi) may be, the LSM provides unbiased point
estimators of β0 and β1 that have minimum variance among all
unbiased linear estimators.
To set up interval estimates and make tests, however, we need to
make an assumption about the form of the distribution of the εi.
The standard assumption is that the error terms εi are normally
distributed, and we will adopt it here.
Since the functional form of the probability distribution of the
error terms is now specified, we can use the maximum likelihood
method to obtain estimators of the parameters β0, β1 and σ². In
fact, the MLE and LSE for β0 and β1 are the same. The MLE for σ² is
biased: σ̂² = Σei²/n = SSE/n = s²(n - 2)/n.
A normal error term greatly simplifies the theory of regression
analysis (see the comments on page 32).
Normality & Constant Variance
Assumptions
[Figure: at each level of X (e.g. X1 and X2), the distribution f(ε) of Y is normal with the same variance, centered on the regression line E(Y) = β0 + β1X.]
Inferences Concerning the Regression
Coefficients
Inferences Concerning β1
(a) b1 is a normal random variable for the normal error model.
(b) E(b1) = β1. That is, b1 is an unbiased estimator of β1.
(c) Var(b1) = σ²/SSxx, which is estimated by s²(b1) = s²/SSxx, where s²
is the unbiased estimator of σ².
(d) The (1 - α)100% confidence interval for β1 (σ² unknown) is
b1 - tα/2 s(b1) < β1 < b1 + tα/2 s(b1),
where tα/2 is a value of the t distribution with (n - 2) degrees of
freedom, and s(b1) is the standard error of b1, i.e. s(b1) = s/(SSxx)^1/2.
(e) Hypothesis test of β1
To test the null hypothesis H0: β1 = 0 against a suitable alternative,
we can use the t distribution with n - 2 degrees of freedom to
establish a critical region and then base our decision on the value of
t = b1/s(b1).
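As an illustration, here is a minimal Python sketch of the interval in (d) and the test in (e), computed from first principles with SciPy's t distribution (the data are hypothetical):

import numpy as np
from scipy import stats

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])
n = len(x)

ss_xx = np.sum((x - x.mean()) ** 2)
b1 = np.sum((x - x.mean()) * (y - y.mean())) / ss_xx
b0 = y.mean() - b1 * x.mean()

e = y - (b0 + b1 * x)
s2 = np.sum(e ** 2) / (n - 2)          # unbiased estimator of sigma^2
s_b1 = np.sqrt(s2 / ss_xx)             # standard error of b1

alpha = 0.05
t_crit = stats.t.ppf(1 - alpha / 2, df=n - 2)
ci = (b1 - t_crit * s_b1, b1 + t_crit * s_b1)   # (1 - alpha)100% CI for beta1

t_stat = b1 / s_b1                              # test of H0: beta1 = 0
p_value = 2 * stats.t.sf(abs(t_stat), df=n - 2)
print(f"95% CI for beta1: {ci}, t = {t_stat:.3f}, p = {p_value:.4f}")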
Inferences Concerning 0
(a) b0 is a normal random variable for the normal error model.
(b) E(b0) = β0. That is, b0 is an unbiased estimator of β0.
(c) Var(b0) = σ²ΣXi²/(nSSxx), which is estimated by s²(b0) = s²ΣXi²/(nSSxx),
where s² is the unbiased estimator of σ².
(d) The (1 - α)100% confidence interval for β0 (σ² unknown) is
b0 - tα/2 s(b0) < β0 < b0 + tα/2 s(b0),
where tα/2 is a value of the t distribution with (n - 2) degrees of
freedom, and s(b0) = s(ΣXi²/(nSSxx))^1/2.
(e) Hypothesis test of β0
To test the null hypothesis H0: β0 = 0 against a suitable alternative,
we can use the t distribution with n - 2 degrees of freedom to
establish a critical region and then base our decision on the value of
t = b0/s(b0).
Some Considerations
Effects of Departures From Normality
If the probability distributions of Y are not exactly
normal but do not depart seriously, the sampling
distributions of b0 and b1 will be approximately
normal. Even if the distributions of Y are far from
normal, the estimators b0 and b1 generally have the
property of asymptotic normality as the sample
size increases. Thus, with sufficiently large
samples, the confidence intervals and decision rules
given earlier still apply even if the probability
distributions of Y depart far from normality.
Inferences Concerning E(Y)
(1) The sampling distribution of Ŷi is normal for the normal error
model.
(2) Ŷi is an unbiased estimator of E(Yi),
because E(Yi) = β0 + β1Xi and
E(Ŷi) = E(b0 + b1Xi) = β0 + β1Xi = E(Yi).
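This slide does not show the variance of Ŷi, but a confidence interval for E(Y) at a given X can be sketched using the standard result var(Ŷi) = σ²[1/n + (Xi - X̄)²/SSxx], the same quantity that appears inside s²(predi) on a later slide. A minimal Python sketch (hypothetical data; x_new is an assumed value of interest):

import numpy as np
from scipy import stats

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])
n = len(x)

ss_xx = np.sum((x - x.mean()) ** 2)
b1 = np.sum((x - x.mean()) * (y - y.mean())) / ss_xx
b0 = y.mean() - b1 * x.mean()
s2 = np.sum((y - (b0 + b1 * x)) ** 2) / (n - 2)   # MSE

x_new = 3.5
y_hat = b0 + b1 * x_new
# standard result (not shown on the slide): var(Yhat) = sigma^2 * (1/n + (x-xbar)^2/SSxx)
s_yhat = np.sqrt(s2 * (1 / n + (x_new - x.mean()) ** 2 / ss_xx))

t_crit = stats.t.ppf(0.975, df=n - 2)
print(f"95% CI for E(Y) at x={x_new}: "
      f"({y_hat - t_crit * s_yhat:.3f}, {y_hat + t_crit * s_yhat:.3f})")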
Prediction Interval
[Figure: two possible locations of the distribution of Y, each with its own prediction limits, illustrating that the limits must allow for uncertainty about where E(Yi) lies.]
Prediction Interval (cont.)
Since we cannot be certain of the location of the distribution
of Y, prediction limits for Yi(new) clearly must take account of
two elements: (a) variation in the possible location of the
distribution of Y; and (b) variation within the probability
distribution of Y. That is,
var(predi) = var(Yi(new) - Ŷi) = var(Yi(new)) + var(Ŷi) = σ² + var(Ŷi).
An unbiased estimator of var(predi) is as follows:
s²(predi) = s² + s²(Ŷi) = s²[1 + (1/n) + (Xi - X̄)²/SSxx]
The (1 - α)100% prediction interval for Yi(new) is as follows:
Ŷi - tα/2,(n-2) s(predi) < Yi(new) < Ŷi + tα/2,(n-2) s(predi)
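A minimal Python sketch of the prediction interval above; compared with the interval for the mean response, the only change is the extra "1" inside s²(predi) (hypothetical data again):

import numpy as np
from scipy import stats

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])
n = len(x)

ss_xx = np.sum((x - x.mean()) ** 2)
b1 = np.sum((x - x.mean()) * (y - y.mean())) / ss_xx
b0 = y.mean() - b1 * x.mean()
s2 = np.sum((y - (b0 + b1 * x)) ** 2) / (n - 2)   # MSE

x_new = 3.5
y_hat = b0 + b1 * x_new
# s^2(pred) = s^2 * [1 + 1/n + (x-xbar)^2 / SSxx]
s_pred = np.sqrt(s2 * (1 + 1 / n + (x_new - x.mean()) ** 2 / ss_xx))

t_crit = stats.t.ppf(0.975, df=n - 2)
print(f"95% prediction interval at x={x_new}: "
      f"({y_hat - t_crit * s_pred:.3f}, {y_hat + t_crit * s_pred:.3f})")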
Comments on Prediction Interval
The prediction limits, unlike the confidence limits for a
mean response E(Yi), are sensitive to departures from
normality of the error term distribution.
Hyperbolic Interval Bands
[Figure: confidence and prediction bands around the fitted line are hyperbolic, narrowest at X = X̄ and widening as Xgiven moves farther from X̄.]
Example 2
The vice-president of marketing for a large firm is concerned about the
effect of advertising on sales of the firm’s major product. To investigate
the relationship between advertising and sales, data on the two variables
were gathered from a random sample of 20 sales districts. These data
are available in a file named SALESAD3. Sales (DV) and advertising
(IV) are both expressed in hundreds of dollars.
From the SPSS Coefficients output (the full table appears with the standardized-coefficient slide below), the estimated regression line is
Ŷ = -57.281 + 17.57x
Example 2 – Scatter Plot
[Scatter plot of sales versus advertising with the fitted line Ŷ = -57.281 + 17.57x.]
The Coefficient of Determination
In many regression problems, the major reason for constructing
the regression equation is to obtain a tool that is useful in
predicting the value of the dependent variable Y from some known
value of the independent variable X. Thus, we often wish to assess
the accuracy of the regression line in predicting the Y values.
Partitioning Variation (Cont.)
The total variation in the dependent variable Y can be partitioned into
two parts: variation explained by the regression and unexplained variation.
Computing Formulas
The various sums of squares may be found more simply by
using the following formulas.
SST = SSyy = Σ(Yi - Ȳ)² = ΣYi² - (ΣYi)²/n
SSR = Σ(Ŷi - Ȳ)² = b1·SSxy
SSE = Σ(Yi - Ŷi)² = SS(Total) - SSR.
Solution (SPSS ANOVA output for Example 2; dependent variable: sales):

Source        Sum of Squares   df   Mean Square   F         Sig.
Regression    4.052E7          1    4.052E7       114.539   .000
Residual      6368342.383      18   353796.799
Total         4.689E7          19

Predictors: (Constant), adv
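A small Python sketch of the sum-of-squares partition and the F test reported in an ANOVA table like the one above (the x and y arrays are hypothetical; the table itself comes from SPSS on the SALESAD3 data):

import numpy as np
from scipy import stats

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])
n = len(x)

ss_xy = np.sum((x - x.mean()) * (y - y.mean()))
ss_xx = np.sum((x - x.mean()) ** 2)
b1 = ss_xy / ss_xx

sst = np.sum((y - y.mean()) ** 2)     # SST = SSyy
ssr = b1 * ss_xy                      # SSR = b1 * SSxy
sse = sst - ssr                       # SSE = SST - SSR

f_stat = (ssr / 1) / (sse / (n - 2))  # F = MSR / MSE with 1 and n-2 df
p_value = stats.f.sf(f_stat, 1, n - 2)
print(f"SST={sst:.3f}, SSR={ssr:.3f}, SSE={sse:.3f}, "
      f"F={f_stat:.2f}, p={p_value:.4f}")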
Description of Methods of Regression:
Case When X is Random
For the variable-X case, both X and Y are random variables
measured on cases that are randomly selected from a
population.
The fixed-X regression model applies in this case when we treat
the X values as if they were pre-selected. This technique is
justifiable theoretically by conditioning on the X values that
happened to be obtained in the sample (textbook page 83).
Therefore all the previous discussion and formulas are
precisely the same for this case as for the fixed-X case.
Since both X and Y are considered random variables, other
parameters can be useful for describing the model, namely the
covariance of X and Y, denoted by σXY (or Cov(X, Y)), and the
correlation coefficient, denoted by ρ, which are measures of
how the two variables vary together.
Correlation Coefficient
The correlation coefficient ρ = σXY/(σXσY) is a measure of the
direction and the strength of the linear association between two
variables. It is dimensionless, and it may take any value between
-1 and 1, inclusive.
Test of Coefficient of Correlation
Note that tests of hypotheses and confidence intervals for the
variable-X case require that X and Y be jointly normally
distributed, that is, that X and Y follow a bivariate normal
distribution. Under H0: ρ = 0, the test statistic
t = r√(n - 2)/√(1 - r²) follows a t distribution with n - 2 degrees of freedom.
Example 2 (cont.)
Use the data in the example to test whether there is a significant linear
relationship between sales and advertising expense (both in
hundreds of dollars). Use α = 0.05.
Solution:
(1) H0: ρ = 0 against Ha: ρ ≠ 0
(2) α = 0.05, n = 20, df = n - 2 = 18 and t0.025,18 = 2.101
(3) The rejection rule: if |t| > 2.101, then reject H0.
(4) Computations: r = SSxy/(SSxxSSyy)^1/2 = 0.9296, so t = r√(n - 2)/√(1 - r²) = 10.701.
(5) We reject H0 at α = 0.05 since t = 10.701 > 2.101 and conclude that
there is a significant linear relationship between sales and
advertising expense.
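A Python sketch of the same correlation test on hypothetical data (the slide's numbers come from the SALESAD3 file, which is not reproduced here):

import numpy as np
from scipy import stats

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])
n = len(x)

r = np.corrcoef(x, y)[0, 1]                       # sample correlation coefficient
t_stat = r * np.sqrt(n - 2) / np.sqrt(1 - r ** 2) # t = r*sqrt(n-2)/sqrt(1-r^2)
p_value = 2 * stats.t.sf(abs(t_stat), df=n - 2)

print(f"r = {r:.4f}, t = {t_stat:.3f}, p = {p_value:.4f}")
# scipy.stats.pearsonr(x, y) gives the same r and two-sided p-value.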
Further Examination of Computer Output
Standardized Regression Coefficient
The standardized regression coefficient is the slope in the
regression equation when X and Y are standardized. After
standardization, the intercept in the regression equation is
zero, and for simple linear regression the standardized
slope is equal to the correlation coefficient r (a small numerical
check follows the table below). In multiple regression, the
standardized regression coefficients help quantify the relative
contribution of each X variable.
SPSS Coefficients output (Dependent Variable: sales):

Model         B         Std. Error   Standardized Beta   t        Sig.   95% CI Lower   95% CI Upper
(Constant)    -57.281   509.750                          -.112    .912   -1128.227      1013.665
adv           17.570    1.642        .930                10.702   .000   14.121         21.019
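As promised above, a quick numerical check (hypothetical data) that the standardized intercept is 0 and the standardized slope equals r:

import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

zx = (x - x.mean()) / x.std(ddof=1)   # standardized X
zy = (y - y.mean()) / y.std(ddof=1)   # standardized Y

b1_std = np.sum((zx - zx.mean()) * (zy - zy.mean())) / np.sum((zx - zx.mean()) ** 2)
b0_std = zy.mean() - b1_std * zx.mean()

r = np.corrcoef(x, y)[0, 1]
print(np.isclose(b0_std, 0), np.isclose(b1_std, r))   # both True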
Checking for Violations of Assumptions
We usually do not know in advance whether a linear regression
model is appropriate for our data set. Therefore, it is necessary to
check whether the necessary assumptions are violated. The
analysis of the residuals is a frequently helpful and useful tool
for this purpose.
The basic principles apply to all statistical models discussed in this
course.
Residuals: In model building, a residual is what is left after the
model is fit. It is the difference between an observed value of Y
and the predicted value of Y, i.e. Residuali = ei = Yi - Ŷi. In
regression analysis, the true errors are assumed to be independent
normal variables with a mean of 0 and a constant variance of σ².
If the model is appropriate for the data, the residuals ei, which are
estimates of the true errors, should have similar characteristics.
(Refer to pages 102-103.)
Checking for Violations of Assumptions
Identification of equality of variance
Scatter plots can also be used to detect whether the assumption of
constant variance of Y for all values of X is being violated. If the
spread of the residuals increases or decreases with the values of
the independent variable or with the predicted values, then the
assumption of homogeneity of variance is being violated.
Identification of independence
Usually this assumption is relatively easy to meet, since observations
are collected in a random order and hence successive error terms are
also likely to be random. However, in time series data or repeated
measures data, this problem of dependence between successive
error terms often occurs.
Checking for Violations of Assumptions (Cont.)
Identification of normality
A critical assumption of the simple linear regression model is that
the error terms associated with each Xi have a normal
distribution. Note that it is unreasonable to expect the observed
residuals to be exactly normal; some deviation is expected because
of sampling variation. Even if the errors are normally distributed
in the population, sample residuals are only approximately
normal.
Identification of outliers
In combination with a scatter plot of the observed dependent and
independent variables, the plot of residuals can be used to identify
observations which appear to fall a long way from the main
cluster of observations (a residual larger than 3s in absolute value is
commonly flagged as an outlier).
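A Python sketch of these residual checks (hypothetical data; matplotlib and SciPy are assumed to be available): a residuals-versus-fitted plot for unequal variance, a normal Q-Q plot, and a |ei| > 3s outlier flag.

import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8, 12.2, 13.9, 16.3])
n = len(x)

b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
y_hat = b0 + b1 * x
e = y - y_hat
s = np.sqrt(np.sum(e ** 2) / (n - 2))

print("possible outliers (|e| > 3s):", np.where(np.abs(e) > 3 * s)[0])

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(9, 4))
ax1.scatter(y_hat, e)                       # look for a fan shape (unequal variance)
ax1.axhline(0, color="grey")
ax1.set(xlabel="fitted values", ylabel="residuals")
stats.probplot(e, dist="norm", plot=ax2)    # normal Q-Q plot of the residuals
plt.show()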
Overview of Tests Involving Residuals
Tests for Randomness in the Residuals
Runs Test
Tests for Autocorrelation in the Residuals in Time Order
Durbin-Watson Test
Tests for Normality
Correlation Test (Shapiro-Wilk Test)
Chi-Square Test
Kolmogorov Test
Tests for Constancy of Error Variance
Brown-Forsythe (Modified Levene) Test*
Cook-Weisberg (Breusch-Pagan) Test*
F-test for Lack Of Fit
Test whether a linear regression function is a good fit for the data*.
(Note that the tests with * are valid only for large samples or under strong
assumptions)
Overview of Remedial Measures
If the linear regression normal error model is not
appropriate for a data set, there are two basic choices:
(1) Abandon the model and develop and use a more appropriate one.
(2) Employ a transformation of the data so that the model is
appropriate for the transformed data.
Transformations
Transformations for a nonlinear relation
Transformations for nonnormality and unequal error variances
Box-Cox transformations
What to Watch Out For
In the development of the theory for linear regression, the
sample is assumed to be obtained randomly in such a way
that it represents the whole population you are studying.
Often, convenience samples, which are samples of easily
available cases, are taken for economic or other reasons. These
can lead to an underestimate of the variance and possibly to
bias in the regression line.
What to Watch Out For (Cont.)
Association versus Causality - A common mistake made
when using regression analysis is to assume that a strong
fit (high R²) of a regression of Y on X automatically
means that "X causes Y".
(1) The reverse could be true: Y causes X.
(2) There may be a third variable related to both X and Y.
Matrix Approach to Simple Linear
Regression Analysis
yi = β0 + β1xi + εi,  i = 1, 2, …, n
This implies
y1 = β0 + β1x1 + ε1,
y2 = β0 + β1x2 + ε2,
…
yn = β0 + β1xn + εn.
Let Y(n×1) = (y1, y2, …, yn)', X(n×2) = [1(n×1), (x1, x2, …, xn)'],
β(2×1) = (β0, β1)' and ε(n×1) = (ε1, ε2, …, εn)'.
Then the normal model in matrix terms is as follows:
Y(n×1) = X(n×2) β(2×1) + ε(n×1), or simply Y = Xβ + ε,
where ε is a vector of independent normal random variables with
E(ε) = 0 and Var(ε) = Var(Y) = σ²I.
LS Estimation in Matrix Terms
Normal Equations
n b0 + b1 ΣXi = ΣYi
b0 ΣXi + b1 ΣXi² = ΣXiYi
in matrix terms are X'Xb = X'Y, where b = (b0, b1)'.
Estimated Regression Coefficients
(X'X)⁻¹X'Xb = (X'X)⁻¹X'Y
b = (X'X)⁻¹X'Y
LSM in Matrix Notation
Q = Σ[Yi - (β0 + β1Xi)]² = (Y - Xβ)'(Y - Xβ)
= Y'Y - β'X'Y - Y'Xβ + β'X'Xβ = Y'Y - 2β'X'Y + β'X'Xβ
∂Q/∂β = -2X'Y + 2X'Xβ = [∂Q/∂β0, ∂Q/∂β1]'
Equating this to the zero vector, dividing by 2, and substituting b for β,
we obtain b = (X'X)⁻¹X'Y.
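A minimal NumPy sketch of the matrix computation b = (X'X)⁻¹X'Y (hypothetical data; np.linalg.solve is used rather than forming the inverse explicitly):

import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

X = np.column_stack([np.ones_like(x), x])   # n x 2 design matrix [1, x]
b = np.linalg.solve(X.T @ X, X.T @ y)       # solves the normal equations X'Xb = X'Y

print(f"b0 = {b[0]:.4f}, b1 = {b[1]:.4f}")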
Fitted Values and Residuals in Matrix Terms
Fitted Values
Ŷ = Xb = X(X'X)⁻¹X'Y = HY, where H = X(X'X)⁻¹X' is the hat matrix.
Residuals
e = Y - Ŷ = Y - HY = (I - H)Y
Variance-Covariance Matrix
Var(e) = Var[(I - H)Y] = (I - H) Var(Y) (I - H)'
= (I - H) σ²I (I - H)' = σ²(I - H)
and is estimated by s²(e) = MSE (I - H)
ANOVA in Matrix Terms
SS(Total) = ΣYi² - (ΣYi)²/n = Y'Y - Y'JY/n, where J is the n×n matrix of 1s
SSE = e'e = (Y - Xb)'(Y - Xb) = Y'Y - b'X'Y
SSR = b'X'Y - Y'JY/n
Note that Xb = HY and b'X' = (Xb)' = (HY)' = Y'H; then
SS(Total) = Y'(I - J/n)Y = Y'A1Y
SSE = Y'(I - H)Y = Y'A2Y
SSR = Y'(H - J/n)Y = Y'A3Y
Since A1, A2 and A3 are symmetric, SS(Total), SSE and SSR are
quadratic forms of the Yi.
Quadratic forms play an important role in statistics because
all sums of squares in the ANOVA for linear statistical
models can be expressed as quadratic forms.
Inferences in Matrix Terms
The variance-covariance matrix of b is
Var(b) = σ²(X'X)⁻¹.
The estimated variance-covariance matrix of b is
s²(b) = MSE (X'X)⁻¹.
Mean Response
Let Xh = (1, xh)'. Then
Var(Ŷh) = σ² Xh'(X'X)⁻¹Xh.
The estimated variance of Ŷh in matrix notation is
s²(Ŷh) = MSE (Xh'(X'X)⁻¹Xh).
Prediction of New Observation
s²(pred) = MSE (1 + Xh'(X'X)⁻¹Xh)
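A NumPy sketch of these matrix-form variance estimates (hypothetical data; x_h is an assumed value of interest):

import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])
n = len(x)

X = np.column_stack([np.ones_like(x), x])
XtX_inv = np.linalg.inv(X.T @ X)
b = XtX_inv @ X.T @ y

e = y - X @ b
mse = (e @ e) / (n - 2)

s2_b = mse * XtX_inv                          # s^2(b) = MSE (X'X)^-1
x_h = np.array([1.0, 3.5])                    # X_h = (1, x_h)'
s2_mean = mse * (x_h @ XtX_inv @ x_h)         # estimated var of the mean response at X_h
s2_pred = mse * (1 + x_h @ XtX_inv @ x_h)     # estimated var for predicting a new Y at X_h

print(s2_b, s2_mean, s2_pred)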