Lecture 10 Heteroscedasticity
Heteroscedasticity
Assumption A.4 of Model A, which is equivalent to the second Gauss-Markov condition, requires the disturbance term to be homoscedastic; that is, the dispersion of the disturbance term is the same in all observations. Mathematically, $\sigma_{u_i}^2 = \sigma_u^2$ for all $i$. However, this is not always a plausible assumption: generally speaking, the variance differs across observations. Heteroscedasticity refers to a non-constant variance of the disturbance term. Formally, it means that $\sigma_{u_i}^2 \neq \sigma_{u_j}^2$ for some $i \neq j$. This lecture analyzes heteroscedasticity according to the following plan:
1. Reasons
2. Consequences
3. Detection
4. Remedial measures
I. Reasons
The problem of heteroscedasticity can arise when the scales of different economic variables change in the same direction. Suppose we want to analyze how expenditure on education depends on the GDP of a country. We have n observations for n countries and use the following model:
𝑬𝑫𝑼𝑪_𝑬𝑿𝑷 = 𝜷𝟏 + 𝜷𝟐 𝑮𝑫𝑷 + 𝒖
Obviously, both variables 𝑬𝑫𝑼𝑪_𝑬𝑿𝑷 and 𝑮𝑫𝑷 change their scales simultaneously: a country with a larger GDP can spend much more on education in absolute terms. Therefore, the expected dispersion of the dependent variable 𝑬𝑫𝑼𝑪_𝑬𝑿𝑷 will increase with the value of the explanatory variable 𝑮𝑫𝑷. The reason is that the variances of the omitted variables and of the measurement errors, which jointly determine the values of the disturbance term, rise as well.
Cross-sectional data often give rise to heteroscedasticity. Many economic variables tend to
move in size together. For example, consider the sample that contains data on different companies.
Heteroscedasticity is likely to arise because large firms will typically display much greater variation
(for instance, in profits, costs, expenditures …) than smaller ones. One more example is when we
analyze the relationship between consumption expenditures and family income. It is reasonable to
suppose that a family with greater aggregate income will have greater variation in consumption
expenditures.
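The family example above can be made concrete with a minimal simulation. All numbers below are hypothetical (a marginal propensity to consume of 0.8 and a disturbance standard deviation proportional to income); the point is only that the spread of consumption around its expected value grows with income:

```python
import random
import statistics

random.seed(42)

incomes = [1000.0 * (i + 1) for i in range(100)]        # hypothetical family incomes
# heteroscedastic disturbance: s.d. proportional to income, sigma_i = 0.1 * X_i
consumption = [0.8 * x + random.gauss(0, 0.1 * x) for x in incomes]

# disturbances in the poorest and the richest quarter of the sample
u_low = [c - 0.8 * x for x, c in zip(incomes[:25], consumption[:25])]
u_high = [c - 0.8 * x for x, c in zip(incomes[-25:], consumption[-25:])]

print(statistics.pstdev(u_low) < statistics.pstdev(u_high))  # spread grows with income
```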
II. Consequences
Let’s analyze the statistical properties of the OLS estimators when the disturbance term is subject to heteroscedasticity. Obviously, the fact that the dispersion of the disturbance term is not constant affects the standard deviations of the regression estimators, because OLS does not use the available information that the observations differ in reliability. Nevertheless, the OLS estimation procedure and the proof of unbiasedness do not depend on the presence of heteroscedasticity, so the expected values of the estimators are unchanged. The main results are as follows:
1) The standard errors of the regression coefficients are estimated wrongly (most likely they are underestimated). Hence, the t-tests and the F-test are invalid. Moreover, as the bias of the standard errors is typically negative, the t-statistics are overestimated, giving a misleading impression of the precision of the regression coefficients;
2) The OLS estimators are unbiased and consistent BUT inefficient: it is possible to find estimators that are still unbiased but have a smaller variance.
Intuitive explanation

𝒀 = 𝜷𝟏 + 𝜷𝟐 𝑿 + 𝒖

Let’s illustrate some reasons behind the consequences of heteroscedasticity graphically:

[Figure: left panel shows heteroscedasticity (the distributions of 𝑢 differ in variance across observations); right panel shows homoscedasticity (the distributions of 𝑢 have the same variance).]
Mathematically:
Let’s use explicit formulae for the estimated coefficients for the simple linear regression model to
analyze their statistical properties. In fact, there are several ways to do so. Let’s use the following
(see lecture 3):
Model: $Y_i = \beta_1 + \beta_2 X_i + u_i$, where $u_i \sim N(0, \sigma_{u_i}^2)$

OLS estimation: $b_2 = \beta_2 + \sum_i a_i u_i$, where $a_i = \dfrac{X_i - \bar{X}}{\sum_j (X_j - \bar{X})^2} = \dfrac{x_i}{\sum_j x_j^2}$ (with $x_i = X_i - \bar{X}$)
When we take the expectation of $b_2$, the fact that the variance of the disturbance term is not constant plays no role. Hence, we obtain the same result as before:

$E(b_2) = E(\beta_2 + \sum_i a_i u_i) = \beta_2 + E(\sum_i a_i u_i) = \beta_2 + \sum_i a_i E(u_i) = \beta_2 + \sum_i a_i \cdot 0 = \beta_2$ => unbiased.

Here we used that $X$ is non-stochastic and $E(u_i) = 0$.
Precision:

$$\sigma_{b_2}^2 = E\{(b_2 - E(b_2))^2\} = E\{(b_2 - \beta_2)^2\} = E\Big(\sum_i a_i u_i\Big)^2 = \sum_{i=1}^n a_i^2 E(u_i^2) + \sum_{i=1}^n \sum_{j \neq i} a_i a_j E(u_i u_j) =$$

$$= \sum_{i=1}^n a_i^2 \sigma_i^2 + 0 = \frac{\sum_{i=1}^n (X_i - \bar{X})^2 \sigma_i^2}{\left(\sum_{j=1}^n (X_j - \bar{X})^2\right)^2} = \frac{\sum_{i=1}^n x_i^2 \sigma_i^2}{\left(\sum_{j=1}^n x_j^2\right)^2}.$$
The standard errors are biased, but White (1980) shows that the following estimator is consistent:

$$\text{s.e.}(b_2) = s_{b_2} = \sqrt{\frac{\sum_{i=1}^n x_i^2 e_i^2}{\left(\sum_{j=1}^n x_j^2\right)^2}} = \sqrt{\sum_{i=1}^n a_i^2 e_i^2}$$

This is the heteroscedasticity-consistent (or robust) standard error.
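As a sketch (hypothetical data and parameter values), the robust standard error can be computed directly from the formula above and compared with the conventional one:

```python
import math
import random

random.seed(1)
n = 50
X = [float(i + 1) for i in range(n)]
# assumed design: the disturbance s.d. grows with X_i (all numbers hypothetical)
Y = [2.0 + 0.5 * Xi + random.gauss(0, 0.3 * Xi) for Xi in X]

xbar = sum(X) / n
ybar = sum(Y) / n
x = [Xi - xbar for Xi in X]                      # deviations x_i
Sxx = sum(xi ** 2 for xi in x)
b2 = sum(xi * (Yi - ybar) for xi, Yi in zip(x, Y)) / Sxx
b1 = ybar - b2 * xbar
e = [Yi - b1 - b2 * Xi for Xi, Yi in zip(X, Y)]  # OLS residuals

# conventional s.e., valid only under homoscedasticity
s2 = sum(ei ** 2 for ei in e) / (n - 2)
se_conv = math.sqrt(s2 / Sxx)

# White robust s.e.: sqrt( sum x_i^2 e_i^2 / (sum x_j^2)^2 )
se_white = math.sqrt(sum(xi ** 2 * ei ** 2 for xi, ei in zip(x, e)) / Sxx ** 2)

print(se_conv, se_white)  # with variance rising in X, the robust s.e. is usually larger
```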
In other words, we test whether $RSS_2$ is significantly greater than $RSS_1$ using an F-test. To perform the test, compare the calculated F-statistic with the critical value $F^{crit}_{\alpha\%}$:

If $F < F^{crit}_{\alpha\%}$, do not reject the null hypothesis of homoscedasticity at the $\alpha\%$ significance level;

If $F > F^{crit}_{\alpha\%}$, there is enough evidence of heteroscedasticity of the type $\sigma_i = \gamma X_i$ at the $\alpha\%$ significance level.
Note that this test can also be used when the standard deviation of the disturbance term is inversely related to some factor $X$, i.e. $\sigma_i = \gamma / X_i$. In this case we get $RSS_1 > RSS_2$. Hence,

$$F = \frac{RSS_1/(n_1 - k)}{RSS_2/(n_2 - k)} \overset{H_0}{\sim} F(n_1 - k,\; n_2 - k).$$
Then perform the same F-test as before.
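The Goldfeld-Quandt mechanics can be sketched end to end: order the sample by the suspected factor, drop the middle observations, fit separate regressions to the low-X and high-X subsamples, and compare their residual sums of squares. All data and parameter values below are hypothetical, with an assumed $\sigma_i = \gamma X_i$:

```python
import random

def rss(xs, ys):
    """Fit y on a constant and x by OLS; return the residual sum of squares."""
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    b2 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(xs, ys)) / \
         sum((xi - xbar) ** 2 for xi in xs)
    b1 = ybar - b2 * xbar
    return sum((yi - b1 - b2 * xi) ** 2 for xi, yi in zip(xs, ys))

random.seed(7)
n = 60
X = sorted(random.uniform(1, 100) for _ in range(n))          # sample ordered by X
Y = [3.0 + 0.7 * Xi + random.gauss(0, 0.2 * Xi) for Xi in X]  # sigma_i = 0.2 * X_i

m = 20                                   # subsample size; middle n - 2m dropped
rss1 = rss(X[:m], Y[:m])                 # low-X subsample
rss2 = rss(X[-m:], Y[-m:])               # high-X subsample

k = 2                                    # parameters in each subregression
F = (rss2 / (m - k)) / (rss1 / (m - k))  # compare with F(m - k, m - k)
print(round(F, 2))                       # F far above 1 signals heteroscedasticity
```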
Example:
Consider the model: 𝑌 = 𝛽1 + 𝛽2 𝑋2 + 𝛽3 𝑋3 + 𝑢
White auxiliary equation: $e_i^2 = \beta_1 + \beta_2 X_{2i} + \beta_3 X_{3i} + \beta_4 X_{2i}^2 + \beta_5 X_{3i}^2 + \beta_6 X_{2i} X_{3i} + v_i$
$$\chi^2\text{-statistic: } \chi^2 = n \cdot R^2 \overset{H_0}{\sim} \chi^2(5) \text{ in large samples},$$

as there are 5 regressors (the number of estimated parameters in the White auxiliary equation minus one).
Note that 5 degrees of freedom are absorbed by the White auxiliary equation, leaving $n - 5$ degrees of freedom for that regression in this case. If the regression includes many explanatory variables, too few degrees of freedom may be left.
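The White test can be sketched in full for a one-regressor model, where the auxiliary equation contains only $X$ and $X^2$, so the statistic is compared with $\chi^2(2)$ (5% critical value 5.99). All data and coefficients below are hypothetical:

```python
import random

def ols(rows, y):
    """Solve the normal equations (X'X) b = X'y by Gauss-Jordan elimination."""
    k = len(rows[0])
    XtX = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    Xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    A = [XtX[i][:] + [Xty[i]] for i in range(k)]
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(A[r][col]))  # partial pivoting
        A[col], A[piv] = A[piv], A[col]
        for r in range(k):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [arc - f * acc for arc, acc in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]

random.seed(3)
n = 200
X = [random.uniform(1, 10) for _ in range(n)]
Y = [1.0 + 2.0 * Xi + random.gauss(0, 0.5 * Xi) for Xi in X]  # sigma_i = 0.5 * X_i

# step 1: residuals of the original regression
b1, b2 = ols([[1.0, Xi] for Xi in X], Y)
e2 = [(Yi - b1 - b2 * Xi) ** 2 for Xi, Yi in zip(X, Y)]

# step 2: auxiliary regression of e^2 on a constant, X and X^2; get its R^2
g = ols([[1.0, Xi, Xi ** 2] for Xi in X], e2)
fit = [g[0] + g[1] * Xi + g[2] * Xi ** 2 for Xi in X]
ebar = sum(e2) / n
r2 = 1 - sum((s - f) ** 2 for s, f in zip(e2, fit)) / sum((s - ebar) ** 2 for s in e2)

chi2_stat = n * r2
print(chi2_stat > 5.99)   # 5.99: the 5% critical value of chi2(2)
```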
White test

Advantages:
• can be performed when the type of heteroscedasticity is not known;
• can detect heteroscedasticity even if it is not connected with the factors included in the regression.

Disadvantages:
• low power (the price of its generality);
• valid only for large samples;
• if many explanatory variables are included, the problems of insufficient remaining degrees of freedom and of possible perfect multicollinearity arise => some variables should be excluded from the White auxiliary equation, but which ones? The theory gives no prediction.
These problems explain why the White test and the Goldfeld-Quandt test may give different results.
The Breusch-Pagan (or Breusch-Pagan-Godfrey) test does the same as the White test, but the auxiliary equation does not include the quadratic and cross-product terms.

For the model $Y = \beta_1 + \beta_2 X_2 + \beta_3 X_3 + u$, the Breusch-Pagan auxiliary equation is $e_i^2 = \beta_1 + \beta_2 X_{2i} + \beta_3 X_{3i} + v_i$. It requires fewer parameters to be estimated than the White test.
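A similar sketch of the Breusch-Pagan idea for a one-regressor model: the squared OLS residuals are regressed on $X$ alone, and $n \cdot R^2$ is compared with $\chi^2(1)$ (5% critical value 3.84). All numbers are hypothetical:

```python
import random

def simple_ols(xs, ys):
    """Return (intercept, slope) of an OLS fit of y on a constant and x."""
    xbar = sum(xs) / len(xs)
    ybar = sum(ys) / len(ys)
    slope = sum((a - xbar) * (b - ybar) for a, b in zip(xs, ys)) / \
            sum((a - xbar) ** 2 for a in xs)
    return ybar - slope * xbar, slope

random.seed(11)
n = 150
X = [random.uniform(1, 10) for _ in range(n)]
Y = [1.0 + 2.0 * Xi + random.gauss(0, 0.4 * Xi) for Xi in X]  # sigma_i = 0.4 * X_i

b1, b2 = simple_ols(X, Y)
e2 = [(Yi - b1 - b2 * Xi) ** 2 for Xi, Yi in zip(X, Y)]   # squared OLS residuals

g1, g2 = simple_ols(X, e2)               # Breusch-Pagan auxiliary regression
ebar = sum(e2) / n
r2 = 1 - sum((s - (g1 + g2 * Xi)) ** 2 for s, Xi in zip(e2, X)) / \
         sum((s - ebar) ** 2 for s in e2)

bp_stat = n * r2
print(bp_stat > 3.84)    # 3.84: the 5% critical value of chi2(1)
```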
particular, that the true relationship is in fact logarithmic.
$$\log Y = \beta_1 + \beta_2 \log X + u \quad \Leftrightarrow \quad Y = e^{\beta_1} X^{\beta_2} e^u$$

This specification means that for large values of $X$ the absolute size of the effect of the disturbance term is large, while for small values of $X$ it is small. In other words, the additive disturbance $u$ in the logarithmic model is equivalent to a multiplicative disturbance in the original specification $Y = e^{\beta_1} X^{\beta_2} e^u$.
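A small simulation illustrates the point (parameter values are hypothetical): generating $Y = e^{0.5} X^{1.2} e^u$ with a homoscedastic $u$, the deviations of $Y$ from its trend grow with $X$ in levels but have a roughly constant spread in logs:

```python
import math
import random
import statistics

random.seed(5)
X = [float(i) for i in range(1, 201)]
# multiplicative disturbance in levels, homoscedastic u with s.d. 0.3
Y = [math.exp(0.5) * Xi ** 1.2 * math.exp(random.gauss(0, 0.3)) for Xi in X]

# deviations in levels grow with X ...
lev_lo = [Yi - math.exp(0.5) * Xi ** 1.2 for Xi, Yi in zip(X[:50], Y[:50])]
lev_hi = [Yi - math.exp(0.5) * Xi ** 1.2 for Xi, Yi in zip(X[-50:], Y[-50:])]
# ... while deviations in logs do not
log_lo = [math.log(Yi) - (0.5 + 1.2 * math.log(Xi)) for Xi, Yi in zip(X[:50], Y[:50])]
log_hi = [math.log(Yi) - (0.5 + 1.2 * math.log(Xi)) for Xi, Yi in zip(X[-50:], Y[-50:])]

print(statistics.pstdev(lev_lo) < statistics.pstdev(lev_hi))    # spread rises in levels
print(abs(statistics.pstdev(log_lo) - statistics.pstdev(log_hi)) < 0.15)  # stable in logs
```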
Heteroscedasticity: Monte Carlo illustration
The idea of biased standard errors and inefficient estimators can be illustrated with the help of
a Monte Carlo experiment.
Firstly, it is necessary to generate the data so that the assumption of homoscedasticity fails (in
EViews this can be done by generating the disturbance term according to the formula 𝑢𝑖 = 𝑋𝑖 ⋅
𝑁𝑅𝑁𝐷).
Secondly, having conducted a large number of experiments (say, 1 million), it is possible to calculate the true standard deviations of the regression coefficients according to the formula

$$s.d.(b) = \sqrt{\frac{\sum_i (b_i - \beta)^2}{n}}$$
and obtain their distribution. Moreover, if we specify the type of heteroscedasticity as $\sigma_i = \gamma \cdot X_i$, then it can be eliminated by means of WLS, which allows us to compare the efficiency of the estimates.
Finally, knowing the true standard deviations, it is possible to determine the direction of the bias of the estimated standard errors and to analyze the efficiency of the coefficient estimators.
Illustration:

Inefficiency: both OLS and WLS are unbiased, but WLS is more efficient.

Biased standard errors: comparing the standard errors, it is evident that they are biased.
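The described experiment can be sketched in a few lines (with far fewer replications than a million, and hypothetical parameter values). Dividing the equation through by $X_i$ gives the WLS estimator appropriate for $\sigma_i = \gamma X_i$:

```python
import random
import statistics

random.seed(2024)
X = [float(i) for i in range(1, 31)]
beta1, beta2, gamma = 1.0, 2.0, 0.5     # hypothetical true parameters
R = 2000                                 # replications (the lecture uses 1 million)
ols_b2, wls_b2 = [], []

for _ in range(R):
    Y = [beta1 + beta2 * Xi + random.gauss(0, gamma * Xi) for Xi in X]

    # OLS slope on the original equation
    xbar = sum(X) / len(X)
    ybar = sum(Y) / len(Y)
    b2 = sum((Xi - xbar) * (Yi - ybar) for Xi, Yi in zip(X, Y)) / \
         sum((Xi - xbar) ** 2 for Xi in X)
    ols_b2.append(b2)

    # WLS: divide through by X_i, so Y/X = beta2 + beta1*(1/X) + u/X with a
    # homoscedastic disturbance u/X; beta2 is the intercept of this regression
    Ys = [Yi / Xi for Xi, Yi in zip(X, Y)]
    Zs = [1.0 / Xi for Xi in X]
    zbar = sum(Zs) / len(Zs)
    ysbar = sum(Ys) / len(Ys)
    slope = sum((Zi - zbar) * (Yi - ysbar) for Zi, Yi in zip(Zs, Ys)) / \
            sum((Zi - zbar) ** 2 for Zi in Zs)
    wls_b2.append(ysbar - slope * zbar)

for name, est in (("OLS", ols_b2), ("WLS", wls_b2)):
    print(name, round(statistics.mean(est), 3), round(statistics.pstdev(est), 3))
```

Both sample means should be close to the true β₂ = 2, while the spread of the WLS estimates is visibly smaller: both estimators are unbiased, but OLS is inefficient.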