Lecture 10 Heteroscedasticity
Heteroscedasticity
Assumption A.4 of Model A, which is equivalent to the second Gauss-Markov condition, requires the disturbance term to be homoscedastic; that is, the dispersion of the disturbance term is the same in all observations. Mathematically, $\sigma_{u_i}^2 = \sigma_u^2$ for all $i$. However, this is not always a plausible assumption: generally speaking, the variance differs across observations. Heteroscedasticity refers to a non-constant variance of the disturbance term. Formally, it means that $\sigma_{u_i}^2 \neq \sigma_{u_j}^2$ for some $i \neq j$. This lecture analyzes heteroscedasticity according to the following plan:
1. Reasons
2. Consequences
3. Detection
4. Remedial measures
I. Reasons
The problem of heteroscedasticity can arise when the scales of different economic variables change in the same direction. Suppose we want to analyze how expenditure on education depends on the GDP of a country. We have n observations for n countries and use the following model:
𝑬𝑫𝑼𝑪_𝑬𝑿𝑷 = 𝜷𝟏 + 𝜷𝟐 𝑮𝑫𝑷 + 𝒖
Obviously, both variables 𝑬𝑫𝑼𝑪_𝑬𝑿𝑷 and 𝑮𝑫𝑷 change their scales simultaneously: a country with a larger GDP can spend much more on education in absolute terms. Therefore, the expected dispersion of the dependent variable 𝑬𝑫𝑼𝑪_𝑬𝑿𝑷 will increase with the value of the explanatory variable 𝑮𝑫𝑷. The reason is that the variances of the omitted variables and of the measurement errors, which jointly determine the values of the disturbance term, rise as well.
Cross-sectional data often give rise to heteroscedasticity. Many economic variables tend to
move in size together. For example, consider the sample that contains data on different companies.
Heteroscedasticity is likely to arise because large firms will typically display much greater variation
(for instance, in profits, costs, expenditures …) than smaller ones. One more example is when we
analyze the relationship between consumption expenditures and family income. It is reasonable to
suppose that a family with greater aggregate income will have greater variation in consumption
expenditures.
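The family example above can be made concrete with a minimal simulation. All numbers below are hypothetical (a marginal propensity to consume of 0.8 and a disturbance standard deviation proportional to income); the point is only that the spread of consumption around its expected value grows with income:

```python
import random
import statistics

random.seed(42)

incomes = [1000.0 * (i + 1) for i in range(100)]        # hypothetical family incomes
# heteroscedastic disturbance: s.d. proportional to income, sigma_i = 0.1 * X_i
consumption = [0.8 * x + random.gauss(0, 0.1 * x) for x in incomes]

# disturbances in the poorest and the richest quarter of the sample
u_low = [c - 0.8 * x for x, c in zip(incomes[:25], consumption[:25])]
u_high = [c - 0.8 * x for x, c in zip(incomes[-25:], consumption[-25:])]

print(statistics.pstdev(u_low) < statistics.pstdev(u_high))  # spread grows with income
```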
II. Consequences
Let’s analyze the statistical properties of the OLS estimators when the disturbance term is subject to heteroscedasticity. Obviously, the fact that the dispersion of the disturbance term is not constant affects the standard deviations of the regression estimators, because OLS does not use the available information that the observations differ in reliability. Nevertheless, the OLS estimation procedure and the proof of unbiasedness do not depend on the presence of heteroscedasticity, so the expected values of the estimators are unchanged. The main results are as follows:
1) The standard errors of the regression coefficients are estimated wrongly (most likely they are underestimated). Hence, the t-tests and the F-test are invalid. Moreover, as the bias of the standard errors is typically negative, the t-statistics are overestimated, giving a misleading impression of the precision of the regression coefficients;
2) The OLS estimators are unbiased and consistent BUT inefficient: it is possible to find estimators that are still unbiased but have a smaller variance.
Intuitive explanation

𝒀 = 𝜷𝟏 + 𝜷𝟐 𝑿 + 𝒖

Let’s illustrate some reasons behind the consequences of heteroscedasticity graphically:

[Figure: left panel shows heteroscedasticity (the distributions of 𝑢 differ in variance across observations); right panel shows homoscedasticity (the distributions of 𝑢 have the same variance).]
Mathematically:
Let’s use explicit formulae for the estimated coefficients for the simple linear regression model to
analyze their statistical properties. In fact, there are several ways to do so. Let’s use the following
(see lecture 3):
Model: $Y_i = \beta_1 + \beta_2 X_i + u_i$, where $u_i \sim N(0, \sigma_{u_i}^2)$

OLS estimation: $b_2 = \beta_2 + \sum_i a_i u_i$, where $a_i = \dfrac{X_i - \bar{X}}{\sum_j (X_j - \bar{X})^2} = \dfrac{x_i}{\sum_j x_j^2}$ (with $x_i = X_i - \bar{X}$)
When we take the expectation of $b_2$, the fact that the variance of the disturbance term is not constant plays no role. Hence, we obtain the same result as before:

$E(b_2) = E(\beta_2 + \sum_i a_i u_i) = \beta_2 + E(\sum_i a_i u_i) = \beta_2 + \sum_i a_i E(u_i) = \beta_2 + \sum_i a_i \cdot 0 = \beta_2$ => unbiased.

Here we used that $X$ is non-stochastic and $E(u_i) = 0$.
Precision:

$$\sigma_{b_2}^2 = E\{(b_2 - E(b_2))^2\} = E\{(b_2 - \beta_2)^2\} = E\Big(\sum_i a_i u_i\Big)^2 = \sum_{i=1}^n a_i^2 E(u_i^2) + \sum_{i=1}^n \sum_{j \neq i} a_i a_j E(u_i u_j) =$$

$$= \sum_{i=1}^n a_i^2 \sigma_i^2 + 0 = \frac{\sum_{i=1}^n (X_i - \bar{X})^2 \sigma_i^2}{\left(\sum_{j=1}^n (X_j - \bar{X})^2\right)^2} = \frac{\sum_{i=1}^n x_i^2 \sigma_i^2}{\left(\sum_{j=1}^n x_j^2\right)^2}.$$
The standard errors are biased, but White (1980) shows that the following estimator is consistent:

$$\text{s.e.}(b_2) = s_{b_2} = \sqrt{\frac{\sum_{i=1}^n x_i^2 e_i^2}{\left(\sum_{j=1}^n x_j^2\right)^2}} = \sqrt{\sum_{i=1}^n a_i^2 e_i^2}$$

This is the heteroscedasticity-consistent (or robust) standard error.
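As a sketch (hypothetical data and parameter values), the robust standard error can be computed directly from the formula above and compared with the conventional one:

```python
import math
import random

random.seed(1)
n = 50
X = [float(i + 1) for i in range(n)]
# assumed design: the disturbance s.d. grows with X_i (all numbers hypothetical)
Y = [2.0 + 0.5 * Xi + random.gauss(0, 0.3 * Xi) for Xi in X]

xbar = sum(X) / n
ybar = sum(Y) / n
x = [Xi - xbar for Xi in X]                      # deviations x_i
Sxx = sum(xi ** 2 for xi in x)
b2 = sum(xi * (Yi - ybar) for xi, Yi in zip(x, Y)) / Sxx
b1 = ybar - b2 * xbar
e = [Yi - b1 - b2 * Xi for Xi, Yi in zip(X, Y)]  # OLS residuals

# conventional s.e., valid only under homoscedasticity
s2 = sum(ei ** 2 for ei in e) / (n - 2)
se_conv = math.sqrt(s2 / Sxx)

# White robust s.e.: sqrt( sum x_i^2 e_i^2 / (sum x_j^2)^2 )
se_white = math.sqrt(sum(xi ** 2 * ei ** 2 for xi, ei in zip(x, e)) / Sxx ** 2)

print(se_conv, se_white)  # with variance rising in X, the robust s.e. is usually larger
```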
In other words, we test whether $RSS_2$ is significantly greater than $RSS_1$ using an F-test. To perform the test, compare the calculated F-statistic with the critical value $F^{crit}_{\alpha\%}$:

If $F < F^{crit}_{\alpha\%}$, do not reject the null hypothesis of homoscedasticity at the $\alpha\%$ significance level;

If $F > F^{crit}_{\alpha\%}$, there is enough evidence of heteroscedasticity of the type $\sigma_i = \gamma X_i$ at the $\alpha\%$ significance level.
Note that this test can also be used when the standard deviation of the disturbance term is inversely related to some factor $X$, i.e. $\sigma_i = \gamma / X_i$. In this case we get $RSS_1 > RSS_2$. Hence,

$$F = \frac{RSS_1/(n_1 - k)}{RSS_2/(n_2 - k)} \overset{H_0}{\sim} F(n_1 - k,\; n_2 - k).$$
Then perform the same F-test as before.
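The Goldfeld-Quandt mechanics can be sketched end to end: order the sample by the suspected factor, drop the middle observations, fit separate regressions to the low-X and high-X subsamples, and compare their residual sums of squares. All data and parameter values below are hypothetical, with an assumed $\sigma_i = \gamma X_i$:

```python
import random

def rss(xs, ys):
    """Fit y on a constant and x by OLS; return the residual sum of squares."""
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    b2 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(xs, ys)) / \
         sum((xi - xbar) ** 2 for xi in xs)
    b1 = ybar - b2 * xbar
    return sum((yi - b1 - b2 * xi) ** 2 for xi, yi in zip(xs, ys))

random.seed(7)
n = 60
X = sorted(random.uniform(1, 100) for _ in range(n))          # sample ordered by X
Y = [3.0 + 0.7 * Xi + random.gauss(0, 0.2 * Xi) for Xi in X]  # sigma_i = 0.2 * X_i

m = 20                                   # subsample size; middle n - 2m dropped
rss1 = rss(X[:m], Y[:m])                 # low-X subsample
rss2 = rss(X[-m:], Y[-m:])               # high-X subsample

k = 2                                    # parameters in each subregression
F = (rss2 / (m - k)) / (rss1 / (m - k))  # compare with F(m - k, m - k)
print(round(F, 2))                       # F far above 1 signals heteroscedasticity
```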
Example:
Consider the model: 𝑌 = 𝛽1 + 𝛽2 𝑋2 + 𝛽3 𝑋3 + 𝑢
White auxiliary equation: $e_i^2 = \beta_1 + \beta_2 X_{2i} + \beta_3 X_{3i} + \beta_4 X_{2i}^2 + \beta_5 X_{3i}^2 + \beta_6 X_{2i} X_{3i} + v_i$
$$\chi^2\text{-statistic: } \chi^2 = n \cdot R^2 \overset{H_0}{\sim} \chi^2(5) \text{ in large samples},$$

as there are 5 regressors (the number of estimated parameters in the White auxiliary equation minus one).
Note that 5 degrees of freedom are absorbed by the White auxiliary equation, leaving $n - 5$ degrees of freedom for that regression in this case. If the regression includes many explanatory variables, too few degrees of freedom may be left.
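The White test can be sketched in full for a one-regressor model, where the auxiliary equation contains only $X$ and $X^2$, so the statistic is compared with $\chi^2(2)$ (5% critical value 5.99). All data and coefficients below are hypothetical:

```python
import random

def ols(rows, y):
    """Solve the normal equations (X'X) b = X'y by Gauss-Jordan elimination."""
    k = len(rows[0])
    XtX = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    Xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    A = [XtX[i][:] + [Xty[i]] for i in range(k)]
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(A[r][col]))  # partial pivoting
        A[col], A[piv] = A[piv], A[col]
        for r in range(k):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [arc - f * acc for arc, acc in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]

random.seed(3)
n = 200
X = [random.uniform(1, 10) for _ in range(n)]
Y = [1.0 + 2.0 * Xi + random.gauss(0, 0.5 * Xi) for Xi in X]  # sigma_i = 0.5 * X_i

# step 1: residuals of the original regression
b1, b2 = ols([[1.0, Xi] for Xi in X], Y)
e2 = [(Yi - b1 - b2 * Xi) ** 2 for Xi, Yi in zip(X, Y)]

# step 2: auxiliary regression of e^2 on a constant, X and X^2; get its R^2
g = ols([[1.0, Xi, Xi ** 2] for Xi in X], e2)
fit = [g[0] + g[1] * Xi + g[2] * Xi ** 2 for Xi in X]
ebar = sum(e2) / n
r2 = 1 - sum((s - f) ** 2 for s, f in zip(e2, fit)) / sum((s - ebar) ** 2 for s in e2)

chi2_stat = n * r2
print(chi2_stat > 5.99)   # 5.99: the 5% critical value of chi2(2)
```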
White test

Advantages:
• can be performed when the type of heteroscedasticity is not known;
• can detect heteroscedasticity even if it is not connected with the factors included in the regression.

Disadvantages:
• low power (the price of its generality);
• valid only for large samples;
• if many explanatory variables are included, the problems of insufficient remaining degrees of freedom and of possible perfect multicollinearity arise => some variables should be excluded from the White auxiliary equation, but which ones? The theory gives no prediction.
These problems explain why the White test and the Goldfeld-Quandt test may give different results.
The Breusch-Pagan (or Breusch-Pagan-Godfrey) test does the same as the White test, but the auxiliary equation does not include the quadratic and cross-product terms.

For the model $Y = \beta_1 + \beta_2 X_2 + \beta_3 X_3 + u$, the Breusch-Pagan auxiliary equation is $e_i^2 = \beta_1 + \beta_2 X_{2i} + \beta_3 X_{3i} + v_i$. It requires fewer parameters to be estimated than the White test.
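A similar sketch of the Breusch-Pagan idea for a one-regressor model: the squared OLS residuals are regressed on $X$ alone, and $n \cdot R^2$ is compared with $\chi^2(1)$ (5% critical value 3.84). All numbers are hypothetical:

```python
import random

def simple_ols(xs, ys):
    """Return (intercept, slope) of an OLS fit of y on a constant and x."""
    xbar = sum(xs) / len(xs)
    ybar = sum(ys) / len(ys)
    slope = sum((a - xbar) * (b - ybar) for a, b in zip(xs, ys)) / \
            sum((a - xbar) ** 2 for a in xs)
    return ybar - slope * xbar, slope

random.seed(11)
n = 150
X = [random.uniform(1, 10) for _ in range(n)]
Y = [1.0 + 2.0 * Xi + random.gauss(0, 0.4 * Xi) for Xi in X]  # sigma_i = 0.4 * X_i

b1, b2 = simple_ols(X, Y)
e2 = [(Yi - b1 - b2 * Xi) ** 2 for Xi, Yi in zip(X, Y)]   # squared OLS residuals

g1, g2 = simple_ols(X, e2)               # Breusch-Pagan auxiliary regression
ebar = sum(e2) / n
r2 = 1 - sum((s - (g1 + g2 * Xi)) ** 2 for s, Xi in zip(e2, X)) / \
         sum((s - ebar) ** 2 for s in e2)

bp_stat = n * r2
print(bp_stat > 3.84)    # 3.84: the 5% critical value of chi2(1)
```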
particular, that the true relationship is in fact logarithmic.
$$\log Y = \beta_1 + \beta_2 \log X + u \quad \Leftrightarrow \quad Y = e^{\beta_1} X^{\beta_2} e^u$$

This specification means that for large values of $X$ the absolute size of the effect of the disturbance term is large, while for small values of $X$ it is small. In other words, the additive disturbance $u$ in the logarithmic model is equivalent to a multiplicative disturbance in the original specification $Y = e^{\beta_1} X^{\beta_2} e^u$.
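A small simulation illustrates the point (parameter values are hypothetical): generating $Y = e^{0.5} X^{1.2} e^u$ with a homoscedastic $u$, the deviations of $Y$ from its trend grow with $X$ in levels but have a roughly constant spread in logs:

```python
import math
import random
import statistics

random.seed(5)
X = [float(i) for i in range(1, 201)]
# multiplicative disturbance in levels, homoscedastic u with s.d. 0.3
Y = [math.exp(0.5) * Xi ** 1.2 * math.exp(random.gauss(0, 0.3)) for Xi in X]

# deviations in levels grow with X ...
lev_lo = [Yi - math.exp(0.5) * Xi ** 1.2 for Xi, Yi in zip(X[:50], Y[:50])]
lev_hi = [Yi - math.exp(0.5) * Xi ** 1.2 for Xi, Yi in zip(X[-50:], Y[-50:])]
# ... while deviations in logs do not
log_lo = [math.log(Yi) - (0.5 + 1.2 * math.log(Xi)) for Xi, Yi in zip(X[:50], Y[:50])]
log_hi = [math.log(Yi) - (0.5 + 1.2 * math.log(Xi)) for Xi, Yi in zip(X[-50:], Y[-50:])]

print(statistics.pstdev(lev_lo) < statistics.pstdev(lev_hi))    # spread rises in levels
print(abs(statistics.pstdev(log_lo) - statistics.pstdev(log_hi)) < 0.15)  # stable in logs
```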
Heteroscedasticity: Monte Carlo illustration
The idea of biased standard errors and inefficient estimators can be illustrated with the help of
a Monte Carlo experiment.
Firstly, it is necessary to generate the data so that the assumption of homoscedasticity fails (in
EViews this can be done by generating the disturbance term according to the formula 𝑢𝑖 = 𝑋𝑖 ⋅
𝑁𝑅𝑁𝐷).
Secondly, having conducted a large number of experiments (say, 1 million), it is possible to calculate the true standard deviations of the regression coefficients according to the formula

$$s.d.(b) = \sqrt{\frac{\sum_i (b_i - \beta)^2}{n}}$$
and obtain their distribution. Moreover, if we specify the type of heteroscedasticity as $\sigma_i = \gamma \cdot X_i$, then it can be eliminated by means of WLS, which allows us to compare the efficiency of the estimates.
Finally, knowing the true standard deviations, it is possible to determine the direction of the bias of the estimated standard errors and to analyze the efficiency of the coefficient estimators.
Illustration:

Inefficiency: both OLS and WLS are unbiased, but WLS is more efficient.

Biased standard errors: comparing the standard errors, it is evident that they are biased.
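The described experiment can be sketched in a few lines (with far fewer replications than a million, and hypothetical parameter values). Dividing the equation through by $X_i$ gives the WLS estimator appropriate for $\sigma_i = \gamma X_i$:

```python
import random
import statistics

random.seed(2024)
X = [float(i) for i in range(1, 31)]
beta1, beta2, gamma = 1.0, 2.0, 0.5     # hypothetical true parameters
R = 2000                                 # replications (the lecture uses 1 million)
ols_b2, wls_b2 = [], []

for _ in range(R):
    Y = [beta1 + beta2 * Xi + random.gauss(0, gamma * Xi) for Xi in X]

    # OLS slope on the original equation
    xbar = sum(X) / len(X)
    ybar = sum(Y) / len(Y)
    b2 = sum((Xi - xbar) * (Yi - ybar) for Xi, Yi in zip(X, Y)) / \
         sum((Xi - xbar) ** 2 for Xi in X)
    ols_b2.append(b2)

    # WLS: divide through by X_i, so Y/X = beta2 + beta1*(1/X) + u/X with a
    # homoscedastic disturbance u/X; beta2 is the intercept of this regression
    Ys = [Yi / Xi for Xi, Yi in zip(X, Y)]
    Zs = [1.0 / Xi for Xi in X]
    zbar = sum(Zs) / len(Zs)
    ysbar = sum(Ys) / len(Ys)
    slope = sum((Zi - zbar) * (Yi - ysbar) for Zi, Yi in zip(Zs, Ys)) / \
            sum((Zi - zbar) ** 2 for Zi in Zs)
    wls_b2.append(ysbar - slope * zbar)

for name, est in (("OLS", ols_b2), ("WLS", wls_b2)):
    print(name, round(statistics.mean(est), 3), round(statistics.pstdev(est), 3))
```

Both sample means should be close to the true β₂ = 2, while the spread of the WLS estimates is visibly smaller: both estimators are unbiased, but OLS is inefficient.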