Econometrics II, Spring 2018
Department of Economics, University of Copenhagen
By Morten Nyboe Tabor
Solution Guide:
Theoretical Exercises #4
The ARCH Model
#4.1 The ARCH Model
Consider the ARCH(1) model given by:
$$r_t = \delta + \epsilon_t, \qquad (4.1)$$
$$\epsilon_t = \sigma_t z_t, \quad z_t \sim N(0, 1), \qquad (4.2)$$
$$\sigma_t^2 = \omega + \alpha \epsilon_{t-1}^2, \qquad (4.3)$$
where $\omega > 0$ and $\alpha \geq 0$. Define the information set $I_{t-1} = \{r_1, r_2, \ldots, r_{t-1}\}$.
(1) Derive the conditional mean of the innovations: $E(\epsilon_t \mid I_{t-1})$.
Derive the conditional variance of the innovations: $E(\epsilon_t^2 \mid I_{t-1})$.
State the distribution of the innovations $\epsilon_t$ conditional on the information set $I_{t-1}$.
Solution:
We first derive the conditional mean of the innovations. We plug in the expression for $\epsilon_t$ from equation (4.2). Then we use that $E(\sigma_t \mid I_{t-1}) = \sigma_t$, as $\sigma_t$ is known given the information set $I_{t-1}$. This holds because the innovation $\epsilon_{t-1}$ is known at time $t-1$ (it is only a function of $r_{t-1}$, which is included in $I_{t-1}$, and the parameter $\delta$).
We get:
$$\begin{aligned}
E(\epsilon_t \mid I_{t-1}) &= E(\sigma_t z_t \mid I_{t-1}) \\
&= \sigma_t E(z_t \mid I_{t-1}) \\
&= \sigma_t E(z_t) \\
&= \sigma_t \cdot 0 \\
&= 0.
\end{aligned}$$
In the last steps, we have used that the information set $I_{t-1}$ contains no information about $z_t$, so $E(z_t \mid I_{t-1}) = E(z_t)$, and from $z_t \sim N(0, 1)$ that $E(z_t) = 0$.
Second, we derive the conditional variance using arguments similar to those above, noting that $V(\epsilon_t \mid I_{t-1}) = E(\epsilon_t^2 \mid I_{t-1})$ since we have shown above that the conditional mean is zero. We get:
$$\begin{aligned}
E(\epsilon_t^2 \mid I_{t-1}) &= E((\sigma_t z_t)^2 \mid I_{t-1}) \\
&= E(\sigma_t^2 z_t^2 \mid I_{t-1}) \\
&= \sigma_t^2 E(z_t^2 \mid I_{t-1}) \\
&= \sigma_t^2 E(z_t^2) \\
&= \sigma_t^2 \cdot 1 \\
&= \sigma_t^2.
\end{aligned}$$
We have thus shown that the conditional variance of the innovations is $\sigma_t^2$.
This gives the following interpretation of the ARCH model. Equation (4.1) defines the conditional mean of $r_t$ and equation (4.3) defines the conditional variance of the innovations. This is a generalization of the models we have worked with so far, in which the innovations had a constant variance. It should also be noted that the restriction $\alpha = 0$ gives back the standard model with a constant conditional variance.
Finally, using the two results above, we can state the conditional distribution of the innovations as:
$$\epsilon_t \mid I_{t-1} \sim N(0, \sigma_t^2),$$
so the innovations are conditionally normally distributed with conditional variance $\sigma_t^2$. This holds because $\epsilon_t = \sigma_t z_t$, so the standard normally distributed innovations $z_t \sim N(0, 1)$ are scaled by $\sigma_t$ in every period. Note that while the innovations $\epsilon_t$ are conditionally normally distributed, they are not unconditionally normally distributed. Moreover, note that the conditional variance is predetermined in the sense that it is already known at time $t-1$.
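To make the result concrete, here is a minimal Python simulation sketch (not part of the original exercise; the parameter values omega, alpha, and the conditioning value eps_lag are arbitrary illustration choices, and only numpy is assumed). It draws $\epsilon_t = \sigma_t z_t$ many times for a fixed $\epsilon_{t-1}$ and compares the sample moments with the derived conditional moments.

```python
# Minimal sketch: check the conditional moments of the ARCH(1) innovations.
# All parameter values are arbitrary illustration choices, not from the text.
import numpy as np

rng = np.random.default_rng(0)
omega, alpha = 0.2, 0.5        # omega > 0, alpha >= 0
eps_lag = 1.5                  # epsilon_{t-1}, known given I_{t-1}

sigma2_t = omega + alpha * eps_lag**2   # conditional variance from eq. (4.3)
eps_t = np.sqrt(sigma2_t) * rng.standard_normal(1_000_000)

print(eps_t.mean())   # approx. 0        = E(eps_t | I_{t-1})
print(eps_t.var())    # approx. sigma2_t = E(eps_t^2 | I_{t-1})
print(sigma2_t)       # 0.2 + 0.5 * 1.5^2 = 1.325
```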
(2) Explain in words why the parameters $\omega$ and $\alpha$ are restricted to be positive and non-negative, respectively.
Solution:
As $\sigma_t^2$ is the conditional variance of the innovations $\epsilon_t$, the parameter restrictions $\omega > 0$ and $\alpha \geq 0$ are imposed on the ARCH model to ensure a strictly positive conditional variance: $\omega > 0$ guarantees $\sigma_t^2 > 0$ even when $\epsilon_{t-1} = 0$, and $\alpha \geq 0$ ensures that the term $\alpha \epsilon_{t-1}^2$ cannot push the variance below zero.
(3) Explain in words how the ARCH(1) model allows for volatility clustering.
Solution:
The ARCH model allows for volatility clustering because the conditional variance $\sigma_t^2$ in equation (4.3) depends on the lagged squared innovation $\epsilon_{t-1}^2$. A large (positive or negative) innovation at time $t-1$ implies that $\epsilon_{t-1}^2$ is large and, consequently, that the conditional variance $\sigma_t^2$ is large whenever $\alpha > 0$. Thereby, large shocks tend to be followed by large shocks (in absolute terms) and small shocks tend to be followed by small shocks (in absolute terms). This is the feature we call volatility clustering.
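A small simulation sketch (again with arbitrary parameter values, only numpy assumed) makes the point concrete: the levels $\epsilon_t$ are serially uncorrelated, while the squares $\epsilon_t^2$ are positively autocorrelated.

```python
# Sketch: volatility clustering in a simulated ARCH(1) path.
# Levels are uncorrelated; squares are persistent.
import numpy as np

rng = np.random.default_rng(1)
omega, alpha, T = 0.2, 0.5, 100_000

eps = np.zeros(T)
sigma2 = np.full(T, omega / (1 - alpha))   # start at the unconditional variance
for t in range(1, T):
    sigma2[t] = omega + alpha * eps[t - 1] ** 2
    eps[t] = np.sqrt(sigma2[t]) * rng.standard_normal()

def acf1(x):
    """First-order sample autocorrelation."""
    x = x - x.mean()
    return (x[1:] * x[:-1]).sum() / (x ** 2).sum()

print(acf1(eps))       # approx. 0: no linear predictability in the levels
print(acf1(eps ** 2))  # approx. alpha = 0.5: persistence in the squares
```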
(4) Derive the unconditional mean of $r_t$: $E(r_t)$.
Solution:
We plug in the expression for $r_t$ from equation (4.1), and use the law of iterated expectations and the fact that $E(\epsilon_t \mid I_{t-1}) = 0$ from above, to get:
$$\begin{aligned}
E(r_t) &= E(\delta + \epsilon_t) \\
&= \delta + E(\epsilon_t) \\
&= \delta + E(E(\epsilon_t \mid I_{t-1})) \\
&= \delta + E(0) \\
&= \delta.
\end{aligned}$$
We find that the unconditional expectation of $r_t$ is simply the constant term $\delta$. The conditional expectation is also $\delta$ here, as $E(r_t \mid I_{t-1}) = \delta + E(\epsilon_t \mid I_{t-1}) = \delta$, so equation (4.1) defines the equation for the conditional mean of $r_t$.
We could extend the model with explanatory variables, say $r_t = x_t' \theta + \epsilon_t$ or $r_t = \delta + \rho r_{t-1} + \epsilon_t$. What would the conditional mean then be?
Any random variable $x_t$ can be decomposed into the sum of a conditional expectation given some information set $I$ and a residual $v_t$ which is uncorrelated with that information set:
$$x_t = E(x_t \mid I) + v_t, \quad \text{where } E(v_t \mid I) = 0.$$
(5) Use the decomposition above for $\epsilon_t^2$ conditional on the information set $I_{t-1}$,
$$\epsilon_t^2 = E(\epsilon_t^2 \mid I_{t-1}) + v_t, \quad E(v_t \mid I_{t-1}) = 0, \qquad (4.4)$$
to show that the squared innovations $\epsilon_t^2$ follow an AR(1) process with residuals $v_t$.
Solution:
We first use the decomposition in (4.4) to find an expression for $\sigma_t^2$:
$$\epsilon_t^2 = E(\epsilon_t^2 \mid I_{t-1}) + v_t = \sigma_t^2 + v_t,$$
such that
$$\sigma_t^2 = \epsilon_t^2 - v_t.$$
We can plug this expression for $\sigma_t^2$ into equation (4.3) to get:
$$\begin{aligned}
\sigma_t^2 &= \omega + \alpha \epsilon_{t-1}^2 \\
\epsilon_t^2 - v_t &= \omega + \alpha \epsilon_{t-1}^2 \\
\epsilon_t^2 &= \omega + \alpha \epsilon_{t-1}^2 + v_t, \qquad (4.5)
\end{aligned}$$
which shows that the ARCH(1) model implies that the squared innovations $\epsilon_t^2$ follow an AR(1) process with innovations $v_t$.
What would the equivalent result be if we extended the ARCH(1) model to an ARCH(p) model by replacing (4.3) by $\sigma_t^2 = \omega + \alpha_1 \epsilon_{t-1}^2 + \alpha_2 \epsilon_{t-2}^2 + \ldots + \alpha_p \epsilon_{t-p}^2$?
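The AR(1) representation in (4.5) can be checked by simulation. The sketch below (illustrative, with arbitrary parameter values) regresses $\epsilon_t^2$ on a constant and $\epsilon_{t-1}^2$ by OLS; since $E(v_t \mid I_{t-1}) = 0$, the estimates should be close to $\omega$ and $\alpha$.

```python
# Sketch: OLS on the AR(1) representation of eps_t^2 recovers omega and alpha.
import numpy as np

rng = np.random.default_rng(2)
omega, alpha, T = 0.2, 0.5, 200_000

eps = np.zeros(T)
for t in range(1, T):
    eps[t] = np.sqrt(omega + alpha * eps[t - 1] ** 2) * rng.standard_normal()

y = eps[1:] ** 2                                      # eps_t^2
X = np.column_stack([np.ones(T - 1), eps[:-1] ** 2])  # constant and eps_{t-1}^2
coef = np.linalg.lstsq(X, y, rcond=None)[0]
print(coef)   # approx. [0.2, 0.5] = [omega, alpha]
```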
(6) State the condition for the process for $r_t$ to be weakly stationary.
Solution:
The condition for weak stationarity is that the parameter $\alpha$ satisfies $0 \leq \alpha < 1$. This is the well-known condition $-1 < \alpha < 1$ for the process for $\epsilon_t^2$ in (4.5) to be weakly stationary, combined with the restriction $\alpha \geq 0$ in the ARCH model.
(7) Assuming that the condition for weak stationarity is fulfilled, derive the unconditional variance of the innovations $\epsilon_t$: $E(\epsilon_t^2)$.
Solution:
Let $\sigma^2$ denote the unconditional variance, $\sigma^2 = E(\epsilon_t^2)$.
We assume that the condition for weak stationarity, $0 \leq \alpha < 1$, is fulfilled, such that $\sigma^2 = E(\epsilon_t^2) = E(\epsilon_{t-1}^2)$. We find the expression for $\sigma^2$ by taking the unconditional expectation of (4.5):
$$\begin{aligned}
E(\epsilon_t^2) &= E(\omega + \alpha \epsilon_{t-1}^2 + v_t) \\
\sigma^2 &= \omega + \alpha E(\epsilon_{t-1}^2) + E(v_t) \\
\sigma^2 &= \omega + \alpha \sigma^2 + E(E(v_t \mid I_{t-1})) \\
\sigma^2 &= \omega + \alpha \sigma^2 + E(0) \\
\sigma^2 &= \omega + \alpha \sigma^2,
\end{aligned}$$
and by re-arranging terms we find the unconditional variance of the innovations:
$$\sigma^2 = E(\epsilon_t^2) = \frac{\omega}{1 - \alpha}. \qquad (4.6)$$
Note that the unconditional variance is not given by $\omega$. Instead, it depends on both $\omega$ and $\alpha$, which is equivalent to the results we know for autoregressive processes.
Note also that the unconditional variance $E(\epsilon_t^2)$ is not defined for $\alpha \geq 1$.
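A quick numerical check of (4.6) (a sketch with arbitrary parameter values): the sample variance of a long simulated ARCH(1) series should be close to $\omega/(1-\alpha)$, not to $\omega$ itself.

```python
# Sketch: sample variance of simulated ARCH(1) innovations vs. eq. (4.6).
import numpy as np

rng = np.random.default_rng(3)
omega, alpha, T = 0.2, 0.5, 1_000_000

eps = np.zeros(T)
for t in range(1, T):
    eps[t] = np.sqrt(omega + alpha * eps[t - 1] ** 2) * rng.standard_normal()

print(eps.var())            # approx. 0.4
print(omega / (1 - alpha))  # 0.2 / 0.5 = 0.4, not omega = 0.2
```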
(8) What does it mean that the innovations $\epsilon_t$ in the weakly stationary ARCH(1) model are unconditionally homoskedastic and conditionally heteroskedastic?
Solution:
In the weakly stationary ARCH(1) model the innovations $\epsilon_t$ are conditionally heteroskedastic, as the conditional variance of the innovations, $E(\epsilon_t^2 \mid I_{t-1}) = \sigma_t^2$, changes over time in an autoregressive way; hence the name, autoregressive conditional heteroskedasticity. This is the feature that allows us to model volatility clustering in financial data.

However, in the weakly stationary ARCH(1) model the innovations are unconditionally homoskedastic, as the unconditional variance of the innovations is constant, $E(\epsilon_t^2) = \sigma^2 = \omega/(1 - \alpha)$.
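The contrast can be seen directly in a simulation (a sketch with arbitrary parameter values): the conditional variance $\sigma_t^2$ wanders over time, while the sample variance of the whole series settles at the constant $\omega/(1-\alpha)$.

```python
# Sketch: time-varying conditional variance vs. constant unconditional variance.
import numpy as np

rng = np.random.default_rng(4)
omega, alpha, T = 0.2, 0.5, 500_000

eps = np.zeros(T)
sigma2 = np.full(T, omega / (1 - alpha))
for t in range(1, T):
    sigma2[t] = omega + alpha * eps[t - 1] ** 2
    eps[t] = np.sqrt(sigma2[t]) * rng.standard_normal()

print(sigma2.min(), sigma2.max())      # conditional variance varies widely
print(eps.var(), omega / (1 - alpha))  # unconditional variance is constant
```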
#4.2 The GARCH Model
Consider the GARCH(1,1) model given by:
$$r_t = \delta + \epsilon_t, \qquad (4.7)$$
$$\epsilon_t = \sigma_t z_t, \quad z_t \sim N(0, 1), \qquad (4.8)$$
$$\sigma_t^2 = \omega + \alpha \epsilon_{t-1}^2 + \beta \sigma_{t-1}^2, \qquad (4.9)$$
where $\omega > 0$, $\alpha \geq 0$, and $\beta \geq 0$. Define the information set $I_{t-1} = \{r_1, r_2, \ldots, r_{t-1}\}$.
(1) Derive the variance of the innovations $\epsilon_t$ conditional on the information set $I_{t-1}$, i.e. derive $E(\epsilon_t^2 \mid I_{t-1})$.
State the distribution of the innovations $\epsilon_t$ conditional on the information set $I_{t-1}$.
Solution:
First, as in #4.1 Question 1, the conditional mean of the innovations is zero: $E(\epsilon_t \mid I_{t-1}) = 0$. The proof is identical to the one in #4.1 Question 1. That implies that the conditional variance equals $V(\epsilon_t \mid I_{t-1}) = E(\epsilon_t^2 \mid I_{t-1})$. Similarly to #4.1 Question 1, we find that:
$$E(\epsilon_t^2 \mid I_{t-1}) = \sigma_t^2,$$
but compared to the ARCH(1) model the specification of the conditional variance differs, as $\sigma_t^2$ now depends on both $\epsilon_{t-1}^2$ and $\sigma_{t-1}^2$. As in the ARCH(1) model, the conditional distribution of the innovations is $\epsilon_t \mid I_{t-1} \sim N(0, \sigma_t^2)$.

(2) Use the decomposition of $\epsilon_t^2$ in (4.4) to show that the squared innovations $\epsilon_t^2$ follow an ARMA(1,1) process with residuals $v_t$.

Solution:
From the decomposition of $\epsilon_t^2$ in (4.4) we have that $\sigma_t^2 = \epsilon_t^2 - v_t$. Plugging in this expression for $\sigma_t^2$ and $\sigma_{t-1}^2$ in (4.9), we get:
$$\begin{aligned}
\sigma_t^2 &= \omega + \alpha \epsilon_{t-1}^2 + \beta \sigma_{t-1}^2 \\
\epsilon_t^2 - v_t &= \omega + \alpha \epsilon_{t-1}^2 + \beta(\epsilon_{t-1}^2 - v_{t-1}) \\
\epsilon_t^2 &= \omega + (\alpha + \beta)\epsilon_{t-1}^2 + v_t - \beta v_{t-1}, \qquad (4.10)
\end{aligned}$$
which shows that the squared innovations $\epsilon_t^2$ follow an ARMA(1,1) process with residuals $v_t$.
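As an illustrative identity check of (4.10) (arbitrary parameter values, only numpy assumed), we can simulate a GARCH(1,1) path, define $v_t = \epsilon_t^2 - \sigma_t^2$, and confirm that the ARMA(1,1) equation holds exactly at every $t$.

```python
# Sketch: verify that eps_t^2 = omega + (alpha+beta)*eps_{t-1}^2 + v_t - beta*v_{t-1}
# holds as an exact identity when v_t = eps_t^2 - sigma_t^2.
import numpy as np

rng = np.random.default_rng(5)
omega, alpha, beta, T = 0.1, 0.1, 0.8, 10_000

eps = np.zeros(T)
sigma2 = np.full(T, omega / (1 - alpha - beta))
for t in range(1, T):
    sigma2[t] = omega + alpha * eps[t - 1] ** 2 + beta * sigma2[t - 1]
    eps[t] = np.sqrt(sigma2[t]) * rng.standard_normal()

v = eps ** 2 - sigma2
lhs = eps[2:] ** 2
rhs = omega + (alpha + beta) * eps[1:-1] ** 2 + v[2:] - beta * v[1:-1]
print(np.max(np.abs(lhs - rhs)))   # approx. 0 (floating-point error only)
```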
(3) State the condition for weak stationarity of $r_t$.
Solution:
Weak stationarity of $r_t$ requires that the unconditional mean and variance of $r_t$ are constant (and that the autocovariances are constant over time, but we do not consider those here).

The unconditional mean is given by $E(r_t) = E(\delta + \epsilon_t) = \delta + E(\epsilon_t) = \delta$, as $E(\epsilon_t) = 0$ (see Question 4 in Theoretical Exercise #4.1). That implies that the unconditional variance is:
$$V(r_t) = E((r_t - E(r_t))^2) = E((\delta + \epsilon_t - \delta)^2) = E(\epsilon_t^2).$$
The condition for weak stationarity therefore comes from the process for $\epsilon_t^2$. From the ARMA(1,1) representation in (4.10) derived in Question (2), and from the topic on ARMA models, we know that this process has a stationary solution if the autoregressive parameter $\alpha + \beta$ satisfies the condition $0 \leq \alpha + \beta < 1$ (the greater-than-or-equal-to-zero part follows from the parameter restrictions $\alpha \geq 0$ and $\beta \geq 0$).
(4) Assuming that the condition for weak stationarity is fulfilled, derive the unconditional variance of the innovations $\epsilon_t$: $E(\epsilon_t^2)$.
Solution:
We assume that the condition for weak stationarity, $0 \leq \alpha + \beta < 1$, is satisfied. Given stationarity it holds that $\sigma^2 = E(\epsilon_t^2) = E(\epsilon_{t-1}^2)$, i.e. the unconditional variance of the innovations is the same for all $t$. We use this to find the unconditional variance from (4.10):
$$\begin{aligned}
E(\epsilon_t^2) &= E(\omega + (\alpha + \beta)\epsilon_{t-1}^2 + v_t - \beta v_{t-1}) \\
E(\epsilon_t^2) &= \omega + (\alpha + \beta)E(\epsilon_{t-1}^2) + E(v_t) - \beta E(v_{t-1}) \\
\sigma^2 &= \omega + (\alpha + \beta)\sigma^2 + 0 - \beta \cdot 0 \\
\sigma^2 &= \omega + (\alpha + \beta)\sigma^2,
\end{aligned}$$
where we have used the law of iterated expectations and $E(v_t \mid I_{t-1}) = 0$, which implies that $E(v_t) = E(E(v_t \mid I_{t-1})) = E(0) = 0$. A similar argument applies to $E(v_{t-1})$.
By re-arranging terms in the last equation, we find the unconditional variance of the innovations:
$$\sigma^2 = E(\epsilon_t^2) = \frac{\omega}{1 - \alpha - \beta}.$$
Note that the unconditional variance of the innovations depends on all three parameters $\omega$, $\alpha$, and $\beta$ – it is not given exclusively by $\omega$!

Similarly to the weakly stationary ARCH(1) model in Theoretical Exercise #4.1, the weakly stationary GARCH(1,1) model has innovations which are conditionally heteroskedastic with time-varying conditional variance $E(\epsilon_t^2 \mid I_{t-1}) = \sigma_t^2$ and unconditionally homoskedastic with a constant unconditional variance $\sigma^2 = E(\epsilon_t^2) = \omega/(1 - \alpha - \beta)$.
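A quick numerical check (a sketch with arbitrary parameter values): the sample variance of a long simulated GARCH(1,1) series should approach $\omega/(1-\alpha-\beta)$.

```python
# Sketch: sample variance of simulated GARCH(1,1) innovations vs. the formula.
import numpy as np

rng = np.random.default_rng(6)
omega, alpha, beta, T = 0.1, 0.1, 0.8, 2_000_000

eps = np.zeros(T)
sigma2 = np.full(T, omega / (1 - alpha - beta))
for t in range(1, T):
    sigma2[t] = omega + alpha * eps[t - 1] ** 2 + beta * sigma2[t - 1]
    eps[t] = np.sqrt(sigma2[t]) * rng.standard_normal()

print(eps.var())                    # approx. 1.0
print(omega / (1 - alpha - beta))   # 0.1 / 0.1 = 1.0
```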
(5) (Advanced) Show that $\sigma_t^2$ can be written as a function of the squared innovations $\epsilon_{t-1}^2, \epsilon_{t-2}^2, \epsilon_{t-3}^2, \ldots$.
[Hint: Start from the equation in (4.9) and plug in the model's expression for $\sigma_{t-1}^2$. Continue to do so recursively for $\sigma_{t-2}^2$, then $\sigma_{t-3}^2$, etc.]
Solution:
We start from equation (4.9) and plug in recursively for $\sigma_{t-1}^2$:
$$\begin{aligned}
\sigma_t^2 &= \omega + \alpha \epsilon_{t-1}^2 + \beta \sigma_{t-1}^2 \\
&= \omega + \alpha \epsilon_{t-1}^2 + \beta(\omega + \alpha \epsilon_{t-2}^2 + \beta \sigma_{t-2}^2) \\
&= (1 + \beta)\omega + \alpha \epsilon_{t-1}^2 + \beta \alpha \epsilon_{t-2}^2 + \beta^2 \sigma_{t-2}^2 \\
&= (1 + \beta)\omega + \alpha \epsilon_{t-1}^2 + \beta \alpha \epsilon_{t-2}^2 + \beta^2(\omega + \alpha \epsilon_{t-3}^2 + \beta \sigma_{t-3}^2) \\
&= (1 + \beta + \beta^2)\omega + \alpha \epsilon_{t-1}^2 + \beta \alpha \epsilon_{t-2}^2 + \beta^2 \alpha \epsilon_{t-3}^2 + \beta^3 \sigma_{t-3}^2.
\end{aligned}$$
Continuing the recursive substitution, we can write the conditional variance $\sigma_t^2$ in terms of the squared innovations in the infinite past:
$$\sigma_t^2 = (1 + \beta + \beta^2 + \beta^3 + \ldots)\omega + \alpha \sum_{i=1}^{\infty} \beta^{i-1} \epsilon_{t-i}^2 = \frac{\omega}{1 - \beta} + \alpha \sum_{i=1}^{\infty} \beta^{i-1} \epsilon_{t-i}^2,$$
where the geometric sum converges because $0 \leq \beta < 1$ under weak stationarity.
This shows that the GARCH(1,1) model can be interpreted as an infinite ARCH process with restricted parameters. This is a major reason why the GARCH model is typically more relevant empirically than the ARCH(p) model.
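The infinite-ARCH interpretation can be illustrated numerically (a sketch with arbitrary parameter values): a truncated version of the infinite sum reproduces the $\sigma_t^2$ computed from the GARCH recursion, with truncation error of order $\beta^m$.

```python
# Sketch: reconstruct sigma_t^2 from the (truncated) infinite-ARCH representation
# omega/(1-beta) + alpha * sum_{i=1}^{m} beta^(i-1) * eps_{t-i}^2.
import numpy as np

rng = np.random.default_rng(7)
omega, alpha, beta, T, m = 0.1, 0.1, 0.8, 5_000, 200

eps = np.zeros(T)
sigma2 = np.full(T, omega / (1 - alpha - beta))
for t in range(1, T):
    sigma2[t] = omega + alpha * eps[t - 1] ** 2 + beta * sigma2[t - 1]
    eps[t] = np.sqrt(sigma2[t]) * rng.standard_normal()

t = T - 1
weights = beta ** np.arange(m)            # beta^(i-1) for i = 1, ..., m
lags = eps[t - 1 : t - 1 - m : -1] ** 2   # eps_{t-1}^2, ..., eps_{t-m}^2
approx = omega / (1 - beta) + alpha * (weights * lags).sum()
print(sigma2[t], approx)                  # nearly identical (beta^200 is tiny)
```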