Lecture Notes GARCH_4
Main Purposes of Lectures 12 and 13:
Plan of Lecture 12
(12a) Preliminaries
12a. Preliminaries
Heteroscedasticity
Let X = (X1, . . . , Xd) be a gaussian vector with mean vector µ and variance-covariance matrix Σ. More in detail,

Σ = [ σ1²    cov12   . . .  cov1d
      cov21  σ2²     . . .  cov2d
      . . .  . . .   . . .  . . .
      covd1  covd2   . . .  σd²  ]
We say that the vector is homoscedastic (or homoskedastic) when
σ1 = σ2 = · · · = σd.
We say that the vector is heteroscedastic (or heteroskedastic) when
this assumption does not hold.
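As a small illustration (a Python sketch; the matrices and the helper name `is_homoscedastic` are invented for this example), homoscedasticity of a gaussian vector can be read off the diagonal of Σ:

```python
import numpy as np

def is_homoscedastic(cov: np.ndarray) -> bool:
    """A gaussian vector is homoscedastic when all diagonal
    entries (the variances sigma_i^2) of Sigma coincide."""
    variances = np.diag(cov)
    return bool(np.allclose(variances, variances[0]))

# Homoscedastic: equal variances, arbitrary covariances.
sigma_homo = np.array([[1.0, 0.3, 0.1],
                       [0.3, 1.0, 0.2],
                       [0.1, 0.2, 1.0]])

# Heteroscedastic: the variances differ.
sigma_hetero = np.array([[1.0, 0.3],
                         [0.3, 4.0]])

print(is_homoscedastic(sigma_homo))    # True
print(is_homoscedastic(sigma_hetero))  # False
```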
Volatility
Given a financial time series {X(t)} (in general, the log-returns
of a financial instrument) the volatility of the financial instrument
at time t is the standard deviation of the random variable X(t)
conditional on the recorded values
F(t − 1) = (X(t − 1), X(t − 2), . . . ).
More precisely, denoting the volatility by σ(t),
σ(t)² = var(X(t) | F(t − 1)).
The variance of a random variable X conditional on the values F
can be computed as
var(X | F) = E[(X − E(X | F))² | F]
= E[X² | F] − (E(X | F))².
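The two expressions for the conditional variance can be checked numerically. In the sketch below (a hypothetical setup: the conditioning information F is modeled as a group label g, and X has a different variance in each group), both formulas are evaluated on a simulated sample:

```python
import numpy as np

rng = np.random.default_rng(0)

# Conditioning information F modeled as a group label g;
# X is heteroscedastic: variance 1 in group 0, variance 4 in group 1.
g = rng.integers(0, 2, size=100_000)
x = np.where(g == 0,
             rng.normal(0.0, 1.0, size=g.size),
             rng.normal(0.0, 2.0, size=g.size))

for group in (0, 1):
    xs = x[g == group]
    m = xs.mean()                        # E(X | F)
    lhs = np.mean((xs - m) ** 2)         # E[(X - E(X|F))^2 | F]
    rhs = np.mean(xs ** 2) - m ** 2      # E[X^2 | F] - (E(X|F))^2
    print(group, lhs, rhs)               # the two expressions agree
```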
Remember from Lecture 6 one of the 6 stylized facts:
(4) Volatility appears to vary over time.
Defining the volatility as the conditional standard deviation of the returns given the past information, it is observed that if recent returns have been large (in absolute value), the coming returns are also expected to be large.
Example Assume that our returns follow the random walk hypothesis, with gaussian returns. In other terms,
X(t) = µ + σε(t), where {ε(t)} is GWN with unit variance.
This gives a price process of the form
S(t) = S(0) exp{tµ + σ(ε(1) + · · · + ε(t))},
i.e. a discretization of the Black–Scholes model.
The expected return at time t, conditional on the past information F(t − 1), is
E(X(t) | F(t − 1)) = E(µ + σε(t) | F(t − 1)) = µ,
as ε(t) is independent of F(t − 1) and centered.
The conditional variance is
var(X(t) | F(t − 1)) = E[X(t)² | F(t − 1)] − E[X(t) | F(t − 1)]²
= E[(µ + σε(t))² | F(t − 1)] − µ² = σ².
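The random walk example can be simulated directly. The sketch below (with made-up parameter values for µ, σ and S(0)) builds the discretized Black–Scholes price path and checks that its log-increments recover the returns:

```python
import numpy as np

rng = np.random.default_rng(1)

# Made-up parameter values for the sketch (daily scale).
mu, sigma, s0, n = 0.0005, 0.01, 100.0, 250

eps = rng.standard_normal(n)       # GWN with unit variance
x = mu + sigma * eps               # log-returns X(t) = mu + sigma * eps(t)
s = s0 * np.exp(np.cumsum(x))      # S(t) = S(0) exp{t*mu + sigma*(eps(1)+...+eps(t))}

# The increments of the log-price recover the returns exactly.
log_s = np.log(np.concatenate(([s0], s)))
print(s[-1])                       # final price of the simulated path
```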
Example Assume now that the returns follow an AR(1) model,
X(t) = µ + φX(t − 1) + ε(t),
where {ε(t)} is a GWN with variance σ². Let us compute the volatility, beginning with the conditional expectation:
E(X(t) | F(t − 1)) = E[µ + φX(t − 1) + ε(t) | F(t − 1)]
= µ + E[φX(t − 1) | F(t − 1)] + E[ε(t) | F(t − 1)]
= µ + φX(t − 1).
So
X(t) − E(X(t) | F(t − 1)) = ε(t),
which is independent of F(t − 1), so the volatility satisfies
σ(t)² = var(X(t) − E(X(t) | F(t − 1)) | F(t − 1)) = var ε(t) = σ².
12b. Auto Regressive Conditionally Heteroscedastic
(ARCH) Model
In order to reproduce the observed empirical characteristics of financial time series, in 1982 Robert Engle¹ introduced the ARCH model².
The ARCH model assumes that {X(t)} is a stationary process
satisfying:
X(t) = σ(t)ε(t),    σ(t)² = ω + αX(t − 1)²,
where {ε(t)} is a strict white noise with unit variance, and the
parameters satisfy
ω > 0, 0 ≤ α < 1.
Note that for α = 0 we obtain a strict white noise model.
¹ 2003 Nobel Laureate in Economics.
² Robert F. Engle, “Autoregressive Conditional Heteroscedasticity with Estimates of the Variance of United Kingdom Inflation”, Econometrica 50:987–1008, 1982.
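A minimal simulation sketch of the ARCH(1) recursion (the parameter values ω = 0.2, α = 0.5 are chosen arbitrarily for illustration, satisfying ω > 0 and 0 ≤ α < 1):

```python
import numpy as np

rng = np.random.default_rng(2)

# Illustrative parameter values satisfying omega > 0, 0 <= alpha < 1.
omega, alpha, n = 0.2, 0.5, 500_000

eps = rng.standard_normal(n)     # strict white noise with unit variance
x = np.empty(n)
x_prev = 0.0
for t in range(n):
    sigma_t = np.sqrt(omega + alpha * x_prev ** 2)  # sigma(t)^2 = omega + alpha * X(t-1)^2
    x[t] = sigma_t * eps[t]                         # X(t) = sigma(t) * eps(t)
    x_prev = x[t]

print(x.mean(), x.std())   # sample mean and volatility of the simulated path
```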
12c. Properties of the ARCH model
Let us examine the statistical properties of this model.
Conditional Mean:
E(X(t) | F(t − 1)) = E(σ(t)ε(t) | F(t − 1))
= σ(t) E(ε(t) | F(t − 1)) = σ(t) E ε(t) = 0,
since σ(t) is a function of X(t − 1), hence known given F(t − 1).
(Unconditional) Mean
E X(t) = E[E(X(t) | F(t − 1))] = 0.
This means that the sequence {X(t)} forms a martingale difference. From this it follows that
E X(t)X(t − 1) = 0,
i.e. the values are uncorrelated (but not independent!).
Conditional Variance and Volatility
var(X(t) | F(t − 1)) = E(X(t)² | F(t − 1))
= E(σ(t)² ε(t)² | F(t − 1)) = σ(t)² E ε(t)² = σ(t)².
(Unconditional) Variance
var X(t) = E[var(X(t) | F(t − 1))]
= E[ω + αX(t − 1)2]
= ω + α var X(t − 1).
As the process is stationary,
var X(t) = var X(t − 1) = ω / (1 − α).
Skewness and Kurtosis
For m = 3, 4 we obtain
E(X(t)^m | F(t − 1)) = σ(t)^m E ε^m(t).
In particular, if the white noise has normal distributions, after
some computations, we obtain
• γ_X(t) = 0 (we have no skewness).
• κ_X(t) = 6α² / (1 − 3α²), if α < 1/√3.
• κ_X(t) = ∞, if α ≥ 1/√3.
• We always have positive excess kurtosis (i.e. X(t) is leptokurtic).
If the white noise has a different distribution, for instance a Student t distribution, X(t) inherits its heavier tails and, if the noise is asymmetric, a non-vanishing skewness. The excess kurtosis remains positive.
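The leptokurtosis can be observed in a simulation (a sketch with arbitrary parameter values ω = 1 and α = 0.3, which satisfies α < 1/√3, so the kurtosis formula above applies):

```python
import numpy as np

rng = np.random.default_rng(3)

omega, alpha, n = 1.0, 0.3, 500_000   # hypothetical values, alpha < 1/sqrt(3)

eps = rng.standard_normal(n)          # gaussian strict white noise
x = np.empty(n)
x_prev = 0.0
for t in range(n):
    x[t] = np.sqrt(omega + alpha * x_prev ** 2) * eps[t]
    x_prev = x[t]

# Sample excess kurtosis: E[X^4] / (E[X^2])^2 - 3.
excess_kurtosis = np.mean(x ** 4) / np.mean(x ** 2) ** 2 - 3.0
theoretical = 6 * alpha ** 2 / (1 - 3 * alpha ** 2)
print(excess_kurtosis, theoretical)   # both positive: X(t) is leptokurtic
```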
Correlation of Squares
It can be computed that
var[X(t)²] = (2 / (1 − 3α²)) (ω / (1 − α))².
Furthermore,
E[X(t)² X(t − 1)²] = ((1 + 3α) / (1 − 3α²)) · ω² / (1 − α).
These computations allow us to compute the correlation between the squared values, that is
ρ(X(t)², X(t − 1)²) = α.
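A simulation sketch (again with arbitrary values ω = 1, α = 0.3) showing uncorrelated levels but correlated squares:

```python
import numpy as np

rng = np.random.default_rng(4)

omega, alpha, n = 1.0, 0.3, 500_000   # hypothetical parameter values

eps = rng.standard_normal(n)
x = np.empty(n)
x_prev = 0.0
for t in range(n):
    x[t] = np.sqrt(omega + alpha * x_prev ** 2) * eps[t]
    x_prev = x[t]

corr_levels = np.corrcoef(x[1:], x[:-1])[0, 1]
corr_squares = np.corrcoef(x[1:] ** 2, x[:-1] ** 2)[0, 1]

print(corr_levels)    # close to 0: the values are uncorrelated
print(corr_squares)   # close to alpha: the squares are not
```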
12d. Maximum Likelihood Parameter Estimation
Assume that we have historical data of a certain financial instrument,
X(0), X(1), . . . , X(n),
and we want to fit a certain model depending on a (possibly vectorial) parameter θ.
Maximum Likelihood (ML) estimation is one method to perform the estimation of the parameter θ. The ML estimator θ̄ is the one that maximizes the density function of our model when we plug in our empirical data in place of the variables of the density function.
From the general theory of statistics, it is known that (under mild
assumptions) the ML estimator θ̄ has two important properties:
• Estimators are consistent:
θ̄ → θ, as n → ∞.
This means that, if the sample is large enough (and the model is true), our estimates are close to the true values of the parameters.
• Estimators are asymptotically normal:
√n (θ̄ − θ) ∼ N(0, σ²),
meaning that, estimating σ (a number if we have one parameter, a matrix if we have more than one), we can construct confidence intervals for our estimators.
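Both properties can be illustrated on the simplest case, a N(µ, 1) model, whose ML estimator is the sample mean (as derived in the example below); the true value µ = 1 and the sample sizes are arbitrary choices for this sketch:

```python
import numpy as np

rng = np.random.default_rng(5)

mu_true = 1.0   # hypothetical true parameter value

# Consistency: the ML estimator (here the sample mean) gets
# closer to the true value as the sample size grows.
errors = {}
for n in (100, 10_000, 1_000_000):
    sample = rng.normal(mu_true, 1.0, size=n)
    errors[n] = abs(sample.mean() - mu_true)
print(errors)

# Asymptotic normality: sqrt(n) * (mu_bar - mu) is roughly
# N(0, 1) across repeated samples (here n = 1000, 2000 repetitions).
z = [np.sqrt(1000) * (rng.normal(mu_true, 1.0, size=1000).mean() - mu_true)
     for _ in range(2000)]
print(np.mean(z), np.std(z))   # roughly 0 and 1
```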
Example To see how ML works, we first examine a simpler
example. Suppose that we observe a sample with 10 independent
values:
x(1) = −0.38, x(2) = 0.11, 2.2, 1.2, −0.33, 1.3,
− 0.38, 2.1, 1.5, x(10) = 1.8,
that we want to model through a N (µ, 1).
Step 1: Compute the joint density of the sample under our model:
f(x1, . . . , x10) = (1/√(2π)) exp(−(1/2)(x1 − µ)²) · · · (1/√(2π)) exp(−(1/2)(x10 − µ)²)
= (1/(2π)⁵) exp(−(1/2) Σ_{t=1}^{10} (xt − µ)²).
Step 2: Plug the observed values in the density, to obtain the likelihood function:
L(µ) = f(−0.38, . . . , 1.8) = (1/(2π)⁵) exp(−(1/2) Σ_{t=1}^{10} (x(t) − µ)²).
Step 3: Take the logarithm to obtain the log-likelihood function:
ℓ(µ) = log L(µ) = −5 log(2π) − (1/2) Σ_{t=1}^{10} (x(t) − µ)².
Step 4: To find the maximum we differentiate with respect to µ:
∂ℓ(µ)/∂µ = Σ_{t=1}^{10} (x(t) − µ) = Σ_{t=1}^{10} x(t) − 10µ,
that vanishes when
µ = (1/10) Σ_{t=1}^{10} x(t)
(that can be checked to produce a maximum).
Step 5: In this way we obtain our estimator
µ̄ = (1/10)(−0.38 + · · · + 1.8) = 0.912.
Remark
The data was taken from a simulated sample of N (1, 1).
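The five steps can be replayed in a few lines of Python (a sketch; the grid search is just a sanity check that the sample mean indeed maximizes the log-likelihood):

```python
import numpy as np

# The ten observed values of the example.
x = np.array([-0.38, 0.11, 2.2, 1.2, -0.33, 1.3, -0.38, 2.1, 1.5, 1.8])

def log_likelihood(mu):
    # log-likelihood of the N(mu, 1) model for the 10 observations
    return -5 * np.log(2 * np.pi) - 0.5 * np.sum((x - mu) ** 2)

# Closed-form ML estimator (Step 5): the sample mean (~0.91).
mu_ml = x.mean()

# Sanity check: no grid point beats the closed-form maximizer.
grid = np.linspace(-2, 3, 5001)
values = np.array([log_likelihood(m) for m in grid])
print(mu_ml)
```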