STA457: Time Series Analysis
Lecture 8
Lijia Wang
Department of Statistical Sciences
University of Toronto
Overview
Last Time:
1 Autoregressive (AR) process
2 Moving average (MA) process
3 Autoregressive moving average (ARMA)
Today:
1 Autoregressive moving average (ARMA)
2 Partial Autocorrelation
3 Forecasting
4 Estimation
Outline
1 Autoregressive Moving Average Models
2 Partial Autocorrelation Function (PACF)
3 Forecasting
4 Estimation
ARMA(p,q)
Definition: A time series {xt; t = 0, ±1, ±2, · · · } is ARMA(p, q) if it is
stationary and
xt = φ1xt−1 + φ2xt−2 + · · · + φpxt−p + wt + θ1wt−1 + θ2wt−2 + · · · + θqwt−q
with φp ≠ 0, θq ≠ 0, and σ²w > 0. The parameters p and q are called the
autoregressive and the moving average orders, respectively.
The ARMA(p, q) model can then be written in concise form as
φ(B)xt = θ(B)wt,
where φ(B) = 1 − φ1B − · · · − φpB^p and θ(B) = 1 + θ1B + · · · + θqB^q are the AR and MA polynomials in the backshift operator B.
Problems encountered - Solutions
To summarize, we have seen the following problems:
(i) parameter-redundant models,
Solution: we require that φ(z) and θ(z) have no common factors.
(ii) stationary AR models that depend on the future, and
Solution: a formal definition of causality for ARMA models.
(iii) MA models that are not unique.
Solution: a formal definition of invertibility, which guarantees an infinite
autoregressive representation.
Example: ARMA(p,q)
Example 5: Consider the following time series model
xt = 0.4xt−1 + 0.45xt−2 + wt + wt−1 + 0.25wt−2 (1)
1 Identify the above model as an ARMA(p, q) model (watch out for parameter
redundancy)
2 Determine whether the model is causal and/or invertible
3 If the model is causal, write the model as a linear process
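One way to work parts 1 and 2 is to factor the AR and MA polynomials read off model (1): φ(z) = 1 − 0.4z − 0.45z² and θ(z) = 1 + z + 0.25z². A minimal numerical check of that factorization, assuming numpy is available:

```python
import numpy as np

# AR polynomial phi(z) = 1 - 0.4 z - 0.45 z^2, coefficients in increasing powers of z
phi = np.polynomial.Polynomial([1, -0.4, -0.45])
# MA polynomial theta(z) = 1 + z + 0.25 z^2
theta = np.polynomial.Polynomial([1, 1, 0.25])

print(phi.roots())    # [-2.0, 1.111...]  -> factors (1 + 0.5z)(1 - 0.9z)
print(theta.roots())  # [-2.0, -2.0]      -> factor (1 + 0.5z)^2

# The common root z = -2, i.e. the common factor (1 + 0.5z), signals parameter
# redundancy. After cancelling it: (1 - 0.9B) x_t = (1 + 0.5B) w_t, an ARMA(1, 1).
# Causal: the remaining AR root 1/0.9 = 1.11... lies outside the unit circle.
# Invertible: the remaining MA root -2 lies outside the unit circle.
```

For part 3, matching coefficients in ψ(z) = θ(z)/φ(z) = (1 + 0.5z)/(1 − 0.9z) gives ψ0 = 1 and ψj = 1.4(0.9)^{j−1} for j ≥ 1, so xt = wt + 1.4 Σ_{j≥1} 0.9^{j−1} wt−j.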
Outline
1 Autoregressive Moving Average Models
2 Partial Autocorrelation Function (PACF)
3 Forecasting
4 Estimation
The Partial Autocorrelation Function (PACF)
For MA(q) models, the ACF will be zero for lags greater than q.
Moreover, because θq ≠ 0, the ACF will not be zero at lag q. We can
use this property to identify MA models.
If the process, however, is ARMA or AR, the ACF alone tells us little
about the orders of dependence.
The partial autocorrelation function (PACF) can be used to identify
AR models.
The Partial Autocorrelation Function (PACF)
Definition: Suppose that X, Y, and Z are random variables. We
regress X on Z to obtain X̂, and
regress Y on Z to obtain Ŷ.
The partial correlation between X and Y given Z is then obtained by calculating
ρXY|Z = corr(X − X̂, Y − Ŷ).
ρXY|Z measures the correlation between X and Y after removing the linear
effect of Z.
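A minimal sketch of this definition in code, assuming numpy; the data-generating scheme below is invented for illustration, with X and Y correlated only through Z.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
z = rng.normal(size=n)
x = 2.0 * z + rng.normal(size=n)  # X depends on Z
y = 2.0 * z + rng.normal(size=n)  # Y depends on Z, not directly on X

def residuals(target, covariate):
    """Residuals from the least-squares regression of target on covariate."""
    design = np.column_stack([np.ones_like(covariate), covariate])
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    return target - design @ coef

# rho_{XY|Z}: correlate X and Y after removing the linear effect of Z
print(np.corrcoef(residuals(x, z), residuals(y, z))[0, 1])  # ~ 0
print(np.corrcoef(x, y)[0, 1])  # ~ 0.8, induced entirely by Z
```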
PACF Motivation
Consider a causal AR(1) model with |φ| < 1:
xt = φxt−1 + wt.
We have
γx(2) = cov(xt, xt−2)
= cov(φxt−1 + wt, xt−2) = cov(φ(φxt−2 + wt−1) + wt, xt−2)
= cov(φ²xt−2 + φwt−1 + wt, xt−2)
= φ²γx(0).
So corr(xt, xt−2) ≠ 0, because xt depends on xt−2 through xt−1.
To remove the effect of xt−1, consider
cov(xt − φxt−1, xt−2 − φxt−1) = cov(wt, xt−2 − φxt−1) = 0.
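A quick simulation check of these two displays, assuming numpy; the start-up transient from x0 is negligible at this series length.

```python
import numpy as np

rng = np.random.default_rng(1)
phi, n = 0.9, 200_000
w = rng.normal(size=n)
x = np.empty(n)
x[0] = w[0]
for t in range(1, n):  # causal AR(1) recursion x_t = phi x_{t-1} + w_t
    x[t] = phi * x[t - 1] + w[t]

xt, xt1, xt2 = x[2:], x[1:-1], x[:-2]
# Lag-2 autocorrelation is phi^2 = 0.81, not 0 ...
print(np.corrcoef(xt, xt2)[0, 1])
# ... but the covariance vanishes once the linear effect of x_{t-1} is removed
print(np.cov(xt - phi * xt1, xt2 - phi * xt1)[0, 1])  # ~ 0
```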
The Partial Autocorrelation Function (PACF)
Consider the mean-zero stationary time series xt .
For h ≥ 2, let x̂t+h denote the regression of xt+h on
{xt+h−1, xt+h−2, · · · , xt+1}. We can write
x̂t+h = β1xt+h−1 + β2xt+h−2 + · · · + βh−1xt+1.
No intercept term is needed because the mean of xt is zero.
Similarly, let x̂t denote the regression of xt on {xt+1, xt+2, · · · , xt+h−1}. Then
x̂t = β1xt+1 + β2xt+2 + · · · + βh−1xt+h−1.
Because of stationarity, the coefficients β1, β2, · · · , βh−1 are the
same in both regressions.
The partial autocorrelation function (PACF) of a stationary process
xt, denoted φhh for h = 1, 2, · · · , is
φ11 = corr(xt+1, xt) = ρ(1)
and, for h ≥ 2,
φhh = corr(xt+h − x̂t+h, xt − x̂t).
Partial autocorrelation function (PACF)
Definition: The PACF, φhh, is the correlation between xt+h and xt with
the linear dependence of xt+1, xt+2, · · · , xt+h−1 on each removed.
Example 6: Compute the PACF of the AR(1) process given by
xt = φxt−1 + wt, with |φ| < 1.
Result: For an AR(p) and h > p,
φhh = corr(xt+h − x̂t+h, xt − x̂t) = corr(wt+h, xt − x̂t) = 0.
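Working Example 6 through this result gives φ11 = ρ(1) = φ and φhh = 0 for all h ≥ 2 (the AR(p) result with p = 1). A quick empirical check, assuming numpy and statsmodels are installed:

```python
import numpy as np
from statsmodels.tsa.stattools import pacf

rng = np.random.default_rng(2)
phi, n = 0.9, 5_000
w = rng.normal(size=n)
x = np.empty(n)
x[0] = w[0]
for t in range(1, n):  # simulate the AR(1) of Example 6
    x[t] = phi * x[t - 1] + w[t]

# Theory: phi_11 = 0.9 and phi_hh ~ 0 for h >= 2; entry 0 is lag 0 (always 1)
print(np.round(pacf(x, nlags=5), 3))
```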
Behavior of the ACF and PACF for ARMA models
        AR(p)                 MA(q)                 ARMA(p, q)
ACF     Tails off             Cuts off after lag q  Tails off
PACF    Cuts off after lag p  Tails off             Tails off
Example 2: AR simulation
Example 2: Generate AR(1) processes with φ = 0.9 and φ = −0.9.
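A sketch of one way to generate these panels, assuming statsmodels and matplotlib; note that arma_generate_sample expects the AR coefficients with a leading 1 and negated signs.

```python
import matplotlib.pyplot as plt
from statsmodels.tsa.arima_process import arma_generate_sample
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf

fig, axes = plt.subplots(2, 3, figsize=(12, 6))
for row, phi in enumerate([0.9, -0.9]):
    # ar = [1, -phi] encodes x_t - phi x_{t-1} = w_t; ma = [1] means no MA part
    x = arma_generate_sample(ar=[1, -phi], ma=[1], nsample=500)
    axes[row, 0].plot(x)
    axes[row, 0].set_title(f"AR(1), phi = {phi}")
    plot_acf(x, ax=axes[row, 1], lags=30)   # tails off (oscillates for phi < 0)
    plot_pacf(x, ax=axes[row, 2], lags=30)  # cuts off after lag 1
plt.tight_layout()
plt.show()
```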
Example 2: AR simulation ACF
Figure: ACF plots of the AR processes
Example 2: AR simulation PACF
Figure: PACF plots of the AR processes
Example 4: MA simulation
Generate MA(1) processes with θ = 0.5 and θ = −0.5.
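The analogous sketch for the MA(1) panels, under the same assumptions (statsmodels and matplotlib); here the ma argument carries [1, θ].

```python
import matplotlib.pyplot as plt
from statsmodels.tsa.arima_process import arma_generate_sample
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf

fig, axes = plt.subplots(2, 2, figsize=(9, 6))
for row, theta in enumerate([0.5, -0.5]):
    # ma = [1, theta] encodes x_t = w_t + theta w_{t-1}; ar = [1] means no AR part
    x = arma_generate_sample(ar=[1], ma=[1, theta], nsample=500)
    plot_acf(x, ax=axes[row, 0], lags=30)   # cuts off after lag 1
    plot_pacf(x, ax=axes[row, 1], lags=30)  # tails off
plt.tight_layout()
plt.show()
```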
Example 4: MA simulation ACF
Figure: ACF plots of the MA processes
Example 4: MA simulation PACF
Figure: PACF plots of the MA processes
Outline
1 Autoregressive Moving Average Models
2 Partial Auto-correlation Function (PACF)
3 Forecasting
4 Estimation
Introduction
The next topic: forecasting using time series models.
In forecasting, the goal is to predict future values of a time series,
xn+m, m = 1, 2, · · · , based on the data collected to the present,
x1:n = {x1, x2, · · · , xn}.
Throughout this section, we will assume xt is stationary and the
model parameters are known.
Forecasting
The minimum mean square error predictor of xn+m is
x^n_{n+m} = E(xn+m | x1:n),
because the conditional expectation minimizes the mean square error
E[xn+m − g(x1:n)]²,
where g(x1:n) is any function of the observations x1:n.
We will restrict attention to predictors that are linear functions of the
data, that is, predictors of the form
x^n_{n+m} = α0 + Σ_{k=1}^{n} αk xk,
where α0, α1, · · · , αn are real numbers.
Forecasting
We note that the α's depend on n and m, but for now we drop the
dependence from the notation.
For example, if n = m = 1, then x^1_2 is the one-step-ahead linear
forecast of x2 given x1. That is, x^1_2 = α0 + α1x1.
But if n = 2, x^2_3 is the one-step-ahead linear forecast of x3 given x1
and x2. That is, x^2_3 = α0 + α1x1 + α2x2.
Definition: Linear predictors of this form that minimize the mean
square prediction error are called best linear predictors (BLPs).
Best Linear Prediction for Stationary Processes
Given data x1, x2, · · · , xn, the best linear predictor
x^n_{n+m} = α0 + Σ_{k=1}^{n} αk xk,
for m ≥ 1, is found by solving
E[(xn+m − x^n_{n+m}) xk] = 0, k = 0, 1, · · · , n,
where x0 = 1, for α0, α1, · · · , αn. These equations are called the
prediction equations.
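For a mean-zero series, the k = 0 equation gives α0 = 0, and for m = 1 the remaining equations form the Toeplitz system Γn α = γn, where Γn = {γ(k − j)} and γn = (γ(1), · · · , γ(n))′. A sketch solving this system for the AR(1) autocovariance γ(h) = σ²w φ^|h| / (1 − φ²), assuming numpy and scipy; the BLP weights should collapse onto the most recent observation.

```python
import numpy as np
from scipy.linalg import solve_toeplitz

phi, sigma2_w, n = 0.9, 1.0, 5
# AR(1) autocovariance: gamma(h) = sigma_w^2 phi^|h| / (1 - phi^2), h = 0, ..., n
gamma = sigma2_w * phi ** np.arange(n + 1) / (1 - phi**2)

# One-step prediction equations with weights ordered (x_n, x_{n-1}, ..., x_1):
#   Gamma_n a = (gamma(1), ..., gamma(n))',  Gamma_n Toeplitz in gamma(0..n-1)
a = solve_toeplitz(gamma[:n], gamma[1 : n + 1])
print(np.round(a, 10))  # [0.9, 0, 0, 0, 0]: the BLP is x^n_{n+1} = phi * x_n
```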
Example: Best Linear Prediction for Stationary Processes
Example 1: Consider the AR(1) series
xt = φxt−1 + wt,
where wt is white noise with variance σ²w and the model parameters are
known. Suppose that we have observed x1 and x2, and we would like to
predict x3. Find the best linear predictor of x3.
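One way to carry this out: write x^2_3 = α0 + α1x1 + α2x2. Since E(xt) = 0, the k = 0 prediction equation gives α0 = 0. The remaining prediction equations are
E[(x3 − α1x1 − α2x2) x1] = 0 and E[(x3 − α1x1 − α2x2) x2] = 0,
that is,
γ(2) = α1γ(0) + α2γ(1) and γ(1) = α1γ(1) + α2γ(0).
Substituting γ(h) = γ(0)φ^h and solving yields α1 = 0 and α2 = φ, so x^2_3 = φx2: under the causal AR(1), the most recent observation carries all the linear information about the future.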
Prediction Error
Definition: The mean square m-step-ahead prediction error is
P^n_{n+m} = E[(xn+m − x^n_{n+m})²].
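For instance, continuing Example 1: the one-step error of the AR(1) predictor x^2_3 = φx2 is P^2_3 = E(x3 − φx2)² = E(w3²) = σ²w, the innovation variance.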
Prediction Interval
To assess the precision of the forecasts, prediction intervals are typically
calculated along with the forecasts.
In general, 100(1 − α)% prediction intervals are of the form
x^n_{n+m} ± c_{α/2} √(P^n_{n+m}),
where c_{α/2} is chosen to get the desired degree of confidence.
For example, if the process is Gaussian, then choosing c_{α/2} = 2 will yield
an approximate 95% prediction interval for xn+m.
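For instance, for the Gaussian AR(1) one-step forecast above, P^2_3 = σ²w, so an approximate 95% interval is φx2 ± 2σw; with φ = 0.9, x2 = 1, and σ²w = 1 this is 0.9 ± 2, i.e., (−1.1, 2.9).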