Update 14-ch14.Rmd
Ocalak authored Jan 17, 2024
1 parent 2a2da29 commit 7701ab7
Showing 1 changed file with 30 additions and 30 deletions.
library(readxl)
library(urca)
```

Time series data is collected for a single entity over time. This is fundamentally different from cross-section data, which is data on multiple entities at the same point in time. Time series data allows estimation of the effect on $Y$ of a change in $X$ *over time*. This is what econometricians call a *dynamic causal effect*. Let us go back to the application to cigarette consumption of Chapter \@ref(ivr), where we were interested in estimating the effect on cigarette demand of a price increase caused by an increase in the general sales tax. One might use time series data to assess the causal effect of a tax increase on smoking both initially and in subsequent periods.

Another application of time series data is forecasting. For example, weather services use time series data to predict tomorrow's temperature by, inter alia, using today's temperature and temperatures of the past. To motivate an economic example, central banks are interested in forecasting next month's unemployment rates.

Most empirical applications in this chapter are concerned with forecasting and u

The following packages and their dependencies are needed for reproduction of the code chunks presented throughout this chapter:

+ `r ttcode("AER")` [@R-AER]
+ `r ttcode("dynlm")` [@R-dynlm]
+ `r ttcode("forecast")` [@R-forecast]
+ `r ttcode("readxl")` [@R-readxl]
+ `r ttcode("stargazer")` [@R-stargazer]
+ `r ttcode("scales")` [@R-scales]
+ `r ttcode("quantmod")` [@R-quantmod]
+ `r ttcode("urca")` [@R-urca]

Please verify that the following code chunk runs on your machine without any errors.
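The chunk itself is collapsed in this diff; it presumably just attaches the packages listed above, along the lines of the following sketch:

```r
# attach the packages used throughout the chapter
library(AER)
library(dynlm)
library(forecast)
library(quantmod)
library(readxl)
library(scales)
library(stargazer)
library(urca)
```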

plot(as.zoo(GDPGrowth),

### Notation, Lags, Differences, Logarithms and Growth Rates {-}

For observations of a variable $Y$ recorded over time, $Y_t$ denotes the value observed at time $t$. The period between two sequential observations $Y_t$ and $Y_{t-1}$ is a unit of time: hours, days, weeks, months, quarters, years, etc. Key Concept 14.1 introduces the essential terminology and notation for time series data we use in the subsequent sections.
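As a minimal base-R illustration of the lag and difference notation in Key Concept 14.1 (the numbers are invented):

```r
# a hypothetical quarterly series Y_t
y <- c(100, 102, 105, 103, 108)

# the first lag Y_{t-1}: the series shifted by one period
y_lag1 <- c(NA, y[-length(y)])

# the first difference Delta Y_t = Y_t - Y_{t-1}
diff(y)        # 2  3 -2  5

# the difference of the logarithm, Delta log(Y_t)
diff(log(y))   # approximately the period-to-period growth rate divided by 100
```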

```{r, eval = my_output == "html", results='asis', echo=F, purl=F}
cat('
quants <- function(series) {
}
```

The annual growth rate is computed using the approximation $$\text{Annual Growth}\ Y_t = 400 \cdot\Delta\log(Y_t),$$ since $100\cdot\Delta\log(Y_t)$ is an approximation of the quarterly percentage change; see Key Concept 14.1.
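To get a feel for the quality of this approximation, compare it with the exact annualized growth rate for an invented quarterly change of $1.5\%$:

```r
y <- c(100, 101.5)  # hypothetical GDP in two consecutive quarters

# approximation from Key Concept 14.1
400 * diff(log(y))            # about 5.96

# exact annualized growth rate in percent
100 * ((y[2] / y[1])^4 - 1)   # about 6.14
```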

We call `r ttcode("quants()")` on observations for the period 2011:Q3 - 2013:Q1.

Using `r ttcode("acf()")` it is straightforward to compute the first four sample autocorrelations:

```{r}
acf(na.omit(GDPGrowth), lag.max = 4, plot = F)
```

This is evidence that there is mild positive autocorrelation in the growth of GDP: if GDP grows faster than average in one period, there is a tendency for it to grow faster than average in the following periods.
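The first of these sample autocorrelations can be reproduced by hand from the defining formula; a sketch on an invented series:

```r
y <- c(1.2, 0.8, 1.5, 2.1, 1.9, 0.7, 1.1, 1.6)  # hypothetical growth rates
n <- length(y)
y_bar <- mean(y)

# first sample autocorrelation: the sample autocovariance at lag 1
# divided by the sample variance (both with divisor n, as in acf())
rho_1 <- sum((y[-1] - y_bar) * (y[-n] - y_bar)) / sum((y - y_bar)^2)

all.equal(rho_1, as.numeric(acf(y, lag.max = 1, plot = FALSE)$acf[2]))  # TRUE
```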

#### Other Examples of Economic Time Series {-}

Autoregressive models are heavily used in economic forecasting. An autoregressive model relates a time series variable to its past values.

#### The First-Order Autoregressive Model {-}

It is intuitive that the immediate past of a variable should have power to predict its near future. The simplest autoregressive model uses only the most recent outcome of the time series to predict future values. For a time series $Y_t$, such a model is called a first-order autoregressive model, often abbreviated AR(1), where the 1 indicates that the order of autoregression is one:
\begin{align*}
Y_t = \beta_0 + \beta_1 Y_{t-1} + u_t
\end{align*}
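A short simulation illustrates the model; the parameter values $\beta_0 = 1$ and $\beta_1 = 0.5$ are invented, and plain `lm()` on shifted vectors is used here instead of the `dynlm()` approach as a minimal sketch:

```r
set.seed(42)

# simulate an AR(1) process with beta_0 = 1, beta_1 = 0.5
n <- 200
u <- rnorm(n)
Y <- numeric(n)
Y[1] <- 2                  # arbitrary starting value
for (t in 2:n) {
  Y[t] <- 1 + 0.5 * Y[t - 1] + u[t]
}

# OLS regression of Y_t on Y_{t-1}
coef(lm(Y[-1] ~ Y[-n]))    # estimates should be close to (1, 0.5)
```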
cat('
<div class = "keyconcept" id="KC14.5">
<h3 class = "right"> Key Concept 14.5 </h3>
<h3 class = "left"> Stationarity </h3>
A time series $Y_t$ is stationary if its probability distribution is time independent, that is, the joint distribution of $Y_{s+1}, Y_{s+2},\\dots,Y_{s+T}$ does not change as $s$ is varied, regardless of $T$.
Similarly, two time series $X_t$ and $Y_t$ are *jointly stationary* if the joint distribution of $(X_{s+1},Y_{s+1}, X_{s+2},Y_{s+2} \\dots, X_{s+T},Y_{s+T})$ does not depend on $s$, regardless of $T$.
Stationarity makes it easier to learn about the characteristics of past data.

```{r, eval = my_output == "latex", results='asis', echo=F, purl=F}
cat('\\begin{keyconcepts}[Stationarity]{14.5}
A time series $Y_t$ is stationary if its probability distribution is time independent, that is, the joint distribution of $Y_{s+1}, Y_{s+2},\\dots,Y_{s+T}$ does not change as $s$ is varied, regardless of $T$.\\newline
Similarly, two time series $X_t$ and $Y_t$ are \\textit{jointly stationary} if the joint distribution of $(X_{s+1},Y_{s+1}, X_{s+2},Y_{s+2} \\dots, X_{s+T}, Y_{s+T})$ does not depend on $s$, regardless of $T$.\\newline
The general time series regression model extends the ADL model such that multiple regressors and their lags are included:
&+ \\delta_{11} X_{1,t-1} + \\delta_{12} X_{1,t-2} + \\dots + \\delta_{1q} X_{1,t-q} \\\\
&+ \\dots \\\\
&+ \\delta_{k1} X_{k,t-1} + \\delta_{k2} X_{k,t-2} + \\dots + \\delta_{kq} X_{k,t-q} \\\\
&+ u_t.
\\end{aligned}
\\end{equation}
For estimation we make the following assumptions:
1. The error term $u_t$ has conditional mean zero given all regressors and their lags: $$E(u_t\\vert Y_{t-1}, Y_{t-2}, \\dots, X_{1,t-1}, X_{1,t-2}, \\dots, X_{k,t-1}, X_{k,t-2}, \\dots) = 0.$$ This assumption is an extension of the conditional mean zero assumption used for AR and ADL models and guarantees that the general time series regression model stated above gives the best forecast of $Y_t$ given its lags, the additional regressors $X_{1,t},\\dots,X_{k,t}$ and their lags.
2. The i.i.d. assumption for cross-sectional data is not (entirely) meaningful for time series data. We replace it by the following assumption which consists of two parts:
+& \\, \\delta_{11} X_{1,t-1} + \\delta_{12} X_{1,t-2} + \\dots + \\delta_{1q} X_{1,t-q} \\\\
+& \\, \\dots \\\\
+& \\, \\delta_{k1} X_{k,t-1} + \\delta_{k2} X_{k,t-2} + \\dots + \\delta_{kq} X_{k,t-q} \\\\
+& \\, u_t.
\\end{aligned}
\\end{equation}
For estimation we make the following assumptions:\\newline
\\begin{enumerate}
\\item The error term $u_t$ has conditional mean zero given all regressors and their lags: $$E(u_t\\vert Y_{t-1}, Y_{t-2}, \\dots, X_{1,t-1}, X_{1,t-2}, \\dots, X_{k,t-1}, X_{k,t-2}, \\dots) = 0.$$ This assumption is an extension of the conditional mean zero assumption used for AR and ADL models and guarantees that the general time series regression model stated above gives the best forecast of $Y_t$ given its lags, the additional regressors $X_{1,t},\\dots,X_{k,t}$ and their lags.
\\item The i.i.d. assumption for cross-sectional data is not (entirely) meaningful for time series data. We replace it by the following assumption which consists of two parts:\\newline
\\begin{itemize}
\\item[(a)] The $(Y_{t}, X_{1,t}, \\dots, X_{k,t})$ have a stationary distribution (the "identically distributed" part of the i.i.d. assumption for cross-sectional data). If this does not hold, forecasts may be biased and inference can be strongly misleading.
We have already performed a Granger causality test on the coefficients of term s

In general, it is good practice to report a measure of uncertainty when presenting results that are affected by it. Uncertainty is of particular interest when forecasting a time series. For example, consider a simple ADL$(1,1)$ model
\begin{align*}
Y_t = \beta_0 + \beta_1 Y_{t-1} + \delta_1 X_{t-1} + u_t,
\end{align*}
where $u_t$ is a homoskedastic error term. The forecast error is
\begin{align*}
We choose `r ttcode("level = seq(5, 99, 10)")` in the call of `r ttcode("forecast()")`

The dashed red line shows point forecasts of the series for the next 25 periods based on an $AR(2)$ model and the shaded areas represent the prediction intervals. The degree of shading indicates the level of the prediction interval. The darkest of the blue bands displays the $5\%$ forecast intervals and the color fades towards grey as the level of the intervals increases.
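A self-contained sketch of how such a fan chart can be produced with the `r ttcode("forecast")` package (artificial data; the book's actual chunk may differ):

```r
library(forecast)

set.seed(1)
y <- arima.sim(model = list(ar = c(0.5, 0.2)), n = 150)  # artificial AR(2) series

# fit an AR(2) and forecast 25 periods ahead at many interval levels
fit <- Arima(y, order = c(2, 0, 0))
fc  <- forecast(fit, h = 25, level = seq(5, 99, 10))

plot(fc)  # point forecasts plus shaded prediction intervals
```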

## Lag Length Selection Using Information Criteria {#llsuic}

The selection of lag lengths in AR and ADL models can sometimes be guided by economic theory. However, there are statistical methods that help to determine how many lags should be included as regressors. In general, too many lags inflate the standard errors of coefficient estimates and thus imply an increase in the forecast error, while omitting lags that should be included in the model may result in an estimation bias.

The order of an AR model can be determined using two approaches:

+ The *Bayes information criterion* (BIC):

$$BIC(p) = \log\left(\frac{SSR(p)}{T}\right) + (p + 1) \frac{\log(T)}{T}.$$

+ The *Akaike information criterion* (AIC):

$$AIC(p) = \log\left(\frac{SSR(p)}{T}\right) + (p + 1) \frac{2}{T}.$$

Both criteria are estimators of the optimal lag length $p$. The lag order $\widehat{p}$ that minimizes the respective criterion is called the *BIC estimate* or the *AIC estimate* of the optimal model order. The basic idea of both criteria is that the $SSR$ decreases as additional lags are added to the model, so that the first term decreases as the lag order grows, whereas the second term increases. One can show that the $BIC$ is a consistent estimator of the true lag order while the $AIC$ is not, which is due to the differing factors in the second addend. Nevertheless, both estimators are used in practice, where the $AIC$ is sometimes used as an alternative when the $BIC$ yields a model with "too few" lags.
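Both formulas are straightforward to compute by hand. A sketch for AR($p$) models fitted by OLS to an artificial series (note that, for simplicity, the effective sample size here shrinks slightly with $p$, whereas one would usually hold the estimation sample fixed when comparing models):

```r
set.seed(123)
y <- as.numeric(arima.sim(model = list(ar = 0.6), n = 200))  # artificial AR(1) data

ic <- t(sapply(1:6, function(p) {
  X   <- embed(y, p + 1)      # column 1 is Y_t, columns 2..(p+1) its lags
  T   <- nrow(X)              # effective sample size
  ssr <- sum(residuals(lm(X[, 1] ~ X[, -1]))^2)
  c(p   = p,
    BIC = log(ssr / T) + (p + 1) * log(T) / T,
    AIC = log(ssr / T) + (p + 1) * 2 / T)
}))

ic[which.min(ic[, "BIC"]), "p"]  # the BIC estimate of the lag order
```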

The $BIC$ is in favor of the ADL($2$,$2$) model \@ref(eq:gdpgradl22) we have est

If a series is nonstationary, conventional hypothesis tests, confidence intervals and forecasts can be strongly misleading. The assumption of stationarity is violated if a series exhibits trends or breaks and the resulting complications in an econometric analysis depend on the specific type of the nonstationarity. This section focuses on time series that exhibit trends.

A series is said to exhibit a trend if it has a persistent long-term movement. One distinguishes between *deterministic* and *stochastic* trends:

+ A trend is *deterministic* if it is a nonrandom function of time.

A formal test for a stochastic trend has been proposed by @dickey1979 which thus
\end{align*}
The null hypothesis is that the AR($1$) model has a unit root and the alternative hypothesis is that it is stationary. One often rewrites the AR($1$) model by subtracting $Y_{t-1}$ from both sides:
\begin{align}
Y_t = \beta_0 + \beta_1 Y_{t-1} + u_t \ \ \Leftrightarrow \ \ \Delta Y_t = \beta_0 + \delta Y_{t-1} + u_t, (\#eq:dfmod)
\end{align}
where $\delta = \beta_1 - 1$. The testing problem then becomes
\begin{align*}
H_0: \delta = 0 \ \ \ \text{vs.} \ \ \ H_1: \delta < 0,
\end{align*}
which is convenient since the corresponding test statistic is reported by many relevant `r ttcode("R")` functions.^[The $t$-statistic of the Dickey-Fuller test is computed using homoskedasticity-only standard errors since under the null hypothesis, the usual $t$-statistic is robust to conditional heteroskedasticity.]
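With the `r ttcode("urca")` package loaded above, the test can be sketched on artificial data for which the null is true by construction (the series is invented; the `r ttcode("type")` and `r ttcode("lags")` arguments are as in the `r ttcode("urca")` documentation):

```r
library(urca)

set.seed(7)
y <- cumsum(rnorm(250))  # a random walk, so the null of a unit root holds

# Dickey-Fuller regression with drift and no additional lagged differences,
# i.e. the specification Delta Y_t = beta_0 + delta * Y_{t-1} + u_t
summary(ur.df(y, type = "drift", lags = 0))
```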

Under the null, the $t$-statistic corresponding to $H_0: \\delta = 0$ does not h
cat('\\begin{keyconcepts}[The ADF Test for a Unit Root]{14.8}
Consider the regression
\\begin{align}
\\Delta Y_t = \\beta_0 + \\delta Y_{t-1} + \\gamma_1 \\Delta Y_{t-1} + \\gamma_2 \\Delta Y_{t-2} + \\dots + \\gamma_p \\Delta Y_{t-p} + u_t. \\label{eq:ADFreg1}
\\end{align}
The ADF test for a unit autoregressive root tests the hypothesis $H_0: \\delta = 0$ (stochastic trend) against the one-sided alternative $H_1: \\delta < 0$ (stationarity) using the usual OLS $t$-statistic.\\newline
If it is assumed that $Y_t$ is stationary around a deterministic linear time trend, the model is augmented by the regressor $t$:
\\begin{align}
\\Delta Y_t = \\beta_0 + at + \\delta Y_{t-1} + \\gamma_1 \\Delta Y_{t-1} + \\gamma_2 \\Delta Y_{t-2} + \\dots + \\gamma_p \\Delta Y_{t-p} + u_t, \\label{eq:ADFreg2}
\\end{align}
where again $H_0: \\delta = 0$ is tested against $H_1: \\delta < 0$.\\newline
legend("topleft",

The deviations from the standard normal distribution are significant: both Dickey-Fuller distributions are skewed to the left and have a heavier left tail than the standard normal distribution.

#### Does U.S. GDP Have a Unit Root? {-}

As an empirical example, we use the ADF test to assess whether there is a stochastic trend in U.S. GDP using the regression
\begin{align*}
