
DASC6510/DASC4990

Unit 3: Regression models

Erfanul Hoque, PhD


Thompson Rivers University
These notes are closely based on the book: Hyndman, R. J. & Athanasopoulos, G. (2021) Forecasting: Principles and Practice, 3rd edition. https://otexts.com/fpp3/

2
Time series linear model

3
Time series linear model

We discuss regression models. The basic concept is that we forecast the time series of interest y assuming that it has a linear relationship with other time series x.

4
Multiple regression and forecasting

yt = β0 + β1 x1,t + β2 x2,t + · · · + βk xk,t + εt .

• yt is the variable we want to forecast: the "response" variable.
• Each xj,t is numerical and is called a "predictor". They are usually assumed to be known for all past and future times.
• The coefficients β1, . . . , βk measure the effect of each predictor after taking account of the effect of all other predictors in the model. That is, the coefficients measure the marginal effects of the predictor variables.

5
Example: US consumption expenditure

us_change %>%
pivot_longer(c(Consumption, Income), names_to="Series") %>%
autoplot(value) +
labs(y="% change")

[Figure: time plot of the quarterly percentage changes (value) in US Consumption and Income.]
6
Example: US consumption expenditure

us_change %>%
ggplot(aes(x = Income, y = Consumption)) +
labs(y = "Consumption (quarterly % change)",
x = "Income (quarterly % change)") +
geom_point() + geom_smooth(method = "lm", se = FALSE)

[Figure: scatterplot of the quarterly percentage change in Consumption against Income, with a fitted least squares line.]
7
Example: US consumption expenditure

fit_cons <- us_change %>%
  model(lm = TSLM(Consumption ~ Income))
report(fit_cons)

## Series: Consumption
## Model: TSLM
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.582 -0.278 0.019 0.323 1.422
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.5445 0.0540 10.08 < 2e-16 ***
## Income 0.2718 0.0467 5.82 2.4e-08 ***
## ---
## Signif. codes:
## 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.591 on 196 degrees of freedom
## Multiple R-squared: 0.147, Adjusted R-squared: 0.143
## F-statistic: 33.8 on 1 and 196 DF, p-value: 2e-08

8
Example: US consumption expenditure

[Figure: time plots of the quarterly percentage changes in Consumption, Income, Production, Savings and Unemployment.]

9
Example: US consumption expenditure

[Figure: scatterplot matrix of the five series. Correlations with Consumption: Income 0.384, Production 0.529, Savings −0.257, Unemployment −0.527. Income with Production 0.269, Savings 0.720, Unemployment −0.224; Production with Savings −0.059, Unemployment −0.768; Savings with Unemployment 0.106.]

10
Example: US consumption expenditure
fit_consMR <- us_change %>%
model(lm = TSLM(Consumption ~ Income + Production + Unemployment + Savings))
report(fit_consMR)

## Series: Consumption
## Model: TSLM
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.906 -0.158 -0.036 0.136 1.155
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.25311 0.03447 7.34 5.7e-12 ***
## Income 0.74058 0.04012 18.46 < 2e-16 ***
## Production 0.04717 0.02314 2.04 0.043 *
## Unemployment -0.17469 0.09551 -1.83 0.069 .
## Savings -0.05289 0.00292 -18.09 < 2e-16 ***
## ---
## Signif. codes:
## 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.31 on 193 degrees of freedom
## Multiple R-squared: 0.768, Adjusted R-squared: 0.763
## F-statistic: 160 on 4 and 193 DF, p-value: <2e-16
11
Example: US consumption expenditure

[Figure: time plot of the actual data and fitted values from the multiple regression model for the percentage change in US consumption expenditure.]

12
Example: US consumption expenditure

[Figure: scatterplot of fitted (predicted) values against actual values of the percentage change in US consumption expenditure.]

13
Example: US consumption expenditure

fit_consMR %>% gg_tsresiduals()


[Figure: gg_tsresiduals() output for fit_consMR: time plot of the innovation residuals, the residual ACF, and a histogram of the residuals.]

14
Some useful predictors

15
Trend

Linear trend
xt = t

• t = 1, 2, . . . , T
• Strong assumption that trend will continue.

16
Dummy variables

• If a categorical variable takes only two values (e.g., "Yes" or "No"), then an equivalent numerical variable can be constructed taking value 1 if yes and 0 if no. This is called a dummy variable.

17
Dummy variables

• If there are more than two categories, then the variable can be
coded using several dummy variables (one fewer than the total
number of categories).

18
Beware of the dummy variable trap!

• Using one dummy for each category gives too many dummy
variables!
• The regression will then be singular and inestimable.
• Either omit the constant, or omit the dummy for one category.
• The coefficients of the dummies are relative to the omitted
category.

19
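This trap can be seen directly in base R; the following is a small sketch (not from the slides) using a toy quarterly factor and model.matrix().

# With an intercept plus one dummy per quarter, the columns of X are
# linearly dependent, so X'X is singular and cannot be inverted.
q <- factor(rep(1:4, times = 5))                        # toy quarterly factor
X_trap <- cbind(Intercept = 1, model.matrix(~ q - 1))   # intercept + all 4 dummies
qr(X_trap)$rank                                         # rank 4 < 5 columns
X_ok <- model.matrix(~ q)                               # R drops one dummy itself
qr(X_ok)$rank                                           # full rank (4 columns)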
Uses of dummy variables

Seasonal dummies
• For quarterly data: use 3 dummies
• For monthly data: use 11 dummies
• For daily data: use 6 dummies
• What to do with weekly data?
Outliers
• If there is an outlier, you can use a dummy variable
to remove its effect.
Public holidays
• For daily data: if it is a public holiday, dummy=1,
otherwise dummy=0.
20
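As a rough sketch (not from the slides), an outlier or public-holiday dummy can be added as an ordinary 0/1 column before fitting; the data set, column names and holiday_dates vector below are hypothetical.

# Add a 0/1 dummy column, then include it alongside trend() and season() in TSLM.
library(fpp3)
my_data <- my_data %>%
  mutate(holiday = as.integer(Date %in% holiday_dates))  # holiday_dates assumed given
fit <- my_data %>%
  model(TSLM(y ~ trend() + season() + holiday))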
Beer production revisited
[Figure: time plot of Australian quarterly beer production (megalitres).]

Regression model
yt = β0 + β1 t + β2 d2,t + β3 d3,t + β4 d4,t + εt

• di,t = 1 if t is quarter i and 0 otherwise.

21
Beer production revisited
fit_beer <- recent_production %>% model(TSLM(Beer ~ trend() + season()))
report(fit_beer)

## Series: Beer
## Model: TSLM
##
## Residuals:
## Min 1Q Median 3Q Max
## -42.9 -7.6 -0.5 8.0 21.8
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 441.8004 3.7335 118.33 < 2e-16 ***
## trend() -0.3403 0.0666 -5.11 2.7e-06 ***
## season()year2 -34.6597 3.9683 -8.73 9.1e-13 ***
## season()year3 -17.8216 4.0225 -4.43 3.4e-05 ***
## season()year4 72.7964 4.0230 18.09 < 2e-16 ***
## ---
## Signif. codes:
## 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 12.2 on 69 degrees of freedom
## Multiple R-squared: 0.924, Adjusted R-squared: 0.92
## F-statistic: 211 on 4 and 69 DF, p-value: <2e-16
22
Beer production revisited

augment(fit_beer) %>%
ggplot(aes(x = Quarter)) +
geom_line(aes(y = Beer, colour = "Data")) +
geom_line(aes(y = .fitted, colour = "Fitted")) +
labs(y="Megalitres",title ="Australian quarterly beer production") +
scale_colour_manual(values = c(Data = "black", Fitted = "#D55E00"))

[Figure: time plot of Australian quarterly beer production, showing the actual data and the fitted values from the regression model.]
23
Beer production revisited

[Figure: actual quarterly beer production plotted against fitted values, with points coloured by quarter.]

24
Beer production revisited

fit_beer %>% gg_tsresiduals()


[Figure: gg_tsresiduals() output for fit_beer: time plot of the innovation residuals, the residual ACF, and a histogram of the residuals.]

25
Beer production revisited

fit_beer %>% forecast() %>% autoplot(recent_production)

[Figure: forecasts of quarterly beer production from the regression model, with 80% and 95% prediction intervals.]

26
Intervention variables

Spikes

• Equivalent to a dummy variable for handling an outlier.

Steps

• Variable takes value 0 before the intervention and 1 afterwards.

Change of slope

• Variable takes value 0 before the intervention and values {1, 2, 3, . . . } afterwards.

27
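A minimal sketch (not from the slides) of the three predictor types, assuming a hypothetical intervention at time t0:

# Spike, step, and slope-change predictors for an intervention at t0 (assumed).
t  <- 1:100
t0 <- 60
spike <- as.integer(t == t0)     # 1 only at the intervention time
step  <- as.integer(t >= t0)     # 0 before, 1 from the intervention onwards
slope <- pmax(0, t - t0 + 1)     # 0 before, then 1, 2, 3, ... afterwards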
Holidays

For monthly data

• Christmas: always in December, so it is part of the monthly seasonal effect.
• Easter: use a dummy variable vt = 1 if any part of Easter is in that month, vt = 0 otherwise.
• Ramadan and Chinese New Year are treated similarly.

28
Trading days

With monthly data, if the observations vary depending on how many of each type of day fall in the month, then trading day predictors can be useful:

z1 = # Mondays in month;
z2 = # Tuesdays in month;
..
.
z7 = # Sundays in month.

29
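A sketch (not from the slides) of how such counts could be computed with lubridate; the trading_days() helper is made up for illustration.

# Count how many of each weekday fall in the month containing `date`.
library(lubridate)
trading_days <- function(date) {
  first <- floor_date(date, "month")
  last  <- ceiling_date(date, "month") - days(1)
  table(wday(seq(first, last, by = "day"), label = TRUE, week_start = 1))
}
trading_days(as.Date("2020-02-01"))   # counts of Mon-Sun for February 2020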
Distributed lags

Lagged values of a predictor.

Example: x is advertising, which has a delayed effect:

x1 = advertising for previous month;
x2 = advertising for two months previously;
...
xm = advertising for m months previously.

30
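A sketch (with hypothetical data and column names) of fitting a distributed-lag regression by pre-computing the lagged columns with dplyr::lag():

# Pre-compute lagged advertising columns, then fit a TSLM on them.
library(fpp3)
fit_lags <- sales_data %>%                    # sales_data is hypothetical
  mutate(adverts_lag1 = lag(Adverts, 1),
         adverts_lag2 = lag(Adverts, 2)) %>%
  model(TSLM(Sales ~ Adverts + adverts_lag1 + adverts_lag2))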
Nonlinear trend

Piecewise linear trend with bend at τ

x1,t = t
x2,t = 0 if t < τ, and (t − τ) if t ≥ τ

Quadratic or higher order trend

x1,t = t, x2,t = t², . . .

NOT RECOMMENDED!
31
Example: Boston marathon winning times

marathon <- boston_marathon %>%
  filter(Event == "Men's open division") %>%
  select(-Event) %>%
  mutate(Minutes = as.numeric(Time)/60)
marathon %>% autoplot(Minutes) +
  labs(y = "Winning times in minutes")

[Figure: Boston marathon winning times (minutes) for the men's open division, by year.]
32
Example: Boston marathon winning times

fit_trends <- marathon %>%
  model(
    # Linear trend
    linear = TSLM(Minutes ~ trend()),
    # Exponential trend
    exponential = TSLM(log(Minutes) ~ trend()),
    # Piecewise linear trend
    piecewise = TSLM(Minutes ~ trend(knots = c(1940, 1980)))
  )

fit_trends

## # A mable: 1 x 3
## linear exponential piecewise
## <model> <model> <model>
## 1 <TSLM> <TSLM> <TSLM>

33
Example: Boston marathon winning times

fit_trends %>% forecast(h=10) %>% autoplot(marathon)

[Figure: forecasts of Boston marathon winning times from the linear, exponential and piecewise models, with 95% prediction intervals.]

34
Example: Boston marathon winning times

fit_trends %>% select(piecewise) %>% gg_tsresiduals()

[Figure: residual diagnostics for the piecewise model: time plot of the innovation residuals, the residual ACF, and a histogram of the residuals.]

35
Residual diagnostics

36
Multiple regression and forecasting

For forecasting purposes, we require the following assumptions:

• εt are uncorrelated and have zero mean;
• εt are uncorrelated with each xj,t.

It is also useful to have εt ∼ N(0, σ²) when producing prediction intervals or doing statistical tests.

37
Residual plots

Useful for spotting outliers and whether the linear model was
appropriate.

• Scatterplot of residuals εt against each predictor xj,t.
• Scatterplot of residuals against the fitted values ŷt.
• Expect to see scatterplots resembling a horizontal band with no values too far from the band and no patterns such as curvature or increasing spread.

38
Residual patterns

• If a plot of the residuals vs any predictor in the model shows a pattern, then the relationship is nonlinear.
• If a plot of the residuals vs any predictor not in the model shows a pattern, then the predictor should be added to the model.
• If a plot of the residuals vs fitted values shows a pattern, then there is heteroscedasticity in the errors. (Could try a transformation.)

39
Selecting predictors and forecast
evaluation

40
Comparing regression models

Computer output for regression will always give the R² value. This is a useful summary of the model.

• It is equal to the square of the correlation between y and ŷ.
• It is often called the "coefficient of determination".
• It can also be calculated as follows:

R² = Σ(ŷt − ȳ)² / Σ(yt − ȳ)²

• It is the proportion of variance accounted for (explained) by the predictors.

41
Comparing regression models

However . . .

• R² does not allow for "degrees of freedom".
• Adding any variable tends to increase the value of R², even if that variable is irrelevant.

To overcome this problem, we can use adjusted R²:

R̄² = 1 − (1 − R²) (T − 1)/(T − k − 1)

where k = no. of predictors and T = no. of observations.

Maximizing R̄² is equivalent to minimizing σ̂², where

σ̂² = (1/(T − k − 1)) Σ_{t=1}^{T} ε²t
42
Akaike’s Information Criterion

AIC = −2 log(L) + 2(k + 2)

where L is the likelihood and k is the number of predictors in the model.

• AIC penalizes terms more heavily than R̄ 2 .


• Minimizing the AIC is asymptotically equivalent to minimizing
MSE via leave-one-out cross-validation (for any linear
regression).

43
Corrected AIC

For small values of T, the AIC tends to select too many predictors, and so a bias-corrected version of the AIC has been developed:

AICc = AIC + 2(k + 2)(k + 3)/(T − k − 3)

As with the AIC, the AICc should be minimized.

44
Bayesian Information Criterion

BIC = −2 log(L) + (k + 2) log(T )

where L is the likelihood and k is the number of predictors in the model.

• BIC penalizes terms more heavily than AIC.
• Also called SBIC and SC.
• Minimizing BIC is asymptotically equivalent to leave-v-out cross-validation when v = T[1 − 1/(log(T) − 1)].

45
Leave-one-out cross-validation

For regression, leave-one-out cross-validation is faster and more efficient than time-series cross-validation.

• Select one observation for the test set, and use the remaining observations in the training set. Compute the error on the test observation.
• Repeat using each possible observation as the test set.
• Compute the accuracy measure over all errors.

46
Cross-validation

Traditional evaluation

[Diagram: training data followed by test data along the time axis.]

Time series cross-validation

[Diagram: time series cross-validation with h = 1.]

Leave-one-out cross-validation

[Diagram: leave-one-out cross-validation with h = 1.]

48
Comparing regression models

glance(fit_trends) %>%
select(.model, r_squared, adj_r_squared, AICc, CV)

## # A tibble: 3 x 5
## .model r_squared adj_r_squared AICc CV
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 linear 0.728 0.726 452. 39.1
## 2 exponential 0.744 0.742 -779. 0.00176
## 3 piecewise 0.767 0.761 438. 34.8

• Be careful making comparisons when transformations


are used.
49
Choosing regression variables

Best subsets regression

• Fit all possible regression models using one or more of the predictors.
• Choose the best model based on one of the measures of predictive ability (CV, AIC, AICc).

Warning!

• If there are a large number of predictors, this is not possible.
• For example, 44 predictors leads to about 18 trillion possible models!

50
Choosing regression variables

Backwards stepwise regression

• Start with a model containing all variables.
• Try subtracting one variable at a time. Keep the model if it has lower CV or AICc.
• Iterate until no further improvement.

Notes

• Stepwise regression is not guaranteed to lead to the best possible model.
• Inference on the coefficients of the final model will be wrong.

51
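As an illustration only (not the slide's recommended procedure, which selects by CV or AICc), base R's step() performs backwards selection by AIC:

# Backwards stepwise selection by AIC using step() on the us_change data.
full_fit <- lm(Consumption ~ Income + Production + Savings + Unemployment,
               data = us_change)
step(full_fit, direction = "backward")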
Forecasting with regression

52
Ex-ante versus ex-post forecasts

• Ex-ante forecasts are made using only information available in advance.
  • require forecasts of predictors.
• Ex-post forecasts are made using later information on the predictors.
  • useful for studying the behaviour of forecasting models.
• Trend, seasonal and calendar variables are all known in advance, so these don't need to be forecast.

53
Scenario based forecasting

• Assumes possible scenarios for the predictor variables.
• Prediction intervals for scenario based forecasts do not include the uncertainty associated with the future values of the predictor variables.

54
Building a predictive regression model

• If getting forecasts of predictors is difficult, you can use lagged predictors instead:

yt = β0 + β1 x1,t−h + · · · + βk xk,t−h + εt

• A different model is needed for each forecast horizon h.

55
US Consumption

fit_consBest <- us_change %>%
  model(
    TSLM(Consumption ~ Income + Savings + Unemployment)
  )

future_scenarios <- scenarios(
  Increase = new_data(us_change, 4) %>%
    mutate(Income = 1, Savings = 0.5, Unemployment = 0),
  Decrease = new_data(us_change, 4) %>%
    mutate(Income = -1, Savings = -0.5, Unemployment = 0),
  names_to = "Scenario")

fc <- forecast(fit_consBest, new_data = future_scenarios)

56
US Consumption

us_change %>%
  autoplot(Consumption) +
  labs(y = "% change in US consumption") +
  autolayer(fc) +
  labs(title = "US consumption", y = "% change")

[Figure: percentage change in US consumption with forecasts under the Increase and Decrease scenarios, shown with 80% and 95% prediction intervals.]

57
Matrix formulation

58
Matrix formulation

yt = β0 + β1 x1,t + β2 x2,t + · · · + βk xk,t + εt.

Let y = (y1, . . . , yT)′, ε = (ε1, . . . , εT)′, β = (β0, β1, . . . , βk)′ and

X = [ 1  x1,1  x2,1  · · ·  xk,1 ]
    [ 1  x1,2  x2,2  · · ·  xk,2 ]
    [ ⋮    ⋮     ⋮            ⋮  ]
    [ 1  x1,T  x2,T  · · ·  xk,T ]

Then

y = Xβ + ε.

59
Matrix formulation

Least squares estimation

Minimize: (y − Xβ)′(y − Xβ)

Differentiating with respect to β gives

β̂ = (X′X)⁻¹ X′y (Prove it!)

(The "normal equation".)

σ̂² = (1/(T − k − 1)) (y − Xβ̂)′(y − Xβ̂) (Prove it!)

Note: If you fall for the dummy variable trap, (X′X) is a singular matrix.

60
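A quick numerical check (a sketch using simulated toy data) that the normal-equation estimate matches lm():

# Verify beta_hat = (X'X)^{-1} X'y against lm() on simulated data.
set.seed(1)
n  <- 100
x1 <- rnorm(n); x2 <- rnorm(n)
y  <- 2 + 0.5 * x1 - 1.2 * x2 + rnorm(n)
X  <- cbind(1, x1, x2)
solve(t(X) %*% X, t(X) %*% y)   # normal-equation estimate
coef(lm(y ~ x1 + x2))           # should agree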
Likelihood

If the errors are iid and normally distributed, then

y ∼ N(Xβ, σ²I).

So the likelihood is

L = (1/(σ^T (2π)^(T/2))) exp(−(1/(2σ²)) (y − Xβ)′(y − Xβ))

which is maximized when (y − Xβ)′(y − Xβ) is minimized.

So MLE = OLS. (Prove it!)

61
Exercise to do!

• Consider a simple linear regression model of the form

yt = βxt + ϵt,

where the ϵt's are normally distributed independent random variables with mean zero and constant variance σ², based on n observations (x1, y1), (x2, y2), . . . , (xn, yn).

• What is the least squares estimate of β?
• Find the variance of the least squares estimate β̂.

62
Multiple regression forecasts

Optimal forecasts

ŷ* = E(y*|y, X, x*) = x*β̂ = x*(X′X)⁻¹X′y

where x* is a row vector containing the values of the predictors for the forecasts (in the same format as X).

Forecast variance

Var(y*|X, x*) = σ² [1 + x*(X′X)⁻¹(x*)′]

• This ignores any errors in x*.
• 95% prediction intervals assuming normal errors:

ŷ* ± 1.96 √Var(y*|X, x*).

63
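A sketch (not from the slides) computing the point forecast and 95% interval directly from these matrix formulas; X, y and the new predictor row x_star are assumed to be supplied.

# Prediction interval from the matrix formulas; x_star is a 1 x (k+1) row matrix.
pred_interval <- function(X, y, x_star, z = 1.96) {
  XtX_inv  <- solve(t(X) %*% X)
  beta_hat <- XtX_inv %*% t(X) %*% y
  sigma2   <- sum((y - X %*% beta_hat)^2) / (nrow(X) - ncol(X))   # T - k - 1
  y_hat    <- drop(x_star %*% beta_hat)
  v        <- sigma2 * (1 + drop(x_star %*% XtX_inv %*% t(x_star)))
  c(forecast = y_hat, lower = y_hat - z * sqrt(v), upper = y_hat + z * sqrt(v))
}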
Multiple regression forecasts

Fitted values

ŷ = Xβ̂ = X(X′X)⁻¹X′y = Hy

where H = X(X′X)⁻¹X′ is the "hat matrix".

Leave-one-out residuals

Let h1, . . . , hT be the diagonal values of H. Then the cross-validation statistic is

CV = (1/T) Σ_{t=1}^{T} [et/(1 − ht)]²,

where et is the residual obtained from fitting the model to all T observations.
64
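A small sketch (not from the slides) of this shortcut using base R's residuals() and hatvalues() on a fitted lm object:

# Leave-one-out CV statistic via the hat-matrix shortcut.
loo_cv <- function(fit) {
  e <- residuals(fit)
  h <- hatvalues(fit)
  mean((e / (1 - h))^2)
}
loo_cv(lm(Consumption ~ Income + Production + Savings + Unemployment,
          data = us_change))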
Correlation, causation and
forecasting

65
Correlation is not causation

• When x is useful for predicting y, it is not necessarily causing y.
• e.g., predict the number of drownings y using the number of ice-creams sold x.
• Correlations are useful for forecasting, even when there is no causality.
• Better models usually involve causal relationships (e.g., temperature x and people z to predict drownings y).

66
Multicollinearity

In regression analysis, multicollinearity occurs when:

• Two predictors are highly correlated (i.e., the correlation between them is close to ±1).
• A linear combination of some of the predictors is highly correlated with another predictor.
• A linear combination of one subset of predictors is highly correlated with a linear combination of another subset of predictors.

67
Multicollinearity

If multicollinearity exists. . .

• the numerical estimates of coefficients may be wrong (worse in Excel than in a statistics package).
• don't rely on the p-values to determine significance.
• there is no problem with model predictions provided the predictors used for forecasting are within the range used for fitting.
• omitting variables can help.
• combining variables can help.

68
Exercise to do!

Data set olympic_running contains the winning times (in seconds) in each Olympic Games sprint, middle-distance and long-distance track event from 1896 to 2016.

• Plot the winning time against the year for each event. Describe
the main features of the plot.
• Fit a regression line to the data for each event. Obviously the
winning times have been decreasing, but at what average rate
per year?
• Plot the residuals against the year. What does this indicate
about the suitability of the fitted lines?
• Predict the winning time for each race in the 2020 Olympics.
Give a prediction interval for your forecasts. What assumptions
have you made in these calculations?

69
Next Lecture!

• In the next lecture, we will learn about exponential smoothing.
• Please read Chapter 8 of the textbook (Hyndman, R. J. & Athanasopoulos, G. (2021) Forecasting: Principles and Practice, 3rd edition. https://otexts.com/fpp3/) beforehand.

70
