
DASC6510/DASC4990

Unit 3: Regression models

Erfanul Hoque, PhD


Thompson Rivers University
These notes are closely based on the book: Hyndman, R. J. & Athanasopoulos, G. (2021) Forecasting: Principles and Practice, 3rd edition. https://otexts.com/fpp3/

2
Time series linear model

3
Time series linear model

We discuss regression models. The basic concept is that we forecast the time series of interest y assuming that it has a linear relationship with other time series x.

4
Multiple regression and forecasting

yt = β0 + β1 x1,t + β2 x2,t + · · · + βk xk,t + εt .

• yt is the variable we want to forecast: the "response" variable.
• Each xj,t is numerical and is called a "predictor". They are usually assumed to be known for all past and future times.
• The coefficients β1, . . . , βk measure the effect of each predictor after taking account of the effect of all other predictors in the model. That is, the coefficients measure the marginal effects of the predictor variables.

5
Example: US consumption expenditure

us_change %>%
pivot_longer(c(Consumption, Income), names_to="Series") %>%
autoplot(value) +
labs(y="% change")

[Figure: time plot of the quarterly percentage changes (value) in US Consumption and Income.]
6
Example: US consumption expenditure

us_change %>%
ggplot(aes(x = Income, y = Consumption)) +
labs(y = "Consumption (quarterly % change)",
x = "Income (quarterly % change)") +
geom_point() + geom_smooth(method = "lm", se = FALSE)

[Figure: scatterplot of the quarterly percentage change in Consumption against Income, with a fitted least squares line.]
7
Example: US consumption expenditure

fit_cons <- us_change %>%
  model(lm = TSLM(Consumption ~ Income))
report(fit_cons)

## Series: Consumption
## Model: TSLM
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.582 -0.278 0.019 0.323 1.422
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.5445 0.0540 10.08 < 2e-16 ***
## Income 0.2718 0.0467 5.82 2.4e-08 ***
## ---
## Signif. codes:
## 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.591 on 196 degrees of freedom
## Multiple R-squared: 0.147, Adjusted R-squared: 0.143
## F-statistic: 33.8 on 1 and 196 DF, p-value: 2e-08

8
Example: US consumption expenditure

[Figure: time plots of the quarterly percentage changes in Consumption, Income, Production, Savings and Unemployment.]

9
Example: US consumption expenditure

[Figure: scatterplot matrix of the five series. Correlations with Consumption: Income 0.384, Production 0.529, Savings −0.257, Unemployment −0.527. Income with Production 0.269, Savings 0.720, Unemployment −0.224; Production with Savings −0.059, Unemployment −0.768; Savings with Unemployment 0.106.]

10
Example: US consumption expenditure
fit_consMR <- us_change %>%
model(lm = TSLM(Consumption ~ Income + Production + Unemployment + Savings))
report(fit_consMR)

## Series: Consumption
## Model: TSLM
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.906 -0.158 -0.036 0.136 1.155
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.25311 0.03447 7.34 5.7e-12 ***
## Income 0.74058 0.04012 18.46 < 2e-16 ***
## Production 0.04717 0.02314 2.04 0.043 *
## Unemployment -0.17469 0.09551 -1.83 0.069 .
## Savings -0.05289 0.00292 -18.09 < 2e-16 ***
## ---
## Signif. codes:
## 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.31 on 193 degrees of freedom
## Multiple R-squared: 0.768, Adjusted R-squared: 0.763
## F-statistic: 160 on 4 and 193 DF, p-value: <2e-16
11
Example: US consumption expenditure

[Figure: time plot of the actual data and fitted values from the multiple regression model for the percentage change in US consumption expenditure.]

12
Example: US consumption expenditure

[Figure: scatterplot of fitted (predicted) values against actual values of the percentage change in US consumption expenditure.]

13
Example: US consumption expenditure

fit_consMR %>% gg_tsresiduals()


[Figure: gg_tsresiduals() output for fit_consMR: time plot of the innovation residuals, the residual ACF, and a histogram of the residuals.]

14
Some useful predictors

15
Trend

Linear trend
xt = t

• t = 1, 2, . . . , T
• Strong assumption that trend will continue.

16
Dummy variables

• If a categorical variable takes only two values (e.g., "Yes" or "No"), then an equivalent numerical variable can be constructed taking value 1 if yes and 0 if no. This is called a dummy variable.

17
Dummy variables

• If there are more than two categories, then the variable can be
coded using several dummy variables (one fewer than the total
number of categories).

18
Beware of the dummy variable trap!

• Using one dummy for each category gives too many dummy
variables!
• The regression will then be singular and inestimable.
• Either omit the constant, or omit the dummy for one category.
• The coefficients of the dummies are relative to the omitted
category.

19
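This trap can be seen directly in base R; the following is a small sketch (not from the slides) using a toy quarterly factor and model.matrix().

# With an intercept plus one dummy per quarter, the columns of X are
# linearly dependent, so X'X is singular and cannot be inverted.
q <- factor(rep(1:4, times = 5))                        # toy quarterly factor
X_trap <- cbind(Intercept = 1, model.matrix(~ q - 1))   # intercept + all 4 dummies
qr(X_trap)$rank                                         # rank 4 < 5 columns
X_ok <- model.matrix(~ q)                               # R drops one dummy itself
qr(X_ok)$rank                                           # full rank (4 columns)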
Uses of dummy variables

Seasonal dummies
• For quarterly data: use 3 dummies
• For monthly data: use 11 dummies
• For daily data: use 6 dummies
• What to do with weekly data?
Outliers
• If there is an outlier, you can use a dummy variable
to remove its effect.
Public holidays
• For daily data: if it is a public holiday, dummy=1,
otherwise dummy=0.
20
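As a rough sketch (not from the slides), an outlier or public-holiday dummy can be added as an ordinary 0/1 column before fitting; the data set, column names and holiday_dates vector below are hypothetical.

# Add a 0/1 dummy column, then include it alongside trend() and season() in TSLM.
library(fpp3)
my_data <- my_data %>%
  mutate(holiday = as.integer(Date %in% holiday_dates))  # holiday_dates assumed given
fit <- my_data %>%
  model(TSLM(y ~ trend() + season() + holiday))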
Beer production revisited
[Figure: time plot of Australian quarterly beer production (megalitres).]

Regression model
yt = β0 + β1 t + β2 d2,t + β3 d3,t + β4 d4,t + εt

• di,t = 1 if t is quarter i and 0 otherwise.

21
Beer production revisited
fit_beer <- recent_production %>% model(TSLM(Beer ~ trend() + season()))
report(fit_beer)

## Series: Beer
## Model: TSLM
##
## Residuals:
## Min 1Q Median 3Q Max
## -42.9 -7.6 -0.5 8.0 21.8
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 441.8004 3.7335 118.33 < 2e-16 ***
## trend() -0.3403 0.0666 -5.11 2.7e-06 ***
## season()year2 -34.6597 3.9683 -8.73 9.1e-13 ***
## season()year3 -17.8216 4.0225 -4.43 3.4e-05 ***
## season()year4 72.7964 4.0230 18.09 < 2e-16 ***
## ---
## Signif. codes:
## 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 12.2 on 69 degrees of freedom
## Multiple R-squared: 0.924, Adjusted R-squared: 0.92
## F-statistic: 211 on 4 and 69 DF, p-value: <2e-16
22
Beer production revisited

augment(fit_beer) %>%
ggplot(aes(x = Quarter)) +
geom_line(aes(y = Beer, colour = "Data")) +
geom_line(aes(y = .fitted, colour = "Fitted")) +
labs(y="Megalitres",title ="Australian quarterly beer production") +
scale_colour_manual(values = c(Data = "black", Fitted = "#D55E00"))

[Figure: time plot of Australian quarterly beer production, showing the actual data and the fitted values from the regression model.]
23
Beer production revisited

[Figure: actual quarterly beer production plotted against fitted values, with points coloured by quarter.]

24
Beer production revisited

fit_beer %>% gg_tsresiduals()


[Figure: gg_tsresiduals() output for fit_beer: time plot of the innovation residuals, the residual ACF, and a histogram of the residuals.]

25
Beer production revisited

fit_beer %>% forecast() %>% autoplot(recent_production)

[Figure: forecasts of quarterly beer production from the regression model, with 80% and 95% prediction intervals.]

26
Intervention variables

Spikes

• Equivalent to a dummy variable for handling an outlier.

Steps

• Variable takes value 0 before the intervention and 1 afterwards.

Change of slope

• Variable takes value 0 before the intervention and values {1, 2, 3, . . . } afterwards.

27
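A minimal sketch (not from the slides) of the three predictor types, assuming a hypothetical intervention at time t0:

# Spike, step, and slope-change predictors for an intervention at t0 (assumed).
t  <- 1:100
t0 <- 60
spike <- as.integer(t == t0)     # 1 only at the intervention time
step  <- as.integer(t >= t0)     # 0 before, 1 from the intervention onwards
slope <- pmax(0, t - t0 + 1)     # 0 before, then 1, 2, 3, ... afterwards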
Holidays

For monthly data

• Christmas: always in December, so it is part of the monthly seasonal effect.
• Easter: use a dummy variable vt = 1 if any part of Easter is in that month, vt = 0 otherwise.
• Ramadan and Chinese New Year are treated similarly.

28
Trading days

With monthly data, if the observations vary depending on how many of each type of day fall in the month, then trading day predictors can be useful:

z1 = # Mondays in month;
z2 = # Tuesdays in month;
..
.
z7 = # Sundays in month.

29
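A sketch (not from the slides) of how such counts could be computed with lubridate; the trading_days() helper is made up for illustration.

# Count how many of each weekday fall in the month containing `date`.
library(lubridate)
trading_days <- function(date) {
  first <- floor_date(date, "month")
  last  <- ceiling_date(date, "month") - days(1)
  table(wday(seq(first, last, by = "day"), label = TRUE, week_start = 1))
}
trading_days(as.Date("2020-02-01"))   # counts of Mon-Sun for February 2020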
Distributed lags

Lagged values of a predictor.

Example: x is advertising, which has a delayed effect:

x1 = advertising for previous month;
x2 = advertising for two months previously;
...
xm = advertising for m months previously.

30
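A sketch (with hypothetical data and column names) of fitting a distributed-lag regression by pre-computing the lagged columns with dplyr::lag():

# Pre-compute lagged advertising columns, then fit a TSLM on them.
library(fpp3)
fit_lags <- sales_data %>%                    # sales_data is hypothetical
  mutate(adverts_lag1 = lag(Adverts, 1),
         adverts_lag2 = lag(Adverts, 2)) %>%
  model(TSLM(Sales ~ Adverts + adverts_lag1 + adverts_lag2))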
Nonlinear trend

Piecewise linear trend with bend at τ

x1,t = t
x2,t = 0 if t < τ, and (t − τ) if t ≥ τ

Quadratic or higher order trend

x1,t = t, x2,t = t², . . .

NOT RECOMMENDED!
31
Example: Boston marathon winning times

marathon <- boston_marathon %>%
  filter(Event == "Men's open division") %>%
  select(-Event) %>%
  mutate(Minutes = as.numeric(Time)/60)
marathon %>% autoplot(Minutes) +
  labs(y = "Winning times in minutes")

[Figure: Boston marathon winning times (minutes) for the men's open division, by year.]
32
Example: Boston marathon winning times

fit_trends <- marathon %>%
  model(
    # Linear trend
    linear = TSLM(Minutes ~ trend()),
    # Exponential trend
    exponential = TSLM(log(Minutes) ~ trend()),
    # Piecewise linear trend
    piecewise = TSLM(Minutes ~ trend(knots = c(1940, 1980)))
  )

fit_trends

## # A mable: 1 x 3
## linear exponential piecewise
## <model> <model> <model>
## 1 <TSLM> <TSLM> <TSLM>

33
Example: Boston marathon winning times

fit_trends %>% forecast(h=10) %>% autoplot(marathon)

[Figure: forecasts of Boston marathon winning times from the linear, exponential and piecewise models, with 95% prediction intervals.]

34
Example: Boston marathon winning times

fit_trends %>% select(piecewise) %>% gg_tsresiduals()

[Figure: residual diagnostics for the piecewise model: time plot of the innovation residuals, the residual ACF, and a histogram of the residuals.]

35
Residual diagnostics

36
Multiple regression and forecasting

For forecasting purposes, we require the following assumptions:

• εt are uncorrelated and have zero mean;
• εt are uncorrelated with each xj,t.

It is also useful to have εt ∼ N(0, σ²) when producing prediction intervals or doing statistical tests.

37
Residual plots

Useful for spotting outliers and whether the linear model was
appropriate.

• Scatterplot of residuals εt against each predictor xj,t.
• Scatterplot of residuals against the fitted values ŷt.
• Expect to see scatterplots resembling a horizontal band with no values too far from the band and no patterns such as curvature or increasing spread.

38
Residual patterns

• If a plot of the residuals vs any predictor in the model shows a pattern, then the relationship is nonlinear.
• If a plot of the residuals vs any predictor not in the model shows a pattern, then the predictor should be added to the model.
• If a plot of the residuals vs fitted values shows a pattern, then there is heteroscedasticity in the errors. (Could try a transformation.)

39
Selecting predictors and forecast
evaluation

40
Comparing regression models

Computer output for regression will always give the R² value. This is a useful summary of the model.

• It is equal to the square of the correlation between y and ŷ.
• It is often called the "coefficient of determination".
• It can also be calculated as follows:

R² = Σ(ŷt − ȳ)² / Σ(yt − ȳ)²

• It is the proportion of variance accounted for (explained) by the predictors.

41
Comparing regression models

However . . .

• R² does not allow for "degrees of freedom".
• Adding any variable tends to increase the value of R², even if that variable is irrelevant.

To overcome this problem, we can use adjusted R²:

R̄² = 1 − (1 − R²) (T − 1)/(T − k − 1)

where k = no. of predictors and T = no. of observations.

Maximizing R̄² is equivalent to minimizing σ̂², where

σ̂² = (1/(T − k − 1)) Σ_{t=1}^{T} ε²t
42
Akaike’s Information Criterion

AIC = −2 log(L) + 2(k + 2)

where L is the likelihood and k is the number of predictors in the model.

• AIC penalizes terms more heavily than R̄ 2 .


• Minimizing the AIC is asymptotically equivalent to minimizing
MSE via leave-one-out cross-validation (for any linear
regression).

43
Corrected AIC

For small values of T, the AIC tends to select too many predictors, and so a bias-corrected version of the AIC has been developed:

AICc = AIC + 2(k + 2)(k + 3)/(T − k − 3)

As with the AIC, the AICc should be minimized.

44
Bayesian Information Criterion

BIC = −2 log(L) + (k + 2) log(T )

where L is the likelihood and k is the number of predictors in the model.

• BIC penalizes terms more heavily than AIC.
• Also called SBIC and SC.
• Minimizing BIC is asymptotically equivalent to leave-v-out cross-validation when v = T[1 − 1/(log(T) − 1)].

45
Leave-one-out cross-validation

For regression, leave-one-out cross-validation is faster and more efficient than time-series cross-validation.

• Select one observation for the test set, and use the remaining observations in the training set. Compute the error on the test observation.
• Repeat using each possible observation as the test set.
• Compute the accuracy measure over all errors.

46
Cross-validation

Traditional evaluation

[Diagram: training data followed by test data along the time axis.]

Time series cross-validation

[Diagram: time series cross-validation with h = 1.]

Leave-one-out cross-validation

[Diagram: leave-one-out cross-validation with h = 1.]

48
Comparing regression models

glance(fit_trends) %>%
select(.model, r_squared, adj_r_squared, AICc, CV)

## # A tibble: 3 x 5
## .model r_squared adj_r_squared AICc CV
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 linear 0.728 0.726 452. 39.1
## 2 exponential 0.744 0.742 -779. 0.00176
## 3 piecewise 0.767 0.761 438. 34.8

• Be careful making comparisons when transformations


are used.
49
Choosing regression variables

Best subsets regression

• Fit all possible regression models using one or more of the predictors.
• Choose the best model based on one of the measures of predictive ability (CV, AIC, AICc).

Warning!

• If there are a large number of predictors, this is not possible.
• For example, 44 predictors leads to about 18 trillion possible models!

50
Choosing regression variables

Backwards stepwise regression

• Start with a model containing all variables.
• Try subtracting one variable at a time. Keep the model if it has lower CV or AICc.
• Iterate until no further improvement.

Notes

• Stepwise regression is not guaranteed to lead to the best possible model.
• Inference on the coefficients of the final model will be wrong.

51
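As an illustration only (not the slide's recommended procedure, which selects by CV or AICc), base R's step() performs backwards selection by AIC:

# Backwards stepwise selection by AIC using step() on the us_change data.
full_fit <- lm(Consumption ~ Income + Production + Savings + Unemployment,
               data = us_change)
step(full_fit, direction = "backward")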
Forecasting with regression

52
Ex-ante versus ex-post forecasts

• Ex-ante forecasts are made using only information available in advance.
  • require forecasts of predictors.
• Ex-post forecasts are made using later information on the predictors.
  • useful for studying the behaviour of forecasting models.
• Trend, seasonal and calendar variables are all known in advance, so these don't need to be forecast.

53
Scenario based forecasting

• Assumes possible scenarios for the predictor variables.
• Prediction intervals for scenario based forecasts do not include the uncertainty associated with the future values of the predictor variables.

54
Building a predictive regression model

• If getting forecasts of predictors is difficult, you can use lagged predictors instead:

yt = β0 + β1 x1,t−h + · · · + βk xk,t−h + εt

• A different model is needed for each forecast horizon h.

55
US Consumption

fit_consBest <- us_change %>%
  model(
    TSLM(Consumption ~ Income + Savings + Unemployment)
  )

future_scenarios <- scenarios(
  Increase = new_data(us_change, 4) %>%
    mutate(Income = 1, Savings = 0.5, Unemployment = 0),
  Decrease = new_data(us_change, 4) %>%
    mutate(Income = -1, Savings = -0.5, Unemployment = 0),
  names_to = "Scenario")

fc <- forecast(fit_consBest, new_data = future_scenarios)

56
US Consumption

us_change %>%
  autoplot(Consumption) +
  labs(y = "% change in US consumption") +
  autolayer(fc) +
  labs(title = "US consumption", y = "% change")

[Figure: percentage change in US consumption with forecasts under the Increase and Decrease scenarios, shown with 80% and 95% prediction intervals.]

57
Matrix formulation

58
Matrix formulation

yt = β0 + β1 x1,t + β2 x2,t + · · · + βk xk,t + εt.

Let y = (y1, . . . , yT)′, ε = (ε1, . . . , εT)′, β = (β0, β1, . . . , βk)′ and

X = [ 1  x1,1  x2,1  · · ·  xk,1 ]
    [ 1  x1,2  x2,2  · · ·  xk,2 ]
    [ ⋮    ⋮     ⋮            ⋮  ]
    [ 1  x1,T  x2,T  · · ·  xk,T ]

Then

y = Xβ + ε.

59
Matrix formulation

Least squares estimation

Minimize: (y − Xβ)′(y − Xβ)

Differentiating with respect to β gives

β̂ = (X′X)⁻¹ X′y (Prove it!)

(The "normal equation".)

σ̂² = (1/(T − k − 1)) (y − Xβ̂)′(y − Xβ̂) (Prove it!)

Note: If you fall for the dummy variable trap, (X′X) is a singular matrix.

60
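A quick numerical check (a sketch using simulated toy data) that the normal-equation estimate matches lm():

# Verify beta_hat = (X'X)^{-1} X'y against lm() on simulated data.
set.seed(1)
n  <- 100
x1 <- rnorm(n); x2 <- rnorm(n)
y  <- 2 + 0.5 * x1 - 1.2 * x2 + rnorm(n)
X  <- cbind(1, x1, x2)
solve(t(X) %*% X, t(X) %*% y)   # normal-equation estimate
coef(lm(y ~ x1 + x2))           # should agree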
Likelihood

If the errors are iid and normally distributed, then

y ∼ N(Xβ, σ²I).

So the likelihood is

L = (1/(σ^T (2π)^(T/2))) exp(−(1/(2σ²)) (y − Xβ)′(y − Xβ))

which is maximized when (y − Xβ)′(y − Xβ) is minimized.

So MLE = OLS. (Prove it!)

61
Exercise to do!

• Consider a simple linear regression model of the form

yt = βxt + ϵt,

where the ϵt's are normally distributed independent random variables with mean zero and constant variance σ², based on n observations (x1, y1), (x2, y2), . . . , (xn, yn).

• What is the least squares estimate of β?
• Find the variance of the least squares estimate β̂.

62
Multiple regression forecasts

Optimal forecasts

ŷ* = E(y*|y, X, x*) = x*β̂ = x*(X′X)⁻¹X′y

where x* is a row vector containing the values of the predictors for the forecasts (in the same format as X).

Forecast variance

Var(y*|X, x*) = σ² [1 + x*(X′X)⁻¹(x*)′]

• This ignores any errors in x*.
• 95% prediction intervals assuming normal errors:

ŷ* ± 1.96 √Var(y*|X, x*).

63
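A sketch (not from the slides) computing the point forecast and 95% interval directly from these matrix formulas; X, y and the new predictor row x_star are assumed to be supplied.

# Prediction interval from the matrix formulas; x_star is a 1 x (k+1) row matrix.
pred_interval <- function(X, y, x_star, z = 1.96) {
  XtX_inv  <- solve(t(X) %*% X)
  beta_hat <- XtX_inv %*% t(X) %*% y
  sigma2   <- sum((y - X %*% beta_hat)^2) / (nrow(X) - ncol(X))   # T - k - 1
  y_hat    <- drop(x_star %*% beta_hat)
  v        <- sigma2 * (1 + drop(x_star %*% XtX_inv %*% t(x_star)))
  c(forecast = y_hat, lower = y_hat - z * sqrt(v), upper = y_hat + z * sqrt(v))
}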
Multiple regression forecasts

Fitted values

ŷ = Xβ̂ = X(X′X)⁻¹X′y = Hy

where H = X(X′X)⁻¹X′ is the "hat matrix".

Leave-one-out residuals

Let h1, . . . , hT be the diagonal values of H. Then the cross-validation statistic is

CV = (1/T) Σ_{t=1}^{T} [et/(1 − ht)]²,

where et is the residual obtained from fitting the model to all T observations.
64
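A small sketch (not from the slides) of this shortcut using base R's residuals() and hatvalues() on a fitted lm object:

# Leave-one-out CV statistic via the hat-matrix shortcut.
loo_cv <- function(fit) {
  e <- residuals(fit)
  h <- hatvalues(fit)
  mean((e / (1 - h))^2)
}
loo_cv(lm(Consumption ~ Income + Production + Savings + Unemployment,
          data = us_change))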
Correlation, causation and
forecasting

65
Correlation is not causation

• When x is useful for predicting y, it is not necessarily causing y.
• e.g., predict the number of drownings y using the number of ice-creams sold x.
• Correlations are useful for forecasting, even when there is no causality.
• Better models usually involve causal relationships (e.g., temperature x and people z to predict drownings y).

66
Multicollinearity

In regression analysis, multicollinearity occurs when:

• Two predictors are highly correlated (i.e., the correlation between them is close to ±1).
• A linear combination of some of the predictors is highly correlated with another predictor.
• A linear combination of one subset of predictors is highly correlated with a linear combination of another subset of predictors.

67
Multicollinearity

If multicollinearity exists. . .

• the numerical estimates of coefficients may be wrong (worse in Excel than in a statistics package).
• don't rely on the p-values to determine significance.
• there is no problem with model predictions provided the predictors used for forecasting are within the range used for fitting.
• omitting variables can help.
• combining variables can help.

68
Exercise to do!

Data set olympic_running contains the winning times (in seconds) in each Olympic Games sprint, middle-distance and long-distance track event from 1896 to 2016.

• Plot the winning time against the year for each event. Describe
the main features of the plot.
• Fit a regression line to the data for each event. Obviously the
winning times have been decreasing, but at what average rate
per year?
• Plot the residuals against the year. What does this indicate
about the suitability of the fitted lines?
• Predict the winning time for each race in the 2020 Olympics.
Give a prediction interval for your forecasts. What assumptions
have you made in these calculations?

69
Next Lecture!

• In the next lecture, we will learn about exponential smoothing.
• Please read Chapter 8 of the textbook (Hyndman, R. J. & Athanasopoulos, G. (2021) Forecasting: Principles and Practice, 3rd edition. https://otexts.com/fpp3/) beforehand.

70
