Exploring Data Patterns
When data grow or decline over several time periods, a
trend pattern exists. The following Figures show the trend
component:
The trend is a long term component that represents the
growth or decline in the time series over an extended period of
time.
When data collected over time fluctuate around a constant level
or mean, a horizontal pattern exists. This type of series is said to be
stationary in its mean. Monthly sales for a food product that do
not increase or decrease consistently over an extended period
would be considered to have a horizontal pattern.
The seasonal component is a pattern that repeats itself
year after year.
The autocorrelation coefficient for a lag of k periods is

r_k = \frac{\sum_{t=k+1}^{n} (Y_t - \bar{Y})(Y_{t-k} - \bar{Y})}{\sum_{t=1}^{n} (Y_t - \bar{Y})^2}    (3.1)

Where
r_k = the autocorrelation coefficient for a lag of k periods
\bar{Y} = the mean of the values of the series
Y_t = the observation in time period t
Y_{t-k} = the observation k time periods earlier, or at time period t - k
For the VCR data of Table 3-1, the lag 2 autocorrelation coefficient is

r_2 = \frac{\sum_{t=3}^{n} (Y_t - \bar{Y})(Y_{t-2} - \bar{Y})}{\sum_{t=1}^{n} (Y_t - \bar{Y})^2} = \frac{682}{1474} = .463
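As a check on the arithmetic, the autocorrelation computation can be sketched in Python (a minimal implementation of the formula above; the 12 VCR sales values from Table 3-1 are assumed):

```python
def autocorr(y, k):
    """Lag-k autocorrelation coefficient r_k: the lagged cross-product
    of deviations from the mean, divided by the total sum of squares."""
    n = len(y)
    ybar = sum(y) / n
    num = sum((y[t] - ybar) * (y[t - k] - ybar) for t in range(k, n))
    den = sum((yt - ybar) ** 2 for yt in y)
    return num / den

# VCR sales data from Table 3-1 (Example 3.1)
vcr = [123, 130, 125, 138, 145, 142, 141, 146, 147, 157, 150, 160]
r2 = autocorr(vcr, 2)   # 682/1474, approximately .463
```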
Autocorrelation coefficients for different time lags of a variable
can be used to answer the following questions about a time
series:
1. Are the data random?
2. Do the data have a trend (are they nonstationary)?
3. Are the data stationary?
4. Are the data seasonal?
Are the data random?
SE(r_k) = \sqrt{\frac{1 + 2\sum_{i=1}^{k-1} r_i^2}{n}}    (3.2)
Where
SE (r k )= the standard error (estimated standard deviation) of the
autocorrelation at time lag k
r i = the autocorrelation at time lag i
k= the time lag
n= the number of observations in the time series
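A sketch of Equation 3.2 in Python (the standard error at lag k uses the autocorrelations at the earlier lags; the inputs shown are illustrative):

```python
from math import sqrt

def se_rk(prior_r, n):
    """Standard error of r_k per Equation 3.2.

    prior_r holds the autocorrelations r_1, ..., r_{k-1} at earlier
    lags; n is the number of observations in the time series.
    """
    return sqrt((1 + 2 * sum(ri ** 2 for ri in prior_r)) / n)

# For lag 1 there are no earlier lags, so SE(r_1) = sqrt(1/n)
se_r1 = se_rk([], 12)
```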
At a specified confidence level, a series can be considered
random if each of the calculated autocorrelation coefficients is
within the interval about 0 given by 0 ± t SE(r_k), where the
multiplier t is an appropriate percentage point of a t distribution.
The autocorrelations for several time lags can also be tested as a group with the Ljung-Box Q statistic,

Q = n(n+2)\sum_{k=1}^{m} \frac{r_k^2}{n-k}

which is compared with a chi-square value with m degrees of freedom.

Where
n = the number of observations in the time series
k = the time lag
m = the number of time lags to be tested
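The Ljung-Box Q (LBQ) statistic that Minitab reports later in the chapter combines the first m autocorrelations into a single test. A minimal sketch of the standard formula (the inputs here are hypothetical):

```python
def ljung_box_q(r, n):
    """Ljung-Box Q for autocorrelations r = [r_1, ..., r_m] as a group;
    compared against a chi-square value with m degrees of freedom."""
    m = len(r)
    return n * (n + 2) * sum(r[k - 1] ** 2 / (n - k) for k in range(1, m + 1))

# Hypothetical example: a single lag with r_1 = .5 and n = 10
q = ljung_box_q([0.5], 10)   # 10 * 12 * (.25 / 9), approximately 3.33
```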
Are the data in Table 3-1 consistent with this model? This issue
will be explored in Examples 3.2 and 3.3.
Example 3.2
A hypothesis test is developed to determine whether a particular
autocorrelation coefficient is significantly different from zero
for the correlation shown in Figure 3-5. The null and alternative
hypotheses for testing the significance of the lag 1 population
autocorrelation coefficient are
H_0: ρ_1 = 0
H_1: ρ_1 ≠ 0

To test for zero autocorrelation at time lag 2, we consider

H_0: ρ_2 = 0
H_1: ρ_2 ≠ 0
SE(r_2) = \sqrt{\frac{1 + 2\sum_{i=1}^{1} r_i^2}{n}} = \sqrt{\frac{1 + 2(.572)^2}{12}} = \sqrt{\frac{1.6544}{12}} = \sqrt{.138} = .371

The test statistic is

t = \frac{r_2}{SE(r_2)} = \frac{.463}{.371} = 1.25
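The Example 3.2 arithmetic can be verified directly (the values r_1 = .572, r_2 = .463, and n = 12 are taken from the text):

```python
from math import sqrt

r1, r2, n = 0.572, 0.463, 12
se_r2 = sqrt((1 + 2 * r1 ** 2) / n)   # standard error of r_2
t = r2 / se_r2                        # test statistic for H0: rho_2 = 0
# se_r2 is approximately .371 and t approximately 1.25, as in the text
```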
Example 3.3
Do the Data Have a Trend?
If a series has a trend, a significant relationship exists between
successive series values. The autocorrelation coefficients are
typically large for the first several time lags and then gradually
drop toward zero as the number of lags increases.
A stationary time series is one whose basic statistical
properties, such as the mean and variance, remain constant over
time. Consequently, a series that varies about a fixed level (no
growth or decline) over time is said to be stationary. A series
that contains a trend is said to be nonstationary. The
autocorrelation coefficients for a stationary series decline to zero
fairly rapidly, generally after the second or third time lag. On
the other hand, the sample autocorrelations for a nonstationary series
remain fairly large for several time periods. Often, to analyze a
nonstationary series, the trend is removed before additional
modeling occurs. The procedures discussed in Chapter 9 use this
approach.
A method called differencing can be used to remove the
trend from a nonstationary series. The data originally presented in
Table 3-1 are shown again in Figure 3-8, column A. The Y_t
values lagged one period, Y_{t-1}, are shown in column B. The
differences, Y_t - Y_{t-1} (column A - column B), are shown in
column C. For example, the first difference is Y_2 - Y_1 =
130 - 123 = 7. Note the upward growth or trend of the VCR data
shown in Figure 3-9, Plot A. Now observe the stationary pattern
of the differenced data in Figure 3-9, Plot B. Differencing the
data has removed the trend.
Y_t     Y_{t-1}   Differences
123     —         —
130     123       7
125     130       -5
138     125       13
145     138       7
142     145       -3
141     142       -1
146     141       5
147     146       1
157     147       10
150     157       -7
160     150       10
Figure 3-8 Excel Results of Differencing the VCR Data of Example 3.1
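The differencing in Figure 3-8 amounts to subtracting each observation from its successor; a one-line sketch using the Table 3-1 data:

```python
# VCR data from Table 3-1 (column A of Figure 3-8)
vcr = [123, 130, 125, 138, 145, 142, 141, 146, 147, 157, 150, 160]

# First differences Y_t - Y_{t-1}; the first observation has no predecessor
diffs = [vcr[t] - vcr[t - 1] for t in range(1, len(vcr))]
# diffs matches column C: [7, -5, 13, 7, -3, -1, 5, 1, 10, -7, 10]
```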
Figure 3-9 Time series plots of the VCR Data and Differenced VCR Data for Example 3-1
Example 3.4
An analyst for Sears is assigned the task of forecasting operating
revenue for 2005. She gathers the data for the years 1955 to
2004, shown in Table 3-4. The data are plotted as a time series in
Figure 3-10. Notice that, although Sears operating revenues were
declining over the 2000-2004 period, the general trend over the
entire 1955-2004 time frame is up. First, the analyst computes a
95% confidence interval for the autocorrelation coefficients at
time lag 1 using 0 ± Z_{.025}(1/\sqrt{n}), where, for large samples, the
standard normal .025 point has replaced the corresponding t
distribution percentage point:

0 ± 1.96\sqrt{1/50}
0 ± .277
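The interval half-width follows directly (n = 50 annual observations):

```python
from math import sqrt

n = 50
half_width = 1.96 * sqrt(1 / n)   # large-sample 95% limit for r_k
# half_width is approximately .277, so the interval is 0 +/- .277
```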
Table 3-4 Yearly operating revenue for Sears, 1955-2004, for Example 3.4
Year Yt Year Yt Year Yt Year Yt Year Yt
1955 3307 1967 7296 1979 17514 1991 57242 2003 23253
1956 3556 1968 8178 1980 25195 1992 52345 2004 19701
1957 3601 1969 8844 1981 27357 1993 50838
1958 3721 1970 9251 1982 30020 1994 54559
1959 4036 1971 10006 1983 35883 1995 34925
1960 4134 1972 10991 1984 38828 1996 38236
1961 4268 1973 12306 1985 40715 1997 41296
1962 4578 1974 13101 1986 44282 1998 41322
1963 5093 1975 13639 1987 48440 1999 41071
1964 5716 1976 14950 1988 50251 2000 40937
1965 6357 1977 17224 1989 53794 2001 36151
1966 6769 1978 17946 1990 55972 2002 30762
Figure 3-10 Time series plot of Sears operating revenue, 1955-2004
Next, the analyst runs the data on Minitab and produces the
autocorrelation function shown in Figure 3-11. Upon
examination, the analyst notices that the autocorrelations for the
first four time lags are significantly different from zero (.96, .92,
.87, and .81) and that the values then gradually drop to zero. As
a final check, the analyst looks at the Q statistic for 10 time
lags. The LBQ is 300.56, which is greater than the chi-square
value 18.3 (the upper .05 point of a chi-square distribution with
10 degrees of freedom). This result indicates that the autocorrelations
for the first 10 lags as a group are significantly different from
zero. The analyst decides that the data are highly autocorrelated
and exhibit trendlike behavior.
The analyst suspects that the series can be differenced to
remove the trend and to create a stationary series. She differences
the data (see the Minitab Applications section at the end of the chapter),
and the results are shown in Figure 3-12. The differenced series
shows no evidence of a trend, and the autocorrelation function,
shown in Figure 3-13, appears to support this conclusion.
Examining Figure 3-13, the analyst notes that the
autocorrelation coefficient at time lag 3, .32, is significantly
different from zero (tested at the .05 significance level). The
autocorrelations at lags other than lag 3 are small, and the LBQ
statistic for 10 lags is also relatively small, so there is little
evidence to suggest that the differenced data are autocorrelated. Yet
the analyst wonders whether there is some pattern in these data
that can be modeled by one of the more advanced forecasting
techniques discussed in Chapter 9.
Are the Data Seasonal?
If quarterly data with a seasonal pattern are analyzed, first
quarters tend to look alike, second quarters tend to look alike,
and so forth, and a significant autocorrelation coefficient will
appear at lag 4. If monthly data are analyzed, a significant
autocorrelation coefficient will appear at lag 12. That is, January
will correlate with other Januarys, February will correlate with
other Februarys, and so on. Example 3.5 discusses a series that
is seasonal.
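The lag-4 signature of quarterly seasonality can be seen in a small sketch (a synthetic quarterly series with hypothetical values; the autocorrelation function is the same computation defined earlier in the chapter):

```python
def autocorr(y, k):
    """Lag-k autocorrelation coefficient, as defined earlier."""
    n = len(y)
    ybar = sum(y) / n
    num = sum((y[t] - ybar) * (y[t - k] - ybar) for t in range(k, n))
    return num / sum((yt - ybar) ** 2 for yt in y)

# Eight years of quarterly data with the same within-year pattern
sales = [10, 20, 30, 15] * 8
r4 = autocorr(sales, 4)   # large and positive: quarters repeat year to year
r1 = autocorr(sales, 1)   # small by comparison
```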
Example 3.5
Perkin is an analyst for the Coastal Marine Corporation. Perkin
gathers the data shown in Table 3-5 for the quarterly sales of the
corporation from 1994 to 2006 and plots them as a time series.
Table 3-5 Quarterly sales for Coastal Marine Corporation,
1994-2006, for Example 3.5
year 31-Dec 31-Mar 30-Jun 30-Sep
1994 147.6 251.8 273.1 249.1
1995 139.3 221.2 260.2 259.5
1996 140.5 245.5 298.8 287.0
1997 168.8 322.6 393.5 404.3
1998 259.7 401.1 464.6 479.7
1999 264.4 402.6 411.3 385.9
2000 232.7 309.2 310.7 293.0
2001 205.1 234.4 285.4 285.7
2002 193.2 263.7 292.5 315.2
2003 178.3 274.5 295.4 286.4
2004 190.8 263.5 318.8 305.5
2005 242.6 318.8 329.6 338.2
2006 232.1 285.6 291.0 281.4
Choosing a Forecasting Technique
Forecasting Techniques for Stationary Data
A stationary series is one whose mean value is not changing
over time.
It is important to recognize that stationary data do not
necessarily vary randomly about the mean level; stationary
series can be autocorrelated.
Stationary forecasting techniques are used in the following
circumstances:
Example:
Transforming a series by taking logarithms, square roots, or
differences.
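For instance, a log transform turns exponential growth into a linear trend, so that its first differences are constant (the values below are hypothetical):

```python
from math import log10

# A series growing by a constant factor of 10 each period
series = [1, 10, 100, 1000, 10000]

# Differences of the logged series are (approximately) constant
log_diffs = [log10(series[t]) - log10(series[t - 1])
             for t in range(1, len(series))]
# each difference is approximately 1.0
```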
Example:
Retail sales influenced by holidays.
Techniques that should be considered when forecasting seasonal
series include classical decomposition, Census X-12, Winters'
exponential smoothing, multiple regression, and autoregressive
integrated moving average (ARIMA) models (Box-Jenkins
methods).
Techniques that should be considered when forecasting cyclical
series include classical decomposition, economic indicators,
econometric models, multiple regression, and autoregressive
integrated moving average (ARIMA) models (Box-Jenkins
methods).