Forecasting: Principles and Practice
Rob J Hyndman

6. Transformations, stationarity, differencing

OTexts.com/fpp/2/4/
OTexts.com/fpp/8/1/
Outline
1 Transformations
2 Stationarity
3 Ordinary differencing
4 Seasonal differencing
5 Unit root tests
6 Backshift notation
Transformations to stabilize the variance

If the data show different variation at different levels of the series, then a transformation can be useful.

Denote the original observations as y_1, ..., y_n and the transformed observations as w_1, ..., w_n.

Mathematical transformations for stabilizing variation, in increasing strength:

    Square root:  w_t = √y_t
    Cube root:    w_t = y_t^(1/3)
    Logarithm:    w_t = log(y_t)

Logarithms, in particular, are useful because they are more interpretable: changes in a log value are relative (percent) changes on the original scale.
Transformations

[Figure: Square root electricity production, 1960–1990]

[Figure: Cube root electricity production, 1960–1990]

[Figure: Log electricity production, 1960–1990]

[Figure: Inverse electricity production, 1960–1990]
Box-Cox transformations

Each of these transformations is close to a member of the family of Box-Cox transformations:

    w_t = log(y_t)          if λ = 0;
    w_t = (y_t^λ − 1)/λ     if λ ≠ 0.

λ = 1: no substantive transformation
λ = 1/2: square root plus linear transformation
λ = 0: natural logarithm
λ = −1: inverse plus 1
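As a minimal sketch of this formula in R (box_cox is a hypothetical helper name; the forecast package's BoxCox() provides the same mapping):

box_cox <- function(y, lambda) {
  # Box-Cox: log for lambda = 0, scaled power otherwise
  if (lambda == 0) log(y) else (y^lambda - 1) / lambda
}

box_cox(c(1, 8, 27), lambda = 1/3)  # close to a cube root, up to a linear rescaling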
Box-Cox transformations

[Figure: Monthly electricity production with λ = 1.00, 1960–1990]
Box-Cox transformations

y_t^λ for λ close to zero behaves like logs.
If some y_t = 0, then we must have λ > 0.
If some y_t < 0, no power transformation is possible unless all the y_t are adjusted by adding a constant to every value.
Choose a simple value of λ; it makes explanation easier.
Results are relatively insensitive to the value of λ.
Often no transformation (λ = 1) is needed.
A transformation often makes little difference to the forecasts but has a large effect on prediction intervals.
Choosing λ = 0 is a simple way to force forecasts to be positive.
Back-transformation

We must reverse the transformation (or back-transform) to obtain forecasts on the original scale. The reverse Box-Cox transformations are given by

    y_t = exp(w_t)             if λ = 0;
    y_t = (λw_t + 1)^(1/λ)     if λ ≠ 0.

library(forecast)
plot(BoxCox(elec, lambda = 1/3))   # transformed series
fit <- snaive(elec, lambda = 1/3)  # model fitted on the transformed scale
plot(fit)                          # forecasts shown back-transformed
plot(fit, include = 120)           # show only the last 120 observations
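The inverse can be sketched the same way (inv_box_cox is a hypothetical helper name; forecast::InvBoxCox() implements this inverse):

inv_box_cox <- function(w, lambda) {
  # Reverse Box-Cox: exp for lambda = 0, otherwise (lambda*w + 1)^(1/lambda)
  if (lambda == 0) exp(w) else (lambda * w + 1)^(1 / lambda)
}

y <- c(1, 8, 27)
all.equal(inv_box_cox(box_cox(y, 1/3), 1/3), y)  # round-trip recovers y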
Automated Box-Cox transformations

BoxCox.lambda(elec)

This attempts to balance the seasonal fluctuations and random variation across the series.
Always check the results.
A low value of λ can give extremely large prediction intervals.
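A typical usage sketch, assuming elec is loaded as in the earlier slides:

library(forecast)
lambda <- BoxCox.lambda(elec)        # automatically selected lambda
fit <- snaive(elec, lambda = lambda) # forecasts are back-transformed automatically
plot(fit)                            # check that the prediction intervals look sensible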
ETS and transformations

A Box-Cox transformation followed by an additive ETS model is often better than an ETS model without transformation.
It makes no sense to use a Box-Cox transformation and a non-additive ETS model.
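A sketch of this combination with the forecast package; additive.only = TRUE restricts ets() to additive error, trend and seasonal components:

library(forecast)
fit <- ets(elec, additive.only = TRUE, lambda = BoxCox.lambda(elec))
plot(forecast(fit))  # forecasts back-transformed to the original scale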
Stationarity

Definition: If {y_t} is a stationary time series, then for all s, the distribution of (y_t, ..., y_{t+s}) does not depend on t.

A stationary series is:
    roughly horizontal
    constant variance
    no patterns predictable in the long term
Stationary?

[Figure: Dow-Jones index, days 0–300]
[Figure: Change in Dow-Jones index, days 0–300]
[Figure: Annual strikes in the US, 1950–1980]
[Figure: Sales of new one-family houses, USA, 1975–1995]
[Figure: Price of a dozen eggs in 1993 dollars, 1900–1980]
[Figure: Number of pigs slaughtered in Victoria (thousands), 1990–1995]
[Figure: Annual Canadian Lynx trappings, 1820–1920]
[Figure: Australian quarterly beer production (megaliters), 1995–2005]
[Figure: Annual monthly electricity production (GWh), 1960–1990]
Stationarity

Definition: If {y_t} is a stationary time series, then for all s, the distribution of (y_t, ..., y_{t+s}) does not depend on t.

Transformations help to stabilize the variance.
For ARIMA modelling, we also need to stabilize the mean.
Non-stationarity in the mean

Identifying non-stationary series:
    Use the time plot.
    The ACF of stationary data drops to zero relatively quickly.
    The ACF of non-stationary data decreases slowly.
    For non-stationary data, the value of r_1 is often large and positive.
Example: Dow-Jones index

[Figure: Dow Jones index (daily, ending 15 Jul 94)]
[Figure: ACF of the Dow-Jones index, lags 1–22]
[Figure: Change in Dow-Jones index, days 0–300]
[Figure: ACF of the daily changes in the Dow-Jones index, lags 1–23]
Differencing

Differencing helps to stabilize the mean.
The differenced series is the change between each observation in the original series: y'_t = y_t − y_{t−1}.
The differenced series will have only T − 1 values, since it is not possible to calculate a difference y'_1 for the first observation.
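In R, first differences are computed with the base diff() function. A sketch, assuming dj (the Dow-Jones series used below) is loaded, e.g. from the fpp package:

ddj <- diff(dj)  # y'_t = y_t - y_{t-1}; one observation shorter than dj
plot(ddj)
acf(ddj)         # does the differenced series resemble white noise?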
Example: Dow-Jones index

The differences of the Dow-Jones index are the day-to-day changes.
Now the series looks just like a white noise series:
    no autocorrelations outside the 95% limits;
    the Ljung-Box Q* statistic has a p-value of 0.153 for h = 10.
Conclusion: the daily change in the Dow-Jones index is essentially a random amount uncorrelated with previous days.
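The Ljung-Box statistic quoted above comes from base R's Box.test(); a sketch:

Box.test(diff(dj), lag = 10, type = "Ljung-Box")  # h = 10 lags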
Random walk model

The graph of the differenced data suggests a model for the Dow-Jones index:

    y_t − y_{t−1} = e_t    or    y_t = y_{t−1} + e_t.

The "random walk" model is very widely used for non-stationary data.
This is the model behind the naïve method.
Random walks typically have:
    long periods of apparent trends up or down
    sudden and unpredictable changes in direction.
Random walk with drift model

    y_t − y_{t−1} = c + e_t    or    y_t = c + y_{t−1} + e_t.

c is the average change between consecutive observations.
This is the model behind the drift method.
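Both models are available through forecast::rwf(); a sketch:

library(forecast)
fc1 <- rwf(dj, h = 30)                # random walk: the naive method
fc2 <- rwf(dj, h = 30, drift = TRUE)  # random walk with drift: the drift method
plot(fc2)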
Second-order differencing

Occasionally the differenced data will not appear stationary and it may be necessary to difference the data a second time:

    y''_t = y'_t − y'_{t−1}
          = (y_t − y_{t−1}) − (y_{t−1} − y_{t−2})
          = y_t − 2y_{t−1} + y_{t−2}.

y''_t will have T − 2 values.
In practice, it is almost never necessary to go beyond second-order differences.
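In R, the differences argument of diff() applies the operation repeatedly; a sketch with a toy series:

x <- cumsum(cumsum(rnorm(100)))  # a series that needs two differences
d2 <- diff(x, differences = 2)   # y''_t = y_t - 2*y_{t-1} + y_{t-2}
all.equal(d2, diff(diff(x)))     # TRUE: identical to differencing twice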
Seasonal differencing

A seasonal difference is the difference between an observation and the corresponding observation from the previous year:

    y'_t = y_t − y_{t−m}

where m = number of seasons.
For monthly data, m = 12.
For quarterly data, m = 4.
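In R, the lag argument of diff() gives seasonal differences, as in the plots that follow; a sketch for monthly data:

diff(elec, lag = 12)  # y'_t = y_t - y_{t-12}; loses the first 12 observations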
Antidiabetic drug sales

[Figure: Antidiabetic drug sales ($ million), 1995–2005]
> plot(a10)

[Figure: Log antidiabetic drug sales, 1995–2005]
> plot(log(a10))

[Figure: Annual change in monthly log antidiabetic drug sales, 1995–2005]
> plot(diff(log(a10),12))

Electricity production

[Figure: US monthly electricity production (billion kWh), 1980–2010]
> plot(usmelec)

[Figure: Logs of electricity production, 1980–2010]
> plot(log(usmelec))

[Figure: Seasonally differenced logs of electricity production, 1980–2010]
> plot(diff(log(usmelec),12))

[Figure: Doubly differenced logs of electricity production, 1980–2010]
> plot(diff(diff(log(usmelec),12),1))
Electricity production

The seasonally differenced series is closer to being stationary.
Remaining non-stationarity can be removed with a further first difference.
If y'_t = y_t − y_{t−12} denotes the seasonally differenced series, then the twice-differenced series is

    y*_t = y'_t − y'_{t−1}
         = (y_t − y_{t−12}) − (y_{t−1} − y_{t−13})
         = y_t − y_{t−1} − y_{t−12} + y_{t−13}.
Seasonal differencing

When both seasonal and first differences are applied:
    It makes no difference which is done first; the result will be the same.
    If seasonality is strong, we recommend that seasonal differencing be done first, because sometimes the resulting series will be stationary and there will be no need for a further first difference.
    It is important that, if differencing is used, the differences are interpretable.
Interpretation of differencing

First differences are the change between one observation and the next.
Seasonal differences are the change from one year to the next.
But taking lag-3 differences of yearly data, for example, results in a model which cannot be sensibly interpreted.
Unit root tests

Statistical tests to determine the required order of differencing:

1 Augmented Dickey-Fuller test: the null hypothesis is that the data are non-stationary and non-seasonal.
2 Kwiatkowski-Phillips-Schmidt-Shin (KPSS) test: the null hypothesis is that the data are stationary and non-seasonal.
3 Other tests are available for seasonal data.
Dickey-Fuller test

A test for a "unit root". Estimate the regression model

    y'_t = φy_{t−1} + b_1 y'_{t−1} + b_2 y'_{t−2} + ··· + b_k y'_{t−k}

where y'_t denotes the differenced series y_t − y_{t−1}.

The number of lagged terms, k, is usually set to be about 3.
If the original series, y_t, needs differencing, then φ̂ ≈ 0.
If y_t is already stationary, then φ̂ < 0.
In R: use adf.test().
Dickey-Fuller test in R

adf.test(x,
         alternative = c("stationary", "explosive"),
         k = trunc((length(x)-1)^(1/3)))

The default number of lagged terms is k = ⌊(T − 1)^(1/3)⌋.
Set alternative = "stationary".

> adf.test(dj)

        Augmented Dickey-Fuller Test

data:  dj
Dickey-Fuller = -1.9872, Lag order = 6, p-value = 0.5816
alternative hypothesis: stationary
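The KPSS test listed earlier is in the same tseries package; note that its null hypothesis is stationarity, so the interpretation is reversed. A sketch:

library(tseries)
kpss.test(dj)        # small p-value: evidence against stationarity
kpss.test(diff(dj))  # the differenced series should look stationary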
How many differences?

ndiffs(x)
nsdiffs(x)

Automated differencing:

ns <- nsdiffs(x)
if (ns > 0) {
  xstar <- diff(x, lag = frequency(x), differences = ns)
} else {
  xstar <- x
}
nd <- ndiffs(xstar)
if (nd > 0) {
  xstar <- diff(xstar, differences = nd)
}
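A usage sketch on the electricity data from the earlier slides (nsdiffs() and ndiffs() are in the forecast package):

library(forecast)
nsdiffs(log(usmelec))                 # number of seasonal differences suggested
ndiffs(diff(log(usmelec), lag = 12))  # further first differences suggested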
Backshift notation

A very useful notational device is the backward shift operator, B, which is used as follows:

    By_t = y_{t−1}.

In other words, B, operating on y_t, has the effect of shifting the data back one period. Two applications of B to y_t shift the data back two periods:

    B(By_t) = B²y_t = y_{t−2}.

For monthly data, if we wish to shift attention to "the same month last year", then B¹² is used, and the notation is B¹²y_t = y_{t−12}.
Backshift notation

The backward shift operator is convenient for describing the process of differencing. A first difference can be written as

    y'_t = y_t − y_{t−1} = y_t − By_t = (1 − B)y_t.

Note that a first difference is represented by (1 − B). Similarly, if second-order differences (i.e., first differences of first differences) have to be computed, then:

    y''_t = y_t − 2y_{t−1} + y_{t−2} = (1 − B)²y_t.
Backshift notation

A second-order difference is denoted (1 − B)².
A second-order difference is not the same as a second difference, which would be denoted 1 − B².
In general, a dth-order difference can be written as

    (1 − B)^d y_t.

A seasonal difference followed by a first difference can be written as

    (1 − B)(1 − B^m)y_t.
Backshift notation

The "backshift" notation is convenient because the terms can be multiplied together to see the combined effect:

    (1 − B)(1 − B^m)y_t = (1 − B − B^m + B^(m+1))y_t
                        = y_t − y_{t−1} − y_{t−m} + y_{t−m−1}.

For monthly data, m = 12 and we obtain the same result as earlier.
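A quick numerical check of this identity in R, using a toy monthly series:

y <- ts(rnorm(60), frequency = 12)
lhs <- diff(diff(y, lag = 12))   # seasonal difference, then first difference
idx <- 14:60
rhs <- y[idx] - y[idx - 1] - y[idx - 12] + y[idx - 13]
all.equal(as.numeric(lhs), rhs)  # TRUE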