Rob J Hyndman
Forecasting: Principles and Practice
6. Transformations, stationarity, differencing
OTexts.com/fpp/2/4/  OTexts.com/fpp/8/1/

Outline
1 Transformations
2 Stationarity
3 Ordinary differencing
4 Seasonal differencing
5 Unit root tests
6 Backshift notation

Transformations to stabilize the variance

If the data show different variation at different levels of the series, then a transformation can be useful. Denote the original observations as y_1, ..., y_n and the transformed observations as w_1, ..., w_n.

Mathematical transformations for stabilizing variation, in increasing strength:

    Square root   w_t = y_t^(1/2)
    Cube root     w_t = y_t^(1/3)
    Logarithm     w_t = log(y_t)

Logarithms, in particular, are useful because they are more interpretable: changes in a log value are relative (percentage) changes on the original scale.

[Figures: monthly electricity production, 1960-1990, plotted on square root, cube root, log and inverse scales.]
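As a concrete illustration (not part of the original slides), here is a minimal R sketch of these transformations, assuming the fpp package, which loads the forecast package and the elec series used throughout this deck:

    # Plot the electricity production series under each transformation.
    library(fpp)
    par(mfrow = c(2, 2))
    plot(elec,       main = "Original")
    plot(sqrt(elec), main = "Square root")
    plot(elec^(1/3), main = "Cube root")
    plot(log(elec),  main = "Logarithm")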
Box-Cox transformations

Each of these transformations is close to a member of the family of Box-Cox transformations:

    w_t = log(y_t)          if λ = 0;
    w_t = (y_t^λ − 1)/λ     if λ ≠ 0.

λ = 1: no substantive transformation
λ = 1/2: square root plus linear transformation
λ = 0: natural logarithm
λ = −1: inverse plus 1

[Figure: monthly electricity production, 1960-1990, with λ = 1.00.]

Box-Cox transformations: choosing λ

y_t^λ for λ close to zero behaves like logs.
- If some y_t = 0, then we must have λ > 0.
- If some y_t < 0, no power transformation is possible unless all y_t are adjusted by adding a constant to all values.
- Choose a simple value of λ; it makes explanation easier.
- Results are relatively insensitive to the value of λ.
- Often no transformation (λ = 1) is needed.
- A transformation often makes little difference to the forecasts but has a large effect on prediction intervals.
- Choosing λ = 0 is a simple way to force forecasts to be positive.
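To make the piecewise definition concrete, here is the formula written out by hand; this is a sketch of what forecast's BoxCox() computes for positive data:

    # Direct implementation of the Box-Cox formula above.
    box_cox <- function(y, lambda) {
      if (lambda == 0) log(y) else (y^lambda - 1) / lambda
    }

    library(fpp)
    # Compare with the package function; the difference should be near zero.
    max(abs(box_cox(elec, 1/3) - BoxCox(elec, lambda = 1/3)))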
Back-transformation

We must reverse the transformation (or back-transform) to obtain forecasts on the original scale. The reverse Box-Cox transformations are given by

    y_t = exp(w_t)              if λ = 0;
    y_t = (λ w_t + 1)^(1/λ)     if λ ≠ 0.

    plot(BoxCox(elec, lambda=1/3))
    fit <- snaive(elec, lambda=1/3)
    plot(fit)
    plot(fit, include=120)
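A small sketch of back-transformation in practice: forecast's InvBoxCox() reverses BoxCox(), so a round trip recovers the original series, and passing lambda to a forecasting function (as in snaive() above) applies both steps automatically:

    library(fpp)
    w <- BoxCox(elec, lambda = 1/3)
    # Round trip should recover elec (difference near zero).
    max(abs(InvBoxCox(w, lambda = 1/3) - elec))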
Automated Box-Cox transformations

    BoxCox.lambda(elec)

This attempts to balance the seasonal fluctuations and random variation across the series. Always check the results: a low value of λ can give extremely large prediction intervals.

ETS and transformations

- A Box-Cox transformation followed by an additive ETS model is often better than an ETS model without transformation.
- It makes no sense to use a Box-Cox transformation and a non-additive ETS model.
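A sketch of the combination described above, with the steps spelled out (newer versions of the forecast package can also do this in one step by passing a lambda argument to ets(); treat that as an assumption if your version predates it):

    library(fpp)
    lambda <- BoxCox.lambda(elec)              # automated choice of lambda
    # Fit an additive ETS model to the transformed data.
    fit <- ets(BoxCox(elec, lambda), additive.only = TRUE)
    fc  <- forecast(fit, h = 24)
    InvBoxCox(fc$mean, lambda)                 # point forecasts back on the original scale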
Stationarity

Definition: If {y_t} is a stationary time series, then for all s, the distribution of (y_t, ..., y_{t+s}) does not depend on t.

A stationary series is:
- roughly horizontal
- constant in variance
- without patterns predictable in the long term

Stationary?

[Figures: Dow-Jones index (daily); change in Dow-Jones index; annual strikes in the US, 1950-1980; sales of new one-family houses, USA, 1975-1995; price of a dozen eggs in 1993 dollars, 1900-1980; number of pigs slaughtered in Victoria, 1990-1995; annual Canadian lynx trappings, 1820-1920; Australian quarterly beer production, 1995-2005; Australian monthly electricity production, 1960-1990.]

Transformations help to stabilize the variance. For ARIMA modelling, we also need to stabilize the mean.

Non-stationarity in the mean

Identifying non-stationary series:
- Use a time plot.
- The ACF of stationary data drops to zero relatively quickly.
- The ACF of non-stationary data decreases slowly.
- For non-stationary data, the value of r_1 is often large and positive.
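These checks are easy to run in R; a sketch using the Dow-Jones series dj from the example that follows (Acf() is the forecast package's version of the ACF plot):

    library(fpp)
    par(mfrow = c(2, 2))
    plot(dj,       main = "Dow-Jones index")     # trending level
    Acf(dj)                                      # ACF decays slowly
    plot(diff(dj), main = "Daily changes")       # roughly horizontal
    Acf(diff(dj))                                # ACF drops to zero quickly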
Example: Dow-Jones index

[Figures: Dow-Jones index (daily, ending 15 Jul 94) with its ACF; change in the Dow-Jones index with its ACF.]

Differencing

Differencing helps to stabilize the mean. The differenced series is the change between consecutive observations in the original series:

    y'_t = y_t − y_{t−1}.

The differenced series will have only T − 1 values, since it is not possible to calculate a difference y'_1 for the first observation.

Example: Dow-Jones index

- The differences of the Dow-Jones index are the day-to-day changes.
- Now the series looks just like a white noise series: no autocorrelations outside the 95% limits.
- The Ljung-Box Q* statistic has a p-value of 0.153 for h = 10.
- Conclusion: the daily change in the Dow-Jones index is essentially a random amount uncorrelated with previous days.
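The Ljung-Box statistic quoted above can be computed with base R's Box.test(); the exact p-value may differ slightly from the 0.153 on the slide depending on the data version:

    library(fpp)
    # Ljung-Box test on the differenced Dow-Jones series, h = 10 lags.
    Box.test(diff(dj), lag = 10, type = "Ljung-Box")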
Random walk model

The graph of the differenced data suggests a model for the Dow-Jones index:

    y_t − y_{t−1} = e_t    or    y_t = y_{t−1} + e_t.

The "random walk" model is very widely used for non-stationary data. This is the model behind the naïve method. Random walks typically have:
- long periods of apparent trends up or down
- sudden and unpredictable changes in direction.

Random walk with drift model

    y_t − y_{t−1} = c + e_t    or    y_t = c + y_{t−1} + e_t.
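Both models are fitted directly by forecast's naive() and rwf() functions; a quick sketch on the Dow-Jones data:

    library(fpp)
    plot(rwf(dj, h = 50, drift = TRUE))          # random walk with drift
    lines(naive(dj, h = 50)$mean, col = "red")   # plain random walk (naive method)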
Here c is the average change between consecutive observations. This is the model behind the drift method.

Second-order differencing

Occasionally the differenced data will not appear stationary and it may be necessary to difference the data a second time:

    y''_t = y'_t − y'_{t−1}
          = (y_t − y_{t−1}) − (y_{t−1} − y_{t−2})
          = y_t − 2y_{t−1} + y_{t−2}.

y''_t will have T − 2 values. In practice, it is almost never necessary to go beyond second-order differences.

Seasonal differencing

A seasonal difference is the difference between an observation and the corresponding observation from the previous year:

    y'_t = y_t − y_{t−m}

where m is the number of seasons in a year. For monthly data m = 12; for quarterly data m = 4.
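All three kinds of difference come from base R's diff() function; a short sketch on the electricity series:

    library(fpp)
    d1 <- diff(elec)                    # y'_t = y_t - y_{t-1}: T-1 values
    d2 <- diff(elec, differences = 2)   # y''_t: T-2 values
    ds <- diff(elec, lag = 12)          # seasonal difference y_t - y_{t-12}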
Antidiabetic drug sales

[Figures: antidiabetic drug sales ($ million), 1995-2005; log antidiabetic drug sales; annual change in monthly log sales.]

    plot(a10)
    plot(log(a10))
    plot(diff(log(a10), 12))

Electricity production

[Figures: US monthly electricity production (billion kWh), 1980-2010; logs; seasonally differenced logs; doubly differenced logs.]

    plot(usmelec)
    plot(log(usmelec))
    plot(diff(log(usmelec), 12))
    plot(diff(diff(log(usmelec), 12), 1))

The seasonally differenced series is closer to being stationary. The remaining non-stationarity can be removed with a further first difference. If y'_t = y_t − y_{t−12} denotes the seasonally differenced series, then the twice-differenced series is

    y*_t = y'_t − y'_{t−1}
         = (y_t − y_{t−12}) − (y_{t−1} − y_{t−13})
         = y_t − y_{t−1} − y_{t−12} + y_{t−13}.

Seasonal differencing

When both seasonal and first differences are applied:
- It makes no difference which is done first; the result will be the same.
- If seasonality is strong, we recommend that seasonal differencing be done first, because sometimes the resulting series will be stationary and there will be no need for a further first difference.
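The first claim is easy to verify numerically on the log electricity series plotted above:

    library(fpp)
    x <- log(usmelec)
    a <- diff(diff(x, lag = 12), 1)    # seasonal difference first
    b <- diff(diff(x, 1), lag = 12)    # first difference first
    max(abs(a - b))                    # zero: the results are identical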
- It is important that, if differencing is used, the differences are interpretable.

Interpretation of differencing

- First differences are the change between one observation and the next.
- Seasonal differences are the change from one year to the next.
- But taking, say, lag-3 differences of annual data results in a model which cannot be sensibly interpreted.

Unit root tests

Statistical tests can help determine the required order of differencing.
1 Augmented Dickey-Fuller (ADF) test: the null hypothesis is that the data are non-stationary and non-seasonal.
2 Kwiatkowski-Phillips-Schmidt-Shin (KPSS) test: the null hypothesis is that the data are stationary and non-seasonal.
3 Other tests are available for seasonal data.
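A sketch of the KPSS test, from the tseries package (the same package that provides the adf.test() used below); note that its null hypothesis is the reverse of the ADF test's:

    library(fpp)       # for the dj data
    library(tseries)
    kpss.test(dj)        # null: stationary; a small p-value suggests differencing
    kpss.test(diff(dj))  # after differencing, the null is typically not rejected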
Dickey-Fuller test

A test for a "unit root". Estimate the regression model

    y'_t = φ y_{t−1} + b_1 y'_{t−1} + b_2 y'_{t−2} + ... + b_k y'_{t−k}

where y'_t denotes the differenced series y_t − y_{t−1}. The number of lagged terms, k, is usually set to be about 3.
- If the original series, y_t, needs differencing, then φ̂ ≈ 0.
- If y_t is already stationary, then φ̂ < 0.
In R: use adf.test().

Dickey-Fuller test in R

    adf.test(x, alternative = c("stationary", "explosive"),
             k = trunc((length(x)-1)^(1/3)))

The default number of lags is k = ⌊(T − 1)^(1/3)⌋. Set alternative = "stationary".
    > adf.test(dj)

        Augmented Dickey-Fuller Test

    data:  dj
    Dickey-Fuller = -1.9872, Lag order = 6, p-value = 0.5816
    alternative hypothesis: stationary

How many differences?

    ndiffs(x)
    nsdiffs(x)

Automated differencing:

    ns <- nsdiffs(x)
    if (ns > 0) {
      xstar <- diff(x, lag = frequency(x), differences = ns)
    } else {
      xstar <- x
    }
    nd <- ndiffs(xstar)
    if (nd > 0) {
      xstar <- diff(xstar, differences = nd)
    }
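Applying the automated choices to the electricity series used earlier (the exact values returned may depend on the package version):

    library(fpp)
    nsdiffs(log(usmelec))                   # suggested number of seasonal differences
    ndiffs(diff(log(usmelec), lag = 12))    # further ordinary differences needed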
Backshift notation

A very useful notational device is the backward shift operator, B, which is used as follows:

    B y_t = y_{t−1}.

In other words, B, operating on y_t, has the effect of shifting the data back one period. Two applications of B shift the data back two periods:

    B(B y_t) = B^2 y_t = y_{t−2}.

For monthly data, if we wish to shift attention to "the same month last year", then B^12 is used:

    B^12 y_t = y_{t−12}.

The backward shift operator is convenient for describing the process of differencing. A first difference can be written as

    y'_t = y_t − y_{t−1} = y_t − B y_t = (1 − B) y_t.

Note that a first difference is represented by (1 − B). Similarly, if second-order differences (i.e., first differences of first differences) have to be computed, then

    y''_t = y_t − 2y_{t−1} + y_{t−2} = (1 − B)^2 y_t.

Backshift notation and differencing

- A second-order difference is denoted (1 − B)^2.
- A second-order difference is not the same as a second difference, which would be denoted 1 − B^2.
- In general, a dth-order difference can be written as (1 − B)^d y_t.
- A seasonal difference followed by a first difference can be written as (1 − B)(1 − B^m) y_t.

The "backshift" notation is convenient because the terms can be multiplied together to see the combined effect:
    (1 − B)(1 − B^m) y_t = (1 − B − B^m + B^{m+1}) y_t
                         = y_t − y_{t−1} − y_{t−m} + y_{t−m−1}.

For monthly data, m = 12 and we obtain the same result as earlier.
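The expansion can be checked numerically: applying (1 − B)(1 − B^12) via two diff() calls gives the same values as computing y_t − y_{t−1} − y_{t−12} + y_{t−13} directly. A sketch on the log electricity series:

    library(fpp)
    y   <- log(usmelec)
    lhs <- diff(diff(y, lag = 12), 1)   # (1 - B)(1 - B^12) y_t
    n   <- length(y)
    # Direct computation of y_t - y_{t-1} - y_{t-12} + y_{t-13} for t = 14..n.
    rhs <- y[14:n] - y[13:(n-1)] - y[2:(n-12)] + y[1:(n-13)]
    max(abs(as.numeric(lhs) - rhs))     # zero: the two agree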