Chapter 3: Box-Jenkins Seasonal Modelling

3.1 Stationarity Transformation

A "pre-differencing transformation" is often used to stabilize the seasonal variation of a time series. A common transformation is the natural logarithm:

$y_t^* = \ln(y_t)$

• "Differencing transformation":
1) $Z_t = y_t^* - y_{t-1}^*$ (first non-seasonal difference)
2) $Z_t = y_t^* - y_{t-L}^*$ (first seasonal difference, where L is the number of seasons in a year)
3) $Z_t = y_t^* - y_{t-1}^* - y_{t-L}^* + y_{t-L-1}^*$ (first seasonal and first non-seasonal difference)

Second and higher-order differences are obtained by applying the same rule repeatedly.

3.2 Autocorrelation and Partial Autocorrelation

To determine whether the data are stationary, we examine the behaviour of the sample autocorrelation (SAC) and sample partial autocorrelation (SPAC) functions of the series at both the seasonal and non-seasonal levels. The behaviour of the SAC and SPAC at lags 1 to L-3 is usually taken as their behaviour at the non-seasonal level; their behaviour at lags near L, 2L, ... describes the seasonal level.

A spike (significant memory) is said to exist at a given lag if the SAC or SPAC at that lag exceeds, in absolute value, twice its standard error.

The time series is considered stationary if its SAC cuts off or dies down reasonably quickly at both the seasonal and non-seasonal levels.

Example 3.1
• Figure 3.1 shows the monthly passenger totals ($y_t$), in thousands of passengers, from 1949 to 1959. The plot reveals a pattern of increasing seasonal variation.
• Figure 3.2 shows $y_t^* = \ln(y_t)$; the transformation appears to have equalized the seasonal variation.

[Figure 3.1: Monthly total international airline passengers (in thousands), 1949-1959. Vertical axis: no. of passengers.]

[Figure 3.2: Natural logarithms of monthly total international airline passengers, 1949-1959.]

• The following SAS output shows the SACs of $y_t^*$, of its first difference at the non-seasonal level, of its first difference at the seasonal level, and of its first difference at both the non-seasonal and seasonal levels.
• On the basis of the SACs, it appears that a first difference either at the seasonal level or at both the seasonal and non-seasonal levels is necessary to ensure stationarity of the data.

ARIMA Procedure
Name of variable = LY.
Mean of working series = 5.486478
Standard deviation = 0.414728
Number of observations = 132

Autocorrelations
Lag  Covariance  Correlation  -1 9 8 7 6 5 4 3 2 1 0 1 2 3 4 5 6 7 8 9 1
  0    0.171999    1.00000    | |********************|
  1    0.163124    0.94840    | . |******************* |
  2    0.152803    0.88839    | . |****************** |
  3    0.143954    0.83694    | . |***************** |
  4    0.136137    0.79150    | . |**************** |
  5    0.130741    0.76013    | . |*************** |
  6    0.126696    0.73661    | . |*************** |
  7    0.123230    0.71646    | . |************** |
  8    0.121237    0.70487    | . |************** |
  9    0.122719    0.71349    | . |************** |
 10    0.124451    0.72355    | . |************** |
 11    0.127306    0.74015    | . |*************** |
 12    0.128377    0.74638    | . |*************** |
 13    0.120171    0.69867    | . |************** |
 14    0.110539    0.64267    | . |*************. |
 15    0.102490    0.59587    | . |************ . |
 16    0.094860    0.55151    | . |*********** . |
 17    0.089022    0.51757    | . |********** . |
 18    0.084737    0.49266    | . |********** . |
 19    0.081216    0.47219    | . |********* . |
 20    0.079499    0.46220    | . |********* . |
 21    0.080921    0.47047    | . |********* . |
 22    0.082292    0.47845    | . |********** . |
 23    0.084129    0.48913    | . |********** . |
 24    0.084738    0.49267    | . |********** . |
"." marks two standard errors

ARIMA Procedure
Name of variable = LY.
Period(s) of Differencing = 1.
Mean of working series = 0.009812
Standard deviation = 0.106038
Number of observations = 131
NOTE: The first observation was eliminated by differencing.
Autocorrelations
Lag  Covariance   Correlation  -1 9 8 7 6 5 4 3 2 1 0 1 2 3 4 5 6 7 8 9 1
  0   0.011244      1.00000    | |********************|
  1   0.0021211     0.18864    | . |**** |
  2  -0.0014190    -0.12620    | .***| . |
  3  -0.0017381    -0.15458    | .***| . |
  4  -0.0036763    -0.32696    | *******| . |
  5  -0.000749     -0.06661    | . *| . |
  6   0.00045338    0.04032    | . |* . |
  7  -0.0011063    -0.09839    | . **| . |
  8  -0.0038510    -0.34250    | *******| . |
  9  -0.0012310    -0.10948    | . **| . |
 10  -0.0013408    -0.11925    | . **| . |
 11   0.0022435     0.19953    | . |****. |
 12   0.0093677     0.83312    | . |***************** |
 13   0.0022267     0.19803    | . |**** . |
 14  -0.0015966    -0.14200    | . ***| . |
 15  -0.0012365    -0.10996    | . **| . |
 16  -0.0032543    -0.28942    | ******| . |
 17  -0.0005262    -0.04680    | . *| . |
 18   0.00039747    0.03535    | . |* . |
 19  -0.0011731    -0.10433    | . **| . |
 20  -0.0035000    -0.31128    | .******| . |
 21  -0.0012046    -0.10713    | . **| . |
 22  -0.000954     -0.08485    | . **| . |
 23   0.0020942     0.18625    | . |**** . |
 24   0.0080211     0.71337    | . |************** |
"." marks two standard errors

ARIMA Procedure
Name of variable = LY.
Period(s) of Differencing = 12.
Mean of working series = 0.121282
Standard deviation = 0.063215
Number of observations = 120
NOTE: The first 12 observations were eliminated by differencing.

Autocorrelations
Lag  Covariance   Correlation  -1 9 8 7 6 5 4 3 2 1 0 1 2 3 4 5 6 7 8 9 1
  0   0.0039962     1.00000    | |********************|
  1   0.0029424     0.73631    | . |*************** |
  2   0.0025646     0.64176    | . |************* |
  3   0.0019980     0.49997    | . |********** |
  4   0.0018314     0.45830    | . |********* |
  5   0.0015802     0.39543    | . |******** |
  6   0.0013155     0.32920    | . |******* |
  7   0.0010092     0.25255    | . |***** . |
  8   0.00079972    0.20012    | . |**** . |
  9   0.00058932    0.14747    | . |*** . |
 10   0.00003062    0.00766    | . | . |
 11  -0.0004257    -0.10653    | . **| . |
 12  -0.0009502    -0.23779    | . *****| . |
 13  -0.0005842    -0.14618    | . ***| . |
 14  -0.0005817    -0.14556    | . ***| . |
 15  -0.0004511    -0.11287    | . **| . |
 16  -0.0006197    -0.15507    | . ***| . |
 17  -0.0004318    -0.10805    | . **| . |
 18  -0.0005272    -0.13193    | . ***| . |
 19  -0.0005622    -0.14069    | . ***| . |
 20  -0.0006994    -0.17501    | . ****| . |
 21  -0.0005544    -0.13872    | . ***| . |
 22  -0.000448     -0.11211    | . **| . |
 23  -0.0001579    -0.03950    | . *| . |
 24  -0.0003788    -0.09480    | . **| . |
"." marks two standard errors

ARIMA Procedure
Name of variable = LY.
Period(s) of Differencing = 1,12.
Mean of working series = 0.001322
Standard deviation = 0.044889
Number of observations = 119
NOTE: The first 13 observations were eliminated by differencing.

Autocorrelations
Lag  Covariance   Correlation  -1 9 8 7 6 5 4 3 2 1 0 1 2 3 4 5 6 7 8 9 1
  0   0.0020150     1.00000    | |********************|
  1  -0.0006363    -0.31578    | ******| . |
  2   0.00021167    0.10505    | . |** . |
  3  -0.0004326    -0.21469    | ****| . |
  4   0.00009253    0.04592    | . |* . |
  5   0.00006174    0.03064    | . |* . |
  6   0.00008917    0.04425    | . |* . |
  7  -0.0001315    -0.06527    | . *| . |
  8   0.0000188     0.00933    | . | . |
  9   0.00032781    0.16268    | . |***. |
 10  -0.0000966    -0.04794    | . *| . |
 11   0.00014118    0.07007    | . |* . |
 12  -0.0008188    -0.40633    | ********| . |
 13   0.00031546    0.15655    | . |*** . |
 14  -0.0000898    -0.04457    | . *| . |
 15   0.00028601    0.14194    | . |*** . |
 16  -0.000294     -0.14590    | . ***| . |
 17   0.00018672    0.09266    | . |** . |
 18  -0.0000634    -0.03145    | . *| . |
 19   0.00010845    0.05382    | . |* . |
 20  -0.0002755    -0.13673    | . ***| . |
 21   0.00006769    0.03359    | . |* . |
 22  -0.0001636    -0.08119    | . **| . |
 23   0.00044341    0.22005    | . |****. |
 24  -0.0000687    -0.03409    | . *| . |
"." marks two standard errors

Example 3.2
• Figure 3.3 shows the monthly number of people ($X_t$) employed in trade in Wisconsin from 1961 to 1975. No pre-differencing transformation appears to be necessary.

[Figure 3.3: Number of employees (in thousands), 1961-1975. Vertical axis: no. of employees.]

• Next, let's examine the SACs of $X_t$, of its first difference at the non-seasonal level, of its first difference at the seasonal level, and of its first difference at both the seasonal and non-seasonal levels.

ARIMA Procedure
Name of variable = X.
Mean of working series = 307.5584
Standard deviation = 46.62852
Number of observations = 178

Autocorrelations
Lag  Covariance  Correlation  -1 9 8 7 6 5 4 3 2 1 0 1 2 3 4 5 6 7 8 9 1
  0   2174.219     1.00000    | |********************|
  1   2111.301     0.97106    | . |******************* |
  2   2046.143     0.94109    | . |******************* |
  3   1990.467     0.91549    | . |****************** |
  4   1953.651     0.89855    | . |****************** |
  5   1923.082     0.88449    | . |****************** |
  6   1894.387     0.87130    | . |***************** |
  7   1857.165     0.85418    | . |***************** |
  8   1822.990     0.83846    | . |***************** |
  9   1795.368     0.82575    | . |***************** |
 10   1781.604     0.81942    | . |**************** |
 11   1766.588     0.81252    | . |**************** |
 12   1754.960     0.80717    | . |**************** |
 13   1689.253     0.77695    | . |**************** |
 14   1622.604     0.74629    | . |*************** |
 15   1565.605     0.72008    | . |************** |
 16   1526.444     0.70207    | . |************** |
 17   1493.548     0.68694    | . |**************. |
 18   1462.579     0.67269    | . |************* . |
 19   1424.437     0.65515    | . |************* . |
 20   1390.875     0.63971    | . |************* . |
 21   1363.633     0.62718    | . |************* . |
 22   1347.737     0.61987    | . |************ . |
 23   1328.662     0.61110    | . |************ . |
 24   1312.463     0.60365    | . |************ . |
"." marks two standard errors

ARIMA Procedure
Name of variable = X.
Period(s) of Differencing = 1.
Mean of working series = 0.902825
Standard deviation = 7.210001
Number of observations = 177
NOTE: The first observation was eliminated by differencing.
Autocorrelations
Lag  Covariance   Correlation  -1 9 8 7 6 5 4 3 2 1 0 1 2 3 4 5 6 7 8 9 1
  0   51.984116     1.00000    | |********************|
  1    1.341360     0.02580    | . |* . |
  2  -10.104648    -0.19438    | ****| . |
  3  -16.397040    -0.31542    | ******| . |
  4   -6.537721    -0.12576    | ***| . |
  5    0.720104     0.01385    | . | . |
  6   11.646511     0.22404    | . |**** |
  7    0.382655     0.00736    | . | . |
  8   -5.583873    -0.10741    | . **| . |
  9  -15.804044    -0.30402    | ******| . |
 10   -9.291756    -0.17874    | ****| . |
 11    2.139864     0.04116    | . |* . |
 12   46.868231     0.90159    | . |****************** |
 13    0.801322     0.01541    | . | . |
 14   -9.690318    -0.18641    | .****| . |
 15  -15.285807    -0.29405    | ******| . |
 16   -6.236594    -0.11997    | . **| . |
 17    0.881801     0.01696    | . | . |
 18   10.680823     0.20546    | . |**** . |
 19    0.496121     0.00954    | . | . |
 20   -4.968756    -0.09558    | . **| . |
 21  -14.320935    -0.27549    | ******| . |
 22   -8.286359    -0.15940    | . ***| . |
 23    1.685671     0.03243    | . |* . |
 24   42.361435     0.81489    | . |**************** |
"." marks two standard errors

ARIMA Procedure
Name of variable = X.
Period(s) of Differencing = 12.
Mean of working series = 10.3759
Standard deviation = 5.005722
Number of observations = 166
NOTE: The first 12 observations were eliminated by differencing.

Autocorrelations
Lag  Covariance   Correlation  -1 9 8 7 6 5 4 3 2 1 0 1 2 3 4 5 6 7 8 9 1
  0   25.057251     1.00000    | |********************|
  1   23.551046     0.93989    | . |******************* |
  2   21.750363     0.86803    | . |***************** |
  3   19.984942     0.79757    | . |**************** |
  4   18.383410     0.73366    | . |*************** |
  5   17.031926     0.67972    | . |************** |
  6   15.647808     0.62448    | . |************ |
  7   14.141135     0.56435    | . |*********** |
  8   12.707374     0.50713    | . |********** |
  9   11.123315     0.44392    | . |*********. |
 10    9.421701     0.37601    | . |******** . |
 11    7.755107     0.30950    | . |****** . |
 12    6.024674     0.24044    | . |***** . |
 13    5.018099     0.20027    | . |**** . |
 14    4.119250     0.16439    | . |*** . |
 15    3.165849     0.12634    | . |*** . |
 16    2.245328     0.08961    | . |** . |
 17    1.057665     0.04221    | . |* . |
 18   -0.103884    -0.00415    | . | . |
 19   -0.936067    -0.03736    | . *| . |
 20   -1.623877    -0.06481    | . *| . |
 21   -2.257332    -0.09009    | . **| . |
 22   -2.941722    -0.11740    | . **| . |
 23   -3.670260    -0.14647    | . ***| . |
 24   -4.472118    -0.17848    | . ****| . |
"." marks two standard errors

ARIMA Procedure
Name of variable = X.
Period(s) of Differencing = 1,12.
Mean of working series = 0.087273
Standard deviation = 1.438735
Number of observations = 165
NOTE: The first 13 observations were eliminated by differencing.

Autocorrelations
Lag  Covariance   Correlation  -1 9 8 7 6 5 4 3 2 1 0 1 2 3 4 5 6 7 8 9 1
  0    2.069959     1.00000    | |********************|
  1    0.380397     0.18377    | . |**** |
  2   -0.056837    -0.02746    | . *| . |
  3   -0.021478    -0.01038    | . | . |
  4   -0.290834    -0.14050    | ***| . |
  5   -0.0045074   -0.00218    | . | . |
  6    0.200142     0.09669    | . |**. |
  7    0.041474     0.02004    | . | . |
  8    0.187094     0.09039    | . |**. |
  9    0.197702     0.09551    | . |**. |
 10    0.0004563    0.00022    | . | . |
 11   -0.144889    -0.07000    | . *| . |
 12   -0.572732    -0.27669    | ******| . |
 13   -0.200208    -0.09672    | . **| . |
 14    0.056730     0.02741    | . |* . |
 15    0.0061858    0.00299    | . | . |
 16    0.287759     0.13902    | . |***. |
 17    0.049923     0.02412    | . | . |
 18   -0.209991    -0.10145    | . **| . |
 19   -0.198252    -0.09578    | . **| . |
 20   -0.113819    -0.05499    | . *| . |
 21   -0.039443    -0.01906    | . | . |
 22   -0.039793    -0.01922    | . | . |
 23    0.106062     0.05124    | . |* . |
 24   -0.165247    -0.07983    | . **| . |
"." marks two standard errors

Notations

Now suppose that $y_t^*$ is a pre-differencing transformed series. The general stationarity transformation is:

$Z_t = \nabla_L^D \nabla^d y_t^* = (1 - B^L)^D (1 - B)^d y_t^*$

where B is the lag (backward shift) operator, D is the degree of seasonal differencing and d is the degree of non-seasonal differencing.
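The stationarity transformation above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper names `difference` and `stationarity_transform` and the small synthetic series (period L = 4, linear trend plus a fixed seasonal pattern) are hypothetical and are not part of the SAS analysis in these notes.

```python
def difference(series, lag=1, times=1):
    """Apply (1 - B**lag)**times to a sequence of numbers."""
    out = list(series)
    for _ in range(times):
        out = [out[t] - out[t - lag] for t in range(lag, len(out))]
    return out


def stationarity_transform(y_star, d, D, L):
    """Z_t = (1 - B**L)**D (1 - B)**d y*_t (the two operators commute)."""
    return difference(difference(y_star, lag=L, times=D), lag=1, times=d)


# Hypothetical series with L = 4 seasons: linear trend plus a fixed seasonal pattern.
L = 4
seasonal = [5, -1, 3, -7]
y = [2 * t + seasonal[t % L] for t in range(20)]

z = stationarity_transform(y, d=1, D=1, L=L)

# The same values from the expanded form y_t - y_{t-1} - y_{t-L} + y_{t-L-1}.
expanded = [y[t] - y[t - 1] - y[t - L] + y[t - L - 1] for t in range(L + 1, len(y))]

# Both forms agree, and L + 1 observations are lost to differencing.
print(z == expanded, len(y) - len(z))
```

With d = D = 1 the transformation discards L + 1 observations, which is why the SAS output for monthly data (L = 12) reports that the first 13 observations were eliminated.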
3.3 Estimation and Diagnostic Checking

The general seasonal Box-Jenkins model can be written in the form

$\phi_p(B)\,\Phi_P(B^L)\,Z_t = \delta + \theta_q(B)\,\Theta_Q(B^L)\,\varepsilon_t$

where
$\phi_p(B) = (1 - \phi_1 B - \phi_2 B^2 - \dots - \phi_p B^p)$ is the non-seasonal autoregressive operator of order p,
$\Phi_P(B^L) = (1 - \Phi_{1,L} B^L - \Phi_{2,L} B^{2L} - \dots - \Phi_{P,L} B^{PL})$ is the seasonal autoregressive operator of order P,
$\theta_q(B) = (1 - \theta_1 B - \theta_2 B^2 - \dots - \theta_q B^q)$ is the non-seasonal moving average operator of order q,
$\Theta_Q(B^L) = (1 - \Theta_{1,L} B^L - \Theta_{2,L} B^{2L} - \dots - \Theta_{Q,L} B^{QL})$ is the seasonal moving average operator of order Q,
$\delta = \mu\,\phi_p(B)\,\Phi_P(B^L)$ is the constant term (with $\mu$ the mean of $Z_t$, reported by SAS as MU), and $\varepsilon_t$ is the random error term.

The ARIMA notation is usually written as ARIMA(p, d, q)(P, D, Q)_L.

Identification of the orders p, q, P and Q is basically the same as in non-seasonal Box-Jenkins models. The following table provides some guidelines for choosing non-seasonal and seasonal operators.

• Estimation is usually carried out using maximum likelihood, as in the case of non-seasonal Box-Jenkins analysis.
• As an example, consider the SPAC of the time series of Example 3.2 after first differencing at both the seasonal and non-seasonal levels.

Partial Autocorrelations
Lag  Correlation  -1 9 8 7 6 5 4 3 2 1 0 1 2 3 4 5 6 7 8 9 1
  1    0.18377    | . |**** |
  2   -0.06337    | . *| . |
  3    0.00689    | . | . |
  4   -0.14706    | ***| . |
  5    0.05599    | . |* . |
  6    0.07694    | . |**. |
  7   -0.00961    | . | . |
  8    0.08102    | . |**. |
  9    0.07084    | . |* . |
 10    0.00110    | . | . |
 11   -0.07361    | . *| . |
 12   -0.25948    | *****| . |
 13    0.01555    | . | . |
 14    0.00615    | . | . |
 15   -0.03042    | . *| . |
 16    0.09184    | . |**. |
 17   -0.01900    | . | . |
 18   -0.04354    | . *| . |
 19   -0.08056    | .**| . |
 20    0.03107    | . |* . |
 21    0.03687    | . |* . |
 22   -0.06723    | . *| . |
 23    0.03476    | . |* . |
 24   -0.18999    | ****| . |

• At the non-seasonal level, both the SAC and the SPAC appear to have a significant spike at lag 1 and to cut off after lag 1.
• One can therefore tentatively identify an AR(1), MA(1) or ARMA(1, 1) model for the non-seasonal part of the series.
• At the seasonal level, the SPAC appears to be dying down, while the SAC cuts off after lag 12.
Hence a seasonal MA(1) model is identified.
• Combining the seasonal and non-seasonal levels, we have the following tentative models:
ARIMA(1, 1, 0)(0, 1, 1)12, ARIMA(0, 1, 1)(0, 1, 1)12, ARIMA(1, 1, 1)(0, 1, 1)12

• The SAS program for estimating these models is as follows:

data employ;
input @7 x;
cards;
etc.
239.6
236.4
236.8
241.5
;
proc arima data=employ;
identify var=x(1,12);
estimate p=1 q=(12) printall plot method=ml;
estimate q=(1)(12) printall plot method=ml;
estimate p=1 q=(1)(12) printall plot method=ml;
run;

Estimation results of an ARIMA(1,1,0)(0,1,1)12 model

Maximum Likelihood Estimation
Parameter   Estimate   Approx. Std Error   T Ratio   Lag
MU           0.07694        0.07647           1.01      0
MA1,1        0.41307        0.07493           5.51     12
AR1,1        0.16005        0.07702           2.08      1

Constant Estimate = 0.0646288
Variance Estimate = 1.80012263
Std Error Estimate = 1.34168649
AIC = 570.489073
SBC = 579.80691
Number of Residuals = 165

Autocorrelation Check of Residuals
To Lag  Chi Square  DF   Prob    Autocorrelations
   6       4.42      4   0.352    0.011  -0.048   0.039  -0.103  -0.001   0.106
  12       7.61     10   0.667   -0.059   0.024   0.100  -0.020   0.013   0.059
  18      11.92     16   0.750   -0.085   0.050  -0.035   0.082   0.041  -0.063
  24      19.71     22   0.601   -0.146  -0.046   0.027  -0.020   0.103  -0.075
  30      24.38     28   0.662   -0.090   0.061  -0.064  -0.044   0.068  -0.030

Estimation results of an ARIMA(0,1,1)(0,1,1)12 model

Maximum Likelihood Estimation
Parameter   Estimate   Approx. Std Error   T Ratio   Lag
MU           0.07723        0.07570           1.02      0
MA1,1       -0.17261        0.07695          -2.24      1
MA2,1        0.40941        0.07502           5.46     12

Constant Estimate = 0.07723315
Variance Estimate = 1.79723527
Std Error Estimate = 1.34061004
AIC = 570.185014
SBC = 579.50285
Number of Residuals = 165

Autocorrelation Check of Residuals
To Lag  Chi Square  DF   Prob    Autocorrelations
   6       4.04      4   0.400    0.000  -0.025   0.040  -0.101  -0.002   0.105
  12       7.15     10   0.711   -0.058   0.026   0.098  -0.020   0.013   0.058
  18      11.57     16   0.773   -0.087   0.055  -0.039   0.083   0.036  -0.061
  24      19.20     22   0.633   -0.142  -0.046   0.027  -0.026   0.102  -0.076
  30      23.68     28   0.698   -0.089   0.059  -0.065  -0.043   0.063  -0.031

Estimation results of an ARIMA(1,1,1)(0,1,1)12 model

Maximum Likelihood Estimation
Parameter   Estimate   Approx. Std Error   T Ratio   Lag
MU           0.07778        0.07374           1.05      0
MA1,1       -0.45907        0.37080          -1.24      1
MA2,1        0.40276        0.07533           5.35     12
AR1,1       -0.29315        0.39756          -0.74      1

Constant Estimate = 0.10058242
Variance Estimate = 1.80600827
Std Error Estimate = 1.34387807
AIC = 571.896463
SBC = 584.320245
Number of Residuals = 165

Autocorrelation Check of Residuals
To Lag  Chi Square  DF   Prob    Autocorrelations
   6       3.19      3   0.364    0.010   0.019   0.015  -0.087  -0.005   0.101
  12       5.88      9   0.752   -0.050   0.031   0.093  -0.011   0.009   0.054
  18      10.55     15   0.784   -0.089   0.059  -0.042   0.089   0.027  -0.060
  24      18.00     21   0.649   -0.140  -0.053   0.025  -0.028   0.097  -0.076
  30      21.80     27   0.747   -0.085   0.048  -0.064  -0.039   0.052  -0.033

Diagnostic checking is conducted using the Ljung-Box-Pierce statistic

$Q^* = n(n+2) \sum_{\ell=1}^{k} \frac{r_\ell^2(\hat{e})}{n - \ell}$

where n is the number of observations available after differencing, k is the number of lags included, and $r_\ell(\hat{e})$ is the sample autocorrelation of the residuals at lag $\ell$.
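As a rough check on the formula, the following Python sketch recomputes the "To Lag 6" entry for the ARIMA(0,1,1)(0,1,1)12 fit from the residual autocorrelations reported above. The function names are hypothetical, the six values are assumed to be lags 1 to 6 in order, and the degrees of freedom are taken as k minus the number of estimated ARMA parameters (which matches the DF column in the SAS output). Because the autocorrelations are rounded to three decimals, the result agrees with the SAS figures only approximately.

```python
import math


def ljung_box(residual_acf, n):
    """Q* = n(n+2) * sum over lags l = 1..k of r_l**2 / (n - l), with k = len(residual_acf)."""
    return n * (n + 2) * sum(
        r * r / (n - lag) for lag, r in enumerate(residual_acf, start=1)
    )


def chi2_upper_tail_even_df(x, df):
    """P(chi-square with df degrees of freedom > x); closed form valid for even df only."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form requires a positive even df")
    half = x / 2.0
    return math.exp(-half) * sum(half ** i / math.factorial(i) for i in range(df // 2))


n = 165  # number of residuals after differencing at lags 1 and 12
# Residual autocorrelations at lags 1..6 as printed for ARIMA(0,1,1)(0,1,1)12.
r = [0.000, -0.025, 0.040, -0.101, -0.002, 0.105]

q = ljung_box(r, n)
df = len(r) - 2  # two estimated MA parameters (MA1,1 and MA2,1); MU is not counted
p_value = chi2_upper_tail_even_df(q, df)

# Compare with the SAS row: Chi Square 4.04, DF 4, Prob 0.400.
print(round(q, 2), df, round(p_value, 3))
```

A large p-value, as here, gives no evidence of remaining autocorrelation in the residuals, so the model is judged adequate on this check.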