Some more issues of time series analysis

Time series regression with modelling of error terms

In a time series regression model

$$y_t = \beta_0 + \beta_1 t + \beta_2 t^2 + \beta_3 t^3 + \sum_{j=1}^{L-1} \beta_{s,j}\, x_{s,j,t} + \varepsilon_t$$

the error terms $\varepsilon_t$ are tentatively assumed to be independent and identically distributed. Is this wise?

By performing, e.g., the Durbin-Watson test we can quite easily answer whether they are or not.

What if the Durbin-Watson test gives evidence of serial correlation in the error terms? Then we can apply an AR(p) model to the error terms at the same time as the rest of the model is fitted.

Standard procedure:
• Study the residuals from an ordinary regression fit.
• Identify which order p of the AR model may be the most appropriate for the error terms.
• Fit the combined regression-AR model.

Estimation can no longer be done using ordinary least squares; instead the conditional least-squares method is used. Such procedures are not currently available in Minitab, but they are available in more comprehensive packages such as SAS and SPSS.

Example
Consider again the Hjälmaren monthly data set (used in the assignments for weeks 36, 39 and 41).

[Figure: Time Series Plot of Discharge.m (Discharge.m against Month/Year)]

Minitab output from an ordinary time series regression:

The regression equation is
Discharge.m = 83.1 - 0.0300 Time.m + 2.79 Jan + 6.36 Feb + 7.89 Mar + 16.1 Apr + 12.2 May - 5.06 Jun - 10.9 Jul - 10.1 Aug - 10.3 Sep - 10.1 Oct - 4.64 Nov

Predictor       Coef   SE Coef      T      P
Constant       83.13     33.60   2.47  0.013
Time.m      -0.03000   0.01727  -1.74  0.083
Jan            2.795     2.613   1.07  0.285
Feb            6.359     2.613   2.43  0.015
Mar            7.887     2.613   3.02  0.003
Apr           16.145     2.613   6.18  0.000
May           12.228     2.613   4.68  0.000
Jun           -5.059     2.613  -1.94  0.053
Jul          -10.938     2.613  -4.19  0.000
Aug          -10.144     2.613  -3.88  0.000
Sep          -10.278     2.613  -3.93  0.000
Oct          -10.138     2.613  -3.88  0.000
Nov           -4.638     2.613  -1.77  0.076

S = 19.1121   R-Sq = 18.8%   R-Sq(adj) = 18.1%

Residual plots
[Figure: Time Series Plot of RESI1]
[Figure: Autocorrelation Function for RESI1 (with 5% significance limits for the autocorrelations), lags 1 to 80]
[Figure: Partial Autocorrelation Function for RESI1 (with 5% significance limits for the partial autocorrelations), lags 1 to 80]

The residuals seem to follow an AR model of order 1 or 2.

SPSS output of a regression analysis with the error term modelled as AR(1):

FINAL PARAMETERS:
Number of residuals   1284
Standard error        15.210763
Log likelihood        -5310.1953
AIC                   10648.391
SBC                   10720.599

Variables in the Model:
                   B          SEB      T-RATIO   APPROX. PROB.
AR1          .605651      .022323    27.130704    .00000000
JAN         2.641382     1.644536     1.606156    .10848820
FEB         6.239922     2.077007     3.004285    .00271421
MAR         7.788472     2.295270     3.393270    .00071191
APR        16.059974     2.411374     6.660092    .00000000
MAY        12.151510     2.468703     4.922224    .00000097
JUN        -5.129816     2.485909    -2.063557    .03926251
JUL       -11.003682     2.468291    -4.458016    .00000900
AUG       -10.204025     2.410445    -4.233254    .00002469
SEP       -10.331080     2.293586    -4.504335    .00000727
OCT       -10.180902     2.074100    -4.908587    .00000104
NOV        -4.664558     1.639306    -2.845447    .00450615
TIME        -.031821      .034726     -.916338    .35966355
CONSTANT   86.726889    67.485642     1.285116    .19898601

The variance of the pure error term is smaller than the variance of the error term in the ordinary regression ($15.21^2 \approx 231$ versus $19.11^2 \approx 365$)!

Non-parametric tests for trend

All models taken up so far in the course are parametric models. Parametric models assume that
• a specific probability distribution (e.g. the normal distribution) governs the obtained observations, and
• the population mean of each observation can be expressed in terms of the parameters of the model.

What if we cannot specify this probability distribution?
• Least-squares fitting of time series regression models can still be done, but none of the significance tests are valid: we cannot test for the presence of a trend (nor for the presence of seasonal variation).
• Classical decomposition is still possible, but it has no built-in significance tests (the methods are all descriptive analysis tools).
• Conditional least-squares estimation in ARIMA models is derived under the assumption that the observations are normally distributed; as a consequence, the significance tests are not valid.

The Mann-Kendall test for a monotonic trend

Example: Look again at the data set of sales values from lecture 3, but restricted to the years 1985-1996.

Year   Sales value
1985   151
1986   151
1987   147
1988   149
1989   146
1990   142
1991   143
1992   145
1993   141
1994   143
1995   145
1996   138

[Figure: Sales values, 1985-1996]

Could there be a trend in the data? If there is a trend, we do not assume that it has a specific functional form, such as linear or quadratic; we only assume that it is monotonic, i.e. decreasing or increasing. In this case it would be a decreasing trend.

The sign function:

$$\operatorname{sgn}(x) = \begin{cases} 1 & x > 0 \\ -1 & x < 0 \\ 0 & x = 0 \end{cases}$$

Now define the Mann-Kendall test statistic as

$$T = \sum_{i<j} \operatorname{sgn}(y_j - y_i)$$

i.e. the statistic is a sum of +1s, -1s and 0s depending on whether $y_j$ is higher than, lower than or equal to $y_i$ for each pair of time points $(i, j)$ with $i < j$.
• Large positive values of T are consistent with an upward trend.
• Large negative values of T are consistent with a downward trend.
• Values of T around 0 are consistent with no trend.

For the current data set, summing the signs over all $\binom{12}{2} = 66$ pairs gives $T = -43$. Is T = -43 negative enough to show evidence of a trend?

The original (exact) non-parametric approach:
• Calculate the value of T for every possible ordering (permutation) of the observed values; under H0 all orderings are equally likely.
• Put these values in ascending order.
• For the test of $H_0$: no trend vs.
$H_A$: negative monotonic trend at significance level $\alpha$, calculate the $(100\alpha)$th percentile $T_\alpha$ of the ordered values.
• If the observed T is less than $T_\alpha$, reject $H_0$; otherwise "accept" $H_0$.

For a fairly long time series this procedure is quite tedious.

Approximate solution: the variance of T under $H_0$ can be shown to be

$$\operatorname{Var}(T) = \frac{1}{18}\left[n(n-1)(2n+5) - \sum_{p=1}^{g} t_p(t_p-1)(2t_p+5)\right]$$

where n is the length of the time series, g is the number of so-called ties (a tie is a value that occurs more than once) and $t_p$ is the number of observations sharing tie p. Then, for fairly large n,

$$\frac{T}{\sqrt{\operatorname{Var}(T)}} \text{ is approximately } N(0,1) \text{ if } H_0 \text{ is true.}$$

For the current time series of sales values:
n = 12
g = 3 (the values 143, 145 and 151 each occur twice)
$t_1 = t_2 = t_3 = 2$
$\operatorname{Var}(T) = \frac{1}{18}\left(12 \cdot 11 \cdot 29 - 3 \cdot 2 \cdot 1 \cdot 9\right) = \frac{3774}{18} \approx 209.67$
$\dfrac{T}{\sqrt{\operatorname{Var}(T)}} = \dfrac{-43}{\sqrt{209.67}} \approx -2.97$

The one-sided P-value is approximately 0.0015. Thus H0 may be rejected at any reasonable level of significance.

For time series with seasonal variation, Hirsch & Slack have developed a modification of the Mann-Kendall test with test statistic

$$T_S = \sum_{k=1}^{L} T_k$$

where $T_k$ is the Mann-Kendall test statistic for the time series consisting of values from season k only (e.g. for monthly data we consider the series of January values, the series of February values, etc.). Expressions for the variance of $T_S$ can be derived, and analogously to the Mann-Kendall test,

$$\frac{T_S}{\sqrt{\operatorname{Var}(T_S)}} \text{ is approximately } N(0,1) \text{ if } H_0 \text{ is true.}$$
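The normal approximation can be checked with a short script. Below is a minimal Python sketch (standard library only) of the Mann-Kendall statistic T, its tie-corrected variance and the one-sided P-value for a downward trend, plus a seasonal sum $T_S$. The function names are hypothetical and chosen for illustration only. For the seasonal part the sketch makes an explicit assumption: the seasons are treated as independent, so $\operatorname{Var}(T_S)$ is taken as the sum of the per-season variances. The Hirsch & Slack variance expression may include covariance terms between seasons, which are not included here.

```python
import math
from collections import Counter
from itertools import combinations
from statistics import NormalDist


def sgn(x):
    """Sign function: 1 if x > 0, -1 if x < 0, 0 if x == 0."""
    return (x > 0) - (x < 0)


def mann_kendall_T(y):
    """Mann-Kendall statistic T = sum over all pairs i < j of sgn(y_j - y_i)."""
    # combinations keeps positional order, so each pair is (y_i, y_j) with i < j
    return sum(sgn(yj - yi) for yi, yj in combinations(y, 2))


def mann_kendall_var(y):
    """Var(T) under H0, including the correction for ties."""
    n = len(y)
    # t_p = number of observations sharing a tied value (groups of size >= 2)
    ties = [t for t in Counter(y).values() if t > 1]
    tie_term = sum(t * (t - 1) * (2 * t + 5) for t in ties)
    return (n * (n - 1) * (2 * n + 5) - tie_term) / 18


def mann_kendall_test(y):
    """Return (T, Var(T), z, one-sided P-value for H_A: downward trend)."""
    T = mann_kendall_T(y)
    var_T = mann_kendall_var(y)
    z = T / math.sqrt(var_T)
    p_down = NormalDist().cdf(z)  # P(Z <= z) under the N(0,1) approximation
    return T, var_T, z, p_down


def seasonal_kendall_test(series_by_season):
    """Sketch of T_S = sum_k T_k over seasonal sub-series.

    ASSUMPTION: seasons are independent, so Var(T_S) is the sum of the
    per-season variances (no covariance terms between seasons).
    """
    T_S = sum(mann_kendall_T(s) for s in series_by_season)
    var_TS = sum(mann_kendall_var(s) for s in series_by_season)
    z = T_S / math.sqrt(var_TS)
    return T_S, var_TS, z


# Sales values 1985-1996 from the example above
sales = [151, 151, 147, 149, 146, 142, 143, 145, 141, 143, 145, 138]
T, var_T, z, p = mann_kendall_test(sales)
print(T, round(var_T, 2), round(z, 3), round(p, 4))
```

For the twelve sales values (n = 12) the script computes T = -43 and Var(T) = 3774/18 ≈ 209.67, so z ≈ -2.97 with a one-sided P-value of about 0.0015. The explicit loop over all pairs mirrors the definition of T rather than being optimized; for monthly data, `seasonal_kendall_test` would be called with a list of twelve sub-series (all January values, all February values, and so on).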