Forecasting (prediction) limits

Example: linear deterministic trend estimated by least squares,

$Y_t = \beta_0 + \beta_1 t + e_t$

The l-step-ahead forecast is

$\hat{Y}_{t+l} = E(Y_{t+l} \mid Y_1,\ldots,Y_t) = b_0 + b_1 (t+l)$

where $b_0 = b_0(Y_1,\ldots,Y_t)$ and $b_1 = b_1(Y_1,\ldots,Y_t)$ are the least-squares estimates. The forecast error is

$\hat{e}_{t+l} = Y_{t+l} - \hat{Y}_{t+l}$

Since $Y_{t+l}$ and $\hat{Y}_{t+l}$ are independent,

$\operatorname{Var}(\hat{e}_{t+l}) = \operatorname{Var}(Y_{t+l}) + \operatorname{Var}(\hat{Y}_{t+l})$

From regression analysis theory,

$\operatorname{Var}(\hat{Y}_{t+l}) = \sigma_e^2\left(\frac{1}{t} + \frac{(t+l-\bar{t})^2}{\sum_{s=1}^{t}(s-\bar{t})^2}\right)$

so that

$\operatorname{Var}(\hat{e}_{t+l}) = \sigma_e^2\left(1 + \frac{1}{t} + \frac{(t+l-\bar{t})^2}{\sum_{s=1}^{t}(s-\bar{t})^2}\right)$

Note! The average of the numbers $1, 2, \ldots, t$ is

$\bar{t} = \frac{1}{t}\sum_{s=1}^{t} s = \frac{t+1}{2}$

Hence, calculated prediction limits for $Y_{t+l}$ become

$\hat{Y}_{t+l} \pm c\,\hat{\sigma}_e\sqrt{1 + \frac{1}{t} + \frac{\left(t+l-\frac{t+1}{2}\right)^2}{\sum_{s=1}^{t}\left(s-\frac{t+1}{2}\right)^2}}$

where c is a quantile of a proper sampling distribution, determined by the use of $\hat{\sigma}_e^2$ as an estimator of $\sigma_e^2$ and by the requested coverage of the limits. For large t it suffices to use the standard normal distribution, and a good approximation is also obtained even if the term

$\frac{1}{t} + \frac{\left(t+l-\frac{t+1}{2}\right)^2}{\sum_{s=1}^{t}\left(s-\frac{t+1}{2}\right)^2}$

is omitted under the square root:

$\hat{Y}_{t+l} \pm z_{\alpha/2}\,\hat{\sigma}_e, \qquad \Pr\left(N(0,1) > z_{\alpha/2}\right) = \alpha/2$

ARIMA models:

$\hat{Y}_{t+l} \pm z_{\alpha/2}\,\hat{\sigma}_e\sqrt{\sum_{j=0}^{l-1}\hat{\psi}_j^2}$

where $\hat{\psi}_0, \ldots, \hat{\psi}_{l-1}$ are functions of the parameter estimates $\hat{\varphi}_1,\ldots,\hat{\varphi}_p$ and $\hat{\theta}_1,\ldots,\hat{\theta}_q$.

Using R:

ts <- arima(x, ...)   for fitting models
plot.Arima(ts, ...)   for plotting fitted models with 95% prediction limits (see the documentation for plot.Arima; the generic command plot can be used)
forecast.Arima        install and load the package "forecast"; gives more flexibility with respect to prediction limits

Seasonal ARIMA models

Example: the "beersales" data show a clear seasonal pattern and also a trend, possibly a quadratic trend.

Residuals from the detrended data:

beerq <- lm(beersales ~ time(beersales) + I(time(beersales)^2))
plot(y = rstudent(beerq), x = as.vector(time(beersales)), type = "b",
     pch = as.vector(season(beersales)), xlab = "Time")

A seasonal pattern remains, but possibly no long-term trend is left. The SAC and SPAC of the residuals show spikes at or close to the seasonal lags (or half-seasonal lags).

Modelling the autocorrelation at seasonal lags

Pure seasonal variation:

$Y_t = \Phi_1 Y_{t-12} + e_t$    (seasonal AR(1)_12 model)

Stationary if $|\Phi_1| < 1$, i.e. if the roots of the characteristic equation $1 - \Phi_1 x^{12} = 0$ lie outside the unit circle.

$\rho_k = \Phi_1^{k/12}$ for $k = 0, 12, 24, 36, \ldots$ and $\rho_k = 0$ otherwise.

$Y_t = e_t - \Theta_1 e_{t-12}$    (seasonal MA(1)_12 model)

Invertible if $|\Theta_1| < 1$, i.e. if the roots of the characteristic equation $1 - \Theta_1 x^{12} = 0$ lie outside the unit circle.

$\rho_0 = 1, \qquad \rho_{12} = -\frac{\Theta_1}{1+\Theta_1^2}, \qquad \rho_k = 0$ otherwise.

Non-seasonal and seasonal variation: AR(p, P)_s or ARMA(p,0)(P,0)_s

$Y_t = \varphi_1 Y_{t-1} + \cdots + \varphi_p Y_{t-p} + \Phi_1 Y_{t-s} + \cdots + \Phi_P Y_{t-Ps} + e_t$

However, we cannot rule out that the non-seasonal and the seasonal variation "interact", so it is better to use multiplicative seasonal AR models:

$(1 - \varphi_1 B - \cdots - \varphi_p B^p)(1 - \Phi_1 B^s - \cdots - \Phi_P B^{Ps})\,Y_t = e_t$

Example:

$(1 - 0.3B)(1 - 0.2B^{12})\,Y_t = e_t$
$(1 - 0.3B - 0.2B^{12} + 0.06B^{13})\,Y_t = e_t$
$Y_t = 0.3\,Y_{t-1} + 0.2\,Y_{t-12} - 0.06\,Y_{t-13} + e_t$

Multiplicative MA(q, Q)_s or ARMA(0,q)(0,Q)_s:

$Y_t = (1 - \theta_1 B - \cdots - \theta_q B^q)(1 - \Theta_1 B^s - \cdots - \Theta_Q B^{Qs})\,e_t$

Mixed models:

$(1 - \varphi_1 B - \cdots - \varphi_p B^p)(1 - \Phi_1 B^s - \cdots - \Phi_P B^{Ps})\,Y_t = (1 - \theta_1 B - \cdots - \theta_q B^q)(1 - \Theta_1 B^s - \cdots - \Theta_Q B^{Qs})\,e_t$

Many terms! Condensed expression:

$\varphi(B)\,\Phi(B^s)\,Y_t = \theta(B)\,\Theta(B^s)\,e_t$

where

$\varphi(B) = 1 - \sum_{i=1}^{p}\varphi_i B^i, \qquad \Phi(B^s) = 1 - \sum_{i=1}^{P}\Phi_i B^{si}$
$\theta(B) = 1 - \sum_{j=1}^{q}\theta_j B^j, \qquad \Theta(B^s) = 1 - \sum_{j=1}^{Q}\Theta_j B^{sj}$

This is the ARMA(p, q)(P, Q)_s model.
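As a quick numerical check of the multiplicative expansion above, the following sketch (not part of the course code; the object names are only illustrative) multiplies the two AR polynomials in R and simulates a series from the resulting model, so that the seasonal autocorrelation can be seen in the sample ACF.

# Verify the expansion of (1 - 0.3B)(1 - 0.2B^12) by polynomial multiplication.
# convolve(a, rev(b), type = "open") multiplies two coefficient vectors given in
# ascending powers of B.
ar_regular  <- c(1, -0.3)                 # 1 - 0.3B            (powers 0, 1)
ar_seasonal <- c(1, rep(0, 11), -0.2)     # 1 - 0.2B^12         (powers 0, ..., 12)
round(convolve(ar_regular, rev(ar_seasonal), type = "open"), 3)
# -> 1, -0.3, 0, ..., 0, -0.2, 0.06
#    i.e. Y_t = 0.3 Y_{t-1} + 0.2 Y_{t-12} - 0.06 Y_{t-13} + e_t

# Simulate from the expanded AR model; the sample ACF shows the (modest)
# seasonal autocorrelation at lags 12, 24, ...
set.seed(123)
y_sim <- arima.sim(model = list(ar = c(0.3, rep(0, 10), 0.2, -0.06)), n = 500)
acf(y_sim, lag.max = 48)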
Non-stationary Seasonal ARIMA models

Non-stationary at the non-seasonal level: model with dth-order regular differences,

$\nabla^d Y_t = (1-B)^d\,Y_t$

Non-stationary at the seasonal level: seasonal non-stationarity is harder to detect from a plotted time series; the seasonal variation is not stable. Model with Dth-order seasonal differences,

$\nabla_s^D Y_t = (1-B^s)^D\,Y_t$

Example: first-order monthly differences,

$\nabla_{12} Y_t = (1-B^{12})\,Y_t = Y_t - Y_{t-12}$

can follow a stable seasonal pattern.

The general seasonal ARIMA model:

$\varphi(B)\,\Phi(B^s)\,(1-B)^d\,(1-B^s)^D\,Y_t = \theta(B)\,\Theta(B^s)\,e_t$

It does not matter whether regular or seasonal differences are taken first. This is the ARIMA(p, d, q)(P, D, Q)_s model.

Model specification, fitting and diagnostic checking

Example: "beersales" data. Clearly non-stationary at the non-seasonal level, i.e. there is a long-term trend.

Investigate the SAC and SPAC of the original data: many substantial spikes at both the non-seasonal and the seasonal level. This calls for differencing at both levels. Try first-order seasonal differences first (here: monthly data).

$W_t = (1-B^{12})\,Y_t = Y_t - Y_{t-12}$

beer_sdiff1 <- diff(beersales, lag = 12)

Look at the SAC and SPAC again. Better, but now we need to try regular differences. Take first-order differences of the seasonally differenced data:

$U_t = (1-B)(1-B^{12})\,Y_t = (1-B)\,W_t = W_t - W_{t-1} = Y_t - Y_{t-12} - Y_{t-1} + Y_{t-13}$

beer_sdiff1rdiff1 <- diff(beer_sdiff1, lag = 1)

Look at the SAC and SPAC again. The SAC starts to look "good", but the SPAC does not. Take second-order differences of the seasonally differenced data, since we suspected a non-linear long-term trend:

$V_t = (1-B)^2(1-B^{12})\,Y_t = (1-B)\,U_t = U_t - U_{t-1} = W_t - 2W_{t-1} + W_{t-2} = Y_t - Y_{t-12} - 2(Y_{t-1} - Y_{t-13}) + Y_{t-2} - Y_{t-14}$

beer_sdiff1rdiff2 <- diff(diff(beer_sdiff1, lag = 1), lag = 1)

Judging from the non-seasonal and the seasonal parts of the SAC and SPAC, this could be an ARMA(2,0)(0,1)_12 or an ARMA(1,1)(0,1)_12. For the original data these models become ARIMA(2,2,0)(0,1,1)_12 and ARIMA(1,2,1)(0,1,1)_12.

model1 <- arima(beersales, order = c(2,2,0),
                seasonal = list(order = c(0,1,1), period = 12))

Series: beersales
ARIMA(2,2,0)(0,1,1)[12]
Coefficients:
          ar1      ar2     sma1
      -1.0257  -0.6200  -0.7092
s.e.   0.0596   0.0599   0.0755
sigma^2 estimated as 0.6095: log likelihood = -216.34
AIC=438.69   AICc=438.92   BIC=451.42

Diagnostic checking can be done in a condensed way with the function tsdiag; the Ljung-Box test can be obtained specifically from the function Box.test.

tsdiag(model1)

[Plots: standardized residuals, SAC of the standardized residuals, and p-values of the Ljung-Box test with K = 24.]

Box.test(residuals(model1), lag = 12, type = "Ljung-Box", fitdf = 3)

Here lag is K (how many lags are included) and fitdf is p+q+P+Q (how many degrees of freedom are withdrawn from K).

Box-Ljung test
data:  residuals(model1)
X-squared = 30.1752, df = 9, p-value = 0.0004096

For seasonal data with season length s the Ljung-Box test is usually calculated for K = s, 2s, 3s and 4s.

Box.test(residuals(model1), lag = 24, type = "Ljung-Box", fitdf = 3)
X-squared = 57.9673, df = 21, p-value = 2.581e-05

Box.test(residuals(model1), lag = 36, type = "Ljung-Box", fitdf = 3)
X-squared = 76.7444, df = 33, p-value = 2.431e-05

Box.test(residuals(model1), lag = 48, type = "Ljung-Box", fitdf = 3)
X-squared = 92.9916, df = 45, p-value = 3.436e-05

Hence, the residuals from the first model are not satisfactory.

model2 <- arima(beersales, order = c(1,2,1),
                seasonal = list(order = c(0,1,1), period = 12))
print(model2)

Series: beersales
ARIMA(1,2,1)(0,1,1)[12]
Coefficients:
          ar1      ma1     sma1
      -0.4470  -0.9998  -0.6352
s.e.   0.0678   0.0176   0.0930
sigma^2 estimated as 0.4575: log likelihood = -192.86
AIC=391.72   AICc=391.96   BIC=404.45

A better fit! But is it good?

tsdiag(model2)

Not good! We should maybe try second-order seasonal differencing too.
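As one possible continuation, the sketch below (not part of the lecture code; "model3" and its order with D = 2 are only an illustrative guess at the suggestion above) fits a candidate with second-order seasonal differences, repeats the diagnostic checks, and plots forecasts with prediction limits using the forecast package mentioned earlier.

# Illustrative candidate with second-order seasonal differencing (assumed order)
model3 <- arima(beersales, order = c(1, 2, 1),
                seasonal = list(order = c(0, 2, 1), period = 12))
print(model3)
tsdiag(model3)                                              # condensed diagnostic checks
Box.test(residuals(model3), lag = 24, type = "Ljung-Box",
         fitdf = 3)                                         # fitdf = p+q+P+Q = 3

library(forecast)                                           # forecasts with 80% and 95% limits
plot(forecast(model3, h = 24))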
Time series regression models

The classical set-up uses deterministic trend functions and seasonal indices:

$Y_t = m_t + S_t + e_t$

Examples:

$Y_t = \beta_0 + \beta_1 t + \sum_{j=2}^{12} \beta_{s,j}\, x_j(t) + e_t$    (linear trend in monthly data)

where $x_j(t) = 1$ if t is in month j and 0 otherwise;

$Y_t = \beta_0 + \beta_1 t + \beta_2 t^2 + e_t$    (quadratic trend, no seasonal variation)

The classical set-up can be extended by allowing for autocorrelated error terms (instead of white noise); usually an AR(1) or AR(2) is sufficient. However, the trend and seasonal terms are still assumed deterministic.

Dynamic time series regression models

To extend the classical set-up with explanatory variables comprising other time series, we need another way of modelling. Note that a stationary ARMA model

$Y_t = \theta_0 + \varphi_1 Y_{t-1} + \cdots + \varphi_p Y_{t-p} + e_t - \theta_1 e_{t-1} - \cdots - \theta_q e_{t-q}$

$(1 - \varphi_1 B - \cdots - \varphi_p B^p)\,Y_t = \theta_0 + (1 - \theta_1 B - \cdots - \theta_q B^q)\,e_t$

$\varphi(B)\,Y_t = \theta_0 + \theta(B)\,e_t$

can also be written

$Y_t = \beta_0 + \frac{\theta(B)}{\varphi(B)}\,e_t, \qquad \beta_0 = \frac{\theta_0}{\varphi(B)}$

The general dynamic regression model for a response time series $Y_t$ with one covariate time series $X_t$ can be written

$Y_t = \beta_0 + \frac{\omega(B)}{\delta(B)}\,B^b\,X_t + \frac{\theta(B)}{\varphi(B)}\,e_t$

Special case 1: $X_t$ relates to some event that has occurred at a certain time point T (e.g. 9/11). It can then be either a step function

$X_t = S_t^{(T)} = \begin{cases} 1 & t \ge T \\ 0 & t < T \end{cases}$

or a pulse function

$X_t = P_t^{(T)} = \begin{cases} 1 & t = T \\ 0 & t \ne T \end{cases}$

A step function implies a permanent change in the level of $Y_t$. Such a change can be constant or gradually increasing (depending on $\omega(B)$ and $\delta(B)$), and it can also be delayed (depending on b).

A pulse function implies a temporary change in the level of $Y_t$. Such a change may occur just at the specific time point or decrease gradually (depending on $\omega(B)$ and $\delta(B)$).

Step and pulse functions are used to model the effects of a particular event, a so-called intervention; the resulting models are called intervention models. For $X_t$ being a "regular" time series (i.e. one that varies with time), the models are instead called transfer function models.
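To make the intervention idea concrete, here is a minimal sketch of how such a model could be fitted with the arimax() function from the TSA package. The series y, the intervention time T0, the ARIMA orders and the transfer-function orders are all assumptions made for illustration, not part of the lecture material.

library(TSA)
T0 <- 100                                    # hypothetical intervention time (index in the series)
pulse <- 1 * (seq_along(y) == T0)            # pulse function P_t^(T): 1 at t = T0, 0 otherwise
# step <- 1 * (seq_along(y) >= T0)           # a step function S_t^(T) would model a permanent change

fit <- arimax(y, order = c(0, 1, 1),
              seasonal = list(order = c(0, 1, 1), period = 12),
              xtransf  = data.frame(pulse = pulse),
              transfer = list(c(1, 0)))      # omega_0 / (1 - delta_1 B): a gradually decaying pulse effect
fit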