How to fit a time series model when you know it is wrong? Howell Tong London School of Economics Y. Xia and H. Tong, Statistical Science 2011 March 25, 2014 Introduction A New Approach Outline 1 Introduction 2 A New Approach 3 Real Data 4 Conclusions Real Data Conclusions Introduction A New Approach Real Data Conclusions Are Conventional Methods of Estimation Fit for Purpose? Conventional Methods of Parametric Modelling 1 Assume that there is a true model; 2 Evaluate the efficacy of the estimation as if the postulated model is true. Introduction A New Approach Real Data Conclusions Are Conventional Methods of Estimation Fit for Purpose? Let y = the observed/real data x = the data generated by the postulated model. Introduction A New Approach Real Data Conclusions Are Conventional Methods of Estimation Fit for Purpose? Postulated Model xt = gθ (xt−1 , ..., xt−p ) + εt , (1) where εt is the innovation, and the function gθ (.) is known up to parameters θ. Introduction A New Approach Real Data Conclusions Are Conventional Methods of Estimation Fit for Purpose? Observed Data: {y1 , y2 , · · · , yT } Conventional methods Typically minimize a loss function −1 L(θ) = (T − p) T −1 X {yt+1 − gθ (yt , ..., yt−p+1 )}2 , t=p where, here and elsewhere, T denotes the sample size. Introduction A New Approach Real Data Conclusions Are Conventional Methods of Estimation Fit for Purpose? Efficient IF the postulated model xt = gθ (xt−1 , ..., xt−p ) + εt , is true. Introduction A New Approach Real Data Conclusions Are Conventional Methods of Estimation Fit for Purpose? Match observations {y1 , y2 , ..., yT }, by postulating g. Postulate linear gθ , Gaussian εt , zero mean, finite var(εt ). Let {C(j), j = 0, 1, 2, ..., T − 1} denote the sample autocovariance function of the y-data. Minimizing L(θ) yields well-known estimates of θ that are functions of S = {C(0), C(1), ..., C(p)}. Introduction A New Approach Real Data Conclusions Are Conventional Methods of Estimation Fit for Purpose? 1 Postulated model is right S is a minimal set of sufficient statistics (ignoring boundary effects) 2 Postulated model is wrong It is unlikely that S is a minimal set of sufficient statistics. Introduction A New Approach Real Data Multi-step predictions ARIMA(0,1,1) model leads to EWMA predictor: (1 − B)Xt = (1 − ηB)et X̂t (1) = (1 − η)Xt + η X̂t−1 (1). X̂t (ℓ) = X̂t (ℓ − 1); ℓ = 2, 3, . . . Conclusions Introduction A New Approach Real Data Figure: Box and Jenkins Series A-Chemical Process Concentration Readings: every two hours Conclusions Introduction A New Approach Real Data Conclusions Figure: Ave(η(ℓ)) refers to model in pocket ℓ. R̂(ℓ) is ratio of ave squared prediction errors using η estimated in 1st pocket to that in the Introduction A New Approach Real Data George Box 1 Essentially, all models are wrong, but some are useful. 2 Consistency of estimation is predicated on the model being correct/true. 3 Post-modelling reconciliation of Box’s dictum : diagnostic checks, goodness of fit tests, etc. Challenge Recognising Box’s dictum right at the modelling stage. Conclusions Introduction A New Approach Real Data Conclusions Catch-all Approach Our objective Good matching of salient features of the observed time series. Short-term prediction is secondary. Introduction A New Approach Real Data Conclusions Catch-all Approach For expositional simplicity, let p = 1. We want P {x1 (θ0 ) < u1 , ..., xn (θ0 ) < un |x0 (θ0 ) = y0 } to be as close as possible to ≡ P {y1 < u1 , ..., yn < un |y0 } almost surely for some θ0 and any n and real values u1 , u2 , . . . , un . Introduction A New Approach Real Data First Weaker Form First weaker form Generalize the loss function to L(θ) = (T − p)−1 Q T −1 X X [m] wm {yt+m − gθ (yt , ..., yt−p+1 ), }2 , t=p m=1 where [m] gθ (yt , ..., yt−p+1 ) = E(xt+m |xt = yt , . . . , xt−p+1 = yt−p+1 ), P and wm ≥ 0 and wm = 1. Conclusions Introduction A New Approach Real Data Conclusions Second Weaker Form Suppose that observable{yt } and generated {xt (θ)} are both second-order stationary. DC (yt , xt (θ)) = sup ∞ X wm {γx(θ) (m) − γy (m)}2 . {wm } m=0 ∞ 1 1X fy (ω) = γ(0) + γy (k) cos(kω). 2π π k=1 DF (yt , xt (θ)) = Z π n f (ω) o f (ω) y + log( x ) − 1 dω. fy (ω) −π fx (ω) Introduction A New Approach Real Data Second Weaker Form Whittle’s likelihood T n o X I(ωj ) θ̂ = min + log(fθ (ωj )) , θ fθ (ωj ) j=1 where I(ω) = T 2 1 X yt exp(−ιωt) 2πT t=1 and ωj = 2πj/T . Whittle’s likelihood matches the second-order moments by using a natural sample version of DF (yt , xt (θ)) up to a constant. Conclusions Introduction A New Approach Real Data Conclusions Example R1 [Sea levels] 15 sea levels 10 5 0 −5 −10 −15 1994 1996 1998 2000 2002 year 2004 2006 2008 Introduction A New Approach Real Data Conclusions Example R1 [Sea levels] Postulated model= AR(6), using AIC for order determination. SDF MLE; Whittle; APE(≤ 20) 2 2 2 1 1 1 0 0 2 0 0 2 0 0 2 Introduction A New Approach Real Data Conclusions Example R1 [Sea levels] relative averaged multi-step ahead prediction errors by taking those of the one-step method as 1 unit. o = APE(≤ 10); × =APE(≤ 20); ⋆ = APE(≤ 30); ⋄ = APE(≤ 50) relative prediction errors 1.05 1 0.95 5 10 15 20 no. of steps ahead 25 30 Introduction A New Approach Real Data Conclusions Example R2: Nicholson’s Blowflies Total number of blowflies (Lucilia cuprina) under controlled laboratory conditions. Counts for every second day. The developmental delay (from egg to adult) is between 14-15 days. Nicholson obtained 361 bi-daily recordings over a 2-year period (722 days). A major transition appears to have occurred around day 400. Following Tong (1990), we consider the first part of the time series (to day 400, thus T=200), for which the population has a 19-bi-day cycle; see figure. Introduction A New Approach Real Data Conclusions We postulate the single species animal population discrete model as suggested by Gurney et al. (1980), and thus xt = P oisson(cxt−τ exp(−xt−τ /N0 )xt−τ + νxt−1 ), where we take τ = 8 (bi-days) corresponding to the time taken for an egg to develop into an adult. 3 parameters: c, N0 and ν. MLE estimates for the parameters are ĉ = 8.49, N̂0 = 528.23, ν̂ = 0.77. Catch-all method gives ĉ = 8.82, N̂0 = 604.98, ν̂ = 0.67. Introduction A New Approach Real Data Conclusions 10 10 x 10 x 10 2.5 3 power 2 2 1.5 1 1 0.5 0 0 10 20 30 40 period (bi−days/cycle) 50 0 0 10 20 30 40 period (bi−days/cycle) 50 A New Approach population population Introduction Real Data Conclusions 10000 5000 0 0 50 100 150 0 50 100 150 200 250 300 350 400 200 250 300 350 400 10000 5000 0 Time(days) Introduction A New Approach Real Data Conclusions How do the cycles change with the time needed by the fly to grow to maturity? We vary the time τ from 4 to 100 bi-days. The corresponding cycles (in bi-days) are shown in the figure. APE(≤ T ) shows a clear linearly increasing trend in the cycle-periods as τ increases. APE(≤ 1) shows strange excursions that are difficult to interpret. Introduction A New Approach Real Data Conclusions Example R3: [Measles in London] Postulated SIR model for the transmission of measles [infected (I); susceptible (S); birth (B)]: It+1 = exp(δt,k βk )St It , St+1 = St +Bt −It+1 = S0 + t X Bτ − τ =0 t+1 X τ =1 It cannot be observed; can observe yt that has mean It . For observable yt , we postulate model xt = P oi(It ), where (δt,k βk ) is employed to indicate the seasonality force, with δt,k = 1 if time t is at the kth season, 0 otherwise. For measles, the time unit for t is bi-weekly, based on the infection procedure of measle. Iτ , Introduction A New Approach Real Data Conclusions Table: Estimated parameters in the transmission model method APE(≤ 1) APE(≤ T ) APE(≤ 1) APE(≤ T ) APE(≤ 1) APE(≤ T ) APE(≤ 1) APE(≤ T ) β1 -11.92 -11.95 β9 -11.92 -11.95 β17 -11.95 -11.97 β25 -11.99 -11.99 β2 -12.00 -12.00 β10 -11.99 -11.99 β18 -12.15 -12.08 β26 -11.98 -11.98 β3 -11.88 -11.93 β11 -12.05 -12.03 β19 -12.28 -12.16 S0 178280 168190 β4 -11.99 -11.99 β12 -12.01 -12.00 β20 -12.40 -12.23 β5 -11.89 -11.93 β13 -11.93 -11.96 β21 -12.21 -12.12 β6 -11.81 -11.89 β14 -11.96 -11.98 β22 -11.99 -11.99 β7 -11.89 -11.93 β15 -11.98 -11.99 β23 -11.79 -11.87 β8 -11.97 -11.98 β16 -12.04 -12.02 β24 -11.87 -11.92 Introduction A New Approach Real Data Conclusions measles cases in London The skeletons based on APE(≤ 1) and APE(≤ T ) are shown in solid red line in panel 1 and panel 2. APE(≤ T ) much better match than APE(≤ 1) in terms of outbreak scale and cycle period. 10000 5000 0 1970 1975 1980 1985 1990 1970 1975 1980 1985 1990 1970 1975 1980 1985 1990 6000 4000 2000 adjusted biweekly births 0 3000 2000 1000 0 Introduction A New Approach Real Data Conclusions The periodogram is also much better matched by APE(≤ T ) than by APE(≤ 1). 10 power 15 10 x 10 6 10 4 5 2 0 0 50 100 150 period (biweeks/cycle) 0 x 10 0 50 100 150 period (biweeks/cycle) Introduction A New Approach Real Data Conclusions period (years/cycle) 6 5 4 3 2 1 0 0 2000 4000 6000 8000 10000 number of births Figure shows clearly that when the birth rate is high (from about 5000 and above) the cycle is annual, but when the birth rate is medium at about 3000 to 4000, the cycles become two-year cycles. Introduction A New Approach Real Data Conclusions As the birth rate gets lower, the model shows that cycle become three-year cycles or even five-year cycles. Thus, the postulated model has thrown some light on the effect of the birth rate on the frequency of outbreaks of measles. Our conclusion agrees with the changing cycle-periods observer over different time durations and at different places as reported by the epidemiologists Earn et al. Nature, 2000. Introduction A New Approach Real Data Conclusions Conclusions (1) Truer to Box than Box! (2) Ways to improve the ability of a postulated parametric model to match features (e.g. second-order moments, cycles, and others) of the observed time series that are deemed important; Introduction A New Approach Real Data Conclusions Conclusions (3) Methods based on just the one-step-ahead prediction often found unfit for purpose, in many situations: the absence of a true model short data sets observation errors highly cyclical data and others (4) Deeper understanding of Whittle’s likelihood as an extended likelihood of a model; W-likelihood is a precursor of XT-likelihood. Introduction A New Approach Real Data TONY, ENJOY YOUR PERMANENT SABBATICAL LEAVE! Conclusions Introduction A New Approach Real Data Conclusions Optimal Parameter For some positive integer m (which may be infinite), we define the optimal parameter by ϑm,w = arg min θ m X k=1 2 wk E yt+k − E{xt+k (θ)|Xt (θ) = Yt } , where Xt (θ) = (xt (θ), xt−1 (θ), . . . , xt+p−1 (θ)), and {wk } define the weight function, typically positive and summing to unity. Introduction A New Approach Real Data Conclusions Optimal Parameter For ease of exposition, we assume that the solution to the above minimization is unique. Then we have immediately Theorem Under general conditions θ̃{m} → ϑm,w in probability as the sample size T → ∞.