Time Series Forecasting: The Case for the Single Source of Error State Space Model

J. Keith Ord, Georgetown University
Ralph D. Snyder, Monash University
Anne B. Koehler, Miami University
Rob J. Hyndman, Monash University
Mark Leeds, The Kellogg Group

http://www.buseco.monash.edu.au/depts/ebs/pubs/wpapers/2005

Outline of Talk
• Background
• General SSOE model
  – Linear and nonlinear examples
  – Estimation and model selection
• General linear state space model
  – MSOE and SSOE forms
  – Parameter spaces
  – Convergence
  – Equivalent models
  – Explanatory variables
  – ARCH and GARCH models
• Advantages of SSOE

Review Paper
A New Look at Models for Exponential Smoothing (2001). JRSS, Series D [The Statistician], 50, 147-159.
Chris Chatfield, Anne Koehler, Keith Ord & Ralph Snyder

Framework Paper
A State Space Framework for Automatic Forecasting Using Exponential Smoothing (2002). International Journal of Forecasting, 18, 439-454.
Rob Hyndman, Anne Koehler, Ralph Snyder & Simone Grose

Some Background
• The Kalman filter: Kalman (1960), Kalman & Bucy (1961)
• Engineering: Jazwinski (1970), Anderson & Moore (1979)
• Regression approach: Duncan & Horn (JASA, 1972)
• Bayesian forecasting & the dynamic linear model: Harrison & Stevens (1976, JRSS B); West & Harrison (1997)
• Structural models: Harvey (1989)
• State space methods: Durbin & Koopman (2001)

Single Source of Error (SSOE) State Space Model
• Developed by Snyder (1985), among others
• Also known as the innovations representation
• Any Gaussian time series has an innovations representation [SSOE looks restrictive, but it is not!]

Why a Structural Model?
• Structural models let us formulate the model in terms of unobserved components and decompose the series into those components
• Structural models let us formulate schemes with non-linear error structures, yet with familiar forecast functions

General Framework: Notation
y_t : the observable process of interest, with information set I_t = {y_t, y_{t-1}, ..., y_1}
x_t : vector of unobservable state variables
ε_t : unobservable random errors with mean 0 and variance σ²
m_t : vector of estimators of the state variables

Single Source of Error (SSOE) State Space Model
y_t = h(x_{t-1}) + k(x_{t-1}) ε_t
x_t = f(x_{t-1}) + g(x_{t-1}, α) ε_t
ε_t ~ NID(0, σ²)
x_t is a k × 1 state vector and α is a k × 1 vector of parameters

Simple Exponential Smoothing (SES)
Measurement equation: y_t = ℓ_{t-1} + ε_t
State equation: ℓ_t = ℓ_{t-1} + α ε_t
ℓ_t is the level at time t

Another Form for the State Equation
Measurement equation: y_t = ℓ_{t-1} + ε_t
State equation: ℓ_t = ℓ_{t-1} + α (y_t − ℓ_{t-1}), or ℓ_t = α y_t + (1 − α) ℓ_{t-1}

Reduced ARIMA Form
ARIMA(0,1,1): y_t = y_{t-1} + ε_t − (1 − α) ε_{t-1}

Another SES Model
Measurement equation: y_t = ℓ_{t-1} + ℓ_{t-1} ε_t
State equation: ℓ_t = ℓ_{t-1} + α ℓ_{t-1} ε_t

Same State Equation for the Second Model
y_t = ℓ_{t-1} + ℓ_{t-1} ε_t implies ℓ_{t-1} ε_t = y_t − ℓ_{t-1}, so
ℓ_t = ℓ_{t-1} + α ℓ_{t-1} ε_t = ℓ_{t-1} + α (y_t − ℓ_{t-1})

Reduced ARIMA Model for the Second SES Model
NONE

Point Forecasts for Both Models
ŷ_t(h) = ℓ̂_t, where
ℓ̂_t = ℓ̂_{t-1} + α̂ (y_t − ℓ̂_{t-1}), or ℓ̂_t = α̂ y_t + (1 − α̂) ℓ̂_{t-1}

SSOE Model for the Holt-Winters Method
y_t = (ℓ_{t-1} + b_{t-1}) s_{t-m} + (ℓ_{t-1} + b_{t-1}) s_{t-m} ε_t
ℓ_t = (ℓ_{t-1} + b_{t-1}) + α (ℓ_{t-1} + b_{t-1}) ε_t
b_t = b_{t-1} + β (ℓ_{t-1} + b_{t-1}) ε_t
s_t = s_{t-m} + γ s_{t-m} ε_t

Likelihood, Exponential Smoothing, and Estimation
Likelihood with fixed x_0 (minus twice the concentrated log-likelihood, up to a constant):
L*(α, x_0) = n log( Σ_{t=1}^n ε_t² ) + 2 Σ_{t=1}^n log |k(x_{t-1})|
where ε_t = (y_t − h(x_{t-1})) / k(x_{t-1})
and x_t = f(x_{t-1}) + g(x_{t-1}) (y_t − h(x_{t-1})) / k(x_{t-1})

Model Selection
Akaike Information Criterion: AIC = L*(α̂, x̂_0) + 2p
p is the number of free states plus the number of parameters

General Linear State Space Model
y_t = h′ x_{t-1} + ε_t
x_t = F x_{t-1} + η_t
(ε_t, η_t′)′ ~ NID with mean 0 and covariance matrix [ σ²  v_η′ ; v_η  V_η ]

Special Cases
MSOE model: v_η = Cov(ε_t, η_t) = 0, and V_η is diagonal, that is, Cov(η_{it}, η_{jt}) = 0 for i ≠ j
SSOE model: η_t = α ε_t, so v_η = Cov(ε_t, η_t) = σ² α and V_η = Cov(η_t) = σ² α α′

Linear SSOE Model
y_t = h′ x_{t-1} + ε_t
x_t = F x_{t-1} + α ε_t
h is a k × 1 vector, F is a k × k matrix, α is a k × 1 vector

SSOE Model for Holt's Linear Trend Exponential Smoothing
y_t = [1 1] x_{t-1} + ε_t
x_t = [1 1; 0 1] x_{t-1} + [α; β] ε_t, where x_t = (ℓ_t, b_t)′

MSOE Model for Holt's Linear Trend Exponential Smoothing
y_t = ℓ_{t-1} + b_{t-1} + ε_t
ℓ_t = ℓ_{t-1} + b_{t-1} + η_{1t}
b_t = b_{t-1} + η_{2t}

Parameter Space 1
• Both correspond to the same ARIMA model in the steady state, BUT the parameter spaces differ
  – SSOE has the same space as ARIMA
  – MSOE space is a subset of the ARIMA space
• Example: for ARIMA(0,1,1), θ = 1 − α
  – MSOE has 0 < α < 1
  – SSOE has 0 < α < 2, equivalent to −1 < θ < 1

Parameter Space 2
• In general, ρ = 1 (SSOE) yields the same parameter space as ARIMA; ρ = 0 (MSOE) yields a smaller space
• No other value of ρ yields a larger parameter space than does ρ = 1 [Theorems 5.1 and 5.2]
• Restricted parameter spaces may lead to poor model choices [e.g. Morley et al., 2002]

Convergence of the Covariance Matrix for the Linear SSOE
In the Kalman filter, C_t → 0 as t → ∞, where
m_t = E(x_t | y_1, y_2, ..., y_t)
C_t = Cov(x_t | y_1, y_2, ..., y_t) = E[(x_t − m_t)(x_t − m_t)′ | y_1, y_2, ..., y_t]
m_t = F m_{t-1} + a_t (y_t − h′ m_{t-1})
Kalman gain: a_t = (σ² α + F C_{t-1} h)(σ² + h′ C_{t-1} h)^{-1}

Convergence 2
• The practical import of this result is that, provided t is not too small, we can approximate the state variable by its estimate
• That is, heuristic forecasting procedures, such as exponential smoothing, that generate forecast updates in the form of the state equations are validated

Equivalence
• Equivalent linear state space models (West and Harrison) give rise to the same forecast distribution
• For the MSOE model, the equivalence transformation H of the state vector typically produces a non-diagonal covariance matrix
• For the SSOE model, the equivalence transformation H preserves the perfect correlation of the state vectors
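The convergence result above can be checked numerically. The sketch below (not part of the talk; σ² = 1, α = 0.3, β = 0.1, and the starting covariance are illustrative assumptions) runs the Kalman recursions for the linear SSOE form of Holt's trend model, using the standard filter for correlated measurement and state noise, and shows C_t shrinking toward 0 while the gain a_t settles at α:

```python
import numpy as np

# Linear SSOE form of Holt's trend model: y_t = h'x_{t-1} + e_t,
# x_t = F x_{t-1} + alpha * e_t, with state x_t = (level, slope)'.
F = np.array([[1.0, 1.0],
              [0.0, 1.0]])
h = np.array([1.0, 1.0])
alpha = np.array([0.3, 0.1])   # smoothing parameters (illustrative)
sigma2 = 1.0

# Kalman recursions with correlated state/measurement errors:
#   innovation variance  s   = sigma2 + h' C h
#   gain                 a_t = (sigma2*alpha + F C h) / s
#   predicted state cov  P   = F C F' + sigma2 * alpha alpha'
#   filtered state cov   C   = P - s * a_t a_t'
C = np.eye(2)                  # arbitrary prior uncertainty about x_0
for t in range(200):
    s = sigma2 + h @ C @ h
    a_t = (sigma2 * alpha + F @ C @ h) / s
    P = F @ C @ F.T + sigma2 * np.outer(alpha, alpha)
    C = P - s * np.outer(a_t, a_t)

print(np.trace(C))   # effectively 0: the state becomes known exactly
print(a_t)           # the gain converges to alpha
```

Note that C = 0 is a fixed point of the recursion (with C = 0, the gain is exactly α and P collapses to σ²αα′), which is the steady state the slide describes: the filter reduces to the exponential smoothing update.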
Explanatory Variables
y_t = h′ x_{t-1} + z_t′ γ + ε_t
x_t = F x_{t-1} + α ε_t
The SSOE model can be put into a regression framework:
ỹ_t = z̃_t′ γ + ε_t
where ỹ_t is a function of y_t, and z̃_t is an augmented function of z_t and x_0

ARCH Effects
SSOE version of the ARCH(1) model:
y_t = h′ x_{t-1} + ε_t
x_t = F x_{t-1} + α ε_t
ε_t = h_t^{1/2} ξ_t, with ξ_t ~ N(0,1)
h_t = δ_0 + δ_1 ε_{t-1}²

Advantages of SSOE Models
• The mapping from the model to the forecasting equations is direct and easy to see
• ML estimation can be applied directly, without need for the Kalman updating procedure
• Nonlinear models are readily incorporated into the model framework

Further Advantages of SSOE Models
• The Akaike and Schwarz information criteria can be used to choose models, including choices among models with different numbers of unit roots in the reduced form
• Largest parameter space among state space models
• In the Kalman filter, the covariance matrix of the state vector converges to 0
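To make the ARCH(1) specification above concrete, here is a small simulation sketch (not from the talk; the local-level special case and the parameter values a = 0.3, δ0 = 1.0, δ1 = 0.3 are illustrative assumptions). The single error ε_t drives both the observation and the state, while its conditional variance follows the ARCH(1) recursion:

```python
import numpy as np

rng = np.random.default_rng(0)

# SSOE local-level model with ARCH(1) errors (illustrative parameters):
#   y_t = x_{t-1} + e_t,        x_t = x_{t-1} + a*e_t,
#   e_t = sqrt(h_t)*z_t,        h_t = d0 + d1*e_{t-1}^2,   z_t ~ N(0,1).
a, d0, d1 = 0.3, 1.0, 0.3
n = 20000

x, e_prev = 0.0, 0.0
eps = np.empty(n)
y = np.empty(n)
for t in range(n):
    h_t = d0 + d1 * e_prev**2           # conditional variance
    e_t = np.sqrt(h_t) * rng.standard_normal()
    y[t] = x + e_t                       # measurement equation
    x = x + a * e_t                      # state equation (single error)
    eps[t] = e_prev = e_t

# The unconditional error variance should be near d0/(1 - d1) = 1/0.7.
print(eps.var())
```

Because there is only one disturbance series, adding the ARCH recursion changes nothing in the state updating: the same innovations drive both equations, which is what makes conditional heteroscedasticity easy to bolt onto the SSOE framework.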