Rob J Hyndman
Forecasting: Principles and Practice
9. State space models

Outline
1 Recall ETS models
2 Simple structural models
3 Linear Gaussian state space models
4 Kalman filter
5 ARIMA models in state space form
6 Kalman smoothing
7 Time varying parameter models

Recall ETS models

Exponential smoothing methods

                                          Seasonal Component
  Trend Component                 N (None)    A (Additive)    M (Multiplicative)
  N  (None)                       N,N         N,A             N,M
  A  (Additive)                   A,N         A,A             A,M
  Ad (Additive damped)            Ad,N        Ad,A            Ad,M
  M  (Multiplicative)             M,N         M,A             M,M
  Md (Multiplicative damped)      Md,N        Md,A            Md,M

General notation: ETS ("ExponenTial Smoothing"), with the three letters denoting the Error, Trend and Seasonal components respectively.

Examples:
  A,N,N: Simple exponential smoothing with additive errors
  A,A,N: Holt's linear method with additive errors
  M,A,M: Multiplicative Holt-Winters' method with multiplicative errors
Innovations state space models

- All ETS models can be written in innovations state space form.
- Additive and multiplicative versions give the same point forecasts but different prediction intervals.

Let x_t = (\ell_t, b_t, s_t, s_{t-1}, \dots, s_{t-m+1}) and \varepsilon_t \overset{iid}{\sim} N(0, \sigma^2). Then

  y_t = h(x_{t-1}) + k(x_{t-1}) \varepsilon_t,   where \mu_t = h(x_{t-1}) and e_t = k(x_{t-1}) \varepsilon_t,
  x_t = f(x_{t-1}) + g(x_{t-1}) \varepsilon_t.

Additive errors: k(x) = 1, so y_t = \mu_t + \varepsilon_t.
Multiplicative errors: k(x_{t-1}) = \mu_t, so y_t = \mu_t (1 + \varepsilon_t), and \varepsilon_t = (y_t - \mu_t)/\mu_t is a relative error.

Simple structural models

State space models

ETS state vector: x_t = (\ell_t, b_t, s_t, s_{t-1}, \dots, s_{t-m+1}).

ETS models:
- y_t depends on x_{t-1}.
- The same error process affects x_t | x_{t-1} and y_t | x_{t-1}.

Structural models:
- y_t depends on x_t.
- A different error process affects x_t | x_{t-1} and y_t | x_t.

[Diagrams: chains of states x_{t-1}, x_t, x_{t+1}, ... with observations y_t, y_{t+1}, ..., contrasting the two dependence structures.]

Local level model

Stochastically varying level (random walk) observed with noise:

  y_t = \ell_t + \varepsilon_t
  \ell_t = \ell_{t-1} + \xi_t

\varepsilon_t and \xi_t are independent Gaussian white noise processes.
Compare ETS(A,N,N), where \xi_t = \alpha \varepsilon_{t-1}.
Parameters to estimate: \sigma_\varepsilon^2 and \sigma_\xi^2.
If \sigma_\xi^2 = 0, then y_t \sim NID(\ell_0, \sigma_\varepsilon^2).
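As a quick check of these ideas, the local level model can be simulated and then estimated with StructTS. A minimal sketch, with illustrative true values \sigma_\varepsilon = 1 and \sigma_\xi = 0.5 (assumptions, not from the slides):

set.seed(123)
n <- 200
level <- cumsum(rnorm(n, sd = 0.5))   # random walk level, sigma_xi = 0.5 (assumed)
y <- ts(level + rnorm(n, sd = 1))     # observed with noise, sigma_e = 1 (assumed)
fit <- StructTS(y, type = "level")
fit$coef                              # estimated level and observation variances

For a series of this length the two estimated variances should land reasonably close to 0.25 and 1.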
Local linear trend model

Dynamic trend observed with noise:

  y_t = \ell_t + \varepsilon_t
  \ell_t = \ell_{t-1} + b_{t-1} + \xi_t
  b_t = b_{t-1} + \zeta_t

\varepsilon_t, \xi_t and \zeta_t are independent Gaussian white noise processes.
Compare ETS(A,A,N), where \xi_t = (\alpha + \beta)\varepsilon_{t-1} and \zeta_t = \beta \varepsilon_{t-1}.
Parameters to estimate: \sigma_\varepsilon^2, \sigma_\xi^2 and \sigma_\zeta^2.
If \sigma_\zeta^2 = \sigma_\xi^2 = 0, then y_t = \ell_0 + t b_0 + \varepsilon_t, so the model is a time-varying linear regression.
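The comparison with ETS(A,A,N) can be made concrete by fitting both models to the same series. A sketch, assuming the fpp and forecast packages are installed (ausair is the Australian air passengers series used elsewhere in the course):

library(fpp)        # for the ausair data (assumed available)
library(forecast)   # for ets() and the forecast methods
fit1 <- StructTS(ausair, type = "trend")            # local linear trend
fit2 <- ets(ausair, model = "AAN", damped = FALSE)  # ETS(A,A,N)
plot(forecast(fit1, h = 10))
lines(forecast(fit2, h = 10)$mean, col = "red")

Since additive ETS models are almost equivalent to the corresponding structural models, the point forecasts should be similar; the prediction intervals generally differ because the structural model has three independent error processes while ETS(A,A,N) has one.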
Basic structural model

  y_t = \ell_t + s_{1,t} + \varepsilon_t
  \ell_t = \ell_{t-1} + b_{t-1} + \xi_t
  b_t = b_{t-1} + \zeta_t
  s_{1,t} = -\sum_{j=1}^{m-1} s_{j,t-1} + \eta_t
  s_{j,t} = s_{j-1,t-1},   j = 2, \dots, m-1

\varepsilon_t, \xi_t, \zeta_t and \eta_t are independent Gaussian white noise processes.
Compare ETS(A,A,A).
Parameters to estimate: \sigma_\varepsilon^2, \sigma_\xi^2, \sigma_\zeta^2 and \sigma_\eta^2.
Deterministic seasonality if \sigma_\eta^2 = 0.

Trigonometric models

  y_t = \ell_t + \sum_{j=1}^{J} s_{j,t} + \varepsilon_t
  \ell_t = \ell_{t-1} + b_{t-1} + \xi_t
  b_t = b_{t-1} + \zeta_t
  s_{j,t} = \cos\lambda_j \, s_{j,t-1} + \sin\lambda_j \, s^*_{j,t-1} + \omega_{j,t}
  s^*_{j,t} = -\sin\lambda_j \, s_{j,t-1} + \cos\lambda_j \, s^*_{j,t-1} + \omega^*_{j,t}
  \lambda_j = 2\pi j / m

\varepsilon_t, \xi_t, \zeta_t, \omega_{j,t} and \omega^*_{j,t} are independent Gaussian white noise processes.
\omega_{j,t} and \omega^*_{j,t} have the same variance \sigma^2_{\omega,j}.
Equivalent to the BSM when \sigma^2_{\omega,j} = \sigma^2_\omega and J = m/2.
Choose J < m/2 for fewer degrees of freedom.
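The seasonal transition above is block diagonal, with one 2x2 rotation per harmonic. A sketch building it in base R (the function name is illustrative, not from any package):

trig_seasonal_G <- function(m, J = floor(m / 2)) {
  G <- matrix(0, 2 * J, 2 * J)
  for (j in 1:J) {
    lambda <- 2 * pi * j / m
    # states ordered (s_j, s*_j); rows follow the transition equations above
    G[(2*j - 1):(2*j), (2*j - 1):(2*j)] <-
      matrix(c(cos(lambda), -sin(lambda),
               sin(lambda),  cos(lambda)), 2, 2)
  }
  G
}
trig_seasonal_G(12, J = 2)   # first two harmonics for monthly data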
ETS vs structural models

- ETS models are much more general, as they allow non-linear (multiplicative) components.
- ETS allows automatic forecasting due to its larger model space.
- Additive ETS models are almost equivalent to the corresponding structural models.
- ETS models have a larger parameter space; structural model parameters are always non-negative (variances).
- Structural models are much easier to generalize (e.g., to add covariates).
- It is easier to handle missing values with structural models.
Structural models in R

StructTS(oil, type="level")
StructTS(ausair, type="trend")
StructTS(austourists, type="BSM")

fit <- StructTS(austourists, type = "BSM")
decomp <- cbind(austourists, fitted(fit))
colnames(decomp) <- c("data", "level", "slope", "seasonal")
plot(decomp, main="Decomposition of International visitor nights")

[Figure: Decomposition of International visitor nights, with data, level, slope and seasonal panels, 2000 to 2010.]

[Figure: Decomposition by ETS(A,A,A) method, with observed, level, slope and season panels, 2000 to 2010.]

Linear Gaussian state space models

Linear Gaussian SS models

  Observation equation:  y_t = f' x_t + \varepsilon_t
  State equation:        x_t = G x_{t-1} + w_t

- State vector x_t of length p.
- G is a p \times p matrix; f is a vector of length p (f' denotes its transpose).
- \varepsilon_t \sim NID(0, \sigma^2) and w_t \sim NID(0, W).

Local level model: f = G = 1, x_t = \ell_t.

Local linear trend model: f' = [1 \; 0],

  x_t = \begin{bmatrix} \ell_t \\ b_t \end{bmatrix},
  G = \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix},
  W = \begin{bmatrix} \sigma_\xi^2 & 0 \\ 0 & \sigma_\zeta^2 \end{bmatrix}
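These matrices are small enough to build directly. A minimal sketch that constructs f, G and W for the local linear trend model and simulates one path from the state space form (all variance and initial values are assumptions for illustration):

f <- c(1, 0)
G <- matrix(c(1, 0, 1, 1), 2, 2)   # column-wise fill gives [1 1; 0 1]
W <- diag(c(0.1, 0.01))            # assumed sigma_xi^2 and sigma_zeta^2
sigma <- 1                         # assumed observation sd
n <- 100
x <- c(10, 0.2)                    # assumed initial level and slope
y <- numeric(n)
for (t in 1:n) {
  # W is diagonal, so independent normals suffice for w_t
  x <- as.vector(G %*% x) + rnorm(2, sd = sqrt(diag(W)))
  y[t] <- sum(f * x) + rnorm(1, sd = sigma)
}
plot(ts(y), main = "Simulated local linear trend")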
Basic structural model in state space form

  y_t = f' x_t + \varepsilon_t,   \varepsilon_t \sim N(0, \sigma^2)
  x_t = G x_{t-1} + w_t,          w_t \sim N(0, W)

with f' = [1 \; 0 \; 1 \; 0 \; \cdots \; 0], x_t = (\ell_t, b_t, s_{1,t}, s_{2,t}, \dots, s_{m-1,t})',
W = \mathrm{diag}(\sigma_\xi^2, \sigma_\zeta^2, \sigma_\eta^2, 0, \dots, 0), and

  G = \begin{bmatrix}
    1 & 1 & 0 & 0 & \cdots & 0 & 0 \\
    0 & 1 & 0 & 0 & \cdots & 0 & 0 \\
    0 & 0 & -1 & -1 & \cdots & -1 & -1 \\
    0 & 0 & 1 & 0 & \cdots & 0 & 0 \\
    0 & 0 & 0 & 1 & \cdots & 0 & 0 \\
    \vdots & \vdots & & & \ddots & & \vdots \\
    0 & 0 & 0 & 0 & \cdots & 1 & 0
  \end{bmatrix}
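For a given seasonal period m, the BSM system matrices can be assembled mechanically. A sketch (the function name is illustrative; m >= 3 assumed):

bsm_matrices <- function(m, var_xi, var_zeta, var_eta) {
  p <- m + 1                          # states: level, slope, m-1 seasonal states
  f <- c(1, 0, 1, rep(0, m - 2))
  G <- matrix(0, p, p)
  G[1, 1:2] <- c(1, 1)                # l_t = l_{t-1} + b_{t-1}
  G[2, 2]   <- 1                      # b_t = b_{t-1}
  G[3, 3:p] <- -1                     # s_{1,t} = -(s_{1,t-1} + ... + s_{m-1,t-1})
  for (j in 4:p) G[j, j - 1] <- 1     # s_{j,t} = s_{j-1,t-1}
  W <- diag(c(var_xi, var_zeta, var_eta, rep(0, m - 2)))
  list(f = f, G = G, W = W)
}
bsm_matrices(4, 0.1, 0.01, 0.05)$G    # quarterly example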
Kalman filter

Notation:
  \hat{x}_{t|t} = E[x_t | y_1, \dots, y_t],          \hat{P}_{t|t} = Var[x_t | y_1, \dots, y_t]
  \hat{x}_{t|t-1} = E[x_t | y_1, \dots, y_{t-1}],    \hat{P}_{t|t-1} = Var[x_t | y_1, \dots, y_{t-1}]
  \hat{y}_{t|t-1} = E[y_t | y_1, \dots, y_{t-1}],    \hat{v}_{t|t-1} = Var[y_t | y_1, \dots, y_{t-1}]

Forecasting:
  \hat{y}_{t|t-1} = f' \hat{x}_{t|t-1}
  \hat{v}_{t|t-1} = f' \hat{P}_{t|t-1} f + \sigma^2

Updating (state filtering):
  \hat{x}_{t|t} = \hat{x}_{t|t-1} + \hat{P}_{t|t-1} f \hat{v}_{t|t-1}^{-1} (y_t - \hat{y}_{t|t-1})
  \hat{P}_{t|t} = \hat{P}_{t|t-1} - \hat{P}_{t|t-1} f \hat{v}_{t|t-1}^{-1} f' \hat{P}_{t|t-1}

State prediction:
  \hat{x}_{t+1|t} = G \hat{x}_{t|t}
  \hat{P}_{t+1|t} = G \hat{P}_{t|t} G' + W

Iterate for t = 1, \dots, T, assuming we know x_{1|0} and P_{1|0}. These are just conditional expectations, so the filter gives minimum-MSE estimates.

[Diagram: Kalman recursions. At each time step: 1. state prediction (filtered state at t-1 to predicted state at t); 2. forecasting the observation at t; 3. state filtering (predicted state plus observation to filtered state at t).]

Initializing the Kalman filter

- Need x_{1|0} and P_{1|0} to get started.
- Common approach for structural models: set x_{1|0} = 0 and P_{1|0} = kI for a very large k.
- There are many research papers on optimal initialization choices for Kalman recursions.
- The ETS approach was to estimate x_{1|0}, and to avoid P_{1|0} by assuming the error processes are identical.
- A random x_{1|0} could be used with ETS models, and then a form of Kalman filter would be required for estimation and forecasting. This gives more realistic prediction intervals.
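The recursions translate almost line for line into R. A minimal hand-rolled sketch (not library code) for the time-invariant model; it also skips the updating step when y_t is missing, anticipating the missing-values discussion below:

kalman_filter <- function(y, f, G, W, sigma2, x0, P0) {
  n <- length(y); p <- length(f)
  xf <- matrix(NA, n, p)                 # filtered states x_{t|t}
  yhat <- vhat <- numeric(n)
  x <- x0; P <- P0
  for (t in 1:n) {
    yhat[t] <- sum(f * x)                # forecasting
    vhat[t] <- c(t(f) %*% P %*% f) + sigma2
    if (!is.na(y[t])) {                  # updating (skipped if y_t missing)
      K <- P %*% f / vhat[t]             # = P f v^{-1}
      x <- as.vector(x + K * (y[t] - yhat[t]))
      P <- P - K %*% t(f) %*% P
    }
    xf[t, ] <- x
    x <- as.vector(G %*% x)              # state prediction
    P <- G %*% P %*% t(G) + W
  }
  list(yhat = yhat, vhat = vhat, filtered = xf)
}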
Local level model

  y_t = \ell_t + \varepsilon_t,   \varepsilon_t \sim NID(0, \sigma^2)
  \ell_t = \ell_{t-1} + u_t,      u_t \sim NID(0, q^2)

Kalman recursions:
  \hat{y}_{t|t-1} = \hat{\ell}_{t-1|t-1}
  \hat{v}_{t|t-1} = \hat{p}_{t|t-1} + \sigma^2
  \hat{\ell}_{t|t} = \hat{\ell}_{t-1|t-1} + \hat{p}_{t|t-1} \hat{v}_{t|t-1}^{-1} (y_t - \hat{y}_{t|t-1})
  \hat{p}_{t+1|t} = \hat{p}_{t|t-1} (1 - \hat{v}_{t|t-1}^{-1} \hat{p}_{t|t-1}) + q^2

Handling missing values

Iterate for t = 1, \dots, T starting with x_{1|0} and P_{1|0}, as before. If y_t is missing, skip the updating step: set \hat{x}_{t|t} = \hat{x}_{t|t-1} and \hat{P}_{t|t} = \hat{P}_{t|t-1}, then carry on with the state prediction.
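Running the kalman_filter sketch above on a simulated local level series, with one observation removed to exercise the missing-value logic (the parameter values are the assumed simulation settings):

set.seed(1)
level <- cumsum(rnorm(100, sd = 0.5))
y <- level + rnorm(100)
y[50] <- NA                             # a missing observation
out <- kalman_filter(y, f = 1, G = matrix(1), W = matrix(0.25),
                     sigma2 = 1, x0 = 0, P0 = matrix(1e7))  # diffuse-ish P_{1|0}
plot(ts(y))
lines(out$filtered[, 1], col = "blue")  # filtered level, bridging the gap at t = 50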
Multi-step forecasting

Run the same recursions for t = T+1, \dots, T+h, starting with x_{T|T} and P_{T|T} and treating the future values as missing: the updating step is skipped and the forecasts come from the state predictions.

What's so special about the Kalman filter?

- Very general equations for any model in state space format.
- Any model in state space format can easily be generalized.
- Optimal MSE forecasts.
- Easy to handle missing values.
- Easy to compute the likelihood.

Likelihood calculation

Let \theta denote all unknown parameters and f_\theta(y_t | y_1, \dots, y_{t-1}) the one-step forecast density. The likelihood is

  L(y_1, \dots, y_T; \theta) = \prod_{t=1}^{T} f_\theta(y_t | y_1, \dots, y_{t-1})

and the Gaussian log-likelihood is

  \log L = -\frac{T}{2}\log(2\pi) - \frac{1}{2}\sum_{t=1}^{T} \log \hat{v}_{t|t-1} - \frac{1}{2}\sum_{t=1}^{T} e_t^2 / \hat{v}_{t|t-1}

where e_t = y_t - \hat{y}_{t|t-1}. All terms are obtained from the Kalman filter equations.
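Given the filter output, the Gaussian log-likelihood is one line, and the variances can then be estimated numerically. A sketch for the local level model, continuing the simulated series y from the example above (optimizing on the log scale keeps the variances positive; all names are illustrative):

loglik_ll <- function(logpar, y) {       # logpar = log(c(sigma2, q2))
  v <- exp(logpar)
  out <- kalman_filter(y, f = 1, G = matrix(1), W = matrix(v[2]),
                       sigma2 = v[1], x0 = 0, P0 = matrix(1e7))
  e <- y - out$yhat
  ok <- !is.na(y)                        # missing observations contribute nothing
  -0.5 * sum(log(2 * pi * out$vhat[ok]) + e[ok]^2 / out$vhat[ok])
}
opt <- optim(log(c(1, 0.1)), function(p) -loglik_ll(p, y))
exp(opt$par)                             # estimates of sigma_e^2 and sigma_xi^2

(With a diffuse P_{1|0} the first few terms are inflated; production implementations treat the initial conditions more carefully.)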
ARIMA models in state space form

ARMA models in state space form

AR(2) model:  y_t = \phi_1 y_{t-1} + \phi_2 y_{t-2} + e_t,   e_t \sim NID(0, \sigma^2).

Let x_t = \begin{bmatrix} y_t \\ y_{t-1} \end{bmatrix} and w_t = \begin{bmatrix} e_t \\ 0 \end{bmatrix}. Then

  y_t = [1 \; 0] x_t
  x_t = \begin{bmatrix} \phi_1 & \phi_2 \\ 1 & 0 \end{bmatrix} x_{t-1} + w_t

This is now in state space form, so we can use the Kalman filter to compute the likelihood and forecasts.

Alternative formulation: let x_t = \begin{bmatrix} y_t \\ \phi_2 y_{t-1} \end{bmatrix} and w_t = \begin{bmatrix} e_t \\ 0 \end{bmatrix}. Then

  y_t = [1 \; 0] x_t
  x_t = \begin{bmatrix} \phi_1 & 1 \\ \phi_2 & 0 \end{bmatrix} x_{t-1} + w_t

which is an alternative state space form for the same model.
AR(p) model:  y_t = \phi_1 y_{t-1} + \cdots + \phi_p y_{t-p} + e_t,   e_t \sim NID(0, \sigma^2).

Let x_t = (y_t, y_{t-1}, \dots, y_{t-p+1})' and w_t = (e_t, 0, \dots, 0)'. Then

  y_t = [1 \; 0 \; \cdots \; 0] x_t
  x_t = \begin{bmatrix}
    \phi_1 & \phi_2 & \cdots & \phi_{p-1} & \phi_p \\
    1 & 0 & \cdots & 0 & 0 \\
    0 & 1 & & 0 & 0 \\
    \vdots & & \ddots & & \vdots \\
    0 & 0 & \cdots & 1 & 0
  \end{bmatrix} x_{t-1} + w_t

ARMA(1,1) model:  y_t = \phi y_{t-1} + \theta e_{t-1} + e_t,   e_t \sim NID(0, \sigma^2).

Let x_t = \begin{bmatrix} y_t \\ \theta e_t \end{bmatrix} and w_t = \begin{bmatrix} e_t \\ \theta e_t \end{bmatrix}. Then

  y_t = [1 \; 0] x_t
  x_t = \begin{bmatrix} \phi & 1 \\ 0 & 0 \end{bmatrix} x_{t-1} + w_t

ARMA(p,q) model:  y_t = \phi_1 y_{t-1} + \cdots + \phi_p y_{t-p} + \theta_1 e_{t-1} + \cdots + \theta_q e_{t-q} + e_t.

Let r = \max(p, q+1), with \theta_i = 0 for q < i \le r and \phi_j = 0 for p < j \le r. Then

  y_t = [1 \; 0 \; \cdots \; 0] x_t
  x_t = \begin{bmatrix}
    \phi_1 & 1 & 0 & \cdots & 0 \\
    \phi_2 & 0 & 1 & & \vdots \\
    \vdots & \vdots & & \ddots & 0 \\
    \phi_{r-1} & 0 & \cdots & 0 & 1 \\
    \phi_r & 0 & \cdots & 0 & 0
  \end{bmatrix} x_{t-1} +
  \begin{bmatrix} 1 \\ \theta_1 \\ \vdots \\ \theta_{r-1} \end{bmatrix} e_t

The arima function in R is implemented using this formulation.
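To make the correspondence concrete, here is the AR(2) example in the first formulation above, filtered with the kalman_filter sketch from the Kalman filter section (the AR coefficients are assumptions for the simulation):

set.seed(42)
phi <- c(0.5, 0.3)
y <- as.numeric(arima.sim(model = list(ar = phi), n = 300))
f <- c(1, 0)
G <- matrix(c(phi[1], 1, phi[2], 0), 2, 2)  # [phi1 phi2; 1 0]
W <- diag(c(1, 0))                          # w_t = (e_t, 0)', sigma_e^2 = 1
out <- kalman_filter(y, f, G, W, sigma2 = 0, x0 = c(0, 0), P0 = diag(2))
head(out$yhat)   # one-step forecasts (phi1*y_{t-1} + phi2*y_{t-2} once the filter settles)
arima(y, order = c(2, 0, 0), include.mean = FALSE)$coef  # ML estimates for comparison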
Kalman smoothing

We want an estimate of x_t | y_1, \dots, y_T for t < T, that is, \hat{x}_{t|T}:

  \hat{x}_{t|T} = \hat{x}_{t|t} + A_t (\hat{x}_{t+1|T} - \hat{x}_{t+1|t})
  \hat{P}_{t|T} = \hat{P}_{t|t} + A_t (\hat{P}_{t+1|T} - \hat{P}_{t+1|t}) A_t'

where A_t = \hat{P}_{t|t} G' \hat{P}_{t+1|t}^{-1}.

- Uses all the data, not just previous data.
- Useful for estimating missing values: \hat{y}_{t|T} = f' \hat{x}_{t|T}.
- Useful for seasonal adjustment when one of the states is a seasonal component.
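The backward pass is easiest to see in the scalar local level case, where G = 1 and P_{t+1|t} = P_{t|t} + q^2. A self-contained hand-rolled sketch (illustrative names, not library code):

kalman_smooth_ll <- function(y, sigma2, q2, x0 = 0, P0 = 1e7) {
  n <- length(y)
  xf <- Pf <- numeric(n)                 # filtered moments x_{t|t}, P_{t|t}
  x <- x0; P <- P0
  for (t in 1:n) {                       # forward (filtering) pass
    v <- P + sigma2
    if (!is.na(y[t])) {
      x <- x + P * (y[t] - x) / v
      P <- P - P^2 / v
    }
    xf[t] <- x; Pf[t] <- P
    P <- P + q2                          # predict: G = 1, W = q^2
  }
  xs <- xf; Ps <- Pf                     # backward (smoothing) pass
  for (t in (n - 1):1) {
    A <- Pf[t] / (Pf[t] + q2)            # A_t = P_{t|t} G' P_{t+1|t}^{-1}
    xs[t] <- xf[t] + A * (xs[t + 1] - xf[t])       # x_{t+1|t} = x_{t|t} since G = 1
    Ps[t] <- Pf[t] + A^2 * (Ps[t + 1] - (Pf[t] + q2))
  }
  list(filtered = xf, smoothed = xs, smoothed_var = Ps)
}

The smoothed path uses observations on both sides of t, so it is typically visibly less noisy than the filtered one, as in the StructTS plots below.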
Kalman smoothing in R

fit <- StructTS(austourists, type = "BSM")
sm <- tsSmooth(fit)
plot(austourists)
lines(sm[,1], col='blue')
lines(fitted(fit)[,1], col='red')
legend("topleft", col=c('blue','red'), lty=1,
       legend=c("Smoothed level","Filtered level"))

(Note: tsSmooth gives the smoothed states and fitted(fit) the filtered states, so the legend labels must be in that order; the original slide had them swapped.)

[Figure: austourists with the filtered (red) and smoothed (blue) level estimates, 2000 to 2010.]

fit <- StructTS(austourists, type = "BSM")
sm <- tsSmooth(fit)
plot(austourists)
# Seasonally adjusted data
aus.sa <- austourists - sm[,3]
lines(aus.sa, col='blue')

[Figure: austourists with the seasonally adjusted series overlaid in blue.]

x <- austourists
miss <- sample(1:length(x), 5)
x[miss] <- NA
fit <- StructTS(x, type = "BSM")
sm <- tsSmooth(fit)
estim <- sm[,1] + sm[,3]
plot(x, ylim=range(austourists))
points(time(x)[miss], estim[miss], col='red', pch=1)
points(time(x)[miss], austourists[miss], col='black', pch=1)
legend("topleft", pch=1, col=c('red','black'),
       legend=c("Estimate","Actual"))

[Figure: x with smoothed estimates (red) and actual values (black) at the five artificially removed time points.]

Time varying parameter models

The linear Gaussian state space model extends directly to time-varying system matrices:

  y_t = f_t' x_t + \varepsilon_t,   \varepsilon_t \sim N(0, \sigma_t^2)
  x_t = G_t x_{t-1} + w_t,          w_t \sim N(0, W_t)

Kalman recursions:
  \hat{y}_{t|t-1} = f_t' \hat{x}_{t|t-1}
  \hat{v}_{t|t-1} = f_t' \hat{P}_{t|t-1} f_t + \sigma_t^2
  \hat{x}_{t|t} = \hat{x}_{t|t-1} + \hat{P}_{t|t-1} f_t \hat{v}_{t|t-1}^{-1} (y_t - \hat{y}_{t|t-1})
  \hat{P}_{t|t} = \hat{P}_{t|t-1} - \hat{P}_{t|t-1} f_t \hat{v}_{t|t-1}^{-1} f_t' \hat{P}_{t|t-1}
  \hat{x}_{t|t-1} = G_t \hat{x}_{t-1|t-1}
  \hat{P}_{t|t-1} = G_t \hat{P}_{t-1|t-1} G_t' + W_t
Structural models with covariates

Local level with covariate:

  y_t = \ell_t + \beta z_t + \varepsilon_t
  \ell_t = \ell_{t-1} + \xi_t

with f_t' = [1 \; z_t],

  x_t = \begin{bmatrix} \ell_t \\ \beta \end{bmatrix},
  G = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix},
  W_t = \begin{bmatrix} \sigma_\xi^2 & 0 \\ 0 & 0 \end{bmatrix}

- Assumes z_t is fixed and known (as in regression).
- The estimate of \beta is given by \hat{x}_{T|T}.
- Equivalent to simple linear regression with a time-varying intercept.
- Easy to extend to multiple regression with additional terms.

Time varying regression

Simple linear regression with time-varying parameters:

  y_t = \ell_t + \beta_t z_t + \varepsilon_t
  \ell_t = \ell_{t-1} + \xi_t
  \beta_t = \beta_{t-1} + \zeta_t

with f_t' = [1 \; z_t],

  x_t = \begin{bmatrix} \ell_t \\ \beta_t \end{bmatrix},
  G = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix},
  W_t = \begin{bmatrix} \sigma_\xi^2 & 0 \\ 0 & \sigma_\zeta^2 \end{bmatrix}

- Allows a linear regression with parameters that change slowly over time.
- Parameters follow independent random walks.
- Estimates of the parameters are given by \hat{x}_{t|t} or \hat{x}_{t|T}.
Updating ("online") regression

The same idea can be used to estimate a regression iteratively as new data arrives. Simple linear regression with updating parameters:

  y_t = \ell_t + \beta_t z_t + \varepsilon_t
  \ell_t = \ell_{t-1} + \xi_t
  \beta_t = \beta_{t-1} + \zeta_t

with f_t' = [1 \; z_t],

  x_t = \begin{bmatrix} \ell_t \\ \beta_t \end{bmatrix},
  G = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix},
  W_t = \begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix}

- Since W_t = 0, the parameters are in fact constant; the filter simply refines their estimates as each observation arrives.
- Updated parameter estimates are given by \hat{x}_{t|t}.
- Recursive residuals are given by y_t - \hat{y}_{t|t-1}.

A short sketch of this filter follows.
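This filter is short enough to write out directly. A sketch on simulated data (illustrative names and values); with W_t = 0 the recursion reproduces least squares estimates computed sequentially, while putting small positive values on the diagonal of W_t gives the time-varying regression of the previous slide:

set.seed(7)
n <- 100
z <- rnorm(n)
y <- 2 + 1.5 * z + rnorm(n)        # assumed true intercept 2, slope 1.5, sigma = 1
x <- c(0, 0)                        # state (l, beta)
P <- diag(2) * 1e7                  # diffuse start
est <- matrix(NA, n, 2)
for (t in 1:n) {
  ft <- c(1, z[t])                  # time-varying f_t
  v <- c(t(ft) %*% P %*% ft) + 1    # sigma^2 = 1 assumed known
  K <- P %*% ft / v
  x <- as.vector(x + K * (y[t] - sum(ft * x)))
  P <- P - K %*% t(ft) %*% P        # G = I and W = 0, so no prediction step needed
  est[t, ] <- x                     # updated estimates x_{t|t}
}
est[n, ]                            # compare with coef(lm(y ~ z))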