Forecasting: Principles and Practice
Rob J Hyndman

9. State space models
Outline
1 Recall ETS models
2 Simple structural models
3 Linear Gaussian state space models
4 Kalman filter
5 ARIMA models in state space form
6 Kalman smoothing
7 Time varying parameter models
Recall ETS models
Exponential smoothing methods

                                Seasonal Component
Trend Component                 N (None)   A (Additive)   M (Multiplicative)
N  (None)                       N,N        N,A            N,M
A  (Additive)                   A,N        A,A            A,M
Ad (Additive damped)            Ad,N       Ad,A           Ad,M
M  (Multiplicative)             M,N        M,A            M,M
Md (Multiplicative damped)      Md,N       Md,A           Md,M

General notation: ETS(Error, Trend, Seasonal), for ExponenTial Smoothing.

Examples:
A,N,N: Simple exponential smoothing with additive errors
A,A,N: Holt's linear method with additive errors
M,A,M: Multiplicative Holt-Winters' method with multiplicative errors

All ETS models can be written in innovations state space form.
Additive and multiplicative versions give the same point forecasts but different prediction intervals.
Innovations state space models

Let $x_t = (\ell_t, b_t, s_t, s_{t-1}, \dots, s_{t-m+1})$ and $\varepsilon_t \overset{\text{iid}}{\sim} N(0, \sigma^2)$.

$$y_t = \underbrace{h(x_{t-1})}_{\mu_t} + \underbrace{k(x_{t-1})\,\varepsilon_t}_{e_t}$$
$$x_t = f(x_{t-1}) + g(x_{t-1})\,\varepsilon_t$$

Additive errors: $k(x) = 1$, so $y_t = \mu_t + \varepsilon_t$.
Multiplicative errors: $k(x_{t-1}) = \mu_t$, so $y_t = \mu_t(1 + \varepsilon_t)$, and $\varepsilon_t = (y_t - \mu_t)/\mu_t$ is a relative error.
Simple structural models
State space models

[Diagram: a chain of states $x_{t-1} \to x_t \to x_{t+1} \to \dots$ with the observations $y_t, y_{t+1}, \dots$ attached to the state sequence.]

ETS state vector: $x_t = (\ell_t, b_t, s_t, s_{t-1}, \dots, s_{t-m+1})$.

ETS models: $y_t$ depends on $x_{t-1}$, and the same error process affects both $x_t | x_{t-1}$ and $y_t | x_{t-1}$.

Structural models: $y_t$ depends on $x_t$, and a different error process affects $x_t | x_{t-1}$ and $y_t | x_t$.
Local level model

Stochastically varying level (random walk) observed with noise:
$$y_t = \ell_t + \varepsilon_t$$
$$\ell_t = \ell_{t-1} + \xi_t$$

$\varepsilon_t$ and $\xi_t$ are independent Gaussian white noise processes.
Compare ETS(A,N,N), where $\xi_t = \alpha\varepsilon_{t-1}$.
Parameters to estimate: $\sigma_\varepsilon^2$ and $\sigma_\xi^2$.
If $\sigma_\xi^2 = 0$, then $y_t \sim \text{NID}(\ell_0, \sigma_\varepsilon^2)$.
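As a quick illustration, we can simulate a local level series and recover the two variances with StructTS. A minimal sketch in base R; the sample size, seed and variance values are arbitrary choices, not from the slides.

set.seed(42)
n   <- 200
xi  <- rnorm(n, sd = 0.5)     # level innovations, sd = sigma_xi
eps <- rnorm(n, sd = 1)       # observation noise, sd = sigma_epsilon
y   <- ts(cumsum(xi) + eps)   # random walk level observed with noise

fit <- StructTS(y, type = "level")
fit$coef                      # "level" = sigma_xi^2, "epsilon" = sigma_epsilon^2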
Local linear trend model

Dynamic trend observed with noise:
$$y_t = \ell_t + \varepsilon_t$$
$$\ell_t = \ell_{t-1} + b_{t-1} + \xi_t$$
$$b_t = b_{t-1} + \zeta_t$$

$\varepsilon_t$, $\xi_t$ and $\zeta_t$ are independent Gaussian white noise processes.
Compare ETS(A,A,N), where $\xi_t = (\alpha + \beta)\varepsilon_{t-1}$ and $\zeta_t = \beta\varepsilon_{t-1}$.
Parameters to estimate: $\sigma_\varepsilon^2$, $\sigma_\xi^2$ and $\sigma_\zeta^2$.
If $\sigma_\zeta^2 = \sigma_\xi^2 = 0$, then $y_t = \ell_0 + t b_0 + \varepsilon_t$.
The model is a time-varying linear regression.
Basic structural model

$$y_t = \ell_t + s_{1,t} + \varepsilon_t$$
$$\ell_t = \ell_{t-1} + b_{t-1} + \xi_t$$
$$b_t = b_{t-1} + \zeta_t$$
$$s_{1,t} = -\sum_{j=1}^{m-1} s_{j,t-1} + \eta_t$$
$$s_{j,t} = s_{j-1,t-1}, \qquad j = 2, \dots, m-1$$

$\varepsilon_t$, $\xi_t$, $\zeta_t$ and $\eta_t$ are independent Gaussian white noise processes.
Compare ETS(A,A,A).
Parameters to estimate: $\sigma_\varepsilon^2$, $\sigma_\xi^2$, $\sigma_\zeta^2$ and $\sigma_\eta^2$.
Deterministic seasonality if $\sigma_\eta^2 = 0$.
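The seasonal recursion is linear in the lagged seasonal states, so it corresponds to a fixed transition block. A sketch of that block in R; seasonal_block is an illustrative helper, not a function from any package.

seasonal_block <- function(m) {
  # First row: s_{1,t} = -(s_{1,t-1} + ... + s_{m-1,t-1})
  # Identity sub-block: s_{j,t} = s_{j-1,t-1} for j = 2, ..., m-1
  rbind(rep(-1, m - 1),
        cbind(diag(m - 2), 0))
}
seasonal_block(4)   # quarterly data: a 3 x 3 matrix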
Trigonometric models

$$y_t = \ell_t + \sum_{j=1}^{J} s_{j,t} + \varepsilon_t$$
$$\ell_t = \ell_{t-1} + b_{t-1} + \xi_t$$
$$b_t = b_{t-1} + \zeta_t$$
$$s_{j,t} = \cos\lambda_j\, s_{j,t-1} + \sin\lambda_j\, s^*_{j,t-1} + \omega_{j,t}$$
$$s^*_{j,t} = -\sin\lambda_j\, s_{j,t-1} + \cos\lambda_j\, s^*_{j,t-1} + \omega^*_{j,t}$$
where $\lambda_j = 2\pi j/m$.

$\varepsilon_t$, $\xi_t$, $\zeta_t$, $\omega_{j,t}$ and $\omega^*_{j,t}$ are independent Gaussian white noise processes.
$\omega_{j,t}$ and $\omega^*_{j,t}$ have the same variance $\sigma_{\omega,j}^2$.
Equivalent to the BSM when $\sigma_{\omega,j}^2 = \sigma_\omega^2$ and $J = m/2$.
Choose $J < m/2$ for fewer degrees of freedom.
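Each harmonic pair $(s_{j,t}, s^*_{j,t})$ is simply a rotation of the previous pair plus noise. A sketch of the corresponding 2 x 2 transition block; harmonic_block is an illustrative name.

harmonic_block <- function(j, m) {
  lambda <- 2 * pi * j / m
  rbind(c( cos(lambda), sin(lambda)),
        c(-sin(lambda), cos(lambda)))
}
harmonic_block(1, 12)   # monthly data: rotates the first harmonic by 2*pi/12 per step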
ETS vs Structural models

ETS models are much more general, as they allow non-linear (multiplicative) components.
ETS allows automatic forecasting due to its larger model space.
Additive ETS models are almost equivalent to the corresponding structural models.
ETS models have a larger parameter space; structural model parameters are always non-negative (variances).
Structural models are much easier to generalize (e.g., add covariates).
It is easier to handle missing values with structural models.
Structural models in R

library(fpp)   # provides the oil, ausair and austourists series used below

StructTS(oil, type = "level")
StructTS(ausair, type = "trend")
StructTS(austourists, type = "BSM")

fit <- StructTS(austourists, type = "BSM")
decomp <- cbind(austourists, fitted(fit))
colnames(decomp) <- c("data", "level", "slope", "seasonal")
plot(decomp, main = "Decomposition of International visitor nights")
[Figure: "Decomposition of International visitor nights" with panels for data, level, slope and seasonal components, 2000-2010.]
ETS decomposition

[Figure: "Decomposition by ETS(A,A,A) method" with panels for observed, level, slope and season components, 2000-2010.]
Linear Gaussian state space models
Linear Gaussian SS models

Observation equation: $y_t = f' x_t + \varepsilon_t$
State equation: $x_t = G x_{t-1} + w_t$

State vector $x_t$ of length $p$; $G$ a $p \times p$ matrix; $f$ a vector of length $p$;
$\varepsilon_t \sim \text{NID}(0, \sigma^2)$ and $w_t \sim \text{NID}(0, W)$.

Local level model: $f = G = 1$, $x_t = \ell_t$.

Local linear trend model:
$$f' = [1 \;\; 0], \quad
x_t = \begin{bmatrix} \ell_t \\ b_t \end{bmatrix}, \quad
G = \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix}, \quad
W = \begin{bmatrix} \sigma_\xi^2 & 0 \\ 0 & \sigma_\zeta^2 \end{bmatrix}$$
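To make the notation concrete, here are the local linear trend matrices in R, with arbitrary illustrative variance values, plus one simulation step.

f <- c(1, 0)             # observation vector
G <- rbind(c(1, 1),      # level_t = level_{t-1} + slope_{t-1} + noise
           c(0, 1))      # slope_t = slope_{t-1} + noise
W <- diag(c(0.5, 0.1))   # diag(sigma_xi^2, sigma_zeta^2)
sigma2 <- 1              # observation noise variance

x <- c(10, 0.5)                                     # current state (level, slope)
x_new <- c(G %*% x) + rnorm(2, sd = sqrt(diag(W)))  # W is diagonal, so rnorm suffices
y_new <- sum(f * x_new) + rnorm(1, sd = sqrt(sigma2))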
Basic structural model

Linear Gaussian state space model:
$$y_t = f' x_t + \varepsilon_t, \qquad \varepsilon_t \sim N(0, \sigma^2)$$
$$x_t = G x_{t-1} + w_t, \qquad w_t \sim N(0, W)$$

$$f' = [1 \;\; 0 \;\; 1 \;\; 0 \;\; \cdots \;\; 0], \qquad
x_t = \begin{bmatrix} \ell_t \\ b_t \\ s_{1,t} \\ s_{2,t} \\ s_{3,t} \\ \vdots \\ s_{m-1,t} \end{bmatrix}, \qquad
W = \text{diag}(\sigma_\xi^2, \sigma_\zeta^2, \sigma_\eta^2, 0, \dots, 0)$$

$$G = \begin{bmatrix}
1 & 1 & 0 & 0 & \cdots & 0 & 0 \\
0 & 1 & 0 & 0 & \cdots & 0 & 0 \\
0 & 0 & -1 & -1 & \cdots & -1 & -1 \\
0 & 0 & 1 & 0 & \cdots & 0 & 0 \\
0 & 0 & 0 & 1 & \cdots & 0 & 0 \\
\vdots & \vdots & & & \ddots & & \vdots \\
0 & 0 & 0 & 0 & \cdots & 1 & 0
\end{bmatrix}$$
Kalman filter
Kalman filter

Notation:
$$\hat{x}_{t|t} = E[x_t | y_1, \dots, y_t] \qquad \hat{P}_{t|t} = \text{Var}[x_t | y_1, \dots, y_t]$$
$$\hat{x}_{t|t-1} = E[x_t | y_1, \dots, y_{t-1}] \qquad \hat{P}_{t|t-1} = \text{Var}[x_t | y_1, \dots, y_{t-1}]$$
$$\hat{y}_{t|t-1} = E[y_t | y_1, \dots, y_{t-1}] \qquad \hat{v}_{t|t-1} = \text{Var}[y_t | y_1, \dots, y_{t-1}]$$

Iterate for $t = 1, \dots, T$, assuming we know $x_{1|0}$ and $P_{1|0}$.

Forecasting:
$$\hat{y}_{t|t-1} = f' \hat{x}_{t|t-1}$$
$$\hat{v}_{t|t-1} = f' \hat{P}_{t|t-1} f + \sigma^2$$

Updating or state filtering:
$$\hat{x}_{t|t} = \hat{x}_{t|t-1} + \hat{P}_{t|t-1} f \hat{v}_{t|t-1}^{-1} (y_t - \hat{y}_{t|t-1})$$
$$\hat{P}_{t|t} = \hat{P}_{t|t-1} - \hat{P}_{t|t-1} f \hat{v}_{t|t-1}^{-1} f' \hat{P}_{t|t-1}$$

State prediction:
$$\hat{x}_{t+1|t} = G \hat{x}_{t|t}$$
$$\hat{P}_{t+1|t} = G \hat{P}_{t|t} G' + W$$

These are just conditional expectations, so the filter gives minimum MSE estimates.
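The recursions translate almost line for line into code. A minimal base-R sketch (kalman_filter is an illustrative helper, not the StructTS internals); x0 is a vector of length p and P0 a p x p matrix. It skips the updating step when $y_t$ is missing, which is reused below for missing values and multi-step forecasts.

kalman_filter <- function(y, f, G, W, sigma2, x0, P0) {
  n <- length(y); p <- length(x0)
  x <- x0; P <- P0
  xf <- matrix(NA, n, p); Pf <- array(NA, c(p, p, n))
  yhat <- v <- rep(NA, n)
  for (t in 1:n) {
    # Forecasting
    yhat[t] <- sum(f * x)
    v[t] <- c(t(f) %*% P %*% f) + sigma2
    # Updating (state filtering); skipped when y[t] is missing
    if (!is.na(y[t])) {
      K <- (P %*% f) / v[t]            # gain: P f / v
      x <- x + c(K) * (y[t] - yhat[t])
      P <- P - K %*% t(f) %*% P
    }
    xf[t, ] <- x; Pf[, , t] <- P
    # State prediction
    x <- c(G %*% x)
    P <- G %*% P %*% t(G) + W
  }
  list(x.filt = xf, P.filt = Pf, yhat = yhat, v = v)
}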
Kalman recursions

[Diagram: the filter cycles through the three steps. From the filtered state at time t-1: (1) state prediction gives the predicted state at time t; (2) forecasting gives the forecast observation, compared with the observation at time t; (3) state filtering gives the filtered state at time t.]
Initializing the Kalman filter

We need $x_{1|0}$ and $P_{1|0}$ to get started.
Common approach for structural models: set $x_{1|0} = 0$ and $P_{1|0} = kI$ for a very large $k$.
There are many research papers on optimal initialization choices for Kalman recursions.
The ETS approach was to estimate $x_{1|0}$ and avoid $P_{1|0}$ by assuming the error processes are identical.
A random $x_{1|0}$ could be used with ETS models, and then a form of Kalman filter would be required for estimation and forecasting. This gives more realistic prediction intervals.
Local level model

$$y_t = \ell_t + \varepsilon_t, \qquad \varepsilon_t \sim \text{NID}(0, \sigma^2)$$
$$\ell_t = \ell_{t-1} + u_t, \qquad u_t \sim \text{NID}(0, q^2)$$

Kalman recursions:
$$\hat{y}_{t|t-1} = \hat{\ell}_{t-1|t-1}$$
$$\hat{v}_{t|t-1} = \hat{p}_{t|t-1} + \sigma^2$$
$$\hat{\ell}_{t|t} = \hat{\ell}_{t-1|t-1} + \hat{p}_{t|t-1} \hat{v}_{t|t-1}^{-1} (y_t - \hat{y}_{t|t-1})$$
$$\hat{p}_{t+1|t} = \hat{p}_{t|t-1} (1 - \hat{v}_{t|t-1}^{-1} \hat{p}_{t|t-1}) + q^2$$
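Running the general kalman_filter sketch from above on a simulated local level series reproduces these scalar recursions; here with a diffuse start and illustrative variances.

set.seed(1)
y <- ts(cumsum(rnorm(100, sd = 0.3)) + rnorm(100))   # q^2 = 0.09, sigma^2 = 1
kf <- kalman_filter(y, f = 1, G = matrix(1), W = matrix(0.09),
                    sigma2 = 1, x0 = 0, P0 = matrix(1e7))
plot(y)
lines(ts(kf$x.filt[, 1]), col = "blue")              # filtered level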
Handling missing values

Iterate for $t = 1, \dots, T$, starting with $x_{1|0}$ and $P_{1|0}$.

Forecasting:
$$\hat{y}_{t|t-1} = f' \hat{x}_{t|t-1}$$
$$\hat{v}_{t|t-1} = f' \hat{P}_{t|t-1} f + \sigma^2$$

Updating or state filtering, skipped when $y_t$ is missing (so that $\hat{x}_{t|t} = \hat{x}_{t|t-1}$ and $\hat{P}_{t|t} = \hat{P}_{t|t-1}$):
$$\hat{x}_{t|t} = \hat{x}_{t|t-1} + \hat{P}_{t|t-1} f \hat{v}_{t|t-1}^{-1} (y_t - \hat{y}_{t|t-1})$$
$$\hat{P}_{t|t} = \hat{P}_{t|t-1} - \hat{P}_{t|t-1} f \hat{v}_{t|t-1}^{-1} f' \hat{P}_{t|t-1}$$

State prediction:
$$\hat{x}_{t|t-1} = G \hat{x}_{t-1|t-1}$$
$$\hat{P}_{t|t-1} = G \hat{P}_{t-1|t-1} G' + W$$
Multi-step forecasting

Treat future values as missing: iterate for $t = T+1, \dots, T+h$, starting with $x_{T|T}$ and $P_{T|T}$, and skip the updating step throughout.

Forecasting:
$$\hat{y}_{t|t-1} = f' \hat{x}_{t|t-1}$$
$$\hat{v}_{t|t-1} = f' \hat{P}_{t|t-1} f + \sigma^2$$

State prediction:
$$\hat{x}_{t|t-1} = G \hat{x}_{t-1|t-1}$$
$$\hat{P}_{t|t-1} = G \hat{P}_{t-1|t-1} G' + W$$
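In code, multi-step forecasting is just the prediction step iterated h times. A sketch continuing from kalman_filter above; xT and PT are its final filtered state and variance, and the 1.96 limits assume Gaussian forecast distributions.

kalman_forecast <- function(xT, PT, f, G, W, sigma2, h) {
  x <- xT; P <- PT
  mu <- v <- numeric(h)
  for (i in 1:h) {
    x <- c(G %*% x)                           # state prediction only
    P <- G %*% P %*% t(G) + W
    mu[i] <- sum(f * x)                       # forecast mean
    v[i]  <- c(t(f) %*% P %*% f) + sigma2     # forecast variance
  }
  cbind(mean = mu, lo95 = mu - 1.96 * sqrt(v), hi95 = mu + 1.96 * sqrt(v))
}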
What's so special about the Kalman filter?

Very general equations for any model in state space form.
Any model in state space form can easily be generalized.
Optimal MSE forecasts.
Easy to handle missing values.
Easy to compute the likelihood.
Likelihood calculation

Let $\theta$ denote all unknown parameters and $f_\theta(y_t | y_1, \dots, y_{t-1})$ the one-step forecast density.

Likelihood:
$$L(y_1, \dots, y_T; \theta) = \prod_{t=1}^{T} f_\theta(y_t | y_1, \dots, y_{t-1})$$

Gaussian log likelihood:
$$\log L = -\frac{T}{2}\log(2\pi) - \frac{1}{2}\sum_{t=1}^{T} \log \hat{v}_{t|t-1} - \frac{1}{2}\sum_{t=1}^{T} e_t^2 / \hat{v}_{t|t-1}$$
where $e_t = y_t - \hat{y}_{t|t-1}$.

All terms are obtained from the Kalman filter equations.
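Because $\hat{y}_{t|t-1}$ and $\hat{v}_{t|t-1}$ come straight out of the filter, maximum likelihood estimation takes only a few lines with optim. A sketch for the local level model, reusing kalman_filter from above; variances are parameterized on the log scale to keep them positive, and the diffuse initialization is a rough choice rather than an exact treatment.

set.seed(1)
y <- ts(cumsum(rnorm(100, sd = 0.3)) + rnorm(100))   # true q^2 = 0.09, sigma^2 = 1

negloglik <- function(par, y) {
  kf <- kalman_filter(y, f = 1, G = matrix(1), W = matrix(exp(par[1])),
                      sigma2 = exp(par[2]), x0 = 0, P0 = matrix(1e7))
  e  <- y - kf$yhat
  ok <- !is.na(y)                     # missing observations contribute nothing
  0.5 * sum(log(2 * pi * kf$v[ok]) + e[ok]^2 / kf$v[ok])
}
opt <- optim(c(0, 0), negloglik, y = y)
exp(opt$par)                          # estimates of q^2 and sigma^2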
ARIMA models in state space form
ARMA models in state space form

AR(2) model:
$$y_t = \phi_1 y_{t-1} + \phi_2 y_{t-2} + e_t, \qquad e_t \sim \text{NID}(0, \sigma^2)$$

Let $x_t = \begin{bmatrix} y_t \\ y_{t-1} \end{bmatrix}$ and $w_t = \begin{bmatrix} e_t \\ 0 \end{bmatrix}$. Then
$$y_t = [1 \;\; 0]\, x_t, \qquad
x_t = \begin{bmatrix} \phi_1 & \phi_2 \\ 1 & 0 \end{bmatrix} x_{t-1} + w_t.$$

This is now in state space form, so we can use the Kalman filter to compute the likelihood and forecasts.
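Note that in this form there is no separate observation noise: $e_t$ enters through the state, so $\sigma^2 = 0$ in the observation equation. A sketch reusing kalman_filter from above, with a vague (not exact stationary) initialization.

phi <- c(0.5, 0.3)
f <- c(1, 0)
G <- rbind(c(phi[1], phi[2]),
           c(1,      0))
W <- diag(c(1, 0))        # Var(w_t): e_t in the first component only, sigma^2 = 1

set.seed(2)
y  <- arima.sim(list(ar = phi), n = 300)
kf <- kalman_filter(y, f, G, W, sigma2 = 0,
                    x0 = c(0, 0), P0 = diag(10, 2))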
AR(2) model: alternative formulation

Let $x_t = \begin{bmatrix} y_t \\ \phi_2 y_{t-1} \end{bmatrix}$ and $w_t = \begin{bmatrix} e_t \\ 0 \end{bmatrix}$. Then
$$y_t = [1 \;\; 0]\, x_t, \qquad
x_t = \begin{bmatrix} \phi_1 & 1 \\ \phi_2 & 0 \end{bmatrix} x_{t-1} + w_t.$$

This is an alternative state space form for the same model; again the Kalman filter gives the likelihood and forecasts.
AR(p) model:
$$y_t = \phi_1 y_{t-1} + \cdots + \phi_p y_{t-p} + e_t, \qquad e_t \sim \text{NID}(0, \sigma^2)$$

Let $x_t = \begin{bmatrix} y_t \\ y_{t-1} \\ \vdots \\ y_{t-p+1} \end{bmatrix}$ and $w_t = \begin{bmatrix} e_t \\ 0 \\ \vdots \\ 0 \end{bmatrix}$. Then
$$y_t = [1 \;\; 0 \;\; 0 \;\; \cdots \;\; 0]\, x_t, \qquad
x_t = \begin{bmatrix}
\phi_1 & \phi_2 & \cdots & \phi_{p-1} & \phi_p \\
1 & 0 & \cdots & 0 & 0 \\
0 & 1 & \cdots & 0 & 0 \\
\vdots & & \ddots & & \vdots \\
0 & 0 & \cdots & 1 & 0
\end{bmatrix} x_{t-1} + w_t.$$
ARMA(1,1) model:
$$y_t = \phi y_{t-1} + \theta e_{t-1} + e_t, \qquad e_t \sim \text{NID}(0, \sigma^2)$$

Let $x_t = \begin{bmatrix} y_t \\ \theta e_t \end{bmatrix}$ and $w_t = \begin{bmatrix} e_t \\ \theta e_t \end{bmatrix}$. Then
$$y_t = [1 \;\; 0]\, x_t, \qquad
x_t = \begin{bmatrix} \phi & 1 \\ 0 & 0 \end{bmatrix} x_{t-1} + w_t.$$
ARMA(p,q) model:
$$y_t = \phi_1 y_{t-1} + \cdots + \phi_p y_{t-p} + \theta_1 e_{t-1} + \cdots + \theta_q e_{t-q} + e_t$$

Let $r = \max(p, q+1)$, with $\theta_i = 0$ for $q < i \le r$ and $\phi_j = 0$ for $p < j \le r$. Then
$$y_t = [1 \;\; 0 \;\; \cdots \;\; 0]\, x_t, \qquad
x_t = \begin{bmatrix}
\phi_1 & 1 & 0 & \cdots & 0 \\
\phi_2 & 0 & 1 & \ddots & \vdots \\
\vdots & \vdots & & \ddots & 0 \\
\phi_{r-1} & 0 & \cdots & 0 & 1 \\
\phi_r & 0 & 0 & \cdots & 0
\end{bmatrix} x_{t-1} +
\begin{bmatrix} 1 \\ \theta_1 \\ \vdots \\ \theta_{r-1} \end{bmatrix} e_t.$$

The arima function in R is implemented using this formulation.
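Base R exposes this machinery through some low-level helpers documented in ?KalmanLike. A sketch assuming an ARMA(2,1) with known coefficients; in practice arima() wraps these calls and optimizes the parameters.

set.seed(3)
y <- arima.sim(list(ar = c(0.5, 0.3), ma = 0.4), n = 200)

mod <- makeARIMA(phi = c(0.5, 0.3), theta = 0.4, Delta = numeric(0))
KalmanLike(y, mod)       # log-likelihood pieces used by arima()
str(KalmanRun(y, mod))   # filtered states and one-step residuals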
Kalman smoothing
Kalman smoothing

We want an estimate of $x_t | y_1, \dots, y_T$ where $t < T$; that is, $\hat{x}_{t|T}$.

$$\hat{x}_{t|T} = \hat{x}_{t|t} + A_t \left( \hat{x}_{t+1|T} - \hat{x}_{t+1|t} \right)$$
$$\hat{P}_{t|T} = \hat{P}_{t|t} + A_t \left( \hat{P}_{t+1|T} - \hat{P}_{t+1|t} \right) A_t'$$
where $A_t = \hat{P}_{t|t} G' \hat{P}_{t+1|t}^{-1}$.

Uses all the data, not just previous data.
Useful for estimating missing values: $\hat{y}_{t|T} = f' \hat{x}_{t|T}$.
Useful for seasonal adjustment when one of the states is a seasonal component.
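The backward pass is equally short in code. A sketch that runs on the output of the kalman_filter sketch above, re-deriving the one-step predictions from the filtered quantities; kalman_smoother is an illustrative helper.

kalman_smoother <- function(kf, G, W) {
  n <- nrow(kf$x.filt); p <- ncol(kf$x.filt)
  xs <- kf$x.filt; Ps <- kf$P.filt       # at t = n, smoothed = filtered
  for (t in (n - 1):1) {
    xf <- kf$x.filt[t, ]
    Pf <- matrix(kf$P.filt[, , t], p, p)
    xp <- c(G %*% xf)                    # x_{t+1|t}
    Pp <- G %*% Pf %*% t(G) + W          # P_{t+1|t}
    A  <- Pf %*% t(G) %*% solve(Pp)      # A_t
    xs[t, ] <- xf + c(A %*% (xs[t + 1, ] - xp))
    Ps[, , t] <- Pf + A %*% (matrix(Ps[, , t + 1], p, p) - Pp) %*% t(A)
  }
  list(x.smooth = xs, P.smooth = Ps)
}

# e.g. for the local level example above: sm <- kalman_smoother(kf, matrix(1), matrix(0.09))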
Kalman smoothing in R

fit <- StructTS(austourists, type = "BSM")
sm <- tsSmooth(fit)
plot(austourists)
lines(sm[, 1], col = "blue")          # smoothed level
lines(fitted(fit)[, 1], col = "red")  # filtered level
legend("topleft", col = c("blue", "red"), lty = 1,
       legend = c("Smoothed level", "Filtered level"))
[Figure: austourists, 2000-2010, with the smoothed level (blue) and filtered level (red) overlaid.]
Kalman smoothing in R

fit <- StructTS(austourists, type = "BSM")
sm <- tsSmooth(fit)
plot(austourists)
# Seasonally adjusted data
aus.sa <- austourists - sm[, 3]
lines(aus.sa, col = "blue")
[Figure: austourists, 2000-2010, with the seasonally adjusted series overlaid in blue.]
Kalman smoothing in R

x <- austourists
miss <- sample(1:length(x), 5)
x[miss] <- NA                       # delete five observations at random
fit <- StructTS(x, type = "BSM")
sm <- tsSmooth(fit)
estim <- sm[, 1] + sm[, 3]          # level + seasonal

plot(x, ylim = range(austourists))
points(time(x)[miss], estim[miss], col = "red", pch = 1)
points(time(x)[miss], austourists[miss], col = "black", pch = 1)
legend("topleft", pch = 1, col = c(2, 1),
       legend = c("Estimate", "Actual"))
[Figure: x with five missing values, 2000-2010; smoothed estimates (red) plotted against the actual values (black) at the missing times.]
Time varying parameter models
Time varying parameter models

Linear Gaussian state space model with time varying coefficients:
$$y_t = f_t' x_t + \varepsilon_t, \qquad \varepsilon_t \sim N(0, \sigma_t^2)$$
$$x_t = G_t x_{t-1} + w_t, \qquad w_t \sim N(0, W_t)$$

Kalman recursions:
$$\hat{y}_{t|t-1} = f_t' \hat{x}_{t|t-1}$$
$$\hat{v}_{t|t-1} = f_t' \hat{P}_{t|t-1} f_t + \sigma_t^2$$
$$\hat{x}_{t|t} = \hat{x}_{t|t-1} + \hat{P}_{t|t-1} f_t \hat{v}_{t|t-1}^{-1} (y_t - \hat{y}_{t|t-1})$$
$$\hat{P}_{t|t} = \hat{P}_{t|t-1} - \hat{P}_{t|t-1} f_t \hat{v}_{t|t-1}^{-1} f_t' \hat{P}_{t|t-1}$$
$$\hat{x}_{t|t-1} = G_t \hat{x}_{t-1|t-1}$$
$$\hat{P}_{t|t-1} = G_t \hat{P}_{t-1|t-1} G_t' + W_t$$
Structural models with covariates

Local level with covariate:
$$y_t = \ell_t + \beta z_t + \varepsilon_t, \qquad \ell_t = \ell_{t-1} + \xi_t$$
$$f_t' = [1 \;\; z_t], \quad
x_t = \begin{bmatrix} \ell_t \\ \beta \end{bmatrix}, \quad
G = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}, \quad
W_t = \begin{bmatrix} \sigma_\xi^2 & 0 \\ 0 & 0 \end{bmatrix}$$

Assumes $z_t$ is fixed and known (as in regression).
The estimate of $\beta$ is given by $\hat{x}_{T|T}$.
Equivalent to simple linear regression with a time varying intercept.
Easy to extend to multiple regression with additional terms.
Time varying regression

Simple linear regression with time varying parameters:
$$y_t = \ell_t + \beta_t z_t + \varepsilon_t, \qquad \ell_t = \ell_{t-1} + \xi_t, \qquad \beta_t = \beta_{t-1} + \zeta_t$$
$$f_t' = [1 \;\; z_t], \quad
x_t = \begin{bmatrix} \ell_t \\ \beta_t \end{bmatrix}, \quad
G = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}, \quad
W_t = \begin{bmatrix} \sigma_\xi^2 & 0 \\ 0 & \sigma_\zeta^2 \end{bmatrix}$$

Allows for a linear regression with parameters that change slowly over time.
Parameters follow independent random walks.
Estimates of the parameters are given by $\hat{x}_{t|t}$ or $\hat{x}_{t|T}$.
Updating ("online") regression

The same idea can be used to estimate a regression iteratively as new data arrive; a recursive implementation is sketched below.

Simple linear regression with updating parameters:
$$y_t = \ell_t + \beta_t z_t + \varepsilon_t, \qquad \ell_t = \ell_{t-1} + \xi_t, \qquad \beta_t = \beta_{t-1} + \zeta_t$$
$$f_t' = [1 \;\; z_t], \quad
x_t = \begin{bmatrix} \ell_t \\ \beta_t \end{bmatrix}, \quad
G = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}, \quad
W_t = \begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix}$$

Updated parameter estimates are given by $\hat{x}_{t|t}$.
Recursive residuals are given by $y_t - \hat{y}_{t|t-1}$.
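With $W_t = 0$ and a diffuse start, the Kalman filter reduces to recursive least squares. A sketch; online_reg is an illustrative helper, and $\sigma^2$ is fixed at 1 since with a diffuse prior it only scales the variances, not the point estimates.

online_reg <- function(y, z) {
  n <- length(y)
  x <- c(0, 0); P <- diag(1e7, 2)    # diffuse start for (intercept, beta)
  est <- matrix(NA, n, 2, dimnames = list(NULL, c("intercept", "beta")))
  for (t in 1:n) {
    ft <- c(1, z[t])
    v  <- c(t(ft) %*% P %*% ft) + 1  # sigma^2 = 1
    K  <- (P %*% ft) / v
    x  <- x + c(K) * (y[t] - sum(ft * x))
    P  <- P - K %*% t(ft) %*% P      # G = I, W = 0: no prediction step needed
    est[t, ] <- x
  }
  est                                # row t uses y_1, ..., y_t only
}

# The final row essentially matches ordinary least squares:
set.seed(4)
z <- rnorm(50); y <- 2 + 3 * z + rnorm(50)
rbind(online = online_reg(y, z)[50, ], ols = coef(lm(y ~ z)))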