BUS 41910 Time Series Analysis
Linear State Space Models
Dacheng Xiu
University of Chicago Booth School of Business
References
- Forecasting, Structural Time Series Models and the Kalman Filter, by A. C. Harvey.
- Time Series Analysis, by J. Hamilton.
- Time Series Analysis by State Space Models, 2nd Edition, by J. Durbin and S. J. Koopman.
General Form of Linear State Space Models
The state space model is given by:
Observation Equation: yt = Zt αt + εt ,
State Equation:       αt+1 = Tt αt + Rt ηt ,

where εt ∼ i.i.d. N(0, Ht ), ηt ∼ i.i.d. N(0, Qt ), for t = 1, 2, . . . , n.
- yt : p × 1 observation vector
- αt : m × 1 (unobservable) state vector
- Zt , Tt , Rt , Ht , Qt are given (up to some unknown parameters).
- Zt and Tt−1 can depend on y1 , y2 , . . . , yt−1 .
- Rt is a selection matrix, i.e., its columns are columns of Im .
- α1 ∼ N(a1 , P1 ), with a1 and P1 given.
Example: Local Level Model
A particular example is the local level model, which can be used for
modeling transaction prices:
  yt = αt + εt ,        εt ∼ N(0, σε2 ),
  αt+1 = αt + ηt ,      ηt ∼ N(0, ση2 ),
where yt is the observed price and αt is the unobserved efficient price, modeled as a random walk.
- σε2 denotes the variance of the market microstructure noise, e.g., bid-ask bounces.
- ση2 is the variance of the efficient-price increments, i.e., of asset returns.
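To make the model concrete, here is a minimal R sketch that simulates a price path from the local level model (the parameter values, sample size, and starting price are illustrative choices, not from the slides):

```r
# Simulate a local level model: y_t = alpha_t + eps_t, alpha_{t+1} = alpha_t + eta_t
set.seed(1)
n         <- 200
sigma_eps <- 0.5                       # sd of microstructure noise (illustrative)
sigma_eta <- 0.1                       # sd of efficient-price innovations (illustrative)
alpha <- numeric(n)
y     <- numeric(n)
alpha[1] <- 100                        # initial efficient price (illustrative)
for (t in 1:n) {
  y[t] <- alpha[t] + rnorm(1, 0, sigma_eps)                     # observed price
  if (t < n) alpha[t + 1] <- alpha[t] + rnorm(1, 0, sigma_eta)  # random-walk state
}
plot(y, type = "l", main = "Simulated local level model", ylab = "price")
lines(alpha, col = "red")              # unobserved efficient price
```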
Example: VAR(1) Model
Consider the following VAR(1) model:
Yt = AYt−1 + Ut .
It can be written (trivially) in the state space form:
  yt = αt + εt ,
  αt = Tt αt−1 + ηt ,

where yt = Yt , αt = Yt , εt = 0, Tt = A, ηt = Ut .
Example: ARMA models
Suppose yt follows an ARMA(2,1) model without a constant term:
  yt = φ1 yt−1 + φ2 yt−2 + ζt + θ1 ζt−1 ,      ζt ∼ i.i.d. N(0, σζ2 ).

It can be put in state space form as

  ( yt+1             )   ( φ1  1 ) ( yt               )   ( 1  )
  ( φ2 yt + θ1 ζt+1  ) = ( φ2  0 ) ( φ2 yt−1 + θ1 ζt  ) + ( θ1 ) ζt+1

- εt = 0, Ht = 0. Obviously, the state space form accommodates observation errors.
- αt = (yt , φ2 yt−1 + θ1 ζt )| .
- Zt = (1, 0).
- The state-space representation is not unique.

Homework: How to incorporate the constant µ?
Example: Regression Models
The regression model for a univariate yt is given by
  yt = xt| β + εt ,      εt ∼ i.i.d. N(0, Ht ),
where β is a k × 1 vector.
- Zt = xt| , Tt = Ik .
- Rt = Qt = 0.
- αt = α1 = β.
- Obviously, the state-space form also allows a time-varying βt .
- It can also accommodate a regression model with ARMA errors.
Filtering
- The object of filtering is to update our knowledge of the system each time a new observation yt is brought in.
- The classical filtering method is the Kalman filter, which works under the normality assumption.
- We first develop the theory of filtering for the local level model.
Kalman Filter
- Let Yt−1 denote the vector (y1 , . . . , yt−1 )| , for t = 2, 3, . . ..
- Suppose αt |Yt−1 ∼ N(at , Pt ) and αt |Yt ∼ N(at|t , Pt|t ).
- The goal is to calculate at|t , Pt|t , at+1 , and Pt+1 when yt is brought in, so that we obtain the distribution of αt+1 |Yt .
- Terminology: at|t is the filtered estimator of the state αt , and at+1 is the one-step-ahead predictor of αt+1 .
A Useful Result
Suppose that x and y are jointly normal with
  E ( x )   ( µx )          ( x )   ( Σxx    Σxy )
    ( y ) = ( µy ) ,    Var ( y ) = ( Σxy|   Σyy ) ,

then the conditional distribution of x given y is normal with mean vector

  E(x|y ) = µx + Σxy Σyy−1 (y − µy ),                    (1)

and variance matrix

  Var(x|y ) = Σxx − Σxy Σyy−1 Σxy| .                     (2)
Kalman Filter
- Let vt = yt − at , for t = 1, 2, . . . , n.
- Using (1) and (2), we have

  at|t = E(αt |yt , Yt−1 )
       = E(αt |Yt−1 ) + Cov(αt , yt |Yt−1 ) / Var(yt |Yt−1 ) · (yt − E(yt |Yt−1 ))
       = at + Pt /(Pt + σε2 ) · vt ,

  Pt|t = Var(αt |yt , Yt−1 )
       = Var(αt |Yt−1 ) − Cov(αt , yt |Yt−1 )2 / Var(yt |Yt−1 )
       = Pt − Pt2 /(Pt + σε2 ) = Pt σε2 /(Pt + σε2 ).
Kalman Filter
- Finally, using the state equation, we have

  at+1 = E(αt+1 |Yt ) = E(αt |Yt ) = at|t ,
  Pt+1 = Var(αt+1 |Yt ) = Var(αt |Yt ) + ση2 = Pt|t + ση2 .

- To make these results consistent with the general results later, we introduce

  Ft = Var(vt |Yt−1 ) = Pt + σε2 ,      Kt = Pt /Ft ,

  where Ft is the variance of the prediction error, and Kt is the Kalman gain.
Kalman Filter: Updating Equations
- To summarize:

  vt = yt − at ,            Ft = Pt + σε2 ,
  at+1 = at + Kt vt ,       Pt+1 = Pt (1 − Kt ) + ση2 ,

  for t = 1, 2, . . . , n, where Kt = Pt /Ft .
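Here is a minimal R sketch of these updating equations for the local level model (the function name and argument names are illustrative); it returns the one-step predictors at , their variances Pt , the innovations vt , and their variances Ft :

```r
# Kalman filter for the local level model
# y: observation vector; sigma2_eps, sigma2_eta: noise variances; a1, P1: initial state
llm_kalman_filter <- function(y, sigma2_eps, sigma2_eta, a1, P1) {
  n <- length(y)
  a <- numeric(n + 1); P <- numeric(n + 1)   # predicted state mean / variance
  v <- numeric(n);     F <- numeric(n)       # innovations and their variances
  a[1] <- a1; P[1] <- P1
  for (t in 1:n) {
    v[t] <- y[t] - a[t]                      # prediction error
    F[t] <- P[t] + sigma2_eps                # its variance
    K    <- P[t] / F[t]                      # Kalman gain
    a[t + 1] <- a[t] + K * v[t]              # one-step-ahead state prediction
    P[t + 1] <- P[t] * (1 - K) + sigma2_eta  # its variance
  }
  list(a = a[1:n], P = P[1:n], v = v, F = F)
}
```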
Example: River Nile Data
Annual flow volume at Aswan, 1871 to 1970.
- R package: ‘KFAS’ - SSModel, fitSSM, KFS, fitted
[Figure: filtered annual flow and variance of the filtered values, plotted over time.]
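A sketch of how this example can be reproduced with the KFAS package (the initial values passed to fitSSM are illustrative):

```r
library(KFAS)
# Local level model for the built-in Nile data; NA marks variances to be estimated
model <- SSModel(Nile ~ SSMtrend(degree = 1, Q = list(matrix(NA))), H = matrix(NA))
fit   <- fitSSM(model, inits = rep(log(var(Nile)), 2), method = "BFGS")
out   <- KFS(fit$model, filtering = "state", smoothing = "state")
out$a          # one-step-ahead predictions a_t
out$att        # filtered estimates a_{t|t}
out$alphahat   # smoothed states
fitted(out)    # fitted values, as listed on the slide
```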
Forecast Errors
- Since the vt s are linear combinations of y1 , y2 , . . . , yt , we have p(v1 ) = p(y1 ) and p(vt ) = p(yt |Yt−1 ) (the Jacobian term is 1), therefore the likelihood is given by

  p(y1 , y2 , . . . , yn ) = ∏_{t=1}^{n} p(yt |Yt−1 ) = ∏_{t=1}^{n} p(vt ).

  Therefore v1 , v2 , . . . , vn are mutually independent.
- vt ∼ i.i.d. N(0, Ft ), t = 1, 2, . . . , n.
- The Kalman filter can therefore be used for maximum likelihood estimation.
Error Recursions
- The state estimation error is xt = αt − at , with Var(xt ) = Pt .
  Therefore, vt = yt − at = αt + εt − at = xt + εt , and

  xt+1 = αt+1 − at+1
       = αt + ηt − at − Kt vt
       = xt + ηt − Kt (xt + εt )
       = Lt xt + ηt − Kt εt ,        where Lt = 1 − Kt .

- xt is a linear combination of past xs, ηs and εs, so xt ⊥ εt .
- These derivations will be useful for state smoothing.
State Smoothing
We now consider the estimation of α1 , . . . , αn given the entire
sample path Yn , i.e., state smoothing.
- The conditional density: αt |Yn ∼ N(α̂t , Vt ).
- α̂t is the smoothed state, Vt is the smoothed state variance.

By (1) again, we have

  α̂t = E(αt |Yt−1 ) + Cov(αt , Yt:n |Yt−1 ) Var(Yt:n |Yt−1 )−1 (Yt:n − E(Yt:n |Yt−1 ))    (3)
     = at + ∑_{j=t}^{n} Cov(αt , vj ) Fj−1 vj                                            (4)
Smoothed State
Since Cov(αt , vj ) = Cov(xt , vj ), and

  Cov(xt , vt ) = E(xt (xt + εt )) = Var(xt ) = Pt ,
  Cov(xt , vt+1 ) = E(xt (xt+1 + εt+1 )) = E(xt (Lt xt + ηt − Kt εt + εt+1 )) = Pt Lt ,
  ...
  Cov(xt , vn ) = Pt Lt Lt+1 · · · Ln−1 ,

plugging these into (4), we obtain the backward state smoothing recursion:

  α̂t = at + Pt rt−1 ,        rt−1 = vt /Ft + Lt rt ,

with rn = 0, for t = n, n − 1, . . . , 1.
Smoothed State Variance
By (2), we have
  Vt = Var(αt |Yn ) = Pt − ∑_{j=t}^{n} Cov(αt , vj )2 Fj−1 .

Using the same trick as before, we obtain the state variance smoothing recursion:

  Vt = Pt − Pt2 Nt−1 ,        Nt−1 = Ft−1 + Lt2 Nt ,

for t = n, . . . , 1, with Nn = 0.
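A minimal R sketch of these backward recursions, taking the output of the filtering function sketched earlier (function and variable names are illustrative):

```r
# Backward state smoothing for the local level model
# filt: list with components a, P, v, F as returned by llm_kalman_filter()
llm_state_smoother <- function(filt) {
  n <- length(filt$v)
  K <- filt$P / filt$F              # Kalman gains
  L <- 1 - K                        # L_t = 1 - K_t
  r <- numeric(n + 1)               # r[j] stores r_{j-1}; r[n + 1] = r_n = 0
  N <- numeric(n + 1)               # N[j] stores N_{j-1}; N[n + 1] = N_n = 0
  alpha_hat <- numeric(n); V <- numeric(n)
  for (t in n:1) {
    r[t] <- filt$v[t] / filt$F[t] + L[t] * r[t + 1]    # r_{t-1}
    N[t] <- 1 / filt$F[t] + L[t]^2 * N[t + 1]          # N_{t-1}
    alpha_hat[t] <- filt$a[t] + filt$P[t] * r[t]       # smoothed state
    V[t]         <- filt$P[t] - filt$P[t]^2 * N[t]     # smoothed state variance
  }
  list(alpha_hat = alpha_hat, V = V, r = r, N = N)
}
```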
Example: River Nile Data
[Figure: state smoothing - smoothed annual flow and variance of the smoothed values, plotted over time.]
Disturbance Smoothing
We now turn to the calculations of the smoothed disturbances,
together with their variances:
  ε̂t = E(εt |Yn ) = yt − α̂t ,
  η̂t = E(ηt |Yn ) = α̂t+1 − α̂t .

It is easier, however, to calculate them using rt and Nt . The results are given here without proof:

  ε̂t = σε2 ut ,                   ut = Ft−1 vt − Kt rt ,         t = n, . . . , 1,
  Var(εt |Yn ) = σε2 − σε4 Dt ,    Dt = Ft−1 + Kt2 Nt ,

and

  η̂t = ση2 rt ,                   t = n, . . . , 1,
  Var(ηt |Yn ) = ση2 − ση4 Nt .
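Continuing the R sketch, the smoothed disturbances can be computed from the filter and smoother output (all names are illustrative):

```r
# Disturbance smoothing for the local level model
# filt: output of llm_kalman_filter(); smth: output of llm_state_smoother()
llm_disturbance_smoother <- function(filt, smth, sigma2_eps, sigma2_eta) {
  n  <- length(filt$v)
  K  <- filt$P / filt$F
  rt <- smth$r[2:(n + 1)]                       # r_t, t = 1, ..., n
  Nt <- smth$N[2:(n + 1)]                       # N_t, t = 1, ..., n
  u  <- filt$v / filt$F - K * rt                # u_t
  D  <- 1 / filt$F + K^2 * Nt                   # D_t
  list(eps_hat = sigma2_eps * u,                # smoothed observation disturbance
       eps_var = sigma2_eps - sigma2_eps^2 * D,
       eta_hat = sigma2_eta * rt,               # smoothed state disturbance
       eta_var = sigma2_eta - sigma2_eta^2 * Nt)
}
```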
Example: River Nile Data
[Figure: disturbance smoothing - smoothed observation disturbances ("Smoothed Eps") and smoothed state disturbances ("Smoothed Eta"), plotted over 1871-1970.]
Missing Observations
- A distinct advantage of state space models is the ease with which missing observations can be dealt with. This is an important matter in practice.
- For example, transaction prices arrive irregularly and asynchronously, so that certain prices are missing from the vector of all assets at any point in time.
Dealing with Missing Observations
Suppose that for some 1 < τ < τ ∗ ≤ n, observations yj ,
j = τ, . . . , τ ∗ − 1 are missing.
For filtering at times t = τ, . . . , τ ∗ − 1, we have

  E(αt |Yt ) = E(αt |Yτ −1 ) = E( ατ + ∑_{j=τ}^{t−1} ηj | Yτ −1 ) = aτ ,
  E(αt+1 |Yt ) = E(αt+1 |Yτ −1 ) = aτ ,

  Var(αt |Yt ) = Var(αt |Yτ −1 ) = Var( ατ + ∑_{j=τ}^{t−1} ηj | Yτ −1 ) = Pτ + (t − τ )ση2 ,
  Var(αt+1 |Yt ) = Pτ + (t − τ + 1)ση2 .
Dealing with Missing Observations
This leads to, for t = τ, . . . , τ ∗ − 1,
  at|t = at ,        at+1 = at ,
  Pt|t = Pt ,        Pt+1 = Pt + ση2 .

This amounts to replacing Kt = Pt /Ft by Kt = 0, i.e., no Kalman gain, at the missing time points, so the same code can be applied! Moreover, this simple twist applies to all formulas in forecast error recursions, state smoothing, and disturbance smoothing.
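A sketch of how the local-level filter from before can be adapted to handle missing observations coded as NA (the function name is illustrative):

```r
# Local level Kalman filter that tolerates missing observations (NA in y)
llm_kalman_filter_na <- function(y, sigma2_eps, sigma2_eta, a1, P1) {
  n <- length(y)
  a <- numeric(n + 1); P <- numeric(n + 1)
  v <- rep(NA_real_, n); F <- rep(NA_real_, n)
  a[1] <- a1; P[1] <- P1
  for (t in 1:n) {
    if (is.na(y[t])) {                         # missing: K_t = 0, no update
      a[t + 1] <- a[t]
      P[t + 1] <- P[t] + sigma2_eta
    } else {
      v[t] <- y[t] - a[t]
      F[t] <- P[t] + sigma2_eps
      K    <- P[t] / F[t]
      a[t + 1] <- a[t] + K * v[t]
      P[t + 1] <- P[t] * (1 - K) + sigma2_eta
    }
  }
  list(a = a[1:n], P = P[1:n], v = v, F = F)
}
```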
Forecasting
Let ȳn+j denote the minimum mean square error forecast of yn+j given Yn , for j = 1, 2, . . . , J. Then we have

  ȳn+j = E(yn+j |Yn ),        F̄n+j = Var(yn+j |Yn ).

The problem can be regarded as a missing observations problem, i.e., with τ = n + 1 and τ ∗ = n + J + 1 in a filtering problem for yt with t = 1, 2, . . . , n + J!
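Following the missing-observation treatment above, a sketch of J-step-ahead forecasting with the local-level filter (J, a1, and P1 below are illustrative; y, sigma_eps, and sigma_eta come from the earlier simulation sketch):

```r
# Forecast J steps ahead by appending missing values and filtering
J     <- 10
y_ext <- c(y, rep(NA_real_, J))
filt  <- llm_kalman_filter_na(y_ext, sigma_eps^2, sigma_eta^2,
                              a1 = 0, P1 = 1e7)     # large P1: rough diffuse start
y_bar <- tail(filt$a, J)                 # forecasts of y_{n+1}, ..., y_{n+J}
F_bar <- tail(filt$P, J) + sigma_eps^2   # forecast variances Var(y_{n+j} | Y_n)
```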
Example: River Nile Data
[Figure: filtering with missing values - filtered annual flow and variance of the filtered values, plotted over time.]
Example: River Nile Data
[Figure: smoothing with missing values - smoothed annual flow and variance of the smoothed values, plotted over time.]
Initialization
- We now consider how to start up the filter when nothing about α1 is known.
- It is reasonable to represent α1 as having a diffuse prior density, i.e., fix a1 at an arbitrary value and let P1 → ∞.
- Plugging this in, we obtain a2 = y1 and P2 = σε2 + ση2 ; we can then proceed as usual.
- This amounts to treating y1 as given, and α1 ∼ N(y1 , σε2 ).
- This is also equivalent to assuming α1 is unknown and estimating it using y1 (which is the MLE for α1 ).
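In practice the diffuse prior is often approximated by a very large P1 (an assumption of this sketch, not an exact diffuse treatment; packages such as KFAS handle exact diffuse initialization internally):

```r
# Approximate diffuse initialization: arbitrary a1, very large P1
# (uses y, sigma_eps, sigma_eta from the simulation sketch above)
filt <- llm_kalman_filter(y, sigma_eps^2, sigma_eta^2, a1 = 0, P1 = 1e7)
filt$a[2]   # approximately y[1], as derived above
filt$P[2]   # approximately sigma_eps^2 + sigma_eta^2
```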
Likelihood Estimation
The log-likelihood is given by (assuming a1 and P1 are given up to
some parameters)
  L = −(n/2) log(2π) − (1/2) ∑_{t=1}^{n} ( log Ft + vt2 /Ft ),

which can be easily implemented using the Kalman filter.

For the case with diffuse initialization, the likelihood is given by

  Ld = −(n/2) log(2π) − (1/2) ∑_{t=2}^{n} ( log Ft + vt2 /Ft ).
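A sketch of maximum likelihood estimation for the local level model based on this likelihood and the filter from before (optimizing over log-variances for positivity; initial values, a1, and P1 are illustrative):

```r
# Negative log-likelihood of the local level model via the Kalman filter
llm_negloglik <- function(log_par, y, a1, P1) {
  s2_eps <- exp(log_par[1]); s2_eta <- exp(log_par[2])
  filt <- llm_kalman_filter(y, s2_eps, s2_eta, a1, P1)
  # full-sample likelihood L; with a diffuse start one would drop the t = 1 term (Ld)
  0.5 * length(y) * log(2 * pi) + 0.5 * sum(log(filt$F) + filt$v^2 / filt$F)
}

opt <- optim(c(log(var(y) / 2), log(var(y) / 2)), llm_negloglik,
             y = y, a1 = mean(y), P1 = 1e7)
exp(opt$par)   # estimated (sigma_eps^2, sigma_eta^2)
```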
EM Algorithm
- The EM algorithm is a well-known tool for iterative maximum likelihood estimation, developed in this context by Shumway and Stoffer (1982) and Watson and Engle (1983).
- It consists of an E-step (expectation) and an M-step (maximization).
- For the state space model, it has a particularly neat form.
EM Algorithm for Local Level Model
As an illustration, we apply the EM algorithm to estimate the local level model. Note that the log-likelihood knowing α is

  log p(α, Yn |θ) = const − (1/2) ∑_{t=1}^{n} { log σε2 + log ση2 + εt2 /σε2 + (ηt−1 )2 /ση2 }.

The log-likelihood function of the data is given by:

  log p(Yn |θ) = log p(α, Yn |θ) − log p(α|Yn , θ).
EM Algorithm
- The E-step takes the conditional expectation, denoted as Ẽ, of log p(Yn |θ) with respect to the density p(α|Yn , θ):

  log p(Yn |θ) = Ẽ (log p(α, Yn |θ)) − Ẽ (log p(α|Yn , θ))

- The M-step involves maximizing the likelihood with respect to θ:

  ∂ log p(Yn |θ)/∂θ = Ẽ ( ∂ log p(α, Yn |θ)/∂θ ),

  since Ẽ ( ∂ log p(α|Yn , θ)/∂θ ) = 0. This leads to

  Ẽ { ∑_{t=1}^{n} ( 1/σε2 + 1/ση2 − εt2 /σε4 − (ηt−1 )2 /ση4 ) } = 0
EM Algorithm
The closed-form solution to the M-step is
  σ̂ε2 = (1/n) ∑_{t=1}^{n} Ẽ (εt2 ) = (1/n) ∑_{t=1}^{n} ( ε̂t2 + Var(εt |Yn ) ),

  σ̂η2 = (1/(n − 1)) ∑_{t=2}^{n} Ẽ ((ηt−1 )2 ) = (1/(n − 1)) ∑_{t=2}^{n} ( (η̂t−1 )2 + Var(ηt−1 |Yn ) ).

The procedure then repeats itself with the new trial values of (σ̂ε2 , σ̂η2 ), until convergence has been attained. Given each trial value of the parameters, we need disturbance smoothing to update them.
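A sketch of the resulting EM iteration, combining the filter and smoothers sketched above (all function names, initial values, and the stopping rule are illustrative):

```r
# EM algorithm for the local level model
llm_em <- function(y, s2_eps = var(y) / 2, s2_eta = var(y) / 2,
                   tol = 1e-6, max_iter = 500) {
  n <- length(y)
  for (iter in 1:max_iter) {
    # E-step: smooth under the current trial parameters
    filt <- llm_kalman_filter(y, s2_eps, s2_eta, a1 = y[1], P1 = 1e7)
    smth <- llm_state_smoother(filt)
    dist <- llm_disturbance_smoother(filt, smth, s2_eps, s2_eta)
    # M-step: closed-form updates from the smoothed disturbances
    s2_eps_new <- mean(dist$eps_hat^2 + dist$eps_var)
    s2_eta_new <- mean((dist$eta_hat^2 + dist$eta_var)[1:(n - 1)])
    conv <- abs(s2_eps_new - s2_eps) + abs(s2_eta_new - s2_eta) < tol
    s2_eps <- s2_eps_new; s2_eta <- s2_eta_new
    if (conv) break
  }
  c(sigma2_eps = s2_eps, sigma2_eta = s2_eta)
}

llm_em(y)   # y from the simulation sketch above
```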
Summary for General Models
We now move to the general model introduced at the beginning of
this lecture.
  vector    dimension        matrix    dimension
  yt        p × 1            Zt        p × m
  αt        m × 1            Tt        m × m
  εt        p × 1            Ht        p × p
  ηt        r × 1            Rt        m × r
  a1        m × 1            Qt        r × r
                             P1        m × m
Kalman Filter Recursion
There is no difference in the proof compared to what we have
done, except that some matrix algebra is needed:
  vt = yt − Zt at ,                  Ft = Zt Pt Zt| + Ht ,
  at|t = at + Pt Zt| Ft−1 vt ,       Pt|t = Pt − Pt Zt| Ft−1 Zt Pt ,
  at+1 = Tt at + Kt vt ,             Pt+1 = Tt Pt (Tt − Kt Zt )| + Rt Qt Rt| ,

for t = 1, 2, . . . , n, where Kt = Tt Pt Zt| Ft−1 , and a1 and P1 are given.

  vector    dimension        matrix    dimension
  vt        p × 1            Ft        p × p
  at        m × 1            Kt        m × p
  at|t      m × 1            Pt        m × m
                             Pt|t      m × m
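A matrix-form R sketch of this recursion for time-invariant system matrices (extending it to time-varying Zt , Tt , etc. only requires indexing them by t; all names are illustrative):

```r
# General Kalman filter with time-invariant system matrices Z, Tmat, H, R, Q
# y: n x p matrix of observations; a1: m-vector; P1: m x m matrix
kalman_filter <- function(y, Z, Tmat, H, R, Q, a1, P1) {
  n <- nrow(y); p <- ncol(y); m <- length(a1)
  a <- matrix(0, n + 1, m); P <- array(0, c(m, m, n + 1))
  v <- matrix(0, n, p);     F <- array(0, c(p, p, n))
  a[1, ] <- a1; P[, , 1] <- P1
  for (t in 1:n) {
    v[t, ]   <- y[t, ] - Z %*% a[t, ]                          # innovation
    F[, , t] <- Z %*% P[, , t] %*% t(Z) + H                    # innovation variance
    K        <- Tmat %*% P[, , t] %*% t(Z) %*% solve(F[, , t]) # Kalman gain
    a[t + 1, ]   <- Tmat %*% a[t, ] + K %*% v[t, ]             # state prediction
    P[, , t + 1] <- Tmat %*% P[, , t] %*% t(Tmat - K %*% Z) + R %*% Q %*% t(R)
  }
  list(a = a[1:n, , drop = FALSE], P = P[, , 1:n], v = v, F = F)
}
```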
State and Disturbance Smoothing Recursion
  rt−1 = Zt| Ft−1 vt + Lt| rt ,          α̂t = at + Pt rt−1 ,
  Nt−1 = Zt| Ft−1 Zt + Lt| Nt Lt ,       Vt = Pt − Pt Nt−1 Pt ,
  ε̂t = Ht (Ft−1 vt − Kt| rt ),           Var(εt |Yn ) = Ht − Ht (Ft−1 + Kt| Nt Kt )Ht ,
  η̂t = Qt Rt| rt ,                       Var(ηt |Yn ) = Qt − Qt Rt| Nt Rt Qt ,
  Lt = Tt − Kt Zt ,

for t = n, n − 1, . . . , 1, initialized with rn = 0 and Nn = 0.

  vector    dimension        matrix    dimension
  rt        m × 1            Nt        m × m
  α̂t       m × 1            Vt        m × m
  ut        p × 1            Dt        p × p
  ε̂t       p × 1
  η̂t       r × 1
Missing Observations
- If the entire vector yt is missing for t = τ, . . . , τ ∗ − 1, e.g., in forecasting, then we should use Zt = 0 for t = τ, . . . , τ ∗ − 1 in at|t , Pt|t , at+1 , Pt+1 , rt−1 , Nt−1 , Kt , and Lt .
- If certain elements of the vector yt are missing, e.g., under asynchronous trading, we define yt∗ to be the vector of values actually observed. Note that the dimension of yt∗ changes with t, and yt∗ = Wt yt , where Wt is the selection matrix whose rows are a subset of the rows of I. It implies that

  yt∗ = Zt∗ αt + ε∗t ,      ε∗t ∼ N(0, Ht∗ ),

  where Zt∗ = Wt Zt , ε∗t = Wt εt , and Ht∗ = Wt Ht Wt| (see the sketch below).
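A small R sketch of this reduction at a single time point (the observation pattern and numbers below are illustrative):

```r
# Reduce the observation equation to the observed elements of y_t
# obs: logical vector marking which of the p elements of y_t are observed
reduce_obs <- function(y_t, Z, H, obs) {
  W <- diag(length(y_t))[obs, , drop = FALSE]   # selection matrix W_t
  list(y_star = y_t[obs],                       # y_t* = W_t y_t (observed values)
       Z_star = W %*% Z,                        # Z_t* = W_t Z_t
       H_star = W %*% H %*% t(W))               # H_t* = W_t H_t W_t'
}

# Example: 3 assets, the second price is missing at time t
red <- reduce_obs(y_t = c(101.2, NA, 99.8), Z = diag(3), H = 0.25 * diag(3),
                  obs = c(TRUE, FALSE, TRUE))
```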