State Space Models and Filtering

Jesús Fernández-Villaverde
University of Pennsylvania

State Space Form

• What is a state space representation?
• States versus observables.
• Why is it useful?
• Relation with filtering.
• Relation with optimal control.
• Linear versus nonlinear, Gaussian versus non-Gaussian.

State Space Representation

• Consider the following system:

— Transition equation:
x_{t+1} = F x_t + G ω_{t+1},  ω_{t+1} ∼ N(0, Q)

— Measurement equation:
z_t = H' x_t + υ_t,  υ_t ∼ N(0, R)

— where x_t are the states and z_t are the observables.

• Assume we want to write the likelihood function of z^T = {z_t}_{t=1}^T.

The State Space Representation Is Not Unique

• Take the previous state space representation.
• Let B be a non-singular square matrix conformable with F.
• Then, if x*_t = B x_t, F* = B F B^{-1}, G* = B G, and H*' = H' B^{-1}, we can write a new, equivalent representation:

— Transition equation:
x*_{t+1} = F* x*_t + G* ω_{t+1},  ω_{t+1} ∼ N(0, Q)

— Measurement equation:
z_t = H*' x*_t + υ_t,  υ_t ∼ N(0, R)

Example I

• Assume the following AR(2) process:
z_t = ρ_1 z_{t-1} + ρ_2 z_{t-2} + υ_t,  υ_t ∼ N(0, σ²_υ)

• The model is, apparently, not Markovian.
• Can we write this model in different state space forms?
• Yes!

State Space Representation I

• Transition equation:
x_t = [ρ_1 1; ρ_2 0] x_{t-1} + [1; 0] υ_t,  where x_t = [y_t, ρ_2 y_{t-1}]'

• Measurement equation:
z_t = [1 0] x_t

State Space Representation II

• Transition equation:
x_t = [ρ_1 ρ_2; 1 0] x_{t-1} + [1; 0] υ_t,  where x_t = [y_t, y_{t-1}]'

• Measurement equation:
z_t = [1 0] x_t

• Try B = [1 0; 0 ρ_2] on the second system to get the first system.

Example II

• Assume the following MA(1) process:
z_t = υ_t + θ υ_{t-1},  υ_t ∼ N(0, σ²_υ), and E(υ_t υ_s) = 0 for s ≠ t.

• Again, we have a more complicated structure than a simple Markovian process.
• However, it will again be straightforward to write a state space representation.
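The equivalence of these representations can be checked numerically. Below is a minimal sketch (the coefficient values ρ_1 = 0.5, ρ_2 = 0.3, the seed, and the sample length are illustrative choices, not from the text) verifying that State Space Representation II reproduces the AR(2) recursion:

```python
import numpy as np

rng = np.random.default_rng(0)
rho1, rho2 = 0.5, 0.3            # illustrative AR(2) coefficients
T = 200
ups = rng.normal(size=T)         # upsilon_t shocks

# Direct AR(2) recursion: z_t = rho1 z_{t-1} + rho2 z_{t-2} + upsilon_t
z = np.zeros(T)
for t in range(T):
    zm1 = z[t - 1] if t >= 1 else 0.0
    zm2 = z[t - 2] if t >= 2 else 0.0
    z[t] = rho1 * zm1 + rho2 * zm2 + ups[t]

# State Space Representation II: x_t = F x_{t-1} + G upsilon_t, z_t = H' x_t
F = np.array([[rho1, rho2], [1.0, 0.0]])   # transition matrix
G = np.array([1.0, 0.0])                   # shock loading
H = np.array([1.0, 0.0])                   # measurement vector
x = np.zeros(2)                            # x_t = [y_t, y_{t-1}]'
z_ss = np.zeros(T)
for t in range(T):
    x = F @ x + G * ups[t]
    z_ss[t] = H @ x

assert np.allclose(z, z_ss)                # both paths coincide
```

Running the same loop with F = [ρ_1 1; ρ_2 0] and state [y_t, ρ_2 y_{t-1}]' produces the identical observable path, which is exactly the non-uniqueness point above.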
State Space Representation I

• Transition equation:
x_t = [0 1; 0 0] x_{t-1} + [1; θ] υ_t,  where x_t = [y_t, θ υ_t]'

• Measurement equation:
z_t = [1 0] x_t

State Space Representation II

• Transition equation:
x_t = υ_{t-1},  where x_t = [υ_{t-1}]

• Measurement equation:
z_t = θ x_t + υ_t

• Again, both representations are equivalent!

Example III

• Assume the following random walk plus drift process:
z_t = z_{t-1} + β + υ_t,  υ_t ∼ N(0, σ²_υ)

• This is even more interesting.
• We have a unit root.
• We have a constant parameter (the drift).

State Space Representation

• Transition equation:
x_t = [1 1; 0 1] x_{t-1} + [1; 0] υ_t,  where x_t = [y_t, β]'

• Measurement equation:
z_t = [1 0] x_t

Some Conditions on the State Space Representation

• We only consider stable systems.
• A system is stable if, for any initial state x_0, the vector of states x_t converges to some unique x*.
• A necessary and sufficient condition for the system to be stable is that:
|λ_i(F)| < 1 for all i,
where λ_i(F) stands for an eigenvalue of F.

Introducing the Kalman Filter

• Developed by Kalman and Bucy.
• Wide application in science.
• Basic idea.
• Prediction, smoothing, and control.
• Why the name "filter"?

Some Definitions

• Let x_{t|t-1} = E(x_t | z^{t-1}) be the best linear predictor of x_t given the history of observables until t − 1, i.e., z^{t-1}.
• Let z_{t|t-1} = E(z_t | z^{t-1}) = H' x_{t|t-1} be the best linear predictor of z_t given the history of observables until t − 1, i.e., z^{t-1}.
• Let x_{t|t} = E(x_t | z^t) be the best linear predictor of x_t given the history of observables until t, i.e., z^t.

What Is the Kalman Filter Trying to Do?

• Assume we have x_{t|t-1} and z_{t|t-1}.
• We observe a new z_t.
• We need to obtain x_{t|t}.
• Note that x_{t+1|t} = F x_{t|t} and z_{t+1|t} = H' x_{t+1|t}, so we can go back to the first step and wait for z_{t+1}.
• Therefore, the key question is how to obtain x_{t|t} from x_{t|t-1} and z_t.
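The stability condition |λ_i(F)| < 1 is easy to check numerically. A minimal sketch (the AR(2) coefficient values are illustrative, not from the text), contrasting a stable companion matrix with the random walk plus drift transition matrix from Example III:

```python
import numpy as np

def is_stable(F):
    """A system is stable when all eigenvalues of F lie inside the unit circle."""
    return bool(np.all(np.abs(np.linalg.eigvals(F)) < 1.0))

# AR(2) companion matrix with rho1 = 0.5, rho2 = 0.3 (illustrative values)
F_ar2 = np.array([[0.5, 0.3], [1.0, 0.0]])
# Random walk plus drift transition matrix from Example III
F_rw = np.array([[1.0, 1.0], [0.0, 1.0]])

assert is_stable(F_ar2)       # eigenvalues inside the unit circle
assert not is_stable(F_rw)    # both eigenvalues equal one: not stable
```

This is why the random walk plus drift example falls outside the class of stable systems considered here.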
A Minimization Approach to the Kalman Filter I

• Assume we use the following equation to get x_{t|t} from z_t and x_{t|t-1}:
x_{t|t} = x_{t|t-1} + K_t (z_t − z_{t|t-1}) = x_{t|t-1} + K_t (z_t − H' x_{t|t-1})

• This formula will have some probabilistic justification (to follow).
• What is K_t?

A Minimization Approach to the Kalman Filter II

• K_t is called the Kalman filter gain, and it measures how much we update x_{t|t-1} as a function of our error in predicting z_t.
• The question is how to find the optimal K_t.
• The Kalman filter is about how to build K_t such that we optimally update x_{t|t} from x_{t|t-1} and z_t.
• How do we find the optimal K_t?

Some Additional Definitions

• Let Σ_{t|t-1} ≡ E[(x_t − x_{t|t-1})(x_t − x_{t|t-1})' | z^{t-1}] be the predicting error variance-covariance matrix of x_t given the history of observables until t − 1, i.e., z^{t-1}.
• Let Ω_{t|t-1} ≡ E[(z_t − z_{t|t-1})(z_t − z_{t|t-1})' | z^{t-1}] be the predicting error variance-covariance matrix of z_t given the history of observables until t − 1, i.e., z^{t-1}.
• Let Σ_{t|t} ≡ E[(x_t − x_{t|t})(x_t − x_{t|t})' | z^t] be the predicting error variance-covariance matrix of x_t given the history of observables until t, i.e., z^t.

Finding the Optimal K_t

• We want the K_t that minimizes Σ_{t|t}.
• It can be shown that, if that is the case:
K_t = Σ_{t|t-1} H (H' Σ_{t|t-1} H + R)^{-1}

• with the optimal update of x_{t|t} given z_t and x_{t|t-1} being:
x_{t|t} = x_{t|t-1} + K_t (z_t − H' x_{t|t-1})

• We will provide some intuition later.

Example I

Assume the following model in state space form:

• Transition equation:
x_t = μ + υ_t,  υ_t ∼ N(0, σ²_υ)

• Measurement equation:
z_t = x_t + ξ_t,  ξ_t ∼ N(0, σ²_ξ)

• Let σ²_ξ = q σ²_υ.

Example II

• Then, if Σ_{1|0} = σ²_υ (which means that x_1 was drawn from the ergodic distribution of x_t), we have:
K_1 = σ²_υ / (σ²_υ (1 + q)) = 1/(1 + q).

• Therefore, the bigger σ²_ξ relative to σ²_υ (the bigger q), the lower K_1 and the less we trust z_1.

The Kalman Filter Algorithm I

Given Σ_{t|t-1}, z_t, and x_{t|t-1}, we can now set out the Kalman filter algorithm.
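The updating step and the gain formula just stated can be sketched in code. A minimal sketch (the values σ²_υ = 2 and q = 3 are arbitrary, chosen only to reproduce K_1 = 1/(1 + q) from Example II numerically):

```python
import numpy as np

def kalman_gain_update(x_pred, Sigma_pred, z, H, R):
    """One updating step: K_t = Sigma H (H' Sigma H + R)^{-1},
    x_{t|t} = x_{t|t-1} + K_t (z_t - H' x_{t|t-1}),
    Sigma_{t|t} = Sigma_{t|t-1} - K_t H' Sigma_{t|t-1}."""
    Omega = H.T @ Sigma_pred @ H + R                # forecast variance of z_t
    K = Sigma_pred @ H @ np.linalg.inv(Omega)       # Kalman gain
    x_filt = x_pred + K @ (z - H.T @ x_pred)        # updated state estimate
    Sigma_filt = Sigma_pred - K @ H.T @ Sigma_pred  # updated variance
    return x_filt, Sigma_filt, K

# Example II above: Sigma_{1|0} = sigma_u^2 and R = q sigma_u^2 give K_1 = 1/(1+q)
sigma_u2, q = 2.0, 3.0                              # illustrative values
H = np.array([[1.0]])
R = np.array([[q * sigma_u2]])
x1, S1, K1 = kalman_gain_update(np.array([0.0]), np.array([[sigma_u2]]),
                                np.array([1.0]), H, R)
assert np.isclose(K1[0, 0], 1.0 / (1.0 + q))
```

With q = 3 the gain is 0.25: a noisy observation moves the state estimate only a quarter of the way toward the prediction error, matching the intuition in the example.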
Given Σ_{t|t-1}, we compute:

Ω_{t|t-1} ≡ E[(z_t − z_{t|t-1})(z_t − z_{t|t-1})' | z^{t-1}]
= E[H'(x_t − x_{t|t-1})(x_t − x_{t|t-1})' H + υ_t (x_t − x_{t|t-1})' H + H'(x_t − x_{t|t-1}) υ_t' + υ_t υ_t' | z^{t-1}]
= H' Σ_{t|t-1} H + R

The Kalman Filter Algorithm II

Given Σ_{t|t-1}, we compute:

E[(z_t − z_{t|t-1})(x_t − x_{t|t-1})' | z^{t-1}] = E[H'(x_t − x_{t|t-1})(x_t − x_{t|t-1})' + υ_t (x_t − x_{t|t-1})' | z^{t-1}] = H' Σ_{t|t-1}

Given Σ_{t|t-1}, we compute:

K_t = Σ_{t|t-1} H (H' Σ_{t|t-1} H + R)^{-1}

Given Σ_{t|t-1}, x_{t|t-1}, K_t, and z_t, we compute:

x_{t|t} = x_{t|t-1} + K_t (z_t − H' x_{t|t-1})

The Kalman Filter Algorithm III

Given Σ_{t|t-1}, x_{t|t-1}, K_t, and z_t, we compute:

Σ_{t|t} ≡ E[(x_t − x_{t|t})(x_t − x_{t|t})' | z^t]
= E[(x_t − x_{t|t-1})(x_t − x_{t|t-1})' − (x_t − x_{t|t-1})(z_t − H' x_{t|t-1})' K_t' − K_t (z_t − H' x_{t|t-1})(x_t − x_{t|t-1})' + K_t (z_t − H' x_{t|t-1})(z_t − H' x_{t|t-1})' K_t' | z^t]
= Σ_{t|t-1} − K_t H' Σ_{t|t-1}

where you have to notice that x_t − x_{t|t} = x_t − x_{t|t-1} − K_t (z_t − H' x_{t|t-1}).

The Kalman Filter Algorithm IV

Given Σ_{t|t-1}, x_{t|t-1}, K_t, and z_t, we compute:

Σ_{t+1|t} = F Σ_{t|t} F' + G Q G'

Given x_{t|t}, we compute:

1. x_{t+1|t} = F x_{t|t}
2. z_{t+1|t} = H' x_{t+1|t}

Therefore, from x_{t|t-1}, Σ_{t|t-1}, and z_t we compute x_{t|t} and Σ_{t|t}.

The Kalman Filter Algorithm V

We also compute z_{t|t-1} and Ω_{t|t-1}. Why? To calculate the likelihood function of z^T = {z_t}_{t=1}^T (to follow).

The Kalman Filter Algorithm: A Review

We start with x_{t|t-1} and Σ_{t|t-1}. Then, we observe z_t and:

• Ω_{t|t-1} = H' Σ_{t|t-1} H + R
• z_{t|t-1} = H' x_{t|t-1}
• K_t = Σ_{t|t-1} H (H' Σ_{t|t-1} H + R)^{-1}
• Σ_{t|t} = Σ_{t|t-1} − K_t H' Σ_{t|t-1}
• x_{t|t} = x_{t|t-1} + K_t (z_t − H' x_{t|t-1})
• Σ_{t+1|t} = F Σ_{t|t} F' + G Q G'
• x_{t+1|t} = F x_{t|t}

We finish with x_{t+1|t} and Σ_{t+1|t}.

Some Intuition about the Optimal K_t

• Remember: K_t = Σ_{t|t-1} H (H' Σ_{t|t-1} H + R)^{-1}
• Notice that we can rewrite K_t in the following way:
K_t = Σ_{t|t-1} H Ω_{t|t-1}^{-1}
• If we made a big mistake forecasting x_{t|t-1} using past information (Σ_{t|t-1} large), we give a lot of weight to the new information (K_t large).
• If the new information is noise (R large), we give a lot of weight to the old prediction (K_t small).

A Probabilistic Approach to the Kalman Filter

• Assume:
Z|w = [X'|w  Y'|w]' ∼ N([x*; y*], [Σ_xx Σ_xy; Σ_yx Σ_yy])

• Then:
X|y, w ∼ N(x* + Σ_xy Σ_yy^{-1} (y − y*), Σ_xx − Σ_xy Σ_yy^{-1} Σ_yx)

• Also, x_{t|t-1} ≡ E(x_t | z^{t-1}) and:
Σ_{t|t-1} ≡ E[(x_t − x_{t|t-1})(x_t − x_{t|t-1})' | z^{t-1}]

Some Derivations I

If z_t | z^{t-1} is the random variable z_t (observable) conditional on z^{t-1}, then:

• Let z_{t|t-1} ≡ E(z_t | z^{t-1}) = E(H' x_t + υ_t | z^{t-1}) = H' x_{t|t-1}

• Let Ω_{t|t-1} ≡ E[(z_t − z_{t|t-1})(z_t − z_{t|t-1})' | z^{t-1}]
= E[H'(x_t − x_{t|t-1})(x_t − x_{t|t-1})' H + υ_t (x_t − x_{t|t-1})' H + H'(x_t − x_{t|t-1}) υ_t' + υ_t υ_t' | z^{t-1}]
= H' Σ_{t|t-1} H + R

Some Derivations II

Finally, let:

E[(z_t − z_{t|t-1})(x_t − x_{t|t-1})' | z^{t-1}] = E[H'(x_t − x_{t|t-1})(x_t − x_{t|t-1})' + υ_t (x_t − x_{t|t-1})' | z^{t-1}] = H' Σ_{t|t-1}

The Kalman Filter First Iteration I

• Assume we know x_{1|0} and Σ_{1|0}; then:

[x_1; z_1] | z^0 ∼ N([x_{1|0}; H' x_{1|0}], [Σ_{1|0}, Σ_{1|0} H; H' Σ_{1|0}, H' Σ_{1|0} H + R])

• Remember that:
X|y, w ∼ N(x* + Σ_xy Σ_yy^{-1} (y − y*), Σ_xx − Σ_xy Σ_yy^{-1} Σ_yx)

The Kalman Filter First Iteration II

Then, we can write:
x_1 | z_1, z^0 = x_1 | z^1 ∼ N(x_{1|1}, Σ_{1|1})

where
x_{1|1} = x_{1|0} + Σ_{1|0} H (H' Σ_{1|0} H + R)^{-1} (z_1 − H' x_{1|0})

and
Σ_{1|1} = Σ_{1|0} − Σ_{1|0} H (H' Σ_{1|0} H + R)^{-1} H' Σ_{1|0}

• Therefore, we have that:

— z_{1|0} = H' x_{1|0}
— Ω_{1|0} = H' Σ_{1|0} H + R
— x_{1|1} = x_{1|0} + Σ_{1|0} H (H' Σ_{1|0} H + R)^{-1} (z_1 − H' x_{1|0})
— Σ_{1|1} = Σ_{1|0} − Σ_{1|0} H (H' Σ_{1|0} H + R)^{-1} H' Σ_{1|0}

• Also, since x_{2|1} = F x_{1|1} + G ω_{2|1} and z_{2|1} = H' x_{2|1} + υ_{2|1}:

— x_{2|1} = F x_{1|1}
— Σ_{2|1} = F Σ_{1|1} F' + G Q G'

The Kalman Filter t-th Iteration I

• Assume we know x_{t|t-1} and Σ_{t|t-1}; then:

[x_t; z_t] | z^{t-1} ∼ N([x_{t|t-1}; H' x_{t|t-1}], [Σ_{t|t-1}, Σ_{t|t-1} H; H' Σ_{t|t-1}, H' Σ_{t|t-1} H + R])

• Remember that:
X|y, w ∼ N(x* + Σ_xy Σ_yy^{-1} (y − y*), Σ_xx − Σ_xy Σ_yy^{-1} Σ_yx)
The Kalman Filter t-th Iteration II

Then, we can write:
x_t | z_t, z^{t-1} = x_t | z^t ∼ N(x_{t|t}, Σ_{t|t})

where
x_{t|t} = x_{t|t-1} + Σ_{t|t-1} H (H' Σ_{t|t-1} H + R)^{-1} (z_t − H' x_{t|t-1})

and
Σ_{t|t} = Σ_{t|t-1} − Σ_{t|t-1} H (H' Σ_{t|t-1} H + R)^{-1} H' Σ_{t|t-1}

The Kalman Filter Algorithm

Given x_{t|t-1}, Σ_{t|t-1}, and observation z_t:

• Ω_{t|t-1} = H' Σ_{t|t-1} H + R
• z_{t|t-1} = H' x_{t|t-1}
• Σ_{t|t} = Σ_{t|t-1} − Σ_{t|t-1} H (H' Σ_{t|t-1} H + R)^{-1} H' Σ_{t|t-1}
• x_{t|t} = x_{t|t-1} + Σ_{t|t-1} H (H' Σ_{t|t-1} H + R)^{-1} (z_t − H' x_{t|t-1})
• Σ_{t+1|t} = F Σ_{t|t} F' + G Q G'
• x_{t+1|t} = F x_{t|t}

Putting the Minimization and the Probabilistic Approaches Together

• From the minimization approach we know that:
x_{t|t} = x_{t|t-1} + K_t (z_t − H' x_{t|t-1})

• From the probabilistic approach we know that:
x_{t|t} = x_{t|t-1} + Σ_{t|t-1} H (H' Σ_{t|t-1} H + R)^{-1} (z_t − H' x_{t|t-1})

• But since:
K_t = Σ_{t|t-1} H (H' Σ_{t|t-1} H + R)^{-1}

• we can also write, in the probabilistic approach:
x_{t|t} = x_{t|t-1} + Σ_{t|t-1} H (H' Σ_{t|t-1} H + R)^{-1} (z_t − H' x_{t|t-1}) = x_{t|t-1} + K_t (z_t − H' x_{t|t-1})

• Therefore, both approaches are equivalent.

Writing the Likelihood Function

We want to write the likelihood function of z^T = {z_t}_{t=1}^T:

log ℓ(z^T | F, G, H, Q, R) = Σ_{t=1}^T log ℓ(z_t | z^{t-1}, F, G, H, Q, R)
= Σ_{t=1}^T [ −(N/2) log 2π − (1/2) log |Ω_{t|t-1}| − (1/2) v_t' Ω_{t|t-1}^{-1} v_t ]

where
v_t = z_t − z_{t|t-1} = z_t − H' x_{t|t-1}
Ω_{t|t-1} = H' Σ_{t|t-1} H + R

Initial Conditions for the Kalman Filter

• An important step in the Kalman filter is to set the initial conditions.
• Initial conditions:
1. x_{1|0}
2. Σ_{1|0}
• Where do they come from?

Since we only consider stable systems, the standard approach is to set:

• x_{1|0} = x*
• Σ_{1|0} = Σ*

where x* solves x* = F x* and Σ* = F Σ* F' + G Q G'.

How do we find Σ*? Using the vec operator:

vec(Σ*) = [I − F ⊗ F]^{-1} vec(G Q G')

Initial Conditions for the Kalman Filter II

Under the following conditions:

1. The system is stable, i.e., all eigenvalues of F are strictly less than one in absolute value.
2. G Q G' and R are p.s.d. symmetric.
3. Σ_{1|0} is p.s.d. symmetric.

Then Σ_{t+1|t} → Σ*.

Remarks
1. There are more general theorems than the one just described.
2. Those theorems cover non-stable systems.
3. Since we are going to work with stable systems, the former theorem is enough.
4. The last theorem gives us a way to find Σ*, since Σ_{t+1|t} → Σ* for any Σ_{1|0} we start with.

The Kalman Filter and DSGE Models

• Basic real business cycle model:

max E_0 Σ_{t=0}^∞ β^t {ξ log c_t + (1 − ξ) log(1 − l_t)}

c_t + k_{t+1} = k_t^α (e^{z_t} l_t)^{1−α} + (1 − η) k_t
z_t = ρ z_{t-1} + ε_t,  ε_t ∼ N(0, σ)

• Parameters: γ = {α, β, ρ, ξ, η, σ}

Equilibrium Conditions

1/c_t = β E_t [ (1/c_{t+1}) (1 + α e^{z_{t+1}} k_{t+1}^{α−1} l_{t+1}^{1−α} − η) ]

(1 − ξ)/(1 − l_t) = (ξ/c_t) (1 − α) e^{z_t} k_t^α l_t^{−α}

c_t + k_{t+1} = e^{z_t} k_t^α l_t^{1−α} + (1 − η) k_t

z_t = ρ z_{t-1} + ε_t

A Special Case

• We set, unrealistically but usefully for our point, η = 1.
• In this case, the model has two important and useful features:

1. First, the income and the substitution effects of a productivity shock on labor supply exactly cancel each other. Consequently, l_t is constant and equal to:
l_t = l = (1 − α) ξ / [(1 − α) ξ + (1 − ξ)(1 − αβ)]

2. Second, the policy function for capital is k_{t+1} = αβ e^{z_t} k_t^α l^{1−α}.

A Special Case II

• The definition of k_{t+1} implies that c_t = (1 − αβ) e^{z_t} k_t^α l^{1−α}.
• Let us check that the Euler equation holds:

1/c_t = β E_t [ (1/c_{t+1}) α e^{z_{t+1}} k_{t+1}^{α−1} l_{t+1}^{1−α} ]
1/[(1 − αβ) e^{z_t} k_t^α l^{1−α}] = β E_t [ α e^{z_{t+1}} k_{t+1}^{α−1} l^{1−α} / ((1 − αβ) e^{z_{t+1}} k_{t+1}^α l^{1−α}) ]
1/[(1 − αβ) e^{z_t} k_t^α l^{1−α}] = β E_t [ α / ((1 − αβ) k_{t+1}) ] = βα / [(1 − αβ) αβ e^{z_t} k_t^α l^{1−α}]

which holds since αβ cancels on the right-hand side.

• Let us check that the intratemporal condition holds:

(1 − ξ)/(1 − l) = ξ (1 − α) e^{z_t} k_t^α l^{−α} / [(1 − αβ) e^{z_t} k_t^α l^{1−α}]
(1 − ξ)/(1 − l) = ξ (1 − α) / [(1 − αβ) l]
(1 − αβ)(1 − ξ) l = ξ (1 − α)(1 − l)
[(1 − αβ)(1 − ξ) + (1 − α) ξ] l = (1 − α) ξ

• Finally, the budget constraint holds because of the definition of c_t.

Transition Equation

• Since this policy function is linear in logs, we have the transition equation for the model:

[1; log k_{t+1}; z_t] = [1 0 0; log(αβλl^{1−α}) α ρ; 0 0 ρ] [1; log k_t; z_{t-1}] + [0; 1; 1] ε_t

• Note the constant.
• Alternative formulations.
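For the η = 1 special case, the policy functions can be simulated directly. A minimal sketch (the parameter values and the seed are illustrative choices, not a calibration from the text) that also verifies the budget constraint c_t + k_{t+1} = y_t at every date:

```python
import numpy as np

# Illustrative parameter values for the eta = 1 special case (hypothetical)
alpha, beta, xi, rho, sigma = 0.4, 0.99, 0.357, 0.95, 0.007

# Constant labor supply from the income/substitution-effect cancellation
l = (1 - alpha) * xi / ((1 - alpha) * xi + (1 - xi) * (1 - alpha * beta))

rng = np.random.default_rng(1)
T = 100
k = np.empty(T + 1)
z = np.empty(T)
k[0], zlag = 1.0, 0.0
for t in range(T):
    z[t] = rho * zlag + sigma * rng.normal()
    y = np.exp(z[t]) * k[t] ** alpha * l ** (1 - alpha)
    k[t + 1] = alpha * beta * y          # policy function k_{t+1} = alpha beta e^z k^alpha l^{1-alpha}
    c = (1 - alpha * beta) * y           # implied consumption
    assert np.isclose(c + k[t + 1], y)   # budget constraint with full depreciation
    zlag = z[t]
```

The assertion passes by construction: consumption and investment shares of output are the constants 1 − αβ and αβ, which is what makes this special case solvable by hand.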
Measurement Equation

• As observables, we assume log y_t and log i_t, subject to a linearly additive measurement error V_t = (v_{1,t}, v_{2,t})'.
• Let V_t ∼ N(0, Λ), where Λ is a diagonal matrix with σ²_1 and σ²_2 as diagonal elements.
• Why measurement error? Stochastic singularity.
• Then:

[log y_t; log i_t] = [−log(αβλl^{1−α}) 1 0; 0 1 0] [1; log k_{t+1}; z_t] + [v_{1,t}; v_{2,t}]

The Solution to the Model in State Space Form

x_t = [1; log k_t; z_{t-1}],  z_t = [log y_t; log i_t]

F = [1 0 0; log(αβλl^{1−α}) α ρ; 0 0 ρ],  G = [0; 1; 1],  Q = σ²

H' = [−log(αβλl^{1−α}) 1 0; 0 1 0],  R = Λ

The Solution to the Model in State Space Form III

• Now, using z^T, F, G, H, Q, and R as defined in the last slide...
• ...we can use the Riccati equations to compute the likelihood function of the model:
log ℓ(z^T | F, G, H, Q, R)
• Cross-equation restrictions are implied by the equilibrium solution.
• With the likelihood, we can do inference!

What Do We Do if η ≠ 1?

We have two options:

• First, we could linearize or log-linearize the model and apply the Kalman filter.
• Second, we could compute the likelihood function of the model using a nonlinear filter (particle filter).
• Advantages and disadvantages.
• Fernández-Villaverde, Rubio-Ramírez, and Santos (2005).

The Kalman Filter and Linearized DSGE Models

• We linearize (or log-linearize) around the steady state.
• We assume that we have data on log output (log y_t), log hours (log l_t), and log consumption (log c_t), subject to a linearly additive measurement error V_t = (v_{1,t}, v_{2,t}, v_{3,t})'.
• We need to write the model in state space form. Remember that:
k̂_{t+1} = P k̂_t + Q z_t
and
l̂_t = R k̂_t + S z_t

Writing the Likelihood Function I

• The transition equation:

[1; k̂_{t+1}; z_{t+1}] = [1 0 0; 0 P Q; 0 0 ρ] [1; k̂_t; z_t] + [0; 0; 1] ε_t

• The measurement equation requires some care.
Writing the Likelihood Function II

• Notice that ŷ_t = z_t + α k̂_t + (1 − α) l̂_t.
• Therefore, using l̂_t = R k̂_t + S z_t:

ŷ_t = z_t + α k̂_t + (1 − α)(R k̂_t + S z_t) = (α + (1 − α)R) k̂_t + (1 + (1 − α)S) z_t

• Also, since ĉ_t = −α_5 l̂_t + z_t + α k̂_t, using again l̂_t = R k̂_t + S z_t:

ĉ_t = z_t + α k̂_t − α_5 (R k̂_t + S z_t) = (α − α_5 R) k̂_t + (1 − α_5 S) z_t

Writing the Likelihood Function III

Therefore, the measurement equation is:

[log y_t; log l_t; log c_t] = [log y, α + (1−α)R, 1 + (1−α)S; log l, R, S; log c, α − α_5 R, 1 − α_5 S] [1; k̂_t; z_t] + [v_{1,t}; v_{2,t}; v_{3,t}]

The Likelihood Function of a General Dynamic Equilibrium Economy

• Transition equation:
S_t = f(S_{t-1}, W_t; γ)
• Measurement equation:
Y_t = g(S_t, V_t; γ)
• Interpretation.

Some Assumptions

1. We can partition {W_t} into two independent sequences {W_{1,t}} and {W_{2,t}}, s.t. W_t = (W_{1,t}, W_{2,t}) and dim(W_{2,t}) + dim(V_t) ≥ dim(Y_t).
2. We can always evaluate the conditional densities p(y_t | W_1^t, y^{t-1}, S_0; γ). Lubik and Schorfheide (2003).
3. The model assigns positive probability to the data.

Our Goal: Likelihood Function

• Evaluate the likelihood function of a sequence of realizations of the observable y^T at a particular parameter value γ:
p(y^T; γ)

• We factorize it as:

p(y^T; γ) = Π_{t=1}^T p(y_t | y^{t-1}; γ) = Π_{t=1}^T ∫∫ p(y_t | W_1^t, y^{t-1}, S_0; γ) p(W_1^t, S_0 | y^{t-1}; γ) dW_1^t dS_0

A Law of Large Numbers

If {{s_0^{t|t-1,i}, w_1^{t|t-1,i}}_{i=1}^N}_{t=1}^T are N i.i.d. draws from {p(W_1^t, S_0 | y^{t-1}; γ)}_{t=1}^T, then:

p(y^T; γ) ≃ Π_{t=1}^T (1/N) Σ_{i=1}^N p(y_t | w_1^{t|t-1,i}, y^{t-1}, s_0^{t|t-1,i}; γ)

...thus

The problem of evaluating the likelihood is equivalent to the problem of drawing from {p(W_1^t, S_0 | y^{t-1}; γ)}_{t=1}^T.

Introducing Particles

• {s_0^{t-1,i}, w_1^{t-1,i}}_{i=1}^N are N i.i.d. draws from p(W_1^{t-1}, S_0 | y^{t-1}; γ).
• Each s_0^{t-1,i}, w_1^{t-1,i} is a particle and {s_0^{t-1,i}, w_1^{t-1,i}}_{i=1}^N a swarm of particles.
• {s_0^{t|t-1,i}, w_1^{t|t-1,i}}_{i=1}^N are N i.i.d.
draws from p(W_1^t, S_0 | y^{t-1}; γ).
• Each s_0^{t|t-1,i}, w_1^{t|t-1,i} is a proposed particle and {s_0^{t|t-1,i}, w_1^{t|t-1,i}}_{i=1}^N a swarm of proposed particles.

... and Weights

q_t^i = p(y_t | w_1^{t|t-1,i}, y^{t-1}, s_0^{t|t-1,i}; γ) / Σ_{i=1}^N p(y_t | w_1^{t|t-1,i}, y^{t-1}, s_0^{t|t-1,i}; γ)

A Proposition

Let {s̃_0^i, w̃_1^i}_{i=1}^N be a draw with replacement from {s_0^{t|t-1,i}, w_1^{t|t-1,i}}_{i=1}^N with probabilities q_t^i. Then {s̃_0^i, w̃_1^i}_{i=1}^N is a draw from p(W_1^t, S_0 | y^t; γ).

Importance of the Proposition

1. It shows how a draw {s_0^{t|t-1,i}, w_1^{t|t-1,i}}_{i=1}^N from p(W_1^t, S_0 | y^{t-1}; γ) can be used to draw {s_0^{t,i}, w_1^{t,i}}_{i=1}^N from p(W_1^t, S_0 | y^t; γ).

2. With a draw {s_0^{t,i}, w_1^{t,i}}_{i=1}^N from p(W_1^t, S_0 | y^t; γ), we can use p(W_{1,t+1}; γ) to get a draw {s_0^{t+1|t,i}, w_1^{t+1|t,i}}_{i=1}^N and iterate the procedure.

Sequential Monte Carlo I: Filtering

Step 0, Initialization: Set t ← 1 and initialize p(W_1^{t-1}, S_0 | y^{t-1}; γ) = p(S_0; γ).

Step 1, Prediction: Sample N values {s_0^{t|t-1,i}, w_1^{t|t-1,i}}_{i=1}^N from the density p(W_1^t, S_0 | y^{t-1}; γ) = p(W_{1,t}; γ) p(W_1^{t-1}, S_0 | y^{t-1}; γ).

Step 2, Weighting: Assign to each draw s_0^{t|t-1,i}, w_1^{t|t-1,i} the weight q_t^i.

Step 3, Sampling: Draw {s_0^{t,i}, w_1^{t,i}}_{i=1}^N with replacement from {s_0^{t|t-1,i}, w_1^{t|t-1,i}}_{i=1}^N with probabilities {q_t^i}_{i=1}^N. If t < T, set t ← t + 1 and go to step 1. Otherwise stop.

Sequential Monte Carlo II: Likelihood

Use {{s_0^{t|t-1,i}, w_1^{t|t-1,i}}_{i=1}^N}_{t=1}^T to compute:

p(y^T; γ) ≃ Π_{t=1}^T (1/N) Σ_{i=1}^N p(y_t | w_1^{t|t-1,i}, y^{t-1}, s_0^{t|t-1,i}; γ)

A "Trivial" Application

How do we evaluate the likelihood function p(y^T | α, β, σ) of the nonlinear, nonnormal process:

s_t = α + β s_{t-1}/(1 + s_{t-1}) + w_t
y_t = s_t + v_t

where w_t ∼ N(0, σ) and v_t ∼ t(2), given some observables y^T = {y_t}_{t=1}^T and s_0?

1. Let s_0^{0,i} = s_0 for all i.

2. Generate N i.i.d. draws {s_0^{1|0,i}, w^{1|0,i}}_{i=1}^N from N(0, σ).

3. Evaluate p(y_1 | w_1^{1|0,i}, y^0, s_0^{1|0,i}) = p_{t(2)}(y_1 − (α + β s_0^{1|0,i}/(1 + s_0^{1|0,i}) + w^{1|0,i})).

4.
Evaluate the relative weights:

q_1^i = p_{t(2)}(y_1 − (α + β s_0^{1|0,i}/(1 + s_0^{1|0,i}) + w^{1|0,i})) / Σ_{i=1}^N p_{t(2)}(y_1 − (α + β s_0^{1|0,i}/(1 + s_0^{1|0,i}) + w^{1|0,i}))

5. Resample with replacement N values of {s_0^{1|0,i}, w^{1|0,i}}_{i=1}^N with relative weights q_1^i. Call those sampled values {s_0^{1,i}, w^{1,i}}_{i=1}^N.

6. Go back and iterate steps 2-5 until the end of the sample.

A Law of Large Numbers

A law of large numbers delivers:

p(y_1 | y^0, α, β, σ) ≃ (1/N) Σ_{i=1}^N p(y_1 | w_1^{1|0,i}, y^0, s_0^{1|0,i})

and consequently:

p(y^T | α, β, σ) ≃ Π_{t=1}^T (1/N) Σ_{i=1}^N p(y_t | w_1^{t|t-1,i}, y^{t-1}, s_0^{t|t-1,i})

Comparison with Alternative Schemes

• Deterministic algorithms: Extended Kalman Filter and derivations (Jazwinski, 1973), Gaussian sum approximations (Alspach and Sorenson, 1972), grid-based filters (Bucy and Senne, 1974), Jacobian of the transform (Miranda and Rui, 1997). Tanizaki (1996).

• Simulation algorithms: Kitagawa (1987), Gordon, Salmond and Smith (1993), Mariano and Tanizaki (1995) and Geweke and Tanizaki (1999).

A "Real" Application: the Stochastic Neoclassical Growth Model

• Standard model.
• Isn't the model nearly linear?
• Yes, but:
1. Better to begin with something easy.
2. We will learn something nevertheless.

The Model

• Representative agent with utility function U = E_0 Σ_{t=0}^∞ β^t (c_t^θ (1 − l_t)^{1−θ})^{1−τ}/(1 − τ).
• One good produced according to y_t = e^{z_t} A k_t^α l_t^{1−α} with α ∈ (0, 1).
• Productivity evolves as z_t = ρ z_{t-1} + ε_t, |ρ| < 1 and ε_t ∼ N(0, σ_ε).
• Law of motion for capital: k_{t+1} = i_t + (1 − δ) k_t.
• Resource constraint: c_t + i_t = y_t.

• Solve for c(·, ·) and l(·, ·) given initial conditions.
• Characterized by:

U_c(t) = β E_t [ U_c(t+1) (1 + α A e^{z_{t+1}} k_{t+1}^{α−1} l(k_{t+1}, z_{t+1})^{1−α} − δ) ]

((1 − θ)/θ) c(k_t, z_t)/(1 − l(k_t, z_t)) = (1 − α) e^{z_t} A k_t^α l(k_t, z_t)^{−α}

• A system of functional equations with no known analytical solution.

Solving the Model

• We need to use a numerical method to solve it.
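The six steps of the "trivial" application can be sketched as a bootstrap particle filter. A minimal sketch (here σ is treated as the standard deviation of w_t, and the function name, the data values, and N are illustrative choices, not from the text):

```python
import numpy as np

def t2_pdf(x):
    # Density of a Student-t with 2 degrees of freedom:
    # f(x) = (1 / (2 sqrt(2))) (1 + x^2 / 2)^(-3/2)
    return 1.0 / (2.0 * np.sqrt(2.0)) * (1.0 + x ** 2 / 2.0) ** (-1.5)

def particle_filter_loglik(y, alpha, beta, sigma, s0, N=1000, seed=0):
    """Bootstrap particle filter for
       s_t = alpha + beta s_{t-1} / (1 + s_{t-1}) + w_t,  w_t ~ N(0, sigma^2),
       y_t = s_t + v_t,                                   v_t ~ t(2)."""
    rng = np.random.default_rng(seed)
    s = np.full(N, float(s0))            # step 1: all particles start at s_0
    loglik = 0.0
    for yt in y:
        # step 2: propagate each particle through the transition equation
        s = alpha + beta * s / (1.0 + s) + sigma * rng.normal(size=N)
        # step 3: likelihood of y_t under each proposed particle
        w = t2_pdf(yt - s)
        loglik += np.log(w.mean())       # contribution of p(y_t | y^{t-1})
        # steps 4-5: normalize the weights and resample with replacement
        s = rng.choice(s, size=N, p=w / w.sum())
    return loglik

ll = particle_filter_loglik(np.array([0.5, -0.2, 0.8]), 0.2, 0.5, 0.3, 0.0)
```

Averaging the unnormalized weights before resampling is exactly the law-of-large-numbers likelihood approximation above, one factor per observation.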
• Different nonlinear approximations: value function iteration, perturbation, projection methods.
• We use a Finite Element Method. Why? Aruoba, Fernández-Villaverde and Rubio-Ramírez (2003):
1. Speed: sparse system.
2. Accuracy: flexible grid generation.
3. Scalable.

Building the Likelihood Function

• Time series:
1. Quarterly real output, hours worked and investment.
2. Main series from the model; keeps dimensionality low.
• Measurement error. Why?
• γ = (θ, ρ, τ, α, δ, β, σ_ε, σ_1, σ_2, σ_3)

State Space Representation

k_t = f_1(S_{t-1}, W_t; γ)
= e^{tanh^{-1}(λ_{t-1})} k_{t-1}^α l(k_{t-1}, tanh^{-1}(λ_{t-1}); γ)^{1−α} [1 − (θ(1−α)/(1−θ)) (1 − l(k_{t-1}, tanh^{-1}(λ_{t-1}); γ))/l(k_{t-1}, tanh^{-1}(λ_{t-1}); γ)] + (1 − δ) k_{t-1}

λ_t = f_2(S_{t-1}, W_t; γ) = tanh(ρ tanh^{-1}(λ_{t-1}) + ε_t)

gdp_t = g_1(S_t, V_t; γ) = e^{tanh^{-1}(λ_t)} k_t^α l(k_t, tanh^{-1}(λ_t); γ)^{1−α} + V_{1,t}

hours_t = g_2(S_t, V_t; γ) = l(k_t, tanh^{-1}(λ_t); γ) + V_{2,t}

inv_t = g_3(S_t, V_t; γ) = e^{tanh^{-1}(λ_t)} k_t^α l(k_t, tanh^{-1}(λ_t); γ)^{1−α} [1 − (θ(1−α)/(1−θ)) (1 − l(k_t, tanh^{-1}(λ_t); γ))/l(k_t, tanh^{-1}(λ_t); γ)] + V_{3,t}

Likelihood Function

Since our measurement equation implies that:

p(y_t | S_t; γ) = (2π)^{-3/2} |Σ|^{-1/2} e^{-ω(S_t; γ)/2}

where ω(S_t; γ) = (y_t − x(S_t; γ))' Σ^{-1} (y_t − x(S_t; γ)) for all t, we have:

p(y^T; γ) = (2π)^{-3T/2} |Σ|^{-T/2} ∫ Π_{t=1}^T ∫ e^{-ω(S_t; γ)/2} p(S_t | y^{t-1}, S_0; γ) dS_t p(S_0; γ) dS_0
≃ (2π)^{-3T/2} |Σ|^{-T/2} Π_{t=1}^T (1/N) Σ_{i=1}^N e^{-ω(s_t^i; γ)/2}

Priors for the Parameters

Priors for the Parameters of the Model

Parameters  Distribution  Hyperparameters
θ      Uniform  0, 1
ρ      Uniform  0, 1
τ      Uniform  0, 100
α      Uniform  0, 1
δ      Uniform  0, 0.05
β      Uniform  0.75, 1
σ_ε    Uniform  0, 0.1
σ_1    Uniform  0, 0.1
σ_2    Uniform  0, 0.1
σ_3    Uniform  0, 0.1

Likelihood-Based Inference I: a Bayesian Perspective

• Define priors over parameters: truncated uniforms.
• Use a random-walk Metropolis-Hastings to draw from the posterior.
• Find the marginal likelihood.

Likelihood-Based Inference II: a Maximum Likelihood Perspective

• We only need to maximize the likelihood.
• Difficulties in maximizing with Newton-type schemes.
• Common problem in dynamic equilibrium economies.
• We use a simulated annealing scheme.

An Exercise with Artificial Data

• First simulate data with our model and use that data as sample.
• Pick "true" parameter values: benchmark calibration values for the stochastic neoclassical growth model (Cooley and Prescott, 1995).

Calibrated Parameters

Parameter:  θ      ρ     τ    α    δ     β     σ_ε    σ_1         σ_2     σ_3
Value:      0.357  0.95  2.0  0.4  0.02  0.99  0.007  1.58×10^-4  0.0011  8.66×10^-4

• Sensitivity: τ = 50 and σ_ε = 0.035.

[Figure 5.1: Likelihood function, benchmark calibration. Likelihood cuts at ρ, τ, α, β, δ, θ, and σ; nonlinear versus linear likelihood and pseudotrue values.]

[Figure 5.2: Posterior distribution, benchmark calibration. Panels for ρ, τ, α, δ, σ, β, θ, σ_1, σ_2, and σ_3.]

[Figure 5.3: Likelihood function, extreme calibration. Likelihood cuts at ρ, τ, α, β, δ, θ, and σ; nonlinear versus linear likelihood and pseudotrue values.]

[Figure 5.4: Posterior distribution, extreme calibration. Panels for ρ, τ, α, δ, σ, β, θ, σ_1, σ_2, and σ_3.]
[Figure 5.5: Convergence of posteriors, extreme calibration. Panels for ρ, τ, α, δ, σ, β, θ, σ_1, σ_2, and σ_3.]

[Figure 5.6: Posterior distribution, real data. Panels for ρ, τ, α, δ, σ, β, θ, σ_1, σ_2, and σ_3.]

[Figure 6.1: Likelihood function. Transversal cuts at α, β, ρ, and σ_ε; exact likelihood versus 100, 1,000, and 10,000 particles.]

[Figure 6.2: C.D.F., benchmark calibration. Panels for 10,000 to 60,000 particles.]
[Figure 6.3: C.D.F., extreme calibration. Panels for 10,000 to 60,000 particles.]

[Figure 6.4: C.D.F., real data. Panels for 10,000 to 60,000 particles.]

Posterior Distributions, Benchmark Calibration

Parameters  Mean        s.d.
θ      0.357       6.72×10^-5
ρ      0.950       3.40×10^-4
τ      2.000       6.78×10^-4
α      0.400       8.60×10^-5
δ      0.020       1.34×10^-5
β      0.989       1.54×10^-5
σ_ε    0.007       9.29×10^-6
σ_1    1.58×10^-4  5.75×10^-8
σ_2    1.12×10^-2  6.44×10^-7
σ_3    8.64×10^-4  6.49×10^-7

Maximum Likelihood Estimates, Benchmark Calibration

Parameters  MLE         s.d.
θ      0.357       8.19×10^-6
ρ      0.950       0.001
τ      2.000       0.020
α      0.400       2.02×10^-6
δ      0.002       2.07×10^-5
β      0.990       1.00×10^-6
σ_ε    0.007       0.004
σ_1    1.58×10^-4  0.007
σ_2    1.12×10^-3  0.007
σ_3    8.63×10^-4  0.005

Posterior Distributions, Extreme Calibration

Parameters  Mean        s.d.
θ      0.357       7.19×10^-4
ρ      0.950       1.88×10^-4
τ      50.00       7.12×10^-3
α      0.400       4.80×10^-5
δ      0.020       3.52×10^-6
β      0.989       8.69×10^-6
σ_ε    0.035       4.47×10^-6
σ_1    1.58×10^-4  1.87×10^-8
σ_2    1.12×10^-2  2.14×10^-7
σ_3    8.65×10^-4  2.33×10^-7

Maximum Likelihood Estimates, Extreme Calibration

Parameters  MLE  s.d.
θ      0.357       2.42×10^-6
ρ      0.950       6.12×10^-3
τ      50.000      0.022
α      0.400       3.62×10^-7
δ      0.019       7.43×10^-6
β      0.990       1.00×10^-5
σ_ε    0.035       0.015
σ_1    1.58×10^-4  0.017
σ_2    1.12×10^-3  0.014
σ_3    8.66×10^-4  0.023

Convergence on Number of Particles

Convergence, Real Data

N      Mean      s.d.
10000  1014.558  0.3296
20000  1014.600  0.2595
30000  1014.653  0.1829
40000  1014.666  0.1604
50000  1014.688  0.1465
60000  1014.664  0.1347

Posterior Distributions, Real Data

Parameters  Mean   s.d.
θ      0.323  7.976×10^-4
ρ      0.969  0.008
τ      1.825  0.011
α      0.388  0.001
δ      0.006  3.557×10^-5
β      0.997  9.221×10^-5
σ_ε    0.023  2.702×10^-4
σ_1    0.039  5.346×10^-4
σ_2    0.018  4.723×10^-4
σ_3    0.034  6.300×10^-4

Maximum Likelihood Estimates, Real Data

Parameters  MLE    s.d.
θ      0.390  0.044
ρ      0.987  0.708
τ      1.781  1.398
α      0.324  0.019
δ      0.006  0.160
β      0.997  8.67×10^-3
σ_ε    0.023  0.224
σ_1    0.038  0.060
σ_2    0.016  0.061
σ_3    0.035  0.076

Logmarginal Likelihood Difference: Nonlinear − Linear

p    Benchmark Calibration  Extreme Calibration  Real Data
0.1  73.631                 117.608              93.65
0.5  73.627                 117.592              93.55
0.9  73.603                 117.564              93.55

Nonlinear versus Linear Moments, Real Data

          Real Data      Nonlinear (SMC filter)  Linear (Kalman filter)
          Mean   s.d.    Mean   s.d.             Mean   s.d.
output    1.95   0.073   1.91   0.129            1.61   0.068
hours     0.36   0.014   0.36   0.023            0.34   0.004
inv       0.42   0.066   0.44   0.073            0.28   0.044

A "Future" Application: Good Luck or Good Policy?

• The U.S. economy has become less volatile over the last 20 years (Stock and Watson, 2002).
• Why?
1. Good luck: Sims (1999), Bernanke and Mihov (1998a and 1998b) and Stock and Watson (2002).
2. Good policy: Clarida, Gertler and Galí (2000), Cogley and Sargent (2001 and 2003), De Long (1997) and Romer and Romer (2002).
3. Long-run trend: Blanchard and Simon (2001).

How Has the Literature Addressed this Question?

• So far: mostly with reduced-form models (usually VARs).
• But:
1. Results are difficult to interpret.
2. How to run counterfactuals?
3. Welfare analysis.

Why Not a Dynamic Equilibrium Model?
• New generation of equilibrium models: Christiano, Eichenbaum and Evans (2003) and Smets and Wouters (2003).
• Linear and normal.
• But we can do it!!!

Environment

• Discrete time t = 0, 1, ...
• Stochastic process s_t ∈ S with history s^t = (s_0, ..., s_t) and probability μ(s^t).

The Final Good Producer

• Perfectly competitive final good producer that solves:

max_{y_i(s^t)} (∫ y_i(s^t)^θ di)^{1/θ} − ∫ p_i(s^t) y_i(s^t) di

• Demand function for each input of the form:

y_i(s^t) = (p_i(s^t)/p(s^t))^{1/(θ−1)} y(s^t)

with price aggregator:

p(s^t) = (∫ p_i(s^t)^{θ/(θ−1)} di)^{(θ−1)/θ}

The Intermediate Good Producer

• Continuum of intermediate good producers, each one behaving as a monopolistic competitor.
• The producer of good i has access to the technology:

y_i(s^t) = max{ e^{z(s^t)} k_i^α(s^{t-1}) l_i^{1−α}(s^t) − φ, 0 }

• Productivity z(s^t) = ρ z(s^{t-1}) + ε_z(s^t).
• Calvo pricing with indexing. Probability of changing prices (before observing current period shocks): 1 − ζ.

Consumer's Problem

E Σ_{t=0}^∞ Σ_{s^t} β^t [ ε_c(s^t) (c(s^t) − dc(s^{t-1}))^{σ_c}/σ_c − ε_l(s^t) l(s^t)^{σ_l}/σ_l + ε_m(s^t) m(s^t)^{σ_m}/σ_m ]

subject to:

p(s^t)(c(s^t) + x(s^t)) + M(s^t) + ∫_{s_{t+1}} q(s^{t+1}|s^t) B(s^{t+1}) ds_{t+1}
= p(s^t)(w(s^t) l(s^t) + r(s^t) k(s^{t-1})) + M(s^{t-1}) + B(s^t) + Π(s^t) + T(s^t)

B(s^{t+1}) ≥ B̲

k(s^t) = (1 − δ) k(s^{t-1}) − φ(x(s^t)/k(s^{t-1})) + x(s^t)

Government Policy

• Monetary policy: Taylor rule

i(s^t) = r_g π_g(s^t) + a(s^t)(π(s^t) − π_g(s^t)) + b(s^t)(y(s^t) − y_g(s^t)) + ε_i(s^t)

π_g(s^t) = π_g(s^{t-1}) + ε_π(s^t)
a(s^t) = a(s^{t-1}) + ε_a(s^t)
b(s^t) = b(s^{t-1}) + ε_b(s^t)

• Fiscal policy.

Stochastic Volatility I

• We can stack all shocks in one vector:

ε(s^t) = (ε_z(s^t), ε_c(s^t), ε_l(s^t), ε_m(s^t), ε_i(s^t), ε_π(s^t), ε_a(s^t), ε_b(s^t))'

• Stochastic volatility:

ε(s^t) = R(s^t)^{0.5} ϑ(s^t)

• The matrix R(s^t) can be decomposed as:

R(s^t) = G(s^t)^{-1} H(s^t) G(s^t)

Stochastic Volatility II
³ ´ • The matrix R st can be decomposed as: ³ ´ ³ ´−1 ³ ´ ³ ´ t t H st G st . R s =G s 106 Stochastic Volatility II ³ ´ • H st (instantaneous shocks variances) is diagonal with nonzero ele³ ´ ments hi st that evolve: ³ ´ ³ ´ ³ ´ t t−1 + ς iη i st . log hi s = log hi s ³ ´ • G st (loading matrix) is lower triangular, with unit entries in the ³ ´ diagonal and entries γ ij st that evolve: ³ ´ ³ ´ ³ ´ t t−1 + ω ij ν ij st . γ ij s = γ ij s 107 Where Are We Now? • Solving the model: problem with 45 state variables: physical ³ capital, ´ t the aggregate price level, 7 shocks, 8 elements of matrix H s , and ³ ´ the 28 elements of the matrix G st . • Perturbation. • We are making good progress. 108