Drawing Stochastic Volatility

William J. McCausland (1)
Université de Montréal, CIREQ and CIRANO

Denis Pelletier (2)
North Carolina State University

Current version: May 1, 2006, very preliminary

(1) Mailing address: Département de sciences économiques, C.P. 6128, succursale Centre-ville, Montréal QC H3C 3J7, Canada. e-mail: william.j.mccausland@umontreal.ca. Web site: www.cirano.qc.ca/~mccauslw.
(2) Mailing address: Department of Economics, Campus Box 8110, North Carolina State University, Raleigh, 27695-8110, USA. e-mail: denis pelletier@ncsu.edu. Web site: http://www4.ncsu.edu/~dpellet.

We thank Siddhartha Chib, John Geweke, Éric Jacquier, Luke Tierney and Herman van Dijk for helpful comments. We alone are responsible for any errors.

Abstract

Bayesian analysis of stochastic volatility (SV) models typically involves using Markov chain Monte Carlo (MCMC) methods to simulate the joint posterior distribution of parameters and stochastic volatility. These simulators often draw proposals of parameters or volatility from approximate distributions and then use an accept/reject scheme to correct for the approximations. Jacquier, Polson and Rossi (1994) introduce a simulator which draws volatility proposals one observation at a time. Shephard and Pitt (1997) demonstrate the important numerical efficiency gains obtained by drawing (log) volatility proposals in blocks. Kim, Shephard and Chib (1998) show the substantial additional improvements to be had by drawing joint proposals of parameters and the entire log-volatility vector. We propose a new procedure for drawing joint proposals of parameters and log-volatility in a standard univariate stochastic volatility model. We first describe a simple, but naive, multivariate normal log-volatility proposal distribution approximating the conditional distribution of log-volatility given parameters and return data. We then describe two refinements, which result in a much better approximation.
We then present a parameter proposal distribution approximating the conditional distribution of parameters given only data. Combining the log-volatility proposal with the parameter proposal gives a joint proposal whose distribution closely approximates the joint posterior distribution of parameters and log-volatility. The procedure is fast, and for reasonable numbers of observations, we achieve high numerical efficiency. The general approach is not very model specific, and we suggest that our procedure can be modified to work with more general models, with features such as leverage effects, finite-mixture-of-normals return shocks and multivariate returns without a factor structure. We illustrate our procedure using foreign exchange data.

1 Introduction

The conditional variance, or volatility, of asset returns evolves over time. Engle's (1982) autoregressive conditional heteroscedasticity (ARCH) model and various generalizations capture this phenomenon. Since volatility is a deterministic function of previous returns, the likelihood function is easy to compute.

In stochastic volatility (SV) models, volatility is a latent stochastic process. Jacquier, Polson and Rossi (1994) and Geweke (1994) give evidence suggesting that SV models are more realistic. However, the likelihood function cannot easily be evaluated, which makes maximum likelihood approaches difficult. Bayesian approaches to inference for SV models are widely used, partly because it is easy to avoid having to evaluate the likelihood function: the joint posterior density of parameters and volatilities is easily evaluated up to a multiplicative normalization constant, and so we can simulate this joint posterior distribution using Markov chain Monte Carlo methods.

We now describe a simple univariate discrete-time stochastic volatility model. This model, together with variants differing only in terms of parametrization, is widely used.
The log-volatility equation is
\[ \lambda_t = \phi \lambda_{t-1} + \xi_t, \]
and the return equation is
\[ r_t = e^{\lambda_t/2} \eta_t. \]
The error sequences $\{\xi_t\}$ and $\{\eta_t\}$ are both Gaussian white noise and are mutually independent. Their precisions (the precision is the inverse of the variance) are $\omega_\lambda$ and $\omega_r$ respectively. The volatility sequence $\{\lambda_t\}$ is stationary. Define the parameter vector $\theta \equiv (\omega_\lambda, \phi, \omega_r)$.

Jacquier, Polson and Rossi (1994) propose a posterior simulator that draws volatility proposals one observation at a time. This approach is relatively simple. It is not overly model-specific, and it can thus easily be extended to more elaborate models. However, the high posterior autocorrelation of volatility, especially for daily returns, leads to highly autocorrelated posterior draws.

Shephard and Pitt (1997) (SP) show how to draw blocks of log-volatilities using a Metropolis-Hastings update. This update involves drawing a random proposal for the volatility block and accepting it with a probability given by the Hastings ratio, which measures the closeness of the proposal density to the true conditional posterior density at both the current state and the random proposal. If the proposal is rejected, the new value of the volatility block is set equal to its value at the current state. The log-volatility block proposals are multivariate normal. Capturing volatility autocorrelation within a block reduces the autocorrelation of posterior draws, but only to a point: as the blocks get longer, the proposal distribution becomes a cruder approximation of the correct distribution, and the acceptance rate deteriorates. Eventually, the autocorrelation of posterior draws begins to increase with block length.

Kim, Shephard and Chib (1998) (KSC) transform the canonical SV model into a linear one, and approximate the random component of the transformed model as a mixture of normals.
They employ a technique known as data augmentation, adding to the vector of unknown quantities a sequence of discrete latent variables indicating mixture components. Their simulator samples the joint distribution of parameters, volatilities and mixture component indices. Conditional on these indices, the approximate model is linear and Gaussian, and this simplicity allows them not only to draw volatilities for all observations at once, achieving an improvement in numerical efficiency similar to that of SP, but also to integrate out volatilities and then draw both volatilities and parameters as a single block. Since volatilities and parameters are highly correlated, this leads to further numerical efficiency improvements.

The KSC method has been extended to many generalizations of the canonical SV model, including models with a leverage effect (Omori et al. (2006)), scale mixtures of normals for the return equation shock (Chib, Nardari and Shephard (2002), Omori et al. (2006)), jumps (Chib, Nardari and Shephard (2002)), and multivariate returns (Chib, Nardari and Shephard (2006)).

Important as it is, this method has some limitations. The chain's stationary distribution is only an approximation of the posterior distribution. Obtaining simulation-consistent sample moments requires re-weighting the draws generated by the chain, and numerical standard errors for the reweighted chain are larger than those for the unweighted chain. While samplers based on the KSC approach have been developed for models with pure scale mixtures for the return equation shock, more general mixtures present difficulties. The methods of Chib, Nardari and Shephard (2006) apply to factor models with independent factors. Models that do not have a factor structure are more difficult.

In this paper, we propose a new method for drawing joint proposals of parameters and volatility. The procedure is fast and has very high numerical efficiency for reasonable numbers of observations.
We will argue that our general approach to drawing volatility can be extended to models where return shocks have a finite mixture of normals distribution and to multivariate models with very flexible cross-sectional dependence.

Section 2 is devoted to preliminary concepts. We write down the density functions of the model, review properties of the precision and co-vector of a normal distribution and discuss quadratic approximations of $\log f(r|\theta, \lambda)$.

In Section 3, we discuss methods for drawing volatility proposals $\lambda^* = (\lambda_1^*, \ldots, \lambda_T^*)$ and evaluating the proposal density $q(\lambda^*; \theta, r)$ at the realized draw, all in $O(T)$ time. We first present a basic multivariate normal proposal that approximates the conditional posterior distribution $\lambda|r, \theta$. It is based on a quadratic approximation of $\log f(r|\theta, \lambda)$ using its gradient and (diagonal) Hessian at the mode of $\lambda|r, \theta$. We show how to draw the $\lambda_t^*$ sequentially backwards. To compute $E[\lambda_t^*|\lambda_{t+1}^*, \ldots, \lambda_T^*]$ and $\mathrm{var}[\lambda_t^*|\lambda_{t+1}^*, \ldots, \lambda_T^*]$, we use an algorithm by Vandebril, Mastronardi and Van Barel (VMVB) for solving band diagonal symmetric systems. The conditional means and variances we need are simple functions of intermediate computations.

We then offer two refinements of the basic proposal distribution. The first refinement involves changing the quadratic approximation of $\log f(r|\theta, \lambda)$ as we learn about the trajectory of volatility. The result is a proposal that relaxes multivariate normality, but retains conditional normality of the $\lambda_t^*|\lambda_{t+1}^*, \ldots, \lambda_T^*$. In the second refinement, we replace the conditional normal distribution of $\lambda_t^*|\lambda_{t+1}^*, \ldots, \lambda_T^*$ with a mixture distribution which captures some of the departure from normality of the distribution $\lambda_t|\lambda_{t+1}, \ldots, \lambda_T, r, \theta$. The proposal distribution we use in practice incorporates both refinements.
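In Section 4, we show how to draw proposals $\theta^*$ of the parameter vector from a distribution approximating its marginal posterior distribution $\theta|r$. Together with the log-volatility proposal of Section 3, which draws $\lambda^*$ from a distribution approximating $\lambda|\theta, r$, we obtain a proposal distribution for the pair $(\theta^*, \lambda^*)$ that closely approximates the distribution $\theta, \lambda|r$.

In Section 5, we show how to approximate the marginal likelihood $f(r)$. In Section 6, we present empirical results suggesting that the simulator is numerically efficient. In Section 7, we conclude, suggesting that our approach can be extended to more general models. These include models with features such as leverage effects and return equation shocks with a finite mixture of normals distribution. They also include multivariate models with very flexible dependence.

2 Preliminaries

2.1 Densities for parameters, volatilities and returns

We complete the model with a prior distribution for $\theta = (\omega_\lambda, \phi, \omega_r)$ where $\omega_\lambda$, $\phi$ and $\omega_r$ are independent, $\bar{s}_\lambda^2 \omega_\lambda \sim \chi^2(\bar{\nu}_\lambda)$, $\bar{s}_r^2 \omega_r \sim \chi^2(\bar{\nu}_r)$, and the distribution of $\phi$ is the truncation of the distribution $N(\bar{\phi}, \bar{\omega}_\phi^{-1})$ to the stationary interval $(-1, 1)$. Thus,
\[
\begin{aligned}
f(\theta) = {} & \left(\frac{\bar{s}_\lambda^2}{2}\right)^{\bar{\nu}_\lambda/2} \frac{1}{\Gamma(\bar{\nu}_\lambda/2)}\, \omega_\lambda^{(\bar{\nu}_\lambda - 2)/2} \exp(-\bar{s}_\lambda^2 \omega_\lambda/2) \\
& \cdot \frac{1}{\Phi(\sqrt{\bar{\omega}_\phi}(1 - \bar{\phi})) - \Phi(\sqrt{\bar{\omega}_\phi}(-1 - \bar{\phi}))} \sqrt{\frac{\bar{\omega}_\phi}{2\pi}} \exp\left[-\frac{\bar{\omega}_\phi}{2}(\phi - \bar{\phi})^2\right] \\
& \cdot \left(\frac{\bar{s}_r^2}{2}\right)^{\bar{\nu}_r/2} \frac{1}{\Gamma(\bar{\nu}_r/2)}\, \omega_r^{(\bar{\nu}_r - 2)/2} \exp(-\bar{s}_r^2 \omega_r/2).
\end{aligned}
\]

We observe the return $r_t$ for $t = 1, \ldots, T$. Let $\lambda = (\lambda_1, \ldots, \lambda_T)$ and $r = (r_1, \ldots, r_T)$. The distributions $\lambda|\omega_\lambda, \phi$ and $r|\lambda, \omega_r$ have densities
\[
f(\lambda|\omega_\lambda, \phi) = \left(\frac{\omega_\lambda(1 - \phi^2)}{2\pi}\right)^{1/2} \exp\left[-\frac{\omega_\lambda(1 - \phi^2)}{2}\lambda_1^2\right] \cdot \prod_{t=2}^{T} \left(\frac{\omega_\lambda}{2\pi}\right)^{1/2} \exp\left[-\frac{\omega_\lambda}{2}(\lambda_t - \phi\lambda_{t-1})^2\right]
\]
and
\[
f(r|\lambda, \omega_r) = \prod_{t=1}^{T} \left(\frac{\omega_r e^{-\lambda_t}}{2\pi}\right)^{1/2} \exp\left[-\frac{\omega_r e^{-\lambda_t}}{2} r_t^2\right].
\]

2.2 Precisions and co-vectors

We find it very useful to work with precisions and co-vectors as well as means and variances, so we define these terms here and point out some simple but useful properties.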
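For any random vector $x \sim N(\mu, \Sigma)$, we call $H \equiv \Sigma^{-1}$ the precision, and $c \equiv \Sigma^{-1}\mu$ the co-vector. Note that $\Sigma = H^{-1}$ and $\mu = H^{-1}c$, an illustration of the duality between $(\mu, \Sigma)$ and $(c, H)$.

In matrix notation, the density for $\lambda|\omega_\lambda, \phi$ is
\[ f(\lambda|\omega_\lambda, \phi) = (2\pi)^{-T/2} |\bar{H}|^{1/2} \exp\left(-\tfrac{1}{2}\lambda' \bar{H} \lambda\right), \]
where the precision $\bar{H}$ is given by
\[
\bar{H} = \omega_\lambda
\begin{pmatrix}
1 & -\phi & 0 & \cdots & 0 & 0 \\
-\phi & 1 + \phi^2 & -\phi & \cdots & 0 & 0 \\
0 & -\phi & 1 + \phi^2 & \cdots & 0 & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\
0 & 0 & 0 & \cdots & 1 + \phi^2 & -\phi \\
0 & 0 & 0 & \cdots & -\phi & 1
\end{pmatrix}.
\]
The co-vector is $\bar{c} = 0_T$.

It is easy to verify that for $1 \le t < T$, the distribution $\lambda_1, \ldots, \lambda_t|\omega_\lambda, \phi, \lambda_{t+1}, \ldots, \lambda_T$ is multivariate normal, that its precision is the sub-matrix
\[
\begin{pmatrix}
\bar{H}_{1,1} & \cdots & \bar{H}_{1,t} \\
\vdots & \ddots & \vdots \\
\bar{H}_{t,1} & \cdots & \bar{H}_{t,t}
\end{pmatrix},
\]
and that its co-vector is $(0, \ldots, 0, \phi\,\omega_\lambda \lambda_{t+1})'$.

2.3 Taylor Expansions of $f(r_t|\lambda_t, \omega_r)$

We note that $\log f(r|\lambda, \theta)$ is concave in $\lambda$ and additively separable in $\lambda_1, \ldots, \lambda_T$. We will approximate $\log f(r|\lambda, \theta)$ as a quadratic form in $\lambda$ with a diagonal coefficient matrix. This will lead to multivariate normal approximations of the posterior distribution $\lambda|\theta, r$.

Let $g(\lambda_t) \equiv \omega_r r_t^2 e^{-\lambda_t} + \lambda_t$, and note that $f(r_t|\lambda_t, \omega_r)$ is proportional to $\exp[-\tfrac{1}{2} g(\lambda_t)]$ as a function of $\lambda_t$. We approximate $g(\lambda_t)$ by $\tilde{g}(\lambda_t)$, consisting of the first three terms of the Taylor expansion of $g(\lambda_t)$ around some value $\lambda_t^\circ$:
\[ g(\lambda_t) \approx \tilde{g}(\lambda_t) \equiv g(\lambda_t^\circ) + g'(\lambda_t^\circ)(\lambda_t - \lambda_t^\circ) + \tfrac{1}{2} g''(\lambda_t^\circ)(\lambda_t - \lambda_t^\circ)^2. \]
The first two derivatives of $g(\lambda_t)$ are
\[ g'(\lambda_t) = -\omega_r r_t^2 e^{-\lambda_t} + 1 \quad\text{and}\quad g''(\lambda_t) = \omega_r r_t^2 e^{-\lambda_t}. \]
If we complete the square, we obtain
\[ \tilde{g}(\lambda_t) = h_t(\lambda_t - c_t/h_t)^2 + k, \]
where
\[ h_t = \tfrac{1}{2} g''(\lambda_t^\circ) = \tfrac{1}{2}\omega_r r_t^2 e^{-\lambda_t^\circ}, \]
\[ c_t = \tfrac{1}{2}\left[g''(\lambda_t^\circ)\lambda_t^\circ - g'(\lambda_t^\circ)\right] = \tfrac{1}{2}\left[\omega_r r_t^2 e^{-\lambda_t^\circ}\lambda_t^\circ - 1 + \omega_r r_t^2 e^{-\lambda_t^\circ}\right] = h_t(1 + \lambda_t^\circ) - \tfrac{1}{2}, \]
and $k$ is an unimportant term not depending on $\lambda_t$. We point out that $h_t$ and $c_t$ are the precision and co-vector of a univariate normal distribution with density proportional to $\exp[-\tfrac{1}{2}\tilde{g}(\lambda_t)]$.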
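The quantities above are easy to compute and to check numerically. The following is a minimal sketch, not the authors' code: the function names and the input values are illustrative assumptions, and only the formulas for $g$, $h_t$ and $c_t$ come from the text. It verifies that the second-order expansion and the completed square differ only by a constant.

```python
import math


def g(lam, r_t, omega_r):
    """g(lambda_t) = omega_r * r_t^2 * exp(-lambda_t) + lambda_t."""
    return omega_r * r_t ** 2 * math.exp(-lam) + lam


def quadratic_coefficients(lam0, r_t, omega_r):
    """Return (h_t, c_t) for the expansion of g around lam0."""
    h = 0.5 * omega_r * r_t ** 2 * math.exp(-lam0)
    c = h * (1.0 + lam0) - 0.5
    return h, c


def g_tilde(lam, lam0, r_t, omega_r):
    """Second-order Taylor expansion of g around lam0."""
    w = omega_r * r_t ** 2 * math.exp(-lam0)
    g1 = 1.0 - w          # g'(lam0)
    g2 = w                # g''(lam0)
    d = lam - lam0
    return g(lam0, r_t, omega_r) + g1 * d + 0.5 * g2 * d * d


# Illustrative values only (assumptions, not taken from the paper's data).
omega_r, r_t, lam0 = 1.0e4, 0.012, 0.3
h, c = quadratic_coefficients(lam0, r_t, omega_r)

# g_tilde(lam) = h * (lam - c/h)^2 + k, so differences of g_tilde between two
# points must equal differences of the completed square (k cancels).
a, b = -0.4, 0.9
lhs = g_tilde(a, lam0, r_t, omega_r) - g_tilde(b, lam0, r_t, omega_r)
rhs = h * (a - c / h) ** 2 - h * (b - c / h) ** 2
```

Under this approximation the univariate normal factor has mean `c / h` and variance `1 / h`, which is the duality between means and variances on one hand and co-vectors and precisions on the other.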
The additive separability of $\log f(r|\lambda, \theta)$ in the elements of $\lambda$ means that $f(r|\lambda, \theta)$ is reasonably well approximated, as a function of $\lambda$, by $\prod_{t=1}^{T} \exp[-\tfrac{1}{2}\tilde{g}(\lambda_t)]$, which is proportional to a multivariate normal density with precision $H$ and co-vector $c$, given by
\[
H \equiv
\begin{pmatrix}
h_1 & 0 & \cdots & 0 \\
0 & h_2 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & h_T
\end{pmatrix}
\quad\text{and}\quad
c \equiv
\begin{pmatrix}
c_1 \\ \vdots \\ c_T
\end{pmatrix}.
\]
The posterior density $f(\lambda|r, \theta)$, proportional to $f(r|\lambda, \theta) f(\lambda|\theta)$, can be approximated by a multivariate normal density with precision $\bar{\bar{H}} = \bar{H} + H$ and co-vector $\bar{\bar{c}} = \bar{c} + c$.

The approximation depends on the choice of $\lambda^\circ$, the vector of values around which we Taylor-expand the $g(\lambda_t)$. We will discuss this choice later, but for now consider the mode of the distribution $\lambda|\theta, r$ as a reasonable choice.

2.4 An algorithm for solving band diagonal systems

Vandebril, Mastronardi and Van Barel (2005) (VMVB) introduce a Levinson-like algorithm for finding the solution $x$ to the equation $Ax = y$, where $A$ is a $T \times T$ real symmetric band-diagonal matrix, and $y$ is a real $T \times 1$ vector, in time $O(T)$. For $A$ with a single non-zero off-diagonal, the algorithm is as follows.

1. Compute $\mu_1 := y_1/A_{11}$, $\alpha_1 := -1/A_{11}$.

2. For $t = 2, \ldots, T$, compute
\[ \mu_t := \frac{y_t - A_{t,t-1}\mu_{t-1}}{A_{tt} + (A_{t,t-1})^2 \alpha_{t-1}}, \qquad \alpha_t := \frac{-1}{A_{tt} + (A_{t,t-1})^2 \alpha_{t-1}}. \]

3. Compute $x_T := \mu_T$.

4. For $t = T - 1, \ldots, 1$, compute $x_t := \mu_t + A_{t+1,t}\,\alpha_t\, x_{t+1}$.

VMVB show that the result $x$ indeed satisfies $y = Ax$. They also give evidence, using numerical experiments, that their procedure is stable. The issue of stability is important and not trivial: there is a simple and fairly obvious algorithm for solving $Ax = y$ that is numerically unstable.

The following two results will be important for drawing volatility proposals.

Result 2.1. $((A_{1:t})^{-1} y_{1:t})_t = \mu_t$, where $A_{1:t}$ is the leading $t \times t$ sub-matrix of $A$ and $y_{1:t}$ is the leading $t \times 1$ sub-vector of $y$.

Proof. Apply the algorithm to the system $A_{1:t} z = y_{1:t}$ and note that $\mu_1, \ldots, \mu_t$ and $\alpha_1, \ldots, \alpha_t$ are identical for the systems $A_{1:t} z = y_{1:t}$ and $Ax = y$.

Result 2.2. $((A_{1:t})^{-1})_{tt} = -\alpha_t$.

Proof. Apply the algorithm to the system $A_{1:t} z = e_t$, where $e_t$ is the $t$-vector $(0, \ldots, 0, 1)$, and note that $\mu_\tau = 0$ for $\tau = 1, \ldots, t - 1$ and $\mu_t = -\alpha_t$.

3 Volatility Proposals

We introduce methods for drawing volatility proposals $\lambda^*$. We first introduce a simple, but inefficient, basic proposal. We then discuss two refinements of the basic proposal, and show that they greatly improve the numerical efficiency of a chain simulating $\lambda|\theta, r$.

3.1 The Basic Proposal

The basic proposal is multivariate normal. We choose $\lambda^\circ$ as the mode of the posterior distribution $\lambda|\theta, r$ and compute $\bar{\bar{H}}$ and $\bar{\bar{c}}$ from $\lambda^\circ$ as described in Section 2.3. Then $\lambda^*$ is multivariate normal with precision $\bar{\bar{H}}$ and co-vector $\bar{\bar{c}}$:
\[ \lambda^* \sim N(\bar{\bar{H}}^{-1}\bar{\bar{c}},\; \bar{\bar{H}}^{-1}). \]

We obtain $\lambda^\circ$ using an iterative approach where we repeat the following two steps until $\|\lambda^\circ_{(i)} - \lambda^\circ_{(i-1)}\|_\infty$ is less than a certain tolerance.

1. Compute $\bar{\bar{H}}_{(i)}$ and $\bar{\bar{c}}_{(i)}$ in terms of $\lambda^\circ_{(i-1)}$.

2. Compute $\lambda^\circ_{(i)} = \bar{\bar{H}}_{(i)}^{-1}\bar{\bar{c}}_{(i)}$ using the VMVB algorithm.

We draw the $\lambda_t^*$ sequentially backwards. Recall that the precision of the distribution $\lambda_1^*, \ldots, \lambda_t^*|\lambda_{t+1}^*, \ldots, \lambda_T^*$ is the leading sub-matrix $\bar{\bar{H}}_{1:t}$ of $\bar{\bar{H}}$ and the co-vector is $\bar{\bar{c}}_{1:t} + \phi\omega_\lambda\lambda_{t+1}^* e_t$, where $\bar{\bar{c}}_{1:t}$ is the leading $t$-vector of $\bar{\bar{c}}$ and $e_t$ is the $t$-vector $(0, \ldots, 0, 1)$. Standard results on conditional normal distributions give
\[ \lambda_t^*|\lambda_{t+1}^*, \ldots, \lambda_T^* \sim N\left(\left((\bar{\bar{H}}_{1:t})^{-1}(\bar{\bar{c}}_{1:t} + \phi\omega_\lambda\lambda_{t+1}^* e_t)\right)_t,\; ((\bar{\bar{H}}_{1:t})^{-1})_{tt}\right). \]

Result 2.2 tells us that the conditional variance $((\bar{\bar{H}}_{1:t})^{-1})_{tt}$ is simply the intermediate quantity $-\alpha_t$ obtained by applying the VMVB algorithm with $A = \bar{\bar{H}}$. Result 2.1, with $A = \bar{\bar{H}}$ and $y = \bar{\bar{c}}$, gives the conditional mean: because we need the last element of $(\bar{\bar{H}}_{1:t})^{-1}(\bar{\bar{c}}_{1:t} + \phi\omega_\lambda\lambda_{t+1}^* e_t)$ rather than that of $(\bar{\bar{H}}_{1:t})^{-1}\bar{\bar{c}}_{1:t}$, the quantity $\mu_t$ must be replaced with
\[ m_t \equiv \frac{\bar{\bar{c}}_t + \phi\omega_\lambda\lambda_{t+1}^* - \bar{\bar{H}}_{t,t-1}\mu_{t-1}}{\bar{\bar{H}}_{tt} + (\bar{\bar{H}}_{t,t-1})^2\alpha_{t-1}}. \]

Putting everything together, we see that the following algorithm draws $\lambda^*$ and evaluates $q^*$, the proposal density evaluated at $\lambda^*$, all in time $O(T)$. We take $\mu_0 = \alpha_0 = 0$ and initialize $q^* := 1$.

1. For $t = 1, \ldots, T$, compute
\[ \mu_t := \frac{\bar{\bar{c}}_t - \bar{\bar{H}}_{t,t-1}\mu_{t-1}}{\bar{\bar{H}}_{tt} + (\bar{\bar{H}}_{t,t-1})^2\alpha_{t-1}}, \qquad \alpha_t := \frac{-1}{\bar{\bar{H}}_{tt} + (\bar{\bar{H}}_{t,t-1})^2\alpha_{t-1}}. \]

2. For $t = T, T - 1, \ldots, 1$,

(a) Compute (omitting the term in $\lambda_{t+1}^*$ when $t = T$)
\[ m_t := \frac{\bar{\bar{c}}_t + \phi\omega_\lambda\lambda_{t+1}^* - \bar{\bar{H}}_{t,t-1}\mu_{t-1}}{\bar{\bar{H}}_{tt} + (\bar{\bar{H}}_{t,t-1})^2\alpha_{t-1}}. \]

(b) Draw $\lambda_t^* \sim N(m_t, -\alpha_t)$.

(c) Evaluate
\[ q^* := q^* \cdot \frac{1}{\sqrt{2\pi(-\alpha_t)}} \exp\left[-\frac{1}{2}\frac{(\lambda_t^* - m_t)^2}{-\alpha_t}\right]. \]

We illustrate the performance of the basic proposal with an example. We first simulate some data. We set $\theta = (\omega_\lambda, \phi, \omega_r) = (20.0, 0.95, 1.0 \times 10^4)$ and draw $T = 1000$ return observations. We then simulate the distribution $\lambda|\theta, r$ using an independence Metropolis-Hastings chain, with the basic proposal distribution. We draw $M = 10^6$ proposals in all. The chosen value of $\omega_\lambda$ is small but plausible. We do this to increase the weight of the non-quadratic contribution to $\log f(\lambda|r, \theta)$ relative to the quadratic part, making this a difficult but reasonable case.

To analyze the performance of the proposal, we look at the run lengths of sequences of repeated values of the $\lambda$ vector in the posterior sample. If a volatility proposal $\lambda^*$ is accepted, the next $n$ volatility proposals are rejected, and the $(n + 1)$st is accepted, we say that the run length of $\lambda^*$ is $n + 1$. Long run lengths indicate poor mixing of the posterior chain and high numerical standard errors.

Figure 1 shows estimates of the survival function and the hazard rate of the run length for the basic proposal. We see that the distribution of run lengths has a thick tail, and that long run lengths occur with unacceptably high probability.

3.2 A First Refinement

The basic multivariate normal proposal is based on a quadratic approximation of $\log f(r|\lambda, \theta)$ at $\lambda^\circ$, the mode of the distribution $\lambda|r, \theta$.
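The forward pass and backward draws of the basic proposal in Section 3.1 can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names and the sample inputs are hypothetical, the expansion point is set to zero rather than to the posterior mode, and the sketch accumulates $\log q^*$ rather than $q^*$ to avoid underflow for large $T$.

```python
import math
import random


def forward_pass(diag, off, y):
    """Forward recursion of Section 2.4 for a symmetric tridiagonal matrix.

    diag[t] holds A[t][t]; off[t] holds A[t][t-1] (off[0] is ignored).
    Returns the lists (mu, alpha), using 0-based indices.
    """
    mu, alpha = [], []
    mu_prev, alpha_prev = 0.0, 0.0             # mu_0 = alpha_0 = 0
    for t in range(len(diag)):
        o = off[t] if t > 0 else 0.0
        pivot = diag[t] + o * o * alpha_prev
        mu.append((y[t] - o * mu_prev) / pivot)
        alpha.append(-1.0 / pivot)
        mu_prev, alpha_prev = mu[-1], alpha[-1]
    return mu, alpha


def solve_tridiagonal(diag, off, y):
    """Solve A x = y by the forward pass followed by back-substitution."""
    mu, alpha = forward_pass(diag, off, y)
    x = list(mu)                                # x_T = mu_T
    for t in range(len(diag) - 2, -1, -1):
        x[t] = mu[t] + off[t + 1] * alpha[t] * x[t + 1]
    return x


def approx_posterior(lam0, r, omega_lam, phi, omega_r):
    """Precision (diag, off) and co-vector of the normal approximation (T >= 2)."""
    T = len(r)
    diag, off, c = [], [], []
    for t in range(T):
        h = 0.5 * omega_r * r[t] ** 2 * math.exp(-lam0[t])
        prior = 1.0 if t in (0, T - 1) else 1.0 + phi * phi
        diag.append(omega_lam * prior + h)
        off.append(-phi * omega_lam)            # entry 0 is unused
        c.append(h * (1.0 + lam0[t]) - 0.5)
    return diag, off, c


def draw_basic_proposal(diag, off, c, rng):
    """Draw lambda* backwards and return (lambda*, log q(lambda*))."""
    mu, alpha = forward_pass(diag, off, c)
    T = len(diag)
    lam = [0.0] * T
    log_q = 0.0
    for t in range(T - 1, -1, -1):
        var = -alpha[t]                         # conditional variance (Result 2.2)
        m = mu[t]
        if t < T - 1:
            # off[t+1] = -phi*omega_lambda, so this adds
            # phi*omega_lambda*lam[t+1] divided by the pivot, as in m_t.
            m += off[t + 1] * alpha[t] * lam[t + 1]
        lam[t] = rng.gauss(m, math.sqrt(var))
        log_q += -0.5 * math.log(2.0 * math.pi * var) - 0.5 * (lam[t] - m) ** 2 / var
    return lam, log_q


# Illustrative inputs only (not data from the paper).
rng = random.Random(0)
r = [0.01, -0.02, 0.005, 0.015, -0.01]
lam0 = [0.0] * len(r)    # placeholder expansion point; the paper uses the mode
diag, off, c = approx_posterior(lam0, r, omega_lam=20.0, phi=0.95, omega_r=1.0e4)
lam_star, log_q = draw_basic_proposal(diag, off, c, rng)
```

To obtain the mode, one would alternate `approx_posterior` and `solve_tridiagonal` until the change in the expansion point is below a tolerance, as in the two-step iteration of Section 3.1.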
The idea behind the first refinement is to use the information obtained while drawing $\lambda^*$ to refine the quadratic approximation of $\log f(r|\lambda, \theta)$. After each draw $\lambda_t^*$, we adjust $\lambda_{t-J}^\circ, \ldots, \lambda_{t-1}^\circ$, where $J$ is a small integer, in the direction of the mode of the conditional distribution $\lambda_1, \ldots, \lambda_t|\lambda_{t+1}, \ldots, \lambda_T, \theta, r$. Using the modified values $\lambda_{t-J}^\circ, \ldots, \lambda_{t-1}^\circ$, we refine the quadratic approximation of $\log f(r|\lambda, \theta)$ by adjusting $h_{t-J}, \ldots, h_t$ and $c_{t-J}, \ldots, c_t$. We then recompute $\alpha_{t-J}, \ldots, \alpha_t$ and $\mu_{t-J}, \ldots, \mu_{t-1}$. Finally, we compute $m_t$, the conditional mean $E[\lambda_t^*|\lambda_{t+1}^*, \ldots, \lambda_T^*]$. The conditional variance is the negative of the adjusted value of $\alpha_t$.

The result is, like the basic proposal, a sequence of draws $\lambda_T^*, \ldots, \lambda_1^*$ that are conditionally normal. However, the proposal is no longer multivariate normal.

See the modified algorithm below. Step one, the forward pass, is identical to that of the basic proposal. The backward pass has five sub-steps. The horizon parameter $J$ determines how far back we refine our quadratic approximations of $g(\lambda_t)$ in order to draw each $\lambda_t^*$. In sub-step 2(a), we compute, for $\tau = t, t - 1, \ldots, t - J$, an approximation $m_\tau$ of the conditional mean $E[\lambda_\tau|\lambda_{t+1} = \lambda_{t+1}^*, \ldots, \lambda_T = \lambda_T^*, \theta, r]$. We then recompute the quadratic expansion of $g(\lambda_\tau)$ around $m_\tau$, obtaining new values for $h_\tau$ and $c_\tau$, and hence for $\bar{\bar{H}}_{\tau\tau}$ and $\bar{\bar{c}}_\tau$. In sub-step 2(b), we update, for $\tau = t - J, \ldots, t - 1$, the values $\mu_\tau$ and $\alpha_\tau$ based on the new values of $h_\tau$ and $c_\tau$. In sub-step 2(c), we update $m_t$, which we use as the conditional mean $E[\lambda_t^*|\lambda_{t+1}^*, \ldots, \lambda_T^*]$, and $\alpha_t$, the negative of which we use as the conditional variance $\mathrm{var}[\lambda_t^*|\lambda_{t+1}^*, \ldots, \lambda_T^*]$. In sub-step 2(d), we draw $\lambda_t^* \sim N(m_t, -\alpha_t)$. In sub-step 2(e), we update the evaluation of $q^*$, the proposal density evaluated at $\lambda^*$. We take $\mu_0 = \alpha_0 = 0$ and initialize $q^* := 1$.

1. For $t = 1, \ldots, T$, compute
\[ \mu_t := \frac{\bar{\bar{c}}_t - \bar{\bar{H}}_{t,t-1}\mu_{t-1}}{\bar{\bar{H}}_{tt} + (\bar{\bar{H}}_{t,t-1})^2\alpha_{t-1}}, \qquad \alpha_t := \frac{-1}{\bar{\bar{H}}_{tt} + (\bar{\bar{H}}_{t,t-1})^2\alpha_{t-1}}. \]

2. For $t = T, T - 1, \ldots, 1$,

(a) for $\tau = t, t - 1, \ldots, \max(t - J, 1)$, compute
\[
m_\tau :=
\begin{cases}
\dfrac{\bar{\bar{c}}_t + \phi\omega_\lambda\lambda_{t+1}^* - \bar{\bar{H}}_{t,t-1}\mu_{t-1}}{\bar{\bar{H}}_{tt} + (\bar{\bar{H}}_{t,t-1})^2\alpha_{t-1}} & \tau = t, \\[2ex]
\mu_\tau + \alpha_\tau \bar{\bar{H}}_{\tau+1,\tau}\, m_{\tau+1} & \tau < t,
\end{cases}
\]
\[ h_\tau := \tfrac{1}{2}\omega_r r_\tau^2 e^{-m_\tau}, \qquad c_\tau := h_\tau(1 + m_\tau) - \tfrac{1}{2}. \]

(b) for $\tau = \max(t - J, 1), \ldots, t - 1$, compute
\[ \mu_\tau := \frac{\bar{\bar{c}}_\tau - \bar{\bar{H}}_{\tau,\tau-1}\mu_{\tau-1}}{\bar{\bar{H}}_{\tau\tau} + (\bar{\bar{H}}_{\tau,\tau-1})^2\alpha_{\tau-1}}, \qquad \alpha_\tau := \frac{-1}{\bar{\bar{H}}_{\tau\tau} + (\bar{\bar{H}}_{\tau,\tau-1})^2\alpha_{\tau-1}}. \]

(c) Compute
\[ m_t := \frac{\bar{\bar{c}}_t + \phi\omega_\lambda\lambda_{t+1}^* - \bar{\bar{H}}_{t,t-1}\mu_{t-1}}{\bar{\bar{H}}_{tt} + (\bar{\bar{H}}_{t,t-1})^2\alpha_{t-1}}, \qquad \alpha_t := \frac{-1}{\bar{\bar{H}}_{tt} + (\bar{\bar{H}}_{t,t-1})^2\alpha_{t-1}}. \]

(d) Draw $\lambda_t^* \sim N(m_t, -\alpha_t)$.

(e) Evaluate
\[ q^* := q^* \cdot \frac{1}{\sqrt{2\pi(-\alpha_t)}} \exp\left[-\frac{1}{2}\frac{(\lambda_t^* - m_t)^2}{-\alpha_t}\right]. \]

[Figure 1: Survival Function and Hazard Rate for Run Lengths, Basic Proposal]

We illustrate the performance of these proposals using the same run length analysis that we used for the basic proposal. Figure 2 illustrates the survival function and the hazard rate for the run lengths of four different proposals. The four panels correspond to the horizon parameters $J = 1$, $J = 3$, $J = 5$ and $J = 7$. We see important improvements in the performance of the proposal. Beyond $J = 7$, there is little additional improvement.

[Figure 2: Survival Function and Hazard Rate for Run Lengths, Proposal with First Refinement, J = 1, 3, 5, 7]

3.3 A Second Refinement

While the conditional distribution $\lambda_t|\lambda_{t+1}, \ldots, \lambda_T, r, \theta$ is very nearly normal, it is in fact slightly positively skewed and its right tail is thicker than that of the normal distributions $\lambda_t^*|\lambda_{t+1}^*, \ldots, \lambda_T^*$ of the basic proposal.
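In the second refinement, we replace this conditional normal distribution with a mixture distribution designed to capture these departures from normality.

For a better understanding of these issues, we first consider the distribution $\lambda_t|\lambda_1, \ldots, \lambda_{t-1}, \lambda_{t+1}, \ldots, \lambda_T, \theta, r$. Its unnormalized density is given by
\[ f(\lambda_t|\lambda_1, \ldots, \lambda_{t-1}, \lambda_{t+1}, \ldots, \lambda_T, \theta, r) \propto \exp\left\{-\frac{1}{2}\left[g(\lambda_t) + \omega_\lambda(1 + \phi^2)\left(\lambda_t - \frac{\phi}{1 + \phi^2}(\lambda_{t-1} + \lambda_{t+1})\right)^2\right]\right\}, \]
where $g(\lambda_t) = \omega_r r_t^2 \exp(-\lambda_t) + \lambda_t$. Let $\lambda_t^\circ$ be the mode of this distribution and $\tilde{g}(\lambda_t)$ the Taylor expansion around $\lambda_t^\circ$ up to the quadratic term. We then have the following approximation.
\[
\begin{aligned}
f(\lambda_t|\lambda_1, \ldots, \lambda_{t-1}, \lambda_{t+1}, \ldots, \lambda_T, \theta, r)
&\approx \sqrt{\frac{\hat{h}_t}{2\pi}} \exp\left[-\frac{1}{2}\hat{h}_t(\lambda_t - \lambda_t^\circ)^2\right] \\
&\propto \exp\left\{-\frac{1}{2}\left[\tilde{g}(\lambda_t) + \omega_\lambda(1 + \phi^2)\left(\lambda_t - \frac{\phi}{1 + \phi^2}(\lambda_{t-1} + \lambda_{t+1})\right)^2\right]\right\},
\end{aligned}
\]
where
\[ \hat{h}_t = \omega_\lambda(1 + \phi^2) + \tfrac{1}{2}\omega_r r_t^2 \exp(-\lambda_t^\circ). \]

The next term of the Taylor expansion of $g(\lambda_t)$ around $\lambda_t^\circ$ is
\[ \frac{1}{3!} g'''(\lambda_t^\circ)(\lambda_t - \lambda_t^\circ)^3 = -\frac{1}{6}\omega_r r_t^2 \exp(-\lambda_t^\circ)(\lambda_t - \lambda_t^\circ)^3. \]
Including this term gives the following improved approximation:
\[ f(\lambda_t|\lambda_1, \ldots, \lambda_{t-1}, \lambda_{t+1}, \ldots, \lambda_T, \theta, r) \approx \sqrt{\frac{\hat{h}_t}{2\pi}} \exp\left\{-\frac{1}{2}\left[\hat{h}_t(\lambda_t - \lambda_t^\circ)^2 - \frac{1}{6}\omega_r r_t^2 \exp(-\lambda_t^\circ)(\lambda_t - \lambda_t^\circ)^3\right]\right\}. \]
Using the approximation $e^x \approx 1 + x$, we have
\[ f(\lambda_t|\lambda_1, \ldots, \lambda_{t-1}, \lambda_{t+1}, \ldots, \lambda_T, \theta, r) \approx \sqrt{\frac{\hat{h}_t}{2\pi}} \exp\left[-\frac{1}{2}\hat{h}_t(\lambda_t - \lambda_t^\circ)^2\right]\left[1 + \frac{1}{12}\omega_r r_t^2 \exp(-\lambda_t^\circ)(\lambda_t - \lambda_t^\circ)^3\right]. \]

We can approximate the conditional mean $E[\lambda_t|\lambda_1, \ldots, \lambda_{t-1}, \lambda_{t+1}, \ldots, \lambda_T, \theta, r]$ by $\lambda_t^\circ$ plus the following integral, in which terms that integrate to zero have been dropped.
\[ \int \sqrt{\frac{\hat{h}_t}{2\pi}} \exp\left[-\frac{1}{2}\hat{h}_t(\lambda_t - \lambda_t^\circ)^2\right]\left[(\lambda_t - \lambda_t^\circ) + \frac{1}{12}\omega_r r_t^2 \exp(-\lambda_t^\circ)(\lambda_t - \lambda_t^\circ)^4\right] d\lambda_t \]
Knowing that the kurtosis of a normal random variable is 3, so that the fourth central moment is $3/\hat{h}_t^2$, we compute the approximate conditional mean to be
\[ \lambda_t^\circ + \frac{1}{4}\frac{\omega_r r_t^2 \exp(-\lambda_t^\circ)}{\hat{h}_t^2} = \lambda_t^\circ + \frac{1}{4}\frac{\omega_r r_t^2 \exp(-\lambda_t^\circ)}{\left[\omega_\lambda(1 + \phi^2) + \tfrac{1}{2}\omega_r r_t^2 \exp(-\lambda_t^\circ)\right]^2}. \]
The second term is a reasonable mean correction term for the conditional distribution $\lambda_t|\lambda_1, \ldots, \lambda_{t-1}, \lambda_{t+1}, \ldots, \lambda_T, \theta, r$.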
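As a numerical sanity check on this derivation, one can compare the first-order mean correction with the mean of the exact single-site conditional density obtained by quadrature. The sketch below is illustrative only: the function name, the Newton iteration for the mode, the trapezoid rule and the input values are all assumptions made here, and the correction is written in terms of $\hat{h}_t$ as defined above.

```python
import math


def conditional_mean_check(omega_lam, phi, omega_r, r_t, lam_prev, lam_next):
    """Return (mode, first-order correction, quadrature estimate of mean - mode)."""
    k = omega_lam * (1.0 + phi * phi)                     # prior curvature term
    a = phi / (1.0 + phi * phi) * (lam_prev + lam_next)   # prior centre
    w = omega_r * r_t ** 2

    def Q(lam):
        # Minus twice the log of the unnormalized conditional density.
        return w * math.exp(-lam) + lam + k * (lam - a) ** 2

    # Newton iterations for the mode (Q is strictly convex).
    lam0 = a
    for _ in range(50):
        grad = -w * math.exp(-lam0) + 1.0 + 2.0 * k * (lam0 - a)
        hess = w * math.exp(-lam0) + 2.0 * k
        lam0 -= grad / hess

    h_hat = k + 0.5 * w * math.exp(-lam0)
    correction = 0.25 * w * math.exp(-lam0) / h_hat ** 2

    # Trapezoid rule over mode +/- 12 approximate standard deviations.
    sd = 1.0 / math.sqrt(h_hat)
    n = 4000
    lo, hi = lam0 - 12.0 * sd, lam0 + 12.0 * sd
    step = (hi - lo) / n
    q0 = Q(lam0)
    num = den = 0.0
    for i in range(n + 1):
        lam = lo + i * step
        weight = 0.5 if i in (0, n) else 1.0
        dens = math.exp(-0.5 * (Q(lam) - q0))
        num += weight * (lam - lam0) * dens
        den += weight * dens
    return lam0, correction, num / den


# Illustrative values: omega_r * r_t^2 = 1 and neighbours at zero.
mode, corr, shift = conditional_mean_check(20.0, 0.95, 1.0e4, 0.01, 0.0, 0.0)
```

In this illustrative case the quadrature estimate of the mean shift and the first-order correction should agree closely, which supports treating the correction term as a reasonable adjustment to the mean.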
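This mean correction is not directly useful, however, since we would like to refine our approximation of the distribution $\lambda_t|\lambda_{t+1}, \ldots, \lambda_T, \theta, r$.

We can approximate the mean of the distribution $\lambda|r, \theta$ as
\[ E[\lambda|r, \theta] \approx \bar{\bar{H}}^{-1}(\bar{\bar{c}} + c^*) = \lambda^\circ + \bar{\bar{H}}^{-1}c^*, \]
where $\lambda^\circ$ is the mode of the distribution $\lambda|r, \theta$, $\bar{\bar{H}}$ and $\bar{\bar{c}}$ are the precision and co-vector associated with this value of $\lambda^\circ$, and $c^*$ is a vector of co-vector corrections, with
\[ c_t^* = \frac{1}{4}\frac{\omega_r r_t^2 \exp(-\lambda_t^\circ)}{\omega_\lambda(1 + \phi^2) + \tfrac{1}{2}\omega_r r_t^2 \exp(-\lambda_t^\circ)}. \]
We compute the vector $m^* = \bar{\bar{H}}^{-1}c^*$ of mean corrections using the basic VMVB algorithm.

To capture the skewness and thick right tail of $\lambda_t|\lambda_{t+1}, \ldots, \lambda_T, \theta, r$, we replace the normal distribution $\lambda_t^*|\lambda_{t+1}^*, \ldots, \lambda_T^*$ with the mixture distribution with the following density.
\[
\begin{aligned}
q(\lambda_t^*|\lambda_{t+1}^*, \ldots, \lambda_T^*) = {} & \pi \frac{1}{\sqrt{2\pi\sigma_1^2}} \exp\left[-\frac{1}{2}\frac{(\lambda_t^* - \mu_1)^2}{\sigma_1^2}\right] \\
& + (1 - \pi) \cdot \frac{\Phi(z)}{\Phi(z) + \phi(z)/z} \cdot \frac{1}{\Phi(z)} \frac{1}{\sqrt{2\pi\sigma_2^2}} \exp\left[-\frac{1}{2}\frac{(\lambda_t^* - \mu_2)^2}{\sigma_2^2}\right] \cdot 1_{(-\infty, \mu_2 + z\sigma_2)}(\lambda_t^*) \\
& + (1 - \pi) \cdot \frac{\phi(z)/z}{\Phi(z) + \phi(z)/z} \cdot \frac{z}{\sigma_2} \exp\left[-\frac{z}{\sigma_2}\left(\lambda_t^* - (\mu_2 + z\sigma_2)\right)\right] 1_{[\mu_2 + z\sigma_2, \infty)}(\lambda_t^*).
\end{aligned}
\]
Here $\Phi$ and $\phi$ (with argument $z$) denote the standard normal distribution and density functions. The second component is truncated normal on the support $(-\infty, \mu_2 + z\sigma_2)$, with the factor $1/\Phi(z)$ normalizing the truncated density, while the third component is exponential on the support $[\mu_2 + z\sigma_2, \infty)$. Their relative component probabilities and the mean parameter of the exponential distribution are chosen to match slopes and values at $\mu_2 + z\sigma_2$. The exponential component is designed to thicken the right tail of the distribution.

We choose the parameters $\mu_1$, $\mu_2$, $\sigma_1$, $\sigma_2$, $\pi$ and $z$ so that the mixture distribution is positively skewed and has a mean of $m_t + m_t^*$. Trial and error suggests that the following parameter values are reasonable: $\mu_1 = m_t + m_t^*/2$, $\mu_2 = m_t + 3m_t^*$, $\sigma_1^2 = -\alpha_t$, $\sigma_2^2 = 0.04/\omega_\lambda + 0.96(-\alpha_t)$, $\pi = 0.8$ and $z = 2.5$. We hope to put the choice of these parameters on a more rigorous footing in a later version of the paper.

Figure 3 illustrates the performance of the proposal distribution when we add both refinements.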
It illustrates the survival function and the hazard rate for the run lengths of four different proposals. The four panels correspond to the horizon parameters $J = 1$, $J = 3$, $J = 5$ and $J = 7$. We see significant additional improvements in performance.

[Figure 3: Survival Function and Hazard Rate for Run Lengths, Proposal with First Refinement and Second Refinement, J = 1, 3, 5, 7]

4 Joint Parameter and Volatility Proposals

Please be aware that this section is incomplete and in some places inaccurate. The method we currently use to draw parameters is similar to the method described here, but not exactly the same. We are currently working on an alternate method for approximating the posterior density $f(\theta|r)$.

We first propose a method for drawing proposals $\theta^*$ from a distribution approximating $\theta|r$. We then note that this method can be combined with the procedure outlined in the previous section for drawing proposals $\lambda^*$ from a distribution approximating $\lambda|\theta, r$. The two procedures can be used to draw joint proposals $(\theta^*, \lambda^*)$ from a distribution approximating $\theta, \lambda|r$.

Before running the posterior simulation proper, we draw a "burn-in" pre-sample $\{(\theta^{(b)}, \lambda^{(b)})\}_{b=1}^{B}$. Our proposal distribution is a mixture distribution with $B$ equally probable mixture components. The $b$'th component approximates the distribution $\theta|\lambda^{(b)}, r$ and thus the mixture approximates the distribution $\theta|r$. Specifically, the proposal distribution has the following density:
\[ q(\theta) = \frac{1}{B}\sum_{b=1}^{B} f(\omega_\lambda|\phi^{(b)}, \lambda^{(b)})\, \tilde{f}(\phi; \omega_\lambda, \lambda^{(b)}, r)\, f(\omega_r|\lambda = \lambda^{(b)}). \]

The density $f(\omega_\lambda|\phi, \lambda)$ is the exact density for the conditional posterior distribution $\omega_\lambda|\phi, \omega_r, \lambda, r$:
\[ f(\omega_\lambda|\phi, \omega_r, \lambda, r) = \left(\frac{\bar{\bar{s}}_\lambda^2}{2}\right)^{\bar{\bar{\nu}}_\lambda/2} \frac{1}{\Gamma(\bar{\bar{\nu}}_\lambda/2)}\, \omega_\lambda^{(\bar{\bar{\nu}}_\lambda - 2)/2} \exp(-\bar{\bar{s}}_\lambda^2\omega_\lambda/2), \]
where
\[ \bar{\bar{\nu}}_\lambda \equiv \bar{\nu}_\lambda + T \quad\text{and}\quad \bar{\bar{s}}_\lambda^2 \equiv \bar{s}_\lambda^2 + (1 - \phi^2)\lambda_1^2 + \sum_{t=2}^{T}(\lambda_t - \phi\lambda_{t-1})^2. \]
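Drawing from this conditional distribution amounts to drawing a scaled chi-square variate. The following is a minimal sketch under stated assumptions, not the authors' code: the function name and the inputs are hypothetical, and the scale is taken to be $\bar{\bar{s}}_\lambda^2$, so that $\bar{\bar{s}}_\lambda^2\omega_\lambda \sim \chi^2(\bar{\bar{\nu}}_\lambda)$.

```python
import random


def draw_omega_lambda(lam, phi, nu_prior, s2_prior, rng):
    """Draw omega_lambda given phi and the log-volatility vector lam.

    Returns (draw, posterior degrees of freedom, posterior scale s^2).
    """
    T = len(lam)
    # Sum of squares from the stationary AR(1) prior for log-volatility.
    ss = (1.0 - phi * phi) * lam[0] ** 2
    for t in range(1, T):
        ss += (lam[t] - phi * lam[t - 1]) ** 2
    nu_post = nu_prior + T
    s2_post = s2_prior + ss
    # s2_post * omega ~ chi2(nu_post) is the same as
    # omega ~ Gamma(shape = nu_post / 2, scale = 2 / s2_post).
    omega = rng.gammavariate(nu_post / 2.0, 2.0 / s2_post)
    return omega, nu_post, s2_post


# Illustrative inputs only; the prior values mirror those of Section 6.
rng = random.Random(1)
lam = [0.1, 0.2, 0.15]
omega, nu_post, s2_post = draw_omega_lambda(lam, 0.95, 3.0, 0.03, rng)
```

The same pattern applies to $\omega_r$, with the sum of squares replaced by $\sum_t \exp(-\lambda_t) r_t^2$. The mean of the draw is `nu_post / s2_post`, which is a convenient check when testing a sampler of this kind.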
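The density $\tilde{f}(\phi; \omega_\lambda, \omega_r, \lambda, r)$ approximates the density for the conditional posterior distribution $\phi|\omega_\lambda, \omega_r, \lambda$:
\[ \tilde{f}(\phi; \omega_\lambda, \omega_r, \lambda, r) \equiv \frac{1}{\Phi\left(\sqrt{\bar{\bar{\omega}}_\phi}(1 - \bar{\bar{\phi}})\right) - \Phi\left(\sqrt{\bar{\bar{\omega}}_\phi}(-1 - \bar{\bar{\phi}})\right)} \sqrt{\frac{\bar{\bar{\omega}}_\phi}{2\pi}} \exp\left[-\frac{\bar{\bar{\omega}}_\phi}{2}(\phi - \bar{\bar{\phi}})^2\right], \]
where
\[ \bar{\bar{\omega}}_\phi \equiv \bar{\omega}_\phi + \omega_\lambda\sum_{t=1}^{T-1}\lambda_t^2 \quad\text{and}\quad \bar{\bar{\phi}} \equiv \bar{\bar{\omega}}_\phi^{-1}\left(\bar{\omega}_\phi\bar{\phi} + \omega_\lambda\sum_{t=1}^{T-1}\lambda_t\lambda_{t+1}\right). \]

The density $f(\omega_r|\lambda)$ is the exact density for the conditional posterior distribution $\omega_r|\phi, \omega_\lambda, \lambda, r$:
\[ f(\omega_r|\phi, \omega_\lambda, \lambda, r) = \left(\frac{\bar{\bar{s}}_r^2}{2}\right)^{\bar{\bar{\nu}}_r/2} \frac{1}{\Gamma(\bar{\bar{\nu}}_r/2)}\, \omega_r^{(\bar{\bar{\nu}}_r - 2)/2} \exp(-\bar{\bar{s}}_r^2\omega_r/2), \]
where
\[ \bar{\bar{\nu}}_r \equiv \bar{\nu}_r + T \quad\text{and}\quad \bar{\bar{s}}_r^2 \equiv \bar{s}_r^2 + \sum_{t=1}^{T}\exp(-\lambda_t) r_t^2. \]

Simulating from and evaluating these distributions is straightforward. The result is a proposal distribution approximating the distribution $\theta|\lambda, r$.

We propose the pair $(\theta^*, \lambda^*)$ by proposing $\theta^*$ using the procedure outlined in this section and then $\lambda^*$ using the method outlined in the previous section. The joint proposal density is $q(\theta^*) q(\lambda^*|\theta^*)$.

We draw the pre-sample in the same way that we draw the real sample, except that we use a mixture with only the currently available components. Thus to draw $\theta^{(b)}$, we use an evenly weighted mixture of the first $b - 1$ components. We draw $\theta^{(1)}$, the first pre-sample draw of $\theta$, from the prior.

5 Approximation of the Marginal Likelihood

Approximating the marginal likelihood is straightforward using importance sampling. We average the ratio of true and proposal densities over the proposals $(\theta^*_{(1)}, \lambda^*_{(1)}), \ldots, (\theta^*_{(M)}, \lambda^*_{(M)})$, which are i.i.d.:
\[
\begin{aligned}
\frac{1}{M}\sum_{m=1}^{M} \frac{f(\theta^*_{(m)}) f(\lambda^*_{(m)}|\theta^*_{(m)}) f(r|\theta^*_{(m)}, \lambda^*_{(m)})}{q(\theta^*_{(m)}) q(\lambda^*_{(m)}|\theta^*_{(m)})}
&\xrightarrow{\text{a.s.}} \int \frac{f(\theta^*) f(\lambda^*|\theta^*) f(r|\theta^*, \lambda^*)}{q(\theta^*) q(\lambda^*|\theta^*)} \cdot q(\theta^*) q(\lambda^*|\theta^*)\, d\theta^*\, d\lambda^* \\
&= f(r).
\end{aligned}
\]

Table 1: Posterior Moments of Parameters

Parameter          Mean              Std. Dev.         Numer. Std. Error    Relative Numer. Effic.
$\omega_\lambda$   63.28             28.27             0.276                52.4%
$\phi$             0.9755            0.0102            0.00011              40.6%
$\omega_r$         3.836 × 10^4      0.745 × 10^4      0.0074 × 10^4        50.2%

6 An Empirical Example

We illustrate our methods with an empirical example.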
We use data on the French Franc exchange rate from January 6, 1994 to January 15, 1999. We construct the series $r_t \equiv \log P_t - \log P_{t-1}$, where $P_t$ is the noon rate for the French Franc in U.S. dollars. Our data source is the web site of the Federal Reserve Bank of New York.

Recall that the parameters $\omega_\lambda$, $\phi$ and $\omega_r$ are a priori independent, $\bar{s}_\lambda^2\omega_\lambda \sim \chi^2(\bar{\nu}_\lambda)$, $\bar{s}_r^2\omega_r \sim \chi^2(\bar{\nu}_r)$, and $\phi$ is the truncation of the distribution $N(\bar{\phi}, \bar{\omega}_\phi^{-1})$ to the stationary interval $(-1, 1)$. We choose $\bar{\nu}_\lambda = 3.0$ and $\bar{s}_\lambda^2 = 0.03$, which makes the prior mean of $\omega_\lambda$ 100.0 and the prior standard deviation approximately 81.65. We set $\bar{\phi} = 0.99$ and $\bar{\omega}_\phi = 2500$. The untruncated distribution has mean 0.99 and standard deviation 0.02. Finally, $\bar{\nu}_r = 3.0$ and $\bar{s}_r^2 = 7.5 \times 10^{-5}$, which implies a prior mean of 40 000 for $\omega_r$ and a prior standard deviation of approximately 32 659.9.

Table 1 shows the posterior mean, posterior standard deviation, numerical standard error and relative numerical efficiency for each of the three parameters. Results are based on a posterior sample of length 20000. The relative numerical efficiency is a ratio of numerical error variances, that of a hypothetical i.i.d. sample to that of the sample we draw. We can interpret a relative numerical efficiency of 40.6% as meaning that a sample we draw gives the same information about the posterior mean of the parameter in question as an i.i.d. sample 40.6% of its length.

Figures 4, 5 and 6 illustrate the three bivariate posterior parameter distributions. The log marginal likelihood is 6734.239 with a numerical standard error of 0.015.

Figure 7 shows estimates of the survival function and the hazard rate of the run length for the posterior simulator applied to the French Franc data.

7 Conclusions

We have proposed a fast and numerically efficient posterior simulator for a basic univariate stochastic volatility model. In future work, we hope to be able to adapt the simulator to more general models.
We suggest that the approach we have taken lends itself well to this project. The basic idea depends on the band diagonality of the Hessian of $\log f(\lambda \mid \theta, r)$ and the prior multivariate normality of the log volatility. Within these constraints we can see several potential extensions. Adding a leverage effect makes the Hessian of $\log f(r \mid \theta, \lambda)$ with respect to $\lambda$ no longer diagonal, but it is still band diagonal, with a single non-zero off-diagonal. We can consider more flexible conditional distributions such as finite mixtures of normals, which can capture conditional leptokurtosis and skewness. Increasing the autoregressive order of log-volatility increases the number of non-zero off-diagonals of the Hessian of $\log f(\lambda \mid \theta, r)$. The VMVB algorithm can accommodate this. Even non-normality of $\lambda \mid \theta$ might be feasible if we introduce finite mixtures of normals for the conditional distribution of $\lambda_t$ and augment the data to include indicators for mixture components. Most importantly, we can look to multivariate models. We hope to be able to handle multivariate models with a flexible dependence structure, and to draw each volatility sequence and the parameters on which it depends in one Gibbs block.

Figure 4: Posterior Scatterplot of $\omega_\lambda$ versus $\omega_r$

Figure 5: Posterior Scatterplot of $\omega_r$ versus $\phi$

Figure 6: Posterior Scatterplot of $\phi$ versus $\omega_\lambda$

Figure 7: Survival Function and Hazard Rate for Run Lengths, Empirical Example

References

[1] Siddhartha Chib, Federico Nardari, and Neil Shephard. Markov chain Monte Carlo methods for stochastic volatility models. Journal of Econometrics, 108:281–316, 2002.

[2] Siddhartha Chib, Federico Nardari, and Neil Shephard. Analysis of high dimensional multivariate stochastic volatility models.
Forthcoming, Journal of Econometrics, 2006.

[3] R. F. Engle. Autoregressive conditional heteroscedasticity with estimates of the variance of United Kingdom inflation. Econometrica, 50(4):987–1007, 1982.

[4] J. Geweke. Bayesian comparison of econometric models. Working Paper, Federal Reserve Bank of Minneapolis Research Department, 1994.

[5] E. Jacquier, N. Polson, and P. Rossi. Bayesian analysis of stochastic volatility models. Journal of Business and Economic Statistics, 12(4):371–388, 1994.

[6] Sangjoon Kim, Neil Shephard, and Siddhartha Chib. Stochastic volatility: Likelihood inference and comparison with ARCH models. Review of Economic Studies, 65(3):361–393, 1998.

[7] Yasuhiro Omori, Siddhartha Chib, Neil Shephard, and Jouchi Nakajima. Stochastic volatility with leverage: fast likelihood inference. Forthcoming, Journal of Econometrics, 2004.

[8] Neil Shephard and Michael K. Pitt. Likelihood analysis of non-Gaussian measurement time series. Biometrika, 84(3):653–667, 1997.

[9] Raf Vandebril, Nicola Mastronardi, and Marc Van Barel. A Levinson-like algorithm for symmetric strongly nonsingular higher order semiseparable plus band matrices. Report TW 427, Katholieke Universiteit Leuven Department of Computer Science, 2005.
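As an illustration of the conditional draws for $\phi$ and $\omega_r$ described before Section 5, the following is a minimal Python sketch, not the authors' implementation. It computes the precision and mean of the normal distribution for $\phi$, evaluates the density truncated to $(-1, 1)$, and draws $\omega_r$ from its gamma conditional. The function names, the toy values of $\lambda$ and $r$, and the value $\omega_\lambda = 50$ are hypothetical; the use of simple rejection sampling for the truncated normal is an assumption made for brevity and may be slow when little of the normal mass lies in $(-1, 1)$.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, used for the truncation constant


def phi_conditional_params(lam, omega_lam, phi_bar, omega_phi_bar):
    """Precision and mean of the untruncated normal for phi given lambda."""
    sum_sq = sum(x * x for x in lam[:-1])                      # sum_{t=1}^{T-1} lambda_t^2
    sum_cross = sum(x * y for x, y in zip(lam[:-1], lam[1:]))  # sum_{t=1}^{T-1} lambda_t lambda_{t+1}
    prec = omega_phi_bar + omega_lam * sum_sq
    mean = (omega_phi_bar * phi_bar + omega_lam * sum_cross) / prec
    return prec, mean


def phi_density(phi, prec, mean):
    """Evaluate the normal density truncated to (-1, 1) at phi."""
    if not -1.0 < phi < 1.0:
        return 0.0
    root = math.sqrt(prec)
    mass = STD_NORMAL.cdf(root * (1.0 - mean)) - STD_NORMAL.cdf(root * (-1.0 - mean))
    kernel = math.exp(-0.5 * prec * (phi - mean) ** 2)
    return root / math.sqrt(2.0 * math.pi) * kernel / mass


def draw_phi(prec, mean, rng):
    """Draw phi by rejection: sample the normal until the draw lies in (-1, 1)."""
    sd = 1.0 / math.sqrt(prec)
    while True:
        phi = rng.gauss(mean, sd)
        if -1.0 < phi < 1.0:
            return phi


def draw_omega_r(lam, r, nu_r_bar, s2_r_bar, rng):
    """Draw omega_r from Gamma(shape = nu/2, rate = s2/2)."""
    nu = nu_r_bar + len(r)
    s2 = s2_r_bar + sum(math.exp(-l) * x * x for l, x in zip(lam, r))
    # random.gammavariate takes (shape, scale), so scale = 1 / rate = 2 / s2.
    return rng.gammavariate(nu / 2.0, 2.0 / s2)


# Toy inputs (hypothetical values, not the French Franc data).
rng = random.Random(0)
lam = [0.10, -0.20, 0.30, 0.05, -0.10]
r = [0.010, -0.020, 0.015, 0.000, 0.005]
prec, mean = phi_conditional_params(lam, omega_lam=50.0, phi_bar=0.99, omega_phi_bar=2500.0)
phi_draw = draw_phi(prec, mean, rng)
omega_r_draw = draw_omega_r(lam, r, nu_r_bar=3.0, s2_r_bar=7.5e-5, rng=rng)
```

The prior hyperparameters in the toy call are those of the empirical example. An inverse-CDF draw would be a natural alternative to rejection when the truncation removes most of the normal mass.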