Drawing Stochastic Volatility

William J. McCausland¹, Université de Montréal, CIREQ and CIRANO
Denis Pelletier², North Carolina State University
Current version: May 1, 2006, very preliminary
¹ Mailing address: Département de sciences économiques, C.P. 6128, succursale Centre-ville, Montréal QC H3C 3J7, Canada. e-mail: william.j.mccausland@umontreal.ca. Web site: www.cirano.qc.ca/∼mccauslw.
² Mailing address: Department of Economics, Campus Box 8110, North Carolina
State University, Raleigh, 27695-8110, USA. e-mail: denis pelletier@ncsu.edu. Web site:
http://www4.ncsu.edu/∼dpellet. We thank Siddhartha Chib, John Geweke, Éric Jacquier,
Luke Tierney and Herman van Dijk for helpful comments. We alone are responsible for
any errors.
Abstract
Bayesian analysis of stochastic volatility (SV) models typically involves using
Markov chain Monte Carlo (MCMC) methods to simulate the joint posterior distribution of parameters and stochastic volatility. These simulators often draw proposals of parameters or volatility from approximate distributions and then use an
accept/reject scheme to correct for the approximations. Jacquier, Polson and Rossi
(1994) introduce a simulator which draws volatility proposals one observation at
a time. Shephard and Pitt (1997) demonstrate the important numerical efficiency
gains obtained by drawing (log) volatility proposals in blocks. Kim, Shephard and
Chib (1998) show the substantial additional improvements to be had by drawing
joint proposals of parameters and the entire log-volatility vector.
We propose a new procedure for drawing joint proposals of parameters and
log-volatility in a standard univariate stochastic volatility model. We first describe a simple, but naive, multivariate normal log-volatility proposal distribution
approximating the conditional distribution of log-volatility given parameters and
return data. We then describe two refinements, which result in a much better
approximation. Next, we present a parameter proposal distribution approximating the conditional distribution of parameters given only the data. Combining the
log-volatility proposal with the parameter proposal gives a joint proposal whose
distribution closely approximates the joint posterior distribution of parameters and
log-volatility.
The procedure is fast, and for reasonable numbers of observations, we achieve
high numerical efficiency. The general approach is not very model specific, and
we suggest that our procedure can be modified to work with more general models,
with features such as leverage effects, finite-mixture-of-normals return shocks and
multivariate returns without a factor structure. We illustrate our procedure using
foreign exchange data.
1 Introduction
The conditional variance, or volatility, of asset returns evolves over time. Engle’s
(1982) autoregressive conditional heteroscedasticity (ARCH) model and various
generalizations capture this phenomenon. Since volatility in these models is a deterministic function of previous returns, the likelihood function is easy to compute.
In stochastic volatility (SV) models, volatility is a latent stochastic process.
Jacquier, Polson and Rossi (1994) and Geweke (1994) give evidence suggesting
that SV models are more realistic.
However, the likelihood function cannot easily be evaluated, which makes
maximum likelihood approaches difficult. Bayesian approaches to inference for
SV models are widely used, partly because it is easy to avoid having to evaluate
the likelihood function: the joint posterior density of parameters and volatilities is
easily evaluated up to a multiplicative normalization constant, and so we can simulate this joint posterior distribution using Markov chain Monte Carlo methods.
We now describe a simple univariate discrete-time stochastic volatility model.
This model, together with variants differing only in terms of parametrization, is
widely used.
The log-volatility equation is
$$\lambda_t = \phi\,\lambda_{t-1} + \xi_t,$$
and the return equation is
$$r_t = e^{\lambda_t/2}\,\eta_t.$$
The error sequences {ξt} and {ηt} are both Gaussian white noise and are mutually independent. Their precisions are ωλ and ωr respectively, where the precision is the inverse of the variance. The log-volatility sequence {λt} is stationary. Define the parameter vector θ ≡ (ωλ, φ, ωr).
Jacquier, Polson and Rossi (1994) propose a posterior simulator that draws volatility proposals one observation at a time. This approach is relatively simple. It is not overly model-specific, and it can thus easily be extended to more elaborate models. However, the high posterior autocorrelation of volatility, especially for daily returns, leads to highly autocorrelated posterior draws.
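As a concrete illustration of the model, the following Python sketch (standard library only) simulates λ and r. The function name `simulate_sv` is hypothetical, and drawing λ1 from the stationary distribution N(0, 1/(ωλ(1 − φ²))) is our assumption, chosen because it matches the density f(λ|ωλ, φ) given in Section 2.1.

```python
import math
import random

def simulate_sv(omega_lambda, phi, omega_r, T, seed=None):
    """Simulate (lam, r) from the canonical SV model (hypothetical helper).

    lam_t = phi * lam_{t-1} + xi_t,  xi_t ~ N(0, 1/omega_lambda)
    r_t   = exp(lam_t / 2) * eta_t,  eta_t ~ N(0, 1/omega_r)
    lam_1 is drawn from the stationary law N(0, 1/(omega_lambda * (1 - phi^2))).
    """
    if not -1.0 < phi < 1.0:
        raise ValueError("phi must lie in (-1, 1) for stationarity")
    rng = random.Random(seed)
    sd_xi = 1.0 / math.sqrt(omega_lambda)     # precision -> standard deviation
    sd_eta = 1.0 / math.sqrt(omega_r)
    lam = [rng.gauss(0.0, 1.0 / math.sqrt(omega_lambda * (1.0 - phi ** 2)))]
    for _ in range(1, T):
        lam.append(phi * lam[-1] + rng.gauss(0.0, sd_xi))
    r = [math.exp(l / 2.0) * rng.gauss(0.0, sd_eta) for l in lam]
    return lam, r

# The simulated-data setting used in Section 3.1:
lam, r = simulate_sv(20.0, 0.95, 1.0e4, T=1000, seed=1)
```

Since ωr r²t e^{−λt} is χ²(1) under this model, its sample average should be close to 1, which is a quick sanity check on a simulated series.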
Shephard and Pitt (1997) (SP) show how to draw blocks of log-volatilities
using a Metropolis-Hastings update. This update involves drawing a random proposal for the volatility block and accepting it with a probability given by the Hastings ratio, which measures the closeness of the proposal density to the true conditional posterior density at both the current state and the random proposal. If the
proposal is rejected, the new value of the volatility block is set equal to its value
at the current state. The log-volatility block proposals are multivariate normal.
Capturing volatility autocorrelation within a block reduces the autocorrelation of
posterior draws, but only to a point: as the blocks get longer, the proposal distribution becomes a cruder approximation of the correct distribution, and the acceptance rate deteriorates. Eventually, the autocorrelation of posterior draws begins
to increase with block length.
Kim, Shephard and Chib (1998) (KSC) transform the canonical SV model into
a linear one, and approximate the random component of the transformed model
as a mixture of normals. They employ a technique known as data augmentation,
adding to the vector of unknown quantities a sequence of discrete latent variables
indicating mixture components. Their simulator samples the joint distribution
of parameters, volatilities and mixture component indices. Conditional on these
indices, the approximate model is linear and Gaussian, and this simplicity allows
them not only to draw volatilities for all observations at once, achieving numerical efficiency gains similar to those of SP, but also to integrate out volatilities and
then draw both volatilities and parameters as a single block. Since volatilities
and parameters are highly correlated, this leads to further numerical efficiency
improvements. The KSC method has been extended to many generalizations of
the canonical SV model, including models with a leverage effect (Omori et al.
(2006)), scale mixtures of normals for the return equation shock (Chib, Nardari
and Shephard (2002), Omori et al. (2006)), jumps (Chib, Nardari and Shephard
(2002)), and multivariate returns (Chib, Nardari and Shephard (2006)).
Important as it is, this method has some limitations. The chain’s stationary
distribution is only an approximation of the posterior distribution. Obtaining
simulation-consistent sample moments requires re-weighting the draws generated
by the chain, and numerical standard errors for the reweighted chain are larger
than those for the unweighted chain. While samplers based on the KSC approach
have been developed for models with pure scale mixtures for the return equation
shock, more general mixtures present difficulties. The methods of Chib, Nardari
and Shephard (2006) apply to factor models with independent factors. Models that
do not have a factor structure are more difficult.
In this paper, we propose a new method for drawing joint proposals of parameters and volatility. The procedure is fast and has very high numerical efficiency
for reasonable numbers of observations. We will argue that our general approach
to drawing volatility can be extended to models where return shocks have a finite mixture of normals distribution and to multivariate models with very flexible
cross-sectional dependence.
Section 2 is devoted to preliminary concepts. We write down the density
functions of the model, review properties of the precision and co-vector of a normal distribution and discuss quadratic approximations of log f (r|θ, λ).
In Section 3, we discuss methods for drawing volatility proposals λ∗ =
(λ∗1 , . . . , λ∗T ) and evaluating the proposal distribution q(λ∗ ; θ, r) at the realized
draw, all in O(T ) time. We first present a basic multivariate normal proposal
that approximates the conditional posterior distribution λ|r, θ. It is based on a
quadratic approximation of log f (r|θ, λ) using its gradient and (diagonal) Hessian
at the mode of λ|r, θ. We show how to draw λ∗t sequentially backwards. To compute E[λ∗t |λ∗t+1 , . . . , λ∗T ] and var[λ∗t |λ∗t+1 , . . . , λ∗T ], we use an algorithm by Vandebril, Mastronardi and Van Barel (VMVB) for solving band diagonal symmetric
systems. The conditional means and variances we need are simple functions of
intermediate computations.
We then offer two refinements of the basic proposal distribution. The first
refinement involves changing the quadratic approximation of log f (r|θ, λ) as we
learn about the trajectory of volatility. The result is a proposal that relaxes multivariate normality, but retains conditional normality of the λ∗t |λ∗t+1 , . . . , λ∗T .
In the second refinement, we replace the conditional normal distribution of
λ∗t |λ∗t+1 , . . . , λ∗T with a mixture distribution which captures some of the departure
from normality of the distribution λt |λt+1 , . . . , λT , r, θ.
The proposal distribution we use in practice incorporates both refinements. In
Section 4, we show how to draw proposals θ∗ of the parameter vector from a distribution approximating its marginal posterior distribution θ|r. Together with the
log-volatility proposal of Section 3, which draws λ∗ from a distribution approximating λ|θ, r, we obtain a proposal distribution for the pair (θ∗ , λ∗ ) that closely
approximates the distribution θ, λ|r.
In Section 5, we show how to approximate the marginal likelihood f (r). In
Section 6, we present empirical results suggesting that the simulator is numerically efficient. In Section 7, we conclude, suggesting that our approach can be
extended to more general models. These include models with features such as
leverage effects and return equation shocks with a finite mixture of normals distribution. They also include multivariate models with very flexible dependence.
2 Preliminaries
2.1 Densities for parameters, volatilities and returns
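We complete the model with a prior distribution for θ = (ωλ, φ, ωr) where ωλ, φ and ωr are independent, s̄²λ ωλ ∼ χ²(ν̄λ), s̄²r ωr ∼ χ²(ν̄r), and the distribution of φ is the truncation of the distribution N(φ̄, ω̄φ⁻¹) to the stationary interval (−1, 1).
Thus, for −1 < φ < 1,
$$
\begin{aligned}
f(\theta) ={}& \frac{(\bar{s}_\lambda^2/2)^{\bar{\nu}_\lambda/2}}{\Gamma(\bar{\nu}_\lambda/2)}\,\omega_\lambda^{(\bar{\nu}_\lambda-2)/2}\exp(-\bar{s}_\lambda^2\omega_\lambda/2)\\
&\cdot\frac{1}{\Phi\big(\sqrt{\bar{\omega}_\phi}\,(1-\bar{\phi})\big)-\Phi\big(\sqrt{\bar{\omega}_\phi}\,(-1-\bar{\phi})\big)}\sqrt{\frac{\bar{\omega}_\phi}{2\pi}}\exp\Big[-\frac{\bar{\omega}_\phi}{2}(\phi-\bar{\phi})^2\Big]\\
&\cdot\frac{(\bar{s}_r^2/2)^{\bar{\nu}_r/2}}{\Gamma(\bar{\nu}_r/2)}\,\omega_r^{(\bar{\nu}_r-2)/2}\exp(-\bar{s}_r^2\omega_r/2).
\end{aligned}
$$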
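A minimal sketch of log f(θ) in Python, assuming the parametrization above (s̄²ω ∼ χ²(ν̄) for each precision). The helper names are hypothetical; working on the log scale avoids overflow for large precisions such as ωr.

```python
import math

def std_normal_cdf(x):
    """Standard normal cdf via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def log_scaled_chi2(omega, s2, nu):
    """log density of omega when s2 * omega ~ chi^2(nu)."""
    return (0.5 * nu * math.log(s2 / 2.0) - math.lgamma(nu / 2.0)
            + 0.5 * (nu - 2.0) * math.log(omega) - 0.5 * s2 * omega)

def log_truncated_normal_phi(phi, phi_bar, omega_phi_bar):
    """log density of N(phi_bar, 1/omega_phi_bar) truncated to (-1, 1)."""
    if not -1.0 < phi < 1.0:
        return -math.inf
    sq = math.sqrt(omega_phi_bar)
    mass = std_normal_cdf(sq * (1.0 - phi_bar)) - std_normal_cdf(sq * (-1.0 - phi_bar))
    return (0.5 * math.log(omega_phi_bar / (2.0 * math.pi))
            - 0.5 * omega_phi_bar * (phi - phi_bar) ** 2 - math.log(mass))

def log_prior(theta, s2_lam, nu_lam, phi_bar, omega_phi_bar, s2_r, nu_r):
    """log f(theta) for theta = (omega_lambda, phi, omega_r)."""
    omega_lam, phi, omega_r = theta
    if omega_lam <= 0.0 or omega_r <= 0.0:
        return -math.inf
    return (log_scaled_chi2(omega_lam, s2_lam, nu_lam)
            + log_truncated_normal_phi(phi, phi_bar, omega_phi_bar)
            + log_scaled_chi2(omega_r, s2_r, nu_r))
```

For example, with the hyperparameters of Section 6 one would call `log_prior((100.0, 0.98, 4.0e4), 0.03, 3.0, 0.99, 2500.0, 7.5e-5, 3.0)`.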
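We observe the return rt for t = 1, …, T. Let λ = (λ1, …, λT) and r = (r1, …, rT).
The distributions λ|ωλ, φ and r|λ, ωr have densities
$$
f(\lambda\mid\omega_\lambda,\phi) = \left(\frac{\omega_\lambda(1-\phi^2)}{2\pi}\right)^{1/2}\exp\left[-\frac{\omega_\lambda(1-\phi^2)}{2}\lambda_1^2\right]\cdot\prod_{t=2}^{T}\left(\frac{\omega_\lambda}{2\pi}\right)^{1/2}\exp\left[-\frac{\omega_\lambda}{2}(\lambda_t-\phi\lambda_{t-1})^2\right]
$$
and
$$
f(r\mid\lambda,\omega_r) = \prod_{t=1}^{T}\left(\frac{\omega_r e^{-\lambda_t}}{2\pi}\right)^{1/2}\exp\left[-\frac{\omega_r e^{-\lambda_t}}{2}\,r_t^2\right].
$$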
2.2 Precisions and co-vectors
We find it very useful to work with precisions and co-vectors as well as means
and variances, so we define these terms here and point out some simple but useful
properties. For any random vector x ∼ N(µ, Σ), we call H ≡ Σ−1 the precision,
and c ≡ Σ−1 µ the co-vector. Note that Σ = H −1 and µ = H −1 c, an illustration
of the duality between (µ, Σ) and (c, H).
In matrix notation, the density for λ|ωλ, φ is
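$$f(\lambda\mid\omega_\lambda,\phi) = (2\pi)^{-T/2}\,|\bar{H}|^{1/2}\exp\left(-\tfrac{1}{2}\lambda'\bar{H}\lambda\right),$$
where the precision H̄ is given by
$$
\bar{H} = \omega_\lambda
\begin{pmatrix}
1 & -\phi & 0 & \cdots & 0 & 0\\
-\phi & 1+\phi^2 & -\phi & \cdots & 0 & 0\\
0 & -\phi & 1+\phi^2 & \cdots & 0 & 0\\
\vdots & \vdots & \vdots & \ddots & \vdots & \vdots\\
0 & 0 & 0 & \cdots & 1+\phi^2 & -\phi\\
0 & 0 & 0 & \cdots & -\phi & 1
\end{pmatrix}.
$$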
The co-vector is c̄ = 0T .
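The following sketch (hypothetical function names) checks numerically that the product form of f(λ|ωλ, φ) in Section 2.1 and the matrix form above agree. It stores H̄ by its two bands and uses |H̄| = ωλ^T (1 − φ²), which follows from matching the normalizing constants of the two forms; it assumes T ≥ 2.

```python
import math

def log_f_lambda_product(lam, omega_lam, phi):
    """log f(lam | omega_lam, phi) from the product form (stationary AR(1))."""
    out = (0.5 * math.log(omega_lam * (1.0 - phi ** 2) / (2.0 * math.pi))
           - 0.5 * omega_lam * (1.0 - phi ** 2) * lam[0] ** 2)
    for t in range(1, len(lam)):
        out += (0.5 * math.log(omega_lam / (2.0 * math.pi))
                - 0.5 * omega_lam * (lam[t] - phi * lam[t - 1]) ** 2)
    return out

def prior_precision_bands(T, omega_lam, phi):
    """Diagonal and sub-diagonal of H-bar; off[t] = H-bar_{t+1,t} (0-based), T >= 2."""
    diag = [omega_lam * (1.0 + phi ** 2)] * T
    diag[0] = diag[-1] = omega_lam
    off = [-omega_lam * phi] * (T - 1)
    return diag, off

def log_f_lambda_matrix(lam, omega_lam, phi):
    """Same density via (2 pi)^(-T/2) |H-bar|^(1/2) exp(-lam' H-bar lam / 2)."""
    T = len(lam)
    diag, off = prior_precision_bands(T, omega_lam, phi)
    quad = (sum(d * x * x for d, x in zip(diag, lam))
            + 2.0 * sum(o * lam[t] * lam[t + 1] for t, o in enumerate(off)))
    log_det = T * math.log(omega_lam) + math.log(1.0 - phi ** 2)
    return -0.5 * T * math.log(2.0 * math.pi) + 0.5 * log_det - 0.5 * quad
```

Storing only the bands keeps both storage and the quadratic form O(T), which is the property the rest of the paper exploits.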
It is easy to verify that for 1 ≤ t < T, the distribution λ1, …, λt | ωλ, φ, λt+1, …, λT is multivariate normal, that its precision is the leading sub-matrix
$$
\bar{H}_{1:t} = \begin{pmatrix}\bar{H}_{1,1} & \cdots & \bar{H}_{1,t}\\ \vdots & \ddots & \vdots\\ \bar{H}_{t,1} & \cdots & \bar{H}_{t,t}\end{pmatrix},
$$
and that its co-vector is (0, …, 0, φωλλt+1)′.
2.3 Taylor Expansions of f (rt|λt , ωr )
We note that log f (r|λ, θ) is concave in λ and additively separable in λ1 , . . . , λT .
We will approximate log f (r|λ, θ) by a quadratic form in λ with a diagonal coefficient matrix. This will lead to multivariate normal approximations of the posterior
distribution λ|θ, r.
Let $g(\lambda_t) \equiv \omega_r r_t^2 e^{-\lambda_t} + \lambda_t$, and note that f(rt|λt, ωr) is proportional to $\exp[-\tfrac12 g(\lambda_t)]$ as a function of λt.
We approximate g(λt) by g̃(λt), consisting of the first three terms of the Taylor expansion of g(λt) around some value λ°t:
$$g(\lambda_t) \approx \tilde{g}(\lambda_t) \equiv g(\lambda_t^\circ) + g'(\lambda_t^\circ)(\lambda_t-\lambda_t^\circ) + \tfrac12 g''(\lambda_t^\circ)(\lambda_t-\lambda_t^\circ)^2.$$
The first two derivatives of g(λt) are
$$g'(\lambda_t) = -\omega_r r_t^2 e^{-\lambda_t} + 1 \quad\text{and}\quad g''(\lambda_t) = \omega_r r_t^2 e^{-\lambda_t}.$$
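If we complete the square, we obtain
$$\tilde{g}(\lambda_t) = h_t(\lambda_t - c_t/h_t)^2 + k,$$
where
$$h_t = \tfrac12 g''(\lambda_t^\circ) = \tfrac12\omega_r r_t^2 e^{-\lambda_t^\circ},$$
$$c_t = \tfrac12\left[g''(\lambda_t^\circ)\lambda_t^\circ - g'(\lambda_t^\circ)\right] = \tfrac12\left[\omega_r r_t^2 e^{-\lambda_t^\circ}\lambda_t^\circ - 1 + \omega_r r_t^2 e^{-\lambda_t^\circ}\right] = h_t(1+\lambda_t^\circ) - \tfrac12,$$
and k is an unimportant term not depending on λt. We point out that ht and ct are the precision and co-vector of a univariate normal distribution with density proportional to $\exp[-\tfrac12\tilde{g}(\lambda_t)]$.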
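A small sketch of the expansion coefficients, with hypothetical names. It computes ht and ct from the formulas above, so one can check that g̃(λt) − ht(λt − ct/ht)² does not depend on λt, which is exactly the completed-square claim.

```python
import math

def g(lam_t, r_t, omega_r):
    """g(lambda_t) = omega_r r_t^2 exp(-lambda_t) + lambda_t."""
    return omega_r * r_t ** 2 * math.exp(-lam_t) + lam_t

def g_tilde(lam_t, lam0, r_t, omega_r):
    """Second-order Taylor expansion of g around lam0."""
    e = omega_r * r_t ** 2 * math.exp(-lam0)      # equals g''(lam0); g'(lam0) = 1 - e
    return g(lam0, r_t, omega_r) + (1.0 - e) * (lam_t - lam0) + 0.5 * e * (lam_t - lam0) ** 2

def expansion_coefficients(lam0, r_t, omega_r):
    """(h_t, c_t): precision and co-vector of the normal kernel exp(-g_tilde / 2)."""
    h = 0.5 * omega_r * r_t ** 2 * math.exp(-lam0)
    return h, h * (1.0 + lam0) - 0.5
```

The implied mean ct/ht equals λ°t − g′(λ°t)/g″(λ°t), i.e. one Newton step for minimizing g from λ°t.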
The additive separability of log f(r|λ, θ) in the elements of λ means that f(r|λ, θ) is reasonably well approximated, as a function of λ, by $\prod_{t=1}^{T}\exp[-\tfrac12\tilde{g}(\lambda_t)]$, which is proportional to a multivariate normal density with precision H and co-vector c, given by
$$
H \equiv \begin{pmatrix} h_1 & 0 & \cdots & 0\\ 0 & h_2 & \cdots & 0\\ \vdots & \vdots & \ddots & \vdots\\ 0 & 0 & \cdots & h_T\end{pmatrix}
\quad\text{and}\quad
c \equiv \begin{pmatrix} c_1\\ \vdots\\ c_T\end{pmatrix}.
$$
The posterior density f(λ|r, θ), proportional to f(r|λ, θ)f(λ|θ), can be approximated by a multivariate normal density with precision $\bar{\bar{H}} = \bar{H} + H$ and co-vector $\bar{\bar{c}} = \bar{c} + c$.
The approximation depends on the choice of λ◦ , the vector of values around
which we Taylor-expand the g(λt ). We will discuss this choice later, but for now
consider the mode of the distribution λ|θ, r as a reasonable choice.
2.4 An algorithm for solving band diagonal systems
Vandebril, Mastronardi and Van Barel (2005) (VMVB) introduce a Levinson-like algorithm for finding the solution x to the equation Ax = y, where A is a T × T real symmetric band-diagonal matrix and y is a real T × 1 vector, in time O(T).
For A with a single non-zero off-diagonal, the algorithm is as follows.
1. Compute $\mu_1 := y_1/A_{11}$ and $\alpha_1 := -1/A_{11}$.
2. For t = 2, …, T, compute
$$\mu_t := \frac{y_t - A_{t,t-1}\mu_{t-1}}{A_{tt} + (A_{t,t-1})^2\alpha_{t-1}}, \qquad \alpha_t := \frac{-1}{A_{tt} + (A_{t,t-1})^2\alpha_{t-1}}.$$
3. Compute $x_T := \mu_T$.
4. For t = T − 1, …, 1, compute $x_t := \mu_t + A_{t+1,t}\,\alpha_t\,x_{t+1}$.
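A direct Python transcription of the four steps above for the tridiagonal case. It returns the intermediate µt and αt as well, because Results 2.1 and 2.2 below rely on them. This is our sketch, not VMVB's code; `off[t]` holds the sub-diagonal entry A_{t+1,t} in 0-based indexing, and the function name is hypothetical.

```python
def vmvb_solve(diag, off, y):
    """Solve A x = y for symmetric tridiagonal A in O(T).

    diag[t] = A_{t,t}; off[t] = A_{t+1,t} (0-based, length T - 1).
    Returns (x, mu, alpha) with the intermediate quantities of the recursion.
    """
    T = len(diag)
    mu = [0.0] * T
    alpha = [0.0] * T
    mu[0] = y[0] / diag[0]                        # step 1
    alpha[0] = -1.0 / diag[0]
    for t in range(1, T):                         # step 2: forward recursion
        denom = diag[t] + off[t - 1] ** 2 * alpha[t - 1]
        mu[t] = (y[t] - off[t - 1] * mu[t - 1]) / denom
        alpha[t] = -1.0 / denom
    x = [0.0] * T
    x[-1] = mu[-1]                                # step 3
    for t in range(T - 2, -1, -1):                # step 4: back substitution
        x[t] = mu[t] + off[t] * alpha[t] * x[t + 1]
    return x, mu, alpha
```

In this tridiagonal form the recursion is equivalent to Gaussian elimination without pivoting (the Thomas algorithm), with −1/αt playing the role of the t-th pivot.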
VMVB show that the result x indeed satisfies y = Ax. They also give evidence, using numerical experiments, that their procedure is stable. The issue of
stability is important and not trivial: there is a simple and fairly obvious algorithm
for solving Ax = y that is numerically unstable.
The following two results will be important for drawing volatility proposals.
Result 2.1 $((A_{1:t})^{-1}y_{1:t})_t = \mu_t$, where $A_{1:t}$ is the leading t × t sub-matrix of A and $y_{1:t}$ is the leading t × 1 sub-vector of y.
Proof. Apply the algorithm to the system $A_{1:t}z = y_{1:t}$ and note that µ1, …, µt and α1, …, αt are identical for the systems $A_{1:t}z = y_{1:t}$ and $Ax = y$.
Result 2.2 $((A_{1:t})^{-1})_{tt} = -\alpha_t$.
Proof. Apply the algorithm to the system $A_{1:t}z = e_t$, where $e_t$ is the t-vector (0, …, 0, 1), and note that µτ = 0 for τ = 1, …, t − 1 and µt = −αt.
3 Volatility Proposals
We introduce methods for drawing volatility proposals λ∗ . We first introduce a
simple, but inefficient, basic proposal. We then discuss two refinements of the
basic proposal, and show that they greatly improve the numerical efficiency of a
chain simulating λ|θ, r.
3.1 The Basic Proposal
The basic proposal is multivariate normal. We choose λ° as the mode of the posterior distribution λ|θ, r and compute $\bar{\bar{H}}$ and $\bar{\bar{c}}$ from λ° as described in Section 2.3. Then λ∗ is multivariate normal with precision $\bar{\bar{H}}$ and co-vector $\bar{\bar{c}}$:
$$\lambda^* \sim N\big(\bar{\bar{H}}^{-1}\bar{\bar{c}},\ \bar{\bar{H}}^{-1}\big).$$
We obtain λ° using an iterative approach where we repeat the following two steps until $\|\lambda^{\circ(i)} - \lambda^{\circ(i-1)}\|_\infty$ is less than a certain tolerance.
1. Compute $\bar{\bar{H}}^{(i)}$ and $\bar{\bar{c}}^{(i)}$ in terms of $\lambda^{\circ(i-1)}$.
2. Compute $\lambda^{\circ(i)} = (\bar{\bar{H}}^{(i)})^{-1}\bar{\bar{c}}^{(i)}$ using the VMVB algorithm.
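A sketch of the two-step mode iteration, assuming it starts from λ°(0) = 0 (our choice) and reimplementing the tridiagonal recursion inline so the block is self-contained; names are hypothetical. Since c̄ = 0, the co-vector $\bar{\bar{c}}$ equals c. Each update $(\bar{\bar{H}}^{(i)})^{-1}\bar{\bar{c}}^{(i)}$ coincides with a Newton step on log f(λ|θ, r), which suggests fast convergence near the mode.

```python
import math

def solve_tridiag(diag, off, y):
    """VMVB-style O(T) solve of A x = y, A symmetric tridiagonal, off[t] = A_{t+1,t}."""
    T = len(diag)
    mu, alpha = [0.0] * T, [0.0] * T
    for t in range(T):
        d = diag[t] + (off[t - 1] ** 2 * alpha[t - 1] if t > 0 else 0.0)
        mu[t] = (y[t] - (off[t - 1] * mu[t - 1] if t > 0 else 0.0)) / d
        alpha[t] = -1.0 / d
    x = mu[:]
    for t in range(T - 2, -1, -1):
        x[t] = mu[t] + off[t] * alpha[t] * x[t + 1]
    return x

def posterior_mode(r, omega_lam, phi, omega_r, tol=1e-10, max_iter=200):
    """Iterate lam(i) = H-barbar(i)^{-1} c-barbar(i) from lam(0) = 0 (assumed start)."""
    T = len(r)
    prior_diag = [omega_lam * (1.0 + phi ** 2)] * T
    prior_diag[0] = prior_diag[-1] = omega_lam
    off = [-omega_lam * phi] * (T - 1)
    lam0 = [0.0] * T
    for _ in range(max_iter):
        h = [0.5 * omega_r * rt ** 2 * math.exp(-l) for rt, l in zip(r, lam0)]   # step 1
        c = [ht * (1.0 + l) - 0.5 for ht, l in zip(h, lam0)]                      # c-bar = 0
        lam1 = solve_tridiag([p + ht for p, ht in zip(prior_diag, h)], off, c)    # step 2
        if max(abs(a - b) for a, b in zip(lam1, lam0)) < tol:
            return lam1
        lam0 = lam1
    raise RuntimeError("mode iteration did not converge")
```

At a fixed point, $\bar{H}\lambda^\circ + \tfrac12(1 - \omega_r r_t^2 e^{-\lambda_t^\circ})_t = 0$, which is the first-order condition for the mode; the test below checks exactly this.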
We draw the λ∗t sequentially backwards. Recall that the precision of the distribution λ∗1, …, λ∗t | λ∗t+1, …, λ∗T is the leading sub-matrix $\bar{\bar{H}}_{1:t}$ of $\bar{\bar{H}}$ and the co-vector is $\bar{\bar{c}}_{1:t} + \phi\omega_\lambda\lambda^*_{t+1}e_t$, where $\bar{\bar{c}}_{1:t}$ is the leading t-vector of $\bar{\bar{c}}$ and $e_t$ is the t-vector (0, …, 0, 1). Standard results on conditional normal distributions give
$$\lambda^*_t \mid \lambda^*_{t+1},\dots,\lambda^*_T \sim N\Big(\big((\bar{\bar{H}}_{1:t})^{-1}(\bar{\bar{c}}_{1:t} + \phi\omega_\lambda\lambda^*_{t+1}e_t)\big)_t,\ \big((\bar{\bar{H}}_{1:t})^{-1}\big)_{tt}\Big).$$
Result 2.2 tells us that the conditional variance $((\bar{\bar{H}}_{1:t})^{-1})_{tt}$ is simply the intermediate quantity −αt obtained by applying the VMVB algorithm with $A = \bar{\bar{H}}$. Result 2.1 (take $A = \bar{\bar{H}}$ and $y = \bar{\bar{c}}$, and note that µt must be replaced with mt to compute $(\bar{\bar{H}}_{1:t})^{-1}(\bar{\bar{c}}_{1:t} + \phi\omega_\lambda\lambda^*_{t+1}e_t)$ rather than $(\bar{\bar{H}}_{1:t})^{-1}\bar{\bar{c}}_{1:t}$) tells us that the conditional mean $((\bar{\bar{H}}_{1:t})^{-1}(\bar{\bar{c}}_{1:t} + \phi\omega_\lambda\lambda^*_{t+1}e_t))_t$ is given by
$$m_t \equiv \frac{\bar{\bar{c}}_t + \phi\omega_\lambda\lambda^*_{t+1} - \bar{\bar{H}}_{t,t-1}\mu_{t-1}}{\bar{\bar{H}}_{tt} + (\bar{\bar{H}}_{t,t-1})^2\alpha_{t-1}}.$$
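Putting everything together, we see that the following algorithm draws λ∗ and evaluates q∗, the proposal density evaluated at λ∗, all in time O(T). We take µ0 = α0 = 0, set the term φωλλ∗T+1 to zero, and initialize q∗ := 1.
1. For t = 1, …, T, compute
$$\mu_t := \frac{\bar{\bar{c}}_t - \bar{\bar{H}}_{t,t-1}\mu_{t-1}}{\bar{\bar{H}}_{tt} + (\bar{\bar{H}}_{t,t-1})^2\alpha_{t-1}}, \qquad \alpha_t := \frac{-1}{\bar{\bar{H}}_{tt} + (\bar{\bar{H}}_{t,t-1})^2\alpha_{t-1}}.$$
2. For t = T, T − 1, …, 1,
(a) compute
$$m_t := \frac{\bar{\bar{c}}_t + \phi\omega_\lambda\lambda^*_{t+1} - \bar{\bar{H}}_{t,t-1}\mu_{t-1}}{\bar{\bar{H}}_{tt} + (\bar{\bar{H}}_{t,t-1})^2\alpha_{t-1}};$$
(b) draw λ∗t ∼ N(mt, −αt);
(c) evaluate
$$q^* := q^*\cdot\frac{1}{\sqrt{2\pi(-\alpha_t)}}\exp\left[-\frac12\,\frac{(\lambda^*_t - m_t)^2}{-\alpha_t}\right].$$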
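A sketch of the backward sampler above in Python (hypothetical names). It accumulates log q∗ rather than q∗, an implementation choice on our part, since a product of T = 1000 densities can underflow or overflow. Passing `lam_eval` evaluates the proposal density at a given vector instead of drawing, which an independence Metropolis-Hastings step needs at the current state.

```python
import math
import random

def draw_basic_proposal(diag, off, cvec, rng, lam_eval=None):
    """Backward sequential draw from N(A^{-1} c, A^{-1}) for tridiagonal A = H-barbar.

    off[t] = A_{t+1,t} = -phi * omega_lambda (0-based). Returns (lam_star, log_q);
    if lam_eval is given, nothing is drawn and log q is evaluated at lam_eval.
    """
    T = len(diag)
    mu, alpha = [0.0] * T, [0.0] * T
    for t in range(T):                                        # step 1: forward pass
        d = diag[t] + (off[t - 1] ** 2 * alpha[t - 1] if t > 0 else 0.0)
        mu[t] = (cvec[t] - (off[t - 1] * mu[t - 1] if t > 0 else 0.0)) / d
        alpha[t] = -1.0 / d
    lam, log_q = [0.0] * T, 0.0
    for t in range(T - 1, -1, -1):                            # step 2: backward pass
        d = -1.0 / alpha[t]
        extra = -off[t] * lam[t + 1] if t < T - 1 else 0.0    # phi*omega_lambda*lam*_{t+1}
        m_t = (cvec[t] + extra - (off[t - 1] * mu[t - 1] if t > 0 else 0.0)) / d   # (a)
        var_t = -alpha[t]
        lam[t] = rng.gauss(m_t, math.sqrt(var_t)) if lam_eval is None else lam_eval[t]   # (b)
        log_q += -0.5 * math.log(2.0 * math.pi * var_t) - 0.5 * (lam[t] - m_t) ** 2 / var_t  # (c)
    return lam, log_q
```

Because the conditionals factor the joint normal exactly, the returned log q equals the multivariate normal log density with precision $\bar{\bar{H}}$ and mean $\bar{\bar{H}}^{-1}\bar{\bar{c}}$, which is what the test below verifies against a dense computation.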
We illustrate the performance of the basic proposal with an example. We first simulate some data. We set θ = (ωλ, φ, ωr) = (20.0, 0.95, 1.0 × 10⁴) and draw T = 1000 return observations. We then simulate the distribution λ|θ, r using an independence Metropolis-Hastings chain with the basic proposal distribution. We draw M = 10⁶ proposals in all.
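The independence Metropolis-Hastings update used here can be sketched generically as follows (hypothetical names). With importance weight w = p/q, a proposal x∗ is accepted with probability min(1, w(x∗)/w(x)). Everything is done on the log scale, and only unnormalized log densities are needed because constants cancel in the ratio.

```python
import math
import random

def imh_step(x, log_w, propose, log_target, rng):
    """One independence Metropolis-Hastings update.

    propose(rng) -> (x_star, log_q_star); log_w = log p(x) - log q(x) at the current state.
    Returns (new_x, new_log_w, accepted).
    """
    x_star, log_q_star = propose(rng)
    log_w_star = log_target(x_star) - log_q_star
    # accept with probability min(1, w(x*) / w(x)); 1 - random() lies in (0, 1]
    if math.log(1.0 - rng.random()) <= log_w_star - log_w:
        return x_star, log_w_star, True
    return x, log_w, False
```

For the volatility chain, `propose` would return a draw λ∗ together with its log proposal density (for example from a backward sampler like the one sketched above), and `log_target` would be log f(λ|θ) + log f(r|λ, θ).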
We choose a small but plausible value of ωλ in order to increase the weight of the non-quadratic contribution to log f (λ|r, θ) relative to the quadratic part, making this a difficult but reasonable case.
To analyze the performance of the proposal, we look at the run lengths of
sequences of repeated values of the λ vector in the posterior sample. If a volatility
proposal λ∗ is accepted, the next n volatility proposals are rejected, and the (n + 1)st is accepted, we say that the run length of λ∗ is n + 1. Long run lengths indicate
poor mixing of the posterior chain and high numerical standard errors.
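A sketch of the run-length statistics, assuming the survival function is S(n) = P(L > n) and the hazard rate is h(n) = P(L = n | L ≥ n); the text does not spell out these conventions, and the function names are hypothetical. The last run of a chain is censored, so it is dropped.

```python
def run_lengths(accepted):
    """Run lengths between successive acceptances; the final, censored run is dropped."""
    idx = [i for i, a in enumerate(accepted) if a]
    return [b - a for a, b in zip(idx, idx[1:])]

def survival_and_hazard(lengths, n_max):
    """S(n) = P(L > n) and h(n) = P(L = n | L >= n) for n = 1, ..., n_max."""
    if not lengths:
        raise ValueError("no complete runs")
    N = len(lengths)
    counts = [0] * (n_max + 2)
    for L in lengths:
        counts[min(L, n_max + 1)] += 1        # lengths beyond n_max are pooled
    surv, haz = [], []
    at_risk = N                               # number of runs with L >= n
    for n in range(1, n_max + 1):
        haz.append(counts[n] / at_risk if at_risk else float("nan"))
        at_risk -= counts[n]
        surv.append(at_risk / N)
    return surv, haz
```

A flat hazard corresponds to geometric run lengths; a hazard that decays with n, as in a thick-tailed run-length distribution, means that a chain stuck for a while tends to stay stuck.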
Figure 1 shows estimates of the survival function and the hazard rate of the
run length for the basic proposal. We see that the distribution of run lengths has a
thick tail, and that long run lengths are intolerably probable.
3.2 A First Refinement
The basic multivariate normal proposal is based on a quadratic approximation
of log f (r|λ, θ) at λ◦ , the mode of the distribution λ|r, θ. The idea behind the
first refinement is to use the information obtained while drawing λ∗ to refine the
quadratic approximation of log f (r|λ, θ). After each draw λ∗t , we adjust λ◦t−J , . . . , λ◦t−1 ,
where J is a small integer, in the direction of the mode of the conditional distribution λ1 , . . . , λt |λt+1 , . . . , λT , θ. Using the modified values λ◦t−J , . . . , λ◦t−1 , we
refine the quadratic approximation of log f (r|λ, θ) by adjusting ht−J , . . . , ht and
ct−J , . . . , ct . We then recompute αt−J , . . . , αt and µt−J , . . . , µt−1 . Finally, we
compute mt , the conditional mean E[λ∗t |λ∗t+1 , . . . , λ∗T ]. The conditional variance
is the negative of the adjusted value of αt .
The result is, like the basic proposal, a sequence of draws λ∗T , . . . , λ∗1 that are
conditionally normal. However, the proposal is no longer multivariate normal.
See the modified algorithm below. Step one, the forward pass, is identical to that of the basic proposal. The backward pass has five sub-steps. The
horizon parameter J determines how far back we refine our quadratic approximations of g(λt ) in order to draw each λ∗t . In sub-step 2(a), we compute, for
τ = t, t−1, . . . , t−J, an approximation mτ of the conditional mean E[λτ |λt+1 =
λ∗t+1 , . . . , λT = λ∗T , θ, r]. We then recompute the quadratic expansion of g(λτ )
around mτ , obtaining new values for hτ and cτ .
In sub-step 2(b), we update, for τ = t − J, . . . , t − 1, the values µτ and ατ
based on the new values of hτ and cτ .
In sub-step 2(c), we update mt , which we use as the conditional mean E[λ∗t |λ∗t+1 , . . . , λ∗T ],
and αt , the negative of which we use as the conditional variance var[λ∗t |λ∗t+1 , . . . , λ∗T ].
In sub-step 2(d), we draw λ∗t ∼ N(mt, −αt). In sub-step 2(e), we update the evaluation of q∗, the proposal density evaluated at λ∗. We take µ0 = α0 = 0 and initialize q∗ := 1.

[Figure 1: Survival Function and Hazard Rate for Run Lengths, Basic Proposal]
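1. For t = 1, …, T, compute
$$\mu_t := \frac{\bar{\bar{c}}_t - \bar{\bar{H}}_{t,t-1}\mu_{t-1}}{\bar{\bar{H}}_{tt} + (\bar{\bar{H}}_{t,t-1})^2\alpha_{t-1}}, \qquad \alpha_t := \frac{-1}{\bar{\bar{H}}_{tt} + (\bar{\bar{H}}_{t,t-1})^2\alpha_{t-1}}.$$
2. For t = T, T − 1, …, 1,
(a) for τ = t, t − 1, …, max(t − J, 1), compute
$$m_\tau := \begin{cases}\dfrac{\bar{\bar{c}}_t + \phi\omega_\lambda\lambda^*_{t+1} - \bar{\bar{H}}_{t,t-1}\mu_{t-1}}{\bar{\bar{H}}_{tt} + (\bar{\bar{H}}_{t,t-1})^2\alpha_{t-1}}, & \tau = t,\\[2ex] \mu_\tau + \alpha_\tau\,\bar{\bar{H}}_{\tau+1,\tau}\,m_{\tau+1}, & \tau < t,\end{cases}$$
$$h_\tau := \tfrac12\omega_r r_\tau^2 e^{-m_\tau}, \qquad c_\tau := h_\tau(1 + m_\tau) - \tfrac12,$$
so that $\bar{\bar{H}}_{\tau\tau}$ and $\bar{\bar{c}}_\tau$ change accordingly;
(b) for τ = max(t − J, 1), …, t − 1, compute
$$\mu_\tau := \frac{\bar{\bar{c}}_\tau - \bar{\bar{H}}_{\tau,\tau-1}\mu_{\tau-1}}{\bar{\bar{H}}_{\tau\tau} + (\bar{\bar{H}}_{\tau,\tau-1})^2\alpha_{\tau-1}}, \qquad \alpha_\tau := \frac{-1}{\bar{\bar{H}}_{\tau\tau} + (\bar{\bar{H}}_{\tau,\tau-1})^2\alpha_{\tau-1}};$$
(c) compute
$$m_t := \frac{\bar{\bar{c}}_t + \phi\omega_\lambda\lambda^*_{t+1} - \bar{\bar{H}}_{t,t-1}\mu_{t-1}}{\bar{\bar{H}}_{tt} + (\bar{\bar{H}}_{t,t-1})^2\alpha_{t-1}}, \qquad \alpha_t := \frac{-1}{\bar{\bar{H}}_{tt} + (\bar{\bar{H}}_{t,t-1})^2\alpha_{t-1}};$$
(d) draw λ∗t ∼ N(mt, −αt);
(e) evaluate
$$q^* := q^*\cdot\frac{1}{\sqrt{2\pi(-\alpha_t)}}\exp\left[-\frac12\,\frac{(\lambda^*_t - m_t)^2}{-\alpha_t}\right].$$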
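A Python sketch of the refined backward pass, following steps 1 and 2(a) to 2(e) with 0-based indices and the lower limit max(t − J, 1). Names are hypothetical, and T ≥ 2 is assumed. After re-expanding g(λτ) around mτ it sets $\bar{\bar{H}}_{\tau\tau} = \bar{H}_{\tau\tau} + h_\tau$ and $\bar{\bar{c}}_\tau = c_\tau$ (using c̄ = 0). The cost is O(TJ), and log q∗ is accumulated instead of q∗.

```python
import math
import random

def draw_refined_proposal(r, omega_lam, phi, omega_r, lam_mode, J, rng, lam_eval=None):
    """First-refinement proposal: backward draws with local re-expansion (T >= 2).

    Returns (lam_star, log_q). If lam_eval is given, log q is evaluated there instead.
    """
    T = len(r)
    pdiag = [omega_lam * (1.0 + phi ** 2)] * T
    pdiag[0] = pdiag[-1] = omega_lam
    off = [-omega_lam * phi] * (T - 1)           # off[t] = H_{t+1,t} (0-based)
    diag, cbb = [0.0] * T, [0.0] * T            # diagonal of H-barbar, c-barbar
    mu, alpha = [0.0] * T, [0.0] * T

    def expand(tau, point):
        # quadratic expansion of g(lambda_tau) around `point`; c-bar = 0
        h = 0.5 * omega_r * r[tau] ** 2 * math.exp(-point)
        diag[tau] = pdiag[tau] + h
        cbb[tau] = h * (1.0 + point) - 0.5

    def denom(t):
        return diag[t] + (off[t - 1] ** 2 * alpha[t - 1] if t > 0 else 0.0)

    def update(t):
        # forward-pass recursion for mu_t and alpha_t
        d = denom(t)
        mu[t] = (cbb[t] - (off[t - 1] * mu[t - 1] if t > 0 else 0.0)) / d
        alpha[t] = -1.0 / d

    def cond_mean(t, lam_next):
        extra = -off[t] * lam_next if t < T - 1 else 0.0   # phi * omega_lam * lam*_{t+1}
        return (cbb[t] + extra - (off[t - 1] * mu[t - 1] if t > 0 else 0.0)) / denom(t)

    for t in range(T):                           # step 1: forward pass at the mode
        expand(t, lam_mode[t])
        update(t)

    lam, log_q = [0.0] * T, 0.0
    for t in range(T - 1, -1, -1):               # step 2: backward pass
        lam_next = lam[t + 1] if t < T - 1 else 0.0
        lo = max(t - J, 0)
        m_next = 0.0
        for tau in range(t, lo - 1, -1):         # (a) approximate means, re-expand g
            m_tau = cond_mean(t, lam_next) if tau == t else mu[tau] + alpha[tau] * off[tau] * m_next
            expand(tau, m_tau)
            m_next = m_tau
        for tau in range(lo, t):                 # (b) refresh mu, alpha below t
            update(tau)
        m_t = cond_mean(t, lam_next)             # (c) conditional mean ...
        update(t)                                #     ... and alpha_t
        var_t = -alpha[t]
        lam[t] = rng.gauss(m_t, math.sqrt(var_t)) if lam_eval is None else lam_eval[t]   # (d)
        log_q += -0.5 * math.log(2.0 * math.pi * var_t) - 0.5 * (lam[t] - m_t) ** 2 / var_t  # (e)
    return lam, log_q
```

The adjusted ht and ct at step t depend only on λ∗t+1, …, λ∗T, so each λ∗t is still conditionally normal and q∗ remains a proper density that can be re-evaluated at any given λ, which is what the `lam_eval` path does.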
We illustrate the performance of these proposals using the same run length
analysis that we used for the basic proposal. Figure 2 illustrates the survival
function and the hazard rate for the run lengths of four different proposals. The
four panels correspond to the horizon parameters J = 1, J = 3, J = 5 and J = 7.
We see important improvements in the performance of the proposal. Beyond
J = 7, there is little additional improvement.
[Figure 2: Survival Function and Hazard Rate for Run Lengths, Proposal with First Refinement, J = 1, 3, 5, 7]
3.3 A Second Refinement
While the conditional distribution λt |λt+1 , . . . , λT , r, θ is very nearly normal, it is
in fact slightly positively skewed and its right tail is thicker than that of the normal
distributions λ∗t |λ∗t+1 , . . . , λ∗T of the basic proposal. In the second refinement, we
replace this conditional normal distribution with a mixture distribution designed
to capture these departures from normality.
For a better understanding of these issues, we first consider the distribution
λt |λ1 , . . . , λt−1 , λt+1 , . . . , λT , θ, r. Its unnormalized density is given by
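$$
f(\lambda_t\mid\lambda_1,\dots,\lambda_{t-1},\lambda_{t+1},\dots,\lambda_T,\theta,r) \propto \exp\left\{-\frac12\left[g(\lambda_t) + \omega_\lambda(1+\phi^2)\Big(\lambda_t - \frac{\phi}{1+\phi^2}(\lambda_{t-1}+\lambda_{t+1})\Big)^2\right]\right\},
$$
where
$$g(\lambda_t) = \omega_r r_t^2\exp(-\lambda_t) + \lambda_t.$$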
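Let λ°t be the mode of this distribution and g̃(λt) the Taylor expansion around λ°t up to the quadratic term. We then have the following approximation:
$$
f(\lambda_t\mid\lambda_1,\dots,\lambda_{t-1},\lambda_{t+1},\dots,\lambda_T,\theta,r) \approx \sqrt{\frac{\hat{h}_t}{2\pi}}\exp\left[-\frac12\hat{h}_t(\lambda_t-\lambda_t^\circ)^2\right] \propto \exp\left\{-\frac12\left[\tilde{g}(\lambda_t) + \omega_\lambda(1+\phi^2)\Big(\lambda_t - \frac{\phi}{1+\phi^2}(\lambda_{t-1}+\lambda_{t+1})\Big)^2\right]\right\},
$$
where
$$\hat{h}_t = \omega_\lambda(1+\phi^2) + \tfrac12\omega_r r_t^2\exp(-\lambda_t^\circ).$$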
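The next term of the Taylor expansion of g(λt) around λ°t is
$$\frac{1}{3!}g'''(\lambda_t^\circ)(\lambda_t-\lambda_t^\circ)^3 = -\frac16\,\omega_r r_t^2\exp(-\lambda_t^\circ)(\lambda_t-\lambda_t^\circ)^3.$$
Including this term gives the following improved approximation:
$$
f(\lambda_t\mid\lambda_1,\dots,\lambda_{t-1},\lambda_{t+1},\dots,\lambda_T,\theta,r) \approx \sqrt{\frac{\hat{h}_t}{2\pi}}\exp\left[-\frac12\hat{h}_t(\lambda_t-\lambda_t^\circ)^2 + \frac{1}{12}\,\omega_r r_t^2\exp(-\lambda_t^\circ)(\lambda_t-\lambda_t^\circ)^3\right].
$$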
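Using the approximation eˣ ≈ 1 + x, we have
$$
f(\lambda_t\mid\lambda_1,\dots,\lambda_{t-1},\lambda_{t+1},\dots,\lambda_T,\theta,r) \approx \sqrt{\frac{\hat{h}_t}{2\pi}}\exp\left[-\frac12\hat{h}_t(\lambda_t-\lambda_t^\circ)^2\right]\left[1 + \frac{1}{12}\,\omega_r r_t^2\exp(-\lambda_t^\circ)(\lambda_t-\lambda_t^\circ)^3\right].
$$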
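We can approximate the conditional mean E[λt | λ1, …, λt−1, λt+1, …, λT, θ, r] by the following integral, where we use the fact that odd central moments of the normal distribution vanish, so that the term λ°t(λt − λ°t)³ integrates to zero:
$$
\int\left[\lambda_t + \frac{1}{12}\,\omega_r r_t^2\exp(-\lambda_t^\circ)(\lambda_t-\lambda_t^\circ)^4\right]\sqrt{\frac{\hat{h}_t}{2\pi}}\exp\left[-\frac12\hat{h}_t(\lambda_t-\lambda_t^\circ)^2\right]d\lambda_t.
$$
Knowing that the kurtosis of a normal random variable is 3, so that the fourth central moment here is $3/\hat{h}_t^2$, we compute the value of the integral to be
$$\lambda_t^\circ + \frac14\,\frac{\omega_r r_t^2\exp(-\lambda_t^\circ)}{\hat{h}_t^2}.$$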
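To see how good the mean correction is, the sketch below (hypothetical names) compares the mode λ°t, the corrected mean λ°t + ¼ ωr r²t exp(−λ°t)/ĥ²t and a brute-force quadrature mean for a single full conditional. Here `s` stands for ωr r²t, `W` for ωλ(1 + φ²) and `a` for φ(λt−1 + λt+1)/(1 + φ²); the values in the check are illustrative inputs, not values from the paper.

```python
import math

def mean_correction_check(s, W, a, half_width=3.0, n=6001):
    """Mode, corrected mean and quadrature mean of the density
    p(x) proportional to exp{-0.5 * [s*exp(-x) + x + W*(x - a)**2]}.
    """
    x = a
    for _ in range(100):                 # Newton's method on the convex function in brackets
        grad = -s * math.exp(-x) + 1.0 + 2.0 * W * (x - a)
        hess = s * math.exp(-x) + 2.0 * W
        step = grad / hess
        x -= step
        if abs(step) < 1e-14:
            break
    mode = x
    h_hat = W + 0.5 * s * math.exp(-mode)
    corrected = mode + 0.25 * s * math.exp(-mode) / h_hat ** 2
    step = 2.0 * half_width / (n - 1)    # trapezoid rule on [mode - half_width, mode + half_width]
    xs = [mode - half_width + i * step for i in range(n)]
    logp = [-0.5 * (s * math.exp(-v) + v + W * (v - a) ** 2) for v in xs]
    top = max(logp)
    w = [math.exp(v - top) for v in logp]
    w[0] *= 0.5
    w[-1] *= 0.5
    exact = sum(wi * v for wi, v in zip(w, xs)) / sum(w)
    return mode, corrected, exact
```

In such checks the true mean lies to the right of the mode (positive skew), and the corrected value removes most of the gap.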
The second term is a reasonable mean correction term for the conditional distribution λt |λ1 , . . . , λt−1 , λt+1 , . . . , λT , θ, r. This is not directly useful, however,
since we would like to refine our approximation of the distribution λt |λt+1 , . . . , λT , θ, r.
We can approximate the mean of the distribution λ|r, θ as
$$E[\lambda\mid r,\theta] \approx \bar{\bar{H}}^{-1}(\bar{\bar{c}} + c^*) = \lambda^\circ + \bar{\bar{H}}^{-1}c^*,$$
where λ° is the mode of the distribution λ|r, θ, $\bar{\bar{H}}$ and $\bar{\bar{c}}$ are the precision and co-vector associated with this value of λ°, and c∗ is a vector of co-vector corrections, with
$$c^*_t = \frac14\,\frac{\omega_r r_t^2\exp(-\lambda_t^\circ)}{\hat{h}_t}.$$
We compute the vector $m^* = \bar{\bar{H}}^{-1}c^*$ of mean corrections using the basic VMVB algorithm.
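To capture the skewness and thick right tail of λt | λt+1, …, λT, θ, r, we replace the normal distribution λ∗t | λ∗t+1, …, λ∗T with the mixture distribution with the following density:
$$
\begin{aligned}
q(\lambda^*_t\mid\lambda^*_{t+1},\dots,\lambda^*_T) ={}& \pi\,\frac{1}{\sqrt{2\pi\sigma_1^2}}\exp\left[-\frac12\,\frac{(\lambda^*_t-\mu_1)^2}{\sigma_1^2}\right]\\
&+ (1-\pi)\cdot\frac{\Phi(z)}{\Phi(z)+\phi(z)/z}\cdot\frac{1}{\Phi(z)}\,\mathbf{1}_{(-\infty,\,\mu_2+z\sigma_2)}(\lambda^*_t)\,\frac{1}{\sqrt{2\pi\sigma_2^2}}\exp\left[-\frac12\,\frac{(\lambda^*_t-\mu_2)^2}{\sigma_2^2}\right]\\
&+ (1-\pi)\cdot\frac{\phi(z)/z}{\Phi(z)+\phi(z)/z}\cdot\frac{z}{\sigma_2}\exp\left[-\frac{z}{\sigma_2}\big(\lambda^*_t-(\mu_2+z\sigma_2)\big)\right]\mathbf{1}_{[\mu_2+z\sigma_2,\,\infty)}(\lambda^*_t).
\end{aligned}
$$
In this expression, φ(·) and Φ(·) denote the standard normal density and distribution function.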
The second component is truncated normal on the support (−∞, µ2 + zσ2 ) while
the third component is exponential on the support [µ2 + zσ2 , ∞). Their relative
component probabilities and the mean parameter of the exponential distribution
are chosen to match slopes and values at µ2 + zσ2 . The exponential component is
designed to thicken the right tail of the distribution.
We choose the parameters µ1, µ2, σ1, σ2, π and z so that the mixture distribution is positively skewed and has a mean of approximately mt + m∗t. Trial and error suggests that the following parameter values are reasonable: µ1 = mt + m∗t/2, µ2 = mt + 3m∗t, σ1² = −αt, σ2² = 0.04/ωλ + 0.96(−αt), π = 0.8 and z = 2.5. We hope to put
the choice of these parameters on a more rigorous footing in a later version of the
paper.
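A sketch of the mixture density and a sampler for it in Python (hypothetical names; the mixture weight π is called `pi1` in code). Component 2 is drawn by rejection from N(µ2, σ2²), which accepts with probability Φ(z) ≈ 0.994 for z = 2.5, and component 3 is µ2 + zσ2 plus an exponential variate with rate z/σ2. `mixture_params` simply encodes the trial-and-error values listed above.

```python
import math
import random

def std_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def std_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def mixture_params(m_t, m_star, alpha_t, omega_lam):
    """Trial-and-error parameter values from the text."""
    return dict(mu1=m_t + m_star / 2.0, mu2=m_t + 3.0 * m_star,
                s1=math.sqrt(-alpha_t),
                s2=math.sqrt(0.04 / omega_lam + 0.96 * (-alpha_t)),
                pi1=0.8, z=2.5)

def mixture_density(x, mu1, mu2, s1, s2, pi1, z):
    b = mu2 + z * s2                               # join point of components 2 and 3
    D = std_cdf(z) + std_pdf(z) / z
    dens = pi1 * std_pdf((x - mu1) / s1) / s1      # component 1: N(mu1, s1^2)
    if x < b:                                      # component 2: truncated N(mu2, s2^2)
        dens += (1.0 - pi1) * (std_cdf(z) / D) * std_pdf((x - mu2) / s2) / (s2 * std_cdf(z))
    else:                                          # component 3: exponential tail, rate z/s2
        dens += (1.0 - pi1) * (std_pdf(z) / z / D) * (z / s2) * math.exp(-(z / s2) * (x - b))
    return dens

def mixture_draw(rng, mu1, mu2, s1, s2, pi1, z):
    if rng.random() < pi1:
        return rng.gauss(mu1, s1)
    D = std_cdf(z) + std_pdf(z) / z
    b = mu2 + z * s2
    if rng.random() < std_cdf(z) / D:
        while True:                                # rejection; accepts w.p. Phi(z)
            x = rng.gauss(mu2, s2)
            if x < b:
                return x
    return b + rng.expovariate(z / s2)
```

Because the weights are proportional to Φ(z) and φ(z)/z, the truncated-normal and exponential pieces agree in both value and slope at µ2 + zσ2, so the density has no kink there.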
Figure 3 illustrates the performance of the proposal distribution when we add
both refinements. It illustrates the survival function and the hazard rate for the run
lengths of four different proposals. The four panels correspond to the horizon
parameters J = 1, J = 3, J = 5 and J = 7.
We see significant additional improvements in performance.

[Figure 3: Survival Function and Hazard Rate for Run Lengths, Proposal with First Refinement and Second Refinement, J = 1, 3, 5, 7]

4 Joint Parameter and Volatility Proposals
Please be aware that this section is incomplete and in some places inaccurate. The method we currently use to draw parameters is similar to the method described here, but not exactly the same. We are currently working on an alternate method for approximating the posterior density f(θ|r).
We first propose a method for drawing proposals θ∗ from a distribution approximating θ|r. We then note that this method can be combined with the procedure outlined in the previous section for drawing proposals λ∗ from a distribution
approximating λ|θ, r. The two procedures can be used to draw joint proposals
(θ∗ , λ∗ ) from a distribution approximating θ, λ|r.
Before doing posterior simulation for real, we draw a “burn-in” pre-sample $\{(\theta^{(b)},\lambda^{(b)})\}_{b=1}^{B}$. Our proposal distribution is a mixture distribution with B equally probable mixture components. The b’th component approximates the distribution θ|λ^(b), r, and thus the mixture approximates the distribution θ|r. Specifically, the proposal distribution has the following density:
$$q(\theta) = \frac{1}{B}\sum_{b=1}^{B} f(\omega_\lambda\mid\phi^{(b)},\lambda^{(b)})\,\tilde{f}(\phi;\omega_\lambda,\lambda^{(b)},r)\,f(\omega_r\mid\lambda=\lambda^{(b)}).$$
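The density f(ωλ|φ, λ) is the exact density for the conditional posterior distribution ωλ | φ, ωr, λ:
$$f(\omega_\lambda\mid\phi,\omega_r,\lambda,r) = \frac{(\bar{\bar{s}}_\lambda^2/2)^{\bar{\bar{\nu}}_\lambda/2}}{\Gamma(\bar{\bar{\nu}}_\lambda/2)}\,\omega_\lambda^{(\bar{\bar{\nu}}_\lambda-2)/2}\exp(-\bar{\bar{s}}_\lambda^2\omega_\lambda/2),$$
where
$$\bar{\bar{\nu}}_\lambda \equiv \bar{\nu}_\lambda + T \quad\text{and}\quad \bar{\bar{s}}_\lambda^2 \equiv \bar{s}_\lambda^2 + (1-\phi^2)\lambda_1^2 + \sum_{t=2}^{T}(\lambda_t-\phi\lambda_{t-1})^2.$$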
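The density f̃(φ; ωλ, ωr, λ, r) approximates the density for the conditional posterior distribution φ | ωλ, ωr, λ:
$$
\tilde{f}(\phi;\omega_\lambda,\omega_r,\lambda,r) \equiv \frac{1}{\Phi\big(\sqrt{\bar{\bar{\omega}}_\phi}\,(1-\bar{\bar{\phi}})\big) - \Phi\big(\sqrt{\bar{\bar{\omega}}_\phi}\,(-1-\bar{\bar{\phi}})\big)}\sqrt{\frac{\bar{\bar{\omega}}_\phi}{2\pi}}\exp\left[-\frac{\bar{\bar{\omega}}_\phi}{2}(\phi-\bar{\bar{\phi}})^2\right],
$$
where
$$\bar{\bar{\omega}}_\phi \equiv \bar{\omega}_\phi + \omega_\lambda\sum_{t=1}^{T-1}\lambda_t^2 \quad\text{and}\quad \bar{\bar{\phi}} \equiv \bar{\bar{\omega}}_\phi^{-1}\left(\bar{\omega}_\phi\bar{\phi} + \omega_\lambda\sum_{t=1}^{T-1}\lambda_t\lambda_{t+1}\right).$$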
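The density f(ωr|φ, ωλ, λ) is the exact density for the conditional posterior distribution ωr | φ, ωλ, λ:
$$f(\omega_r\mid\phi,\omega_\lambda,\lambda,r) = \frac{(\bar{\bar{s}}_r^2/2)^{\bar{\bar{\nu}}_r/2}}{\Gamma(\bar{\bar{\nu}}_r/2)}\,\omega_r^{(\bar{\bar{\nu}}_r-2)/2}\exp(-\bar{\bar{s}}_r^2\omega_r/2),$$
where
$$\bar{\bar{\nu}}_r \equiv \bar{\nu}_r + T \quad\text{and}\quad \bar{\bar{s}}_r^2 \equiv \bar{s}_r^2 + \sum_{t=1}^{T}\exp(-\lambda_t)\,r_t^2.$$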
Simulating from and evaluating these distributions is straightforward. The result is a proposal distribution approximating the distribution θ|λ, r.
We propose the pair (θ∗ , λ∗ ) by proposing θ∗ using the procedure outlined in
this section and then λ∗ using the method outlined in the previous section. The
joint proposal density is q(θ∗ )q(λ∗ |θ∗ ).
We draw the pre-sample in the same way that we draw the real sample, except
that we use a mixture with only the currently available components. Thus to draw
θ(b) , we use an evenly weighted mixture of the first b − 1 components. We draw
θ(1) , the first pre-sample draw of θ, from the prior.
5 Approximation of the Marginal Likelihood
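Approximating the marginal likelihood is straightforward using importance sampling. We average the ratio of true and proposal densities over the proposals $(\theta^*_{(1)},\lambda^*_{(1)}),\dots,(\theta^*_{(M)},\lambda^*_{(M)})$, which are i.i.d.:
$$
\frac{1}{M}\sum_{m=1}^{M}\frac{f(\theta^*_{(m)})\,f(\lambda^*_{(m)}\mid\theta^*_{(m)})\,f(r\mid\theta^*_{(m)},\lambda^*_{(m)})}{q(\theta^*_{(m)})\,q(\lambda^*_{(m)}\mid\theta^*_{(m)})}
\xrightarrow{\text{a.s.}}
\int\frac{f(\theta^*)f(\lambda^*\mid\theta^*)f(r\mid\theta^*,\lambda^*)}{q(\theta^*)q(\lambda^*\mid\theta^*)}\,q(\theta^*)q(\lambda^*\mid\theta^*)\,d\theta^*\,d\lambda^* = f(r).
$$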
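On the log scale, the estimator can be sketched as follows (hypothetical names). The log weights are shifted by their maximum before exponentiating, and the numerical standard error of the log estimate uses the delta method, sd(w)/(√M · mean(w)). How the reported standard error was actually computed is not stated, so this choice is an assumption.

```python
import math

def log_marginal_likelihood(log_weights):
    """Importance-sampling estimate of log f(r) from
    log_weights[m] = log f(theta*) + log f(lam*|theta*) + log f(r|theta*, lam*)
                     - log q(theta*) - log q(lam*|theta*).
    Returns (estimate, delta-method numerical standard error).
    """
    M = len(log_weights)
    top = max(log_weights)
    w = [math.exp(v - top) for v in log_weights]      # rescaled to avoid overflow
    mean_w = sum(w) / M
    var_w = sum((x - mean_w) ** 2 for x in w) / (M - 1)
    est = top + math.log(mean_w)
    nse = math.sqrt(var_w / M) / mean_w               # se of log(mean) by the delta method
    return est, nse
```

The proposal should have tails at least as heavy as the posterior, otherwise the weights can have infinite variance and this standard error is unreliable.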
Table 1: Posterior Moments of Parameters

Parameter   Mean          Std. Dev.     Numer. Std. Error   Relative Numer. Effic.
ωλ          63.28         28.27         0.276               52.4%
φ           0.9755        0.0102        0.00011             40.6%
ωr          3.836 × 10⁴   0.745 × 10⁴   0.0074 × 10⁴        50.2%
6 An Empirical Example
We illustrate our methods with an empirical example. We use data on the French
Franc exchange rate from January 6, 1994 to January 15, 1999. We construct the
series rt ≡ log Pt − log Pt−1 , where Pt is the noon rate for the French Franc in
U.S. dollars. Our data source is the web site of the Federal Reserve Bank of New
York.
Recall that the parameters ωλ, φ and ωr are a priori independent, s̄²λ ωλ ∼ χ²(ν̄λ), s̄²r ωr ∼ χ²(ν̄r), and φ is the truncation of the distribution N(φ̄, ω̄φ⁻¹) to the stationary interval (−1, 1). We choose ν̄λ = 3.0 and s̄²λ = 0.03, which makes the prior mean of ωλ 100.0 and the prior standard deviation approximately 81.65. We set φ̄ = 0.99 and ω̄φ = 2500; the untruncated distribution has mean 0.99 and standard deviation 0.02. Finally, ν̄r = 3.0 and s̄²r = 7.5 × 10⁻⁵, which implies a prior mean of 40 000 for ωr and a prior standard deviation of approximately 32 659.9.
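As a check on these prior moments: if s̄²ω ∼ χ²(ν̄), then E[ω] = ν̄/s̄² and var(ω) = 2ν̄/s̄⁴, so with the values above
$$E[\omega_\lambda] = \frac{3.0}{0.03} = 100, \qquad \mathrm{sd}(\omega_\lambda) = \frac{\sqrt{6}}{0.03} \approx 81.65,$$
$$E[\omega_r] = \frac{3.0}{7.5\times10^{-5}} = 40\,000, \qquad \mathrm{sd}(\omega_r) = \frac{\sqrt{6}}{7.5\times10^{-5}} \approx 32\,659.9,$$
and the untruncated prior of φ has standard deviation $1/\sqrt{\bar{\omega}_\phi} = 1/\sqrt{2500} = 0.02$.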
Table 1 shows the posterior mean, posterior standard deviation, numerical
standard error and relative numerical efficiency for each of the three parameters.
Results are based on a posterior sample of length 20 000. The relative numerical
efficiency is a ratio of numerical error variances: that of a hypothetical i.i.d. sample divided by that of the sample we draw. We can interpret the relative numerical efficiency of
40.6% for φ as meaning that the sample we draw gives the same information about the
posterior mean of φ as an i.i.d. sample 40.6% of its length.
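The relative numerical efficiencies in Table 1 can be cross-checked against the other columns. An i.i.d. sample of length N has numerical error variance σ²/N, where σ is the posterior standard deviation, so RNE = (σ²/N)/NSE². The minimal Python sketch below uses the Table 1 entries and N = 20 000. Its results agree with the reported efficiencies up to the rounding of the numerical standard error column.

```python
def relative_numerical_efficiency(post_sd, nse, n):
    """Ratio of the i.i.d. error variance (post_sd**2 / n) to the observed NSE**2."""
    iid_error_variance = post_sd ** 2 / n
    return iid_error_variance / nse ** 2


N = 20000
# (posterior std. dev., numerical std. error) taken from Table 1
table1 = {
    "omega_lambda": (28.27, 0.276),
    "phi": (0.0102, 0.00011),
    "omega_r": (0.745e4, 0.0074e4),
}
for name, (sd, nse) in table1.items():
    print(f"{name}: RNE = {relative_numerical_efficiency(sd, nse, N):.3f}")
```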
Figures 4, 5 and 6 illustrate the three bivariate posterior parameter distributions.
The log marginal likelihood is 6734.239 with a numerical standard error of
0.015.
Figure 7 shows estimates of the survival function and the hazard rate of the
run length for the posterior simulator applied to the French Franc data.
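The run-length estimates can be sketched as follows, under an assumption about the definition. Here a run is taken to be a maximal stretch of consecutive identical states in the simulated chain, as produced by rejected proposals. This may differ from the paper's exact definition. For run lengths L, the empirical survival function is S(k) = P(L > k) and the discrete hazard rate is h(k) = P(L = k | L ≥ k).

```python
from collections import Counter


def run_lengths(chain):
    """Lengths of maximal runs of identical consecutive states (assumed definition)."""
    lengths, current = [], 1
    for prev, nxt in zip(chain, chain[1:]):
        if nxt == prev:
            current += 1
        else:
            lengths.append(current)
            current = 1
    lengths.append(current)
    return lengths


def survival_and_hazard(lengths):
    """Empirical S(k) = P(L > k) and discrete hazard h(k) = P(L = k | L >= k)."""
    n = len(lengths)
    counts = Counter(lengths)
    surv, haz = {}, {}
    at_risk = n                      # number of runs with L >= k
    for k in range(1, max(lengths) + 1):
        haz[k] = counts[k] / at_risk if at_risk > 0 else 0.0
        at_risk -= counts[k]         # now the number of runs with L > k
        surv[k] = at_risk / n
    return surv, haz
```

For example, a chain whose states are 1, 1, 1, 2, 3, 3, 4 has runs of lengths 3, 1, 2 and 1.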
7 Conclusions
We have proposed a fast and numerically efficient posterior simulator for a basic
univariate stochastic volatility model.
In future work, we hope to be able to adapt the simulator to more general models. We suggest that the approach we have taken lends itself well to this project.
The basic idea depends on the band diagonality of the Hessian of log f (λ|θ, r) and
the prior multivariate normality of the log volatility. Within these constraints we
can see several potential extensions. Adding a leverage effect makes the Hessian
of log f(r|θ, λ) in λ no longer diagonal, but it is still band diagonal, with a single
non-zero off-diagonal. We can also consider more flexible conditional distributions
such as finite mixtures of normals, which can capture conditional leptokurtosis
and skewness. Increasing the autoregressive order of log-volatility increases the
number of non-zero off-diagonals of the Hessian of log f (λ|θ, r). The VMVB
algorithm can accommodate this. Even non-normality of λ|θ might be feasible if
we introduce finite mixtures of normals for the conditional distribution of λt and
augment the data to include indicators for mixture components.
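Band diagonality can be illustrated as follows. Suppose log volatility has a stationary Gaussian AR(1) prior with persistence φ and innovation precision ω. Its prior precision matrix is then tridiagonal. Adding a diagonal curvature term from the observation density keeps it tridiagonal. The Cholesky factor is therefore lower bidiagonal, and both the factorization and a draw from the corresponding normal take O(T) operations. In the minimal Python sketch below, the curvature vector `c` is a hypothetical placeholder, not the paper's expression. The functions are our own illustration, not the VMVB algorithm.

```python
import math
from typing import List, Tuple


def ar1_precision_plus_diag(phi: float, omega: float,
                            c: List[float]) -> Tuple[List[float], List[float]]:
    """Diagonal and off-diagonal of Q = (stationary AR(1) prior precision) + diag(c).

    The AR(1) part has diagonal omega * (1, 1 + phi^2, ..., 1 + phi^2, 1) and
    off-diagonal -omega * phi; c is a hypothetical positive curvature vector.
    """
    T = len(c)
    assert T >= 2, "the closed form below assumes at least two periods"
    diag = [omega * (1.0 + phi * phi)] * T
    diag[0] = diag[-1] = omega
    diag = [a + ci for a, ci in zip(diag, c)]
    off = [-omega * phi] * (T - 1)
    return diag, off


def tridiag_cholesky(diag, off):
    """Lower-bidiagonal L with Q = L L^T in O(T): d is diag(L), e its subdiagonal."""
    T = len(diag)
    d = [0.0] * T
    e = [0.0] * (T - 1)
    d[0] = math.sqrt(diag[0])
    for i in range(1, T):
        e[i - 1] = off[i - 1] / d[i - 1]
        d[i] = math.sqrt(diag[i] - e[i - 1] ** 2)
    return d, e


def solve_lower_transpose(d, e, z):
    """Back-substitution for L^T x = z; if z ~ N(0, I) then x ~ N(0, Q^{-1})."""
    T = len(d)
    x = [0.0] * T
    x[-1] = z[-1] / d[-1]
    for i in range(T - 2, -1, -1):
        x[i] = (z[i] - e[i] * x[i + 1]) / d[i]
    return x
```

A leverage term would add to both the diagonal and the single off-diagonal without changing this band structure, so the same O(T) factorization would still apply.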
Most importantly, we can look to multivariate models. We hope to be able to
handle multivariate models with a flexible dependence structure, and draw each
volatility sequence and the parameters on which it depends in one Gibbs block.
Figure 4: Posterior Scatterplot of ωλ versus ωr

Figure 5: Posterior Scatterplot of ωr versus φ

Figure 6: Posterior Scatterplot of φ versus ωλ

Figure 7: Survival Function and Hazard Rate for Run Lengths, Empirical Example
References
[1] Siddhartha Chib, Federico Nardari, and Neil Shephard. Markov chain Monte
Carlo methods for stochastic volatility models. Journal of Econometrics,
108:281–316, 2002.
[2] Siddhartha Chib, Federico Nardari, and Neil Shephard. Analysis of high dimensional multivariate stochastic volatility models. Forthcoming, Journal of
Econometrics, 2006.
[3] R. F. Engle. Autoregressive conditional heteroscedasticity with estimates of
the variance of United Kingdom inflation. Econometrica, 50(4):987–1007,
1982.
[4] J. Geweke. Bayesian comparison of econometric models. Working Paper,
Federal Reserve Bank of Minneapolis Research Department, 1994.
[5] E. Jacquier, N. Polson, and P. Rossi. Bayesian analysis of stochastic volatility
models. Journal of Business and Economic Statistics, 12(4):371–388, 1994.
[6] Sangjoon Kim, Neil Shephard, and Siddhartha Chib. Stochastic volatility:
Likelihood inference and comparison with ARCH models. Review of Economic Studies, 65(3):361–393, 1998.
[7] Yasuhiro Omori, Siddhartha Chib, Neil Shephard, and Jouchi Nakajima.
Stochastic volatility with leverage: fast likelihood inference. Forthcoming,
Journal of Econometrics, 2004.
[8] Neil Shephard and Michael K. Pitt. Likelihood analysis of non-Gaussian measurement time series. Biometrika, 84(3):653–667, 1997.
[9] Raf Vandebril, Nicola Mastronardi, and Marc Van Barel. A Levinson-like
algorithm for symmetric strongly nonsingular higher order semiseparable plus
band matrices. Report TW 427, Katholieke Universiteit Leuven Department
of Computer Science, 2005.