Chapter 2
SINGLE SAMPLING
2.1 Motivation
In this chapter the focus will be on single move samplers for non-Gaussian state space models. This means each component of the states will be sampled conditional on the other states and the measurements. The two examples considered will centre on time series models, the duration model and the SV model. The general model under consideration is the following non-Gaussian state space model described in detail in Section 1.4,

$y_t \sim f(y_t \mid s_t), \qquad s_t = c_t + Z_t\alpha_t,$
$\alpha_{t+1} = d_t + T_t\alpha_t + H_t u_t, \qquad u_t \sim \mathrm{NID}(0, I),$
$\alpha_1 \mid Y_0 \sim N(a_{1|0}, P_{1|0}). \qquad (2.1)$
2.2 General results for acceptance-rejection
For the moment I shall suppress the notation of the parameters, which are conditioned upon. I first show that reasonably generally it is possible to produce a Gibbs sampler for these problems, for we can sample from $\alpha_t \mid \alpha_{t-1}, \alpha_{t+1}, y_t$ straightforwardly. This single move sampling problem has been highlighted in Section 1.4.3. Part of the analysis will be based around the jackknife density $\alpha_t \mid \alpha_{\setminus t}$. Since the states follow the state space form equation

$\alpha_{t+1} = d_t + T_t\alpha_t + H_t u_t, \qquad u_t \sim \mathrm{NID}(0, I),$

we obtain $\alpha_t \mid \alpha_{\setminus t} \sim N(\mu_t, \Sigma_t)$, where $\mu_t$ is a linear combination of $\alpha_{t+1}$ and $\alpha_{t-1}$, and

$\alpha_{\setminus t} = (\alpha_1', \ldots, \alpha_{t-1}', \alpha_{t+1}', \ldots, \alpha_n')'.$
The likelihood will be written $\log f(y_t \mid \alpha_t) = l(\alpha_t)$, suppressing the dependence on $y_t$ for compactness of notation. Finally let

$\frac{\partial l(\alpha_t)}{\partial \alpha_t} = l'(\alpha_t) \qquad \text{and} \qquad \frac{\partial^2 l(\alpha_t)}{\partial \alpha_t \partial \alpha_t'} = l''(\alpha_t).$
Theorem 2.1
Suppose $\Sigma_t$ is non-singular and that $l''(\alpha_t)$ is negative semi-definite for all values of $\alpha_t$. Then the following two results hold.

(1) We can sample from $\alpha_t \mid \alpha_{t-1}, \alpha_{t+1}, y_t$ by making suggestions $\alpha_t$ from

$N\{\mu_t + \Sigma_t l'(m_t), \; \Sigma_t\}, \qquad (2.2)$

which are accepted with probability

$\exp\{l(\alpha_t) - l(m_t) - l'(m_t)'(\alpha_t - m_t)\}, \qquad (2.3)$

whatever the value of $m_t$.

(2) The probability of rejecting the suggestion made in (2.2) is minimised by selecting $m_t$ as the mode of $f(\alpha_t \mid \alpha_{t-1}, \alpha_{t+1}, y_t)$.
The proof of Theorem 2.1 is given in Section 2.6.2. This indicates that for log-concave measurement densities, the mode of the density of $\alpha_t \mid \alpha_{t-1}, \alpha_{t+1}, y_t$ is the best point of expansion in terms of the expected number of rejection steps before acceptance. The log-likelihood is bounded above by a first order expansion around $m_t$. It is instructive to note that the accept-reject method with acceptance probability given by (2.3) is valid whatever value of $m_t$ is chosen. This result has a number of attractions. Evaluating the rejection rate does not require computing the jackknife density; it only involves the likelihood, the jackknife mean and variance.
Of course the mode of $f(\alpha_t \mid \alpha_{t-1}, \alpha_{t+1}, y_t)$ is relatively straightforward to find numerically due to the assumption that $l''(\alpha_t)$ is negative semi-definite for all $\alpha_t$. Indeed a recursion of the form

$m_t^{(i+1)} = \mu_t + \Sigma_t l'\{m_t^{(i)}\},$

starting at

$m_t^{(0)} = \mu_t,$

is guaranteed to converge to the mode. This is in fact a quasi Newton-Raphson method but using $-\Sigma_t^{-1}$ rather than the true matrix of second derivatives. The one-step algorithm of taking $m_t = \mu_t + \Sigma_t l'(\mu_t)$ is likely to be very successful in most situations as $\Sigma_t$ is usually quite small. Hence it is unlikely that there will be much gain from iterating the procedure until convergence, which after all does take some computer time. This is certainly true in the following examples in which the jackknife density dominates the posterior.

Although this result requires $l''(\cdot)$ to be negative semi-definite, which may appear constraining, in fact many interesting measurement models (such as the SV, Poisson and binomial) have this property.
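To fix ideas, the following minimal sketch (in Python, with illustrative names such as `draw_state`, `loglik` and `dloglik` that are not part of the thesis) implements the accept-reject step of Theorem 2.1 for a univariate state, together with the quasi Newton-Raphson iteration towards the mode.

```python
import numpy as np

def draw_state(mu_t, sig2_t, loglik, dloglik, rng, n_newton=1):
    """Accept-reject draw of alpha_t | alpha_{t-1}, alpha_{t+1}, y_t for a
    log-concave measurement density (Theorem 2.1, univariate case)."""
    # Quasi Newton-Raphson steps towards the mode, starting from the jackknife
    # mean; n_newton=1 is the one-step scheme m_t = mu_t + sig2_t * l'(mu_t).
    m_t = mu_t
    for _ in range(n_newton):
        m_t = mu_t + sig2_t * dloglik(m_t)
    while True:
        # Proposal (2.2): first-order expansion of l around m_t combined with the jackknife prior.
        alpha = rng.normal(mu_t + sig2_t * dloglik(m_t), np.sqrt(sig2_t))
        # Acceptance probability (2.3), valid for any expansion point m_t.
        log_acc = loglik(alpha) - loglik(m_t) - dloglik(m_t) * (alpha - m_t)
        if np.log(rng.uniform()) < log_acc:
            return alpha
```

Because the measurement density is log-concave, the log acceptance probability is never positive, so no explicit truncation at one is needed.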
2.3 Stochastic volatility
The first model considered will be the SV model. This model and its properties are explored
in Section 1.4.1. We have,
$y_t = \varepsilon_t \beta \exp(\alpha_t/2), \qquad \alpha_{t+1} = \phi\alpha_t + \eta_t, \qquad (2.4)$

where $\varepsilon_t$ and $\eta_t$ are independent Gaussian processes with variances of 1 and $\sigma_\eta^2$ respectively. In the SV case the prior for the initial state is $\alpha_1 \sim N\{0, \sigma_\eta^2/(1-\phi^2)\}$, so that its mean and variance are the same as the unconditional mean and variance of the states. It is assumed for the moment that $y_t$ and $\alpha_t$ are univariate, although the methods introduced in this chapter easily
extend to multivariate cases. As mentioned in chapter 1, single move MCMC methods have
been used on this model by, for instance, Jacquier, Polson and Rossi (1994) and more recently
the general method of univariate sampling of Gilks and Wild (1992) has been suggested by
Kim and Shephard (1994). The method of Gilks and Wild (1992) can run quite slowly and is
only applicable to univariate densities. The single move method of Jacquier et al. (1994) seems
generally unreliable and expensive to run.
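Before turning to the sampler it may help to see the model as a data generating process; the following sketch simulates from (2.4). The function name and the parameter values are purely illustrative.

```python
import numpy as np

def simulate_sv(n, phi, sigma_eta, beta, rng):
    """Simulate y_t = eps_t * beta * exp(alpha_t / 2) with AR(1) log-volatility."""
    alpha = np.empty(n)
    # Initial state drawn from its stationary distribution N(0, sigma_eta^2 / (1 - phi^2)).
    alpha[0] = rng.normal(0.0, sigma_eta / np.sqrt(1.0 - phi ** 2))
    for t in range(n - 1):
        alpha[t + 1] = phi * alpha[t] + rng.normal(0.0, sigma_eta)
    y = rng.normal(size=n) * beta * np.exp(alpha / 2.0)
    return y, alpha

y, alpha = simulate_sv(946, phi=0.98, sigma_eta=0.15, beta=0.65,
                       rng=np.random.default_rng(1))
```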
2.3.1 Gibbs sampler
The results of Theorem 2.1 apply immediately as, ignoring constants,

$l(\alpha_t) = \log f(y_t \mid \alpha_t) = -\frac{\alpha_t}{2} - \frac{y_t^2}{2\beta^2}\exp(-\alpha_t),$

implying

$\frac{\partial^2 \log f(y_t \mid \alpha_t)}{\partial \alpha_t^2} = -\frac{y_t^2}{2\beta^2}\exp(-\alpha_t) < 0.$
We can write the jackknife density as $\alpha_t \mid \alpha_{\setminus t} \sim N(\mu_t, \sigma_t^2)$, where $\mu_t = \phi(\alpha_{t+1} + \alpha_{t-1})/(1+\phi^2)$ and $\sigma_t^2 = \sigma_\eta^2/(1+\phi^2)$. Then we can draw from $\alpha_t \mid y_t, \alpha_{\setminus t}$ by sampling from

$N\left[\mu_t + \frac{\sigma_t^2}{2}\left\{\frac{y_t^2}{\beta^2}\exp(-\mu_t) - 1\right\}, \; \sigma_t^2\right]$
and rejecting with probability (2.3). This is an expansion around $\mu_t$. For typical problems this algorithm accepts with probability of approximately $1 - \frac{y_t^2}{4\beta^2}\exp(-\mu_t)\{\sigma_t^2 + (\xi_t - \mu_t)^2\}$, where $\xi_t$ denotes the mean of the proposal density; a derivation of this result is given in the Appendix. The acceptance probability is typically over 99% for most financial datasets and seems very robust. On a P5/133 computer with $n = 500$, this sampler carries out approximately 100 complete sweeps in about one second. Notice that this approach allows the first state, $\alpha_1$, and last state, $\alpha_n$, to be easily drawn from their conditional densities. The only thing that changes is the Gaussian proposal density based on the jackknife of these states. For instance, if the target density is $f(\alpha_1 \mid y_1, \alpha_2)$ then we have $\mu_1 = \phi\alpha_2$ and $\sigma_1^2 = \sigma_\eta^2$ for the jackknife density.
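Putting the pieces together, one sweep of the single-move sampler over the SV states might be sketched as follows; the function and variable names are illustrative, and the end states use the jackknife moments discussed above.

```python
import numpy as np

def sv_single_move(alpha, y, phi, sigma_eta2, beta2, rng):
    """One sweep of the single-move accept-reject Gibbs sampler for the SV states."""
    n = len(alpha)
    for t in range(n):
        # Jackknife mean and variance of alpha_t | alpha_{\t}; the first and last
        # states only condition on a single neighbour.
        if t == 0:
            mu_t, sig2_t = phi * alpha[1], sigma_eta2
        elif t == n - 1:
            mu_t, sig2_t = phi * alpha[n - 2], sigma_eta2
        else:
            mu_t = phi * (alpha[t - 1] + alpha[t + 1]) / (1.0 + phi ** 2)
            sig2_t = sigma_eta2 / (1.0 + phi ** 2)
        c = y[t] ** 2 / beta2                               # y_t^2 / beta^2
        lp = lambda a: -0.5 * a - 0.5 * c * np.exp(-a)      # l(alpha_t) up to a constant
        grad_mu = -0.5 + 0.5 * c * np.exp(-mu_t)            # l'(mu_t)
        while True:
            prop = rng.normal(mu_t + sig2_t * grad_mu, np.sqrt(sig2_t))
            log_acc = lp(prop) - lp(mu_t) - grad_mu * (prop - mu_t)
            if np.log(rng.uniform()) < log_acc:
                alpha[t] = prop
                break
    return alpha
```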
2.3.2 Pseudo-dominating Metropolis sampler
I now use a second order expansion to produce a pseudo-dominating Metropolis sampler, see Tierney (1994) and Section 1.2.2.2. A similar expansion occurs in Green and Han (1990) as referred to in chapter 6. In chapter 3 the pseudo-dominating Metropolis samplers will be used in high dimensional multi-move samplers. Here this method will be developed in the single-move case to introduce some of the ideas. By analogy with Theorem 2.1, it is possible to carry out a quadratic Taylor expansion of the log measurement density at any point and still produce a Gaussian proposal density. A higher acceptance rate can be obtained by iterating the second order expansion, obtaining the mode; the result would be a Laplace approximation suggestion density.

[Footnote: In different contexts, first order Taylor expansions of the likelihood are used in the work of Chib and Greenberg (1994) and Chib and Greenberg (1995) to generate Metropolis suggestions.]
Using the second order expansion we have

$\log f(\alpha_t \mid y_t, \alpha_{\setminus t}) = \log f(\alpha_t \mid \alpha_{t-1}, \alpha_{t+1}) + \log f(y_t \mid \alpha_t)$
$= -\frac{1}{2\sigma_t^2}(\alpha_t - \mu_t)^2 - \frac{\alpha_t}{2} - \frac{y_t^2}{2\beta^2}\exp(-\alpha_t)$
$\simeq -\frac{1}{2\sigma_t^2}(\alpha_t - \mu_t)^2 - \frac{\alpha_t}{2} - \frac{y_t^2}{2\beta^2}\exp(-\mu_t)\left\{1 - (\alpha_t - \mu_t) + \frac{1}{2}(\alpha_t - \mu_t)^2\right\}$
$= \log g. \qquad (2.5)$
The quadratic term in $\log g$ means that it does not bound $\log f$. This delivers a pseudo-dominating suggestion, with suggested draws $z$ for $\alpha_t \mid y_t, \alpha_{\setminus t}$ being made from

$N\left[\sigma_t^{*2}\left\{\frac{\mu_t}{\sigma_t^2} + \frac{y_t^2}{2\beta^2}\exp(-\mu_t)(1 + \mu_t) - \frac{1}{2}\right\}, \; \sigma_t^{*2}\right], \quad \text{where} \quad \sigma_t^{*-2} = \sigma_t^{-2} + \frac{y_t^2}{2\beta^2}\exp(-\mu_t).$

Notice that if $y_t = 0$, then $\sigma_t^{*2} = \sigma_t^2$ and $f$ is truly normally distributed and equals $g$. The precision $\sigma_t^{*-2}$ increases with $y_t^2$. In the accept-reject part of the algorithm, these suggestions
are accepted with probability $\min(f/g, 1)$, while the Metropolis probability of accepting $z$ is

$\min\left[\frac{f(z \mid y_t, \alpha_{\setminus t})\,\min\{f(\alpha_t^{(k)} \mid y_t, \alpha_{\setminus t}),\, g(\alpha_t^{(k)})\}}{f(\alpha_t^{(k)} \mid y_t, \alpha_{\setminus t})\,\min\{f(z \mid y_t, \alpha_{\setminus t}),\, g(z)\}}, \; 1\right],$

and now has to be calculated and employed when $f(z)/g(z) > 1$ or $f(\alpha_t^{(k)})/g(\alpha_t^{(k)}) > 1$. If we denote $l(\alpha_t) \equiv \log f(y_t \mid \alpha_t)$ and its second order expansion by $\tilde{l}(\alpha_t)$, then at the accept-reject stage the probability of acceptance simplifies to $\min\{\omega(z), 1\}$, where $\omega(z) = \exp\{l(z) - \tilde{l}(z)\}$. Similarly the Metropolis probability simplifies to

$\min\left[\frac{\max\{1, \omega(z)\}}{\max\{1, \omega(\alpha_t^{(k)})\}}, \; 1\right],$

only involving the log measurement density and its approximation.
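A sketch of the resulting pseudo-dominating update for a single SV state follows; the weight $\omega(\cdot)$ compares the exact log measurement density with its quadratic expansion, and the two-stage acceptance mirrors the formulas above. All names are illustrative.

```python
import numpy as np

def pseudo_dominating_update(alpha_cur, mu_t, sig2_t, y_t, beta2, rng):
    """One pseudo-dominating Metropolis update of a single SV state."""
    c = y_t ** 2 / (2.0 * beta2) * np.exp(-mu_t)
    l = lambda a: -0.5 * a - 0.5 * (y_t ** 2 / beta2) * np.exp(-a)     # exact l(a)
    l_tilde = lambda a: -0.5 * a - c * (1.0 - (a - mu_t) + 0.5 * (a - mu_t) ** 2)
    omega = lambda a: np.exp(l(a) - l_tilde(a))
    # Gaussian proposal from combining the quadratic expansion with the jackknife prior.
    sig2_star = 1.0 / (1.0 / sig2_t + c)
    mean_star = sig2_star * (mu_t / sig2_t + c * (1.0 + mu_t) - 0.5)
    while True:
        z = rng.normal(mean_star, np.sqrt(sig2_star))
        # Accept-reject stage against the pseudo-envelope g: accept with min{omega(z), 1}.
        if rng.uniform() < min(omega(z), 1.0):
            break
    # Metropolis correction for the regions where g does not dominate f.
    mh_prob = min(max(1.0, omega(z)) / max(1.0, omega(alpha_cur)), 1.0)
    return z if rng.uniform() < mh_prob else alpha_cur
```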
A slightly less direct way of thinking about this analysis is that we are using a Gaussian approximation to the log-likelihood, $\log f(y_t \mid \alpha_t)$, which is then combined with the conjugate Gaussian prior to deliver a Gaussian posterior. Notice that as $y_t$ goes to zero this likelihood becomes uninformative, although the posterior is perfectly well behaved. This way of thinking about this problem is easily extended to updating more than a single state at a time.
2.3.3 Samplers for the parameters
In this section a Bayesian analysis of the parameters in the model will be pursued. As this setup is used various times in this thesis, it is useful to spell out the assumptions in some detail here. Given the states, sampling from the parameters of the model is straightforward for $\beta$ and $\sigma_\eta^2$. First, assuming a flat prior for $\log\beta$, we achieve the posterior

$\beta^2 \mid y, \alpha \sim IG\left\{\frac{n}{2}, \; \frac{1}{2}\sum_{t=1}^n y_t^2\exp(-\alpha_t)\right\},$

while assuming a prior of $S_0\chi_p^{-2}$ for $\sigma_\eta^2$ we have

$\sigma_\eta^2 \mid y, \alpha, \phi \sim IG\left[\frac{n+p}{2}, \; \frac{1}{2}\left\{\sum_{t=2}^n(\alpha_t - \phi\alpha_{t-1})^2 + \alpha_1^2(1-\phi^2) + S_0\right\}\right],$

where $IG$ denotes the inverse-gamma distribution. In the work presented it is assumed that for daily data $p = 10$ and $S_0 = p \times 0.01$, while for weekly data $S_0$ is taken to be $p \times 0.05$.
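A sketch of these two conditional draws, assuming the inverse-gamma forms given above and drawing them as reciprocals of gamma variates, is shown below; the function names and the default hyperparameters (the daily-data choice $p = 10$, $S_0 = 0.1$) are illustrative.

```python
import numpy as np

def draw_beta2(alpha, y, rng):
    """beta^2 | y, alpha under a flat prior on log(beta): inverse-gamma draw."""
    a = 0.5 * len(y)
    b = 0.5 * np.sum(y ** 2 * np.exp(-alpha))
    return 1.0 / rng.gamma(a, 1.0 / b)            # 1/Gamma(a, rate=b) ~ IG(a, b)

def draw_sigma_eta2(alpha, phi, rng, p=10, S0=0.1):
    """sigma_eta^2 | y, alpha, phi under an S0 * chi^{-2}_p prior."""
    n = len(alpha)
    resid = alpha[1:] - phi * alpha[:-1]
    b = 0.5 * (np.sum(resid ** 2) + alpha[0] ** 2 * (1.0 - phi ** 2) + S0)
    return 1.0 / rng.gamma(0.5 * (n + p), 1.0 / b)
```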
The prior on the persistence parameter $\phi$ will be designed to ensure that the log volatility process is stationary, having support only for $\phi$ between $-1$ and $1$. To carry this out I employ a beta prior on $(\phi+1)/2$, with parameters $\phi^{(1)}$ and $\phi^{(2)}$, so that $E(\phi) = \{2\phi^{(1)}/(\phi^{(1)}+\phi^{(2)})\} - 1$. In the analysis given below, $\phi^{(1)} = 19.5$ and $\phi^{(2)} = 1$, so the prior mean is $0.902$. It could be argued that the prior should be closer to a unit root. As a result of the prior choice, the posterior is non-conjugate and samples are drawn from $\phi \mid \sigma_\eta^2, \alpha$ using an accept-reject algorithm (using a variant of Theorem 2.1). Indeed, we have the following likelihood for $\phi$,

$\log p(y, \alpha \mid \phi, \sigma_\eta, \beta) = \mathrm{const} - \frac{1}{2\sigma_\phi^2}(\phi - \hat\phi)^2 + \frac{1}{2}\log(1 - \phi^2),$
where the last term arises from the variance of $\alpha_1$ and

$\hat\phi = \sum_{t=2}^n \alpha_t\alpha_{t-1} \Big/ \sum_{t=2}^{n-1}\alpha_t^2, \qquad \sigma_\phi^2 = \sigma_\eta^2 \Big/ \sum_{t=2}^{n-1}\alpha_t^2.$
The prior is

$\log f(\phi) = (\phi^{(1)} - 1)\log(1+\phi) + (\phi^{(2)} - 1)\log(1-\phi),$

so we have

$\log p(\phi \mid \alpha, \sigma_\eta) = c - \frac{1}{2\sigma_\phi^2}(\phi - \hat\phi)^2 + p(\phi),$

where

$p(\phi) = (\phi^{(1)} - \tfrac{1}{2})\log(1+\phi) + (\phi^{(2)} - \tfrac{1}{2})\log(1-\phi), \qquad (2.6)$
which is of beta form. The non-quadratic term $p(\phi)$ is concave and the likelihood, which dominates, is Gaussian. We can therefore take a first order expansion of (2.6) around the likelihood mean $\hat\phi$ and perform accept-reject sampling. I obtain the following proposal and acceptance probability,

$\phi \sim N\{\hat\phi + \sigma_\phi^2 p'(\hat\phi), \; \sigma_\phi^2\}, \qquad \Pr(A) = \exp\{p(\phi) - p(\hat\phi) - p'(\hat\phi)(\phi - \hat\phi)\},$

where in actual fact I draw from the truncated normal density (truncated to be less than 1). In this case it may be sensible to iterate to the mode using the quasi Newton-Raphson scheme starting with $m_\phi = \hat\phi$ and then setting $m_\phi = \hat\phi + \sigma_\phi^2 p'(m_\phi)$ until convergence, then using $m_\phi$ as the expansion point for the accept-reject procedure.
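The $\phi$ update may be sketched as follows, with the beta hyperparameters defaulting to the values $\phi^{(1)} = 19.5$, $\phi^{(2)} = 1$ used in the text; the names are illustrative and the proposal is truncated to the stationary region.

```python
import numpy as np

def draw_phi(alpha, sigma_eta2, rng, phi1=19.5, phi2=1.0):
    """phi | alpha, sigma_eta^2 by accept-reject with a truncated normal proposal."""
    phat = np.sum(alpha[1:] * alpha[:-1]) / np.sum(alpha[1:-1] ** 2)
    s2 = sigma_eta2 / np.sum(alpha[1:-1] ** 2)
    p  = lambda f: (phi1 - 0.5) * np.log1p(f) + (phi2 - 0.5) * np.log1p(-f)   # (2.6)
    dp = lambda f: (phi1 - 0.5) / (1.0 + f) - (phi2 - 0.5) / (1.0 - f)
    mean = phat + s2 * dp(phat)               # first-order expansion around phat
    while True:
        phi = rng.normal(mean, np.sqrt(s2))
        if not (-1.0 < phi < 1.0):
            continue                           # truncation to (-1, 1)
        log_acc = p(phi) - p(phat) - dp(phat) * (phi - phat)
        if np.log(rng.uniform()) < log_acc:
            return phi
```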
[Figure 2.1 about here. Panels: sampling phi | y, sampling beta | y, sampling sigma_eta | y; trace plots, histograms and correlograms.]

Figure 2.1 Daily returns for the Pound against the Dollar. Top graphs: the simulation against iteration number. Middle graphs: histograms of the resulting marginal distribution. Bottom graphs: the corresponding correlogram for the iterations.
2.3.4 Empirical effectiveness
The SV model was applied to the daily percentage returns on the Pound Sterling/US Dollar exchange rate from 1/10/81 to 28/6/85 (946 observations). The SV Gibbs sampler was initialised by setting all the log-volatilities to zero and $\phi = 0.95$, $\sigma_\eta^2 = 0.02$ and $\beta = 1$. The Gibbs sampler was then iterated on the states for 1,000 iterations and then the parameters and states for 50,000 more iterations, before recording any answers. The next 1,000,000 iterations are graphed in Figure 2.1 and summarised in Table 2.1.
The correlogram shows significant autocorrelations for $\phi$ at 10,000 lags, for $\beta$ at 25,000 lags and for $\sigma_\eta$ at 5,000 lags. This is not an unfamiliar pattern for Gibbs samplers in high dimensional problems (see Ripley and Kirkland (1990) for a good example in spatial statistics).
The inefficiency factors (how many times the single move sampler would need to be run to produce the same precision as a hypothetical independent Monte Carlo sampler) are computed using the Parzen window of Section 1.2.2.4, over 100,000 lags. The inefficiency factors are estimated as 476 for $\sigma_\eta$, 920 for $\phi$ and 12,110 for $\beta$.
                 Mean     Monte Carlo S.E.   Covariance and Correlation of Posterior
phi | y          0.9821   0.000277            8.343e-5     -0.629       0.217
sigma_eta | y    0.1382   0.000562           -0.0001479     0.0006631  -0.156
beta | y         0.6594   0.0121              0.0002089    -0.0004228   0.01209

Computer time: 233,303

Table 2.1 Daily returns for the Pound against the Dollar. Summaries of Figure 2.1. The S.E. of simulation is computed using 100,000 lags. Entries above the diagonal of the covariance matrix are correlations rather than covariances. Computer time is seconds on a P5/133.
2.4 Duration time series
The second example will be a model of consecutive durations. Observation-driven autoregressive models of this type have been suggested in work by Wold (1948), Cox (1955) and, more
recently, Engle and Russell (1994). Parameter-driven models have been analysed by Gamerman (1992).
Here a very simple exponential distribution model is used to summarise a day of durations
in seconds between price announcements (bids and asks) in the Japanese Yen against the US
Dollar exchange rate recorded in the Olsen and Associates tape for 5th April 1993. Summary
information for this series is given in Figure 2.2.

[Figure 2.2 about here. Panels: histogram of the durations between price announcements; correlogram of price durations.]

Figure 2.2 Time between price announcements in the Yen/Dollar market on 5 April 1993. Histogram of the unconditional distribution and correlogram of the consecutive duration times.

If $y_t$ denotes the time between price announcements, then the basic model will be the same as the SV model (1.4.1) but with the normality assumption on $\varepsilon_t$ replaced by an exponential density with mean one.

An obvious model for this type of data would be a doubly stochastic Poisson point process (a Cox process) with a log-link intensity model which evolves according to an Ornstein-Uhlenbeck process. Here a discrete time approximation to this model is used, with exponentially distributed time intervals
$f(y_t \mid \alpha_t) = \lambda_t\exp(-\lambda_t y_t), \qquad \lambda_t = \beta\exp(-\alpha_t), \qquad \alpha_{t+1} = \phi\alpha_t + \eta_t, \qquad \eta_t \sim \mathrm{NID}(0, \sigma_\eta^2).$
This type of discrete time approximation to the continuous time model was used previously by
Gamerman (1992) and Carlin, Gelfand and Smith (1992a).
The time series behaviour of this model can be summarised by

$E(y_t) = E(\lambda_t^{-1}) = \beta^{-1}\exp(\sigma_\alpha^2/2), \qquad \mathrm{Var}(y_t) = \beta^{-2}\exp(\sigma_\alpha^2)\{2\exp(\sigma_\alpha^2) - 1\},$

where $\sigma_\alpha^2 = \sigma_\eta^2/(1-\phi^2)$ is the stationary variance of $\alpha_t$, and

$\mathrm{Cov}(y_t, y_{t+s}) = E(\lambda_t^{-1}\lambda_{t+s}^{-1}) - E(\lambda_t^{-1})E(\lambda_{t+s}^{-1}) = \mathrm{Cov}(\lambda_t^{-1}, \lambda_{t+s}^{-1}) = \beta^{-2}\exp(\sigma_\alpha^2)\{\exp(\sigma_\alpha^2\phi^s) - 1\}.$

This implies

$\mathrm{Corr}(y_t, y_{t+s}) = \frac{\exp(\sigma_\alpha^2\phi^s) - 1}{2\exp(\sigma_\alpha^2) - 1},$

which has a similar structure to an ARMA(1,1) model.
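Assuming the moment expressions as reconstructed above, the implied autocorrelation function is easy to evaluate numerically; a small illustrative check (the parameter values are arbitrary):

```python
import numpy as np

def duration_acf(s, phi, sigma_eta2):
    """Corr(y_t, y_{t+s}) implied by the exponential-duration model above."""
    sig2 = sigma_eta2 / (1.0 - phi ** 2)      # stationary variance of alpha_t
    return (np.exp(sig2 * phi ** s) - 1.0) / (2.0 * np.exp(sig2) - 1.0)

print([round(duration_acf(s, phi=0.97, sigma_eta2=0.15 ** 2), 3) for s in (1, 5, 20)])
```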
This model can be analysed using the same Gibbs sampling approach used for the SV model, but now the proposal density becomes, with expansion around some point $m_t$,

$N(\xi_t, \sigma_t^2), \qquad \xi_t = \mu_t + \sigma_t^2\{\beta y_t\exp(-m_t) - 1\},$

with acceptance probability

$\exp\left[-\beta y_t\{\exp(-\alpha_t) - \exp(-m_t)(1 - (\alpha_t - m_t))\}\right],$

which gives an average acceptance rate of approximately $1 - \frac{\beta y_t}{2}\exp(-m_t)\{\sigma_t^2 + (\xi_t - m_t)^2\}$; see the Appendix. The updating of the parameters is unchanged from the SV model case except for $\beta \mid y, \alpha \sim G\{n, \sum_t y_t\exp(-\alpha_t)\}$ with a prior proportional to $\beta^{-1}$ on $\beta$.
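A sketch of the corresponding single-move state update and the Gamma draw for $\beta$ is given below, assuming the intensity $\lambda_t = \beta\exp(-\alpha_t)$ as written above; the jackknife moments $\mu_t$, $\sigma_t^2$ are formed exactly as in the SV case and all names are illustrative.

```python
import numpy as np

def duration_state_update(t, alpha, y, mu_t, sig2_t, beta, rng):
    """Accept-reject draw of alpha_t for the exponential duration model."""
    l  = lambda a: -a - beta * y[t] * np.exp(-a)      # log f(y_t | alpha_t) up to a constant
    dl = lambda a: -1.0 + beta * y[t] * np.exp(-a)
    m_t = mu_t                                         # one-step expansion point
    while True:
        prop = rng.normal(mu_t + sig2_t * dl(m_t), np.sqrt(sig2_t))
        log_acc = l(prop) - l(m_t) - dl(m_t) * (prop - m_t)
        if np.log(rng.uniform()) < log_acc:
            alpha[t] = prop
            return

def draw_beta(alpha, y, rng):
    """beta | y, alpha ~ Gamma{n, sum(y_t exp(-alpha_t))} under a 1/beta prior."""
    return rng.gamma(len(y), 1.0 / np.sum(y * np.exp(-alpha)))
```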
The results of the analysis are given in Figure 2.3 for this data. The sampler works in a satisfactory way, converging fairly quickly and leading to low integrated autocorrelation times for the parameters $\phi$ and $\sigma_\eta$. The sampler is rather less efficient for the parameter $\beta$. Reparameterisation, of the kind used in chapter 4, should eliminate the problem for $\beta$. The efficiency in the other parameters is due to the lower level of persistence in the log-mean durations than the corresponding log-volatilities for the SV model.
[Figure 2.3 about here. Panels: sampling phi | y, sampling beta | y, sampling sigma_eta | y; trace plots, histograms and correlograms.]

Figure 2.3 Single-move Gibbs sampler for the duration time series model. Top graphs: the simulation against iteration number. Middle graphs: histograms of the resulting marginal distributions. Bottom graphs: the corresponding correlograms for the iterations (the lag numbers should be multiplied by 100).

2.5 Conclusion
In this chapter I have shown that reasonably simple and general single-move Gibbs samplers or pseudo-dominating Metropolis algorithms can be developed for this wide class of non-
Gaussian state space models. Samplers based upon Taylor expansions accept with very high
probability. Unfortunately, the single-move methods developed appear to suffer from slow
convergence and inefficient output, as measured by integrated autocorrelation time, in equilibrium.
We need more reliable methods if Bayesian methods are going to be widely used for such
models. In the next chapter I shall develop one such approach.
2.6 Appendix
2.6.1 Acceptance frequencies
2.6.1.1 Expansion around mode
It is straightforward to make good approximations to the acceptance rates for the accept-reject algorithms used in the single-move Gibbs samplers. Suppose we have iterated to the mode $m_t$ of the conditional density $f(\alpha_t \mid \alpha_{t-1}, \alpha_{t+1}, y_t)$ and we use this as the expansion point, so by definition $m_t = \mu_t + \sigma_t^2 l'(m_t)$, where $\mu_t$ is the jackknife mean and $l(\alpha_t) \equiv \log f(y_t \mid \alpha_t)$. I will outline the univariate case for expositional reasons. The acceptance probability for proposal $\alpha_t$ is given by

$\mathrm{acc}_t = E[\exp\{l(\alpha_t) - l(m_t) - l'(m_t)(\alpha_t - m_t)\}],$

where

$\alpha_t \mid y_t, \alpha_{t-1}, \alpha_{t+1} \sim N(m_t, \sigma_t^2).$

Hence

$\mathrm{acc}_t \geq 1 + E\{l(\alpha_t) - l(m_t) - l'(m_t)(\alpha_t - m_t)\} \qquad (2.7)$
$= 1 + E\{l(\alpha_t)\} - l(m_t)$
$\simeq 1 + l''(m_t)\frac{\sigma_t^2}{2}. \qquad (2.8)$

Now, for the SV model and the duration model the bound in (2.7) can easily be determined.
2.6.1.2 Expansion around arbitrary point
If expansion is performed around $m_t$ where $m_t$ is not the mode we have

$\mathrm{acc}_t = E[\exp\{l(\alpha_t) - l(m_t) - l'(m_t)(\alpha_t - m_t)\}],$

where

$\alpha_t \mid y_t, \alpha_{t-1}, \alpha_{t+1} \sim N\{\mu_t + \sigma_t^2 l'(m_t), \; \sigma_t^2\}.$

Let $\xi_t = \mu_t + \sigma_t^2 l'(m_t)$. Hence

$\mathrm{acc}_t \geq 1 + E\{l(\alpha_t) - l(m_t) - l'(m_t)(\alpha_t - m_t)\}$
$\simeq 1 + E\left\{\frac{l''(m_t)}{2}(\alpha_t - m_t)^2\right\}$
$= 1 + \frac{l''(m_t)}{2}\left\{\sigma_t^2 + (\xi_t - m_t)^2\right\}.$

This result provides insight into why it is best to choose the mode, where $\xi_t = m_t$. This is particularly important when $l''(m_t)$ is highly negative, in which case the measurement dominates the jackknife prior. This, however, is rarely the case for single move suggestions.
2.6.1.3 SV model
The modal acceptance rate, taking the expectation over the suggestions, is

$\mathrm{acc}_t \geq 1 - \frac{y_t^2}{2\beta^2}\exp(-m_t)\{\exp(\sigma_t^2/2) - 1\}$
$\simeq 1 - \frac{y_t^2}{4\beta^2}\exp(-m_t)\sigma_t^2.$

A typical situation is where $\sigma_t^2 = 0.01$. Notice that it is unlikely that $y_t^2\exp(-m_t)/\beta^2$ will be very large, as $m_t$ is a smoothed draw of the log-volatility of $y_t$ given the data and so reflects the variation in $y_t$. A very extreme case $y_t^2\exp(-m_t)/\beta^2 = 100$ implies $\mathrm{acc}_t \simeq 0.75$. More reasonably, when $y_t^2\exp(-m_t)/\beta^2 = 2$, $\mathrm{acc}_t \simeq 0.995$, while when the value is one, $\mathrm{acc}_t \simeq 0.9975$. In my experience this acceptance rate seems usual for real financial datasets. If $m_t$ is not the mode (the prior mean, $\mu_t$, say) then

$\mathrm{acc}_t \simeq 1 - \frac{y_t^2}{4\beta^2}\exp(-m_t)\{\sigma_t^2 + (\xi_t - m_t)^2\}.$
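The accuracy of this approximation is easy to check by simulation; the following hedged sketch compares the exact expected acceptance rate with the approximation for $y_t^2\exp(-m_t)/\beta^2 = 2$, which should reproduce the figure of about 0.995 quoted above.

```python
import numpy as np

rng = np.random.default_rng(0)
y2_over_beta2, m_t, sig2_t = 2.0 * np.exp(1.0), 1.0, 0.01    # so y^2 exp(-m)/beta^2 = 2
l  = lambda a: -0.5 * a - 0.5 * y2_over_beta2 * np.exp(-a)
dl = lambda a: -0.5 + 0.5 * y2_over_beta2 * np.exp(-a)
a = rng.normal(m_t, np.sqrt(sig2_t), 200_000)                # proposals centred at the mode m_t
acc = np.exp(l(a) - l(m_t) - dl(m_t) * (a - m_t)).mean()     # exact expected acceptance
approx = 1.0 - 0.25 * y2_over_beta2 * np.exp(-m_t) * sig2_t  # 1 - (y^2/(4 beta^2)) e^{-m} sigma^2
print(acc, approx)                                           # both close to 0.995
```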
2.6.1.4 Exponential duration model
The modal acceptance rate, taking the expectation over the suggestions, is

$\mathrm{acc}_t \geq 1 - \beta E[y_t\{\exp(-\alpha_t) - \exp(-m_t)\}]$
$= 1 - \beta y_t\exp(-m_t)\{\exp(\sigma_t^2/2) - 1\}$
$\simeq 1 - \frac{\beta y_t}{2}\exp(-m_t)\sigma_t^2.$

This has similar acceptance rates to the SV model given above. The non-modal acceptance rate is

$\mathrm{acc}_t \simeq 1 - \frac{\beta y_t}{2}\exp(-m_t)\{\sigma_t^2 + (\xi_t - m_t)^2\}.$
2.6.2 Proof of Theorem 2.1
The basis of this will be that, ignoring constants,

$\log f(\alpha_t \mid \alpha_{t-1}, \alpha_{t+1}, y_t) = l(\alpha_t) - \tfrac{1}{2}(\alpha_t - \mu_t)'\Sigma_t^{-1}(\alpha_t - \mu_t)$
$\leq l(m_t) + l'(m_t)'(\alpha_t - m_t) - \tfrac{1}{2}(\alpha_t - \mu_t)'\Sigma_t^{-1}(\alpha_t - \mu_t)$
$= \log g(\alpha_t; m_t),$

by Taylor's mean value theorem, since $l''(\alpha_t)$ is negative semi-definite for all $\alpha_t$. Bounding means we can use the standard accept-reject results to prove the validity of the sampler. This gives the proof of the first result. Recall I have written the densities $f$ and $g$ only up to constants of proportionality. I will express

$\log g(\alpha_t; m_t) = \log R_t(m_t) - \tfrac{1}{2}(\alpha_t - a_t)'\Sigma_t^{-1}(\alpha_t - a_t) - \tfrac{1}{2}\mu_t'\Sigma_t^{-1}\mu_t,$

where

$a_t = \mu_t + \Sigma_t l'(m_t),$

and the remainder term, which only involves $m_t$, is

$\log R_t(m_t) = l(m_t) - l'(m_t)'m_t + \tfrac{1}{2}a_t'\Sigma_t^{-1}a_t.$

Now the probability of acceptance, $\Pr(A)$, is $F_t/G_t(m_t)$, where

$F_t = \int f(\alpha_t)\,d\alpha_t \qquad \text{and} \qquad G_t(m_t) = \int g(\alpha_t; m_t)\,d\alpha_t,$

since

$\Pr(A) = E\{f(\alpha_t)/g(\alpha_t; m_t)\} = \int \frac{f(x)}{g(x; m_t)}\,\frac{g(x; m_t)}{G_t(m_t)}\,dx = \frac{F_t}{G_t(m_t)}.$

Now

$G_t(m_t) = c\,R_t(m_t),$

where $c$ does not involve $m_t$. Hence to maximise the acceptance probability we can select $m_t$ to minimise $\log R_t(m_t)$. However,

$\frac{\partial \log R_t(m_t)}{\partial m_t} = l'(m_t) - l''(m_t)m_t - l'(m_t) + l''(m_t)a_t = l''(m_t)\{\mu_t + \Sigma_t l'(m_t) - m_t\}.$

Equating to zero, this has a solution $\hat{m}_t$ which satisfies the equation $\hat{m}_t = \mu_t + \Sigma_t l'(\hat{m}_t)$. This is the only point at which the derivative is zero. However, the solution to this equation is also the mode of $f$.

To verify we have a minimum, note that at $m_t = \hat{m}_t$,

$\frac{\partial^2 \log R_t(m_t)}{\partial m_t\,\partial m_t'} = l''(\hat{m}_t)\,\Sigma_t\,l''(\hat{m}_t) - l''(\hat{m}_t),$

which is positive semi-definite. This proves the second part of the theorem.