Chapter 2

SINGLE SAMPLING

2.1 Motivation

In this chapter the focus will be on single move samplers for non-Gaussian state space models. This means each component of the states will be sampled conditional on the other states and the measurements. The two examples considered will centre on time series models, the duration model and the SV model. The general model under consideration is the following non-Gaussian state space model, described in detail in Section 1.4,

$$y_t \sim f(y_t \mid s_t), \quad s_t = c_t + Z_t \alpha_t, \quad \alpha_{t+1} = d_t + T_t \alpha_t + H_t u_t, \quad u_t \sim NID(0, I), \quad \alpha_1 \mid Y_0 \sim N(a_{1|0}, P_{1|0}). \qquad (2.1)$$

2.2 General results for acceptance-rejection

For the moment I shall suppress the notation of the parameters $\theta$, which are conditioned upon. I first show that reasonably generally it is possible to produce a Gibbs sampler for these problems, for we can sample from $\alpha_t \mid \alpha_{t-1}, \alpha_{t+1}, y_t$ straightforwardly. This single move sampling problem has been highlighted in Section 1.4.3. Part of the analysis will be based around the jackknife density $\alpha_t \mid \alpha_{\setminus t}$. Since the states follow the state space transition equation $\alpha_{t+1} = d_t + T_t \alpha_t + H_t u_t$, $u_t \sim NID(0, I)$, we obtain $\alpha_t \mid \alpha_{\setminus t} \sim N(\mu_t, \Sigma_t)$, where $\mu_t$ is a linear combination of $\alpha_{t+1}$ and $\alpha_{t-1}$, and $\alpha_{\setminus t} = (\alpha_1', \ldots, \alpha_{t-1}', \alpha_{t+1}', \ldots, \alpha_n')'$.

The likelihood will be written $\log f(y_t \mid \alpha_t) = l(\alpha_t)$, suppressing the dependence on $y_t$ for compactness of notation. Finally let

$$l'(\alpha_t) = \frac{\partial l(\alpha_t)}{\partial \alpha_t} \quad \text{and} \quad l''(\alpha_t) = \frac{\partial^2 l(\alpha_t)}{\partial \alpha_t \partial \alpha_t'}.$$

Theorem 2.1 Suppose $\Sigma_t$ is non-singular and that $l''(\alpha_t)$ is negative semi-definite for all values of $\alpha_t$. Then the following two results hold. (1) We can sample from $\alpha_t \mid \alpha_{t-1}, \alpha_{t+1}, y_t$ by making suggestions $\alpha_t$ from

$$N\{\mu_t + \Sigma_t l'(m_t), \; \Sigma_t\}, \qquad (2.2)$$

which are accepted with probability

$$\exp\{l(\alpha_t) - l(m_t) - l'(m_t)^T(\alpha_t - m_t)\}, \qquad (2.3)$$

whatever the value of $m_t$. (2) The probability of rejecting the suggestion made in (2.2) is minimised by selecting $m_t$ as the mode of $f(\alpha_t \mid \alpha_{t-1}, \alpha_{t+1}, y_t)$.

The proof of Theorem 2.1 is given in Section 2.6.2. This indicates that for log-concave measurement densities, the mode of the density of $\alpha_t \mid \alpha_{t-1}, \alpha_{t+1}, y_t$ is the best point of expansion in terms of the expected number of rejection steps before acceptance. The log-likelihood is bounded above by a first order expansion around $m_t$. It is instructive to note that the accept-reject method with acceptance probability given by (2.3) is valid whatever value of $m_t$ is chosen.

This result has a number of attractions. Evaluating the rejection rate does not require computing the jackknife density: it involves only the likelihood, the jackknife mean and the jackknife variance. Of course the mode of $f(\alpha_t \mid \alpha_{t-1}, \alpha_{t+1}, y_t)$ is relatively straightforward to find numerically, due to the assumption that $l''(\alpha_t)$ is negative semi-definite for all $\alpha_t$. Indeed a recursion of the form

$$m_t^{(i+1)} = \mu_t + \Sigma_t l'(m_t^{(i)}),$$

starting at $m_t^{(0)} = \mu_t$, is guaranteed to converge to the mode. This is in fact a quasi Newton-Raphson method, using $-\Sigma_t^{-1}$ rather than the true matrix of second derivatives. The one-step algorithm of taking $m_t = \mu_t + \Sigma_t l'(\mu_t)$ is likely to be very successful in most situations, as $\Sigma_t$ is usually quite small. Hence it is unlikely that there will be much gain from iterating the procedure until convergence, which after all does take some computer time. This is certainly true in the following examples, in which the jackknife density dominates the posterior.

Although this result requires $l''(\cdot)$ to be negative semi-definite, which may appear constraining, in fact many interesting measurement models (such as the SV, Poisson and binomial) have this property.
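The scheme of Theorem 2.1 is compact enough to state in a few lines of code. The following is a minimal sketch, not taken from the thesis, for a univariate state: `l` and `l_prime` are user-supplied functions for the log measurement density and its derivative, and `newton_steps` controls how far the quasi Newton-Raphson recursion is iterated towards the mode.

```python
# A minimal sketch of the accept-reject step of Theorem 2.1 for a
# univariate state. l must be concave for the bound (2.3) to be valid.
import numpy as np

rng = np.random.default_rng(42)

def single_move_draw(mu_t, sig2_t, l, l_prime, newton_steps=1):
    """Draw from alpha_t | alpha_{t-1}, alpha_{t+1}, y_t.

    mu_t, sig2_t : jackknife mean and variance of alpha_t given the
                   neighbouring states.
    l, l_prime   : log measurement density l(alpha_t) and its derivative.
    newton_steps : quasi-Newton iterations m <- mu_t + sig2_t * l'(m);
                   one step is usually enough in practice.
    """
    m = mu_t
    for _ in range(newton_steps):
        m = mu_t + sig2_t * l_prime(m)       # recursion towards the mode
    mean = mu_t + sig2_t * l_prime(m)        # proposal mean, as in (2.2)
    while True:
        alpha = mean + np.sqrt(sig2_t) * rng.standard_normal()
        # log acceptance probability (2.3); valid for any expansion point m
        log_acc = l(alpha) - l(m) - l_prime(m) * (alpha - m)
        if np.log(rng.uniform()) < log_acc:
            return alpha
```

With `newton_steps=0` the expansion point is the jackknife mean $\mu_t$, while with one step it is the one-step point $\mu_t + \sigma_t^2 l'(\mu_t)$ discussed above.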
2.3 Stochastic volatility

The first model considered will be the SV model. This model and its properties are explored in Section 1.4.1. We have

$$y_t = \varepsilon_t \beta \exp(\alpha_t/2), \quad \alpha_{t+1} = \phi \alpha_t + \eta_t, \qquad (2.4)$$

where $\varepsilon_t$ and $\eta_t$ are independent Gaussian processes with variances of 1 and $\sigma_\eta^2$ respectively. In the SV case the prior for the initial state is $\alpha_1 \sim N\{0, \sigma_\eta^2/(1-\phi^2)\}$, so that its mean and variance are the same as the unconditional mean and variance of the states. It is assumed for the moment that $y_t$ and $\alpha_t$ are univariate, although the methods introduced in this chapter easily extend to multivariate cases.

As mentioned in chapter 1, single move MCMC methods have been used on this model by, for instance, Jacquier, Polson and Rossi (1994), and more recently the general method of univariate sampling of Gilks and Wild (1992) has been suggested by Kim and Shephard (1994). The method of Gilks and Wild (1992) can run quite slowly and is only applicable to univariate densities. The single move method of Jacquier et al. (1994) seems generally unreliable and expensive to run.

2.3.1 Gibbs sampler

The results of Theorem 2.1 apply immediately as, ignoring constants,

$$l(\alpha_t) = \log f(y_t \mid \alpha_t) = -\frac{\alpha_t}{2} - \frac{y_t^2}{2\beta^2}\exp(-\alpha_t),$$

implying

$$\frac{\partial^2 \log f(y_t \mid \alpha_t)}{\partial \alpha_t^2} = -\frac{y_t^2}{2\beta^2}\exp(-\alpha_t) < 0.$$

We can write the jackknife density as $\alpha_t \mid \alpha_{\setminus t} \sim N(\mu_t, \sigma_t^2)$, where $\mu_t = \phi(\alpha_{t+1} + \alpha_{t-1})/(1+\phi^2)$ and $\sigma_t^2 = \sigma_\eta^2/(1+\phi^2)$. Then we can draw from $\alpha_t \mid y_t, \alpha_{\setminus t}$ by sampling from

$$N\left[\mu_t + \frac{\sigma_t^2}{2}\left\{\frac{y_t^2}{\beta^2}\exp(-\mu_t) - 1\right\}, \; \sigma_t^2\right]$$

and rejecting with probability (2.3). This is an expansion around $\mu_t$. For typical problems this algorithm accepts with probability of approximately $1 - \frac{y_t^2}{4\beta^2}\exp(-\mu_t)\{\sigma_t^2 + (\hat{\alpha}_t - \mu_t)^2\}$, where $\hat{\alpha}_t$ is the proposal mean; a derivation of this result is given in the Appendix. The acceptance probability is typically over 99% for most financial datasets and seems very robust. On a P5/133 computer with $n = 500$, this sampler carries out approximately 100 complete sweeps in about one second.

Notice that this approach allows the first state, $\alpha_1$, and last state, $\alpha_n$, to be easily drawn from their conditional densities. The only thing that changes is the Gaussian proposal density based on the jackknife of these states. For instance, if the target density is $f(\alpha_1 \mid y_1, \alpha_2)$ then we have $\mu_1 = \phi\alpha_2$ and $\sigma_1^2 = \sigma_\eta^2$ for the jackknife density.
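As an illustration, here is a minimal sketch (hypothetical code, reusing `single_move_draw` from Section 2.2) of one complete single-move sweep for the SV model, including the special treatment of the first and last states just described.

```python
# A minimal sketch of one single-move Gibbs sweep over the SV
# log-volatilities; single_move_draw is defined in the earlier sketch.
import numpy as np

def sv_sweep(alpha, y, phi, sig2_eta, beta):
    """One single-move sweep over the states alpha given data y."""
    n = len(y)
    for t in range(n):
        # jackknife moments of alpha_t given the neighbouring states
        if t == 0:                      # first state: mu_1 = phi * alpha_2
            mu, sig2 = phi * alpha[1], sig2_eta
        elif t == n - 1:                # last state: prior N(phi*alpha_{n-1}, sig2_eta)
            mu, sig2 = phi * alpha[n - 2], sig2_eta
        else:
            mu = phi * (alpha[t - 1] + alpha[t + 1]) / (1 + phi**2)
            sig2 = sig2_eta / (1 + phi**2)
        c = y[t]**2 / (2 * beta**2)
        l = lambda a: -a / 2 - c * np.exp(-a)         # SV log measurement density
        l_prime = lambda a: -0.5 + c * np.exp(-a)
        alpha[t] = single_move_draw(mu, sig2, l, l_prime)
    return alpha
```

The second-order expansion of this measurement density is developed next.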
2.3.2 Pseudo-dominating Metropolis sampler

I now use a second order expansion to produce a pseudo-dominating Metropolis sampler; see Tierney (1994) and Section 1.2.2.2. A similar expansion occurs in Green and Han (1990), as referred to in chapter 6. In chapter 3 the pseudo-dominating Metropolis samplers will be used in high dimensional multi-move samplers. Here this method will be developed in the single-move case to introduce some of the ideas. By analogy with Theorem 2.1, it is possible to carry out a quadratic Taylor expansion of the log measurement density at any point and still produce a Gaussian proposal density. A higher acceptance rate can be obtained by iterating the second order expansion, obtaining the mode; the result would be a Laplace approximation suggestion density.¹

¹In different contexts, first order Taylor expansions of the likelihood are used in the work of Chib and Greenberg (1994) and Chib and Greenberg (1995) to generate Metropolis suggestions.

Using the second order expansion we have

$$\log f(\alpha_t \mid y_t, \alpha_{\setminus t}) = \log f(\alpha_t \mid \alpha_{t-1}, \alpha_{t+1}) + \log f(y_t \mid \alpha_t)$$
$$= -\frac{(\alpha_t - \mu_t)^2}{2\sigma_t^2} - \frac{\alpha_t}{2} - \frac{y_t^2}{2\beta^2}\exp(-\alpha_t)$$
$$\simeq -\frac{(\alpha_t - \mu_t)^2}{2\sigma_t^2} - \frac{\alpha_t}{2} - \frac{y_t^2\exp(-\mu_t)}{2\beta^2}\left\{1 - (\alpha_t - \mu_t) + \frac{1}{2}(\alpha_t - \mu_t)^2\right\} = \log g. \qquad (2.5)$$

The quadratic term in $\log g$ means that it does not bound $\log f$. This delivers a pseudo-dominating suggestion, with suggested draws $z$ for $\alpha_t \mid y_t, \alpha_{\setminus t}$ being made from

$$N\left[\mu_t + \frac{\hat{\sigma}_t^2}{2}\left\{\frac{y_t^2}{\beta^2}\exp(-\mu_t) - 1\right\}, \; \hat{\sigma}_t^2\right], \quad \text{where} \quad \hat{\sigma}_t^{-2} = \sigma_t^{-2} + \frac{y_t^2}{2\beta^2\exp(\mu_t)}.$$

Notice that if $y_t = 0$ then $f$ is truly normally distributed and equals $g$. The precision $\hat{\sigma}_t^{-2}$ increases with $y_t^2$.

In the accept-reject part of the algorithm, these suggestions are accepted with probability $\min(f/g, 1)$, while the Metropolis probability of accepting $z$ is

$$\min\left[\frac{f(z \mid y_t, \alpha_{\setminus t}) \min\{f(\alpha_t^{(k)} \mid y_t, \alpha_{\setminus t}), \; g(\alpha_t^{(k)})\}}{f(\alpha_t^{(k)} \mid y_t, \alpha_{\setminus t}) \min\{f(z \mid y_t, \alpha_{\setminus t}), \; g(z)\}}, \; 1\right],$$

and this has to be calculated and employed when $f(z)/g(z) > 1$ or $f(\alpha_t^{(k)})/g(\alpha_t^{(k)}) > 1$. If we denote $l(\alpha_t) = \log f(y_t \mid \alpha_t)$ and its second order expansion by $\tilde{l}(\alpha_t)$, then at the accept-reject stage the probability of acceptance simplifies to $\min\{\omega(z), 1\}$, where $\omega(z) = \exp\{l(z) - \tilde{l}(z)\}$. Similarly the Metropolis probability simplifies to

$$\min\left[\frac{\max\{1, \omega(z)\}}{\max\{1, \omega(\alpha_t^{(k)})\}}, \; 1\right],$$

only involving the log measurement density and its approximation.

A slightly less direct way of thinking about this analysis is that we are using a Gaussian approximation to the log-likelihood, $\log f(y_t \mid \alpha_t)$, which is then added to the then-conjugate Gaussian prior to deliver a Gaussian posterior. Notice that as $y_t$ goes to zero this likelihood becomes uninformative, although the posterior is perfectly well-behaved. This way of thinking about the problem is easily extended to updating more than a single state at a time.

2.3.3 Samplers for the parameters

In this section a Bayesian analysis of the parameters in the model will be pursued. As this setup is used various times in this thesis, it is useful to spell out the assumptions in some detail here. Given the states, sampling from the parameters of the model is straightforward for $\beta$ and $\sigma_\eta^2$. First, assuming a flat prior for $\log\beta$, we achieve the posterior

$$\beta^2 \mid y, \alpha \sim \left\{\sum_{t=1}^n y_t^2 \exp(-\alpha_t)\right\}\chi_n^{-2},$$

while assuming a prior of $S_0\chi_p^{-2}$ for $\sigma_\eta^2$ we have

$$\sigma_\eta^2 \mid \alpha, \phi \sim \left\{\sum_{t=2}^n (\alpha_t - \phi\alpha_{t-1})^2 + \alpha_1^2(1-\phi^2) + S_0\right\}\chi_{n+p}^{-2}.$$

In the work presented it is assumed that for daily data $p = 10$ and $S_0 = p \times 0.01$, while for weekly data $S_0$ is taken to be $p \times 0.05$. A sketch of these two conjugate draws is given below; the non-conjugate draw for the persistence parameter $\phi$ is developed next.
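The following is a minimal sketch of the two conjugate draws just described, under the notation above; `rng` is the numpy generator used in the earlier sketches, and the default `S0` corresponds to the daily-data choice $p = 10$, $S_0 = p \times 0.01$.

```python
# A minimal sketch (assumed notation as in the text) of the conjugate
# parameter draws given the states.
import numpy as np

def draw_beta2(alpha, y, rng):
    """beta^2 | y, alpha ~ {sum y_t^2 exp(-alpha_t)} chi^{-2}_n,
    under a flat prior on log beta."""
    scale = np.sum(y**2 * np.exp(-alpha))
    return scale / rng.chisquare(len(y))

def draw_sig2_eta(alpha, phi, rng, p=10, S0=0.1):
    """sig2_eta | alpha, phi, under the prior S0 * chi^{-2}_p
    (S0 = p * 0.01 for daily data)."""
    n = len(alpha)
    ssq = (np.sum((alpha[1:] - phi * alpha[:-1])**2)
           + alpha[0]**2 * (1 - phi**2) + S0)
    return ssq / rng.chisquare(n + p)
```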
The prior on the persistence parameter $\phi$ will be designed to ensure that the log-volatility process is stationary, having support only for $\phi$ between $-1$ and $1$. To carry this out I employ a beta prior on $(\phi+1)/2$, with

$$E(\phi) = \frac{2\phi^{(1)}}{\phi^{(1)} + \phi^{(2)}} - 1.$$

In the analysis given below, $\phi^{(1)} = 19.5$ and $\phi^{(2)} = 1$, so the prior mean is $0.902$. It could be argued that the prior should be closer to a unit root. As a result of the prior choice, the posterior is non-conjugate and samples are drawn from $\phi \mid \sigma_\eta^2, \alpha$ using an accept-reject algorithm (using a variant of Theorem 2.1). Indeed, we have the following likelihood for $\phi$:

$$\log p(y \mid \alpha, \phi, \sigma_\eta^2, \beta) = \text{const} - \frac{1}{2\sigma_\phi^2}(\phi - \hat{\phi})^2 + \frac{1}{2}\log(1-\phi^2),$$

where the last term arises from the variance of $\alpha_1$, and

$$\hat{\phi} = \sum_{t=2}^n \alpha_t\alpha_{t-1} \Big/ \sum_{t=2}^{n-1} \alpha_t^2 \quad \text{and} \quad \sigma_\phi^2 = \sigma_\eta^2 \Big/ \sum_{t=2}^{n-1} \alpha_t^2.$$

The prior is $\log f(\phi) = (\phi^{(1)} - 1)\log(1+\phi) + (\phi^{(2)} - 1)\log(1-\phi)$, so we have

$$\log p(\phi \mid \alpha, \sigma_\eta^2) = c - \frac{1}{2\sigma_\phi^2}(\phi - \hat{\phi})^2 + p(\phi), \quad \text{where} \quad p(\phi) = (\phi^{(1)} - \tfrac{1}{2})\log(1+\phi) + (\phi^{(2)} - \tfrac{1}{2})\log(1-\phi), \qquad (2.6)$$

of Beta form. The non-quadratic term $p(\phi)$ is concave and the likelihood, which dominates, is Gaussian. We can therefore take a first order expansion of (2.6) around the likelihood mean $\hat{\phi}$ and perform accept-reject sampling. I obtain the following proposal and acceptance probability:

$$\phi \sim N\{\hat{\phi} + \sigma_\phi^2 p'(\hat{\phi}), \; \sigma_\phi^2\}, \qquad \Pr(A) = \exp\{p(\phi) - p(\hat{\phi}) - p'(\hat{\phi})(\phi - \hat{\phi})\},$$

where in actual fact I draw from the truncated normal density (truncated to be less than 1). In this case it may be sensible to iterate to the mode using the quasi Newton-Raphson scheme, starting with $m_\phi = \hat{\phi}$ and then setting $m_\phi = \hat{\phi} + \sigma_\phi^2 p'(m_\phi)$ until convergence to the mode, then using $m_\phi$ as the expansion point for the accept-reject procedure.

[Figure 2.1 appears here: three panels, "sampling phi | y", "sampling beta | y" and "sampling sigma_eta | y", each with a trace plot, a histogram and a correlogram.]

Figure 2.1 Daily returns for the Pound against the Dollar. Top graphs: the simulation against iteration number. Middle graphs: histograms of the resulting marginal distribution. Bottom graphs: the corresponding correlogram for the iterations.

2.3.4 Empirical effectiveness

The SV model was applied to the daily percentage returns on the Pound Sterling/US Dollar exchange rate from 1/10/81 to 28/6/85 (946 observations). The SV Gibbs sampler was initialised by setting all the log-volatilities to zero and $\phi = 0.95$, $\sigma_\eta^2 = 0.02$ and $\beta = 1$. The Gibbs sampler was then iterated on the states for 1,000 iterations and then on the parameters and states for 50,000 more iterations, before recording any answers. The next 1,000,000 iterations are graphed in Figure 2.1 and summarised in Table 2.1. The correlograms show significant autocorrelations at 10,000, 25,000 and 5,000 lags for the three parameters. This is not an unfamiliar pattern for Gibbs samplers in high dimensional problems (see Ripley and Kirkland (1990) for a good example in spatial statistics).

The inefficiency factors (how many times the single move sampler would need to be run to produce the same precision as a hypothetical independent Monte Carlo sampler) are computed using the Parzen window of Section 1.2.2.4 over 100,000 lags. The inefficiency factors for the three parameters are estimated as 476, 920 and 12,110.

                 Mean     Monte Carlo S.E.   Covariance & Correlation of Posterior
  phi | y        0.9821   0.000277            8.343e-5     -0.629       0.217
  sigma_eta | y  0.1382   0.000562           -0.0001479     0.0006631  -0.156
  beta | y       0.6594   0.0121              0.0002089    -0.0004228   0.01209
  Computer time: 233,303

Table 2.1 Daily returns for the Pound against the Dollar. Summaries of Figure 2.1. The S.E. of simulation is computed using 100,000 lags. Italics (the upper triangle) are correlations rather than covariances. Computer time is seconds on a P5/133.
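For concreteness, the following is a minimal sketch of the inefficiency-factor estimate quoted above: one plus twice the sum of Parzen-weighted sample autocorrelations of the chain. This is a direct, and therefore slow, implementation; the Parzen lag window itself is standard, though the exact windowing conventions of Section 1.2.2.4 may differ in detail.

```python
# A minimal sketch of the inefficiency factor (integrated autocorrelation
# time) of an MCMC chain, estimated with a Parzen lag window.
import numpy as np

def parzen(x):
    """Parzen lag window evaluated at x in [0, 1]."""
    if x <= 0.5:
        return 1 - 6 * x**2 + 6 * x**3
    return 2 * (1 - x)**3

def inefficiency(chain, B):
    """Inefficiency factor of `chain` using B lags (requires B < len(chain))."""
    z = chain - np.mean(chain)
    n = len(z)
    var = np.dot(z, z) / n
    acf = lambda k: np.dot(z[:-k], z[k:]) / (n * var)   # sample autocorrelation
    return 1 + 2 * sum(parzen(k / B) * acf(k) for k in range(1, B))
```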
2.4 Duration time series

The second example will be a model of consecutive durations. Observation-driven autoregressive models of this type have been suggested in work by Wold (1948), Cox (1955) and, more recently, Engle and Russell (1994). Parameter-driven models have been analysed by Gamerman (1992). Here a very simple exponential distribution model is used to summarise a day of durations in seconds between price announcements (bids and asks) in the Japanese Yen against the US Dollar exchange rate recorded in the Olsen and Associates tape for 5th April 1993. Summary information for this series is given in Figure 2.2.

[Figure 2.2 appears here: histogram of the durations between price announcements and correlogram of the price durations.]

Figure 2.2 Time between price announcements in the Yen/Dollar market on 5 April 1993. Histogram of the unconditional distribution and correlogram of the consecutive duration times.

If $y_t$ denotes the time between price announcements, then the basic model will be the same as the SV model (1.4.1) but with the normality assumption on $\varepsilon_t$ replaced by an exponential density with mean one. An obvious model for this type of data would be a doubly stochastic Poisson point process (a Cox process) with a log-link intensity model which evolves according to an Ornstein-Uhlenbeck process. Here a discrete time approximation to this model is used, with exponentially distributed time intervals

$$f(y_t \mid \alpha_t) = \lambda_t\exp(-\lambda_t y_t), \quad \lambda_t = \beta\exp(-\alpha_t), \quad \alpha_{t+1} = \phi\alpha_t + \eta_t, \quad \eta_t \sim NID(0, \sigma_\eta^2).$$

This type of discrete time approximation to the continuous time model was used previously by Gamerman (1992) and Carlin, Gelfand and Smith (1992a). Writing $\sigma_\alpha^2 = \sigma_\eta^2/(1-\phi^2)$ for the stationary variance of the states, the time series behaviour of this model can be summarised by

$$E(y_t) = E(\lambda_t^{-1}) = \beta^{-1}\exp(\sigma_\alpha^2/2), \qquad Var(y_t) = \beta^{-2}\exp(\sigma_\alpha^2)\{2\exp(\sigma_\alpha^2) - 1\}$$

and

$$Cov(y_t, y_{t+s}) = E\{Cov(y_t, y_{t+s} \mid \lambda_t, \lambda_{t+s})\} + Cov(\lambda_t^{-1}, \lambda_{t+s}^{-1}) = \beta^{-2}\exp(\sigma_\alpha^2)\{\exp(\sigma_\alpha^2\phi^s) - 1\}, \quad s \neq 0.$$

This implies

$$Corr(y_t, y_{t+s}) = \frac{\exp(\sigma_\alpha^2\phi^s) - 1}{2\exp(\sigma_\alpha^2) - 1},$$

which has a similar structure to an ARMA(1,1) model.

This model can be analysed using the same Gibbs sampling approach used for the SV model, but now the proposal density becomes, with expansion around some point $m_t$,

$$N(\hat{\alpha}_t, \sigma_t^2), \quad \text{with} \quad \hat{\alpha}_t = \mu_t + \sigma_t^2\{\beta y_t\exp(-m_t) - 1\},$$

with acceptance probability

$$\exp[-\beta y_t\{\exp(-\alpha_t) - \exp(-m_t)(1 - (\alpha_t - m_t))\}],$$

which gives an average acceptance rate of approximately $1 - \frac{\beta y_t}{2}\exp(-m_t)\{\sigma_t^2 + (\hat{\alpha}_t - m_t)^2\}$; see the Appendix. The updating of the parameters is unchanged from the SV model case except for $\beta$, for which

$$\beta \mid y, \alpha \sim G\left\{n, \; \sum_{t=1}^n y_t\exp(-\alpha_t)\right\},$$

with a prior proportional to $1/\beta$. The results of the analysis of this data are given in Figure 2.3. The sampler works in a satisfactory way, converging fairly quickly and leading to low integrated autocorrelation time for the parameters $\phi$ and $\sigma_\eta$. The sampler is rather less efficient for the parameter $\beta$. Reparameterisation, of the kind used in chapter 4, should eliminate the problem for $\beta$. The efficiency in the other parameters is due to the lower level of persistence in the log-mean durations than in the corresponding log-volatilities for the SV model.

[Figure 2.3 appears here: three panels, "sampling phi | y", "sampling beta | y" and "sampling sigma_eta | y", each with a trace plot, a histogram and a correlogram.]

Figure 2.3 Single-move Gibbs sampler for the duration time series model. Top graphs: the simulation against iteration number. Middle graphs: histograms of the resulting marginal distributions. Bottom graphs: the corresponding correlograms for the iterations (the lag numbers should be multiplied by 100).
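A minimal sketch of how the duration model slots into the earlier SV code follows; only the log measurement density and the $\beta$ draw change. Here `single_move_draw` and `rng` are as in the previous sketches, and the placement of $\beta$ in the intensity follows the reconstruction $\lambda_t = \beta\exp(-\alpha_t)$ above.

```python
# A minimal sketch of the duration-model pieces, reusing single_move_draw.
import numpy as np

def duration_l(y_t, beta):
    """Log measurement density and derivative for
    y_t | alpha_t ~ Exponential with rate beta * exp(-alpha_t)."""
    l = lambda a: -a - beta * y_t * np.exp(-a)        # log(beta) constant dropped
    l_prime = lambda a: -1.0 + beta * y_t * np.exp(-a)
    return l, l_prime

def draw_beta(alpha, y, rng):
    """beta | y, alpha ~ Gamma{n, sum y_t exp(-alpha_t)} under a 1/beta prior."""
    return rng.gamma(shape=len(y), scale=1.0 / np.sum(y * np.exp(-alpha)))

# Inside a sweep, the state update is then, for jackknife moments (mu, sig2):
# alpha[t] = single_move_draw(mu, sig2, *duration_l(y[t], beta))
```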
2.5 Conclusion

In this chapter I have shown that reasonably simple and general single-move Gibbs samplers or pseudo-dominating Metropolis algorithms can be developed for this wide class of non-Gaussian state space models. Samplers based upon Taylor expansions accept with very high probability. Unfortunately, the single-move methods developed appear to suffer from slow convergence and inefficient output, as measured by integrated autocorrelation time, in equilibrium. We need more reliable methods if Bayesian methods are going to be widely used for such models. In the next chapter I shall develop one such approach.

2.6 Appendix

2.6.1 Acceptance frequencies

2.6.1.1 Expansion around the mode

It is straightforward to make good approximations to the acceptance rates for the accept-reject algorithms used in the single-move Gibbs samplers. Suppose we have iterated to the mode $m_t$ of the conditional density $f(\alpha_t \mid \alpha_{t-1}, \alpha_{t+1}, y_t)$ and we use this as the expansion point, so that by definition $m_t = \mu_t + \sigma_t^2 l'(m_t)$, where $\mu_t$ is the jackknife mean and $l(\alpha_t) = \log f(y_t \mid \alpha_t)$. I will outline the univariate case for expositional reasons. The acceptance probability for proposal $\alpha_t$ is given by

$$acc_t = E[\exp\{l(\alpha_t) - l(m_t) - l'(m_t)(\alpha_t - m_t)\}], \quad \text{where} \quad \alpha_t \mid y_t, \alpha_{t-1}, \alpha_{t+1} \sim N(m_t, \sigma_t^2).$$

Hence

$$acc_t \geq 1 + E\{l(\alpha_t) - l(m_t) - l'(m_t)(\alpha_t - m_t)\} = 1 + E\{l(\alpha_t)\} - l(m_t) \qquad (2.7)$$
$$\simeq 1 + l''(m_t)\frac{\sigma_t^2}{2}. \qquad (2.8)$$

Now, for the SV model and the duration model the bound (2.7) can easily be determined.

2.6.1.2 Expansion around an arbitrary point

If expansion is performed around a point $m_t$ which is not the mode, we have

$$acc_t = E[\exp\{l(\alpha_t) - l(m_t) - l'(m_t)(\alpha_t - m_t)\}], \quad \text{where} \quad \alpha_t \mid y_t, \alpha_{t-1}, \alpha_{t+1} \sim N\{\mu_t + \sigma_t^2 l'(m_t), \; \sigma_t^2\}.$$

Let $\hat{\alpha}_t = \mu_t + \sigma_t^2 l'(m_t)$. Hence

$$acc_t \geq 1 + E\{l(\alpha_t) - l(m_t) - l'(m_t)(\alpha_t - m_t)\} \simeq 1 + E\left\{\frac{l''(m_t)}{2}(\alpha_t - m_t)^2\right\} = 1 + \frac{l''(m_t)}{2}\{\sigma_t^2 + (\hat{\alpha}_t - m_t)^2\}.$$

This result provides insight into why it is best to choose the mode, where $\hat{\alpha}_t = m_t$. This is particularly important when $l''(m_t)$ is highly negative, in which case the measurement dominates the jackknife prior. This, however, is rarely the case for single move suggestions.

2.6.1.3 SV model

The modal acceptance rate, taking the expectation over the suggestions, is

$$acc_t \geq 1 - \frac{y_t^2}{2\beta^2}\exp(-m_t)\{\exp(\sigma_t^2/2) - 1\} \simeq 1 - \frac{y_t^2}{4\beta^2}\exp(-m_t)\sigma_t^2.$$

A typical situation is one where $\sigma_t^2 = 0.01$. Notice that it is unlikely that $y_t^2\exp(-m_t)/\beta^2$ will be very large, as $m_t$ is a smoothed draw from the log-mean of $y_t^2$ given the data and so reflects the variation in $y_t$. A very extreme case, $y_t^2\exp(-m_t)/\beta^2 = 100$, implies $acc_t \simeq 0.75$. More reasonably, when $y_t^2\exp(-m_t)/\beta^2 = 2$, $acc_t \simeq 0.995$, while when the value is one, $acc_t \simeq 0.9975$. In my experience this acceptance rate seems usual for real financial datasets. If $m_t$ is not the mode (the prior mean $\mu_t$, say) then

$$acc_t \simeq 1 - \frac{y_t^2}{4\beta^2}\exp(-m_t)\{\sigma_t^2 + (\hat{\alpha}_t - m_t)^2\}.$$

2.6.1.4 Exponential duration model

The modal acceptance rate, taking the expectation over the suggestions, is

$$acc_t \geq 1 - \beta E[y_t\{\exp(-\alpha_t) - \exp(-m_t)\}] = 1 - \beta y_t\exp(-m_t)\{\exp(\sigma_t^2/2) - 1\} \simeq 1 - \frac{\beta y_t}{2}\exp(-m_t)\sigma_t^2.$$

This has similar acceptance rates to the SV model given above. The non-modal acceptance rate is

$$acc_t \simeq 1 - \frac{\beta y_t}{2}\exp(-m_t)\{\sigma_t^2 + (\hat{\alpha}_t - m_t)^2\}.$$
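These approximations are easy to check by simulation. The following is a minimal sketch (with illustrative values, not thesis output) for the SV modal case with $y_t^2\exp(-m_t)/\beta^2 = 1$ and $\sigma_t^2 = 0.01$, where the approximation above gives $acc_t \simeq 0.9975$.

```python
# A minimal sketch checking acc ~ 1 - (y^2/(4 beta^2)) exp(-m) sig2 by
# Monte Carlo for the SV model; at these values m = 0 is exactly the mode.
import numpy as np

rng = np.random.default_rng(1)
y, beta, m, sig2 = 1.0, 1.0, 0.0, 0.01     # y^2 exp(-m)/beta^2 = 1

c = y**2 / (2 * beta**2)
l = lambda a: -a / 2 - c * np.exp(-a)
l_prime = lambda a: -0.5 + c * np.exp(-a)  # l'(0) = 0, so proposals centre on m

a = m + np.sqrt(sig2) * rng.standard_normal(1_000_000)
acc = np.mean(np.exp(l(a) - l(m) - l_prime(m) * (a - m)))
print(acc, 1 - y**2 / (4 * beta**2) * np.exp(-m) * sig2)  # both close to 0.9975
```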
2.6.2 Proof of Theorem 2.1

The basis of this will be that, ignoring constants,

$$\log f(\alpha_t \mid \alpha_{t-1}, \alpha_{t+1}, y_t) = l(\alpha_t) - \frac{1}{2}(\alpha_t - \mu_t)^T\Sigma_t^{-1}(\alpha_t - \mu_t)$$
$$\leq l(m_t) + l'(m_t)^T(\alpha_t - m_t) - \frac{1}{2}(\alpha_t - \mu_t)^T\Sigma_t^{-1}(\alpha_t - \mu_t) = \log g(\alpha_t; m_t),$$

by Taylor's mean value theorem, since $l''(\alpha_t)$ is negative semi-definite for all $\alpha_t$. Bounding means we can use the standard accept-reject results to prove the validity of the sampler. This gives the proof of the first result. Recall that I have written the densities $f$ and $g$ only up to constants of proportionality.

I will express

$$\log g(\alpha_t; m_t) = \log R_t(m_t) - \frac{1}{2}(\alpha_t - a_t)^T\Sigma_t^{-1}(\alpha_t - a_t) - \frac{1}{2}\mu_t^T\Sigma_t^{-1}\mu_t,$$

where $a_t = \mu_t + \Sigma_t l'(m_t)$ and the remainder term, which only involves $m_t$, is

$$\log R_t(m_t) = l(m_t) - l'(m_t)^T m_t + \frac{1}{2}a_t^T\Sigma_t^{-1}a_t.$$

Now the probability of acceptance, $\Pr(A)$, is $F_t/G_t(m_t)$, where

$$F_t = \int f(\alpha_t)\,d\alpha_t \quad \text{and} \quad G_t(m_t) = \int g(\alpha_t; m_t)\,d\alpha_t,$$

since

$$\Pr(A) = E[f(\alpha_t)/g(\alpha_t; m_t)] = \int \frac{f(x)}{g(x; m_t)}\,\frac{g(x; m_t)}{G_t(m_t)}\,dx = \frac{F_t}{G_t(m_t)}.$$

Now $G_t(m_t) = cR_t(m_t)$, where $c$ does not involve $m_t$. Hence to maximise the acceptance probability we can select $m_t$ to minimise $\log R_t(m_t)$. However,

$$\frac{\partial \log R_t(m_t)}{\partial m_t} = l'(m_t) - l''(m_t)m_t - l'(m_t) + l''(m_t)a_t = l''(m_t)\{\mu_t + \Sigma_t l'(m_t) - m_t\}.$$

Equating to zero, this has a solution $\hat{m}_t$ which satisfies the equation $\hat{m}_t = \mu_t + \Sigma_t l'(\hat{m}_t)$. This is the only point at which the derivative is zero. However, the solution to this equation is also the mode of $f$. To verify that we have a minimum, it is clear that at $m_t = \hat{m}_t$,

$$\frac{\partial^2 \log R_t(m_t)}{\partial m_t \partial m_t^T} = l''(\hat{m}_t)\Sigma_t l''(\hat{m}_t) - l''(\hat{m}_t),$$

which is positive semi-definite. This proves the second part of the theorem.
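A quick numerical illustration of the second part of the theorem (a hypothetical check, not from the thesis): for the univariate SV likelihood, the grid minimiser of $\log R_t(m)$ coincides with the mode of $f(\alpha_t \mid \alpha_{t-1}, \alpha_{t+1}, y_t)$.

```python
# A minimal sketch verifying Theorem 2.1(2) numerically: the minimiser of
# log R_t(m) agrees with the mode of f, for illustrative SV-type values.
import numpy as np

mu, sig2 = 0.5, 0.2                       # jackknife moments (illustrative)
c = 2.0                                   # plays the role of y^2/(2 beta^2)
l = lambda a: -a / 2 - c * np.exp(-a)
l_prime = lambda a: -0.5 + c * np.exp(-a)

def log_R(m):
    a = mu + sig2 * l_prime(m)            # a_t = mu_t + sig2_t l'(m_t)
    return l(m) - l_prime(m) * m + a**2 / (2 * sig2)

grid = np.linspace(-2, 3, 100001)
m_star = grid[np.argmin(log_R(grid))]     # minimiser of log R_t(m)

log_f = l(grid) - (grid - mu)**2 / (2 * sig2)
mode = grid[np.argmax(log_f)]             # mode of f(alpha_t | ...)
print(m_star, mode)                       # agree to grid accuracy (about 0.616)
```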