
Chapter 3
BLOCK SAMPLING
3.1 Introduction
As shown in chapter 2, single move sampling has its perils, even when the acceptance rates
are very close to 1. The main problems are slow convergence and inefficiency in estimating
posterior moments when the resulting chain has reached equilibrium. In order to avoid these
separate, but highly related, pitfalls I now consider block sampling of the states. Liu, Wong and Kong (1994) suggest that a key to improving the speed of convergence of such samplers is to update the states in blocks. A proof of this for Gaussian problems for which the off-diagonal elements of the inverse variance are negative is given by Roberts and Sahu (1997), and is discussed in chapter 4. As mentioned in chapter 1, this has motivated a considerable time series statistics literature on this topic in models built out of Gaussianity, but with some non-normality mixed in: see the work of Carter and Kohn (1994), Shephard (1994) and de Jong and Shephard (1995), outlined in chapter 1. However, these methods are very model specific, applying only to models which are in GSS form, conditional upon some parameters or mixture components. In this chapter I introduce a more robust, general and flexible approach for all non-Gaussian state space forms
of the type in (3.1),

    y_t ~ f(y_t | s_t),            s_t = c_t + Z_t α_t,
    α_{t+1} = d_t + T_t α_t + H_t u_t,    u_t ~ NID(0, I),    t = 1, ..., n,        (3.1)
    α_1 | Y_0 ~ N(a_{1|0}, P_{1|0}),

for which it is assumed that log f(y_t | s_t) is concave in s_t and consequently in α_t, as in chapter 2.
Sampling from α | y may be too ambitious as this is highly multivariate, and so if n is very large it is likely that we will run into very large rejection frequencies, counteracting the effectiveness of the blocking. Hence I will employ a potentially intermediate strategy. The sampling method will be based around sampling blocks of disturbances, say u_{t,k} = (u_{t-1}', ..., u_{t+k-1}')', given beginning and end conditions, α_{t-1} and α_{t+k+1}, and the observations. There is, of course, a deterministic relationship between u_{t,k} and α_{t,k} = (α_t', ..., α_{t+k}')', so we could equally imagine that the states are being sampled given the end conditions α_{t-1} and α_{t+k+1}. I call these end conditions "stochastic knots", the nomenclature being selected by analogy with their role in splines. At the beginning and end of the data set, there will be no need to use two-sided knots. In these cases typically only single knots will be required, simulating, for example, from α_{1,k} given α_{k+2} and y_1, ..., y_{k+1}. In practice k will be a tuning parameter, allowing the lengths of blocks to be selected. Typically it will be chosen to be stochastic, varying the stochastic knots at each iteration of the samplers. If k is too large the sampler will be slow because of rejections; if it is too small the sampler will be correlated because of the structure of the model.
3.2 MCMC method
In this section, I detail the Markov chain Monte Carlo method. In Section 3.2.1, I describe the
issue of randomly placing the fixed states and signals for each sweep of the method. In Section
3.2.2, I describe how the proposal density is formed to approximate the true conditional density
of a block of states between two knots. The next section, Section 3.2.3, shows how this proposal density can be seen as a GSSF model and therefore easily sampled from. The expansion
points for the signals, usually chosen as the mode of the conditional density, are described in
Section 3.2.4. Finally, the construction of the overall Metropolis method for deciding whether
to update a block of states is detailed in Section 3.2.5.
3.2.1 Stochastic knots
The stochastic knots play a crucial role in this method. A fixed number, K , of states, widely
spaced over the time domain, are randomly chosen to remain fixed for one sweep of the MCMC
method. These states, known as “knots”, ensure that as the sample size increases the algorithm
does not fail due to excessive numbers of rejections. Since the knots are selected randomly
the points of conditioning change over the iterations. I propose to work with a collection of
stochastic knots, at times τ = (τ_1, ..., τ_K)' and corresponding values α̃ = (α_{τ_1}', ..., α_{τ_K}')', which appropriately cover the time span of the sample. The corresponding signals are, of course, also regarded as fixed.
The selection of the knots will be carried out randomly and independently of the outcome
of the MCMC process. In this chapter I have used the scheme
    τ_i = int{ n (i + U_i) / (K + 2) },    where U_i ~ U(0, 1),    i = 1, ..., K,        (3.2)

where int(·) means rounded to the nearest integer. Thus the selection of knots is now indexed by a single parameter K, which is controlled. Certain, widely spaced, states are therefore chosen to retain their values from the previous MCMC sweep. I now detail the form of the conditional density between two knots and the corresponding proposal density.
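As an illustration of the knot-drawing scheme (3.2), the following is a minimal sketch; Python with NumPy and the function name draw_knots are illustrative choices, not part of the thesis.

    import numpy as np

    def draw_knots(n, K, rng=None):
        # Draw knot times tau_i = int{ n (i + U_i) / (K + 2) }, U_i ~ U(0, 1),
        # i = 1, ..., K, as in (3.2); int(.) is taken as rounding to the nearest integer.
        rng = np.random.default_rng() if rng is None else rng
        i = np.arange(1, K + 1)
        u = rng.uniform(size=K)
        return np.rint(n * (i + u) / (K + 2)).astype(int)

    # Example: draw_knots(1000, 10) returns 10 jittered, roughly evenly spread knot times.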
3.2.2 The proposal density
The basis of the MCMC method will be the use of a Taylor type expansion of the conditional density

    log f = log f(α_{t,k} | α_{t-1}, α_{t+k+1}, y_t, ..., y_{t+k})

around some preliminary estimate of α_{t,k} made using the conditioning arguments α_{t-1}, α_{t+k+1} and y_t, ..., y_{t+k}. These estimates, and the corresponding s_{t,k} = (s_t, ..., s_{t+k}), will be denoted by hats. How they are formed will be discussed in Section 3.2.4. As in chapter 2, I will write l(s_t) to denote log f(y_t | s_t) (an implicit function of α_t) and its first and second derivatives with respect to s_t as l'(s_t) and l''(s_t) respectively. The expansion is then
    log f = -(1/2) u_{t,k}' u_{t,k} + Σ_{i=t}^{t+k} l(s_i),    where α_{i+1} = d_i + T_i α_i + H_i u_i,  s_i = c_i + Z_i α_i,
          ≈ -(1/2) u_{t,k}' u_{t,k} + Σ_{i=t}^{t+k} { l(ŝ_i) + (s_i - ŝ_i)' l'(ŝ_i) + (1/2)(s_i - ŝ_i)' D_i(ŝ_i)(s_i - ŝ_i) }        (3.3)
          = log g,
where u_{t,k} and s_{t,k} are defined, implicitly, in terms of α_{t,k}. Hence the approximating form in (3.3) is regarded as being in terms of the states. This is very similar to the single move method, based on a quadratic expansion, that I considered in chapter 2 in (2.5). I will require the assumption that the as yet unspecified matrix D_i(s) is everywhere strictly negative definite as a function of s. Typically I will take D_i(ŝ_i) = l''(ŝ_i), so that the approximation is a second order Taylor expansion. This will be convenient since, in the vast majority of cases, for l concave for example, l'' will be everywhere strictly negative definite. However, I allow for the possibility of not taking D_i as the second derivative, so that unusual (non log-concave) cases are covered as well. Of course for those cases, I will have to provide sensible rules for the selection of D_i.
A crucially attractive feature of this expansion is that the ratio f/g, used in the Metropolis step, involves only the difference between l(s_i) and l̃(s_i) = l(ŝ_i) + (s_i - ŝ_i)' l'(ŝ_i) + (1/2)(s_i - ŝ_i)' D_i(ŝ_i)(s_i - ŝ_i), not the transition density term u_{t,k}' u_{t,k}. The implication of this is that the algorithm should not become significantly less effective as the dimension of α_i increases. This can be contrasted with other approaches, such as the numerical integration routines used in Kitagawa (1987), whose effectiveness usually deteriorates as the dimension of α_i increases.
This type of expansion also appears in the work of Durbin and Koopman (1992) where
a sequential expansion based on filtering and smoothing algorithms was used to provide an
approximate likelihood analysis of a wide class of non-Gaussian models. Their method is essentially modal estimation using a first order expansion rather than the expansion given above.
3.2.3 Simulating using GSSF model
The density g is a high-dimensional multivariate Gaussian. It is not a dominating density for f, but there is some hope that it will be a good approximation. Now it can be seen that

    log g = -(1/2) u_{t,k}' u_{t,k} + Σ_{i=t}^{t+k} { l(ŝ_i) + (s_i - ŝ_i)' l'(ŝ_i) + (1/2)(s_i - ŝ_i)' D_i(ŝ_i)(s_i - ŝ_i) }
          = c - (1/2) u_{t,k}' u_{t,k} - (1/2) Σ_{i=t}^{t+k} (ŷ_i - s_i)' V_i^{-1} (ŷ_i - s_i),        (3.4)

where

    ŷ_i = ŝ_i + V_i l'(ŝ_i),    i = t, ..., t+k,

and V_i^{-1} = -D_i(ŝ_i), by equating coefficients of powers of s_i. It is now clear that the approximating density can be viewed as a GSS form model, consisting of the required Gaussian measurement density with pseudo measurements ŷ_i and the standard linear Gaussian Markov chain prior in the states. So the approximating joint density of α_{t,k} | α_{t-1}, α_{t+k+1}, y_t, ..., y_{t+k} can be calculated by writing:

    ŷ_i = s_i + ε_i,    ε_i ~ N(0, V_i),
    s_i = c_i + Z_i α_i,    i = t, ..., t+k,        (3.5)
    α_{i+1} = d_i + T_i α_i + H_i u_i,    u_i ~ NID(0, I).
The knots are fixed by setting ŷ_i = α_i for i = t-1 and i = t+k+1, and by making the measurement equation ŷ_i = α_i + ε_i, ε_i ~ N(0, κI), where κ is extremely small, at the positions of these knots, i = t-1 and i = t+k+1.
The model is now in GSSF. Consequently, it is possible to simulate from α_{t,k} | α_{t-1}, α_{t+k+1}, ŷ using the de Jong and Shephard (1995) simulation smoother, described in chapter 1, on the constructed set of pseudo-measurements ŷ_t, ..., ŷ_{t+k}.
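As a rough sketch of how the pseudo-measurements of (3.4)-(3.5) might be assembled for a block with scalar signals and states, assuming user-supplied derivative functions, the code below forms ŷ_i and V_i and appends near-degenerate pseudo-measurements at the two knot positions; all names and the choice of the small variance κ are illustrative, and the simulation smoother itself is not reproduced here.

    import numpy as np

    def pseudo_measurements(s_hat, l_prime, D, alpha_left, alpha_right, kappa=1e-8):
        # s_hat: expansion points s_hat_t, ..., s_hat_{t+k} (scalar signals here)
        # l_prime(s), D(s): first derivative of l and the chosen curvature D_i(s)
        # alpha_left, alpha_right: the knot values alpha_{t-1} and alpha_{t+k+1}
        # kappa: the "extremely small" variance attached to the knot pseudo-measurements
        V = -1.0 / D(s_hat)                      # V_i^{-1} = -D_i(s_hat_i)
        y_hat = s_hat + V * l_prime(s_hat)       # y_hat_i = s_hat_i + V_i l'(s_hat_i)
        # fix the knots by adding pseudo-measurements with tiny variance at each end
        y_all = np.concatenate(([alpha_left], y_hat, [alpha_right]))
        V_all = np.concatenate(([kappa], V, [kappa]))
        return y_all, V_all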
As g does not bound f, it is not possible to use this simulation smoother inside an accept-reject algorithm within the Gibbs sampler. Rather, I will use the simulation smoother to provide suggestions for the pseudo-dominating Metropolis algorithm suggested by Tierney (1994) and discussed in chapter 2. However, before describing the Metropolis move probability, I will first detail how the expansion points ŝ_t, ..., ŝ_{t+k} are found.
3.2.4 Finding ŝ_t, ..., ŝ_{t+k}
It is important to select sensible values for the sequence ŝ_t, ..., ŝ_{t+k}, the points at which the quadratic expansion is carried out. The most straightforward choice would be to take them as the mode of f(s_t, ..., s_{t+k} | α_{t-1}, α_{t+k+1}, y_t, ..., y_{t+k}). An expansion which is similar in spirit to (3.5), but without the knots, is used in the work of Durbin and Koopman (1992). Their algorithm converges to the mode of the density of α | y in cases where ∂²l/∂s_t ∂s_t' is negative semi-definite; the same condition is needed for generalized linear regression models to have a unique maximum (see McCullagh and Nelder (1989, p. 117), Wedderburn (1976) and Haberman (1977)). This condition is typically stated as a requirement that the link function be log-concave. Durbin and Koopman (1992) use an expectation smoother to provide an estimate of α | y using the first order Taylor expanded approximation. These authors do not adopt a Bayesian approach, the focus of their interest being on the mode.
The approach based on the second order expansion of (3.5) is a sounder basis on which to find the mode. In the applications to be presented the interest is on the mode given the knots. I first expand around some arbitrary starting value of ŝ_{t,k} to obtain (3.5), set ŝ_{t,k} to the means from the resulting expectation smoother (given in chapter 1) and then expand ŝ_{t,k} again, and so on. This will ensure that we obtain ŝ_{t,k} as the mode of f(s_t, ..., s_{t+k} | α_{t-1}, α_{t+k+1}, y_t, ..., y_{t+k}). This is, in fact, a very efficient Newton-Raphson method (since it avoids explicitly calculating the Hessian) using analytic first and second derivatives on a concave objective function. Thus the approximations presented here can be interpreted as Laplace approximations to a very high dimensional density function. In practice it is found that after 3 iterations of the smoothing algorithm we obtain a sequence ŝ_{t,k} = (ŝ_t, ..., ŝ_{t+k}) which is extremely close to the mode. This is very important since this expansion will be performed for each iteration of the proposed MCMC sampler.
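As an illustration of such a Newton-Raphson iteration, the following sketch finds the mode of a block for the SV special case of Section 3.3.2, where s_i = α_i and the transition is taken as a zero-mean Gaussian AR(1); it exploits the tridiagonal Hessian directly rather than passes of the expectation smoother, so it is only a simplified stand-in for the scheme described above, with illustrative names and a dense solve for clarity.

    import numpy as np

    def sv_block_mode(y, a_left, a_right, phi, sig2_eta, beta2, n_iter=3):
        # Newton-Raphson for the mode of alpha_t, ..., alpha_{t+k} given the knots
        # a_left = alpha_{t-1}, a_right = alpha_{t+k+1} and y_t, ..., y_{t+k},
        # for the SV measurement l(a) = -a/2 - y^2 exp(-a)/(2 beta^2) and a
        # zero-mean AR(1) transition alpha_{i+1} = phi alpha_i + eta_i.
        m = len(y)
        a = np.full(m, 0.5 * (a_left + a_right))          # crude starting value
        for _ in range(n_iter):
            e = y**2 * np.exp(-a) / (2.0 * beta2)
            lp, lpp = -0.5 + e, -e                        # l'(a_i), l''(a_i)
            left = np.concatenate(([a_left], a[:-1]))
            right = np.concatenate((a[1:], [a_right]))
            grad = (phi * left + phi * right - (1.0 + phi**2) * a) / sig2_eta + lp
            hess = np.diag(-(1.0 + phi**2) / sig2_eta + lpp)
            off = np.full(m - 1, phi / sig2_eta)
            hess += np.diag(off, 1) + np.diag(off, -1)
            a = a - np.linalg.solve(hess, grad)           # Newton step on a concave objective
        return a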
In fact, rather than considering a block s_{t,k}, it is possible to use the same method to find the mode of all the signals which are not knots, s say, conditional upon all the measurements and the knots, α̃, by using this scheme with the GSSF set analogously to (3.5). Again this is a Newton-Raphson scheme and the overall mode ŝ can be found. The resulting GSSF can then be simulated from, and each block between the knots can be updated or remain the same based upon the Metropolis criteria of the following section.
3.2.5 Metropolis acceptance probability
The setup of the proposal density and the expansion around the mode has been described. I now wish to describe the way the Metropolis acceptance probabilities are constructed. Suppose we have set up log g(α_{t,k} | α_{t-1}, α_{t+k+1}, y_t, ..., y_{t+k}) as described. Then we can perform a direct Metropolis method, deciding whether to retain our old values α_{t,k}^o, with corresponding signals s_{t,k}^o, or to update to the proposed values α_{t,k}^n and s_{t,k}^n drawn from g(·). The probability of accepting α_{t,k}^n is

    Pr(α_{t,k}^o → α_{t,k}^n) = min{ 1, ω(s_{t,k}^n) / ω(s_{t,k}^o) },        (3.6)

where

    ω(s_{t,k}) = exp{ l(s_{t,k}) − l̃(s_{t,k}) } = exp[ Σ_{i=t}^{t+k} { l(s_i) − l̃(s_i) } ].
If we use accept-reject within Metropolis, as in chapter 2, following Tierney (1994), then we have the scheme that we sample α_{t,k} from the proposal density using the simulation smoother until accepting the proposed block with probability min[ω(s_{t,k}), 1]. When this stage of acceptance has been achieved we set α_{t,k}^n = α_{t,k}. This now forms the proposed value at the M-H stage. The M-H probability of accepting this proposal is

    Pr(α_{t,k}^o → α_{t,k}^n) = min{ 1, ω_ar(s_{t,k}^n) / ω_ar(s_{t,k}^o) },

where

    ω_ar(s_{t,k}) = exp[ l(s_{t,k}) − min{ l̃(s_{t,k}), l(s_{t,k}) } ] = exp[ max{ 0, l(s_{t,k}) − l̃(s_{t,k}) } ] = max{ 1, ω(s_{t,k}) }.
As observed in chapter 2, ω(s_{t,k}) should be close to 1, hence resulting in high probabilities of acceptance in both stages. We proceed in this fashion, sampling all the blocks between the fixed knots. This defines a complete MCMC sweep through the states.
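A minimal sketch of the acceptance calculation in (3.6), written for the SV measurement density of Section 3.3.2, with the proposed block taken as already drawn from the simulation smoother; the function names and the use of log-weights are illustrative.

    import numpy as np

    def log_l(s, y, beta2):
        # exact l(s_i) = log f(y_i | s_i) for the SV example, up to a constant
        return -0.5 * s - y**2 * np.exp(-s) / (2.0 * beta2)

    def log_l_tilde(s, s_hat, y, beta2):
        # quadratic approximation expanded at s_hat, with D_i = l''(s_hat)
        e = y**2 * np.exp(-s_hat) / (2.0 * beta2)
        return (-0.5 * s_hat - e) + (s - s_hat) * (-0.5 + e) + 0.5 * (s - s_hat)**2 * (-e)

    def accept_block(s_new, s_old, s_hat, y, beta2, rng):
        # accept with probability min{1, w(s_new)/w(s_old)}, log w(s) = sum_i { l(s_i) - l~(s_i) }
        lw_new = np.sum(log_l(s_new, y, beta2) - log_l_tilde(s_new, s_hat, y, beta2))
        lw_old = np.sum(log_l(s_old, y, beta2) - log_l_tilde(s_old, s_hat, y, beta2))
        return np.log(rng.uniform()) < min(0.0, lw_new - lw_old)

    # usage: accept_block(proposal, current, s_hat, y_block, beta2, np.random.default_rng())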
The parameters are then sampled from their full conditional densities, in the manner described in chapter 2. For the next sweep a fixed number of knots is again randomly chosen from the states, and so the process continues.
3.3 Particular measurement densities
In this section, I will detail interesting cases of measurement densities which can be analysed
via the MCMC method introduced.
3.3.1 Example 1: Exponential family measurements with canonical link
An exponential family for the measurement density arises when log f(y_t | θ_t) = y_t θ_t − b(θ_t) + c(y_t) and we have a known function h(·) with h(θ_t) = s_t. If a canonical link is assumed, that is θ_t = s_t, then log f(y_t | s_t) = y_t s_t − b(s_t) + c(y_t). So

    v_t^{-1} = b̈(ŝ_t)   and   ŷ_t = ŝ_t + v_t { y_t − ḃ(ŝ_t) }.

The notation ḃ(·) and b̈(·) indicates the first and second derivatives of b(·) respectively. A special case of this is the Poisson model, where b(s_t) = exp(s_t) and so

    v_t^{-1} = exp(ŝ_t),   ŷ_t = ŝ_t + exp(−ŝ_t) { y_t − exp(ŝ_t) }.

Another important example is the binomial, where b(s_t) = n log{1 + exp(s_t)}. For this model

    v_t^{-1} = n p_t (1 − p_t),   ŷ_t = ŝ_t + (y_t − n p_t) / { n p_t (1 − p_t) },   where p_t = exp(ŝ_t) / {1 + exp(ŝ_t)}.
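As a small illustration, the Poisson and binomial formulae above translate directly into code (the function names are illustrative):

    import numpy as np

    def pseudo_obs_poisson(s_hat, y):
        # Poisson, canonical (log) link: v_t^{-1} = exp(s_hat_t),
        # y_hat_t = s_hat_t + exp(-s_hat_t) { y_t - exp(s_hat_t) }
        v = np.exp(-s_hat)
        return s_hat + v * (y - np.exp(s_hat)), v

    def pseudo_obs_binomial(s_hat, y, n):
        # Binomial(n), canonical (logit) link: p_t = exp(s_hat_t)/{1 + exp(s_hat_t)},
        # v_t^{-1} = n p_t (1 - p_t), y_hat_t = s_hat_t + (y_t - n p_t)/{n p_t (1 - p_t)}
        p = 1.0 / (1.0 + np.exp(-s_hat))
        v = 1.0 / (n * p * (1.0 - p))
        return s_hat + v * (y - n * p), v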
3.3.2 Example 2: SV model
In this case s_t = α_t. As seen in chapter 2, this model has log f(y_t | α_t) = −α_t/2 − y_t² exp(−α_t)/(2β²). Thus

    v_t^{-1} = { y_t² / (2β²) } exp(−α̂_t),   ŷ_t = α̂_t + v_t [ { y_t² / (2β²) } exp(−α̂_t) − 1/2 ].

This case is particularly interesting as v_t depends on y_t, which cannot happen in the exponential family canonical link case. At first sight this raises some problems, for as y_t → 0 so v_t^{-1} → 0 and ŷ_t → −∞. This would suggest that the suggestions from the simulation smoother might always be rejected. However, this observation ignores the role of the prior distribution, which will in effect treat such observations as missing. Of course there is a numerical overflow problem here, but that can be dealt with in a number of ways without resulting in any approximation.
3.3.3 Example 3: heavy tailed SV model
This argument can be extended to allow ε_t in the SV model, see (2.4), to follow a scaled t-distribution, ε_t = t_t √{(ν − 2)/ν}, where t_t ~ t_ν. Then

    log f(y_t | α_t) = −α_t/2 − {(ν + 1)/2} log[ 1 + y_t² exp(−α_t) / {β²(ν − 2)} ],

    l'(α_t) = (1/2) [ { (ν + 1) y_t² exp(−α_t) / (β²(ν − 2)) } / { 1 + y_t² exp(−α_t) / (β²(ν − 2)) } − 1 ],

and

    l''(α_t) = −{(ν + 1)/2} [ y_t² exp(−α_t) / {β²(ν − 2)} ] / [ 1 + y_t² exp(−α_t) / {β²(ν − 2)} ]².

The resulting v_t^{-1} and ŷ_t are easy to compute.
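A sketch of these derivatives, from which v_t^{-1} = −l''(α̂_t) and ŷ_t = α̂_t + v_t l'(α̂_t) follow directly (the function name is illustrative):

    import numpy as np

    def derivs_t_sv(alpha, y, beta2, nu):
        # scaled-t SV measurement: A = y^2 exp(-alpha) / { beta^2 (nu - 2) }
        A = y**2 * np.exp(-alpha) / (beta2 * (nu - 2.0))
        lp = 0.5 * ((nu + 1.0) * A / (1.0 + A) - 1.0)       # l'(alpha)
        lpp = -0.5 * (nu + 1.0) * A / (1.0 + A)**2          # l''(alpha)
        return lp, lpp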
This approach has some advantages over the generic outlier approaches for Gaussian models suggested in Shephard (1994) and Carter and Kohn (1994), which explicitly use the mixture
representation of a t-distribution, since they require mixtures in order to obtain conditionally
GSS form models.
3.3.4 Example 4: factor SV model
An economically interesting factor SV model can be constructed from the corresponding ARCH work of Diebold and Nerlove (1989) and King, Sentana and Wadhwani (1994). In the simplest univariate case it takes on the form

    y_t = ε_t β exp(α_t / 2) + ω_t,

where the model is the same as (1.4.1) but with added measurement error ω_t ~ NID(0, σ_ω²). With this model it is not possible to set α_t so as to drive the variance of y_t to zero, as it is bounded from below by σ_ω². Interestingly

    log f(y_t | α_t) = −(1/2) log(β² exp(α_t) + σ_ω²) − y_t² / { 2(β² exp(α_t) + σ_ω²) }

is not necessarily concave in α_t. Notice that

    l'(α_t) = [ β² exp(α_t) / { 2(β² exp(α_t) + σ_ω²) } ] [ y_t² / (β² exp(α_t) + σ_ω²) − 1 ],

and so

    l''(α_t) = −[ β⁴ exp(2α_t) / { 2(β² exp(α_t) + σ_ω²)² } ] [ 2 y_t² / (β² exp(α_t) + σ_ω²) − 1 ],

which can be positive if y_t is small. There are a number of approaches which can be suggested to overcome this problem. We could ignore the contribution of the −(1/2) log(β² exp(α_t) + σ_ω²) term to the second derivative. This would give us

    D_t = − y_t² β⁴ exp(2α_t) / (β² exp(α_t) + σ_ω²)³.

This would be a good approximation if σ_ω² is small or if y_t² is big. In any case the approximation does not affect the validity of the approach, only the rejection probability. A multivariate generalisation of this model is considered in chapter 7.
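A sketch of this curvature choice (the function name is illustrative):

    import numpy as np

    def D_factor_sv(alpha, y, beta2, sig2_omega):
        # D_t = - y_t^2 beta^4 exp(2 alpha_t) / (beta^2 exp(alpha_t) + sigma_omega^2)^3,
        # i.e. the second derivative with the -0.5 log(.) contribution dropped
        E = beta2 * np.exp(alpha)
        V = E + sig2_omega
        return -(y**2) * E**2 / V**3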
3.4 Illustration on SV model
To illustrate the effect of blocking I will work, for purposes of comparison, with the SV model considered in the previous two chapters. I now analyse the output from the suggested MCMC algorithm on simulated data and on the real data examined in chapter 2.
3.4.1 Output of MCMC algorithms on simulated data
Two sets of parameters are used for the simulated data, designed to reflect typical problems for weekly and daily financial data sets. In the weekly case, β = 1, σ_η² = 0.1 and φ = 0.9, while in the daily case β = 1, σ_η² = 0.01 and φ = 0.99. Here I carry out the MCMC method for a simulated SV model, noting efficiency gains over the single-move algorithm of chapter 2. Table 3.1 reports some results from a simulation using n = 1,000. The table splits into two sections. The first is concerned with estimating the states given the parameters, a pure signal extraction problem. It is clearly difficult to summarise the results for all 1,000 time periods and so I focus on the middle state, α_500, in all the calculations. Extensive simulations suggest that the results reported here are representative of these general results.
The second section of Table 3.1 looks at the estimation of the states at the same time as
estimating the parameters of the model. The three parameters of the SV model are drawn,
conditional upon the states and measurements, in the manner described in chapter 2, Section
2.3.3. Hence for that simulation the problem is a four-fold one: estimate the states and three
parameters.
The simulations are analysed using a Parzen type window, see chapter 1. The table reports the ratio of the resulting variance of the single-move sampler to that of the multi-move sampler. Numbers bigger than one reflect gains from using a multi-move sampler. One interpretation of the table is that if the ratio is x then the single-move sampler has to be iterated x times more than the multi-move sampler to achieve the same degree of precision in the estimates of interest. So if a sampler is 10 times more efficient, then it produces the same degree of accuracy from 1,000 iterations as 10,000 iterations from the inferior simulator.
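The exact lag window is given in chapter 1; as an indication of the calculation, the sketch below uses one standard form of the Parzen kernel to estimate the long-run variance of a chain of draws, and the ratio of such estimates across the two samplers gives the factor x discussed above. The function name and bandwidth handling are illustrative simplifications.

    import numpy as np

    def parzen_long_run_variance(x, B):
        # gamma_0 + 2 * sum_j w(j/B) gamma_j, with the Parzen kernel
        # w(u) = 1 - 6u^2 + 6u^3 for u <= 1/2 and 2(1 - u)^3 for 1/2 < u <= 1.
        x = np.asarray(x, dtype=float)
        n = len(x)
        xc = x - x.mean()
        s = np.dot(xc, xc) / n                      # gamma_0
        for j in range(1, min(B, n - 1) + 1):
            u = j / B
            w = 1 - 6*u**2 + 6*u**3 if u <= 0.5 else 2 * (1 - u)**3
            s += 2.0 * w * np.dot(xc[j:], xc[:-j]) / n
        return s

    # relative efficiency of the two samplers, per iteration, e.g.:
    # parzen_long_run_variance(single_move_draws, 100_000) / parzen_long_run_variance(multi_move_draws, 10_000)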
For the results of Table 3.1, the multi-move sampler was run for 100,000 iterations with the bandwidth, discussed in Section 1.2.2.4, set as B = 10,000. The single move sampler was run for 1,000,000 iterations with B = 100,000. The run-in (number of iterations before the samples were recorded) was 10,000 and 100,000 for the multi-move and single move samplers respectively.
Table 3.1 indicates a number of results. In all cases the multi-move sampler outperformed the single move sampler. When the number of knots is 0, so all the states are sampled simultaneously, the gains for the weekly parameter case are not that great, whilst for the daily parameters they are substantial. This is because the Gaussian approximation is better for the daily parameters, since the Gaussian AR(1) prior dominates for this persistent case. This is important, because it is for persistent cases that the single move method does particularly badly. There are two competing considerations here. If the number of knots is large then the blocks are small and the Gaussian approximation is good, since it is over a low dimension, and so the Metropolis method will accept frequently. On the other hand, we will be retaining a lot of states from the previous MCMC sweep, leading to correlation over sweeps. Generally, the multi-move method does not appear to be too sensitive to block size, but the best number of knots for these simulations appears to be about 10. The optimal size of blocks is considered in more detail in chapter 4.
Weekly parameters
                          K=0    K=1    K=3    K=5    K=10   K=20   K=50   K=100  K=200
States | parameters
  α_500 | y               1.7    4.1    7.8    17     45     14     12     4.3    3.0
States and parameters
  α_500 | y               20     22     32     28     39     12     12     21     1.98
  φ | y                   1.3    1.3    1.1    1.7    2.2    1.7    1.5    1.7    1.5
  σ_η | y                 1.5    1.4    1.1    1.6    2.6    1.7    1.5    2.0    1.5
  β | y                   16     14     32     14     23     6      9      10     1

Daily parameters
                          K=0    K=1    K=3    K=5    K=10   K=20   K=50   K=100  K=200
States | parameters
  α_500 | y               66     98     98     85     103    69     25     8.5    2.5
States and parameters
  α_500 | y               91     40     30     60     47     80     14     27     18
  φ | y                   2.7    3.1    2.9    2.8    3.4    3.8    2.6    3.6    1.8
  σ_η | y                 16     13     18     18     16     18     6.8    11     5.7
  β | y                   93     51     51     76     65     106    23     27     26

Table 3.1 Relative efficiency of the block sampler compared with the single-move Gibbs sampler. K denotes the number of stochastic knots used. The figures are the ratios of the computed variances, and so reflect efficiency gains. The variances are computed using 10,000 lags and 100,000 iterations in all cases except for the single-move sampler on the daily parameter cases; for that problem 100,000 lags and 1,000,000 iterations were used. In all cases the burn-in period is the same as the number of lags.
3.4.2 Output of MCMC algorithms on real data
To illustrate the effectiveness of this method I will return to the application of the SV model considered in chapter 2. I maintain exactly the same model used earlier, but now use 10 stochastic knots in the block sampler. The basic results are displayed in Figure 3.1. This was generated by using 200 iterations at the initial parameter values, 300 iterations updating the parameters and states, and finally recording 10,000 iterations from the equilibrium path of the sampler.
The graph shares the features of Figure 2.1, with the same distribution for the parameters.
However, the correlations amongst the simulations are now quite manageable. This seems a
workable tool.
[Figure 3.1 here: panels titled "Sampling phi | y", "Sampling beta | y" and "Sampling sigma_eta | y"; for each parameter the simulated draws against iteration number, a histogram of the marginal distribution, and the correlogram of the iterations.]

Figure 3.1 Daily returns for the Pound against the US Dollar. Top graphs: the simulations against iteration number for the block sampler using 10 knots. Middle graphs: histograms of the resulting marginal distributions. Bottom graphs: the corresponding correlograms for the iterations.
It is often useful to report the precision of the results from the simulation estimation. Here I use the Parzen type window of chapter 1 with 1,000 lags. Note that this estimates the integrated autocorrelation time less well as the autocorrelation increases. The results given in Table 3.2 are consistent with those given in Table 2.1, for the 1,000,000 iterations of the single-move algorithm. However, the precision achieved with the 1,000,000 iterations is broadly the same as that achieved by 10,000 iterations from the multi-move sampler.
              Mean     Monte Carlo S.E.    Covariance and correlation of posterior
φ | y         0.9802   0.000734             0.000105   -0.689       0.294
σ_η | y       0.1431   0.00254             -0.000198    0.000787   -0.178
β | y         0.6589   0.0100               0.000273   -0.000452    0.00823
Computer time: 322

Table 3.2 Daily returns for the Pound against the Dollar. Summaries of Figure 3.1, 10,000 replications of the multi-move sampler, using 10 stochastic knots. The standard error of the simulation is computed using 1,000 lags. The figures above the diagonal are correlations. Computer time is in seconds on a P5/133.

3.5 Conclusions

The methods and results indicate that the single-move algorithms, for example those of chapter 2, are likely to be unreliable for real applications due to their slow convergence and high correlation properties. Instead I argue that the development of Taylor expansion based multi-move simulation smoothing algorithms can offer the delivery of reliable methods. The basis for the proposals in this chapter is a set of approximations to the conditional densities, which lead to so-called independence Metropolis samplers.
The resulting MCMC methods have five basic advantages. (1) They integrate into the analysis of non-Gaussian models the role of the Kalman filter and simulation smoother, so fully exploiting the structure of the model to improve the speed of the methods. (2) The central expansion in the chapter has been previously used in various approximate methods suggested in the time series and spline literatures (but not in this context). (3) The expansion and the Metropolis ratio only use log f(y_t | s_t) and its approximation, so as the dimension of the state increases the computational efficiency of the method should not diminish significantly. (4) The methods are only O(n) in the time dimension, regardless of the number of knots (indeed the methods are constant in speed for different numbers of knots). (5) The methods can be straightforwardly extended to many multivariate cases, as seen in chapter 7.
It is clear from the results of this chapter that the efficiency gains resulting from the use of multi-move samplers are considerable. Some theoretical justification for the use of multi-move samplers rather than single move samplers is provided in chapter 4. In addition, the role of the parameters indexing the non-Gaussian state space model has been largely suppressed during this chapter. In fact, it is often the case that the parameters are of central interest (often more so than the states in econometric applications). Issues relating to the choice among equivalent parameterisations are discussed in the following chapter.