Spectral Analytic Comparisons for Data Augmentation
Vivekananda Roy
Department of Statistics
Iowa State University
September 2011
Abstract
The sandwich algorithm (SA) is an alternative to the data augmentation (DA) algorithm that
uses an extra simulation step at each iteration. In this paper, we show that the sandwich algorithm
always converges at least as fast as the DA algorithm, in the Markov operator norm sense. We also
establish conditions under which the spectrum of SA dominates that of DA. An example illustrates
the results.
Key words and phrases. Compact operator, Convergence rate, Data augmentation algorithm, Eigenvalue, Markov chain, Spectrum.
1 Introduction
Let $f_X: \mathsf{X} \to [0, \infty)$ be a probability density function (with respect to a $\sigma$-finite measure $\mu$, say) and assume that direct simulation from $f_X$ is not possible. Suppose $f: \mathsf{X} \times \mathsf{Y} \to [0, \infty)$ is a joint density (with respect to $\mu \times \nu$, say) whose $x$-marginal is $f_X$, i.e., $\int_{\mathsf{Y}} f(x,y)\,\nu(dy) = f_X(x)$. If sampling from the corresponding conditional densities $f_{X|Y}$ and $f_{Y|X}$ is straightforward, then we can use the data augmentation (DA) algorithm (Tanner and Wong (1987)) based on $f(x,y)$ to explore $f_X$. In particular, the Markov transition density (Mtd) of this DA algorithm is given by
$$k(x'|x) = \int_{\mathsf{Y}} f_{X|Y}(x'|y)\, f_{Y|X}(y|x)\,\nu(dy).$$
So each iteration of the DA algorithm consists of two simple steps: a draw from $f_{Y|X}$ followed by a draw from $f_{X|Y}$. The DA algorithm, like its deterministic counterpart the EM algorithm, is widely used but often suffers from slow convergence.
Following Liu and Wu (1999), Meng and van Dyk (1999) and van Dyk and Meng (2001), Hobert and Marchev (2008) recently introduced an alternative to DA to speed up the convergence. In order to describe Hobert and Marchev's (2008) sandwich algorithm (SA), let $R(y, \cdot)$ be the Markov transition function (Mtf) of any Markov chain on $\mathsf{Y}$ with invariant density $f_Y$, where $f_Y$ is the $y$-marginal density of $f(x,y)$, i.e., $\int_{\mathsf{X}} f(x,y)\,\mu(dx) = f_Y(y)$. Then the Mtd of the sandwich algorithm is
$$\tilde{k}(x'|x) = \int_{\mathsf{Y}} \int_{\mathsf{Y}} f_{X|Y}(x'|y')\, R(y, dy')\, f_{Y|X}(y|x)\,\nu(dy).$$
Simple calculations show that $f_X$ is the invariant density for $\tilde{k}$ and that, if $R$ is reversible with respect to $f_Y$, then $\tilde{k}$ is reversible with respect to $f_X$. While Theorem 1, the first main result of this paper, requires only that $f_X$ is the invariant density for $\tilde{k}$, Theorem 2, like most of the existing theory comparing DA and SA, is based on the stronger assumption that $R$ is reversible with respect to $f_Y$. Notice that each iteration of the sandwich algorithm consists of three steps: a draw from $f_{Y|X}$, followed by a draw from $R$, and finally a draw from $f_{X|Y}$. So SA has an extra step according to $R$ sandwiched between the draws from the two conditional densities. In practice $R$ is often chosen to be a univariate Markov chain, so simulating from $R$ is computationally inexpensive compared to simulating from $f_{Y|X}$ and $f_{X|Y}$, which are often high dimensional densities. In that case the DA and sandwich algorithms are equivalent in terms of computational complexity. On the other hand, there is a great deal of empirical evidence showing that sandwich algorithms converge much faster than the corresponding DA algorithms. (For examples, see Liu and Wu (1999), Meng and van Dyk (1999), van Dyk and Meng (2001), Roy and Hobert (2007) and Hobert, Roy and Robert (2011).)
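To make the two-step versus three-step structure concrete, the following minimal Python sketch contrasts one iteration of DA with one iteration of SA; the functions `draw_y_given_x`, `draw_x_given_y` and `step_R` are hypothetical placeholders for problem-specific draws from $f_{Y|X}$, $f_{X|Y}$ and the Mtf $R$.

```python
# Minimal sketch of one DA iteration vs. one SA iteration. The three
# sampler functions are hypothetical, problem-specific stand-ins.

def da_step(x, draw_y_given_x, draw_x_given_y):
    """One DA iteration: x -> y -> x'."""
    y = draw_y_given_x(x)           # draw y ~ f_{Y|X}(. | x)
    return draw_x_given_y(y)        # draw x' ~ f_{X|Y}(. | y)

def sa_step(x, draw_y_given_x, draw_x_given_y, step_R):
    """One SA iteration: x -> y -> y' -> x'."""
    y = draw_y_given_x(x)           # draw y ~ f_{Y|X}(. | x)
    y_prime = step_R(y)             # extra move y' ~ R(y, .), invariant for f_Y
    return draw_x_given_y(y_prime)  # draw x' ~ f_{X|Y}(. | y')
```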
In this short paper, we develop some theoretical results comparing the convergence rates of the DA and sandwich algorithms. In particular, we show that SA is always at least as good as the original DA algorithm in terms of having smaller operator norm, that is, we have $\|\tilde{K}\| \le \|K\|$, where $K$ and $\tilde{K}$ are the operators defined by $k$ and $\tilde{k}$ respectively and $\|\cdot\|$ denotes the operator norm. This result extends results in Hobert and Rosenthal (2007) and Hobert and Román (2011), where the above norm comparison was obtained under additional conditions on SA. Hobert and Rosenthal (2007) prove that $\|\tilde{K}\| \le \|K\|$ as long as $\tilde{K}$ is a positive operator. Recently, Hobert and Román (2011) pointed out that Yu and Meng's (2011) Theorem 1 can be used to establish the above result when $R$ is reversible.
While the norm of a Markov operator provides a univariate summary of the convergence of the corresponding Markov chain, a more detailed picture of its convergence can be obtained by studying the spectrum of the operator (Diaconis, Khare and Saloff-Coste (2008); Hobert et al. (2011)). We prove that if the Markov operator corresponding to the DA chain is compact and the Mtf $R$ is idempotent (i.e., $\int_{\mathsf{Y}} R(y, dy')\, R(y', dy'') = R(y, dy'')$, or in short $R^2 = R$), then the spectrum of SA dominates that of DA in the sense that the ordered eigenvalues of SA are less than or equal to the corresponding eigenvalues of DA (see Section 2 for the definition of a compact operator). This is a generalization of results in Hobert et al. (2011) and Khare and Hobert (2011). While Hobert et al. (2011) proved this eigenvalue domination result under the condition that $\mathsf{Y}$ is finite (in which case, the DA chain is of course compact), Khare and Hobert (2011) proved the same result for trace-class DA algorithms (see Section 2 for the definition of a trace-class operator). Note that a trace-class operator is necessarily compact. In this article, we also give conditions on the class of DA algorithms, weaker than those of Khare and Hobert (2011), that allow the sandwich algorithms to be strictly better than the corresponding DA algorithms in the Markov operator norm sense. In particular, we show that if the DA operator is compact and $R$ satisfies certain conditions, then the norm of the sandwich algorithm is strictly less than that of the DA. Khare and Hobert (2011) proved this result under the stronger assumption that the DA algorithm is trace-class.
The remainder of this paper is organized as follows. Section 2 contains a brief review of results
from operator theory that are used in this article. Our main results comparing DA and SA appear in
Section 3. Section 4 contains an example of a compact DA algorithm that is not trace-class to illustrate
our theoretical results.
2 Background on Markov Operators
Suppose $f_X: \mathsf{X} \to [0, \infty)$ is a pdf with respect to a $\sigma$-finite measure $\mu$. Let
$$L_0^2(f_X) = \left\{ g: \mathsf{X} \to \mathbb{R} \;:\; \int_{\mathsf{X}} g^2(x)\, f_X(x)\,\mu(dx) < \infty \ \text{ and } \ \int_{\mathsf{X}} g(x)\, f_X(x)\,\mu(dx) = 0 \right\}.$$
The inner product in $L_0^2(f_X)$ is defined as $\langle g, h \rangle = \int_{\mathsf{X}} g(x)\, h(x)\, f_X(x)\,\mu(dx)$, and hence the norm of $g$ is $\|g\| = \sqrt{\langle g, g \rangle}$.
Let $P(x, dx')$ be the Mtf of an irreducible, aperiodic and Harris recurrent Markov chain $\{X_n\}_{n=0}^\infty$ on $\mathsf{X}$ with invariant density $f_X$. Let $P: L_0^2(f_X) \to L_0^2(f_X)$ be the corresponding operator that maps $g \in L_0^2(f_X)$ to $(Pg)(x) = \int_{\mathsf{X}} g(x')\, P(x, dx')$. Define $L_{0,1}^2(f_X) = \{g \in L_0^2(f_X) : \int_{\mathsf{X}} g^2(x)\, f_X(x)\,\mu(dx) = 1\}$. The (operator) norm of $P$ is defined as
$$\|P\| = \sup_{g \in L_{0,1}^2(f_X)} \|Pg\|.$$
Liu, Wong and Kong (1994) showed that $\|P\| = \sup_{f, g \in L_0^2(f_X)} \mathrm{corr}(f(X_0), g(X_1))$, where $\mathrm{corr}(U, V)$ is the classical (Pearson) correlation between two random variables $U$ and $V$. Hence $\|P\|$ describes the strength of correlation between two consecutive steps of the chain. It easily follows that $\|P\| \le 1$, and if the Mtf $P(x, dx')$ is reversible with respect to $f_X$, i.e., if $f_X(x)\, P(x, dx') = f_X(x')\, P(x', dx)$ for all $x, x'$, then $P$ is a self-adjoint operator. For the rest of this section we assume that $P$ is a self-adjoint operator. It is known that $\|P\| < 1$ if and only if the underlying Markov chain is geometrically ergodic (Roberts and Rosenthal (1997)). Rosenthal (2003) showed that for a geometrically ergodic Markov chain, the quantity $1 - \|P\|$, which is called the spectral gap, is a good measure of its (asymptotic) rate of convergence to the stationary distribution.
The spectrum of $P$, $\sigma(P)$, is defined as $\sigma(P) = \{\beta \in \mathbb{R} : P - \beta I \text{ is not invertible}\}$. It is known that $\sigma(P) \subset [-\|P\|, \|P\|] \subset [-1, 1]$ (Retherford, 1993, chap. 6). The operator $P$ is called positive if $\langle Pg, g \rangle \ge 0$ for all $g \in L_0^2(f_X)$. It can be shown that for a positive $P$, $\sigma(P) \subset [0, 1]$ (Retherford, 1993, p. 153). When the state space is finite, the spectrum is simply the set of eigenvalues of the corresponding Markov transition matrix. But, for a general state space $\mathsf{X}$, the spectrum of the Markov operator $P$ can be quite complex. One exception is when $P$ is compact.
The operator $P$ is compact if for any sequence $\{g_n\} \subset L_0^2(f_X)$ with $\|g_n\| \le 1$, there is a subsequence $\{g_{n_k}\}$ such that $\{P g_{n_k}\}$ converges. For a compact operator $P$, all the points (except 0) in the spectrum are eigenvalues, $\sigma(P)$ is at most countable and the spectrum has at most one limit point, namely 0 (Conway, 1990, p. 214). If $\beta_n \downarrow 0$ are the ordered eigenvalues of a positive compact operator $P$, then $\|P\| = \beta_1$ (Retherford, 1993, chap. 7) and in this case $\beta_1$ is necessarily less than 1 (because of ergodicity). The operator $P$ is trace-class if $\sum_{n=1}^\infty \beta_n < \infty$, i.e., the sum of the eigenvalues is finite, and $P$ is Hilbert-Schmidt if $\sum_{n=1}^\infty \beta_n^2 < \infty$. Note that if a positive Markov operator is trace-class then it is automatically Hilbert-Schmidt. Diaconis et al. (2008) showed that if $P$ is Hilbert-Schmidt then the Markov chain's $\chi^2$ distance to its stationary distribution can be written as
$$\int_{\mathsf{X}} \frac{\left[p^n(x'|x) - f_X(x')\right]^2}{f_X(x')}\,\mu(dx') = \sum_i \beta_i^{2n}\, \xi_i^2(x),$$
where $p^n(\cdot|x)$ denotes the density of $X_n$ given $X_0 = x$ and $\{\xi_i\}$ is an orthonormal basis of eigenfunctions corresponding to $\{\beta_i\}$. The above representation shows that among positive Hilbert-Schmidt operators the Markov chains with smaller eigenvalues are likely to have faster convergence to stationarity.
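As a sanity check, the $\chi^2$ identity above can be verified numerically on a finite state space, where the spectral decomposition is explicit. The following minimal sketch uses a hypothetical symmetric (hence reversible, with uniform stationary pmf) $3 \times 3$ transition matrix.

```python
import numpy as np

# Numerical check of the chi-square identity for a hypothetical
# 3-state reversible chain (symmetric P => uniform stationary pmf).
P = np.array([[0.5, 0.3, 0.2],
              [0.3, 0.4, 0.3],
              [0.2, 0.3, 0.5]])
fX = np.full(3, 1/3)

D = np.diag(np.sqrt(fX))
S = D @ P @ np.linalg.inv(D)      # symmetric, same eigenvalues as P
beta, V = np.linalg.eigh(S)       # eigenvalues, orthonormal eigenvectors
xi = V / np.sqrt(fX)[:, None]     # eigenfunctions xi_i(x) = v_i(x)/sqrt(fX(x))
keep = beta < 1 - 1e-10           # drop the trivial eigenvalue 1

for n in (1, 2, 5):
    Pn = np.linalg.matrix_power(P, n)
    direct = ((Pn - fX) ** 2 / fX).sum(axis=1)                # chi^2 at each x
    spectral = (beta[keep] ** (2 * n) * xi[:, keep] ** 2).sum(axis=1)
    assert np.allclose(direct, spectral)
print("chi-square identity verified for n = 1, 2, 5")
```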
In the next section we compare the DA and sandwich algorithms.
3 Comparison of the DA and sandwich algorithms
Consider the Mtd of the DA algorithm given by
$$k(x'|x) = \int_{\mathsf{Y}} f_{X|Y}(x'|y)\, f_{Y|X}(y|x)\,\nu(dy).$$
Recall that $f_{X|Y}$ and $f_{Y|X}$ are the two conditional densities associated with $f(x,y)$. Let $K: L_0^2(f_X) \to L_0^2(f_X)$ be the Markov operator defined by $k(x'|x)$, i.e., $K$ takes $g \in L_0^2(f_X)$ to $(Kg)(x) = \int_{\mathsf{X}} g(x')\, k(x'|x)\,\mu(dx')$. Liu et al. (1994) showed that the DA operator $K$ is always self-adjoint and positive. Following Diaconis et al. (2008), we can write $K$ as $K = Q^* Q$ (see also Buja (1990)), where the operator $Q: L_0^2(f_X) \to L_0^2(f_Y)$ and its adjoint $Q^*: L_0^2(f_Y) \to L_0^2(f_X)$ are defined as follows:
$$(Qg)(y) = \int_{\mathsf{X}} g(x)\, f_{X|Y}(x|y)\,\mu(dx) \quad \text{and} \quad (Q^* h)(x) = \int_{\mathsf{Y}} h(y)\, f_{Y|X}(y|x)\,\nu(dy).$$
Slightly abusing notation, we use $\|\cdot\|$ to denote the norm of any operator regardless of its domain and range. Similarly, we use $\langle \cdot, \cdot \rangle$ as the inner product on both $L_0^2(f_X)$ and $L_0^2(f_Y)$. The following result is a simple extension of Proposition 2.7 in Conway (1990, p. 32).

Proposition 1. $\sqrt{\|K\|} = \|Q\| = \|Q^*\|$.
Let $\tilde{K}: L_0^2(f_X) \to L_0^2(f_X)$ be the operator of the sandwich algorithm with the Mtd
$$\tilde{k}(x'|x) = \int_{\mathsf{Y}} \int_{\mathsf{Y}} f_{X|Y}(x'|y')\, R(y, dy')\, f_{Y|X}(y|x)\,\nu(dy),$$
where $R(y, dy')$ is a Mtf with invariant density $f_Y$. A simple calculation shows that $f_X$ is the invariant density of $\tilde{k}$. Clearly, we can represent $\tilde{K}$ as $\tilde{K} = Q^* R Q$, where $R: L_0^2(f_Y) \to L_0^2(f_Y)$ is the operator corresponding to the Mtf $R(y, dy')$.
We now prove that the norm of the DA chain is at least as large as that of the SA.

Theorem 1. If $K$ and $\tilde{K}$ are the operators corresponding to the DA and sandwich algorithms respectively, then $\|\tilde{K}\| \le \|K\|$.

Proof. Note that
$$\|\tilde{K}\| = \|Q^* R Q\| \le \|Q^*\|\, \|R\|\, \|Q\| = \|R\|\, \|K\| \le \|K\|,$$
where the second inequality is due to the fact that $R$ is a Markov operator (so $\|R\| \le 1$) and the second equality follows from Proposition 1.
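Theorem 1 is easy to check numerically on a finite state space. In the sketch below, the joint pmf $p$ and the lazy independence sampler $R$ are hypothetical choices; since all the kernels involved are reversible, the norms on $L_0^2$ can be read off the nontrivial eigenvalues of the symmetrized transition matrices.

```python
import numpy as np

# Finite-state check of Theorem 1. The joint pmf p on X = Y = {0,1,2}
# and the lazy independence sampler R are hypothetical choices.
p = np.array([[0.05, 0.10, 0.08],
              [0.10, 0.15, 0.07],
              [0.15, 0.05, 0.25]])
fX, fY = p.sum(axis=1), p.sum(axis=0)
A = p / fX[:, None]           # A[x, y] = f_{Y|X}(y | x)
B = (p / fY[None, :]).T       # B[y, x] = f_{X|Y}(x | y)
K = A @ B                     # DA kernel

R = 0.5 * np.eye(3) + 0.5 * np.tile(fY, (3, 1))  # reversible, ||R|| = 0.5
K_tilde = A @ R @ B           # sandwich kernel

def op_norm(M, pi):
    """Norm on L_0^2(pi) of a pi-reversible kernel: largest nontrivial |eigenvalue|."""
    S = np.diag(np.sqrt(pi)) @ M @ np.diag(1 / np.sqrt(pi))
    ev = np.linalg.eigvalsh(S)
    ev = ev[np.abs(ev - 1) > 1e-10]   # drop the trivial eigenvalue 1
    return np.abs(ev).max()

print(op_norm(K, fX), op_norm(K_tilde, fX))  # ||K_tilde|| <= ||K||
```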
We now consider conditions under which $\|\tilde{K}\|$ is strictly smaller than $\|K\|$. Assume that the Mtf $R$ is reversible with respect to $f_Y$. Of course, then $\tilde{k}$ is reversible with respect to $f_X$. If $R$ is also geometrically ergodic, that is, if $\|R\| < 1$, then we have $\|\tilde{K}\| \le \|R\|\, \|K\| < \|K\|$. But, as mentioned in the Introduction, in practice $R$ is often chosen to be a one dimensional reducible Markov chain. In fact, often $R$ is an idempotent operator (i.e., $R^2 = R$). Clearly, in this case if $R \ne 0$ then $\|R\| = 1$. In Theorem 2 below, we establish results comparing DA and SA under the assumption that $R$ is idempotent and the operator $K$ is compact.
The following result is a minor extension of results in Retherford (1993, chap. VII).
Proposition 2. The following statements are equivalent.
• $K$ is compact.
• $Q$ is compact.
• $Q^*$ is compact.
We assume that the DA operator $K$ is compact. So $Q$ and $Q^*$ are also compact. Moreover, the spectral theorem for compact dual pairs and self-adjoint operators (Naylor and Sell (1982, Section 6.14); see also Buja (1990)) guarantees the existence of singular values $\{\lambda_n\}_{n=1}^\infty$ and associated transformations $\{g_n\}_{n=1}^\infty$, $\{h_n\}_{n=1}^\infty$ where
• $\lambda_n \in [0, 1]$ and $\lambda_n \le \lambda_{n-1}$.
• $\{g_n\}$ and $\{h_n\}$ form complete orthonormal bases of $L_0^2(f_X)$ and $L_0^2(f_Y)$ respectively.
• $(Qg_n)(y) = \lambda_n h_n(y)$ and $(Q^* h_n)(x) = \lambda_n g_n(x)$.
• $\int_{\mathsf{X}} \int_{\mathsf{Y}} g_n(x)\, h_{n'}(y)\, f(x,y)\,\mu(dx)\,\nu(dy) = 0$ for $n \ne n'$.

Let $\alpha_1 \ge \alpha_2 \ge \ldots$ be the (ordered) eigenvalues of $K$. Since $K g_n = Q^* Q g_n = \lambda_n Q^* h_n = \lambda_n^2 g_n$, we have $\alpha_n = \lambda_n^2$ and $\|K\| = \alpha_1 = \lambda_1^2$. Also, since $K$ is compact and Harris ergodic, we have $\|K\| < 1$, which in turn shows that $\lambda_n \in [0, 1)$ for all $n = 1, 2, \ldots$. Now we will prove the following theorem.
Theorem 2. Assume that $R$ is idempotent with $\|R\| = 1$ and that the DA operator $K$ is compact. Define $m = \max\{n \in \mathbb{N} : \lambda_n = \lambda_1\}$. Then

1. $\tilde{K}$ is positive and compact.

2. Let $\tilde{\alpha}_1 \ge \tilde{\alpha}_2 \ge \ldots$ be the (ordered) eigenvalues of $\tilde{K}$; then $\tilde{\alpha}_n \le \alpha_n$ for all $n = 1, 2, \ldots$.

3. A necessary and sufficient condition for $\|\tilde{K}\| < \|K\|$ is that
$$R\left(\sum_{i=1}^m a_i h_i\right) = \sum_{i=1}^m a_i h_i$$
holds if and only if $(a_1, a_2, \ldots, a_m) = 0 \in \mathbb{R}^m$.

Remark 1. Note that if $m = 1$, then $\|\tilde{K}\| < \|K\|$ if and only if $Rh_1 \ne h_1$.
Proof. Since $R$ is idempotent and self-adjoint,
$$\langle \tilde{K}g, g \rangle = \langle Q^* R Q g, g \rangle = \langle RQg, Qg \rangle = \langle RQg, RQg \rangle \ge 0,$$
which shows that $\tilde{K}$ is positive. Since $R$ is a bounded operator and $Q$ is compact, a minor extension of results in Retherford (1993, chap. VII) shows that $RQ$ is compact. Then similarly we have that $\tilde{K} = Q^* R Q$ is compact since $Q^*$ is bounded.

As in Khare and Hobert (2011), note that for any $g \in L_0^2(f_X)$,
$$\langle Kg, g \rangle - \langle \tilde{K}g, g \rangle = \langle (K - \tilde{K})g, g \rangle = \langle Q^*(I - R)Qg, g \rangle = \langle (I - R)Qg, (I - R)Qg \rangle \ge 0,$$
i.e., $K - \tilde{K}$ is a positive operator. Then the Courant-Fischer-Weyl minmax characterization of eigenvalues of positive, compact, self-adjoint operators (see, e.g., Voss, 2003) yields
$$\tilde{\alpha}_n = \min_{\dim(V) = n-1}\; \max_{g \in V^\perp,\, g \ne 0} \frac{\langle \tilde{K}g, g \rangle}{\langle g, g \rangle} \;\le\; \min_{\dim(V) = n-1}\; \max_{g \in V^\perp,\, g \ne 0} \frac{\langle Kg, g \rangle}{\langle g, g \rangle} = \alpha_n.$$

We know that $\|Q\| = \lambda_1 = \|Q^*\|$. Then using the properties of compact adjoint operators mentioned above, the proof of part 3 follows directly from the proof of Khare and Hobert's (2011) Theorem 1.
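As a numerical illustration of Theorem 2, the following sketch reuses the same style of finite toy example, now with an idempotent $R$ that redraws $y$ from $f_Y$ restricted to the block containing $y$ (in the spirit of the sign-preserving move of Section 4); the joint pmf is again a hypothetical choice.

```python
import numpy as np

# Finite-state check of Theorem 2, with an idempotent block-projection R.
p = np.array([[0.05, 0.10, 0.08],
              [0.10, 0.15, 0.07],
              [0.15, 0.05, 0.25]])
fX, fY = p.sum(axis=1), p.sum(axis=0)
A = p / fX[:, None]                     # f_{Y|X}(y | x)
B = (p / fY[None, :]).T                 # f_{X|Y}(x | y)

R = np.zeros((3, 3))
for block in (np.array([0, 1]), np.array([2])):
    R[np.ix_(block, block)] = fY[block] / fY[block].sum()
assert np.allclose(R @ R, R)            # R is idempotent

def spec(M, pi):
    """Ordered nontrivial eigenvalues of a pi-reversible kernel."""
    S = np.diag(np.sqrt(pi)) @ M @ np.diag(1 / np.sqrt(pi))
    return np.sort(np.linalg.eigvalsh(S))[::-1][1:]  # drop the eigenvalue 1

alpha = spec(A @ B, fX)                 # DA eigenvalues
alpha_tilde = spec(A @ R @ B, fX)       # SA eigenvalues
assert np.all(alpha_tilde <= alpha + 1e-12)
print(alpha, alpha_tilde)
```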
In the next section we present a compact DA algorithm which is not trace-class. We also construct
a sandwich algorithm where the operator R is idempotent with kRk = 1. Using Theorem 2 we then
show that kK̃k < kKk. Notice that since the DA algorithm in this example is not trace-class, Khare
and Hobert’s (2011) results are not applicable to compare the DA and SA in this case.
4 A toy compact DA algorithm
Let $f_X(x)$ be the hyperbolic secant density given by
$$f_X(x) = \frac{1}{2\cosh(\pi x/2)}, \quad -\infty < x < \infty,$$
with respect to Lebesgue measure on $\mathbb{R}$. While we do not need to use MCMC algorithms to explore $f_X(x)$, it is interesting to construct and compare DA and sandwich algorithms in this context. Consider a joint density $f(x,y)$ given by
$$f(x,y) = f_X(y - x)\, f_X(x), \quad (x,y) \in \mathbb{R}^2,$$
with respect to Lebesgue measure on $\mathbb{R}^2$. Note that $\int_{\mathbb{R}} f(x,y)\,dy = f_X(x)$. Suppose $W_1$ and $W_2$ are two independent standard Cauchy random variables. Then $f_X$ is the density function of $\frac{2}{\pi}\log|W_1|$ (Morris, 1982, p. 73). The marginal density $f_Y$, which is the density of $\frac{2}{\pi}\log|W_1 W_2|$, is given by
$$f_Y(y) = \frac{y}{2\sinh(\pi y/2)}, \quad -\infty < y < \infty.$$
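These distributional claims are easy to check by simulation. The following minimal sketch draws from the two Cauchy-based representations and compares histogram estimates with the closed-form densities $f_X$ and $f_Y$; the sample size, grid and bin width are arbitrary choices.

```python
import numpy as np

# Monte Carlo check of the Cauchy-based representations of f_X and f_Y.
rng = np.random.default_rng(0)
w1 = rng.standard_cauchy(1_000_000)
w2 = rng.standard_cauchy(1_000_000)
x = (2 / np.pi) * np.log(np.abs(w1))        # should follow f_X
y = (2 / np.pi) * np.log(np.abs(w1 * w2))   # should follow f_Y

fX = lambda t: 1 / (2 * np.cosh(np.pi * t / 2))
fY = lambda t: t / (2 * np.sinh(np.pi * t / 2))

edges = np.linspace(-3, 3, 13)
mid = (edges[:-1] + edges[1:]) / 2
width = edges[1] - edges[0]
hist_x = np.histogram(x, bins=edges)[0] / (x.size * width)
hist_y = np.histogram(y, bins=edges)[0] / (y.size * width)
print(np.max(np.abs(hist_x - fX(mid))))     # both should be small (~1e-2)
print(np.max(np.abs(hist_y - fY(mid))))
```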
The Mtd of the corresponding DA algorithm is given by
$$k(x'|x) = \int_{-\infty}^{\infty} f_{X|Y}(x'|y)\, f_X(y - x)\,dy,$$
where the conditional density $f_{X|Y}(x|y)$ is given by
$$f_{X|Y}(x|y) = \frac{\sinh(\pi y/2)}{2y\, \cosh(\pi(y-x)/2)\, \cosh(\pi x/2)},$$
which is not a standard distribution. Diaconis et al. (2008) considered this DA algorithm in their study of Gibbs samplers for location families with conjugate priors. In fact, from Diaconis et al. (2008) it follows that the DA algorithm ($K$) is compact with eigenvalues $\alpha_n = \frac{1}{n+1}$ for $n = 1, 2, \ldots$. Since $\sum_{n=1}^\infty \alpha_n = \sum_{n=1}^\infty \frac{1}{n+1} = \infty$, $K$ is not trace-class.
In order to construct a sandwich algorithm, we use Hobert and Marchev's (2008) recipe based on group action. Consider the multiplicative group $\mathbb{R}_+$, where the group composition is defined as multiplication. The Haar measure on this unimodular group is $\omega(dg) = dg/g$, where $dg$ is Lebesgue measure on $\mathbb{R}_+$. Consider the group action $(g, y) \mapsto gy: \mathbb{R}_+ \times \mathbb{R} \to \mathbb{R}$, where, as the notation suggests, the group acts by multiplication. Then Lebesgue measure on $\mathbb{R}$, $dy$, is relatively invariant with multiplier $\chi(g) = g$ (Eaton (1989)), i.e.,
$$\chi(g) \int_{\mathbb{R}} \phi(gy)\,dy = g \int_{\mathbb{R}} \phi(gy)\,dy = \int_{\mathbb{R}} \phi(y)\,dy,$$
for all $g \in \mathbb{R}_+$ and all integrable $\phi: \mathbb{R} \to \mathbb{R}$.
Let $m(y) = \int_{\mathbb{R}_+} f_Y(gy)\, \chi(g)\,\omega(dg)$. Then
$$m(y) = \int_{\mathbb{R}_+} f_Y(gy)\, \chi(g)\,\omega(dg) = \int_{\mathbb{R}_+} f_Y(gy)\, g\, \frac{dg}{g} = \frac{1}{|y|} \int_0^\infty \frac{z}{2\sinh(\pi z/2)}\,dz = \frac{1}{2|y|}.$$
Note that $m(y)$ is positive for all $y \in \mathbb{R}$ and is finite for all $y \in \mathbb{R} \setminus \{0\}$. Given a fixed $y \ne 0$, assume $g$ has the density (with respect to Lebesgue measure on $\mathbb{R}_+$)
$$\frac{f_Y(gy)\, \chi(g)}{m(y)}\, \frac{1}{g} = \frac{gy\, |y|}{\sinh(\pi gy/2)}, \quad g \in \mathbb{R}_+.$$
Suppose $y' = gy$. Then conditional on $y \ne 0$, the density of $y'$ is
$$r(y'|y) = \frac{y'}{\sinh(\pi y'/2)} \Big[ I_{\mathbb{R}_+}(y)\, I_{\mathbb{R}_+}(y') + I_{\mathbb{R}_-}(y)\, I_{\mathbb{R}_-}(y') \Big].$$
We define the sandwich Mtf $R$ in the SA chain as $R(y, A) = \int_A r(y'|y)\,dy'$ for measurable $A \subset \mathbb{R} \setminus \{0\}$. Then from Hobert and Marchev (2008) it follows that the corresponding Markov operator $R$ is self-adjoint and idempotent with $\|R\| = 1$. From Theorem 2 it follows that the spectrum of the SA chain dominates that of the DA chain, that is, $\tilde{\alpha}_n \le \alpha_n$ for all $n = 1, 2, \ldots$. Note that since $\sum_n \tilde{\alpha}_n^2 \le \sum_n \alpha_n^2 = \sum_n \frac{1}{(n+1)^2} < \infty$, both the DA and sandwich algorithms in this example are Hilbert-Schmidt. We now show that $\|\tilde{K}\| < \|K\|$.
Since the eigenvalues $\alpha_n$, $n = 1, 2, \ldots$, are strictly decreasing, we need to show that $Rh_1 \ne h_1$ (Remark 1), where $\{h_n\}_{n=1}^\infty$ is the orthonormal basis of $L_0^2(f_Y)$ mentioned in Section 3. The eigenfunctions $\{h_n\}_{n=1}^\infty$ are Meixner-Pollaczek orthonormal polynomials (Diaconis et al. (2008)) given by
$$P_n^{\lambda}(y, \varphi) = \frac{(2\lambda)_n}{n!}\, {}_2F_1(-n, \lambda + iy;\, 2\lambda \,|\, 1 - e^{-2i\varphi})\, e^{in\varphi},$$
with $\varphi = \pi/2$, $\lambda = 1$. Here $(a)_0 = 1$; for $n \in \mathbb{N}$, $(a)_n = a(a+1)\cdots(a+n-1)$ and
$${}_rF_s(b_1, \ldots, b_r;\, c_1, \ldots, c_s \,|\, z) = \sum_{l=0}^\infty \frac{(b_1 \cdots b_r)_l\, z^l}{(c_1 \cdots c_s)_l\, l!}, \quad \text{with } (b_1 \cdots b_r)_l = \prod_{i=1}^r (b_i)_l.$$
In particular, simple calculations show that $h_1(y) = 2y$. Since
$$(Rh_1)(y) = \int_{\mathbb{R}} h_1(y')\, r(y'|y)\,dy' = 2\,\mathrm{Sign}(y) \int_0^\infty \frac{z^2}{\sinh(\pi z/2)}\,dz \ne h_1(y),$$
we have $\|\tilde{K}\| < \|K\|$.
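This final step can also be confirmed by numerical integration: $r(\cdot|y)$ integrates to one, and $(Rh_1)(y)$ is a constant multiple of $\mathrm{Sign}(y)$ rather than the linear function $h_1(y) = 2y$. A minimal sketch follows; the integration cutoff and grid size are arbitrary choices (the integrand decays like $e^{-\pi z/2}$).

```python
import numpy as np

# Riemann-sum check of the last display; cutoff 60 and grid size are
# arbitrary, since the integrand decays exponentially.
z = np.linspace(1e-6, 60.0, 2_000_000)
dz = z[1] - z[0]
dens = z / np.sinh(np.pi * z / 2)   # r(.|y) on the half-line of Sign(y)

print((dens * dz).sum())            # ~ 1.0: r(.|y) is a proper density
c = (2 * z * dens * dz).sum()       # (R h1)(y) = c * Sign(y)
print(c)                            # a constant (~1.86), so R h1 != h1
```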
Acknowledgments
The author thanks two anonymous reviewers for helpful comments and suggestions.
References
Buja, A. (1990). Remarks on functional canonical variates, alternating least squares methods and ACE. The Annals of Statistics, 18 1032-1069.

Conway, J. B. (1990). A Course in Functional Analysis. 2nd ed. Springer-Verlag, New York.

Diaconis, P., Khare, K. and Saloff-Coste, L. (2008). Gibbs sampling, exponential families and orthogonal polynomials (with discussion). Statistical Science, 23 151-200.

Eaton, M. L. (1989). Group Invariance Applications in Statistics. Institute of Mathematical Statistics and the American Statistical Association, Hayward, California and Alexandria, Virginia.

Hobert, J. P. and Marchev, D. (2008). A theoretical comparison of the data augmentation, marginal augmentation and PX-DA algorithms. The Annals of Statistics, 36 532-554.

Hobert, J. P. and Román, J. C. (2011). Discussion of "To center or not to center: that is not the question - an ancillarity-sufficiency interweaving strategy (ASIS) for boosting MCMC efficiency" by Y. Yu and X.-L. Meng. Journal of Computational and Graphical Statistics. In press.

Hobert, J. P. and Rosenthal, J. S. (2007). Norm comparisons for data augmentation. Advances and Applications in Statistics, 7 291-302.

Hobert, J. P., Roy, V. and Robert, C. P. (2011). Improving the convergence properties of the data augmentation algorithm with an application to Bayesian mixture modelling. Statistical Science. To appear.

Khare, K. and Hobert, J. P. (2011). A spectral analytic comparison of trace-class data augmentation algorithms and their sandwich variants. The Annals of Statistics. To appear.

Liu, J. S., Wong, W. H. and Kong, A. (1994). Covariance structure of the Gibbs sampler with applications to comparisons of estimators and augmentation schemes. Biometrika, 81 27-40.

Liu, J. S. and Wu, Y. N. (1999). Parameter expansion for data augmentation. Journal of the American Statistical Association, 94 1264-1274.

Meng, X.-L. and van Dyk, D. A. (1999). Seeking efficient data augmentation schemes via conditional and marginal augmentation. Biometrika, 86 301-320.

Morris, C. N. (1982). Natural exponential families with quadratic variance functions. The Annals of Statistics, 10 65-80.

Naylor, A. W. and Sell, G. R. (1982). Linear Operator Theory in Engineering and Science. Springer, New York.

Retherford, J. R. (1993). Hilbert Space: Compact Operators and the Trace Theorem. Cambridge University Press.

Roberts, G. O. and Rosenthal, J. S. (1997). Geometric ergodicity and hybrid Markov chains. Electronic Communications in Probability, 2 13-25.

Rosenthal, J. S. (2003). Asymptotic variance and convergence rates of nearly-periodic Markov chain Monte Carlo algorithms. Journal of the American Statistical Association, 98 169-177.

Roy, V. and Hobert, J. P. (2007). Convergence rates and asymptotic standard errors for MCMC algorithms for Bayesian probit regression. Journal of the Royal Statistical Society, Series B, 69 607-623.

Tanner, M. A. and Wong, W. H. (1987). The calculation of posterior distributions by data augmentation (with discussion). Journal of the American Statistical Association, 82 528-550.

van Dyk, D. A. and Meng, X.-L. (2001). The art of data augmentation (with discussion). Journal of Computational and Graphical Statistics, 10 1-50.

Voss, H. (2003). Variational characterization of eigenvalues of nonlinear eigenproblems. In Proceedings of the International Conference on Mathematical and Computer Modelling in Science and Engineering (M. Kocandrlova and V. Kelar, eds.). Czech Technical University in Prague, 379-383.

Yu, Y. and Meng, X.-L. (2011). To center or not to center: that is not the question - an ancillarity-sufficiency interweaving strategy (ASIS) for boosting MCMC efficiency. Journal of Computational and Graphical Statistics. In press.