On the flexibility of acceptance probabilities in auxiliary variable Metropolis-Hastings algorithms

Geir Storvik
Dept. of Mathematics, University of Oslo
SFI2, Statistics for Innovation
Norwegian Computing Center
SFF-CEES, Centre for Ecological and Evolutionary Synthesis

Warwick, March 16-20, 2009

Outline

◮ MC-methods in complex situations
◮ Simulation using auxiliary variables
◮ Common framework
  ◮ Importance sampling
  ◮ Metropolis-Hastings
◮ Sequentially generated proposals
◮ Non-sequential algorithms
◮ Summary/discussion

Monte Carlo estimation

◮ Interest in $\theta = \int f(y)\pi(y)\,dy = E_\pi[f(y)]$
◮ Markov chain Monte Carlo:
  ◮ $y_1^*, y_2^*, \ldots$ generated through a Markov chain with $\pi$ as invariant distribution
  ◮ Estimated through $\hat\theta = \frac{1}{n}\sum_{i=1}^n f(y_i^*)$

Importance sampling

◮ $y_i^*$ iid from $q(\cdot)$
◮ Estimated through $\hat\theta = \frac{1}{n}\sum_{i=1}^n \frac{\pi(y_i^*)}{q(y_i^*)}\,f(y_i^*)$

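To make the two estimators concrete, here is a minimal Python sketch (an illustration, not from the talk) for a hypothetical toy problem: $\pi = N(0,1)$, $f(y) = y^2$ (so $\theta = 1$), an $N(0, 2^2)$ importance proposal, and a random-walk Metropolis chain.

import numpy as np

rng = np.random.default_rng(0)
n = 10_000
f = lambda y: y ** 2                         # true theta = E_pi[y^2] = 1
log_pi = lambda y: -0.5 * y ** 2             # log N(0,1) density, constant dropped

# Importance sampling: y_i* iid from q = N(0, 2^2)
y = rng.normal(0.0, 2.0, size=n)
log_q = -0.5 * (y / 2.0) ** 2 - np.log(2.0)  # same constant dropped, so it cancels
theta_is = np.mean(np.exp(log_pi(y) - log_q) * f(y))

# MCMC: random-walk Metropolis with pi as invariant distribution
chain = np.empty(n)
cur = 0.0
for i in range(n):
    prop = cur + rng.normal()
    if np.log(rng.uniform()) < log_pi(prop) - log_pi(cur):
        cur = prop
    chain[i] = cur
theta_mcmc = f(chain).mean()
print(theta_is, theta_mcmc)                  # both approximately 1
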
Complex situations: Simple methods not enough

◮ Multiple modes
◮ Jumps between models
◮ Densities with long, narrow contours

Combining importance sampling/MCMC

◮ MCMC gives samples close to $\pi(y)$
◮ Importance sampling corrects for the deviation
◮ Possible to combine?

Using MCMC within importance sampling:

◮ Generate $y^*$ through an MCMC algorithm $x_1^*, x_2^*, \ldots, x_{t+1}^* = y^*$
◮ Importance weight $\pi(y^*)/q_y(y^*)$ with
  $q_y(y^*) = q_y(x_{t+1}) = \int_{x_{1:t}} q_1(x_1) \prod_{j=2}^{t+1} q(x_j|x_{j-1})\,dx_{1:t}$
◮ Problem: $q_y(y)$ not easy to compute
◮ Other proposal schemes of interest
◮ Many “proper” weights possible through an extended space formulation

Use of auxiliary variables within MH

◮ Current value $y$, simulate $y^* \sim q_y(\cdot|y)$
◮ $y^{\text{new}} = \begin{cases} y^* & \text{with prob } \alpha(y; y^*) = \min\left\{1, \frac{\pi(y^*)\,q_y(y|y^*)}{\pi(y)\,q_y(y^*|y)}\right\} \\ y & \text{otherwise} \end{cases}$

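As a minimal sketch, the ideal acceptance probability written as a function; it is usable only when the proposal density can be evaluated pointwise, which is exactly what fails once $y^*$ is generated through auxiliary variables. The function and argument names are placeholders.

import numpy as np

def alpha_ideal(y, y_star, log_pi, log_q):
    """min{1, pi(y*) q(y|y*) / (pi(y) q(y*|y))}; log_q(a, b) = log q(a|b).
    Requires a pointwise-evaluable proposal density q."""
    log_r = (log_pi(y_star) + log_q(y, y_star)) - (log_pi(y) + log_q(y_star, y))
    return min(1.0, float(np.exp(log_r)))
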
Simulation through auxiliary variables

◮ Generate
  $x_1^* \sim q_1(x_1^*|y)$,
  $x_j^* \sim q_j(x_j^*|x_{j-1}^*)$, $j = 2, \ldots, t$,
  $y^* \sim q_{y|x}(y^*|x_t^*)$.
◮ What acceptance probability should now be used?
◮ Ideal $\alpha(y; y^*)$ usually not possible to evaluate
◮ Many “proper” choices are available through working in an extended space.

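A sketch of this generation scheme; the samplers sample_q1, sample_qj and sample_y_given_x are hypothetical caller-supplied callables.

def propose_via_chain(y, sample_q1, sample_qj, sample_y_given_x, t, rng):
    """Generate x*_1, ..., x*_t and then y*, as on this slide (sketch)."""
    x = sample_q1(y, rng)              # x*_1 ~ q_1(.|y)
    chain = [x]
    for j in range(2, t + 1):          # x*_j ~ q_j(.|x*_{j-1}), j = 2, ..., t
        x = sample_qj(j, x, rng)
        chain.append(x)
    y_star = sample_y_given_x(x, rng)  # y* ~ q_{y|x}(.|x*_t)
    return chain, y_star
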
Examples from literature

◮ Annealed importance sampling (Neal, 2001)
◮ Mode jumping (Tjelmeland and Hegstad, 2001; Jennison and Sharp, 2007)
◮ Model selection and reversible jump MCMC (Al-Awadhi et al., 2004)
◮ Delayed rejection sampling (Tierney and Mira, 1999; Green and Mira, 2001)
◮ Pseudo-marginal algorithms (Beaumont, 2003)
◮ Likelihoods with intractable normalising constants (Møller et al., 2006)
◮ Directional MCMC (Eidsvik and Tjelmeland, 2006)
◮ Multi-try methods and particle proposals (Liu et al., 2000; Andrieu et al., 2008)

Common framework

◮ Different ways of motivating the algorithms
◮ Different ways of proving their validity
◮ Possible to put all into a common framework: “standard” use of importance sampling/MCMC on extended spaces
  ◮ Easier to understand
  ◮ Easier to construct alternative acceptance probabilities
  ◮ Toolbox for constructing/proving validity of other algorithms

Auxiliary variables and importance sampling

◮ Simulate $y^*$ through $x^* \sim q_x(\cdot)$ and $y^* \sim q_{y|x}(\cdot|x^*)$.
◮ For any conditional distribution $h(x|y)$,
  $\theta = \int_y f(y)\pi(y)\,dy = \int_y \int_x f(y)\,\pi(y)h(x|y)\,dx\,dy$
◮ $\theta = \int_y \int_x f(y)\,\frac{\pi(y)h(x|y)}{q_x(x)q_{y|x}(y|x)}\,q_x(x)q_{y|x}(y|x)\,dx\,dy$
◮ Importance sampling:
  $\hat\theta = \frac{1}{n}\sum_{i=1}^n w(x^i, y^i) f(y^i)$, with
  $w(x, y) = \frac{\pi(y)h(x|y)}{q_x(x)q_{y|x}(y|x)}$
◮ Many possible weights for different choices of $h$!

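A sketch of the resulting estimator; all density evaluators and samplers are hypothetical caller-supplied functions, kept on log scale for stability.

import numpy as np

def aux_is_estimate(f, log_pi, log_h, log_qx, log_qyx,
                    sample_x, sample_y_given_x, n, rng):
    """Importance sampling on the extended space; log_h(x, y) = log h(x|y),
    log_qyx(y, x) = log q_{y|x}(y|x)."""
    total = 0.0
    for _ in range(n):
        x = sample_x(rng)                    # x* ~ q_x(.)
        y = sample_y_given_x(x, rng)         # y* ~ q_{y|x}(.|x*)
        log_w = log_pi(y) + log_h(x, y) - log_qx(x) - log_qyx(y, x)
        total += np.exp(log_w) * f(y)
    return total / n
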
Properties

◮ Define $q_y(y) = \int_x q_x(x)q_{y|x}(y|x)\,dx$. Then
  $E[w(x, y)|y] = \frac{\pi(y)}{q_y(y)}$ and
  $\operatorname{Var}[w(x, y)] \ge \operatorname{Var}\left[\frac{\pi(y)}{q_y(y)}\right]$

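Filling in the step behind these identities: under the proposal, the conditional density of $x$ given $y$ is $q_x(x)q_{y|x}(y|x)/q_y(y)$, so
\[
E[w(x,y)\mid y]
  = \int_x \frac{\pi(y)h(x\mid y)}{q_x(x)q_{y\mid x}(y\mid x)}\,
    \frac{q_x(x)q_{y\mid x}(y\mid x)}{q_y(y)}\,dx
  = \frac{\pi(y)}{q_y(y)}\int_x h(x\mid y)\,dx
  = \frac{\pi(y)}{q_y(y)},
\]
and the variance bound is the law of total variance:
\[
\operatorname{Var}[w(x,y)]
  = E[\operatorname{Var}(w\mid y)] + \operatorname{Var}(E[w\mid y])
  \ge \operatorname{Var}\!\left[\frac{\pi(y)}{q_y(y)}\right].
\]
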
Combination of h's

◮ Combination of $h$'s: if $h_s(x|y)$, $s = 1, 2, \ldots$ are conditional distributions, then for $a_s \ge 0$ with $\sum_s a_s = 1$,
  $h(x|y) = \sum_s a_s h_s(x|y)$ is a conditional distribution
◮ Combination of $w$'s: if
  $w_s(x, y) = \frac{\pi(y)h_s(x|y)}{q_x(x)q_{y|x}(y|x)}$
  are proper weight functions, then $\sum_s a_s w_s(x, y)$ is also a proper weight function (see the check below)

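The check is one line, using $\sum_s a_s = 1$ and that each $w_s$ is proper, i.e. $E[w_s(x,y)f(y)] = \theta$:
\[
E\Bigl[\sum_s a_s w_s(x,y)\, f(y)\Bigr]
  = \sum_s a_s\, E[w_s(x,y) f(y)]
  = \theta \sum_s a_s = \theta .
\]
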
Proposal through MCMC-steps

◮ Assume $x_1^* \sim q_1(\cdot)$, $x_j^* \sim q(\cdot|x_{j-1}^*)$, $j = 2, \ldots, t+1$, with
  $\pi(x)q(y|x) = \pi(y)q(x|y)$
◮ Proposal $y^* = x_{t+1}^*$
◮ Note: $\frac{\pi(y)}{q_y(y)} \approx 1$ for $t$ large
◮ Possible to choose $h(x|y)$ such that $w(x, y) \to 1$ as $t \to \infty$?

Proposal through MCMC (cont)

◮ $h_s(x_{1:t}|y = x_{t+1}) = q_1(x_1) \prod_{j=2}^{s-1} q(x_j|x_{j-1}) \prod_{j=s}^{t} q(x_j|x_{j+1})$ implies
  $w_s(x, y) = \frac{\pi(x_s)}{q(x_s|x_{s-1})}$
◮ Special cases:
  ◮ $s = 1$: $w_1(x, y) = \frac{\pi(x_1)}{q_1(x_1)}$
  ◮ $s = t+1$: $w_{t+1}(x, y) = \frac{\pi(x_{t+1})}{q(x_{t+1}|x_t)}$
  ◮ Combination: $\bar w(x, y) = \frac{1}{t}\sum_{s=2}^{t+1} \frac{\pi(x_s)}{q(x_s|x_{s-1})}$
◮ Note: $\bar w(x, y) \to E[w(x, y)] = 1$ as $t \to \infty$

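A sketch of the three weight choices computed along one realised inner chain; it assumes the transition densities $q_1$ and $q$ are pointwise evaluable (as for a Langevin-type kernel), with log_q1 and log_q as caller-supplied placeholders.

import numpy as np

def chain_weights(x, log_pi, log_q1, log_q):
    """x = [x_1, ..., x_{t+1}] with y* = x_{t+1}; log_q(a, b) = log q(a|b)."""
    t = len(x) - 1
    w1 = np.exp(log_pi(x[0]) - log_q1(x[0]))               # s = 1
    wlast = np.exp(log_pi(x[-1]) - log_q(x[-1], x[-2]))    # s = t + 1
    wbar = np.mean([np.exp(log_pi(x[s]) - log_q(x[s], x[s - 1]))
                    for s in range(1, t + 1)])             # average, s = 2, ..., t+1
    return w1, wlast, wbar
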
MH and auxiliary variables

◮ Ideas from importance sampling can be transferred to MCMC
◮ Current value $y$; simulate $x^* \sim q_x(\cdot|y)$, $y^* \sim q_{y|x}(\cdot|x^*)$
◮ MH algorithm constructed on the extended space $(x, y)$:
  $r(x, y; x^*, y^*) = \frac{w(y; x^*, y^*)}{w(y^*; x, y)}$
◮ Different strategies possible:
  ◮ Store previous $x$
  ◮ Generate $x^*$, $y^*$ and $x$

Proposal through “inner” MCMC steps

◮ Assume $x_1^* \sim q_1(\cdot|y)$, $x_j^* \sim q(\cdot|x_{j-1}^*)$, $j = 2, 3, \ldots, t+1$, with
  $\pi(x)q(y|x) = \pi(y)q(x|y)$
◮ Proposal $y^* = x_{t+1}^*$
◮ Acceptance ratio $r(y; x^*, y^*) = \frac{w(y; x^*, y^*)}{w(y^*; x, y)}$

Special cases

◮ $w_1(x, y) = \frac{\pi(x_1)}{q_1(x_1|y)}$
◮ $w_{t+1}(x, y) = \frac{\pi(x_{t+1})}{q(x_{t+1}|x_t)}$
◮ $\bar w(x, y) = \frac{1}{t}\sum_{s=2}^{t+1} \frac{\pi(x_s)}{q(x_s|x_{s-1})}$
◮ Note: $\bar w(x, y) \to E[w(x, y)] = 1$ as $t \to \infty$

Example

◮ $\pi(y) = \beta N(y; \mu_1, I) + (1-\beta) N(y; \mu_2, I)$ where
  $\mu_{1,1} = -\mu_{2,1} = -10$ while $\mu_{i,j} = 0$, $i = 1, 2$, $j = 2, \ldots, p$.
◮ $x_1^* \sim N(0, \sigma_{\text{large}}^2 I)$
◮ $t + 1 = 10$ “inner” MH steps (discrete Langevin diffusion)
◮ Previous $x$ stored
◮ $M = 20$ outer MCMC-steps.
◮ Weights compared: $w_1(x, y) = \frac{\pi(x_1)}{q_1(x_1)}$, $w_{t+1}(x, y) = \frac{\pi(x_{t+1})}{q(x_{t+1}|x_t)}$, and $\bar w(x, y)$

[Figure: distance to truth for the different weight functions $w_1$, $w_{t+1}$, $\bar w$; panels (a)-(f) for dimensions p = 1, 2, 3, 5, 10.]

More general schemes

◮ $x^* \sim q_x(\cdot|y)$, $y^* \sim q_{y|x}(\cdot|x^*)$.
◮ Some sequential structure in the generation of $x^* = (x_1^*, \ldots, x_t^*)$.
◮ Examples:
  ◮ Annealed importance sampling (Neal, 2001)
  ◮ Mode jumping (Tjelmeland and Hegstad, 2001; Jennison and Sharp, 2007)
  ◮ Model selection and reversible jump MCMC (Al-Awadhi et al., 2004)
  ◮ Particle proposals (Andrieu et al., 2008)

RJMCMC proposals (Al-Awadhi et al., 2004)

◮ $x_1^*$: jump between models
◮ $x_j^*$, $j = 2, \ldots, t+1$: jump within model
◮ Al-Awadhi et al. (2004): $w_1(y; x^*, y^*) = \frac{\pi(n)\pi(x_1^*)}{q_M(n|m, y)\, q_n(x_1^*|y)}$
◮ Alternatives:
  $w_s(y; x^*, y^*) = \frac{\pi(n)\pi(x_s^*)}{q_M(n|m, y)\, q_n(x_s^*|x_{s-1}^*)}$
  imply
  $r = \frac{\pi(n)\, q_M(m|n, y^*) \sum_{s=2}^{t+1} \pi_n(x_s^*)/q_n(x_s^*|x_{s-1}^*)}{\pi(m)\, q_M(n|m, y) \sum_{s=2}^{t+1} \pi_m(x_s)/q_m(x_s|x_{s-1})}$
◮ Converges towards $\pi(n)q_M(m|n, y^*)/[\pi(m)q_M(n|m, y)]$

Particle proposals

◮ Generate parallel sequences $\{x_{j,1:t+1}^*,\ j = 1, \ldots, N\}$
◮ Each sequence independently: $x_{j,i}^* \sim q_i(x_{j,i}^*|x_{j,i-1}^*)$
◮ Choose $K \in \{1, \ldots, N\}$ with
  $\Pr(K) \propto \frac{\pi(x_{K,t+1}^*)}{q_{t+1}(x_{K,t+1}^*|x_{K,t}^*)}$,
  and put $y^* = x_{K,t+1}^*$.
◮ $h(x|y, x^*, y^*) = N^{-1}\,\delta(x_{K,t+1} - y) \prod_{i=1}^{t} q_i(x_{K,i}|x_{K,i-1}) \prod_{j \ne K} \prod_{i=1}^{t+1} q_i(x_{j,i}|x_{j,i-1})$
  gives
  $r(y; x^*, y^*, x) = \frac{\sum_{j=1}^N \pi(x_{j,t+1}^*)/q_{t+1}(x_{j,t+1}^*|x_{j,t}^*)}{\sum_{j=1}^N \pi(x_{j,t+1})/q_{t+1}(x_{j,t+1}|x_{j,t})}$

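A sketch of this step, under the assumption that only the final pair $(x_{j,t+1}, x_{j,t})$ per sequence is needed for the weights; all names are placeholders.

import numpy as np

def final_weights(seqs, log_pi, log_qt1):
    """Per-sequence weights pi(x_{j,t+1}) / q_{t+1}(x_{j,t+1}|x_{j,t});
    seqs[j] = (x_{j,t+1}, x_{j,t}), log_qt1(a, b) = log q_{t+1}(a|b)."""
    return np.array([np.exp(log_pi(xt1) - log_qt1(xt1, xt)) for xt1, xt in seqs])

def particle_proposal(new_seqs, old_seqs, log_pi, log_qt1, rng):
    w_new = final_weights(new_seqs, log_pi, log_qt1)
    k = rng.choice(len(w_new), p=w_new / w_new.sum())   # Pr(K) prop. to w_K
    y_star = new_seqs[k][0]                             # y* = x*_{K,t+1}
    r = w_new.sum() / final_weights(old_seqs, log_pi, log_qt1).sum()
    return y_star, r
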
Non-sequential examples

◮ Pseudo-marginal algorithms (Beaumont, 2003; Andrieu and Roberts, 2009)
◮ Likelihoods with intractable normalising constants (Møller et al., 2006)
◮ Delayed rejection sampling (Tierney and Mira, 1999; Green and Mira, 2001)

Delayed rejection sampling

◮ $x_1^* \sim q_1(\cdot|y)$, accepted with prob. $\alpha_1(y; x_1^*)$
◮ If rejected, $x_2^* \sim q_2(\cdot|x_1^*, y)$, accepted with prob. $\alpha_2(y; x_1^*, x_2^*)$
◮ $y^* = x_1^*$ with prob $\alpha_1$, otherwise $= x_2^*$.
◮ Tierney and Mira (1999):
  $\alpha_2(y; x_1^*, x_2^*) = \min\left\{1, \frac{\pi(x_2^*)\, q_1(x_1^*|x_2^*)\, q_2(y|x_1^*, x_2^*)\,[1 - \alpha_1(x_2^*; x_1^*)]}{\pi(y)\, q_1(x_1^*|y)\, q_2(x_2^*|x_1^*, y)\,[1 - \alpha_1(y; x_1^*)]}\right\}$
◮ Alternative: generate $x_1 \sim q_1(x_1|y^*)$,
  $\alpha_2(y; x_1^*, x_2^*) = \min\left\{1, \frac{\pi(y^*)\, q_2(y|x_1, y^*)\,[1 - \alpha_1(y^*; x_1)]}{\pi(y)\, q_2(y^*|x_1^*, y)\,[1 - \alpha_1(y; x_1^*)]}\right\}$
◮ $q_2 = \pi \Rightarrow \alpha_2(y; x_1^*, x_2^*) = \min\left\{1, \frac{1 - \alpha_1(y^*; x_1)}{1 - \alpha_1(y; x_1^*)}\right\}$

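A sketch of one two-stage step in the Tierney-Mira form above; the samplers and log densities are caller-supplied placeholders, everything on log scale.

import numpy as np

def dr_step(y, log_pi, sample_q1, log_q1, sample_q2, log_q2, rng):
    """One delayed-rejection step; log_q1(a, b) = log q_1(a|b),
    log_q2(a, b, c) = log q_2(a|b, c)."""
    x1 = sample_q1(y, rng)
    log_a1 = min(0.0, log_pi(x1) + log_q1(y, x1) - log_pi(y) - log_q1(x1, y))
    if np.log(rng.uniform()) < log_a1:
        return x1                                    # first-stage accept
    x2 = sample_q2(x1, y, rng)
    # alpha_1(x2; x1): the first-stage probability for the reverse move
    log_a1_rev = min(0.0, log_pi(x1) + log_q1(x2, x1) - log_pi(x2) - log_q1(x1, x2))
    log_num = (log_pi(x2) + log_q1(x1, x2) + log_q2(y, x1, x2)
               + np.log1p(-np.exp(log_a1_rev)))
    log_den = (log_pi(y) + log_q1(x1, y) + log_q2(x2, x1, y)
               + np.log1p(-np.exp(log_a1)))
    if np.log(rng.uniform()) < min(0.0, log_num - log_den):
        return x2                                    # second-stage accept
    return y                                         # both stages rejected
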
Example

◮ $z_i = \mu + \eta_i + \varepsilon_i$, $i = 1, \ldots, n$
◮ $\varepsilon_i \overset{\text{iid}}{\sim} N(0, \tau_1^{-1})$
◮ $\eta_i|\eta_{-i} \sim N\bigl(\beta n_i^{-1} \sum_{j \sim i} \eta_j,\; n_i^{-1}\tau_2^{-1}\bigr)$ (CAR)
◮ Interest in posterior for $(\tau_1, \tau_2)$.

[Figure: posterior for $(\tau_1, \tau_2)$; axes $\tau_1$ and $\tau_2$, each from 0 to 5.]

Delayed rejection sampling

◮ First stage: $\tau_j^1 = \tau_j f_j$, with
  $q(f_j) \propto (1 + f_j^{-1})\, I(f_j \in [F^{-1}, F])$, $j = 1, 2$
◮ Second stage: re-parametrisation
  $\tau = \frac{\tau_1 \tau_2}{\tau_1 + \tau_2}$, $r = \frac{\tau_2}{\tau_1 + \tau_2}$,
  with $r \sim U[0, 1]$, $\tau \sim \pi(\tau|r)$
◮ Compare two choices for $\alpha_2$.

[Figure: KS-distance to the posterior for $(\tau_1, \tau_2)$ against computing time (up to 1e+05), comparing the two delayed-rejection variants DR1 and DR2.]

Pseudo-marginal algorithms

◮ Target given through $\pi(y) = \int_x \bar\pi(x, y)\,dx$
◮ Simulate $y^* \sim q_y(y^*|y)$
◮ Simulate $x_1^*, \ldots, x_t^* \sim q_{x|y}(x^*|y^*)$
◮ $h_s(x_{1:t}|y) = \bar\pi(x_s|y) \prod_{j \ne s} q_{x|y}(x_j|y)$
◮ Acceptance ratio (Beaumont, 2003):
  $r_{x,y}(x^*, y^*) = \frac{w(y, x^*, y^*)}{w(y^*, x, y)}$
  where
  $w(y, x^*, y^*) = \frac{1}{t}\sum_{i=1}^t \frac{\bar\pi(x_i^*, y^*)}{q_{x|y}(x_i^*|y^*)\, q_y(y^*|y)} \approx \frac{\pi(y^*)}{q_y(y^*|y)}$
◮ Variant: generate new $x$'s each time

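A sketch of the pseudo-marginal weight; the acceptance ratio is then pm_weight for the proposed pair divided by the stored weight of the current pair. Density evaluators are placeholders on log scale.

import numpy as np

def pm_weight(y_from, y_to, xs, log_pibar, log_qxy, log_qy):
    """w(y_from, x*, y_to): an unbiased estimate of pi(y_to) divided by the
    proposal density q_y(y_to|y_from); xs are the t draws x_i ~ q_{x|y}(.|y_to),
    log_pibar(x, y) = log pibar(x, y), log_qxy(x, y) = log q_{x|y}(x|y)."""
    est = np.mean([np.exp(log_pibar(x, y_to) - log_qxy(x, y_to)) for x in xs])
    return est / np.exp(log_qy(y_to, y_from))

# Acceptance ratio: r = pm_weight(y, y_star, xs_star, ...) / pm_weight(y_star, y, xs, ...)
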
Intractable normalising constants (Møller et al., 2006)

◮ $\pi(y) = p(y|z) = C^{-1} p(y) p(z|y)$ with $p(z|y) = Z^{-1}(y)\,\tilde p(z|y)$
◮ $(x, y)$ is the current state
◮ $q(x^*, y^*|x, y) = q_y(y^*|x, y)\, q_{x|y}(x^*|x, y, y^*)$ with
  $q_{x|y}(x^*|x, y, y^*) = Z^{-1}(y^*)\,\tilde p(x^*|y^*)$
◮ $h(x|y)$ arbitrary, but with state space similar to that of $z$:
  $r(x, y; x^*, y^*) = \frac{p(y^*)\,\tilde p(z|y^*)\, h(x^*|y^*)\, q_y(y|x^*, y^*)\,\tilde p(x|y)}{p(y)\,\tilde p(z|y)\, h(x|y)\, q_y(y^*|x, y)\,\tilde p(x^*|y^*)}$
◮ $h(x|y, x^*, y^*) = \delta(x - x^*)$ gives
  $r(x^*, y^*|x, y) = \frac{p(y^*)\,\tilde p(z|y^*)\, q_y(y|y^*)\,\tilde p(x^*|y)}{p(y)\,\tilde p(z|y)\, q_y(y^*|y)\,\tilde p(x^*|y^*)}$

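A sketch of the resulting log acceptance ratio with the delta choice, assuming an exact sampler for $x^* \sim \tilde p(\cdot|y^*)/Z(y^*)$ is available (as the method requires); all function names are placeholders.

def log_accept_ratio(y, y_star, x_star, z, log_p, log_ptilde, log_qy):
    """Log of the ratio on this slide; log_ptilde(a, b) = log ptilde(a|b),
    log_qy(a, b) = log q_y(a|b). x_star must be an exact draw from
    ptilde(.|y*)/Z(y*)."""
    return ((log_p(y_star) + log_ptilde(z, y_star) + log_qy(y, y_star)
             + log_ptilde(x_star, y))
            - (log_p(y) + log_ptilde(z, y) + log_qy(y_star, y)
               + log_ptilde(x_star, y_star)))
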
Summary/discussion

◮ General framework: $x^* \sim q_x(\cdot|y)$, $y^* \sim q_{y|x}(\cdot|x^*)$
◮ Many existing algorithms fall within this framework
◮ Many different weight functions/acceptance probabilities are possible
◮ The general framework is
  ◮ a tool for constructing new algorithms
  ◮ a tool for constructing alternative weight functions
  ◮ a common understanding of many different algorithms
◮ Further work: experimental/theoretical results on properties of different weight functions

References

Al-Awadhi, F., M. Hurn, and C. Jennison (2004). Improving the acceptance rate of reversible jump MCMC proposals. Statistics & Probability Letters 69, 189–198.
Andrieu, C., A. Doucet, and R. Holenstein (2008). Particle Markov chain Monte Carlo. Preprint.
Andrieu, C. and G. O. Roberts (2009). The pseudo-marginal approach for efficient Monte Carlo computations. Annals of Statistics. To appear.
Beaumont, M. (2003). Estimation of population growth or decline in genetically monitored populations. Genetics 164(3), 1139–1160.
Eidsvik, J. and H. Tjelmeland (2006). On directional Metropolis-Hastings algorithms. Statistics and Computing 16, 93–106.
Green, P. and A. Mira (2001). Delayed rejection in reversible jump Metropolis-Hastings. Biometrika 88(4), 1035–1053.
Jennison, C. and R. Sharp (2007). Mode jumping in MCMC: Adapting proposals to the local environment. Talk at conference in honour of Allan Seheult, Durham, March 2007. Available at http://people.bath.ac.uk/mascj/.
Liu, J., F. Liang, and W. Wong (2000). The multiple-try method and local optimization in Metropolis sampling. Journal of the American Statistical Association 95(449), 121–134.
Møller, J., A. Pettitt, R. Reeves, and K. Berthelsen (2006). An efficient Markov chain Monte Carlo method for distributions with intractable normalising constants. Biometrika 93(2), 451–458.
Neal, R. M. (2001). Annealed importance sampling. Statistics and Computing 11(2), 125–139.
Tierney, L. and A. Mira (1999). Some adaptive Monte Carlo methods for Bayesian inference. Statistics in Medicine 18(17–18), 2507–2515.
Tjelmeland, H. and B. Hegstad (2001). Mode jumping proposals in MCMC. Scandinavian Journal of Statistics 28(1), 205–223.