Beyond MCMC: two case studies and a few thoughts

Nicolas Chopin
(joint work with Tony Lelièvre, Gabriel Stoltz, Judith Rousseau, Brunero Liseo)
nicolas.chopin@ensae.fr
ENSAE-CREST, Malakoff, FRANCE
Limitations of MCMC
slow: error is O(N^{-1/2}); correlation between successive steps.
local exploration: how to assess convergence if the posterior is not unimodal?
tedious to tune ⇒ these algorithms are not robust.
bugs are difficult to detect.
Alternatives
fast approximations: variational Bayes, Expectation-Propagation, INLA (see Håvard's talk).
adaptive MCMC;
adaptive biasing; e.g. Wang-Landau.
Sequential Monte Carlo, see e.g. Doucet et al. (2006).
First case study
Gaussian Mixture model:
\[
p(y_i \mid \theta) = \sum_{k=1}^{K} q_k \sqrt{\frac{v_k}{2\pi}} \exp\left\{-\frac{v_k}{2}\,(y_i - \mu_k)^2\right\}
\]
\[
\mu_1, \dots, \mu_K \sim N(m, s^2), \quad v_1, \dots, v_K \sim G(a, b), \quad q_1, \dots, q_K \sim \mathrm{Dir}(1, \dots, 1)
\]
or equivalently
\[
q_k = \omega_k / (\omega_1 + \dots + \omega_K), \quad \omega_1, \dots, \omega_K \sim \mathrm{Exp}(1).
\]
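For concreteness, a minimal Python sketch of this likelihood (the function name and vectorised layout are mine; q are the weights, mu the means, v the precisions):

    import numpy as np

    def mixture_loglik(y, q, mu, v):
        """Log-likelihood of the K-component Gaussian mixture above.
        y: data (n,); q: weights; mu: means; v: precisions (inverse variances)."""
        y, q, mu, v = map(np.asarray, (y, q, mu, v))
        # shape (n, K): log of q_k * sqrt(v_k / 2pi) * exp(-v_k (y_i - mu_k)^2 / 2)
        log_comp = (np.log(q) + 0.5 * np.log(v / (2 * np.pi))
                    - 0.5 * v * (y[:, None] - mu) ** 2)
        # log-sum-exp over components, then sum over observations
        m = log_comp.max(axis=1, keepdims=True)
        return float(np.sum(m[:, 0] + np.log(np.exp(log_comp - m).sum(axis=1))))

    # e.g. a two-component toy call
    ll = mixture_loglik(np.random.default_rng(0).normal(size=100),
                        q=[0.5, 0.5], mu=[-1.0, 1.0], v=[1.0, 2.0])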
Hidalgo stamps dataset
[Figure: histogram of the Hidalgo stamps data, values roughly between 6 and 14.]
Quotes
Jasra et al. (2005): "We feel that the Gibbs sampler run with completion is often not worth programming [...] since the chance of it failing to converge is too high."
Celeux et al. (2000): "Although somewhat presumptuous, we consider that almost the entirety of MCMC implemented for mixture models has failed to converge."
Molecular simulation
Concerns simulation of some continuous-time dynamics, e.g.
\[
dX_t = -\nabla V(X_t)\, dt + (\beta/2)^{-1/2}\, dW_t
\]
with associated Boltzmann-Gibbs density:
\[
p(x) \propto \exp\{-\beta V(x)\}.
\]
(In practice, approximated by long runs of random walk Hastings-Metropolis.)
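To make the dynamics concrete, here is a minimal Euler-Maruyama discretisation sketch (the step size, the double-well potential and all names are illustrative, not from the talk):

    import numpy as np

    def euler_maruyama(grad_V, x0, beta, h, n_steps, rng=None):
        """Discretise dX_t = -grad V(X_t) dt + sqrt(2/beta) dW_t with time step h."""
        rng = rng or np.random.default_rng()
        x = np.asarray(x0, dtype=float)
        path = np.empty((n_steps + 1,) + x.shape)
        path[0] = x
        for t in range(n_steps):
            x = x - grad_V(x) * h + np.sqrt(2.0 * h / beta) * rng.standard_normal(x.shape)
            path[t + 1] = x
        return path

    # example: double-well potential V(x) = (x**2 - 1)**2, a textbook metastable case
    path = euler_maruyama(lambda x: 4.0 * x * (x**2 - 1.0),
                          x0=[-1.0], beta=5.0, h=1e-3, n_steps=10_000)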
Metastability
[Figure: energy landscape plotted against the reaction coordinate ξ, illustrating metastability.]
Adaptive biased sampling
Aim is to sample from the biased potential \(\tilde{V} = V - F(\xi)\), where F is the free energy, i.e.
\[
\tilde{p}(x) \propto \exp\{-\beta \tilde{V}(x)\}
\]
is such that the marginal of ξ is uniform.
At iteration t, perform a Hastings-Metropolis move w.r.t. \(\tilde{V}_t = V - F_t(\xi)\), and adjust \(F_t\) on the fly.
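For reference (not spelled out on the slide), the standard definition of the free energy associated with a reaction coordinate ξ is, up to an additive constant,
\[
F(z) = -\frac{1}{\beta} \log \int \exp\{-\beta V(x)\}\, \delta\bigl(\xi(x) - z\bigr)\, dx,
\]
so that replacing V by \(\tilde{V} = V - F(\xi)\) makes the marginal density of ξ under \(\tilde{p}\) constant.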
Wang-Landau (ABP)
At each iteration t, compute, for ξ in the small bin [e_k, e_{k+1}],
\[
F_t(\xi) = \frac{1}{N} \sum_{i=1}^{n} \mathbb{1}\{\xi(x_i) \in [e_k, e_{k+1}]\},
\]
i.e. it progressively penalises already-visited regions, see e.g. Atchadé and Liu (2004).
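A minimal sketch of this idea, assuming a fixed binning of ξ and a simple Metropolis step under the biased potential V − F_t(ξ); the bias is the normalised visit histogram, and all names (abp_run, edges, step, ...) are mine:

    import numpy as np

    def abp_run(V, xi, edges, x0, beta, step, n_iter, rng=None):
        """Metropolis sampling of p(x) ~ exp(-beta*(V(x) - F_t(xi(x)))), where
        F_t is the normalised histogram of past xi values (Wang-Landau-type bias)."""
        rng = rng or np.random.default_rng()
        counts = np.zeros(len(edges) - 1)                       # visits per bin of xi
        bin_of = lambda u: np.clip(np.searchsorted(edges, xi(u)) - 1, 0, len(counts) - 1)
        x = np.asarray(x0, dtype=float)
        for t in range(1, n_iter + 1):
            F = counts / max(t - 1, 1)                          # F_t = (1/N) sum_i 1{xi(x_i) in bin}
            biased = lambda u: V(u) - F[bin_of(u)]
            prop = x + step * rng.standard_normal(x.shape)
            if np.log(rng.uniform()) < beta * (biased(x) - biased(prop)):
                x = prop
            counts[bin_of(x)] += 1.0                            # penalise the region just visited
        return x, counts / n_iter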
Adaptive Biasing Force (ABF)
Progressively adapt the 'force' ∂F/∂ξ, exploiting
\[
\mathbb{E}_{\tilde{V}}\!\left[\frac{\partial V}{\partial \xi} - \frac{\partial F}{\partial \xi}\right] = 0,
\]
i.e. for ξ ∈ [e_k, e_{k+1}],
\[
\frac{\partial F_t}{\partial \xi}(\xi) = \frac{1}{N} \sum_{i=1}^{n} \frac{\partial V}{\partial \xi}(x_i)\, \mathbb{1}\{\xi(x_i) \in [e_k, e_{k+1}]\}.
\]
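A small sketch of the bin-wise force estimate; note that I normalise by the number of samples falling in each bin (the conditional-average convention), whereas the slide writes a 1/N normalisation, and the function name is mine:

    import numpy as np

    def abf_mean_force(xi_samples, dV_dxi_samples, edges):
        """Bin-wise estimate of dF/dxi: average of dV/dxi(x_i) over the samples
        whose xi(x_i) falls in each bin [e_k, e_{k+1}]."""
        k = np.clip(np.searchsorted(edges, xi_samples) - 1, 0, len(edges) - 2)
        counts = np.bincount(k, minlength=len(edges) - 1)
        sums = np.bincount(k, weights=np.asarray(dV_dxi_samples, float),
                           minlength=len(edges) - 1)
        return sums / np.maximum(counts, 1)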
Back to mixture problems
Our findings:
1. ABF outperforms ABP.
2. 'temperature' not necessarily the best ξ; certainly not the easiest to interpret.
3. seems much easier to find 'stable' directions, i.e. all symmetric functions of θ are 'stable'.
4. but careful with metaphors.
ξ = q1
1. not symmetric;
2. constrained to [0, 1];
3. Forcing q1 close to one empties the other components, and helps component switching, in a symmetric way;
4. Forcing q1 close to zero: more later.
Hidalgo, K = 7 (I)
[Figure: trace plot of the component means μ_i against iterations (scale 10^7).]
Hidalgo, K = 7 (II)
[Figure: trace plot of the weights q_i against iterations (scale 10^7).]
Hidalgo, K = 7 (III)
[Figure: trace plot of the biased log-density against iterations (scale 10^7).]
Hidalgo, K = 7 (IV)
[Figure: estimated free energy (bias) as a function of ξ over [0, 1].]
Hidalgo, K = 3 (I)
[Figure: trace plot of the component means μ_i against iterations (scale 10^8).]
Hidalgo, K = 3 (II)
[Figure: trace plot of the biased log-density against iterations (scale 10^8).]
Hidalgo, K = 3 (III)
[Figure: estimated bias (free energy) as a function of ξ over [0, 1].]
ξ = β
In the prior for the inverse-variances:
v_i ∼ G(a, β), β ∼ G(g, h)
1. symmetric;
2. constrained to [0, 10·b];
3. Large values of β penalise small variances.
Hidalgo, K = 3, ξ = β
[Figure: trace plot of ξ = β (value vs. iteration, up to 10^9 iterations).]
Visiting qi ≈ 0 regions
allows us to estimate p(y|K)/p(y|K − 1):
\[
\frac{p(y \mid K)}{\tilde{p}(y)} = \mathbb{E}_{\tilde{V}}\bigl[\exp\{F(\xi)\}\bigr],
\qquad
\frac{p(y \mid K-1)}{\tilde{p}(y)} = \frac{1}{K} \sum_{i=1}^{K} \mathbb{E}_{\tilde{V}}\!\left[\frac{p(y \mid \theta[q_i = 0])}{p(y \mid \theta)} \exp\{F(\xi)\}\right],
\]
using importance sampling, or reversible jump (with birth-death steps).
Second case study
Nonparametric inference from long-memory Gaussian processes:
\[
Y \mid f \sim N(0, T(f)), \qquad
T(f)[k, l] = \gamma(k - l) = \int_{-\pi}^{\pi} e^{i(k-l)\omega} f(\omega)\, d\omega
\]
with a nonparametric prior for f, e.g. (Rousseau et al., 2009)
\[
f(\omega) = \left|1 - e^{i\omega}\right|^{-2d} \exp\left\{ \sum_{j=0}^{K} \theta_j \cos(j\omega) \right\}
\]
with d ∼ U(0, 1/2), θ_j ∼ N(0, 1/j), K ∼ Poisson(1).
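A small numerical sketch of this spectral density on a frequency grid (function name, argument layout and example values are mine):

    import numpy as np

    def fexp_spectral_density(omega, d, theta):
        """f(omega) = |1 - exp(i*omega)|^(-2d) * exp(sum_j theta_j cos(j*omega))."""
        omega = np.asarray(omega, dtype=float)
        short_memory = np.exp(sum(t * np.cos(j * omega) for j, t in enumerate(theta)))
        long_memory = np.abs(1.0 - np.exp(1j * omega)) ** (-2.0 * d)
        return long_memory * short_memory

    # example: grid of frequencies avoiding the singularity at omega = 0
    grid = np.linspace(1e-3, np.pi, 512)
    f_vals = fexp_spectral_density(grid, d=0.3, theta=[0.5, -0.2, 0.1])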
Computational difficulties
\[
p(y \mid f) = (2\pi)^{-n/2}\, |T(f)|^{-1/2} \exp\left\{-\frac{1}{2}\, y^{T} T(f)^{-1} y\right\}
\]
where |T(f)| is easy to approximate, and
\[
T(f)^{-1} \approx T(g), \qquad g = \frac{1}{4\pi^2 f},
\]
but we still have n integrals to compute, for k = 1, ..., n:
\[
\int_{-\pi}^{\pi} e^{ik\omega} g(\omega)\, d\omega.
\]
Interpolation
If g is replaced by a linear interpolation, all the integrals can be computed
exactly using one FFT of the points g(ωk ), k = 1, . . . , M; the number of
bins M must be ≥ 4N. Cost is O( M).
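For intuition, a rough sketch of the single-FFT computation; this uses a plain Riemann sum on the grid rather than the exact piecewise-linear rule mentioned above, and the function name is mine:

    import numpy as np

    def autocovariances_fft(g_vals):
        """Approximate gamma_k = int_{-pi}^{pi} exp(i*k*w) g(w) dw, k = 0..M-1,
        from g sampled at w_m = -pi + 2*pi*m/M, using a single (inverse) FFT."""
        g_vals = np.asarray(g_vals, dtype=float)
        k = np.arange(len(g_vals))
        # (2*pi/M) * sum_m g(w_m) e^{i k w_m}; the (-1)^k factor shifts the grid
        # origin from 0 to -pi, and sum_m g_m e^{2i*pi*k*m/M} = M * ifft(g)[k].
        vals = 2.0 * np.pi * (-1.0) ** k * np.fft.ifft(g_vals)
        return vals.real    # gamma_k is real for a real, even spectral density g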
Sequential Monte Carlo
Consider the 'intermediate' likelihood functions p_m(y|f) corresponding to
\[
T_m[k, l] =
\begin{cases}
T(g)[k, l] & \text{if } |k - l| \le 2^m, \\
0 & \text{otherwise,}
\end{cases}
\]
then apply SMC to the sequence
\[
p_t(f) \propto p(f)\, p_m(y \mid f)^{1 - r/a}\, p_{m+1}(y \mid f)^{r/a},
\qquad t = am + r,\ r \in \{0, \dots, a-1\}.
\]
(Additional benefit: this exploits the divide-and-conquer structure of the FFT.)
SMC steps
0. Draw f_j ∼ p(f), j = 1, ..., N. Set t = 0, w_j = 1.
1. Re-weight:
   w_j ← w_j × p_{t+1}(f_j) / p_t(f_j)
2. If Variance(weights) > γ:
   (a) resample;
   (b) apply an MCMC step w.r.t. p_{t+1}.
3. t ← t + 1. Go to 1.
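A skeletal Python rendering of these steps; the resampling is multinomial and the degeneracy test uses the effective sample size rather than the weight variance, the MCMC kernel is left abstract, and all names are mine:

    import numpy as np

    def smc(sample_prior, log_target, mcmc_step, N, T, ess_frac=0.5, rng=None):
        """Skeleton of the SMC steps above: reweight by p_{t+1}/p_t, then resample
        and apply an MCMC rejuvenation step when the weights degenerate."""
        rng = rng or np.random.default_rng()
        particles = [sample_prior(rng) for _ in range(N)]
        logw = np.zeros(N)
        for t in range(T):
            # step 1: re-weight, w_j <- w_j * p_{t+1}(f_j) / p_t(f_j) (in log scale)
            logw += np.array([log_target(t + 1, f) - log_target(t, f) for f in particles])
            w = np.exp(logw - logw.max()); w /= w.sum()
            # step 2: degeneracy check (effective sample size), then resample + move
            if 1.0 / np.sum(w**2) < ess_frac * N:
                idx = rng.choice(N, size=N, p=w)
                particles = [mcmc_step(particles[i], t + 1, rng) for i in idx]
                logw[:] = 0.0
        return particles, w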
Automatic tuning of MCMC
Conditional on K, random walk HM steps, with proposal covariance tuned to the current particle system (see the sketch below).
To move K, use reversible jump with Green's (2003, HSS book) proposal.
Or just use positive discrimination (Chopin, 2007), i.e. Wang-Landau biasing w.r.t. K.
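A sketch of the first point: build a random-walk proposal from the weighted empirical covariance of the current particles; the 2.38/√d scaling is the usual optimal-scaling heuristic (my choice, not necessarily the talk's), and the function name is mine:

    import numpy as np

    def tuned_rw_proposal(particles, weights):
        """Random-walk proposal whose covariance is the weighted empirical
        covariance of the current particle system (2.38/sqrt(d) scaling)."""
        theta = np.asarray(particles, dtype=float).reshape(len(particles), -1)  # (N, d)
        d = theta.shape[1]
        cov = np.atleast_2d(np.cov(theta, rowvar=False, aweights=weights))
        chol = np.linalg.cholesky(cov + 1e-10 * np.eye(d))
        def propose(x, rng):
            return np.asarray(x, dtype=float) + (2.38 / np.sqrt(d)) * (chol @ rng.standard_normal(d))
        return propose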
A simulated example
[Figure: left panel, the simulated series y; right panel, the posterior of d.]
A simulated example (II)
[Figure: Effective Sample Size along the SMC steps, a = 4.]
A simulated example (III)
[Figure: two panels ('no RJ' and 'with RJ') showing estimated posterior probabilities over values 0–10.]
Conclusions
striving to provide automatic/robust algorithms.
SMC, together with biasing methods, seems an excellent candidate for this purpose:
1. Biasing (a) makes the target easier to explore and (b) facilitates the construction of proposals;
2. in SMC, tuning proposals is trivial.
Currently working on an SMC version of ABF.
Download