Divide-and-Conquer Sequential Monte Carlo

Adam M. Johansen
Joint work with:
John Aston, Alexandre Bouchard-Côté, Brent Kirkpatrick,
Fredrik Lindsten, Christian Næsseth, Thomas Schön

University of Warwick
a.m.johansen@warwick.ac.uk
www2.warwick.ac.uk/fac/sci/statistics/staff/academic/johansen/talks/

OxWaSP/Amazon Meeting, Berlin
May 12th, 2016
Outline

- Sequential Monte Carlo
- Divide-and-Conquer SMC
- Illustrative Applications
Essential Problem
The Abstract Problem

- Given a distribution

      π(x) = γ(x) / Z,

- such that γ(x) can be evaluated pointwise,
- how can we approximate π,
- and how about Z?
- Can we do so robustly?
- In a distributed setting?
Importance Sampling

- Simple identity, provided γ ≪ µ:

      Z = ∫ γ(x) dx = ∫ [γ(x)/µ(x)] µ(x) dx

- So, if X_1, X_2, … ∼ µ i.i.d., then:

  unbiasedness:  ∀N:  E[ (1/N) ∑_{i=1}^N γ(X_i)/µ(X_i) ] = Z

  slln:  (1/N) ∑_{i=1}^N [γ(X_i)/µ(X_i)] φ(X_i) → γ(φ)  a.s. as N → ∞

  clt:  √N [ (1/N) ∑_{i=1}^N [γ(X_i)/µ(X_i)] φ(X_i) − γ(φ) ] → W  in distribution as N → ∞,

  where W ∼ N(0, Var[ γ(X_1)/µ(X_1) φ(X_1) ]).
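As a concrete illustration of the identity above, here is a minimal importance-sampling sketch. The toy target γ(x) = exp(−x²/2) (so Z = √(2π)) and the heavier-tailed Gaussian proposal are illustrative choices, not from the talk:

```python
import math
import random

def importance_sample_z(gamma, mu_pdf, mu_sample, n):
    """Estimate Z = integral of gamma(x) dx using draws from mu."""
    ws = []
    for _ in range(n):
        x = mu_sample()
        ws.append(gamma(x) / mu_pdf(x))  # importance weight gamma/mu
    return sum(ws) / n

random.seed(1)
# Toy unnormalised target: gamma(x) = exp(-x^2/2), so Z = sqrt(2*pi).
gamma = lambda x: math.exp(-x * x / 2.0)
sigma = 2.0  # proposal N(0, sigma^2), heavier-tailed than the target
mu_pdf = lambda x: math.exp(-x * x / (2 * sigma**2)) / (sigma * math.sqrt(2 * math.pi))
mu_sample = lambda: random.gauss(0.0, sigma)

z_hat = importance_sample_z(gamma, mu_pdf, mu_sample, 100_000)
# z_hat should be close to sqrt(2*pi) ~ 2.5066
```

Unbiasedness holds for any N; the proposal must dominate the target (γ ≪ µ) for the weights to be well defined.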
Sequential Importance Sampling

- Write

      γ(x_{1:n}) = γ(x_1) ∏_{p=2}^n γ(x_p | x_{1:p−1}),

- define, for p = 1, …, n,

      γ_p(x_{1:p}) = γ_1(x_1) ∏_{q=2}^p γ(x_q | x_{1:q−1}),

- then

      γ(x_{1:n}) / µ(x_{1:n}) = [γ_1(x_1)/µ_1(x_1)] ∏_{p=2}^n γ_p(x_{1:p}) / [γ_{p−1}(x_{1:p−1}) µ_p(x_p | x_{1:p−1})] =: W_n(x_{1:n}),

  where the first factor is w_1(x_1) and the p-th factor is w_p(x_{1:p}),

- and we can sequentially approximate Z_p = ∫ γ_p(x_{1:p}) dx_{1:p}.
Sequential Importance Resampling (SIR)

Given a sequence γ_1(x_1), γ_2(x_{1:2}), …:

Initialisation, n = 1:

- Sample X_1^1, …, X_1^N ∼ µ_1 i.i.d.
- Compute W_1^i = γ_1(X_1^i) / µ_1(X_1^i)
- Obtain Ẑ_1^N = (1/N) ∑_{i=1}^N W_1^i

[This is just importance sampling.]
Iteration, n ← n + 1:

- Resample: sample (X_{n,1:n−1}^1, …, X_{n,1:n−1}^N) ∼ ∑_{i=1}^N W̄_{n−1}^i δ_{X_{n−1}^i} i.i.d.
- Sample X_{n,n}^i ∼ q_n(· | X_{n,1:n−1}^i)
- Compute

      W_n^i = γ_n(X_{n,1:n}^i) / [γ_{n−1}(X_{n,1:n−1}^i) q_n(X_{n,n}^i | X_{n,1:n−1}^i)].

- Obtain

      Ẑ_n^N = Ẑ_{n−1}^N · (1/N) ∑_{i=1}^N W_n^i.
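The initialisation and iteration steps above fit into one short generic routine. A hedged sketch: the function names (`sir`, `gamma_ratio`) and the toy product-form target with an independent Gaussian proposal are illustrative assumptions, not the talk's notation:

```python
import math
import random

def sir(n_steps, n_particles, gamma_ratio, q_sample, q_pdf, seed=0):
    """Sequential importance resampling following the recipe above.
    gamma_ratio(traj, x) must return gamma_p(traj + [x]) / gamma_{p-1}(traj)."""
    rng = random.Random(seed)
    trajs = [[] for _ in range(n_particles)]
    ws = [1.0] * n_particles
    z_hat = 1.0
    for p in range(n_steps):
        if p > 0:
            # multinomial resampling proportional to the current weights
            trajs = [list(t) for t in rng.choices(trajs, weights=ws, k=n_particles)]
        new_ws = []
        for traj in trajs:
            x = q_sample(rng, traj)                       # extend the trajectory
            new_ws.append(gamma_ratio(traj, x) / q_pdf(traj, x))
            traj.append(x)
        ws = new_ws
        z_hat *= sum(ws) / n_particles                    # Z_hat_n = Z_hat_{n-1} * mean weight
    return trajs, ws, z_hat

# Toy target: gamma_p(x_{1:p}) = prod_q exp(-x_q^2/2), so Z_n = (2*pi)^(n/2);
# proposal q_p = N(0, 1.5^2), independent of the past.
s = 1.5
gamma_ratio = lambda traj, x: math.exp(-x * x / 2)
q_sample = lambda rng, traj: rng.gauss(0.0, s)
q_pdf = lambda traj, x: math.exp(-x * x / (2 * s * s)) / (s * math.sqrt(2 * math.pi))
_, _, z_hat = sir(3, 20_000, gamma_ratio, q_sample, q_pdf)
# z_hat estimates (2*pi)^(3/2)
```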
SIR: Theoretical Justification

Under regularity conditions we still have:

unbiasedness:  E[Ẑ_n^N] = Z_n

slln:  π̂_n^N(φ) → π_n(φ)  a.s. as N → ∞

clt:  for a normal random variable W_n of appropriate variance,

      √N [π̂_n^N(φ) − π_n(φ)] → W_n  in distribution as N → ∞,

although establishing this becomes a little harder (cf., e.g., Del Moral (2004); Andrieu et al. (2010)).
Simple Particle Filters: One Family of SIR Algorithms

[Figure: hidden Markov model; unobserved states x_1, …, x_6, each emitting an observation y_1, …, y_6.]

- Unobserved Markov chain {X_n} with transition density f.
- Observed process {Y_n} with conditional density g.
- The joint density is available:

      p(x_{1:n}, y_{1:n} | θ) = f_1^θ(x_1) g^θ(y_1 | x_1) ∏_{i=2}^n f^θ(x_i | x_{i−1}) g^θ(y_i | x_i).

- Natural SIR target distributions:

      π_n^θ(x_{1:n}) := p(x_{1:n} | y_{1:n}, θ) ∝ p(x_{1:n}, y_{1:n} | θ) =: γ_n^θ(x_{1:n})

      Z_n^θ = ∫ p(x_{1:n}, y_{1:n} | θ) dx_{1:n} = p(y_{1:n} | θ)
Bootstrap PFs and Similar

- Choosing

      π_n^θ(x_{1:n}) := p(x_{1:n} | y_{1:n}, θ) ∝ p(x_{1:n}, y_{1:n} | θ) =: γ_n^θ(x_{1:n})

      Z_n^θ = ∫ p(x_{1:n}, y_{1:n} | θ) dx_{1:n} = p(y_{1:n} | θ)

- and q_p(x_p | x_{1:p−1}) = f^θ(x_p | x_{p−1}) yields the bootstrap particle filter of Gordon et al. (1993),
- whereas q_p(x_p | x_{1:p−1}) = p(x_p | x_{p−1}, y_p, θ) yields the "locally optimal" particle filter.
- Note: many alternative particle filters are SIR algorithms with other targets; cf. Johansen and Doucet (2008); Doucet and Johansen (2011).
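A bootstrap particle filter for a concrete model can be checked against an exact answer. A sketch under stated assumptions: the linear-Gaussian model x_t = a x_{t−1} + N(0, q), y_t = x_t + N(0, r) is my toy choice (so the Kalman filter gives the exact p(y_{1:n}) for comparison), and all function names are hypothetical:

```python
import math
import random

def norm_pdf(x, m, var):
    return math.exp(-(x - m) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def bootstrap_pf(ys, n_particles, a=0.9, q=1.0, r=1.0, seed=0):
    """Bootstrap PF: propose from the transition f, so the incremental
    weight is just g(y_t | x_t). Returns an estimate of log p(y_{1:n})."""
    rng = random.Random(seed)
    xs = [rng.gauss(0.0, math.sqrt(q)) for _ in range(n_particles)]  # x_1 ~ N(0, q)
    loglik = 0.0
    for t, y in enumerate(ys):
        if t > 0:
            xs = rng.choices(xs, weights=ws, k=n_particles)          # resample
            xs = [a * x + rng.gauss(0.0, math.sqrt(q)) for x in xs]  # propagate
        ws = [norm_pdf(y, x, r) for x in xs]                         # weight
        loglik += math.log(sum(ws) / n_particles)
    return loglik

def kalman_loglik(ys, a=0.9, q=1.0, r=1.0):
    """Exact log p(y_{1:n}) for the same model, for comparison."""
    m, p = 0.0, q  # predictive moments for x_1
    ll = 0.0
    for y in ys:
        s = p + r
        ll += math.log(norm_pdf(y, m, s))
        k = p / s
        m, p = m + k * (y - m), (1 - k) * p   # update
        m, p = a * m, a * a * p + q           # predict the next state
    return ll
```

With enough particles the two log-likelihoods agree closely, which is a useful sanity check when implementing any SIR algorithm.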
Sequential Monte Carlo Samplers: Another SIR Class

Given a sequence of targets π_1, …, π_n on arbitrary spaces, Del Moral et al. (2006) extend the space:

      π̃_n(x_{1:n}) = π_n(x_n) ∏_{p=n−1}^1 L_p(x_{p+1}, x_p)

      γ̃_n(x_{1:n}) = γ_n(x_n) ∏_{p=n−1}^1 L_p(x_{p+1}, x_p)

      Z̃_n = ∫ γ̃_n(x_{1:n}) dx_{1:n} = ∫ γ_n(x_n) ∏_{p=n−1}^1 L_p(x_{p+1}, x_p) dx_{1:n} = ∫ γ_n(x_n) dx_n = Z_n
A Simple SMC Sampler

Given γ_1, …, γ_n on (E, ℰ), for i = 1, …, N:

- Sample X_1^i ∼ µ_1 i.i.d., compute W_1^i = γ_1(X_1^i)/µ_1(X_1^i) and Ẑ_1^N = (1/N) ∑_{i=1}^N W_1^i
- For p = 2, …, n:
  - Resample: X_{p,p−1}^{1:N} ∼ ∑_{i=1}^N W̄_{p−1}^i δ_{X_{p−1}^i} i.i.d.
  - Sample: X_p^i ∼ K_p(X_{p,p−1}^i, ·), where π_p K_p = π_p.
  - Compute: W_p^i = γ_p(X_{p,p−1}^i) / γ_{p−1}(X_{p,p−1}^i).
  - And Ẑ_p^N = Ẑ_{p−1}^N · (1/N) ∑_{i=1}^N W_p^i.
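The sampler above can be sketched concretely for a tempered sequence of targets. Assumptions of mine, not the talk's: the geometric bridge γ_b(x) = µ_1(x)^{1−b} exp(−x²/2)^b between µ_1 = N(0, s0²) and an unnormalised Gaussian (so the true Z at b = 1 is √(2π)), random-walk Metropolis as the π-invariant kernel K_p, and weighting before rather than after resampling (equivalent here, since resampling plus moving leaves equal weights):

```python
import math
import random

def smc_sampler(n_particles, betas, s0=3.0, n_moves=5, seed=0):
    """Tempered SMC sampler: gamma_b(x) = mu1(x)^(1-b) * exp(-x^2/2)^b,
    mu1 = N(0, s0^2), moved with pi_b-invariant random-walk Metropolis."""
    rng = random.Random(seed)
    log_mu1 = lambda x: -x * x / (2 * s0 * s0) - math.log(s0 * math.sqrt(2 * math.pi))
    log_gamma = lambda x, b: (1 - b) * log_mu1(x) - b * x * x / 2
    xs = [rng.gauss(0.0, s0) for _ in range(n_particles)]  # exact draws from pi_1
    z_hat = 1.0
    for b_prev, b in zip(betas, betas[1:]):
        # incremental weight gamma_b / gamma_{b_prev} at the pre-move particle
        ws = [math.exp(log_gamma(x, b) - log_gamma(x, b_prev)) for x in xs]
        z_hat *= sum(ws) / n_particles
        xs = rng.choices(xs, weights=ws, k=n_particles)     # resample
        for _ in range(n_moves):                            # K_b: RWM sweeps
            for i in range(n_particles):
                y = xs[i] + rng.gauss(0.0, 1.0)
                if math.log(rng.random()) < log_gamma(y, b) - log_gamma(xs[i], b):
                    xs[i] = y
    return z_hat

betas = [t / 20 for t in range(21)]  # 0 = beta_1 < ... < beta_T = 1
z_hat = smc_sampler(2_000, betas)
# at b = 1 the target is exp(-x^2/2), so the true Z is sqrt(2*pi)
```

Note that the estimator Ẑ is unbiased whatever the mixing of the MCMC kernel; the kernel only affects its variance.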
Bayesian Inference
(Chopin, 2002; Del Moral et al., 2006)

In a Bayesian context:

- Given a prior p(θ) and likelihood l(θ; y_{1:m}),
- one could specify:

  Data tempering:  γ_p(θ) = p(θ) l(θ; y_{1:m_p}) for m_1 = 0 < m_2 < ⋯ < m_T = m

  Likelihood tempering:  γ_p(θ) = p(θ) l(θ; y_{1:m})^{β_p} for β_1 = 0 < β_2 < ⋯ < β_T = 1

  Something else?

- Here Z_T = ∫ p(θ) l(θ; y_{1:m}) dθ and γ_T(θ) ∝ p(θ | y_{1:m}).
- Specifying (m_1, …, m_T), (β_1, …, β_T) or (γ_1, …, γ_T) is hard.
Illustrative Sequences of Targets

[Figure: two panels of intermediate target densities plotted against x over (−10, 10), illustrating two possible sequences of distributions.]
One Adaptive Scheme (Zhou, J. & Aston, 2016) + Refs

Resample when the ESS is below a threshold.

Likelihood tempering: at iteration n, set α_n such that

      N ( ∑_{j=1}^N W_{n−1}^{(j)} w_n^{(j)} )² / ∑_{k=1}^N W_{n−1}^{(k)} (w_n^{(k)})² = CESS★

Proposals: follow Jasra et al. (2010): adapt to keep the acceptance rate about right.

Question
Are there better, practical approaches to specifying a sequence of distributions?
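The CESS criterion above can be used to choose the next temperature numerically. A sketch, assuming likelihood tempering (so the incremental weight is l_j^{β − β_prev}) and a simple bisection; the helper names and target fraction are my own, not the paper's implementation:

```python
import math

def cess(W_prev, w_inc):
    """Conditional ESS criterion: N (sum W w)^2 / sum W w^2."""
    num = len(W_prev) * sum(W * w for W, w in zip(W_prev, w_inc)) ** 2
    den = sum(W * w * w for W, w in zip(W_prev, w_inc))
    return num / den

def next_temperature(W_prev, log_like, beta_prev, target_frac=0.9, tol=1e-6):
    """Bisect for the largest beta <= 1 whose CESS, with incremental
    weights exp((beta - beta_prev) * log_like_j), stays at target_frac * N."""
    n = len(W_prev)

    def cess_at(beta):
        w = [math.exp((beta - beta_prev) * l) for l in log_like]
        return cess(W_prev, w)

    if cess_at(1.0) >= target_frac * n:
        return 1.0  # can jump straight to the final target
    lo, hi = beta_prev, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if cess_at(mid) >= target_frac * n:
            lo = mid
        else:
            hi = mid
    return lo
```

Small temperature increments keep the CESS high (incremental weights nearly flat); the bisection finds the largest step consistent with the chosen CESS★.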
Divide-and-Conquer (Lindsten, J. et al., 2014)

Many models admit natural decompositions:

[Figure: a three-level decomposition; level 2 contains x̃_1, x̃_2, x̃_3 with observations y_1, y_2, y_3; level 1 adds x̃_4; level 0 adds x̃_5.]

To which we can apply a divide-and-conquer strategy:

[Figure: a tree of auxiliary distributions with root π_r, internal nodes such as π_t, and children π_{c_1}, …, π_{c_C}.]
A few formalities. . .

- Use a tree, T, of models (with rootward variable inclusion):

  [Figure: the tree of distributions, from the root π_r down to leaves π_{c_1}, …, π_{c_C}.]

- Let t ∈ T denote a node; r ∈ T is the root.
- Let C(t) = {c_1, …, c_C} denote the children of t.
- Let X̃_t denote the space of the variables included in t but not its children.
- dc-smc can be viewed as a recursion over this tree.
dc-smc(t): an extension of SIR

1. For c ∈ C(t):
   1.1 ({X_c^i, W_c^i}_{i=1}^N, Ẑ_c^N) ← dc-smc(c).
   1.2 Resample {x_c^i, w_c^i}_{i=1}^N to obtain the equally weighted particle system {x̂_c^i, 1}_{i=1}^N.
2. For particle i = 1 : N:
   2.1 If X̃_t ≠ ∅, simulate x̃_t^i ∼ q_t(· | x̂_{c_1}^i, …, x̂_{c_C}^i), where (c_1, c_2, …, c_C) = C(t); else x̃_t^i ← ∅.
   2.2 Set x_t^i = (x̂_{c_1}^i, …, x̂_{c_C}^i, x̃_t^i).
   2.3 Compute

         w_t^i = γ_t(x_t^i) / [ ∏_{c∈C(t)} γ_c(x̂_c^i) · q_t(x̃_t^i | x̂_{c_1}^i, …, x̂_{c_C}^i) ].

3. Compute

      Ẑ_t^N = { (1/N) ∑_{i=1}^N w_t^i } ∏_{c∈C(t)} Ẑ_c^N.

4. Return ({x_t^i, w_t^i}_{i=1}^N, Ẑ_t^N).
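The four steps above can be sketched as a short recursion. This is a toy instance, not the authors' Java implementation: I assume a two-leaf tree where each leaf targets γ_c(x) = exp(−x²/2), the root couples the two children with an extra exp(−λ(x_1 − x_2)²/2) factor and adds no new variable, and the node/field names are hypothetical:

```python
import math
import random

rng = random.Random(3)

def dc_smc(node, n):
    """Recursive dc-smc sketch following steps 1-4 above. A node has
    'children', a 'log_gamma' over its subtree's variables (a flat list),
    and an optional 'propose' for its own variables."""
    z_hat = 1.0
    merged = [([], 0.0) for _ in range(n)]  # (variables, sum of child log-gammas)
    for child in node["children"]:
        xs, ws, z_c = dc_smc(child, n)                      # step 1.1
        xs = rng.choices(xs, weights=ws, k=n)               # step 1.2: resample
        merged = [(v + x, lg + child["log_gamma"](x))
                  for (v, lg), x in zip(merged, xs)]
        z_hat *= z_c
    parts, ws = [], []
    for v, lg_children in merged:
        log_q = 0.0
        if node["propose"] is not None:                     # step 2.1
            x_new, log_q = node["propose"](rng, v)
            v = v + [x_new]
        parts.append(v)                                     # step 2.2
        # step 2.3: w = gamma_t(x) / (prod_c gamma_c(x_c) * q_t(x_new | ...))
        ws.append(math.exp(node["log_gamma"](v) - lg_children - log_q))
    z_hat *= sum(ws) / n                                    # step 3
    return parts, ws, z_hat                                 # step 4

LAM = 0.5  # coupling strength at the root

def make_leaf():
    def propose(rng, _v):
        x = rng.gauss(0.0, 1.0)  # q_t = N(0,1), same shape as the leaf target
        return x, -x * x / 2 - 0.5 * math.log(2 * math.pi)
    return {"children": [], "propose": propose,
            "log_gamma": lambda v: -v[0] ** 2 / 2}

root = {"children": [make_leaf(), make_leaf()], "propose": None,
        "log_gamma": lambda v: (-v[0] ** 2 / 2 - v[1] ** 2 / 2
                                - LAM * (v[0] - v[1]) ** 2 / 2)}

_, _, z = dc_smc(root, 20_000)
# Gaussian integral gives the exact answer: Z_r = 2*pi / sqrt(1 + 2*LAM)
```

The root weight is exactly the coupling factor γ_r / (γ_{c_1} γ_{c_2}), illustrating how dc-smc corrects for merging independent child populations into a dependent joint target.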
Theoretical Properties I

Unbiasedness of Normalising Constant Estimates

Provided that γ_t ≪ (⊗_{c∈C(t)} γ_c) ⊗ q_t for every t ∈ T and an unbiased, exchangeable resampling scheme is applied to every population at every iteration, we have for any N ≥ 1:

      E[Ẑ_r^N] = Z_r = ∫ γ_r(x_r) dx_r.
Strong Law of Large Numbers

Under regularity conditions the weighted particle system (x_r^{i,N}, w_r^{i,N})_{i=1}^N generated by dc-smc(r) is consistent, in that for all functions f : Z → R satisfying certain assumptions:

      ∑_{i=1}^N [ w_r^{N,i} / ∑_{j=1}^N w_r^{N,j} ] f(x_r^{i,N}) → ∫ f(x) π(x) dx  a.s., as N → ∞.
Some (Important) Extensions

1. Mixture resampling
2. Tempering (Del Moral et al., 2006)
3. Adaptation (Zhou, J. and Aston, 2016)
An Ising Model

- Sites k ∈ V ⊂ Z²
- x_k ∈ {−1, 1}
- p(x) ∝ e^{−βE(x)}, β ≥ 0, with

      E(x) = −∑_{(k,l)∈E} x_k x_l

We consider a grid of size 64 × 64 with β = 0.4407 (the critical temperature).
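The energy E(x) above is straightforward to evaluate pointwise, which is all SMC needs of the unnormalised target. A minimal sketch, assuming free (non-periodic) boundaries and a list-of-lists grid representation:

```python
def ising_energy(x):
    """E(x) = -sum over nearest-neighbour edges (k,l) of x_k * x_l,
    on a rectangular grid with free boundaries."""
    rows, cols = len(x), len(x[0])
    e = 0
    for i in range(rows):
        for j in range(cols):
            if i + 1 < rows:          # vertical edge
                e -= x[i][j] * x[i + 1][j]
            if j + 1 < cols:          # horizontal edge
                e -= x[i][j] * x[i][j + 1]
    return e

def log_unnormalised(x, beta=0.4407):
    """log gamma(x) = -beta * E(x); the normalising constant is unknown."""
    return -beta * ising_energy(x)
```

A 64 × 64 grid has 2 · 64 · 63 = 8064 edges, so the all-ones configuration attains the minimum energy −8064.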
A sequence of decompositions

[Figure: a sequence of decompositions of the 64 × 64 lattice into sub-grids, merged as the dc-smc recursion proceeds rootwards.]
[Figure: estimates of log Z (around 3800-3815) and of E[E(x)] (around −2650 to −2450) against wall-clock time (s, log scale) for SMC, D&C-SMC (ann), D&C-SMC (ann+mix) and single-flip MH.]

Summaries over 50 independent runs of each algorithm.
New York Schools Maths Test: data

- Data organised into a tree T.
- A root-to-leaf path is: NYC (the root, denoted by r ∈ T), borough, school district, school, year.
- Each leaf t ∈ T comes with an observation of m_t exam successes out of M_t trials.
- Total of 278,399 test instances:
  - five boroughs (Manhattan, The Bronx, Brooklyn, Queens, Staten Island),
  - 32 distinct districts,
  - 710 distinct schools.
New York Schools Maths Test: Bayesian Model

- The number of successes m_t at a leaf t is Bin(M_t, p_t),
- where p_t = logistic(θ_t) and θ_t is a latent parameter.
- Internal nodes of the tree also have a latent θ_t.
- Model the difference in θ_t along an edge e = (t → t′) as θ_{t′} = θ_t + Δ_e,
- where Δ_e ∼ N(0, σ_e²).
- We put an improper prior (uniform on (−∞, ∞)) on θ_r.
- We also make the variance random, but shared across siblings: σ_t² ∼ Exp(1).
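To make the generative structure concrete, the model above can be simulated forwards down the tree. A sketch under my own simplifying assumptions (a balanced tree with fixed branching factor and a common trial count M per leaf; the data's tree is irregular):

```python
import math
import random

def simulate_subtree(rng, theta_parent, depth, branching, M=100):
    """Simulate the hierarchical model forwards: each edge adds
    Delta_e ~ N(0, sigma^2) with sigma^2 ~ Exp(1) shared across siblings;
    each leaf emits m ~ Bin(M, logistic(theta)). Returns (theta, m, M) per leaf."""
    if depth == 0:
        p = 1.0 / (1.0 + math.exp(-theta_parent))     # logistic link
        m = sum(rng.random() < p for _ in range(M))   # Binomial(M, p)
        return [(theta_parent, m, M)]
    sigma2 = rng.expovariate(1.0)  # variance shared across this node's children
    leaves = []
    for _ in range(branching):
        theta_child = theta_parent + rng.gauss(0.0, math.sqrt(sigma2))
        leaves += simulate_subtree(rng, theta_child, depth - 1, branching, M)
    return leaves
```

For example, `simulate_subtree(random.Random(7), 0.0, 2, 3)` produces 9 leaves whose success counts are correlated within each branch through the shared θ increments.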
New York Schools Maths Test: Implementation

- The basic SIR implementation of dc-smc,
- using the natural hierarchical structure provided by the model.
- Given σ_t² and the θ_t at the leaves, the other random variables are multivariate normal.
- We instantiate values for θ_t only at the leaves.
- At an internal node t′, sample only σ_{t′}² and marginalize out θ_{t′}.
- Each step of dc-smc is therefore either:
  i. at leaves, sample p_t ∼ Beta(1 + m_t, 1 + M_t − m_t) and set θ_t = logit(p_t);
  ii. at internal nodes, sample σ_t² ∼ Exp(1).
- Java implementation:
  https://github.com/alexandrebouchard/multilevelSMC
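The two per-node sampling steps above are one-liners. A sketch (the talk's implementation is in Java; these Python helper names are illustrative only):

```python
import math
import random

def logit(p):
    return math.log(p / (1 - p))

def sample_leaf_theta(rng, m_t, M_t):
    """Leaf step: p_t ~ Beta(1 + m_t, 1 + M_t - m_t), then theta_t = logit(p_t)."""
    p_t = rng.betavariate(1 + m_t, 1 + M_t - m_t)
    return logit(p_t)

def sample_internal_variance(rng):
    """Internal-node step: sigma_t^2 ~ Exp(1), the prior on the shared variance."""
    return rng.expovariate(1.0)
```

The leaf proposal is the exact conditional for p_t under a flat Beta(1, 1) prior, which is why the dc-smc weights at the leaves remain well behaved.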
New York Schools Maths Test: Results

[Figure: posterior density approximations of the mean parameter for NY and for each borough: NY-Bronx, NY-Kings, NY-Manhattan, NY-Queens, NY-Richmond.]

- DC with 10,000 particles.
- Bronx County has the highest fraction (41%) of children (under 18) living below the poverty level.*
- Queens has the second lowest (19.7%),
- after Richmond (Staten Island, 16.7%).
- Staten Island contains a single school district, so the posterior distribution is much flatter for this borough.

* Statistics from the New York State Poverty Report 2013,
http://ams.nyscommunityaction.org/Resources/Documents/News/NYSCAAs_2013_Poverty_Report.pdf
Normalising Constant Estimates

[Figure: estimates of log(Z), roughly −3900 to −3825, against the number of particles (100 to 1,000,000, log scale) for the DC and STD sampling methods.]
Distributed Implementation

[Figure: wall-clock time (s) and speedup against the number of compute nodes (1 to 32).]

Xeon X5650 2.66GHz processors connected by a non-blocking Infiniband 4X QDR network.
Conclusions

- SMC ≈ SIR
- D&C-SMC ≈ SIR + coalescence
- Distributed implementation is straightforward
- The D&C strategy can improve even serial performance

Some questions remain unanswered:

- How can we construct (near) optimal tree decompositions?
- How much standard SMC theory can be extended to this setting?
References

[1] C. Andrieu, A. Doucet, and R. Holenstein. Particle Markov chain Monte Carlo. Journal of the Royal Statistical Society B, 72(3):269-342, 2010.
[2] N. Chopin. A sequential particle filter method for static models. Biometrika, 89(3):539-551, 2002.
[3] P. Del Moral. Feynman-Kac Formulae: Genealogical and Interacting Particle Systems with Applications. Probability and Its Applications. Springer Verlag, New York, 2004.
[4] P. Del Moral, A. Doucet, and A. Jasra. Sequential Monte Carlo methods for Bayesian computation. In Bayesian Statistics 8. Oxford University Press, 2006.
[5] A. Doucet and A. M. Johansen. A tutorial on particle filtering and smoothing: Fifteen years later. In D. Crisan and B. Rozovsky, editors, The Oxford Handbook of Nonlinear Filtering, pages 656-704. Oxford University Press, 2011.
[6] N. J. Gordon, D. J. Salmond, and A. F. M. Smith. Novel approach to nonlinear/non-Gaussian Bayesian state estimation. IEE Proceedings-F, 140(2):107-113, April 1993.
[7] A. Jasra, D. A. Stephens, A. Doucet, and T. Tsagaris. Inference for Lévy-driven stochastic volatility models via adaptive sequential Monte Carlo. Scandinavian Journal of Statistics, 38(1):1-22, Dec. 2010.
[8] A. M. Johansen and A. Doucet. A note on the auxiliary particle filter. Statistics and Probability Letters, 78(12):1498-1504, September 2008. URL http://dx.doi.org/10.1016/j.spl.2008.01.032.
[9] F. Lindsten, A. M. Johansen, C. A. Næsseth, B. Kirkpatrick, T. Schön, J. A. D. Aston, and A. Bouchard-Côté. Divide and conquer with sequential Monte Carlo samplers. Technical Report 1406.4993, ArXiv Mathematics e-prints, 2014.
[10] Y. Zhou, A. M. Johansen, and J. A. D. Aston. Towards automatic model comparison: An adaptive sequential Monte Carlo approach. Journal of Computational and Graphical Statistics, 2015. In press.