Divide-and-Conquer Sequential Monte Carlo

Adam M. Johansen

Joint work with: John Aston, Alexandre Bouchard-Côté, Brent Kirkpatrick, Fredrik Lindsten, Christian Næsseth, Thomas Schön

University of Warwick
a.m.johansen@warwick.ac.uk
www2.warwick.ac.uk/fac/sci/statistics/staff/academic/johansen/talks/

OxWaSP/Amazon Meeting, Berlin, May 12th, 2016

Outline
- Sequential Monte Carlo
- Divide-and-Conquer SMC
- Illustrative Applications

The Abstract Problem
- Given a distribution $\pi(x) = \gamma(x)/Z$,
- such that $\gamma(x)$ can be evaluated pointwise:
- how can we approximate $\pi$?
- And how about $Z$?
- Can we do so robustly?
- In a distributed setting?

Importance Sampling
- Simple identity, provided $\gamma \ll \mu$:
  $Z = \int \gamma(x)\,dx = \int \frac{\gamma(x)}{\mu(x)}\,\mu(x)\,dx$.
- So, if $X_1, X_2, \ldots \overset{\text{iid}}{\sim} \mu$, then, writing $\gamma(\varphi) = \int \varphi(x)\,\gamma(x)\,dx$:
  unbiasedness: $\mathbb{E}\big[\frac{1}{N}\sum_{i=1}^N \frac{\gamma(X_i)}{\mu(X_i)}\big] = Z$ for all $N$;
  SLLN: $\frac{1}{N}\sum_{i=1}^N \frac{\gamma(X_i)}{\mu(X_i)}\,\varphi(X_i) \xrightarrow{\text{a.s.}} \gamma(\varphi)$ as $N \to \infty$;
  CLT: $\sqrt{N}\big[\frac{1}{N}\sum_{i=1}^N \frac{\gamma(X_i)}{\mu(X_i)}\,\varphi(X_i) - \gamma(\varphi)\big] \xrightarrow{d} W$ as $N \to \infty$,
  where $W \sim \mathcal{N}\big(0, \operatorname{Var}\big[\frac{\gamma(X_1)}{\mu(X_1)}\,\varphi(X_1)\big]\big)$.

Sequential Importance Sampling
- Write
  $\gamma(x_{1:n}) = \gamma(x_1) \prod_{p=2}^{n} \gamma(x_p \mid x_{1:p-1})$,
- define, for $p = 1, \ldots, n$,
  $\gamma_p(x_{1:p}) = \gamma_1(x_1) \prod_{q=2}^{p} \gamma(x_q \mid x_{1:q-1})$,
- then
  $W_n(x_{1:n}) = \frac{\gamma(x_{1:n})}{\mu(x_{1:n})} = \underbrace{\frac{\gamma_1(x_1)}{\mu_1(x_1)}}_{w_1(x_1)} \prod_{p=2}^{n} \underbrace{\frac{\gamma_p(x_{1:p})}{\gamma_{p-1}(x_{1:p-1})\,\mu_p(x_p \mid x_{1:p-1})}}_{w_p(x_{1:p})}$,
- and we can sequentially approximate $Z_p = \int \gamma_p(x_{1:p})\,dx_{1:p}$.

Sequential Importance Resampling (SIR)
Given a sequence $\gamma_1(x_1), \gamma_2(x_{1:2}), \ldots$:

Initialisation, $n = 1$:
- Sample $X_1^1, \ldots, X_1^N \overset{\text{iid}}{\sim} \mu_1$.
- Compute $W_1^i = \gamma_1(X_1^i)/\mu_1(X_1^i)$.
- Obtain $\hat{Z}_1^N = \frac{1}{N}\sum_{i=1}^N W_1^i$.
[This is just importance sampling.]

Iteration, $n \leftarrow n + 1$:
- Resample: sample $X_{n,1:n-1}^1, \ldots, X_{n,1:n-1}^N \overset{\text{iid}}{\sim} \sum_{i=1}^N \frac{W_{n-1}^i}{\sum_j W_{n-1}^j}\,\delta_{X_{n-1,1:n-1}^i}$.
- Sample $X_{n,n}^i \sim q_n(\cdot \mid X_{n,1:n-1}^i)$.
- Compute $W_n^i = \frac{\gamma_n(X_{n,1:n}^i)}{\gamma_{n-1}(X_{n,1:n-1}^i)\,q_n(X_{n,n}^i \mid X_{n,1:n-1}^i)}$.
- Obtain $\hat{Z}_n^N = \hat{Z}_{n-1}^N \cdot \frac{1}{N}\sum_{i=1}^N W_n^i$.

SIR: Theoretical Justification
Under regularity conditions we still have:
- unbiasedness: $\mathbb{E}[\hat{Z}_n^N] = Z_n$;
- SLLN: $\hat{\pi}_n^N(\varphi) \xrightarrow{\text{a.s.}} \pi_n(\varphi)$ as $N \to \infty$;
- CLT: for a normal random variable $W_n$ of appropriate variance,
  $\sqrt{N}\,[\hat{\pi}_n^N(\varphi) - \pi_n(\varphi)] \xrightarrow{d} W_n$ as $N \to \infty$;
although establishing these becomes a little harder (cf., e.g., Del Moral (2004); Andrieu et al. (2010)).
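The SIR recursion translates almost line for line into code. Below is a minimal Python sketch, not the implementation behind any result shown here: `gamma`, `mu_sample`, `mu_pdf`, `q_sample` and `q_pdf` are hypothetical user-supplied callables standing in for $\gamma_n$, $\mu_1$ and $q_n$, and a serious implementation would work with log-weights to avoid numerical underflow.

```python
import numpy as np

def sir(gamma, mu_sample, mu_pdf, q_sample, q_pdf, n_steps, N, rng):
    """Sequential importance resampling for a sequence of unnormalised
    targets gamma(n, x_{1:n}); returns paths, weights and an estimate of Z_n."""
    # Initialisation (n = 1): plain importance sampling from mu_1.
    x = mu_sample(N, rng)[:, None]          # trajectories, shape (N, 1)
    w = gamma(1, x) / mu_pdf(x[:, 0])       # importance weights
    z_hat = w.mean()                        # estimate of Z_1
    for n in range(2, n_steps + 1):
        # Resample whole trajectories in proportion to their weights.
        idx = rng.choice(N, size=N, p=w / w.sum())
        x = x[idx]
        # Extend each trajectory with a draw from the proposal q_n.
        x_new = q_sample(x, rng)            # shape (N,)
        x_ext = np.column_stack([x, x_new])
        # Incremental weight: gamma_n / (gamma_{n-1} * q_n).
        w = gamma(n, x_ext) / (gamma(n - 1, x) * q_pdf(x_new, x))
        z_hat *= w.mean()                   # running estimate of Z_n
        x = x_ext
    return x, w, z_hat
```

Taking $\gamma_n(x_{1:n}) = p(x_{1:n}, y_{1:n} \mid \theta)$ and $q_n = f^\theta$ in this sketch recovers the bootstrap particle filter discussed next.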
Simple Particle Filters: One Family of SIR Algorithms
[Figure: hidden Markov model, an unobserved chain $x_1, \ldots, x_6$ with observations $y_1, \ldots, y_6$.]
- Unobserved Markov chain $\{X_n\}$ with transition density $f$.
- Observed process $\{Y_n\}$ with conditional density $g$.
- The joint density is available:
  $p(x_{1:n}, y_{1:n} \mid \theta) = f_1^\theta(x_1)\,g^\theta(y_1 \mid x_1) \prod_{i=2}^{n} f^\theta(x_i \mid x_{i-1})\,g^\theta(y_i \mid x_i)$.
- Natural SIR target distributions:
  $\pi_n^\theta(x_{1:n}) := p(x_{1:n} \mid y_{1:n}, \theta) \propto p(x_{1:n}, y_{1:n} \mid \theta) =: \gamma_n^\theta(x_{1:n})$,
  $Z_n^\theta = \int p(x_{1:n}, y_{1:n} \mid \theta)\,dx_{1:n} = p(y_{1:n} \mid \theta)$.

Bootstrap PFs and Similar
- Choosing
  $\pi_n^\theta(x_{1:n}) := p(x_{1:n} \mid y_{1:n}, \theta) \propto p(x_{1:n}, y_{1:n} \mid \theta) =: \gamma_n^\theta(x_{1:n})$,
  $Z_n^\theta = \int p(x_{1:n}, y_{1:n} \mid \theta)\,dx_{1:n} = p(y_{1:n} \mid \theta)$,
- and $q_p(x_p \mid x_{1:p-1}) = f^\theta(x_p \mid x_{p-1})$ yields the bootstrap particle filter of Gordon et al. (1993),
- whereas $q_p(x_p \mid x_{1:p-1}) = p(x_p \mid x_{p-1}, y_p, \theta)$ yields the "locally optimal" particle filter.
- Note: many alternative particle filters are SIR algorithms with other targets; cf. J. and Doucet (2008); Doucet and J. (2011).

Sequential Monte Carlo Samplers: Another SIR Class
Given a sequence of targets $\pi_1, \ldots, \pi_n$ on arbitrary spaces, Del Moral et al. (2006) extend the space:
  $\tilde{\pi}_n(x_{1:n}) = \pi_n(x_n) \prod_{p=n-1}^{1} L_p(x_{p+1}, x_p)$,
  $\tilde{\gamma}_n(x_{1:n}) = \gamma_n(x_n) \prod_{p=n-1}^{1} L_p(x_{p+1}, x_p)$,
  $\tilde{Z}_n = \int \tilde{\gamma}_n(x_{1:n})\,dx_{1:n} = \int \gamma_n(x_n) \prod_{p=n-1}^{1} L_p(x_{p+1}, x_p)\,dx_{1:n} = \int \gamma_n(x_n)\,dx_n = Z_n$.

A Simple SMC Sampler
Given $\gamma_1, \ldots, \gamma_n$ on $(E, \mathcal{E})$, for $i = 1, \ldots, N$:
- Sample $X_1^i \overset{\text{iid}}{\sim} \mu_1$, compute $W_1^i = \gamma_1(X_1^i)/\mu_1(X_1^i)$ and $\hat{Z}_1^N = \frac{1}{N}\sum_{i=1}^N W_1^i$.
- For $p = 2, \ldots, n$:
  - Resample: $\tilde{X}_{p-1}^{1:N} \overset{\text{iid}}{\sim} \sum_{i=1}^N \frac{W_{p-1}^i}{\sum_j W_{p-1}^j}\,\delta_{X_{p-1}^i}$.
  - Sample: $X_p^i \sim K_p(\tilde{X}_{p-1}^i, \cdot)$, where $\pi_p K_p = \pi_p$.
  - Compute: $W_p^i = \gamma_p(\tilde{X}_{p-1}^i)/\gamma_{p-1}(\tilde{X}_{p-1}^i)$.
  - And $\hat{Z}_p^N = \hat{Z}_{p-1}^N \cdot \frac{1}{N}\sum_{i=1}^N W_p^i$.

Bayesian Inference (Chopin, 2002; Del Moral et al., 2006)
In a Bayesian context:
- Given a prior $p(\theta)$ and likelihood $l(\theta; y_{1:m})$,
- one could specify:
  Data tempering: $\gamma_p(\theta) = p(\theta)\,l(\theta; y_{1:m_p})$ for $0 = m_1 < m_2 < \cdots < m_T = m$.
  Likelihood tempering: $\gamma_p(\theta) = p(\theta)\,l(\theta; y_{1:m})^{\beta_p}$ for $0 = \beta_1 < \beta_2 < \cdots < \beta_T = 1$.
  Something else?
- Here $Z_T = \int p(\theta)\,l(\theta; y_{1:m})\,d\theta$ and $\gamma_T(\theta) \propto p(\theta \mid y_{1:m})$.
- Specifying $(m_1, \ldots, m_T)$, $(\beta_1, \ldots, \beta_T)$ or $(\gamma_1, \ldots, \gamma_T)$ is hard.

Illustrative Sequences of Targets
[Figure: two panels of normal densities illustrating sequences of intermediate targets moving towards the target distribution.]

One Adaptive Scheme (Zhou, J. & Aston, 2016, and references therein)
Resample: when the ESS falls below a threshold.
Likelihood tempering: at iteration $n$, set $\alpha_n$ such that
  $\frac{N\big(\sum_{j=1}^N W_{n-1}^{(j)} w_n^{(j)}\big)^2}{\sum_{k=1}^N W_{n-1}^{(k)} \big(w_n^{(k)}\big)^2} = \mathrm{CESS}^\star.$
Proposals: follow Jasra et al. (2010) and adapt to keep the acceptance rate about right. (A sketch of the tempering step follows.)
Question: are there better, practical approaches to specifying a sequence of distributions?
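The CESS condition above is typically solved numerically at each iteration. Here is a minimal sketch, assuming likelihood tempering with normalised weights `W` and per-particle log-likelihoods `loglike`, and assuming the CESS is monotone in the exponent so that bisection applies; the function and its interface are illustrative, not taken from the authors' code.

```python
import numpy as np

def next_beta(beta_prev, W, loglike, cess_target, tol=1e-6):
    """Choose the next inverse temperature so that the conditional ESS of the
    incremental weights w_j = l(theta_j)^(beta - beta_prev) hits cess_target.
    W: current normalised weights; loglike: per-particle log-likelihoods."""
    N = len(W)

    def cess(beta):
        logw = (beta - beta_prev) * loglike    # incremental log-weights
        logw -= logw.max()                     # stabilise; CESS is scale-invariant
        w = np.exp(logw)
        return N * (W @ w) ** 2 / (W @ w ** 2)

    lo, hi = beta_prev, 1.0
    if cess(hi) >= cess_target:                # can jump straight to beta = 1
        return 1.0
    while hi - lo > tol:                       # bisect, assuming CESS decreases in beta
        mid = 0.5 * (lo + hi)
        if cess(mid) >= cess_target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

Note that $\mathrm{CESS}(\beta_{n-1}) = N$, so choosing $\mathrm{CESS}^\star$ as a fixed fraction of $N$ controls how aggressively the tempering sequence advances.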
Divide-and-Conquer (Lindsten, J. et al., 2014)
Many models admit natural decompositions:
[Figure: a three-level decomposition. Level 0: the full model ($\tilde{x}_1, \ldots, \tilde{x}_5$ with observations $y_1, y_2, y_3$); Level 1: the sub-model on $\tilde{x}_1, \ldots, \tilde{x}_4$; Level 2: the individual components $\tilde{x}_1, \tilde{x}_2, \tilde{x}_3$ with their observations.]
To which we can apply a divide-and-conquer strategy:
[Figure: a tree of target distributions with root $\pi_r$, an internal node $\pi_t$ and children $\pi_{c_1}, \ldots, \pi_{c_C}$.]

A Few Formalities...
- Use a tree $T$ of models (with rootward variable inclusion), as in the figure above.
- Let $t \in T$ denote a node; $r \in T$ is the root.
- Let $C(t) = \{c_1, \ldots, c_C\}$ denote the children of $t$.
- Let $\tilde{X}_t$ denote the space of the variables included in $t$ but not in its children.
- dc-smc can be viewed as a recursion over this tree; see the listing and the sketch that follow.

dc-smc(t): An Extension of SIR
1. For $c \in C(t)$:
   1.1 $(\{x_c^i, w_c^i\}_{i=1}^N, \hat{Z}_c^N) \leftarrow \text{dc-smc}(c)$.
   1.2 Resample $\{x_c^i, w_c^i\}_{i=1}^N$ to obtain the equally weighted particle system $\{\hat{x}_c^i, 1\}_{i=1}^N$.
2. For particle $i = 1 : N$:
   2.1 If $\tilde{X}_t \neq \emptyset$, simulate $\tilde{x}_t^i \sim q_t(\cdot \mid \hat{x}_{c_1}^i, \ldots, \hat{x}_{c_C}^i)$, where $(c_1, c_2, \ldots, c_C) = C(t)$; else $\tilde{x}_t^i \leftarrow \emptyset$.
   2.2 Set $x_t^i = (\hat{x}_{c_1}^i, \ldots, \hat{x}_{c_C}^i, \tilde{x}_t^i)$.
   2.3 Compute $w_t^i = \frac{\gamma_t(x_t^i)}{\prod_{c \in C(t)} \gamma_c(\hat{x}_c^i)} \cdot \frac{1}{q_t(\tilde{x}_t^i \mid \hat{x}_{c_1}^i, \ldots, \hat{x}_{c_C}^i)}$.
3. Compute $\hat{Z}_t^N = \big\{\frac{1}{N}\sum_{i=1}^N w_t^i\big\} \prod_{c \in C(t)} \hat{Z}_c^N$.
4. Return $(\{x_t^i, w_t^i\}_{i=1}^N, \hat{Z}_t^N)$.
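The listing maps naturally onto a recursive function. Below is a minimal Python sketch under the assumption that each node object exposes `children`, an unnormalised density `gamma`, and a proposal pair `sample_q`/`q_pdf` (with `sample_q = None` when the node introduces no new variables); all of these names are hypothetical, and multinomial resampling stands in for any unbiased, exchangeable scheme.

```python
import numpy as np

def dc_smc(t, N, rng):
    """Recursive sketch of dc-smc(t): recurse into the children, resample
    each child population, merge, propose new variables, weight."""
    child_x, log_Z = [], 0.0
    for c in t.children:                            # step 1: recurse and resample
        x_c, w_c, Z_c = dc_smc(c, N, rng)
        idx = rng.choice(N, size=N, p=w_c / w_c.sum())
        child_x.append([x_c[i] for i in idx])
        log_Z += np.log(Z_c)
    x_t, w_t = [], np.empty(N)
    for i in range(N):                              # step 2: merge and reweight
        merged = tuple(xc[i] for xc in child_x)     # (x̂_{c_1}, ..., x̂_{c_C})
        if t.sample_q is not None:                  # new variables at this node
            x_new = t.sample_q(merged, rng)
            x_i = merged + (x_new,)
            denom = t.q_pdf(x_new, merged)
        else:
            x_i, denom = merged, 1.0
        for c, xc in zip(t.children, child_x):
            denom *= c.gamma(xc[i])
        x_t.append(x_i)
        w_t[i] = t.gamma(x_i) / denom               # weight of step 2.3
    Z_t = np.exp(log_Z) * w_t.mean()                # step 3
    return x_t, w_t, Z_t
```

Because the recursive calls on distinct subtrees are independent, they can be dispatched to different machines, which is the basis of the distributed implementation discussed later.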
Theoretical Properties

Unbiasedness of normalising constant estimates: provided that $\gamma_t \ll (\otimes_{c \in C(t)} \gamma_c) \otimes q_t$ for every $t \in T$, and an unbiased, exchangeable resampling scheme is applied to every population at every iteration, we have for any $N \geq 1$:
  $\mathbb{E}[\hat{Z}_r^N] = Z_r = \int \gamma_r(x_r)\,dx_r$.

Strong law of large numbers: under regularity conditions the weighted particle system $(x_r^{i,N}, w_r^{i,N})_{i=1}^N$ generated by dc-smc($r$) is consistent, in that for all functions $f: Z \to \mathbb{R}$ satisfying certain assumptions:
  $\sum_{i=1}^N \frac{w_r^{N,i}}{\sum_{j=1}^N w_r^{N,j}}\,f(x_r^{i,N}) \xrightarrow{\text{a.s.}} \int f(x)\,\pi(x)\,dx, \quad \text{as } N \to \infty.$

Some (Importance) Extensions
1. Mixture resampling
2. Tempering (Del Moral et al., 2006)
3. Adaptation (Zhou, J. and Aston, 2016)

An Ising Model
- $k$ indexes the sites $v_k \in V \subset \mathbb{Z}^2$,
- $x_k \in \{-1, 1\}$,
- $p(x) \propto e^{-\beta E(x)}$, $\beta \geq 0$, with energy $E(x) = -\sum_{(k,l) \in E} x_k x_l$.
- We consider a grid of size $64 \times 64$ with $\beta = 0.4407$ (the critical inverse temperature).

A Sequence of Decompositions
[Figure: the $64 \times 64$ grid recursively divided into sub-blocks, giving the tree of intermediate models.]

[Figure: estimates of $\log Z$ and of $\mathbb{E}[E(x)]$ against wall-clock time (s) for SMC, D&C-SMC (ann), D&C-SMC (ann+mix) and single-flip MH.]
Summaries over 50 independent runs of each algorithm.

New York Schools Maths Test: Data
- Data organised into a tree $T$.
- A root-to-leaf path is: NYC (the root, denoted by $r \in T$), borough, school district, school, year.
- Each leaf $t \in T$ comes with an observation of $m_t$ exam successes out of $M_t$ trials.
- In total: 278 399 test instances,
- five boroughs (Manhattan, The Bronx, Brooklyn, Queens, Staten Island),
- 32 distinct districts,
- 710 distinct schools.

New York Schools Maths Test: Bayesian Model
- The number of successes $m_t$ at a leaf $t$ is $\mathrm{Bin}(M_t, p_t)$,
- where $p_t = \mathrm{logistic}(\theta_t)$ and $\theta_t$ is a latent parameter.
- Internal nodes of the tree also carry a latent $\theta_t$.
- We model the difference in $\theta_t$ along an edge $e = (t \to t')$ as $\theta_{t'} = \theta_t + \Delta_e$,
- where $\Delta_e \sim \mathcal{N}(0, \sigma_e^2)$.
- We put an improper prior (uniform on $(-\infty, \infty)$) on $\theta_r$.
- We also make the variance random, but shared across siblings: $\sigma_t^2 \sim \mathrm{Exp}(1)$.

New York Schools Maths Test: Implementation
- The basic SIR implementation of dc-smc,
- using the natural hierarchical structure provided by the model.
- Given $\sigma_t^2$ and the $\theta_t$ at the leaves, the other random variables are multivariate normal.
- We instantiate values for $\theta_t$ only at the leaves.
- At an internal node $t'$, sample only $\sigma_{t'}^2$ and marginalise out $\theta_{t'}$.
- Each step of dc-smc is therefore one of the following (sketched below):
  i. at a leaf, sample $p_t \sim \mathrm{Beta}(1 + m_t, 1 + M_t - m_t)$ and set $\theta_t = \mathrm{logit}(p_t)$;
  ii. at an internal node, sample $\sigma_t^2 \sim \mathrm{Exp}(1)$.
- Java implementation: https://github.com/alexandrebouchard/multilevelSMC
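The two node-level sampling steps amount to only a few lines each. A minimal sketch follows, with the surrounding dc-smc bookkeeping omitted; the function names are illustrative, while `numpy`'s `Generator.beta`/`exponential` and `scipy.special.logit` are the only real APIs used.

```python
import numpy as np
from scipy.special import logit

def sample_leaf(m_t, M_t, rng):
    """Leaf step (i): draw p_t ~ Beta(1 + m_t, 1 + M_t - m_t), the conjugate
    posterior under a uniform prior, and return theta_t = logit(p_t)."""
    p_t = rng.beta(1 + m_t, 1 + M_t - m_t)
    return logit(p_t)

def sample_internal(rng):
    """Internal-node step (ii): draw the shared variance sigma_t^2 ~ Exp(1);
    theta at the node itself is marginalised out analytically."""
    return rng.exponential(1.0)

# Illustrative usage (hypothetical counts):
# rng = np.random.default_rng(1)
# theta_t = sample_leaf(m_t=38, M_t=50, rng=rng)
```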
New York Schools Maths Test: Results
[Figure: posterior density approximations of the mean parameter for NY and for each borough (NY-Bronx, NY-Kings, NY-Manhattan, NY-Queens, NY-Richmond); D&C with 10 000 particles.]
- Bronx County has the highest fraction (41%) of children (under 18) living below the poverty level.¹
- Queens has the second lowest (19.7%), after Richmond (Staten Island, 16.7%).
- Staten Island contains a single school district, so the posterior distribution is much flatter for this borough.

¹ Statistics from the New York State Poverty Report 2013, http://ams.nyscommunityaction.org/Resources/Documents/News/NYSCAAs_2013_Poverty_Report.pdf

Normalising Constant Estimates
[Figure: estimates of $\log Z$ against the number of particles (100 to 1 000 000, log scale) for the D&C (DC) and standard (STD) sampling methods.]

Distributed Implementation
[Figure: wall-clock time (s) and speedup against the number of nodes (1 to 32).]
Xeon X5650 2.66GHz processors connected by a non-blocking Infiniband 4X QDR network.

Conclusions
- SMC ≈ SIR.
- D&C-SMC ≈ SIR + coalescence.
- Distributed implementation is straightforward.
- The D&C strategy can improve even serial performance.
Some questions remain unanswered:
- How can we construct (near-)optimal tree decompositions?
- How much standard SMC theory can be extended to this setting?

References
[1] C. Andrieu, A. Doucet, and R. Holenstein. Particle Markov chain Monte Carlo. Journal of the Royal Statistical Society B, 72(3):269–342, 2010.
[2] N. Chopin. A sequential particle filter method for static models. Biometrika, 89(3):539–551, 2002.
[3] P. Del Moral. Feynman-Kac Formulae: Genealogical and Interacting Particle Systems with Applications. Probability and Its Applications. Springer Verlag, New York, 2004.
[4] P. Del Moral, A. Doucet, and A. Jasra. Sequential Monte Carlo methods for Bayesian computation. In Bayesian Statistics 8. Oxford University Press, 2006.
[5] A. Doucet and A. M. Johansen. A tutorial on particle filtering and smoothing: Fifteen years later. In D. Crisan and B. Rozovsky, editors, The Oxford Handbook of Nonlinear Filtering, pages 656–704. Oxford University Press, 2011.
[6] N. J. Gordon, D. J. Salmond, and A. F. M. Smith. Novel approach to nonlinear/non-Gaussian Bayesian state estimation. IEE Proceedings-F, 140(2):107–113, April 1993.
[7] A. Jasra, D. A. Stephens, A. Doucet, and T. Tsagaris. Inference for Lévy-driven stochastic volatility models via adaptive sequential Monte Carlo. Scandinavian Journal of Statistics, 38(1):1–22, Dec. 2010.
[8] A. M. Johansen and A. Doucet. A note on the auxiliary particle filter. Statistics and Probability Letters, 78(12):1498–1504, September 2008. URL http://dx.doi.org/10.1016/j.spl.2008.01.032.
[9] F. Lindsten, A. M. Johansen, C. A. Næsseth, B. Kirkpatrick, T. Schön, J. A. D. Aston, and A. Bouchard-Côté. Divide and conquer with sequential Monte Carlo samplers. Technical Report 1406.4993, ArXiv Mathematics e-prints, 2014.
[10] Y. Zhou, A. M. Johansen, and J. A. D. Aston. Towards automatic model comparison: An adaptive sequential Monte Carlo approach. Journal of Computational and Graphical Statistics, 2015. In press.