Some [very biased] Perspectives on Sequential Monte Carlo and Normalising Constants

Adam M. Johansen
Collaborators include: John Aston, Alexandre Bouchard-Côté, Pierre Del Moral, Arnaud Doucet, Pieralberto Guarniero, Anthony Lee, Fredrik Lindsten, Christian Næsseth, Thomas Schön, Yan Zhou

University of Warwick
a.m.johansen@warwick.ac.uk
www2.warwick.ac.uk/fac/sci/statistics/staff/academic/johansen/talks/

CRiSM Workshop on Estimating Normalising Constants, University of Warwick, April 20th, 2016


Outline

- Background: SMC and (Ratios of) Constants
- Some Applications and Variations on a Theme
- Some "Interesting" (Open) Questions


Essential Problem

Given a distribution
\[ \pi(dx) = \frac{\gamma(dx)}{Z} = \frac{\gamma(x)\,dx}{Z}, \]
such that $\gamma(x)$ can be evaluated pointwise, how can we "estimate" $Z = \int \gamma(dx)$?


Importance Sampling

- Simple identity, provided $\gamma \ll \mu$:
\[ Z = \int \gamma(dx) = \int \frac{d\gamma}{d\mu}(x)\,\mu(dx) = \int \frac{\gamma(x)}{\mu(x)}\,\mu(x)\,dx. \]
- So, if $X_1, X_2, \ldots \overset{\text{iid}}{\sim} \mu$, then:

  unbiasedness: for every $N$,
  \[ \mathbb{E}\left[ \frac{1}{N} \sum_{i=1}^{N} \frac{d\gamma}{d\mu}(X_i) \right] = Z; \]

  SLLN:
  \[ \frac{1}{N} \sum_{i=1}^{N} \frac{d\gamma}{d\mu}(X_i) \overset{\text{a.s.}}{\longrightarrow} Z \quad \text{as } N \to \infty; \]

  CLT:
  \[ \frac{1}{\sqrt{N}} \sum_{i=1}^{N} \left[ \frac{d\gamma}{d\mu}(X_i) - Z \right] \overset{d}{\longrightarrow} W \quad \text{as } N \to \infty, \]
  where $W \sim \mathcal{N}\!\left(0, \operatorname{Var}\left[\frac{d\gamma}{d\mu}(X_1)\right]\right)$.


Sequential Importance Sampling

- Write
\[ \gamma(x_{1:n}) = \gamma(x_1) \prod_{p=2}^{n} \gamma(x_p \mid x_{1:p-1}); \]
- define, for $p = 1, \ldots, n$,
\[ \gamma_p(x_{1:p}) = \gamma_1(x_1) \prod_{q=2}^{p} \gamma(x_q \mid x_{1:q-1}); \]
- then
\[ \underbrace{\frac{\gamma(x_{1:n})}{\mu(x_{1:n})}}_{W_n(x_{1:n})} = \underbrace{\frac{\gamma_1(x_1)}{\mu_1(x_1)}}_{w_1(x_1)} \prod_{p=2}^{n} \underbrace{\frac{\gamma_p(x_{1:p})}{\gamma_{p-1}(x_{1:p-1})\,\mu_p(x_p \mid x_{1:p-1})}}_{w_p(x_{1:p})}, \]
- and we can sequentially approximate $Z_p = \int \gamma_p(x_{1:p})\,dx_{1:p}$.


Sequential Importance Resampling (SIR)

Given a sequence $\gamma_1(x_1), \gamma_2(x_{1:2}), \ldots$:

Initialisation, $n = 1$:
- Sample $X_1^1, \ldots, X_1^N \overset{\text{iid}}{\sim} \mu_1$.
- Compute $W_1^i = \gamma_1(X_1^i) / \mu_1(X_1^i)$.
- Obtain $\hat{Z}_1^N = \frac{1}{N} \sum_{i=1}^{N} W_1^i$.
[This is just importance sampling.]

Iteration, $n \leftarrow n + 1$:
- Resample: sample $(A_{n-1}^1, \ldots, A_{n-1}^N) \sim r(\cdot \mid W_{n-1}^{1:N})$, such that $P(A_{n-1}^i = j) \propto W_{n-1}^j$.
- Set $B_{n,n}^i = i$ and, recursively, $B_{n,p}^i = A_p^{B_{n,p+1}^i}$.
- Sample $X_n^i \sim q_n(\cdot \mid X_1^{B_{n,1}^i}, \ldots, X_{n-1}^{B_{n,n-1}^i})$.
- Compute
\[ W_n^i = \frac{\gamma_n\big(X_1^{B_{n,1}^i}, \ldots, X_n^{B_{n,n}^i}\big)}{\gamma_{n-1}\big(X_1^{B_{n,1}^i}, \ldots, X_{n-1}^{B_{n,n-1}^i}\big) \cdot q_n\big(X_n^i \mid X_1^{B_{n,1}^i}, \ldots, X_{n-1}^{B_{n,n-1}^i}\big)}. \]
- Obtain
\[ \hat{Z}_n^N = \hat{Z}_{n-1}^N \cdot \frac{1}{N} \sum_{i=1}^{N} W_n^i. \]


Ancestral Trees

[Figure: ancestral trees of a particle system over $t = 1, 2, 3$; e.g. ancestor indices $a_1^1 = 1$, $a_1^4 = 3$, $a_2^1 = 1$, $a_2^4 = 3$ give ancestral lines $b_{3,1:3}^2 = (1, 1, 2)$, $b_{3,1:3}^4 = (3, 3, 4)$, $b_{3,1:3}^6 = (4, 5, 6)$.]


SIR: Theoretical Justification

Under regularity conditions we still have:

unbiasedness: $\mathbb{E}[\hat{Z}_n^N] = Z_n$;

SLLN: $\lim_{N\to\infty} \hat{Z}_n^N = Z_n$ almost surely;

CLT: for a normal random variable $W_n$ of appropriate variance,
\[ \sqrt{N}\,[\hat{Z}_n^N - Z_n] \overset{d}{\longrightarrow} W_n \quad \text{as } N \to \infty, \]

although establishing these results becomes a little harder (cf., e.g., Del Moral (2004); Andrieu et al. (2010)).

SMC and Constants Adam M.
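
To make the plain importance sampling estimator of the Background part concrete, here is a minimal Python sketch (not from the slides). The unnormalised target $\gamma$, its normalising constant $Z = 4$, and the Gaussian proposal $\mu$ are all invented for illustration, chosen so that the estimate can be checked against the known truth:

    import numpy as np

    rng = np.random.default_rng(1)
    Z_TRUE = 4.0  # known by construction, so the estimate can be checked

    def gamma_u(x):
        # unnormalised target: Z_TRUE * N(x; 1, 0.5^2), evaluable pointwise
        return Z_TRUE * np.exp(-0.5 * ((x - 1.0) / 0.5) ** 2) / (0.5 * np.sqrt(2.0 * np.pi))

    def mu_pdf(x):
        # proposal density mu = N(0, 2^2); gamma is absolutely continuous w.r.t. mu
        return np.exp(-0.5 * (x / 2.0) ** 2) / (2.0 * np.sqrt(2.0 * np.pi))

    N = 100_000
    X = rng.normal(0.0, 2.0, size=N)   # X_1, ..., X_N iid from mu
    w = gamma_u(X) / mu_pdf(X)         # (dgamma/dmu)(X_i)
    print(w.mean())                    # unbiased estimate of Z, close to 4

Increasing $N$ shrinks the error at the $O(N^{-1/2})$ rate given by the CLT above.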
Simple Particle Filters: One Family of SIR Algorithms

[Figure: hidden Markov model graph, states $x_1, \ldots, x_6$ with observations $y_1, \ldots, y_6$.]

- Unobserved Markov chain $\{X_n\}$ with transition density $f$.
- Observed process $\{Y_n\}$ with conditional density $g$.
- The joint density is available:
\[ p(x_{1:n}, y_{1:n} \mid \theta) = f_1^\theta(x_1)\,g^\theta(y_1 \mid x_1) \prod_{i=2}^{n} f^\theta(x_i \mid x_{i-1})\,g^\theta(y_i \mid x_i). \]
- Natural SIR target distributions:
\[ \pi_n^\theta(x_{1:n}) := p(x_{1:n} \mid y_{1:n}, \theta) \propto p(x_{1:n}, y_{1:n} \mid \theta) =: \gamma_n^\theta(x_{1:n}), \]
\[ Z_n^\theta = \int p(x_{1:n}, y_{1:n} \mid \theta)\,dx_{1:n} = p(y_{1:n} \mid \theta). \]


Bootstrap PFs and Similar

- Choosing the natural SIR targets above
- and $q_p(x_p \mid x_{1:p-1}) = f^\theta(x_p \mid x_{p-1})$ yields the bootstrap particle filter of Gordon et al. (1993) (a runnable sketch appears at the end of this part),
- whereas $q_p(x_p \mid x_{1:p-1}) = p(x_p \mid x_{p-1}, y_p, \theta)$ yields the "locally optimal" particle filter.
- Note: many alternative particle filters are SIR algorithms with other targets. Cf. J. and Doucet (2008); Doucet and J. (2011).


Sequential Monte Carlo Samplers: Another SIR Class

Given a sequence of targets $\pi_1, \ldots, \pi_n$ on arbitrary spaces, Del Moral et al. (2006) extend the space:
\[ \tilde{\pi}_n(x_{1:n}) = \pi_n(x_n) \prod_{p=n-1}^{1} L_p(x_{p+1}, x_p), \qquad \tilde{\gamma}_n(x_{1:n}) = \gamma_n(x_n) \prod_{p=n-1}^{1} L_p(x_{p+1}, x_p), \]
\[ \tilde{Z}_n = \int \tilde{\gamma}_n(dx_{1:n}) = \int \gamma_n(dx_n) \prod_{p=n-1}^{1} \int L_p(x_{p+1}, dx_p) = \int \gamma_n(dx_n) = Z_n. \]


A Simple SMC Sampler

Given $\gamma_1, \ldots, \gamma_n$ on $(E, \mathcal{E})$:
- Sample $X_1^i \overset{\text{iid}}{\sim} \mu_1$ for $i = 1, \ldots, N$, compute $W_1^i = \gamma_1(X_1^i)/\mu_1(X_1^i)$ and $\hat{Z}_1^N = \frac{1}{N}\sum_{i=1}^{N} W_1^i$.
- For $p = 2, \ldots, n$:
  - Resample: $A_{p-1}^{1:N} \sim r(\cdot \mid W_{p-1}^{1:N})$.
  - Sample: $X_p^i \sim K_p(X_{p-1}^{A_{p-1}^i}, \cdot)$, where $\pi_p K_p = \pi_p$.
  - Compute:
  \[ W_p^i = \frac{\gamma_p\big(X_{p-1}^{A_{p-1}^i}\big)}{\gamma_{p-1}\big(X_{p-1}^{A_{p-1}^i}\big)}. \]
  - And $\hat{Z}_p^N = \hat{Z}_{p-1}^N \cdot \frac{1}{N} \sum_{i=1}^{N} W_p^i$.


Rare Events (J., Del Moral and Doucet, 2006)

- If $Y \sim \eta$, then $P(Y \in A) = \eta(A) = \eta(\mathbb{I}_A)$.
- We are interested in $A = \{x : V(x) \ge \hat{V}\}$, with $\eta(A) \ll 1$.
- If $\gamma(x) = \eta(x)\,\mathbb{I}_A(x)$, then $Z = \eta(A)$.
- Let $\alpha_1 = 0 < \alpha_2 < \cdots < \alpha_T = \infty$, and define
\[ \gamma_n(x) = \eta(x)\left(1 + \exp\big(\alpha_n(\hat{V} - V(x))\big)\right)^{-1}. \]
- We have $\gamma_1 \propto \eta$ and $\gamma_T/\eta = \mathbb{I}_A$, so $Z_T = \eta(A)$.


An Illustrative Sequence of Targets

[Figure: the tempered targets $\pi_\alpha(x)$ for $\alpha \in \{0, 1, 2, 5, 10, 25\}$, interpolating between $\eta$ and its restriction to $A$.]


Evidence Evaluation (Chopin, 2002; Del Moral et al., 2006)

In a Bayesian context:
- Given a prior $p(\theta)$ and likelihood $l(\theta; y_{1:m})$,
- one could specify:
  Data tempering: $\gamma_p(\theta) = p(\theta)\,l(\theta; y_{1:m_p})$ for $m_1 = 0 < m_2 < \cdots < m_T = m$;
  Likelihood tempering: $\gamma_p(\theta) = p(\theta)\,l(\theta; y_{1:m})^{\beta_p}$ for $\beta_1 = 0 < \beta_2 < \cdots < \beta_T = 1$;
  Something else?
- For such schemes $Z_T = \int p(\theta)\,l(\theta; y_{1:m})\,d\theta$.
- Specifying $(m_1, \ldots, m_T)$, $(\beta_1, \ldots, \beta_T)$ or $(\gamma_1, \ldots, \gamma_T)$ is not trivial.


Illustrative Sequences of Targets

[Figure: two illustrative sequences of intermediate densities, each interpolating from the prior to the posterior.]
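
As a concrete companion to the likelihood-tempering scheme above, here is a minimal SMC sampler sketch in Python. The conjugate Gaussian model, the fixed 21-step temperature ladder, the random-walk step size and the resample-every-iteration schedule are all illustrative assumptions rather than choices made in the talk; the incremental weight is $w_p(\theta) = l(\theta; y)^{\beta_p - \beta_{p-1}}$, and the move kernel $K_p$ is a Metropolis kernel leaving $\pi_p \propto p(\theta)\,l(\theta; y)^{\beta_p}$ invariant:

    import numpy as np

    rng = np.random.default_rng(3)
    y = rng.normal(0.5, 1.0, size=20)                # synthetic data, for illustration

    def log_lik(theta):
        # log l(theta; y) for i.i.d. N(theta, 1) observations, vectorised over particles
        return (-0.5 * np.sum((y[None, :] - theta[:, None]) ** 2, axis=1)
                - 0.5 * len(y) * np.log(2.0 * np.pi))

    def log_prior(theta):
        # log p(theta) for a N(0, 1) prior
        return -0.5 * theta ** 2 - 0.5 * np.log(2.0 * np.pi)

    N, betas = 1000, np.linspace(0.0, 1.0, 21)
    theta = rng.normal(0.0, 1.0, size=N)             # exact draws from pi_1 = prior
    log_Z = 0.0
    for b_prev, b in zip(betas[:-1], betas[1:]):
        lw = (b - b_prev) * log_lik(theta)           # incremental log-weights
        log_Z += np.logaddexp.reduce(lw) - np.log(N) # log of (1/N) sum of W_p^i
        W = np.exp(lw - lw.max()); W /= W.sum()
        theta = theta[rng.choice(N, size=N, p=W)]    # multinomial resampling
        # one random-walk Metropolis move leaving pi_p invariant
        prop = theta + 0.5 * rng.normal(size=N)
        log_acc = (log_prior(prop) + b * log_lik(prop)
                   - log_prior(theta) - b * log_lik(theta))
        theta = np.where(np.log(rng.uniform(size=N)) < log_acc, prop, theta)
    print(log_Z)                                     # estimate of log Z_T

Working on the log scale avoids underflow over long ladders; for this conjugate model the exact evidence is available in closed form, so the sketch can be validated.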
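
The bootstrap particle filter described earlier in this part is short enough to write out in full. Below is a minimal sketch for a scalar linear-Gaussian HMM; the parameter value, time horizon and simulated data are invented for illustration. With $q_n = f^\theta$ the incremental weight reduces to $g^\theta(y_n \mid x_n)$, and for this model $\hat{Z}_n^N$ could be checked against the exact likelihood from a Kalman filter:

    import numpy as np

    def norm_pdf(y, m, s):
        return np.exp(-0.5 * ((y - m) / s) ** 2) / (s * np.sqrt(2.0 * np.pi))

    def bootstrap_pf(y, a, N, rng):
        # assumes f1 = N(0, 1), f(x'|x) = N(x'; a x, 1), g(y|x) = N(y; x, 1)
        Z_hat = 1.0
        x = rng.normal(0.0, 1.0, size=N)              # X_1^i ~ mu_1 = f1
        for n, yn in enumerate(y):
            if n > 0:
                # resample, then propose from the transition f (i.e. q_n = f)
                anc = rng.choice(N, size=N, p=w / w.sum())
                x = a * x[anc] + rng.normal(size=N)
            w = norm_pdf(yn, x, 1.0)                  # W_n^i = g(y_n | X_n^i)
            Z_hat *= w.mean()                         # Z^_n = Z^_{n-1} * mean(W_n)
        return Z_hat                                  # accumulate logs for long series

    rng = np.random.default_rng(2)
    a, T = 0.9, 100
    x_true = np.zeros(T)
    x_true[0] = rng.normal()
    for t in range(1, T):
        x_true[t] = a * x_true[t - 1] + rng.normal()
    y = x_true + rng.normal(size=T)
    print(bootstrap_pf(y, a, N=1000, rng=rng))        # estimate of p(y_{1:T})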
One Adaptive Scheme (Zhou, J. & Aston, 2016, and references therein)

Resample: when the ESS falls below a threshold.

Likelihood tempering: at iteration $n$, set $\alpha_n$ such that the conditional ESS hits a target value $\mathrm{CESS}^\star$:
\[ \frac{N\left(\sum_{j=1}^{N} W_{n-1}^{(j)}\,w_n^{(j)}\right)^2}{\sum_{k=1}^{N} W_{n-1}^{(k)}\,\big(w_n^{(k)}\big)^2} = \mathrm{CESS}^\star \]
(a bisection sketch for this rule appears at the end of this part).

Proposals: follow Jasra et al. (2010); adapt to keep the acceptance rate about right.

Question: Are there better, practical approaches to specifying a sequence of distributions?


Results from Zhou et al. (2016)

[Figure: model selection results for model orders 1-3 using AIC (above) and the SMC evidence estimate (below).]


Iterative Adaptation (Guarniero, J. and Lee, 2015) [Poster 7]

- In the filtering context, using
\[ \pi_p(x_{1:p}) \propto p(x_{1:p}, y_{1:p}) \underbrace{p(y_{p+1:n} \mid x_p)}_{\psi_p^\star(x_p)} \quad \text{and} \quad q_p(x_p \mid x_{1:p-1}) = p(x_p \mid x_{p-1}, y_{p:n}) \]
gives better estimates of $Z_n$ than the filtering distributions do.
- The iAPF iteratively targets sequences $\pi_n^l(x_{1:n}) \propto p(x_{1:n}, y_{1:n})\,\psi_n^l(x_n)$ while estimating $\psi_n^{l+1}$.

Question: Can we approximate broader classes of transitions? Are there better approximation schemes?


A Linear Gaussian Model: Behaviour with Dimension

\[ \mu = \mathcal{N}(\cdot; 0, I_d), \quad f(x, \cdot) = \mathcal{N}(\cdot; Ax, I_d) \quad \text{and} \quad g(x, \cdot) = \mathcal{N}(\cdot; x, I_d), \quad \text{where } A_{ij} = 0.42^{|i-j|+1}. \]

[Figure: box plots of $\hat{Z}/Z$ for the iAPF and the bootstrap particle filter over dimensions $d$ from 5 to 80 (1000 replicates; $T = 100$).]


Linear Gaussian Model: Sensitivity to Parameters

Fixing $d = 10$: bootstrap ($N = 50{,}000$) / iAPF ($N_0 = 1{,}000$).

[Figure: box plots of $\hat{Z}/Z$ for both algorithms for parameter values $\alpha \in \{0.3, 0.34, 0.38, 0.42, 0.46, 0.5\}$, using 1000 replicates.]


Linear Gaussian Model: PMMH Empirical Autocorrelations

[Figure: empirical autocorrelations of the PMMH chains for $A_{11}$, $A_{41}$ and $A_{55}$, lags 0 to 600.]

Here $d = 5$,
\[ \mu = \mathcal{N}(\cdot; 0, I_d), \quad f(x, \cdot) = \mathcal{N}(\cdot; Ax, I_d) \quad \text{and} \quad g(x, \cdot) = \mathcal{N}(\cdot; x, 0.25\,I_d). \]
In this case
\[ A = \begin{pmatrix} 0.9 & 0 & 0 & 0 & 0 \\ 0.3 & 0.7 & 0 & 0 & 0 \\ 0.1 & 0.2 & 0.6 & 0 & 0 \\ 0.4 & 0.1 & 0.1 & 0.3 & 0 \\ 0.1 & 0.2 & 0.5 & 0.2 & 0 \end{pmatrix}, \]
treated as an unknown lower triangular matrix.


Divide and Conquer (Lindsten, J. et al., 2014)

Question: Are there good ways to decompose general models?


dc-smc(t)

1. For $c \in C(t)$:
   1.1 $(\{x_c^i, w_c^i\}_{i=1}^N, \hat{Z}_c^N) \leftarrow \text{dc-smc}(c)$.
   1.2 Resample $\{x_c^i, w_c^i\}_{i=1}^N$ to obtain the equally weighted particle system $\{\hat{x}_c^i, 1\}_{i=1}^N$.
2. For particle $i = 1 : N$:
   2.1 If $\tilde{X}_t \ne \emptyset$, simulate $\tilde{x}_t^i \sim q_t(\cdot \mid \hat{x}_{c_1}^i, \ldots, \hat{x}_{c_C}^i)$, where $(c_1, c_2, \ldots, c_C) = C(t)$; else $\tilde{x}_t^i \leftarrow \emptyset$.
   2.2 Set $x_t^i = (\hat{x}_{c_1}^i, \ldots, \hat{x}_{c_C}^i, \tilde{x}_t^i)$.
   2.3 Compute
   \[ w_t^i = \frac{\gamma_t(x_t^i)}{\prod_{c \in C(t)} \gamma_c(\hat{x}_c^i)} \cdot \frac{1}{q_t(\tilde{x}_t^i \mid \hat{x}_{c_1}^i, \ldots, \hat{x}_{c_C}^i)}. \]
3. Compute
\[ \hat{Z}_t^N = \left\{ \frac{1}{N} \sum_{i=1}^{N} w_t^i \right\} \prod_{c \in C(t)} \hat{Z}_c^N. \]
4. Return $(\{x_t^i, w_t^i\}_{i=1}^N, \hat{Z}_t^N)$.

SMC and Constants Adam M.
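
A sketch of the CESS-based temperature choice above: given the previous normalised weights and the per-particle log-likelihoods, the next $\beta$ is found by bisection, using the fact that the CESS is (typically) decreasing in $\beta$. The target fraction $0.9N$, the tolerance and the stand-in inputs in the usage lines are illustrative assumptions, not prescriptions from the talk:

    import numpy as np

    def cess(W_prev, log_lik_vals, beta_prev, beta):
        # CESS = N (sum_j W_j w_j)^2 / sum_k W_k w_k^2, with w_j = l_j^{beta - beta_prev};
        # W_prev is assumed normalised. CESS is invariant to rescaling w, so we may
        # shift the log-weights for numerical stability.
        lw = (beta - beta_prev) * log_lik_vals
        w = np.exp(lw - lw.max())
        return len(W_prev) * np.dot(W_prev, w) ** 2 / np.dot(W_prev, w ** 2)

    def next_beta(W_prev, log_lik_vals, beta_prev, target, tol=1e-6):
        # If even beta = 1 keeps the CESS above the target, finish the ladder there;
        # otherwise bisect on (beta_prev, 1], assuming CESS decreases in beta.
        if cess(W_prev, log_lik_vals, beta_prev, 1.0) >= target:
            return 1.0
        lo, hi = beta_prev, 1.0
        while hi - lo > tol:
            mid = 0.5 * (lo + hi)
            if cess(W_prev, log_lik_vals, beta_prev, mid) >= target:
                lo = mid
            else:
                hi = mid
        return 0.5 * (lo + hi)

    # usage with stand-in inputs: uniform previous weights, synthetic log-likelihoods
    W_prev = np.full(1000, 1e-3)
    log_l = np.random.default_rng(4).normal(size=1000)
    print(next_beta(W_prev, log_l, beta_prev=0.1, target=0.9 * 1000))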
Multilevel modelling of NYC maths test data

[Figure: estimates of $\log(Z)$ (roughly $-3900$ to $-3825$) against the number of particles (100 to 1,000,000, log scale) for the divide-and-conquer (DC) and standard (STD) samplers.]


How Wrong is it to Approximate? (Everitt, J. et al., 2015) [Poster 4]

What if $W_p^i$ cannot be evaluated exactly, or $K_n$ is not exactly $\pi_n$-invariant?

[Figure: MSE of $\log(\hat{Z}_n^N)$ under four weighting schemes: exact (black solid), unbiased random (black dashed), biased random (grey solid) and biased random with perfect mixing (grey dashed).]

Question: Can we actually say anything practically relevant?


Inconclusive Conclusions

- SMC provides good estimates of many (normalising) constants.
- There are many questions left to answer:
  - How best to choose sequences of distributions:
    - "filtering";
    - generally: tempering / data tempering / not tempering? Model decompositions? Tempering to independence?
  - To what extent can we ever justify using approximate approximations within SMC?
  - How best can we approximate optimal lookahead functions?
  - How much can be done adaptively?


References

[1] C. Andrieu, A. Doucet, and R. Holenstein. Particle Markov chain Monte Carlo. Journal of the Royal Statistical Society B, 72(3):269–342, 2010.
[2] N. Chopin. A sequential particle filter method for static models. Biometrika, 89(3):539–551, 2002.
[3] P. Del Moral. Feynman-Kac Formulae: Genealogical and Interacting Particle Systems with Applications. Probability and Its Applications. Springer Verlag, New York, 2004.
[4] P. Del Moral, A. Doucet, and A. Jasra. Sequential Monte Carlo methods for Bayesian computation. In Bayesian Statistics 8. Oxford University Press, 2006.
[5] A. Doucet and A. M. Johansen. A tutorial on particle filtering and smoothing: Fifteen years later. In D. Crisan and B. Rozovsky, editors, The Oxford Handbook of Nonlinear Filtering, pages 656–704. Oxford University Press, 2011.
[6] R. G. Everitt, A. M. Johansen, E. Rowing, and M. Evdemon-Hogan. Bayesian model selection with un-normalised likelihoods. Statistics and Computing, 2016. In press.
[7] N. J. Gordon, D. J. Salmond, and A. F. M. Smith. Novel approach to nonlinear/non-Gaussian Bayesian state estimation. IEE Proceedings-F, 140(2):107–113, April 1993.
[8] P. Guarniero, A. M. Johansen, and A. Lee. The iterated auxiliary particle filter. ArXiv e-print 1511.06286, 2015.
[9] A. Jasra, D. A. Stephens, A. Doucet, and T. Tsagaris. Inference for Lévy-driven stochastic volatility models via adaptive sequential Monte Carlo. Scandinavian Journal of Statistics, 38(1):1–22, 2010.
[10] A. M. Johansen and A. Doucet. A note on the auxiliary particle filter. Statistics and Probability Letters, 78(12):1498–1504, September 2008. URL http://dx.doi.org/10.1016/j.spl.2008.01.032.
[11] A. M. Johansen, P. Del Moral, and A. Doucet. Sequential Monte Carlo samplers for rare events. In Proceedings of the 6th International Workshop on Rare Event Simulation, pages 256–267, Bamberg, Germany, October 2006.
[12] F. Lindsten, A. M. Johansen, C. A. Næsseth, B. Kirkpatrick, T. Schön, J. A. D. Aston, and A. Bouchard-Côté. Divide and conquer with sequential Monte Carlo samplers. ArXiv e-print 1406.4993, 2014.
[13] Y. Zhou, A. M. Johansen, and J. A. D. Aston. Towards automatic model comparison: An adaptive sequential Monte Carlo approach. Journal of Computational and Graphical Statistics, 2016. In press.