Auxiliary Particle Methods: Perspectives & Applications

Adam M. Johansen (adam.johansen@bristol.ac.uk)
Oxford University Man Institute, 29th May 2008
Collaborators include: Arnaud Doucet, Nick Whiteley

Introduction

Outline

- Background: Particle Filters
  - Hidden Markov Models & Filtering
  - Particle Filters & Sequential Importance Resampling
  - Auxiliary Particle Filters
- Interpretation
  - Auxiliary Particle Filters are SIR Algorithms
  - Theoretical Considerations
  - Practical Implications
- Applications
  - Auxiliary SMC Samplers
  - The Probability Hypothesis Density
  - On Stratification
- Conclusions
Hidden Markov Models

[Graphical model: hidden chain x_1, ..., x_6 with observations y_1, ..., y_6.]

- X_n is an X-valued Markov chain with transition density f:
  X_n | {X_{n-1} = x_{n-1}} ~ f(· | x_{n-1})
- Y_n is a Y-valued observation process:
  Y_n | {X_n = x_n} ~ g(· | x_n)

Optimal Filtering

- The filtering distribution may be expressed recursively as:

  p(x_n | y_{1:n-1}) = \int p(x_{n-1} | y_{1:n-1}) f(x_n | x_{n-1}) \, dx_{n-1}   (prediction)

  p(x_n | y_{1:n}) = \frac{p(x_n | y_{1:n-1}) g(y_n | x_n)}{\int p(x_n | y_{1:n-1}) g(y_n | x_n) \, dx_n}   (update)

- The smoothing distribution may be expressed recursively as:

  p(x_{1:n} | y_{1:n-1}) = p(x_{1:n-1} | y_{1:n-1}) f_n(x_n | x_{n-1})   (prediction)

  p(x_{1:n} | y_{1:n}) = \frac{p(x_{1:n} | y_{1:n-1}) g_n(y_n | x_n)}{\int p(x_{1:n} | y_{1:n-1}) g_n(y_n | x_n) \, dx_{1:n}}   (update)

Particle Filters: Sequential Importance Resampling

At time n = 1:
- Sample X^i_{1,1} ~ q_1(·).
- Weight W^i_1 ∝ f(X^i_{1,1}) g(y_1 | X^i_{1,1}) / q_1(X^i_{1,1}).
- Resample.
At times n > 1, iterate:
- Sample X^i_{n,n} ~ q_n(· | X^i_{n-1,n-1}); set X^i_{n,1:n-1} = X^i_{n-1}.
- Weight
  W^i_n ∝ f(X^i_{n,n} | X^i_{n,n-1}) g(y_n | X^i_{n,n}) / q_n(X^i_{n,n} | X^i_{n,1:n-1}).
- Resample.

Auxiliary Particle Filters (Pitt & Shephard '99)

If we have access to the next observation before resampling, we can exploit this structure:
- Pre-weight every particle with λ^(i)_n ∝ p̂(y_n | X^(i)_{n-1}).
- Propose new states from the mixture distribution
  \sum_{i=1}^N λ^(i)_n q_n(· | X^(i)_{n-1}) \Big/ \sum_{i=1}^N λ^(i)_n.
- Weight samples, correcting for the pre-weighting:
  W^i_n ∝ f(X^i_{n,n} | X^i_{n,n-1}) g(y_n | X^i_{n,n}) / (λ^i_n q_n(X^i_{n,n} | X^i_{n,1:n-1})).
- Resample the particle set.

Some Well-Known Refinements

We can tidy things up a bit:
1. The auxiliary-variable step is equivalent to multinomial resampling.
2. So there is no need to resample before the pre-weighting.

Now we have:
- Pre-weight every particle with λ^(i)_n ∝ p̂(y_n | X^(i)_{n-1}).
- Resample.
- Propose new states.
- Weight samples, correcting for the pre-weighting:
  W^i_n ∝ f(X^i_{n,n} | X^i_{n,n-1}) g(y_n | X^i_{n,n}) / (λ^i_n q_n(X^i_{n,n} | X^i_{n,1:n-1})).

Interpretation

APFs without Auxiliary Variables: General SIR Algorithms

The SIR algorithm can be used somewhat more generally. Given a sequence of distributions {π_n} defined on E_n = X^n:
- Sample X^i_{n,n} ~ q_n(· | X^i_{n-1}); set X^i_{n,1:n-1} = X^i_{n-1}.
- Weight
  W^i_n ∝ π_n(X^i_n) / (π_{n-1}(X^i_{n,1:n-1}) q_n(X^i_{n,n} | X^i_{n,1:n-1})).
- Resample.
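The SIR recursion above can be sketched in a few lines of code. This is an illustrative sketch only, not code from the talk: it assumes a toy Gaussian random-walk model with the bootstrap proposal q_n = f, and all function and parameter names are my own choices.

```python
import numpy as np

def sir_filter(y, N=1000, sigma_x=1.0, sigma_y=1.0, rng=None):
    """Bootstrap SIR particle filter for an assumed toy model:
    X_1 ~ N(0, 1), X_n = X_{n-1} + sigma_x * noise, Y_n = X_n + sigma_y * noise.
    With q_n = f, the incremental weight reduces to g(y_n | x_n).
    Returns the estimated filtering means E[X_n | y_{1:n}]."""
    rng = np.random.default_rng(rng)
    means = []
    # Time n = 1: sample from the prior.
    x = rng.normal(0.0, 1.0, size=N)
    for yn in y:
        # Weight: with the bootstrap proposal, W propto g(y_n | x_n).
        logw = -0.5 * ((yn - x) / sigma_y) ** 2
        w = np.exp(logw - logw.max())
        w /= w.sum()
        means.append(np.sum(w * x))
        # Resample (multinomial) to equalise the weights.
        x = x[rng.choice(N, size=N, p=w)]
        # Propagate: sample X_{n+1} ~ f(. | x_n) for the next iteration.
        x = x + rng.normal(0.0, sigma_x, size=N)
    return np.array(means)
```

With repeated observations near a fixed value, the filtering mean drifts from the prior mean toward that value, as the update recursion suggests.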
An Interpretation of the APF

If we move the first step at time n+1 to the last at time n, we get:
- Resample.
- Propose new states.
- Weight samples, correcting the earlier pre-weighting.
- Pre-weight every particle with λ^(i)_{n+1} ∝ p̂(y_{n+1} | X^(i)_n).

This is an SIR algorithm targeting the sequence of distributions

  η_n(x_{1:n}) ∝ p(x_{1:n} | y_{1:n}) p̂(y_{n+1} | x_n),

which allows estimation under the distribution of actual interest via importance sampling.

Theoretical Considerations

- Direct analysis of the APF is largely unnecessary.
- Results can be obtained by considering the associated SIR algorithm.
- SIR has a (discrete-time) Feynman-Kac interpretation.

For example...

Proposition. Under standard regularity conditions,

  \sqrt{N} \left( \hat{\varphi}^N_{n,APF} - \bar{\varphi}_n \right) \to \mathcal{N}\left(0, \sigma_n^2(\varphi_n)\right),

where

  \sigma_1^2(\varphi_1) = \int \frac{p(x_1|y_1)^2}{q_1(x_1)} \left( \varphi_1(x_1) - \bar{\varphi}_1 \right)^2 dx_1

and

  \sigma_n^2(\varphi_n) = \int \frac{p(x_1|y_{1:n})^2}{q_1(x_1)} \left( \int \varphi_n(x_{1:n}) p(x_{2:n}|y_{2:n}, x_1) \, dx_{2:n} - \bar{\varphi}_n \right)^2 dx_1
    + \sum_{k=2}^{n-1} \int \frac{p(x_{1:k}|y_{1:n})^2}{\hat{p}(x_{1:k-1}|y_{1:k}) q_k(x_k|x_{k-1})} \left( \int \varphi_n(x_{1:n}) p(x_{k+1:n}|y_{k+1:n}, x_k) \, dx_{k+1:n} - \bar{\varphi}_n \right)^2 dx_{1:k}
    + \int \frac{p(x_{1:n}|y_{1:n})^2}{\hat{p}(x_{1:n-1}|y_{1:n}) q_n(x_n|x_{n-1})} \left( \varphi_n(x_{1:n}) - \bar{\varphi}_n \right)^2 dx_{1:n}.

Practical Implications

- It means we're doing importance sampling.
- Choosing p̂(y_n | x_{n-1}) = p(y_n | x_n = E[X_n | x_{n-1}]) is dangerous.
- A safer choice would be to ensure that
  \sup_{x_{n-1}, x_n} \frac{g(y_n|x_n) f(x_n|x_{n-1})}{\hat{p}(y_n|x_{n-1}) q(x_n|x_{n-1})} < \infty.
- Using the APF does not ensure superior performance.
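A single step of the reordered algorithm (pre-weight, resample, propose, re-weight) can be sketched for the same kind of toy Gaussian random-walk model. This is an assumed sketch, not the talk's implementation: the model, function names, and the choice p̂(y_n | x_{n-1}) = g(y_n | E[X_n | x_{n-1}]) are illustrative; that point-estimate choice can be dangerous in general, though here the weight ratio stays bounded.

```python
import numpy as np

def apf_step(x, w, yn, sigma_x=1.0, sigma_y=1.0, rng=None):
    """One auxiliary-particle-filter step for an assumed Gaussian
    random-walk model with q_n = f. Returns the new particles and
    normalised weights."""
    rng = np.random.default_rng(rng)
    N = len(x)
    # Pre-weight: lambda_i propto W_{n-1} * p_hat(y_n | x_{n-1}),
    # with p_hat evaluated at the predictive mean E[X_n | x_{n-1}] = x_{n-1}.
    lam = w * np.exp(-0.5 * ((yn - x) / sigma_y) ** 2)
    lam /= lam.sum()
    # Resample according to the pre-weights.
    idx = rng.choice(N, size=N, p=lam)
    xprev = x[idx]
    # Propose new states from q_n = f.
    xnew = xprev + rng.normal(0.0, sigma_x, size=N)
    # Re-weight, correcting for the pre-weighting:
    # W_n propto g(y_n | x_n) / p_hat(y_n | x_{n-1}).
    logw = (-0.5 * ((yn - xnew) / sigma_y) ** 2
            + 0.5 * ((yn - xprev) / sigma_y) ** 2)
    wnew = np.exp(logw - logw.max())
    return xnew, wnew / wnew.sum()
```

The pre-weights concentrate the resampling on particles likely to generate descendants compatible with y_n; the final weights undo that bias so the target is preserved.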
A Contrived Illustration

Consider the following binary state-space model with common state and observation spaces:

  X = {0, 1},  Y = X,
  p(x_1 = 0) = 0.5,
  p(x_n = x_{n-1}) = 1 - δ,
  p(y_n = x_n) = 1 - ε.

- δ controls the ergodicity of the state process.
- ε controls the information contained in the observations.

Consider estimating E(X_2 | Y_{1:2} = (0, 1)).

[Figure: surface of Var(SIR) - Var(APF) over δ ∈ [0.1, 1] and ε ∈ [0.05, 0.5].]

[Figure: variance comparison at ε = 0.25; empirical and asymptotic variances of SISR and APF as functions of δ.]

Applications & Extensions

SMC Samplers (Del Moral, Doucet & Jasra, 2006)

SIR techniques can be adapted to any sequence of distributions {π_n}.
- Define
  \tilde{π}_n(x_{1:n}) = π_n(x_n) \prod_{k=1}^{n-1} L_k(x_{k+1}, x_k).
- Sample, at time n, X^i_n ~ K_n(X^i_{n-1}, ·).
- Weight
  W^i_n ∝ π_n(X^i_n) L_{n-1}(X^i_n, X^i_{n-1}) / (π_{n-1}(X^i_{n-1}) K_n(X^i_{n-1}, X^i_n)).
- Resample.

Auxiliary SMC Samplers

An auxiliary SMC sampler for a sequence of distributions π_n comprises:
- An SMC sampler targeting some auxiliary sequence of distributions μ_n.
- A sequence of importance-weight functions \tilde{w}_n(x_n) ∝ (dπ_n / dμ_n)(x_n).

Generic approaches:
- Choose μ_n(x_n) ∝ π_n(x_n) V_n(x_n).
- Incorporate as much information as possible prior to resampling.
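The contrived binary illustration above is small enough to verify by brute force. The sketch below (an assumed verification script, not part of the talk) enumerates the four state paths to compute E(X_2 | Y_{1:2} = (0, 1)) exactly; at δ = 0.5 the chain forgets its past and the answer reduces to P(X_2 = 1 | Y_2 = 1) = 1 - ε.

```python
from itertools import product

def exact_posterior_mean(delta, eps, y=(0, 1)):
    """E(X_2 | Y_1:2 = y) in the binary model: p(x1 = 0) = 0.5,
    p(x_n = x_{n-1}) = 1 - delta, p(y_n = x_n) = 1 - eps."""
    num = den = 0.0
    for x1, x2 in product((0, 1), repeat=2):
        p = 0.5                                    # p(x1)
        p *= (1 - delta) if x2 == x1 else delta    # p(x2 | x1)
        p *= (1 - eps) if y[0] == x1 else eps      # p(y1 | x1)
        p *= (1 - eps) if y[1] == x2 else eps      # p(y2 | x2)
        den += p
        num += p * x2
    return num / den
```

Such an exact value provides the ground truth against which the SIR and APF variances in the plots can be measured.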
Resample-Move Approaches

A common approach in SMC samplers:
- Choose K_n(x_{n-1}, x_n) to be π_n-invariant.
- Set
  L_{n-1}(x_n, x_{n-1}) = π_n(x_{n-1}) K_n(x_{n-1}, x_n) / π_n(x_n),
- leading to
  W_n(x_{n-1}, x_n) = π_n(x_{n-1}) / π_{n-1}(x_{n-1}).
- It would make more sense to resample and then move...

An Interpretation of Pure Resample-Move

It's SIR with auxiliary distributions:

  μ_n(x_n) = π_{n+1}(x_n)
  L_{n-1}(x_n, x_{n-1}) = \frac{μ_{n-1}(x_{n-1}) K_n(x_{n-1}, x_n)}{μ_{n-1}(x_n)} = \frac{π_n(x_{n-1}) K_n(x_{n-1}, x_n)}{π_n(x_n)}
  w_n(x_{n-1:n}) = \frac{μ_n(x_n)}{μ_{n-1}(x_n)} = \frac{π_{n+1}(x_n)}{π_n(x_n)}
  \tilde{w}_n(x_n) = \frac{μ_{n-1}(x_n)}{μ_n(x_n)} = \frac{π_n(x_n)}{π_{n+1}(x_n)}.

Piecewise-Deterministic Processes

[Figure: sample trajectory of a piecewise-deterministic process in the (x, y) plane.]

"Filtering" of Piecewise-Deterministic Processes (Whiteley, Johansen & Godsill, 2007)

- X_n = (k_n, τ_{n,1:k_n}, θ_{n,0:k_n}) specifies a continuous-time trajectory (ζ_t)_{t ∈ [0, t_n]}.
- ζ_t is observed in the presence of noise, e.g.
  Y_n = ζ_{t_n} + V_n.

- Target a sequence of distributions {π_n} on nested spaces:

  π_n(k, τ_{1:k}, θ_{0:k}) ∝ q(θ_0) \prod_{j=1}^{k} f(τ_j | τ_{j-1}) q(θ_j | τ_j, θ_{j-1}, τ_{j-1}) \times S(t_n, τ_k) \prod_{p=1}^{n} g(y_p | ζ_{t_p}).

- π_n yields posterior marginal distributions for the recent history of ζ_t.
- Employ auxiliary SMC samplers.
- Choose μ_n(k, τ_{1:k}, θ_{0:k}) ∝ π_n(k, τ_{1:k}, θ_{0:k}) V_n(τ_k, θ_k).
- The kernel proposes new pairs (τ_j, θ_j) and adjusts recent pairs.
- Applications in object tracking and finance.

Probability Hypothesis Density Filtering (Mahler, 2003; Vo et al., 2005)

- Approximates the optimal filter for a class of spatial point-process-valued HMMs.
- A recursion for intensity functions:

  α_n(x_n) = \int_E f(x_n | x_{n-1}) p_S(x_{n-1}) \hat{α}_{n-1}(x_{n-1}) \, dx_{n-1} + γ(x_n)

  \hat{α}_n(x_n) = \left[ 1 - p_D(x_n) + \sum_{r=1}^{m_n} \frac{ψ_{n,r}(x_n)}{Z_{n,r}} \right] α_n(x_n).

- Approximate the intensity functions {\hat{α}_n}_{n≥0} using SMC.

An ASMC Implementation (Whiteley et al., 2007)

- \hat{α}_{n-1}(dx_{n-1}) ≈ \hat{α}^N_{n-1}(dx_{n-1}) = \sum_{i=1}^{N} W^i_{n-1} δ_{X^i_{n-1}}(dx_{n-1}).
- In a simple case, the target integral at the nth iteration is

  \sum_{r=1}^{m_n} \int_E \int_E φ(x_n) \frac{ψ_{n,r}(x_n)}{Z_{n,r}} f(x_n | x_{n-1}) p_S(x_{n-1}) \, dx_n \, \hat{α}^N_{n-1}(dx_{n-1}).

- The proposal distribution q_n(x'_n, x'_{n-1}, r_n) is built from \hat{α}^N_{n-1} and potential functions {V_{n,r}}, and factorises as

  q_n(x'_n | x'_{n-1}, r) \times \frac{\sum_{i=1}^{N} V_{n,r}(X^{(i)}_{n-1}) W^{(i)}_{n-1} δ_{X^{(i)}_{n-1}}(dx'_{n-1})}{\sum_{i=1}^{N} V_{n,r}(X^{(i)}_{n-1}) W^{(i)}_{n-1}} \times q_n(r).

- The normalising constants {Z_{n,r}} are also estimated by importance sampling.

On Stratification
- Back to the HMM setting: let (A_p)_{p=1}^{M} denote a partition of X.
- Introduce the stratum-indicator variable R_n = \sum_{p=1}^{M} p \, I_{A_p}(X_n).
- Define the extended model:

  p(x_{1:n}, r_{1:n} | y_{1:n}) ∝ g(y_n | x_n) f(x_n | r_n, x_{n-1}) f(r_n | x_{n-1}) \times p(x_{1:n-1}, r_{1:n-1} | y_{1:n-1}).

- An auxiliary SMC filter resamples from a distribution with N × M support points, over particles × strata.
- Applications in switching state-space models.

Conclusions

- The APF is a standard SIR algorithm for nonstandard distributions.
- This interpretation:
  - allows standard results to be applied directly,
  - provides guidance on implementation, and
  - allows the same technique to be applied in more general settings.

Thanks for listening... any questions?
References

[1] P. Del Moral, A. Doucet, and A. Jasra. Sequential Monte Carlo samplers. Journal of the Royal Statistical Society B, 68(3):411-436, 2006.
[2] A. M. Johansen and A. Doucet. Auxiliary variable sequential Monte Carlo methods. Technical Report 07:09, Statistics Group, Department of Mathematics, University of Bristol, July 2007.
[3] A. M. Johansen and A. Doucet. A note on the auxiliary particle filter. Statistics and Probability Letters, 2008. To appear.
[4] A. M. Johansen and N. Whiteley.
A modern perspective on auxiliary particle filters. In Proceedings of the Workshop on Inference and Estimation in Probabilistic Time Series Models, Isaac Newton Institute, June 2008. To appear.
[5] R. P. S. Mahler. Multitarget Bayes filtering via first-order multitarget moments. IEEE Transactions on Aerospace and Electronic Systems, 39(4):1152, October 2003.
[6] M. K. Pitt and N. Shephard. Filtering via simulation: auxiliary particle filters. Journal of the American Statistical Association, 94(446):590-599, 1999.
[7] B. Vo, S. S. Singh, and A. Doucet. Sequential Monte Carlo methods for multi-target filtering with random finite sets. IEEE Transactions on Aerospace and Electronic Systems, 41(4):1223-1245, October 2005.
[8] N. Whiteley, A. M. Johansen, and S. Godsill. Monte Carlo filtering of piecewise-deterministic processes. Technical Report CUED/F-INFENG/TR-592, Department of Engineering, University of Cambridge, December 2007.
[9] N. Whiteley, S. Singh, and S. Godsill. Auxiliary particle implementation of the probability hypothesis density filter. Technical Report CUED/F-INFENG/590, Department of Engineering, University of Cambridge, 2007.