Interacting Particle Systems and (Discrete Time) Filtering

Adam M. Johansen
a.m.johansen@warwick.ac.uk
http://www2.warwick.ac.uk/fac/sci/statistics/staff/academic/johansen/talks/

MASDOC Statistical Frontiers Seminar
1st February 2013
Part 1 – A Statistical Problem
Filtering
Filtering: The Problem
Hidden Markov Models / State Space Models
[Graphical model: hidden states x_1, ..., x_6 with corresponding observations y_1, ..., y_6.]

- Unobserved Markov chain {X_n} with transition density f.
- Observed process {Y_n} with conditional density g.
- Joint density:

  p(x_{1:n}, y_{1:n}) = f_1(x_1) g(y_1|x_1) \prod_{i=2}^{n} f(x_i|x_{i-1}) g(y_i|x_i).
Filtering / Smoothing
- Let X_1, X_2, ... denote the position of an object which follows Markovian dynamics:

  X_n | \{X_{n-1} = x_{n-1}\} \sim f(\cdot|x_{n-1}).

- Let Y_1, Y_2, ... denote a collection of observations:

  Y_i | \{X_i = x_i\} \sim g(\cdot|x_i).

- Smoothing: estimate, as observations arrive, p(x_{1:n}|y_{1:n}).
- Filtering: estimate, as observations arrive, p(x_n|y_{1:n}).
- Formal solution:

  p(x_{1:n}|y_{1:n}) = p(x_{1:n-1}|y_{1:n-1}) \frac{f(x_n|x_{n-1}) g(y_n|x_n)}{p(y_n|y_{1:n-1})}.
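When the state space is finite, this recursion can be evaluated exactly. The minimal Python/NumPy sketch below (the two-state transition matrix, likelihood values and observation sequence are illustrative placeholders, not from the slides) alternates the prediction and Bayes-update steps to compute p(x_n|y_{1:n}).

```python
import numpy as np

def finite_state_filter(mu, P, lik, ys):
    """Exact filtering for a finite-state hidden Markov model.

    mu  : initial distribution over the K states
    P   : transition matrix, P[i, j] = f(x_n = j | x_{n-1} = i)
    lik : lik(y, j) = g(y | x_n = j)
    ys  : observation sequence
    Returns p(x_n | y_{1:n}) for each n."""
    filt = []
    pred = np.asarray(mu, dtype=float)            # p(x_1)
    for y in ys:
        post = pred * np.array([lik(y, j) for j in range(len(pred))])
        post /= post.sum()                        # update: p(x_n | y_{1:n})
        filt.append(post)
        pred = post @ P                           # predict: p(x_{n+1} | y_{1:n})
    return np.array(filt)

# Illustrative two-state example (all values are placeholders).
P = np.array([[0.9, 0.1],
              [0.2, 0.8]])
lik = lambda y, j: 0.8 if y == j else 0.2
print(finite_state_filter([0.5, 0.5], P, lik, [0, 0, 1]))
```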
A Motivating Example: Data
[Figure: observed positions in the plane; horizontal axis s_x/km (roughly 0 to 35), vertical axis s_y/km (roughly -10 to 30).]
Example: Almost Constant Velocity Model
- States: x_n = [s^x_n\ u^x_n\ s^y_n\ u^y_n]^T (positions and velocities in each coordinate).
- Dynamics: x_n = A x_{n-1} + \epsilon_n:

  \begin{bmatrix} s^x_n \\ u^x_n \\ s^y_n \\ u^y_n \end{bmatrix}
  =
  \begin{bmatrix} 1 & \Delta t & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & \Delta t \\ 0 & 0 & 0 & 1 \end{bmatrix}
  \begin{bmatrix} s^x_{n-1} \\ u^x_{n-1} \\ s^y_{n-1} \\ u^y_{n-1} \end{bmatrix}
  + \epsilon_n

- Observation: y_n = B x_n + \nu_n:

  \begin{bmatrix} r^x_n \\ r^y_n \end{bmatrix}
  =
  \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix}
  \begin{bmatrix} s^x_n \\ u^x_n \\ s^y_n \\ u^y_n \end{bmatrix}
  + \nu_n
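A minimal simulation sketch of this model in Python/NumPy. The process and observation noise covariances (Q and R below), the time step and the horizon are assumptions made for illustration; the slides do not specify them.

```python
import numpy as np

def simulate_acv(T=50, dt=1.0, q=0.1, r=1.0, seed=0):
    """Simulate x_n = A x_{n-1} + eps_n, y_n = B x_n + nu_n for the
    almost constant velocity model with state [s_x, u_x, s_y, u_y]."""
    rng = np.random.default_rng(seed)
    A = np.array([[1, dt, 0, 0],
                  [0, 1,  0, 0],
                  [0, 0,  1, dt],
                  [0, 0,  0, 1]], dtype=float)
    B = np.array([[1, 0, 0, 0],
                  [0, 0, 1, 0]], dtype=float)
    Q = q * np.eye(4)        # assumed process noise covariance (placeholder)
    R = r * np.eye(2)        # assumed observation noise covariance (placeholder)
    x = np.zeros(4)
    xs, ys = [], []
    for _ in range(T):
        x = A @ x + rng.multivariate_normal(np.zeros(4), Q)
        ys.append(B @ x + rng.multivariate_normal(np.zeros(2), R))
        xs.append(x.copy())
    return A, B, Q, R, np.array(xs), np.array(ys)
```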
Sampling Approaches
The Monte Carlo Method
- Given a probability density f and \varphi: E \to \mathbb{R}:

  I = \int_E \varphi(x) f(x) dx.

- Simple Monte Carlo solution:
  - Sample X_1, ..., X_N \overset{\text{i.i.d.}}{\sim} f.
  - Estimate \hat{I} = \frac{1}{N} \sum_{i=1}^{N} \varphi(X_i).
- Can also be viewed as approximating \pi(dx) = f(x) dx with

  \hat{\pi}^N(dx) = \frac{1}{N} \sum_{i=1}^{N} \delta_{X_i}(dx).
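As a one-line illustration in Python/NumPy (standard normal target and quadratic test function, both arbitrary choices for the sketch):

```python
import numpy as np

rng = np.random.default_rng(0)
N = 10_000
X = rng.normal(size=N)        # X_1, ..., X_N i.i.d. from f = N(0, 1)
phi = lambda x: x ** 2        # test function
I_hat = phi(X).mean()         # (1/N) sum_i phi(X_i); approximates E_f[phi] = 1
print(I_hat)
```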
The Importance–Sampling Identity
- Given g such that
  - f(x) > 0 \Rightarrow g(x) > 0
  - and f(x)/g(x) < \infty,
  define w(x) = f(x)/g(x) and:

  \int \varphi(x) f(x) dx = \int \varphi(x) \frac{f(x)}{g(x)} g(x) dx = \int \varphi(x) w(x) g(x) dx.

- This suggests the importance sampling estimator:
  - Sample X_1, ..., X_N \overset{\text{i.i.d.}}{\sim} g.
  - Estimate \hat{I} = \frac{1}{N} \sum_{i=1}^{N} w(X_i) \varphi(X_i).
- Can also be viewed as approximating \pi(dx) = f(x) dx with

  \hat{\pi}^N(dx) = \frac{1}{N} \sum_{i=1}^{N} w(X_i) \delta_{X_i}(dx).
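A small sketch of the estimator, assuming the ratio w = f/g is available in closed form. Here the target is N(0, 1), the proposal is the heavier-tailed N(0, 4), and the test function is a tail indicator; all are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 10_000
f = lambda x: np.exp(-0.5 * x**2) / np.sqrt(2 * np.pi)               # target N(0, 1)
g = lambda x: np.exp(-0.5 * (x / 2)**2) / (2 * np.sqrt(2 * np.pi))   # proposal N(0, 4)
phi = lambda x: (x > 2).astype(float)                                # tail indicator

X = 2 * rng.normal(size=N)          # X_i ~ g
w = f(X) / g(X)                     # importance weights w(X_i)
I_hat = np.mean(w * phi(X))         # approximates P(X > 2) under N(0, 1), about 0.0228
print(I_hat)
```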
Importance Sampling Example
[Figure: importance sampling example; densities plotted over (-4, 4).]
Self-Normalised Importance Sampling
- Often, f is known only up to a normalising constant.
- If v(x) = c f(x)/g(x) = c w(x), then

  \frac{E_g(v \varphi)}{E_g(v \cdot 1)} = \frac{E_g(c w \varphi)}{E_g(c w \cdot 1)} = \frac{c E_f(\varphi)}{c E_f(1)} = E_f(\varphi).

- Estimate the numerator and denominator with the same sample:

  \hat{I} = \frac{\sum_{i=1}^{N} v(X_i) \varphi(X_i)}{\sum_{i=1}^{N} v(X_i)}.

- Biased for finite samples, but consistent.
- Typically reduces variance.
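A sketch of the self-normalised estimator when only an unnormalised version of the target is available (here exp(-x^2/2), i.e. N(0, 1) without its constant, with the same N(0, 4) proposal; again purely illustrative).

```python
import numpy as np

rng = np.random.default_rng(0)
N = 10_000
gamma = lambda x: np.exp(-0.5 * x**2)                                 # target, unnormalised
g = lambda x: np.exp(-0.5 * (x / 2)**2) / (2 * np.sqrt(2 * np.pi))    # proposal N(0, 4)
phi = lambda x: x ** 2

X = 2 * rng.normal(size=N)                # X_i ~ g
v = gamma(X) / g(X)                       # unnormalised weights v = c * w
I_hat = np.sum(v * phi(X)) / np.sum(v)    # self-normalised estimate of E_f[phi] = 1
print(I_hat)
```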
Importance Sampling for Smoothing/Filtering
- Sample \{X^{(i)}_{1:n}\} at time n from q_n(x_{1:n}) and define

  w_n(x_{1:n}) \propto \frac{p(x_{1:n}|y_{1:n})}{q_n(x_{1:n})} = \frac{p(x_{1:n}, y_{1:n})}{q_n(x_{1:n}) p(y_{1:n})}
  \propto \frac{f(x_1) g(y_1|x_1) \prod_{m=2}^{n} f(x_m|x_{m-1}) g(y_m|x_m)}{q_n(x_{1:n})};

- set W^{(i)}_n = w_n(X^{(i)}_{1:n}) / \sum_j w_n(X^{(j)}_{1:n});
- then \{W^{(i)}_n, X^{(i)}_{1:n}\} is a consistently weighted sample.
- This seems inefficient.
Sequential Importance Sampling (SIS) I
- Importance weight:

  w_n(x_{1:n}) \propto \frac{f(x_1) g(y_1|x_1) \prod_{m=2}^{n} f(x_m|x_{m-1}) g(y_m|x_m)}{q_n(x_{1:n})}
  = \frac{f(x_1) g(y_1|x_1)}{q_n(x_1)} \prod_{m=2}^{n} \frac{f(x_m|x_{m-1}) g(y_m|x_m)}{q_n(x_m|x_{1:m-1})}.

- Given \{W^{(i)}_{n-1}, X^{(i)}_{1:n-1}\} targeting p(x_{1:n-1}|y_{1:n-1}):
  - let q_n(x_{1:n-1}) = q_{n-1}(x_{1:n-1}),
  - sample X^{(i)}_n \sim q_n(\cdot|X^{(i)}_{1:n-1}), or even q_n(\cdot|X^{(i)}_{n-1}).
Sequential Importance Sampling (SIS) II
- And update the weights:

  w_n(x_{1:n}) = w_{n-1}(x_{1:n-1}) \frac{f(x_n|x_{n-1}) g(y_n|x_n)}{q_n(x_n|x_{n-1})}

  W^{(i)}_n = w_n(X^{(i)}_{1:n})
  = w_{n-1}(X^{(i)}_{1:n-1}) \frac{f(X^{(i)}_n|X^{(i)}_{n-1}) g(y_n|X^{(i)}_n)}{q_n(X^{(i)}_n|X^{(i)}_{n-1})}
  = W^{(i)}_{n-1} \frac{f(X^{(i)}_n|X^{(i)}_{n-1}) g(y_n|X^{(i)}_n)}{q_n(X^{(i)}_n|X^{(i)}_{n-1})}

- If \int p(x_{1:n}|y_{1:n}) dx_n \approx p(x_{1:n-1}|y_{1:n-1}), this makes sense.
- We only need to store \{W^{(i)}_n, X^{(i)}_{n-1:n}\}.
- Same computation every iteration.
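A sketch of the SIS recursion for a simple scalar model (Gaussian random-walk dynamics, Gaussian observations, bootstrap choice q_n = f so the incremental weight is g(y_n|x_n); the model, its parameters and the observations are illustrative assumptions). Only the current states and log-weights are stored, as noted above.

```python
import numpy as np

def sis(ys, N=1000, sigma_x=1.0, sigma_y=1.0, seed=0):
    """Sequential importance sampling (no resampling) for the illustrative model
    X_n = X_{n-1} + sigma_x * eps_n,  Y_n = X_n + sigma_y * nu_n,  X_0 = 0."""
    rng = np.random.default_rng(seed)
    X = np.zeros(N)                 # current states X_n^(i)
    logW = np.zeros(N)              # log-weights, kept in log scale for stability
    for y in ys:
        X = X + sigma_x * rng.normal(size=N)          # propose from f(. | X_{n-1})
        logW += -0.5 * ((y - X) / sigma_y) ** 2       # times g(y_n | X_n), up to a constant
    W = np.exp(logW - logW.max())
    return X, W / W.sum()

X, W = sis([0.5, 1.0, 0.3])
print(np.sum(W * X))                # SIS estimate of E[X_n | y_{1:n}]
```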
Importance Sampling on Huge Spaces Doesn’t Work
- It's said that IS breaks the curse of dimensionality:

  \sqrt{N} \left[ \frac{1}{N} \sum_{i=1}^{N} w(X_i) \varphi(X_i) - \int \varphi(x) f(x) dx \right] \overset{d}{\to} N(0, \text{Var}_g[w \varphi]).

- This is true.
- But it's not enough.
- \text{Var}_g[w \varphi] increases (often exponentially) with dimension.
- Eventually, an SIS estimator (of p(x_{1:n}|y_{1:n})) will fail.
- But p(x_n|y_{1:n}) is a fixed-dimensional distribution.
Sequential Importance Resampling
Resampling: The SIR[esampling] Algorithm
- Problem: the variance of the weights builds up over time.
- Solution? Given \{W^{(i)}_{n-1}, X^{(i)}_{1:n-1}\}:
  1. Resample*, to obtain \{\frac{1}{N}, \tilde{X}^{(i)}_{1:n-1}\}.
  2. Sample X^{(i)}_n \sim q_n(\cdot|\tilde{X}^{(i)}_{n-1}).
  3. Set X^{(i)}_{1:n-1} = \tilde{X}^{(i)}_{1:n-1}.
  4. Set W^{(i)}_n = f(X^{(i)}_n|X^{(i)}_{n-1}) g(y_n|X^{(i)}_n) / q_n(X^{(i)}_n|X^{(i)}_{n-1}).
- And continue as with SIS.
- There is a cost, but this really works.

* There are many algorithms for doing this; one simple choice is sketched below.
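One concrete choice for the resampling step marked * is multinomial resampling, sketched below; systematic, stratified and residual schemes are standard alternatives (see the Douc, Cappé and Moulines reference in the further reading). It slots in between computing the normalised weights and proposing the next states.

```python
import numpy as np

def multinomial_resample(X, W, rng=None):
    """Draw N ancestor indices with probabilities W and return an equally
    weighted, resampled particle set."""
    rng = rng or np.random.default_rng()
    N = len(W)
    idx = rng.choice(N, size=N, p=W)        # ancestors of the new particles
    return X[idx], np.full(N, 1.0 / N)      # weights reset to 1/N
```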
[Figures: slides "Iteration 2" through "Iteration 10" show the particle approximation at successive iterations of the algorithm, plotted over the same region of the state space.]
Part 2 – Applied Probability
Feynman-Kac Formulæ
Feynman-Kac Formulæ
- A natural description for measure-valued stochastic processes.
- Model for:
  - particle motion in absorbing environments,
  - classes of branching particle system,
  - simple genetic algorithms,
  - particle filters and related algorithms.
- Structure of this section:
  - Probabilistic Construction
  - Semigroup[oid] Structure
  - McKean Interpretations
  - Particle Approximations
  - Selected Results
Probabilistic Construction
Following Del Moral (2004)
The Canonical Markov Chain
- Consider the filtered probability space (\Omega, \mathcal{F}, \{\mathcal{F}_n\}_{n \in \mathbb{N}}, \mathbb{P}_\mu).
- Let \{X_n\}_{n \in \mathbb{N}} be a Markov chain such that, for any n \in \mathbb{N},

  \mathbb{P}_\mu(X_{1:n} \in dx_{1:n}) = \mu(dx_1) \prod_{i=2}^{n} M_i(x_{i-1}, dx_i)

  with X_i: \Omega \to E_i, \mu \in \mathcal{P}(E_1) and M_i: E_{i-1} \to \mathcal{P}(E_i).
- (E_i, \mathcal{E}_i) are measurable spaces.
- The X_i are \mathcal{E}_i/\mathcal{F}_i-measurable.
- Using Kolmogorov's/Tulcea's extension theorem there exists a unique process-valued extension.
Some Operator Notation
Given two measurable spaces (E, \mathcal{E}) and (F, \mathcal{F}), a measure \mu on (E, \mathcal{E}) and a Markov kernel K: E \to \mathcal{P}(F), define:

  \mu(\varphi_E) := \int \mu(dx) \varphi_E(x)

  \mu K(\varphi_F) := \int \mu(dx) K(x, dy) \varphi_F(y),  \qquad \mu K \in \mathcal{P}(F)

  K(\varphi_F)(x) := \int K(x, dy) \varphi_F(y),  \qquad K(\varphi_F): E \to \mathbb{R}

with \varphi_E, \varphi_F suitably measurable functions.
Given two functions g, h: E \to \mathbb{R}, define g \cdot h: E \to \mathbb{R} via (g \cdot h)(x) = g(x) h(x).
Given e: E \to \mathbb{R} and f: F \to \mathbb{R}, let (e \otimes f)(x, y) := e(x) f(y).
The Feynman-Kac Formulæ
- Given \mathbb{P}_\mu and potential functions \{G_i\}_{i \in \mathbb{N}}, G_i: E_i \to [0, \infty).
- Define two path measures weakly:

  Q_n(\varphi_{1:n}) = \frac{E\left[\varphi_{1:n}(X_{1:n}) \prod_{i=1}^{n-1} G_i(X_i)\right]}{E\left[\prod_{i=1}^{n-1} G_i(X_i)\right]}

  \hat{Q}_n(\varphi_{1:n}) = \frac{E\left[\varphi_{1:n}(X_{1:n}) \prod_{i=1}^{n} G_i(X_i)\right]}{E\left[\prod_{i=1}^{n} G_i(X_i)\right]}

  where \varphi_{1:n}: \bigotimes_{i=1}^{n} E_i \to \mathbb{R}.
Example (Filtering via FK Formulæ: Prediction)
- Let \mu(dx_1) = f(x_1) dx_1 and M_n(x_{n-1}, dx_n) = f(x_n|x_{n-1}) dx_n.
- Let G_n(x_n) = g(y_n|x_n).
- Then:

  Q_n(\varphi_{1:n}) = E\left[\varphi_{1:n}(X_{1:n}) \prod_{i=1}^{n-1} G_i(X_i)\right] \Big/ E\left[\prod_{i=1}^{n-1} G_i(X_i)\right]

  = E\left[\varphi_{1:n}(X_{1:n}) \prod_{i=1}^{n-1} g(y_i|X_i)\right] \Big/ E\left[\prod_{i=1}^{n-1} g(y_i|X_i)\right]

  = \frac{\int f(x_1) \prod_{i=2}^{n} f(x_i|x_{i-1}) \left[\prod_{j=1}^{n-1} g(y_j|x_j)\right] \varphi_{1:n}(x_{1:n}) dx_{1:n}}{\int f(x_1) \prod_{i=2}^{n} f(x_i|x_{i-1}) \left[\prod_{j=1}^{n-1} g(y_j|x_j)\right] dx_{1:n}}

  = \int p(x_{1:n}|y_{1:n-1}) \varphi_{1:n}(x_{1:n}) dx_{1:n}.
Example (Filtering via FK Formulæ: Update/Filtering)
- Whilst:

  \hat{Q}_n(\varphi_{1:n}) = E\left[\varphi_{1:n}(X_{1:n}) \prod_{i=1}^{n} G_i(X_i)\right] \Big/ E\left[\prod_{i=1}^{n} G_i(X_i)\right]

  = E\left[\varphi_{1:n}(X_{1:n}) \prod_{i=1}^{n} g(y_i|X_i)\right] \Big/ E\left[\prod_{i=1}^{n} g(y_i|X_i)\right]

  = \frac{\int f(x_1) \prod_{i=2}^{n} f(x_i|x_{i-1}) \left[\prod_{j=1}^{n} g(y_j|x_j)\right] \varphi_{1:n}(x_{1:n}) dx_{1:n}}{\int f(x_1) \prod_{i=2}^{n} f(x_i|x_{i-1}) \left[\prod_{j=1}^{n} g(y_j|x_j)\right] dx_{1:n}}

  = \int p(x_{1:n}|y_{1:n}) \varphi_{1:n}(x_{1:n}) dx_{1:n}.
Feynman-Kac Marginal Measures
We are typically interested in marginals:

  \gamma_n(\varphi_n) = E\left[\varphi_n(X_n) \prod_{i=1}^{n-1} G_i(X_i)\right]
  \qquad
  \hat{\gamma}_n(\varphi_n) = E\left[\varphi_n(X_n) \prod_{i=1}^{n} G_i(X_i)\right]

  \eta_n(\varphi_n) = Q_n(1_{1:n-1} \otimes \varphi_n)
  = \frac{E\left[\varphi_n(X_n) \prod_{i=1}^{n-1} G_i(X_i)\right]}{E\left[\prod_{i=1}^{n-1} G_i(X_i)\right]}
  = \gamma_n(\varphi_n)/\gamma_n(1)

  \hat{\eta}_n(\varphi_n) = \hat{Q}_n(1_{1:n-1} \otimes \varphi_n)
  = \frac{E\left[\varphi_n(X_n) \prod_{i=1}^{n} G_i(X_i)\right]}{E\left[\prod_{i=1}^{n} G_i(X_i)\right]}
  = \hat{\gamma}_n(\varphi_n)/\hat{\gamma}_n(1)

Key property:

  \eta_n(A_n) = \int_{E_1 \times \cdots \times E_{n-1} \times A_n} Q_n(dx_{1:n})
  \qquad
  \hat{\eta}_n(A_n) = \int_{E_1 \times \cdots \times E_{n-1} \times A_n} \hat{Q}_n(dx_{1:n})
Analysis: Semigroup Structure
A Dynamic Systems View:
How do the marginal distributions evolve?
Some Recursive Relationships
- The unnormalized marginals obey:

  \hat{\gamma}_n(\varphi_n) = \gamma_n(\varphi_n \cdot G_n)
  \qquad
  \gamma_n(\varphi_n) = \hat{\gamma}_{n-1} M_n(\varphi_n)

- Whilst the normalized marginals satisfy:

  \hat{\eta}_n(\varphi_n) = \frac{\hat{\gamma}_n(\varphi_n)}{\hat{\gamma}_n(1)} = \frac{\gamma_n(\varphi_n \cdot G_n)}{\gamma_n(G_n)} = \frac{\eta_n(\varphi_n \cdot G_n)}{\eta_n(G_n)}

  \eta_n(\varphi_n) = \frac{\gamma_n(\varphi_n)}{\gamma_n(1)} = \frac{\hat{\gamma}_{n-1} M_n(\varphi_n)}{\hat{\gamma}_{n-1} M_n(1)} = \frac{\hat{\eta}_{n-1} M_n(\varphi_n)}{\hat{\eta}_{n-1} M_n(1)} = \hat{\eta}_{n-1} M_n(\varphi_n)

So:

  \hat{\eta}_n(\varphi_n) = \frac{\hat{\eta}_{n-1} M_n(\varphi_n \cdot G_n)}{\hat{\eta}_{n-1} M_n(G_n)}
The Boltzmann-Gibbs Operator
- Given \nu \in \mathcal{P}(E) and G: E \to \mathbb{R}:

  \Psi_G: \mathcal{P}(E) \to \mathcal{P}(E), \qquad \nu \mapsto \Psi_G(\nu)

- The Boltzmann-Gibbs operator \Psi_G is defined weakly by:

  \forall \varphi \in C_b: \quad \Psi_G(\nu)(\varphi) = \frac{\nu(G \cdot \varphi)}{\nu(G)}

- or, equivalently, for all measurable sets A:

  \Psi_G(\nu)(A) = \frac{\nu(G \cdot I_A)}{\nu(G)} = \frac{\int_A \nu(dx) G(x)}{\int_E \nu(dx') G(x')}
Example (Boltzmann-Gibbs Operators and Bayes’ Rule)
- Let \mu(dx) = f(x) \lambda(dx) be a prior measure.
- Let G(x) = g(y|x) be the likelihood.
- Then:

  \Psi_G(\mu)(\varphi) = \frac{\mu(G \cdot \varphi)}{\mu(G)} = \frac{\int \mu(dx) G(x) \varphi(x)}{\int \mu(dx') G(x')}
  = \frac{\int f(x) g(y|x) \varphi(x) \lambda(dx)}{\int f(x') g(y|x') \lambda(dx')}
  = \int f(x|y) \varphi(x) \lambda(dx)

  with

  f(x|y) := \frac{f(x) g(y|x)}{\int f(x) g(y|x) \lambda(dx)}.

- So: \Psi_{g(y|\cdot)}: Prior \to Posterior.
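For an atomic measure ν = Σ_i w_i δ_{x_i}, applying Ψ_G just reweights the atoms by G and renormalises, which is exactly the Bayes/weighting step of a particle filter. A minimal sketch with an illustrative Gaussian likelihood:

```python
import numpy as np

def boltzmann_gibbs(xs, ws, G):
    """Apply Psi_G to the atomic measure sum_i ws[i] * delta_{xs[i]}:
    the atoms stay put, the weights become proportional to ws[i] * G(xs[i])."""
    new_ws = ws * G(xs)
    return xs, new_ws / new_ws.sum()

# Illustrative example: a flat 'prior' on a grid, reweighted by g(y | x).
xs = np.linspace(-3.0, 3.0, 7)
ws = np.full(7, 1.0 / 7)
y = 1.0
G = lambda x: np.exp(-0.5 * (y - x) ** 2)     # placeholder Gaussian likelihood
print(boltzmann_gibbs(xs, ws, G)[1])
```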
Markov Semigroups
- A semigroup S comprises:
  - a set, S;
  - an associative binary operation, \cdot.
- A Markov chain with homogeneous transition M has dynamic semigroup M^n:
  - M^0(x, A) = \delta_x(A)
  - M^1(x, A) = M(x, A)
  - M^n(x, A) = \int M(x, dy) M^{n-1}(y, A)
  - (M^n \cdot M^m)(x, A) = \int M^n(x, dy) M^m(y, A) = M^{n+m}(x, A)
- A linear semigroup.
- Key property:

  P(X_{n+m} \in A | X_m = x) = M^n(x, A).
Markov Semigroupoids
- A semigroupoid S' comprises:
  - a set, S;
  - a partial associative binary operation, \cdot.
- A Markov chain with inhomogeneous transitions M_n has dynamic semigroupoid M_{p:q}:
  - M_{p:p}(x, A) = \delta_x(A)
  - M_{p:p+1}(x, A) = M_{p+1}(x, A)
  - M_{p:q}(x, A) = \int M_{p+1}(x, dy) M_{p+1:q}(y, A)
  - (M_{p:q} \cdot M_{q:r})(x, A) = \int M_{p:q}(x, dy) M_{q:r}(y, A) = M_{p:r}(x, A)
- A linear semigroupoid.
- Key property:

  P(X_{n+m} \in A | X_m = x) = M_{m:m+n}(x, A).
An Unnormalized Feynman-Kac Semigroupoid
- We previously established:

  \hat{\gamma}_n(\varphi_n) = \gamma_n(\varphi_n \cdot G_n)
  \qquad
  \gamma_n = \hat{\gamma}_{n-1} M_n

- Defining

  Q_p(x_{p-1}, dx_p) = M_p(x_{p-1}, dx_p) G_p(x_p)

  we obtain \gamma_n = \gamma_{n-1} Q_n.
- We can construct the dynamic semigroupoid Q_{p:q}:
  - Q_{p:p}(x, A) = \delta_x(A)
  - Q_{p:p+1}(x, A) = Q_{p+1}(x, A)
  - Q_{p:q}(x, A) = \int Q_{p+1}(x, dy) Q_{p+1:q}(y, A)
  - (Q_{p:q} \cdot Q_{q:r})(x, A) = \int Q_{p:q}(x, dy) Q_{q:r}(y, A) = Q_{p:r}(x, A)
- Just a Markov semigroupoid for general measures:

  \forall p \le q: \quad \gamma_q = \gamma_p Q_{p:q}.
A Normalised Feynman-Kac Semigroupoid
- We previously established:

  \eta_n = \hat{\eta}_{n-1} M_n
  \qquad
  \hat{\eta}_n(\varphi_n) = \frac{\eta_n(\varphi_n \cdot G_n)}{\eta_n(G_n)}

- From the definition of \Psi_{G_n}: \hat{\eta}_n = \Psi_{G_n}(\eta_n).
- Defining \Phi_n: \mathcal{P}(E_{n-1}) \to \mathcal{P}(E_n) as

  \Phi_n: \eta_{n-1} \mapsto \Psi_{G_{n-1}}(\eta_{n-1}) M_n

  we have the recursion \eta_n = \Phi_n(\eta_{n-1}) and the nonlinear semigroupoid \Phi_{p:q}:
  - \Phi_{p:p}(\eta_p) = \eta_p
  - \Phi_{p:p+1} = \Phi_{p+1}
  - \Phi_{p:q}(\eta_p) = \Phi_{p+1:q}(\Phi_{p+1}(\eta_p)) for q > p + 1
  - \Phi_{q:r} \circ \Phi_{p:q} = \Phi_{p:r}
- Again: \forall p \le q: \eta_q = \Phi_{p:q}(\eta_p).
McKean Interpretations
Microscopic mass transport.
McKean Interpretations of Feynman-Kac Formulæ
- Families of Markov kernels consistent with the FK marginals.
- A collection \{K_{n,\eta}\}_{n \in \mathbb{N}, \eta \in \mathcal{P}(E_{n-1})} is a McKean interpretation if:

  \forall n \in \mathbb{N}: \quad \eta_n = \Phi_n(\eta_{n-1}) = \eta_{n-1} K_{n,\eta_{n-1}}.

- Not unique... and not linear.
- A selection/mutation approach seems natural:
  - Choose S_{n,\eta} such that \eta S_{n,\eta} = \Psi_{G_n}(\eta).
  - Set K_{n+1,\eta} = S_{n,\eta} M_{n+1}.
- Still not unique:
  - S_{n,\eta}(x_n, \cdot) = \Psi_{G_n}(\eta)(\cdot)
  - S_{n,\eta}(x_n, \cdot) = \epsilon_n G_n(x_n) \delta_{x_n}(\cdot) + (1 - \epsilon_n G_n(x_n)) \Psi_{G_n}(\eta)(\cdot)
Particle Interpretations
Stochastic discretisations.
Particle Interpretations of Feynman-Kac Formulæ I
Given a McKean interpretation, we can attach an N-particle model.

- Denote \xi^{(N)}_n = (\xi^{(N,1)}_n, \xi^{(N,2)}_n, ..., \xi^{(N,N)}_n) \in E_n^N.
- Allow (\Omega^N, \mathcal{F}^N = (\mathcal{F}^N_n)_{n \in \mathbb{N}}, \xi^{(N)}, \mathbb{P}^N_{\eta_0}) to indicate a particle-set-valued Markov chain.
- Let \eta^{(N)}_{n-1} = \frac{1}{N} \sum_{i=1}^{N} \delta_{\xi^{(N,i)}_{n-1}}.
- Allow the elementary transitions to be:

  \mathbb{P}\left(\xi^{(N)}_n \in d\xi^{(N)}_n \,\middle|\, \xi^{(N)}_{n-1}\right) = \prod_{p=1}^{N} K_{n,\eta^{(N)}_{n-1}}(\xi^{(N,p)}_{n-1}, d\xi^{(N,p)}_n)
Particle Interpretations of Feynman-Kac Formulæ II
- Consider K_{n,\eta} = S_{n-1,\eta} M_n:

  \mathbb{P}\left(\xi^{(N)}_n \in d\xi^{(N)}_n \,\middle|\, \xi^{(N)}_{n-1}\right) = \prod_{p=1}^{N} S_{n-1,\eta^{(N)}_{n-1}} M_n(\xi^{(N,p)}_{n-1}, d\xi^{(N,p)}_n)

- Defining:

  S_{n-1}(\xi^{(N)}_{n-1}, d\hat{\xi}^{(N)}_{n-1}) = \prod_{p=1}^{N} S_{n-1,\eta^{(N)}_{n-1}}(\xi^{(N,p)}_{n-1}, d\hat{\xi}^{(N,p)}_{n-1})

  M_n(\hat{\xi}^{(N)}_{n-1}, d\xi^{(N)}_n) = \prod_{p=1}^{N} M_n(\hat{\xi}^{(N,p)}_{n-1}, d\xi^{(N,p)}_n)

  it is clear that:

  \mathbb{P}\left(\xi^{(N)}_n \in d\xi^{(N)}_n \,\middle|\, \xi^{(N)}_{n-1}\right) = \int_{E_{n-1}^N} S_{n-1,\eta^{(N)}_{n-1}}(\xi^{(N)}_{n-1}, d\hat{\xi}^{(N)}_{n-1}) M_n(\hat{\xi}^{(N)}_{n-1}, d\xi^{(N)}_n)
Selection, Mutation and Structure
- A suggestive structural similarity:

  \eta_{n-1} \in \mathcal{P}(E_{n-1})
  \;\xrightarrow{S_{n-1,\eta_{n-1}}}\;
  \hat{\eta}_{n-1} \in \mathcal{P}(E_{n-1})
  \;\xrightarrow{M_n}\;
  \eta_n \in \mathcal{P}(E_n)

  \xi^{(N)}_{n-1} \in E_{n-1}^N
  \;\xrightarrow{\text{Select}}\;
  \hat{\xi}^{(N)}_{n-1} \in E_{n-1}^N
  \;\xrightarrow{\text{Mutate}}\;
  \xi^{(N)}_n \in E_n^N

- Selection:

  S_{n-1,\eta^{(N)}_{n-1}} = \Psi_{G_{n-1}}(\eta^{(N)}_{n-1}) = \sum_{i=1}^{N} \frac{G_{n-1}(\xi^{(N,i)}_{n-1})}{\sum_{j=1}^{N} G_{n-1}(\xi^{(N,j)}_{n-1})} \delta_{\xi^{(N,i)}_{n-1}}

  \hat{\xi}^{(N,i)}_{n-1} \overset{\text{i.i.d.}}{\sim} \Psi_{G_{n-1}}(\eta^{(N)}_{n-1})

- Mutation:

  \xi^{(N,i)}_n \sim M_n(\hat{\xi}^{(N,i)}_{n-1}, d\xi^{(N,i)}_n), independently for each i.

- Semigroupoid:

  \mathbb{P}^N\left(\xi^{(N)}_n \in dx^{(N)}_n \,\middle|\, \xi^{(N)}_{n-1}\right) = \prod_{i=1}^{N} \Phi_n(\eta^{(N)}_{n-1})(dx^{(N,i)}_n)
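A sketch of one selection/mutation sweep of the N-particle model, for the simple McKean choice S_{n,η}(x, ·) = Ψ_{G_n}(η): sample N times from the reweighted empirical measure, then move each particle with the mutation kernel. The potential G and mutation sampler M_sample are generic placeholders supplied by the user.

```python
import numpy as np

def fk_particle_sweep(xi, G, M_sample, rng=None):
    """One sweep xi_{n-1} -> xi_n of the N-particle approximation.

    xi       : array of N particles approximating eta_{n-1}
    G        : potential G_{n-1}, evaluated elementwise on the particles
    M_sample : M_sample(x, rng) draws one point from M_n(x, .)
    Returns particles approximating eta_n = Phi_n(eta_{n-1})."""
    rng = rng or np.random.default_rng()
    w = G(xi)
    w = w / w.sum()                                        # Psi_{G_{n-1}}(eta^N_{n-1})
    xi_hat = xi[rng.choice(len(xi), size=len(xi), p=w)]    # selection
    return np.array([M_sample(x, rng) for x in xi_hat])    # mutation

# Illustrative use: Gaussian potential and random-walk mutation (placeholders).
xi = np.random.default_rng(1).normal(size=500)
xi_new = fk_particle_sweep(xi,
                           G=lambda x: np.exp(-0.5 * (1.0 - x) ** 2),
                           M_sample=lambda x, rng: x + rng.normal())
```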
Selected Results
Local Error Decomposition
  \eta_1 \to \eta_2 = \Phi_2(\eta_1) \to \eta_3 = \Phi_{1:3}(\eta_1) \to \cdots \to \Phi_{1:n}(\eta_1)
  \Downarrow
  \eta_1^N \to \Phi_2(\eta_1^N) \to \Phi_{1:3}(\eta_1^N) \to \cdots \to \Phi_{1:n}(\eta_1^N)
  \Downarrow
  \eta_2^N \to \Phi_3(\eta_2^N) \to \cdots \to \Phi_{2:n}(\eta_2^N)
  \Downarrow
  \eta_3^N \to \cdots \to \Phi_{3:n}(\eta_3^N)
  \vdots
  \Downarrow
  \eta_{n-1}^N \to \Phi_n(\eta_{n-1}^N)
  \Downarrow
  \eta_n^N

Each row propagates a particle approximation forward exactly; each \Downarrow is a fresh sampling step. Formally:

  \eta_n^N - \eta_n = \sum_{i=1}^{n} \left[ \Phi_{i:n}(\eta_i^N) - \Phi_{i:n}(\Phi_i(\eta_{i-1}^N)) \right]
A Key Martingale
Proposition (Del Moral, 2004, Proposition 7.4.1)
For each n \ge 0 and \varphi_n \in C_b(E_n), define:

  \Gamma^N_{\cdot,n}(\varphi_n): p \in \{1, ..., n\} \mapsto \Gamma^N_{p,n}(\varphi_n)

  \Gamma^N_{p,n}(\varphi_n) := \gamma^N_p(Q_{p,n} \varphi_n) - \gamma_p(Q_{p,n} \varphi_n)

For any p \le n, \Gamma^N_{\cdot,n}(\varphi_n) has the \mathcal{F}^N-martingale decomposition:

  \Gamma^N_{p,n}(\varphi_n) = \sum_{q=1}^{p} \gamma^N_q(1) \left[ \eta^N_q(Q_{q,n} \varphi_n) - \eta^N_{q-1} K_{q,\eta^N_{q-1}}(Q_{q,n} \varphi_n) \right]

with angle bracket

  \left\langle \Gamma^N_{\cdot,n}(\varphi_n) \right\rangle_p = \sum_{q=1}^{p} \gamma^N_q(1)^2 \, \eta^N_{q-1} K_{q,\eta^N_{q-1}}\!\left( \left[ Q_{q,n}(\varphi_n) - \eta^N_{q-1} K_{q,\eta^N_{q-1}} Q_{q,n}(\varphi_n) \right]^2 \right)
Normalizing the Unnormalized
  \eta^N_n(\varphi_n) - \eta_n(\varphi_n)
  = \frac{\gamma^N_n(\varphi_n)}{\gamma^N_n(1)} - \frac{\gamma_n(\varphi_n)}{\gamma_n(1)}
  = \frac{\gamma_n(1)}{\gamma^N_n(1)} \left[ \frac{\gamma^N_n(\varphi_n)}{\gamma_n(1)} - \frac{\gamma_n(\varphi_n)}{\gamma_n(1)} \times \frac{\gamma^N_n(1)}{\gamma_n(1)} \right]
  = \frac{\gamma_n(1)}{\gamma^N_n(1)} \left[ \frac{\gamma^N_n(\varphi_n)}{\gamma_n(1)} - \eta_n(\varphi_n) \times \frac{\gamma^N_n(1)}{\gamma_n(1)} \right]
  = \frac{\gamma_n(1)}{\gamma^N_n(1)} \, \frac{1}{\gamma_n(1)} \, \gamma^N_n\big( \varphi_n - \eta_n(\varphi_n) \big)
Law of Large Numbers and Weak Convergence
Theorem (Del Moral 2004: Theorem 7.4.4)
Under regularity conditions, for any n \ge 1, p \ge 1 and \varphi_n \in C_b(E_n):

  \sqrt{N} \, E\left[ \left| \eta^N_n(\varphi_n) - \eta_n(\varphi_n) \right|^p \right]^{1/p} \le c_{p,n} \|\varphi_n\|_\infty

By a Borel-Cantelli argument:

  \lim_{N \to \infty} \eta^N_n(\varphi_n) = \eta_n(\varphi_n) \quad almost surely.
Central Limit Theorem
Proposition (Del Moral 2004: Proposition 9.4.2)
Under regularity conditions, for any n \ge 1:

  \sqrt{N} \left( \eta^N_n(\varphi_n) - \eta_n(\varphi_n) \right) \overset{d}{\to} N\left(0, \sigma_n^2(\varphi_n)\right)

where

  \sigma_n^2(\varphi_n) = \sum_{q=1}^{n} \eta_{q-1} K_{q,\eta_{q-1}}\!\left( \left[ Q_{q,n}(\varphi_n) - K_{q,\eta_{q-1}}(Q_{q,n}(\varphi_n)) \right]^2 \right)
Part 3 – Interface
Particle Filters as McKean Interpretations
The Bootstrap Particle Filter
The Simplest Case
Recall: The SIR Particle Filter
- At iteration n, given \{W^{(i)}_{n-1}, X^{(i)}_{1:n-1}\}:
  1. Resample, to obtain \{\frac{1}{N}, \tilde{X}^{(i)}_{1:n-1}\}.   [Selection]
  2. Sample X^{(i)}_n \sim q_n(\cdot|\tilde{X}^{(i)}_{n-1}).   [Mutation]
  3. Set X^{(i)}_{1:n-1} = \tilde{X}^{(i)}_{1:n-1}.
  4. Set W^{(i)}_n = f(X^{(i)}_n|X^{(i)}_{n-1}) g(y_n|X^{(i)}_n) / q_n(X^{(i)}_n|X^{(i)}_{n-1}).
- Feynman-Kac formulation?
  - Generally W^{(i)}_n depends upon X^{(i)}_{n-1}.
  - (At least) two solutions exist.
The Bootstrap SIR Filter
- The bootstrap particle filter (Gordon, Salmond and Smith, 1993):
  - Proposal: q(x_{n-1}, x_n) = f(x_n|x_{n-1})
  - Weight: w(x_n) \propto g(y_n|x_n)
- Feynman-Kac model:
  - Mutation: M_n(x_{n-1}, dx_n) = f(x_n|x_{n-1}) dx_n.
  - Potential: G_n(x_n) = g(y_n|x_n).
- McKean interpretation:
  - McKean transitions: K_{n+1,\eta} = S_{n,\eta} M_{n+1}.
  - Selection operation: S_{n,\eta} = \Psi_{G_n}(\eta).
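A sketch of the bootstrap filter for the almost constant velocity model from Part 1 (propose from the dynamics, weight by the Gaussian observation likelihood, multinomial resampling at every step). It assumes the matrices A, B and the noise covariances Q, R used in the earlier simulation sketch; the Gaussian initialisation is a further assumption.

```python
import numpy as np

def bootstrap_pf(ys, A, B, Q, R, N=1000, seed=0):
    """Bootstrap particle filter for x_n = A x_{n-1} + eps_n, y_n = B x_n + nu_n,
    with eps_n ~ N(0, Q) and nu_n ~ N(0, R).  Returns the filtering means."""
    rng = np.random.default_rng(seed)
    d = A.shape[0]
    Rinv = np.linalg.inv(R)
    X = rng.multivariate_normal(np.zeros(d), Q, size=N)   # assumed initialisation
    means = []
    for y in ys:
        # Mutation: propose from the state dynamics f(x_n | x_{n-1}).
        X = X @ A.T + rng.multivariate_normal(np.zeros(d), Q, size=N)
        # Weights proportional to the likelihood g(y_n | x_n).
        resid = y - X @ B.T
        logW = -0.5 * np.einsum('ni,ij,nj->n', resid, Rinv, resid)
        W = np.exp(logW - logW.max())
        W /= W.sum()
        means.append(W @ X)                               # estimate of E[x_n | y_{1:n}]
        # Selection: multinomial resampling.
        X = X[rng.choice(N, size=N, p=W)]
    return np.array(means)

# e.g. with the earlier (assumed) simulation sketch:
#   A, B, Q, R, xs, ys = simulate_acv()
#   print(bootstrap_pf(ys, A, B, Q, R)[:3])
```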
Bootstrap Particle Filter Results
LLN:

  \lim_{N \to \infty} \frac{\sum_{i=1}^{N} W^{(i)}_n \varphi_n(X^{(i)}_n)}{\sum_{j=1}^{N} W^{(j)}_n} \overset{\text{a.s.}}{=} \int \varphi_n(x_n) p(x_n|y_{1:n}) dx_n

CLT:

  \sqrt{N} \left( \frac{\sum_{i=1}^{N} W^{(i)}_n \varphi_n(X^{(i)}_n)}{\sum_{j=1}^{N} W^{(j)}_n} - \int \varphi_n(x_n) p(x_n|y_{1:n}) dx_n \right) \overset{d}{\to} N\left(0, \sigma^2_{BS,n}(\varphi_n)\right)
Bootstrap Particle Filter: Asymptotic Variance
  \sigma^2_{BS,n}(\varphi_n) =
  \int \frac{p(x_1|y_{1:n})^2}{p(x_1)} \left( \int \varphi_n(x_n) p(x_n|y_{2:n}, x_1) dx_n - \bar{\varphi}_n \right)^2 dx_1

  + \sum_{k=2}^{n-1} \int \frac{p(x_k|y_{1:n})^2}{p(x_k|y_{1:k-1})} \left( \int \varphi_n(x_n) p(x_n|y_{k+1:n}, x_k) dx_n - \bar{\varphi}_n \right)^2 dx_k

  + \int \frac{p(x_n|y_{1:n})^2}{p(x_n|y_{1:n-1})} \left( \varphi_n(x_n) - \bar{\varphi}_n \right)^2 dx_n

with

  \bar{\varphi}_n = \int p(x_n|y_{1:n}) \varphi_n(x_n) dx_n.
Extended Spaces: General SIR Particle Filter
- At iteration n, given \{W^{(i)}_{n-1}, X^{(i)}_{1:n-1}\}:
  1. Resample, to obtain \{\frac{1}{N}, \tilde{X}^{(i)}_{1:n-1}\}.   [Selection]
  2. Sample X^{(i)}_n \sim q_n(\cdot|\tilde{X}^{(i)}_{n-1}).   [Mutation]
  3. Set X^{(i)}_{1:n-1} = \tilde{X}^{(i)}_{1:n-1}.
  4. Set W^{(i)}_n = f(X^{(i)}_n|X^{(i)}_{n-1}) g(y_n|X^{(i)}_n) / q_n(X^{(i)}_n|X^{(i)}_{n-1}).
- But W^{(i)}_n depends upon X^{(i)}_{n-1}.
- Let \tilde{E}_n = E_{n-1} \times E_n.
- Define Y_n = (X_{n-1}, X_n).
- Now W_n = \tilde{G}_n(Y_n).
- Set \tilde{M}_n(y_{n-1}, dy_n) = \delta_{y_{n-1,2}}(dy_{n,1}) \, q(y_{n-1,2}, dy_{n,2}).
- A Feynman-Kac representation.
SIR Asymptotic Variance
  \sigma^2_{SIR,n}(\varphi_n) =
  \int \frac{p(x_1|y_{1:n})^2}{q_1(x_1)} \left( \int \varphi_n(x_n) p(x_n|y_{2:n}, x_1) dx_n - \bar{\varphi}_n \right)^2 dx_1

  + \sum_{k=2}^{n-1} \int \frac{p(x_{1:k}|y_{1:n})^2}{p(x_{1:k-1}|y_{1:k-1}) q_k(x_k|x_{k-1})} \left( \int \varphi_n(x_n) p(x_n|y_{k+1:n}, x_k) dx_n - \bar{\varphi}_n \right)^2 dx_{1:k}

  + \int \frac{p(x_{1:n}|y_{1:n})^2}{p(x_{1:n-1}|y_{1:n-1}) q_n(x_n|x_{n-1})} \left( \varphi_n(x_n) - \bar{\varphi}_n \right)^2 dx_{1:n}.
Auxiliary Particle Filters
Another algorithm.
Auxiliary [v] Particle Filters
(Pitt & Shephard ’99)
If we have access to the next observation before resampling, we could use this structure:

- Pre-weight every particle with \lambda^{(i)}_n \propto \hat{p}(y_n|X^{(i)}_{n-1}).
- Propose new states from the mixture distribution

  \sum_{i=1}^{N} \lambda^{(i)}_n q(\cdot|X^{(i)}_{n-1}) \Big/ \sum_{i=1}^{N} \lambda^{(i)}_n.

- Weight samples, correcting for the pre-weighting:

  W^{(i)}_n \propto \frac{f(X^{(i)}_{n,n}|X^{(i)}_{n,n-1}) \, g(y_n|X^{(i)}_{n,n})}{\lambda^{(i)}_n \, q_n(X^{(i)}_{n,n}|X^{(i)}_{n,1:n-1})}.

- Resample the particle set.
Some Well Known Refinements
We can tidy things up a bit:

1. The auxiliary variable step is equivalent to multinomial resampling.
2. So, there's no need to resample before the pre-weighting.

Now we have (a code sketch follows below):

- Pre-weight every particle with \lambda^{(i)}_n \propto \hat{p}(y_n|X^{(i)}_{n-1}).
- Resample.
- Propose new states.
- Weight samples, correcting for the pre-weighting:

  W^{(i)}_n \propto \frac{f(X^{(i)}_{n,n}|X^{(i)}_{n,n-1}) \, g(y_n|X^{(i)}_{n,n})}{\lambda^{(i)}_n \, q_n(X^{(i)}_{n,n}|X^{(i)}_{n,1:n-1})}.
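A sketch of one step of this refined scheme for a generic scalar model, with the bootstrap proposal q_n = f and a user-supplied approximate predictive likelihood p̂(y_n|x_{n-1}) (p_hat below); the densities passed in at the bottom are illustrative placeholders.

```python
import numpy as np

def apf_step(X, W, y, f_sample, g_pdf, p_hat, rng=None):
    """One auxiliary particle filter step in the refined form above:
    pre-weight by p_hat(y_n | x_{n-1}), resample, propose from f, correct.
    With q_n = f, the corrected weight is g(y_n | x_n) / p_hat(y_n | x_{n-1})."""
    rng = rng or np.random.default_rng()
    N = len(X)
    lam = p_hat(y, X)                            # pre-weights lambda_n^(i)
    pre = W * lam
    pre /= pre.sum()
    anc = rng.choice(N, size=N, p=pre)           # resample with the pre-weights
    X_prev = X[anc]
    X_new = f_sample(X_prev, rng)                # propose from the dynamics
    W_new = g_pdf(y, X_new) / p_hat(y, X_prev)   # correct for the pre-weighting
    return X_new, W_new / W_new.sum()

# Illustrative placeholders: random-walk dynamics, Gaussian likelihood, and a
# deliberately flattened predictive approximation p_hat.
f_sample = lambda x, rng: x + rng.normal(size=len(x))
g_pdf = lambda y, x: np.exp(-0.5 * (y - x) ** 2)
p_hat = lambda y, x: np.exp(-0.25 * (y - x) ** 2)
X, W = apf_step(np.zeros(500), np.full(500, 1 / 500), 1.0, f_sample, g_pdf, p_hat)
```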
A Feynman-Kac Interpretation
of the APF
A transition and a potential.
An Interpretation of the APF
If we move the first step at time n + 1 to the last at time n, we
get:
I
Resample
I
Propose new states
I
Weight samples, correcting earlier pre-weighting.
I
Pre-weight every particle with λn+1 ∝ b
p (yn+1 |Xn ).
(i)
(i)
An SIR algorithm targetting:
b
pn (x1:n |y1:n+1 ) ∝ p(x1:n |y1:n )b
p (yn+1 |xn ).
69/79
Some Consequences
Asymptotics.
Theoretical Considerations
- Direct analysis of the APF is largely unnecessary.
- Results can be obtained by considering the associated SIR algorithm.
- SIR has a (discrete time) Feynman-Kac interpretation.
For example. . .
Proposition. Under standard regularity conditions,

  \sqrt{N} \left( \hat{\varphi}^N_{n,APF} - \bar{\varphi}_n \right) \overset{d}{\to} N\left(0, \sigma_n^2(\varphi_n)\right)

where

  \sigma_n^2(\varphi_n) =
  \int \frac{p(x_1|y_{1:n})^2}{q_1(x_1)} \left( \int \varphi_n(x_n) p(x_n|y_{2:n}, x_1) dx_n - \bar{\varphi}_n \right)^2 dx_1

  + \sum_{k=2}^{n-1} \int \frac{p(x_{1:k}|y_{1:n})^2}{\hat{p}(x_{1:k-1}|y_{1:k}) q_k(x_k|x_{k-1})} \left( \int \varphi_n(x_n) p(x_n|y_{k+1:n}, x_k) dx_n - \bar{\varphi}_n \right)^2 dx_{1:k}

  + \int \frac{p(x_{1:n}|y_{1:n})^2}{\hat{p}(x_{1:n-1}|y_{1:n}) q_n(x_n|x_{n-1})} \left( \varphi_n(x_n) - \bar{\varphi}_n \right)^2 dx_{1:n}.
Practical Implications
- It means we're doing importance sampling.
- Choosing \hat{p}(y_n|x_{n-1}) = p(y_n | x_n = E[X_n|x_{n-1}]) is dangerous.
- A safer choice would be to ensure that

  \sup_{x_{n-1}, x_n} \frac{g(y_n|x_n) f(x_n|x_{n-1})}{\hat{p}(y_n|x_{n-1}) q(x_n|x_{n-1})} < \infty.

- Using the APF doesn't ensure superior performance.
A Contrived Illustration
Consider the following binary state-space model with common state and observation spaces:

  \mathcal{X} = \{0, 1\}, \qquad \mathcal{Y} = \mathcal{X},
  \qquad p(x_1 = 0) = 0.5,
  \qquad p(x_n = x_{n-1}) = 1 - \delta,
  \qquad p(y_n = x_n) = 1 - \epsilon.

- \delta controls the ergodicity of the state process.
- \epsilon controls the information contained in the observations.

Consider estimating E(X_2 | Y_{1:2} = (0, 1)); a sketch of the exact calculation follows below.
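With only four possible paths (x_1, x_2), the quantity E(X_2 | Y_{1:2} = (0, 1)) can be computed exactly by enumeration, giving the reference value behind the variance comparisons on the next slides. A minimal sketch, with δ and ε left as arguments (the values in the call are arbitrary examples):

```python
from itertools import product

def exact_posterior_mean(delta, eps, ys=(0, 1)):
    """E(X_2 | Y_{1:2} = ys) for the binary model, by summing over the
    four paths (x_1, x_2)."""
    trans = lambda a, b: 1 - delta if a == b else delta    # p(x_n = b | x_{n-1} = a)
    emit = lambda x, y: 1 - eps if x == y else eps         # p(y_n = y | x_n = x)
    num = den = 0.0
    for x1, x2 in product((0, 1), repeat=2):
        p = 0.5 * emit(x1, ys[0]) * trans(x1, x2) * emit(x2, ys[1])
        num += x2 * p
        den += p
    return num / den

print(exact_posterior_mean(delta=0.1, eps=0.25))
```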
Variance of SIR - Variance of APF
[Figure: surface plot of the difference in asymptotic variance, Var(SIR) - Var(APF), as a function of \delta and \epsilon.]
Variance Comparison at ǫ = 0.25
[Figure: variance (empirical and asymptotic) of the SISR and APF estimators as a function of \delta, at \epsilon = 0.25.]
Further Reading
Particle Filters / Sequential Monte Carlo
- Novel approach to nonlinear/non-Gaussian Bayesian state estimation. Gordon, Salmond and Smith, IEE Proceedings-F, 140(2):107-113, 1993.
- Sequential Monte Carlo Methods in Practice. Doucet, de Freitas & Gordon (eds.), Springer, 2001.
- A survey of convergence results on particle filtering methods for practitioners. Crisan and Doucet, IEEE Transactions on Signal Processing, 50(3):736-746, 2002.
- Central limit theorem for sequential Monte Carlo methods and its application to Bayesian inference. Chopin, Annals of Statistics, 32(6):2385-2411, 2004.
- A Tutorial on Particle Filtering... Doucet & J., in Oxford Handbook of Nonlinear Filtering, 2011, Chapter 24:656-704.
- Sequential Monte Carlo samplers. Del Moral, Doucet & Jasra, JRSS B, 68(3):411-436, 2006.
Resampling
- Comparison of resampling schemes for particle filters. Douc, Cappé and Moulines, in Proc. ISPA 2005, I:64-69.

Feynman-Kac Particle Models

- Particle Methods: An Introduction with Applications. Del Moral & Doucet, HAL INRIA Report RR-6691, 2009.
- Feynman-Kac Formulæ... Del Moral, Springer, 2004.

Auxiliary Particle Filters

- Filtering via Simulation: Auxiliary Particle Filters. Pitt & Shephard, JASA, 94(446):590-599, 1999.
- A Note on Auxiliary Particle Filters. J. & Doucet, Stat. & Prob. Lett., 78(12):1498-1504, 2008.
- Optimality of the APF. Douc, Moulines & Olsson, Prob. & Math. Stat., 29(1):1-28, 2009.