Designing “Particle Filters” for Particle MCMC
Adam M. Johansen
a.m.johansen@warwick.ac.uk
go.warwick.ac.uk/amjohansen/talks/
Oxford, March 4th, 2016

Outline
- Background: SMC and PMCMC
- Beyond simple filters:
  - Iterative Lookahead Methods
  - Tempering and Blocks in SMC
  - Hierarchical Particle Filters
- Conclusions

Discrete Time Filtering
Online inference for Hidden Markov Models:
[Figure: graphical model of an HMM; latent chain x_1 -> x_2 -> ... -> x_6 with observations y_1, ..., y_6.]
- Given initial distribution µ_θ(x_1), transition f_θ(x_{n−1}, x_n) and likelihood g_θ(x_n, y_n), use p_θ(x_n | y_{1:n}) to characterize the latent state, but

  p_\theta(x_n \mid y_{1:n}) = \frac{\int p_\theta(x_{n-1} \mid y_{1:n-1})\, f_\theta(x_{n-1}, x_n)\, dx_{n-1}\; g_\theta(x_n, y_n)}{\int\!\!\int p_\theta(x_{n-1} \mid y_{1:n-1})\, f_\theta(x_{n-1}, x_n')\, dx_{n-1}\; g_\theta(x_n', y_n)\, dx_n'}

- and other densities of interest are typically intractable.

Particle Filtering
A Simple “Bootstrap” Particle Filter (Gordon et al. [10]); a code sketch follows at the end of this section.
At n = 1:
- Sample X_1^1, ..., X_1^N ~ µ_θ.
For n > 1:
- Sample the ancestor indices

  A_n^1, \ldots, A_n^N \sim \frac{\sum_{j=1}^N g_\theta(X_{n-1}^j, y_{n-1})\, \delta_j(\cdot)}{\sum_{k=1}^N g_\theta(X_{n-1}^k, y_{n-1})}

- For i in 1 : N sample X_n^i ~ f_θ(X_{n−1}^{A_n^i}, ·).
- Approximate p_θ(dx_n | y_{1:n}) and p_θ(y_{1:n}) with

  \hat{p}_\theta(\cdot \mid y_{1:n}) = \frac{\sum_{j=1}^N g_\theta(X_n^j, y_n)\, \delta_{X_n^j}}{\sum_{k=1}^N g_\theta(X_n^k, y_n)}, \qquad \frac{\hat{p}_\theta(y_{1:n})}{\hat{p}_\theta(y_{1:n-1})} = \frac{1}{N} \sum_{k=1}^N g_\theta(X_n^k, y_n)

[Figure: evolution of the bootstrap particle filter over iterations 2–10 of a one-dimensional example.]

Various Improvements to Particle Filters
- Better proposal distributions:
  - Locally optimal (cf. Doucet et al. [6]) case q(x_n | x_{n−1}, y_n) ∝ f_θ(x_{n−1}, x_n) g(x_n, y_n).
  - Incremental importance weights G_n(x_{n−1}, x_n) = f(x_{n−1}, x_n) g(x_n, y_n) / q(x_{n−1}, x_n).
- Auxiliary particle filters (Pitt and Shephard [15]; see also Johansen and Doucet [13]).
- Better resampling schemes (cf. Douc and Cappé [5]).
- “Adaptive” resampling (cf. Del Moral et al. [4]).
- Resample-move schemes (MCMC; Gilks and Berzuini [8]).
- Tempered transitions (Godsill and Clapp [9]).
All of these retain the online character of particle filtering.

Key Properties of Particle Filters
- Fundamentally online: can approximate p_θ(x_n | y_{1:n}) at iteration n at constant cost per iteration.
- Yield good approximations of the filtering distribution p_θ(x_n | y_{1:n}) at time n.
- Can approximate the marginal likelihood

  p_\theta(y_{1:n}) = \int p_\theta(x_{1:n}, y_{1:n})\, dx_{1:n}

  unbiasedly.
Only the last of these is critical in an offline setting...

Online Particle Filters for Offline Estimation (via PMCMC)
Particle Markov chain Monte Carlo (Andrieu et al. [2]):
- embeds SMC within MCMC,
- is justified via an explicit auxiliary variable construction,
- or in some cases by a pseudo-marginal (Andrieu and Roberts [1]) argument,
- is very widely applicable,
- but is prone to poor mixing when SMC performs poorly for some θ (Owen et al. [14, Section 4.2.1]),
- and is valid for very general SMC algorithms.
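To make the bootstrap filter described above concrete, here is a minimal sketch (not code from the talk) in Python/NumPy for a state-space model supplied through sampling and log-density callables; the function and argument names (sample_init, sample_transition, log_obs_density) are illustrative assumptions.

```python
import numpy as np

def bootstrap_pf(y, sample_init, sample_transition, log_obs_density, N, rng=None):
    """Bootstrap particle filter (Gordon et al.).

    y                 : observations y_1, ..., y_T
    sample_init       : (rng, N)          -> (N, d) draws from mu_theta
    sample_transition : (rng, particles)  -> (N, d) draws from f_theta(x_{n-1}, .)
    log_obs_density   : (particles, y_n)  -> (N,) values of log g_theta(x_n, y_n)
    Returns the final particle set and log of the usual estimate of p_theta(y_{1:T})
    (the estimate itself is unbiased on the natural scale; its logarithm is not).
    """
    rng = np.random.default_rng() if rng is None else rng
    x = sample_init(rng, N)                      # X_1^1, ..., X_1^N ~ mu_theta
    log_Z = 0.0
    for n, y_n in enumerate(y):
        log_w = log_obs_density(x, y_n)          # log g_theta(X_n^i, y_n)
        m = log_w.max()
        w = np.exp(log_w - m)
        # p_hat(y_{1:n}) / p_hat(y_{1:n-1}) = (1/N) sum_k g_theta(X_n^k, y_n)
        log_Z += m + np.log(w.mean())
        if n + 1 < len(y):
            # multinomial resampling (ancestor indices), then propagation
            ancestors = rng.choice(N, size=N, p=w / w.sum())
            x = sample_transition(rng, x[ancestors])
    return x, log_Z

# usage on the univariate model f(x'|x) = N(x'; x, 1), g(y|x) = N(y; x, 1)
T = 100
rng = np.random.default_rng(1)
xs = np.cumsum(rng.normal(size=T))               # x_t = x_{t-1} + v_t, x_1 ~ N(0, 1)
ys = xs + rng.normal(size=T)                     # y_t = x_t + w_t
_, log_Z = bootstrap_pf(
    ys,
    sample_init=lambda r, N: r.normal(size=(N, 1)),
    sample_transition=lambda r, x: x + r.normal(size=x.shape),
    log_obs_density=lambda x, y: -0.5 * (np.log(2 * np.pi) + (x[:, 0] - y) ** 2),
    N=1000,
)
```

The quantity exp(log_Z) is the unbiased marginal likelihood estimate on which the PMCMC constructions below rely.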
Iterative Lookahead Methods¹: Motivation
- Online algorithms can only perform so well:

  p(x_{1:n} \mid y_{1:T}) = \int p(x_{1:T} \mid y_{1:T})\, dx_{n+1:T} \neq p(x_{1:n} \mid y_{1:n})

- We’d benefit from targeting the smoothing distributions

  \tilde{\pi}_n(x_{1:n}) = p(x_{1:n} \mid y_{1:T}) \propto p(x_{1:n} \mid y_{1:n})\, p(y_{n+1:T} \mid x_n)

  in place of the filtering distributions π_n(x_{1:n}) = p(x_{1:n} | y_{1:n}).
- But this is really hard: can we approximate p(y_{n+1:T} | x_n)?

¹ Joint work with Pieralberto Guarniero and Anthony Lee.

Twisting the HMM (complements Whiteley and Lee [16])
Given (µ, f, g) and y_{1:T}, introducing ψ := (ψ_1, ψ_2, ..., ψ_T), ψ_t ∈ C_b(X, (0, ∞)), and

  \tilde{\psi}_0 := \int_X \mu(x_1)\, \psi_1(x_1)\, dx_1, \qquad \tilde{\psi}_t(x_t) := \int_X f(x_t, x_{t+1})\, \psi_{t+1}(x_{t+1})\, dx_{t+1}

(and ψ̃_T ≡ 1), we obtain (µ_1^ψ, {f_t^ψ}, {g_t^ψ}), with

  \mu_1^\psi(x_1) := \frac{\mu(x_1)\, \psi_1(x_1)}{\tilde{\psi}_0}, \qquad f_t^\psi(x_{t-1}, x_t) := \frac{f(x_{t-1}, x_t)\, \psi_t(x_t)}{\tilde{\psi}_{t-1}(x_{t-1})}

and the sequence of non-negative functions

  g_1^\psi(x_1) := g(x_1, y_1)\, \frac{\tilde{\psi}_1(x_1)\, \tilde{\psi}_0}{\psi_1(x_1)}, \qquad g_t^\psi(x_t) := g(x_t, y_t)\, \frac{\tilde{\psi}_t(x_t)}{\psi_t(x_t)}.

Proposition. For any sequence of bounded, continuous and positive functions ψ, let

  Z_\psi := \int_{X^T} \mu_1^\psi(x_1)\, g_1^\psi(x_1) \prod_{t=2}^T f_t^\psi(x_{t-1}, x_t)\, g_t^\psi(x_t)\, dx_{1:T}.

Then Z_ψ = p_θ(y_{1:T}) for any such ψ. The (variance-)optimal choice is

  \psi_t^*(x_t) := g(x_t, y_t)\, \mathbb{E}\Big[ \prod_{p=t+1}^T g(X_p, y_p) \,\Big|\, X_t = x_t \Big], \quad x_t \in X,

for t ∈ {1, ..., T − 1}. Then Z^N_{ψ*} = p(y_{1:T}) with probability 1.

ψ-Auxiliary Particle Filters (Guarniero et al. [11])
ψ-Auxiliary Particle Filter
1. Sample ξ_1^i ~ µ^ψ independently for i ∈ {1, ..., N}.
2. For t = 2, ..., T, sample independently, for i ∈ {1, ..., N},

  \xi_t^i \sim \frac{\sum_{j=1}^N g_{t-1}^\psi(\xi_{t-1}^j)\, f_t^\psi(\xi_{t-1}^j, \cdot)}{\sum_{j=1}^N g_{t-1}^\psi(\xi_{t-1}^j)}.

Necessary features of ψ:
1. It is possible to sample from f_t^ψ.
2. It is possible to evaluate g_t^ψ.
3. To be useful: V(Z_ψ^N) must be small.

A Recursive Approximation
Proposition. The sequence ψ* satisfies ψ*_T(x_T) = g(x_T, y_T), x_T ∈ X, and

  \psi_t^*(x_t) = g(x_t, y_t)\, f(x_t, \psi_{t+1}^*), \quad x_t \in X, \; t \in \{1, \ldots, T-1\}.

Algorithm 1: Recursive function approximations
For t = T, ..., 1:
1. Set ψ_t^i ← g(ξ_t^i, y_t) f(ξ_t^i, ψ_{t+1}) for i ∈ {1, ..., N}.
2. Choose ψ_t as a member of Ψ on the basis of ξ_t^{1:N} and ψ_t^{1:N}.

Iterated Auxiliary Particle Filters (Guarniero et al. [11])
Algorithm 2: An iterated auxiliary particle filter (parameters: N_0, k, τ)
1. Initialize: set ψ^0_t ≡ 1; l ← 0.
2. Repeat:
   2.1 Run a ψ^l-APF with N_l particles; set Ẑ_l ← Z^{N_l}_{ψ^l}.
   2.2 If l > k and sd(Ẑ_{l−k:l}) / mean(Ẑ_{l−k:l}) < τ, go to 3.
   2.3 Compute ψ^{l+1} using Algorithm 1.
   2.4 If N_{l−k} = N_l and the sequence Ẑ_{l−k:l} is not monotonically increasing, set N_{l+1} ← 2N_l; otherwise, set N_{l+1} ← N_l.
   2.5 Set l ← l + 1. Go to 2.1.
3. Run a ψ^l-APF. Return Ẑ := Z^{N_l}_{ψ^l}.

An Elementary Implementation
Function approximation:
- Numerically obtain

  (m_t^*, \Sigma_t^*, \lambda_t^*) = \arg\min_{(m, \Sigma, \lambda)} \sum_{i=1}^N \big( \mathcal{N}(x_t^i; m, \Sigma) - \lambda \psi_t^i \big)^2

- Set ψ_t(x_t) := N(x_t; m_t^*, Σ_t^*) + c(N, m_t^*, Σ_t^*).
Stopping rule:
- k = 3 or k = 5 in the following examples; τ = 0.5.
Resampling:
- Multinomial, when ESS < N/2.

A Linear Gaussian Model: Behaviour with Dimension
µ = N(·; 0, I_d), f(x, ·) = N(·; Ax, I_d) where A_ij = 0.42^{|i−j|+1}, and g(x, ·) = N(·; x, I_d).
[Figure: box plots of Ẑ/Z for the iAPF and the bootstrap particle filter for dim(X) ∈ {5, 10, 20, 40, 80} (1000 replicates; T = 100).]
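For reference, the exact normalizing constant Z = p(y_{1:T}) appearing in the Ẑ/Z box plots is available in this linear Gaussian model from a Kalman filter; a minimal sketch under the model stated above (identity initial, transition and observation noise covariances), with the example call assuming y is a (T, d) array of observations:

```python
import numpy as np

def lg_log_marginal_likelihood(y, A):
    """Exact log p(y_{1:T}) for x_1 ~ N(0, I), x_t | x_{t-1} ~ N(A x_{t-1}, I),
    y_t | x_t ~ N(x_t, I), computed with a Kalman filter."""
    d = A.shape[0]
    I = np.eye(d)
    m, P = np.zeros(d), I                 # predictive moments for x_1
    log_lik = 0.0
    for y_t in y:
        S = P + I                         # innovation covariance (H = I, R = I)
        r = y_t - m                       # innovation
        _, log_det = np.linalg.slogdet(S)
        log_lik += -0.5 * (d * np.log(2 * np.pi) + log_det + r @ np.linalg.solve(S, r))
        K = P @ np.linalg.inv(S)          # Kalman gain
        m, P = m + K @ r, (I - K) @ P     # update
        m, P = A @ m, A @ P @ A.T + I     # predict the next state
    return log_lik

# usage on data simulated from the same model, with A_ij = 0.42^{|i-j|+1}
d, T = 10, 100
A = 0.42 ** (np.abs(np.subtract.outer(np.arange(d), np.arange(d))) + 1)
rng = np.random.default_rng(0)
x = rng.normal(size=d)
ys = np.empty((T, d))
for t in range(T):
    ys[t] = x + rng.normal(size=d)
    x = A @ x + rng.normal(size=d)
print(lg_log_marginal_likelihood(ys, A))
```

Dividing a particle estimate Ẑ by the exponential of this quantity gives the ratio plotted above.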
Linear Gaussian Model: Sensitivity to Parameters
Fixing d = 10: Bootstrap (N = 50,000) vs iAPF (N_0 = 1,000).
[Figure: box plots of Ẑ/Z for different values of the parameter α ∈ {0.3, 0.34, 0.38, 0.42, 0.46, 0.5}, using 1000 replicates.]

Linear Gaussian Model: PMMH Empirical Autocorrelations
d = 5, µ = N(·; 0, I_d), f(x, ·) = N(·; Ax, I_d) and g(x, ·) = N(·; x, 0.25 I_d).
In this case

  A = \begin{pmatrix} 0.9 & 0 & 0 & 0 & 0 \\ 0.3 & 0.7 & 0 & 0 & 0 \\ 0.1 & 0.2 & 0.6 & 0 & 0 \\ 0.4 & 0.1 & 0.1 & 0.3 & 0 \\ 0.1 & 0.2 & 0.5 & 0.2 & 0 \end{pmatrix},

treated as an unknown lower-triangular matrix.
[Figure: empirical autocorrelation functions (ACF against lag, up to 600) of the PMMH chains for the parameters A11, A41 and A55.]

Stochastic Volatility
- A simple stochastic volatility model is defined by

  µ(·) = N(·; 0, σ²/(1 − α)²), f(x, ·) = N(·; αx, σ²) and g(x, ·) = N(·; 0, β² exp(x)),

  where α ∈ (0, 1), β > 0 and σ² > 0 are unknown.
- Considered T = 945 observations y_{1:T} corresponding to the mean-corrected daily returns for the GBP/USD exchange rate from 1/10/81 to 28/6/85.

[Figure: estimated PMCMC autocorrelation functions for the parameters a (α), σ and β. Bootstrap with N = 1,000; iAPF with N_0 = 100; comparable cost; 150,000 PMCMC iterations.]

[Figure: box plots of estimates (vertical scale roughly −928 to −922) against values of a (0.974–0.994), β (0.61–0.81) and σ (0.12–0.17); Bootstrap with N = 1,000 / N = 10,000 / iAPF with N_0 = 100.]
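The PMMH results above only require an unbiased likelihood estimator. A minimal particle marginal Metropolis–Hastings sketch for the stochastic volatility model, using a bootstrap-filter estimator, a flat prior on the admissible region and a Gaussian random-walk proposal on (α, σ, β); the proposal scales, starting point and particle numbers are illustrative choices, not those used in the talk:

```python
import numpy as np

def sv_bootstrap_loglik(y, alpha, sigma, beta, N, rng):
    """Bootstrap-filter estimate of log p(y_{1:T} | alpha, sigma, beta) for the SV model."""
    x = rng.normal(0.0, sigma / (1 - alpha), size=N)   # mu = N(0, sigma^2/(1-alpha)^2), as on the slide
    log_Z = 0.0
    for y_t in y:
        # log g(y_t | x) for g(x, .) = N(.; 0, beta^2 exp(x))
        log_w = -0.5 * (np.log(2 * np.pi) + 2 * np.log(beta) + x + y_t**2 / (beta**2 * np.exp(x)))
        m = log_w.max()
        w = np.exp(log_w - m)
        log_Z += m + np.log(w.mean())
        idx = rng.choice(N, size=N, p=w / w.sum())     # multinomial resampling
        x = alpha * x[idx] + sigma * rng.normal(size=N)
    return log_Z

def pmmh(y, n_iter=1000, N=1000, seed=0):
    rng = np.random.default_rng(seed)
    theta = np.array([0.95, 0.15, 0.7])                # (alpha, sigma, beta): illustrative starting point
    log_like = sv_bootstrap_loglik(y, *theta, N, rng)
    chain = []
    for _ in range(n_iter):
        prop = theta + rng.normal(scale=[0.01, 0.01, 0.02])   # random-walk proposal (illustrative scales)
        if 0 < prop[0] < 1 and prop[1] > 0 and prop[2] > 0:   # flat prior on the admissible region
            prop_like = sv_bootstrap_loglik(y, *prop, N, rng)
            if np.log(rng.uniform()) < prop_like - log_like:  # accept using the estimated likelihood ratio
                theta, log_like = prop, prop_like
        chain.append(theta)
    return np.array(chain)
```

Calling pmmh with y set to the vector of mean-corrected returns gives the bootstrap version of the chains summarized above; the iAPF comparison replaces sv_bootstrap_loglik with the iterated auxiliary particle filter's estimate at comparable cost.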
Block-Sampling and Tempering in Particle Filters
Tempered transitions (Godsill and Clapp [9]):
- Introduce each likelihood term gradually, targeting

  \pi_{n,m}(x_{1:n}) \propto p(x_{1:n} \mid y_{1:n-1})\, p(y_n \mid x_n)^{\beta_m}

  between p(x_{1:n−1} | y_{1:n−1}) and p(x_{1:n} | y_{1:n}).
- Can improve performance, but only to a limited extent.
Block sampling (Doucet et al. [7]):
- Essentially uses y_{n+1:n+L} in proposing x_{n+1}.
- Can dramatically improve performance,
- but requires a good analytic approximation of p(x_{n+1:n+L} | x_n, y_{n+1:n+L}).

Block Sampling: An Idealised Approach
At time n, given x_{1:n−1}, discard x_{n−L+1:n−1}:
- Sample from q(x_{n−L+1:n} | x_{n−L}, y_{n−L+1:n}).
- Weight with

  W(x_{1:n}) = \frac{p(x_{1:n} \mid y_{1:n})}{p(x_{1:n-L} \mid y_{1:n-1})\, q(x_{n-L+1:n} \mid x_{n-L}, y_{n-L+1:n})}

- Optimally, q(x_{n−L+1:n} | x_{n−L}, y_{n−L+1:n}) = p(x_{n−L+1:n} | x_{n−L}, y_{n−L+1:n}) and

  W(x_{1:n}) \propto \frac{p(x_{1:n-L} \mid y_{1:n})}{p(x_{1:n-L} \mid y_{1:n-1})} \propto p(y_n \mid x_{1:n-L}, y_{n-L+1:n-1})

- Typically intractable; auxiliary variable approach in [7].

Block-Tempering (Johansen [12])
- We could combine blocking and tempering strategies.
- Run a simple SMC sampler (Del Moral et al. [3]) targeting

  \pi^\theta_{t,r}(x_{1:t \wedge T}) = \mu_\theta(x_1)^{\beta^1_{(t,r)}}\, g_\theta(y_1 \mid x_1)^{\gamma^1_{(t,r)}} \prod_{s=2}^{T \wedge t} f_\theta(x_s \mid x_{s-1})^{\beta^s_{(t,r)}}\, g_\theta(y_s \mid x_s)^{\gamma^s_{(t,r)}},    (1)

  where {β^s_{(t,r)}} and {γ^s_{(t,r)}} are [0, 1]-valued, for s ∈ [[1, T]], r ∈ [[1, R]] and t ∈ [[1, T′]], with T′ = T + L for some R, L ∈ ℕ.
- Can be validly embedded within PMCMC:
  - The terminal likelihood estimate is unbiased.
  - An explicit auxiliary variable construction is possible.

Two Simple Block-Tempering Strategies
Tempering both likelihood and transition probabilities:

  \beta^s_{(t,r)} = \gamma^s_{(t,r)} = 1 \wedge \left( \frac{R(t-s) + r}{RL} \vee 0 \right)    (2)

Tempering only the observation density:

  \beta^s_{(t,r)} = \mathbb{I}\{s \le t\}, \qquad \gamma^s_{(t,r)} = 1 \wedge \left( \frac{R(t-s) + r}{RL} \vee 0 \right)    (3)

Tempering Only the Observation Density
[Figure: sketch of the schedules β^s_{(t,r)} and γ^s_{(t,r)} of strategy (3) as functions of s.]

Illustrative Example
- Univariate linear Gaussian SSM: transition f(x′|x) = N(x′; x, 1), likelihood g(y|x) = N(y; x, 1).
- Artificial jump in the observation sequence at time 75.
- A cartoon of model misspecification, a key difficulty with PMCMC.
- Temper only the likelihood.
- Use single-site Metropolis-within-Gibbs (standard normal proposal) MCMC moves.
[Figure: observations y_t against time t, showing the jump at t = 75.]

True Filtering and Smoothing Distributions
[Figure: true filtering and smoothing distributions of x_t against time t for this example.]

Estimating the Normalizing Constant, Ẑ
[Figure: log(estimated Z) against log(cost = N(1 + RL)) for L ∈ {1, 2, 3, 5} and R ∈ {0, 1, 5, 10}.]

Relative Error in Ẑ Against Computational Effort
[Figure: log(relative error in Ẑ) against log(cost = N(1 + RL)) for log(N) ∈ {3, 4, 5, 6}, L ∈ {1, 2, 3, 5} and R ∈ {0, 1, 5, 10}.]
Performance: (R = 1, L = 1) ≺ (R > 1, L = 1) ≺ (R > 1, L > 1).

Hierarchical Particle Filters²
- The (optimal) block sampler requires samples from (approximately) p(x_{n−L+1:n} | x_{n−L}, y_{n−L+1:n})
- and evaluations of

  p(y_n \mid x_{n-L}, y_{n-L+1:n-1}) = \frac{p(y_{n-L+1:n} \mid x_{n-L})}{p(y_{n-L+1:n-1} \mid x_{n-L})}.

- Particle filters can provide sample approximations of p(x_{n−L+1:n} | x_{n−L}, y_{n−L+1:n})
- and of p(y_{n−L+1:n} | x_{n−L}).
- Can we use particle filters hierarchically?

² Joint work with Arnaud Doucet.

SMC Distributions
Formally, this gives rise to the SMC distribution

  \psi^M_{n,L}(a_{n-L+2:n}, x_{n-L+1:n}, k; x_{n-L})
    = \Big[ \prod_{i=1}^M q(x^i_{n-L+1} \mid x_{n-L}) \Big]
      \Big[ \prod_{p=n-L+2}^{n} \prod_{i=1}^M r(a^i_p \mid w_{p-1})\, q(x^i_p \mid x^{a^i_p}_{p-1}) \Big]\, r(k \mid w_n)

and the conditional SMC distribution

  \tilde{\psi}^M_{n,L}\big(\tilde{a}_{n-L+2:n}, \tilde{x}_{n-L+1:n}, k; x_{n-L} \,\big|\, b_{n-L+1:n-1}, k, \tilde{x}^k_{n-L+1:n}\big)
    = \frac{\psi^M_{n,L}(\tilde{a}_{n-L+2:n}, \tilde{x}_{n-L+1:n}, k; x_{n-L})}
           {q\big(\tilde{x}^{b^k_{n,n-L+1}}_{n-L+1} \mid x_{n-L}\big)
            \prod_{p=n-L+2}^{n} r\big(b^k_{n,p} \mid \tilde{w}_{p-1}\big)\,
            q\big(\tilde{x}^{b^k_{n,p}}_{p} \mid \tilde{x}^{b^k_{n,p-1}}_{p-1}\big)\; r(k \mid \tilde{w}_n)}

Local Particle Filtering
[Figures: local particle filtering illustrated on trajectory plots over n = 0, ..., 16: the current trajectories (starting point), the PF proposal step and the CPF auxiliary proposal step.]

Local SMC: Version 1
- Not just a random-weight particle filter.
- Propose from

  U^{\otimes n-1}_{1:M}(b_{1:n-2}, k)\; p(x_{1:n-1} \mid y_{1:n-1})\;
  \psi^M_{n,L}(a_{n-L+2:n}, x_{n-L+1:n}, k; x_{n-L})\;
  \tilde{\psi}^M_{n-1,L-1}\big(\tilde{a}_{n-L+2:n-1}, \tilde{x}_{n-L+1:n-1}; x_{n-L} \,\big\|\, b^k_{n-L+2:n-1}, x^k_{n-L+1:n-1}\big)

- Target

  U^{\otimes n}_{1:M}(b_{1:n-L}, \bar{b}_{n,n-L+1:n-1}, \bar{k})\;
  p(x_{1:n-L}, \bar{x}^{\bar{k}}_{n,n-L+1:n} \mid y_{1:n})\;
  \tilde{\psi}^M_{n,L}\big(a^k_{n-L+2:n}, x_{n-L+1:n}; x_{n-L} \,\big\|\, \bar{b}^{\bar{k}}_{n,n-L+1:n}, \bar{x}^{\bar{k}}_{n,n-L+1:n}\big)\;
  \psi^M_{n-1,L-1}(\tilde{a}_{n-L+2:n-1}, \tilde{x}_{n-L+1:n-1}, k; x_{n-L}).

- Weight: \bar{Z}_{n-L+1:n} / \tilde{Z}_{n-L+1:n-1}.

Local SMC: Version 2
Problems with this PF+CPF scheme:
- Expensive to run two filters per proposal,
- and a large M is required;
- can’t we do better?
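Both versions rely on a conditional particle filter (CPF): a particle filter in which one reference trajectory is forced to survive every resampling step, matching the conditional SMC distribution above. A minimal bootstrap-style sketch over a block of observations, conditioned on a fixed starting state; the callable names, multinomial resampling and the toy-model usage are illustrative assumptions, not code from the talk:

```python
import numpy as np

def conditional_pf(y_block, x_start, x_ref, sample_transition, log_obs_density, N, rng):
    """Bootstrap-style conditional particle filter over a block of observations.

    x_start : (d,) fixed state the block is conditioned on (x_{n-L} in the slides)
    x_ref   : (L, d) reference trajectory, kept as particle 0 at every resampling step
    Returns a trajectory drawn from the final weighted particles and the block estimate log Z.
    """
    L, _ = x_ref.shape
    paths = np.empty((N, L, x_ref.shape[1]))
    paths[:, 0] = sample_transition(rng, np.tile(x_start, (N, 1)))  # propagate from x_start
    paths[0, 0] = x_ref[0]                                          # ...but keep the reference state
    log_Z = 0.0
    log_w = log_obs_density(paths[:, 0], y_block[0])
    for t in range(1, L):
        m = log_w.max()
        w = np.exp(log_w - m)
        log_Z += m + np.log(w.mean())
        a = rng.choice(N, size=N, p=w / w.sum())
        a[0] = 0                                      # the reference keeps its own ancestor
        paths = paths[a]
        paths[:, t] = sample_transition(rng, paths[:, t - 1])
        paths[0, t] = x_ref[t]                        # ...and its own state
        log_w = log_obs_density(paths[:, t], y_block[t])
    m = log_w.max()
    w = np.exp(log_w - m)
    log_Z += m + np.log(w.mean())
    k = rng.choice(N, p=w / w.sum())                  # index k drawn from the final weights
    return paths[k], log_Z

# usage with the toy model f(x'|x) = N(x'; x, 1), g(y|x) = N(y; x, 1)
rng = np.random.default_rng(0)
path, logZ = conditional_pf(
    np.array([0.3, -0.1, 0.4]), np.zeros(1), np.zeros((3, 1)),
    lambda r, x: x + r.normal(size=x.shape),
    lambda x, y: -0.5 * (np.log(2 * np.pi) + (x[:, 0] - y) ** 2),
    N=64, rng=rng)
```

Removing the three lines that force particle 0 to follow x_ref recovers the corresponding unconditional local particle filter used for the PF proposal step.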
Using a non-standard CPF/PF proposal is preferable:
- running a CPF from time n − L + 1 to n,
- and extending its paths with a PF step to n + 1,
- invoking a careful auxiliary variable construction,
- we reduce computational cost and variance.
- This leads to importance weights

  \propto \frac{\hat{p}\big(y_{n-L+1:n} \mid x^{b^k_{n-1,n-L}}_{n-L}\big)}{\hat{p}\big(y_{n-L+1:n-1} \mid x^{b^k_{n-1,n-L}}_{n-L}\big)}

Hierarchical Particle Filtering
[Figures: hierarchical particle filtering illustrated on trajectory plots over n = 0, ..., 16: the current trajectories (starting point), the CPF proposal component and the PF proposal component.]

A Toy Model: Linear Gaussian HMM
- Linear, Gaussian state transition f(x_t | x_{t−1}) = N(x_t; x_{t−1}, 1)
- and likelihood g(y_t | x_t) = N(y_t; x_t, 1).
- Analytically tractable: Kalman filter/smoother/etc.
- Simple bootstrap HPF:
  - Local proposal: q(x_t | x_{t−1}, y_t) = f(x_t | x_{t−1}).
  - Weighting: W(x_{t−1}, x_t) ∝ g(y_t | x_t).
[Figures: rZ, and log(rZ) against computational cost, for the hierarchical particle filter over block lengths L ∈ {1, 2, 4, ..., 1024}, a range of local particle numbers M, and N ∈ {100, 1000, 10000, 1e+05}, with the bootstrap particle filter shown for comparison.]
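In the toy model, the two quantities entering the importance weight above, p̂(y_{n−L+1:n} | x_{n−L}) and p̂(y_{n−L+1:n−1} | x_{n−L}), are local-filter likelihood estimates. A minimal sketch of one such estimate for the toy model; the helper name and the numerical values are illustrative, and in the scheme above the two estimates are produced jointly by the CPF/PF construction rather than by the two independent runs shown here:

```python
import numpy as np

def local_block_loglik(y_block, x_start, M, rng):
    """Bootstrap-filter estimate of log p_hat(y_{n-L+1:n} | x_{n-L}) for the toy
    linear Gaussian model f(x'|x) = N(x'; x, 1), g(y|x) = N(y; x, 1)."""
    x = x_start + rng.normal(size=M)              # local proposal q(x_t | x_{t-1}) = f
    log_Z = 0.0
    for t, y_t in enumerate(y_block):
        log_w = -0.5 * (np.log(2 * np.pi) + (y_t - x) ** 2)   # weight proportional to g(y_t | x_t)
        m = log_w.max()
        w = np.exp(log_w - m)
        log_Z += m + np.log(w.mean())
        if t + 1 < len(y_block):
            x = x[rng.choice(M, size=M, p=w / w.sum())] + rng.normal(size=M)
    return log_Z

# form of the hierarchical importance weight for one trajectory (illustrative values)
rng = np.random.default_rng(0)
y_block = np.array([0.3, -0.1, 0.4, 0.2])         # y_{n-L+1:n}, L = 4
x_prev = 0.0                                      # the conditioning state x_{n-L}
log_weight = (local_block_loglik(y_block, x_prev, 128, rng)
              - local_block_loglik(y_block[:-1], x_prev, 128, rng))
```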
Conclusions
- To fully realise the potential of PMCMC we should exploit its flexibility.
- Even very simple variants on the standard particle filter can significantly improve performance.
- Three particular possibilities were discussed here:
  - iAPF:
    - The iAPF can improve performance substantially in some settings.
    - Extending the extent of its applicability is ongoing work.
    - In principle any function approximation scheme can be employed, provided that f_t^ψ can be sampled from and g_t^ψ evaluated.
  - Block-tempering:
    - Can significantly improve performance with misspecified models.
    - Requires MCMC kernels and introduces two tuning parameters.
  - Hierarchical particle filters:
    - Dramatically reduce the need for resampling.
    - Computationally rather costly.

References
[1] C. Andrieu and G. O. Roberts. The pseudo-marginal approach for efficient Monte Carlo computations. Annals of Statistics, 37(2):697–725, 2009.
[2] C. Andrieu, A. Doucet, and R. Holenstein. Particle Markov chain Monte Carlo. Journal of the Royal Statistical Society B, 72(3):269–342, 2010.
[3] P. Del Moral, A. Doucet, and A. Jasra. Sequential Monte Carlo samplers. Journal of the Royal Statistical Society B, 68(3):411–436, 2006.
[4] P. Del Moral, A. Doucet, and A. Jasra. On adaptive resampling procedures for sequential Monte Carlo methods. Bernoulli, 18(1), 2012. URL http://hal.inria.fr/inria-00332436/PDF/RR-6700.pdf.
[5] R. Douc and O. Cappé. Comparison of resampling schemes for particle filters. In Proceedings of the 4th International Symposium on Image and Signal Processing and Analysis, volume I, pages 64–69. IEEE, 2005.
[6] A. Doucet, S. Godsill, and C. Andrieu. On sequential simulation-based methods for Bayesian filtering. Statistics and Computing, 10(3):197–208, 2000.
[7] A. Doucet, M. Briers, and S. Sénécal. Efficient block sampling strategies for sequential Monte Carlo methods. Journal of Computational and Graphical Statistics, 15(3):693–711, 2006.
[8] W. R. Gilks and C. Berzuini. Following a moving target – Monte Carlo inference for dynamic Bayesian models. Journal of the Royal Statistical Society B, 63:127–146, 2001.
[9] S. Godsill and T. Clapp. Improvement strategies for Monte Carlo particle filters. In A. Doucet, N. de Freitas, and N. Gordon, editors, Sequential Monte Carlo Methods in Practice, Statistics for Engineering and Information Science, pages 139–158. Springer Verlag, New York, 2001.
[10] N. J. Gordon, D. J. Salmond, and A. F. M. Smith. Novel approach to nonlinear/non-Gaussian Bayesian state estimation. IEE Proceedings-F, 140(2):107–113, April 1993.
[11] P. Guarniero, A. M. Johansen, and A. Lee. The iterated auxiliary particle filter. ArXiv e-print 1511.06286, 2015.
[12] A. M. Johansen. On blocks, tempering and particle MCMC for systems identification. In Y. Zhao, editor, Proceedings of the 17th IFAC Symposium on System Identification, pages 969–974, Beijing, China, 2015. IFAC. Invited submission.
[13] A. M. Johansen and A. Doucet. A note on the auxiliary particle filter. Statistics and Probability Letters, 78(12):1498–1504, September 2008. URL http://dx.doi.org/10.1016/j.spl.2008.01.032.
[14] J. Owen, D. J. Wilkinson, and C. S. Gillespie. Likelihood free inference for Markov processes: a comparison. Statistical Applications in Genetics and Molecular Biology, 2015. In press.
[15] M. K. Pitt and N. Shephard. Filtering via simulation: Auxiliary particle filters. Journal of the American Statistical Association, 94(446):590–599, 1999.
[16] N. Whiteley and A. Lee. Twisted particle filters. Annals of Statistics, 42(1):115–141, 2014.