On "Particle Filters" for Parameter Estimation
Adam M. Johansen, University of Warwick
a.m.johansen@warwick.ac.uk
www2.warwick.ac.uk/fac/sci/statistics/staff/academic/johansen/talks/
Christmas Workshop on SMC, Imperial College, December 22nd, 2015

Outline
- Background: SMC and PMCMC
- Iterative Lookahead Methods
- Tempering Blocks in SMC
- Hierarchical Particle Filters
- Conclusions

Discrete Time Filtering
Online inference for hidden Markov models.

[Figure: HMM graphical model with latent states x_1, ..., x_6 and observations y_1, ..., y_6.]

- Given the transition density f_θ(x_{n−1}, x_n)
- and the likelihood g_θ(x_n, y_n),
- use p_θ(x_n | y_{1:n}) to characterize the latent state. But the filtering recursion

  p_θ(x_n | y_{1:n}) = [ g_θ(x_n, y_n) ∫ p_θ(x_{n−1} | y_{1:n−1}) f_θ(x_{n−1}, x_n) dx_{n−1} ] / [ ∫ g_θ(x′_n, y_n) ∫ p_θ(x_{n−1} | y_{1:n−1}) f_θ(x_{n−1}, x′_n) dx_{n−1} dx′_n ]

- is rarely tractable.

Particle Filtering
A (sequential) Monte Carlo (SMC) scheme to approximate the filtering distributions.

A Simple Particle Filter [6]
At n = 1:
- Sample X_1^1, ..., X_1^N ∼ µ_θ.
For n > 1:
- Sample

  X_n^1, ..., X_n^N ∼ Σ_{j=1}^N g_θ(X_{n−1}^j, y_{n−1}) f_θ(X_{n−1}^j, ·) / Σ_{k=1}^N g_θ(X_{n−1}^k, y_{n−1}).

- Approximate p_θ(dx_n | y_{1:n}) and p_θ(y_{1:n}) with

  p̂_θ(· | y_{1:n}) = Σ_{j=1}^N [ g_θ(X_n^j, y_n) / Σ_{k=1}^N g_θ(X_n^k, y_n) ] δ_{X_n^j},
  p̂_θ(y_{1:n}) / p̂_θ(y_{1:n−1}) = (1/N) Σ_{k=1}^N g_θ(X_n^k, y_n).
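As a concrete illustration of this simple (bootstrap) particle filter, here is a minimal Python sketch for a generic state-space model supplied through sampling and density functions. The function name bootstrap_filter, its arguments and the univariate linear Gaussian example at the bottom are illustrative choices, not part of the original slides.

```python
import numpy as np

def bootstrap_filter(y, mu_sample, f_sample, g_density, N=1000, rng=None):
    """Simple particle filter of [6]: resample proportionally to the previous
    weights, propagate through f_theta, and accumulate the estimate of
    p_theta(y_{1:n}) as a running product of average weights."""
    rng = np.random.default_rng() if rng is None else rng
    T = len(y)
    x = mu_sample(N, rng)                     # X_1^{1:N} ~ mu_theta
    log_Z = 0.0
    for n in range(T):
        w = g_density(x, y[n])                # g_theta(X_n^j, y_n)
        log_Z += np.log(np.mean(w))           # log p_hat(y_{1:n}) so far
        if n + 1 < T:
            idx = rng.choice(N, size=N, p=w / w.sum())   # multinomial resampling
            x = f_sample(x[idx], rng)         # X_{n+1}^i ~ f_theta(X_n^{a_i}, .)
    return x, log_Z

# Illustrative univariate linear Gaussian model: mu = N(0,1),
# f(x,.) = N(.; x, 1), g(x,.) = N(.; x, 1).
if __name__ == "__main__":
    rng = np.random.default_rng(1)
    T = 100
    xs = np.cumsum(rng.normal(size=T))
    ys = xs + rng.normal(size=T)
    _, log_Z = bootstrap_filter(
        ys,
        mu_sample=lambda N, r: r.normal(size=N),
        f_sample=lambda x, r: x + r.normal(size=x.shape),
        g_density=lambda x, y: np.exp(-0.5 * (y - x) ** 2) / np.sqrt(2 * np.pi),
        N=1000, rng=rng,
    )
    print("log p_hat(y_{1:T}) =", log_Z)
```

The running product of average weights is exactly the unbiased estimate p̂_θ(y_{1:n}) described above, and it is this estimate that the PMCMC constructions which follow rely on.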
Online Particle Filters for Offline Systems Identification
Particle Markov chain Monte Carlo (PMCMC) [2]
- Embed SMC within MCMC,
- justified via an explicit auxiliary variable construction,
- or in some cases by a pseudo-marginal [1] argument.
- Very widely applicable,
- but prone to poor mixing when SMC performs poorly for some θ [9, Section 4.2.1].
- Valid for very general SMC algorithms.

Twisting the HMM (a complement to [10])
Given (µ, f, g) and y_{1:T}, introduce ψ := (ψ_1, ψ_2, ..., ψ_T), ψ_t ∈ C_b(X, (0, ∞)), and

  ψ̃_0 := ∫_X µ(x_1) ψ_1(x_1) dx_1,   ψ̃_t(x_t) := ∫_X f(x_t, x_{t+1}) ψ_{t+1}(x_{t+1}) dx_{t+1}.

We obtain (µ^ψ, {f_t^ψ}, {g_t^ψ}), with

  µ^ψ(x_1) := µ(x_1) ψ_1(x_1) / ψ̃_0,   f_t^ψ(x_{t−1}, x_t) := f(x_{t−1}, x_t) ψ_t(x_t) / ψ̃_{t−1}(x_{t−1}),

and the sequence of non-negative functions

  g_1^ψ(x_1) := g(x_1, y_1) ψ̃_1(x_1) ψ̃_0 / ψ_1(x_1),   g_t^ψ(x_t) := g(x_t, y_t) ψ̃_t(x_t) / ψ_t(x_t).

Proposition. For any sequence of bounded, continuous and positive functions ψ, let

  Z_ψ := ∫_{X^T} µ^ψ(x_1) g_1^ψ(x_1) ∏_{t=2}^T f_t^ψ(x_{t−1}, x_t) g_t^ψ(x_t) dx_{1:T}.

Then Z_ψ = p_θ(y_{1:T}) for any such ψ.

The optimal choice is

  ψ_t^*(x_t) := g(x_t, y_t) E[ ∏_{p=t+1}^T g(X_p, y_p) | X_t = x_t ],   x_t ∈ X,

for t ∈ {1, ..., T − 1}. Then Ẑ_{ψ*}^N = p(y_{1:T}) with probability 1.

Towards Iterated Auxiliary Particle Filters [7]
ψ-Auxiliary Particle Filter
1. Sample ξ_1^i ∼ µ^ψ independently for i ∈ {1, ..., N}.
2. For t = 2, ..., T, sample independently

  ξ_t^i ∼ Σ_{j=1}^N g_{t−1}^ψ(ξ_{t−1}^j) f_t^ψ(ξ_{t−1}^j, ·) / Σ_{j=1}^N g_{t−1}^ψ(ξ_{t−1}^j),   i ∈ {1, ..., N}.

Necessary features of ψ
1. It is possible to sample from f_t^ψ.
2. It is possible to evaluate g_t^ψ.
3. To be useful: V(Ẑ_ψ^N) must be small.

A Recursive Approximation
Proposition. The sequence ψ* satisfies ψ_T^*(x_T) = g(x_T, y_T), x_T ∈ X, and

  ψ_t^*(x_t) = g(x_t, y_t) f(x_t, ψ_{t+1}^*),   x_t ∈ X, t ∈ {1, ..., T − 1},

where f(x_t, ψ) := ∫ f(x_t, x_{t+1}) ψ(x_{t+1}) dx_{t+1}.

Algorithm 1: Recursive function approximations
For t = T, ..., 1:
1. Set ψ_t^i ← g(ξ_t^i, y_t) f(ξ_t^i, ψ_{t+1}) for i ∈ {1, ..., N}.
2. Choose ψ_t as a member of Ψ on the basis of ξ_t^{1:N} and ψ_t^{1:N}.

Iterated Auxiliary Particle Filters
Algorithm 2: An iterated auxiliary particle filter with parameters (N_0, k, τ)
1. Initialize: set ψ^0 ≡ 1, l ← 0.
2. Repeat:
   2.1 Run a ψ^l-APF with N_l particles; set Ẑ_l ← Ẑ_{ψ^l}^{N_l}.
   2.2 If l > k and sd(Ẑ_{l−k:l}) / mean(Ẑ_{l−k:l}) < τ, go to 3.
   2.3 Compute ψ^{l+1} using Algorithm 1.
   2.4 If N_{l−k} = N_l and the sequence Ẑ_{l−k:l} is not monotonically increasing, set N_{l+1} ← 2N_l; otherwise set N_{l+1} ← N_l.
   2.5 Set l ← l + 1. Go to 2.1.
3. Run a ψ^l-APF. Return Ẑ := Ẑ_{ψ^l}^{N_l}.

An Elementary Implementation
Function approximation
- Numerically obtain (m_t^*, Σ_t^*, λ_t^*) = argmin_{(m,Σ,λ)} Σ_{i=1}^N ( N(x_t^i; m, Σ) − λ ψ_t^i )².
- Set ψ_t(x_t) := N(x_t; m_t^*, Σ_t^*) + c(N, m_t^*, Σ_t^*).
Stopping rule
- k = 3 or k = 5 in the following examples; τ = 0.5.
Resampling
- Multinomial, whenever ESS < N/2.
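To make the twisted quantities µ^ψ, f_t^ψ and g_t^ψ concrete, the sketch below runs a single ψ-APF pass for a univariate linear Gaussian model with µ = N(0, 1), f(x, ·) = N(·; ax, σ²) and g(x, y) = N(y; x, τ²), taking each ψ_t to be a Gaussian density N(·; m_t, s_t) (the form produced by the elementary implementation above, up to the constant c). The fitting step of Algorithm 1 is omitted: the m_t and s_t are assumed given, and all names are illustrative, not the authors' code.

```python
import numpy as np
from scipy.stats import norm

def psi_apf(y, a, sigma2, tau2, m, s, N=1000, rng=None):
    """One psi-APF pass for mu = N(0,1), f(x,.) = N(.; a x, sigma2),
    g(x,y) = N(y; x, tau2), with psi_t(x) = N(x; m[t], s[t]) (s = variances).
    m and s are length-T arrays; returns the estimate of p(y_{1:T})."""
    rng = np.random.default_rng() if rng is None else rng
    T = len(y)

    def psi_tilde(t, x):
        # psi_tilde_t(x) = int f(x, x') psi_{t+1}(x') dx' = N(m_{t+1}; a x, sigma2 + s_{t+1});
        # by convention psi_tilde_T = 1 (there is no psi_{T+1}).
        if t == T - 1:
            return np.ones_like(x)
        return norm.pdf(m[t + 1], loc=a * x, scale=np.sqrt(sigma2 + s[t + 1]))

    # mu^psi is proportional to N(x; 0, 1) * N(x; m_1, s_1): a Gaussian.
    v0 = 1.0 / (1.0 + 1.0 / s[0])
    xi = rng.normal(v0 * m[0] / s[0], np.sqrt(v0), size=N)
    psi_tilde0 = norm.pdf(m[0], loc=0.0, scale=np.sqrt(1.0 + s[0]))

    Z_hat = 1.0
    for t in range(T):
        # g_t^psi(x) = g(x, y_t) psi_tilde_t(x) / psi_t(x),
        # with the extra factor psi_tilde_0 at t = 1.
        g_psi = (norm.pdf(y[t], loc=xi, scale=np.sqrt(tau2))
                 * psi_tilde(t, xi)
                 / norm.pdf(xi, loc=m[t], scale=np.sqrt(s[t])))
        if t == 0:
            g_psi *= psi_tilde0
        Z_hat *= np.mean(g_psi)   # usual SMC normalizing-constant estimate
        if t + 1 < T:
            # Resample according to g_t^psi, then propagate from
            # f_{t+1}^psi(x, .) proportional to N(.; a x, sigma2) * N(.; m_{t+1}, s_{t+1}).
            idx = rng.choice(N, size=N, p=g_psi / g_psi.sum())
            v = 1.0 / (1.0 / sigma2 + 1.0 / s[t + 1])
            mean = v * (a * xi[idx] / sigma2 + m[t + 1] / s[t + 1])
            xi = rng.normal(mean, np.sqrt(v), size=N)
    return Z_hat
```

With well-chosen (m_t, s_t), as produced by the recursion of Algorithm 1, the variance of Ẑ can be far smaller than that of the bootstrap estimate at comparable cost, which is what the iteration in Algorithm 2 is designed to achieve.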
A Linear Gaussian Model: Behaviour with Dimension
µ = N(·; 0, I_d), f(x, ·) = N(·; Ax, I_d) where A_ij = 0.42^{|i−j|+1}, and g(x, ·) = N(·; x, I_d).

[Figure: box plots of Ẑ/Z for the iAPF and the bootstrap particle filter for d = |X| ∈ {5, 10, 20, 40, 80} (1000 replicates; T = 100).]

Linear Gaussian Model: Sensitivity to Parameters
Fixing d = 10: bootstrap (N = 50,000) versus iAPF (N_0 = 1,000).

[Figure: box plots of Ẑ/Z for values of the parameter α ∈ {0.3, 0.34, 0.38, 0.42, 0.46, 0.5}, using 1000 replicates.]

Linear Gaussian Model: PMMH Empirical Autocorrelations
d = 5, µ = N(·; 0, I_d), f(x, ·) = N(·; Ax, I_d) and g(x, ·) = N(·; x, 0.25 I_d). In this case

  A = [ 0.9  0    0    0    0
        0.3  0.7  0    0    0
        0.1  0.2  0.6  0    0
        0.4  0.1  0.1  0.3  0
        0.1  0.2  0.5  0.2  0 ],

treated as an unknown lower triangular matrix.

[Figure: empirical autocorrelations of the PMMH chains for A_11, A_41 and A_55.]

Stochastic Volatility
- A simple stochastic volatility model is defined by
    µ(·) = N(·; 0, σ²/(1 − α²)),   f(x, ·) = N(·; αx, σ²),   g(x, ·) = N(·; 0, β² exp(x)),
  where α ∈ (0, 1), β > 0 and σ² > 0 are unknown.
- We consider T = 945 observations y_{1:T}, the mean-corrected daily returns on the GBP/USD exchange rate from 1/10/81 to 28/6/85.

[Figure: estimated PMCMC autocorrelations for α, σ and β. Bootstrap PF with N = 1,000 and iAPF with N_0 = 100, at comparable cost; 150,000 PMCMC iterations.]

[Figure: box plots of log-likelihood estimates over grids of values of α (0.974 to 0.994), β (0.61 to 0.81) and σ (0.12 to 0.17), comparing the bootstrap PF with N = 1,000 and N = 10,000 against the iAPF with N_0 = 100.]

Block-Sampling and Tempering in Particle Filters
Tempered Transitions [5]
- Introduce each likelihood term gradually, targeting
    π_{n,m}(x_{1:n}) ∝ p(x_{1:n} | y_{1:n−1}) p(y_n | x_n)^{β_m}
  between p(x_{1:n−1} | y_{1:n−1}) and p(x_{1:n} | y_{1:n}).
- Can improve performance, but only to a limited extent.
Block Sampling [4]
- Essentially uses y_{n:n+L} in proposing x_n.
- Can dramatically improve performance,
- but requires a good analytic approximation of p(x_{n:n+L} | x_{n−1}, y_{n:n+L}).
Block-Tempering [8]
- We could combine the blocking and tempering strategies.
- Run a simple SMC sampler [3] targeting

    π_{t,r}^θ(x_{1:t∧T}) = µ_θ(x_1)^{β^1_{(t,r)}} g_θ(y_1 | x_1)^{γ^1_{(t,r)}} ∏_{s=2}^{T∧t} f_θ(x_s | x_{s−1})^{β^s_{(t,r)}} g_θ(y_s | x_s)^{γ^s_{(t,r)}},   (1)

  where {β^s_{(t,r)}} and {γ^s_{(t,r)}} are [0, 1]-valued, for s ∈ [[1, T]], r ∈ [[1, R]] and t ∈ [[1, T′]], with T′ = T + L for some R, L ∈ ℕ.
- Can be validly embedded within PMCMC:
  - The terminal likelihood estimate is unbiased.
  - An explicit auxiliary variable construction is possible.

Two Simple Block-Tempering Strategies
Tempering both likelihood and transition probabilities:

  β^s_{(t,r)} = γ^s_{(t,r)} = 1 ∧ [ (R(t − s) + r) / (RL) ] ∨ 0.   (2)

Tempering only the observation density:

  β^s_{(t,r)} = I{s ≤ t},   γ^s_{(t,r)} = 1 ∧ [ (R(t − s) + r) / (RL) ] ∨ 0.   (3)

[Figure: tempering only the observation density. β^s_{(t,R)} = β^s_{(t,0)} is a step function equal to 1 for s ≤ t, while γ^s_{(t,·)} ramps from 0 to 1 over s ∈ [t − L + 1, t].]
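The two schedules in (2) and (3) are straightforward to write down in code. The sketch below (function names illustrative) returns the exponents attached to f_θ(x_s | x_{s−1}) and g_θ(y_s | x_s) at SMC step (t, r).

```python
import numpy as np

def beta_gamma_full(s, t, r, R, L):
    """Strategy (2): temper both transition and observation densities.
    Both exponents equal clamp((R(t - s) + r) / (R L), 0, 1)."""
    val = float(np.clip((R * (t - s) + r) / (R * L), 0.0, 1.0))
    return val, val

def beta_gamma_obs_only(s, t, r, R, L):
    """Strategy (3): the transition density enters fully once s <= t;
    only the observation density g(y_s | x_s) is tempered."""
    beta = 1.0 if s <= t else 0.0
    gamma = float(np.clip((R * (t - s) + r) / (R * L), 0.0, 1.0))
    return beta, gamma
```

In both cases each observation y_s is introduced gradually over the L SMC times t = s, ..., s + L − 1, in R tempering steps per time, reaching exponent 1 at (t, r) = (s + L − 1, R).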
Illustrative Example
- Univariate linear Gaussian SSM: transition f(x′ | x) = N(x′; x, 1), likelihood g(y | x) = N(y; x, 1).
- Artificial jump in the observation sequence at time 75.
- A cartoon of model misspecification, a key difficulty with PMCMC.
- Temper only the likelihood.
- Use single-site Metropolis-within-Gibbs (standard normal proposal) MCMC moves.

[Figure: observations y_t for t = 1, ..., 100, showing the jump at time 75.]

True Filtering and Smoothing Distributions
[Figure: true filtering and smoothing distributions for the illustrative example, plotted against time t = 1, ..., 100.]

Estimating the Normalizing Constant, Ẑ
[Figure: log(estimated Z) against log(cost = N(1 + RL)) for L ∈ {1, 2, 3, 5} and R ∈ {0, 1, 5, 10}.]

Relative Error in Ẑ Against Computational Effort
[Figure: log(relative error in Ẑ) against log(cost = N(1 + RL)) for L ∈ {1, 2, 3, 5}, R ∈ {0, 1, 5, 10} and log(N) ∈ {3, 4, 5, 6}.]

Performance: (R = 1, L = 1) ≺ (R > 1, L = 1) ≺ (R > 1, L > 1).

Conclusions
- To fully realise the potential of PMCMC we should exploit its flexibility.
- Even very simple variants on the standard particle filter can significantly improve performance.
- The iAPF can improve performance substantially in some settings.
- Extending its range of applicability is ongoing work.
- In principle any function approximation scheme can be employed, provided that f_t^ψ can be sampled from and g_t^ψ evaluated.
- Other [standard and less standard] ideas, including blocking and tempering, can also be readily employed.

References
[1] C. Andrieu and G. O. Roberts. The pseudo-marginal approach for efficient Monte Carlo computations. Annals of Statistics, 37(2):697–725, 2009.
[2] C. Andrieu, A. Doucet, and R. Holenstein. Particle Markov chain Monte Carlo. Journal of the Royal Statistical Society B, 72(3):269–342, 2010.
[3] P. Del Moral, A. Doucet, and A. Jasra. Sequential Monte Carlo samplers. Journal of the Royal Statistical Society B, 68(3):411–436, 2006.
[4] A. Doucet, M. Briers, and S. Sénécal. Efficient block sampling strategies for sequential Monte Carlo methods. Journal of Computational and Graphical Statistics, 15(3):693–711, 2006.
[5] S. Godsill and T. Clapp. Improvement strategies for Monte Carlo particle filters. In A. Doucet, N. de Freitas, and N. Gordon, editors, Sequential Monte Carlo Methods in Practice, Statistics for Engineering and Information Science, pages 139–158. Springer Verlag, New York, 2001.
[6] N. J. Gordon, D. J. Salmond, and A. F. M. Smith. Novel approach to nonlinear/non-Gaussian Bayesian state estimation. IEE Proceedings-F, 140(2):107–113, April 1993.
[7] P. Guarniero, A. M. Johansen, and A. Lee. The iterated auxiliary particle filter. arXiv e-print 1511.06286, 2015.
[8] A. M. Johansen. On blocks, tempering and particle MCMC for systems identification. In Proceedings of the 17th IFAC Symposium on System Identification, Beijing, China, 2015. IFAC.
[9] J. Owen, D. J. Wilkinson, and C. S. Gillespie. Likelihood free inference for Markov processes: a comparison. Statistical Applications in Genetics and Molecular Biology, 2015. In press.
[10] N. Whiteley and A. Lee. Twisted particle filters. Annals of Statistics, 42(1):115–141, 2014.