On “Particle Filters” for Parameter Estimation
Adam M. Johansen
University of Warwick
a.m.johansen@warwick.ac.uk
www2.warwick.ac.uk/fac/sci/statistics/staff/academic/johansen/talks/
Christmas Workshop on SMC
Imperial College, December 22nd, 2015
Outline
- Background: SMC and PMCMC
- Iterative Lookahead Methods
- Tempering Blocks in SMC
- Hierarchical Particle Filters
- Conclusions
Discrete Time Filtering
Online inference for Hidden Markov Models:
[Graphical model: latent Markov chain x1 → x2 → ⋯ → x6 with observations y1, …, y6.]
- Given transition fθ(x_{n−1}, x_n),
- and likelihood gθ(x_n, y_n),
- use pθ(x_n | y_{1:n}) to characterize the latent state, but

  p_\theta(x_n \mid y_{1:n}) = \frac{\int p_\theta(x_{n-1} \mid y_{1:n-1})\, f_\theta(x_{n-1}, x_n)\, dx_{n-1}\; g_\theta(x_n, y_n)}{\int\!\int p_\theta(x_{n-1} \mid y_{1:n-1})\, f_\theta(x_{n-1}, x_n')\, dx_{n-1}\; g_\theta(x_n', y_n)\, dx_n'}

  isn't often tractable.
Particle Filtering
A (sequential) Monte Carlo (SMC) scheme to approximate the
filtering distributions.
A Simple Particle Filter [6]
At n = 1:
- Sample X_1^1, …, X_1^N ∼ μθ.
For n > 1:
- Sample

  X_n^1, \ldots, X_n^N \sim \frac{\sum_{j=1}^{N} g_\theta(X_{n-1}^j, y_{n-1})\, f_\theta(X_{n-1}^j, \cdot)}{\sum_{k=1}^{N} g_\theta(X_{n-1}^k, y_{n-1})}

- Approximate pθ(dx_n | y_{1:n}) and pθ(y_{1:n}) with

  \hat{p}_\theta(\cdot \mid y_{1:n}) = \frac{\sum_{j=1}^{N} g_\theta(X_n^j, y_n)\, \delta_{X_n^j}}{\sum_{k=1}^{N} g_\theta(X_n^k, y_n)}, \qquad
  \frac{\hat{p}_\theta(y_{1:n})}{\hat{p}_\theta(y_{1:n-1})} = \frac{1}{N} \sum_{k=1}^{N} g_\theta(X_n^k, y_n)
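For concreteness, a minimal Python sketch of the simple particle filter above; this is not code from the talk. The caller supplies a sampler for μθ, a sampler for fθ(x, ·) and the log-density of gθ, and the example model at the bottom is purely illustrative.

```python
import numpy as np

def simple_particle_filter(y, N, sample_mu, sample_f, log_g, rng=None):
    """Return the log of the estimate of p_theta(y_{1:T}) and the final particle set."""
    rng = np.random.default_rng() if rng is None else rng
    T = len(y)
    x = sample_mu(N, rng)                          # X_1^{1:N} ~ mu_theta
    log_Z = 0.0
    for n in range(T):
        log_w = log_g(x, y[n])                     # log g_theta(X_n^j, y_n)
        m = np.max(log_w)
        w = np.exp(log_w - m)
        log_Z += m + np.log(np.mean(w))            # log of (1/N) sum_k g_theta(X_n^k, y_n)
        if n + 1 < T:
            # sample X_{n+1}^{1:N} from the mixture
            # sum_j g(X_n^j, y_n) f(X_n^j, .) / sum_k g(X_n^k, y_n)
            idx = rng.choice(N, size=N, p=w / np.sum(w))
            x = sample_f(x[idx], rng)
    return log_Z, x

# Illustrative use on a univariate linear Gaussian model (values chosen arbitrarily).
rng = np.random.default_rng(0)
a, T = 0.9, 50
x_true = np.zeros(T)
x_true[0] = rng.normal()
for t in range(1, T):
    x_true[t] = a * x_true[t - 1] + rng.normal()
y = x_true + rng.normal(size=T)
log_Z, _ = simple_particle_filter(
    y, N=1000,
    sample_mu=lambda N, r: r.normal(size=N),
    sample_f=lambda xp, r: a * xp + r.normal(size=xp.shape),
    log_g=lambda xs, yn: -0.5 * np.log(2 * np.pi) - 0.5 * (yn - xs) ** 2,
)
print("log-likelihood estimate:", log_Z)
```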
Online Particle Filters for Offline Systems Identification
Particle Markov chain Monte Carlo (PMCMC) [2]
- Embed SMC within MCMC,
- justified via an explicit auxiliary variable construction,
- or in some cases by a pseudomarginal [1] argument.
- Very widely applicable,
- but prone to poor mixing when SMC performs poorly for some θ [9, Section 4.2.1].
- Is valid for very general SMC algorithms.
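A minimal particle marginal Metropolis–Hastings (PMMH) sketch showing how a particle filter's likelihood estimate is embedded within MCMC. It assumes a function estimate_log_lik(theta, y) returning the log of an unbiased particle-filter estimate of pθ(y_{1:T}) (e.g. the sketch above); the prior, proposal and scale are illustrative placeholders.

```python
import numpy as np

def pmmh(y, theta0, estimate_log_lik, log_prior, n_iters=10_000, rw_scale=0.1, rng=None):
    rng = np.random.default_rng() if rng is None else rng
    theta = np.atleast_1d(np.asarray(theta0, dtype=float))
    log_lik = estimate_log_lik(theta, y)            # SMC estimate at the current theta
    chain = np.empty((n_iters, theta.size))
    for i in range(n_iters):
        prop = theta + rw_scale * rng.normal(size=theta.size)   # Gaussian random-walk proposal
        log_lik_prop = estimate_log_lik(prop, y)                # fresh SMC run at the proposed theta
        log_alpha = (log_lik_prop + log_prior(prop)) - (log_lik + log_prior(theta))
        if np.log(rng.uniform()) < log_alpha:       # accept/reject as if p_theta(y) were known
            theta, log_lik = prop, log_lik_prop
        chain[i] = theta
    return chain
```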
Twisting the HMM (a complement to [10])
Given (µ, f, g) and y1:T , introducing
ψ := (ψ1 , ψ2 , . . . , ψT ) , ψt ∈ Cb (X, (0, ∞)) and
  \tilde{\psi}_0 := \int_X \mu(x_1)\, \psi_1(x_1)\, dx_1, \qquad
  \tilde{\psi}_t(x_t) := \int_X f(x_t, x_{t+1})\, \psi_{t+1}(x_{t+1})\, dx_{t+1},

we obtain (μ_1^ψ, {f_t^ψ}, {g_t^ψ}), with

  \mu_1^\psi(x_1) := \frac{\mu(x_1)\, \psi_1(x_1)}{\tilde{\psi}_0}, \qquad
  f_t^\psi(x_{t-1}, x_t) := \frac{f(x_{t-1}, x_t)\, \psi_t(x_t)}{\tilde{\psi}_{t-1}(x_{t-1})},

and the sequence of non-negative functions

  g_1^\psi(x_1) := g(x_1, y_1)\, \frac{\tilde{\psi}_1(x_1)}{\psi_1(x_1)}\, \tilde{\psi}_0, \qquad
  g_t^\psi(x_t) := g(x_t, y_t)\, \frac{\tilde{\psi}_t(x_t)}{\psi_t(x_t)}.
Proposition
For any sequence of bounded, continuous and positive functions ψ, let

  Z_\psi := \int_{X^T} \mu_1^\psi(x_1)\, g_1^\psi(x_1) \prod_{t=2}^{T} f_t^\psi(x_{t-1}, x_t)\, g_t^\psi(x_t)\, dx_{1:T}.

Then Z_ψ = pθ(y_{1:T}) for any such ψ.
The optimal choice is:

  \psi_t^*(x_t) := g(x_t, y_t)\, \mathbb{E}\!\left[\, \prod_{p=t+1}^{T} g(X_p, y_p) \,\middle|\, X_t = x_t \right], \qquad x_t \in X,

for t ∈ {1, …, T − 1}. Then Ẑ^N_{ψ*} = p(y_{1:T}) with probability 1.
Towards Iterative Auxiliary Particle Filters [7]
ψ-Auxiliary Particle Filter
1. Sample ξ_1^i ∼ μ_1^ψ independently for i ∈ {1, …, N}.
2. For t = 2, …, T, sample independently

   \xi_t^i \sim \frac{\sum_{j=1}^{N} g_{t-1}^\psi(\xi_{t-1}^j)\, f_t^\psi(\xi_{t-1}^j, \cdot)}{\sum_{j=1}^{N} g_{t-1}^\psi(\xi_{t-1}^j)}, \qquad i \in \{1, \ldots, N\}.

Necessary features of ψ
1. It is possible to sample from f_t^ψ.
2. It is possible to evaluate g_t^ψ.
3. To be useful: V(Ẑ_ψ^N) must be small.
A Recursive Approximation
Proposition
The sequence ψ* satisfies ψ_T^*(x_T) = g(x_T, y_T), x_T ∈ X, and

  \psi_t^*(x_t) = g(x_t, y_t)\, f(x_t, \psi_{t+1}^*), \qquad x_t \in X,\ t \in \{1, \ldots, T-1\},

writing f(x, ψ) := ∫ f(x, x′) ψ(x′) dx′.
Algorithm 1 Recursive function approximations
For t = T, . . . , 1:
1. Set ψ_t^i ← g(ξ_t^i, y_t) f(ξ_t^i, ψ_{t+1}) for i ∈ {1, …, N}.
2. Choose ψ_t as a member of Ψ on the basis of ξ_t^{1:N} and ψ_t^{1:N}.
Iterated Auxiliary Particle Filters
Algorithm 2 An iterated auxiliary particle filter with parameters (N0 , k, τ )
1. Initialize: set ψ_t^0 ≡ 1; l ← 0.
2. Repeat:
   2.1 Run a ψ^l-APF with N_l particles; set Ẑ_l ← Ẑ^{N_l}_{ψ^l}.
   2.2 If l > k and sd(Ẑ_{l−k:l}) / mean(Ẑ_{l−k:l}) < τ, go to 3.
   2.3 Compute ψ^{l+1} using Algorithm 1.
   2.4 If N_{l−k} = N_l and the sequence Ẑ_{l−k:l} is not monotonically increasing, set N_{l+1} ← 2N_l. Otherwise, set N_{l+1} ← N_l.
   2.5 Set l ← l + 1. Go to 2.1.
3. Run a ψ^l-APF. Return Ẑ := Ẑ^{N_l}_{ψ^l}.
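A sketch of Algorithm 2's control flow only, under stated assumptions: run_psi_apf(psi, N) is a placeholder that runs a ψ-APF and returns its particles and Ẑ, and fit_psi(particles) is a placeholder implementing Algorithm 1; both must be supplied for the model at hand.

```python
import numpy as np

def iapf(run_psi_apf, fit_psi, psi_init, N0, k, tau, max_iters=100):
    psi, N_l = psi_init, N0                       # step 1: psi^0 = 1 is the caller's psi_init
    Z_hats, N_hist = [], []
    for l in range(max_iters):
        particles, Z_hat = run_psi_apf(psi, N_l)  # step 2.1
        Z_hats.append(Z_hat)
        N_hist.append(N_l)
        recent = np.array(Z_hats[-(k + 1):])      # Z_hat_{l-k:l} once l >= k
        if l > k and np.std(recent) / np.mean(recent) < tau:
            break                                 # step 2.2: relative sd small enough
        psi = fit_psi(particles)                  # step 2.3: Algorithm 1
        if (len(N_hist) > k and N_hist[-(k + 1)] == N_l
                and not np.all(np.diff(recent) > 0)):
            N_l *= 2                              # step 2.4: double N if no steady improvement
    _, Z_hat = run_psi_apf(psi, N_l)              # step 3: final psi-APF
    return Z_hat, psi
```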
An Elementary Implementation
Function Approximation
- Numerically obtain

  (m_t^*, \Sigma_t^*, \lambda_t^*) = \arg\min_{(m, \Sigma, \lambda)} \sum_{i=1}^{N} \left( \mathcal{N}(x_t^i; m, \Sigma) - \lambda\, \psi_t^i \right)^2

- Set: ψ_t(x_t) := N(x_t; m_t^*, Σ_t^*) + c(N, m_t^*, Σ_t^*)  (a univariate sketch of this fit follows below).
Stopping Rule
- k = 3 or k = 5 in the following examples
- τ = 0.5
Resampling
- Multinomial when ESS < N/2.
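A univariate sketch of the least-squares fit above, assuming scipy is available. Optimising over (m, log s, log λ) keeps the scale and λ positive; the Nelder–Mead optimiser is an implementation choice and the stabilising constant c(N, m*, Σ*) is left to the caller.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

def fit_gaussian_psi(xi, psi_vals):
    """Return (m*, s*, lambda*) minimising sum_i (N(xi_i; m, s^2) - lambda * psi_i)^2."""
    def objective(params):
        m, log_s, log_lam = params
        resid = norm.pdf(xi, loc=m, scale=np.exp(log_s)) - np.exp(log_lam) * psi_vals
        return np.sum(resid ** 2)
    x0 = np.array([np.mean(xi), np.log(np.std(xi) + 1e-8), 0.0])
    res = minimize(objective, x0, method="Nelder-Mead")
    m, log_s, log_lam = res.x
    return m, np.exp(log_s), np.exp(log_lam)
```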
A Linear Gaussian Model: Behaviour with Dimension
µ = N(·; 0, I_d),  f(x, ·) = N(·; Ax, I_d)  where A_ij = 0.42^{|i−j|+1},  and g(x, ·) = N(·; x, I_d).
[Figure: box plots of Ẑ/Z for the iAPF and the bootstrap particle filter, for |X| ∈ {5, 10, 20, 40, 80} (1000 replicates; T = 100).]
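Since this model is linear and Gaussian, the exact Z = p(y_{1:T}) used in the Ẑ/Z plots is available from a Kalman filter. A minimal sketch (names illustrative), assuming x_1 ~ N(0, I_d), x_t | x_{t−1} ~ N(A x_{t−1}, I_d) and y_t | x_t ~ N(x_t, I_d) as above.

```python
import numpy as np
from scipy.stats import multivariate_normal

def kalman_log_likelihood(y, A):
    T, d = y.shape
    m, P = np.zeros(d), np.eye(d)          # predictive distribution of x_1
    log_lik = 0.0
    for t in range(T):
        S = P + np.eye(d)                  # innovation covariance (observation noise I_d)
        log_lik += multivariate_normal.logpdf(y[t], mean=m, cov=S)
        K = P @ np.linalg.inv(S)           # Kalman gain
        m = m + K @ (y[t] - m)
        P = P - K @ P
        if t + 1 < T:                      # predict x_{t+1}
            m, P = A @ m, A @ P @ A.T + np.eye(d)
    return log_lik

# The A used on this slide: A_ij = 0.42^{|i-j|+1}.
d = 5
A = np.fromfunction(lambda i, j: 0.42 ** (np.abs(i - j) + 1), (d, d))
```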
Linear Gaussian Model: Sensitivity to Parameters
Fixing d = 10: Bootstrap (N = 50,000) / iAPF (N0 = 1,000).
[Figure: box plots of Ẑ/Z for values of the parameter α ∈ {0.3, 0.34, 0.38, 0.42, 0.46, 0.5}, using 1000 replicates.]
Linear Gaussian Model: PMMH Empirical Autocorrelations
d = 5, µ = N(·; 0, I_d), f(x, ·) = N(·; Ax, I_d) and g(x, ·) = N(·; x, 0.25 I_d). In this case

  A = \begin{pmatrix}
        0.9 & 0   & 0   & 0   & 0 \\
        0.3 & 0.7 & 0   & 0   & 0 \\
        0.1 & 0.2 & 0.6 & 0   & 0 \\
        0.4 & 0.1 & 0.1 & 0.3 & 0 \\
        0.1 & 0.2 & 0.5 & 0.2 & 0
      \end{pmatrix},

treated as an unknown lower triangular matrix.
[Figure: empirical PMMH autocorrelations (ACF against lag, 0–600) for the entries A11, A41 and A55.]
Stochastic Volatility
- A simple stochastic volatility model is defined by:
  µ(·) = N(·; 0, σ²/(1 − α²)),  f(x, ·) = N(·; αx, σ²),  and g(x, ·) = N(·; 0, β² exp(x)),
- where α ∈ (0, 1), β > 0 and σ² > 0 are unknown.
- Considered T = 945 observations y_{1:T} corresponding to the mean-corrected daily returns for the GBP/USD exchange rate from 1/10/81 to 28/6/85.
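A sketch of the model-specific ingredients needed to run, say, the bootstrap particle filter or a PMMH chain on this model: a simulator (used here for synthetic data, since the GBP/USD series itself is not reproduced) and the samplers / observation log-density for (μ, f, g). The parameter values below are illustrative only, not the exchange-rate estimates.

```python
import numpy as np

def simulate_sv(alpha, sigma, beta, T, rng):
    x = np.empty(T)
    x[0] = rng.normal(scale=np.sqrt(sigma**2 / (1 - alpha**2)))  # stationary initial law
    for t in range(1, T):
        x[t] = alpha * x[t - 1] + sigma * rng.normal()           # f(x, .) = N(.; alpha x, sigma^2)
    y = beta * np.exp(x / 2) * rng.normal(size=T)                # g(x, .) = N(.; 0, beta^2 exp(x))
    return x, y

def sv_ingredients(alpha, sigma, beta):
    """Samplers and observation log-density in the form a bootstrap filter can use."""
    sample_mu = lambda N, r: r.normal(scale=np.sqrt(sigma**2 / (1 - alpha**2)), size=N)
    sample_f = lambda xs, r: alpha * xs + sigma * r.normal(size=xs.shape)
    log_g = lambda xs, yn: (-0.5 * np.log(2 * np.pi) - np.log(beta) - xs / 2
                            - 0.5 * yn**2 / (beta**2 * np.exp(xs)))
    return sample_mu, sample_f, log_g

rng = np.random.default_rng(1)
_, y = simulate_sv(alpha=0.98, sigma=0.15, beta=0.7, T=945, rng=rng)
```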
Estimated PMCMC Autocorrelation
[Figure: estimated autocorrelations (ACF against lag, 0–80) for the parameters α, σ and β.]
Bootstrap N = 1,000. iAPF N0 = 100. Comparable cost. 150,000 PMCMC iterations.
Bootstrap: N = 1,000 / N = 10,000 / iAPF: N0 = 100
[Figure: box plots of the estimates (roughly −928 to −922) over grids of parameter values, α ∈ [0.974, 0.994], β ∈ [0.61, 0.81] and σ ∈ [0.12, 0.17], each panel comparing the three algorithms above.]
Block-Sampling and Tempering in Particle Filters
Tempered Transitions [5]
- Introduce each likelihood term gradually, targeting

  \pi_{n,m}(x_{1:n}) \propto p(x_{1:n} \mid y_{1:n-1})\, p(y_n \mid x_n)^{\beta_m}

  between p(x_{1:n−1} | y_{1:n−1}) and p(x_{1:n} | y_{1:n}).
- Can improve performance, but only to a limited extent.
Block Sampling [4]
- Essentially uses y_{n:n+L} in proposing x_n.
- Can dramatically improve performance,
- but requires a good analytic approximation of p(x_{n:n+L} | x_{n−1}, y_{n:n+L}).
Block-Tempering [8]
- We could combine blocking and tempering strategies.
- Run a simple SMC sampler [3] targeting:

  \pi_{t,r}^{\theta}(x_{1:t\wedge T}) = \mu_\theta(x_1)^{\beta^1_{(t,r)}}\, g_\theta(y_1 \mid x_1)^{\gamma^1_{(t,r)}} \prod_{s=2}^{T \wedge t} f_\theta(x_s \mid x_{s-1})^{\beta^s_{(t,r)}}\, g_\theta(y_s \mid x_s)^{\gamma^s_{(t,r)}},        (1)

  where {β^s_{(t,r)}} and {γ^s_{(t,r)}} are [0, 1]-valued, for s ∈ [[1, T]], r ∈ [[1, R]] and t ∈ [[1, T′]], with T′ = T + L for some R, L ∈ ℕ.
- Can be validly embedded within PMCMC:
  - Terminal likelihood estimate is unbiased.
  - Explicit auxiliary variable construction is possible.
Two Simple Block-Tempering Strategies
Tempering both likelihood and transition probabilities
  \beta^s_{(t,r)} = \gamma^s_{(t,r)} = 1 \wedge \frac{R(t - s) + r}{RL} \vee 0        (2)
Tempering only the observation density
  \beta^s_{(t,r)} = \mathbb{I}\{s \le t\}, \qquad \gamma^s_{(t,r)} = 1 \wedge \frac{R(t - s) + r}{RL} \vee 0        (3)
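A small sketch of the two schedules as functions of (s, t, r), vectorised over s; it mirrors (2) and (3) directly, with np.clip implementing the truncation of (R(t − s) + r)/(RL) to [0, 1]. The argument values in the usage lines are illustrative.

```python
import numpy as np

def schedule_both(s, t, r, R, L):
    """Strategy (2): temper both the transition and the observation densities."""
    gamma = np.clip((R * (t - s) + r) / (R * L), 0.0, 1.0)
    return gamma, gamma                        # beta^s_(t,r) = gamma^s_(t,r)

def schedule_obs_only(s, t, r, R, L):
    """Strategy (3): temper only the observation density."""
    beta = (s <= t).astype(float)              # indicator I{s <= t}
    gamma = np.clip((R * (t - s) + r) / (R * L), 0.0, 1.0)
    return beta, gamma

# The schedule over time indices s = 1, ..., 10 at one intermediate (t, r).
s = np.arange(1, 11)
beta, gamma = schedule_obs_only(s, t=8, r=2, R=5, L=3)
```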
Tempering Only the Observation Density
[Schematic of the schedules in (3): β^s_{(t,0)} = β^s_{(t,R)} = I{s ≤ t}; γ^s_{(t,r)} equals 1 for older time points, decays linearly with slope 1/L to 0 over the most recent block (roughly s ∈ (t − L, t]), and shifts gradually as r increases from 0 to R.]
Illustrative Example
- Univariate linear Gaussian SSM:
  transition f(x′ | x) = N(x′; x, 1),  likelihood g(y | x) = N(y; x, 1).
- Artificial jump in the observation sequence at time 75.
- Cartoon of model misspecification, a key difficulty with PMCMC.
- Temper only the likelihood.
- Use single-site Metropolis-within-Gibbs (standard normal proposal) MCMC moves.
[Figure: observations y[t] against time t ∈ [0, 100], showing the jump at time 75.]
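A sketch of how data for this example might be generated: simulate the univariate linear Gaussian SSM and shift the observations from time 75 onwards. The initial distribution and the size of the shift are illustrative choices, not values taken from the talk.

```python
import numpy as np

rng = np.random.default_rng(2)
T = 100
x = np.empty(T)
x[0] = rng.normal()                   # illustrative initial distribution
for t in range(1, T):
    x[t] = x[t - 1] + rng.normal()    # f(x' | x) = N(x'; x, 1)
y = x + rng.normal(size=T)            # g(y | x) = N(y; x, 1)
y[74:] -= 10.0                        # artificial jump in the observations at time 75 (size illustrative)
```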
True Filtering and Smoothing Distributions
[Figure: the true filtering and smoothing distributions for the illustrative example, plotted against time t ∈ [0, 100].]
Estimating the Normalizing Constant, Ẑ
[Figure: log(estimated Z) against log(cost = N(1 + RL)), roughly −92 to −88, for block lengths L ∈ {1, 2, 3, 5} and tempering steps R ∈ {0, 1, 5, 10}.]
Relative Error in Ẑ Against Computational Effort
[Figure: log(relative error in Ẑ) against log(cost = N(1 + RL)) for L ∈ {1, 2, 3, 5}, R ∈ {0, 1, 5, 10} and log(N) ∈ {3, 4, 5, 6}.]
Performance: (R = 1, L = 1) ≺ (R > 1, L = 1) ≺ (R > 1, L > 1).
Conclusions
- To fully realise the potential of PMCMC we should exploit its flexibility.
- Even very simple variants on the standard particle filter can significantly improve performance.
- The iAPF can improve performance substantially in some settings.
- Extending its applicability is ongoing work.
- In principle any function approximation scheme can be employed, provided that f_t^ψ can be sampled from and g_t^ψ evaluated.
- Other [standard and less standard] ideas, including blocking and tempering, can also be readily employed.
References
[1] C. Andrieu and G. O. Roberts. The pseudo-marginal approach for efficient Monte Carlo computations. Annals of Statistics, 37(2):697–725, 2009.
[2] C. Andrieu, A. Doucet, and R. Holenstein. Particle Markov chain Monte Carlo. Journal of the Royal Statistical Society B, 72(3):269–342, 2010.
[3] P. Del Moral, A. Doucet, and A. Jasra. Sequential Monte Carlo samplers. Journal of the Royal Statistical Society B, 68(3):411–436, 2006.
[4] A. Doucet, M. Briers, and S. Sénécal. Efficient block sampling strategies for sequential Monte Carlo methods. Journal of Computational and Graphical Statistics, 15(3):693–711, 2006.
[5] S. Godsill and T. Clapp. Improvement strategies for Monte Carlo particle filters. In A. Doucet, N. de Freitas, and N. Gordon, editors, Sequential Monte Carlo Methods in Practice, Statistics for Engineering and Information Science, pages 139–158. Springer Verlag, New York, 2001.
[6] N. J. Gordon, D. J. Salmond, and A. F. M. Smith. Novel approach to nonlinear/non-Gaussian Bayesian state estimation. IEE Proceedings-F, 140(2):107–113, April 1993.
[7] P. Guarniero, A. M. Johansen, and A. Lee. The iterated auxiliary particle filter. arXiv e-print 1511.06286, 2015.
[8] A. M. Johansen. On blocks, tempering and particle MCMC for systems identification. In Proceedings of the 17th IFAC Symposium on System Identification, Beijing, China, 2015. IFAC.
[9] J. Owen, D. J. Wilkinson, and C. S. Gillespie. Likelihood free inference for Markov processes: a comparison. Statistical Applications in Genetics and Molecular Biology, 2015. In press.
[10] N. Whiteley and A. Lee. Twisted particle filters. Annals of Statistics, 42(1):115–141, 2014.