Exact Approximation of Rao-Blackwellized Particle Filters

Adam M. Johansen*, Nick Whiteley† and Arnaud Doucet‡

* University of Warwick, Department of Statistics
† University of Bristol, School of Mathematics
‡ University of Oxford, Department of Statistics

a.m.johansen@warwick.ac.uk
http://go.warwick.ac.uk/amjohansen/talks

February 21st, 2013
Séminaire BIG'MC, Paris
Context

Filtering in State-Space Models:
- SIR Particle Filters [GSS93]
- Rao-Blackwellized Particle Filters [AD02, CL00]

Exact Approximation of Monte Carlo Algorithms:
- Particle MCMC [ADH10]
- SMC² [CJP11]
- Local SMC [JD13]

Approximating the RBPF:
- Approximated Rao-Blackwellized Particle Filters [CSOL11]

This talk: EA-RBPFs [JWD12]
A (Rather Broad) Class of Hidden Markov Models

[Figure: graphical model in which the unobserved chain (x_1, z_1) → (x_2, z_2) → (x_3, z_3) evolves horizontally and each pair (x_n, z_n) emits an observation y_n.]

- Unobserved Markov chain {(X_n, Z_n)} with transition density f.
- Observed process {Y_n} with conditional density g.
- Density:

  p(x_{1:n}, z_{1:n}, y_{1:n}) = f_1(x_1, z_1) g(y_1 | x_1, z_1) \prod_{i=2}^{n} f(x_i, z_i | x_{i-1}, z_{i-1}) g(y_i | x_i, z_i).
Formal Solutions

- Filtering and Prediction Recursions:

  p(x_n, z_n | y_{1:n}) = \frac{p(x_n, z_n | y_{1:n-1}) g(y_n | x_n, z_n)}{\int p(x'_n, z'_n | y_{1:n-1}) g(y_n | x'_n, z'_n) \, d(x'_n, z'_n)}

  p(x_{n+1}, z_{n+1} | y_{1:n}) = \int p(x_n, z_n | y_{1:n}) f(x_{n+1}, z_{n+1} | x_n, z_n) \, d(x_n, z_n)

- Smoothing:

  p((x,z)_{1:n} | y_{1:n}) \propto p((x,z)_{1:n-1} | y_{1:n-1}) f((x,z)_n | (x,z)_{n-1}) g(y_n | (x,z)_n)
Importance Sampling

- Given q, such that p(x) > 0 ⇒ q(x) > 0 and p(x)/q(x) < ∞, define w(x) = p(x)/q(x) and:

  I = \int \varphi(x) p(x) \, dx = \int \varphi(x) w(x) q(x) \, dx.

- This suggests the estimator:
  - Sample X_1, \dots, X_N \overset{iid}{\sim} q.
  - Estimate \hat{I} = \frac{1}{N} \sum_{i=1}^{N} w(X_i) \varphi(X_i).

- Can also be viewed as approximating π(dx) = p(x) dx with

  \hat{\pi}^N(dx) = \frac{1}{N} \sum_{i=1}^{N} w(X_i) \delta_{X_i}(dx).
Self-Normalised Importance Sampling

- Often, p is known only up to a normalising constant.
- As E_q(Cw\varphi) = C E_p(\varphi). . .
- If v(x) = C w(x), then

  \frac{E_q(v \varphi)}{E_q(v)} = \frac{E_q(C w \varphi)}{E_q(C w)} = \frac{C E_p(\varphi)}{C E_p(1)} = E_p(\varphi).

- Estimate the numerator and denominator with the same sample:

  \hat{I} = \frac{\sum_{i=1}^{N} v(X_i) \varphi(X_i)}{\sum_{i=1}^{N} v(X_i)}.
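To make the ratio estimator concrete, here is a minimal numpy sketch (illustrative code, not from the talk): the target p is a standard normal whose normalising constant we pretend not to know, the proposal q is a wider Gaussian, and \varphi(x) = x², so the estimate should approach E_p(X²) = 1.

```python
import numpy as np

rng = np.random.default_rng(0)

def p_unnorm(x):
    # unnormalised target: exp(-x^2 / 2); the constant C is "unknown"
    return np.exp(-0.5 * x ** 2)

def q_pdf(x, s=2.0):
    # Gaussian proposal N(0, s^2); heavier-tailed than the target
    return np.exp(-0.5 * (x / s) ** 2) / (s * np.sqrt(2.0 * np.pi))

N = 100_000
x = rng.normal(0.0, 2.0, size=N)        # X_1, ..., X_N ~ q
v = p_unnorm(x) / q_pdf(x)              # v(x) = C w(x); C cancels in the ratio
phi = x ** 2                            # E_p(X^2) = 1 for the standard normal
print(np.sum(v * phi) / np.sum(v))      # self-normalised estimate, approx. 1
```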
Importance Sampling in the HMM Setting

- Given p((x,z)_{1:n} | y_{1:n}) for n = 1, 2, \dots.
- Choose

  q_n((x,z)_{1:n}) = q_n((x,z)_n | (x,z)_{1:n-1}) \, q_{n-1}((x,z)_{1:n-1}).

- Weight:

  w_n((x,z)_{1:n}) \propto \frac{p((x,z)_{1:n} | y_{1:n})}{q_n((x,z)_n | (x,z)_{1:n-1}) \, q_{n-1}((x,z)_{1:n-1})}
                  = \frac{p((x,z)_{1:n} | y_{1:n}) \, w_{n-1}((x,z)_{1:n-1})}{q_n((x,z)_n | (x,z)_{1:n-1}) \, p((x,z)_{1:n-1} | y_{1:n-1})}
                  \propto \frac{f((x,z)_n | (x,z)_{n-1}) \, g(y_n | (x,z)_n)}{q_n((x,z)_n | (x,z)_{n-1})} \, w_{n-1}((x,z)_{1:n-1}).
Sequential Importance Sampling – Prediction & Update

A first "particle filter":
- Simple default: q_n((x,z)_n | (x,z)_{n-1}) = f((x,z)_n | (x,z)_{n-1}).
- Importance weighting becomes:

  w_n((x,z)_{1:n}) = w_{n-1}((x,z)_{1:n-1}) \times g(y_n | (x,z)_n)

Algorithmically, at iteration n, given {W_{n-1}^i, (X,Z)_{1:n-1}^i} for i = 1, \dots, N:
- Sample (X,Z)_n^i \sim f(\cdot | (X,Z)_{n-1}^i) (prediction).
- Weight W_n^i \propto W_{n-1}^i \, g(y_n | (X,Z)_n^i) (update).

Actually:
- Better proposals exist. . .
- but even they aren't good enough, as the sketch below illustrates.
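A minimal sketch of this SIS filter, under the assumption of a linear-Gaussian instance of the model class (the same structure as the toy example later in the talk) and the bootstrap proposal q_n = f; all names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)

T, N, s2 = 50, 500, 1.0          # time steps, particles, observation variance

# Simulate data: (x_n, z_n) is a unit-variance Gaussian random walk started
# from N(0, I), observed as y_n = (x_n, z_n) + noise.
xz = np.cumsum(rng.normal(size=(T, 2)), axis=0)
y = xz + np.sqrt(s2) * rng.normal(size=(T, 2))

particles = rng.normal(size=(N, 2))                    # (X, Z)_1^i ~ f_1
logw = -0.5 * np.sum((y[0] - particles) ** 2, axis=1) / s2
for n in range(1, T):
    particles = particles + rng.normal(size=(N, 2))    # prediction: sample f
    logw += -0.5 * np.sum((y[n] - particles) ** 2, axis=1) / s2   # update
W = np.exp(logw - logw.max())
W /= W.sum()
print("ESS after %d steps: %.2f of %d particles" % (T, 1.0 / np.sum(W ** 2), N))
```

Running this prints an effective sample size close to 1: after 50 steps essentially a single particle carries all the weight, which is exactly why resampling is needed.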
A Simple SIR Filter

Algorithmically, at iteration n, given {W_{n-1}^i, (X,Z)_{1:n-1}^i} for i = 1, \dots, N:
- Resample, obtaining \{ \frac{1}{N}, (\tilde{X},\tilde{Z})_{1:n-1}^i \}.
- Sample (X,Z)_n^i \sim q_n(\cdot | (\tilde{X},\tilde{Z})_{n-1}^i).
- Weight

  W_n^i \propto \frac{f((X,Z)_n^i | (\tilde{X},\tilde{Z})_{n-1}^i) \, g(y_n | (X,Z)_n^i)}{q_n((X,Z)_n^i | (\tilde{X},\tilde{Z})_{n-1}^i)}

Actually:
- Resample efficiently.
- Only resample when necessary.
- ...
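The first two "actually" points can be sketched directly (my function names, not the talk's): systematic resampling is an O(N) scheme with lower variance than multinomial resampling, and the usual effective-sample-size rule triggers it only when needed.

```python
import numpy as np

def systematic_resample(weights, rng):
    """Efficient O(N) systematic resampling; returns N ancestor indices."""
    N = len(weights)
    c = np.cumsum(weights)
    c[-1] = 1.0                               # guard against rounding error
    u = (rng.uniform() + np.arange(N)) / N    # one uniform, stratified grid
    return np.searchsorted(c, u)

def maybe_resample(particles, logw, rng, threshold=0.5):
    """Resample only when the effective sample size drops below threshold*N."""
    W = np.exp(logw - logw.max())
    W /= W.sum()
    if 1.0 / np.sum(W ** 2) < threshold * len(W):        # ESS criterion
        idx = systematic_resample(W, rng)
        particles, logw = particles[idx], np.zeros(len(W))  # reset to equal
    return particles, logw
```

Calling maybe_resample(particles, logw, rng) after each weight update in the previous sketch turns it into a basic adaptive-resampling SIR filter.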
A Rao-Blackwellized SIR Filter

Algorithmically, at iteration n:
- Given {W_{n-1}^{x,i}, X_{1:n-1}^i, p(z_{1:n-1} | X_{1:n-1}^i, y_{1:n-1})}.
- Resample, obtaining \{ \frac{1}{N}, \tilde{X}_{1:n-1}^i, p(z_{1:n-1} | \tilde{X}_{1:n-1}^i, y_{1:n-1}) \}.
- For i = 1, \dots, N:
  - Sample X_n^i \sim q_n(\cdot | \tilde{X}_{n-1}^i).
  - Set X_{1:n}^i ← (\tilde{X}_{1:n-1}^i, X_n^i).
  - Weight W_n^{x,i} \propto \frac{p(X_n^i, y_n | \tilde{X}_{n-1}^i)}{q_n(X_n^i | \tilde{X}_{n-1}^i)}.
  - Compute p(z_{1:n} | y_{1:n}, X_{1:n}^i).

Requires analytically tractable substructure, as in the sketch below.
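For the linear-Gaussian toy model analysed later, the tractable substructure is a scalar Kalman filter: conditional on X_{1:n}^i, the z-chain remains Gaussian, so each particle stores just a mean and a variance in place of p(z_{1:n} | X_{1:n}^i, y_{1:n}). A sketch of one iteration under that assumption, with a bootstrap proposal for x (my illustration, not the talk's code):

```python
import numpy as np

def rbpf_step(x, m, P, logw, y, rng, sx2=1.0, sz2=1.0):
    """One RBPF iteration for the toy model; y = (y_x, y_z).

    Each particle i carries x[i] plus the Kalman mean m[i] and variance
    P[i] of p(z_{n-1} | X_{1:n-1}^i, y_{1:n-1}).
    """
    N = len(x)
    x = x + rng.normal(size=N)        # bootstrap proposal for x_n
    Pp = P + 1.0                      # Kalman prediction: z_n = z_{n-1} + N(0,1)
    S = Pp + sz2                      # predictive variance of y_z
    # weight by p(y_n | X_{1:n}^i, y_{1:n-1}) with z integrated out exactly
    # (additive constants dropped; in this toy model the z-term happens to be
    # common to all particles, but the structure is the generic one)
    logw = logw - 0.5 * (y[0] - x) ** 2 / sx2 \
                - 0.5 * (y[1] - m) ** 2 / S - 0.5 * np.log(S)
    K = Pp / S                        # Kalman gain
    m = m + K * (y[1] - m)            # update to p(z_n | X_{1:n}^i, y_{1:n})
    P = (1.0 - K) * Pp
    return x, m, P, logw
```

Initialise with x = rng.normal(size=N), m = np.zeros(N), P = np.ones(N) and logw = np.zeros(N), and interleave with the resampling sketch above.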
An Approximate Rao-Blackwellized SIR Filter

Algorithmically, at iteration n:
- Given {W_{n-1}^{x,i}, X_{1:n-1}^i, \hat{p}(z_{1:n-1} | X_{1:n-1}^i, y_{1:n-1})}.
- Resample, obtaining \{ \frac{1}{N}, \tilde{X}_{1:n-1}^i, \hat{p}(z_{1:n-1} | \tilde{X}_{1:n-1}^i, y_{1:n-1}) \}.
- For i = 1, \dots, N:
  - Sample X_n^i \sim q_n(\cdot | \tilde{X}_{n-1}^i).
  - Set X_{1:n}^i ← (\tilde{X}_{1:n-1}^i, X_n^i).
  - Weight W_n^{x,i} \propto \frac{\hat{p}(X_n^i, y_n | \tilde{X}_{n-1}^i)}{q_n(X_n^i | \tilde{X}_{n-1}^i)}.
  - Compute \hat{p}(z_{1:n} | y_{1:n}, X_{1:n}^i).

Is approximate; how does error accumulate?
Exactly Approximated Rao-Blackwellized SIR Filter

At time n = 1:
- Sample X_1^i \sim q^x(\cdot | y_1).
- Sample Z_1^{i,j} \sim q^z(\cdot | X_1^i, y_1).
- Compute and normalise the local weights

  w_1^z(X_1^i, Z_1^{i,j}) := \frac{p(X_1^i, y_1, Z_1^{i,j})}{q^z(Z_1^{i,j} | X_1^i, y_1)}, \qquad
  W_1^{z,i,j} := \frac{w_1^z(X_1^i, Z_1^{i,j})}{\sum_{k=1}^{M} w_1^z(X_1^i, Z_1^{i,k})},

  and define

  \hat{p}(X_1^i, y_1) := \frac{1}{M} \sum_{j=1}^{M} w_1^z(X_1^i, Z_1^{i,j}).

- Compute and normalise the top-level weights

  w_1^x(X_1^i) := \frac{\hat{p}(X_1^i, y_1)}{q^x(X_1^i | y_1)}, \qquad
  W_1^{x,i} := \frac{w_1^x(X_1^i)}{\sum_{k=1}^{N} w_1^x(X_1^k)}.
At times n ≥ 2:
- Resample \{ W_{n-1}^{x,i}, (X_{1:n-1}^i, \{W_{n-1}^{z,i,j}, Z_{1:n-1}^{i,j}\}_j) \}_i to obtain \{ \frac{1}{N}, (\tilde{X}_{1:n-1}^i, \{W_{n-1}^{z,i,j}, Z_{1:n-1}^{i,j}\}_j) \}_i.
- Resample \{ W_{n-1}^{z,i,j}, Z_{1:n-1}^{i,j} \}_j to obtain \{ \frac{1}{M}, \tilde{Z}_{1:n-1}^{i,j} \}_j.
- Sample X_n^i \sim q^x(\cdot | \tilde{X}_{1:n-1}^i, y_{1:n}); set X_{1:n}^i := (\tilde{X}_{1:n-1}^i, X_n^i).
- Sample Z_n^{i,j} \sim q^z(\cdot | X_{1:n}^i, y_{1:n}, \tilde{Z}_{1:n-1}^{i,j}); set Z_{1:n}^{i,j} := (\tilde{Z}_{1:n-1}^{i,j}, Z_n^{i,j}).
- Compute and normalise the local weights

  w_n^z(X_{1:n}^i, Z_{1:n}^{i,j}) := \frac{p(X_n^i, y_n, Z_n^{i,j} | \tilde{X}_{n-1}^i, \tilde{Z}_{n-1}^{i,j})}{q^z(Z_n^{i,j} | X_{1:n}^i, y_{1:n}, \tilde{Z}_{1:n-1}^{i,j})},

  \hat{p}(X_n^i, y_n | \tilde{X}_{1:n-1}^i, y_{1:n-1}) := \frac{1}{M} \sum_{j=1}^{M} w_n^z(X_{1:n}^i, Z_{1:n}^{i,j}),

  W_n^{z,i,j} := \frac{w_n^z(X_{1:n}^i, Z_{1:n}^{i,j})}{\sum_{k=1}^{M} w_n^z(X_{1:n}^i, Z_{1:n}^{i,k})}.

- Compute and normalise the top-level weights

  w_n^x(X_{1:n}^i) := \frac{\hat{p}(X_n^i, y_n | \tilde{X}_{1:n-1}^i, y_{1:n-1})}{q^x(X_n^i | \tilde{X}_{1:n-1}^i, y_{1:n})}, \qquad
  W_n^{x,i} := \frac{w_n^x(X_{1:n}^i)}{\sum_{k=1}^{N} w_n^x(X_{1:n}^k)}.
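Putting the last three slides together, a compact sketch of one EA-RBPF iteration for the toy model, assuming bootstrap proposals at both levels and multinomial resampling locally. With q^x = q^z = f the transition densities cancel from the weights and, because g factorises here, the x-likelihood pulls out of the local average; the mean of the local weights plays exactly the role of \hat{p}(X_n^i, y_n | \cdot) above.

```python
import numpy as np

def ea_rbpf_step(x, Z, logw, y, rng, sx2=1.0, sz2=1.0):
    """One EA-RBPF iteration for the toy model; y = (y_x, y_z).

    x: (N,) top-level particles; Z: (N, M) local particles;
    logw: (N,) top-level log-weights.
    """
    N, M = Z.shape
    x = x + rng.normal(size=N)            # top-level bootstrap proposal q^x = f
    Z = Z + rng.normal(size=(N, M))       # local bootstrap proposals q^z = f
    # local weights: with q^z = f they reduce to the z-observation density
    wz = np.exp(-0.5 * (y[1] - Z) ** 2 / sz2) / np.sqrt(2.0 * np.pi * sz2)
    p_hat = wz.mean(axis=1)               # the (1/M)-sum of local weights
    # top-level weight: x-observation density times the local average
    logw = logw - 0.5 * (y[0] - x) ** 2 / sx2 + np.log(p_hat)
    # multinomial resampling within each local particle system
    for i in range(N):
        Z[i] = Z[i, rng.choice(M, size=M, p=wz[i] / wz[i].sum())]
    return x, Z, logw
```

Top-level resampling acts on the triples (x[i], Z[i], logw[i]) exactly as in the simple SIR filter, each top-level particle carrying its whole local particle system with it.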
How can this be justified?

- As an extended-space SIR algorithm.
- Via unbiased estimation arguments.

Note also the M = 1 and M → ∞ cases: with M = 1 one recovers a simple SIR filter on the joint (x, z)-space, while as M → ∞ the exact RBPF is recovered.

How does this differ from CSOL11?

Principally in the local weights, benefits including:
- Valid (N-consistent) for all M ≥ 1 rather than (M, N)-consistent.
- Computational cost O(MN) rather than O(M²N).
- Only requires knowledge of the joint behaviour of x and z; doesn't require, say, p(x_n | x_{n-1}, z_{n-1}).
Ancestral Lines

[Figure: genealogy of six lower-level particles over times n = 1, 2, 3, with arrows indicating the ancestor indices a_m^j; the ancestral lines terminating in particles 2, 4 and 6 are b_{3,1:3} = (1, 1, 2), b_{3,1:3} = (3, 3, 4) and b_{3,1:3} = (4, 5, 6) respectively.]
Importance Sampling Interpretation — Proposal

Neglecting resampling at the top level, the time-n proposal is:

  q_n(x_{1:n}, a_{1:n-1}^{z,1:M}, z_{1:n}^{1:M} | y_{1:n}) = q_n(x_{1:n} | y_{1:n}) \, q_n(a_{1:n-1}^{z,1:M}, z_{1:n}^{1:M} | x_{1:n}, y_{1:n})

where

  q_n(x_{1:n} | y_{1:n}) = q^x(x_1 | y_1) \prod_{m=2}^{n} q^x(x_m | x_{1:m-1}, y_{1:m}),

and, assuming multinomial resampling locally, q_n(a_{1:n-1}^{z,1:M}, z_{1:n}^{1:M} | x_{1:n}, y_{1:n}) becomes

  \prod_{j=1}^{M} \left\{ q^z(z_1^j | x_1, y_1) \prod_{m=2}^{n} W_{m-1}^{z, a_{m-1}^{z,j}} \, q^z(z_m^j | x_{1:m}, y_{1:m}, z_{1:m-1}^{a_{m-1}^{z,j}}) \right\}.
Importance Sampling Interpretation — Weights and Target

The importance weights are:

  w_n(x_{1:n}, a_{1:n-1}^{z,1:M}, z_{1:n}^{1:M}) = \frac{\hat{p}(x_1, y_1)}{q^x(x_1 | y_1)} \prod_{m=2}^{n} \frac{\hat{p}(x_m, y_m | x_{1:m-1}, y_{1:m-1})}{q^x(x_m | x_{1:m-1}, y_{1:m})},

implying the target distribution

  \pi_n(x_{1:n}, a_{1:n-1}^{z,1:M}, z_{1:n}^{1:M}) \propto w_n(x_{1:n}, a_{1:n-1}^{z,1:M}, z_{1:n}^{1:M}) \, q_n(x_{1:n}, a_{1:n-1}^{z,1:M}, z_{1:n}^{1:M} | y_{1:n})
  \propto \hat{p}(x_{1:n}, y_{1:n}) \, q_n(a_{1:n-1}^{z,1:M}, z_{1:n}^{1:M} | x_{1:n}, y_{1:n})

where

  \hat{p}(x_{1:n}, y_{1:n}) := \hat{p}(x_1, y_1) \prod_{m=2}^{n} \hat{p}(x_m, y_m | x_{1:m-1}, y_{1:m-1}).
Importance Sampling Interpretation — Marginal Target

It is now well-known that the marginal likelihood estimate provided by a particle filter is unbiased; i.e.

  \sum_{a_{1:n-1}^{z,1:M}} \int \hat{p}(x_{1:n}, y_{1:n}) \, q_n(a_{1:n-1}^{z,1:M}, z_{1:n}^{1:M} | y_{1:n}, x_{1:n}) \, dz_{1:n}^{1:M} = p(x_{1:n}, y_{1:n}).

It follows straightforwardly that the marginal target is:

  \pi_n(x_{1:n}) = \sum_{a_{1:n-1}^{z,1:M}} \int \pi_n(x_{1:n}, a_{1:n-1}^{z,1:M}, z_{1:n}^{1:M}) \, dz_{1:n}^{1:M} = p(x_{1:n} | y_{1:n}).
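This unbiasedness property is easy to check numerically. A sketch (my code, not the talk's) for a scalar linear-Gaussian model, where the exact marginal likelihood is available from a Kalman filter: averaging the bootstrap filter's likelihood estimate over many independent runs recovers the exact value.

```python
import numpy as np

rng = np.random.default_rng(4)

T, sy2 = 10, 1.0
z_true = np.cumsum(rng.normal(size=T))        # unit-variance random walk
y = z_true + rng.normal(size=T)               # noisy observations

def kalman_loglik(y):
    """Exact log p(y_{1:T}) for the scalar linear-Gaussian model."""
    m, P, ll = 0.0, 1.0, 0.0
    for yn in y:
        S = P + sy2                            # predictive variance of y_n
        ll += -0.5 * (np.log(2.0 * np.pi * S) + (yn - m) ** 2 / S)
        K = P / S
        m, P = m + K * (yn - m), (1.0 - K) * P + 1.0   # update, then predict
    return ll

def pf_lik(y, M, rng):
    """Bootstrap particle filter estimate of p(y_{1:T})."""
    Z, L = rng.normal(size=M), 1.0
    for n, yn in enumerate(y):
        if n > 0:
            Z = Z + rng.normal(size=M)         # propagate through f
        w = np.exp(-0.5 * (yn - Z) ** 2 / sy2) / np.sqrt(2.0 * np.pi * sy2)
        L *= w.mean()                          # p_hat(y_n | y_{1:n-1})
        Z = Z[rng.choice(M, size=M, p=w / w.sum())]   # multinomial resampling
    return L

est = np.mean([pf_lik(y, 50, rng) for _ in range(2000)])
print(est, np.exp(kalman_loglik(y)))           # agree up to Monte Carlo error
```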
A Joint Target Distribution I

A little rearrangement yields:

  \pi_n(x_{1:n}, a_{1:n-1}^{z,1:M}, z_{1:n}^{1:M}) = p(x_{1:n} | y_{1:n}) \times \frac{1}{M^n} \sum_{j=1}^{M} \Bigg\{ p(z_{1:n}^{b_{1:n}^j} | x_{1:n}, y_{1:n}) \prod_{\substack{k=1 \\ k \neq b_1^j}}^{M} q^z(z_1^k | x_1, y_1)
  \times \prod_{m=2}^{n} \prod_{\substack{k=1 \\ k \neq b_m^j}}^{M} W_{m-1}^{z, a_{m-1}^{z,k}} \, q^z(z_m^k | x_{1:m}, y_{1:m}, z_{1:m-1}^{a_{m-1}^{z,k}}) \Bigg\},

where b_{1:n}^j denotes the ancestral line terminating in particle j.

[Easily verified by division by the joint proposal.]
A Joint Target Distribution II

Introducing an additional discrete random variable L ∈ {1, ..., M} such that

  \pi_n(x_{1:n}, a_{1:n-1}^{z,1:M}, l, z_{1:n}^{1:M}) = \frac{p(x_{1:n}, z_{1:n}^{b_{1:n}^l} | y_{1:n})}{M^n} \prod_{\substack{k=1 \\ k \neq b_1^l}}^{M} q^z(z_1^k | x_1, y_1)
  \times \prod_{m=2}^{n} \prod_{\substack{k=1 \\ k \neq b_m^l}}^{M} W_{m-1}^{z, a_{m-1}^{z,k}} \, q^z(z_m^k | x_{1:m}, y_{1:m}, z_{1:m-1}^{a_{m-1}^{z,k}}).

This provides us with an estimate of the joint distribution of interest, p(x_{1:n}, z_{1:n} | y_{1:n}).
Toy Example: Model

We use a simulated sequence of 100 observations from the model defined by the densities:

  \mu(x_1, z_1) = \mathcal{N}\left( \begin{pmatrix} x_1 \\ z_1 \end{pmatrix}; \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} \right)

  f(x_n, z_n | x_{n-1}, z_{n-1}) = \mathcal{N}\left( \begin{pmatrix} x_n \\ z_n \end{pmatrix}; \begin{pmatrix} x_{n-1} \\ z_{n-1} \end{pmatrix}, \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} \right)

  g(y_n | x_n, z_n) = \mathcal{N}\left( y_n; \begin{pmatrix} x_n \\ z_n \end{pmatrix}, \begin{pmatrix} \sigma_x^2 & 0 \\ 0 & \sigma_z^2 \end{pmatrix} \right)

Consider the IMSE (relative to the optimal filter, sketched below) of the filtering estimates of the first-coordinate marginals.
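Since the two coordinates here are independent linear-Gaussian chains, the optimal filter is a pair of scalar Kalman filters. A sketch of the first-coordinate reference against which the IMSE is measured (an illustration assuming the model's unit-variance random-walk dynamics):

```python
import numpy as np

def exact_filter_means(y, s2):
    """Scalar Kalman filtering means for one coordinate of the toy model.

    y: (T,) observations of that coordinate; s2: its observation variance.
    Returns E(x_n | y_{1:n}) for n = 1, ..., T.
    """
    m, P, means = 0.0, 1.0, []
    for yn in y:
        K = P / (P + s2)                 # Kalman gain
        m, P = m + K * (yn - m), (1.0 - K) * P
        means.append(m)                  # filtering mean at time n
        P += 1.0                         # predict: unit-variance random walk
    return np.array(means)
```

The IMSE is then the time-integrated squared difference between each algorithm's estimate of E(x_n | y_{1:n}) and this reference, averaged over replicates.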
Approximation of the RBPF

[Figure: mean squared filtering error against the number of lower-level particles, M (log-log scale), for N = 10, 20, 40, 80, 160. For σ_x² = σ_z² = 1.]
Computational Performance

[Figure: mean squared filtering error against computational cost, N(M+1) (log-log scale), for N = 10, 20, 40, 80, 160. For σ_x² = σ_z² = 1.]
Computational Performance

[Figure: mean squared filtering error against computational cost, N(M+1) (log-log scale), for N = 10, 20, 40, 80, 160. For σ_x² = 10² and σ_z² = 0.1².]
In Conclusion

- SMC can be used hierarchically.
- Software implementation is not difficult [Joh09].
- The Rao-Blackwellized particle filter can be approximated exactly:
  - Can reduce estimator variance at fixed cost.
  - Attractive for distributed/parallel implementation.
  - Allows combination of different sorts of particle filter.
  - Can be combined with other techniques for parameter estimation, etc.
References I

C. Andrieu and A. Doucet.
Particle filtering for partially observed Gaussian state space models.
Journal of the Royal Statistical Society B, 64(4):827–836, 2002.

C. Andrieu, A. Doucet, and R. Holenstein.
Particle Markov chain Monte Carlo methods.
Journal of the Royal Statistical Society B, 72(3):269–342, 2010.

N. Chopin, P. Jacob, and O. Papaspiliopoulos.
SMC².
ArXiv e-prints, 1101.1528, January 2011.

R. Chen and J. S. Liu.
Mixture Kalman filters.
Journal of the Royal Statistical Society B, 62(3):493–508, 2000.

T. Chen, T. Schön, H. Ohlsson, and L. Ljung.
Decentralized particle filter with arbitrary state decomposition.
IEEE Transactions on Signal Processing, 59(2):465–478, February 2011.

N. J. Gordon, D. J. Salmond, and A. F. M. Smith.
Novel approach to nonlinear/non-Gaussian Bayesian state estimation.
IEE Proceedings-F, 140(2):107–113, April 1993.
References II

A. M. Johansen and A. Doucet.
Hierarchical particle sampling for intractable state-space models.
CRiSM working paper, University of Warwick, 2013. In preparation.

A. M. Johansen.
SMCTC: Sequential Monte Carlo in C++.
Journal of Statistical Software, 30(6):1–41, April 2009.

A. M. Johansen, N. Whiteley, and A. Doucet.
Exact approximation of Rao-Blackwellised particle filters.
In Proceedings of the 16th IFAC Symposium on System Identification, Brussels, Belgium, July 2012. IFAC.