
Particle Filters
Inference via Interacting Particle Systems
Adam M. Johansen
a.m.johansen@warwick.ac.uk
http://www2.warwick.ac.uk/fac/sci/statistics/staff/academic/johansen/talks/
MASDOC Statistical Frontiers Seminar
13th January 2011
Part 1 – Statistics & Computation
Problems and Algorithms
A Statistical Problem
Hidden Markov Models / State Space Models
[Figure: graphical model with hidden states x_1, ..., x_6 and observations y_1, ..., y_6]

- Unobserved Markov chain {X_n} with transition density f.
- Observed process {Y_n} with conditional density g.
- Joint density:
  p(x_{1:n}, y_{1:n}) = f_1(x_1) g(y_1|x_1) \prod_{i=2}^{n} f(x_i|x_{i-1}) g(y_i|x_i).
Filtering / Smoothing
- Let X_1, X_2, ... denote the position of an object which follows Markovian dynamics:
  X_n | {X_{n-1} = x_{n-1}} ~ f(·|x_{n-1}).
- Let Y_1, Y_2, ... denote a collection of observations:
  Y_i | {X_i = x_i} ~ g(·|x_i).
- Smoothing: estimate, as observations arrive, p(x_{1:n}|y_{1:n}).
- Filtering: estimate, as observations arrive, p(x_n|y_{1:n}).
- Formal solution:
  p(x_{1:n}|y_{1:n}) = p(x_{1:n-1}|y_{1:n-1}) \frac{f(x_n|x_{n-1}) g(y_n|x_n)}{p(y_n|y_{1:n-1})}.
A Motivating Example: Data
[Figure: example track data, plotted as s_x/km against s_y/km]
Example: Almost Constant Velocity Model
- States: x_n = [s_n^x, u_n^x, s_n^y, u_n^y]^T.
- Dynamics: x_n = A x_{n-1} + ε_n, i.e.
  x_n = \begin{pmatrix} 1 & \Delta t & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & \Delta t \\ 0 & 0 & 0 & 1 \end{pmatrix} x_{n-1} + ε_n.
- Observation: y_n = B x_n + ν_n, i.e.
  \begin{pmatrix} r_n^x \\ r_n^y \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{pmatrix} x_n + ν_n.
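As a concrete illustration, a minimal simulation of this model might look as follows (the time step Δt = 1 and the isotropic Gaussian noise variances q and r are illustrative choices, not values taken from these slides):

    import numpy as np

    def simulate_acv(T=30, dt=1.0, q=0.1, r=1.0, rng=None):
        """Simulate the almost-constant-velocity model.

        State x_n = [s^x, u^x, s^y, u^y]; the noise variances (q, r) and the
        time step dt are illustrative choices only.
        """
        rng = np.random.default_rng() if rng is None else rng
        A = np.array([[1, dt, 0, 0],
                      [0, 1, 0, 0],
                      [0, 0, 1, dt],
                      [0, 0, 0, 1]])
        B = np.array([[1, 0, 0, 0],
                      [0, 0, 1, 0]])
        x = np.zeros(4)
        xs, ys = [], []
        for _ in range(T):
            x = A @ x + np.sqrt(q) * rng.standard_normal(4)   # dynamics noise eps_n
            y = B @ x + np.sqrt(r) * rng.standard_normal(2)   # observation noise nu_n
            xs.append(x.copy())
            ys.append(y)
        return np.array(xs), np.array(ys)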
Sampling Approaches
The Monte Carlo Method
- Given a probability density f and a function ϕ : E → R,
  I = \int_E ϕ(x) f(x) dx.
- Simple Monte Carlo solution:
  - Sample X_1, . . . , X_N i.i.d. from f.
  - Estimate \hat{I} = \frac{1}{N} \sum_{i=1}^N ϕ(X_i).
- Can also be viewed as approximating π(dx) = f(x) dx with
  \hat{π}^N(dx) = \frac{1}{N} \sum_{i=1}^N δ_{X_i}(dx).
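For instance, a minimal Monte Carlo estimate (the target f = N(0, 1) and test function ϕ(x) = x², for which I = 1, are illustrative choices):

    import numpy as np

    rng = np.random.default_rng(0)

    # Estimate I = E_f[phi(X)] with f = N(0, 1) and phi(x) = x**2 (true value 1).
    N = 100_000
    X = rng.standard_normal(N)      # X_1, ..., X_N ~ f i.i.d.
    I_hat = np.mean(X**2)           # (1/N) sum_i phi(X_i)
    print(I_hat)                    # close to 1 for large N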
The Importance–Sampling Identity
- Given g such that
  - f(x) > 0 ⇒ g(x) > 0, and
  - f(x)/g(x) < ∞,
  define w(x) = f(x)/g(x) and:
  \int ϕ(x) f(x) dx = \int ϕ(x) f(x) g(x)/g(x) dx = \int ϕ(x) w(x) g(x) dx.
- This suggests the importance sampling estimator:
  - Sample X_1, . . . , X_N i.i.d. from g.
  - Estimate \hat{I} = \frac{1}{N} \sum_{i=1}^N w(X_i) ϕ(X_i).
- Can also be viewed as approximating π(dx) = f(x) dx with
  \hat{π}^N(dx) = \frac{1}{N} \sum_{i=1}^N w(X_i) δ_{X_i}(dx).
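The corresponding importance sampling estimate, again with illustrative choices of target, proposal and test function:

    import numpy as np

    rng = np.random.default_rng(0)

    # Importance sampling: target f = N(0, 1), proposal g = N(0, 2^2),
    # phi(x) = x**2 (true value of I is 1); all three choices are illustrative.
    N = 100_000
    X = 2.0 * rng.standard_normal(N)          # X_i ~ g
    w = 2.0 * np.exp(-X**2 / 2 + X**2 / 8)    # w(x) = f(x)/g(x); normalising constants cancel to 2
    I_hat = np.mean(w * X**2)                 # unbiased estimate of E_f[phi(X)]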
Importance Sampling Example
[Figure: a one-dimensional importance sampling example on the interval [−4, 4]]
Self-Normalised Importance Sampling
- Often, f is known only up to a normalising constant.
- If v(x) = c f(x)/g(x) = c w(x), then
  \frac{E_g(vϕ)}{E_g(v)} = \frac{E_g(cwϕ)}{E_g(cw)} = \frac{c E_f(ϕ)}{c E_f(1)} = E_f(ϕ).
- Estimate the numerator and denominator with the same sample:
  \hat{I} = \frac{\sum_{i=1}^N v(X_i) ϕ(X_i)}{\sum_{i=1}^N v(X_i)}.
- Biased for finite samples, but consistent.
- Typically reduces variance.
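The same kind of example with the target known only up to a constant, estimated by self-normalised importance sampling (the choices of target and proposal are illustrative):

    import numpy as np

    rng = np.random.default_rng(1)

    # Self-normalised IS: target known only up to a constant, here
    # f(x) ∝ exp(-x**2 / 2) (an unnormalised N(0, 1)); proposal g = N(0, 2^2).
    N = 100_000
    X = 2.0 * rng.standard_normal(N)
    v = np.exp(-X**2 / 2 + X**2 / 8)          # v(x) = c f(x)/g(x); the unknown c is harmless
    I_hat = np.sum(v * X**2) / np.sum(v)      # consistent estimate of E_f[X^2] = 1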
Importance Sampling for Smoothing/Filtering
- Sample {X_{1:n}^{(i)}} at time n from q_n(x_{1:n}), and define
  w_n(x_{1:n}) ∝ \frac{p(x_{1:n}|y_{1:n})}{q_n(x_{1:n})} = \frac{p(x_{1:n}, y_{1:n})}{q_n(x_{1:n}) p(y_{1:n})}
             ∝ \frac{f(x_1) g(y_1|x_1) \prod_{m=2}^n f(x_m|x_{m-1}) g(y_m|x_m)}{q_n(x_{1:n})}.
- Set W_n^{(i)} = w_n(X_{1:n}^{(i)}) / \sum_j w_n(X_{1:n}^{(j)}).
- Then {W_n^{(i)}, X_n^{(i)}} is a consistently weighted sample.
- This seems inefficient.
Sequential Importance Sampling (SIS) I
- Importance weight:
  w_n(x_{1:n}) ∝ \frac{f(x_1) g(y_1|x_1) \prod_{m=2}^n f(x_m|x_{m-1}) g(y_m|x_m)}{q_n(x_{1:n})}
             = \frac{f(x_1) g(y_1|x_1)}{q_n(x_1)} \prod_{m=2}^n \frac{f(x_m|x_{m-1}) g(y_m|x_m)}{q_n(x_m|x_{1:m-1})}.
- Given {W_{n-1}^{(i)}, X_{1:n-1}^{(i)}} targetting p(x_{1:n-1}|y_{1:n-1}):
  - Let q_n(x_{1:n-1}) = q_{n-1}(x_{1:n-1}),
  - sample X_n^{(i)} ~ q_n(·|X_{n-1}^{(i)}) independently for each i.
Sequential Importance Sampling (SIS) II
- And update the weights:
  w_n(x_{1:n}) = w_{n-1}(x_{1:n-1}) \frac{f(x_n|x_{n-1}) g(y_n|x_n)}{q_n(x_n|x_{n-1})},
  W_n^{(i)} = w_n(X_{1:n}^{(i)}) = w_{n-1}(X_{1:n-1}^{(i)}) \frac{f(X_n^{(i)}|X_{n-1}^{(i)}) g(y_n|X_n^{(i)})}{q_n(X_n^{(i)}|X_{n-1}^{(i)})}
            = W_{n-1}^{(i)} \frac{f(X_n^{(i)}|X_{n-1}^{(i)}) g(y_n|X_n^{(i)})}{q_n(X_n^{(i)}|X_{n-1}^{(i)})}.
- If \int p(x_{1:n}|y_{1:n}) dx_n ≈ p(x_{1:n-1}|y_{1:n-1}), this makes sense.
- We only need to store {W_n^{(i)}, X_{n-1:n}^{(i)}}.
- Same computation every iteration (a sketch of the update follows).
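A sketch of a single SIS update in code, assuming user-supplied (hypothetical) callables for the proposal q_n and the densities f and g, and working with log-weights for numerical stability:

    import numpy as np

    def sis_step(X_prev, logW_prev, y, sample_q, log_q, log_f, log_g, rng):
        """One SIS update; the density/proposal arguments are placeholders
        to be supplied by the user of this sketch.

        X_prev:    (N, d) particles X_{n-1}^(i)
        logW_prev: (N,)   log-weights log W_{n-1}^(i)
        """
        X = sample_q(X_prev, y, rng)                 # X_n^(i) ~ q_n(. | X_{n-1}^(i))
        logW = (logW_prev
                + log_f(X, X_prev)                   # log f(x_n | x_{n-1})
                + log_g(y, X)                        # log g(y_n | x_n)
                - log_q(X, X_prev, y))               # minus log q_n(x_n | x_{n-1})
        logW -= np.max(logW)                         # stabilise before normalising
        W = np.exp(logW)
        return X, np.log(W / W.sum())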
Importance Sampling on Huge Spaces Doesn’t Work
- It's said that IS breaks the curse of dimensionality:
  \sqrt{N} \left[ \frac{1}{N} \sum_{i=1}^N w(X_i) ϕ(X_i) − \int ϕ(x) f(x) dx \right] \xrightarrow{d} N(0, Var_g[wϕ]).
- This is true.
- But it's not enough.
- Var_g[wϕ] increases (often exponentially) with dimension.
- Eventually, an SIS estimator (of p(x_{1:n}|y_{1:n})) will fail.
- But p(x_n|y_{1:n}) is a fixed-dimensional distribution.
Sequential Importance Resampling
Resampling: The SIR[esampling] Algorithm
- Problem: variance of the weights builds up over time.
- Solution? Given {W_{n-1}^{(i)}, X_{1:n-1}^{(i)}}:
  1. Resample, to obtain {1/N, \tilde{X}_{1:n-1}^{(i)}}.
  2. Sample X_n^{(i)} ~ q_n(·|\tilde{X}_{n-1}^{(i)}).
  3. Set X_{1:n-1}^{(i)} = \tilde{X}_{1:n-1}^{(i)}.
  4. Set W_n^{(i)} = f(X_n^{(i)}|X_{n-1}^{(i)}) g(y_n|X_n^{(i)}) / q_n(X_n^{(i)}|X_{n-1}^{(i)}).
- And continue as with SIS.
- There is a cost, but this really works (a resampling sketch follows).
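A minimal multinomial resampling step (one of several valid schemes; systematic or residual resampling would also fit here):

    import numpy as np

    def multinomial_resample(X, W, rng):
        """Draw N ancestor indices with probabilities W and return the
        resampled, equally weighted particle set (basic multinomial scheme)."""
        N = len(W)
        idx = rng.choice(N, size=N, p=W / W.sum())
        return X[idx], np.full(N, 1.0 / N)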
Iterations 2-10

[Figures: the particle set at iterations 2 to 10 of the algorithm applied to the running example]
Part 2 – Applied Probability
Feynman-Kac Formulæ
Feynman-Kac Formulæ
- A natural description for measure-valued stochastic processes.
- A model for:
  - Particle motion in absorbing environments.
  - Classes of branching particle system.
  - Simple genetic algorithms.
  - Particle filters and related algorithms.
- Structure of this section:
  - Probabilistic Construction
  - Semigroup[oid] Structure
  - McKean Interpretations
  - Particle Approximations
  - Selected Results
Probabilistic Construction
The Canonical Markov Chain
- Consider the filtered probability space (Ω, \mathcal{F}, \{\mathcal{F}_n\}_{n∈N}, P_μ).
- Let {X_n}_{n∈N} be a Markov chain such that, for any n ∈ N,
  P_μ(X_{1:n} ∈ dx_{1:n}) = μ(dx_1) \prod_{i=2}^n M_i(x_{i-1}, dx_i),
  with X_i : Ω → E_i, μ ∈ P(E_1) and M_i : E_{i-1} → P(E_i).
- (E_i, \mathcal{E}_i) are measurable spaces.
- The X_i are \mathcal{E}_i/\mathcal{F}_i-measurable.
- Using Kolmogorov's/Tulcea's extension theorem there exists a unique process-valued extension.
Some Operator Notation
Given two measurable spaces (E, \mathcal{E}) and (F, \mathcal{F}), a measure μ on (E, \mathcal{E}) and a Markov kernel K : E → P(F), define:
  μ(ϕ_E) := \int μ(dx) ϕ_E(x)
  μK(ϕ_F) := \int μ(dx) K(x, dy) ϕ_F(y),          μK ∈ P(F)
  K(ϕ_F)(x) := \int K(x, dy) ϕ_F(y),              K(ϕ_F) : E → R
with ϕ_E, ϕ_F suitably measurable functions.
Given two functions f, g : E → R, define f · g : E → R via (f · g)(x) = f(x) g(x).
The Feynman-Kac Formulæ
- Given P_μ and potential functions {G_i}_{i∈N}, G_i : E_i → [0, ∞).
- Define two path measures weakly:
  Q_n(ϕ_{1:n}) = \frac{E\left[ϕ_{1:n}(X_{1:n}) \prod_{i=1}^{n-1} G_i(X_i)\right]}{E\left[\prod_{i=1}^{n-1} G_i(X_i)\right]},
  \hat{Q}_n(ϕ_{1:n}) = \frac{E\left[ϕ_{1:n}(X_{1:n}) \prod_{i=1}^{n} G_i(X_i)\right]}{E\left[\prod_{i=1}^{n} G_i(X_i)\right]},
  where ϕ_{1:n} : ⊗_{i=1}^n E_i → R.
Example (Filtering via FK Formulæ: Prediction)
- Let μ(x_1) = f(x_1) and M_n(x_{n-1}, dx_n) = f(x_n|x_{n-1}) dx_n.
- Let G_n(x_n) = g(y_n|x_n).
- Then:
  Q_n(ϕ_{1:n}) = E\left[ϕ_{1:n}(X_{1:n}) \prod_{i=1}^{n-1} G_i(X_i)\right] \Big/ E\left[\prod_{i=1}^{n-1} G_i(X_i)\right]
              = E\left[ϕ_{1:n}(X_{1:n}) \prod_{i=1}^{n-1} g(y_i|X_i)\right] \Big/ E\left[\prod_{i=1}^{n-1} g(y_i|X_i)\right]
              = \frac{\int f(x_1) \prod_{i=2}^n f(x_i|x_{i-1}) \prod_{j=1}^{n-1} g(y_j|x_j) \, ϕ_{1:n}(x_{1:n}) \, dx_{1:n}}{\int f(x_1) \prod_{i=2}^n f(x_i|x_{i-1}) \prod_{j=1}^{n-1} g(y_j|x_j) \, dx_{1:n}}
              = \int p(x_{1:n}|y_{1:n-1}) ϕ_{1:n}(x_{1:n}) dx_{1:n}.
Example (Filtering via FK Formulæ: Update/Filtering)
- Whilst:
  \hat{Q}_n(ϕ_{1:n}) = E\left[ϕ_{1:n}(X_{1:n}) \prod_{i=1}^{n} G_i(X_i)\right] \Big/ E\left[\prod_{i=1}^{n} G_i(X_i)\right]
                    = E\left[ϕ_{1:n}(X_{1:n}) \prod_{i=1}^{n} g(y_i|X_i)\right] \Big/ E\left[\prod_{i=1}^{n} g(y_i|X_i)\right]
                    = \frac{\int f(x_1) \prod_{i=2}^n f(x_i|x_{i-1}) \prod_{j=1}^{n} g(y_j|x_j) \, ϕ_{1:n}(x_{1:n}) \, dx_{1:n}}{\int f(x_1) \prod_{i=2}^n f(x_i|x_{i-1}) \prod_{j=1}^{n} g(y_j|x_j) \, dx_{1:n}}
                    = \int p(x_{1:n}|y_{1:n}) ϕ_{1:n}(x_{1:n}) dx_{1:n}.
Feynman-Kac Marginal Measures
We are typically interested in the marginals:
  γ_n(ϕ_n) = E\left[ϕ_n(X_n) \prod_{i=1}^{n-1} G_i(X_i)\right],    \hat{γ}_n(ϕ_n) = E\left[ϕ_n(X_n) \prod_{i=1}^{n} G_i(X_i)\right],
  η_n(ϕ_n) = Q_n(1_{1:n-1} ⊗ ϕ_n) = \frac{E\left[ϕ_n(X_n) \prod_{i=1}^{n-1} G_i(X_i)\right]}{E\left[\prod_{i=1}^{n-1} G_i(X_i)\right]} = γ_n(ϕ_n)/γ_n(1),
  \hat{η}_n(ϕ_n) = \hat{Q}_n(1_{1:n-1} ⊗ ϕ_n) = \frac{E\left[ϕ_n(X_n) \prod_{i=1}^{n} G_i(X_i)\right]}{E\left[\prod_{i=1}^{n} G_i(X_i)\right]} = \hat{γ}_n(ϕ_n)/\hat{γ}_n(1).

Key property:
  η_n(A_n) = \int_{E_1 × ... × E_{n-1} × A_n} Q_n(dx_{1:n}),    \hat{η}_n(A_n) = \int_{E_1 × ... × E_{n-1} × A_n} \hat{Q}_n(dx_{1:n}).
Analysis: Semigroup Structure
How do the marginal distributions evolve?
Some Recursive Relationships
- The unnormalized marginals obey:
  \hat{γ}_n(ϕ_n) = γ_n(ϕ_n · G_n),    γ_n(ϕ_n) = \hat{γ}_{n-1} M_n(ϕ_n).
- Whilst the normalized marginals satisfy:
  \hat{η}_n(ϕ_n) = \frac{\hat{γ}_n(ϕ_n)}{\hat{γ}_n(1)} = \frac{γ_n(ϕ_n · G_n)}{γ_n(G_n)} = \frac{η_n(ϕ_n · G_n)}{η_n(G_n)},
  η_n(ϕ_n) = \frac{γ_n(ϕ_n)}{γ_n(1)} = \frac{\hat{γ}_{n-1} M_n(ϕ_n)}{\hat{γ}_{n-1} M_n(1)} = \frac{\hat{η}_{n-1} M_n(ϕ_n)}{\hat{η}_{n-1} M_n(1)} = \hat{η}_{n-1} M_n(ϕ_n).
The Boltzmann-Gibbs Operator
- Given ν ∈ P(E) and G : E → R:
  Ψ_G : P(E) → P(E),   ν → Ψ_G(ν).
- The Boltzmann-Gibbs operator Ψ_G is defined weakly by:
  ∀ϕ ∈ C_b :   Ψ_G(ν)(ϕ) = \frac{ν(G · ϕ)}{ν(G)},
- or equivalently, for all measurable sets A:
  Ψ_G(ν)(A) = \frac{ν(G · I_A)}{ν(G)} = \frac{\int_A ν(dx) G(x)}{\int_E ν(dx') G(x')}.
Example (Boltzmann-Gibbs Operators and Bayes’ Rule)
- Let μ(dx) = f(x) λ(dx) be a prior measure.
- Let G(x) = g(y|x) be the likelihood.
- Then:
  Ψ_G(μ)(ϕ) = \frac{μ(G · ϕ)}{μ(G)} = \frac{\int μ(dx) G(x) ϕ(x)}{\int μ(dx') G(x')}
           = \frac{\int f(x) g(y|x) ϕ(x) λ(dx)}{\int f(x') g(y|x') λ(dx')}
           = \int f(x|y) ϕ(x) λ(dx),
  with f(x|y) := \frac{f(x) g(y|x)}{\int f(x) g(y|x) λ(dx)}.
- So: Ψ_{g(y|·)} : Prior → Posterior.
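On an empirical (particle) measure, applying Ψ_G simply reweights each atom by its potential value; a small sketch (the function name is illustrative, not from the slides):

    import numpy as np

    def boltzmann_gibbs(X, W, G):
        """Apply the Boltzmann-Gibbs operator Psi_G to the empirical measure
        sum_i W_i delta_{X_i}: each atom is reweighted by its potential value."""
        W_new = W * G(X)
        return X, W_new / W_new.sum()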
Markov Semigroups
- A semigroup S comprises:
  - A set, S.
  - An associative binary operation, ·.
- A Markov chain with homogeneous transition M has dynamic semigroup M_n:
  - M_0(x, A) = δ_x(A).
  - M_1(x, A) = M(x, A).
  - M_n(x, A) = \int M(x, dy) M_{n-1}(y, A).
  - (M_n · M_m)(x, A) = \int M_n(x, dy) M_m(y, A) = M_{n+m}(x, A).
- A linear semigroup.
- Key property:
  P(X_{n+m} ∈ A | X_m = x) = M_n(x, A).
Markov Semigroupoids
- A semigroupoid S' comprises:
  - A set, S.
  - A partial associative binary operation, ·.
- A Markov chain with inhomogeneous transitions M_n has dynamic semigroupoid M_{p:q}:
  - M_{p:p}(x, A) = δ_x(A).
  - M_{p:p+1}(x, A) = M_{p+1}(x, A).
  - M_{p:q}(x, A) = \int M_{p+1}(x, dy) M_{p+1:q}(y, A).
  - (M_{p:q} · M_{q:r})(x, A) = \int M_{p:q}(x, dy) M_{q:r}(y, A) = M_{p:r}(x, A).
- A linear semigroupoid.
- Key property:
  P(X_{n+m} ∈ A | X_m = x) = M_{m:n+m}(x, A).
An Unnormalized Feynman-Kac Semigroupoid
- We previously established:
  γ_n = \hat{γ}_{n-1} M_n,   \hat{γ}_n(ϕ_n) = γ_n(ϕ_n · G_n).
- Defining Q_p(x_{p-1}, dx_p) = M_p(x_{p-1}, dx_p) G_p(x_p), we obtain γ_n = γ_{n-1} Q_n.
- We can construct the dynamic semigroupoid Q_{p:q}:
  - Q_{p:p}(x, A) = δ_x(A).
  - Q_{p:p+1}(x, A) = Q_{p+1}(x, A).
  - Q_{p:q}(x, A) = \int Q_{p+1}(x, dy) Q_{p+1:q}(y, A).
  - (Q_{p:q} · Q_{q:r})(x, A) = \int Q_{p:q}(x, dy) Q_{q:r}(y, A) = Q_{p:r}(x, A).
- This is just a Markov semigroupoid for general (unnormalised) measures: ∀p ≤ q : γ_q = γ_p Q_{p:q}.
A Normalised Feynman-Kac Semigroupoid
- We previously established:
  η_n(ϕ_n) = \hat{η}_{n-1} M_n(ϕ_n),   \hat{η}_n(ϕ_n) = \frac{η_n(ϕ_n · G_n)}{η_n(G_n)}.
- From the definition of Ψ_{G_n}: \hat{η}_n = Ψ_{G_n}(η_n).
- Defining Φ_n : P(E_{n-1}) → P(E_n) as
  Φ_n : η_{n-1} → Ψ_{G_{n-1}}(η_{n-1}) M_n,
  we have the recursion η_n = Φ_n(η_{n-1}) and the nonlinear semigroupoid Φ_{p:q}:
  - Φ_{p:p}(η_p) = η_p.
  - Φ_{p:p+1}(η_p) = Φ_{p+1}(η_p).
  - Φ_{p:q}(η_p) = Φ_{p+1:q}(Φ_{p+1}(η_p)) for q > p + 1.
  - (Φ_{p:q} · Φ_{q:r})(η_p) = Φ_{q:r}(Φ_{p:q}(η_p)) = Φ_{p:r}(η_p).
- Again: ∀p ≤ q : η_q = Φ_{p:q}(η_p).
McKean Interpretations
Microscopic mass transport.
McKean Interpretations of Feynman-Kac Formulæ
- Families of Markov kernels consistent with the FK marginals.
- A collection {K_{n,η}}_{n∈N, η∈P(E_{n-1})} is a McKean interpretation if:
  ∀n ∈ N : η_n = Φ_n(η_{n-1}) = η_{n-1} K_{n,η_{n-1}}.
- Not unique. . . and not linear.
- A selection/mutation approach seems natural:
  - Choose S_{n,η} such that η S_{n,η} = Ψ_{G_n}(η).
  - Set K_{n+1,η} = S_{n,η} M_{n+1}.
- Still not unique; for instance:
  - S_{n,η}(x_n, ·) = Ψ_{G_n}(η)(·), or
  - S_{n,η}(x_n, ·) = ε_n G_n(x_n) δ_{x_n}(·) + (1 − ε_n G_n(x_n)) Ψ_{G_n}(η)(·).
Particle Interpretations
Stochastic discretisations.
Particle Interpretations of Feynman-Kac Formulæ I
Given a McKean interpretation, we can attach an N-particle model.
- Denote ξ_n^{(N)} = (ξ_n^{(N,1)}, ξ_n^{(N,2)}, . . . , ξ_n^{(N,N)}) ∈ E_n^N.
- Allow (Ω^N, F^N = (F_n^N)_{n∈N}, ξ^{(N)}, P_{η_0}^N) to indicate a particle-set-valued Markov chain.
- Let η_{n-1}^{(N)} = \frac{1}{N} \sum_{i=1}^N δ_{ξ_{n-1}^{(N,i)}}.
- Allow the elementary transitions to be:
  P(ξ_n^{(N)} ∈ dξ_n^{(N)} | ξ_{n-1}^{(N)}) = \prod_{p=1}^N K_{n,η_{n-1}^{(N)}}(ξ_{n-1}^{(N,p)}, dξ_n^{(N,p)}).
Particle Interpretations of Feynman-Kac Formulæ II
- Consider K_{n,η} = S_{n-1,η} M_n:
  P(ξ_n^{(N)} ∈ dξ_n^{(N)} | ξ_{n-1}^{(N)}) = \prod_{p=1}^N S_{n-1,η_{n-1}^{(N)}} M_n(ξ_{n-1}^{(N,p)}, dξ_n^{(N,p)}).
- Defining:
  S_{n-1}^{(N)}(ξ_{n-1}^{(N)}, d\hat{ξ}_{n-1}^{(N)}) = \prod_{p=1}^N S_{n-1,η_{n-1}^{(N)}}(ξ_{n-1}^{(N,p)}, d\hat{ξ}_{n-1}^{(N,p)}),
  M_n^{(N)}(\hat{ξ}_{n-1}^{(N)}, dξ_n^{(N)}) = \prod_{p=1}^N M_n(\hat{ξ}_{n-1}^{(N,p)}, dξ_n^{(N,p)}),
  it is clear that:
  P(ξ_n^{(N)} ∈ dξ_n^{(N)} | ξ_{n-1}^{(N)}) = \int_{E_{n-1}^N} S_{n-1}^{(N)}(ξ_{n-1}^{(N)}, d\hat{ξ}_{n-1}^{(N)}) M_n^{(N)}(\hat{ξ}_{n-1}^{(N)}, dξ_n^{(N)}).
Selection, Mutation and Structure
- A suggestive structural similarity:

  η_{n-1} ∈ P(E_{n-1})   --S_{n-1,η_{n-1}}-->   \hat{η}_{n-1} ∈ P(E_{n-1})   --M_n-->   η_n ∈ P(E_n)
  ξ_{n-1}^{(N)} ∈ E_{n-1}^N   --Select-->   \hat{ξ}_{n-1}^{(N)} ∈ E_{n-1}^N   --Mutate-->   ξ_n^{(N)} ∈ E_n^N

- Selection:
  S_{n-1,η_{n-1}^{(N)}} = Ψ_{G_{n-1}}(η_{n-1}^{(N)}) = \sum_{i=1}^N \frac{G_{n-1}(ξ_{n-1}^{(N,i)})}{\sum_{j=1}^N G_{n-1}(ξ_{n-1}^{(N,j)})} δ_{ξ_{n-1}^{(N,i)}},
  with \hat{ξ}_{n-1}^{(N,i)} drawn i.i.d. from Ψ_{G_{n-1}}(η_{n-1}^{(N)}).
- Mutation:
  ξ_n^{(N,i)} ~ M_n(\hat{ξ}_{n-1}^{(N,i)}, ·), independently for each i.
- Semigroupoid:
  P^N(ξ_n^{(N)} ∈ dx_n^{(N)} | ξ_{n-1}^{(N)}) = \prod_{i=1}^N Φ_n(η_{n-1}^{(N)})(dx_n^{(N,i)}).
Selected Results
Local Error Decomposition
  η_1       → η_2 = Φ_2(η_1)  → η_3 = Φ_{1:3}(η_1) → ... → Φ_{1:n}(η_1)
   ⇓
  η_1^N     → Φ_2(η_1^N)      → Φ_{1:3}(η_1^N)     → ... → Φ_{1:n}(η_1^N)
   ⇓
  η_2^N     → Φ_3(η_2^N)                           → ... → Φ_{2:n}(η_2^N)
   ⇓
  η_3^N                                            → ... → Φ_{3:n}(η_3^N)
   ⋮
  η_{n-1}^N → Φ_n(η_{n-1}^N)
   ⇓
  η_n^N

Formally: η_n^N − η_n = \sum_{i=1}^n \left[ Φ_{i,n}(η_i^N) − Φ_{i,n}(Φ_i(η_{i-1}^N)) \right].
A Key Martingale
Proposition (Del Moral, 2004, Proposition 7.4.1)
For each n ≥ 0 and ϕ_n ∈ C_b(E_n), define:
  Γ_{·,n}^N(ϕ_n) : p ∈ {1, . . . , n} → Γ_{p,n}^N(ϕ_n),
  Γ_{p,n}^N(ϕ_n) := γ_p^N(Q_{p,n} ϕ_n) − γ_p(Q_{p,n} ϕ_n).
For any p ≤ n, Γ_{·,n}^N(ϕ_n) has the F^N-martingale decomposition:
  Γ_{p,n}^N(ϕ_n) = \sum_{q=1}^p γ_q^N(1) \left[ η_q^N(Q_{q,n} ϕ_n) − η_{q-1}^N K_{q,η_{q-1}^N}(Q_{q,n} ϕ_n) \right],
with predictable quadratic variation
  \langle Γ_{·,n}^N \rangle_p = \sum_{q=1}^p γ_q^N(1)^2 \, η_{q-1}^N K_{q,η_{q-1}^N}\left( \left[ Q_{q,n}(ϕ_n) − η_{q-1}^N K_{q,η_{q-1}^N} Q_{q,n}(ϕ_n) \right]^2 \right).
Normalizing the Unnormalized
  η_n^N(ϕ_n) − η_n(ϕ_n) = \frac{γ_n^N(ϕ_n)}{γ_n^N(1)} − \frac{γ_n(ϕ_n)}{γ_n(1)}
    = \frac{γ_n(1)}{γ_n^N(1)} \left[ \frac{γ_n^N(ϕ_n)}{γ_n(1)} − \frac{γ_n(ϕ_n)}{γ_n(1)} × \frac{γ_n^N(1)}{γ_n(1)} \right]
    = \frac{γ_n(1)}{γ_n^N(1)} \left[ \frac{γ_n^N(ϕ_n)}{γ_n(1)} − η_n(ϕ_n) × \frac{γ_n^N(1)}{γ_n(1)} \right]
    = \frac{γ_n(1)}{γ_n^N(1)} \, γ_n^N\left( \frac{ϕ_n − η_n(ϕ_n)}{γ_n(1)} \right).
Law of Large Numbers and Weak Convergence
Theorem (Del Moral 2004: Theorem 7.4.4)
Under regularity conditions, for any n ≥ 1, p ≥ 1 and ϕ_n ∈ C_b(E_n):
  \sqrt{N} \, E\left[ |η_n^N(ϕ_n) − η_n(ϕ_n)|^p \right]^{1/p} ≤ c_{p,n} ||ϕ_n||_∞.
By a Borel-Cantelli argument:
  η_n^N(ϕ_n) → η_n(ϕ_n) almost surely as N → ∞.
Central Limit Theorem
Proposition (Del Moral 2004: Proposition 9.4.2)
Under regularity conditions, for any n ≥ 1:
  \sqrt{N} (η_n^N(ϕ_n) − η_n(ϕ_n)) \xrightarrow{d} N(0, σ_n^2(ϕ_n)),
where
  σ_n^2(ϕ_n) = \sum_{q=1}^n η_{q-1} K_{q,η_{q-1}}\left( \left[ Q_{q,n}(ϕ_n) − K_{q,η_{q-1}}(Q_{q,n}(ϕ_n)) \right]^2 \right).
Part 3 – Interface
Particle Filters as McKean Interpretations
The Bootstrap Particle Filter
The Simplest Case
Recall: The SIR Particle Filter
- At iteration n, given {W_{n-1}^{(i)}, X_{1:n-1}^{(i)}}:
  1. Resample, to obtain {1/N, \tilde{X}_{1:n-1}^{(i)}}.   [Selection]
  2. Sample X_n^{(i)} ~ q_n(·|\tilde{X}_{n-1}^{(i)}).   [Mutation]
  3. Set X_{1:n-1}^{(i)} = \tilde{X}_{1:n-1}^{(i)}.
  4. Set W_n^{(i)} = f(X_n^{(i)}|X_{n-1}^{(i)}) g(y_n|X_n^{(i)}) / q_n(X_n^{(i)}|X_{n-1}^{(i)}).
- Feynman-Kac formulation?
  - Generally W_n^{(i)} depends upon X_{n-1}^{(i)}.
  - (At least) 2 solutions exist.
The Bootstrap SIR Filter
- The bootstrap particle filter (Gordon, Salmond and Smith, 1993):
  - Proposal: q(x_{t-1}, x_t) = f(x_t|x_{t-1}).
  - Weight: w(x_t) ∝ g(y_t|x_t).
- Feynman-Kac model:
  - Mutation: M_t(x_{t-1}, dx_t) = f(x_t|x_{t-1}) dx_t.
  - Potential: G_t(x_t) = g(y_t|x_t).
- McKean interpretation:
  - McKean transitions: K_{n+1,η} = S_{n,η} M_{n+1}.
  - Selection operation: S_{n,η} = Ψ_{G_n}(η).
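A minimal, generic sketch of the resulting bootstrap filter (the function arguments and the choice of multinomial resampling are illustrative; it could be instantiated, for example, with the almost constant velocity model above):

    import numpy as np

    def bootstrap_filter(ys, sample_f, log_g, sample_init, N=1000, rng=None):
        """A minimal bootstrap particle filter sketch.

        sample_f(X, rng):    draw X_n ~ f(.|X_{n-1}) for each row of X
        log_g(y, X):         evaluate log g(y|x) for each row of X
        sample_init(N, rng): draw N particles from the initial distribution
        Returns particle estimates of the filtering means E[X_n | y_{1:n}].
        """
        rng = np.random.default_rng() if rng is None else rng
        X = sample_init(N, rng)
        means = []
        for y in ys:
            # Mutation: propose from the prior dynamics f (the bootstrap choice)
            X = sample_f(X, rng)
            # Weight: potential G_n(x) = g(y_n|x)
            logw = log_g(y, X)
            w = np.exp(logw - logw.max())
            w /= w.sum()
            means.append(w @ X)
            # Selection: multinomial resampling, back to equal weights
            X = X[rng.choice(N, size=N, p=w)]
        return np.array(means)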
Bootstrap Particle Filter Results
LLN:
  \frac{\sum_{i=1}^N W_n^i ϕ_n(X_n^i)}{\sum_{j=1}^N W_n^j} \xrightarrow{a.s.} \int ϕ_n(x_n) p(x_n|y_{1:n}) dx_n   as N → ∞.

CLT:
  \sqrt{N} \left( \frac{\sum_{i=1}^N W_n^i ϕ_n(X_n^i)}{\sum_{j=1}^N W_n^j} − \int ϕ_n(x_n) p(x_n|y_{1:n}) dx_n \right) \xrightarrow{d} N(0, σ_{BS,n}^2(ϕ_n)).
Bootstrap Particle Filter: Asymptotic Variance
  σ_{BS,n}^2(ϕ_n) = \int \frac{p(x_1|y_{1:n})^2}{p(x_1)} \left[ \int ϕ_n(x_n) p(x_n|y_{2:n}, x_1) dx_n − \bar{ϕ}_n \right]^2 dx_1
    + \sum_{k=2}^{n-1} \int \frac{p(x_k|y_{1:n})^2}{p(x_k|y_{1:k-1})} \left[ \int ϕ_n(x_n) p(x_n|y_{k+1:n}, x_k) dx_n − \bar{ϕ}_n \right]^2 dx_k
    + \int \frac{p(x_n|y_{1:n})^2}{p(x_n|y_{1:n-1})} (ϕ_n(x_n) − \bar{ϕ}_n)^2 dx_n,
with
  \bar{ϕ}_n = \int p(x_n|y_{1:n}) ϕ_n(x_n) dx_n.
Extended Spaces: General SIR Particle Filter
- At iteration n, given {W_{n-1}^{(i)}, X_{1:n-1}^{(i)}}:
  1. Resample, to obtain {1/N, \tilde{X}_{1:n-1}^{(i)}}.   [Selection]
  2. Sample X_n^{(i)} ~ q_n(·|\tilde{X}_{n-1}^{(i)}).   [Mutation]
  3. Set X_{1:n-1}^{(i)} = \tilde{X}_{1:n-1}^{(i)}.
  4. Set W_n^{(i)} = f(X_n^{(i)}|X_{n-1}^{(i)}) g(y_n|X_n^{(i)}) / q_n(X_n^{(i)}|X_{n-1}^{(i)}).
- But W_n^{(i)} depends upon X_{n-1}^{(i)}.
- Let \tilde{E}_n = E_{n-1} × E_n.
- Define Y_n = (X_{n-1}, X_n).
- Now W_n = \tilde{G}_n(Y_n).
- Set \tilde{M}_n(y_{n-1}, dy_n) = δ_{y_{n-1,2}}(dy_{n,1}) q(y_{n-1,2}, dy_{n,2}).
- A Feynman-Kac representation.
SIR Asymptotic Variance
  σ_{SIR,n}^2(ϕ_n) = \int \frac{p(x_1|y_{1:n})^2}{q_1(x_1)} \left[ \int ϕ_n(x_n) p(x_n|y_{2:n}, x_1) dx_n − \bar{ϕ}_n \right]^2 dx_1
    + \sum_{k=2}^{n-1} \int \frac{p(x_{1:k}|y_{1:n})^2}{p(x_{1:k-1}|y_{1:k-1}) q_k(x_k|x_{k-1})} \left[ \int ϕ_n(x_n) p(x_n|y_{k+1:n}, x_k) dx_n − \bar{ϕ}_n \right]^2 dx_{1:k}
    + \int \frac{p(x_{1:n}|y_{1:n})^2}{p(x_{1:n-1}|y_{1:n-1}) q_n(x_n|x_{n-1})} (ϕ_n(x_n) − \bar{ϕ}_n)^2 dx_{1:n}.
Auxiliary Particle Filters
Another algorithm.
Auxiliary [Variable] Particle Filters (Pitt & Shephard '99)

If we have access to the next observation before resampling, we could use this structure:
- Pre-weight every particle with λ_n^{(i)} ∝ \hat{p}(y_n|X_{n-1}^{(i)}).
- Propose new states from the mixture distribution
  \sum_{i=1}^N λ_n^{(i)} q(·|X_{n-1}^{(i)}) \Big/ \sum_{i=1}^N λ_n^{(i)}.
- Weight samples, correcting for the pre-weighting:
  W_n^i ∝ \frac{f(X_{n,n}^i|X_{n,n-1}^i) g(y_n|X_{n,n}^i)}{λ_n^i q_n(X_{n,n}^i|X_{n,1:n-1}^i)}.
- Resample the particle set.
Some Well Known Refinements
We can tidy things up a bit:
1. The auxiliary variable step is equivalent to multinomial resampling.
2. So, there's no need to resample before the pre-weighting.
Now we have (a code sketch follows):
- Pre-weight every particle with λ_n^{(i)} ∝ \hat{p}(y_n|X_{n-1}^{(i)}).
- Resample.
- Propose new states.
- Weight samples, correcting for the pre-weighting:
  W_n^i ∝ \frac{f(X_{n,n}^i|X_{n,n-1}^i) g(y_n|X_{n,n}^i)}{λ_n^i q_n(X_{n,n}^i|X_{n,1:n-1}^i)}.
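A sketch of one step of this refined APF, again with hypothetical user-supplied callables for the approximation p̂, the proposal q_n and the densities f and g, and log-weights for numerical stability:

    import numpy as np

    def apf_step(X, logW, y_next, log_p_hat, sample_q, log_f, log_g, log_q, rng):
        """One refined-APF step (a sketch; all density/proposal arguments are
        placeholders). log_p_hat(y, X) approximates log p(y_n | x_{n-1}) and is
        used only for pre-weighting; its effect is corrected in the weights."""
        N = len(logW)
        # Pre-weight with lambda_n^(i) ∝ p_hat(y_n | X_{n-1}^(i)), then resample
        log_lam = log_p_hat(y_next, X)
        logv = logW + log_lam
        v = np.exp(logv - logv.max())
        v /= v.sum()
        idx = rng.choice(N, size=N, p=v)
        X_prev, log_lam = X[idx], log_lam[idx]
        # Propose new states and correct for the pre-weighting
        X_new = sample_q(X_prev, y_next, rng)
        logW_new = (log_f(X_new, X_prev) + log_g(y_next, X_new)
                    - log_lam - log_q(X_new, X_prev, y_next))
        return X_new, logW_new - np.max(logW_new)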
A Feynman-Kac Interpretation of the APF
A transition and a potential.

An Interpretation of the APF

If we move the first step at time n + 1 to the last at time n, we get:
- Resample.
- Propose new states.
- Weight samples, correcting the earlier pre-weighting.
- Pre-weight every particle with λ_{n+1}^{(i)} ∝ \hat{p}(y_{n+1}|X_n^{(i)}).
This is an SIR algorithm targetting:
  \hat{p}_n(x_{1:n}|y_{1:n+1}) ∝ p(x_{1:n}|y_{1:n}) \hat{p}(y_{n+1}|x_n).
Some Consequences
Asymptotics.
Theoretical Considerations
- Direct analysis of the APF is largely unnecessary.
- Results can be obtained by considering the associated SIR algorithm.
- SIR has a (discrete time) Feynman-Kac interpretation.
For example. . .
Proposition. Under standard regularity conditions,
  \sqrt{N} \left( \hat{ϕ}_{n,APF}^N − \bar{ϕ}_n \right) \xrightarrow{d} N(0, σ_n^2(ϕ_n)),
where
  σ_n^2(ϕ_n) = \int \frac{p(x_1|y_{1:n})^2}{q_1(x_1)} \left[ \int ϕ_n(x_n) p(x_n|y_{2:n}, x_1) dx_n − \bar{ϕ}_n \right]^2 dx_1
    + \sum_{k=2}^{n-1} \int \frac{p(x_{1:k}|y_{1:n})^2}{\hat{p}(x_{1:k-1}|y_{1:k}) q_k(x_k|x_{k-1})} \left[ \int ϕ_n(x_n) p(x_n|y_{k+1:n}, x_k) dx_n − \bar{ϕ}_n \right]^2 dx_{1:k}
    + \int \frac{p(x_{1:n}|y_{1:n})^2}{\hat{p}(x_{1:n-1}|y_{1:n}) q_n(x_n|x_{n-1})} (ϕ_n(x_n) − \bar{ϕ}_n)^2 dx_{1:n}.
Practical Implications
- It means we're doing importance sampling.
- Choosing \hat{p}(y_n|x_{n-1}) = p(y_n | x_n = E[X_n|x_{n-1}]) is dangerous.
- A safer choice would be to ensure that
  \sup_{x_{n-1},x_n} \frac{g(y_n|x_n) f(x_n|x_{n-1})}{\hat{p}(y_n|x_{n-1}) q(x_n|x_{n-1})} < ∞.
- Using the APF doesn't ensure superior performance.
A Contrived Illustration
Consider the following binary state-space model with common state and observation spaces:
  X = {0, 1},   Y = X,
  p(x_1 = 0) = 0.5,   p(x_n = x_{n-1}) = 1 − δ,   p(y_n = x_n) = 1 − ε.
- δ controls the ergodicity of the state process.
- ε controls the information contained in the observations.
Consider estimating E(X_2 | Y_{1:2} = (0, 1)); for this model the target can be computed exactly, as in the sketch below.
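Since the state space is finite, this expectation can be computed exactly by enumeration; a small sketch (illustrative, not part of the slides):

    import itertools

    def exact_posterior_mean(delta, eps, y=(0, 1)):
        """Exact E(X_2 | Y_{1:2} = y) for the binary model, by enumeration;
        a useful ground truth when comparing SIR and APF estimator variances."""
        def trans(a, b): return 1 - delta if a == b else delta
        def lik(x, yy):  return 1 - eps if x == yy else eps
        num = den = 0.0
        for x1, x2 in itertools.product([0, 1], repeat=2):
            p = 0.5 * trans(x1, x2) * lik(x1, y[0]) * lik(x2, y[1])
            num += x2 * p
            den += p
        return num / den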
Variance of SIR − Variance of APF

[Figure: surface plot of the difference between the SIR and APF estimator variances over a grid of (ε, δ) values]

Variance Comparison at ε = 0.25

[Figure: empirical and asymptotic variances of the SISR and APF estimators as functions of δ]
Further Reading
Particle Filters / Sequential Monte Carlo
- Novel approach to nonlinear/non-Gaussian Bayesian state estimation, Gordon, Salmond and Smith, IEE Proceedings-F, 140(2):107-113, 1993.
- Sequential Monte Carlo Methods in Practice, Doucet, de Freitas & Gordon (eds.), Springer, 2001.
- A survey of convergence results on particle filtering methods for practitioners, Crisan and Doucet, IEEE Transactions on Signal Processing, 50(3):736-746, 2002.
- Central limit theorem for sequential Monte Carlo methods and its application to Bayesian inference, Chopin, Annals of Statistics, 32(6):2385-2411, 2004.
- A Tutorial on Particle Filtering. . . , Doucet & J., in Oxford Handbook of Nonlinear Filtering, 2010 (to appear).
- Sequential Monte Carlo Samplers, Del Moral, Doucet & Jasra, JRSS B, 68(3):411-436, 2006.

Feynman-Kac Particle Models
- Particle Methods: An Introduction with Applications, Del Moral & Doucet, HAL INRIA Report RR-6691, 2009.
- Feynman-Kac Formulæ. . . , Del Moral, Springer, 2004.

Auxiliary Particle Filters
- Filtering Via Simulation: Auxiliary Particle Filters, Pitt & Shephard, JASA, 94(446):590-599, 1999.
- A Note on Auxiliary Particle Filters, J. & Doucet, Stat. & Prob. Lett., 78(12):1498-1504, 2008.
- Optimality of the APF, Douc, Moulines & Olsson, Prob. & Math. Stat., 29(1):1-28, 2009.