Interacting Particle Systems and (Discrete Time) Filtering

Adam M. Johansen
a.m.johansen@warwick.ac.uk
http://www2.warwick.ac.uk/fac/sci/statistics/staff/academic/johansen/talks/

MASDOC Statistical Frontiers Seminar
1st February 2013
Part 1 – A Statistical Problem
Filtering
Filtering: The Problem
Hidden Markov Models / State Space Models
[Graphical model: hidden states x_1, ..., x_6 with corresponding observations y_1, ..., y_6.]

- Unobserved Markov chain {X_n} with transition density f.
- Observed process {Y_n} with conditional density g.
- Joint density:

  p(x_{1:n}, y_{1:n}) = f_1(x_1) g(y_1|x_1) \prod_{i=2}^{n} f(x_i|x_{i-1}) g(y_i|x_i).
Filtering / Smoothing
- Let X_1, X_2, ... denote the position of an object which follows Markovian dynamics:

  X_n | \{X_{n-1} = x_{n-1}\} \sim f(\cdot|x_{n-1}).

- Let Y_1, Y_2, ... denote a collection of observations:

  Y_i | \{X_i = x_i\} \sim g(\cdot|x_i).

- Smoothing: estimate, as observations arrive, p(x_{1:n}|y_{1:n}).
- Filtering: estimate, as observations arrive, p(x_n|y_{1:n}).
- Formal solution:

  p(x_{1:n}|y_{1:n}) = p(x_{1:n-1}|y_{1:n-1}) \frac{f(x_n|x_{n-1}) g(y_n|x_n)}{p(y_n|y_{1:n-1})}.
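When the state space is finite, this recursion can be evaluated exactly. The minimal Python/NumPy sketch below (the two-state transition matrix, likelihood values and observation sequence are illustrative placeholders, not from the slides) alternates the prediction and Bayes-update steps to compute p(x_n|y_{1:n}).

```python
import numpy as np

def finite_state_filter(mu, P, lik, ys):
    """Exact filtering for a finite-state hidden Markov model.

    mu  : initial distribution over the K states
    P   : transition matrix, P[i, j] = f(x_n = j | x_{n-1} = i)
    lik : lik(y, j) = g(y | x_n = j)
    ys  : observation sequence
    Returns p(x_n | y_{1:n}) for each n."""
    filt = []
    pred = np.asarray(mu, dtype=float)            # p(x_1)
    for y in ys:
        post = pred * np.array([lik(y, j) for j in range(len(pred))])
        post /= post.sum()                        # update: p(x_n | y_{1:n})
        filt.append(post)
        pred = post @ P                           # predict: p(x_{n+1} | y_{1:n})
    return np.array(filt)

# Illustrative two-state example (all values are placeholders).
P = np.array([[0.9, 0.1],
              [0.2, 0.8]])
lik = lambda y, j: 0.8 if y == j else 0.2
print(finite_state_filter([0.5, 0.5], P, lik, [0, 0, 1]))
```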
A Motivating Example: Data
[Figure: observed positions in the plane; horizontal axis s_x/km (roughly 0 to 35), vertical axis s_y/km (roughly -10 to 30).]
Example: Almost Constant Velocity Model
- States: x_n = [s^x_n\ u^x_n\ s^y_n\ u^y_n]^T (positions and velocities in each coordinate).
- Dynamics: x_n = A x_{n-1} + \epsilon_n:

  \begin{bmatrix} s^x_n \\ u^x_n \\ s^y_n \\ u^y_n \end{bmatrix}
  =
  \begin{bmatrix} 1 & \Delta t & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & \Delta t \\ 0 & 0 & 0 & 1 \end{bmatrix}
  \begin{bmatrix} s^x_{n-1} \\ u^x_{n-1} \\ s^y_{n-1} \\ u^y_{n-1} \end{bmatrix}
  + \epsilon_n

- Observation: y_n = B x_n + \nu_n:

  \begin{bmatrix} r^x_n \\ r^y_n \end{bmatrix}
  =
  \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix}
  \begin{bmatrix} s^x_n \\ u^x_n \\ s^y_n \\ u^y_n \end{bmatrix}
  + \nu_n
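A minimal simulation sketch of this model in Python/NumPy. The process and observation noise covariances (Q and R below), the time step and the horizon are assumptions made for illustration; the slides do not specify them.

```python
import numpy as np

def simulate_acv(T=50, dt=1.0, q=0.1, r=1.0, seed=0):
    """Simulate x_n = A x_{n-1} + eps_n, y_n = B x_n + nu_n for the
    almost constant velocity model with state [s_x, u_x, s_y, u_y]."""
    rng = np.random.default_rng(seed)
    A = np.array([[1, dt, 0, 0],
                  [0, 1,  0, 0],
                  [0, 0,  1, dt],
                  [0, 0,  0, 1]], dtype=float)
    B = np.array([[1, 0, 0, 0],
                  [0, 0, 1, 0]], dtype=float)
    Q = q * np.eye(4)        # assumed process noise covariance (placeholder)
    R = r * np.eye(2)        # assumed observation noise covariance (placeholder)
    x = np.zeros(4)
    xs, ys = [], []
    for _ in range(T):
        x = A @ x + rng.multivariate_normal(np.zeros(4), Q)
        ys.append(B @ x + rng.multivariate_normal(np.zeros(2), R))
        xs.append(x.copy())
    return A, B, Q, R, np.array(xs), np.array(ys)
```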
Sampling Approaches
The Monte Carlo Method
- Given a probability density f and \varphi: E \to \mathbb{R}:

  I = \int_E \varphi(x) f(x) dx.

- Simple Monte Carlo solution:
  - Sample X_1, ..., X_N \overset{\text{i.i.d.}}{\sim} f.
  - Estimate \hat{I} = \frac{1}{N} \sum_{i=1}^{N} \varphi(X_i).
- Can also be viewed as approximating \pi(dx) = f(x) dx with

  \hat{\pi}^N(dx) = \frac{1}{N} \sum_{i=1}^{N} \delta_{X_i}(dx).
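As a one-line illustration in Python/NumPy (standard normal target and quadratic test function, both arbitrary choices for the sketch):

```python
import numpy as np

rng = np.random.default_rng(0)
N = 10_000
X = rng.normal(size=N)        # X_1, ..., X_N i.i.d. from f = N(0, 1)
phi = lambda x: x ** 2        # test function
I_hat = phi(X).mean()         # (1/N) sum_i phi(X_i); approximates E_f[phi] = 1
print(I_hat)
```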
The Importance–Sampling Identity
- Given g such that
  - f(x) > 0 \Rightarrow g(x) > 0
  - and f(x)/g(x) < \infty,
  define w(x) = f(x)/g(x) and:

  \int \varphi(x) f(x) dx = \int \varphi(x) \frac{f(x)}{g(x)} g(x) dx = \int \varphi(x) w(x) g(x) dx.

- This suggests the importance sampling estimator:
  - Sample X_1, ..., X_N \overset{\text{i.i.d.}}{\sim} g.
  - Estimate \hat{I} = \frac{1}{N} \sum_{i=1}^{N} w(X_i) \varphi(X_i).
- Can also be viewed as approximating \pi(dx) = f(x) dx with

  \hat{\pi}^N(dx) = \frac{1}{N} \sum_{i=1}^{N} w(X_i) \delta_{X_i}(dx).
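A small sketch of the estimator, assuming the ratio w = f/g is available in closed form. Here the target is N(0, 1), the proposal is the heavier-tailed N(0, 4), and the test function is a tail indicator; all are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 10_000
f = lambda x: np.exp(-0.5 * x**2) / np.sqrt(2 * np.pi)               # target N(0, 1)
g = lambda x: np.exp(-0.5 * (x / 2)**2) / (2 * np.sqrt(2 * np.pi))   # proposal N(0, 4)
phi = lambda x: (x > 2).astype(float)                                # tail indicator

X = 2 * rng.normal(size=N)          # X_i ~ g
w = f(X) / g(X)                     # importance weights w(X_i)
I_hat = np.mean(w * phi(X))         # approximates P(X > 2) under N(0, 1), about 0.0228
print(I_hat)
```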
Importance Sampling Example
[Figure: importance sampling example; densities plotted over (-4, 4).]
Self-Normalised Importance Sampling
- Often, f is known only up to a normalising constant.
- If v(x) = c f(x)/g(x) = c w(x), then

  \frac{E_g(v \varphi)}{E_g(v \cdot 1)} = \frac{E_g(c w \varphi)}{E_g(c w \cdot 1)} = \frac{c E_f(\varphi)}{c E_f(1)} = E_f(\varphi).

- Estimate the numerator and denominator with the same sample:

  \hat{I} = \frac{\sum_{i=1}^{N} v(X_i) \varphi(X_i)}{\sum_{i=1}^{N} v(X_i)}.

- Biased for finite samples, but consistent.
- Typically reduces variance.
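A sketch of the self-normalised estimator when only an unnormalised version of the target is available (here exp(-x^2/2), i.e. N(0, 1) without its constant, with the same N(0, 4) proposal; again purely illustrative).

```python
import numpy as np

rng = np.random.default_rng(0)
N = 10_000
gamma = lambda x: np.exp(-0.5 * x**2)                                 # target, unnormalised
g = lambda x: np.exp(-0.5 * (x / 2)**2) / (2 * np.sqrt(2 * np.pi))    # proposal N(0, 4)
phi = lambda x: x ** 2

X = 2 * rng.normal(size=N)                # X_i ~ g
v = gamma(X) / g(X)                       # unnormalised weights v = c * w
I_hat = np.sum(v * phi(X)) / np.sum(v)    # self-normalised estimate of E_f[phi] = 1
print(I_hat)
```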
Importance Sampling for Smoothing/Filtering
- Sample \{X^{(i)}_{1:n}\} at time n from q_n(x_{1:n}) and define

  w_n(x_{1:n}) \propto \frac{p(x_{1:n}|y_{1:n})}{q_n(x_{1:n})} = \frac{p(x_{1:n}, y_{1:n})}{q_n(x_{1:n}) p(y_{1:n})}
  \propto \frac{f(x_1) g(y_1|x_1) \prod_{m=2}^{n} f(x_m|x_{m-1}) g(y_m|x_m)}{q_n(x_{1:n})};

- set W^{(i)}_n = w_n(X^{(i)}_{1:n}) / \sum_j w_n(X^{(j)}_{1:n});
- then \{W^{(i)}_n, X^{(i)}_{1:n}\} is a consistently weighted sample.
- This seems inefficient.
Sequential Importance Sampling (SIS) I
- Importance weight:

  w_n(x_{1:n}) \propto \frac{f(x_1) g(y_1|x_1) \prod_{m=2}^{n} f(x_m|x_{m-1}) g(y_m|x_m)}{q_n(x_{1:n})}
  = \frac{f(x_1) g(y_1|x_1)}{q_n(x_1)} \prod_{m=2}^{n} \frac{f(x_m|x_{m-1}) g(y_m|x_m)}{q_n(x_m|x_{1:m-1})}.

- Given \{W^{(i)}_{n-1}, X^{(i)}_{1:n-1}\} targeting p(x_{1:n-1}|y_{1:n-1}):
  - let q_n(x_{1:n-1}) = q_{n-1}(x_{1:n-1}),
  - sample X^{(i)}_n \sim q_n(\cdot|X^{(i)}_{1:n-1}), or even q_n(\cdot|X^{(i)}_{n-1}).
Sequential Importance Sampling (SIS) II
- And update the weights:

  w_n(x_{1:n}) = w_{n-1}(x_{1:n-1}) \frac{f(x_n|x_{n-1}) g(y_n|x_n)}{q_n(x_n|x_{n-1})}

  W^{(i)}_n = w_n(X^{(i)}_{1:n})
  = w_{n-1}(X^{(i)}_{1:n-1}) \frac{f(X^{(i)}_n|X^{(i)}_{n-1}) g(y_n|X^{(i)}_n)}{q_n(X^{(i)}_n|X^{(i)}_{n-1})}
  = W^{(i)}_{n-1} \frac{f(X^{(i)}_n|X^{(i)}_{n-1}) g(y_n|X^{(i)}_n)}{q_n(X^{(i)}_n|X^{(i)}_{n-1})}

- If \int p(x_{1:n}|y_{1:n}) dx_n \approx p(x_{1:n-1}|y_{1:n-1}), this makes sense.
- We only need to store \{W^{(i)}_n, X^{(i)}_{n-1:n}\}.
- Same computation every iteration.
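A sketch of the SIS recursion for a simple scalar model (Gaussian random-walk dynamics, Gaussian observations, bootstrap choice q_n = f so the incremental weight is g(y_n|x_n); the model, its parameters and the observations are illustrative assumptions). Only the current states and log-weights are stored, as noted above.

```python
import numpy as np

def sis(ys, N=1000, sigma_x=1.0, sigma_y=1.0, seed=0):
    """Sequential importance sampling (no resampling) for the illustrative model
    X_n = X_{n-1} + sigma_x * eps_n,  Y_n = X_n + sigma_y * nu_n,  X_0 = 0."""
    rng = np.random.default_rng(seed)
    X = np.zeros(N)                 # current states X_n^(i)
    logW = np.zeros(N)              # log-weights, kept in log scale for stability
    for y in ys:
        X = X + sigma_x * rng.normal(size=N)          # propose from f(. | X_{n-1})
        logW += -0.5 * ((y - X) / sigma_y) ** 2       # times g(y_n | X_n), up to a constant
    W = np.exp(logW - logW.max())
    return X, W / W.sum()

X, W = sis([0.5, 1.0, 0.3])
print(np.sum(W * X))                # SIS estimate of E[X_n | y_{1:n}]
```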
Importance Sampling on Huge Spaces Doesn’t Work
- It's said that IS breaks the curse of dimensionality:

  \sqrt{N} \left[ \frac{1}{N} \sum_{i=1}^{N} w(X_i) \varphi(X_i) - \int \varphi(x) f(x) dx \right] \overset{d}{\to} N(0, \text{Var}_g[w \varphi]).

- This is true.
- But it's not enough.
- \text{Var}_g[w \varphi] increases (often exponentially) with dimension.
- Eventually, an SIS estimator (of p(x_{1:n}|y_{1:n})) will fail.
- But p(x_n|y_{1:n}) is a fixed-dimensional distribution.
Sequential Importance Resampling
Resampling: The SIR[esampling] Algorithm
- Problem: the variance of the weights builds up over time.
- Solution? Given \{W^{(i)}_{n-1}, X^{(i)}_{1:n-1}\}:
  1. Resample*, to obtain \{\frac{1}{N}, \tilde{X}^{(i)}_{1:n-1}\}.
  2. Sample X^{(i)}_n \sim q_n(\cdot|\tilde{X}^{(i)}_{n-1}).
  3. Set X^{(i)}_{1:n-1} = \tilde{X}^{(i)}_{1:n-1}.
  4. Set W^{(i)}_n = f(X^{(i)}_n|X^{(i)}_{n-1}) g(y_n|X^{(i)}_n) / q_n(X^{(i)}_n|X^{(i)}_{n-1}).
- And continue as with SIS.
- There is a cost, but this really works.

* There are many algorithms for doing this; one simple choice is sketched below.
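One concrete choice for the resampling step marked * is multinomial resampling, sketched below; systematic, stratified and residual schemes are standard alternatives (see the Douc, Cappé and Moulines reference in the further reading). It slots in between computing the normalised weights and proposing the next states.

```python
import numpy as np

def multinomial_resample(X, W, rng=None):
    """Draw N ancestor indices with probabilities W and return an equally
    weighted, resampled particle set."""
    rng = rng or np.random.default_rng()
    N = len(W)
    idx = rng.choice(N, size=N, p=W)        # ancestors of the new particles
    return X[idx], np.full(N, 1.0 / N)      # weights reset to 1/N
```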
[Figures: slides "Iteration 2" through "Iteration 10" show the particle approximation at successive iterations of the algorithm, plotted over the same region of the state space.]
Part 2 – Applied Probability
Feynman-Kac Formulæ
Feynman-Kac Formulæ
- A natural description for measure-valued stochastic processes.
- Model for:
  - particle motion in absorbing environments,
  - classes of branching particle system,
  - simple genetic algorithms,
  - particle filters and related algorithms.
- Structure of this section:
  - Probabilistic Construction
  - Semigroup[oid] Structure
  - McKean Interpretations
  - Particle Approximations
  - Selected Results
Probabilistic Construction
Following Del Moral (2004)
The Canonical Markov Chain
- Consider the filtered probability space (\Omega, \mathcal{F}, \{\mathcal{F}_n\}_{n \in \mathbb{N}}, \mathbb{P}_\mu).
- Let \{X_n\}_{n \in \mathbb{N}} be a Markov chain such that, for any n \in \mathbb{N},

  \mathbb{P}_\mu(X_{1:n} \in dx_{1:n}) = \mu(dx_1) \prod_{i=2}^{n} M_i(x_{i-1}, dx_i)

  with X_i: \Omega \to E_i, \mu \in \mathcal{P}(E_1) and M_i: E_{i-1} \to \mathcal{P}(E_i).
- (E_i, \mathcal{E}_i) are measurable spaces.
- The X_i are \mathcal{E}_i/\mathcal{F}_i-measurable.
- Using Kolmogorov's/Tulcea's extension theorem there exists a unique process-valued extension.
Some Operator Notation
Given two measurable spaces (E, \mathcal{E}) and (F, \mathcal{F}), a measure \mu on (E, \mathcal{E}) and a Markov kernel K: E \to \mathcal{P}(F), define:

  \mu(\varphi_E) := \int \mu(dx) \varphi_E(x)

  \mu K(\varphi_F) := \int \mu(dx) K(x, dy) \varphi_F(y),  \qquad \mu K \in \mathcal{P}(F)

  K(\varphi_F)(x) := \int K(x, dy) \varphi_F(y),  \qquad K(\varphi_F): E \to \mathbb{R}

with \varphi_E, \varphi_F suitably measurable functions.
Given two functions g, h: E \to \mathbb{R}, define g \cdot h: E \to \mathbb{R} via (g \cdot h)(x) = g(x) h(x).
Given e: E \to \mathbb{R} and f: F \to \mathbb{R}, let (e \otimes f)(x, y) := e(x) f(y).
The Feynman-Kac Formulæ
- Given \mathbb{P}_\mu and potential functions \{G_i\}_{i \in \mathbb{N}}, G_i: E_i \to [0, \infty).
- Define two path measures weakly:

  Q_n(\varphi_{1:n}) = \frac{E\left[\varphi_{1:n}(X_{1:n}) \prod_{i=1}^{n-1} G_i(X_i)\right]}{E\left[\prod_{i=1}^{n-1} G_i(X_i)\right]}

  \hat{Q}_n(\varphi_{1:n}) = \frac{E\left[\varphi_{1:n}(X_{1:n}) \prod_{i=1}^{n} G_i(X_i)\right]}{E\left[\prod_{i=1}^{n} G_i(X_i)\right]}

  where \varphi_{1:n}: \bigotimes_{i=1}^{n} E_i \to \mathbb{R}.
Example (Filtering via FK Formulæ: Prediction)
- Let \mu(dx_1) = f(x_1) dx_1 and M_n(x_{n-1}, dx_n) = f(x_n|x_{n-1}) dx_n.
- Let G_n(x_n) = g(y_n|x_n).
- Then:

  Q_n(\varphi_{1:n}) = E\left[\varphi_{1:n}(X_{1:n}) \prod_{i=1}^{n-1} G_i(X_i)\right] \Big/ E\left[\prod_{i=1}^{n-1} G_i(X_i)\right]

  = E\left[\varphi_{1:n}(X_{1:n}) \prod_{i=1}^{n-1} g(y_i|X_i)\right] \Big/ E\left[\prod_{i=1}^{n-1} g(y_i|X_i)\right]

  = \frac{\int f(x_1) \prod_{i=2}^{n} f(x_i|x_{i-1}) \left[\prod_{j=1}^{n-1} g(y_j|x_j)\right] \varphi_{1:n}(x_{1:n}) dx_{1:n}}{\int f(x_1) \prod_{i=2}^{n} f(x_i|x_{i-1}) \left[\prod_{j=1}^{n-1} g(y_j|x_j)\right] dx_{1:n}}

  = \int p(x_{1:n}|y_{1:n-1}) \varphi_{1:n}(x_{1:n}) dx_{1:n}.
Example (Filtering via FK Formulæ: Update/Filtering)
- Whilst:

  \hat{Q}_n(\varphi_{1:n}) = E\left[\varphi_{1:n}(X_{1:n}) \prod_{i=1}^{n} G_i(X_i)\right] \Big/ E\left[\prod_{i=1}^{n} G_i(X_i)\right]

  = E\left[\varphi_{1:n}(X_{1:n}) \prod_{i=1}^{n} g(y_i|X_i)\right] \Big/ E\left[\prod_{i=1}^{n} g(y_i|X_i)\right]

  = \frac{\int f(x_1) \prod_{i=2}^{n} f(x_i|x_{i-1}) \left[\prod_{j=1}^{n} g(y_j|x_j)\right] \varphi_{1:n}(x_{1:n}) dx_{1:n}}{\int f(x_1) \prod_{i=2}^{n} f(x_i|x_{i-1}) \left[\prod_{j=1}^{n} g(y_j|x_j)\right] dx_{1:n}}

  = \int p(x_{1:n}|y_{1:n}) \varphi_{1:n}(x_{1:n}) dx_{1:n}.
Feynman-Kac Marginal Measures
We are typically interested in marginals:

  \gamma_n(\varphi_n) = E\left[\varphi_n(X_n) \prod_{i=1}^{n-1} G_i(X_i)\right]
  \qquad
  \hat{\gamma}_n(\varphi_n) = E\left[\varphi_n(X_n) \prod_{i=1}^{n} G_i(X_i)\right]

  \eta_n(\varphi_n) = Q_n(1_{1:n-1} \otimes \varphi_n)
  = \frac{E\left[\varphi_n(X_n) \prod_{i=1}^{n-1} G_i(X_i)\right]}{E\left[\prod_{i=1}^{n-1} G_i(X_i)\right]}
  = \gamma_n(\varphi_n)/\gamma_n(1)

  \hat{\eta}_n(\varphi_n) = \hat{Q}_n(1_{1:n-1} \otimes \varphi_n)
  = \frac{E\left[\varphi_n(X_n) \prod_{i=1}^{n} G_i(X_i)\right]}{E\left[\prod_{i=1}^{n} G_i(X_i)\right]}
  = \hat{\gamma}_n(\varphi_n)/\hat{\gamma}_n(1)

Key property:

  \eta_n(A_n) = \int_{E_1 \times \cdots \times E_{n-1} \times A_n} Q_n(dx_{1:n})
  \qquad
  \hat{\eta}_n(A_n) = \int_{E_1 \times \cdots \times E_{n-1} \times A_n} \hat{Q}_n(dx_{1:n})
Analysis: Semigroup Structure
A Dynamic Systems View:
How do the marginal distributions evolve?
Some Recursive Relationships
- The unnormalized marginals obey:

  \hat{\gamma}_n(\varphi_n) = \gamma_n(\varphi_n \cdot G_n)
  \qquad
  \gamma_n(\varphi_n) = \hat{\gamma}_{n-1} M_n(\varphi_n)

- Whilst the normalized marginals satisfy:

  \hat{\eta}_n(\varphi_n) = \frac{\hat{\gamma}_n(\varphi_n)}{\hat{\gamma}_n(1)} = \frac{\gamma_n(\varphi_n \cdot G_n)}{\gamma_n(G_n)} = \frac{\eta_n(\varphi_n \cdot G_n)}{\eta_n(G_n)}

  \eta_n(\varphi_n) = \frac{\gamma_n(\varphi_n)}{\gamma_n(1)} = \frac{\hat{\gamma}_{n-1} M_n(\varphi_n)}{\hat{\gamma}_{n-1} M_n(1)} = \frac{\hat{\eta}_{n-1} M_n(\varphi_n)}{\hat{\eta}_{n-1} M_n(1)} = \hat{\eta}_{n-1} M_n(\varphi_n)

So:

  \hat{\eta}_n(\varphi_n) = \frac{\hat{\eta}_{n-1} M_n(\varphi_n \cdot G_n)}{\hat{\eta}_{n-1} M_n(G_n)}
The Boltzmann-Gibbs Operator
- Given \nu \in \mathcal{P}(E) and G: E \to \mathbb{R}:

  \Psi_G: \mathcal{P}(E) \to \mathcal{P}(E), \qquad \nu \mapsto \Psi_G(\nu)

- The Boltzmann-Gibbs operator \Psi_G is defined weakly by:

  \forall \varphi \in C_b: \quad \Psi_G(\nu)(\varphi) = \frac{\nu(G \cdot \varphi)}{\nu(G)}

- or, equivalently, for all measurable sets A:

  \Psi_G(\nu)(A) = \frac{\nu(G \cdot I_A)}{\nu(G)} = \frac{\int_A \nu(dx) G(x)}{\int_E \nu(dx') G(x')}
Example (Boltzmann-Gibbs Operators and Bayes’ Rule)
- Let \mu(dx) = f(x) \lambda(dx) be a prior measure.
- Let G(x) = g(y|x) be the likelihood.
- Then:

  \Psi_G(\mu)(\varphi) = \frac{\mu(G \cdot \varphi)}{\mu(G)} = \frac{\int \mu(dx) G(x) \varphi(x)}{\int \mu(dx') G(x')}
  = \frac{\int f(x) g(y|x) \varphi(x) \lambda(dx)}{\int f(x') g(y|x') \lambda(dx')}
  = \int f(x|y) \varphi(x) \lambda(dx)

  with

  f(x|y) := \frac{f(x) g(y|x)}{\int f(x) g(y|x) \lambda(dx)}.

- So: \Psi_{g(y|\cdot)}: Prior \to Posterior.
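For an atomic measure ν = Σ_i w_i δ_{x_i}, applying Ψ_G just reweights the atoms by G and renormalises, which is exactly the Bayes/weighting step of a particle filter. A minimal sketch with an illustrative Gaussian likelihood:

```python
import numpy as np

def boltzmann_gibbs(xs, ws, G):
    """Apply Psi_G to the atomic measure sum_i ws[i] * delta_{xs[i]}:
    the atoms stay put, the weights become proportional to ws[i] * G(xs[i])."""
    new_ws = ws * G(xs)
    return xs, new_ws / new_ws.sum()

# Illustrative example: a flat 'prior' on a grid, reweighted by g(y | x).
xs = np.linspace(-3.0, 3.0, 7)
ws = np.full(7, 1.0 / 7)
y = 1.0
G = lambda x: np.exp(-0.5 * (y - x) ** 2)     # placeholder Gaussian likelihood
print(boltzmann_gibbs(xs, ws, G)[1])
```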
Markov Semigroups
- A semigroup S comprises:
  - a set, S;
  - an associative binary operation, \cdot.
- A Markov chain with homogeneous transition M has dynamic semigroup M^n:
  - M^0(x, A) = \delta_x(A)
  - M^1(x, A) = M(x, A)
  - M^n(x, A) = \int M(x, dy) M^{n-1}(y, A)
  - (M^n \cdot M^m)(x, A) = \int M^n(x, dy) M^m(y, A) = M^{n+m}(x, A)
- A linear semigroup.
- Key property:

  P(X_{n+m} \in A | X_m = x) = M^n(x, A).
Markov Semigroupoids
- A semigroupoid S' comprises:
  - a set, S;
  - a partial associative binary operation, \cdot.
- A Markov chain with inhomogeneous transitions M_n has dynamic semigroupoid M_{p:q}:
  - M_{p:p}(x, A) = \delta_x(A)
  - M_{p:p+1}(x, A) = M_{p+1}(x, A)
  - M_{p:q}(x, A) = \int M_{p+1}(x, dy) M_{p+1:q}(y, A)
  - (M_{p:q} \cdot M_{q:r})(x, A) = \int M_{p:q}(x, dy) M_{q:r}(y, A) = M_{p:r}(x, A)
- A linear semigroupoid.
- Key property:

  P(X_{n+m} \in A | X_m = x) = M_{m:m+n}(x, A).
An Unnormalized Feynman-Kac Semigroupoid
- We previously established:

  \hat{\gamma}_n(\varphi_n) = \gamma_n(\varphi_n \cdot G_n)
  \qquad
  \gamma_n = \hat{\gamma}_{n-1} M_n

- Defining

  Q_p(x_{p-1}, dx_p) = M_p(x_{p-1}, dx_p) G_p(x_p)

  we obtain \gamma_n = \gamma_{n-1} Q_n.
- We can construct the dynamic semigroupoid Q_{p:q}:
  - Q_{p:p}(x, A) = \delta_x(A)
  - Q_{p:p+1}(x, A) = Q_{p+1}(x, A)
  - Q_{p:q}(x, A) = \int Q_{p+1}(x, dy) Q_{p+1:q}(y, A)
  - (Q_{p:q} \cdot Q_{q:r})(x, A) = \int Q_{p:q}(x, dy) Q_{q:r}(y, A) = Q_{p:r}(x, A)
- Just a Markov semigroupoid for general measures:

  \forall p \le q: \quad \gamma_q = \gamma_p Q_{p:q}.
A Normalised Feynman-Kac Semigroupoid
- We previously established:

  \eta_n = \hat{\eta}_{n-1} M_n
  \qquad
  \hat{\eta}_n(\varphi_n) = \frac{\eta_n(\varphi_n \cdot G_n)}{\eta_n(G_n)}

- From the definition of \Psi_{G_n}: \hat{\eta}_n = \Psi_{G_n}(\eta_n).
- Defining \Phi_n: \mathcal{P}(E_{n-1}) \to \mathcal{P}(E_n) as

  \Phi_n: \eta_{n-1} \mapsto \Psi_{G_{n-1}}(\eta_{n-1}) M_n

  we have the recursion \eta_n = \Phi_n(\eta_{n-1}) and the nonlinear semigroupoid \Phi_{p:q}:
  - \Phi_{p:p}(\eta_p) = \eta_p
  - \Phi_{p:p+1} = \Phi_{p+1}
  - \Phi_{p:q}(\eta_p) = \Phi_{p+1:q}(\Phi_{p+1}(\eta_p)) for q > p + 1
  - \Phi_{q:r} \circ \Phi_{p:q} = \Phi_{p:r}
- Again: \forall p \le q: \eta_q = \Phi_{p:q}(\eta_p).
McKean Interpretations
Microscopic mass transport.
McKean Interpretations of Feynman-Kac Formulæ
- Families of Markov kernels consistent with the FK marginals.
- A collection \{K_{n,\eta}\}_{n \in \mathbb{N}, \eta \in \mathcal{P}(E_{n-1})} is a McKean interpretation if:

  \forall n \in \mathbb{N}: \quad \eta_n = \Phi_n(\eta_{n-1}) = \eta_{n-1} K_{n,\eta_{n-1}}.

- Not unique... and not linear.
- A selection/mutation approach seems natural:
  - Choose S_{n,\eta} such that \eta S_{n,\eta} = \Psi_{G_n}(\eta).
  - Set K_{n+1,\eta} = S_{n,\eta} M_{n+1}.
- Still not unique:
  - S_{n,\eta}(x_n, \cdot) = \Psi_{G_n}(\eta)(\cdot)
  - S_{n,\eta}(x_n, \cdot) = \epsilon_n G_n(x_n) \delta_{x_n}(\cdot) + (1 - \epsilon_n G_n(x_n)) \Psi_{G_n}(\eta)(\cdot)
Particle Interpretations
Stochastic discretisations.
Particle Interpretations of Feynman-Kac Formulæ I
Given a McKean interpretation, we can attach an N-particle model.

- Denote \xi^{(N)}_n = (\xi^{(N,1)}_n, \xi^{(N,2)}_n, ..., \xi^{(N,N)}_n) \in E_n^N.
- Allow (\Omega^N, \mathcal{F}^N = (\mathcal{F}^N_n)_{n \in \mathbb{N}}, \xi^{(N)}, \mathbb{P}^N_{\eta_0}) to indicate a particle-set-valued Markov chain.
- Let \eta^{(N)}_{n-1} = \frac{1}{N} \sum_{i=1}^{N} \delta_{\xi^{(N,i)}_{n-1}}.
- Allow the elementary transitions to be:

  \mathbb{P}\left(\xi^{(N)}_n \in d\xi^{(N)}_n \,\middle|\, \xi^{(N)}_{n-1}\right) = \prod_{p=1}^{N} K_{n,\eta^{(N)}_{n-1}}(\xi^{(N,p)}_{n-1}, d\xi^{(N,p)}_n)
Particle Interpretations of Feynman-Kac Formulæ II
- Consider K_{n,\eta} = S_{n-1,\eta} M_n:

  \mathbb{P}\left(\xi^{(N)}_n \in d\xi^{(N)}_n \,\middle|\, \xi^{(N)}_{n-1}\right) = \prod_{p=1}^{N} S_{n-1,\eta^{(N)}_{n-1}} M_n(\xi^{(N,p)}_{n-1}, d\xi^{(N,p)}_n)

- Defining:

  S_{n-1}(\xi^{(N)}_{n-1}, d\hat{\xi}^{(N)}_{n-1}) = \prod_{p=1}^{N} S_{n-1,\eta^{(N)}_{n-1}}(\xi^{(N,p)}_{n-1}, d\hat{\xi}^{(N,p)}_{n-1})

  M_n(\hat{\xi}^{(N)}_{n-1}, d\xi^{(N)}_n) = \prod_{p=1}^{N} M_n(\hat{\xi}^{(N,p)}_{n-1}, d\xi^{(N,p)}_n)

  it is clear that:

  \mathbb{P}\left(\xi^{(N)}_n \in d\xi^{(N)}_n \,\middle|\, \xi^{(N)}_{n-1}\right) = \int_{E_{n-1}^N} S_{n-1,\eta^{(N)}_{n-1}}(\xi^{(N)}_{n-1}, d\hat{\xi}^{(N)}_{n-1}) M_n(\hat{\xi}^{(N)}_{n-1}, d\xi^{(N)}_n)
Selection, Mutation and Structure
- A suggestive structural similarity:

  \eta_{n-1} \in \mathcal{P}(E_{n-1})
  \;\xrightarrow{S_{n-1,\eta_{n-1}}}\;
  \hat{\eta}_{n-1} \in \mathcal{P}(E_{n-1})
  \;\xrightarrow{M_n}\;
  \eta_n \in \mathcal{P}(E_n)

  \xi^{(N)}_{n-1} \in E_{n-1}^N
  \;\xrightarrow{\text{Select}}\;
  \hat{\xi}^{(N)}_{n-1} \in E_{n-1}^N
  \;\xrightarrow{\text{Mutate}}\;
  \xi^{(N)}_n \in E_n^N

- Selection:

  S_{n-1,\eta^{(N)}_{n-1}} = \Psi_{G_{n-1}}(\eta^{(N)}_{n-1}) = \sum_{i=1}^{N} \frac{G_{n-1}(\xi^{(N,i)}_{n-1})}{\sum_{j=1}^{N} G_{n-1}(\xi^{(N,j)}_{n-1})} \delta_{\xi^{(N,i)}_{n-1}}

  \hat{\xi}^{(N,i)}_{n-1} \overset{\text{i.i.d.}}{\sim} \Psi_{G_{n-1}}(\eta^{(N)}_{n-1})

- Mutation:

  \xi^{(N,i)}_n \sim M_n(\hat{\xi}^{(N,i)}_{n-1}, d\xi^{(N,i)}_n), independently for each i.

- Semigroupoid:

  \mathbb{P}^N\left(\xi^{(N)}_n \in dx^{(N)}_n \,\middle|\, \xi^{(N)}_{n-1}\right) = \prod_{i=1}^{N} \Phi_n(\eta^{(N)}_{n-1})(dx^{(N,i)}_n)
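A sketch of one selection/mutation sweep of the N-particle model, for the simple McKean choice S_{n,η}(x, ·) = Ψ_{G_n}(η): sample N times from the reweighted empirical measure, then move each particle with the mutation kernel. The potential G and mutation sampler M_sample are generic placeholders supplied by the user.

```python
import numpy as np

def fk_particle_sweep(xi, G, M_sample, rng=None):
    """One sweep xi_{n-1} -> xi_n of the N-particle approximation.

    xi       : array of N particles approximating eta_{n-1}
    G        : potential G_{n-1}, evaluated elementwise on the particles
    M_sample : M_sample(x, rng) draws one point from M_n(x, .)
    Returns particles approximating eta_n = Phi_n(eta_{n-1})."""
    rng = rng or np.random.default_rng()
    w = G(xi)
    w = w / w.sum()                                        # Psi_{G_{n-1}}(eta^N_{n-1})
    xi_hat = xi[rng.choice(len(xi), size=len(xi), p=w)]    # selection
    return np.array([M_sample(x, rng) for x in xi_hat])    # mutation

# Illustrative use: Gaussian potential and random-walk mutation (placeholders).
xi = np.random.default_rng(1).normal(size=500)
xi_new = fk_particle_sweep(xi,
                           G=lambda x: np.exp(-0.5 * (1.0 - x) ** 2),
                           M_sample=lambda x, rng: x + rng.normal())
```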
Selected Results
Local Error Decomposition
  \eta_1 \to \eta_2 = \Phi_2(\eta_1) \to \eta_3 = \Phi_{1:3}(\eta_1) \to \cdots \to \Phi_{1:n}(\eta_1)
  \Downarrow
  \eta_1^N \to \Phi_2(\eta_1^N) \to \Phi_{1:3}(\eta_1^N) \to \cdots \to \Phi_{1:n}(\eta_1^N)
  \Downarrow
  \eta_2^N \to \Phi_3(\eta_2^N) \to \cdots \to \Phi_{2:n}(\eta_2^N)
  \Downarrow
  \eta_3^N \to \cdots \to \Phi_{3:n}(\eta_3^N)
  \vdots
  \Downarrow
  \eta_{n-1}^N \to \Phi_n(\eta_{n-1}^N)
  \Downarrow
  \eta_n^N

Each row propagates a particle approximation forward exactly; each \Downarrow is a fresh sampling step. Formally:

  \eta_n^N - \eta_n = \sum_{i=1}^{n} \left[ \Phi_{i:n}(\eta_i^N) - \Phi_{i:n}(\Phi_i(\eta_{i-1}^N)) \right]
A Key Martingale
Proposition (Del Moral, 2004, Proposition 7.4.1)
For each n \ge 0 and \varphi_n \in C_b(E_n), define:

  \Gamma^N_{\cdot,n}(\varphi_n): p \in \{1, ..., n\} \mapsto \Gamma^N_{p,n}(\varphi_n)

  \Gamma^N_{p,n}(\varphi_n) := \gamma^N_p(Q_{p,n} \varphi_n) - \gamma_p(Q_{p,n} \varphi_n)

For any p \le n, \Gamma^N_{\cdot,n}(\varphi_n) has the \mathcal{F}^N-martingale decomposition:

  \Gamma^N_{p,n}(\varphi_n) = \sum_{q=1}^{p} \gamma^N_q(1) \left[ \eta^N_q(Q_{q,n} \varphi_n) - \eta^N_{q-1} K_{q,\eta^N_{q-1}}(Q_{q,n} \varphi_n) \right]

with angle bracket

  \left\langle \Gamma^N_{\cdot,n}(\varphi_n) \right\rangle_p = \sum_{q=1}^{p} \gamma^N_q(1)^2 \, \eta^N_{q-1} K_{q,\eta^N_{q-1}}\!\left( \left[ Q_{q,n}(\varphi_n) - \eta^N_{q-1} K_{q,\eta^N_{q-1}} Q_{q,n}(\varphi_n) \right]^2 \right)
Normalizing the Unnormalized
  \eta^N_n(\varphi_n) - \eta_n(\varphi_n)
  = \frac{\gamma^N_n(\varphi_n)}{\gamma^N_n(1)} - \frac{\gamma_n(\varphi_n)}{\gamma_n(1)}
  = \frac{\gamma_n(1)}{\gamma^N_n(1)} \left[ \frac{\gamma^N_n(\varphi_n)}{\gamma_n(1)} - \frac{\gamma_n(\varphi_n)}{\gamma_n(1)} \times \frac{\gamma^N_n(1)}{\gamma_n(1)} \right]
  = \frac{\gamma_n(1)}{\gamma^N_n(1)} \left[ \frac{\gamma^N_n(\varphi_n)}{\gamma_n(1)} - \eta_n(\varphi_n) \times \frac{\gamma^N_n(1)}{\gamma_n(1)} \right]
  = \frac{\gamma_n(1)}{\gamma^N_n(1)} \, \frac{1}{\gamma_n(1)} \, \gamma^N_n\big( \varphi_n - \eta_n(\varphi_n) \big)
Law of Large Numbers and Weak Convergence
Theorem (Del Moral 2004: Theorem 7.4.4)
Under regularity conditions, for any n \ge 1, p \ge 1 and \varphi_n \in C_b(E_n):

  \sqrt{N} \, E\left[ \left| \eta^N_n(\varphi_n) - \eta_n(\varphi_n) \right|^p \right]^{1/p} \le c_{p,n} \|\varphi_n\|_\infty

By a Borel-Cantelli argument:

  \lim_{N \to \infty} \eta^N_n(\varphi_n) = \eta_n(\varphi_n) \quad almost surely.
Central Limit Theorem
Proposition (Del Moral 2004: Proposition 9.4.2)
Under regularity conditions, for any n \ge 1:

  \sqrt{N} \left( \eta^N_n(\varphi_n) - \eta_n(\varphi_n) \right) \overset{d}{\to} N\left(0, \sigma_n^2(\varphi_n)\right)

where

  \sigma_n^2(\varphi_n) = \sum_{q=1}^{n} \eta_{q-1} K_{q,\eta_{q-1}}\!\left( \left[ Q_{q,n}(\varphi_n) - K_{q,\eta_{q-1}}(Q_{q,n}(\varphi_n)) \right]^2 \right)
Part 3 – Interface
Particle Filters as McKean Interpretations
The Bootstrap Particle Filter
The Simplest Case
Recall: The SIR Particle Filter
- At iteration n, given \{W^{(i)}_{n-1}, X^{(i)}_{1:n-1}\}:
  1. Resample, to obtain \{\frac{1}{N}, \tilde{X}^{(i)}_{1:n-1}\}.   [Selection]
  2. Sample X^{(i)}_n \sim q_n(\cdot|\tilde{X}^{(i)}_{n-1}).   [Mutation]
  3. Set X^{(i)}_{1:n-1} = \tilde{X}^{(i)}_{1:n-1}.
  4. Set W^{(i)}_n = f(X^{(i)}_n|X^{(i)}_{n-1}) g(y_n|X^{(i)}_n) / q_n(X^{(i)}_n|X^{(i)}_{n-1}).
- Feynman-Kac formulation?
  - Generally W^{(i)}_n depends upon X^{(i)}_{n-1}.
  - (At least) two solutions exist.
The Bootstrap SIR Filter
- The bootstrap particle filter (Gordon, Salmond and Smith, 1993):
  - Proposal: q(x_{n-1}, x_n) = f(x_n|x_{n-1})
  - Weight: w(x_n) \propto g(y_n|x_n)
- Feynman-Kac model:
  - Mutation: M_n(x_{n-1}, dx_n) = f(x_n|x_{n-1}) dx_n.
  - Potential: G_n(x_n) = g(y_n|x_n).
- McKean interpretation:
  - McKean transitions: K_{n+1,\eta} = S_{n,\eta} M_{n+1}.
  - Selection operation: S_{n,\eta} = \Psi_{G_n}(\eta).
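A sketch of the bootstrap filter for the almost constant velocity model from Part 1 (propose from the dynamics, weight by the Gaussian observation likelihood, multinomial resampling at every step). It assumes the matrices A, B and the noise covariances Q, R used in the earlier simulation sketch; the Gaussian initialisation is a further assumption.

```python
import numpy as np

def bootstrap_pf(ys, A, B, Q, R, N=1000, seed=0):
    """Bootstrap particle filter for x_n = A x_{n-1} + eps_n, y_n = B x_n + nu_n,
    with eps_n ~ N(0, Q) and nu_n ~ N(0, R).  Returns the filtering means."""
    rng = np.random.default_rng(seed)
    d = A.shape[0]
    Rinv = np.linalg.inv(R)
    X = rng.multivariate_normal(np.zeros(d), Q, size=N)   # assumed initialisation
    means = []
    for y in ys:
        # Mutation: propose from the state dynamics f(x_n | x_{n-1}).
        X = X @ A.T + rng.multivariate_normal(np.zeros(d), Q, size=N)
        # Weights proportional to the likelihood g(y_n | x_n).
        resid = y - X @ B.T
        logW = -0.5 * np.einsum('ni,ij,nj->n', resid, Rinv, resid)
        W = np.exp(logW - logW.max())
        W /= W.sum()
        means.append(W @ X)                               # estimate of E[x_n | y_{1:n}]
        # Selection: multinomial resampling.
        X = X[rng.choice(N, size=N, p=W)]
    return np.array(means)

# e.g. with the earlier (assumed) simulation sketch:
#   A, B, Q, R, xs, ys = simulate_acv()
#   print(bootstrap_pf(ys, A, B, Q, R)[:3])
```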
Bootstrap Particle Filter Results
LLN:

  \lim_{N \to \infty} \frac{\sum_{i=1}^{N} W^{(i)}_n \varphi_n(X^{(i)}_n)}{\sum_{j=1}^{N} W^{(j)}_n} \overset{\text{a.s.}}{=} \int \varphi_n(x_n) p(x_n|y_{1:n}) dx_n

CLT:

  \sqrt{N} \left( \frac{\sum_{i=1}^{N} W^{(i)}_n \varphi_n(X^{(i)}_n)}{\sum_{j=1}^{N} W^{(j)}_n} - \int \varphi_n(x_n) p(x_n|y_{1:n}) dx_n \right) \overset{d}{\to} N\left(0, \sigma^2_{BS,n}(\varphi_n)\right)
Bootstrap Particle Filter: Asymptotic Variance
  \sigma^2_{BS,n}(\varphi_n) =
  \int \frac{p(x_1|y_{1:n})^2}{p(x_1)} \left( \int \varphi_n(x_n) p(x_n|y_{2:n}, x_1) dx_n - \bar{\varphi}_n \right)^2 dx_1

  + \sum_{k=2}^{n-1} \int \frac{p(x_k|y_{1:n})^2}{p(x_k|y_{1:k-1})} \left( \int \varphi_n(x_n) p(x_n|y_{k+1:n}, x_k) dx_n - \bar{\varphi}_n \right)^2 dx_k

  + \int \frac{p(x_n|y_{1:n})^2}{p(x_n|y_{1:n-1})} \left( \varphi_n(x_n) - \bar{\varphi}_n \right)^2 dx_n

with

  \bar{\varphi}_n = \int p(x_n|y_{1:n}) \varphi_n(x_n) dx_n.
Extended Spaces: General SIR Particle Filter
- At iteration n, given \{W^{(i)}_{n-1}, X^{(i)}_{1:n-1}\}:
  1. Resample, to obtain \{\frac{1}{N}, \tilde{X}^{(i)}_{1:n-1}\}.   [Selection]
  2. Sample X^{(i)}_n \sim q_n(\cdot|\tilde{X}^{(i)}_{n-1}).   [Mutation]
  3. Set X^{(i)}_{1:n-1} = \tilde{X}^{(i)}_{1:n-1}.
  4. Set W^{(i)}_n = f(X^{(i)}_n|X^{(i)}_{n-1}) g(y_n|X^{(i)}_n) / q_n(X^{(i)}_n|X^{(i)}_{n-1}).
- But W^{(i)}_n depends upon X^{(i)}_{n-1}.
- Let \tilde{E}_n = E_{n-1} \times E_n.
- Define Y_n = (X_{n-1}, X_n).
- Now W_n = \tilde{G}_n(Y_n).
- Set \tilde{M}_n(y_{n-1}, dy_n) = \delta_{y_{n-1,2}}(dy_{n,1}) \, q(y_{n-1,2}, dy_{n,2}).
- A Feynman-Kac representation.
SIR Asymptotic Variance
  \sigma^2_{SIR,n}(\varphi_n) =
  \int \frac{p(x_1|y_{1:n})^2}{q_1(x_1)} \left( \int \varphi_n(x_n) p(x_n|y_{2:n}, x_1) dx_n - \bar{\varphi}_n \right)^2 dx_1

  + \sum_{k=2}^{n-1} \int \frac{p(x_{1:k}|y_{1:n})^2}{p(x_{1:k-1}|y_{1:k-1}) q_k(x_k|x_{k-1})} \left( \int \varphi_n(x_n) p(x_n|y_{k+1:n}, x_k) dx_n - \bar{\varphi}_n \right)^2 dx_{1:k}

  + \int \frac{p(x_{1:n}|y_{1:n})^2}{p(x_{1:n-1}|y_{1:n-1}) q_n(x_n|x_{n-1})} \left( \varphi_n(x_n) - \bar{\varphi}_n \right)^2 dx_{1:n}.
Auxiliary Particle Filters
Another algorithm.
Auxiliary [v] Particle Filters
(Pitt & Shephard ’99)
If we have access to the next observation before resampling, we could use this structure:

- Pre-weight every particle with \lambda^{(i)}_n \propto \hat{p}(y_n|X^{(i)}_{n-1}).
- Propose new states from the mixture distribution

  \sum_{i=1}^{N} \lambda^{(i)}_n q(\cdot|X^{(i)}_{n-1}) \Big/ \sum_{i=1}^{N} \lambda^{(i)}_n.

- Weight samples, correcting for the pre-weighting:

  W^{(i)}_n \propto \frac{f(X^{(i)}_{n,n}|X^{(i)}_{n,n-1}) \, g(y_n|X^{(i)}_{n,n})}{\lambda^{(i)}_n \, q_n(X^{(i)}_{n,n}|X^{(i)}_{n,1:n-1})}.

- Resample the particle set.
Some Well Known Refinements
We can tidy things up a bit:

1. The auxiliary variable step is equivalent to multinomial resampling.
2. So, there's no need to resample before the pre-weighting.

Now we have (a code sketch follows below):

- Pre-weight every particle with \lambda^{(i)}_n \propto \hat{p}(y_n|X^{(i)}_{n-1}).
- Resample.
- Propose new states.
- Weight samples, correcting for the pre-weighting:

  W^{(i)}_n \propto \frac{f(X^{(i)}_{n,n}|X^{(i)}_{n,n-1}) \, g(y_n|X^{(i)}_{n,n})}{\lambda^{(i)}_n \, q_n(X^{(i)}_{n,n}|X^{(i)}_{n,1:n-1})}.
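A sketch of one step of this refined scheme for a generic scalar model, with the bootstrap proposal q_n = f and a user-supplied approximate predictive likelihood p̂(y_n|x_{n-1}) (p_hat below); the densities passed in at the bottom are illustrative placeholders.

```python
import numpy as np

def apf_step(X, W, y, f_sample, g_pdf, p_hat, rng=None):
    """One auxiliary particle filter step in the refined form above:
    pre-weight by p_hat(y_n | x_{n-1}), resample, propose from f, correct.
    With q_n = f, the corrected weight is g(y_n | x_n) / p_hat(y_n | x_{n-1})."""
    rng = rng or np.random.default_rng()
    N = len(X)
    lam = p_hat(y, X)                            # pre-weights lambda_n^(i)
    pre = W * lam
    pre /= pre.sum()
    anc = rng.choice(N, size=N, p=pre)           # resample with the pre-weights
    X_prev = X[anc]
    X_new = f_sample(X_prev, rng)                # propose from the dynamics
    W_new = g_pdf(y, X_new) / p_hat(y, X_prev)   # correct for the pre-weighting
    return X_new, W_new / W_new.sum()

# Illustrative placeholders: random-walk dynamics, Gaussian likelihood, and a
# deliberately flattened predictive approximation p_hat.
f_sample = lambda x, rng: x + rng.normal(size=len(x))
g_pdf = lambda y, x: np.exp(-0.5 * (y - x) ** 2)
p_hat = lambda y, x: np.exp(-0.25 * (y - x) ** 2)
X, W = apf_step(np.zeros(500), np.full(500, 1 / 500), 1.0, f_sample, g_pdf, p_hat)
```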
A Feynman-Kac Interpretation
of the APF
A transition and a potential.
An Interpretation of the APF
If we move the first step at time n + 1 to the last at time n, we
get:
I
Resample
I
Propose new states
I
Weight samples, correcting earlier pre-weighting.
I
Pre-weight every particle with λn+1 ∝ b
p (yn+1 |Xn ).
(i)
(i)
An SIR algorithm targetting:
b
pn (x1:n |y1:n+1 ) ∝ p(x1:n |y1:n )b
p (yn+1 |xn ).
69/79
Some Consequences
Asymptotics.
Theoretical Considerations
- Direct analysis of the APF is largely unnecessary.
- Results can be obtained by considering the associated SIR algorithm.
- SIR has a (discrete time) Feynman-Kac interpretation.
For example. . .
Proposition. Under standard regularity conditions,

  \sqrt{N} \left( \hat{\varphi}^N_{n,APF} - \bar{\varphi}_n \right) \overset{d}{\to} N\left(0, \sigma_n^2(\varphi_n)\right)

where

  \sigma_n^2(\varphi_n) =
  \int \frac{p(x_1|y_{1:n})^2}{q_1(x_1)} \left( \int \varphi_n(x_n) p(x_n|y_{2:n}, x_1) dx_n - \bar{\varphi}_n \right)^2 dx_1

  + \sum_{k=2}^{n-1} \int \frac{p(x_{1:k}|y_{1:n})^2}{\hat{p}(x_{1:k-1}|y_{1:k}) q_k(x_k|x_{k-1})} \left( \int \varphi_n(x_n) p(x_n|y_{k+1:n}, x_k) dx_n - \bar{\varphi}_n \right)^2 dx_{1:k}

  + \int \frac{p(x_{1:n}|y_{1:n})^2}{\hat{p}(x_{1:n-1}|y_{1:n}) q_n(x_n|x_{n-1})} \left( \varphi_n(x_n) - \bar{\varphi}_n \right)^2 dx_{1:n}.
Practical Implications
- It means we're doing importance sampling.
- Choosing \hat{p}(y_n|x_{n-1}) = p(y_n | x_n = E[X_n|x_{n-1}]) is dangerous.
- A safer choice would be to ensure that

  \sup_{x_{n-1}, x_n} \frac{g(y_n|x_n) f(x_n|x_{n-1})}{\hat{p}(y_n|x_{n-1}) q(x_n|x_{n-1})} < \infty.

- Using the APF doesn't ensure superior performance.
A Contrived Illustration
Consider the following binary state-space model with common state and observation spaces:

  \mathcal{X} = \{0, 1\}, \qquad \mathcal{Y} = \mathcal{X},
  \qquad p(x_1 = 0) = 0.5,
  \qquad p(x_n = x_{n-1}) = 1 - \delta,
  \qquad p(y_n = x_n) = 1 - \epsilon.

- \delta controls the ergodicity of the state process.
- \epsilon controls the information contained in the observations.

Consider estimating E(X_2 | Y_{1:2} = (0, 1)); a sketch of the exact calculation follows below.
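With only four possible paths (x_1, x_2), the quantity E(X_2 | Y_{1:2} = (0, 1)) can be computed exactly by enumeration, giving the reference value behind the variance comparisons on the next slides. A minimal sketch, with δ and ε left as arguments (the values in the call are arbitrary examples):

```python
from itertools import product

def exact_posterior_mean(delta, eps, ys=(0, 1)):
    """E(X_2 | Y_{1:2} = ys) for the binary model, by summing over the
    four paths (x_1, x_2)."""
    trans = lambda a, b: 1 - delta if a == b else delta    # p(x_n = b | x_{n-1} = a)
    emit = lambda x, y: 1 - eps if x == y else eps         # p(y_n = y | x_n = x)
    num = den = 0.0
    for x1, x2 in product((0, 1), repeat=2):
        p = 0.5 * emit(x1, ys[0]) * trans(x1, x2) * emit(x2, ys[1])
        num += x2 * p
        den += p
    return num / den

print(exact_posterior_mean(delta=0.1, eps=0.25))
```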
Variance of SIR - Variance of APF
[Figure: surface plot of the difference in asymptotic variance, Var(SIR) - Var(APF), as a function of \delta and \epsilon.]
Variance Comparison at ǫ = 0.25
[Figure: variance (empirical and asymptotic) of the SISR and APF estimators as a function of \delta, at \epsilon = 0.25.]
Further Reading
Particle Filters / Sequential Monte Carlo
- Novel approach to nonlinear/non-Gaussian Bayesian state estimation. Gordon, Salmond and Smith, IEE Proceedings-F, 140(2):107-113, 1993.
- Sequential Monte Carlo Methods in Practice. Doucet, de Freitas & Gordon (eds.), Springer, 2001.
- A survey of convergence results on particle filtering methods for practitioners. Crisan and Doucet, IEEE Transactions on Signal Processing, 50(3):736-746, 2002.
- Central limit theorem for sequential Monte Carlo methods and its application to Bayesian inference. Chopin, Annals of Statistics, 32(6):2385-2411, 2004.
- A Tutorial on Particle Filtering... Doucet & J., in Oxford Handbook of Nonlinear Filtering, 2011, Chapter 24:656-704.
- Sequential Monte Carlo samplers. Del Moral, Doucet & Jasra, JRSS B, 68(3):411-436, 2006.
Resampling
- Comparison of resampling schemes for particle filters. Douc, Cappé and Moulines, in Proc. ISPA 2005, I:64-69.

Feynman-Kac Particle Models

- Particle Methods: An Introduction with Applications. Del Moral & Doucet, HAL INRIA Report RR-6691, 2009.
- Feynman-Kac Formulæ... Del Moral, Springer, 2004.

Auxiliary Particle Filters

- Filtering via Simulation: Auxiliary Particle Filters. Pitt & Shephard, JASA, 94(446):590-599, 1999.
- A Note on Auxiliary Particle Filters. J. & Doucet, Stat. & Prob. Lett., 78(12):1498-1504, 2008.
- Optimality of the APF. Douc, Moulines & Olsson, Prob. & Math. Stat., 29(1):1-28, 2009.