Local Sequential Monte Carlo
Adam M. Johansen and Arnaud Doucet
University of Warwick
a.m.johansen@warwick.ac.uk
www2.warwick.ac.uk/fac/sci/statistics/staff/academic/johansen/
March 28th, 2011
Lille Workshop on Filtering, MCMC and ABC
Outline
▶ Background
  ▶ Hidden Markov Models / State Space Models
  ▶ Particle Filters / Sequential Monte Carlo
  ▶ Block Sampling
▶ Local SMC
  ▶ Motivation
  ▶ Formulation
  ▶ Examples
    ▶ A Toy Linear Gaussian Model
    ▶ Stochastic Volatility
The Structure of the Problem
[Figure: a tracking scenario plotted in the (s^x/km, s^y/km) plane.]
Hidden Markov Models / State Space Models
[Graphical model: hidden states x_1, ..., x_6, each emitting an observation y_1, ..., y_6.]
▶ Unobserved Markov chain {X_n} with transition density f.
▶ Observed process {Y_n} with conditional density g.
▶ Density:
  p(x_{1:n}, y_{1:n}) = f_1(x_1)\, g(y_1|x_1) \prod_{i=2}^{n} f(x_i|x_{i-1})\, g(y_i|x_i).
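
To make the generative structure concrete, here is a minimal simulation sketch (not part of the original slides); the standard Gaussian choices for f_1, f and g are illustrative assumptions.

```python
import numpy as np

def simulate_hmm(n, rng=np.random.default_rng(0)):
    """Draw (x_{1:n}, y_{1:n}) from an HMM with illustrative Gaussian
    transition f and observation density g."""
    x = np.empty(n)
    y = np.empty(n)
    x[0] = rng.normal(0.0, 1.0)           # x_1 ~ f_1 = N(0, 1)
    y[0] = rng.normal(x[0], 1.0)          # y_1 ~ g(.|x_1) = N(x_1, 1)
    for i in range(1, n):
        x[i] = rng.normal(x[i - 1], 1.0)  # x_i ~ f(.|x_{i-1}) = N(x_{i-1}, 1)
        y[i] = rng.normal(x[i], 1.0)      # y_i ~ g(.|x_i) = N(x_i, 1)
    return x, y
```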
Motivating Examples
▶ Tracking, e.g. ACV model:
  ▶ States: x_n = [s^x_n\; u^x_n\; s^y_n\; u^y_n]^T
  ▶ Dynamics: x_n = A x_{n-1} + \epsilon_n:
    \begin{pmatrix} s^x_n \\ u^x_n \\ s^y_n \\ u^y_n \end{pmatrix} =
    \begin{pmatrix} 1 & \Delta t & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & \Delta t \\ 0 & 0 & 0 & 1 \end{pmatrix}
    \begin{pmatrix} s^x_{n-1} \\ u^x_{n-1} \\ s^y_{n-1} \\ u^y_{n-1} \end{pmatrix} + \epsilon_n
  ▶ Observation: y_n = B x_n + \nu_n:
    \begin{pmatrix} r^x_n \\ r^y_n \end{pmatrix} =
    \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{pmatrix}
    \begin{pmatrix} s^x_n \\ u^x_n \\ s^y_n \\ u^y_n \end{pmatrix} + \nu_n
▶ Stochastic volatility, e.g.:
  f(x_i|x_{i-1}) = N(x_i; \phi x_{i-1}, \sigma^2)
  g(y_i|x_i) = N(y_i; 0, \beta^2 \exp(x_i))
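
As an aside, the ACV dynamics can be coded directly; the time step Δt and noise scales below are arbitrary illustrative choices (the slides do not specify them).

```python
import numpy as np

dt = 1.0  # illustrative time step
A = np.array([[1.0, dt,  0.0, 0.0],
              [0.0, 1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0, dt ],
              [0.0, 0.0, 0.0, 1.0]])
B = np.array([[1.0, 0.0, 0.0, 0.0],
              [0.0, 0.0, 1.0, 0.0]])

rng = np.random.default_rng(1)
x = np.zeros(4)                                # state [s^x, u^x, s^y, u^y]
observations = []
for n in range(50):
    x = A @ x + rng.normal(0.0, 0.1, size=4)   # x_n = A x_{n-1} + eps_n
    y = B @ x + rng.normal(0.0, 0.5, size=2)   # y_n = B x_n + nu_n
    observations.append(y)
```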
Formal Solutions
▶ Filtering: prediction and update recursions:
  p(x_n|y_{1:n-1}) = \int p(x_{n-1}|y_{1:n-1})\, f(x_n|x_{n-1})\, dx_{n-1}
  p(x_n|y_{1:n}) = \frac{p(x_n|y_{1:n-1})\, g(y_n|x_n)}{\int p(x'_n|y_{1:n-1})\, g(y_n|x'_n)\, dx'_n}
▶ Smoothing:
  p(x_{1:n}|y_{1:n}) = \frac{p(x_{1:n-1}|y_{1:n-1})\, f(x_n|x_{n-1})\, g(y_n|x_n)}{\int g(y_n|x'_n)\, f(x'_n|x'_{n-1})\, p(x'_{n-1}|y_{1:n-1})\, dx'_{n-1:n}}
The Monte Carlo Method
▶ Given a probability density, p,
  I = \int_E \varphi(x)\, p(x)\, dx
▶ Simple Monte Carlo solution:
  ▶ Sample X_1, ..., X_N \stackrel{iid}{\sim} p.
  ▶ Estimate \hat{I} = \frac{1}{N} \sum_{i=1}^{N} \varphi(X_i).
▶ Can also be viewed as approximating \pi(dx) = p(x)\, dx with
  \hat{\pi}^N(dx) = \frac{1}{N} \sum_{i=1}^{N} \delta_{X_i}(dx).
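
A one-line sketch of the estimator, using φ(x) = x² and p = N(0,1) as an arbitrary example (so E_p[φ] = 1):

```python
import numpy as np

rng = np.random.default_rng(2)
N = 10_000
X = rng.normal(size=N)   # X_1, ..., X_N ~iid p = N(0, 1)
I_hat = np.mean(X**2)    # (1/N) sum_i phi(X_i), here phi(x) = x^2
```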
Importance Sampling
▶ Given q, such that
  p(x) > 0 \Rightarrow q(x) > 0 and p(x)/q(x) < \infty,
  define w(x) = p(x)/q(x) and:
  I = \int \varphi(x)\, p(x)\, dx = \int \varphi(x)\, w(x)\, q(x)\, dx.
▶ This suggests the estimator:
  ▶ Sample X_1, ..., X_N \stackrel{iid}{\sim} q.
  ▶ Estimate \hat{I} = \frac{1}{N} \sum_{i=1}^{N} w(X_i)\, \varphi(X_i).
▶ Can also be viewed as approximating \pi(dx) = p(x)\, dx with
  \hat{\pi}^N(dx) = \frac{1}{N} \sum_{i=1}^{N} w(X_i)\, \delta_{X_i}(dx).
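
A sketch with the same example target; the N(0, 2²) proposal is an arbitrary choice satisfying both conditions (q > 0 wherever p > 0, and p/q bounded):

```python
import numpy as np

rng = np.random.default_rng(3)
N = 10_000
X = rng.normal(0.0, 2.0, size=N)   # X_i ~iid q = N(0, 2^2)
w = 2.0 * np.exp(-0.375 * X**2)    # w(x) = p(x)/q(x) for p = N(0,1): 2 exp(-3x^2/8)
I_hat = np.mean(w * X**2)          # estimates E_p[X^2] = 1
```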
Self-Normalised Importance Sampling
▶ Often, p is known only up to a normalising constant.
▶ As E_q(Cw\varphi) = C\, E_p(\varphi)...
▶ If v(x) = C\, w(x), then
  \frac{E_q(v\varphi)}{E_q(v \cdot 1)} = \frac{E_q(Cw\varphi)}{E_q(Cw \cdot 1)} = \frac{C\, E_p(\varphi)}{C\, E_p(1)} = E_p(\varphi).
▶ Estimate the numerator and denominator with the same sample:
  \hat{I} = \frac{\sum_{i=1}^{N} v(X_i)\, \varphi(X_i)}{\sum_{i=1}^{N} v(X_i)}.
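
The same example, now pretending the normalising constant of p is unknown; any constant shared by numerator and denominator cancels:

```python
import numpy as np

rng = np.random.default_rng(4)
N = 10_000
X = rng.normal(0.0, 2.0, size=N)       # X_i ~iid q = N(0, 2^2)
log_v = -0.5 * X**2                    # log of unnormalised target: v(x) = C p(x)
log_q = -0.125 * X**2 - np.log(2.0)    # log q, also up to a shared constant
v = np.exp(log_v - log_q)              # v(X_i)/q(X_i)
I_hat = np.sum(v * X**2) / np.sum(v)   # self-normalised estimate of E_p[X^2]
```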
Importance Sampling in The HMM Setting
▶ Given p(x_{1:n}|y_{1:n}) for n = 1, 2, ....
▶ Choose q_n(x_{1:n}) = q_n(x_n|x_{1:n-1})\, q_{n-1}(x_{1:n-1}).
▶ Weight:
  w_n(x_{1:n}) \propto \frac{p(x_{1:n}|y_{1:n})}{q_n(x_n|x_{1:n-1})\, q_{n-1}(x_{1:n-1})}
               = w_{n-1}(x_{1:n-1})\, \frac{p(x_{1:n}|y_{1:n})}{q_n(x_n|x_{1:n-1})\, p(x_{1:n-1}|y_{1:n-1})}
               \propto w_{n-1}(x_{1:n-1})\, \frac{f(x_n|x_{n-1})\, g(y_n|x_n)}{q_n(x_n|x_{n-1})}
Sequential Importance Sampling – Prediction & Update
▶ A first “particle filter”:
  ▶ Simple default: q_n(x_n|x_{n-1}) = f(x_n|x_{n-1}).
  ▶ Importance weighting becomes:
    w_n(x_{1:n}) = w_{n-1}(x_{1:n-1}) \times g(y_n|x_n)
▶ Algorithmically, at iteration n:
  ▶ Given \{W^i_{n-1}, X^i_{1:n-1}\} for i = 1, ..., N:
    ▶ Sample X^i_n \sim f(\cdot|X^i_{n-1}) (prediction)
    ▶ Weight W^i_n \propto W^i_{n-1}\, g(y_n|X^i_n) (update)
▶ Actually:
  ▶ Better proposals exist...
  ▶ but even they aren’t good enough.
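
A sketch of this SIS scheme for the illustrative Gaussian HMM above; running it for moderate n typically leaves almost all normalised weight on a single particle, the degeneracy that resampling addresses next.

```python
import numpy as np

def sis(y, N=100, rng=np.random.default_rng(5)):
    """Sequential importance sampling with q_n = f: weights are only
    ever multiplied by g(y_n|x_n) and quickly degenerate."""
    X = rng.normal(0.0, 1.0, size=N)   # X_1^i ~ f_1
    logw = -0.5 * (y[0] - X)**2        # log g(y_1|x_1), up to a constant
    for yn in y[1:]:
        X = rng.normal(X, 1.0)         # prediction: X_n^i ~ f(.|X_{n-1}^i)
        logw += -0.5 * (yn - X)**2     # update: w_n = w_{n-1} g(y_n|X_n^i)
    W = np.exp(logw - logw.max())
    return X, W / W.sum()
```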
Resampling
▶ Stabilisation of importance weights.
▶ Given \{W^i_n, X^i_n\}:
  ▶ Draw \{\tilde{X}^i_n\} such that:
    E\left[\frac{1}{N}\sum_{i=1}^{N} \varphi(\tilde{X}^i_n) \,\middle|\, \sigma(\{W^i_n, X^i_n\}_{i=1}^{N})\right] = \frac{\sum_{i=1}^{N} W^i_n\, \varphi(X^i_n)}{\sum_{i=1}^{N} W^i_n}
  ▶ Replace \{W^i_n, X^i_n\}_{i=1}^{N} with \{1/N, \tilde{X}^i_n\}_{i=1}^{N}.
▶ Simplest approach, multinomial resampling:
  \tilde{X}^i_n \stackrel{iid}{\sim} \frac{\sum_{j=1}^{N} W^j_n\, \delta_{X^j_n}}{\sum_{j=1}^{N} W^j_n}
▶ Lower-variance options are preferable (see the sketch below).
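
Two standard schemes, sketched for normalised weights W; the stratified variant is one of the lower-variance options alluded to above.

```python
import numpy as np

def multinomial_resample(W, rng):
    """i.i.d. draws from the weighted empirical measure (simplest, highest variance)."""
    return rng.choice(len(W), size=len(W), p=W)

def stratified_resample(W, rng):
    """One uniform draw per stratum of the weight CDF: lower variance."""
    N = len(W)
    u = (np.arange(N) + rng.uniform(size=N)) / N
    cdf = np.cumsum(W)
    cdf[-1] = 1.0                   # guard against floating-point shortfall
    return np.searchsorted(cdf, u)
```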
Sequential Importance Resampling
▶ Algorithmically, at iteration n:
  ▶ Given \{W^i_{n-1}, X^i_{1:n-1}\} for i = 1, ..., N:
  ▶ Resample, obtaining \{1/N, \tilde{X}^i_{1:n-1}\}.
  ▶ Sample X^i_n \sim q_n(\cdot|\tilde{X}^i_{n-1})
  ▶ Weight W^i_n \propto \frac{f(X^i_n|\tilde{X}^i_{n-1})\, g(y_n|X^i_n)}{q_n(X^i_n|\tilde{X}^i_{n-1})}
▶ Actually:
  ▶ Resample efficiently.
  ▶ Only resample when necessary.
A minimal sketch follows.
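
Putting the pieces together for the illustrative Gaussian HMM, a minimal bootstrap SIR sketch (multinomial resampling at every step, for brevity; the refinements above apply):

```python
import numpy as np

def bootstrap_filter(y, N=100, rng=np.random.default_rng(6)):
    """Bootstrap SIR: resample, propose from f, weight by g; returns
    estimated filtering means for the unit-variance Gaussian HMM."""
    X = rng.normal(0.0, 1.0, size=N)          # X_1^i ~ f_1
    W = np.full(N, 1.0 / N)
    means = []
    for n, yn in enumerate(y):
        if n > 0:
            idx = rng.choice(N, size=N, p=W)  # multinomial resampling
            X = rng.normal(X[idx], 1.0)       # propose from f(.|x_{n-1})
        logw = -0.5 * (yn - X)**2             # incremental weight g(y_n|x_n)
        W = np.exp(logw - logw.max())
        W /= W.sum()
        means.append(np.sum(W * X))           # filtering mean estimate
    return np.array(means)
```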
Iterations 2–10
[Figures: nine slides showing the evolution of the particle set at iterations 2 through 10; state values (vertical axis, −2 to 8) plotted against time index (horizontal axis, 1 to 10).]
Block Sampling: An Idealised Approach
At time n, given x_{1:n-1}; discard x_{n-L+1:n-1}:
▶ Sample from q(x_{n-L+1:n}|x_{n-L}, y_{n-L+1:n}).
▶ Weight with
  W(x_{1:n}) = \frac{p(x_{1:n}|y_{1:n})}{p(x_{1:n-L}|y_{1:n-1})\, q(x_{n-L+1:n}|x_{n-L}, y_{n-L+1:n})}
▶ Optimally,
  q(x_{n-L+1:n}|x_{n-L}, y_{n-L+1:n}) = p(x_{n-L+1:n}|x_{n-L}, y_{n-L+1:n})
  W(x_{1:n}) \propto \frac{p(x_{1:n-L}|y_{1:n})}{p(x_{1:n-L}|y_{1:n-1})} = p(y_n|x_{1:n-L}, y_{n-L+1:n-1})
▶ Typically intractable; an auxiliary variable approach is developed in [4].
Particle MCMC
▶ MCMC algorithms which employ SMC proposals [1]
▶ SMC algorithm as a collection of RVs:
  ▶ Values
  ▶ Weights
  ▶ Ancestral lines
▶ Construct MCMC algorithms:
  ▶ With many auxiliary variables
  ▶ Exactly invariant for a distribution on an extended space
  ▶ Standard MCMC arguments justify the strategy
▶ Does this suggest anything about SMC?
▶ Can something similar help with smoothing?
Ancestral Trees
[Figure: ancestral tree over t = 1, 2, 3, with ancestor indices a^1_3 = 1, a^4_3 = 3, a^1_2 = 1, a^4_2 = 3, and ancestral lines b^2_{3,1:3} = (1, 1, 2), b^4_{3,1:3} = (3, 3, 4), b^6_{3,1:3} = (4, 5, 6).]
SMC Distributions
We’ll need:
\psi^M_{n,L}(a_{n-L+2:n}, x_{n-L+1:n}, k; x_{n-L})
  = \left[\prod_{i=1}^{M} q(x^i_{n-L+1}|x_{n-L})\right]
    \left[\prod_{p=n-L+2}^{n} \prod_{i=1}^{M} r(a^i_p|w_{p-1})\, q\big(x^i_p \big| x^{a^i_p}_{p-1}\big)\right] r(k|w_n)
and
\tilde\psi^M_{n,L}\big(\tilde{a}_{n-L+2:n}, \tilde{x}_{n-L+1:n}; x_{n-L} \,\big\|\, \tilde{b}^k_{n-L+1:n-1}, k, \tilde{x}^k_{n-L+1:n}\big)
  = \frac{\psi^M_{n,L}(\tilde{a}_{n-L+2:n}, \tilde{x}_{n-L+1:n}, k; x_{n-L})}
         {q\big(\tilde{x}^{\tilde{b}^k_{n,n-L+1}}_{n-L+1} \big| x_{n-L}\big)
          \left[\prod_{p=n-L+2}^{n} r\big(\tilde{b}^k_{n,p} \big| \tilde{w}_{p-1}\big)\,
          q\big(\tilde{x}^{\tilde{b}^k_{n,p}}_p \big| \tilde{x}^{\tilde{b}^k_{n,p-1}}_{p-1}\big)\right]
          r(k|\tilde{w}_n)}
Toy Model: Linear Gaussian HMM
▶ Linear, Gaussian state transition:
  f(x_t|x_{t-1}) = N(x_t; x_{t-1}, 1)
▶ and likelihood
  g(y_t|x_t) = N(y_t; x_t, 1)
▶ Analytically: Kalman filter/smoother/etc. (a closed-form sketch follows).
▶ Simple bootstrap PF:
  ▶ Proposal: q(x_t|x_{t-1}, y_t) = f(x_t|x_{t-1})
  ▶ Weighting: W(x_{t-1}, x_t) \propto g(y_t|x_t)
  ▶ Resample residually every iteration.
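
For this toy model the exact answers are available for comparison; a compact filtering sketch, assuming an x_1 ~ N(0, 1) prior (the slides do not state the initial distribution):

```python
import numpy as np

def kalman_filter(y, m=0.0, P=1.0):
    """Exact filtering means and variances for f(x_t|x_{t-1}) = N(x_t; x_{t-1}, 1)
    and g(y_t|x_t) = N(y_t; x_t, 1), with assumed prior x_1 ~ N(m, P)."""
    out = []
    for t, yt in enumerate(y):
        if t > 0:
            P = P + 1.0        # predict: add unit transition variance
        K = P / (P + 1.0)      # gain, with unit observation variance
        m = m + K * (yt - m)   # update mean
        P = (1.0 - K) * P      # update variance
        out.append((m, P))
    return np.array(out)
```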
More than one SMC Algorithm?
▶ Standard approach:
  ▶ Run an SIR algorithm with N particles.
  ▶ Use
    \pi^N_n(dx_{1:n}) = \sum_{i=1}^{N} W^i_n\, \delta_{X^i_{1:n}}(dx_{1:n}).
▶ A crude alternative:
  ▶ Run L = \lfloor N/M \rfloor algorithms with M particles.
  ▶ Use
    \pi^{M,l}_n(dx_{1:n}) = \sum_{i=1}^{M} W^{l,i}_n\, \delta_{X^{l,i}_{1:n}}(dx_{1:n}).
  ▶ Guarantees L i.i.d. samples.
  ▶ For small M their distribution may be poor.
Covariance Estimation: 1d Linear Gaussian Model
[Figure: covariance estimates against n = 0, ..., 1000; vertical axis 0 to 0.4.]
Local Particle Filtering: Current Trajectories
[Figure: current particle trajectories; state values (−4 to 4) against time (0 to 16).]
Local Particle Filtering: First Particle
[Figure: the trajectory of the first particle, on the same axes.]
Local Particle Filtering: SMC Proposal
[Figure: trajectories generated by the SMC proposal, on the same axes.]
Local Particle Filtering: CSMC Auxiliary Proposal
[Figure: the CSMC auxiliary proposal, on the same axes.]
Local SMC
▶ Propose from:
  U^{\otimes(n-1)}_{1:M}(b_{1:n-2}, k)\, p(x_{1:n-1}|y_{1:n-1})\,
  \psi^M_{n,L}(a_{n-L+2:n}, x_{n-L+1:n}, k; x_{n-L})\,
  \tilde\psi^M_{n-1,L-1}\big(\tilde{a}^k_{n-L+2:n-1}, \tilde{x}^k_{n-L+1:n-1}; x_{n-L} \,\big\|\, b_{n-L+2:n-1}, x_{n-L+1:n-1}\big)
▶ Target:
  U^{\otimes n}_{1:M}\big(b_{1:n-L}, \bar{b}^{\bar{k}}_{n,n-L+1:n-1}, \bar{k}\big)\,
  p\big(x_{1:n-L}, x^{\bar{b}^{\bar{k}}_{n,n-L+1:n}}_{n-L+1:n} \,\big|\, y_{1:n}\big)\,
  \tilde\psi^M_{n,L}\big(a_{n-L+2:n}, x_{n-L+1:n}; x_{n-L} \,\big\|\, \bar{b}^{\bar{k}}_{n,n-L+1:n}, x^{\bar{k}}_{n-L+1:n}\big)\,
  \psi^M_{n-1,L-1}(\tilde{a}_{n-L+2:n-1}, \tilde{x}_{n-L+1:n-1}, k; x_{n-L}).
▶ Weight: \bar{Z}_{n-L+1:n} / \tilde{Z}_{n-L+1:n-1}.
Key Identity
\frac{\psi^M_{n,L}(a_{n-L+2:n}, x_{n-L+1:n}, k; x_{n-L})}
     {p\big(x^{b^k_n}_{n-L+1:n} \big| x_{n-L}, y_{n-L+1:n}\big)\,
      \tilde\psi^M_{n,L}(a_{n-L+2:n}, x_{n-L+1:n}, k; x_{n-L} \,\|\, \ldots)}
= \frac{q\big(x^{b^k_{n,n-L+1}}_{n-L+1} \big| x_{n-L}\big)
        \left[\prod_{p=n-L+2}^{n} r\big(b^k_{n,p} \big| w_{p-1}\big)\,
        q\big(x^{b^k_{n,p}}_p \big| x^{b^k_{n,p-1}}_{p-1}\big)\right] r(k|w_n)}
       {p\big(x^{b^k_n}_{n-L+1:n} \big| x_{n-L}, y_{n-L+1:n}\big)}
= \widehat{Z}_{n-L+1:n} \,/\, p(y_{n-L+1:n}|x_{n-L})
Bootstrap Local SMC
▶ Top level:
  ▶ Local SMC proposal.
  ▶ Stratified resampling when ESS < N/2 (sketched below).
▶ Local SMC proposal:
  ▶ Proposal: q(x_t|x_{t-1}, y_t) = f(x_t|x_{t-1})
  ▶ Weighting: W(x_{t-1}, x_t) \propto \frac{f(x_t|x_{t-1})\, g(y_t|x_t)}{f(x_t|x_{t-1})} = g(y_t|x_t)
  ▶ Resample multinomially every iteration.
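
A sketch of the top-level adaptive resampling rule (ESS threshold N/2, stratified scheme), as it might be implemented; the helpers operate on normalised weights:

```python
import numpy as np

def ess(W):
    """Effective sample size of normalised weights: 1 / sum_i W_i^2."""
    return 1.0 / np.sum(W**2)

def resample_if_needed(X, W, rng):
    """Stratified resampling, triggered only when ESS < N/2."""
    N = len(W)
    if ess(W) >= N / 2.0:
        return X, W
    u = (np.arange(N) + rng.uniform(size=N)) / N
    cdf = np.cumsum(W)
    cdf[-1] = 1.0
    idx = np.searchsorted(cdf, u)
    return X[idx], np.full(N, 1.0 / N)
```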
Bootstrap Local SMC: M=100
N = 100, M = 100
[Figure: average number of unique values against n = 0, ..., 1000 for L = 2, 3, 4, 5.]
Bootstrap Local SMC: M=1000
N = 100, M = 1000
[Figure: average number of unique values against n = 0, ..., 1000 for L = 2, 3, 4, 5.]
Bootstrap Local SMC: M=10000
N = 100, M = 10000
[Figure: average number of unique values against n = 0, ..., 1000 for L = 2, 3, 4, 5.]
Tuned Local SMC
▶ Top level:
  ▶ Local SMC proposal.
  ▶ Stratified resampling when ESS < N/2.
▶ Local SMC proposal:
  ▶ Proposal: q(x_t|x_{t-1}, y_t) = p(x_t|x_{t-1}, y_t)
  ▶ Weighting: W(x_{t-1}, x_t) \propto p(y_t|x_{t-1})
  ▶ Resample residually every iteration.
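
For the linear Gaussian toy model both ingredients are available in closed form: with f = N(x_t; x_{t-1}, 1) and g = N(y_t; x_t, 1), standard Gaussian conjugacy gives p(x_t|x_{t-1}, y_t) = N((x_{t-1} + y_t)/2, 1/2) and p(y_t|x_{t-1}) = N(y_t; x_{t-1}, 2). One step of this proposal, as a sketch:

```python
import numpy as np

def tuned_step(X, yt, rng):
    """Locally optimal proposal and incremental weight for the toy model:
    propose from p(x_t|x_{t-1}, y_t), weight by p(y_t|x_{t-1})."""
    Xn = rng.normal((X + yt) / 2.0, np.sqrt(0.5))  # N((x_{t-1}+y_t)/2, 1/2)
    logw = -0.25 * (yt - X)**2                     # log N(y_t; x_{t-1}, 2), up to a constant
    return Xn, logw
```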
Tuned Local SMC: M=100
N = 100, M = 100
[Figure: average number of unique values against n = 0, ..., 1000 for L = 2, 3, 4, 5.]
Tuned Local SMC: M=1000
N = 100, M = 1000
[Figure: average number of unique values against n = 0, ..., 1000 for L = 2, 3, 4, 5.]
Tuned Local SMC: M=10000
N = 100, M = 10000
[Figure: average number of unique values against n = 0, ..., 1000 for L = 2, 3, 4, 5.]
Optimal Block Sampling
N = 100, Exact Block Sampling
[Figure: average number of unique values against n = 0, ..., 1000 for L = 2, 3, 4, 5 under exact block sampling.]
Stochastic Volatility Bootstrap Local SMC
▶ Model:
  f(x_i|x_{i-1}) = N(x_i; \phi x_{i-1}, \sigma^2)
  g(y_i|x_i) = N(y_i; 0, \beta^2 \exp(x_i))
▶ Top level:
  ▶ Local SMC proposal.
  ▶ Stratified resampling when ESS < N/2.
▶ Local SMC proposal (sketched below):
  ▶ Proposal: q(x_t|x_{t-1}, y_t) = f(x_t|x_{t-1})
  ▶ Weighting: W(x_{t-1}, x_t) \propto \frac{f(x_t|x_{t-1})\, g(y_t|x_t)}{f(x_t|x_{t-1})} = g(y_t|x_t)
  ▶ Resample residually every iteration.
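
A simulation and bootstrap-weight sketch for this model; the parameter values (φ, σ, β) and the stationary initialisation are illustrative assumptions, not the settings used for the experiments below.

```python
import numpy as np

def simulate_sv(n, phi=0.95, sigma=0.2, beta=0.7, rng=np.random.default_rng(7)):
    """Simulate the stochastic volatility model with illustrative parameters."""
    x = np.empty(n)
    x[0] = rng.normal(0.0, sigma / np.sqrt(1.0 - phi**2))  # stationary start (assumption)
    for i in range(1, n):
        x[i] = rng.normal(phi * x[i - 1], sigma)           # f(x_i|x_{i-1})
    y = rng.normal(0.0, beta * np.exp(x / 2.0))            # g(y_i|x_i): Var = beta^2 exp(x_i)
    return x, y

def sv_log_g(yt, xt, beta=0.7):
    """Bootstrap incremental log-weight log g(y_t|x_t)."""
    var = beta**2 * np.exp(xt)
    return -0.5 * (np.log(2.0 * np.pi * var) + yt**2 / var)
```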
SV Simulated Data
[Figure: simulated data; observed values against n = 0, ..., 900.]
SV Bootstrap Local SMC: M=100
N = 100, M = 100
[Figure: average number of unique values against n = 0, ..., 900 for L = 2, 4, 6, 10.]
SV Bootstrap Local SMC: M=1000
N = 100, M = 1000
[Figure: average number of unique values against n = 0, ..., 900 for L = 2, 4, 6, 10, 14, 18.]
SV Bootstrap Local SMC: M=10000
N = 100, M = 10000
[Figure: average number of unique values against n = 0, ..., 900 for L = 2, 4, 6, 10, 14, 18, 22, 26.]
SV Exchange Rate Data
[Figure: exchange rate data; observed values against n = 0, ..., 900.]
SV Bootstrap Local SMC: M=100
N = 100, M = 100
[Figure: average number of unique values against n = 0, ..., 900 for L = 2, 4, 6, 10.]
SV Bootstrap Local SMC: M=1000
N = 100, M = 1000
[Figure: average number of unique values against n = 0, ..., 900 for L = 2, 4, 6, 10, 14, 18.]
SV Bootstrap Local SMC: M=10000
N = 100, M = 10000
[Figure: average number of unique values against n = 0, ..., 900 for L = 2, 4, 6, 10, 14, 18, 22, 26.]
In Conclusion
▶ SMC can be used hierarchically.
▶ Software implementation is not difficult [5].
▶ Optimal block sampling can be approximated well:
  ▶ Little specific tuning is required.
  ▶ Minimises the need for resampling.
  ▶ Robust to outliers.
▶ The computational cost of this strategy is rather high.
▶ Parallel implementations are natural.
▶ Similar techniques apply elsewhere [3, 2].
References
[1] C. Andrieu, A. Doucet, and R. Holenstein. Particle Markov chain
Monte Carlo. Journal of the Royal Statistical Society B, 72(3):
269–342, 2010.
[2] N. Chopin, P. Jacob, and O. Papaspiliopoulos. SMC². arXiv e-prints,
1101.1528, January 2011.
[3] P. Del Moral, A. Doucet, and A. Jasra. Sequential Monte Carlo
samplers. Journal of the Royal Statistical Society B, 68(3):
411–436, 2006.
[4] A. Doucet, M. Briers, and S. Sénécal. Efficient block sampling
strategies for sequential Monte Carlo methods. Journal of
Computational and Graphical Statistics, 15(3):693–711, 2006.
[5] A. M. Johansen. SMCTC: Sequential Monte Carlo in C++.
Journal of Statistical Software, 30(6):1–41, April 2009.
Thanks for Listening. Any questions?