Some [very biased] Perspectives on Sequential Monte Carlo and Normalising Constants
Adam M. Johansen
Collaborators Include: John Aston, Alexandre Bouchard-Côté,
Pierre Del Moral, Arnaud Doucet, Pieralberto Guarniero, Anthony
Lee, Fredrik Lindsten, Christian Næsseth, Thomas Schön, Yan Zhou
University of Warwick
a.m.johansen@warwick.ac.uk
www2.warwick.ac.uk/fac/sci/statistics/staff/academic/johansen/talks/
CRiSM Workshop on Estimating Normalising Constants
University of Warwick, April 20th, 2016
Outline
- Background: SMC and (Ratios of) Constants
- Some Applications and Variations on a Theme
- Some “Interesting” (Open) Questions
Essential Problem
- Given a distribution
    π(dx) = γ(dx)/Z = γ(x)dx/Z,
  such that γ(x) can be evaluated pointwise,
- how can we “estimate” Z = ∫ γ(dx)?
Importance Sampling
- Simple identity, provided γ ≪ µ:
    Z = ∫ γ(dx) = ∫ (dγ/dµ)(x) µ(dx) = ∫ [γ(x)/µ(x)] µ(x) dx
- So, if X_1, X_2, … ~ µ, iid, then:
  unbiasedness: ∀N, E[(1/N) Σ_{i=1}^{N} (dγ/dµ)(X_i)] = Z
  SLLN: (1/N) Σ_{i=1}^{N} (dγ/dµ)(X_i) → Z almost surely as N → ∞
  CLT: √N [(1/N) Σ_{i=1}^{N} (dγ/dµ)(X_i) − Z] → W in distribution as N → ∞,
       where W ~ N(0, Var[(dγ/dµ)(X_1)]).
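As a quick numerical check of these identities (an illustration of my own, not from the talk): a Python sketch estimating Z for an unnormalised Gaussian γ with known Z = 5, using a wider Gaussian proposal µ.

    import numpy as np

    rng = np.random.default_rng(1)

    # Unnormalised target: gamma(x) = 5 * N(x; 2, 1), so the true Z is 5.
    def gamma(x):
        return 5.0 * np.exp(-0.5 * (x - 2.0) ** 2) / np.sqrt(2.0 * np.pi)

    # Proposal mu = N(0, 3^2), evaluable pointwise, with gamma dominated by mu.
    def mu_pdf(x):
        return np.exp(-0.5 * (x / 3.0) ** 2) / (3.0 * np.sqrt(2.0 * np.pi))

    N = 100_000
    x = rng.normal(0.0, 3.0, size=N)       # X_i ~ mu, iid
    w = gamma(x) / mu_pdf(x)               # (dgamma/dmu)(X_i)
    print(w.mean(), w.std() / np.sqrt(N))  # unbiased estimate of Z, plus MC s.e.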
Sequential Importance Sampling
- Write
    γ(x_{1:n}) = γ(x_1) ∏_{p=2}^{n} γ(x_p | x_{1:p−1}),
- define, for p = 1, …, n,
    γ_p(x_{1:p}) = γ_1(x_1) ∏_{q=2}^{p} γ(x_q | x_{1:q−1}),
- then
    γ(x_{1:n})/µ(x_{1:n}) = [γ_1(x_1)/µ_1(x_1)] ∏_{p=2}^{n} γ_p(x_{1:p}) / [γ_{p−1}(x_{1:p−1}) µ_p(x_p | x_{1:p−1})],
  where the full product is W_n(x_{1:n}), the first factor is w_1(x_1), and each remaining factor is w_p(x_{1:p}),
- and we can sequentially approximate Z_p = ∫ γ_p(x_{1:p}) dx_{1:p}.
Sequential Importance Resampling (SIR)
Given a sequence γ_1(x_1), γ_2(x_{1:2}), …:
Initialisation, n = 1:
- Sample X_1^1, …, X_1^N ~ µ_1, iid
- Compute W_1^i = γ_1(X_1^i) / µ_1(X_1^i)
- Obtain Ẑ_1^N = (1/N) Σ_{i=1}^{N} W_1^i
[This is just importance sampling.]
Iteration, n ← n + 1:
- Resample: sample (A_{n−1}^1, …, A_{n−1}^N) ~ r(·|W_{n−1}^{1:N}), such that P(A_{n−1}^i = j) ∝ W_{n−1}^j.
- Set B_{n,n}^i = i and recursively B_{n,p}^i = A_p^{B_{n,p+1}^i}.
- Sample X_n^i ~ q_n(·|X_1^{B_{n,1}^i}, …, X_{n−1}^{B_{n,n−1}^i}).
- Compute
    W_n^i = γ_n(X_1^{B_{n,1}^i}, …, X_n^{B_{n,n}^i}) / [γ_{n−1}(X_1^{B_{n,1}^i}, …, X_{n−1}^{B_{n,n−1}^i}) · q_n(X_n^i | X_1^{B_{n,1}^i}, …, X_{n−1}^{B_{n,n−1}^i})].
- Obtain Ẑ_n^N = Ẑ_{n−1}^N · (1/N) Σ_{i=1}^{N} W_n^i.
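The backward recursion for the lineage indices B_{n,p}^i is simple to implement; a minimal NumPy sketch (the array layout is mine, for illustration):

    import numpy as np

    def trace_lineages(A):
        # A is a list of length n-1; A[p][i] is the (0-based) index of the
        # time-(p+1) ancestor of the particle occupying slot i at time p+2.
        n, N = len(A) + 1, len(A[0])
        B = np.empty((n, N), dtype=int)
        B[n - 1] = np.arange(N)            # B_{n,n}^i = i
        for p in range(n - 2, -1, -1):     # B_{n,p}^i = A_p^{B_{n,p+1}^i}
            B[p] = A[p][B[p + 1]]
        return B

    # Example: three time steps, six particles; row i of the output is the
    # full ancestral path (B_{3,1}^i, B_{3,2}^i, B_{3,3}^i) of particle i.
    A = [np.array([0, 0, 2, 2, 3, 4]), np.array([0, 1, 3, 3, 4, 5])]
    print(trace_lineages(A).T)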
Ancestral Trees
[Figure: ancestral trees of a particle system over t = 1, 2, 3, with ancestor indices such as a_1^1 = 1, a_4^1 = 3, a_1^2 = 1, a_4^2 = 3, giving lineages b_{3,1:3}^2 = (1, 1, 2), b_{3,1:3}^4 = (3, 3, 4) and b_{3,1:3}^6 = (4, 5, 6).]
SIR: Theoretical Justification
Under regularity conditions we still have:
  unbiasedness: E[Ẑ_n^N] = Z_n
  SLLN: lim_{N→∞} Ẑ_n^N = Z_n almost surely
  CLT: for a normal random variable W_n of appropriate variance,
    lim_{N→∞} √N [Ẑ_n^N − Z_n] = W_n in distribution,
although establishing this becomes a little harder (cf., e.g., Del Moral (2004); Andrieu et al. (2010)).
Simple Particle Filters: One Family of SIR Algorithms
[Figure: hidden Markov model graph, states x_1, …, x_6 with observations y_1, …, y_6.]
- Unobserved Markov chain {X_n} with transition density f.
- Observed process {Y_n} with conditional density g.
- The joint density is available:
    p(x_{1:n}, y_{1:n}|θ) = f_1^θ(x_1) g^θ(y_1|x_1) ∏_{i=2}^{n} f^θ(x_i|x_{i−1}) g^θ(y_i|x_i).
- Natural SIR target distributions:
    π_n^θ(x_{1:n}) := p(x_{1:n}|y_{1:n}, θ) ∝ p(x_{1:n}, y_{1:n}|θ) =: γ_n^θ(x_{1:n})
    Z_n^θ = ∫ p(x_{1:n}, y_{1:n}|θ) dx_{1:n} = p(y_{1:n}|θ)
Bootstrap PFs and Similar
- Choosing
    π_n^θ(x_{1:n}) := p(x_{1:n}|y_{1:n}, θ) ∝ p(x_{1:n}, y_{1:n}|θ) =: γ_n^θ(x_{1:n}),
    Z_n^θ = ∫ p(x_{1:n}, y_{1:n}|θ) dx_{1:n} = p(y_{1:n}|θ),
- and q_p(x_p|x_{1:p−1}) = f^θ(x_p|x_{p−1}) yields the bootstrap particle filter of Gordon et al. (1993),
- whereas q_p(x_p|x_{1:p−1}) = p(x_p|x_{p−1}, y_p, θ) yields the “locally optimal” particle filter.
- Note: many alternative particle filters are SIR algorithms with other targets. Cf. J. and Doucet (2008); Doucet and J. (2011).
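A bootstrap particle filter is short to write down. Below is a sketch for a scalar linear Gaussian model (the model and all constants are my illustrative choices, picked so the exact evidence is available from a Kalman filter for comparison):

    import numpy as np

    rng = np.random.default_rng(0)

    # Illustrative model: x_1 ~ N(0,1), x_t|x_{t-1} ~ N(a x_{t-1}, 1),
    # y_t|x_t ~ N(x_t, 1).
    a, T, N = 0.9, 50, 1000

    x = np.zeros(T)
    x[0] = rng.normal()
    for t in range(1, T):
        x[t] = a * x[t - 1] + rng.normal()
    y = x + rng.normal(size=T)

    # Bootstrap PF: q_p = f, so the incremental weight is g(y_t | x_t).
    particles = rng.normal(size=N)                  # X_1^i ~ f_1
    logZ = 0.0
    for t in range(T):
        logw = -0.5 * (y[t] - particles) ** 2 - 0.5 * np.log(2.0 * np.pi)
        m = logw.max()
        w = np.exp(logw - m)
        logZ += m + np.log(w.mean())                # log Z_hat accumulates log-mean weights
        anc = rng.choice(N, size=N, p=w / w.sum())  # multinomial resampling
        particles = a * particles[anc] + rng.normal(size=N)  # propagate through f

    # Exact log p(y_{1:T}) from the Kalman filter, for comparison.
    mpred, Ppred, exact = 0.0, 1.0, 0.0
    for t in range(T):
        S = Ppred + 1.0                             # Var(y_t | y_{1:t-1})
        exact += -0.5 * ((y[t] - mpred) ** 2 / S + np.log(2.0 * np.pi * S))
        K = Ppred / S
        mfilt = mpred + K * (y[t] - mpred)
        Pfilt = (1.0 - K) * Ppred
        mpred, Ppred = a * mfilt, a * a * Pfilt + 1.0

    print(logZ, exact)                              # PF estimate vs exact log-evidence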
Sequential Monte Carlo Samplers: Another SIR Class
Given a sequence of targets π_1, …, π_n on arbitrary spaces, Del Moral et al. (2006) extend the space:
    π̃_n(x_{1:n}) = π_n(x_n) ∏_{p=n−1}^{1} L_p(x_{p+1}, x_p)
    γ̃_n(x_{1:n}) = γ_n(x_n) ∏_{p=n−1}^{1} L_p(x_{p+1}, x_p)
    Z̃_n = ∫ γ̃_n(dx_{1:n}) = ∫ γ_n(dx_n) ∏_{p=n−1}^{1} ∫ L_p(x_{p+1}, dx_p) = ∫ γ_n(dx_n) = Z_n
A Simple SMC Sampler
Given γ_1, …, γ_n on (E, E), for i = 1, …, N:
- Sample X_1^i ~ µ_1, iid; compute W_1^i = γ_1(X_1^i)/µ_1(X_1^i) and Ẑ_1^N = (1/N) Σ_{i=1}^{N} W_1^i.
- For p = 2, …, n:
  - Resample: A_{p−1}^{1:N} ~ r(·|W_{p−1}^{1:N}).
  - Sample: X_p^i ~ K_p(X_{p−1}^{A_{p−1}^i}, ·), where π_p K_p = π_p.
  - Compute: W_p^i = γ_p(X_{p−1}^{A_{p−1}^i}) / γ_{p−1}(X_{p−1}^{A_{p−1}^i}).
  - And Ẑ_p^N = Ẑ_{p−1}^N · (1/N) Σ_{i=1}^{N} W_p^i.
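To make the recipe concrete, here is a sketch of a tempered SMC sampler (the bimodal target, the geometric path γ_p = µ_1^{1−β_p} γ^{β_p}, and all tuning constants are illustrative choices of mine); the true Z is 2 by construction:

    import numpy as np

    rng = np.random.default_rng(2)

    # mu_1 = N(0, 2^2); gamma(x) = N(x; 3, 1) + N(x; -3, 1), so Z = 2.
    def log_mu1(x):
        return -0.5 * (x / 2.0) ** 2 - np.log(2.0 * np.sqrt(2.0 * np.pi))

    def log_gamma(x):
        return (np.logaddexp(-0.5 * (x - 3.0) ** 2, -0.5 * (x + 3.0) ** 2)
                - 0.5 * np.log(2.0 * np.pi))

    betas = np.linspace(0.0, 1.0, 21)    # geometric path gamma_p = mu_1^{1-b} gamma^b
    N = 2000
    x = rng.normal(0.0, 2.0, size=N)     # X_1^i ~ mu_1, so W_1^i = 1 and Z_1 = 1
    logZ = 0.0
    for b0, b1 in zip(betas[:-1], betas[1:]):
        logw = (b1 - b0) * (log_gamma(x) - log_mu1(x))   # gamma_p / gamma_{p-1}
        m = logw.max()
        w = np.exp(logw - m)
        logZ += m + np.log(w.mean())
        x = x[rng.choice(N, size=N, p=w / w.sum())]      # resample
        for _ in range(5):                               # pi_p-invariant RWM moves
            prop = x + rng.normal(size=N)
            logacc = ((1.0 - b1) * (log_mu1(prop) - log_mu1(x))
                      + b1 * (log_gamma(prop) - log_gamma(x)))
            x = np.where(np.log(rng.uniform(size=N)) < logacc, prop, x)

    print(np.exp(logZ))                  # should be close to Z = 2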
Rare Events (J., Del Moral and Doucet, 2006)
- If Y ~ η, P(Y ∈ A) = η(A) = η(I_A).
- We’re interested in A = {x : V(x) ≥ V̂}, with η(A) ≪ 1.
- If γ(x) = η(x) I_A(x), then Z = η(A).
- Let α_1 = 0 < α_2 < ⋯ < α_T = ∞, and define
    γ_n(x) = η(x) (1 + exp(α_n(V̂ − V(x))))^{−1}.
- We have γ_1 ∝ η and γ_T/η = I_A, so Z_T = η(A).
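Reusing the sampler skeleton above, a sketch for a toy rare event (my choices: η = N(0,1), V(x) = x, V̂ = 3, and a hand-picked α schedule), where the exact answer η(A) = 1 − Φ(3) ≈ 1.35 × 10⁻³ is known:

    import numpy as np
    from math import erfc, sqrt

    rng = np.random.default_rng(3)
    Vhat = 3.0
    alphas = [0.0, 0.5, 1.0, 2.0, 4.0, 8.0, 16.0, np.inf]   # hand-picked schedule
    N = 10_000

    def log_pot(x, a):   # log of (1 + exp(a(Vhat - V(x))))^{-1}, with V(x) = x
        if np.isinf(a):
            return np.where(x >= Vhat, 0.0, -np.inf)
        return -np.logaddexp(0.0, a * (Vhat - x))

    x = rng.normal(size=N)  # eta = N(0,1); gamma_1 = eta * pot(., 0) = eta / 2
    logZ = np.log(0.5)      # so Z_1 = 1/2 exactly
    for a0, a1 in zip(alphas[:-1], alphas[1:]):
        logw = log_pot(x, a1) - log_pot(x, a0)               # incremental weight
        m = logw.max()
        w = np.exp(logw - m)
        logZ += m + np.log(w.mean())
        x = x[rng.choice(N, size=N, p=w / w.sum())]          # resample
        for _ in range(5):   # RWM moves invariant for eta(x) * pot(x, a1)
            prop = x + rng.normal(scale=0.5, size=N)
            logacc = (-0.5 * (prop ** 2 - x ** 2)
                      + log_pot(prop, a1) - log_pot(x, a1))
            x = np.where(np.log(rng.uniform(size=N)) < logacc, prop, x)

    print(np.exp(logZ), 0.5 * erfc(Vhat / sqrt(2.0)))        # estimate vs exact eta(A)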
An Illustrative Sequence of Targets
[Figure: π_α(x) for α ∈ {0, 1, 2, 5, 10, 25}, plotted on x ∈ [−5, 15]; the smoothed indicator sharpens towards I_A as α grows.]
Evidence Evaluation (Chopin, 2002; Del Moral et al., 2006)
In a Bayesian context:
- Given a prior p(θ) and likelihood l(θ; y_{1:m}),
- one could specify:
  Data tempering: γ_p(θ) = p(θ) l(θ; y_{1:m_p}) for m_1 = 0 < m_2 < ⋯ < m_T = m
  Likelihood tempering: γ_p(θ) = p(θ) l(θ; y_{1:m})^{β_p} for β_1 = 0 < β_2 < ⋯ < β_T = 1
  Something else?
- For such schemes Z_T = ∫ p(θ) l(θ; y_{1:m}) dθ.
- Specifying (m_1, …, m_T), (β_1, …, β_T) or (γ_1, …, γ_T) is not trivial.
Illustrative Sequences of Targets
[Figure: two panels showing illustrative sequences of intermediate target densities, plotted on x ∈ [−10, 10].]
One Adaptive Scheme (Zhou, J. & Aston, 2016) + Refs
Resample when the ESS is below a threshold.
Likelihood tempering: at iteration n, set α_n such that
    N (Σ_{j=1}^{N} W_{n−1}^{(j)} w_n^{(j)})² / Σ_{k=1}^{N} W_{n−1}^{(k)} (w_n^{(k)})² = CESS*
Proposals: follow Jasra et al. (2010); adapt to keep the acceptance rate about right.
Question
Are there better, practical approaches to specifying a sequence of
distributions?
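To illustrate the CESS-based rule (and the likelihood-tempered evidence computation from the previous slide), a sketch of the bisection for the next tempering exponent on a conjugate normal toy model where the exact evidence is available; the model, thresholds and tuning are my illustrative choices, and resampling at every step keeps the pre-weights uniform:

    import numpy as np

    rng = np.random.default_rng(4)

    # Toy conjugate model: theta ~ N(0,1), y_i | theta ~ N(theta, 1).
    m = 20
    y = rng.normal(1.0, 1.0, size=m)

    def loglik(theta):                   # log l(theta; y_{1:m}), vectorised
        return (-0.5 * np.sum((y[None, :] - theta[:, None]) ** 2, axis=1)
                - 0.5 * m * np.log(2.0 * np.pi))

    def cess(ll, db):                    # CESS/N for exponent increment db
        w = np.exp(db * (ll - ll.max()))
        return w.mean() ** 2 / (w ** 2).mean()

    N, target = 2000, 0.99               # keep CESS/N at roughly 0.99
    theta = rng.normal(size=N)           # beta_1 = 0: sample from the prior
    logZ, beta = 0.0, 0.0
    while beta < 1.0:
        ll = loglik(theta)
        lo, hi = 0.0, 1.0 - beta         # bisect for the largest acceptable step
        if cess(ll, hi) < target:
            for _ in range(50):
                mid = 0.5 * (lo + hi)
                lo, hi = (mid, hi) if cess(ll, mid) >= target else (lo, mid)
            hi = lo
        beta = min(1.0, beta + hi)
        logw = hi * ll                   # incremental weight l^{beta_n - beta_{n-1}}
        mx = logw.max()
        w = np.exp(logw - mx)
        logZ += mx + np.log(w.mean())
        theta = theta[rng.choice(N, size=N, p=w / w.sum())]   # resample
        for _ in range(5):               # RWM invariant for p(theta) l(theta)^beta
            prop = theta + rng.normal(scale=0.5, size=N)
            logacc = (-0.5 * (prop ** 2 - theta ** 2)
                      + beta * (loglik(prop) - loglik(theta)))
            theta = np.where(np.log(rng.uniform(size=N)) < logacc, prop, theta)

    # Exact log-evidence: y ~ N(0, I_m + 1 1^T).
    s = y.sum()
    exact = -0.5 * (m * np.log(2.0 * np.pi) + np.log(1.0 + m)
                    + y @ y - s * s / (1.0 + m))
    print(logZ, exact)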
Results from Zhou et al. (2016)
[Figure: model selection results for model orders 1-3, using AIC (above) and evidence (below).]
Iterative Adaptation (Guarniero, J. and Lee, 2015)
[Poster 7]
- In the filtering context,
    π_p(x_{1:p}) ∝ p(x_{1:p}, y_{1:p}) p(y_{p+1:n}|x_p),  with ψ_p*(x_p) := p(y_{p+1:n}|x_p),
  and
    q_p(x_p|x_{1:p−1}) = p(x_p|x_{p−1}, y_{p:n})
  gives better estimates of Z_T than the filtering distributions.
- The iAPF iteratively targets sequences π_n^l(x_{1:n}) ∝ p(x_{1:n}, y_{1:n}) ψ_n^l(x_n) while estimating ψ_n^{l+1}.
Question
Can we approximate broader classes of transitions?
Are there better approximation schemes?
A Linear Gaussian Model: Behaviour with Dimension
    µ = N(·; 0, I_d),  f(x, ·) = N(·; Ax, I_d)  and  g(x, ·) = N(·; x, I_d),  where A_{ij} = 0.42^{|i−j|+1}.
[Figure: box plots of Ẑ/Z for the iAPF and the bootstrap PF at dimensions d ∈ {5, 10, 20, 40, 80} (1000 replicates; T = 100).]
Linear Gaussian Model: Sensitivity to Parameters
Fixing d = 10: Bootstrap (N = 50,000) / iAPF (N_0 = 1,000).
[Figure: box plots of Ẑ/Z for different values of the parameter α ∈ {0.3, 0.34, 0.38, 0.42, 0.46, 0.5}, using 1000 replicates.]
Linear Gaussian Model: PMMH Empirical Autocorrelations
[Figure: empirical autocorrelations of the PMMH chains for A_{11}, A_{41} and A_{55}, lags 0 to 600.]
Here d = 5,
    µ = N(·; 0, I_d),  f(x, ·) = N(·; Ax, I_d)  and  g(x, ·) = N(·; x, 0.25 I_d).
In this case:
    A = [ 0.9  0    0    0    0
          0.3  0.7  0    0    0
          0.1  0.2  0.6  0    0
          0.4  0.1  0.1  0.3  0
          0.1  0.2  0.5  0.2  0 ],
treated as an unknown lower-triangular matrix.
Divide and Conquer (Lindsten, J. et al., 2014)
Question
Are there good ways to decompose general models?
dc-smc(t)
1. For c ∈ C(t):
   1.1 ({X_c^i, W_c^i}_{i=1}^{N}, Ẑ_c^N) ← dc-smc(c).
   1.2 Resample {x_c^i, w_c^i}_{i=1}^{N} to obtain the equally weighted particle system {x̂_c^i, 1}_{i=1}^{N}.
2. For particle i = 1 : N:
   2.1 If X̃_t ≠ ∅, simulate x̃_t^i ~ q_t(· | x̂_{c_1}^i, …, x̂_{c_C}^i), where (c_1, c_2, …, c_C) = C(t); else x̃_t^i ← ∅.
   2.2 Set x_t^i = (x̂_{c_1}^i, …, x̂_{c_C}^i, x̃_t^i).
   2.3 Compute w_t^i = γ_t(x_t^i) / [∏_{c∈C(t)} γ_c(x̂_c^i) · q_t(x̃_t^i | x̂_{c_1}^i, …, x̂_{c_C}^i)].
3. Compute Ẑ_t^N = {(1/N) Σ_{i=1}^{N} w_t^i} ∏_{c∈C(t)} Ẑ_c^N.
4. Return ({x_t^i, w_t^i}_{i=1}^{N}, Ẑ_t^N).
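A minimal instantiation of dc-smc, under assumptions of my own: a two-leaf tree in which each leaf targets an unnormalised N(0,1), and the root adds no new variable (X̃_t = ∅) but couples the children through exp(−(x₁ − x₂)²/2), so that Z_t = 2π/√3 exactly:

    import numpy as np

    rng = np.random.default_rng(5)
    N = 5000

    def log_gamma_leaf(x):            # unnormalised N(0,1); Z_c = sqrt(2*pi)
        return -0.5 * x ** 2

    def dc_smc_leaf():
        # Leaf node: plain importance sampling from q_c = N(0, 2^2).
        x = rng.normal(0.0, 2.0, size=N)
        logq = -0.5 * (x / 2.0) ** 2 - np.log(2.0 * np.sqrt(2.0 * np.pi))
        logw = log_gamma_leaf(x) - logq
        m = logw.max()
        w = np.exp(logw - m)
        return x, w, m + np.log(w.mean())   # particles, weights, log Z_hat_c

    def dc_smc_root():
        # Step 1: recurse into the children, resample each to uniform weights.
        merged, logZc = [], 0.0
        for _ in range(2):
            x, w, lz = dc_smc_leaf()
            merged.append(x[rng.choice(N, size=N, p=w / w.sum())])
            logZc += lz
        x1, x2 = merged
        # Step 2: no new variable at the root, so the weight is
        # gamma_t(x1, x2) / (gamma_c1(x1) * gamma_c2(x2)) = exp(-(x1 - x2)^2 / 2).
        logw = -0.5 * (x1 - x2) ** 2
        m = logw.max()
        w = np.exp(logw - m)
        # Step 3: Z_hat_t = mean(w_t) * product of the children's Z_hat_c.
        return m + np.log(w.mean()) + logZc

    print(np.exp(dc_smc_root()), 2.0 * np.pi / np.sqrt(3.0))  # estimate vs exact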
Multilevel modelling of NYC maths test data
[Figure: estimates of log(Z) (range roughly −3900 to −3825) against the number of particles (log scale, 10² to 10⁶), for the divide-and-conquer (DC) and standard (STD) sampling methods.]
How Wrong is it to Approximate? (Everitt, J. et al., 2015)
[Poster 4]
What if W_p^i can’t be evaluated exactly, or K_n isn’t exactly π_n-invariant?
[Figure: MSE of log(Ẑ_n^N) with exact weights (black solid), unbiased random weights (black dashed), biased random weights (grey solid) and biased random weights with perfect mixing (grey dashed).]
Question
Can we actually say anything practically relevant?
Inconclusive Conclusions
- SMC provides good estimates of many (normalizing) constants.
- There are many questions left to answer:
  - How best to choose sequences of distributions...
    - “Filtering”
    - Generally: tempering / data-tempering / not tempering?
    - Model-decompositions? Tempering to independence?
  - To what extent can we ever justify using approximate approximations within SMC?
  - How best can we approximate optimal lookahead functions?
  - How much can be done adaptively?
References
[1] C. Andrieu, A. Doucet, and R. Holenstein. Particle Markov chain Monte Carlo. Journal of the Royal
Statistical Society B, 72(3):269–342, 2010.
[2] N. Chopin. A sequential particle filter method for static models. Biometrika, 89(3):539–551, 2002.
[3] P. Del Moral. Feynman-Kac formulae: genealogical and interacting particle systems with applications.
Probability and Its Applications. Springer Verlag, New York, 2004.
[4] P. Del Moral, A. Doucet, and A. Jasra. Sequential Monte Carlo methods for Bayesian Computation. In
Bayesian Statistics 8. Oxford University Press, 2006.
[5] A. Doucet and A. M. Johansen. A tutorial on particle filtering and smoothing: Fifteen years later. In
D. Crisan and B. Rozovsky, editors, The Oxford Handbook of Nonlinear Filtering, pages 656–704. Oxford
University Press, 2011.
[6] R. G. Everitt, A. M. Johansen, E. Rowing, and M. Evdemon-Hogan. Bayesian model selection with
un-normalised likelihoods. Statistics and Computing, 2016. In press.
[7] N. J. Gordon, D. J. Salmond, and A. F. M. Smith. Novel approach to nonlinear/non-Gaussian Bayesian
state estimation. IEE Proceedings-F, 140(2):107–113, April 1993.
[8] P. Guarniero, A. M. Johansen, and A. Lee. The iterated auxiliary particle filter. ArXiv e-print 1511.06286, 2015.
[9] A. Jasra, D. A. Stephens, A. Doucet, and T. Tsagaris. Inference for Lévy-Driven Stochastic Volatility
Models via Adaptive Sequential Monte Carlo. Scandinavian Journal of Statistics, 38(1):1–22, Dec. 2010.
[10] A. M. Johansen and A. Doucet. A note on the auxiliary particle filter. Statistics and Probability Letters, 78
(12):1498–1504, September 2008. URL http://dx.doi.org/10.1016/j.spl.2008.01.032.
[11] A. M. Johansen, P. Del Moral, and A. Doucet. Sequential Monte Carlo samplers for rare events. In
Proceedings of the 6th International Workshop on Rare Event Simulation, pages 256–267, Bamberg,
Germany, October 2006.
[12] F. Lindsten, A. M. Johansen, C. A. Næsseth, B. Kirkpatrick, T. Schön, J. A. D. Aston, and
A. Bouchard-Côté. Divide and conquer with sequential Monte Carlo samplers. ArXiv e-print 1406.4993, 2014.
[13] Y. Zhou, A. M. Johansen, and J. A. D. Aston. Towards automatic model comparison: An adaptive
sequential Monte Carlo approach. Journal of Computational and Graphical Statistics, 2015. In press.