Particle Markov Chain Monte Carlo Christophe Andrieu University of Bristol

advertisement
Particle Markov Chain Monte Carlo
Christophe Andrieu
University of Bristol
(joint with A. Doucet & R. Holenstein)
UoB
16/03/2009
C.A. (UoB)
PMCMC
16/03/2009
1 / 33
Background
Aim: sample from a probability distribution π with di¢ cult patterns
of dependence of its possibly numerous components.
C.A. (UoB)
PMCMC
16/03/2009
2 / 33
Background
Aim: sample from a probability distribution π with di¢ cult patterns
of dependence of its possibly numerous components.
Sampling directly from π is rarely possible directly and the most
popular techniques are
C.A. (UoB)
PMCMC
16/03/2009
2 / 33
Background
Aim: sample from a probability distribution π with di¢ cult patterns
of dependence of its possibly numerous components.
Sampling directly from π is rarely possible directly and the most
popular techniques are
Markov chain Monte Carlo (MCMC),
C.A. (UoB)
PMCMC
16/03/2009
2 / 33
Background
Aim: sample from a probability distribution π with di¢ cult patterns
of dependence of its possibly numerous components.
Sampling directly from π is rarely possible directly and the most
popular techniques are
Markov chain Monte Carlo (MCMC),
Sequential Monte Carlo (SMC) aka Particle …lters.
C.A. (UoB)
PMCMC
16/03/2009
2 / 33
Background
Aim: sample from a probability distribution π with di¢ cult patterns
of dependence of its possibly numerous components.
Sampling directly from π is rarely possible directly and the most
popular techniques are
Markov chain Monte Carlo (MCMC),
Sequential Monte Carlo (SMC) aka Particle …lters.
Both classes of methods require the design of so-called proposal
distributions whose aim is to propose samples which are then
corrected in order to produce samples from π.
C.A. (UoB)
PMCMC
16/03/2009
2 / 33
Background
Aim: sample from a probability distribution π with di¢ cult patterns
of dependence of its possibly numerous components.
Sampling directly from π is rarely possible directly and the most
popular techniques are
Markov chain Monte Carlo (MCMC),
Sequential Monte Carlo (SMC) aka Particle …lters.
Both classes of methods require the design of so-called proposal
distributions whose aim is to propose samples which are then
corrected in order to produce samples from π.
The design of such instrumental distributions is di¢ cult since,
C.A. (UoB)
PMCMC
16/03/2009
2 / 33
Background
Aim: sample from a probability distribution π with di¢ cult patterns
of dependence of its possibly numerous components.
Sampling directly from π is rarely possible directly and the most
popular techniques are
Markov chain Monte Carlo (MCMC),
Sequential Monte Carlo (SMC) aka Particle …lters.
Both classes of methods require the design of so-called proposal
distributions whose aim is to propose samples which are then
corrected in order to produce samples from π.
The design of such instrumental distributions is di¢ cult since,
they should be “easy” to sample from,
C.A. (UoB)
PMCMC
16/03/2009
2 / 33
Background
Aim: sample from a probability distribution π with di¢ cult patterns
of dependence of its possibly numerous components.
Sampling directly from π is rarely possible directly and the most
popular techniques are
Markov chain Monte Carlo (MCMC),
Sequential Monte Carlo (SMC) aka Particle …lters.
Both classes of methods require the design of so-called proposal
distributions whose aim is to propose samples which are then
corrected in order to produce samples from π.
The design of such instrumental distributions is di¢ cult since,
they should be “easy” to sample from,
should capture important features of the target distribution π (scale,
dependence structure).
C.A. (UoB)
PMCMC
16/03/2009
2 / 33
Background
Aim: sample from a probability distribution π with di¢ cult patterns
of dependence of its possibly numerous components.
Sampling directly from π is rarely possible directly and the most
popular techniques are
Markov chain Monte Carlo (MCMC),
Sequential Monte Carlo (SMC) aka Particle …lters.
Both classes of methods require the design of so-called proposal
distributions whose aim is to propose samples which are then
corrected in order to produce samples from π.
The design of such instrumental distributions is di¢ cult since,
they should be “easy” to sample from,
should capture important features of the target distribution π (scale,
dependence structure).
The success of MCMC and SMC relies on their ability to break down
the di¢ cult problem of sampling from π into simpler sub-problems.
C.A. (UoB)
PMCMC
16/03/2009
2 / 33
Objectives
Given their di¤ering strengths it seems natural to combine MCMC
and SMC.
C.A. (UoB)
PMCMC
16/03/2009
3 / 33
Objectives
Given their di¤ering strengths it seems natural to combine MCMC
and SMC.
We know how to use MCMC within SMC (Gilks & Berzuini, JRSS B,
2001), but what about the reverse?
C.A. (UoB)
PMCMC
16/03/2009
3 / 33
Objectives
Given their di¤ering strengths it seems natural to combine MCMC
and SMC.
We know how to use MCMC within SMC (Gilks & Berzuini, JRSS B,
2001), but what about the reverse?
We show here how to use SMC as proposal mechanisms in MCMC in
a simple and principled way.
C.A. (UoB)
PMCMC
16/03/2009
3 / 33
Objectives
Given their di¤ering strengths it seems natural to combine MCMC
and SMC.
We know how to use MCMC within SMC (Gilks & Berzuini, JRSS B,
2001), but what about the reverse?
We show here how to use SMC as proposal mechanisms in MCMC in
a simple and principled way.
In particular we show that this can allow one to simplify the design of
proposal distributions in numerous scenarios.
C.A. (UoB)
PMCMC
16/03/2009
3 / 33
Motivating Example: General State-Space Models
Let fXk gk
1
be a Markov process de…ned by
X1
C.A. (UoB)
µ and Xk j (Xk
1
PMCMC
= xk
1)
f ( j xk
1) .
16/03/2009
4 / 33
Motivating Example: General State-Space Models
Let fXk gk
1
be a Markov process de…ned by
X1
µ and Xk j (Xk
1
= xk
1)
f ( j xk
1) .
We only have access to a process fYk gk 1 such that, conditional
upon fXk gk 1 , the observations are statistically independent and
Yk j (Xk = xk )
C.A. (UoB)
PMCMC
g ( j xk ) .
16/03/2009
4 / 33
Bayesian Inference in General State-Space Models
Given a collection of observations y1:T := (y1 , ..., yT ), we are
interested in inference about X1:T := (X1 , ..., XT ) .
C.A. (UoB)
PMCMC
16/03/2009
5 / 33
Bayesian Inference in General State-Space Models
Given a collection of observations y1:T := (y1 , ..., yT ), we are
interested in inference about X1:T := (X1 , ..., XT ) .
In this Bayesian framework, inference relies on
p ( x1:T j y1:T ) =
p (x1:T , y1:T )
p (y1:T )
where
T
p (x1:T , y1:T ) = µ (x1 ) ∏ f ( xk j xk
|
C.A. (UoB)
k =2
{z
prior
PMCMC
T
1)
∏ g ( yk j xk )
}k|=1 {z
likelihood
}
16/03/2009
5 / 33
Bayesian Inference in General State-Space Models
Given a collection of observations y1:T := (y1 , ..., yT ), we are
interested in inference about X1:T := (X1 , ..., XT ) .
In this Bayesian framework, inference relies on
p ( x1:T j y1:T ) =
p (x1:T , y1:T )
p (y1:T )
where
T
p (x1:T , y1:T ) = µ (x1 ) ∏ f ( xk j xk
|
k =2
{z
prior
T
1)
∏ g ( yk j xk )
}k|=1 {z
likelihood
}
Except for a few models including …nite state-space HMM and linear
Gaussian models, this posterior distribution does not admit a closed
form expression.
C.A. (UoB)
PMCMC
16/03/2009
5 / 33
Standard MCMC Approaches
We sample an ergodic Markov chain fX1:T (i )g of invariant
distribution p ( x1:T j y1:T ) using a Metropolis-Hastings (MH)
algorithm; e.g. an independent MH sampler.
C.A. (UoB)
PMCMC
16/03/2009
6 / 33
Standard MCMC Approaches
We sample an ergodic Markov chain fX1:T (i )g of invariant
distribution p ( x1:T j y1:T ) using a Metropolis-Hastings (MH)
algorithm; e.g. an independent MH sampler.
At iteration i
Sample X1:T
W.p. 1 ^
q (x1:T ) .
p (X 1:T ,y1:T ) q (X 1:T (i 1 ))
q (X 1:T ) p (X 1:T (i 1 ),y1:T )
X1:T (i ) = X1:T (i
C.A. (UoB)
set X1:T (i ) = X1:T , otherwise set
1).
PMCMC
16/03/2009
6 / 33
Standard MCMC Approaches
We sample an ergodic Markov chain fX1:T (i )g of invariant
distribution p ( x1:T j y1:T ) using a Metropolis-Hastings (MH)
algorithm; e.g. an independent MH sampler.
At iteration i
Sample X1:T
W.p. 1 ^
q (x1:T ) .
p (X 1:T ,y1:T ) q (X 1:T (i 1 ))
q (X 1:T ) p (X 1:T (i 1 ),y1:T )
X1:T (i ) = X1:T (i
set X1:T (i ) = X1:T , otherwise set
1).
Very simple but very ine¢ cient.
C.A. (UoB)
PMCMC
16/03/2009
6 / 33
Standard MCMC Approaches
We sample an ergodic Markov chain fX1:T (i )g of invariant
distribution p ( x1:T j y1:T ) using a Metropolis-Hastings (MH)
algorithm; e.g. an independent MH sampler.
At iteration i
Sample X1:T
W.p. 1 ^
q (x1:T ) .
p (X 1:T ,y1:T ) q (X 1:T (i 1 ))
q (X 1:T ) p (X 1:T (i 1 ),y1:T )
X1:T (i ) = X1:T (i
set X1:T (i ) = X1:T , otherwise set
1).
Very simple but very ine¢ cient.
For good performance, we need to select q (x1:T ) p ( x1:T j y1:T ) but
this is essentially impossible as soon as, say, T > 10.
C.A. (UoB)
PMCMC
16/03/2009
6 / 33
Standard MCMC Approaches
We sample an ergodic Markov chain fX1:T (i )g of invariant
distribution p ( x1:T j y1:T ) using a Metropolis-Hastings (MH)
algorithm; e.g. an independent MH sampler.
At iteration i
Sample X1:T
W.p. 1 ^
q (x1:T ) .
p (X 1:T ,y1:T ) q (X 1:T (i 1 ))
q (X 1:T ) p (X 1:T (i 1 ),y1:T )
X1:T (i ) = X1:T (i
set X1:T (i ) = X1:T , otherwise set
1).
Very simple but very ine¢ cient.
For good performance, we need to select q (x1:T ) p ( x1:T j y1:T ) but
this is essentially impossible as soon as, say, T > 10.
Errors tend to accumulate.
C.A. (UoB)
PMCMC
16/03/2009
6 / 33
Common Approaches and Limitations
Standard practice consists of building an MCMC kernel updating
subblocks
p (xn:n +K
C.A. (UoB)
1 jy1:T , x1:n 1 , xn +K :T )
∝
PMCMC
n +K
∏ f (xk jxk
k =n
n +K 1
1)
∏
k =n
g (yk jxk ) .
16/03/2009
7 / 33
Common Approaches and Limitations
Standard practice consists of building an MCMC kernel updating
subblocks
p (xn:n +K
1 jy1:T , x1:n 1 , xn +K :T )
∝
n +K
∏ f (xk jxk
k =n
n +K 1
1)
∏
k =n
g (yk jxk ) .
Tradeo¤ between length K or the blocks and ease of design of good
proposals.
C.A. (UoB)
PMCMC
16/03/2009
7 / 33
Common Approaches and Limitations
Standard practice consists of building an MCMC kernel updating
subblocks
p (xn:n +K
1 jy1:T , x1:n 1 , xn +K :T )
∝
n +K
∏ f (xk jxk
k =n
n +K 1
1)
∏
k =n
g (yk jxk ) .
Tradeo¤ between length K or the blocks and ease of design of good
proposals.
Many complex models (e.g. volatility models, in…nite HMM) are such
that it is only possible to sample forward from the prior but impossible
to evaluate it pointwise: how could one use MCMC for such models?
C.A. (UoB)
PMCMC
16/03/2009
7 / 33
Sequential Monte Carlo aka Particle Filters
SMC methods provide an alternative way to approximate
p ( x1:T j y1:T ) and to compute p (y1:T ).
C.A. (UoB)
PMCMC
16/03/2009
8 / 33
Sequential Monte Carlo aka Particle Filters
SMC methods provide an alternative way to approximate
p ( x1:T j y1:T ) and to compute p (y1:T ).
To sample from p ( x1:T j y1:T ), SMC proceeds sequentially by …rst
approximating p ( x1 j y1 ) at time 1 then p ( x1:2 j y1:2 ) at time 2 and so
on.
C.A. (UoB)
PMCMC
16/03/2009
8 / 33
Sequential Monte Carlo aka Particle Filters
SMC methods provide an alternative way to approximate
p ( x1:T j y1:T ) and to compute p (y1:T ).
To sample from p ( x1:T j y1:T ), SMC proceeds sequentially by …rst
approximating p ( x1 j y1 ) at time 1 then p ( x1:2 j y1:2 ) at time 2 and so
on.
SMC methods approximate the distributions of interest via a cloud of
N particles which are propagated using Importance Sampling and
Resampling steps.
C.A. (UoB)
PMCMC
16/03/2009
8 / 33
Vanilla SMC - The Boostrap Filter
At time 1
(i )
Sample X1
(i )
∑N
i = 1 W1
C.A. (UoB)
(i )
µ (x1 ) and compute W1
∝g
(i )
y1 j X1 , ,
= 1.
PMCMC
16/03/2009
9 / 33
Vanilla SMC - The Boostrap Filter
At time 1
(i )
Sample X1
(i )
∑N
i = 1 W1
(i )
µ (x1 ) and compute W1
(i )
∝g
y1 j X1 , ,
= 1.
(i )
Resample N times from p
b ( x1 j y1 ) = ∑N
i =1 W1 δX (i ) (x1 ).
1
At time k, k > 1
C.A. (UoB)
PMCMC
16/03/2009
9 / 33
Vanilla SMC - The Boostrap Filter
At time 1
(i )
Sample X1
(i )
∑N
i = 1 W1
(i )
µ (x1 ) and compute W1
(i )
∝g
y1 j X1 , ,
= 1.
(i )
Resample N times from p
b ( x1 j y1 ) = ∑N
i =1 W1 δX (i ) (x1 ).
1
At time k, k > 1
(i )
Sample Xk
(i )
Wk
∝g
C.A. (UoB)
(i )
f
xk j Xk
(i )
yk j Xk
(i )
1
(i )
, set X1:k = X1:k
(i )
, ∑N
i = 1 Wk
(i )
1 , Xk
and compute
= 1.
PMCMC
16/03/2009
9 / 33
Vanilla SMC - The Boostrap Filter
At time 1
(i )
Sample X1
(i )
∑N
i = 1 W1
(i )
µ (x1 ) and compute W1
(i )
∝g
y1 j X1 , ,
= 1.
(i )
Resample N times from p
b ( x1 j y1 ) = ∑N
i =1 W1 δX (i ) (x1 ).
1
At time k, k > 1
(i )
Sample Xk
(i )
Wk
∝g
(i )
f
xk j Xk
(i )
yk j Xk
(i )
1
(i )
, set X1:k = X1:k
(i )
, ∑N
i = 1 Wk
(i )
1 , Xk
and compute
= 1.
(i )
Resample N times from p
b ( x1:k j y1:k ) = ∑N
i =1 Wk δX (i ) (x1:k ).
1:k
C.A. (UoB)
PMCMC
16/03/2009
9 / 33
1.6
1.4
state
1.2
1
0.8
0.6
0.4
5
10
15
20
25
15
20
25
time index
1.6
1.4
state
1.2
1
0.8
0.6
0.4
5
10
time index
C.A. (UoB)
PMCMC
16/03/2009
10 / 33
1.6
1.4
state
1.2
1
0.8
0.6
0.4
5
10
15
20
25
15
20
25
time index
1.6
1.4
state
1.2
1
0.8
0.6
0.4
5
10
time index
b [ X1 j y1 ] , E
b [ X2 j y1 :2 ] (top) and particle
Figure: p ( x1 j y1 ) , p ( x2 j y1 :2 )and E
approximation of p ( x1 :2 j y1 :2 ) (bottom)
C.A. (UoB)
PMCMC
16/03/2009
11 / 33
1.6
1.4
state
1.2
1
0.8
0.6
0.4
5
10
15
20
25
15
20
25
time index
1.6
1.4
state
1.2
1
0.8
0.6
0.4
5
10
time index
b [ Xk j y1 :k ] for k = 1, 2, 3 (top) and particle
Figure: p ( xk j y1 :k ) and E
approximation of p ( x1 :3 j y1 :3 ) (bottom)
C.A. (UoB)
PMCMC
16/03/2009
12 / 33
1.6
1.4
state
1.2
1
0.8
0.6
0.4
5
10
15
20
25
15
20
25
time index
1.6
1.4
state
1.2
1
0.8
0.6
0.4
5
10
time index
b [ Xk j y1 :k ] for k = 1, .., 4 (top) and particle
Figure: p ( xk j y1 :k ) and E
approximation of p ( x1 :4 j y1 :4 ) (bottom)
C.A. (UoB)
PMCMC
16/03/2009
13 / 33
1.6
1.4
state
1.2
1
0.8
0.6
0.4
5
10
15
20
25
15
20
25
time index
1.6
1.4
state
1.2
1
0.8
0.6
0.4
5
10
time index
b [ Xk j y1 :k ] for k = 1, ..., 10 (top) and particle
Figure: p ( xk j y1 :k ) and E
approximation of p ( x1 :10 j y1 :10 ) (bottom)
C.A. (UoB)
PMCMC
16/03/2009
14 / 33
1.6
1.4
state
1.2
1
0.8
0.6
0.4
5
10
15
20
25
15
20
25
time index
1.6
1.4
state
1.2
1
0.8
0.6
0.4
5
10
time index
b [ Xk j y1 :k ] for k = 1, ..., 20 (top) and particle
Figure: p ( xk j y1 :k ) and E
approximation of p ( x1 :20 j y1 :20 ) (bottom)
C.A. (UoB)
PMCMC
16/03/2009
15 / 33
1.6
1.4
state
1.2
1
0.8
0.6
0.4
5
10
15
20
25
15
20
25
time index
1.6
1.4
state
1.2
1
0.8
0.6
0.4
5
10
time index
b [ Xk j y1 :k ] for k = 1, ..., 24 (top) and particle
Figure: p ( xk j y1 :k ) and E
approximation of p ( x1 :24 j y1 :24 ) (bottom)
C.A. (UoB)
PMCMC
16/03/2009
16 / 33
SMC Output
At time T , we obtain the following approximation of the posterior of
interest p ( x1:T j y1:T )
N
p
b ( x1:T j y1:T ) =
∑ WT
(i )
i =1
δX (i ) (x1:T )
1:T
and an approximation of p (y1:T ) given by
T
b ( yk j y1:k
p
b (y1:T ) = p
b (y1 ) ∏ p
T
1)
k =2
C.A. (UoB)
PMCMC
=
∏
k =1
1
N
N
∑
i =1
g
(i )
yk j Xk
16/03/2009
!
.
17 / 33
SMC Output
At time T , we obtain the following approximation of the posterior of
interest p ( x1:T j y1:T )
N
p
b ( x1:T j y1:T ) =
∑ WT
(i )
i =1
δX (i ) (x1:T )
1:T
and an approximation of p (y1:T ) given by
T
b ( yk j y1:k
p
b (y1:T ) = p
b (y1 ) ∏ p
T
1)
k =2
=
∏
k =1
1
N
N
∑
i =1
g
(i )
yk j Xk
!
.
These approximations are asymptotically (i.e. N ! ∞) consistent
under very weak assumptions.
C.A. (UoB)
PMCMC
16/03/2009
17 / 33
1.6
1.4
state
1.2
1
0.8
0.6
0.4
5
10
15
20
25
15
20
25
time index
1.6
1.4
state
1.2
1
0.8
0.6
0.4
5
10
time index
b [ Xk j y1 :k ] for k = 1, ..., 24 (top) and particle
Figure: p ( xk j y1 :k ) and E
approximation of p ( x1 :24 j y1 :24 ) (bottom)
C.A. (UoB)
PMCMC
16/03/2009
18 / 33
Some Theoretical Results
Under mixing assumptions (Del Moral, 2004), we have
kLaw (X1:T )
where X1:T
C.A. (UoB)
p ( x1:T j y1:T )ktv
C
T
N
p
b ( x1:T j y1:T ).
PMCMC
16/03/2009
19 / 33
Some Theoretical Results
Under mixing assumptions (Del Moral, 2004), we have
kLaw (X1:T )
where X1:T
p ( x1:T j y1:T )ktv
C
T
N
p
b ( x1:T j y1:T ).
Under mixing assumptions (Chopin, 2004; Del Moral & D., 2004) we
also have
T
V [p
b (y1:T )]
D .
2
p (y1:T )
N
C.A. (UoB)
PMCMC
16/03/2009
19 / 33
Some Theoretical Results
Under mixing assumptions (Del Moral, 2004), we have
kLaw (X1:T )
where X1:T
p ( x1:T j y1:T )ktv
C
T
N
p
b ( x1:T j y1:T ).
Under mixing assumptions (Chopin, 2004; Del Moral & D., 2004) we
also have
T
V [p
b (y1:T )]
D .
2
p (y1:T )
N
Loosely speaking, the performance of SMC only degrades linearly with
time instead of exponentially for naive approaches.
C.A. (UoB)
PMCMC
16/03/2009
19 / 33
Combining MCMC and SMC
‘Idea’: Use the output of our SMC algorithm to de…ne ‘good’
high-dimensional proposal distributions for MCMC.
C.A. (UoB)
PMCMC
16/03/2009
20 / 33
Combining MCMC and SMC
‘Idea’: Use the output of our SMC algorithm to de…ne ‘good’
high-dimensional proposal distributions for MCMC.
Problem: Consider X1:T
p
b ( x1:T j y1:T ) then
Law (X1:T ) := q (x1:T ) = E
N
∑
i =1
(i )
WT δ X ( i )
1:T
!
(x1:T ) .
However, q (x1:T ) does not admit a closed-form expression and so we
cannot directly use the MH algorithm to correct for the discrepancy
between q (x1:T ) and p ( x1:T j y1:T ).
C.A. (UoB)
PMCMC
16/03/2009
20 / 33
Combining MCMC and SMC
‘Idea’: Use the output of our SMC algorithm to de…ne ‘good’
high-dimensional proposal distributions for MCMC.
Problem: Consider X1:T
p
b ( x1:T j y1:T ) then
Law (X1:T ) := q (x1:T ) = E
N
∑
i =1
(i )
WT δ X ( i )
1:T
!
(x1:T ) .
However, q (x1:T ) does not admit a closed-form expression and so we
cannot directly use the MH algorithm to correct for the discrepancy
between q (x1:T ) and p ( x1:T j y1:T ).
Hence the need for a new methodology....
C.A. (UoB)
PMCMC
16/03/2009
20 / 33
Particle IMH Sampler
At iteration 1
Run an SMC algorithm to obtain p
b(1 ) ( x1:T j y1:T ) and p
b(1 ) (y1:T ).
C.A. (UoB)
PMCMC
16/03/2009
21 / 33
Particle IMH Sampler
At iteration 1
Run an SMC algorithm to obtain p
b(1 ) ( x1:T j y1:T ) and p
b(1 ) (y1:T ).
Sample X1:T (1)
At iteration i; i
C.A. (UoB)
p
b(1 ) ( x1:T j y1:T ) .
2
PMCMC
16/03/2009
21 / 33
Particle IMH Sampler
At iteration 1
Run an SMC algorithm to obtain p
b(1 ) ( x1:T j y1:T ) and p
b(1 ) (y1:T ).
Sample X1:T (1)
At iteration i; i
p
b(1 ) ( x1:T j y1:T ) .
2
Run an SMC algorithm to obtain p
b ( x1:T j y1:T ) and p
b (y1:T ).
C.A. (UoB)
PMCMC
16/03/2009
21 / 33
Particle IMH Sampler
At iteration 1
Run an SMC algorithm to obtain p
b(1 ) ( x1:T j y1:T ) and p
b(1 ) (y1:T ).
Sample X1:T (1)
At iteration i; i
p
b(1 ) ( x1:T j y1:T ) .
2
Run an SMC algorithm to obtain p
b ( x1:T j y1:T ) and p
b (y1:T ).
Sample X1:T
C.A. (UoB)
p
b ( x1:T j y1:T ) .
PMCMC
16/03/2009
21 / 33
Particle IMH Sampler
At iteration 1
Run an SMC algorithm to obtain p
b(1 ) ( x1:T j y1:T ) and p
b(1 ) (y1:T ).
Sample X1:T (1)
At iteration i; i
p
b(1 ) ( x1:T j y1:T ) .
2
Run an SMC algorithm to obtain p
b ( x1:T j y1:T ) and p
b (y1:T ).
Sample X1:T
W.p.
p
b ( x1:T j y1:T ) .
1^
p
b (y1:T )
(y1:T )
p
b(i 1 )
set X1:T (i ) = X1:T and p
b(i ) (y1:T ) = p
b (y1:T ), otherwise set
X1:T (i ) = X1:T (i 1) and p
b(i ) (y1:T ) = p
b(i 1 ) (y1:T ) .
C.A. (UoB)
PMCMC
16/03/2009
21 / 33
1.6
1.4
state
1.2
1
0.8
0.6
0.4
5
10
15
20
25
15
20
25
time index
1.6
1.4
state
1.2
1
0.8
0.6
0.4
5
10
time index
b [ X1 j y1 ] , E
b [ X2 j y1 :2 ] (top) and particle
Figure: p ( x1 j y1 ) , p ( x2 j y1 :2 )and E
approximation of p ( x1 :2 j y1 :2 ) (bottom)
C.A. (UoB)
PMCMC
16/03/2009
22 / 33
1.6
1.4
state
1.2
1
0.8
0.6
0.4
5
10
15
20
25
15
20
25
time index
1.6
1.4
state
1.2
1
0.8
0.6
0.4
5
10
time index
b [ Xk j y1 :k ] for k = 1, 2, 3 (top) and particle
Figure: p ( xk j y1 :k ) and E
approximation of p ( x1 :3 j y1 :3 ) (bottom)
C.A. (UoB)
PMCMC
16/03/2009
23 / 33
Main result
Consider the distribution of all the r.v. generated by the particle …lter
ψ x11:N , ..., xT1:N , a11:N , ..., aT1:N 1
C.A. (UoB)
PMCMC
16/03/2009
24 / 33
Main result
Consider the distribution of all the r.v. generated by the particle …lter
ψ x11:N , ..., xT1:N , a11:N , ..., aT1:N 1
Proposition. For any N 1 the PIMH sampler is an independent
MH sampler de…ned on the extended space
f1, . . . , N g X TN f1, . . . , N g(T 1 )N with target density
k, x11:N , ..., xT1:N , a11:N , . . . , aT1:N 1
=
1
NT
bk
µ (x 1 1 )
T
k jy
p (x1:T
1:T )
∏n =2 r (ank
1 j wn
1 )f
ak
ak 1
1)
(xn n jx1:nn
1:N
ψ x11:N , ..., xT1:N , a1:T
1
k
which admits a conditional distribution for x1:T
equal to
p ( x1:T j y1:T ).
C.A. (UoB)
PMCMC
16/03/2009
24 / 33
Waste Recycling
The ‘standard’estimate of
iterations is
C.A. (UoB)
R
f (x1:T ) p ( x1:T j y1:T ) dx1:T for L MCMC
1 L
f (X1:T (i )).
L i∑
=1
PMCMC
16/03/2009
25 / 33
Waste Recycling
The ‘standard’estimate of
iterations is
R
f (x1:T ) p ( x1:T j y1:T ) dx1:T for L MCMC
1 L
f (X1:T (i )).
L i∑
=1
We generate N particles at each iteration i of the MCMC algorithm
to decide whether to accept or reject one single candidate. This
seems terribly wasteful...
C.A. (UoB)
PMCMC
16/03/2009
25 / 33
Waste Recycling
The ‘standard’estimate of
iterations is
R
f (x1:T ) p ( x1:T j y1:T ) dx1:T for L MCMC
1 L
f (X1:T (i )).
L i∑
=1
We generate N particles at each iteration i of the MCMC algorithm
to decide whether to accept or reject one single candidate. This
seems terribly wasteful...
It is possible to use all the particles by considering the estimate
!
N
1 L
(k )
(k )
∑ WT (i ) f (X1:T (i )) .
L i∑
=1 k =1
C.A. (UoB)
PMCMC
16/03/2009
25 / 33
Waste Recycling
The ‘standard’estimate of
iterations is
R
f (x1:T ) p ( x1:T j y1:T ) dx1:T for L MCMC
1 L
f (X1:T (i )).
L i∑
=1
We generate N particles at each iteration i of the MCMC algorithm
to decide whether to accept or reject one single candidate. This
seems terribly wasteful...
It is possible to use all the particles by considering the estimate
!
N
1 L
(k )
(k )
∑ WT (i ) f (X1:T (i )) .
L i∑
=1 k =1
We can even recycle the populations of rejected particles...
C.A. (UoB)
PMCMC
16/03/2009
25 / 33
Extensions
If T is too large compared to the number N of particles, then the
SMC approximation might be poor and results in an ine¢ cient MH
algorithm.
C.A. (UoB)
PMCMC
16/03/2009
26 / 33
Extensions
If T is too large compared to the number N of particles, then the
SMC approximation might be poor and results in an ine¢ cient MH
algorithm.
We can use a mixture/composition of PMH transition probabilities
that update subblocks of the form Xa:b for 1 a < b T , e¤ectively
targeting conditional distributions of the type p (xa:b jxa 1 , xb +1 , ya:b ).
C.A. (UoB)
PMCMC
16/03/2009
26 / 33
Extensions
If T is too large compared to the number N of particles, then the
SMC approximation might be poor and results in an ine¢ cient MH
algorithm.
We can use a mixture/composition of PMH transition probabilities
that update subblocks of the form Xa:b for 1 a < b T , e¤ectively
targeting conditional distributions of the type p (xa:b jxa 1 , xb +1 , ya:b ).
Given that we know the exact structure of the extended target
distribution, we can rejuvenate some of the particles conditional upon
the others for computational purposes.
C.A. (UoB)
PMCMC
16/03/2009
26 / 33
Nonlinear State-Space Model
Consider the following model
Xk
Yk
where Vk
C.A. (UoB)
=
=
1
Xk
2
1
+ 25
Xk 1
1 + Xk2
+ 8 cos 1.2k + Vk ,
1
Xk2
+ Wk
20
N 0, σ2v , Wk
N 0, σ2w and X1
PMCMC
N 0, 52 .
16/03/2009
27 / 33
Nonlinear State-Space Model
Consider the following model
Xk
Yk
=
=
1
Xk
2
1
+ 25
Xk 1
1 + Xk2
+ 8 cos 1.2k + Vk ,
1
Xk2
+ Wk
20
N 0, σ2v , Wk N 0, σ2w and X1 N 0, 52 .
We investigate the evolution of the expected acceptance ratio as a
function of N in two scenarios.
where Vk
C.A. (UoB)
PMCMC
16/03/2009
27 / 33
Nonlinear State-Space Model
Consider the following model
Xk
Yk
=
=
1
Xk
2
1
+ 25
Xk 1
1 + Xk2
+ 8 cos 1.2k + Vk ,
1
Xk2
+ Wk
20
N 0, σ2v , Wk N 0, σ2w and X1 N 0, 52 .
We investigate the evolution of the expected acceptance ratio as a
function of N in two scenarios.
Use the prior as proposal distribution!
where Vk
C.A. (UoB)
PMCMC
16/03/2009
27 / 33
A More Realistic Problem
In numerous scenarios, µ (x ), f ( x 0 j x ) and g ( y j x ) are not known
exactly but depend on an unknown parameter θ and we note µθ (x ),
fθ ( x 0 j x ) and gθ ( y j x ).
C.A. (UoB)
PMCMC
16/03/2009
28 / 33
A More Realistic Problem
In numerous scenarios, µ (x ), f ( x 0 j x ) and g ( y j x ) are not known
exactly but depend on an unknown parameter θ and we note µθ (x ),
fθ ( x 0 j x ) and gθ ( y j x ).
We set a prior p (θ ) on θ and we are interested in
p ( θ, x1:T j y1:T ) =
C.A. (UoB)
p ( x1:T , y1:T j θ ) p (θ )
p (y1:T )
PMCMC
16/03/2009
28 / 33
A Marginal Metropolis-Hastings Algorithm
To sample from p ( θ, x1:T j y1:T )one could suggest the ideal algorithm
(which e¤ectively integrates x1:T out).
C.A. (UoB)
PMCMC
16/03/2009
29 / 33
A Marginal Metropolis-Hastings Algorithm
To sample from p ( θ, x1:T j y1:T )one could suggest the ideal algorithm
(which e¤ectively integrates x1:T out).
At iteration i
Sample θ
q ( θ j θ (i 1)) and x1:T
pθ (x1:T jy1:T )
p ( y1:T jθ )p (θ )
q ( θ (i 1 )jθ )
W.p. 1 ^ q ( θ jθ (i 1 )) p ( y jθ (i 1 ))p (θ (i 1 )) set θ (i ) = θ and
1:T
x1:T (i ) = x1:T , otherwise set θ (i ) = θ (i 1) and
x1:T (i ) = x1:T (i 1).
C.A. (UoB)
PMCMC
16/03/2009
29 / 33
A Marginal Metropolis-Hastings Algorithm
To sample from p ( θ, x1:T j y1:T )one could suggest the ideal algorithm
(which e¤ectively integrates x1:T out).
At iteration i
Sample θ
q ( θ j θ (i 1)) and x1:T
pθ (x1:T jy1:T )
p ( y1:T jθ )p (θ )
q ( θ (i 1 )jθ )
W.p. 1 ^ q ( θ jθ (i 1 )) p ( y jθ (i 1 ))p (θ (i 1 )) set θ (i ) = θ and
1:T
x1:T (i ) = x1:T , otherwise set θ (i ) = θ (i 1) and
x1:T (i ) = x1:T (i 1).
Problems: sampling
x1:T exactly and expression for
R
p ( y1:T j θ ) = p ( x1:T , y1:T j θ ) dx1:T ?
C.A. (UoB)
PMCMC
16/03/2009
29 / 33
Particle Marginal MH Sampler
At iteration 1
Run an SMC algorithm to obtain p
b ( y1:T j θ (1)) and sample
x1:T (1) p̂θ (1 ) (x1:T jy1:T )
At iteration i; i
C.A. (UoB)
2
PMCMC
16/03/2009
30 / 33
Particle Marginal MH Sampler
At iteration 1
Run an SMC algorithm to obtain p
b ( y1:T j θ (1)) and sample
x1:T (1) p̂θ (1 ) (x1:T jy1:T )
At iteration i; i
2
Sample θ
q ( θ j θ (i 1)), run an SMC algorithm to obtain
p
b ( y1:T j θ ) and a sample x1:T
p̂θ (x1:T jy1:T )
C.A. (UoB)
PMCMC
16/03/2009
30 / 33
Particle Marginal MH Sampler
At iteration 1
Run an SMC algorithm to obtain p
b ( y1:T j θ (1)) and sample
x1:T (1) p̂θ (1 ) (x1:T jy1:T )
At iteration i; i
2
Sample θ
q ( θ j θ (i 1)), run an SMC algorithm to obtain
p
b ( y1:T j θ ) and a sample x1:T
p̂θ (x1:T jy1:T )
W.p.
1^
q ( θ (i 1)j θ )
p
b ( y1:T j θ ) p (θ )
q ( θ j θ (i 1)) p
b ( y1:T j θ (i 1)) p (θ (i
1))
set θ (i ) = θ and x1:T (i ) = x1:T otherwise set θ (i ) = θ (i
x1:T (i ) = x1:T (i 1).
C.A. (UoB)
PMCMC
1) and
16/03/2009
30 / 33
More in CA, AD & RH...
Identifying the structure of the underlying target distribution allows us
to de…ne a ‘Particle Gibbs sampler’: this requires one to use a
‘conditional SMC update’.
C.A. (UoB)
PMCMC
16/03/2009
31 / 33
More in CA, AD & RH...
Identifying the structure of the underlying target distribution allows us
to de…ne a ‘Particle Gibbs sampler’: this requires one to use a
‘conditional SMC update’.
More generally, sampling not constrained to the PIMH or the PMMH
updates.
C.A. (UoB)
PMCMC
16/03/2009
31 / 33
More in CA, AD & RH...
Identifying the structure of the underlying target distribution allows us
to de…ne a ‘Particle Gibbs sampler’: this requires one to use a
‘conditional SMC update’.
More generally, sampling not constrained to the PIMH or the PMMH
updates.
Application to Levy-driven stochastic volatility, for which sampling
from the prior is the only known possibility.
C.A. (UoB)
PMCMC
16/03/2009
31 / 33
Extensions
This methodology is by no means limited to state-space models and
in particular there is no need for any Markovian assumption.
C.A. (UoB)
PMCMC
16/03/2009
32 / 33
Extensions
This methodology is by no means limited to state-space models and
in particular there is no need for any Markovian assumption.
Wherever SMC have been used, PMCMC can be used: self-avoiding
random walks, contingency tables, mixture models, SMC samplers etc.
C.A. (UoB)
PMCMC
16/03/2009
32 / 33
Extensions
This methodology is by no means limited to state-space models and
in particular there is no need for any Markovian assumption.
Wherever SMC have been used, PMCMC can be used: self-avoiding
random walks, contingency tables, mixture models, SMC samplers etc.
Sophisticated SMC algorithms can also be used including ‘clever’
importance distributions and resampling schemes.
C.A. (UoB)
PMCMC
16/03/2009
32 / 33
Extensions
This methodology is by no means limited to state-space models and
in particular there is no need for any Markovian assumption.
Wherever SMC have been used, PMCMC can be used: self-avoiding
random walks, contingency tables, mixture models, SMC samplers etc.
Sophisticated SMC algorithms can also be used including ‘clever’
importance distributions and resampling schemes.
Theoretical results of (Andrieu & Roberts, Ann. Stats, to appear) can
be used in this setup.
C.A. (UoB)
PMCMC
16/03/2009
32 / 33
Discussion
PMCMC allows us to design ‘good’high dimensional proposals based
only on low dimensional proposals,
C.A. (UoB)
PMCMC
16/03/2009
33 / 33
Discussion
PMCMC allows us to design ‘good’high dimensional proposals based
only on low dimensional proposals,
PMCMC provides a simple way to perform inference for models for
which only forward simulation is possible,
C.A. (UoB)
PMCMC
16/03/2009
33 / 33
Discussion
PMCMC allows us to design ‘good’high dimensional proposals based
only on low dimensional proposals,
PMCMC provides a simple way to perform inference for models for
which only forward simulation is possible,
PMCMC is computationally expensive but can be combined with
standard MCMC steps,
C.A. (UoB)
PMCMC
16/03/2009
33 / 33
Discussion
PMCMC allows us to design ‘good’high dimensional proposals based
only on low dimensional proposals,
PMCMC provides a simple way to perform inference for models for
which only forward simulation is possible,
PMCMC is computationally expensive but can be combined with
standard MCMC steps,
Connections to the EAV (Expected Auxiliary Variable) approach for
simulations,
C.A. (UoB)
PMCMC
16/03/2009
33 / 33
Discussion
PMCMC allows us to design ‘good’high dimensional proposals based
only on low dimensional proposals,
PMCMC provides a simple way to perform inference for models for
which only forward simulation is possible,
PMCMC is computationally expensive but can be combined with
standard MCMC steps,
Connections to the EAV (Expected Auxiliary Variable) approach for
simulations,
Extension to ‘dependent’proposals in the spirit of MTM (Liu et al.,
JASA 2000) is feasible.
C.A. (UoB)
PMCMC
16/03/2009
33 / 33
Download