Applications of SMC to the analysis of
partially observed jump processes
and: the Entangled Monte Carlo algorithm

Alexandre Bouchard-Côté
University of British Columbia

Collaborators:
Seong-Hwan Jun, Bonnie Kirkpatrick, Liangliang Wang

Saturday, September 22, 2012
Part I: Overview

 Problem: posterior inference for countably infinite
Continuous Time Markov Chains (CTMCs)
 Motivation: phylogenetic inference under evolutionary
models with random dependencies across sites
 Proposed method:
 Proposals based on supermartingales over combinatorial
potentials
 Weights given by exponentiation of random matrices
Example of Non-local Evolutionary Events:
Slipped Strand Mispairing (SSM)

An example from Levinson '87.

[Figure: normal pairing during DNA replication, followed by a slipped-strand
mispairing in which replication continues and an extra TA repeat unit is
inserted]

L. Wang and A. Bouchard-Côté (UBC), Harnessing Non-Local Evolutionary Events for Tree Infer., June 25, 2012
SSMs on a tree

String-valued branching process: the root sequence G A T C evolves along the
tree into leaf sequences v1, ..., v4 (e.g. v1: G T A C).

SSMs on a branch

Example of evolutionary events along one branch:

T0  starting sequence          G A T C
T1  point deletion: G          A T C
T2  point mutation: A -> G     G T C
T3  SSM insertion: (G T)       G T G T C
T4  point insertion: A         G T G T A C
T5  SSM deletion: (G T)        G T A C
T6  ending sequence (v1)       G T A C
SSMs and phylogenetic inference

 Potential of SSM in phylogenetics:
 Interactions between SSMs and point mutations add
constraints---this can help resolve trees and alignments
 Very frequent in neutral regions (e.g. plant introns)
 This potential has not been exploited yet
 Reason: inference is computationally challenging
Computational problem

 Our application (phylogenetic tree inference) requires
SMC/PMCMC samplers...
 but the main ideas can be explained in a simpler
setup:
 computing a marginal transition probability,
 using importance sampling
Model: String-valued Continuous Time Markov Chain

Marginal transition probability. The evolution of strings is denoted by M;
the branch length is T = T6. Strings {X_t : 0 ≤ t ≤ T}; X_t has a countably
infinite domain: all possible molecular sequences.

[Figure: a sample path over X = strings, with x1 = GATC, x2 = ATC, x3 = GTC,
x4 = GTGTC, x5 = GTGTAC, x6 = GTAC, and jump times T1, ..., T6 along the
branch; X_1 = x at time 0 and X_N = y at the branch length]
Goal: compute the marginal transition probability

P(X_N = y | X_1 = x)
The jump chain and holding times:

 Jump distribution: X_{i+1} | X_i ∼ ν_{X_i}
 Holding times: H_i ∼ Exp(λ(X_i)), where H_i = T_i − T_{i−1}
Parameters (example):

The rate of departing from x: λ : X → (0, ∞),

λ(x) = n θ_sub + λ_pt + n μ_pt + λ_SSM + f(x) μ_SSM

where n is the length of x, and f(x) is the number of valid SSM deletion
locations.

The jumping distribution: ν : X × F_X → [0, 1], with

                               ⎧ θ_sub          point substitution
                               ⎪ λ_pt / (n+1)   point insertion
J(x → x') = ν_x({x'}) = 1/λ(x) ⎨ μ_pt           point deletion
                               ⎪ λ_SSM          SSM insertion
                               ⎩ μ_SSM          SSM deletion

according to the mutation type taking x to x'.
Note: this process is explosion free (always assumed today).
Note: the rate function λ is unbounded.
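To make the jump-chain/holding-time mechanics concrete, here is a minimal forward simulator for a string-valued CTMC. The rate and jump functions below are toy stand-ins (point insertions and deletions only, with made-up constants), not the talk's full SSM model:

```python
import random

def simulate_jump_chain(x0, rate, jump, t_end, rng):
    """Forward-simulate a CTMC: holding times H_i ~ Exp(lambda(x)),
    then a jump X_{i+1} | X_i ~ nu_{X_i}, until the branch length runs out."""
    t, x = 0.0, x0
    path = [x0]
    while True:
        t += rng.expovariate(rate(x))   # holding time H ~ Exp(lambda(x))
        if t > t_end:
            return path                 # branch length exhausted
        x = jump(x, rng)                # jump distribution
        path.append(x)

# Toy stand-ins: rate grows with the string length n, as in lambda(x).
def rate(x):
    return 0.5 + 0.1 * len(x)

def jump(x, rng):
    if x and rng.random() < 0.5:
        i = rng.randrange(len(x))       # point deletion
        return x[:i] + x[i + 1:]
    i = rng.randrange(len(x) + 1)       # point insertion
    return x[:i] + rng.choice("ACGT") + x[i:]

rng = random.Random(1)
path = simulate_jump_chain("GATC", rate, jump, t_end=5.0, rng=rng)
print(path[0], "->", path[-1])
```

This is exactly the "natural" forward proposal discussed later in the talk: easy to sample, but with no guarantee of ending at a prescribed sequence y.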
Related work

 Finite case: efficient exact and approximate exponentiation and
estimation of rate matrices (Albert 1962; Asmussen et al. 1996;
Hobolth et al. 2005; Tataru et al. 2011; inter alia)
 When the rate function is bounded: uniformization (Jensen
1953; Hobolth et al. 2009; inter alia); more recent jump-diffusion
inference schemes using thinning for the discrete part (Casella
et al. 2011; Murray Pollock's talk)
 MCMC approaches (Rao et al. 2011)
 Work on countable spaces based on forward simulation (Saeedi
et al. 2011; Läubli 2011)
 Birth-death processes (Crawford et al. 2011; inter alia)
Proposed method: notation

State space: the list of visited states between the end points,

X = (X_1, X_2, ..., X_N),    T = (T_1, T_2, ..., T_N),

with the transition times T marginalized out.

Target distribution, for x* ∈ X*:

π({x*}) = P(X = x* | X_1 = x, X_N = y)
Obtaining the marginal transition probability

π({x*}) = P(X = x* | X_1 = x, X_N = y)

        = 1[x*_{|x*|} = y] P(X = x*, Σ_{i=1}^{N−1} H_i < t ≤ Σ_{i=1}^{N} H_i | X_1 = x) / P(X_N = y | X_1 = x)

        = γ(x*) / Z

so the marginal transition probability is obtained from the estimator of Z.

Proposal

Notation: P̃(X = x*).

Natural choice: forward simulation, P̃(X = x*) = P(X = x* | X_1 = x).

But the space is infinite: there is a positive probability of never reaching y.
Solution: introduce potentials ρ

 Functions on the state space: ρ^y : X → ℕ
 Assume: ρ^y(x) = 0 iff x = y
 A dependency on the length to the end point is also possible

Example: Levenshtein edit distance
ρ^'ACTG'('CGG') = minimum number of point insertions, deletions, and substitutions
= 2
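The example potential is a standard dynamic program; a minimal sketch:

```python
def levenshtein(a, b):
    """rho^y(x): minimum number of point insertions, deletions, and
    substitutions turning string a into string b (standard DP, O(|a||b|))."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution (or match)
        prev = cur
    return prev[-1]

print(levenshtein("CGG", "ACTG"))  # rho^'ACTG'('CGG') -> 2
```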
Using the potentials

 If for all x ≠ y:  P( ρ^y(X_{n+1}) < ρ^y(X_n) | X_n = x ) > 0
 For ρ = Levenshtein this holds, because for x ≠ y there is
always a string z reachable in one operation and closer
(or equally close) to y
 Then we can build P̃ such that:  P̃(N < ∞) = 1
Construction

Proposal restricted to states that decrease the potential:

ν_x^{↓y}(A) = ν_x(A ∩ {z : ρ^y(x) > ρ^y(z)})

Mix the restricted proposal with its complement:

ν̃_x = α_x^y · ν_x^{↓y} / ν_x^{↓y}(X)  +  (1 − α_x^y) · (ν_x − ν_x^{↓y}) / (1 − ν_x^{↓y}(X))

With α_x^y large enough, this yields a suitable P̃:

α_x^y = max{ α, ν_x^{↓y}(X) }

Example: for ρ = Levenshtein, one can pick any α > 1/2.
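A toy sketch of the guided proposal, on a random walk over the integers standing in for the string space (uniform ν_x over neighbors, ρ = distance to the target y; all constants illustrative):

```python
import random

def guided_step(x, y, neighbors, rho, alpha, rng):
    """One draw from the mixture nu-tilde_x: with probability alpha_x sample
    from nu_x restricted to potential-decreasing states, otherwise from the
    complementary restriction."""
    nbrs = neighbors(x)
    down = [z for z in nbrs if rho(z, y) < rho(x, y)]
    mass_down = len(down) / len(nbrs)     # nu_x^{down}(X) under uniform nu_x
    alpha_x = max(alpha, mass_down)       # alpha_x^y = max{alpha, nu_x^{down}(X)}
    if down and rng.random() < alpha_x:
        return rng.choice(down)
    other = [z for z in nbrs if z not in down] or nbrs
    return rng.choice(other)

# Walk on the integers toward y = 0, potential = |x - y|.
neighbors = lambda x: [x - 1, x + 1]
rho = lambda x, y: abs(x - y)

rng = random.Random(0)
x, steps = 10, 0
while x != 0:
    x = guided_step(x, 0, neighbors, rho, alpha=2/3, rng=rng)
    steps += 1
print("hit y after", steps, "steps")
```

With α > 1/2 the walk drifts toward y, so it hits y in finite time almost surely, which is the P̃(N < ∞) = 1 property the construction is after.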
Multiple excursions

Paths generated by P̃ stop as soon as they hit y; this is not
necessarily the case under P.

Solution: first sample a number of excursions E from a
hyper-parameter distribution:

E ∼ Geo(β)
Proposal hyper-parameters

 How to set α, β?
 The optimal choice depends on the process and on t
 We use an ensemble of kernels with different
combinations of α, β, ranging over several orders of magnitude
 The particles produced by the members of this ensemble
compete; the weights and resampling naturally do the selection
 Easy to justify with an auxiliary variable construction
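One way to set up such an ensemble, sketched with illustrative grid values (the α, β values and particle count below are made up, not the talk's settings):

```python
import itertools
import random

# Hyper-parameter grid spanning several orders of magnitude; each particle is
# assigned one (alpha, beta) combination, and the SMC weights/resampling let
# the better-matched kernels take over.
alphas = [0.6, 0.9, 0.99]
betas = [0.001, 0.01, 0.1, 0.5]
grid = list(itertools.product(alphas, betas))

rng = random.Random(0)
n_particles = 12
particles = [{"alpha": a, "beta": b}
             for a, b in (grid[i % len(grid)] for i in range(n_particles))]

# Number of excursions per particle: E ~ Geo(beta), P(E = k) = (1-beta)^(k-1) beta.
for p in particles:
    E = 1
    while rng.random() > p["beta"]:
        E += 1
    p["E"] = E

print(len(particles), "particles over", len(grid), "kernel settings")
```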
Weights

We therefore build an IS algorithm with unnormalized weights given by the
numerator of the above equation. The average of the weights gives an
estimate of the transition probability P(X_e | X_s).

Integrating out the holding times:

∫ ... ∫ λ(x*_1) e^{−h_1 λ(x*_1)} ... λ(x*_{n−1}) e^{−h_{n−1} λ(x*_{n−1})} (1 − e^{−h_n λ(x*_n)}) dh_1 ... dh_n

(n = |x*| nested integrals)

 High-dimensional integral
 Results on convolutions of exponentials?
 Not directly applicable
 Expensive when rates have multiplicities
Reduction to a matrix exponential

Idea: construct a finite rate matrix Q on the fly, for each particle.

For each state visited in X, build an artificial state (with multiplicities).
There will be positive rates only between consecutive artificial states.
Reduction to a matrix exponential

Construct Q using the rate function parameters, as follows:

Q = ⎡ −λ(X_1)   λ(X_1)      0       ...      0     ⎤
    ⎢    0     −λ(X_2)   λ(X_2)     ...      0     ⎥
    ⎢                      ...                     ⎥
    ⎣    0        0         0    −λ(X_N)  λ(X_N)   ⎦
Reduction to a matrix exponential

Take the matrix exponential:

M = exp( t Q )

The value of the integral is given by the entry of M in the row of X_1 and
the column of X_N.
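A self-contained sketch of this computation, with a pure-Python scaling-and-squaring exponential (a truncated Taylor series standing in for the Padé method mentioned next) and an off-diagonal bidiagonal Q; the rates and t are illustrative, and the result is checked against the closed form for a two-state path:

```python
import math

def mat_mul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def expm(A, scal=30, terms=20):
    """exp(A) via scaling and squaring with a truncated Taylor series
    (a simple stand-in for the Pade + scaling & squaring method)."""
    n = len(A)
    S = [[a / 2.0 ** scal for a in row] for row in A]            # scale down
    E = [[float(i == j) for j in range(n)] for i in range(n)]    # identity
    P = [row[:] for row in E]
    for k in range(1, terms):
        P = mat_mul(P, S)                                        # P = S^k
        E = [[E[i][j] + P[i][j] / math.factorial(k) for j in range(n)]
             for i in range(n)]
    for _ in range(scal):            # undo scaling: exp(A) = exp(A/2^s)^(2^s)
        E = mat_mul(E, E)
    return E

# Bidiagonal Q for a visited-state sequence, lam[i] = lambda(X_i).
lam = [1.3, 0.4]   # illustrative rates
t = 2.0
n = len(lam)
Q = [[0.0] * n for _ in range(n)]
for i in range(n):
    Q[i][i] = -lam[i]
    if i + 1 < n:
        Q[i][i + 1] = lam[i]

M = expm([[t * q for q in row] for row in Q])
p = M[0][n - 1]   # probability of sitting in state X_N at time t

# Closed form for a two-state path: a/(a-b) * (exp(-b t) - exp(-a t))
a, b = lam
exact = a / (a - b) * (math.exp(-b * t) - math.exp(-a * t))
print(abs(p - exact) < 1e-8)
```

In practice one would call a library routine (e.g. SciPy's `expm`, which implements Padé with scaling and squaring) rather than this hand-rolled version.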
Numerical issues

 If all rates are distinct (in particular, no state is
visited twice): exponentiation through diagonalisation
is possible and fast
 Using sparsity: inversion is quadratic
 Can do the computation for only one entry of M
 If rates are not distinct, the above method fails (Q does not
have a complete set of linearly independent eigenvectors)
 Can use the Jordan-Chevalley decomposition (Q = A + N, with A
diagonalizable and N nilpotent)
 Simpler: Padé series + the scaling & squaring method
Experiments

 Numerical validation of consistency in the number of particles
 All the ideas presented today were tested on 2 (of the rare)
countably infinite CTMCs with closed-form marginals:
 linear birth-death process
 Poisson Indel Process
 Experiments on phylogenetic inference use the
proposal presented today, but without integrated
holding times
Experiments: A Subset of Simulated Data

 Task: reconstruction of tree topologies and branch
lengths (error measured using tree metrics)
 10 taxa at the leaves
 Example of simulated data shown; alignment not given
 Setting: SSM length is 3; θ_sub = 0.03; λ_pt = 0.05; μ_pt = 0.2;
λ_SSM = 2.0; μ_SSM = 2.0
Preliminary results: Performance on Estimating Trees

Tree inference using the correct parameters.

[Figure: partition metric and Kuhner-Felsenstein metric as a function of the
number of particles (1 to 1000); both error metrics decrease as the number of
particles grows. Replications on 10 random trees & datasets.]
Scaling up to large datasets

 A large number of particles is needed:
 large phylogenetic trees
 mixing proposals with different hyper-parameter values α, β
 Motivation for parallel architectures
 Revised Moore's law: parallel architectures
 Each particle is large:
 particles are forests
 need to keep one string for each tree in the forest
 'worst' case: one string = one genome
Part II: Entangled Monte Carlo (EMC)

 Goal:
 parallelize in such a way that the result is
equivalent to running everything on a (hypothetical) single
machine
 Complementary approaches: modify SMC
 Éric Moulines' talk on island models, yesterday
 Pierre Jacob's talk on a pairwise resampling scheme, this
afternoon
2.1 Stochastic maps

An important concept used in the construction of our algorithms is the idea
of a stochastic map. We start by reviewing stochastic maps in the context of
a Markov chain, where they were first introduced to design perfect simulation
algorithms.

 A way to decouple randomness and state dependencies

Let T : S × F_S → [0, 1] denote the transition kernel of a Markov chain
(generally constructed by first proposing and then deciding whether to move
or not using a Metropolis-Hastings (MH) ratio), where (S, F_S) is a
probability space. A stochastic map is an equivalent view of this chain,
pushing the randomness into a list of random transition functions. Formally,
it is an (S → S)-valued random variable F such that

T(s, A) = P(F(s) ∈ A)  for all states s ∈ S and events A ∈ F_S.

Concretely, these maps are constructed using the observation that T is
typically defined as a transformation t(u, s) with u ∈ [0, 1]. The most
fundamental example is the case where t is based on the inverse cumulative
distribution method. We can then write F(s) = t(U, s) for a uniform random
variable U on [0, 1].

 Example: alternate view on MCMC
 Sample stochastic maps F_1, F_2, ..., F_N independently and identically
 Pick x_0 arbitrarily
 Return: the state of the chain after n transitions,
F_1(F_2(. . . (F_n(x_0)) . . .)) = F_1 ∘ ··· ∘ F_n(x_0),
for an arbitrary start state x_0 ∈ S and n ∈ {1, 2, ..., N}

With this non-standard but useful notation, this representation decouples
the dependencies induced by random number generation from the dependencies
induced by operations on the state. In MCMC, the latter are still not
readily amenable to parallelization, and this is the motivation and
foundation of our method. We will show in Section 3 that SMC algorithms can
also be written using stochastic maps.
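As a concrete sketch (a toy two-state kernel built via the inverse-CDF transformation t(u, s), not the talk's phylogenetic setting):

```python
import random

def stochastic_map(u, T):
    """Turn one uniform draw u into a deterministic transition function
    F = t(u, .), via the inverse CDF of each row of the finite kernel T."""
    def F(s):
        acc = 0.0
        for s_new, p in enumerate(T[s]):
            acc += p
            if u < acc:
                return s_new
        return len(T[s]) - 1   # guard against floating-point rounding
    return F

# A toy two-state Markov kernel (rows sum to 1).
T = [[0.9, 0.1],
     [0.5, 0.5]]

rng = random.Random(42)
maps = [stochastic_map(rng.random(), T) for _ in range(100)]  # F_1, ..., F_100 i.i.d.

def run(x0, n):
    """State after n transitions: the maps are generated once and can be
    replayed on any start state (applied here in index order)."""
    s = x0
    for F in maps[:n]:
        s = F(s)
    return s

print(run(0, 100), run(1, 100))
```

The point of the representation is visible here: all randomness lives in `maps`, which can be generated up front and shared, while `run` is a purely deterministic replay from any start state.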
Overview

 Sample a global collection of i.i.d. stochastic maps for both the
proposal steps {F_i} and the resampling steps {G_i}
 Assume the global collection is transmitted to all machines
(O(1) if pseudo-random)

Algorithm 2: reconstruct(s, ρ, i, F = {F_i : i ∈ I})
1: F ← I
2: while (s(i) = nil) do
3:   F ← F ∘ F_i
4:   i ← ρ(i)
5: end while
6: return F(s(i))

Background (partial states): we build each rooted X-tree t by proposing a
sequence of X-forests s_1, s_2, ..., s_R = t, where an X-forest
s = {(t_i, X_i)} is a collection of rooted X_i-trees t_i such that the
disjoint union of the leaves of the trees in the forest equals the original
set of leaves, X. Note that with this specific construction, a forest of
rank r has |X| − r trees. The sets of partial states considered in this
section are assumed to satisfy the following three conditions:

1. The sets of partial states of different ranks are disjoint, i.e.
S_r ∩ S_s = ∅ for all r ≠ s (in phylogenetics, this holds since a forest
with r trees cannot be a forest with s trees when r ≠ s).
2. The set of partial states of smallest rank has a single element, denoted
⊥, S_1 = {⊥} (in phylogenetics, ⊥ is the disconnected graph on X).
3. The set of partial states of rank R coincides with the target space,
S_R = X (in phylogenetics, at rank R = |X| − 1, forests have a single tree
and are members of the target space X).
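A minimal Python sketch of Algorithm 2, on a hypothetical toy genealogy (integer indices, particle 0 stored concretely; the genealogy ρ and the maps F_i below are illustrative stand-ins for the shared stochastic maps):

```python
def reconstruct(s, rho, i, F):
    """Rebuild particle i by composing stochastic maps along the genealogy
    rho until a concretely stored ancestor is found, then replaying."""
    comp = lambda x: x                      # F <- I (identity)
    while s.get(i) is None:                 # while s(i) = nil
        Fi = F[i]
        comp = (lambda g, h: lambda x: g(h(x)))(comp, Fi)   # F <- F o F_i
        i = rho(i)                          # walk to the parent index
    return comp(s[i])                       # apply to the stored ancestor

# Hypothetical toy genealogy: parent of i is i - 1, map F_i adds i,
# and only particle 0 is concretely stored with state 10.
s = {0: 10}
rho = lambda i: i - 1
F = {i: (lambda i=i: lambda x: x + i)() for i in range(1, 6)}

print(reconstruct(s, rho, 5, F))  # 10 + 1 + 2 + 3 + 4 + 5 -> 25
```

Because the composition F ∘ F_i applies F_i first, the deepest ancestor's map is replayed first, exactly as the while-loop in Algorithm 2 requires.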
Each machine m is responsible for a subset of the particles.
Consequence of the resampling step: sometimes machine m needs a particle i
outside of its subset.
Idea: reconstruct particle i using the stochastic maps.
Distributed genealogy: s(i), ρ(i)

[Figure 2: illustration of the compact genealogy representation]
Assume w.l.o.g. that there is always a shared common ancestor.
Current concrete particles (s(i) ≠ nil): those explicitly stored in machine m.
Current compact particles (s(i) = nil): only the id of the parent ρ(i) is stored.
Reconstruction of particle i

Algorithm 2: reconstruct(s, ρ, i)
1: F ← I
2: while (s(i) = nil) do
3:   F ← F ◦ F_i
4:   i ← ρ(i)
5: end while
6: return F(s(i))

An X-forest sr = {(ti, Xi)} is a collection of rooted Xi-trees ti such that the disjoint union of the leaves of the trees in the forest is equal to the original set of leaves, X. The number of trees decreases by one at every step, so a rooted X-tree t is built by proposing a sequence of X-forests s1, s2, . . .
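The reconstruction step above can be sketched in Python (a hedged toy: the stochastic maps F_i here just append one pseudo-random symbol, and the names and representation are illustrative, not the paper's implementation):

```python
import random

def make_stochastic_map(seed):
    """A toy stochastic map F_i: a deterministic function of the particle id,
    so it can be regenerated on any machine from pseudo-randomness alone
    (illustrative stand-in for the paper's proposal moves)."""
    def F(state):
        rng = random.Random(seed)
        return state + [rng.choice("ACGT")]  # 'propose' one pseudo-random symbol
    return F

def reconstruct(s, rho, i):
    """Algorithm 2: follow the genealogy rho upward, composing the maps F_i,
    until a concrete ancestor (s(i) != nil) is found, then apply the
    composed map to that ancestor's stored state."""
    maps = []
    while s.get(i) is None:                  # s(i) = nil: compact particle
        maps.append(make_stochastic_map(i))  # F <- F o F_i
        i = rho[i]                           # i <- rho(i)
    state = s[i]                             # nearest concrete ancestor
    for F in reversed(maps):                 # apply innermost map first: F(s(i))
        state = F(state)
    return state
```

Only the genealogy ρ and the concrete states s need to be kept in memory; every compact particle can be rebuilt on demand.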
Resampling
 At resampling, only the particle weights are transmitted
 The genealogy can be updated efficiently from this information
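One way this can work (a minimal sketch of standard multinomial resampling with a shared seed, not the paper's exact distributed protocol): the resampled ancestry is a deterministic function of the weights and a common random seed, so each machine can recompute the new genealogy locally after receiving only the weights.

```python
import random

def resample_genealogy(weights, seed):
    """Multinomial resampling driven only by (weights, seed): every machine
    derives the same parent assignment without exchanging particle states.
    rho[j] is the index of the parent of resampled particle j."""
    rng = random.Random(seed)  # identical seed on every machine
    n = len(weights)
    return rng.choices(range(n), weights=weights, k=n)

# Two machines agreeing on (weights, seed) compute identical genealogies:
rho_a = resample_genealogy([0.1, 0.5, 0.4], seed=7)
rho_b = resample_genealogy([0.1, 0.5, 0.4], seed=7)
assert rho_a == rho_b
```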
Details
 See NIPS paper:
 Jun, Wang, Bouchard-Côté (2012) NIPS.
 Data structures for the stochastic maps
 Constant storage using pseudo-randomness
 Requires random access to the random numbers: binary trees xor'ing 2 streams of random numbers
 Allocation schemes: heuristics to minimize the amount of particle transmission
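The random-access requirement can be illustrated with a counter-based scheme (a hedged sketch of the general idea only; the binary-tree construction over xor'd streams mentioned above differs in its details): the k-th draw of a stream is a hash of (seed, k), so it can be regenerated in O(1) on any machine with zero storage.

```python
import hashlib
import struct

def stream_value(seed: int, k: int) -> float:
    """Return the k-th Uniform(0,1) draw of pseudo-random stream `seed`,
    with O(1) random access and no storage: the draw is recomputed by
    hashing the (seed, counter) pair."""
    digest = hashlib.sha256(struct.pack(">QQ", seed, k)).digest()
    (x,) = struct.unpack(">Q", digest[:8])
    return x / 2.0**64

# Any machine holding only the seed can reproduce draw number k:
assert stream_value(42, 7) == stream_value(42, 7)
```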
Experiments

Comparison: total run time of EMC versus particle transfer
 Particle transmission over network (red)
 EMC (blue)

Setup:
 Phylogenetic inference
 100 particles/EC2 instance

[Plot: time (milliseconds) versus # of particles. Figure 3 caption: (a) the speedup factor for the 16S actinobacteria dataset; the speedup factor for the 5S actinobacteria dataset with 100 taxa; a synthetic experiment (see text).]
Future directions

 SMC algorithms for inference over countably infinite / combinatorial CTMCs
 Using these techniques to remove the bounded jump rate assumption in jump-diffusion methods
 New applications:
 RNA strand following its folding pathway
 Bayesian non-parametric models with latent states

Model: String-valued Continuous Time Markov Chain
 Evolution of strings is denoted by M; branch length is T = T6
 Strings {Xt : 0 ≤ t ≤ T}; Xt has a countably infinite domain of all possible molecular sequences

[Figure: strings x1, . . . , x6 evolving along a branch, with jump times T1, . . . , T6.]
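The string-valued CTMC can be illustrated with a toy Gillespie simulation (a hedged sketch: the event set and rates below are invented for illustration and omit the SSM moves of the actual model):

```python
import random

ALPHABET = "ACGT"
MU, INS, DEL = 1.0, 0.2, 0.2  # per-site rates, assumed for this sketch

def simulate(x0: str, T: float, rng: random.Random) -> str:
    """Simulate {X_t} on [0, T]: the chain holds for an exponential time,
    then jumps by a point mutation, insertion, or deletion at a uniform site."""
    x, t = x0, 0.0
    while x:  # the empty string is absorbing in this toy version
        t += rng.expovariate(len(x) * (MU + INS + DEL))
        if t > T:
            break
        i = rng.randrange(len(x))
        u = rng.random() * (MU + INS + DEL)
        if u < MU:                       # point mutation
            x = x[:i] + rng.choice(ALPHABET) + x[i + 1:]
        elif u < MU + INS:               # point insertion
            x = x[:i] + rng.choice(ALPHABET) + x[i:]
        else:                            # point deletion
            x = x[:i] + x[i + 1:]
    return x

# e.g. simulate("GATC", 1.0, random.Random(0)) yields some descendant string
```

Insertions make strings of every length reachable, which is exactly why the state space is countably infinite.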
Future directions

 EMC
 Working on another version where only the sum of the particle weights is transmitted (using DHT methods)
 Better understanding of when and why the method works well