Applications of SMC to the analysis of partially observed jump processes, and the Entangled Monte Carlo algorithm
Alexandre Bouchard-Côté, University of British Columbia
Collaborators: Seong-Hwan Jun, Bonnie Kirkpatrick, Liangliang Wang
Saturday, September 22, 2012

Part I: Overview

Problem: posterior inference on countably infinite Continuous Time Markov Chains (CTMCs).

Motivation: phylogenetic inference under evolutionary models with random dependencies across sites.

Proposed method:
- proposals based on supermartingales on combinatorial potentials
- weights given by exponentiation of random matrices

Example of non-local evolutionary events: Slipped Strand Mispairing (SSM)

An example from Levinson '87: during DNA replication the strands normally pair exactly; under slipped strand mispairing, the strands pair out of register, replication continues, and an extra repeat unit (e.g., an extra TA repeat) is inserted.
L. Wang and A. Bouchard-Côté (UBC), Harnessing Non-Local Evolutionary Events for Tree Infer., June 25, 2012

SSMs on a tree

A string-valued branching process along a phylogeny: a root sequence (G A T C) evolves down the tree into leaf sequences v1, ..., v4 (e.g., v1 = G T A C).

SSMs on a branch

Evolutionary events along a single branch, starting from the sequence G A T C:
- point deletion: A T C
- point mutation A → G: G T C
- SSM insertion of the repeat (G T): G T G T C
- point insertion of A: G T G T A C
- SSM deletion of (G T): G T A C (the ending sequence)

SSMs and phylogenetic inference

Potential of SSM in phylogenetics:
- interactions between SSMs and point mutations add constraints; this can help resolve trees and alignments
- very frequent in neutral regions (e.g., plant introns)

This potential has not been exploited yet. Reason: inference is computationally challenging.

Computational problem

Our application (phylogenetic tree inference) requires SMC/PMCMC samplers, but the main ideas can be explained in a simpler setup: computing a marginal transition probability using importance sampling.

Model: string-valued Continuous Time Markov Chain

The evolution of a string is a process {X_t : 0 ≤ t ≤ T} with branch length T (T = T_6 in the running example); X_t has a countably infinite domain X, the set of all possible molecular sequences.

[Figure: a sample path visiting x1 = G A T C, x2 = A T C, x3 = G T C, x4 = G T G T C, x5 = G T G T A C, x6 = G T A C, with X_1 = x at time 0, jumps at times T_1, ..., T_5, and X_N = y at branch length t = T_6.]

Marginal transition probability: the quantity of interest is P(X_N = y | X_1 = x), the probability that the chain started at x sits at y at the end of the branch.

Jump distribution: X_{i+1} | X_i ~ ν_{X_i}
Holding times: H_i ~ Exp(λ(X_i)), with H_i = T_i − T_{i−1}
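The importance sampling scheme estimates a small probability Z as the average of density-ratio weights under a tractable proposal. A minimal self-contained sketch on a toy target (a Gaussian rare-event probability stands in for the talk's string-valued model; all names are mine):

```python
import math, random

def is_estimate(n, seed=0):
    """Importance sampling estimate of Z = P(X > 4) for X ~ N(0, 1).

    Proposal q: X = 4 + E with E ~ Exp(1), which concentrates on the rare
    region; the unnormalized weight is the density ratio p(x) / q(x)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        x = 4.0 + rng.expovariate(1.0)            # draw from q
        log_p = -0.5 * x * x - 0.5 * math.log(2 * math.pi)
        log_q = -(x - 4.0)
        total += math.exp(log_p - log_q)          # weight p(x) / q(x)
    return total / n

print(is_estimate(100_000))   # close to P(N(0,1) > 4), about 3.17e-5
```

Forward simulation of N(0, 1) would essentially never land above 4; the proposal makes every sample contribute a weight, which is the same trick used below for paths that must end at y.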
Model: rates and jump distribution (example)

The rate of departing from x, λ : X → (0, ∞):

λ(x) = n·θ_sub + λ_pt + n·μ_pt + λ_SSM + f(x)·μ_SSM

where n is the length of x and f(x) is the number of valid SSM deletion locations.

The jump distribution ν : X × F_X → [0, 1], written J(x → x') = ν_x({x'}), selects the next state with probability proportional to the rate of the mutation turning x into x':

J(x → x') = (1/λ(x)) ×
- θ_sub for a point substitution,
- λ_pt / (n+1) for a point insertion,
- μ_pt for a point deletion,
- the corresponding λ_SSM and μ_SSM terms for SSM insertions and deletions.

Note: the rate function λ is unbounded, but the chain is explosion-free (always assumed today).
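The jump-chain dynamics (a holding time H ~ Exp(λ(x)), then a draw from ν_x) can be forward-simulated directly. A toy sketch implementing only the point events, with the SSM events omitted for brevity (parameter values follow the experiments section; helper names are mine):

```python
import random

# Toy forward simulation of the string-valued jump chain: only the point
# events (substitution, insertion, deletion) are implemented; SSM events
# are omitted for brevity.
THETA_SUB, LAMBDA_PT, MU_PT = 0.03, 0.05, 0.2
ALPHABET = "ACGT"

def rate(x):
    """Total rate of departing from x (point events only)."""
    n = len(x)
    return n * THETA_SUB + LAMBDA_PT + n * MU_PT

def jump(x, rng):
    """Sample x' ~ nu_x: choose an event with probability prop. to its rate."""
    n = len(x)
    u = rng.random() * rate(x)
    if u < n * THETA_SUB:                       # point substitution
        i = rng.randrange(n)
        return x[:i] + rng.choice(ALPHABET.replace(x[i], "")) + x[i + 1:]
    u -= n * THETA_SUB
    if u < LAMBDA_PT:                           # point insertion, n+1 slots
        i = rng.randrange(n + 1)
        return x[:i] + rng.choice(ALPHABET) + x[i:]
    i = rng.randrange(n)                        # point deletion
    return x[:i] + x[i + 1:]

def simulate(x, t, rng):
    """Alternate holding times H ~ Exp(rate(x)) and jumps until time t."""
    path, clock = [x], 0.0
    while True:
        clock += rng.expovariate(rate(x))
        if clock > t:
            return path
        x = jump(x, rng)
        path.append(x)

print(simulate("GATC", 5.0, random.Random(0)))
```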
Related work

- Finite state spaces: efficient exact and approximate exponentiation and estimation of rate matrices (Albert 1962; Asmussen et al. 1996; Hobolth et al. 2005; Tataru et al. 2011; inter alia)
- Bounded rate functions: uniformization (Jensen 1953; Hobolth et al. 2009; inter alia); more recent jump-diffusion inference schemes using thinning for the discrete part (Casella et al. 2011; Murray Pollock's talk); MCMC approaches (Rao et al. 2011)
- Countable spaces, based on forward simulation (Saeedi et al. 2011; Läubli 2011)
- Birth-death processes (Crawford et al. 2011; inter alia)

Proposed method: notation

State space: the list of visited states between the end points, with the transition times marginalized out:
X = (X_1, X_2, ..., X_N)
T = (T_1, T_2, ..., T_N)  (marginalized)

Target distribution, for x* ∈ X*:
π({x*}) = P(X = x* | X_1 = x, X_N = y)

Write π({x*}) = γ(x*) / Z, with n = |x*| and

γ(x*) = 1[x*_n = y] · P(X = x*, Σ_{i=1}^{n−1} H_i < t ≤ Σ_{i=1}^{n} H_i | X_1 = x)
Z = P(X_N = y | X_1 = x).

The marginal transition probability is then obtained from the estimator of the normalizer Z.

Proposal

Natural choice: forward simulation, P̃(X = x*) = P(X = x* | X_1 = x).
Problem: the space is infinite, so there is a positive probability of never reaching y.

Solution: introduce potentials, functions on the state space

ρ^y : X → N,  with ρ^y(x) = 0 iff x = y

(a dependency on the length to the end point is also possible).

Example: the Levenshtein edit distance,
ρ^'ACTG'('CGG') = minimum number of point insertions, deletions and substitutions = 2.
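The Levenshtein potential is the textbook dynamic program; a short self-contained sketch:

```python
def levenshtein(x, y):
    """rho_y(x): minimum number of point insertions, deletions and
    substitutions turning x into y (standard dynamic program)."""
    prev = list(range(len(y) + 1))
    for i, cx in enumerate(x, 1):
        cur = [i]
        for j, cy in enumerate(y, 1):
            cur.append(min(prev[j] + 1,                 # delete cx
                           cur[j - 1] + 1,              # insert cy
                           prev[j - 1] + (cx != cy)))   # substitute
        prev = cur
    return prev[-1]

assert levenshtein("CGG", "ACTG") == 2   # the example from the slide
assert levenshtein("ACTG", "ACTG") == 0  # rho_y(x) = 0 iff x = y
```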
Using the potentials

If for all x ≠ y
P(ρ^y(X_{n+1}) < ρ^y(X_n) | X_n = x) > 0,
then we can build a proposal P̃ such that P̃(N < ∞) = 1.

For ρ = Levenshtein this holds: for x ≠ y there is always a string z reachable in one operation that is closer (or equal in distance) to y.

Construction

Notation for the proposal restricted to states that decrease the potential:
ν_x^↓y(A) = ν_x(A ∩ {z : ρ^y(x) > ρ^y(z)})

With α_x^y large enough, mixing the restricted kernel with its complement yields a suitable P̃:
ν̃_x = α_x^y · ν_x^↓y / ν_x^↓y(X) + (1 − α_x^y) · (ν_x − ν_x^↓y) / (1 − ν_x^↓y(X))

Example: for ρ = Levenshtein one can pick
α_x^y = max{α, ν_x^↓y(X)},  with α > 1/2.

Multiple excursions

Paths generated by P̃ stop as soon as they hit y; this is not necessarily the case under P. Solution: first sample a number of excursions E from a hyper-parameter distribution, E ~ Geo(β).

Proposal hyper-parameters

How to set α, β?
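The mixture ν̃ can be sketched concretely before worrying about tuning. A toy sketch on an integer state space, with ρ(z, y) = |z − y| standing in for the Levenshtein potential and ν_x given as an explicit dictionary assumed to sum to one (all names are mine):

```python
import random

def guided_step(nu_x, rho, y, x, alpha, rng):
    """One draw from the mixture proposal nu-tilde.

    nu_x:  dict z -> nu_x({z}), assumed to sum to one
    rho:   potential, e.g. rho(z, y) = Levenshtein distance from z to y
    With probability alpha_x = max(alpha, mass of potential-decreasing
    moves), sample from the restricted kernel nu^down (renormalized);
    otherwise sample from its complement."""
    down = {z: p for z, p in nu_x.items() if rho(z, y) < rho(x, y)}
    mass_down = sum(down.values())
    alpha_x = max(alpha, mass_down)
    if mass_down > 0 and rng.random() < alpha_x:
        pool, norm = down, mass_down
    else:
        pool = {z: p for z, p in nu_x.items() if z not in down}
        norm = 1.0 - mass_down
    states = list(pool)
    return rng.choices(states, weights=[pool[z] / norm for z in states])[0]

# Toy run: a walk on the integers biased towards y = 0.
rho = lambda z, y: abs(z - y)
rng = random.Random(0)
x, steps = 10, 0
while x != 0 and steps < 10_000:
    x = guided_step({x - 1: 0.5, x + 1: 0.5}, rho, 0, x, 0.75, rng)
    steps += 1
print("reached y after", steps, "steps")
```

With α > 1/2 the walk decreases the potential more often than it increases it, which is exactly the supermartingale property that makes P̃(N < ∞) = 1.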
The optimal choice depends on the process and on t. We use an ensemble of kernels with different combinations of α, β, ranging over several orders of magnitude. The particles produced by the members of this ensemble compete: the weights and resampling naturally perform the selection. This is easy to justify with an auxiliary-variable construction.

Integrating the holding times

We build an importance sampling algorithm whose unnormalized weights are the numerator γ(x*); the average of the weights gives the estimate of the transition probability P(X_e | X_s). Each weight requires integrating out the holding times; with n = |x*|,

γ(x*) = 1[x*_n = y] · Π_{i=1}^{n−1} ν_{x*_i}({x*_{i+1}}) · ∫⋯∫_{h_1+⋯+h_{n−1}<t} Π_{i=1}^{n−1} λ(x*_i) e^{−λ(x*_i) h_i} · e^{−λ(x*_n)(t − h_1 − ⋯ − h_{n−1})} dh_1 ⋯ dh_{n−1}.

This is a high-dimensional integral. Results on convolutions of exponentials are not directly applicable, and become expensive when the rates have multiplicities.

Reduction to a matrix exponential

Idea: construct a finite rate matrix Q on the fly, for each particle. For each state visited in X, build an artificial state (with multiplicities, so a string visited twice gets two artificial states); there are positive rates only between consecutive artificial states:

Q =
  [ −λ(X_1)   λ(X_1)                        ]
  [           −λ(X_2)   λ(X_2)              ]
  [                       ...               ]
  [                       −λ(X_N)   λ(X_N)  ]
  [                                    0    ]

(the final state is absorbing). Take the matrix exponential M = exp(Qt). The value of the holding-time integral is the entry of M in the row of X_1 and the column of X_N, i.e. M_{1,N} (M[0, N−1] with 0-indexing).
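The reduction can be checked numerically. A self-contained sketch: a small scaling-and-squaring matrix exponential (Taylor series for simplicity; Padé approximants are the standard refinement) applied to the bidiagonal Q, validated against the two-state closed form (all names are mine):

```python
import numpy as np

def expm(A, terms=30):
    """Matrix exponential via Taylor series with scaling and squaring
    (Pade approximants would replace the Taylor step in production)."""
    s = max(0, int(np.ceil(np.log2(max(np.abs(A).sum(axis=1).max(), 1.0)))) + 1)
    B = A / 2.0 ** s
    X = np.eye(len(A))
    term = np.eye(len(A))
    for k in range(1, terms + 1):
        term = term @ B / k
        X = X + term
    for _ in range(s):                  # undo the scaling by squaring
        X = X @ X
    return X

def transition_weight(rates, t):
    """Holding-time integral as one entry of exp(Qt): one artificial state
    per visited state, positive rates only between consecutive states."""
    n = len(rates)
    Q = np.zeros((n + 1, n + 1))        # the last state is absorbing
    for i, lam in enumerate(rates):
        Q[i, i], Q[i, i + 1] = -lam, lam
    return expm(Q * t)[0, n - 1]        # row of X_1, column of X_N

# Check the n = 2 case against the closed form
# P(H_1 <= t < H_1 + H_2) = a/(b-a) (e^{-at} - e^{-bt}) for rates a != b.
a, b, t = 1.0, 2.0, 0.7
closed = a / (b - a) * (np.exp(-a * t) - np.exp(-b * t))
assert abs(transition_weight([a, b], t) - closed) < 1e-9
```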
Numerical issues

- If all rates are distinct (in particular, no state is visited twice): exponentiation through diagonalization is possible and fast; using sparsity, the required inversion is quadratic, and the computation can be done for a single entry of M.
- If some rates coincide, the method above fails (Q no longer has a complete set of linearly independent eigenvectors). One can use the Jordan-Chevalley decomposition (Q = A + N with A diagonalizable and N nilpotent); simpler: Padé series with the scaling-and-squaring method.

Experiments

Numerical validation of consistency in the number of particles: all the ideas presented today were tested on two of the rare countably infinite CTMCs with closed-form marginals:
- the linear birth-death process
- the Poisson Indel Process

The phylogenetic inference experiments use the proposal presented today, but without the integrated holding times.

Task: reconstruction of tree topologies and branch lengths (error measured using tree metrics), with 10 taxa at the leaves; the alignment is not given.
Setting: SSM length is 3; θ_sub = 0.03; λ_pt = 0.05; μ_pt = 0.2; λ_SSM = 2.0; μ_SSM = 2.0.
[Figure: a subset of the simulated data.]

Preliminary results

Tree inference using the correct parameters (replications on 10 random trees and datasets):
[Figure: partition metric and Kuhner-Felsenstein metric as a function of the number of particles, 1 to 1000.]
Scaling up to large datasets

- Large numbers of particles are needed: large phylogenetic trees, and mixing of proposals with different hyper-parameter values α, β
- Motivation for parallel architectures (the revised Moore's law is about parallelism)
- Each particle is large: particles are forests, and we need to keep one string for each tree in the forest; in the 'worst' case, one string = one genome

Part II: Entangled Monte Carlo (EMC)

Goal: parallelize in such a way that the result is equivalent to running everything on a (hypothetical) single machine.

Complementary approaches modify SMC itself:
- Éric Moulines' talk on island models (yesterday)
- Pierre Jacob's talk on a pairwise resampling scheme (this afternoon)
Stochastic maps

An important concept used in the construction of our algorithms is the idea of a stochastic map. We start by reviewing stochastic maps in the context of a Markov chain, where they were first introduced to design perfect simulation algorithms.

Let T : S × F_S → [0, 1] denote the transition kernel of a Markov chain on a probability space (S, F_S), generally constructed by first proposing and then deciding whether to move or not using a Metropolis-Hastings (MH) ratio. A stochastic map is an equivalent view of this chain that pushes the randomness into a list of random transition functions. Formally, it is an (S → S)-valued random variable F such that

T(s, A) = P(F(s) ∈ A)  for all states s ∈ S and events A ∈ F_S.

Concretely, these maps are constructed using the observation that T is typically defined as a transformation t(u, s) with u ∈ [0, 1]. The most fundamental example is the case where t is based on the inverse cumulative distribution method; we can then write F(s) = t(U, s) for a uniform random variable U on [0, 1].

This gives a non-standard but useful way of formulating MCMC algorithms. First, sample N stochastic maps F_1, F_2, ..., F_N, independently and identically. Second, to compute the state of the chain after n transitions, simply return F_1(F_2(... F_n(x_0) ...)) = F_1 ∘ ⋯ ∘ F_n(x_0), for an arbitrary start state x_0 ∈ S and n ∈ {1, 2, ..., N}. This representation decouples the dependencies induced by random number generation from the dependencies induced by operations on the state space. In MCMC the latter are still not readily amenable to parallelization; this is the motivation for the foundation of our method, and we will show in Section 3 of the paper that SMC algorithms can be written using stochastic maps.
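A minimal sketch of this view: a random-walk MH chain on a standard-normal toy target, where each map pre-draws its proposal increment and acceptance uniform, so replaying the maps is fully deterministic (target and names are mine):

```python
import math, random

def make_stochastic_map(rng):
    """One MH stochastic map F : S -> S for a standard-normal toy target.
    All randomness is drawn up front, so F is a deterministic function."""
    u_prop = rng.uniform(-1.0, 1.0)     # random-walk proposal increment
    u_acc = rng.random()                # accept/reject uniform
    def F(s):
        s_new = s + u_prop
        log_ratio = -0.5 * s_new * s_new + 0.5 * s * s   # log pi(s')/pi(s)
        return s_new if u_acc < math.exp(min(0.0, log_ratio)) else s
    return F

rng = random.Random(1)
maps = [make_stochastic_map(rng) for _ in range(1000)]

def state_after(x0, n):
    """Replay the first n maps from an arbitrary start state x0."""
    s = x0
    for F in maps[:n]:
        s = F(s)
    return s

# The maps decouple randomness from the state: any machine holding `maps`
# recomputes exactly the same trajectory from the same start state.
assert state_after(0.0, 500) == state_after(0.0, 500)
```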
The set of partial state of smallest rank should have a single element denoted by ⊥, S1 = {⊥} (in phylogenetics, ⊥ is the disconnected graph 2 on X ). 3. The set of partial states of rank R should coincide with the target space, 3 SR = X (in phylogenetics, at rank R = |X | − 1, forests have a single tree and are members of the target space X ). samedi 22 septembre 2012 r Algorithm 2 : reconstruct(s, ρ, i, F = {Fi : i ∈ I} components decreases by one at every step. More precisely, we will build each 1: F ← I rooted X -tree t by proposing a sequence of X -forests s , s , . . . , s = t, where Overview 2: while (s(i) = nil) do an X -forest s = {(t , X )} is a collection of rooted X -trees t such that the 3: union F of← F of◦ the Fitrees in the forest is equal to the original set of disjoint leaves Sample a global collection of i.i.d. stochastic maps for both the 4: ! X i=← ρ(i)that with this specific construction, a forest of rank r leaves, X . Note proposal { F i } and resampling steps { Gi } Each machine m has5: |X | end − r trees. while is responsible of a The sets of partial states considered in this Section are assumed to satisfy 6: return F (s(i)) Assume the global collection is transmitted to all machines subset of the the following three conditions: particles (O(1) if pseudo-random) 1 r i i i i 2 R i i 1. The sets of partial states of different ranks should be disjoint, i.e. Sr ∩ Ss for all r $= s (in phylogenetics, this holds since a forest with r trees cannot 1 be a forest with s trees when r $= s). 2. The set of partial state of smallest rank should have a single element denoted by ⊥, S1 = {⊥} (in phylogenetics, ⊥ is the disconnected graph 2 on X ). 3. The set of partial states of rank R should coincide with the target space, 3 SR = X (in phylogenetics, at rank R = |X | − 1, forests have a single tree and are members of the target space X ). 
samedi 22 septembre 2012 r Algorithm 2 : reconstruct(s, ρ, i, F = {Fi : i ∈ I} components decreases by one at every step. More precisely, we will build each 1: F ← I rooted X -tree t by proposing a sequence of X -forests s , s , . . . , s = t, where Overview 2: while (s(i) = nil) do an X -forest s = {(t , X )} is a collection of rooted X -trees t such that the 3: union F of← F of◦ the Fitrees in the forest is equal to the original set of disjoint leaves Sample a global collection of i.i.d. stochastic maps for both the 4: ! X i=← ρ(i)that with this specific construction, a forest of rank r leaves, X . Note proposal { F i } and resampling steps { Gi } Each machine m has5: |X | end − r trees. while is responsible of a The sets of partial states considered in this Section are assumed to satisfy 6: return F (s(i)) Assume the global collection is transmitted to all machines subset of the the following three conditions: particles (O(1) if pseudo-random) 1 r i i i i 2 R i i 1. The sets of partial states of different ranks should be disjoint, i.e. Sr ∩ Ss for all r $= s (in phylogenetics, this holds since a forest with r trees cannot 1 be a forest with s trees when r $= s). 2. The set of partial state of smallest rank should have a single element denoted by ⊥, S1 = {⊥} (in phylogenetics, ⊥ is the disconnected graph 2 3. Consequence of resampling on X ). step: sometimes machine m a particles i outside of The set of partial states of rank R should coincideneeds with the target space, 3 its subset SR = X (in phylogenetics, at rank R = |X | − 1, forests have a single tree and are members of the target space X ). samedi 22 septembre 2012 r Algorithm 2 : reconstruct(s, ρ, i, F = {Fi : i ∈ I} components decreases by one at every step. More precisely, we will build each 1: F ← I rooted X -tree t by proposing a sequence of X -forests s , s , . . . 
, s = t, where Overview 2: while (s(i) = nil) do an X -forest s = {(t , X )} is a collection of rooted X -trees t such that the 3: union F of← F of◦ the Fitrees in the forest is equal to the original set of disjoint leaves Sample a global collection of i.i.d. stochastic maps for both the 4: ! X i=← ρ(i)that with this specific construction, a forest of rank r leaves, X . Note proposal { F i } and resampling steps { Gi } Each machine m has5: |X | end − r trees. while is responsible of a The sets of partial states considered in this Section are assumed to satisfy 6: return F (s(i)) Assume the global collection is transmitted to all machines subset of the the following three conditions: particles (O(1) if pseudo-random) 1 r i i i i 2 R i i 1. The sets of partial states of different ranks should be disjoint, i.e. Sr ∩ Ss for all r $= s (in phylogenetics, this holds since a forest with r trees cannot 1 Idea: Reconstruct 2. The set of partial state of smallest rank should have a single element particle i denoted by ⊥, S = {⊥} (in phylogenetics, ⊥ is the disconnected graph 2resampling Consequence of using the on X ). step: sometimes machine m stochastic needs a particles i outside of 3. The set of partial states of rank R should coincide with the target space, 3 maps be a forest with s trees when r $= s). 1 its subset SR = X (in phylogenetics, at rank R = |X | − 1, forests have a single tree and are members of the target space X ). samedi 22 septembre 2012 r 3: F ←F ◦F 4: ! X i=← ρ(i)that with this specific construction, a forest of rank leaves, X . Note has5: |X | end − r trees. Distributed genealogy s(i), ρ(i) while The of partial considered in this Section are assumed to satisf 6: sets return F states (s(i)) disjoint union of leaves of thei trees in the forest is equal to the original set o i i the following three conditions: 1. The sets of partial states of different ranks should be disjoint, i.e. 
Sr ∩ S for all r $= s (in phylogenetics, this holds since a forest with r trees canno 1 be a forest with s trees when r $= s). 2. The set of partial state of smallest rank should have a single elemen denoted by ⊥, S1 = {⊥} (in phylogenetics, ⊥ is the disconnected grap 2 on X ). 3. The set of partial states of rank R should coincide with the target space 3 SR = X (in phylogenetics, at rank R = |X | − 1, forests have a single tre and are members of the target space X ). r These conditions will be subsumed by the more general framework of Sec Figure 2: Illustration of compact tion 4.5, but the more concrete conditions above help understanding the pose samedi 22 septembre 2012 3: F ←F ◦F 4: ! X i=← ρ(i)that with this specific construction, a forest of rank leaves, X . Note has5: |X | end − r trees. Distributed genealogy s(i), ρ(i) while The of partial considered in this Section are assumed to satisf 6: sets return F states (s(i)) disjoint union of leaves of thei trees in the forest is equal to the original set o i i the following three conditions: 1. The sets of partial states of different ranks should be disjoint, i.e. Sr ∩ S for all r $= s (in phylogenetics, this holds since a forest with r trees canno 1 Assume be a forest with s trees when r $= s). w.l.o.g. there 2. The set of partial state of smallest rank should have a single elemen is always a denoted by ⊥, S = {⊥} (in phylogenetics, ⊥ is the disconnected grap 2 shared on X ). common ancestor 3. The set of partial states of rank R should coincide with the target space 3 1 SR = X (in phylogenetics, at rank R = |X | − 1, forests have a single tre and are members of the target space X ). r These conditions will be subsumed by the more general framework of Sec Figure 2: Illustration of compact tion 4.5, but the more concrete conditions above help understanding the pose samedi 22 septembre 2012 3: F ←F ◦F 4: ! X i=← ρ(i)that with this specific construction, a forest of rank leaves, X . Note has5: |X | end − r trees. 
Distributed genealogy: s(i), ρ(i)

- Assume w.l.o.g. there is always a shared common ancestor.
- Current concrete particles: those explicitly stored in machine m (s(i) ≠ nil).
- Current compact particles: only store the id of the parent, ρ(i) (s(i) = nil).

Reconstruction of particle i

Algorithm 2: reconstruct(s, ρ, i)
1: F ← I
2: while (s(i) = nil) do
3:   F ← F ◦ Fi
4:   i ← ρ(i)
5: end while
6: return F(s(i))
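Algorithm 2 can be sketched in Python. Here states are integers and each Fi is a deterministic toy map regenerated from a shared seed; the map construction and names are illustrative assumptions, not the paper's implementation:

```python
# Sketch of Algorithm 2 (reconstruct). s maps a particle id to its stored
# state (concrete particles only); rho maps a particle id to its parent id.
# The toy stochastic maps F_i are regenerated from a shared seed, never sent.
import random


def stochastic_map(i, shared_seed=2012):
    """Toy F_i: a deterministic 'mutation' derived from (seed, i)."""
    delta = random.Random(f"{shared_seed}-F-{i}").randrange(7)
    return lambda x: x + delta


def reconstruct(s, rho, i):
    """Walk up the genealogy to a concrete ancestor, then replay the maps."""
    path = []
    while s.get(i) is None:   # line 2: particle i is compact (s(i) = nil)
        path.append(i)        # line 3: F <- F ∘ F_i (record composition order)
        i = rho[i]            # line 4: move to the parent
    x = s[i]                  # state of the concrete ancestor
    for j in reversed(path):  # apply maps from the oldest compact node down to i
        x = stochastic_map(j)(x)
    return x
```

For example, with s = {0: 5} and rho = {1: 0, 2: 1}, reconstruct(s, rho, 2) replays F1 and then F2 on the stored ancestor state 5.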
Resampling

- At resampling, only transmit the particle weights.
- The genealogy can be updated efficiently from this information.

Details

See the NIPS paper: Jun, Wang, and Bouchard-Côté (2012), NIPS.

- Data structures for the stochastic maps:
  - Constant storage using pseudo-randomness.
  - Need random access to the random numbers: binary trees of xor'ing two streams of random numbers.
- Allocation schemes: heuristics to minimize the amount of particle transmission.

Experiments

- Setup: phylogenetic inference, 100 particles per EC2 instance.
- Comparison: total run time of EMC (blue) versus particle transfer over the network (red).

[Figure: run time in milliseconds against the number of particles, 500 to 3000.]

Figure 3: (a) The speedup factor for the 16S actinobacteria dataset; the speedup factor for the 5S actinobacteria dataset with 100 taxa; the synthetic experiment (see text).

Future directions

- SMC algorithms for inference over countably infinite / combinatorial CTMCs.
- Using these techniques to remove the bounded jump rate assumption in jump-diffusion methods.
- New applications:
  - Structures of an RNA strand following its folding pathway.
  - Bayesian non-parametric models (latent states).
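The resampling step above (only the weights cross the network) can be sketched as follows: given the broadcast weight vector and the shared seed, every machine computes the identical ancestor assignment locally, so the genealogy ρ is updated without moving particle states. A sketch under those assumptions; the function name and the multinomial scheme are illustrative:

```python
# Sketch: deterministic multinomial resampling from broadcast weights.
# All machines share the seed, so they agree on the new parents rho
# without exchanging any particle states.
import random


def resample_ancestors(weights, step, shared_seed=2012):
    """Parent index rho(i) for each offspring particle i at this step."""
    rng = random.Random(f"{shared_seed}-G-{step}")
    n = len(weights)
    return rng.choices(range(n), weights=weights, k=n)
```

A machine then marks any offspring whose sampled parent lives elsewhere as compact (s(i) = nil) and reconstructs it on demand with Algorithm 2.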
Future directions: EMC

- Working on another version where only the sum of the particle weights is transmitted (using DHT methods).
- Better understanding of when and why the method works well.