The Discrete-Time Coalescent

4/13/2015, Comp 790: Distributions & Coalescence

The Geometric Distribution
• Probability of the 1st success on the Nth trial, given a probability, p, of success:
  $P(N = j) = (1 - p)^{j-1} p$
  $E(N) = \frac{1}{p}$
  $\mathrm{Var}(N) = \frac{1 - p}{p^2}$
• Examples:
  – P(Roll 1st 6 on the ith roll) = $(1 - 1/6)^{i-1} (1/6)$
  – P(1st heads on the ith flip) = $(1 - 1/2)^{i-1} (1/2)$
• To show P(N = j) is a proper probability distribution (it sums to 1):
  $\sum_{j=1}^{\infty} (1 - p)^{j-1} p = \frac{p}{1 - p} \sum_{j=1}^{\infty} (1 - p)^{j} = \frac{p}{1 - p} \cdot \frac{1 - p}{1 - (1 - p)} = \frac{p}{1 - p} \cdot \frac{1 - p}{p} = 1$

Example
• Difference from the “Binomial” distribution
  – Binomial(k) = P(k successes in N trials)
  – Geometric(k) = P(1st success after k-1 failures)

Expected Value Proof
• Expected value is the sum of each value times its probability:
  $E(N) = \sum_{j=1}^{\infty} j (1 - p)^{j-1} p = \frac{p}{1 - p} \sum_{j=1}^{\infty} j (1 - p)^{j}$
• Recall the relation:
  $\sum_{j=1}^{\infty} j a^{j} = \frac{a}{(1 - a)^2}$ for $0 < a < 1$
• Substituting $a = 1 - p$ gives:
  $E(N) = \frac{p}{1 - p} \cdot \frac{1 - p}{p^2} = \frac{1}{p}$

Other Properties
• Markov Property
  – The probability of the “next step” in a discrete or continuous process depends only on the process's present state
  – The process is without memory of previous events; for $t_2 > t_1$:
    $P(T > t_2 \mid T > t_1) = P(T > t_2 - t_1)$

Continuous Generalization
• Geometric distributions characterize “discrete” events
• Sometimes we’d like to pose questions about a continuous variable, for example:
  – The probability that a population will be inbred after T years, rather than after N generations, where T is a real number and N is an integer
• The “continuous” counterpart of the geometric distribution is the “exponential” distribution

Exponential Distribution
• The exponential density function is characterized by one parameter, a, called the “rate” or “intensity”:
  $\mathrm{Exp}(a, t) = a e^{-at}$
  $E(\mathrm{Exp}(a, t)) = \frac{1}{a}$
  $\mathrm{Var}(\mathrm{Exp}(a, t)) = \frac{1}{a^2}$
• To show Exp(a,t) is a proper pdf:
  $\int_{t=0}^{\infty} a e^{-at}\, dt = \left[ -e^{-at} \right]_{0}^{\infty} = 0 - (-1) = 1$

Exponential Properties
• Other useful properties of U = Exp(a,t) include:
  – Markov property, where $t_2 > t_1$:
    $P(U > t_2 \mid U > t_1) = P(U > t_2 - t_1)$
  – Assuming a second independent exponential process, V = Exp(b,t):
    $P(U < V) = \frac{a}{a + b}$
    $\min(U, V) \sim \mathrm{Exp}(a + b)$

Approximations
• The geometric distribution can be approximated with the exponential distribution in various ways
• Consider the following tail of the geometric distribution:
  $P(N > j) = (1 - p)^{j}$
  i.e. there are at least j failures before the first success
• We can model discrete time as a rational fraction of some very large number, M, that includes all intervals of interest (i.e. 1/M, 2/M, …, N/M, …, M/M, rather than 1, 2, 3, …)
• Assuming p is small and N is large, we can approximate “continuous” time as t = j/M and a = pM

Approximations (cont)
• Recalling t = j/M and a = pM, we can rewrite $(1 - p)^{j}$ as:
  $P(N > j) = (1 - p)^{j} = \left(1 - \frac{pM}{M}\right)^{\frac{j}{M} M} = \left(1 - \frac{a}{M}\right)^{tM} = P\left(\frac{N}{M} > t\right)$
• Also note, for large M:
  $\left(1 - \frac{a}{M}\right)^{tM} \approx e^{-at}$
• Thus the density of $T = N/M$, $P(T = t) \approx a\, P(N/M > t) \approx a e^{-at}$, is approximately exponential with intensity a

The Discrete-Time Coalescent
• We consider the N-coalescent, or the coalescent for a sample of N genes (Kingman 1982)
• N-coalescent: What is the distribution of the number of generations to find the Most Recent Common Ancestor (MRCA) for a fixed population of 2N genes?
• We use 2N because we recognize that the diploid case is more realistic, and it is related to the simpler haploid case by a factor of 2

MRCA Examples
[Figure: MRCA examples]

Coalescence of two genes
• What is the distribution of the number of prior generations for the MRCA (waiting time)?
• The probability of a common parent (i.e. the MRCA is in the immediately previous generation) is:
  $\frac{1}{2N}$
  The first gene can choose its ancestor freely, but the second must choose the same one as the first, thus it has 1 out of 2N choices
• The probability that 2 genes have different parents is:
  $1 - \frac{1}{2N}$

Going back further
• Since sampling in successive generations is independent of the past, the probability that two genes find a common ancestor j generations back is:
  $\mathrm{MRCA}(j) = \left(1 - \frac{1}{2N}\right)^{j-1} \frac{1}{2N}$
  In the first j-1 generations they chose different ancestors, and then in generation j they chose the same ancestor
• Which is a geometric distribution with $p = \frac{1}{2N}$
• Thus, the expected coalescence time for 2 genes is:
  $E(\mathrm{MRCA}(j)) = \frac{1}{p} = 2N$

MRCA Examples
[Figure: MRCA examples, N = 10]

N-genes, no common parent
• The waiting time for k ≤ 2N genes to have fewer than k lineages depends on the probability that all k genes have different parents in the previous generation:
  $\frac{2N - 1}{2N} \cdot \frac{2N - 2}{2N} \cdots \frac{2N - (k - 1)}{2N} = \prod_{i=1}^{k-1} \left(1 - \frac{i}{2N}\right)$
  The 1st gene can choose its parent freely, but each of the next k-1 must choose from the remaining genes without a child
• Manipulating a little, and using $\sum_{i=1}^{k-1} i = \binom{k}{2}$:
  $\prod_{i=1}^{k-1} \left(1 - \frac{i}{2N}\right) = 1 - \sum_{i=1}^{k-1} \frac{i}{2N} + O\left(\frac{1}{N^2}\right) = 1 - \binom{k}{2} \frac{1}{2N} + O\left(\frac{1}{N^2}\right)$
• Where, for large N, $1/N^2$ is negligible

N-gene Coalescence
• The probability that k genes have different parents is approximately:
  $1 - \binom{k}{2} \frac{1}{2N}$
• And that one or more pairs have a common parent:
  $1 - \left(1 - \binom{k}{2} \frac{1}{2N}\right) = \binom{k}{2} \frac{1}{2N}$
• Repeated failures for j generations lead to a geometric distribution with
  $p = \binom{k}{2} \frac{1}{2N}$
  $P(N = j) = \left(1 - \binom{k}{2} \frac{1}{2N}\right)^{j-1} \binom{k}{2} \frac{1}{2N}$
  (on the left-hand side, N denotes the waiting time in generations, as in the geometric distribution; on the right-hand side, 2N is the population size)

Next Time
• Finish coalescence of N genes
• The effect of approximations
• The continuous-time coalescent
• The effective population size
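
The results derived in the slides above can be checked numerically. The following is a minimal Python sketch, not part of the original lecture: all function names are hypothetical, and the simulation assumes each gene picks its parent uniformly at random from the 2N genes of the previous generation, as the slides describe. It compares the simulated two-gene waiting time with 2N, the exact k-gene coalescence probability with the first-order approximation C(k,2)/(2N), and the rescaled geometric tail with the exponential tail.

```python
import math
import random


def geometric_pmf(p, j):
    """P(N = j) = (1 - p)**(j - 1) * p for j = 1, 2, ..."""
    return (1.0 - p) ** (j - 1) * p


def scaled_geometric_tail(a, t, m):
    """Geometric tail (1 - a/M)**(t*M), with p = a/M and j = t*M as in the slides."""
    return (1.0 - a / m) ** (t * m)


def prob_distinct_parents(k, two_n):
    """Exact probability that k genes pick k different parents among two_n."""
    prob = 1.0
    for i in range(1, k):
        prob *= 1.0 - i / two_n
    return prob


def approx_coalescence_prob(k, two_n):
    """First-order approximation C(k, 2) / (2N) of the per-generation coalescence probability."""
    return math.comb(k, 2) / two_n


def simulate_pair_coalescence(two_n, rng):
    """Count generations back until two genes pick the same parent."""
    generations = 1
    # Each generation, both genes choose a parent uniformly at random (assumption).
    while rng.randrange(two_n) != rng.randrange(two_n):
        generations += 1
    return generations


def mean_pair_coalescence(two_n, trials, seed=0):
    """Average simulated waiting time for two genes over the given number of trials."""
    rng = random.Random(seed)
    total = sum(simulate_pair_coalescence(two_n, rng) for _ in range(trials))
    return total / trials


if __name__ == "__main__":
    two_n = 20  # 2N = 20 genes, i.e. N = 10
    print("pmf sum:", sum(geometric_pmf(1 / two_n, j) for j in range(1, 2000)))
    print("simulated mean:", mean_pair_coalescence(two_n, 20000), "expected:", two_n)
    print("tail:", scaled_geometric_tail(2.0, 1.5, 10**6), "exp:", math.exp(-3.0))
    for k in (2, 3, 5):
        exact = 1.0 - prob_distinct_parents(k, two_n)
        print(k, "exact:", exact, "approx:", approx_coalescence_prob(k, two_n))
```

With a small population such as 2N = 20, the exact and approximate k-gene probabilities differ noticeably for larger k, which is the O(1/N²) term the slides treat as negligible only for large N.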
