Waiting time for two sequences to find a common ancestor.

Assume the following simple model of reproduction. Every generation, the 2N sequences produce the next generation by each making a large number of copies of themselves into a large pool. The new generation is then formed by sampling from this large pool. This may seem unrealistic, but adding realism such as diploidy, overlapping generations, or letting each parent contribute only a small number of copies does not change the fundamental properties of the model. It only makes the model more complicated to work with.

We will now ask some questions about ancestry, looking back in time.

Ancestor alleles: x x x x X x x x X x   (t = -1)
The present:      x x x X x x x X x x   (t = 0)

What is the probability that two individuals descend from the same ancestor in the previous generation (call this p)? What is the probability that their common ancestor lived more than t generations ago? What is the waiting time until they find a common ancestor?

What is the probability that 3 sequences had 2 ancestors 1 generation ago? What is the probability that they had 3? Make an approximation for these probabilities that is good when N is large.

What is the probability that k sequences had k-1 ancestors 1 generation ago? What is the probability that they had k? Make an approximation for these probabilities that is good when N is large.

What is the mean height of the tree relating k sequences? What is the average branch length? How does the number of observable mutations (columns where not all sequences are identical) grow as a function of the number of sequences? As a function of sequence length?

Exercise 9: We will now introduce mutation. Every generation there is a probability m that a sequence mutates, and we assume that the mutant is always a new variant. What is the probability that two alleles are identical? What is the probability that three alleles are identical? That all three are different? That two are identical and the third is different?
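The ancestry questions above, and the question of whether two alleles are identical, can be checked numerically. Below is a minimal Python sketch, assuming the model exactly as stated: every sequence picks its parent uniformly at random (with replacement, because of the large pool) from the 2N sequences of the previous generation, and each copy mutates with probability m into a brand-new variant. All function names are hypothetical, chosen for illustration. The mutation convention (the two transmissions in a generation are checked for mutation before testing whether the two parents coincide) is one assumption among several possible ones; the closed forms in the code are derived from this particular scheme.

```python
import random
from math import comb, prod


def exact_ancestor_probs(k, two_n):
    """Exact probabilities that k sequences had k, respectively k-1,
    distinct ancestors one generation ago, in a population of 2N = two_n."""
    # All k parents distinct: (2N)(2N-1)...(2N-k+1) / (2N)^k
    p_k = prod((two_n - i) / two_n for i in range(k))
    # Exactly one pair shares a parent: pick the pair, then k-1 distinct parents
    p_k_minus_1 = comb(k, 2) * prod(two_n - i for i in range(k - 1)) / two_n ** k
    return p_k, p_k_minus_1


def simulate_distinct_parents(k, two_n, rng):
    """Number of distinct parents of k sequences one generation back."""
    return len({rng.randrange(two_n) for _ in range(k)})


def simulate_waiting_time(two_n, rng):
    """Generations back until two lineages first pick the same parent."""
    t = 1
    while rng.randrange(two_n) != rng.randrange(two_n):
        t += 1
    return t


def simulate_identical_pair(two_n, m, rng):
    """True if two alleles are identical by descent: the lineages find a
    common parent before any transmission along either lineage mutates."""
    while True:
        # Each lineage's copy from its parent mutates with probability m;
        # a mutation always creates a new variant, so the alleles then differ.
        if rng.random() < m or rng.random() < m:
            return False
        if rng.randrange(two_n) == rng.randrange(two_n):
            return True


def exact_identity(two_n, m):
    """Closed form for the success probability of simulate_identical_pair."""
    q = (1 - m) ** 2  # neither lineage mutates in one generation
    c = 1 / two_n     # the two lineages pick the same parent
    return q * c / (1 - q * (1 - c))


if __name__ == "__main__":
    rng = random.Random(1)
    two_n, k, reps = 100, 3, 100_000
    p_k, p_k1 = exact_ancestor_probs(k, two_n)
    counts = [simulate_distinct_parents(k, two_n, rng) for _ in range(reps)]
    print("P(k ancestors)  :", p_k, "large-N approx", 1 - comb(k, 2) / two_n,
          "simulated", counts.count(k) / reps)
    print("P(k-1 ancestors):", p_k1, "large-N approx", comb(k, 2) / two_n,
          "simulated", counts.count(k - 1) / reps)
    # Waiting time is geometric with p = 1/(2N): P(T > t) = (1 - p)^t, mean 2N
    times = [simulate_waiting_time(two_n, rng) for _ in range(10_000)]
    print("mean waiting time", sum(times) / len(times), "vs 2N =", two_n)
    m = 0.001
    ident = [simulate_identical_pair(two_n, m, rng) for _ in range(10_000)]
    print("P(identical)", sum(ident) / len(ident), "exact", exact_identity(two_n, m),
          "approx 1/(1+4Nm) =", 1 / (1 + 2 * two_n * m))
```

Comparing the simulated frequencies with the exact values and with their large-N approximations is a quick way to see how large N must be before the approximations become good.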
If you can only register identity or difference when comparing sequences, how many distinct observations are possible for 4 sequences? For 5 sequences? For 3 sequences there are 5 possibilities: all identical, all different, or two identical and one different, where the different one can be chosen in 3 ways.

The number of ways to partition {1, 2, ..., n} into disjoint, exhaustive, non-empty subsets is called the Bell number, B(n). The number of ways to partition {1, 2, ..., n} into exactly k such subsets is called the Stirling number of the second kind, S(n,k). Clearly

$B(n) = \sum_{k=0}^{n} S(n,k)$.

Write a recursion for S(n,k).

Take two sequences from the original Kreitman data. What is your guess for 4Nm per position? If you knew that m was $10^{-8}$, what would your guess be for 2N?

Do the practical found on http://www.coalescent.dk/
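One standard recursion for the Stirling numbers of the second kind is S(n,k) = S(n-1,k-1) + k·S(n-1,k), with S(0,0) = 1 and S(n,0) = S(0,k) = 0 otherwise: element n either forms a subset on its own, or it joins one of the k subsets of a partition of the remaining n-1 elements. The minimal Python sketch below uses it to count identity/difference patterns; the function names are hypothetical. The `theta_per_site` helper is also an assumption: it takes the proportion of differing positions between two aligned sequences as a rough guess for 4Nm per position (under an infinite-sites assumption, the expected number of pairwise differences per site is 4Nm), and then uses 4Nm = 2·(2N)·m to guess 2N. The toy sequences are made up for illustration and are not the Kreitman data.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def stirling2(n, k):
    """Ways to partition {1, ..., n} into exactly k non-empty subsets."""
    if n == 0 and k == 0:
        return 1
    if n == 0 or k == 0:
        return 0
    # Element n is alone (S(n-1, k-1)) or joins one of k subsets (k * S(n-1, k)).
    return stirling2(n - 1, k - 1) + k * stirling2(n - 1, k)


def bell(n):
    """Bell number B(n) = sum over k of S(n, k)."""
    return sum(stirling2(n, k) for k in range(n + 1))


def theta_per_site(seq_a, seq_b):
    """Proportion of aligned positions where two sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to equal length")
    diffs = sum(a != b for a, b in zip(seq_a, seq_b))
    return diffs / len(seq_a)


if __name__ == "__main__":
    for n in range(1, 6):
        print(n, bell(n), [stirling2(n, k) for k in range(1, n + 1)])
    # Toy sequences, NOT the Kreitman data
    theta = theta_per_site("ACGTACGTAC", "ACGTTCGTAC")
    m = 1e-8
    print("4Nm per site ~", theta, "-> 2N ~", theta / (2 * m))
```

The loop prints B(n) together with the row S(n,1), ..., S(n,n), so the counts for 4 and 5 sequences can be read off directly.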