Math 4600: Homework 8 Solutions
Gregory Handy

[8.1] Consider a process of molecules leaving a cell, independently of each other, with a constant probabilistic rate of 2 per second.
(a) Find the probability that at least 2 molecules leave during a 4 second experiment.
(b) How many molecules do you expect to leave, on average, during a 10 second experiment?
(c) What is the probability that the number of molecules that leave during the 10 second experiment is within 1 of the expected number?
(d) What is the expected waiting time between the second molecule that leaves and the third one?
(e) What is the probability that you have to wait longer than one second for the first molecule to leave?
(f) If you watch for a while and the molecules keep leaving and nothing enters the cell, come up with a reason why the process may stop being Poisson.

This process is described by a Poisson distribution, with $\lambda = 2$ molecules per second.

(a) To find $\Pr(N \ge 2)$ for a four second experiment, we take $t = 4$ and write
$$\Pr(N \ge 2) = 1 - \Pr(N < 2) = 1 - \left[\Pr(N = 0) + \Pr(N = 1)\right] = 1 - \left[\frac{e^{-\lambda t}(\lambda t)^0}{0!} + \frac{e^{-\lambda t}(\lambda t)^1}{1!}\right] = 0.9970.$$

(b) During a 10 second experiment, we expect $\mu = t \cdot \lambda = 10 \cdot 2 = 20$ molecules to leave on average.

(c) From part (b), we know that the expected number is 20. So to find the probability of being within one of this, we take $t = 10$ and write
$$\Pr(19 \le N \le 21) = \Pr(N = 19) + \Pr(N = 20) + \Pr(N = 21) = \frac{e^{-\lambda t}(\lambda t)^{19}}{19!} + \frac{e^{-\lambda t}(\lambda t)^{20}}{20!} + \frac{e^{-\lambda t}(\lambda t)^{21}}{21!} = 0.2623.$$

(d) To find the expected waiting time, $T$, between the second and third molecules, we use the exponential distribution:
$$E[T] = \frac{1}{\lambda} = \frac{1}{2} \text{ seconds}.$$
Note that this would also be the answer if we asked for the expected waiting time between the eighth and ninth molecules.

(e) To find the probability that $T$ is greater than 1 second, we have
$$\Pr(T > 1) = 1 - \Pr(T \le 1) = 1 - \left(1 - e^{-\lambda t}\right) = e^{-\lambda t} = 0.1353, \quad \text{where } t = 1.$$
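As a quick numerical check of parts (a) through (e), here is a minimal sketch. It is written in Python rather than the MATLAB used in the course, and the helper `poisson_pmf` and all variable names are illustrative choices, not part of the assignment.

```python
import math

def poisson_pmf(k, mean):
    """Pr(N = k) for a Poisson random variable with the given mean (lambda * t)."""
    return math.exp(-mean) * mean**k / math.factorial(k)

rate = 2.0  # lambda, molecules per second (from the problem statement)

# (a) Pr(N >= 2) during t = 4 seconds
mean_a = rate * 4
p_at_least_two = 1 - (poisson_pmf(0, mean_a) + poisson_pmf(1, mean_a))

# (b) expected number leaving during t = 10 seconds
expected_count = rate * 10

# (c) Pr(19 <= N <= 21) during t = 10 seconds
p_within_one = sum(poisson_pmf(k, expected_count) for k in (19, 20, 21))

# (d) expected waiting time between consecutive departures
expected_wait = 1 / rate

# (e) Pr(T > 1) for an exponential waiting time with rate lambda
p_wait_longer = math.exp(-rate * 1)

print(f"(a) {p_at_least_two:.4f}")
print(f"(b) {expected_count:.0f}")
print(f"(c) {p_within_one:.4f}")
print(f"(d) {expected_wait:.1f}")
print(f"(e) {p_wait_longer:.4f}")
```

The values printed for (a), (c), and (e) should agree with the hand calculations above to four decimal places.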
(f) After a while, if the molecules keep leaving the cell with nothing entering, it is safe to assume that the total number of molecules in the cell has decreased significantly. As a result, it is unlikely that the rate at which molecules leave ($\lambda = 2$) will remain constant. Since a Poisson process requires a fixed rate $\lambda$, this process stops being a Poisson process.

[8.2] Generate times of 100 events of a Poisson process with $\lambda = 1$. (To help: the command t=-(1/lambda)*log(1-rand(m,n)) will generate an m by n matrix of random numbers drawn from an exponential distribution.)
(a) Plot the event times as dots on a line at the appropriate times (the horizontal axis should be time; dots, located on the same horizontal line, mark the times of events). Look at different parts of the array. Does it look random? (To check that you did this correctly, your last event should happen at t near 100.)
(b) Count the number of events up to time 10. Is it close to the number we expect from theory? Do the same thing up to time 50. Is it close this time?
(c) Now look at your generated array and pretend that you have forgotten the $\lambda$ that you used. Estimate it.

(a) The plot of the event times can be found in Fig. 1. Zooming in, the points do appear to be randomly spaced.

Figure 1: Plot of the event times (horizontal axis: Time, from 0 to 100).

(b) I counted 12 events up to time 10, and 48 events up to time 50. From theory, we expect $\mu = \lambda t = 1 \cdot 10 = 10$ and $\mu = 1 \cdot 50 = 50$ events to occur on average during these two time windows, respectively. Both counts are close to the theoretical values.

(c) We know from theory that
$$E[t] = \frac{1}{\lambda},$$
where $E[t]$ is the average time between events. It follows that an estimate for $\lambda$ can be found from the sample mean of the waiting times:
$$\hat{\lambda} = \frac{1}{\widehat{E[t]}} = \frac{100}{\sum_{i=1}^{100} t_i} = 1.011,$$
where $t_i$ are the waiting times between consecutive events. You could also have used the fact that $\sum_{i=1}^{100} t_i = T_{100}$, where $T_{100}$ is the time of the last event.

[8.3] Consider two populations of fish.
The first one has a mutation rate of 0.002 per generation and the second one has a mutation rate of 0.0004 per generation. They reproduce once a year. The results of experimentally measuring the number of mutation differences in fish subpopulations are summarized in the following diagram (found on the question sheet). There is a debate in a newspaper in which one side claims that these fish have been independently evolving for 1 million years and the other side claims that they have been evolving for 10 thousand years. Which do you think is more likely, based on comparing the predicted distributions of mutation differences with the data?

Mutations in fish 1 accumulate according to a Poisson process with rate $\lambda_1 = 0.002$, and similarly mutations in fish 2 follow a Poisson process with rate $\lambda_2 = 0.0004$. It follows that the number of mutation differences between these two populations, $D = M_1 + M_2$, where $M_1$ is the number of mutations in fish 1 and $M_2$ is the number of mutations in fish 2, is also a Poisson process, governed by $\lambda = \lambda_1 + \lambda_2 = 0.0024$. Thus, the distribution of $D$ at time $t$ is given by
$$\Pr(D = k) = \frac{e^{-\lambda t}(\lambda t)^k}{k!}.$$
Since there is one generation per year, the expected number of differences is $\lambda t = 2400$ for $t = 10^6$ and $\lambda t = 24$ for $t = 10{,}000$. Using MATLAB, we can graph this distribution for $t = 10^6$ and $t = 10{,}000$ (Fig. 2).

Figure 2: Plot of the distributions with $t = 10^6$ (left) and $t = 10{,}000$ (right). Both panels show Probability against D.

Comparing to the diagram found on the question sheet, we find that the distribution for $t = 10{,}000$ is a better match. Thus, the populations are most likely to have been evolving independently for 10,000 years.

[8.4] Let us say that the drug resistance in HIV results from a particular 3 base mutation. Find the probability that the drug resistance develops in the HIV population in one day. Do and explain all steps, including those that we did in class.
Step 1: Find the total number of possible 3 base mutations, which follows the formula $3^i \binom{N}{i}$, where $N$ is the number of bases and $i$ is the number of mutations:
$$\alpha = \#\text{3-base mutations} = 3^3 \binom{10^4}{3} = 4.4987 \cdot 10^{12}.$$

Step 2: Find the number of 3 base mutants that are produced each day, by first finding the probability of a 3 base mutation occurring:
$$\Pr(k = 3) = B(N, i, \text{mutation rate}) = B(10^4, 3, 3 \times 10^{-5}) = \binom{10^4}{3} \left(3 \times 10^{-5}\right)^3 \left(1 - 3 \times 10^{-5}\right)^{10^4 - 3} = 3.33 \times 10^{-3}.$$
Since there are $10^8$ new infectious viruses per day, the number that have 3 mutations is
$$N_{\text{minf}} = B(10^4, 3, 3 \times 10^{-5}) \cdot 10^8 = 3.33 \times 10^5 \text{ virions per day}.$$

Step 3: Find the probability that drug resistance develops in one day:
$$\Pr(\text{Drug resistance}) = \frac{N_{\text{minf}}}{\alpha} = \frac{3.33 \times 10^5}{4.4987 \cdot 10^{12}} = 7.4 \cdot 10^{-8} \text{ per day}.$$
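The arithmetic in Steps 1 through 3 can be reproduced with a short script. This is a minimal sketch in Python (not the MATLAB used in the course). The variable names are illustrative, and the inputs ($N = 10^4$ bases, mutation rate $3 \times 10^{-5}$, $10^8$ new viruses per day) are the values used in the steps above.

```python
import math

n_bases = 10**4             # N, number of bases (value used in Step 1)
n_mut = 3                   # i, number of simultaneous mutations
mutation_rate = 3e-5        # per-base mutation probability (value used in Step 2)
new_viruses_per_day = 1e8   # new infectious viruses per day (value used in Step 2)

# Step 1: number of distinct 3 base mutations, 3^i * C(N, i)
alpha = 3**n_mut * math.comb(n_bases, n_mut)

# Step 2: binomial probability that a virus carries exactly 3 mutations,
# then the expected number of such viruses produced per day
p_three = (math.comb(n_bases, n_mut)
           * mutation_rate**n_mut
           * (1 - mutation_rate)**(n_bases - n_mut))
n_mutants = p_three * new_viruses_per_day

# Step 3: chance that one of them carries the particular resistance mutation
p_resistance = n_mutants / alpha

print(f"alpha        = {alpha:.4e}")
print(f"Pr(k = 3)    = {p_three:.3e}")
print(f"N_minf       = {n_mutants:.3e}")
print(f"Pr(resist.)  = {p_resistance:.2e}")
```

Using `math.comb` keeps the binomial coefficient as an exact integer, which avoids overflow or rounding problems from computing large factorials directly.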