Math 4600, Homework 9

1. (Computing) Generate cumulative times of 100 events of a Poisson process with λ = 0.7. Inter-event times can be generated with the function rexp(n = 100, rate = .7), where "n" is the number of observations wanted and "rate" is the rate of the Poisson process. Use the cumsum() function to get the cumulative sum of these times.
   a. Plot the events as dots along a single horizontal line, with time on the horizontal axis, so that each dot marks the time of one event.
   b. How many events occur in the first 50 units of time? What is the expected number of events in the first 50 units of time? Now find the number of events that occurred in the first 100 units of time, and compare it to the expected number.
   c. Pretend that you forgot the actual λ you used; use your data to estimate it. Do this 3 times: once with only the first 10 inter-event times, next with the first 50, and finally with all the data.
   d. Write down the log-likelihood function for n inter-event times τ_i (and simplify this using log properties, as in class). Define this function in R, and use it to graph the log-likelihood function of the first 10 inter-event times of your data. Add the lines of the log-likelihood functions of the first 50 events and of all 100 events. Write down the log-likelihood function in each case. How does the qualitative form of the log-likelihood function change near λ = 0.7 as more data are included?

2. Suppose two lineages of bacteria (A and B) have been separated for g generations. Both have genomes with n sites, and mutation rate p per nucleotide, per replication. For the following problems, assume that mutations on each genome are distributed according to a Poisson distribution, as in class.
   a. What is the probability of a comparison resulting in k differences? This will be a function of g, p, n and k.
   b. Suppose the differences {k_i}, i = 1, 2, 3, ..., N are obtained. What is the probability of obtaining this result?
      This will be a product of terms from part (a), and is called the likelihood. Call this function L.
   c. Define ℒ = ln L as the log-likelihood. Find an expression for the g that maximizes the probability of obtaining the data in part b. Do this by finding dℒ/dg, setting it to 0, and solving for g.
   d. Suppose scientists obtain the genomes of 100 individuals of each type, and document the number of differences in the DNA. The data are available on the website. Use the data to estimate the number of generations since the two lineages shared a common ancestor. Choose n = 5 × 10⁶ and p = 2 × 10⁻⁶.
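
For Problem 1, the setup can be sketched in R roughly as follows. This is a minimal sketch, not a model solution. The seed, the variable names (tau, event_times, lambda_hat, loglik, lam_grid), the grid of λ values, and the plotting choices are all assumptions made for illustration. The estimator k / Σ τ_i and the log-likelihood k log λ − λ Σ τ_i used here are the standard results for exponentially distributed inter-event times.

```r
# Sketch for Problem 1 (names and seed are illustrative assumptions).
set.seed(1)                           # assumption: fixed seed so the run is repeatable
lambda <- 0.7
tau <- rexp(n = 100, rate = lambda)   # inter-event times
event_times <- cumsum(tau)            # cumulative event times

# (a) events as dots on one horizontal line, time on the horizontal axis
plot(event_times, rep(0, length(event_times)),
     pch = 16, yaxt = "n", xlab = "time", ylab = "")

# (b) observed count in the first 50 time units vs. the expected count
n50 <- sum(event_times <= 50)
expected50 <- lambda * 50             # mean of a Poisson count over time t is lambda * t

# (c) rate estimate from the first k inter-event times
lambda_hat <- function(k) k / sum(tau[1:k])

# (d) log-likelihood of the first k inter-event times, vectorized in lam
loglik <- function(lam, k) k * log(lam) - lam * sum(tau[1:k])

lam_grid <- seq(0.1, 2, by = 0.01)
ll <- sapply(c(10, 50, 100), function(k) loglik(lam_grid, k))  # one column per k
matplot(lam_grid, ll, type = "l", lty = 1:3, col = 1,
        xlab = "lambda", ylab = "log-likelihood")
```

Calling lambda_hat(10), lambda_hat(50), and lambda_hat(100) gives the three estimates asked for in part c. matplot() is used instead of adding curves one at a time so that the vertical axis automatically covers all three log-likelihoods, which differ greatly in scale.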