Math 4600, Homework 9

1. (Computing) Generate cumulative times of 100 events of a Poisson process with λ = 0.7. Inter-event
times can be generated with the function rexp(n = 100, rate = .7), where “n” is the number of observations
wanted, and “rate” is the probabilistic rate of the Poisson process. Use the cumsum() function to get the
cumulative sum of these times.
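
A minimal sketch of the simulation step in R. The call to set.seed(1) and the variable names lambda, tau, and event_times are illustrative choices, not part of the assignment.

```r
# Illustrative seed so the run is reproducible (an assumption, not required).
set.seed(1)

lambda <- 0.7                          # rate of the Poisson process
tau <- rexp(n = 100, rate = lambda)    # 100 exponential inter-event times
event_times <- cumsum(tau)             # cumulative times of the 100 events
```

Because every inter-event time is positive, event_times is strictly increasing, and its last entry is the time of the 100th event.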
a. Plot the event times as dots along a single horizontal line, with time on the horizontal axis, so
that each dot marks the time of one event.
b. How many events occur in the first 50 units of time? What is the expected number of events in the
first 50 units of time? Now find the number of events that occurred in the first 100 units of time, and
compare it to the expected number.
c. Pretend that you forgot the actual λ you used; use your data to estimate it. Do this 3 times: once
with only the first 10 inter-event times, next with the first 50, and finally with all the data.
d. Write down the log-likelihood function for n inter-event times τ_i (and simplify this using log properties,
as in class). Define this function in R, and use it to graph the log-likelihood function of the first 10
inter-event times of your data. Add the curves of the log-likelihood functions for the first 50 inter-event
times and for all 100. Write down the log-likelihood function in each case. How does the qualitative form of
the log-likelihood function change near λ = 0.7 as more data are included?
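
Parts (c) and (d) can be sketched in R as follows, assuming the inter-event times are independent exponentials with rate λ, so that the log-likelihood is n log(λ) − λ Σ τ_i and its maximizer is n / Σ τ_i. The seed, the helper names loglik and lambda_hat, and the grid of candidate rates are illustrative assumptions, not part of the assignment.

```r
set.seed(1)                          # illustrative seed (assumption)
tau <- rexp(n = 100, rate = 0.7)     # simulated inter-event times

# Log-likelihood of rate lambda for exponential inter-event times tau:
# n * log(lambda) - lambda * sum(tau). Vectorized over lambda.
loglik <- function(lambda, tau) {
  length(tau) * log(lambda) - lambda * sum(tau)
}

# Closed-form maximizer of the log-likelihood above: n / sum(tau).
lambda_hat <- function(tau) length(tau) / sum(tau)

# Estimates from the first 10, the first 50, and all 100 inter-event times.
estimates <- c(n10  = lambda_hat(tau[1:10]),
               n50  = lambda_hat(tau[1:50]),
               n100 = lambda_hat(tau))

# Evaluate the three log-likelihood curves on a grid of candidate rates.
grid  <- seq(0.1, 2, by = 0.01)
ll10  <- loglik(grid, tau[1:10])
ll50  <- loglik(grid, tau[1:50])
ll100 <- loglik(grid, tau)
```

The curves can then be drawn with plot(grid, ll10, type = "l") followed by lines(grid, ll50) and lines(grid, ll100). The three curves sit at different heights, so a ylim covering all of them may be needed.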
2. Suppose two lineages of bacteria (A and B) have been separated for g generations. Both have genomes
with n sites, and mutation rate p per nucleotide, per replication. For the following problems, assume that
mutations on each genome are distributed according to a Poisson distribution, as in class.
a. What is the probability of a comparison resulting in k differences? This will be a function of g, p, n
and k.
b. Suppose the differences {k_i}, i = 1, 2, 3, ..., N are obtained. What is the probability of obtaining this
result? This will be a product of terms from part (a), and is called the Likelihood. Call this function
L.
c. Define ℓ = ln L as the log-likelihood. Find an expression for the g that maximizes the probability of
obtaining the data in part b. Do this by finding dℓ/dg, setting it to 0, and solving for g.
d. Suppose scientists obtain the genomes of 100 individuals of each type, and document the number of
differences in the DNA. The data are available on the website. Use the data to estimate the number of
generations since the two lineages shared a common ancestor. Choose n = 5 × 10^6 and p = 2 × 10^−6.
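
One way to organize parts (a) through (c) is sketched below. It assumes the N comparisons are independent and that the number of differences in one comparison is Poisson with mean c·g·n·p. The constant c is introduced here for illustration and is not part of the assignment: c = 2 if mutations accumulate independently on both lineages over g generations each, and c = 1 if g already counts the total separation. Use whichever convention matches the model from class.

```latex
% Assumption: differences per comparison are Poisson with mean c g n p,
% where c is a hypothetical convention constant (c = 1 or c = 2).
\begin{align*}
P(k) &= \frac{e^{-cgnp}\,(cgnp)^{k}}{k!} \\
L(g) &= \prod_{i=1}^{N} \frac{e^{-cgnp}\,(cgnp)^{k_i}}{k_i!} \\
\ell(g) = \ln L(g) &= -N\,cgnp + \ln(cgnp)\sum_{i=1}^{N} k_i - \sum_{i=1}^{N} \ln(k_i!) \\
\frac{d\ell}{dg} &= -N\,cnp + \frac{1}{g}\sum_{i=1}^{N} k_i = 0
\quad\Longrightarrow\quad
\hat{g} = \frac{1}{N\,cnp}\sum_{i=1}^{N} k_i = \frac{\bar{k}}{c\,n\,p}
\end{align*}
```

With the values given in part (d), the product np equals 10, so under these assumptions the estimate is the sample mean of the observed differences divided by 10c.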