Math 4600, Homework 7

1. (Computing) The HIV genome has approximately 10^4 sites. The conversion of viral RNA to DNA
(by reverse transcriptase) is error-prone, and each site has a probability p = 3 × 10^-5 of having the wrong
nucleotide copied during replication (from Perelson and Nelson, 1999).
a. What distribution does the total number of mutations follow? What is the expected number of
mutations after 1 cycle of replication? What is the variance in this outcome?
b. Write a simulation of 10^4 RNA sites that are copied incorrectly with probability p, and correctly with
probability 1 − p. Add up the number of mutations. Repeat this experiment many times (each time
starting from 0 mutants in the population), and graph the results in R using a histogram (set
the freq parameter to FALSE so that the histogram shows densities rather than counts).
c. Add a plot of the probability model you specified in part a).
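A minimal R sketch of the simulation and overlay described in parts b and c, offered as one possible approach rather than the intended solution. It assumes the per-site errors are independent, so the overlay uses a binomial model; the seed, the number of repetitions (n_reps), and all variable names are illustrative choices that do not come from the assignment.

```r
# Sketch for problem 1: count copying errors over n_sites sites per replication.
set.seed(1)                 # arbitrary seed, only for reproducibility
n_sites <- 1e4              # number of sites in the genome
p       <- 3e-5             # per-site error probability
n_reps  <- 2000             # assumed number of repeated experiments

# One experiment: an independent Bernoulli(p) draw per site, then sum the errors.
simulate_once <- function() sum(runif(n_sites) < p)
mutations <- replicate(n_reps, simulate_once())

# Histogram on the density scale, with one unit-width bin per integer count.
breaks <- seq(-0.5, max(mutations) + 0.5, by = 1)
hist(mutations, breaks = breaks, freq = FALSE,
     main = "Mutations per replication", xlab = "Number of mutations")

# Overlay the binomial probabilities (assumed model for part a).
k <- 0:max(mutations)
points(k, dbinom(k, size = n_sites, prob = p), pch = 19, col = "red")
```

Because each bin has width 1, the bar heights are directly comparable to the overlaid probabilities.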
2. This problem continues from problem 1.
a. What is the probability that a single virus replication results in at least 1 mutation? Call this
probability m.
b. (Computing) Simulate a group of 1000 viral RNA molecules over 100 minutes of time, and assume
that they all get reverse transcribed simultaneously once each minute, with probability m (from part
a.) of having at least 1 mutation. For each individual, track the time until the first mutation occurs,
and plot these times on a histogram. Add a plot of the probabilities from a geometric distribution
describing this scenario (they should match).
c. Suppose a DNA molecule is transcribed from a virus RNA each minute. How long would you have
to wait on average before observing the first mutated DNA copy from this single virus RNA?
d. Recall that the clearance rate for viruses was estimated to be 3.1 day^-1. Suppose this is the rate at
which viral RNA is cleared. What is the expected number of mutated DNA versions of itself that 1
virus mRNA will produce in its lifetime?
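The simulation in part b could be sketched in R roughly as follows. This is a hedged illustration, not the intended solution: it assumes m is computed as the complement of "no errors at any of the 10^4 sites" with independent sites, and the seed, variable names, and plotting range are arbitrary choices.

```r
# Sketch for problem 2b: time until the first mutated copy for each RNA molecule.
set.seed(2)                          # arbitrary seed, only for reproducibility
n_sites <- 1e4
p       <- 3e-5
m       <- 1 - (1 - p)^n_sites       # assumed form of P(at least one mutation)

n_virus <- 1000                      # number of RNA molecules
n_min   <- 100                       # minutes simulated

# hits[i, t] is TRUE when molecule i yields a mutated copy at minute t.
hits <- matrix(runif(n_virus * n_min) < m, nrow = n_virus, ncol = n_min)

# First minute with a mutation for each molecule (NA if none within n_min).
first_time <- apply(hits, 1, function(row) {
  idx <- which(row)
  if (length(idx) > 0) idx[1] else NA_integer_
})
observed <- first_time[!is.na(first_time)]

# Histogram on the density scale, one bin per minute.
breaks <- seq(0.5, n_min + 0.5, by = 1)
hist(observed, breaks = breaks, freq = FALSE, xlim = c(0, 30),
     main = "Time to first mutation", xlab = "Minutes")

# dgeom counts failures before the first success, hence the shift by one.
t <- 1:n_min
points(t, dgeom(t - 1, prob = m), pch = 19, col = "red")
```

Note the parameterization: R's dgeom(x, prob) gives the probability of x failures before the first success, so the probability that the first mutation occurs at minute t is dgeom(t - 1, prob = m).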
3. (From F. Adler “Modeling the Dynamics of Life”). An important application of Bayes’ theorem is the
planning of rare disease screening. Suppose we have a disease that affects 1% of people. A diagnostic test
always detects the disease but also generates 5% false positives (a positive test when there is no disease).
a. A patient tests positive. What is the probability that she has the disease? (You may find it useful to
define the following events: D - the disease is present, N - there is no disease, P - the test is positive).
b. If you solved part a) correctly, you will find that even with a positive test an individual is still unlikely
to have the disease. For this reason, it is often more useful to test only the risk groups, where the
disease is more common. Assume that in a risk group the occurrence of the disease is 40%. Check
that in this case the probability for a positively-tested individual to have the disease is about 93%,
i.e. now the test is a nearly certain indicator of the disease.
c. (Computing) Suppose the disease affects p% of people. Make a graph of the probability that a
positive test correctly indicates disease versus p. What does this curve say about the importance of the
prevalence of the disease in a correct diagnosis?
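One way to produce the graph in part c is sketched below in R. The function name posterior, the variable names, and the grid of prevalence values are illustrative assumptions; the test characteristics (the disease is always detected, 5% false positives) are taken from the problem statement.

```r
# Sketch for problem 3c: P(disease | positive test) as a function of prevalence.
sensitivity <- 1.0     # the test always detects the disease
false_pos   <- 0.05    # P(positive | no disease)

# Bayes' theorem: P(D | P) = P(P | D) P(D) / [P(P | D) P(D) + P(P | N) P(N)].
posterior <- function(prev) {
  sensitivity * prev / (sensitivity * prev + false_pos * (1 - prev))
}

prev_pct <- seq(0.1, 100, by = 0.1)   # prevalence p, in percent
plot(prev_pct, posterior(prev_pct / 100), type = "l",
     xlab = "Prevalence p (%)",
     ylab = "P(disease | positive test)")
```

As a sanity check, posterior(0.40) evaluates to about 0.93, which agrees with the figure quoted in part b.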