Calculus for Biologists Lab Math 1180-002 Spring 2012

Lab #11 - Binomial Distribution and Normal Approximation
Report due date: Tuesday, April 10, 2012 at 9 a.m.
Goal: To simulate a binomial random variable and to compute a normal approximation.
• Create a new script, either in R (laptop) or with a text editor (Linux computers).
DNA Damage
Suppose a lab is doing an experiment in order to determine the effect of mutation number on organismal phenotype.
After 10 days, this particular type of organism experiences no further alterations to its DNA, so the lab looks at
the frequency of mutations under equilibrium. Taking a single 250-base-pair gene segment from 1000 organisms,
the lab counts the number of base pairs containing at least one mutation. Let bp and organisms denote the
number of base pairs and organisms under observation, respectively.
bp = ##
organisms = ##
Simulation
Assume that the resident lab has a mathematically inclined scientist, who was able to compute the equilibrium
fraction of base pairs that would end up damaged after standard mutation/repair processes played out. This
scientist found this fraction to be 1/3. Save this as p.
p = ##
Simulate the base pair profile B of this set of organisms. Let 0 denote the absence of a mutation, and 1 the
presence of a mutation. Replace any ## comments appropriately.
B = matrix(sample(c(0,1), bp*organisms, replace=TRUE, prob=c(##,##)), #rows?#, #columns?#)
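The way sample and matrix fit together can be sketched on a much smaller toy case. The names sites, individuals, q, draws, and M below, and their values, are hypothetical and chosen only for illustration; they are not the lab's values.

```r
set.seed(1)                       # make the random draw reproducible
sites = 4                         # hypothetical number of sites per individual
individuals = 3                   # hypothetical number of individuals
q = 0.2                           # hypothetical probability of drawing a 1

# prob lists the weights in the same order as the values in c(0, 1)
draws = sample(c(0, 1), sites * individuals, replace = TRUE, prob = c(1 - q, q))

# matrix() fills column by column, so each column holds one individual's sites
M = matrix(draws, nrow = sites, ncol = individuals)
print(dim(M))
print(M)
```

Because matrix fills column by column, the choice of row and column counts decides whether a later colSums call totals over sites or over individuals.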
You should now have a matrix with multiple rows and columns. Let’s count the number of mutations found in
each organism, and summarize the results with a saved histogram:
bad.bp = colSums(B)
bad.hist = hist(bad.bp,plot=F)
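As a small illustration of these two calls, the sketch below uses a hand-written 0/1 matrix (hypothetical data, with the made-up names M, counts, and h) so that the column totals can be checked by eye.

```r
# Hypothetical 0/1 matrix: 3 rows (sites) by 4 columns (individuals).
# The values are listed column by column, matching how matrix() fills.
M = matrix(c(0, 1, 1,
             0, 0, 0,
             1, 1, 1,
             0, 1, 0), nrow = 3, ncol = 4)

counts = colSums(M)               # one total per column
print(counts)                     # 2 0 3 1

h = hist(counts, plot = FALSE)    # compute the bins without drawing anything
print(h$breaks)                   # bin edges chosen by hist
print(h$density)                  # bar heights on the probability-density scale
print(sum(h$density * diff(h$breaks)))   # bar areas add up to 1
```

The saved object carries the bin edges and the density heights, which is why it can be used afterwards to set sensible axis limits.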
We can plot this histogram within a range of values related to the information embedded in bad.hist.
hist(bad.bp,                                ## what to make a histogram of
     col="dodgerblue",                      ## what color to make the bars
     xlim=c(0,bp),                          ## limits along the x-axis
     ylim=c(0,1.2*max(bad.hist$density)),   ## y-axis limits above the largest probability found
     prob=T,                                ## plot probability, not frequency
     ylab = "Probability",
     xlab = "Number of mutations")
The normal approximation
There is a set of circumstances under which the normal distribution may be used to approximate the
binomial distribution. Recall the p.d.f. of the normal distribution:
$$N(x, \mu, \sigma) = \frac{1}{\sigma\sqrt{2\pi}} \, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$$
Define the function normal in R to be this p.d.f. The quantity pi in R is equivalent to the number π. Be careful
with your parentheses!
normal = function(x,mu,sigma) ##
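The warning about parentheses comes down to R's operator precedence. The short sketch below shows two common pitfalls on plain numbers; the variable names are hypothetical and the expressions are deliberately not the p.d.f. itself.

```r
# Division and multiplication are evaluated left to right
half.pi = 1 / 2 * pi        # read by R as (1/2) * pi
inv.two.pi = 1 / (2 * pi)   # parentheses force 1 divided by (2*pi)
print(c(half.pi, inv.two.pi))

# Exponentiation binds more tightly than a leading minus sign
neg.square = -2^2           # read by R as -(2^2), which is -4
square.neg = (-2)^2         # parentheses give 4
print(c(neg.square, square.neg))
```

Once normal is written, one possible sanity check is to compare its output with R's built-in dnorm(x, mean, sd) at a few points.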
In order to achieve a good approximation,
• the number of “trials” n in binomial-land needs to be sufficiently large;
• np and n(1 − p) should both be >5; and
• p should not be close to 0 or to 1.
If these conditions are satisfied, then a random variable distributed as B(n, p) will look a lot like one distributed as
N(np, np(1 − p)), that is, a normal distribution with mean np and variance np(1 − p). Take some time to think about what n should be for the current situation.
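A rough numerical check of these conditions, and of how close the two distributions end up, might look like the sketch below. The values n.demo = 50 and p.demo = 0.4 are hypothetical stand-ins and not the lab's values. The comparison uses R's built-in dbinom and dnorm, and dnorm expects a standard deviation, so the variance is square-rooted.

```r
n.demo = 50      # hypothetical number of trials (not the lab's value)
p.demo = 0.4     # hypothetical success probability (not the lab's value)

# Rule-of-thumb quantities: both should exceed 5
print(c(n.demo * p.demo, n.demo * (1 - p.demo)))

k = 0:n.demo
exact = dbinom(k, size = n.demo, prob = p.demo)     # binomial probabilities
approx.pdf = dnorm(k, mean = n.demo * p.demo,
                   sd = sqrt(n.demo * p.demo * (1 - p.demo)))  # sd, not variance

max.diff = max(abs(exact - approx.pdf))             # largest pointwise gap
print(max.diff)
```

Repeating the comparison with p.demo close to 0 or 1, or with a small n.demo, is one way to see the approximation degrade.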
Define x as a sequence of numbers from 0 to bp of length 300.
x = seq(#,#,#)
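A detail worth knowing about seq is that its third positional argument is the step size (by), so a desired length has to be passed by name. The toy call below uses hypothetical endpoints 0 and 1 and a length of 5 purely for illustration.

```r
# Named length.out: 5 evenly spaced points from 0 to 1
grid.demo = seq(0, 1, length.out = 5)
print(grid.demo)             # 0.00 0.25 0.50 0.75 1.00
print(length(grid.demo))     # 5

# Unnamed third argument is treated as the step size (by = 5),
# so only the starting point fits between 0 and 1
step.demo = seq(0, 1, 5)
print(step.demo)             # 0
```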
Finally, evaluate the approximation appropriately (choosing the correct µ and σ), and plot it on top of the
histogram using the lines command.
lines(x, normal(##,##,##), type='l', lwd=2)
Plot 11.1: save this figure to include in your assignment.
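One possible way to overlay a curve on a histogram and save the result from a script is sketched below. It uses hypothetical simulated data (not the lab's), R's built-in dnorm for the curve, and a made-up file name in a temporary folder; the pdf device is just one of several options for saving.

```r
set.seed(2)
y = rnorm(500, mean = 10, sd = 2)                     # hypothetical data
out.file = file.path(tempdir(), "overlay-demo.pdf")   # hypothetical file name

pdf(out.file)                                         # open a PDF graphics device
hist(y, prob = TRUE, col = "dodgerblue",
     xlab = "Value", ylab = "Density")                # bars on the density scale
grid.x = seq(min(y), max(y), length.out = 200)        # points at which to draw the curve
lines(grid.x, dnorm(grid.x, mean = 10, sd = 2), lwd = 2)   # curve added on top of the bars
invisible(dev.off())                                  # close the device so the file is written

print(file.exists(out.file))
```

The lines call only adds to an existing plot, so it has to come after hist and before the device is closed.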
• Save your script so that you can use it for your assignment.