Calculus for Biologists Lab
Math 1180-002, Spring 2012

Lab #11 - Binomial Distribution and Normal Approximation

Report due date: Tuesday, April 10, 2012 at 9 a.m.

Goal: To simulate a binomial random variable and to compute a normal approximation.

• Create a new script, either in R (laptop) or with a text editor (Linux computers).

DNA Damage

Suppose a lab is running an experiment to determine the effect of mutation number on organismal phenotype. After 10 days, this particular type of organism experiences no further alterations to its DNA, so the lab looks at the frequency of mutations at equilibrium. Taking a single 250-base-pair gene segment from each of 1000 organisms, the lab counts the number of base pairs containing at least one mutation. Let bp and organisms denote the number of base pairs and organisms under observation, respectively.

    bp = ##
    organisms = ##

Simulation

Assume that the resident lab has a mathematically inclined scientist, who was able to compute the equilibrium fraction of base pairs that would end up damaged after standard mutation/repair processes played out. This scientist found this fraction to be 1/3. Save this as p.

    p = ##

Simulate the base pair profile B of this set of organisms. Let 0 denote the absence of a mutation, and 1 the presence of a mutation. Replace any ## comments appropriately.

    B = matrix(sample(c(0,1), bp*organisms, replace=TRUE, prob=c(##,##)), #rows?#, #columns?#)

You should now have a matrix with multiple rows and columns. Let's count the number of mutations found in each organism, and summarize the results with a saved histogram:

    bad.bp = colSums(B)
    bad.hist = hist(bad.bp, plot=FALSE)

We can plot this histogram within a range of values related to the information embedded in bad.hist.

    hist(bad.bp,                               ## what to make a histogram of
         col="dodgerblue",                     ## what color to make the bars
         xlim=c(0,bp),                         ## limits along the x-axis
         ylim=c(0,1.2*max(bad.hist$density)),  ## y-axis limits above the largest probability found
         prob=TRUE,                            ## plot probability, not frequency
         ylab = "Probability",
         xlab = "Number of mutations")

The normal approximation

There is a set of circumstances under which the normal distribution may be used to approximate the binomial distribution. Recall the p.d.f. of the normal distribution:

$$N(x, \mu, \sigma) = \frac{1}{\sigma\sqrt{2\pi}} \, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$$

Define the function normal in R to be this p.d.f. The quantity pi in R is equivalent to the number π. Be careful with your parentheses!

    normal = function(x,mu,sigma) ##

In order to achieve a good approximation,

• the number of "trials" n in binomial-land needs to be sufficiently large;
• np and n(1 − p) should both be greater than 5; and
• p should not be close to 0 or to 1.

If these conditions are satisfied, then a random variable with distribution B(n, p) will look a lot like one with distribution N(np, np(1 − p)), that is, a normal distribution with mean np and variance np(1 − p). Take some time to think about what n should be for the current situation.

Define x as a sequence of 300 numbers from 0 to bp.

    x = seq(#,#,#)

Finally, evaluate the approximation appropriately (choosing the correct µ and σ), and plot it on top of the histogram using the lines command.

    lines(x, normal(##,##,##), type='l', lwd=2)

Plot 11.1: save this figure to include in your assignment.

• Save your script so that you can use it for your assignment.
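
The conditions for the normal approximation can also be checked numerically. The following is a minimal sketch under stated assumptions: the values of n and p are illustrative and are not the values used in this lab, and it relies on R's built-in dbinom and dnorm functions rather than the normal function you define above.

```r
## Illustrative values only (assumptions, not the values used in this lab)
n = 60      ## number of trials
p = 0.25    ## probability of "success" on each trial

## Rule-of-thumb conditions: np and n(1-p) should both exceed 5
conditions.ok = (n*p > 5) && (n*(1-p) > 5)

## Mean and standard deviation of the approximating normal distribution
mu = n*p
sigma = sqrt(n*p*(1-p))

## Compare exact binomial probabilities with the normal density at k = 0, ..., n
k = 0:n
exact = dbinom(k, size=n, prob=p)      ## exact binomial probabilities
approx = dnorm(k, mean=mu, sd=sigma)   ## normal density at the same points
max.error = max(abs(exact - approx))   ## largest pointwise discrepancy
```

Printing conditions.ok and max.error for a few choices of n and p gives a feel for how the approximation improves as n grows and degrades as p approaches 0 or 1.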