Calculus for Biologists Lab
Math 1180-002, Spring 2012
Lab #9 - Applications of the Binomial Distribution
Report due date: Tuesday, March 27, 2012 at 9 a.m.

Goal: To explore binomially distributed random variables and their cumulative distributions over time.

Create a new script, either in R (laptop) or with a text editor (Linux computers).

Binomial recap

The binomial distribution describes the probability of obtaining exactly k successes out of n independent trials, where the probability of success on a single trial is p. The number of successes is represented by a random variable, say H, and each trial must be described by a Bernoulli random variable, which takes only two values: 0 and 1. This applies to a variety of circumstances: “yes or no”, “in or out”, etc. The probability of exactly k successes is

$$ b(k; n, p) = \Pr(H = k) = \binom{n}{k} p^k (1 - p)^{n-k}, \tag{1} $$

with E(H) = np and Var(H) = np(1 − p). Note that k and n must be integers, with 0 ≤ k ≤ n.

To determine the probability that there are at most k successes, we calculate the cumulative distribution

$$ B(k; n, p) = \Pr(H \le k) = \sum_{i=0}^{k} b(i; n, p). $$

If, on the other hand, we wish to know the probability of having more than k successes (i.e. Pr(H ≥ k + 1)), we calculate 1 − B(k; n, p).

Antioxidants

Background: Antioxidants have proven very helpful in a variety of organisms. Many antioxidants are enzymes that promote cellular health and longevity, and in some organisms antioxidants are even believed to slow the aging process. As with many proteins, antioxidant production in our cells can be enhanced by increased expression of transcription factors.

The problem: Suppose a lab wants to determine the impact of treating cells with an antioxidant-enhancing solution. The molecules in this solution directly increase the translation of a newly discovered, and extremely effective, antioxidant enzyme called AwesomeAox (AA for short).
Each molecule of the solution independently has only a 5% chance of leaving the cell each second. In each experiment, a fixed concentration of solution is injected into the cell. This concentration results in 100 intracellular molecules, which we'll call awesomemols.

Applying the binomial distribution

Notice that the number of awesomemols inside the cell at any given second t can be described by the binomial distribution, where n is the starting number of awesomemols. Since each molecule has a 5% chance of exiting each second, what function describes the probability that a given molecule is still inside the cell as a function of time? Save your answer as

    p = function (t) { ??? }

We can define the binomial probability density function for this problem as follows:

    n = 100    ## initial number of awesomemols
    binom = function (K,P) choose(n,K)*(P^K)*(1-P)^(n-K)

Since n = 100 does not change between experiments, we need only define binom as a function of the number K of intracellular awesomemols and the probability P that a given awesomemol is inside. The R function choose(n,k) calculates the binomial coefficient, defined by

$$ \binom{n}{k} = \frac{n!}{(n-k)!\,k!}. $$

We can now determine how the probability of having a certain number of awesomemols inside changes over time. We will use the values of k that are multiples of 10 (including 0) and plot these eleven functions over time.

    time = seq(0,60,by=1/10)     ## evaluate over 1 minute
    k = seq(0,n,by=10)           ## the k values we care about
    colors = rainbow(length(k))
    ## set up an empty plot
    plot(time, time, type="n", ylim=c(0,1), xlab="Time (sec)", ylab="Pr (k awesomemols inside)")
    ## evaluate binom at each element of k
    for (j in 1:length(k)){
      lines(time, binom(k[j],p(time)), type="l", lwd=3, col=colors[j])
    }
    ## add a legend
    legend("topright", paste("k =", k), lty=1, col=colors, lwd=3)

Plot 9.1: Save this to include in your assignment.
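As an optional sanity check on the recap and on binom, the sketch below compares the hand-written formula with R's built-in dbinom(x, size, prob) and computes the mean and variance directly from the distribution. The value P = 0.3 is an arbitrary illustrative choice (an assumption, not the answer for p(t)); with n = 100 the total probability, mean and variance should come out close to 1, nP = 30 and nP(1 − P) = 21.

```r
n <- 100                       # initial number of awesomemols (from the lab)
binom <- function(K, P) choose(n, K) * (P^K) * (1 - P)^(n - K)

P <- 0.3                       # hypothetical probability, chosen only for illustration
K <- 0:n                       # every possible number of awesomemols inside

pmf <- binom(K, P)             # Pr(H = K) for K = 0, ..., n

# The hand-written formula should match R's built-in dbinom(x, size, prob)
stopifnot(isTRUE(all.equal(pmf, dbinom(K, size = n, prob = P))))

total  <- sum(pmf)                       # should be 1
mean.H <- sum(K * pmf)                   # should be n * P
var.H  <- sum((K - mean.H)^2 * pmf)      # should be n * P * (1 - P)

cat("sum:", total, " mean:", mean.H, " n*P:", n * P,
    " var:", var.H, " n*P*(1-P):", n * P * (1 - P), "\n")
```

Because dbinom() is vectorized, dbinom(k[j], n, p(time)) could stand in for binom(k[j], p(time)) in the plotting loop; the lab's own binom is kept so the formula stays visible.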
The threshold problem

After several experiments, the researchers determine that the solution is only effective if the number of intracellular molecules is maintained above a threshold M of 70 awesomemols.

    M = ## ???

To determine the probability that the number of awesomemols inside is above the threshold, you will first need to compute the cumulative distribution.

    cumulative = matrix(0,length(time),1)
    for (j in 1:length(time)){
      cumulative[j] = sum(binom(0:M,p(time[j])))
    }

Recall that the cumulative distribution gives the probability of being at or below the threshold. Determine what you need to plot with the following command and replace the comment appropriately.

    plot(time, ## what you need to plot goes here ##, type='l', lwd=3,
         ylab="Pr (more than M awesomemols inside)", xlab="Time (sec)",
         main="Cumulative distribution - single treatment")

Identify in this graph an integer estimate of the time at which we can no longer be 100% certain that the number of awesomemols exceeds the threshold. (Translate this statement into mathematical terms to make sure you understand it.) Save this time as

    target = ## ???

Plot 9.2: Save this graph to include in your assignment.

“Fixing” the threshold problem

Our esteemed researchers decide to repeat the injection every target seconds in an effort to maintain more than M awesomemols. We can simulate this experiment by creating a new probability function p that accounts for the extra injections. To do this, we will simulate the following discrete-time equation:

$$ p_t = \begin{cases} 1 & \text{if } t = 0,\ \text{target},\ 2\cdot\text{target},\ 3\cdot\text{target}, \dots \\ r\,p_{t-1} & \text{otherwise,} \end{cases} $$

where r is the appropriate fraction. Define r.

    r = ## ???

In the following code, p.new represents the probability that a single awesomemol is inside at a given second. This is a simpler way of implementing the piecewise definition above; it is not essential that you understand the code, only its output.
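The cumulative loop in the threshold problem sums binom(0:M, ·) one time point at a time. R's pbinom(q, size, prob) returns the same cumulative probability B(M; n, P), accepts a vector of probabilities, and with lower.tail = FALSE gives Pr(H > M) = 1 − B(M; n, P) directly. The sketch below checks this equivalence for a few hypothetical values of P (assumptions chosen only for illustration); it is an optional alternative to the loop, not a replacement for the plotting step.

```r
n <- 100                       # initial number of awesomemols (from the lab)
M <- 70                        # threshold stated in the lab
binom <- function(K, P) choose(n, K) * (P^K) * (1 - P)^(n - K)

P.values <- c(0.5, 0.7, 0.8, 0.9, 0.99)   # hypothetical single-molecule probabilities

# B(M; n, P) by explicit summation, exactly as in the lab's loop
cumulative.loop <- sapply(P.values, function(P) sum(binom(0:M, P)))

# The same cumulative probability from R's built-in CDF (vectorized over prob)
cumulative.builtin <- pbinom(M, size = n, prob = P.values)

# Pr(H > M) = 1 - B(M; n, P), computed directly as the upper tail
upper.tail <- pbinom(M, size = n, prob = P.values, lower.tail = FALSE)

print(round(rbind(P = P.values, B = cumulative.builtin, above.M = upper.tail), 4))
```

Using lower.tail = FALSE avoids subtracting two numbers that are both close to 1, which matters only when the tail probability is extremely small.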
    time.new = 0:max(time)
    p.new = matrix(1,length(time.new),1)
    for (j in 2:length(time.new)){
      p.new[j] = if (time.new[j] %in% seq(0,max(time.new),by=target)){
        1 } else { r*p.new[j-1] }
    }

Now, re-calculate the cumulative probability and plot the appropriate vector.

    cumulative = matrix(0,length(time.new),1)
    for (j in 1:length(time.new)){
      cumulative[j] = sum(binom(0:M,p.new[j]))
    }
    plot(time.new, ## what you need to plot goes here ##, ylim=c(0,1), type='o',
         ylab="Pr (more than M awesomemols inside)", xlab="Time (sec)",
         main="Cumulative distribution - frequent treatment", pch=20)

Plot 9.3: Save this to include in your assignment.

Save your script so that you can use it for your assignment.
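Because p.new resets to 1 at every injection and is multiplied by r each second in between, the piecewise rule for the repeated-injection experiment has the closed form p_t = r^(t mod target) when target is a whole number of seconds. The sketch below runs the same recursion as the p.new loop with placeholder values r.demo = 0.9 and target.demo = 7 (hypothetical values, deliberately not the answers the lab asks for) and checks it against that closed form.

```r
r.demo      <- 0.9   # hypothetical fraction retained per second (placeholder, not the lab's r)
target.demo <- 7     # hypothetical re-injection interval in seconds (placeholder)

time.demo  <- 0:60
injections <- seq(0, max(time.demo), by = target.demo)   # 0, 7, 14, ..., 56

# Same recursion as the lab: reset to 1 at an injection, otherwise decay by r
p.demo <- matrix(1, length(time.demo), 1)
for (j in 2:length(time.demo)) {
  p.demo[j] <- if (time.demo[j] %in% injections) 1 else r.demo * p.demo[j - 1]
}

# Closed form implied by the piecewise rule: p_t = r^(t mod target)
closed.form <- r.demo^(time.demo %% target.demo)

stopifnot(isTRUE(all.equal(as.vector(p.demo), closed.form)))
```

The closed form makes the sawtooth shape of Plot 9.3 easy to predict: p falls geometrically from 1 and jumps back to 1 at each multiple of target.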