Lab8
James Moore
March 5th, 2012

1 Generating Exit Times

In this lab we will use R to visualize some probability data. The example I am using is similar to Section 6.6 of your book. Throughout this entire assignment we will consider the very basic process of molecules diffusing out of (i.e. leaving) a cell.

First we generate the leaving times. We’ll sample from an exponential distribution with rate .13 (we’ll see what that means later on). The exponential distribution describes a continuous random variable.

    ltimes=rexp(n=100000,rate=.13)

The following commands give some information about what we just generated.

    > head(ltimes)
    [1] 16.3208468  0.6565472  7.5878192  1.5364116 15.8764461  9.4162347
    > summary(ltimes)
        Min.  1st Qu.   Median     Mean  3rd Qu.     Max.
     0.00002  2.21700  5.32000  7.68300 10.67000 81.97000

The simplest way to visualize this data is with a histogram. This is fantastically easy in R.

    hist(ltimes,xlab="Leaving Time (Minutes)",main="Leaving Time",freq=F)

Let’s compare this with what the distribution ‘should’ look like. That shape is given by the probability density function. In R you can get the probability density function of any distribution by replacing the r (for random) with a d (for density). In this case, rexp becomes dexp.

    tlist<-seq(0,80,.1)
    lines(tlist,dexp(tlist,rate=.13),col='red')

Save this plot (#1).

The histogram shows the distribution of our 100000 random samples. The red curve shows the expected distribution if we took infinitely many samples. Right now the curve seems to lie right on top of the histogram. However, we can increase the number of bins to ‘expose’ the randomness of our samples.

    hist(ltimes,xlab="Leaving Time (Minutes)",main="Leaving Time",freq=F,breaks=500)
    lines(tlist,dexp(tlist,rate=.13),col='red')

Save this plot (#2).

This histogram should show some small deviations from the red curve. When we use wider bars, these variations tend to cancel out, because each bar pools more samples.

Another way to look at the data is with a cumulative distribution function.
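Before building the cumulative distribution function, you can check the claim about bar width with numbers. The sketch below is an illustration, not part of the assignment. `max_deviation` is a hypothetical helper, and the seed is an arbitrary choice. For each bin, the helper compares the histogram's density with the exact expected density, computed from `pexp`. That is R's exponential CDF, named with a p for probability in the same pattern as r and d. Because the expected value is exact, only sampling noise remains in the difference. The block regenerates `ltimes` so it can run on its own. Pasting it into your session will therefore replace your existing samples.

```r
# Sketch: how far does each histogram stray from the exponential model?
# set.seed is an arbitrary choice, only to make the run reproducible.
set.seed(1)
ltimes <- rexp(n = 100000, rate = .13)

# Hypothetical helper: largest gap between observed and expected bin density.
max_deviation <- function(breaks) {
  h <- hist(ltimes, breaks = breaks, plot = FALSE)  # bin the data, no plot
  left <- head(h$breaks, -1)   # left edge of every bin
  right <- tail(h$breaks, -1)  # right edge of every bin
  # Exact probability of landing in each bin, divided by the bin width
  expected <- (pexp(right, rate = .13) - pexp(left, rate = .13)) / (right - left)
  max(abs(h$density - expected))
}

print(max_deviation(20))   # wide bars
print(max_deviation(500))  # narrow bars
```

The value for 500 requested bins should come out noticeably larger than the one for 20, which matches the difference between plots #1 and #2.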
To build the cumulative distribution function from our samples, we’ll plot the fraction of molecules that have already left the cell as a function of time. One minus this is the fraction still inside. The code to do this is below.

    # Fraction of the simulated molecules that have left by time t
    accumulation<-function(t){
      N=length(ltimes)
      length(ltimes[ltimes<t])/N
    }
    tlist2=seq(0,80,length.out=20)
    approxcdf=seq(1,20)  # placeholder vector, overwritten in the loop
    for(i in seq(1,20)){approxcdf[i]<-accumulation(tlist2[i])}
    plot(tlist2,approxcdf,type='h')

We can check this against the expected curve by integrating. The probability density function (dexp) is just $f(t) = 0.13e^{-0.13t}$. Integrate this from 0 to $T$, then add the theoretical curve to the plot created by the supplied code. Save this plot (#3).

Using your calculated distribution function, estimate the probability that a molecule has left by 1 minute. Do the same calculation for 4 minutes and 8 minutes.

2 Looking at individual cells

The example of molecules leaving cells is just an illustration of the general process. However, you might actually be interested in it if you are studying how cells export molecules. Suppose that you have a population of 1000 cells and you know that each cell holds ten molecules (yes, it’s not a perfect example). By attaching a fluorescent bead to the molecules, you can see how many are in each cell at any point. We run this simulation to learn what to expect if the molecules leave by simple diffusion.

We have simulated the leaving times of 100000 molecules. Now we wish to conduct the following experiment.

1. Choose 10 random molecules.
2. See how many have not left at t = 1, 4 and 8.
3. Repeat 1000 times.
4. Plot the results in a histogram.

The code below should get you started.
    par(mfrow=c(1,3)) # This line lets you put three plots side by side
    greaterthanone<-seq(1,1000)  # placeholder vector, overwritten in the loop
    for(i in seq(1,1000)){
      miniltimes=sample(ltimes,size=10)      # pick 10 random molecules
      greaterthanone[i]<-sum(miniltimes>1)   # how many are still inside at t=1
    }
    hist(greaterthanone,breaks=seq(-.5,10.5),main="Molecules in Cell at t=1",freq=F)

Create a histogram for each of the three times and put all three panels in the same figure (#4).

3 Comparing with Binomial Distribution

The leaving time of a molecule is a continuous random variable. If we count the number of molecules remaining at a particular time, we get a discrete random variable. Using your cumulative distribution function, you should have worked out the probability that a given molecule has left by various times. Suppose, for example, you found that 30 percent have left by 1 minute. That implies that 70 percent remain. If we assume that there are ten molecules in a cell and each has a 70 percent chance of remaining after a minute, then there should typically be around 7 molecules left. However, it could be 6 or 8, or even 5 or 9. Because each molecule leaves independently of the others, this count follows a binomial distribution. Its size is 10, and its success probability is the chance that a single molecule remains.

Now we can regenerate the previous graph and add the expected values, like so:

    greaterthanone<-seq(1,1000)
    for(i in seq(1,1000)){
      miniltimes=sample(ltimes,size=10)
      greaterthanone[i]<-sum(miniltimes>1)
    }
    hist(greaterthanone,breaks=seq(-.5,10.5),main="Molecules in Cell at t=1",freq=F)
    points(seq(0,10),dbinom(seq(0,10),size=10,prob=.7),col='red',pch=15) # added line

Note the line added at the end. You’ll have to change the value of prob for each time. Recreate plot #4 with the expected values added to each panel (#5).
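To check the binomial claim with numbers rather than by eye, you can compare the mean and variance of the simulated counts with the binomial values $10p$ and $10p(1-p)$. The sketch below makes two assumptions you should know about. First, `compare_with_binomial` is a hypothetical helper. Second, it estimates $p$ as the fraction of simulated leaving times greater than $t$, instead of using your integrated formula. It uses `replicate` in place of the explicit `for` loop, and the two do the same job. As in the earlier sketch, the block regenerates `ltimes` with an arbitrary seed so it runs on its own.

```r
# Sketch: compare simulated counts with the binomial mean and variance.
# set.seed is an arbitrary choice, only to make the run reproducible.
set.seed(1)
ltimes <- rexp(n = 100000, rate = .13)

# Hypothetical helper, one row of results per time t
compare_with_binomial <- function(t, ncells = 1000, nmol = 10) {
  # Molecules still inside each of `ncells` cells holding `nmol` molecules
  counts <- replicate(ncells, sum(sample(ltimes, size = nmol) > t))
  # Empirical chance that a single molecule is still inside at time t
  p <- mean(ltimes > t)
  c(t = t, p = p,
    sim_mean = mean(counts), binom_mean = nmol * p,
    sim_var = var(counts), binom_var = nmol * p * (1 - p))
}

results <- t(sapply(c(1, 4, 8), compare_with_binomial))
print(round(results, 3))
```

In each printed row, sim_mean should land close to binom_mean and sim_var close to binom_var. Small gaps are expected because the experiment uses only 1000 cells.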