Calculus for Biologists Lab
Math 1180-002
Spring 2012
Lab #6 - Discrete Random Variables
Report due date: Tuesday, February 28, 2012 at 9 a.m.

Goal: To explore properties of discrete random variables using histograms and expected values. You will interpret information contained in histograms and also use R to compute expected values of simulated data.

Create a new script, either in R (laptop) or with a text editor (Linux computers).

Purple plants revisited

Last week, you simulated the purple-flowered plants example from lecture and calculated various probabilities. This week, you will look at the same example and compute the expectation of the number of purple offspring from heterozygous parents. To start, you’ll need to simulate another set of three offspring for each of 500 pairs of heterozygotes.

    pairs = 500
    children = 3
    total.offspring = children*pairs
    purple = 1
    white = 0
    offspring = matrix(sample(c(purple,white),total.offspring,replace=TRUE,prob=c(0.75,0.25)),
                       children,pairs)

offspring is a table with three rows and 500 columns, as before. We will use colSums once more to calculate the total number of purple offspring borne to each set of parents.

    P.count = colSums(offspring)

Finally, we complete the setup by calculating the fraction of all pairs that gave rise to zero, one, two, or three purple plants. Recall that table summarizes the frequency of 0s, 1s, 2s, and 3s that appear in P.count, whereas as.data.frame saves that summary in a usable table format. The “usable” data frame has a column labeled Freq that gives these counts explicitly, which we access with $. We then divide the result by the total number of parental sets to obtain the desired fractions and save them to est.prob.P.count:

    est.prob.P.count = as.data.frame(table(P.count))$Freq/pairs

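To see what table and as.data.frame are doing in the line above, it may help to run them on a tiny made-up vector first. The sketch below is only an illustration: toy.count and the other toy.* names are hypothetical and not part of the lab script. It also shows one caveat worth knowing: table only reports values that actually occur, so an outcome that never appears is dropped unless the full set of possible outcomes is supplied through factor.

```r
## Toy data (made up): number of purple offspring for 8 pairs; no pair has 0
toy.count = c(3, 2, 2, 3, 1, 3, 2, 3)
toy.pairs = length(toy.count)

## table() counts how often each observed value appears
toy.table = table(toy.count)
print(toy.table)

## as.data.frame() turns those counts into a table with a Freq column
toy.frame = as.data.frame(toy.table)
toy.prob = toy.frame$Freq/toy.pairs      ## only 3 entries: for 1, 2, and 3

## Supplying all possible outcomes keeps a 0 count for the missing outcome
toy.prob.all = as.data.frame(table(factor(toy.count, levels=0:3)))$Freq/toy.pairs
print(toy.prob.all)                      ## 4 entries: for 0, 1, 2, and 3
```

With 500 pairs it is very unlikely that any of the four outcomes is missing from P.count, but the same check (is the length what you expect?) is a quick way to confirm it.
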
Checkpoint: P.count is a vector of the simulation results, with one entry per pair; est.prob.P.count gives the estimated probability (via simulation) of each of the four possible outcomes ({0, 1, 2, 3}) for the number of purple offspring produced by each pair.

The actual probabilities of these events occurring can be calculated using the probability theory you learned in lecture (or a couple of Punnett squares). Here’s a summary of them:

    Pr(0 purple offspring | heterozygous parents) = 1/64
    Pr(1 purple offspring | heterozygous parents) = 9/64
    Pr(2 purple offspring | heterozygous parents) = 27/64
    Pr(3 purple offspring | heterozygous parents) = 27/64

Save these probabilities to the vector

    true.prob.P.count = ## ???

Histograms are useful tools for summarizing event probabilities, especially in the case of discrete variables. Here, the discrete variable is the number of purple offspring, since there are exactly four options. R has the extremely convenient hist command that does the work for us. Copy and paste the following to set up a figure of two side-by-side plots, one of which will show a histogram summarizing the information contained in P.count.

    par(mfrow=c(1,2))        ## set up the figure structure
    hist(P.count,            ## make a histogram of P.count
         breaks=0:4-0.5,     ## center side-by-side bars at 0,1,2,3, all with identical widths
         freq=F,             ## give me fractions, not whole numbers
         xlab="Number of purple offspring",ylab="Simulated probability",col="purple3")

You should see a histogram on the left side of your plot window. To compare this with the probabilities we know to be true, we will set up a fake population containing the exact probabilities indicated in true.prob.P.count. The following command will create a vector of 0s, 1s, 2s, and 3s, the number of which is determined by their respective fractions multiplied by 64 (so you end up with whole numbers). Essentially, we have falsified these data, solely to create a histogram that illustrates the true probabilities.

    true.P.count = rep(0:3,true.prob.P.count*64)
    print(true.P.count)

Now, plot the other histogram, which should appear in a different shade of purple from the previous one.

    hist(true.P.count,breaks=0:4-0.5,freq=F,
         xlab="Number of purple offspring",ylab="Actual probability",col="orchid3")

Plot 6.1: Save this figure to include in your assignment.

Expectations

The expected value of a discrete variable tells you the value that your data achieves on average. If every outcome is equally likely, this is simply the arithmetic mean of the possible values. In the current exploration, however, there are different probabilities for the number of purple offspring generated by a given set of parents. We can account for these with the definition of the expected value: Let P be the random variable describing the number of purple offspring that a pair of heterozygotes produces. Since P can only equal i = 0, 1, 2, 3, the expectation of P is

$$E(P) = \sum_{i=0}^{3} \Pr(P = i) \cdot i = \Pr(P = 0) \cdot 0 + \Pr(P = 1) \cdot 1 + \Pr(P = 2) \cdot 2 + \Pr(P = 3) \cdot 3 \qquad (1)$$

Take the time to calculate (by hand) the true expectation based on the “true” probabilities defined earlier. Record your answer to use in your assignment.

The analogous process for estimating the expectation based on your R simulation is as follows:

    est.expected.P.count = sum(est.prob.P.count*(0:3))

This line of code takes each of the fractions in est.prob.P.count, multiplies it by its corresponding number in {0, 1, 2, 3}, and then sums these products together.

    print(est.expected.P.count)

Record the resulting number to use in your assignment.

The roll of the dice

Switching gears a little bit, now suppose you roll two fair dice simultaneously 100 different times. Each roll gives a sum, based on the numbers that appear on each die. You will find the probabilities of each possible sum as well as the expected sum.

As always, we need to simulate something. Here, we’ll simulate 100 dice rolls, where the probabilities of obtaining the numbers 1 through 6 are all equal (fair dice).

    fair = 1/6
    N.dice = 2
    N.rolls = 100
    dice = matrix(sample(1:6,                ## possible die results
                         N.dice*N.rolls,     ## total number of rolls (100 per die)
                         replace=TRUE,
                         prob=rep(fair,6)),  ## all 6 probabilities are equal to 'fair'
                  N.rolls, N.dice)           ## dimensions of the table to which we save results

Now we need some way of obtaining the sum for each individual roll of the dice. With rowSums, we can add the columns of dice together within each row.

    roll.sums = rowSums(dice)

Use a histogram to view the frequency of each sum.

    hist(roll.sums,freq=F,breaks=2:13-0.5,
         xlab="Sum of two dice",ylab="Fraction",ylim=c(0,0.25),col="deepskyblue")

Plot 6.2: Save this figure to include in your assignment.

View a print-out of these probabilities:

    ## levels=2:12 keeps a 0 for any sum that never occurs in the 100 rolls,
    ## so the result always has the 11 entries that names() expects
    est.prob.sums = as.data.frame(table(factor(roll.sums,levels=2:12)))$Freq/N.rolls
    names(est.prob.sums) = 2:12
    print(est.prob.sums)

Record these results to use in your assignment.

Save your script so that you can use it for your assignment.
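
The introduction to this section mentions the expected sum, which can be estimated from the simulation in the same way as est.expected.P.count. The following is a minimal sketch, not part of the original lab: it reruns a small dice simulation so that it is self-contained, and the names sketch.dice, sketch.sums, sketch.prob, and sketch.expected are hypothetical. It passes factor(..., levels=2:12) to table on the assumption that you want a 0 recorded for any sum that never occurs.

```r
set.seed(1180)                           ## arbitrary seed so the sketch is repeatable
N.dice = 2
N.rolls = 100

## Simulate the rolls: one row per roll, one column per die
sketch.dice = matrix(sample(1:6, N.dice*N.rolls, replace=TRUE), N.rolls, N.dice)
sketch.sums = rowSums(sketch.dice)

## Estimated probability of each sum from 2 to 12 (0 if a sum never appears)
sketch.prob = as.data.frame(table(factor(sketch.sums, levels=2:12)))$Freq/N.rolls

## Weight each possible sum by its estimated probability and add up
sketch.expected = sum(sketch.prob*(2:12))
print(sketch.expected)
print(mean(sketch.sums))                 ## the arithmetic mean of the simulated sums
```

The two printed numbers agree (up to rounding) because weighting each sum by its observed fraction is the same calculation as averaging the simulated sums.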