Calculus for Biologists Lab Math 1180-002 Spring 2012

Lab #6 - Discrete Random Variables
Report due date: Tuesday, February 28, 2012 at 9 a.m.
Goal: To explore properties of discrete random variables using histograms and expected values. You will interpret
information contained in histograms and also use R to compute expected values of simulated data.
• Create a new script, either in R (laptop) or with a text editor (Linux computers).
Purple plants revisited
Last week, you simulated the purple-flowered plants example from lecture and calculated various probabilities.
This week, you will look at the same example and compute the expectation of the number of purple offspring
from heterozygous parents.
To start, you’ll need to simulate another set of three offspring for each of 500 pairs of heterozygotes.
pairs = 500
children = 3
total.offspring = children*pairs
purple = 1
white = 0
offspring = matrix(sample(c(purple,white),total.offspring,replace=TRUE,prob=c(0.75,0.25)),
children,pairs)
offspring is a table with three rows and 500 columns, as before. We will use colSums once more to calculate
the total number of purple offspring borne to each set of parents.
P.count = colSums(offspring)
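To see what colSums does on a table small enough to check by eye, here is a minimal sketch. The matrix toy is a made-up example (not part of the lab) with three offspring for each of two pairs of parents.

```r
## hypothetical toy data: 3 rows (children) by 2 columns (pairs of parents)
## 1 = purple, 0 = white; matrix() fills column by column
toy = matrix(c(1,0,1,
               1,1,1),
             3, 2)
print(toy)
toy.count = colSums(toy)   ## adds the entries within each column
print(toy.count)           ## one total per pair of parents
```

The first pair has two purple offspring and the second has three, so toy.count holds one purple count per column, just as P.count does for the 500 simulated pairs.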
Next, we complete the setup by calculating the fraction of all pairs that gave rise to zero, one, two, or three
purple plants. Recall that table summarizes the frequency of 0s, 1s, 2s, and 3s that appear in P.count, whereas
as.data.frame saves that summary in a usable table format. The “usable” data frame has a column labeled Freq
that gives these counts explicitly, which we access with $. Finally, we divide the result by the total number of
parental sets to obtain the desired fractions and save them to est.prob.P.count:
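## factor(...,levels=0:3) keeps all four outcomes, even one that never occurs in the simulation
est.prob.P.count = as.data.frame(table(factor(P.count,levels=0:3)))$Freq/pairs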
• Checkpoint: P.count is a vector of simulation results with one entry per pair of parents; est.prob.P.count
gives the estimated probability (via simulation) of each of the four possible outcomes ({0, 1, 2, 3}) for the
number of purple offspring produced by a pair.
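The checkpoint can be verified with a few quick checks. The sketch below repeats the simulation so that it runs on its own. The set.seed call and the factor(..., levels=0:3) wrapper are additions made here, the first for reproducibility and the second so that an outcome that never occurs still gets a fraction of 0.

```r
set.seed(1180)   ## assumption: fixed seed so the run is reproducible (not in the lab)
pairs = 500
children = 3
offspring = matrix(sample(c(1,0),children*pairs,replace=TRUE,prob=c(0.75,0.25)),
                   children,pairs)
P.count = colSums(offspring)
est.prob.P.count = as.data.frame(table(factor(P.count,levels=0:3)))$Freq/pairs

print(length(P.count))            ## one entry per pair of parents
print(length(est.prob.P.count))   ## one entry per outcome 0, 1, 2, 3
print(sum(est.prob.P.count))      ## the fractions should add up to 1
```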
The actual probabilities of these events occurring can be calculated using the probability theory you learned in
lecture (or a couple of Punnett squares). Here’s a summary of them:
Pr(0 purple offspring | heterozygous parents) = 1/64
Pr(1 purple offspring | heterozygous parents) = 9/64
Pr(2 purple offspring | heterozygous parents) = 27/64
Pr(3 purple offspring | heterozygous parents) = 27/64
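One way to arrive at these four values, sketched here under the assumption that each of the three offspring is purple independently with probability 3/4 (the same probability used in the simulation), is the binomial formula. Writing P for the number of purple offspring from one pair:

```latex
\Pr(P = k) = \binom{3}{k} \left(\frac{3}{4}\right)^{k} \left(\frac{1}{4}\right)^{3-k},
\qquad k = 0, 1, 2, 3

% worked example for k = 2: three ways to choose which offspring is white
\Pr(P = 2) = 3 \cdot \frac{9}{16} \cdot \frac{1}{4} = \frac{27}{64}
```

The four probabilities add up to 64/64 = 1, as they must.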
Save these probabilities to the vector
true.prob.P.count = ## ???
Histograms are useful tools for summarizing event probabilities, especially in the case of discrete variables. Here,
the number of purple offspring is a discrete variable because it can take only four possible values (0, 1, 2, or 3).
R has the extremely convenient hist command that does the work for us.
Copy and paste the following to set up a figure of two side-by-side plots, one of which will show a histogram
summarizing the information contained in P.count.
par(mfrow=c(1,2))       ## set up the figure structure
hist(P.count,           ## make a histogram of P.count
     breaks=0:4-0.5,    ## center side-by-side bars at 0,1,2,3, all with identical widths
     freq=F,            ## give me fractions, not whole numbers
     xlab="Number of purple offspring",ylab="Simulated probability",col="purple3")
You should see a histogram on the left side of your plot window. To compare this with the probabilities we know
to be true, we will set up a fake population containing the exact probabilities stored in true.prob.P.count. The
following command will create a list of 0s, 1s, 2s, and 3s, the number of which is determined by their respective
fractions multiplied by 64 (so you end up with whole numbers). Essentially, we have fabricated these data, solely
to create a histogram that illustrates the true probabilities.
true.P.count = rep(0:3,true.prob.P.count*64)
print(true.P.count)
Now, plot the other histogram, which should appear in a different shade of purple from the previous one.
hist(true.P.count,breaks=0:4-0.5,freq=F,
xlab="Number of purple offspring",ylab="Actual probability",col="orchid3")
Plot 6.1: Save this figure to include in your assignment.
Expectations
The expected value of a discrete variable tells you the value that your data take on average. If every possible
value is equally likely, this is simply the arithmetic mean of those values. In the current exploration, however,
the possible numbers of purple offspring from a given set of parents have different probabilities. We can account
for these with the definition of the expected value:
Let P be the random variable describing the number of purple offspring that a pair of heterozygotes produces.
Since P can only equal i = 0, 1, 2, 3, the expectation of P is
\[
E(P) = \sum_{i=0}^{3} \Pr(P = i) \cdot i = \Pr(P = 0) \cdot 0 + \Pr(P = 1) \cdot 1 + \Pr(P = 2) \cdot 2 + \Pr(P = 3) \cdot 3 \tag{1}
\]
Take the time to calculate (by hand) the true expectation based on the “true” probabilities defined earlier. Record
your answer to use in your assignment.
The analogous process for estimating the expectation based on your R simulation is as follows:
est.expected.P.count = sum(est.prob.P.count*(0:3))
This line of code took each of the fractions in est.prob.P.count, multiplied them by their corresponding number
in {0, 1, 2, 3}, and then summed these products together.
print(est.expected.P.count)
Record the resulting number to use in your assignment.
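Because est.prob.P.count holds the fractions observed in the simulation itself, the weighted sum above should agree with the plain arithmetic mean of P.count. The sketch below re-runs the simulation to check this. The seed and the factor(..., levels=0:3) wrapper are assumptions added here and are not part of the lab.

```r
set.seed(2012)   ## assumption: fixed seed for reproducibility
pairs = 500
children = 3
offspring = matrix(sample(c(1,0),children*pairs,replace=TRUE,prob=c(0.75,0.25)),
                   children,pairs)
P.count = colSums(offspring)
est.prob.P.count = as.data.frame(table(factor(P.count,levels=0:3)))$Freq/pairs

est.expected.P.count = sum(est.prob.P.count*(0:3))   ## weighted sum, as in equation (1)
print(est.expected.P.count)
print(mean(P.count))                                 ## same value, up to rounding error
```

The two numbers match because each value i appears in P.count exactly Freq times, so summing i times Freq/pairs is the same as adding up all entries of P.count and dividing by pairs.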
The roll of the dice
Switching gears a little bit, now suppose you roll two fair dice simultaneously 100 different times. Each roll gives
a sum, based on the numbers that appear on each die. You will find the probabilities of each possible sum as well
as the expected sum.
As always, we need to simulate something. Here, we’ll simulate 100 rolls of two dice, where the probabilities of
obtaining the numbers 1 through 6 are all equal (fair dice).
fair = 1/6
N.dice = 2
N.rolls = 100
dice = matrix(sample(1:6,                 ## possible die results
                     N.dice*N.rolls,      ## total number of rolls (100 per die)
                     replace=TRUE,
                     prob=rep(fair,6)),   ## all 6 probabilities are equal to 'fair'
              N.rolls, N.dice)            ## dimensions of the table to which we save results
Now we need some way of obtaining the sum for each individual roll of the dice. With rowSums, we can add the
columns of dice together within each row.
roll.sums = rowSums(dice)
Use a histogram to view the frequency of each sum.
hist(roll.sums,freq=F,breaks=2:13-0.5,
xlab="Sum of two dice",ylab="Fraction",ylim=c(0,0.25),col="deepskyblue")
Plot 6.2: Save this figure to include in your assignment.
View a print-out of these probabilities:
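## factor(...,levels=2:12) keeps all 11 sums, even one that never shows up in 100 rolls
est.prob.sums = as.data.frame(table(factor(roll.sums,levels=2:12)))$Freq/N.rolls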
names(est.prob.sums) = 2:12
print(est.prob.sums)
Record these results to use in your assignment.
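For comparison with the simulated fractions, the exact probabilities can be obtained by listing all 36 equally likely outcomes of two fair dice. This is a sketch only: the names faces, all.sums, true.prob.sums, and true.expected.sum are introduced here for illustration and do not appear elsewhere in the lab.

```r
faces = 1:6
all.sums = outer(faces,faces,"+")                ## 6 x 6 table of every possible sum
true.prob.sums = as.vector(table(all.sums))/36   ## count each sum, divide by 36 outcomes
names(true.prob.sums) = 2:12
print(true.prob.sums)

## expected sum, using the same weighted-sum definition as equation (1)
true.expected.sum = sum(true.prob.sums*(2:12))
print(true.expected.sum)
```

With only 100 rolls, the simulated fractions in est.prob.sums will usually differ noticeably from these exact values. Increasing N.rolls should bring them closer.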
• Save your script so that you can use it for your assignment.