Calculus for Biologists Lab Math 1180-002 Spring 2012

Lab #5 - Simulating Populations and Computing Probabilities

Report due date: Tuesday, February 21, 2012 at 9 a.m.

Goal: To simulate random populations with varying characteristics and use these to determine different probabilities. In this lab, you will use the plant example from lecture to compare the mathematical results to your simulations. In the corresponding assignment, you will extend these methods to the rare disease example.

Create a new script, either in R (laptop) or with a text editor (Linux computers).

Note: For each number you record in this lab, be sure to round to at least 4 decimal places.

Purple- versus white-flowered plants

Recall the plant example from lecture. Dominant allele A produces a purple-flowered phenotype. Recessive allele a produces white flowers only when paired with another a. Consider a population of 500 pairs of parent plants with the heterozygous genotype. This means that each individual parent is purple, with one copy of A and one copy of a. Assume, as before, that each set of parents has three offspring.

Define the following variables in R:

pairs = ## total number of parent pairs
children = ## number of offspring per pair

The simulation

You will simulate a population of offspring that depends on the mathematical probabilities calculated in lecture.

To do this, you need to determine how many total offspring will be obtained from all sets of parents. Save this number as

total.offspring = ##

To distinguish between purple and white offspring, we will assign a value of 1 to any purple ones, and a value of 0 to any white ones. Save these now:

purple = ## assigned value
white = ## assigned value

We will use sample to choose values of only 0s and 1s to assign randomly to a total of total.offspring individuals. These will all be stored in a table with children number of rows and pairs number of columns. This way, each column will show which of the three offspring are purple and which ones are white. The following code tells R to store a whole list of randomly chosen numbers in a children-by-pairs grid.

Insert the appropriate probabilities in the code before executing it.

offspring = matrix(              ## how we're storing things
  sample(c(purple,white),        ## what we're randomly choosing from
         total.offspring,        ## how many we need total
         replace=TRUE,           ## keep choosing from just 2 values
         prob=c(##,##)),         ## specify probabilities of choosing one or the other
  children,pairs)                ## number of rows,columns we want in the end
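To see how sample and matrix fit together before running the full simulation, here is a small sketch on an unrelated toy case. The names heads, tails, flips and grid, the probabilities 0.6 and 0.4, and the fixed seed are all assumptions made up for illustration; they are not part of the lab.

```r
set.seed(1)                          # assumption: fixed seed so the run is repeatable
heads = 1                            # hypothetical coding of the two outcomes
tails = 0
flips = sample(c(heads, tails),      # values to choose from
               12,                   # how many draws in total
               replace = TRUE,       # the same value may be drawn again
               prob = c(0.6, 0.4))   # hypothetical probabilities, in the same order as the values
grid = matrix(flips, 4, 3)           # 4 rows, 3 columns, filled one column at a time
grid[, 2]                            # view the second column (draws 5 through 8)
```

Because matrix fills column by column, each column holds a consecutive block of draws, and grid[, k] picks out column k.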

As a test, view the third column of offspring .

What numbers did you get (specify the order), and what do they mean?

Let’s count the number of offspring that are purple for each set of parents. To do this, we’ll make use of the colSums command, which adds the numbers in each column of the table, and save the result to P.count.

P.count = colSums(offspring)
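As a quick check of what colSums does, the following sketch applies it to a tiny hand-made table of 0s and 1s. The name toy and its entries are hypothetical and chosen only for illustration.

```r
toy = matrix(c(1, 0,      # column 1
               0, 0,      # column 2
               1, 1),     # column 3
             2, 3)        # 2 rows, 3 columns, filled one column at a time
toy
colSums(toy)              # one total per column: 1 0 2
```

Since every entry is 0 or 1, each column total is simply the number of 1s in that column.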

What would you expect the smallest number in P.count to be? The largest? Check your answers by viewing P.count in R. Were your suspicions confirmed?

Determine how you would define W.count given the information you have, and do it.

W.count = ## ???

Plot 5.1: Save the following plot under a file name you’ll remember, so that you can include it in your assignment.

plot(P.count,xlab="Parent pair index",cex=0.5)

Think about the information provided in this plot and how you might interpret it. You’ll need to do this in your assignment.

Calculating probabilities

To view the simulated probabilities of the number of purple offspring, we can use the table command.

table(P.count)

You should see a table that gives the number of parent pairs (bottom row) that yield each number of purple offspring (top row). Unfortunately, we can’t do anything with these numbers as they are, so we will combine the command with as.data.frame and save the result to summary. Note the difference.

summary = as.data.frame(table(P.count))
summary

Notice that there are two columns instead of rows, and that these are now labeled with the headings “P.count” and “Freq”.

Freq gives the number of parent pairs in each category, that is, how many pairs had each number of purple offspring. We need to compute the fraction of all parental pairs that had each number of purple offspring, which is an estimate of the true probability.

For this, we need to access just the second column of summary and then divide by the total number of parent pairs. In an R data frame, you access a named column with $ followed by the name of the column:

sim.P = (summary$Freq)/pairs
sim.P

Record the results of sim.P to use in your assignment.
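The table, as.data.frame and $ steps above can be tried out on a small made-up data set first. In this sketch the vector counts and the names tab and fractions are hypothetical; only the commands themselves come from the lab.

```r
counts = c(0, 1, 1, 2, 2, 2, 3)         # hypothetical data: 7 observations
tab = as.data.frame(table(counts))      # two columns: counts (the category) and Freq (how many)
tab
fractions = tab$Freq / length(counts)   # share of the observations in each category
fractions
sum(fractions)                          # the shares over all categories add up to 1
```

Dividing Freq by the number of observations turns raw counts into fractions, which is why the results always add up to 1.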

Explosive seeds

Suppose, once more, that 1% of white-flowered plants have explosive seeds, while 5% of purple-flowered plants have this defect. Simulate this situation with your existing set of offspring.

First, save the total number of purple offspring over all sets of parents with the sum command, which adds up every entry of a list of numbers, just as colSums adds up each column of a table. (Aside: What percentage of all offspring should this be? Verify this.)

all.P = sum(P.count)

Determine how all.W should be calculated and have R do the work for you.

all.W = ## ???

Now have R compute the fraction of all offspring that are purple and the fraction that are white, using the appropriate variables defined earlier in the lab.

frac.P = ## ???

frac.W = ## ???

Identify which probabilities these estimate. You’ll need them later in the lab.

Of the number of purple offspring, simulate which ones have explosive seeds according to the percentage stated above. Use an appropriately modified sample command to do this. Save this population to

explosive.P = ## ???

Do the same for the remaining population of white offspring.

explosive.W = ## ???

Make use of the sum command to define the fractions of purple and white offspring that are defective. Again, use previously defined variables to have R do the work for you.

prob.explosive.P = ## ???

prob.explosive.W = ## ???
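The idea of turning a list of 0s and 1s into a fraction with sum can be sketched on a generic example. Everything here is an assumption for illustration: the group size n, the rate of 0.2, the names has.trait and frac, and the fixed seed are not values from the lab.

```r
set.seed(2)                                # assumption: fixed seed so the run is repeatable
n = 1000                                   # hypothetical group size
rate = 0.2                                 # hypothetical chance that an individual has the trait
has.trait = sample(c(1, 0), n,             # 1 = has the trait, 0 = does not
                   replace = TRUE,
                   prob = c(rate, 1 - rate))
frac = sum(has.trait) / n                  # number of 1s divided by the group size
round(frac, 4)                             # simulated fraction, rounded to 4 decimal places
```

The simulated fraction will usually be close to, but not exactly equal to, the rate used to generate it.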

Based only on the simulation probabilities/fractions you’ve computed thus far, use the law of total probability to determine the probability that any randomly chosen offspring has explosive seeds. Recall that this law says that for mutually exclusive and collectively exhaustive events $E_1, E_2, \ldots, E_n$ and some event $A$,

$\Pr(A) = \sum_{i=1}^{n} \Pr(A \mid E_i)\,\Pr(E_i).$

tot.prob.explosive = ## ???
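For a feel of how the law of total probability translates into R, here is a minimal sketch with two events and invented numbers. The names p.E, p.A.given.E and p.A and all four probabilities are hypothetical and unrelated to the plant data.

```r
p.E = c(0.3, 0.7)               # hypothetical Pr(E1), Pr(E2); they must add up to 1
p.A.given.E = c(0.1, 0.2)       # hypothetical Pr(A | E1), Pr(A | E2)
p.A = sum(p.A.given.E * p.E)    # 0.1*0.3 + 0.2*0.7
p.A                             # 0.17
```

Multiplying the two vectors pairs each conditional probability with the probability of its own event, and sum then adds the products, exactly as in the formula.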

Recall Bayes’ Theorem:

For any events $A$ and $B$, $\Pr(A \mid B) = \dfrac{\Pr(B \mid A)\,\Pr(A)}{\Pr(B)}$.

Use this and your simulation results to estimate the probability of being a white-flowered plant given the existence of explosive seeds ($\Pr(W \mid D)$). Do the same for the probability of being purple conditional on being defective ($\Pr(P \mid D)$). Record your results to use with your assignment.
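Bayes’ Theorem can be checked in R on a small invented case before applying it to the simulation. The names and numbers below (p.E, p.A.given.E, p.A, p.E.given.A) are hypothetical and are not the lab’s values.

```r
p.E = c(0.3, 0.7)                        # hypothetical Pr(E1), Pr(E2)
p.A.given.E = c(0.1, 0.2)                # hypothetical Pr(A | E1), Pr(A | E2)
p.A = sum(p.A.given.E * p.E)             # law of total probability: 0.17
p.E.given.A = p.A.given.E * p.E / p.A    # Bayes' Theorem for E1 and E2 at once
round(p.E.given.A, 4)                    # 0.1765 0.8235
sum(p.E.given.A)                         # the two conditional probabilities add up to 1
```

Because E1 and E2 are mutually exclusive and collectively exhaustive, the two conditional probabilities must add up to 1, which makes a handy check on the arithmetic.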

Did you round each of your recordings to at least 4 decimal places?

Save your script so that you can use it for your assignment.
