Calculus for Biologists Lab Math 1180-002 Spring 2012

Lab #8 - Pathogen exposure and illness
Report due date: Tuesday, March 20, 2012 at 9 a.m.
Goal: To explore the interaction between pathogen exposure and sickness in different populations. You will
use the joint probability distribution to determine the marginal and conditional probability distributions of these
exposure and sickness random variables. You will also calculate the covariance and correlation between the two.
• Create a new script, either in R (laptop) or with a text editor (Linux computers).
The story
Suppose we are studying how varying levels of pathogen exposure affect the presence of sickness in an immune-compromised population and in a population of elementary school teachers. Let L be the random variable
describing the level of pathogen exposure. L can take on the following values:
• L = 0: no exposure
• L = 1: low exposure
• L = 2: moderate exposure
• L = 3: high exposure
Take a little time to think about the probability that each population is subject to each L value.
Let S be the random variable describing whether or not a given individual is “sick”. You may define sickness as you
like: symptomatic presentation, antibody presence, infectiousness, etc. S can take on the following values:
• S = 0: not sick
• S = 1: sick
Simulate the L and S values for 100 individuals in both populations by copying and pasting the code in the script
pathogen.R from the lab website.
You should now have a number of variables at your disposal. Here is a list of them and what they represent.
• imm.data – table of simulated exposure (column 1) and sick status (column 2) values for 100 immune-compromised people
• teach.data – table of simulated exposure (column 1) and sick status (column 2) values for 100 teachers
• joint.imm – joint probability distribution for the random variables among the immune-compromised
• joint.teach – joint probability distribution for the random variables among teachers
Joint and marginal distributions
Enter joint.imm into R. You should see something that looks like this:
        status
exposure    0    1
       0 0.92 0.00
       1 0.02 0.04
       3 0.00 0.02
The columns correspond to the sick status, while the rows correspond to exposure level. Notice that in this
particular example, there is no row for L = 2. This means that there were no immune-compromised individuals
with moderate exposure to pathogens. In probabilistic terms, this implies the marginal probability Pr(L = 2) = 0.
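As a rough illustration of where such a table can come from, the sketch below builds a joint distribution from paired observations. It assumes the table is obtained by counting (exposure, status) pairs and dividing by the number of individuals; whether pathogen.R does exactly this is an assumption. The names toy.data and joint.toy and the ten data values are hypothetical, not the lab's simulated data.

```r
## Hypothetical toy data (not the lab's simulated values):
## exposure level and sick status for 10 individuals.
toy.data <- data.frame(
  exposure = c(0, 0, 0, 0, 1, 1, 1, 3, 3, 0),
  status   = c(0, 0, 0, 0, 0, 1, 1, 1, 1, 0)
)

## Count each (exposure, status) pair, then divide by the number of
## individuals to turn counts into joint probabilities.
joint.toy <- table(exposure = toy.data$exposure,
                   status   = toy.data$status) / nrow(toy.data)
print(joint.toy)

## Exposure level 2 never occurs in the toy data, so the table has no
## row labelled "2", just as in the immune-compromised example.
print(rownames(joint.toy))
```

All entries of a joint distribution add up to 1, which is a quick sanity check: sum(joint.toy) should return 1.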
Using your joint.imm table, define the marginal probability distributions for this population by creating two
vectors of numbers. The first should have 4 numbers, the marginal probabilities for each L (0 to 3), with a 0 for
any exposure level that has no row in the table. The second should have just two numbers, one for each sick status (0 and 1).
marg.L.imm = ## ??
marg.S.imm = ## ??
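One possible way to get marginals from a joint table is to sum across rows and down columns. The sketch below does this for a small made-up table; joint.toy, marg.L.toy, marg.S.toy and marg.L.present are hypothetical names, and the probabilities are invented for illustration rather than taken from the lab data.

```r
## Hypothetical joint distribution (rows: exposure, columns: status).
joint.toy <- matrix(c(0.5, 0.0,
                      0.1, 0.2,
                      0.0, 0.2),
                    nrow = 3, byrow = TRUE,
                    dimnames = list(exposure = c("0", "1", "3"),
                                    status   = c("0", "1")))

## Summing across each row gives Pr(L = l); summing down each column
## gives Pr(S = s).
marg.L.present <- rowSums(joint.toy)
marg.S.toy     <- colSums(joint.toy)

## Exposure level 2 has no row, so its marginal probability is 0.
## Start from a vector of zeros for levels 0 to 3 and fill in the
## levels that do appear.
marg.L.toy <- setNames(numeric(4), 0:3)
marg.L.toy[names(marg.L.present)] <- marg.L.present

print(marg.L.toy)
print(marg.S.toy)
```

Typing the four numbers in by hand works just as well for a table this small; the point of the padding step is only that the vector needs one entry per exposure level, including levels with probability 0.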
Now, view joint.teach in R, and repeat the process to define the marginal distribution for teachers.
marg.L.teach = ## ??
marg.S.teach = ## ??
We can plot histograms of L and S in both populations.
par(mfcol=c(2,2),mar=c(5,5,3,1),oma=c(0,0,2,0),xaxs="i",yaxs="i")
hist(imm.data$exposure, breaks=0:4-0.5, freq=F, ylim=c(0,1.1), labels=TRUE, xaxt="n",
xlab="Pathogen exposure level", ylab="Probability",
main="Marginal probabilities:\nimmune-compromised", col="lightgray")
axis(1,at=L)
hist(imm.data$status, breaks=0:2-0.5, freq=F, ylim=c(0,1.1), labels=TRUE, xaxt="n",
xlab="Sick status",ylab="Probability", main="", col="lightgray")
axis(1,at=S)
hist(teach.data$exposure, breaks=0:4-0.5, freq=F, ylim=c(0,1.1), labels=TRUE, xaxt="n",
xlab="Pathogen exposure level", ylab="Probability",
main="Marginal probabilities:\nelementary school teachers", col="orange")
axis(1,at=L)
hist(teach.data$status,breaks=0:2-0.5, freq=F, ylim=c(0,1.1), labels=TRUE, xaxt="n",
xlab="Sick Status", ylab="Probability", main="", col="orange")
axis(1,at=S)
You should see a total of four histograms. The first column shows the marginal probabilities of the two random
variables for the immune-compromised group, and the second shows the same for teachers. Verify that the
marginal probabilities printed above each bar match those you defined in your vectors above.
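The bar heights can be read as probabilities because freq=F plots densities and every bin has width 1, so each density equals the proportion of observations in that bin. The sketch below checks this numerically on made-up data; toy.exposure, h and empirical are hypothetical names.

```r
## Hypothetical exposure levels for 10 individuals.
toy.exposure <- c(0, 0, 0, 0, 1, 1, 1, 3, 3, 0)

## Same bin edges as in the lab code: -0.5, 0.5, 1.5, 2.5, 3.5.
## plot = FALSE returns the histogram object without drawing it.
h <- hist(toy.exposure, breaks = 0:4 - 0.5, plot = FALSE)

## Proportion of individuals at each level 0 to 3 (level 2 gets 0).
empirical <- as.numeric(table(factor(toy.exposure, levels = 0:3))) /
  length(toy.exposure)

print(h$density)
print(empirical)
print(all.equal(h$density, empirical))
```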
Plot 8.1: Save this plot to include in your assignment.
Conditional probabilities: Pr(L|S)
Execute the following code to generate another set of four histograms, this time representing what the sick status
indicates about the level of exposure.
par(mfcol=c(2,2),xaxs="i",yaxs="i",oma=c(0,0,3,0),mar=c(6,5,2,1))
## immune-compromised
for(i in S){
hist(imm.data$exposure[imm.data$status==i], breaks=0:4-0.5,
freq=F, ylim=c(0,1.1), labels=TRUE, xaxt="n",
main=paste("S = ",i), xlab="", ylab="Probability", col="lightgray")
axis(1,at=L)
}
title(xlab="Pathogen exposure level\n(immune-compromised)")
## teachers
for(i in S){
hist(teach.data$exposure[teach.data$status==i], breaks=0:4-0.5,
freq=F, ylim=c(0,1.1), labels=TRUE, xaxt="n",
main=paste("S = ",i), xlab="", ylab="Probability", col="orange")
axis(1,at=L)
}
title(xlab="Pathogen exposure level\n(teachers)")
mtext(side=3, "Conditional probabilities of exposure level (L) given sick status (S)",
outer=T, font=2)
Plot 8.2: Save this figure to include in your assignment.
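The same conditional probabilities can be computed directly from a joint table, since Pr(L = l | S = s) = Pr(L = l, S = s) / Pr(S = s). A minimal sketch on an invented table follows; joint.toy and cond.L.given.S are hypothetical names.

```r
## Hypothetical joint distribution with all four exposure levels.
joint.toy <- matrix(c(0.5, 0.0,
                      0.1, 0.2,
                      0.0, 0.0,
                      0.0, 0.2),
                    nrow = 4, byrow = TRUE,
                    dimnames = list(exposure = c("0", "1", "2", "3"),
                                    status   = c("0", "1")))

## margin = 2 divides every column by its column sum, i.e. by Pr(S = s),
## so column s holds the distribution of L given S = s.
cond.L.given.S <- prop.table(joint.toy, margin = 2)
print(cond.L.given.S)

## Each conditional distribution sums to 1.
print(colSums(cond.L.given.S))
```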
Conditional probabilities: Pr(S|L)
Execute the following code to generate a final set of eight histograms, which represent what pathogen exposure
reveals about sick status.
par(mfrow=c(2,4),xaxs="i",yaxs="i",oma=c(3,3,3,0),mar=c(3,5,2,1))
for(i in L){
hist(imm.data$status[imm.data$exposure==i],breaks=0:2-0.5,freq=F,ylim=c(0,1.1),labels=TRUE,
xaxt="n", main=paste("L = ",i), xlab="", ylab="Probability", col="lightgray")
axis(1,at=S)
}
mtext(side=2, "Immune-compromised", font=2, adj=1, outer=TRUE)
for(i in L){
hist(teach.data$status[teach.data$exposure==i],breaks=0:2-0.5,freq=F,ylim=c(0,1.1),
labels=TRUE,xaxt="n", main=paste("L = ",i), xlab="", ylab="Probability", col="orange")
axis(1,at=S)
}
mtext(side=3,"Conditional probabilities of sick status (S) given exposure level (L)",
outer=T, font=2)
mtext(side=1, "Sick status",outer=TRUE)
mtext(side=2,"Teachers", font=2, adj=0.25, outer=TRUE)
Plot 8.3: Save this figure to include in your assignment.
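Conditioning on exposure works the same way with rows in place of columns: Pr(S = s | L = l) = Pr(L = l, S = s) / Pr(L = l). The sketch below, on an invented table, also shows what happens for an exposure level with marginal probability 0, where the conditional distribution is undefined. The names joint.toy and cond.S.given.L are hypothetical.

```r
## Hypothetical joint distribution; exposure level 2 has probability 0.
joint.toy <- matrix(c(0.5, 0.0,
                      0.1, 0.2,
                      0.0, 0.0,
                      0.0, 0.2),
                    nrow = 4, byrow = TRUE,
                    dimnames = list(exposure = c("0", "1", "2", "3"),
                                    status   = c("0", "1")))

## margin = 1 divides every row by its row sum, i.e. by Pr(L = l),
## so row l holds the distribution of S given L = l.
cond.S.given.L <- prop.table(joint.toy, margin = 1)
print(cond.S.given.L)

## Row "2" is 0/0, which R reports as NaN: there is nothing to
## condition on when no one has that exposure level.
print(is.nan(cond.S.given.L["2", ]))
```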
Covariance
We can identify whether there is a correlation between sickness and pathogen exposure in our two distinct
populations. A positive correlation would indicate that higher levels of exposure tend to go along with higher
probabilities of sickness. Before we start, take a moment to think about what you would expect the correlation
(positive, negative, zero) to be between L and S for both groups.
In order to get to the correlation, we need to compute the covariance. In order to get the covariance, we need to
compute the expectations of the marginal distributions.
## immune-compromised
expect.L.imm = sum(L*marg.L.imm)
expect.S.imm = sum(S*marg.S.imm)
## teachers
expect.L.teach = sum(L*marg.L.teach)
expect.S.teach = sum(S*marg.S.teach)
As before, we multiply each possible value of a random variable by its probability and sum the products.
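To make the arithmetic concrete, here is the same calculation on invented marginal distributions. The names L.vals, S.vals, marg.L.toy, marg.S.toy and the probabilities are hypothetical and chosen only so the sums are easy to do by hand.

```r
## Possible values of the two random variables.
L.vals <- 0:3
S.vals <- 0:1

## Hypothetical marginal distributions (each sums to 1).
marg.L.toy <- c(0.5, 0.3, 0.0, 0.2)
marg.S.toy <- c(0.6, 0.4)

## E[L] = 0*0.5 + 1*0.3 + 2*0.0 + 3*0.2
expect.L.toy <- sum(L.vals * marg.L.toy)

## E[S] = 0*0.6 + 1*0.4
expect.S.toy <- sum(S.vals * marg.S.toy)

print(expect.L.toy)
print(expect.S.toy)
```

Because S only takes the values 0 and 1, its expectation is simply Pr(S = 1).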
Now we are ready to calculate the covariance of L and S, which is given by
\[ \mathrm{Cov}(L, S) = E\left[ (L - \bar{L})(S - \bar{S}) \right]. \]
Use the following formula to compute the covariance of L and S for the two populations:
\[ \mathrm{Cov}(L, S) = \sum_{j=1}^{2} \sum_{i=1}^{4} l_i \, s_j \, p_{ij} - \bar{L}\,\bar{S}. \]
This formula says to multiply each joint probability p_ij by the corresponding values l_i and s_j, add these
products over all possible L/S combinations, and subtract the product of the expectations of L and S. Note: l_i
should take on the values 0 to 3 (four numbers total), while s_j should take on the values 0 and 1 only.
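One way to organize this double sum in R is to build a table of the products l_i s_j with outer() and multiply it entry by entry with the joint table. The sketch below does so on an invented joint distribution and checks the result against the definition of covariance; all names ending in .toy, as well as products and cov.check, are hypothetical.

```r
L.vals <- 0:3
S.vals <- 0:1

## Hypothetical joint distribution (rows: L = 0..3, columns: S = 0, 1).
joint.toy <- matrix(c(0.5, 0.0,
                      0.1, 0.2,
                      0.0, 0.0,
                      0.0, 0.2),
                    nrow = 4, byrow = TRUE)

## Expectations from the marginals.
expect.L.toy <- sum(L.vals * rowSums(joint.toy))
expect.S.toy <- sum(S.vals * colSums(joint.toy))

## products[i, j] = l_i * s_j, laid out like the joint table.
products <- outer(L.vals, S.vals)

## Double sum of l_i * s_j * p_ij, minus the product of expectations.
cov.toy <- sum(products * joint.toy) - expect.L.toy * expect.S.toy

## Same quantity from the definition E[(L - E[L]) (S - E[S])].
cov.check <- sum(outer(L.vals - expect.L.toy, S.vals - expect.S.toy) * joint.toy)

print(cov.toy)
print(all.equal(cov.toy, cov.check))
```

Only combinations with both l_i and s_j nonzero contribute to the double sum, so on paper most terms drop out immediately.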
Set up the things you need to multiply and sum together on paper, and then have R do the actual calculations
for you. Save the results to
Cov.LS.imm = ## ??
Cov.LS.teach = ## ??
Check your answers by comparing them to the results of the next two lines.
cov(imm.data$exposure,imm.data$status)
cov(teach.data$exposure,teach.data$status)
Your answers may not match exactly, in part because cov() computes the sample covariance, which divides by n − 1 rather than n, but they should agree once you round to 2 decimal places.
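The size of this discrepancy can be seen on a small made-up data set. The sketch below compares the covariance computed from proportions (dividing by n) with the output of cov() (dividing by n − 1); toy.exposure, toy.status, cov.pop and cov.sample are hypothetical names and values.

```r
## Hypothetical paired observations for 10 individuals.
toy.exposure <- c(0, 0, 0, 0, 1, 1, 1, 3, 3, 0)
toy.status   <- c(0, 0, 0, 0, 0, 1, 1, 1, 1, 0)
n <- length(toy.exposure)

## Covariance from proportions: E[LS] - E[L] E[S], dividing by n.
cov.pop <- mean(toy.exposure * toy.status) -
  mean(toy.exposure) * mean(toy.status)

## R's built-in sample covariance, dividing by n - 1.
cov.sample <- cov(toy.exposure, toy.status)

print(c(population = cov.pop, sample = cov.sample))

## The two differ by exactly the factor (n - 1) / n.
print(all.equal(cov.pop, cov.sample * (n - 1) / n))
```

With 100 individuals the factor is 99/100, so the two values differ by about 1 percent.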
• Save your script so that you can use it for your assignment.