Calculus for Biologists Lab
Math 1180-002, Spring 2012

Lab #8 - Pathogen exposure and illness

Report due date: Tuesday, March 20, 2012 at 9 a.m.

Goal: To explore the interaction between pathogen exposure and sickness in different populations. You will use the joint probability distribution to determine the marginal and conditional probability distributions of the exposure and sickness random variables. You will also calculate the covariance and correlation between the two.

Create a new script, either in R (laptop) or with a text editor (Linux computers).

The story

Suppose we are studying how varying levels of pathogen exposure affect the presence of sickness in an immune-compromised population and in a population of elementary school teachers. Let L be the random variable describing the level of pathogen exposure. L can take on the following values:

• L = 0: no exposure
• L = 1: low exposure
• L = 2: moderate exposure
• L = 3: high exposure

Take a little time to think about how likely each value of L is in each population.

Let S be the random variable describing whether or not a given individual is “sick”. You may define sickness as you like: symptomatic presentation, antibody presence, infectiousness, etc. S can take on the following values:

• S = 0: not sick
• S = 1: sick

Simulate the L and S values for 100 individuals in both populations by copying and pasting the code in the script pathogen.R from the lab website. You should now have a number of variables at your disposal. Here is a list of them and what they represent.

• imm.data: table of simulated exposure (column 1) and sick status (column 2) values for 100 immune-compromised people
• teach.data: table of simulated exposure (column 1) and sick status (column 2) values for 100 teachers
• joint.imm: joint probability distribution for the random variables among the immune-compromised
• joint.teach: joint probability distribution for the random variables among teachers

Joint and marginal distributions

Enter joint.imm at the R prompt. You should see something that looks like this:

            status
    exposure    0    1
           0 0.92 0.00
           1 0.02 0.04
           3 0.00 0.02

The columns correspond to the sick status, while the rows correspond to the exposure level. Notice that in this particular example, there is no row for L = 2. This means that there were no immune-compromised individuals with moderate exposure to pathogens. In probabilistic terms, this implies the marginal probability Pr(L = 2) = 0.

Using your joint.imm table, define the marginal probability distributions for this population by creating two vectors of numbers. The first should have 4 numbers corresponding to the marginal probabilities for each L (0 to 3, including a 0 for any level with no row in the table), while the second should have just two numbers for sick status (0 to 1).

    marg.L.imm = ## ??
    marg.S.imm = ## ??

Now, view joint.teach in R, and repeat the process to define the marginal distributions for teachers.

    marg.L.teach = ## ??
    marg.S.teach = ## ??

We can plot histograms of L and S in both populations.

    ## L and S are used below as the vectors of possible values
    ## (assumed to be defined by pathogen.R).
    par(mfcol=c(2,2), mar=c(5,5,3,1), oma=c(0,0,2,0), xaxs="i", yaxs="i")

    ## immune-compromised
    hist(imm.data$exposure, breaks=0:4-0.5, freq=FALSE, ylim=c(0,1.1),
         labels=TRUE, xaxt="n", xlab="Pathogen exposure level",
         ylab="Probability",
         main="Marginal probabilities:\nimmune-compromised", col="lightgray")
    axis(1, at=L)
    hist(imm.data$status, breaks=0:2-0.5, freq=FALSE, ylim=c(0,1.1),
         labels=TRUE, xaxt="n", xlab="Sick status", ylab="Probability",
         main="", col="lightgray")
    axis(1, at=S)

    ## teachers
    hist(teach.data$exposure, breaks=0:4-0.5, freq=FALSE, ylim=c(0,1.1),
         labels=TRUE, xaxt="n", xlab="Pathogen exposure level",
         ylab="Probability",
         main="Marginal probabilities:\nelementary school teachers", col="orange")
    axis(1, at=L)
    hist(teach.data$status, breaks=0:2-0.5, freq=FALSE, ylim=c(0,1.1),
         labels=TRUE, xaxt="n", xlab="Sick status", ylab="Probability",
         main="", col="orange")
    axis(1, at=S)

You should see a total of four histograms. The first column shows the marginal probabilities of the two random variables for the immune-compromised group, and the second shows the same for teachers. Verify that the marginal probabilities printed above each bar match those you defined in your vectors above.

Plot 8.1: Save this plot to include in your assignment.

Conditional probabilities: Pr(L|S)

Execute the following code to generate another set of four histograms, this time showing what the sick status indicates about the level of exposure.

    par(mfcol=c(2,2), xaxs="i", yaxs="i", oma=c(0,0,3,0), mar=c(6,5,2,1))

    ## immune-compromised: distribution of exposure within each sick status
    for(i in S){
      hist(imm.data$exposure[imm.data$status==i], breaks=0:4-0.5, freq=FALSE,
           ylim=c(0,1.1), labels=TRUE, xaxt="n", main=paste("S = ",i),
           xlab="", ylab="Probability", col="lightgray")
      axis(1, at=L)
    }
    title(xlab="Pathogen exposure level\n(immune-compromised)")

    ## teachers
    for(i in S){
      hist(teach.data$exposure[teach.data$status==i], breaks=0:4-0.5, freq=FALSE,
           ylim=c(0,1.1), labels=TRUE, xaxt="n", main=paste("S = ",i),
           xlab="", ylab="Probability", col="orange")
      axis(1, at=L)
    }
    title(xlab="Pathogen exposure level\n(teachers)")
    mtext(side=3, "Conditional probabilities of exposure level (L) given sick status (S)",
          outer=TRUE, font=2)

Plot 8.2: Save this figure to include in your assignment.

Conditional probabilities: Pr(S|L)

Execute the following code to generate a final set of eight histograms, which show what pathogen exposure reveals about sick status.

    par(mfrow=c(2,4), xaxs="i", yaxs="i", oma=c(3,3,3,0), mar=c(3,5,2,1))

    ## immune-compromised: distribution of sick status within each exposure level
    for(i in L){
      hist(imm.data$status[imm.data$exposure==i], breaks=0:2-0.5, freq=FALSE,
           ylim=c(0,1.1), labels=TRUE, xaxt="n", main=paste("L = ",i),
           xlab="", ylab="Probability", col="lightgray")
      axis(1, at=S)
    }
    mtext(side=2, "Immune-compromised", font=2, adj=1, outer=TRUE)

    ## teachers
    for(i in L){
      hist(teach.data$status[teach.data$exposure==i], breaks=0:2-0.5, freq=FALSE,
           ylim=c(0,1.1), labels=TRUE, xaxt="n", main=paste("L = ",i),
           xlab="", ylab="Probability", col="orange")
      axis(1, at=S)
    }
    mtext(side=3, "Conditional probabilities of sick status (S) given exposure level (L)",
          outer=TRUE, font=2)
    mtext(side=1, "Sick status", outer=TRUE)
    mtext(side=2, "Teachers", font=2, adj=0.25, outer=TRUE)

Plot 8.3: Save this figure to include in your assignment.

Covariance

We can identify whether there is a correlation between sickness and pathogen exposure in our two distinct populations. A positive correlation would indicate that higher levels of exposure are associated with higher probabilities of sickness.

Before we start, take a moment to think about what you would expect the correlation (positive, negative, zero) to be between L and S for both groups.

To get the correlation, we first need the covariance, and to get the covariance, we need the expectations of the marginal distributions.

    ## immune-compromised
    expect.L.imm = sum(L*marg.L.imm)
    expect.S.imm = sum(S*marg.S.imm)

    ## teachers
    expect.L.teach = sum(L*marg.L.teach)
    expect.S.teach = sum(S*marg.S.teach)

As before, we multiply each possible value of a random variable by its probability and sum the products.

Now we are ready to calculate the covariance of L and S, which is given by

    $$\mathrm{Cov}(L, S) = E\left[(L - \bar{L})(S - \bar{S})\right].$$

Use the following formula to compute the covariance of L and S for the two populations:

    $$\mathrm{Cov}(L, S) = \sum_{i=1}^{4} \sum_{j=1}^{2} l_i s_j p_{ij} - \bar{L}\bar{S}.$$

This formula says to multiply each joint probability $p_{ij}$ by its corresponding values $l_i$ and $s_j$ for all possible L/S combinations, add the products together, and subtract the product of the expectations for L and S. Note: $l_i$ takes the values 0 through 3 (four numbers total), while $s_j$ takes only the values 0 and 1.

Set up the things you need to multiply and sum together on paper, and then have R do the actual calculations for you. Save the results to

    Cov.LS.imm = ## ??
    Cov.LS.teach = ## ??

Check your answers by comparing them to the results of the next two lines.

    cov(imm.data$exposure, imm.data$status)
    cov(teach.data$exposure, teach.data$status)

Your answers may not match exactly, partly because R's cov() computes the sample covariance (dividing by n - 1 rather than n), but they should agree when rounded to 2 decimal places.

Save your script so that you can use it for your assignment.
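
Worked sketch on made-up numbers: the chain from a joint table to marginals, expectations, covariance and correlation can be illustrated on a small hypothetical distribution. Everything with the suffix .demo below, along with l.vals and s.vals, is an illustrative name and the probabilities are invented; none of it comes from pathogen.R or from the lab data. The correlation step assumes the standard definition, the covariance divided by the product of the two standard deviations.

```r
## Hypothetical joint distribution (NOT the lab data).
## Rows are exposure levels 0..3, columns are sick status 0..1.
l.vals <- 0:3
s.vals <- 0:1
joint.demo <- matrix(c(0.40, 0.05,
                       0.15, 0.10,
                       0.05, 0.10,
                       0.05, 0.10),
                     nrow = 4, byrow = TRUE,
                     dimnames = list(exposure = as.character(l.vals),
                                     status = as.character(s.vals)))

## A joint distribution must sum to 1.
stopifnot(isTRUE(all.equal(sum(joint.demo), 1)))

## Marginals: sum the joint probabilities over the other variable.
marg.L.demo <- rowSums(joint.demo)   # one probability per exposure level
marg.S.demo <- colSums(joint.demo)   # one probability per sick status

## Expectations of the marginal distributions.
expect.L.demo <- sum(l.vals * marg.L.demo)
expect.S.demo <- sum(s.vals * marg.S.demo)

## E[LS]: outer() builds the table of products l_i * s_j, which is
## multiplied entry by entry with the joint probabilities and summed.
expect.LS.demo <- sum(outer(l.vals, s.vals) * joint.demo)

## Covariance: E[LS] minus the product of the expectations.
cov.LS.demo <- expect.LS.demo - expect.L.demo * expect.S.demo

## Correlation: covariance scaled by the two standard deviations.
var.L.demo <- sum(l.vals^2 * marg.L.demo) - expect.L.demo^2
var.S.demo <- sum(s.vals^2 * marg.S.demo) - expect.S.demo^2
cor.LS.demo <- cov.LS.demo / sqrt(var.L.demo * var.S.demo)

print(cov.LS.demo)   # approximately 0.25 for this made-up table
print(cor.LS.demo)   # positive: more exposure goes with more sickness
```

If a joint table is missing a row, as joint.imm is for L = 2 in the example shown earlier, the vector of values and the table no longer line up, so a row of zeros would need to be added (or that value dropped from the vector) before using this approach.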