MATH 1180: Calculus for Biologists II (Spring 2011)

Lab Meets: March 2, 2011
Report due date: March 9, 2011
Section 002: Wednesday 9:40-10:30am
Section 003: Wednesday 10:45-11:35am
Lab location - LCB 115
Lab instructor - Erica Graham, graham@math.utah.edu
Lab webpage - www.math.utah.edu/~graham/Math1180.html
Lab office hours - Monday, 9:00 - 10:30 a.m., LCB 115

Lab 07

General Lab Instructions

In-class Exploration

Review: Last week, we simulated a discrete-time stochastic diffusion model and observed just how obnoxious ReallyObnoxiousStuff could be.

Background: In today’s lab, we will simulate the rare disease example from the text to determine how good screening tests are at actually catching sick people.

restart; with(Statistics):

Suppose a rare disease infects 5% of a population. A diagnostic test always identifies people with the disease, but it also generates 10% false positives. The test will return a 1 for those who test positive and a 0 for those who test negative. We can simulate a fake population of 150 people to see how many actually have the disease before we deal with testing inaccuracy. Again, we’ll need the RandomVariable( ) and ProbabilityTable( ) commands to set up the characteristics of the population.

pdisease:=0.05: ## probability of having the disease
P1:=[1-pdisease,pdisease]; ## probability table, yields 1 if healthy, 2 if sick
X:=RandomVariable(ProbabilityTable(P1)): ## necessary randomness

P1 := [0.95, 0.05]  (2.1)

Now we can simulate the sick and healthy people for the initial population size. Because sampling X will return only 1s and 2s, we need to modify the results to be 0 when there’s a 1 and 1 when there’s a 2 (like we did last week). The difference is that we’re sampling 150 people, so we need to organize things using seq( ).
Our entire population, reduced to a series of 0s and 1s, is represented by the list 'status'.

population:=150:
xsample:=Sample(X,population): ## sample of 150 people (1s and 2s)
status:=[seq(xsample[j]-1,j=1..population)]: ## convert sample to 0s and 1s

To figure out how many sick people are in the population, we will add up all the 1s in 'status'. The number of healthy people is simply the total population size minus the number of sick people.

sickpeople:=add(status[j],j=1..population); ## all the 1s
healthypeople:=population-sickpeople; ## all the 0s

sickpeople := 9  (2.2)
healthypeople := 141  (2.2)

We now know what our population looks like. For all the healthy people, the original setup states that the test will give a false positive 10% of the time. We’ll see how many people in the healthy population would receive a false positive if they were tested. To do this, we can follow the same process as before, assigning different names for things.

pfalse:=0.1: ## probability of getting a false positive
P2:=[1-pfalse,pfalse];
X2:=RandomVariable(ProbabilityTable(P2)):

P2 := [0.9, 0.1]  (2.3)

Our new sample size is 'healthypeople'.

x2sample:=Sample(X2,healthypeople):
diagnostic:=[seq(x2sample[j]-1,j=1..healthypeople)]:
falsepositives:=add(diagnostic[j],j=1..healthypeople); ## how many healthy people tested positive

falsepositives := 15  (2.4)

What fraction of the positives detected actually identified sick people? Put another way, what is the probability of being sick given a positive result? There are 2 ways to answer this question:

[1] We can estimate the fraction using the results from our simulation. (We’ll do this now.)
[2] We can use Bayes’ theorem for the actual fraction. (You’ll do this later.)

For method [1], we need the following fraction: (true positives)/(true positives + false positives). Since the test is 100% accurate for catching the people with the disease, we know that all 'sickpeople' were flagged with 1s.
The false positives that we just found add to the total number of positives. So, the percent 'score' for this particular diagnostic test in identifying sick people is simply 100*(appropriate fraction).

score:=evalf(100*sickpeople/(sickpeople+falsepositives)); ## percentage of positive tests that identify sick people

score := 37.50000000  (2.5)

Please copy the entire section below into a new worksheet, and save it as something you’ll remember.

Lab 07 Homework Problems

Your Full Name:
Your (registered) Lab Section:

Useful Tip #1: Read each problem carefully, and be sure to follow the directions specified for each question! I will take a vow of silence if you ask me a question that is clearly stated in a problem.

Useful Tip #2: Try to minimize your code by not simply copying and pasting absolutely everything we do in class. See if you can eliminate unnecessary commands by knowing what it is you have to do and what tools you (minimally) need to do it.

Useful Tip #3: Don’t be afraid to troubleshoot! Does your answer make sense to you? If not, explore why. If you’re still unsure, ask me.

Useful Tip #4: Whenever you re-open Maple to complete an assignment, you will need to re-execute everything that you’ve done. Maple has no memory of anything you did; it just shows you your non-suppressed output.

Paper-saving tip: Make the size of your output graphs smaller to save paper when you print them. Please ask me if you’re unsure of how to do this. (You can see how much paper you’d use beforehand by going to File > Print Preview.) Also, please DO NOT attach printer header sheets (usually yellow, pink or blue) to your assignment. Recycle them instead!

NOTE: For all assignments, you should re-define/assign any parameters or functions we used in class, as needed. This will require an understanding of what’s being asked of you. Again, everything that we did in class does not necessarily need to be done here.

(0) Import the Maple Statistics package we used in class.
with(Statistics):

(1)(a) Use method [2] at the end of the in-class exploration to determine the actual percentage of the flagged positives that identified sick people. This is equivalent to 100*Pr(sick|positive).

Hint: For any 2 events, A and B, Bayes’ theorem says that Pr(B|A) = (Pr(A|B)*Pr(B))/Pr(A). You have all the necessary pieces of information to answer this problem. Try not to get overwhelmed.

Note: You DO NOT need Maple for this problem.

(b) Compare your result to the 'score' we calculated from our simulation. Is the simulated score better or worse?

Now suppose we know that a particular subgroup of 100 people within a different population from before is more likely to have the disease than the general population. Assume that the probability of being sick in this target group is 0.4.

(2)(a) Simulate only the disease statuses of this fake subpopulation.

## simulate 100 fake people given their probability of being sick

(b) Count the number of individuals here who are sick.

## how many are sick?

(3)(a) With the group’s healthy people, simulate the situation in which the test generates 10% false positives.

## simulate tests for healthy people

(b) How many false positives are there among the healthy subgroup?

## how many were told they were sick, but weren’t?

(4) Estimate the percentage of positive tests that actually identify people who are sick. Use our method from class as a guide.

## what’s the 'score' this time?

(5)(a) What is the actual chance (in percentage) that a person in this target group who tests positive is actually sick? (You’ll need Bayes’ theorem for this.)

(b) How does this answer compare to your estimated result?

(6)(a) What do you notice about the actual scores of the diagnostic tests for this subpopulation versus the population we simulated in class? Which test performs better?

(b) What’s so good about identifying highly susceptible subpopulations for a rare disease? Explain your answer.

Did you remember to save paper?