Instructor's Guide: The FishPond Game (Capture-Recapture)

Quick Info
Level: Intro/Intermediate undergraduate statistics
Brief Description: Students play an on-line game in which they are asked to estimate the size of a population. They are invited to try various estimation techniques, and better estimates earn more points.
Topics Covered: Capture-recapture techniques to estimate population size.
Learning Goals:
Discuss basic ideas behind estimating population size.
Provide students a method to evaluate whether a statistic is a good estimator.
Allow students an opportunity to create and evaluate their own estimator.
Software Required: Data analysis software such as Excel, Minitab, or R for basic calculations. Students will also need computer access to play the FishPond game on the web (this can be done inside or outside of regularly scheduled class time).
Prerequisites: Descriptive Statistics, Histograms
Time: 2 hours in class and 2 hours of homework
Instructor Resources: web.grinnell.edu/individuals/kuipers/stat2labs/fishpond1.html includes links to the Student Labs, Instructor Guides, and the FishPond Game.

Summary
Students are placed in charge of a fish farm with only one species of fish, the gobble fish, and are told they need to estimate the number of fish in the pond. The on-line game allows students to sample data and submit a population estimate, and it then awards points for good population estimates. However, each sample costs the students points, so larger sample sizes are more costly. Students continue sampling and making estimates until their estimate is close enough to the true population size to win the game (or until they do not have enough funds/points to take more samples). Note that the true population size changes each time the game is restarted, so students can play the game multiple times. Students are also asked to conduct simulations in order to compare estimators.
Using simulations, students can evaluate the quality of an estimator, for example by determining whether the estimator is consistent or biased.

What type of course is this FishPond lab designed for?
This lab may be incorporated into an introductory calculus-based or non-calculus-based statistics course or a more advanced applied statistics course.

How should you conduct the lab? How much time should you expect to allocate?

DAY 1:
Student Handout (20 Minutes): Students read through the first page and answer Question 1. Lecture or discuss how the proportion of marked animals in the second sample, m2/n2, can be used as an estimate of the proportion of marked animals in the entire population, n1/N.

Play the FishPond Game (20 Minutes): This can be done in small groups during class. If students do not have access to the web during class, this can be assigned as homework. Be prepared to have several students take much longer than the others. Encourage respect towards others. Decide whether you want students who finish quickly to play the game again or to leave class early.
Make sure to check the “Participant Info” box to ensure student data is collected.
Students are asked to read a brief tutorial, similar to the student handout. After the tutorial, students are asked to click the sample button to start collecting their own data.
Select the sizes of your first and second samples. You start with $1000; each unit sampled (in either the first or second sample) costs $1.
After you select both sample sizes and hit the “Confirm Sample” button, the game reports the number of marked fish.
You are asked to input your estimate of the population size. Click “Confirm Estimate”.
You win if your estimate is close to the true population size.
If your estimate is not close enough to win, you will still get a reward for your estimate. Rewards are random, but closer estimates tend to earn larger rewards.
Click the sample button again to take another sample and make another estimate.
Keep trying until you win or run out of money.
“Main Menu” restarts the game.

DAY 2:
Demonstrate how to use the on-line simulations within the game and then have students complete the lab (50 Minutes). If students do not complete the lab during class, they can finish the work as a homework assignment (possibly in small groups).
Students can input a hypothetical population size (N) as well as both the first (n1) and second (n2) sample sizes.
Students should simulate about 10000 iterations of capture-recapture samples to evaluate which estimator is best.
The program simulates the number of recaptured fish (m2) as well as both estimates (N̂ and Ñ). Use the slider to see all graphs.

Helpful hints for instructors using the FishPond lab for the first time.
1. Spend time in class discussing the importance of simulations to evaluate statistics. While this lab does not provide theoretical reasoning for evaluating statistics, students start to appreciate how difficult it can be to determine a good estimate. You may want to ask students to compare the sample mean and sample median as estimates of the center of the population. You could also ask students to consider designing a simulation to determine whether the population standard deviation (n in the denominator) or the sample standard deviation (n − 1 in the denominator) is a better estimator of population spread.
2. Ask students how biased samples or extraneous variables could influence the estimates of population size. Students should get an appreciation for the variation involved in conducting experiments.
3. In-class assignments in which data are collected solely for educational purposes typically do not require Institutional Review Board (IRB) approval. For more advanced courses, instructors may choose to expose students to the IRB process. Information about the IRB process is provided on the Stat2Labs website.
4.
Encouraging students to work in pairs or small groups provides a great opportunity for the students to talk about their understanding and teach each other.
5. You may want to assign students a challenge (let them play multiple times in order to improve their score, giving a small amount of extra credit to the group with the best score). There is a significant amount of randomness involved in the amount of money earned with each guess, so the best strategy does not always result in the best final score.

What else is in this Instructor Guide?
In the next section, we provide detailed comments on the student lab. We suggest questions you can ask to promote class discussion and point out common issues you may run into when using the lab. Note that you need to use the Review option with Final Showing Markup to see the detailed comments for instructors.
For more information and ideas on using FishPond in your course, go to web.grinnell.edu/individuals/kuipers/stat2labs/fishpond1.html.

Introduction to Capture-Recapture: A technique to estimate the size of a population
Animal ecologists are often interested in knowing how many animals exist in a population. Those studying birds may set up nets or simply count the number of birds seen or heard within a specific area. Ecologists studying larger wildlife, such as wild horses in the western United States, may fly over a specific area and count the number of horses sighted. In fisheries, biologists may count the number of fish caught by a commercial operation.
Most techniques discussed in introductory statistics classes are based on using data from one sample to infer something about a larger population. However, wildlife are often difficult to see, catch, or hear. Thus, even if a census is attempted, it is very likely that not all animals within a specific area are counted, and other techniques are often used to estimate the size of the population.

The Lincoln-Petersen Estimator:
To estimate the size of the population, ecologists use a technique called capture-recapture. The Lincoln-Petersen estimator is the simplest capture-recapture technique. In this process, each of the following steps is taken:
A) A sample of n1 animals is collected, marked or tagged so that the animals can be easily recognized, and then released back into the population. Since n1 is an unknown fraction of the entire population, we can write
$p = \frac{n_1}{N} = \frac{\text{size of sample 1}}{\text{population size}}$ (Equation 1)
where N is the population size and p represents the proportion of marked (captured) animals in the population.
B) A second sample of size n2 is then collected (possibly a few days later). In this second sample, the number of marked animals, m2, is counted. Note that m2 represents the number of animals caught in both the first and second samples (they were captured and then recaptured).
C) If both samples were properly collected, then the proportion of marked animals in the second sample, $\frac{m_2}{n_2}$, can be used as an estimate of the proportion of marked animals in the entire population, $\frac{n_1}{N}$.
Equating the sample proportion to the population proportion, $\frac{m_2}{n_2} = \frac{n_1}{N}$, allows us to estimate the population size by solving for N̂:
$\hat{N} = \frac{n_1 n_2}{m_2}$ (Equation 2)

Modified estimator:
It is possible to observe zero marked fish in the second sample (m2 = 0), and Equation 2 cannot be used to estimate the population size in that case because it would require dividing by zero. A modified estimate of the population size is:
$\tilde{N} = \frac{(n_1 + 1)(n_2 + 1)}{m_2 + 1} - 1$ (Equation 3)

Evaluating Population Estimators:
1) Assume you collected the following information: n1 = 100, n2 = 100, and m2 = 4. Use Equations 2 and 3 to calculate N̂ and Ñ.

Play the FishPond game:
The on-line game allows you to sample data and submit a population estimate, and it then gives points for good population estimates. However, sampling more will cost points. Continue sampling and estimating until you win the game or run out of money.
Go to the web site http://statgames.tietronix.com/fishpondnew/
Check the Participant Info box and type in the appropriate information to record your data:
Participant ID:______________ (This will be on the web; do not use a name that will identify you.)
Group ID:_______________
Select Start Game and submit the following data for your game:

                  | Participant ID | Did you Win? | Number of times you collected data | Final Cash Amount | Final Score | Actual Population Size | Final Estimated Population Size
Game 1            |                |              |                                    |                   |             |                        |
Game 2 (optional) |                |              |                                    |                   |             |                        |
Game 3 (optional) |                |              |                                    |                   |             |                        |

To develop a strategy to win the FishPond game, it can be useful to see how estimates of the population size are affected by the sample sizes. Click the “Simulations” button found in the game. Input a population size of N = 3500, n1 = 200, and n2 = 200. Run 10000 simulations. This will simulate playing the game 10000 times. Histograms show m2 and the population estimates N̂ and Ñ for each of the 10000 simulations (slide the arrow to see all graphs). Compare the shapes of the N̂ and Ñ histograms.
2) Which histogram, N̂ or Ñ, is centered closer to the true population size (3500)? Does either N̂ or Ñ appear to be a biased estimate of the population size (i.e., tend to overestimate or underestimate the population)?
3) Which histogram has a larger spread? Note that the N̂ graph may have invalid (infinite) population estimates. Does N̂ or Ñ appear to be a more precise estimate?
4) Keep N = 3500 and n1 = 200, and run 10000 simulations, but change n2 to 50; then change n2 to 300. Describe how increasing n2 impacts both the N̂ and Ñ histograms. Does increasing n2 impact the mean of N̂ or Ñ? Does increasing n2 impact the spread of N̂ or Ñ?
5) RESTART THE SIMULATION. Repeat 4) with n1 = 300. Describe how changing n1 impacts both the N̂ and Ñ estimates. Does increasing n1 impact the mean of N̂ or Ñ? Does increasing n1 impact the spread of the N̂ or Ñ estimates?
6) Clearly, in a real-world situation we would not be able to automatically change the population size. However, we can choose our sample sizes. Play the FishPond Game again. Describe any strategies you used to determine sample sizes for your estimates. Explain why you selected this strategy and record your game results below.

Participant ID | Did you Win? | Number of times you collected data | Final Cash Amount | Final Score | Actual Population Size | Final Estimated Population Size
               |              |                                    |                   |             |                        |
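
Instructors who want to check Question 1 or reproduce the simulation questions outside the game may find a short script helpful. The sketch below is not the game's own code. It assumes every fish is equally likely to be caught in each sample, so m2 is drawn by sampling n2 fish without replacement from a pond in which n1 fish are marked. The function names (lincoln_petersen, modified_estimate, simulate_m2) and the seed value are hypothetical choices made for illustration.

```python
import random
import statistics


def lincoln_petersen(n1, n2, m2):
    """Equation 2: n1 * n2 / m2; infinite when no marked fish are recaptured."""
    return float("inf") if m2 == 0 else n1 * n2 / m2


def modified_estimate(n1, n2, m2):
    """Equation 3: (n1 + 1)(n2 + 1) / (m2 + 1) - 1; defined even when m2 = 0."""
    return (n1 + 1) * (n2 + 1) / (m2 + 1) - 1


def simulate_m2(N, n1, n2, iterations, seed=0):
    """Simulate recapture counts, assuming each fish is equally catchable."""
    rng = random.Random(seed)  # fixed seed (an arbitrary choice) for repeatability
    counts = []
    for _ in range(iterations):
        # Label the fish 0..N-1 and treat fish 0..n1-1 as the marked ones.
        second_sample = rng.sample(range(N), n2)
        counts.append(sum(1 for fish in second_sample if fish < n1))
    return counts


if __name__ == "__main__":
    # Question 1: n1 = 100, n2 = 100, m2 = 4
    print(lincoln_petersen(100, 100, 4))   # 2500.0
    print(modified_estimate(100, 100, 4))  # about 2039.2

    # Settings used in the simulation questions
    N, n1, n2 = 3500, 200, 200
    m2_values = simulate_m2(N, n1, n2, iterations=10000)
    n_hat = [lincoln_petersen(n1, n2, m) for m in m2_values]
    n_tilde = [modified_estimate(n1, n2, m) for m in m2_values]

    # Infinite estimates (m2 = 0) are counted separately, then set aside.
    finite_n_hat = [x for x in n_hat if x != float("inf")]
    print("runs with m2 = 0:", len(n_hat) - len(finite_n_hat))
    print("mean, sd of finite N-hat:",
          statistics.mean(finite_n_hat), statistics.stdev(finite_n_hat))
    print("mean, sd of N-tilde:",
          statistics.mean(n_tilde), statistics.stdev(n_tilde))
```

Changing n1 and n2 in the script mirrors Questions 4 and 5. The results will not match the game's histograms exactly, since each run of either simulation draws different random samples.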