Sampling Distribution of the Proportion

CONTEXT FOR USE: This guided software lab uses simulation to introduce the concept of a sampling distribution.

LEARNING GOALS: Define terminology connected with sampling distributions in general and the distribution of the sample proportion in particular. Through a visually rich software simulation, build understanding of the extent and nature of predictable variation across random samples. Begin to build the connection between a simulated empirical distribution of sample proportions and the (previously studied) binomial distribution. Plant the seeds of “surprising” sample results as a foundation for the concept of a p-value.

DETAILED DESCRIPTION: Earlier in the course, students worked with a dataset containing the census of over one million active-duty U.S. military personnel as of April 2010. Within this population, students had found that 13.221% of personnel were women. As a homework assignment, students read a six-page selection that begins by posing the following question: “We want to think about what would happen if we did not have the entire population data set, and needed to use a simple random sample in order to estimate the parameter. How much might our estimates deviate from the actual value?” The text then leads the reader through the use of an application that simulates repeated 100-observation simple random samples from a population with a known proportion of women of 0.13221. After generating 6,000 samples, the reader is asked to note the mean and standard deviation of the empirical collection of simulated sample proportions and to bring these results to class. The lab continues by having students use software to compute the theoretical distribution of a binomial variable with 100 trials and a success probability of 0.13221.
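For instructors without access to the simulator used in the reading, the two computations described above can be approximated in a few lines of Python. This is only a minimal sketch, not the application the lab uses: it assumes each of the 100 observations is an independent draw with success probability 0.13221 (a reasonable stand-in for a simple random sample from a population of over one million), and the function names, the fixed seed, and the cutoff of 9 women are illustrative choices rather than part of the original lab.

```python
import math
import random
import statistics

P_WOMEN = 0.13221   # population proportion reported in the lab
N = 100             # observations per simulated sample
REPS = 6000         # number of simulated samples


def simulate_counts(p, n, reps, seed=0):
    """Return `reps` counts of successes, each from n independent draws."""
    rng = random.Random(seed)  # fixed seed so a rerun gives the same numbers
    return [sum(rng.random() < p for _ in range(n)) for _ in range(reps)]


def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


def binomial_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(binomial_pmf(i, n, p) for i in range(k + 1))


counts = simulate_counts(P_WOMEN, N, REPS)
props = [c / N for c in counts]

# Empirical mean and standard deviation of the simulated sample proportions.
sim_mean = statistics.mean(props)
sim_sd = statistics.stdev(props)

# Theoretical counterparts from the binomial model.
theory_mean = P_WOMEN
theory_sd = math.sqrt(P_WOMEN * (1 - P_WOMEN) / N)

# "Fewer than 10% women" in a sample of 100 means at most 9 women.
sim_tail = sum(c <= 9 for c in counts) / REPS
theory_tail = binomial_cdf(9, N, P_WOMEN)

print(f"simulated mean {sim_mean:.4f}  vs  theoretical {theory_mean:.4f}")
print(f"simulated SD   {sim_sd:.4f}  vs  theoretical {theory_sd:.4f}")
print(f"share of samples under 10%: {sim_tail:.4f}  vs  binomial {theory_tail:.4f}")
```

Changing the seed gives slightly different simulated values, which mirrors the classroom observation that students' results do not match exactly, while the theoretical standard deviation stays fixed at roughly 0.034.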
The reader then uses this distribution to respond to the question “If the population actually contains 13.2% women, how surprised should we be if our sample of 100 contains fewer than 10% women?” In the next class meeting, the instructor begins by asking several students to report their results from the simulation and then opens a short discussion with prompts such as “Why don’t these results match exactly?” and “If I were to run the simulation right now, which of these results do you think I’d get?” At that point, the instructor can run the now-familiar simulation and project the results on the screen. With a histogram of several thousand repetitions visible, the instructor then asks, “So some samples will be very close to the population proportion of 0.13, but some ‘miss’ by quite a lot. How many of these simulated samples contained fewer than 10% women?” Next, the instructor asks volunteers for their findings from the binomial computations and provides a brief explanation of the connection between the empirical results of the simulations and the theoretical binomial model.

INSTRUCTOR NOTES: This particular reading and self-paced lab uses a simulation that ships with JMP software, but there are many comparable simulations available on the web. The simulation does not need to incorporate the actual dataset, but it is important that students operate the simulation on their own and at their own pace. The exercise is easy to expand to investigate the effects of varying the sample size and the population proportion.

RESOURCES: Carver, Robert H. (2014). Practical Data Analysis with JMP, 2nd Ed. Cary, NC: SAS Press. Chapter 8, pages 150-155. For the Military dataset and a full description, also see Diez, David M., Christopher D. Barr, and Mine Cetinkaya-Rundel (2012). OpenIntro Statistics, 2nd Ed., available online at http://www.openintro.org/stat/textbook.php.