Sampling Distribution of the Proportion

CONTEXT FOR USE: This guided software lab uses simulation to introduce the concept of a sampling
distribution.
LEARNING GOALS:
• Define terminology connected with sampling distributions in general and the distribution of the
sample proportion in particular.
• Through a visually rich software simulation, build understanding of the extent and nature of
predictable variation across random samples.
• Begin to build the connection between a simulated empirical distribution of sample proportions
and the (previously studied) binomial distribution.
• Plant the seeds of “surprising” sample results as a foundation for the concept of a p-value.
DETAILED DESCRIPTION:
Earlier in the course, students worked with a dataset containing a census of more than one million
active-duty U.S. military personnel as of April 2010, and found that 13.221% of the personnel in this
population were women. As a homework assignment, students read a six-page selection that begins by
posing the following question: “We want to think about what would happen if we did not have the entire
population data set, and needed to use a simple random sample in order to estimate the parameter. How
much might our estimates deviate from the actual value?”
The text then leads the reader through the use of an application that simulates repeated 100-observation
simple random samples from a population with a known proportion of women of 0.13221. After generating
6,000 samples, the reader is asked to record the mean and standard deviation of the empirical collection
of simulated sample proportions and to bring these results to class.
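The simulation step can be approximated outside the application used in the reading. The following is a minimal Python sketch, not the application itself. It assumes independent draws with success probability 0.13221, which is a close approximation to simple random sampling from a population of over one million. The function name and the seed are hypothetical choices made for illustration.

```python
import random
import statistics

P_WOMEN = 0.13221    # population proportion reported in the source
SAMPLE_SIZE = 100    # observations per simulated sample
NUM_SAMPLES = 6000   # number of simulated samples


def simulate_sample_proportions(p, n, reps, seed=None):
    """Return `reps` sample proportions, each based on n independent draws.

    Assumption: draws are independent with success probability p, which
    approximates sampling without replacement from a very large population.
    """
    rng = random.Random(seed)
    proportions = []
    for _ in range(reps):
        successes = sum(1 for _ in range(n) if rng.random() < p)
        proportions.append(successes / n)
    return proportions


proportions = simulate_sample_proportions(P_WOMEN, SAMPLE_SIZE, NUM_SAMPLES, seed=1)
print("mean of sample proportions:", round(statistics.mean(proportions), 4))
print("SD of sample proportions:  ", round(statistics.stdev(proportions), 4))
```

Running the sketch with different seeds gives slightly different means and standard deviations, which is the same run-to-run variation students see when they compare their own results.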
The lab continues by having students use software to compute the theoretical distribution of a
binomial variable with 100 trials and a success probability of 0.13221. The reader then uses this
distribution to respond to the question “If the population actually contains 13.2% women, how surprised
should we be if our sample of 100 contains fewer than 10% women?”
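For readers without the lab software at hand, the binomial computation can be sketched in Python using only the standard library. This is an illustrative sketch, and the function names are hypothetical. In a sample of 100, "fewer than 10% women" corresponds to 9 or fewer women, so the quantity of interest is P(X <= 9).

```python
from math import comb


def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def binomial_cdf(k, n, p):
    """Probability of k or fewer successes in n independent trials."""
    return sum(binomial_pmf(i, n, p) for i in range(k + 1))


n, p = 100, 0.13221
# Fewer than 10 women out of 100 means a count of 0 through 9.
prob_fewer_than_10 = binomial_cdf(9, n, p)
print(f"P(X <= 9) = {prob_fewer_than_10:.4f}")
```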
During the next class, the instructor begins by asking several students to report their results
from the simulation and then opens a short discussion with prompts like “Why don’t these results
match exactly?” and “If I were to run the simulation right now, which of these results do you think I’d get?”
At that point, the instructor can run the now-familiar simulation and project the results on the screen.
With a histogram of several thousand repetitions visible, the instructor then asks, “So some samples will
be very close to the population proportion of 0.13, but some ‘miss’ by quite a lot. How many of these
simulated samples contained fewer than 10% women?”
Finally, the instructor asks volunteers for their findings from the binomial computations and provides
a brief explanation of the connection between the empirical results of the simulations and the
theoretical binomial model.
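The connection between the empirical and theoretical results can be illustrated by placing the two numbers side by side. The sketch below, in Python, is one possible way to do this. The function names, the seed, and the assumption of independent draws are illustrative and are not taken from the lab materials.

```python
import random
from math import comb


def empirical_fraction_below(p, n, reps, cutoff_count, seed=None):
    """Fraction of simulated samples with fewer than cutoff_count successes."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        count = sum(1 for _ in range(n) if rng.random() < p)
        if count < cutoff_count:
            hits += 1
    return hits / reps


def binomial_prob_below(p, n, cutoff_count):
    """Binomial probability of fewer than cutoff_count successes in n trials."""
    return sum(
        comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(cutoff_count)
    )


p, n = 0.13221, 100
empirical = empirical_fraction_below(p, n, reps=6000, cutoff_count=10, seed=2)
theoretical = binomial_prob_below(p, n, cutoff_count=10)
print(f"simulated fraction with fewer than 10 women: {empirical:.4f}")
print(f"binomial probability of fewer than 10 women: {theoretical:.4f}")
```

The two values should be close but not identical. The simulated fraction varies from run to run, while the binomial probability is fixed by the model.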
INSTRUCTOR NOTES:
This particular reading and self-paced lab uses a simulation that ships with JMP software, but there are
many comparable simulations available on the web. Note that the simulation does not need to incorporate
the actual dataset, but that it is important for students to operate the simulation on their own at their own
pace.
It is easy to expand the exercise to investigate the impacts of varying sample size and varying population
proportions.
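One way to preview such an extension is to tabulate the theoretical standard deviation of the sample proportion, sqrt(p(1 - p)/n), for several settings. The Python sketch below does this. The alternative proportion (0.5) and the sample sizes (25 and 400) are hypothetical values chosen only for illustration, and the function name is likewise an assumption.

```python
from math import sqrt


def standard_error(p, n):
    """Theoretical SD of the sample proportion for n independent draws."""
    return sqrt(p * (1 - p) / n)


# 0.13221 and n = 100 come from the lab; the other values are illustrative.
for p in (0.13221, 0.5):
    for n in (25, 100, 400):
        print(f"p = {p:<8} n = {n:<4} SD of sample proportion = {standard_error(p, n):.4f}")
```

The table makes two patterns visible: quadrupling the sample size halves the standard deviation, and proportions nearer 0.5 produce more sample-to-sample variation than proportions near 0 or 1.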
RESOURCES:
Carver, Robert H. (2014) Practical Data Analysis with JMP, 2nd Ed. Cary, NC: SAS Press. Chapter 8, pages 150-155.
For the Military dataset and a full description, also see OpenIntro Statistics, 2nd Ed. (2012) by David M Diez,
Christopher D Barr, and Mine Cetinkaya-Rundel, available online at
http://www.openintro.org/stat/textbook.php.