Transcript

Slide 1
Statistical Sampling Part II – Language and Technique
Slide 2
This video is designed to accompany pages 41-76 of the workbook “Making Sense of Uncertainty:
Activities for Teaching Statistical Reasoning,” a publication of the Van-Griner Publishing Company.
Slide 3
Let’s establish some simple, but very important language.
The Population is that larger collection of subjects or items that you are interested in understanding,
which is way too large or too complex for you to examine each subject or item.
The Sample is that collection of subjects or items that you select from the population to examine.
While there are many wrong ways to select a Sample from the Population, there are many correct ways
as well. Be aware, however, that “correct” is correct in a mathematical or probabilistic sense. It is more
than just a common sense selection.
Slide 4
The journalist Diane Sawyer once said “I think the one lesson I have learned is that there is no substitute
for paying attention.” This is good advice, especially now. The language we are learning is so simple,
from one point of view, that it may be hard to appreciate. But it is through this language that we are
able to frame the very essence of statistical inference.
Since we have said the Population is that larger collection of subjects or items that you are interested in
understanding, let’s be a bit more specific about what we are interested in. The Population Parameter is
a number that describes the population characteristic we are most interested in. It may be that it is a
proportion, such as the true proportion of all UK students who would answer “Yes” to the question “Do
you support gay marriage?”
We will almost never know a population parameter. That’s why we sample.
In a well-chosen Sample from the Population, the Statistic is a number of interest to the researcher that
describes the sample. For instance, the statistic of interest may be the proportion of 150 students
sampled who respond “yes” to the question: “Do you support gay marriage?”
Remember S-S: Statistic/Sample and
P-P: Parameter/Population
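Here is a minimal Python sketch of the S-S / P-P distinction. The population size (20,000) and the "true" proportion (60%) are made-up values for illustration only. In practice the parameter is unknown, and only the statistic is ever observed.

```python
import random

random.seed(1)  # fixed seed so the sketch is repeatable

# Hypothetical population: 1 = "yes", 0 = "no". The 60% figure is invented.
POPULATION_SIZE = 20_000
population = [1] * 12_000 + [0] * 8_000

# Parameter: describes the whole population (normally unknown).
parameter = sum(population) / POPULATION_SIZE

# Statistic: describes only the 150 students actually sampled.
sample = random.sample(population, 150)
statistic = sum(sample) / len(sample)

print(f"parameter (population proportion): {parameter:.3f}")
print(f"statistic (sample proportion):     {statistic:.3f}")
```

The statistic will usually land near the parameter without matching it exactly, and it changes from one sample to the next.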
Slide 5
Let’s take a look at a CBS News/New York Times poll concerning gun laws. Here is what the article says:
As the president outlined sweeping new proposals aimed to reduce gun violence, a new CBS News/New
York Times poll found that Americans back the central components of the president's proposals,
including background checks, a national gun sale database, limits on high capacity magazines and a ban
on semi-automatic weapons. Asked if they generally back stricter gun laws, more than half of
respondents - 54 percent - support stricter gun laws …. That is a jump from April - before the Newtown
and Aurora shootings - when only 39 percent backed stricter gun laws but about the same as ten years
ago.
…
This poll was conducted by telephone from January 11-15, 2013 among 1,110 adults nationwide. Phone
numbers were dialed from samples of both standard land-line and cell phones. The error due to sampling
for results based on the entire sample could be plus or minus three percentage points.
(Hit return)
The Sample consists of the 1,110 adults contacted in this telephone survey taken in early January
2013. What was of interest in this sample? The poll wanted to know whether the person being
interviewed backed the central components of President Obama’s gun proposals.
(Hit return)
The survey yielded a Statistic of 54% who said they did back those components.
To identify the Population, you have to ask yourself: what larger group did the researchers want to
address?
(Hit return)
In this case, that group is clearly All Americans.
Finally, the population Parameter (Hit return) is the true, but unknown proportion (or percentage) of all
Americans who would have said they back the central components of the President’s gun proposals, had
it been possible to ask all Americans.
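The "plus or minus three percentage points" quoted in the article can be roughly checked with the usual 95% margin-of-error formula for a proportion. This is only a sketch: it assumes the 1,110 adults behave like a simple random sample and uses the multiplier 1.96, neither of which is stated in the article, and the function name is hypothetical.

```python
import math

def margin_of_error(p_hat: float, n: int, z: float = 1.96) -> float:
    """Approximate margin of error for a sample proportion (SRS assumed)."""
    return z * math.sqrt(p_hat * (1 - p_hat) / n)

# Statistic and sample size taken from the poll described above.
moe = margin_of_error(p_hat=0.54, n=1110)
print(f"margin of error: about {100 * moe:.1f} percentage points")
```

Under these assumptions the result comes out just under three percentage points, in line with the figure reported by the pollsters.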
Slide 6
It is virtually impossible to say anything mathematically meaningful about how good your statistic is as
an estimate of the parameter of interest, if you haven’t taken your sample the right way. “Right way”
means more than “fairly” or “carefully.” It is a probabilistic sense of “right way.”
The concept of a “simple random sample” is perhaps the simplest way to think about what this means.
A simple random sample (shorthand “SRS”) is a sample of size n that is chosen from the population in
such a way that every set of n individuals has the same chance of being chosen.
Let’s think about this. Suppose there are 100 students in your class. It would be a lot of work, but we
could list all the possible samples of 5 students. If we were to form our sample by alternating
between male and female students taken from five different rows, then we would not have selected a simple random
sample. For example, one possible sample is five males all in the same row, but the chance of
selecting that sample under this scheme is zero, which is not the same as the chance of selecting the samples
the scheme does allow. The sample selected might seem “fair” in a sense, but it is not an SRS.
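Just how much work would listing every sample be? A short Python sketch, using only the class size and sample size from the example, counts the possible samples:

```python
import math

CLASS_SIZE = 100
SAMPLE_SIZE = 5

# Number of distinct samples of 5 students from a class of 100.
num_samples = math.comb(CLASS_SIZE, SAMPLE_SIZE)

# Under an SRS, every one of those samples has the same chance.
chance_each = 1 / num_samples

print(f"possible samples of size 5: {num_samples:,}")
print(f"chance of any one sample under an SRS: {chance_each:.2e}")
```

There are 75,287,520 possible samples. An SRS gives each of them the same small, nonzero chance, whereas the row-by-row scheme gives some of them no chance at all.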
The best way to think about selecting a simple random sample is to imagine “mixing up” your population
and reaching in with your eyes closed and pulling out subjects one at a time until you have your sample
of size n. This ends up satisfying the mathematical definition of an SRS.
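To see why drawing one at a time satisfies the definition, here is a small exact check in Python. The five-person population and the sample size of 2 are hypothetical, chosen so that every case can be listed. The sketch assumes each blind draw is equally likely to pick any subject still remaining.

```python
from collections import Counter
from fractions import Fraction
from itertools import permutations

population = ["A", "B", "C", "D", "E"]  # hypothetical 5-person population
n = 2

# Every ordered way of pulling out n subjects, one at a time, without
# putting anyone back. Blind draws make each ordered sequence equally likely.
ordered_draws = list(permutations(population, n))
chance_per_sequence = Fraction(1, len(ordered_draws))

# Ignore the order of drawing: add up the chances for each set of n people.
chance_per_sample = Counter()
for draw in ordered_draws:
    chance_per_sample[frozenset(draw)] += chance_per_sequence

for members, chance in sorted(chance_per_sample.items(), key=lambda kv: sorted(kv[0])):
    print(sorted(members), chance)
```

Each of the 10 possible pairs ends up with the same chance, 1/10, which is exactly what the definition of an SRS requires.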
Slide 7
Understanding the role of sampling in statistical inference is largely a mathematical task. However,
rarely will you be able to physically mix up your population and draw out a sample of size n, as described
on the last slide.
Fortunately, there are many ways of actually taking a sample of size n if you are able to label the objects in
your population and have access to those labels (house numbers, social security numbers, etc.). One
useful tool is “Research Randomizer.” To use Research Randomizer you have to have your population of
N objects numbered 1 to N. Then it is easy to identify a sample of size n.
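The same idea can be sketched in Python. This is not Research Randomizer itself, just an illustration of numbering the population 1 to N and drawing n labels. The values N = 100 and n = 5, the seed, and the function name are all assumptions made for the example.

```python
import random

def draw_srs_labels(N: int, n: int, seed=None) -> list:
    """Return n distinct labels chosen at random from 1..N."""
    rng = random.Random(seed)
    # sample() draws without replacement, so no label can repeat.
    return sorted(rng.sample(range(1, N + 1), n))

labels = draw_srs_labels(N=100, n=5, seed=2024)
print(labels)  # the subjects carrying these numbers form the sample
```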
Slide 8
Let’s revisit the main idea of this video. Critical to being able to say something mathematically
meaningful about your population parameter – which you are likely to never know for certain – is to
have computed your statistic from a sample that was chosen the right probabilistic way, like a simple
random sample. That critical first step, along with some really neat mathematics, allows some very useful
statements to be made about how much confidence you have in your parameter estimate.
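One way to get a feel for what such a confidence statement means is a small simulation. The Python sketch below builds a hypothetical population whose parameter we know only because we made it up, takes many simple random samples, and counts how often the interval "statistic plus or minus margin of error" captures the parameter. The population size, the number of repeats, and the 1.96 multiplier (the usual choice for 95% confidence) are assumptions.

```python
import math
import random

random.seed(7)  # fixed seed so the sketch is repeatable

POP_SIZE = 100_000
YES_COUNT = 54_000               # hypothetical; gives a parameter of 0.54
TRUE_P = YES_COUNT / POP_SIZE
N_SAMPLE = 1_110
REPEATS = 500

population = [1] * YES_COUNT + [0] * (POP_SIZE - YES_COUNT)

hits = 0
for _ in range(REPEATS):
    sample = random.sample(population, N_SAMPLE)   # one simple random sample
    p_hat = sum(sample) / N_SAMPLE                 # the statistic
    moe = 1.96 * math.sqrt(p_hat * (1 - p_hat) / N_SAMPLE)
    if p_hat - moe <= TRUE_P <= p_hat + moe:
        hits += 1

coverage = hits / REPEATS
print(f"intervals that captured the parameter: {coverage:.1%}")
```

The share of intervals that capture the parameter should come out close to 95%. That long-run behavior is what the mathematics guarantees when the sample is chosen the right probabilistic way.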
Slide 9
Keep in mind that a correct or fair sample in this probabilistic sense is not necessarily a cross-sectional sample. That
is, you are not guaranteed that a simple random sample will be a cross section of the population on all
features you deem important (race, income, education level, etc.). Let’s look at a famous example.
Often called the world’s most famous newspaper error, the banner proclaiming “Dewey Defeats
Truman” was plastered across the front page of the Chicago Tribune on November 3, 1948, the day after
the incumbent President, Harry Truman, defeated the New York Governor, Thomas Dewey, by a popular-vote margin of
50% to 45%. There are several reasons the Tribune got it wrong, including going to press early. But
several polling organizations also got it wrong, including Gallup (predicted 44% Truman to 50% Dewey),
Crossley, and Roper.
What happened to the polls is generally thought of as a stark reminder that cross-sectional samples
can’t do for you what probabilistic samples can, especially since you’ll never know if they were actually
an accurate cross section.
For example, in St. Louis a Gallup poll interviewer was required to interview 13 subjects, of whom:
• exactly 6 were to live in the suburbs, 7 in the central city
• exactly 7 were to be men, 6 women
• of the 7 men, 3 were to be under 40; 1 black, 6 white
• rental prices paid had to be
    • 1 paying $44.01 or more
    • 3 paying $18.01 to $44.00
    • 2 paying under $18.00
This was very nicely thought out in one sense, but it clearly missed the mark in the end. More importantly,
samples like these, which were typical in the early days of polling, don’t allow any mathematical
statements to be made about the integrity of the sample-based estimates of the parameters of interest.
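The point made at the start of this slide, that a simple random sample is not guaranteed to be a cross section, can be made concrete with a short calculation. The Python sketch below uses a hypothetical class of 50 men and 50 women and an SRS of 10 students. These numbers and the function name are assumptions for illustration only.

```python
import math

# Hypothetical class: 50 men and 50 women; SRS of 10 students.
MEN, WOMEN, SAMPLE_SIZE = 50, 50, 10

def chance_of_men(k: int) -> float:
    """Chance that an SRS of SAMPLE_SIZE contains exactly k men."""
    return (math.comb(MEN, k) * math.comb(WOMEN, SAMPLE_SIZE - k)
            / math.comb(MEN + WOMEN, SAMPLE_SIZE))

# A perfect cross section on gender would be 5 men and 5 women.
chance_exact_cross_section = chance_of_men(5)
print(f"chance of exactly 5 men and 5 women: {chance_exact_cross_section:.3f}")
```

Under these assumptions the sample mirrors the class exactly only about one time in four. What the SRS offers instead is a known chance for every possible outcome, and that is what makes mathematical statements about the estimate possible.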
Slide 10
This concludes our video on the language and techniques of sampling. Remember, a probabilistic
sample like an SRS is a critical step in estimating a population parameter with a sample statistic.