Slide 1
Statistical Sampling Part II – Language and Technique

Slide 2
This video is designed to accompany pages 41-76 of the workbook “Making Sense of Uncertainty: Activities for Teaching Statistical Reasoning,” a publication of the Van-Griner Publishing Company.

Slide 3
Let’s establish some simple, but very important language. The Population is that larger collection of subjects or items that you are interested in understanding, which is way too large or too complex for you to examine each subject or item. The Sample is that collection of subjects or items that you select from the population to examine. While there are many wrong ways to select a Sample from the Population, there are many correct ways as well. Be aware, however, that “correct” is correct in a mathematical or probabilistic sense. It is more than just a common sense selection.

Slide 4
The journalist Diane Sawyer once said “I think the one lesson I have learned is that there is no substitute for paying attention.” This is good advice, especially now. The language we are learning is so simple, from one point of view, that it may be hard to appreciate. But it is through this language that we are able to frame the very essence of statistical inference. Since we have said the Population is that larger collection of subjects or items that you are interested in understanding, let’s be a bit more specific about what we are interested in. The Population Parameter is a number that describes the population characteristic we are most interested in. It may be that it is a proportion, such as the true proportion of all UK students who would answer “Yes” to the question “Do you support gay marriage?” We almost never will know a population parameter. That’s why we sample. In a well-chosen Sample from the Population, the Statistic is a number of interest to the researcher that describes the sample.
For instance, the statistic of interest may be the proportion of 150 students sampled who respond “yes” to the question: “Do you support gay marriage?” Remember S-S: Statistic/Sample and P-P: Parameter/Population.

Slide 5
Let’s take a look at a CBS News/New York Times poll concerning gun laws. Here is what the article says:

As the president outlined sweeping new proposals aimed to reduce gun violence, a new CBS News/New York Times poll found that Americans back the central components of the president's proposals, including background checks, a national gun sale database, limits on high capacity magazines and a ban on semi-automatic weapons. Asked if they generally back stricter gun laws, more than half of respondents - 54 percent - support stricter gun laws …. That is a jump from April - before the Newtown and Aurora shootings - when only 39 percent backed stricter gun laws but about the same as ten years ago. … This poll was conducted by telephone from January 11-15, 2013 among 1,110 adults nationwide. Phone numbers were dialed from samples of both standard land-line and cell phones. The error due to sampling for results based on the entire sample could be plus or minus three percentage points.

(Hit return) The Sample consists of the 1,110 adults contacted in this telephone survey taken in early January 2013. What was of interest in this sample? The poll wanted to know whether the person being interviewed backed the central components of President Obama’s gun proposals. (Hit return) The survey yielded a Statistic of 54% who said they did back those components. To identify the Population, you have to ask yourself: what larger group did the researchers want to address? (Hit return) In this case, that group is clearly All Americans.
Finally, the population Parameter (Hit return) is the true but unknown proportion (or percentage) of all Americans who would have said they back the central components of the President’s gun proposals, had it been possible to ask all Americans.

Slide 6
It is virtually impossible to say anything mathematically meaningful about how good your statistic is as an estimate of the parameter of interest if you haven’t taken your sample the right way. “Right way” means more than “fairly” or “carefully.” It is a probabilistic sense of “right way.” The concept of a “simple random sample” is perhaps the simplest way to think about what this means. A simple random sample (shorthand “SRS”) is a sample of size n that is chosen from the population in such a way that every set of n individuals has the same chance of being chosen. Let’s think about this. Suppose there are 100 students in your class. It would be a lot of work, but we could list all the possible samples of 5 students. If we were to form our sample by alternating between male and female students from five different rows, then we would not have selected a simple random sample. For example, one possible sample is five males all in the same row, but the chance of selecting that sample under this scheme is zero, which is not the same as the chance of selecting the other samples of size 5. The sample selected might seem “fair” in a sense, but it is not an SRS. The best way to think about selecting a simple random sample is to imagine “mixing up” your population, reaching in with your eyes closed, and pulling out one subject at a time until you have your sample of size n. This ends up satisfying the mathematical definition of an SRS.

Slide 7
Understanding the role of sampling in statistical inference is largely a mathematical task. However, rarely will you be able to physically mix up your population and draw out a sample of size n, as described on the last slide.
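The “mix up and draw” picture can at least be imitated on a computer. The sketch below is only an illustration, not part of the workbook: it assumes a made-up toy population of five subjects labeled A to E, lists every possible sample of size 2, and then draws many samples with Python's `random.sample` to see whether each possible sample turns up about equally often, as the definition of an SRS requires. It also counts the possible samples of 5 students from the class of 100 mentioned on Slide 6.

```python
import itertools
import math
import random
from collections import Counter

# Hypothetical toy population: five labeled subjects (an assumption for illustration).
population = ["A", "B", "C", "D", "E"]
n = 2

# Every possible sample of size n, ignoring the order of selection.
all_samples = list(itertools.combinations(population, n))
print(len(all_samples), "possible samples of size", n)

# "Mix up and draw": random.sample picks n distinct subjects without replacement.
rng = random.Random(0)  # fixed seed so the run is repeatable
draws = 20_000
counts = Counter(
    tuple(sorted(rng.sample(population, n))) for _ in range(draws)
)

# Under an SRS, each possible sample should appear in roughly 1/10 of the draws.
for sample in all_samples:
    print(sample, round(counts[sample] / draws, 3))

# The class from Slide 6: 100 students, samples of size 5.
print(math.comb(100, 5), "possible samples of 5 students from 100")
```

With five subjects there are 10 possible samples of size 2, so each should appear close to 10% of the time. For the class of 100 there are 75,287,520 possible samples of size 5, which is why listing them all “would be a lot of work.”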
Fortunately, there are many ways of actually taking a sample of size n if you are able to label the objects in your population and have access to those labels (house numbers, Social Security numbers, etc.). One useful tool is “Research Randomizer.” To use Research Randomizer, you need your population of N objects numbered 1 to N. Then it is easy to identify a sample of size n.

Slide 8
Let’s revisit the main idea of this video. Critical to being able to say something mathematically meaningful about your population parameter – which you are likely to never know for certain – is having computed your statistic from a sample that was chosen the right probabilistic way, such as a simple random sample. That critical first step, together with some really neat mathematics, allows some very useful statements to be made about how much confidence you have in your parameter estimate.

Slide 9
Keep in mind that a correct or fair sample in this probabilistic sense is not a cross-sectional sample. That is, you are not guaranteed that a simple random sample will be a cross section of the population on all features you deem important (race, income, education level, etc.). Let’s look at a famous example. Often called the world’s most famous newspaper error, the banner proclaiming “Dewey Defeats Truman” was plastered across the front page of the Chicago Tribune on November 3, 1948, the day after the incumbent President, Harry Truman, defeated New York Governor Thomas Dewey by a popular-vote margin of 50% to 45%. There are several reasons the Tribune got it wrong, including going to press early. But several polling organizations also got it wrong, including Gallup (which predicted 44% for Truman to 50% for Dewey), Crossley, and Roper. What happened to the polls is generally thought of as a stark reminder that cross-sectional samples can’t do for you what probabilistic samples can, especially since you’ll never know if they were actually an accurate cross-section. For example, in St.
Louis a Gallup poll interviewer was required to interview 13 subjects, of whom:
exactly 6 were to live in the suburbs and 7 in the central city;
exactly 7 were to be men and 6 women;
of the 7 men, 3 were to be under 40, and 1 was to be black and 6 white;
the rental prices paid had to be: 1 at $44.01 or more, 3 at $18.01 to $44.00, and 2 under $18.00.
This was very nicely thought out in one sense, but it clearly missed the mark ultimately. More importantly, samples like these, which were typical in the early days of polling, don’t allow any mathematical statements to be made about the integrity of the sample-based estimates of the parameters of interest.

Slide 10
This concludes our video on the language and techniques of sampling. Remember, a probabilistic sample like an SRS is a critical step in estimating a population parameter with a sample statistic.
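As a companion to Slides 4 and 7, here is a minimal sketch of drawing an SRS from a population whose members are numbered 1 to N, and of the difference between a parameter and a statistic. Everything in it is assumed for illustration: the population of 5,000 students is simulated, the roughly 54% “yes” rate is an arbitrary choice, and Python's `random.sample` merely stands in for a tool such as Research Randomizer (it is not that tool's own method).

```python
import random

# Hypothetical population of N = 5,000 students, labeled 1..N.
# Each value is a made-up "yes" (True) or "no" (False) answer;
# the roughly 54% "yes" rate is an arbitrary choice for illustration.
rng = random.Random(2013)  # fixed seed so the run is repeatable
N = 5_000
answers = {label: rng.random() < 0.54 for label in range(1, N + 1)}

# Parameter: the true proportion of "yes" in the whole population.
# In practice this is unknown; it can be computed here only because
# the population is simulated.
parameter = sum(answers.values()) / N

# Simple random sample: n labels drawn without replacement from 1..N.
n = 150
sample_labels = rng.sample(range(1, N + 1), n)

# Statistic: the proportion of "yes" among the sampled subjects.
statistic = sum(answers[label] for label in sample_labels) / n

print(f"parameter (population proportion): {parameter:.3f}")
print(f"statistic (sample proportion):     {statistic:.3f}")
```

Rerunning with a different seed gives a different sample and therefore a different statistic, while the parameter of a fixed population does not change. That is the S-S and P-P distinction in action.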