Chapter 20: Chance Errors In Sampling A simple random sample is like drawing at random, without replacement, from a box of tickets. In practice, we use a computer to generate a simple random sample because it’s quite difficult to properly randomize paper tickets. A simple random sample is affected by chance variability. Example: a population of 6,672 Americans, age 18 to 79 comprised 46% men (M) and 54% women (F). A simple random sample of 100 people from this population gave 51 men and 49 women: Why didn’t we get 46% men? Does this mean there something wrong with the computer? Or is there something wrong with our sample? A closer look: 250 simple random samples from the same population as before: Histograms for the percentage of men in the sample. For bigger samples, the sample percentage is less variable. Sample size 100 Sample size 400 What can we say about the percentage of 1’s in the draws? The expected value for the % of 1s in the draws is EV% = the % of 1’s in the box = EVsum number of draws Population percentage 100% The standard error for the % of 1s in the draws is SE% = SEsum number of draws 100% Example 1. In a population of 100,000 people, we know that 20,000 are minorities. I take a simple random sample of 400 people from this population. a) How many minorities do I expect to get in my sample? Give-or-take how many? b) What percentage of minorities do I expect to get in my sample? Give-or-take what percentage? Example 2. (See Example 1). a) Find the SE% for a sample of size 100. b) Find the SE% for a sample of size 1600. Multiplying the sample size by some factor, divides the SE% by the SQUARE ROOT of the factor. c) Find the SE% for a sample of size 3600. Example 3. If I take a simple random sample of 100 people and get SE% = 5%, how many people do I need to sample to make SE% = 1%? If the sum of the draws follows the normal curve, so does the percentage of 1’s in the draws. Probability histogram for the sum Probability histograms for the sum and the percentage for 400 draws from the box: 4 0 1 Probability histogram for the percentage For a large number of draws, the % of 1s in the draws will follow the normal curve with average EV% and standard deviation SE%. In particular, 68% of the time the % of 1s in the draws will be between EV% – SE% and EV% + SE%. 95% of the time the % of 1s in the draws will be between EV% – 2 (SE%) and EV% + 2(SE%). We can use the normal curve to find the chance that the % of 1s in the draws is in a region of interest. We use EV% and SE% to get standard units. Example 4. Refer to example 1. a) Would you be surprised to hear that I got 75 minorities in my sample? b) Would you be surprised to hear I got 15% minorities in my sample? Example 5. In a certain town, 25% of college students own an automobile. If we take a simple random sample of 400 of these students, what is the chance that less than 20% of the sampled students own an automobile? Sample size and Accuracy For simple random samples, the accuracy is determined by the sample size itself, not the sample size relative to the population size. E.g. If we want the same accuracy in Salt Lake City and Logan, we should sample the same number of people in Salt Lake City and Logan. The rule is true if the sample size is small compared to the population size – sampling 20,000 people in Logan would be a bit more accurate than sampling 20,000 in Salt Lake City. The rule assumes the thing being estimated is not too different in the two places. Example 6. I’m interested in estimating the percentage of people who watch Channel 5 News. In each case, will the Logan or Salt Lake City sample be more accurate? a) A simple random sample of 500 Logan residents and a simple random sample of 500 Salt Lake City residents. b) A simple random sample of 500 Logan residents and a simple random sample of 750 Salt Lake City residents. c) A simple random sample of 5% of all Logan residents and a simple random sample of 5% of all Salt Lake City residents.