Chapter 20: Chance Errors In Sampling
A simple random sample is like drawing at
random, without replacement, from a box of
tickets.
In practice, we use a computer to generate a
simple random sample because it’s quite difficult to
properly randomize paper tickets.
A simple random sample is affected by chance
variability.
Example: A population of 6,672 Americans, ages 18 to 79,
comprised 46% men (M) and 54% women (F). A simple
random sample of 100 people from this population gave 51
men and 49 women.
Why didn’t we get 46% men?
Does this mean there is something wrong with the computer?
Or is there something wrong with our sample?
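A minimal sketch of how such a sample might be drawn on a computer, assuming Python's standard `random.sample`, which draws without replacement. The box is built to match the example: 46% of 6,672 is about 3,069 men (a rounding assumption). The seed and the function name are arbitrary choices for illustration.

```python
import random

# Build the "box": one ticket per person (counts are a rounding assumption).
N_PEOPLE = 6672
N_MEN = round(0.46 * N_PEOPLE)           # about 3,069 men
box = ["M"] * N_MEN + ["F"] * (N_PEOPLE - N_MEN)

def percent_men_in_sample(box, sample_size, rng):
    """Draw a simple random sample (without replacement) and return % men."""
    sample = rng.sample(box, sample_size)
    return 100 * sample.count("M") / sample_size

rng = random.Random(20)                   # arbitrary seed, for repeatability
for _ in range(5):
    print(percent_men_in_sample(box, 100, rng))
```

Each pass through the loop gives a slightly different percentage, usually near 46% but rarely exactly 46%. That is chance variability, not a fault in the computer or in the sample.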
A closer look: 250 simple random samples from the same
population as before:
[Figure: histograms for the percentage of men in the sample, one panel for sample size 100 and one for sample size 400.]
For bigger samples, the sample percentage is less variable.
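The comparison can be sketched in simulation. This assumes the same box as the example (about 3,069 men out of 6,672, a rounding assumption) and uses the SD of the 250 sample percentages as a rough measure of spread. The seed and names are illustrative.

```python
import random
import statistics

N_PEOPLE = 6672
N_MEN = round(0.46 * N_PEOPLE)
box = [1] * N_MEN + [0] * (N_PEOPLE - N_MEN)    # 1 = man, 0 = woman

def sample_percentages(box, sample_size, repetitions, rng):
    """Percentage of 1's in each of `repetitions` simple random samples."""
    return [100 * sum(rng.sample(box, sample_size)) / sample_size
            for _ in range(repetitions)]

rng = random.Random(0)                           # arbitrary seed
pct_100 = sample_percentages(box, 100, 250, rng)
pct_400 = sample_percentages(box, 400, 250, rng)

# Spread of the sample percentages at each sample size.
spread_100 = statistics.pstdev(pct_100)
spread_400 = statistics.pstdev(pct_400)
print(spread_100, spread_400)
```

The spread for samples of size 400 should come out at roughly half the spread for samples of size 100.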
What can we say about the percentage of 1’s in
the draws?
The expected value for the % of 1’s in the draws is
\[ \mathrm{EV\%} = \frac{\mathrm{EV_{sum}}}{\text{number of draws}} \times 100\% = \text{the \% of 1's in the box (the population percentage)} \]
The standard error for the % of 1’s in the draws is
\[ \mathrm{SE\%} = \frac{\mathrm{SE_{sum}}}{\text{number of draws}} \times 100\% \]
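A minimal sketch of the two formulas for a 0-1 box. It assumes the usual square root law, SEsum = sqrt(number of draws) × (SD of the box), and the shortcut SD = sqrt(p(1 − p)) for a box with fraction p of 1's. Neither is stated on this slide, and the function name is illustrative. The draws are treated as if made with replacement, which is a reasonable approximation when the sample is a small part of the population.

```python
import math

def ev_and_se_percent(fraction_of_ones, number_of_draws):
    """Return (EV%, SE%) for the percentage of 1's in the draws."""
    sd_box = math.sqrt(fraction_of_ones * (1 - fraction_of_ones))
    ev_sum = number_of_draws * fraction_of_ones
    se_sum = math.sqrt(number_of_draws) * sd_box      # square root law
    ev_percent = ev_sum / number_of_draws * 100
    se_percent = se_sum / number_of_draws * 100
    return ev_percent, se_percent

# The box from the earlier example: 46% 1's, 100 draws.
print(ev_and_se_percent(0.46, 100))
```

For that box, EV% is 46% and SE% is about 5%, so a sample with 51% men is only about one SE% above the expected value.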
Example 1. In a population of 100,000 people, we know
that 20,000 are minorities. I take a simple random
sample of 400 people from this population.
a) How many minorities do I expect to get in my sample?
Give-or-take how many?
b) What percentage of minorities do I expect to get in my
sample? Give-or-take what percentage?
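One way to work Example 1, sketched under two assumptions that the slide does not state: the SD of a 0-1 box is sqrt(p(1 − p)), and the 400 draws can be treated as if made with replacement because 400 is a tiny part of 100,000. The box has 20,000 / 100,000 = 20% 1's.

```latex
\begin{align*}
\text{SD of box} &= \sqrt{0.20 \times 0.80} = 0.4 \\
\mathrm{EV_{sum}} &= 400 \times 0.20 = 80, &
\mathrm{SE_{sum}} &= \sqrt{400} \times 0.4 = 8 \\
\mathrm{EV\%} &= \frac{80}{400} \times 100\% = 20\%, &
\mathrm{SE\%} &= \frac{8}{400} \times 100\% = 2\%
\end{align*}
```

So the sample should contain about 80 minorities, give or take 8 or so, which is 20% of the sample, give or take 2 percentage points or so.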
Example 2. (See Example 1).
a) Find the SE% for a sample of size 100.
b) Find the SE% for a sample of size 1600.
Multiplying the sample size by some factor divides
the SE% by the SQUARE ROOT of that factor.
c) Find the SE% for a sample of size 3600.
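The square root rule can be checked numerically. This sketch assumes the box from Example 1 (20% 1's), the shortcut SD = sqrt(p(1 − p)), and draws treated as if made with replacement. The function name is illustrative.

```python
import math

def se_percent(fraction_of_ones, sample_size):
    """SE% for the percentage of 1's in `sample_size` draws."""
    sd_box = math.sqrt(fraction_of_ones * (1 - fraction_of_ones))
    se_sum = math.sqrt(sample_size) * sd_box
    return se_sum / sample_size * 100

# Sample sizes from Examples 1 and 2.
for n in (100, 400, 1600, 3600):
    print(n, round(se_percent(0.20, n), 2))
```

The values come out near 4%, 2%, 1%, and 0.67%. Going from 100 to 1600 multiplies the sample size by 16 and divides the SE% by 4. Going from 100 to 3600 multiplies it by 36 and divides the SE% by 6.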
Example 3. If I take a simple random sample of 100 people
and get SE% = 5%, how many people do I need to sample
to make SE% = 1%?
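A sketch of the reasoning for Example 3, assuming the same population and a sample that stays small relative to it. The SE% has to shrink by a factor of 5, so the sample size has to grow by a factor of 5² = 25.

```latex
\[
\frac{5\%}{1\%} = 5 = \sqrt{25}
\quad\Longrightarrow\quad
\text{sample size} = 25 \times 100 = 2500
\]
```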
If the sum of the draws follows the normal
curve, so does the percentage of 1’s in the
draws.
Probability histograms for the sum and the percentage for 400 draws from the box: 4 0 1
[Figure: two panels, "Probability histogram for the sum" and "Probability histogram for the percentage".]
For a large number of draws, the % of 1s in the
draws will follow the normal curve with average
EV% and standard deviation SE%.
In particular,
68% of the time the % of 1s in the draws will be
between EV% – SE% and EV% + SE%.
95% of the time the % of 1s in the draws will be
between EV% – 2 (SE%) and EV% + 2(SE%).
We can use the normal curve to find the chance
that the % of 1s in the draws is in a region of
interest. We use EV% and SE% to get standard
units.
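A minimal sketch of the standard-units calculation, assuming the normal approximation applies. The normal curve area is computed with `math.erf` from the Python standard library. The helper names and the values EV% = 20 and SE% = 2 are illustrative.

```python
import math

def normal_cdf(z):
    """Area under the standard normal curve to the left of z."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def chance_between(low_percent, high_percent, ev_percent, se_percent):
    """Approximate chance that the % of 1's falls between low and high."""
    z_low = (low_percent - ev_percent) / se_percent      # standard units
    z_high = (high_percent - ev_percent) / se_percent
    return normal_cdf(z_high) - normal_cdf(z_low)

# Illustrative values: EV% = 20, SE% = 2.
print(chance_between(18, 22, 20, 2))    # within 1 SE% of EV%
print(chance_between(16, 24, 20, 2))    # within 2 SE% of EV%
```

The two printed chances are close to the 68% and 95% figures quoted above.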
Example 4. Refer to example 1.
a) Would you be surprised to hear that I got 75 minorities
in my sample?
b) Would you be surprised to hear I got 15% minorities in
my sample?
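One way to approach Example 4 is to convert each result to standard units. The sketch assumes the Example 1 setup (20% 1's, 400 draws), the shortcut SD = sqrt(p(1 − p)), and draws treated as if made with replacement. Variable names are illustrative.

```python
import math

FRACTION, DRAWS = 0.20, 400
sd_box = math.sqrt(FRACTION * (1 - FRACTION))

# Expected value and standard error for the count and for the percentage.
ev_sum, se_sum = DRAWS * FRACTION, math.sqrt(DRAWS) * sd_box
ev_pct, se_pct = 100 * ev_sum / DRAWS, 100 * se_sum / DRAWS

z_count = (75 - ev_sum) / se_sum       # a) 75 minorities
z_percent = (15 - ev_pct) / se_pct     # b) 15% minorities
print(round(z_count, 1), round(z_percent, 1))
```

A count of 75 is only about 0.6 SEs below the expected 80, so it is not surprising. A sample percentage of 15% is 2.5 SEs below the expected 20%, which would be surprising.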
Example 5. In a certain town, 25% of college students own
an automobile. If we take a simple random sample of 400
of these students, what is the chance that less than 20% of
the sampled students own an automobile?
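A sketch of one way to work Example 5 with the normal approximation. It assumes the shortcut SD = sqrt(p(1 − p)), draws treated as if made with replacement, and no continuity correction. The normal curve area is computed with `math.erf`.

```python
import math

def normal_cdf(z):
    """Area under the standard normal curve to the left of z."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

fraction, draws = 0.25, 400            # 25% own a car; sample of 400
sd_box = math.sqrt(fraction * (1 - fraction))
se_percent = math.sqrt(draws) * sd_box / draws * 100

z = (20 - 25) / se_percent             # 20% in standard units
chance = normal_cdf(z)                 # area to the left of z
print(round(se_percent, 2), round(z, 2), round(100 * chance, 1))
```

The SE% is a little over 2 percentage points, so 20% is about 2.3 SEs below the expected 25%, and the chance comes out at roughly 1%.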
Sample size and Accuracy
For simple random samples, the accuracy is
determined by the sample size itself, not the
sample size relative to the population size.
E.g., if we want the same accuracy in Salt Lake City and
Logan, we should sample the same number of people in
Salt Lake City and Logan.
The rule holds when the sample size is small compared to the
population size. Sampling 20,000 people in Logan would
be a bit more accurate than sampling 20,000 in Salt Lake
City, because 20,000 is a much larger share of Logan’s population.
The rule assumes the thing being estimated is not too
different in the two places.
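The first caveat can be made concrete with the correction factor for draws made without replacement, sqrt((N − n) / (N − 1)), a standard result that this slide does not state. The population sizes below are hypothetical round numbers, not actual figures for Logan or Salt Lake City, and the 50% box is likewise illustrative.

```python
import math

def se_percent_srs(fraction_of_ones, sample_size, population_size):
    """SE% for a simple random sample (drawn without replacement)."""
    sd_box = math.sqrt(fraction_of_ones * (1 - fraction_of_ones))
    se_with_replacement = sd_box / math.sqrt(sample_size) * 100
    correction = math.sqrt((population_size - sample_size)
                           / (population_size - 1))
    return correction * se_with_replacement

SMALL_TOWN, BIG_CITY = 50_000, 200_000    # hypothetical population sizes
for n in (500, 20_000):
    print(n,
          round(se_percent_srs(0.5, n, SMALL_TOWN), 3),
          round(se_percent_srs(0.5, n, BIG_CITY), 3))
```

With a sample of 500, the two SE% values are nearly identical, so population size hardly matters. With a sample of 20,000, the smaller population gives a noticeably smaller SE%.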
Example 6. I’m interested in estimating the percentage of
people who watch Channel 5 News. In each case, will
the Logan or Salt Lake City sample be more accurate?
a) A simple random sample of 500 Logan residents and a
simple random sample of 500 Salt Lake City residents.
b) A simple random sample of 500 Logan residents and a
simple random sample of 750 Salt Lake City residents.
c) A simple random sample of 5% of all Logan residents
and a simple random sample of 5% of all Salt Lake City
residents.