Stats: Modeling the World

Chapter 12
Sample Surveys
The WHO of our data….
The population of interest is the entire group
of people/things that we wish to study.
Since we want to know
things about the population,
we need to figure out how
to gather that data!!
Let’s “sample” them all!!
A census is a “sample” of the entire
population.
How many “G’s”?
Problems with a Census
So why doesn’t a census always work?
- difficult or impractical to complete
- populations shift in their demographics
- too complex in terms of time and budget
Taking a Survey
Often, we ask questions of a small group
(called a sample) in the hope of learning
something about the entire population…
These are called opinion polls or surveys
The ultimate question….
Does every sample represent the
population fairly?
Literary Digest Poll…
The Moral of the Story…
If your sample was chosen in a poor
manner, it doesn’t matter how many
people you surveyed, bad data will still
produce bad results.
The phrase Garbage In – Garbage
Out applies.
In order to draw valid conclusions, you need
a sample (no matter the size) that well
represents the population!
Getting a Representative Sample…
Making sure that, on average, the sample looks
like the rest of the population allows us to draw
conclusions based on our data.
A small sample, IF it
is chosen correctly,
can represent the
entire population!
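A quick simulation can illustrate the point. This is only a sketch with made-up numbers: the population, the "phone owner" group, and the support rates are all assumptions chosen for illustration, not data from any real poll.

```python
import random
from statistics import mean

rng = random.Random(1936)

# Hypothetical population of 100,000 voters: 1 = supports candidate A, 0 = does not.
# Assumption: phone owners (about 30% of people) support A at 70%,
# everyone else supports A at 40%.
population = []
for _ in range(100_000):
    has_phone = rng.random() < 0.30
    supports_a = rng.random() < (0.70 if has_phone else 0.40)
    population.append((has_phone, int(supports_a)))

true_share = mean(v for _, v in population)

# Huge but badly chosen sample: only phone owners can be reached.
phone_owners = [v for has_phone, v in population if has_phone]
biased_sample = rng.sample(phone_owners, 20_000)

# Small simple random sample drawn from everyone.
srs = [v for _, v in rng.sample(population, 500)]

print(f"true share supporting A:      {true_share:.3f}")
print(f"biased sample of 20,000:      {mean(biased_sample):.3f}")
print(f"random sample of 500:         {mean(srs):.3f}")
```

In this sketch the sample of 20,000 misses badly because of who was left out, while the random sample of 500 lands close to the true share.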
Parameters vs Statistics
A population parameter is a value that
describes the entire population.
This value is rarely known, and often cannot be known
exactly because the population keeps changing.
Our goal is usually to estimate the parameter.
A sample statistic is a value that is found
from the sample data.
We use the sample to estimate the parameter.
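The difference between a parameter and a statistic can be sketched in a few lines of Python. The population below is made up (1,000 hypothetical grocery bills), so in this toy case we can actually compute the parameter; in real life we usually cannot.

```python
import random
from statistics import mean

rng = random.Random(12)

# Hypothetical population: 1,000 made-up grocery bills in dollars.
population = [round(rng.uniform(20, 130), 2) for _ in range(1000)]

# Parameter: describes the whole population (usually unknown in practice).
parameter = mean(population)

# Statistic: computed from a sample and used to estimate the parameter.
sample = rng.sample(population, 25)
statistic = mean(sample)

print(f"parameter (population mean): {parameter:.2f}")
print(f"statistic (sample mean):     {statistic:.2f}")
```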
Try it!!
This summer, when I went to the grocery store, I
kept track of my receipts to help with budgeting. I
wrote down how much I spent at Smith’s during
June and July. On average, I spent $75.98.
a) What is the parameter I'm trying to estimate?
b) If Smith's took a sample of customers and
checked their receipts, what parameter is Smith's
trying to estimate?
So how do we gather statistics?
Picking a sample at random protects us
from the influences of all the features of our
population, even ones that we may not have
thought of.
Statistical sampling uses random chance,
not human choice!!
Jelly Blubbers…
Materials needed…
- JellyBlubber colony
- Ruler
- Calculator
- Data Sheet
JellyBlubbers…
What is our population of interest?
What is the population parameter?
Judgmental Sample
Select 5 Jelly Blubbers that, in your
judgment, are representative of the
population of Jelly Blubbers. Record the
lengths of your five Jelly Blubbers in
millimeters and find the average.
Plot your average on the whiteboard…
Simple Random Sample (SRS)
A Simple Random Sample has two requirements…
- every individual has an equal chance of being selected and
- every possible group of the same size has an equal
chance of being selected.
To select an SRS
“Label and Table”
1) Assign numbers to each of the subjects
2) Use a random table to select the sample.
Table Example:
89810 48512 90174 02687 83117
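Here is one way "Label and Table" could be carried out on the row above. This is a sketch that assumes 100 subjects labeled 00 to 99, reads the digits two at a time, and skips repeats and out-of-range labels; the function name `select_from_table` is made up for this example.

```python
def select_from_table(digits, population_size, sample_size):
    """Read a random digit table in fixed-width groups and return the chosen labels."""
    digits = digits.replace(" ", "")
    width = len(str(population_size - 1))  # 100 subjects -> labels 00..99 -> width 2
    chosen = []
    for i in range(0, len(digits) - width + 1, width):
        label = int(digits[i:i + width])
        # Skip labels that match no subject, and skip repeats.
        if label < population_size and label not in chosen:
            chosen.append(label)
        if len(chosen) == sample_size:
            break
    return chosen

row = "89810 48512 90174 02687 83117"
print(select_from_table(row, population_size=100, sample_size=5))
# -> [89, 81, 4, 85, 12]   (the 4 is the label "04")
```

If the row runs out before the sample is full, this sketch simply returns fewer labels; with a real table you would continue on the next line.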
Simple Random Sample
Use the 1st line of your random number table
to choose 5 JellyBlubbers and measure the
length in mm. Find the average length and
plot it on the board.
Advantages?
Disadvantages?
Simple Random Sample:
Advantages:
Disadvantages:
Systematic Sample
A systematic sample selects every nth object from a list.
This is useful when you believe that the order of the list
will not affect the results of your survey.
To get a systematic sample:
1) Determine a starting place using a random table
2) From your starting place, sample every nth
object on the list.
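The two steps above can be sketched in Python. Assumptions: the list is a hypothetical roster of 30 numbered subjects, and Python's random generator stands in for the random number table when picking the starting place.

```python
import random

def systematic_sample(items, sample_size, rng):
    """Pick a random start within the first interval, then take every k-th item."""
    k = len(items) // sample_size   # sampling interval (the "n" in "every nth")
    start = rng.randrange(k)        # random starting position among the first k items
    return [items[start + i * k] for i in range(sample_size)]

roster = list(range(1, 31))  # hypothetical list of 30 numbered subjects
print(systematic_sample(roster, 6, random.Random(7)))
```

With 30 subjects and a sample of 6, the interval is 5, so the chosen subjects are always 5 apart on the list.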
Systematic Random Sample
Since we have 100 JB’s and we want a
sample of 5, we need to count every _____th
JB. Use the 2nd line of your RNT to choose
a JB as a starting place.
Advantages?
Disadvantages?
Cluster Sample
A cluster sample involves splitting the population
into subgroups.
This is useful when you think all subgroups are
pretty similar and each group will adequately
represent the population. (ALL from SOME)
To get a cluster sample:
1) Split your population into heterogeneous
groups, called clusters
2) Use an SRS to determine which
cluster(s) to sample
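A minimal Python sketch of these steps, assuming a made-up population of 30 subjects already split into 6 clusters of 5. The cluster names and the function `cluster_sample` are hypothetical.

```python
import random

def cluster_sample(clusters, n_clusters, rng):
    """Randomly choose whole clusters, then keep everyone in the chosen clusters."""
    chosen = rng.sample(sorted(clusters), n_clusters)   # SRS of cluster names
    sample = [member for name in chosen for member in clusters[name]]
    return chosen, sample

# Hypothetical clusters: "group 0" holds subjects 0-4, "group 1" holds 5-9, and so on.
clusters = {f"group {g}": list(range(g * 5, g * 5 + 5)) for g in range(6)}

chosen, sample = cluster_sample(clusters, 2, random.Random(3))
print(chosen)
print(sample)
```

Notice that randomness is used only to pick the clusters; inside a chosen cluster, everyone is measured (ALL from SOME).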
Cluster Random Sample
Using the 3rd line of your RNT, pick a
random JB to measure. Then choose the
two JBs before it and the two JBs after it and
measure those too. Plot your average!
Advantages?
Disadvantages?
Stratified Random Sample
A stratified sample is more complicated than the
others. It also involves splitting the population into
subgroups.
This is useful when you think certain characteristics
may be an influence in the data. (SOME from ALL)
To get a stratified sample:
1) Split your population into homogeneous
groups, called strata
2) Within each stratum, use an SRS to determine
who is sampled
3) Combine the results from each stratum
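The steps above might look like this in Python. The two strata ("small" and "large") and the function `stratified_sample` are made up for illustration; the sketch takes the same number from each stratum.

```python
import random

def stratified_sample(strata, per_stratum, rng):
    """Take an SRS of the same size from every stratum and combine the results."""
    sample = []
    for name in sorted(strata):
        sample.extend(rng.sample(strata[name], per_stratum))
    return sample

# Hypothetical strata: subjects 0-49 are "small", subjects 50-99 are "large".
strata = {"small": list(range(0, 50)), "large": list(range(50, 100))}

print(stratified_sample(strata, 3, random.Random(5)))
```

Every stratum is guaranteed to appear in the sample (SOME from ALL), which an SRS of the whole population cannot promise.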
Stratified Random Sample
Using the 4th and 5th lines of your RNT, pick
a random JB to measure from each stratum.
Plot your average!
Advantages?
Disadvantages?
About the Lab…
Did we all get the same results each time?
Does each graph look alike?
Which one does the best job of predicting JB
length? Why?
Other ways to sample….
Multistage Sampling
Voluntary Response Sampling
Convenience Sampling
Try it!!
What kind of sampling method is used?
a) We want to know what percentage of local doctors accept
Medicare patients. We call the offices of 50 doctors randomly
selected from the local Yellow Pages.
b) We want to know what percentage of local businesses
anticipate hiring additional employees in the upcoming month.
We randomly select a page in the Yellow Pages and call
every business on the page.
c) We want to know if students at our school are satisfied with
the food available on campus. We go to the cafeteria and
interview every 10th person in line.
Watch out for Bias
Bias means that something about the
sample’s design has systematically
distorted the result so that the sample
does not reflect or even approximate
reality.
There is usually no way to fix a biased
sample and no way to salvage useful
information from it.
Problems to watch for…
Sometimes the sampling frame (the list
from which we sample) is difficult to
obtain or even to define. This creates a
problem because the people who are left
out of the list may differ from the people on
the list.
Problems to watch for…
Many samples suffer from a bias called
undercoverage, in which some portion of the
population is not sampled at all or has a smaller
representation in the sample than it has in the
actual population.
Problems to watch for…
A major issue in sampling is nonresponse bias, where
someone who is chosen for the sample cannot be contacted
or refuses to cooperate. The problem is that those who don’t
respond may differ from those who did respond.
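A small simulation shows why this matters. Everything here is assumed for illustration: the satisfaction scores are made up, and so is the rule that unhappy students respond far more often than everyone else.

```python
import random
from statistics import mean

rng = random.Random(2011)

# Hypothetical satisfaction scores (1-10) for 5,000 students.
scores = [rng.randint(1, 10) for _ in range(5000)]

# A random sample of 400 students is chosen for the survey.
chosen = rng.sample(scores, 400)

# Assumption: unhappy students (score 4 or below) respond 90% of the time;
# everyone else responds only 30% of the time.
responders = [s for s in chosen if rng.random() < (0.9 if s <= 4 else 0.3)]

print(f"mean of everyone chosen:   {mean(chosen):.2f}")
print(f"mean of those who replied: {mean(responders):.2f}")
print(f"response rate:             {len(responders) / len(chosen):.0%}")
```

Even though the 400 students were chosen at random, the replies in this sketch look gloomier than the full sample because of who chose to answer.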
Problems to watch for…
Another major issue for surveys is known as response
bias (not to be confused with nonresponse bias!). Response
bias refers to anything in the survey that influences the
responses, such as wanting to please the interviewer,
not wanting to answer personal or legal questions, etc.
Problems to watch for…
Watch out for the wording of the question
in a survey, as it can also influence the
responses. Asking a question with a
leading statement is a good way to bias
the responses.
How to combat bias…
Look for bias in any survey you encounter.
- If you are developing your own survey,
critique your survey before gathering data.
- Spend your resources and time trying to
reduce bias
- Pretest your survey so that you can
make changes before it is too late.
- Report your sampling method in detail!
POD #6
8/29/2011
Quiz Review
Chapter 2/12 Review… One Minute
Paper
For each topic below, write a few
sentences briefly summarizing the
concept. Then rate yourself on each
concept with Green (Good to Go!),
Yellow (Kind of shaky), or Red (Whoa!
Help me!)
• The W’s of Data
• Identifying the Population, Parameter,
Sample, and Statistic
• Identifying the type of sampling
method used
• Using a random number table to
POD #7
8/30/2011
1997 #27
For the other options, determine the
sampling method.
What about bias?