Chapter 19 - Sample Surveys
PART VI : SAMPLING
Dr. Joseph Brennan
Math 148, BU
Parameters & Statistics
Parameter: A numerical fact about a population.
The average age of all eligible voters.
The percentage of eligible voters registered to vote.
The percentage of adults retired/employed/students.
As many populations are too large for a full census to be feasible, sampling
is used to estimate a parameter.
Statistic: A numerical fact about a sample. A statistic is used to
estimate a parameter.
An investigator knows a statistic and wants to know a parameter.
Bias
Recall that bias error is not random but affects the data in a single
direction.
When a selection procedure is biased, taking a larger sample does not
lessen the bias. One is just repeating the same basic mistake on a larger
scale.
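The point can be illustrated with a small simulation. This is only a sketch under invented assumptions: the true share (60%) and the reach probabilities are hypothetical numbers, not data from any real poll. Supporters are half as likely to be reached as everyone else, and the poll is run at two sample sizes.

```python
import random

def biased_poll(n, rng, true_share=0.60, reach_supporter=0.5, reach_other=1.0):
    """Collect n responses when supporters are harder to reach (hypothetical rates)."""
    supporters = 0
    collected = 0
    while collected < n:
        is_supporter = rng.random() < true_share
        reach = reach_supporter if is_supporter else reach_other
        if rng.random() < reach:          # this person ends up in the sample
            collected += 1
            supporters += is_supporter
    return supporters / n

rng = random.Random(0)
small = biased_poll(1_000, rng)      # modest sample
large = biased_poll(100_000, rng)    # 100 times larger, same procedure

print(f"true share 0.60, small poll {small:.3f}, large poll {large:.3f}")
```

Under these assumptions both polls settle near 0.30 / 0.70, roughly 43%, rather than 60%. The larger sample only pins down the wrong number more precisely.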
Example: Roosevelt and Landon
In 1936, Franklin Delano Roosevelt and Alfred Landon were in competition
for the American presidency.
The Literary Digest magazine predicted Roosevelt would poll at 43% on
election day.
The Literary Digest had accurately predicted the winner of every election
since 1916.
The 1936 prediction was based upon a sample of 2.4 million voters, an
exceptionally large opinion poll, especially before the advent of the internet.
Example: Roosevelt and Landon
Roosevelt won with a majority of 62%. The Literary
Digest was off by 19 percentage points!
The Literary Digest folded after the 1936 election
and is best-remembered today for the circumstances
surrounding its demise.
Example: Roosevelt and Landon
The Literary Digest compiled its 2.4 million responses (today's typical
political survey asks between 500 and 1000 respondents) from a total of 10
million questionnaires. The response rate was low: 24%.
The 10 million names for the questionnaires were pulled from telephone
books and car registrations.
In 1936 only 1 in 4 American households had telephones and far fewer
owned a car. Remember that 1936 was the height of the Great Depression.
The Literary Digest’s error can be partially attributed to sampling bias.
During the depression, in an election split ideologically on economic policy,
the Literary Digest’s questionnaire was biased against the poor.
Bias
Sampling Bias: A bias in which a sample is collected in such a way that
some members of the intended population are less likely to be included
than others.
The bias can lead to an over- or underestimate of the
corresponding parameter in the population.
Almost every sample in practice is biased because it is practically
impossible to ensure a perfectly random sample.
Non-response Bias: A bias that results when respondents differ in
meaningful ways from nonrespondents.
Nonresponse is often a problem with mail surveys, where the response
rate can be very low.
With a response rate of 24%, the Literary Digest’s error can be
partially attributed to nonresponse bias.
Respondents and nonrespondents can differ in ways beyond their
willingness to answer a questionnaire.
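As a rough illustration of how nonresponse alone can distort a result, the sketch below computes what the respondents look like when two groups answer at different rates. The group share and response rates are hypothetical, chosen only so that the overall response rate comes out at 24%. They are not the Literary Digest's actual figures.

```python
def respondent_share(share_a, response_a, response_b):
    """Share of group A among respondents, plus the overall response rate.

    share_a:    fraction of the mailed population in group A
    response_a: fraction of group A that returns the questionnaire
    response_b: fraction of everyone else that returns it
    """
    responders_a = share_a * response_a
    responders_b = (1 - share_a) * response_b
    response_rate = responders_a + responders_b
    return responders_a / response_rate, response_rate

# Hypothetical: group A is 60% of the population but answers less often.
share_among_respondents, rate = respondent_share(0.60, 0.20, 0.30)
print(f"response rate {rate:.0%}, group A among respondents {share_among_respondents:.0%}")
```

In this made-up case group A is 60% of the population but only half of the respondents, even though every questionnaire was mailed without favoritism.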
Quota Sampling
Quota Sampling: A sampling method in which interviewers are
assigned a fixed quota of subjects to interview.
Example: An interviewer during the 1948 presidential election is sent to
St. Louis to interview 13 subjects:
6 from the suburbs and 7 from the city.
7 men and 6 women.
Of the 7 men:
3 are to be under 40 and 4 are to be over 40 years old.
1 is to be black and 6 are to be white.
Of the 6 white men, concerning their rent:
1 pays more than $44.
3 pay between $18 and $44.
2 pay less than $18.
Example: Truman and Dewey
In the 1948 presidential election, the three major polls sampled using
quota sampling.
The polls gave Dewey a 5 percentage point lead over Truman; people were
very confident of Dewey’s eventual victory.
Quota Sampling
Quota Sampling is inherently flawed in that unintentional bias is
introduced by the interviewers.
The point of a quota sample is to find a sample which accurately
represents the population's diversity subject to known parameters.
The quotas are an indirect way to force a sampling distribution, though
this control can be misguided. There are many factors that may influence
the results of a sample besides the ones a survey organization controls for.
Example: Truman and Dewey
In the 1948 election, the Gallup Poll’s quota sampling had a Republican
bias.
For a number of confounding reasons, interviewers chose too many
Republicans. Within enough demographic groups, the Republicans were
marginally easier to interview.
Probability Methods
Probability Methods: Sampling techniques which implement an
objective chance process to choose subjects from the population, leaving
no discretion to the interviewer.
It is possible to compute the chance that any particular individual in
the population will get into the sample.
Simple Random Sampling: A sampling technique where selection of
individuals is equally likely and drawing for the sample is performed
without replacement.
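A simple random sample can be sketched with Python's standard library. The population of 100 ID numbers and the sample size of 10 are hypothetical. `random.sample` draws without replacement, so no individual appears twice, and each individual has the same chance (here 10/100) of being included.

```python
import random

population = list(range(1, 101))   # hypothetical ID numbers 1..100
rng = random.Random(148)           # seeded only so the draw is repeatable

# Draw 10 individuals without replacement; every subset of size 10 is equally likely.
sample = rng.sample(population, k=10)

inclusion_chance = len(sample) / len(population)
print(sorted(sample), inclusion_chance)
```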
Probability Methods
National elections have a population (eligible voters) of approximately 200
million individuals. To frame an opinion poll where all individuals have an
equally likely chance of being polled is not a simple task.
Between 1952 and 1984, Gallup used a technique called Multistage
Cluster Sampling:
Probability Methods
Gallup’s Multistage Cluster Sampling:
Divide the country into four geographical regions, each having similar
population sizes.
In each region, group together all the population centers of similar
size and take a random sample.
For election purposes, population centers are divided into wards and
wards may be further divided into precincts. Among the chosen
population centers, take a random sample of wards and precincts.
Among these samples, take a random sample of households.
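The stages above can be sketched as nested random draws. This is a simplified illustration, not Gallup's actual procedure: the sampling frame (4 regions, 5 towns each, 4 wards per town, 20 households per ward) and the number drawn at each stage are invented, and the grouping of population centers by size and the precinct level are omitted.

```python
import random

def build_country():
    """Hypothetical frame: 4 regions x 5 towns x 4 wards x 20 households."""
    return {
        f"region{r}": {
            f"town{r}.{t}": {
                f"ward{r}.{t}.{w}": [f"hh{r}.{t}.{w}.{h}" for h in range(20)]
                for w in range(4)
            }
            for t in range(5)
        }
        for r in range(4)
    }

def multistage_sample(country, rng, towns_per_region=2,
                      wards_per_town=2, households_per_ward=3):
    """Random towns in each region, then random wards, then random households."""
    chosen = []
    for towns in country.values():
        for town in rng.sample(sorted(towns), towns_per_region):
            wards = towns[town]
            for ward in rng.sample(sorted(wards), wards_per_town):
                chosen.extend(rng.sample(wards[ward], households_per_ward))
    return chosen

households = multistage_sample(build_country(), random.Random(1984))
print(len(households), households[:3])
```

Every choice is made by the random number generator, so nothing is left to an interviewer's judgment, and interviewers only need to travel to the selected wards.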
Probability Methods
Interviewers sent to individual households are not given discretion as to
who they poll. Interviewers may be instructed to:
“Speak with the youngest male eligible voter, or if none are available,
speak with the oldest female eligible voter.”
This method of questioning is constructed like a quota sample: it is
constructed so that the sample distribution and the population distribution
are similar for a certain set of parameters.
The difference between this probability method and quota sampling is the
lack of discretion an interviewer is allowed.
Gallup’s Probability Methods
Using probability methods, Gallup has been able to reduce sample sizes
and increase accuracy.
Gallup’s Probability Methods
The best probability methods minimize bias by being impartial and
objective in choosing samples.
Section 6 of this chapter has an in-depth analysis of the Gallup survey
from the presidential election of 1984. The goal of the survey is to infer,
from the sample, how the nation will vote.
However, the population is much more complicated than just eligible
voters, and should be amended to eligible voters who intend to vote.
Gallup’s Probability Methods
Gallup attempts to screen for nonvoters by having them answer questions:
1) How much thought have you given to the coming November
elections?
2) Have you ever voted in this precinct or district?
3) Where do people who live in this neighborhood go to vote?
4) In the previous election did things come up which kept you from
voting, or did you happen to vote? For whom?
Since not voting can have a negative stigma, the above questions can
screen for nonvoters without forcing people to admit that they do not
attempt to vote. Question 3 is not multiple choice and asks for a specific
location.
Polling nonvoters will introduce a bias into your sample. The preferences
of likely voters differ from preferences of likely nonvoters.
Gallup’s Probability Methods
Even with personal interviews, many subjects are missed. Since they tend
to be different from the subjects available for the interview, a
non-response bias is created. This bias is adjusted out by giving more
weight to subjects who were available but hard to get.
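A minimal sketch of such a weighting adjustment follows. The group sizes, preferences, and the weight of 2 are hypothetical. The slides do not give Gallup's actual weights, so this shows only the general idea of counting hard-to-reach respondents more heavily.

```python
def weighted_share(groups):
    """groups: list of (respondents, share_for_candidate, weight) tuples."""
    total_weight = sum(n * w for n, _, w in groups)
    return sum(n * w * p for n, p, w in groups) / total_weight

# Hypothetical interview results.
easy = (800, 0.50)   # reached on the first visit, 50% favor the candidate
hard = (200, 0.60)   # reached only after several callbacks, 60% favor

unweighted = weighted_share([(*easy, 1.0), (*hard, 1.0)])
weighted = weighted_share([(*easy, 1.0), (*hard, 2.0)])   # hard-to-get count double

print(f"unweighted {unweighted:.3f}, weighted {weighted:.3f}")
```

Doubling the weight of the hard-to-reach group moves the estimate toward their preferences, on the assumption that they resemble the subjects who were missed entirely.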
Estimating Parameters
When we infer a parameter (population) from a statistic (sample), our
estimate has an error described by:
estimate = parameter + bias + chance error
Chance error is sampling error and is natural and measurable for all
samples, as they are only part of the whole population.
Bias is non-sampling error, as this error arises from the way the sample was
drawn and is systematic.
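The decomposition can be made concrete with a simulation in which the parameter and the bias are known in advance. Both values below are hypothetical. Repeating the same biased poll many times separates the two error terms: the average estimate misses the parameter by the bias, and each individual poll scatters around that average by its chance error.

```python
import random
import statistics

PARAMETER = 0.60   # hypothetical true population share
BIAS = -0.05       # hypothetical systematic error of the selection procedure

def one_estimate(n, rng):
    """One poll of size n drawn by a procedure that is off by BIAS on average."""
    hits = sum(rng.random() < PARAMETER + BIAS for _ in range(n))
    return hits / n

rng = random.Random(19)
estimates = [one_estimate(1_000, rng) for _ in range(200)]

average_estimate = statistics.mean(estimates)
estimated_bias = average_estimate - PARAMETER            # does not shrink with n
chance_errors = [e - average_estimate for e in estimates]  # these average to zero

print(f"bias about {estimated_bias:+.3f}, "
      f"typical chance error {statistics.pstdev(estimates):.3f}")
```

Increasing the poll size shrinks the chance errors but leaves the estimated bias essentially unchanged.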