Data Collection

Surveys
FPP Chapter 19
General Idea
[Diagram: a sample is drawn from the population; a statistic computed from the sample is used to make an inference about a population parameter]
Some new vocabulary
 Population
 Sample
 Parameter
 Statistic
 Inference
 Bias
 Non-response bias
 Response bias
 Simple random sample
 Convenience sampling
 Frame coverage bias
 Judgment sampling
 Voluntary sampling
 Probably others that I’ve missed
Plan of Study
1. Issues in questionnaire design
2. Methods for selecting units to survey
3. Administration of surveys
Challenges to writing good questions
1. Defining objectives and specifying the kind of answers needed to meet the objectives of the question
2. Ensuring all respondents have a shared, common understanding of the question
3. Ensuring people are asked questions to which they know the answers
4. Asking questions respondents are able to answer in the terms required by the question
5. Asking questions respondents are willing to answer accurately
6. Asking questions that don’t lead the respondent to a certain answer
Steps to running a survey
1. Establish the target population
2. Obtain a sampling frame, the list of units from which the sample will be drawn (this can be very difficult)
3. Select a sample
4. Obtain data from the sampled units
Misspecifying target population
 1994 Democratic gubernatorial primary in Arizona
 All polls predicted that Eddie Basha would trail the front-runner by at least 9 points
 Result of the election: Basha won
 Target population used in the polls: registered voters who had voted in previous primaries
Surveys that use the census as a sampling frame
 The U.S. census is often used as the frame for many federal and social surveys
 The target population here is folks living in the U.S.
 The U.S. census misses some people
 Can you think of any examples?
 So the frame is non-representative even before sampling, and samples taken from it inherit this problem
Selecting samples
 The units sampled should be representative of the target population
 How do we ensure this?
 Select a subset of units from the frame at random
 The most common method is to obtain a “simple random sample”, in which every subset of units of the chosen size is equally likely to be selected
 If the random sample is large enough, its characteristics should mirror those of the population in the frame.
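As a minimal sketch of drawing a simple random sample, the snippet below assumes the frame is just a Python list of units. The frame, its size, the 30% trait rate, and the function name `simple_random_sample` are all hypothetical. Python's `random.Random.sample` draws without replacement and makes every size-n subset equally likely, which is the defining property of an SRS.

```python
import random


def simple_random_sample(frame, n, seed=None):
    """Draw n distinct units from the frame without replacement.

    random.Random.sample gives every subset of size n the same chance
    of being chosen, which is what makes this a simple random sample.
    """
    rng = random.Random(seed)
    return rng.sample(frame, n)


# Hypothetical frame: 10,000 units identified by ID; units 0-2,999
# (30% of the frame) have some trait of interest.
frame = [(unit_id, unit_id < 3_000) for unit_id in range(10_000)]

sample = simple_random_sample(frame, 1_000, seed=1)

p_frame = sum(has_trait for _, has_trait in frame) / len(frame)
p_sample = sum(has_trait for _, has_trait in sample) / len(sample)
print(f"Proportion with trait in frame:  {p_frame:.3f}")
print(f"Proportion with trait in sample: {p_sample:.3f}")
```

With 1,000 units drawn, the sample proportion typically lands within a few percentage points of the frame's 30%. Note that it mirrors the frame, not necessarily the target population.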
Obtaining survey data
 Remember the following when designing a survey
 It is imperative that the purpose of the survey be stated clearly
 Confidentiality should be promised and kept
 At ISU there is a group that checks that survey confidentiality requirements are met
 The method for asking questions should be the same for all sampled units
Unreliable methods of selecting samples
 What follows are examples of how NOT to select a sample
 Convenience sampling:
 Picking units that are easy to measure
 Judgment sampling:
 Picking units you judge as representative of the population
 Voluntary response sampling:
 Picking units who respond voluntarily
 What are some examples of each?
Additional potential pitfalls
 Nonresponse bias:
 Units that do not respond differ from those that do, so their views are under-represented.
 Frame coverage bias:
 The frame doesn’t include all of the target population
 Can we think of some examples?
Example of voluntary response survey
 Nightline call-in poll:
Ted Koppel asked people to call his show to express their opinion on whether the United Nations should continue to have its headquarters in New York
186,000 people called in, and 67% said no.
An independent random sample found 72% saying yes.
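The mechanism behind a voluntary response survey can be sketched with a small simulation. Every number below is a hypothetical assumption chosen only to illustrate self-selection, not Nightline data: 70% of a made-up population would say "yes", but "no" holders are assumed to call in six times as often. The call-in poll then reports a "no" majority while a simple random sample tracks the population.

```python
import random

rng = random.Random(7)

# Hypothetical population of 100,000 people: 70% would answer "yes".
population = ["yes"] * 70_000 + ["no"] * 30_000

# Hypothetical call-in rates (assumptions, not real poll data): people who
# answer "no" are assumed to feel more strongly and call six times as often.
call_rate = {"yes": 0.05, "no": 0.30}

# Voluntary response: each person decides for themselves whether to call.
callers = [answer for answer in population if rng.random() < call_rate[answer]]

# For comparison: a simple random sample of 1,000 people.
srs = rng.sample(population, 1_000)


def share(responses, answer):
    """Fraction of responses equal to the given answer."""
    return responses.count(answer) / len(responses)


print(f"Call-in poll:  {share(callers, 'no'):.0%} say no ({len(callers):,} callers)")
print(f"Random sample: {share(srs, 'yes'):.0%} say yes")
```

Under these assumed rates the call-in poll has more than ten times as many responses as the random sample, yet gets the answer badly wrong: about 72% "no" when 70% of the population would say "yes".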
Examples of problematic survey designs
 Shere Hite’s book, Women and Love: A Cultural Revolution in Progress (1987), claims:
 84% of women “not satisfied emotionally with their relationships” (pg. 804)
 95% of women “report forms of emotional and psychological harassment from men with whom they are in love relationships” (pg. 810)
 70% of women “married five or more years are having sex outside of their marriages” (pg. 856)
Hite’s survey
 To whom did she send a survey?
 100,000 questionnaires mailed to professional women’s groups, counseling centers, church societies, and senior citizens’ centers.
 Her target population was women. What was her actual population?
Hite’s survey
 What did the survey look like?
 127 essay questions on the questionnaire
 Only 4.5% of these questionnaires were returned (about 4,500)
 What was not taken into account?
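One thing not taken into account is nonresponse. As a rough sketch, assuming the 84% figure is a share of the women who returned the questionnaire, worst-case bounds show how little a 4.5% response rate pins down: nonrespondents are treated as completely unknown (they could all answer either way). The bounds apply only to the women who were mailed a questionnaire, not to women in general, and the function name `worst_case_bounds` is hypothetical.

```python
def worst_case_bounds(p_respondents, response_rate):
    """Range for the proportion among everyone surveyed when nothing is
    known about nonrespondents (they could all say "no" or all say "yes")."""
    observed = p_respondents * response_rate  # share of all surveyed who said "yes"
    missing = 1 - response_rate               # share whose answer is unknown
    return observed, observed + missing


low, high = worst_case_bounds(0.84, 0.045)
print(f"Between {low:.1%} and {high:.1%} of the women mailed the questionnaire")
```

With 95.5% of answers missing, the data are consistent with almost any value, which is why the response rate matters as much as the headline percentage.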
Hite’s survey
 How did she ask the questions?
 Questions use vague words like “love”.
 People have different interpretations of such words
 Questions were leading
 “Does your husband/lover treat you as an equal? Or are there times when he seems to you as an inferior? Leave you out of decisions? Act superior?” (pg. 795)
Another problematic survey design
 The article “Abortion Rights Groups Surveying Voters’ Views”, by Jack Coffman, appeared in the December 26, 1989 issue of the St. Paul Pioneer Press Dispatch.
 Problems with Minnesota survey
Random sampling comment 1
 Say you collect data on units using a method other than random sampling, and you know these data are not representative of the population of interest. Then you take a random sample from these collected data. Claim: this random sample is representative of the population.
 Wrongo!!
 Large random samples are representative of the population in the frame.
 Effectively, this method uses the unrepresentative collected data as the frame.
 By randomly sampling from an unrepresentative sample, you just get a smaller unrepresentative sample.
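A quick simulation makes the point concrete. The setup is hypothetical: half of a made-up population has some trait, but the non-random collection method is assumed to pick up units with the trait four times as often. A random sample drawn from the collected data reproduces the collected data's bias, not the population's value.

```python
import random

rng = random.Random(2024)

# Hypothetical population of 100,000 units; exactly half have the trait (1).
population = [1] * 50_000 + [0] * 50_000

# Hypothetical non-random collection: units with the trait are assumed to be
# four times as likely to end up in the collected data (e.g., easier to reach).
collected = [u for u in population if rng.random() < (0.20 if u == 1 else 0.05)]

# Now take a simple random sample from the collected data, which acts as the frame.
subsample = rng.sample(collected, 1_000)


def proportion(units):
    """Fraction of units that have the trait."""
    return sum(units) / len(units)


print(f"Population:       {proportion(population):.3f}")
print(f"Collected data:   {proportion(collected):.3f}")
print(f"Random subsample: {proportion(subsample):.3f}")
```

The collected data are about 80% "trait" under these assumptions, and so is the random subsample; randomizing the last step cannot undo the bias introduced by the first.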
Random sampling comment 2
 Say you obtain data that are representative of the target population. Should you take a random sample from these collected data?
 This question arises when researchers use data collected by others, for example in a Stat 101 project.
 No!
 If you have a representative sample, use it.
 This sub-sampling method just reduces the amount of data you work with, which makes your estimates less precise.
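The cost of sub-sampling shows up in the standard error. As a sketch, using the usual large-population approximation for the standard error of a sample proportion, SE = sqrt(p(1 - p)/n), and a hypothetical data set of 2,000 cut down to a 10% sub-sample of 200:

```python
import math


def se_proportion(p, n):
    """Approximate standard error of a sample proportion from a simple
    random sample of size n (large population, no finite-population correction)."""
    return math.sqrt(p * (1 - p) / n)


p_hat = 0.5             # hypothetical sample proportion
full, sub = 2_000, 200  # hypothetical full data set and a 10% sub-sample

for n in (full, sub):
    print(f"n = {n:>5}: SE is about {se_proportion(p_hat, n):.4f}")

# Cutting n by a factor of 10 inflates the SE by sqrt(10), about 3.16.
print(f"Ratio: {se_proportion(p_hat, sub) / se_proportion(p_hat, full):.2f}")
```

Throwing away 90% of a representative sample makes the estimate roughly three times less precise while buying nothing in representativeness.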
Random sampling comment 3
 A census is a measurement of outcomes for all units in the population. For example, the U.S. government conducts a census of the population every 10 years to apportion seats in the House of Representatives. It also takes censuses of agriculture and business.
 Why do a survey instead of a census?
 Surveys are cheaper
 They require contacting far fewer people
 Survey results can be obtained more quickly
 Same reason as above
 This is important because we want to make policy decisions based on current answers, not answers that are months or years old.
 Surveys can be more accurate
 With fewer people to contact, there are fewer problems with interviewer effects and non-response bias
 Upshot: a smaller amount of high-quality data is better than a larger amount of poor-quality data
Random sampling comment 4
 Most major surveys are not simple random samples
 They involve multiple stages of random selection
 e.g., randomly pick 100 cities; from these cities, randomly pick 500 households; then randomly pick 1 person from each household
 Data collected this way are NOT representative of the population as they stand. However, because units are selected randomly, statisticians can account for the non-representation.
 This is done by assigning each observation a weight that reflects how many units it represents in the population
 A good question to ask here: Where do the weights come from?
 Generally, when analyzing data from surveys that are not simple random samples, it is wise to consult a professional statistician
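One common answer to where the weights come from: each unit's weight is the inverse of its probability of being selected, and in a multistage design that probability is the product of the selection probabilities at each stage. The sketch below assumes a hypothetical three-stage design (city, household, person) with made-up probabilities for four respondents; the helper names are hypothetical. It illustrates inverse-probability weighting only and ignores standard errors, nonresponse adjustments, and the other refinements a professional analysis would include.

```python
import math


def selection_probability(stage_probs):
    """Overall inclusion probability in a multistage design: the product of
    the (conditional) selection probabilities at each stage."""
    return math.prod(stage_probs)


def weighted_mean(values, weights):
    """Mean in which each observation counts as many times as its weight."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# Hypothetical respondents: (answer coded 0/1, number of adults in household,
# stage probabilities [city, household within city]).
respondents = [
    (1, 1, [0.10, 0.01]),
    (0, 4, [0.10, 0.01]),
    (1, 2, [0.10, 0.02]),
    (0, 3, [0.10, 0.02]),
]

answers = [a for a, _, _ in respondents]
# Stage 3: one adult is picked per household, with probability 1 / (adults).
weights = [1 / selection_probability(s + [1 / adults]) for _, adults, s in respondents]

print("Weights:        ", [round(w) for w in weights])
print("Unweighted mean:", sum(answers) / len(answers))
print("Weighted mean:  ", round(weighted_mean(answers, weights), 3))
```

A person in a four-adult household had a quarter of the chance of being picked as someone living alone in the same kind of household, so their answer is weighted four times as heavily; here that pulls the estimate from 0.5 down to about 0.267.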