Survey Experiments: Past, Present, Future

2010 Annual Conference
Harvard Program in Survey Research
October 22, 2010
Survey Experiments:
Past, Present, Future
Thomas M. Guterbock
Director, Center for Survey Research
University of Virginia
Center for Survey Research
University of Virginia
Overview
• Why survey experiments are so cool
• Defining the survey experiment
• Methods vs. substance
• Scan of survey methods experiments
• Substantive experiments
• Key design issues in survey experiments
• Factorial (“vignette”) surveys
• Example: dirty bomb scenarios in the National Capital Region
• A look at the future of survey experiments
Why survey experiments are so cool!
Sample surveys:
• Generalizability
• External validity
Experiments:
• Valid causal inference
• Internal validity
Survey experiments:
• Generalizability
• Valid causal inference
• External & internal validity
The best of both worlds!
Two knowledge gaps
• Psych experimenters don’t always know a lot
about doing surveys
– Some don’t think sampling is very important
– They don’t think surveys measure things well
• Survey researchers don’t always know a lot about
experiments
– And they question the external validity or relevance of
many lab experiments
• Assumption: this audience, like the author, is more
likely to be in the latter group
What’s a survey experiment?
(and what’s not)
Experiments generally
• An intervention, treatment or stimulus is varied
– Subjects randomly assigned to treatment vs. control
• Outcomes are measured
• Because of random assignment, any variation in
outcomes can be attributed to the treatment
– Absent various threats to internal validity
• The ‘classic experiment’ involves pre- and posttests (measurements of outcome variables)
Survey experiments
• Systematically vary one or more elements of the
survey across subjects
• Usually do not include ‘pre-test’ measurement
– Thus, most survey experiments are not ‘classic’ in
design
– “Posttest-only control group design”
• Random assignment is critical to the design
Diagram of Basic Experimental Design
Source: Babbie textbook
An inclusive definition
It’s still a survey experiment even if:
• Sample is small
• Sample is not probability based
• Sample is not representative
• It’s done in a lab setting
• It’s only part of a pre-test for a survey project
• Any aspect of the survey protocol is varied
Large, probability-based samples do make the survey
experiment better!
What’s NOT a survey experiment
• General tinkering . . .
– “Let’s experiment!”
• One shot trial of a new method
• Mid-stream change in method
– No true randomization of cases when this happens
• Experiments that only use a survey to measure
outcomes, pre- or post-intervention
– But do not vary the survey itself
Is this classical experiment a survey experiment?
This is a questionnaire
But this is not a survey experiment
Source: Babbie textbook
Methods vs. Substance
A slippery distinction
Prevailing narrative . . .
• Methods experiments have been around a long
time
– Mostly split-ballot question wording experiments
• New trend is: use survey experiments for testing
theories about substantive social science problems
• The field is moving from methods experiments to
substantive experiments
– And from applied to basic research
. . .This is but a partial picture.
In fact . . .
• Methods experiments are burgeoning in number
• Methods experiments deal with far more than
question wording
• Some methods experiments are quite complex
• The line between ‘methods’ and ‘substantive’
research is increasingly blurred
– As theories are developed to explain variations in
survey response, methods experiments are used to test
these theories.
– The same theories may underlie some ‘substantive’
experimentation
Cross-cutting categories of survey experiments
• Applied, methodological: experiments to raise response rates
• Applied, substantive: message testing
• Basic, methodological: test of ‘leverage-salience’ theory
• Basic, substantive: test for ‘activation’ of racial hostility
Survey experiments
about survey methods
A quick scan of the landscape
What’s a methods experiment?
• Purpose: improve survey methods
– Lower the cost
– Deliver quicker results
– Increase usability
– Decrease survey error
• Decrease:
– Sampling error
– Coverage error
– Nonresponse error
– Measurement error
Independent variables (factors)
in methods experiments
• Mode comparisons
– Phone versus personal interviews
– Web versus mail
– Usually address several types of error
– Coverage, nonresponse, measurement
• Sampling and coverage experiments
– RDD versus Listed sample
– ABS versus area-probability
– Methods of selection within households
More factors . . .
• Unit non-response
– Dillman’s “Total Design” research
– Response rate research in mailed surveys
• Reminders, advance letters, stamps, length, color
– Response rate research for web surveys
• Paper reminders, progress indicators
– Advance letters to boost telephone response
– Cash incentives research
• Item non-response
– Arrows, visual design, skip instructions
Still more factors . . .
• Measurement error experiments
– Classic (and newer) experiments in
• Question wording
• Question sequencing
• Offering a “don’t know” response or not
• Question formats, response scales
• Unfolding questions
• Numbering, labeling of scales
• Cell phone versus landline interviewing
– Interviewer, context effects
• Race, gender of interviewer
• School versus home setting
• Conversational vs. structured interviewing
Outcomes measured in
methods experiments
• Response rates
– Completion, cooperation, refusal, mid-survey break-off rates
• Responses to the survey questions
– Level of reporting of sensitive behaviors
– % who say they “don’t know”
– % giving extreme responses, standard deviations
– Length of open-ends
• Data quality measures
– Rate of skip errors
– Missing responses
– Interview length
• Usability and cost measures
– Including results from para-data
In short . . .
The primary corpus of accepted research in survey
methods today is almost entirely based on:
Survey experiments
Substantive survey experiments
• Most notable in the field of race relations
– Cf. Sniderman, Gilens, Kuklinski, et al.
– “mere mention” experiment
– Unbalanced list technique
– Activation of racial identity affecting minority responses
• Movement spreading to other substantive areas
– but methods experiments are still more common
• TESS has fostered much experimentation
– Over 200 experiments by 100 researchers by 2007
– Won 2007 AAPOR Warren Mitofsky Innovators Award
• Factorial “vignette” technique—a long tradition
– (more on this later)
Design issues in
survey experiments
Split-ballot vs. within-subject
• The vast majority of survey experiments use Split-Sample
designs
– “Randomized Posttest/Control Group” design
– Statistical tests based on independent samples
– Needs a lot of cases; most surveys have plenty
• Some use within-subjects designs
– Greater statistical power (paired comparisons)
– But later responses may be influenced by earlier questions
• Factorial vignette surveys often combine these
– Vignettes vary across subjects
– But each subject scores several vignettes
Statistical power issues
• Power of a split-sample is greatest when cases are
evenly divided
– If groups are equal in size, critical value = ME × √2
– Example: N = 1200, split over two groups of 600 each.
• ME for each group = +/- 4%
• Critical value for contrast = 4% x 1.41 = +/- 5.6%
• Sometimes, control group needs to be larger
– To preserve comparability with earlier survey
– Because experimental treatment is expensive
• Many experiments use more than one treatment
• Are pre-tests big enough for an experiment?
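The arithmetic in the example above can be checked with a short sketch. It assumes simple random sampling, a 95% confidence level (z = 1.96), and the conservative p = 0.5; the function names are illustrative and not from the talk.

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """Half-width of a confidence interval for one proportion (assumes SRS)."""
    return z * math.sqrt(p * (1 - p) / n)

def critical_difference(n1, n2, p=0.5, z=1.96):
    """Smallest difference between two independent group proportions
    that reaches significance at the chosen z level."""
    return z * math.sqrt(p * (1 - p) / n1 + p * (1 - p) / n2)

me_group = margin_of_error(600)               # one half of N = 1200
crit_even = critical_difference(600, 600)     # even split
crit_uneven = critical_difference(900, 300)   # same N, uneven split

print(f"ME per group of 600:      +/- {me_group:.2%}")
print(f"Critical value, 600/600:  +/- {crit_even:.2%}")
print(f"Critical value, 900/300:  +/- {crit_uneven:.2%}")
```

With equal groups the critical value equals the per-group margin of error times the square root of 2, and the uneven 900/300 split needs a larger difference to reach significance, which is why an even division gives the most power.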
Randomization issues
• Full randomization between groups is crucial to the
experiment’s validity
• Difficult to get people to carry out randomization
– If possible, have the computer do it
• In CATI systems, don’t randomize within the interview
– Pre-assign all values and store in the sample database
• Be sure to track which group is which!
• Don’t confound experimental effects with interviewer
effects
– Randomize across interviewers
– Control for interviewer effects in analysis
– Keep interviewers blind to your hypotheses
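One way to follow the advice about pre-assigning values is to generate the assignments before fielding and load them into the sample file. The sketch below is only an illustration: the case IDs, condition labels, and seed are hypothetical, and a real CATI system would read the stored value rather than call this code during the interview.

```python
import random

def preassign_conditions(case_ids, conditions, seed=2010):
    """Return {case_id: condition} with groups as equal in size as possible.

    The seed is fixed so the assignment can be reproduced and audited later.
    """
    rng = random.Random(seed)
    # Cycle through the conditions to cover every case, then shuffle the pool.
    pool = [conditions[i % len(conditions)] for i in range(len(case_ids))]
    rng.shuffle(pool)
    return dict(zip(case_ids, pool))

# Hypothetical sample of 1,200 cases split into two groups.
sample = [f"case{idx:04d}" for idx in range(1, 1201)]
assignment = preassign_conditions(sample, ["control", "treatment"])

counts = {c: list(assignment.values()).count(c) for c in ("control", "treatment")}
print(counts)
```

Storing the condition alongside each case also makes it easy to track which group is which at the analysis stage.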
More design issues
• Lab experiment or field experiment
– Lab gives better control over background variables
– Usually lower cost
– Easier to do complex measurements
– Field experiments give greater realism, representativeness
• Better external validity
• “Packages” vs. factorial design
– Best design depends on study purposes
Factorial (vignette) surveys
(with thanks to the late
Steven L. Nock, my co-author)
Factorial surveys
• Substantive survey experiments about factors that
affect
– Judgments
– Decisions
– Evaluations
• These studies tell us:
– What elements of information enter into the judgment?
– How much weight does each element receive?
– How closely do people agree about the above?
More on factorial surveys . . .
• Respondents evaluate hypothetical situations or
objects, known as ‘vignettes.’
• Experimental stimuli: vignettes
• Outcomes of interest: judgments about the
vignettes
• Judgments will vary across the vignettes
– But also across respondents
Why factorial surveys are cool
• When values of factors are allocated
independently across vignettes, the factors are
uncorrelated.
– This simplifies modeling of the effects on judgments
• Factors are also independent of respondent
characteristics
• Respondents can be given quite complex vignettes
to consider
– Unusual combinations presented more easily as
vignettes than in the real world
Key design choices in factorial surveys
• How many factors? How many values?
– More factors make the respondent’s task more difficult
– More values on more factors yield larger number of
possible unique vignettes
– Phone surveys need simpler vignettes
• Example: in Nock’s study with 10 factors, and 2 to
9 values on each, there were 270,000 possible
vignettes
– These are sampled (by SRS) and randomly assigned
across respondents
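A rough sketch of how vignettes might be drawn from such a universe follows. The number of levels per factor is hypothetical: the values were chosen only so that there are 10 factors with 2 to 9 levels each and a product of 270,000, and they are not the actual factors in Nock's study. Drawing each factor independently and uniformly is equivalent to simple random sampling (with replacement) from the full set of vignettes.

```python
import math
import random

# HYPOTHETICAL level counts for 10 factors (2 to 9 levels each).
LEVELS_PER_FACTOR = [2, 2, 2, 2, 3, 9, 5, 5, 5, 5]

def universe_size(levels):
    """Number of unique vignettes in a full factorial crossing."""
    return math.prod(levels)

def draw_vignette(levels, rng):
    """Pick one level per factor, independently and uniformly at random."""
    return tuple(rng.randrange(k) for k in levels)

def assign_vignettes(respondent_ids, levels, per_respondent, seed=1):
    """Give each respondent their own random set of vignettes."""
    rng = random.Random(seed)
    return {
        rid: [draw_vignette(levels, rng) for _ in range(per_respondent)]
        for rid in respondent_ids
    }

deck = assign_vignettes(range(1, 501), LEVELS_PER_FACTOR, per_respondent=5)
print(universe_size(LEVELS_PER_FACTOR))
print(deck[1][0])  # first vignette for respondent 1, as level indexes
```

Because every factor is drawn independently, the factors are uncorrelated in expectation, at the cost of occasionally producing implausible combinations.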
More design choices . . .
• Which vignettes to present?
– When there are a lot of vignettes, these must be sampled
– SRS across all values yields uncorrelated factors
– But distribution on some factors may not be like the real world
– Some randomly created vignettes are implausible
• The number to present to the respondent
– Need to avoid fatigue, boredom, and satisficing
• Later judgments may be more affected by just a few factors, to
which respondents become increasingly attentive
– This choice depends on mode, sample, type of respondent
• How many assessments?
– One judgment, or a series of questions about each vignette?
Another design choice
• What survey mode to use?
– Paper, self-administered is possible
• use Mail Merge to create unique set of vignettes on each
questionnaire
– Phone is possible
• But number of vignettes and number of factors is
restricted due to oral administration
– Face-to-face with paper vignettes
– CASI (with interviewer guidance)
– Internet
Analysis can be complex
• If 500 respondents each rate 5 vignettes . . .
– Then 2,500 vignettes are rated
– Data must be converted to a vignette file of n = 2,500
– But: ratings are not independent!
– Each respondent is a ‘cluster’ of interdependent ratings
• Solution:
– Multi-level analysis
– Analyze models using HLM
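The conversion step can be sketched as follows, using simulated data. The record layout, the single `hazard` factor, and the intraclass correlation of 0.3 are all assumptions for illustration. The design effect uses Kish's approximation, 1 + (m - 1) × ICC, to show roughly how much clustering within respondents shrinks the effective sample size; it is not a substitute for fitting a multi-level model.

```python
import random

def to_vignette_file(respondents):
    """Flatten respondent records into one row per rated vignette."""
    rows = []
    for resp in respondents:
        for position, item in enumerate(resp["vignettes"], start=1):
            rows.append({
                "resp_id": resp["resp_id"],  # cluster identifier for multi-level models
                "position": position,        # order in which the vignette was rated
                "hazard": item["hazard"],
                "rating": item["rating"],
            })
    return rows

def kish_design_effect(cluster_size, icc):
    """Approximate variance inflation from clustering (Kish)."""
    return 1 + (cluster_size - 1) * icc

# Simulated input: 500 respondents, 5 rated vignettes each.
rng = random.Random(0)
respondents = [
    {"resp_id": rid,
     "vignettes": [{"hazard": rng.choice(["min", "mod", "max"]),
                    "rating": rng.randint(1, 7)} for _ in range(5)]}
    for rid in range(1, 501)
]

vignette_file = to_vignette_file(respondents)
deff = kish_design_effect(cluster_size=5, icc=0.3)  # ICC value is an assumption
effective_n = len(vignette_file) / deff
print(len(vignette_file), round(deff, 2), round(effective_n))
```

Keeping `resp_id` on every row is what allows a multi-level model to treat each respondent as a cluster.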
Example of a
factorial survey
2009 Survey of
Behavioral Aspects of
Sheltering and Evacuation
in the National Capital
Region
Sponsors:
Virginia Dept. of Emergency
Management
U.S. Dept. of Homeland Security
Features of the Survey
• In-depth survey: average interview length 28 minutes
• Data collection using CATI (Computer-Assisted Telephone Interviewing)
• 2,657 interviews conducted by CSR, Sept-Dec 2009
• Triple-frame sample design includes cellphones, landline RDD and listed phones
– Fully supported Spanish language interviews as needed
– Inclusion of cellphones increases representativeness
• Margin of error: +/- 2.3 percentage points
• Weighting by ownership, race, gender, geography, and type of telephone service
Event Scenarios
• Focus: dirty bomb(s) in the NCR
• Will residents decide to stay or to go?
• 3 scenarios at increasing hazard levels: minimum, moderate, maximum
• Respondent is presented with only two of the three tested scenarios
– Over 5,000 scenario tests in the study
Factorial Design

• Four aspects (“factors”) of the scenarios were experimentally varied using random assignment
– PATH: Which two hazard levels are asked
– NOTICE: Whether the event is preceded by prior notice or threats
– LOCATION: The respondent’s location when the event occurs
– SOURCE: The source of the information about the event
• Notice, location, and source are kept constant for both scenarios asked
Three Levels of Hazard
Minimum
• 1 bomb far away
• Cloud in “that area”
• Wind blowing away from you
• People in “that area” should shelter – no instructions for you
Moderate
• 1 bomb 1 mile away
• Cloud in “your community”
• Wind blowing towards you
• People in “your community” should shelter
Maximum
• Multiple bombs far away + 1 bomb 1 mile away
• Cloud in “your community”
• Wind blowing towards you
• People in “your community” should shelter
“Far away” defined by respondent’s location
• In Virginia: minimum = near the University of Maryland in College Park, Maryland; maximum = in DC and Maryland
• In DC: minimum = in Tysons Corner, Virginia; maximum = in Virginia and Maryland
• In Maryland: minimum = in Tysons Corner, Virginia; maximum = in DC and Virginia
Factors – two levels of NOTICE
No Prior Notice
• No additional language added to the scenario
Prior Notice Given
• “Please imagine that three days ago in London a bomb exploded and was confirmed to be a dirty bomb, and two days ago a bomb exploded in New York and was also confirmed to be a dirty bomb…Now please imagine that yesterday [source of message] reported that a terrorist group had claimed responsibility for these bombs and said that another bomb would go off in the Washington, D.C. area. Because of that, yesterday the threat level in the Washington Metro Area was raised to the highest level.”
Factors – two levels of LOCATION
At Home
• Respondent asked to imagine being at home during the day when event occurs
Not At Home
• If the respondent is employed and primarily works indoors, the location is at work (if respondent works nights they are asked to imagine being in the building during the day)
• If the respondent is not employed or does not primarily work indoors, the location is a building that is not home and is a location where the respondent sometimes is on weekdays
Factors – four levels of SOURCE
• Emergency manager: “The local emergency manager”
• Fire chief: “The local fire chief”
• Local chief administrative officer: “A top local official”
• Governor/mayor
– If respondent’s location when the event occurs is in VA or MD: “The Governor”
– If respondent’s location when the event occurs is in DC: “The Mayor”
The four factors result in 48 different possible versions of the scenario, randomly assigned.
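The count of 48 follows from crossing the four factors: 3 × 2 × 2 × 4. The sketch below enumerates the versions and assigns one at random to each interview. It assumes PATH consists of the three possible pairs of hazard levels; the labels and the seed are illustrative, not the study's actual CATI variable codes.

```python
import itertools
import random

# Factor levels as described on the slides; PATH pairs are an assumption
# (three ways to pick two of the three hazard levels).
FACTORS = {
    "PATH": [("minimum", "moderate"), ("minimum", "maximum"), ("moderate", "maximum")],
    "NOTICE": ["no prior notice", "prior notice given"],
    "LOCATION": ["at home", "not at home"],
    "SOURCE": ["emergency manager", "fire chief", "top local official", "governor/mayor"],
}

# Full factorial crossing: one dict per scenario version.
VERSIONS = [dict(zip(FACTORS, combo)) for combo in itertools.product(*FACTORS.values())]

def assign_version(rng):
    """Pick one scenario version uniformly at random."""
    return rng.choice(VERSIONS)

rng = random.Random(2009)  # fixed seed so the illustration is reproducible
assigned = [assign_version(rng) for _ in range(2657)]  # one per completed interview

print(len(VERSIONS))
print(assigned[0])
```

Since NOTICE, LOCATION, and SOURCE are drawn once per respondent, both scenarios asked of that respondent share the same values on those three factors.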
Detailed Follow-up Questions


• Follow-up questions were asked about the decision to shelter in place or evacuate, as appropriate (once only)
• Shelter in place detail
– Willingness to remain at location, reasons for leaving, what would aid staying
• Evacuation detail
– Reason for leaving, destination, mode of travel, needs, use of designated route
• Mandatory evacuation: everyone was asked evacuation detail eventually
Perception of Hazard
“What is your perception of the risk of death or serious injury to you or members of your household from this event?”
Percent who perceive “High Risk” or “Very High Risk” (by hazard level)
• Maximum: 59%
• Moderate: 47%
• Minimum: 24%
Population Sheltering and
Evacuation Behaviors
Will They Stay or Will They Go?
Shelter-in-Place or Evacuation
“Based on this information, would you stay at HOME, would you leave immediately to go somewhere else or would you continue with your activities?”
Scenario Location: Home (percent, by hazard level)
                            Min    Mod    Max
Stay at home                71.5   77.4   77.7
Leave immediately           15.8   17.1   17.1
Continue with activities     6.6     --     --
Something else               6.1    5.5    5.2
Shelter-in-Place or Evacuation
(cont.)
“Based on this information, would you stay at WORK, would you leave immediately to go somewhere else or would you continue with your activities?”
Scenario Location: Work or Other Building (percent, by hazard level)
                            Min    Mod    Max
Stay at work                40.6   68.9   71.2
Go home                     32.9   11.1    4.7
Go to another place         10.8   10.3   17.7
Continue with activities     3.8     --     --
Something else              11.8    9.7    6.4
Factors Affecting Behavioral
Response
Notice of Event
Location when event occurs: Home
At home: In the minimum scenario, prior notice has a significant effect on the decision to stay or go
Notice of Event
Location when event occurs: Work or Other Building
At work: Prior notice has no significant effect
Source of Message to Shelter-in-Place
(Effect on ‘leave immediately’)
Percentage of Respondents Who Would Leave Immediately
• Governor or DC Mayor: 19%
• A Top Official: 24%
• Local Fire Chief: 21%
• Local Emergency Manager: 24%
Compliance with shelter in place instruction is highest when the source of information is the State Governor or Mayor of DC
Gender of Respondent
[Event occurs while at home]
Percentage of Respondents Who Would Leave Immediately (two bars per hazard level; legend: Male, Female)
• Maximum: 13% and 22%
• Moderate: 12% and 22%
• Minimum: 13% and 19%
At home, gender effect is significant for all three scenarios.
When event occurs while at work/another building, gender effect is significant in two of three scenarios.
Summary Findings From Scenarios:


• Percentage of people who would leave their home immediately is not large
• Many people will leave their place of work if the event is far away (‘minimal hazard’)
– Most of these will head to their homes
• The scenarios with greater ‘hazard’ did raise perception of risk
– But the rates of leaving are similar for moderate and maximum hazards
• Higher education, prior positive experience in an emergency, female gender also increase sheltering compliance
• Still to come: multivariate analysis using HLM
Closing thoughts
and a look to the future
Are survey experiments
externally valid?
• Survey experiments help to establish external
validity
– Because they are carried out on broadly representative
populations
• But answering a survey question or judging a
vignette is not necessarily a ‘real world’ test
– External validity is not assured by the design
• External validity isn’t a problem for applied
survey methods experiments
– The survey itself is the ‘real world’ setting for the
behavior of interest
Survey experiments aren’t free
• Full-scale stand-alone survey experiments are expensive
– Factorial designs are hungry for cases
• Adding a small experiment to an existing survey costs less
• But the added experiment does increase costs at every step
– Design, sample creation, programming
– Interviewer training, sample management
– Data entry, analysis, reporting, project management
• Split-ballot wording experiments on existing items reduce
statistical power of the original question
– Asked in the control group only, smaller n
We need more survey experiments
• Most questions used in most surveys have never
been subjected to rigorous testing in experiments
– Substantial improvements in measurement might be
achievable through more experimentation
• Despite small n’s and low power, testing of
questions in pre-tests is potentially useful to the
practitioner
– Let’s do more pre-test experiments!
• Possibilities for substantive research are boundless
New technologies are changing
our survey experiments
• Computerization has made experimentation easier
in every mode (CAPI, CATI, CASI, Web)
• Capture of para-data sheds new light on outcomes
• New multimedia tools offer enhanced possibilities
for presenting experimental stimuli
• The Internet allows experimenters to reach outside
the ‘subject pool’ to the general public
– But not always using probability sampling
The ‘knowledge gaps’ are closing . . .
• Survey experiments are increasing in number and
sophistication
– Survey researchers learning more about experiments
• Behavioral scientists moving more of their
experiments to the Internet
– Seeking larger, more representative samples
• The traditional lines between survey research and
social science experiments are blurring further . . .
. . . to the mutual benefit of both!
You’ve seen the movie . . .
now read the book!
Steven L. Nock and Thomas M. Guterbock
“Survey Experiments.”
Chapter in James Wright and Peter Marsden, eds., Handbook
of Survey Research, Second Edition. Wiley Interscience,
2010.