2010 Annual Conference, Harvard Program in Survey Research
October 22, 2010

Survey Experiments: Past, Present, Future
Thomas M. Guterbock
Director, Center for Survey Research, University of Virginia

Overview
• Why survey experiments are so cool
• Defining the survey experiment
• Methods vs. substance
• Scan of survey methods experiments
• Substantive experiments
• Key design issues in survey experiments
• Factorial ("vignette") surveys
• Example: dirty bomb scenarios in the National Capital Region
• A look at the future of survey experiments

Why survey experiments are so cool!
Sample surveys:
• Generalizability
• External validity
Experiments:
• Valid causal inference
• Internal validity
Survey experiments:
• Generalizability
• Valid causal inference
• External & internal validity
The best of both worlds!

Two knowledge gaps
• Psych experimenters don't always know a lot about doing surveys
  – Some don't think sampling is very important
  – They don't think surveys measure things well
• Survey researchers don't always know a lot about experiments
  – And they question the external validity or relevance of many lab experiments
• Assumption: this audience, like the author, is more likely to be in the latter group

What's a survey experiment? (and what's not)

Experiments generally
• An intervention, treatment, or stimulus is varied
  – Subjects are randomly assigned to treatment vs. control
• Outcomes are measured
• Because of random assignment, any variation in outcomes can be attributed to the treatment
  – Absent various threats to internal validity
• The 'classic experiment' involves pre- and post-tests (measurements of outcome variables)

Survey experiments
• Systematically vary one or more elements of the survey across subjects
• Usually do not include 'pre-test' measurement
  – Thus, most survey experiments are not 'classic' in design
  – "Posttest-only control group design"
• Random assignment is critical to the design

Diagram of Basic Experimental Design
(Diagram omitted. Source: Babbie textbook)

An inclusive definition
It's still a survey experiment even if:
• The sample is small
• The sample is not probability-based
• The sample is not representative
• It's done in a lab setting
• It's only part of a pre-test for a survey project
• Any aspect of the survey protocol is varied
Large, probability-based samples do make the survey experiment better!

What's NOT a survey experiment
• General tinkering . . .
  – "Let's experiment!"
• A one-shot trial of a new method
• A mid-stream change in method
  – No true randomization of cases when this happens
• Experiments that only use a survey to measure outcomes, pre- or post-intervention
  – But do not vary the survey itself

Is this classical experiment a survey experiment?
(Diagram omitted. Source: Babbie textbook. The instrument shown is a questionnaire, but this is not a survey experiment.)

Methods vs. Substance
A slippery distinction

Prevailing narrative . . .
• Methods experiments have been around a long time
  – Mostly split-ballot question-wording experiments
• The new trend: use survey experiments for testing theories about substantive social science problems
• The field is moving from methods experiments to substantive experiments
  – And from applied to basic research
. . . This is but a partial picture.

In fact . . .
• Methods experiments are burgeoning in number
• Methods experiments deal with far more than question wording
• Some methods experiments are quite complex
• The line between 'methods' and 'substantive' research is increasingly blurred
  – As theories are developed to explain variations in survey response, methods experiments are used to test these theories
  – The same theories may underlie some 'substantive' experimentation

Cross-cutting categories of survey experiments
          Methodological                         Substantive
Applied   Experiments to raise response rates    Message testing
Basic     Test of 'leverage-salience' theory     Test for 'activation' of racial hostility

Survey experiments about survey methods
A quick scan of the landscape

What's a methods experiment?
• Purpose: improve survey methods
  – Lower the cost
  – Deliver quicker results
  – Increase usability
  – Decrease survey error
• Decrease:
  – Sampling error
  – Coverage error
  – Nonresponse error
  – Measurement error

Independent variables (factors) in methods experiments
• Mode comparisons
  – Phone versus personal interviews
  – Web versus mail
  – Usually address several types of error: coverage, nonresponse, measurement
• Sampling and coverage experiments
  – RDD versus listed sample
  – ABS versus area-probability
  – Methods of selection within households

More factors . . .
• Unit non-response
  – Dillman's "Total Design" research
  – Response-rate research in mailed surveys
    • Reminders, advance letters, stamps, length, color
  – Response-rate research for web surveys
    • Paper reminders, progress indicators
  – Advance letters to boost telephone response
  – Cash-incentive research
• Item non-response
  – Arrows, visual design, skip instructions

Still more factors . . .
• Measurement-error experiments
  – Classic (and newer) experiments in
    • Question wording
    • Question sequencing
    • Offering a "don't know" response or not
    • Question formats, response scales
    • Unfolding questions
    • Numbering, labeling of scales
    • Cell phone versus landline interviewing
  – Interviewer and context effects
    • Race, gender of interviewer
    • School versus home setting
    • Conversational vs. structured interviewing

Outcomes measured in methods experiments
• Response rates
  – Completion, cooperation, refusal, and mid-survey break-off rates
• Responses to the survey questions
  – Level of reporting of sensitive behaviors
  – % who say they "don't know"
  – % giving extreme responses, standard deviations
  – Length of open-ends
• Data-quality measures
  – Rate of skip errors
  – Missing responses
  – Interview length
• Usability and cost measures
  – Including results from para-data

In short . . .
The primary corpus of accepted research in survey methods today is almost entirely based on: survey experiments.

Substantive survey experiments
• Most notable in the field of race relations
  – Cf. Sniderman, Gilens, Kuklinski, et al.
  – "Mere mention" experiment
  – Unbalanced-list technique
  – Activation of racial identity affecting minority responses
• The movement is spreading to other substantive areas
  – But methods experiments are still more common
• TESS has fostered much experimentation
  – Over 200 experiments by 100 researchers by 2007
  – Won the 2007 AAPOR Warren Mitofsky Innovators Award
• Factorial "vignette" technique – a long tradition
  – (more on this later)

Design issues in survey experiments

Split-ballot vs. within-subject
• The vast majority of survey experiments use split-sample designs
  – "Randomized posttest/control group" design
  – Statistical tests based on independent samples
  – Needs a lot of cases; most surveys have plenty
• Some use within-subjects designs
  – Greater statistical power (paired comparisons)
  – But later responses may be influenced by earlier questions
• Factorial vignette surveys often combine these
  – Vignettes vary across subjects
  – But each subject scores several vignettes

Statistical power issues
• Power of a split sample is greatest when cases are evenly divided
  – If groups are equal in size, the critical value for the contrast = ME × √2
  – Example: N = 1,200, split over two groups of 600 each
    • ME for each group = +/- 4%
    • Critical value for the contrast = 4% × 1.41 = +/- 5.6%
• Sometimes the control group needs to be larger
  – To preserve comparability with an earlier survey
  – Because the experimental treatment is expensive
• Many experiments use more than one treatment
• Are pre-tests big enough for an experiment?

Randomization issues
• Full randomization between groups is crucial to the experiment's validity
• It is difficult to get people to carry out randomization
  – If possible, have the computer do it
• In CATI systems, don't randomize within the interview
  – Pre-assign all values and store them in the sample database
• Be sure to track which group is which!
• Don't confound experimental effects with interviewer effects
  – Randomize across interviewers
  – Control for interviewer effects in analysis
  – Keep interviewers blind to your hypotheses

More design issues
• Lab experiment or field experiment?
  – Lab gives better control over background variables
  – Usually lower cost
  – Easier to do complex measurements
  – Field experiments give greater realism and representativeness
    • Better external validity
• "Packages" vs. factorial design
  – The best design depends on study purposes

Factorial (vignette) surveys
(with thanks to the late Steven L. Nock, my co-author)

Factorial surveys
• Substantive survey experiments about factors that affect
  – Judgments
  – Decisions
  – Evaluations
• These studies tell us:
  – What elements of information enter into the judgment?
  – How much weight does each element receive?
  – How closely do people agree about the above?

More on factorial surveys . . .
• Respondents evaluate hypothetical situations or objects, known as 'vignettes'
• Experimental stimuli: the vignettes
• Outcomes of interest: judgments about the vignettes
• Judgments will vary across the vignettes
  – But also across respondents

Why factorial surveys are cool
• When values of factors are allocated independently across vignettes, the factors are uncorrelated
  – This simplifies modeling of the effects on judgments
• Factors are also independent of respondent characteristics
• Respondents can be given quite complex vignettes to consider
  – Unusual combinations are presented more easily as vignettes than in the real world

Key design choices in factorial surveys
• How many factors? How many values?
  – More factors make the respondent's task more difficult
  – More values on more factors yield a larger number of possible unique vignettes
  – Phone surveys need simpler vignettes
• Example: in Nock's study with 10 factors, and 2 to 9 values on each, there were 270,000 possible vignettes
  – These are sampled (by SRS) and randomly assigned across respondents

More design choices . . .
• Which vignettes to present?
  – When there are a lot of vignettes, these must be sampled
  – SRS across all values yields uncorrelated factors
  – But the distribution on some factors may not be like the real world
  – Some randomly created vignettes are implausible
• How many vignettes to present to each respondent?
  – Need to avoid fatigue, boredom, and satisficing
    • Later judgments may be more affected by just a few factors, to which respondents become increasingly attentive
  – This choice depends on mode, sample, and type of respondent
• How many assessments?
  – One judgment, or a series of questions about each vignette?

Another design choice
• What survey mode to use?
  – Paper, self-administered is possible
    • Use Mail Merge to create a unique set of vignettes on each questionnaire
  – Phone is possible
    • But the number of vignettes and number of factors is restricted due to oral administration
  – Face-to-face with paper vignettes
  – CASI (with interviewer guidance)
  – Internet

Analysis can be complex
• If 500 respondents each rate 5 vignettes . . .
  – Then 2,500 vignettes are rated
  – Data must be converted to a vignette file of n = 2,500
  – But: the ratings are not independent!
  – Each respondent is a 'cluster' of interdependent ratings
• Solution:
  – Multi-level analysis
  – Analyze models using HLM

Example of a factorial survey

2009 Survey of Behavioral Aspects of Sheltering and Evacuation in the National Capital Region
Sponsors: Virginia Dept. of Emergency Management; U.S. Dept. of Homeland Security

Features of the Survey
• In-depth survey: average interview length 28 minutes
• Data collection using CATI (Computer-Assisted Telephone Interviewing)
• 2,657 interviews conducted by CSR, Sept-Dec 2009
• Triple-frame sample design includes cell phones, landline RDD, and listed phones
• Fully supported Spanish-language interviews as needed
• Inclusion of cell phones increases representativeness
• Margin of error: +/- 2.3 percentage points
• Weighting by ownership, race, gender, geography, and type of telephone service

Event Scenarios
• Focus: dirty bomb(s) in the NCR
• Will residents decide to stay or to go?
• 3 scenarios at increasing hazard levels: minimum, moderate, maximum
• Each respondent is presented with only two of the three tested scenarios
• Over 5,000 scenario tests in the study

Factorial Design
Four aspects ("factors") of the scenarios were experimentally varied using random assignment:
• PATH: Which two hazard levels are asked
• NOTICE: Whether the event is preceded by prior notice or threats
• LOCATION: The respondent's location when the event occurs
• SOURCE: The source of the information about the event
Notice, location, and source are kept constant for both scenarios asked.

Three Levels of Hazard
• Minimum: 1 bomb far away; cloud in "that area"; wind blowing away from you; people in "that area" should shelter – no instructions for you
• Moderate: 1 bomb 1 mile away; cloud in "your community"; wind blowing towards you; people in "your community" should shelter
• Maximum: multiple bombs far away + 1 bomb 1 mile away; cloud in "your community"; wind blowing towards you; people in "your community" should shelter

"Far away" definitions, by respondent's location:
• Near the University of Maryland in College Park, Maryland: minimum "far away" = in DC and Maryland; maximum "far away" = in DC
• In Tysons Corner, Virginia: minimum "far away" = in Virginia and Maryland; maximum "far away" = in Maryland
• In Tysons Corner, Virginia: minimum "far away" = in DC and Virginia; maximum "far away" = in Virginia

Factors – two levels of NOTICE
• No Prior Notice: no additional language added to the scenario
• Prior Notice Given: "Please imagine that three days ago in London a bomb exploded and was confirmed to be a dirty bomb, and two days ago a bomb exploded in New York and was also confirmed to be a dirty bomb…Now please imagine that yesterday [source of message] reported that a terrorist group had claimed responsibility for these bombs and said that another bomb would go off in the Washington, D.C. area. Because of that, yesterday the threat level in the Washington Metro Area was raised to the highest level."

Factors – two levels of LOCATION
• At Home: the respondent is asked to imagine being at home during the day when the event occurs
• Not At Home:
  – If the respondent is employed and primarily works indoors, the location is at work (if the respondent works nights, they are asked to imagine being in the building during the day)
  – If the respondent is not employed or does not primarily work indoors, the location is a building that is not home and is a location where the respondent sometimes is on weekdays

Factors – four levels of SOURCE
• Emergency manager: "The local emergency manager"
• Fire chief: "The local fire chief"
• Local chief administrative officer: "A top local official"
• Governor/mayor: if the respondent's location when the event occurs is in VA or MD, "The Governor"; if in DC, "The Mayor"

The four factors result in 48 different possible versions of the scenario, randomly assigned.
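The full factorial behind those 48 versions, and the pre-assignment of a version to each sample case before interviewing (per the earlier advice to pre-assign random values and store them in the sample database rather than randomizing within the CATI interview), can be sketched as follows. This is a minimal illustration, not the study's actual CATI code; the value labels, the seed, and the `assign_versions` helper are hypothetical.

```python
# Sketch: enumerate the 3 x 2 x 2 x 4 = 48 scenario versions and
# pre-assign one to every sample case before fieldwork begins.
import itertools
import random

# PATH: which two of the three hazard levels the respondent is asked (3 pairs)
paths = [("minimum", "moderate"), ("minimum", "maximum"), ("moderate", "maximum")]
notices = ["no_prior_notice", "prior_notice_given"]          # NOTICE
locations = ["at_home", "not_at_home"]                       # LOCATION
sources = ["emergency_manager", "fire_chief",                # SOURCE
           "top_local_official", "governor_or_mayor"]

# Full factorial crossing of the four factors: 48 unique versions
versions = list(itertools.product(paths, notices, locations, sources))

def assign_versions(case_ids, seed=2009):
    """Pre-assign one randomly chosen version to every sample case,
    so interviewers never have to carry out randomization by hand."""
    rng = random.Random(seed)  # fixed seed makes the assignment reproducible
    return {case_id: rng.choice(versions) for case_id in case_ids}

# e.g., one assignment per completed interview in the study
assignments = assign_versions(range(1, 2658))
```

Storing the resulting table in the sample database, keyed by case ID, also makes it easy to "track which group is which" at analysis time.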
Detailed Follow-up Questions
Follow-up questions were asked about the decision to shelter in place or evacuate, as appropriate (once only):
• Shelter-in-place detail: willingness to remain at location, reasons for leaving, what would aid staying
• Evacuation detail: reason for leaving, destination, mode of travel, needs, use of designated route
• Mandatory evacuation: everyone was asked the evacuation detail eventually

Perception of Hazard
"What is your perception of the risk of death or serious injury to you or members of your household from this event?"
Percent who perceive "High Risk" or "Very High Risk" (by hazard level):
• Maximum: 59%
• Moderate: 47%
• Minimum: 24%

Population Sheltering and Evacuation Behaviors
Will They Stay or Will They Go?

Shelter-in-Place or Evacuation
"Based on this information, would you stay at HOME, would you leave immediately to go somewhere else or would you continue with your activities?"
Scenario location: Home (percentages by hazard level, Min / Mod / Max)
• Stay at home: 71.5 / 77.4 / 77.7
• Leave immediately: 15.8 / 6.6 / 6.1
• Continue with activities: 17.1 / 17.1 / ---
• Something else: 5.5 / 5.2

Shelter-in-Place or Evacuation (cont.)
"Based on this information, would you stay at WORK, would you leave immediately to go somewhere else or would you continue with your activities?"
Scenario location: Work or Other Building (percentages by hazard level, Min / Mod / Max)
• Stay at work: 40.6 / 68.9 / 71.2
• Go home: 32.9 / 10.8 / 3.8
• Go to another place: 11.8 / 11.1 / 4.7
• Continue with activities: 10.3 / 17.7 / ---
• Something else: 9.7 / 6.4

Factors Affecting Behavioral Response

Notice of Event
• Location when event occurs: Home
  – At home: in the minimum scenario, prior notice has a significant effect on the decision to stay or go
• Location when event occurs: Work or Other Building
  – At work: prior notice has no significant effect

Source of Message to Shelter-in-Place (effect on 'leave immediately')
Percentage of respondents who would leave immediately, by source:
• Governor or DC Mayor: 19%
• A Top Official: 24%
• Local Fire Chief: 21%
• Local Emergency Manager: 24%
Compliance with the shelter-in-place instruction is highest when the source of information is the State Governor or the Mayor of DC.

Gender of Respondent [event occurs while at home]
Percentage of respondents who would leave immediately (Male / Female):
• Maximum: 13% / 22%
• Moderate: 12% / 22%
• Minimum: 13% / 19%
At home, the gender effect is significant for all three scenarios. When the event occurs while at work or in another building, the gender effect is significant in two of three scenarios.

Summary Findings From Scenarios
• The percentage of people who would leave their home immediately is not large
• Many people will leave their place of work if the event is far away ('minimal hazard')
  – Most of these will head to their homes
• The scenarios with greater 'hazard' did raise the perception of risk
  – But the rates of leaving are similar for the moderate and maximum hazards
• Higher education, prior positive experience in an emergency, and female gender also increase sheltering compliance
• Still to come: multivariate analysis using HLM

Closing thoughts and a look to the future

Are survey experiments externally valid?
• Survey experiments help to establish external validity
  – Because they are carried out on broadly representative populations
• But answering a survey question or judging a vignette is not necessarily a 'real world' test
  – External validity is not assured by the design
• External validity isn't a problem for applied survey-methods experiments
  – The survey itself is the 'real world' setting for the behavior of interest

Survey experiments aren't free
• Full-scale stand-alone survey experiments are expensive
  – Factorial designs are hungry for cases
• Adding a small experiment to an existing survey costs less
• But the added experiment does increase costs at every step
  – Design, sample creation, programming
  – Interviewer training, sample management
  – Data entry, analysis, reporting, project management
• Split-ballot wording experiments on existing items reduce the statistical power of the original question
  – It is asked in the control group only, with smaller n

We need more survey experiments
• Most questions used in most surveys have never been subjected to rigorous testing in experiments
  – Substantial improvements in measurement might be achievable through more experimentation
• Despite small n's and low power, testing of questions in pre-tests is potentially useful to the practitioner
  – Let's do more pre-test experiments!
• The possibilities for substantive research are boundless

New technologies are changing our survey experiments
• Computerization has made experimentation easier in every mode (CAPI, CATI, CASI, Web)
• Capture of para-data sheds new light on outcomes
• New multimedia tools offer enhanced possibilities for presenting experimental stimuli
• The Internet allows experimenters to reach outside the 'subject pool' to the general public
  – But not always using probability sampling

The 'knowledge gaps' are closing . . .
• Survey experiments are increasing in number and sophistication
  – Survey researchers are learning more about experiments
• Behavioral scientists are moving more of their experiments to the Internet
  – Seeking larger, more representative samples
• The traditional lines between survey research and social science experiments are blurring further . . .
. . . to the mutual benefit of both!

You've seen the movie . . . now read the book!
Steven L. Nock and Thomas M. Guterbock, "Survey Experiments." Chapter in James Wright and Peter Marsden, eds., Handbook of Survey Research, Second Edition. Wiley Interscience, 2010.