Public Opinion and Survey Analysis
Module II: Introduction to Surveys
Tuesday, July 6, 2010
David Crow, Associate Director
UC Riverside, Survey Research Center
david.crow@ucr.edu

What Surveys Measure
• Attitudes – positive or negative orientation toward something
• Beliefs – opinions about the objective state of the world (something is true or untrue)
• Knowledge
• Behavior – BUT self-reported behavior ("stated," not "revealed," preferences)

Survey Goals (Weisberg, Chap. 7)
• Four basic survey goals (recapitulating from Chap. 1):
1) Measuring the prevalence of attitudes, beliefs, and behaviors
  - attitudes: likes and dislikes (do I like or dislike it?)
  - beliefs: acceptance of a factual proposition about the state of the world (is it true or not?)
  - behaviors: things we do
2) Measuring change over time
3) Measuring differences between subgroups
4) Analyzing the causes of attitudes, beliefs, and behaviors

Uses of Surveys (Weisberg, Chap. 1)
• Polls and elections
• Population characteristics
  – Current Population Survey (CPS)
  – American Community Survey (ACS)
  – Bureau of Labor Statistics
• Consumer research
• Courts
  – "contingent valuation"
  – Utah apportionment case

Other Research Designs
• Experiments: the experimenter manipulates a variable (the experimental treatment) and observes its effects in different groups
• Aggregate data: using data (census, election, sales) available for groups or geographical clusters (countries, states, census tracts) of people (units)
• Deliberative poll: do an entrance survey, then give participants information and ask them to take part in a group discussion; do an exit survey to see whether views have changed
• Focus groups: moderated discussion
• Audience reaction: exposing an audience to a stimulus and recording its reactions
• Others: in-depth interviews, participant observation, content analysis

Tradeoff: Broad vs. Deep
• Know a little about a lot of people, or a lot about a few people
• Surveys, aggregate data, experiments: results can be generalized from the sample to the population, BUT we can't explore topics in depth
• In-depth interviews, participant observation, focus groups: we can explore concepts and meanings with great detail and nuance, but we can't be sure the results are valid beyond the participants

Measuring Behavior
• Self-reported actions in the past
• Problems:
1) Imperfect recall: the farther in the past the behavior is, the less accurate reports of it are; forward telescoping (remembering events as more recent than they really were) and backward telescoping (remembering events as more remote than they really were)
  Solutions: 1) ask people about recent behavior; 2) set time frames for people with "warm-up" questions; 3) anchor memories to important life or historical events
2) Sensitive topics: people are reluctant to report engaging in socially disapproved behavior and "over-report" engaging in socially desirable behavior
  Solutions: 1) the "bogus pipeline" technique; 2) include the behavior in a list of non-controversial behaviors; 3) the randomized response technique (sketched below)
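As an illustration of that last point, here is a minimal sketch of how a randomized response design (Warner's classic version) recovers the prevalence of a sensitive behavior. The function name, the design parameter p, and all figures are hypothetical, not taken from any survey discussed in these slides.

```python
# Minimal sketch of a Warner-style randomized response design (hypothetical numbers).
# Each respondent privately randomizes: with probability p they answer the sensitive
# question ("Have you done X?"), and with probability 1 - p they answer its complement
# ("Have you NOT done X?"). The interviewer sees only "yes"/"no", never which question
# was answered, yet the prevalence pi of X is still estimable:
#   P(yes) = p * pi + (1 - p) * (1 - pi)   =>   pi = (P(yes) - (1 - p)) / (2p - 1)

def estimate_prevalence(yes_share: float, p: float) -> float:
    """Recover the estimated prevalence of the sensitive behavior."""
    if abs(2 * p - 1) < 1e-9:
        raise ValueError("p = 0.5 makes the design uninformative")
    return (yes_share - (1 - p)) / (2 * p - 1)

# Example: 40% answered "yes" and the randomizer sends 70% of respondents to the
# sensitive question; the implied prevalence is (0.40 - 0.30) / 0.40 = 25%.
print(round(estimate_prevalence(0.40, 0.70), 3))  # -> 0.25
```

Because the interviewer never knows which question a given respondent answered, respondents have less incentive to misreport, at the cost of a noisier estimate.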
Measuring Attitudes: Non-Attitudes
• Non-attitudes: respondents don't know about, or haven't thought enough about, a question to have a meaningful opinion (Converse, 1964)
  – "Don't know / no opinion"
  – BUT sometimes they give an on-the-spot answer that doesn't reflect a real opinion
  – Debate: include a NO/DK option or not? Pro: it prevents reporting non-attitudes as attitudes; Con: it lets respondents off the hook from doing tough cognitive work
  – Solutions: 1) offer a DK/NO option, but prompt people to reflect on their responses; 2) deliberative polling
• Attitude strength: how strongly a person feels about a topic
  – Strong attitudes tend to be more stable over time
  – Strong attitudes are better at predicting behavior
  – Factors contributing to attitude strength: 1) knowledge of the topic; 2) interest in the topic; 3) value system

Measuring Attitude Strength
• Ask directly: have respondents rate the topic on a scale of importance
• Measure response time: longer response times indicate weaker opinions
• Give counterarguments: a survey-based experiment that asks for an opinion, gives a counterargument, re-asks the question, and sees whether the answer varies

Attitude Stability
• Is public opinion fickle (i.e., does it change easily over time) or stable?
• Is change real change or faulty measurement? Differences over the meanings of words, e.g., conservative / liberal
• Strong opinions are more stable
  – More resistant to new evidence
• When do opinions change?
  - Exposure to new evidence (e.g., increased opposition to the Clinton health plan)
  - Changing "frames": a "frame" is a widely accepted conceptual lens (often created in part by news media coverage) used to interpret events; frames change over time
  - E.g., Anita Hill / Clarence Thomas: from "sexual harassment" to "high-tech lynching"
  - E.g., the Schwarzenegger election: "Kooky Californians", "Popular Revolt", "Great Incommunicator" (Lakoff, 2004)

Measuring Change Over Time
• Change: increase or decrease of numerical variables over time
• Some ways to measure change:
  – Attitude recall data: ask R directly how he/she felt about something in the past; sometimes inaccurate because of "consistency bias" (the desire to present oneself as consistent), which leads to underestimating true change
  – Comparison of cross-sectional surveys over time: compare averages for the same (or similar) questions asked of different people at different times
  – Panel studies: ask the same questions of the same people at different points in time ("repeated measurements")
  – Instant polls: interactive polls that measure real-time reactions to stimuli, e.g., Frank Luntz's "dial" polls

Problems with Cross-Sectional Comparisons
Problem: distinguishing between real differences and ones that are artifacts of survey methodology
• Different survey organizations: organizations use different methods to select, contact, and interview respondents, so results can vary
• Different populations: are the samples drawn from the same population? E.g., voting-age adults vs. "likely voters"
• Different questions: even slight variations in question wording can change responses
• Sampling error: fluctuation over time might be random error rather than real change (see the sketch below)
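A minimal sketch of that sampling-error point: before calling a movement between two cross-sectional polls "real change," check whether it exceeds the margin of error of the difference between two independent proportions. The poll figures and the function name below are hypothetical.

```python
# Sketch: is a change between two cross-sectional polls "real" or within sampling error?
# Uses the usual standard error of a difference of independent proportions.
import math

def change_is_detectable(p1: float, n1: int, p2: float, n2: int, z: float = 1.96) -> bool:
    """True if |p2 - p1| exceeds the 95% margin of error of the difference."""
    se_diff = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return abs(p2 - p1) > z * se_diff

# Approval at 52% in a poll of 1,000 and 49% in a later poll of 1,000:
print(change_is_detectable(0.52, 1000, 0.49, 1000))  # False: a 3-point "drop" is within noise
# Approval at 58% vs. 49% with the same sample sizes:
print(change_is_detectable(0.58, 1000, 0.49, 1000))  # True: unlikely to be sampling error alone
```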
Panel Studies: Advantages
• Allow for assessment of causal effects: causality implies temporal priority of the cause over the effect; a cross-section lets us see how an effect varies across groups, but not how it varies as a result of some changing circumstance
  - e.g., the effect of age on voting: older people vote more often than younger people, but getting older doesn't increase your propensity to vote
  - e.g., the effect of gun ownership on feelings of safety: people with guns feel safer than people without, but getting a gun doesn't necessarily mean you will feel safer
• Gross vs. net change: gross change is individual-level change; net change is overall, aggregate-level change; panel studies allow for measurement of both (see the sketch below)
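A small hypothetical two-wave panel illustrating the gross vs. net distinction: the aggregate (net) result can look perfectly stable even when many individuals switched sides, and only a panel can reveal that individual-level (gross) change. The data below are invented for illustration.

```python
# Sketch of net vs. gross change using a tiny hypothetical two-wave panel.
# Net change compares aggregate shares across waves; gross change counts every
# individual who switched, which only a panel design can observe.
wave1 = ["approve", "approve", "disapprove", "approve", "disapprove", "disapprove"]
wave2 = ["disapprove", "approve", "approve", "disapprove", "approve", "disapprove"]

share_w1 = wave1.count("approve") / len(wave1)   # 3/6 = 0.50
share_w2 = wave2.count("approve") / len(wave2)   # 3/6 = 0.50
net_change = share_w2 - share_w1                 # 0.0: the aggregate looks stable
gross_change = sum(a != b for a, b in zip(wave1, wave2)) / len(wave1)  # 4/6 ~ 0.67

print(f"net change: {net_change:+.2f}, gross change: {gross_change:.2f}")
# Aggregate opinion looks unchanged even though two-thirds of respondents switched.
```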
Panel Studies: Disadvantages
• Expensive to locate people for re-interviews
• "Mortality" (a.k.a. "attrition"): people drop out of successive waves of the study
  - Non-random attrition (i.e., attrition that systematically affects one group more than others) can alter results; e.g., poorer people are more transient and more difficult to locate, and people who are less interested in a topic are less likely to be interviewed
• The survey process itself can alter the behavior under study
  - e.g., an initial interview about elections could increase interest in an election, causing people to vote who otherwise would not have voted

Subgroup Comparisons
• Comparing differences in behaviors, attitudes, and beliefs across subpopulations
• Take the average for one group and compare it to that of another group; e.g., Calderón's job approval ratings among PAN, PRI, and PRD adherents (and those with no affiliation)
• Note that you must take into account the uncertainty associated with the estimate for each subgroup (see the sketch below)
• Implications for sampling: a typical nationally representative sample of 1,200 may not be enough to assess differences
  - "double-" or "over-sample" subgroups
  - "pyramiding": combining estimates for subgroups over several surveys (at the same point in time or at different time points)
• BUT the potential difficulties in assessing aggregate behavior and attitudes are less severe when comparing subgroups
  - E.g., recalled voting behavior: inaccurate memory affects our estimates of the total proportion of people who voted, but it would alter our estimates of the relationship of union membership to voting only if union members were more or less likely than non-union members to remember inaccurately
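A sketch of why subgroup comparisons strain a typical n = 1,200 national sample: each subgroup's margin of error is computed from its own, much smaller n. The party-identification counts and approval figures below are hypothetical, not Calderón's actual ratings.

```python
# Sketch: subgroup estimates from a single national sample (hypothetical counts).
# Subgroup margins of error are much wider than the full-sample margin of error,
# which is why oversampling (or pyramiding across surveys) is often needed.
import math

def moe(p: float, n: int, z: float = 1.96) -> float:
    """95% margin of error for a simple random sample proportion."""
    return z * math.sqrt(p * (1 - p) / n)

# Hypothetical presidential approval by party identification: (share, subgroup n)
subgroups = {"PAN": (0.85, 320), "PRI": (0.48, 290), "PRD": (0.35, 220), "none": (0.55, 370)}

total_n = sum(n for _, n in subgroups.values())           # 1,200 interviews overall
print(f"full sample MOE at p=0.5: +/-{moe(0.5, total_n):.1%}")
for party, (p, n) in subgroups.items():
    print(f"{party:>4}: {p:.0%} approve, MOE +/-{moe(p, n):.1%} (n={n})")
```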
Assessing Causes of Behavior
• Asking people directly why they do things doesn't work well: people offer post hoc rationalizations
• Better strategy: think about possible causes of the behavior, and the social and individual circumstances that influence actions, and ask about those
• Explore numerical associations through statistical techniques such as cross-tabulations, correlation, and multiple regression

Populations and Samples
• Define the group of people to be studied
  - Characteristics: geography? age? gender?
  - Should be the population suited to studying the research question
• Samples: a representative subset of the population
  - Who should be interviewed? The population or a sample?
  - How many interviews are necessary?
• Larger samples are more representative and give more precise estimates, but cost more
• Are subgroups important? If so, oversample
• Sample size depends on the research question: election polls, 900 to 1,500; drug trials, often as few as 200
• Modes of contact: 1) face-to-face; 2) SAQ (self-administered questionnaire, on paper or on the Web); 3) telephone

Problems & Challenges (Cont'd)
• Response rate: not everyone responds, and the sample overrepresents easy-to-reach people, so the sample is not representative of the population; solutions: 1) increase efforts to reach hard-to-reach people (e.g., increase callbacks); 2) substitute new respondents or PSUs (primary sampling units) for non-respondents, with the attendant possibility of "substitution bias"; 3) offer incentives
• "Sugging": selling under the guise of surveys; people grow wary of surveys
• Sampling error: uncertainty in estimates based on interviewing only a part of the population; its size (the margin of error) can be calculated only for probability samples
• Non-coverage error: the sampling frame does not correspond to the target population

Constructing Questionnaire
• Topic order: avoid embarrassing or difficult topics at the beginning; put demographics, especially income, toward the end; ESTABLISH TRUST
• Question order: questions should flow easily; place related questions together. However, "consistency bias": respondents want to appear consistent and give the same answers to similar questions; solution: invert scales, vary question placement
• Response set: vary the response set
• Number of questions: keep the interview manageable

Questionnaire Construction (Weisberg, Chap. 4)
• Question form:
  - Closed-ended questions: predefined response categories; easier to code and analyze, but don't take into account all possible responses
  - Open-ended questions: allow free responses; accurately reflect the range of possible responses, but are very difficult to group together and time-consuming
• Rating scales:
  - Likert scale: four or five ordinal categories, e.g., "strongly agree", "agree", "neither agree nor disagree", "disagree", "strongly disagree"
  - Feeling thermometer: 0 is very cold, 50 is neutral, 100 is very warm
  - Semantic differential: bipolar scales (typically seven-point) that ask respondents to rate something along several dimensions
  - Numerical scales: e.g., 1 to 10, sometimes with the end points anchored by semantic content

Question Wording
• Avoid ambiguity:
  – Conceptual ambiguity: be as specific as possible; e.g., not "racial integration" in the abstract, but "racial integration" in specific situations; short, direct questions
  – Temporal ambiguity: avoid broad, undefined time periods for self-reported past behaviors
• Avoid bias: the question should scrupulously avoid leading respondents toward a particular response
  – 1) use neutral "frames"; e.g., for taxes, "estate tax" vs. "death tax"
  – 2) social desirability bias: interviewees say what they think people want to hear; solutions: non-judgmental question phrasing, interviewer rapport

Question Wording (Cont'd)
• Avoid "double-barreled" questions: "twofers" that ask about two things in one question; e.g., bipolar scales should really be opposites, and there may be two possible reasons for a response, e.g., "Do you think taxes on foreign oil should be used to reduce consumption?" A "no" could mean not wanting to reduce consumption or not wanting to tax foreign oil; solution: branching questions
• No-opinion option? Debate: early research (Converse, Michigan) portrayed citizens as uninformed, so forcing an answer may pressure citizens into a meaningless response; more recent research (Krosnick) shows that a "no opinion" option gives respondents an easy out and allows them to avoid the cognitive work of thinking about tough issues; solution: interviewer prompts

Question Wording
• Use standard wordings: if a question has been asked before, see how others have worded it: the Census Bureau (Current Population Survey, CPS; American Community Survey, ACS); the American National Election Studies, ANES (U. of Michigan); the General Social Survey, GSS (NORC, University of Chicago)
• Wording matters! Are different question wordings equivalent? E.g., "prohibiting abortion" vs. "protecting the life of an unborn child", or "satisfied with democracy in Mexico" vs. "satisfied with the way democracy is working in Mexico"

Issues in Rating Scales
• Three decisions: 1) How many points to include? 2) A middle category or not? 3) How many labels, and which?
• It is difficult to remember many categories; solutions: 1) prompt cards in face-to-face interviews; 2) a branching format, e.g., party ID on the American National Election Studies (the first question asks which party R identifies with; if R answers none, a follow-up question asks whether R leans toward any party)

Order Effects
1) The order in which answer choices are presented can slant responses: "primacy" privileges the first category, "recency" the last
2) The order in which questions are presented can bias answers; e.g., a question asking whether R voted, followed immediately by a question asking whether R is registered to vote, will bias responses to the second question

Evaluating Questions
• Reliability: people should answer a question the same way each time they are asked; results should be reproducible; ways to assess reliability: 1) measure the same people a short time later; 2) use batteries of similarly worded questions, whose answers should be correlated
• Validity: a question should measure the concept it is intended to measure; 1) "face validity": the question seems to measure the appropriate concept on first inspection; 2) "convergent validity" and "divergent validity": measures of the same concept should have similar answers, and measures of different concepts should have different answers (tested by correlational analysis); 3) "criterion validity": compare answers against objective data, e.g., self-reported voting behavior; 4) "content validity": the question measures all important aspects of the concept; 5) "construct validity": how related one concept is to another; related concepts should be correlated

Technical Concepts
• Sampling error: because a survey result is based on only a part of the population, it will typically be a little above or below the real value of the variable
• Margin of error: an estimate of the precision of the estimated value, reported as +/- x% around the estimate; it depends mainly on the number of respondents and is higher for subsamples
• Confidence level: the percentage of samples in which the true value will fall within the margin of error; if the confidence level is 95%, in 1 out of 20 samples the real value will be outside the margin of error; a higher confidence level means a wider margin of error (see the simulation below)
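A small simulation illustrating what the confidence level means: drawing repeated samples of n = 1,200 from a population where the true share is 50%, roughly 1 sample in 20 should fall outside the 95% margin of error. The setup and numbers are illustrative only.

```python
# Sketch: what "95% confidence" means, via repeated sampling from a known population.
import math
import random

random.seed(1)
TRUE_P, N, Z = 0.50, 1200, 1.96
moe = Z * math.sqrt(TRUE_P * (1 - TRUE_P) / N)   # roughly +/- 2.8 points for n = 1,200

misses = 0
for _ in range(10_000):
    sample_p = sum(random.random() < TRUE_P for _ in range(N)) / N
    if abs(sample_p - TRUE_P) > moe:
        misses += 1

print(f"margin of error: +/-{moe:.1%}; samples outside it: {misses / 10_000:.1%}")
# Expect roughly 5%, i.e., about 1 sample in 20 falls outside the margin of error.
```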
Media and Polls (Gollin 1987)
• Increase in media reliance on polls
  - The volume of stories based on polls has expanded dramatically
  - Polls become the story, rather than being used as part of a story
  - Media outlets open their own polling operations
    o Sporadic before the 1960s, increasingly common after
    o CBS / NYT
    o Washington Post / ABC
• Public demand for polls increased
  - From the 1940s through the 1980s, the public trusted polls as accurate reflections of public opinion

Media Reporting of Polls
• However, suspicions arose in the 1970s
  - News media were seen as using polls to "make news" rather than report it
  - Candidates' claims of electability (based on internal, "secret" polls) were contradicted by public polls
  - Citizens suspected activists and politicians of manipulating polls to further their aims
  - An increased number of polling organizations meant uneven quality of polls
  - Conflicting, inaccurate electoral forecasts
• Response: legal regulation?
  - Congressional bills after 1936 and 1948, but 1st Amendment protections won out
  - Self-regulation: professional associations (AAPOR, NCPP)

Public Perceptions of Polling
• The public is increasingly mistrustful
• The public is weary of polls
  - Polls are invasive and make demands on time
  - Public attitudes toward telephone etiquette are changing (Groves, "Survey Nonresponse")
  - Sales, telemarketing, and commercial and governmental research

Should We Trust Polls Reported in the Media? (Asher, Chap. 6)
• Is the poll well done technically?
• Is the media source interpreting the poll correctly? (Do we have enough information to know?)
• Who's paying for the poll, and why?

Technical Reporting Standards
• NCPP/AAPOR standards (Asher, Chap. 6)
  – Sponsorship
  – Field work:
    • Dates of field work
    • Location
    • Contact method
  – Sample:
    • Population sampled
    • Description of the sampling frame
    • Selection procedure (random? self-selected?)
    • Size (N)
    • Response / completion rates
    • Description of subsamples, if any
  – Question wording
  – Precision (sampling error, margin of error)

Technical Standards Don't Ensure Quality of Information
• The source of the poll (the pollster) may be different from the news agency covering the poll
  - If they are the same, compliance is easier
  - If different, there is no way to enforce the recommendations
• The standards themselves are incomplete
  - They don't address response rates or efforts to increase response
  - They don't address sample adjustments such as weighting and filters
    Weighting: e.g., Latinos constitute 35% of the California population but only 10% of the sample; the weighting adjustment multiplies each Latino respondent by .35 / .10 = 3.5 (see the sketch below)
    Filters: e.g., probable voters: "How likely are you to vote?" (a five-point scale ranging from "definitely" to "not at all"); filter out "not at all" and base conclusions on the other respondents
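A minimal sketch of the weighting adjustment just described: each group's weight is its population share divided by its sample share, so the underrepresented group counts more and the overrepresented group counts less. The outcome figures are hypothetical.

```python
# Sketch of the post-stratification weighting adjustment described above:
# weight = population share / sample share, computed per group.
population_share = {"Latino": 0.35, "non-Latino": 0.65}
sample_share     = {"Latino": 0.10, "non-Latino": 0.90}

weights = {g: population_share[g] / sample_share[g] for g in population_share}
print(weights)  # Latino respondents get weight 3.5; non-Latino respondents about 0.72

# Weighted estimate of some outcome, e.g., support for a ballot measure
# (hypothetical group means):
support = {"Latino": 0.60, "non-Latino": 0.45}
weighted = sum(sample_share[g] * weights[g] * support[g] for g in support)
print(round(weighted, 3))  # 0.502 -- i.e., the population-share weighted mean
```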
Media Don't Always Comply with Standards
• Newspapers
  - Compliance varies (Miller and Hurd, 1982: of 116 polls reported, 85% complied on sample size, but only 16% on margin of error)
  - BUT the study was based on big-market papers
  - Coverage improved over time
• TV (Paletz et al., 1980)
  - Considerably worse than newspapers
  - The sponsor was never mentioned; question wording appeared in 5% of TV news programs; survey dates in only 30% (cf. 43% in the NYT)
  - Virtually no other technical information
• Larson, 2000
  - 50% mentioned sampling error, but none explained how it works

Substantive Interpretation
• The media have wide latitude in interpreting the results of polls; e.g., the 1985 NYT abortion poll:
  – Legal as it is now: 40%; legal only to save the mother, or in cases of rape or incest: 40%; not permitted: 16%; don't know / NA: 4%
  – "Abortion is murdering a child" vs. "Abortion is not murder because the fetus isn't a person": murder, 53%; not murder, 35%; don't know / NA, 10%
  – "Abortion sometimes is the best course in a bad situation": agree, 66%; disagree, 26%; don't know / NA, 8%
  – Depending on which item is emphasized, the same poll can be read as showing majority support for keeping abortion legal in at least some circumstances (80%) or majority belief that abortion is murder (53%)

Criticism of Media Poll Reporting
• Misinterpretation, e.g.:
  – The NYT in 1989 overstated support for tort reform in a poll sponsored by the insurance company Aetna (Krosnick 1989)
  – News outlets wrongly reported increasing support for the Panama Canal Treaty (Smith and Hogan 1987)
  – Coverage of Ohioans' support for teaching "intelligent design" ignored questions in the same poll suggesting it should be taught at home or in church (Jacobs and Shapiro 1995)
• Also, the media CREATE news by carrying out polls and reporting on them
• The focus on numbers often comes at the expense of underlying meaning, e.g., "horse race" coverage of presidential elections