measurement issues - Laboratory of Opinion Research (LORE)

MEASUREMENT
ISSUES
LEONIE HUDDY
STONY BROOK UNIVERSITY
LEONIE.HUDDY@SUNYSB.EDU
OUTLINE
I. Measurement Error
1. Definitions & Sources
2. Need for reliable measurement of DVs and moderators in
survey experiments
• Examples: measurement of moderator variables
II. Using Experiments to Validate Measurement
I. Example: Racial Resentment
III. Cross-National Measurement
I. MEASUREMENT ERROR
1. DEFINITIONS (ALWIN, 2007)
Measurement error: Error that occurs when the observed value
differs from the true value (systematically or at random)
• Bias: a measure differs in systematic ways from its true
value
• Reliability: the measure is free of random measurement error
• Validity: the measure captures the right concept. Validity may
also need to be assessed to ensure sound measurement.
  • Face validity (looks right on the surface)
  • Discriminant validity (differs from what it should)
  • Convergent validity (goes with what it should)
  • Predictive validity (predicts what it is supposed to)
2. SOURCES OF MEASUREMENT
ERROR (ALWIN)
Bias                 Variance
-interviewer bias    -interviewer error variance
-respondent bias     -respondent error variance
-instrument bias     -instrument error variance
-mode bias           -mode error variance
3. RELIABLE MEASUREMENT OF
THE DEPENDENT VARIABLE (DV)
The major problem with measurement error in the DV:
VARIABILITY: Measurement error makes it more difficult to
successfully identify significant treatment effects.
• Important to include multiple measures of the DV to
reduce measurement error and increase measurement
reliability
• Many experimental studies focus more on the
manipulation than the DV
A constant bias in the DV (a uniform under- or overestimate) does not bias
the estimated relationship with an independent variable
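The two claims above can be checked with a small simulation. This is a minimal sketch under assumed values: the sample size, true effect of 0.5, noise levels, and the function name `simulate_effect` are all hypothetical and not taken from any study cited here.

```python
import numpy as np

def simulate_effect(noise_sd, bias=0.0, n=200, reps=2000, effect=0.5, seed=0):
    """Mean and SD of the estimated treatment effect across replications.

    noise_sd: SD of random measurement error added to the DV (assumed value).
    bias:     constant added to every observed DV score (assumed value).
    """
    rng = np.random.default_rng(seed)
    estimates = np.empty(reps)
    for r in range(reps):
        # Random assignment: exactly half treated, half control.
        treat = rng.permutation(np.repeat([0, 1], n // 2))
        true_dv = effect * treat + rng.normal(0, 1, n)
        # Observed DV = true DV + constant bias + random error.
        observed = true_dv + bias + rng.normal(0, noise_sd, n)
        estimates[r] = observed[treat == 1].mean() - observed[treat == 0].mean()
    return estimates.mean(), estimates.std()

clean = simulate_effect(noise_sd=0.0)
noisy = simulate_effect(noise_sd=2.0)
biased = simulate_effect(noise_sd=0.0, bias=3.0)
print("no error:      mean=%.3f sd=%.3f" % clean)
print("random error:  mean=%.3f sd=%.3f" % noisy)
print("constant bias: mean=%.3f sd=%.3f" % biased)
```

In this setup the average estimate stays near the true effect in all three cases. Only the random-error case widens the spread of estimates, which is what makes significant effects harder to detect.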
4. RELIABLE MEASUREMENT OF
EXPERIMENTAL MODERATORS
Experimental effects in political science are frequently
heterogeneous. Hypothetical examples include:
• The effect of elite partisan cues in a framing study depends
on partisan identity (direction and identity strength)
• The effects of new information about a government policy
may depend on existing levels of political sophistication
• The effect of exposure to a more or less generous welfare policy
depends on a respondent’s left/right political ideology
The reliable measurement of moderators (and their correct
theoretical model specification) will increase the likelihood of
detecting heterogeneous experimental effects.
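Why moderator reliability matters can be illustrated with a simulated experiment. This is a sketch under stated assumptions: classical measurement error that is independent of the true moderator, a true interaction of 1.0, and a hypothetical function name `interaction_estimate`. None of these values come from the studies discussed here.

```python
import numpy as np

def interaction_estimate(reliability, n=5000, seed=1):
    """OLS estimate of the treatment x moderator interaction when the
    moderator is observed with a given reliability (0 < reliability <= 1)."""
    rng = np.random.default_rng(seed)
    treat = rng.permutation(np.repeat([0.0, 1.0], n // 2))
    moderator = rng.normal(0, 1, n)            # true moderator, variance 1
    # The treatment works only through the moderator (true interaction = 1.0).
    y = 1.0 * treat * moderator + rng.normal(0, 1, n)
    # Observed moderator = true score + random error, with
    # reliability = var(true) / var(observed).
    error_sd = np.sqrt((1 - reliability) / reliability)
    observed = moderator + rng.normal(0, error_sd, n)
    X = np.column_stack([np.ones(n), treat, observed, treat * observed])
    coefs, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coefs[3]                            # interaction coefficient

for rel in (1.0, 0.8, 0.5):
    print(f"reliability {rel:.1f}: estimated interaction {interaction_estimate(rel):.2f}")
```

Under these assumptions the estimated interaction shrinks roughly in proportion to the moderator's reliability, so an unreliable moderator can hide a real heterogeneous effect.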
A. MODERATOR MEASUREMENT EXAMPLE:
PARTISAN IDENTITY VS. TRADITIONAL PID
STRENGTH; HUDDY, MASON & AAROE
Threat: an experimental blog statement that suggests that
Democrats or Republicans will lose the upcoming election;
message is from either the same or the other main party.
Sample statements in Democratic threat manipulation (from
the other party):
•I love watching Democrats delude themselves! They’re talking a big
game, but look closer and they know they’re in trouble.
•America clearly wants Republican leadership, and the Democrats are
running in circles desperately trying to convince themselves that
anyone in America trusts them!
•People don’t trust Democrats and they don’t like their politics.
•They lost a lot of credibility over their years of flip-flopping, it's going to
take more than a couple of years to get it back.
MULTI-ITEM PARTISAN IDENTITY VS.
TRADITIONAL PID STRENGTH;
HUDDY, MASON & AAROE
Partisan Identity Scale
“How important is being a [Democrat/Republican/Independent] to
you?”
“How well does the term [Democrat/Republican/Independent]
describe you?”
“When talking about [Democrats/Republicans/Independents], how
often do you use ‘we’ instead of ‘they’?”
“To what extent do you think of yourself as being a [Democrat/
Republican/Independent]?”
Traditional PID Strength
“Generally speaking do you think of yourself as a Democrat,
a Republican, or an Independent?”
“Are you a strong or not so strong Democrat/Republican?”
IF INDEPENDENT: “Do you think of yourself as close to the
Republican party or closer to the Democratic party?”
MULTI-ITEM PARTISAN IDENTITY VS. PID
STRENGTH: PREDICTING ANGRY REACTIONS TO
THREAT; HUDDY, MASON & AAROE
                      Merged Blog Study           Student Sample
                      Dem           Rep           Dem           Rep
Traditional Strength  .004 (.03)    -.06 (.08)    -.04 (.07)    .05 (.16)
Partisan Identity     -.03 (.06)    -.24 (.14)    .00 (.20)     -.48 (.41)
Partisan Threat       .09 (.05)     -.04 (.11)    -.41 (.16)    -.15 (.34)
Strength X Threat     .02 (.04)     -.03 (.10)    -.07 (.12)    -.08 (.28)
Identity X Threat     .36 (.08)     .59 (.19)     .96 (.31)     .44 (.66)
N                     1568          252           145           38
B. MODERATOR MEASUREMENT
EXAMPLE: CANDIDATE SKIN COLOR,
RACIAL PREJUDICE AND SOCIAL
DESIRABILITY; TERKILDSEN 1993
• Subjects were assigned to read about a light-skinned or a dark-skinned
black candidate
• Subjects were part of the Louisville, KY jury pool
• Measured self-monitoring (tendency to distort true beliefs
in response to social norms) AND racial prejudice as
factors that moderate the experimental treatment
• Both are measured as multi-item scales to reduce
measurement error
QUESTION WORDING-SELF
MONITORING; TERKILDSEN (1993)
C. Self-monitoring scale (abridged version): Respondents were asked to
indicate if "each statement is true or false as it applies to you:" Scale reliability
was .74.
F 1. I can only argue for ideas which I already believe.
T 2. When I am uncertain how to act in social situations I look to the behavior of
others.
T 3. I laugh more when I watch a comedy with others than when alone.
F 4. I would not change or modify my opinions in order to please someone else or
win favor.
T 5. I am not always the person I appear to be.
F 6. My behavior is usually an expression of my true attitudes and beliefs.
F 7. I am not particularly good at making other people like me.
T 8. I can look anyone in the eye and tell a lie.
Scoring indicates responses for high self-monitors. Respondents received a 1
when they agreed with a high self-monitor's response and a 0 when they
disagreed.
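The scoring rule above, and one common way to compute a scale reliability, can be sketched as follows. The slide does not say which reliability coefficient was reported; Cronbach's alpha is assumed here, and the function names and the toy responses are hypothetical.

```python
import numpy as np

# High self-monitor key for the eight items listed above (True = "T").
KEY = np.array([False, True, True, False, True, False, False, True])

def score_items(responses):
    """Score True/False answers: 1 if the answer matches the high
    self-monitor key, 0 otherwise. responses has shape (n_respondents, 8)."""
    return (np.asarray(responses, dtype=bool) == KEY).astype(int)

def cronbach_alpha(items):
    """Cronbach's alpha for an (n_respondents, k_items) score matrix.
    Assumed here as the reliability coefficient; the source does not name one."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)
    total_var = items.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_vars.sum() / total_var)

# Toy answers (hypothetical): one respondent matches the key on every item,
# one matches on none, one is mixed.
answers = np.array([KEY, ~KEY, [True] * 8])
scores = score_items(answers)
print(scores.sum(axis=1))       # scale totals per respondent
print(cronbach_alpha(scores))   # reliability of the toy data
```

Each respondent's scale score is the sum (or mean) of the eight item scores, so higher values indicate higher self-monitoring.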
QUESTION WORDING-RACIAL
PREJUDICE; TERKILDSEN (1993):
D. Racial Prejudice (adopted from the General Social
Survey): "Please rate black Americans on each scale
provided using any number between 1 and 5." A "don't know"
option was furnished. Scale reliability was .85. The endpoints
of the six scales were labeled as follows:
1. Rich-Poor
2. Intelligent-Unintelligent
3. Hard-working-Lazy
4. Prone to Violence-Not Prone to Violence
5. Prefer to be self-supporting-Prefer to live off welfare
6. Patriotic-Unpatriotic
Item four is reverse coded.
RACE AND SKIN-TONE OF POLITICAL
CANDIDATES, VOTE FOR GOVERNOR ON 1-4
SCALE; TERKILDSEN, 1993
                    Light-Skinned     Dark-Skinned      White
                    black candidate   black candidate   candidate
Prejudice           -1.5 (1.2)*       -4.10 (1.5)***    1.2 (1.0)
Self-monitor (SM)   -.74 (.41)**      -0.25 (.46)       -.29 (.31)
Prejudice X SM      .69 (2.4)         9.3 (3.5)***      -2.9 (2.4)
# of cases          100               109               109
*** p<.01, one-tailed test; ** p<.05, one-tailed test; * p<.10, one-tailed test
Prejudice is coded from -.5 to +.5; SM is coded from 0 to 1.
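With an interaction term in the model, the coefficient on prejudice depends on the respondent's self-monitoring score. The arithmetic below is a sketch that uses the two coefficients reported for the dark-skinned black candidate (-4.10 for Prejudice, 9.3 for Prejudice X SM), assuming those values are read from the table correctly. The function name is hypothetical.

```python
def conditional_prejudice_effect(b_prejudice, b_interaction, self_monitoring):
    """Coefficient on prejudice at a given self-monitoring score (0 to 1):
    b_prejudice + b_interaction * SM."""
    return b_prejudice + b_interaction * self_monitoring

B_PREJUDICE = -4.10     # Prejudice coefficient, dark-skinned candidate condition
B_INTERACTION = 9.3     # Prejudice X SM coefficient, same condition

for sm in (0.0, 0.5, 1.0):
    effect = conditional_prejudice_effect(B_PREJUDICE, B_INTERACTION, sm)
    print(f"SM = {sm:.1f}: prejudice coefficient = {effect:.2f}")
```

The prejudice coefficient is -4.10 only for respondents at SM = 0. It moves to about 0.55 at SM = 0.5 and about 5.2 at SM = 1, which is why the Prejudice row cannot be interpreted without the interaction row.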
5. MEASUREMENT OF
THE TREATMENT EFFECT
• Emotional ads study (Weber): The Campaign Ads Study
(2007) examined the emotional impact of experimentally
altered campaign ads on political attitudes and
participation.
• 4 ads designed to manipulate anger, anxiety, sadness, and
enthusiasm
• Respondents completed a battery of emotion questions (3
questions per emotion) after the treatment
• In this study, the ads had heterogeneous effects and did not
alter emotions cleanly. This raises questions about how to
assess the effects of the treatment. At a minimum, researchers
need to measure the treatment well.
MANIPULATION CHECKS-EMOTIONAL ADS; TOP
PANEL-SMIS ADULT SAMPLE, BOTTOM PANEL-STUDENTS (WEBER)
[Figure: panels for Reported Anger and Reported Anxiety, each plotted across the Enthusiasm, Sadness, Anxiety, and Anger ad conditions.]
II. USING EXPERIMENTS TO
VALIDATE KEY VARIABLES:
Racial Resentment (Feldman and Huddy 2005)
Controversy over the measurement and conception of racial
prejudice in political research
A. Overt Prejudice: belief that blacks are inherently inferior to
whites.
B. New Racism: resentment at the special treatment of
blacks.
• symbolic racism (Kinder and Sears);
• modern racism (McConahay);
• racial resentment (Kinder and Sanders).
NEW RACISM IS
CONTROVERSIAL
• It is an excellent predictor of white racial policy attitudes
• But some argue that the items may be too close to the racial policies
they are supposed to predict (e.g., Schuman 2000; Sniderman and Tetlock
1986)
• Its conceptualization makes it difficult to distinguish resentment from
individualism (Sniderman et al. 2000).
Racial Resentment Items
(1) “Irish, Italians, Jewish and many other minorities overcame prejudice and
worked their way up. Blacks should do the same without any special favors.”
(2) “Over the past few years blacks have gotten less than they deserve.”
(3) “It's really a matter of some people not trying hard enough; if blacks would only
try harder they could be just as well off as whites.”
(4) “Generations of slavery and discrimination have created conditions that make it
difficult for blacks to work their way out of the lower class.”
DATA: NEW YORK STATE
RACIAL ATTITUDES SURVEY
• RDD telephone interview of New York state residents (late 2000-2001)
• 760 white, non-Hispanic, non-Asian respondents.
• Survey conducted by the Center for Survey Research at Stony Brook
University.
College Scholarship Experiment (similar to a program adopted by some
universities to replace race-based affirmative action in college
admissions).
Respondents were randomly assigned to one of 8 conditions.
• “To what extent do you favor providing special college scholarships
for _____ students who score in the top fifteen percent of their
school class, even if their school’s grades are not in the top fifteen
percent nationally?”
• The eight conditions referred to white, black, poor white, poor black,
middle class white, middle class black, poor, and middle class
students.
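The design can be sketched in a few lines. The slide does not describe how the survey software randomized respondents, so simple random assignment with equal probability is an assumption here, as are the names `assign_conditions` and `QUESTION`.

```python
import random

# The eight target groups named in the experiment.
CONDITIONS = [
    "white", "black", "poor white", "poor black",
    "middle class white", "middle class black", "poor", "middle class",
]

# Question wording from the slide, with the blank filled per condition.
QUESTION = (
    "To what extent do you favor providing special college scholarships "
    "for {target} students who score in the top fifteen percent of their "
    "school class, even if their school’s grades are not in the top fifteen "
    "percent nationally?"
)

def assign_conditions(n_respondents, seed=None):
    """Assign each respondent to one condition with equal probability
    (assumed procedure; the source only says assignment was random)."""
    rng = random.Random(seed)
    return [rng.choice(CONDITIONS) for _ in range(n_respondents)]

assignments = assign_conditions(760, seed=42)
print(QUESTION.format(target=assignments[0]))
```

Because only the target group changes across conditions, differences in support between conditions can be attributed to the group named in the question.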
PREDICTIONS CONCERNING
RACIAL RESENTMENT :
IDEOLOGY OR PREJUDICE?
• Racial resentment as prejudice: should only predict
opposition to policies targeted for black students
• Racial resentment as ideology: should promote
opposition to programs for all students regardless of race
• Or does the meaning of racial resentment vary with left-right
(liberal-conservative) ideology?
  • Racial for liberals (only affects their opposition to programs
  for black students)
  • Ideological for conservatives (predicts opposition to
  programs for all students)
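The competing predictions can be written as different coefficient patterns in a model with a resentment-by-target interaction. This is a sketch, assuming a logit model and purely hypothetical coefficient values chosen to show the two patterns. It is not the specification or the estimates from Feldman and Huddy (2005).

```python
import math

def support_probability(resentment, black_target, b0, b_resent, b_black, b_interact):
    """Predicted probability of supporting the scholarship under a logit model.
    resentment: 0 to 1; black_target: 1 if the program targets black students."""
    z = (b0 + b_resent * resentment + b_black * black_target
         + b_interact * resentment * black_target)
    return 1 / (1 + math.exp(-z))

# Hypothetical coefficients illustrating each prediction.
PREJUDICE = dict(b0=1.0, b_resent=0.0, b_black=0.0, b_interact=-3.0)
IDEOLOGY = dict(b0=1.0, b_resent=-3.0, b_black=0.0, b_interact=0.0)

for name, coefs in (("prejudice", PREJUDICE), ("ideology", IDEOLOGY)):
    for target, label in ((0, "white"), (1, "black")):
        low = support_probability(0.0, target, **coefs)
        high = support_probability(1.0, target, **coefs)
        print(f"{name:9s} {label:5s} target: low resentment {low:.2f}, high {high:.2f}")
```

If resentment is prejudice, support drops with resentment only when the target is black (a nonzero interaction). If it is ideology, support drops equally for both targets (a main effect with no interaction). Estimating the model separately for liberals and conservatives would address the third possibility.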
Probability of Support for Scholarships by Racial Resentment:
POLITICAL LIBERALS
[Figure: probability of support (vertical axis, 0 to 1) plotted against racial resentment (horizontal axis, 0 to 1), with separate lines for the Poor White, Middle Class White, Poor Black, and Middle Class Black conditions.]
Probability of Support for Scholarships by Racial Resentment: Race-by-Class Conditions
POLITICAL CONSERVATIVES
[Figure: probability of support (vertical axis, 0 to 1) plotted against racial resentment (horizontal axis, 0 to 1), with separate lines for the Poor White, Middle Class White, Poor Black, and Middle Class Black conditions.]
III. CROSS-NATIONAL SURVEYS:
METHODS TO DEVELOP
COMPARABLE QUESTIONS
1. Sequential:
Developed in one context and then exported to another; survey
simply translated without adaptation for another context
• Example: Eurobarometer; questions are usually developed in French and
English first, and then translated for other countries
This approach does not allow for pre-testing in other languages. Other
countries are stuck with what has been developed initially.
• Example of problems: the ISSP could not ask Japanese respondents
whether their earnings were “just” or “fair” because this is inappropriate in
the Japanese context.
Harkness: all items should be carefully exported.
It may be easier to discard “bad” items in long psychological
batteries because there are others. It may be more difficult in
social science questionnaires in which a concept is measured by
only one or two items.
2. Parallel Development:
Combine expertise from many countries and develop
the survey in a single language
• e.g., ESS which is written and developed in English first; ISSP also
is developed by a multicultural group and everyone votes on the
final questionnaire
Survey is then subject to multicultural testing before it
is finalized
Advance translation occurs by translating some
questions before the questionnaire is completed in
order to identify problems. Such translations do not
have to be perfect but are designed to bring up
obvious problems.
Overall, this approach is better than sequential but is
time consuming and involves complex coordination
3. Simultaneous:
Decentering: a draft questionnaire is produced
in one language and the final version is
produced in two. In the decentering phase
specific cultural references are also removed.
Typically applied when studying only 2
cultures; ensures that questions are truly
comparable.
This technique has been used on existing
instruments.
But it may create very bland items.
An alternative is to have some core common
questions and some country-specific ones, but then
these are difficult to compare.
REFERENCES
Alwin, Duane F. 2007. Margins of Error: A Study of Reliability in Survey Measurement.
Groves, Robert M. 1989. Survey Errors and Survey Costs. New York: Wiley.
Weber, Lavine, Federico
Lavine
Tourangeau, Roger, Lance Rips, and Kenneth Rasinski. 2000. The Psychology of Survey
Response. New York: Cambridge University Press.
Snyder, Mark, and Steven W. Gangestad. 1986. On the Nature of Self-Monitoring: Matters of
Assessment, Matters of Validity. Journal of Personality and Social Psychology, 51, 125-139.
Feldman, Stanley and Leonie Huddy. 2005. “Racial Resentment and White Opposition to
Race-Conscious Programs: Principles or Prejudice?” American Journal of Political Science,
49 (1): 168-183.
Huddy, Leonie and Anna Gunthorsdottir. 2000. The Persuasive Effects of Emotive Visual
Imagery: Superficial Manipulation or A Deepening of Conviction? Political Psychology, 21: 745-778.
Harkness, Janet. 2003. “Questionnaire Translation.” In Janet Harkness, Fons J. R. van de
Vijver, and Peter Ph. Mohler (eds.), Cross-Cultural Survey Methods. Hoboken, NJ: John Wiley and
Sons. pp. 35-56.
HUDDY, MASON & AAROE
Schuman
Sniderman & Tetlock