Experimental thinkaloud protocols: a new method for evaluating the validity of survey questions
Patrick Sturgis
National Centre for Research Methods (NCRM)
and
University of Southampton
Paper presented at the New Measurement Issues in Survey Research meeting
of the Survey Resources Network, 21 September 2010
Do different questions measure the same thing?

Many important concepts are measured by different ‘standard’ questions in surveys:
    Social/political trust
    General health
    Life happiness/satisfaction
    Fear of crime/confidence in police
How to tell if they are ‘equivalent’?
How to tell which is the ‘best’ measure?
Validity assessment strategies

Face/process validity
Correlation with criterion variables
Multi-trait-multi-method (MTMM)
Expert panels
Behaviour coding
Interviewer debrief
Thinkaloud protocols/cognitive interview
Experimental thinkalouds

Randomly assign respondents to receive one or other version of the ‘same’ question
Follow-up with verbatim probe ‘what came to mind when answering last question?’
Examine marginal distribution of cognitive frames by question type
    Are people thinking of things they should be?
Use thinkaloud variables in regression model to predict earlier response
    Which cognitive frames are most relevant in forming answers to the questions?
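The first three steps of the design can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the analysis used in the study: the function names, the version labels "A" and "B", the frame labels, and the toy data are all hypothetical, and the chi-square statistic is one possible way of comparing the frame distributions across question versions.

```python
import random
from collections import Counter


def assign_versions(respondent_ids, versions=("A", "B"), seed=0):
    """Randomly assign each respondent to one question version."""
    rng = random.Random(seed)
    return {rid: rng.choice(versions) for rid in respondent_ids}


def frame_distribution(records):
    """Share of each coded frame within each question version.

    records: iterable of (version, frame) pairs, one per respondent.
    Returns {version: {frame: proportion}}.
    """
    counts = {}
    for version, frame in records:
        counts.setdefault(version, Counter())[frame] += 1
    return {
        v: {f: n / sum(c.values()) for f, n in c.items()}
        for v, c in counts.items()
    }


def chi_square(records):
    """Pearson chi-square statistic for the version-by-frame table."""
    records = list(records)
    cell = Counter(records)
    row = Counter(v for v, _ in records)
    col = Counter(f for _, f in records)
    total = len(records)
    stat = 0.0
    for v in row:
        for f in col:
            expected = row[v] * col[f] / total
            stat += (cell[(v, f)] - expected) ** 2 / expected
    return stat


# Made-up coded probe responses, for illustration only.
toy = ([("A", "known")] * 30 + [("A", "unknown")] * 20
       + [("B", "known")] * 20 + [("B", "unknown")] * 30)
assignment = assign_versions(range(10))
dist = frame_distribution(toy)
stat = chi_square(toy)
```

A large statistic relative to its degrees of freedom would suggest that the two versions of the question evoke different frames of reference.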
Example 1 - Trust
Conceptions of Trust

Trust is a ‘good thing’
Trusting citizens are good citizens (voting, volunteering, civic engagement)
Trusting societies are good societies (more democratic, more egalitarian, greater economic performance)
Trust ‘lubricates’ social and economic transactions
Reduces ‘monitoring costs’ and the need for contracts etc.
The standard trust question

Generally speaking, would you say that most people can be trusted, or that you can't be too careful in dealing with people?
    Most people can be trusted
    Can’t be too careful
Usually credited to Rosenberg (1959), the ‘Rosenberg Generalized Trust’ (RGT) item
The Local Area Trust item

How much do you trust people in your local area?
    a lot
    a fair amount
    not very much
    not at all
Reflects Putnam’s emphasis on trust being a property of local areas
Trust by Question type

These items are both used more or less interchangeably as measures of generalized trust
Yet they yield very different estimates of trust at the national level, e.g.:
    Social Capital Community Benchmark survey: 47% most people can be trusted; 83% trust people in local area ‘some’ or ‘a lot’
    UK Taking Part survey: 44% most people can be trusted; 74% trust ‘many’ or ‘some’ of the people in their local area
Why such a large discrepancy in generalized trust (trust in strangers)?
Research Design

Ipsos-MORI general population omnibus survey
Random selection of small areas, quota-controlled selection of individuals
n=989 (fieldwork, November 2007)
Respondents randomly assigned to the RGT item or the local area trust (TLA) item
Follow-up probe: ‘In answering the last question, who came to mind when you were thinking about ‘most people’/‘people in your local area’?’
Distributions for trust questions

RGT item (n=481)                           TLA item (n=508)
Most people can be trusted   48% (229)     A lot            20% (100)
Can’t be too careful         52% (252)     A fair amount    60% (302)
                                           Not very much    17% (88)
                                           Not at all        3% (17)
Primary Codes
1. colleagues/ex-colleagues
2. family/family member
3. friends
4. most people I know/meet
5. neighbours
6. people from my church
7. anyone/all people
8. everyone/everybody
9. foreigners/ethnic minorities
10. general public/people in general
11. children/young people
12. no-one in particular
13. strangers
14. people in this town/village
15. doctors
16. officials/authority figures/professionals
17. police
18. politicians/political parties
19. salesmen/sales people
20. tradesmen
21. don't know these days
22. identity theft
23. you have to place trust in people
24. people interested in themselves
25. people mostly trustworthy
26. trust people until they upset me
27. trusting is naïve
28. other answers
29. don't know/not stated
Higher Order Codes          % mentioned
Known others                42%
Unknown others              22%
Local community             5%
Named job/profession        10%
Other (not relevant)        13%
Don’t know/no answer        22%
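The higher-order percentages sum to 114%, which is what one would expect if a respondent can mention more than one frame. The sketch below shows how '% mentioned' could be computed in that case. The allocation of primary codes to higher-order codes used here is an assumption made for illustration (the slides do not give the actual allocation), and the function name and toy respondents are hypothetical.

```python
from collections import Counter

# ASSUMPTION: this grouping of primary codes into higher-order codes is
# illustrative only; the slides do not give the actual allocation.
HIGHER_ORDER = {
    "friends": "known others",
    "family/family member": "known others",
    "neighbours": "known others",
    "strangers": "unknown others",
    "general public/people in general": "unknown others",
    "people in this town/village": "local community",
    "police": "named job/profession",
    "tradesmen": "named job/profession",
    "don't know/not stated": "don't know/no answer",
}


def percent_mentioned(responses):
    """Percentage of respondents mentioning each higher-order code.

    responses: list of lists of primary codes, one inner list per respondent.
    A respondent counts once per higher-order code, however many of its
    primary codes they mention.
    """
    counts = Counter()
    for codes in responses:
        mentioned = {HIGHER_ORDER.get(c, "other (not relevant)") for c in codes}
        for higher in mentioned:
            counts[higher] += 1
    return {h: 100 * n / len(responses) for h, n in counts.items()}


# Made-up respondents, for illustration only.
toy = [
    ["friends", "family/family member"],  # counts once for 'known others'
    ["friends", "strangers"],             # two different higher-order codes
    ["police"],
    ["don't know/not stated"],
]
result = percent_mentioned(toy)
total = sum(result.values())  # exceeds 100 because of the multiple mention
```

Because the second toy respondent contributes to two higher-order codes, the percentages add up to more than 100, as they do in the reported figures.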
Who comes to mind by RGT
[Figure: bar chart of % mentioned (0% to 80%) for each higher-order code (known others, unknown others, named job/profession, people in local area, other, don't know/not stated), shown separately for ‘most people can be trusted’ and ‘can't be too careful’ responses.]
Who comes to mind by TLA
[Figure: bar chart of % mentioned (0% to 80%) for each higher-order code (known others, unknown others, named job/profession, people in local area, other, don't know/not stated), shown separately for ‘a lot’, ‘a fair amount’ and ‘not at all/not very much’ responses.]
Who came to mind – both questions
[Figure: bar chart of % mentioned (0% to 60%) for each higher-order code (known others, unknown others, named job/profession, people in local area, other, don't know/not stated), shown separately for the RGT and TLA items.]
Explanatory Models 1
RGT Item – Binary Logit Model

                                                 Model 1a                   Model 2a
Covariates                                       Logit (S.E.)        O.R.   Logit (S.E.)        O.R.
Age (years)                                      0.028 (0.036)       1.03   0.013 (0.038)       1.01
Sex (male=1)                                     0.057 (0.197)       1.06   0.091 (0.207)       1.09
Social class (ABC1=1)                            0.817 (0.213)***    2.26   0.949 (0.227)***    2.58
Longstanding illness (yes=1)                     0.355 (0.335)       1.43   0.462 (0.349)       1.59
Highest qualification (ref=no qualifications)
  Degree                                         0.944 (0.337)**     2.60   1.029 (0.354)**     2.80
  GCSE or above                                  0.108 (0.261)       1.11   0.142 (0.276)       1.15
Marital status (ref=single, never married)
  Divorced                                       0.236 (0.454)       1.27   0.508 (0.476)       1.66
  Married                                        0.176 (0.274)       1.19   0.413 (0.291)       1.51
  Widow                                          -0.124 (0.516)      0.88   0.272 (0.540)       1.31
Who came to mind? (ref=2. unknown others)
  1. known others                                                           1.535 (0.267)***    4.64
  3. people in local area                                                   1.885 (0.763)**     6.60
  4. named job/profession                                                   -0.255 (0.373)      0.78
  5. other (not relevant)                                                   0.257 (0.328)       1.29
  6. no-one/don't know/not stated                                           1.043 (0.280)***    2.84
Constant                                         -1.178 (0.345)      0.31   -2.161 (0.410)      0.12
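The odds ratios reported for the RGT models are the exponentiated logit coefficients, and a coefficient divided by its standard error gives a Wald z statistic. A minimal sketch using the Model 2a estimate for 'known others' (logit 1.535, S.E. 0.267); the function names are hypothetical, and the significance tests used in the original analysis are not stated in the slides.

```python
import math


def odds_ratio(logit):
    """Odds ratio implied by a logit coefficient."""
    return math.exp(logit)


def wald_z(logit, se):
    """Wald z statistic: coefficient divided by its standard error."""
    return logit / se


# 'Known others' vs 'unknown others' in Model 2a: logit 1.535, S.E. 0.267.
or_known = odds_ratio(1.535)     # about 4.64, matching the reported O.R.
z_known = wald_z(1.535, 0.267)   # about 5.75
```

On this reading, respondents who thought of known others had roughly 4.6 times the odds of saying most people can be trusted, compared with those who thought of unknown others.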
Explanatory Models 2
TLA Item – Ordered Logit Model

                                                 Model 1b                   Model 2b
Covariates                                       Logit (S.E.)        O.R.   Logit (S.E.)        O.R.
Age (years)                                      0.097 (0.034)**     1.10   0.076 (0.034)*      1.08
Sex (male=1)                                     -0.393 (0.186)**    0.68   -0.255 (0.190)      0.77
Social class (ABC1=1)                            0.751 (0.204)***    2.12   0.771 (0.207)***    2.16
Longstanding illness (yes=1)                     0.230 (0.293)       1.26   0.297 (0.297)       1.35
Highest qualification (ref=no qualifications)
  Degree                                         0.605 (0.312)*      1.83   0.425 (0.320)       1.53
  GCSE or above                                  0.218 (0.255)       1.24   0.075 (0.258)       1.08
Marital status (ref=single, never married)
  Divorced                                       -0.247 (0.409)      0.78   -0.206 (0.418)      0.81
  Married                                        0.323 (0.249)       1.38   0.275 (0.253)       1.32
  Widow                                          0.516 (0.440)       1.68   0.447 (0.448)       1.56
Who came to mind? (ref=2. unknown others)
  1. known others                                                           1.559 (0.305)***    4.75
  3. people in local area                                                   0.953 (0.408)*      2.59
  4. named job/profession                                                   0.087 (0.305)       1.09
  5. other (not relevant)                                                   0.383 (0.356)       1.47
  6. no-one/don't know/not stated                                           0.579 (0.346)       1.78
Constant                                         -                   -      -                   -
The science of well-being
“Now is the time for every government to collect data on a uniform basis on the happiness of its population…every survey of individuals should automatically measure their well-being, so that in time we can really say what matters to people and by how much. When we do, it will produce very different priorities for our society.” Layard 2010, Science.
Survey measures of subjective well-being

Tend to ask about ‘happiness’ or ‘satisfaction’ with life
And treat these as if they are measuring the same concept
Happiness = Satisfaction?

Yes – time-series models show same pattern of effects (Blanchflower and Oswald, 2002)
No – happiness and satisfaction correlated but not equivalent in European Values Survey (Gundelach and Kreiner 2004)
Mode effects

Widely different estimates of well-being across different surveys
Could mode be an explanatory factor?
    Being unhappy with your life is not socially desirable (people may over-state happiness to an interviewer)
Conti and Pudney (2008) find higher ratings of satisfaction in interviewer-administered relative to self-administered questions
Design

Ipsos-MORI face-to-face omnibus survey (quota sample), April 2010
n=2033
Respondents randomly allocated to:
1. Interviewer-administered life satisfaction
2. Self-administered life satisfaction
3. Interviewer-administered happiness
4. Self-administered happiness
Questions (from European Social Survey)

All things considered, how happy would you say you are? Please answer using the scale on the card where 1 means ‘extremely unhappy’ and 10 means ‘extremely happy’.
    Scale: 1 (Extremely unhappy) to 10 (Extremely happy)

All things considered, how satisfied are you with your life as a whole nowadays? Please answer using the scale on the card where 1 means ‘extremely dissatisfied’ and 10 means ‘extremely satisfied’.
    Scale: 1 (Extremely dissatisfied) to 10 (Extremely satisfied)
Verbatims
Now, thinking about your answer to the last question, please tell me what came to mind when thinking about your answer. There are no right or wrong answers; I just want you to tell me everything that came to mind in thinking about how happy you are. What else?
PROBE FULLY
Results 1
satisfaction = happiness?
Raw distributions for happiness and satisfaction
[Figure: distributions of responses to each question; means shown as 7.38 and 7.39.]
Satisfaction v Happiness distributions
Pearson’s chi-square, p=0.041
Satisfaction v Happiness by sex
Means: Male = 7.43, Female = 7.34
p=0.047
p=0.394
Results 2
mode effects
Mode effect by question means

Question        CAPI (s.e.)     CASI (s.e.)
Happiness       7.45 (.077)     7.32 (.081)
Satisfaction    7.29 (.081)     7.49 (.085)*
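A mode difference in means can be gauged from the reported means and standard errors alone. The sketch below compares the satisfaction means under CASI (7.49, s.e. .085) and CAPI (7.29, s.e. .081). It assumes the two groups are independent and uses a normal approximation; the slides do not say which test was actually used or what the asterisk denotes, so the result need not match the reported significance marking. The function names are hypothetical.

```python
import math


def diff_z(mean_a, se_a, mean_b, se_b):
    """Difference in means, its standard error, and the z statistic.

    Assumes the two groups are independent, so the variances add.
    """
    diff = mean_a - mean_b
    se_diff = math.sqrt(se_a ** 2 + se_b ** 2)
    return diff, se_diff, diff / se_diff


def two_sided_p(z):
    """Two-sided p-value under a standard normal reference distribution."""
    return math.erfc(abs(z) / math.sqrt(2))


# Satisfaction: CASI 7.49 (s.e. .085) versus CAPI 7.29 (s.e. .081).
diff, se_diff, z = diff_z(7.49, 0.085, 7.29, 0.081)
p = two_sided_p(z)
```

The same two functions can be applied to the happiness row by swapping in its means and standard errors.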
Mode effect by question distributions
p=0.209
p=0.015
Question*mode*sex - means

Question            CAPI (s.e.)     CASI (s.e.)
men
  Happiness         7.40 (.105)     7.36 (.118)
  Satisfaction      7.46 (.118)     7.52 (.127)
women
  Happiness         7.50 (.111)     7.28 (.112)
  Satisfaction      7.12 (.118)     7.48 (.127)**
Question*mode*sex - distributions
p=0.053
p=0.037
p=0.018
p=0.145
Prediction model

                          happiness   s.e.     satisfaction   s.e.
(Constant)                6.154       .351     6.385          .352
sex (male)                -.012       .137     .292           .137
age (years)               .016        .004     .006           .004
social grade (AB)         .062        .187     .337           .188
social grade (CD)         -.050       .179     .217           .175
net income (banded)       .131        .058     .206           .055
parent (yes)              -.049       .156     -.070          .158
highest qual (degree)     .201        .171     -.263          .169
no qualifications         -.169       .217     .052           .212
mode (CASI)               -.060       .134     -.280          .134
n                         643                  645
R2                        .053                 .052
Verbatim responses
Verbatim responses coded to a descriptive frame with 111 codes
These were then allocated to one of 14 thematic codes
Thematic Codes
1. work/job/education
2. family/friends/pets
3. emotions/feelings/outlook
4. ageing
5. house/home/area
6. financial/material possessions
7. social life/hobby
8. freedom/independence
9. events/temporary
10. health (self)
11. health (other)
12. political/environmental concerns
13. neutral/in the middle
14. other/idiosyncratic
Significant differences in thematic codes across questions
[Figure: bar chart of % reporting code (0.0 to 25.0) for happiness and satisfaction, by thematic code: work/job/education, economy/financial/material, events/temporary, political/environmental.]
Conclusions

A great deal of heterogeneity in the frames of reference people use in answering trust questions
Acquaintances more trusted than strangers
Problematic to assume these questions measure generalized trust
Local area question should not be used interchangeably with standard trust item