Correction for measurement error in survey research using SQP
Willem E. Saris, RECSM, 2013

Introduction
• All researchers agree that survey data contain measurement errors
• Procedures for correcting measurement errors have been known since 1971 (Duncan and Goldberger)
• However, very few researchers try to correct for these errors

Attention to measurement problems in social science journals of 2011
    Journal  Year  No. of papers  Survey research used  Errors mentioned  Errors corrected
    ESR      2011  48             41                    9                 1
    EJPR     2011  32             20                    4                 1
    POQ      2011  33             32                    4                 1
    AJPS     2011  54             23                    3                 0
    JM       2011  47             27                    11                8
ESR = European Sociological Review, EJPR = European Journal of Political Research, POQ = Public Opinion Quarterly, AJPS = American Journal of Political Science, JM = Journal of Marketing

Why does this happen?
1. Because the effect of measurement errors is very small?
2. Because it is very difficult to correct for measurement error?
3. Because information about the size of the measurement errors is not available?

1. Is the effect of the measurement errors very small?
• Around 1971, Alwin, Andrews and I found, using LISREL, that the errors in survey questions are very large
• All three of us have spent our academic lives on the estimation of, and correction for, measurement error
• Duane Alwin (2007) concentrated on the quasi-simplex approach
• Frank Andrews (1984) and I used the MTMM approach

The size of the error variance
• Our estimate was that on average 50% of the variance of responses to survey questions is error
• So there is a considerable difference between the variable one wants to measure and the observed variable

Consequences of measurement error
• The consequences will be discussed for:
  – the observed correlations
  – the regression analysis
  – comparative research

The consequences for the correlation
• Imagine that we are interested in the correlation between
  – f1 = job satisfaction
  – f2 = life satisfaction
• We ask: "How satisfied are you with your job?" and "How satisfied are you with your life?"
• The responses are represented by y1 and y2
• We know that there is quite a difference between f1 and y1 and between f2 and y2

A very simple model
[Figure: latent variables f1 and f2 with correlation r(f1,f2); each f_i is measured by one observed variable y_i with quality coefficient q_i and random error e_i]
If the variables f_i and y_i are standardized:
• $q_i^2$ = the quality of indicator $y_i$ for latent variable $f_i$ ($q_i$ itself is the quality coefficient)
• $1 - q_i^2$ = the error variance of indicator $y_i$
• It can be proven that $r(y_1,y_2) = r(f_1,f_2)\,q_1 q_2$

Consequences for correlations
• If the correlation between the latent variables is r(f1,f2) = .9, the correlation between the observed variables will be as follows:
    Quality coefficient q1   Quality coefficient q2   Observed correlation r(y1,y2)
    1.0                      1.0                      .90
    .9                       .9                       .73
    .8                       .8                       .58
    .7                       .7                       .44
    .6                       .6                       .32

Consequences for correlations and regressions
[Figure: latent job satisfaction JS* and latent age Age* both affect latent life satisfaction LS* (effects .4 and .4, disturbance u3); the observed JS, LS and Age have quality coefficients .6, .6 and .99 and errors e1, e3 and e2]

Correlations between latent variables
            JS*    Age*   LS*
    JS*     1.0
    Age*    0.0    1.0
    LS*     .4     .4     1.0
Regression: LS* = .4 JS* + .4 Age* + u3

Correlations between observed variables
            JS     Age    LS
    JS      1.0
    Age     0.0    1.0
    LS      .13    .24    1.0
Regression: LS = .13 JS + .24 Age + e3
• Because JS and Age are uncorrelated, the standardized regression coefficients equal their correlations with LS: two equal latent effects of .4 appear as observed effects of .13 and .24

Consequences for cross-cultural comparison
[Figure: in Country A and Country B, f1 and f2 correlate .8 and are measured by y1 and y2 with errors e1 and e2; the quality coefficients are .9 in Country A and .7 in Country B]
• Country A: $r(y_1,y_2) = .8 \times .9 \times .9 = .65$
• Country B: $r(y_1,y_2) = .8 \times .7 \times .7 = .39$
• The latent correlation is the same in both countries, yet the observed correlations differ considerably

Conclusions
• Research by Andrews, Alwin, myself and others shows that the error variance in survey data is rather large
• Because of these errors, the correlations and regression coefficients between observed variables can be very different from those between latent variables
• Differences in error variances across countries will make comparisons across countries impossible

2. Is correction for measurement errors very difficult?

The standard SEM approach
[Figure: a full structural equation model with the latent variables Environmental values, Perception of environmental damage, Understanding politics, Influence and Environment friendly behavior, measured by indicators x1 to x4 and y1 to y6 with errors e1 to e6]

Is this the approach to use?
• In principle this approach is correct, but in practice it leads to many complications and errors
• This may be one reason why researchers don't correct for measurement errors
• There should be simpler procedures
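Returning to the first part: the correlation table and the two-country example both follow from the attenuation formula $r(y_1,y_2) = r(f_1,f_2)\,q_1 q_2$. The minimal Python sketch below recomputes them and shows that dividing by $q_1 q_2$ gives back the same latent correlation (.8) in both countries. It assumes the simple one-indicator model (standardized variables, errors uncorrelated with each other and with the latent variables); the function names attenuate and disattenuate are hypothetical, chosen only for illustration.

```python
def attenuate(r_latent: float, q1: float, q2: float) -> float:
    """Observed correlation implied by r(y1,y2) = r(f1,f2) * q1 * q2.

    q1 and q2 are quality coefficients, i.e. the square roots of the
    qualities q1**2 and q2**2 (standardized variables, uncorrelated errors).
    """
    return r_latent * q1 * q2


def disattenuate(r_observed: float, q1: float, q2: float) -> float:
    """Undo the attenuation: r(f1,f2) = r(y1,y2) / (q1 * q2)."""
    return r_observed / (q1 * q2)


# The correlation table: latent correlation .9, equal quality coefficients.
for q in (1.0, 0.9, 0.8, 0.7, 0.6):
    print(f"q1 = q2 = {q:.1f}  ->  r(y1,y2) = {attenuate(0.9, q, q):.2f}")

# The two-country example: the same latent correlation (.8) in both
# countries, quality coefficients .9 in Country A and .7 in Country B.
for country, q in (("A", 0.9), ("B", 0.7)):
    r_obs = attenuate(0.8, q, q)
    r_corrected = disattenuate(r_obs, q, q)
    print(f"Country {country}: observed {r_obs:.2f}, corrected {r_corrected:.2f}")
```

Run as a script, it prints 0.90, 0.73, 0.58, 0.44 and 0.32 for the five rows of the table, and observed correlations of 0.65 (Country A) and 0.39 (Country B) that both correct back to 0.80. The functions take quality coefficients q_i, not qualities q_i²; when only qualities are known, their square roots must be passed in.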
[Figure: the simple model again, with f1 and f2 correlated r(f1,f2) and measured by y1 and y2 with quality coefficients q1, q2 and errors e1, e2]
If this model holds, $r(y_1,y_2) = r(f_1,f_2)\,q_1 q_2$, then it also holds that
$r(f_1,f_2) = r(y_1,y_2)/(q_1 q_2)$
• So correction for measurement error is very simple
• This holds for single questions as well as for composite scores

Quality estimates of two scales in the last pilot study of the ESS
• Two scales were constructed:
  – one based on opinions about liberal rights, called “liberal democracy”
  – one based on opinions about electoral requirements, called “electoral democracy”
• The quality of the scales is:
  – liberal democracy: .79
  – electoral democracy: .77

Correction for measurement error
• The observed correlation between the two scales is r(y1,y2) = .638
• The qualities are $q_i^2$, so $q_1 q_2 = \sqrt{.79 \times .77}$ and $r(f_1,f_2) = .638/\sqrt{.79 \times .77} = .82$
• So while the observed correlation is not very high, the correlation corrected for measurement error indicates quite a strong relationship between the two scales

Relationships with other variables
• We expect the liberal democracy scale to correlate with the variables:
  – Just (no poverty), quality = .51
  – Direct (referenda), quality = .62
  – Income (household), quality = .92
• We now show how simply a regression analysis can be done with and without correction for measurement errors

Procedure to correct for measurement error using LISREL
Without correction for measurement error (1 on the diagonal):
    Effects on liberal democracy in the UK
    da ni=4 no=378 ma=km
    km
    1.0
    .495 1.0
    .401 .413 1.0
    .210 -.053 -.116 1.0
    labels
    liberal just direct income
    model ny=1 nx=3
    out
With correction for measurement error (quality on the diagonal):
    Effects on liberal democracy in the UK
    da ni=4 no=378 ma=km
    cm
    .79
    .495 .51
    .401 .413 .62
    .210 -.053 -.116 .92
    labels
    liberal just direct income
    model ny=1 nx=3
    out
• Because ma=km asks LISREL to analyze correlations, the matrix with qualities on the diagonal is rescaled so that each correlation is divided by the square root of the product of the two qualities, e.g. $.495/\sqrt{.79 \times .51} = .78$: the correction formula above, applied to every pair

The correlations and regression
Without correction for measurement errors
    Correlations
               liberal   just    direct   income
    liberal     1.00
    just        0.50     1.00
    direct      0.40     0.41    1.00
    income      0.21    -0.05   -0.12     1.00

    Regression (36% explained)
               just      direct   income
    liberal     0.40     0.27     0.26
    s.e.       (0.05)   (0.05)   (0.04)
    t-value     8.77     5.84     6.29

With correction for measurement errors
    Correlations
               liberal   just    direct   income
    liberal     1.00
    just        0.78     1.00
    direct      0.57     0.73    1.00
    income      0.25    -0.08   -0.15     1.00

    Regression (70% explained)
               just      direct   income
    liberal     0.76     0.07     0.32
    s.e.       (0.04)   (0.04)   (0.03)
    t-value    18.22     1.59    11.06
• After correction, the effect of Just nearly doubles (.40 to .76), the effect of Direct is no longer significant (t = 1.59), and the explained variance rises from 36% to 70%

Generalization
• The same can be done for causal models with several variables and for composite scores
• It can be done for standardized and unstandardized coefficients
• Stata also offers correction for measurement error, but it is less general

Procedure to correct for measurement error using Stata
Limitations:
• It can be applied only to regression, not to causal models in general
• It corrects only for measurement error in the independent variables
• It provides only unstandardized analysis
Regression without correction in Stata:
    regress liberal socjustice direct income if cntry==1
The procedure for correction in Stata, where r() supplies the quality of each independent variable as its reliability:
    eivreg liberal socjustice direct income if cntry==1, r(socjustice .51 direct .62 income .92)

Conclusions
• Correction for measurement errors is nowadays very simple
• Correction for measurement errors is also necessary

3. Is it difficult to estimate the quality of questions and composite scores?
• There are many different procedures
• They all require at least two questions for each concept, and the estimates are specific to the formulations of these questions
• That means that questionnaires become twice as long and more expensive

The Multi-Trait Multi-Method approach
• Many procedures have been developed to obtain estimates of the quality of questions and composite scores (Saris & Gallhofer 2007)
• We have chosen the MTMM design
  – proposed by Campbell and Fiske (1959)
  – further developed by Andrews (1984), Saris and Andrews (1991), and Saris, Satorra and Coenders (2004)

An example
• Three ESS questions about satisfaction:
  – On the whole, how satisfied are you with the present state of the economy in Britain?
  – Now think about the national government. How satisfied are you with the way it is doing its job?
  – And on the whole, how satisfied are you with the way democracy works in Britain?

Three alternative response scales
• The first (M1): 1) very satisfied, 2) fairly satisfied, 3) fairly dissatisfied, 4) very dissatisfied
• The second (M2): an 11-point scale from 0 (very dissatisfied) to 10 (very satisfied)
• The third (M3): 1) not at all satisfied, 2) satisfied, 3) rather satisfied, 4) very satisfied

Estimation
• In this way one gets 9 questions (3 traits × 3 methods) and therefore 9 × 10 / 2 = 45 variances and covariances
• Using these data, the quality coefficients of these 9 questions can be estimated

Limitation of these experiments
• In the ESS, 3,000 questions have been evaluated with respect to quality up to now
• However, in the same period 60,000 questions have been asked
• One can never evaluate all questions
• So an alternative procedure is necessary

An alternative procedure
• Frank Andrews had already studied the relationship between the characteristics of questions and their quality
• My idea was that if these relationships are strong, one can use them to predict the quality of new questions
• I also thought of creating a program that could make these quality predictions

MTMM experiments in IRMCS, 1990-2000
• 87 MTMM experiments containing 1,023 questions were collected in the US (Andrews), the Netherlands (Scherpenzeel), Belgium (Billiet) and Austria (Költringer)
• A first meta-analysis was done to see whether the quality of the questions could be explained by question characteristics
• The results were very promising: the explained variance was .50 for reliability and .60 for validity (Saris & Gallhofer 2007)

MTMM experiments in the ESS, 2000-2012
• In each round of the European Social Survey, 4 to 6 experiments were done in each country
• That means that in each round 1,000 questions in more than 20 different European languages were evaluated
• After 3 rounds, we had information about the quality of 3,000 questions
• We expected to be able to predict the quality of the questions from the question characteristics

The long way to the solution: SQP
• We coded the characteristics of the MTMM questions
• And we estimated the relationship between these characteristics and the quality of the questions
• Without going into details (Oberski et al. 2012), we could predict reliability with R² = .8 and validity with R² = .9 for the present 3,700 MTMM questions
• The prediction procedure was implemented in the program SQP 2.0

The quality predictions of SQP 2.0
• So we are quite confident that SQP can make rather good predictions of the quality of new questions on the basis of their characteristics

Let us have a look
Available here: http://sqp.upf.edu/
Can be used free of charge!
You just need to register, and then you can use it directly online

Conclusions
• It seems that it is easy to get information about the quality of questions
• For many questions, SQP gives information about quality based on research
• SQP can also be used to predict the quality of questions that have not been studied
• Users can bring in their own questions and, by coding them, obtain a prediction of their quality
• If the qualities of single questions are known, the quality of composite scores can also be derived

Conclusions
• The program SQP is an internet application
• So all users who code questions add information about the quality of new questions to the database
• In this way, one gets a growing database of questions with their quality: a Wikipedia for questions

Conclusions
Is there any reason not to correct for measurement error?
1. Is the effect of measurement errors very small? NO!
2. Is it very difficult to correct for measurement error? NO!
3. Is the information about the size of the measurement errors missing? NO!

Conclusions
• There is no longer any reason to analyze data without correction for measurement error
• If one takes research seriously, one has to correct for measurement errors
• Otherwise one cannot trust the results of the research

Summary
• A summary of all the details and problems of this approach, using ESS data, will be provided in the second edition of Saris and Gallhofer, Design, Evaluation and Analysis of Questionnaires for Survey Research. Hoboken: Wiley
• The book will appear in 2014

A FINAL ILLUSTRATION OF CORRECTION FOR A MORE COMPLEX CASE
• A very popular research topic is the explanation of opinions about immigration of people from outside Europe
[Figure: model relating "Allow more people from outside Europe" to Economic threat, Culture threat and Better life]

Summary of the predicted values of the quality indicators in Ireland
    Variable  Method   r²     v²     m²     q²
    Allow     SQP2.0   .826   .906   .094   .747
    Economy   SQP2.0   .770   .780   .220   .601
    Culture   SQP2.0   .761   .705   .295   .537
    Better    SQP2.0   .748   .725   .275   .543
(r² = reliability, v² = validity, m² = 1 - v² = method variance, q² = r²v² = quality)

Correction for errors taking common method variance (cmv) into account
[Figure: true-score model with correlated variables of interest f1 and f2 (correlation ρ(f1,f2)), a common method factor Mj, true scores t1j and t2j, and observed variables y1j and y2j with random errors e1j and e2j]
• $f_i$ = the $i$-th variable of interest
• $v_{ij}$ = validity coefficient for variable $i$
• $M_j$ = method factor for both variables
• $m_{ij}$ = method effect on variable $i$
• $t_{ij}$ = true score for $y_{ij}$
• $r_{ij}$ = reliability coefficient
• $y_{ij}$ = the observed variable
• $e_{ij}$ = the random error in variable $y_{ij}$
With quality coefficients $q_{ij} = r_{ij} v_{ij}$, the model implies
$r(y_{1j},y_{2j}) = r(f_1,f_2)\,q_{1j}q_{2j} + \mathrm{cmv}$, with $\mathrm{cmv} = r_{1j}m_{1j}m_{2j}r_{2j}$
and therefore
$r(f_1,f_2) = [r(y_{1j},y_{2j}) - \mathrm{cmv}]/(q_{1j}q_{2j})$

[Table: correction of the correlations for random errors and CMV]
[Table: estimates of the parameters with and without correction]

Conclusion
• This example shows again that correction for measurement error is necessary
• Now there is also no excuse anymore
• The procedures to correct are simple
• And SQP provides information about the quality of questions even without collecting new data

We did not do this work alone
• Hubert Blalock, Karl Jöreskog, Frank Andrews, Albert Satorra, Marius de Pijper, Anuska Ferligoj, Roger Jowell, Joan Manuel Batista
• Past students: Annette Scherpenzeel, Richard Költringer, Germà Coenders, Chris Aalbers, Irmgard Corten, William van der Veld, Luis Coromina, Laura Guillen, Desiree Knoppen
• The new generation: Melanie Revilla, Diana Zavalla, Laur Lilleoja, Wiebke Weber
• Special group: Daniel Oberski and Tom Gruner

The Future
• Of course the predictions are not perfect
• Improvement is always possible
  – alternative quality estimation procedures can be developed
• Extension is necessary for
  – new question forms
  – other languages
• But…

Future
• I leave this task to the RECSM researchers:
  – Wiebke Weber, Melanie Revilla, Diana Zavalla, Anna de Castellarnau, Lydia Repke, Jennifer Neumann, Bruno Arpino, Paolo Moncagatta and André Pirralha
• I have great confidence that they will make the proper decisions in the future to maintain and improve the present tool
• So that I can concentrate on other things…

Club Pati Barcelona
www.upf.edu/survey
recsm@upf.edu