The Questionnaire Design Pitfalls of Multiple Modes
Dr Pamela Campanelli, Survey Methods Consultant, Chartered Statistician, Chartered Scientist

Acknowledgements
Other main members of the UK "Mixed Modes and Measurement Error" grant team: Gerry Nicolaas (Ipsos MORI), Peter Lynn (University of Essex), Annette Jäckle (University of Essex) and Steven Hope (University College London).
Grant funding from the UK Economic and Social Research Council (Award RES-175-25-0007).

Larger Project
• Looked for evidence of mode differences by:
  • Question content
  • Question format: type of task, characteristics of the task, implementation of the task
• Made recommendations
Today: a few highlights.

Sensitive Questions
Mixed mode context
• Very well known: sensitive questions are prone to social desirability effects in interviewer modes (see Tourangeau and Yan, 2007; Kreuter, Presser and Tourangeau, 2008)
• But not all questions (Fowler, Roman and Di, 1998)
• Differences by time frame (Fowler, Roman and Di, 1998)
Modes to use: SA (self-administered)
Recommendations
• If the mixed mode design includes F2F interviews, ask sensitive questions in a paper SA form or use CASI
• If it includes TEL interviews, pre-test sensitive questions across the modes that will be used to see if there are differences

Non-Sensitive: Factual Versus Subjective (1)
Mixed mode context
• Subjective questions are more prone to mode effects than factual questions (see Lozar Manfreda and Vehovar, 2002; Schonlau et al, 2003)
• But factual questions are also susceptible (Campanelli, 2010)
• Subjective scalar questions can be prone to TEL positivity bias

TEL (and F2F) Positivity Bias
• Dillman et al (2009): an aural versus visual effect, with TEL Rs giving more extreme positive answers
• Ye et al (2011): TEL Rs giving more extreme positive answers, but found that F2F was like TEL; concluded it was caused by a MUM effect
• Hope et al (2011): TEL Rs giving more extreme positive answers, but no trace of this in F2F (with and without a showcard)
• Thus, the actual cause of the TEL positivity bias is still unclear

Non-Sensitive: Factual Versus Subjective (2)
Modes to use: F2F, TEL?, SA
Recommendations
Factual questions
• Use Dillman's uni-mode principles and test to see if there are differences across modes
Subjective scalar questions
• Avoid TEL, if possible, due to TEL positivity bias
• Test F2F to see if positivity bias is present

Inherently Difficult Questions (1)
General questionnaire design context
Inherent difficulty: a question is difficult due to conceptual, comprehension and/or recall issues.
• Survey satisficing should be greater for inherently difficult questions (Krosnick, 1991)
• But this is not true for all inherently difficult questions (Hunt et al, 1982; Sangster and Fox, 2000; Nicolaas et al, 2011)

Inherently Difficult Questions (2)
EXAMPLE (Nicolaas et al, 2011):
N56y. What are the things that you like about your neighbourhood? Do you like your neighbourhood because of its community spirit?
  Yes ... 1   No ... 2
N57y. Do you like your neighbourhood because it feels safe?
  Yes ... 1   No ... 2
Etc.

Inherently Difficult Questions (3)
Modes to use: F2F, TEL?, SA?
General questionnaire design context
• In general, before use, test questions that are inherently difficult to see how feasible they are for Rs (testing can be done with cognitive interviewing or Belson's (1981) respondent debriefing method)
Mixed modes context
• In a mixed mode design, pre-test questions with inherent difficulty across the modes that will be used to see if there are differences, as sketched below
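Such a cross-mode pre-test comparison can be run in a few lines once responses from randomly assigned mode groups are pooled. The sketch below is an illustration only, not part of the original slides; the DataFrame and its 'mode' and 'answer' columns are hypothetical.

```python
# Minimal sketch: test whether one question's response distribution
# differs between two randomly assigned modes (e.g., F2F vs TEL).
# Assumes a pandas DataFrame `df` with hypothetical columns
# 'mode' and 'answer' (coded responses).
import pandas as pd
from scipy.stats import chi2_contingency

def mode_difference_test(df: pd.DataFrame) -> float:
    table = pd.crosstab(df["mode"], df["answer"])  # answers by mode
    chi2, p, dof, _ = chi2_contingency(table)
    print(f"chi2 = {chi2:.2f}, dof = {dof}, p = {p:.3f}")
    return p
```

With random assignment to mode, a small p-value here flags a measurement difference rather than a sample composition difference.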
Mark All That Apply vs. Yes/No for Each (1)

Mark all that apply:
This card shows a number of different ways for reducing poverty. In your opinion, which of the following would be effective in reducing poverty? MARK ALL THAT APPLY.
  Increasing pensions ... 1
  Investing in education for children ... 2
  Improving access to childcare ... 3
  Redistribution of wealth ... 4
  Increasing trade union rights ... 5
  Reducing discrimination ... 6
  Increasing income support ... 7
  Investing in job creation ... 8
  None of these ... 9

Yes/No for each:
Next are a number of questions about different ways for reducing poverty. In your opinion, which of the following would be effective?
Would increasing pensions reduce poverty?   Yes ... 1   No ... 2
Would investing in education for children reduce poverty?   Yes ... 1   No ... 2
Etc.

Mark All That Apply vs. Yes/No for Each (2)
General questionnaire design context
'Mark all that apply' is problematic:
• Sudman and Bradburn (1982)
• Rasinski et al (1994), Smyth et al (2006) and Thomas and Klein (2006)
• Thomas and Klein (2006)
• Smyth et al (2006)
• Nicolaas et al (2011)

Mark All That Apply vs. Yes/No for Each (3)
Mixed mode context
• Smyth et al (2008): student sample
• Nicolaas et al (2011): probability sample of the adult population
• More research needed

Mark All That Apply vs. Yes/No for Each (4)
Mark all that apply (Modes to use: F2F?, SA?)
Recommendations
• The 'mark all that apply' format is prone to lower reporting of items, quicker processing time and primacy effects; therefore it is probably best avoided
• However, it may be less likely to show mode effects in a mixed mode design (F2F with showcard versus SA)
Yes/No for each (Modes to use: F2F, TEL, SA)
Recommendations
• The 'Yes/No for each' format is strongly supported as superior to 'mark all that apply' by Smyth et al (2006, 2008). But:
• It can add to the time taken to complete a questionnaire
• Long lists of items should be avoided to reduce potential R burden
• The results from Nicolaas et al (2011) suggest that the format should be tested across modes before use

Ranking versus Rating (1)

Ranking:
What would you consider most important in improving the quality of your neighbourhood? Please rank the following 7 items from 1 (meaning most important) to 7 (meaning least important).
  □ Less traffic
  □ Less crime
  □ More / better shops
  □ Better schools
  □ More / better facilities for leisure activities
  □ Better transport links
  □ More parking spaces

Battery of rating questions:
Next are a number of questions about improving your neighbourhood. How important would less traffic be for improving the quality of your neighbourhood?
  Very important ... 1
  Moderately important ... 2
  Somewhat important ... 3
  Or not important at all? ... 4
Etc.

Ranking versus Rating (2)
General questionnaire design context
Ranking:
• Is difficult (Fowler, 1995)
• Shows primacy effects (see Stern, Dillman and Smyth, 2007)
• But yields better quality data (see Alwin and Krosnick, 1985; Krosnick, 1999; Krosnick, 2000)

Ranking versus Rating (3)
Mixed modes context
• Rating is more susceptible to non-differentiation in Web than in TEL (Fricker et al, 2005)
• Similarly, rating is sometimes more susceptible to non-differentiation in Web or TEL than in F2F (Hope et al, 2011)
• Ranking is more susceptible to non-differentiation in Web than in F2F (TEL not tested) (Hope et al, 2011); a simple non-differentiation check is sketched below
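Non-differentiation (straight-lining) in a rating battery can be quantified directly when assessing such differences. A minimal sketch, assuming a hypothetical DataFrame with one column per battery item plus a 'mode' column:

```python
# Minimal sketch: share of straight-liners in a rating battery, by mode.
# Assumes `df` holds one column per battery item plus a 'mode' column;
# all column names are hypothetical.
import pandas as pd

ITEMS = ["traffic", "crime", "shops", "schools"]

def straightlining_by_mode(df: pd.DataFrame) -> pd.Series:
    within_sd = df[ITEMS].std(axis=1)  # spread of each R's answers
    return (within_sd == 0).groupby(df["mode"]).mean()
```

A higher share of zero-spread respondents in, say, Web than F2F would echo the Fricker et al (2005) and Hope et al (2011) patterns above.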
Ranking versus Rating (4)
Ranking (Modes to use: F2F, NOT TEL, SA?)
Recommendations
• Avoid the use of ranking in mixed mode studies
• Ranking could be considered for F2F surveys if the list is short
• Ranking is not feasible for TEL surveys (unless there are 4 categories or fewer)
• Ranking is often problematic in SA modes
• Ranking with programme controls in Web may irritate or confuse some Rs
Rating (Modes to use: F2F, TEL?, SA?)
Recommendations
• Avoid long sequences of questions using the same rating scale in mixed mode designs that include Web and possibly TEL
• Could try a rating task followed by ranking of the duplicates (except in postal, where the skip patterns would be too difficult)

Agree/Disagree Questions
This neighbourhood is not a bad place to live.
  Strongly agree ... 1
  Agree ... 2
  Neither agree nor disagree ... 3
  Disagree ... 4
  Or strongly disagree? ... 5
General questionnaire design context
• Agree/disagree questions are a problematic format in all modes
• They create a cognitively complex task
• They are susceptible to acquiescence bias
• For additional problems, see Fowler (1995), Converse and Presser (1986), Saris et al (2010) and the recent Holbrook AAPOR Webinar
Mixed modes context
• Differences across modes were found, with more acquiescence bias in the interviewer modes and, curiously, more middle category selection in SA (Hope et al, 2011)
Modes to use: should not be used in any mode
Recommendations
• Avoid agree/disagree scales and use alternative formats, such as questions with item-specific (IS) response options

Use of Middle Category (1)
And how satisfied or dissatisfied are you with street cleaning?
  Very satisfied ... 1
  Moderately satisfied ... 2
  Slightly satisfied ... 3
  Neither satisfied nor dissatisfied ... 4
  Slightly dissatisfied ... 5
  Moderately dissatisfied ... 6
  Very dissatisfied ... 7

Use of Middle Category (2)
General questionnaire design context
• Kalton et al (1980)
• Krosnick (1991) and Krosnick and Fabrigar (1997)
• Schuman and Presser (1981)
• Krosnick and Presser (2010)
• Krosnick and Fabrigar (1997)
• O'Muircheartaigh, Krosnick and Helic (2001)
• Hope et al (2011)

Use of Middle Category (3)
Mixed modes context
• More use of the middle category in visual (as opposed to aural) modes (Tarnai and Dillman, 1992)
• More selection of middle categories on end-labelled scales than on fully labelled scales, but less so for TEL (Hope et al, 2011)
• More use of the middle category in Web as opposed to F2F or TEL (Hope et al, 2011)

Use of Middle Category (4)
Modes to use: F2F, TEL?, SA?
Recommendations
• Probably best not to use middle categories in a mixed modes study that includes SA
• If the mixed mode design includes TEL interviews, be cautious about the use of end-labelled scales
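Middle-category comparisons like those above reduce to a midpoint selection rate per mode; a minimal sketch (hypothetical column names; midpoint 4 matches the 7-point example):

```python
# Minimal sketch: rate of midpoint selection on a 1-7 scale, by mode.
# Assumes `df` has hypothetical 'mode' and 'satisfaction' columns.
import pandas as pd

def midpoint_rates(df: pd.DataFrame, midpoint: int = 4) -> pd.Series:
    return (df["satisfaction"] == midpoint).groupby(df["mode"]).mean()
```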
Overall Typology of Questions
A classification of question characteristics relevant to measurement error

Question content
• Topic: behaviour, other factual, attitude, satisfaction, other subjective
• Sensitivity
• Inherent difficulty: conceptual, comprehension, recall

Question format
• Type of task
  • Open: number; date; short textual/verbal; unconstrained textual/verbal; ratio/interval; visual analogue scale
  • Closed, ordinal: agree/disagree; rating (unipolar); rating (bipolar); numeric bands; battery of rating scales
  • Closed, nominal: yes/no; mark all; ranking
• Characteristics of the task: number of categories; middle categories; full/end labels; branching
• Implementation of the question: use of instructions, probes, clarification, etc.; edit checks; DK/refused explicit or implicit; formatting of response boxes; size of answer box/text field; labelling of response boxes; delineation of answer space; formatting of response lists; showcards

In Summary
1) Mode is a characteristic of a question
2) Good questionnaire design is key to minimising many measurement differences
3) But we are unlikely to eliminate all differences, as there are different types of satisficing in different modes
4) We need to do more to assess any remaining differences and find ways to adjust for them (more on this in the next few slides)

Assessing Mixed Mode Measurement Error (1)
Quality indicators, for example:
• Mean item nonresponse rate
• Mean length of responses to open questions
• Mean number of responses in 'mark all that apply'
• Psychometric scaling properties
• Comparison of survey estimates to a 'gold' standard (de Leeuw, 2005; Kreuter et al, 2008; Voogt and Saris, 2005), although validation data are often hard or impossible to obtain
• Etc.

Assessing Mixed Mode Measurement Error (2)
How was the mixed mode data collected? What are the confounding factors or limitations?
• Random assignment:
  • Rs randomly assigned to mode (Nicolaas et al, 2011), but this is not always possible
  • A random group changes mode during the interview (Heerwegh, 2009)
  • In both cases, non-comparability can occur due to differential nonresponse bias
• R chooses the mode of data collection:
  • May reduce nonresponse, but selection and measurement error effects are confounded (Vannieuwenhuyze et al, 2010)

Assessing Mixed Mode Measurement Error (3)
Ways to separate sample composition from mode effects
• Compare mixed mode data to that of a comparable single-mode survey (Vannieuwenhuyze et al, 2010)
• Statistical modelling:
  • Weighting (Lee, 2006; see the sketch after this list)
  • Multivariate models (Dillman et al, 2009)
  • Latent variable models (Biemer, 2001)
• Propensity score matching (Lugtig et al, 2011):
  • Match Rs from two survey modes who share the same background characteristics
  • Identify Rs who are unique to a specific survey mode and those who are found in both modes
  • May be a useful technique

Assessing Mixed Mode Measurement Error (4)
The size of effects between modes depends on the type of analysis, which depends on the type of reporting needed. For example, reporting of:
• Means
• Percentages for extreme categories
• Percentages for all categories
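To make the weighting idea concrete: one can model each respondent's probability of having responded in one mode from background characteristics and weight by its inverse. This is a minimal sketch under assumed variable names ('web', 'age', 'female'), not the procedure of Lee (2006) in full:

```python
# Minimal sketch: propensity-score weighting to adjust for compositional
# differences between two modes. Assumes `df` has a binary 'web' mode
# indicator and covariates 'age' and 'female'; all names are hypothetical.
import pandas as pd
from sklearn.linear_model import LogisticRegression

def propensity_weights(df: pd.DataFrame) -> pd.Series:
    X = df[["age", "female"]]
    p = LogisticRegression().fit(X, df["web"]).predict_proba(X)[:, 1]
    # Inverse-probability weights: each mode group is reweighted to
    # resemble the pooled sample on the modelled covariates.
    return df["web"] / p + (1 - df["web"]) / (1 - p)
```

Such weights only remove composition differences on the modelled covariates; remaining differences between the weighted mode estimates are then closer to pure measurement effects.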
We hope that today's talk has given you...
• More understanding of the theoretical and practical differences in how Rs react to different modes of data collection
• More awareness of specific question attributes that make certain questions less portable across modes
• More knowledge of, and confidence in, executing your own mixed modes questionnaires

Thank you all for listening
dr.pamela.campanelli@thesurveycoach.com
Complete table of results and recommendations available upon request

Appendix

Open Questions (1)
Option 1: Unconstrained textual/verbal open questions (i.e., fully open questions)
General questionnaire design context (SA)
• Lines in text boxes versus an open box: Christian and Dillman (2004); but Ciochetto et al (2006)
• Slightly larger answer spaces (Christian and Dillman, 2004)

Open Questions (2)
Option 1: Fully open questions (continued)
Mixed mode context
• TEL Rs give less detailed answers to open-ended questions than F2F Rs (Groves and Kahn, 1979; Sykes and Collins, 1988; de Leeuw and van der Zouwen, 1988)
• Paper SA Rs give less complete answers to open-ended questions than F2F or TEL Rs (Dillman, 2007; de Leeuw, 1992; Groves and Kahn, 1979)
• Web Rs provide 30 more words on average than paper SA Rs (Schaeffer and Dillman, 1998)
• Positive effects of larger answer spaces may also apply to interview surveys (Smith, 1993; 1995)

Open Questions (3)
Option 1: Fully open questions (continued)
Modes to use: F2F, TEL, SA?
Recommendations
• If the mixed mode design includes SA:
  • Minimise the use of open questions (as less complete answers are obtained)
  • Pre-test the SA visual layout 1) to ensure that the question is understood as intended and 2) to check if there are differences across modes

Open Questions (4)
Option 2: Open questions requiring a number, date, or short textual/verbal response
General questionnaire design context (SA)
• Small changes in visual design can have a large impact on measurement
• Examples: Couper, Traugott and Lamias (2001); Smith (1993; 1995); Dillman et al (2004); Martin et al (2007)

Open Questions (5)
Option 2: Short number, date or textual/verbal response (continued)
Mixed modes context
Modes to use: F2F, TEL, SA?
Recommendations
• Test the SA visual layout 1) to ensure that the question is understood as intended and 2) to check if there are differences across modes

End-labelled versus Fully-labelled (1)
On the whole, how satisfied are you with the present state of the economy in Great Britain, where 1 is very satisfied and 7 is very dissatisfied?
General questionnaire design context
• Krosnick and Fabrigar (1997) suggest that fully-labelled scales are easier to answer and more reliable and valid
• The two formats are not equivalent:
  • Fully-labelled scales produce more positive responses (Dillman and Christian, 2005; Campanelli et al, 2012)
  • End-labelled scales have a higher percentage of Rs in the middle category (Campanelli et al, 2012; not discussed in the text but shown in the tables of Dillman and Christian, 2005)

End-labelled versus Fully-labelled (2)
Mixed modes context
• Although there is higher endorsement of middle categories on end-labelled scales, this is less true for TEL Rs (Campanelli et al, 2012)
Modes to use: F2F, TEL?, SA
Recommendations
• Be careful with end-labelled scales, as these are more difficult for Rs
• If the mixed mode design includes TEL interviews, be cautious about the use of end-labelled scales
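The 'mean length of responses' quality indicator from earlier, and the word-count difference reported by Schaeffer and Dillman (1998), can be checked with one line per mode; a minimal sketch with hypothetical column names:

```python
# Minimal sketch: mean word count of open-ended answers, by mode.
# Assumes `df` has hypothetical 'mode' and free-text 'answer' columns.
import pandas as pd

def mean_words_by_mode(df: pd.DataFrame) -> pd.Series:
    words = df["answer"].fillna("").str.split().str.len()
    return words.groupby(df["mode"]).mean()
```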
Branching versus No Branching (1)
In the last 12 months would you say your health has been good or not good?
  Good ... 1   Not good ... 2
IF GOOD: Would you say your health has been fairly good or very good?
  Fairly good ... 1   Very good ... 2
IF NOT GOOD: Would you say your health has been not very good or not good at all?
  Not very good ... 1   Not good at all ... 2
General questionnaire design context
• In TEL surveys, ordinal scales are often changed into a sequence of two or more branching questions in order to reduce the cognitive burden
• Krosnick and Berent (1993); Malhotra et al (2009); Hunter (2005); Nicolaas et al (2011)

Branching versus No Branching (2)
Mixed modes context
• Nicolaas et al (2000) found more extreme responses to attitude questions in the branched format in TEL mode (but it is unclear whether these are more valid)
• Nicolaas et al (2011) found mode differences between F2F, TEL and Web, but with no clear patterns, and no mode difference for the non-branching format
• More research needed

Branching versus No Branching (3)
Modes to use: F2F, TEL, SA
Recommendations
• As branching may improve reliability and validity, if used, it should be used across all modes (for analysis, the branches are recombined into a single scale, as sketched below)
• But testing is recommended to see if mode differences are present
• Due to R non-compliance with skip patterns in paper SA, Dillman (2007) recommends avoiding branching questions in mixed mode surveys that include a postal component; instead, reduce the number of categories so that branching is not required
• (Dillman (2007) shows that the skips after a filter question can be missed by a fifth of postal survey Rs)
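A minimal sketch of that recombination for the health example, under assumed variable names and codings (root: 1 = good, 2 = not good; follow-ups coded as shown above):

```python
# Minimal sketch: recombine the branched health question into one 4-point
# scale (1 = not good at all ... 4 = very good). The column names 'root',
# 'good_fu' and 'notgood_fu' are hypothetical.
import pandas as pd

def combine_branches(df: pd.DataFrame) -> pd.Series:
    good_side = df["good_fu"].map({1: 3, 2: 4})    # fairly good, very good
    bad_side = df["notgood_fu"].map({1: 2, 2: 1})  # not very good, not good at all
    return good_side.where(df["root"] == 1, bad_side)
```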
Implementation of the Task: Use of Instructions, Probes, Clarifications, etc. (1)
Can I check, is English your first or main language?
INTERVIEWER: If 'yes', probe: 'Is English the only language you speak or do you speak any other languages, apart from languages you may be learning at school as part of your studies?'
  Yes - English only ... 1
  Yes - English first/main and speaks other languages ... 2
  No, another language is respondent's first or main language ... 3
  Respondent is bilingual ... 4

Use of Instructions, Probes, Clarifications, etc. (2)
• It is common practice to provide interviewers with additional information that can be used, if necessary, to improve the quality of information from Rs
• Although not yet studied in mixed modes, it is likely that this may result in differences across modes in a study that uses SA alongside interviewer modes

Use of Instructions, Probes, Clarifications, etc. (3)
Modes to use: F2F, TEL, SA
Recommendations
• Where possible, all instructions and clarifications should be added to the question for all modes (rather than being left to the discretion of the interviewer), or excluded from all modes
• Dillman (2007) recommends that interviewer instructions be evaluated for unintended response effects and that their use for SA modes be considered

Don't Know (1)
What, if any, is your religion?
  None ... 1
  Christian ... 2
  Buddhist ... 3
  Hindu ... 4
  Jewish ... 5
  Muslim ... 6
  Sikh ... 7
  Another religion ... 8
  (Spontaneous only: Don't know ... 98; Refused ... 99)
General questionnaire design context
• In SA modes, the 'don't know' option tends to be either an explicit response option or omitted altogether
• Offering an explicit 'don't know' response greatly increases cases in this category, particularly for Rs with lower educational attainment (see Schuman and Presser, 1981; Krosnick et al, 2002)
• It is common practice not to provide an explicit 'don't know' in TEL and F2F

Don't Know (2)
Mixed mode context
• Treating 'don't know' differently in different modes may result in different rates of 'don't know' across the modes
• Fricker et al (2005); Dennis and Li (2007); Bishop et al (1980); Vis-Visschers (2009)

Don't Know (3)
Modes to use: F2F, TEL, SA
Recommendations
• A spontaneous 'don't know' can be offered in mixed mode designs that include only interviewer-administered modes (i.e., TEL and F2F)
• For mixed mode designs that include both interviewer-administered and SA modes, it is generally recommended not to allow 'don't know' as a response option
• Further research is required to compare spontaneous 'don't know' in TEL and F2F with alternative methods of dealing with 'don't know' in Web questionnaires (e.g. allowing questions to be skipped without further prompting)
• For questions where it is likely that many Rs may not know the answer, explicit 'don't knows' should be used across all modes
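When comparing 'don't know' rates across modes, the spontaneous codes (98/99 in the example above) first need to be treated consistently; a minimal sketch with hypothetical names:

```python
# Minimal sketch: DK rate by mode, with spontaneous codes handled
# consistently. Assumes hypothetical 'mode' and 'religion' columns,
# with DK coded 98 and refused coded 99 as in the example.
import pandas as pd

def dk_rate_by_mode(df: pd.DataFrame) -> pd.Series:
    return (df["religion"] == 98).groupby(df["mode"]).mean()

def drop_dk_refused(df: pd.DataFrame) -> pd.Series:
    # Recode DK/refused to missing before substantive analysis.
    return df["religion"].replace({98: pd.NA, 99: pd.NA})
```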
Full References

Alwin, D. & Krosnick, J. (1985). The Measurement of Values in Surveys: A Comparison of Ratings and Rankings. Public Opinion Quarterly, 49(4), 535-552.
Belson, W. (1981). The Design and Understanding of Survey Questions. Aldershot, England: Gower.
Biemer, P. (2001). Non-response Bias and Measurement Bias in a Comparison of Face to Face and Telephone Interviewing. Journal of Official Statistics, 17(2), 295-320.
Bishop, G., Oldendick, R., Tuchfarber, A. & Bennett, S. (1980). Pseudo-Opinions on Public Affairs. Public Opinion Quarterly, 44(2), 198-209.
Campanelli, P. (2010). Internal analysis documents from the ESRC Survey Design and Measurement Initiative grant on Mixed Modes and Measurement Error.
Campanelli, P., Gray, M., Blake, M. & Hope, S. (2012). Suspicious and Non-Suspicious Response Patterns which Are and Are Not Problematic. Unpublished paper.
Christian, L. & Dillman, D. (2004). The Influence of Graphical and Symbolic Language Manipulations on Responses to Self-Administered Questions. Public Opinion Quarterly, 68(1), 57-80.
Ciochetto, S., Murphy, E. & Agarwal, A. (2006). Usability Testing of Alternative Design Features for the 2005 National Census Test (NCT) Internet Form: Methods, Results, and Recommendations of Round-2 Testing. Human-Computer Interaction Memorandum #85. Washington, DC: U.S. Census Bureau (Usability Laboratory).
Converse, J. & Presser, S. (1986). Survey Questions: Handcrafting the Standardized Questionnaire. Thousand Oaks, California: Sage.
Couper, M., Traugott, M. & Lamias, M. (2001). Web Survey Design and Administration. Public Opinion Quarterly, 65(2), 230.
de Leeuw, E. & van der Zouwen, J. (1988). Data Quality in Telephone and Face-to-Face Surveys: A Comparative Meta-Analysis. In R. Groves et al. (Eds), Telephone Survey Methodology. Hoboken, New Jersey: Wiley.
de Leeuw, E. (1992). Data Quality in Mail, Telephone and Face to Face Surveys. Dissertation. Amsterdam: T.T. Publikaties.
de Leeuw, E.D. (2005). To Mix or Not to Mix Data Collection Modes in Surveys. Journal of Official Statistics, 21(2), 233-255.
Dennis, M. & Li, R. (2007). More Honest Answers to Web Surveys? A Study on Data Collection Mode Effects. IMRO's Journal of Online Research, published 10/10/2007. http://ijor.mypublicsquare.com/view/more-honest-answers
Dillman, D. (2007). Mail and Internet Surveys: The Tailored Design Method, 2nd edition. Hoboken, New Jersey: Wiley.
Dillman, D. & Christian, L. (2005). Survey Mode as a Source of Instability in Responses Across Surveys. Field Methods, 17(1), 30-52.
Dillman, D., Parsons, N. & Mahon-Haft, T. (2004). Cognitive Interview Comparisons of the Census 2000 Form and New Alternatives. Technical Report 04-030, Social and Economic Sciences Research Center, Washington State University, Pullman, Washington. http://www.sesrc.wsu.edu/dillman/papers/2004/connectionsbetweenopticalfeatures.pdf
Dillman, D., Smyth, J. & Christian, L.M. (2009). Internet, Mail and Mixed-Mode Surveys: The Tailored Design Method, 3rd edition. Hoboken, New Jersey: Wiley.
Fowler, F.J. (1995). Improving Survey Questions: Design and Evaluation. Thousand Oaks, California: Sage.
Fowler, F.J., Roman, A. & Di, Z. (1998). Mode Effects in a Survey of Medicare Prostate Surgery Patients. Public Opinion Quarterly, 62(1), 29.
Fricker, S., Galesic, M., Tourangeau, R. & Yan, T. (2005). An Experimental Comparison of Web and Telephone Surveys. Public Opinion Quarterly, 69(3), 370-392.
Groves, R.M. & Kahn, R.L. (1979). Surveys by Telephone: A National Comparison with Personal Interview. New York: Academic Press.
Heerwegh, D. (2009). Mode Differences between Face-to-Face and Web Surveys: An Experimental Investigation of Data Quality and Social Desirability Effects. International Journal of Public Opinion Research, 21(1), 111-121.
Hope, S., Campanelli, P., Nicolaas, G., Lynn, P. & Jäckle, A. (2011). The Role of the Interviewer in Producing Mode Effects: Results from a Mixed Modes Experiment. ESRA Conference, 21 July 2011.
Hunt, S., Sparkman, R. & Wilcox, J. (1982). The Pretest in Survey Research: Issues and Preliminary Findings. Journal of Marketing Research, 19(2), 269-273.
Hunter, J. (2005). Cognitive Test of the 2006 NRFU: Round 1. Statistical Research Division Study Series, Survey Methodology #2005-07, U.S. Census Bureau. http://www.census.gov/srd/papers/pdf/ssm2005-07.pdf
Jäckle, A., Lynn, P., Campanelli, P., Nicolaas, G. & Hope, S. (2011). How and When Does the Mode of Data Collection Affect Survey Measurement? ESRA Conference, 21 July 2011.
Kalton, G., Roberts, J. & Holt, D. (1980). The Effects of Offering a Middle Response Option with Opinion Questions. Statistician, 29, 65-78.
Kreuter, F., Presser, S. & Tourangeau, R. (2008). Social Desirability Bias in CATI, IVR, and Web Surveys: The Effects of Mode and Question Sensitivity. Public Opinion Quarterly, 72(5), 847-865.
Krosnick, J. (1991). Response Strategies for Coping with the Cognitive Demands of Attitude Measures in Surveys. Applied Cognitive Psychology, 5, 213-236.
Krosnick, J. (1999). Survey Research. Annual Review of Psychology, 50, 537-567.
Krosnick, J. (2000). The Threat of Satisficing in Surveys: The Shortcuts Respondents Take in Answering Questions. Survey Methods Newsletter, 20(1). (Published by the National Centre for Social Research, London, UK.)
Krosnick, J. & Berent, M. (1993). Comparisons of Party Identification and Policy Preferences: The Impact of Survey Question Format. American Journal of Political Science, 37(3), 941-964.
Krosnick, J. & Fabrigar, L. (1997). Designing Rating Scales for Effective Measurement in Surveys. In L. Lyberg et al. (Eds), Survey Measurement and Process Quality (pp. 141-164). Hoboken, New Jersey: Wiley.
Krosnick, J., Holbrook, A., Berent, M., Carson, R., Hanemann, W., Kopp, R., Mitchell, R., et al. (2002). The Impact of "No Opinion" Response Options on Data Quality: Non-Attitude Reduction or an Invitation to Satisfice? Public Opinion Quarterly, 66(3), 371-403.
Krosnick, J., Narayan, S. & Smith, W. (1996). Satisficing in Surveys: Initial Evidence. In M.T. Braverman & J.K. Slater (Eds), Advances in Survey Research (Vol. 70, pp. 29-44). San Francisco: Jossey-Bass.
Krosnick, J. & Presser, S. (2010). Question and Questionnaire Design. In J.D. Wright & P.V. Marsden (Eds), Handbook of Survey Research, 2nd edition. San Diego, CA: Elsevier.
Lee, S. (2006). Propensity Score Adjustment as a Weighting Scheme for Volunteer Panel Web Surveys. Journal of Official Statistics, 22(2), 329-349.
Lozar Manfreda, K. & Vehovar, V. (2002). Mode Effect in Web Surveys. In the proceedings of the American Association for Public Opinion Research (AAPOR) 57th Annual Conference, 2002.
Lugtig, P., Lensvelt-Mulders, G., Frerichs, R. & Greven, A. (2011). Estimating Nonresponse Bias and Mode Effects in a Mixed-Mode Survey. International Journal of Market Research, 53(5).
Malhotra, N., Krosnick, J. & Thomas, R. (2009). Optimal Design of Branching Questions to Measure Bipolar Constructs. Public Opinion Quarterly, 73(2), 304-324.
Martin, E., Childs, H., Hill, J., Gerber, E. & Styles, K. (2007). Guidelines for Designing Questionnaires for Administration in Different Modes. Washington, DC: US Census Bureau. http://www.census.gov/srd/modeguidelines.pdf
Nicolaas, G., Campanelli, P., Hope, S., Jäckle, A. & Lynn, P. (2011). Is It a Good Idea to Optimise Question Format for Mode of Data Collection? Results from a Mixed Modes Experiment. ISER Working Paper no. 2011-31. ISER, University of Essex.
Nicolaas, G., Thomson, K. & Lynn, P. (2000). Feasibility of Conducting Electoral Surveys in the UK by Telephone. National Centre for Social Research.
O'Muircheartaigh, C., Krosnick, J. & Helic, A. (2001). Middle Alternatives, Acquiescence, and the Quality of Questionnaire Data. Irving B. Harris Graduate School of Public Policy Studies, University of Chicago.
Rasinski, K., Mingay, D. & Bradburn, N. (1994). Do Respondents Really "Mark All That Apply" on Self-Administered Questions? Public Opinion Quarterly, 58(3), 400-408.
Roy, L., Gilmour, G. & Laroche, D. (2004). The Internet Response Method: Impact on Canadian Census of Population Data. Statistics Canada Internal Report. http://www.amstat.org/sections/srms/proceedings/y2006/Files/JSM2006-000808.pdf
Sangster, R. & Fox, J. (2000). Housing Rent Stability Bias Study. Washington, DC: U.S. Bureau of Labor Statistics, Statistical Methods Division.
Saris, W., Revilla, M., Krosnick, J. & Shaeffer, E. (2010). Comparing Questions with Agree/Disagree Response Options to Questions with Item-Specific Response Options. Survey Research Methods, 4(1), 61-79.
Schaeffer, D. & Dillman, D. (1998). Development of a Standard Email Methodology. Public Opinion Quarterly, 62(3), 378-397.
Schonlau, M., Zapert, K., Simon, L., Sanstad, K., Marcus, S., Adams, J., Spranca, M., et al. (2003). A Comparison Between Responses from a Propensity-Weighted Web Survey and an Identical RDD Survey. Social Science Computer Review, 22(1), 128-138. http://www.schonlau.net/publication/03socialsciencecomputerreview_propensity_galley.pdf
Schuman, H. & Presser, S. (1981). Questions and Answers in Attitude Surveys: Experiments on Question Form, Wording, and Context. New York: Academic Press.
Smith, T. (1993). Little Things Matter: A Sampler of How Differences in Questionnaire Format Can Affect Survey Responses. GSS Methodological Report no. 78. Chicago: National Opinion Research Center.
Smith, T. (1995). Little Things Matter: A Sample of How Differences in Questionnaire Format Can Affect Survey Responses. Paper presented at the annual meeting of the American Association for Public Opinion Research.
Smyth, J., Dillman, D., Christian, L. & Stern, M. (2006). Comparing Check-All and Forced-Choice Question Formats in Web Surveys. Public Opinion Quarterly, 70(1), 66-77.
Smyth, J., Christian, L. & Dillman, D. (2008). Does "Yes or No" on the Telephone Mean the Same as "Check-All-That-Apply" on the Web? Public Opinion Quarterly, 72(1), 103-113.
Stern, M., Dillman, D. & Smyth, J. (2007). Visual Design, Order Effects, and Respondent Characteristics in a Self-Administered Survey. Survey Research Methods, 1(3), 121-138.
Sudman, S. & Bradburn, N.M. (1982). Asking Questions. San Francisco, California: Jossey-Bass.
Sykes, W. & Collins, M. (1988). Effect of Mode of Interview: Experiments in the UK. In R. Groves et al. (Eds), Telephone Survey Methodology. Hoboken, New Jersey: Wiley.
Tarnai, J. & Dillman, D. (1992). Questionnaire Context as a Source of Response Differences in Mail vs. Telephone Surveys. In N. Schwarz, H.J. Hippler & S. Sudman (Eds), Order Effects in Social and Psychological Research. New York: Springer-Verlag.
Thomas, R.K. & Klein, J.D. (2006). Merely Incidental? Effects of Response Format on Self-Reported Behaviour. Journal of Official Statistics, 22, 221-244.
Tourangeau, R. & Yan, T. (2007). Sensitive Questions in Surveys. Psychological Bulletin, 133(5), 859-883.
Vannieuwenhuyze, J., Loosveldt, G. & Molenberghs, G. (2010). A Method for Evaluating Mode Effects in Mixed-Mode Surveys. Public Opinion Quarterly, 74(5), 1027-1045.
Vis-Visschers, R. (2009). Presenting 'Don't Know' in Web Surveys. Paper presented at the 7th QUEST Workshop, Bergen, Norway, 18-20 May 2009.
Ye, C., Fulton, J. & Tourangeau, R. (2011). More Positive or More Extreme? A Meta-Analysis of Mode Differences in Response Choice. Public Opinion Quarterly, 75(2), 349-365.