Non-sampling errors (Session 20)
SADC Course in Statistics

Learning Objectives
By the end of this session, you will be able to
• describe the types of non-sampling errors that arise in survey work
• explain actions that may be taken to minimise commonly occurring non-sampling errors
• appreciate that sampling error is only a small component of all errors that may arise, and that close attention to reducing non-sampling errors is equally or more important in survey work.

Non-sampling errors: 1
Non-sampling errors cover all errors other than those due to sampling a subset of the population.
In both surveys and censuses, it is quite usual to find non-sampling errors, because their absence would imply that the data collection process has been:
(a) implemented and enumerated perfectly, and
(b) completely free of measurement errors, i.e. inaccuracies in the recording of information from selected units.

Non-sampling errors: 2
Non-sampling errors are not all due to avoidable mistakes and/or deficiencies. They often occur because of decisions by the researchers to balance the need for good quality data with the need to obtain timely data at acceptable cost.
The problem then reduces to one of defining and minimising errors associated with the data collection and data processing procedures.

Types of non-sampling errors
Non-sampling errors can be of various types:
• Coverage (or frame) errors
• Non-response errors
• Measurement errors
• Data handling errors
Note that the first more often applies to sample surveys, while the last three apply to both surveys and censuses.

Coverage (frame) errors
In surveys, the sample is selected from a list, i.e. a sampling frame, of all population members. An inadequate frame leads to coverage errors.
There can be either under-coverage (missing elements) or over-coverage (duplicates). Both lead to biased results. See below for an example.

Minimising frame errors
For under-coverage, consider re-defining the population, i.e. the target population is simply taken to be the population that can be accessed through the frame.
For duplicates, develop a system to identify the duplicates, e.g. by using additional information on the recording unit.
Both under- and over-coverage are minimised by using up-to-date frames, e.g. in the UK the Postcode Address File is updated every 3 months, and is hence often used by the Office for National Statistics (ONS).

Non-response errors
Non-response errors are all errors arising from:
• Unit non-response, i.e. failure to obtain information from a pre-chosen sampling unit or population unit
• Item non-response, i.e. failure to get a response to a specific question or item in the data recording form.

Types of non-response errors
Discussion:
What are the typical forms of non-response (both unit and item non-response) you encounter in your work?
What are the reasons for non-response?
How can such non-response errors be minimised?

Measurement errors
Measurement errors arise when the recorded response differs from the true value. They can occur for a variety of reasons, e.g.
• through the respondent (e.g. the head of a household) giving an incorrect answer
• through instrument or question error
• through interviewer error.
Further, errors may be greater for some subgroups of the population, e.g. those less literate, or those unwilling to co-operate.

Reasons for respondent errors
Respondent errors arise for many reasons, e.g.
• respondent gives an incorrect answer, e.g.
due to prestige or competence implications, or due to the sensitivity or social undesirability of the question
• respondent misunderstands the requirements
• lack of motivation to give an accurate answer
• “lazy” respondent gives an “average” answer
• question requires memory/recall
• proxy respondents are used, i.e. answers are taken from someone other than the intended respondent.
How can such errors be minimised?

Instrument errors
Instrument or question errors arise when
• the question is unclear, ambiguous or difficult to answer
• the list of possible answers suggested in the recording instrument is incomplete
• the requested information assumes a framework unfamiliar to the respondent
• the definitions used by the survey are different from those used by the respondent (e.g. how many part-time employees do you have? See the next slide for an example).
How can such errors be minimised?

An example of instrument error
The following example is from Ruddock (1998); see the References slide.
In the Short Term Employment Survey (STES) conducted by the Office for National Statistics in the UK, data are collected on numbers of full-time and part-time employees on a given reference date.
Some firms ignored the reference date and gave figures for employees paid at the end of the month, thus including both those who joined and those who left in that month, which led to an overestimate.
Firms found it difficult to give details of part-time employees, as their definition of “part-time” did not agree with that used by ONS.

Interviewer errors
Interviewer errors arise when
• different interviewers administer a survey in different ways
• differences occur in reactions of respondents to different interviewers, e.g.
to interviewers of their own sex or own ethnic group
• interviewers are inadequately trained
• inadequate attention is paid to the selection of interviewers
• the workload for the interviewer is too high.
How can such errors be minimised?

Data handling errors
Data handling errors can occur from the stage of data collection up to the final stages of data analysis. Types of errors that can arise include:
• errors in transmission of data from the field to the office
• errors in preparing the data in a suitable format for computerisation, e.g. during coding of qualitative answers
• errors in computerisation of the data
• errors during data analysis, e.g. imputation and weighting.
Do any of these types of error occur in your work? If so, what can you do to minimise them?

Measuring non-sampling errors
Measuring non-sampling errors is difficult and often impossible.
Attempts have often been made through specific additional studies, e.g. characteristics of non-respondents in the 1996 British Crime Survey were investigated by a mini-questionnaire to those living in 25% of non-responding addresses.
Several studies to assess non-sampling errors can be found in Ruddock (1998) (see the References slide for the full reference) and in Lessler, J.T. and Kalsbeek, W.D. (1992) Non-sampling error in surveys, Wiley.

Non-sampling errors: Key Points
Non-sampling errors are inevitable in the production of national statistics. It is important to:
• At the planning stage, list all potential non-sampling errors and consider the steps to be taken to minimise them.
• If data are collected from other sources, question the procedures adopted for data collection and for data verification at each step of the data chain.
• Critically view the data collected and attempt to resolve queries as soon as they arise.
• Document sources of non-sampling errors so that the results presented can be interpreted meaningfully.

References
Ruddock, V. (1998) “Measuring and Improving Data Quality”. UK Govt. Statistical Service Methodology Series No. 14. This gives a very comprehensive coverage of non-sampling errors. The document may be downloaded from http://www.statistics.gov.uk/methods_quality/publications.asp
Lepkowski, J. (2004) Non-observation error in household surveys in developing countries. Chapter VIII, pp 149-169 of the UN publication An Analysis of Operating Characteristics of Household Surveys in Developing and Transition Countries: Survey Costs, Design Effects and Non-Sampling Errors. Available at http://unstats.un.org/unsd/hhsurveys/index.htm

Practical work follows…
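
As one possible starting point for the practical work, the sketch below shows how two of the checks discussed in this session might be coded: flagging duplicate entries in a sampling frame (over-coverage) and computing unit and item non-response rates. The function names, the field names and the tiny data set are all hypothetical, and recording a missing answer as `None` is an assumption made for illustration, not something prescribed by the course.

```python
from collections import Counter


def find_duplicates(frame, key):
    """Return values of `key` that occur more than once in the frame (possible over-coverage)."""
    counts = Counter(record[key] for record in frame)
    return sorted(value for value, n in counts.items() if n > 1)


def unit_nonresponse_rate(sampled_ids, responses):
    """Share of sampled units from which no record at all was obtained."""
    sampled = set(sampled_ids)
    responded = {r["id"] for r in responses} & sampled
    return 1 - len(responded) / len(sampled)


def item_nonresponse_rate(responses, item):
    """Share of responding units with no answer (None) to one question."""
    missing = sum(1 for r in responses if r.get(item) is None)
    return missing / len(responses)


# Hypothetical frame: the address plays the role of the "additional
# information on the recording unit" used to spot duplicates.
frame = [
    {"id": 1, "address": "12 Hill Rd"},
    {"id": 2, "address": "7 Lake St"},
    {"id": 3, "address": "12 Hill Rd"},
    {"id": 4, "address": "3 Market Sq"},
]

# Hypothetical sample of four units: unit 4 gave no response at all,
# and unit 2 responded but left the "income" question blank.
sampled_ids = [1, 2, 3, 4]
responses = [
    {"id": 1, "household_size": 5, "income": 1200},
    {"id": 2, "household_size": 3, "income": None},
    {"id": 3, "household_size": 4, "income": 900},
]

duplicates = find_duplicates(frame, "address")
unit_rate = unit_nonresponse_rate(sampled_ids, responses)
item_rate = item_nonresponse_rate(responses, "income")

print("Possible duplicate addresses:", duplicates)
print("Unit non-response rate:", round(unit_rate, 2))
print("Item non-response rate (income):", round(item_rate, 2))
```

Rates like these only describe how much non-response occurred. They do not show whether non-respondents differ from respondents, which is the question that additional studies such as the one mentioned under "Measuring non-sampling errors" try to address.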