Data sources and data compilation methods

Workshop for African countries on the Implementation of International Recommendations for Distributive Trade Statistics
27-30 May 2008, Addis Ababa, Ethiopia

UNITED NATIONS STATISTICS DIVISION
Trade Statistics Branch
Distributive Trade Statistics Section

Outline of the presentation
- Data sources for DTS - statistical surveys, administrative data sources and frames
- Data compilation methods
- Data collection strategy

Data sources for compilation of DTS
- Generation of DTS - based on data collected from numerous sources
- Statistical data sources - data are collected specifically for statistical purposes
- Administrative data sources - provide data created originally for purposes other than the production of statistical data

Statistical data sources
- Statistical surveys
  - Economic censuses - enumeration of all units in the population; basis for the establishment of BR; sampling frame for surveys
  - Sample surveys - collect responses from a few representative units scientifically selected from the population

Advantages of statistical surveys vs.
administrative data sources
- Planning, execution, data collection and processing procedures are under the control of the statistical office itself
- Respondents have less reason to deliberately misreport the data as the NSO guarantees confidentiality
Disadvantages
- Resource intensity (both financial and manpower)
- Increased respondent burden
- High non-response rates
- Sampling errors

Census of trade units (1)
- Types
  - Part of an economy-wide census, including all economic activities
  - Independent census for distributive trade sector/activities only
- Advantages
  - Tend to provide a complete enumeration of units engaged in trade activity, including units of the informal sector, at a particular point in time
  - Allow collection of DTS in great detail, as required at longer intervals of time
- Disadvantages
  - Limited in terms of data content
  - Census planning and organization, and the subsequent transformation of the census’s basic data into DTS, are a time-consuming and resource-intensive exercise
  - Costly, imposes a high burden on respondents
  - Response rates may be reduced, thus affecting the quality of collected information

Census of trade units (2)
- Recommendations
  - Conduct of a complete census of trade units is recommended when:
    - A particular country does not maintain an up-to-date business register
    - There is significant user interest in detailed statistical data by geographical area
  - Censuses should be followed as closely as possible by periodic (annual, quarterly or monthly) sample surveys
  - Censuses of trade units should not be conducted if there are other ways of collecting and producing distributive trade statistics of sufficiently high quality

Sample surveys of trade units (1)
- Technique for obtaining data about a large population of statistical units by selecting and measuring a limited number of units (sample) from that population
- Conclusions about the total population of units are made on the basis of the estimates obtained from the sample
- Scientific sample designs should be applied in order to reduce the
  risk of a distorted view of the population
- Sample survey technique is a less costly way of data collection as compared to the census
- It may be used in conjunction with a cut-off point or not

Sample surveys of trade units (2)
- Wholesale and retail trade sample surveys
  - Rarely restricted to one standard form
  - Tend to be a combination of forms, differentiated by periodicity and major characteristics of trade units: activity, size, legal form, type of operation and the type of variables
  - Occasionally an extra characteristic, such as the geographical location of the unit, may influence the contents of a sample survey

Sample surveys of trade units (3)
- Size thresholds
  - Size of units plays an important role in determining the target population and, where relevant, the sample population of units
  - Most of the sample surveys are conducted for units above a certain size threshold
- Reasons for using the threshold
  - Desire to limit the size of the survey
  - Reduce the response burden on businesses
  - Take account of the problems of maintaining registers for smaller units
- Appropriate size threshold
  - No international recommendation
  - Decision is left to the judgment of each NSO
  - May vary between surveys for different trade activities and periodicity
- Countries are encouraged to:
  - Make periodic assessments of the undercoverage of the surveys due to the thresholds
  - Include a description of such thresholds in country’s metadata

Types of DTS surveys (1)
- Enterprise surveys
  - Sampling units comprise trade enterprises (or statistical units belonging to these enterprises)
  - Assume availability of a sampling frame of trade units
    - List-based frame - BR or census list
    - Area-based frame - a sample of areas is selected first, then the enterprises in it are enumerated
- Recommendations
  - For surveys of distributive trade enterprises, list-based enterprise surveys should generally be preferred to area-based surveys
  - List-based survey is more efficient from a sampling perspective in terms of sample size and maintenance of the
    list
  - Area-based sampling is inappropriate for large or medium-sized enterprises that operate in several areas
  - Area-based enterprise survey approach to be used for collection of data from small trade enterprises operating in the informal or unorganized segment of the economy

Types of DTS surveys (2)
- Household surveys
  - Households are the sampled, reporting and observation units - ensures coverage of production by household enterprises that are too small to be recorded
- Disadvantages of household surveys
  - Sample is designed on the distribution of households, not to provide a representative coverage of trade activities
  - Distributions of households and trade activities are different, as trade activities tend to be concentrated in commercial and market zones
- Recommendations
  - For coverage of unincorporated household enterprises which are not recognized as legal entities separately from their owners

Types of DTS surveys (3)
- Mixed household-enterprise surveys
  - A sample of households is selected and each household is asked whether any of its members own and operate an unincorporated enterprise
  - The list of enterprises thus compiled is used as the basis for selecting the enterprises from which desired data are finally collected
  - In contrast to household surveys they collect information about enterprises per se, not about the persons in a household, including their contribution to the enterprises
- Disadvantages
  - Inefficiency of the sample design
  - Difficulties of handling enterprises with production units in more than one location
- Recommendations
  - Preferred to household surveys or the area-based enterprise survey approach for collecting data and estimating the output of small trade units that are excluded from list-based enterprise surveys

Administrative data sources (ADS) (1)
- Set up in response to legislation and/or regulation
- Each regulation results in a register of the units
- Countries should use ADS for statistical purposes with caution
- Privately controlled ADS
  - Data obtained from private
    sector data suppliers
  - Transfer of data from them to NSOs takes the form of a contract with a payment of a fee
- Recommendations
  - Compilers of DTS should identify and review the available ADS in their countries and use the most appropriate of them for compiling DTS

Administrative data sources (2)
- Advantages
  - Complete coverage of units and perceived low non-response
  - Avoidance of response burden
  - Cheaper for NSOs to acquire data from an ADS than to conduct a survey
  - Suitable for covering the smallest segment of the unit population, which contributes relatively little to the estimates but makes up a substantial percentage of the number of units in the population
  - Smaller sampling errors than in surveys, better accuracy
- Disadvantages
  - Discrepancy between administrative and statistical concepts
  - Poor integration with other data of the statistical system
  - Risks with respect to stability
  - The level of scrutiny applied to variables that are of statistical interest may not be satisfactory
  - Data may become available with unacceptable delay
  - Legal constraints with respect to access and confidentiality

Business register
- Business register (BR) - recommended as the most appropriate source for deriving the sampling frame for distributive trade surveys
- Organization and conduct of any enterprise survey of distributive trade units assumes availability of an adequate sampling frame
- Sampling frame - set of units subject to sampling together with the details about them that will be used for stratification, sampling and contact purposes

Statistical business register
- Comprehensive list of all enterprises and other units that are active in a national economy, together with their characteristics
- A tool for the conduct of statistical surveys as well as a source for statistics in its own right
- Operationalises the selected model of statistical units and facilitates classification of units according to the agreed conceptual standards for all surveys

Statistical business register (1)
- Establishment
  - Available administrative
    registers - starting point for the establishment of Statistical BR
  - If only one administrative register is used, the resulting Statistical BR would likely be deficient in terms of coverage and content and would not provide an adequate sampling frame for subsequent statistical surveys
  - Countries are encouraged to work towards improvement of the coverage and content of their Statistical BR by incorporating data from several administrative sources
  - Need for a single business number for all enterprises
- Maintenance
  - Should be up-to-date and of satisfactory quality
  - Should be regularly maintained and updated to take note of the changes in the enterprise dynamics

Statistical business register (2)
- Sources for the establishment and maintenance of Statistical BR
  - Economic census - provides the most comprehensive list of units and links between them in a given country
  - Administrative data sources - VAT and payroll tax systems, records maintained by governments for the administration of unemployment insurance, social security or other programmes
  - Feedback from enterprise surveys - provides new information on contact address changes, closure of business, change in the economic activity of the unit, etc.
  - Business register surveys - profiling of enterprises
  - Other potential sources - information from trade associations, telephone directories or special listings prepared by telephone companies, etc.
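The recommendation to combine several administrative sources through a single business number can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the two registers, the field names (`bn`, `isic`, `turnover`, `employees`) and the helper `build_register` are hypothetical and do not describe any actual register system.

```python
# Hypothetical records from two administrative registers, keyed by an
# assumed single business number ("bn"). All field names are illustrative.
vat_register = [
    {"bn": "1001", "name": "Alpha Traders", "isic": "4711", "turnover": 250000},
    {"bn": "1002", "name": "Beta Wholesale", "isic": "4690", "turnover": 900000},
]
payroll_register = [
    {"bn": "1002", "name": "Beta Wholesale", "employees": 35},
    {"bn": "1003", "name": "Gamma Retail", "employees": 4},
]


def build_register(*sources):
    """Combine several administrative sources into one record per unit."""
    register = {}
    for source_name, records in sources:
        for record in records:
            unit = register.setdefault(record["bn"], {"sources": []})
            unit["sources"].append(source_name)
            for field, value in record.items():
                # Keep the first value seen; later sources only fill gaps.
                unit.setdefault(field, value)
    return register


register = build_register(("vat", vat_register), ("payroll", payroll_register))

# Units found in only one source would be missing from a register
# built from the other source alone.
single_source = sorted(
    bn for bn, unit in register.items() if len(unit["sources"]) == 1
)
```

In this toy data, a register built from either source alone would miss one unit, which is the coverage deficiency the slide warns about. Without a shared business number the match would have to rely on names or addresses, which is far less reliable.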

Profiling of enterprises
[Diagram. Labels: Group of enterprises (holding company); One-establishment enterprise; Local Unit (establishment); Holding enterprise/establishment serving mainly as control investment unit; Multi-establishment enterprise; Local Unit 1 (establishment); Local Unit 2 (establishment); Local Unit 3 (ancillary establishment)]

Data compilation methods
- Process of data compilation
  - Comprises more than just aggregating the questionnaire items
  - Statistical offices perform a number of checks, validation and statistical procedures to bring the collected data to the level of the intended statistical output
- DTS respondents - prone to commit errors while completing a statistical questionnaire
- DTS data collected through statistical surveys - affected by response and non-response errors of different kinds

Data validation and editing (1)
- Integral part of the data processing operations of all types of statistical surveys
- Needed to solve problems of missing, invalid or inconsistent responses
- Editing
  - Systematic examination of collected data for the purpose of identifying and eventually modifying the inadmissible, inconsistent and highly questionable or improbable values, according to predetermined rules
  - Essential process for assuring quality of the collected information
- Types of editing
  - Micro editing (input editing) - focuses on the editing of an individual record or a questionnaire
  - Macro editing (output editing) - checks are performed on aggregated data

Data validation and editing (2)
- Selective editing
  - Approach for prioritizing and further reducing costs of editing
  - Targets only those micro data items or records that would have a significant impact on the distributive trade survey results
  - Recommended for editing of distributive trade data
- Influential observations
  - Particular data item responses that have the most significant impact upon the main estimates
  - Editing efforts should be focused on them

Data validation and editing (3)
- Edit checks for detecting errors in distributive trade data
  - Routine
    checks - test whether all questions have been answered
  - Validation checks - test whether answers are permissible
  - Rational checks - set of checks based on the statistical analysis of respondent data
  - Plausibility checks - used to pick up large random errors

Imputations (1)
- Missing data
  - Encountered in most of the trade surveys
  - Create problems for data editing
- Types of missing data
  - Item non-response - data for a particular data item of the questionnaire is missing
  - Unit non-response - selected unit has not returned the filled-in questionnaire
- Techniques for dealing with missing data
  - Imputations
  - Re-weighting

Imputations (2)
- Replace one or more erroneous responses or non-responses in a record with plausible and internally consistent values
- Process of filling gaps and eliminating inconsistencies
- Means of producing a complete and consistent file containing imputed data
- Used mainly for estimating missing data in case of item non-response
- Substitution - used in the case of unit non-response
  - Data from previous available periods of that unit
  - Data available for that unit from administrative information

Imputations (3)
- Commonly used imputation methods
  - Subjective treatment
  - Mean/modal value imputation
  - Post stratification
  - Substitution
  - Cold deck - makes use of a fixed set of values, which covers all of the data items
  - Hot deck - replaces each missing value by the available value from a 'donor', i.e.
    a similar participant in the same survey
  - Nearest-neighbour imputation or distance function matching
  - Sequential hot deck imputation
  - Regression (model based) imputation

Item non-response
- Strategies for dealing with item non-response
  - The analysis is confined to the fully completed forms only, as all forms with missing values are ignored
    - Not recommended because even the valid data contained in the partially complete forms are discarded
  - Missing data are imputed so that the data matrix is complete

Unit non-response
- Non-response may occur due to:
  - Non-existence of the unit included in the survey
  - Lack of appreciation of the importance of the data on the part of the respondents
  - Refusal to respond
  - Lack of knowledge of how to respond
  - Lack of resources
  - Non-availability of the desired information
- Ways to minimize the non-response
  - Increase the awareness among respondents of the importance of surveys
  - Appeal to the respondents to cooperate with the statistical authorities
  - Reminders to the non-respondents and resorting to the enforcement measures laid down in the national legislation
- Strategies for dealing with unit non-response
  - Re-weighting - the sample is re-weighted so as to include only the responding sample units
  - Various forms of imputation - similar to those used for item non-response

Data collection strategy (1)
- DTS surveys and/or administrative data sources should cover all units in the economy engaged in economic activities within the scope of the distributive trade sector (Section G of ISIC, Rev.4)
  - Units of all sizes and types, including corporations and unincorporated (household) units
- Data collection strategy
  - NSOs should develop their own data collection strategy
  - Ensures a complete coverage of distributive trade activity
  - Based on an integrated approach covering in principle all trade units across all size classes of enterprises
  - Commensurate with their specific statistical and organizational circumstances

Data collection strategy (2)
- Public incorporated enterprises
  - A directory of such
    units is available in most of the cases
  - To be covered on a complete enumeration basis
- Private and foreign controlled incorporated enterprises
  - Large-scale units
    - To be covered on a complete enumeration basis if possible
  - Others
    - Tend to be significant in numbers but relatively homogeneous
    - To be covered through a sample survey
- Small enterprises
  - Sample surveys - if these are on the statistical BR or through the use of administrative data (tax returns of small enterprises)
  - Fully Integrated Rational Survey Technique (FIRST) - if the register of unincorporated enterprises is not available

Data collection strategy (3)
[Diagram: Total population of units engaged in trade activities]
- In the Business Register (List-frame segment)
  - Large units
    - Public sector - should be covered on a complete enumeration basis
    - Private sector
      - Segment 1: Large units should be covered on a complete enumeration basis
      - Segment 2: Remaining units should be covered through sample survey
  - Small units - covered either through sample surveys or administrative data sources
- Not in the Business Register (Non-list-frame segment)
  - With fixed premises
  - Without fixed premises
  - Area frame - should be covered through sample surveys

FIRST (1)
- Survey programme that efficiently captures comprehensive statistical information from all distributive trade enterprises operating in an economy
- Application
  - Requires two basic statistical sets of information
    - Census enumeration, preferably an economic census - to establish the complete statistical population of units for construction of sampling frame and sample selection
      - Population census - alternative in the absence of economic census
    - Supporting documentation on sample areas/enumeration blocks for the benchmark enumeration
  - Divides the units into two segments
    - List-frame segment - comprises a relatively small number of large units
    - Non-list-frame segment - includes all remaining units that can be covered only by a (geographical) area frame approach

FIRST (2)
- List-frame segment
  - Population of units
    tends to be very heterogeneous in its size and characteristics
  - Surveys are drawn from a BR or a directory of units
- Non-list-frame segment
  - First stage - a sample of area units is selected
  - Second stage - a list is compiled of all establishments operating in each area unit selected in the first stage
  - Establishments falling in the scope of DTS are classified by kind-of-activity
  - Sample of units is drawn from the listed establishments
  - Mobile units
  - All identifiable establishments outside the owners’ home located in the selected area unit, as well as household-based enterprises located within the home - listed by a house-to-house visit

Distributive trade surveys
- Annual surveys
  - Should provide estimates that cover all wholesale and retail trade establishments
  - Comprehensive surveys are not always necessary
- Infra-annual surveys (quarterly or monthly)
  - All establishments above a given cut-off point may be completely enumerated, while the others may be sampled
  - All units may receive a survey form, but an abbreviated version may be used for the small establishments
  - Estimates for the small establishments may be made from administrative data or from other statistical inquiries such as mixed household-enterprise surveys
  - More restricted coverage than annual surveys
  - Small establishments - coverage is subject to their significance and availability of a reliable administrative data source
- Infrequent surveys (5-10 years)
  - Used for collection of data on specialised topics or in greater detail
  - Not appropriate for the purpose of collecting and compiling structural type DTS

Reference period
- Reference period for annual surveys
  - Data compiled in annual surveys should relate to a 12-month period
  - 12-month period should preferably be the calendar year
- Other options
  - Data are more readily available on a different fiscal-year basis for some establishments
  - Some data items (wages and salaries) have to be collected on both a fiscal-year and calendar-year basis to facilitate building up calendar year
    aggregates
  - Data for all establishments are available on a fiscal year basis, which becomes the normal accounting period
- Reference period for infra-annual surveys
  - Corresponding calendar month/quarter - recommended as the reference period for infra-annual surveys
- Other options
  - Some establishments work in quarterly periods of four, four and five weeks
  - NSO should make every effort to standardize the information provided in the monthly returns by some estimation procedures

Thank You