Joint EC-OECD Workshop on International Development of Business and Consumer Tendency Surveys
Brussels, 14-15 November 2005

Task Force on Harmonisation of Survey Operation and Technical Design
Efficient Sample Design and Weighting Methodologies
Analysis of Key Issues and Recommendations

Efficient Sample Design and Weighting Methodologies

• The usefulness of BTS and CS strongly depends on their statistical quality
• Quality of survey data may be measured in terms of (OECD, 2003):
  • reliability
  • timeliness of release
  • comparability over time
  • transparency and accessibility to the users
• This task force deals with sample design and weighting methodologies
• A sound definition of these aspects increases the reliability, and therefore the overall quality, of the data
• Business Tendency Surveys (BTS) are conducted in the manufacturing, services, retail and construction sectors
• Consumer Surveys (CS) measure households' opinions and expectations on their personal situation and on the general economic situation
• For both BTS and CS, key issues in efficient sampling design and weighting methodologies will be analyzed
• An overview of current practices in these fields will be presented, from which a number of draft recommendations aimed at improving the overall quality of the surveys will be derived

Efficient Sampling Design for Business Tendency Surveys

• In the case of BTS, key issues for an efficient sample design are (see also Donzè, Etter, Sydow, Zellweger, "Sample Design for Industry Survey", ECFIN/2003/A3-03):
  • identification of the relevant universe/reference population
  • identification of the sample frame
  • identification of the correct method for sample selection
  • treatment of missing data

• The first step in setting up a BTS is the choice of the relevant universe/reference population
• Typically, it is represented by all the firms operating in a given sector, as
resulting from some official or statistical register
• Some firms may be excluded on the basis of their size (e.g., below a certain size threshold), their location, or their structural characteristics (e.g., exclusion of government bodies)

• The second step is the choice of the sample frame, with the goal of maximizing sample coverage and minimizing coverage errors
• BTS are usually based on a panel of responding firms that are re-interviewed each month
• Demographic and structural characteristics of the respondents have to be known in order to build up the sample; constructing the frame implies the following steps:
  • identification of the appropriate frame list
  • possible adoption of a cut-off strategy
  • identification of the sample, reporting and response units
  • updating of the frame list

• The frame list may be derived from:
  • official or statistical registers
  • membership lists of business associations and chambers of commerce

[Chart: source of the frame list by sector (Manufacturing, Services, Retail, Construction) for non-EU, EU-25 and total; categories: statistical registers, business directory, government registers and other; axis from 0% to 60%]

• The adoption of a cut-off strategy may respond to the need for:
  • a better focus on the sector of interest (cut-off with respect to the sector of activity)
  • a certain stability of the panel, excluding firms below a certain threshold (cut-off with respect to size)
• The sample unit is the unit on which the sample selection procedure is performed; the main choice is between:
  • the whole firm
  • establishments, local units, or kind-of-activity units (KAU)
• Even if the firm is chosen as the sample unit, it is possible to have several reporting and response units within the firm, by sending the questionnaires to different establishments, local units or KAUs within the firm

[Chart: sample unit by sector (Manufacturing, Services, Retail, Construction) for EU, non-EU and total; categories: establishment, enterprise, mixed (establishment + KAU); axis from 0% to 90%]

• Finally, the list should be updated frequently in order to keep track of changes in the structure of the reference universe and to avoid possible problems of:
  • under-coverage (new firms entering the market)
  • ineligibility (old firms exiting the market)
  • duplicate entries (errors)

[Chart: frequency of frame list updating by sector (Manufacturing, Services, Retail, Construction); categories: continuously, more than once per year, yearly, between 1 and 6 years, other (ad hoc), not available; axis from 0% to 50%]

• The third step is the choice of an appropriate method of sample selection
• The sample should be representative of the relevant universe
• In order to build up a representative sample, choices have to be made regarding:
  • the sampling method
  • the sample size
• BTS are usually based on a fixed panel of reporting units
• A fixed sample structure may raise representativeness problems, because the panel may lose its initial representativeness if it is not updated regularly
• For this reason, institutes often use a rotating panel method, in which a fixed percentage of units is replaced at regular intervals
• More precisely, the largest and most important firms should ideally always be included, with smaller firms being rotated out on a regular basis

• Possible methods of sample selection include:
  • non-statistically founded methods, such as:
    • comprehensive surveys with cut-off
    • purposive or quota sampling
  • statistically founded methods, such as:
    • simple stratified sampling
    • stratified sampling (PPS, OAS)
• Once the sampling method has been selected, institutes have to choose appropriate sample sizes
• Generally speaking, the sample size will be chosen according
to a predetermined desired maximum measurement error

[Chart: sampling method by sector (Manufacturing, Services, Retail, Construction); categories: representative panel, quota sampling, simple stratified sampling, stratified sampling with PPS, stratified sampling with Neyman allocation, cluster and random sampling, purposive choice, comprehensive or mixed; axis from 0% to 40%]

• The fourth step in building up a BTS is the treatment of missing data
• It is possible to distinguish between:
  • unit non-response (the entire interview is missing)
  • item non-response (some answers are missing)
• If data are missing for large, dominant firms, the problem is severe
• If data are missing among the smallest firms, the problem is less critical and the missing data may easily be imputed on the basis of the answers of similar firms
• The most commonly used method for dealing with missing data is the use of follow-up techniques (re-interviewing by telephone, fax or web-based techniques)
• If missing data problems persist, re-weighting or imputation methods should be used

Weighting Methods for Business Tendency Surveys

• Weighting is used to transform data for the realized sample into estimates for the reference population
• Weights may be based on:
  • information coming from the survey itself (size, output of the firm)
  • auxiliary sources (official/statistical data on the size of the reference sector/region)
• Weighting methods are usually based on either:
  • a one-stage weighting scheme
  • a two-stage weighting scheme
• In the one-stage scheme, a weight is associated with each reporting unit, in order to take into account its relative importance inside the sample
• In the two-stage scheme, a unit-specific weight is used to calculate strata results, and the strata are then aggregated using weights from some external source in order to obtain industry aggregates.
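The difference between the two schemes can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the field names (`stratum`, `weight`, `answer`), the example firms and the external stratum weights are all hypothetical and do not come from any institute's actual procedure.

```python
from collections import defaultdict


def one_stage(responses):
    """Share of each answer, weighting every reporting unit by its own weight."""
    total = sum(r["weight"] for r in responses)
    shares = defaultdict(float)
    for r in responses:
        shares[r["answer"]] += r["weight"] / total
    return dict(shares)


def two_stage(responses, stratum_weights):
    """Stage 1: unit weights inside each stratum. Stage 2: external stratum weights."""
    by_stratum = defaultdict(list)
    for r in responses:
        by_stratum[r["stratum"]].append(r)
    # Normalise over the strata that actually have respondents.
    total = sum(stratum_weights[s] for s in by_stratum)
    shares = defaultdict(float)
    for stratum, units in by_stratum.items():
        for answer, share in one_stage(units).items():
            shares[answer] += share * stratum_weights[stratum] / total
    return dict(shares)


# Hypothetical data: unit weights could be, for example, employment figures.
responses = [
    {"stratum": "food", "weight": 300, "answer": "up"},
    {"stratum": "food", "weight": 100, "answer": "down"},
    {"stratum": "metals", "weight": 50, "answer": "down"},
    {"stratum": "metals", "weight": 50, "answer": "same"},
]
# Hypothetical external weights, e.g. each stratum's share in sector value added.
stratum_weights = {"food": 0.4, "metals": 0.6}

print(one_stage(responses))
print(two_stage(responses, stratum_weights))
```

In this made-up example the "food" stratum dominates the realized sample but carries the smaller external weight, so the share of "up" answers is about 0.6 under the one-stage scheme and about 0.3 under the two-stage scheme.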
[Chart: weighting procedure by sector (Manufacturing, Services, Retail, Construction); categories: one-stage weighting procedure, two-stage weighting procedure, no weights/missing information; axis from 0% to 60%]

Efficient Sampling Design for Consumer Surveys

• In the case of CS, in order to build up a representative sample, choices have to be made regarding:
  • the sampling frame
  • the sampling methods
• The construction of the sampling frame implies the following steps:
  • identification of the appropriate frame list
  • possible adoption of a cut-off strategy
  • identification of the sample unit, reporting unit and response unit
  • updating of the frame list
• In the identification of the appropriate frame list, it is crucial that:
  • the right population is being sampled
  • all the members of the population have the same chance of being sampled
• In OECD countries, frame lists are usually based on:
  • the official population register (including every adult member of the population)
  • the telephone register (which raises possible bias problems, to be solved by random extraction of phone numbers or by other sampling techniques for those excluded from the directories)

• A cut-off strategy is often adopted:
  • on the basis of age (the cut-off age often varies across EU countries)
  • in some countries, geographical cut-offs are also applied
• The response unit may differ from the sample unit; typically, samples are devised to be representative of all households, with the selected respondent reporting on the household as a whole
• The list should be updated frequently in order to monitor as closely as possible the evolution of the relevant population

[Chart: frequency of frame list updating; categories: continuously, 4 times a year, yearly, between 1 and 6 years, more than 6 years, not available; axis from 0% to 35%]

• Key issues in sampling extraction include:
  • the choice of the appropriate sampling method
  • the choice of the optimal size of
the sample
• In CS, an independent cross-section of households is usually extracted each month:
  • In the EU, a general strategy of simple random sampling is used
  • In the US, a rotating sampling design is usually applied, in which the respondent chosen in each drawing is re-interviewed six months later, in order to provide a regular assessment of change in consumers' attitudes
• The most widely used methods of sampling extraction include:
  • simple stratified sampling
  • multiple-stage stratified sampling
  • Random Digit Dialing methods
• There is no consensus in the literature on the appropriate sample size
• In practice, sample sizes currently converge to about 2000, a size supposed to provide acceptable confidence intervals for this type of survey

Weighting Methodologies for Consumer Surveys

• Information gathered from survey respondents may be appropriately weighted to derive aggregate information on households' opinions and expectations
• Weights may be based on:
  • auxiliary information (demographic or socio-economic weights)
  • inverse selection probabilities (sample weights)
• The most commonly used weight variables are:
  • demographic characteristics of the household:
    • gender and age of the respondent
    • region of residence and size of the township
  • socio-economic characteristics of the household:
    • economic occupation
    • level of education
    • housing condition, type of area/municipality
• A number of institutes do not use weights; this is appropriate only when:
  • every household has an equal chance of selection
  • there is no differential non-response

[Chart: weighting practice; categories: sample weights, socio-demographic characteristics, no weights; axis from 0% to 80%]

Minimum requirements and recommendation for BTS: sample design – the sample frame

• The frame lists
  • Frame lists should include as exhaustive an account as possible of the active firms for the survey of interest
  • As a consequence, the use of official or statistical registers of active firms is
recommended over that of (more partial) business or membership registers
• Cut-off strategies
  • Institutes are advised to use cut-off strategies in order to stabilize the panel (size cut-off) and to identify the survey objectives precisely (branch cut-off)
• Sample units and reporting units
  • Establishments may be considered the ideal choice for the sample unit; however, it may be difficult to gather information at this level
  • The use of KAUs is advisable if the industrial structure is of particular interest
  • The use of local units is advisable if the regional structure is of particular interest
• Sample frame: reporting units
  • Even if the firm is identified as the sample unit, it is advisable, if possible, to have different reporting units within the firm
• Response units
  • In any case, it is strongly recommended that institutes ensure that the same response unit answers the questionnaire every month
• Updating of the lists
  • As a minimum requirement, frame lists should be updated as soon as a new census of active firms is available

Minimum requirements and recommendation for BTS: sample design – sampling methods

• As a recommendation, a fixed panel should be used…
  • … established on a statistically founded basis…
  • … using a rotating pattern of updating…
  • … with a fixed percentage of participants being replaced at regular intervals
• As a minimum requirement, sampling extraction should be based on sound probabilistic considerations
• The use of exhaustive sampling is possible for small countries or for a subset of the sample
• Avoiding purposive or ad hoc sampling methods is strongly recommended
• Different probabilistic methods of sample selection may be used; as a general consideration, the more heterogeneous the population, the more advisable the use of stratification-based sampling methods

Minimum requirements and recommendation for BTS: sample design – treatment of missing data

• Institutes should define what procedures are used for the treatment of item and unit non-response (missing data)
• As a minimum requirement, institutes are advised:
  • to closely monitor the impact of missing data (especially for large firms)
  • to use follow-up techniques in order to reduce their impact (fax, telephone, web reminder)
• The use of imputation methods to deal with remaining missing data should be considered with care, in order to avoid possible distortions
• The use of re-weighting techniques, taking into account the different composition of the panel in adjacent surveys, may be advisable to reduce the bias

Minimum requirements and recommendation for BTS: weighting methods

• The use of weights is strongly recommended in order to improve the precision of the estimates
• As a minimum requirement, the use of a simple (one-stage) system of weights is suggested
• Two-stage (or multiple-stage) weighting procedures are advisable for heterogeneous populations, especially in large countries

Minimum requirements and recommendation for CS: sample design – the sample frame

• Frame lists should include as exhaustive an account as possible of the adult population
• As a consequence, official census or statistical registers are to be preferred to telephone registers
• If telephone registers are used, appropriate methods to correct for possible bias are recommended
• Cut-off strategies with respect to age are advisable; this may call for further harmonization in the EU
• As a recommendation, frame lists should be updated yearly

Minimum requirements and recommendation for CS: sample design – sampling methods

• As a minimum requirement, random sampling techniques have to be used in order to ensure survey representativeness
• In the case of a heterogeneous population, the use of stratified sampling methods should be preferred to simple random sampling
• Finally, a major difference emerges between the EU (using independent drawing of the
sample each month) and the US (using a rotating sample design)
• The adoption of the US method may enhance the research options available to analysts in the EU as well

Minimum requirements and recommendation for CS: weighting

• Weighting is recommended in order to ensure better representativeness
• Demographic characteristics of the households may be used as weights, considering among them:
  • age and gender
  • region of residence and size of the township
• Alternatively, socio-economic characteristics may be used as weights:
  • type of occupation
  • level of education
  • type of area/municipality

THANK YOU FOR YOUR ATTENTION!

Task Force Members are: Richard Curtin (University of Michigan, United States), Isabelle De Greef (National Bank of Belgium), Richard Etter (KOF / ETZ, Switzerland), Christian Gayer (European Commission), Marie Hormannova (CZSO, Czech Republic), Marco Malgarini (Institute for Studies and Economic Analysis – ISAE, Italy), Rony Nilsson (OECD), Raymund Petz (GKI Research, Hungary), Takashi Sakuma (ESRI, Japan), Philippe Scherrer (INSEE, France), Anna Stangl (Ifo Institute for Economic Research, Germany), Andres Vertes (GKI Research, Hungary), Peter Weiss (European Commission), Jonathan Wood (Confederation of British Industry – CBI, United Kingdom).