Integrated Approach Processing
SNA seminar in the Caribbean
Marie Brodeur
Director General, Industry Statistics Branch, Statistics Canada
St. Lucia, February 2014

Why A Centralized Process?
Best Practices
Standardization of Processes
• Cross Survey Comparisons
• Enterprise Centric Processing/Coherence Analysis
Efficient use of Resources
Transportable Knowledge Across Survey Programs

UES Post-Collection Processing
Records from Collection
Pre-Grooming
Edit & Imputation
Allocation / Estimation
Tax Data
Data Service Center
Business Register
Subject Matter Review & Correction Tool

Collection
Precontact (Dec-Jan)
– Mostly for Business Register (BR) births; verification of contact information (name, address, …)
– By phone (in a few cases, a letter or a fact sheet is sent)
Mail-out of questionnaires (Jan-March)
– 2 or 3 mail-out dates
Follow-up in case of non-response for some units (begins about a month after mail-out)
– Phone call, remail or fax
Mail-back of questionnaires
Verifications of received questionnaires / Edits
– Is the questionnaire complete or are some key variables missing?
  (Edit follow-up by phone in some cases)

Centralized Collection
Pre-Contact
Mailout
Receipt (75% target)
Capture / Imaging
Prioritize
Edit / Verification
“Clean” Records
Delinquent Follow-Up

Use Of Tax Data
Validation (comparison)
• Verify dubious collected data against the equivalent tax data record
Imputation
• One of the methods used for non-response
Estimation
Direct Data Replacement
Calibration Estimates
Update Business Register
Allocation of survey data (use tax revenues, salaries and expenses)

Centralized Processing Systems And Databases
Develop centralized systems
• Move away from stand-alone
• Single point of access for security
Integrated Questionnaire Metadata System
Edit and imputation
Allocation and Estimation
Data Warehouse

Enterprise Portfolio Managers
Top 350 enterprises in Canada
Status
• Platinum, Gold, Silver, Bronze
Personal visits
Enterprise Profiling
Coordination of mail-out and collection
Enterprise/Establishment coherence
Holistic Response Management
• Strategic Response Unit
• Escalation Process / Statistics Act

What Is E & I?
Editing
• Verify that parts add up to the total
• Ensure that there are no missing values where parts add up to a total
• Ensure that related variables are consistent with each other
Imputation
• Changing values in fields which fail edit rules with a view to ensuring that the resulting data satisfy all edit rules. In practice, reported data will rarely be changed
• Impute for missing data or partially responded data
• Impute entire records in the case of total nonresponse

Why Is E&I Necessary?
To produce a complete and consistent data file that accounts for all sampled units
Both units that did not respond to the survey and units that did not provide a complete response must be imputed
Correct erroneous responses

E&I Terminology
Data Group
• Groupings (defined by subject matter, SM) of records that will be kept together for imputation purposes
• These groupings are based on multiple dimensions: industry (NAICS), geography (province)
The data groups used for a specific survey depend on:
• initial sample design (number of units sampled and the level of stratification used)
• number of records that respond to the survey (a minimum of 5 or 10 records is required)

BANFF E & I System
Impute for missing key variables as specified by subject matter (i.e. total revenue, total expenses)
Impute for other missing variables:
• Apply Historical Trend
• Apply Current Year Trend
• Use donor (for partial imputation)

BANFF Algorithms
DIFTREND – Historical trend imputation
CURRATIO – Current ratio imputation
PREVALUE – Value from the previous period for the same unit is imputed
PREAUX – Historical value of a proxy variable for the same unit
CURAUX – Current value of a proxy variable for the same unit

Allocation - Definition & Purpose
Definition: Allocation is the distribution of survey and administrative data from their acquisition level (Collection Entity) to the targeted statistical units (Establishments or Locations) as defined on the survey frame.
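As an illustration of the definition above, the following is a minimal sketch of a pro-rata allocation, in which a value reported by a collection entity is distributed to its establishments in proportion to an auxiliary variable such as tax revenue. The pro-rata rule, the equal-split fallback, the function name `allocate` and the sample figures are all assumptions made for illustration; the slides only state that tax revenues, salaries and expenses are used for allocation.

```python
def allocate(total, allocators):
    """Distribute `total` across units in proportion to `allocators`.

    `allocators` maps each target unit (e.g. an establishment) to an
    auxiliary value such as tax revenue. Pro-rata distribution is an
    assumption, not the documented UES method.
    """
    if not allocators:
        raise ValueError("at least one target unit is required")
    if any(value < 0 for value in allocators.values()):
        raise ValueError("allocator values must be non-negative")

    allocator_sum = sum(allocators.values())
    if allocator_sum == 0:
        # Assumed fallback: split equally when no auxiliary information exists.
        equal_share = total / len(allocators)
        return {unit: equal_share for unit in allocators}

    return {unit: total * value / allocator_sum
            for unit, value in allocators.items()}


# Hypothetical example: one questionnaire covering two establishments.
reported_revenue = 1000.0
tax_revenue = {"Establishment 1": 300.0, "Establishment 2": 100.0}
allocated = allocate(reported_revenue, tax_revenue)
print(allocated)
```

Because each share is a fraction of the same total, the allocated values sum back to the value reported at the collection-entity level.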
Purpose:
• To provide fully-processed micro data on a fiscal year basis, for establishments or locations in-sample for the UES
• Determine the distribution of value added by province

Sample Survey Allocation
[Diagram: three columns. SAMPLE: Establishments 1 to 4. Collection/Processing: Questionnaire 1, Questionnaire 2. Allocation: Establishments 1 to 4. One box is labelled "Establishment U".]

Overview of the IBSP Rolling Estimates Approach
Sampling
Multi-Mode Collection
Active Management
Follow-Up
Manual Editing
Rolling Estimates
Quality Indicators and Scores
Automated Processing
Editing
Imputation
Estimation
Interpretation & Dissemination
Statistics Canada • Statistique Canada 2016-07-23

Active Management – Strategy Settings
A subset of all Key Estimates is selected
All Key Estimates are:
• Ranked from the most to the least important
• Weighted relatively using an importance factor
• Assigned a Quality Target
Targets are set in line with the importance factor.
Active Collection ends for a Key Estimate when the Quality Indicator meets the Quality Target.
Active management and sampling strategies are coherent by design.

Active Management – Definitions
Quality Indicator (QI)
• QI = Sampling CV & Imputation CV & Pseudo Relative Bias
Measure of Impact (MI) Score
• Impact of a unit on the QI for a given estimate
• Units imputed from a poor model or with reported/imputed values far from their predicted values will have high MIs.

Empirical Study – RY2011 Prototype
Parallel run for 47 Business Surveys
Four Rolling Estimates iterations
Total CV calculated for all key estimates (8,600) at each iteration
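The stopping rule from the Active Management slides (active collection ends for a key estimate once its Quality Indicator meets its Quality Target) can be sketched as follows. This is a minimal illustration under stated assumptions: the QI is treated as a single CV-like number where lower is better, "meets" is read as QI <= target, and the class name `KeyEstimate`, the function `still_active` and all figures are hypothetical. The slides do not say how sampling CV, imputation CV and pseudo relative bias are combined into one QI.

```python
from dataclasses import dataclass


@dataclass
class KeyEstimate:
    name: str
    importance: float         # importance factor (higher = more important)
    quality_target: float     # target for the QI (assumed: lower is better)
    quality_indicator: float  # current QI at this rolling-estimates iteration


def still_active(estimates):
    """Return key estimates whose QI has not yet met its target,
    ordered from the most to the least important."""
    unmet = [e for e in estimates if e.quality_indicator > e.quality_target]
    return sorted(unmet, key=lambda e: e.importance, reverse=True)


# Hypothetical key estimates after one iteration.
estimates = [
    KeyEstimate("total revenue", importance=3.0, quality_target=0.05, quality_indicator=0.08),
    KeyEstimate("total expenses", importance=2.0, quality_target=0.10, quality_indicator=0.07),
    KeyEstimate("salaries", importance=1.0, quality_target=0.15, quality_indicator=0.20),
]
active = [e.name for e in still_active(estimates)]
print(active)
```

In this sketch, follow-up effort would continue only for the estimates returned by `still_active`; a fuller version would also use the MI scores to decide which units to follow up first.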