Integrated Approach
Processing
SNA seminar in the Caribbean
Marie Brodeur
Director General, Industry Statistics Branch, Statistics Canada
St. Lucia
February, 2014
Why A Centralized Process?
▪ Best Practices
▪ Standardization of Processes
• Cross Survey Comparisons
• Enterprise Centric Processing/Coherence Analysis
▪ Efficient use of Resources
▪ Transportable Knowledge Across Survey Programs
UES Post-Collection Processing
[Flow diagram. Boxes: Records from Collection; Pre-Grooming; Edit & Imputation; Allocation / Estimation; Tax Data; Data Service Center; Business Register; Subject Matter Review & Correction Tool]
Collection
▪ Precontact (Dec-Jan)
– Mostly for Business Register (BR) births; verification of contact information (name, address, …)
– By phone (in a few cases, a letter or a fact sheet is sent)
▪ Mail-out of questionnaires (Jan-March)
– 2 or 3 mail-out dates
▪ Follow-up in case of non-response for some units (begins about a month after mail-out)
– Phone call, remail or fax
▪ Mail-back of questionnaires
▪ Verification of received questionnaires / Edits
– Is the questionnaire complete or are some key variables missing? (Edit follow-up by phone in some cases)
Centralized Collection
[Flow diagram. Boxes: Pre-Contact; Mailout; Receipt (75% target); Capture / Imaging; Prioritize; Edit / Verification; "Clean" Records; Delinquent Follow-Up]
Use Of Tax Data
▪ Validation (comparison)
• Verify dubious collected data against the equivalent tax data record
▪ Imputation
• One of the methods used for non-response
▪ Estimation
• Direct Data Replacement
• Calibration Estimates
▪ Update Business Register
▪ Allocation of survey data (use tax revenues, salaries and expenses)
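The validation use of tax data can be sketched as a simple comparison. This is a minimal illustration only: the relative-difference rule, the 25% tolerance, the function name is_dubious and the field names are all assumptions, not the rule applied in the UES systems.

```python
def is_dubious(survey_value, tax_value, tolerance=0.25):
    """Flag a collected value that differs too much from its tax equivalent.

    The relative-difference rule and the 25% default are illustrative
    assumptions, not a production edit.
    """
    if tax_value == 0:
        # No tax benchmark to scale by: flag any non-zero survey value.
        return survey_value != 0
    return abs(survey_value - tax_value) / abs(tax_value) > tolerance


# Hypothetical records: collected revenue next to the equivalent tax record.
records = [
    {"id": "A", "survey_revenue": 1000.0, "tax_revenue": 980.0},
    {"id": "B", "survey_revenue": 5000.0, "tax_revenue": 500.0},
]

# Identifiers of the records that would be sent for review.
flagged = [r["id"] for r in records
           if is_dubious(r["survey_revenue"], r["tax_revenue"])]
print(flagged)
```

Records that are flagged would go to review or follow-up; the sketch does not overwrite the collected value.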
Centralized Processing Systems And Databases
▪ Develop centralized systems
• Move away from stand-alone
• Single point of access for security
▪ Integrated Questionnaire Metadata System
▪ Edit and imputation
▪ Allocation and Estimation
▪ Data Warehouse
Enterprise Portfolio Managers
▪ Top 350 enterprises in Canada
▪ Status
• Platinum, Gold, Silver, Bronze
▪ Personal visits
▪ Enterprise Profiling
▪ Coordination of mail-out and collection
▪ Enterprise/Establishment coherence
▪ Holistic Response Management
• Strategic Response Unit
• Escalation Process / Statistics Act
What Is E & I?
▪ Editing
• Verify that parts add up to the total
• Ensure that there are no missing values where parts add up to a total
• Ensure consistency between related variables
▪ Imputation
• Change values in fields that fail edit rules so that the resulting data satisfy all edit rules. In practice, reported data are rarely changed
• Impute for missing data or partially responded data
• Impute entire records in the case of total nonresponse
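A balance edit and one very simple imputation can be sketched as follows. The field names, the rounding tolerance and both function names are hypothetical. The imputation shown is deductive (a single missing part is derived from the total), chosen here only because it is easy to verify by hand. It fills a missing value and never changes a reported one.

```python
def failed_edits(record, parts, total, tol=0.5):
    """Return the list of edit failures for one record.

    Checks that no part or total is missing and that the parts add up
    to the total within a rounding tolerance (tol is an assumption).
    """
    fields = list(parts) + [total]
    missing = [f for f in fields if record.get(f) is None]
    if missing:
        return ["missing: " + ", ".join(missing)]
    if abs(sum(record[p] for p in parts) - record[total]) > tol:
        return ["parts do not add up to total"]
    return []


def deductive_impute(record, parts, total):
    """Fill a single missing part as total minus the reported parts.

    Returns a copy; the record is left unchanged when the total is
    missing or when more than one part is missing.
    """
    missing = [p for p in parts if record.get(p) is None]
    if len(missing) != 1 or record.get(total) is None:
        return dict(record)
    known = sum(record[p] for p in parts if record.get(p) is not None)
    filled = dict(record)
    filled[missing[0]] = record[total] - known
    return filled


# Hypothetical partial response: one revenue component was left blank.
parts = ["sales", "other_revenue"]
record = {"sales": 70.0, "other_revenue": None, "total_revenue": 100.0}

before = failed_edits(record, parts, "total_revenue")
fixed = deductive_impute(record, parts, "total_revenue")
after = failed_edits(fixed, parts, "total_revenue")
print(before, fixed, after)
```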
Why Is E&I Necessary?
▪ To produce a complete and consistent data file that accounts for all sampled units
• Both units that did not respond to the survey and units that did not provide a complete response must be imputed
▪ To correct erroneous responses
E&I Terminology
▪ Data Group
• Groupings (defined by subject matter, SM) of records that are kept together for imputation purposes
• These groupings are based on multiple dimensions:
– industry (NAICS)
– geography (province)
▪ The data groups used for a specific survey depend on:
• the initial sample design (number of units sampled and the level of stratification used)
• the number of records that respond to the survey (a minimum of 5 or 10 records is required)
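The minimum-respondent rule can be illustrated with a short sketch. It assumes that, when a detailed data group has too few respondents, a coarser group is tried next (industry by province, then industry only, then all records). That fallback order, the function name choose_data_group and the sample records are assumptions made for illustration; the slide states only the dimensions and the minimum size.

```python
def choose_data_group(unit, respondents, min_size=5,
                      levels=(("naics", "province"), ("naics",), ())):
    """Pick the most detailed grouping that has enough respondents.

    levels lists the grouping keys from most to least detailed; the
    empty tuple means "all respondents". Returns (keys, donors), or
    (None, []) when even the coarsest level is too small.
    """
    for keys in levels:
        donors = [r for r in respondents
                  if all(r[k] == unit[k] for k in keys)]
        if len(donors) >= min_size:
            return keys, donors
    return None, []


# Hypothetical respondent file: NAICS sector and province only.
respondents = (
    [{"naics": "44", "province": "ON"}] * 2
    + [{"naics": "44", "province": "QC"}] * 4
    + [{"naics": "72", "province": "ON"}] * 3
)

unit = {"naics": "44", "province": "ON"}
keys, donors = choose_data_group(unit, respondents)
print(keys, len(donors))
```

Here only two respondents share the unit's industry and province, so the sketch falls back to the industry-only group.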
BANFF E & I System
▪ Impute for missing key variables as specified by subject matter (i.e. total revenue, total expenses)
▪ Impute for other missing variables:
• Apply Historical Trend
• Apply Current Year Trend
• Use donor (for partial imputation)
BANFF Algorithms
▪ DIFTREND – Historical trend imputation
▪ CURRATIO – Current ratio imputation
▪ PREVALUE – Value from the previous period for the same unit is imputed
▪ PREAUX – Historical value of a proxy variable for the same unit
▪ CURAUX – Current value of a proxy variable for the same unit
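The five estimators can be sketched in plain Python to show what each one computes. These are stand-ins written from the one-line descriptions above, not the Banff procedures themselves. The ratio-of-sums form used for DIFTREND and CURRATIO, the function signatures and the sample figures are assumptions.

```python
def prevalue(previous_value):
    """PREVALUE: carry forward the unit's own value from the previous period."""
    return previous_value


def preaux(previous_aux):
    """PREAUX: use the historical value of a proxy variable for the same unit."""
    return previous_aux


def curaux(current_aux):
    """CURAUX: use the current value of a proxy variable for the same unit."""
    return current_aux


def curratio(current_aux, respondents):
    """CURRATIO: scale the unit's current proxy by a current-period ratio.

    respondents is a list of (value, proxy) pairs from clean records
    in the same data group, both for the current period.
    """
    total_value = sum(v for v, _ in respondents)
    total_proxy = sum(a for _, a in respondents)
    return current_aux * total_value / total_proxy


def diftrend(previous_value, respondents):
    """DIFTREND: apply the respondents' period-to-period trend.

    respondents is a list of (current, previous) pairs of the same
    variable from clean records in the same data group.
    """
    total_current = sum(c for c, _ in respondents)
    total_previous = sum(p for _, p in respondents)
    return previous_value * total_current / total_previous


# Hypothetical data group: respondents grew 10% since last period,
# and report revenue equal to 10 times the proxy variable.
trend_pairs = [(110.0, 100.0), (220.0, 200.0)]
ratio_pairs = [(100.0, 10.0), (300.0, 30.0)]

imputed_by_trend = diftrend(50.0, trend_pairs)
imputed_by_ratio = curratio(7.0, ratio_pairs)
print(imputed_by_trend, imputed_by_ratio)
```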
Allocation - Definition & Purpose
Definition:
▪ Allocation is the distribution of survey and administrative data from their acquisition level (Collection Entity) to the targeted statistical units (Establishments or Locations) as defined on the survey frame.
Purpose:
▪ To provide fully processed micro data on a fiscal year basis for establishments or locations in-sample for the UES
▪ To determine the distribution of value added by province
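Allocation from a collection entity to its establishments can be sketched as a pro-rata split. The sketch assumes tax revenue is the allocation key and that shares are proportional to it. The function names, field names and figures are hypothetical, and the equal split used when every key is zero is an assumption as well.

```python
from collections import defaultdict


def allocate(total, establishments, key="tax_revenue"):
    """Split one collection-entity total across its establishments.

    Shares are proportional to the chosen key; if every key is zero,
    the total is split equally (an assumption for this sketch).
    """
    key_sum = sum(e[key] for e in establishments)
    allocated = []
    for e in establishments:
        if key_sum > 0:
            share = e[key] / key_sum
        else:
            share = 1 / len(establishments)
        allocated.append({**e, "allocated": total * share})
    return allocated


def by_province(allocated):
    """Sum the allocated values by province."""
    totals = defaultdict(float)
    for e in allocated:
        totals[e["province"]] += e["allocated"]
    return dict(totals)


# Hypothetical questionnaire covering two establishments.
establishments = [
    {"name": "Establishment 1", "province": "ON", "tax_revenue": 300.0},
    {"name": "Establishment 2", "province": "QC", "tax_revenue": 100.0},
]

result = allocate(1000.0, establishments)
provincial = by_province(result)
print(provincial)
```

The allocated values always sum back to the questionnaire total, so provincial totals stay consistent with what was collected.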
Sample Survey Allocation
[Diagram. Stages: SAMPLE; Collection/Processing; Allocation. Labels: Questionnaire 1; Questionnaire 2; Establishment 1; Establishment 2; Establishment 3; Establishment 4; Establishment U]
Overview of the IBSP Rolling Estimates Approach
[Diagram labels: Sampling; Multi-Mode Collection; Active Management; Follow-Up; Manual Editing; Rolling Estimates; Quality Indicators and Scores; Automated Processing; Editing; Imputation; Estimation; Interpretation & Dissemination]
Statistics Canada • Statistique Canada
2016-07-23
Active Management – Strategy Settings
▪ A subset of all Key Estimates is selected
▪ All Key Estimates are:
• Ranked from the most to the least important
• Weighted relatively using an importance factor
• Assigned a Quality Target
▪ Targets are set in line with the importance factor.
▪ Active Collection ends for a Key Estimate when the Quality Indicator meets the Quality Target.
▪ Active management and sampling strategies are coherent by design.
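The stopping rule can be sketched as follows. The sketch treats the Quality Indicator as a CV-like measure where lower is better, so "meets the target" is read as QI less than or equal to the target. That reading, the class and function names and all figures are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class KeyEstimate:
    name: str
    importance: float         # relative importance factor
    quality_target: float     # target for the quality indicator
    quality_indicator: float  # current value, assumed lower-is-better


def still_in_active_collection(estimates):
    """Return the key estimates whose indicator has not met its target.

    Results are ranked from the most to the least important.
    """
    ranked = sorted(estimates, key=lambda e: e.importance, reverse=True)
    return [e.name for e in ranked
            if e.quality_indicator > e.quality_target]


# Hypothetical key estimates: tighter targets for more important ones.
estimates = [
    KeyEstimate("Total expenses", importance=0.8,
                quality_target=0.05, quality_indicator=0.04),
    KeyEstimate("Total revenue", importance=1.0,
                quality_target=0.03, quality_indicator=0.06),
    KeyEstimate("Inventories", importance=0.3,
                quality_target=0.10, quality_indicator=0.15),
]

active = still_in_active_collection(estimates)
print(active)
```

Follow-up effort would then go to units that affect the estimates still on the list.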
Active Management – Definitions
▪ Quality Indicator (QI)
• QI = Sampling CV & Imputation CV & Pseudo Relative Bias
▪ Measure of Impact (MI) Score
• Impact of a unit on the QI for a given estimate
• Units imputed from a poor model or with reported/imputed values far from their predicted values will have high MIs.
Empirical Study – RY2011 Prototype
▪ Parallel run for 47 Business Surveys
▪ Four Rolling Estimates iterations
▪ Total CV calculated for all key estimates (8,600) at each iteration