Statistics Canada / Statistique Canada
Use of Tax Data in the Unified Enterprise Survey (UES)
Workshop on Use of Administrative Data in Economics Statistics
Marie Brodeur
Moscow, November 2006

Overview of the presentation
1. UES Background
2. Integrated Approach Principles
3. Survey Characteristics
4. Business Register
5. Sampling
6. Achievements and Future Directions
7. Use of Tax Data
8. Research and Development

1. UES Background
• Major project to improve provincial statistics (1996)
• Reliable Annual Provincial Data for the Allocation of HST Revenues (SNA I-O Tables)
• More detailed Industry & Commodity data
• Creation of Enterprise Statistics Division (ESD)
• UES Pilot (RY 1997) -- 7 surveys
• Gradual Expansion of Surveys; Covers 65% of GDP

2. Integrated Approach Principles
• Use of Single, Unduplicated Frame -- the BR
• Expanded coverage
• Common Sample Design Methodology
• Integrated Questionnaire -- common / simple language; harmonized concepts / variables
• Centralized Data Collection at the Statistical Establishment level

2. Integrated Approach Principles (continued)
• Common Generic Processing Systems and Methods
• Centralized Warehouse
• Head Office Survey
• Maximum Use of Tax Data
• Annual Profiling of Large Enterprises
• Enterprise Portfolio Managers

3. Survey Characteristics
• Separate Enterprise & Establishment Surveys
• Over 50 Establishment Surveys
• Over 55,000 collection entities representing about 68,000 establishments (17K replaced by tax for RY 2005)
• Centralized Collection -- $3.5 million budget
• Smallest businesses estimated through tax

4. Business Register (BR)
• BR covers all sectors
• Incorporated and unincorporated businesses
• Complex and simple enterprises
• Structure
  • Legal
  • Operational
  • Statistical (Enterprise & Establishment)
• Updated with Administrative Data

5. Sampling
• Stratified Random Sample
  • Industry (NAICS 4)
  • Province
  • Size
• 1 Take-all stratum
• 2 Take-some strata (50% of units replaced by tax)
• Take-none strata (under Royce-Maranda thresholds)

Stratification in One Look
[Figure: one sampling cell split by revenue into strata. Labels: Take-all, Take-some 2, Must-take units, Take-some 1, Take-none.]
Royce-Maranda (RM) Exclusion Thresholds:
• To reduce response burden on small enterprises

Sampling Process
• BR (2.3M businesses)
• Survey Universe File (2M businesses)
• Sample Control File (2M businesses)
• UES Sample (70K businesses)
• Tax Est’d (1.4M)
• Survey Interface File: 55K CEs
  • 38K CEs / Questionnaires
  • Tax Replacements: 17K CEs

6. Achievements
• Timeliness
• Centralized Processing Systems and Databases
• Response Burden
• Use of Tax Data

6a. Timeliness
• Very problematic during start-up years
  • Many processing systems in development
  • Problems with questionnaires
• Task force created in 2001
• Target: 15 months after reference year
• Since RY 2003, all surveys within the 12 to 15 month period

6b. Centralized Processing Systems and Databases
• Develop centralized systems
  • Move away from stand-alone
  • Single point of access for security
• Integrated Questionnaire Metadata System
• Edit and imputation
• Allocation and Estimation
• Data Warehouse

Centralized Collection
• Pre-Contact (17K Businesses)
• Mailout (38K CEs)
• Receipt (75% target)
• Score Function
• Edit / Verification (BLAISE)
• “Clean” Records
• Capture / Imaging
• Delinquent Follow-Up

Post-Collection Processing
• “Clean” Records
• Pre-Grooming
• Tax Data
• Central Data Store
• USTART
• Edit & Imputation
• Allocation / Estimation
• Subject Matter Review & Correction Tool

7. Use of Tax Data
• Significant progress since 1997
• Strategic Streamlining Initiative
• Result
  • Almost 65% of units replaced by tax data
  • Impact of 27% in the total estimate

Streamlining Initiatives at STC
• Announced in 2002
• Objectives
  • Maintaining quality
  • Create efficiencies
  • Enhance work flows
  • Identify trade-offs
  • Expand the use of tax data for survey replacement

T1/T2 Project
• Objective is to substitute 50% of simple establishments
• Direct Data Replacement for annual surveys using T1 (unincorporated) and T2 (incorporated)
• Facilitated by the Chart of Accounts (COA)

Types of Administrative (Tax) Data
• From the Canada Revenue Agency (CRA)
• Agreement between CRA and STC
• T1 (unincorporated businesses)
• T2 (incorporated businesses)
• T4 (pay slips)
• GST (Goods and Services Tax)
• PD7 (payroll deduction accounts)

Processing of Tax Data
• Edit erroneous reports
• Outlier detection
• Eliminate duplication
• Impute for missing values
• Annualize in case of monthly data

RY2005 Methodology: Tax Replacement
[Figure: diagram with one column for T1 and one for T2. Labels:]
• Main sample to be surveyed
• Not eligible for tax: questionnaire
• Characteristic survey (some Services surveys) or questionnaire (all other divisions)
• Tax replaced
• ROYCE-MARANDA THRESHOLDS
• T1 Take-None: Sample of e-filers
• T2 Take-None: Census of General Index of Financial Information (GIFI)

UES: Use of Tax Data
• Validation (comparison)
  • Verify dubious collected data against the equivalent tax data record
• Imputation
  • One of the methods used for non-response
• Estimation
  • Below take-none
• Direct Data Replacement
  • Some annual surveys 100% tax (Taxi & Limousines, Survey of Mapping)
• Update Business Register
• Allocation of survey data (use tax revenues, salaries and expenses)

CHART OF ACCOUNTS: Why does a Bureau of Statistics need one?
[Figure: the Chart of Accounts as the link between business accounting and statistical concepts.]
• BUSINESS WORLD (COLLECTION): Sales, Operating revenue, EBIT, Gross profit, Cost of sales, Expenses
• Chart of Accounts (COA): LINK, BRIDGE, CONCORDANCE
• BUREAU OF STATISTICS (DISSEMINATION): Shipments, Outputs, Inputs, Value added, Operating Surplus, GDP

Expected Benefits of a Chart of Accounts
• Standardization in business data collection
• Higher survey response
• Increase in quality of data
• Comparison of data from various sources
• Increase efficiency in using administrative data

Links to Chart of Accounts
[Figure: diagram. Labels: Establishment, CHART OF ACCOUNTS, Legal entity, Legal entity.]

GST Data
• Monthly tax data
• Used to replace survey data for monthly surveys
• Implemented for manufacturing, services and retail surveys
• For RY 2005, used for analytical comparisons for annual Services Surveys

Research and Development
• Data Integration Project
• Make more efficient use of tax data
• Development of new quality indicators (e.g. rates, coefficients of variation)
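
The coefficient of variation mentioned as a quality indicator can be illustrated with a minimal sketch. It assumes a textbook stratified simple random sample estimator of a total, which the slides do not confirm as the UES method. The `Stratum` class, the function name `stratified_total_and_cv`, and all figures are hypothetical and chosen only for illustration.

```python
import math
from dataclasses import dataclass
from statistics import mean, variance
from typing import List, Tuple


@dataclass
class Stratum:
    """One size stratum of a sampling cell (hypothetical structure)."""
    name: str
    population_size: int         # N_h: units on the frame in this stratum
    sample_values: List[float]   # values observed for the n_h sampled units


def stratified_total_and_cv(strata: List[Stratum]) -> Tuple[float, float]:
    """Estimate a total and its coefficient of variation.

    Assumes simple random sampling without replacement inside each stratum.
    A take-all stratum (n_h == N_h) contributes no sampling variance.
    """
    total = 0.0
    var_total = 0.0
    for s in strata:
        n = len(s.sample_values)
        N = s.population_size
        total += N * mean(s.sample_values)
        if n < N:
            if n < 2:
                raise ValueError(f"stratum {s.name!r} needs at least 2 sampled units")
            fpc = 1 - n / N  # finite population correction
            var_total += N ** 2 * fpc * variance(s.sample_values) / n
    return total, math.sqrt(var_total) / total


# Made-up revenue figures, not taken from the presentation.
example = [
    Stratum("take-all", 3, [900.0, 1100.0, 1000.0]),
    Stratum("take-some", 10, [10.0, 20.0, 30.0, 40.0]),
]
total, cv = stratified_total_and_cv(example)
print(f"estimated total = {total:.0f}, CV = {cv:.2%}")
```

In this sketch the take-all stratum adds to the total but not to the variance, so the CV is driven entirely by the sampled take-some stratum.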