Administrative Data and their Use in Economic Statistics
Vladimir Markhonko
United Nations Statistics Division
Vladimir Markhonko 12/7/2007

Contents
• Definitions
• Advantages of using administrative data
• Common problems
• Quality of administrative data
• Using administrative data in practice
• Conclusions

Narrow Definition
[Diagram: Data Sources split into Primary (Statistical) and Secondary (Non-statistical); Public Sector; Private Sector]

Wider Definition
[Diagram: Data Sources split into Primary (Statistical) and Secondary (Non-statistical); Public Sector; Private Sector]

Administrative sources are sources containing information which is not primarily collected for statistical purposes.

Reasons for this Definition
• Privatisation of some government functions
• Growth of private sector “value-added re-sellers”
• User interest in new types of data

Benefits of Administrative Data
Cost
• Surveys / censuses are expensive; administrative data are often “free”
Response burden
• Reduced burden on data suppliers
• Statistics can be compiled more frequently with no extra burden

Benefits of Administrative Data (continued)
Coverage
• Full coverage of the target population
• No sampling errors and lower non-response
• Better small-area data
Timeliness (sometimes!)
Public image
• Making use of existing data can enhance the prestige of a statistical organisation by making it seem more efficient

Population Census Costs 2000-2001
Country    Total cost   Cost per person
UK         €367m        €6.2
Austria    €56m         €6.9
Finland    €0.8m        €0.2
Source: Eurostat – Documentation of the 2000 round of population and housing censuses in the EU, EFTA and Candidate Countries, Table 22

Common Problems
Administrative units do not always coincide with statistical units
• Conversion via automatic rules for simple cases
• Profiling for more complex cases
  – Gives a better understanding of complex business structures
  – Expensive and needs trained staff

Common Problems (continued)
Different definitions and classifications
• Administrative and statistical priorities are often different
• Conversion matrices are needed between different classifications
Timeliness
• Data arrive too late
• Data relate to a different time period

[Chart: VAT Birth Lags, frequency (thousands) against lag in days]

VAT Birth Lags
• 2/3 of businesses are on the register within 2 months of start-up
• Mean lag = 4 months, pulled up by “outliers”
• Median = approx. 40 days
• Some businesses pre-register, giving negative lags

Common Problems (continued)
Change
• Risk of changes in government policy, thresholds, definitions, coverage, etc.
• Contingency plans are needed
Data management from multiple sources
• Matching / linking issues
• Data conflicts – priority rules

Quality of Administrative Data
• There are many aspects to quality
• Administrative data will be better than survey data in some aspects but not in others
• It is important to look at overall quality: do the data meet the needs of users?
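The "matching / linking" and "data conflicts – priority rules" problems listed under Common Problems can be illustrated with a minimal Python sketch. It assumes three hypothetical source extracts (labelled companies_house, vat and survey) that have already been keyed on a shared, hypothetical identifier unit_id, so linking reduces to exact matching on that key. Conflicts are then resolved with a per-variable priority list: for each variable, the value comes from the highest-ranked source that supplies one. All source labels, field names, values and priority orders are invented for illustration and are not the rules of any real business register.

```python
from typing import Any

# A record is a mapping from variable name to value (hypothetical fields).
Record = dict[str, Any]


def link_and_resolve(
    sources: dict[str, dict[str, Record]],
    priority: dict[str, list[str]],
) -> dict[str, Record]:
    """Exact-match records across sources on their shared key, then resolve
    each variable by taking the value from the highest-priority source that
    supplies a non-missing value. The winning source is recorded alongside."""
    # Every unit that appears in at least one source.
    all_ids = sorted(set().union(*(recs.keys() for recs in sources.values())))
    resolved: dict[str, Record] = {}
    for unit_id in all_ids:
        merged: Record = {}
        for variable, ranked_sources in priority.items():
            for source in ranked_sources:
                value = sources.get(source, {}).get(unit_id, {}).get(variable)
                if value is not None:
                    merged[variable] = value
                    merged[f"{variable}_source"] = source
                    break  # first (highest-priority) source wins
        resolved[unit_id] = merged
    return resolved


# Hypothetical extracts, already keyed on the assumed shared identifier.
sources = {
    "companies_house": {"U1": {"legal_name": "Acme Ltd"}},
    "vat": {
        "U1": {"legal_name": "ACME", "turnover": 1200},
        "U2": {"turnover": 300},
    },
    "survey": {"U1": {"turnover": 1150, "employment": 12}},
}

# Hypothetical priority rules: a different source can win for each variable.
priority = {
    "legal_name": ["companies_house", "vat"],
    "turnover": ["vat", "survey"],
    "employment": ["survey"],
}

if __name__ == "__main__":
    for unit, record in link_and_resolve(sources, priority).items():
        print(unit, record)
```

Because the priority order is set per variable, a source can win for one variable and lose for another, which mirrors the point that administrative data may be better than survey data in some aspects but not in others. A unit that lacks a variable in every listed source simply has no value for it here; a production system would need an explicit rule (imputation or flagging) for that case.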
Three Aspects of Quality
• Quality of incoming data
• Quality of processing (matching, merging, ...)
• Quality of outputs: likely to be different from survey-based outputs, but are they better?

Quality Measurement
How can the quality of data from administrative sources be measured?
• Comparing sources
• Quality check surveys
• Knowledge of the source (metadata)
• Quality reports / templates

Quality Templates
Companies House Data
• Framework: Contract
• Frequency: Quarterly updates, continuous on-line access
• Timeliness: Good
• Quality: Good
• Delivery: CD-ROM / Internet
• Key content: Legal name, company number

Using Administrative Data
• Conversion to statistical concepts and definitions
• Linking / Matching
  – Exact Matching: linking records from two or more sources, often using common identifiers
  – Probabilistic Matching: determining the probability that records from different sources should match, using a combination of variables

UK Business Register
[Diagram: the UK Business Register and its sources: VAT, PAYE, Company registrations, Survey inputs, Satellite registers, Geographic information systems, Dun and Bradstreet]

[Diagram: Satellite Registers]

Examples of Satellite Registers
• Tourism: hotel register (category, number of beds)
• Transport: vehicle or ship register (type, capacity)
• Distributive trades: buildings register (building size, sales area)

Conclusions
• Administrative sources should be defined in the widest sense
• There are many benefits in using administrative data, particularly reduced costs
• There are problems when using administrative data, but usually someone has found a solution

Conclusions (continued)
• Most problems can be reduced by effective planning and detailed knowledge of the source
• The benefits are often greater than the costs

Thank you for your attention.