The Use of Administrative Sources for Economic Statistics: An Overview
Steven Vale
Office for National Statistics, UK

Contents
• Definitions
• Advantages of using administrative data
• Common problems
• Quality of administrative data
• Using administrative data in practice
• Conclusions

Narrow Definition
[Diagram: Data Sources, split into Primary (Statistical) and Secondary (Non-statistical), with Public Sector and Private Sector branches]

Wider Definition
[Diagram: Data Sources, split into Primary (Statistical) and Secondary (Non-statistical), with Public Sector and Private Sector branches]
Administrative sources are sources containing information that is not primarily collected for statistical purposes.

Reasons for the Wider Definition
• Privatisation of some government functions
• Growth of private sector “value-added re-sellers”
• User interest in new types of data

Benefits of Administrative Data
• Cost
  – Surveys and censuses are expensive; administrative data are often “free”
• Response burden
  – Reduced burden on data suppliers
  – Statistics can be compiled more frequently with no extra burden
• Coverage
  – Full coverage of the target population
  – No sampling errors and lower non-response
  – Better small-area data
• Timeliness (sometimes!)
• Public image
  – Making use of existing data can enhance the prestige of a statistical organisation by making it seem more efficient

Population Census Costs, 2000-2001
Country    Total cost    Cost per person
UK         €367m         €6.2
Austria    €56m          €6.9
Finland    €0.8m         €0.2
Source: Eurostat, Documentation of the 2000 round of population and housing censuses in the EU, EFTA and Candidate Countries, Table 22

Common Problems
• Administrative units do not always coincide with statistical units
• Conversion via automatic rules for simple cases
• Profiling for more complex cases
  – Gives a better understanding of complex business structures
  – Expensive and needs trained staff
• Different definitions and classifications
  – Administrative and statistical priorities often differ
  – Conversion matrices are needed to map between classifications
• Timeliness
  – Data arrive too late
  – Data relate to a different time period

[Chart: VAT Birth Lags, frequency (thousands) against lag in days]

VAT Birth Lags
• Two-thirds of businesses are on the register within 2 months of start-up
• Mean lag = 4 months, pulled up by a small number of very late registrations (“outliers”)
• Median lag = approx. 40 days
• Some businesses register before start-up, giving negative lags

Common Problems (continued)
• Change management
  – Risk of changes in government policy, thresholds, definitions, coverage, etc.
  – Contingency plans are needed
• Data from multiple sources
  – Matching / linking issues
  – Data conflicts, which need priority rules

Quality of Administrative Data
• There are many aspects to quality
• Administrative data will be better than survey data in some respects but not in others
• It is important to look at overall quality
• Do the data meet the needs of users?

Three Aspects of Quality
• Quality of incoming data
• Quality of processing (matching, merging, ...)
• Quality of outputs: likely to differ from survey-based outputs, but are they better?
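The VAT birth lag figures above (mean of about 4 months against a median of about 40 days, plus some negative lags) show how a few extreme values distort an average. A minimal Python sketch, using a small hypothetical list of lags rather than the actual register data, reproduces the effect; the 60-day window standing in for “2 months” is an assumption.

```python
from statistics import mean, median

# Hypothetical lags in days between start-up and VAT registration (NOT ONS data).
# A negative lag means the business registered before it started trading.
LAGS = [-10, 5, 12, 20, 25, 30, 35, 40, 45, 50, 55, 58, 90, 150, 400, 900]


def lag_summary(lags, window_days=60):
    """Summarise registration lags; window_days approximates '2 months' (an assumption)."""
    return {
        "mean": mean(lags),
        "median": median(lags),
        # Negative lags count as registered within the window.
        "share_within_window": sum(1 for d in lags if d <= window_days) / len(lags),
        "negative_lags": sum(1 for d in lags if d < 0),
    }


summary = lag_summary(LAGS)
print(summary["mean"], summary["median"])  # → 119.0625 42.5
print(summary["share_within_window"], summary["negative_lags"])  # → 0.75 1
```

Three late registrations (150, 400 and 900 days) push the mean to almost three times the median, which is why the median is the better summary of a typical lag in skewed data like these.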
Quality Measurement
• How can the quality of data from administrative sources be measured?
  – Comparing sources
  – Quality check surveys
  – Knowledge of the source (metadata)
  – Quality reports / templates

Quality Templates
Companies House Data
• Framework: Contract
• Frequency: Quarterly updates, continuous online access
• Timeliness: Good
• Quality: Good
• Delivery: CD-ROM / Internet
• Key content: Legal name, company number

Using Administrative Data
• Conversion to statistical concepts and definitions
• Linking / Matching
  – Exact matching: linking records from two or more sources, often using common identifiers
  – Probabilistic matching: estimating the probability that records from different sources refer to the same unit, using a combination of variables

UK Business Register
[Diagram: the Business Register linked to VAT, PAYE, company registrations, survey inputs, Dun and Bradstreet, geographic information systems and satellite registers]

Examples of Satellite Registers
• Tourism: hotel register (category, number of beds)
• Transport: vehicle or ship register (type, capacity)
• Distributive trades: buildings register (building size, sales area)

Conclusions
• Administrative sources should be defined in the widest sense
• There are many benefits in using administrative data, particularly reduced costs
• There are problems when using administrative data, but usually someone has already found a solution
• Most problems can be reduced by effective planning and detailed knowledge of the source
• The benefits often outweigh the costs

Thank you for listening. Any questions?
steve.vale@ons.gov.uk
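The two linking methods listed under Using Administrative Data can be sketched in Python. Exact matching joins on a shared identifier; for probabilistic matching, the sketch swaps in a simple Fellegi-Sunter style score, summing log-likelihood weights for agreement or disagreement on each variable. The field names, the m/u probabilities and the thresholds are all hypothetical illustrations, not values used by any statistical office.

```python
import math

# Hypothetical m/u probabilities (illustrative, not estimated from real data).
# m = P(field agrees | records truly match); u = P(field agrees | records do not match).
FIELD_PROBS = {
    "name": (0.95, 0.01),
    "postcode": (0.90, 0.05),
    "phone": (0.80, 0.001),
}


def normalise(value):
    """Lower-case and drop all whitespace so 'AB1 2CD' and 'ab12cd' compare equal."""
    return "".join(str(value).split()).lower()


def exact_match(a_records, b_records, key):
    """Exact matching: pair records that share the same common identifier."""
    index = {r[key]: r for r in b_records if r.get(key)}
    return [(a, index[a[key]]) for a in a_records if a.get(key) in index]


def match_weight(a, b, probs=FIELD_PROBS):
    """Fellegi-Sunter style score: sum of log2 likelihood ratios over fields."""
    total = 0.0
    for field, (m, u) in probs.items():
        if a.get(field) is None or b.get(field) is None:
            continue  # a missing value contributes no evidence either way
        if normalise(a[field]) == normalise(b[field]):
            total += math.log2(m / u)  # agreement weight
        else:
            total += math.log2((1 - m) / (1 - u))  # disagreement weight
    return total


def classify(weight, upper=8.0, lower=0.0):
    """Hypothetical thresholds: link, reject, or refer for clerical review."""
    if weight >= upper:
        return "match"
    if weight <= lower:
        return "non-match"
    return "review"


# Hypothetical records from two sources sharing a company number.
register_a = [{"company_number": "01234567", "name": "Acme Widgets Ltd", "postcode": "AB1 2CD"}]
register_b = [{"company_number": "01234567", "name": "ACME WIDGETS LTD",
               "postcode": "AB12CD", "phone": "01632 960000"}]
print(len(exact_match(register_a, register_b, "company_number")))  # → 1

# Without a common identifier, score the pair on name and postcode instead.
a = {"name": "Acme Widgets Ltd", "postcode": "AB1 2CD"}
b = register_b[0]
print(classify(match_weight(a, b)))  # → match
```

Pairs scoring between the two thresholds go to clerical review, which is where the trained staff and detailed source knowledge mentioned in the conclusions come in.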