Realizing the statistical potential of administrative data
John Dunne, John Hayes
Central Statistics Office, Ireland
Paper presented at the U.N.E.C.E. Seminar on New Frontiers for Statistical Data Collection, 31 October-02 November, 2012, Geneva

Introduction
• This paper describes the progression towards an Irish Statistical System, a holistic system based on the exploitation of administrative data, comprehending linkages to survey data and other administrative data.
• The paper focuses on the role of the CSO’s Administrative Data Centre, which has the dual purpose of acting as clearing house for administrative data and promoting the development of the Irish Statistical System.

The National Statistics Board
• In 2009, the National Statistics Board (NSB) laid out a strategy¹ for achieving an Irish Statistical System. Amongst the implementation priorities identified is:
    - Developing systems to ensure that the statistical value of existing survey and administrative data is maximized.
• The NSB paper also identified three critical infrastructural requirements in developing the Irish Statistical System:
    - A unique business identifier and a central business register;
    - A unique personal identifier;
    - Spatial and geographic data capture.
¹ Strategy for Statistics, 2009-2014, http://www.nsb.ie/media/nsbie/pdfdocs/StrategyforStatistics20092014.pdf

Policy progression
• In 2011, an NSB position paper¹ elaborated on some of the core objectives of the earlier document, advocating, in particular:
    - The development of the infrastructure to maximise the use of data sources, including the compilation of registers of persons, businesses, and buildings, with linkage between each such register – “joined-up” data.
• The government Public Sector Reform Plan², published in 2011, further supports the development of the Irish Statistical System with the following stated objectives:
    - Improved sharing of data on businesses across the Public Service, including the development of business registers linkable to that of the Revenue Commissioners;
    - Developing a code of practice for data gathering and its use for statistical purposes across the Public Service, including promoting consistent approaches to identifiers, classifications, and geo-spatial/postcode data.
¹ Double paper: The Irish Statistical System: The Way Forward, and Joined Up Government Needs Joined Up Data, http://www.nsb.ie/media/nsbie/pdfdocs/NSB%20ISS%20Position%20Papers.pdf
² http://per.gov.ie/wp-content/uploads/Public-Service-Reform-pdf3.pdf

The Statistics Act, 1993
• The CSO was established statutorily under the Statistics Act, 1993¹. This legislation assigns certain powers to the Director General of the CSO with respect to data held by public authorities:
    - The Director General may require a public body to provide copies of any records in its charge for statistical purposes;
    - The Director General may require a public body to co-operate with him on assessing the statistical potential of its records and in developing its recording methods and systems for statistical purposes;
    - A public body shall consult with the Director General, and accept his reasonable recommendations, if it proposes to introduce or revise any system for the storage and retrieval of information or to make a statistical survey.
¹ http://www.irishstatutebook.ie/1993/en/act/pub/0021/print.html

A joined-up data system (after Thygesen¹)
¹ The importance of the archive statistical idea for the development of social statistics and population and housing censuses in Denmark, Thygesen, Lars, 2011, http://ww4.dst.dk/upload/nordbotten_and_denmark_final_draft_4.pdf

Joined-up data and the CSO
• The CSO’s Business Register is fully aligned with administrative sources from the Revenue Commissioners.
• Linkage between persons and businesses is available to the CSO from employer tax returns to the Revenue Commissioners.
• There exists in Ireland a comprehensive buildings database for the state, called the Geodirectory¹, available on a commercial basis.
• Ireland does not yet have a post code system, but this is planned for 2013.
• The Department of Social Protection maintains the master list of official Personal Public Service Numbers (PPSN) in the state. This list is the basis of the CSO’s Person Activity Register, which identifies each person’s engagement with key administrative systems.
¹ http://www.geodirectory.ie/

The CSO’s Administrative Data Centre
• The Administrative Data Centre (ADC) is the CSO unit designated as the conduit for data transfers from other government bodies and is the central repository for received data from those bodies.
• This unit currently maintains over fifty different administrative data flows serving the statistical production systems in the CSO.
• ADC controls access to the data in accordance with confidentiality obligations under national and EU legislation.
• Subject to these criteria, ADC may also make anonymized data available as Research Micro Files (RMFs) to external researchers.

Following the setting-up of the ADC...

ADC interaction with other public bodies
• ADC policy is to implement institutional-level Memorandums of Understanding (MoUs) to underpin the flow of administrative data to the CSO, as distinct from having data flow-specific MoUs.
• In the case of the Office of the Revenue Commissioners, the MoU¹ has led to a relationship which has allowed the CSO to adopt a business register that is based on the Revenue Commissioners’ registration system and to use the Revenue Customer Number as a common business identifier between the two bodies.
• The government has charged the CSO with developing a statistical code of practice for the Irish public service. The ADC is progressing this objective through its chairing of the Statistician Liaison Group, a forum of statistical units across the public service.
¹ http://www.cso.ie/en/aboutus/descriptionsandfunctions/memorandumofunderstandingbetweenthecsoandrevenue/

ADC – technical aspects
• Data received from other government bodies are converted to SAS datasets and held in a warehouse environment having Source, Analysis, and External Researcher tiers.
• In the case of person-based administrative data, ADC anonymizes such files before making them available to CSO users, as Analysis tier data flows.
• All CSO staff have access, via a data portal, to core metadata and summary statistics on all administrative data held.
• The data model for the administrative data held in the ADC domain is a hierarchical model:
    Data flow
        Data flow instance
            Instance version
                Datasets

ADC – technical aspects
• An example of an Analysis tier data flow is the P35 (employee) dataset, which links person- and business-based registers as illustrated here:
    [Diagram: the P35 (employee) dataset linking person- and business-based registers]

The future – concrete objectives
• The key challenge for the CSO will be to avail of the increasing opportunities for joining up available administrative data sources.
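To make the idea of joining up sources concrete, the following is a minimal sketch of how P35-style employee records might link a person register to a business register while exposing only anonymized person identifiers at the Analysis tier. It is an illustration under stated assumptions, not a description of the ADC's actual SAS-based implementation: the keyed-hash pseudonymisation, the field names, the key, and all sample records are hypothetical.

```python
import hashlib
import hmac

# Hypothetical secret held only by the data custodian (assumption, not from the paper).
SECRET_KEY = b"example-key-not-for-production"


def pseudonymise(ppsn: str) -> str:
    """Replace a PPSN with a keyed hash so files can be linked without exposing it."""
    digest = hmac.new(SECRET_KEY, ppsn.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


# Invented sample records; field names and values are illustrative only.
person_register = [
    {"ppsn": "1234567A", "year_of_birth": 1980},
    {"ppsn": "7654321B", "year_of_birth": 1992},
]
business_register = {
    "B001": {"sector": "Retail"},
    "B002": {"sector": "Construction"},
}
p35_returns = [
    {"ppsn": "1234567A", "employer_id": "B001", "pay": 30000},
    {"ppsn": "7654321B", "employer_id": "B002", "pay": 42000},
    {"ppsn": "1234567A", "employer_id": "B002", "pay": 5000},
]


def build_analysis_tier(persons, businesses, returns):
    """Join employee returns to both registers, keeping only pseudonymised person ids."""
    persons_by_pid = {pseudonymise(p["ppsn"]): p for p in persons}
    linked = []
    for record in returns:
        pid = pseudonymise(record["ppsn"])
        person = persons_by_pid.get(pid)
        business = businesses.get(record["employer_id"])
        if person is None or business is None:
            continue  # skip returns that match neither register
        linked.append({
            "person_id": pid,
            "year_of_birth": person["year_of_birth"],
            "employer_id": record["employer_id"],
            "sector": business["sector"],
            "pay": record["pay"],
        })
    return linked


analysis_rows = build_analysis_tier(person_register, business_register, p35_returns)
for row in analysis_rows:
    print(row)
```

Because the same PPSN always maps to the same pseudonym under a given key, a person with two employers can still be followed across rows, while the raw PPSN never appears in the linked output.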
Steps to complete a fully joined-up data system in Ireland might include:
• The implementation in public administration systems of a link between a person and a residence, where the residence is itself identified by a location or (x,y)-based identifier;
• The mandatory use of the PPSN in the engagement of persons with the state through the different life stages;
• The implementation of a unique business identifier for businesses interacting with the state, and the linking of this identifier with a building identification number.

The future – critical success factors
• Statistical code of practice for the Irish public sector
• Partnership approach to development of joined-up data
• Delivery of projects which deliver value for policy purposes

Conclusion
The Irish Statistical System continues to face significant challenges in the years ahead; however, in the words of W. Edwards Deming, “It is not necessary to change. Survival is not mandatory.”