Integration of different micro data bases: a significant value added for statisticians
Francesca Monacelli
Statistics collection and processing Department
21 June 2013

AGENDA
• Some preliminary definitions
• The pursuit of integration of data bases
• The internal dissemination of Central Credit Register's data (an example)
• Final remarks

Some preliminary definitions
• What is microdata? Information describing the characteristics of the units of a population.
• Which population? That of the Reporter itself and/or of third parties.
• Which categories of information do we deal with? Quantitative data and qualitative data (Registers).

Our methodological approach and inquiry tools are the same for both categories. For this reason we refer to all of these data, without distinction, as "reported data".

The same approach and tools are also used for the management of:
• aggregated/compiled data (macrodata)
• DQM indicators
• the definition of a survey

This uniform treatment is a significant value added for statisticians.

The pursuit of integration of data bases
• Definition of reporting schemes aimed at avoiding redundancies in data requests (the multipurpose perspective of the model).
• A unique identification process for all statistical requirements (governance).

Integration consists in merging the different statistical requirements into a single general framework, and results in:
• a unique data representation model, able to manage all the different types of reported and calculated data (micro and macro data, quantitative and qualitative data, …)
• a unique corporate statistical data dictionary, providing a comprehensive system of definitions and classifications for all the statistical concepts used for different purposes, and for all the transformation rules
• a unique company-wide statistical data warehouse, to be used for different internal purposes and supported by the same inquiry methods

A company-wide Data Warehouse: structural composition
• Components: input collection, DQM, transformation rules, concepts' domains, output production
• Sources of microdata: reporting institutions, national and international entities, TARGET platform, monetary policy platform, …, data vendors
• Sources of macrodata: reporting institutions, data vendors, national authorities, …
• Single DWH for multidimensional data; the Time series DB is also in the process of being merged into the DWH.

A company-wide Data Warehouse: logical thematic sections (multidimensional)
• Sections: Work data, etc.
• Micro data; some sections also contain macrodata as ready-to-use statistics
• Metadata-managing software
• Macrodata:
  - aggregations by counterparty sector/economic activity, geographical area, phenomenon, reporter type, dimension, etc.
  - statistical indicators
  - ratios

A company-wide Data Warehouse: use and users
• Sections: Work data, etc.
• Outputs and users: end user, specialised applications, external flows, publications, on-line DB
• Uses:
  - general inquiry
  - supervision analysis
  - etc.

A company-wide Data Warehouse: access rights
• Access by applications to the sections (Work data, etc.) follows the NEED TO KNOW PRINCIPLE.

The internal dissemination of Central Credit Register's data (an example)

Central Credit Register's data (an example, part 1)
• Join with borrower's code
• MICRODATA, basic layer: reported data (real code) + borrowers' indicators (restricted use, with logging)
• Results of the join
• MICRODATA: set of ID variables related to a code.
  Harmonised domains of gender, sector and type of economic activity, residence, etc.
• MACRODATA: ready-to-use statistics on a single lender or on the credit system (extended use). Accessible by users and applications.
• Join with borrower's alias
• MICRODATA: alias code + micro indicators (extended use). Users put together the microdata contained in the archives and build their personalised statistics without being logged. The set of ID data is limited.

The inquiry tools for all the sections are the same.

Central Credit Register's data (an example, part 2)
• MICRODATA (Central Credit Register): set of ID variables related to a code. Harmonised concepts (gender, sector and type of economic activity, residence, etc.)
• MICRODATA (Central Balance Sheet DB): set of ID variables (external source)
• The two sources use different subject codes and different sets of ID variables.

Central Credit Register
NAME        SEX  RESIDENCE  TAXCODE     CODE
J. White    M    Milan      DFWS4321    XYZ
R. Green    F    Venice     Grgr432432  ABC
B. B. Blue  F    Naples     2123jhgdks  EFG
D. Red      M    Rome       edso099223  HJK

Central Balance Sheet DB
NAME        SEX  TOWN    TAXNMB      NUM
R. Seles    M    Turin   sdef43321t  123
R. D.Green  F    Venice  Grgr432432  234
B. B. Blue  F    Naples  fsfaffshhg  345
D. Red      M    Rome    edso099223  567

MATCHES FOUND
CODE  NUM  Basis of the match
ABC   234  Identity of relevant fields
HJK   567  Identity of all fields

Now we can put together the information on the same subjects in the Central BS and CCR data (pairs of codes).

Final remarks
• So far we have achieved the integration of all multidimensional data (micro and macro).
• Time series can be represented as multidimensional data in which a single permitted value is fixed for every dimension except time: the time series key.
• With transformation rules we create time series from multidimensional reported macro data.
• The "Time series DB" contains information derived from multidimensional data plus external data (OECD, IMF, ECB, etc.).
• The last challenge is to integrate the Time series DB with the rest of the Data Warehouse, to increase the efficiency of data production and use.
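
The remark that a time series is multidimensional data with every dimension except time fixed to one value can be illustrated with a minimal Python sketch. It is not the data warehouse's actual implementation: the dimension names (`sector`, `area`), the figures and the function `extract_time_series` are all hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical multidimensional macrodata: each record has one value per
# dimension plus the measured value. Names and figures are invented.
OBSERVATIONS: List[Dict[str, object]] = [
    {"time": "2012-Q4", "sector": "households", "area": "north", "value": 100.0},
    {"time": "2013-Q1", "sector": "households", "area": "north", "value": 104.0},
    {"time": "2013-Q1", "sector": "firms", "area": "north", "value": 250.0},
    {"time": "2012-Q4", "sector": "households", "area": "south", "value": 80.0},
]


def extract_time_series(
    observations: List[Dict[str, object]],
    key: Dict[str, object],
    time_dim: str = "time",
) -> List[Tuple[object, object]]:
    """Return (time, value) pairs for the records selected by a time series key.

    The key must fix exactly one value for every dimension except time.
    """
    if not observations:
        return []
    # Every field other than the time dimension and the measure is a dimension.
    dimensions = set(observations[0]) - {time_dim, "value"}
    if set(key) != dimensions:
        raise ValueError("the key must fix every dimension except time")
    series = [
        (obs[time_dim], obs["value"])
        for obs in observations
        if all(obs[dim] == fixed for dim, fixed in key.items())
    ]
    # Period labels such as "2012-Q4" sort chronologically as plain strings.
    return sorted(series)


series = extract_time_series(
    OBSERVATIONS, {"sector": "households", "area": "north"}
)
```

Here the key `{"sector": "households", "area": "north"}` selects two of the four records, ordered by period. A key that leaves a dimension unspecified is rejected, because it would select a slice of the data rather than a single series.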