Integration of different micro data bases – a significant value added for statisticians
Francesca Monacelli
Statistics collection and processing Department
21 June 2013
AGENDA
• Some preliminary definitions
• The pursuit of the integration of data bases
• The internal dissemination of Central Credit Register’s data (an example)
• Final remarks
Some preliminary definitions
• What are microdata? Information describing the characteristics of the units of a population.
• Which population? That of the reporter itself and/or of third parties.
• Which categories of information do we deal with? Quantitative and qualitative data (registers).
Our methodological approach and inquiry tools are the same for both categories. For this reason we refer to all of them, without distinction, as “reported data”.
The same approach and tools are also used for the management of aggregated/compiled data (macrodata), DQM indicators and the definition of a survey.
This uniform treatment is a significant value added for statisticians.
The pursuit of the integration of data bases
Integration consists in merging the different statistical requirements into a single general framework, and results in:
• a single identification process for all statistical requirements (governance);
• the definition of reporting schemes aimed at avoiding redundancies in data requests (the multipurpose perspective of the model);
• a single data representation model for managing all the different types of reported and calculated data (micro and macro data, quantitative and qualitative data, …);
• a single corporate statistical data dictionary defining a comprehensive system of definitions and classifications for all the statistical concepts used for different purposes, and for all the transformation rules;
• a single company-wide statistical data warehouse to be used for different internal purposes and supported by the same inquiry methods.
A company-wide Data Warehouse: structural composition
• Components: input collection, DQM, transformation rules, concepts’ domains, output production
• Sources shown in the diagram: reporting institutions, national and international entities, TARGET platform, monetary policy platform, data vendors, national authorities, …
• Content: microdata and macrodata
• Single DWH for multidimensional data (in the process of also merging the Time series DB)
A company-wide Data Warehouse: logical thematic sections (multidimensional)
• Sections: work data, etc.
• Content: microdata; some sections also contain macrodata as ready-to-use statistics
• Macrodata:
- aggregations by counterparty sector/economic activity, geographical area, phenomenon, reporter type, dimension, etc.
- statistical indicators
- ratios
• Metadata-managing software
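As a rough illustration of how the macrodata listed above (aggregations and ratios) can be derived from microdata, the sketch below sums a few records by counterparty sector and geographical area. The field names (`sector`, `area`, `amount`), the helper `aggregate` and all figures are invented for the example; they are assumptions, not the actual structure of the data warehouse.

```python
from collections import defaultdict

# Hypothetical microdata: one record per reported exposure (invented values).
microdata = [
    {"sector": "households", "area": "North", "amount": 100.0},
    {"sector": "households", "area": "North", "amount": 50.0},
    {"sector": "non-financial corporations", "area": "North", "amount": 200.0},
    {"sector": "households", "area": "South", "amount": 30.0},
]


def aggregate(records, dimensions, measure):
    """Sum `measure` over all records that share the same values of `dimensions`."""
    totals = defaultdict(float)
    for rec in records:
        key = tuple(rec[d] for d in dimensions)
        totals[key] += rec[measure]
    return dict(totals)


# Macrodata: aggregation by counterparty sector and geographical area.
macrodata = aggregate(microdata, ["sector", "area"], "amount")

# A coarser aggregation and a ratio computed on top of it.
by_sector = aggregate(microdata, ["sector"], "amount")
household_share = by_sector[("households",)] / sum(by_sector.values())
```

The same function serves any combination of dimensions, which mirrors the idea of one inquiry method applied to every section.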
A company-wide Data Warehouse: use & users
• Sections: work data, etc.
• Users and outputs: end user, specialised applications, external flows, publications, online DB
• Uses: general inquiry, supervision analysis, etc.
A company-wide Data Warehouse: access rights
• Access to the sections (work data, etc.), including access by applications, follows the need-to-know principle.
The internal dissemination of Central Credit Register’s Data (an example)
Central Credit Register’s Data (an example, part 1)
• MICRODATA, basic layer: reported data (real code) + borrowers’ indicators (restricted use & log). Join with the borrower’s code.
• MICRODATA, ID variables: set of ID variables related to a code, with harmonised domains for gender, sector and type of economic activity, residence, etc.
• MICRODATA, alias layer: alias code + micro indicators (extended use). Join with the borrower’s alias.
• MACRODATA: ready-to-use statistics on a single lender or on the credit system (extended use), accessible by users & applications.
• Results of the join: users put together the microdata contained in the archives and build their personalised statistics without being logged. The set of ID data is limited.
• The inquiry tools for all the sections are the same.
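The two access paths in this example (a join on the borrower's real code for restricted use, and a join on an alias for extended use) can be sketched as follows. This is only an illustration under stated assumptions: the source does not say how aliases are generated, so the keyed hash in `alias_for` is a stand-in, and the key, the field names, the indicator values and the choice of `LIMITED_ID_FIELDS` are all hypothetical.

```python
import hashlib
import hmac

# Hypothetical secret used to derive aliases; a keyed hash is an assumption,
# the source does not describe how alias codes are actually produced.
SECRET_KEY = b"demo-key-not-for-production"


def alias_for(real_code: str) -> str:
    """Derive a stable alias from a real borrower code."""
    digest = hmac.new(SECRET_KEY, real_code.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:12]


# Restricted layer: reported data keyed by the real code (invented values).
reported = [
    {"code": "ABC", "indicator": 1.5},
    {"code": "HJK", "indicator": 0.7},
]
id_variables = {
    "ABC": {"name": "R. Green", "sex": "F", "residence": "Venice"},
    "HJK": {"name": "D. Red", "sex": "M", "residence": "Rome"},
}

# Extended layer: the real code is replaced by the alias and only a limited
# set of ID variables is kept (which fields are kept is an assumption).
LIMITED_ID_FIELDS = ("sex", "residence")
extended_micro = [
    {"alias": alias_for(r["code"]), "indicator": r["indicator"]} for r in reported
]
extended_ids = {
    alias_for(code): {f: ids[f] for f in LIMITED_ID_FIELDS}
    for code, ids in id_variables.items()
}


def join_on(records, key, lookup):
    """Attach the looked-up ID variables to each record that has a match."""
    return [{**rec, **lookup[rec[key]]} for rec in records if rec[key] in lookup]


restricted_view = join_on(reported, "code", id_variables)    # restricted use
extended_view = join_on(extended_micro, "alias", extended_ids)  # extended use
```

Because both views go through the same `join_on` helper, the sketch also reflects the remark that the inquiry tools are the same for all sections; only the key and the visible ID variables differ.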
Central Credit Register’s Data (an example, part 2)
The two sources use different subject codes and different sets of ID variables.

MICRODATA (CCR): set of ID variables related to a code, with harmonised concepts (gender, sector and type of economic activity, residence, etc.)

NAME         SEX  RESIDENCE  TAXCODE     CODE
J. White     M    Milan      DFWS4321    XYZ
R. Green     F    Venice     Grgr432432  ABC
B. B. Blue   F    Naples     2123jhgdks  EFG
D. Red       M    Rome       edso099223  HJK

MICRODATA (Central Balance sheet DB, external source): set of ID variables

NAME         SEX  TOWN       TAXNMB      NUM
R. Seles     M    Turin      sdef43321t  123
R. D.Green   F    Venice     Grgr432432  234
B. B. Blue   F    Naples     fsfaffshhg  345
D. Red       M    Rome       edso099223  567

MATCHES FOUND

CODE  NUM
ABC   234   Identity of relevant fields
HJK   567   Identity of all fields

Now we can put together the info on the same subjects in Central BS & CCR data (pairs of codes).
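The matching step in this example can be sketched in a few lines. The sketch assumes that the "relevant fields" are sex, residence/town and tax code (so that a difference in the name alone still yields a match); the slide does not state which fields count as relevant. The helper names and the assignment of the codes XYZ/EFG and 123/345 to the unmatched rows (by row order) are likewise assumptions.

```python
# ID variables from the example. The codes of the unmatched rows
# (XYZ, EFG, 123, 345) are assigned by row order, which is an assumption.
ccr = [
    {"name": "J. White", "sex": "M", "residence": "Milan", "taxcode": "DFWS4321", "code": "XYZ"},
    {"name": "R. Green", "sex": "F", "residence": "Venice", "taxcode": "Grgr432432", "code": "ABC"},
    {"name": "B. B. Blue", "sex": "F", "residence": "Naples", "taxcode": "2123jhgdks", "code": "EFG"},
    {"name": "D. Red", "sex": "M", "residence": "Rome", "taxcode": "edso099223", "code": "HJK"},
]
central_bs = [
    {"name": "R. Seles", "sex": "M", "town": "Turin", "taxnmb": "sdef43321t", "num": "123"},
    {"name": "R. D.Green", "sex": "F", "town": "Venice", "taxnmb": "Grgr432432", "num": "234"},
    {"name": "B. B. Blue", "sex": "F", "town": "Naples", "taxnmb": "fsfaffshhg", "num": "345"},
    {"name": "D. Red", "sex": "M", "town": "Rome", "taxnmb": "edso099223", "num": "567"},
]

# Map the external source's field names onto the CCR's harmonised names.
FIELD_MAP = {"name": "name", "sex": "sex", "town": "residence", "taxnmb": "taxcode"}
RELEVANT = ("sex", "residence", "taxcode")  # assumption: the name is not "relevant"
ALL_FIELDS = ("name",) + RELEVANT


def harmonise(rec):
    """Rename the ID variables of a Central BS record; keep its own code."""
    out = {FIELD_MAP[k]: v for k, v in rec.items() if k in FIELD_MAP}
    out["num"] = rec["num"]
    return out


def find_matches(left, right):
    """Return (CODE, NUM, match type) for every pair that agrees on the relevant fields."""
    harmonised = [harmonise(b) for b in right]
    matches = []
    for a in left:
        for b in harmonised:
            if all(a[f] == b[f] for f in ALL_FIELDS):
                matches.append((a["code"], b["num"], "all fields"))
            elif all(a[f] == b[f] for f in RELEVANT):
                matches.append((a["code"], b["num"], "relevant fields"))
    return matches


matches = find_matches(ccr, central_bs)
```

Under these assumptions the two B. B. Blue records are not paired, because their tax codes differ even though name, sex and town agree. The pairwise comparison is quadratic and only suitable for small illustrations.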
Final remarks
• So far we have achieved the integration of all multidimensional data (micro and macro).
• A time series can be represented as multidimensional data in which every dimension except time is fixed to a single allowed value; this combination of values is the time series key.
• With transformation rules we create time series from multidimensional reported macrodata.
• The “Time series DB” contains information derived from multidimensional data plus external data (OECD, IMF, ECB, etc.).
• The last challenge is to integrate the Time series DB with the rest of the Data Warehouse, in order to increase the efficiency of data production and use.
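The relationship between multidimensional data and time series described above can be sketched as a simple selection: fix one value for every dimension except time and read the remaining observations in time order. The dimension names (`sector`, `area`, `period`), the function `time_series` and the figures are hypothetical, chosen only to make the example runnable.

```python
# Hypothetical multidimensional macrodata: each observation is identified by
# its dimension values plus a reference period (all values invented).
observations = [
    {"sector": "households", "area": "North", "period": "2013-03", "value": 150.0},
    {"sector": "households", "area": "North", "period": "2012-12", "value": 140.0},
    {"sector": "households", "area": "South", "period": "2013-03", "value": 30.0},
    {"sector": "banks", "area": "North", "period": "2013-03", "value": 75.0},
]


def time_series(data, key, time_dim="period", measure="value"):
    """Return (period, value) pairs for the observations matching `key`.

    `key` fixes one value for every dimension except time, so it plays the
    role of the time series key.
    """
    selected = [
        (obs[time_dim], obs[measure])
        for obs in data
        if all(obs[dim] == val for dim, val in key.items())
    ]
    return sorted(selected)  # ISO-like period strings sort chronologically


series_key = {"sector": "households", "area": "North"}
series = time_series(observations, series_key)
```

Seen this way, a time series store is a view over the multidimensional data rather than a separate kind of object, which is what makes integrating it with the rest of the warehouse plausible.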