ESS-minutes Helsinki
Draft
MINUTES
2011-06-07
STATISTICS SWEDEN
Process department
Kent Olofsson
1(6)
ESS-net on Data warehousing and data linking
Workshop Helsinki
31 May 2011
Participants
Pieter Vlag, NL
Maia Ennok, EE
Allan Randlepp, EE
Janis Juriso, EE
Sami Saarikivi, FI
Riita Piela, FI
Pekka Tamminen, FI
Lars Göran Lundell, SE
Kent Olofsson, SE
Peter Thoren, SE
Agenda
See document “ESSNET_DWH_agenda workshop Helsinki_v02.doc”
Welcome
Sami greeted the participants
Presentation of the participants
Opening + Introduction
Presentation of the ESS-net project
Find out what projects are running in Europe
- Questionnaires were sent out
- Difficult to find out what was really going on
- Meet to exchange ideas
The results from the questionnaire were difficult to interpret. This meeting
was therefore arranged with the aim of exchanging ideas.
Comparison of the DWH-models
What to discuss?
- DW architecture
- Data warehouse
- Data warehousing
Does the term refer to the whole process, from collection, editing and in the
end the DW, or only the pure DW storage? What are the boundaries?
Pieter presented the ONS model and how it is used in the Netherlands
See below
- All types of input sources
- Data checked
- On standard units
- Data cleaning. Where?
Data Analysis – DW
DW should make it possible to produce flexible output regardless of input
source. Metadata describe what you have in the DW.
The business register (BR) is a kind of key input.
The BR staff in the Netherlands strongly disagree: they say the BR is an
independent information frame.
XBRL is either only a delivery format, or a kind of source, since it is
delivered directly from the enterprises.
Stored in SQL-databases
For a standard unit, the same data can come from different sources.
Cleaned original data linked to standard units.
No plans for the DW model
A short discussion took place on whether the business register is an
integrated part of the DW or not. The impression was that the participants
consider the BR an integrated part of the DW, but it was not clarified
exactly what they meant by this.
The Finnish model
See the presentations “May31_Saarikivi_Piela.ppt” and
“Concepts_Ensglish.pdf”
Data collection in one database, covering both direct collection and admin data.
Technical checking of admin data. No checks of the contents.
Data processing in one database
DW updated every day
External database also
The Finnish DW pilot studies
A brief presentation of the cube built in SQL Server 2008 Analysis Services
- One dimension per classification.
- The fact table contains one column per variable
- The fact table contains annual data (in answer to a later question, it
was stated that the fact table contains annual, quarterly and
monthly data)
- The DW is supposed to contain only correct data
- The Enterprises do not show up as a dimension. The Enterprise ID is
a column in the fact table
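The cube layout described above can be illustrated with a minimal sketch. It uses Python's built-in SQLite module only as a stand-in for SQL Server Analysis Services, and all table names, column names and values are hypothetical. It shows one dimension table per classification, a fact table with one column per variable, and the enterprise ID kept as a plain column in the fact table rather than as a dimension.

```python
import sqlite3

# In-memory database; SQLite is only a stand-in for the real cube technology.
conn = sqlite3.connect(":memory:")
conn.executescript("""
-- One dimension table per classification (hypothetical examples).
CREATE TABLE dim_industry (industry_code TEXT PRIMARY KEY, label TEXT);
CREATE TABLE dim_region   (region_code   TEXT PRIMARY KEY, label TEXT);

-- Fact table: one column per variable; enterprise_id is a plain column,
-- not a dimension.
CREATE TABLE fact_enterprise (
    enterprise_id TEXT,
    period        TEXT,
    industry_code TEXT REFERENCES dim_industry(industry_code),
    region_code   TEXT REFERENCES dim_region(region_code),
    turnover      REAL,
    employees     INTEGER
);
""")

conn.executemany(
    "INSERT INTO dim_industry VALUES (?, ?)",
    [("C", "Manufacturing"), ("G", "Trade")],
)
conn.executemany(
    "INSERT INTO dim_region VALUES (?, ?)",
    [("R1", "Region 1"), ("R2", "Region 2")],
)
# Invented illustration data, not figures from the workshop.
conn.executemany(
    "INSERT INTO fact_enterprise VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("E001", "2010", "C", "R1", 100.0, 10),
        ("E002", "2010", "C", "R2", 250.0, 20),
        ("E003", "2010", "G", "R1", 50.0, 5),
    ],
)


def turnover_by_industry(connection, period):
    """Aggregate the turnover variable along the industry classification."""
    rows = connection.execute(
        """
        SELECT d.label, SUM(f.turnover)
        FROM fact_enterprise AS f
        JOIN dim_industry AS d ON d.industry_code = f.industry_code
        WHERE f.period = ?
        GROUP BY d.label
        ORDER BY d.label
        """,
        (period,),
    ).fetchall()
    return dict(rows)
```

Calling `turnover_by_industry(conn, "2010")` sums the turnover column per industry label. The same pattern would apply to any other classification that has its own dimension table.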
NA and others take data from DW and put it into another DW.
SAS and ProClarity are used to access the DW
The Estonian model
See the presentation “Data architecture.pptx”
Use Oracle
Raw data databases for different collections, direct web and admin data
Data processing in several databases, then DW is loaded from those
databases.
The collection data produce the data staging area. One system organised per
statistical area. Some parts are standardised. The DW is loaded from the data
staging area.
Integrated statistical registers: business, population, dwellings
Metadata exist for everything, also for processing.
Dissemination from statistical database using PC-Axis. Try to keep data in
one place, the DW. Users should use that data, not copy data.
DW more or less in the planning phase but will be loaded with census data.
Some pilots exist.
The Swedish model
See the presentation: DW strategy - Helsinki May 2011.pptx
The term Data Warehouse is avoided, since it is confusing in Swedish.
Metadata are at the centre and vital for the system. Today metadata are mostly
documentation written afterwards. The aim is to make metadata more active.
The “input data funnel” is one single data entry point for all admin data.
The observation data store contains longitudinal data. Every action creates a
new generation of a data item.
All data are kept and if an item is not correct, a new value is inserted. Only
inserts are allowed, no updates.
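The insert-only principle can be sketched as follows. This is a minimal illustration under stated assumptions, not the Swedish implementation: the class names, field names and the way generations are numbered are all hypothetical. A correction is stored as a new generation of the same data item, and earlier generations are never changed or deleted.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Observation:
    """One generation of a data item (hypothetical structure)."""
    unit_id: str
    variable: str
    value: float
    generation: int


@dataclass
class ObservationStore:
    """Append-only store: rows are inserted, never updated or deleted."""
    _rows: list = field(default_factory=list)

    def _matches(self, unit_id, variable):
        return [
            r for r in self._rows
            if r.unit_id == unit_id and r.variable == variable
        ]

    def insert(self, unit_id, variable, value):
        # Each insert for the same item creates the next generation.
        generation = len(self._matches(unit_id, variable)) + 1
        row = Observation(unit_id, variable, value, generation)
        self._rows.append(row)
        return row

    def current(self, unit_id, variable):
        # The latest generation is treated as the current value.
        matches = self._matches(unit_id, variable)
        return max(matches, key=lambda r: r.generation) if matches else None

    def history(self, unit_id, variable):
        # All generations are kept, oldest first.
        return sorted(self._matches(unit_id, variable), key=lambda r: r.generation)
```

With this design, correcting a value means calling `insert` again for the same unit and variable. `current` then returns the corrected value, while `history` still shows the value that was originally received.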
Data stored in one place only, minimized data transfers. Data harmonised.
Base registers (population registers): Business, Individuals and Real
property.
In the current situation “Everybody is allowed to fool around”. In the future
all users will have to go through a service layer.
Metadata partly in place
The base registers are in place, but the links between them (individuals/local
units, individuals/dwellings and local units/addresses/real
property/coordinates) have to be improved.
The funnel is in place but is not handling all admin data. Some deliveries of
data to SCB go outside the funnel.
No editing between observation data and target data, only derivation of the
same data.
Project organisation in Estonia and Finland
Finland
See below
Project duration: 2011 – 2014
Temporary unit: 8 persons (project managers, 2 IT persons).
About 60 persons participate in the project
Project plan contains 30 staff years in total, no external resources. Heads of
all units are members of the steering group. Very clear communication plan.
Estonia
A permanent DW-unit exists in the organisation. Several projects are
currently going on. External resources are used.
Organisation: steering group, project groups, development partners, subgroups.
20 people participate in the project, 2 of them full-time. About 10 external
IT persons.
Discussion on the role of the Business Register
The meeting agreed on:
- If the business register is not correct, correct it.
- The business register is an integrated part of the DW.
The impression was that the participants consider the BR an integrated part
of the DW, but it was not clarified exactly what they meant by this.
Metadata
See the presentation: “The role of metadata - Helsinki May 2011.pptx”
It seems that everybody agrees on the importance, but not very much has
been done. Why?
Active metadata contribute to production of statistics
Passive metadata are documentation afterwards
Estonian solution:
Centralised metadata physically and logically, but they can be replicated.
- Collection metadata, processing metadata and links between.
- Can trace a certain data cell back to the source.
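Tracing a data cell back to its source through linked metadata could look roughly like the sketch below. It is an illustration only: the identifiers, the `LINKS` structure and the function name are hypothetical and do not come from the Estonian system. The sketch assumes that each derived item records which items it was computed from and that these links contain no cycles.

```python
# Hypothetical metadata links: each item points to the items it was derived
# from. Items with no entry are treated as original sources.
LINKS = {
    "output:turnover_total_2010": [
        "processed:turnover_E001",
        "processed:turnover_E002",
    ],
    "processed:turnover_E001": ["collected:survey_form_17"],
    "processed:turnover_E002": ["collected:tax_file_row_4"],
}


def trace_to_sources(item, links):
    """Follow the links from an item back to its original sources.

    Assumes the links are acyclic. Returns the sources in the order they
    are first reached, without duplicates.
    """
    parents = links.get(item, [])
    if not parents:
        return [item]
    sources = []
    for parent in parents:
        for source in trace_to_sources(parent, links):
            if source not in sources:
                sources.append(source)
    return sources
```

For the hypothetical output cell `output:turnover_total_2010`, the function walks through the processing metadata and ends at the two collected items it was built from.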
Finnish solution:
XML database, one single repository
Comments:
- We have to be practical.
- We have to find out what we need for the data warehouse.
- What is the minimum?
- We have to prioritize.
How to ensure flexible output and data confidentiality
Estonia:
All figures can be used for analysis, but cannot be published.
Finland:
In the data warehouse all kinds of data can be combined, but the access to
data is regulated.