Draft MINUTES 2011-06-07
STATISTICS SWEDEN
Process department
Kent Olofsson

ESS-net on Data warehousing and data linking
Workshop Helsinki 31 May 2011

Participants
Pieter Vlag (NL)
Maia Ennok (EE)
Allan Randlepp (EE)
Janis Juriso (EE)
Sami Saarikivi (FI)
Riita Piela (FI)
Pekka Tamminen (FI)
Lars Göran Lundell (SE)
Kent Olofsson (SE)
Peter Thoren (SE)

Agenda
See document "ESSNET_DWH_agenda workshop Helsinki_v02.doc".

Welcome
Sami greeted the participants.

Presentation of the participants

Opening and introduction
Presentation of the ESS-net project:
- Find out what projects are running in Europe
- Questionnaires were sent out
- It was difficult to find out what was really going on
- Meet to exchange ideas
It was difficult to interpret the results of the questionnaire. That is why this meeting was arranged, with the aim of exchanging ideas.

Comparison of the DWH models
What to discuss?
- DW architecture
- Data warehouse

Data warehousing
Does the term refer to the whole process, from collection and editing to the DW at the end, or only to the DW storage itself? Where are the boundaries?

Pieter presented the ONS model and how it is used in the Netherlands. See below.
- All types of input sources
- Data checked
- On standard units
- Data cleaning. Where?
Data analysis – DW
The DW should make it possible to produce flexible output regardless of the input source. Metadata describe what is in the DW.
The business register is a kind of key input. BR people in the Netherlands disagree strongly: they regard the BR as an independent information frame.
XBRL is only a delivery format; alternatively, it can be seen as a kind of source, since it is a delivery format coming directly from the enterprises.
Data are stored in SQL databases.
Data on standard units can contain the same data from different sources. Cleaned original data are linked to standard units.
There are no plans for the DW model.

A short discussion took place on whether the business register is an integrated part of the DW or not.
The impression was that the participants were of the opinion that the BR is an integrated part of the DW. It was not clarified exactly what the participants meant by this.

The Finnish model
See the presentations "May31_Saarikivi_Piela.ppt" and "Concepts_Ensglish.pdf".
- Data collection in one database, covering both direct collection and admin data.
- Technical checking of admin data; no checks of the contents.
- Data processing in one database.
- The DW is updated every day.
- There is also an external database.

The Finnish DW pilot studies
A brief presentation of the cube built in SQL Server 2008 Analysis Services:
- One dimension per classification.
- The fact table contains one column per variable.
- The fact table contains annual data. (In answer to a later question, it was clarified that the fact table contains annual, quarterly and monthly data.)
- The DW is supposed to contain only correct data.
- Enterprises do not appear as a dimension; the enterprise ID is a column in the fact table.
NA and others take data from the DW and put it into another DW.
SAS and ProClarity are used to access the DW.

The Estonian model
See the presentation "Data architecture.pptx".
- Oracle is used.
- Raw data databases for the different collections: direct collection, web and admin data.
- Data processing takes place in several databases, and the DW is then loaded from those databases.
- The collection data make up the data staging area.
- One system is organised per statistical area; some parts are standardised.
- The DW is loaded from the data staging area.
- Integrated statistical registers: business, population, dwellings.
- Metadata exist for everything, including processing.
- Dissemination from the statistical database using PC-Axis.
- The aim is to keep data in one place, the DW. Users should use those data rather than copy them.
- The DW is more or less in the planning phase but will be loaded with census data. Some pilots exist.
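The Finnish pilot cube described above follows a star-schema layout. The following is a minimal sketch under stated assumptions: Python's built-in sqlite3 stands in for SQL Server 2008 Analysis Services, and the table names, classifications (industry, region), variables (turnover, employees) and sample rows are all hypothetical. It shows one dimension table per classification, one fact-table column per variable, a period column covering annual, quarterly and monthly data, and the enterprise ID kept as an ordinary fact column rather than a dimension.

```python
import sqlite3

# Hypothetical star schema illustrating the Finnish pilot cube as described:
# one dimension table per classification, one fact column per variable,
# and the enterprise ID as a plain column in the fact table (not a dimension).
SCHEMA = """
CREATE TABLE dim_industry (industry_code TEXT PRIMARY KEY, industry_name TEXT);
CREATE TABLE dim_region   (region_code   TEXT PRIMARY KEY, region_name   TEXT);
CREATE TABLE fact_enterprise (
    enterprise_id TEXT NOT NULL,   -- ordinary column, no enterprise dimension
    period        TEXT NOT NULL,   -- e.g. '2010', '2010Q1', '2010M01'
    period_type   TEXT NOT NULL CHECK (period_type IN ('A', 'Q', 'M')),
    industry_code TEXT REFERENCES dim_industry(industry_code),
    region_code   TEXT REFERENCES dim_region(region_code),
    turnover      REAL,            -- one column per variable
    employees     INTEGER
);
"""


def build_demo_cube() -> sqlite3.Connection:
    """Create an in-memory cube with a few hypothetical rows."""
    con = sqlite3.connect(":memory:")
    con.executescript(SCHEMA)
    con.executemany("INSERT INTO dim_industry VALUES (?, ?)",
                    [("C", "Manufacturing"), ("G", "Trade")])
    con.executemany("INSERT INTO dim_region VALUES (?, ?)",
                    [("R1", "Region 1"), ("R2", "Region 2")])
    con.executemany(
        "INSERT INTO fact_enterprise VALUES (?, ?, ?, ?, ?, ?, ?)",
        [("E1", "2010", "A", "C", "R1", 100.0, 10),
         ("E2", "2010", "A", "C", "R2", 50.0, 4),
         ("E3", "2010", "A", "G", "R1", 70.0, 7),
         ("E1", "2010Q1", "Q", "C", "R1", 25.0, 10)])
    return con


def turnover_by_industry(con: sqlite3.Connection, period: str) -> dict:
    """Aggregate one variable along one classification dimension for a period."""
    rows = con.execute(
        """SELECT d.industry_name, SUM(f.turnover)
           FROM fact_enterprise f JOIN dim_industry d USING (industry_code)
           WHERE f.period = ?
           GROUP BY d.industry_name
           ORDER BY d.industry_name""",
        (period,),
    ).fetchall()
    return dict(rows)


con = build_demo_cube()
print(turnover_by_industry(con, "2010"))
```

With the sample rows, `turnover_by_industry(con, "2010")` returns `{'Manufacturing': 150.0, 'Trade': 70.0}`; the quarterly row for E1 is excluded because it belongs to period 2010Q1. Keeping the enterprise ID as a fact column means individual enterprises can still be selected without building a very large enterprise dimension.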
The Swedish model
See the presentation "DW strategy - Helsinki May 2011.pptx".
The term data warehouse is avoided, since it is confusing in Swedish.
- Metadata are at the centre and are vital for the system. Today metadata are mostly documentation produced afterwards; the aim is to make metadata more active.
- The "input data funnel" is a single entry point for all admin data.
- The observation data store contains longitudinal data. Every action creates a new generation of a data item. All data are kept, and if an item is incorrect, a new value is inserted. Only inserts are allowed, no updates.
- Data are stored in one place only, minimising data transfers.
- Data are harmonised.
- Base registers (population registers): Business, Individuals and Real property.
- In the current situation "everybody is allowed to fool around". In the future all users will have to go through a service layer.
Current status:
- Metadata are partly in place.
- The base registers are in place, but the links between them (individuals/local units, individuals/dwellings and local units/addresses/real property/coordinates) have to be improved.
- The funnel is in place but does not handle all admin data; some data deliveries to SCB bypass the funnel.
- There is no editing between observation data and target data, only derivations of the same data.

Project organisation in Estonia and Finland

Finland
See below.
- Project duration: 2011–2014.
- Temporary unit: 8 persons (project managers, 2 IT persons).
- About 60 persons participate in the project.
- The project plan contains 30 staff years in total, with no external resources.
- The heads of all units are members of the steering group.
- Very clear communication plan.

Estonia
- A permanent DW unit exists in the organisation.
- Several projects are currently going on.
- External resources are used.
- Organisation: steering group, project groups, development partners and subgroups.
- 20 people participate in the project, 2 of them full-time.
- About 10 external IT persons.

Discussion on the role of the Business Register
The meeting agreed on:
- If the business register is not correct, correct it.
- The business register is an integrated part of the DW.
The impression was that the participants were of the opinion that the BR is an integrated part of the DW. It was not clarified exactly what the participants meant by this.

Metadata
See the presentation "The role of metadata - Helsinki May 2011.pptx".
Everybody seems to agree on the importance of metadata, but not much has been done. Why?
- Active metadata contribute to the production of statistics.
- Passive metadata are documentation produced afterwards.
Estonian solution: metadata are centralised physically and logically, but they can be replicated.
- Collection metadata, processing metadata and the links between them.
- A given data cell can be traced back to its source.
Finnish solution: an XML database, one single repository.
Comments:
- We have to be practical.
- We have to find out what we need for the data warehouse.
- What is the minimum?
- We have to prioritise.

How to ensure flexible output and data confidentiality
Estonia: All figures can be used for analysis, but cannot be published.
Finland: In the data warehouse all kinds of data can be combined, but access to the data is regulated.
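The Estonian metadata solution links collection metadata and processing metadata so that a data cell can be traced back to its source. A minimal sketch of such a trace, assuming each cell's metadata records either the collection it came from or the cells it was derived from; the `CellMeta` structure, the cell identifiers and the source names are hypothetical and not taken from the Estonian system.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Set, Tuple


@dataclass(frozen=True)
class CellMeta:
    """Hypothetical metadata record for one data cell."""
    source: Optional[str] = None    # collection metadata: set for collected cells
    inputs: Tuple[str, ...] = ()    # processing metadata: cells this one derives from
    process: Optional[str] = None   # name of the processing step, if derived


def trace_to_sources(cell_id: str, catalogue: Dict[str, CellMeta]) -> Set[str]:
    """Follow processing links back to every collection source behind a cell."""
    sources: Set[str] = set()
    seen: Set[str] = set()
    stack = [cell_id]
    while stack:
        current = stack.pop()
        if current in seen:  # a cell may feed several derived cells
            continue
        seen.add(current)
        meta = catalogue[current]
        if meta.source is not None:
            sources.add(meta.source)
        stack.extend(meta.inputs)
    return sources


# Hypothetical example: an output cell aggregated from an edited cell
# (itself combining survey and admin data) and a raw survey cell.
catalogue = {
    "survey/E1/turnover": CellMeta(source="web survey"),
    "admin/E1/turnover": CellMeta(source="tax register"),
    "edited/E1/turnover": CellMeta(
        inputs=("survey/E1/turnover", "admin/E1/turnover"), process="editing"),
    "survey/E2/turnover": CellMeta(source="web survey"),
    "output/industryC/turnover": CellMeta(
        inputs=("edited/E1/turnover", "survey/E2/turnover"), process="aggregation"),
}

print(sorted(trace_to_sources("output/industryC/turnover", catalogue)))
```

Here the output cell traces back to both the web survey and the tax register. Storing only direct input links per cell keeps each metadata record small, while the full lineage is reconstructed on demand.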
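The Swedish observation data store presented at the workshop keeps longitudinal data in which every action creates a new generation of a data item, and only inserts are allowed. A minimal in-memory sketch of that rule, assuming a data item is identified by (unit, variable, period); the `ObservationStore` class, its methods and the action labels are hypothetical and not taken from the SCB system.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Any, Dict, List, Optional, Tuple

Key = Tuple[str, str, str]  # hypothetical key: (unit_id, variable, period)


@dataclass(frozen=True)
class Generation:
    """One immutable version of a data item."""
    number: int
    value: Any
    action: str  # hypothetical label, e.g. "delivery" or "correction"
    recorded_at: datetime


class ObservationStore:
    """Append-only store: every action adds a generation; nothing is updated or deleted."""

    def __init__(self) -> None:
        self._history: Dict[Key, List[Generation]] = {}

    def insert(self, key: Key, value: Any, action: str) -> Generation:
        # A correction is simply another insert; earlier generations stay untouched.
        versions = self._history.setdefault(key, [])
        gen = Generation(len(versions) + 1, value, action, datetime.now(timezone.utc))
        versions.append(gen)
        return gen

    def current(self, key: Key) -> Optional[Any]:
        # The latest generation holds the value currently in force.
        versions = self._history.get(key)
        return versions[-1].value if versions else None

    def history(self, key: Key) -> Tuple[Generation, ...]:
        # Read-only copy of all generations, oldest first.
        return tuple(self._history.get(key, []))


store = ObservationStore()
item = ("E1", "turnover", "2010")
store.insert(item, 1000, "delivery")
store.insert(item, 1100, "correction")
print(store.current(item), [g.value for g in store.history(item)])
```

After the correction, `current` returns 1100 while `history` still holds both generations, so earlier published figures remain reproducible. The class exposes no update or delete method, which is how the sketch enforces the insert-only rule.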