Country Estonia NSI Statistics Estonia Name contact person Maia Ennok E-mail contact person maia.ennok@stat.ee 1. What is the current status of your S-DWH? (planned / in development / in implementation or operational) Indicated in the questionnaire: question 6 No single coherent system which covers most of the data in the production of business statistics yet Implementing S-DWH is implemented at first for Population and Housing Census in 2011. Implementing all social studies and 6 economical statistical activities by the end of 2014 General Description 2.Please give a (short) general description of the nature of your SDWH by describing topics as: Which problems do/did you want to solve by implementing a S-DWH? What is the goal of the SDWH? What are the most important solutions desired by the NSI, to be solved by implementing a S-DWH? What functionalities are (to be) implemented in the S-DWH? Indicated in the questionnaire: question 1 + 1.1 resp 1 + 1.2 + remarks, 2.1 and 14 No Single conceptual approach for processing data in the production of business statistics. The current situation be best characterized as ‘The 'Process Model' We have the "Process Model" approach if there is also possible more than one input for output and inputs (surveys, admin data, census, observation register) and outputs (aggregate statistics, micro data, time series) are the same for the "Process Model" as for the "Data Model". We have the "Data Model" approach for the business register for statistical purposes. (Remark made) Main motivation to start DWH in your business statistics systems are: More ways to (re)use data Data linking / integration To implement changes in business process architecture To improve efficiency in business statistics production Technical efficiency Most important challenges of the NSI - Changes in the society are fast. - The users of statistics in Estonia as well as on international level require data which are reliable, comparable and corresponding to the users’ needs. - The demand for statistics is continuously increasing. - For the EU Member States the compatibility of the needs of the European Union and the country is a challenge. - Local governments and counties need data for observing their development and for making decisions. - For the new decade the qualitative data have more important role than ever in the society as a whole (in public administration and enterprise). That is the reason why the importance of objectivity and professionalism of statistics is - increasing. The users of statistics need the data as soon as possible. For this reason the statisticians must choose between being upto-date and quality, compromises must not be made with regard to the latter. Most important technical weaknesses data is duplicated data consistency is not assured metadata is missing or not harmonized different data sets are hard or impossible to link data re-using is very difficult impact analysis is not possible different statistical activities use different platforms for data processing and data storing standardizing of processes is very time consuming Goals from business strategy 1. Increase the coherence and comparability of statistics of different subject areas 2. Increase the volume of analytical products 3. Respond faster to the needs of clients 4. Standardise the process of data processing 5. Separate product development and orders for information from production process 6. Develop the department-driven organisational culture into the institution-driven one 7. Create a modern working environment 8. Reuse of statistical data, metadata and data steps (data processing code) Solutions to achieve above mentioned goals - Centralizing and standardizing meta data (Goals:1;3;4;6;7;8) - Standardizing and automating processes of data processing (Goals:1;3;4;6;7;8) - Building and using S-DWH (Goals:1;2;3;5;6;7;8) - Centralizing technical specialists of data warehouse (Goals:1;4;6) - Single integrated platform for metadata, data processing and data storing (creating precondition for creating unified archiving solution) (Goals:1;2;3;4;5;6;7;8) - Functionalities: Data processing software (VAIS) has essential role in IT architecture. It is central integration tool and main data processing software. VAIS moves data from one system to another. For example from data collection to data processing and statistical registries, from processing to data warehouse etc. VAIS is metadata driven and enables to combine workflows from prior programmed data processing steps. Manual intervention (edit, impute, code etc) is also possible using operator’s application. iMeta is central metadata repository, based on MMX MOF 3 meta-meta model, that enables to manage several different meta models. As metadata we manage both reference metadata and structural metadata (including process metadata, technical metadata, user roles and privileges etc). Data warehouse (conformed collection of datasets) consists of data processed and prepared for analysis. In data warehouse variables (columns) are linked with variable descriptions in iMeta. Data sets in Data Warehouse are versionised. With each version is stored also suitable data processing package version, with what data was produced. Data sets are mutually linked with common dimensions and facts in different data sets are unique (avoid data duplication in different data sets). 3. What is the scope of your S-DWH? (ETL, Data Warehousing, metadata, integrations between metadata system and ETL) 4. How is the S-DWH organised? (activities, responsibilities, roles etc) Need assistance of centre of competence or best practice from some other NSI’s Data duplication in S-DWH Store everything vs easy access Best practice for storing population and sample Role of Statistical Business Registry (?) Implementation of standards (for example DDI) Statistical Data Warehouse system consists of: Phase 2: iMeta application for designing statistical activity (specifying outputs, variables, data collection methods etc) Phase 2 and 3: VAIS Designer for designing, building and configuring workflows Phase 5: VAIS Operator for manual processing, XDTL Runtime for automatic meta data driven data processing (ETL) and Data Warehouse (conformed datasets) for storing processed data to be analysed in Phase 6. Phase 6: VAIS Cube Designer for designing and building metadata enriched cubes, calculating new variables etc. All metadata is stored in one metadata repository. Best practice: Metadata repository - MMX (integrated metadata system (iMeta), data processing workflow design, build, configuration (VAIS Designer)) Metadata driven data processing system (XDTL Runtime, VAIS Operator) Metadata driven cube generation (VAIS Cube Designer) In data processing, there are fallowing roles: Lead statistician – describes metadata of statistical activity in metadata system and data processing rules and checks in data processing system. Workflow designer – compile workflow in data processing system. Data Warehouse programmer – programs and complement reusable data processing steps. Data Warehouse architect– develops and maintains data model of Data Warehouse. Operator – manually corrects errors found during data processing. Analyst – analyze data in Data Warehouse and produces statistics. Lead statistician, operators and analysts are structurally in departments of statistical domains. Data Warehouse architect, workflow designer, Data Warehouse programmer are situated in unit of data warehouse. Statistical domains are responsible for statistical methodological side and data warehouse unit is responsible for data warehouse technical realization. 5. Which problems did you encounter so far? Indicated in the questionnaire: questions 15, 16, 17 Main methodological barriers to implementing an integrated system are: Not enough expertise Too difficult, complex Data linking Insufficient development time Too high costs Problems encountered in data integration you desire to solve with an integrated DWH-system are: With Data Warehouse system we could have more data available to use and reuse, and by using metadata-driven DWH-system we could have better data quality for output data. DWH could positively effect the response burden (diminish administrative response burden). (remark made) Problems encountered in process integration you desire to solve with an integrated DWH-system With Data Warehouse system we could have more effective and unified process (using one framework for all data warehouse processes) and with monitoring process metadata we could have better process quality. (remark made) It is difficult to develop massive system if requirements for system are not cleared. Creation (agreement) of metadata to new system takes a lot of time. People resistance to changes. Support of administration for changes could be bigger. No agreement how new system has influence division of work between departments. Technical problems with implementing freshly finished software. Characteristics 6. Please check the characteristics as indicated in the questionnaire and correct / complete if necessary? Indicated in the questionnaire: questions 3 + 4 There is not a one-to-one correspondence between input data and outputs. The data warehouse is passive. ETL part is active but data storing is passive (no changes in data warehouse). 7. Is the data in the SDWH unit based (micro data)? 8. Do you have weighted data in the S-DWH? Generally yes, but we can store aggregates also if needed. We plan to store both, original and weighted data together with weights. Statistical Registers 9. Is the Business Register (BR) or other Statistical Register (SR) managed in the S-DWH? 10. Which BR / SR data are stored in the S-DWH? Indicated in questionnaire: question 10 The business register currently sits (or will it be) outside the current (or planned) DWH system. BR is stored and managed outside of SDWH. stratification variables (from snapshots to Statistical Unit dimension) identifying variables (only in house ID, district level address) 11. Is the S-DWH used for updating the BR? Yes, mainly for updating stratification variables in BR. 12. Do you have a snapshot of the BR in the S-DWH? No, snapshots of BR is stored together with BR. 13. Do you keep different versions of BR snapshot in the S-DWH? No, snapshots of BR are versioned in BR. Statistical Unit dimension in S-DWH is based on BR snapshot and is versioned according to snapshot versions. Metadata concepts!) (please read Annex 1 for an explanation of the underlined Metadata 14. What kind of statistical metadata is used? 15. What kind of process metadata is used? 16. What kind of technical metadata is used? A. What kind of reference (business) metadata is used? We use iMETA system for statistical metadata (both reference and structural). Neuchatel model: statistical activity descriptions, collection methods. B. What kind of structural (technical) metadata is used Codes in Neuchatel model: statistical activity, statistical activity version, variables, classifiers (inc code lists) etc. S-DWH creates and uses process metadata. XDTL metamodel: package, tasks, steps, parameter, variable, connection. Rule, transformation, condition etc. Relational Database (RDB) Metamodel (database, schema, table, column, roles etc.). 17. What kind of quality metadata is used? 18. What kind of authorisation metadata is used? 17. Specify the key relations between the metadata classes according to your metamodel(s)? (eg statistical activity, classifier, variable) 18. Specify the key relations between the different kinds of metadata (reference, structural, quality) 19. How is the metadata maintenance organised and stored? 20. What are your metadata quality requirements? Neuchatel model: statistical Imputation rate, unit and item activity instance response rate etc. description: user needs, user satisfaction, quality assurance, quality assessment, quality management, quality documentation. Descriptions which Role-Based Access Control Model statistical activity uses and role based security: privileges which administrative data. by statistical activity, by variables, Who is statistical activity by operations. manager. Relation between statistical activity and variables, variables and classifiers etc. Different kind of metadata is somehow related. Key relations: relations between variable and database column, relations between variable and privilege (role based data access), etc. iMeta is central metadata repository, based on MMX MOF 3 meta-meta model, that enables describe several different meta models. As metadata we manage reference metadata, structural metadata (including process metadata, technical metadata, user roles and privileges etc.). Methodological unit is responsible for metadata management. Different units are responsible for metadata inputs and updates. Principle is that metadata is filled where it formed (process and unit). Reference metadata is just descriptive and not used for metadata driven system (it gives user the context). Structural metadata is used for metadata driven system (example variables codes). Metadata is described in metadata systems by templates to assure: no missing values, only unique codes, metadata property quality rules (dependence between different types of metadata elements) etc. Annex 1: Metadata concepts1 Reference metadata are metadata that describe the contents and quality of the data in order to help the user understand and evaluate them (conceptually) Examples: Quality information on survey, register and variable levels; variable definitions; reference dates; confidentiality information; contact information; relations between metadata items Structural metadata are metadata that help the user find, identify, access and utilise the data (physically) Examples: Classification codes; parameter lists Statistical metadata are data about statistical data This definition will obviously cover all kinds of documentation with some reference to any type of statistical data and is applicable to metadata that refer to data stored in a S-DWH as well as any other type of data store Examples: Variable definition; register description; code list. Process metadata are metadata that describe the expected or actual outcome of one or more processes using evaluable and operational metrics Examples: Operator’s manual (active, structured, reference); parameter list (active, structured, reference); log file (passive, structured, reference/structural) Technical metadata are metadata that describe or define the physical storage or location of data. Examples: Server, database, table and column names and/or identifiers; server, directory and file names and/or identifiers Quality metadata are any kind of metadata that contribute to the description or interpretation of the quality of data. Examples: Quality declarations for a survey or register (passive, free-form, reference); documentation of methods that were used during a survey (passive, free-form, reference); most log lists (passive, structured, reference/structural) Authorisation metadata are administrative data that are used by programmes, systems or subsystems to manage users’ access to data. Examples: User lists with privileges; cross references between resources and users 1 Metadata Framework for Statistical Data Warehousing v0.9; ESSnet on Data Warehousing