ESSNET ON MICRO DATA LINKING AND DATA WAREHOUSING IN PRODUCTION OF BUSINESS STATISTICS

Title: Documentation of the mapping of the result of 1.4 on the ‘ideal architecture’ framework
WP: 1
Deliverable: 1.6
Version: 1.0
Date: 30-6-2013
NSI: Statistics Estonia
Authors: Maia Ennok

ESSnet on Data Warehousing
Maia Ennok (Statistics Estonia)

INDEX
1. Introduction
  1.1 Target audience
  1.2 What is in this document?
2. Metadata examples by metadata subsets, functionality groups and layers of the S-DWH
  2.1 Source layer
  2.2 Integration layer
  2.3 Interpretation and data analysis layer
  2.4 Data access layer
3. Metadata in data flows of Functional Architecture of the S-DWH
  3.1 General variable flow
    Source layer
    Integration layer
    Interpretation and data analysis layer
  3.2 Data editing
  3.3 Derivation of value
4. Conclusion

1. Introduction
In this document we map the results of Functionalities of metadata system to facilitate and support the operation of the S-DWH¹ (deliverable 1.4) onto the Functional Architecture of the S-DWH² (deliverable 3.3). The aim is to make readers more aware of the metadata in the S-DWH. To this end, we give metadata examples by metadata subset, functionality group and layer of the S-DWH, and metadata examples for the data flows in the functional architecture of the S-DWH.

1.1 Target audience
This document is intended as a tool to help S-DWH experts and users link the metadata of the S-DWH to the new functional architecture of the S-DWH, so that the S-DWH and its metadata system can be implemented better.

1.2 What is in this document?
In this document we focus on the metadata subsets by metadata functionality and give examples of metadata in data flows. It contains:
– metadata examples by metadata subsets, functionality groups and layers;
– metadata examples in data flows of the Functional Architecture of the S-DWH.

¹ Ennok M. (2012) Functionalities of metadata system to facilitate and support the operation of the S-DWH. Deliverable 1.4
² Berglund B. (2013) Functional Architecture of the S-DWH. Deliverable 3.3

2. Metadata examples by metadata subsets, functionality groups and layers of the S-DWH
In this section we give examples of metadata by metadata subset, functionality group and layer. The layers are defined in the S-DWH Business Architecture³ (deliverable 3.1), and the metadata subsets by layer are defined in the Metadata Framework⁴ (deliverable 1.1). Examples of metadata by metadata functionality are described in Functionalities of metadata system to facilitate and support the operation of the S-DWH⁵ (deliverable 1.4). For each layer, the examples are arranged by functionality group (metadata creation, metadata usages, metadata maintenance, metadata evaluation) and, within creation and usages, by GSBPM phase and sub-process; the metadata subsets concerned are statistical, process, quality, technical and authorisation metadata.

³ Laureti Palma A. (2012) S-DWH Business Architecture. Deliverable 3.1
⁴ Lundell L.G. (2012) Metadata Framework for Statistical Data Warehousing. Deliverable 1.1
⁵ Ennok M. (2012) Functionalities of metadata system to facilitate and support the operation of the S-DWH. Deliverable 1.4

2.1 Source layer

Metadata creation
  2–Design
    2.2 – design variable descriptions: creates metadata of IOR (Initial Observation Registry) variables.
    2.3 – design data collection methodology: creates validation rules of data collection.
    2.4 – design frame and sample methodology: creates frame, sample and stratum metadata.
  3–Build
    3.1 – build data collection instrument: creates technical metadata of data collection instruments (server etc.); creates technical metadata of the IOR (servers, databases, tables, columns etc.).
    3.3 – configure workflows: creates statistical metadata of the workflow (how data collection is carried out).
  4–Collect
    4.2 – set up collection; 4.3 – run collection; 4.4 – finalize collection: creates collection process metadata (process log: when (start time), who (started the collection), where (link to technical metadata)); creates the response rate and the counts of planned and collected records.
    4.4 – finalize collection: creates metadata of the finalized collection (process log).

Metadata usages
  3–Build
    3.3 – configure workflows: uses metadata of IOR variables; uses metadata of users, roles and privileges of the source layer.
  4–Collect
    4.2 – set up collection; 4.3 – run collection; 4.4 – finalize collection: setting up the collection uses 2.6 pre-fill metadata, 3.1 metadata of collection instruments, 4.1 select sample metadata and 3.1 data collection structure technical metadata (server, database, tables, columns etc.); running and finalizing the collection use metadata of IOR variables.

Metadata maintenance
  – Maintain (create, update, delete, version) source metadata (data source metadata, variable metadata, sample metadata etc.). Source metadata is mostly managed manually; technical metadata is managed automatically.
  – Source metadata can be stored in different metamodels (managing metamodels), e.g. a metamodel for statistical metadata, one for process metadata etc.
  – All kinds of metadata and data are versioned.
  – User rights are managed according to the S-DWH system process operations. The S-DWH has the following operations: read metadata, create data processing packages, access sensitive data, solve data processing tasks, schedule packages etc.

Metadata evaluation
  Evaluate and assure the quality of the metadata of the source layer by:
  – fill-in controls of metadata validation;
  – systematic built-in processes for managing the workflow of metadata quality assurance;
  – organizational processes of metadata validation.

2.2 Integration layer

Metadata creation
  1–Specify needs
    1.3 – establish output objectives: creates output variable metadata; creates output metadata (which cube variables are in which cubes).
    1.6 – prepare business case: creates metadata of the statistical activity.
  2–Design
    2.1 – design outputs: builds outputs (output algorithms).
    2.2 – design variable descriptions: creates new variable metadata (creating algorithms) using the metadata of available variables.
    2.5 – design statistical processing methodology; 2.6 – design production systems and workflow: creates statistical metadata of the workflow (which data processing sub-processes are used and in which order); creates pre-fill metadata; creates output validation metadata (algorithms); creates algorithms of statistical confidentiality; creates coding algorithms using classifier and coding table metadata; creates imputation algorithms (using imputation methods); creates dissemination metadata (publication calendar); creates data processing algorithms and scheduling metadata; creates data quality variable metadata (edit failure rate, imputation rate); creates data warehouse data model metadata (using the design data model of raw data); creates the data staging area data model (using technical metadata of the collection instrument).
  3–Build
    3.3 – configure workflows: creates scheduling technical metadata.
  5–Process
    5.1 – integrate data; 5.2 – classify & code; 5.3 – review, validate & edit; 5.4 – impute; 5.5 – derive new variables and statistical units; 5.6 – calculate weights; 5.7 – calculate aggregates; 5.8 – finalize data files: creates process metadata of the data processing sub-processes (process log); creates data finalizing metadata (date, person etc.); creates metadata of data loading into the S-DWH (logs, date, person etc.); creates quality indicators for the integration process.
  6–Analyze
    6.1 – prepare draft outputs: creates process metadata (logs etc.) of output/cube creation; creates output technical metadata.

Metadata usages
  1–Specify needs
    1.5 – check data availability: uses metadata of available administrative data.
  2–Design
    2.1 – design outputs: uses cube/output variable metadata.
    2.2 – design variable descriptions: uses variable metadata; uses the metadata of available variables in creating new variable metadata (creating algorithms).
    2.5 – design statistical processing methodology: uses statistical processing metadata; uses classifier and coding table metadata in creating coding algorithms; uses imputation methods in creating imputation algorithms.
    2.6 – design production systems and workflow: uses tables/columns metadata; uses the data model of raw data.
  3–Build
    3.3 – configure workflows: uses scheduling statistical metadata.
  5–Process
    5.1 – integrate data: uses variable and classifier metadata; uses algorithms for integration.
    5.2 – classify & code: uses coding algorithms and classifier and coding table metadata.
    5.4 – impute: uses imputation algorithms.
    5.5 – derive new variables and statistical units: uses new variable creating algorithms.
    5.6 – calculate weights: uses sample, stratum and frame metadata.
    5.7 – calculate aggregates: uses aggregation methods.
    5.8 – finalize data files: uses data warehouse data model metadata.
    Data profiling: uses data quality variable metadata.
  6–Analyze
    6.1 – prepare draft outputs: uses classifier metadata; uses output/cube variable metadata in creating the output/cube.

Metadata maintenance
  – Maintain integration metadata (data processing algorithms, data warehouse data models etc.).
  – User rights are managed according to the S-DWH system process operations.
  – Integration metadata can be stored in different metamodels (managing metamodels), e.g. a metamodel for statistical metadata, one for process metadata etc.
  – All kinds of metadata and data are versioned.

Metadata evaluation
  Evaluate and assure the quality of the metadata of the integration layer by:
  – fill-in controls of metadata validation;
  – systematic built-in processes for managing the workflow of metadata quality assurance;
  – organizational processes of metadata validation.

2.3 Interpretation and data analysis layer

Metadata creation
  1–Specify needs
    1.3 – establish output objectives: creates output variable metadata; creates output metadata (which cube variables are in which cubes).
  2–Design
    2.1 – design outputs; 2.6 – design production systems and workflow: builds outputs (output algorithms); creates output validation metadata (algorithms); creates algorithms of statistical confidentiality; creates dissemination metadata (publication calendar); creates statistical metadata of the workflow.
    2.2 – design variable descriptions: creates new variable metadata.
    2.4 – design frame and sample methodology: creates sample metadata.
    2.5 – design statistical processing methodology: creates methodological rules for analysis; creates quality variables.
  3–Build
    3.3 – configure workflows: creates technical metadata of the workflow.
  4–Collect
    4.1 – select sample: creates process metadata of sample taking.
  5–Process
    5.1 – integrate data; 5.5 – derive new variables and statistical units; 5.6 – calculate weights; 5.7 – calculate aggregates: creates process metadata; creates quality indicators.
  6–Analyze
    6.1 – prepare draft outputs; 6.2 – validate outputs: creates process metadata of validation (process log).
    6.3 – scrutinize and explain; 6.4 – apply disclosure control; 6.5 – finalize outputs: creates process metadata; creates output technical metadata.
  7–Disseminate
    7.1 – update output systems: creates process metadata.
  9–Evaluate
    9.1 – gather evaluation inputs; 9.2 – conduct evaluation: creates process metadata (process log).

Metadata usages
  1–Specify needs
    1.5 – check data availability: uses variable descriptions; uses variable data element descriptions.
  3–Build
    3.3 – configure workflows: uses metadata from 2.5 for the analysis process; uses methodological metadata.
  4–Collect
    4.1 – select sample: uses metadata from 2.4.
  5–Process
    5.1 – integrate data; 5.5 – derive new variables and statistical units: uses new variable creating algorithms.
  6–Analyze
    6.1 – prepare draft outputs; 6.3 – scrutinize and explain: uses classifier metadata; uses output metadata.
    6.2 – validate outputs; 6.4 – apply disclosure control: uses validation algorithms; uses algorithms of statistical confidentiality; uses quality variable metadata.
    6.5 – finalize outputs: uses metadata.
  7–Disseminate
    7.1 – update output systems: uses output metadata.
  9–Evaluate
    9.1 – gather evaluation inputs; 9.2 – conduct evaluation: uses evaluation metadata.

Metadata maintenance
  – Maintain interpretation metadata (outputs, output variables, classifiers, clarifying metadata), both manually and automatically.
  – User rights are managed according to the S-DWH system process operations.
  – All kinds of metadata and data are versioned.

Metadata evaluation
  Evaluate and assure the quality of the metadata of the interpretation and data analysis layer by:
  – fill-in controls of metadata validation;
  – systematic built-in processes for managing the workflow of metadata quality assurance;
  – organizational processes of metadata validation.

2.4 Data access layer

Metadata creation
  2–Design
    2.1 – design outputs: builds/creates outputs; creates technical metadata of the output.
    2.6 – design production systems and workflow: creates statistical metadata of the workflow.
  3–Build
    3.3 – configure workflows: creates technical metadata of the workflow.
  7–Disseminate
    7.1 – update output systems; 7.2 – produce dissemination products; 7.3 – manage release of dissemination products; 7.4 – promote dissemination products; 7.5 – manage user support: creates process metadata (when, who etc.).

Metadata usages
  7–Disseminate
    7.1 – update output systems; 7.2 – produce dissemination products; 7.3 – manage release of dissemination products; 7.4 – promote dissemination products; 7.5 – manage user support: uses output metadata (output variable metadata, classifier metadata); uses publication calendar metadata and statistical activity metadata; creates user metadata.

Metadata maintenance
  – Maintain access metadata (outputs, products, publication calendar, user support), both manually and automatically.
  – All kinds of metadata and data are versioned.
  – User rights are managed according to the S-DWH system process operations.
Metadata evaluation
  Evaluate and assure the quality of the metadata of the data access layer by:
  – fill-in controls of metadata validation;
  – systematic built-in processes for managing the workflow of metadata quality assurance;
  – organizational processes of metadata validation.

3. Metadata in data flows of Functional Architecture of the S-DWH
To better understand the functional architecture of the S-DWH, we need to look at the data flows through the S-DWH from a metadata perspective. The examples of data flows through the warehouse are taken from the Functional Architecture of the S-DWH⁶ (deliverable 3.3). Users and systems of the S-DWH need to understand how versions of variables differ from each other. These differences are described in metadata (both data and metadata are versioned).

3.1 General variable flow
Figure: Metadata in the general data flow

⁶ Berglund B. (2013) Functional Architecture of the S-DWH. Deliverable 3.3

Source layer
Collecting new versions of survey variables is the same as collecting the initial variables; the only difference is in the identification of the versions. Examples of metadata of data collection:
– Statistical metadata: variable name and code.
– Quality metadata: quality indicators.
– Process metadata: process log (when, who and how data is collected); data validation rules (collection instruments).
– Technical metadata: where the collected data is stored (server, database, table).
– Authorisation metadata: who has access to the data, who can collect data etc.
Data is versioned (version number, current version).

Integration layer
Processing new versions of survey variables is the same as processing the initial variables; again, the only difference is in the identification of the versions.
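In both the source and the integration layer, a new version of a variable goes through the same collection or processing steps as the initial one and differs only in its version identification (version number, current version). The Python sketch below illustrates that rule under stated assumptions. It is a hypothetical in-memory model, not part of the S-DWH deliverables: the class `VariableVersion`, its field names and the functions `current_version` and `add_version` are invented for illustration, and each field stands in for one of the metadata examples listed for the source layer (variable name and code, quality indicators, process log, storage location, access rights).

```python
from dataclasses import dataclass, replace
from typing import List, Tuple


@dataclass(frozen=True)
class VariableVersion:
    """One version of a variable with examples of the five metadata subsets.

    Hypothetical model for illustration only; names are not taken from the S-DWH.
    """
    code: str                    # statistical metadata: variable code
    name: str                    # statistical metadata: variable name
    value: float                 # the collected or processed data value
    version: int = 1             # version identification: version number
    is_current: bool = True      # version identification: current-version flag
    quality_indicators: Tuple[Tuple[str, float], ...] = ()  # quality metadata
    process_log: Tuple[str, ...] = ()                        # process metadata (when, who, how)
    storage: str = ""            # technical metadata: server/database/table
    allowed_roles: frozenset = frozenset()                   # authorisation metadata


def current_version(history: List[VariableVersion]) -> VariableVersion:
    """Return the single version flagged as current."""
    current = [v for v in history if v.is_current]
    if len(current) != 1:
        raise ValueError("history must contain exactly one current version")
    return current[0]


def add_version(history: List[VariableVersion], new_value: float,
                log_entry: str) -> List[VariableVersion]:
    """Record a new version of a variable.

    The new version reuses the statistical, technical and authorisation metadata
    of the current one and differs in its value, its version identification and
    one extra process-log entry. The previous version stays in the history but
    is no longer current. A new list is returned; the input is left unchanged.
    """
    latest = current_version(history)
    new = replace(
        latest,
        value=new_value,
        version=max(v.version for v in history) + 1,
        is_current=True,
        process_log=latest.process_log + (log_entry,),
    )
    retired = [replace(v, is_current=False) if v is latest else v for v in history]
    return retired + [new]


if __name__ == "__main__":
    # Hypothetical variable: code, name and storage location are made up.
    history = [VariableVersion(code="TURNOVER", name="Turnover", value=100.0,
                               storage="server1/sdwh/ior_table",
                               allowed_roles=frozenset({"collector"}))]
    history = add_version(history, 105.0, "2013-06-30, collector: second collection round")
    for v in history:
        print(v.version, v.is_current, v.value, v.process_log)
```

The records are immutable and `add_version` returns a new list, so nothing is overwritten: the earlier version stays in the history and only its current-version flag changes, in line with the statement that all kinds of metadata and data are versioned. In an actual S-DWH these records would presumably live in the data store, linked to the metadata system, rather than in a Python list.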
Examples of metadata of data processing:
– Statistical metadata: variable name and code.
– Quality metadata: quality indicators.
– Process metadata: process log (when, who and how data is processed); data validation rules; data processing algorithms; mapping algorithms from the data staging area (DSA) to the data mart.
– Technical metadata: where the processed data is stored (data staging area, data mart; server, database, table).
– Authorisation metadata: who has access to the data, who can process data etc.
Data is versioned (version number, current version).

Interpretation and data analysis layer
Loading a new value of a variable into the interpretation layer also changes the metadata of the old value, so that the old value is no longer the current version. All versions of variables are identified in the data store, and they are linked to the different kinds of metadata. Data is versioned (version number, current version).

3.2 Data editing
Figure: Metadata in data editing
If the data processing algorithms (process metadata) in the integration layer are changed, a new generation of the variable is created. This also means creating new process metadata (processing algorithms). Both metadata and data are versioned (version number, current version).

3.3 Derivation of value
Figure: Metadata in derivation of value
If a new variable (calculated variable) is created in the integration layer, new metadata must also be created for it: statistical, process, technical, quality and authorisation metadata. Both metadata and data are versioned (version number, current version).

4. Conclusion
Metadata is important for understanding how the S-DWH works and for identifying the processes applied to data and metadata. Data without metadata is not usable, so data must always be accompanied by metadata. The more fully data is characterised by metadata (statistical, process, technical etc.), the easier it is for users and systems to identify and characterise the data.
And the more completely metadata is described, the greater the opportunity to automate processes.