ESS.VIP Validation
Objectives, scope & concepts

Angel Simón Delgado
EUROSTAT
Email: ESTAT-ESSVIP-VALIDATION@ec.europa.eu
Eurostat

ESS.VIP VALIDATION
VIP VALIDATION – First Phase
VIP VALIDATION – Deliverables in First Phase
ESS.VIP VALIDATION – Definition of the project
Validation Service (EDIT)

ESS VIP-V First Phase
Overall goal: To develop validation solutions to be used by different production chains (horizontal integration) within the ESS (vertical integration)
Bottom-up approach:
• Extensive consultation of all possible stakeholders
• Participative management
• Business-driven approach
• From pilot experience to general principles

ESS VIP-V First Phase
Scope, objectives and outputs
• Documentation / Standardisation:
  a) Template and guidelines for process description
  b) Template and guidelines for a standard documentation of the validation process
• Methodological analysis of Data Validation
  a) Typology of validation rules
  b) Standard definition of validation levels
  c) Standard formalised “syntax” (understandable by business users) to express validation rules
• Distribution of responsibilities in the production chain
  a) Guidelines to be used by the WG for the attribution of responsibility in the whole production chain (MSs and Eurostat)
  b) Guidelines based on efficiency principles (Validation Corrections: “the sooner, the better”)
  c) Preparing for IS/IT solutions and architecture

ESS VIP-V First Phase
Scope, objectives and outputs
• Towards IT/IS solutions and architecture:
  a) User requirements for new software that lets business users enter validation rules and the corresponding error messages in a shared Central Repository of Validation Rules. The new software should be able to generate the rules in the validation syntax developed by the project.
  b) Validation Architecture defining the elements and their relationships in an integrated validation system ("common platform") to be used by:
     • Internal users, in an appropriate IS architecture to facilitate horizontal integration of IT/IS systems
     • All stakeholders in the production chain

Contribution to the VISION
• More efficient production chain with clear attribution of responsibilities
• Standard definitions, guidelines and validation syntax
• Development of a common validation procedure
• Common solutions to be shared within the ESS

ESS.VIP Validation First Phase - Deliverables
Documentation
  1.1 Inventory of documents
  1.2 Analysis of inventory
  1.5 Inventory of validation rules
  1.6 Inventory of error messages
Solutions
  3.2 Validation syntax
  4.1 Functional specifications for GUI
Examples
  1.3 Validation & statistical processes
  2.4 Analysis of validation typologies
  3.1 Validation rules by typology
  3.3 Error messages
Methodology
  1.4 Validation typologies
  2.4 Analysis of validation typologies
  2.5 Levels of validation
Templates & Guidelines
  2.1 Documentation of validation process
  2.2 Documentation of statistical process
  3.3 Error messages
  3.4 Selection of validation rules
  3.5 Improvement actions
  3.6 Attribution of responsibilities

Deliverables: Validation levels
[Figure: decision tree starting from "Data", with levels ordered by increasing validation complexity]
Branches:
• Same file / Between files
• Same dataset / Between datasets
• From the same source / From different sources
• Within a domain / Between domains
• Within an organisation / Between different organisations
Levels:
• Level 0: Format & file structure
• Level 1: Cells, records, file
• Level 2: Revisions and Time series
• Level 2: Between correlated datasets
• Level 3: Mirror checks
• Level 4: Consistency checks
• Level 5: Consistency checks

Deliverables: Typology of validation rules
Figure group headings: File Structure; 1 file checks; >1 file checks; 1n file checks
Rule types shown: Filename, File type, Delimiters, Format, Referential integrity, Code list, Cardinality, Mirror, Time series, Revised data integrity, Model-based consistency, Type, Length,
Presence, Allowed character, Consistency, Uniqueness, Control, Range, Conditional

Deliverables: Guidelines for the attribution of responsibility of validation activities in the whole production chain
• Guidelines for the allocation of responsibility for the implementation of validation rules within the ESS, based on an AGREEMENT between Eurostat and the NSIs, with periodic performance reviews from both sides
• Proposal for a generic business architecture of data validation:
[Figure: production chain in four steps (Step 1 to Step 4). Validation Controls – Different actors – Different responsibilities]
  Stages shown:
  • Data preparation by the NSIs
  • Transmission of data and validation report
  • Loading data to production database
  • Additional processing & dissemination

Deliverables: Standard template for error/warning messages
Standard templates for error/warning messages and for validation report
Validation report structure:
Header
  • Time stamp
  • User ID
  • Data checked (dataset name…)
Body
  • Error/warning messages
Footer
  • Rules applied
  • Total failures
  • No. errors
  • No. warnings
  • Total records
  • Records failed
  • Sum of weights
  • Maximum admissible error weight
  • Rate of acceptance
  • Maximum possible amount of error
  • Rate of performance
Error/warning message structure:
  • Rule ID
  • Severity
  • Rule type ID
  • Message text
  • Action
  • Failing data

Deliverables: VALS - Validation syntax
Standard syntax for validation language
• To define a meta-language for the domain of statistical data validation to express, document and communicate validation rules
• Trade-off between human-understandable and computer-parseable language
• Implementation through a Graphical User Interface to support business users to input and maintain validation rules and rule-sets
Types and examples:
  Type check: validate ( type(A1.Rcount)='TEXT')
  Range & math: validate( Table2.C_5 between (Table2.C_5_1 + Table2.C_5_2 + Table2.C_5_3 - tolerance) and (Table2.C_5_1 + Table2.C_5_2 + Table2.C_5_3 + tolerance))
  Code list check: validate ( match_codelist (A1.Quarter,
    CL_QRT) )
  Range check: validate ( H.HB050 between 1 and 12 )
  Rules & Metarules: validate ( age_range_rule.result = true and country_codes_rule.result = true)
  …

Proposed approach
ESS.VIP VALIDATION - Package 1: IMPLEMENTATION
Goals
• Implementation of the methodological developments of VIP-V Phase I in the statistical domains/WGs
• Maintenance and refinement of standards developed
• User requirements for further developments
• Evaluation, monitoring and reporting

Proposed approach
ESS.VIP VALIDATION - Package 2: MICRO DATA
Goals
• Vertical integration of the micro data validation within the ESS production processes, taking into account the results of the first phase of ESS VIP-V
• Extension of the functional specifications to apply to micro data validation
• Integrated solutions for micro data validation

Proposed approach
ESS.VIP VALIDATION - Package 3: SOLUTIONS
Goals
• Adaptation of existing validation tools to the functional specifications issued from ESS VIP-V Phase I
• Deployment of validation solutions to MSs
• Distribution of validation rules in agreed language
• Building blocks in an adequate web services architecture
• Provision of web services validation solutions to be used by Member States before transmission to Eurostat

Proposed approach
ESS.VIP VALIDATION - Package 4: EXTENSIONS AND GENERAL COORDINATION
Goals
• Overall coordination of the project
• Coherence of validation approaches within ESS
• Implementation of the meta-language within ESS
• Analysis of links with other VIP & ESS VIP projects
• Good practices identification
• More sophisticated validation solutions:
  • Longitudinal validation
  • Mirror checks validation
  • ESS-wide shared final warehouses

Elements in a validation system: ESS production chain
Diagram labels: Eurostat; Member States; Member States statistical production and validation; Single Entry Point; Validation Services; Web Interface; Eurostat; Statistical domain 1; Statistical domain …;
Statistical domain n; Validation Service; Validation rules repository; Data; Data Definition registry; Validation Service; Errors; Validation report; Metrics

Validation architecture elements – Global overview
[Figure: global overview of the validation architecture. Labels: Repository of rulesets and their metadata; Data Structure; SEP Single Entry Point; Validation Report; Data]

Current actions and next steps
• Assessment of Eurostat domains in the field of validation
• Set of standard documentation for domain managers to harmonise communication with data providers
• Task Force to:
  • Identify best practices in ESS
  • Advise on implementation in the ESS
  • Optimise the validation process

Current actions and next steps
• Functional specifications for Validation services (EDIT) to be adapted to the findings of the project
• Tools development:
  • System to create/maintain a central repository of validation rules
  • Development and/or adaptation of IT Tools (EDIT, eDAMIS)
  • Validation Quality Metrics

Thank you
More information on:
Email: ESTAT-ESSVIP-VALIDATION@ec.europa.eu
Wiki (only from EUROSTAT): http://www.cc.cec/wikis/display/ESTATmethodology/ESS.VIP+VALIDATION
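
Illustrative sketch (not part of the project deliverables)
The range check and code list check from the VALS examples, together with the error/warning message fields and the header/body/footer layout of the validation report, could be mimicked in Python roughly as follows. This is a minimal sketch under stated assumptions: the `Rule` class, the `validate` function, the rule IDs, the rule type IDs, the message texts, the severity values and the content of `CL_QRT` are all hypothetical, and the grouping of report fields into header, body and footer is one reading of the template. It is not the VALS implementation or the EDIT service.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Callable


@dataclass
class Rule:
    """A validation rule plus the message fields reported when it fails."""
    rule_id: str
    rule_type_id: str
    severity: str                   # "error" or "warning" (assumed values)
    message_text: str
    action: str
    check: Callable[[dict], bool]   # returns True when the record passes


# Assumed content of the CL_QRT code list, for illustration only.
CL_QRT = {"Q1", "Q2", "Q3", "Q4"}

RULES = [
    # Mirrors: validate ( H.HB050 between 1 and 12 )
    Rule("R001", "RANGE", "error", "HB050 must be between 1 and 12",
         "Correct the value and retransmit",
         lambda r: 1 <= r["HB050"] <= 12),
    # Mirrors: validate ( match_codelist (A1.Quarter, CL_QRT) )
    Rule("R002", "CODELIST", "error", "Quarter is not in code list CL_QRT",
         "Correct the code and retransmit",
         lambda r: r["Quarter"] in CL_QRT),
]


def validate(records, rules, user_id, dataset_name):
    """Apply every rule to every record and build a header/body/footer report."""
    body = []
    failed_records = set()
    for index, record in enumerate(records):
        for rule in rules:
            if not rule.check(record):
                failed_records.add(index)
                body.append({
                    "rule_id": rule.rule_id,
                    "severity": rule.severity,
                    "rule_type_id": rule.rule_type_id,
                    "message_text": rule.message_text,
                    "action": rule.action,
                    "failing_data": record,
                })
    errors = sum(1 for message in body if message["severity"] == "error")
    return {
        "header": {
            "time_stamp": datetime.now(timezone.utc).isoformat(),
            "user_id": user_id,
            "data_checked": dataset_name,
        },
        "body": body,
        "footer": {
            "rules_applied": len(rules),
            "total_failures": len(body),
            "no_errors": errors,
            "no_warnings": len(body) - errors,
            "total_records": len(records),
            "records_failed": len(failed_records),
        },
    }


# Made-up records, used only to exercise the sketch.
records = [
    {"HB050": 5, "Quarter": "Q1"},
    {"HB050": 13, "Quarter": "Q5"},   # fails both rules
    {"HB050": 0, "Quarter": "Q2"},    # fails the range check only
]
report = validate(records, RULES, user_id="demo_user", dataset_name="demo_dataset")
print(report["footer"])
```

Keeping the message fields on the rule itself reflects the idea of a shared repository in which business users maintain each rule together with its error message; the weight-based footer fields (sum of weights, rates of acceptance and performance) are left out because the slides do not define how they are computed.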