ESS.VIP Presentation (ppt)

ESS.VIP Validation
Objectives, scope & concepts
Angel Simón Delgado
EUROSTAT
Email: ESTAT-ESSVIP-VALIDATION@ec.europa.eu
Eurostat
ESS.VIP VALIDATION




VIP VALIDATION – First Phase
VIP VALIDATION – Deliverables in First Phase
ESS.VIP VALIDATION – Definition of the project
Validation Service (EDIT)
ESS VIP-V First Phase
Overall goal:
To develop validation solutions to be used by different
production chains (horizontal integration), within the ESS
(vertical integration)
Bottom-up approach:
• Extensive consultation of all possible stakeholders
• Participative management
• Business driven approach
• From pilot experience to general principles
ESS VIP-V First Phase
Scope, objectives and outputs
• Documentation / Standardisation:
a) template and guidelines for process description
b) template and guidelines for a standard documentation of the validation process
• Methodological analysis of Data Validation
a) Typology of validation rules
b) Standard definition of validation levels
c) Standard formalised “syntax” (understandable by business users) to express validation rules
• Distribution of responsibilities in the production chain
a) Guidelines to be used for the attribution of responsibility in the whole production chain (MSs and Eurostat) by the WG.
b) Guidelines based on efficiency principles (Validation → Corrections: “the sooner, the better”)
c) Preparing for IS/IT solutions and architecture
ESS VIP-V First Phase
Scope, objectives and outputs
• Towards IT/IS solutions and architecture:
a) User requirements for new software that allows business users to
input validation rules and the corresponding error messages in a
shared Central Repository of Validation Rules. The new software
should be able to generate the rules in the Validation syntax
developed by the project.
b) Validation Architecture defining the elements and their relationships in
an integrated validation system ("common platform") to be used by:
  - Internal users, in an appropriate IS architecture to facilitate
    horizontal integration of IT/IS systems
  - All stakeholders in the production chain
Contribution to the VISION
• More efficient production chain with clear attribution of responsibilities
• Standard definitions, guidelines and validation syntax
• Development of a common validation procedure
• Common solutions to be shared within the ESS
ESS.VIP Validation First Phase - Deliverables
Documentation
• 1.1 Inventory of documents
• 1.2 Analysis of inventory
• 1.5 Inventory of validation rules
• 1.6 Inventory of error messages
Solutions
• 3.2 Validation syntax
• 4.1 Functional specifications for GUI
Examples
• 1.3 Validation & statistical processes
• 2.4 Analysis of validation typologies
• 3.1 Validation rules by typology
• 3.3 Error messages
Methodology
• 1.4 Validation typologies
• 2.4 Analysis of validation typologies
• 2.5 Levels of validation
Templates & Guidelines
• 2.1 Documentation of validation process
• 2.2 Documentation of statistical process
• 3.3 Error messages
• 3.4 Selection of validation rules
• 3.5 Improvement actions
• 3.6 Attribution of responsibilities
Deliverables: Validation levels
Validation complexity increases from Level 0 to Level 5 as the data being compared span a wider scope:
• Level 0: Format & file structure (same file)
• Level 1: Cells, records, file (same file)
• Level 2: Revisions and time series (between files, same dataset)
• Level 2: Between correlated datasets (between datasets, from the same source)
• Level 3: Mirror checks (from different sources, within a domain)
• Level 4: Consistency checks (between domains, within an organisation)
• Level 5: Consistency checks (between different organisations)
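The level hierarchy on this slide can be read as a decision sequence that walks from the widest scope inwards. The following is a minimal Python sketch under that reading, not part of the project deliverables: the enum member names, the function level_for_scope and its boolean flags are all hypothetical, and the choice of Level 1 for single-file checks is an assumption (the slide also places Level 0 there).

```python
from enum import IntEnum


class ValidationLevel(IntEnum):
    """Levels as listed on the slide (0 = simplest, 5 = most complex).

    Member names are illustrative; only the numbers come from the slide.
    """
    FORMAT_AND_FILE_STRUCTURE = 0
    CELLS_RECORDS_FILE = 1
    SAME_SOURCE = 2            # revisions/time series, or correlated datasets
    MIRROR_CHECKS = 3          # datasets from different sources
    BETWEEN_DOMAINS = 4
    BETWEEN_ORGANISATIONS = 5


def level_for_scope(same_organisation: bool, same_domain: bool,
                    same_source: bool, same_file: bool) -> ValidationLevel:
    """Map what the compared data have in common to a validation level.

    Hypothetical helper: the first flag that is False decides the level.
    """
    if not same_organisation:
        return ValidationLevel.BETWEEN_ORGANISATIONS
    if not same_domain:
        return ValidationLevel.BETWEEN_DOMAINS
    if not same_source:
        return ValidationLevel.MIRROR_CHECKS
    if not same_file:
        return ValidationLevel.SAME_SOURCE
    # Assumption: content checks (Level 1) rather than format checks (Level 0).
    return ValidationLevel.CELLS_RECORDS_FILE


# A mirror check compares data from different sources within one domain.
mirror = level_for_scope(same_organisation=True, same_domain=True,
                         same_source=False, same_file=False)
```

Ordering the tests from the outermost scope inwards means each rule is assigned the highest level its data scope requires.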
Deliverables: Typology of validation rules
File Structure
• Filename
• File type
• Delimiters
• Format
1 file checks
• Type
• Length
• Presence
• Allowed character
• Uniqueness
• Range
1n file checks
• Consistency
• Control
• Conditional
>1 file checks
• Referential integrity
• Code list
• Cardinality
• Mirror
• Time series
• Revised data integrity
• Model-based consistency
Deliverables: Guidelines for the attribution of responsibility of validation activities in the whole production chain
• Guidelines for allocating responsibility for the implementation of validation rules within the ESS, based on an agreement between Eurostat and the NSIs, with periodic performance reviews from both sides
• Proposal for a generic business architecture of data validation (diagram, Steps 1 to 4):
  - Data preparation by the NSIs
  - Transmission of data and validation report
  - Loading data to production database
  - Additional processing & dissemination
Validation Controls – Different actors – Different responsibilities
Deliverables: Standard template for error/warning messages
Standard templates for error/warning messages and for validation report
Validation report structure:
Header
• Time stamp
• User ID
• Data checked (dataset name…)
Body
• Rules applied
• Error/warning messages
Footer
• Total records
• Records failed
• Total failures
• No. errors
• No. warnings
• Sum of weights
• Maximum admissible error weight
• Rate of acceptance
• Maximum possible amount of error
• Rate of performance
Error/warning
message structure:
• Rule ID
• Severity
• Rule type ID
• Message text
• Action
• Failing data
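The two templates on this slide map naturally onto record types. The following Python sketch shows one possible shape, assuming field names derived from the slide's labels; the class names, the severity values "error" and "warning", and the sample data are hypothetical. The weight-based footer items are left out because the slide does not define how they are computed.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List


@dataclass
class ValidationMessage:
    """One error/warning message, following the slide's field list."""
    rule_id: str
    severity: str        # assumed values: "error" or "warning"
    rule_type_id: str
    message_text: str
    action: str
    failing_data: dict


@dataclass
class ValidationReport:
    """Header, body and part of the footer of a validation report."""
    # Header
    user_id: str
    data_checked: str    # dataset name
    # Footer input supplied by the caller
    total_records: int
    # Body
    rules_applied: List[str] = field(default_factory=list)
    messages: List[ValidationMessage] = field(default_factory=list)
    time_stamp: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc))

    # Footer counts derived from the body
    @property
    def total_failures(self) -> int:
        return len(self.messages)

    @property
    def no_errors(self) -> int:
        return sum(1 for m in self.messages if m.severity == "error")

    @property
    def no_warnings(self) -> int:
        return sum(1 for m in self.messages if m.severity == "warning")


# Invented sample values, for illustration only.
report = ValidationReport(user_id="user1", data_checked="demo_dataset",
                          total_records=10, rules_applied=["R1"])
report.messages.append(ValidationMessage(
    rule_id="R1", severity="error", rule_type_id="range",
    message_text="HB050 must be between 1 and 12",
    action="Correct and retransmit", failing_data={"HB050": 13}))
```

Deriving the footer counts from the message list keeps the footer consistent with the body by construction.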
Deliverables: VALS - Validation syntax
Standard syntax for validation language
To define a meta-language for the domain of statistical data validation to express,
document and communicate validation rules
Trade-off between Human-understandable and Computer-parseable language
Implementation through a graphical user interface that lets business users input and
maintain validation rules and rule-sets
Types and examples:
• Type check
  validate ( type(A1.Rcount) = 'TEXT' )
• Range & math
  validate ( Table2.C_5
             between
             (Table2.C_5_1 + Table2.C_5_2 + Table2.C_5_3 - tolerance)
             and
             (Table2.C_5_1 + Table2.C_5_2 + Table2.C_5_3 + tolerance) )
• Code list check
  validate ( match_codelist (A1.Quarter, CL_QRT) )
• Range check
  validate ( H.HB050 between 1 and 12 )
• Rules & Metarules
  validate ( age_range_rule.result = true
             and
             country_codes_rule.result = true )
• …
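To make the semantics of the example rules concrete, here is a rough Python equivalent of each rule type. It is a sketch, not an implementation of VALS: the helper functions, the sample values and the tolerance are invented for illustration, and "between" is assumed to include both bounds, which the slide does not state.

```python
def between(value, low, high):
    """Assumed meaning of 'X between low and high': inclusive bounds."""
    return low <= value <= high


def match_codelist(value, codelist):
    """True when the value is one of the allowed codes."""
    return value in codelist


# Invented sample data; identifiers follow the slide, values do not.
CL_QRT = {"Q1", "Q2", "Q3", "Q4"}
a1 = {"Rcount": "12", "Quarter": "Q3"}
h = {"HB050": 7}
table2 = {"C_5": 100.0, "C_5_1": 40.0, "C_5_2": 35.0, "C_5_3": 25.2}
tolerance = 0.5

results = {}

# Type check: type(A1.Rcount) = 'TEXT'
results["type_rule"] = isinstance(a1["Rcount"], str)

# Range & math: the total must equal the sum of its parts within a tolerance
parts = table2["C_5_1"] + table2["C_5_2"] + table2["C_5_3"]
results["sum_rule"] = between(table2["C_5"],
                              parts - tolerance, parts + tolerance)

# Code list check: match_codelist(A1.Quarter, CL_QRT)
results["codelist_rule"] = match_codelist(a1["Quarter"], CL_QRT)

# Range check: H.HB050 between 1 and 12
results["range_rule"] = between(h["HB050"], 1, 12)

# Metarule: combines the results of other rules
results["meta_rule"] = results["range_rule"] and results["codelist_rule"]
```

Storing each rule's outcome under its name mirrors the rule.result references used by the metarule example.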
Proposed approach
ESS.VIP VALIDATION - Package 1:
IMPLEMENTATION
Goals
• Implementation of the methodological developments of VIP-V
Phase I in the statistical domains/WGs
• Maintenance and refinement of standards developed
• User requirements for further developments
• Evaluation, monitoring and reporting
Proposed approach
ESS.VIP VALIDATION - Package 2:
MICRO DATA
Goals
• Vertical integration of the micro data validation within the ESS
production processes taking into account the results of the
first phase of ESS VIP-V
• Extension of the functional specifications to apply to micro
data validation
• Integrated solutions for micro data validation
Proposed approach
ESS.VIP VALIDATION - Package 3:
SOLUTIONS
Goals
• Adaptation of existing validation tools to the functional
specifications issued from ESS VIP-V Phase I
• Deployment of validation solutions to MSs
• Distribution of validation rules in agreed language
• Building-Blocks in an adequate web services architecture
• Provision of web services validation solutions to be used by
Member States before transmission to Eurostat
Proposed approach
ESS.VIP VALIDATION - Package 4:
EXTENSIONS AND GENERAL COORDINATION
Goals
• Overall coordination of the project
• Coherence of validation approaches within ESS
• Implementation of the meta-language within ESS
• Analysis of links with other VIP & ESS VIP projects
• Good practices identification
• More sophisticated validation solutions:
  • Longitudinal validation
  • Mirror checks validation
  • ESS wide shared final warehouses
Elements in a validation system: ESS production chain
Member States
• Member States statistical production and validation
• Validation Services Web Interface
Single Entry Point
Eurostat
• Statistical domain 1
• Statistical domain …
• Statistical domain n
Validation Service
• Validation rules repository
• Data
• Data Definition registry
• Validation Service
• Errors
• Validation report
• Metrics
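How the elements of the Validation Service might fit together can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the project's architecture: the function run_validation_service, the rule identifier HB050_range and the dictionary-based repository are hypothetical, and the data definition registry is omitted.

```python
from typing import Callable, Dict, List

# A rule takes one record and returns True when the record passes.
Rule = Callable[[dict], bool]


def run_validation_service(records: List[dict],
                           repository: Dict[str, Rule]) -> dict:
    """Apply every rule in the repository to every record.

    Returns the errors found and simple metrics, standing in for the
    'Errors', 'Validation report' and 'Metrics' elements of the diagram.
    """
    errors = []
    for index, record in enumerate(records):
        for rule_id, rule in repository.items():
            if not rule(record):
                errors.append({"record": index, "rule_id": rule_id})
    failed_records = {error["record"] for error in errors}
    metrics = {
        "total_records": len(records),
        "records_failed": len(failed_records),
    }
    return {"errors": errors, "metrics": metrics}


# Hypothetical repository holding a single range rule.
repository = {"HB050_range": lambda record: 1 <= record["HB050"] <= 12}
result = run_validation_service([{"HB050": 3}, {"HB050": 13}], repository)
```

Keeping the rules in a repository that is separate from the service code reflects the slide's distinction between the rules repository and the Validation Service that applies them.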
Validation architecture elements – Global overview
• Repository of rulesets and their metadata
• Data Structure
• Single Entry Point (SEP)
• Validation Report
• Data
Current actions and next steps
• Assessment of Eurostat domains in the field of validation
• Set of standard documentation for domain managers, to harmonise
communication with data providers
• Task Force to:
  • Identify best practices in the ESS
  • Advise on implementation in the ESS
  • Optimise the validation process
Current actions and next steps
• Functional specifications for Validation services (EDIT) to be
adapted to the findings of the project
• Tools development:
  • System to create/maintain a central repository of validation rules
  • Development and/or adaptation of IT Tools (EDIT, eDAMIS)
  • Validation Quality Metrics
Thank you
More information on:
Email: ESTAT-ESSVIP-VALIDATION@ec.europa.eu
Wiki (only from EUROSTAT):
http://www.cc.cec/wikis/display/ESTATmethodology/ESS.VIP+VALIDATION