European Commission – Eurostat/B1, Eurostat/E1, Eurostat/E6

WORKING DOCUMENT – Pending further analysis and improvements
Based on deliverable 2.5, Contract No. 40107.2011.001-2011.567 'VIP on data validation general approach'

Definition of validation levels and other related concepts – v 0.1307
July 2013

Project: ESS.VIP.BUS Common data validation policy
Document: D2.5 - Definition of "validation levels" and other related concepts
Version: 0.1307

Document Service Data
Type of Document: Deliverable
Reference: 2-5 Definition of "validation levels" and other related concepts
Version: 0.1304
Created by: Angel SIMÓN
Date: 23.04.2013
Status: Draft
Distribution: European Commission – Eurostat/B1, Eurostat/E1, Eurostat/E6 – For Internal Use Only
Reviewed by: Angel SIMÓN
Approved by:
Remark: Pending further analysis and improvements

Document Change Record
Version | Date | Change
0.1304 | 23.04.2013 | Initial release based on deliverable from contractor AGILIS
0.1307 | 08.07.2013 | Added an executive summary and small improvements

Contact Information
EUROSTAT, Ángel SIMÓN
Unit E-6: Transport statistics, BECH B4/334
Tel.: +352 4301 36285
Email: Angel.SIMON@ec.europa.eu

Table of contents
1 Executive summary
2 Introduction
3 Some preliminary basic concepts and terminology
3.1 Concepts
3.2 Terminology
4 Some more formal definitions
5 Validation levels
5.1 Validation level 0
5.2 Validation level 1
5.3 Validation levels 2 and 3
5.4 Validation level 4
5.5 Validation level 5
6 Some further observations
Annex

1 Executive summary

The present document provides definitions to support the classification of validation rules into different levels, from 0 to 5, depending on the dataset targeted by the validation rules and on the origin of the dataset(s) that they check.
The document is an evolution of the document "Validation levels" produced by the VIP Validation project in 2011, with added definitions and examples. The following graph represents the validation levels described in the document.

[Graph 1: Visual representation of validation levels and increasing complexity. Within a statistical provider: level 0 (format and file structure) and level 1 (cells, records, file) on the same file; level 2 (revisions and time series) between files of the same dataset and between correlated datasets; level 3 (mirror checks) between different sources within a domain; level 4 (consistency checks) between domains. Between different providers: level 5 (consistency checks). See also Graph 2 in the Annex.]

2 Introduction

The expression "validation levels" is currently given (slightly) different meanings. This document is intended to give some first definitions of this and other related concepts.

The document provides examples of rules for each validation level; these examples are, however, by no means an exhaustive list of the possible rules for each level. Nor is the classification of validation rules into levels the only possible typology of validation rules.

The purpose of this document is not to be a consistent and exhaustive paper, but simply to give some initial elements for an exchange of views with possible stakeholders in both Eurostat and the Member States. The final aim is not to produce a methodological reference but rather a practical tool to improve communication in concrete statistical production activity. This is in line with the philosophy of the "VIP on validation": "be concrete", "be useful".

3 Some preliminary basic concepts and terminology

3.1 Concepts

What is "validation"? The following definitions can be found in a dictionary:

Valid
- 1(a) legally effective because made or done with the correct formalities
- 1(b) legally usable or acceptable
- well based or logical; sound

Validate
- make (sth) legally valid; ratify
- make (sth) logical or justifiable

Both meanings are applicable to statistical activity. Indeed, Eurostat / NSIs' "official" statistics are used in precise legal contexts (in both the public and private spheres) and they have to correspond to specific characteristics (harmonised definitions, independence of statistical offices, etc.). The second, more general, meaning (logical, sound, well based) can also be used with reference to statistical results of good quality.

Thus, data validation could be operationally defined as a process which ensures the correspondence of the final (published) data with a number of quality characteristics. More realistically, data validation is an attempt to ensure such a correspondence, since the action of validating or making valid may or may not be successful. In the negative case, this may lead to non-publication of the collected data or to their publication with appropriate explanatory notes, in order to guide users towards an appropriate use of the data.

How should the concept of data quality be defined in the framework of data validation? In general the concept of data quality encompasses different dimensions.
Some dimensions relate to the way data (and the accompanying metadata) are published, such as accessibility and clarity; some relate to the time at which data are published, such as timeliness and punctuality; others relate more generally to the capability of statistics to meet user needs (relevance). In the framework of data validation, however, one usually refers to the intrinsic characteristics of the data: accuracy, coherence, comparability.

Accuracy

Accuracy of statistics refers to the absence of (substantial) errors or, in other words, to the closeness of results to the "truth", the "real" value of the phenomenon to be measured.(1) More concretely, data validation tries to ensure the degree of accuracy of the published statistics required by (known) user needs.

More operationally, data validation can be embodied in a "system of quality checks" capable of identifying those (substantial) errors which prevent statistics from reaching the level of accuracy required by (main) user needs.(2) A quality check is meant to identify those values which do not comply with well-defined logical or other kinds of (more or less empirically defined) conformity rules. The identified outliers (values lying outside the expected or usual range) are (potential) errors. The final purpose of data validation is to correct the outliers only where relevant, i.e. by identifying the "real" errors.(3)

Usually data validation aims at correcting errors before the publication of the statistics. However, corrections can sometimes also be performed after publication: revisions. The decision whether or not to publish statistics including identified errors is based on a (usually empirical) trade-off between different and "contradictory" quality components (usually timeliness and accuracy).

Sometimes quality checks can help in identifying structural data errors (as opposed to one-off outliers). These can originate from a misinterpretation of the statistical definitions or from the non-conformity of the sources or other characteristics of the data collection vis-à-vis the statistical concepts. The correction of these errors may require an improvement or clarification of the methodological definitions and/or a (structural) modification in the implementation of the data collection.

Coherence and comparability

Quality checks can also be used to test the internal coherence of a single statistical output (for example arithmetic relations between variables) or the comparability of its results over time or over space. Consistency and comparability can also be tested between two or more different statistical outputs (domains): in this case the possibility to develop quality checks depends on the degree to which the different data collections make use of the same harmonised concepts (definitions, classifications, etc.).

(1) This can be defined in terms of an "accepted range of deviation".
(2) The required level of accuracy has an impact on the way simple equations are implemented.
(3) Imputation can be intuitively seen as a kind of correction, specifically for "missing data".
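As a purely illustrative aside (not part of the original deliverable), such a quality check can be sketched in a few lines of Python: values lying outside an empirically defined accepted range are flagged as outliers, i.e. potential errors that still have to be investigated. The range and the figures below are invented for the example.

```python
# Minimal sketch of a plausibility check: values outside an accepted range
# are flagged as outliers (potential errors), not automatically corrected.
def range_check(value, lower, upper):
    """Return None if the value is plausible, otherwise a description of the outlier."""
    if lower <= value <= upper:
        return None
    return f"value {value} outside the accepted range [{lower}, {upper}]"

# Example: population figures, with an accepted range derived from earlier surveys.
observations = {"LU-2011-T": 390_000, "LU-2011-F": 2_000_000}  # second value: suspected typo
for key, value in observations.items():
    problem = range_check(value, 100_000, 1_000_000)
    if problem:
        print(f"{key}: potential error - {problem}")
```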
3.2 Terminology

Data warehouse: The whole set of domains under the (direct or indirect) "control" of the same Institution; a database used for reporting and data analysis.
Domain: Set of information and data covering a certain topic.
Source: Origin of the data.
Statistical variable: Object of a measurement with reference to a group of statistical units defined by the reference variables.
Statistical unit: Member of a set of entities being studied.
Reference variable: Identifies the necessary references to put the statistical variables in a "context".
Quantitative variable: Quantitative data express a certain quantity, amount or range. Usually there are measurement units associated with the data, e.g. metres in the case of the height of a person. It makes sense to set boundary limits to such data, and it is also meaningful to apply arithmetic operations to the data.
Derived variable: A derived statistic is obtained by an arithmetical operation from the primary observations. In this sense, almost every statistic is "derived". The term is mainly used to denote descriptive statistical quantities obtained from data which are primary in the sense of being mere summaries of observations, e.g. population figures and geographical areas are primary, but population per square mile is a derived quantity.
Outlier: A value standing far outside the range of the set of values to which it refers. In Animal Production Statistics, for example, the set of values for a variable is the known time series for that variable.
Suspicious value: A value which has broken a validation check but on which no final decision has yet been taken.
Error: An outlier or inconsistency compared to the expected result.
Fatal error: An error stopping the data processing.
Warning: An error not stopping the data processing.
Dataset: Structure for the transmission or organisation of statistical data.
File: A specific transmission occurrence of a dataset.
Record: A row providing the statistical information in a file.
Code: Allows identifying the statistical references.
Dictionary: List of the allowed codes.
Revision: New occurrence of a data file.
Correction: Modification of statistical information without reception of a new occurrence of a data file.

4 Some more formal definitions

In the previous section some concepts were given imprecise meanings or definitions, in order not to complicate the presentation. This section is meant to remedy this and provides definitions which are in line with international standards.(4)

Data editing is the activity aimed at detecting and correcting errors (logical inconsistencies) in data.(5) The editing procedure usually includes three phases:
- the definition of a consistent system of requirements (checking rules),
- their verification on given data (data validation or data checking), and
- the elimination or substitution of data which are in contradiction with the defined requirements (data imputation).

Checking rule is a logical condition or a restriction on the value of a data item or a data group which must be met if the data are to be considered correct. Common synonyms: edit rule, edit.

Data validation is an activity aimed at verifying whether the value of a data item comes from a given (finite or infinite) set of acceptable values.(6)

Data checking is the activity through which the correctness conditions of the data are verified. It also includes the specification of the type of error or condition not met, and the qualification of the data and its division into "error free" and "erroneous" data.

Data imputation is the substitution of estimated values for missing or inconsistent data items (fields). The substituted values are intended to create a data record that does not fail edits.

Quality assurance is a planned and systematic pattern of all the actions necessary to provide adequate confidence that a product will conform to established requirements.(7)

(4) The definitions are taken from the UNECE "Glossary of terms on statistical data editing": http://live.unece.org/fileadmin/DAM/stats/publications/editingglossary.pdf. A continuously updated version of the publication edited in 2000 is available on line: http://www1.unece.org/stat/platform/display/kbase/Glossary
(5) The definition of "data editing" in the OECD "Glossary of statistical terms" is identical: http://stats.oecd.org/glossary/detail.asp?ID=3408
(6) The definition of "data validation" in the OECD "Glossary of statistical terms" is identical: http://stats.oecd.org/glossary/detail.asp?ID=2545
(7) As available in the OECD Glossary of statistical terms: http://stats.oecd.org/glossary/detail.asp?ID=4954
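As an illustrative aside, the three phases of the editing procedure can be sketched in Python as follows; the checking rules, the record layout and the crude imputation strategy are assumptions made only for the example, not part of the standard definitions.

```python
# Sketch of the editing procedure: checking rules, their verification (data
# validation) and substitution of missing or failing items (data imputation).
checking_rules = [
    ("non_negative", lambda rec: rec["inhabitants"] is not None and rec["inhabitants"] >= 0),
    ("sex_code", lambda rec: rec["sex"] in {"F", "M", "T"}),
]

def validate(record):
    """Data validation: return the names of the checking rules the record fails."""
    return [name for name, rule in checking_rules if not rule(record)]

def impute(record):
    """Data imputation: substitute an estimated value for a missing item (here a crude placeholder)."""
    if record["inhabitants"] is None:
        record["inhabitants"] = 0
    return record

record = {"country": "LU", "sex": "F", "inhabitants": None}
print(validate(record))           # ['non_negative']
print(validate(impute(record)))   # []
```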
Taking into account the above international definitions, the following hierarchical relation can be established between three concepts widely used in the context of data quality: data validation is part of data editing, which in turn is part of quality assurance.

- Data validation covers: the establishment of checking rules; the detection of outliers or potential errors; the communication of the detected problems to the "actors" in the best position to investigate them.
- Data editing additionally covers: the correction of the errors, based on appropriate investigations or on automatic imputation.
- Quality assurance additionally covers: technical activities, e.g. data analysis, which are not part of the agreed set of checking rules; activities of another nature, for instance compliance monitoring, i.e. the set of governance processes meant to stimulate EU Member States to respect their obligation to apply the EU statistical legislation systematically.

5 Validation levels

Validation levels can be defined with reference to quite different concepts.

Sometimes the levels (first level of validation, second level of validation, etc.) simply and empirically reflect the (early vs. late) phases of the production process at which quality checks are performed. On similar grounds, the levels can refer either to the persons or Institutions (for example Member States vs. Eurostat) or to the specific IT tools (for example GENEDI, the eDAMIS validation engine or eVE vs. the Editing Building Block or EBB) performing the quality checks at successive stages of the production chain.

Validation levels can also be defined by the degree to which the detected "outliers" are not in conformity with the rules underlying the quality checks: clear (logical / arithmetic) vs. potential non-conformity. In other words, the classification of validation levels can be based on the above-mentioned distinction between errors (or "real errors") and outliers (or "potential errors"). An automatic application of this classification to the operational behaviour to follow for the two types of "outliers" leads to the distinction between "fatal errors" (the production process is stopped until corrections are made) and "warnings" (the production process can continue).

However, a less automatic application of this principle, based on a more flexible trade-off between timeliness and accuracy, can lead to treating as "warnings" also situations where real (but not very serious) errors are detected. In specific validation tools, this can be implemented as the possibility to "force" the data transmission (or the publication) even though some real (for example logical or arithmetic) errors are detected. Conversely, serious outliers (potential errors) can also lead to the decision to stop the production process until verification is made.
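To illustrate this point (a Python sketch only; the rule names and severity labels are invented for the example), the distinction between fatal errors, warnings and a "forced" transmission could be represented as follows.

```python
# Sketch: each detected problem carries a severity; fatal errors block the
# transmission unless it is explicitly "forced", warnings never block it.
from dataclasses import dataclass

@dataclass
class Finding:
    rule: str
    severity: str  # "fatal" or "warning"
    message: str

def can_transmit(findings, force=False):
    """Return whether transmission may proceed, together with the fatal findings."""
    fatal = [f for f in findings if f.severity == "fatal"]
    return (not fatal or force), fatal

findings = [
    Finding("sex_code", "fatal", "unknown code 'X' in column 3"),
    Finding("plausibility", "warning", "female population outside the expected range"),
]
print(can_transmit(findings)[0])              # False: blocked by the fatal error
print(can_transmit(findings, force=True)[0])  # True: transmission forced despite the error
```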
A classification of validation levels can also be based on the progressively larger amount of data which is necessary to perform the quality checks.

In order to define some basic concepts needed for this classification of validation levels, let us assume, in a simplified example, that data are organised or exchanged using bi-dimensional tables. The columns of the table identify two different categories of "variables":
- the reference "variables" (or classification "variables", or reference metadata),
- the statistical variables.
The rows identify the "records".

The reference variables identify the necessary references to put the statistical variables in a "context" and give a meaning to the collected numbers. Example of a statistical variable: number of inhabitants (for example: 200,000). Possible corresponding reference "variables": country of residence (for example: Luxembourg), date (for example: 15/02/2011), sex (for example: Females). More precisely, a statistical variable is the object of a measurement with reference to a group of statistical units defined by the reference "variables".

The above information can be organised in a record (a row of the table):

Luxembourg; 15/02/2011; Females; 200,000

More records constitute a file:

Luxembourg; 15/02/2011; Females; 200,000:
Luxembourg; 15/02/2011; Males; 190,000:
Luxembourg; 15/02/2011; Total; 390,000:

A dataset is the agreed structure for the transmission or organisation of statistical data. The above dataset could be named "census population by country of residence and by sex". Its agreed structure can be described as: "country; date; sex; number of inhabitants". A dataset is the unit of data transmission.(8) A file is a specific transmission occurrence of a dataset.

(8) If data are not "transmitted" (for example in the case of a "hub" architecture), a dataset is a defined subset of the underlying database.

Certain reference variables are associated with defined (agreed) dictionaries. Dictionaries include the list of the possible occurrences of the specific reference variable. An example of a dictionary is "Sex":
- Females
- Males
- Total

Usually each specific occurrence is associated with an agreed code; this simplifies the file content. An example of a dictionary with codes is "Sex":
- F; Females
- M; Males
- T; Total

With the use of agreed dictionaries, the file of the above example could be simplified as follows:

LU; 15/02/2011; F; 200,000:
LU; 15/02/2011; M; 190,000:
LU; 15/02/2011; T; 390,000:

In this specific example, colon, semi-colon and comma are special characters used to structure, respectively, the records, the fields (the variables) and the thousands in the numbers; the date in the second field has a specified format.
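As an illustrative aside, the simplified data model used in this example (records combining reference and statistical variables, a file as a collection of records, dictionaries of agreed codes) can be sketched in Python as follows; the structures shown are only one possible representation, not a prescribed implementation.

```python
# Sketch of the simplified data model: a file is a list of records; each record
# combines reference variables (country, date, sex) with one statistical
# variable (number of inhabitants); dictionaries list the agreed codes.
SEX_DICTIONARY = {"F": "Females", "M": "Males", "T": "Total"}
COUNTRY_DICTIONARY = {"LU": "Luxembourg"}  # shortened to one entry for the example

# Dataset "census population by country of residence and by sex",
# agreed structure: "country; date; sex; number of inhabitants".
file_content = [
    ("LU", "15/02/2011", "F", 200_000),   # one record = one row of the table
    ("LU", "15/02/2011", "M", 190_000),
    ("LU", "15/02/2011", "T", 390_000),
]

for country, date, sex, inhabitants in file_content:
    # The reference variables give the context; the statistical variable is the measured number.
    print(f"{COUNTRY_DICTIONARY[country]}, {date}, {SEX_DICTIONARY[sex]}: {inhabitants} inhabitants")
```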
Let us now define the validation levels on the basis of the progressively larger amount of data which is necessary to perform the quality checks.

5.1 Validation level 0

Some quality checks do not need any data content of the file (i.e. the specific values of either the statistical or the reference variables) in order to be performed: this is validation level 0. For these quality checks only the structure of the file or the format of the variables, for example, are necessary as input.

For example, it is possible to define a "dataset naming convention" on the basis of which the file above has to be named, for example CENSUS_2011_LU_SEX. Before even "reading" the (statistical) content of the data included in the file, one can check for example:
- whether the file has been sent/prepared by the authorised authority (data sender);
- whether the column separator and the end-of-record symbol are correctly used;
- whether the file has 4 columns (agreed format of the file);
- whether the first column is alphanumeric (format of each variable / column);
- whether the first column is two characters long (length of the first column);
- whether the second column fits a particular mask (date);
- whether all the required information is included in the file (no missing data).

Example of a file for which validation level 0 may give error messages:

LU; 15/02/2011; F; 200,000:
L; 15/02/11; ; 190,000_
LU; 15/02/2011, T; 390,000:

Validation level 0 is usually referred to as "(sender,) format and file structure" checks or, more simply, "format and file structure checks".
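By way of illustration, some of the level 0 checks listed above could be sketched as follows, assuming the semicolon-separated layout and the naming convention of the example; the error messages and the helper function are invented for the illustration.

```python
import re

# Sketch of validation level 0: only the file name, structure and formats are
# checked, without interpreting the statistical content of the cells.
def level0_checks(file_name, lines, expected_sender="LU"):
    errors = []
    if not file_name.startswith(f"CENSUS_2011_{expected_sender}"):
        errors.append("file name does not follow the dataset naming convention")
    for i, line in enumerate(lines, start=1):
        if not line.endswith(":"):
            errors.append(f"record {i}: end-of-record symbol ':' missing")
        fields = [f.strip() for f in line.rstrip(":").split(";")]
        if len(fields) != 4:
            errors.append(f"record {i}: expected 4 columns, found {len(fields)}")
            continue
        if not re.fullmatch(r"[A-Z]{2}", fields[0]):
            errors.append(f"record {i}: column 1 is not a two-character code")
        if not re.fullmatch(r"\d{2}/\d{2}/\d{4}", fields[1]):
            errors.append(f"record {i}: column 2 does not fit the date mask DD/MM/YYYY")
        if "" in fields:
            errors.append(f"record {i}: missing information")
    return errors

lines = [
    "LU; 15/02/2011; F; 200,000:",
    "L; 15/02/11; ; 190,000_",
    "LU; 15/02/2011, T; 390,000:",
]
for message in level0_checks("CENSUS_2011_LU_SEX", lines):
    print(message)
```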
5.2 Validation level 1

Validation level 1 groups all the quality checks which only need the (statistical) information included in the file itself.

a) Some checks can be performed at the level of each record, or even at the level of each cell (identified by the "coordinates" of one row and one column). One could check, for example:
- whether the number included in column 4 is not negative;
- whether the number included in column 4 is an integer.

Combining the information of one cell in the file and the file name, one could check:
- whether the year in the second column is 2011, as in the file name;
- whether the content of the first column is LU, as in the file name.

Combining the information of one cell in the file with information included in the dictionaries associated with the variables, one could check:
- whether the content of the third column is one of the codes of the dictionary "Sex";
- whether the content of the first column is one of the codes of the dictionary "Country".

Combining the information of one cell in the file and other dictionaries associated with the dataset, one could check:
- whether the content of the first column is consistent with the data sender (assuming that there is a dictionary including the list of the data senders associated with the specific dataset): data for Luxembourg should not be sent by another country.

b) Some checks can be performed at the level of a record (using the information included in the cells of one row). For example, one could devise the following quality checks:
- based on information available before the data collection (for example from a previous survey or other sources), one could establish a "plausibility range" for the inhabitants of Luxembourg in 2011 as follows: IF column 1 = "LU" AND column 3 = "T" THEN the variable number of inhabitants should be included in the range 100,000 – 1,000,000. This quality check is not a time-series check, but rather a plausibility check based on historical or other information. It is an "order of magnitude" check, i.e. it is meant to detect clerical mistakes, such as typing one zero too many or too few, or typing a decimal comma instead of a thousands separator.
- in files where two or more statistical variables are collected, other logical or plausibility checks can be conceived between two or more variables (also using dictionaries): a certain combination of codes is illogical, a variable has to be reported only for a certain combination of codes, etc.

c) Some checks can be performed at the level of two or more records, up to all the records of the entire file. Examples of mathematical expressions to be respected:
- total inhabitants (col 3 = "T") >= male inhabitants (col 3 = "M");
- total inhabitants = male inhabitants + female inhabitants (col 3 = "F").

Based on information available before the data collection, one could conceive for example the following plausibility rule:
- female inhabitants = (total inhabitants / 2) +/- 10%.

One could also check the absence of double records (records for which the combination of the reference variables is identical).

Validation level 1 is usually referred to as "intra-dataset checks". Since we have defined a file as a specific occurrence of a dataset, these checks should more properly be called "intra-file checks".(9)

(9) It is useful to recall that a dataset is defined as the unit of data transmission. The file should correspond to an entire dataset (and not to a subset of it) when defining "intra-file checks".
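As an illustrative aside, some of the level 1 checks above can be sketched on the parsed content of a single file; the plausibility bounds are those of the example, everything else is an assumption made for the illustration.

```python
# Sketch of validation level 1: cell, record and whole-file checks using only
# the content of one file.
SEX_CODES = {"F", "M", "T"}

def level1_checks(records):
    errors = []
    totals = {}
    for i, (country, date, sex, inhabitants) in enumerate(records, start=1):
        # cell-level checks
        if sex not in SEX_CODES:
            errors.append(f"record {i}: '{sex}' is not a code of the dictionary Sex")
        if inhabitants < 0 or inhabitants != int(inhabitants):
            errors.append(f"record {i}: column 4 must be a non-negative integer")
        # record-level "order of magnitude" plausibility check
        if country == "LU" and sex == "T" and not 100_000 <= inhabitants <= 1_000_000:
            errors.append(f"record {i}: total for LU outside the plausibility range")
        totals[sex] = inhabitants
    # file-level checks between records
    if {"F", "M", "T"}.issubset(totals):
        if totals["T"] != totals["F"] + totals["M"]:
            errors.append("total inhabitants differ from males + females")
        if abs(totals["F"] - totals["T"] / 2) > 0.1 * (totals["T"] / 2):
            errors.append("female inhabitants deviate more than 10% from half of the total")
    return errors

records = [("LU", "15/02/2011", "F", 200_000),
           ("LU", "15/02/2011", "M", 190_000),
           ("LU", "15/02/2011", "T", 400_000)]
print(level1_checks(records))  # flags the arithmetic inconsistency in the total
```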
5.3 Validation levels 2 and 3

The next validation levels group all the quality checks in which the content of the file is compared with the content of "other files" referring to the same statistical system (or domain).

Case a) The "other files" can be other versions of exactly the same file. In this case the quality checks are meant to detect "revisions" compared to previously sent data. The detection and analysis of revisions can be useful, for example, to verify whether the revisions are consistent with outliers detected in previous quality checks (corrections), or to estimate the impact of the revisions on the "to be published" results, for the benefit of the users.

Case b) The "other files" can be already received versions of the same dataset, from the same data provider (for example a country), but referring to other time periods. These checks are usually referred to as "time series checks" and are meant to verify the plausibility of the time series.

"Revision checks" and "time series checks" could be referred to as "intra-dataset inter-file checks", since they are performed between different occurrences (files) of the same defined data (transmission) structure (a dataset).

However, a dataset can sometimes be physically split into different files, which are sometimes collected by different data providers or Institutions. Sometimes the files are merged before data transmission by one of these Institutions (or by another Institution), which is meant to guarantee the homogeneity of the concepts used by the original data sources. However, sometimes the files are transmitted separately by the different data providers, or are assembled into one file by one Institution without an effective action aimed at guaranteeing the homogeneity of the concepts used by the original data providers.

In case of inconsistencies between data transmitted by different data providers (for example because of a lack of harmonisation of the concepts used by the different providers), the definition of "dataset" should be used in order to identify the correct validation level to which the detection of these inconsistencies should be assigned. Indeed, these inconsistencies should be seen as inconsistencies between records of the same dataset: they are "intra-dataset" or (at least conceptually) "intra-file" checks, and not "inter-source" or "inter-data-provider" checks (compare with Case d below).

Case c) The "other files" can refer to other datasets from the same data provider (country), referring to the same or other correlated time periods. Sometimes a group of datasets (same country, same reference period) is sent at the same time. Example: three files could be sent at the same time, from the same country and referring to the same time period: one file includes data for "females", one for "males", one for "total". Consistency between the results of the three files can be checked. Another example: results from annual datasets can be compared with the results of the corresponding quarterly datasets. These checks are usually referred to as "inter-dataset checks". More precisely, they could be called "intra-data-provider (or intra-source) inter-dataset checks".(10)

Case d) The "other files" can refer to the same dataset, but from another data provider (country). Mirror checks, for example, belong to this class. Mirror checks verify the consistency between declarations from different sources referring to the same phenomenon. Example: exports declared by country A to country B should be the same as imports declared by country B from country A. Mirror checks are "inter-source intra-dataset checks" (if, in our example, exports and imports are included in the same dataset) or "inter-source inter-dataset checks".

If one defines a database as the combination of all datasets of the same "domain" (which implies that the methodological basis of all these datasets is somehow homogeneous), then all the above quality checks (Cases a, b, c and d) could be referred to as "intra-database checks". Two important aspects have to be clarified.

First, the concept of a database is technical. For technical reasons it may happen that the same database includes data from different domains; vice versa, the data of the same domain could be stored in more than one database. As a consequence, it would be more correct to refer to the above validation operations (Cases a, b, c and d) as "intra-domain checks".

Second, when data for the same domain (i.e. the same homogeneous methodological framework) are collected from different sources (countries), the implementation of the methodological concepts may not be completely homogeneous from one source to another. For this and other practical reasons (who is in the best position to perform the specific quality checks?), it is considered sound to distinguish two different validation levels, as follows:

Validation level 2 is defined as "intra-domain, intra-source checks": it includes revision checks, time series checks and (intra-source) inter-dataset checks (Cases a, b and c above).

Validation level 3 is defined as "intra-domain, inter-source checks": it includes mirror checks (Case d).

(10) Or perhaps "intra-source intra-group (of correlated datasets) inter-dataset checks".
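By way of illustration (a Python sketch only; the 20% and 5% tolerances are invented assumptions, not recommended values), a level 2 time-series check and a level 3 mirror check could look as follows.

```python
# Sketch of a level 2 time-series check (same dataset, same provider, other
# periods) and a level 3 mirror check (same phenomenon declared by two providers).
def time_series_check(series, new_value, tolerance=0.20):
    """Level 2: flag the new value if it deviates too much from the last known value."""
    last = series[-1]
    if abs(new_value - last) > tolerance * last:
        return f"new value {new_value} deviates more than {tolerance:.0%} from previous value {last}"
    return None

def mirror_check(export_a_to_b, import_b_from_a, tolerance=0.05):
    """Level 3: exports declared by A to B should match imports declared by B from A."""
    if abs(export_a_to_b - import_b_from_a) > tolerance * max(export_a_to_b, import_b_from_a):
        return f"mirror discrepancy: {export_a_to_b} declared vs {import_b_from_a} mirrored"
    return None

print(time_series_check([380_000, 385_000, 390_000], 500_000))
print(mirror_check(1_200, 1_500))
```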
5.4 Validation level 4

Validation level 4 can be defined as plausibility or consistency checks between separate domains available in the same Institution. This availability implies a certain level of "control" over the methodologies by the Institution concerned.

These checks can be based on the plausibility of results describing the "same" phenomenon from different statistical domains: for example, unemployment from registers and from the Labour Force Survey. Checks can also be made between results from correlated micro-data and macro-data sources. Other plausibility checks can be based on known correlations between different phenomena: for example, external trade and international transport activity in ports.

These checks could be defined as intra-Institution checks, inter-domain checks or intra-data-warehouse checks, if one defines a data warehouse as the whole set of domains under the (direct or indirect) "control" of the same Institution.

5.5 Validation level 5

Validation level 5 can be defined as plausibility or consistency checks between the data available in the Institution and the data / information available outside the Institution. This implies no "control" over the methodology on the basis of which the external data are collected, and sometimes only a limited knowledge of it.

Statistical indicators collected by Eurostat might also be compiled, for their own needs, by national institutions such as National Statistical Institutes or Ministries, by private entities (ports, airports, companies, etc.) and by international organisations (World Bank, United Nations, International Monetary Fund, etc.). For example, EU road freight statistics are prepared by the Member States according to the EU Commission legal acts, and in addition countries can carry out specific surveys for national purposes. Benchmarking the indicators common to these different surveys allows assessing the coherence of these data and could help improve the methodologies for data collection.
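By way of illustration (a Python sketch; the 10% threshold and the figures are invented for the example), a level 4 or level 5 consistency check between two compilations of the "same" indicator could be expressed as follows.

```python
# Sketch of a level 4/5 consistency check: the same indicator compiled in two
# different domains (or by an external organisation) is compared, and large
# relative differences are reported for investigation.
def consistency_check(indicator, value_domain_a, value_domain_b, tolerance=0.10):
    reference = max(abs(value_domain_a), abs(value_domain_b))
    if reference and abs(value_domain_a - value_domain_b) / reference > tolerance:
        return (f"{indicator}: {value_domain_a} vs {value_domain_b} differ by more "
                f"than {tolerance:.0%}; investigate possible methodological differences")
    return None

# e.g. unemployment from administrative registers vs the Labour Force Survey
print(consistency_check("unemployment", 152_000, 175_000))
```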
A graphical representation of the allocation of validation rules to validation levels is available in the Annex.

6 Some further observations

As a general rule, the higher the level of validation, the more difficult it is to identify "fatal errors", i.e. errors that imply the "rejection" of a data file.

The above classification of validation levels seems to be operational in terms of both IT implementation (the amount of data needed to carry out the checks) and the attribution of responsibility (the availability of information generally follows the hierarchical organisation of the different Institutions participating in the data collection exercise).

Annex

Graph 2: Visual guide to the allocation of validation rules to validation levels.

Data
- Within a statistical authority
  - Within a domain
    - From the same source
      - Same dataset
        - Same file: Level 0 (format and file structure checks); Level 1 (cells, records, file checks)
        - Between files: Level 2 (revisions and time series)
      - Between datasets: Level 2 (checks between correlated datasets)
    - From different sources: Level 3 (mirror checks)
  - Between domains: Level 4 (consistency checks)
- Between statistical authorities: Level 5 (consistency checks)