International Recommendations for Water Statistics (IRWS)

Chapter VII: Metadata and data quality

Expert Group Meeting on the IRWS

United Nations, New York, 4-6 November 2008


Location in IRWS

PART I

Chapter 1: Introduction

Chapter 2: Main concepts and the SEEAW

Chapter 3: Economic units

Chapter 4: Data items

PART II

Chapter 5: Data collection strategy

Chapter 6: Data sources and compilation methods

Chapter 7: Metadata and data quality

Chapter 8: Dissemination

Chapter 9: Indicators

ANNEXES

Annex 1: Supplementary data items

Annex 2: Link between data items and the SEEAW

Annex 3: Link between data items and indicators of WWDR and MDG

Annex 4: Link between data items and indicators of FAO

GLOSSARY

Outline of Chapter

• Section A – Introduction

• Section B – Metadata for water statistics

• Section C – Data quality dimensions

• Section D – Data quality assessment framework

Section B – Metadata

Metadata is information used to describe data. A very short definition of metadata then is “data about data”.

Metadata descriptions go beyond the pure form and contents of data. They are used to describe:

• Administrative facts about data (who created them, and when),

• How data were collected and processed before being disseminated or stored in a database.

Metadata frameworks

There are many metadata frameworks. These include, for example:

• Statistical Data and Metadata Exchange (SDMX);

• Dublin Core Metadata Initiative (DCMI), ISO 19115;

• FGDC (Federal Geographic Data Committee);

• Data Documentation Initiative (DDI);

• Resource Description Framework (RDF).

SDMX

SDMX metadata elements and their descriptions:

• Contact: Describes contact points for the data or metadata, including how to reach the contact points.

• Metadata update: Date on which the metadata element was inserted or modified.

• Statistical presentation: Description of the table contents, with their data breakdowns.

• Release calendar policy: Describes the policy regarding the release of statistics according to a preannounced schedule (if available).

• Institutional framework: Refers to a law or other formal provision that assigns responsibilities and authority to agencies for the collection, processing, and dissemination of the statistics, and includes any data sharing arrangements.

• Transparency: Describes the policy on:
- the availability of the terms and conditions under which statistics are collected, compiled, and disseminated;
- providing advance notice of major changes in methodology, source data, and statistical techniques;
- internal governmental access to statistics prior to their release;
- the identification of statistical products.

• Related quality reports: Reference to available quality reports for the data.

SDMX (cont.)

• Comparability and coherence: The extent to which differences between statistics from different geographical areas, non-geographical domains, or over time, can be attributed to differences between the true values of the statistics.

• Accuracy and reliability: The accuracy of statistical information is the degree to which the information correctly describes the phenomena it was designed to measure.

• Statistical concepts: The statistical concept being measured and the organisation of data, i.e. the type of variables included in the domain of study, and the source of these concepts (e.g. SNA, SEEAW, IRWS or other).

• Scope and coverage: Describes the coverage of the statistics and how consistent this is with internationally accepted standards, guidelines, or good practices.

• Source data: Description of the data collection programs and their adequacy for the production of statistics, including meeting the requirements for methodological frameworks, scope, classification systems, and basis for recording.

• Data validation: Describes methods and processes for routinely assessing microdata and macrodata.

• Relevance: Refers to the processes for monitoring the relevance and practical utility of existing statistics in meeting users’ needs and how these processes inform the development of statistical programs.

• Quality assurance: Refers to the processes in place to focus on quality, to monitor the quality of the statistical programs, and to deal with quality considerations in planning the statistical programs.
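
As an illustration of how some of the SDMX elements listed above might be carried alongside a water dataset, the sketch below stores them in a simple Python record and reports which elements are still empty. The class name, field names and sample values are hypothetical; this is not an SDMX implementation and it uses no SDMX library.

```python
from dataclasses import dataclass, fields
from typing import List


@dataclass
class WaterMetadata:
    """Hypothetical record holding a few of the SDMX elements listed above."""
    contact: str = ""
    metadata_update: str = ""          # date the metadata was inserted or modified
    statistical_presentation: str = ""
    release_calendar_policy: str = ""
    institutional_framework: str = ""
    source_data: str = ""
    data_validation: str = ""


def missing_elements(record: WaterMetadata) -> List[str]:
    """Return the names of metadata elements that have not been filled in."""
    return [f.name for f in fields(record) if not getattr(record, f.name).strip()]


# Illustrative values only; not taken from any real dataset.
example = WaterMetadata(
    contact="Water statistics unit, statistics@example.org",
    metadata_update="2008-11-04",
    statistical_presentation="Water abstraction by economic unit, annual",
)
print(missing_elements(example))
```

A completeness check of this kind is only a first step; whether the content of each element is adequate still has to be judged by the compiler.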

Section C – Dimensions of data quality

Box 8.1. Comparison between the IMF Data Quality Assessment Framework, the Eurostat Quality Definition and the OECD Quality Measurement Framework

IMF DQAF (incl. elements)

0. Prerequisites of quality
0.1 Legal and institutional environment
0.2 Resources
0.3 Relevance
0.4 Other quality measurement

1. Assurance of integrity
1.1 Professionalism
1.2 Transparency
1.3 Ethical standards

2. Methodological soundness
2.1 Concept and definitions
2.2 Scope
2.3 Classification/Sectorisation
2.4 Basis for recording

3. Accuracy and reliability
3.1 Source data
3.2 Assessment of source data
3.3 Statistical techniques
3.4 Assessment and validation of intermediate data and statistical outputs
3.5 Revision studies

4. Serviceability
4.1 Periodicity and timeliness
4.2 Consistency
4.3 Revision policy and practice

5. Accessibility
5.1 Data accessibility
5.2 Metadata accessibility
5.3 Assistance to users

Groupings shown in the box: Institutional and organisational arrangements; Core statistical process; Statistical products

Eurostat

• Relevance
• Comparability across countries
• Accuracy
• Timeliness and punctuality
• Coherence
• Accessibility and clarity

OECD

• Relevance
• Credibility
• Interpretability
• Accuracy
• Timeliness
• Coherence
• Accessibility

Source: Laliberte, Grunewald and Probst (2003): Data Quality: A Comparison of IMF’s Data Quality Assessment Framework (DQAF) and Eurostat’s Quality Definition. Available from http://www.oecd.org/dataoecd/26/3/17831984.pdf

Section C – Dimensions of data quality

• Prerequisites of quality

• Accessibility

• Accuracy

• Coherence

• Credibility

• Relevance

• Timeliness

Prerequisites of quality

These are all institutional and organisational conditions that have an impact on the quality of water statistics. These include:

• The legal basis for compilation of data; adequacy of data sharing and coordination among data-producing agencies;

• Assurance of confidentiality of data provided by data producers and respondents;

• Adequacy of human, financial, and technical resources for implementation of water statistics programmes and implementation of measures to ensure their efficient use;

• Quality awareness by staff and data producers.

Accessibility

This includes:

• The ease with which the existence of information can be ascertained;

• The suitability of the form (e.g. standard tables or water indicators);

• The media (e.g. web or paper publications) of dissemination through which the information can be accessed;

• Availability of metadata;

• Existence of user support services and an advance release calendar.

Accuracy

The degree to which the data correctly estimate the data items.

Accuracy has many attributes and in practice there is not a single aggregate or overall measure of accuracy. In general, it is characterized in terms of errors in statistical estimates and is traditionally decomposed into bias (systematic error) and variance (random error).

Accuracy depends upon the quality of data collected and the processes undertaken by statistical offices to reduce errors at all stages of the data collection process.
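
The decomposition into bias and variance mentioned above can be illustrated numerically. The sketch below assumes a known true value and a set of repeated estimates, both invented for illustration, and checks that the mean squared error equals the squared bias plus the variance. The function name is hypothetical.

```python
from statistics import fmean


def decompose_error(estimates, true_value):
    """Split mean squared error into bias and variance components.

    Uses the population variance (divide by n) so that the identity
    MSE = bias**2 + variance holds up to floating-point rounding.
    """
    mean_est = fmean(estimates)
    bias = mean_est - true_value
    variance = fmean([(e - mean_est) ** 2 for e in estimates])
    mse = fmean([(e - true_value) ** 2 for e in estimates])
    return bias, variance, mse


# Hypothetical repeated estimates of a data item whose true value is 100.
bias, variance, mse = decompose_error([102.0, 98.0, 105.0, 103.0], 100.0)
print(bias, variance, mse)
print(abs(mse - (bias ** 2 + variance)) < 1e-9)
```

In practice the true value is unknown, so bias usually has to be inferred from comparisons with independent sources rather than computed directly.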

Coherence

This reflects the degree to which the data are logically connected and mutually consistent, i.e. they can be successfully brought together with other statistical information.

The use of standard concepts, classifications and statistical populations promotes coherence, as does the use of common methodology across water data collections.

Coherence does not necessarily imply full numerical consistency.

Coherence has four important sub-dimensions: coherence within a dataset, coherence across datasets, coherence over time, and coherence across countries.
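
Coherence within a dataset can be checked, for instance, by comparing a reported total with the sum of its components. The sketch below is a minimal example with hypothetical item names and values. The relative tolerance reflects the point above that coherence does not require full numerical consistency; the 1% default is an arbitrary assumption.

```python
def is_coherent(total, components, rel_tolerance=0.01):
    """Return True if the components add up to the total within a relative tolerance."""
    component_sum = sum(components.values())
    if total == 0:
        return component_sum == 0
    return abs(component_sum - total) / abs(total) <= rel_tolerance


# Hypothetical water abstraction figures (million cubic metres).
abstraction = {"surface water": 640.0, "groundwater": 355.0}
print(is_coherent(1000.0, abstraction))  # components sum to 995.0
print(is_coherent(1000.0, {"surface water": 640.0, "groundwater": 300.0}))
```

The same comparison could be applied across datasets or over time by passing figures for the same data item from two sources or two periods.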

Credibility

This refers to the confidence that users have in the producers of the data. Users’ confidence is built over time. One important aspect is trust in the objectivity of the data: the data are perceived to be produced professionally in accordance with appropriate statistical standards, such as the SEEAW and IRWS, and policies and practices are perceived to be transparent. For example, data should not be released in response to political pressure.

Relevance

This reflects the degree to which the data meet the needs of users.

Measuring relevance requires identification of user groups and their needs.

Some indicators of relevance are:

• The use of data by key users;

• The number of requests for data by all users;

• The results of user satisfaction surveys.

Timeliness

• This refers to the amount of time between the end of the reference period, and the date on which the data are released.

• The timeliness of information influences its relevance.

• Timeliness often involves a trade-off against accuracy.

• Timeliness is related to the existence of a publication schedule. A publication schedule comprises a set of release dates or may involve a commitment to release water data within a prescribed time period from their receipt.

• Punctuality is the amount of time between the announced release date and the actual release date.
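
The two measures defined above are simple date differences. The sketch below computes both in days; the function names and the dates are hypothetical and chosen only for illustration.

```python
from datetime import date


def timeliness_days(reference_period_end: date, release_date: date) -> int:
    """Days between the end of the reference period and the actual release."""
    return (release_date - reference_period_end).days


def punctuality_days(announced_release: date, release_date: date) -> int:
    """Days between the announced and the actual release date (0 means on time)."""
    return (release_date - announced_release).days


# Hypothetical dates for an annual water statistics release.
period_end = date(2007, 12, 31)
announced = date(2008, 9, 30)
released = date(2008, 10, 15)
print(timeliness_days(period_end, released))
print(punctuality_days(announced, released))
```

A positive punctuality value here means the release was late relative to the announced date; a negative value would mean it was early.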


Section D – Data quality assessment framework

Data collection process

1. Identify

2. Review

3. Collect

4. Compile

5. Disseminate


0. Prerequisites for data quality

0.1. Institutional arrangements support the development of water statistics. Dimension: Prerequisite.

0.2. Legal arrangements support the development of water statistics. Dimension: Prerequisite.

0.3. The production and dissemination of water statistics are guided by professional principles, policies and practices. Dimension: Prerequisite.

0.4. Staff, facilities, computing resources, and financing are commensurate with statistical programs. Dimension: Prerequisite.

0.5. Data quality is considered at all stages of statistical development. Dimension: Prerequisite.

1. Identifying what information to produce

1.1. Mechanisms are in place to identify new and emerging water information needs. Dimension: Relevance.

1.2. Data items are identified and selected based on information needs. Dimension: Relevance.


2. Reviewing existing water data

2.1. Data quality is assessed against relevant data quality indicators and frameworks. Dimension: Accuracy.

2.2. Gaps in existing data and information have recently been identified and recorded (within the last 3 years). Dimension: Coherence.

2.3. Deficiencies with existing data and information (such as data quality issues) have recently been identified and recorded (within the last 3 years). Dimension: Coherence.

3. Selecting and collecting data

3.1. The choice of data sources and statistical techniques is informed solely by statistical considerations. Dimension: Credibility.

3.2. Frames are regularly updated. Dimension: Accuracy.

3.3. Data collections are designed and tested to ensure they collect relevant and accurate data. Dimension: Accuracy.

3.4. Data collections are conducted in a professional manner. Dimension: Accuracy.


4. Compiling information

4.1. Data are compiled using international statistical standards, guidelines and best practices. Dimension: Coherence.

4.2. Data are compiled using standard classifications. Dimension: Coherence.

4.3. Data are compiled using reliable statistical methods and procedures. Dimension: Accuracy.

4.4. Revisions are made when required. Dimension: Accuracy.

5. Disseminating information to users

5.1. Decisions about dissemination are informed solely by statistical considerations. Dimension: Credibility.

5.2. Water statistics are disseminated to a range of audiences. Dimension: Accessibility.

5.3. Data dissemination includes information regarding water statistics publications and publication schedules. Dimension: Accessibility.

5.4. Data dissemination includes support services. Dimension: Accessibility.

5.5. The relevance and practical utility of water statistics are monitored. Dimension: Relevance.

5.6. Publications are released on time and on schedule. Dimension: Timeliness.
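
One way to work with the assessment framework above is to record each criterion together with its quality dimension and an assessment result, and then summarise by dimension. The sketch below does this for a handful of the criteria listed. The yes/no results are invented for illustration, and the function name is hypothetical.

```python
from collections import defaultdict

# (criterion number, dimension, met?) where the results are illustrative only.
assessment = [
    ("0.1", "Prerequisite", True),
    ("1.1", "Relevance", True),
    ("3.2", "Accuracy", False),
    ("4.2", "Coherence", True),
    ("4.3", "Accuracy", True),
    ("5.6", "Timeliness", False),
]


def share_met_by_dimension(results):
    """Return, for each dimension, the share of assessed criteria that were met."""
    met = defaultdict(int)
    total = defaultdict(int)
    for _criterion, dimension, is_met in results:
        total[dimension] += 1
        met[dimension] += int(is_met)
    return {dimension: met[dimension] / total[dimension] for dimension in total}


for dimension, share in share_met_by_dimension(assessment).items():
    print(f"{dimension}: {share:.0%}")
```

A simple share per dimension treats all criteria as equally important; a real assessment might weight criteria or use a graded scale instead of yes/no.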

Questions to the EGM:

1. Should there be standard metadata for water statistics?

2. If so, what should it be, and which of the metadata frameworks is the most appropriate starting point for water statistics?

3. Which data quality framework is the most appropriate starting point for water statistics?

4. Should we develop a data quality assessment framework as part of the IRWS, or should this be part of the compilation guidelines?
