Expert Group Meeting on the IRWS
United Nations
New York, 4-6 November 2008
PART I
Chapter 1: Introduction
Chapter 2: Main concepts and the SEEAW
Chapter 3: Economic units
Chapter 4: Data items
PART II
Chapter 5: Data collection strategy
Chapter 6: Data sources and compilation methods
Chapter 7: Metadata and data quality
Chapter 8: Dissemination
Chapter 9: Indicators
ANNEXES
Annex 1: Supplementary data items
Annex 2: Link between data items and the SEEAW
Annex 3: Link between data items and indicators of WWDR and MDG
Annex 4: Link between data items and indicators of FAO
GLOSSARY
• Section A – Introduction
• Section B – Metadata for water statistics
• Section C – Data quality dimensions
• Section D – Data quality assessment framework
Metadata is information used to describe data. A very short definition of metadata then is “data about data”.
Metadata descriptions go beyond the pure form and contents of data. They are used to describe:
• Administrative facts about data (who created them, and when);
• How data were collected and processed before being disseminated or stored in a database.
There are many metadata frameworks. These include, for example:
• Statistical Data and Metadata Exchange (SDMX);
• Dublin Core Metadata Initiative (DCMI), ISO 19115;
• FGDC (Federal Geographic Data Committee);
• Data Documentation Initiative (DDI);
• Resource Description Framework (RDF).
Element: Description
Contact: Describes contact points for the data or metadata, including how to reach the contact points.
Metadata update: Date on which the metadata element was inserted or modified.
Statistical presentation: Description of the table contents, with their data breakdowns.
Release calendar policy: Describes the policy regarding the release of statistics according to a preannounced schedule (if available).
Institutional framework: Refers to a law or other formal provision that assigns responsibilities and authority to agencies for the collection, processing, and dissemination of the statistics, and includes any data sharing arrangements.
Transparency: Describes the policy on:
- the availability of the terms and conditions under which statistics are collected, compiled, and disseminated
- providing advance notice of major changes in methodology, source data, and statistical techniques
- internal governmental access to statistics prior to their release
- the identification of statistical products.
Related quality reports: Reference to available quality reports for the data.
Comparability and coherence: The extent to which differences between statistics from different geographical areas, non-geographical domains, or over time, can be attributed to differences between the true values of the statistics.
Accuracy and reliability: The accuracy of statistical information is the degree to which the information correctly describes the phenomena it was designed to measure.
Statistical concepts: The statistical concept under measure and the organisation of data, i.e. the type of variables included in the domain of study, and the source of these concepts (e.g. SNA, SEEAW, IRWS, other).
Scope and coverage: Describes the coverage of the statistics and how consistent this is with internationally accepted standards, guidelines, or good practices.
Source data: Description of the data collection programs and their adequacy for the production of statistics, including meeting the requirements for methodological frameworks, scope, classification systems, and basis for recording.
Data validation: Describes methods and processes for routinely assessing microdata and macrodata.
Relevance: Refers to the processes for monitoring the relevance and practical utility of existing statistics in meeting users’ needs and how these processes inform the development of statistical programs.
Quality assurance: Refers to processes in place to focus on quality, to monitor the quality of the statistical programs, and to deal with quality considerations in planning the statistical programs.
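As an illustration only, the metadata elements listed above could be held in a simple record so that missing descriptions are easy to spot. The sketch below is not an SDMX implementation: the class name `WaterStatisticsMetadata`, the field names, the subset of elements chosen and the sample values are all assumptions made for this example.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class WaterStatisticsMetadata:
    """Hypothetical record holding a subset of the metadata elements above."""
    contact: Optional[str] = None
    metadata_update: Optional[str] = None  # date as text, e.g. "2008-11-04"
    statistical_presentation: Optional[str] = None
    release_calendar_policy: Optional[str] = None
    institutional_framework: Optional[str] = None
    statistical_concepts: Optional[str] = None
    scope_and_coverage: Optional[str] = None
    source_data: Optional[str] = None
    data_validation: Optional[str] = None


def missing_elements(record: WaterStatisticsMetadata) -> List[str]:
    """Return the names of the elements that have not been filled in."""
    return [f.name for f in fields(record) if not getattr(record, f.name)]


# Illustrative values only; none of them come from a real dataset.
record = WaterStatisticsMetadata(
    contact="statistics@example.org",
    metadata_update="2008-11-04",
    source_data="Annual survey of water suppliers (illustrative)",
)
print(missing_elements(record))
```

A check of this kind only shows whether an element has been filled in, not whether its description is adequate.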
Section C – Dimensions of data quality
Box 8.1. Comparison between IMF Data Quality Assessment Framework, Eurostat Quality Definition and OECD Quality Measurement Framework
IMF DQAF (incl. elements):
0. Prerequisites of quality
0.1 Legal and institutional environment
0.2 Resources
0.3 Relevance
0.4 Other quality measurement
1. Assurance of integrity
1.1 Professionalism
1.2 Transparency
1.3 Ethical standards
2. Methodological soundness
2.1 Concept and definitions
2.2 Scope
2.3 Classification/Sectorisation
2.4 Basis for recording
3. Accuracy and reliability
3.1 Source data
3.2 Assessment of source data
3.3 Statistical techniques
3.4 Assessment and validation of intermediate data and statistical outputs
3.5 Revision studies
4. Serviceability
4.1 Periodicity and timeliness
4.2 Consistency
4.3 Revision policy and practice
5. Accessibility
5.1 Data accessibility
5.2 Metadata accessibility
5.3 Assistance to users
Institutional and organisational arrangements
Core statistical process
Statistical products
Eurostat:
Relevance
Comparability across countries
Accuracy
Timeliness and punctuality
Coherence
Accessibility and clarity
OECD:
Relevance
Credibility
Interpretability
Accuracy
Timeliness
Coherence
Accessibility
Source: Laliberte, Grunewald and Probst (2003): Data Quality: A Comparison of IMF’s Data Quality Assessment Framework (DQAF) and Eurostat’s Quality Definition. Available from http://www.oecd.org/dataoecd/26/3/17831984.pdf
• Prerequisites of quality
• Accessibility
• Accuracy
• Coherence
• Credibility
• Relevance
• Timeliness
Prerequisites of quality
These are all the institutional and organisational conditions that have an impact on the quality of water statistics. These include:
• The legal basis for compilation of data; adequacy of data sharing and coordination among data producing agencies;
• Assurance of confidentiality of data provided by data producers and respondents;
• Adequacy of human, financial, and technical resources for implementation of water statistics programmes and implementation of measures to ensure their efficient use;
• Quality awareness by staff and data producers.
Accessibility
This includes:
• The ease with which the existence of information can be ascertained;
• The suitability of the form (e.g. standard tables or water indicators);
• The media (e.g. web or paper publications) of dissemination through which the information can be accessed;
• Availability of metadata;
• Existence of user support services and an advance release calendar.
Accuracy
The degree to which the data correctly estimate the data items.
Accuracy has many attributes and in practice there is not a single aggregate or overall measure of accuracy. In general, it is characterized in terms of errors in statistical estimates and is traditionally decomposed into bias (systematic error) and variance (random error).
Accuracy depends upon the quality of data collected and the processes undertaken by statistical offices to reduce errors at all stages of the data collection process.
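For a single estimate, the decomposition into bias and variance mentioned above is commonly written in terms of the mean squared error. The notation below is an assumption introduced for illustration: \(\hat{\theta}\) is the estimate of a data item and \(\theta\) its true value.

```latex
\[
\mathrm{MSE}(\hat{\theta})
  = \mathbb{E}\big[(\hat{\theta} - \theta)^2\big]
  = \underbrace{\big(\mathbb{E}[\hat{\theta}] - \theta\big)^2}_{\text{bias}^2 \text{ (systematic error)}}
  + \underbrace{\mathbb{E}\big[(\hat{\theta} - \mathbb{E}[\hat{\theta}])^2\big]}_{\text{variance (random error)}}
\]
```

This summarises the error of one estimate only; it does not provide an overall measure of accuracy across a set of water statistics.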
Coherence
This reflects the degree to which the data are logically connected and mutually consistent, i.e. they can be successfully brought together with other statistical information.
The use of standard concepts, classifications and statistical populations promotes coherence, as does the use of common methodology across water data collections.
Coherence does not necessarily imply full numerical consistency.
Coherence has four important sub-dimensions: coherence within a dataset, coherence across datasets, coherence over time, and coherence across countries.
Credibility
This refers to the confidence that users have in the producers of the data. Users’ confidence is built over time. One important aspect is trust in the objectivity of the data. That is, the data are perceived to be produced professionally in accordance with appropriate statistical standards, such as the SEEAW and IRWS, and policies and practices are perceived to be transparent. For example, data should not be released in response to political pressure.
Relevance
This reflects the degree to which the data meet the needs of users.
Measuring relevance requires identification of user groups and their needs.
Some indicators of relevance are:
• The use of data by key users;
• The number of requests for data by all users;
• The results of user satisfaction surveys.
Timeliness
• This refers to the amount of time between the end of the reference period and the date on which the data are released.
• The timeliness of information influences its relevance.
• Often timeliness is a trade-off against accuracy.
• Timeliness is related to the existence of a publication schedule. A publication schedule comprises a set of release dates or may involve a commitment to release water data within a prescribed time period from their receipt.
• Punctuality is the amount of time between the announced release date and the actual release date.
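The two time intervals described above can be computed directly from dates. The following is a minimal sketch; the function names and the sample dates are assumptions chosen for illustration and do not come from any real release calendar.

```python
from datetime import date


def timeliness_days(reference_period_end: date, actual_release: date) -> int:
    """Days between the end of the reference period and the actual release."""
    return (actual_release - reference_period_end).days


def punctuality_days(announced_release: date, actual_release: date) -> int:
    """Days between the announced and the actual release (positive means late)."""
    return (actual_release - announced_release).days


# Illustrative dates only.
reference_period_end = date(2007, 12, 31)
announced_release = date(2008, 9, 30)
actual_release = date(2008, 10, 15)

print(timeliness_days(reference_period_end, actual_release))  # 289
print(punctuality_days(announced_release, actual_release))    # 15
```

In this example the data are released 289 days after the end of the reference period and 15 days after the announced date.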
Data collection process
1. Identify
2. Review
3. Collect
4. Compile
5. Disseminate
Section D – Data quality assessment framework
0. Prerequisites for data quality
0.1. Institutional arrangements support the development of water statistics. (Prerequisite)
0.2. Legal arrangements support the development of water statistics. (Prerequisite)
0.3. The production and dissemination of water statistics are guided by professional principles, policies and practices. (Prerequisite)
0.4. Staff, facilities, computing resources, and financing are commensurate with statistical programs. (Prerequisite)
0.5. Data quality is considered at all stages of statistical development. (Prerequisite)
1. Identifying what information to produce
1.1. Mechanisms are in place to identify new and emerging water information needs. (Relevance)
1.2. Data items are identified and selected based on information needs. (Relevance)
2. Reviewing existing water data
2.1. Data quality is assessed against relevant data quality indicators and frameworks. (Accuracy)
2.2. Gaps in existing data and information have recently been identified and recorded (i.e. within the last 3 years). (Coherence)
2.3. Deficiencies with existing data and information (such as data quality issues) have recently been identified and recorded (i.e. within the last 3 years). (Coherence)
3. Selecting and collecting data
3.1. The choice of data sources and statistical techniques is informed solely by statistical considerations. (Credibility)
3.2. Frames are regularly updated. (Accuracy)
3.3. Data collections are designed and tested to ensure they collect relevant and accurate data. (Accuracy)
3.4. Data collections are conducted in a professional manner. (Accuracy)
4. Compiling information
4.1. Data are compiled using international statistical standards, guidelines and best practices. (Coherence)
4.2. Data are compiled using standard classifications. (Coherence)
4.3. Data are compiled using reliable statistical methods and procedures. (Accuracy)
4.4. Revisions are made when required. (Accuracy)
5. Disseminating information to users
5.1. Decisions about dissemination are informed solely by statistical considerations. (Credibility)
5.2. Water statistics are disseminated to a range of audiences. (Accessibility)
5.3. Data dissemination includes information regarding water statistics publications and publication schedules. (Accessibility)
5.4. Data dissemination includes support services. (Accessibility)
5.5. The relevance and practical utility of water statistics are monitored. (Relevance)
5.6. Publications are published on time and on schedule. (Timeliness)
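One possible way to summarise an assessment against the framework is to record, for each statement, whether it is met and then report the share met per quality dimension. The sketch below assumes a simple yes/no judgement, which is not prescribed by the framework itself; the function name `share_met_by_dimension` and the sample results are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Tuple

# Hypothetical assessment results for a few framework statements.
# Key: statement number; value: (quality dimension, statement met?).
assessment: Dict[str, Tuple[str, bool]] = {
    "1.1": ("Relevance", True),
    "1.2": ("Relevance", False),
    "3.2": ("Accuracy", True),
    "4.2": ("Coherence", True),
    "5.6": ("Timeliness", False),
}


def share_met_by_dimension(results: Dict[str, Tuple[str, bool]]) -> Dict[str, float]:
    """Return, for each quality dimension, the share of statements that are met."""
    met = defaultdict(int)
    total = defaultdict(int)
    for dimension, is_met in results.values():
        total[dimension] += 1
        met[dimension] += int(is_met)
    return {dimension: met[dimension] / total[dimension] for dimension in total}


summary = share_met_by_dimension(assessment)
print(summary)
```

A summary of this kind only indicates where gaps cluster; it does not replace a qualitative review of each statement.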
1. Should there be standard metadata for water statistics?
2. If so, what should it be, and which of the metadata frameworks is the most appropriate starting point for water statistics?
3. Which data quality framework is the most appropriate starting point for water statistics?
4. Should we develop a data quality assessment framework as part of the IRWS, or should this be part of the compilation guidelines?