Recommendations on the Impact of Metadata Quality in the Statistical Data Warehouse

Title: Recommendations on the Impact of Metadata Quality in the Statistical Data Warehouse
WP: 1
Deliverable: 1.2
Version: 0.3
Date: 3-01-2013
Author: Collin Bowler, Michel Lindelauf, Jos Dressen
NSI: ONS, CBS

ESSnet ON MICRO DATA LINKING AND DATA WAREHOUSING IN PRODUCTION OF BUSINESS STATISTICS
Recommendations on the Impact of Metadata Quality in the
Statistical Data Warehouse
1. Introduction
Quality of metadata (and data) can sometimes be difficult to define in an unambiguous manner, and in the context of a Statistical Data Warehouse (SDWH) this is no different. In this document, we are specifically interested in the quality of metadata used in the SDWH.
So what is the definition of ‘Quality’?
- A general definition which can be used is ‘fitness for use, or purpose’.
- ISO 9000:2005 defines quality as the ‘degree to which a set of inherent characteristics fulfils requirements’.
‘Fitness for use’ is a relative definition, allowing for various perspectives on what constitutes quality,
depending on the intended uses of the metadata (and indeed the intended uses of the data to which
the metadata refers).
Also, the degree of quality indicates that there will be a set of acceptable quality levels associated
with the characteristics, or dimensions, which the metadata must satisfy in order to be fit for use.
2. Quality measure or quality indicator?
Quality measures are defined as items that directly measure a particular aspect of quality. For example, the time lag from the reference date to the release of the output is a direct measure. However, in practice many quality measures can be difficult or costly to calculate. Instead, the use of quality indicators can give an insight into quality.
Quality indicators usually consist of information that is a by-product of the statistical process. They do not measure quality directly but can provide enough information to give an insight into quality. (ONS, 2007)
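To illustrate the distinction, the sketch below computes a direct timeliness measure alongside an indicator derived as a by-product of processing. It is a minimal Python sketch; the function names and figures are hypothetical, not taken from the ONS guidelines.

```python
from datetime import date

def timeliness_measure(reference_date: date, release_date: date) -> int:
    # Direct quality measure: the time lag (in days) from the reference
    # date to the release of the output.
    return (release_date - reference_date).days

def edit_failure_indicator(records_failed: int, records_total: int) -> float:
    # Quality indicator: a by-product of the statistical process. The share
    # of records failing automatic edit checks does not measure output
    # quality directly, but gives an insight into it.
    return records_failed / records_total

print(timeliness_measure(date(2012, 3, 31), date(2012, 6, 15)))  # 76 (days)
print(edit_failure_indicator(120, 4800))                         # 0.025
```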
3. Types of Statistical Metadata
Lars-Goran Lundell (2012) defines three main metadata categories in use in the SDWH, and also states that any item of metadata will normally fit into each of these categories:
- Active/Passive
- Formalised/Free-form
- Structural/Reference.
Active metadata enables operational use, driving the processes within the SDWH (e.g. scripts/triggers that carry out activities on the data/metadata), whereas Passive metadata does not act upon the data/metadata within the system (e.g. quality reports, documentation etc.).
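To make the active/passive distinction concrete, the following minimal sketch stores a validation rule as a piece of active metadata and executes it against incoming data. The rule and variable names are invented for illustration; a quality report describing the same data would be passive metadata.

```python
# Active metadata: a machine-actionable rule that drives processing in the
# SDWH. Passive metadata (e.g. a quality report) would merely describe it.
active_rules = {
    # Hypothetical edit rule: turnover must be present and non-negative.
    "turnover": lambda value: value is not None and value >= 0,
}

record = {"turnover": -50}  # an incoming (hypothetical) data record

for variable, rule in active_rules.items():
    if not rule(record.get(variable)):
        print(f"Edit check failed for '{variable}': {record.get(variable)}")
```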
Formalised metadata would have some form of structure, e.g. classifications/code lists, whereas
free-form metadata might contain descriptive information, as in quality reports for example.
Structural metadata is generally thought of (especially in the statistical data world) as metadata which defines data, and generally helps the user ‘find, identify, access and utilise the data’ – for example, classification codes. Reference metadata, by contrast, describes the content and quality of the data, and is most usually associated with quality reports.
All of these categories of metadata could be subject to quality measurement, except perhaps the
quality report reference metadata which is itself a report on quality measurement.
4. International Standards for Metadata
There are some international standards and statistical models which apply to, or are concerned with, metadata, and quality characteristics are mentioned in some of them. Appendix B and Appendix C
provide more detail of specific standards available.
The ISO 11179 standard pertains to Metadata Registries (MDR); it has the data element as its fundamental concept and is concerned with the semantics around metadata definitions.
The Wikipedia definition of metadata registry is: “a central location in an organization where
metadata definitions are stored and maintained in a controlled method.”
Within an MDR, quality is monitored through the use of a registration status. The status records the
level of quality.
ISO 11179 states that the main purposes of monitoring metadata quality are:
- Monitoring adherence to rules for providing metadata for each attribute
- Monitoring adherence to conventions for forming definitions, creating names, and performing classification
- Determining whether an administered item still has relevance
- Determining the similarity of related administered items and harmonizing their differences
- Determining whether it is possible to ever get higher quality metadata for some administered items
The registration status records the level of quality for each administered item (i.e. an administered item’s level of conformance to the required standard), and the levels run, in increasing quality, from Candidate through Recorded, Qualified and Standard to Preferred Standard.
This is a rigorous evaluation process, and could be applied to different elements of metadata which are used for evaluation of specific quality dimensions, as appropriate to the scenario or use case (see below).
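As a rough illustration of how the registration levels could support such an evaluation, the sketch below encodes them as an ordered scale and checks an item against a required level. This is one possible encoding, not a normative part of ISO/IEC 11179.

```python
from enum import IntEnum

# The ISO/IEC 11179 registration status levels, ordered by increasing
# quality (level of conformance to the required standard).
class RegistrationStatus(IntEnum):
    CANDIDATE = 1
    RECORDED = 2
    QUALIFIED = 3
    STANDARD = 4
    PREFERRED_STANDARD = 5

def meets_required_level(status: RegistrationStatus,
                         required: RegistrationStatus = RegistrationStatus.QUALIFIED) -> bool:
    # An administered item is acceptable for a given use case only if its
    # recorded status has reached the level the use case requires.
    return status >= required

print(meets_required_level(RegistrationStatus.RECORDED))  # False
print(meets_required_level(RegistrationStatus.STANDARD))  # True
```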
5. Dimensions, or Characteristics, of Metadata Quality
When examining the dimensions to be used for assessing quality in the context of statistical processing, there are many available.
The European Statistical System (ESS) specifies dimensions relating to:
- Relevance
- Accuracy
- Timeliness and Punctuality
- Accessibility and Clarity
- Comparability
- Coherence
From Johanis (2002) comes the suggestion of a similar set of dimensions, originating from Statistics Canada’s Quality Assurance Framework (QAF) (2002):
- Relevance
- Accuracy
- Timeliness
- Accessibility
- Interpretability
- Coherence
Whilst these are seen as ‘static’ quality dimensions, the QAF also defines some complementary quality aspects which are seen as ‘dynamic’:
- Non-Response
- Coverage
- Sampling
Bruce & Hillman (2004), discussing metadata quality within the digital library context, suggest seven similar dimensions, one of which, ‘Provenance’, is additional to the sets above:
- Completeness
- Accuracy
- Provenance
- Conformance to expectations
- Logical consistency and coherence
- Timeliness
- Accessibility
From Daas & van Nederpelt (2010), the dimensions thought appropriate to metadata in the context of ‘secondary data sources’ (i.e. mainly non-survey sources) are:
- Clarity (encompassing coherence)
- Comparability (encompassing linkability, replaceability and uniqueness)
- Completeness (encompassing coverage, detailedness, availability, relevance, selectivity and size)
- Confidentiality
- Correctness (encompassing accuracy, authenticity, and reliability)
- Stability
- Timeliness (encompassing punctuality)
Daas & Ossen (2011) propose that, when evaluating the metadata quality of secondary data sources, the use of ‘hyperdimensions’ is appropriate. These group several metadata quality characteristics together to give an overall quality assessment for a data source.
So which set of dimensions do we use when assessing metadata quality throughout the SDWH?
There does not seem to be any conclusive guidance around this specific issue.
Most sets of dimensions quoted in statistical quality frameworks appear to be aimed specifically at statistical outputs from a data perspective, rather than at metadata. When we examine the detail of the dimensions, we can see that some do touch on metadata: when considering Timeliness, for example, in an examination of the period to which data pertains compared to the period for which data is required, the period information itself would be considered metadata. However, measurement of the Timeliness aspect is still a quality attribute which relates to the data itself rather than to the metadata.
Perhaps the best approach is to use whatever dimensions are appropriate for a particular scenario or use case.
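If dimensions are selected per scenario, even a simple lookup from use case to the dimensions to be evaluated can make the choice explicit. The mapping below is purely illustrative; any real mapping would be agreed per institution and per output.

```python
# Hypothetical mapping from SDWH use cases to the metadata quality
# dimensions evaluated for each.
dimensions_by_use_case = {
    "assess_admin_source": ["Relevance", "Completeness", "Provenance"],
    "prototype_new_output": ["Relevance", "Timeliness", "Coherence"],
    "dataset_search": ["Accuracy", "Accessibility"],
}

print(dimensions_by_use_case["dataset_search"])  # ['Accuracy', 'Accessibility']
```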
6. Application of Quality Characteristics to Metadata in the Layers
The importance of the various quality characteristics when assessing the quality of different metadata
will vary depending upon a set of criteria which includes (but is not necessarily limited to):
(1) the layer of the SDWH in which the evaluation needs to take place;
(2) the source of the metadata (e.g. it may accompany the data provided/collected, or may be entered separately); and
(3) the use to which the data associated with the metadata is to be put.
Examples of how quality dimensions may be applied in the layers
Source layer
The assessment of the quality of data concepts, definitions and classifications of the administrative populations and variables will determine the relevance of this data for use within an output. Whereas a statistical institution can adjust the concepts, definitions and classifications used in its own surveys to meet user needs, the institution usually has little or no influence over those used by administrative sources. Hence the presence of metadata containing sufficiently accurate descriptions of the concepts can assist the decision of whether the source data meets those needs.
All metadata made available by an administrative data supplier should be described, along with those metadata items which are missing. A description should include how the missing metadata affect the ability to assess the fitness for purpose of the administrative data. The completeness of the information would be used to determine whether the users can make appropriate use of the data. Links to appropriate metadata ensure that this information is accessible.
For example, if data for a variable from an external data source has an accompanying description of simply ‘Sales’, then the metadata would fail the quality requirements of an output requiring aggregation of variable values with a more specific description, such as ‘Sales - excluding VAT’. In this instance, because of the quality of the metadata, this piece of data will be overlooked for this particular output, even though the variable might actually have represented ‘Sales excluding VAT’ but did not expressly say so in the description.
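One way to operationalise such a check is to compare the supplied variable description with the description the output requires. The sketch below uses naive string matching on the hypothetical descriptions from the example; a production system would match against a controlled vocabulary instead.

```python
# Hypothetical descriptions: one supplied with an external variable,
# one required by the output being compiled.
supplied_description = "Sales"
required_description = "Sales - excluding VAT"

def description_meets_requirement(supplied: str, required: str) -> bool:
    # Naive check: the supplied description must match the required one
    # exactly (ignoring case and surrounding whitespace). A production
    # system would match against a controlled vocabulary instead.
    return supplied.strip().lower() == required.strip().lower()

if not description_meets_requirement(supplied_description, required_description):
    print("Variable overlooked for this output: description not specific enough")
```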
In another scenario, the measure of a provenance characteristic might be important when assessing
the usefulness of a particular metadata item in relation to its source. This could be used as part of the
quality assessment as to whether a particular piece of micro data could (or should) be used to
contribute to an output. For example, if a piece of data arrives at the SDWH from an administrative
data source which is known to have previously supplied unreliable or inaccurate measure data
associated with a variable, this can be used as a quality evaluation when carrying out a selection of
the data which will be used to contribute to a particular output.
Integration layer
Many of the issues relating to the quality of metadata in the source layer are relevant to the integration layer also. This is where an examination of the quality and status of the metadata relating to prospective data takes place before its inclusion in the integration process. In addition, in this layer we would expect processes such as editing, imputation and classification/coding to take place, often carried out by automated scripts. The assessment of the quality of these scripts (which are actually Active metadata) would be particularly important.
Interpretation and Analysis layer
When generating or prototyping a potential new output, the user will need to check whether data
exists for the statistical concept(s) that they are measuring. This would include a quality check of
descriptions of the statistical measure, the population, variables, statistical unit types, domains and
time reference. Quality checking this metadata would give users an understanding of the relevance of the input data to their needs (for example, whether the output covers their required population or time period).
Access layer
In the scenario of carrying out a search for valid datasets via some form of data explorer, the entry
into the search engine of valid search criteria is obviously very important if the appropriate datasets
are to be found. This means that any metadata entered as part of the search criteria must have an acceptable level of quality in order for the search to be successful. For example, if the user enters a value of ‘201203’ as the reference period of the data they require, but the metadata is held in the SDWH in the form ‘2012Q1’, then the search will fail. Metadata quality checks need to be carried out on the correctness (or accuracy) of the metadata entered by the user.
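A pragmatic mitigation is to normalise user-entered reference periods to the form held in the SDWH before the search is run. The sketch below converts a monthly code such as ‘201203’ into the quarterly form ‘2012Q1’; both format conventions are assumptions for illustration, not prescribed by this document.

```python
import re

def normalise_reference_period(user_input: str) -> str:
    # Convert a monthly code like '201203' to the quarterly form '2012Q1'
    # assumed to be used in the SDWH; pass through values that are
    # already quarterly.
    if re.fullmatch(r"\d{4}Q[1-4]", user_input):
        return user_input
    match = re.fullmatch(r"(\d{4})(0[1-9]|1[0-2])", user_input)
    if match:
        year, month = match.group(1), int(match.group(2))
        quarter = (month - 1) // 3 + 1
        return f"{year}Q{quarter}"
    raise ValueError(f"Unrecognised reference period: {user_input!r}")

print(normalise_reference_period("201203"))  # 2012Q1
```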
7. Acceptable Quality Levels
These are threshold values which indicate acceptability following the application of quality measurements to the metadata, for each of the appropriate quality dimensions.
These levels could conceivably change depending upon the quality requirements of particular outputs
or processes.
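In practice this amounts to comparing each measured dimension against a per-output threshold, as in the following minimal sketch with invented threshold and measurement values.

```python
# Hypothetical acceptable quality levels per dimension for one output;
# a different output or process could set different values.
acceptable_levels = {"completeness": 0.95, "accuracy": 0.90}
measured = {"completeness": 0.97, "accuracy": 0.88}

# Collect the dimensions whose measured value falls below the threshold.
failures = {dim: value for dim, value in measured.items()
            if value < acceptable_levels[dim]}
print(failures or "All quality dimensions acceptable")  # {'accuracy': 0.88}
```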
8. Metadata Quality Management
Should we be concerned about the management of metadata quality?
Some aspects of the Quality Management principles (ISO 9000) should be applied to metadata quality management. In particular, the following principles seem especially relevant to the SDWH environment:
- Customer focus: Organizations depend on their customers and therefore should understand current and future customer needs, should meet customer requirements and strive to exceed customer expectations;
- Process approach: A desired result is achieved more efficiently when activities and related resources are managed as a process;
- System approach to management: Identifying, understanding and managing interrelated processes as a system contributes to the organization's effectiveness and efficiency in achieving its objectives.
This indicates that the processes surrounding the SDWH should encompass quality management
processes. For example, it would be expected that a customer, such as an expert user who is
carrying out some detailed analysis process, will have access to a system which will provide all the
information required by the user relating to the metadata, including some form of mechanism for
feeding back any information relating to the metadata quality which might come to light as a result of
the process being carried out by the user.
9. References
Lars-Goran Lundell (2012) – Metadata Framework for Statistical Data Warehousing (ESSnet project on Statistical Data Warehouse)
International Standard ISO 9000:2005 – Quality Management Systems – Fundamentals and vocabulary
International Standard ISO/IEC 11179 – Information Technology – Metadata Registries (Parts 1 – 6)
Office for National Statistics (2007) – Guidelines for Measuring Statistical Quality – Published by Her
Majesty’s Stationery Office (HMSO) – now ‘The Stationery Office’ - for the Office for National Statistics
Paul Johanis (2002) - Assessing the Quality of Metadata. Statistics Canada presentation at the work
session on METIS, 6-8 March 2002, Luxembourg
Statistics Canada - Statistics Canada’s Quality Assurance Framework (2002)
Thomas R. Bruce & Diane I. Hillman (2004) – The Continuum of Metadata Quality: Defining, Expressing, Exploiting. In Metadata in Practice (pp. 238-256). Chicago: ALA
Piet J.H. Daas and Peter W.M. van Nederpelt (2010) - Application of the object oriented quality
management model to secondary data sources – Statistics Netherlands
Piet J.H. Daas and Saskia J.L. Ossen (2011) – Metadata Quality Evaluation of Secondary Data
Sources - Statistics Netherlands. Presented at the 5th International Quality Conference, May 20th
2011
Appendix A – Quality Dimension Definitions
Quality Assurance Framework – Statistics Canada
Relevance: The relevance of statistical information reflects the degree to which it meets the real needs of clients. It is concerned with whether the available information sheds light on the issues of most importance to users. Assessing relevance is a subjective matter dependent upon the varying needs of users. The Agency’s challenge is to weigh and balance the conflicting needs of current and potential users to produce a program that goes as far as possible in satisfying the most important needs within given resource constraints.
Accuracy: The accuracy of statistical information is the degree to which the information correctly describes the phenomena it was designed to measure. It is usually characterized in terms of error in statistical estimates and is traditionally decomposed into bias (systematic error) and variance (random error) components. It may also be described in terms of the major sources of error that potentially cause inaccuracy (e.g., coverage, sampling, non-response, response).
Timeliness: The timeliness of statistical information refers to the delay between the reference point (or the end of the reference period) to which the information pertains, and the date on which the information becomes available. It is typically involved in a trade-off against accuracy. The timeliness of information will influence its relevance.
Accessibility: The accessibility of statistical information refers to the ease with which it can be obtained from the Agency. This includes the ease with which the existence of information can be ascertained, as well as the suitability of the form or medium through which the information can be accessed. The cost of the information may also be an aspect of accessibility for some users.
Interpretability: The interpretability of statistical information reflects the availability of the supplementary information and metadata necessary to interpret and utilize it appropriately. This information normally covers the underlying concepts, variables and classifications used, the methodology of data collection and processing, and indications of the accuracy of the statistical information.
Coherence: The coherence of statistical information reflects the degree to which it can be successfully brought together with other statistical information within a broad analytic framework and over time. The use of standard concepts, classifications and target populations promotes coherence, as does the use of common methodology across surveys. Coherence does not necessarily imply full numerical consistency.
ONS Guidelines for Measuring Statistical Quality
Relevance - The degree to which the statistical product meets user needs for both coverage and
content.
Accuracy - The closeness between an estimated result and the (unknown) true value.
Timeliness and Punctuality - Timeliness refers to the lapse of time between publication and the period
to which the data refer. Punctuality refers to the time lag between the actual and planned dates of
publication.
Accessibility and Clarity - Accessibility is the ease with which users are able to access the data. It also
relates to the format(s) in which the data are available and the availability of supporting information.
Clarity refers to the quality and sufficiency of the metadata, illustrations and accompanying advice.
Comparability - The degree to which data can be compared over time and domain.
Coherence - The degree to which data that are derived from different sources or methods, but
which refer to the same phenomenon, are similar.
Appendix B – International Standards relevant to Metadata
- ISO/IEC TR 20943 – Achieving Metadata Registry Content Consistency
http://metadata-stds.org/20943/index.html
First conclusion and summary:
This standard consists of six parts, some of which are still under development or on hold, but in our opinion it can provide the reader with useful information on the subject of metadata within an SDWH; we discussed several of its items briefly during the WP1 meeting in The Hague.
The purpose of ISO/IEC TR 20943-1:2003 is to describe a set of procedures for the consistent registration of data elements and their attributes in a registry. ISO/IEC TR 20943-1:2003 is not a data entry manual, but a user’s guide for conceptualizing a data element and its associated metadata items for the purpose of consistently establishing good quality data elements. An organization may adapt and/or add to these procedures as necessary. The scope of ISO/IEC TR 20943-1:2003 is limited to the associated items of a data element: the data element identifier, names and definitions in particular contexts, and examples; data element concept; conceptual domain with its value meanings; and value domain with its permissible values.
The purpose of ISO/IEC 20943-2 is to describe ways of representing XML-structured data in an ISO/IEC 11179-3 metadata registry (hereinafter referred to as "an 11179 MDR" or simply "an MDR"). XML structures may be mapped to, and represented by, one or more constructs in an MDR. ISO/IEC 11179-3:2003 does not explicitly specify how to represent XML structures, and practitioners have found more than one way to represent similar structures using the constructs defined by ISO/IEC 11179-3:2003. This part describes some possible representations of various XML structures, with some pros and cons of each, and techniques for mapping from one to another.
- ISO/IEC TR 9789:1994 – Guidelines for the organisation and representation of data elements for data interchange – coding methods and principles
http://www.iso.org/iso/home/store/catalogue_tc/catalogue_detail.htm?csnumber=17651
First conclusion and summary:
We are not sure whether this ISO standard can provide the reader with useful extra information, as it is not free: downloading the PDF document costs 98 CHF.
The ISO 9789 standard provides general guidance on the manner in which data can be expressed by codes. It describes the objectives of coding, the characteristics, advantages and disadvantages of different coding methods and the features of codes, and gives guidelines for the design of codes.
- ISO/IEC TR 14957:2010 – Representation of data element values: Notation of the format
http://www.iso.org/iso/home/store/catalogue_tc/catalogue_detail.htm?csnumber=55652
First conclusion and summary:
We are not sure whether this ISO standard can provide the reader with useful extra information, as it is not free: downloading the PDF document costs 50 CHF.
ISO/IEC 14957:2010 specifies the notation to be used for stating the format, i.e. the
character classes, used in the representation of data elements and the length of these
representations. It also specifies additional notations relative to the representation of
numerical figures. For example, this formatting technique might be used as part of the
metadata for data elements. The scope of ISO/IEC 14957:2010 is limited to graphic
characters, such as digits, letters and special characters. The scope is limited to the
basic datatypes of characters, character strings, integers, reals, and pointers.
- ISO/IEC TR 15452:2000 – Specification of Data Value Domain
http://webstore.iec.ch/p-preview/info_isoiec15452%7Bed1.0%7Den.pdf
First conclusion and summary:
According to the above hyperlink this document has been withdrawn and is therefore no longer valid.
- ISO/IEC 19763 – Framework for Metamodel Interoperability
http://www.iso.org/iso/iso_catalogue/catalogue_tc/catalogue_detail.htm?csnumber=38637
First conclusion and summary:
According to the above hyperlink this document has been withdrawn and is therefore no longer valid.
- ISO/IEC 24706 – Metadata for technical standards and specifications documents
http://www.iso.org/iso/home/store/catalogue_tc/catalogue_detail.htm?csnumber=56081
First conclusion and summary:
According to the above hyperlink this document is still under development, and therefore no summary is available yet. We think it is worth checking this standard again in the near future, but for now it is not useful for the project.
- ISO/IEC 19773 – Metadata registries (MDR) modules
http://www.iso.org/iso/iso_catalogue/catalogue_tc/catalogue_detail.htm?csnumber=41769
First conclusion and summary:
We are not sure whether this ISO standard can provide the reader with useful extra information, as it is not free: downloading the PDF document costs 196 CHF.
ISO/IEC 19773:2011 specifies small modules of data that can be used or reused in
applications. These modules have been extracted from ISO/IEC 11179-3, ISO/IEC
19763, and OASIS EBXML, and have been refined further. These modules are
intended to harmonize with current and future versions of the ISO/IEC 11179 series
and the ISO/IEC 19763 series. These modules include: reference-or-literal (reflit) for
on-demand choices of pointers or data; multitext, multistring, etc. for recording
internationalized and localized data within the same structure; slots and slot arrays for
standardized extensible data structures; internationalized contact data, including UPU
postal addresses, ITU-T E.164 phone numbers, internet E-mail addresses, etc.;
generalized model for context data based upon who-what-where-when-why-how
(W5H); data structures for reified relationships and entity-person-groups. Conformity
can be selected on a per-module basis.
- ISO/IEC 20944 – Metadata Registry Interoperability & Bindings (MDR-IB)
http://metadata-stds.org/20944/index.html
First conclusion and summary:
This standard consists of five parts, some of which are still under development, but in our opinion it can provide the reader with useful information on the subject of metadata within an SDWH; we discussed several of its items briefly during the WP1 meeting in The Hague. Further research is necessary to judge this.
The ISO/IEC 20944 family of standards is being developed to provide interoperability
among metadata registries (11179-3), such as reading/writing attributes from/to a
metadata registry. However, the ISO/IEC 20944 series may be used generically, such
as for applications that are unrelated to 11179-3 metadata registries, or applications
that extend 11179-3 metadata registry attributes (attributes outside of the 11179-3
specification).
Appendix C - Summary of ISO/IEC 11179
Introduction
International standards apply to metadata. Much work is being accomplished in the national and
international standards communities, especially ANSI (American National Standards Institute) and
ISO (International Organization for Standardization) to reach consensus on standardizing metadata
and registries.
The core standard is ISO/IEC 11179-1 and its subsequent parts. All registrations published so far according to this standard cover only the definition of metadata; they do not address the structuring of metadata storage or retrieval, nor any administrative standardisation. It is important to note that this standard refers to metadata as data about containers of data, and not to metadata (metacontent) as data about data contents. It should also be noted that this standard originally describes itself as a "data element" registry, describing disembodied data elements, and explicitly disavows the capability of containing complex structures. Thus the original term "data element" is more applicable than the later applied buzzword "metadata".
Intended purpose
Today, organizations often want to exchange data quickly and precisely between computer systems
using enterprise application integration technologies. Completed transactions are also often
transferred to separate data warehouse and business rules systems with structures designed to
support data for analysis. The industry de facto standard model for data integration platforms is the
Common Warehouse Metamodel (CWM). Data integration is often also solved as a data, rather than a metadata, problem, with the use of so-called master data. ISO/IEC 11179 claims that it is a standard for metadata-driven exchange of data in a heterogeneous environment, based on exact definitions of
data.
Structure of an ISO/IEC 11179 metadata registry
The ISO/IEC 11179 model is a result of two principles of semantic theory, combined with basic
principles of data modelling.
The first principle from semantic theory is the thesaurus type relation between wider and more narrow
(or specific) concepts, e.g. the wide concept "income" has a relation to the more narrow concept "net
income".
The second principle from semantic theory is the relation between a concept and its representation,
i.e. "buy" and "purchase" are the same concept even if different terms are used.
The basic principle of data modelling is the combination of an object class and a characteristic. For
example, "Person - hair color".
When applied to data modelling, ISO/IEC 11179 combines a wide "concept" with an "object class" to
form a more specific "data element concept". For example, the high-level concept "income" is
combined with the object class "person" to form the data element concept "net income of person".
Note that "net income" is more specific than "income".
The different possible representations of a data element concept are then described with the use of
one or more data elements. Differences in representation may be a result of the use of synonyms or
different value domains in different data sets in a data holding. A value domain is the permitted range
of values for a characteristic of an object class. An example of a value domain for "gender of person"
is "M = Male, F = Female, U = Unknown". The letters M, F and U are then the permitted values of
gender of person in a particular dataset.
The data element concept "monthly net income of person" may thus have one data element called
"monthly net income of individual by 100 dollar groupings" and one called "monthly net income of
person range 0-1000 dollars", etc., depending on the heterogeneity of representation that exists within
the data holdings covered by one ISO/IEC 11179 registry. Note that these two examples have
different terms for the object class (person/individual) and different value sets (a 0-1000 dollar range
as opposed to 100 dollar groupings).
The result of this is a catalogue of sorts, in which related data element concepts are grouped by a
high-level concept and an object class, and data elements grouped by a shared data element
concept. Strictly speaking, this is not a hierarchy, even if it resembles one.
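The structure described above can be rendered informally as simple records: one data element concept combining a concept with an object class, and two data elements that differ only in representation (value domain). This is an illustrative sketch, not the normative 11179-3 metamodel.

```python
# Informal rendering of the ISO/IEC 11179 structure described above:
# a data element concept combines a concept with an object class, and
# each data element adds a representation (value domain).
data_element_concept = {
    "concept": "net income",
    "object_class": "person",
    "name": "monthly net income of person",
}

data_elements = [
    {"concept": data_element_concept["name"],
     "name": "monthly net income of individual by 100 dollar groupings",
     "value_domain": "0, 100, 200, ... (100 dollar groupings)"},
    {"concept": data_element_concept["name"],
     "name": "monthly net income of person range 0-1000 dollars",
     "value_domain": "0-1000 dollars"},
]

for element in data_elements:
    print(element["name"], "->", element["value_domain"])
```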
It is worth noting that ISO/IEC 11179 proper does not describe data as it is actually stored. There is
no part of the model that caters to the description of physical files, tables and columns. All the
ISO/IEC 11179 constructs are "semantic" as opposed to "physical" or "technical".
Since the standard has two main purposes (definition and exchange), the core object is the data element concept: it defines a concept and, ideally, describes data independently of its representation in any one system, table, column or organisation.
The data element is the foundational concept in an ISO/IEC 11179 metadata registry. The purpose of the registry is to maintain a semantically precise structure of data elements.
Each data element in an ISO/IEC 11179 metadata registry:
- should be registered according to the Registration guidelines (11179-6);
- will be uniquely identified within the register (11179-5);
- should be named according to Naming and Identification Principles (11179-5) – see data element name;
- should be defined by the Formulation of Data Definitions rules (11179-4) – see data element definition; and
- may be classified in a Classification Scheme (11179-2) – see classification scheme.
Data elements that store "codes" or enumerated values must also specify the semantics of each of the code values with precise definitions.
Structure of the ISO/IEC 11179 standard
The standard consists of six parts:
- Part 1 – Framework
- Part 2 – Classification
- Part 3 – Registry metamodel and basic attributes
- Part 4 – Formulation of data definitions
- Part 5 – Naming and identification principles
- Part 6 – Registration
Part 1 explains the purpose of each part. Part 3 specifies the metamodel that defines the registry. The
other parts specify various aspects of the use of the registry.
11179-1: Framework
This part of ISO/IEC 11179 introduces and discusses fundamental ideas of data elements, value
domains, data element concepts, conceptual domains, and classification schemes essential to the
understanding of this set of standards and provides the context for associating the individual parts of
ISO/IEC 11179.
11179-2: Classification
This part of ISO/IEC 11179 provides a conceptual model for managing classification schemes. There
are many structures used to organize classification schemes and there are many subject matter areas
that classification schemes describe. So, this Part also provides a two-faceted classification for
classification schemes themselves.
11179-3: Registry metamodel and basic attributes
This part of ISO/IEC 11179 specifies a conceptual model for a metadata registry, and a set of basic
attributes for metadata for use when a full registry solution is not needed.
11179-4: Formulation of data definitions
This part of ISO/IEC 11179 provides guidance on how to develop unambiguous data definitions. A
number of specific rules and guidelines are presented in ISO/IEC 11179-4 that specify exactly how a
data definition should be formed. A precise, well-formed definition is one of the most critical
requirements for shared understanding of an administered item; well-formed definitions are imperative
for the exchange of information. Only if every user has a common and exact understanding of the
data item can it be exchanged trouble-free.
11179-5: Naming and identification principles
This part of ISO/IEC 11179 provides guidance for the identification of administered items.
Identification is a broad term for designating, or identifying, a particular data item. Identification can be
accomplished in various ways, depending upon the use of the identifier. Identification includes the
assignment of numerical identifiers that have no inherent meanings to humans; icons (graphic
symbols to which meaning has been assigned); and names with embedded meaning, usually for
human understanding, that are associated with the data item's definition and value domain.
11179-6: Registration
This part of ISO/IEC 11179 provides instruction on how a registration applicant may register a data
item with a central Registration Authority and the allocation of unique identifiers for each data item.
Maintenance of administered items already registered is also specified in this document.
Additional information
Classification scheme: 11179-2 (Wikipedia)
In metadata a classification scheme is a hierarchical arrangement of kinds of things (classes) or
groups of kinds of things. Typically it is accompanied by descriptive information of the classes or
groups. A classification scheme is intended to be used for an arrangement or division of individual
objects into the classes or groups. The classes or groups are based on characteristics which the
objects (members) have in common. In linguistics, the subordinate concept is called a hyponym of its superordinate. Typically a hyponym is 'a kind of' its superordinate (Keith Allan, Natural Language Semantics).
The ISO/IEC 11179 metadata registry standard uses classification schemes as a way to classify
administered items, such as data elements, in a metadata registry.
Some quality criteria for classification schemes are:
- Whether different kinds are grouped together; in other words, whether it is a grouping system or a pure classification system. In the case of grouping, a subset (subgroup) does not have (inherit) all the characteristics of the superset, which means that the knowledge and requirements about the superset are not applicable to the members of the subset.
- Whether the classes have overlaps.
- Whether subordinates may have multiple superordinates. Some classification schemes allow a kind of thing to have more than one superordinate; others don't. Multiple supertypes for one subtype imply that the subordinate has the combined characteristics of all its superordinates. This is called multiple inheritance (of characteristics from multiple superordinates to their subordinates).
- Whether the criteria for belonging to a class or group are well defined.
- Whether the kinds of relations between the concepts are made explicit and well defined.
- Whether subtype-supertype relations are distinguished from composition relations (part-whole relations) and from object-role relations.
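Some of these criteria can be checked mechanically if the scheme is represented as a graph. The sketch below detects subordinates with multiple superordinates (multiple inheritance) in a small hypothetical scheme.

```python
# A hypothetical classification scheme as a child -> parents mapping.
scheme = {
    "net income": ["income"],
    "gross income": ["income"],
    "dividend income": ["income", "investment return"],  # two superordinates
}

# Criterion: does any subordinate have multiple superordinates?
for child, parents in scheme.items():
    if len(parents) > 1:
        print(f"'{child}' has multiple superordinates: {parents}")
```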
Benefits of using classification schemes
Using one or more classification schemes for the classification of a collection of objects has many
benefits. Some of these include:
- It allows a user to find an individual object quickly on the basis of its kind or group.
- It makes it easier to detect duplicate objects.
- It conveys the semantics (meaning) of an object from the definition of its kind, a meaning that is not conveyed by the name of the individual object or its spelling.
- Knowledge and requirements about a kind of thing can be applied to the members of the kind.
Examples of kinds of classification schemes
The following are examples of different kinds of classification schemes. This list is in approximate
order from informal to more formal:
- thesaurus - a collection of categorized concepts, denoted by words or phrases, that are related to each other by narrower term, wider term and related term relations.
- taxonomy - a formal list of concepts, denoted by controlled words or phrases, arranged from abstract to specific, related by subtype-supertype relations or by superset-subset relations.
- data model - an arrangement of concepts (entity types), denoted by words or phrases, that have various kinds of relationships; typically, but not necessarily, representing requirements and capabilities for a specific scope (application area).
- network (mathematics) - an arrangement of objects in a random graph.
- ontology - an arrangement of concepts that are related by various well-defined kinds of relations; the arrangement can be visualized in a directed acyclic graph.
One example of a classification scheme for data elements is a representation term.
Data element definition 11179-4 (Wikipedia)
In metadata, a data element definition is a human readable phrase or sentence associated with a
data element within a data dictionary that describes the meaning or semantics of a data element.
Data element definitions are critical for external users of any data system. Good definitions can
dramatically ease the process of mapping one set of data into another set of data. This is a core
feature of distributed computing and intelligent agent development.
There are several guidelines that should be followed when creating high-quality data element
definitions.
Properties of clear definitions
A good definition is:
- Precise - The definition should use words that have a precise meaning. Try to avoid words that have multiple meanings or multiple word senses.
- Concise - The definition should use the shortest description possible that is still clear.
- Non-circular - The definition should not use the term you are trying to define in the definition itself. This is known as a circular definition.
- Distinct - The definition should differentiate a data element from other data elements. This process is called disambiguation.
- Unencumbered - The definition should be free of embedded rationale, functional usage, domain information, or procedural information.
A data element definition is a required property when adding data elements to a metadata registry.
Definitions should not refer to terms or concepts that might be misinterpreted by others or that have
different meanings based on the context of a situation. Definitions should not contain acronyms that
are not clearly defined or linked to other precise definitions.
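Two of these guidelines, non-circularity and the avoidance of unexplained acronyms, lend themselves to a mechanical first pass. The sketch below is a naive screening aid under those assumptions, not a substitute for editorial review.

```python
import re

def definition_warnings(term: str, definition: str) -> list:
    # Naive screening checks for two of the guidelines above: a definition
    # should not contain the term it defines (circularity), and it should
    # avoid acronyms that may be undefined.
    warnings = []
    if term.lower() in definition.lower():
        warnings.append("possibly circular: definition contains the term")
    if re.search(r"\b[A-Z]{2,}\b", definition):
        warnings.append("contains an acronym; check it is defined elsewhere")
    return warnings

print(definition_warnings("person", "A person."))                  # circular
print(definition_warnings("person", "An individual human being."))  # []
```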
If you are creating a large number of data elements, all the definitions should be consistent with
related concepts.
Critical Data Element - Not all data elements are of equal importance or value to an organization. A key metadata property of an element is its categorization as a Critical Data Element (CDE). This categorization provides focus for data governance and data quality. An organization often has various sub-categories of CDEs, based on use of the data, e.g.:
- Security Coverage - data elements that are categorized as personal health information (PHI) warrant particular attention for security and access.
- Marketing Department Usage - the Marketing department could have a particular set of CDEs identified for identifying unique customers or for campaign management.
- Finance Department Usage - the Finance department could have a different set of CDEs from Marketing, focused on data elements which provide measures and metrics for fiscal reporting.
Standards such as the ISO/IEC 11179 Metadata Registry specification give guidelines for creating precise data element definitions. Specifically, chapter four of the ISO/IEC 11179 metadata registry standard covers data element definition quality standards.
Using precise words
Common words such as play or run frequently have many meanings. For example, the WordNet database documents over 57 distinct meanings for the word "play", but only a single definition for the term dramatic play. Fewer definitions in a chosen word's dictionary entry is preferable: this minimizes misinterpretation related to a reader's context and background. The process of finding a good meaning of a word is called word sense disambiguation.
Examples of definitions that could be improved
Here is the definition of the "person" data element as defined in the www.w3c.org Friend of a Friend specification:
Person: A person.
Although most people do have an intuitive understanding of what a person is, the definition has much
room for improvement. The first problem is that the definition is circular. Note that this definition really
does not help most readers and needs to be clarified.
Here is the definition of the "Person" data element in the Global Justice XML Data Model 3.0:
person: Describes inherent and frequently associated characteristics of a person.
Note that once again the definition is still circular. Person should not reference itself. The definition
should use terms other than person to describe what a person is.
Here is a more precise but shorter definition of a person:
Person: An individual human being.
Note that it uses the word individual to state that this is an instance of a class of things called human being. Technically you might use "Homo sapiens" in your definition, but more people are familiar with the term "human being" than "Homo sapiens", so commonly used terms, if they are still precise, are always preferred.
Sometimes your system may have cultural norms and assumptions in the definitions. For example, if your "Person" data element tracked characters in a science fiction series that included aliens, you might need a more general term than human being.
Person: An individual of a sentient species.
Data element name 11179-5 (Wikipedia)
A data element name is a name given to a data element in, for example, a data dictionary or
metadata registry. In a formal data dictionary, there is often a requirement that no two data elements
may have the same name, to allow the data element name to become an identifier, though some data
dictionaries may provide ways to qualify the name in some way, for example by the application
system or other context in which it occurs.
In a database driven data dictionary, the fully qualified data element name may become the primary
key, or an alternate key, of a Data Elements table of the data dictionary.
The data element name typically conforms to ISO/IEC 11179 metadata registry naming conventions
and has at least three parts:
- Object, Property and Representation term.
Many standards require the use of Upper camel case to differentiate the components of a data
element name. This is the standard used by ebXML, GJXDM and NIEM.
Example of ISO/IEC 11179 naming in relational databases
ISO/IEC 11179 is applicable when naming tables and columns within a relational database.
Tables are Collections of Entities, and follow Collection naming guidelines. Ideally, a collective name
is used: e.g., Personnel. Plural is also correct: Employees. Incorrect names include: Employee,
tblEmployee, and EmployeeTable.
Columns are Properties of the Entity and are named in a multi-part format:
[Object] [Qualifier] Property RepresentationTerm
The Object part may be omitted from a name when the property is within its object's context. The
Qualifier is used when it is necessary to uniquely identify an element. For example, columns on the
WorkOrders table would be expressed as:
WorkOrder_Number
Requirements_Text
Requesting_Employee_Number
Approving_Employee_Number
For Requirements_Text, the full name (i.e., the name that goes in the registry, or data dictionary) is
WorkOrder_Requirements_Text.
 Object is WorkOrder in full name.
 Property is Requirements in full name.
 RepresentationTerm is Text in full name.
The Requesting_Employee_Number and Approving_Employee_Number columns have Qualifiers to
ensure that the data element names are unique and descriptive. The Object part of the element name
is also omitted because it is declared within the object context.
Note that for the examples provided, an underscore was used as a separator. A separator is not
mandated by ISO/IEC 11179 but is recommended.
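The multi-part format lends itself to mechanical validation. The sketch below checks hypothetical column names against the [Object] [Qualifier] Property RepresentationTerm pattern with underscore separators; the list of representation terms is an assumption for illustration, since ISO/IEC 11179 does not mandate a specific list.

```python
# Hypothetical set of allowed representation terms; ISO/IEC 11179 itself
# does not mandate a specific list or the underscore separator.
REPRESENTATION_TERMS = {"Number", "Text", "Name", "Code", "Date"}

def valid_column_name(name: str) -> bool:
    # A name must consist of underscore-separated CamelCase parts and
    # end with a recognised representation term.
    parts = name.split("_")
    if len(parts) < 2 or parts[-1] not in REPRESENTATION_TERMS:
        return False
    return all(part[:1].isupper() and part.isalnum() for part in parts)

for column in ["WorkOrder_Number", "Requesting_Employee_Number", "tblEmployee"]:
    print(column, valid_column_name(column))  # True, True, False
```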
Example of ISO/IEC 11179 name in XML
Users frequently encounter ISO/IEC 11179 when they are exposed to XML Data Element names that
have a multi-part Camel Case format:
Object [Qualifier] Property RepresentationTerm
The specification also includes normative documentation in appendices.
For example the XML element for a person's given (first) name would be expressed as:
<PersonGivenName>John</PersonGivenName>
where the Object is Person, the Property is Given and the Representation term is Name. In this case the optional Qualifier is not used.