GSIM and Metadata Structures

Problem Statement
Currently the GSIM model has no support for “reference” metadata, such as the metadata needed to
support quality frameworks and methodological notes.
SDMX has a Metadata Structure which is analogous to the Data Structure.
Model
Introduction
The Metadata Structure recognises that if we can build a generic model for structuring data then we
can do the same for metadata. The use case is clear: organisations will have different metadata that
need to be authored, stored, and consumed in the statistical process lifecycle. Like data, these
metadata are many and varied and cannot be harmonised as explicit classes and attributes in a
model. As with the Data Structure, harmonisation of the metadata is achieved outside of the model
by means of the constructs already in GSIM: Concepts, Variables, Classifications etc.
The diagram below shows the classes in the SDMX model but in terms of the GSIM model. Whilst the
way metadata are structured is analogous to the way the data are structured there are some
differences.
1. There is no “observation”.
2. The “identifiers” are called Identifiable Object Targets, and more than one can be used to
identify the object to which the metadata pertain. These are analogous to the Identifier
Components in the Data Structure model.
3. The metadata are structured using Metadata Attributes. These are analogous to the
Attribute Component in the Data Structure Model.
Depending on what changes are made to the Data Structure Model, the Identifiable Object Target and
the Metadata Attribute could be sub classes of the Identifier Component and the Attribute
Component respectively.
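The possible sub class relationship described above can be sketched in code. This is a minimal illustration only, not part of GSIM or SDMX: the Python class and field names are assumptions chosen to mirror the names used in this note, and the Data Structure classes are reduced to a single field.

```python
from dataclasses import dataclass


@dataclass
class IdentifierComponent:
    """Stand-in for the Data Structure Identifier Component (simplified)."""
    name: str


@dataclass
class AttributeComponent:
    """Stand-in for the Data Structure Attribute Component (simplified)."""
    name: str


@dataclass
class IdentifiableObjectTarget(IdentifierComponent):
    """Identifies the object to which the metadata pertain."""
    object_type: str  # e.g. "Dataflow"; field name is an assumption


@dataclass
class MetadataAttribute(AttributeComponent):
    """Structures the reported metadata."""
    concept: str  # name of the linked Concept; field name is an assumption


# Hypothetical instances, for illustration only.
target = IdentifiableObjectTarget(name="TARGET", object_type="Dataflow")
attribute = MetadataAttribute(name="NOTE_1", concept="NOTE_1")
```

Under this arrangement, anything written against the Data Structure components would also accept the metadata components, which is the practical effect of the sub classing suggested above.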
Diagram
[Class diagram: Metadata_Structure. Classes shown: MetadataStructure, MetadataTarget,
MetadataReportStructure, TargetObject, IdentifiableObjectTarget (attribute: objectType),
MetadataAttribute, Concept, Concepts::RepresentedVariable, Concepts::ValueDomain. Named
associations: structureFor, hierarchy.]
Diagram note: “Specifies that the metadata reported will pertain to a specific identifiable object
type e.g. Dataflow.”
Diagram note: “Restricts the types of object that are valid for which metadata can be reported
using this Metadata Target.”
Explanation
Metadata Structure
The purpose of a Metadata Structure is to:
1. Specify the type of object(s) to which metadata are to be attached: this could be any of the
structures in the GSIM model, or even a data key or observation (e.g. the identifiers of the
Data Point).
2. Specify what is allowed to be reported in a Metadata Set in terms of Metadata Attributes.
Note that the Identifiable Object Target is at present the only sub class of Target Object, but there
will probably be other types of “target”, so the Target Object is modelled as an abstract class.
The Metadata Structure can comprise multiple targets. For instance, the Identifiable Object Target
may be restricted to Design Context, in which case the Value Domain for the target object will
be restricted to a single value in a specific Enumerated Value Domain which lists all of the possible
Identifiable Artefacts. Another target could allow any of Variable, Represented Variable, and
Instance Variable.
The Metadata Report Structure specifies the Metadata Attributes that will comprise the metadata
reported in the Metadata Set. There can be many Metadata Report Structures in the Metadata
Structure; each one must be linked to at least one Metadata Target, and one Metadata Report
Structure may be linked to many Metadata Targets.
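The relationships between Metadata Structure, Metadata Target and Metadata Report Structure described above can be sketched as follows. This is an illustrative sketch under stated assumptions, not a normative rendering of the model: all Python names, the `validate` helper, and the identifiers `DESIGN_TARGET`, `VARIABLE_TARGET`, `REPORT`, `NOTE_1` and `NOTE_2` are hypothetical, and the restricting Value Domain is reduced to a plain set of object type names.

```python
from dataclasses import dataclass, field


@dataclass
class IdentifiableObjectTarget:
    name: str
    allowed_object_types: frozenset  # stands in for the restricting Value Domain


@dataclass
class MetadataTarget:
    name: str
    target_objects: list  # one or more IdentifiableObjectTarget instances


@dataclass
class MetadataReportStructure:
    name: str
    attribute_names: list  # names of the Metadata Attributes to be reported
    target_names: list     # Metadata Targets this report is a structure for


@dataclass
class MetadataStructure:
    targets: dict = field(default_factory=dict)           # name -> MetadataTarget
    report_structures: list = field(default_factory=list)

    def validate(self):
        """Return a list of linkage problems; an empty list means none found."""
        problems = []
        for rs in self.report_structures:
            if not rs.target_names:
                problems.append(f"{rs.name}: not linked to any Metadata Target")
            for target_name in rs.target_names:
                if target_name not in self.targets:
                    problems.append(
                        f"{rs.name}: unknown Metadata Target {target_name!r}"
                    )
        return problems


# Two targets: one restricted to a single object type, one allowing several.
design_target = MetadataTarget(
    "DESIGN_TARGET",
    [IdentifiableObjectTarget("OBJECT", frozenset({"Design Context"}))],
)
variable_target = MetadataTarget(
    "VARIABLE_TARGET",
    [IdentifiableObjectTarget(
        "OBJECT", frozenset({"Variable", "Represented Variable"})
    )],
)

# One report structure linked to both targets.
structure = MetadataStructure(
    targets={t.name: t for t in (design_target, variable_target)},
    report_structures=[
        MetadataReportStructure(
            "REPORT", ["NOTE_1", "NOTE_2"], ["DESIGN_TARGET", "VARIABLE_TARGET"]
        )
    ],
)
print(structure.validate())  # prints []
```

Report structures refer to targets by name here only to keep the sketch short; object references would serve equally well.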
Metadata Set
The metadata are reported in a Metadata Set. The Metadata Set (not modelled) will contain:
1. The identity of the specific object(s) to which the metadata report pertains.
2. The metadata report comprising the metadata values for the Metadata Attributes.
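Since the Metadata Set is not modelled, the following is only a rough sketch of how the two parts listed above might be held together and checked against a Metadata Structure. The `MetadataReport` class, the `check_report` function, the object identifier `LFS` and the attribute names `NOTE_1`, `NOTE_2` and `NOTE_3` are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class MetadataReport:
    object_type: str  # type of the object the report pertains to
    object_id: str    # identity of that specific object
    values: dict      # Metadata Attribute name -> reported value


def check_report(report, allowed_object_types, allowed_attributes):
    """Return a list of problems found in one report; empty if none."""
    problems = []
    if report.object_type not in allowed_object_types:
        problems.append(
            f"object type {report.object_type!r} not allowed by the target"
        )
    for name in report.values:
        if name not in allowed_attributes:
            problems.append(f"attribute {name!r} not in the report structure")
    return problems


# A report attached to a Dataflow, using hypothetical attribute names.
report = MetadataReport("Dataflow", "LFS", {"NOTE_1": "free text"})
print(check_report(report, {"Dataflow"}, {"NOTE_1", "NOTE_2"}))  # prints []
```

The check covers only the two constraints stated in this note: the object type must be permitted by the target, and each reported value must belong to a Metadata Attribute in the report structure.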
Example
Quality metadata can be expressed in a Metadata Structure and reported in a Metadata Set. The
following metadata attributes are used by the Eurostat ESMS (Euro SDMX Metadata Structure).
An example of part of a report is:
In the SDMX Information Model these metadata relate to a Dataflow (Labour Force Survey). So, the
target object is a Dataflow constrained by the values in an SDMX Category Scheme (note, this is not
the same as a GSIM Category Scheme). In SDMX the Metadata Attribute is linked to a Concept, and its
value domain is specified in the Metadata Attribute itself and not via a Represented Variable (as
Variables do not exist independently in SDMX).
The actual Metadata Structure for this part of the LFS report (which was authored using SDMX) is
shown below as it relates to the constructs in the model presented above.
An extract of the XML (SDMX) of the Metadata Set is shown below.