

Scenario a(ii)

Making content standards available for (re)use by the SDMX and DDI communities

Summary

A major focus in this case is making content standards (eg recommended question wording, coding methods and “input to output data transformations”) associated with “standard variables” available in a machine interpretable form. This helps others to make active use – in an efficient, accurate and consistent manner – of the standardised content when designing and undertaking their own data collection processes.

A secondary consideration is “post processing” of existing statistical data using the standardised content. For example, if data is available coded to the base (most detailed) level of a standard classification, standardised content related to that classification might be used to

drive aggregation/summarisation of the data to a higher (less detailed) level of categorisation within the classification, or

map the data to another standard classification which has a documented “correspondence” (sometimes termed “concordance”) to the original classification
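
The two kinds of post processing listed above can be sketched in a few lines of Python. This is a minimal illustration only: the codes, the parent hierarchy and the correspondence table are all hypothetical, and the correspondence is assumed to be one-to-one or many-to-one (a real correspondence may split a code across several targets and need weights).

```python
from collections import defaultdict

# Hypothetical base-level codes and their parent categories
# (illustrative only; not taken from any published classification).
PARENT = {"011": "01", "012": "01", "021": "02"}

# Hypothetical correspondence from the original classification to another one.
# Assumed many-to-one here; real correspondences can be one-to-many.
CORRESPONDENCE = {"011": "A1", "012": "A1", "021": "B7"}


def recode(counts, mapping):
    """Sum counts after recoding each base-level code through `mapping`."""
    out = defaultdict(int)
    for code, value in counts.items():
        if code not in mapping:
            # Fail loudly rather than silently dropping unmapped data.
            raise KeyError(f"no mapping for code {code!r}")
        out[mapping[code]] += value
    return dict(out)


# Data coded to the base (most detailed) level.
base_counts = {"011": 120, "012": 30, "021": 45}

aggregated = recode(base_counts, PARENT)       # roll up to the higher level
mapped = recode(base_counts, CORRESPONDENCE)   # map to the other classification
```

The same function serves both cases because, in this simplified form, aggregation and correspondence mapping are both lookups from a detailed code to a target code. If the mappings were published in machine interpretable form, tables such as `PARENT` and `CORRESPONDENCE` could be loaded rather than typed in by hand.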

For reasons outlined in Scenario a(i), such application of standardised content may be by agencies that use SDMX or by agencies that use DDI.

Premise

Many NSIs are responsible for developing and promoting various "content standards" on a national basis.

In the case of the ABS, the most complex examples are termed "Standard Variables". ABS Standard Variables (eg Age) approach standardisation in regard to matters such as

underlying concepts,

classifications and coding used,

methods of collection (including questions to be used), and

standard output categories (and the relationship between these and the input categories/data)

The information presented in regard to ABS Standard Variables corresponds to multiple types of objects within the SDMX and DDI information models.

At present the ABS publishes this information only in a semi-structured form designed to be read by people. It does not, for example, provide an XML-based (or other) encoding that systems can use directly.

Users external to the ABS may have enough information to create content usable by their own systems, for example by manipulating and transforming the Excel-based representations of some standard classifications that the ABS currently makes available. This is, however, a time-consuming and potentially error-prone process (the external user may have an incomplete understanding of, eg, which codes should be used for which purpose). It can act as a disincentive to (re)using the standard at all, and a barrier to using it properly.

This outcome runs counter to the key reason why the ABS promotes Standard Variables as an output in their own right (rather than just as reference metadata to support statistics aligned with these standards produced by the ABS). Even where other agencies are collecting data primarily for administrative/operational purposes rather than directly as statistical data, the ABS seeks (and has a legislative mandate for doing so) to have data related to standardised variables collected, managed and presented aligned with these standards.

It should be noted that Standard Variables are not the only examples of content standards relevant to the ABS and other NSIs in this regard. Other examples include

standard classifications (independent of their use by a particular Standard Variable) potentially including correspondences with other classifications, coding guides/indices etc

for example Australian and New Zealand Standard Industrial Classification (ANZSIC), 2006 (Revision 1.0) including Downloads and Related Information tabs.

concept schemes independent of their use by a particular Standard Variable

other forms of controlled vocabularies, glossaries etc

A key early goal of the current ABS Information Management Transformation Program is to have such information readily accessible in machine usable, and systematically referenceable, form.

This includes making relevant aspects available as the basis for

SDMX concepts and code lists, and

reference metadata

Scenario a(i) discusses drivers for making content available also to the DDI community. For Scenario a(ii) there are some additional drivers:

Institutions (typically other than NSIs) which make use of DDI V3 to support "end to end" operations may need to refer to aspects of Standard Variables and similar content within concept banks, question banks etc. Depending on system architecture, they may need to physically copy relevant information into their local content banks.

NSIs that implement Scenario b(i) (support for end-to-end statistical production process using both SDMX and DDI) may need to be able to consume such "standardised content" from other agencies (eg UN, OECD, Eurostat, other NSIs) in DDI form to support particular sub processes in their end to end statistical production process

Requirements

Requirements around lossless, automated transformation services with high levels of robustness are as per Scenario a(i), but the focus here is on structural metadata and reference metadata from an SDMX perspective rather than data.

Requirements are likely to span

being able to draw on the standardised content to incorporate it "in line" within SDMX messages and (eg) DDI Resource Packages, and

being able to reference the standardised content "in situ" (eg within an externalised registry) from within SDMX messages and (eg) DDI Resource Packages.

Even where only a reference is provided, an application that receives and processes the SDMX message or DDI Resource Package needs to be able to instantiate that reference using whichever standard "it speaks".
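
The difference between "in line" and "in situ" content can be sketched as follows. This is a simplified illustration, not SDMX or DDI syntax: the `CodeList` and `Reference` classes, the `REGISTRY` dictionary and the identifier shown are all hypothetical stand-ins for whatever structures and registry a real implementation would use.

```python
from dataclasses import dataclass


@dataclass
class CodeList:
    """Standardised content carried "in line" within a message."""
    identifier: str
    codes: dict  # code value -> label


@dataclass
class Reference:
    """Pointer to standardised content held "in situ" elsewhere."""
    identifier: str


# Hypothetical stand-in for an externalised registry, keyed by identifier.
REGISTRY = {
    "EXAMPLE:CL_SEX(1.0)": CodeList(
        "EXAMPLE:CL_SEX(1.0)", {"1": "Male", "2": "Female"}
    ),
}


def resolve(item, registry):
    """Return a usable code list whether it arrived in line or by reference."""
    if isinstance(item, CodeList):
        return item
    if isinstance(item, Reference):
        try:
            return registry[item.identifier]
        except KeyError:
            raise LookupError(f"unresolved reference: {item.identifier}") from None
    raise TypeError(f"unsupported item type: {type(item).__name__}")


inline = CodeList("EXAMPLE:CL_AGE(1.0)", {"0": "0-14 years", "1": "15 years and over"})
by_reference = Reference("EXAMPLE:CL_SEX(1.0)")

resolved_inline = resolve(inline, REGISTRY)
resolved_reference = resolve(by_reference, REGISTRY)
```

In this sketch the receiving application treats both cases the same after `resolve` has run, which is the property the requirement above is asking for. An unresolvable reference raises an error rather than being passed on silently.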

It should be noted that, particularly for this scenario, requirements may be more general than just representing the relevant standardised content using either SDMX or DDI. For example

geospatial standards such as ISO 19115 and ISO 19139 have their own representation standards for common controlled vocabularies etc, and

expression using standards such as RDF appears increasingly important.

It may be that (for types of standardised objects to which it is applicable) the DDI approach of utilising the OASIS/W3C genericode standard as a "generic" means of describing controlled vocabularies and similar constructs would be relevant as a form that is specifically designed to be readily rendered into multiple end forms in a straightforward and consistent manner.
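
The general idea of holding a controlled vocabulary once in a neutral form and rendering it into several end forms can be sketched as below. This is an assumption-laden illustration: the `VOCAB` structure and the XML element names are hypothetical and are not genericode, SDMX-ML or DDI syntax. The second renderer uses real SKOS terms (`skos:ConceptScheme`, `skos:Concept`, `skos:inScheme`, `skos:notation`, `skos:prefLabel`) with a made-up base URI.

```python
import xml.etree.ElementTree as ET

# Hypothetical neutral representation of a controlled vocabulary.
VOCAB = {
    "id": "CL_EXAMPLE",
    "codes": [("1", "First category"), ("2", "Second category")],
}


def to_xml(vocab):
    """Render as simple XML. Element names are illustrative only."""
    root = ET.Element("CodeList", id=vocab["id"])
    for value, label in vocab["codes"]:
        code = ET.SubElement(root, "Code", value=value)
        code.text = label
    return ET.tostring(root, encoding="unicode")


def to_skos_turtle(vocab, base="http://example.org/"):
    """Render as SKOS-flavoured Turtle.

    Assumes labels contain no double quotes or backslashes,
    so no escaping is attempted.
    """
    scheme = f"<{base}{vocab['id']}>"
    lines = [
        "@prefix skos: <http://www.w3.org/2004/02/skos/core#> .",
        f"{scheme} a skos:ConceptScheme .",
    ]
    for value, label in vocab["codes"]:
        lines.append(
            f'<{base}{vocab["id"]}/{value}> a skos:Concept ; '
            f'skos:inScheme {scheme} ; skos:notation "{value}" ; '
            f'skos:prefLabel "{label}"@en .'
        )
    return "\n".join(lines)


xml_form = to_xml(VOCAB)
rdf_form = to_skos_turtle(VOCAB)
```

Because both renderers read the same neutral structure, a correction to a code or label is made once and flows through to every end form, which is the consistency benefit described above.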

It is recognised that the requirements from this scenario could be generalised to a requirement that any content which may need to be expressed beyond an NSI (whether commonly standardised forms of content or general content such as the description of the particular statistical activity which produced a particular set of statistical outputs) should be expressible both in SDMX (eg as reference metadata conforming to a standardised metadata structure definition based on agreed semantics) and in DDI.

The focus of Scenario a(ii), however, is specifically on content that an NSI seeks to have reused by others. At this time the more general case is seen as likely to be supported via Scenario b(iii), where an NSI, for its own purposes, needs to be able to refer to equivalent content via DDI (e.g. for "structural" purposes) and SDMX (e.g. for dissemination reference purposes) at different stages in the statistical production process.

Implications

While the ABS has identified a range of forms of "standardised content" believed to be in scope for this scenario, it would be useful to agree a better-defined working list of "highest priority" object types in this regard. (In principle, almost any type of content could be standardised and shared; in practice, NSIs tend to focus most on particular types of objects such as standardised variables and classifications.)

It should then be possible to identify which are most likely to need to be made available in SDMX and/or DDI (and/or other standard) forms. This will lead on to identifying those types of standardised content that are most likely to be required in both SDMX and DDI (and possibly other) formats, and the best means of supporting these requirements.

As with Scenario a(i), this is expected to result in building on existing broad thinking about interoperability between SDMX and DDI to achieve a mapping that is comprehensive, detailed, robust and agreed in regard to relevant types of standardised content.
