Document

advertisement
WORLD METEOROLOGICAL ORGANIZATION
_________________________
ET-ADRS-1/ Doc. 2.5(2)
(14.04.2008)
____________
COMMISSION FOR BASIC SYSTEMS
ITEM 2.5
OPAG ON INFORMATION SYSTEMS & SERVICES
EXPERT TEAM ON ASSESSMENT OF DATA
REPRESENTATION SYSTEMS
ENGLISH only
FIRST MEETING
SILVER SPRING, MD, USA, 23 TO 25 APRIL 2008
REVIEW OF DATA REPRESENTATION SYSTEMS
Interoperability Standards at Levels of Syntax and Semantics
(Submitted by the Secretariat)
Summary and purpose of document
The document discusses the need to distinguish between interoperability
standards that operate at the level of syntax (e.g, ASN.1 Abstract Syntax
Notation and XML, eXtensible Markup Language) in contrast to standards
that address semantics (e.g., ISO/IEC 11179, Metadata Registries).
ACTION PROPOSED
The ET-ADRS is invited to consider:
(a)
Syntactic interoperability should be achieved using automated transformations based
on precise structural and data type definitions as expressed through syntax
description languages such as ASN.1 specifications and XML schemas, and
standards in these areas should be promoted through WMO policies and procedures;
(b)
In the absence of a semantic relationship between elements of interest (i.e., precise
definitions of the data or information elements), interoperability at the syntactic level
cannot assure that the resulting integrated or transformed information is meaningful;
(c)
The standard ISO/IEC 11179, Metadata Registries, provides useful guidance for
documenting the meanings of data or information elements, thereby allowing schema
elements to be related to each other within and among data dictionaries;
(d)
A policy promoting broader WMO use of guidance available in ISO/IEC 11179 would
align well with existing WMO procedures, and would complement broader use of the
ISO 191xx series of standards, especially ISO 19115 which uses ISO/IEC 11179 in
defining metadata elements.
Contents:
Interoperability Standards at Levels of Syntax and Semantics............................................... 2
ET-ADRS-1/ Doc. 2.5(2), p. 2
Interoperability Standards at Levels of Syntax and Semantics
1.
Distinguishing Syntax from Semantics
1.1.
Semantics deals with the meaning of a symbol in some language, whereas syntax
deals with the handling of symbols independent of their meaning. Information and
communications technology has always supported many and diverse automated
mechanisms for manipulating the syntax of data and information. Semantics, on the other
hand, has mostly required the active involvement of human analysts.
1.2.
An example of the distinction between semantics and syntax is seen in the
architecture section of the GEOSS 10-Year Implementation Plan Reference Document: 1
Systems interoperating in GEOSS agree to avoid non-standard data syntaxes in
favor of well-known and precisely defined syntaxes for data traversing system
interfaces. The international standard ASN.1 (Abstract Syntax Notation) and the
industry standard XML (Extensible Markup Language) are examples of robust and
generalized data syntaxes, and these are themselves inter-convertible.
It is also important to register the semantics of shared data elements so that any
system designer can determine in a precise way the exact meaning of data occurring
at service interfaces between components. The standard ISO/IEC 11179, Information
Technology--Metadata Registries, provides guidance on representing data semantics
in a common registry.
1.3.
Global organizations such as WMO require policies and procedures that
accommodate wide diversity in data and information syntax. Open standards mechanisms for
syntax transformation should be promoted, including the transformation between ASN.1 and
XML described herein. WMO policies and procedures should also address the needs of
analysts seeking semantic interoperability in specific applications. As explained below, it is
important to distinguish between interoperability standards that address syntax (e.g, ASN.1,
XML) and standards that address semantics (e.g., ISO/IEC 11179).
2.
Standards-based Interoperability at the Level of Syntax
2.1.
Automated mechanisms for manipulating the syntax of data and information include
ways to define structure (e.g., elements, element groups, records) as well as data type
(e.g., numbers, dates, text). There are various conventions for how to associate such
structure and type descriptions with particular instances of data, including external as well as
internal descriptions (sometimes called "self-describing file formats"). There are also various
methods for conveying structural information within a data instance, including start-stop tag
"mark-up" approaches and "start position, length" approaches. It is also important to realize
that some mechanisms have the ability to distinguish bits as individual data elements,
whereas others have whole characters as their smallest distinguishable data element.
2.2.
Examples of the mechanism characteristics described above can be seen with
ASN.1 and XML. Both use external descriptions to associate a given data instance to its
syntax definitions; ASN.1 calls this syntax description a "specification", whereas XML calls it
a "schema". ASN.1 uses "start position, length" to convey structural information, whereas
XML uses start-stop mark-up for the same purpose. ASN.1 can describe data elements to
the bit level, whereas XML is character-oriented (with Unicode as its character repertoire).
1
GEOSS (the Global Earth Observation System of Systems) is directed by the Group on
Earth Observations, an international partnership (see http://earthobservations.org/ )
ET-ADRS-1/ Doc. 2.5(2), p. 3
2.3.
Although information and communications technology has diverse mechanisms for
manipulating syntax, the interoperability challenge here is fairly straightforward. In general, it
is not necessary that interoperating systems come to a joint decision on a single syntax.
Rather, it is only necessary that among the syntaxes in use there exists a standardized
transformation with acceptable "lossiness". For example, two systems can have syntactic
interoperability even though one describes syntax with ASN.1 specifications while the other
uses XML schemas. This is the case between ASN.1 and XML because there does exist a
standardized transformation: XML Encoding Rules.
2.4.
An ASN.1 specification defines what elements of data are allowed, working up from
single bits. It is likely that ASN.1 could be used to describe currently used WMO formats
such as BUFR and NetCDF, as it was designed to describe just about every type of data as
implemented on every type of computing device, including all types of binary data. Of course,
one could design an XML representation of formats such as BUFR and NetCDF without first
constructing an ASN.1 specification. But, the resulting character-based XML messages
would be much larger in size (perhaps several times larger or more). This is not only
because ASN.1 is binary-oriented whereas XML is character-oriented. Efficiency comes
primarily from the ASN.1 techniques for encoding abstract specifications into real messages.
2.5.
In ASN.1, the concrete syntax ("on-the-wire encoding") is provided separately from
the abstract syntax. The concrete syntax is realized through standard "encoding rules", which
are applied over a communications session for the actual message instances. Commonly
used ASN.1 encoding rules are: Basic Encoding Rules which represent each byte present as
defined in the specification; Packed Encoding Rules which group repeated bytes (a
technique broadly implemented as "Zip" file compression); Distinguished Encoding Rules
which represent only the bytes that change on a message to message basis, and XML
Encoding Rules which translate between ASN.1 elements and their XML equivalents.
2.6.
An XML Encoding Rules (XER) tool can automatically and losslessly generate the
corresponding XML schema. (This is greatly simplified because an ASN.1 specification
provides element names that are legal for XML elements as well). The reverse is also true:
XER tools can automatically generate an ASN.1 specification from an XML schema. This
means that an application designer can define any data using ASN.1, including all manner of
binary data down to assigning meanings to individual bits, and then generate an XML
schema. Or, the designer can take an XML stream through XER and have the file
automatically compressed using packed or distinguished encoding rules.
2.7.
Using XER, applications can have the best of both worlds: high transmission
efficiency and the broadest range of data type support through ASN.1, as well as the designtime simplicity of XML representation. XER can be applied at the instance level rather than
the schema level if it is desired that the applications are unaffected. In other words, particular
channels can put XER converters at both ends to enhance transmission efficiency "on-thefly". This is especially useful for bandwidth-constrained channels such as cell phones, as well
as for channels that must carry very high traffic volumes. In many WMO applications, high
data volume and/or narrow bandwidth are major issues, and the efficient packing of binary
data must be of primary interest.
2.8.
ASN.1 is the most common way of describing data that moves by networks. Almost
all of the world's digital voice telephone traffic is described via ASN.1, and most Internet
network control traffic (Simple Network Management Protocol) is described by ASN.1 as well.
The recently approved ITU (International Telecommunication Union) Recommendation
X.1303, Common Alerting Protocol (CAP), provides a good example of supplementing XML
with ASN.1. The use of XER to achieve ASN.1 and XML interoperability in the case of the
standard X.1303, Common Alerting Protocol, is featured in another paper submitted to this
ET-ADRS-1/ Doc. 2.5(2), p. 4
meeting. CAP should become a widely used standard across WMO Members, as it
standardizes all-hazards, all-media public warning.
2.9.
CAP was originally developed in the OASIS standards body, which works primarily
in the realm of e-business and Electronic Data Interchange. Message elements of the CAP
standard were originally expressed using XML, and many applications had already deployed
XML for their CAP messages. On the other hand, the ITU standards body is concerned with
telecommunications efficiency. The ITU insisted that CAP messages be represented in
ASN.1. Because of XER, the addition of ASN.1 for CAP messages was a simple add-on, and
ASN.1 was readily accepted as an additional transmission format in the CAP standard.
2.10.
XER transformations are lossless in both directions. But, there are syntax pairs
where a standardized syntax transformation is not lossless in practice and perhaps not even
in principle. The challenge then is to determine whether the degree of syntax transformation
lossiness is acceptable in the context of the given application that requires interoperability.
Some applications, such as those involving archiving, require syntax mechanisms to be
interoperable over very long time periods. Designers of those applications must be especially
mindful of the robustness of standardized syntax transformations.
3.
Standards-based Interoperability at the Level of Semantics
3.1.
As discussed above, syntactic interoperability is a straightforward task that can be
automated in an application as long as there is a standardized syntax transformation with
acceptable lossiness. Yet, even though two components in an application have interoperable
or identical syntax mechanisms (e.g., both use XML or both use ASN.1), the application
designer must still consider issues of semantic interoperability.
3.2.
For this discussion, data exchanged between components is considered to be
interoperable at the semantic level if its meaning is understood well enough to justify
whatever integrated or transformation operations are being performed. In the simplest case,
the respective data definitions for an element exchanged between two components may be
identical. For example, both components may define a data element for a point location on
the earth. As the definitions are identical, semantic interoperability is established. Of course,
a syntax transformation may be necessary if one component represents the point using
"latitude/longitude" and units of "degrees/minutes/seconds", while the other component
represents the point using "x/y coordinates" and units of "decimal degrees".
3.3.
Semantic interoperability is often constrained in practice by the lack of available data
definitions that are good enough to justify the integration or transformation operations
envisioned for the application. All too often, the definitions of a data or information element is
not written out in a data dictionary, but is only understood in the context of the computer
programs that create or use the element. A somewhat better situation exists among systems
that operate in a single "universe of discourse", such as the community of practices that
focuses on Geographic Information Systems. To the extent that data element meanings are
strongly constrained throughout such a community, system designers might focus primarily
on syntactic interoperability. In general, however, to integrate or transform any data or
information element without an explicit definition is an inherently risky undertaking.
3.4.
The standard ISO/IEC 11179, Metadata Registries, provides useful guidance for
documenting the meanings of data or information elements. 2 Because it is not tied to any
particular information and communications technology, it provides great flexibility in relating
2
A copy of the ISO/IEC 11179 standard is available at
http://standards.iso.org/ittf/PubliclyAvailableStandards/c035343_ISO_IEC_11179-1_2004(E).zip
ET-ADRS-1/ Doc. 2.5(2), p. 5
elements one to the other within and among data dictionaries. As a tool for regularizing
semantic expressions, it may be also helpful in establishing the base for automated inference
in the case of complex semantic issues. Yet, it is important to note that the objective here is
have good definitions for data and information elements. Additional standards from the
discipline of formal logic would need to be applied if the objective is to fully automate
inferencing across collections of data dictionaries or metadata registries.
3.5.
A policy promoting broader WMO use of guidance available in ISO/IEC 11179 would
align well with existing WMO procedures, and would complement broader use of the
ISO 191xx series of standards, especially ISO 19115 which uses ISO/IEC 11179 in defining
metadata elements.
3.6.
The following table gives examples of data definitions written in compliance with
ISO/IEC 11179, as found in the Data Dictionary section of the standard X.1303, Common
Alerting Protocol, referenced above.
Element
Name
alert
Context. Class.
Attribute.
Representation
cap.
alert.
group
identifier
sender
cap.
alert.
identifier
cap.
alert.
sender.
identifier
sent
cap.
alert.
sent.
time
status
cap.
alert.
status.
code
Definition and
(Optionality)
Notes or Value Domain
The container
for all
component parts
of the alert
message
(REQUIRED)
(1) Surrounds CAP alert message sub-elements.
(2) MUST include the xmlns attribute referencing the
CAP URI as the namespace, e.g.:
The identifier of
the alert
message
(REQUIRED)
The identifier of
the sender of
the alert
message
(REQUIRED)
The time and
date of the
origination of the
alert message
(REQUIRED)
The code
denoting the
appropriate
handling of the
alert message
(REQUIRED)
<cap:alert xmlns:cap="urn:oasis:
names:tc:emergency:cap:1.1">
[sub-elements]
</cap: alert>
(3) In addition to the specified subelements, MAY contain one or more <info> blocks.
(1) A number or string uniquely identifying
this message, assigned by the sender
(2) MUST NOT include spaces, commas, or restricted
characters (< and &)
(1) Identifies the originator of this alert. Guaranteed by
assigner to be unique globally; e. g., may be based
on an Internet domain name
(2) MUST NOT include spaces, commas, or restricted
characters (< and &)
The date and time is represented in [dateTime] format
(e.g., "2002-05-24T16:49:00-07:00" for
24 May 2002 at 16: 49 PDT).
Code Values:
“Actual" Actionable by all targeted recipients
“Exercise" Actionable only by designated exercise
participants; exercise identifier should appear in
<note>
“System" For messages that support alert network
internal functions.
“Test" Technical testing only, all recipients disregard
______________________
Download