Recommended XML Namespace Scheme
For Australian Government Organizations
November 2006
Australian Government Information Management Office
1. EXECUTIVE SUMMARY
INTRODUCTION
2. BACKGROUND
2.1. Namespace - References
2.2. XML documents and XML Schema
2.3. Using namespaces – declaration & qualification
2.4. Defining namespaces – the “target namespace”
2.5. The format of a namespace - URIs, URNs, URLs
2.6. Registration of a namespace – DNS and IANA
3. THE “xml-gov-au” NAMESPACE SCHEME
3.1. URN Syntax – IETF RFC2141
3.2. Namespace Identifier (NID) proposal
3.3. Namespace Specific String (NSS) proposal
3.4. Namespace Granularity / Modularity Proposal
3.5. Resolvable Location Proposal
3.6. Summary
Australian Government XML Namespace Scheme
1. EXECUTIVE SUMMARY
The “eXtensible Markup Language” (XML) is rapidly becoming the foundation
standard for the interchange of data between organisations at the system to system
level. XML plays an important role in Australian Government projects such as
“Standardised Business Reporting” (SBR).
An XML namespace is a globally unique identifier for a collection of XML elements
that are logically related. But rather than using a meaningless string of numbers, an
XML namespace uses formal naming schemes to provide both global uniqueness as
well as meaningful names. For example, the XML namespace for the first release of
the Australian government name & address schema is:
urn:xml-gov-au:final:data:NameAndAddress:1.0
As agencies start to collaborate on whole of government projects like SBR, they will
each be contributing XML schema from a variety of sources. Accordingly there
needs to be a means to uniquely identify each re-usable component and to eliminate
confusion due to “name collision” (when two agencies give the same name to a
concept). XML namespaces provide a solution to this problem as shown in the
conceptual example below.
Without namespaces:
<identifier>GAVIC411711441</identifier>
<identifier>34098932168</identifier>

With namespaces:
<gnaf:identifier>GAVIC411711441</gnaf:identifier>
<abn:identifier>34098932168</abn:identifier>
This document outlines a naming scheme for XML namespaces and follows best
practice as recommended by organisations like the W3C (World Wide Web
Consortium) and IETF (Internet Engineering Task Force) and as applied by other
national governments such as the US xml.gov initiative. It is recommended that the
XML namespace scheme defined in this document be promoted across Australian
Government agencies and jurisdictions.
INTRODUCTION
XML is a key technology in the development and implementation of IT and data
interchange projects around the world and in the Australian Government. Key
whole-of-government interoperability projects such as Standardised Business Reporting
(SBR) will rely heavily on XML technology and on re-usable libraries of standardised
XML data structures (such as Name & Address). Specific projects will re-use XML
fragments from various libraries in order to build interoperable schema that meet
project requirements. As the size and complexity of the repository of re-usable XML
fragments grows, an increasingly important issue will be how to avoid confusion and
name collisions between similar elements from different domains. This is the purpose
of the XML “Namespace” concept. A namespace is essentially a globally unique
identifier for an XML fragment. An example namespace is:
urn:xml-gov-au:final:data:NameAndAddress:1.0
[Figure: layers of re-usable reporting elements – common reporting elements (eg Name & Address, Payroll); domain specific cluster reporting elements (eg International Trade, Environment, Health); miscellaneous (non-cluster) reporting elements (eg Local Laws).]
It is important that the government establish a cohesive, coordinated namespace
approach to support its various XML efforts. This namespace approach must define a
standardized structure for namespaces across jurisdictions as well as establish a
standardized naming convention for those namespaces. Without such a coordinated
approach, individual government organizations will create a proliferation of disparate
XML namespace structures and names resulting in chaotic management of XML
components. Given the growing number of government namespaces, it is crucial that
this strategy be put in place as quickly as possible, since harmonizing the namespace
structure and names used by different government organizations will become
increasingly difficult over time.
This report provides some technical background on XML Namespaces and explores
the use of the Uniform Resource Name (URN) and Uniform Resource Locator (URL)
variants of Uniform Resource Identifiers (URIs) for a government namespace naming
convention. Finally, the report outlines specific naming rules for XML namespaces.
The advantages to agencies of applying the namespace guidelines defined in this
document are:
• Schema components are uniquely identified with meaningful names.
• Schema components can be more easily managed through their development
lifecycle.
• Each agency can take responsibility for its schema components without fear of
name collision.
• Complex information schema can be assembled from re-usable components
issued by any agency.
• Compliance with evolving national frameworks such as Standardised Business
Reporting (SBR) is facilitated.
From the World Wide Web Consortium (http://www.w3.org/TR/REC-xml-names/)
We envision applications of Extensible Markup Language (XML) where a single XML
document may contain elements and attributes (here referred to as a "markup
vocabulary") that are defined for and used by multiple software modules. One motivation
for this is modularity: if such a markup vocabulary exists which is well-understood and for
which there is useful software available, it is better to re-use this markup rather than reinvent it.
Such documents, containing multiple markup vocabularies, pose problems of recognition
and collision. Software modules need to be able to recognize the elements and attributes
which they are designed to process, even in the face of "collisions" occurring when
markup intended for some other software package uses the same element name or attribute
name.
These considerations require that document constructs should have names constructed so
as to avoid clashes between names from different markup vocabularies. This specification
describes a mechanism, XML namespaces, which accomplishes this by assigning expanded
names to elements and attributes.
2. BACKGROUND
This section provides the necessary technical background to understand XML
namespaces. Section 2.1 provides references to detailed material available on the
internet. Alternatively, sections 2.2 to 2.6 provide an overview of XML namespace
usage, declaration, syntax, and registration in sufficient detail for readers to
understand the key concepts.
2.1. Namespace - References
For an in-depth technical understanding of XML namespaces and related concepts,
readers should follow the links provided here:
Specification | Link | Description
XML Language | http://www.w3.org/TR/2006/REC-xml-20060816/ | The syntax of the XML language itself.
XML Schema | http://www.w3.org/TR/xmlschema-0/ | XML Schema are used to define the structure of, and validate the contents of, XML documents.
XML Namespace | http://www.w3.org/TR/REC-xml-names/ | XML Namespace recommendations from W3C.
URI Syntax | http://www.rfc-editor.org/rfc/rfc3986.txt | URIs can be used for namespaces – this document provides IETF recommendations.
URN Syntax | http://www.rfc-editor.org/rfc/rfc2141.txt | URNs are the recommended syntax for xml.gov.au namespaces. This document provides IETF recommendations.
US xml.gov | http://xml.gov/documents/completed/lmi/GS301L1_namespace.pdf | US government policy document on XML Namespaces; a key reference for the development of this document.
2.2. XML documents and XML Schema
It is important that readers clearly understand the difference between an XML
“instance document” and an XML “schema”. An XML instance document carries
actual data (for example an actual eBAS application from Widget Pty Ltd). An XML
schema on the other hand is the formal description of what the instance should look
like (for example the definition of the format of any eBAS application). XML instance
documents will normally reference one or more XML Schema that define valid
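content. At runtime, an XML instance is often validated against the schema to check
that it is complete and correctly structured (for example, that required elements are
present and that values have the expected data types).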
For the purposes of this document it is sufficient to understand that a namespace is
defined in an XML schema and is then declared and used in XML instance
documents.
2.3. Using namespaces – declaration & qualification
A namespace is declared in the root element of a Schema or any element of an
instance using a namespace identifier in the form of a URI (see section 2.5). For
brevity and readability, the namespace identifier is associated with a prefix (usually 3
letters) that is used throughout the XML document to tag elements that belong to the
declared namespace. This makes the elements “namespace qualified.” In the
following example, the namespace identifier is urn:xml-gov-au:ato:abn and the
namespace prefix is abn:
<schema xmlns:abn="urn:xml-gov-au:ato:abn">
This means that any construct in the schema or instance with a namespace prefix of
abn belongs to the ATO namespace, as in the following example:
<abn:identifier>34098932168</abn:identifier>
Namespaces allow constructs with the same name but from different markup
vocabularies to be used in the same XML document with no adverse effects. In the
following example, two identifier elements are used in the same instance document,
but they are associated with two different namespaces. One element represents an
Australian Business Number in the ATO’s namespace, while the other represents the
identification of a geo-coded location in the PSMA (Public Sector Mapping Agencies)
namespace. The namespaces and their prefixes are defined on the root element
(whose name, Example, is illustrative only), and the prefixes are then used to qualify
the element names:
<?xml version="1.0" encoding="UTF-8"?>
<Example xmlns:abn="urn:xml-gov-au:ato:abn"
         xmlns:gnaf="urn:xml-gov-au:psma:gnaf">
  <abn:identifier>34098932168</abn:identifier>
  <gnaf:identifier>GAVIC411711441</gnaf:identifier>
  <!-- information removed for example purposes -->
</Example>
If the identifier elements described above were not in separate namespaces, software
processing the document could not tell the ABN from the geocode identifier, and a
single Schema declaring both as global elements would be rejected as a duplicate
declaration. This condition is known as “name collision.”
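The effect of qualification can be seen with Python's standard xml.etree.ElementTree module, which replaces each prefix with its namespace URI when a document is parsed. This is a minimal sketch: the instance document and its root element name Example are hypothetical, and only the abn and gnaf namespace identifiers come from this section.

```python
import xml.etree.ElementTree as ET

# Hypothetical instance document: the root element name "Example" is a
# placeholder; the two namespace identifiers are the ones used in this section.
INSTANCE = """<Example xmlns:abn="urn:xml-gov-au:ato:abn"
         xmlns:gnaf="urn:xml-gov-au:psma:gnaf">
  <abn:identifier>34098932168</abn:identifier>
  <gnaf:identifier>GAVIC411711441</gnaf:identifier>
</Example>"""


def expanded_names(xml_text: str) -> list[str]:
    """Return the expanded name of each child of the root element."""
    root = ET.fromstring(xml_text)
    # ElementTree reports names as "{namespace-uri}local-name", so elements
    # sharing the local name "identifier" remain distinguishable.
    return [child.tag for child in root]


for name in expanded_names(INSTANCE):
    print(name)
```

Both elements have the local name identifier, but their expanded names differ ({urn:xml-gov-au:ato:abn}identifier and {urn:xml-gov-au:psma:gnaf}identifier), which is what allows software to tell them apart.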
2.4. Defining namespaces – the “target namespace”
The previous section described how namespaces are declared and used in XML
documents and schema. This section describes how namespaces are defined in the
reference XML schema using the “targetNamespace” construct.
The declaration of a target namespace in a Schema indicates that the Schema is
acting as a “collector” of constructs declared in it. While a Schema may have more
than one namespace declared within it, only one namespace can be designated as the
target namespace.
A target namespace is declared using the namespace identifier of the selected
namespace. In this example, the urn:xml-gov-au:ato:abn namespace is declared as
the target namespace:
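<?xml version="1.0" encoding="UTF-8"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            xmlns:abn="urn:xml-gov-au:ato:abn"
            targetNamespace="urn:xml-gov-au:ato:abn">
  <!-- declares the element abn:identifier -->
  <xsd:element name="identifier" type="xsd:string"/>
</xsd:schema>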
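This means that any top-level element, attribute, or data type declared in the Schema
belongs to the Schema’s target namespace (whether locally declared elements and
attributes are also namespace qualified is controlled by the elementFormDefault and
attributeFormDefault settings). Target namespaces are valuable because they
allow a set of Schema constructs to be collected into a single conceptual space.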
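To illustrate how a target namespace assigns names, the following minimal Python sketch (standard library only) reads a schema's targetNamespace and reports the expanded name of each top-level element declaration. The embedded schema is a hypothetical example in the urn:xml-gov-au:ato:abn namespace; note that the name attribute holds only the local name, and the namespace comes from targetNamespace.

```python
import xml.etree.ElementTree as ET

XSD_NS = "http://www.w3.org/2001/XMLSchema"

# Hypothetical schema content, used for illustration only.
SCHEMA = """<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            xmlns:abn="urn:xml-gov-au:ato:abn"
            targetNamespace="urn:xml-gov-au:ato:abn">
  <xsd:element name="identifier" type="xsd:string"/>
</xsd:schema>"""


def global_element_names(schema_text: str) -> list[str]:
    """List the expanded names of the schema's top-level element declarations."""
    root = ET.fromstring(schema_text)
    target = root.get("targetNamespace")  # None when no target namespace is set
    names = []
    for decl in root.findall(f"{{{XSD_NS}}}element"):
        local = decl.get("name")
        # A global declaration takes its namespace from targetNamespace,
        # not from a prefix written inside the name attribute.
        names.append(f"{{{target}}}{local}" if target else local)
    return names


print(global_element_names(SCHEMA))
```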
2.5. The format of a namespace - URIs, URNs, URLs
A URN (Uniform Resource Name) is a globally unique, structured and persistent
name for a resource such as an XML Schema. A URL (Uniform Resource Locator) is a
physical and resolvable location for a resource such as an XML Schema. In simple
terms, a URN is a name for something and a URL is where you can find it. A URI
(Uniform Resource Identifier) is a collective name for both URLs and URNs. Both
URLs and URNs have specific syntax and structure guidelines. A typical URN might
look like:
urn:xml-gov-au:final:data:NameAndAddress:1.0
And a (much more familiar looking) URL might look like:
http://xml.gov.au/final/data/NameAndAddress/1.0/NameAndAddress.xsd
But a URL could also be:
file:///C:/MyDocuments/MySchema/standards/1.0/NameAndAddress.xsd
The point here is that a schema identified with a particular name (eg the URN) could
be located in several places – either on a web server on the internet or on a local file
system. Namespaces are intended to be long term persistent globally unique
identifiers for a resource and are not intended to be resolvable to a physical location.
Note that it is common practice today to use the URL Notation for namespace
declarations (eg http://www.someDomain/someResource.xsd) but this does not
mean that the resource is available from the specified location – just that the URL
notation has been used to specify a globally unique name. Although common
practice, this is confusing and not recommended for the Australian Government
namespace scheme.
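The difference between the two forms is visible with Python's standard urllib.parse module: a URL carries a network location (a host to contact), while a URN does not. This is a minimal illustration using the example identifiers from this section; it does not resolve or fetch anything.

```python
from urllib.parse import urlparse

URN = "urn:xml-gov-au:final:data:NameAndAddress:1.0"
URL = "http://xml.gov.au/final/data/NameAndAddress/1.0/NameAndAddress.xsd"


def describe(identifier: str) -> tuple[str, str, str]:
    """Split an identifier into (scheme, network location, path)."""
    parts = urlparse(identifier)
    # A URL names a host in netloc; a URN leaves netloc empty because it is
    # a name only and says nothing about where the resource is stored.
    return parts.scheme, parts.netloc, parts.path


for identifier in (URN, URL):
    print(describe(identifier))
```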
2.6. Registration of a namespace – DNS and IANA
Both URLs and URNs need a mechanism for global uniqueness. For URLs that
mechanism is the global Domain Name System (DNS) that allocates “domain
names” like www.google.com and www.ato.gov.au to business or government
entities and maps them to physical internet (IP) addresses. It is then up to the
organisation that owns the domain to manage the paths beneath it (eg the
“/publications/2005/04/agtifv2” in
http://www.agimo.gov.au/publications/2005/04/agtifv2).
URNs have a similar mechanism for registration and the authority is IANA (Internet
Assigned Numbers Authority). However, relatively few URN namespaces are
registered with IANA compared to the millions of domains registered for URLs. There
are two reasons for this:
• Most websites do not need persistent names that are different to their
physical location. Standard re-usable XML schema components, on the
other hand, describe an interface or service that might be implemented by
hundreds or thousands of organisations and so need a naming system that is
independent of any physical location.
• There is an informal but widely used practice of relating URN prefixes to URL
domains. For example, the “gov.au” domain is owned and administered
by the Australian Government. Therefore a URN prefix of the form gov-au
“borrows” the domain registration of the corresponding URL and can be fairly
safely used without fear of collision with another URN prefix.
3. THE “xml-gov-au” NAMESPACE SCHEME
3.1. URN Syntax – IETF RFC2141
The proposed structure follows the structure defined by the IETF Network Working
Group in RFC 2141. That structure consists of the literal prefix “urn”, the namespace
identifier (NID), and the namespace specific string (NSS):
urn:<nid>:<nss>
Additionally, RFC 2141 specifies some constraints on the allowed character sets for
the NID and NSS:
NID: upper or lower case letters, numbers, and the hyphen character only
NSS: NID characters plus any of: ()+,.:=@;$_!*'
Following is an example:
urn:xml-gov-au:abs:final:codes:1292.0_anzsic:1993
where the NID is xml-gov-au and the NSS is abs:final:codes:1292.0_anzsic:1993.
The following sections provide a justification for the proposal of “xml-gov-au” as the
Australian Government XML Namespace Identifier (NID) and also provide a
proposed naming convention for the namespace specific string (NSS).
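The syntax rules above can be checked mechanically. The following Python sketch splits a URN into its NID and NSS and applies the character rules listed in this section. Two details come from RFC 2141 itself rather than from this document: an NID must start with a letter or number and be at most 32 characters long, and “urn” is reserved and may not be used as an NID. The sketch deliberately ignores the %-escapes and reserved characters that RFC 2141 also permits in an NSS, so it is stricter than the RFC; the function name split_urn is hypothetical.

```python
import re

# RFC 2141: a letter or number, then up to 31 letters, numbers or hyphens.
NID_RE = re.compile(r"[A-Za-z0-9][A-Za-z0-9-]{0,31}")
# Simplified NSS rule from this section: NID characters plus ()+,.:=@;$_!*'
NSS_RE = re.compile(r"[A-Za-z0-9()+,\-.:=@;$_!*']+")


def split_urn(urn: str) -> tuple[str, str]:
    """Split 'urn:<nid>:<nss>' into (nid, nss); raise ValueError if malformed."""
    scheme, _, rest = urn.partition(":")
    if scheme.lower() != "urn":
        raise ValueError(f"not a URN: {urn!r}")
    nid, sep, nss = rest.partition(":")
    if not sep or not NID_RE.fullmatch(nid) or nid.lower() == "urn":
        raise ValueError(f"invalid NID: {nid!r}")
    if not NSS_RE.fullmatch(nss):
        raise ValueError(f"invalid NSS: {nss!r}")
    return nid, nss


print(split_urn("urn:xml-gov-au:final:data:NameAndAddress:1.0"))
```

A dotted identifier such as urn:xml.gov.au:final:data:NameAndAddress:1.0 is rejected, because “.” is not an allowed NID character.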
3.2. Namespace Identifier (NID) proposal
It is proposed that the Namespace identifier for use by all Australian government
organisations be:
urn:xml-gov-au
NIDs should be registered with IANA and, although any previously unregistered
name could be used, the convention is to use a name that is based on existing
registered domain names. The Australian government is already the registered
owner of the “gov.au” domain and therefore an NID including “gov-au” is the most
obvious candidate for registration as namespace identifier. The “-“ is used in place of
the “.” because RFC 2141 only allows alphanumeric and hyphen characters in NIDs.
One option is simply to register “gov-au”. However the gov.au domain is very broad
and it is foreseeable that there could be other quite separate uses for a “gov-au” NID
beyond XML namespaces. IANA registration of NIDs requires that a naming
convention for the NSS is also specified, and so it is recommended that a more
specific NID be registered that is aligned with the scope of this document. The string
“xml” is proposed because it is more specific than something more general like
“names” and because it is aligned with the approach already taken by some other
governments (eg http://www.xml.gov/).
3.3. Namespace Specific String (NSS) proposal
It is proposed that the naming convention for the namespace specific string (NSS)
used by Australian government agencies and projects be:
{<state>}:<agency | project>:[<status>]:[<type>]:<name>:<version>
Where:
{<state>} is a placeholder for the state-level domain name (eg “nsw”).
Required if applicable.
<agency | project> is a required placeholder for the agency domain name (eg
“ato”) or cross-agency project name (eg “crimtrac”). Whole-of-government
namespaces that have no single owning agency (such as
urn:xml-gov-au:final:data:NameAndAddress:1.0) omit it.
[<status>] is an optional placeholder for an indicator of the status of the
artefacts in this namespace. Allowed values are draft | final | obsolete. If not
present, the value is assumed to be “final”.
[<type>] is an optional placeholder for an indicator of the type of artefacts in
this namespace. Allowed values are:
• “core” – for context neutral core component libraries
• “data” – for re-usable aggregates
• “types” – for data types
• “codes” – for code-lists
• “docs” – for final assembly documents / messages
• “service” – for service or interface definitions
<name> is a required placeholder for a meaningful name that describes the
content or purpose of the namespace. Any string of NSS allowed characters
except the colon can be used.
<version> is a required placeholder for the version of the artefacts in the
namespace. All schema or XML files in this namespace inherit this version.
The goal of this NSS proposal is to lead to namespace URNs that are meaningful,
unique, stable, and consistent across government agencies and jurisdictions. At the
same time, the scheme is intended to be simple to implement and flexible enough to
meet varying agency or project needs.
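As a sketch of how the convention composes, the Python function below assembles a namespace URN from its parts and rejects status and type values outside the allowed lists. It is an illustration under stated assumptions, not a reference implementation: the function and parameter names are hypothetical, and agency is treated as optional because the document's own whole-of-government example (urn:xml-gov-au:final:data:NameAndAddress:1.0) has no agency qualifier. Optional parts that are not supplied are omitted rather than left empty.

```python
from typing import Optional

STATUS_VALUES = {"draft", "final", "obsolete"}
TYPE_VALUES = {"core", "data", "types", "codes", "docs", "service"}


def build_namespace(name: str, version: str,
                    agency: Optional[str] = None, state: Optional[str] = None,
                    status: Optional[str] = None, type_: Optional[str] = None) -> str:
    """Assemble urn:xml-gov-au:{state}:agency:[status]:[type]:name:version.

    Hypothetical helper: agency is optional here (an assumption), and any
    optional part left as None is omitted from the URN.
    """
    if status is not None and status not in STATUS_VALUES:
        raise ValueError(f"status must be one of {sorted(STATUS_VALUES)}")
    if type_ is not None and type_ not in TYPE_VALUES:
        raise ValueError(f"type must be one of {sorted(TYPE_VALUES)}")
    if ":" in name or ":" in version:
        # The colon separates NSS parts, so it cannot appear inside one.
        raise ValueError("name and version must not contain ':'")
    optional_parts = [state, agency, status, type_]
    parts = ["urn", "xml-gov-au"] + [p for p in optional_parts if p is not None]
    return ":".join(parts + [name, version])


print(build_namespace("NameAndAddress", "1.0", status="final", type_="data"))
```

Leaving status out, as the convention allows, produces a URN whose status is assumed to be “final”; the function does not insert that default itself.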
3.4. Namespace Granularity / Modularity Proposal
A common question in XML schema design is how granular (or how “modular”) to
make the schema. For example, a government project may need to develop a
standard information model that describes 10 transactions that might be exchanged
between several government or business entities. Should the whole model be
implemented in one XML namespace or should it be broken down into smaller
modules? If it is broken down then how should it be broken down?
The modularity proposal in this section is based on existing best practice from
international standards groups that has evolved over the last ten years. An example
is the OASIS UBL (Universal Business Language) standard. The diagram below shows
a namespace modularity scheme for a hypothetical project in the health and safety
domain that involves a process of applying for a permit that requires subsequent
cross agency validations and referrals.
[Fig 1 – Namespace Modularity Scheme. Layers, top to bottom: Final Message Assembly and Miscellaneous Elements (Application, Referral, Validation, Notification, Revocation, etc.), assembled from Domain Specific Data Types and Business Entity Libraries (Health, Safety, Security), which are based on the Core Data Types and Common Data Elements Library (Common Data, Types, Codes).]
Each white box represents one or more namespace(s) that are separately managed
and version controlled:
• Code Lists are maintained in their own namespace. There may be one or
more code-lists in one namespace. Code-lists should be grouped in
namespaces according to how they are owned and managed. So, for
example, AS4590 (Australian Standard Client Interchange) defines a
collection of codes such as “Name Suffix” in one specification that is
versioned as a whole. Accordingly the collection of code-lists can be in one
namespace. On the other hand, the ABS’s ANZSIC code-list is managed and
released separately and so should exist in its own namespace.
• Data Types are maintained in their own namespace. Some “core data types”
such as “dateTime”, “Measure”, “Amount” would be maintained in a common
whole of government namespace (xml-gov-au without any agency or state
qualifier). Other domain or project specific data types would be maintained by
the responsible agency or project.
• Data Elements are reusable components like “Address” or “Geocode” that are
likely to be re-used in many contexts. They are grouped together into
common domains and maintained in their own namespace by the responsible
agency. A powerful approach to increase consistency across domains is to
define a library of neutral “core components” that are used as the basis for
domain specific components. In this way different domains that use the same
concept such as “Address” differently are all based on a common set of
definitions.
• Assembly documents are built from re-usable component libraries but each
should exist in its own dedicated namespace. Examples of assembly
documents are a local government building permit form, an e-Tax
submission, or a patient record.
This modularity scheme allows agencies to re-use common components. This
improves both efficiency and consistency. Furthermore the modularity scheme
significantly improves manageability because an entire schema does not need to be
re-released when, for example, a new value is added to a code-list.
3.5. Resolvable Location Proposal
As discussed in section 2.5, URNs provide a persistent identifier for information
schema whilst URLs point to where they are stored. The same schema identified with
the same URN can be stored in multiple locations. In principle there need not be any
relationship between the two but it is often important to be consistent about schema
location as well as naming. Therefore this section provides an optional URL scheme
that is based on the URN scheme.
Where agencies are hosting their own schema libraries:
http://xml.<agency | project>.<state>.gov.au/<status>/<type>/<name>/<version>/<filename>.<fileExtension>
Where agencies choose to host their schema on an external repository (such as
http://xml.gov.au):
http://<repositoryURL>/<state>/<agency | project>/<status>/<type>/<name>/<version>/<filename>.<fileExtension>
Where
• <agency | project>, <state>, <status>, <type>, <name> and <version> have
the same meaning and values as defined in the URN scheme.
• <filename> will normally be equal to the name and version of the specification
(with an underscore separator). The name duplication is intentional because it
leads to meaningful filenames in webserver directories.
• <fileExtension> will normally be “xsd” for XML schema but could also be
other types such as documents or model files that support the schema and
exist in the same namespace and directory.
• <repositoryURL> is the location of the repository service that is hosting the
schema.
Example:
The national name & address standard XML schema URN is:
urn:xml-gov-au:final:data:NameAndAddress:1.0
The corresponding schema location would be:
http://xml.gov.au/final/data/NameAndAddress/1.0/NameAndAddress_1.0.xsd
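The mapping from URN parts to a schema location can also be sketched in Python. The function below is hypothetical: it assumes the repository is given as a host name without the http:// prefix, that absent optional parts are simply left out of the path and host, and that <fileExtension> is supplied without a leading dot. It reproduces the national name & address example above.

```python
from typing import Optional


def schema_url(name: str, version: str, agency: Optional[str] = None,
               state: Optional[str] = None, status: Optional[str] = None,
               type_: Optional[str] = None, repository: Optional[str] = None,
               extension: str = "xsd") -> str:
    """Build a schema location following the two URL patterns in this section."""
    # <filename> repeats the name and version, joined by an underscore.
    filename = f"{name}_{version}.{extension}"
    # Path segments shared by both patterns; missing optional parts are skipped.
    tail = [p for p in (status, type_, name, version) if p is not None]
    tail.append(filename)
    if repository is not None:
        # External repository: http://<repositoryURL>/<state>/<agency | project>/...
        head = [p for p in (state, agency) if p is not None]
        return f"http://{repository}/" + "/".join(head + tail)
    if agency is None:
        raise ValueError("a self-hosted URL needs an agency or project name")
    # Self-hosted: http://xml.<agency | project>.<state>.gov.au/...
    host = ".".join(p for p in ("xml", agency, state, "gov", "au") if p is not None)
    return f"http://{host}/" + "/".join(tail)


print(schema_url("NameAndAddress", "1.0", status="final", type_="data",
                 repository="xml.gov.au"))
```

A self-hosted call such as schema_url("abn", "1.0", agency="ato", status="final", type_="data") would yield http://xml.ato.gov.au/final/data/abn/1.0/abn_1.0.xsd; that address only illustrates the pattern and is not claimed to exist.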
3.6. Summary
This document defines a namespace URN scheme of the form:
urn:xml-gov-au:{<state>}:<agency|project>:[<status>]:[<type>]:<name>:<version>
And a resolvable location URL scheme of the form:
http://xml.<agency|project>.<state>.gov.au/<status>/<type>/<name>/<version>/<filename>.<fileExtension>
This document also defines a namespace modularity scheme that breaks XML
schema into separate namespaces according to schema type and owning agency.
The XML namespace scheme proposed in this document provides a foundation for
the construction of scalable XML schema libraries that will support strategic projects
like Standardised Business Reporting (SBR).