Recommended XML Namespace Scheme For Australian Government Organizations

November 2006
Australian Government Information Management Office

1. EXECUTIVE SUMMARY
INTRODUCTION
2. BACKGROUND
2.1. Namespace - References
2.2. XML documents and XML Schema
2.3. Using namespaces – declaration & qualification
2.4. Defining namespaces – the “target namespace”
2.5. The format of a namespace - URIs, URNs, URLs
2.6. Registration of a namespace – DNS and IANA
3. THE “xml-gov-au” NAMESPACE SCHEME
3.1. URN Syntax – IETF RFC2141
3.2. Namespace Identifier (NID) proposal
3.3. Namespace Specific String (NSS) proposal
3.4. Namespace Granularity / Modularity Proposal
3.5. Resolvable Location Proposal
3.6. Summary

Australian Government XML Namespace Scheme

1. EXECUTIVE SUMMARY

The “eXtensible Markup Language” (XML) is rapidly becoming the foundation standard for the interchange of data between organisations at the system to system level. XML plays an important role in Australian Government projects such as “Standardised Business Reporting” (SBR). An XML namespace is a globally unique identifier for a collection of XML elements that are logically related.
But rather than using a meaningless string of numbers, an XML namespace uses formal naming schemes to provide both global uniqueness and meaningful names. For example, the XML namespace for the first release of the Australian government name & address schema is:

    urn:xml-gov-au:final:data:NameAndAddress:1.0

As agencies start to collaborate on whole of government projects like SBR, they will each be contributing XML schema from a variety of sources. Accordingly there needs to be a means to uniquely identify each re-usable component and to eliminate confusion due to “name collision” (when two agencies give the same name to different concepts). XML namespaces provide a solution to this problem as shown in the conceptual example below.

Without Namespaces:
    <identifier>GAVIC411711441</..
    <identifier>34098932168</..

With Namespaces:
    <gnaf:identifier>GAVIC411711441</..
    <abn:identifier>34098932168</..

This document outlines a naming scheme for XML namespaces. It follows best practice as recommended by organisations like the W3C (World Wide Web Consortium) and IETF (Internet Engineering Task Force) and as applied by other national governments, such as in the US xml.gov initiative.

It is recommended that the XML namespace scheme defined in this document be promoted across Australian Government agencies and jurisdictions.

INTRODUCTION

XML is a key technology in the development and implementation of IT and data interchange projects around the world and in the Australian Government. Key whole-of-government interoperability projects such as Standardised Business Reporting (SBR) will rely heavily on XML technology and on re-usable libraries of standardised XML data structures (such as Name & Address). Specific projects will re-use XML fragments from various libraries in order to build interoperable schema that meet project requirements.

[Figure: layers of re-usable reporting elements. Common reporting elements (eg Name & Address, Payroll); domain specific cluster reporting elements (eg International Trade, Environment, Health); miscellaneous (non-cluster) reporting (eg Local Laws).]

As the size and complexity of the repository of re-usable XML fragments grows, an increasingly important issue will be how to avoid confusion and name collisions between similar elements from different domains. This is the purpose of the XML “Namespace” concept. A namespace is essentially a globally unique identifier for an XML fragment. An example namespace is:

    urn:xml-gov-au:final:data:NameAndAddress:1.0

It is important that the government establish a cohesive, coordinated namespace approach to support its various XML efforts. This namespace approach must define a standardized structure for namespaces across jurisdictions as well as establish a standardized naming convention for those namespaces. Without such a coordinated approach, individual government organizations will create a proliferation of disparate XML namespace structures and names, resulting in chaotic management of XML components. Given the ever expanding proliferation of government namespaces, it is crucial that this strategy be put in place as quickly as possible, since harmonizing the namespace structure and names used by different government organizations will become increasingly difficult over time.

This report provides some technical background on XML Namespaces and explores the use of the Uniform Resource Name (URN) and Uniform Resource Locator (URL) variants of Uniform Resource Identifiers (URIs) for a government namespace naming convention. Finally, the report outlines specific naming rules for XML namespaces.

The advantages to agencies of applying the namespace guidelines defined in this document are:

- That schema components are uniquely identified with meaningful names.
- That schema components can be more easily managed through their development lifecycle.
- That each agency can take responsibility for its schema components without fear of name collision.
- That complex information schema can be assembled from re-usable components issued by any agency.
- That compliance with evolving national frameworks such as Standardised Business Reporting (SBR) is facilitated.

From the World Wide Web Consortium (http://www.w3.org/TR/REC-xml-names/):

    We envision applications of Extensible Markup Language (XML) where a single XML document may contain elements and attributes (here referred to as a "markup vocabulary") that are defined for and used by multiple software modules. One motivation for this is modularity: if such a markup vocabulary exists which is well-understood and for which there is useful software available, it is better to re-use this markup rather than reinvent it.

    Such documents, containing multiple markup vocabularies, pose problems of recognition and collision. Software modules need to be able to recognize the elements and attributes which they are designed to process, even in the face of "collisions" occurring when markup intended for some other software package uses the same element name or attribute name.

    These considerations require that document constructs should have names constructed so as to avoid clashes between names from different markup vocabularies. This specification describes a mechanism, XML namespaces, which accomplishes this by assigning expanded names to elements and attributes.

2. BACKGROUND

This section provides the technical background necessary to understand XML namespaces. Section 2.1 provides references to detailed material available on the internet. Alternatively, sections 2.2 to 2.6 provide an overview of XML namespace usage, declaration, syntax, and registration in sufficient detail for readers to understand the key concepts.

2.1. Namespace - References

For an in-depth technical understanding of XML namespaces and related concepts, readers should follow the links provided here:

Specification: XML Language
Link: http://www.w3.org/TR/2006/REC-xml-20060816/
Description: The syntax of the XML language itself.

Specification: XML Schema
Link: http://www.w3.org/TR/xmlschema-0/
Description: XML Schema are used to define the meaning of, and validate the contents of, XML documents.

Specification: XML Namespace
Link: http://www.w3.org/TR/REC-xml-names/
Description: XML Namespace recommendations from W3C.

Specification: URI Syntax
Link: http://www.rfc-editor.org/rfc/rfc3986.txt
Description: URIs can be used for namespaces. This document provides IETF recommendations.

Specification: URN Syntax
Link: http://www.rfc-editor.org/rfc/rfc2141.txt
Description: URNs are the recommended syntax for xml.gov.au namespaces. This document provides IETF recommendations.

Specification: US xml.gov
Link: http://xml.gov/documents/completed/lmi/GS301L1_namespace.pdf
Description: This US government policy document on XML Namespaces was a key reference for the development of this document.

2.2. XML documents and XML Schema

It is important that readers clearly understand the difference between an XML “instance document” and an XML “schema”. An XML instance document carries actual data (for example an actual eBAS application from Widget Pty Ltd). An XML schema, on the other hand, is the formal description of what the instance should look like (for example the definition of the format of any eBAS application). XML instance documents will normally reference one or more XML Schema that define valid content. At runtime, an XML instance is often validated against the schema to ensure accuracy and completeness.

For the purposes of this document it is sufficient to understand that a namespace is defined in an XML schema and is then declared and used in XML instance documents.

2.3. Using namespaces – declaration & qualification

A namespace is declared in the root element of a Schema or any element of an instance using a namespace identifier in the form of a URI (see section 2.5).
For brevity and readability, the namespace identifier is associated with a prefix (usually 3 letters) that is used throughout the XML document to tag elements that belong to the declared namespace. This makes the elements “namespace qualified.” In the following example, the namespace identifier is urn:xml-gov-au:ato:abn and the namespace prefix is abn:

    <schema xmlns:abn="urn:xml-gov-au:ato:abn">

This means that any construct in the schema or instance with a namespace prefix of abn belongs to the ATO namespace, as in the following example:

    <element name="abn:Identifier" type="xsd:string"/>

Namespaces allow constructs with the same name but from different markup vocabularies to be used in the same Schema with no adverse effects. In the following example, two identifier elements are used in the same Schema, but they are associated with two different namespaces. One element represents an Australian Business Number in the ATO’s namespace, while the other represents the identification of a geo-coded location in the PSMA (Public Sector Mapping Agencies) namespace.

    <?xml version="1.0" encoding="UTF-8"?>
    <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
                xmlns:abn="urn:xml-gov-au:ato:abn"
                xmlns:gnaf="urn:xml-gov-au:psma:gnaf">
      <xsd:element name="abn:identifier" type="xsd:string"/>
      <xsd:element name="gnaf:identifier" type="xsd:string"/>
      <!-- information removed for example purposes -->
    </xsd:schema>

The xmlns:abn and xmlns:gnaf attributes define each namespace and its prefix, and the prefixes are then used to qualify the element names. Note that this listing is conceptual: a W3C XML Schema processor requires the name attribute to be an unprefixed name, so in practice each identifier element would be declared in the schema for its own namespace and then brought in with xsd:import and a prefixed ref attribute.

If the identifier elements described above were not in separate namespaces, an XML processor would generate an error. This condition is known as “name collision.”

2.4. Defining namespaces – the “target namespace”

The previous section described how namespaces are declared and used in XML documents and schema.
This section describes how namespaces are defined in the reference XML schema using the “targetNamespace” construct.

The declaration of a target namespace in a Schema indicates that the Schema is acting as a “collector” of constructs declared in it. While a Schema may have more than one namespace declared within it, only one namespace can be designated as the target namespace. A target namespace is declared using the namespace identifier of the selected namespace. In this example, the urn:xml-gov-au:ato:abn namespace is declared as the target namespace:

    <?xml version="1.0" encoding="UTF-8"?>
    <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
                xmlns:abn="urn:xml-gov-au:ato:abn"
                targetNamespace="urn:xml-gov-au:ato:abn">

This means that any element, attribute, or data type declared in the Schema belongs to the Schema’s target namespace. Target namespaces are valuable because they allow a set of Schema constructs to be collected into a single conceptual space.

2.5. The format of a namespace - URIs, URNs, URLs

A URN (Uniform Resource Name) is a globally unique, structured and persistent name for a resource such as an XML Schema. A URL (Uniform Resource Locator) is a physical and resolvable location for a resource such as an XML Schema. In simple terms, a URN is a name for something and a URL is where you can find it. A URI (Uniform Resource Identifier) is a collective name for both URLs and URNs.

Both URLs and URNs have specific syntax and structure guidelines.
A typical URN might look like:

    urn:xml-gov-au:final:data:NameAndAddress:1.0

And a (much more familiar looking) URL might look like:

    http://xml.gov.au/final/data/NameAndAddress/1.0/NameAndAddress.xsd

But a URL could also be:

    file:///C:/MyDocuments/MySchema/standards/1.0/NameAndAddress.xsd

The point here is that a schema identified with a particular name (eg the URN) could be located in several places, either on a web server on the internet or on a local file system. Namespaces are intended to be long term, persistent, globally unique identifiers for a resource and are not intended to be resolvable to a physical location.

Note that it is common practice today to use the URL notation for namespace declarations (eg http://www.someDomain/someResource.xsd). This does not mean that the resource is available from the specified location, only that the URL notation has been used to specify a globally unique name. Although common practice, this is confusing and is not recommended for the Australian Government namespace scheme.

2.6. Registration of a namespace – DNS and IANA

Both URLs and URNs need a mechanism for global uniqueness. For URLs that mechanism is the global Domain Name System (DNS), which allocates “domain names” like www.google.com and www.ato.gov.au to business or government entities and maps them to physical internet (IP) addresses. It is then up to the organisation that owns the domain to manage the path that follows the domain name (eg the “/publications/2005/04/agtifv2” in http://www.agimo.gov.au/publications/2005/04/agtifv2).

URNs have a similar mechanism for registration, and the authority is IANA (Internet Assigned Numbers Authority). However there are only a few hundred URN schemes registered with IANA compared to millions of domains registered for URLs. There are two reasons for this:

- Most websites do not need persistent names that are different to their physical location.
  Standard re-usable XML schema components, on the other hand, describe an interface or service that might be implemented by hundreds or thousands of organisations and so need a naming system that is independent of any physical location.
- There is an informal but widely used practice of relating URN prefixes to URL domains. So for example, the “gov.au” domain is owned and administered by the Australian Government. Therefore a URN prefix of the form gov-au “borrows” the domain registration of the corresponding URL and can be fairly safely used without fear of collision with another URN prefix.

3. THE “xml-gov-au” NAMESPACE SCHEME

3.1. URN Syntax – IETF RFC2141

The proposed structure follows the structure defined by the IETF Network Working Group in RFC 2141. A URN in that structure consists of the prefix “urn”, the namespace identifier (NID), and the namespace-specific string (NSS):

    urn:<nid>:<nss>

Additionally RFC 2141 specifies some constraints on the allowed character sets for NID and NSS:

NIDs: Upper or lower case letters, numbers, and the hyphen character only
NSSs: NID characters plus any of: ()+,.:=@;$_!*'

Following is an example:

    urn:xml-gov-au:abs:final:code:1292.0_anzsic:1993

Where:
NID is xml-gov-au
NSS is abs:final:code:1292.0_anzsic:1993

The following sections provide a justification for the proposal of “xml-gov-au” as the Australian Government XML Namespace Identifier (NID) and also provide a proposed naming convention for the namespace specific string (NSS).

3.2. Namespace Identifier (NID) proposal

It is proposed that the namespace identifier for use by all Australian government organisations be:

    urn:xml-gov-au

NIDs should be registered with IANA and, although any previously unregistered name could be used, the convention is to use a name that is based on existing registered domain names.
The Australian government is already the registered owner of the “gov.au” domain and therefore an NID including “gov-au” is the most obvious candidate for registration as namespace identifier. The “-” is used in place of the “.” because RFC 2141 only allows alphanumeric and hyphen characters in NIDs.

One option is simply to register “gov-au”. However the gov.au domain is very broad and it is foreseeable that there could be other quite separate uses for “gov-au” beyond XML namespaces. IANA registration of NIDs requires that a naming convention for the NSS is also specified, and so it is recommended that a more specific NID be registered that is aligned with the scope of this document. The string “xml” is proposed because it is more specific than something more general like “names” and because it is aligned with the approach already taken by some other governments (eg http://www.xml.gov/).

3.3. Namespace Specific String (NSS) proposal

It is proposed that the naming convention for the namespace specific string (NSS) used by Australian government agencies and projects be:

    {<state>}:<agency | project>:[<status>]:[<type>]:<name>:<version>

Where:

- {<state>} is a placeholder for the state level domain name (eg “nsw”). Required if applicable.
- <agency | project> is a required placeholder for the agency domain name (eg “ato”) or cross-agency project name (eg “crimtrac”).
- [<status>] is an optional placeholder for an indicator of the status of the artefacts in this namespace. Allowed values are draft | final | obsolete. If not present, the value is assumed to be “final”.
- [<type>] is an optional placeholder for an indicator of the type of artefacts in this namespace. Allowed values are:
    “core” – for context neutral core component libraries
    “data” – for re-usable aggregates
    “types” – for data types
    “codes” – for code-lists
    “docs” – for final assembly documents / messages
    “service” – for service or interface definitions
- <name> is a required placeholder for a meaningful name that describes the content or purpose of the namespace. Any string of NSS allowed characters except the colon can be used.
- <version> is a required placeholder for the version of the artefacts in the namespace. All schema or XML files in this namespace inherit this version.

The goal of this NSS proposal is to lead to namespace URNs that are meaningful, unique, stable, and consistent across government agencies and jurisdictions. At the same time, the scheme is intended to be simple to implement and flexible enough to meet varying agency or project needs.

3.4. Namespace Granularity / Modularity Proposal

A common question in XML schema design is how granular (or how “modular”) to make the schema. For example, a government project may need to develop a standard information model that describes 10 transactions that might be exchanged between several government or business entities. Should the whole model be implemented in one XML namespace or should it be broken down into smaller modules? If it is broken down, then how should it be broken down?

The modularity proposal in this section is based on existing best practice from international standards groups that has evolved over the last ten years. An example is the OASIS UBL (Universal Business Language) standard. The diagram below shows a namespace modularity scheme for a hypothetical project in the health and safety domain that involves a process of applying for a permit that requires subsequent cross agency validations and referrals.

[Fig 1 – Namespace Modularity Scheme. Final message assemblies (Application, Referral, Validation, Notification, Revocation, etc.) and miscellaneous elements are assembled from domain specific data types and business entity libraries (Health, Safety, Security), which are based on the core data types and common data elements library (Common, Data Types, Codes).]

Each white box represents one or more namespace(s) that are separately managed and version controlled:

- Code Lists are maintained in their own namespace. There may be one or more code-lists in one namespace. Code-lists should be grouped in namespaces according to how they are owned and managed. So, for example, AS4590 (Australian Standard Client Interchange) defines a collection of codes such as “Name Suffix” in one specification that is versioned as a whole. Accordingly the collection of code-lists can be in one namespace. On the other hand, the ABS’s ANZSIC code-list is managed and released separately and so should exist in its own namespace.
- Data Types are maintained in their own namespace. Some “core data types” such as “dateTime”, “Measure”, “Amount” would be maintained in a common whole of government namespace (xml-gov-au without any agency or state qualifier). Other domain or project specific data types would be maintained by the responsible agency or project.
- Data Elements are reusable components like “Address” or “Geocode” that are likely to be re-used in many contexts. They are grouped together into common domains and maintained in their own namespace by the responsible agency. A powerful approach to increase consistency across domains is to define a library of neutral “core components” that are used as the basis for domain specific components. In this way, different domains that use the same concept such as “Address” differently are all based on a common set of definitions.
- Assembly documents are built from re-usable component libraries but each should exist in its own dedicated namespace.
  Examples of assembly documents are a local government building permit form, an e-Tax submission, or a patient record.

This modularity scheme allows agencies to re-use common components. This improves both efficiency and consistency. Furthermore the modularity scheme significantly improves manageability because an entire schema does not need to be re-released when, for example, a new value is added to a code-list.

3.5. Resolvable Location Proposal

As discussed in section 2.5, URNs provide a persistent identifier for information schema whilst URLs point to where they are stored. The same schema identified with the same URN can be stored in multiple locations. In principle there need not be any relationship between the two, but it is often important to be consistent about schema location as well as naming. Therefore this section provides an optional URL scheme that is based on the URN scheme.

Where agencies are hosting their own schema libraries:

    http://xml.<agency | project>.<state>.gov.au/<status>/<type>/<name>/<version>/<filename>.<fileExtension>

Where agencies choose to host their schema on an external repository (such as http://xml.gov.au):

    http://<repositoryURL>/<state>/<agency | project>/<status>/<type>/<name>/<version>/<filename>.<fileExtension>

Where:

- <agency | project>, <state>, <status>, <type>, <name> and <version> have the same meaning and values as defined in the URN scheme.
- <filename> will normally be equal to the name and version of the specification (with underscore separator). The name duplication is intentional because it leads to meaningful filenames in webserver directories.
- <fileExtension> will normally be “.xsd” for XML schema but could also be other types such as documents or model files that support the schema and exist in the same namespace and directory.
- <repositoryURL> is the location of the repository service that is hosting the schema.
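The two URL patterns above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name schema_location, its parameter names, the default type of “data”, and the “nsw”/“health”/“Wards” values and the repo.example.org host are all hypothetical. The sketch also assumes that the agency and state segments are simply left out for whole of government schema, which is how the xml.gov.au location of the national name & address schema is formed.

```python
def schema_location(name, version, status="final", artefact_type="data",
                    agency=None, state=None, repository=None, extension="xsd"):
    """Build a schema URL following the optional location scheme (a sketch)."""
    # <filename> is the name and version joined by an underscore.
    filename = f"{name}_{version}.{extension}"
    path = [status, artefact_type, name, version, filename]
    if repository is None:
        # Agency-hosted library: xml.<agency | project>.<state>.gov.au
        # (assumption: absent agency/state segments are omitted).
        host = ".".join(["xml"] + [p for p in (agency, state) if p] + ["gov.au"])
    else:
        # External repository: <state>/<agency | project>/ is prepended to the path.
        host = repository
        path = [p for p in (state, agency) if p] + path
    return "http://" + host + "/" + "/".join(path)


# Whole of government schema, hosted at xml.gov.au.
print(schema_location("NameAndAddress", "1.0"))
# -> http://xml.gov.au/final/data/NameAndAddress/1.0/NameAndAddress_1.0.xsd

# Hypothetical state agency code-list, self-hosted and then on a repository.
print(schema_location("Wards", "2.1", status="draft", artefact_type="codes",
                      agency="health", state="nsw"))
print(schema_location("Wards", "2.1", status="draft", artefact_type="codes",
                      agency="health", state="nsw", repository="repo.example.org"))
```

Keeping the URL derivable from the same fields as the URN means a schema library can be laid out on disk or on a web server without maintaining a separate lookup table.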
Example:

The national name & address standard XML schema URN is:

    urn:xml-gov-au:final:data:NameAndAddress:1.0

The corresponding schema location would be:

    http://xml.gov.au/final/data/NameAndAddress/1.0/NameAndAddress_1.0.xsd

3.6. Summary

This document defines a namespace URN scheme of the form:

    urn:xml-gov-au:{<state>}:<agency|project>:[<status>]:[<type>]:<name>:<version>

And a resolvable location URL scheme of the form:

    http://xml.<agency|project>.<state>.gov.au/<status>/<type>/<name>/<version>/<filename>.<fileExtension>

This document also defines a namespace modularity scheme that breaks XML schema into separate namespaces according to schema type and owning agency.

The XML namespace scheme proposed in this document provides a foundation for the construction of scalable XML schema libraries that will support strategic projects like Standardised Business Reporting (SBR).
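To make the URN scheme summarised above concrete, the following Python sketch splits a namespace into its named fields. It is a heuristic reading rather than a definitive parser: the function name parse_namespace and the dictionary keys are assumptions, as is the strategy of reading the string from right to left and recognising the optional <type> and <status> fields by their allowed values. The “nsw:health” and “Wards” values are hypothetical.

```python
STATUSES = {"draft", "final", "obsolete"}
TYPES = {"core", "data", "types", "codes", "docs", "service"}


def parse_namespace(urn):
    """Split an xml-gov-au URN into the fields of the proposed scheme (a sketch)."""
    scheme, nid, nss = urn.split(":", 2)  # urn:<nid>:<nss>
    if scheme.lower() != "urn" or nid.lower() != "xml-gov-au":
        raise ValueError(f"not an xml-gov-au URN: {urn}")
    parts = nss.split(":")
    if len(parts) < 2:
        raise ValueError(f"<name> and <version> are required: {urn}")
    # Required fields sit at the right-hand end of the string.
    version = parts.pop()
    name = parts.pop()
    # Optional fields are recognised by their allowed values (assumption).
    artefact_type = parts.pop() if parts and parts[-1] in TYPES else None
    status = parts.pop() if parts and parts[-1] in STATUSES else "final"
    # Whatever remains is read as the agency or project, then the state.
    agency = parts.pop() if parts else None
    state = parts.pop() if parts else None
    if parts:
        raise ValueError(f"unexpected fields {parts} in: {urn}")
    return {"state": state, "agency": agency, "status": status,
            "type": artefact_type, "name": name, "version": version}


print(parse_namespace("urn:xml-gov-au:final:data:NameAndAddress:1.0"))
print(parse_namespace("urn:xml-gov-au:nsw:health:draft:codes:Wards:2.1"))
```

Because <status> and <type> are optional and positional, this reading is ambiguous if an agency or project is ever named after one of the allowed values (for example a project called “data”). It is also strict about the allowed values: the example in section 3.1 uses “code” rather than “codes” and would be rejected by this sketch.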