XML Schema Design Guidelines 2.1
Working Draft 2003 November 18
This version:
SchemaDesignGuidelines, 2003 Nov 18
Previous version:
SchemaDesignGuidelines-2_0, 2003 Feb 24
Editor:
Kim Bartkus (kim@hr-xml.org)
Bill Kerr (bill.kerr@oracle.com)
Paul Kiel (paul@hr-xml.com)
Lee Humphries (Lee_Humphries@softworks.com.au)
Authors:
Paul Kiel (paul@hr-xml.com)
David Baliles (david.baliles@personic.com)
Chuck Allen (chucka@hr-xml.org)
Lee Humphries (Lee_Humphries@softworks.com.au)
Contributors:
Technical Steering Committee (TSC) (tsc@lists.hr-xml.org)
Copyright statement
©2003 HR-XML. All rights reserved. No part of this publication may be reproduced,
stored in a retrieval system, or transmitted, in any form or by any means, electronic,
mechanical, photocopying, recording, or otherwise, without the prior written permission of
the publisher. Printed in the United States of America.
Abstract
XML Schemas are a fundamental technology that governs the structure of HR-XML document
types. They are a document syntax that allows for the automated processing and error checking
of an XML document via a parser. This document is intended to show the architecture of HR-XML
schemas, including some best practices and general design principles.
Status of this Document
This document has been submitted to the TSC for review. It is a draft and should not be
considered an official technical note from the TSC.
This document URL: http://schemas.hr-xml.org/xc/canon/TSC/SchemaDesignGuidelines.doc
106752302
1
Target Audience
This document is intended for assisting work group and Cross Process Object activities in
designing modular, reusable, and extensible HR-XML document types. It is also intended for
those new to schema, with guidelines on best practices.
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD",
"SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be
interpreted as described in RFC 2119.
Table of Contents
1 Scope and Purpose
1.1 Introduction: Definitions
1.1 Scope
1.2 Purpose
1.3 Relationship between examples and work group activities
2 Naming Conventions
2.1 Use CamelCase for Element and Attribute Names
2.2 Upper and Lower CamelCase
2.3 File Naming Conventions
2.4 Use Meaningful Element and Attribute Names
2.5 Don’t Use Private Encodings
2.6 Fully-qualified and Context-aware naming
2.6.1 Locally scoped naming: Context-aware
2.6.2 Globally scoped naming: Fully-qualified
2.7 Naming types
3 Design Patterns
3.1 Use Elements to Represent Data Content
3.1.1 Elements/Attributes: Other Considerations
3.2 Special Content Models
3.2.1 Mixed
3.2.2 ANY
3.2.3 Recursive
3.2.4 xsd:all
3.3 Elements and complexTypes
3.4 Extended Enumerations
3.4.1 Objective
3.4.2 Scope
3.4.3 Possible approaches: Convention or Pattern Extension
3.4.4 Data and metadata: what is good XML design?
3.5 Final note on enumerations: limitations of use
4 Scoping of components
4.1 Global and local elements/types
4.2 Design Approaches – Russian Doll, Salami Slice and Venetian Blind
4.3 Scope decision tree
5 Namespaces and Versioning
5.1 Namespaces
5.2 Versioning
5.3 An instance example
6 Localization (supporting region-specific data)
7 Appendix A - References
8 Appendix B - Document Version History
9 Appendix D - Future Issues
9.1 General Items
1 Scope and Purpose
1.1 Introduction: Definitions
XML:
XML stands for eXtensible Markup Language. It enables the user, by means of tags (e.g.
'<Person>', '<Salary>') to define the semantic meaning of a given XML document. An XML
document contains a set of tags and data content, which follow a particular structure. A
Schema may define the structure of the XML document. The user or a special interest group
such as the HR-XML Consortium can define these structures for others to use. The ability to
create standard structures enables two or more interested parties to exchange information in
a way that the receiver can understand and validate.
Schema:
A schema serves the same purpose as a DTD, but a schema provides a more
powerful and flexible means of defining markup languages.
A schema is written using XML syntax. This is not true of DTDs, which use a syntax
specified for Standard Generalized Markup Language (SGML), a precursor
technology of which XML Version 1.0 is a subset. Schemas support data typing and
constructs that allow one element to derive properties from another element type.
Several groups and companies developed their own schema specifications in advance of the
World Wide Web Consortium’s (W3C) issuance of its XML Schema Definition Language
(“XSDL” or “XSD”) recommendation. The term “schema” is sometimes used loosely in a way
that encompasses DTDs as well as the various schema formats. However, as used in
this document, “schema” refers to the W3C’s XML Schema Definition Language
(“XSDL” or “XSD”) recommendation. For further information, see http://www.w3c.org.
1.1 Scope
This document provides design guidelines and conventions for XML Schemas developed by HR-XML workgroups.
1.2 Purpose
HR-XML’s Schema design guidelines and conventions are intended to ensure that XML Schemas
developed by HR-XML workgroups are:
• Easy to use and intelligible to the various end-user constituencies (software developers, HR professionals, database administrators, etc.)
• Consistent in design across HR-XML workgroups
• Consistent with a library-based approach
• Easy to maintain
• Simple
• Extensible (without undermining the standard!)
• Version controlled
• “Best Practices” aware
• Context aware
These guidelines state the best practices of schema design within the HR-XML Consortium.
1.3 Relationship between examples and work group activities
While examples in this document occasionally draw upon working drafts of schemas being
developed within the HR-XML Consortium, these schemas were often edited for space or
simplicity and should not be construed as representing valid instances or schema fragments
resulting from actual work group activities.
2 Naming Conventions
2.1 Use CamelCase for Element and Attribute Names
Element names that contain multiple words should have each word capitalized. This is known as
“camel case” notation. Other special delimiting characters, such as ‘_’, ‘-’, ‘.’ should be avoided.
An exception to this convention is made for element names that contain abbreviated words – in
which case the abbreviated letters are in upper case.
2.2 Upper and Lower CamelCase
Follow these conventions for the use of upper and lower CamelCase:
• All element names are upper CamelCase – e.g., PersonName
• All attribute names are lower CamelCase – e.g., familyNamePrefix
• Within elements, abbreviations are upper CamelCase – e.g., IntlCode, TelNumber
• Within elements, acronyms are UpperCase – e.g., UL, TTDNumber
• Within attributes, acronyms are lower CamelCase – e.g., src, idRef
• Enumerated list values must use the rules above that apply to elements, meaning the use of upper CamelCase
2.3 File Naming Conventions
Filenames also should follow the CamelCase convention. For instance, a schema written to
support a benefits enrollment process might be named:
BenefitsEnrollment.xsd
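The casing conventions of sections 2.1 to 2.3 can be sketched in a small schema fragment. This is an illustration only: the content model shown for PersonName is an assumption made for the example and is not taken from any HR-XML schema.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical fragment of BenefitsEnrollment.xsd; the content model is illustrative only. -->
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <!-- Element name: upper CamelCase -->
  <xsd:element name="PersonName">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="GivenName" type="xsd:string"/>
        <!-- Abbreviation within an element name: upper CamelCase -->
        <xsd:element name="TelNumber" type="xsd:string" minOccurs="0"/>
      </xsd:sequence>
      <!-- Attribute name: lower CamelCase -->
      <xsd:attribute name="familyNamePrefix" type="xsd:string"/>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>
```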
2.4 Use Meaningful Element and Attribute Names
Choose element names that reflect the business language you are modeling. They should be
meaningful and easy to understand. Abbreviations within names should be avoided, unless the
element name would otherwise be excessively long. When using abbreviations, the most
commonly accepted and clearest version should be used. Keep in mind international audiences,
so avoid regional abbreviations.
<Shp2> Don’t do this </Shp2>
<Dptmnt> Don’t do this </Dptmnt>
<Dept> Write it this way </Dept>
2.5 Don’t Use Private Encodings
When practical, use distinct elements to separate parts of data. If you have a data item that has
internal structure, separating the parts with tags enables an XML parser to do the work of
distinguishing components for you, which saves you the cost of writing a custom parser.
Additionally, you will have made the structure publicly readable and thereby available to standard
tools such as XSL style sheets and query languages. If something has more than one part, it is
likely that it will gain still more parts in the future, and using tags up-front means easier extension
later.
For example:
<Price>USD 123.45</Price> is easier to process when written as:
<Price currency="USD">123.45</Price>
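A schema declaration for the second form might look like the following sketch. The choice of xsd:decimal for the amount and of a required string attribute for the currency are assumptions made for illustration; the guideline itself prescribes neither.

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="Price">
    <xsd:complexType>
      <xsd:simpleContent>
        <!-- The element content is the numeric amount (assumed to be a decimal) -->
        <xsd:extension base="xsd:decimal">
          <!-- The currency code is carried separately, so no custom parsing is needed -->
          <xsd:attribute name="currency" type="xsd:string" use="required"/>
        </xsd:extension>
      </xsd:simpleContent>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>
```

With a declaration of this kind a validating parser can reject a non-numeric amount, which is not possible when the currency and the amount share one string.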
2.6 Fully-qualified and Context-aware naming
Within work groups, there is a tension between modeling schemas based on top-down and
bottom-up processes. The top-down approach uses object-oriented models, developed from a
well-researched model of the domain space. The bottom-up approach is based on the need for
immediate, utilitarian business document modeling in a space where there may not be any extant
domain object models.
To illustrate this tension, consider the top-down model oriented element structure:
<Object>
<Type>
<Name/>
</Type>
</Object>
The corresponding bottom-up, utilitarian XML might look like this:
<ObjectTypeName/>
A comparison:
Model Oriented
Pros:
• Allows for greater reuse of components.
• Minimizes the number of unique components, easing implications on processing and glossaries.
Cons:
• Takes longer to model domain properly.
• Bloats instance with more tags, especially in large documents.
• Does not lend itself easily to using Prime, Class, and Modifier words in each element.
Utilitarian
Pros:
• Can be implemented faster.
• Lends itself easily to using Prime, Class, and Modifier words for each element.
Cons:
• Components are not as reusable.
• Increases the number of unique components, with implications in processing and glossaries (e.g. <Object1TypeName/>, <Object2TypeName/>, <Object3TypeName/>).
Given the scenarios described above, HR-XML naming conventions will try to balance fully
qualified and context-specific naming patterns. For components (elements, attributes, types)
scoped globally, use class name based naming (fully qualified). For locally scoped components,
use context aware naming.
2.6.1 Locally scoped naming: Context-aware
When a component is deemed to be scoped locally, the context of the name MAY be considered (see footnote 1).
2.6.2 Globally scoped naming: Fully-qualified
A data element name is usually composed of between one and three component words. Limiting
names to this length keeps the naming conventions consistent and minimizes the performance
drain from excessively long element names.
Each component is classified as a prime word, class word, or modifier word.
First, a data name may contain a prime word designator. Prime words identify business
information objects to which the data element (or attribute) belongs, for example <Insurance>.
Generic elements such as <Address> may not need an industry-specific prime word, especially if
it will be used in many contexts.
Second, elements and attributes may be classified into common groupings using class words.
Classes can better organize and provide meaningful names for elements.
Common class names include: Cost, Date, Days, Day, Description, Id, Idref, Indicator, Name,
Number, Rate, and Ratio.
Footnote 1: I believe there may still be some confusion regarding the context surrounding the naming of
local elements. During the development and final reviews of the Background Checking
specification, one suggestion was to include the context of the element within the name, and
the other suggestion was to allow the parent element to define the context. To a certain degree, this
was also reflected in some of the comments from the HRDD workgroup. As one example, a
number of our clients have identified confusion regarding the ReferenceId element, which is used
in several contexts. In each case, the context is defined by the parent element.
Lastly, in addition to a prime and class word designator, a data name may contain one or more
modifier words or adjectives that further describe the data element.
An example that illustrates the use of all three types of component names is <JobPostingDate>.
“Job” is a business object specific to this domain, “Posting” further modifies the term “Job”, and
“Date” is a class name indicating the type of data element.
Good Examples:
<attribute name="expenseRatio"/>
Here the class word “Ratio” is used to indicate the class of information.
<element name="DropDeadDate"/>
Depending on the data being modeled, this could also be DropDeadDay or DropDeadDays.
<element name="ErrorDescription"/>
“Description” indicates the meaning of this element, as opposed to an error code or an error
handling instruction.
<attribute name="jobSeekerId"/>
This name includes clear prime, class, and modifier words.
<attribute name="insuranceIndicator" default="covered"/>
Bad Examples:
<element name="TheCode"/>
The class word “Code” does not make this element name more comprehensible. The choice of
prime word is equally bad.
<element name="ListOfAllVacanciesInTheDatabase"/>
If all elements were given such long names, the impact on performance with large documents
would begin to show.
<element name="Item"/>
This element is unclear.
2.7 Naming types
It is the Consortium’s policy for the names of types to correlate with the components that use
them. This is done by appending the key word “Type” to the end of the name of a component
that would use the type. For example, the type used in the element “PersonName” would be
called “PersonNameType”.
This pattern is a loose correlation, however, as many different elements with different names
could use the same type (e.g. the element “Participant” could be of type “PersonNameType”). This
guideline is intended to semantically relate types with components for ease of use and readability
and should be used in that context.
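As a minimal sketch of this convention, the fragment below declares one type and two elements that use it. The child elements GivenName and FamilyName are assumptions added only to give the type some content.

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <!-- Type name = name of the element that uses it + "Type" -->
  <xsd:complexType name="PersonNameType">
    <xsd:sequence>
      <!-- Hypothetical children, for illustration only -->
      <xsd:element name="GivenName" type="xsd:string"/>
      <xsd:element name="FamilyName" type="xsd:string"/>
    </xsd:sequence>
  </xsd:complexType>
  <!-- The element the type is named after -->
  <xsd:element name="PersonName" type="PersonNameType"/>
  <!-- A differently named element reusing the same type (the loose correlation) -->
  <xsd:element name="Participant" type="PersonNameType"/>
</xsd:schema>
```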
3 Design Patterns
3.1 Use Elements to Represent Data Content
The choice of whether to represent semantic components as elements or attributes is made on a
case-by-case basis. However, the general rule is that elements should be used to represent
objects and data content. Attributes should be used to represent element metadata.
One rule to apply is to see if the data item could exist on its own as an inherent piece of
information. If the information is a distinct item, but dependent upon another, make it a child
element. If it merely refines a larger item, then use an attribute.
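The rule above can be illustrated with a short instance fragment. The element and attribute names here are hypothetical and chosen only to show the distinction.

```xml
<!-- Hypothetical instance; names are illustrative only. -->
<Person>
  <!-- A distinct item that depends on Person: modeled as a child element -->
  <PersonName>Chuck Allen</PersonName>
  <!-- "type" merely refines TelNumber, so it is modeled as an attribute -->
  <TelNumber type="voice">5551234</TelNumber>
</Person>
```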
3.1.1 Elements/Attributes: Other Considerations
In deciding whether a semantic component should be represented as an element or attribute, the
following key differences should be considered:
XML 1.0 rules to bear in mind
• Elements can contain only other elements or data, while attributes can contain only data. Thus, if a semantic component has a complex structure or could potentially contain other elements, the component should be modeled as an element.
• Elements can be declared so that XML 1.0 conforming parsers preserve the order of elements. You cannot similarly control the order of attributes. Some parsers may preserve the order of attributes, but this is not a requirement under XML 1.0.
• Processors that use the SAX interface return attributes as a set with the start of an element. In contrast, child elements are returned one-by-one.
• An element can have only one attribute of type “ID.” (And of course attributes of type “ID” must have unique values within the entire document.)
• Element and attribute names must begin with a letter or an underscore, not a digit (e.g. <7DaysAWeek/> is not a valid XML 1.0 element).
• If the potential values of data are known, then an enumerated list of values will provide enforcement and error checking. [See section on Extended Enumerations.]
3.2 Special Content Models
Guidance on the use of the following content models: Mixed, ANY, recursive, and xsd:all.
3.2.1 Mixed
Mixed content models are elements that contain data as well as sub elements. These content
models MUST NOT be used, except in the limited cases described at the end of this section. Processing this data by applying style sheets, for
example, can produce inconsistent results across parser implementations. The use of a child
element to contain the mixed data can solve this issue. In this manner, all components are either
data or elements, but not both.
Avoid:
<Book>The HR-XML Story
  <Author>Chuck Allen</Author>
</Book>
Good:
<Book>
  <Title>The HR-XML Story</Title>
  <Author>Chuck Allen</Author>
</Book>
Mixed should only be used in HR-XML where there is a necessity to accommodate either text or
elements within one element (an equivalent to Choice). The original reason for mixed content is
to allow for text markup, therefore it may still be desirable in certain contexts, e.g. the text of
résumés.
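Where mixed content is justified for text markup, it is enabled with the mixed attribute on the complex type. The sketch below assumes a hypothetical paragraph element P with an inline Emphasis element; neither name is prescribed by this document.

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <!-- Hypothetical paragraph element for free-form resume text -->
  <xsd:element name="P">
    <!-- mixed="true" allows character data between the child elements -->
    <xsd:complexType mixed="true">
      <xsd:sequence>
        <xsd:element name="Emphasis" type="xsd:string"
                     minOccurs="0" maxOccurs="unbounded"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>
```

An instance such as <P>Supervised <Emphasis>8 FTE</Emphasis> for two years.</P> would be valid against this declaration.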
3.2.2 ANY
Content models of type ANY should be used with caution. This type of content can serve the
positive purpose of enabling extensions or integrations but needs to be restricted enough to
prevent dilution of the standard. Careful selection of its use is encouraged.
The Technical Steering Committee should be consulted on any potential exceptions to this rule.
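One way to keep such a content model restricted is to confine the wildcard to a single, clearly named extension point and to limit it to foreign namespaces. The following is a sketch under assumptions: the element name UserArea and the example.org namespace are hypothetical placeholders, not values defined by this document.

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            targetNamespace="http://example.org/hypothetical"
            xmlns="http://example.org/hypothetical"
            elementFormDefault="qualified">
  <!-- Hypothetical extension point -->
  <xsd:element name="UserArea">
    <xsd:complexType>
      <xsd:sequence>
        <!-- ##other: only elements from namespaces other than the target
             namespace are accepted, so standard elements cannot be redefined.
             lax: content is validated only if a declaration for it is available. -->
        <xsd:any namespace="##other" processContents="lax"
                 minOccurs="0" maxOccurs="unbounded"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>
```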
3.2.3 Recursive
A recursive data content model is one where an element contains itself as a descendant. For example:
<xsd:element name="RootElement">
  <xsd:complexType>
    <xsd:sequence>
      <xsd:element ref="ChildElement"/>
      <xsd:element ref="AnotherChild"/>
    </xsd:sequence>
  </xsd:complexType>
</xsd:element>
<xsd:element name="ChildElement">
  <xsd:complexType>
    <xsd:sequence>
      <xsd:element ref="RootElement" minOccurs="0"/>
    </xsd:sequence>
  </xsd:complexType>
</xsd:element>
<!-- above: ChildElement refers to its parent as a child; minOccurs="0" lets a finite instance end the recursion -->
<xsd:element name="AnotherChild">
  <xsd:complexType>
    <xsd:sequence>
      <xsd:element ref="AnotherChild" minOccurs="0"/>
    </xsd:sequence>
  </xsd:complexType>
</xsd:element>
<!-- above: AnotherChild refers to itself directly -->
Recursive content models should be used with caution. This structure could lead to
system problems when creating or processing a document, such as when applying style sheets. For
example, if a program or stylesheet was created to “walk the tree”, simply processing
each node in turn, then there is the potential for infinite looping. Design of such
processing should take this potential into account.
An alternate way to model this type of relationship is to use ID and IDREF. For example,
<xsd:element name="RootElement">
  <xsd:complexType>
    <xsd:sequence>
      <xsd:element ref="ChildElement"/>
      <xsd:element ref="AnotherChild"/>
    </xsd:sequence>
    <xsd:attribute name="id" type="xsd:ID" use="required"/>
  </xsd:complexType>
</xsd:element>
<xsd:element name="ChildElement">
  <xsd:complexType>
    <xsd:simpleContent>
      <xsd:extension base="xsd:string">
        <xsd:attribute name="rootIdRef" type="xsd:IDREF" use="required"/>
      </xsd:extension>
    </xsd:simpleContent>
  </xsd:complexType>
</xsd:element>
<!-- above: ChildElement points to its parent -->
<xsd:element name="AnotherChild">
  <xsd:complexType>
    <xsd:simpleContent>
      <xsd:extension base="xsd:string">
        <xsd:attribute name="id" type="xsd:ID" use="required"/>
        <xsd:attribute name="anotherIdRef" type="xsd:IDREF" use="required"/>
      </xsd:extension>
    </xsd:simpleContent>
  </xsd:complexType>
</xsd:element>
<!-- above: AnotherChild points to a sibling -->
Occasionally the correct modeling of business objects and their relationships requires recursion,
rather than the “implied” relationships using IDREF. Below is one example, where an
unstructured form of resume enables nested sections. This, in essence, provides structure for
data that may not conform to structures modeled elsewhere in the schema.
Example instance:
<FreeFormResume>
  <ResumeSection>
    <SectionTitle>Skills</SectionTitle>
    <SecBody>
      <!-- here, ResumeSection is a descendant of itself describing
           the “Management” part of the Skills section -->
      <ResumeSection>
        <SectionTitle>Management</SectionTitle>
        <SecBody>
          <P>Supervised 8 FTE.</P>
        </SecBody>
      </ResumeSection>
      <!-- here, ResumeSection is again a descendant of itself describing
           the “Information Technology” part of the Skills section -->
      <ResumeSection>
        <SectionTitle>Information Technology</SectionTitle>
        <SecBody>
          <P>Programmed in C++ and Java.</P>
        </SecBody>
      </ResumeSection>
    </SecBody>
  </ResumeSection>
</FreeFormResume>
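A schema that would accept the instance above could be sketched as follows. The element names come from the instance; the content models (in particular the repeating choice inside SecBody) are assumptions made for illustration and do not reproduce an actual HR-XML schema.

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="FreeFormResume">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element ref="ResumeSection" maxOccurs="unbounded"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
  <xsd:element name="ResumeSection">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="SectionTitle" type="xsd:string"/>
        <xsd:element name="SecBody">
          <xsd:complexType>
            <!-- minOccurs="0" means a section body may be empty, so every
                 finite document can end the recursion -->
            <xsd:choice minOccurs="0" maxOccurs="unbounded">
              <xsd:element name="P" type="xsd:string"/>
              <!-- The recursive step: a section nested inside a section body -->
              <xsd:element ref="ResumeSection"/>
            </xsd:choice>
          </xsd:complexType>
        </xsd:element>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>
```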
3.2.4 xsd:all
The use of the XML Schema “all” compositor (meaning child elements must all occur but can
occur in any order) is strongly discouraged. This usage lends itself easily to ambiguous content
models, which can cause interoperability problems across parser implementations.
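The difference between the discouraged form and a sequence-based alternative can be sketched as follows. The type and element names are hypothetical.

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <!-- Discouraged: GivenName and FamilyName may appear in either order -->
  <xsd:complexType name="PersonNameAnyOrderType">
    <xsd:all>
      <xsd:element name="GivenName" type="xsd:string"/>
      <xsd:element name="FamilyName" type="xsd:string"/>
    </xsd:all>
  </xsd:complexType>
  <!-- Preferred: a sequence fixes one order for every instance -->
  <xsd:complexType name="PersonNameType">
    <xsd:sequence>
      <xsd:element name="GivenName" type="xsd:string"/>
      <xsd:element name="FamilyName" type="xsd:string"/>
    </xsd:sequence>
  </xsd:complexType>
</xsd:schema>
```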
3.3 Elements and complexTypes
Choosing whether to model a data item as an <element> or <complexType> has a significant
impact on its future use. In XML Schemas, complexTypes can be reused whereas elements can
only be “referenced”. This distinction can be the difference between good modular design and
inflexible rigidity.
Take for example:
<xsd:element name="PhoneNumber">
  <xsd:complexType>
    <xsd:sequence>
      <xsd:element name="Prefix"/>
      <xsd:element name="MainNumber"/>
    </xsd:sequence>
  </xsd:complexType>
</xsd:element>
In this Schema, the element <PhoneNumber> is clearly defined with 2 sub elements. Creating a
<FaxNumber> element with the same structure would require declaring the same content model
thus duplicating items. On the other hand, the “PostalAddress” definition in the example below is
a complex data type, not an element. This builds in modularity, allowing the creation of multiple
elements with this same content model, as shown in <DeliveryAddress> and <BillingAddress>.
This modularity allows for the building of reusable components during the design phase.
The modularity extends beyond the ability to declare multiple elements of the same type. In the
maintenance phase of the Schema, a change to a single data type declaration affects all
elements that refer to it. In this example, editing the “PostalAddress” declaration changes
both <DeliveryAddress> and <BillingAddress>.
In designing Schemas, careful examination of the nature of the business object can determine if it
is likely to be reused elsewhere or in the future. If this is so, then using <complexType> will
reduce duplicative declarations and provide extensibility. Conversely, truly unique items can
simply be declared as an <element>.
Example – Address/Phone Number:
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:complexType name="PhoneNumber">
    <xsd:sequence>
      <xsd:element name="Prefix"/>
      <xsd:element name="MainNumber"/>
    </xsd:sequence>
  </xsd:complexType>
  <xsd:complexType name="PostalAddress">
    <xsd:sequence>
      <xsd:element name="AddressLine" type="xsd:string"/>
      <xsd:element name="Municipality" type="xsd:string"/>
      <xsd:element name="Region" type="xsd:string"/>
      <xsd:element name="PostalCode" type="xsd:string"/>
    </xsd:sequence>
  </xsd:complexType>
  <xsd:element name="FaxNumber" type="PhoneNumber"/>
  <xsd:element name="DeliveryAddress" type="PostalAddress"/>
  <xsd:element name="BillingAddress" type="PostalAddress"/>
</xsd:schema>
It is important to note that new types can also be derived from existing ones, either via restriction or extension, when a
completely new data type is not needed. In the example below, the base type “string” is restricted
in this attribute. In this case, the attribute may have one of an enumerated list of only 5 possible
strings. In this fashion, existing types can be used to derive other types.
Example fragment:
<xsd:attribute name="PhoneType">
  <xsd:simpleType>
    <xsd:restriction base="xsd:string">
      <xsd:enumeration value="voice"/>
      <xsd:enumeration value="fax"/>
      <xsd:enumeration value="pager"/>
      <xsd:enumeration value="TDD"/>
      <xsd:enumeration value="cell"/>
    </xsd:restriction>
  </xsd:simpleType>
</xsd:attribute>
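Derivation by extension works in the opposite direction: it adds to a base type instead of narrowing it. The sketch below extends the “PhoneNumber” complex type used earlier in this section; the derived type name IntlPhoneNumber and the added IntlCode element are hypothetical and shown only to illustrate the mechanism.

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <!-- Base type, as in the Address/Phone Number example -->
  <xsd:complexType name="PhoneNumber">
    <xsd:sequence>
      <xsd:element name="Prefix" type="xsd:string"/>
      <xsd:element name="MainNumber" type="xsd:string"/>
    </xsd:sequence>
  </xsd:complexType>
  <!-- Hypothetical derived type: inherits Prefix and MainNumber, then appends IntlCode -->
  <xsd:complexType name="IntlPhoneNumber">
    <xsd:complexContent>
      <xsd:extension base="PhoneNumber">
        <xsd:sequence>
          <xsd:element name="IntlCode" type="xsd:string"/>
        </xsd:sequence>
      </xsd:extension>
    </xsd:complexContent>
  </xsd:complexType>
  <xsd:element name="FaxNumber" type="IntlPhoneNumber"/>
</xsd:schema>
```

Note that elements added by extension always follow the elements of the base type in the resulting content model.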
3.4 Extended Enumerations
3.4.1 Objective
HR-XML Consortium work groups are increasingly grappling with a common problem regarding
the use of enumerations in schemas. The traditional use of enumerations consists of a fixed
number of provided values, which are determined at the time of the schema design. The problem
arises when a business process is modeled with a schema that includes enumerated lists that do
not cover 100% of the foreseen cases. The main question arises: how can a work group
standardize enumerated values when less than 100% of the foreseeable values are known at
design time? The objective of this text is to endorse a method for standardizing enumerated
values without preventing extensions to cover unknown or trading partner specific values.
3.4.2 Scope
This text focuses on Consortium-wide endorsement of XML Schema enumeration standardization
and extension methods. For the purposes of this text, the problem is referred to as “incomplete
enumeration lists.”
3.4.3 Possible approaches: Convention or Pattern Extension
The two most debated approaches to standardizing incomplete enumeration lists are a union of
values with a string, essentially a convention, and a union of values with a pattern, known as
pattern extension.
The Convention Method
The “convention” method consists of a list of known enumerated values unioned with a string.
Consider a scenario where there is a contact method element called <VoiceNumber> (lines 01 to
09). This element models telecom number data used for contacting some person. An attribute of
this element, called “whenAvailable” (line 05), is used to model when this VoiceNumber is
considered a valid contact. A schema design team may know many of the values that may occur in
this attribute, e.g. “daytime”, “nighttime”, “weekends”, etc. However, they may want to allow more
values than are stated at design time, while still standardizing the known values.
01: <xsd:element name="VoiceNumber">
02:   <xsd:complexType>
03:     <xsd:simpleContent>
04:       <xsd:extension base="xsd:string">
05:         <xsd:attribute name="whenAvailable" type="whenAvailableExtensionType"/>
06:       </xsd:extension>
07:     </xsd:simpleContent>
08:   </xsd:complexType>
09: </xsd:element>
10: <!-- the known enumerations for availability -->
11: <xsd:simpleType name="whenAvailableType">
12:   <xsd:restriction base="xsd:string">
13:     <xsd:enumeration value="daytime"/>
14:     <xsd:enumeration value="nighttime"/>
15:     <xsd:enumeration value="weekends"/>
16:   </xsd:restriction>
17: </xsd:simpleType>
18: <!-- this is the unioning of the enumeration with a string -->
19: <xsd:simpleType name="whenAvailableExtensionType">
20:   <xsd:union memberTypes="whenAvailableType xsd:string"/>
21: </xsd:simpleType>
In this example, the known values are stated as an enumeration (lines 11 to 17). The final step
that makes this a convention is where it is unioned with a string data type (lines 19 to 21). This
union means that the attribute “whenAvailable” must be either a member of the enumerated list
OR a string.
Since all enumerated values given are also strings, this in effect makes the data type a string.
The parser cannot know that the value “daytimmme” is a misspelling of an enumerated value and
should be an error. The value is still a string, so it is considered valid. According to the parser,
both of the instances below are considered valid.
<VoiceNumber whenAvailable="daytime">11111111</VoiceNumber>
<VoiceNumber whenAvailable="daytimmme">2222222</VoiceNumber>
The advantage of going to the trouble of enumerating values when the result is essentially a
string, is that implementers have a set of known values they can use to pass data that need not
be detailed in a Trading Partner Agreement. It standardizes values, but only as a convention.
The disadvantage is that validation of values falls on the implementation instead of the parser.
The effective data type of the data is a string, and so implementations must determine if the value
is valid or a misspelled enumeration for example.
The Pattern Extension Method
The pattern extension method is similar to the convention except that the parser is able to
distinguish between standard and non-standard values. Consider the same scenario as above, a
<VoiceNumber> element with a “whenAvailable” attribute.
01: <xsd:element name="VoiceNumber">
02:   <xsd:complexType>
03:     <xsd:simpleContent>
04:       <xsd:extension base="xsd:string">
05:         <xsd:attribute name="whenAvailable" type="whenAvailableExtensionType"/>
06:       </xsd:extension>
07:     </xsd:simpleContent>
08:   </xsd:complexType>
09: </xsd:element>
10: <!-- the known enumerations for availability -->
11: <xsd:simpleType name="whenAvailableType">
12:   <xsd:restriction base="xsd:string">
13:     <xsd:enumeration value="daytime"/>
14:     <xsd:enumeration value="nighttime"/>
15:     <xsd:enumeration value="weekends"/>
16:   </xsd:restriction>
17: </xsd:simpleType>
18: <!-- the pattern used for all extension enumeration values -->
19: <xsd:simpleType name="otherPatternType">
20:   <xsd:restriction base="xsd:string">
21:     <xsd:pattern value="x:\S.*"/>
22:   </xsd:restriction>
23: </xsd:simpleType>
24: <!-- here is where we bring them together: either it is a known enumeration or
25:      it is an enumeration that begins with the string "x:" -->
26: <xsd:simpleType name="whenAvailableExtensionType">
27:   <xsd:union memberTypes="whenAvailableType otherPatternType"/>
28: </xsd:simpleType>
In this case, the known values for the attribute are stated the same as in the convention method
(lines 11 to 17). However, a pattern is used to establish the format of any non-standard values
(lines 19 to 23). This pattern states that any conforming value must begin with the
string “x:” (not including the quotes), followed by at least one non-whitespace character; any
further characters, including spaces, may follow (line 21). When the enumeration is unioned with
this pattern (lines 26 to 28), the result is that the attribute must be either an enumerated value
OR a string beginning with “x:”. For consistency, the pattern should always use lower case “x”.
<!-- valid values from the enumeration list -->
<VoiceNumber whenAvailable="daytime">11111111</VoiceNumber>
<VoiceNumber whenAvailable="nighttime">2222222</VoiceNumber>
<!-- valid values, conforming to the pattern extension-->
<VoiceNumber whenAvailable="x:everyTuesday">4444444</VoiceNumber>
<VoiceNumber whenAvailable="x:one summer night">5555555</VoiceNumber>
<!-- invalid values, neither enumerated values nor do they conform to the pattern -->
<VoiceNumber whenAvailable="mondaysOnly">5555555</VoiceNumber>
<VoiceNumber whenAvailable="daytimmme">5555555</VoiceNumber>
The advantage of this approach is that the parser can enforce the difference between standard
and non-standard values using the delimiter pattern “x:”. It can also enforce the list of standard
enumerations (see last instance example above).
The disadvantage is the requirement for processing the extension values, as the delimiter must be
discarded.
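The delimiter handling mentioned above can be sketched in a few lines of XSLT 1.0. This is only an illustration and not part of the HR-XML guidelines: it assumes the <VoiceNumber> instances are in no namespace, as in the examples in this section, and the output labels "standard" and "extension" are invented for the sketch.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- Report each whenAvailable value, discarding the "x:" delimiter
       when the value is a pattern extension. -->
  <xsl:template match="VoiceNumber">
    <xsl:choose>
      <xsl:when test="starts-with(@whenAvailable, 'x:')">
        <xsl:text>extension: </xsl:text>
        <!-- substring-after returns the text following the first "x:" -->
        <xsl:value-of select="substring-after(@whenAvailable, 'x:')"/>
      </xsl:when>
      <xsl:otherwise>
        <xsl:text>standard: </xsl:text>
        <xsl:value-of select="@whenAvailable"/>
      </xsl:otherwise>
    </xsl:choose>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>

  <!-- Suppress all other text so only the lines above are written. -->
  <xsl:template match="text()"/>
</xsl:stylesheet>
```

Because the parser has already rejected values that are neither enumerated nor prefixed, the stylesheet only needs to test for the prefix; it does not have to re-check the enumeration list.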
3.4.4 Data and metadata: what is good XML design?
Given the two methodologies discussed, the technical temptation is to endorse the Pattern
Extension method, as it utilizes the parser much more effectively. But does this violate a principle
of good XML design? In designing schemas, editors often adhere to the notion that data should
be separated from metadata. Further, an often-used design rule is that elements should contain
data and attributes metadata. Given the Pattern Extension method, is this combining data and
metadata in the attribute value?
The Technical Steering Committee has determined that while the principle of separating data from
metadata has significant merit, in this case, making the best use of the parser can be more
important. Consequently, it endorses the Pattern Extension method of standardization of
incomplete enumeration lists when most of the values are known. When only a few values are
known, the Convention Method is acceptable, as is using an atomic data type such as a simple
string.
3.5 Final note on enumerations: limitations of use
Enumerations are part of the standard, and conforming implementations must support all
enumerations. Therefore, using enumerations (including the Pattern Extension method) may not
be a good tool for modeling data that may vary in implementation. For example, if the
enumerations are geographically specific, and the list of valid enumerations likely to be
implemented is region-specific, then enumerations are most likely not the proper tool, as any
conforming implementation must support all enumerations.
The Pattern Extension method is a powerful tool for enabling the use of incomplete enumeration
lists. However, it should not be seen as a standard practice for all enumerations. It is intended for
instances where most values are known, and not where only a few are known. When only a few
values are known, and an atomic data type such as a simple string is insufficient, the Convention
Method is preferred.
Even further, the Pattern Extension method should not be used when work groups believe they
know all the enumerations that should be in a schema, but want to leave the door open to
“possibilities.” Excessive use of the Pattern Extension could lead to an undermining of the
standard. Essentially, this design should be used selectively.
4 Scoping of components
4.1 Global and local elements/types
Example – Address/Phone Number:
01: <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
02:   <xsd:complexType name="PhoneNumber">
03:     <xsd:sequence>
04:       <xsd:element name="Prefix"/>
05:       <xsd:element name="MainNumber"/>
06:     </xsd:sequence>
07:   </xsd:complexType>
08:   <xsd:complexType name="PostalAddress">
09:     <xsd:sequence>
10:       <xsd:element name="AddressLine" type="xsd:string"/>
11:       <xsd:element name="Municipality" type="xsd:string"/>
12:       <xsd:element name="Region" type="xsd:string"/>
13:       <xsd:element name="PostalCode" type="xsd:string"/>
14:     </xsd:sequence>
15:   </xsd:complexType>
16:   <xsd:element name="FaxNumber" type="PhoneNumber"/>
17:   <xsd:element name="DeliveryAddress" type="PostalAddress"/>
18:   <xsd:element name="BillingAddress" type="PostalAddress"/>
19: </xsd:schema>
In the Address/Phone Number example above, both global and local components are declared, as
listed below. The use of global and local elements is an important consideration in the design
of schemas and CPO objects. Recall that global elements and types are declared as children of
the root element <schema>. All declarations and definitions occurring below this level
(grandchildren and deeper) are considered local.
Global elements and complexTypes:
02: <xsd:complexType name="PhoneNumber">
08: <xsd:complexType name="PostalAddress">
16: <xsd:element name="FaxNumber" type="PhoneNumber"/>
17: <xsd:element name="DeliveryAddress" type="PostalAddress"/>
18: <xsd:element name="BillingAddress" type="PostalAddress"/>
Local elements and complexTypes:
04: <xsd:element name="Prefix"/>
05: <xsd:element name="MainNumber"/>
10: <xsd:element name="AddressLine" type="xsd:string"/>
11: <xsd:element name="Municipality" type="xsd:string"/>
12: <xsd:element name="Region" type="xsd:string"/>
13: <xsd:element name="PostalCode" type="xsd:string"/>
The key distinction between global and local is in the reusability of components. The
complexTypes “PhoneNumber” and “PostalAddress” can be defined once and reused for
elements with similar structure, as evidenced in the <FaxNumber>, <DeliveryAddress> and
<BillingAddress> elements.
4.2 Design Approaches – Russian Doll, Salami Slice and Venetian Blind
In a general discussion on schema best practices [see XML Schemas: Best Practices], three
design approaches are explained. In the Russian Doll design, all elements are localized. This
design prevents the reuse of components and should therefore not be used in HR-XML schemas.
On the other end of the spectrum, the Salami Slice design uses all global elements and
complexTypes. While the reusability is very high, there are some drawbacks: the Salami Slice is
very verbose and makes it impossible to hide namespaces. While HR-XML schemas expose
namespaces [see section Namespaces and Versioning], the right to change this design in the
future without having to completely redesign the schemas should be retained. The solution is to
use the Venetian Blind design, where major reusable components are defined as global
complexTypes. Declarations within the reusable components should be local.
Since the Venetian Blind approach retains reusability without preventing the future use of
namespace hiding, it is considered the preferred design approach. The HR-XML Technical
Steering Committee, and the XML Schemas: Best Practices document, both endorse the use of
the Venetian Blind design. HR-XML Consortium schemas should therefore follow this design
approach.
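As a rough side-by-side illustration of the three approaches, the sketch below declares the same two-field address in each style within one schema. The element names RussianDollAddress and SalamiSliceAddress are invented for this comparison; only the Venetian Blind portion follows the Address/Phone Number example in section 4.1.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">

  <!-- Russian Doll: the content is nested and local; nothing inside
       the element can be reused elsewhere. -->
  <xsd:element name="RussianDollAddress">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="AddressLine" type="xsd:string"/>
        <xsd:element name="PostalCode" type="xsd:string"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>

  <!-- Salami Slice: every element is declared globally and then
       assembled by reference. -->
  <xsd:element name="AddressLine" type="xsd:string"/>
  <xsd:element name="PostalCode" type="xsd:string"/>
  <xsd:element name="SalamiSliceAddress">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element ref="AddressLine"/>
        <xsd:element ref="PostalCode"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>

  <!-- Venetian Blind: one global, reusable complexType whose element
       declarations stay local. -->
  <xsd:complexType name="PostalAddress">
    <xsd:sequence>
      <xsd:element name="AddressLine" type="xsd:string"/>
      <xsd:element name="PostalCode" type="xsd:string"/>
    </xsd:sequence>
  </xsd:complexType>
  <xsd:element name="DeliveryAddress" type="PostalAddress"/>
  <xsd:element name="BillingAddress" type="PostalAddress"/>

</xsd:schema>
```

In the Venetian Blind portion, the reusable unit is the PostalAddress type, while AddressLine and PostalCode remain local to it.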
Given this design goal, HR-XML CPO objects should be designed with a few, or perhaps one,
global complexTypes. Since it is not necessary to reuse every part of every CPO object, the
remainder of the declarations and definitions should be local. This allows the schema that
includes the CPO object to reuse the main object structure(s).
Example – Company master schema:
<xsd:schema targetNamespace="http://ns.hr-xml.org"
    xmlns="http://ns.hr-xml.org"
    xmlns:xsd="http://www.w3.org/2001/XMLSchema"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    elementFormDefault="qualified"
    xsi:schemaLocation="http://ns.hr-xml.org Company.xsd">
  <!-- bring in the reusable global complexType "PostalAddress" -->
  <xsd:include schemaLocation="../CPO/PostalAddress.xsd"/>
  <xsd:element name="Company">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="DeliveryAddress" type="PostalAddress"/>
        <xsd:element name="BillingAddress" type="PostalAddress"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>
Example – PostalAddress CPO object:
<xsd:schema targetNamespace="http://ns.hr-xml.org"
    xmlns="http://ns.hr-xml.org"
    xmlns:xsd="http://www.w3.org/2001/XMLSchema"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    elementFormDefault="qualified"
    xsi:schemaLocation="http://ns.hr-xml.org ../CPO/PostalAddress.xsd">
  <!-- the single global complexType; its element declarations are local -->
  <xsd:complexType name="PostalAddress">
    <xsd:sequence>
      <xsd:element name="AddressLine" type="xsd:string"/>
      <xsd:element name="Municipality" type="xsd:string"/>
      <xsd:element name="Region" type="xsd:string"/>
      <xsd:element name="PostalCode" type="xsd:string"/>
    </xsd:sequence>
  </xsd:complexType>
</xsd:schema>
4.3 Scope decision tree
[Figure: scope decision tree. The flowchart itself is not recoverable from the extracted text. For
an element, an attribute, or a type, it asks: Is it the root element (or the root element type)? Is
it used more than once? Can it be used consortium-wide with the same meaning? (*) The outcomes
shown are "locally scoped", "globally scoped" (for example <Person type="PersonType"/>),
"use <xsd:group>", and "use <xsd:attributeGroup>", together with the naming guidance "use
context naming" and "use class names".]
* An alternative to this approach is to more fully qualify the name of a component. For example, if
an element name is “Participant” and a work group is unsure if it can define this for the entire
Consortium, it can more fully qualify the name to “BenefitPlanParticipant”.
5 Namespaces and Versioning
In the HR-XML architecture until now, the design called for version-specific namespaces. This
design reflects the stand-alone, business process-centric nature of the Consortium. The CPOs
were the primary reusable components, with other reuse of components requiring management of
multiple namespaces.
As we move toward a library-based approach, the opportunity to simplify the use of namespaces
becomes clear. The architecture in this document is meant to simplify the use of namespaces as
well as address some issues raised with the previous method. Specifically:
1) Stand-alone CPOs: There has been increasing interest in using HR-XML CPOs stand-alone
in applications supporting this semantic model. The no-namespace CPO design
assumed that they would be used within another schema, which would assign the
namespace. Giving CPOs a namespace would better support their use as stand-alone
data models.
2) Simplifying: Simplifying the management of namespaces would make design as well as
implementation easier. As stated, there are currently numerous master schemas that
each have their own namespace. These varying namespaces often use the same version
of a CPO, which can cause confusion. For example, BusinessProcess-1_0.xsd and
BusinessProcess-2_0.xsd may both use the same CPO PersonName-1_2, so the same
CPO appears in different namespaces. An implementer may ask, “does a new version of a
master schema mean I have to re-code my implementation of PersonName?” The new
project version namespace may obscure the fact that no new coding on the CPO may be
required. Having one universal namespace would eliminate this confusion.
3) Library status: Defining a single, universal namespace would better reflect this library
status. If the Consortium treats HR-XML objects as a library, what purpose is there for
crossing namespaces? The logical answer is “none.”
4) Versioning in namespaces is not necessary: The Consortium reflects versioning in its
filenames as well as in its directory structure. There is no need to further delineate
versioning in namespaces.
5) Best Practices: The Best Practices consensus on versioning indicates that at most, a
version-specific release package may be used. In the decentralized design until now, that
is reflected in the namespace-coercion design. However, with a library approach, this
would mean at most a Consortium-wide “2.0” or “3.0” based namespace. Even further,
the best practices indicate that schema *additions* should not trigger a new namespace
but that schema changes might (meaning schema components that change meaning).
The most unobtrusive method is to use the version attribute in the schema.
5.1 Namespaces
Given issues such as these, the Consortium is endorsing one, universal namespace:
http://ns.hr-xml.org
This namespace will be used to indicate all HR-XML Consortium defined components.
Example – Job Position Seeker:
01: <xsd:schema targetNamespace="http://ns.hr-xml.org"
02:     elementFormDefault="qualified"
03:     xmlns="http://ns.hr-xml.org"
04:     xmlns:xsd="http://www.w3.org/2001/XMLSchema"
05:     xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
06:     version="1.0"
07:     xsi:schemaLocation="http://ns.hr-xml.org
08:         ../SEP/JobPositionSeeker.xsd">
09:   <xsd:include schemaLocation="../CPO/PersonName.xsd"/>
10:   <xsd:include schemaLocation="../CPO/TelcomNumber.xsd"/>
11:   <xsd:element name="JobPositionSeeker">
12:     <xsd:complexType>
13:       <xsd:sequence>
14:         <xsd:element name="Person" type="PersonName" maxOccurs="unbounded"/>
15:         <xsd:element name="Phone" type="TelcomNumberType" maxOccurs="unbounded"/>
16:       </xsd:sequence>
17:       <xsd:attribute name="contentLanguage" type="xsd:string" default="EN"/>
18:     </xsd:complexType>
19:   </xsd:element>
20: </xsd:schema>
All master schemas MUST have the elementFormDefault set to “qualified,” as seen in line 02.
The targetNamespace and the default namespace MUST be the same. Master schemas and
CPOs alike MUST conform to this pattern.
6 Versioning [2]
Given the issues in the previous section, the version of schemas will be reflected as follows:
1) In the schema, the “version” attribute will be set to the package release of the HR-XML
library.
2) The schema filename will continue to carry the individual schema version, as indicated
in the Naming Conventions section.
3) In the instance, the filename in the “schemaLocation” attribute will indicate the schema
file to which the instance conforms.
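The third point might look as follows in an instance document. This is a sketch only: the filename JobPositionSeeker-1_0.xsd is a hypothetical name modeled on the versioned filenames mentioned earlier (such as BusinessProcess-1_0.xsd), and the element content is omitted because it depends on the CPO definitions.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical instance: the versioned filename in xsi:schemaLocation
     names the schema file this document claims to conform to. -->
<JobPositionSeeker xmlns="http://ns.hr-xml.org"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://ns.hr-xml.org JobPositionSeeker-1_0.xsd">
  <!-- Person and Phone content omitted; its structure comes from the CPOs -->
</JobPositionSeeker>
```

The namespace stays the same across releases; only the filename in the second half of the schemaLocation pair would change when a new schema version is used.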
7 Localization (supporting region-specific data)
While the Consortium is committed to supporting the adoption of international standards, there
are portions of some HR transactions that are intricately connected to a particular region. This
may be because of regulatory or tax jurisdiction reasons. For example, a payroll instruction is a
concept that is independent of any region, as each payroll system will need to make adjustments
to an employee’s pay. However, some of this payroll instruction data may be tied to a particular
jurisdiction. Therein lies the localization problem.
The solution to this problem consists of three simple pieces. First, a base schema is created that
is independent of any region (just as it would be in any HR-XML project). Second, the component
that is region-specific is defined via an include, so as to separate international from region-specific
components. This region-specific component is a union of the local data, which is the third piece
of the localization puzzle. The local data is modularized by the particular jurisdiction or region the
context demands and defines the data that is supported in the transaction.
The image below indicates how the example above works using simple schema includes.
[2] Would it be possible to include an attribute, in addition to the version, with a default value
identifying the actual release (i.e. the 2003-09 release)? It seems to me that this would make it
absolutely clear to the trading partners.
The benefit of this design is threefold:
1) First, it provides a way to support localization of parts of standard schemas without using
complex (and often inconsistently implemented) features of XML Schema.
2) Second, it allows for future regional schemas to be added as needed without having to
change the original base schema.
3) Third, trading partners can choose which regional schema dependencies they want to
support without having to implement parts of the standard that do not apply to them.
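One possible reading of the second piece, the region-specific component, is sketched below. It is an assumption-laden illustration rather than the Consortium's actual design: the file names, type names, and element names (PayrollLocalDataType, USPayrollDataType, UKPayrollDataType and so on) are all hypothetical, and an xsd:choice is used here to stand in for the "union" of local data. A base payroll schema would include this file and refer to PayrollLocalDataType.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsd:schema targetNamespace="http://ns.hr-xml.org"
    xmlns="http://ns.hr-xml.org"
    xmlns:xsd="http://www.w3.org/2001/XMLSchema"
    elementFormDefault="qualified">

  <!-- Each regional module is assumed to define one global complexType
       (hypothetical files and type names). -->
  <xsd:include schemaLocation="US/PayrollLocalDataUS.xsd"/>
  <xsd:include schemaLocation="UK/PayrollLocalDataUK.xsd"/>

  <!-- The region-specific component: exactly one block of local data,
       chosen by jurisdiction. -->
  <xsd:complexType name="PayrollLocalDataType">
    <xsd:choice>
      <xsd:element name="USPayrollData" type="USPayrollDataType"/>
      <xsd:element name="UKPayrollData" type="UKPayrollDataType"/>
    </xsd:choice>
  </xsd:complexType>

</xsd:schema>
```

Under this reading, supporting a new region means adding one include and one choice branch in this file, while the base schema is left unchanged.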
8 Appendix A - References
HRXMLExtension
Describes endorsed HR-XML methods for extending schemas.
http://ns.hr-xml.org/TSC/HRXMLExtension-1_0/HRXMLExtension-1_0.pdf
http://ns.hr-xml.org/TSC/HRXMLExtension-1_0/UserArea-1_0.xsd
The Open Applications Group
The OAGIS 8.0 architecture describes the use of additional constraints and contextual data.
http://www.openapplications.org
Schematron: An XML Structure Validation Language using Patterns in Trees
http://www.ascc.net/xml/resource/schematron/schematron.html
XML Namespaces
See http://www.w3.org/TR/REC-xml-names/
XML Schema: Primer
W3C Recommendation, 2 May 2001.
See http://www.w3.org/TR/xmlschema-0/
XML Schema Part 1: Structures
W3C Recommendation, 2 May 2001. Includes a “schema for schemas.”
See http://www.w3.org/TR/xmlschema-1/
XML Schema Part 2: Datatypes
W3C Recommendation, 2 May 2001.
See http://www.w3.org/TR/xmlschema-2/
XML Schema Tutorial
See http://www.xfront.com/xml-schema.html
XML Schemas: Best Practices
A set of best practices for XML Schemas, developed through a discussion among XML implementers
and coordinated by Roger Costello. See http://www.xfront.com/BestPracticesHomepage.html
9 Appendix B - Document Version History
Version    Date          Description
2.0        2002-05-20    Draft
           2002-11-22    Added contextual schemas in part
10 Appendix D - Future Issues
XML Schemas are rich and flexible, as are their possibilities. These guidelines are an evolving
source of standard approaches for HR-XML schemas. Future versions of these guidelines MAY
address some of the following issues, which are not addressed in this document. General comments
are included in this section.
10.1 General Items
1) Should we have no anonymous types?
2) Should there be default values in the schema? Some say data should be kept with data
(the instance) and metadata should be kept with metadata (the schema).
3) Ordering of occurrence of child elements. We have informally followed the strategy that
ID-related elements should occur first, extension elements (i.e. UserArea) last, and the
rest alphabetically in between. Is this something to enforce?