XML Schema Design Guidelines 2.1
Working Draft 2003 November 18

This version: SchemaDesignGuidelines, 2003 Nov 18
Previous version: SchemaDesignGuidelines-2_0, 2003 Feb 24

Editor:
    Kim Bartkus (kim@hr-xml.org)
    Bill Kerr (bill.kerr@oracle.com)
    Paul Kiel (paul@hr-xml.com)
    Lee Humphries (Lee_Humphries@softworks.com.au)

Authors:
    Paul Kiel (paul@hr-xml.com)
    David Baliles (david.baliles@personic.com)
    Chuck Allen (chucka@hr-xml.org)
    Lee Humphries (Lee_Humphries@softworks.com.au)

Contributors:
    Technical Steering Committee (TSC) (tsc@lists.hr-xml.org)

Copyright statement

©2003 HR-XML. All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording, or otherwise, without the prior written permission of the publisher. Printed in the United States of America.

Abstract

XML Schemas are a fundamental technology that governs the structure of HR-XML document types. They are a document syntax that allows for the automated processing and error checking of an XML document via a parser. This document is intended to show the architecture of HR-XML schemas, including some best practices and general design principles.

Status of this Document

This document has been submitted to the TSC for review. It is a draft and should not be considered an official technical note from the TSC.

This document URL: http://schemas.hr-xml.org/xc/canon/TSC/SchemaDesignGuidelines.doc

106752302

Target Audience

This document is intended to assist work group and Cross Process Object activities in designing modular, reusable, and extensible HR-XML document types. It is also intended for those new to schema, with guidelines on best practices.

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119.

Table of Contents

1 Scope and Purpose
    1.1 Introduction: Definitions
    1.1 Scope
    1.2 Purpose
    1.3 Relationship between examples and work group activities
2 Naming Conventions
    2.1 Use CamelCase for Element and Attribute Names
    2.2 Upper and Lower CamelCase
    2.3 File Naming Conventions
    2.4 Use Meaningful Element and Attribute Names
    2.5 Don’t Use Private Encodings
    2.6 Fully-qualified and Context-aware naming
        2.6.1 Locally scoped naming: Context-aware
        2.6.2 Globally scoped naming: Fully-qualified
    2.7 Naming types
3 Design Patterns
    3.1 Use Elements to Represent Data Content
        3.1.1 Elements/Attributes: Other Considerations
    3.2 Special Content Models
        3.2.1 Mixed
        3.2.2 ANY
        3.2.3 Recursive
        3.2.4 xsd:all
    3.3 Elements and complexTypes
    3.4 Extended Enumerations
        3.4.1 Objective
        3.4.2 Scope
        3.4.3 Possible approaches: Convention or Pattern Extension
        3.4.4 Data and metadata: what is good XML design?
    3.5 Final note on enumerations: limitations of use
4 Scoping of components
    4.1 Global and local elements/types
    4.2 Design Approaches – Russian Doll, Salami Slice and Venetian Blind
    4.3 Scope decision tree
5 Namespaces and Versioning
    5.1 Namespaces
    5.2 Versioning
    5.3 An instance example
6 Localization (supporting region-specific data)
7 Appendix A - References
8 Appendix B - Document Version History
9 Appendix D - Future Issues
    9.1 General Items

1 Scope and Purpose

1.1 Introduction: Definitions

XML: XML stands for eXtensible Markup Language. It enables the user, by means of tags (e.g. '<Person>', '<Salary>'), to define the semantic meaning of a given XML document. An XML document contains a set of tags and data content, which follow a particular structure. A Schema may define the structure of the XML document. The user or a special interest group such as the HR-XML Consortium can define these structures for others to use.
The ability to create standard structures enables two or more interested parties to exchange information in a way that the receiver can understand and validate.

Schema: A schema serves the same purpose as a DTD, but a schema provides a more powerful and flexible means of defining markup languages. A schema is written using XML syntax. This is not true of DTDs, which use a syntax specified for Standard Generalized Markup Language (SGML), a precursor technology of which XML Version 1.0 is a subset. Schemas support data typing and constructs that allow one element to derive properties from another element type. Several groups and companies developed their own schema specifications in advance of the World Wide Web Consortium’s (W3C) issuance of its XML Schema Definition Language (“XSDL” or “XSD”) recommendation. The term “schema” is sometimes used loosely in a way that encompasses DTDs as well as the various schema formats. However, as used in this document, “schema” refers to the W3C’s XML Schema Definition Language (“XSDL” or “XSD”) recommendation. For further information, see http://www.w3c.org.

1.1 Scope

This document provides design guidelines and conventions for XML Schemas developed by HR-XML workgroups.

1.2 Purpose

HR-XML’s Schema design guidelines and conventions are intended to ensure that XML Schemas developed by HR-XML workgroups are:

- Easy to use and intelligible to the various end-user constituencies (software developers, HR professionals, database administrators, etc.)
- Consistent in design across HR-XML workgroups
- Consistent with a library-based approach
- Easy to maintain
- Simple
- Extensible (without undermining the standard!)
- Version controlled
- “Best Practices” aware
- Context aware

These guidelines state the best practices of schema design within the HR-XML Consortium.

1.3 Relationship between examples and work group activities

While examples in this document occasionally draw upon working drafts of schemas being developed within the HR-XML Consortium, these schemas were often edited for space or simplicity and should not be construed as representing valid instances or schema fragments resulting from actual work group activities.

2 Naming Conventions

2.1 Use CamelCase for Element and Attribute Names

Element names that contain multiple words should have each word capitalized. This is known as “camel case” notation. Other special delimiting characters, such as ‘_’, ‘-’, and ‘.’, should be avoided. An exception to this convention is made for element names that contain abbreviated words, in which case the abbreviated letters are in upper case.

2.2 Upper and Lower CamelCase

Follow these conventions for the use of upper and lower CamelCase:

- All element names are upper CamelCase, e.g. PersonName
- All attribute names are lower CamelCase, e.g. familyNamePrefix
- Within elements, abbreviations are upper CamelCase, e.g. IntlCode, TelNumber
- Within elements, acronyms are UpperCase, e.g. UL, TTDNumber
- Within attributes, acronyms are lower CamelCase, e.g. src, idRef
- Enumerated list values must use the rules above that apply to elements, meaning the use of upper CamelCase

2.3 File Naming Conventions

Filenames also should follow the CamelCase convention. For instance, a schema written to support a benefits enrollment process might be named:

    BenefitsEnrollment.xsd

2.4 Use Meaningful Element and Attribute Names

Choose element names that reflect the business language you are modeling. They should be meaningful and easy to understand. Abbreviations within names should be avoided, unless the element name would otherwise be excessively long. When using abbreviations, the most commonly accepted and clearest version should be used. Keep international audiences in mind and avoid regional abbreviations.

    <Shp2> Don’t do this </Shp2>
    <Dptmnt> Don’t do this </Dptmnt>
    <Dept> Write it this way </Dept>

2.5 Don’t Use Private Encodings

When practical, use distinct elements to separate parts of data. If you have a data item that has internal structure, separating the parts with tags enables an XML parser to do the work of distinguishing components for you, which saves you the cost of writing a custom parser. Additionally, you will have made the structure publicly readable and thereby available to standard tools such as XSL style sheets and query languages. If something has more than one part, it is likely that it will gain still more parts in the future, and using tags up-front means easier extension later. For example:

    <Price>USD 123.45</Price>

is easier to parse when written as:

    <Price currency="USD">123.45</Price>

2.6 Fully-qualified and Context-aware naming

Within work groups, there is a tension between modeling schemas based on top-down and bottom-up processes. The top-down approach uses object-oriented models, developed from a well-researched model of the domain space. The bottom-up approach is based on the need for immediate, utilitarian business document modeling in a space where there may not be any extant domain object models.

To illustrate this tension, consider the top-down, model-oriented element structure:

    <Object>
      <Type>
        <Name/>
      </Type>
    </Object>

The corresponding bottom-up, utilitarian XML might look like this:

    <ObjectTypeName/>

A comparison:

Model Oriented
  Pros:
    - Allows for greater reuse of components.
    - Minimizes the number of unique components, easing implications on processing and glossaries.
  Cons:
    - Takes longer to model domain properly.
    - Bloats instance with more tags, especially in large documents.
    - Does not lend itself easily to using Prime, Class, and Modifier words in each element.

Utilitarian
  Pros:
    - Can be implemented faster.
    - Lends itself easily to using Prime, Class, and Modifier words for each element.
  Cons:
    - Components are not as reusable.
    - Increases the number of unique components, with implications in processing and glossaries (e.g. <Object1TypeName/>, <Object2TypeName/>, <Object3TypeName/>).

Given the scenarios described above, HR-XML naming conventions will try to balance fully qualified and context-specific naming patterns. For components (elements, attributes, types) scoped globally, use class name based naming (fully qualified). For locally scoped components, use context aware naming.

2.6.1 Locally scoped naming: Context-aware

When a component is deemed to be scoped locally, the context of the name MAY be considered. [1]

[1] I believe there may still be some confusion regarding the context surrounding the naming of local elements. During the development and final reviews of the Background Checking specification, one suggestion was to include the context of the element within the name, and the other suggestion was to allow the parent element to define the context. To a certain degree, this was also reflected in some of the comments from the HRDD workgroup. As one example, a number of our clients have identified confusion regarding the ReferenceId element, which is used in several contexts. In each case, the context is defined by the Parent element.

2.6.2 Globally scoped naming: Fully-qualified

A data element name is usually composed of between one and three component words. Limiting names to this length keeps the naming conventions consistent and minimizes the performance drain from excessively long element names. Each component is classified as a prime word, class word, or modifier word.

First, a data name may contain a prime word designator. Prime words identify business information objects to which the data element (or attribute) belongs, for example <Insurance>. Generic elements such as <Address> may not need an industry-specific prime word, especially if they will be used in many contexts.

Second, elements and attributes may be classified into common groupings using class words. Classes can better organize and provide meaningful names for elements. Common class names include: Cost, Date, Days, Day, Description, Id, Idref, Indicator, Name, Number, Rate, and Ratio.

Lastly, in addition to a prime and class word designator, a data name may contain one or more modifier words or adjectives that further describe the data element.

An example that illustrates the use of all three types of component names is <JobPostingDate>. “Job” is a business object specific to this domain, “Posting” further modifies the term “Job”, and “Date” is a class name indicating the type of data element.

Good Examples:

    <attribute name="expenseRatio"/>
Here the class word “Ratio” is used to indicate the class of information.

    <element name="DropDeadDate"/>
If the data element matched, this could also be DropDeadDay or DropDeadDays.

    <element name="ErrorDescription"/>
“Description” indicates the meaning of this element, as opposed to an error code or an error handling instruction.

    <attribute name="jobSeekerId"/>
This name includes clear prime, class, and modifier words.

    <attribute name="insuranceIndicator" default="covered"/>

Bad Examples:

    <element name="TheCode"/>
The class word “Code” does not make this element name more comprehensible. The choice of prime word is equally bad.

    <element name="ListOfAllVacanciesInTheDatabase"/>
If all elements were given such long names, the impact on performance with large documents would begin to show.

    <element name="Item"/>
This element is unclear.

2.7 Naming types

It is the Consortium’s policy for the names of types to correlate with the components that use them. This is done by appending the key word “Type” to the end of the name of a component that would use the type. For example, the type used in the element “PersonName” would be called “PersonNameType”. This pattern is a loose correlation, however, as many different elements with different names could use the same type (e.g.,
the element “Participant” could be of type “PersonNameType”). This guideline is intended to semantically relate types with components for ease of use and readability and should be used in that context.

3 Design Patterns

3.1 Use Elements to Represent Data Content

The choice of whether to represent semantic components as elements or attributes is made on a case-by-case basis. However, the general rule is that elements should be used to represent objects and data content. Attributes should be used to represent element metadata. One rule to apply is to see if the data item could exist on its own as an inherent piece of information. If the information is a distinct item, but dependent upon another, make it a child element. If it merely refines a larger item, then use an attribute.

3.1.1 Elements/Attributes: Other Considerations

In deciding whether a semantic component should be represented as an element or attribute, the following key differences should be considered.

XML 1.0 rules to bear in mind:

- Elements can contain only other elements or data, while attributes can contain only data. Thus, if a semantic component has a complex structure or could potentially contain other elements, the component should be modeled as an element.
- Elements can be declared so XML 1.0 conforming parsers preserve the order of elements. You cannot similarly control the order of attributes. Some parsers may preserve the order of attributes, but this is not a requirement under XML 1.0.
- Processors that use the SAX interface return attributes as a set with the start of an element. In contrast, child elements are returned one-by-one.
- An element can have only one attribute of type “ID.” (And of course attributes of type “ID” must have unique values within the entire document.)
- Element and attribute names must begin with a letter or underscore, not a digit (e.g. <7DaysAWeek/> is not a valid XML 1.0 element).

If the potential values of data are known, then an enumerated list of values will provide enforcement and error checking. [See section on Extended Enumerations.]

3.2 Special Content Models

This section gives guidance on the use of the following content models: Mixed, ANY, recursive, and xsd:all.

3.2.1 Mixed

Mixed content models are elements that contain data as well as sub-elements. These content models MUST NOT be used in any context. Processing this data by applying style sheets, for example, produces inconsistent results across parser implementations. The use of a child element to contain the mixed data can solve this issue. In this manner, all components are either data or elements, but not both.

Avoid:

    <Book>The HR-XML Story
      <Author>Chuck Allen</Author>
    </Book>

Good:

    <Book>
      <Title> The HR-XML Story </Title>
      <Author>Chuck Allen</Author>
    </Book>

Mixed should only be used in HR-XML where there is a necessity to accommodate either text or elements within one element (an equivalent to Choice). The original reason for mixed content is to allow for text markup; therefore it may still be desirable in certain contexts, e.g. the text of résumés.

3.2.2 ANY

Content models of type ANY should be used with caution. This type of content can serve the positive purpose of enabling extensions or integrations but needs to be restricted enough to prevent dilution of the standard. Careful selection of its use is encouraged. The Technical Steering Committee should be consulted on any potential exceptions to this rule.

3.2.3 Recursive

A recursive data content model is one where an element contains itself as a descendant.
For example:

    <xsd:element name="RootElement">
      <xsd:complexType>
        <xsd:sequence>
          <xsd:element ref="ChildElement"/>
          <xsd:element ref="AnotherChild"/>
        </xsd:sequence>
      </xsd:complexType>
    </xsd:element>
    <xsd:element name="ChildElement">
      <xsd:complexType>
        <xsd:sequence>
          <xsd:element ref="RootElement" minOccurs="0"/>
        </xsd:sequence>
      </xsd:complexType>
    </xsd:element>
    <!-- above: ChildElement calls its parent as a child
         (minOccurs="0" allows the nesting to terminate) -->
    <xsd:element name="AnotherChild">
      <xsd:complexType>
        <xsd:sequence>
          <xsd:element ref="AnotherChild" minOccurs="0"/>
        </xsd:sequence>
      </xsd:complexType>
    </xsd:element>
    <!-- above: AnotherChild calls itself directly -->

Recursive content models should be used with caution. This structure could lead to system problems when creating or processing a document, such as when applying style sheets. For example, if a program or stylesheet is created to “walk the tree”, simply processing each node in turn, then there is the potential for infinite looping. Design of such processing should take this potential into account.

An alternate way to model this type of relationship is to use ID and IDREF.
For example:

    <xsd:element name="RootElement">
      <xsd:complexType>
        <xsd:sequence>
          <xsd:element ref="ChildElement"/>
          <xsd:element ref="AnotherChild"/>
        </xsd:sequence>
        <xsd:attribute name="id" type="xsd:ID" use="required"/>
      </xsd:complexType>
    </xsd:element>
    <xsd:element name="ChildElement">
      <xsd:complexType>
        <xsd:simpleContent>
          <xsd:extension base="xsd:string">
            <xsd:attribute name="rootIdRef" type="xsd:IDREF" use="required"/>
          </xsd:extension>
        </xsd:simpleContent>
      </xsd:complexType>
    </xsd:element>
    <!-- above: ChildElement points to its parent -->
    <xsd:element name="AnotherChild">
      <xsd:complexType>
        <xsd:simpleContent>
          <xsd:extension base="xsd:string">
            <xsd:attribute name="id" type="xsd:ID" use="required"/>
            <xsd:attribute name="anotherIdRef" type="xsd:IDREF" use="required"/>
          </xsd:extension>
        </xsd:simpleContent>
      </xsd:complexType>
    </xsd:element>
    <!-- above: AnotherChild points to a sibling -->

Occasionally the correct modeling of business objects and their relationships requires recursion, rather than the “implied” relationships using IDREF. Below is one example, where an unstructured form of resume enables nested sections. This, in essence, provides structure for data that may not conform to structures modeled elsewhere in the schema.
Example instance:

    <FreeFormResume>
      <ResumeSection>
        <SectionTitle>Skills</SectionTitle>
        <SecBody>
          <!-- here, ResumeSection is a descendant of itself describing
               the “Management” part of the Skills section -->
          <ResumeSection>
            <SectionTitle>Management</SectionTitle>
            <SecBody>
              <P>Supervised 8 FTE.</P>
            </SecBody>
          </ResumeSection>
          <!-- here, ResumeSection is again a descendant of itself describing
               the “Information Technology” part of the Skills section -->
          <ResumeSection>
            <SectionTitle>Information Technology</SectionTitle>
            <SecBody>
              <P>Programmed in C++ and Java.</P>
            </SecBody>
          </ResumeSection>
        </SecBody>
      </ResumeSection>
    </FreeFormResume>

3.2.4 xsd:all

The use of the XML Schema “all” compositor (meaning the child elements can occur in any order, each at most once) is strongly discouraged. This usage lends itself easily to ambiguous content models, which can cause interoperability problems across parser implementations.

3.3 Elements and complexTypes

Choosing whether to model a data item as an <element> or <complexType> has a significant impact on its future use. In XML Schemas, complexTypes can be reused whereas elements can only be “referenced”. This distinction can be the difference between good modular design and inflexible rigidity. Take for example:

    <xsd:element name="PhoneNumber">
      <xsd:complexType>
        <xsd:sequence>
          <xsd:element name="Prefix"/>
          <xsd:element name="MainNumber"/>
        </xsd:sequence>
      </xsd:complexType>
    </xsd:element>

In this Schema, the element <PhoneNumber> is clearly defined with two sub-elements. Creating a <FaxNumber> element with the same structure would require declaring the same content model again, thus duplicating items.

On the other hand, the “PostalAddress” definition in the example below is a complex data type, not an element. This builds in modularity, allowing the creation of multiple elements with this same content model, as shown in <DeliveryAddress> and <BillingAddress>. This modularity allows for the building of reusable components during the design phase.
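
To make the cost of duplication concrete, the following is a minimal sketch of what declaring <FaxNumber> without a named type might look like. The <FaxNumber> declaration and the xsd:string types are illustrative assumptions, not part of any HR-XML schema: the declaration simply repeats the anonymous content model of <PhoneNumber>, so any later change to the structure would have to be made in both places.

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <!-- Anonymous content model: usable only by PhoneNumber -->
  <xsd:element name="PhoneNumber">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="Prefix" type="xsd:string"/>
        <xsd:element name="MainNumber" type="xsd:string"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
  <!-- Hypothetical FaxNumber: the same structure must be declared again -->
  <xsd:element name="FaxNumber">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="Prefix" type="xsd:string"/>
        <xsd:element name="MainNumber" type="xsd:string"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>
```

A named complexType avoids this repetition, which is the design choice this section recommends for content models that are likely to be reused.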

The modularity extends beyond the ability to declare multiple elements of the same type. In the maintenance phase of the Schema, making changes to a single data type declaration will affect all elements that refer to it. In this example, editing the “PostalAddress” declaration will effect changes in both <DeliveryAddress> and <BillingAddress>.

In designing Schemas, careful examination of the nature of the business object can determine if it is likely to be reused elsewhere or in the future. If this is so, then using <complexType> will reduce duplicative declarations and provide extensibility. Conversely, truly unique items can simply be declared as an <element>.

Example – Address/Phone Number:

    <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
      <xsd:complexType name="PhoneNumber">
        <xsd:sequence>
          <xsd:element name="Prefix"/>
          <xsd:element name="MainNumber"/>
        </xsd:sequence>
      </xsd:complexType>
      <xsd:complexType name="PostalAddress">
        <xsd:sequence>
          <xsd:element name="AddressLine" type="xsd:string"/>
          <xsd:element name="Municipality" type="xsd:string"/>
          <xsd:element name="Region" type="xsd:string"/>
          <xsd:element name="PostalCode" type="xsd:string"/>
        </xsd:sequence>
      </xsd:complexType>
      <xsd:element name="FaxNumber" type="PhoneNumber"/>
      <xsd:element name="DeliveryAddress" type="PostalAddress"/>
      <xsd:element name="BillingAddress" type="PostalAddress"/>
    </xsd:schema>

It is important to note that new types can also be derived from existing ones, either via restriction or extension, when a completely new data type is not needed. In the example below, the base type “string” is restricted in this attribute. In this case, the attribute may have one of an enumerated list of only 5 possible strings. In this fashion, existing types can be used to derive other types; extension works analogously, adding to an existing type rather than narrowing it.
Example fragment:

    <xsd:attribute name="PhoneType">
      <xsd:simpleType>
        <xsd:restriction base="xsd:string">
          <xsd:enumeration value="voice"/>
          <xsd:enumeration value="fax"/>
          <xsd:enumeration value="pager"/>
          <xsd:enumeration value="TDD"/>
          <xsd:enumeration value="cell"/>
        </xsd:restriction>
      </xsd:simpleType>
    </xsd:attribute>

3.4 Extended Enumerations

3.4.1 Objective

HR-XML Consortium work groups are increasingly grappling with a common problem regarding the use of enumerations in schemas. The traditional use of enumerations consists of a fixed number of provided values, which are determined at the time of the schema design. The problem arises when a business process is modeled with a schema that includes enumerated lists that do not cover 100% of the foreseen cases. The main question is: how can a work group standardize enumerated values when less than 100% of the foreseeable values are known at design time?

The objective of this text is to endorse a method for standardizing enumerated values without preventing extensions to cover unknown or trading partner specific values.

3.4.2 Scope

This text focuses on Consortium-wide endorsement of XML Schema enumeration standardization and extension methods. For the purposes of this text, the problem is referred to as “incomplete enumeration lists.”

3.4.3 Possible approaches: Convention or Pattern Extension

The two most debated approaches to standardizing incomplete enumeration lists are a union of values with a string, essentially a convention, and a union of values with a pattern, known as pattern extension.

The Convention Method

The “convention” method consists of a list of known enumerated values unioned with a string. Consider a scenario where there is a contact method element called <VoiceNumber> (lines 01 to 09). This element models telecom number data used for contacting some person.
An attribute of this element, called “whenAvailable” (line 05), is used to model when this VoiceNumber is considered a valid contact. A schema design team may know many of the values that may occur in this attribute, e.g. “daytime”, “nighttime”, “weekends”, etc. However, they may want to allow more values than are stated at design time, while still standardizing the known values.

    01: <xsd:element name="VoiceNumber">
    02:   <xsd:complexType>
    03:     <xsd:simpleContent>
    04:       <xsd:extension base="xsd:string">
    05:         <xsd:attribute name="whenAvailable" type="whenAvailableExtensionType"/>
    06:       </xsd:extension>
    07:     </xsd:simpleContent>
    08:   </xsd:complexType>
    09: </xsd:element>
    10: <!-- the known enumerations for availability -->
    11: <xsd:simpleType name="whenAvailableType">
    12:   <xsd:restriction base="xsd:string">
    13:     <xsd:enumeration value="daytime"/>
    14:     <xsd:enumeration value="nighttime"/>
    15:     <xsd:enumeration value="weekends"/>
    16:   </xsd:restriction>
    17: </xsd:simpleType>
    18: <!-- this is the unioning of the enumeration with a string -->
    19: <xsd:simpleType name="whenAvailableExtensionType">
    20:   <xsd:union memberTypes="whenAvailableType xsd:string"/>
    21: </xsd:simpleType>

In this example, the known values are stated as an enumeration (lines 11 to 17). The final step that makes this a convention is where it is unioned with a string data type (lines 19 to 21). This union means that the attribute “whenAvailable” must be either a member of the enumerated list OR a string. Since all enumerated values given are also strings, this in effect makes the data type a string. The parser cannot know that the value “daytimmme” is a misspelling of an enumerated value and should be an error. The value is still a string, so it is considered valid. According to the parser, both of the instances below are considered valid.

    <VoiceNumber whenAvailable="daytime">11111111</VoiceNumber>
    <VoiceNumber whenAvailable="daytimmme">2222222</VoiceNumber>

The advantage of going to the trouble of enumerating values when the result is essentially a string is that implementers have a set of known values they can use to pass data that need not be detailed in a Trading Partner Agreement. It standardizes values, but only as a convention.

The disadvantage is that validation of values falls on the implementation instead of the parser. The effective data type of the data is a string, and so implementations must determine, for example, whether the value is valid or a misspelled enumeration.

The Pattern Extension Method

The pattern extension method is similar to the convention except that the parser is able to distinguish between standard and non-standard values. Consider the same scenario as above, a <VoiceNumber> element with a “whenAvailable” attribute.

    01: <xsd:element name="VoiceNumber">
    02:   <xsd:complexType>
    03:     <xsd:simpleContent>
    04:       <xsd:extension base="xsd:string">
    05:         <xsd:attribute name="whenAvailable" type="whenAvailableExtensionType"/>
    06:       </xsd:extension>
    07:     </xsd:simpleContent>
    08:   </xsd:complexType>
    09: </xsd:element>
    10: <!-- the known enumerations for availability -->
    11: <xsd:simpleType name="whenAvailableType">
    12:   <xsd:restriction base="xsd:string">
    13:     <xsd:enumeration value="daytime"/>
    14:     <xsd:enumeration value="nighttime"/>
    15:     <xsd:enumeration value="weekends"/>
    16:   </xsd:restriction>
    17: </xsd:simpleType>
    18: <!-- the pattern used for all extension enumeration values -->
    19: <xsd:simpleType name="otherPatternType">
    20:   <xsd:restriction base="xsd:string">
    21:     <xsd:pattern value="x:\S.*"/>
    22:   </xsd:restriction>
    23: </xsd:simpleType>
    24: <!-- here is where we bring them together, either it is a known enumeration
    25:      or it is an enumeration that begins with the string "x:" -->
    26: <xsd:simpleType name="whenAvailableExtensionType">
    27:   <xsd:union memberTypes="whenAvailableType otherPatternType"/>
    28: </xsd:simpleType>

In this case, the known values for the attribute are stated the same as in the convention method (lines 11 to 17). However, a pattern is used to establish the format of any non-standard values (lines 19 to 23). This pattern states that any conforming value must begin with the string “x:” (not including the quotes), followed by at least one non-whitespace character, after which any characters, including spaces, may appear (line 21). When the enumeration is unioned with this pattern (lines 26 to 28), the result is that the attribute must be either an enumerated value OR a string beginning with “x:”. For consistency, the pattern should always use lower case “x”.

    <!-- valid values from the enumeration list -->
    <VoiceNumber whenAvailable="daytime">11111111</VoiceNumber>
    <VoiceNumber whenAvailable="nighttime">2222222</VoiceNumber>

    <!-- valid values, conforming to the pattern extension -->
    <VoiceNumber whenAvailable="x:everyTuesday">4444444</VoiceNumber>
    <VoiceNumber whenAvailable="x:one summer night">5555555</VoiceNumber>

    <!-- invalid values, neither enumerated values nor do they conform to the pattern -->
    <VoiceNumber whenAvailable="mondaysOnly">5555555</VoiceNumber>
    <VoiceNumber whenAvailable="daytimmme">5555555</VoiceNumber>

The advantage of this approach is that the parser can enforce the difference between standard and non-standard values using the delimiter pattern “x:”. It can also enforce the list of standard enumerations (see last instance example above).

The disadvantage is the requirement for processing the extension values, as the delimiter must be discarded.

3.4.4 Data and metadata: what is good XML design?

Given the two methodologies discussed, the technical temptation is to endorse the Pattern Extension method, as it utilizes the parser much more effectively. But does this violate a principle of good XML design?
In designing schemas, editors often adhere to the notion that data should be separated from metadata. Further, an often-used design rule is that elements should contain data and attributes metadata. Given the Pattern Extension method, is this combining data and metadata in the attribute value?

The Technical Steering Committee has determined that while the principle of separating data from metadata has significant merit, in this case, making the best use of the parser can be more important. Consequently, it endorses the Pattern Extension method of standardization of incomplete enumeration lists when most of the values are known. When only a few values are known, the Convention Method is acceptable, as is using an atomic data type such as a simple string.

3.5 Final note on enumerations: limitations of use

Enumerations are part of the standard, and conforming implementations must support all enumerations. Therefore, enumerations (including the Pattern Extension method) may not be a good tool for modeling data that may vary in implementation. For example, if the enumerations are geographically specific, and the list of valid enumerations likely to be implemented is region-specific, then enumerations are most likely not the proper tool, as any conforming implementation must support all enumerations.

The Pattern Extension method is a powerful tool for enabling the use of incomplete enumeration lists. However, it should not be seen as a standard practice for all enumerations. It is intended for instances where most values are known, and not where only a few are known. When only a few values are known, and an atomic data type such as a simple string is insufficient, the Convention Method is preferred.
Even further, the Pattern Extension method should not be used when work groups believe they know all the enumerations that should be in a schema, but want to leave the door open to “possibilities.” Excessive use of the Pattern Extension could lead to an undermining of the standard. Essentially, this design should be used selectively.

4 Scoping of components

4.1 Global and local elements/types

Example – Address/Phone Number:

01: <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
02:   <xsd:complexType name="PhoneNumber">
03:     <xsd:sequence>
04:       <xsd:element name="Prefix"/>
05:       <xsd:element name="MainNumber"/>
06:     </xsd:sequence>
07:   </xsd:complexType>
08:   <xsd:complexType name="PostalAddress">
09:     <xsd:sequence>
10:       <xsd:element name="AddressLine" type="xsd:string"/>
11:       <xsd:element name="Municipality" type="xsd:string"/>
12:       <xsd:element name="Region" type="xsd:string"/>
13:       <xsd:element name="PostalCode" type="xsd:string"/>
14:     </xsd:sequence>
15:   </xsd:complexType>
16:   <xsd:element name="FaxNumber" type="PhoneNumber"/>
17:   <xsd:element name="DeliveryAddress" type="PostalAddress"/>
18:   <xsd:element name="BillingAddress" type="PostalAddress"/>
19: </xsd:schema>

In the Address/Phone Number example above, there are global and local elements declared as indicated below. The use of global and local elements is an important consideration in the design of schemas and CPO objects. Recall that global elements and types are declared as children of the root element <schema>. All declarations and definitions occurring below this level (grandchildren) are considered local.
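
One practical consequence, sketched here as an illustration rather than as part of the original example: only globally declared elements can appear as the root of an instance document. Assuming the Address/Phone Number schema above is used as-is (it has no target namespace), an instance rooted at the global <DeliveryAddress> element would validate, while a document rooted at the local <AddressLine> element would not. The address values are invented for illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- The root is a global element declaration; its children are the
     local elements declared inside the PostalAddress complexType. -->
<DeliveryAddress>
  <AddressLine>123 Example Street</AddressLine>
  <Municipality>Springfield</Municipality>
  <Region>IL</Region>
  <PostalCode>62701</PostalCode>
</DeliveryAddress>
```
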

Global elements and complexTypes:

02: <xsd:complexType name="PhoneNumber">
08: <xsd:complexType name="PostalAddress">
16: <xsd:element name="FaxNumber" type="PhoneNumber"/>
17: <xsd:element name="DeliveryAddress" type="PostalAddress"/>
18: <xsd:element name="BillingAddress" type="PostalAddress"/>

Local elements and complexTypes:

04: <xsd:element name="Prefix"/>
05: <xsd:element name="MainNumber"/>
10: <xsd:element name="AddressLine" type="xsd:string"/>
11: <xsd:element name="Municipality" type="xsd:string"/>
12: <xsd:element name="Region" type="xsd:string"/>
13: <xsd:element name="PostalCode" type="xsd:string"/>

The key distinction between global and local is in the reusability of components. The complexTypes “PhoneNumber” and “PostalAddress” can be defined once and reused for elements with similar structure, as evidenced in the <FaxNumber>, <DeliveryAddress> and <BillingAddress> elements.

4.2 Design Approaches – Russian Doll, Salami Slice and Venetian Blind

In a general discussion on schema best practices [see XML Schemas: Best Practices], three design approaches are explained. In the Russian Doll design, all elements are localized. This design prevents the reuse of components and should therefore not be used in HR-XML schemas.

On the other end of the spectrum, the Salami Slice design uses all global elements and complexTypes. While the reusability is very high, there are some drawbacks. The Salami Slice is very verbose and removes the ability to hide namespaces. While HR-XML schemas expose namespaces [see section Namespaces and Versioning], the right to change this design in the future without having to completely redesign the schemas should be retained.

The solution is to use the Venetian Blind design, where major reusable components are defined as global complexTypes. Declarations within the reusable components should be local.
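
The three approaches can be contrasted on a single small structure. The sketches that follow reuse the FaxNumber and PhoneNumber names from section 4.1; typing Prefix and MainNumber as xsd:string is an assumption made only to keep the fragments short. Each fragment is a separate schema document and is meant as an illustration, not as an endorsed HR-XML schema.

Russian Doll (one global element, everything else nested and local):

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="FaxNumber">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="Prefix" type="xsd:string"/>
        <xsd:element name="MainNumber" type="xsd:string"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>
```

Salami Slice (every element declared globally and assembled by reference):

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="Prefix" type="xsd:string"/>
  <xsd:element name="MainNumber" type="xsd:string"/>
  <xsd:element name="FaxNumber">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element ref="Prefix"/>
        <xsd:element ref="MainNumber"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>
```

Venetian Blind (the reusable structure is a global complexType; its children stay local):

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:complexType name="PhoneNumber">
    <xsd:sequence>
      <xsd:element name="Prefix" type="xsd:string"/>
      <xsd:element name="MainNumber" type="xsd:string"/>
    </xsd:sequence>
  </xsd:complexType>
  <xsd:element name="FaxNumber" type="PhoneNumber"/>
</xsd:schema>
```

In the third form, a second element (for example a hypothetical MobileNumber) could reuse PhoneNumber with a single additional declaration, while Prefix and MainNumber remain local to the type.
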

Since the Venetian Blind approach retains reusability without preventing the future use of namespace hiding, it is considered the preferred design approach. The HR-XML Technical Steering Committee and the XML Schemas: Best Practices document both endorse the use of the Venetian Blind design. HR-XML Consortium schemas should therefore follow this design approach.

Given this design goal, HR-XML CPO objects should be designed with a few, or perhaps one, global complexTypes. Since it is not necessary to reuse every part of every CPO object, the remainder of the declarations and definitions should be local. This allows the schema that includes the CPO object to reuse the main object structure(s).

Example – Company master schema:

<xsd:schema targetNamespace="http://ns.hr-xml.org"
    xmlns:xsd="http://www.w3.org/2001/XMLSchema"
    xmlns="http://ns.hr-xml.org"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    elementFormDefault="qualified"
    xsi:schemaLocation="http://ns.hr-xml.org Company.xsd">
  <xsd:include schemaLocation="../CPO/PostalAddress.xsd"/>
  <xsd:element name="Company">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="DeliveryAddress" type="PostalAddress"/>
        <xsd:element name="BillingAddress" type="PostalAddress"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>

Example – PostalAddress CPO object:

<xsd:schema targetNamespace="http://ns.hr-xml.org"
    xmlns:xsd="http://www.w3.org/2001/XMLSchema"
    xmlns="http://ns.hr-xml.org"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    elementFormDefault="qualified"
    xsi:schemaLocation="http://ns.hr-xml.org ../CPO/PostalAddress.xsd">
  <xsd:complexType name="PostalAddress">
    <xsd:sequence>
      <xsd:element name="AddressLine" type="xsd:string"/>
      <xsd:element name="Municipality" type="xsd:string"/>
      <xsd:element name="Region" type="xsd:string"/>
      <xsd:element name="PostalCode" type="xsd:string"/>
    </xsd:sequence>
  </xsd:complexType>
</xsd:schema>

4.3 Scope decision tree

[Figure: scope decision tree. Starting from the kind of component (element, attribute, or type), the tree asks: is it the root element (or the root element type)? Is it used more than once? Can it be used consortium-wide with the same meaning? The outcomes shown are: globally scoped (e.g. <Person type="PersonType"/>), locally scoped, use <xsd:group>, and use <xsd:attributeGroup>, together with the naming guidance “use class names” and “use context naming”.]

An alternative to this approach is to more fully qualify the name of a component. For example, if an element name is “Participant” and a work group is unsure if it can define this for the entire Consortium, it can more fully qualify the name to “BenefitPlanParticipant”.

5 Namespaces and Versioning

In the HR-XML architecture until now, the design was for version-specific namespaces. This design reflects the stand-alone, business process-centric nature of the Consortium. The CPOs were the primary reusable components, with other reuse of components requiring management of multiple namespaces. As we move toward a library-based approach, the opportunity to simplify the use of namespaces becomes clear.

The architecture in this document is meant to simplify the use of namespaces as well as address some issues raised with the previous method. Specifically:

1) Stand-alone CPOs: There has been increasing interest in using HR-XML CPOs stand-alone in applications supporting this semantic model. The no-namespace CPO design assumed that CPOs would be used within another schema, which would assign the namespace. Giving CPOs a namespace would better support their use as stand-alone data models.

2) Simplifying: Simplifying the management of namespaces would make design as well as implementation easier. As stated, there are currently numerous master schemas that each have their own namespace. These varying namespaces often use the same version of a CPO. This may confuse the situation.
For example, BusinessProcess-1_0.xsd and BusinessProcess-2_0.xsd may both use the same CPO PersonName-1_2. So the same CPO has different namespaces. An implementer may ask, “does a new version of a master schema mean I have to re-code my implementation of PersonName?” The new project version namespace may obscure the fact that no new coding on the CPO may be required. Having one, universal namespace would eliminate this confusion.

3) Library status: Defining a single, universal namespace would better reflect this library status. If the Consortium treats HR-XML objects as a library, what purpose is there for crossing namespaces? The logical answer is “none.”

4) Versioning in namespaces is not necessary: The Consortium reflects versioning in its filenames as well as in its directory structure. There is no need to further delineate versioning in namespaces.

5) Best Practices: The Best Practices consensus on versioning indicates that at most, a version-specific release package may be used. In the decentralized design until now, that is reflected in the namespace-coercion design. However, with a library approach, this would mean at most a Consortium-wide “2.0” or “3.0” based namespace. Even further, the best practices indicate that schema *additions* should not trigger a new namespace but that schema changes might (meaning schema components that change meaning). The most unobtrusive method is to use the version attribute in the schema.

5.1 Namespaces

Given issues such as these, the Consortium is endorsing one, universal namespace:

http://ns.hr-xml.org

This namespace will be used to indicate all HR-XML Consortium defined components.
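
As a hedged illustration of what a single namespace means for instance documents, the sketch below assumes the Company master schema and PostalAddress CPO from section 4.2, with DeliveryAddress and BillingAddress both typed as PostalAddress. Because the master schema and the included CPO share http://ns.hr-xml.org and elementFormDefault is “qualified”, one default namespace declaration covers every element. The address values and the schemaLocation filename are illustrative only.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- A single default namespace covers the master schema element (Company)
     and the elements contributed by the included PostalAddress CPO. -->
<Company xmlns="http://ns.hr-xml.org"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://ns.hr-xml.org Company.xsd">
  <DeliveryAddress>
    <AddressLine>123 Example Street</AddressLine>
    <Municipality>Springfield</Municipality>
    <Region>IL</Region>
    <PostalCode>62701</PostalCode>
  </DeliveryAddress>
  <BillingAddress>
    <AddressLine>PO Box 456</AddressLine>
    <Municipality>Springfield</Municipality>
    <Region>IL</Region>
    <PostalCode>62702</PostalCode>
  </BillingAddress>
</Company>
```
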

Example – Job Position Seeker:

01: <xsd:schema targetNamespace="http://ns.hr-xml.org"
02:     elementFormDefault="qualified"
03:     xmlns="http://ns.hr-xml.org" xmlns:xsd="http://www.w3.org/2001/XMLSchema"
04:     version="1.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
05:     xsi:schemaLocation="http://ns.hr-xml.org
06:         ../SEP/JobPositionSeeker.xsd">
07:   <xsd:include schemaLocation="../CPO/PersonName.xsd"/>
08:   <xsd:include schemaLocation="../CPO/TelcomNumber.xsd"/>
09:   <xsd:element name="JobPositionSeeker">
10:     <xsd:complexType>
11:       <xsd:sequence>
12:         <xsd:element name="Person" type="PersonName" maxOccurs="unbounded"/>
13:         <xsd:element name="Phone" type="TelcomNumberType" maxOccurs="unbounded"/>
14:       </xsd:sequence>
15:       <xsd:attribute name="contentLanguage" type="xsd:string" default="EN"/>
16:     </xsd:complexType>
17:   </xsd:element>
18: </xsd:schema>

All master schemas MUST have the elementFormDefault set to “qualified,” as seen in line 02. The targetNamespace and the default namespace MUST be the same. Master schemas and CPOs alike MUST conform to this pattern.

6 Versioning (see note 2)

Given the issues in the previous section, the version of schemas will be reflected as follows:

1) In the schema, the “version” attribute will be set to the package-release of the HR-XML library.

2) The schema filename itself will continue to carry the individual schema version, as indicated in the Naming Conventions section.

3) In the instance, the filename in the “schemaLocation” attribute will indicate the schema file to which the instance conforms.

7 Localization (supporting region-specific data)

While the Consortium is committed to supporting the adoption of international standards, there are portions of some HR transactions that are intricately connected to a particular region. This may be because of regulatory or tax jurisdiction reasons. For example, a payroll instruction is a concept that is independent of any region, as each payroll system will need to make adjustments to an employee’s pay.
However, some of this payroll instruction data may be tied to a particular jurisdiction. Therein lies the localization problem.

The solution to this problem consists of three simple pieces. First, a base schema is created that is independent of any region (just as it would be in any HR-XML project). Second, the component that is region-specific is defined via an include, so as to separate international from region-specific components. This region-specific component is a union of the third piece of the localization puzzle, that of local data. The local data is modularized by the particular jurisdiction or region the context demands and defines the data that is supported in the transaction.

The image below indicates how the example above works using simple schema includes.

Note 2: Would it be possible to include an attribute, in addition to the version, with a default value identifying the actual release (e.g. the 2003-09 release)? It seems to me that this would make it absolutely clear to the trading partners.

The benefit of this design is threefold:

1) First, it provides a way to support localization of parts of standard schemas without using complex (and often inconsistently implemented) features of XML Schema.

2) Second, it allows for future regional schemas to be added as needed without having to change the original base schema.

3) Third, trading partners can choose which regional schema dependencies they want to support without having to implement parts of the standard that do not apply to them.

8 Appendix A - References

HRXMLExtension
Describes endorsed HR-XML methods for extending schemas.
http://ns.hr-xml.org/TSC/HRXMLExtension-1_0/HRXMLExtension-1_0.pdf
http://ns.hr-xml.org/TSC/HRXMLExtension-1_0/UserArea-1_0.xsd

The Open Applications Group
The OAGIS 8.0 architecture describes the use of additional constraints and contextual data.
http://www.openapplications.org

Schematron: An XML Structure Validation Language using Patterns in Trees
http://www.ascc.net/xml/resource/schematron/schematron.html

XML Namespaces
See http://www.w3.org/TR/REC-xml-names/

XML Schema: Primer
W3C Recommendation, 2 May 2001.
See http://www.w3.org/TR/xmlschema-0/

XML Schema Part 1: Structures
W3C Recommendation, 2 May 2001. Includes a “schema for schemas.”
See http://www.w3.org/TR/xmlschema-1/

XML Schema Part 2: Datatypes
W3C Recommendation, 2 May 2001.
See http://www.w3.org/TR/xmlschema-2/

XML Schema Tutorial
See http://www.xfront.com/xml-schema.html

XML Schemas: Best Practices
A set of best practices for XML Schemas as developed by a discussion of XML implementers and coordinated by Roger Costello.
See http://www.xfront.com/BestPracticesHomepage.html

9 Appendix B - Document Version History

Version   Date         Description
2.0       2002-05-20   Draft
          2002-11-22   Added contextual schemas in part

10 Appendix D - Future Issues

XML Schemas are rich and flexible, as are their possibilities. These guidelines are an evolving source of standard approaches for HR-XML schemas. Future versions of these guidelines MAY include some of the following issues that are not addressed in this document. General comments are included in this section.

10.1 General Items

1) Should we have no anonymous types?

2) Should there be default values in the schema? Some say data should be kept with data (the instance) and metadata should be kept with metadata (the schema).

3) Ordering of occurrence of child elements. We have informally had the strategy of saying ID related elements should occur first, extension elements (i.e. UserArea) last, and the rest occur alphabetically in between. Is this something to enforce?