Guidelines for using XML for Electronic Data Interchange

Version 0.05
25th January 1998
Editor: Martin Bryan, The SGML Centre
Contributors: Members of the XML/EDI working group, including Benoît Marchal,
Norbert H Mikula, Bruce Peat and David RR Webber.
XML/EDI Group Home Page URL: http://www.xmledi.org
Copyright © 1998. XML/EDI Group. All rights reserved, no part of this document may
be commercially reproduced in part or in whole without consent and prior approval.
Changes made to this version
Addition of figures used in presentation to W3C in January 1998.
Brief explanation of differences in business processes between client-centric electronic
business transactions and server-centric web retailing.
Rules templates now linked to messages via a processing instruction rather than XLL
simple link (for conformance to the way in which style sheets are linked to the message).
The examples in Annex A have been updated to show an XML book order that can be
displayed using Microsoft's MSXSL beta add-on to Internet Explorer 4.0. On-line link to
demonstration software now provided.
Contents
1. Purpose & Goal of the XML/EDI Guidelines
2. Definitions for XML/EDI
o The standards involved in XML/EDI
3. Scope of XML/EDI
o Business-to-business Electronic Data Interchange
o Electronic business transactions
4. Base Technologies of XML/EDI
o Why use XML?
o Integrating XML with EDI
5. XML/EDI Components
o Types of applications
 Lexicon Repositories
 XML/EDI Data Manipulation Agents (DataBots)
 XML/EDI Business Objects
 XML/EDItors
 XML/EDI extensions for message stores
 Search Agents
 Trading Partner Pages
6. The Implementation Process
o Using XML for Electronic Data Interchange
o Identifying data sets
o Developing DTDs
o Application specific extensions
o Creating message instances
o Validating messages
o Exchanging messages
o Processing messages
o Activating rules



Appendix A1: Using XML/EDI for Book Ordering
o Applying XML/EDI to Book Ordering
Glossary
Bibliography
1. Purpose & Goal of the XML/EDI Guidelines
Put simply, the goal of XML/EDI is to deliver unambiguous and durable business
transactions via electronic means.
Associated with this is a goal to establish a standard for commercial electronic data
interchange that is open and accessible to all, and which delivers a broad spectrum of
capabilities suitable to meet the full breadth of business needs.
To achieve this requires the use of a methodology that is not only extensible enough to
meet future requirements but also adaptable enough to incorporate new technologies and
requirements as they emerge. To ensure broad adoption the technology selected needs to
be widely and freely available. The Extensible Markup Language (XML) developed by
the World Wide Web Consortium (W3C) provides such a freely available, widely
transportable, methodology for well-controlled data interchange.
XML was designed principally for the exchange of information in the form of computer
displayable "documents". Not all commercial data is interchanged in a displayable
format. In particular data designed for electronic data interchange typically needs to be
processed before it can be displayed. For this to be possible the data must be mapped,
using some form of template, to a set of processing rules. These XML/EDI guidelines
provide a standardized way in which such rules templates can be added to interchanged
data.
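The change notes for this version state that rules templates are linked to a message via a processing instruction, in the same way as style sheets. The following is a minimal sketch of how a receiver might locate such a link using Python's standard DOM parser. The processing instruction target `edi-rules`, the `href` pseudo-attribute and the file name are assumptions made for illustration; they are not names defined by these guidelines.

```python
import re
from xml.dom.minidom import parseString


def rules_template_links(message_xml, target="edi-rules"):
    """Return the href values of rules-template processing instructions.

    The target name and the href pseudo-attribute are hypothetical.
    """
    document = parseString(message_xml)
    links = []
    # Processing instructions placed before the root element are
    # children of the document node itself.
    for node in document.childNodes:
        if node.nodeType == node.PROCESSING_INSTRUCTION_NODE and node.target == target:
            match = re.search(r'href="([^"]*)"', node.data)
            if match:
                links.append(match.group(1))
    return links


# Hypothetical message that points at a rules template.
message = '<?edi-rules href="rules/bookorder.xsl"?><order><isbn>ISBN-A</isbn></order>'
print(rules_template_links(message))
```

A message with no such processing instruction simply yields an empty list, so a receiver can fall back to its own default rules.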
These XML/EDI guidelines begin by formally defining the terms used in the text. This is
followed by an impact statement that makes predictions from various viewpoints. The
guidelines then give a background on the tools and standards on which XML/EDI is built.
Note: These guidelines form the basis for development work on XML/EDI. They form a
precursor to a formal "Specification of an EDI Application for XML". As a document
designed to be a lightning rod for ideas, this working document has been, and will
continue to be, released in draft form. Comments on this draft should be sent to the
XML/EDI working group at xml-edi@riv.be.
2. Definitions for XML/EDI
Electronic commerce has been defined in the European Workshop on Open System's
Technical Guide on Electronic Commerce (EWOS ETG 066) as "Electronic exchange of
data to support business transactions, i.e. the exchange of value through the delivery of a
product from a seller to a buyer". As such it encompasses much more than what has been
possible using traditional methods of Electronic Data Interchange (EDI) such as
EDIFACT. Electronic commerce is defined by EWOS as covering activities such as
marketing, contract exchange, logistics support, settlement and interaction with
administrative bodies (e.g. tax and custom data interchange). Electronic commerce covers
all industrial and service operations, including services such as insurance, healthcare,
travel and interactive home shopping.
Many people use the term EDI to refer to the set of messages developed for business-to-business communication as part of the United Nations Standard Messages Directory for
Electronic Data Interchange for Administration, Commerce and Transport (EDIFACT).
EDIFACT messages are transmitted in compressed form, using predefined field
identifiers, which must occur in a predefined sequence. While EDI is, strictly speaking,
wider in scope than EDIFACT, for the purposes of these guidelines EDI will be used in
this restricted sense when not otherwise qualified.
The basic unit of information in an EDI message is the data element. For an EDI invoice,
each item being invoiced would be represented by a data element. Data elements can be
grouped into compound data elements, and data elements and/or compound data elements
may be grouped into data segments. Data segments can be grouped into loops; and loops
and/or data segments form business documents.
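The grouping described above can be sketched as a set of nested types. This is a minimal illustration in Python rather than part of any EDI standard; the class names, field names, segment tags and sample invoice values are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class DataElement:
    """Smallest unit of information, e.g. a quantity or a price."""
    name: str
    value: str


@dataclass
class CompoundDataElement:
    """A group of related data elements."""
    name: str
    elements: List[DataElement] = field(default_factory=list)


@dataclass
class DataSegment:
    """Data elements and/or compound data elements grouped together."""
    tag: str
    items: List[Union[DataElement, CompoundDataElement]] = field(default_factory=list)


@dataclass
class Loop:
    """A group of data segments."""
    name: str
    segments: List[DataSegment] = field(default_factory=list)


@dataclass
class BusinessDocument:
    """Loops and/or data segments forming a complete message."""
    name: str
    parts: List[Union[Loop, DataSegment]] = field(default_factory=list)


def count_elements(doc: BusinessDocument) -> int:
    """Count the simple data elements contained anywhere in a document."""
    total = 0
    for part in doc.parts:
        segments = part.segments if isinstance(part, Loop) else [part]
        for segment in segments:
            for item in segment.items:
                total += len(item.elements) if isinstance(item, CompoundDataElement) else 1
    return total


# Hypothetical invoice: one header segment and a loop holding one line item.
invoice = BusinessDocument("INVOICE", [
    DataSegment("HDR", [DataElement("invoice_number", "1001")]),
    Loop("LINES", [
        DataSegment("LIN", [
            DataElement("item", "Widget"),
            CompoundDataElement("price", [DataElement("amount", "9.50"),
                                          DataElement("currency", "GBP")]),
        ]),
    ]),
])
print(count_elements(invoice))
```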
The EDIFACT standards define whether data segments are mandatory, optional, or
conditional, and indicate whether, how many times, and in what order a particular data
segment can be repeated. For each EDI message, a field definition table exists. For each
data segment, the field definition table includes a key field identifier string to indicate the
data elements to be included in the data segment, the sequence of the elements, whether
each element is mandatory, optional, or conditional, and the form of each element in
terms of the number of characters and whether the characters are numeric or alphabetic.
Similarly, field definition tables include data element identifier strings to describe
individual data elements. Element identifier strings define an element's name, a reference
designator, a data dictionary reference number specifying the location in a data dictionary
where information on the data element can be found, a requirement designator (either
mandatory, optional, or conditional), a type (such as numeric, decimal, or alphanumeric),
and a length (minimum and maximum number of characters). A data element dictionary
gives the content and meaning for each data element.
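As a rough illustration of how a field definition table entry constrains a data element, the sketch below checks a value against a requirement designator, a type and a length range. The class name, the single-letter requirement codes and the sample `quantity` definition are assumptions for the example, not values taken from the EDIFACT directories.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ElementDefinition:
    name: str
    requirement: str  # assumed codes: "M" mandatory, "O" optional, "C" conditional
    kind: str         # "numeric", "decimal" or "alphanumeric"
    min_length: int
    max_length: int


def validate(definition: ElementDefinition, value: str) -> List[str]:
    """Return a list of problems; an empty list means the value is acceptable."""
    if value == "":
        # Only mandatory elements are rejected when empty. Conditional elements
        # depend on other fields, which this sketch does not examine.
        if definition.requirement == "M":
            return [f"{definition.name}: mandatory element is missing"]
        return []
    problems = []
    if not definition.min_length <= len(value) <= definition.max_length:
        problems.append(f"{definition.name}: length {len(value)} is out of range")
    if definition.kind == "numeric" and not value.isdigit():
        problems.append(f"{definition.name}: expected digits only")
    if definition.kind == "decimal" and not value.replace(".", "", 1).isdigit():
        problems.append(f"{definition.name}: expected a decimal number")
    return problems


# Hypothetical definition: a mandatory numeric field of one to five characters.
quantity = ElementDefinition("quantity", "M", "numeric", 1, 5)
print(validate(quantity, "12"))
print(validate(quantity, "12a"))
```

Returning a list of problems rather than raising on the first one lets a translator report every fault in a segment in a single pass.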
Originally, EDI translation software was developed to support a variety of private system
formats. Most often, the sender and receiver were required to contract in advance for a
tailored software program that would be dedicated to mapping between their two types of
datasets. Each time a new sender or receiver was added to the client list, a new translation
program would be needed by the new party to format their data to conform to the
standards in use by the participants. Of course, this becomes expensive. Such static
systems do not easily allow synchronization of business transactions in distributed
business processes that involve global rules, but with participants and actions that are not
predetermined. To solve these issues it is desirable to develop automated tools and
techniques that are easy to use and allow decomposition of transactions into actions to be
performed locally and mapping of local actions onto efficient protocol exchanges.
The Electronic Enterprise
The concept of the Electronic Enterprise requires a transition away from paper form
based EDI. Key concepts that are required are the encapsulation of agreed sets of
business rules (in EDI parlance the Implementation Guidelines) and also mechanisms to
handle state and flow control (such as those provided by hyperlink anchors in HTML
files). Also message sets must be able to handle partial information, where the complete
information is not yet available, or simply is not required for the particular business
process. This allows different parts of an enterprise to selectively contribute only the
information that is germane to their business functions.
A fundamental difference between the proposals in these XML/EDI Guidelines and those
found in other proposals for XML-based web retailing, such as those covered in the Open
Trading Protocol (OTP), is the client-centric nature of the business processes, as
contrasted with the server-centric nature of electronic retailing. To distinguish these two
terms, we use the term "Electronic Business" to refer to the processes of fulfilling
customer requirements through the application of negotiated business processes leading
to the supply of manufactured goods to retailers and service providers, and "Web
Commerce" to describe the process of selling manufactured goods to consumers.
Electronic business is client-centric in that it starts with a specification of a client's
requirements, rather than a statement of what the supplier has to offer. The specification
of requirements gets sent to a number of potential suppliers, who are asked to tender for
the business by a predefined date/time. The purchaser is, as a result of this process,
provided with more than one choice, and must determine which quotation to accept. This
may require a period of contract negotiation to ensure that adequate terms and conditions,
including delivery criteria, are met. This may require a looping of the processes, with a
need to cross-refer between successive documents.
Once the purchaser has selected a supplier the business processes involved are very
similar to those involved in web commerce, but there are subtle differences. For example,
electronic payment before delivery is unlikely to be required for electronic business
transactions. Instead of being an integral part of the negotiation phase, with payment
being made at the time the order is placed, payment in the electronic business scenario is
a separate process that occurs immediately after delivery. This introduces concepts such
as statements, which do not occur in web commerce scenarios.
The standards involved in XML/EDI
XML is the Extensible Markup Language subset of ISO's Standard Generalized Markup
Language (SGML) developed by the World Wide Web Consortium (W3C) SGML on the
Web working party during the latter half of 1996 and early 1997. The formal
recommendation was submitted for approval by W3C members on 8th December 1997.
On 10th September 1997 a proposal for a new form of XML Style Language (XSL),
which incorporates the ECMAScript standardized variant of JavaScript, was published by
a consortium led by Microsoft, ArborText and the Inso Corporation. This version of the
XML/EDI specification uses the power provided by this new advanced language
combination to show how control of XML/EDI document processes can be achieved in a
distributed manner.
In October 1997 a specification for a formal Document Object Model (DOM) for XML
documents was published by W3C. This model provides a standardized API for XML-based tools.
Combining XML and EDI to develop XML/EDI suggests that the main method of
capturing and coding EDI information will be through XML-coded electronic forms. At
present the form handling characteristics of XML are yet to be fully agreed (agreement is
expected during 1998). To allow interaction with existing systems the XML/EDI
Guidelines show how EDIFACT messages can be generated from XML/EDI forms, and
vice versa.
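The two-way conversion can be sketched in a few lines. The example below is a deliberately simplified assumption of how such a mapping might look, not the mapping defined by these guidelines: it uses the default EDIFACT separators (`+` between data elements, `'` to end a segment, `?` as the release character) but ignores service segments and composite elements, and the `<message>`, `<segment>` and `<element>` names are invented for the illustration.

```python
import xml.etree.ElementTree as ET

SEGMENT_END, ELEMENT_SEP, COMPONENT_SEP, RELEASE = "'", "+", ":", "?"


def escape(value):
    """Prefix reserved characters with the release character (release first)."""
    for ch in (RELEASE, ELEMENT_SEP, COMPONENT_SEP, SEGMENT_END):
        value = value.replace(ch, RELEASE + ch)
    return value


def unescape(value):
    """Remove release characters, keeping the character each one protects."""
    out, i = "", 0
    while i < len(value):
        if value[i] == RELEASE and i + 1 < len(value):
            i += 1
        out += value[i]
        i += 1
    return out


def split_unescaped(text, separator):
    """Split on a separator, leaving released (escaped) separators alone."""
    parts, current, i = [], "", 0
    while i < len(text):
        if text[i] == RELEASE and i + 1 < len(text):
            current += text[i:i + 2]  # keep the escaped pair intact
            i += 2
        elif text[i] == separator:
            parts.append(current)
            current = ""
            i += 1
        else:
            current += text[i]
            i += 1
    parts.append(current)
    return parts


def xml_to_edi(xml_text):
    """Flatten the hypothetical XML form into an EDIFACT-style string."""
    out = ""
    for segment in ET.fromstring(xml_text).findall("segment"):
        fields = [segment.get("tag")] + [escape(e.text or "") for e in segment.findall("element")]
        out += ELEMENT_SEP.join(fields) + SEGMENT_END
    return out


def edi_to_xml(edi_text):
    """Rebuild the hypothetical XML form from an EDIFACT-style string."""
    root = ET.Element("message")
    for raw in split_unescaped(edi_text, SEGMENT_END):
        if raw:
            tag, *values = split_unescaped(raw, ELEMENT_SEP)
            segment = ET.SubElement(root, "segment", tag=tag)
            for value in values:
                ET.SubElement(segment, "element").text = unescape(value)
    return ET.tostring(root, encoding="unicode")


form = ('<message><segment tag="BGM"><element>220</element><element>PO+1</element>'
        '</segment><segment tag="QTY"><element>21:5</element></segment></message>')
print(xml_to_edi(form))
```

The release character is what allows a value such as `PO+1` to survive the round trip without being mistaken for two data elements.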
XML/EDI isn't creating a new standard. XML/EDI is defining how companies can use
current standards to solve their business problems.
3. Scope of XML/EDI
Details of the scope of XML/EDI, and the impact it is expected to have on business
communities, are covered in Introducing XML/EDI.... To help readers of this document to
appreciate the differences in practice between traditional EDIFACT-based web
transactions and XML/EDI this section discusses some of the differences between
traditional business-to-business electronic data interchange systems and the new breed of
interactive electronic business tools being provided through the Internet.
Business-to-business Electronic Data Interchange
Electronic Data Interchange (EDI) has been used for business-to-business communication
for almost a quarter of a century. Initial efforts involved inter-company agreements on
how to exchange commercial data, initially as information stored on tape and later as
messages sent over dedicated data lines. To avoid having to use different protocols to
move data between different companies, various industry groups identified sets of data
that could form the basis of individual agreements. The industry groups also sought to
agree the format in which fields in such data sets were interchanged so that a company
only needed to develop one methodology for decoding information received without
recourse to human intervention.
The Achilles Heel for this approach has always been twofold. Firstly, companies require
flexibility in, and wish to deviate from, doctrinaire standards that do not fully meet their
business needs. Secondly, because the standards are pre-ordained there is no mechanism
provided to transfer processing rules and associated information. It is assumed that the
data meets the defined constraints and if not, has been duly modified to conform. This
means that companies must conduct exacting analysis to determine precisely how they
are going to move their business data to and from the predefined EDI formats. The cost
of these constraints has been borne as excessively long and complex implementation
cycles for traditional EDI systems.
The world has changed from thirty years ago, and now requires more dynamic and
vibrant services that match the organized yet ad hoc nature presented by both modern
business practice, and particularly its manifestations on the Internet. The Internet is rewriting the rules on how people interact, buy and sell, and exchange goods and services.
In particular the Internet is showing us that EDI is not only relevant for business-to-business communications. The same concepts are also relevant for all consumer-to-supplier relationships, whether the consumer is an end-user, a manufacturer, a service
organization such as a hospital or a hotel, a governmental organization or a virtual
organization.
Electronic Commerce in 1997
Electronic business transactions
With the arrival of the Internet in the last decade of the 20th century the pattern of
electronic commerce has dramatically changed. In particular, the Internet has introduced
many new ways of trading, allowing interaction between groups that previously could not
economically afford to trade with one another.
Whereas previously commercial data interchange involved mainly the movement of data
fields from one computer to another, without human intervention, the new model for
web-based commerce introduced by the Internet is typically dependent on human
interaction for the transaction to take place. The new model is based principally on the
use of interactive selection of a set of options, and on the completion of "electronic
forms", to specify user requirements.
As this new model develops there has been a fundamental shift in how data used for
commerce should be processed. The original create-->transmit-->receive-->process cycle
of information processing, using individual programs, is beginning to be replaced by the
concept of active objects which have inherent processes associated with them, based on
the class of information they contain. Today an invoice may no longer contain a copy of
the information stored in the database it was generated from: instead it contains a pointer
that says where it expects to get the data from, and this data will be fetched from its
managed source each time the invoice is processed.
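The idea of an invoice that holds a pointer rather than a copy can be sketched as follows. This is only an illustration of the principle: the dictionary stands in for a managed data source, and the names `PRICE_SOURCE`, `fetch_price` and `InvoiceLine` are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable, Dict

# Stand-in for a managed data source; a real system might query a
# database or dereference a URL instead.
PRICE_SOURCE: Dict[str, float] = {"widget": 9.5}


def fetch_price(item_ref: str) -> float:
    """Look up the current price for an item reference."""
    return PRICE_SOURCE[item_ref]


@dataclass
class InvoiceLine:
    item_ref: str  # pointer to the data, not a copy of it
    quantity: int
    resolve: Callable[[str], float] = fetch_price

    def total(self) -> float:
        # The price is fetched afresh each time the line is processed.
        return self.quantity * self.resolve(self.item_ref)


line = InvoiceLine("widget", 2)
print(line.total())
```

Because the price is resolved on every call, a change in the managed source is reflected the next time the invoice is processed, without the invoice itself being edited.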
Such interactive programs require us to review the underlying philosophy of electronic
commerce. What are the characteristics of a system designed for "electronic business
transactions" in an international marketplace?
To be truly interactive you need to be able to:
1. Understand the business concepts represented in the interchanged data.
2. Apply business-specific rules to the interchanged data to identify what class(es) of
data it contains and formulate appropriate responses.
To do this you need to be able to:
o identify the role and syntax of each piece of interchanged data
o identify the source of each shared piece of information
o identify which pieces of information should occur in each interchanged set of data
  and, if relevant, the order in which they occur in a particular message stream
o identify who is responsible for creating, transmitting, receiving and processing
  each message, and which programs should be used to control each of these
  processes
o identify when a message should be moved from one stage of the interchange
  process to another
o identify which rules should be used to check that the relevant forms of
  interchange have taken place and to move data from one presentation template to
  another.
Because these interactions can be complex, and potentially require specialized
knowledge, the rule templates can be supplemented by XML/EDI data manipulation
agents (DataBots) to ensure that users can express their requirements in high-level,
natural language, terms. DataBots automatically create appropriate rule templates and
XML syntax to match user requirements and broker the entire interchange.
When DataBots are being used XML/EDI is identified as being robot generated by
adding an R to its name to become XML/EDI-R.
At this point in time the ECMAScript standardized variant of JavaScript provides
the vehicle that permits the DataBots to be deployed and received along with XML/EDI
messages.
Future based on XML/EDI
4. Base Technologies of XML/EDI
XML/EDI is a synthesis of many concepts. XML/EDI:
o uses the XML protocol as its "data interchange modelling" layer
o uses the XSL protocol as its "presentation" layer
o can be integrated with traditional methods of Electronic Data Interchange (EDI)
o can be used with all standard Internet transport mechanisms such as IP routing,
  HTTP, FTP and SMTP
o allows for document-centric views and processing methodologies
o uses modern programming tools such as Java and ActiveX to allow data to be
  shared between programs
o uses agent technologies for data manipulation, parsing, mapping, searching...
XML/EDI can be seen as the fusion of five existing technologies:
1. Web data interchange based on the new XML specification
2. Existing EDI business methods and message structures
3. Knowledge templates that provide process control logic
4. Data manipulation agents (DataBots) that perform specialist functions
5. Data repositories that allow relationships to be maintained.
The Five Technologies of XML/EDI
Why use XML?
XML will be the native language for the next generation of most of the popular WWW
browsers. XML/EDI seeks to leverage the work and support (technically and financially)
which XML is receiving. With traditional EDI, the infrastructure was built from the
ground up, without being able to share resources with other programs. This paradigm is
no longer appropriate in today's world of shared software development. By adopting
XML/EDI, the EDI community can share the cost of extension and future
development.
In 1986 the International Organization for Standardization (ISO) published an
international standard defining a Standard Generalized Markup Language (SGML) that
allowed its users to:
o identify the role and syntax of each piece of data in an interchanged document
o identify which pieces of information should occur in each interchanged set of data
  and, if relevant, the order in which each such element should occur in a particular
  document
o identify which programs should be used to control each of these processes.
SGML has formed the basis of many of the large, multinational, documentation projects
that have developed in the decade since its publication. It also formed the basis for the
formalization of the HyperText Markup Language (HTML) that led to the formation of
the World Wide Web of documentation that has become available on the Internet.
Key to the success of HTML was the development of the concept of Uniform Resource
Locators (URLs) that allow users to identify the source of each piece of shared data in a
consistent manner. Whilst the original concept has limitations as to the granularity of data
access, its universality has greatly improved computer-to-computer communications.
In July 1996 the World Wide Web Consortium (W3C) set up a working group to study
how SGML could be simplified to allow for its efficient use over the Internet. The result
was the development of an Extensible Markup Language (XML) that combined the
expressive power of SGML with the Internet-aware functionality of HTML.
XML provides an ideal methodology for electronic business because:
o XML allows message type creators to clearly identify the role and syntax of each
  piece of interchanged data using a definition that is both machine processable and
  human interpretable
o XML allows message type creators to identify the source of each shared structure
  using an Internet Uniform Resource Locator
o XML allows message type creators to optionally identify which pieces of
  information should occur in each interchanged set of data and, where relevant, the
  order in which individual fields should occur in a particular message stream
o XML documents can be given metadata fields that can be used to identify who is
  responsible for creating, transmitting, receiving and processing each message, and
  can have built-in facilities for identifying the storage points of programs that
  should be used to control processes
o XML can make use of facilities provided by the latest version of the Internet
  HyperText Transfer Protocol (HTTP), which can be used to identify when a message
  should be moved from one stage of the interchange process to another, and to check
  that the relevant forms of interchange have taken place.
Integrating XML with EDI
XML can be integrated with existing EDI systems by:
o providing application-specific forms that users can complete to generate EDI
  messages
o generating EDI message formats for transmission between computers over the
  Internet, or through existing value-added networks (VANs)
o allowing data received in EDI format to be interpreted according to sets of
  predefined rules for display by the receiver on standardized browsers using a
  user-defined template, rather than having to rely on specially customized display
  packages.
XML can extend existing EDI applications by:
o allowing message creators to add application-specific data to standardized
  message sets where required
o allowing message creators or receivers to display the contents of each field in
  conjunction with explanatory material which is specific to the application and the
  language preferences of the user
o allowing system developers to customize the help information associated with the
  data for each field
o allowing field value checking to be integrated with checks on the validity of the
  data with respect to information stored on local databases.
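The last point, checking field values against a local database, might look something like the following sketch. It assumes a hypothetical book order with `<item>` and `<isbn>` elements and a local `catalogue` table; the element names, table layout and placeholder ISBN values are invented for the example.

```python
import sqlite3
import xml.etree.ElementTree as ET


def unknown_isbns(order_xml, connection):
    """Return the ISBNs in an order that are absent from the local catalogue."""
    missing = []
    for item in ET.fromstring(order_xml).findall("item"):
        isbn = item.findtext("isbn", default="")
        row = connection.execute(
            "SELECT 1 FROM catalogue WHERE isbn = ?", (isbn,)
        ).fetchone()
        if row is None:
            missing.append(isbn)
    return missing


# Hypothetical local database holding a single known title.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE catalogue (isbn TEXT PRIMARY KEY, title TEXT)")
db.execute("INSERT INTO catalogue VALUES ('ISBN-A', 'Example title')")

order = ("<order><item><isbn>ISBN-A</isbn></item>"
         "<item><isbn>ISBN-B</isbn></item></order>")
print(unknown_isbns(order, db))
```

The same pattern extends to other checks, such as confirming that a customer number exists before an order is accepted.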
5. XML/EDI Components
The following figure illustrates the main layers of a fully integrated XML/EDI system.
The layers of an XML/EDI system
The XML/EDI specific components are built on top of existing standards for transmitting
and processing XML-encoded data. These standards define shared features such as:
o the standard Internet file storage/naming and data transport mechanisms
o file and message transfer formats
o the syntax of data coded in XML
o the way in which XML files can be validated by an XML parser or document
  object model generator
o the way in which XSL presentation and data evaluation scripts can be associated
  with parsed objects
o the use of rules and data management robots to manage application and repository
  interfaces.
XML parsers, document browsers, page markup programs and related software functions
are available off-the-shelf today. XML/EDI isn't, therefore, a new standard; it simply
provides a framework for using existing standards to tackle existing problems in a new
way.
XML/EDI specific components will either manifest themselves as built-in components
into existing products, plug-in programs to existing tools or standalone applications. It is
anticipated that new applications will be created from the spark of XML/EDI
implementation.
Types of applications
The following list of examples of the types of facility that could be built into an XML/EDI
implementation is not comprehensive, but offers a starting place for discussion:
o Lexicon Repositories (holding EDIFACT, X12, or BSI dictionaries, DTDs and
  common business objects)
o XML/EDI Data Manipulation Agents (DataBots) - which enable rule-driven
  processing
o XML/EDI Business Objects (predefined object processors identified by XSL
  scripts)
o XML/EDItors used for the creation and completion of form-based EDI messages
o XML/EDI extensions for message stores
o Search Agents which recognize XML/EDI specific tagging and are able to request
  data from both private and public message stores (e.g. through Web interfacing)
o Trading Partner Pages - extensions of "yellow pages", but with verified integrity
  of characteristics.
Each of these options is explained in more detail in the following subsections.
Lexicon Repositories
A primary component of XML/EDI is its dynamic common language and syntax
repository. The various types of repositories include:
o DTD Repositories
o Segment/Element Repositories (e.g. EDIFACT, X12, or BSI dictionaries)
o Business Object Repositories (see below)
o Trading Partner Pages (see below).
XML/EDI Data Manipulation Agents (DataBots)
The central goals behind the development of the concept of DataBots are:
o to automate EDI exchanges using familiar 'code-free' open spreadsheet and SQL
  syntaxes, integrated with 'smart' rule-based tools, to make EDI universally
  accessible
o to provide a 'next generation' for EDI that is fully backward compatible with the
  existing X12 and UN/EDIFACT standards, but provides key new business
  functions
o to provide the ability to seamlessly translate between X12 and UN/EDIFACT
  standards.
All these goals are realizable using XML/EDI-R.
DataBots and their associated XSL scripts provide facilities that allow XML/EDI systems
to:
o automatically analyze and manipulate the data structures embodied in the EDI or
  XML messages. The rule set used to accomplish this is defined externally, while
  the method itself has embedded knowledge of EDI/XML messaging that allows it
  to process messages even with partial or incomplete structural information.
o allow system developers to express complex EDI interactions through an interface
  to a rule template that is clear and simple to follow. This allows users to focus on
  the business rules and interactions, without having to use programming language
  level tools for EDI.
o exchange rule templates as a way to integrate their data interactions, and to access
  and, where appropriate, update each other's databases.
o extract and update data to and from heterogeneous data structures. Message
  creators and receivers merely have to tell the system where the data is, or where
  they want it to be placed, and the system will access the data directly.
o provide a data interface for existing desktop database systems.
o use traditional EDI Implementation guideline rules, expressed as rule templates
  that can be distributed electronically.
o provide in-memory, single pass, message translation that frees message creators
  and receivers from constraints of database and message structure conditions
  traditionally imposed by EDI systems.
o allow applications to exchange data integration rules and database structures
  between systems on an ad hoc and rapid basis.
o provide a template mapping system which allows details of the data manipulation
  required for translation to be expressed by a lay person, without requiring that
  they define the entire message structure in every detail.
o provide the means for the message creator to communicate process related
  information independent of the receiver's processing system and data formats.
o provide formal descriptions for the complete range of commercial transactions,
  modelling of transactions and their computational complexity, decomposition of
  transactions into actions to be performed locally, and mapping of local actions on to
  efficient protocol exchanges.
o detect that the client requires either a later version of the XSL template, or the
  Java components, and send these to the client as an HTTP Distribution and
  Replication Protocol (DRP) update message.
o exchange data, update data and translate data as required as it processes the XML
  and thereby receives the templates and associated data. It will of course be able to
  reverse the process, and take outbound data, translate it, and generate the
  outbound XML as well.
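A very small sketch of rule-driven mapping, in the spirit of the facilities listed above, is given below. It assumes the rule template is nothing more than a table from target field names to paths within the message; the `RULES` table, the element names and the `apply_rules` function are hypothetical and are not the rule template format defined by XML/EDI.

```python
import xml.etree.ElementTree as ET

# Hypothetical rule template, held outside the processing code:
# target field name -> path of the source element within the message.
RULES = {
    "customer": "buyer/name",
    "isbn": "item/isbn",
    "quantity": "item/quantity",
}


def apply_rules(message_xml, rules):
    """Extract the fields named by the rules, skipping any that are absent."""
    root = ET.fromstring(message_xml)
    record = {}
    for target, path in rules.items():
        value = root.findtext(path)
        if value is not None:  # tolerate partial or incomplete messages
            record[target] = value.strip()
    return record


# A message with no quantity still yields the fields that are present.
partial = "<order><buyer><name>ACME</name></buyer><item><isbn>ISBN-A</isbn></item></order>"
print(apply_rules(partial, RULES))
```

Keeping the rules in a data table means they can be exchanged between trading partners and changed without touching the code that applies them.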
It should also be noted that the template method that XML/EDI DataBots implement is
extremely compact and concise. This means that it is a low-bandwidth, efficient protocol,
which is required to meet high volume constraints in batch EDI delivery systems.
Additional considerations that need to be taken into account include Process
Control and Object Oriented support. Process Control can be accommodated
through the use of the Integrated Computer Aided Manufacturing
(ICAM) Definition Language (IDEF) process modelling language or Documented Petri
Nets. Developers can either assign XML tokens to IDEF entities and then add process
control lines to the template format, or IDEF can be defined as a notation that can
be processed by an XML/EDI-aware browser. Object oriented support can be provided
through W3C's Document Object Model (DOM), which provides a CORBA IDL
definition for XML objects.
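To give a feel for object-style access through the DOM, the sketch below walks a parsed message with Python's `xml.dom.minidom`, a lightweight implementation of the W3C DOM interfaces. The `element_names` helper and the sample order are assumptions made for the illustration.

```python
from xml.dom.minidom import parseString


def element_names(xml_text):
    """Return the element names of a message in document order."""
    names = []

    def visit(node):
        if node.nodeType == node.ELEMENT_NODE:
            names.append(node.tagName)
        for child in node.childNodes:
            visit(child)

    visit(parseString(xml_text).documentElement)
    return names


# Hypothetical order message.
print(element_names("<order><item><isbn>ISBN-A</isbn></item><buyer/></order>"))
```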
Editor's Note: To do - explain how Documented Petri Nets could be used.
In summary, the optional DataBots component provides the agent that brokers, controls,
corrects, directs and ensures that the XML/EDI-R method can progress information
transfers correctly.
XML/EDI Business Objects
XML/EDI business objects will be available off-the-shelf, created by developers, with
rule sequences devised by users. The usage of these objects can be defined by their
sphere of influence. Business objects can be:
o common to a set of XML/EDI applications
o common and shared within an industry
o shared between trading partners
o corporate infrastructure interface; departments, etc.
o application specific objects
o transaction specific
o instance specific.
Business objects, in most but not all cases, will be invoked by the XML/EDI Data
Manipulation Agents. It is anticipated that for efficiency these object manipulation
DataBots will be written in Java, or using similarly distributed programming language
tools. End-users will be supplied with tools that automatically generate the relevant
agents from information provided about the application.
Below are just a few examples of the many possible classes of XML/EDI business
objects:
o objects which allow rule driven mapping to/from databases
o processing of applied taxes
o processing of shipment costs
o processing of document routing; internal and external.
XML/EDItors
Used for the interactive creation and completion of form-based EDI, XML/EDItors are
predicted to become the front-end for business applications. XML/EDI editors will
reference Lexicon Repositories to prompt users for appropriate data using XML parse
trees to request related fields.
XML/EDI extensions for message stores
It is anticipated that message stores will require extensions to provide the types of
complex workflow management needed to ensure the correct delivery and processing of
XML/EDI messages. For example, a message store should not be able to acknowledge
receipt of a message until its contents have been parsed by an XML parser to ensure that
the unencrypted data stream still forms a valid message.
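The acknowledgement rule described above can be sketched as follows. This is a minimal illustration in Python, not part of the guidelines: the function name acknowledge and the ISBN element are assumptions, and the standard-library parser used here checks only that the message is well-formed, not that it is valid against a DTD.

```python
import xml.etree.ElementTree as ET


def acknowledge(raw_message: bytes) -> bool:
    """Return True (send an acknowledgement) only if the message parses as XML."""
    try:
        ET.fromstring(raw_message)
    except ET.ParseError:
        return False  # withhold the acknowledgement
    return True


# Hypothetical message bodies; the second has a missing end-tag.
good = b"<Book-Order><ISBN>0201403943</ISBN></Book-Order>"
damaged = b"<Book-Order><ISBN>0201403943</Book-Order>"
print(acknowledge(good), acknowledge(damaged))
```

A store that needs full validation would replace the parse step with a validating parser; the control flow would stay the same.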
In time it is anticipated that message stores will mutate to use XML natively. This is not
because of XML/EDI directly but because message stores that know how to identify,
search for and process objects within multimedia streams or business messages will be
required for a wide range of application scenarios.
Search Agents
Based on ad-hoc, learned or profiled information, search engines will recognize
XML/EDI specific tagging and be able to reference suitable private and public message
stores, using standard WWW interfacing, to extract data intelligently. This will allow for
the best combination of free-text and fielded search. Catalogs and buyer agents will be
among the first to use XML/EDI technology in this way.
Trading Partner Pages
XML/EDI will use a mix of today's X.500 technology, security certificates, "yellow
pages", Email look-up, and verified characteristics of entities. This is a critical
component of performing business, much more so when employing electronic means.
Subsystems will undoubtedly develop along these lines: they will have to support
XML/EDI interfacing of basic CRUD functions (Create, Read, Update, Delete) as a
minimum. XML/EDI Data Manipulation Agents shall be able to draw upon these
resources to validate transactions.
6. The Implementation Process
Using XML for Electronic Data Interchange
The following stages are involved in using XML for the interchange of commercial EDI
messages:
o identification of suitable data sets for electronic business transactions
o development of XML document type definitions (DTDs) that formally define the
  relationships of the fields that are to form a particular class of EDI messages
o the definition of application-specific extensions to standard message types
o the creation of instances of specific types of electronic business message
o the validation of the contents of messages
o the transmission and receipt of electronic business messages
o the processing of electronic business messages using DataBots.
An application does not need to use all of the levels of processing shown in Figure 1 and
the above list: it can stop at whichever level in the hierarchy suits it. For example, an
application can confine itself to checking incoming and outgoing EDI messages using a
document object model that has been formally defined in an XML DTD.
Identifying data sets
Identification of data sets for electronic business transactions will often be the
responsibility of industry associations and standardization bodies such as
UN/EDIFACT and EBES (the European Board for EDI Standardization).
Whereas existing EDI definitions are primarily concerned with the way in which a set of
fields forms a message, the concepts required for XML/EDI are based more on the
definition of independent classes of information that can be combined together with other
classes of information to form interchangeable messages. As such the concepts are more
akin to the idea of a Basic Semantic Repository (BSR) being proposed by ISO, and of the
Business Systems Interconnection (BSI) proposal from the University of Melbourne.
There is, however, one basic difference between using XML/EDI for defining data
classes and using the BSR or BSI methodologies. In XML/EDI the order and number of
subclasses of a data class can be altered by message creators without having to formally
register that fact with any centralized organization. For example, if it was necessary for
an application to separate building numbers or names from information about the street
the building is located within, XML/EDI would allow system developers to define two
new subclasses that would be combined to provide the information needed for an existing
EDI address component.
One of the advantages that accrues from XML/EDI's ability to subclass fields is that such
fields can be developed interactively using information supplied from more than one
location. For example, telephone order processing systems in today's world of electronic
business transactions often start by asking users for their postcode. This tells the system
which region, town and street the user is located in, but not which building they are in. To
find this out you need to ask the user for a number or name that uniquely identifies the
building within the street identified by the postcode. Using these two related pieces of
information it is possible to interactively complete a standardized class of information, an
address, that can then be shared by an order, its delivery note, and the invoice required
for settlement.
Once information has been captured, and used to create an instance of the relevant
class of data, it should not be necessary to recreate the information each time it is
required. All that should be needed is that business processes that need this information
reference the point at which the data was originally captured, e.g. the address associated
with the order for the goods.
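The two ideas above, completing an address from a postcode plus a building identifier and then referencing the captured instance rather than recreating it, can be sketched in Python. Everything named here (POSTCODES, Address, complete_address, BusinessDocument) is hypothetical; the one-entry lookup table stands in for a real postal database.

```python
from dataclasses import dataclass

# Hypothetical postcode directory; a real system would query a postal database.
POSTCODES = {"GL3 2PU": ("Oldbury Orchard", "Churchdown", "Glos.")}


@dataclass(frozen=True)
class Address:
    building: str
    street: str
    town: str
    region: str
    postcode: str


def complete_address(postcode: str, building: str) -> Address:
    """Combine the postcode lookup with the building identifier."""
    street, town, region = POSTCODES[postcode]
    return Address(building, street, town, region, postcode)


@dataclass
class BusinessDocument:
    kind: str
    address: Address  # a reference to the captured instance, not a copy


captured = complete_address("GL3 2PU", "29")
documents = [BusinessDocument(kind, captured)
             for kind in ("order", "delivery note", "invoice")]
# Every document points at the one address captured with the order.
print(all(doc.address is captured for doc in documents))
```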
An essential precursor to the design process of an XML/EDI application is a study of
how business processes re-utilize stored information. Where suitable business models
already exist, these can be represented in XML form. Where there is no existing model,
or the existing models do not meet the requirements of the trading partners for some
reason, developers should perform a full analysis of the relevant business processes, and
seek to identify similarities between these processes and those already formally
documented for use by other applications. Knowledge of the source and contents of
public repositories of reusable data segments will help to simplify this process. One of
the goals of XML/EDI, therefore, is to encourage the setting up of such repositories of
knowledge.
To ensure that users can guarantee the long-term maintenance of data set components,
repositories of formal XML definitions will need to be created, and unique object
identifiers will need to be assigned to each set of components. While initially testing can
be done using system identifiers that resolve to Internet Uniform Resource Locators
(URLs), in the longer term a mechanism for identifying shared data sets using formally
registered SGML public identifiers associated with URLs will need to be developed. A
system for resolving public identifiers to obtain copies of the registered definitions will
also be required.
Developing DTDs
Messages that pass between systems will typically conform to a previously agreed XML
document type definition (DTD) that formally describes, in terms interpretable by both
humans and computers, an internationally accepted message type.
Note: The structure of XML DTDs and document instances is formally defined in
Extensible Markup Language (XML). A brief introduction to the components of XML can
be found in An Introduction to the Extensible Markup Language. More complete
information on the structure of SGML DTDs, including those that implement the Web
SGML extensions, can be found in Web SGML and HTML 4.0 Explained, which contains
examples of the use of each of the constructs used in SGML and XML, and explains how
these facilities are used within HTML.
Warning: The following text presumes some knowledge of SGML and/or XML.
XML DTDs can be developed by:
o international standards bodies wishing to develop standardized sets of
  interchangeable data
o industry associations wishing to develop agreed procedures for interchanging
  messages between members
o one of the members of a multilateral or bilateral agreement to share information
o a company wishing to supply information to a number of suppliers or customers
o a company wishing to obtain information in a known format from a number of
  suppliers or customers.
Declarations that form a standardized XML DTD will typically be stored in a separate file,
which can be referenced, as an XML external subset, by those wishing to use it through
the Internet Uniform Resource Locator that its originator has assigned to a publicly
available copy of the data. Alternatively, if public access is to be restricted, the document
type definition can be stored as the internal subset within the document type declaration
sent with the message.
Where the document type definition is based on classes of information shared by more
than one message, each class of information can be defined in a separate file, known in
XML as an external entity, these files being referenced in a suitable sequence from within
the external or internal subset of the XML DTD.
For example, an XML DTD could have the form:
%address;
%items;
This DTD fragment defines two external and one internal parameter entity, four locally
defined elements and contains two parameter entity references (%address; and %items;)
that call in the contents of the external entities at appropriate points in the definition. Both
of the parameter entity references are preceded by explanatory comments.
Note that the source of each class of information is identified not in the call to the class
itself (%address;) but within a formal definition of the data storage entities required to
process the class definition references (e.g. the first two lines of the DTD). This
technique allows files to be moved without having to change the main definition of the
DTD.
Typically the entity definitions will be stored outside the DTD, which will contain a
reference to the URL of the point at which the latest details of library file locations can be
found. For example:
%library;
%address;
%items;
where %library; references a file containing the entity definitions given at the start of
the previous example.
XML provides (experimental) facilities for ensuring that data modules taken from
libraries do not introduce name clashes in their elements. The names of elements within
each module can be qualified by a module (namespace) identifier. Each namespace
identifier can be associated with a URL that uniquely identifies where the module is
formally defined. For example, the contents of the library file referenced above could be
defined as:
">
Application-specific extensions
XML permits entities and attributes that are defined in the external subset to be redefined
in the internal subset. This facility allows XML/EDI users to develop locally significant
subclasses. It can also be used to create subsets of messages by removing unused fields
from the data model.
For example, the internal subset of a DTD based on the above standardized DTD could
contain the following local redefinition for the %items; parameter entity:
">
]>
123456
The SGML Centre
29 Oldbury Orchard
Churchdown
Glos.
GL3 2PU
key151235
15356378797
Special Offer 16
12
Note that, because of the prioritization SGML gives to local definitions, the definition for
the %items; parameter entity provided in the local subset will replace the reference to
the external source for the same entity provided as part of the file referenced using the
external subset.
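The precedence rule can be observed with any XML parser, because XML treats the first declaration of an entity as binding and reads the internal subset before the external one. The Python sketch below is an illustration under stated assumptions: it uses a general entity with the made-up name terms, declared twice within a single internal subset, rather than the %items; parameter entity and external subset discussed above (the standard-library parser does not fetch external subsets).

```python
import xml.etree.ElementTree as ET

# The local declaration comes first, as the internal subset does in a real DTD.
# The second declaration plays the part of the standardized definition.
MESSAGE = """<!DOCTYPE order [
  <!ENTITY terms "Local terms: payment within 14 days">
  <!ENTITY terms "Standard terms: payment within 30 days">
]>
<order><conditions>&terms;</conditions></order>"""

root = ET.fromstring(MESSAGE)
resolved = root.find("conditions").text
print(resolved)  # the first (local) declaration is the one that is used
```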
Validating messages
XML/EDI messages can be validated by a validating XML document instance processor
(known as an XML parser) to ensure they contain all required elements from the
specified data set, and that the fields are in the required sequence. When the document is
found to be valid the parser can generate a document tree that conforms to the rules laid
down in the Document Object Model (DOM) specification that provides a standardized
API between XML parsers and browsers and other forms of program.
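As a rough picture of what the required-element and sequence checks involve, the sketch below hand-codes a single content model in Python. It is not a DTD validator, and it is an assumption-laden illustration: the names Order-No and Buyer-EAN and the REQUIRED_SEQUENCE list are hypothetical, and a validating parser would derive the model from the DTD instead.

```python
import xml.etree.ElementTree as ET

# Hypothetical content model: these children must appear, in this order.
# Repeated adjacent elements (several order lines) are allowed.
REQUIRED_SEQUENCE = ["Order-No", "Message-Date", "Buyer-EAN", "Order-Line"]


def check_sequence(root, required):
    """Return a list of problems; an empty list means the check passed."""
    found = [child.tag for child in root]
    problems = [f"missing element: {name}"
                for name in required if name not in found]
    present = []
    for tag in found:  # collapse runs of the same element type
        if tag in required and (not present or present[-1] != tag):
            present.append(tag)
    expected = [name for name in required if name in found]
    if present != expected:
        problems.append("elements out of sequence")
    return problems


ok = ET.fromstring(
    "<Book-Order><Order-No>967634</Order-No>"
    "<Message-Date>19990308</Message-Date>"
    "<Buyer-EAN>5412345000176</Buyer-EAN>"
    "<Order-Line/><Order-Line/></Book-Order>")
bad = ET.fromstring(
    "<Book-Order><Message-Date>19990308</Message-Date>"
    "<Order-No>967634</Order-No>"
    "<Buyer-EAN>5412345000176</Buyer-EAN><Order-Line/></Book-Order>")
print(check_sequence(ok, REQUIRED_SEQUENCE))
print(check_sequence(bad, REQUIRED_SEQUENCE))
```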
XML elements can be assigned attributes that point to processors that can undertake
relevant data validity checks. This can be done either by associating notation processors
with an element, or by associating an ECMAScript specification with the element as part
of an XSL "action" associated with the specific element types used in specific contexts, or
with particular attribute values.
Where the XML Style Language (XSL) is not being used (e.g. because the browser does
not yet support it) the basic XML language allows user-defined notation processors to be
used to validate the contents of specific XML elements. This is done by adding definitions
of the following form to the external or internal subset of the DTD:
...
The predefined check attribute of the EAN element will cause the contents of the element
to be passed to the program identified by the declaration for the notation assigned the
local name EAN-validator which is stored at the location indicated by the URL given in
the notation declaration. This processor would typically pass back a message indicating
whether or not the EAN is valid within the context of the relevant message.
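What a processor such as EAN-validator might do internally can be sketched as follows. This Python function is an assumption about the kind of check involved, not the program the guidelines refer to: it applies the standard EAN-13 check-digit rule (weights 1 and 3 alternating over the first twelve digits) and does not consider the context of the message. The sample numbers are illustrative.

```python
def ean13_is_valid(code: str) -> bool:
    """Check the length, the digits and the check digit of an EAN-13 number."""
    if len(code) != 13 or not (code.isascii() and code.isdigit()):
        return False
    digits = [int(ch) for ch in code]
    # Weights alternate 1, 3, 1, 3, ... across the first twelve digits.
    total = sum(d * (3 if i % 2 else 1) for i, d in enumerate(digits[:12]))
    return (10 - total % 10) % 10 == digits[12]


print(ean13_is_valid("4006381333931"))  # correct check digit
print(ean13_is_valid("4006381333932"))  # same number, wrong check digit
```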
XSL provides an alternative, and more generally applicable method that allows
ECMAScript to be used to validate the contents of XML elements. Details of this method
are given below under the heading "Processing messages".
Note: In December 1997 an extension to SGML allowed typed data attributes to be used
in standard SGML files. As soon as this new functionality is absorbed into XML it will be
possible to greatly simplify the validation of message contents.
Exchanging messages
Data captured in XML/EDI messages can be exchanged:
o in the form of an XML file (which can be encoded in any way required, but would
  normally be transmitted using the UTF-8 encoding of the UCS-2 data set by
  default) interchanged using the HTTP protocol, or one of its derivatives (e.g.
  Secure HTTP)
o in the form of a multipart Internet e-mail message (MIME or Secure MIME
  encoded)
o in the form of a zipped (or otherwise encoded) file transferred using the Internet
  File Transfer Protocol (FTP)
o as a compressed (but not otherwise encrypted) set of extensions to an HTTP
  POST message that conforms to Internet's Common Gateway Interface (CGI)
  specification
o in the form of an EDI message (created by processing the XML file at source
  using a special conversion program).
Where conversion into a known EDIFACT format is required the DTD can be extended to
provide additional attributes that can guide the transformation process. For example, the
following additional properties could be added to the list of attributes assigned to the EAN
element:
Messages exchanged as XML/EDI files can be re-validated on receipt by running them
through an XML/EDI validating parser. Where messages have been converted into
non-XML files prior to transmission the conversion should be reversed to allow
re-validation of the received message.
During re-validation any linked parts of messages should be retrieved to ensure that the
full contents of the message have been checked. When re-validation has been confirmed
the Document Object Model created as part of the validation process can be used to
create an auditable copy of the received message in a message store/database.
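The convert, transmit, reverse and re-check cycle can be sketched in Python. The example is a minimal illustration under assumptions: gzip stands in for whatever compression or encoding the trading partners agree on, the function names are hypothetical, the Order-No element name is an assumption, and the re-check is a well-formedness parse rather than full validation.

```python
import gzip
import xml.etree.ElementTree as ET


def pack(message: ET.Element) -> bytes:
    """Serialize the message as UTF-8 XML, then compress it for transfer."""
    return gzip.compress(ET.tostring(message, encoding="utf-8"))


def unpack_and_recheck(payload: bytes) -> ET.Element:
    """Reverse the conversion, then re-parse; raises if the payload is damaged."""
    return ET.fromstring(gzip.decompress(payload))


order = ET.Element("Book-Order")
ET.SubElement(order, "Order-No").text = "967634"
received = unpack_and_recheck(pack(order))
print(received.tag, received.findtext("Order-No"))
```

The element tree returned on receipt is the natural starting point for writing the auditable copy to a message store.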
Processing messages
The way in which a received message would be processed would depend on which of the
available methods for exchanging messages was chosen. If the message was received in a
format that provided the XML/EDI message generated by the originator, the XML Style
Language (XSL) can be used to associate different processes with individual element
classes so that elements can be processed by one or more local processors.
XML/EDI message instances are specifically designed to make the selection of data fields
and classes at the receiver as easy as possible. Each field starts with a "start-tag" that
clearly identifies the class (element type in SGML/XML parlance) of the following data
or embedded subelement set, and specifies any non-default properties to be associated
with the data. The end of each data element is clearly identified by an "end-tag", which
consists of the name of the element (class) preceded by a slash between a matched pair of
outward pointing angle brackets. Fields that contain no data, and no embedded
subelements, (e.g. fields that are only present to point to other data sources) have the
slash indicating their end point immediately before the last angle bracket of the start-tag
rather than immediately after the first one of the end-tag. (See the example for the
element above.) Classes that contain subclasses of information have embedded elements
between their start-tag and end-tag.
XSL allows sets of actions to be associated with particular XML elements. Actions can be
defined in terms of values to be assigned to a set of data presentation attributes (styles),
or in terms of a data processing script that users can define using a define-script object.
XSL scripts are defined using ECMAScript, the standardized form of the JavaScript
scripting language.
Which actions are associated with which elements can be defined using XML element sets
known as XSL rules. A simplified set of style-rules allows presentation properties to be
applied to element classes. Rules can be associated with elements that have been
assigned a unique identifier (id) attribute or that have been assigned a particular value
for a class attribute.
Sets of rules and actions can be defined in macros. Macros can be associated with style
processing attributes associated with specific instances of an element. The default set of
style properties defined in XSL can be extended using define-style objects.
The component parts of an XML Style Sheet can be:
o defined in separate file(s) referenced using Internet URLs
o associated with the definition of elements in the DTD
o appended as a header to the document instance, or
o associated with a specific instance of an element.
A typical XML/EDI XSL description will contain:
o a element that contains ECMAScript definitions of the variables and functions
  required to process the document (in addition to the default function set provided
  by XSL)
o a set of elements that provide named sets of predefined actions
o a set of elements that define properties that are to be used to control style
  processing
o a set of elements that contain within them:
    o a that indicates which type of element the rule is to apply to, or
    o an element that identifies the unique identifier of the particular instance of an
      element the rule applies to, or
    o a element that identifies which class of elements the rule is to apply to
    o an element that defines ancestors and descendants of the targeted element that
      must be present for the rule to apply (ancestors are defined by elements that
      surround the targeted element definition, descendants are defined by elements
      nested within the definition of the target element)
    o an element that identifies which attributes the selected element(s) must have
      before the rule applies
    o an element, which may have embedded within it a set of (argument) control
      elements, that indicates which macros are to be associated with the rule
    o a set of actions that must be processed/evaluated when the rule pattern is matched
o a set of elements that show which presentation styles should be associated with
  particular element types/classes/instances.
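The rule-and-action model can be pictured with a small dispatcher. The sketch below is written in Python rather than XSL and is only an analogy under assumptions: the RULES table, the action functions and the ISBN element name are hypothetical, and only the simplest kind of pattern (a target element type) is modelled.

```python
import xml.etree.ElementTree as ET


def show_date(element):
    """Action: present a CCYYMMDD date with separators."""
    text = element.text
    return f"Message Date: {text[:4]}-{text[4:6]}-{text[6:]}"


def show_isbn(element):
    """Action: label the ISBN for display."""
    return f"ISBN: {element.text}"


# Each rule pairs a target element type with the action to run on a match.
RULES = {"Message-Date": show_date, "ISBN": show_isbn}


def apply_rules(root):
    """Walk the tree in document order, running the action of each matching rule."""
    return [RULES[el.tag](el) for el in root.iter() if el.tag in RULES]


message = ET.fromstring(
    "<Book-Order><Message-Date>19990308</Message-Date>"
    "<Order-Line><ISBN>0201403943</ISBN></Order-Line></Book-Order>")
for line in apply_rules(message):
    print(line)
```

Matching on identifiers, classes, ancestors or attributes would extend the lookup key; the dispatch step itself would not change.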
XSL actions are typically associated with the way in which objects should be presented to
users. This process is typically controlled through the use of flow objects. XSL provides
two default sets of flow objects, one based on the elements typically found in HTML files,
and the other based on the flow objects defined in ISO/IEC 10179 (DSSSL). The set of
DSSSL flow objects supported by XSL includes:
o scroll - used for control of scrollable screen displays
o paragraph - used to create blocks of text
o line-field - used to control the presentation of lists
o table, together with associated controls for table-part, table-column, table-row,
  table-cell and table-border - used to control the presentation of tables
o simple-page-sequence - used for creating multi-page documents
o sequence - used for specifying inherited characteristics
o link - used to control the presentation of hypertext links.
The element can be used to indicate points at which macros and scripts are to be
evaluated as a result of applying a rule.
For an example of the use of XSL specifications based on the use of HTML flow objects
refer to Appendix A.
Activating rules
The XML link process can be used to associate XML/EDI rules with a file. Normally the
Simple Link format will be used to identify one or more files containing the relevant
rules. Typically this will result in a processing instruction of the following form being
added to the start of the document instance:
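How a receiving application might locate such an instruction can be sketched in Python. The exact form of the rules instruction is not reproduced here: the target name xmledi-rules, its href pseudo-attribute and the example.org URL are all assumptions modelled on the xml-stylesheet instruction, used only to show how processing instructions in the document prolog can be read.

```python
from xml.dom import minidom

# "xmledi-rules" is a made-up target name, used here only for illustration.
MESSAGE = """<?xml version="1.0"?>
<?xmledi-rules href="http://example.org/rules/book-order.xml"?>
<Book-Order/>"""


def rule_files(xml_text, target="xmledi-rules"):
    """Collect the href of every matching processing instruction in the prolog."""
    document = minidom.parseString(xml_text)
    hrefs = []
    for node in document.childNodes:
        if (node.nodeType == node.PROCESSING_INSTRUCTION_NODE
                and node.target == target
                and 'href="' in node.data):
            # PI data is free text; pick out the href="..." pseudo-attribute.
            hrefs.append(node.data.split('href="', 1)[1].split('"', 1)[0])
    return hrefs


print(rule_files(MESSAGE))
```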
Appendix A1: Using XML/EDI for Book Ordering
The following statement of the current role of EDI in Book Ordering was made to the
European Board of EDI Standardization by the UK Book Industry Communication (BIC)
manager, Brian Green, in May 1997:
"The nature of the book trade has encouraged its adoption of various forms of Electronic
Commerce over the last 20 years. The introduction of a national UK standard book
numbering system in the 1960's and an international standard (ISBN) in the early 70's
together with central catalogues of books in print in nearly all countries was essential for
an industry where even the smallest retail outlet offered customers the facility to order
any one of around 600,000 books currently in print (in the UK) from 20,000 publishers
with, currently, one hundred thousand new titles appearing every year. There was no hub
in the traditional sense since, although WH Smith in the UK has always had a large
market share, the number of book titles stocked is relatively low and they have not, until
very recently, been much concerned with customers special orders.
In the late 1970's, the UK book trade set up Teleordering as a centralized ordering
service using a simple non-standard order format, providing dedicated terminals on
which booksellers simply keyed quantity and ISBN (their location number was installed
on the form as a default). The orders were polled overnight by Teleordering and
automatically routed to the correct publisher either electronically or, in the case of small
publishers, by mail or fax. The bookseller received a basic confirmation of receipt of the
order by Teleordering with an indication from the Teleordering database whether the
book was recorded as available or out of print. Today TeleOrdering has an annual
throughput of some 27 million orders, runs on PC's and is owned by J Whitaker & Sons who
also publish a 'books in print' CD-ROM and provide a sales data monitoring service.
Teleordering has also established itself as an EDI VAN with a full range of Tradacoms
and EDIFACT messages. The two services run side by side and will convert the
non-standard Teleordering format orders coming from booksellers to EDIFACT or
Tradacoms for transmission to publishers.
Similar services were set up in other European countries, the US, Canada etc., although
the UK service has always been the largest in the world.
A second book trade EDI service, called First EDItion was set up in 1992 in the UK. This
is a pure EDI service based on INS and is particularly strong in the library sector. Both
First EDItion and Teleordering are being used for international trade, mainly between
UK publishers and European wholesalers who, e.g. in Netherlands and Germany,
operate their own dedicated electronic ordering services for booksellers in their
countries. First Edition has announced that it will introduce a book trade service based
on GE's "TradeWeb", which offers a forms-based Internet service linking to the GEIS
VAN.
There has been an interesting 'light EDI' scheme running in the UK for the last four
years. Following publication of the book trade Tradacoms messages by Book Industry
Communication, the UK book trade EDI body, the major UK wholesalers, who had until
then been offering dedicated electronic ordering services, decided to collaborate in a
service called BUYLINE. They provided all their bookseller customers, at a nominal cost
with simple forms based ordering software that links in with either the 'book bank' books
in print CD-ROM or a wholesalers own stockist, enabling the bookseller to select the
books required and choose their supplier from a pull down list. BUYLINE includes
communications software that dials up the selected supplier and transmits the order in
Tradacoms format. The software will also accept Tradacoms acknowledgments and
present these to the user in a simple user-friendly format. The rights in this product have
now reverted to the systems house, Triptych, who developed it and they are extending
the service to the major distributors as well as wholesalers. Their software is also
included in a number of the book shop computer systems. It is generally expected that the
BUYLINE system will migrate to EDIFACT and use Internet rather than direct dial up
communications in due course.
A further development is the regular monthly production of multimedia CD-ROM stock
catalogues by major European wholesalers. Most of these allow users to build order files
and output them in EDI formats, normally using direct dial-up. It is anticipated that data
compression and increased bandwidth will soon allow these facilities to be available over
Internet. An important point, however, is that BIC in the UK and EDItEUR in Europe
have managed to produce a consensus on the book trade implementation of the messages
that ensures that all recent services use standard message formats."
BIC feel that trials of standard forms freely available over the Internet, outputting
EDIFACT messages to any trading partner able to receive them, would be very helpful.
Applying XML/EDI to Book Ordering
The form shown in Figure A.1 has been designed for displaying a book order based on
the EDItEUR Book Ordering Message as described in the EDItEUR EDI Implementation
Guidelines for Book Trade Distribution.
Note: The use of form fields in the following table is gratuitous: it is intended to indicate
that user interaction with XML displays is possible. The form was produced using beta
software for an add-on to Internet Explorer 4.0 (MSXSL) released by Microsoft in
January 1998 to demonstrate the power of their XML Style Language (XSL)
proposal.
Figure A.1: Form for displaying EDItEUR Lite-EDI Book Order Messages
Figure A.2 shows how XML is used to code the form in Figure A.1.
967634
19990308
5412345000176
0201403943
Bryan, Martin/SGML and HTML Explained
1
0856674427
Light, Richard/Presenting XML
1
Figure A.2: XML encoding of Book Order Message
A typical reaction to comparing this file with the displayed example and its EDI-based
EDItEUR equivalent is "Where has all the EDI information gone?", and "Where has all
the material under the table come from?". The answer is that all immutable information
goes into the document type definition (DTD) referenced in the <!DOCTYPE statement that
starts the coding, or in the associated style sheet.
Note: The beta release of MSXSL does not support the xml-stylesheet processing
instruction, and requires the use of another technique to associate the stylesheet with the
document instance.
Figure A.3 shows the contents of the DTD used for this example. The single line
reference to this DTD is sufficient to provide the browser with all the additional
information it needs to process the message.
Note how the definition of each element defined in Figure A.3 contains attributes whose
fixed values contain the prefixes and suffixes of each of the EDIFACT fields that may
need to be generated in response to the messages.
The message format generated from the completed form could be a pure EDIFACT
message of the type shown on Page II-2-2 of the EDItEUR EDI Implementation
Guidelines for Book Trade Distribution.
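The prefix-and-suffix approach can be sketched in Python. This is an illustration under assumptions, not the conversion defined by the EDItEUR guidelines: the element names Order-No and Buyer-EAN and the SEGMENT_PARTS table are hypothetical, the prefixes and suffixes are typical UN/EDIFACT ORDERS segment forms chosen for the example, and the service segments a complete interchange needs are omitted. The data values are those listed in Figure A.2.

```python
import xml.etree.ElementTree as ET

# Illustrative prefixes and suffixes; in the scheme described above they would
# come from fixed attribute values declared in the DTD.
SEGMENT_PARTS = {
    "Order-No": ("BGM+220+", "+9'"),
    "Message-Date": ("DTM+137:", ":102'"),
    "Buyer-EAN": ("NAD+BY+", "::9'"),
}


def to_edifact(root):
    """Wrap the content of each known element in its prefix and suffix."""
    segments = []
    for child in root:
        if child.tag in SEGMENT_PARTS:
            prefix, suffix = SEGMENT_PARTS[child.tag]
            segments.append(prefix + child.text + suffix)
    return segments


order = ET.fromstring(
    "<Book-Order><Order-No>967634</Order-No>"
    "<Message-Date>19990308</Message-Date>"
    "<Buyer-EAN>5412345000176</Buyer-EAN></Book-Order>")
for segment in to_edifact(order):
    print(segment)
```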
Note: The beta release MSXSL add-on to Internet Explorer 4.0 is only capable of
submitting form data in HTTP's Common Gateway Interface (CGI)
message format. Conversion to EDI format would require additional program modules,
which are not shown in this simple example.
%ISOlat1; %ISOnum;
]>
Figure A.3: XML Document Type Definition for Lite-EDI Book Order
The XSL style sheet used to create the displayed version of the form shown in Figure A.1
took the form shown in Figure A.4:
Tick here if a delayed/partial supply of order is acceptable
Tick here if Confirmation of Acceptance of Order
is to be returned by e-mail
Tick here if e-mail Delivery Note is required
to confirm details of delivery
E-mail address:
Please respond in:
English
Press here to send completed form to supplier
Book Order No:
=this.text
Your reference number for order
Message Date:
=this.text
(ISO8601DateCheck(this.text))
Date in CCYYMMDD format
Buyer EAN:
=this.text
Your unique identification number
Supplier EAN:
ancestor("Book-Order", this).getAttribute("Supplier")
Book Supplies Incorporated
ISBN:
=this.text
Order line reference number:
String(Number(ancestor("Order-Line",this).getAttribute("Reference-No")))
Author/Title:
=this.text
Quantity:
=this.
Figure A.4: XSL Processing Rules for Example Lite-EDI Book Order
The style sheet is itself coded in XML, conforming to an unidentified meta-DTD specified
by the XSL protocol. The first element within this document shows how developers can
define functions using the ECMAScript language embedded within XSL. This example
converts ISO 8601 format dates to a form that is easier for users to check. This is a
simple example of the powerful client-side functionality that can be added to XSL stylesheets.
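The kind of conversion such a function performs can be sketched as follows. The sketch is in Python rather than the ECMAScript of the style sheet, and it is an assumption about the behaviour, not a copy of the ISO8601DateCheck function: the output format and the error message are invented for the example.

```python
from datetime import datetime

MONTHS = ["January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]


def iso8601_date_check(text: str) -> str:
    """Turn a CCYYMMDD date into a readable form, or flag it as invalid."""
    try:
        if len(text) != 8 or not (text.isascii() and text.isdigit()):
            raise ValueError(text)
        date = datetime.strptime(text, "%Y%m%d")
    except ValueError:
        return f"Invalid date: {text}"
    return f"{date.day} {MONTHS[date.month - 1]} {date.year}"


print(iso8601_date_check("19990308"))
print(iso8601_date_check("19991308"))  # month 13 does not exist
```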
Note: The comments in the initial function indicate some problems encountered with
using the beta release of the MSXSL software.
The remainder of the style sheet consists of a set of elements that identify a sequence of
actions associated with target elements. The actions create HTML flow objects that
Internet Explorer 4.0 displays.
Note: These HTML elements have to conform to the XML syntax.
This is most evident in the case of the empty line break elements, which are entered as <BR/>.
Other significant features that should be noted from this example include:
1. The use of the contents of an attribute of the Book-Order element as the contents
of the Supplier EAN row of the table.
2. The call to the ISO8601DateCheck function associated with the rules for
displaying the Message-Date element.
3. The use of an attribute associated with the Order-Line element to assign
information to a set of fields in the displayed form.
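The attribute look-ups in features 1 and 3 can be mimicked outside XSL. The Python sketch below is an approximation under assumptions: the ancestor helper imitates the call used in Figure A.4, the attribute values SUPPLIER-EAN and 001 are placeholders, and the ISBN element name is assumed.

```python
from xml.dom import minidom


def ancestor(name, node):
    """Return the nearest enclosing element called `name`, or None."""
    node = node.parentNode
    while node is not None and node.nodeType == node.ELEMENT_NODE:
        if node.tagName == name:
            return node
        node = node.parentNode
    return None


document = minidom.parseString(
    '<Book-Order Supplier="SUPPLIER-EAN">'
    '<Order-Line Reference-No="001"><ISBN>0201403943</ISBN></Order-Line>'
    '</Book-Order>')
isbn = document.getElementsByTagName("ISBN")[0]
supplier = ancestor("Book-Order", isbn).getAttribute("Supplier")
# Converting to a number and back drops leading zeros, like String(Number(...)).
line_no = str(int(ancestor("Order-Line", isbn).getAttribute("Reference-No")))
print(supplier, line_no)
```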
Users of Version 4.0 of Microsoft's Internet Explorer web browser will find a
demonstration of the application of the above simple examples at
http://www.xmledi.net/edi-test.htm
Glossary
DataBots - XML/EDI Data Manipulation Agents ("Bot" is a software term for a
component that acts as an agent).
XML/EDI-R - the combination of XML message syntax and rule based EDI.
Bibliography
Bons, R (1997) Designing Trustworthy Trade Procedures for EC.
To be developed