CobWeb: A Constraint-based XML for the Web¹
Timothy D. McKernan
Bharat Jayaraman
Sangeetha Raghavan
Srivatsava Shanker
Department of Computer Science and Engineering
State University of New York at Buffalo
Buffalo, NY 14260-2000
E-Mail: {tdm4, bharat, sr33, sshanker}@cse.buffalo.edu
Phone: (716) 645-3180 x 111
Fax: (716) 645-3464
Abstract
We present a constraint-based extension of XML for specifying the structure and semantic coherence of websites and their data. This extension is motivated by the fact that many websites, especially organizational websites and corporate intranets, largely contain structured information. Such websites can be regarded as databases. XML can help view, store, manipulate, and transfer semi-structured data that exists in files (often web pages), and it supports a less ad hoc way of handling data than HTML allows. In support of the view that a website is a database, we introduce CobWeb, a constraint-based extension of XML. CobWeb allows developers to express the semantic integrity of a database. Constraints may be used in a document type definition (DTD) to restrict the values of both elements and attributes in the DTD. These constraints can effectively govern the contents of otherwise disparate web pages in a website, thereby ensuring both the structural and the semantic integrity of the site as a whole. We believe that constraint-based XML can be the lingua franca of B2B e-commerce: it is easier and less expensive to use on the World Wide Web than Electronic Data Interchange (EDI), which makes it the best alternative for on-line transactions. We provide unary (or domain) constraints, binary constraints (including various comparison operations), and aggregation constraints (sum, average, etc.). We also define data types not included in the XML 1.0 specification, which makes constraints easier to declare. By taking advantage of XML's modular design, we can create a parser that works in conjunction with, and extends, existing XML parsers. We have built a prototype implementation based on this idea and tested several of the examples presented in this paper. We are presently extending this implementation and plan to apply it to practical problems.
1 Introduction
The motivation for this work stems from the observation that many websites, especially those of businesses
and organizations, contain structured information, such as listings and itemized descriptions of people, places,
and events. Such websites can be regarded as databases, in that the structure of the information can be
described formally. We might consider using a language such as XML in order to represent this information.
XML provides the concept of document type definitions (DTDs) to describe the logical structure of a web
page. (Note that HTML is not a good choice because it is more oriented towards describing the layout, or
presentation, of a document.) Defining the logical structure of a website is important, since it is akin to defining the schema of a database. Not only can we check the validity of the content of a web page, but we can also facilitate searching the website. However, specifying the logical structure of a website is not enough by itself. In many instances, we also need to ensure that the data in the web pages is semantically coherent. For example, suppose a web page contains a listing of sales figures for different geographic regions and also an additional field for the total sales. In this case, we would like to state a constraint that the total sales figure is the sum of the regional sales figures.

¹ This paper may be referenced as Technical Report 2000-06, Department of Computer Science and Engineering, State University of New York at Buffalo, May 2000. Last Revision: June 2001.
As another example, suppose we have a listing of the monthly sales figures for a given region, together with a separate field for the average monthly sales. We would then like a constraint stating that the value in this field is the average of the monthly sales figures. Such semantic conditions are akin to the notion of integrity constraints² in databases. XML has constructs to describe the logical structure of a web page, but one cannot
specify integrity constraints using XML. In order to overcome this limitation, in this paper we present a
constraint-based extension of XML for specifying both logical structure and semantic coherence of a website.
Essentially, constraints may be used in a DTD in order to place restrictions on the values of the elements as
well as attributes in the DTD. We provide unary (or domain) constraints, binary constraints (including various
comparison operations), and aggregation constraints (sum, average, etc.). Although not considered in this
paper, the built-in constraint predicates can be augmented with user-defined predicates, which may then be
used in constraint specifications. A website whose content is specified using CobWeb will typically be
authored using a special editor that is capable of checking constraints (and, in some cases, generating values
of fields based upon the constraints). Thus, constraints are checked when the website is built (build-time)
rather than when the website is viewed or browsed (browse-time). Such a website is therefore correct by
construction. Moreover, a browser need not be equipped with the ability to check constraints, and will not
incur any time delays checking constraints during browsing. In this approach, the DTDs of constraint-based
XML pages are simply translated into standard DTDs.
The remainder of this paper is organized as follows. Section 2 describes our constraint-based extensions. Section 3 describes inheritance. Section 4 describes miscellaneous features, namely include and import, default values for elements, and link traversal. Section 5 presents two illustrative case studies: an on-line brochure example and a product comparison example. Later sections compare CobWeb with other schema languages such as Schematron, DSD, XDR, and SOX, and describe the current status of our implementation of CobWeb and future directions. We assume familiarity with XML document type definitions, including elements, attributes, namespaces, and links. For an introduction to these features, the reader may refer to www.xml.org.
2 Constraint-based Extension of XML
Despite XML’s strengths in handling data, it has several weaknesses that hinder it from describing many
types of structured data. Specifically, XML does not define data types other than character data (#PCDATA),
and it does not support operations on the data it defines. We use constraints to deal with these problems. A
constraint declaration is expressed through a constraint expression.
Constraint_expr ::= [NOT] Constraint (Lop Constraint)*
Lop ::= AND | OR
Constraint ::= Complex_id (Rop Complex_id)*
Rop ::= ge | gt | le | lt | = | !=
Complex_id ::= Simple_id (. Simple_id)* [:attribute | :href(Complex_id)]
Simple_id ::= identifier | identifier '[' Index ']'
Index ::= first | last | pos_integer
Aggregate_constraint ::= Agg_term Rop Agg_term
                       | Constraint Agg_op (Complex_id)
Agg_term ::= Complex_id | Agg_func (Complex_id)
Constraint ::= Complex_id (Rop Agg_op Complex_id)*
Agg_op ::= Agg_func | Agg_pred
Agg_func ::= SUM | AVERAGE | COUNT
Agg_pred ::= ASCENDING | DESCENDING
Quantified_constraint ::= [EVERY | EXISTS] (x : Complex_id) Constraint_expr

² See integrity constraints in XML: http://ftp.sas.com/techsup/download/technote/ts594.html
A constraint is simply a predicate (or its negation) applied to its arguments. In the above syntax, Lop can be AND or OR. The syntax allows constraints to be placed on elements as well as on attributes. A DTD specifies the logical structure of some piece of data. Constraints augment a DTD by specifying semantic conditions on the integrity of the data. A CONSTRAINT declaration is placed in the DTD because the DTD already defines all other aspects of the meaning of the data.
Constraints are often categorized as either unary or binary. A unary constraint, such as a domain constraint, acts upon a single variable. A binary constraint relates two or more variables. For example, the constraint that a variable x must be even is unary, because it acts only on x. A binary constraint involves a predicate with two arguments, such as lt in x lt y. Constrained XML also includes the other common relational operators: gt (greater than), = (equal to), != (not equal to), le (less than or equal to), and ge (greater than or equal to). These operators allow simple comparisons between values. The logical operators are AND, OR, and NOT. AND is strictly redundant, because an element that needs two constraints can simply be given two separate CONSTRAINT declarations. We include it for completeness, and to give programmers a familiar way to express conjunction. In standard XML, a tag's data has only a character type. Constrained XML also provides integer, real, and string types. These are specified in ELEMENT declarations as #INTEGER, #REAL, and #STRING, and in ATTLIST declarations as INTEGER, REAL, and STRING. The inclusion of aggregate types, such as a DATE type, is being considered for future versions of CobWeb.
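To make the typing concrete, the following Python sketch shows one way a constraint checker might convert element text according to the declared CobWeb type before comparing values. It is an illustration only. The functions parse_typed and is_valid are hypothetical, and treating #STRING like #PCDATA is our assumption.

```python
def parse_typed(text: str, cobweb_type: str):
    """Convert element text to a Python value according to its declared CobWeb type."""
    text = text.strip()
    if cobweb_type == "#INTEGER":
        return int(text)            # raises ValueError for "21.5" or "abc"
    if cobweb_type == "#REAL":
        return float(text)          # raises ValueError for "abc"
    if cobweb_type in ("#STRING", "#PCDATA"):
        return text                 # character data is kept as-is (assumption)
    raise ValueError(f"unknown CobWeb type: {cobweb_type}")


def is_valid(text: str, cobweb_type: str) -> bool:
    """True if the text can be read as a value of the declared type."""
    try:
        parse_typed(text, cobweb_type)
    except ValueError:
        return False
    return True


print(is_valid("21", "#INTEGER"), is_valid("21.5", "#INTEGER"))   # True False
```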
Extensions to the Element Definition
In addition to constraints, we extend the element declaration with inheritance, enumerations, and sets. Inheritance is previewed here and explained in detail in Section 3.
1. Inheritance:
<!ELEMENT child :: parent element_body>
2. ENUM: enumeration types, illustrated in Section 2.2.1.
3. Set abstraction: an element body can contain a set as well as a sequence. For example, {a, b, c} is shorthand for
(a, b, c) | (a, c, b) | (b, a, c) | (b, c, a) | (c, a, b) | (c, b, a)
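The set shorthand can be expanded mechanically into a standard DTD choice. Here is a minimal Python sketch, assuming the expansion is exactly the choice over all orderings shown above. The function name expand_set is hypothetical.

```python
from itertools import permutations


def expand_set(members):
    """Expand a CobWeb set such as {a, b, c} into a DTD choice over all orderings."""
    alternatives = ["(" + ", ".join(order) + ")" for order in permutations(members)]
    return " | ".join(alternatives)


print(expand_set(["a", "b", "c"]))
# (a, b, c) | (a, c, b) | (b, a, c) | (b, c, a) | (c, a, b) | (c, b, a)
```

A set of n members expands to n! alternatives. Even for modest n, the set notation is therefore far more compact than the equivalent standard DTD.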
Complex Identifier
An XML document has a tree structure, so CobWeb uses complex identifiers to identify nodes within the tree. A complex identifier such as a.b.c defines a path from the root of the tree to a node. The nodes along that path have the element names a, b, and c, in that order. Examples of complex identifiers:
• a.b
• a.b.c
• a.b[2].c[3]
• a[1].b[2].c[3]
• a:attrA
• a.b:attrB
A subscript selects one of several sibling elements with the same name. In a.b[2].c[3], b[2] is the second b element within a, and c[3] is the third c element within that b. As the grammar above shows, the index may also be first or last. To place a constraint on an attribute, we use the colon notation (see the attribute example in Section 2.1). A constraint can also be placed on an attribute of a sub-element, as in a.b:attrB.
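The following Python sketch parses complex identifiers and resolves them against a document parsed with the standard xml.etree.ElementTree module. It assumes 1-based subscripts (b[2] is the second b) and treats first and last as in the grammar. It does not handle the :href(...) form. The names Step, parse_complex_id, and resolve are hypothetical.

```python
import re
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import List, Optional, Tuple, Union


@dataclass
class Step:
    """One Simple_id: an element name plus an optional index."""
    name: str
    index: Optional[Union[int, str]] = None   # int (1-based), "first" or "last"


_STEP = re.compile(r"^\s*([A-Za-z_][\w-]*)\s*(?:\[\s*(first|last|\d+)\s*\])?\s*$")


def parse_complex_id(text: str) -> Tuple[List[Step], Optional[str]]:
    """Split e.g. 'a[1].b[2].c[3]' or 'a.b:attrB' into steps and an attribute name."""
    path, _, attr = text.partition(":")
    steps = []
    for part in path.split("."):
        match = _STEP.match(part)
        if match is None:
            raise ValueError(f"malformed step: {part!r}")
        name, index = match.group(1), match.group(2)
        steps.append(Step(name, int(index) if index and index.isdigit() else index))
    return steps, (attr.strip() or None)


def resolve(root: ET.Element, steps: List[Step]) -> List[ET.Element]:
    """All elements reached by following the steps from the document root."""
    if root.tag != steps[0].name:
        return []
    nodes = [root]                      # the root is unique, so its index is ignored
    for step in steps[1:]:
        reached = []
        for node in nodes:
            kids = [k for k in node if k.tag == step.name]
            if step.index is None:
                reached.extend(kids)
            elif step.index == "first":
                reached.extend(kids[:1])
            elif step.index == "last":
                reached.extend(kids[-1:])
            elif 1 <= step.index <= len(kids):
                reached.append(kids[step.index - 1])
        nodes = reached
    return nodes


doc = ET.fromstring("<a><b><c>1</c></b><b><c>2</c><c>3</c><c>4</c></b></a>")
steps, attr = parse_complex_id("a.b[2].c[3]")
print([n.text for n in resolve(doc, steps)])   # ['4']
```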
2.1 Simple Examples
Example. A constraint may be placed on a nested element. The relation of the parent element to the nested
element is shown by using a dot notation:
<?xml encoding="US-ASCII"?>
<!ELEMENT adult (age)>
<!ELEMENT age (#INTEGER)>
<!CONSTRAINT (adult.age ge 18)>
This constraint applies only to age elements that are contained within an adult element.
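A minimal checker for this constraint can be sketched in Python with the standard xml.etree.ElementTree module. The operator table and the helper names are assumptions. As described above, documents whose root is not adult are not constrained, so values_at returns no values for them and the check succeeds vacuously.

```python
import operator
import xml.etree.ElementTree as ET

# CobWeb relational operators (Rop) mapped onto Python comparisons.
ROP = {"ge": operator.ge, "gt": operator.gt, "le": operator.le,
       "lt": operator.lt, "=": operator.eq, "!=": operator.ne}


def values_at(root, dotted_path, convert=int):
    """Values of the elements reached by a dotted path such as 'adult.age'."""
    first, *rest = dotted_path.split(".")
    if root.tag != first:
        return []                       # constraint does not apply to this document
    nodes = root.findall("/".join(rest)) if rest else [root]
    return [convert(node.text) for node in nodes]


def check(root, path, rop, constant, convert=int):
    """True if every value reached by path satisfies: value rop constant."""
    return all(ROP[rop](value, constant) for value in values_at(root, path, convert))


print(check(ET.fromstring("<adult><age>21</age></adult>"), "adult.age", "ge", 18))  # True
```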
Example. We could specify the same constraint using an attribute. Attributes are referenced in a similar manner, using a colon. We reformulate the previous example using an attribute:
<?xml encoding="US-ASCII"?>
<!ELEMENT adult EMPTY>
<!ATTLIST adult age REAL #REQUIRED>
<!CONSTRAINT (adult:age ge 18)>
Example. Constraints may be combined to form more powerful declarations:
<?xml encoding="US-ASCII"?>
<!ELEMENT gender (#PCDATA)>
<!CONSTRAINT (gender = "male" OR gender = "female")>
<!ELEMENT age (#REAL)>
<!ELEMENT patient (age, gender)>
<!ELEMENT male_patient (patient)>
<!CONSTRAINT (male_patient.patient.age ge 18 AND male_patient.patient.gender = "male")>
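Here the disjunctive operator (OR) defines the permitted values of the gender element, and the conjunctive operator (AND) restricts the patient element. Together they describe a web page that displays information based on the patient's age and gender. A conforming instance document is: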
<?xml version="1.0"?>
<!DOCTYPE male_patient SYSTEM "patient.dtd">
<male_patient>
<patient>
<age>21</age>
<gender>male</gender>
</patient>
</male_patient>
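Both constraints of this example can be checked on the instance document above with a short Python sketch. We assume that the path in the second constraint runs through the patient element. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET


def check_gender(root):
    """(gender = "male" OR gender = "female") for every gender element."""
    return all(g.text in ("male", "female") for g in root.iter("gender"))


def check_male_patient(root):
    """(age ge 18 AND gender = "male") for each patient of a male_patient document."""
    if root.tag != "male_patient":
        return True                     # the constraint only concerns male_patient
    return all(float(p.findtext("age")) >= 18 and p.findtext("gender") == "male"
               for p in root.findall("patient"))


doc = ET.fromstring("<male_patient><patient><age>21</age>"
                    "<gender>male</gender></patient></male_patient>")
print(check_gender(doc) and check_male_patient(doc))   # True
```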
Example. We now illustrate how links can be used to access data in external resources, and how to specify the type of data that these resources must contain.
<?xml encoding="US-ASCII"?>
<!ELEMENT patient EMPTY>
<!ATTLIST patient
    xmlns:xlink CDATA #FIXED "http://www.w3.org/XML/XLink/0.9"
    xlink:type (locator) #FIXED "locator"
    xlink:href CDATA #REQUIRED>
<!ELEMENT adult (age)>
<!ELEMENT age (#INTEGER)>
<!CONSTRAINT (patient:href() = adult)>
<!CONSTRAINT (patient:href(adult.age) ge 18)>
The first constraint declares that the document found by following the patient’s href attribute must
contain an adult element as its document root. The constraint on the age remains the same as in previous
examples, but now it is accessed by following the patient’s href attribute to the adult page.
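Link traversal can be sketched in Python as follows. To keep the example self-contained, a dictionary stands in for the web; a real checker would fetch each URL instead. The instance is assumed to declare the xlink namespace explicitly. The page names and functions are hypothetical.

```python
import xml.etree.ElementTree as ET

XLINK = "http://www.w3.org/XML/XLink/0.9"   # namespace used in the DTD above


def follow_href(element, pages):
    """Parse the document named by the element's xlink:href attribute.
    'pages' is a hypothetical stand-in for fetching the URL over the web."""
    url = element.get(f"{{{XLINK}}}href")
    return ET.fromstring(pages[url])


def check_patient(patient, pages):
    target = follow_href(patient, pages)
    if target.tag != "adult":                       # (patient:href() = adult)
        return False
    return all(int(age.text) >= 18                  # (patient:href(adult.age) ge 18)
               for age in target.findall("age"))


pages = {"p1.xml": "<adult><age>34</age></adult>"}
patient = ET.fromstring(f'<patient xmlns:xlink="{XLINK}" xlink:href="p1.xml"/>')
print(check_patient(patient, pages))   # True
```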
2.2 Aggregate Operations
<!CONSTRAINT (Complex_id Rop Agg_op (Complex_id))>
An aggregate operation is a function that maps a set of values to a single value, e.g., summation of the members
of the set, the minimum or maximum value in the set, the average value, etc. Since many webpages contain
collections of entities, we often wish to express a constraint in terms of the aggregate value of some collection.
In the example below, we show a constraint that makes use of the average value of a set of student grades.
<?xml encoding="US-ASCII"?>
<!ELEMENT class (course, average, student+)>
<!ELEMENT course (#PCDATA)>
<!ELEMENT student (name, grade)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT grade (#REAL)>
<!ELEMENT average (#REAL)>
<!CONSTRAINT (average = AVERAGE(class.student.grade))>
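Here is a sketch of how the AVERAGE constraint could be evaluated with Python's ElementTree. Because #REAL values are floating point, the comparison uses math.isclose with a tolerance. The tolerance, like the function name and the sample data, is our assumption.

```python
import math
import xml.etree.ElementTree as ET


def check_average(cls, rel_tol=1e-9):
    """(average = AVERAGE(class.student.grade)) for one <class> element."""
    grades = [float(g.text) for g in cls.findall("student/grade")]
    if not grades:
        return False                     # student+ requires at least one student
    stated = float(cls.findtext("average"))
    return math.isclose(stated, sum(grades) / len(grades), rel_tol=rel_tol)


cls = ET.fromstring(
    "<class><course>CSE 305</course><average>3.5</average>"
    "<student><name>Ann</name><grade>3.0</grade></student>"
    "<student><name>Bob</name><grade>4.0</grade></student></class>")
print(check_average(cls))   # True
```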
The introduction of data typing allows more complex data structures to be created. In XML, as in HTML, elements may be nested within elements. This is similar to objects containing other objects in an object-oriented language, and some of the implications are the same. One element may contain a set of elements describing a student (e.g., name, age, id number, grades). Several student elements may be combined with a similar teacher element to create a class element, and so on.
Adding constraints to these structures allows more precise combinations to be created. We make constraints available to the programmer through several predicate functions. Each function returns a Boolean value indicating whether the specified constraint has been met. We chose predicate functions over a more declarative syntax because most web authors are already familiar with them. A simple example is the COUNT(c_element) function. It compares the value of the constrained element with the number of occurrences of c_element, and returns true if they are equal and false otherwise. In a DTD, it would look something like this:
<?xml encoding="US-ASCII"?>
<!ELEMENT packing_slip (address, item+, total_items)>
<!ELEMENT address (#PCDATA)>
<!ELEMENT item (#INTEGER)>
<!ELEMENT total_items (#INTEGER)>
<!CONSTRAINT (total_items = COUNT(item))>
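The COUNT constraint reduces to comparing a declared value with a number of occurrences. Here is a minimal Python sketch; check_count and the sample slip are hypothetical.

```python
import xml.etree.ElementTree as ET


def check_count(slip):
    """(total_items = COUNT(item)) for one <packing_slip> element."""
    return int(slip.findtext("total_items")) == len(slip.findall("item"))


slip = ET.fromstring("<packing_slip><address>Buffalo, NY</address>"
                     "<item>1</item><item>7</item><total_items>2</total_items>"
                     "</packing_slip>")
print(check_count(slip))   # True
```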
The value of total_items must equal the number of item tags in the document. A packing slip is a typical use. The slip lists the total number of items in a package and itemizes each one. Each item is listed as an item tag, perhaps with the item's name in a tag nested inside it. The total_items tag must be an accurate count of these items so that the packers can check that they have filled the package correctly. Unlike COUNT(), some of the functions have more than one form. The SUM() function can take one or more parameters: SUM(element) and SUM(element_a, element_b, ...). SUM() computes the sum of the data values of every element passed to it, over every instance of that element in the document. For example, a computer parts distributor might have a document containing a list of products ordered by each customer. A table at the bottom of the page lists how many of each product are needed. SUM() would check how many speakers, modems, etc., have been ordered in total by adding up the values in the quantity tag of each product.
<?xml encoding="US-ASCII"?>
<!ELEMENT shipping_orders (customer*, modem_total, monitor_total,
                           keyboard_total, drive_total)>
<!ELEMENT customer (name, address, items*)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT address (#PCDATA)>
<!ELEMENT items ((modems | monitors | keyboards | drives)*)>
<!ELEMENT modems (quantity)>
<!ELEMENT monitors (quantity)>
<!ELEMENT keyboards (quantity)>
<!ELEMENT drives (quantity)>
<!ELEMENT quantity (#INTEGER)>
<!ELEMENT modem_total (#INTEGER)>
<!CONSTRAINT (modem_total = SUM(customer.items.modems.quantity))>
<!ELEMENT monitor_total (#INTEGER)>
<!CONSTRAINT (monitor_total = SUM(customer.items.monitors.quantity))>
<!ELEMENT keyboard_total (#INTEGER)>
<!CONSTRAINT (keyboard_total = SUM(customer.items.keyboards.quantity))>
<!ELEMENT drive_total (#INTEGER)>
<!CONSTRAINT (drive_total = SUM(customer.items.drives.quantity))>
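Each total in the shipping_orders DTD can be checked by summing the matching quantity elements. This Python sketch assumes the constraint paths are relative to the shipping_orders root. check_totals and PRODUCTS are hypothetical names.

```python
import xml.etree.ElementTree as ET

# Each *_total element and the product element whose quantities it sums.
PRODUCTS = {"modem_total": "modems", "monitor_total": "monitors",
            "keyboard_total": "keyboards", "drive_total": "drives"}


def check_totals(orders):
    """Return the *_total elements that differ from SUM(customer.items.<product>.quantity)."""
    failures = []
    for total_tag, product in PRODUCTS.items():
        quantities = orders.findall(f"customer/items/{product}/quantity")
        if int(orders.findtext(total_tag)) != sum(int(q.text) for q in quantities):
            failures.append(total_tag)
    return failures
```

Returning the failing totals, rather than a single Boolean, lets an authoring tool point at the specific field to correct.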
2.2.1 Ordering
<!CONSTRAINT (ASCENDING(Complex_id))>
<!CONSTRAINT (DESCENDING(Complex_id))>
Ordering is a form of aggregation constraint. The ASCENDING() and DESCENDING() constraints require tags to appear in ascending or descending (alphabetical or numerical) order according to their data. This immediately allows lists of names, lists of products, chronologies, etc., to be specified in a DTD:
<!ELEMENT roster (member*)>
<!CONSTRAINT (ASCENDING(roster.member.name))>
<!ELEMENT member (name)>
<!ELEMENT name (#PCDATA)>
Any XML document that uses this constraint must satisfy it as follows: the member elements of the roster element must be ordered alphabetically by the name tags within them. A roster of members usually has more structure than just an alphabetical order. Consider a roster of faculty members. A department's web page will often first group faculty according to title (e.g., full professor, associate professor, lecturer), and then alphabetize within each rank. Ordering by title is an interesting problem, because titles are not meant to be ordered alphabetically. As far as a standard XML application is concerned, there is no way to order these titles.
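Checking the ASCENDING constraint amounts to comparing adjacent values. The Python sketch below uses plain string comparison, which is case-sensitive and orders by code point. Whether CobWeb's alphabetical order ignores case is not specified here, so treat this choice as an assumption. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET


def is_ascending(values):
    """True if each value is <= the next one."""
    return all(a <= b for a, b in zip(values, values[1:]))


def check_roster(roster):
    """(ASCENDING(roster.member.name)) using plain string comparison."""
    return is_ascending([m.findtext("name") for m in roster.findall("member")])
```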
The enumeration data type can be very useful for aggregation and for performing comparison operations when the data are strings:
<!ELEMENT week (days+)>
<!ENUM days MON|TUE|WED|THU|FRI|SAT|SUN>
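An enumeration supplies an order for strings that is not alphabetical. This Python sketch assumes that the ENUM's declaration order is its comparison order. The Days type and enum_ascending are hypothetical.

```python
from enum import IntEnum

# Declaration order MON|TUE|...|SUN becomes the comparison order (MON = 1).
Days = IntEnum("Days", "MON TUE WED THU FRI SAT SUN")


def enum_ascending(names):
    """ASCENDING over day names, compared by their position in the ENUM."""
    ranks = [Days[name] for name in names]
    return all(a <= b for a, b in zip(ranks, ranks[1:]))


print(enum_ascending(["WED", "THU"]), "WED" <= "THU")   # True False
```

The sequence WED, THU is ascending as days of the week, even though "WED" sorts after "THU" alphabetically.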
Given only the ASCENDING() and DESCENDING() constraints, a separate tag must be created within each faculty member's tag and given a numeric rank, so that the new rank tags may be ordered:
<!ELEMENT roster (member*)>
<!CONSTRAINT (ASCENDING(roster.member.rank))>
<!CONSTRAINT (ASCENDING(roster.member.name))>
<!ELEMENT member (name, rank)>
<!ATTLIST member title CDATA #REQUIRED>
<!ELEMENT name (#PCDATA)>
<!ELEMENT rank (#INTEGER)>
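The two ASCENDING constraints are meant to group members by rank and then alphabetize within each rank. One way to check that reading, sketched in Python, is to compare (rank, name) pairs lexicographically. Interpreting the pair of constraints this way is our assumption, and the function name and sample roster are hypothetical.

```python
import xml.etree.ElementTree as ET


def check_ranked_roster(roster):
    """ASCENDING by rank, then by name within each rank (lexicographic reading)."""
    keys = [(int(m.findtext("rank")), m.findtext("name"))
            for m in roster.findall("member")]
    return all(a <= b for a, b in zip(keys, keys[1:]))


roster = ET.fromstring(
    "<roster>"
    "<member title='Professor'><name>Young</name><rank>1</rank></member>"
    "<member title='Associate Professor'><name>Adams</name><rank>2</rank></member>"
    "<member title='Associate Professor'><name>Baker</name><rank>2</rank></member>"
    "</roster>")
print(check_ranked_roster(roster))   # True
```

This roster is not alphabetical overall, because Young precedes Adams. A literal reading of the name constraint over the whole roster would therefore reject it, which is why we read the two constraints together.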
2.2.2 Quantified Constraints
<!CONSTRAINT (EVERY (x : Complex_id) constraint_expr)>
<!CONSTRAINT (EXISTS (x : Complex_id) constraint_expr)>
We also provide the EVERY and EXISTS constraints for stating conditions that must be satisfied by all or by some elements, respectively. For example, the condition that every faculty member on a hiring committee must have a rank of 1 can be stated as follows:
<!ELEMENT hiring_committee (member*)>
<!CONSTRAINT (EVERY (x : hiring_committee.member.rank) x = 1)>
<!ELEMENT member (name, rank)>
<!ATTLIST member title CDATA #REQUIRED>
<!ELEMENT name (#PCDATA)>
<!ELEMENT rank (#INTEGER)>
Similarly, the condition that at least one member of the graduate affairs committee must be a student (represented here by rank 3) can be stated as:
<!ELEMENT graduate_affairs (member*)>
<!CONSTRAINT (EXISTS (x : graduate_affairs.member.rank) x = 3)>
<!ELEMENT member (name, rank)>
<!ATTLIST member title CDATA #REQUIRED>
<!ELEMENT name (#PCDATA)>
<!ELEMENT rank (#INTEGER)>
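EVERY and EXISTS correspond directly to Python's all() and any(). This sketch assumes the usual quantifier semantics, so EVERY holds vacuously for an empty committee and EXISTS does not. Whether CobWeb treats empty collections this way is an assumption, and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET


def ranks(committee):
    """Values reached by <committee>.member.rank."""
    return [int(r.text) for r in committee.findall("member/rank")]


def every_rank_is(committee, value):
    """EVERY (x : committee.member.rank) x = value"""
    return all(x == value for x in ranks(committee))


def exists_rank(committee, value):
    """EXISTS (x : committee.member.rank) x = value"""
    return any(x == value for x in ranks(committee))
```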
Referential Integrity
Quantified constraints can also express referential integrity. For example, every instructor who teaches a class should be a faculty member of some department. This gives the following foreign key constraint:
(∀x: instructor) (∃y: faculty_of_some_department) x = y
<!CONSTRAINT (EVERY (x : instructor) EXISTS (y : department) x = y)>
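A referential integrity check like this one can be sketched by collecting the faculty members into a set and reporting instructors that are not in it. The element names faculty and instructor, and the document layout used to exercise it, are hypothetical.

```python
import xml.etree.ElementTree as ET


def dangling_instructors(doc):
    """Instructors that match no faculty entry in any department (foreign key violations)."""
    faculty = {f.text.strip() for f in doc.iter("faculty")}
    return [i.text.strip() for i in doc.iter("instructor") if i.text.strip() not in faculty]
```

Using a set keeps the check linear in the number of instructors and faculty members, rather than comparing every pair.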
3 Inheritance
<!ELEMENT child :: parent element_body>
The ‘double colon’ ( :: ) notation is used for inheritance.
Inheritance is essential in any language that follows object-oriented technology. Inheritance is achieved by
extending the base type. It can be categorized as
Consider the base class student and derived classes ‘teaching assistant’ and ‘research assistant’.
<?xml encoding="US-ASCII"?>
<!ELEMENT student (student-id, name, transcript)>
<!ELEMENT transcript (student-id, course-list)>
<!ELEMENT course-list (course-id, semester, grade)+>
<!ELEMENT student-id (#PCDATA)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT grade (#PCDATA)>
<!ELEMENT semester (#PCDATA)>
<!ELEMENT course-id (#PCDATA)>
<!CONSTRAINT (student.transcript.course-list.grade != "F")>
A student can be a Teaching Assistant or a Research Assistant. Thus the ‘student’ superclass has two
subclasses ‘TA’ and ‘RA’.
<!ELEMENT TA :: student (work, stipend)>
<!ATTLIST TA
    duties STRING "grade students"
    stipend REAL #REQUIRED>
<!CONSTRAINT (TA.transcript.course-list.grade ge "B+")>
<!ELEMENT RA :: student (work, stipend)>
<!ATTLIST RA
    duties STRING "grade students"
    stipend REAL #REQUIRED>
<!CONSTRAINT (RA.transcript.course-list.grade ge "B")>
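Two details of this example can be made concrete in Python. First, we assume that child :: parent (extra) yields the parent's content model followed by the extra elements. Second, comparisons such as ge "B+" need a grade order rather than string order, since "A" >= "B+" is false as strings. The grade scale below is an assumption, not part of the paper, and all names are hypothetical.

```python
# Hypothetical content models: element name -> tuple of child element names.
CONTENT = {"student": ("student-id", "name", "transcript")}

# Assumed grade scale, lowest to highest (not taken from the paper).
GRADES = ["F", "D", "C-", "C", "C+", "B-", "B", "B+", "A-", "A"]


def inherit(parent, extra):
    """Content model of 'child :: parent (extra)': parent's children, then extra."""
    return CONTENT[parent] + tuple(extra)


def grade_ge(grade, threshold):
    """Compare grades by their position on the scale, not as strings."""
    return GRADES.index(grade) >= GRADES.index(threshold)


def check_ta_grades(grades):
    """Inherited constraint (grade != "F") plus the TA constraint (grade ge "B+")."""
    return all(g != "F" and grade_ge(g, "B+") for g in grades)


print(inherit("student", ["work", "stipend"]))
# ('student-id', 'name', 'transcript', 'work', 'stipend')
```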
Series-Parallel Circuits
<?xml encoding="US-ASCII"?>
<!-- The circuit example -->
<!ELEMENT component EMPTY>
<!ATTLIST component
    voltage REAL #REQUIRED
    current REAL #REQUIRED
    resistance REAL #REQUIRED>
<!CONSTRAINT (voltage = current * resistance)>
<!ELEMENT series :: component (ser_comps)>
<!ELEMENT ser_comps (component+)>
<!CONSTRAINT (voltage = SUM(ser_comps.component:voltage))>
<!CONSTRAINT (EVERY (x : ser_comps.component:current) x = current)>
<!CONSTRAINT (resistance = SUM(ser_comps.component:resistance))>
<!ELEMENT parallel :: component (par_comps)>
<!ELEMENT par_comps (component+)>
<!CONSTRAINT (EVERY (x : par_comps.component:voltage) x = voltage)>
<!CONSTRAINT (current = SUM(par_comps.component:current))>
<!CONSTRAINT (1/resistance = SUM(1/(par_comps.component:resistance)))>
<!ELEMENT battery EMPTY>
<!ATTLIST battery voltage REAL #REQUIRED>
<!ELEMENT connect (battery, component)>
<!CONSTRAINT (battery:voltage = component:voltage)>
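The series and parallel constraints determine a circuit's equivalent resistance and how voltage is shared among its parts. This Python sketch models components as dataclasses rather than XML and assumes ideal resistors. All names and values are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Resistor:
    resistance: float            # a leaf component (ideal resistor)


@dataclass
class Series:
    parts: List["Component"]     # ser_comps


@dataclass
class Parallel:
    parts: List["Component"]     # par_comps


Component = Union[Resistor, Series, Parallel]


def resistance(c: Component) -> float:
    """Equivalent resistance implied by the series and parallel constraints."""
    if isinstance(c, Resistor):
        return c.resistance
    if isinstance(c, Series):
        return sum(resistance(p) for p in c.parts)            # R = SUM(R_i)
    return 1.0 / sum(1.0 / resistance(p) for p in c.parts)    # 1/R = SUM(1/R_i)


def leaf_voltages(c: Component, voltage: float) -> List[float]:
    """Voltage across each resistor when 'voltage' is applied to c (V = I * R)."""
    if isinstance(c, Resistor):
        return [voltage]
    if isinstance(c, Series):
        current = voltage / resistance(c)                     # same current in every part
        return [v for p in c.parts for v in leaf_voltages(p, current * resistance(p))]
    return [v for p in c.parts for v in leaf_voltages(p, voltage)]  # same voltage in every part


# A battery connected to 2 ohms in series with (6 ohms in parallel with 3 ohms).
circuit = Series([Resistor(2.0), Parallel([Resistor(6.0), Resistor(3.0)])])
print(resistance(circuit), leaf_voltages(circuit, 12.0))
```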
4 Miscellaneous features
Include & Import
The include statement pulls externally defined schema fragments that have the same target namespace as the current schema into the current schema. Keeping schema definitions modular in this way improves readability and maintenance.
Include DTD location='Uniform Resource Identifier'
Once this statement is given, all definitions at the URI are automatically included in the working schema. The import statement works the same way, with one subtle difference: it pulls in externally defined schema fragments whose target namespace differs from that of the current schema.
Default value for element
While DTDs offer default values for attributes, we extend this facility by allowing default values for elements as well.
<!ELEMENT address (city, state)>
<!ELEMENT city (#PCDATA) default='Buffalo'>
<!ELEMENT state (#PCDATA) default='NY'>
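One possible reading of element defaults is that an element which is present but empty takes its declared default. This Python sketch implements that reading with ElementTree. Both the semantics and the names are assumptions.

```python
import xml.etree.ElementTree as ET

# Element defaults from the address declarations above: element name -> default text.
ELEMENT_DEFAULTS = {"city": "Buffalo", "state": "NY"}


def apply_defaults(root):
    """Fill in empty elements that have a declared default (modifies root in place)."""
    for elem in root.iter():
        is_empty = not (elem.text or "").strip() and len(elem) == 0
        if elem.tag in ELEMENT_DEFAULTS and is_empty:
            elem.text = ELEMENT_DEFAULTS[elem.tag]
    return root


addr = apply_defaults(ET.fromstring("<address><city/><state>PA</state></address>"))
print(ET.tostring(addr, encoding="unicode"))
```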
Link Traversal
Binary constraints also offer a way to check the integrity of a whole website. The arguments to any constraint may cross over to remote files without the need for a separate syntax. A teacher may make a page for a class that includes the class average. This average may be checked by accessing each student's individual page and reading the student's grade. A new version of the class average example from Section 2.2 looks like this:
<?xml encoding="US-ASCII"?>
<!ELEMENT class (course, average, student+)>
<!CONSTRAINT (class.student:href() = student_page)>
<!ELEMENT course (#PCDATA)>
<!ELEMENT average (#REAL)>
<!CONSTRAINT (average = AVERAGE(class.student:href(student_page.grade)))>
<!ELEMENT student EMPTY>
<!ATTLIST student
    xmlns:xlink CDATA #FIXED "http://www.w3.org/XML/XLink/0.9"
    xlink:type (locator) #FIXED "locator"
    xlink:href CDATA #REQUIRED>
<!ELEMENT student_page (name, grade)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT grade (#REAL)>
A CobWeb-aware application knows to traverse each link. It looks for a student_page tag (the document root) and then for the grade tag, which contains the value we want. The constraint-checking application must know which tags a student's page contains. The DTD for a student page must therefore also be declared in the class page, or its elements may simply be included in the same DTD.
Combining link traversal with ordering allows even more complicated structures to be defined. Consider a school's online brochure, which consists of a main page containing a table of contents that links to each section of the brochure. The order of the sections is important. In addition, each section has links to the previous and following sections, as well as back to the main page. Constraints may be used to ensure that the previous, next, and main links are correct for each page. This model allows the structure of multiple web pages to be checked. Brochures, tutorials, and well-structured literature (such as plays) could all be checked in this way. The online brochure example is explained in detail in Section 5.1.
5 Case studies
5.1 The Online Brochure Example
Using the concepts described in the earlier sections, we now demonstrate how CobWeb can be used as a
solution to a common problem in websites: verifying data content and links across multiple, interconnected
web pages. Many websites contain online brochures or other similarly structured web pages, such as tutorials
and books. In these web pages, a table of contents is defined which links together several ordered sections.
[Figure: pages from the undergraduate brochure: the Table of Contents page, Section 1 ("What is Computer Science") with Home and Next links, and Section 2 ("General Information about undergraduate programs").]
The table of contents from the University at Buffalo’s Computer Science and Engineering Department
undergraduate brochure is given above as an example. The sections refer to each other: Section 1 contains
links to the table of contents and to Section 2. Section 2 contains links to the table of contents, Section 1, and
Section 3, and so on. The links within sections create two ordered lists, one linking Section 1 through Section
18, and the other linking the sections in the reverse order. Larger and more complex linking schemes are
possible. A group of interconnected tutorials, such as the one Sun Microsystems has for Java technology
(http://web2.java.sun.com/docs/books/tutorial/), could have its structure verified by CobWeb. Checking the
accuracy of these links is necessary, and yet tedious and impractical for large structures. While this problem
cannot be solved using standard XML, CobWeb offers a concise solution. The DTD below defines both the
table of contents and the sections.
<?xml encoding="US-ASCII"?>
<!ELEMENT
toc
(toc_section+, toc_loc)>
<!ELEMENT
toc_section
(toc_description)>
<!ATTLIST
toc_section
xmlns:xlink
CDATA
#FIXED "http://www.w3.org/XML/XLink/0.9"
xlink:type (locator) #FIXED "locator"
xlink:href CDATA #REQUIRED
xlink:role CDATA #IMPLIED
>
<!ELEMENT toc_description (toc_number, toc_title)>
<!ELEMENT toc_number (#INTEGER)>
<!ELEMENT toc_title (#PCDATA)>
<!ELEMENT toc_loc EMPTY>
<!ATTLIST toc_loc
    xmlns:xlink CDATA #FIXED "http://www.w3.org/XML/XLink/0.9"
    xlink:type (locator) #FIXED "locator"
    xlink:href CDATA #REQUIRED
    xlink:role CDATA #FIXED "toc_loc"
    xlink:title CDATA #FIXED "Table of Contents"
>
<!CONSTRAINT (toc_section:href(section.toc_loc:href) = /toc_loc:href)>
<!CONSTRAINT (toc_section.toc_description.toc_number =
    toc_section:href(section.toc_description.toc_number))
>
<!CONSTRAINT (ASCENDING(toc_section.toc_description.toc_number))>
<!ELEMENT section (toc_description, section_data, toc_loc, prev_loc?, next_loc?)
>
<!ELEMENT section_data (#PCDATA)>
<!ELEMENT prev_loc EMPTY>
<!ATTLIST prev_loc
xmlns:xlink CDATA
#FIXED "http://www.w3.org/XML/XLink/0.9"
xlink:type (locator) #FIXED "locator"
xlink:href CDATA #REQUIRED
xlink:role CDATA #FIXED "prev_loc"
xlink:title CDATA #REQUIRED
>
<!ELEMENT next_loc EMPTY>
<!ATTLIST next_loc
xmlns:xlink CDATA
#FIXED "http://www.w3.org/XML/XLink/0.9"
xlink:type (locator) #FIXED "locator"
xlink:href CDATA #REQUIRED
xlink:role CDATA #FIXED "next_loc"
xlink:title CDATA #REQUIRED
>
<!CONSTRAINT ((section.prev_loc:href(section.toc_description.toc_number) + 1)
= section.toc_description.toc_number)
>
<!CONSTRAINT ((section.next_loc:href(section.toc_description.toc_number) - 1)
= section.toc_description.toc_number)>
The reader may refer to an XML instance of the above DTD in Appendix II.
A toc (table of contents) document is composed of multiple toc_section elements, as well as a pointer to itself (toc_loc). Each toc_section contains a link to a corresponding web page with the expected text describing a particular facet of the undergraduate program. Each toc_section also contains a toc_description, which holds a number and a section title.
A section element contains a toc_description whose data must match the corresponding toc_description in the table of contents page. section_data is the text of the section. toc_loc, prev_loc, and next_loc are the links to the table of contents page, the previous section, and the next section, respectively.
The correct ordering of the sections is checked using multiple constraints. For each toc_section, the corresponding section must point back to the table of contents. Each toc_section's toc_number must also equal the toc_number of the section it links to, and the toc_number values in the table of contents must be ascending. Within each section, the prev_loc and next_loc links are checked using the toc_number values of the sections they point to. This combination of constraints ensures that the proper structure of the brochure is maintained. Only five constraints are added, which comprise approximately 25% of the code. Yet their effect is powerful: without them, a separate, special-purpose application would be needed to check this structure.
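The brochure constraints can be mirrored in a small Python sketch that works on already-extracted data rather than on XML pages. Each Section record holds the toc_number and link targets of one page. The record type, the function name, and the error messages are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Section:
    toc_number: int                 # section.toc_description.toc_number
    toc_loc: str                    # section.toc_loc:href
    prev_loc: Optional[str] = None  # section.prev_loc:href
    next_loc: Optional[str] = None  # section.next_loc:href


def check_brochure(toc_url: str, entries: List[Tuple[int, str]],
                   pages: Dict[str, Section]) -> List[str]:
    """entries: (toc_number, href) for each toc_section, in document order."""
    errors = []
    numbers = [number for number, _ in entries]
    if numbers != sorted(numbers):                      # ASCENDING(...toc_number)
        errors.append("toc numbers are not ascending")
    for number, href in entries:
        sec = pages[href]
        if sec.toc_loc != toc_url:                      # section links back to the toc
            errors.append(f"{href}: toc_loc does not point to the table of contents")
        if sec.toc_number != number:                    # numbers agree with the toc
            errors.append(f"{href}: toc_number {sec.toc_number} != {number}")
        if sec.prev_loc is not None and pages[sec.prev_loc].toc_number + 1 != sec.toc_number:
            errors.append(f"{href}: prev_loc is wrong")
        if sec.next_loc is not None and pages[sec.next_loc].toc_number - 1 != sec.toc_number:
            errors.append(f"{href}: next_loc is wrong")
    return errors
```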
5.2 Product Comparison Example
In this example, data from other websites is collected, analyzed, and displayed for a user to browse. Unlike
the online brochure, in which constraints are used to maintain the structure of the web site, we now use
constraints as a way of collecting and modifying data. An application searches popular e-commerce sites (in this case, Barnes & Noble and Amazon.com) for product prices and shipping costs. The results are mapped into an XML page, which adds up the total product and shipping costs for each site, displays each site's relevant information, and then displays the site with the lowest cost. This example makes use of two systems. The first is the information-gathering application that collects the product information from each site. This application may or may not receive information in the correct XML format; it transforms the data into a syntactically correct XML tree (in memory) and presents it to the XML parser. The parser is the second system, where constraint checking occurs. Through constraint checking, the parser adds the product and shipping costs and determines the lowest price. The file comparator.dtd specifies how site and product information must be presented in order for the parser to compare sites correctly.
The comparator element contains a list of sites as well as a separate best_buy element, which holds the site
with the lowest total price. A site has a name ("Amazon.com"), a list of products, shipping costs, a total price
that combines the product and shipping costs, and a link to the store's website. The products element contains
any number of books, CDs, and DVDs, each with a title, a price, either an ISBN (for books) or a catalog number
(for CDs and DVDs), and an optional shipping cost.
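The information-gathering step maps whatever each store returns onto the structure just described. A minimal Python sketch using the standard library's xml.etree.ElementTree is shown below. The input dictionary layout and the function name build_site are hypothetical, only books are handled, and the xlink namespace declaration is assumed to come from the DTD's #FIXED default rather than from the tree itself. total_price is deliberately left empty, because computing or checking it is the parser's job.

```python
import xml.etree.ElementTree as ET

def build_site(scraped: dict) -> ET.Element:
    """Build an in-memory <site> element from one store's scraped data.

    The keys of `scraped` (url, name, books, shipping) are a hypothetical
    layout chosen for this sketch; element names follow comparator.dtd.
    """
    site = ET.Element("site", {"xlink:href": scraped["url"]})
    ET.SubElement(site, "name").text = scraped["name"]
    products = ET.SubElement(site, "products")
    for book in scraped["books"]:
        elem = ET.SubElement(products, "book")
        ET.SubElement(elem, "title").text = book["title"]
        ET.SubElement(elem, "ISBN").text = book["isbn"]
        ET.SubElement(elem, "price").text = book["price"]
        ET.SubElement(elem, "shipping").text = book["shipping"]
    ET.SubElement(site, "shipping").text = scraped["shipping"]
    # Left empty: the constraint-checking parser fills in or checks this value.
    ET.SubElement(site, "total_price")
    return site

amazon = build_site({
    "url": "http://www.amazon.com/",
    "name": "Amazon.com",
    "books": [{"title": "Fountainhead", "isbn": "0451191153",
               "price": "7.19", "shipping": "0.99"}],
    "shipping": "3.00",
})
print(ET.tostring(amazon, encoding="unicode"))
```

Keeping prices as the strings the store returned avoids any float formatting before the parser sees them.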
Constraints are added to each product: the title, catalog number or ISBN, and price of each product are all
derived from the store's website. The total_price element then uses this product information to add up the
prices and the shipping costs. Finally, a constraint on the best_buy element requires that
best_buy.site.total_price equal the minimum of the total prices of all sites. This basic comparison mechanism
can easily be extended to more complex analyses. For example, information about when a product will ship
("shipped by" or "in stock") could be collected and used to find the best choice when that choice depends on
shipping dates.
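One way a shipping-date criterion could be combined with the minimum-price rule is sketched below in Python: offers that cannot ship within a deadline are filtered out before the minimum total price is taken. The Offer type, its ships_in_days field, and the day counts in the usage lines are hypothetical; only the two totals come from the Appendix II instance.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Optional

@dataclass
class Offer:
    site: str
    total_price: Decimal  # products + per-item shipping + site shipping
    ships_in_days: int    # hypothetical: parsed from "in stock"/"shipped by" text

def best_choice(offers: list[Offer], deadline_days: Optional[int] = None) -> Optional[Offer]:
    """Cheapest offer, optionally restricted to offers that ship within deadline_days."""
    eligible = [o for o in offers
                if deadline_days is None or o.ships_in_days <= deadline_days]
    if not eligible:
        return None
    # min() returns the first of several equally cheap offers.
    return min(eligible, key=lambda o: o.total_price)

# Illustrative day counts, not collected data.
offers = [
    Offer("Amazon.com", Decimal("19.16"), ships_in_days=5),
    Offer("Barnes & Noble", Decimal("39.05"), ships_in_days=1),
]
print(best_choice(offers).site)                   # Amazon.com
print(best_choice(offers, deadline_days=2).site)  # Barnes & Noble
```

With no deadline this reduces to the plain MIN rule; a deadline simply narrows the set of sites the minimum is taken over.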
The DTD for the comparator looks like this:
<?xml encoding="US-ASCII"?>
<!ELEMENT comparator (site*, best_buy)>
<!ELEMENT site (name, products, shipping, total_price)>
<!ATTLIST site
    xmlns:xlink CDATA #FIXED "http://www.w3.org/XML/XLink/0.9"
    xlink:type (locator) #FIXED "locator"
    xlink:href CDATA #REQUIRED
    xlink:role CDATA #FIXED "data location"
    xlink:title CDATA #FIXED "Click to visit the site"
>
<!ELEMENT name (#PCDATA)>
<!ELEMENT products (book*, cd*, dvd*)>
<!ELEMENT book (title, ISBN, price, shipping?)>
<!ELEMENT cd (title, catalog_number, price, shipping?)>
<!ELEMENT dvd (title, catalog_number, price, shipping?)>
<!ELEMENT title (#PCDATA)>
<!CONSTRAINT (title = comparator.site:href(product.title))>
<!ELEMENT ISBN (#PCDATA)>
<!CONSTRAINT (ISBN = comparator.site:href(product.ISBN))>
<!ELEMENT price (#REAL)>
<!CONSTRAINT (price = comparator.site:href(product.price))>
<!ELEMENT catalog_number (#PCDATA)>
<!ELEMENT shipping (#REAL)>
<!CONSTRAINT (shipping = comparator.site:href(shipping.ground))>
<!ELEMENT total_price (#REAL)>
<!CONSTRAINT (total_price = SUM(comparator.site.products.*.price)
                          + SUM(comparator.site.products.*.shipping)
                          + comparator.site.shipping)>
<!ELEMENT best_buy (site)>
<!CONSTRAINT (best_buy.site.total_price = MIN(comparator.site.total_price))>
An XML instance of the above DTD is given in Appendix II.
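The two arithmetic constraints can be checked against an instance like the one in Appendix II with a short Python sketch. It assumes that the CobWeb path comparator.site.products.*.price, evaluated for one site, corresponds to the ElementTree path products/*/price relative to that site, and that the site-level shipping cost is added once per site, which reproduces the totals 19.16 and 39.05 in Appendix II. The function names are hypothetical, the xlink attributes and most of the best_buy copy are omitted from the sample for brevity, and Decimal is used so the sums are exact.

```python
import xml.etree.ElementTree as ET
from decimal import Decimal

def expected_total(site: ET.Element) -> Decimal:
    """SUM(products.*.price) + SUM(products.*.shipping) + the site's own shipping."""
    prices = sum((Decimal(p.text) for p in site.findall("products/*/price")), Decimal(0))
    item_ship = sum((Decimal(s.text) for s in site.findall("products/*/shipping")), Decimal(0))
    return prices + item_ship + Decimal(site.findtext("shipping"))

def check_comparator(root: ET.Element) -> list[str]:
    """Return descriptions of violated constraints; an empty list means consistent."""
    errors = []
    sites = root.findall("site")  # direct children only, so best_buy/site is skipped
    for site in sites:
        stated, expected = Decimal(site.findtext("total_price")), expected_total(site)
        if stated != expected:
            errors.append(f"{site.findtext('name')}: total_price {stated} != {expected}")
    best = root.findtext("best_buy/site/total_price")
    if sites and best is not None:
        cheapest = min(Decimal(s.findtext("total_price")) for s in sites)
        if Decimal(best) != cheapest:
            errors.append(f"best_buy total_price {best} != MIN {cheapest}")
    return errors

SAMPLE = """<comparator>
  <site><name>Amazon.com</name>
    <products>
      <book><title>Fountainhead</title><ISBN>0451191153</ISBN>
            <price>7.19</price><shipping>0.99</shipping></book>
      <book><title>Running Linux</title><ISBN>156592469X</ISBN>
            <price>6.99</price><shipping>0.99</shipping></book>
    </products>
    <shipping>3.00</shipping><total_price>19.16</total_price></site>
  <site><name>Barnes &amp; Noble</name>
    <products>
      <book><title>Fountainhead</title><ISBN>0451191153</ISBN>
            <price>7.19</price><shipping>0.95</shipping></book>
      <book><title>Running Linux</title><ISBN>156592469X</ISBN>
            <price>26.96</price><shipping>0.95</shipping></book>
    </products>
    <shipping>3.00</shipping><total_price>39.05</total_price></site>
  <best_buy><site><name>Amazon.com</name><total_price>19.16</total_price></site></best_buy>
</comparator>"""

print(check_comparator(ET.fromstring(SAMPLE)))  # []
```

Changing any stated total in the sample produces a total_price error for that site and, if it alters the minimum, a best_buy error as well.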
6 Comparison with other Schema Languages
XML Schema and CobWeb
The XML Schema specification describes several similar features. In particular, XML Schema defines data
types and supports the creation of new types by combining different elements [9] [10]. The most important
difference between XML Schema and CobWeb is CobWeb's support for link traversal. CobWeb allows developers to
define whole websites and to extract data from different web pages in order to check constraints; XML Schema
does not support this. Nor does XML Schema support the full range of constraints that CobWeb does. CobWeb's
syntax also tends to be more concise, and it uses the DTD to declare constraints, whereas XML Schema does not
use a DTD and instead requires a separate specification file (an XSD file). Below we give the sample XML and
XSD files from the W3C's website [11], followed by the same XML document defined using CobWeb. The XML document
contains data for a purchase order with a shipTo and a billTo address, room for comments, and any number of
items. With the exception of a user-defined data type based on a regular expression (Sku) in the XSD file,
CobWeb can reproduce the same results as the XSD file. Support for regular expressions to describe strings is
planned for a future version of CobWeb.
<?xml version="1.0"?>
<purchaseOrder orderDate="1999-10-20">
<shipTo country="US">
<name>Alice Smith</name>
<street>123 Maple Street</street>
<city>Mill Valley</city>
<state>CA</state>
<zip>90952</zip>
</shipTo>
<billTo country="US">
<name>Robert Smith</name>
<street>8 Oak Avenue</street>
<city>Old Town</city>
<state>PA</state>
<zip>95819</zip>
</billTo>
<comment>Hurry, my lawn is going wild!</comment>
<items>
<item partNum="872-AA">
<productName>Lawnmower</productName>
<quantity>1</quantity>
<price>148.95</price>
<comment>Confirm this is electric</comment>
</item>
<item partNum="926-AA">
<productName>Baby Monitor</productName>
<quantity>1</quantity>
<price>39.98</price>
<shipDate>1999-05-21</shipDate>
</item>
</items>
</purchaseOrder>
The XSD file below defines the structure and data types of the elements and attributes used in the XML
document.
<xsd:schema xmlns:xsd="http://www.w3.org/1999/XMLSchema">
<xsd:annotation>
<xsd:documentation>
Purchase order schema for Example.com.
Copyright 2000 Example.com. All rights reserved.
</xsd:documentation>
</xsd:annotation>
<xsd:element name="purchaseOrder" type="PurchaseOrderType"/>
<xsd:element name="comment" type="xsd:string"/>
<xsd:complexType name="PurchaseOrderType">
<xsd:element name="shipTo" type="Address"/>
<xsd:element name="billTo" type="Address"/>
<xsd:element ref="comment" minOccurs="0"/>
<xsd:element name="items" type="Items"/>
<xsd:attribute name="orderDate" type="xsd:date"/>
</xsd:complexType>
<xsd:complexType name="Address">
<xsd:element name="name" type="xsd:string"/>
<xsd:element name="street" type="xsd:string"/>
<xsd:element name="city" type="xsd:string"/>
<xsd:element name="state" type="xsd:string"/>
<xsd:element name="zip" type="xsd:decimal"/>
<xsd:attribute name="country" type="xsd:NMTOKEN"
use="fixed" value="US"/>
</xsd:complexType>
<xsd:complexType name="Items">
<xsd:element name="item" minOccurs="0"
maxOccurs="unbounded">
<xsd:complexType>
<xsd:element name="productName"
type="xsd:string"/>
<xsd:element name="quantity">
<xsd:simpleType base="xsd:positiveInteger">
<xsd:maxExclusive value="100"/>
</xsd:simpleType>
</xsd:element>
<xsd:element name="price" type="xsd:decimal"/>
<xsd:element ref="comment" minOccurs="0"/>
<xsd:element name="shipDate" type="xsd:date"
minOccurs="0"/>
<xsd:attribute name="partNum" type="Sku"/>
</xsd:complexType>
</xsd:element>
</xsd:complexType>
<xsd:simpleType name="Sku" base="xsd:string">
<xsd:pattern value="\d{3}-[A-Z]{2}"/>
</xsd:simpleType>
</xsd:schema>
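The Sku type at the end of this schema is the one feature that the CobWeb version cannot yet express. For comparison, the check it performs can be sketched with Python's standard re module. XSD patterns must match the entire value, which fullmatch reproduces. The names SKU_PATTERN and is_sku are hypothetical, and this illustrates the pattern only; it is not part of CobWeb.

```python
import re

# Sku from the XSD above: three digits, a hyphen, two upper-case letters.
SKU_PATTERN = re.compile(r"\d{3}-[A-Z]{2}")

def is_sku(value: str) -> bool:
    # XSD patterns are implicitly anchored, so use fullmatch rather than search.
    return SKU_PATTERN.fullmatch(value) is not None

for part in ["872-AA", "926-AA", "872-aa", "8722-AA"]:
    print(part, is_sku(part))
```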
Here is a DTD for the same XML document:
<?xml encoding="US-ASCII"?>
<!ELEMENT purchaseOrder (shipTo, billTo, comment?, items)>
<!ATTLIST purchaseOrder orderDate CDATA #REQUIRED>
<!ELEMENT shipTo (name, street, city, state, zip)>
<!ATTLIST shipTo country CDATA #FIXED "US">
<!ELEMENT billTo (name, street, city, state, zip)>
<!ATTLIST billTo country CDATA #FIXED "US">
<!ELEMENT name (#PCDATA)>
<!ELEMENT street (#PCDATA)>
<!ELEMENT city (#PCDATA)>
<!ELEMENT state (#PCDATA)>
<!ELEMENT zip (#INTEGER)>
<!CONSTRAINT (zip ge 10000 AND zip le 99999)>
<!ELEMENT items (item*)>
<!ELEMENT item (productName, quantity, price, comment?, shipDate?)>
<!ATTLIST item partNum CDATA #REQUIRED>
<!ELEMENT productName (#PCDATA)>
<!ELEMENT quantity (#INTEGER)>
<!CONSTRAINT (quantity gt 0 AND quantity lt 100)>
<!ELEMENT price (#REAL)>
<!CONSTRAINT (price ge 0)>
<!ELEMENT comment (#PCDATA)>
<!ELEMENT shipDate (#PCDATA)>
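A checker for these range constraints can be sketched in Python by mapping CobWeb's comparison keywords (listed in Appendix I) onto the standard operator module. The OPS and RULES tables and the violations function are hypothetical names for this sketch. The quantity rule follows the XSD's positiveInteger with maxExclusive 100, AND is treated as a conjunction of tests, and every occurrence of a constrained element is checked.

```python
import operator
import xml.etree.ElementTree as ET

# CobWeb comparison keywords (Appendix I) mapped to Python operators.
OPS = {"=": operator.eq, "lt": operator.lt, "gt": operator.gt,
       "le": operator.le, "ge": operator.ge}

# element name -> (value type, tests that must all hold, i.e. joined by AND)
RULES = {
    "zip": (int, [("ge", 10000), ("le", 99999)]),
    "quantity": (int, [("gt", 0), ("lt", 100)]),  # XSD: positiveInteger, maxExclusive 100
    "price": (float, [("ge", 0)]),
}

def violations(root: ET.Element) -> list[str]:
    """Check every occurrence of each constrained element in the document."""
    found = []
    for name, (convert, tests) in RULES.items():
        for elem in root.iter(name):
            value = convert(elem.text)
            for op, bound in tests:
                if not OPS[op](value, bound):
                    found.append(f"{name} = {elem.text} violates '{name} {op} {bound}'")
    return found

order = ET.fromstring(
    "<purchaseOrder><shipTo><zip>90952</zip></shipTo>"
    "<items><item><quantity>1</quantity><price>148.95</price></item></items>"
    "</purchaseOrder>")
print(violations(order))  # []
```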
CobWeb and XML Schema
CobWeb defines its elements and constraints in the Document Type Definition, whereas XML Schema defines them in
XML syntax itself. Although it can be advantageous to write element definitions and constraints in the same
syntax as the instance, doing so can become tedious. Since the early days of programming languages, when
functional and declarative languages emerged, declarations have been kept separate and have had their own
methodology; CobWeb follows this tradition. All XML Schema data types have been incorporated into CobWeb. We
strongly oppose the open content model, because it creates loopholes in validation.
CobWeb and Schematron
Schematron is a schema language that validates XML using patterns. Its focus is on validating rather than
declaring. It specifies constraints through XPath expressions, with rules whose checks are written as assert and
report tests. Using only a subset of XPath, powerful XSLT style sheets can be created to process very complex
XML instances. A Schematron schema consists of an optional title, zero or more namespace prefix declarations,
and several patterns; each pattern contains rules with a context, and each rule contains assert and report tests.
Although Schematron supports constraints to some extent through XPath, it does not support inheritance, whereas
CobWeb does. Elements and attributes also cannot have default values in Schematron. Finally, constraints are
not explored in depth in Schematron: CobWeb handles quantified constraints, whereas Schematron can express only
simple constraints using the “assert” statement.
CobWeb and XDR
XDR stands for XML Data Reduced. There is broad recognition that the existing XML DTD is an inadequate or
inappropriate language for expressing what many current and anticipated XML applications need to include in
their schemas. XML-Data provides an alternative approach that uses XML instance syntax to address these needs.
CobWeb and SOX (version 2.0)
SOX (Schema for Object-Oriented XML) is a language for defining the syntactic structure and partial semantics
of XML document types. It extends the DTD in an object-oriented way by allowing extensible data types and
inheritance among element types. SOX was initially developed to support the development of large-scale,
distributed electronic commerce applications but is applicable across the whole range of applications of
markup. As compared to XML DTDs, SOX dramatically decreases the complexity of supporting
interoperation among heterogeneous applications by facilitating software mapping of XML data structures,
expressing domain abstractions and common relationships directly and explicitly, enabling reuse at the
document design and the application programming levels, and supporting the generation of common
application components.
CobWeb and DSD (version 1.0)
DSD (Document Structure Description) is a schema language in which a DSD document specifies a class of XML
documents, together with a default mechanism and documentation. The goals of DSD were context-dependent
descriptions of elements and attributes, flexible default insertion mechanisms, and high expressive power. It
offers strong support for schema constraints, and DSD processing is guaranteed to take time linear in the size of
the application document.
CobWeb and DSD
DSD also has some built-in constraint support, but it is not full-fledged: like Schematron, it supports neither
quantified constraints nor inheritance. Furthermore, it does not support namespaces.
Table 1: Summary of the feature comparisons.
Feature | CobWeb | XML DTD 1.0 | XML Schema 1.0 | XDR 1.0 | SOX 2.0 | Schematron 1.4 | DSD 1.0
Schema
Syntax in XML | No | No | Yes | Yes | Yes | Yes | Yes
Namespace | Yes | No | Yes | Yes | Yes | Yes | No
Include | Yes | No | Yes | No | Yes | No | Yes
Import | Yes | No | Yes | No | Yes | No | No
Data type
Built-in types (number) | 38 | 10 | 37 | 33 | 17 | 0 | 0
User-defined type | Yes | No | Yes | No | Yes | No | Yes
Domain constraint | Yes | No | Yes | No | Partial | Yes | Yes
Null | No | No | Yes | No | No | No | No
Attribute
Default value | Yes | Yes | Yes | Yes | Yes | No | Yes
Choice | Yes | No | No | No | No | Yes | Yes
Optional vs. required | Yes | Yes | Yes | Yes | Yes | Yes | Yes
Domain constraint | Yes | Partial | Yes | Partial | Partial | Yes | Yes
Conditional definition | Yes | No | No | No | No | Yes | Yes
Element
Default value | Yes | No | Partial | No | No | No | Yes
Content model | Yes | Yes | Yes | Yes | Partial | Yes | Yes
Ordered sequence | Yes | Yes | Yes | Yes | Yes | Yes | Yes
Unordered sequence | Yes | No | Yes | Yes | No | Yes | Yes
Choice | Yes | Yes | Yes | Yes | Yes | Yes | Yes
Min & max occurrence | Yes | Partial | Yes | Yes | Yes | Yes | Partial
Open model | No | No | No | Yes | No | Yes | No
Conditional definition | Yes | No | No | No | No | Yes | Yes
Inheritance
Simple type by extension | Yes | No | No | No | No | No | No
Simple type by restriction | Yes | No | Yes | No | Yes | No | No
Complex type by extension | Yes | No | Yes | No | Yes | No | No
Complex type by restriction | Yes | No | Yes | No | No | No | No
Being unique or key
Uniqueness for attribute | Yes | Yes | Yes | Yes | Yes | Yes | Yes
Uniqueness for non-attribute | Yes | No | Yes | Partial | No | Yes | No
Key for attribute | Yes | No | No | Yes | Yes | No | No
Key for non-attribute | Yes | No | No | Yes | Yes | No | No
Foreign key for attribute | Yes | Partial | Yes | Partial | Partial | Yes | Yes
Foreign key for non-attribute | Yes | No | Yes | No | No | No | Yes
Miscellaneous
Dynamic constraint | Yes | No | No | No | No | Yes | No
Version | Yes | No | No | No | No | No | Yes
Documentation | Yes | No | Yes | No | Yes | Yes | Yes
Embedded HTML | No | No | Yes | No | Yes | Partial | Yes
Self-describability | No | No | Partial | No | No | Partial | Yes
7 Conclusions and Further Work
In this paper we have discussed the need for a constraint-based extension of XML. Our proposed extension
offers several powerful features. Constrained XML allows developers to expand upon the notion of viewing the
web as part of a database: not only is data checked for validity (i.e., conformance with the grammatical part
of the DTD), but it is also checked for semantic integrity (i.e., satisfaction of the constraints). Using
constrained XML to author websites will allow us to provide tools that automatically ensure their structural
and semantic correctness. We currently have a prototype implementation of the constraints described in this
paper, built to demonstrate their usefulness and to test the appropriateness of the syntax and design [2]. This
implementation has been developed using the IBM XML4J parser. Several of the examples in this paper have been
tested with it, and Constraint XML has been found to be useful and easy to use. The implementation does not yet
support links and namespaces, and we are in the process of extending it. Using CobWeb as an authoring language
is only one aspect of the language; several scenarios outside the scope of this paper could also benefit from a
constraint-based extension of XML.
Many content-driven websites create web pages dynamically as a natural part of their operation. In these cases,
using CobWeb to generate solutions to the constraints is preferable to writing a separate application to handle
the data. Similarly, business-to-business applications often involve data transactions between multiple
servers. CobWeb can be used to further restrict the data to be transferred and to help define the data
interfaces between companies. CobWeb can be extended to handle these scenarios. CobWeb distinguishes between
checking and fulfilling constraints. All of the examples given so far check the validity of the data against
the given constraints. However, there are circumstances in which fulfilling constraints is useful. In one such
circumstance, we emulate the SQL SELECT query. In the following example, we want to know the average grade of
all failing students in a given class:
<?xml encoding="US-ASCII"?>
<!ELEMENT class (student+, passing_student*, failing_student*, failing_average?)>
<!ELEMENT student (name, grade)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT grade (#REAL)>
<!ELEMENT failing_student (student)>
<!CONSTRAINT (class.student.grade lt 50)>
<!ELEMENT passing_student (student)>
<!CONSTRAINT (class.student.grade ge 50)>
<!ELEMENT failing_average (#REAL)>
<!CONSTRAINT (AVERAGE(class.failing_student.student.grade))>
Failing students are chosen from all student elements in the class. Unlike in previous examples, the author of
the document does not populate the failing_student elements; instead, the parser itself takes on this duty.
Assuming the students are stored in a relational table student(name, grade), the SQL query that this emulates
would look like this:
SELECT name, grade
FROM student
WHERE grade < 50;
Such a feature would allow us to broaden CobWeb's functionality beyond checking constraints at build time, and
to specify whether constraints should be checked or fulfilled at build time, at browse time, or during querying.
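Constraint fulfilment for the failing-student example can be sketched in Python: instead of validating failing_student and failing_average, the code creates them from the student elements, and the standard sqlite3 module runs the equivalent SQL query for comparison. The function names, the student table, and the sample names and grades are hypothetical; only the threshold of 50 comes from the DTD.

```python
import copy
import sqlite3
import xml.etree.ElementTree as ET
from statistics import mean

def fulfil_failing(cls: ET.Element, threshold: float = 50.0) -> None:
    """Add failing_student and failing_average elements derived from <student>s."""
    failing = [s for s in cls.findall("student")
               if float(s.findtext("grade")) < threshold]
    for student in failing:
        # Wrap a copy, so the original student element stays where it is.
        ET.SubElement(cls, "failing_student").append(copy.deepcopy(student))
    if failing:
        average = mean(float(s.findtext("grade")) for s in failing)
        ET.SubElement(cls, "failing_average").text = str(average)

def failing_via_sql(rows, threshold: float = 50.0):
    """The same selection as a relational query over student(name, grade)."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE student (name TEXT, grade REAL)")
    con.executemany("INSERT INTO student VALUES (?, ?)", rows)
    names = [row[0] for row in con.execute(
        "SELECT name FROM student WHERE grade < ? ORDER BY rowid", (threshold,))]
    (average,) = con.execute(
        "SELECT AVG(grade) FROM student WHERE grade < ?", (threshold,)).fetchone()
    con.close()
    return names, average

cls = ET.fromstring(
    "<class><student><name>Ann</name><grade>42</grade></student>"
    "<student><name>Bob</name><grade>77</grade></student>"
    "<student><name>Cy</name><grade>38</grade></student></class>")
fulfil_failing(cls)
print([f.findtext("student/name") for f in cls.findall("failing_student")])
print(cls.findtext("failing_average"))
print(failing_via_sql([("Ann", 42), ("Bob", 77), ("Cy", 38)]))
```

The new elements are appended after the student elements, which keeps the document in the order the class content model expects.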
References
[1] Bosak, Jon. XML, Java, and the Future of the Internet. http://www.xml.com/pub/w3j/s3.bosak.html,
November 1997. Also published in World Wide Web Journal.
[2] Saradha, K. Design and Implementation of a Parser for a Constraint-Based Extension of XML. Dept. of
Computer Science and Engineering, University at Buffalo, January 2001.
[3] Lee, Dongwon, and Wesley W. Chu. Comparative Analysis of Six XML Schema Languages.
[4] H. S. Thompson, D. Beech, M. Maloney, N. Mendelsohn (ed.) XML Schema Part 1: Structures, W3C,
April 2000.
[5] Lee, Dongwon, and Wesley W. Chu. Constraints-preserving Transformation from XML Document Type Definition to
Relational Schema.
[6] World Wide Web Consortium. Extensible Markup Language (XML) 1.0. Available at: http://www.w3.org/TR/REC-xml.
[7] World Wide Web Consortium. Namespaces in XML. Available at: http://www.w3.org/TR/REC-xml-names/.
[8] World Wide Web Consortium. XML Linking Language (XLink). Available at:
http://www.w3.org/TR/2000/WD-xlink-20000221/.
[9] World Wide Web Consortium. XML Schema Part 1: Structures. Available at:
http://www.w3.org/TR/xmlschema-1/.
[10] World Wide Web Consortium. XML Schema Part 2: Datatypes. Available at:
http://www.w3.org/TR/xmlschema-2/.
[11] World Wide Web Consortium. XML Schema Part 0: Primer. Available at:
http://www.w3.org/TR/xmlschema-0/.
[12] St. Laurent, Simon. XML Elements of Style. New York: McGraw-Hill, 2000.
Appendix I: A list of constraints in XML
The following is a list of constraints and their definitions.
Common numerical and string operators:
+     addition
-     subtraction
*     multiplication
/     division
=     equals
lt    less than
gt    greater than
le    less than or equal to
ge    greater than or equal to
Keywords used for ordering:
ASCENDING
DESCENDING
Keywords used for quantification:
EVERY
SOME
AVERAGE(element)
The constrained element must be equal to the average of all occurrences of the element passed in.
CEILING(element)
The constrained element must be the integer ceiling of the element passed in.
COUNT(element)
The constrained element must be equal to the number of occurrences of the element passed in.
FLOOR(element)
The constrained element must be the integer floor of the element passed in.
ISINT()
The value of the constrained element must be an integer.
ISREAL()
The value of the constrained element must be a real number.
ISTHIS(element)
The value of the constrained element must equal the value of at least one occurrence of the element passed in.
MAX(element)
The value of the constrained element must be the maximum of the values of all instances of the element passed
in, within the scope of the element.
MIN(element)
The value of the constrained element must be the minimum of the values of all instances of the element passed
in, within the scope of the element.
SUM(element[, ...])
The constrained element must be equal to the sum of the elements passed in. This may take two forms. Either
multiple elements may be passed in and their sum taken:
SUM(apples.total, oranges.total, bananas.total)
or a single element may be passed in, in which case the application must find all occurrences of that element
in the page and add their values:
SUM(product.total)
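The functions above can be illustrated as plain Python functions that receive the text values of all occurrences of the element passed in. This is a sketch under assumptions, not the prototype's implementation: locating the occurrences "within the scope of the element" is left to the caller, the names are lower-cased (sum_ avoids shadowing the built-in sum), numbers are parsed as Decimal so that sums such as 7.19 + 6.99 are exact, and EVERY/SOME are read as universal and existential quantification over a predicate.

```python
import math
from decimal import Decimal, InvalidOperation
from statistics import mean

def nums(values):
    """Parse the text of each occurrence as an exact decimal number."""
    return [Decimal(v) for v in values]

def average(values): return mean(nums(values))
def count(values): return len(values)
def minimum(values): return min(nums(values))
def maximum(values): return max(nums(values))
def ceiling(value): return math.ceil(Decimal(value))
def floor(value): return math.floor(Decimal(value))
def is_this(value, values): return value in values

def sum_(*groups):
    # SUM(a.total, b.total) passes several lists; SUM(product.total) passes one.
    return sum((d for g in groups for d in nums(g)), Decimal(0))

def is_int(value):
    try:
        int(value)
        return True
    except ValueError:
        return False

def is_real(value):
    try:
        return Decimal(value).is_finite()
    except InvalidOperation:
        return False

def every(pred, values): return all(pred(v) for v in values)
def some(pred, values): return any(pred(v) for v in values)

def satisfies(constrained, fn, *args):
    """A value-producing constraint holds when the constrained element equals fn(args)."""
    return Decimal(constrained) == fn(*args)

print(sum_(["7.19", "6.99"]))                             # 14.18
print(satisfies("19.16", minimum, ["19.16", "39.05"]))    # True
```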
Appendix II
Online Brochure Example
<?xml version="1.0"?>
<!DOCTYPE toc SYSTEM "brochure.dtd">
<toc>
<toc_loc
xlink:href="http://www.cse.buffalo.edu/pub/WWW/undergrad/brochure.xml"
/>
<toc_section
xlink:href="http://www.cse.buffalo.edu/pub/WWW/undergrad/whatiscs.xml"
>
<toc_description>
<toc_number>1</toc_number>
<toc_title>
What is Computer Science? Computer Engineering?
</toc_title>
</toc_description>
</toc_section>
<toc_section
xlink:href="http://www.cse.buffalo.edu/pub/WWW/undergrad/general.xml"
>
<toc_description>
<toc_number>2</toc_number>
<toc_title>
General Information about Undergraduate Programs
</toc_title>
</toc_description>
</toc_section>
<toc_section
xlink:href="http://www.cse.buffalo.edu/pub/WWW/undergrad/admission.xml"
>
<toc_description>
<toc_number>3</toc_number>
<toc_title>Admission to the CS Major
(B.A., B.S. Degree Programs)</toc_title>
</toc_description>
</toc_section>
</toc>
Each section listed within the table of contents has its own XML file. The file for the first section is given
below:
<?xml version="1.0"?>
<!DOCTYPE section SYSTEM "brochure.dtd">
<section>
<toc_description>
<toc_number>1</toc_number>
<toc_title>
What is Computer Science? Computer Engineering?
</toc_title>
</toc_description>
<toc_loc
xlink:href="http://www.cse.buffalo.edu/pub/WWW/undergrad/brochure.xml"
/>
<next_loc
xlink:href="http://www.cse.buffalo.edu/pub/WWW/undergrad/general.xml"
/>
<section_data>
The Department of Computer Science (CS) at the State University of New York
at Buffalo, which was established in 1967, became the Department of Computer
Science and Engineering (CSE) in 1998...
</section_data>
</section>
Product Comparison Example
A sample XML file comparing data from Amazon.com and Barnes & Noble:
<?xml version="1.0"?>
<!DOCTYPE comparator SYSTEM "comparator.dtd">
<comparator>
<site xlink:href="http://www.amazon.com/">
<name>Amazon.com</name>
<products>
<book>
<title>Fountainhead</title>
<ISBN>0451191153</ISBN>
<price>7.19</price>
<shipping>0.99</shipping>
</book>
<book>
<title>Running Linux</title>
<ISBN>156592469X</ISBN>
<price>6.99</price>
<shipping>0.99</shipping>
</book>
</products>
<shipping>3.00</shipping>
<total_price>19.16</total_price>
</site>
<site
xlink:href="http://www.barnesandnoble.com/products/search.jsp"
>
<name>Barnes &amp; Noble</name>
<products>
<book>
<title>Fountainhead</title>
<ISBN>0451191153</ISBN>
<price>7.19</price>
<shipping>0.95</shipping>
</book>
<book>
<title>Running Linux</title>
<ISBN>156592469X</ISBN>
<price>26.96</price>
<shipping>0.95</shipping>
</book>
</products>
<shipping>3.00</shipping>
<total_price>39.05</total_price>
</site>
<best_buy>
<site xlink:href="http://www.amazon.com/">
<name>Amazon.com</name>
<products>
<book>
<title>Fountainhead</title>
<ISBN>0451191153</ISBN>
<price>7.19</price>
<shipping>0.99</shipping>
</book>
<book>
<title>Running Linux</title>
<ISBN>156592469X</ISBN>
<price>6.99</price>
<shipping>0.99</shipping>
</book>
</products>
<shipping>3.00</shipping>
<total_price>19.16</total_price>
</site>
</best_buy>
</comparator>