What is XML?
XML is a way of adding intelligence to your documents. It lets you identify each element using
meaningful tags and it lets you add information ("metadata") about each element.
XML is very much a part of the future of the Web, and part of the future for all electronic information.
XML is a syntax for marking up data and it works with many other technologies to display and process
information. It looks and feels very much like HTML.
XML isn't going to replace everything else you've already learned; it complements it and extends it.
XML isn't going to change the way your Web pages look. You'll still need to use CSS (Cascading
Style Sheets) with XML to define font colors, or JavaScript (again, with XML) to make your images
fly around. Yet XML will change the way you and others read documents and it will change the way
documents are filed and stored. It's a new technology and you certainly don't need to use it in order to
build a great Web site -- but you will want to be aware of it as you look at the Web of the future.
What's the Fuss About?
XML lets you make documents smarter, more portable, and more powerful -- that's the promise of
XML and that's what all the fuss is about.
XML allows you to use your own tags to define parts of a document. You can do this because XML is
a descriptive, not a procedural, language. That is, XML describes what something is rather than
performing an action.
For example, take a look at the front page of a newspaper. You'll see different font sizes, different
sections, and columns.
If you were to create a Web page for that newspaper--using the same formatting and styles--you
would use tags such as <H1> and <font color="red"> to define the size and color of a large
headline, or <i> to italicize a word such as a byline, in order to distinguish it from the rest of the text.
But just try to write tags that actually explain that you've got a Headline and that the words "John
Smith" make up a byline. HTML won't know what you're talking about if you create tags such as
<Headline> or <byline> or <advertisement>.
XML, with help from other technologies such as CSS, understands what the elements are and how to
display them.
That means, in the future, when you're searching on the web for, say, a Barbie doll for your niece's
birthday, you'll get Barbie the DOLL instead of some other type of Barbie, because the Barbie doll
page might be marked up like this:
<DOLL>Barbie</DOLL>.
Pretty cool, huh?
XML documents can be moved to any format on any platform -- without the elements losing their
meaning. That means you can publish the same information to a web browser, a PDA, or a
network-enabled bread machine and each device would use the information appropriately.
The most important thing to remember about XML, though, is that it doesn't stand alone. It needs
other technologies, like CSS, in order for you to see its results.
If all of this seems like a pain, and you don't want to mess with XML, it's OK. You don't need it to make
a great web page. But you never know when organization will come in handy.
Where Did XML Come From?
XML is a simplified version of SGML and a cousin of HTML. It was developed by members of the W3C
and released as a recommendation by the W3C in February 1998.
SGML, the parent of XML, is an international standard that has been in use as a markup language
primarily for technical documentation and government applications since the early 1980s. It was
developed to standardize the production process for large document sets. Think: Medical records.
Company databases. Aircraft parts catalogs. Other really huge documents.
Marking up documents in SGML allows information to be passed from one system to the next without
losing information. With databases marked up in SGML you can see what Widget A is all about and
go check to see if Widget A is in stock.
Early on, people thought that SGML would be useful for the Web. In fact, HTML is really a very basic
application of SGML! But HTML quickly became used for visual layout, so a group of people returned
to the basics, determined to create something that had the strengths of SGML without being so
difficult to implement -- and had the ease of use of HTML, but with more structural power. The result
was XML.
The design goals of XML, taken from the XML Specification, are:
- XML shall be straightforwardly usable over the Internet.
- XML shall support a wide variety of applications.
- XML shall be compatible with SGML.
- It shall be easy to write programs which process XML documents.
- The number of optional features in XML is to be kept to the absolute minimum, ideally zero.
- XML documents should be human-legible and reasonably clear.
- The XML design should be prepared quickly.
- The design of XML shall be formal and concise.
- XML documents shall be easy to create.
- Terseness in XML markup is of minimal importance.
In other words, XML is easy to create, easy to read, and designed for use over the Internet. What
more could a Web designer ask for?
What Does XML Look Like?
If you've ever used HTML, XML is going to look very familiar!
When you view the source of a document written in XML the first thing you'll see is the XML
declaration, which looks like this:
<?xml version="1.0"?>
Then, in the body of the document, you'll see a lot of tags. The tags look familiar at first -- they start
with the usual less than sign and end with the usual greater than sign, like this:
<name>
But then you'll notice that the tags might not be quite the names you've come to expect! You'll see
tags that seem to be made-up tag names. Tags like <dogchow> and <badcars> and <species>. In
fact, if you view the source of an XML document, you'll see tags surrounding lots of words, maybe
every word in the document. These tags define exactly what the content is. And the creator of the
document had the power to create his or her own specific set of tags.
Suppose you're looking at a Web page marked up in XML on The Canterbury Tales by Chaucer.
You're looking specifically at lines 282-286 of "The Physician's Tale." The document source for that
section might look like this:
<?xml version="1.0"?>
<CANTERBURY-TALES>
<SECTION name="physician">
The Physician's Tale
<LINE number="282">
That no man woot therof but God and he.
</LINE>
<LINE number="283">
For be he lewed man, or ellis lered,
</LINE>
<LINE number="284">
He noot how soone that he shal been afered.
</LINE>
<LINE number="285">
Therfore I rede yow this conseil take -
</LINE>
<LINE number="286">
Forsaketh synne, er synne yow forsake.
</LINE>
</SECTION>
</CANTERBURY-TALES>
The tags simply define that:
1) This document is the Canterbury Tales.
2) This section is the Physician's Tale.
3) Each line of the Physician's Tale is defined.
4) Each line ends, and the Physician's Tale and The Canterbury Tales end.
If the entire document were marked up like this, you could easily jump to a certain line or section.
The entire document is annotated for easy reference and searching, and instead of viewing the entire
document, users could request only specific sections of a document--simply by calling the specific
tags they want. Oh, and we don't recommend that you manually type out each line in the Canterbury
Tales. Get a computer to count the lines for you.
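As a rough sketch of what "calling the specific tags" could look like in practice, the Python snippet below uses the standard library's xml.etree.ElementTree module to pull a single line out of a shortened copy of the markup above. The helper name find_line is an assumption made for this example; it is not part of XML or of any particular product.

```python
import xml.etree.ElementTree as ET

# A shortened copy of the Canterbury Tales markup shown above.
SOURCE = """<?xml version="1.0"?>
<CANTERBURY-TALES>
<SECTION name="physician">
The Physician's Tale
<LINE number="285">
Therfore I rede yow this conseil take -
</LINE>
<LINE number="286">
Forsaketh synne, er synne yow forsake.
</LINE>
</SECTION>
</CANTERBURY-TALES>"""


def find_line(root, section, number):
    """Return the text of one LINE element, or None if it is absent.

    find_line is a hypothetical helper written for this example.
    """
    path = f"./SECTION[@name='{section}']/LINE[@number='{number}']"
    element = root.find(path)
    return element.text.strip() if element is not None else None


root = ET.fromstring(SOURCE)
print(find_line(root, "physician", 286))
```

Because the section and line number live in the markup itself, the lookup never has to count lines or search the text.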
XML Versus HTML
HTML and XML are cousins. They draw off the same inspiration, SGML. They both identify elements
in your page. They both use a very similar syntax. If you are familiar with HTML, XML will also feel
familiar.
The big difference between HTML and XML is that HTML has evolved into a markup language that
describes the look, feel and action of a Web page. An <H1> is a headline that is displayed in a certain
size, for example.
In contrast, XML doesn't describe how a page looks, how it acts or what it does. XML describes what
the words in a document ARE. This is a critical distinction! While HTML combines structure and
display, XML separates them. This means that XML documents are more portable and can be used in
many different types of applications.
In the near future, we'll see both XML and HTML documents. Eventually, XML will probably replace
HTML, or HTML will become an application of XML. But that doesn't mean you should toss out
everything you know! In many ways, XML builds on HTML and if you know HTML, XML will be easier
to work with.
Valid and Well-Formed XML
You'll sometimes hear an XML document referred to as a "valid" XML document or a "well-formed"
XML document. This distinction touches on one of the nice things about XML.
When you used SGML, you had to create something called a Document Type Definition (DTD, for short)
in order to make the SGML document useful. DTDs were fairly complex and required a lot of work to
create. They were one of the roadblocks to widespread use of SGML.
With XML you have an option. You can make a well-formed XML document by simply following the
XML syntax rules. You don't have to create a separate DTD if you don't want to.
If you do create a set of rules -- a DTD -- and make your document conform to those rules, it is
considered a valid XML document.
DTDs describe the structure of your document. We'll be discussing DTDs in detail later on. Right now,
all you need to know is that the main difference between Valid and Well-Formed XML is that Valid
XML refers to and conforms to a DTD and Well-Formed XML doesn't.
Structure
XML applies structure to documents. Documents are sets of related information.
The term structure seems to bring some unpleasant imagery with it, especially for creative souls who
want to make this medium work in new and innovative ways. But when one is dealing with publishing,
the term structure is quite positive. It is the way we put a skeleton behind information, so that the
pieces of information work together and make sense as a whole.
There are two key principles behind a structured model:
1. Each part -- or element -- has a relationship with other elements. This series of relationships
defines the structure.
2. The meaning of the element is separate from its visual appearance.
Documents
We can't really talk about structure without first talking a bit about documents. Document is another of
those terms that conjures up somewhat negative images; one tends to picture "dusty stacks of
documents" or "attorney's documents" or "document processing." But in this case, a document is
simply a collection of related information.
For example, this page is a document. Your favorite 'zine is a set of documents. Your intranet is
probably comprised of hundreds if not thousands of documents.
Sometimes documents are created as a single unit. Sometimes they are built on demand, pulling
pieces from a database and assembling them into a document as the reader requests. In both cases,
structure makes the document easier to create, maintain, and display.
Document Structure
The document structure defines the elements which make up a document, the information you want
to collect about those elements, and the relationship those elements have to each other. You use XML
to mark up the document, following the structure you have decided upon.
By treating a document as a collection of elements, you free it from the constraints of time, place, and
presentation format. You can move the structured document from a word processor to a PDA to a web
browser. The structure is intact on each; you just alter the display characteristics for each device.
The document structure is called the document tree. The main trunk of the tree is the parent. All the
branches and leaves are children. Document trees are usually visually represented as a hierarchical
chart.
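To make the document tree a little more concrete, here is a minimal Python sketch that prints such a hierarchy using the standard library's xml.etree.ElementTree module. The sample document and the outline function are invented for this illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical document used only for this illustration.
DOCUMENT = """<book>
  <title>King Lear</title>
  <chapter>
    <heading>Act I</heading>
    <paragraph>Text goes here.</paragraph>
  </chapter>
</book>"""


def outline(element, depth=0):
    """Return one indented line per element, parents before children."""
    lines = ["  " * depth + element.tag]
    for child in element:
        lines.extend(outline(child, depth + 1))
    return lines


tree_lines = outline(ET.fromstring(DOCUMENT))
print("\n".join(tree_lines))
```

Each level of indentation in the output corresponds to one step down from the trunk toward the leaves.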
Structure vs. Format
The most important thing to remember about a structured document is that it is defined by the
elements it contains, not by how it looks.
Structure says that an element is a paragraph. Format says to display the paragraph in 12 point
Times.
Structure says the element is a book title. Format says to display the book title in green bold body text.
Structure says the element is a social security number. Format says to hide and not display the social
security number.
Learning to separate structure from format is critical in making good use of XML.
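One way to picture the separation is to keep the structure and the format in two different places. The Python sketch below does that with a tiny record and a table of display rules. The element names and the "show"/"hide" rule values are assumptions made for this example; in a real project the format side would be handled by CSS or XSL rather than a Python dictionary.

```python
import xml.etree.ElementTree as ET

# Structure: what each piece of content is (hypothetical sample data).
RECORD = ET.fromstring(
    "<person><name>Jane Doe</name><ssn>000-00-0000</ssn></person>"
)

# Format: how each kind of element is shown. These rule names are
# invented for the example; real projects would use CSS or XSL.
SCREEN_RULES = {"name": "show", "ssn": "hide"}


def render(record, rules):
    """Return the text of every child element whose rule is 'show'."""
    return [child.text for child in record if rules.get(child.tag) == "show"]


visible = render(RECORD, SCREEN_RULES)
print(visible)
```

Swapping in a different set of rules changes what is displayed without touching the document itself.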
Metadata
Metadata is data about data. A key use of XML is to collect and work with metadata.
At its most basic level, XML is a metadata language. That is, it is a way of assigning information to
pieces of data. The most obvious use of this is to identify a piece of data as a certain structural
element. But this is just the beginning.
XML is about much more than marking up documents for use in a web browser. XML is really about
adding layers of information to your data, so that the data can be processed, used, and transferred
between applications.
Metadata in HTML
If you've built a website, you've almost certainly worked with metadata. The keyword and description
meta tags are simple uses of metadata. With these meta tags you can assign the document as a
whole information about the general type of content it contains. This information doesn't display in a
web browser, but it does display in search engine results.
Another use of meta tags is to store information such as creator name and creation date. Some
servers are structured to work with these meta tags, allowing you to sort by creation date or display
based on creator name.
Going Further with XML
XML takes this basic idea much further. With XML, you can describe where you found your data, you
can quantify, qualify, and further define it. You can then use this metadata to validate information,
perform searches, set display constraints, or process other data.
Here are just a few examples:
- XML initiatives are under way which will allow for digital signature verification and validated
  form submission. This could make it possible for forms, with signatures, to be submitted online
  and be legally binding.
- XML initiatives are under way to help catalog web content. Using metadata, the web can be
  indexed better and searched more effectively.
- XML is being used to transfer data, based on factors such as date entered, between unlike
  databases. The metadata is both a means to find the correct data bits and a common
  language of transfer between databases which do not speak each other's specific language.
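Searching and sorting by metadata can be sketched in a few lines of Python with the standard library's xml.etree.ElementTree module. The catalog, the creator and created attribute names, and the by_creator helper are all hypothetical, chosen only to mirror the creator-name and creation-date ideas described earlier.

```python
import xml.etree.ElementTree as ET

# Hypothetical catalog; "creator" and "created" are metadata attributes.
CATALOG = ET.fromstring("""<catalog>
  <article creator="Lee" created="1998-03-01">XML basics</article>
  <article creator="Kim" created="1999-01-15">Styling with CSS</article>
  <article creator="Lee" created="1999-06-30">Using the DOM</article>
</catalog>""")


def by_creator(catalog, creator):
    """Return article titles by one creator, oldest first."""
    matches = [a for a in catalog.findall("article")
               if a.get("creator") == creator]
    # ISO dates (YYYY-MM-DD) sort correctly as plain strings.
    matches.sort(key=lambda a: a.get("created"))
    return [a.text for a in matches]


print(by_creator(CATALOG, "Lee"))
```

The article text is never inspected; the metadata alone decides which items are returned and in what order.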
The RDF Proposal
One W3C-blessed use of metadata which you may have heard about is a proposal called the
Resource Description Framework, or RDF. RDF is an application of XML for making metadata
machine-processable. It allows applications to exchange information about data automatically. This
has implications in indexing, content rating, intellectual ownership, e-commerce, and privacy, among
other things. The W3C says:
RDF with digital signature will be the key to building the "Web of
Trust" for electronic commerce, collaboration, and other
applications.
Display Issues
XML alone will not display a page. You must use a formatting technology, such as CSS or XSL to display
XML-tagged documents in a Web browser.
XML is about separating structure and format. An XML document doesn't know anything about how to
display itself. It relies on other technologies for this.
Although XML does not deal with form, it contains a great deal of information about the document and
its elements. This, when combined with style tools, gives you a whole new strength and flexibility in
displaying your documents without having to maintain multiple copies of the document.
XSL
Extensible Stylesheet Language, XSL, is the future of XML display. It is an XML-based language for
expressing stylesheets.
With XSL, you can make context-sensitive display decisions. For example, you could automatically
display the document one way in a Web browser and another on a PDA.
XSL can also transform XML into HTML, so that older browsers can view XML documents.
CSS
Cascading Style Sheets, CSS-1 and CSS-2, are the current way to display XML documents in a Web
browser. CSS is a means of assigning display values to page elements.
If you are going to be working with XML and you will be concerned with displaying pages, learn CSS.
The CSS Reference Guide contains a guide to the CSS-1 properties.
Behaviors
Behaviors are a non-standard, IE5 technique that lets you do some interesting display actions with
XML tags. They combine scripting and CSS in a component file. This component can be attached to a
particular tag and used in many different documents. The Behaviors Library shows some of the
things you can do with this technique.
The DOM
The Document Object Model lets you address, change, and manipulate any individual portion of the Web
page.
The phrase "document object model" means that you treat your document as a collection of individual
objects, rather than a single solid unit. The W3C DOM is the set of rules for doing this in a standard
way in a Web browser, with HTML and XML files.
O is for Object
In an object-oriented approach, the program or the document is made up of many smaller components
called objects. The smaller components can be re-arranged, added to, or removed dynamically.
The idea of objects has become quite popular in both software and documents. The programming
language Java and the scripting language JavaScript each has an object-oriented philosophy at its
core. The adoption of the standard DOM enables Web pages to share that object approach too.
With an object model, you manage the small pieces, combining them and reusing them as it makes
sense -- instead of writing one huge applications program or one huge document. You might think of
an object approach as being a little like a collection of Lego blocks ... different pieces do different
things, but you can combine and recombine them into many different finished projects.
Each object type acts as a template. You can use an instance of the same object over and over again.
For example, you might have multiple instances of the <canine> element in a document. All the
objects share the same name, canine, and work the same way, but each one represents its own set of
data and can be addressed individually.
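Here is a small sketch of that idea using Python's standard xml.dom.minidom module, which follows the W3C DOM naming. The kennel document is made up for the example.

```python
from xml.dom.minidom import parseString

# Hypothetical document with several instances of the same element type.
doc = parseString(
    "<kennel>"
    "<canine>Rex</canine>"
    "<canine>Fido</canine>"
    "<canine>Lassie</canine>"
    "</kennel>"
)

# Every instance shares the name "canine" but holds its own data.
canines = doc.getElementsByTagName("canine")
names = [node.firstChild.data for node in canines]
print(names)

# Instances can be addressed one at a time by position.
second = canines[1].firstChild.data
print(second)
```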
The API
It isn't enough to merely know that an object is an object. You also need to know how to talk to that
object and give it commands. That's where the API comes in.
API stands for Application Programming Interface. An API is a set of rules that describes how you
can access and manipulate an object. The DOM specification describes the API for HTML and XML
documents.
The DOM, by providing a standard API, defines the naming conventions, programming models, and
other rules for communicating with an object in an HTML or XML page.
Getting from XML to Objects
In an XML document, each element is actually an object -- it has a name and it has attributes that
describe it.
The browser, combined with a stylesheet, displays each of the XML elements/objects in a web page.
Because they are objects, you can address and change them individually.
Ah, but just knowing that every piece is an object isn't enough. You need to have a set of rules, an
API, to describe how to address those objects when they are placed in a web page. That's where the
DOM comes in.
The DOM does three things -- you might think of it as explaining the "who, what, and how" of the web
page.
1. First, it describes who -- which objects are in a web page and how are XML objects represented
there?
2. Second, it defines what -- what can these objects do and how do they work with others?
3. Third, it defines how -- how can these objects be addressed?
The DOM is the translator, the interface that lets all the pieces be represented properly, talk to each
other, and communicate with scripts and other action tools.
It is XML that lets you add and identify data, but it is the DOM that lets the script manipulate and
display that data on command in the web browser window.
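The sketch below shows a script manipulating data through DOM calls. It runs in Python with the standard xml.dom.minidom module rather than in a browser, so treat it as an illustration of the API style, not of browser scripting. The catalog document and the stock attribute are assumptions made for the example.

```python
from xml.dom.minidom import parseString

# Hypothetical catalog entry.
doc = parseString('<catalog><title stock="in">King Lear</title></catalog>')

title = doc.getElementsByTagName("title")[0]

# Address one object, then change its attribute and its text.
title.setAttribute("stock", "out")
title.firstChild.data = "King Lear (out of stock)"

result = doc.documentElement.toxml()
print(result)
```

The method names used here (getElementsByTagName, setAttribute, firstChild) come from the DOM specification, which is why the same vocabulary turns up in browser scripting languages too.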
Pulling It All Together
You'll typically be working with four technologies that combine to create an interactive Web page: XML
(or HTML), a scripting language, CSS, and the DOM. This illustration shows their relationship.
- XML identifies data. For example: "King Lear" is a title element.
- CSS stores information about display values for elements and delivers the information to
  the browser. For example: Titles are displayed in 18 point black courier type.
- The script "talks" to the objects and sends messages to and from the browser about the
  objects. Typically these are "change your display" or "do this" messages based on user
  actions or other variables. For example: If a particular title is out of stock, display it in red.
- The DOM provides the common interface through which various scripts and objects talk to
  one another and display in the Web browser.
- The browser displays the results to the end user.
If any of these pieces are missing, you can't create a dynamically-changing presentation of your
document.
Element
An element is the basic building block of HTML and XML documents.
Elements are identified by a tag. The tag consists of angle brackets and content, and looks like this:
<AUTHOR>Thadius J. Frog</AUTHOR>
In HTML, you use a pre-defined set of elements. In XML you create your own set of elements.
Attribute
Attributes are like adjectives, in that they further describe elements. Each attribute has a name and a
value.
Attributes are entered as part of the tag, like this:
<AUTHOR dob="1874">Thadius J. Frog</AUTHOR>
Tag
You use a tag to identify a piece of data by element name.
Tags usually appear in pairs, surrounding the data. The opening tag contains the element name. The
closing tag contains a slash and the element's name, like this:
<AUTHOR>Thadius J. Frog</AUTHOR>
Attribute Value
Each attribute contains an attribute value. The value might be a number, a word, or a URL.
Attribute values follow the attribute and an equal sign, like this:
<AUTHOR dob="1874">Thadius J. Frog</AUTHOR>
In XML, attribute values are always surrounded by quotation marks.
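To see how these terms map onto what a program receives, here is a short Python sketch that parses the AUTHOR example with the standard library's xml.etree.ElementTree module and prints each part.

```python
import xml.etree.ElementTree as ET

author = ET.fromstring('<AUTHOR dob="1874">Thadius J. Frog</AUTHOR>')

print(author.tag)         # element name
print(author.attrib)      # attribute names mapped to attribute values
print(author.get("dob"))  # one attribute value, always a string
print(author.text)        # the data between the opening and closing tags
```

Note that the attribute value arrives as the string "1874", not as a number; XML itself does not assign data types to attribute values.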
Declaration
You begin an XML file with an XML declaration. The declaration states that this is an XML file. The
XML declaration looks like this:
<?xml version="1.0"?>
DTD
Document Type Definition. The DTD defines the elements, attributes, and relationships between
elements for an XML document.
A DTD is a way to check that the document is structured correctly, but you do not need to use one in
order to use XML.
The XML Document
An XML file is a plain text file with XML markup tags. By convention it has a .xml extension, like this:
booklist.xml
Inside an XML File
An XML file contains three basic parts:
1. A declaration that announces that this is an XML file;
2. An optional definition about the type of document it is and what DTD it follows;
3. Content marked up with XML tags.
Types of XML Documents
There are two types of XML documents: well-formed or valid. The only difference between the two is
that one uses a DTD and the other doesn't.
Well-formed
Well-formed documents conform with XML syntax. They contain text and XML tags. Everything is
entered correctly. They do not, however, refer to a DTD.
Valid
Valid documents not only conform to XML syntax but they also are error checked against a Document
Type Definition (DTD). A DTD is a set of rules outlining which tags are allowed, what values those
tags may contain, and how the tags relate to each other.
Typically, you'll use a valid document when you have documents that require error checking, that use
an enforced structure, or are part of a company- or industry-wide environment in which many
documents need to follow the same guidelines.
DTDs
A Document Type Definition (DTD) is a set of rules that defines the elements, their attributes and
attribute values, and the relationships between elements in a document.
When your XML document is processed, it is compared to its associated DTD to be sure it is
structured correctly and all tags are used in the proper manner. This comparison process is called
validation and is performed by a tool called a parser.
Remember, you don't need to have a DTD to create an XML document; you only need a DTD for a
valid XML document.
Here are a few reasons you'd want to use a DTD:
- Your document is part of a larger document set and you want to ensure that the whole set
  follows the same rules.
- Your document must contain a specific set of data and you want to ensure that all required
  data has been included.
- Your document is used across your industry and needs to match other industry-specific
  documents.
- You want to be able to error check your document for accuracy of tag use.
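The "required data" reason in the list above can be illustrated with a toy check in Python. Be aware that this is not DTD validation: Python's standard XML modules do not validate against DTDs, so the sketch swaps in a hand-written dictionary of rules to stand in for what a validating parser would do. The element names, REQUIRED_CHILDREN, and missing_data are all invented for the example.

```python
import xml.etree.ElementTree as ET

# Hand-written stand-in for a DTD: for each element, the child elements
# it must contain. These names are invented for the example.
REQUIRED_CHILDREN = {
    "booklist": ["book"],
    "book": ["title", "author"],
}


def missing_data(element, rules):
    """Return messages for every required child that is absent."""
    problems = []
    present = {child.tag for child in element}
    for name in rules.get(element.tag, []):
        if name not in present:
            problems.append(f"<{element.tag}> is missing <{name}>")
    for child in element:
        problems.extend(missing_data(child, rules))
    return problems


good = ET.fromstring(
    "<booklist><book><title>Emma</title><author>Austen</author></book></booklist>"
)
bad = ET.fromstring("<booklist><book><title>Emma</title></book></booklist>")

print(missing_data(good, REQUIRED_CHILDREN))
print(missing_data(bad, REQUIRED_CHILDREN))
```

A real DTD expresses far more than this (ordering, repetition, allowed attribute values), but the principle is the same: the rules live outside the document and every document in the set is checked against them.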
Deciding on a DTD
Using a DTD doesn't necessarily mean you have to create one from scratch. There are a number of
existing DTDs, with more being added every day.
Shared DTDs
As XML becomes widespread, your industry association or company is likely to have one or more
published DTDs that you can use and link to. These DTDs define tags for elements that are commonly
used in your applications. You don't need to recreate these DTDs -- you just point to them in your
doctype tag in your XML file, and follow their rules when you create your XML document.
Some of these DTDs may be public DTDs, like the HTML DTD. Others may belong to your company.
If you are interested in using a DTD, ask around and see if there is a good match that already exists.
Create Your Own DTD
Another option is to create your own DTD. The DTD can be very simple and basic or it can be large
and complex. The DTD will be a reflection of the needs of your document.
It is perfectly acceptable to have a DTD with just four or five basic elements if that is what your
document needs. Don't feel that creating a DTD necessarily needs to be a huge undertaking.
However, if your documents are complex, do plan on setting aside time -- several days or several
weeks -- to understand the document and the document elements and create a solid DTD that will
really work for you over time.
Make an Internal DTD
You can insert DTD data within your DOCTYPE definition. If you've worked with CSS styles, you can
think of this as being a little like putting style data into your file header. DTDs inserted this way are
used in that specific XML document. This might be the approach to take if you want to validate the use
of a small number of tags in a single document or to make elements that will be used only for one
document.
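A minimal sketch of a document carrying its own internal DTD appears below, read back with Python's standard xml.dom.minidom module. The note, to, and body element names are hypothetical. The parser used here records the DTD but does not validate against it, so the sketch only shows where the rules live.

```python
from xml.dom.minidom import parseString

# The DTD rules sit inside the DOCTYPE definition, between [ and ].
# The element names are hypothetical.
SOURCE = """<?xml version="1.0"?>
<!DOCTYPE note [
  <!ELEMENT note (to, body)>
  <!ELEMENT to (#PCDATA)>
  <!ELEMENT body (#PCDATA)>
]>
<note><to>Ann</to><body>Lunch at noon?</body></note>"""

doc = parseString(SOURCE)

print(doc.doctype.name)            # name of the document type
print(doc.doctype.internalSubset)  # the rules embedded in this file
```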
Remember, the primary use for a DTD is to validate that the tags you enter in your XML document are
entered as specified in the DTD. It is an error-checking process that ensures your data conforms to a
set of rules.
XML Syntax
Tagging an XML document is, in many ways, similar to tagging an HTML document. Here are some of
the most important guidelines to follow.
Rule #1: Remember the XML declaration
This declaration goes at the beginning of the file and alerts the browser or other processing tools that
this document contains XML tags. The declaration looks like this:
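<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
The standalone value is either "yes" or "no", and the attributes must appear in this order: version,
encoding, standalone.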
You can leave out the encoding attribute and the processor will use the UTF-8 default.
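As a quick check of how a parser reads the declaration, the Python sketch below uses the standard xml.dom.minidom module to report the three values. It also feeds the parser a declaration with the attributes in a different order, since the XML specification fixes the order as version, encoding, standalone. The parses helper is a hypothetical name used only here.

```python
from xml.dom.minidom import parseString
from xml.parsers.expat import ExpatError

doc = parseString(
    b'<?xml version="1.0" encoding="UTF-8" standalone="yes"?><note/>'
)
print(doc.version, doc.encoding, doc.standalone)


def parses(data):
    """Return True if the parser accepts the document."""
    try:
        parseString(data)
        return True
    except ExpatError:
        return False


# Same attributes, but standalone is written before encoding.
wrong_order = b'<?xml version="1.0" standalone="yes" encoding="UTF-8"?><note/>'
print(parses(wrong_order))
```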
Rule #2: Do what the DTD instructs
If you are creating a valid XML file, one that is checked against a DTD, make sure you know what
tags are part of the DTD and use them appropriately in your document. Understand what each does
and when to use it. Know what the allowable values are for each. Follow those rules. The XML
document will validate against the specified DTD.
Rule #3: Watch your capitalization
XML is case-sensitive. <P> is not the same as <p>. Be consistent in how you define element names.
For example, use ALL CAPS, or use Initial caps, or use all lowercase. It is very easy to create mismatching case errors.
Also, make sure starting and ending tags use matching capitalization, too. If you start a paragraph
with the <P> tag, you must end it with the </P> tag, not a </p>.
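The effect of case-sensitivity is easy to see with a parser. This Python sketch uses the standard xml.etree.ElementTree module; the doc wrapper element is just a container invented for the example.

```python
import xml.etree.ElementTree as ET

# <P> and <p> are two different element names in XML.
root = ET.fromstring("<doc><P>upper</P><p>lower</p></doc>")
tags = [child.tag for child in root]
print(tags)

# A start-tag closed with different capitalization is a syntax error.
try:
    ET.fromstring("<doc><P>text</p></doc>")
    mismatch_accepted = True
except ET.ParseError as error:
    mismatch_accepted = False
    print("rejected:", error)
```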
Rule #4: Quote attribute values
In HTML there is some confusion over when to enclose attribute values in quotes. In XML the rule is
simple: enclose all attribute values in quotes, like this:
<NAME dob="1960">Ben Johnson</NAME>
Rule #5: Close all tags
In XML you must close all tags. This means that paragraphs must have corresponding end paragraph
tags. Anchor names must have corresponding anchor end tags. A strict interpretation of HTML says
we should have been doing this all along, but in reality, most of us haven't.
Rule #6: Close Empty tags, too
In HTML, empty tags, such as <br> or <img>, do not close. In XML, empty tags do close. You can
close them either by adding a separate close tag (</tagname>) or by combining the open and close
tags into one tag. You create the open/close tag by adding a slash, /, to the end of the tag, like this:
<br/>
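Both ways of closing an empty tag mean the same thing to a parser, as this small Python sketch with the standard xml.etree.ElementTree module suggests. It also tries the HTML habit of leaving the tag open.

```python
import xml.etree.ElementTree as ET

combined = ET.fromstring("<p>one<br/>two</p>")
separate = ET.fromstring("<p>one<br></br>two</p>")

# Both spellings produce the same empty <br> element.
same = ET.tostring(combined) == ET.tostring(separate)
print(same)

# The HTML habit of leaving <br> open is rejected by an XML parser.
try:
    ET.fromstring("<p>one<br>two</p>")
    unclosed_accepted = True
except ET.ParseError:
    unclosed_accepted = False
print(unclosed_accepted)
```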
Examples
This table shows some common HTML tags and how they would be treated in XML.
| Tag | Comment | End-Tag |
| --- | --- | --- |
| <P> | Technically, in HTML, you're supposed to close this tag. In XML, it's essential to close it. | </P> |
| <ELEMENT> | All Elements in XML must have a Start-tag and an end-tag. | </ELEMENT> |
| <LI> | This tag must be closed in XML in order to ensure a Well-Formed XML document. | </LI> |
| <META name="keywords" content="XML, SGML, HTML"> | META tags are considered empty elements in XML, and they must close. | <META name="keywords" content="XML, SGML, HTML"/> |
| <BR> | Break tags are considered empty elements. | <BR/> |
| <IMG src="coolpictures.html"> | This is an empty element tag. | <IMG src="coolpictures.html"/> |
Copyright © 1998-99
Well-formed XML
A document that conforms to the XML syntax rules is called "well-formed." If all your tags are correctly
formed and follow XML guidelines, then your document is considered a well-formed XML document.
That's one of the nice things about XML -- you don't need to have a DTD in order to use it.
Begin the Well-formed Document
To begin a well-formed document, type the XML declaration:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
If you are embedding XML, it will go after the <HTML> and <HEAD> tags, and before any JavaScript.
If you are creating an XML-only document, it will be the first thing in the file.
Version
You must include the version attribute for the XML declaration. The version is currently "1.0." Defining
the version lets the browser know that the document that follows is an XML document, using XML 1.0
structure and syntax.
Standalone
The next step is to declare that the document "stands alone." The application that is processing this
document knows that it doesn't need to look for a DTD and validate the XML tags.
Encoding
Declare the encoding of the document as well. Within the declaration, the encoding attribute goes
after version and before standalone. In this case, the encoding is UTF-8, which is the default
encoding for XML. You can leave off this attribute and the processor will default to UTF-8.
Remember the Root Element
After the declaration, enter the tag for the root element of your document. This is the top-most
element, under which all elements are grouped.
Follow XML Syntax
Now, enter the rest of your content. Remember to follow XML syntax:
- Remember that capitalization matters;
- Quote all attribute values;
- Close all tags;
- Remember to close empty tags too, like this:
<br/>
Pretty easy, isn't it? That's all there is to it!
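If you'd like a machine to confirm that a document is well-formed, any XML parser will do the job, because a parser must reject input that breaks the syntax rules. Here is a minimal Python sketch using the standard xml.etree.ElementTree module. The is_well_formed helper and the sample snippets are assumptions made for this example.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if text parses as XML; no DTD is consulted."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Hypothetical snippets, one per rule in the list above.
samples = {
    "correct": '<note id="1"><to>Ann</to><br/></note>',
    "mismatched case": "<note><to>Ann</TO></note>",
    "unquoted attribute": "<note id=1><to>Ann</to></note>",
    "unclosed tag": "<note><to>Ann</note>",
}

results = {label: is_well_formed(text) for label, text in samples.items()}
print(results)
```

Only the first snippet passes; each of the others breaks exactly one of the rules.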
Valid XML
A valid document conforms to the XML syntax rules and follows the guidelines of a Document Type
Definition (DTD).
The process of comparing the XML document to the DTD is called validation. This process is
performed by a tool called a parser.
Begin the Valid XML Document
To begin a valid document, type the XML declaration:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
If you are embedding XML, it will go after the <HTML> and <HEAD> tags, and before any JavaScript.
If you are creating an XML-only document, it will be the first thing in the file.
Version
You must include the version attribute for the XML declaration. The version is currently "1.0." Defining
the version lets the browser know that the document that follows is an XML document, using XML 1.0
structure and syntax.
Standalone
The standalone="no" attribute tells the computer that it must look for a DTD and validate the XML
tags.
Encoding
Finally, declare the encoding of the document. You can leave off this attribute and the processor will
default to UTF-8.
Create a DOCTYPE Definition
The second element in a valid XML document is the DOCTYPE definition. This identifies the type of
document and DTD in use.
If you look at HTML source files, you'll often see a !DOCTYPE definition, especially if the file was
created by a WYSIWYG tool. The DOCTYPE definition points to an HTML DTD.
In a valid XML file, !DOCTYPE tells the program that is processing your XML file two things: the name
of the type of document and the name and location of the DTD against which to validate the file's
contents.
The DOCTYPE definition looks like this:
<!DOCTYPE type-of-doc SYSTEM/PUBLIC "dtd-name">
!DOCTYPE
This says that you are defining the DOCTYPE.
type-of-doc
This is the name of the type of document contained in this file. Typically, this is the same name as the
DTD.
SYSTEM/PUBLIC
SYSTEM tells the processor to look for the private DTD at the following location. PUBLIC tells the
processor to look for a public DTD at the following location.
"dtd-name"
The URL after SYSTEM or PUBLIC is the name of the DTD file. By convention, DTD files end with the extension .dtd.
If you want, instead of pointing to an external DTD, you could place the DTD information within the
DOCTYPE definition, making it local to your XML document. You should do this only if you want to
define a few simple elements and you want them permanently attached to a particular document.
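To see the two pieces of information a processor picks up from the DOCTYPE definition, here is a small sketch using Python's standard xml.dom.minidom module. The file name booklist.dtd is an assumption made for the example; minidom only reads the definition and does not validate against the DTD.

```python
from xml.dom import minidom

# booklist.dtd is a hypothetical file name; it is never opened by this sketch.
SOURCE = b"""<?xml version="1.0" standalone="no"?>
<!DOCTYPE booklist SYSTEM "booklist.dtd">
<booklist><title>Tying Knots</title></booklist>"""

doc = minidom.parseString(SOURCE)
doctype = doc.doctype

# The type of document, and the location of the DTD to validate against.
print("type-of-doc:", doctype.name)
print("dtd-name:", doctype.systemId)
```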
Remember the Root Element
After the declaration, enter the tag for the root element of your document. This is the top-most
element, under which all elements are grouped.
Follow XML Syntax
Now, enter the rest of your content. Remember to follow XML syntax:
Remember that capitalization matters;
Quote all attribute values;
Close all tags;
Remember to close empty tags too, like this:
<br/>
Elements
Elements are the basic building blocks of XML (and HTML, for that matter). Each element is a piece of
data, identified by a tag. The tag contains the name of the element and any of its attributes, like this:
<AUTHOR dob="1864">Thadius J. Frog</AUTHOR>
Thadius J. Frog is now identified as an author element. This particular author element has a date of
birth (dob) attribute value of 1864.
Choose Your Own
XML is an extensible markup language. This means you create a set of elements that work for your
content -- and that you'll be able to use consistently within the document.
Whether you use a DTD or not, you'll still want to sit down and write a list of the element names that
you will be using in your document. XML is case-sensitive, so as you're thinking about the element
names, be sure to think about how you capitalize them also.
Select names that are both easy to remember and easy to type. Ideally, your tags should have
some inherent meaning too. This makes them easier to use. For example, if you want to identify "last
name" as an element, consider naming the element something like "last-name" or "surname."
Be consistent in your use of names. It is easier to apply one set of general rules to 20 different tags
than it is to remember eight discrete tags that follow no particular pattern. For example, if your
document is a listing of classes, you could use these elements:
<list-of-classes>
<name-class>
<instructor-name>
<Sec>
<TIME>
<descprt>
But you're just asking for confusion!
There's a mix of capitalization. There's a mix of abbreviation and full words. In one case the word
"name" is the first part of the tag; in the other it is the second part of the tag. There is no pattern
to help you remember this set of names.
Wouldn't names like this be easier to use?
<classlist>
<class>
<section>
<time>
<instructor>
<description>
These names are all lowercase, full words, no plurals -- an easy set of criteria to remember.
Focus on Structure, Not Format
One of the goals of using XML is to separate structure ("this is an author") from format ("display this in
10 point Helvetica"). Elements remain identified as elements, no matter what platform you move the
data to. An XML document is completely interpretable.
When you think about elements, think about the role they play and the data they contain. Don't think
about how the elements will look on the page. Appearance is handled separately.
You are using elements to identify data within your document as playing a certain role or belonging to
a certain category of data.
Displaying Elements
You can use any tag name you want, as long as you follow proper XML syntax. Of course, those tags
alone won't do anything. They will just sit there quietly, marking up your data.
After your data is marked up, you'll use style sheets or other processing tools to display the XML
document. You can control the display based on information contained in the elements.
Using Elements
In a well-formed XML document, you can insert any element tag you want, as long as you follow
proper syntax.
In a valid XML document, only the elements which are specified in the DTD will pass muster. If you
randomly add other elements, their use will be flagged as an error.
When you use elements in an XML document, you must follow standard XML syntax:

The element name surrounds the data which it defines. For example: <chapter-head>Tying
Knots</chapter-head>.
All elements, including empty elements, must end. This means having an open and close tag
for regular elements and a tag that closes with a slash for empty elements.
The element name is case sensitive: <AUTHOR> is not the same as <author>.
DTDs and Elements
One of the ways to define and codify all your elements is to create a DTD. A DTD defines what the
allowable elements are, what their attributes (if any) are, and what their relationship is to other elements.
By validating your XML document against a DTD, you can test to be sure that elements in the
document are being used correctly.
Attributes
Attributes provide additional information about elements.
You use elements and attributes all the time in HTML. For example, in HTML, a tag such as <H1
align="center"> includes an element, H1; an attribute, align; and an attribute value,
center.
In HTML, attributes allow you to specify additional information about your elements. Often this
information is formatting-related, such as align or size. In XML, attributes allow you to specify
additional data about an element, but it is never formatting-related. It is, instead, additional data about
that particular element.
Let's say, for example, you're creating documents about late 20th century popular music. In your DTD
you've created an element called <SONG> which identifies each musical title. You have music that falls
into different decade categories -- the 60s, the 70s and the 80s. You can give the song element an
attribute called era. Now, you'll be able to know from what era each song dates.
By using an attribute, you can identify different versions of the same song -- "I've Got You Babe" from
the 1960s and "I've Got You Babe" from the 1980s. Later on, you can use this data to display all 70s
songs in green, or to sort the displayed titles by era.
You would use the attribute like this:
<SONG era="60s">I've Got You Babe</SONG>
<SONG era="70s">Billy Don't Be a Hero</SONG>
<SONG era="80s">I've Got You Babe</SONG>
"I've Got You Babe" is identified as a "song" element with an "era" attribute value of "60s". "Billy Don't
Be A Hero" is identified as a "song" element with an "era" attribute value of "70s". "I've Got You Babe"
is identified as a "song" element with an "era" attribute value of "80s".
Attributes and their allowable values are created in your DTD, when you specify elements. They are
specified through an attribute list. Like element names, attribute names are case-sensitive, so be
aware of your use of capitalization when you select and use attribute names.
One other important thing to remember about attributes in XML tags is that the attribute values must
always be contained inside quotes. In HTML it's a mixed bag, but in XML the rule is easy to
remember: quote all attribute values.
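Here is a rough sketch of how a program might use the era attribute later on, written with Python's standard xml.etree.ElementTree module. The SONGS wrapper element is an assumption added so the three songs form one document.

```python
import xml.etree.ElementTree as ET

# <SONGS> is a hypothetical root element; the SONG lines come from the example above.
SOURCE = """<SONGS>
  <SONG era="60s">I've Got You Babe</SONG>
  <SONG era="70s">Billy Don't Be a Hero</SONG>
  <SONG era="80s">I've Got You Babe</SONG>
</SONGS>"""

root = ET.fromstring(SOURCE)


def titles_from(era):
    """Return the titles of all SONG elements whose era attribute matches."""
    return [song.text for song in root.iter("SONG") if song.get("era") == era]


# Sort the songs by the value of their era attribute.
by_era = sorted(root.iter("SONG"), key=lambda song: song.get("era"))

print(titles_from("70s"))
print([(song.get("era"), song.text) for song in by_era])
```

Because the era lives in an attribute, the two versions of the same title stay distinguishable even though their text is identical.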
Comments
Comments are a way to add your own notes to an XML document. The browser and the XML
processors will ignore anything inside comments.
You aren't going to remember what you were thinking three months later when you return to edit the
document, so don't be afraid to add comments as reminders or as markers of work that you have
done.
To create a comment:
1. Type a less than sign, followed by an exclamation point and two dashes like this:
<!--
2. Type the text you want inside the comment. Be sure the text DOES NOT contain two dashes!
<!--This defines a listing of books
3. Now, close the comment with two dashes and a closing greater than sign:
<!--This defines a listing of books-->
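To see a processor ignoring a comment, you can try a short sketch like this one, which uses Python's standard xml.etree.ElementTree module. The booklist and title elements are borrowed from later examples in this tutorial; the two short test strings are made up.

```python
import xml.etree.ElementTree as ET

SOURCE = """<booklist>
  <!--This defines a listing of books-->
  <title>Tying Knots</title>
</booklist>"""

root = ET.fromstring(SOURCE)

# The default parser drops comments, so the title is the only child it reports.
children = [child.tag for child in root]
print(children)


def parses(text):
    """Return True if the text is accepted by the parser."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Two dashes inside the comment text make the document unreadable.
print(parses("<a><!-- fine --></a>"))
print(parses("<a><!-- not -- fine --></a>"))
```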
CDATA
CDATA stands for "character data." Character data are letters, numbers and other symbols that are
used exactly as they are typed. They are not parsed or processed, or treated as if they have any
special meaning.
You can create a CDATA section within your XML document. A CDATA section is a handy way to show
code examples or to use characters, such as <, that would otherwise take on a special meaning. You
can use CDATA instead of using a series of &lt; entities, for example.
To create a CDATA section:
1. At the place in the document where you want the CDATA section to appear, begin a CDATA
definition with the less than sign and an exclamation point.
<!
2. Type an open square brace and the letters CDATA.
<![CDATA
3. Type another open square brace.
<![CDATA[
4. Now type the CDATA itself. In this example, we are typing some sample code.
<![CDATA[<NAME common="freddy" breed="springer-spaniel">Sir Fredrick
of Ledyard's End</NAME>
5. End the section with two closing square brackets and a greater than symbol.
<![CDATA[<NAME common="freddy" breed="springer-spaniel">Sir Fredrick
of Ledyard's End</NAME>]]>
Here is how a CDATA section might be used within a document:
<HEAD1>
Entering a Kennel Club Member
</HEAD1>
<DESCRIPTION>
Enter the member by the name on his or her papers. Use the NAME tag. The NAME
tag has two attributes. Common (all in lowercase, please!) is the dog's call
name. Breed (also in all lowercase) is the dog's breed. Please see the breed
reference guide for acceptable breeds. Your entry should look something like
this:
</DESCRIPTION>
<EXAMPLE>
<![CDATA[<NAME common="freddy" breed="springer-spaniel">Sir Fredrick of
Ledyard's End</NAME>]]>
</EXAMPLE>
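The following sketch, using Python's standard xml.etree.ElementTree module, shows what "not parsed" means in practice: the NAME tag inside the CDATA section comes back as plain text rather than as a child element.

```python
import xml.etree.ElementTree as ET

# The EXAMPLE element from above, written on one line.
SOURCE = (
    '<EXAMPLE><![CDATA[<NAME common="freddy" breed="springer-spaniel">'
    "Sir Fredrick of Ledyard's End</NAME>]]></EXAMPLE>"
)

example = ET.fromstring(SOURCE)

# No child elements were created from the markup inside the CDATA section.
print("child elements:", len(example))

# The markup is available exactly as typed, as the element's text.
print(example.text)
```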
Namespaces
Namespaces are a way of using elements from more than one DTD within the same XML document.
Sometimes you may be working with material that draws on several sets of element tags. For
example, you might have an online store selling tropical fish and you'd like to use the <SOURCE> tag
to identify both the geographic location from which each species comes and the wholesaler from
whom you buy it. Namespaces are a way to do this.
An XML namespace is a collection of names, identified by a URI reference, which are used in XML
documents as element types and attribute names. In practice, namespaces let you match a tag you
are using with a particular set of tags.
In the beginning of your document (or at the start of a particular element of your document), you
identify the namespaces you'll be using and where the tag information is located. Then, when you use
the tag to identify an element in your document, you precede it with the appropriate namespace name.
Declaring Namespaces
At the beginning of your document, you'll want to identify the namespaces you are using in your
document. This process is called declaring the namespace. In this example, you are creating a
namespace called "sales." The URI for sales is the mythical fishworld.org/schema:
<document xmlns:SALES='http://fishworld.org/schema'>
Using Namespaces
When you use the tag to create the element that is defined in one of the namespaces, the namespace
is the first part of the tag, like this:
<SALES:SOURCE>Fish-o-Rama Wholesalers and Suppliers to the Trade</SALES:SOURCE>
When you use your own tag you just use the tag name, like this:
<SOURCE>Mexico, Central America</SOURCE>
In January 1999, Namespaces became a W3C Recommendation.
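As a rough illustration of how a program tells the two SOURCE tags apart, here is a sketch using Python's standard xml.etree.ElementTree module. It reuses the declaration and the two elements shown above, placed together inside the document element.

```python
import xml.etree.ElementTree as ET

SOURCE = """<document xmlns:SALES='http://fishworld.org/schema'>
  <SALES:SOURCE>Fish-o-Rama Wholesalers and Suppliers to the Trade</SALES:SOURCE>
  <SOURCE>Mexico, Central America</SOURCE>
</document>"""

root = ET.fromstring(SOURCE)

# Map the prefix used in the search to the namespace URI.
namespaces = {"SALES": "http://fishworld.org/schema"}

wholesaler = root.find("SALES:SOURCE", namespaces)
origin = root.find("SOURCE")

# The parser stores the namespaced tag with its URI, not with the prefix.
print(wholesaler.tag, "->", wholesaler.text)
print(origin.tag, "->", origin.text)
```

The URI, not the SALES prefix, is what identifies the namespace; the prefix is only a shorthand within the document.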
XML Entities
An entity is a short cut to a set of information.
When you use an entity, it "expands" to its full meaning, but you need only type the shorter entity
name during data entry. You might think of an entity as being a bit like a macro -- it is a set of
information that can be used by calling one name.
XML defines two types of entities. The general entity, which we'll talk about here, is used in XML
documents. The parameter entity is used in DTDs. General entities are easy to spot: they begin with
the ampersand and end with the semicolon, like this:
&entity-name;
Uses for Entities
Entities are a way to make entering and managing data easier.
You've probably already used entities without calling them that. If you've ever entered the characters
&lt; to create the < symbol, you've used an entity. This keystroke combination is a standard predefined
entity in both HTML and XML that lets you access a particular ASCII character without having to
memorize the character set number.
Here are a few reasons you might want to define and use entities:
Entities save typing. Suppose you have a paragraph, like a copyright notice, that you use in
every single document. You could type that notice over and over again. Or, you could use an
entity to call it forth in place.
Entities can reduce errors. By the 101st time you type that copyright notice, it is likely your
poor fingers will be so tired you'll make an error and set your copyright for 1989 instead of
1999. Using an entity can reduce the potential for these types of errors.
Entities are easy to update. It is time to update that copyright notice -- with an entity you can
make the change in one place and be done with it. Without an entity you'd be searching and
replacing throughout your document set.
Entities can act as placeholders for TBD information. Maybe legal hasn't quite finalized
what they want that copyright notice to say. That doesn't have to stop production -- you can
use an entity and when the final wording comes down, the entity will automatically display the
new, corrected version in all your documents.
You can get quite creative with the use of entities, and even have documents that are constructed
entirely from entities. Here's an example:
You want to create different documents, each containing a set of bios for members of your staff. You'll
have an executive set, a set for each product line, a set for six different regions around the world ...
subsets of the same content appear in each.
One approach you could take is creating 10 or 12 separate flat files, with the appropriate biography
information in each. But an easier way is to create a small file for each bio, then call each into the
executive page, the European page, the Flying Toys Division page and so on via an entity.
Here's how the content code for your Flying Toys Division Page might look. Upon display, the entities
would expand and you'd see the full bios of each person. If you needed to change the bios, you could
do it in one place. If the product manager changed, all your pages would be automatically updated
with the new person.
<HEAD>The Faces Behind Flying Toys!</HEAD>
<BIO>&bio-ft-div-head;</BIO>
<BIO>&bio-ft-prod-mgr;</BIO>
<BIO>&bio-ft-designer;</BIO>
<BIO>&bio-ft-lead-engineer;</BIO>
Defining Entities
You can define entities in your local document as part of the DOCTYPE definition. You can also link to
external files that contain the entity data. This, too, is done through the DOCTYPE definition. A third
option is to define the entities in your external DTD.
Use a local definition when the entity is being used only in this one particular file. Use a linked,
external file when the entity is being used in many document sets.
To define an entity:
1. Start your DOCTYPE definition as usual, with the name of the type of document, like this:
<!DOCTYPE PRESSRELEASE
2. Now mark that you are defining some data by entering a square bracket:
<!DOCTYPE PRESSRELEASE [
3. Start the entity definition, with a less than sign, an exclamation mark, and the word ENTITY,
all in caps:
<!DOCTYPE PRESSRELEASE [
<!ENTITY
4. Type the name of the entity. Type it using the capitalization that you will use when calling it
later on.
<!DOCTYPE PRESSRELEASE [
<!ENTITY copyright
5. If you are defining the entity locally, type the value of the entity, surrounded by quotes, and
then close the entity definition with a greater than sign.
<!DOCTYPE PRESSRELEASE [
<!ENTITY copyright "Copyright 2000, As The World Spins Corp. All
rights reserved. Please do not copy or use without authorization. For
authorization contact legal@worldspins.com.">
6. If you are defining an entity in an external, ASCII text file, put in a pointer to the external file,
then close the entity definition with a greater than sign.
<!DOCTYPE PRESSRELEASE [
<!ENTITY copyright SYSTEM
"http://www.worldspins.com/legal/copyright.xml">
7. Create all your entity definitions. When you are done, close the DOCTYPE definition with a
square bracket and a greater than sign.
<!DOCTYPE PRESSRELEASE [
<!ENTITY copyright "Copyright 2000, As The World Spins Corp. All
rights reserved. Please do not copy or use without authorization. For
authorization contact legal@worldspins.com.">
<!ENTITY trademark SYSTEM
"http://www.worldspins.com/legal/trademark.xml">
]
>
Using Entities
To use an entity in your document, just call it by name. The name begins with an & and ends with a
semi-colon.
Here is a document that defines two entities and then uses them:
<?xml version="1.0"?>
<!DOCTYPE PRESSRELEASE [
<!ENTITY copyright "Copyright 2000, As The World Spins Corp. All rights
reserved. Please do not copy or use without authorization. For authorization
contact legal@worldspins.com.">
<!ENTITY trademark SYSTEM "http://www.worldspins.com/legal/trademark.xml">
]
>
<PRESSRELEASE>
<HEAD>Mini-globe revolutionizes keychain industry</HEAD>
<LEAD>
Today As The World Spins introduces a new approach to key chains. With the new
MINI-GLOBE keys can be kept inside a chain, called for upon demand, and stored
safely. Never more will consumers lose a key or stand at a door flipping
through a stack of keys seeking the right one.
</LEAD>
<LEGAL>
&trademark;
&copyright;
</LEGAL>
</PRESSRELEASE>
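To watch an entity expand, you can run a cut-down version of this press release through a parser. The sketch below uses Python's standard xml.etree.ElementTree module and covers only the locally defined copyright entity, with a shortened notice; the external trademark entity is left out because resolving it depends on the processor you use.

```python
import xml.etree.ElementTree as ET

# A shortened press release with one locally defined entity.
SOURCE = """<?xml version="1.0"?>
<!DOCTYPE PRESSRELEASE [
<!ENTITY copyright "Copyright 2000, As The World Spins Corp. All rights reserved.">
]>
<PRESSRELEASE>
<HEAD>Mini-globe revolutionizes keychain industry</HEAD>
<LEGAL>&copyright;</LEGAL>
</PRESSRELEASE>"""

root = ET.fromstring(SOURCE)

# By the time the program sees the LEGAL element, the entity has been expanded.
legal = root.find("LEGAL").text
print(legal)
```

Changing the wording inside the ENTITY definition changes every place that calls the entity by name.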
XML DTDs: Introduction
Valid XML documents follow a set of rules defined in an associated DTD. This Document Type Definition
defines elements, attributes, and relationships between elements.
DTDs are saved in an ASCII text file with the extension .dtd, like this:
mypage.dtd
When your XML document is processed, it is compared to its associated DTD to be sure it is
structured correctly and all tags are used in the proper manner. This comparison process is called
validation and is performed by a tool called a parser.
Remember, you don't need to have a DTD to create an XML document; you only need a DTD for a
valid XML document.
Before You Begin
There are a handful of terms you'll be hearing as you work with an XML DTD. Take a couple of
minutes to become familiar with them before you begin.
Schema
A schema is a description of the rules for data.
A schema does two things:
1. It defines the elements in a data set and their relationship to each other.
2. It defines the content that can be contained in each element.
DTDs are a schema for XML documents.
DTD
Document Type Definition. The DTD defines the elements, attributes, and relationships between
elements for an XML document.
A DTD is a way to check that the document is structured correctly, but you do not need to use one in
order to use XML.
Document Tree
A document tree is the representation of the hierarchy of elements in a document.
A document tree has one root element. All other elements are part of this top-level element. The first
tag in your XML document is always the root element.
Root Element
The root element is the top-most element in the hierarchy. All other elements in a document are
children of this element.
In an XML file, the first tag is the root element's tag.
In the DTD, the root element is the first element you should define.
Parent Element
A parent element is an element which contains other elements. The other elements are called
children.
For example, a list is a parent. The list items are children.
A parent element is sometimes referred to as a branch element. Each branch sprouts off the tree;
from the branch hang other branches and individual leaves. The branches and leaves "belong" to the
parent branch.
Child Element
The child element is a subset of the parent element.
An element may be both a parent and a child at the same time. For example, the list element may be
a child of the root element. At the same time it is the parent of the list item element.
If a child element is the outer-most element in the hierarchy and does not contain any other elements it
is sometimes called a leaf element.
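The terms root, parent, child, and leaf are easier to picture with a tree in front of you. This sketch uses Python's standard xml.etree.ElementTree module to walk a small document and label each element; the page, list, and item elements and their contents are invented for the example.

```python
import xml.etree.ElementTree as ET

# A hypothetical document: a root element holding a list with two list items.
SOURCE = "<page><list><item>Kites</item><item>Gliders</item></list></page>"

root = ET.fromstring(SOURCE)


def outline(element, depth=0):
    """Return (depth, tag, role) rows for an element and all of its descendants."""
    if depth == 0:
        role = "root"
    elif len(element) > 0:
        role = "parent"  # contains other elements, and is itself a child
    else:
        role = "leaf"  # contains no other elements
    rows = [(depth, element.tag, role)]
    for child in element:
        rows.extend(outline(child, depth + 1))
    return rows


rows = outline(root)
for depth, tag, role in rows:
    print("  " * depth + tag, "(" + role + ")")
```

Here the list element is both a child of the root and the parent of the two items, which are leaves.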
Parser
A parser is a software tool that checks to be sure a document follows a particular syntax.
XML parsers come in two varieties:
A non-validating parser checks a document to be sure XML syntax rules are followed and
builds a document tree from the element tags.
A validating parser checks the syntax, builds the tree, and compares the use of element tags
to be sure they conform with the rules specified in the document's associated DTD.
Parsers can be either external programs or part of the editing tool or browsing tool.
The XML Reference section includes a list of some of the XML parsers.
DTD Contents
A DTD is a way to ensure that an XML document uses elements correctly. It contains a set of rules.
When your XML document is processed, it is compared to its associated DTD to be sure it is
structured correctly and all tags are used in the proper manner.
A DTD:
Always contains rules that define elements.
Always contains rules that define the relationship between elements.
May contain rules that define attributes for elements, although not all elements have
attributes.
May contain rules that define entities.
May contain rules that define notations.
Finding a DTD
Using a DTD doesn't necessarily mean you have to create one from scratch. There are a number of
existing DTDs, with more being added every day.
Shared DTDs
As XML becomes widespread, your industry association or company is likely to have one or more
published DTDs that you can use and link to. These DTDs define tags for elements that are commonly
used in your applications. You don't need to recreate these DTDs -- you just point to them in the
DOCTYPE definition in your XML file, and follow their rules when you create your XML document.
Some of these DTDs may be public DTDs, like the HTML DTD. Others may belong to your company.
If you are interested in using a DTD, ask around and see if there is a good match that already exists.
Create Your Own External DTD
Another option is to create your own DTD. The DTD can be very simple and basic or it can be large
and complex. The DTD will be a reflection of the needs of your document.
It is perfectly acceptable to have a DTD with just four or five basic elements if that is what your
document needs. Don't feel that creating a DTD necessarily needs to be a huge undertaking.
However, if your documents are complex, do plan on setting aside time -- several days or several
weeks -- to understand the document and the document elements and create a solid DTD that will
really work for you over time. Remember, you'll be able to use this DTD with many individual
documents, so it is worth the time to think it through and craft it well.
Create Your Own Internal DTDs
You can insert DTD data within your DOCTYPE definition in an individual XML document. If you've
worked with CSS styles, you can think of this as being a little like putting style data into your file
header.
DTDs inserted this way are used in that specific XML document only. This might be the approach to
take if you want to validate the use of a small number of tags in a single document or to make
elements that will be used only for one document.
Internal DTDs
You can insert DTD data within your doctype declaration. This type of DTD is used only by the one
specific XML document that contains it.
This is a very simple example of DTD data within the doctype declaration:
<!DOCTYPE books [
<!ELEMENT title (#PCDATA)>
<!ELEMENT author (#PCDATA)>
<!ENTITY copyright "Copyright 1999, Flying Toys Inc., all rights reserved.">
]>
External DTDs
DTDs are stored as ASCII text files with the extension .dtd. Each file contains a series of element
definitions, attribute lists, entity definitions and notation definitions. Unlike an internal DTD, an
external DTD file holds only the definitions themselves; the DOCTYPE definition stays in the XML
document that points to the file. Here's an example; this might be the DTD for a set of documents
about books:
<!--This defines a listing of books-->
<!ELEMENT booklist (title, author)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT author (#PCDATA)>
<!ATTLIST title binding (paper|cloth|hard) "paper">
<!ENTITY copyright "Copyright 1999, Flying Toys Inc., all rights reserved.">
DTDs can be much more complex than this example -- and they typically are -- but this gives you a
sense of what they can do. It's just a matter of structuring your data and figuring out the "parts" of your
content.
Reading a DTD
Even if you don't plan to build a DTD from scratch, it is helpful to know how to read one and to
understand the document it is describing.
From reading a DTD you should be able to compile a list of elements and their attributes, and how and
when to use them. You should also be able to compile a list of entities that you can use within the
document.
Some people find it helpful to actually sketch out a document tree as they go through the DTD, to
visualize the structure of the document.
Check List
Here's a list of things to look for as you go through a DTD:
Read the Comments
Note the Basic Elements
Read the Element Declaration
Look for Parent/Child Relationships
Read Attribute Lists
Find Attribute Names for Each Element
Determine Attribute Value Types
See the Attribute's Default
Read Entity Declarations
Read the Comments
Read the comments! Comments can tell you a lot about the DTD, how to use it, and what to be aware
of when using it.
Most DTD authors will include information that you should know before using the DTD. This might
range from use restrictions to how-to information.
Comments look like this:
<!-- Here's a comment -->
Note the Basic Elements
Look through the DTD and identify the element names that comprise the document. Note how they
are capitalized. You might want to develop a reference sheet of elements, that you can make notes on
as you work your way through the DTD.
Elements begin like this:
<!ELEMENT
The text immediately after the element declaration is the element's name.
Read the Element Declaration
Each element declaration provides the name of the element and the content which it contains.
Sometimes the content is text. Other times it is other elements, arranged in a certain order or used a
certain number of times.
In these element declarations, EMPLOYEE contains the elements FIRST, MI, and LAST, in that order, and
each of those contains text:
<!ELEMENT EMPLOYEE (FIRST, MI, LAST)>
<!ELEMENT FIRST (#PCDATA)>
<!ELEMENT MI (#PCDATA)>
<!ELEMENT LAST (#PCDATA)>
Look for Parent/Child Relationships
The element rules build a hierarchy of elements, describing how one element is related to another. An
element that is contained within another is considered a child of the element in which it is contained.
Use these relationships to sketch out your document tree.
The parent/child relationship is defined in the content type portion of the element definition. If the
content type is another element, then those elements are children of the element whose definition you
are reading. For example: FIRST, MI, and LAST are children of EMPLOYEE:
<!ELEMENT EMPLOYEE (FIRST, MI, LAST)>
The DTD can require that the child elements be used in a certain order or that they be used one,
none, or many times. It can also group elements to create more detailed rules.
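If you would rather let a script do the first pass of sketching a document tree, something like the following could work. It is a rough sketch in Python that reads simple element declarations with a regular expression; it assumes one flat group per declaration and would need more work for nested groups. The function name children_by_parent is made up for this example.

```python
import re

DTD = """<!ELEMENT EMPLOYEE (FIRST, MI, LAST)>
<!ELEMENT FIRST (#PCDATA)>
<!ELEMENT MI (#PCDATA)>
<!ELEMENT LAST (#PCDATA)>"""

# Matches <!ELEMENT name (content)> with an optional ?, * or + after the group.
ELEMENT_RULE = re.compile(r"<!ELEMENT\s+(\S+)\s+\(([^)]*)\)[?*+]?\s*>")


def children_by_parent(dtd_text):
    """Map each declared element to the child element names in its content rule."""
    tree = {}
    for name, content in ELEMENT_RULE.findall(dtd_text):
        # Split on the separators and strip any occurrence marks from each name.
        parts = [part.strip().rstrip("?*+") for part in re.split(r"[,|]", content)]
        tree[name] = [part for part in parts if part and part != "#PCDATA"]
    return tree


tree = children_by_parent(DTD)
for parent, children in tree.items():
    print(parent, "->", children if children else "text only")
```

An element that maps to an empty list holds only text, so it is a leaf in the document tree.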
Read Attribute Lists
After element definitions, you may see attribute lists. An attribute list begins like this:
<!ATTLIST
Each attribute list defines the attributes for an element. Many attributes may be defined in one
ATTLIST.
The ATTLIST is structured like this:
<!ATTLIST element-name attribute-name attribute-type default-data>
See Which Element the Attribute Defines
Right after the ATTLIST declaration is the name of an element. This is the element that the attribute
list defines. For example, this ATTLIST defines the COMMENT element:
<!ATTLIST COMMENT attribute-name attribute-type default-data>
Find Attribute Names for Each Element
Following the element name is the name of the first attribute declared in this list. This name is the
attribute name you type into the element tag in the XML file. For example, this ATTLIST defines the
attribute "category" for the element COMMENT.
<!ATTLIST COMMENT category attribute-type default-data>
Add the attribute information to the element reference list you are building.
Determine Attribute Value Types
Attributes can be one of several different types. The attribute-type describes the type of value that the
attribute may contain. For example, this ATTLIST says that the "category" attribute for the element
COMMENT contains one of four values: red, green, blue, or other.
<!ATTLIST COMMENT category (red | green | blue | other) default-data>
See the Attribute's Default
The final part of the ATTLIST is the default value of the attribute. The default value has a strong
effect on how the attribute is used and what values it might have if you don't use it in the XML tag. You
can make the value required (#REQUIRED) or optional (#IMPLIED). Or, you can provide a default
value that will be used automatically if the attribute is not entered.
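Here is a small sketch of a default value at work, using Python's standard xml.parsers.expat module. The COMMENT element and its category attribute come from the examples above; the COMMENTS wrapper, the default of "other", and the comment text are assumptions made for the illustration.

```python
import xml.parsers.expat

# The ATTLIST gives category a default of "other" (an assumed default).
SOURCE = """<!DOCTYPE COMMENTS [
<!ATTLIST COMMENT category (red | green | blue | other) "other">
]>
<COMMENTS>
<COMMENT category="red">Check the dates.</COMMENT>
<COMMENT>No category was typed here.</COMMENT>
</COMMENTS>"""

seen = []


def start_element(name, attrs):
    # attrs holds the attributes typed in the tag plus any defaults from the ATTLIST.
    if name == "COMMENT":
        seen.append(attrs.get("category"))


parser = xml.parsers.expat.ParserCreate()
parser.StartElementHandler = start_element
parser.Parse(SOURCE, True)

print(seen)
```

The second COMMENT never mentions category, yet the program still receives the default value for it.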
Read Entity Declarations
Along with element and attribute definitions, you may also see entity definitions. Typically, these will
appear in a group, often at the beginning of the DTD, and usually with explanatory comments.
An entity definition begins like this:
<!ENTITY
After the declaration is the entity's name and the contents of the entity. The contents may be text or it
may be a pointer to another external file. For example, this defines two entities, one called "copyright"
and one called "trademark." Copyright is defined within the definition, while trademark points to
another file.
<!ENTITY copyright "Copyright 2000, As The World Spins Corp. All rights
reserved. Please do not copy or use without authorization. For authorization
contact legal@worldspins.com.">
<!ENTITY trademark SYSTEM "http://www.worldspins.com/legal/trademark.xml">
Making Elements
Elements are the basic building blocks of XML. You define elements in a DTD; you use them in a
document. A basic element definition looks like this:
<!ELEMENT DESCRIPTION (#PCDATA | DEFINITION)*>
Element Declaration
Each element begins with an element declaration, <!ELEMENT. This announces that you are defining
an element.
Element Name
After the declaration is the element's name. The way the name appears in the element definition is
exactly the way it must be used in the XML document. Capitalization counts!
Element Rule
After the name comes a rule that describes what the element can contain. Through this description,
the elements take on hierarchical relationships with each other.
Although the basic bits of the rules are simple, they can be grouped and combined to create quite
complex definitions.
This table summarizes the element rule definitions.
Contents
Elements can contain text, other elements, a combination of text and other elements, or they may be
empty.
Text. Elements can contain textual data.
Other Elements. Elements can contain only other specified elements and no text. The contained
elements are called children of the containing element. The containing element is the parent of the child
elements.
Combination. Elements can contain a mix of textual data and other specific elements.
Empty. Empty elements get their value from their attributes. An empty element will typically have at
least one attribute. In HTML, the IMG tag is a good example of an empty element. It gets its value
from the src attribute.
Number of Occurrences
You can specify the number of times a child element is used within its parent.
Once and only once. The element listed by itself indicates that it can be used once and only once:
DTD definition:
<!ELEMENT EVENTLIST (EVENT)>
Used in document:
<EVENTLIST>
<EVENT>Balsa Wood Flyer Days</EVENT>
</EVENTLIST>
At least once, or many times. The element followed by a plus sign indicates that this element must be
used at least once and can be used many times within the parent:
DTD definition:
<!ELEMENT EVENTLIST (EVENT+)>
Used in document:
<EVENTLIST>
<EVENT>Balsa Wood Flyer Days</EVENT>
<EVENT>Sundays in the Park</EVENT>
<EVENT>Teach Your Child to Fly</EVENT>
</EVENTLIST>
Once or not at all. The element followed by a question mark indicates that this element can be used
either one time or not at all:
DTD definition:
<!ELEMENT EVENT (LOCATION, SPONSOR?)>
Used in document:
<EVENT>
<LOCATION>West Bay Ballpark</LOCATION>
</EVENT>
or
<EVENT>
<LOCATION>West Bay Ballpark</LOCATION>
<SPONSOR>Flying Toys</SPONSOR>
</EVENT>
Once, not at all, or as many times as you want. The element followed by an asterisk indicates that
this element can be omitted or used as many times as needed.
DTD definition:
<!ELEMENT EVENT (LOCATION*, EVENT-NAME)>
Used in document:
<EVENT>
<LOCATION>West Bay Ballpark</LOCATION>
<LOCATION>North Side Park</LOCATION>
<EVENT-NAME>Sundays in the Park</EVENT-NAME>
</EVENT>
or
<EVENT>
<EVENT-NAME>Sundays in the Park</EVENT-NAME>
</EVENT>
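The four occurrence rules boil down to a minimum and a maximum count for each child element. As a rough illustration only, the Python sketch below counts direct children using the standard library's xml.etree.ElementTree module. The OCCURRENCE table and the count_allowed function are names invented for this example; a real validating parser checks the whole element rule, including order, rather than one child at a time.

```python
import xml.etree.ElementTree as ET

# Occurrence indicator -> (minimum, maximum) uses; None means "no upper limit".
# The empty string stands for an element name listed by itself.
OCCURRENCE = {"": (1, 1), "?": (0, 1), "+": (1, None), "*": (0, None)}


def count_allowed(parent_xml, child_name, indicator):
    """Return True if child_name occurs an allowed number of times in the parent."""
    low, high = OCCURRENCE[indicator]
    parent = ET.fromstring(parent_xml)
    # Count only direct children, because the rule describes the parent's contents.
    count = sum(1 for child in parent if child.tag == child_name)
    return count >= low and (high is None or count <= high)


events = "<EVENTLIST><EVENT>Balsa Wood Flyer Days</EVENT><EVENT>Sundays in the Park</EVENT></EVENTLIST>"
print(count_allowed(events, "EVENT", "+"))   # two EVENT children satisfy EVENT+
print(count_allowed(events, "EVENT", ""))    # but not a bare EVENT (exactly once)
```

The same table makes it easy to see why SPONSOR? accepts an EVENT with no SPONSOR at all: its minimum is zero.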
Order
You can specify the order in which child elements appear.
Specific order. Child elements can be defined to be used in a specific order. The comma (,)
separates elements that are listed in a specific order. For example, you could set a rule that creates
an EVENTLIST. In the list, you must always use the EVENT element, followed by the SPONSOR
element.
DTD definition:
<!ELEMENT EVENTLIST (EVENT, SPONSOR)>
Used in document:
<EVENTLIST>
<EVENT>Balsa Wood Plane Days</EVENT>
<SPONSOR>Flying Toys</SPONSOR>
</EVENTLIST>
Either Or. You can define child elements so that one or another can be used. The bar (|) separates
either-or choices.
DTD definition:
<!ELEMENT EVENT (EVENT-NAME | SPONSOR)>
Used in document:
<EVENT>
<EVENT-NAME>Balsa Wood Plane Days</EVENT-NAME>
</EVENT>
or
<EVENT>
<SPONSOR>Flying Toys</SPONSOR>
</EVENT>
Groups
Groups can be used to create complex rules that combine elements and different usage options.
For example, when groups are combined with a "use many times" symbol, you can create a rule that
allows multiple uses of elements -- either in any order or as repeated sets. For example, here the
element EVENTLIST can contain multiple sets of EVENT and SPONSOR groups:
DTD definition:
<!ELEMENT EVENTLIST (EVENT, SPONSOR)*>
Used in document:
<EVENTLIST>
<EVENT>Balsa Wood Plane Days</EVENT>
<SPONSOR>Flying Toys</SPONSOR>
<EVENT>Sundays in the Park</EVENT>
<SPONSOR>Deer Island Recreation Department</SPONSOR>
</EVENTLIST>
Here, the EVENTLIST can contain either the EVENT element or the SPONSOR element, but this
either-or group can be used many times.
DTD definition:
<!ELEMENT EVENTLIST (EVENT | SPONSOR)*>
Used in document:
<EVENTLIST>
<EVENT>Balsa Wood Plane Days</EVENT>
<SPONSOR>Flying Toys</SPONSOR>
<SPONSOR>Deer Island Recreation Department</SPONSOR>
</EVENTLIST>
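Commas, bars, groups, and occurrence symbols combine much as they do in a regular expression over child element names. The Python sketch below leans on that similarity: it rewrites an element rule as a regular expression and tests the sequence of children against it. The functions model_to_regex and children_match are hypothetical helpers written for this illustration, and they handle element content only (no #PCDATA, EMPTY, or ANY). This is not how any particular parser is implemented.

```python
import re
import xml.etree.ElementTree as ET

# An element name as it may appear inside a rule, for example EVENT-NAME.
NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_.\-]*")


def model_to_regex(model):
    """Turn a rule such as "(EVENT, SPONSOR)*" into a compiled regular expression."""
    compact = "".join(model.split())  # drop all whitespace from the rule
    # Each element name becomes a group that matches "NAME " (the name plus a space).
    pattern = NAME.sub(lambda m: "(?:" + re.escape(m.group(0)) + " )", compact)
    # A comma means "followed by", which in a regular expression is plain concatenation.
    return re.compile(pattern.replace(",", ""))


def children_match(xml_text, model):
    """Return True if the root element's children satisfy the rule."""
    root = ET.fromstring(xml_text)
    sequence = "".join(child.tag + " " for child in root)
    return model_to_regex(model).fullmatch(sequence) is not None


pairs = ("<EVENTLIST><EVENT>Balsa Wood Plane Days</EVENT><SPONSOR>Flying Toys</SPONSOR>"
         "<EVENT>Sundays in the Park</EVENT><SPONSOR>Deer Island</SPONSOR></EVENTLIST>")
print(children_match(pairs, "(EVENT, SPONSOR)*"))   # repeated EVENT/SPONSOR sets
print(children_match(pairs, "(EVENT, SPONSOR)"))    # only one set is allowed here
```

Trying the same document against (EVENT | SPONSOR)* shows the difference between the two group rules: the either-or form accepts the children in any order, while the comma form insists on strict EVENT, SPONSOR pairs.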
Hints for Element Names
Select names that are both easy to remember and easy to type.
Give your tags some inherent meaning. For example, if you want to identify "last
name" as an element, consider naming the element something like "last-name" or "surname."
Use names that are consistent with current processes. If people call "social security number"
SSN, create an element called SSN. Don't create an unfamiliar "socsecnum" element.
Be consistent in your use of names. It is easier to apply one set of general rules to 20 different
tags than it is to remember eight discrete tags that follow no particular pattern.
Attribute Lists
Elements can have attributes, which describe the element in more detail. When you create an element
in your DTD, you can also create an attribute list for the element.
Attribute lists define the name, data type, and default value (if any) of each attribute associated with
an element.
In this very simple example, we're adding some attributes to the title element from our book list. We
want to be able to specify the edition date and whether the book is paperback or hardcover.
<!--This defines a listing of books-->
<!DOCTYPE booklist [
<!ELEMENT booklist (title, author)>
<!ELEMENT title (#PCDATA)>
<!ATTLIST title
edition CDATA #REQUIRED
type (paper|cloth|hard) "paper">
<!ELEMENT author (#PCDATA)>
]
>
Here's how you'd use these attributes in an XML file. Notice the use of the edition attributes in
each title tag. Notice how one title tag also uses the type attribute to indicate that this book is a
hardcover title.
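As a minimal sketch of such a file, the Python snippet below embeds a small book list and reads it with the standard library's expat binding (xml.parsers.expat). The book titles and author names are invented for illustration, and the booklist rule is written as (title, author)+ here, an assumption made so the list can hold two books. The DTD is placed inside the document so that the parser can see the attribute list.

```python
import xml.parsers.expat

DOCUMENT = """<?xml version="1.0"?>
<!DOCTYPE booklist [
<!ELEMENT booklist (title, author)+>
<!ELEMENT title (#PCDATA)>
<!ATTLIST title
  edition CDATA #REQUIRED
  type (paper|cloth|hard) "paper">
<!ELEMENT author (#PCDATA)>
]>
<booklist>
  <title edition="1998">First Sample Book</title>
  <author>A. Writer</author>
  <title edition="1999" type="hard">Second Sample Book</title>
  <author>B. Author</author>
</booklist>
"""


def title_attributes(xml_text):
    """Return one attribute dictionary per title element, in document order."""
    found = []

    def start_element(name, attrs):
        if name == "title":
            found.append(dict(attrs))

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = start_element
    parser.Parse(xml_text, True)
    return found


for attributes in title_attributes(DOCUMENT):
    print(attributes)
```

The first title never mentions type, yet the parser reports type as "paper" because that is the default declared in the attribute list. The second title overrides the default with "hard".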
Attribute Types
Attributes can have one of several different types of data, but the two most common are:
CDATA. Character data. This allows the attribute value to be textual data. You use it like this:
<!ATTLIST edition date CDATA #REQUIRED>
Pre-defined values. You can list a set of specific values that the attribute can have. The value set
is enclosed in parentheses and each value is separated with a vertical bar, like this:
<!ATTLIST edition type (paper|hard|cloth) "paper">
Default Values
You can specify a default value for the attribute, or make the attribute required or optional. The
default value has a strong effect on how the attribute is used and what values it might have if you
don't use it in the XML tag.
#REQUIRED: the attribute must have a value every time the element is listed. You specify that an
attribute is required like this:
<!ATTLIST edition date CDATA #REQUIRED>
#IMPLIED: the processor ignores this attribute unless it is used as part of the element. It does not
assume any default value.
#FIXED value: the attribute is not required for the element, but if it occurs, it must have the specified
value; if it is omitted, the processor supplies that value. For example, if the new attribute is used, it
must have the value of "yes":
<!ATTLIST edition new CDATA #FIXED "yes">
"defaultvalue": a value in quotes provides a default value for that attribute. If the attribute is not included in the
element, the processing program assumes that this is the attribute's value. For example, this gives the
type attribute a default value of "hard":
<!ATTLIST edition type (paper|cloth|hard) "hard">
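To see what a processor actually hands to an application for each kind of default, here is a small Python sketch using the standard library's expat binding. The three attributes are gathered into one attribute list for convenience, and date is declared #IMPLIED rather than #REQUIRED so that all three behaviors show up at once; both choices are assumptions made for the example. The helper name edition_attributes is invented.

```python
import xml.parsers.expat

# An internal DTD that declares one optional, one fixed, and one defaulted attribute.
DTD = """<!DOCTYPE edition [
<!ELEMENT edition (#PCDATA)>
<!ATTLIST edition
  date CDATA #IMPLIED
  new  CDATA #FIXED "yes"
  type (paper|cloth|hard) "hard">
]>
"""


def edition_attributes(element_text):
    """Parse one edition element with the DTD above and return its attributes."""
    result = {}

    def start_element(name, attrs):
        result.update(attrs)

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = start_element
    parser.Parse(DTD + element_text, True)
    return result


print(edition_attributes("<edition>Second</edition>"))
print(edition_attributes('<edition date="2000" type="paper">Third</edition>'))
```

In the first call the application sees new and type even though the tag carries no attributes, while date is simply absent. Keep in mind that expat is a non-validating parser: it fills in defaults, but it is not the tool that reports a missing #REQUIRED attribute. That check belongs to a validating parser.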
Entities
An entity is a shortcut to a set of information.
When you use an entity, it "expands" to its full meaning, but you need only type the shorter entity
name during data entry. You might think of an entity as being a bit like a macro -- it is a set of
information that can be used by calling one name.
XML defines two types of entities.
The general entity is one that you define in a DTD and use in a document. General entities are
easy to spot. They are defined with the entity declaration, <!ENTITY, and when they are used they
begin with the ampersand and end with the semicolon, like this:
&entity-name;
The parameter entity is one that you define and use within a DTD. The content of a parameter
entity may be either included in the DTD or stored in an external file. In addition, parameter entities
must be parsed; they cannot be unparsed. That is, they must contain textual data that is processed
rather than a GIF or other non-textual data type.
It too is defined with an entity declaration, but it is called with a percent sign, like this:
%info;
Defining a General Entity
To define an entity:
1. Start the entity definition with a less than sign, an exclamation mark, and the keyword ENTITY,
all in caps:
<!ENTITY
2. Type the name of the entity. Type it using the capitalization that you will use when calling it
later on.
<!ENTITY copyright
3. If you are defining the entity locally, type the value of the entity, surrounded by quotes, and
then close the entity definition with a greater than sign.
<!ENTITY copyright "Copyright 2000, As The World Spins Corp. All
rights reserved. Please do not copy or use without authorization. For
authorization contact legal@worldspins.com.">
4. If you are defining an entity in an external ASCII text file, put in a pointer to the external file,
then close the entity definition with a greater than sign.
<!ENTITY copyright SYSTEM
"http://www.worldspins.com/legal/copyright.xml">
Using a General Entity
You won't be using a general entity in a DTD. You will only be defining it here. You will be using it in
an XML file, where it is called by typing an ampersand, the entity name, and a semicolon: &entityname;
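The following Python sketch shows the expansion happening. It uses the standard library's xml.etree.ElementTree module, which expands general entities that are declared inside the document's own DTD. The notice element and the shortened copyright text are assumptions made for the example; external (SYSTEM) entities are handled differently from parser to parser and are not shown.

```python
import xml.etree.ElementTree as ET

DOCUMENT = """<!DOCTYPE notice [
<!ELEMENT notice (#PCDATA)>
<!ENTITY copyright "Copyright 2000, As The World Spins Corp. All rights reserved.">
]>
<notice>&copyright; Reprinted with permission.</notice>
"""


def notice_text(xml_text):
    """Return the text of the root element after entity expansion."""
    return ET.fromstring(xml_text).text


print(notice_text(DOCUMENT))
```

The author typed only &copyright; in the document, but the application receives the full sentence from the entity definition followed by the rest of the text.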
Defining a Parameter Entity
To declare a parameter entity:
1. Type the entity declaration:
<!ENTITY
2. Type a space, followed by a percent sign. It is important to remember the space!
<!ENTITY %
3. Type another space, followed by the name of the entity:
<!ENTITY % info
4. Type the value of the entity, surrounded by quotation marks:
<!ENTITY % info "name CDATA #REQUIRED gender (m | f) #REQUIRED color
(red | fawn | merle | black | other) #REQUIRED"
5. End the declaration with a greater than sign.
<!ENTITY % info "name CDATA #REQUIRED gender (m | f) #REQUIRED color
(red | fawn | merle | black | other) #REQUIRED">
One thing to notice about entities in a DTD is that when they are defined there is a space between the
percent sign and the entity name--but when the entity is used there is no space between the percent
sign and the entity name.
Using a Parameter Entity
It is quite simple to use a parameter entity. Simply enter the entity name, preceded by a percent sign
and followed by a semi-colon, like this:
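<!ELEMENT HOUND (NAME)>
<!ATTLIST HOUND %info;>
<!ELEMENT WORKING (NAME)>
<!ATTLIST WORKING %info;>
<!ELEMENT COMPANION (NAME)>
<!ATTLIST COMPANION %info;>
When the DTD is processed, the entity will be expanded. In this example, %info; will be replaced with
a set of attribute data, which was defined in the info entity declaration. Note that XML allows a
parameter entity reference inside a declaration, as shown here, only in an external DTD file, not in a
DTD embedded in the document itself.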
Again, remember that when a parameter entity is defined, there is a space between the percent sign
and the entity name--but when the entity is used there is no space between the percent sign and the
entity name.
XML Parsers
Parsing is the process of checking the syntax of your document and creating the "tree structure." If
you are using a validating parser, the process will also compare the XML file to its DTD.
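As a small illustration of both halves of that definition, the Python sketch below parses a document with the standard library's xml.etree.ElementTree module and prints the resulting tree as an indented outline. The outline function is a name invented here. ElementTree checks syntax only; it is not a validating parser and does not compare the file to a DTD.

```python
import xml.etree.ElementTree as ET


def outline(xml_text):
    """Parse xml_text and return its tree as a list of indented element names."""
    def walk(element, depth):
        lines = ["  " * depth + element.tag]
        for child in element:
            lines.extend(walk(child, depth + 1))
        return lines

    return walk(ET.fromstring(xml_text), 0)


good = "<EVENTLIST><EVENT><LOCATION>West Bay Ballpark</LOCATION></EVENT></EVENTLIST>"
print("\n".join(outline(good)))

# A syntax error stops the parse before any tree is built.
try:
    outline("<EVENT><LOCATION>West Bay Ballpark</EVENT>")
except ET.ParseError as error:
    print("Not well-formed:", error)
```

The second document fails because LOCATION is never closed, which is exactly the kind of problem a parser is meant to catch before any other tool sees the file.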
On-line Parsers
There are a number of online parsers. To use these, you typically type in the URI of your file and tell
the process to begin.
Online validating parser, from the W3C
The W3C offers an online parser. Type the URL of the file into the form and the XML file is
both parsed and validated.
Validating Parser from Brown University Scholarly Technology Group
This is the most easily accessible and understandable presentation of the online parsers.
Downloadable Parsers
There are many parsers that you can download and run on your local machine. Most of these require
you to have either a Windows or UNIX machine. They are written in a variety of languages; this is a
cross section of some of the many which are available.
James Clark's expat parser
James Clark is almost a brand in the SGML/XML world. His rendition of an XML parser is
widely used.
Java-based Validating XML Parser
From IBM's AlphaWorks group, this parser claims to be 100% pure Java.
Microsoft XML Parser in C++
A parser from Microsoft.
XML Parser written in Python
This is a validating parser.
XML Parser written in JavaScript
This parser is non-validating and checks XML syntax only.
SiRPAC, Simple RDF Parser and Compiler
From the W3C.
XML Syntax
Tagging an XML document is, in many ways, similar to tagging an HTML document. Here are some of
the most important guidelines to follow.
Rule #1: Remember the XML declaration
This declaration goes at the beginning of the file and alerts the browser or other processing tools that
this document contains XML tags. The declaration looks like this:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
The attributes must appear in this order, and the value of standalone is either "yes" or "no".
You can leave out the encoding attribute and the processor will use the UTF-8 default.
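If you want to check what a processor reads from the declaration, the Python sketch below uses the XmlDeclHandler callback of the standard library's expat binding. The function name read_declaration and the NOTE element are invented for the example.

```python
import xml.parsers.expat


def read_declaration(xml_bytes):
    """Return the version, encoding, and standalone values from the XML declaration."""
    seen = {}

    def xml_decl(version, encoding, standalone):
        seen["version"] = version
        seen["encoding"] = encoding
        # expat reports 1 for "yes", 0 for "no", and -1 when standalone is left out.
        seen["standalone"] = standalone

    parser = xml.parsers.expat.ParserCreate()
    parser.XmlDeclHandler = xml_decl
    parser.Parse(xml_bytes, True)
    return seen


print(read_declaration(b'<?xml version="1.0" encoding="UTF-8" standalone="yes"?><NOTE/>'))
print(read_declaration(b'<?xml version="1.0"?><NOTE/>'))
```

The second call shows the short form of the declaration, where both encoding and standalone are omitted and the processor falls back on its defaults.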
Rule #2: Do what the DTD instructs
If you are creating a valid XML file, one that is checked against a DTD, make sure you know what
tags are part of the DTD and use them appropriately in your document. Understand what each does
and when to use it. Know what the allowable values are for each. Follow those rules, and the XML
document will validate against the specified DTD.
Rule #3: Watch your capitalization
XML is case-sensitive. <P> is not the same as <p>. Be consistent in how you define element names.
For example, use ALL CAPS, or use Initial caps, or use all lowercase. It is very easy to create mismatching case errors.
Also, make sure starting and ending tags use matching capitalization, too. If you start a paragraph
with the <P> tag, you must end it with the </P> tag, not a </p>.
Rule #4: Quote attribute values
In HTML there is some confusion over when to enclose attribute values in quotes. In XML the rule is
simple: enclose all attribute values in quotes, like this:
<NAME dob="1960">Ben Johnson</NAME>
Rule #5: Close all tags
In XML you must close all tags. This means that paragraphs must have corresponding end paragraph
tags. Anchor names must have corresponding anchor end tags. A strict interpretation of HTML says
we should have been doing this all along, but in reality, most of us haven't.
Rule #6: Close Empty tags, too
In HTML, empty tags, such as <br> or <img>, do not close. In XML, empty tags do close. You can
close them either by adding a separate close tag (</tagname>) or by combining the open and close
tags into one tag. You create the open/close tag by adding a slash, /, to the end of the tag, like this:
<br/>
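Rules 3 through 6 are all things a parser checks for you. The Python sketch below feeds a few one-line samples to the standard library's xml.etree.ElementTree module and reports which ones it accepts. The is_well_formed helper and the sample labels are invented for this illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if xml_text parses without a syntax error."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


SAMPLES = {
    "matching case": "<P>Some text</P>",
    "mismatched case": "<P>Some text</p>",
    "quoted attribute": '<NAME dob="1960">Ben Johnson</NAME>',
    "unquoted attribute": "<NAME dob=1960>Ben Johnson</NAME>",
    "closed empty tag": "<P>one<br/>two</P>",
    "unclosed empty tag": "<P>one<br>two</P>",
}

for label, sample in SAMPLES.items():
    print(label, "->", is_well_formed(sample))
```

Each pair differs by a single detail, which makes it easy to see that the HTML habit in the second sample of each pair is what the XML parser rejects.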
Examples
This table shows some common HTML tags and how they would be treated in XML.

Tag: <P>
Comment: Technically, in HTML, you're supposed to close this tag. In XML, it's essential to close it.
End-Tag: </P>

Tag: <ELEMENT>
Comment: All elements in XML must have a start-tag and an end-tag.
End-Tag: </ELEMENT>

Tag: <LI>
Comment: This tag must be closed in XML in order to ensure a well-formed XML document.
End-Tag: </LI>

Tag: <META name="keywords" content="XML, SGML, HTML">
Comment: META tags are considered empty elements in XML, and they must close.
End-Tag: <META name="keywords" content="XML, SGML, HTML"/>

Tag: <BR>
Comment: Break tags are considered empty elements.
End-Tag: <BR/>

Tag: <IMG src="coolpictures.html">
Comment: This is an empty element tag.
End-Tag: <IMG src="coolpictures.html"/>
Element and Attribute Rules
The first table contains the basic guidelines for creating element rules in an XML DTD.
The second contains attribute value types.
The third contains attribute default options.
Element Rules:

Symbol: #PCDATA
Meaning: Contains parsed character data, or text.
Example: <!ELEMENT POW (#PCDATA)>
The POW element contains textual data.

Symbol: #PCDATA | elementname
Meaning: Contains text mixed with another element. #PCDATA is always listed first in the rule, the
choices are separated by bars, and the group is followed by an asterisk.
Example: <!ELEMENT POW (#PCDATA | NAME)*>
The POW element can contain text and NAME elements, in any order and any number of times.

Symbol: , (comma)
Meaning: Use in this order
Example: <!ELEMENT POW (NAME, RANK, SERIAL)>
The POW element must contain the NAME element, followed by the RANK element, followed by the SERIAL
element.

Symbol: | (bar)
Meaning: Use either or
Example: <!ELEMENT POW (NAME | RANK | SERIAL)>
The POW element must contain either the NAME element, or the RANK element, or the SERIAL element.

Symbol: name (by itself)
Meaning: Use one time only
Example: <!ELEMENT POW (NAME)>
The POW element must contain the NAME element, used exactly one time.

Symbol: name?
Meaning: Use either once or not at all
Example: <!ELEMENT POW (NAME, RANK?, SERIAL?)>
The POW element must contain the NAME element used exactly once, followed by one or no RANK
elements, and one or no SERIAL elements.

Symbol: name+
Meaning: Use either once or many times
Example: <!ELEMENT POW (NAME+, RANK?, SERIAL)>
The POW element must contain at least one but maybe more NAME elements, followed by one or no RANK
elements, and exactly one SERIAL element.

Symbol: name*
Meaning: Use once, use many times, or don't use it at all.
Example: <!ELEMENT POW (NAME*, RANK?, SERIAL)>
The POW element must contain one, many, or no NAME elements, followed by one or no RANK
elements, and exactly one SERIAL element.

Symbol: ()
Meaning: Indicates groups, which may be nested.
Example: <!ELEMENT POW (#PCDATA | NAME)*>
The POW element contains any number of uses of text, the NAME element, or both.
Example: <!ELEMENT POW ((NAME*, RANK?, SERIAL)* | COMMENT)>
The POW element may contain many instances of the group that contains one, many, or no NAME elements,
followed by one or no RANK elements, and exactly one SERIAL element. OR, it may contain one
COMMENT element.
Example: <!ELEMENT POW (NAME | RANK)+>
The POW element must contain a NAME or RANK element. The NAME or RANK option may appear once
or may be repeated many times.
Attribute Values:

Type: CDATA
Meaning: Character data, text.
Example: <!ATTLIST COMMENT category CDATA #REQUIRED>
The COMMENT element has an attribute named category. This attribute contains letters, numbers,
or punctuation symbols.

Type: NMTOKEN
Meaning: Name token, text with some restrictions. The value contains letters and numbers. It cannot
contain spaces, and the only symbols it can contain are _, -, ., and :.
Example: <!ATTLIST COMMENT category NMTOKEN #REQUIRED>
The COMMENT element has an attribute named category. This attribute contains a name token.

Type: (value-1 | value-2 | value-3) value list
Meaning: A value list provides a set of acceptable options for the attribute to contain. In
general, you should always include "other" as one of the options.
Example: <!ATTLIST COMMENT category (red | green | blue | other) "other">
The COMMENT element has an attribute named category. The category can be "red," "green,"
"blue," or "other." The default value is "other."

Type: ID
Meaning: The keyword ID means that this attribute has an ID value that identifies this
particular element.
Example: <!ATTLIST COMMENT category ID #IMPLIED>
The COMMENT element has an attribute named category. The category will contain an ID value.
ID and IDREF work together to create cross-references.

Type: IDREF
Meaning: The keyword IDREF means that this attribute has an ID reference value that
points to another instance's ID value.
Example: <!ATTLIST COMMENT category IDREF #IMPLIED>
The COMMENT element has an attribute named category. The category will contain an IDREF
value. ID and IDREF work together to let you cross-reference elements.

Type: ENTITY
Meaning: The keyword ENTITY means that this attribute's value is an entity. An entity is a
value that has been defined elsewhere in the DTD to have a particular meaning.
Example: <!ATTLIST COMMENT category ENTITY #IMPLIED>
The COMMENT element has an attribute named category. The category will contain an entity name
rather than text.

Type: NOTATION
Meaning: The keyword NOTATION means that this attribute's value is a notation. A notation is
a description of how information should be processed. You could set up a notation
that allows only numbers to be used for the value, for example.
Example: <!ATTLIST COMMENT category NOTATION #IMPLIED>
The COMMENT element has an attribute named category. The category attribute will contain a
notation name. In a complete declaration, the NOTATION keyword is followed by a parenthesized
list of the notation names that are allowed.
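To make the ID and IDREF pairing concrete, here is a rough Python sketch of the cross-reference check that a validating parser performs. It uses the standard library's xml.etree.ElementTree module. The attribute names id and ref, the COMMENTS wrapper element, and the function dangling_references are all assumptions for the example; a real validating parser learns which attributes are of type ID or IDREF from the DTD rather than from their names.

```python
import xml.etree.ElementTree as ET


def dangling_references(xml_text, id_attr="id", ref_attr="ref"):
    """Return the reference values that do not match any ID value in the document."""
    root = ET.fromstring(xml_text)
    ids = {el.get(id_attr) for el in root.iter() if el.get(id_attr) is not None}
    refs = [el.get(ref_attr) for el in root.iter() if el.get(ref_attr) is not None]
    return [ref for ref in refs if ref not in ids]


DOCUMENT = """<COMMENTS>
<COMMENT id="c1">Original remark</COMMENT>
<COMMENT id="c2" ref="c1">Reply to the original remark</COMMENT>
<COMMENT ref="c9">Reply to a remark that does not exist</COMMENT>
</COMMENTS>"""

print(dangling_references(DOCUMENT))
```

The reference to c1 resolves because some element carries that ID; the reference to c9 is reported because no element does.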
Attribute Default Options:

Type: #REQUIRED
Meaning: The attribute must always be included when the element is used.
Example: <!ATTLIST COMMENT category CDATA #REQUIRED>
The COMMENT element has an attribute named category. This attribute contains letters, numbers,
or punctuation symbols. The attribute must always be used with the element. If you omit the attribute,
a validating parser will give you an error message.

Type: #IMPLIED
Meaning: The attribute is optional. If you see the keyword #IMPLIED, you know that this attribute
will be ignored unless it is included in the element tag. It won't take on any default values.
Example: <!ATTLIST COMMENT category CDATA #IMPLIED>
The COMMENT element has an attribute named category. You may use the attribute or omit the
attribute, as the instance requires.

Type: #FIXED
Meaning: The attribute is optional, but if it is used, it must always have a certain value. If you
see the keyword #FIXED, you know that this attribute will always have the specified value when it is
entered.
Example: <!ATTLIST COMMENT confirm CDATA #FIXED "yes">
The COMMENT element has an attribute named confirm. If it is used, its value will be "yes." If it is
not used, the processor supplies the value "yes" for it.

Type: "value"
Meaning: A value in quotes is the default value of this attribute. If you don't enter the
attribute in the element tag, the processor will assume the attribute has this default value.
Example: <!ATTLIST COMMENT category (red|green|blue|other) "other">
The COMMENT element has an attribute named category. If you don't use the attribute in the
element tag, the attribute will automatically receive the value "other."
Interaction Between Components
XML, CSS, script, the DOM, and the browser work together to let you create interactive presentations of
your content.
Copyright © 1998-99
DevX.com, Inc.
Introduction to Behaviors
Behaviors are an enhancement to Internet Explorer 5 that allow designers to add scripting elements without
having to do the scripting needed to make them work. Behaviors are also a way in which scripters can write a
script once and turn it over to designers for use whenever needed.
So what can behaviors do? By using XML we can link behaviors to any element in a Web page and manipulate
that element. We can, for example, copy that element's text into a pullquote area on the page. We could offer a
way to magnify small type on a page. Many of the everyday things we do with scripting can be transferred to
behaviors, and by combining them with XML we can have greatly enhanced Web pages that will work down the
browser food chain with no ill effects.
At the left you will find links to several behaviors created here at Project Cool. Each link will take you to a page
that not only demonstrates the behavior but also shows you just how simple it is to implement.
We've divided our behaviors into two categories:
fx - Special effects behaviors don't necessarily add value, but do add eye-catching special effects that can
make your page stand out if used appropriately.
publishing - These behaviors can add value and utility to pages of text content. They make your pages much
more usable for the viewer or add new ways to get them involved in the text.
So what are you waiting for? Click one of the links to the left and start exploring what you can do with behaviors
and XML.
Copyright © 1998
Earthquake!
This behavior falls into the realm of special effects. It's really not useful, but it could help evoke a mood on a
website. To see it in action just run your mouse over the headline.
While it would probably be easy to implement this in the document directly, we've chosen to use it as a behavior.
Part of the beauty of behaviors is that they allow a designer to take pre-written code and effects and insert them
into a webpage without having to be a programmer. By having effects like Earthquake available as behaviors, a
designer can build up an astonishing repertoire of web display tools without needing to learn JavaScript.
Earthquake is set up via XML, so you'll need to create an appropriate namespace before you can use it. We're doing it
as XML so that older browsers aren't affected adversely. It also lets us define a brand new tag. The namespace is set
up in the <html> tag on your webpage. Here's the one we're using on this page:
<html xmlns:fx>
The next step is to define the XML tag we'll be using. This is done in the specific media type. In this case the
behavior will apply to the screen, so we place its CSS properties there, and we associate it with our namespace
by prefixing the namespace to the declaration. Our declaration looks like this:
<style>
<!--
@media screen {
fx\:EARTHQUAKE { behavior:url(earthquake.htc) }
}
-->
</style>
As you can see, the only part that is needed is the behavior property. It must point to the behavior file,
earthquake.htc. You can download the earthquake.htc file here. Once you have it just make sure it's on your
server and that the url is specified properly in your CSS.
All that's left then is to place the XML tags around the item you wish to trigger the earthquake behavior.
Earthquake will be triggered when someone runs their mouse over the item. The tagging is very simple and
looks like this:
<fx:EARTHQUAKE>Shake it, baby!</fx:EARTHQUAKE>
Now you've got it, everything you need to know to create your own earthquakes. So...uh....Shake it, baby!
Typewriter Behavior
Sure it owes its heritage to movies and computer gaming, but a typewriter effect can be quite eye-catching if used
properly. We'd bet you're reading this as it types. It's not for every Web site though, so use it sparingly.
This behavior can be set to type at whatever speed you need. The above example types at a speed of one
character every 100 milliseconds.
Typewriter is set up via XML, so you'll need to create an appropriate namespace before you can use it. We're
doing it as XML so that older browsers aren't affected adversely. It also lets us define a brand new tag. The
namespace is set up in the <html> tag on your Web page. Here's the one we're using on this page:
<html xmlns:fx>
The next step is to define the XML tag we'll be using. This is done in the specific media type. In this case the
behavior will apply to the screen, so we place its CSS properties there, and we associate it with our namespace
by prefixing the namespace to the declaration. Our declaration looks like this:
<style>
<!--
@media screen {
fx\:TYPEWRITER { behavior:url(typewriter.htc);
height: 4em;
font-family: "ocr a extended", courier;
}
}
-->
</style>
The most important part of that is the behavior property. It's the only part really needed and it must point to the
behavior file, typewriter.htc. You can download the typewriter.htc file here. Once you have it just make sure it's
on your server and that the url is specified properly in your CSS.
All that's left then is to place the XML tags around the text you wish to have typed onto the page. That's simple
too:
<fx:TYPEWRITER speed="120">Type this text.</fx:TYPEWRITER>
Notice that we've set the speed to 120. If you don't set a speed the typing will appear with the default setting of
100.
That's really about all you need to know to use it. Be aware that this behavior only runs once and only when the page
is first loaded. So if you use this, make sure it's someplace that your users will be able to see it.
Now start typing!
Footnote Behavior
If you've ever seen a Web document with footnotes you know what a problem it is to read a relevant footnote
and then scroll back up the document to find where you had stopped reading. This behavior changes that. It will
bring the footnotes to the user(1) without the need for them to scroll away from their place in the page.
Let's face it too, footnotes can be ugly things tacked to the bottom of a page. By implementing a footnote tag via
a behavior and XML we can give a designer complete control over what the footnote is going to look like when it
appears for the user. Everything about the way the footnote looks can be adjusted via CSS.
FOOTNOTE is set up via XML, so you'll need to create an appropriate namespace before you can use it.
We're doing it as XML so that older browsers aren't affected adversely. It also lets us define a brand new tag.
The namespace is set up in the <html> tag on your webpage. Here's the one we're using on this page:
<html xmlns:pub>
The next step is to define the XML tag we'll be using. This is done in the specific media type. In this case the
behavior will apply to the screen, so we place its CSS properties there, and we associate it with our namespace
by prefixing the namespace to the declaration. Our declaration looks like this:
<style>
<!--
@media screen {
pub\:FOOTNOTE {behavior:url(footnote.htc)}
.footstyle {width: 250px;
position: absolute;
left: -1000px;
color: black;
background-color: #9999cc;
text-align: justify;
border-color: #404040;
border-width: thin;
border-style: solid;
padding: 1em;
font-family: arial;
font-size: 10pt;
}
.closer {cursor: hand;
color: #ffff00;
text-align: right;
margin-top: 1em;
}
.fhilite {cursor: hand;
color: chocolate;
font-family: "Arial";
text-decoration: none;
}
}
-->
</style>
As you can see, FOOTNOTE only needs the behavior property. It must point to the behavior file, footnote.htc.
You can download the footnote.htc file here. Once you have it just make sure it's on your server and that the url
is specified properly in your CSS.
There are three CSS classes defined in the namespace as well. These are all used by the footnote behavior.
The first is footstyle. This defines how the footnote will look when the user calls it. It should be applied to the
division holding the footnote, and it's important that it have at least three properties:
width sets the display width in pixels of all footnotes.
left is used to hide the footnote until it is called.
position: absolute frees the footnote so that it can be positioned anywhere on the page.
The closer class describes how the word "close" will look in the displayed footnote block. This word is added to
the bottom right corner of footnotes so that there is an option to remove them from the page display.
Lastly, the class fhilite describes how the footnote link will appear and adds a hand cursor for user feedback.
You'll need to create individual divisions for each footnote to be displayed. Here's what one from this page looks
like:
<div id="foot1" class="footstyle">
<a name="footnote1"></a>
(1) A user used to be someone who was heavily into drugs.
Here a user simply refers to the person using a Web page.
In this case, you.
</div>
The id of the division is extremely important. It is via this id that the behavior manipulates the footnote. The
name can be anything you like as long as it is unique. You'll be using it in the FOOTNOTE tag to link the action
to the division. In this case we used the id of foot1. This would be referenced in the FOOTNOTE tag as
footName="foot1"(2).
Let's take a look now at how that last footnote was called:
<pub:FOOTNOTE footName="foot2">
<a href="#footnote2">(2)</a></pub:FOOTNOTE>
It's that simple. Notice we've placed it around working HTML which would scroll down to the footnote in older
browsers. The footnote behavior will erase that for IE5 and replace it with appropriate HTML to call our
enhanced footnotes leaving just the text that is present within the tag.
You should note that footName is a required property. If you forget to include it you won't get an error message.
The enhanced footnote behavior will simply do nothing.
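The silent failure described above can be pictured as a guard at the start of the behavior. What follows is only a sketch in modern JavaScript, not the contents of footnote.htc: the function name resolveFootnote and the lookup callback are hypothetical stand-ins for however the behavior actually finds the division.

```javascript
// Hypothetical sketch: this is NOT the actual code in footnote.htc.
// lookup is an assumed callback that maps an id to a footnote division
// (in a browser it could wrap document.getElementById).
function resolveFootnote(footName, lookup) {
  // footName is required. Without it there is nothing to link to,
  // so return null (do nothing) instead of raising an error.
  if (typeof footName !== "string" || footName === "") {
    return null;
  }
  // Assumption: an id that matches no division is also ignored quietly.
  return lookup(footName) || null;
}

// Usage with a plain object standing in for the divisions on a page.
const sampleDivisions = { foot1: { id: "foot1" }, foot2: { id: "foot2" } };
const findDivision = (id) => sampleDivisions[id];

resolveFootnote("foot2", findDivision);   // the foot2 division
resolveFootnote(undefined, findDivision); // null, so nothing happens
```

Because nothing is reported when footName is missing, it is worth checking each FOOTNOTE tag by hand if a footnote link does not respond.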
OK, so consider yourself armed, er, footed. You should now know everything you need to apply footnotes to your
pages.
Magnify Behavior
It's become commonplace today to see websites that have lots of text crammed into a small area. Oftentimes
some of that text is in the tiniest possible font. I can't speak for everyone, but in the wee hours of the morning it
can be hard to read that text. Often I've wished for a way to magnify it without resizing the fonts in my browser.
It seems natural that an easy way to magnify just a portion of a page would be ideal. Creating a
behavior for this and linking it to the page via XML makes it possible for a magnify effect to be used nearly
anywhere while the pages still work seamlessly in older browsers.
See how easy it is to read magnified text by clicking the icon? After you've opened this you can close it by
clicking the close icon on the bottom right.
If you look to your right you'll see an area of small text. If you are using IE5 beta 2+ you'll also see an icon of a
magnifying glass. Older browsers won't show this icon since it was inserted into the page via the magnify
behavior. If you click the icon a text window will display a magnified version of the exact text that is contained in
the block along with an icon that will allow you to close it. Also, if there is any HTML formatting in that text, such
as a link, it will be applied to the magnified version as well.
This behavior was designed so that nearly all the control is in the hands of the designer. The only exception
is the names of the icons used to indicate magnify and close magnify. These must be set in the .HTC file
controlling this behavior. Everything else is done in the Web page itself using CSS and XML.
MAGNIFY is set up via XML, so you'll need to create an appropriate namespace before you can use it.
We're doing it as XML so that older browsers aren't affected adversely. It also lets us define a brand new tag.
The namespace is set up in the <html> tag on your webpage. Here's the one we're using on this page:
<html xmlns:pub>
The next step is to define the XML tag and the properties for MAGNIFY. In doing this we also create a class
called "magstyle" that defines what the magnified text will look like. This is done in the specific media type. In
this case the behavior will apply to the screen, so we'll place its CSS properties there, and we associate it with our
namespace by prefixing the namespace to the declaration. Our declaration looks like this:
<style>
<!--
@media screen {
pub\:MAGNIFY {behavior:url(magnify.htc)}
.magstyle {color: black;
background-color: goldenrod;
border-color: black;
border-width: thin;
border-style: solid;
padding: 1em;
font-family: arial;
font-size: 16pt;
position: absolute;
left:-1000;
}
}
-->
</style>
The most important part of that is the behavior property. It must point to the behavior file, magnify.htc. You can
download the magnify.htc file here. Once you have it just make sure it's on your server and that the url is
specified properly in your CSS.
One thing to notice about the magstyle class is that it specifies a left position of -1000. This is so that the HTML
that the behavior creates will be hidden from the user by appearing far off to the left of the display window. We're
doing this in part because of a small display glitch in the version of IE used to create this and also because it's
always been my preferred way to hide content. It's just as easy to specify a new position as it is to specify
hidden/visible.
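To make the hide-by-position idea concrete, here is a small sketch in modern JavaScript. It is not taken from magnify.htc: the helper names hideOffscreen, showAt, and isHidden are hypothetical, and style stands in for an element's style object (any object with a writable left property will do).

```javascript
// Hypothetical sketch of hiding by position instead of by visibility.
// This is NOT the code in magnify.htc.
const OFFSCREEN_LEFT = -1000; // mirrors left:-1000 in the magstyle class

// Park the block far to the left of the display window.
function hideOffscreen(style) {
  style.left = OFFSCREEN_LEFT + "px";
}

// Bring the block into view at the given left offset in pixels.
function showAt(style, leftPx) {
  style.left = leftPx + "px";
}

// A block counts as hidden when it sits at or beyond the parking spot.
function isHidden(style) {
  return parseInt(style.left, 10) <= OFFSCREEN_LEFT;
}

// Usage with a plain object standing in for element.style.
const sampleStyle = { left: "" };
hideOffscreen(sampleStyle); // sampleStyle.left is now "-1000px"
showAt(sampleStyle, 40);    // sampleStyle.left is now "40px"
```

Showing and hiding both reduce to a single assignment to left, which is the sense in which moving the block is as easy as toggling hidden/visible.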
You'll also need to download the two icons used by this behavior. Right click on each one and then select
"save picture" to save magnify.gif and unmag.gif. This behavior looks for these icons in a directory called
images. You can change these icons to others by editing the magnify.htc file to point to other images. You need
one icon to represent the magnify option and one to indicate close magnify.
All that's left then is to place the XML tags around the text for which you wish to offer a magnified view. It's this simple:
<pub:MAGNIFY newId="1" width="400" align="left">The text that
you wish to be magnifiable.</pub:MAGNIFY>
I'm sure you noticed the properties we are passing to the magnify behavior. The first one, newId, is required.
While we could have added a complex random identification generation routine to the behavior we chose to
keep it simple and simply ask the designer to assign a unique name to each magnifiable section. Always be sure
to assign a value to newId. This is needed to link the icon to the newly generated HTML of the magnified text.
The other two properties are optional. width specifies how wide the magnified
area should be on the display. It defaults to 350 pixels if no width is specified. By making this a specifiable
property the designer is given control of how the text will fit the screen with each magnified area.
The other property, align, specifies the alignment of the magnify icon. Only "left" and "right" are valid values
here. Any other value, or no specification at all, will cause left alignment to be used.
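The rules for the three properties can be summarized in a short sketch. This is modern JavaScript written for illustration, not the code inside magnify.htc. The function name normalizeMagnifyOptions is hypothetical, and two details are assumptions rather than documented behavior: returning null when newId is missing, and falling back to 350 when width is not a positive number.

```javascript
// Hypothetical sketch: NOT the actual logic in magnify.htc.
// attrs is a plain object holding the attribute strings from the tag.
function normalizeMagnifyOptions(attrs) {
  // newId is required. Assumption: with no newId we give up quietly.
  if (!attrs.newId) {
    return null;
  }
  const width = parseInt(attrs.width, 10);
  return {
    newId: String(attrs.newId),
    // Default of 350 pixels when no width is specified.
    // Assumption: non-numeric or non-positive widths get the default too.
    width: width > 0 ? width : 350,
    // Only "right" changes the alignment; anything else means left.
    align: attrs.align === "right" ? "right" : "left",
  };
}

// Usage
normalizeMagnifyOptions({ newId: "1", width: "400", align: "left" });
// → { newId: "1", width: 400, align: "left" }
normalizeMagnifyOptions({ newId: "2", align: "center" });
// → { newId: "2", width: 350, align: "left" }
```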
By now you should be ready to apply magnification to your own Web pages. If you still feel a bit uncomfortable
trying this then view the source of this page to see how we've done it.
Now go forth and magnify!
Pullquote Behavior
If you've ever picked up a magazine then the odds are good that you've seen a pullquote. A pullquote is where a
bit of text from the body of an article or story is pulled from the text and highlighted in some way to catch your
eye. It's hoped that the quote will tease you enough to get you to read the story.
Up until now it's been a pain to do pullquotes in a Web page.
It always required working them into the HTML code and hand copying the text to be quoted. For those reasons
pullquotes have been a bit scarce on the Web.
By using a pullquote behavior it's now possible for anyone to put a pullquote into a Web page without having to
do complex layout tricks. It's as simple as putting a tag and some basic CSS into a Web page.
You set up PULLQUOTE via XML, so you'll need to create an appropriate namespace before you can use it. We're
doing it as XML so that older browsers aren't affected adversely. It also lets us define a brand new tag. The
namespace is set up in the <html> tag on your Web page. Here's the one we're using on this page:
<html xmlns:pub>
The next step is to define the XML tag we'll be using. This is done in the specific media type. In this case the
behavior will apply to the screen, so we'll place its CSS properties there, and we associate it with our namespace
by prefixing the namespace to the declaration. Our declaration looks like this:
<style>
<!--
@media screen {
pub\:PULLQUOTE {behavior:url(pullquote.htc)}
.pullstyle {width: 200;
color:black;
text-align: left;
border-color:#9966cc;
border-width:thin;
border-style:solid;
border-right: none;
border-left: none;
padding: 1em;
margin: 6pt;
font-family: arial;
font-style: italic;
font-size: 14pt;
}
}
-->
</style>
As you can see, PULLQUOTE only needs the behavior property. It must point to the behavior file, pullquote.htc.
You can download the pullquote.htc file here. Once you have it just make sure it's on your server and that the url
is specified properly in your CSS.
We also define a class called pullstyle. This is the CSS description of how a pullquote will look when rendered
on a Web page. The behavior will apply this style to the pullquote that it creates. You have complete control over
the appearance by changing the properties and values in pullstyle.
All that's left then is to place the XML tags around the text that you wish to appear as a pullquote. Here's
how we marked the pullquote near the top of this page:
<pub:PULLQUOTE align="right" lips="pre">it's always
been a pain to do pullquotes in a Web page.</pub:PULLQUOTE>
The align property specifies whether the pullquote will align on the left or the right of the page. Its valid values
are, surprisingly enough, left or right. If you don't specify an alignment then the pullquote will align on the left.
The second property is lips. This is our abbreviation for ellipsis. An ellipsis
is a series of three dots that can be used at the beginning of a quote, at the end, or on both ends. You can see
an ellipsis in the pullquote to your left. The acceptable values for lips are pre, post, or both. All other values will
be ignored. This is an optional property but it is very useful if you are only quoting part of a sentence.
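The effect of the two PULLQUOTE properties can be sketched in a few lines. This is modern JavaScript for illustration only, not the contents of pullquote.htc; the function names pullquoteAlign and applyEllipsis are hypothetical.

```javascript
// Hypothetical sketch: NOT the actual code in pullquote.htc.

// align accepts left or right; with no value the pullquote goes left.
// Assumption: any unrecognized value is also treated as left.
function pullquoteAlign(align) {
  return align === "right" ? "right" : "left";
}

// lips accepts pre, post, or both; every other value is ignored.
function applyEllipsis(text, lips) {
  const dots = "...";
  const before = lips === "pre" || lips === "both";
  const after = lips === "post" || lips === "both";
  return (before ? dots : "") + text + (after ? dots : "");
}

// Usage
applyEllipsis("it's been a pain", "pre");  // → "...it's been a pain"
applyEllipsis("it's been a pain", "both"); // → "...it's been a pain..."
applyEllipsis("it's been a pain", "oops"); // → "it's been a pain"
```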
Finally, just a few thoughts on proper use.
A pullquote should contain text that will draw the reader in. It should only contain a small amount of relevant text
and not several sentences. You'll probably want to consider using it near the top of a page so that it will be seen
immediately by a prospective viewer. You also shouldn't make the style too different from the rest of the page. It
should fit in yet be immediately visible.
With those thoughts in mind, as well as your newfound knowledge of how to apply this behavior, it's time for
you to go out there and pull one over on someone. A quote, that is.