What is XML?

XML is a way of adding intelligence to your documents. It lets you identify each element using meaningful tags, and it lets you add information ("metadata") about each element. XML is very much a part of the future of the Web, and part of the future of all electronic information.

XML is a syntax for marking up data, and it works with many other technologies to display and process information. It looks and feels very much like HTML. XML isn't going to replace everything else you've already learned; it complements and extends it. XML isn't going to change the way your Web pages look. You'll still need CSS (Cascading Style Sheets) with XML to define font colors, and JavaScript (again, with XML) to make your images fly around. Yet XML will change the way you and others read documents, and it will change the way documents are filed and stored. It's a new technology, and you certainly don't need it to build a great Web site, but you will want to be aware of it as you look at the Web of the future.

What's the Fuss About?

XML lets you make documents smarter, more portable, and more powerful. That's the promise of XML, and that's what all the fuss is about.

XML allows you to use your own tags to define parts of a document. You can do this because XML is a descriptive, not a procedural, language. That is, XML describes what something is rather than performing an action.

For example, take a look at the front page of a newspaper. You'll see different font sizes, different sections, and columns. If you were to create a Web page for that newspaper, using the same formatting and styles, you would use tags such as <H1> and <font color="red"> to define the size and color of a large headline, or <i> to italicize a word such as a byline, in order to distinguish it from the rest of the text. But just try to write tags that actually explain that you've got a headline and that the words "John Smith" make up a byline.
HTML won't know what you're talking about if you create tags such as <Headline>, <byline>, or <advertisement>. XML, with help from other technologies such as CSS, lets you define what the elements are and how to display them. That means, in the future, when you're searching the Web for, say, a Barbie doll for your niece's birthday, you'll get Barbie the DOLL instead of some other type of Barbie, because the Barbie doll page might be marked up like this: <DOLL>Barbie</DOLL>. Pretty cool, huh?

XML documents can be moved to any format on any platform without the elements losing their meaning. That means you can publish the same information to a Web browser, a PDA, or a network-enabled bread machine, and each device would use the information appropriately.

The most important thing to remember about XML, though, is that it doesn't stand alone. It needs other technologies, like CSS, in order for you to see its results. If all of this seems like a pain, and you don't want to mess with XML, it's OK. You don't need it to make a great Web page. But you never know when organization will come in handy.

Where Did XML Come From?

XML is a simplified version of SGML and a cousin of HTML. It was developed by members of the W3C and released as a W3C Recommendation in February 1998.

SGML, the parent of XML, is an international standard that has been in use as a markup language, primarily for technical documentation and government applications, since the early 1980s. It was developed to standardize the production process for large document sets. Think: medical records. Company databases. Aircraft parts catalogs. Other really huge documents.

Marking up documents in SGML allows information to be passed from one system to the next without losing information. With databases marked up in SGML, you can see what Widget A is all about and go check to see if Widget A is in stock. Early on, people thought that SGML would be useful for the Web.
In fact, HTML is really a very basic application of SGML! But HTML quickly came to be used for visual layout, so a group of people returned to the basics, determined to create something that had the strengths of SGML without being so difficult to implement, and that had the ease of use of HTML but with more structural power. The result was XML.

The design goals of XML, taken from the XML Specification, are:

1. XML shall be straightforwardly usable over the Internet.
2. XML shall support a wide variety of applications.
3. XML shall be compatible with SGML.
4. It shall be easy to write programs which process XML documents.
5. The number of optional features in XML is to be kept to the absolute minimum, ideally zero.
6. XML documents should be human-legible and reasonably clear.
7. The XML design should be prepared quickly.
8. The design of XML shall be formal and concise.
9. XML documents shall be easy to create.
10. Terseness in XML markup is of minimal importance.

In other words, XML is easy to create, easy to read, and designed for use over the Internet. What more could a Web designer ask for?

What Does XML Look Like?

If you've ever used HTML, XML is going to look very familiar! When you view the source of a document written in XML, the first thing you'll see is the XML declaration, which looks like this:

    <?xml version="1.0"?>

Then, in the body of the document, you'll see a lot of tags. The tags look familiar at first: they start with the usual less-than sign and end with the usual greater-than sign, like this:

    <name>

But then you'll notice that the tags might not be quite the names you've come to expect! You'll see tags that seem to be made-up tag names, tags like <dogchow>, <badcars>, and <species>. In fact, if you view the source of an XML document, you'll see tags surrounding lots of words, maybe every word in the document. These tags define exactly what the content is. And the creator of the document had the power to create his or her own specific set of tags.
Suppose you're looking at a Web page, marked up in XML, of The Canterbury Tales by Chaucer. You're looking specifically at lines 282-286 of "The Physician's Tale." The document source for that section might look like this:

    <?xml version="1.0"?>
    <CANTERBURY-TALES>
      <SECTION name="physician">
        The Physician's Tale
        <LINE number="282"> That no man woot therof but God and he. </LINE>
        <LINE number="283"> For be he lewed man, or ellis lered, </LINE>
        <LINE number="284"> He noot how soone that he shal been afered. </LINE>
        <LINE number="285"> Therfore I rede yow this conseil take -</LINE>
        <LINE number="286"> Forsaketh synne, er synne yow forsake. </LINE>
      </SECTION>
    </CANTERBURY-TALES>

The tags simply define that:

1) This document is The Canterbury Tales.
2) This section is the Physician's Tale.
3) Each line of the Physician's Tale is defined.
4) Each line ends, and the Physician's Tale and The Canterbury Tales end.

If the entire document were marked up like this, you could easily jump to a certain line or section. The entire document is annotated for easy reference and searching, and instead of viewing the entire document, users could request only specific sections of it, simply by calling the specific tags they want.

Oh, and we don't recommend that you manually type out each line in The Canterbury Tales. Get a computer to count the lines for you.

XML Versus HTML

HTML and XML are cousins. They draw on the same inspiration, SGML. They both identify elements in your page. They both use a very similar syntax. If you are familiar with HTML, XML will also feel familiar.

The big difference between HTML and XML is that HTML has evolved into a markup language that describes the look, feel, and action of a Web page. An <H1> is a headline that is displayed in a certain size, for example. In contrast, XML doesn't describe how a page looks, how it acts, or what it does. XML describes what the words in a document ARE. This is a critical distinction!
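To see what "describing what the words ARE" buys you in practice, here is a minimal sketch that pulls a single line out of the Canterbury Tales markup shown earlier, by section name and line number, instead of reading the whole poem. The choice of Python and its standard xml.etree.ElementTree module is our assumption (the text doesn't name a processing tool), and get_line is a hypothetical helper written only for this example.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# The Canterbury Tales excerpt from the text (lines 282-286 of the Physician's Tale).
DOC = """<?xml version="1.0"?>
<CANTERBURY-TALES>
  <SECTION name="physician">
    The Physician's Tale
    <LINE number="282">That no man woot therof but God and he.</LINE>
    <LINE number="283">For be he lewed man, or ellis lered,</LINE>
    <LINE number="284">He noot how soone that he shal been afered.</LINE>
    <LINE number="285">Therfore I rede yow this conseil take -</LINE>
    <LINE number="286">Forsaketh synne, er synne yow forsake.</LINE>
  </SECTION>
</CANTERBURY-TALES>"""


def get_line(xml_text: str, section: str, number: str) -> Optional[str]:
    """Return the text of one LINE inside a named SECTION, or None if absent.

    get_line is a hypothetical helper, not part of any standard API.
    """
    root = ET.fromstring(xml_text)
    # ElementTree understands a small XPath subset, including [@attr='value'].
    path = f"./SECTION[@name='{section}']/LINE[@number='{number}']"
    line = root.find(path)
    if line is None or line.text is None:
        return None
    return line.text.strip()


# Ask for just line 284, by tag and attribute.
print(get_line(DOC, "physician", "284"))
# He noot how soone that he shal been afered.

# Or walk every LINE element in document order.
for line in ET.fromstring(DOC).iter("LINE"):
    print(line.get("number"), line.text.strip())
```

Nothing in the lookup depends on how the lines would be displayed; it relies only on the tag names and attributes the author chose.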
While HTML combines structure and display, XML separates them. This means that XML documents are more portable and can be used in many different types of applications.

In the near future, we'll see both XML and HTML documents. Eventually, XML will probably replace HTML, or HTML will become an application of XML. But that doesn't mean you should toss out everything you know! In many ways, XML builds on HTML, and if you know HTML, XML will be easier to work with.

Valid and Well-Formed XML

You'll sometimes hear an XML document referred to as a "valid" XML document or a "well-formed" XML document. This distinction touches on one of the nice things about XML.

When you used SGML, you had to create something called a Document Type Definition (DTD, for short) in order to make the SGML document useful. DTDs were fairly complex and required a lot of work to create. They were one of the roadblocks to widespread use of SGML.

With XML you have an option. You can make a well-formed XML document by simply following the XML syntax rules. You don't have to create a separate DTD if you don't want to. If you do create a set of rules (a DTD) and make your document conform to those rules, it is considered a valid XML document.

DTDs describe the structure of your document. We'll be discussing DTDs in detail later on. Right now, all you need to know is that the main difference between valid and well-formed XML is that valid XML refers to and conforms to a DTD, and well-formed XML doesn't have to. (Every valid document must also be well-formed.)

Structure

XML applies structure to documents. Documents are sets of related information. The term structure seems to bring some unpleasant imagery with it, especially for creative souls who want to make this medium work in new and innovative ways. But when you are dealing with publishing, the term structure is quite positive. It is the way we put a skeleton behind information, so that the pieces of information work together and make sense as a whole.
There are two key principles behind a structured model:

1. Each part, or element, has a relationship with other elements. This series of relationships defines the structure.
2. The meaning of the element is separate from its visual appearance.

Documents

We can't really talk about structure without first talking a bit about documents. Document is another of those terms that conjures up somewhat negative images; one tends to picture "dusty stacks of documents" or "attorney's documents" or "document processing." But in this case, a document is simply a collection of related information. For example, this page is a document. Your favorite 'zine is a set of documents. Your intranet probably consists of hundreds, if not thousands, of documents.

Sometimes documents are created as a single unit. Sometimes they are built on demand, pulling pieces from a database and assembling them into a document as the reader requests. In both cases, structure makes the document easier to create, maintain, and display.

Document Structure

The document structure defines the elements that make up a document, the information you want to collect about those elements, and the relationships those elements have to each other. You use XML to mark up the document, following the structure you have decided upon.

By treating a document as a collection of elements, you free it from the constraints of time, place, and presentation format. You can move the structured document from a word processor to a PDA to a Web browser. The structure is intact on each; you just alter the display characteristics for each device.

The document structure is called the document tree. The trunk of the tree, the root element, is the parent of everything else. All the branches and leaves are its children (and the children of those children). Document trees are usually represented visually as a hierarchical chart.

Structure vs. Format

The most important thing to remember about a structured document is that it is defined by the elements it contains, not by how it looks.
Structure says that an element is a paragraph. Format says to display the paragraph in 12 point Times. Structure says the element is a book title. Format says to display the book title in green bold body text. Structure says the element is a social security number. Format says to hide the social security number and not display it. Learning to separate structure from format is critical to making good use of XML.

Metadata

Metadata is data about data. A key use of XML is to collect and work with metadata.

At its most basic level, XML is a metadata language. That is, it is a way of assigning information to pieces of data. The most obvious use of this is to identify a piece of data as a certain structural element. But this is just the beginning. XML is about much more than marking up documents for use in a Web browser. XML is really about adding layers of information to your data, so that the data can be processed, used, and transferred between applications.

Metadata in HTML

If you've built a Web site, you've almost certainly worked with metadata. The keywords and description meta tags are simple uses of metadata. With these meta tags you can attach information about the general type of content to the document as a whole. This information doesn't display in a Web browser, but search engines can use it, and some display the description in their results.

Another use of meta tags is to store information such as creator name and creation date. Some servers are structured to work with these meta tags, allowing you to sort by creation date or display based on creator name.

Going Further with XML

XML takes this basic idea much further. With XML, you can describe where you found your data, and you can quantify, qualify, and further define it. You can then use this metadata to validate information, perform searches, set display constraints, or process other data. Here are just a few examples:

XML initiatives are under way which will allow for digital signature verification and validated form submission.
This could make it possible for forms, with signatures, to be submitted online and be legally binding.

XML initiatives are under way to help catalog Web content. Using metadata, the Web can be indexed better and searched more effectively.

XML is being used to transfer data between unlike databases, based on factors such as the date entered. The metadata is both a means to find the correct bits of data and a common language of transfer between databases that do not speak each other's specific language.

The RDF Proposal

One W3C-blessed use of metadata that you may have heard about is a proposal called the Resource Description Framework, or RDF. RDF is an application of XML for making metadata machine-processable. It allows applications to exchange information about data automatically. This has implications for indexing, content rating, intellectual ownership, e-commerce, and privacy, among other things. The W3C says:

    RDF with digital signature will be the key to building the "Web of Trust" for electronic commerce, collaboration, and other applications.

Display Issues

XML alone will not display a page. You must use a formatting technology, such as CSS or XSL, to display XML-tagged documents in a Web browser.

XML is about separating structure and format. An XML document doesn't know anything about how to display itself. It relies on other technologies for this. Although XML does not deal with form, it contains a great deal of information about the document and its elements. This, when combined with style tools, gives you a whole new strength and flexibility in displaying your documents without having to maintain multiple copies of the document.

XSL

Extensible Stylesheet Language, XSL, is the future of XML display. It is an XML-based language for expressing stylesheets. With XSL, you can make context-sensitive display decisions. For example, you could automatically display the document one way in a Web browser and another way on a PDA.
XSL can also transform XML into HTML, so that older browsers can view XML documents.

CSS

Cascading Style Sheets, CSS-1 and CSS-2, are the current way to display XML documents in a Web browser. CSS is a means of assigning display values to page elements. If you are going to be working with XML and you will be concerned with displaying pages, learn CSS. The CSS Reference Guide contains a guide to the CSS-1 properties.

Behaviors

Behaviors are a non-standard technique, introduced in Internet Explorer 5, that lets you do some interesting display actions with XML tags. They combine scripting and CSS in a component file. This component can be attached to a particular tag and used in many different documents. The Behaviors Library shows some of the things you can do with this technique.

The DOM

The Document Object Model lets you address, change, and manipulate any individual portion of a Web page.

The phrase "document object model" means that you treat your document as a collection of individual objects, rather than as a single solid unit. The W3C DOM is the set of rules for doing this in a standard way in a Web browser, with HTML and XML files.

O is for Object

In an object-oriented approach, the program or the document is made up of many smaller components called objects. The smaller components can be rearranged, added to, or removed dynamically.

The idea of objects has become quite popular in both software and documents. The programming language Java and the scripting language JavaScript each have an object-oriented philosophy at their core. The adoption of the standard DOM enables Web pages to share that object approach too.

With an object model, you manage the small pieces, combining them and reusing them as it makes sense, instead of writing one huge application program or one huge document. You might think of an object approach as being a little like a collection of Lego blocks: different pieces do different things, but you can combine and recombine them into many different finished projects.
Each object type acts as a template. You can use an instance of the same object over and over again. For example, you might have multiple instances of the <canine> element in a document. All the objects share the same name, canine, and work the same way, but each one represents its own set of data and can be addressed individually.

The API

It isn't enough to merely know that an object is an object. You also need to know how to talk to that object and give it commands. That's where the API comes in. API stands for Application Programming Interface. An API is a set of rules that describes how you can access and manipulate an object.

The DOM specification describes the API for HTML and XML documents. By providing a standard API, the DOM defines the naming conventions, programming models, and other rules for communicating with an object in an HTML or XML page.

Getting from XML to Objects

In an XML document, each element is actually an object: it has a name, and it has attributes that describe it. The browser, combined with a stylesheet, displays each of the XML elements/objects in a Web page. Because they are objects, you can address and change them individually.

Ah, but just knowing that every piece is an object isn't enough. You need a set of rules, an API, to describe how to address those objects when they are placed in a Web page. That's where the DOM comes in.

The DOM does three things. You might think of it as explaining the "who, what, and how" of the Web page.

1. First, it describes who: which objects are in a Web page, and how are XML objects represented there?
2. Second, it defines what: what can these objects do, and how do they work with others?
3. Third, it defines how: how can these objects be addressed?

The DOM is the translator, the interface that lets all the pieces be represented properly, talk to each other, and communicate with scripts and other action tools.
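Here is a minimal sketch of the "many instances, each addressable" idea using Python's xml.dom.minidom, which implements a subset of the W3C DOM API; method names such as getElementsByTagName, getAttribute, and setAttribute are the same DOM calls a browser script would use. Running it outside a browser, and the kennel document with its breed attribute, are assumptions made up for this illustration.

```python
from xml.dom.minidom import parseString

# Hypothetical document with three instances of the same <canine> element type.
KENNEL = """<kennel>
  <canine breed="beagle">Snoopy</canine>
  <canine breed="collie">Lassie</canine>
  <canine breed="terrier">Toto</canine>
</kennel>"""

dom = parseString(KENNEL)

# Every <canine> is a separate Element object of the same type.
canines = dom.getElementsByTagName("canine")

for dog in canines:
    # The same API call works on each object; each returns its own data.
    print(dog.getAttribute("breed"), dog.firstChild.data)

# Address one object by position and change it; the others are untouched.
second = canines[1]
second.setAttribute("breed", "rough collie")
second.firstChild.data = "Lassie II"

print(dom.documentElement.toxml())
```

The script never touches the markup as text; it goes through the DOM's standard interface, which is what lets the same approach work across documents.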
It is XML that lets you add and identify data, but it is the DOM that lets the script manipulate and display that data on command in the Web browser window.

Pulling It All Together

You'll typically be working with four technologies that combine to create an interactive Web page: XML (or HTML), a scripting language, CSS, and the DOM. This illustration shows their relationship.

XML identifies data. For example: "King Lear" is a title element.

CSS stores information about display values for elements and delivers the information to the browser. For example: titles are displayed in 18 point black Courier type.

The script "talks" to the objects and sends messages to and from the browser about the objects. Typically these are "change your display" or "do this" messages based on user actions or other variables. For example: if a particular title is out of stock, display it in red.

The DOM provides the common interface through which various scripts and objects talk to one another and display in the Web browser.

The browser displays the results to the end user.

If any of these pieces are missing, you can't create a dynamically changing presentation of your document.

Element

An element is the basic building block of HTML and XML documents. Elements are identified by tags. A tag is the element name enclosed in angle brackets, and an element usually consists of a start tag, its content, and an end tag, like this:

    <AUTHOR>Thadius J. Frog</AUTHOR>

In HTML, you use a predefined set of elements. In XML, you create your own set of elements.

Attribute

Attributes are like adjectives, in that they further describe elements. Each attribute has a name and a value. Attributes are entered as part of the tag, like this:

    <AUTHOR dob="1874">Thadius J. Frog</AUTHOR>

Tag

You use a tag to identify a piece of data by element name. Tags usually appear in pairs, surrounding the data. The opening tag contains the element name. The closing tag contains a slash and the element's name, like this:

    <AUTHOR>Thadius J. Frog</AUTHOR>

Attribute Value

Each attribute has a value. The value might be a number, a word, or a URL. Attribute values follow the attribute name and an equal sign, like this:

    <AUTHOR dob="1874">Thadius J. Frog</AUTHOR>

In XML, attribute values are always surrounded by quotation marks.

Declaration

You begin an XML file with an XML declaration. The declaration states that this is an XML file. The XML declaration looks like this:

    <?xml version="1.0"?>

DTD

Document Type Definition. The DTD defines the elements, attributes, and relationships between elements for an XML document. A DTD is a way to check that the document is structured correctly, but you do not need to use one in order to use XML.

The XML Document

An XML file is a plain text file with XML markup tags. It usually has a .xml extension, like this: booklist.xml

Inside an XML File

An XML file contains three basic parts:

1. A declaration that announces that this is an XML file;
2. An optional definition of the type of document it is and what DTD it follows;
3. Content marked up with XML tags.

Types of XML Documents

There are two types of XML documents: well-formed or valid. The difference between the two is that a valid document is also checked against a DTD, while a well-formed document doesn't need one.

Well-formed

Well-formed documents conform to XML syntax. They contain text and XML tags. Everything is entered correctly. They do not, however, need to refer to a DTD.

Valid

Valid documents not only conform to XML syntax but are also error checked against a Document Type Definition (DTD). A DTD is a set of rules outlining which tags are allowed, what values those tags may contain, and how the tags relate to each other.
Typically, you'll use a valid document when you have documents that require error checking, that use an enforced structure, or that are part of a company- or industry-wide environment in which many documents need to follow the same guidelines.

DTDs

A Document Type Definition (DTD) is a set of rules that defines the elements, element attributes and attribute values, and the relationships between elements in a document.

When your XML document is processed, it is compared to its associated DTD to be sure it is structured correctly and all tags are used in the proper manner. This comparison process is called validation and is performed by a tool called a parser (specifically, a validating parser; not every parser validates).

Remember, you don't need a DTD to create an XML document; you only need a DTD for a valid XML document.

Here are a few reasons you'd want to use a DTD:

1. Your document is part of a larger document set, and you want to ensure that the whole set follows the same rules.
2. Your document must contain a specific set of data, and you want to ensure that all required data has been included.
3. Your document is used across your industry and needs to match other industry-specific documents.
4. You want to be able to error check your document for accuracy of tag use.

Deciding on a DTD

Using a DTD doesn't necessarily mean you have to create one from scratch. There are a number of existing DTDs, with more being added every day.

Shared DTDs

As XML becomes widespread, your industry association or company is likely to have one or more published DTDs that you can use and link to. These DTDs define tags for elements that are commonly used in your applications. You don't need to recreate these DTDs; you just point to them in the DOCTYPE definition in your XML file, and follow their rules when you create your XML document.

Some of these DTDs may be public DTDs, like the HTML DTD. Others may belong to your company. If you are interested in using a DTD, ask around and see if there is a good match that already exists.
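Python's standard-library XML parsers are non-validating, so they won't check a document against a DTD (a third-party library such as lxml can). To make the idea of validation concrete anyway, here is a toy sketch that checks one kind of rule a DTD expresses: that an element contains exactly the required children, in order. It is not a DTD parser. The booklist/book/title/author/price names, the rule, and the check_children function are all hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

# Hypothetical rule, written the way a DTD would state it:
#   <!ELEMENT book (title, author, price)>
# i.e. every <book> must contain exactly these children, in this order.
RULES: Dict[str, List[str]] = {"book": ["title", "author", "price"]}


def check_children(xml_text: str, rules: Dict[str, List[str]]) -> List[str]:
    """Return rule violations as messages; an empty list means every rule is met.

    A toy stand-in for DTD validation: it only checks the exact child
    sequences listed in `rules`, nothing else a real DTD can express.
    """
    root = ET.fromstring(xml_text)  # raises ET.ParseError if not well-formed
    errors = []
    for element in root.iter():
        expected = rules.get(element.tag)
        if expected is None:
            continue  # no rule for this element type
        actual = [child.tag for child in element]
        if actual != expected:
            errors.append(f"<{element.tag}> has {actual}, expected {expected}")
    return errors


complete = ("<booklist><book><title>Emma</title><author>Jane Austen</author>"
            "<price>7.99</price></book></booklist>")
missing_author = ("<booklist><book><title>Emma</title>"
                  "<price>7.99</price></book></booklist>")

print(check_children(complete, RULES))  # []
print(check_children(missing_author, RULES))
```

The second document is still well-formed; it only fails because it breaks the agreed-upon rule, which is exactly the extra guarantee validation adds.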
Create Your Own DTD

Another option is to create your own DTD. The DTD can be very simple and basic, or it can be large and complex. The DTD will be a reflection of the needs of your document.

It is perfectly acceptable to have a DTD with just four or five basic elements if that is what your document needs. Don't feel that creating a DTD necessarily needs to be a huge undertaking.

However, if your documents are complex, do plan on setting aside time (several days or several weeks) to understand the document and the document elements and to create a solid DTD that will really work for you over time.

Make an Internal DTD

You can insert DTD data within your DOCTYPE definition. If you've worked with CSS styles, you can think of this as being a little like putting style data into your file header. DTDs inserted this way are used only in that specific XML document. This might be the approach to take if you want to validate the use of a small number of tags in a single document, or to make elements that will be used only for one document.

Remember, the primary use for a DTD is to validate that the tags you enter in your XML document are entered as specified in the DTD. It is an error-checking process that ensures your data conforms to a set of rules.

XML Syntax

Tagging an XML document is, in many ways, similar to tagging an HTML document. Here are some of the most important guidelines to follow.

Rule #1: Remember the XML declaration

This declaration goes at the beginning of the file and alerts the browser or other processing tools that this document contains XML tags. The declaration looks like this:

    <?xml version="1.0" encoding="UTF-8" standalone="yes"?>

The version comes first. If you include encoding and standalone, they must appear in that order, and standalone takes the value "yes" or "no". You can leave out the encoding attribute and the processor will use the UTF-8 default.

Rule #2: Do what the DTD instructs

If you are creating a valid XML file, one that is checked against a DTD, make sure you:

1. Know what tags are part of the DTD, and use them appropriately in your document.
2. Understand what each does and when to use it.
3. Know what the allowable values are for each.
4. Follow those rules.

Do this, and your XML document will validate against the specified DTD.

Rule #3: Watch your capitalization

XML is case-sensitive. <P> is not the same as <p>. Be consistent in how you define element names. For example, use ALL CAPS, or use Initial Caps, or use all lowercase. It is very easy to create mismatched-case errors. Also, make sure starting and ending tags use matching capitalization. If you start a paragraph with the <P> tag, you must end it with the </P> tag, not a </p>.

Rule #4: Quote attribute values

In HTML there is some confusion over when to enclose attribute values in quotes. In XML the rule is simple: enclose all attribute values in quotes, like this:

    <NAME dob="1960">Ben Johnson</NAME>

Rule #5: Close all tags

In XML you must close all tags. This means that paragraphs must have corresponding end-paragraph tags. Anchor names must have corresponding anchor end tags. A strict interpretation of HTML says we should have been doing this all along, but in reality, most of us haven't.

Rule #6: Close empty tags, too

In HTML, empty tags, such as <br> or <img>, do not close. In XML, empty tags do close. You can close them either by adding a separate close tag (</tagname>) or by combining the open and close tags into one tag. You create the open/close tag by adding a slash, /, to the end of the tag, like this: <br/>

Examples

This table shows some common HTML tags and how they would be treated in XML.

Tag: <P>
End-Tag: </P>
Comment: Technically, in HTML, you're supposed to close this tag. In XML, it's essential to close it.

Tag: <ELEMENT>
End-Tag: </ELEMENT>
Comment: All elements in XML must have a start-tag and an end-tag.

Tag: <LI>
End-Tag: </LI>
Comment: This tag must be closed in XML in order to ensure a well-formed XML document.

Tag: <META name="keywords" content="XML, SGML, HTML">
End-Tag: <META name="keywords" content="XML, SGML, HTML"/>
Comment: META tags are considered empty elements in XML, and they must close.

Tag: <BR>
End-Tag: <BR/>
Comment: Break tags are considered empty elements.

Tag: <IMG src="coolpictures.html">
End-Tag: <IMG src="coolpictures.html"/>
Comment: This is an empty element tag.

Copyright © 1998-99

Well-formed XML

A document that conforms to the XML syntax rules is called "well-formed." If all your tags are correctly formed and follow XML guidelines, then your document is considered a well-formed XML document. That's one of the nice things about XML: you don't need to have a DTD in order to use it.

Begin the Well-formed Document

To begin a well-formed document, type the XML declaration:

    <?xml version="1.0" encoding="UTF-8" standalone="yes"?>

If you are embedding XML, it will go after the <HTML> and <HEAD> tags, and before any JavaScript. If you are creating an XML-only document, it will be the first thing in the file.

Version

You must include the version attribute for the XML declaration. The version is currently "1.0". Defining the version lets the browser know that the document that follows is an XML document, using XML 1.0 structure and syntax.

Encoding

Next, declare the encoding of the document. In this case, the encoding is UTF-8, which is the default encoding for XML. You can leave off this attribute and the processor will default to UTF-8.

Standalone

Finally, declare that the document "stands alone." standalone="yes" tells the application processing this document that the document does not depend on markup declarations in an external DTD, so it doesn't need to go looking for one.

Remember the Root Element

After the declaration, enter the tag for the root element of your document. This is the top-most element, under which all other elements are grouped.

Follow XML Syntax

Now, enter the rest of your content. Remember to follow XML syntax:

1. Remember that capitalization matters;
2. Quote all attribute values;
3. Close all tags;
4. Remember to close empty tags too, like this: <br/>

Pretty easy, isn't it? That's all there is to it!
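A non-validating parser is a quick way to test the syntax rules above, because being well-formed is exactly what such a parser checks. This minimal sketch assumes Python 3 and its standard xml.etree.ElementTree module; is_well_formed is a hypothetical helper, and the sample strings are made up so that each broken one trips a single rule.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text: str) -> bool:
    """Return True if xml_text parses as well-formed XML, else False.

    ElementTree's parser (expat) is non-validating: it checks XML syntax
    only and never validates against a DTD, which is exactly the
    well-formedness test.
    """
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


# Made-up samples; each broken one violates one of the syntax rules above.
SAMPLES = {
    "well-formed": ('<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
                    '<NAME dob="1960">Ben Johnson<br/></NAME>'),
    "unquoted attribute (rule 4)": "<NAME dob=1960>Ben Johnson</NAME>",
    "mismatched case (rule 3)": "<P>Some text</p>",
    "unclosed tag (rule 5)": "<LIST><LI>One</LIST>",
    "unclosed empty tag (rule 6)": "<P>Line one<br>Line two</P>",
    "declaration out of order (rule 1)": ('<?xml version="1.0" standalone="yes" '
                                          'encoding="UTF-8"?><NAME/>'),
}

for label, text in SAMPLES.items():
    print(f"{label}: {is_well_formed(text)}")
```

Only the first sample should parse. A parser like this says nothing about whether the tags are the right ones for your document; that is the job of a DTD and a validating parser.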
Valid XML A valid document conforms to the XML syntax rules and follows the guidelines of a Document Type Definition (DTD). The process of comparing the XML document to the DTD is called validation. This process is performed by a tool called a parser. Begin the Valid XML Document To begin a well-formed document, type the XML declaration: <?xml version="1.0" standalone="no" encode="UTF-8"?> If you are embedding XML, it will go after the <HTML> and <HEAD> tags, and before any Javascript. If you are creating an XML-only document, it will be the first thing in the file. Version You must include the version attribute for the XML declaration. The version is currently "1.0." Defining the version lets the browser know that the document that follows is an XML document, using XML 1.0 structure and syntax. Standalone The standalone="no" attribute tells the computer that it must look for a DTD and validate the XML tags. Encoding Finally, declare the encoding of the document. You can leave off this attribute and the processor will default to UTF-8. 13 Create a DOCTYPE Definition The second element in a valid XML document is the DOCTYPE definition. This identifies the type of document and DTD in use. If you look at HTML source files, you'll often see a !DOCTYPE definition, especially if the file was created by a WYSIWYG tool. The DOCTYPE definition points to an HTML DTD. In a valid XML file, !DOCTYPE tells the program that is processing your XML file two things: the name of the type of document and the name and location of the DTD against which to validate the file's contents. The DOCTYPE definition looks like this: <!DOCTYPE type-of-doc SYSTEM/PUBLIC "dtd-name"> !DOCTYPE This says that you are defining the DOCTYPE. type-of-doc This is the name of the type of document contained in this file. Typically, this is the same name as the DTD. SYSTEM/PUBLIC SYSTEM tells the processor to look for the private DTD at the following location. 
PUBLIC tells the processor to look for a public DTD. In XML, PUBLIC is followed by two quoted strings: the DTD's public identifier, then its location, like this: <!DOCTYPE type-of-doc PUBLIC "public-id" "dtd-url">

"dtd-name"
The URL after SYSTEM (or after the public identifier) is the name and location of the DTD file. By convention, DTD files use the extension .dtd.

If you want, instead of pointing to an external DTD, you could place the DTD information within the DOCTYPE definition, making it local to your XML document. You should do this only if you want to define a few simple elements and you want them permanently attached to a particular document.

Remember the Root Element

After the declaration, enter the tag for the root element of your document. This is the top-most element, under which all other elements are grouped.

Follow XML Syntax

Now, enter the rest of your content. Remember to follow XML syntax:
- Capitalization matters.
- Quote all attribute values.
- Close all tags.
- Close empty tags too, like this: <br/>

Elements

Elements are the basic building blocks of XML (and HTML, for that matter). Each element is a piece of data, identified by a tag. The tag contains the name of the element and any of its attributes, like this:

    <AUTHOR dob="1864">Thadius J. Frog</AUTHOR>

Thadius J. Frog is now identified as an author element. This particular author element has a date of birth (dob) attribute value of 1864.

Choose Your Own

XML is an extensible markup language. This means you create a set of elements that work for your content, and that you'll be able to use consistently within the document.

Whether you use a DTD or not, you'll still want to sit down and write a list of the element names that you will be using in your document. XML is case-sensitive, so as you're thinking about the element names, be sure to think about how you capitalize them too. Select names that are both easy to remember and easy to type. Ideally, your tags should have some inherent meaning too. This makes them easier to use.
For example, if you want to identify "last name" as an element, consider naming the element something like "last-name" or "surname." Be consistent in your use of names. It is easier to apply one set of general rules to 20 different tags than it is to remember eight discrete tags that follow no particular pattern. For example, if your document is a listing of classes, you could use these elements:

    <list-of-classes> <name-class> <instructor-name> <Sec> <TIME> <descprt>

But you're just asking for confusion! There's a mix of capitalization. There's a mix of abbreviations and full words. In one case "name" is the first part of the tag; in another it is the second part. A set of names with no logic behind it is hard to remember. Wouldn't names like these be easier to use?

    <classlist> <class> <section> <time> <instructor> <description>

These names are all lowercase, full words, with no plurals: an easy set of criteria to remember.

Focus on Structure, Not Format

One of the goals of using XML is to separate structure ("this is an author") from format ("display this in 10 point Helvetica"). Elements remain identified as elements no matter what platform you move the data to, so any application that reads the document can tell what each piece of data is.

When you think about elements, think about the role they play and the data they contain. Don't think about how the elements will look on the page; appearance is handled separately. You are using elements to identify data within your document as playing a certain role or belonging to a certain category of data.

Displaying Elements

You can use any tag name you want, as long as you follow proper XML syntax. Of course, those tags alone won't do anything. They will just sit there quietly, marking up your data. After your data is marked up, you'll use style sheets or other processing tools to display the XML document. You can control the display based on information contained in the elements.
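To make "other processing tools" concrete, here is a minimal sketch in Python (standard library only) that reads markup built from the classlist element names suggested above and produces two different presentations of the same data. The course data and the helper names read_classes, as_text and as_html are invented for illustration; a real site would more likely rely on a style sheet. Notice that the markup itself never says how anything should look.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical class data, marked up with the classlist element names.
CLASSLIST = """
<classlist>
  <class>
    <section>101</section>
    <time>9:00</time>
    <instructor>Pat Lee</instructor>
    <description>Tying Knots</description>
  </class>
  <class>
    <section>102</section>
    <time>13:30</time>
    <instructor>Sam Ortiz</instructor>
    <description>Splicing Rope</description>
  </class>
</classlist>
"""


def read_classes(xml_text):
    """Pull each <class> out as a dict keyed by child element name."""
    root = ET.fromstring(xml_text)
    return [{child.tag: child.text for child in cls}
            for cls in root.findall("class")]


def as_text(classes):
    """One presentation: plain text lines, e.g. for a small screen."""
    return [f"{c['section']} {c['description']} ({c['time']}, {c['instructor']})"
            for c in classes]


def as_html(classes):
    """Another presentation of the same data: an HTML list for a browser."""
    items = "".join(f"<li>{escape(c['description'])} at {escape(c['time'])}</li>"
                    for c in classes)
    return f"<ul>{items}</ul>"


classes = read_classes(CLASSLIST)
print("\n".join(as_text(classes)))
print(as_html(classes))
```

Because the elements describe roles (section, time, instructor) rather than appearance, adding a third output format means writing one more small function; the marked-up data stays untouched.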
Using Elements

In a well-formed XML document, you can insert any element tag you want, as long as you follow proper syntax. In a valid XML document, only the elements specified in the DTD will pass muster. If you randomly add other elements, their use will be flagged as an error during validation.

When you use elements in an XML document, you must follow standard XML syntax:
- The element's start and end tags surround the data it defines. For example: <chapter-head>Tying Knots</chapter-head>
- All elements, including empty elements, must end. This means having an open and close tag for regular elements, and a tag that closes with a slash for empty elements.
- The element name is case-sensitive: <AUTHOR> is not the same as <author>.

DTDs and Elements

One of the ways to define and codify all your elements is to create a DTD. A DTD defines the allowable elements, their attributes (if any), and their relationships to other elements. By validating your XML document against a DTD, you can check that the elements in the document are being used correctly.

Attributes

Attributes provide additional information about elements. You use elements and attributes all the time in HTML. For example, in HTML, a tag such as <H1 align="center"> includes an element (H1), an attribute (align), and an attribute value (center).

In HTML, attributes allow you to specify additional information about your elements. Often this information is formatting-related, such as align or size. In XML, attributes are best used for additional data about that particular element rather than for formatting, which is handled separately (by style sheets, for example).

Let's say, for example, you're creating documents about late 20th century popular music. In your DTD you've created an element called <SONG>, which identifies each musical title. You have music that falls into different decade categories: the 60s, the 70s, and the 80s. You can give the song element an attribute called era.
Now, you'll be able to know from what era each song dates. By using an attribute, you can identify different versions of the same song: "I've Got You Babe" from the 1960s and "I've Got You Babe" from the 1980s. Later on, you can use this data to display all 70s songs in green, or to sort the displayed titles by era. You would use the attribute like this:

    <SONG era="60s">I've Got You Babe</SONG>
    <SONG era="70s">Billy Don't Be a Hero</SONG>
    <SONG era="80s">I've Got You Babe</SONG>

"I've Got You Babe" is identified as a "song" element with an "era" attribute value of "60s". "Billy Don't Be a Hero" is identified as a "song" element with an "era" attribute value of "70s". "I've Got You Babe" is identified as a "song" element with an "era" attribute value of "80s".

Attributes and their allowable values are declared in your DTD, along with your elements, through an attribute list. Like element names, attribute names are case-sensitive, so be aware of your use of capitalization when you select and use attribute names.

One other important thing to remember about attributes in XML tags is that attribute values must always be enclosed in quotes. In HTML it's a mixed bag, but in XML the rule is easy to remember: quote all attribute values.

Comments

Comments are a way to add your own notes to an XML document. The browser and the XML processors will ignore anything inside comments. You aren't going to remember what you were thinking three months later when you return to edit the document, so don't be afraid to add comments as reminders or as markers of work that you have done.

To create a comment:
1. Type a less than sign, followed by an exclamation point and two dashes, like this:
    <!--
2. Type the text you want inside the comment. Be sure the text DOES NOT contain two dashes!
    <!--This defines a listing of books
3. Now, close the comment with two dashes and a greater than sign:
    <!--This defines a listing of books-->

CDATA

CDATA stands for "character data." Character data are letters, numbers and other symbols that are used exactly as they are typed. Inside a CDATA section they are not parsed as markup or treated as if they have any special meaning.

You can create a CDATA section within your XML document. A CDATA section is a handy way to show code examples or to use characters, such as < and &, that would otherwise take on a special meaning. You can use CDATA instead of using a series of &lt; entities, for example.

To create a CDATA section:
1. At the place in the document where you want the CDATA section to appear, begin a CDATA definition with a less than sign and an exclamation point.
    <!
2. Type an open square bracket and the letters CDATA.
    <![CDATA
3. Type another open square bracket.
    <![CDATA[
4. Now type the CDATA itself. In this example, we are typing some sample code.
    <![CDATA[<NAME common="freddy" breed="springer-spaniel">Sir Fredrick of Ledyard's End</NAME>
5. End the section with two closing square brackets and a greater than sign.
    <![CDATA[<NAME common="freddy" breed="springer-spaniel">Sir Fredrick of Ledyard's End</NAME>]]>

Here's how a CDATA section might be used in a document (displaying it in a browser assumes a linked style sheet):

    <HEAD1> Entering a Kennel Club Member </HEAD1>
    <DESCRIPTION> Enter the member by the name on his or her papers. Use the NAME tag. The NAME tag has two attributes. Common (all in lowercase, please!) is the dog's call name. Breed (also in all lowercase) is the dog's breed. Please see the breed reference guide for acceptable breeds. Your entry should look something like this: </DESCRIPTION>
    <EXAMPLE> <![CDATA[<NAME common="freddy" breed="springer-spaniel">Sir Fredrick of Ledyard's End</NAME>]]> </EXAMPLE>

Namespaces

Namespaces are a way of using elements from more than one DTD within the same XML document.
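Before looking at the declaration syntax in detail, it may help to see what a namespace-aware parser does with it. The sketch below uses Python's standard xml.etree.ElementTree, which reports a namespaced element's name as {URI}localname; it borrows the SALES prefix, the SOURCE tags and the mythical fishworld.org URI from the tropical-fish store example in this section. Note that the end tag carries the prefix too.

```python
import xml.etree.ElementTree as ET

# The SALES prefix is bound to the (mythical) fishworld.org URI;
# the plain SOURCE element belongs to no namespace.
DOC = """
<document xmlns:SALES="http://fishworld.org/schema">
  <SALES:SOURCE>Fish-o-Rama Wholesalers and Suppliers to the Trade</SALES:SOURCE>
  <SOURCE>Mexico, Central America</SOURCE>
</document>
"""

root = ET.fromstring(DOC)

# ElementTree replaces the prefix with its URI, written as {URI}localname,
# so the two SOURCE elements end up with different names.
for child in root:
    print(child.tag, "->", child.text)

# When searching, map a prefix of your choosing to the same URI.
ns = {"sales": "http://fishworld.org/schema"}
print(root.find("sales:SOURCE", ns).text)  # the wholesaler
print(root.find("SOURCE").text)            # the geographic source
```

The lookup prefix ("sales") deliberately differs from the document's prefix ("SALES"): what identifies the namespace is the URI, not the prefix spelling.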
Sometimes you may be working with material that draws on several sets of element tags. For example, you might have an online store selling tropical fish, and you'd like to use the <SOURCE> tag to identify both the geographic location from which each species comes and the wholesaler from whom you buy it. Namespaces are a way to do this.

An XML namespace is a collection of names, identified by a URI reference, which are used in XML documents as element types and attribute names. In practice, namespaces let you match a tag you are using with a particular set of tags. At the beginning of your document (or at the start of a particular element of your document), you identify the namespaces you'll be using and the URIs that identify them. Then, when you use a tag from one of those namespaces, you precede it with the appropriate namespace prefix.

Declaring Namespaces

At the beginning of your document, you'll want to identify the namespaces you are using in your document. This process is called declaring the namespace. In this example, you are declaring a namespace with the prefix SALES. The URI for SALES is the mythical fishworld.org/schema:

    <document xmlns:SALES='http://fishworld.org/schema'>

Using Namespaces

When you use a tag for an element that is defined in one of the namespaces, the namespace prefix and a colon are the first part of the tag, in both the start tag and the end tag, like this:

    <SALES:SOURCE>Fish-o-Rama Wholesalers and Suppliers to the Trade</SALES:SOURCE>

When you use your own tag, you just use the tag name, like this:

    <SOURCE>Mexico, Central America</SOURCE>

In January 1999, Namespaces became a W3C Recommendation.

XML Entities

An entity is a shortcut to a set of information. When you use an entity, it "expands" to its full meaning, but you need only type the shorter entity name during data entry. You might think of an entity as being a bit like a macro: it is a set of information that can be used by calling one name. XML defines two types of entities.
The general entity, which we'll talk about here, is used in XML documents. The parameter entity is used in DTDs. General entities are easy to spot: they begin with an ampersand and end with a semicolon, like this:

    &entity-name;

Uses for Entities

Entities are a way to make entering and managing data easier. You've probably already used entities without calling them that. If you've ever entered the characters &lt; to create the < symbol, you've used an entity. &lt; is a standard predefined entity in both HTML and XML; it lets you enter a particular character without having to memorize its character number.

Here are a few reasons you might want to define and use entities:

Entities save typing. Suppose you have a paragraph, like a copyright notice, that you use in every single document. You could type that notice over and over again. Or, you could use an entity to call it forth in place.

Entities can reduce errors. By the 101st time you type that copyright notice, it is likely your poor fingers will be so tired you'll make an error and set your copyright for 1989 instead of 1999. Using an entity can reduce the potential for these types of errors.

Entities are easy to update. When it is time to update that copyright notice, with an entity you can make the change in one place and be done with it. Without an entity you'd be searching and replacing throughout your document set.

Entities can act as placeholders for TBD information. Maybe legal hasn't quite finalized what they want that copyright notice to say. That doesn't have to stop production: you can use an entity, and when the final wording comes down, the entity will automatically display the new, corrected version in all your documents.

You can get quite creative with the use of entities, and even have documents that are constructed entirely from entities. Here's an example: you want to create different documents, each containing a set of bios for members of your staff.
You'll have an executive set, a set for each product line, a set for six different regions around the world, and subsets of the same content appear in each. One approach you could take is creating 10 or 12 separate flat files and copying the appropriate biography information into each. But an easier way is to create a small file for each bio, then call each one into the executive page, the European page, the Flying Toys Division page, and so on via an entity.

Here's how the content code for your Flying Toys Division page might look. Upon display, the entities would expand and you'd see the full bio of each person. If you needed to change the bios, you could do it in one place. If the product manager changed, all your pages would be automatically updated with the new person.

    <HEAD>The Faces Behind Flying Toys!</HEAD>
    <BIO>&bio-ft-div-head;</BIO>
    <BIO>&bio-ft-prod-mgr;</BIO>
    <BIO>&bio-ft-designer;</BIO>
    <BIO>&bio-ft-lead-engineer;</BIO>

Defining Entities

You can define entities in your local document as part of the DOCTYPE definition. You can also link to external files that contain the entity data; this, too, is done through the DOCTYPE definition. A third option is to define the entities in your external DTD. Use a local definition when the entity is used only in this one particular file. Use a linked, external file when the entity is used in many document sets.

To define an entity:
1. Start your DOCTYPE definition as usual, with the name of the document type (here, PRESSRELEASE):
    <!DOCTYPE PRESSRELEASE
2. Now mark that you are defining some data by entering an open square bracket:
    <!DOCTYPE PRESSRELEASE [
3. Start the entity definition with a less than sign, an exclamation mark, and the word ENTITY, all in caps:
    <!DOCTYPE PRESSRELEASE [
    <!ENTITY
4. Type the name of the entity. Type it using the capitalization that you will use when calling it later on.
    <!DOCTYPE PRESSRELEASE [
    <!ENTITY copyright
5. If you are defining the entity locally, type the value of the entity, surrounded by quotes, and then close the entity definition with a greater than sign.
    <!DOCTYPE PRESSRELEASE [
    <!ENTITY copyright "Copyright 2000, As The World Spins Corp. All rights reserved. Please do not copy or use without authorization. For authorization contact legal@worldspins.com.">
6. If you are defining an entity in an external text file, put in a pointer to the external file, then close the entity definition with a greater than sign.
    <!DOCTYPE PRESSRELEASE [
    <!ENTITY copyright SYSTEM "http://www.worldspins.com/legal/copyright.xml">
7. Create all your entity definitions. When you are done, close the DOCTYPE definition with a closing square bracket and a greater than sign.
    <!DOCTYPE PRESSRELEASE [
    <!ENTITY copyright "Copyright 2000, As The World Spins Corp. All rights reserved. Please do not copy or use without authorization. For authorization contact legal@worldspins.com.">
    <!ENTITY trademark SYSTEM "http://www.worldspins.com/legal/trademark.xml">
    ]>

Using Entities

To use an entity in your document, just call it by name: type an ampersand (&), the entity name, and a semicolon. Here is an example (displaying it in a browser assumes a linked style sheet):

    <?xml version="1.0"?>
    <!DOCTYPE PRESSRELEASE [
    <!ENTITY copyright "Copyright 2000, As The World Spins Corp. All rights reserved. Please do not copy or use without authorization. For authorization contact legal@worldspins.com.">
    <!ENTITY trademark SYSTEM "http://www.worldspins.com/legal/trademark.xml">
    ]>
    <PRESSRELEASE>
    <HEAD>Mini-globe revolutionizes keychain industry</HEAD>
    <LEAD> Today As The World Spins introduces a new approach to key chains. With the new MINI-GLOBE keys can be kept inside a chain, called for upon demand, and stored safely. Never more will consumers lose a key or stand at a door flipping through a stack of keys seeking the right one.
    </LEAD>
    <LEGAL> &trademark; &copyright; </LEGAL>
    </PRESSRELEASE>

XML DTDs: Introduction

Valid XML documents follow a set of rules defined in an associated DTD. This Document Type Definition defines elements, attributes, and relationships between elements. External DTDs are saved in a plain text file with the extension .dtd, like this: mypage.dtd

When your XML document is processed, it is compared to its associated DTD to be sure it is structured correctly and all tags are used in the proper manner. This comparison process is called validation and is performed by a tool called a parser. Remember, you don't need a DTD to create an XML document; you only need a DTD for a valid XML document.

Before You Begin

There are a handful of terms you'll be hearing as you work with an XML DTD. Take a couple of minutes to become familiar with them before you begin.

Schema
A schema is a description of the rules for data. A schema does two things:
1. It defines the elements in a data set and their relationship to each other.
2. It defines the content that can be contained in each element.
DTDs are a schema for XML documents.

DTD
Document Type Definition. The DTD defines the elements, attributes, and relationships between elements for an XML document. A DTD is a way to check that the document is structured correctly, but you do not need to use one in order to use XML.

Document Tree
A document tree is the representation of the hierarchy of elements in a document. A document tree has one root element. All other elements are part of this top-level element. The first element tag in your XML document is always the root element's tag.

Root Element
The root element is the top-most element in the hierarchy. All other elements in a document are contained within this element. In an XML file, the first tag is the root element's tag. In the DTD, the root element is the first element you should define.
Parent Element
A parent element is an element which contains other elements. The other elements are called children. For example, a list is a parent; the list items are its children. A parent element is sometimes referred to as a branch element. Each branch sprouts off the tree; from the branch hang other branches and individual leaves. The branches and leaves "belong" to the parent branch.

Child Element
A child element is an element contained within a parent element. An element may be both a parent and a child at the same time. For example, the list element may be a child of the root element. At the same time it is the parent of the list item element. If a child element does not contain any other elements, it is the innermost element on its branch and is sometimes called a leaf element.

Parser
A parser is a software tool that checks to be sure a document follows a particular syntax. XML parsers come in two varieties:
- A non-validating parser checks a document to be sure XML syntax rules are followed and builds a document tree from the element tags.
- A validating parser checks the syntax, builds the tree, and compares the use of element tags to be sure they conform with the rules specified in the document's associated DTD.
Parsers can be either external programs or part of the editing tool or browsing tool. The XML Reference section includes a list of some XML parsers.

DTD Contents

A DTD is a way to ensure that an XML document uses elements correctly. It contains a set of rules. When your XML document is processed, it is compared to its associated DTD to be sure it is structured correctly and all tags are used in the proper manner. A DTD:
- Always contains rules that define elements.
- Always contains rules that define the relationship between elements.
- May contain rules that define attributes for elements, although not all elements have attributes.
- May contain rules that define entities.
- May contain rules that define notations.

Finding a DTD

Using a DTD doesn't necessarily mean you have to create one from scratch. There are a number of existing DTDs, with more being added every day.

Shared DTDs

As XML becomes widespread, your industry association or company is likely to have one or more published DTDs that you can use and link to. These DTDs define tags for elements that are commonly used in your applications. You don't need to recreate these DTDs: you just point to them in the DOCTYPE definition in your XML file, and follow their rules when you create your XML document. Some of these DTDs may be public DTDs, like the HTML DTD. Others may belong to your company. If you are interested in using a DTD, ask around and see if there is a good match that already exists.

Create Your Own External DTD

Another option is to create your own DTD. The DTD can be very simple and basic, or it can be large and complex. The DTD will be a reflection of the needs of your document. It is perfectly acceptable to have a DTD with just four or five basic elements if that is what your document needs. Don't feel that creating a DTD necessarily needs to be a huge undertaking.

However, if your documents are complex, do plan on setting aside time (several days or several weeks) to understand the document and its elements and to create a solid DTD that will really work for you over time. Remember, you'll be able to use this DTD with many individual documents, so it is worth the time to think it through and craft it well.

Create Your Own Internal DTDs

You can insert DTD data within your DOCTYPE definition in an individual XML document. If you've worked with CSS styles, you can think of this as being a little like putting style data into your file header. DTDs inserted this way are used in that specific XML document only.
This might be the approach to take if you want to validate the use of a small number of tags in a single document, or to make elements that will be used only for one document.

Internal DTDs

You can insert DTD data within your DOCTYPE declaration. This type of DTD is used only by the one specific XML document that contains it. Here is a very simple example of DTD data within the DOCTYPE declaration:

    <!DOCTYPE books [
    <!ELEMENT title (#PCDATA)>
    <!ELEMENT author (#PCDATA)>
    <!ENTITY copyright "Copyright 1999, Flying Toys Inc., all rights reserved.">
    ]>

External DTDs

External DTDs are stored as plain text files with the extension .dtd. An external DTD file does not contain a DOCTYPE definition of its own; it holds only the declarations: a series of element definitions, attribute lists, entity definitions and notation definitions. The DOCTYPE definition that points to the file goes in the XML document. Here's an example; this might be the DTD for a set of documents about books:

    <!--This defines a listing of books-->
    <!ELEMENT booklist (title, author)>
    <!ELEMENT title (#PCDATA)>
    <!ELEMENT author (#PCDATA)>
    <!ATTLIST title type (paper|cloth|hard) "paper">
    <!ENTITY copyright "Copyright 1999, Flying Toys Inc., all rights reserved.">

DTDs can be much more complex than this example (and they typically are), but this gives you a sense of what they can do. It's just a matter of structuring your data and figuring out the "parts" of your content.

Reading a DTD

Even if you don't plan to build a DTD from scratch, it is helpful to know how to read one and to understand the document it is describing. From reading a DTD you should be able to compile a list of elements and their attributes, and how and when to use them. You should also be able to compile a list of entities that you can use within the document. Some people find it helpful to actually sketch out a document tree as they go through the DTD, to visualize the structure of the document.
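If you'd rather have the computer draft that sketch, the rough Python helper below reads the <!ELEMENT declarations in a DTD and prints an indented tree, starting from the first element declared (the root, following the convention that the root element is defined first). It is a hypothetical illustration, not a real DTD parser: it only collects element names from each content model, and it ignores order, occurrence symbols (?, *, +), attribute lists, entities and parameter entities. The function names dtd_children and sketch_tree are made up for this example.

```python
import re

# <!ELEMENT name content-model>; good enough for simple DTDs only.
ELEMENT_DECL = re.compile(r"<!ELEMENT\s+([\w.:-]+)\s+([^>]*)>")
# Names that can appear in a content model (#PCDATA is matched so it can be skipped).
MODEL_NAME = re.compile(r"#?[A-Za-z_:][\w.:-]*")


def dtd_children(dtd_text):
    """Map each declared element to the child element names in its rule."""
    dtd_text = re.sub(r"<!--.*?-->", "", dtd_text, flags=re.DOTALL)  # drop comments
    children = {}
    for name, model in ELEMENT_DECL.findall(dtd_text):
        kids = []
        for token in MODEL_NAME.findall(model):
            # Skip text (#PCDATA), the EMPTY/ANY keywords, and repeats.
            if token.startswith("#") or token in ("EMPTY", "ANY") or token in kids:
                continue
            kids.append(token)
        children[name] = kids
    return children


def sketch_tree(dtd_text):
    """Indented tree lines, rooted at the first element declared in the DTD."""
    children = dtd_children(dtd_text)
    if not children:
        return []
    root = next(iter(children))  # dicts keep declaration order
    lines = []

    def walk(name, depth, ancestors):
        lines.append("  " * depth + name)
        if name in ancestors:  # recursive rule (e.g. lists inside lists): stop here
            return
        for kid in children.get(name, []):
            walk(kid, depth + 1, ancestors | {name})

    walk(root, 0, frozenset())
    return lines


EMPLOYEE_DTD = """
<!-- A tiny DTD for an employee record -->
<!ELEMENT EMPLOYEE (FIRST, MI, LAST)>
<!ELEMENT FIRST (#PCDATA)>
<!ELEMENT MI (#PCDATA)>
<!ELEMENT LAST (#PCDATA)>
"""

print("\n".join(sketch_tree(EMPLOYEE_DTD)))
```

For these EMPLOYEE declarations it prints EMPLOYEE with FIRST, MI and LAST indented beneath it. Because the helper only collects names, a rule such as (EVENT, SPONSOR)* and one such as (EVENT | SPONSOR)* produce the same sketch; the order and occurrence symbols still have to be read by eye.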
Check List

Here's a list of things to look for as you go through a DTD:
- Read the Comments
- Note the Basic Elements
- Read the Element Declaration
- Look for Parent/Child Relationships
- Read Attribute Lists
- Find Attribute Names for Each Element
- Determine Attribute Value Types
- See the Attribute's Default
- Read Entity Declarations

Read the Comments

Read the comments! Comments can tell you a lot about the DTD, how to use it, and what to be aware of when using it. Most DTD authors will include information that you should know before using the DTD. This might range from use restrictions to how-to information. Comments look like this:

    <!-- Here's a comment -->

Note the Basic Elements

Look through the DTD and identify the element names that make up the document. Note how they are capitalized. You might want to develop a reference sheet of elements that you can make notes on as you work your way through the DTD. Element declarations begin like this:

    <!ELEMENT

The text immediately after the element declaration is the element's name.

Read the Element Declaration

Each element declaration provides the name of the element and the content it can contain. Sometimes the content is text. Other times it is other elements, arranged in a certain order or used a certain number of times. Here are some sample element declarations:

    <!ELEMENT EMPLOYEE (FIRST, MI, LAST)>
    <!ELEMENT FIRST (#PCDATA)>
    <!ELEMENT MI (#PCDATA)>
    <!ELEMENT LAST (#PCDATA)>

Look for Parent/Child Relationships

The element rules build a hierarchy of elements, describing how one element is related to another. An element that is contained within another is considered a child of the element in which it is contained. Use these relationships to sketch out your document tree.

The parent/child relationship is defined in the content type portion of the element definition. If the content type is another element, then those elements are children of the element whose definition you are reading.
For example, FIRST, MI, and LAST are children of EMPLOYEE:

    <!ELEMENT EMPLOYEE (FIRST, MI, LAST)>

The DTD can require that the child elements be used in a certain order, or that they be used once, not at all, or many times. It can also group elements to create more detailed rules.

Read Attribute Lists

After element definitions, you may see attribute lists. An attribute list begins like this:

    <!ATTLIST

Each attribute list defines the attributes for an element. Many attributes may be defined in one ATTLIST. The ATTLIST is structured like this:

    <!ATTLIST element-name attribute-name attribute-type default-data>

See Which Element the Attribute Defines

Right after the ATTLIST declaration is the name of an element. This is the element whose attributes the list defines. For example, this ATTLIST defines attributes for the COMMENT element:

    <!ATTLIST COMMENT attribute-name attribute-type default-data>

Find Attribute Names for Each Element

Following the element name is the name of the first attribute declared in this list. This name is the attribute name you type into the element tag in the XML file. For example, this ATTLIST defines the attribute "category" for the element COMMENT:

    <!ATTLIST COMMENT category attribute-type default-data>

Add the attribute information to the element reference list you are building.

Determine Attribute Value Types

Attributes can be one of several different types. The attribute-type describes the type of value that the attribute may contain. For example, this ATTLIST says that the "category" attribute for the element COMMENT contains one of four values: red, green, blue, or other.

    <!ATTLIST COMMENT category (red | green | blue | other) default-data>

See the Attribute's Default

The final part of the ATTLIST is the default value of the attribute. The default value has a strong effect on how the attribute is used and what value it has if you don't use it in the XML tag. You can make the attribute required (#REQUIRED) or optional (#IMPLIED).
Or, you can provide a default value that will be used automatically if the attribute is not entered.

Read Entity Declarations

Along with element and attribute definitions, you may also see entity definitions. Typically, these will appear in a group, often at the beginning of the DTD, and usually with explanatory comments. An entity definition begins like this:

    <!ENTITY

After the declaration come the entity's name and the contents of the entity. The contents may be text, or they may be a pointer to an external file. For example, this defines two entities, one called "copyright" and one called "trademark." Copyright is defined within the definition, while trademark points to another file.

    <!ENTITY copyright "Copyright 2000, As The World Spins Corp. All rights reserved. Please do not copy or use without authorization. For authorization contact legal@worldspins.com.">
    <!ENTITY trademark SYSTEM "http://www.worldspins.com/legal/trademark.xml">

Making Elements

Elements are the basic building blocks of XML. You define elements in a DTD; you use them in a document. A basic element definition looks like this:

    <!ELEMENT DESCRIPTION (#PCDATA | DEFINITION)*>

When an element mixes text with child elements like this, #PCDATA must come first, the choices must be separated by bars (|), and the group must be followed by an asterisk.

Element Declaration
Each element begins with an element declaration, <!ELEMENT. This announces that you are defining an element.

Element Name
After the declaration is the element's name. The way the name appears in the element definition is exactly the way it must be used in the XML document. Capitalization counts!

Element Rule
After the name comes a rule that describes what the element can contain. Through these rules, elements take on hierarchical relationships with each other. Although the basic bits of the rules are simple, they can be grouped and combined to create quite complex definitions.

This table summarizes the element rule definitions.

Contents

Elements can contain text, other elements, a combination of text and other elements, or they may be empty.

Text. Elements can contain textual data.
Other Elements. Elements can contain only other specified elements and no text. The contained elements are called children of the containing element. The containing element is the parent of the child elements.

Combination. Elements can contain a mix of textual data and other specified elements.

Empty. Empty elements get their value from their attributes. An empty element will typically have at least one attribute. In HTML, the IMG tag is a good example of an empty element. It gets its value from the src attribute.

Number of Occurrences

You can specify the number of times a child element is used within its parent.

Once and only once. The element listed by itself indicates that it must be used once and only once:

DTD definition:
    <!ELEMENT EVENTLIST (EVENT)>
Used in document:
    <EVENTLIST>
    <EVENT>Balsa Wood Flyer Days</EVENT>
    </EVENTLIST>

At least once, or many times. The element followed by a plus sign indicates that this element must be used at least once and can be used many times within the parent:

DTD definition:
    <!ELEMENT EVENTLIST (EVENT+)>
Used in document:
    <EVENTLIST>
    <EVENT>Balsa Wood Flyer Days</EVENT>
    <EVENT>Sundays in the Park</EVENT>
    <EVENT>Teach Your Child to Fly</EVENT>
    </EVENTLIST>

Once or not at all. The element followed by a question mark indicates that this element can be used either one time or not at all:

DTD definition:
    <!ELEMENT EVENT (LOCATION, SPONSOR?)>
Used in document:
    <EVENT>
    <LOCATION>West Bay Ballpark</LOCATION>
    </EVENT>
or
    <EVENT>
    <LOCATION>West Bay Ballpark</LOCATION>
    <SPONSOR>Flying Toys</SPONSOR>
    </EVENT>

Once, not at all, or as many times as you want. The element followed by an asterisk indicates that this element can be used zero or more times, as many times as needed.
DTD definition:

    <!ELEMENT EVENT (LOCATION*, EVENT-NAME)>

Used in document:

    <EVENT>
      <LOCATION>West Bay Ballpark</LOCATION>
      <LOCATION>North Side Park</LOCATION>
      <EVENT-NAME>Sundays in the Park</EVENT-NAME>
    </EVENT>

or

    <EVENT>
      <EVENT-NAME>Sundays in the Park</EVENT-NAME>
    </EVENT>

Order
You can specify the order in which child elements appear.

Specific order. Child elements can be defined to be used in a specific order. The comma (,) separates elements that are listed in a specific order. For example, you could set a rule that creates an EVENTLIST. In the list, you must always use the EVENT element, followed by the SPONSOR element.

DTD definition:

    <!ELEMENT EVENTLIST (EVENT, SPONSOR)>

Used in document:

    <EVENTLIST>
      <EVENT>Balsa Wood Plane Days</EVENT>
      <SPONSOR>Flying Toys</SPONSOR>
    </EVENTLIST>

Either/Or. You can define child elements so that one or another can be used. The bar (|) separates either/or choices.

DTD definition:

    <!ELEMENT EVENT (EVENT-NAME | SPONSOR)>

Used in document:

    <EVENT>
      <EVENT-NAME>Balsa Wood Plane Days</EVENT-NAME>
    </EVENT>

or

    <EVENT>
      <SPONSOR>Flying Toys</SPONSOR>
    </EVENT>

Groups
Groups can be used to create complex rules that combine elements and different usage options. When groups are combined with a "use many times" symbol, you can create a rule that allows multiple uses of elements, either in any order or as repeated sets.

For example, here the element EVENTLIST can contain multiple sets of EVENT and SPONSOR groups:

DTD definition:

    <!ELEMENT EVENTLIST (EVENT, SPONSOR)*>

Used in document:

    <EVENTLIST>
      <EVENT>Balsa Wood Plane Days</EVENT>
      <SPONSOR>Flying Toys</SPONSOR>
      <EVENT>Sundays in the Park</EVENT>
      <SPONSOR>Deer Island Recreation Department</SPONSOR>
    </EVENTLIST>

Here, the EVENTLIST can contain either the EVENT element or the SPONSOR element, but this either/or group can be used many times.
DTD definition:

    <!ELEMENT EVENTLIST (EVENT | SPONSOR)*>

Used in document:

    <EVENTLIST>
      <EVENT>Balsa Wood Plane Days</EVENT>
      <SPONSOR>Flying Toys</SPONSOR>
      <SPONSOR>Deer Island Recreation Department</SPONSOR>
    </EVENTLIST>

Hints for Element Names

Select names that are both easy to remember and easy to type.

Give your tags some inherent meaning. For example, if you want to identify "last name" as an element, consider naming the element something like "last-name" or "surname."

Use names that are consistent with current processes. If people call "social security number" SSN, create an element called SSN. Don't create an unfamiliar "socsecnum" element.

Be consistent in your use of names. It is easier to apply one set of general rules to 20 different tags than it is to remember eight discrete tags that follow no particular pattern.

Attribute Lists

Elements can have attributes, which describe the element in more detail. When you create an element in your DTD, you can also create an attribute list for the element. Attribute lists define the name, data type, and default value (if any) of each attribute associated with an element.

In this very simple example, we're adding some attributes to the title element from our book list. We want to be able to specify the edition date and whether the book is paperback or hardcover.

    <!--This defines a listing of books-->
    <!DOCTYPE booklist [
      <!ELEMENT booklist (title, author)+>
      <!ELEMENT title (#PCDATA)>
      <!ATTLIST title
          edition CDATA #REQUIRED
          type (paper|cloth|hard) "paper">
      <!ELEMENT author (#PCDATA)>
    ]>

Here's how you'd use these attributes in an XML file. Notice the use of the edition attributes in each title tag. Notice how one title tag also uses the type attribute to indicate that this book is a hardcover title.

Attribute Types

Attributes can have one of several types of data (XML 1.0 defines ten), but the two most common are:

CDATA. Character data.
This allows the attribute value to be textual data. You use it like this:

    <!ATTLIST edition date CDATA #IMPLIED>

Pre-defined values. You can list a set of specific values that the attribute can have. The value set is enclosed in parentheses and each value is separated with a vertical bar, like this:

    <!ATTLIST edition type (paper|hard|cloth) #IMPLIED>

Default Values

You can specify a default value for the attribute, or make the attribute required or optional. The default value has a strong effect on how the attribute is used and what value it has if you don't use it in the XML tag.

#REQUIRED: the attribute must have a value every time the element is listed. You specify that an attribute is required like this:

    <!ATTLIST edition date CDATA #REQUIRED>

#IMPLIED: the attribute is optional. The processor ignores this attribute unless it is used as part of the element. It does not assume any default value.

#FIXED "value": the attribute is not required for the element, but if it occurs, it must have the specified value; if it is omitted, the processor supplies that value. For example, the new attribute always has the value "yes":

    <!ATTLIST edition new CDATA #FIXED "yes">

"defaultvalue": a quoted value on its own provides a default value for that attribute. If the attribute is not included in the element, the processing program assumes that this is the attribute's value. For example, this gives the type attribute a default value of "hard":

    <!ATTLIST edition type (paper|cloth|hard) "hard">

Entities

An entity is a shortcut to a set of information. When you use an entity, it "expands" to its full meaning, but you need only type the shorter entity name during data entry. You might think of an entity as being a bit like a macro: it is a set of information that can be used by calling one name.

XML defines two types of entities. The general entity is one that you define in a DTD and use in a document. General entities are easy to spot.
They are defined with the entity declaration, <!ENTITY, and when they are used they begin with an ampersand and end with a semicolon, like this: &entity-name;

The parameter entity is one that you define and use within a DTD. The content of a parameter entity may be either included in the DTD or stored in an external file. In addition, parameter entities must be parsed; they cannot be unparsed. That is, they must contain textual data that is processed rather than a GIF or other non-textual data type. It, too, is defined with an entity declaration, but it is called with a percent sign, like this: %info;

Defining a General Entity

To define an entity:

1. Start the entity definition with a less-than sign, an exclamation mark, and the word ENTITY, all in caps:

    <!ENTITY

2. Type the name of the entity. Type it using the capitalization that you will use when calling it later on.

    <!ENTITY copyright

3. If you are defining the entity locally, type the value of the entity, surrounded by quotes, and then close the entity definition with a greater-than sign.

    <!ENTITY copyright "Copyright 2000, As The World Spins Corp. All rights reserved. Please do not copy or use without authorization. For authorization contact legal@worldspins.com.">

4. If you are defining an entity in an external ASCII text file, put in a pointer to the external file, then close the entity definition with a greater-than sign.

    <!ENTITY copyright SYSTEM "http://www.worldspins.com/legal/copyright.xml">

Using a General Entity

You won't be using a general entity in a DTD. You will only be defining it there. You will be using it in an XML file, where it is called by typing an ampersand, the entity name, and a semicolon: &entityname;

Defining a Parameter Entity

To declare a parameter entity:

1. Type the entity declaration:

    <!ENTITY

2. Type a space, followed by a percent sign. It is important to remember the space!

    <!ENTITY %

3. Type another space, followed by the name of the entity:

    <!ENTITY % info

4.
Type the value of the entity, surrounded by quotation marks:

    <!ENTITY % info "name CDATA #REQUIRED
                     gender (m | f) #REQUIRED
                     color (red | fawn | merle | black | other) #REQUIRED"

5. End the declaration with a greater-than sign (>):

    <!ENTITY % info "name CDATA #REQUIRED
                     gender (m | f) #REQUIRED
                     color (red | fawn | merle | black | other) #REQUIRED">

One thing to notice about parameter entities is that when they are defined there is a space between the percent sign and the entity name, but when the entity is used there is no space between the percent sign and the entity name.

Using a Parameter Entity

It is quite simple to use a parameter entity. Simply enter the entity name, preceded by a percent sign and followed by a semicolon, like this:

    <!ELEMENT HOUND (NAME)>
    <!ATTLIST HOUND %info;>
    <!ELEMENT WORKING (NAME)>
    <!ATTLIST WORKING %info;>
    <!ELEMENT COMPANION (NAME)>
    <!ATTLIST COMPANION %info;>

When the DTD is processed, the entity will be expanded. In this example, %info; will be replaced with a set of attribute data, which was defined in the info entity declaration. References like these, inside another declaration, work in an external DTD file. In the internal subset (the part of the DTD written inside the document's own DOCTYPE declaration), XML 1.0 allows parameter-entity references only between declarations.

Again, remember that when a parameter entity is defined, there is a space between the percent sign and the entity name, but when the entity is used there is no space between the percent sign and the entity name.

XML Parsers

Parsing is the process of checking the syntax of your document and creating the "tree structure." If you are using a validating parser, the process will also compare the XML file to its DTD.

On-line Parsers

There are a number of online parsers. To use these, you typically type in the URI of your file and tell the process to begin.

Online validating parser, from the W3C: The W3C offers an online parser. Type the URL of the file into the form and the XML file is both parsed and validated.

Validating Parser from Brown University Scholarly Technology Group: This is the most easily accessible and understandable presentation of the online parsers.

Downloadable Parsers

There are many parsers that you can download and run on your local machine.
Most of these require you to have either a Windows or UNIX machine. They are written in a variety of languages; this is a cross section of some of the many which are available.

James Clark's expat parser: James Clark is almost a brand in the SGML/XML world. His rendition of an XML parser is widely used.
Java-based Validating XML Parser: From IBM's AlphaWorks group, this parser claims to be 100% pure Java.
Microsoft XML Parser in C++: A parser from Microsoft.
XML Parser written in Python: This is a validating parser.
XML Parser written in JavaScript: This parser is non-validating and checks XML syntax only.
SiRPAC, Simple RDF Parser and Compiler: From the W3C.

XML Syntax

Tagging an XML document is, in many ways, similar to tagging an HTML document. Here are some of the most important guidelines to follow.

Rule #1: Remember the XML declaration

This declaration goes at the beginning of the file and alerts the browser or other processing tools that this document contains XML tags. The declaration looks like this:

    <?xml version="1.0" encoding="UTF-8" standalone="yes"?>

The version comes first; if you include encoding and standalone, they must follow in that order. The standalone value is either "yes" or "no." You can leave out the encoding attribute and the processor will use the UTF-8 default.

Rule #2: Do what the DTD instructs

If you are creating a valid XML file, one that is checked against a DTD, make sure you know what tags are part of the DTD and use them appropriately in your document. Understand what each does and when to use it. Know what the allowable values are for each. Follow those rules, and the XML document will validate against the specified DTD.

Rule #3: Watch your capitalization

XML is case-sensitive. <P> is not the same as <p>. Be consistent in how you define element names. For example, use ALL CAPS, or use Initial caps, or use all lowercase. It is very easy to create mismatching case errors. Also, make sure starting and ending tags use matching capitalization. If you start a paragraph with the <P> tag, you must end it with the </P> tag, not a </p>.
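Rule #3 is easy to check with any non-validating parser. Here is a minimal sketch using Python's standard-library binding to James Clark's expat parser (xml.parsers.expat). It reports whether a fragment is well-formed. The helper name well_formed_error is our own, hypothetical choice. Expat checks syntax only, so it catches mismatched capitalization but does not validate against a DTD.

```python
import xml.parsers.expat


def well_formed_error(text: str):
    """Return None if text is well-formed XML, else expat's error message.

    Hypothetical helper: it only checks well-formedness and does not
    validate the document against a DTD.
    """
    parser = xml.parsers.expat.ParserCreate()
    try:
        parser.Parse(text, True)  # True: this is the final (and only) chunk
    except xml.parsers.expat.ExpatError as err:
        # Translate the numeric error code into expat's description
        return xml.parsers.expat.ErrorString(err.code)
    return None


# Rule #3: start and end tags must use the same capitalization.
print(well_formed_error("<P>Matching case</P>"))    # None
print(well_formed_error("<P>Mismatched case</p>"))  # mismatched tag
```

Because the parser stops at the first error, fix problems one at a time and re-run the check.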
Rule #4: Quote attribute values

In HTML there is some confusion over when to enclose attribute values in quotes. In XML the rule is simple: enclose all attribute values in quotes, like this:

    <NAME dob="1960">Ben Johnson</NAME>

Rule #5: Close all tags

In XML you must close all tags. This means that paragraphs must have corresponding end paragraph tags. Anchor names must have corresponding anchor end tags. A strict interpretation of HTML says we should have been doing this all along, but in reality, most of us haven't.

Rule #6: Close empty tags, too

In HTML, empty tags, such as <br> or <img>, do not close. In XML, empty tags do close. You can close them either by adding a separate close tag (</tagname>) or by combining the open and close tags into one tag. You create the open/close tag by adding a slash, /, to the end of the tag, like this: <br/>

Examples

This table shows some common HTML tags and how they would be treated in XML (tag, XML form, comment).

<P> (in XML: <P>...</P>): Technically, in HTML, you're supposed to close this tag. In XML, it's essential to close it.
<ELEMENT> (in XML: <ELEMENT>...</ELEMENT>): All elements in XML must have a start-tag and an end-tag.
<LI> (in XML: <LI>...</LI>): This tag must be closed in XML in order to ensure a well-formed XML document.
<META name="keywords" content="XML, SGML, HTML"> (in XML: <META name="keywords" content="XML, SGML, HTML"/>): META tags are considered empty elements in XML, and they must close.
<BR> (in XML: <BR/>): Break tags are considered empty elements.
<IMG src="coolpictures.html"> (in XML: <IMG src="coolpictures.html"/>): This is an empty element tag.

Element and Attribute Rules

The first table contains the basic guidelines for creating element rules in an XML DTD. The second contains attribute value types. The third contains attribute default options.

Element Rules (symbol, meaning, example):

#PCDATA: Contains parsed character data, or text.
    <!ELEMENT POW (#PCDATA)>
    The POW element contains textual data.

#PCDATA | elementname: Contains text mixed with other elements. #PCDATA is always listed first in the rule, and the group must end with an asterisk.
    <!ELEMENT POW (#PCDATA | NAME)*>
    The POW element can contain any mix of text and NAME elements.

, (comma): Use in this order.
    <!ELEMENT POW (NAME, RANK, SERIAL)>
    The POW element must contain the NAME element, followed by the RANK element, followed by the SERIAL element.

| (bar): Use either/or.
    <!ELEMENT POW (NAME | RANK | SERIAL)>
    The POW element must contain either the NAME element, or the RANK element, or the SERIAL element.

name (by itself): Use one time only.
    <!ELEMENT POW (NAME)>
    The POW element must contain the NAME element, used exactly one time.

name?: Use either once or not at all.
    <!ELEMENT POW (NAME, RANK?, SERIAL?)>
    The POW element must contain the NAME element used exactly once, followed by zero or one RANK element and zero or one SERIAL element.

name+: Use either once or many times.
    <!ELEMENT POW (NAME+, RANK?, SERIAL)>
    The POW element must contain at least one NAME element, followed by zero or one RANK element, and exactly one SERIAL element.

name*: Use once, use many times, or don't use it at all.
    <!ELEMENT POW (NAME*, RANK?, SERIAL)>
    The POW element can contain zero, one, or many NAME elements, followed by zero or one RANK element, and exactly one SERIAL element.

( ): Indicates groups; groups may be nested.
    <!ELEMENT POW (#PCDATA | NAME)*>
    The POW element contains any mix of text and NAME elements, in any order.
    <!ELEMENT POW ((NAME*, RANK?, SERIAL)* | COMMENT)>
    The POW element can contain any number of instances of the group made up of zero or more NAME elements, followed by zero or one RANK element, and exactly one SERIAL element. OR, it may contain one COMMENT element.
    <!ELEMENT POW (NAME | RANK)+>
    The POW element must contain a NAME or RANK element. The NAME or RANK option may appear once or may be repeated many times.

Attribute Values (type, meaning, example):

CDATA: Character data, text.
    <!ATTLIST COMMENT category CDATA #REQUIRED>
    The COMMENT element has an attribute named category. This attribute contains letters, numbers, or punctuation symbols.

NMTOKEN: Name token, text with some restrictions. The value contains letters and numbers, and the only symbols it can contain are _, -, ., and :.
    <!ATTLIST COMMENT category NMTOKEN #REQUIRED>
    The COMMENT element has an attribute named category. This attribute contains a name token.

(value-1 | value-2 | value-3), a value list: A value list provides a set of acceptable options for the attribute to contain. In general, you should always include "other" as one of the options.
    <!ATTLIST COMMENT category (red | green | blue | other) "other">
    The COMMENT element has an attribute named category. The category can be "red," "green," "blue," or "other." The default value is "other."

ID: The keyword ID means that this attribute has an ID value that identifies this particular element. ID and IDREF work together to create cross-references.
    <!ATTLIST COMMENT category ID #IMPLIED>
    The COMMENT element has an attribute named category. The category will contain an ID value.

IDREF: The keyword IDREF means that this attribute has an ID reference value that points to another element's ID value.
    <!ATTLIST COMMENT category IDREF #IMPLIED>
    The COMMENT element has an attribute named category. The category will contain an IDREF value. ID and IDREF work together to let you cross-reference elements.

ENTITY: The keyword ENTITY means that this attribute's value is an entity. An entity is a value that has been defined elsewhere in the DTD to have a particular meaning; for this attribute type it must be the name of an unparsed entity.
    <!ATTLIST COMMENT category ENTITY #IMPLIED>
    The COMMENT element has an attribute named category. The category will contain an entity name rather than text.

NOTATION: The keyword NOTATION means that this attribute's value is a notation. A notation, declared elsewhere in the DTD, is a description of how information should be processed. The allowed notation names are listed in parentheses after the keyword.
    <!ATTLIST COMMENT category NOTATION (notation-1 | notation-2) #IMPLIED>
    The COMMENT element has an attribute named category. The category attribute will contain a notation name.

Attribute Default Options (type, meaning, example):

#REQUIRED: The attribute must always be included when the element is used.
    <!ATTLIST COMMENT category CDATA #REQUIRED>
    The COMMENT element has an attribute named category. This attribute contains letters, numbers, or punctuation symbols. The attribute must always be used with the element. If you omit the attribute, a validating parser will give you an error message.

#IMPLIED: The attribute is optional. If you see the keyword #IMPLIED, you know that this attribute will be ignored unless it is included in the element tag. It won't take on any default values.
    <!ATTLIST COMMENT category CDATA #IMPLIED>
    The COMMENT element has an attribute named category. You may use the attribute or omit the attribute, as the instance requires.

#FIXED: The attribute is optional, but if it is used, it must always have a certain value. If you see the keyword #FIXED, you know that this attribute will always have the specified value.
    <!ATTLIST COMMENT confirm CDATA #FIXED "yes">
    The COMMENT element has an attribute named confirm. If it is used, its value must be "yes." If it is omitted, the processor still reports it with the value "yes."

"value": A value in quotes is the default value of this attribute. If you don't enter the attribute in the element tag, the processor will assume the attribute has this default value.
    <!ATTLIST COMMENT category (red|green|blue|other) "other">
    The COMMENT element has an attribute named category. If you don't use the attribute in the element tag, the attribute will automatically receive the value "other."

Interaction Between Components

XML, CSS, script, the DOM, and the browser work together to let you create interactive presentations of your content.

Copyright © 1998-99 DevX.com, Inc.

Introduction to Behaviors

Behaviors are an enhancement to Internet Explorer 5 that allow designers to add scripting elements without having to do the scripting needed to make them work. Behaviors are also a way in which scripters can write a script once and turn it over to designers for use whenever needed.

So what can behaviors do? By using XML we can link behaviors to any element in a Web page and manipulate that element. We can, for example, copy that element's text into a pullquote area on the page. We could offer a way to magnify small type on a page.
Many of the everyday things we do with scripting can be transferred to behaviors, and by combining them with XML we can have greatly enhanced Web pages that will work down the browser food chain with no ill effects.

Several behaviors created here at Project Cool are described below, each with an explanation of just how simple it is to implement. We've divided our behaviors into two categories:

fx: Special-effects behaviors don't necessarily add value, but they do add eye-catching effects that can make your page stand out if used appropriately.
publishing: These behaviors can add value and utility to pages of text content. They make your pages much more usable for the viewer or add new ways to get them involved in the text.

Copyright © 1998

Earthquake!

This behavior falls into the realm of special effects. It's really not useful, but it could help set a mood on a website.

While it would probably be easy to implement this in the document directly, we've chosen to use it as a behavior. Part of the beauty of behaviors is that they allow a designer to take pre-written code and effects and insert them into a Web page without having to be a programmer. By having effects like Earthquake available as behaviors, a designer can build up an astonishing repertoire of Web display tools without needing to learn JavaScript.

Earthquake is set up via XML, so you'll need to create an appropriate namespace before you can use it. We're doing it as XML so that older browsers aren't affected adversely. It also lets us define a brand new tag. The namespace is set up in the <html> tag on your Web page. Here's the one we're using on this page:

    <html xmlns:fx>

The next step is to define the XML tag we'll be using.
This is done in the specific media type. In this case the behavior will apply to the screen, so we'll place its CSS properties there, and we associate it with our namespace by prefixing the namespace to the declaration. Our declaration looks like this:

    <style>
    <!--
    @media screen {
      fx\:EARTHQUAKE { behavior: url(earthquake.htc) }
    }
    -->
    </style>

As you can see, the only part that is needed is the behavior property. It must point to the behavior file, earthquake.htc. You can download the earthquake.htc file here. Once you have it, just make sure it's on your server and that the URL is specified properly in your CSS.

All that's left then is to place the XML tags around the item you wish to trigger the earthquake behavior. Earthquake will be triggered when someone runs their mouse over the item. The tagging is very simple and looks like this:

    <fx:EARTHQUAKE>Shake it, baby!</fx:EARTHQUAKE>

Now you've got everything you need to know to create your own earthquakes. So...uh...Shake it, baby!

Typewriter Behavior

Sure, it owes its heritage to movies and computer gaming, but a typewriter effect can be quite eye-catching if used properly. It's not for every Web site, though, so use it sparingly.

This behavior can be set to type at whatever speed you need; for example, it can type one character every 100 milliseconds.

Typewriter is set up via XML, so you'll need to create an appropriate namespace before you can use it. We're doing it as XML so that older browsers aren't affected adversely. It also lets us define a brand new tag. The namespace is set up in the <html> tag on your Web page. Here's the one we're using on this page:

    <html xmlns:fx>

The next step is to define the XML tag we'll be using. This is done in the specific media type. In this case the behavior will apply to the screen, so we'll place its CSS properties there, and we associate it with our namespace by prefixing the namespace to the declaration.
Our declaration looks like this:

    <style>
    <!--
    @media screen {
      fx\:TYPEWRITER {
        behavior: url(typewriter.htc);
        height: 4em;
        font-family: "ocr a extended", courier;
      }
    }
    -->
    </style>

The most important part of that is the behavior property. It's the only part really needed, and it must point to the behavior file, typewriter.htc. You can download the typewriter.htc file here. Once you have it, just make sure it's on your server and that the URL is specified properly in your CSS.

All that's left then is to place the XML tags around the text you wish to have typed onto the page. That's simple too:

    <fx:TYPEWRITER speed="120">Type this text.</fx:TYPEWRITER>

Notice that we've set the speed to 120. If you don't set a speed, the typing will appear with the default setting of 100. That's really about all you need to know to use it. Be aware that this behavior runs only once, when the page is first loaded. So if you use this, make sure it's someplace where your users will be able to see it. Now start typing!

Footnote Behavior

If you've ever seen a Web document with footnotes, you know what a problem it is to read a relevant footnote and then scroll back up the document to find where you had stopped reading. This behavior changes that. It will bring the footnotes to the user(1) without the need for them to scroll away from their place in the page.

Let's face it, too: footnotes can be ugly things tacked to the bottom of a page. By implementing a footnote tag via a behavior and XML, we can give a designer complete control over what the footnote is going to look like when it appears for the user. Everything about the way the footnote looks can be adjusted via CSS.

Since FOOTNOTE is set up via XML, you'll need to create an appropriate namespace before you can use it. We're doing it as XML so that older browsers aren't affected adversely. It also lets us define a brand new tag. The namespace is set up in the <html> tag on your Web page.
Here's the one we're using on this page:

    <html xmlns:pub>

The next step is to define the XML tag we'll be using. This is done in the specific media type. In this case the behavior will apply to the screen, so we'll place its CSS properties there, and we associate it with our namespace by prefixing the namespace to the declaration. Our declaration looks like this:

    <style>
    <!--
    @media screen {
      pub\:FOOTNOTE { behavior: url(footnote.htc) }
      .footstyle { width: 250px; position: absolute; left: -1000px;
                   color: black; background-color: #9999cc; text-align: justify;
                   border-color: #404040; border-width: thin; border-style: solid;
                   padding: 1em; font-family: arial; font-size: 10pt; }
      .closer { cursor: hand; color: #ffff00; text-align: right; margin-top: 1em; }
      .fhilite { cursor: hand; color: chocolate; font-family: "Arial"; text-decoration: none; }
    }
    -->
    </style>

As you can see, FOOTNOTE only needs the behavior property. It must point to the behavior file, footnote.htc. You can download the footnote.htc file here. Once you have it, just make sure it's on your server and that the URL is specified properly in your CSS.

There are three CSS classes defined in the style block as well. These are all used by the footnote behavior.

The first is footstyle. This defines how the footnote will look when the user calls it. It should be applied to the division holding the footnote, and it's important that it have at least three properties:

width sets the display width, in pixels, of all footnotes.
left is used to hide the footnote until it is called.
position: absolute frees the footnote so that it can be positioned anywhere on the page.

The closer class describes how the word "close" will look in the displayed footnote block. This word is added to the bottom right corner of footnotes so that there is an option to remove them from the page display.

Lastly, the class fhilite describes how the footnote link will appear and adds a hand cursor for user feedback.
You'll need to create individual divisions for each footnote to be displayed. Here's what one from this page looks like:

    <div id=foot1 class=footstyle>
    <a name="footnote1"></a>
    (1) A user used to be someone who was heavily into drugs. Here a user simply refers to the person using a Web page. In this case, you.
    </div>

The id of the division is extremely important. It is via this id that the behavior manipulates the footnote. The id can be anything you like as long as it is unique. You'll be using it in the FOOTNOTE tag to link the action to the division. In this case we used the id of foot1. This would be referenced in the FOOTNOTE tag as footName="foot1"(2).

Let's take a look now at how that last footnote was called:

    <pub:FOOTNOTE footName="foot2"><a href="#footnote2">(2)</a></pub:FOOTNOTE>

It's that simple. Notice we've placed it around working HTML, which would scroll down to the footnote in older browsers. The footnote behavior will erase that for IE5 and replace it with appropriate HTML to call our enhanced footnotes, leaving just the text that is present within the tag.

You should note that footName is a required property. If you forget to include it, you won't get an error message; the enhanced footnote behavior will simply do nothing.

OK, so consider yourself armed, er, footed. You should now know everything you need to apply footnotes to your pages.

Magnify Behavior

It's become commonplace today to see websites that have lots of text crammed into a small area. Oftentimes some of that text is in the tiniest possible font. I can't speak for everyone, but in the wee hours of the morning it can be hard to read that text. Often I've wished for a way to magnify it without resizing the fonts in my browser. It seems natural that an easy way to magnify just a portion of a page would be ideal.
By creating a behavior for this and linking it to the page via XML it makes it possible for a magnify effect to be used nearly anywhere yet have the pages still work seamlessly for older browsers. See how easy it is to read magnified text by clicking the icon? After you've opened this you can close it by clicking the close icon on the bottom right.See how easy it is to read magnified text by clicking the icon? After you've opened this you can close it by clicking the close icon on the bottom right. If you look to your right you'll see an area of small text. If you are using IE5 beta 2+ you'll also see an icon of a magnifying glass. Older browsers won't show this icon since it was inserted into the page via the magnify behavior. If you click the icon a text window will display a magnified version of the exact text that is contained in the block along with an icon that will allow you to close it. Also, if there is any HTML formatting in that text, such as a link, it will be applied to the magnified version as well. This behavior was designed so that nearly all the control is in the hands of the designer. The only exception being the names of the icons used to indicate magnify and close magnify. These must be set in the .HTC file controlling this behavior. Everything else is done in the Web page itself using CSS and XML. Since MAGNIFY is set up via XML so you'll need to create an appropriate namespace before you can use it. We're doing it as XML so that older browsers aren't affected adversely. It also let's us define a brand new tag. The namespace is set up in the <html> tag on your webpage. Here's the one we're using on this page: <html xmlns:pub> The next step is to define XML tag we'll be using. This is done in the specific media type. In this case the behavior will apply to the screen so will place its CSS properties there and we associate it with our namespace by prefixing the namespace to the declaration. 
The next step is to define the XML tag and its properties for MAGNIFY. In doing this we also create a class called "magstyle" that defines what the magnified text will look like. This is done in the specific media type. In this case the behavior applies to the screen, so we place its CSS properties there and associate it with our namespace by prefixing the namespace to the declaration. Our declaration looks like this:

<style>
<!--
@media screen {
  pub\:MAGNIFY {behavior: url(magnify.htc)}
  .magstyle {color: black; background-color: goldenrod; border-color: black; border-width: thin; border-style: solid; padding: 1em; font-family: arial; font-size: 16pt; position: absolute; left: -1000px;}
}
-->
</style>

The most important part of that is the behavior property. It must point to the behavior file, magnify.htc. You can download the magnify.htc file here. Once you have it, just make sure it's on your server and that the URL is specified properly in your CSS.

One thing to notice about the magstyle class is that it specifies a left position of -1000 pixels. This hides the HTML that the behavior creates by placing it far off to the left of the display window. We're doing this partly because of a small display glitch in the version of IE used to create this, and partly because it's always been my preferred way to hide content: it's just as easy to specify a new position as it is to specify hidden/visible.

You'll also need to download the two icons used by this behavior. Right-click on each one and then select "save picture" to save magnify.gif and unmag.gif. The behavior looks for these icons in a directory called images. You can switch to other icons by editing the magnify.htc file to point to other images; you need one icon to represent the magnify option and one to indicate close magnify.

All that's left then is to place the XML tags around the text for which you wish to offer a magnified view.
It's this simple:

<pub:MAGNIFY newId="1" width="400" align="left">The text that you wish to be magnifiable.</pub:MAGNIFY>

I'm sure you noticed the properties we are passing to the magnify behavior. The first one, newId, is required. We could have added a complex routine to the behavior to generate random identifiers, but we chose to keep things simple and ask the designer to assign a unique name to each magnifiable section. Always be sure to assign a value to newId; it is needed to link the icon to the newly generated HTML of the magnified text.

The other two properties are optional. width specifies how wide the magnified area should be on the display; it defaults to 350 pixels if no width is specified. Making this a property gives the designer control over how the text will fit the screen in each magnified area. The other property, align, specifies the alignment of the magnify icon. Only "left" and "right" are valid values here; any other value, or no value at all, results in left alignment.

By now you should be ready to apply magnification to your own Web pages. If you still feel a bit uncomfortable trying this, view the source of this page to see how we've done it. Now go forth and magnify!

Pullquote Behavior

If you've ever picked up a magazine, the odds are good that you've seen a pullquote. A pullquote is a bit of text pulled from the body of an article or story and highlighted in some way to catch your eye, in the hope that it will tease you enough to get you to read the story. Up until now it's been a pain to do pullquotes in a Web page. It has always required working them into the HTML code and copying the quoted text by hand. For those reasons pullquotes have been a bit scarce on the Web.
By using a pullquote behavior, anyone can now put a pullquote into a Web page without complex layout tricks. It's as simple as putting a tag and some basic CSS into a Web page.

You set up PULLQUOTE via XML, so you'll need to create an appropriate namespace before you can use it. We're doing it as XML so that older browsers aren't adversely affected. It also lets us define a brand-new tag. The namespace is set up in the <html> tag on your Web page. Here's the one we're using on this page:

<html xmlns:pub>

The next step is to define the XML tag we'll be using. This is done in the specific media type. In this case the behavior applies to the screen, so we place its CSS properties there and associate it with our namespace by prefixing the namespace to the declaration. Our declaration looks like this:

<style>
<!--
@media screen {
  pub\:PULLQUOTE {behavior: url(pullquote.htc)}
  .pullstyle {width: 200px; color: black; text-align: left; border-color: #9966cc; border-width: thin; border-style: solid; border-right: none; border-left: none; padding: 1em; margin: 6pt; font-family: arial; font-style: italic; font-size: 14pt;}
}
-->
</style>

As you can see, PULLQUOTE only needs the behavior property. It must point to the behavior file, pullquote.htc. You can download the pullquote.htc file here. Once you have it, just make sure it's on your server and that the URL is specified properly in your CSS.

We also define a class called pullstyle. This is the CSS description of how a pullquote will look when rendered on a Web page; the behavior applies this style to the pullquote it creates. You have complete control over the appearance by changing the properties and values in pullstyle.

All that's left then is to place the XML tags around the text you wish to pull into a quote.
Here's how we marked the pullquote near the top of this page:

<pub:PULLQUOTE align="right" lips="pre">it's always been a pain to do pullquotes in a Web page.</pub:PULLQUOTE>

The align property specifies whether the pullquote will align on the left or the right of the page. Its valid values are, surprisingly enough, left or right. If you don't specify an alignment, the pullquote will align on the left.

The second property is lips, our abbreviation for ellipsis. An ellipsis is a series of three dots that can be used at the beginning of a quote, at the end, or on both ends. You can see an ellipsis in the pullquote to your left. The acceptable values for lips are pre, post, or both; all other values are ignored. This is an optional property, but it is very useful if you are quoting only part of a sentence.

Finally, just a few thoughts on proper use. A pullquote should contain text that will draw the reader in. It should contain only a small amount of relevant text, not several sentences. You'll probably want to use it near the top of a page so that a prospective reader sees it immediately. You also shouldn't make the style too different from the rest of the page; it should fit in yet be immediately visible. With those thoughts in mind, as well as your newfound knowledge of how to apply this behavior, it's time for you to go out there and pull one over on someone. A quote, that is.
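Both behaviors in this section read a few optional properties and fall back to defaults. MAGNIFY requires newId, defaults width to 350 pixels, and accepts only left or right for align. PULLQUOTE defaults align to left and adds ellipses according to lips. The Python model below mirrors only those documented rules. It is a hypothetical sketch, not the code inside magnify.htc or pullquote.htc, which isn't shown here. Two choices in it are assumptions: raising an error for a missing newId, and treating an invalid PULLQUOTE align like a missing one.

```python
from typing import Optional

# Hypothetical model of the documented attribute rules only; this is
# not the code inside magnify.htc or pullquote.htc.
DEFAULT_MAGNIFY_WIDTH = 350  # pixels, used when MAGNIFY has no width
ELLIPSIS = "..."             # three dots, added according to lips


def resolve_align(value: Optional[str]) -> str:
    """Only "left" and "right" are valid; anything else, or nothing, means left."""
    return value if value in ("left", "right") else "left"


def resolve_magnify(attrs: dict) -> dict:
    """Apply the MAGNIFY rules: newId required, width and align optional."""
    new_id = attrs.get("newId")
    if not new_id:
        # Assumption for this sketch: fail loudly. The article only says
        # newId must always be given, not what the .htc does without it.
        raise ValueError("MAGNIFY needs a unique newId")
    return {
        "newId": new_id,
        "width": int(attrs.get("width", DEFAULT_MAGNIFY_WIDTH)),
        "align": resolve_align(attrs.get("align")),
    }


def pullquote_text(text: str, lips: Optional[str] = None) -> str:
    """Add an ellipsis before, after, or on both ends; other lips values are ignored."""
    if lips in ("pre", "both"):
        text = ELLIPSIS + text
    if lips in ("post", "both"):
        text = text + ELLIPSIS
    return text


def resolve_pullquote(attrs: dict, text: str) -> dict:
    """Apply the PULLQUOTE rules: align defaults to left, lips is optional."""
    # Assumption: an invalid align is treated like a missing one (the
    # article only states the default for a missing align).
    return {
        "align": resolve_align(attrs.get("align")),
        "text": pullquote_text(text, attrs.get("lips")),
    }


if __name__ == "__main__":
    print(resolve_magnify({"newId": "1", "width": "400", "align": "left"}))
    print(resolve_magnify({"newId": "2", "align": "center"}))
    print(resolve_pullquote(
        {"align": "right", "lips": "pre"},
        "it's always been a pain to do pullquotes in a Web page.",
    ))
```

The second MAGNIFY call shows the fallbacks: with no width and an unsupported align of "center", it resolves to a width of 350 and left alignment. The PULLQUOTE call reproduces the leading ellipsis seen in this page's pullquote.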