What it is and how it works
eXtensible Markup Language
XML is a markup language much like HTML.
XML was designed to describe data.
XML tags are not predefined. You must define your own tags.
XML can use a Document Type Definition (DTD) or an XML Schema to describe the data.
XML with a DTD or XML Schema is designed to be self-descriptive
XML was designed to carry data.
XML is not a replacement for HTML.
XML and HTML were designed with different goals:
XML was designed to describe data and to focus on what data is.
HTML was designed to display data and to focus on how data looks.
HTML is about displaying information, while XML is about describing information.
XML was not designed to DO anything.
Maybe it is a little hard to understand, but
XML does not DO anything.
XML was created to structure, store, and send information.
The following example is a note to Joe from theOtherJoe, stored as XML:
<note>
<to>Joe</to>
<from>theOtherJoe</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend!</body>
</note>
The note has a header and a message body.
It also has sender and receiver information.
But still, this XML document does not DO anything.
It is just pure information wrapped in XML tags.
Software must be used to send, receive or display it.
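Because the note only carries data, some program has to do something with it. The sketch below is one possible "display" step. It assumes Python's standard `xml.etree.ElementTree` module, and `show_note` is a hypothetical helper that formats the fields as plain text.

```python
import xml.etree.ElementTree as ET

NOTE = """<note>
<to>Joe</to>
<from>theOtherJoe</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend!</body>
</note>"""

def show_note(xml_text: str) -> str:
    """Hypothetical display step: turn the note's fields into plain text."""
    note = ET.fromstring(xml_text)  # parse the markup into an element tree
    return (
        f"To: {note.findtext('to')}\n"
        f"From: {note.findtext('from')}\n"
        f"{note.findtext('heading')}: {note.findtext('body')}"
    )

if __name__ == "__main__":
    print(show_note(NOTE))
```

The XML itself never changes here; the software decides what the tags mean.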
XML tags are not predefined. You must "invent" your own tags.
The tags used to mark up HTML documents and the structure of HTML documents are predefined. The author of HTML documents can only use tags that are defined in the HTML standard (like <p>, <h1>, etc.).
XML allows the author to define their own tags and their own document structure.
The tags in the example above (like <to> and <from>) are not defined in any XML standard. These tags are "invented" by the author of the XML document.
An XML document is a plain text file.
It works on Windows, Macintosh, Unix, and any other system that can read text files.
With XML, your data is stored outside your HTML.
With XML, data can be exchanged between incompatible systems.
With XML, financial information can be exchanged over the Internet.
With XML, plain text files can be used to share data.
With XML, plain text files can be used to store data.
XML can also be used to store data in files or in databases.
Applications can be written to store and retrieve information from the store, and generic applications can be used to display the data.
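A minimal sketch of storing data in an XML file and reading it back, assuming Python's standard `xml.etree.ElementTree`. The helpers `save_note` and `load_note` and the file name `note.xml` are hypothetical, chosen only for illustration.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

def save_note(path: str, to: str, sender: str, heading: str, body: str) -> None:
    """Hypothetical helper: build a <note> element and write it to a text file."""
    note = ET.Element("note")
    for tag, value in (("to", to), ("from", sender), ("heading", heading), ("body", body)):
        ET.SubElement(note, tag).text = value
    # The result is an ordinary text file that starts with an XML declaration.
    ET.ElementTree(note).write(path, encoding="utf-8", xml_declaration=True)

def load_note(path: str) -> dict:
    """Hypothetical helper: read the file back and return its fields as a dict."""
    root = ET.parse(path).getroot()
    return {child.tag: child.text for child in root}

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "note.xml")
        save_note(path, "Joe", "theOtherJoe", "Reminder", "Don't forget me this weekend!")
        print(load_note(path))
```

Any other program that understands XML could open the same file, which is what makes the format useful for exchange.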
With XML, your data is available to more users.
WML, the markup language used by WAP devices, is an application of XML.
If developers have sense, most future applications will exchange their data in XML.
The syntax rules of XML are very simple and very strict.
The rules are very easy to learn, and very easy to use.
Because of this, creating software that can read and manipulate XML is very easy.
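An XML document usually starts with an XML declaration. It is optional in XML 1.0, but when present it must be the very first thing in the file: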
<?xml version="1.0" encoding="ISO-8859-1"?>
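The encoding named in the declaration tells the parser how to turn the file's bytes into characters. A small sketch, assuming Python's standard `xml.etree.ElementTree`; `read_recipient` is a hypothetical helper and the sample bytes are made up. In ISO-8859-1 the single byte 0xE9 stands for "é".

```python
import xml.etree.ElementTree as ET

# Raw bytes as they might sit in a file saved as ISO-8859-1 (Latin-1).
RAW = b'<?xml version="1.0" encoding="ISO-8859-1"?>\n<note><to>Jos\xe9</to></note>'

def read_recipient(raw: bytes) -> str:
    """Parse bytes; the parser decodes them using the declared encoding."""
    root = ET.fromstring(raw)
    return root.findtext("to")

if __name__ == "__main__":
    print(read_recipient(RAW))
```

Without the declaration the parser assumes UTF-8, and the lone 0xE9 byte is a syntax error.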
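XML tags are case sensitive: <Note> and <note> are different tags.
Every XML start tag MUST have a matching end tag (an empty element may also be written as <prod/>).
XML tags must be properly nested.
Every XML document must have exactly one root element that contains all the others.
Attribute values MUST ALWAYS be quoted.
In XML, white space in text content is PRESERVED.
XML comments use the same syntax as HTML comments: <!-- comment -->.
XML stores a new line as a line feed (LF). In ordinary text files a new line is CR LF on Windows, LF on Unix, and CR on classic Mac OS; an XML parser converts CR LF and a lone CR to LF.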
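The syntax rules above can be checked by simply trying to parse a document. A sketch assuming Python's standard `xml.etree.ElementTree`; the broken snippets and the `is_well_formed` helper are hypothetical examples, each breaking one rule.

```python
import xml.etree.ElementTree as ET

# Each snippet breaks exactly one of the syntax rules.
BROKEN = {
    "case sensitive": "<Note>hi</note>",
    "must be closed": "<note><to>Joe</note>",
    "properly nested": "<b><i>text</b></i>",
    "one root element": "<to>Joe</to><from>Ann</from>",
    "quoted attributes": "<note date=2002-08-01>hi</note>",
}

def is_well_formed(text: str) -> bool:
    """Return True if ElementTree can parse the text, False on a syntax error."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True

if __name__ == "__main__":
    for rule, snippet in BROKEN.items():
        print(f"{rule:18} {is_well_formed(snippet)}")

    # White space inside text is kept exactly as written.
    print(repr(ET.fromstring("<p>  two  spaces  </p>").text))
    # A Windows CR LF line break in the input comes back as a single LF.
    print(repr(ET.fromstring("<p>line1\r\nline2</p>").text))
    # The default parser skips comments, so <p> ends up with no children.
    print(len(ET.fromstring("<p><!-- a comment --></p>")))
```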
Consider our previous example:
<note>
<to>Joe</to>
<from>theOtherJoe</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend!</body>
</note>
If we have an application that will process this document and print it out correctly, then that application will still work if we change our definition of note as follows:
<note>
<date>2002-08-01</date>
<to>Joe</to>
<from>theOtherJoe</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend!</body>
</note>
This version of note with an extra field will not break the software that was used to manipulate the previous note.
If you remove fields or rename them, software that looks for those fields will break.
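A sketch of why extra fields are harmless while renamed ones are not, assuming Python's standard `xml.etree.ElementTree`. `summarize` is a hypothetical consumer that looks up only the fields it knows by name; the renamed `<recipient>` tag is made up for the example.

```python
import xml.etree.ElementTree as ET

OLD_NOTE = (
    "<note><to>Joe</to><from>theOtherJoe</from>"
    "<heading>Reminder</heading><body>Don't forget me this weekend!</body></note>"
)
NEW_NOTE = (
    "<note><date>2002-08-01</date><to>Joe</to><from>theOtherJoe</from>"
    "<heading>Reminder</heading><body>Don't forget me this weekend!</body></note>"
)
RENAMED = (
    "<note><recipient>Joe</recipient><from>theOtherJoe</from>"
    "<heading>Reminder</heading><body>Don't forget me this weekend!</body></note>"
)

REQUIRED = ("to", "from", "heading", "body")

def summarize(xml_text: str) -> str:
    """Hypothetical consumer: fetch known fields by name, ignore everything else."""
    note = ET.fromstring(xml_text)
    values = []
    for tag in REQUIRED:
        value = note.findtext(tag)
        if value is None:  # the field was removed or renamed
            raise KeyError(f"missing <{tag}>")
        values.append(value)
    return " | ".join(values)

if __name__ == "__main__":
    print(summarize(OLD_NOTE))
    print(summarize(NEW_NOTE))  # <date> is simply ignored
    try:
        summarize(RENAMED)
    except KeyError as err:
        print("broken:", err)
```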
<book>
<title>My First XML</title>
<prod id="33-657" media="paper"></prod>
<chapter>Introduction to XML
<para>What is HTML</para>
<para>What is XML</para>
</chapter>
<chapter>XML Syntax
<para>Elements must have a closing tag</para>
<para>Elements must be properly nested</para>
</chapter>
</book>
Element Content
An XML element is everything from the beginning of its start tag to the end of its closing tag.
An element's content can be one of the following:
Element content (an element within an element)
Mixed content (text and other elements)
Simple content (text only)
Empty content (nothing at all)
An element can also have attributes.
Consider the previous example:
<book> has element content
<chapter> has mixed content (text and other elements)
<para> has simple (or text) content.
<prod> has empty content
<prod> also has attributes with values.
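The four kinds of content can be told apart by looking at an element's children and text. A sketch assuming Python's standard `xml.etree.ElementTree`; `content_kind` is a hypothetical helper, and treating whitespace-only text (such as line breaks between tags) as "no text" is a simplifying assumption.

```python
import xml.etree.ElementTree as ET

BOOK = """<book>
<title>My First XML</title>
<prod id="33-657" media="paper"></prod>
<chapter>Introduction to XML
<para>What is HTML</para>
<para>What is XML</para>
</chapter>
<chapter>XML Syntax
<para>Elements must have a closing tag</para>
<para>Elements must be properly nested</para>
</chapter>
</book>"""

def content_kind(elem: ET.Element) -> str:
    """Classify content as element, mixed, simple, or empty (whitespace ignored)."""
    has_children = len(elem) > 0
    # Text can sit before the first child (elem.text) or after each child (child.tail).
    texts = [elem.text or ""] + [child.tail or "" for child in elem]
    has_text = any(t.strip() for t in texts)
    if has_children and has_text:
        return "mixed"
    if has_children:
        return "element"
    if has_text:
        return "simple"
    return "empty"

if __name__ == "__main__":
    book = ET.fromstring(BOOK)
    for elem in (book, book.find("chapter"), book.find("chapter/para"), book.find("prod")):
        print(elem.tag, content_kind(elem), elem.attrib)
```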
Element Naming Rules and Conventions
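Names can contain letters, digits, and a few other characters such as underscores, hyphens, and periods.
Names must NOT start with a number or punctuation mark (an underscore is allowed).
Names must not start with the letters xml in any mix of case (XML, xml, XmL, etc.); these names are reserved.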
Names may not contain spaces.
Names should be meaningful.
There is no (reasonable) limit to the length of a name.
Dashes and periods are better avoided in names, because some software may read them as operators (First-Name as "First minus Name", First.Name as a property of First). Underscores can substitute:
First_Name rather than First-Name or First.Name
The colon (:) is reserved for namespaces and should not be used.
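The naming rules above can be turned into a quick checker. This is a simplified, ASCII-only sketch (the real XML name rule also accepts many non-ASCII letters), and `name_problems` is a hypothetical helper, not part of any library.

```python
import re

# Simplified reading of the rules: start with a letter or underscore, then
# letters, digits, '_', '-' or '.'. Colons and spaces are rejected.
_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_.\-]*")

def name_problems(name: str) -> list[str]:
    """Return a list of rule violations (empty list means the name looks fine)."""
    problems = []
    if not _NAME.fullmatch(name):
        problems.append("must start with a letter or underscore and may only contain "
                        "letters, digits, '_', '-' or '.'")
    if name.lower().startswith("xml"):
        problems.append("names starting with 'xml' are reserved")
    if "-" in name or "." in name:
        problems.append("style: prefer '_' over '-' or '.'")
    return problems

if __name__ == "__main__":
    for candidate in ("First_Name", "First-Name", "first name", "1st", "XmlData"):
        print(candidate, name_problems(candidate))
```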
Attribute values can be quoted with either double or single quotes, but the opening and closing quote must match. If the value itself contains one kind of quote, delimit it with the other kind.
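A short sketch of the quoting rule, assuming Python's standard `xml.etree.ElementTree`. The `<quote>` element and the `attribute_text` helper are made up for the example.

```python
import xml.etree.ElementTree as ET

# The value contains double quotes, so it is delimited with single quotes.
SINGLE = """<quote speaker='Joe' text='He said "hello"'/>"""
# The value contains an apostrophe, so it is delimited with double quotes.
DOUBLE = """<quote speaker="Joe" text="Don't forget"/>"""

def attribute_text(xml_text: str) -> str:
    """Hypothetical helper: return the value of the text attribute."""
    return ET.fromstring(xml_text).get("text")

if __name__ == "__main__":
    print(attribute_text(SINGLE))
    print(attribute_text(DOUBLE))
```

When a value has to contain both kinds of quote, the predefined entity references &quot; and &apos; can be used inside it instead.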
Attributes vs Child Elements
Usually it’s better to use child elements.
Compare a <name> element with <first> and <last> child elements plus a sex attribute, against a <name> element with <first>, <last>, and <sex> child elements and no attributes.
A good rule of thumb: use attributes the way IDs and classes are used in HTML.
Store data as elements and metadata as attributes.
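The two <name> layouts hold the same information but need different lookups. A sketch assuming Python's standard `xml.etree.ElementTree`; the person data and the `read_person` helper are hypothetical.

```python
import xml.etree.ElementTree as ET

# The same (made-up) person stored two ways.
AS_ATTRIBUTE = '<name sex="female"><first>Anna</first><last>Smith</last></name>'
AS_ELEMENTS = "<name><first>Anna</first><last>Smith</last><sex>female</sex></name>"

def read_person(xml_text: str) -> dict:
    """Read either layout: try the attribute first, then the child element."""
    name = ET.fromstring(xml_text)
    sex = name.get("sex")  # attribute version
    if sex is None:
        sex = name.findtext("sex")  # child-element version
    return {"first": name.findtext("first"), "last": name.findtext("last"), "sex": sex}

if __name__ == "__main__":
    print(read_person(AS_ATTRIBUTE))
    print(read_person(AS_ELEMENTS))
```

One reason child elements are usually preferred: an attribute can appear only once per element and cannot contain further structure, while a child element can be repeated or extended later.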
Defining what data should look like
DTD (Document Type Definition)
XML Schema
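The rules for XML software state that if the data file has a syntax “error”, the program must stop processing it rather than guess what was meant.
It is the syntax, not the content, that triggers this: a document with broken syntax is not well-formed. A well-formed document that does not follow its DTD or Schema is called invalid.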
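A sketch of the difference, assuming Python's standard `xml.etree.ElementTree`, which checks syntax only and does no DTD or Schema validation. The `check` helper and the sample documents are hypothetical.

```python
import xml.etree.ElementTree as ET

def check(text: str) -> str:
    """Report whether the text parses; on failure, say where the parser stopped."""
    try:
        ET.fromstring(text)
    except ET.ParseError as err:
        line, column = err.position  # location of the first syntax error
        return f"not well-formed at line {line}, column {column}"
    return "well-formed"

if __name__ == "__main__":
    # Broken syntax: the end tag's case does not match, so parsing stops.
    print(check("<note>\n<to>Joe</To>\n</note>"))
    # Odd content but correct syntax: this parses fine. Catching the bad date
    # would need a validating parser and a DTD or Schema.
    print(check("<note><date>not a date</date></note>"))
```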
XSL (eXtensible Stylesheet Language): style sheets for XML.