XML

DT228/3 Web Development
Introduction to
XML
Definition
XML is a cross-platform, software
and hardware independent tool for
transmitting information.
“XML is going to be everywhere” – W3C
Introduction
• XML is an important technology used throughout web applications
• XML stands for eXtensible Markup Language
• XML is a markup language much like HTML
• XML was designed to describe data
• XML tags are not predefined in XML. You must define your own tags
Introduction
• XML uses a Document Type Definition (DTD) or an XML Schema to describe the data
• XML documents are human readable
• XML document file names conventionally end with .xml, e.g. note.xml
Introduction
The Web was created to publish
information for people
– “eyes-only” was the dominant design perspective
– Hard to search
– Hard to automate processing
Introduction
The Web is using XML to become a
platform for information exchange
between computers (and people)
– Overcomes HTML’s inherent limitations
– Enables the new business models of the network economy
eXtensible Markup Language
• Instead of a fixed set of format-oriented tags like
HTML, XML allows you to create whatever set of
tags is needed for your type of information.
• This makes any XML instance “self-describing”
and easily understood by computers and people.
• XML-encoded information is smart enough to
support new classes of Web and e-commerce
applications.
Why XML?
• Sample Catalog Entry in HTML
<TITLE> Laptop Computer </TITLE>
<BODY>
<UL>
<LI> IBM Thinkpad 600E
<LI>400 MHz
<LI> 64 Mb
<LI>8 Gb
<LI> 4.1 pounds
<LI> $3200
</UL>
</BODY>
•How can I parse the
content? E.g. price?
•Need a more flexible
mechanism than HTML
to interpret content.
Why XML?
Sample Catalog Entry using XML
<COMPUTER TYPE="Laptop">
  <MANUFACTURER>IBM</MANUFACTURER>
  <LINE>ThinkPad</LINE>
  <MODEL>600E</MODEL>
  <SPECIFICATIONS>
    <SPEED UNIT="MHz">400</SPEED>
    <MEMORY UNIT="MB">64</MEMORY>
    <DISK UNIT="GB">8</DISK>
    <WEIGHT UNIT="POUND">4.1</WEIGHT>
    <PRICE CURRENCY="USD">3200</PRICE>
  </SPECIFICATIONS>
</COMPUTER>
Smart processing using XML
• <COMPUTER> and <SPECIFICATIONS> provide logical containers for extracting and manipulating product information as a unit
– Sort by <MANUFACTURER>, <SPEED>, <WEIGHT>, <PRICE>, etc.
• Explicit identification of each part enables its automated processing
– Convert <PRICE> from “USD” to Euro, Yen, etc.
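The sorting and currency conversion described above can be sketched in Python with the standard library's `xml.etree.ElementTree`. This is only an illustration: the `<CATALOG>` wrapper, the second (Acme) entry and the exchange rate are invented for the example and do not come from the slides.

```python
import xml.etree.ElementTree as ET

# Hypothetical catalog: the <CATALOG> wrapper and the Acme entry are
# invented for illustration; only the IBM entry comes from the slides.
CATALOG = """
<CATALOG>
  <COMPUTER TYPE="Laptop">
    <MANUFACTURER>IBM</MANUFACTURER>
    <MODEL>600E</MODEL>
    <SPECIFICATIONS>
      <SPEED UNIT="MHz">400</SPEED>
      <PRICE CURRENCY="USD">3200</PRICE>
    </SPECIFICATIONS>
  </COMPUTER>
  <COMPUTER TYPE="Laptop">
    <MANUFACTURER>Acme</MANUFACTURER>
    <MODEL>A1</MODEL>
    <SPECIFICATIONS>
      <SPEED UNIT="MHz">300</SPEED>
      <PRICE CURRENCY="USD">1500</PRICE>
    </SPECIFICATIONS>
  </COMPUTER>
</CATALOG>
"""

USD_TO_EUR = 0.9  # made-up exchange rate, for illustration only


def price_usd(computer):
    """Read the <PRICE> element nested inside <SPECIFICATIONS>."""
    return float(computer.findtext("SPECIFICATIONS/PRICE"))


def price_eur(computer):
    """Convert the USD price to Euro using the made-up rate."""
    return round(price_usd(computer) * USD_TO_EUR, 2)


catalog = ET.fromstring(CATALOG)

# Sort whole <COMPUTER> units by price, cheapest first.
by_price = sorted(catalog.findall("COMPUTER"), key=price_usd)

for computer in by_price:
    print(computer.findtext("MANUFACTURER"),
          price_usd(computer), "USD =", price_eur(computer), "EUR")
```

Because the price sits in its own tagged element, the program never has to guess which list item is the price, which is the problem with the HTML version.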
Document exchange
• Use of XML allows companies to
exchange information that can be
processed automatically without
human intervention e.g.
– Purchase orders
– Invoices
– Catalogues etc
Difference between HTML and
XML?
• XML was designed to carry data.
• XML is not a replacement for HTML.
• XML and HTML were designed with different goals:
– XML was designed to describe data and to focus on what data is.
– HTML was designed to display data and to focus on how data looks.
• HTML is about displaying information, while XML is about describing information.
XML: Describes Data
• XML was not designed to DO anything.
• XML is created to structure, store and send information, BUT it requires software to send it, receive it or display it
• The following example is a note to Angela from Jimmy, stored as XML:
<note>
<to>Angela</to>
<from>Jimmy</from>
<heading>Reminder</heading>
<body>Get the paper!</body>
</note>
XML: Describes Data
The note has a header and a message body. It also has
sender and receiver information. But still, this XML
document does not DO anything.
It is just pure information wrapped in XML tags.
Someone must write a piece of software to send, receive
or display it.
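As a rough sketch of such a piece of software, the Python snippet below reads the note and turns it into display text. The function name `display_note` and the output layout are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

NOTE = """<note>
  <to>Angela</to>
  <from>Jimmy</from>
  <heading>Reminder</heading>
  <body>Get the paper!</body>
</note>"""


def display_note(xml_text):
    """Parse the note and return a plain-text rendering of it."""
    note = ET.fromstring(xml_text)
    return "\n".join([
        "To: " + note.findtext("to"),
        "From: " + note.findtext("from"),
        "Subject: " + note.findtext("heading"),
        "",
        note.findtext("body"),
    ])


print(display_note(NOTE))
```

The XML itself stays unchanged; all of the behaviour lives in the program that reads it.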
XML is free and extensible
• XML tags are not predefined. You must "invent" your own tags.
• The tags used to mark up HTML documents and the structure of HTML documents are predefined. The author of HTML documents can only use tags that are defined in the HTML standard (like <p>, <h1>, etc.).
• XML allows the author to define his own tags and his own document structure.
• The tags in the example above (like <to> and <from>) are not defined in any XML standard. These tags are "invented" by the author of the XML document.
XML Document Structure
XML Declaration
<root>
  <child>
    <subchild>.....</subchild>
  </child>
</root>
XML Document: Example
<?xml version="1.0" encoding="ISO-8859-1"?>
<note>
  <to>Angela</to>
  <from>Jimmy</from>
  <heading>Reminder</heading>
  <body>Get the paper!</body>
</note>
The first line in the document - the XML declaration - defines the XML version and the character encoding used in the document. In this case the document conforms to the 1.0 specification of XML and uses the ISO-8859-1 (Latin-1/West European) character set.
• <note>: root element (what the document is)
• <to>, <from>, <heading>, <body>: child elements
• </note>: end of the root element
XML Syntax rules
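1. All elements must have a closing </…> tag*
e.g. in HTML it’s okay to write:
<p> this is a paragraph
HTML allows the closing tags to be left out.
In XML, you would have to write:
<p> this is a paragraph </p>
An empty element may also be written as a single self-closing tag, e.g. <prod/>.
* The XML declaration has no closing tag because it is not an element.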
XML Syntax rules
2. Tag names are case sensitive
Unlike HTML, XML tags are case sensitive
<letter> this is incorrect </Letter>
<letter> this is correct </letter>
XML Syntax rules
3. All elements must be properly nested
<b><i> this is incorrect in XML </b></i>
(tolerated by browsers in HTML, but not allowed in XML)
<b><i> this is correct in XML </i></b>
XML Syntax rules
4. All XML documents must have a root element
• All XML documents must contain a single tag pair to define a root element - all other elements must be within this root element.
• All elements can have sub elements (child elements).
• Sub elements must be correctly nested within their parent element:
<root>
  <child>
    <subchild>.....</subchild>
  </child>
</root>
XML Syntax rules
5. Attribute values must always be in quotation marks ""
• XML elements can have attributes in name/value pairs just like in HTML. In XML the attribute value must always be quoted. Study the two XML documents below. The first one is incorrect, the second is correct:
Incorrect
<?xml version= etc
<note date=12/11/2002>
  <to>Tove</to>
  <from>Jani</from>
</note>
Correct
<?xml version= etc
<note date="12/11/2002">
  <to>Tove</to>
  <from>Jani</from>
</note>
XML Syntax rules
6. Valid Element Naming
XML elements must follow these naming rules:
•Names can contain letters, numbers, and other characters (such as hyphens, underscores and periods)
•Names must not start with a number or punctuation character (an underscore is allowed)
•Names must not start with the letters xml (or XML or Xml ..), which are reserved
•Names cannot contain spaces
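The naming rules above can be approximated with a short Python check. This is a deliberately simplified, ASCII-only sketch: the real XML specification also allows many non-ASCII letters and the colon (used for namespaces), and the function name `is_valid_name` is made up for this example.

```python
import re

# Simplified pattern: first character is a letter or underscore,
# the rest are letters, digits, periods, underscores or hyphens.
_NAME_RE = re.compile(r"[A-Za-z_][A-Za-z0-9._-]*")


def is_valid_name(name):
    """Return True if `name` passes the simplified naming rules."""
    if not _NAME_RE.fullmatch(name):
        return False
    # Names beginning with "xml" in any letter case are reserved.
    return not name.lower().startswith("xml")


for candidate in ["note", "first-name", "1note", "first name", "XmlData"]:
    print(candidate, is_valid_name(candidate))
```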
child/parent elements
<book>
  <title>My First XML</title>
  <prod id="33-657" media="paper"></prod>
  <chapter>Introduction to XML
    <para>What is HTML</para>
    <para>What is XML</para>
  </chapter>
  <chapter>XML Syntax
    <para>Elements must have tag</para>
    <para>Elements must be nested</para>
  </chapter>
</book>
Note: Indentation helps readability.
Book is the root element. Title, prod, and chapter are child
elements of book. Book is the parent element of title, prod,
and chapter. Title, prod, and chapter are siblings (or sister
elements) because they have the same parent.
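The parent/child/sibling relationships can be inspected programmatically. The sketch below uses Python's `xml.etree.ElementTree`; since that library does not store parent links, the `parent_of` dictionary is built by hand and is an assumption of this example rather than part of any XML standard.

```python
import xml.etree.ElementTree as ET

BOOK = """<book>
  <title>My First XML</title>
  <prod id="33-657" media="paper"></prod>
  <chapter>Introduction to XML
    <para>What is HTML</para>
    <para>What is XML</para>
  </chapter>
  <chapter>XML Syntax
    <para>Elements must have tag</para>
    <para>Elements must be nested</para>
  </chapter>
</book>"""

book = ET.fromstring(BOOK)

# Iterating over an element yields its direct children (the siblings).
children = [child.tag for child in book]
print(children)

# ElementTree keeps no parent pointers, so build a child -> parent map.
parent_of = {child: parent for parent in book.iter() for child in parent}

first_para = book.find("chapter/para")
print(first_para.text, "is inside", parent_of[first_para].tag)
```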
child elements vs attributes
1.
<note date="12/11/2002">
  <to>Tove</to>
  <from>Jani</from>
  <heading>Reminder</heading>
  <body>Hello!</body>
</note>
2.
<note>
  <date>
    <day>12</day>
    <month>11</month>
    <year>2002</year>
  </date>
  <to>Tove</to>
  <from>Jani</from>
  <heading>Reminder</heading>
  <body>Hello!</body>
</note>
Both 1 and 2 contain exactly the same information. 1 uses an
attribute for the date. 2 uses child elements.
Usually easier to use child elements – easier to read and
maintain.
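One way to see the maintenance difference is to read the date back out of each version. In this Python sketch (function names are made up for the example), the attribute version needs extra string splitting and an assumed day/month/year order, while the child-element version names each part explicitly.

```python
import xml.etree.ElementTree as ET

NOTE_1 = """<note date="12/11/2002">
  <to>Tove</to>
  <from>Jani</from>
</note>"""

NOTE_2 = """<note>
  <date>
    <day>12</day>
    <month>11</month>
    <year>2002</year>
  </date>
  <to>Tove</to>
  <from>Jani</from>
</note>"""


def date_from_attribute(xml_text):
    """Version 1: the program must know the string is day/month/year."""
    note = ET.fromstring(xml_text)
    day, month, year = note.get("date").split("/")
    return int(day), int(month), int(year)


def date_from_children(xml_text):
    """Version 2: each part of the date is labelled by its own tag."""
    date = ET.fromstring(xml_text).find("date")
    return tuple(int(date.findtext(part)) for part in ("day", "month", "year"))


print(date_from_attribute(NOTE_1))
print(date_from_children(NOTE_2))
```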
Well formed XML documents
A “well formed” XML document adheres to the syntax rules described above.
You can check whether an XML document is well formed at:
http://www.w3schools.com/dom/dom_validate.asp
OR
you can simply open your XML document in a web browser such as Internet Explorer; it will only display if the document is well formed.
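A well-formedness check can also be scripted. The sketch below relies on the fact that Python's `xml.etree.ElementTree` raises `ParseError` for any document that breaks the syntax rules; the helper name `is_well_formed` is an assumption of this example.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if the parser accepts the text, False otherwise."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


samples = {
    "missing closing tag": "<p> this is a paragraph",
    "case mismatch": "<letter> text </Letter>",
    "bad nesting": "<b><i> text </b></i>",
    "two root elements": "<a></a><b></b>",
    "unquoted attribute": "<note date=12/11/2002></note>",
    "correct": '<note date="12/11/2002"><to>Tove</to></note>',
}

for label, text in samples.items():
    print(label, "->", is_well_formed(text))
```

Each failing sample breaks one of the numbered syntax rules; only the last one parses.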
To manually create or write an
XML document:
• Figure out what the overall document is (-> root tag)
• Figure out what the key fields of information are (-> tag names)
• Figure out which information is data (-> tag contents)
To manually create or write an
XML document:
Example: Write an XML document for the following address book
Name             Telephone    Address
Jeremy Cannon    0098837      22, Marlboro Ct
Naomi Murphy     992887       39, Alma Road
Sheila Zheng     999287       Apt 4, Hyde Road
To manually create or write an
XML document:
• Figure out what the overall document is (-> root tag)
--- Address book
• Figure out what the key fields of information are (-> tag names)
--- Information is Name (which can be broken into first name, surname), Telephone, Address (which can be broken down into house number, street name)
• Figure out which information is data (-> tag contents)
--- The actual names, telephone numbers and addresses in the table
To manually create or write an
XML document:
example
<addressbook>
  <person>
    <name>
      <firstname>Jeremy</firstname>
      <surname>Cannon</surname>
    </name>
    <telephone>0098837</telephone>
    <address>
      <housenumber>22</housenumber>
      <street>Marlboro Ct</street>
    </address>
  </person>
  <person>
    <name>
      <firstname>Naomi</firstname>
      <!-- … etc -->
    </name>
  </person>
</addressbook>
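Instead of typing the document by hand, the same structure can be generated from the table data. The Python sketch below is one possible approach; the `PEOPLE` list layout and the choice to treat "Apt 4" as the house number are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

# (first name, surname, telephone, house number, street) per table row.
# Treating "Apt 4" as the house number is an assumption.
PEOPLE = [
    ("Jeremy", "Cannon", "0098837", "22", "Marlboro Ct"),
    ("Naomi", "Murphy", "992887", "39", "Alma Road"),
    ("Sheila", "Zheng", "999287", "Apt 4", "Hyde Road"),
]


def build_addressbook(people):
    """Build an <addressbook> element with one <person> per row."""
    book = ET.Element("addressbook")
    for first, last, phone, number, street in people:
        person = ET.SubElement(book, "person")
        name = ET.SubElement(person, "name")
        ET.SubElement(name, "firstname").text = first
        ET.SubElement(name, "surname").text = last
        ET.SubElement(person, "telephone").text = phone
        address = ET.SubElement(person, "address")
        ET.SubElement(address, "housenumber").text = number
        ET.SubElement(address, "street").text = street
    return book


addressbook = build_addressbook(PEOPLE)
xml_text = ET.tostring(addressbook, encoding="unicode")
print(xml_text)
```

Generating the tags in code guarantees that every opening tag gets a matching closing tag, which is the most common mistake when writing XML manually.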
To check your document..
• Save it as .xml file
• Open it in a browser – and see if any errors are produced
OR
• go to a specialist XML validator for better error diagnosis
e.g. http://www.w3schools.com/dom/dom_validate.asp
Valid XML documents
A “valid” XML document is well formed and also adheres to a definition of what it can contain (e.g. shouldn’t be able to put ’13’ as a <month>, must have only allowed elements)
Two ways to write this definition:
(1) Document Type Definition (DTD)
(2) XML Schema
Brief intro to both…
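The difference between well formed and valid can be illustrated with the month example. In this Python sketch, both documents parse without error, and only a hand-written rule (a stand-in for what a schema would express, with a made-up function name) rejects month 13. Note that a range restriction like this is something XML Schema data types can express, whereas a DTD cannot.

```python
import xml.etree.ElementTree as ET

DATE_OK = "<date><day>12</day><month>11</month><year>2002</year></date>"
DATE_BAD = "<date><day>12</day><month>13</month><year>2002</year></date>"


def month_is_valid(xml_text):
    """Hand-written stand-in for a schema rule: month must be 1..12."""
    date = ET.fromstring(xml_text)  # succeeds for any well-formed text
    return 1 <= int(date.findtext("month")) <= 12


print(month_is_valid(DATE_OK))
print(month_is_valid(DATE_BAD))
```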
DTD Declaration
You usually specify the DTD for your XML document by providing a reference to it near the top of the XML document.
<?xml version="1.0"?>
<!DOCTYPE note SYSTEM "note.dtd">
<note>
  <to>Tove</to>
  <from>Jani</from>
  <heading>Reminder</heading>
  <body>Don't forget me this weekend!</body>
</note>
Syntax: <!DOCTYPE root-element SYSTEM "filename">
Document Type Definitions
The DTD for the XML document note, written here as an internal DTD, is:
<!DOCTYPE note [
  <!ELEMENT note (to,from,heading,body)>
  <!ELEMENT to (#PCDATA)>
  <!ELEMENT from (#PCDATA)>
  <!ELEMENT heading (#PCDATA)>
  <!ELEMENT body (#PCDATA)>
]>
In the external file note.dtd, only the <!ELEMENT ...> lines appear, without the <!DOCTYPE note [ ... ]> wrapper.
It lists the document type (note), the valid elements, and the type of content they can accept.
Why use a DTD?
With a DTD, each of your XML files can carry a description of its
own format with it.
With a DTD, independent groups of people can agree to
use a common DTD for interchanging data.
Your application can use a standard DTD to verify that the
data you receive from the outside world is valid.
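Python's standard `xml.etree.ElementTree` parser does not validate against a DTD, so the sketch below hand-codes the content model of the note DTD, `(to,from,heading,body)` with text-only children, to show the kind of check a validating parser performs. The function name is made up, and the check covers only element order and nesting, not everything a real DTD validator enforces.

```python
import xml.etree.ElementTree as ET

# Mirrors <!ELEMENT note (to,from,heading,body)> from the DTD.
EXPECTED_CHILDREN = ["to", "from", "heading", "body"]


def conforms_to_note_dtd(xml_text):
    """Approximate check of the note content model (order and nesting)."""
    root = ET.fromstring(xml_text)
    if root.tag != "note":
        return False
    if [child.tag for child in root] != EXPECTED_CHILDREN:
        return False
    # (#PCDATA) means text only: no element nested inside the children.
    return all(len(child) == 0 for child in root)


GOOD = ("<note><to>Tove</to><from>Jani</from>"
        "<heading>Reminder</heading><body>Hi</body></note>")
MISSING_HEADING = "<note><to>Tove</to><from>Jani</from><body>Hi</body></note>"

print(conforms_to_note_dtd(GOOD))
print(conforms_to_note_dtd(MISSING_HEADING))
```

Both sample documents are well formed; only the first one matches the agreed structure, which is the check an application would run on data received from outside.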
XML Schema
XML Schema is an XML based alternative to DTD.
An XML schema describes the structure of an XML
document.
XML Schemas will probably be used in most Web
applications as a replacement for DTDs because:
•XML Schemas are supported by the W3C
•XML Schemas are richer and more useful than DTDs
•XML Schemas are written in XML
•XML Schemas support data types
•XML Schemas are extensible to future additions
•XML Schemas support namespaces
XML Schema
An XML Schema is itself written in XML syntax.
For the XML document note.xml below, the XML Schema is shown in the next example.
<?xml version="1.0"?>
<note>
<to>Tove</to>
<from>Jani</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend!</body>
</note>
XML Schema: Example
Annotations: <xs:schema> is the root element; the <xs:element> declarations define the elements allowed in the .xml document.
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
targetNamespace="http://www.w3schools.com"
xmlns="http://www.w3schools.com"
elementFormDefault="qualified">
<xs:element name="note">
<xs:complexType>
<xs:sequence>
<xs:element name="to" type="xs:string"/>
<xs:element name="from" type="xs:string"/>
<xs:element name="heading" type="xs:string"/>
<xs:element name="body" type="xs:string"/>
</xs:sequence>
</xs:complexType>
</xs:element>
</xs:schema>
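Because an XML Schema is itself XML, an ordinary XML parser can read it. The Python sketch below does not validate anything; it only lists the child elements and data types that the schema declares for `note`. The variable names are assumptions of this example.

```python
import xml.etree.ElementTree as ET

SCHEMA = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
    targetNamespace="http://www.w3schools.com"
    xmlns="http://www.w3schools.com"
    elementFormDefault="qualified">
  <xs:element name="note">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="to" type="xs:string"/>
        <xs:element name="from" type="xs:string"/>
        <xs:element name="heading" type="xs:string"/>
        <xs:element name="body" type="xs:string"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>"""

# ElementTree expands the xs: prefix to the full namespace URI.
XS = "{http://www.w3.org/2001/XMLSchema}"

schema = ET.fromstring(SCHEMA)
note_decl = schema.find(XS + "element")
child_decls = note_decl.findall(
    XS + "complexType/" + XS + "sequence/" + XS + "element")

declared = [(e.get("name"), e.get("type")) for e in child_decls]
print(note_decl.get("name"), "contains", declared)
```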