University of Dublin
Trinity College
Fundamentals of XML
Owen.Conlan@scss.tcd.ie
What is Markup
• <Lecture>
• Sequence of characters within a text or word
processing file to define
– Print properties
– Display properties
– Document's logical structure
• Markup indicators are often called "tags"
– Examples
• RTF: \ {}
• EDIFACT: + ' :
• XML: <> </> "
Mark Up: RTF
\li0\ri0\sb240\sa60\keepn\widctlpar\aspalpha\aspnum\faauto\outlinelevel2\a
djustright\rin0\lin0\itap0
\b\f1\fs26\lang2057\langfe1033\cgrid\langnp2057\langfenp1033
{\lang6153\langfe1033\langnp6153 Entity Relationship Diagram
\par }\pard\plain \s1\ql
\li0\ri0\sb240\sa60\keepn\widctlpar\aspalpha\aspnum\faauto\outlinelevel0\a
djustright\rin0\lin0\itap0 \cbpat17
\b\f1\fs24\lang2057\langfe1033\kerning32\cgrid\langnp2057\langfenp1033
{\lang6153\langfe1033\langnp6153 Entity Type
\par }\pard\plain \ql
\li0\ri0\widctlpar\aspalpha\aspnum\faauto\adjustright\rin0\lin0\itap0
\fs24\lang2057\langfe1033\cgrid\langnp2057\langfenp1033
{\b\fs20\ul\lang6153\langfe1033\langnp6153
Def.:}{\b\fs20\lang6153\langfe1033\langnp6153 }{
\fs20\lang6153\langfe1033\langnp6153 An object or concept that is
identified by the enterprise as having an independent existence.
\par }\pard\plain \s1\ql
\li0\ri0\sb240\sa60\keepn\widctlpar\aspalpha\aspnum\faauto\outlinelevel0\a
djustright\rin0\lin0\itap0 \cbpat17
\b\f1\fs24\lang2057\langfe1033\kerning32\cgrid\langnp2057\langfenp1033
{\lang6153\langfe1033\langnp6153 Entity
\par }\pard\plain \ql
Mark Up: EDIFACT
'''ED2'''OPENET:1111111:OVT':003705655815:OVT'ABC1234567'0'TYP:ORDERS'N
RQ:1'''
UNA:+.?
'UNB+UNOC:2+003705655815:30+1111111:30+980729:2233+4++ORDERS911+++KKK
KATE+1'UNH+
1+ORDERS:001:911:UN:FI0030'BGM+640+1234567'DTM+4:19981201:102'DTM+2:199
90101:102'DTM+2:9901:616'RFF+BC:123'RFF+VN:123456'NAD+BY+003705655815:1
00'
NAD+SE+11111111::92'NAD+PL+53432::92++KAUPPA:KAUPUNKI+KATU
9+KAUPUNKI++00007'NAD+CN+-::ZZ++TERMINAALI+OVI 42+TOINEN
KAUPUNKI++00069'UNS+D'LIN+1++23442423234
:EN'PIA+5+3244:MF'PIA+5+2341234324:ZBU'PIA+5+234243:ZCG'IMD+F+8+::91:KUKKAPUR
KKI:SAVI'QTY+21:8:KPL'FTX+AAA+++T.HARMAA:V[RI'FTX+AAA+++10:KOKO'PRI+NTP
:7.23:+
RP:7.32:PE'TAX+7+VAT+++:::22.00'LIN+2++543434554345:EN'PIA+5+535:MF'PIA
+5+45:
PCE'UNT+38+2'UNZ+2+4'
'''EOF'''9'
Mark Up: XML
<fragment>
<section>
<title>Introduction</title>
<para>Since the emergence of <acronym refid="xml">XML</acronym> in
early 1998 and its subsequent adoption across diverse application
domains, one of the key benefits it enabled was the separation of
content and presentation <bibref refloc="Bos97"/>. <acronym
refid="xml">XML</acronym> borrowed this model (along with other
important concepts) from the <acronym.grp><acronym
refid="sgml">SGML</acronym><expansion id="sgml">Standard
Generalised Markup Language</expansion></acronym.grp>. An
<acronym refid="sgml">SGML</acronym> document consists of
logically structured content and uses a separate file (style
sheet) to specify how the content should be formatted for
[...]
<figure id="img1">
<title>ePublishing Components</title>
<graphic href="02-04-03-fig01.jpg" width="321" height="214"/>
</figure>
</section>
</fragment>
What is SGML?
• Standard Generalised MarkUp Language
• ISO standard since 1986
• Meta-language for defining
document mark-up
vocabularies
• Uses logical mark-up
(structure, content) instead of
physical mark-up (how the document
looks on the printed page)
• Platform-, system-, vendor- and
version-independent documents
• Very powerful, but contains a
number of complex features
<!DOCTYPE anthology [
<!ELEMENT anthology - - (poem+)>
<!ELEMENT poem      - - (title?, stanza+)>
<!ELEMENT title     - O (#PCDATA)>
<!ELEMENT stanza    - O (line+)>
<!ELEMENT line      O O (#PCDATA)>
]>
<anthology>
<poem>
<title> The SICK ROSE </title>
<stanza>
<line>O Rose thou art sick.</line>
<line>The invisible worm,</line>
[...]
</stanza>
<stanza>
<line>Has found out thy bed</line>
<line>Of crimson joy:</line>
[...]
</stanza>
</poem>
</anthology>
What is HTML?
• HTML, the de facto standard
for publishing Web content, is
an SGML vocabulary
• Supporting full SGML on the
Web was too difficult so HTML
made some simplifications
– not extensible
– limited structure
– not content oriented
– cannot be validated
• HTML is a simple language to
understand and use
• Most of the content available
on the Web has been created
with HTML
<html>
<head>
<title>The SICK ROSE</title>
</head>
<body>
<h1>The SICK ROSE</h1>
<p>
O Rose thou art sick.<br />
The invisible worm,<br />
[...]
</p>
<p>
Has found out thy bed<br />
Of crimson joy:<br />
[...]
</p>
</body>
</html>
What is XML?
• eXtensible Markup Language
• XML is a simplified subset of SGML
• Can also be used to define document markup
vocabularies (e.g. XHTML)
– These can have a strictly defined structure (DTD)
• Retains the powerful features of SGML (extensibility,
structure, validation)
• Ignores the complex features of SGML and is
therefore easier to use and implement
• XML documents look similar to HTML documents
• Separates structure and presentation (like SGML)
Design of XML
• The design goals for XML as set out in the 1.0
specification are as follows:
1. XML shall be straightforwardly usable over the
Internet.
2. XML shall support a wide variety of applications.
3. XML shall be compatible with SGML.
4. It shall be easy to write programs which process
XML documents.
5. The number of optional features in XML is to be
kept to the absolute minimum, ideally zero.
Design of XML
6. XML documents should be human-legible and
reasonably clear.
7. The XML design should be prepared quickly.
8. The design of XML shall be formal and concise.
9. XML documents shall be easy to create.
10. Terseness in XML markup is of minimal
importance.
XML Example
<?xml version='1.0' encoding='ISO-8859-1' standalone='yes' ?>
<doc type="book" isbn="1-56592-796-9" xml:lang="en">
<title>A Guide to XML</title>
<author>Norman Walsh</author>
<chapter>
<title>What Do XML Documents Look Like?</title>
<paragraph>If you are [...]</paragraph>
<ol>
<item>
<paragraph>The document begins [...]</paragraph>
</item>
<item>
<paragraph>Empty elements have [...]</paragraph>
<paragraph>In a very [...]</paragraph>
</item>
</ol>
<section>[...]</section>
[...]
</chapter>
<chapter>[...]</chapter>
</doc>
Meta Language vs. Vocabulary
[Diagram]
Meta Languages: SGML, XML
Vocabularies:
– HTML
– Presentation Vocabularies: XSL, SMIL, XHTML, SVG
– Electronic Patient Record Vocabularies: HL7 v3, CEN TC251, ASTM 31.25, SynExML
– XTM
Why is the emergence of XML
an important development?
• XML is a tool for defining languages
– XML languages are easy to read
– XML is self describing
• Parse tree embedded in document
• Grammar for language referenced via DTD/Schema
• XML languages are easy for computers to
process, exchange and display
– XML tools are ubiquitous, free and conform to
established standards
– Natural affinity with Object serialization
– Data source neutral
XML technologies
[Diagram: technology layers, top to bottom]
Presentation / Linking: CSS (Cascading Style Sheets), XSL (Extensible Stylesheet Language), XPath, XQuery, XLink, XBase, XPointer
Semantics: Topic Maps, Web Ontology Language (OWL), RDF (Resource Description Framework)
Structure: XML Schema, RelaxNG, RDF Schema, Document Type Definition (DTD)
Syntax: XML Namespaces, XML 1.0
XML 1.0
• The XML 1.0 specification describes the syntax
for XML documents (elements and attributes)
and DTDs
• An XML document is a hierarchical data
structure using self-definable tags
– e.g. <doc><author>[..]</author></doc>
• There are many other technologies related to
XML
• XML is a simple common layer for tree structures
in a character stream.
Design goals of XML 1.0 specification
1. XML shall be straightforwardly usable over the
Internet.
2. XML shall support a wide variety of applications.
3. XML shall be compatible with SGML.
4. It shall be easy to write programs which process XML
documents.
5. The number of optional features in XML is to be kept
to the absolute minimum, ideally zero.
6. XML documents should be human-legible and
reasonably clear.
7. The XML design should be prepared quickly.
8. The design of XML shall be formal and concise.
9. XML documents shall be easy to create.
10. Terseness in XML markup is of minimal importance.
Physical parts of XML documents
• XML Declaration
• Elements
• Attributes
• Document Type Declaration
• Entities
• Processing Instructions
• Comments
• Character Data Sections
• XML Namespaces
XML Declaration
• Placed at the start of an
XML document
• Informs XML software of
– the version of XML the
document conforms to
– the character encoding
scheme used in the
document
– whether or not a set of
external declarations
affect the interpretation of
this document
<?xml version="1.0" ?>
<?xml
version="1.0"
encoding="UTF-8" ?>
<?xml
version="1.0"
encoding="UTF-8"
standalone="yes" ?>
Elements
• Define logical structure and
sections of XML documents
• Four different content types:
– Data content
– Element content
– Mixed content
– Empty
• Each element must be
completely enclosed by
another element, except for
the root
• Note
– An XML name must start
with a letter or underscore;
after that it can also include
digits, full stops and hyphens
– Don't start a name with a colon
(colons are reserved for namespaces)
– Don't include spaces
<?xml version="1.0" ?>
<doc>
<title>Java Gently</title>
<author>Judy Bishop</author>
<publisher name='HH' />
<chapter>
<thetext> this is <bold>
bold </bold> text </thetext>
<paragraph/>
</chapter>
</doc>
Attributes
• Provides additional
information about an
element
• Attributes are contained
within the start-tag
• Consists of a name and
associated value
separated by an equals
sign
• The attribute value must
always be enclosed by
quotes
• The order of attributes is
insignificant
<?xml version="1.0" ?>
<doc type="book"
isbn="0-201-71050-1">
<title>Java Gently</title>
<author>Judy Bishop</author>
<chapter>
<paragraph type="abstract">
In this book ...
</paragraph>
</chapter>
</doc>
ELEMENT vs. ATTRIBUTE
• Lexically little difference,
• application specific,
• no hard/fast rules available.
ELEMENT
• Constituent data,
• Used for content,
• White space can be
ignored or preserved,
• Nesting allowed (child
elements),
• Convenient for large
values, or binary
entities.
ATTRIBUTE
• Inherent data,
• Used for meta-data,
• No further nesting
possible (atomic data),
• Default values,
• Minimal datatypes.
Entities
• Storage units for
repeated text
– Defined in a DTD
• Character entities are
used to insert characters
that cannot be typed
directly
• XML contains a number
of 'built-in' entities
– &quot; (")
– &apos; (')
– &lt; (<)
– &gt; (>)
– &amp; (&)
<math>
5 &lt; 6 and 6 &gt; 5
</math>
<copyright>
&copyright-notice;
</copyright>
<bullet>
XML contains a number
of 'built-in'
entities
<list>
<item>&quot;</item>
<item>&apos;</item>
<item>&lt;</item>
<item>&gt;</item>
<item>&amp;</item>
</list>
</bullet>
Character Data Sections
• Data which is to be
parsed is called PCDATA
• An XML parser will not
treat the contents of a
CDATA section as
markup
– Used to simplify mark-up
by escaping a selection of
text
• Entity references are not
resolved
• Useful for including
source code in XML
<![CDATA[
You don't need to escape
special characters in CDATA
sections, such as <, >, &,
' and ".
]]>
<![CDATA[<<< STOP now >>>]]>
<![CDATA[<?xml version='1.0'?>
<person>
<name>Mike</name>
<age>24</age>
</person>]]>
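The difference between escaping with entity references and using a CDATA section can be checked with a small sketch. It assumes the JAXP DOM parser that ships with the JDK; the class name CdataDemo and the helper rootText are hypothetical names chosen for this illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class CdataDemo {

    /** Parses the XML text and returns the text content of its root element. */
    static String rootText(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // The same character data, written once with entity references
        // and once inside a CDATA section
        String escaped = rootText("<math>5 &lt; 6 and 6 &gt; 5</math>");
        String cdata = rootText("<math><![CDATA[5 < 6 and 6 > 5]]></math>");
        System.out.println(escaped.equals(cdata));

        // Inside CDATA an entity reference is not resolved; it stays literal text
        System.out.println(rootText("<t><![CDATA[&lt;]]></t>"));
    }
}
```

Both forms give the application the same character data; CDATA only changes how the text is written in the file.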
Processing Instructions
• Pass additional
information to
application (e.g. parser)
• Application-specific
instructions
• Consists of a PI Target
and PI Value
• Processed by
applications that
recognise the PI Target
<?xml-stylesheet type='text/css'
href='style.css'?>
<?xml-stylesheet type='text/xsl'
href='style.xsl'?>
<?myapp filename='test.txt'?>
Comments
• Used to comment XML
documents
• Not considered to be
part of an XML
document
• An XML parser is not
required to pass
comments to higher-level
applications
<!-- one-line comment -->
<!-- This
is a
multi-line comment
-->
Well formed XML
• XML Declaration recommended (optional in XML 1.0, required in XML 1.1)
• At least one element
– Exactly one root element
• Empty elements are written in one of two ways:
– Closing tag (e.g. "<br></br>")
– Special start tag (e.g. "<br />")
• For non-empty elements, closing tags are required
• Start tag must match closing tag (name & case)
• Correct nesting of elements
• Attribute values must always be quoted
• Attribute minimisation not allowed
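The rules above can be tried out with any XML parser, since a conforming parser must reject input that is not well formed. The following is a minimal sketch using the JAXP DOM parser bundled with the JDK; the class name WellFormedCheck and the method isWellFormed are hypothetical names for this illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

class WellFormedCheck {

    /** Returns true if the text parses as well-formed XML (no DTD validation). */
    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // Replace the default handler so fatal errors are thrown, not printed
            builder.setErrorHandler(new ErrorHandler() {
                public void warning(SAXParseException e) { }
                public void error(SAXParseException e) { }
                public void fatalError(SAXParseException e) throws SAXException { throw e; }
            });
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // a well-formedness rule was violated
        } catch (Exception e) {
            throw new IllegalStateException(e); // parser set-up or I/O problem
        }
    }

    public static void main(String[] args) {
        System.out.println(isWellFormed("<doc><br></br><br /></doc>"));    // both empty-element forms
        System.out.println(isWellFormed("<doc><b><i>text</b></i></doc>")); // incorrect nesting
        System.out.println(isWellFormed("<doc type=book/>"));              // unquoted attribute value
        System.out.println(isWellFormed("<a/><b/>"));                      // two root elements
        System.out.println(isWellFormed("<Doc></doc>"));                   // case mismatch
    }
}
```

Only the first call should print true; each of the others breaks one of the listed rules.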
Document Type Declaration
• Internal/embedded DTD
<?xml version='1.0'
standalone='yes'?>
<!DOCTYPE person [
<!ELEMENT person (name,
adult, nationality)>
…
]>
• External DTD
<?xml version='1.0'?>
<!DOCTYPE person SYSTEM
'person.dtd'>
What are XML Namespaces?
• W3C recommendation (January 1999)
• Each XML vocabulary is considered to own a
namespace in which all elements (and attributes) are
unique
• A single document can use elements and attributes
from multiple namespaces
– A prefix is declared for each namespace used within a
document.
– The namespace is identified using a URI (Uniform Resource
Identifier)
• An element or attribute can be associated with a
namespace by placing the namespace prefix before its
name (i.e. 'prefix:name')
– Elements belonging to the default namespace do not require a
prefix (the default namespace does not apply to unprefixed attributes)
Example: XML Namespaces
St. James’s Hospital
<!ELEMENT Patient (Name, DOB)>
<!ELEMENT Name (First, Last)>
<!ELEMENT First (#PCDATA)>
<!ELEMENT Last (#PCDATA)>
<!ELEMENT DOB (#PCDATA)>
Airport Pharmacy
<!ELEMENT Drug ((Name|Substance), Code)>
<!ELEMENT Name (#PCDATA)>
<!ELEMENT Substance (#PCDATA)>
<!ELEMENT Code (#PCDATA)>
<?xml version='1.0'?>
<AccidentReport
xmlns:sjh="http://hospital/sjh"
xmlns:dub="http://airport/dub">
<sjh:Patient>
<sjh:Name>
<sjh:First>Mike</sjh:First>
<sjh:Last>Murphy</sjh:Last>
</sjh:Name>
<sjh:DOB>12/12/1950</sjh:DOB>
</sjh:Patient>
<dub:Drug>
<dub:Name>Nurofen</dub:Name>
<dub:Code>IE-975-2</dub:Code>
</dub:Drug>
[...]
</AccidentReport>
© 2003 B. Jung
Why Namespaces?
• Important for creating XML documents
containing different types of data
• An XML document can be assembled using
elements (and attributes) from different XML
vocabularies
• Must be able to
– avoid conflicts between names
– identify the vocabulary an element belongs to
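How an application tells the two Name elements apart can be sketched with a namespace-aware DOM parse. This is only an illustration, assuming the JAXP parser bundled with the JDK and a shortened copy of the accident report; NamespaceDemo and namesIn are hypothetical names.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class NamespaceDemo {

    // Shortened version of the accident report: both vocabularies define <Name>
    static final String XML =
        "<AccidentReport xmlns:sjh='http://hospital/sjh' xmlns:dub='http://airport/dub'>"
      + "<sjh:Patient><sjh:Name><sjh:First>Mike</sjh:First><sjh:Last>Murphy</sjh:Last></sjh:Name></sjh:Patient>"
      + "<dub:Drug><dub:Name>Nurofen</dub:Name></dub:Drug>"
      + "</AccidentReport>";

    /** Returns the text of every Name element that belongs to the given namespace URI. */
    static List<String> namesIn(String namespaceUri) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // JAXP factories are not namespace aware by default
            Document doc = factory.newDocumentBuilder()
                                  .parse(new InputSource(new StringReader(XML)));
            // Elements are matched by namespace URI and local name, not by prefix
            NodeList nodes = doc.getElementsByTagNameNS(namespaceUri, "Name");
            List<String> result = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                result.add(nodes.item(i).getTextContent());
            }
            return result;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(namesIn("http://hospital/sjh")); // the patient's name
        System.out.println(namesIn("http://airport/dub"));  // the drug's name
    }
}
```

The prefixes sjh and dub are only local shorthand; the lookup uses the URI, so a document that chose different prefixes for the same URIs would give the same result.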
XML Processing: DOM Processing
[Diagram: XML Doc (character stream) → DOM (process into tree) → navigation API → Application]
• It views an XML tree as a data structure
• It is quite large and complex...
– Level 1 Core: W3C Recommendation, October 1998
• primitive navigation and manipulation of XML trees
• other Level 1 parts: HTML
– Level 2 Core: W3C Recommendation, November 2000
• adds Namespace support and minor new features
• other Level 2 parts: Events, Views, Style, Traversal and Range
– Level 3 Core: W3C Working Draft, April 2002
• adds minor new features
• other Level 3 parts: Schemas, XPath, Load/Save
Example: A Recipe
<recipe>
<title>Zuppa Inglese</title>
<ingredient name="egg yolks" amount="4" />
<ingredient name="milk" amount="2.5" unit="cup" />
<ingredient name="Savoiardi biscuits" amount="21" />
<ingredient name="sugar" amount="0.75" unit="cup" />
<ingredient name="Alchermes liquor" amount="1" unit="cup" />
<ingredient name="lemon zest" amount="*" />
<ingredient name="flour"
amount="0.5" unit="cup" />
<ingredient name="fresh whipping cream" amount="*" />
<preparation>
<step>Warm up the milk in a nonstick sauce pan</step>
<step>In a large bowl beat the egg yolks with the sugar, add the flour and
combine the ingredients until well mixed.</step>
<step>Add the milk, a little
bit at the time to the egg mixture, mixing well.</step>
<step>Put the mixture
into the sauce pan and cook it on the stove at a medium low heat. Mix the cream
continuously with a wooden spoon. When it starts to thicken remove it from the
heat and pour it on a large plate to cool off.</step>
<step>Stir the cream
now and then so that the top doesn't harden.</step>
<step>Dip quickly both
sides of the lady fingers in the liquor. Layer them one at the time in a glass
bowl large enough to contain 7 biscuits.</step>
<step>Spread 1/3 of the cream
and repeat the layer with lady fingers. Finish with the cream.</step>
</preparation>
<comment>Refrigerate for at least 4 hours better yet overnight. Before
serving decorate the zuppa inglese with whipped cream.</comment>
<nutrition calories="612" fat="49" carbohydrates="45" protein="4"
alcohol="2" />
</recipe>
Example: Getting a Recipe
import java.io.*;
import org.apache.xerces.parsers.DOMParser;
import org.w3c.dom.*;

public class FirstRecipeDOM {
  public static void main(String[] args) {
    try {
      // Parse the file named on the command line into a DOM tree
      DOMParser p = new DOMParser();
      p.parse(args[0]);
      Document doc = p.getDocument();
      // Walk the children of the root element until the first <recipe>
      Node n = doc.getDocumentElement().getFirstChild();
      while (n != null && !n.getNodeName().equals("recipe"))
        n = n.getNextSibling();
      PrintStream out = System.out;
      out.println("<?xml version=\"1.0\"?>");
      out.println("<collection>");
      if (n != null)
        print(n, out);
      out.println("</collection>");
    } catch (Exception e) { e.printStackTrace(); }
  }
  // print(Node, PrintStream) is not shown in this excerpt
}
© 2003 B. Jung
COPYRIGHT © 2000-2003 ANDERS MØLLER & MICHAEL I. SCHWARTZBACH
XML Processing: SAX Processing
[Diagram: XML Doc (character stream) → SAX → events API → Application]
• An XML tree is not viewed as a data structure, but as a stream of
events generated by the parser.
• The kinds of events are:
– the start of the document is encountered
– the end of the document is encountered
– the start tag of an element is encountered
– the end tag of an element is encountered
– character data is encountered
– a processing instruction is encountered
• Scanning the XML file from start to end, each event invokes a
corresponding callback method that the programmer writes
Example: Getting total amount of Flour
import java.io.*;
import org.xml.sax.*;
import org.xml.sax.helpers.*;
import org.apache.xerces.parsers.SAXParser;

public class Flour extends DefaultHandler {
  float amount = 0;

  // Called by the parser for every start tag
  public void startElement(String namespaceURI, String localName,
                           String qName, Attributes atts) {
    if (namespaceURI.equals("http://recipes.org") && localName.equals("ingredient")) {
      String n = atts.getValue("", "name");
      if ("flour".equals(n)) { // also false when 'name' is missing
        String a = atts.getValue("", "amount"); // assume 'amount' exists
        amount = amount + Float.valueOf(a).floatValue();
      }
    }
  }

  public static void main(String[] args) {
    Flour f = new Flour();
    SAXParser p = new SAXParser();
    p.setContentHandler(f);
    try { p.parse(args[0]); }
    catch (Exception e) { e.printStackTrace(); }
    System.out.println(f.amount);
  }
}
COPYRIGHT © 2000-2003 ANDERS MØLLER & MICHAEL I. SCHWARTZBACH
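The same callback idea can be sketched without Xerces, using only the JAXP SAX parser that ships with the JDK. This is a variant for illustration, not the authors' code: the class FlourTotal and the method totalFlour are hypothetical names, the recipe is assumed to use no namespace (as in the recipe example), and non-numeric amounts such as "*" are skipped.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class FlourTotal extends DefaultHandler {
    private float amount = 0;

    // Invoked by the parser each time a start tag is encountered
    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        // The default JAXP SAX parser is not namespace aware, so the name is in qName
        if (qName.equals("ingredient") && "flour".equals(atts.getValue("name"))) {
            String a = atts.getValue("amount");
            try {
                if (a != null) amount += Float.parseFloat(a);
            } catch (NumberFormatException e) {
                // amounts such as "*" carry no number; ignore them
            }
        }
    }

    /** Streams through the document once and returns the summed flour amount. */
    static float totalFlour(String xml) {
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            FlourTotal handler = new FlourTotal();
            parser.parse(new InputSource(new StringReader(xml)), handler);
            return handler.amount;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<recipe>"
                   + "<ingredient name='milk' amount='2.5' unit='cup'/>"
                   + "<ingredient name='flour' amount='0.5' unit='cup'/>"
                   + "<ingredient name='lemon zest' amount='*'/>"
                   + "</recipe>";
        System.out.println(totalFlour(xml));
    }
}
```

No tree is built at any point: the handler keeps only the running total, which is why SAX suits large documents.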
Summary
• XML = eXtensible Markup Language
• An XML document is a hierarchical data structure
using self-definable tags
• Physical parts of XML document
– XML Declaration
– Elements
– Attributes
– Document Type Declaration
– Entities
– Processing Instructions
– Comments
– Character Data Sections
– XML Namespaces
• Two types of APIs popular for XML Processing: DOM &
SAX
• </Lecture>
University of Dublin
Trinity College
Defining XML Vocabularies
DTDs and XML Schemas
Owen.Conlan@scss.tcd.ie
What is an XML vocabulary?
• Synonyms
– ‘Application of XML’
– XML Language
• Set of elements and attributes for
representing domain-specific information
• “Instance” of a Mark Up Language
• Defined by DTD or XML Schema
• Some are approved by standard organisations
– E.g. ebXML, MathML, XSL etc.
Remember: XML is syntax!
What is a DTD?
• Document Type Definition,
• Defines structure/model of XML documents
– Elements and Cardinality
– Attributes
– Aggregation
• Defines default ATTRIBUTE values
• Defines ENTITIES
• Stored in a plain text file and referenced by an XML document
(external)
• Alternatively a DTD can be placed in the XML document itself
(internal)
• Used to validate an XML document
• “Is there a need for a DTD”?
Why use a DTD?
• Applications may require all documents to be
consistent instances of a particular vocabulary
• Indicates what structures and names can be
used in a document
• Documents are constructed and named in a
conformant manner
– Ease constructing (provide structure)
– Ease parsing
• Validate documents in order to find
inconsistencies
Valid XML
• Well-formed plus conforms to DTD
• All elements and attributes are declared within
a DTD (internal or external)
• Elements and attributes match the
declarations in the DTD
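The step from well-formed to valid can be sketched with a validating parse. The example below assumes the JAXP DOM parser bundled with the JDK and a small internal DTD written for this illustration; DtdValidation and isValid are hypothetical names.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

class DtdValidation {

    // Internal DTD, prepended to every document that is checked
    static final String DTD =
        "<!DOCTYPE person [\n"
      + "  <!ELEMENT person (name, hobby*)>\n"
      + "  <!ATTLIST person age CDATA #IMPLIED>\n"
      + "  <!ELEMENT name (#PCDATA)>\n"
      + "  <!ELEMENT hobby (#PCDATA)>\n"
      + "]>\n";

    /** Returns true if the body is well formed and valid against the DTD above. */
    static boolean isValid(String body) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true); // switch on DTD validation
            DocumentBuilder builder = factory.newDocumentBuilder();
            // Validity errors are recoverable, so they must be turned into exceptions
            builder.setErrorHandler(new ErrorHandler() {
                public void warning(SAXParseException e) { }
                public void error(SAXParseException e) throws SAXException { throw e; }
                public void fatalError(SAXParseException e) throws SAXException { throw e; }
            });
            builder.parse(new InputSource(new StringReader(DTD + body)));
            return true;
        } catch (SAXException e) {
            return false; // not well formed, or does not match the declarations
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid("<person age='34'><name>John</name><hobby>Football</hobby></person>"));
        System.out.println(isValid("<person><hobby>Racing</hobby></person>"));          // name is missing
        System.out.println(isValid("<person><name>Mary</name><pet>cat</pet></person>")); // pet is undeclared
    }
}
```

The second and third documents are well formed but not valid, which is exactly the distinction this slide draws.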
Element Type Declaration
• Define grouping of
elements
– "(", ")"
<!ELEMENT doc
(title, author, editor,
chapter, appendix)>
<!ELEMENT title (#PCDATA)>
• Define sequence of
elements
– ",": followed-by
(Sequence)
– "|": logical or
(Choice)
<!ELEMENT author
(name | synonym)>
<!ELEMENT image EMPTY>
<!ELEMENT paragraph
(#PCDATA | bold | italic)*>
Element Type Declaration
• Define occurrences of
elements
– ?: zero-or-one
– +: one-or-more
– *: zero-or-more
<!ELEMENT doc
(title, author+, editor?,
chapter+, appendix*)>
<!ELEMENT chapter
(title,
(section+ | paragraph+))>
<!ELEMENT list
(item?, item?, item)>
<!ENTITY % list "ordered |
unordered | definition">
<!ELEMENT paragraph
(#PCDATA | %list;)*>
Attribute List Declaration
• Define type of attribute
– ID
– IDREF
– ENTITY
– NMTOKEN
– NOTATION
• Define default values of
attributes
– #REQUIRED
– #IMPLIED
– #FIXED
– A list of values with
default selection
<!ATTLIST person
ssn ID #IMPLIED>
<!ATTLIST adult
age CDATA #REQUIRED>
<!ATTLIST mml
version CDATA #FIXED '1.0'>
<!ATTLIST person
sex (m | f) #REQUIRED>
<!ATTLIST day
temperature (l | m | h) "l">
Entity Declaration
• Internal entities
– Built-in
• External entities
– References to a file
(text, images etc.)
• Parameter entities
– Used inside DTDs
<!ENTITY author
"Norman Walsh, Sun Corp.">
<!ENTITY copyright
SYSTEM "copyright.xml">
<!ENTITY % part
"(title?, (paragraph |
section)*)">
Simple DTD Example
<!ENTITY % part "(title?, (paragraph | section)*)">
<!ELEMENT doc (title, author+, chapter+, appendix*)>
<!ATTLIST doc type (book | article) "book"
isbn CDATA #REQUIRED>
<!ELEMENT title (#PCDATA)>
<!ELEMENT author (#PCDATA)>
<!ELEMENT chapter %part;>
<!ELEMENT appendix %part;>
<!ELEMENT section %part;>
<!ELEMENT paragraph (#PCDATA | url | ol)*>
<!ATTLIST paragraph type CDATA #IMPLIED>
<!ELEMENT ol (item+)>
<!ELEMENT item (paragraph+)>
<!ELEMENT url (#PCDATA)>
Example XML and related DTD
<!DOCTYPE database [
<!ELEMENT database (person*)>
<!ELEMENT person (name,hobby*)>
<!ATTLIST person age CDATA #IMPLIED>
<!ELEMENT name (title?, firstname+, surname)>
<!ELEMENT hobby (#PCDATA)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT firstname (#PCDATA)>
<!ELEMENT surname (#PCDATA)>
]>
<database>
<person age='34'>
<name>
<title> Mr </title>
<firstname> John </firstname>
<firstname> Paul </firstname>
<surname> Murphy </surname>
</name>
<hobby> Football </hobby>
<hobby> Racing </hobby>
</person>
<person>
<name>
<firstname> Mary </firstname>
<surname> Donnelly </surname>
</name>
</person>
</database>
What are XML Schemas?
• W3C Recommendation, 2 May 2001
– Part 0: Primer
– Part 1: Structures
– Part 2: Datatypes
• DTDs use a non-XML syntax and have a
number of limitations
– no namespace support
– lack of data-types
• XML Schemas are an alternative to DTDs
• Used to formally specify a "class" of XML
documents (each member is an "instance document")
• Supports simple/complex data-types
Why use XML Schemas?
• Uses an XML syntax
• Supports simple and complex data-types such
as user-defined types
• An XML document and its contents can be
validated against a Schema
• Can validate documents containing multiple
namespaces
• Schemas are more powerful than DTDs and
will eventually replace DTDs
Named Types – simple
DTD
<!ELEMENT firstname (#PCDATA)>
XML Schema
<xsd:element name="firstname" type="xsd:string"/>
XML doc. Instance
<firstname>Michael</firstname>
Named Types – complex
DTD
<!ELEMENT name (firstname, lastname)>
XML Schema
<xsd:complexType name="namePerson">
<xsd:sequence>
<xsd:element name="firstname" type="xsd:string"/>
<xsd:element name="lastname" type="xsd:string"/>
</xsd:sequence>
</xsd:complexType>
<xsd:element name="name" type="namePerson"/>
XML doc. Instance
<name>
<firstname>Michael</firstname>
<lastname>Porter</lastname>
</name>
Primitive Datatypes
• string
• boolean
• decimal
• float
• double
• duration
• dateTime
• time
• date
• gYearMonth
• gYear
• gMonthDay
• gDay
• gMonth
• hexBinary
• base64Binary
• anyURI
• QName
• NOTATION
http://www.w3.org/TR/xmlschema-2/
Simple Type - Restriction
XML Schema
<xsd:simpleType name='celsiusBodyTemp'>
<xsd:restriction base='xsd:decimal'>
<xsd:totalDigits value='4'/>
<xsd:fractionDigits value='1'/>
<xsd:minInclusive value='36.4'/>
<xsd:maxInclusive value='40.5'/>
</xsd:restriction>
</xsd:simpleType>
<xsd:element name="temp" type="celsiusBodyTemp"/>
XML doc. Instance
<temp>37.2</temp>
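What the facets of celsiusBodyTemp accept and reject can be checked with the javax.xml.validation API from the JDK. This is a sketch under assumptions: the schema is wrapped in an xsd:schema element bound to the 2001 XML Schema namespace, and TempSchemaCheck and isValid are hypothetical names.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

class TempSchemaCheck {

    // The restriction from the slide, wrapped in a complete schema document
    static final String XSD =
        "<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema'>"
      + "<xsd:simpleType name='celsiusBodyTemp'>"
      + "<xsd:restriction base='xsd:decimal'>"
      + "<xsd:totalDigits value='4'/><xsd:fractionDigits value='1'/>"
      + "<xsd:minInclusive value='36.4'/><xsd:maxInclusive value='40.5'/>"
      + "</xsd:restriction></xsd:simpleType>"
      + "<xsd:element name='temp' type='celsiusBodyTemp'/>"
      + "</xsd:schema>";

    /** Returns true if the instance document is valid against the schema above. */
    static boolean isValid(String instance) {
        try {
            SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            Schema schema = factory.newSchema(new StreamSource(new StringReader(XSD)));
            Validator validator = schema.newValidator();
            // With no error handler set, validate() throws on the first error
            validator.validate(new StreamSource(new StringReader(instance)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid("<temp>37.2</temp>"));  // within range, one fraction digit
        System.out.println(isValid("<temp>42.0</temp>"));  // above maxInclusive
        System.out.println(isValid("<temp>37.25</temp>")); // two fraction digits
        System.out.println(isValid("<temp>warm</temp>"));  // not a decimal
    }
}
```

A DTD could only declare temp as #PCDATA, so all four instances would pass DTD validation; the schema rejects the last three.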
Simple Type - Enumeration
XML Schema
<xsd:simpleType name="weekday">
<xsd:restriction base="xsd:string">
<xsd:enumeration value="Sunday"/>
<xsd:enumeration value="Monday"/>
<xsd:enumeration value="Tuesday"/>
[...]
</xsd:restriction>
</xsd:simpleType>
<xsd:element name="delivery" type="weekday"/>
XML doc. Instance
<delivery>Tuesday</delivery>
Complex Type - Cardinalities
DTD
<!ENTITY % fullname "title?, firstname*, lastname">
<!ELEMENT name (%fullname;)>
XML Schema
<xsd:complexType name="fullname">
<xsd:sequence>
<xsd:element name="title" minOccurs="0"/>
<xsd:element name="firstname" minOccurs="0"
maxOccurs="unbounded"/>
<xsd:element name="lastname"/>
</xsd:sequence>
</xsd:complexType>
<xsd:element name="name" type="fullname"/>
XML doc. Instance
<name>
<firstname>Michael</firstname>
<firstname>Jason</firstname>
<lastname>Porter</lastname>
</name>
Complex Type – Derived Type by extension
DTD
<!ENTITY % name "title?, firstname*, lastname">
<!ELEMENT name (%name;, maidenname?)>
XML Schema
<xsd:complexType name="fullnameExt">
<xsd:complexContent>
<xsd:extension base="fullname">
<xsd:sequence>
<xsd:element name="maidenname" minOccurs="0"/>
</xsd:sequence>
</xsd:extension>
</xsd:complexContent>
</xsd:complexType>
<xsd:element name="name" type="fullnameExt"/>
XML doc. Instance
<name>
<firstname>Jane</firstname>
<lastname>Porter</lastname>
<maidenname>Hughes</maidenname>
</name>
Complex Type – Derived Type by Restriction
XML Schema
<xsd:complexType name="simpleName">
<xsd:complexContent>
<xsd:restriction base="fullname">
<xsd:sequence>
<xsd:element name="title" maxOccurs="0"/>
<xsd:element name="firstname" minOccurs="1"/>
<xsd:element name="lastname"/>
</xsd:sequence>
</xsd:restriction>
</xsd:complexContent>
</xsd:complexType>
<xsd:element name="name" type="simpleName"/>
XML doc. Instance
<name>
<firstname>Jane</firstname>
<lastname>Porter</lastname>
</name>
Structure - Sequence
DTD
<!ELEMENT name (title?, firstname*, lastname)>
XML Schema
<xsd:complexType name="fullname">
<xsd:sequence>
<xsd:element name="title" minOccurs="0"/>
<xsd:element name="firstname" minOccurs="0"
maxOccurs="unbounded"/>
<xsd:element name="lastname"/>
</xsd:sequence>
</xsd:complexType>
<xsd:element name="name" type="fullname"/>
XML doc. Instance
<name>
<firstname>Michael</firstname>
<firstname>Jason</firstname>
<lastname>Porter</lastname>
</name>
Structure - Choice
DTD
<!ELEMENT pay (product, number, (cash | cheque))>
XML Schema
<xsd:complexType name="payment">
<xsd:sequence>
<xsd:element ref="product"/>
<xsd:element ref="number"/>
<xsd:choice>
<xsd:element ref="cash"/>
<xsd:element ref="cheque"/>
</xsd:choice>
</xsd:sequence>
</xsd:complexType>
<xsd:element name="pay" type="payment"/>
XML doc. Instance
<pay>
<product>Ericsson Telefon MD110</product>
<number>1544-198-J</number>
<cash>IR£150</cash>
</pay>
Attributes
DTD
<!ELEMENT greeting (#PCDATA)>
<!ATTLIST greeting language CDATA "English">
XML Schema
<xsd:element name="greeting">
<xsd:complexType>
<xsd:simpleContent>
<xsd:extension base="xsd:string">
<xsd:attribute name="language" type="xsd:string"/>
</xsd:extension>
</xsd:simpleContent>
</xsd:complexType>
</xsd:element>
XML doc. Instance
<greeting language="German">Hello!</greeting>
Attribute Groups
DTD
<!ELEMENT img EMPTY>
<!ATTLIST img src CDATA #REQUIRED
width CDATA #IMPLIED
height CDATA #IMPLIED>
XML Schema
<xsd:attributeGroup name="imgAttributes">
<xsd:attribute name="src" type="xsd:string" use="required"/>
<xsd:attribute name="width" type="xsd:integer"/>
<xsd:attribute name="height" type="xsd:integer"/>
</xsd:attributeGroup>
<xsd:element name="img">
<xsd:complexType>
<xsd:attributeGroup ref="imgAttributes"/>
</xsd:complexType>
</xsd:element>
XML doc. Instance
<img src="XMLmanager.gif" width="60"/>
Mixed Content
DTD
<!ELEMENT p (#PCDATA | b | i)*>
<!ELEMENT b (#PCDATA)>
XML Schema
<xsd:complexType name="bolditalicText" mixed="true">
<xsd:choice minOccurs="0" maxOccurs="unbounded">
<xsd:element ref="b" />
<xsd:element ref="i" />
</xsd:choice>
</xsd:complexType>
<xsd:element name="p" type="bolditalicText"/>
XML doc. Instance
<p>This is <b>bold</b> and <i>italic</i> text</p>
Empty Element
DTD
<!ELEMENT img EMPTY>
<!ATTLIST img src CDATA #REQUIRED>
XML Schema
<xsd:element name="img">
<xsd:complexType>
<xsd:attribute name="src" type="xsd:string"/>
</xsd:complexType>
</xsd:element>
XML doc. Instance
<img src="XMLmanager.gif"/>
XML Schema Example
<?xml version="1.0" encoding="utf-8"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<xsd:element name="book">
<xsd:complexType>
<xsd:sequence>
<xsd:element name="title" type="xsd:string"/>
<xsd:element name="author" type="xsd:string"/>
<xsd:element name="character" type="xsd:string"
minOccurs="0" maxOccurs="unbounded">
</xsd:element>
</xsd:sequence>
<xsd:attribute name="isbn" type="xsd:string"/>
</xsd:complexType>
</xsd:element>
</xsd:schema>
Summary
• XML Vocabularies are defined using
– DTD
– XSD
• DTDs/XSDs used to validate XML documents
• XSD – more powerful than DTDs
– Supports simple and complex data-types such as
user-defined types
– Can validate documents containing multiple
namespaces