XML
Major Sources:
•http://www.cis.upenn.edu/~cis550/slides/xml.ppt CIS550 Course Notes, U. Penn, source for many slides
•Yaron Kanza’s slides, source for many slides
•Brian Travis, XML Day at Microsoft Tech·Ed 99
•XML Black Book
•Other sources ….
1
Part I: Background
What’s the difference between the world
of documents and information retrieval and
databases and query interfaces?
•The line between the document world and
the database world is not clear.
•In some cases, both approaches are
legitimate.
•An interesting middle ground is data
formats -- of which XML is an example
2
Documents vs Databases
Document world
> plenty of small documents
> usually static
> implicit structure
> tagging
> human friendly
> content: section, paragraph, toc, form/layout, annotation
> Paradigms: “Save as”, wysiwyg
> meta-data: author name, date, subject
Database world
> a few large databases
> usually dynamic
> explicit structure (schema)
> records
> machine friendly
> content: schema, data, methods
> Paradigms: Atomicity, Concurrency, Isolation, Durability
> meta-data: schema description
3
What to do with them
Documents
• editing
• printing
• spell-checking
• counting words
• retrieving (IR)
• searching
Database
• updating
• cleaning
• querying
• composing/transforming
4
The Structure of XML
• XML consists of tags and text
• Tags come in pairs <date> ...</date>
• They must be properly nested
<date> <day> ... </day> ... </date> --- good
<date> <day> ... </date>... </day> --- bad
(HTML browsers tolerate <i> ... <b> ... </i> ... </b>; an XML parser rejects it.)
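The nesting rule can be checked mechanically. The following is a minimal sketch using Python's standard xml.etree.ElementTree parser; the helper name is_well_formed and the two sample strings are illustrative assumptions, not part of the original slides.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if the parser accepts the text (tags pair up and nest properly)."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Hypothetical sample documents mirroring the good and bad patterns above.
good = "<date><day>12</day><month>11</month></date>"
bad = "<date><day>12</date></day>"  # the closing tags cross each other

print(is_well_formed(good))
print(is_well_formed(bad))
```

The parser stops at the first crossing close tag, so a single check covers both pairing and nesting.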
5
XML text
XML has only one “basic” type -- text.
It is bounded by tags e.g.
<title> The Big Sleep </title>
<year> 1935 </year> --- 1935 is still text
XML text is called PCDATA (for parsed
character data). It uses Unicode; any character can be written
as a character reference, e.g. &#x05DE; for the Hebrew letter Mem
Later we shall see how new types are specified by
XML-data
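A short sketch of what "only text" means in practice, assuming Python's standard xml.etree.ElementTree parser. The element name letter is a hypothetical example; U+05DE is the Unicode code point of the Hebrew letter Mem.

```python
import xml.etree.ElementTree as ET

year = ET.fromstring("<year> 1935 </year>")
raw = year.text        # the parser hands back a string, spaces included
as_number = int(raw)   # turning it into a number is the application's job

# A character reference is resolved by the parser into one Unicode character.
letter = ET.fromstring("<letter>&#x05DE;</letter>")  # hypothetical element name
mem = letter.text

print(repr(raw), as_number, mem)
```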
6
XML is tree-like
person
  name: Malcolm Atchison
  tel: (215) 898 4321
  tel: (215) 898 4321
  email: mp@dcs.gla.ac.sc
Semistructured data models typically put the
labels on the edges
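The tree view can be made concrete with a small sketch, assuming Python's standard xml.etree.ElementTree. Note that this API puts the labels on the nodes (each element carries its tag); the helper name show is an illustrative assumption.

```python
import xml.etree.ElementTree as ET

person = ET.fromstring(
    "<person>"
    "<name>Malcolm Atchison</name>"
    "<tel>(215) 898 4321</tel>"
    "<tel>(215) 898 4321</tel>"
    "<email>mp@dcs.gla.ac.sc</email>"
    "</person>"
)


def show(node, depth=0):
    """Return one indented line per element: the tag, plus its text if any."""
    label = node.tag + (": " + node.text if node.text else "")
    lines = ["  " * depth + label]
    for child in node:
        lines.extend(show(child, depth + 1))
    return lines


print("\n".join(show(person)))
```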
7
Mixed Content
An element may contain a mixture of sub-elements and
PCDATA
<airline>
<name> British Airways </name>
<motto>
World’s <dubious> favorite</dubious> airline
</motto>
</airline>
Data of this form is not typically generated from databases. It is
needed for consistency with HTML
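Mixed content is awkward for data-oriented APIs because the text is split around the sub-elements. A minimal sketch, assuming Python's standard xml.etree.ElementTree, which stores the text before the first child in text and the text after a child in that child's tail:

```python
import xml.etree.ElementTree as ET

motto = ET.fromstring(
    "<motto>World's <dubious>favorite</dubious> airline</motto>"
)

before = motto.text       # text before the first sub-element
inside = motto[0].text    # text inside <dubious>
after = motto[0].tail     # text following </dubious>
flattened = "".join(motto.itertext())  # all character data in document order

print(repr(before), repr(inside), repr(after))
print(flattened)
```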
8
A Complete XML Document
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE addresses SYSTEM
"http://www.cs.technion.ac.il/~oshmu/addresses1.dtd">
<addresses>
<person>
<name> Jeff Cohen</name>
<tel> 04-828-1345 </tel>
<tel> 0544-470-778 </tel>
<email> jeffc@cs.technion.ac.il </email>
</person>
</addresses>
9
The Header Tag
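• <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
(standalone may be "yes" or "no"; the order version, encoding,
standalone is fixed)
• You can leave out the encoding declaration; the processor then
assumes UTF-8 (or UTF-16 when a byte order mark is present).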
10
Ways of representing a DB
projects: title, budget, managedBy
employees: name, ssn, age
11
Project and Employee relations in XML
Projects and employees are intermixed
<db>
<project>
<title> Pattern recognition </title>
<budget> 10000 </budget>
<managedBy> Joe </managedBy>
</project>
<employee>
<name> Joe </name>
<ssn> 344556 </ssn>
<age> 34 </age>
</employee>
<employee>
<name> Sandra </name>
<ssn> 2234 </ssn>
<age> 35 </age>
</employee>
<project>
<title> Auto guided vehicle </title>
<budget> 70000 </budget>
<managedBy> Sandra </managedBy>
</project>
:
</db>
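Because XML has no built-in notion of foreign keys, a join between projects and employees has to be done by the application. The sketch below assumes Python's standard xml.etree.ElementTree and a cleaned copy of the intermixed document (the elided rows are left out); the helper name field is an illustrative assumption.

```python
import xml.etree.ElementTree as ET

db = ET.fromstring("""
<db>
  <project>
    <title> Pattern recognition </title>
    <budget> 10000 </budget>
    <managedBy> Joe </managedBy>
  </project>
  <employee>
    <name> Joe </name>
    <ssn> 344556 </ssn>
    <age> 34 </age>
  </employee>
  <employee>
    <name> Sandra </name>
    <ssn> 2234 </ssn>
    <age> 35 </age>
  </employee>
  <project>
    <title> Auto guided vehicle </title>
    <budget> 70000 </budget>
    <managedBy> Sandra </managedBy>
  </project>
</db>
""")


def field(element, tag):
    """Text of the named child, without the padding spaces."""
    return element.findtext(tag).strip()


# Index employees by name, then follow managedBy as if it were a foreign key.
age_by_name = {field(e, "name"): int(field(e, "age")) for e in db.findall("employee")}
manager_age = {
    field(p, "title"): age_by_name[field(p, "managedBy")]
    for p in db.findall("project")
}

print(manager_age)
```

The same query works unchanged on this layout whatever the order of projects and employees, since findall selects children by tag.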
12
Project and Employee relations in XML (cont’d)
Employees follow projects
<db>
<projects>
<project>
<title> Pattern recognition </title>
<budget> 10000 </budget>
<managedBy> Joe </managedBy>
</project>
<project>
<title> Auto guided vehicles </title>
<budget> 70000 </budget>
<managedBy> Sandra </managedBy>
</project>
:
</projects>
<employees>
<employee>
<name> Joe </name>
<ssn> 344556 </ssn>
<age> 34 </age>
</employee>
<employee>
<name> Sandra </name>
<ssn> 2234 </ssn>
<age>35 </age>
</employee>
:
</employees>
</db>
13
Project and Employee relations in XML (cont’d)
Or without “separator” tags …
<db>
<projects>
<title> Pattern recognition </title>
<budget> 10000 </budget>
<managedBy> Joe </managedBy>
<title> Auto guided vehicles </title>
<budget> 70000 </budget>
<managedBy> Sandra </managedBy>
:
</projects>
<employees>
<name> Joe </name>
<ssn> 344556 </ssn>
<age> 34 </age>
<name> Sandra </name>
<ssn> 2234 </ssn>
<age> 35 </age>
:
</employees>
</db>
14
Attributes
An (opening) tag may contain attributes. These are
typically used to describe the content of an element
<entry>
<word language="en"> cheese </word>
<word language="fr"> fromage </word>
<word language="ro"> branza </word>
<meaning> A food made … </meaning>
</entry>
15
Attributes (cont’d)
Another common use for attributes is to express
dimension or type
<picture>
<height dim="cm"> 2400 </height>
<width dim="in"> 96 </width>
<data encoding="gif" compression="zip">
M05-.+C$@02!G96YE<FEC ...
</data>
</picture>
A document that obeys the “nested tags” rule and
does not repeat an attribute within a tag is said to
be well-formed.
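A small sketch of both points, assuming Python's standard xml.etree.ElementTree: attributes are read from the opening tag, and a tag that repeats an attribute is rejected as not well-formed. The helper name parses and the sample strings are illustrative assumptions.

```python
import xml.etree.ElementTree as ET

word = ET.fromstring('<word language="fr"> fromage </word>')
language = word.get("language")  # attribute value from the opening tag
content = word.text.strip()      # element content, padding removed


def parses(text: str) -> bool:
    """True if the text is accepted by the (non-validating) parser."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# The same attribute twice in one tag violates well-formedness.
repeated = '<word language="en" language="fr"> cheese </word>'

print(language, content, parses(repeated))
```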
16
ODL schema
class Movie
( extent Movies, key title )
{
attribute string title;
attribute string director;
relationship set<Actor> casts
inverse Actor::acted_In;
attribute int budget;
};
class Actor
( extent Actors, key name )
{
attribute string name;
relationship set<Movie> acted_In
inverse Movie::casts;
attribute int age;
attribute set<string> directed;
};
17
An example
<db>
<movie id="m1">
<title>Waking Ned Divine</title>
<director>Kirk Jones III</director>
<cast idrefs="a1 a3"></cast>
<budget>100,000</budget>
</movie>
<movie id="m2">
<title>Dragonheart</title>
<director>Rob Cohen</director>
<cast idrefs="a2 a9 a21"></cast>
<budget>110,000</budget>
</movie>
<movie id="m3">
<title>Moondance</title>
<director>Dagmar Hirtz</director>
<cast idrefs="a1 a8"></cast>
<budget>90,000</budget>
</movie>
:
<actor id="a1">
<name>David Kelly</name>
<acted_In idrefs="m1 m3 m78">
</acted_In>
</actor>
<actor id="a2">
<name>Sean Connery</name>
<acted_In idrefs="m2 m9 m11">
</acted_In>
<age>68</age>
</actor>
<actor id="a3">
<name>Ian Bannen</name>
<acted_In idrefs="m1 m35">
</acted_In>
</actor>
:
</db>
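The id and idrefs attributes play the role of object identifiers and references, but following a reference is left to the application. A minimal sketch, assuming Python's standard xml.etree.ElementTree and a trimmed copy of the data (only movie m1 and its two cast members); the helper name cast_names is an illustrative assumption.

```python
import xml.etree.ElementTree as ET

db = ET.fromstring("""
<db>
  <movie id="m1">
    <title>Waking Ned Divine</title>
    <cast idrefs="a1 a3"></cast>
  </movie>
  <actor id="a1"><name>David Kelly</name></actor>
  <actor id="a3"><name>Ian Bannen</name></actor>
</db>
""")

# Build an index from identifier to element, once for the whole document.
by_id = {e.get("id"): e for e in db.iter() if e.get("id") is not None}


def cast_names(movie_id):
    """Dereference the space-separated idrefs list of a movie's cast."""
    refs = by_id[movie_id].find("cast").get("idrefs").split()
    return [by_id[ref].findtext("name") for ref in refs]


print(cast_names("m1"))
```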
18
Part II: Document Type Definitions (DTDs)
Imposing structure on XML documents
19
<?xml version="1.0" encoding="UTF-8"?>
<!ELEMENT email (#PCDATA)>
<!ELEMENT tel (#PCDATA)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT person (name,tel*,email)>
<!ATTLIST person friend (yes | no) #IMPLIED
id ID #REQUIRED
knows IDREFS #IMPLIED>
<!ELEMENT addresses (person)*>
20
Document Type Definitions
• Document Type Definitions (DTDs) impose
structure on an XML document.
• There is some relationship between a DTD and a
schema, but it is not close -- hence the need for
additional “typing” systems.
• XML Schema and RELAX NG are two such
formalisms.
• The DTD is a syntactic specification.
21
Example: The Address Book
<person>
<name> MacNiel, John </name>         Exactly one name
<greet> Dr. John MacNiel </greet>    At most one greeting
<addr>1234 Huron Street </addr>      As many address lines
<addr> Rome, OH 98765 </addr>        as needed (in order)
<tel> (321) 786 2543 </tel>          Mixed telephones
<fax> (321) 786 2543 </fax>          and faxes
<tel> (321) 786 2543 </tel>
<email> jm@abc.com </email>          As many as needed
</person>
22
Specifying the structure
• name          to specify a name element
• greet?        to specify an optional (0 or 1) greet element
• name,greet?   to specify a name followed by an optional greet
23
Specifying the structure (cont)
• addr*          to specify 0 or more address lines
• tel | fax      a tel or a fax element
• (tel | fax)*   0 or more repeats of tel or fax
• email*         0 or more email elements
24
Specifying the structure (cont)
So the whole structure of a person entry is specified
by
name, greet?, addr*, (tel | fax)*, email*
This is known as a regular expression. Why is it
important?
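One practical consequence is that the child sequence of an element can be checked with ordinary regular-expression machinery. The sketch below is an assumption-laden illustration, not how a real validating parser is implemented: it encodes the content model as a Python re pattern over the space-terminated list of child tag names. The names CONTENT_MODEL and matches_person are hypothetical.

```python
import re

# name, greet?, addr*, (tel | fax)*, email*  written over "tag " tokens
CONTENT_MODEL = re.compile(r"(name )(greet )?(addr )*((tel|fax) )*(email )*")


def matches_person(child_tags):
    """True if the sequence of child tag names fits the person content model."""
    sequence = "".join(tag + " " for tag in child_tags)
    return CONTENT_MODEL.fullmatch(sequence) is not None


print(matches_person(["name", "greet", "addr", "addr", "tel", "fax", "tel", "email"]))
print(matches_person(["greet", "name"]))          # wrong order
print(matches_person(["name", "email", "tel"]))   # tel after email
```

Each tag name is followed by a space so that one tag cannot be mistaken for the prefix of another.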
25
Internal DTD for the address book
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE addressbook [
<!ELEMENT addressbook (person*)>
<!ELEMENT person
(name, greet?, address*, (fax | tel)*, email*)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT greet (#PCDATA)>
<!ELEMENT address (#PCDATA)>
<!ELEMENT tel (#PCDATA)>
<!ELEMENT fax (#PCDATA)>
<!ELEMENT email (#PCDATA)>
]>
26
Rest of the address book
<addressbook>
<person>
<name> Jeff Cohen </name>
<greet> Dr. Cohen </greet>
<email> jc@penny.com </email>
</person>
</addressbook>
27
Our relational DB revisited
projects: title, budget, managedBy
employees: name, ssn, age
28
Two DTDs for the relational DB
<!DOCTYPE db [
<!ELEMENT db (projects,employees)>
<!ELEMENT projects (project*)>
<!ELEMENT employees (employee*)>
<!ELEMENT project (title, budget, managedBy)>
<!ELEMENT employee (name, ssn, age)>
...
]>
<!DOCTYPE db [
<!ELEMENT db (project | employee)*>
<!ELEMENT project (title, budget, managedBy)>
<!ELEMENT employee (name, ssn, age)>
...
]>
29
Recursive DTDs
<!DOCTYPE genealogy [
<!ELEMENT genealogy (person*)>
<!ELEMENT person (
name,
dateOfBirth,
person,     -- mother
person )>   -- father
...
]>
What is the problem with this?
30
Recursive DTDs cont’d.
<!DOCTYPE genealogy [
<!ELEMENT genealogy (person*)>
<!ELEMENT person (
name,
dateOfBirth,
person?,     -- mother
person? )>   -- father
...
]>
What is now the problem with this?
31
General Definitions of Elements
ANY - the element can have any content.
EMPTY - the element has no content.
32
Summary of DTD regular expressions
• A          The tag A occurs
• e1,e2      The expression e1 followed by e2
• e*         0 or more occurrences of e
• e?         Optional -- 0 or 1 occurrences
• e+         1 or more occurrences
• e1 | e2    either e1 or e2
• (e)        grouping
33
Specifying attributes in the DTD
<!ELEMENT height (#PCDATA)>
<!ATTLIST height
dimension CDATA #REQUIRED
accuracy CDATA #IMPLIED >
The dimension attribute is required; the accuracy
attribute is optional.
CDATA is the “type” of the attribute -- it means
string, may take any literal string as a value.
34
Specifying ID and IDREF attributes
<!DOCTYPE family [
<!ELEMENT family (person)*>
<!ELEMENT person (name)>
<!ELEMENT name (#PCDATA)>
<!ATTLIST person
id       ID     #REQUIRED
mother   IDREF  #IMPLIED
father   IDREF  #IMPLIED
children IDREFS #IMPLIED>
]>
35
Some conforming data
<family>
<person id="jane" mother="mary" father="john">
<name> Jane Doe </name>
</person>
<person id="john" children="jane jack">
<name> John Doe </name>
</person>
<person id="mary" children="jane jack">
<name> Mary Doe </name>
</person>
<person id="jack" mother="mary" father="john">
<name> Jack Doe </name>
</person>
</family>
36
Consistency of ID and IDREF attribute values
•If an attribute is declared as ID
– the associated values must all be distinct (no
confusion).
•If an attribute is declared as IDREF
– the associated value must exist as the value of
some ID attribute (no dangling “pointers”).
•Similarly for all the values of an IDREFS
attribute.
•ID and IDREF attributes are not typed.
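These two conditions can be checked by hand when no validating parser is available. The following is a sketch only: Python's standard xml.etree.ElementTree does not read attribute types from a DTD, so the names of the ID, IDREF and IDREFS attributes are passed in as explicit assumptions (ID_ATTR, IDREF_ATTRS, IDREFS_ATTRS), and check_references is a hypothetical helper.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Assumed attribute roles, copied by hand from the family DTD.
ID_ATTR = "id"
IDREF_ATTRS = ("mother", "father")
IDREFS_ATTRS = ("children",)


def check_references(root):
    """Return a list of problems: duplicate IDs first, then dangling references."""
    ids = Counter(e.get(ID_ATTR) for e in root.iter() if e.get(ID_ATTR) is not None)
    problems = [f"duplicate ID {value}" for value, count in ids.items() if count > 1]
    for element in root.iter():
        refs = [element.get(a) for a in IDREF_ATTRS if element.get(a) is not None]
        for a in IDREFS_ATTRS:
            refs.extend((element.get(a) or "").split())
        problems.extend(f"dangling reference {r}" for r in refs if r not in ids)
    return problems


family = ET.fromstring("""<family>
<person id="jane" mother="mary" father="john"><name> Jane Doe </name></person>
<person id="john" children="jane jack"><name> John Doe </name></person>
<person id="mary" children="jane jack"><name> Mary Doe </name></person>
<person id="jack" mother="mary" father="john"><name> Jack Doe </name></person>
</family>""")

# Hypothetical broken document: a repeated ID and a reference to a missing one.
broken = ET.fromstring('<family><person id="jane" mother="mary"/><person id="jane"/></family>')

print(check_references(family))
print(check_references(broken))
```

Because the references are untyped, the check cannot tell whether mother points at a person or at some other element that happens to carry an ID.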
37
Formally
• Validity constraint: One ID per Element Type
No element type may have more than one ID attribute
specified.
• Validity constraint: ID Attribute Default
An ID attribute must have a declared default of
#IMPLIED or #REQUIRED.
• Validity constraint: IDREF
Values of type IDREF must match the Name production,
and values of type IDREFS must match Names; each
Name must match the value of an ID attribute on some
element in the XML document; i.e. IDREF values must
match the value of some ID attribute.
38
A useful abbreviation
When an element has empty content we can use
<tag blahblahbla/>
for
<tag blahblahbla></tag>
For example (DTD on next slide):
<family>
<person id="jane">
<name> Jane Doe </name>
<mother idref="mary"/>
<father idref="john"/>
</person>
...
</family>
39
An alternative specification
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE family [
<!ELEMENT family (person)*>
<!ELEMENT person (name, mother?, father?, children?)>
<!ATTLIST person id ID #REQUIRED>
<!ELEMENT name (#PCDATA)>
<!ELEMENT mother EMPTY>
<!ATTLIST mother idref IDREF #REQUIRED>
<!ELEMENT father EMPTY>
<!ATTLIST father idref IDREF #REQUIRED>
<!ELEMENT children EMPTY>
<!ATTLIST children idrefs IDREFS #REQUIRED>
]>
40
The revised data
<family>
<person id="jane">
<name> Jane Doe </name>
<children idrefs="ami tami"/>
</person>
<person id="john">
<name> John Doe </name>
<children idrefs="ami tami"/>
</person>
<person id="ami">
<name> Ami Doe </name>
<mother idref="jane"/>
<father idref="john"/>
</person>
<person id="tami">
<name> Tami Doe </name>
</person>
</family>
41
ODL schema
class Movie
( extent Movies, key title )
{
attribute string title;
attribute string director;
relationship set<Actor> cast
inverse Actor::acted_In;
attribute int budget;
};
class Actor
( extent Actors, key name )
{
attribute string name;
relationship set<Movie> acted_In
inverse Movie::cast;
attribute int age;
attribute set<string> directed;
};
42
Schema.dtd
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE db [
<!ELEMENT db (movie+, actor+)>
<!ELEMENT movie (title,director,cast,budget)>
<!ATTLIST movie id ID #REQUIRED>
<!ELEMENT title (#PCDATA)>
<!ELEMENT director (#PCDATA)>
<!ELEMENT cast EMPTY>
<!ATTLIST cast idrefs IDREFS #REQUIRED>
<!ELEMENT budget (#PCDATA)>
43
Schema.dtd (cont’d)
<!ELEMENT actor (name, acted_In,age?,directed*)>
<!ATTLIST actor id ID #REQUIRED>
<!ELEMENT name (#PCDATA)>
<!ELEMENT acted_In EMPTY>
<!ATTLIST acted_In idrefs IDREFS #REQUIRED>
<!ELEMENT age (#PCDATA)>
<!ELEMENT directed (#PCDATA)>
]>
44
Data
<db>
<movie id="ohgod">
<title> Oh God!</title>
<director> Woody Allen </director>
<cast idrefs="burns"></cast>
<budget> $2M </budget>
</movie>
<actor id="burns">
<name> George Burns </name>
<acted_In idrefs="ohgod" />
</actor>
</db>
45
Connecting the document with its DTD
In line:
<?xml version="1.0"?>
<!DOCTYPE db [<!ELEMENT ...> … ]>
<db> ... </db>
Another file:
<!DOCTYPE db SYSTEM "schema.dtd">
A URL:
<!DOCTYPE db SYSTEM
"http://www.schemaauthority.com/schema.dtd">
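The link between a document and its DTD is visible to programs as the document type node. A minimal sketch, assuming Python's standard xml.dom.minidom, which records the DOCTYPE declaration but does not fetch or apply the external DTD:

```python
from xml.dom import minidom

doc = minidom.parseString('<!DOCTYPE db SYSTEM "schema.dtd"><db></db>')

doctype = doc.doctype
root_name = doctype.name       # the root element named in the DOCTYPE
system_id = doctype.systemId   # where the external DTD is said to live

print(root_name, system_id)
```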
46
Connecting the document with its DTD
Both:
file c:/schema.dtd:
<!ELEMENT db (movie+, actor+)>
file to be validated:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE db SYSTEM "c:/schema.dtd" [
<!ELEMENT movie (title,director,cast,budget)>
<!ATTLIST movie id ID #REQUIRED>
<!ELEMENT title (#PCDATA)>
<!ELEMENT director (#PCDATA)>
<!ELEMENT cast EMPTY>
<!ATTLIST cast idrefs IDREFS #REQUIRED>
<!ELEMENT budget (#PCDATA)>
<!ELEMENT actor (name, acted_In,age?, directed*)>
<!ATTLIST actor id ID #REQUIRED>
<!ELEMENT name (#PCDATA)>
<!ELEMENT acted_In EMPTY>
<!ATTLIST acted_In idrefs IDREFS #REQUIRED>
<!ELEMENT age (#PCDATA)>
<!ELEMENT directed (#PCDATA)>
]>
<db>
<movie id="ohgod">
<title> Oh God!</title>
<director> Woody Allen </director>
<cast idrefs="burns"></cast>
<budget> $2M </budget>
</movie>
<actor id="burns">
<name> George Burns </name>
<acted_In idrefs="ohgod" />
</actor>
</db>
47
Well-formed and Valid Documents
• Well-formed applies to any document (with or
without a DTD): proper nesting of tags and unique
attributes.
• Valid specifies that the document conforms to the
DTD: conforms to regular expression grammar,
types of attributes correct, and constraints on
references satisfied.
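The distinction shows up as soon as a non-validating parser is used. In the sketch below, which assumes Python's standard xml.etree.ElementTree (a parser that checks well-formedness only), a document that violates its own DTD is still accepted. The document itself is a hypothetical example in which email precedes name.

```python
import xml.etree.ElementTree as ET

document = """<!DOCTYPE addressbook [
<!ELEMENT addressbook (person*)>
<!ELEMENT person (name, email*)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT email (#PCDATA)>
]>
<addressbook>
  <person>
    <email> jc@penny.com </email>
    <name> Jeff Cohen </name>
  </person>
</addressbook>"""

# Accepted: the tags nest properly, so the document is well-formed.
root = ET.fromstring(document)

# Not valid: the DTD requires name to come first, which only a
# validating parser (or a hand-written check like this one) would notice.
child_order = [child.tag for child in root.find("person")]
starts_with_name = child_order[0] == "name"

print(root.tag, child_order, starts_with_name)
```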
48
DTDs vs. Schemas (or Types)
• By database (or programming language) standards
DTDs are rather weak specifications.
– Only one base type -- PCDATA
– No useful “abstractions” e.g., sets
– IDREFs are untyped. You point to something, but you
don’t know what!
– No constraints e.g., child is inverse of parent
– No methods
– Tag definitions are global
• XML Schema and other standards are similar to
DB schemas.
49
Part III: Entities
To take storage into account
50
What are Entities
An entity is a shortcut to a set of information items.
You might think of an entity as being a bit like a
macro.
Entities also allow a document to be divided across
several storage units (for example, separate files).
51
Why use entities:
• Entities save typing.
• Entities can reduce errors.
• Entities are easy to update.
• Entities can act as placeholders for TBD
information.
52
Defining Entities
• You can define entities in your local document as
part of the DOCTYPE definition.
• You can also link to external files that contain the
entity data. This, too, is done through the
DOCTYPE definition.
• A third option is to define the entities in your
external DTD.
• Use a local definition when the entity is used only
in this one particular file.
• Use a linked, external file when the entity is used in
many document sets.
53
Kinds of Entities
There are two kinds of entities:
• general entities
• parameter entities
Each entity is also either:
• Internal or External
• Parsed or Unparsed
Possibilities (first 4 are Parsed):
1. Internal Parameter
2. External Parameter
3. Internal General
4. External General
5. External General Unparsed
54
General entities
The definition of general entities in the DTD
<!ENTITY Name EntityDefinition >
The usage of the entity in the document is by
&Name;
55
Example (partial)
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE mdb [
<!ENTITY bm "bad movie">
<!ELEMENT mdb (movie+)>
<!ELEMENT movie (title,director,cast?,budget)>]>
<mdb>
<movie id="ohgod" opinion="&bm;">
<title> Oh God!</title>
<director> Woody Allen </director>
<budget> $2M </budget>
</movie>
</mdb>
56
Example - in full
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE mdb [
<!ENTITY bm "bad movie">
<!ELEMENT mdb (movie+)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT budget (#PCDATA)>
<!ELEMENT director (#PCDATA)>
<!ELEMENT movie (title,director,cast?,budget)>
<!ATTLIST movie
id ID #REQUIRED
opinion CDATA #IMPLIED>]>
<mdb>
<movie id="ohgod" opinion="&bm;">
<title> Oh God!</title>
<director> Woody Allen </director>
<budget> $2M </budget>
</movie>
</mdb>
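Entity references are replaced by the parser before the application sees the data. A minimal sketch, assuming Python's standard xml.etree.ElementTree, which expands internal general entities declared in the internal DTD subset; the document is a shortened copy of the example above without the XML declaration.

```python
import xml.etree.ElementTree as ET

document = """<!DOCTYPE mdb [
<!ENTITY bm "bad movie">
]>
<mdb>
<movie id="ohgod" opinion="&bm;">
<title> Oh God!</title>
</movie>
</mdb>"""

mdb = ET.fromstring(document)
movie = mdb.find("movie")
opinion = movie.get("opinion")  # the reference &bm; has already been expanded

print(opinion)
```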
57
Browser View
58
Non-parsed Entities
<!DOCTYPE mdb [
<!NOTATION gif SYSTEM "c:\Program Files\Netscape\Communicator\Program\Netscape.exe">
<!ENTITY starpicture SYSTEM "http://www.cs.technion.ac.il/~oshmu/star.gif" NDATA gif>
<!ENTITY bm "bad movie">
<!ELEMENT mdb (movie+)>
<!ELEMENT movie (title,director, budget)>
<!ATTLIST movie id ID #REQUIRED
opinion CDATA #IMPLIED
starimage ENTITY #IMPLIED>
<!ELEMENT title (#PCDATA)>
<!ELEMENT director (#PCDATA)>
<!ELEMENT budget (#PCDATA)>
]>
59
Data
<mdb>
<movie id="ohgod" opinion="&bm;"
starimage="starpicture">
-- note: no ampersand
<title> Oh God!</title>
<director> Woody Allen </director>
<budget> $2M </budget>
</movie>
</mdb>
60
Parameter Entities
Parameter entities are used only within DTDs.
They carry information for use in the markup
declaration.
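• Internal entities - the replacement text is given within the DTD.
• External entities - the replacement text is drawn
from outside files.
Parameter Entity declaration:
<!ENTITY % Name EntityDefinition >
In the internal DTD subset, a parameter-entity reference cannot be
used inside a markup declaration; that is allowed only in the external subset.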
61
Parameter Entity Example
<?xml version="1.0" encoding="UTF-8"?>
<!ENTITY % essential "name, tel*">
<!ELEMENT email (#PCDATA)>
<!ELEMENT tel (#PCDATA)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT person (%essential;, email, advisor?)>
<!ATTLIST person friend (yes | no) #IMPLIED
id ID #REQUIRED
knows IDREFS #IMPLIED>
<!ELEMENT advisor (person)>
<!ELEMENT addresses (person)*>
62
Entities Definition
Local Definition:
<!DOCTYPE PRESSRELEASE [ <!ENTITY copyright
"Copyright 2000, As The World Spins Corp. All
rights reserved. Please do not copy or use without
authorization. For authorization contact
legal@worldspins.com."> ]>
Global Definition:
<!DOCTYPE PRESSRELEASE [ <!ENTITY copyright SYSTEM
"http://www.worldspins.com/legal/copyright.xml">]>
63
Example
<?xml version="1.0"?>
<!DOCTYPE PRESSRELEASE [ <!ENTITY copyright
"Copyright 2000, As The World Spins Corp. All rights reserved.
Please do not copy or use without authorization. For
authorization contact legal@worldspins.com.">
<!ENTITY trademark SYSTEM
"http://www.worldspins.com/legal/trademark.xml">
]>
64
Example (cont.)
<PRESSRELEASE>
<HEAD>
Mini-globe revolutionizes keychain industry
</HEAD>
<LEAD>
Today As The World Spins introduces a new approach to key
chains. With the new MINI-GLOBE keys can be kept inside a
chain, called for upon demand, and stored safely. Never more
will consumers lose a key or stand at a door flipping through a
stack of keys seeking the right one.
</LEAD>
<LEGAL>&trademark;&copyright;</LEGAL>
</PRESSRELEASE>
65
Using CDATA
<HEAD1>
Entering a Kennel Club Member
</HEAD1>
<DESCRIPTION>
Enter the member by the name on his or her papers. Use the NAME tag.
The NAME tag has two attributes. Common (all in lowercase, please!) is the
dog's call name. Breed (also in all lowercase) is the dog's breed. Please see
the breed reference guide for acceptable breeds. Your entry should look
something like this:
</DESCRIPTION>
<EXAMPLE>
<![CDATA[<NAME common="freddy" breed="springer-spaniel">Sir Fredrick of Ledyard's End</NAME>]]>
</EXAMPLE>
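Inside a CDATA section the markup characters are ordinary text. The sketch below, assuming Python's standard xml.etree.ElementTree, shows that the NAME tag in the example is delivered as a string and not as a child element.

```python
import xml.etree.ElementTree as ET

example = ET.fromstring(
    '<EXAMPLE><![CDATA[<NAME common="freddy" breed="springer-spaniel">'
    "Sir Fredrick of Ledyard's End</NAME>]]></EXAMPLE>"
)

literal = example.text          # the CDATA content, angle brackets and all
child_count = len(example)      # number of child elements of EXAMPLE

print(literal)
print(child_count)
```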
66
67
Namespaces
• Namespaces are a way of preventing name clashes
among elements from more than one source within the
same XML document.
• They are also useful in identifying elements that are
meaningful for a particular XML application.
• See http://www.w3.org/TR/REC-xml-names/
68
Namespaces
• A URI is either a URL or a URN.
• An XML namespace is, literally, identified by a URI
reference.
• The reference need not point to an actual resource!
• A URI reference may be associated with more than
one prefix.
• Prefixes are used in XML documents in forming
element and attribute names (prefix:localname).
• Two prefixes that are associated with the same URI
denote the same namespace.
• Declaring a namespace: identifying a namespace used
in the document.
• DTDs are unaware of namespaces.
69
Example
Defining the Namespace ATDB:
<document xmlns:ATDB=
'http://www.cs.huji.ac.il/atdb-schema'>
Using a tag from the ATDB Namespace
<ATDB:myTAG>This is an xml tag.</ATDB:myTAG>
ATDB:myTAG is a qualified name.
Using a tag not from the namespace:
<myTAG>This is a ‘made in Israel’ tag.</myTAG>
70
Scope of Namespaces
• A prefix is associated with the namespace in the
element scope in which it is defined.
• Example (birthdate is associated with no
namespace):
<yp:person xmlns:yp="http://www.cs.technion.ac.il">
<yp:name> John Smith</yp:name>
<birthdate> 12-11-87</birthdate>
<address xmlns:yp="http://www.ee.technion.ac.il">
Technion City 234</address>
</yp:person>
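The effect of prefix scope can be inspected with a namespace-aware parser. A minimal sketch, assuming Python's standard xml.etree.ElementTree, which rewrites each qualified name as {URI}localname and leaves names without a namespace untouched:

```python
import xml.etree.ElementTree as ET

person = ET.fromstring(
    '<yp:person xmlns:yp="http://www.cs.technion.ac.il">'
    "<yp:name> John Smith</yp:name>"
    "<birthdate> 12-11-87</birthdate>"
    '<address xmlns:yp="http://www.ee.technion.ac.il">Technion City 234</address>'
    "</yp:person>"
)

# The root first, then its children in document order.
tags = [person.tag] + [child.tag for child in person]

for tag in tags:
    print(tag)
```

The address element redeclares the yp prefix but does not use it, so address itself, like birthdate, is in no namespace.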
71
Default Namespaces
• A default namespace applies to all unprefixed elements in its scope.
• However, it does not override explicit prefixes (their nonprefixed child elements are default-bound).
• Example (name and birthdate are bound):
<person xmlns="http://www.cs.technion.ac.il">
<name > John Smith</name>
<birthdate> 12-11-87</birthdate>
<yp:address type="local"
xmlns:yp="http://www.ee.technion.ac.il"> Technion City 234</yp:address>
</person>
• Non-prefixed attribute names are associated with no
namespace even when in scope.
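The same inspection works for default namespaces. In this sketch, again assuming Python's standard xml.etree.ElementTree, the unprefixed elements pick up the default namespace while the unprefixed attribute type stays in no namespace:

```python
import xml.etree.ElementTree as ET

person = ET.fromstring(
    '<person xmlns="http://www.cs.technion.ac.il">'
    "<name> John Smith</name>"
    "<birthdate> 12-11-87</birthdate>"
    '<yp:address type="local" xmlns:yp="http://www.ee.technion.ac.il">'
    "Technion City 234</yp:address>"
    "</person>"
)

tags = [person.tag] + [child.tag for child in person]
address_attributes = dict(person[2].attrib)  # attribute names as the parser reports them

for tag in tags:
    print(tag)
print(address_attributes)
```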
72
Summary
• XML is a useful data format. Its main virtues are
widespread acceptance and the (important) ability
to handle semi-structured data (data without
schema)
• DTDs provide some useful syntactic constraints on
documents. As schemas they are weak.
– How to store large XML documents?
– How to query them?
– How to map between XML and other representations?
73