Grammars

XML Grammars
95-733 Internet Technologies
Internet Technologies
XML Grammars: Three Major Uses
1. Validation
2. Code Generation
3. Communication
XML Validation
Sources for this lecture:
“Data on the Web” Abiteboul, Buneman and Suciu
“XML in a Nutshell” Harold and Means
“The XML Companion” Bradley
The validation examples were originally tested with an older parser
and so the specific outputs may differ from those shown.
XML Validation
A batch validating process involves comparing the DTD
against a complete document instance and producing a
report containing any errors or warnings.
Consider batch validation to be analogous to program
compilation, with similar errors detected.
Interactive validation involves constant comparison of the DTD
against a document as it is being created.
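A minimal sketch of batch validation, assuming only the JAXP parser bundled with the JDK (not the Xerces setup used later in this lecture). The class name BatchReport and the small note DTD are made up for illustration. The whole document is parsed once and every validity error is collected into a report, much like a compiler run.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

// Hypothetical class; the DTD and documents below are illustrative only.
class BatchReport {
    static final String DTD =
        "<!DOCTYPE note [\n"
      + "  <!ELEMENT note (to, body)>\n"
      + "  <!ELEMENT to (#PCDATA)>\n"
      + "  <!ELEMENT body (#PCDATA)>\n"
      + "]>\n";
    static final String GOOD = DTD + "<note><to>Ann</to><body>Hi</body></note>";
    static final String BAD = DTD + "<note><body>Hi</body></note>";

    // Parse a complete document with DTD validation on and return the report.
    static List<String> validate(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setValidating(true); // compare the document against its DTD
        DocumentBuilder builder = factory.newDocumentBuilder();

        final List<String> report = new ArrayList<>();
        builder.setErrorHandler(new ErrorHandler() {
            public void warning(SAXParseException e) { report.add("warning: " + e.getMessage()); }
            public void error(SAXParseException e) { report.add("error: " + e.getMessage()); }
            // A document that is not well formed cannot be validated at all.
            public void fatalError(SAXParseException e) throws SAXParseException { throw e; }
        });
        builder.parse(new InputSource(new StringReader(xml)));
        return report;
    }

    public static void main(String[] args) throws Exception {
        System.out.println("GOOD: " + validate(GOOD));
        System.out.println("BAD:  " + validate(BAD));
    }
}
```

An empty report means the document is valid. An interactive validator would run the same comparison repeatedly while the document is being edited.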
XML Validation
The benefits of validating documents against a DTD include:
• Programmers can write extraction and manipulation filters
without fear of their software ever processing unexpected
input.
• Using an XML-aware word processor, authors and editors can
be guided and constrained to produce conforming documents.
Consider how Netbeans allows you to edit web.xml files.
XML Validation Examples
XML elements may contain further, embedded elements, and
the entire document must be enclosed by a single document
element.
These are recursive hierarchical structures.
A Document Type Definition (DTD) contains rules for each
element allowed within a specific class of documents.
Things the DTD does not do:
• Specify the document root.
• Specify the number of instances of each kind of element.
(Or, it’s rather hard to do.)
• Describe the character data inside an element (the precise
syntax).
• DTDs don’t naturally handle namespaces.
• The XML Schema language is much more recent
and improves on DTDs. It gives us “programmer level”
type specifications.
• To see a real DTD, view source on
http://www.silmaril.ie/software/rss2.dtd
We’ll run this program against several XML files with DTDs. We’ll study the
code soon. This slide shows the imported classes.
// Validate.java using Xerces
import java.io.*;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.XMLReaderFactory;
import org.xml.sax.helpers.DefaultHandler;
public class Validate
{
public static boolean valid = true;
public static void main (String argv [])
{
// Here we check if the command line is correct.
if (argv.length != 1) {
System.err.println ("Usage: java Validate filename.xml");
System.exit (1);
}
try {
// get a parser
XMLReader reader =
XMLReaderFactory.createXMLReader(
"org.apache.xerces.parsers.SAXParser");
// request validation
reader.setFeature("http://xml.org/sax/features/validation",
true);
// associate an InputSource object with the file name
InputSource inputSource = new InputSource(argv[0]);
// go ahead and parse
reader.parse(inputSource);
}
// Catch any errors or fatal errors here.
// The parser will handle simple warnings.
catch(org.xml.sax.SAXException e) {
System.out.println("Error in parsing " + e);
valid = false;
}
catch(java.io.IOException e) {
System.out.println("Error in I/O " + e);
System.exit(1); // non-zero exit status: the I/O error is a failure
}
System.out.println("Valid Document is " + valid);
}
}
XML Document
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE FixedFloatSwap SYSTEM "FixedFloatSwap.dtd">
<FixedFloatSwap>
<Notional>100</Notional>
<Fixed_Rate>5</Fixed_Rate>
<NumYears>3</NumYears>
<NumPayments>6</NumPayments>
</FixedFloatSwap>
DTD
<?xml version="1.0" encoding="utf-8"?>
<!ELEMENT FixedFloatSwap (Notional, Fixed_Rate, NumYears, NumPayments ) >
<!ELEMENT Notional (#PCDATA) >
<!ELEMENT Fixed_Rate (#PCDATA) >
<!ELEMENT NumYears (#PCDATA) >
<!ELEMENT NumPayments (#PCDATA) >
Valid document is true
XML Document
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE FixedFloatSwap SYSTEM "http://localhost:8001/dtd/FixedFloatSwap.dt
<FixedFloatSwap>
<Notional>100</Notional>
<Fixed_Rate>5</Fixed_Rate>
<NumYears>3</NumYears>
<NumPayments>6</NumPayments>
</FixedFloatSwap>
DTD on the Web? VERY NICE
<?xml version="1.0" encoding="utf-8"?>
<!ELEMENT FixedFloatSwap (Notional, Fixed_Rate, NumYears, NumPayments ) >
<!ELEMENT Notional (#PCDATA) >
<!ELEMENT Fixed_Rate (#PCDATA) >
<!ELEMENT NumYears (#PCDATA) >
<!ELEMENT NumPayments (#PCDATA) >
Valid document is true
XML Document with an internal subset
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE FixedFloatSwap [
<!ELEMENT FixedFloatSwap (Notional, Fixed_Rate, NumYears, NumPayments ) >
<!ELEMENT Notional (#PCDATA) >
<!ELEMENT Fixed_Rate (#PCDATA) >
<!ELEMENT NumYears (#PCDATA) >
<!ELEMENT NumPayments (#PCDATA) >
]>
<FixedFloatSwap>
<Notional>100</Notional>
<Fixed_Rate>5</Fixed_Rate>
<NumYears>3</NumYears>
<NumPayments>6</NumPayments>
</FixedFloatSwap>
Valid document is true
XML Document
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE FixedFloatSwap SYSTEM "FixedFloatSwap.dtd">
<FixedFloatSwap>
<Notional>100</Notional>
<Fixed_Rate>5</Fixed_Rate>
<NumYears>3</NumYears>
<NumPayments>6</NumPayments>
</FixedFloatSwap>
DTD
<?xml version="1.0" encoding="utf-8"?>
<!ELEMENT FixedFloatSwap (Notional, Fixed_Rate, NumPayments ) >
<!ELEMENT Notional (#PCDATA) >
<!ELEMENT Fixed_Rate (#PCDATA) >
<!ELEMENT NumPayments (#PCDATA) >
Valid document is false
(This DTD neither declares NumYears nor allows it inside FixedFloatSwap.)
XML Document
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE Swaps SYSTEM "FixedFloatSwap.dtd">
<Swaps>
<FixedFloatSwap>
<Notional>100</Notional>
<Fixed_Rate>5</Fixed_Rate>
<NumYears>3</NumYears>
<NumPayments>6</NumPayments>
</FixedFloatSwap>
<FixedFloatSwap>
<Notional>100</Notional>
<Fixed_Rate>5</Fixed_Rate>
<NumYears>3</NumYears>
<NumPayments>6</NumPayments>
</FixedFloatSwap>
</Swaps>
DTD
<?xml version="1.0" encoding="utf-8"?>
<!ELEMENT Swaps (FixedFloatSwap+) >
<!ELEMENT FixedFloatSwap (Notional, Fixed_Rate, NumYears, NumPayments ) >
<!ELEMENT Notional (#PCDATA) >
<!ELEMENT Fixed_Rate (#PCDATA) >
<!ELEMENT NumYears (#PCDATA) >
<!ELEMENT NumPayments (#PCDATA) >
C:\McCarthy\www\examples\sax>java Validate FixedFloatSwap.xml
Valid document is true
Quantity Indicators
? 0 or 1 time
+ 1 or more times
* 0 or more times
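The quantity indicators behave like the quantifiers of a regular expression applied to the sequence of child element names. The sketch below is only an analogy, not how a validating parser is implemented: it encodes a content model such as (name+, profession*) as a java.util.regex pattern over a comma-terminated list of child names. The class name ContentModelAsRegex is hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper: treats a DTD content model as a regular expression.
class ContentModelAsRegex {
    // Regex analogue of the content model (name+, profession*).
    static final Pattern PERSON = Pattern.compile("(name,)+(profession,)*");

    // True when the sequence of child element names fits the content model.
    static boolean matches(List<String> childNames) {
        StringBuilder sb = new StringBuilder();
        for (String n : childNames) {
            sb.append(n).append(',');
        }
        return PERSON.matcher(sb).matches();
    }

    public static void main(String[] args) {
        System.out.println(matches(Arrays.asList("name", "profession", "profession")));
        System.out.println(matches(Arrays.asList("profession")));
    }
}
```

One or more name children followed by any number of profession children match; a sequence with no name child does not.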
Is this a valid document?
<?xml version="1.0"?>
<!DOCTYPE person [
<!ELEMENT person (name+, profession*)>
<!ELEMENT profession (#PCDATA)>
<!ELEMENT name (#PCDATA)>
]>
<person>
<name>Alan Turing</name>
<profession>computer scientist</profession>
<profession>cryptographer</profession>
</person>
Sure!
The locations where document text data is allowed are indicated
by the keyword ‘#PCDATA’ (Parsed Character Data).
XML Document
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE FixedFloatSwap SYSTEM "FixedFloatSwap.dtd">
<FixedFloatSwap>
<Notional>100</Notional>
<Fixed_Rate>5</Fixed_Rate>
<NumYears>
<StartYear>2000</StartYear>
<EndYear>2002</EndYear>
</NumYears>
<NumPayments>6</NumPayments>
</FixedFloatSwap>
DTD
<?xml version="1.0" encoding="utf-8"?>
<!ELEMENT FixedFloatSwap (Notional, Fixed_Rate, NumYears, NumPayments ) >
<!ELEMENT Notional (#PCDATA) >
<!ELEMENT Fixed_Rate (#PCDATA) >
<!ELEMENT NumYears (#PCDATA) >
<!ELEMENT NumPayments (#PCDATA) >
Output
C:\McCarthy\www\46-928\examples\sax>java Validate FixedFloatSwap.xml
org.xml.sax.SAXParseException: Element "NumYears" does not allow "StartYear" -- (#PCDATA)
org.xml.sax.SAXParseException: Element type "StartYear" is not declared.
org.xml.sax.SAXParseException: Element "NumYears" does not allow "EndYear" -- (#PCDATA)
org.xml.sax.SAXParseException: Element type "EndYear" is not declared.
Valid document is false
Mixed Content
There are strict rules which must be applied when an element
is allowed to contain both text and child elements.
The PCDATA keyword must be the first token in the group,
and the group must be a choice group (using “|” not “,”).
The group must be optional and repeatable.
This is known as a mixed content model.
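To see what mixed content looks like to a program, the sketch below parses a small element that contains both text and a child element and lists its children in document order. It assumes the DOM parser bundled with the JDK; the class name MixedContentDemo and the sample string are illustrative.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical demo class: lists the children of a mixed-content element.
class MixedContentDemo {
    static final String SAMPLE = "<emph>H<sub>2</sub>O is water.</emph>";

    static List<String> describeChildren(String xml) throws Exception {
        Node root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getDocumentElement();
        List<String> out = new ArrayList<>();
        NodeList kids = root.getChildNodes();
        for (int i = 0; i < kids.getLength(); i++) {
            Node k = kids.item(i);
            if (k.getNodeType() == Node.TEXT_NODE) {
                out.add("text:" + k.getNodeValue());      // character data
            } else if (k.getNodeType() == Node.ELEMENT_NODE) {
                out.add("element:" + k.getNodeName());    // child element
            }
        }
        return out;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(describeChildren(SAMPLE));
    }
}
```

Text and elements alternate freely, which is why a mixed content model can only say which child elements may appear, not their order or number.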
DTD
<?xml version="1.0" encoding="utf-8"?>
<!ELEMENT Mixed (emph) >
<!ELEMENT emph (#PCDATA | sub | super)* >
<!ELEMENT sub (#PCDATA)>
<!ELEMENT super (#PCDATA)>
XML Document
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE Mixed SYSTEM "Mixed.dtd">
<Mixed>
<emph>H<sub>2</sub>O is water.</emph>
</Mixed>
Valid document is true
Is this a valid document?
<?xml version="1.0"?>
<!DOCTYPE page [
<!ELEMENT page (paragraph+)>
<!ELEMENT paragraph ( #PCDATA | profession | bold)*>
<!ELEMENT profession (#PCDATA)>
<!ELEMENT bold (#PCDATA)>
]>
<page>
<paragraph>
Alan Turing broke codes during <bold>World War II</bold>.
He very precisely defined the notion of "algorithm".
And so he had several professions:
<profession>computer scientist</profession>
<profession>cryptographer</profession>
And
<profession>mathematician</profession>
</paragraph>
</page>
Sure!
How about this one?
<?xml version="1.0"?>
<!DOCTYPE page [
<!ELEMENT page (paragraph+)>
<!ELEMENT paragraph ( #PCDATA | profession | bold)*>
<!ELEMENT profession (#PCDATA)>
<!ELEMENT bold (#PCDATA)>
]>
<page>
The following is a paragraph marked up in XML.
<paragraph>
Alan Turing broke codes during <bold>World War II</bold>.
He very precisely defined the notion of "algorithm".
And so he had several professions:
<profession>computer scientist</profession>
<profession>cryptographer</profession>
And
<profession>mathematician</profession>
</paragraph>
</page>
java Validate mixed.xml
org.xml.sax.SAXParseException: The content of element type "page" must match "(paragraph)+".
Valid document is false
XML Document with a CDATA Section
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE FixedFloatSwap SYSTEM "FixedFloatSwap.dtd">
<FixedFloatSwap>
<Notional>100</Notional>
<Fixed_Rate>5</Fixed_Rate>
<NumYears>3</NumYears>
<NumPayments>6</NumPayments>
<Note>
<![CDATA[This is text that <b>will not be
parsed for markup]]>
</Note>
</FixedFloatSwap>
DTD
<?xml version="1.0" encoding="utf-8"?>
<!ELEMENT FixedFloatSwap ( Notional, Fixed_Rate, NumYears,
NumPayments, Note ) >
<!ELEMENT Notional (#PCDATA)>
<!ELEMENT Fixed_Rate (#PCDATA) >
<!ELEMENT NumYears (#PCDATA) >
<!ELEMENT NumPayments (#PCDATA) >
<!ELEMENT Note (#PCDATA) >
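A short sketch of how a CDATA section reaches a program, assuming the DOM parser bundled with the JDK. The class name CdataDemo is made up, and the Note element is reduced to a single line. The point is that the "<b>" inside the section comes back as ordinary characters rather than as markup.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical demo class: reads the text of an element holding a CDATA section.
class CdataDemo {
    static final String SAMPLE =
        "<Note><![CDATA[This is text that <b>will not be parsed for markup]]></Note>";

    static String noteText(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        // getTextContent returns the character data, CDATA delimiters removed.
        return doc.getElementsByTagName("Note").item(0).getTextContent();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(noteText(SAMPLE));
    }
}
```

Without the CDATA section the unclosed <b> would make the document not well formed.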
Recursion
<?xml version="1.0"?>
<!DOCTYPE tree [
<!ELEMENT tree (node)>
<!ELEMENT node (leaf | (node,node))>
<!ELEMENT leaf (#PCDATA)>
]>
<tree>
<node>
<leaf>A DTD is a context-free grammar</leaf>
</node>
</tree>
java Validate recursive1.xml
Valid document is true
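Because the content model for node refers to node itself, checking it by hand is naturally recursive. The sketch below writes one method per grammar rule (tree, node, leaf) and walks a DOM tree. It is a hand-written illustration, not what a validating parser does internally: the class name TreeGrammar is hypothetical, and stray text between elements is ignored here, whereas a validating parser would reject it.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical checker: one method per rule of the recursive tree grammar.
class TreeGrammar {
    // Collect the element children of e, skipping text and comments.
    static List<Element> kids(Element e) {
        List<Element> out = new ArrayList<>();
        for (Node n = e.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n instanceof Element) out.add((Element) n);
        }
        return out;
    }

    // node -> leaf | (node, node)
    static boolean isNode(Element e) {
        if (!e.getTagName().equals("node")) return false;
        List<Element> k = kids(e);
        if (k.size() == 1) return isLeaf(k.get(0));
        return k.size() == 2 && isNode(k.get(0)) && isNode(k.get(1));
    }

    // leaf -> #PCDATA (no child elements)
    static boolean isLeaf(Element e) {
        return e.getTagName().equals("leaf") && kids(e).isEmpty();
    }

    // tree -> node
    static boolean isTree(String xml) throws Exception {
        Element root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getDocumentElement();
        List<Element> k = kids(root);
        return root.getTagName().equals("tree") && k.size() == 1 && isNode(k.get(0));
    }

    public static void main(String[] args) throws Exception {
        System.out.println(isTree("<tree><node><leaf>x</leaf></node></tree>"));
        System.out.println(isTree(
            "<tree><node><leaf>x</leaf></node><node><leaf>y</leaf></node></tree>"));
    }
}
```

A tree with a single node is accepted; a tree with two node children directly under the root is rejected, because the rule for tree allows exactly one node.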
How about this one?
<?xml version="1.0"?>
<!DOCTYPE tree [
<!ELEMENT tree (node)>
<!ELEMENT node (leaf | (node,node))>
<!ELEMENT leaf (#PCDATA)>
]>
<tree>
<node>
<leaf>Alan Turing would like this</leaf>
</node>
<node>
<leaf>Alan Turing would like this</leaf>
</node>
</tree>
java Validate recursive1.xml
org.xml.sax.SAXParseException: The content of element type "tree" must match "(node)".
Valid document is false
Relational Databases and XML
Consider the relational database r1(a,b,c), r2(c,d)
r1:  a   b   c
     a1  b1  c1
     a2  b2  c2
r2:  c   d
     c2  d2
     c3  d3
     c4  d4
How can we represent this database with an XML DTD?
Relations
<?xml version="1.0"?>
<!DOCTYPE db [
<!ELEMENT db (r1*, r2*)>
<!ELEMENT r1 (a,b,c)>
<!ELEMENT r2 (c,d)>
<!ELEMENT a (#PCDATA)>
<!ELEMENT b (#PCDATA)>
<!ELEMENT c (#PCDATA)>
<!ELEMENT d (#PCDATA)>
]>
<db>
<r1><a> a1 </a> <b> b1 </b> <c> c1 </c> </r1>
<r1><a> a1 </a> <b> b1 </b> <c> c1 </c> </r1>
<r2><c> c2 </c> <d> d2 </d> </r2>
<r2><c> c3 </c> <d> d3 </d> </r2>
<r2><c> c4 </c> <d> d4 </d> </r2>
</db>
java Validate Db.xml
Valid document is true
There is a small problem….
Relations
The order of the relations should not count and neither
should the order of columns within rows.
<?xml version="1.0"?>
<!DOCTYPE db [
<!ELEMENT db (r1|r2)* >
<!ELEMENT r1 ((a,b,c) | (a,c,b) | (b,a,c) | (b,c,a) | (c,a,b) | (c,b,a))>
<!ELEMENT r2 ((c,d) | (d,c))>
<!ELEMENT a (#PCDATA)>
<!ELEMENT b (#PCDATA)>
<!ELEMENT c (#PCDATA)>
<!ELEMENT d (#PCDATA)>
]>
<db>
<r1><a> a1 </a> <b> b1 </b> <c> c1 </c> </r1>
<r1><a> a1 </a> <b> b1 </b> <c> c1 </c> </r1>
<r2><c> c2 </c> <d> d2 </d> </r2>
<r2><c> c3 </c> <d> d3 </d> </r2>
<r2><c> c4 </c> <d> d4 </d> </r2>
</db>
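Writing out every column order by hand grows quickly: n columns need n! alternatives. The sketch below generates such a content model from a list of column names, to show how fast it grows. The class name ColumnOrders is hypothetical, and the alternatives may come out in a different order than on the slide.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical generator for an order-independent DTD content model.
class ColumnOrders {
    // Build "((a,b) | (b,a))"-style content models: one alternative per ordering.
    static String contentModel(List<String> columns) {
        List<String> alternatives = new ArrayList<>();
        permute(new ArrayList<>(columns), 0, alternatives);
        return "(" + String.join(" | ", alternatives) + ")";
    }

    // Classic swap-based permutation: fix position k, recurse on the rest.
    private static void permute(List<String> cols, int k, List<String> out) {
        if (k == cols.size()) {
            out.add("(" + String.join(",", cols) + ")");
            return;
        }
        for (int i = k; i < cols.size(); i++) {
            Collections.swap(cols, k, i);
            permute(cols, k + 1, out);
            Collections.swap(cols, k, i); // restore before the next choice
        }
    }

    public static void main(String[] args) {
        System.out.println("<!ELEMENT r2 " + contentModel(Arrays.asList("c", "d")) + ">");
        System.out.println("<!ELEMENT r1 " + contentModel(Arrays.asList("a", "b", "c")) + ">");
    }
}
```

Two columns give 2 alternatives, three give 6, and a ten-column relation would already need 3,628,800, which is why DTDs are a poor fit for unordered content.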
Attributes
An attribute is associated with a particular element by the DTD
and is assigned an attribute type.
The attribute type can restrict the range of values it can hold.
Example attribute types include:
CDATA indicates a simple string of characters
NMTOKEN indicates a word or token
A named token group such as (left | center | right)
ID an element id that holds a unique value (among other
element IDs in the document)
IDREF attributes refer to an ID
DTD
<?xml version="1.0" encoding="utf-8"?>
<!ELEMENT FixedFloatSwap (Notional, Fixed_Rate, NumYears, NumPayments ) >
<!ELEMENT Notional (#PCDATA) >
<!ELEMENT Fixed_Rate (#PCDATA) >
<!ELEMENT NumYears (#PCDATA) >
<!ELEMENT NumPayments (#PCDATA) >
<!ATTLIST Notional currency (Dollars | Pounds) #REQUIRED>
XML Document
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE FixedFloatSwap SYSTEM "FixedFloatSwap.dtd">
<FixedFloatSwap>
<Notional>100</Notional>
<Fixed_Rate>5</Fixed_Rate>
<NumYears>3</NumYears>
<NumPayments>6</NumPayments>
</FixedFloatSwap>
C:\McCarthy\www\46-928\examples\sax>java Validate FixedFloatSwap.xml
org.xml.sax.SAXParseException: Attribute value for "currency" is #REQUIRED.
Valid document is false
DTD
<?xml version="1.0" encoding="utf-8"?>
<!ELEMENT FixedFloatSwap (Notional, Fixed_Rate, NumYears, NumPayments ) >
<!ELEMENT Notional (#PCDATA) >
<!ELEMENT Fixed_Rate (#PCDATA) >
<!ELEMENT NumYears (#PCDATA) >
<!ELEMENT NumPayments (#PCDATA) >
<!ATTLIST Notional currency (Dollars | Pounds) #REQUIRED>
XML Document
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE FixedFloatSwap SYSTEM "FixedFloatSwap.dtd">
<FixedFloatSwap>
<Notional currency = "Pounds">100</Notional>
<Fixed_Rate>5</Fixed_Rate>
<NumYears>3</NumYears>
<NumPayments>6</NumPayments>
</FixedFloatSwap>
Valid document is true
DTD
<?xml version="1.0" encoding="utf-8"?>
<!ELEMENT FixedFloatSwap (Notional, Fixed_Rate, NumYears, NumPayments ) >
<!ELEMENT Notional (#PCDATA) >
<!ELEMENT Fixed_Rate (#PCDATA) >
<!ELEMENT NumYears (#PCDATA) >
<!ELEMENT NumPayments (#PCDATA) >
<!ATTLIST Notional currency (Dollars | Pounds) #REQUIRED>
<!ATTLIST FixedFloatSwap note CDATA #IMPLIED>
#IMPLIED means optional
XML Document
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE FixedFloatSwap SYSTEM "FixedFloatSwap.dtd">
<FixedFloatSwap>
<Notional currency = "Pounds">100</Notional>
<Fixed_Rate>5</Fixed_Rate>
<NumYears>3</NumYears>
<NumPayments>6</NumPayments>
</FixedFloatSwap>
Valid document is true
DTD
<?xml version="1.0" encoding="utf-8"?>
<!ELEMENT FixedFloatSwap (Notional, Fixed_Rate, NumYears, NumPayments ) >
<!ELEMENT Notional (#PCDATA) >
<!ELEMENT Fixed_Rate (#PCDATA) >
<!ELEMENT NumYears (#PCDATA) >
<!ELEMENT NumPayments (#PCDATA) >
<!ATTLIST Notional currency (Dollars | Pounds) #REQUIRED>
<!ATTLIST FixedFloatSwap note CDATA #IMPLIED>
XML Document
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE FixedFloatSwap SYSTEM "FixedFloatSwap.dtd">
<FixedFloatSwap note = "For your eyes only">
<Notional currency = "Pounds">100</Notional>
<Fixed_Rate>5</Fixed_Rate>
<NumYears>3</NumYears>
<NumPayments>6</NumPayments>
</FixedFloatSwap>
Valid document is true
ID and IDREF Attributes
We can represent complex relationships within an XML
document using ID and IDREF attributes.
An Undirected Graph
[Figure: an undirected graph on the vertices u, v, w, x, y, z, with one edge and one vertex labeled]
A Directed Graph
[Figure: a directed graph on the vertices u, v, w, x, y]
[Figure: a prerequisite graph over the courses Geom100, Math 100, Calc100, CS1, Calc200, CS2, Calc300 and Philo45]
This is called a DAG (Directed Acyclic Graph)
<?xml version="1.0"?>
<!DOCTYPE Course_Descriptions SYSTEM
"course_descriptions.dtd">
<Course_Descriptions>
<Course>
<!-- This course has an ID -->
<Course-ID id = "Math100" />
<Title>Algebra I</Title>
<Description> Students in this course study
introductory algebra.
</Description>
<!-- But no prerequisites -->
<Prerequisites/>
</Course>
<Course>
<!-- The DTD will force this to be unique. -->
<Course-ID id = "Geom100" />
<Title>Geometry I</Title>
<Description> Students in this course study how to
prove several theorems in geometry.
</Description>
<Prerequisites/>
</Course>
<Course>
<Course-ID id="Calc100" />
<Title>Calculus I</Title>
<Description> Students in this course study the derivative.
</Description>
<!-- These are references to IDs. (IDREFS) -->
<Prerequisites pre="Math100 Geom100" />
</Course>
<Course>
<Course-ID id = "Calc200" />
<Title>Calculus II</Title>
<Description> Students in this course study the integral.
</Description>
<!-- The DTD requires that this name be a unique id defined within this
document. Otherwise, the document is invalid. -->
<Prerequisites pre="Calc100" />
</Course>
<Course>
<Course-ID id = "Calc300" />
<Title>Calculus III</Title>
<Description> Students in this course study the derivative
and the integral (in 3-space).
</Description>
<!-- Prerequisites is an EMPTY element. It’s used only for its attributes. -->
<Prerequisites pre="Calc200" />
</Course>
<Course>
<Course-ID id = "CS1" />
<Title>Introduction to Computer Science I</Title>
<Description> In this course we study Turing machines.
</Description>
<!-- A one-to-one link: IDREF to ID -->
<Prerequisites pre="Calc100" />
</Course>
<Course>
<Course-ID id = "CS2" />
<Title>Introduction to Computer Science II</Title>
<Description> In this course we study basic data structures.
</Description>
<!-- One-to-many links: IDREFS to IDs -->
<Prerequisites pre="Calc200 CS1"/>
</Course>
<Course>
<Course-ID id = "Philo45" />
<Title>Ethical Implications of Information Technology</Title>
<Description> TBA
</Description>
<Prerequisites/>
</Course>
</Course_Descriptions>
The Course_Descriptions.dtd
<?xml version="1.0"?>
<!-- Course Description DTD -->
<!ELEMENT Course_Descriptions (Course)+>
<!ELEMENT Course (Course-ID,Title,Description,Prerequisites)>
<!ELEMENT Course-ID EMPTY>
<!ELEMENT Title (#PCDATA)>
<!ELEMENT Description (#PCDATA)>
<!ELEMENT Prerequisites EMPTY>
<!ATTLIST Course-ID id ID #REQUIRED>
<!ATTLIST Prerequisites pre IDREFS #IMPLIED>
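The ID and IDREFS checks can be seen in a small experiment. The sketch below uses the SAX parser bundled with the JDK with validation turned on. To keep it short it uses a simplified, made-up DTD in which a single Course element carries both the id and the pre attributes; the class name IdRefCheck is also hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo: a simplified course list with ID and IDREFS attributes.
class IdRefCheck {
    static final String DTD =
        "<!DOCTYPE Courses [\n"
      + "<!ELEMENT Courses (Course)+>\n"
      + "<!ELEMENT Course EMPTY>\n"
      + "<!ATTLIST Course id ID #REQUIRED pre IDREFS #IMPLIED>\n"
      + "]>\n";

    // Validate DTD + body and return the validity errors that were reported.
    static List<String> errors(String body) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setValidating(true);
        final List<String> errors = new ArrayList<>();
        factory.newSAXParser().parse(
            new InputSource(new StringReader(DTD + body)),
            new DefaultHandler() {
                @Override
                public void error(SAXParseException e) {
                    errors.add(e.getMessage());
                }
            });
        return errors;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(errors(
            "<Courses><Course id='Calc100'/><Course id='Calc200' pre='Calc100'/></Courses>"));
        System.out.println(errors(
            "<Courses><Course id='Calc200' pre='Calc100'/></Courses>"));
    }
}
```

The first document is valid. The second refers to an id that is never defined, so the parser reports an error; a duplicated id is reported in the same way.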
General Entities &
General entities are used to place text into the XML document.
They may be declared in the DTD and referenced in the document.
They may also be declared in the DTD as residing in a file. They
may then be referenced in the document.
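A minimal sketch of a general entity being expanded by the parser, assuming the DOM parser bundled with the JDK. The class name EntityDemo is hypothetical, and the document declares the entity in an internal subset only, with no external DTD.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical demo: the application sees the replacement text, not &bankname;.
class EntityDemo {
    static final String SAMPLE =
        "<!DOCTYPE FixedFloatSwap [\n"
      + "<!ENTITY bankname \"Mellon National Bank and Trust\">\n"
      + "]>\n"
      + "<FixedFloatSwap><Bank>&bankname;</Bank></FixedFloatSwap>";

    static String bankName(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        return doc.getElementsByTagName("Bank").item(0).getTextContent();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(bankName(SAMPLE));
    }
}
```

The text content of Bank is the entity's replacement text, so any later processing step works with the expanded string.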
Document using a General Entity
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE FixedFloatSwap SYSTEM "FixedFloatSwap.dtd" [
<!ENTITY bankname "Mellon National Bank and Trust" >
]
>
<FixedFloatSwap>
<Bank>&bankname;</Bank>
<Notional>100</Notional>
<Fixed_Rate>5</Fixed_Rate>
<NumYears>3</NumYears>
<NumPayments>6</NumPayments>
</FixedFloatSwap>
DTD
<?xml version="1.0" encoding="utf-8"?>
<!ELEMENT FixedFloatSwap (Bank,Notional, Fixed_Rate, NumYears,
NumPayments ) >
<!ELEMENT Bank (#PCDATA) >
<!ELEMENT Notional (#PCDATA) >
<!ELEMENT Fixed_Rate (#PCDATA) >
<!ELEMENT NumYears (#PCDATA) >
<!ELEMENT NumPayments (#PCDATA) >
Valid document is true
The general entity is replaced before XSLT sees it.
XSLT Program
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
version="1.0">
<xsl:template match = "Bank">
<WML>
<CARD>
<xsl:apply-templates/>
</CARD>
</WML>
</xsl:template>
<xsl:template match = "Notional | Fixed_Rate | NumYears | NumPayments">
</xsl:template>
</xsl:stylesheet>
C:\McCarthy\www\46-928\examples\sax>java -Dcom.jclark.xsl.sax.parser=com.jclark.xml.sax.CommentDriver com.jclark.xsl.sax.Driver FixedFloatSwap.xml FixedFloatSwap.xsl FixedFloatSwap.wml
C:\McCarthy\www\46-928\examples\sax>type FixedFloatSwap.wml
XSLT OUTPUT
<?xml version="1.0" encoding="utf-8"?>
<WML><CARD>Mellon National Bank and Trust</CARD></WML>
An external text entity
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE FixedFloatSwap SYSTEM "FixedFloatSwap.dtd" [
<!ENTITY bankname SYSTEM "JustAFile.dat" >
]
>
<FixedFloatSwap>
<Bank>&bankname;</Bank>
<Notional>100</Notional>
<Fixed_Rate>5</Fixed_Rate>
<NumYears>3</NumYears>
<NumPayments>6</NumPayments>
</FixedFloatSwap>
JustAFile.dat
Mellon Bank And Trust Corporation
Pittsburgh PA
XSLT Output
<?xml version="1.0" encoding="utf-8"?>
<WML><CARD>Mellon Bank And Trust Corporation
Pittsburgh PA</CARD></WML>
Parameter Entities %
While general entities are used to place text into the XML document,
parameter entities are used to modify the DTD.
We want to build modular DTDs so that we can create new DTDs
using existing ones.
We’ll look at a slide from www.fpml.org and then see some examples.
FpML is a Complete Description of the Trade
[Figure from www.fpml.org. Labels: Trade, Trade ID, Product, Party, Party; products: Vanilla Swap, Vanilla Fixed Float Swap, Cancellable Swaption, FX Spot, FX Outright, FX Swap, Forward Rate Agreement...; pool of modular components grouped into separate namespaces: Adjustable Period, Rate, Notional, Money, Product, Date Schedule, Date]
Internal Parameter Entities
DTD
<?xml version="1.0" encoding="utf-8"?>
<!ELEMENT FixedFloatSwap (Notional, Fixed_Rate, NumYears,
NumPayments ) >
<!ENTITY % parsedCharacterData "(#PCDATA)">
<!ELEMENT Notional %parsedCharacterData; >
<!ELEMENT Fixed_Rate (#PCDATA) >
<!ELEMENT NumYears (#PCDATA) >
<!ELEMENT NumPayments (#PCDATA) >
XML Document
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE FixedFloatSwap SYSTEM "FixedFloatSwap.dtd">
<FixedFloatSwap>
<Notional>100</Notional>
<Fixed_Rate>5</Fixed_Rate>
<NumYears>3</NumYears>
<NumPayments>6</NumPayments>
</FixedFloatSwap>
External Parameter Entities and DTD Components
Order.xml
<?xml version="1.0" encoding = "UTF-8"?>
<!DOCTYPE ORDER SYSTEM "order.dtd">
<!-- example order form from “XML A Manager’s Guide” -->
<ORDER SOURCE ="web" CUSTOMERTYPE="consumer" CURRENCY="USD">
<addresses>
<address ADDTYPE="billship">
<firstname>Kevin</firstname>
<lastname>Dick</lastname>
<street ORDER="1">123 Anywhere Lane</street>
<street ORDER="2">Apt 1b</street>
<city>Palo Alto</city>
<state>CA</state>
<postal>94303</postal>
<country>USA</country>
</address>
<!-- An order may have more than one address. -->
<address ADDTYPE="bill">
<firstname>Kevin</firstname>
<lastname>Dick</lastname>
<street ORDER="1">123 Not The Same Lane</street>
<street ORDER="2">Work Place</street>
<city>Palo Alto</city>
<state>CA</state>
<postal>94300</postal>
<country>USA</country>
</address>
</addresses>
<!-- Several products may be purchased. -->
<lineitems>
<lineitem ID="line1">
<product CAT="MBoard">440BX Motherboard</product>
<quantity>1</quantity>
<unitprice>200</unitprice>
</lineitem>
<lineitem ID="line2">
<product CAT = "RAM">128 MB PC-100 DIMM</product>
<quantity>2</quantity>
<unitprice>175</unitprice>
</lineitem>
<lineitem ID="line3">
<product CAT="CDROM">40x CD-ROM</product>
<quantity>1</quantity>
<unitprice>50</unitprice>
</lineitem>
</lineitems>
<!-- The payment is with a Visa card. -->
<payment>
<card CARDTYPE="VISA">
<cardholder>Kevin S. Dick</cardholder>
<cardnumber>11111-22222-33333</cardnumber>
<expiration>01/01</expiration>
</card>
</payment>
</ORDER>
We want this document to be validated.
order.dtd
<?xml version="1.0" encoding="UTF-8"?>
<!-- Example Order form DTD adapted from XML: A Manager's Guide -->
<!-- Define an ORDER element -->
<!ELEMENT ORDER (addresses, lineitems, payment)>
<!ATTLIST ORDER
SOURCE       (web | phone | retail) #REQUIRED
CUSTOMERTYPE (consumer | business)  "consumer"
CURRENCY     CDATA                  "USD"
>
Define an order based on other elements.
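Attribute declarations with a default value, like CUSTOMERTYPE and CURRENCY above, affect what an application sees: the parser supplies the default when the document omits the attribute. The sketch below checks this with the DOM parser bundled with the JDK. The class name AttributeDefaults is hypothetical, and ORDER is declared EMPTY here only to keep the example short.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical demo: defaults declared in the DTD show up on the parsed element.
class AttributeDefaults {
    static final String XML =
        "<!DOCTYPE ORDER [\n"
      + "<!ELEMENT ORDER EMPTY>\n"   // simplification for this sketch
      + "<!ATTLIST ORDER\n"
      + "  SOURCE (web | phone | retail) #REQUIRED\n"
      + "  CUSTOMERTYPE (consumer | business) \"consumer\"\n"
      + "  CURRENCY CDATA \"USD\">\n"
      + "]>\n"
      + "<ORDER SOURCE=\"web\"/>";

    // Parse the document and return the value of the named ORDER attribute.
    static String attribute(String name) throws Exception {
        Element order = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(XML)))
                .getDocumentElement();
        return order.getAttribute(name);
    }

    public static void main(String[] args) throws Exception {
        System.out.println("SOURCE       = " + attribute("SOURCE"));
        System.out.println("CUSTOMERTYPE = " + attribute("CUSTOMERTYPE"));
        System.out.println("CURRENCY     = " + attribute("CURRENCY"));
    }
}
```

Only SOURCE appears in the document text; the other two values come from the DTD.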
<!-- External parameter entity declaration -->
<!ENTITY % anAddress SYSTEM "address.dtd" >
<!-- External parameter entity reference -->
%anAddress;
<!-- Collection of Addresses -->
<!ELEMENT addresses (address+)>
<!ENTITY % aLineItem SYSTEM "lineitem.dtd" >
%aLineItem;
<!-- Collection of LineItems -->
<!ELEMENT lineitems (lineitem+)>
<!ENTITY % aPayment SYSTEM "payment.dtd" >
%aPayment;
address.dtd
<!-- Address Structure -->
<!ELEMENT address (firstname, middlename?, lastname, street+,
city, state,postal,country)>
<!ELEMENT firstname (#PCDATA)>
<!ELEMENT middlename (#PCDATA)>
<!ELEMENT lastname (#PCDATA)>
<!ELEMENT street (#PCDATA)>
<!ELEMENT city (#PCDATA)>
<!ELEMENT state (#PCDATA)>
<!ELEMENT postal (#PCDATA)>
<!ELEMENT country (#PCDATA)>
<!ATTLIST address
ADDTYPE (bill | ship | billship) "billship">
<!ATTLIST street
ORDER CDATA #IMPLIED>
lineitem.dtd
<!ELEMENT lineitem (product,quantity,unitprice)>
<!ATTLIST lineitem
ID ID #REQUIRED>
<!ELEMENT product (#PCDATA)>
<!ATTLIST product
CAT (CDROM|MBoard|RAM) #REQUIRED>
<!ELEMENT quantity (#PCDATA)>
<!ELEMENT unitprice (#PCDATA)>
payment.dtd
<!ELEMENT payment (card | PO)>
<!ELEMENT card (cardholder, cardnumber, expiration)>
<!ELEMENT cardholder (#PCDATA)>
<!ELEMENT cardnumber (#PCDATA)>
<!ELEMENT expiration (#PCDATA)>
<!ELEMENT PO (number,authorization*)>
<!ELEMENT number (#PCDATA)>
<!ELEMENT authorization (#PCDATA)>
<!ATTLIST card
CARDTYPE (VISA|MasterCard|Amex) #REQUIRED>
XML Schemas Improve on DTDs
• XML Schema is the official name
• XSDL (XML Schema Definition Language) is the language
used to create schema definitions
• XML Syntax
• Can be used to more tightly constrain a document instance
• Supports namespaces
• Permits type derivation
• Harder than DTDs
Other Grammars Include
• RELAX
• TREX (James Clark - Tree Regular Expressions for XML)
• RELAX NG (RELAX and TREX combined to Relax Next Generation)
• Schematron (“Rule based” rather than “grammar based”, see
www.ascc.net/xml/schematron). Based on XSLT and XPath
XSDL - A Simple Purchase Order
<?xml version="1.0" encoding="UTF-8"?>
<!-- po.xml -->
<purchaseOrder orderDate="07.23.2001"
xmlns="http://www.cds-r-us.com"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.cds-r-us.com
po.xsd"
>
<recipient country="USA">
<name>Dennis Scannel</name>
<street>175 Perry Lea Side Road</street>
<city>Waterbury</city>
<state>VT</state>
<postalCode>15216</postalCode>
</recipient>
<order>
<cd artist="Brooks Williams" title="Little Lion" />
<cd artist="David Wilcox" title="What you whispered" />
</order>
</purchaseOrder>
Purchase Order XSDL
<?xml version="1.0" encoding="utf-8"?> <!-- po.xsd -->
<xs:schema
xmlns:xs="http://www.w3.org/2001/XMLSchema"
xmlns="http://www.cds-r-us.com"
targetNamespace="http://www.cds-r-us.com"
>
<xs:element name="purchaseOrder">
<xs:complexType>
<xs:sequence>
<xs:element ref="recipient" />
<xs:element ref="order" />
</xs:sequence>
<xs:attribute name="orderDate" type="xs:string" />
</xs:complexType>
</xs:element>
<xs:element name = "recipient">
<xs:complexType>
<xs:sequence>
<xs:element ref="name" />
<xs:element ref="street" />
<xs:element ref="city" />
<xs:element ref="state" />
<xs:element ref="postalCode" />
</xs:sequence>
<xs:attribute name="country" type="xs:string" />
</xs:complexType>
</xs:element>
<xs:element name = "name" type="xs:string" />
<xs:element name = "street" type="xs:string" />
<xs:element name = "city" type="xs:string" />
<xs:element name = "state" type="xs:string" />
<xs:element name = "postalCode" type="xs:short" />
<xs:element name = "order">
<xs:complexType>
<xs:sequence>
<xs:element ref="cd" maxOccurs="unbounded"/>
</xs:sequence>
</xs:complexType>
</xs:element>
<xs:element name="cd">
<xs:complexType>
<xs:attribute name="artist" type="xs:string" />
<xs:attribute name="title" type="xs:string" />
</xs:complexType>
</xs:element>
</xs:schema>
Validate.java
// Validate.java using Xerces
import java.io.*;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.XMLReaderFactory;
import org.xml.sax.helpers.DefaultHandler;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
public class Validate extends DefaultHandler
{
public static boolean valid = true;
public void error(SAXParseException exception) {
System.out.println("Received notification of a recoverable error." + exception);
valid = false;
}
public void fatalError(SAXParseException exception) {
System.out.println("Received notification of a non-recoverable error."+
exception);
valid = false;
}
public void warning(SAXParseException exception) {
System.out.println("Received notification of a warning."+ exception);
}
public static void main (String argv [])
{
if (argv.length != 1) {
System.err.println ("Usage: java Validate filename.xml");
System.exit (1);
}
try {
// get a parser
XMLReader
reader = XMLReaderFactory.createXMLReader(
"org.apache.xerces.parsers.SAXParser");
// request validation
reader.setFeature("http://xml.org/sax/features/validation",true);
reader.setFeature(
"http://apache.org/xml/features/validation/schema",true);
reader.setErrorHandler(new Validate());
// associate an InputSource object with the file name
InputSource inputSource = new InputSource(argv[0]);
// go ahead and parse
reader.parse(inputSource);
}
catch(org.xml.sax.SAXException e) {
System.out.println("Error in parsing " + e);
valid = false;
}
catch(java.io.IOException e) {
System.out.println("Error in I/O " + e);
System.exit(1); // non-zero exit status: the I/O error is a failure
}
System.out.println("Valid Document is " + valid);
}
}
XML Document
<?xml version="1.0" encoding="utf-8"?>
<itemList
xmlns:xsi='http://www.w3.org/2001/XMLSchema-instance'
xsi:noNamespaceSchemaLocation="itemList.xsd">
<item>
<name>pen</name>
<quantity>5</quantity>
</item>
<item>
<name>eraser</name>
<quantity>7</quantity>
</item>
<item>
<name>stapler</name>
<quantity>2</quantity>
</item>
</itemList>
XSDL Grammar itemList.xsd
<?xml version="1.0" encoding="utf-8"?>
<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema'>
<xsd:element name="itemList">
<xsd:complexType>
<xsd:sequence>
<xsd:element ref="item"
minOccurs="0" maxOccurs="3"/>
</xsd:sequence>
</xsd:complexType>
</xsd:element>
<xsd:element name="item">
<xsd:complexType>
<xsd:sequence>
<xsd:element ref="name"/>
<xsd:element ref="quantity"/>
</xsd:sequence>
</xsd:complexType>
</xsd:element>
<xsd:element name="name" type="xsd:string"/>
<xsd:element name="quantity" type="xsd:short"/>
</xsd:schema>
D:..95-733\examples\XSDL\testing>ant run
Buildfile: build.xml
run:
Running Validate.java on itemList-xsd.xml
Valid Document is true
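The same check can be sketched with the javax.xml.validation API that ships with the JDK, as an alternative to the Xerces feature flags used in Validate.java. The class name XsdCheck and the itemList helper are hypothetical; the schema string is the itemList.xsd grammar shown above, written on fewer lines.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

// Hypothetical demo: validate item lists against the itemList schema.
class XsdCheck {
    static final String XSD =
        "<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema'>"
      + "<xsd:element name='itemList'><xsd:complexType><xsd:sequence>"
      + "<xsd:element ref='item' minOccurs='0' maxOccurs='3'/>"
      + "</xsd:sequence></xsd:complexType></xsd:element>"
      + "<xsd:element name='item'><xsd:complexType><xsd:sequence>"
      + "<xsd:element ref='name'/><xsd:element ref='quantity'/>"
      + "</xsd:sequence></xsd:complexType></xsd:element>"
      + "<xsd:element name='name' type='xsd:string'/>"
      + "<xsd:element name='quantity' type='xsd:short'/>"
      + "</xsd:schema>";

    // Build an itemList document with one item per quantity string.
    static String itemList(String... quantities) {
        StringBuilder sb = new StringBuilder("<itemList>");
        for (String q : quantities) {
            sb.append("<item><name>pen</name><quantity>").append(q)
              .append("</quantity></item>");
        }
        return sb.append("</itemList>").toString();
    }

    static boolean isValid(String xml) throws Exception {
        Schema schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new StreamSource(new StringReader(XSD)));
        try {
            // With no error handler set, the first validity error is thrown.
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(isValid(itemList("5", "7", "2")));
        System.out.println(isValid(itemList("5", "7", "2", "1")));
        System.out.println(isValid(itemList("many")));
    }
}
```

Three items are accepted, a fourth item breaks maxOccurs="3", and a quantity of "many" is not an xsd:short. Neither of the last two constraints is easy to state in a DTD.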
Another Example
<?xml version="1.0" encoding="UTF-8"?> <!-- po.xml -->
<myns:purchaseOrder orderDate="07.23.2001"
xmlns:myns="http://www.cds-r-us.com"
xmlns:xsi=
"http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation=
"http://www.cds-r-us.com
po.xsd"
>
<myns:recipient country="USA">
<myns:name>Dennis Scannel</myns:name>
<myns:street>175 Perry Lea Side Road</myns:street>
<myns:city>Waterbury</myns:city>
<myns:state>VT</myns:state>
<myns:postalCode>05675A</myns:postalCode>
</myns:recipient>
Note that there is a problem with this document.
<myns:order>
<myns:cd artist="Brooks Williams" title="Little Lion" />
<myns:cd artist="David Wilcox" title="What you whispered" />
</myns:order>
</myns:purchaseOrder>
XSDL Grammar po.xsd
<?xml version="1.0" encoding="utf-8"?> <!-- po.xsd -->
<xs:schema
xmlns:xs="http://www.w3.org/2001/XMLSchema"
xmlns="http://www.cds-r-us.com"
targetNamespace="http://www.cds-r-us.com"
>
<xs:element name="purchaseOrder">
<xs:complexType>
<xs:sequence>
<xs:element ref="recipient" />
<xs:element ref="order" />
</xs:sequence>
<xs:attribute name="orderDate" type="xs:string" />
</xs:complexType>
</xs:element>
<xs:element name = "recipient">
<xs:complexType>
<xs:sequence>
<xs:element ref="name" />
<xs:element ref="street" />
<xs:element ref="city" />
<xs:element ref="state" />
<xs:element ref="postalCode" />
</xs:sequence>
<xs:attribute name="country" type="xs:string" />
</xs:complexType>
</xs:element>
<xs:element name = "name" type="xs:string" />
<xs:element name = "street" type="xs:string" />
<xs:element name = "city" type="xs:string" />
<xs:element name = "state" type="xs:string" />
<xs:element name = "postalCode" type="xs:short" />
<xs:element name = "order">
<xs:complexType>
<xs:sequence>
<xs:element ref="cd" maxOccurs="unbounded"/>
</xs:sequence>
</xs:complexType>
</xs:element>
<xs:element name="cd">
<xs:complexType>
<xs:attribute name="artist"
type="xs:string" />
<xs:attribute name="title" type="xs:string" />
</xs:complexType>
</xs:element>
</xs:schema>
Running Validate
D:..\examples\XSDL\testing>ant run
Buildfile: build.xml
run:
Running Validate.java on po.xml
Received notification of a recoverable
error.org.xml.sax.SAXParseException: cvc-datatype-valid.1.2.1:
'05675A' is not a valid 'integer' value.
Received notification of a recoverable
error.org.xml.sax.SAXParseException: cvc-type.3.1.3: The value
'05675A' of element 'myns:postalCode' is not valid.
Valid Document is false
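The failure comes from po.xsd declaring postalCode as xs:short, so the trailing "A" in 05675A is rejected. The following sketch isolates that one element and compares xs:short with a pattern-restricted string. It is an illustration only: the class name PostalCodeCheck and the pattern [0-9]{5} are assumptions and do not appear in po.xsd, and the target namespace is omitted for brevity.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

// Hypothetical helper, not part of the lecture code.
final class PostalCodeCheck {
    static final String XS =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>";

    // postalCode typed as in po.xsd.
    static final String SHORT_XSD =
        XS + "<xs:element name='postalCode' type='xs:short'/></xs:schema>";

    // Assumed alternative: a five-digit string (not taken from the slides).
    static final String PATTERN_XSD = XS
        + "<xs:element name='postalCode'><xs:simpleType>"
        + "<xs:restriction base='xs:string'>"
        + "<xs:pattern value='[0-9]{5}'/>"
        + "</xs:restriction></xs:simpleType></xs:element></xs:schema>";

    // Validates <postalCode>value</postalCode> against the given schema text.
    static boolean isValid(String xsd, String value) {
        try {
            Validator v = SchemaFactory
                .newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new StreamSource(new StringReader(xsd)))
                .newValidator();
            v.validate(new StreamSource(
                new StringReader("<postalCode>" + value + "</postalCode>")));
            return true;
        } catch (SAXException | IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid(SHORT_XSD, "05675A"));   // false: not a number
        System.out.println(isValid(SHORT_XSD, "05675"));    // true
        System.out.println(isValid(SHORT_XSD, "99999"));    // false: above 32767
        System.out.println(isValid(PATTERN_XSD, "99999"));  // true
    }
}
```

Note that xs:short is limited to the range -32768 to 32767, so a numeric type would reject many real five-digit codes even after the stray letter is removed.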
Fix the error and run again
D:\..\XSDL\testing>ant run
Buildfile: build.xml
run:
Running Validate.java on po.xml
Valid Document is true
Introduce a Namespace Error
<?xml version="1.0" encoding="UTF-8"?>
<!-- po.xml -->
<myns:purchaseOrder orderDate="07.23.2001"
xmlns:myns="http://www.cds-r-us.edu"
xmlns:xsi=
"http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.cds-r-us.com
po.xsd"
>
<myns:recipient country="USA">
<myns:name>Dennis Scannel</myns:name>
<myns:street>
175 Perry Lea Side Road
</myns:street>
<myns:city>Waterbury</myns:city>
<myns:state>VT</myns:state>
<myns:postalCode>05675</myns:postalCode>
</myns:recipient>
<myns:order>
<myns:cd artist="Brooks Williams" title="Little Lion" />
<myns:cd artist="David Wilcox" title="What you whispered" />
</myns:order>
</myns:purchaseOrder>
And run validate
run:
Running Validate.java on po.xml
Received notification of a recoverable
error.org.xml.sax.SAXParseException: cvc-elt.1:
Cannot find the declaration of element 'myns:purchaseOrder'.
Valid Document is false
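The validator cannot find the declaration because the prefix myns is now bound to http://www.cds-r-us.edu, while po.xsd declares the target namespace http://www.cds-r-us.com. An element is identified by its namespace URI plus local name, not by its prefix. The sketch below makes the mismatch visible with a namespace-aware DOM parse; the class name NamespaceCheck and its methods are hypothetical and are not part of the lecture code.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of the lecture code.
final class NamespaceCheck {
    // targetNamespace declared in po.xsd.
    static final String TARGET_NS = "http://www.cds-r-us.com";

    // Returns the namespace URI of the document element.
    static String rootNamespace(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // off by default in JAXP
            Document doc = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getNamespaceURI();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse document", e);
        }
    }

    // True when the root element lives in the schema's target namespace.
    static boolean matchesTarget(String xml) {
        return TARGET_NS.equals(rootNamespace(xml));
    }

    public static void main(String[] args) {
        String bad  = "<myns:purchaseOrder xmlns:myns='http://www.cds-r-us.edu'/>";
        String good = "<myns:purchaseOrder xmlns:myns='http://www.cds-r-us.com'/>";
        System.out.println(rootNamespace(bad));   // http://www.cds-r-us.edu
        System.out.println(matchesTarget(bad));   // false
        System.out.println(matchesTarget(good));  // true
    }
}
```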
Code Generation
• Run JAXB against the .xsd file
• Code generated will present an API
allowing us to process that style of
document
itemList.xsd again
<?xml version="1.0" encoding="utf-8"?>
<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema'>
<xsd:element name="itemList">
<xsd:complexType>
<xsd:sequence>
<xsd:element ref="item"
minOccurs="0" maxOccurs="3"/>
</xsd:sequence>
</xsd:complexType>
</xsd:element>
<xsd:element name="item">
<xsd:complexType>
<xsd:sequence>
<xsd:element ref="name"/>
<xsd:element ref="quantity"/>
</xsd:sequence>
</xsd:complexType>
</xsd:element>
<xsd:element name="name" type="xsd:string"/>
<xsd:element name="quantity" type="xsd:short"/>
</xsd:schema>
Run xjc
D:..XSDL\testing>xjc itemList.xsd
D:\McCarthy\www\95-733\examples\XSDL\testing>java -jar D:\jwsdp-1.1\jaxb-1.0\lib\jaxb-xjc.jar itemList.xsd
parsing a schema...
compiling a schema...
generated\impl\ItemImpl.java
generated\impl\ItemListImpl.java
generated\impl\ItemListTypeImpl.java
generated\impl\ItemTypeImpl.java
generated\impl\NameImpl.java
generated\impl\QuantityImpl.java
generated\Item.java
generated\ItemList.java
generated\ItemListType.java
generated\ItemType.java
generated\Name.java
generated\ObjectFactory.java
generated\Quantity.java
generated\bgm.ser
generated\jaxb.properties
Write Java code that uses the new API
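To give a feel for what a typed binding offers, here is a hand-written stand-in that uses only the JDK. It is not the code xjc generates and does not use the JAXB runtime: the class ItemListBinding, its nested Item class, and the methods unmarshal and totalQuantity are all hypothetical names chosen for this sketch. The idea it illustrates is the same, though: the program works with Java objects and typed fields (a String name, a short quantity) instead of raw XML nodes.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hand-written stand-in for a generated binding (assumption, not xjc output).
final class ItemListBinding {
    // One <item>: quantity is a short, mirroring xsd:short in itemList.xsd.
    static final class Item {
        final String name;
        final short quantity;
        Item(String name, short quantity) {
            this.name = name;
            this.quantity = quantity;
        }
    }

    // Turns an itemList document into a list of typed Item objects.
    static List<Item> unmarshal(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            NodeList nodes = doc.getDocumentElement().getElementsByTagName("item");
            List<Item> items = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                Element e = (Element) nodes.item(i);
                String name =
                    e.getElementsByTagName("name").item(0).getTextContent();
                String qty =
                    e.getElementsByTagName("quantity").item(0).getTextContent();
                items.add(new Item(name, Short.parseShort(qty.trim())));
            }
            return items;
        } catch (Exception e) {
            throw new IllegalStateException("could not read item list", e);
        }
    }

    // Example of application logic written against the typed objects.
    static int totalQuantity(List<Item> items) {
        int total = 0;
        for (Item item : items) {
            total += item.quantity;
        }
        return total;
    }

    public static void main(String[] args) {
        String xml = "<itemList>"
            + "<item><name>pen</name><quantity>5</quantity></item>"
            + "<item><name>eraser</name><quantity>7</quantity></item>"
            + "<item><name>stapler</name><quantity>2</quantity></item>"
            + "</itemList>";
        List<Item> items = unmarshal(xml);
        System.out.println(items.size() + " items, total quantity "
            + totalQuantity(items)); // 3 items, total quantity 14
    }
}
```

With the real generated classes, the parsing loop above is replaced by the binding framework's unmarshaller, and the schema determines the accessor names and types.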
The build script used for these examples
<?xml version="1.0"?>
<project basedir="." default="compile">
<path id="classpath">
<fileset dir="D:/jwsdp-1.1/saaj-1.1.1/lib" includes="*.jar"/>
<fileset dir="D:/jwsdp-1.1/jaxb-1.0/lib" includes="*.jar"/>
<fileset dir="d:/jwsdp-1.1/common/lib" includes="*.jar"/>
<fileset dir="D:/jwsdp-1.1/jaxm-1.1.1/lib" includes="*.jar"/>
<fileset dir="D:/jwsdp-1.1/bin" includes="*.jar" />
<fileset dir="D:/jwsdp-1.1/jaxp-1.2.2/lib" includes="*.jar"/>
<fileset dir="D:/jwsdp-1.1/jaxp-1.2.2/lib/endorsed"
includes="*.jar"/>
<fileset dir="D:/jwsdp-1.1/jwsdp-shared/lib" includes="*.jar"/>
<fileset dir="D:/jwsdp-1.1/jaxr-1.0_03/lib" includes="*.jar"/>
<fileset dir="D:/jwsdp-1.1/jakarta-ant-1.5.1/lib" includes="*.jar"/>
<fileset dir="D:/j2sdk1.4.1_01/lib" includes="*.jar"/>
<pathelement location="."/>
</path>
<!-- compile Java source files -->
<target name="compile">
<!-- compile all of the java sources -->
<echo message="Compiling the java source files..."/>
<javac srcdir="." destdir="." debug="on">
<classpath refid="classpath" />
</javac>
</target>
<target name="run">
<echo message="Running Validate.java on po.xml"/>
<java classname="Validate" fork="false">
<arg value="po.xml"/>
<classpath refid="classpath" />
</java>
</target>
</project>