XML Grammars
95-733 Internet Technologies

XML Grammars: Three Major Uses
1. Validation
2. Code Generation
3. Communication

XML Validation
Sources for this lecture:
“Data on the Web”, Abiteboul, Buneman and Suciu
“XML in a Nutshell”, Harold and Means
“The XML Companion”, Bradley
The validation examples were originally tested with an older parser, so the specific outputs may differ from those shown.

XML Validation
A batch validating process compares a complete document instance against the DTD and produces a report containing any errors or warnings. Batch validation is analogous to program compilation, with similar errors detected.
Interactive validation constantly compares a document against the DTD as the document is being created.

XML Validation
The benefits of validating documents against a DTD include:
• Programmers can write extraction and manipulation filters with far less risk that their software will process structurally unexpected input.
• Using an XML-aware word processor, authors and editors can be guided and constrained to produce conforming documents. Consider how NetBeans allows you to edit web.xml files.

XML Validation Examples
XML elements may contain further, embedded elements, and the entire document must be enclosed by a single document element. These are recursive hierarchical structures.
A Document Type Definition (DTD) contains rules for each element allowed within a specific class of documents.

Things the DTD does not do:
• Specify the document root.
• Specify the number of instances of each kind of element. (Or, it’s rather hard to do.)
• Describe the character data inside an element (the precise syntax).
• DTDs don’t naturally handle namespaces.
• The XML Schema language is much more recent and improves on DTDs. It provides “programmer level” type specifications.
• To see a real DTD, view source on http://www.silmaril.ie/software/rss2.dtd

We’ll run this program against several XML files with DTDs. We’ll study the code soon.

// Validate.java using Xerces
import java.io.*;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.XMLReaderFactory;
import org.xml.sax.helpers.DefaultHandler;
This slide shows the imported classes.

public class Validate {

    public static boolean valid = true;

    public static void main (String argv []) {
        if (argv.length != 1) {
            System.err.println ("Usage: java Validate filename.xml");
            System.exit (1);
        }
Here we check if the command line is correct.

        try {
            // get a parser
            XMLReader reader = XMLReaderFactory.createXMLReader(
                "org.apache.xerces.parsers.SAXParser");
            // request validation
            reader.setFeature("http://xml.org/sax/features/validation", true);
            // associate an InputSource object with the file name
            InputSource inputSource = new InputSource(argv[0]);
            // go ahead and parse
            reader.parse(inputSource);
        }

        // Catch any errors or fatal errors here.
        // The parser will handle simple warnings.
        catch(org.xml.sax.SAXException e) {
            System.out.println("Error in parsing " + e);
            valid = false;
        }
        catch(java.io.IOException e) {
            System.out.println("Error in I/O " + e);
            System.exit(0);
        }
        System.out.println("Valid Document is " + valid);
    }
}

XML Document
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE FixedFloatSwap SYSTEM "FixedFloatSwap.dtd">
<FixedFloatSwap>
    <Notional>100</Notional>
    <Fixed_Rate>5</Fixed_Rate>
    <NumYears>3</NumYears>
    <NumPayments>6</NumPayments>
</FixedFloatSwap>

DTD
<?xml version="1.0" encoding="utf-8"?>
<!ELEMENT FixedFloatSwap (Notional, Fixed_Rate, NumYears, NumPayments ) >
<!ELEMENT Notional (#PCDATA) >
<!ELEMENT Fixed_Rate (#PCDATA) >
<!ELEMENT NumYears (#PCDATA) >
<!ELEMENT NumPayments (#PCDATA) >

Valid document is true

XML Document
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE FixedFloatSwap SYSTEM "http://localhost:8001/dtd/FixedFloatSwap.dt
<FixedFloatSwap>
    <Notional>100</Notional>
    <Fixed_Rate>5</Fixed_Rate>
    <NumYears>3</NumYears>
    <NumPayments>6</NumPayments>
</FixedFloatSwap>

DTD on the Web? VERY NICE
<?xml version="1.0" encoding="utf-8"?>
<!ELEMENT FixedFloatSwap (Notional, Fixed_Rate, NumYears, NumPayments ) >
<!ELEMENT Notional (#PCDATA) >
<!ELEMENT Fixed_Rate (#PCDATA) >
<!ELEMENT NumYears (#PCDATA) >
<!ELEMENT NumPayments (#PCDATA) >

Valid document is true

XML Document with an internal subset
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE FixedFloatSwap [
    <!ELEMENT FixedFloatSwap (Notional, Fixed_Rate, NumYears, NumPayments ) >
    <!ELEMENT Notional (#PCDATA) >
    <!ELEMENT Fixed_Rate (#PCDATA) >
    <!ELEMENT NumYears (#PCDATA) >
    <!ELEMENT NumPayments (#PCDATA) >
]>
<FixedFloatSwap>
    <Notional>100</Notional>
    <Fixed_Rate>5</Fixed_Rate>
    <NumYears>3</NumYears>
    <NumPayments>6</NumPayments>
</FixedFloatSwap>

Valid document is true

XML Document
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE FixedFloatSwap SYSTEM "FixedFloatSwap.dtd">
<FixedFloatSwap>
    <Notional>100</Notional>
    <Fixed_Rate>5</Fixed_Rate>
    <NumYears>3</NumYears>
    <NumPayments>6</NumPayments>
</FixedFloatSwap>

DTD
<?xml version="1.0" encoding="utf-8"?>
<!ELEMENT FixedFloatSwap (Notional, Fixed_Rate, NumPayments ) >
<!ELEMENT Notional (#PCDATA) >
<!ELEMENT Fixed_Rate (#PCDATA) >
<!ELEMENT NumPayments (#PCDATA) >

Valid document is false

XML Document
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE Swaps SYSTEM "FixedFloatSwap.dtd">
<Swaps>
    <FixedFloatSwap>
        <Notional>100</Notional>
        <Fixed_Rate>5</Fixed_Rate>
        <NumYears>3</NumYears>
        <NumPayments>6</NumPayments>
    </FixedFloatSwap>
    <FixedFloatSwap>
        <Notional>100</Notional>
        <Fixed_Rate>5</Fixed_Rate>
        <NumYears>3</NumYears>
        <NumPayments>6</NumPayments>
    </FixedFloatSwap>
</Swaps>

DTD
<?xml version="1.0" encoding="utf-8"?>
<!ELEMENT Swaps (FixedFloatSwap+) >
<!ELEMENT FixedFloatSwap (Notional, Fixed_Rate, NumYears, NumPayments ) >
<!ELEMENT Notional (#PCDATA) >
<!ELEMENT Fixed_Rate (#PCDATA) >
<!ELEMENT NumYears (#PCDATA) >
<!ELEMENT NumPayments (#PCDATA) >
C:\McCarthy\www\examples\sax>java Validate FixedFloatSwap.xml
Valid document is true

Quantity Indicators
?   0 or 1 time
+   1 or more times
*   0 or more times

Is this a valid document?
<?xml version="1.0"?>
<!DOCTYPE person [
    <!ELEMENT person (name+, profession*)>
    <!ELEMENT profession (#PCDATA)>
    <!ELEMENT name (#PCDATA)>
]>
<person>
    <name>Alan Turing</name>
    <profession>computer scientist</profession>
    <profession>cryptographer</profession>
</person>
Sure!

The locations where document text data is allowed are indicated by the keyword ‘PCDATA’ (Parsed Character Data).

XML Document
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE FixedFloatSwap SYSTEM "FixedFloatSwap.dtd">
<FixedFloatSwap>
    <Notional>100</Notional>
    <Fixed_Rate>5</Fixed_Rate>
    <NumYears>
        <StartYear>2000</StartYear>
        <EndYear>2002</EndYear>
    </NumYears>
    <NumPayments>6</NumPayments>
</FixedFloatSwap>

DTD
<?xml version="1.0" encoding="utf-8"?>
<!ELEMENT FixedFloatSwap (Notional, Fixed_Rate, NumYears, NumPayments ) >
<!ELEMENT Notional (#PCDATA) >
<!ELEMENT Fixed_Rate (#PCDATA) >
<!ELEMENT NumYears (#PCDATA) >
<!ELEMENT NumPayments (#PCDATA) >

Output
C:\McCarthy\www\46-928\examples\sax>java Validate FixedFloatSwap.xml
org.xml.sax.SAXParseException: Element "NumYears" does not allow "StartYear" -- (#PCDATA)
org.xml.sax.SAXParseException: Element type "StartYear" is not declared.
org.xml.sax.SAXParseException: Element "NumYears" does not allow "EndYear" -- (#PCDATA)
org.xml.sax.SAXParseException: Element type "EndYear" is not declared.
Valid document is false

Mixed Content
There are strict rules which must be applied when an element is allowed to contain both text and child elements. The PCDATA keyword must be the first token in the group, and the group must be a choice group (using “|” not “,”). The group must be optional and repeatable. This is known as a mixed content model.
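The mixed content rules above can be tried out in a few lines of Java. This is a minimal sketch, assuming the validating parser bundled with the JDK (javax.xml.parsers) rather than the Xerces setup used elsewhere in these slides. The class name MixedContentCheck and the helper isValid are hypothetical, and the DTD is placed in an internal subset so that no files are needed.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

final class MixedContentCheck {

    // Internal DTD subset: emph uses a mixed content model
    // (#PCDATA first, choice group, optional and repeatable).
    static final String DOCTYPE =
        "<!DOCTYPE Mixed [\n"
      + "  <!ELEMENT Mixed (emph)>\n"
      + "  <!ELEMENT emph (#PCDATA | sub | super)*>\n"
      + "  <!ELEMENT sub (#PCDATA)>\n"
      + "  <!ELEMENT super (#PCDATA)>\n"
      + "]>\n";

    // Returns true only if the body parses and satisfies the DTD.
    static boolean isValid(String body) {
        final boolean[] valid = {true};
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true); // request DTD validation
            DocumentBuilder builder = factory.newDocumentBuilder();
            builder.setErrorHandler(new ErrorHandler() {
                public void warning(SAXParseException e) { }
                public void error(SAXParseException e) { valid[0] = false; }
                public void fatalError(SAXParseException e) throws SAXParseException {
                    throw e; // not well-formed: give up
                }
            });
            builder.parse(new InputSource(new StringReader(DOCTYPE + body)));
        } catch (Exception e) {
            return false; // treat parse or I/O failures as "not valid"
        }
        return valid[0];
    }

    public static void main(String[] args) {
        // Text and <sub> mixed inside emph: allowed by the mixed content model.
        System.out.println(isValid("<Mixed><emph>H<sub>2</sub>O is water.</emph></Mixed>"));
        // Text placed directly inside Mixed, whose model is (emph): not allowed.
        System.out.println(isValid("<Mixed>stray text<emph>H<sub>2</sub>O</emph></Mixed>"));
    }
}
```

The first document should be reported as valid, since text and sub elements may mix freely inside emph. The second should be reported as invalid, because Mixed is declared with element-only content.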
DTD
<?xml version="1.0" encoding="utf-8"?>
<!ELEMENT Mixed (emph) >
<!ELEMENT emph (#PCDATA | sub | super)* >
<!ELEMENT sub (#PCDATA)>
<!ELEMENT super (#PCDATA)>

XML Document
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE Mixed SYSTEM "Mixed.dtd">
<Mixed>
    <emph>H<sub>2</sub>O is water.</emph>
</Mixed>

Valid document is true

Is this a valid document?
<?xml version="1.0"?>
<!DOCTYPE page [
    <!ELEMENT page (paragraph+)>
    <!ELEMENT paragraph ( #PCDATA | profession | bold)*>
    <!ELEMENT profession (#PCDATA)>
    <!ELEMENT bold (#PCDATA)>
]>
<page>
    <paragraph>
        Alan Turing broke codes during <bold>World War II</bold>.
        He very precisely defined the notion of "algorithm".
        And so he had several professions:
        <profession>computer scientist</profession>
        <profession>cryptographer</profession>
        And <profession>mathematician</profession>
    </paragraph>
</page>
Sure!

How about this one?
<?xml version="1.0"?>
<!DOCTYPE page [
    <!ELEMENT page (paragraph+)>
    <!ELEMENT paragraph ( #PCDATA | profession | bold)*>
    <!ELEMENT profession (#PCDATA)>
    <!ELEMENT bold (#PCDATA)>
]>
<page>
    The following is a paragraph marked up in XML.
    <paragraph>
        Alan Turing broke codes during <bold>World War II</bold>.
        He very precisely defined the notion of "algorithm".
        And so he had several professions:
        <profession>computer scientist</profession>
        <profession>cryptographer</profession>
        And <profession>mathematician</profession>
    </paragraph>
</page>

java Validate mixed.xml
org.xml.sax.SAXParseException: The content of element type "page" must match "(paragraph)+".
Valid document is false

XML Document (CDATA Section)
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE FixedFloatSwap SYSTEM "FixedFloatSwap.dtd">
<FixedFloatSwap>
    <Notional>100</Notional>
    <Fixed_Rate>5</Fixed_Rate>
    <NumYears>3</NumYears>
    <NumPayments>6</NumPayments>
    <Note>
        <![CDATA[This is text that <b>will not be parsed for markup]]>
    </Note>
</FixedFloatSwap>

DTD
<?xml version="1.0" encoding="utf-8"?>
<!ELEMENT FixedFloatSwap ( Notional, Fixed_Rate, NumYears, NumPayments, Note ) >
<!ELEMENT Notional (#PCDATA)>
<!ELEMENT Fixed_Rate (#PCDATA) >
<!ELEMENT NumYears (#PCDATA) >
<!ELEMENT NumPayments (#PCDATA) >
<!ELEMENT Note (#PCDATA) >

Recursion
<?xml version="1.0"?>
<!DOCTYPE tree [
    <!ELEMENT tree (node)>
    <!ELEMENT node (leaf | (node,node))>
    <!ELEMENT leaf (#PCDATA)>
]>
<tree>
    <node>
        <leaf>A DTD is a context-free grammar</leaf>
    </node>
</tree>

java Validate recursive1.xml
Valid document is true

How about this one?
<?xml version="1.0"?>
<!DOCTYPE tree [
    <!ELEMENT tree (node)>
    <!ELEMENT node (leaf | (node,node))>
    <!ELEMENT leaf (#PCDATA)>
]>
<tree>
    <node>
        <leaf>Alan Turing would like this</leaf>
    </node>
    <node>
        <leaf>Alan Turing would like this</leaf>
    </node>
</tree>

java Validate recursive1.xml
org.xml.sax.SAXParseException: The content of element type "tree" must match "(node)".
Valid document is false

Relational Databases and XML
Consider the relational database r1(a,b,c), r2(c,d)

r1:  a   b   c
     a1  b1  c1
     a2  b2  c2

r2:  c   d
     c2  d2
     c3  d3
     c4  d4

How can we represent this database with an XML DTD?
Relations
<?xml version="1.0"?>
<!DOCTYPE db [
    <!ELEMENT db (r1*, r2*)>
    <!ELEMENT r1 (a,b,c)>
    <!ELEMENT r2 (c,d)>
    <!ELEMENT a (#PCDATA)>
    <!ELEMENT b (#PCDATA)>
    <!ELEMENT c (#PCDATA)>
    <!ELEMENT d (#PCDATA)>
]>
<db>
    <r1><a> a1 </a> <b> b1 </b> <c> c1 </c> </r1>
    <r1><a> a2 </a> <b> b2 </b> <c> c2 </c> </r1>
    <r2><c> c2 </c> <d> d2 </d> </r2>
    <r2><c> c3 </c> <d> d3 </d> </r2>
    <r2><c> c4 </c> <d> d4 </d> </r2>
</db>

java Validate Db.xml
Valid document is true
There is a small problem…

Relations
<?xml version="1.0"?>
<!DOCTYPE db [
    <!ELEMENT db (r1|r2)* >
    <!ELEMENT r1 ((a,b,c) | (a,c,b) | (b,a,c) | (b,c,a) | (c,a,b) | (c,b,a))>
    <!ELEMENT r2 ((c,d) | (d,c))>
    <!ELEMENT a (#PCDATA)>
    <!ELEMENT b (#PCDATA)>
    <!ELEMENT c (#PCDATA)>
    <!ELEMENT d (#PCDATA)>
]>
<db>
    <r1><a> a1 </a> <b> b1 </b> <c> c1 </c> </r1>
    <r1><a> a2 </a> <b> b2 </b> <c> c2 </c> </r1>
    <r2><c> c2 </c> <d> d2 </d> </r2>
    <r2><c> c3 </c> <d> d3 </d> </r2>
    <r2><c> c4 </c> <d> d4 </d> </r2>
</db>
The order of the relations should not count, and neither should the order of columns within rows.

Attributes
An attribute is associated with a particular element by the DTD and is assigned an attribute type. The attribute type can restrict the range of values it can hold.
Example attribute types include:
CDATA     indicates a simple string of characters
NMTOKEN   indicates a word or token
A named token group such as (left | center | right)
ID        an element id that holds a unique value (among other element IDs in the document)
IDREF     attributes refer to an ID

DTD
<?xml version="1.0" encoding="utf-8"?>
<!ELEMENT FixedFloatSwap (Notional, Fixed_Rate, NumYears, NumPayments ) >
<!ELEMENT Notional (#PCDATA) >
<!ELEMENT Fixed_Rate (#PCDATA) >
<!ELEMENT NumYears (#PCDATA) >
<!ELEMENT NumPayments (#PCDATA) >
<!ATTLIST Notional currency (Dollars | Pounds) #REQUIRED>

XML Document
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE FixedFloatSwap SYSTEM "FixedFloatSwap.dtd">
<FixedFloatSwap>
    <Notional>100</Notional>
    <Fixed_Rate>5</Fixed_Rate>
    <NumYears>3</NumYears>
    <NumPayments>6</NumPayments>
</FixedFloatSwap>

C:\McCarthy\www\46-928\examples\sax>java Validate FixedFloatSwap.xml
org.xml.sax.SAXParseException: Attribute value for "currency" is #REQUIRED.
Valid document is false

DTD
<?xml version="1.0" encoding="utf-8"?>
<!ELEMENT FixedFloatSwap (Notional, Fixed_Rate, NumYears, NumPayments ) >
<!ELEMENT Notional (#PCDATA) >
<!ELEMENT Fixed_Rate (#PCDATA) >
<!ELEMENT NumYears (#PCDATA) >
<!ELEMENT NumPayments (#PCDATA) >
<!ATTLIST Notional currency (Dollars | Pounds) #REQUIRED>

XML Document
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE FixedFloatSwap SYSTEM "FixedFloatSwap.dtd">
<FixedFloatSwap>
    <Notional currency = "Pounds">100</Notional>
    <Fixed_Rate>5</Fixed_Rate>
    <NumYears>3</NumYears>
    <NumPayments>6</NumPayments>
</FixedFloatSwap>

Valid document is true

DTD
<?xml version="1.0" encoding="utf-8"?>
<!ELEMENT FixedFloatSwap (Notional, Fixed_Rate, NumYears, NumPayments ) >
<!ELEMENT Notional (#PCDATA) >
<!ELEMENT Fixed_Rate (#PCDATA) >
<!ELEMENT NumYears (#PCDATA) >
<!ELEMENT NumPayments (#PCDATA) >
<!ATTLIST Notional currency (Dollars | Pounds) #REQUIRED>
<!ATTLIST FixedFloatSwap note CDATA #IMPLIED>

XML Document
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE FixedFloatSwap SYSTEM "FixedFloatSwap.dtd">
<FixedFloatSwap>
    <Notional currency = "Pounds">100</Notional>
    <Fixed_Rate>5</Fixed_Rate>
    <NumYears>3</NumYears>
    <NumPayments>6</NumPayments>
</FixedFloatSwap>

Valid document is true
#IMPLIED means optional

DTD
<?xml version="1.0" encoding="utf-8"?>
<!ELEMENT FixedFloatSwap (Notional, Fixed_Rate, NumYears, NumPayments ) >
<!ELEMENT Notional (#PCDATA) >
<!ELEMENT Fixed_Rate (#PCDATA) >
<!ELEMENT NumYears (#PCDATA) >
<!ELEMENT NumPayments (#PCDATA) >
<!ATTLIST Notional currency (Dollars | Pounds) #REQUIRED>
<!ATTLIST FixedFloatSwap note CDATA #IMPLIED>

XML Document
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE FixedFloatSwap SYSTEM "FixedFloatSwap.dtd">
<FixedFloatSwap note = "For your eyes only">
    <Notional currency = "Pounds">100</Notional>
    <Fixed_Rate>5</Fixed_Rate>
    <NumYears>3</NumYears>
    <NumPayments>6</NumPayments>
</FixedFloatSwap>

Valid document is true

ID and IDREF Attributes
We can represent complex relationships within an XML document using ID and IDREF attributes.

An Undirected Graph
[Figure: an undirected graph on vertices u, v, w, x, y, z, with one edge and one vertex labeled]

A Directed Graph
[Figure: a directed graph on vertices u, v, w, x, y]

[Figure: course prerequisite graph with nodes Math100, Geom100, Calc100, Calc200, Calc300, CS1, CS2, Philo45]
This is called a DAG (Directed Acyclic Graph).

<?xml version="1.0"?>
<!DOCTYPE Course_Descriptions SYSTEM "course_descriptions.dtd">
<Course_Descriptions>
    <Course>
        <Course-ID id = "Math100" />
        <Title>Algebra I</Title>
        <Description>
            Students in this course study introductory algebra.
        </Description>
        <Prerequisites/>
    </Course>
This course has an ID but no prerequisites.

    <Course>
        <Course-ID id = "Geom100" />
        <Title>Geometry I</Title>
        <Description>
            Students in this course study how to prove several theorems in geometry.
        </Description>
        <Prerequisites/>
    </Course>
The DTD will force the id to be unique.

    <Course>
        <Course-ID id="Calc100" />
        <Title>Calculus I</Title>
        <Description>
            Students in this course study the derivative.
        </Description>
        <Prerequisites pre="Math100 Geom100" />
    </Course>
These are references to IDs (IDREFS).

    <Course>
        <Course-ID id = "Calc200" />
        <Title>Calculus II</Title>
        <Description>
            Students in this course study the integral.
        </Description>
        <Prerequisites pre="Calc100" />
    </Course>
The DTD requires that this name be a unique id defined within this document. Otherwise, the document is invalid.

    <Course>
        <Course-ID id = "Calc300" />
        <Title>Calculus III</Title>
        <Description>
            Students in this course study the derivative and the integral (in 3-space).
        </Description>
        <Prerequisites pre="Calc200" />
    </Course>
Prerequisites is an EMPTY element. It’s used only for its attributes.
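The uniqueness and reference checks described in these annotations can be reproduced in isolation. This is a minimal sketch, assuming the validating parser bundled with the JDK rather than the Xerces setup of these slides. The class name IdRefCheck, the helper isValid, and the simplified Courses DTD (which puts id and pre directly on Course) are hypothetical and are not the course DTD used in this lecture.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

final class IdRefCheck {

    // Hypothetical, simplified DTD: each Course carries its own ID and
    // an optional list of prerequisite references (IDREFS).
    static final String DOCTYPE =
        "<!DOCTYPE Courses [\n"
      + "  <!ELEMENT Courses (Course)+>\n"
      + "  <!ELEMENT Course EMPTY>\n"
      + "  <!ATTLIST Course id  ID     #REQUIRED\n"
      + "                   pre IDREFS #IMPLIED>\n"
      + "]>\n";

    // Returns true only if the body parses and satisfies the DTD,
    // including the ID uniqueness and IDREF resolution constraints.
    static boolean isValid(String body) {
        final boolean[] valid = {true};
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true); // request DTD validation
            DocumentBuilder builder = factory.newDocumentBuilder();
            builder.setErrorHandler(new ErrorHandler() {
                public void warning(SAXParseException e) { }
                public void error(SAXParseException e) { valid[0] = false; }
                public void fatalError(SAXParseException e) throws SAXParseException {
                    throw e; // not well-formed: give up
                }
            });
            builder.parse(new InputSource(new StringReader(DOCTYPE + body)));
        } catch (Exception e) {
            return false; // treat parse or I/O failures as "not valid"
        }
        return valid[0];
    }

    public static void main(String[] args) {
        // Every reference resolves to an ID in the same document.
        System.out.println(isValid(
            "<Courses><Course id=\"Math100\"/><Course id=\"Calc100\" pre=\"Math100\"/></Courses>"));
        // The same ID value is used twice.
        System.out.println(isValid(
            "<Courses><Course id=\"Math100\"/><Course id=\"Math100\"/></Courses>"));
        // The reference points at an ID that is never defined.
        System.out.println(isValid(
            "<Courses><Course id=\"Calc100\" pre=\"Math100\"/></Courses>"));
    }
}
```

Only the first document should pass. The second repeats an ID value and the third refers to an ID that does not exist, and a validating parser is expected to report both as validity errors.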
    <Course>
        <Course-ID id = "CS1" />
        <Title>Introduction to Computer Science I</Title>
        <Description>
            In this course we study Turing machines.
        </Description>
        <Prerequisites pre="Calc100" />
    </Course>
IDREF to ID: a one-to-one link

    <Course>
        <Course-ID id = "CS2" />
        <Title>Introduction to Computer Science II</Title>
        <Description>
            In this course we study basic data structures.
        </Description>
        <Prerequisites pre="Calc200 CS1"/>
    </Course>
IDREFS to several IDs: one-to-many links

    <Course>
        <Course-ID id = "Philo45" />
        <Title>Ethical Implications of Information Technology</Title>
        <Description>
            TBA
        </Description>
        <Prerequisites/>
    </Course>
</Course_Descriptions>

The Course_Descriptions.dtd
<?xml version="1.0"?>
<!-- Course Description DTD -->
<!ELEMENT Course_Descriptions (Course)+>
<!ELEMENT Course (Course-ID,Title,Description,Prerequisites)>
<!ELEMENT Course-ID EMPTY>
<!ELEMENT Title (#PCDATA)>
<!ELEMENT Description (#PCDATA)>
<!ELEMENT Prerequisites EMPTY>
<!ATTLIST Course-ID id ID #REQUIRED>
<!ATTLIST Prerequisites pre IDREFS #IMPLIED>

General Entities &
General entities are used to place text into the XML document. They may be declared in the DTD and referenced in the document. They may also be declared in the DTD as residing in a file and then referenced in the document.
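The replacement of a general entity reference can be observed directly from Java. This is a minimal sketch, assuming the DOM parser bundled with the JDK and a cut-down document with an internal subset. The class name EntityExpansion and the helper bankText are hypothetical, and only the internally declared form of the entity is shown, not the file-based form.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

final class EntityExpansion {

    // Parses a small document that declares and references a general
    // entity, and returns the text the application sees inside <Bank>.
    static String bankText() {
        String xml =
            "<!DOCTYPE FixedFloatSwap [\n"
          + "  <!ELEMENT FixedFloatSwap (Bank)>\n"
          + "  <!ELEMENT Bank (#PCDATA)>\n"
          + "  <!ENTITY bankname \"Mellon National Bank and Trust\">\n"
          + "]>\n"
          + "<FixedFloatSwap><Bank>&bankname;</Bank></FixedFloatSwap>";
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            Document doc = factory.newDocumentBuilder()
                                  .parse(new InputSource(new StringReader(xml)));
            // The parser has already substituted the entity's replacement text.
            return doc.getElementsByTagName("Bank").item(0).getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse the sample document", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(bankText());
    }
}
```

The application never sees the reference &bankname; itself, only its replacement text. The same holds for any tool that works on the parsed document.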
Document using a General Entity
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE FixedFloatSwap SYSTEM "FixedFloatSwap.dtd" [
    <!ENTITY bankname "Mellon National Bank and Trust" >
] >
<FixedFloatSwap>
    <Bank>&bankname;</Bank>
    <Notional>100</Notional>
    <Fixed_Rate>5</Fixed_Rate>
    <NumYears>3</NumYears>
    <NumPayments>6</NumPayments>
</FixedFloatSwap>

DTD
<?xml version="1.0" encoding="utf-8"?>
<!ELEMENT FixedFloatSwap (Bank,Notional, Fixed_Rate, NumYears, NumPayments ) >
<!ELEMENT Bank (#PCDATA) >
<!ELEMENT Notional (#PCDATA) >
<!ELEMENT Fixed_Rate (#PCDATA) >
<!ELEMENT NumYears (#PCDATA) >
<!ELEMENT NumPayments (#PCDATA) >

Valid document is true

The general entity is replaced before XSLT sees it.
XSLT Program
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
    <xsl:template match = "Bank">
        <WML>
            <CARD>
                <xsl:apply-templates/>
            </CARD>
        </WML>
    </xsl:template>
    <xsl:template match = "Notional | Fixed_Rate | NumYears | NumPayments">
    </xsl:template>
</xsl:stylesheet>

C:\McCarthy\www\46-928\examples\sax>java -Dcom.jclark.xsl.sax.parser=com.jclark.xml.sax.CommentDriver com.jclark.xsl.sax.Driver FixedFloatSwap.xml FixedFloatSwap.xsl FixedFloatSwap.wml
C:\McCarthy\www\46-928\examples\sax>type FixedFloatSwap.wml
<?xml version="1.0" encoding="utf-8"?>
<WML><CARD>Mellon National Bank and Trust</CARD></WML>
XSLT OUTPUT

An external text entity
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE FixedFloatSwap SYSTEM "FixedFloatSwap.dtd" [
    <!ENTITY bankname SYSTEM "JustAFile.dat" >
] >
<FixedFloatSwap>
    <Bank>&bankname;</Bank>
    <Notional>100</Notional>
    <Fixed_Rate>5</Fixed_Rate>
    <NumYears>3</NumYears>
    <NumPayments>6</NumPayments>
</FixedFloatSwap>

JustAFile.dat
Mellon Bank And Trust Corporation Pittsburgh PA

XSLT Output
<?xml version="1.0" encoding="utf-8"?>
<WML><CARD>Mellon Bank And Trust Corporation Pittsburgh PA</CARD></WML>

Parameter Entities %
While general entities are used to place text into the XML document, parameter entities are used to modify the DTD. We want to build modular DTDs so that we can create new DTDs using existing ones. We’ll look at a slide from www.fpml.org and then see some examples.

FpML is a Complete Description of the Trade
[Figure from www.fpml.org: a Trade assembled from a pool of modular components grouped into separate namespaces. Products named: Vanilla Swap, Vanilla Fixed Float Swap, Cancellable Swaption, FX Spot, FX Outright, FX Swap, Forward Rate Agreement... Components named: Trade ID, Product, Party, Notional, Rate, Money, Date, Schedule, Adjustable Period]

Internal Parameter Entities
DTD
<?xml version="1.0" encoding="utf-8"?>
<!ELEMENT FixedFloatSwap (Notional, Fixed_Rate, NumYears, NumPayments ) >
<!ENTITY % parsedCharacterData "(#PCDATA)">
<!ELEMENT Notional %parsedCharacterData; >
<!ELEMENT Fixed_Rate (#PCDATA) >
<!ELEMENT NumYears (#PCDATA) >
<!ELEMENT NumPayments (#PCDATA) >

XML Document
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE FixedFloatSwap SYSTEM "FixedFloatSwap.dtd">
<FixedFloatSwap>
    <Notional>100</Notional>
    <Fixed_Rate>5</Fixed_Rate>
    <NumYears>3</NumYears>
    <NumPayments>6</NumPayments>
</FixedFloatSwap>

External Parameter Entities and DTD Components
Order.xml
<?xml version="1.0" encoding = "UTF-8"?>
<!DOCTYPE ORDER SYSTEM "order.dtd">
<!-- example order form from “XML A Manager’s Guide” -->
<ORDER SOURCE ="web" CUSTOMERTYPE="consumer" CURRENCY="USD">
    <addresses>
        <address ADDTYPE="billship">
            <firstname>Kevin</firstname>
            <lastname>Dick</lastname>
            <street ORDER="1">123 Anywhere Lane</street>
            <street ORDER="2">Apt 1b</street>
            <city>Palo Alto</city>
            <state>CA</state>
            <postal>94303</postal>
            <country>USA</country>
        </address>
        <address ADDTYPE="bill">
            <firstname>Kevin</firstname>
            <lastname>Dick</lastname>
            <street ORDER="1">123 Not The Same Lane</street>
            <street ORDER="2">Work Place</street>
            <city>Palo Alto</city>
            <state>CA</state>
            <postal>94300</postal>
            <country>USA</country>
        </address>
    </addresses>
An order may have more than one address.
Several products may be purchased.
    <lineitems>
        <lineitem ID="line1">
            <product CAT="MBoard">440BX Motherboard</product>
            <quantity>1</quantity>
            <unitprice>200</unitprice>
        </lineitem>
        <lineitem ID="line2">
            <product CAT = "RAM">128 MB PC-100 DIMM</product>
            <quantity>2</quantity>
            <unitprice>175</unitprice>
        </lineitem>
        <lineitem ID="line3">
            <product CAT="CDROM">40x CD-ROM</product>
            <quantity>1</quantity>
            <unitprice>50</unitprice>
        </lineitem>
    </lineitems>
The payment is with a Visa card.
    <payment>
        <card CARDTYPE="VISA">
            <cardholder>Kevin S. Dick</cardholder>
            <cardnumber>11111-22222-33333</cardnumber>
            <expiration>01/01</expiration>
        </card>
    </payment>
</ORDER>
We want this document to be validated.

order.dtd
<?xml version="1.0" encoding="UTF-8"?>
<!-- Example Order form DTD adapted from XML: A Manager's Guide -->
<!-- Define an ORDER element -->
<!ELEMENT ORDER (addresses, lineitems, payment)>
<!ATTLIST ORDER
    SOURCE (web | phone | retail) #REQUIRED
    CUSTOMERTYPE (consumer | business) "consumer"
    CURRENCY CDATA "USD"
>
Define an order based on other elements.
External parameter entity declaration (%)
<!ENTITY % anAddress SYSTEM "address.dtd" >
External parameter entity reference (%)
%anAddress;
<!-- Collection of Addresses -->
<!ELEMENT addresses (address+)>

<!ENTITY % aLineItem SYSTEM "lineitem.dtd" >
%aLineItem;
<!-- Collection of LineItems -->
<!ELEMENT lineitems (lineitem+)>

<!ENTITY % aPayment SYSTEM "payment.dtd" >
%aPayment;

address.dtd
<!-- Address Structure -->
<!ELEMENT address (firstname, middlename?, lastname, street+, city, state,postal,country)>
<!ELEMENT firstname (#PCDATA)>
<!ELEMENT middlename (#PCDATA)>
<!ELEMENT lastname (#PCDATA)>
<!ELEMENT street (#PCDATA)>
<!ELEMENT city (#PCDATA)>
<!ELEMENT state (#PCDATA)>
<!ELEMENT postal (#PCDATA)>
<!ELEMENT country (#PCDATA)>
<!ATTLIST address ADDTYPE (bill | ship | billship) "billship">
<!ATTLIST street ORDER CDATA #IMPLIED>

lineitem.dtd
<!ELEMENT lineitem (product,quantity,unitprice)>
<!ATTLIST lineitem ID ID #REQUIRED>
<!ELEMENT product (#PCDATA)>
<!ATTLIST product CAT (CDROM|MBoard|RAM) #REQUIRED>
<!ELEMENT quantity (#PCDATA)>
<!ELEMENT unitprice (#PCDATA)>

payment.dtd
<!ELEMENT payment (card | PO)>
<!ELEMENT card (cardholder, cardnumber, expiration)>
<!ELEMENT cardholder (#PCDATA)>
<!ELEMENT cardnumber (#PCDATA)>
<!ELEMENT expiration (#PCDATA)>
<!ELEMENT PO (number,authorization*)>
<!ELEMENT number (#PCDATA)>
<!ELEMENT authorization (#PCDATA)>
<!ATTLIST card CARDTYPE (VISA|MasterCard|Amex) #REQUIRED>

XML Schemas Improve on DTD’s
• XML Schema is the official name
• XSDL (XML Schema Definition Language) is the language used to create schema definitions
• XML Syntax
• Can be used to more tightly constrain a document instance
• Supports namespaces
• Permits type derivation
• Harder than DTD’s

Other Grammars Include
• RELAX
• TREX (James Clark - Tree Regular Expressions for XML)
• RELAX NG (RELAX and TREX combined to
Relax Next Generation)
• Schematron (“Rule based” rather than “grammar based”, see www.ascc.net/xml/schematron). Based on XSLT and XPath.

XSDL - A Simple Purchase Order
<?xml version="1.0" encoding="UTF-8"?>
<!-- po.xml -->
<purchaseOrder orderDate="07.23.2001"
    xmlns="http://www.cds-r-us.com"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.cds-r-us.com po.xsd" >
    <recipient country="USA">
        <name>Dennis Scannel</name>
        <street>175 Perry Lea Side Road</street>
        <city>Waterbury</city>
        <state>VT</state>
        <postalCode>15216</postalCode>
    </recipient>
    <order>
        <cd artist="Brooks Williams" title="Little Lion" />
        <cd artist="David Wilcox" title="What you whispered" />
    </order>
</purchaseOrder>

Purchase Order XSDL
<?xml version="1.0" encoding="utf-8"?>
<!-- po.xsd -->
<xs:schema
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns="http://www.cds-r-us.com"
    targetNamespace="http://www.cds-r-us.com" >

    <xs:element name="purchaseOrder">
        <xs:complexType>
            <xs:sequence>
                <xs:element ref="recipient" />
                <xs:element ref="order" />
            </xs:sequence>
            <xs:attribute name="orderDate" type="xs:string" />
        </xs:complexType>
    </xs:element>

    <xs:element name = "recipient">
        <xs:complexType>
            <xs:sequence>
                <xs:element ref="name" />
                <xs:element ref="street" />
                <xs:element ref="city" />
                <xs:element ref="state" />
                <xs:element ref="postalCode" />
            </xs:sequence>
            <xs:attribute name="country" type="xs:string" />
        </xs:complexType>
    </xs:element>

    <xs:element name = "name" type="xs:string" />
    <xs:element name = "street" type="xs:string" />
    <xs:element name = "city" type="xs:string" />
    <xs:element name = "state" type="xs:string" />
    <xs:element name = "postalCode" type="xs:short" />

    <xs:element name = "order">
        <xs:complexType>
            <xs:sequence>
                <xs:element ref="cd" maxOccurs="unbounded"/>
            </xs:sequence>
        </xs:complexType>
    </xs:element>

    <xs:element name="cd">
        <xs:complexType>
            <xs:attribute name="artist" type="xs:string" />
            <xs:attribute name="title" type="xs:string" />
        </xs:complexType>
    </xs:element>
</xs:schema>

Validate.java
// Validate.java using Xerces
import java.io.*;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.XMLReaderFactory;
import org.xml.sax.helpers.DefaultHandler;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;

public class Validate extends DefaultHandler {

    public static boolean valid = true;

    // Validity errors are recoverable: record them and keep parsing.
    public void error(SAXParseException exception) {
        System.out.println("Received notification of a recoverable error." + exception);
        valid = false;
    }

    public void fatalError(SAXParseException exception) {
        System.out.println("Received notification of a non-recoverable error." + exception);
        valid = false;
    }

    public void warning(SAXParseException exception) {
        System.out.println("Received notification of a warning." + exception);
    }

    public static void main (String argv []) {
        if (argv.length != 1) {
            System.err.println ("Usage: java Validate filename.xml");
            System.exit (1);
        }
        try {
            // get a parser
            XMLReader reader = XMLReaderFactory.createXMLReader(
                "org.apache.xerces.parsers.SAXParser");
            // request validation
            reader.setFeature("http://xml.org/sax/features/validation", true);
            reader.setFeature("http://apache.org/xml/features/validation/schema", true);
            reader.setErrorHandler(new Validate());
            // associate an InputSource object with the file name
            InputSource inputSource = new InputSource(argv[0]);
            // go ahead and parse
            reader.parse(inputSource);
        }
        catch(org.xml.sax.SAXException e) {
            System.out.println("Error in parsing " + e);
            valid = false;
        }
        catch(java.io.IOException e) {
            System.out.println("Error in I/O " + e);
            System.exit(0);
        }
        System.out.println("Valid Document is " + valid);
    }
}

XML Document
<?xml version="1.0" encoding="utf-8"?>
<itemList
    xmlns:xsi='http://www.w3.org/2001/XMLSchema-instance'
    xsi:noNamespaceSchemaLocation="itemList.xsd">
    <item>
        <name>pen</name>
        <quantity>5</quantity>
    </item>
    <item>
        <name>eraser</name>
        <quantity>7</quantity>
    </item>
    <item>
        <name>stapler</name>
        <quantity>2</quantity>
    </item>
</itemList>

XSDL Grammar itemList.xsd
<?xml version="1.0" encoding="utf-8"?>
<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema'>
    <xsd:element name="itemList">
        <xsd:complexType>
            <xsd:sequence>
                <xsd:element ref="item" minOccurs="0" maxOccurs="3"/>
            </xsd:sequence>
        </xsd:complexType>
    </xsd:element>

    <xsd:element name="item">
        <xsd:complexType>
            <xsd:sequence>
<xsd:element ref="name"/> <xsd:element ref="quantity"/> </xsd:sequence> </xsd:complexType> </xsd:element> <xsd:element name="name" type="xsd:string"/> <xsd:element name="quantity" type="xsd:short"/> </xsd:schema> Internet Technologies 83 D:..95-733\examples\XSDL\testing>ant run Buildfile: build.xml run: Running Validate.java on itemList-xsd.xml Valid Document is true Internet Technologies 84 Another Example <?xml version="1.0" encoding="UTF-8"?> <!-- po.xml --> <myns:purchaseOrder orderDate="07.23.2001" xmlns:myns="http://www.cds-r-us.com" xmlns:xsi= "http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation= "http://www.cds-r-us.com po.xsd" > Internet Technologies 85 <myns:recipient country="USA"> <myns:name>Dennis Scannel</myns:name> <myns:street>175 Perry Lea Side Road</myns:street> <myns:city>Waterbury</myns:city> <myns:state>VT</myns:state> <myns:postalCode>05675A</myns:postalCode> </myns:recipient> Note that there is a problem with this document. Internet Technologies 86 <myns:order> <myns:cd artist="Brooks Williams" title="Little Lion" /> <myns:cd artist="David Wilcox" title="What you whispered" /> </myns:order> </myns:purchaseOrder> Internet Technologies 87 XSDL Grammar po.xsd <?xml version="1.0" encoding="utf-8"?> <!-- po.xsd --> <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns="http://www.cds-r-us.com" targetNamespace="http://www.cds-r-us.com" > <xs:element name="purchaseOrder"> <xs:complexType> <xs:sequence> <xs:element ref="recipient" /> <xs:element ref="order" /> </xs:sequence> <xs:attribute name="orderDate" type="xs:string" /> </xs:complexType> Internet Technologies </xs:element> 88 <xs:element name = "recipient"> <xs:complexType> <xs:sequence> <xs:element ref="name" /> <xs:element ref="street" /> <xs:element ref="city" /> <xs:element ref="state" /> <xs:element ref="postalCode" /> </xs:sequence> <xs:attribute name="country" type="xs:string" /> </xs:complexType> </xs:element> Internet Technologies 89 <xs:element name = "name" 
        type="xs:string" />
    <xs:element name = "street" type="xs:string" />
    <xs:element name = "city" type="xs:string" />
    <xs:element name = "state" type="xs:string" />
    <xs:element name = "postalCode" type="xs:short" />
    <xs:element name = "order">
        <xs:complexType>
            <xs:sequence>
                <xs:element ref="cd" maxOccurs="unbounded"/>
            </xs:sequence>
        </xs:complexType>
    </xs:element>
    <xs:element name="cd">
        <xs:complexType>
            <xs:attribute name="artist" type="xs:string" />
            <xs:attribute name="title" type="xs:string" />
        </xs:complexType>
    </xs:element>
</xs:schema>

Running Validate

D:..\examples\XSDL\testing>ant run
Buildfile: build.xml
run:
Running Validate.java on po.xml
Received notification of a recoverable error.org.xml.sax.SAXParseException: cvc-datatype-valid.1.2.1: '05675A' is not a valid 'integer' value.
Received notification of a recoverable error.org.xml.sax.SAXParseException: cvc-type.3.1.3: The value '05675A' of element 'myns:postalCode' is not valid.
Valid Document is false

Fix the error and run again

D:\..\XSDL\testing>ant run
Buildfile: build.xml
run:
Running Validate.java on po.xml
Valid Document is true

Introduce a Namespace Error

<?xml version="1.0" encoding="UTF-8"?>
<!-- po.xml -->
<myns:purchaseOrder orderDate="07.23.2001"
    xmlns:myns="http://www.cds-r-us.edu"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.cds-r-us.com po.xsd" >
    <myns:recipient country="USA">
        <myns:name>Dennis Scannel</myns:name>
        <myns:street> 175 Perry Lea Side Road </myns:street>
        <myns:city>Waterbury</myns:city>
        <myns:state>VT</myns:state>
        <myns:postalCode>05675</myns:postalCode>
    </myns:recipient>
    <myns:order>
        <myns:cd artist="Brooks Williams" title="Little Lion" />
        <myns:cd artist="David Wilcox" title="What you whispered" />
    </myns:order>
</myns:purchaseOrder>

And run validate

run:
Running
Validate.java on po.xml
Received notification of a recoverable error.org.xml.sax.SAXParseException: cvc-elt.1: Cannot find the declaration of element 'myns:purchaseOrder'.
Valid Document is false

Code Generation

• Run JAXB against the .xsd file
• The generated code will present an API allowing us to process that style of document

itemList.xsd again

<?xml version="1.0" encoding="utf-8"?>
<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema'>
    <xsd:element name="itemList">
        <xsd:complexType>
            <xsd:sequence>
                <xsd:element ref="item" minOccurs="0" maxOccurs="3"/>
            </xsd:sequence>
        </xsd:complexType>
    </xsd:element>
    <xsd:element name="item">
        <xsd:complexType>
            <xsd:sequence>
                <xsd:element ref="name"/>
                <xsd:element ref="quantity"/>
            </xsd:sequence>
        </xsd:complexType>
    </xsd:element>
    <xsd:element name="name" type="xsd:string"/>
    <xsd:element name="quantity" type="xsd:short"/>
</xsd:schema>

Run xjc

D:..XSDL\testing>xjc itemList.xsd
D:\McCarthy\www\95-733\examples\XSDL\testing>java -jar D:\jwsdp-1.1\jaxb-1.0\lib\jaxb-xjc.jar itemList.xsd
parsing a schema...
compiling a schema...
generated\impl\ItemImpl.java
generated\impl\ItemListImpl.java
generated\impl\ItemListTypeImpl.java
generated\impl\ItemTypeImpl.java
generated\impl\NameImpl.java
generated\impl\QuantityImpl.java
generated\Item.java
generated\ItemList.java
generated\ItemListType.java
generated\ItemType.java
generated\Name.java
generated\ObjectFactory.java
generated\Quantity.java
generated\bgm.ser
generated\jaxb.properties

Write Java code that uses the new API.

The build script used for these examples

<?xml version="1.0"?>
<project basedir="."
         default="compile">
    <path id="classpath">
        <fileset dir="D:/jwsdp-1.1/saaj-1.1.1/lib" includes="*.jar"/>
        <fileset dir="D:/jwsdp-1.1/jaxb-1.0/lib" includes="*.jar"/>
        <fileset dir="d:/jwsdp-1.1/common/lib" includes="*.jar"/>
        <fileset dir="D:/jwsdp-1.1/jaxm-1.1.1/lib" includes="*.jar"/>
        <fileset dir="D:/jwsdp-1.1/bin" includes="*.jar" />
        <fileset dir="D:/jwsdp-1.1/jaxp-1.2.2/lib" includes="*.jar"/>
        <fileset dir="D:/jwsdp-1.1/jaxp-1.2.2/lib/endorsed" includes="*.jar"/>
        <fileset dir="D:/jwsdp-1.1/jwsdp-shared/lib" includes="*.jar"/>
        <fileset dir="D:/jwsdp-1.1/jaxr-1.0_03/lib" includes="*.jar"/>
        <fileset dir="D:/jwsdp-1.1/jakarta-ant-1.5.1/lib" includes="*.jar"/>
        <fileset dir="D:/j2sdk1.4.1_01/lib" includes="*.jar"/>
        <pathelement location="."/>
    </path>

    <!-- compile Java source files -->
    <target name="compile">
        <!-- compile all of the java sources -->
        <echo message="Compiling the java source files..."/>
        <javac srcdir="." destdir="." debug="on">
            <classpath refid="classpath" />
        </javac>
    </target>

    <target name="run">
        <echo message="Running Validate.java on po.xml"/>
        <java classname="Validate" fork="false">
            <arg value="po.xml"/>
            <classpath refid="classpath" />
        </java>
    </target>
</project>
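
A self-contained check of the itemList grammar

The validation runs earlier in this section depend on Xerces, Ant, and files on disk. As a minimal sketch, the same idea can be tried with only the JDK by using the standard javax.xml.validation API in place of the Xerces feature flags. This is a swapped-in technique, not the lecture's Validate.java. The class name SchemaCheck and the helpers itemList and isValid are hypothetical names introduced for illustration, and the schema text is an inlined copy of itemList.xsd.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

// Hypothetical helper class; not part of the original lecture code.
final class SchemaCheck {

    // Inlined copy of itemList.xsd (at most three item elements allowed).
    static final String ITEM_LIST_XSD =
        "<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema'>"
      + "<xsd:element name='itemList'><xsd:complexType><xsd:sequence>"
      + "<xsd:element ref='item' minOccurs='0' maxOccurs='3'/>"
      + "</xsd:sequence></xsd:complexType></xsd:element>"
      + "<xsd:element name='item'><xsd:complexType><xsd:sequence>"
      + "<xsd:element ref='name'/><xsd:element ref='quantity'/>"
      + "</xsd:sequence></xsd:complexType></xsd:element>"
      + "<xsd:element name='name' type='xsd:string'/>"
      + "<xsd:element name='quantity' type='xsd:short'/>"
      + "</xsd:schema>";

    // Build an itemList document containing the requested number of items.
    static String itemList(int count) {
        StringBuilder xml = new StringBuilder("<itemList>");
        for (int i = 1; i <= count; i++) {
            xml.append("<item><name>item").append(i)
               .append("</name><quantity>").append(i)
               .append("</quantity></item>");
        }
        return xml.append("</itemList>").toString();
    }

    // Return true if the document validates against the schema.
    // With no ErrorHandler set, the Validator throws SAXException on the
    // first validation error; a malformed schema also ends up here.
    static boolean isValid(String xsd, String xml) {
        try {
            SchemaFactory factory =
                SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            Schema schema =
                factory.newSchema(new StreamSource(new StringReader(xsd)));
            Validator validator = schema.newValidator();
            validator.validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("3 items: " + isValid(ITEM_LIST_XSD, itemList(3)));
        System.out.println("4 items: " + isValid(ITEM_LIST_XSD, itemList(4)));
    }
}
```

Under these assumptions, the three-item document should be accepted and the four-item document rejected, because the grammar sets maxOccurs="3" on item. Replacing a quantity with non-numeric text should also be rejected, since quantity is declared as xsd:short.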