University of Dublin Trinity College Fundamentals of XML Owen.Conlan@scss.tcd.ie What is Markup • <Lecture> • Sequence of characters within a text or word processing file to define – Print properties – Display properties – Document's logical structure • Markup indicators are often called "tags" – Examples • RTF • EDIFACT • XML \ {} + </> ' <> : " Mark Up: RTF \li0\ri0\sb240\sa60\keepn\widctlpar\aspalpha\aspnum\faauto\outlinelevel2\a djustright\rin0\lin0\itap0 \b\f1\fs26\lang2057\langfe1033\cgrid\langnp2057\langfenp1033 {\lang6153\langfe1033\langnp6153 Entity Relationship Diagram \par }\pard\plain \s1\ql \li0\ri0\sb240\sa60\keepn\widctlpar\aspalpha\aspnum\faauto\outlinelevel0\a djustright\rin0\lin0\itap0 \cbpat17 \b\f1\fs24\lang2057\langfe1033\kerning32\cgrid\langnp2057\langfenp1033 {\lang6153\langfe1033\langnp6153 Entity Type \par }\pard\plain \ql \li0\ri0\widctlpar\aspalpha\aspnum\faauto\adjustright\rin0\lin0\itap0 \fs24\lang2057\langfe1033\cgrid\langnp2057\langfenp1033 {\b\fs20\ul\lang6153\langfe1033\langnp6153 Def.:}{\b\fs20\lang6153\langfe1033\langnp6153 }{ \fs20\lang6153\langfe1033\langnp6153 An object or concept that is identified by the enterprise as having an independent existence. \par }\pard\plain \s1\ql \li0\ri0\sb240\sa60\keepn\widctlpar\aspalpha\aspnum\faauto\outlinelevel0\a djustright\rin0\lin0\itap0 \cbpat17 \b\f1\fs24\lang2057\langfe1033\kerning32\cgrid\langnp2057\langfenp1033 {\lang6153\langfe1033\langnp6153 Entity \par }\pard\plain \ql Mark Up: EDIFACT '''ED2'''OPENET:1111111:OVT':003705655815:OVT'ABC1234567'0'TYP:ORDERS'N RQ:1''' UNA:+.? 'UNB+UNOC:2+003705655815:30+1111111:30+980729:2233+4++ORDERS911+++KKK KATE+1'UNH+ 1+ORDERS:001:911:UN:FI0030'BGM+640+1234567'DTM+4:19981201:102'DTM+2:199 90101:102'DTM+2:9901:616'RFF+BC:123'RFF+VN:123456'NAD+BY+003705655815:1 00' NAD+SE+11111111::92'NAD+PL+53432::92++KAUPPA:KAUPUNKI+KATU 9+KAUPUNKI++00007'NAD+CN+-::ZZ++TERMINAALI+OVI 42+TOINEN KAUPUNKI++00069'UNS+D'LIN+1++23442423234 :EN'PIA+5+3244:MF'PIA+5+2341234324:ZBU'PIA+5+234243:ZCG'IMD+F+8+::91:KUKKAPUR KKI:SAVI'QTY+21:8:KPL'FTX+AAA+++T.HARMAA:V[RI'FTX+AAA+++10:KOKO'PRI+NTP :7.23:+ RP:7.32:PE'TAX+7+VAT+++:::22.00'LIN+2++543434554345:EN'PIA+5+535:MF'PIA +5+45: PCE‘UNT+38+2'UNZ+2+4' '''EOF'''9' Mark Up: XML <fragment> <section> <title>Introduction</title> <para>Since the emergence of <acronym refid="xml">XML</acronym> in early 1998 and it's subsequent adoption across diverse application domains, one of the key benefits it enabled was the separation of content and presentation <bibref refloc="Bos97"/>. <acronym refid="xml">XML</acronym> borrowed this model (along with other important concepts) from the <acronym.grp><acronym refid="sgml">SGML</acronym><expansion id="sgml">Standard Generalised Markup Language</expansion></acronym.grp>. An <acronym refid="sgml">SGML</acronym> document consists of logically structured content and uses a separate file (style sheet) to specify how the content should be formatted for [...] <figure id="img1"> <title>ePublishing Components</title> <graphic href="02-04-03-fig01.jpg" width="321" height="214"/> </figure> </section> </fragment> What is SGML? • Standard Generalised MarkUp Language • ISO standard since 1986 • Meta-language for defining document mark-up vocabularies • Uses logical mark-up (structure, content) instead physical (how document looks on printed page) • Platform-, system-, vendorand version-independent documents • Very powerful, but contains a number of complex features <!DOCTYPE anthology [ <!ELEMENT anthology - - (poem+) > <!ELEMENT poem - - (title?, stanza+)> <!ELEMENT title - O (#PCDATA) > <!ELEMENT stanza - O (line+) > <!ELEMENT line O O (#PCDATA) > ]]> <anthology> <poem> <title> The SICK ROSE </title> <stanza> <line>O Rose thou art sick.</line> <line>The invisible worm,</line> [...] </stanza> <stanza> <line>Has found out thy bed</line> <line>Of crimson joy:</line> [...] </stanza> </poem> </anthology> What is HTML? • HTML, the de facto standard for publishing Web content, is an SGML vocabulary • Supporting full SGML on the Web was too difficult so HTML made some simplifications – – – – not extensible limited structure not content oriented cannot be validated • HTML is a simple language to understand and use • Most of the content available on the Web has been created with HTML <html> <head> <title>The SICK ROSE</title> </head> <body> <h1>The SICK ROSE</h1> <p> O Rose thou art sick.<br /> The invisible worm,<br /> [...] </p> <p> Has found out thy bed<br /> Of crimson joy:<br /> [...] </p> </body> </html> What is XML? • eXtensible Markup Language • XML is a simplified subset of SGML • Can also be used to define document markup vocabularies (e.g. XHTML) – These can have a strictly defined structure (DTD) • Retains the powerful features of SGML (extensibility, structure, validation) • Ignores the complex features of SGML and is therefore easier to use and implement • XML documents look similar to HTML documents • Separates structure and presentation (like SGML) Design of XML • The design goals for XML as set out in the 1.0 specification are as follows: 1.XML shall be straightforwardly usable over the Internet. 2.XML shall support a wide variety of applications. 3.XML shall be compatible with SGML. 4.It shall be easy to write programs which process XML documents. 5.The number of optional features in XML is to be kept to the absolute minimum, ideally zero. Design of XML 6. XML documents should be human-legible and reasonably clear. 7. The XML design should be prepared quickly. 8. The design of XML shall be formal and concise. 9. XML documents shall be easy to create. 10. Terseness in XML markup is of minimal importance. XML Example <?xml version='1.0' encoding='ISO-8859-1' standalone='yes' ?> <doc type="book" isbn="1-56592-796-9" xml:lang="en"> <title>A Guide to XML</title> <author>Norman Walsh</author> <chapter> <title>What Do XML Documents Look Like?</title> <paragraph>If you are [...]</paragraph> <ol> <item> <paragraph>The document begins [...]</paragraph> </item> <item> <paragraph>Empty elements have [...]</paragraph> <paragraph>In a very [...]</paragraph> </item> </ol> <section>[...]</section> [...] </chapter> <chapter>[...]</chapter> </doc> Meta Language vs. Vocabulary Meta Languages SGML XML XSL SMIL XHTML SVG HL7 v3 CEN TC251 ASTM 31.25 SynExML Electronic Patient Record Vocabularies XTM Presentation Vocabularies Vocabularies HTML Why is the emergence of XML an important development? • XML is a tool for defining languages – XML languages are easy to read – XML is self describing • Parse tree embedded in document • Grammar for language referenced via DTD/Schema • XML languages are easy for computers to process, exchange and display – XML tools are ubiquitous, free and conform to established standards – Natural affinity with Object serialization – Data source neutral XML technologies Presentation Linking CSS, Cascading Style Sheets XSL, Extensible Stylesheet Language XPath, XQuery XLink, XBase XPointer Semantics Topics Maps, Ontology Web Language RDF, Resource Description Framework Structure XML Schema, RelaxNG, RDF Schema, Document Type Definition (DTD) Syntax XML Namespaces XML 1.0 XML 1.0 • The XML 1.0 specification describes the syntax for XML documents (elements and attributes) and DTDs • An XML document is a hierarchical data structure using self-definable tags – e.g. <doc><author>[..]</author></doc> • There are many other technologies related to XML • XML is A simple common layer for tree structures in a character stream. Design goals of XML 1.0 specification 1. XML shall be straightforwardly usable over the Internet. 2. XML shall support a wide variety of applications. 3. XML shall be compatible with SGML. 4. It shall be easy to write programs which process XML documents. 5. The number of optional features in XML is to be kept to the absolute minimum, ideally zero. 6. XML documents should be human-legible and reasonably clear. 7. The XML design should be prepared quickly. 8. The design of XML shall be formal and concise. 9. XML documents shall be easy to create. 10.Terseness in XML markup is of minimal importance. Physical Parts of XML documents Physical parts of XML documents • • • • • • • • XML Declaration Elements Attributes Document Type Declaration Entities Processing Instructions Comments Character Data Sections • XML Namespaces XML Declaration • Placed at the start of an XML document • Informs XML software of – the version of XML the document conforms to – the character encoding scheme used in the document – whether or not a set of external declarations affect the interpretation of this document <?xml version="1.0" ?> <?xml version="1.0“ encoding="UTF-8" ?> <?xml version="1.0“ encoding="UTF-8" standalone="yes" ?> Elements • Define logical structure and sections of XML documents • Four different content types: – – – – Data content Element content Mixed content Empty. • Each element must be completely enclosed by another element, except for the root • Note – Any XML name must start with a letter, underscore but after that can include also digits, fullstops, hyphens. Don’t start with colon due to namespaces Don’t include spaces <?xml version="1.0" ?> <doc> <title>Java Gently</title> <author>Judy Bishop</author> <publisher name=‘HH’ /> <chapter> <thetext> this is <bold> bold </bold> text </thetext> <paragraph/> </chapter> </doc> Attributes • Provides additional information about an element • Attributes are contained within the start-tag • Consists of a name and associated value separated by an equals sign • The attribute value must always be enclosed by quotes • The order of attributes is insignificant <?xml version="1.0" ?> <doc type="book" isbn="0-201-71050-1"> <title>Java Gently</title> <author>Judy Bishop</author> <chapter> <paragraph type="abstract"> In this book ... </paragraph> </chapter> </doc> ELEMENT vs. ATTRIBUTE • Lexically little difference, • application specific, • no hard/fast rules available. ELEMENT ATTRIBUTE • Constituent data, • Used for content, • White space can be ignored or preserved • Nesting allowed (child elements), • Convenient for large values, or binary entities. • Inherent data, • Used for meta-data, • No further nesting possible (atomic data), • Default values, • Minimal datatypes, Entities • Storage units for repeated text – Defined in a DTD • Character entities are used to insert characters that cannot be typed directly • XML contains a number of 'built-in' entities – – – – – &quot; &apos; &lt; &gt; &amp; <math> 5 &lt; 6 and 6 &gt; 5 </math> <copyright> &copyright-notice; </copyright> <bullet> XML contains a number of &apos;built-in&apos; entities <list> <item>&amp;quot;</item> <item>&amp;apos;</item> <item>&amp;lt;</item> <item>&amp;gt;</item> <item>&amp;amp;</item> </list> </bullet> Character Data Sections • Data which is to be parsed is called PCDATA • An XML parser will not treat the contents of a CDATA section as markup – Used to simplify mark-up by escaping a selection of text • Entity references are not resolved • Useful for including source code in XML <![CDATA[ You don't need to escape special characters in CDATA sections, such as <, >, &, , ' and ". ]]> <![CDATA[<<< STOP now >>>]]> <![CDATA[<?xml version='1.0'?> <person> <name>Mike</name> <age>24</age> </person>]]> Processing Instructions • Pass additional information to application (e.g. parser) • Application-specific instructions • Consists of a PI Target and PI Value • Processed by applications that recognise the PI Target <?xml-stylesheet type='text/ css' href='style.css'?> <?xml-stylesheet type='text/ xsl' href='style.xsl'?> <?myapp filename='test.txt'?> Comments • Used to comment XML documents • Not considered to be part of an XML document • An XML parser is not required to pass comments to higherlevel applications <!–- one-line comment --> <!-This is a multi-line comment --> Well formed XML • • XML Declaration required At least one element – • Empty elements are written in one of two ways: – – • • • • • Exactly one root element Closing tag (e.g. "<br></br>") Special start tag (e.g. "<br />") For non-empty elements, closing tags are required Start tag must match closing tag (name & case) Correct nesting of elements Attribute values must always be quoted Attribute minimisation not allowed Document Type Declaration • Internal/embedded DTD <?xml version='1.0' standalone='yes'> <!DOCTYPE person [ <!ELEMENT person (name, adult, nationality)> … ]> • External DTD <?xml version='1.0'> <!DOCTYPE person SYSTEM 'person.dtd'> What are XML Namespaces? • W3C recommendation (January 1999) • Each XML vocabulary is considered to own a namespace in which all elements (and attributes) are unique • A single document can use elements and attributes from multiple namespaces – A prefix is declared for each namespace used within a document. – The namespace is identified using a URI (Uniform Resource Identifier) • An element or attribute can be associated with a namespace by placing the namespace prefix before its name (i.e. 'prefix:name') – Elements (and attributes) belonging to the default namespace do not require a prefix Example: XML Namespaces <?xml version='1.0'?> St. James’s Hospital <!ELEMENT Patient (Name, DOB)> <!ELEMENT <!ELEMENT <!ELEMENT <!ELEMENT Name First Last DOB (First, Last)> (#PCDATA)> (#PCDATA)> (#PCDATA)> Airport Pharmacy <!ELEMENT Drug ((Name|Substance), Code)> <!ELEMENT Name (#PCDATA)> <!ELEMENT Substance (#PCDATA)> <!ELEMENT Code (#PCDATA)> © 2003 B. Jung <Accident Report xmlns:sjh="http://hospital/sjh" xmlns:dub=http://airport/dub > <sjh:Patient> <sjh:Name> <sjh:First>Mike</sjh:First> <sjh:Last>Murphy</sjh:Last> </sjh:Name> <sjh:DOB>12/12/1950</sjh:DOB> </sjh:Patient> <dub:Drug> <dub:Name>Nurofen</dub:Name> <dub:Code>IE-975-2</dub:Code> </dub:Drug> [...] </Accident Report> Why Namespaces? • Important for creating XML documents containing different types of data • An XML document can be assembled using elements (and attributes) from different XML vocabularies • Must be able to – avoid conflicts between names – identify the vocabulary an element belongs to XML Processing: DOM Processing XML Doc Character Stream Process into Tree Navigation API DOM Application • It views an XML tree as a data structure • It is quite large and complex... – Level 1 Core: W3C Recommendation, October 1998 • primitive navigation and manipulation of XML trees • other Level 1 parts: HTML – Level 2 Core: W3C Recommendation, November 2000 • adds Namespace support and minor new features • other Level 2 parts: Events, Views, Style, Traversal and Range – Level 3 Core: W3C Working Draft, April 2002 • adds minor new features • other Level 3 parts: Schemas, XPath, Load/Save Example: A Recipe <recipe> <title>Zuppa Inglese</title> <ingredient name="egg yolks" amount="4" /> <ingredient name="milk" amount="2.5" unit="cup" /> <ingredient name="Savoiardi biscuits" amount="21" /> <ingredient name="sugar" amount="0.75" unit="cup" /> <ingredient name="Alchermes liquor" amount="1" unit="cup" /> <ingredient name="lemon zest" amount="*" /> <ingredient name="flour" amount="0.5" unit="cup" /> <ingredient name="fresh whipping cream" amount="*" /> <preparation> <step>Warm up the milk in a nonstick sauce pan</step> <step>In a large bowl beat the egg yolks with the sugar, add the flour and combine the ingredients until well mixed.</step> <step>Add the milk, a little bit at the time to the egg mixture, mixing well.</step> <step>Put the mixture into the sauce pan and cook it on the stove at a medium low heat. Mix the cream continuously with a wooden spoon. When it starts to thicken remove it from the heat and pour it on a large plate to cool off.</step> <step>Stir the cream now and then so that the top doesn't harden.</step> <step>Dip quickly both sides of the lady fingers in the liquor. Layer them one at the time in a glass bowl large enough to contain 7 biscuits.</step> <step>Spread 1/3 of the cream and repeat the layer with lady fingers. Finish with the cream.</step> </preparation> <comment>Refrigerate for at least 4 hours better yet overnight. Before serving decorate the zuppa inglese with whipped cream.</comment> <nutrition calories="612" fat="49" carbohydrates="45" protein="4" alcohol="2" /> </recipe> Example: Getting a Recipe import java.io.*; import org.apache.xerces.parsers.DOMParser; import org.w3c.dom.*; public class FirstRecipeDOM { public static void main(String[] args) { try { DOMParser p = new DOMParser(); p.parse(args[0]); Document doc = p.getDocument(); Node n = doc.getDocumentElement().getFirstChild(); while (n!=null && !n.getNodeName().equals("recipe")) n = n.getNextSibling(); PrintStream out = System.out; out.println("<?xml version=\"1.0\"?>"); out.println("<collection>"); if (n!=null) print(n, out); out.println("</collection>"); } catch (Exception e) {e.printStackTrace();}} © 2003 B. Jung COPYRIGHT © 2000-2003 ANDERS MØLLER & MICHAEL I. SCHWARTZBACH XML Processing: SAX Processing XML Doc Character Stream Events API SAX Application • An XML tree is not viewed as a data structure, but as a stream of events generated by the parser. • The kinds of events are: – – – – – – the start of the document is encountered the end of the document is encountered the start tag of an element is encountered the end tag of an element is encountered character data is encountered a processing instruction is encountered • Scanning the XML file from start to end, each event invokes a corresponding callback method that the programmer writes Example: Getting total amount of Flour import import import import java.io.*; org.xml.sax.*; org.xml.sax.helpers.*; org.apache.xerces.parsers.SAXParser; public static void main(String[] args) { Flour f = new Flour(); SAXParser p = new SAXParser(); p.setContentHandler(f); try { p.parse(args[0]); } catch (Exception e) {e.printStackTrace();} System.out.println(f.amount); } public class Flour extends DefaultHandler { float amount = 0; public void startElement(String namespaceURI, String localName, String qName, Attributes atts) { if (namespaceURI.equals("http://recipes.org") && localName.equals("ingredient")) { String n = atts.getValue("","name"); if (n.equals("flour")) { String a = atts.getValue("","amount"); // assume 'amount' exists amount = amount + Float.valueOf(a).floatValue(); } } } } COPYRIGHT © 2000-2003 ANDERS MØLLER & MICHAEL I. SCHWARTZBACH Summary • XML = eXtensible Markup Language • An XML document is a hierarchical data structure using self-definable tags • Physical parts of XML document – – – – – – – – – XML Declaration Elements Attributes Document Type Declaration Entities Processing Instructions Comments Character Data Sections XML Namespaces • Two types of APIs popular for XML Processing: DOM & SAX • </Lecture> University of Dublin Trinity College Defining XML Vocabularies DTDs and XML Schemas Owen.Conlan@scss.tcd.ie What is an XML vocabulary? • Synonyms – ‘Application of XML’ – XML Language • Set of elements and attributes for representing domain-specific information • “Instance” of a Mark Up Language • Defined by DTD or XML Schema • Some are approved by standard organisations – E.g. ebXML, MathML, XSL etc. Remember: XML is syntax! What is a DTD? • Document Type Definition, • Defines structure/model of XML documents – Elements and Cardinality – Attributes – Aggregation • Defines default ATTRIBUTE values • Defines ENTITIES • Stored in a plain text file and referenced by an XML document (external) • Alternatively a DTD can be placed in the XML document itself (internal) • Used to validate an XML document • “Is there a need for a DTD”? Why use a DTD? • Applications may require all documents to be consistent instances of a particular vocabulary • Indicates what structures and names can be used in a document • Documents are constructed and named in a conformant manner – Ease constructing (provide structure) – Ease parsing • Validate documents in order to find inconsistencies Valid XML • Well-formed plus conforms to DTD • All elements and attributes are declared within a DTD (internal or external) • Elements and attributes match the declarations in the DTD Element Type Declaration • Define grouping of elements – "(", “)" <!ELEMENT doc (title, author, editor, chapter, appendix)> <!ELEMENT title (#PCDATA)> • Define sequence of elements – ",": followed-by (Sequence) – "|": logical or (Choice) <!ELEMENT author (name | synonym)> <!ELEMENT image EMPTY> <!ELEMENT paragraph (#PCDATA | bold | italic)*> Element Type Declaration • Define occurrences of elements – ?: zero-or-one – +: one-or-more – *: zero-or-more <!ELEMENT doc (title, author+, editor?, chapter+, appendix*)> <!ELEMENT chapter (title, (section+ | paragraph+))> <!ELEMENT list (item?, item?, item)> <!ENTITY % list "ordered | unordered | definition"> <!ELEMENT paragraph (#PCDATA | %list;)*> Attribute List Declaration • Define type of attribute – – – – – ID IDREF ENTITY NMTOKEN NOTATION • Define default values of attributes – – – – #REQUIRED #IMPLIED #FIXED A list of values with default selection <!ATTLIST person ssn ID #IMPLIED> <!ATTLIST adult age CDATA #REQUIRED> <!ATTLIST mml version ‘1.0’ #FIXED> <!ATTLIST person sex (m | f) #REQUIRED> <!ATTLIST day temperature (l | m | h) "l"> Entity Declaration • Internal entities – Built-in • External entities – References to a file (text, images etc.) • Parameter entities – Used inside DTDs <!ENTITY author "Norman Walsh, Sun Corp."> <!ENTITY copyright SYSTEM "copyright.xml"> <!ENTITY % part "(title?, (paragraph | section)*)"> Simple DTD Example <!ENTITY % part "(title?, (paragraph | section)*)"> <!ELEMENT doc (title, author+, chapter+, appendix*)> <!ATTLIST doc type (book | article) "book“ isbn CDATA #REQUIRED> <!ELEMENT <!ELEMENT <!ELEMENT <!ELEMENT <!ELEMENT <!ELEMENT <!ATTLIST <!ELEMENT <!ELEMENT <!ELEMENT title (#PCDATA)> author (#PCDATA)> chapter %part;> appendix %part;> section %part;> paragraph (#PCDATA | url | ol)*> paragraph type CDATA #IMPLIED> ol (item+)> item (paragraph+)> url (#PCDATA)> Example XML and related DTD <database> <person age='34'> <name> <title> Mr </title> <firstname> John </firstname> <firstname> Paul </firstname> <surname> Murphy </surname> </name> <hobby> Football </hobby> <hobby> Racing </hobby> </person> <!DOCTYPE database [ <person > <name> <firstname> Mary </firstname> <surname> Donnelly </surname> </name> </person> </database> <!ELEMENT <!ELEMENT <!ELEMENT <!ELEMENT <!ELEMENT database (person*)> <!ELEMENT person (name,hobby*)> <!ATTLIST person age CDATA #IMPLIED> <!ELEMENT name (title?, firstname +, surname)> ]> hobby (#PCDATA)> title (#PCDATA)> firstname (#PCDATA)> surname (#PCDATA)> What are XML Schemas? • W3C Recommendation, 2 May 2001 – Part 0: Primer – Part 1: Structures – Part 2: Datatypes • DTDs use a non-XML syntax and have a number of limitations – no namespace support – lack of data-types • XML Schemas are an alternative to DTDs • Used to formally specify a "class" of XML documents ( n "instance document") • Supports simple/complex data-types Why use XML Schemas? • Uses an XML syntax • Supports simple and complex data-types such as user-defined types • An XML document and its contents can be validated against a Schema • Can validate documents containing multiple namespaces • Schemas are more powerful than DTDs and will eventually replace DTDs DTD Named Types – simple <!ELEMENT firstname (#PCDATA)> XML doc. Instance XML Schema <xsd:element name="firstname" type="xsd:string"/> <firstname>Michael</firstname> XML doc. Instance XML Schema DTD Named Types – complex <!ELEMENT name (firstname, lastname)> <xsd:complexType name="namePerson"> <xsd:sequence> <xsd:element name="firstname" type="xsd:string"/> <xsd:element name="lastname" type="xsd:string/> </xsd:sequence> </xsd:complexType> <xsd:element name="name" type="namePerson"/> <name> <firstname>Michael</firstname> <lastname>Porter</lastname> </name> Primitive Datatypes • • • • • • • • • string boolean decimal float double duration dateTime time date http://www.w3.org/TR/xmlschema-2/ • • • • • • • • • • gYearMonth gYear gMonthDay gDay gMonth hexBinary base64Binary anyURI QName NOTATION XML doc. Instance XML Schema Simple Type - Restriction <simpleType name='celsiusBodyTemp'> <restriction base='decimal'> <totalDigits value='4'/> <fractionDigits value='1'/> <minInclusive value='36.4'/> <maxInclusive value='40.5'/> </restriction> </simpleType> <xsd:element name="temp" type="celsiusBodyTemp"/> <temp>37.2</temp> XML doc. Instance XML Schema Simple Type - Enumeration <xsd:simpleType name="weekday"> <xsd:restriction base="xsd:string"> <xsd:enumeration value="Sunday"/> <xsd:enumeration value="Monday"/> <xsd:enumeration value="Tuesday"/> [...] </xsd:restriction> </xsd:simpleType> <xsd:element name="delivery" type="weekday"/> <delivery>Tuesday</delivery> DTD <!ENTITY % fullname "title?, firstname*, lastname"> <!ELEMENT name (%fullname;)> XML Schema Cardinalities <xsd:complexType name="fullname"> <xsd:sequence> <xsd:element name="title" minOccurs="0"/> <xsd:element name="firstname" minOccurs="0" maxOccurs="unbounded"/> <xsd:element name="lastname"/> </xsd:sequence> </xsd:complexType> <xsd:element name="name" type="fullname"/> XML doc. Instance Complex Type - <name> <firstname>Michael</firstname> <firstname>Jason</firstname> <lastname>Porter</lastname> </name> DTD <!ENTITY % name "title?, firstname*, lastname"> <!ELEMENT name (%name;, maidenname?)> XML Schema Derived Type by extension <xsd:complexType name="fullnameExt"> <xsd:complexContent> <xsd:extension base="fullname"> <xsd:sequence> <xsd:element name="maidenname" minOccurs="0"/> </xsd:sequence> </xsd:extension> </xsd:complexContent> </xsd:complexType> <xsd:element name="name" type="fullnameExt"/> XML doc. Instance Complex Type – <name> <firstname>Jane</firstname> <lastname>Porter</lastname> <maidenname>Hughes</maidenname> </name> XML doc. Instance XML Schema Complex Type – Derived Type by Restriction <xsd:complexType name="simpleName"> <xsd:complexContent> <xsd:restriction base="fullname"> <xsd:sequence> <xsd:element name="title" maxOccurs="0"/> <xsd:element name="firstname" minOccurs="1"/> <xsd:element name="lastname"/> </xsd:sequence> </xsd:restriction> </xsd:complexContent> </xsd:complexType> <xsd:element name="name" type="simpleName"/> <name> <firstname>Jane</firstname> <lastname>Porter</lastname> </name> Sequence XML Schema <!ELEMENT name (title?, firstname*, lastname)> <xsd:complexType name="fullname"> <xsd:sequence> <xsd:element name="title" minOccurs="0"/> <xsd:element name="firstname" minOccurs="0" maxOccurs="unbounded"/> <xsd:element name="lastname"/> </xsd:sequence> </xsd:complexType> <xsd:element name="name" type="fullname"/> XML doc. Instance DTD Structure - <name> <firstname>Michael</firstname> <firstname>Jason</firstname> <lastname>Porter</lastname> </name> Choice XML Schema <!ELEMENT pay (product, number, (cash | cheque))> <xsd:complexType name="payment"> <xsd:sequence> <xsd:element ref="product"/> <xsd:element ref="number"/> <xsd:choice> <xsd:element ref="cash"/> <xsd:element ref="cheque"/> </xsd:choice> </xsd:sequence> </xsd:complexType> <xsd:element name="pay" type="payment"/> XML doc. Inst. DTD Structure - <pay> <product>Ericsson Telefon MD110</product> <number>1544-198-J</number> <cash>IR£150</cash> </pay> DTD <xsd:element name="greeting"> <xsd:complexType> <xsd:simpleContent> <xsd:extension base="xsd:string"> <xsd:attribute name="language" type="xsd:string"/> </xsd:extension> </xsd:simpleContent> </xsd:complexType> </xsd:element> XML doc. Instance <!ELEMENT greeting (#PCDATA)> <!ATTLIST greeting language CDATA "English"> XML Schema Attributes <greeting language="German">Hello!</greeting> XML Inst. XML Schema DTD Attribute Groups <!ELEMENT img EMPTY> <!ATTLIST img src CDATA #REQUIRED width CDATA #IMPLIED height CDATA #IMPLIED> <xsd:attributeGroup name="imgAttributes"> <xsd:attribute name="src" type="xsd:string" use="required"/> <xsd:attribute name="width" type="xsd:integer"/> <xsd:attribute name="height" type="xsd:integer"/> </xsd:attributeGroup> <xsd:element name="img"> <xsd:complexType> <xsd:attributeGroup ref="imgAttributes"/> <xsd:complexType> </xsd:element> <img src="XMLmanager.gif" width="60"/> XML Schema DTD Mixed Content <!ELEMENT p (#PCDATA | b | i)*> <!ELEMENT b (#PCDATA)> <xsd:complexType name="bolditalicText" mixed="true"> <xsd:choice minOccurs="0" maxOccurs="unbounded"/> <xsd:element ref="b" /> <xsd:element ref="i" /> </xsd:choice> </xsd:complexType> XML doc. Instance <xsd:element name="p" type="bolditalicText"/> <p>This is <b>bold</b> and <i>italic</i> text</p> XML doc. Instance XML Schema DTD Empty Element <!ELEMENT img EMPTY> <!ATTLIST src CDATA #REQUIRED> <xsd:element name="img"> <xsd:complexType> <xsd:attribute name="src" type="xsd:string"/> </xsd:complexType> </xsd:element> <img src="XMLmanager.gif"/> XML Schema Example <?xml version="1.0" encoding="utf-8"?> <xsd:schema xmlns:xsd="http://www.w3.org/2000/10/XMLSchema"> <xsd:element name="book"> <xsd:complexType> <xsd:sequence> <xsd:element name="title" type="xsd:string"/> <xsd:element name="author" type="xsd:string"/> <xsd:element name="character” type="xsd:string" minOccurs="0" maxOccurs="unbounded"> </xsd:element> </xsd:sequence> <xsd:attribute name="isbn" type="xsd:string"/> </xsd:complexType> </xsd:element> </xsd:schema> Summary • XML Vocabularies are defined using – DTD – XSD • DTDs/XSDs used to validate XML documents • XSD – more powerful than DTDs – Supports simple and complex data-types such as user-defined types – Can validate documents containing multiple namespaces