Chapter 15: XML
TP2543 Web Programming
Mohammad Faidzul Nasrudin

15.1 Introduction
• The Extensible Markup Language (XML) was developed in 1996 by the World Wide Web Consortium’s (W3C’s) XML Working Group
• XML is a portable, widely supported, open (i.e., nonproprietary) technology for data storage and exchange

What is the difference between XML and HTML?

An HTML Example
<h2>Nonmonotonic Reasoning: Context-Dependent Reasoning</h2>
<i>by <b>V. Marek</b> and <b>M. Truszczynski</b></i><br>
Springer 1993<br>
ISBN 0387976892

The Same Example in XML
<book>
  <title>Nonmonotonic Reasoning: Context-Dependent Reasoning</title>
  <author>V. Marek</author>
  <author>M. Truszczynski</author>
  <publisher>Springer</publisher>
  <year>1993</year>
  <ISBN>0387976892</ISBN>
</book>

HTML versus XML: Similarities
• Both use tags (e.g. <h2> and <year>)
• Tags may be nested (tags within tags)
• Human users can read and interpret both HTML and XML representations quite easily
• But how about machines?

Problems with Automated Interpretation of HTML Documents
• Consider an intelligent agent trying to retrieve the names of the authors of the book
• Authors’ names could appear immediately after the title or immediately after the word “by”
• Are there two authors?
• Or just one, called “V. Marek and M. Truszczynski”?

HTML vs XML: Structural Information
• HTML documents do not contain structural information, i.e., the pieces of the document and their relationships.
• XML is more easily accessible to machines because
– Every piece of information is described.
– Relations are also defined through the nesting structure.
– E.g., the <author> tags appear within the <book> tags, so they describe properties of the particular book.
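To make the point about machine accessibility concrete, here is a minimal sketch in Python using the standard library's xml.etree.ElementTree module. It reads the <book> record from the XML example and pulls out the authors by tag name, with no guessing about where the names sit in the text. The function name book_authors and the BOOK_XML constant are illustrative choices, not part of the original slides.

```python
import xml.etree.ElementTree as ET

# The <book> record from the XML example, held in a string for illustration.
BOOK_XML = """
<book>
  <title>Nonmonotonic Reasoning: Context-Dependent Reasoning</title>
  <author>V. Marek</author>
  <author>M. Truszczynski</author>
  <publisher>Springer</publisher>
  <year>1993</year>
  <ISBN>0387976892</ISBN>
</book>
"""


def book_authors(xml_text):
    """Return the text of every <author> child of the root <book> element."""
    book = ET.fromstring(xml_text.strip())
    return [author.text for author in book.findall("author")]


if __name__ == "__main__":
    # Each author is a separate element, so the "one author or two?"
    # ambiguity of the HTML version does not arise.
    print(book_authors(BOOK_XML))
```

Because each author is its own element nested inside <book>, the program gets a list with two entries rather than one string it would have to split on the word "and".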
HTML vs XML: Formatting
• The HTML representation provides more than the XML representation:
– The formatting of the document is also described
• The main use of an HTML document is to display information: it must define formatting
• XML: separation of content from display
– same information can be displayed in different ways

15.2 XML Basics
• XML permits document authors to create markup for virtually any type of information
– Can create entirely new markup languages that describe specific types of data, including mathematical formulas, chemical molecular structures, music and recipes
• XML describes data in a way that human beings can understand and computers can process

15.2 XML Basics (2)
• An XML parser is responsible for identifying components of XML documents (typically files with the .xml extension) and then storing those components in a data structure for manipulation
• An XML document can reference a Document Type Definition (DTD) or schema that defines the document’s proper structure

15.2 XML Basics (3)
• An XML document that conforms to a DTD/schema (i.e., has the appropriate structure) is valid
• If an XML parser (validating or non-validating) can process an XML document successfully, that XML document is well-formed

player.xml: XML that describes a baseball player’s information

15.2 XML Basics (4)
• DTDs and schemas are essential for business-to-business (B2B) transactions and mission-critical systems
• Validating XML documents ensures that disparate systems can manipulate data structured in standardized ways and prevents errors caused by missing or malformed data.
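The difference between well-formed and valid can be tried out with a small sketch. Python's standard xml.etree.ElementTree parser is non-validating: it reports whether a document is well-formed but never checks it against a DTD or schema. The helper name is_well_formed and the <player> element names below are assumptions made for illustration, since the contents of player.xml are not reproduced here.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if a non-validating parser can process xml_text."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        # The parser could not process the document, so it is not well-formed.
        return False


if __name__ == "__main__":
    # Hypothetical player record: properly nested, one root element.
    good = "<player><firstName>John</firstName><lastName>Doe</lastName></player>"
    # Start and end tags overlap instead of nesting.
    bad = "<x><y>hello</x></y>"

    print(is_well_formed(good))  # True
    print(is_well_formed(bad))   # False
```

A True result here says nothing about validity. Checking that a document has the structure a DTD or schema demands requires a validating parser, which the Python standard library does not provide.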
15.3 Structuring Data
XML Prolog

15.3 Structuring Data (2)
• XML element names can be of any length and can contain letters, digits, underscores, hyphens and periods
– They must begin with either a letter or an underscore, and they should not begin with “xml” in any combination of uppercase and lowercase letters, as this is reserved for use in the XML standards

15.3 Structuring Data (3)
• When a user loads an XML document in a browser, the browser uses a style sheet to format the data for display
• Google Chrome places a down arrow or right arrow next to every container element; the arrows are not part of the XML document.
– A down arrow indicates that the browser is displaying the container element’s child elements
– Clicking the right arrow next to an element expands that element

article.xml in web browser: XML used to mark up an article

15.3 Structuring Data (4)
• An error will occur if:
– the XML declaration is missing (strictly, XML 1.0 makes the declaration optional, so a parser will not reject the document, but it should always be included)
– any characters, including white space, are placed before the XML declaration
– a start tag is not matched with an end tag, or either tag is omitted
– different cases are used for the start-tag and end-tag names of the same element

15.3 Structuring Data (5)
– a white-space character is used in an XML element name
– XML tags are nested improperly.
For example, <x><y>hello</x></y> is an error, because the </y> tag must precede the </x> tag
– attribute values are not enclosed in double ("") or single ('') quotes

letter.xml in web browser: Business letter marked up with XML

15.3 Structuring Data (6)
• An XML document is not required to reference a DTD, but validating XML parsers can use a DTD to ensure that the document has the proper structure
• Validating an XML document helps guarantee that independent developers will exchange data in a standardized form that conforms to the DTD

15.4 Namespaces
• XML namespaces provide a means to prevent naming collisions
• Each namespace prefix is bound to a uniform resource identifier (URI) that uniquely identifies the namespace
– The URI can be a URN, a URL or even a random string
– The parser does not visit these URLs, nor do these URLs need to refer to actual web pages

15.4 Namespaces (2)
• To eliminate the need to place a namespace prefix in each element, authors can specify a default namespace for an element and its children

namespace.xml and defaultnamespace.xml: XML namespaces demonstration and default namespace demonstration

15.5 Document Type Definitions (DTDs)
• To verify whether an XML document is valid (i.e., its elements contain the proper attributes and appear in the proper sequence), an XML parser needs:
– a Document Type Definition (DTD) or
– a Schema (not covered in this course)
• DTDs and schemas specify documents’ element types and attributes, and their relationships to one another

15.5 Document Type Definitions (DTDs) (2)
• A DTD expresses the set of rules for document structure using an EBNF (Extended Backus-Naur Form) grammar
• In a DTD:
– an ELEMENT element type declaration defines the rules for an element
– an ATTLIST attribute-list declaration defines attributes for a particular element

15.5 Document Type Definitions (DTDs) (3)
• Internal DTD
<?xml version="1.0"?>
<!DOCTYPE note [
  <!ELEMENT note (to,from,heading,body)>
  <!ELEMENT to (#PCDATA)>
  <!ELEMENT from (#PCDATA)>
  <!ELEMENT heading (#PCDATA)>
  <!ELEMENT body (#PCDATA)>
]>
<note>
  <to>Tove</to>
  <from>Jani</from>
  <heading>Reminder</heading>
  <body>Don't forget me this weekend</body>
</note>

15.5 Document Type Definitions (DTDs) (4)
• External DTD
<?xml version="1.0"?>
<!DOCTYPE note SYSTEM "note.dtd">
<note>
  <to>Tove</to>
  <from>Jani</from>
  <heading>Reminder</heading>
  <body>Don't forget me this weekend!</body>
</note>

15.5 Document Type Definitions (DTDs) (5)
• In an ELEMENT declaration, when children are declared in a sequence separated by commas, the children must appear in the same sequence in the document
<!ELEMENT note (to, from, heading, body)>
<!ELEMENT to (#PCDATA)>
<!ELEMENT from (#PCDATA)>
<!ELEMENT heading (#PCDATA)>
<!ELEMENT body (#PCDATA)>
• PCDATA specifies that an element (e.g., to) may contain parsed character data. Elements with parsed character data cannot contain markup characters, such as less than (<), greater than (>) or ampersand (&). Replace them with &lt;, &gt; and &amp;

15.5 Document Type Definitions (DTDs) (6)
• In an ELEMENT declaration, occurrence indicators specify how often a child may appear:
– Declaring Only One Occurrence
  <!ELEMENT note (message)>
– Minimum One Occurrence
  <!ELEMENT note (message+)>
– Zero or More Occurrences
  <!ELEMENT note (message*)>
– Declaring Zero or One Occurrences
  <!ELEMENT note (message?)>
– Declaring either/or Content
  <!ELEMENT note (to,from,header,(message|body))>

15.5 Document Type Definitions (DTDs) (7)
• Attributes are declared with an ATTLIST declaration
• CDATA specifies that an attribute’s value contains character data.
A parser will pass such data to an application without modification
• An attribute declaration ends with a default value or with one of #REQUIRED, #IMPLIED, #FIXED value
– Default value (used when the attribute is omitted)
  <!ELEMENT square EMPTY>
  <!ATTLIST square width CDATA "0">
– #IMPLIED (the attribute is optional)
  <!ATTLIST contact fax CDATA #IMPLIED>
– #REQUIRED (the attribute must be present)
  <!ATTLIST person number CDATA #REQUIRED>
– #FIXED value (the attribute always has the given value)
  <!ATTLIST sender company CDATA #FIXED "Microsoft">
• Enumerated Attribute Values
  <!ATTLIST payment type (check|cash) "cash">

15.5 Document Type Definitions (DTDs) (8)
Information stored as an attribute:
<person sex="female">
  <firstname>Anna</firstname>
  <lastname>Smith</lastname>
</person>

The same information stored as a child element:
<person>
  <sex>female</sex>
  <firstname>Anna</firstname>
  <lastname>Smith</lastname>
</person>

Drawbacks of attributes compared with child elements:
• attributes cannot contain multiple values (child elements can)
• attributes are not easily expandable (for future changes)
• attributes cannot describe structures (child elements can)
• attributes are more difficult to manipulate by program code
• attribute values are not easy to test against a DTD

15.5 Document Type Definitions (DTDs) (9)
Date as an attribute:
<note date="12/11/2002">
  <to>Tove</to>
  <from>Jani</from>
  <heading>Reminder</heading>
  <body>Don't forget me this weekend!</body>
</note>

Date as a child element:
<note>
  <date>12/11/2002</date>
  <to>Tove</to>
  <from>Jani</from>
  <heading>Reminder</heading>
  <body>Don't forget me this weekend!</body>
</note>

Date expanded into child elements:
<note>
  <date>
    <day>12</day>
    <month>11</month>
    <year>2002</year>
  </date>
  <to>Tove</to>
  <from>Jani</from>
  <heading>Reminder</heading>
  <body>Don't forget me this weekend!</body>
</note>

15.5 Document Type Definitions (DTDs) (10)
• ENTITY: defines shortcuts to special characters
• Internal declaration
  DTD:
  <!ENTITY writer "Donald Duck.">
  <!ENTITY copyright "Copyright W3Schools.">
  XML:
  <author>&writer;&copyright;</author>
• External declaration
  DTD:
  <!ENTITY writer SYSTEM "http://www.w3schools.com/entities.dtd">
  <!ENTITY copyright SYSTEM "http://www.w3schools.com/entities.dtd">
  XML:
  <author>&writer;&copyright;</author>

letter2.xml and letter.dtd: Document Type Definition (DTD) for a business letter

15.7 XML Vocabularies
• XML allows authors to create their own tags to describe data
precisely
– People and organizations in various fields of study have created many different kinds of XML for structuring data
– Some of these markup languages are:
  • MathML (Mathematical Markup Language) – describes mathematical expressions for display
  • Scalable Vector Graphics (SVG)

15.7 XML Vocabularies (2)
  • Wireless Markup Language (WML)
  • Extensible Business Reporting Language (XBRL)
  • Extensible User Interface Language (XUL)
  • Product Data Markup Language (PDML)
  • W3C XML Schema
  • Extensible Stylesheet Language (XSL)

mathml2.mml displayed in Firefox (file:///H:/TP2543/textbookcode/ch15/Fig15_15/mathml2.mml)

15.8 Extensible Stylesheet Language and XSL Transformations
• Convert XML into any text-based document
• XSL documents have the extension .xsl
• XSL is a group of three technologies:
– XSL-FO (XSL Formatting Objects): specifying formatting
– XPath (XML Path Language): locating structures and data (such as specific elements and attributes)
– XSLT (XSL Transformations): transforming the structure of the XML document data to another structure

15.8 Extensible Stylesheet Language and XSL Transformations (2)
• For example, XSLT allows you to convert a simple XML document to an HTML5 document that presents the XML document’s data (or a subset of the data) formatted for display in a web browser
• Transforming an XML document using XSLT involves two tree structures
– the source tree (i.e., the XML document to transform)
– the result tree (i.e., the XML document to create)

sports.xml, sports.xsl, style.css (http://test.deitel.com/iw3htp5/ch15/Fig15_18-19/sports.xml)
sorting.xml, sorting.xsl, style.css

15.8 Extensible Stylesheet Language and XSL Transformations (3)
• XPath character / (a forward slash)
– Selects the document root
– In XPath, a leading forward slash specifies that we are using absolute addressing
– An XPath expression with no beginning forward slash uses relative addressing
• XSL @ symbol
– Retrieves an attribute’s value
• XSL name()
– Retrieves the current node’s
element name
• XSL text()
– Retrieves the text between an element’s start and end tags
• XPath expression //*
– Selects all the element nodes in an XML document
• See Fig. 15.22 for XSL style-sheet elements
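The addressing constructs from Section 15.8 can be explored without a browser. The sketch below is only an approximation: Python's xml.etree.ElementTree implements a limited subset of XPath (relative paths, .//* and simple attribute predicates), so name(), text() and @ are mirrored here by the .tag, .text and .get() accessors rather than evaluated as XPath. The SPORTS_XML document and the describe function are hypothetical and written for this illustration; they are not the contents of sports.xml.

```python
import xml.etree.ElementTree as ET

# Hypothetical document, loosely modelled on a list of sports.
SPORTS_XML = """
<sports>
  <game id="783"><name>Cricket</name></game>
  <game id="239"><name>Baseball</name></game>
</sports>
"""


def describe(xml_text):
    """Collect a few facts about the document using relative path expressions."""
    root = ET.fromstring(xml_text.strip())
    return {
        # Counterpart of name(): the tag of the current (root) node.
        "root": root.tag,
        # Counterpart of //*: .//* yields every descendant element of the
        # root in document order (the root itself is not included).
        "descendants": [el.tag for el in root.findall(".//*")],
        # Counterpart of @id: read the attribute value of each <game>.
        "ids": [game.get("id") for game in root.findall("game")],
        # Counterpart of text(): the text between start and end tags.
        "names": [name.text for name in root.findall("game/name")],
        # Relative path with an attribute predicate.
        "name_of_239": root.find("game[@id='239']/name").text,
    }


if __name__ == "__main__":
    for key, value in describe(SPORTS_XML).items():
        print(key, value)
```

All paths here are relative to the root element. A full XPath engine, such as the one inside an XSLT processor, also accepts absolute paths beginning with a forward slash.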