06-XML

XML
Internet Engineering
Spring 2015
Bahador Bakhshi
CE & IT Department, Amirkabir University of Technology
Questions
Q6) How to define the data that is transferred
between web server and client?
Q6.1) Which technology?
Q6.2) Is data correctly encoded?
Q6.3) How to access the data in web pages?
Q6.4) How to present the data?
Outline
Introduction
Namespaces
Validation
Presentation
XML Processing (using JavaScript)
Conclusion
Introduction
HTML + CSS + JavaScript → interactive web pages
• The web server is not involved after the page is loaded
• JavaScript reacts to user events
However, most web applications need data from the server after the page is loaded
• e.g., new email data in Gmail
• A mechanism for communication: AJAX
• A common (standard) format to exchange data
In most applications, the data is structured
Introduction (cont’d)
In general (not only on the web), to store or transport data we need a common format that specifies the structure of the data; e.g.,
• Documents: PDF, HTML, DOCx, PPTx, ...
• Objects: Java Object Serialization/Deserialization
How to define the data structure?
• Binary format (similar to binary files)
  ◦ Difficult to develop & debug, machine dependent, …
• Text format (similar to text files)
  ◦ Human readable, machine independent & easier
Introduction (cont’d)
• Example: data structure of a class
  ◦ Course name, teacher, number of students, information about each student
Alternative 1: plain values, one per line
IE
Bakhshi
48
Ali
Hassani
1111
Babak
Hosseini
2222
….
Alternative 2: labeled text
Student num: 48
Name: IE
Teacher: Bakhshi
Ali Hassani 1111
Babak Hosseini 2222
….
Alternative 3: objects in a program
class Course {
  string name;
  string teacher;
  integer num;
  Array st of Students;
}
c = new Course();
c.name = "IE";
c.teacher = "Bakhshi";
c.num = 48;
c.st[1] = new Student();
c.st[1].name = "Ali"; c.st[1].fam = "Hassani";
….
Introduction (cont’d)
W3C’s approach
• XML: eXtensible Markup Language
• A meta-markup language to describe data structure
• In each application, a markup language (a set of tags & attributes) is defined
<course>
<title> IE </title>
<num> 48 </num>
<teacher> Bakhshi </teacher>
<students>
<student><name>Ali</name> <fam>Hassani</fam> <id> 1111 </id></student>
…
</students>
</course>
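Because the format is standard, any XML parser can read such a document back into program data. The following is a minimal sketch in Python, assuming the standard-library `xml.etree.ElementTree` module; the document string repeats the course example with only the one student shown (the elided entries are not filled in), and the variable names are illustrative.

```python
import xml.etree.ElementTree as ET

# The course example, with the single student that the slide spells out.
COURSE_XML = """
<course>
  <title> IE </title>
  <num> 48 </num>
  <teacher> Bakhshi </teacher>
  <students>
    <student><name>Ali</name> <fam>Hassani</fam> <id> 1111 </id></student>
  </students>
</course>
"""

root = ET.fromstring(COURSE_XML)

# Element text keeps its surrounding white space, so strip it before use.
title = root.findtext("title").strip()
num = int(root.findtext("num"))
students = [
    (s.findtext("name"), s.findtext("fam"), s.findtext("id").strip())
    for s in root.find("students")
]

print(title, num, students)
```

The tag names carry the structure, so the reading code does not depend on the order or position of the lines, unlike the plain-value format.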
Introduction (cont’d)
Standard Generalized Markup Language (SGML)
• Expensive, complex to implement
XML: a subset of SGML
• Goals: generality combined with simplicity and usability
• Simplifies SGML by:
  ◦ leaving out many syntactical options and variants
• SGML ~ 600 pages, XML ~ 30 pages
• XML = SGML - {complexity, document perspective} + {simplicity, data exchange perspective}
Why Study XML: Benefits
Simplifies data sharing & transport
• XML is text based and platform independent
Extensive tools to process XML
• To validate, to present, to search, …
In web applications, separation of data from HTML
• E.g., table structure in HTML, table data in XML
Extensible for different applications
• A powerful tool to model/describe complex data
• E.g., MS Office
XML Document Elements
Markup
• Elements
  ◦ Tag + Content
  ◦ Attributes
• Comments
• Processing instructions
Content
• Parsed Character Data
• Unparsed Character Data (CDATA)
XML Elements
XML element structure
• Tag + content
<tagname attribute="value">
Content
</tagname>
No predefined tags
If the content is not CDATA, it is parsed by the parser; it can be
• A value for this element
• Child elements of this element
XML Elements’ Attributes
Tags (elements) are customized by attributes
• No predefined attributes
<os install="factory">Windows</os>
<os install="user">Linux</os>
Attributes vs. tags (elements)
• Attributes can be replaced by elements, but
  ◦ An attribute cannot be repeated for an element
  ◦ An attribute cannot have children
• Attributes are mainly used for metadata, e.g., ID, class
Processing Instructions
• Processing instructions (PIs) pass information (instructions) to the application that processes the XML file
• They are not a part of the user data
<?Target String ?>
• Common usage
<?xml-stylesheet href="URL" type="text/xsl"?>
• The XML declaration is a special PI
<?xml version="1.0" encoding="UTF-16"?>
• If present, the XML declaration must be the very first line in the file
Basic XML Document Structure
<?xml version="1.0" encoding="UTF-16"?>
<root-tag>
<inner-tags>
Data
</inner-tags>
<!-- Comment -->
</root-tag>
Example
<?xml version="1.1" encoding="UTF-8" ?>
<notebook>
<name>ThinkPad</name>
<model>T500</model>
<spec>
<hardware>
<RAM>4GB</RAM>
</hardware>
<software>
<OS>Linux, FC21 </OS>
</software>
</spec>
</notebook>
Example (CDATA)
<?xml version="1.1" encoding="UTF-8" ?>
<operator>
<mathematic>
+-*/%
</mathematic>
<comparison>
<![CDATA[
< <= == >= > !=
]]>
</comparison>
</operator>
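A CDATA section only changes how the text is written in the file, not what the application receives. A small sketch, assuming Python's standard `xml.etree.ElementTree`; the two one-line documents are illustrative variants of the `<comparison>` element above.

```python
import xml.etree.ElementTree as ET

# The same content written twice: in a CDATA section and with entity references.
with_cdata = ET.fromstring("<comparison><![CDATA[< <= == >= > !=]]></comparison>")
with_escapes = ET.fromstring("<comparison>&lt; &lt;= == &gt;= &gt; !=</comparison>")

# The parser hands the application the same character data in both cases.
print(with_cdata.text)
print(with_cdata.text == with_escapes.text)
```

CDATA is therefore mainly a convenience when the content contains many `<` and `&` characters.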
XML vs. HTML
Tags
• HTML: predefined, fixed tags
• XML: no predefined tags (meta-language)
  ◦ User-defined tags & attributes
Purpose
• HTML: information display
• XML: data structure & transfer (of the data which is displayed)
Strictness of rules
• HTML (not XHTML): loose
• XML: strong/strict rule checking
XML in General Application
• XML by itself does not do anything
• XML just describes the structure of the data
• Other applications parse XML and use it
[Figure: XML document → XML processor (a.k.a. XML parser; e.g., Apache Xerces; SAX, DOM) → application]
• A similar approach is used for other formats (even user-defined formats); so, what are the advantages of XML?
  ◦ XML is standard
  ◦ XML tools & technologies are available
XML Technology Components
Data structure (tree) representation
• XML document (a text file)
Validation & Conformance
• Document Type Definition (DTD) or XML Schema
Element access & addressing
• XPath, DOM
Display and transformation
• XSLT or CSS
Programming, Database, Query, …
Namespaces
• In XML, element names are defined by developers
• This results in conflicts when trying to mix XML documents from different XML applications
• XML file 1
<table>
<tr>
<td>Apples</td> <td>Bananas</td>
</tr>
</table>
• XML file 2
<table>
<name>Dinner Table</name>
<width>80</width> <length>120</length>
</table>
Namespaces
Name conflicts in XML can easily be avoided by using qualified names that carry a prefix
• A qualified name is the prefixed name
• The prefix stands for the namespace
Step 1: Namespace declaration
• Defines a label (prefix) for the namespace and associates it with the namespace identifier
  ◦ A URI/URL is used as the identifier so that it is universally unique
Step 2: Qualified name
• namespace prefix:local name
Namespaces
<?xml version="1.0"?>
<ceit:course
xmlns:ceit="http://ceit.aut.ac.ir">
<ceit:department>
<ceit:name>
Computer Engineering &amp;
Information Technology
</ceit:name>
</ceit:department>
<ceit:name>
Internet Engineering
</ceit:name>
</ceit:course>
Actual name of the <ceit:department> tag for the parser: http://ceit.aut.ac.ir:department
Default Namespaces
<alltables>
<table xmlns="http://www.w3.org/TR/html4/">
<tr>
<td>Apples</td> <td>Bananas</td>
</tr>
</table>
<table xmlns="http://www.dinnertable.com">
<name>Dinner Table</name>
<width>80</width>
<length>120</length>
</table>
</alltables>
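A parser resolves every element name to the pair (namespace URI, local name), so the two kinds of `table` stay distinct. A minimal sketch, assuming Python's standard `xml.etree.ElementTree`, which writes the resolved name as `{URI}local-name`; the prefixes `h` and `f` in the lookup map are arbitrary labels chosen for this example.

```python
import xml.etree.ElementTree as ET

DOC = """
<alltables>
  <table xmlns="http://www.w3.org/TR/html4/">
    <tr><td>Apples</td> <td>Bananas</td></tr>
  </table>
  <table xmlns="http://www.dinnertable.com">
    <name>Dinner Table</name>
    <width>80</width>
    <length>120</length>
  </table>
</alltables>
"""

root = ET.fromstring(DOC)

# Each element is reported as "{namespace URI}local-name".
tags = [child.tag for child in root]
print(tags)

# Prefixes used for searching are local to this program, not to the document.
ns = {"h": "http://www.w3.org/TR/html4/", "f": "http://www.dinnertable.com"}
cells = [td.text for td in root.findall("h:table/h:tr/h:td", ns)]
width = root.findtext("f:table/f:width", namespaces=ns)
print(cells, width)
```

Only the URI matters for matching; the document and the program may use different prefixes, or none, for the same namespace.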
Valid XML
XML is used to describe structured data
• The description must be correct
  ◦ A valid XML file
Correctness
• Syntax
  ◦ Syntax error → parser fails to parse the file
  ◦ Syntax rules: e.g., all XML tags must be closed
• Semantics (structure)
  ◦ Application-specific rules, e.g., a student must have an ID
  ◦ Error → application failure
XML Syntax Rules (Well-Formed)
• Every element has a start-tag and an end-tag, or is a self-closing tag
• Tags can’t overlap
• XML documents can have only one root element
• XML naming conventions
  ◦ Names can start with letters or the underscore (_) character
  ◦ After the first character, digits, hyphens, and periods are also allowed
  ◦ Names can’t start with “xml”, in uppercase or lowercase
  ◦ There can’t be a space after the opening < character
• XML is case sensitive
• Values of attributes must be quoted
• White space is preserved
• &, <, > are represented by &amp; &lt; &gt;
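Several of these rules can be tried out directly, since a parser rejects any document that is not well-formed. A small sketch, assuming Python's standard `xml.etree.ElementTree` and `xml.sax.saxutils.escape`; the helper `is_well_formed` and the test strings are made up for this illustration.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

def is_well_formed(text):
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

print(is_well_formed("<a><b>ok</b></a>"))       # properly nested
print(is_well_formed("<a><b>overlap</a></b>"))  # overlapping tags
print(is_well_formed("<a>one</a><a>two</a>"))   # two root elements
print(is_well_formed("<a>x</A>"))               # case mismatch
print(is_well_formed("<a id=1/>"))              # unquoted attribute value

# escape() replaces &, < and > with their entity references.
safe = escape("R&D: 1 < 2 > 0")
print(safe)
print(is_well_formed("<a>" + safe + "</a>"))
```

Well-formedness only covers syntax; whether the document has the structure an application expects is a separate check.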
How to Validate XML (structure)?
1) Application-specific programs check the structure of the XML document
• Different applications → different programs
• Change in data structure → code modification
2) General XML parser + reference document
• Reference document
  ◦ Tag names, attributes, tree structure, tag relations, …
• Different reference document languages
  ◦ DTD, XML Schema, RELAX NG
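The first approach can be sketched as a hand-written checker. This is an illustration only, assuming Python's standard `xml.etree.ElementTree`; the function `check_students` and its two rules (root must be `<course>`, every `<student>` needs an `<id>`) are hypothetical, chosen to match the "student must have ID" example.

```python
import xml.etree.ElementTree as ET

def check_students(xml_text):
    """Return a list of rule violations; the rules are a made-up example."""
    errors = []
    root = ET.fromstring(xml_text)  # raises ParseError on syntax errors
    if root.tag != "course":
        errors.append("root element must be <course>")
    for i, student in enumerate(root.iter("student"), start=1):
        if student.find("id") is None:
            errors.append(f"student {i} has no <id>")
    return errors

good = "<course><student><name>Ali</name><id>1111</id></student></course>"
bad = "<course><student><name>Babak</name></student></course>"
print(check_students(good))
print(check_students(bad))
```

Every new rule means editing this function, which is the drawback that a general parser plus a reference document (DTD or schema) avoids.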
XML Validation (cont’d)
Document Type Definition (DTD) or XML Schema
• A language to define the document type
  ◦ The rules for the structure of the XML
• Internal or external
[Figure: XML data and DTD/Schema → parser → parser interface → XML-based application]
DTD
A DTD is a set of structural rules, called declarations, that specify
• The set of elements and attributes that can be in the XML
• Where these elements and attributes may appear
Declarations have the form <!keyword …>
• ELEMENT: to define tags
  ◦ For leaf nodes: character pattern
  ◦ For internal nodes: list of children
• ATTLIST: to define tag attributes
  ◦ Includes: the name of the element, the attribute’s name, its type, and a default option
ELEMENT Declaration
• General form for internal nodes
  ◦ <!ELEMENT element_name (list of children)>
  ◦ To control the number of times a child may appear
    + : One or more
    * : Zero or more
    ? : Zero or one
• General form for leaf nodes
  ◦ <!ELEMENT element_name content_type>
  ◦ Where content_type is one of
    (#PCDATA): Most commonly used; the content is parsed, i.e., < and & must be written as entity references
    ANY: Any mix of character data and declared elements
    EMPTY: No content
ATTLIST Declaration
• <!ATTLIST element_name attribute_name attribute_type default_option>
  ◦ element_name: the name of the corresponding element
  ◦ attribute_name: the name of the attribute
  ◦ attribute_type: commonly CDATA is used
  ◦ default_option:
    A value: the default value of the attribute
    #REQUIRED: the attribute is mandatory
    #IMPLIED: the attribute is optional
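ENTITY Declaration
<!ENTITY % name "value">
• Defines a (parameter) entity with the given name and associated value
• Each reference %name; is replaced by the value
• Very similar to #define in C
• Example
  ◦ <!ENTITY % ContentType "(#PCDATA)">
  ◦ <!ELEMENT test %ContentType;>
• A reference inside another declaration, as in this example, is only allowed in an external DTD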
Example: Internal DTD
<?xml version="1.0"?>
<!DOCTYPE name [
<!ELEMENT name (first, middle, last)>
<!ATTLIST name nickname CDATA #IMPLIED>
<!ELEMENT first (#PCDATA)>
<!ELEMENT middle (#PCDATA)>
<!ELEMENT last (#PCDATA)>
]>
<name nickname="Jo">
<first>John</first>
<middle>Johansen</middle>
<last>Smith</last>
</name>
External DTD
• System
<!DOCTYPE root_name SYSTEM "URL" >
• Public
<!DOCTYPE root_name PUBLIC "-//name//DTD Name//EN" "URL">
  ◦ The common format is FPI, Formal Public Identifier (defined in the document ISO 9070)
• Example
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
Example: External DTD
sample.dtd
<!ELEMENT note (to+,from,heading*,main)>
<!ELEMENT to (#PCDATA)>
<!ELEMENT from (#PCDATA)>
<!ELEMENT heading (#PCDATA)>
<!ELEMENT main (#PCDATA)>
-------------------------------------------------------------------------
external-dtd.xml
<?xml version="1.0" ?>
<!DOCTYPE note SYSTEM "sample.dtd" >
<note>
<to>Ali</to>
<to>Hassan</to>
<from>Babak</from>
<main>This is message</main>
</note>
XML Schema
• XML Schema describes the structure of an XML file
  ◦ Also referred to as XML Schema Definition (XSD)
• XML Schema benefits (DTD disadvantages)
  ◦ Created using basic XML syntax (DTD has its own syntax)
  ◦ Validates text element content based on built-in and user-defined data types (DTD does not fully support data types)
• Similar to OOP
  ◦ The schema is a class & XML files are instances
• The schema specifies
  ◦ Elements and attributes, where and how often
  ◦ Data type of every element and attribute
Schema (cont’d)
XML Schema is itself an XML-based language
• Has its own predefined tags & namespace
xmlns:xs="http://www.w3.org/2001/XMLSchema"
Two categories of data types
• Simple: cannot have nested elements or attributes (i.e., it is itself a leaf or an attribute)
  ◦ Primitive: string, boolean, decimal, float, ...
  ◦ Derived: integer, byte, long, unsignedInt, …
  ◦ User defined: restriction of base types
• Complex: can have attributes and/or nested elements
XML Schema (cont’d)
• Simple element and attribute declarations
<xs:element name="a name" type="a type" />
<xs:attribute name="a name" type="a type" />
• Complex element declaration
<xs:element name="a name">
  <xs:complexType>
    <xs:sequence> (or <xs:all> or <xs:choice>)
      <xs:element name="…" type="…" minOccurs="…" maxOccurs="…"/>
    </xs:sequence> (or </xs:all> or </xs:choice>)
  </xs:complexType>
</xs:element>
XML Schema (cont’d)
Notes on minOccurs & maxOccurs
• When using the all indicator
  ◦ minOccurs & maxOccurs can only be 0 or 1
• The default value for minOccurs (and for maxOccurs) is 1
• To allow an element to appear an unlimited number of times, use maxOccurs="unbounded"
XML Schema Example: note.xsd
<?xml version="1.0"?>
<xs:schema
xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xs:element name="note">
<xs:complexType>
<xs:sequence>
<xs:element name="to" type="xs:string"/>
<xs:element name="from" type="xs:string"/>
<xs:element name="date" type="xs:date"/>
</xs:sequence>
</xs:complexType>
</xs:element>
</xs:schema>
XML Schema Example: note.xml
<?xml version="1.0"?>
<note
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:noNamespaceSchemaLocation="note.xsd">
<to>Ali</to>
<from>Reza</from>
<date>1391-01-01</date>
</note>
XML Validation Tools
Online validators
• validator.w3.org
• www.xmlvalidation.com
XML tools & commands
• The xmllint command in Linux
  ◦ xmllint --valid xmlfile (validates against the DTD named in the DOCTYPE)
  ◦ xmllint --dtdvalid DTD xmlfile
  ◦ xmllint --schema schema xmlfile
XML libraries
• LibXML2 for C
• Java & C# XML libraries
XML Presentation
By default, browsers parse & display XML files
• Tree structure of the XML
• Syntax checking → well-formed XML
Other presentations of XML
• 1) Browsers support CSS for XML files
  ◦ CSS is used to format the representation of the XML
• 2) Transform to HTML + CSS using XSLT
  ◦ A powerful tool to separate data from HTML
• 3) Use JavaScript to generate HTML for the XML
  ◦ Parse the XML and create HTML elements
XML & CSS
Attach styling instructions directly to XML
<?xml-stylesheet href="URL" type="text/css" ?>
Can style but not rearrange elements
• Block or inline style
• Bold, italic, underline, font, color, etc.
…
Tag_name {color:red;
font-weight:bold;
font-family:serif;}
CSS Example
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/css" href="book.css" ?>
<programming>
Good programming books:
<C>
<book>
<title>The C Programming Language </title>
<author>Ritchie</author>
</book>
</C>
<Java>
<book>
<title>Thinking in Java </title>
<author>Eckel</author>
</book>
</Java>
</programming>
==========================================
*{display: block;}
programming{font-family: Arial; font-size:20pt;}
C{color: blue;}
Java{color: green;}
author{ font-style:italic;}
XML & CSS
CSS is mainly designed to format HTML presentation
It does not work well for XML
• XML does not have any predefined tags/attributes
  ◦ No id, no class
• CSS for XML uses tag names
  ◦ The same style applies to all tags with the same name
CSS can only present the data in the XML
• It cannot process & transform the data into another format
  ◦ e.g., presenting the data as a table
XSL
XSL stands for eXtensible Stylesheet Language, and is a style sheet language for XML documents
• It started as XSL & led to XSLT, XPath, and XSL-FO
XPath
• A language for navigating XML documents
XSLT (XSL Transformations)
• Transforms XML into other formats, like HTML
XSL-FO (XSL Formatting Objects)
• Not discussed here
XPath
XPath is a language for addressing different parts of an XML document
• XPath is a syntax for defining parts of an XML document
XPath uses path expressions to navigate in XML documents
XPath: Basic Syntax
XPath considers the tree structure of XML
• Parent, child, sibling, ancestor, …
• Very similar to the Linux file system hierarchy
Nodes are selected using path expressions
• Absolute path: starts with /
• Relative path: starts from the current node
Levels are separated by /
Expressions can be based on attributes
XPath: Basic Syntax (cont’d)
Expression   Description
/            Selects from the root node
//           Selects nodes from descendants of the current node
.            Selects the current node
..           Selects the parent of the current node
@            Selects attributes
*            Matches any element node
@*           Matches any attribute node
XPath: Basic Syntax: Example
<bookstore>
<book>
<title lang="eng">Beginning XML</title>
<price>100</price>
</book>
</bookstore>
Path Expression        Result
/bookstore             Selects the root element bookstore
/bookstore/book        Selects all book elements that are children of bookstore
bookstore/book/title   Selects titles of all books
//book                 Selects all book elements no matter where they are in the document
bookstore//book        Selects all book elements that are descendants of the bookstore element, no matter where they are under the bookstore element
//@lang                Selects all attributes that are named lang
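Some of these expressions can be tried with Python's standard `xml.etree.ElementTree`, which implements only a subset of XPath. In this sketch (an illustration, not a full XPath engine) paths are evaluated relative to the `<bookstore>` element, so the absolute forms from the table are written in their relative equivalents, and attribute nodes such as `//@lang` are read from the elements instead.

```python
import xml.etree.ElementTree as ET

bookstore = ET.fromstring("""
<bookstore>
  <book>
    <title lang="eng">Beginning XML</title>
    <price>100</price>
  </book>
</bookstore>
""")

# Paths are relative to the element they are called on (here: <bookstore>).
books = bookstore.findall("book")          # like /bookstore/book
titles = bookstore.findall("book/title")   # like /bookstore/book/title
anywhere = bookstore.findall(".//book")    # like //book
parent = bookstore.find("book/title/..")   # ".." selects the parent

print(len(books), titles[0].text, len(anywhere), parent.tag)

# Attribute nodes cannot be selected directly in this subset;
# read the attribute from each matching element instead.
langs = [t.get("lang") for t in bookstore.iter("title")]
print(langs)
```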
XPath: Advanced Syntax
Path Expression                 Result
/bookstore/book[1]              Selects the first book element that is the child of the bookstore element
/bookstore/book[last()]         Selects the last book element that is the child of the bookstore element
/bookstore/book[last()-1]       Selects the last but one book element that is the child of the bookstore element
/bookstore/book[position()<3]   Selects the first two book elements that are children of the bookstore element
//title[@lang]                  Selects all the title elements that have an attribute named lang
//title[@lang='eng']            Selects all the title elements that have an attribute named lang with a value of 'eng'
/bookstore/book[price>35.00]    Selects all the book elements of the bookstore element that have a price element with a value greater than 35.00
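The positional and attribute predicates can be sketched with Python's standard `xml.etree.ElementTree`. Its XPath subset does not include comparisons such as `position()<3` or `price>35.00`, so the price filter below is written in plain Python; the three-book catalogue (titles, languages, prices) is hypothetical data made up for the illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical catalogue; titles and prices are made up for the illustration.
store = ET.fromstring("""
<bookstore>
  <book><title lang="eng">Book A</title><price>30.00</price></book>
  <book><title lang="fa">Book B</title><price>40.00</price></book>
  <book><title>Book C</title><price>50.00</price></book>
</bookstore>
""")

def titles(books):
    """Return the title text of each given <book> element."""
    return [b.findtext("title") for b in books]

first = titles(store.findall("book[1]"))
last = titles(store.findall("book[last()]"))
last_but_one = titles(store.findall("book[last()-1]"))
with_lang = [t.text for t in store.findall(".//title[@lang]")]
english = [t.text for t in store.findall(".//title[@lang='eng']")]

# price>35.00 is outside ElementTree's XPath subset, so filter in Python.
expensive = titles(b for b in store.findall("book")
                   if float(b.findtext("price")) > 35.00)

print(first, last, last_but_one, with_lang, english, expensive)
```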
XSLT
• What is XSLT (XSL Transformations)?
  ◦ An XSLT stylesheet is an XML file that transforms an XML document into another document, e.g., XML or XHTML
• How does it work?
  ◦ XSLT is composed of templates
  ◦ XSLT uses XPath to define the parts of the XML that should match a template
• Algorithm (applying templates):
  1. Set current node = /
  2. Find the (best) template that matches the current node
  3. If matched (current node == path expression), transform the current node into the result document as defined by the XSLT commands in the template
  4. If not matched, apply the default template
XSLT Structure
<?xml version="1.0" encoding="ISO-8859-1"?>
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="expression">
XSLT commands
HTML tags + Other contents
</xsl:template>
</xsl:stylesheet>
Note: the HTML tags and other contents inside a template are copied to the output
The template is run for every node that matches the given “expression”
The default template:
• If the current node has child nodes → current node = child (for each child in turn)
• If it has no child node → print the value
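The interplay between explicit templates and the default template can be imitated in a few lines. This is only a simulation in Python (standard `xml.etree.ElementTree`), not an XSLT processor: templates are matched by tag name instead of XPath patterns, priorities are ignored, and the names `apply_templates`, `default_template` and `student_template` are invented for the sketch.

```python
import xml.etree.ElementTree as ET

def apply_templates(node, templates):
    """Process one node: use its template if there is one, else the default rule."""
    template = templates.get(node.tag)
    if template is not None:
        return template(node, templates)
    return default_template(node, templates)

def default_template(node, templates):
    """Default rule: output the node's text and continue with its children."""
    parts = [node.text or ""]
    for child in node:
        parts.append(apply_templates(child, templates))
        parts.append(child.tail or "")
    return "".join(parts)

def student_template(node, templates):
    # Plays the role of <xsl:template match="student">
    return "<em>" + default_template(node, templates) + "</em>"

doc = ET.fromstring(
    "<class><course><name>IE</name></course>"
    "<student><name>Ali</name></student></class>"
)
result = apply_templates(doc, {"student": student_template})
print(result)
```

Elements without a template (`class`, `course`, `name`) fall through to the default rule, which is why their text still reaches the output.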
XSLT Commands (elements)
• <xsl:output method="method" />
  ◦ Defines the format of the output created by the stylesheet
  ◦ In this course, we want to create HTML from XML: method="html"
  ◦ If omitted, the default is XML, unless the root element of the output is <html>
• <xsl:template match="expression">
  ◦ Defines a template for the specified nodes
• <xsl:apply-templates select="expression" />
  ◦ If select is given → processes all the elements specified by the expression (the current node loops over the selected elements)
  ◦ If select is not given → processes all child nodes of the current node (the current node loops over the children of this node)
XSLT Commands (elements)
• <xsl:value-of select="expression"/>
  ◦ The value of the specified nodes
• <xsl:text disable-output-escaping="yes|no"> Text </xsl:text>
  ◦ Copies the given text to the output
  ◦ If escaping is not disabled → the output is escaped; i.e., “>” → “&gt;”
• <xsl:for-each select="expression">
  ◦ Creates a loop over the set of selected nodes
• <xsl:if test="expression">
  ◦ Conditional rules
  ◦ Comparison operators
    Equal: =
    Not equal: !=
    Less than: < (written as &lt; inside the test attribute)
    Greater than: >
XSLT Commands (elements)
• <xsl:choose>
    <xsl:when test="expression"> … </xsl:when>
    <xsl:otherwise> … </xsl:otherwise>
  </xsl:choose>
  ◦ Switch-case (if-else) rules
• The “template applying” algorithm selects the best matching template if the current node matches multiple templates
  ◦ The best is defined according to the priority of the templates
    - The priorities are defined in the standard (not covered here)
    - E.g., when multiple templates have an identical match attribute, the last template has the highest priority
XSLT Example: XML Data file
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="xslt-test.xslt"?>
<class>
<course>
<name>Internet Engineering</name>
<semester>Spring 2012</semester>
</course>
<student>
<name>Ali</name><family>Alizadeh</family>
<grade>18.0</grade><number>123</number>
</student>
<student>
<name>Babak</name><family>Babaki</family>
<grade>7.0</grade><number>234</number>
</student>
<student>
<name>Hassan</name><family>Hassani</family>
<grade>19.0</grade><number>345</number>
</student>
</class>
XSLT Example: XSLT file
<?xml version="1.0" encoding="ISO-8859-1"?>
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="/">
<html> <body>
<h2>Course: <xsl:value-of
select="/class/course/name"/></h2>
<h3>Semester: <xsl:value-of select="//semester"/>
</h3>
<h3>Students:
<xsl:for-each select="//family">
"<xsl:value-of select="."/>"
</xsl:for-each></h3>
<table border="1">
<tr> <th> Student # </th> <th>Name</th>
<th>Family</th> <th>Grade</th></tr>
<xsl:for-each select="/class/student">
<tr>
<td><xsl:value-of select="number"/></td>
<td><xsl:value-of select="name"/></td>
<td><xsl:value-of select="family"/></td>
<xsl:choose>
<xsl:when test="grade &lt; 10">
<td style="background-color:red"><xsl:value-of select="grade"/></td>
</xsl:when>
<xsl:otherwise>
<td style="background-color:green"><xsl:value-of select="grade"/></td>
</xsl:otherwise>
</xsl:choose>
</tr>
</xsl:for-each>
</table>
</body> </html>
</xsl:template>
</xsl:stylesheet>
XSLT Example: Result
XSLT Example: apply-templates (1)
<?xml version="1.0" encoding="ISO-8859-1"?>
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="html" />
<xsl:template match="/">
<html> <body>
<xsl:apply-templates />
</body> </html>
</xsl:template>
<xsl:template match="class">
<b><xsl:apply-templates /></b>
</xsl:template>
<xsl:template match="student">
<br /> <em><xsl:apply-templates /></em>
</xsl:template>
</xsl:stylesheet>
XSLT Example: apply-templates (2)
<?xml version="1.0" encoding="ISO-8859-1"?>
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="html" />
<xsl:template match="/">
<html> <body>
<xsl:apply-templates select="/class/course" />
<table border="1">
<tr> <th> Student # </th> <th>Name</th> <th>Family</th>
<th>Grade</th></tr>
<xsl:apply-templates select="/class/student" />
</table>
</body> </html>
</xsl:template>
<xsl:template match="course">
<h2>Course: <xsl:value-of
select="/class/course/name"/></h2>
<h3>Semester: <xsl:value-of select="//semester"/>
</h3>
<h3>Students: <xsl:for-each select="//family">
"<xsl:value-of select="."/>" </xsl:for-each></h3>
</xsl:template>
<xsl:template match="student">
<tr>
<td><xsl:value-of select="number"/></td>
<td><xsl:value-of select="name"/></td>
<td><xsl:value-of select="family"/></td>
<xsl:choose>
<xsl:when test="grade &lt; 10">
<td style="background-color:red">
<xsl:value-of select="grade"/>
</td>
</xsl:when>
<xsl:otherwise>
<td style="background-color:green">
<xsl:value-of select="grade"/>
</td>
</xsl:otherwise>
</xsl:choose>
</tr>
</xsl:template>
</xsl:stylesheet>
Where XSLT? Server Side
The server transforms XML to HTML/CSS and ships the result to the client browser for display
[Figure: on the server, XML + XSLT stylesheet → HTML + CSS, sent over http to the browser/interface]
Where XSLT? Client-Side
The server sends the XML & stylesheet to the client
The client transforms the XML to HTML & CSS
[Figure: XML and XSLT stylesheet sent over http; in the browser/interface, XML + XSLT stylesheet → HTML + CSS]
Where XSLT?
Client-side (browser)
• Reduces the load on your servers
• Search engines do not process the XML & XSLT
Server-side
• Performance could be a problem on busy servers
• Search engines can process the HTML (good for SEO)
Offline pre-conversion
• E.g., the xsltproc command in Linux
• Best performance
• Not good for dynamic documents
XML Processor: Parsers
There are two basic types of XML parsers:
Tree-based (DOM) parser:
• The whole document is analyzed to create a DOM tree
• Advantages: multiple & random access to elements, easier to validate the structure of the XML
Event-based parser (SAX):
• The XML document is interpreted as a series of events
• When a specific event occurs, a function is called to handle it (we will see it later in PHP)
• Advantages: less memory usage, and faster because processing starts without waiting for the whole document to be parsed
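The two parser types can be compared side by side. A minimal sketch in Python, assuming the standard `xml.sax` (event-based) and `xml.dom.minidom` (tree-based) modules; the class name `StudentCounter` and the two-student document are made up for the example.

```python
import xml.sax
from xml.dom import minidom

XML_TEXT = "<class><student>Ali</student><student>Babak</student></class>"

class StudentCounter(xml.sax.ContentHandler):
    """Event-based: the parser calls these methods while it reads the input."""
    def __init__(self):
        super().__init__()
        self.count = 0

    def startElement(self, name, attrs):
        # Called once for every start-tag the parser encounters.
        if name == "student":
            self.count += 1

handler = StudentCounter()
xml.sax.parseString(XML_TEXT.encode("utf-8"), handler)
print(handler.count)

# Tree-based: the whole document is parsed first, then accessed in any order.
dom = minidom.parseString(XML_TEXT)
students = dom.getElementsByTagName("student")
print(students[1].firstChild.nodeValue, students[0].firstChild.nodeValue)
```

The SAX handler keeps only a counter in memory, while the DOM version holds the full tree but allows the second student to be read before the first.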
XML Parsing in Browser
Web browsers have a built-in XML parser
• XML parser output: XML DOM
XML DOM is accessible through JavaScript
How to get the XML in JavaScript?
• Using AJAX
  ◦ We will see later
• From an (input) string
XML DOM
• XML DOM is similar to HTML DOM
  ◦ A tree of nodes (with different types: element, text, attr, …)
  ◦ Nodes are accessed by getElementsByTagName
  ◦ Nodes are objects (have methods & fields)
  ◦ The DOM can be modified, e.g., create/remove nodes
• However
  ◦ There are no predefined attributes like id/class
    - getElementById and similar methods are not applicable
  ◦ Since XML is not for presentation
    - Nodes do not have event handler functions
    - Nodes do not have a style field
XML DOM in JavaScript
DOMParser can parse an input XML string
Each node has
• parentNode, children, childNodes, …
Access to the value of a node
• In the DOM, everything is a node (with different types)
• Element nodes do not have a content value
• The content of an element is stored in a child node
  ◦ To get the content of a leaf element, read the value of its first child node (a text node)
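The same node model can be observed outside the browser. A small sketch using Python's standard `xml.dom.minidom`, which exposes the DOM names used here (`getElementsByTagName`, `firstChild`, `nodeValue`); the two-element message is a made-up input.

```python
from xml.dom import minidom

doc = minidom.parseString("<msg><from>Ali</from><to></to></msg>")

from_node = doc.getElementsByTagName("from")[0]
to_node = doc.getElementsByTagName("to")[0]

# The element itself carries no value; its text lives in a child text node.
print(from_node.nodeValue)
print(from_node.firstChild.nodeType == from_node.TEXT_NODE)
print(from_node.firstChild.nodeValue)

# An empty element has no child node at all, so check before dereferencing.
to_text = to_node.firstChild.nodeValue if to_node.firstChild else ""
print(repr(to_text))
```

The check on `firstChild` matters in practice: reading the first child of an empty element such as `<to></to>` without it fails.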
Example: Message Parser
<body>
Enter Your XML:<br />
<textarea name="inputtext" cols="50"
rows="10"><root><msg><from></from><to></to><body></body></msg></root></textarea>
<input type="button" onclick="parse()"
value="Parse" /> <br />
<div name="outputdiv" style="border-style:solid; border-width:1px;
width:50%;"></div>
</body>
Example: Message Parser
function parse() {
  // Read the XML text typed into the textarea
  const input = document.getElementsByName("inputtext")[0].value;
  const parser = new DOMParser();
  const xmlDoc = parser.parseFromString(input, "text/xml");
  // Each child element of <root> is one message
  const messages = xmlDoc.getElementsByTagName("root")[0].children;
  // Text of the first <tagName> inside msg ("" if the element is missing or empty)
  const textOf = (msg, tagName) => {
    const node = msg.getElementsByTagName(tagName)[0];
    return node && node.firstChild ? node.firstChild.nodeValue : "";
  };
  let output = "";
  for (let i = 0; i < messages.length; i++) {
    const msg = messages[i];
    output += textOf(msg, "from") + " sent following message to " +
      textOf(msg, "to") + "<br /> ''" + textOf(msg, "body") + "''<hr />";
  }
  document.getElementsByName("outputdiv")[0].innerHTML = output;
}
Conclusion: XML Technologies
• XHTML
  ◦ A stricter and cleaner XML-based version of HTML
• XSL (Extensible Stylesheet Language) consists of three parts:
  ◦ XSLT (XSL Transformations): transforms XML into other formats
  ◦ XPath: a language for navigating XML documents
  ◦ XSL-FO (XSL Formatting Objects)
• DTD (Document Type Definition)
  ◦ A standard for defining the legal elements
• XSD (XML Schema)
  ◦ An XML-based alternative to DTD
Conclusion: XML Technologies
• SVG (Scalable Vector Graphics)
  ◦ Defines graphics in XML format
• XQuery (XML Query Language)
  ◦ An XML-based language for querying XML data
• XLink (XML Linking Language)
  ◦ A language for creating hyperlinks in XML documents
• XPointer (XML Pointer Language)
  ◦ Allows the XLink hyperlinks to point to more specific parts in the XML document
…
Conclusion
• XML is an easy and powerful technology
  ◦ To describe data structure
  ◦ To exchange data
  ◦ To process & transform data
• XML applications beyond the web
  ◦ Image format: Scalable Vector Graphics (SVG)
  ◦ Everywhere to save/restore structured data
    - Microsoft Office, …
• Other related technologies in data exchange
  ◦ JSON (JavaScript), Protocol Buffers (Google), Thrift (Apache & FB)
Answers
• Q6.1) Which technology?
  ◦ Text based!
  ◦ XML: a markup meta-language with user-defined tags
• Q6.2) Is data correctly encoded?
  ◦ XML validation: DTD and Schema
• Q6.3) How to access the data in web pages?
  ◦ Parse the XML to a DOM; we know how to work with the DOM
• Q6.4) How to present the data?
  ◦ CSS for XML
  ◦ XPath to select elements, XSLT to translate XML to HTML
References
Reading Assignment: Chapter 7 of “Programming the World Wide Web”
David Hunter, et al., “Beginning XML,” chapters 1-5 & chapter 8
Andrew H. Watt, “Sams Teach Yourself XML in 10 Minutes,” chapters 1-4 & chapters 8-10
http://w3schools.com/xsl/default.asp