Chapter 15: XML
TP2543 Web Programming
Mohammad Faidzul Nasrudin
15.1 Introduction
• The Extensible Markup Language (XML) was
developed in 1996 by the World Wide Web
Consortium’s (W3C’s) XML Working Group
• XML is a portable, widely supported, open
(i.e., nonproprietary) technology for data
storage and exchange
What is the difference between
XML and HTML?
An HTML Example
<h2>Nonmonotonic Reasoning:
Context-Dependent Reasoning</h2>
<i>by <b>V. Marek</b> and
<b>M. Truszczynski</b></i><br>
Springer 1993<br>
ISBN 0387976892
The Same Example in XML
<book>
<title>Nonmonotonic Reasoning:
Context-Dependent Reasoning</title>
<author>V. Marek</author>
<author>M. Truszczynski</author>
<publisher>Springer</publisher>
<year>1993</year>
<ISBN>0387976892</ISBN>
</book>
HTML versus XML: Similarities
• Both use tags (e.g. <h2> and <year>)
• Tags may be nested (tags within tags)
• Human users can read and interpret both
HTML and XML representations quite easily
• But how about machines?
Problems with Automated
Interpretation of HTML Documents
• An intelligent agent trying to retrieve the
names of the authors of the book
• Authors’ names could appear immediately
after the title or immediately after the word
“by”
• Are there two authors?
• Or just one, called “V. Marek and M.
Truszczynski”?
HTML vs XML: Structural Information
• HTML documents do not contain structural
information: pieces of the document and their
relationships.
• XML is more easily accessible to machines because
– Every piece of information is described.
– Relations are also defined through the nesting
structure.
– E.g., the <author> tags appear within the <book> tags,
so they describe properties of the particular book.
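As a minimal sketch of why the nesting helps a program, the snippet below reads the book record with Python's standard xml.etree.ElementTree module and pulls out the authors. The choice of Python and the variable names are assumptions made for illustration; the slides do not prescribe a language.

```python
import xml.etree.ElementTree as ET

# The book record from the XML example on the earlier slide.
BOOK_XML = """<book>
  <title>Nonmonotonic Reasoning: Context-Dependent Reasoning</title>
  <author>V. Marek</author>
  <author>M. Truszczynski</author>
  <publisher>Springer</publisher>
  <year>1993</year>
  <ISBN>0387976892</ISBN>
</book>"""

book = ET.fromstring(BOOK_XML)

# Each <author> child of <book> is exactly one author, so the program
# does not have to guess from formatting or from the word "by".
authors = [author.text for author in book.findall("author")]
print(authors)  # ['V. Marek', 'M. Truszczynski']
```

The same question asked of the HTML version would require guessing from bold tags and surrounding words.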
HTML vs XML: Formatting
• The HTML representation provides more than
the XML representation:
– The formatting of the document is also described
• The main use of an HTML document is to
display information: it must define formatting
• XML: separation of content from display
– same information can be displayed in different
ways
15.2 XML Basics
• XML permits document authors to create
markup for virtually any type of information
– Can create entirely new markup languages that
describe specific types of data, including
mathematical formulas, chemical molecular
structures, music and recipes
• XML describes data in a way that human
beings can understand and computers can
process
15.2 XML Basics (2)
• An XML parser is responsible for identifying
components of XML documents (typically files
with the .xml extension) and then storing
those components in a data structure for
manipulation
• An XML document can reference a Document
Type Definition (DTD) or schema that defines
the document’s proper structure
15.2 XML Basics (3)
• An XML document that conforms to a
DTD/schema (i.e., has the appropriate
structure) is valid
• If an XML parser (validating or non-validating)
can process an XML document successfully,
that XML document is well-formed
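A small sketch of the parser's role, assuming Python's standard xml.etree.ElementTree module (a non-validating parser). The helper name is_well_formed and the player record's element names are hypothetical; they are not taken from player.xml.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text: str) -> bool:
    """Return True if a non-validating parser can process the text."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# Hypothetical player record; the element names are illustrative only.
PLAYER_XML = (
    "<player>"
    "<firstName>John</firstName>"
    "<lastName>Doe</lastName>"
    "<battingAverage>0.375</battingAverage>"
    "</player>"
)

# The parser stores the components in a tree that code can manipulate.
player = ET.fromstring(PLAYER_XML)
fields = {child.tag: child.text for child in player}
print(fields["lastName"])  # Doe
```

Because this parser does not validate, a True result says only that the document is well-formed, not that it is valid against a DTD or schema.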
player.xml
XML that describes a baseball
player’s information
15.2 XML Basics (4)
• DTDs and schemas are essential for business-to-business
(B2B) transactions and mission-critical systems
• Validating XML documents ensures that
disparate systems can manipulate data
structured in standardized ways and prevents
errors caused by missing or malformed data.
15.3 Structuring Data
XML prolog
15.3 Structuring Data (2)
• XML element names can be of any length and
can contain letters, digits, underscores,
hyphens and periods
– Must begin with either a letter or an underscore,
and they should not begin with “xml” in any
combination of uppercase and lowercase letters,
as this is reserved for use in the XML standards
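The naming rule above can be sketched as a small check. This is a simplified, ASCII-only approximation written for illustration: the XML specification also permits many non-ASCII letters (and colons, which namespaces use), and the function name is hypothetical.

```python
import re

# First character: a letter or underscore.
# Remaining characters: letters, digits, underscores, hyphens, periods.
_NAME_PATTERN = re.compile(r"[A-Za-z_][A-Za-z0-9_.\-]*")


def is_acceptable_element_name(name: str) -> bool:
    """Apply the slide's naming rule (simplified to ASCII characters)."""
    if _NAME_PATTERN.fullmatch(name) is None:
        return False
    # Names starting with "xml" in any letter case are reserved.
    return not name.lower().startswith("xml")


for candidate in ["book", "_id", "first.name-2", "2nd", "first name", "XmlData"]:
    print(candidate, is_acceptable_element_name(candidate))
```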
15.3 Structuring Data (3)
• When a user loads an XML document in a
browser, the browser uses a style sheet to format
the data for display
• Google Chrome places a down arrow and right
arrow next to every container element; they’re
not part of the XML document.
– down arrow indicates that the browser is displaying
the container element’s child elements
– clicking the right arrow next to an element expands
that element
article.xml in web browser
XML used to mark up an article
15.3 Structuring Data (4)
• An error will happen if:
– any characters, including white space, are placed
before the XML declaration (the declaration itself is
optional in XML 1.0, but including it is recommended)
– a start tag is not matched with an end tag, or
either tag is omitted
– different cases are used for the start-tag and end-tag names of the same element
15.3 Structuring Data (5)
– a white-space character is used in an XML element
name
– XML tags are nested improperly. For example,
<x><y>hello</x></y> is an error, because the </y>
tag must precede the </x> tag
– attribute values are not enclosed in double ("") or
single ('') quotes
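Several of the errors listed on these two slides can be reproduced with a parser. The sketch below assumes Python's standard xml.etree.ElementTree module; the sample strings and the helper name parse_error are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Each entry is a deliberately broken document for one listed error.
BROKEN = {
    "white space before declaration": ' <?xml version="1.0"?><x/>',
    "missing end tag": "<x><y>hello</x>",
    "case mismatch": "<Letter>hello</letter>",
    "space in element name": "<first name>hello</first name>",
    "improper nesting": "<x><y>hello</x></y>",
    "unquoted attribute value": "<x id=1>hello</x>",
}


def parse_error(xml_text):
    """Return the parser's error message, or None if parsing succeeds."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        return str(err)
    return None


for label, text in BROKEN.items():
    print(f"{label}: {parse_error(text)}")

# The corrected nesting parses without error.
print(parse_error('<?xml version="1.0"?><x><y>hello</y></x>'))  # None
```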
letter.xml in web browser
Business letter marked up with XML
15.3 Structuring Data (6)
• An XML document is not required to reference
a DTD, but validating XML parsers can use a
DTD to ensure that the document has the
proper structure
• Validating an XML document helps guarantee
that independent developers will exchange
data in a standardized form that conforms to
the DTD
15.4 Namespaces
• XML namespaces provide a means to prevent
naming collisions
• Each namespace prefix is bound to a uniform
resource identifier (URI) that uniquely
identifies the namespace
– A URN or URL or even a random string
– The parser does not visit these URLs, nor do these
URLs need to refer to actual web pages
15.4 Namespaces (2)
• To eliminate the need to place a namespace
prefix in each element, authors can specify a
default namespace for an element and its
children
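A brief sketch of how a parser reports prefixed and default namespaces, assuming Python's standard xml.etree.ElementTree module. The URIs (urn:example:...) and element names are hypothetical and are not taken from namespace.xml or defaultnamespace.xml.

```python
import xml.etree.ElementTree as ET

# Two <file> elements that would collide without namespaces.
PREFIXED = """<text:directory xmlns:text="urn:example:textInfo"
                xmlns:image="urn:example:imageInfo">
  <text:file filename="book.xml"/>
  <image:file filename="funny.jpg"/>
</text:directory>"""

root = ET.fromstring(PREFIXED)
# ElementTree replaces each prefix with its URI in braces.
tags = [child.tag for child in root]
print(tags)  # ['{urn:example:textInfo}file', '{urn:example:imageInfo}file']

# A default namespace applies to the element and its unprefixed children.
DEFAULTED = """<directory xmlns="urn:example:textInfo">
  <file filename="book.xml"/>
</directory>"""

default_root = ET.fromstring(DEFAULTED)
print(default_root[0].tag)  # {urn:example:textInfo}file
```

The parser never fetches anything from the URIs; it uses them only as unique names.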
namespace.xml and
defaultnamespace.xml
XML namespaces demonstration and
default namespace demonstration
15.5 Document Type Definitions
(DTDs)
• To verify whether an XML document is valid
(i.e., its elements contain the proper attributes
and appear in the proper sequence), an XML
parser needs:
– Document Type Definitions (DTD) or
– Schema (not covered in this course)
• DTDs and schemas specify documents’ element
types and attributes, and their relationships to
one another
15.5 Document Type Definitions
(DTDs) (2)
• A DTD expresses the set of rules for document
structure using an EBNF (Extended Backus-Naur Form) grammar
• In a DTD:
– an ELEMENT element type declaration defines the
rules for an element
– an ATTLIST attribute-list declaration defines
attributes for a particular element
15.5 Document Type Definitions
(DTDs) (3)
• Internal DTD
<?xml version="1.0"?>
<!DOCTYPE note [
<!ELEMENT note (to,from,heading,body)>
<!ELEMENT to (#PCDATA)>
<!ELEMENT from (#PCDATA)>
<!ELEMENT heading (#PCDATA)>
<!ELEMENT body (#PCDATA)>
]>
<note>
<to>Tove</to>
<from>Jani</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend</body>
</note>
15.5 Document Type Definitions
(DTDs) (4)
• External DTD
<?xml version="1.0"?>
<!DOCTYPE note SYSTEM "note.dtd">
<note>
<to>Tove</to>
<from>Jani</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend!</body>
</note>
15.5 Document Type Definitions
(DTDs) (5)
• In ELEMENT, when children are declared in a sequence separated by
commas, the children must appear in the same sequence in the document
<!ELEMENT note (to, from, heading, body)>
<!ELEMENT to (#PCDATA)>
<!ELEMENT from (#PCDATA)>
<!ELEMENT heading (#PCDATA)>
<!ELEMENT body (#PCDATA)>
• PCDATA specifies that an element (e.g., to) may contain parsed
character data. Elements with parsed character data cannot contain
markup characters, such as less than (<), greater than (>) or ampersand
(&). Replace them with &lt;, &gt; and &amp;
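The replacement of markup characters can be sketched with Python's standard library, which is an assumption made for illustration. The escape function from xml.sax.saxutils performs the substitution, and the parser turns the entity references back into the original characters.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

raw = "if (a < b && b > c)"

# escape() replaces &, < and > with &amp;, &lt; and &gt;.
escaped = escape(raw)
print(escaped)  # if (a &lt; b &amp;&amp; b &gt; c)

# Parsed character data: the parser restores the original characters.
parsed = ET.fromstring(f"<body>{escaped}</body>").text
print(parsed == raw)  # True

# Without escaping, the same text is not well-formed.
try:
    ET.fromstring(f"<body>{raw}</body>")
    unescaped_ok = True
except ET.ParseError:
    unescaped_ok = False
print(unescaped_ok)  # False
```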
15.5 Document Type Definitions
(DTDs) (6)
• In ELEMENT,
• Declaring Only One Occurrence
<!ELEMENT note (message)>
• Minimum One Occurrence
<!ELEMENT note (message+)>
• Zero or More Occurrences
<!ELEMENT note (message*)>
• Declaring Zero or One Occurrences
<!ELEMENT note (message?)>
• Declaring either/or Content
<!ELEMENT note (to,from,header,(message|body))>
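The occurrence indicators (+, *, ?) and the choice bar (|) behave much like their regular-expression counterparts. The sketch below uses that analogy to test a sequence of child elements against a content model. It is an illustration only, not a DTD validator: it ignores #PCDATA, EMPTY, ANY and attributes, and the function name is hypothetical. Python's standard library has no validating parser, so a real check would need a validating tool.

```python
import re
import xml.etree.ElementTree as ET


def children_match_model(xml_text: str, model: str) -> bool:
    """Check an element's child-tag sequence against a DTD-style content model."""
    compact = re.sub(r"\s+", "", model)
    # Turn each element name into a token such as (?:message;) and drop
    # the commas, since a sequence is plain concatenation in a regex.
    pattern = re.sub(
        r"[A-Za-z_][\w.\-]*",
        lambda m: "(?:" + re.escape(m.group(0)) + ";)",
        compact,
    ).replace(",", "")
    children = "".join(child.tag + ";" for child in ET.fromstring(xml_text))
    return re.fullmatch(pattern, children) is not None


two = "<note><message/><message/></note>"
none = "<note/>"
print(children_match_model(two, "(message+)"))   # True
print(children_match_model(none, "(message+)"))  # False
print(children_match_model(none, "(message*)"))  # True
print(children_match_model(two, "(message?)"))   # False
```

For the either/or model, a note with to, from, header and body matches, while one containing both message and body does not.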
15.5 Document Type Definitions
(DTDs) (7)
• Attributes are declared with an ATTLIST declaration
• CDATA specifies that attribute type contains character data. A parser will
pass such data to an application without modification
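• Default declarations: #REQUIRED (the attribute must be supplied), #IMPLIED (the
attribute is optional), #FIXED value (the attribute always has the given value)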
<!ELEMENT square EMPTY>
<!ATTLIST square width CDATA "0">
<!ATTLIST contact fax CDATA #IMPLIED>
<!ATTLIST person number CDATA #REQUIRED>
<!ATTLIST sender company CDATA #FIXED "Microsoft">
• Enumerated Attribute Values
<!ATTLIST payment type (check|cash) "cash">
15.5 Document Type Definitions
(DTDs) (8)
<person sex="female">
<firstname>Anna</firstname>
<lastname>Smith</lastname>
</person>
<person>
<sex>female</sex>
<firstname>Anna</firstname>
<lastname>Smith</lastname>
</person>
• attributes cannot contain multiple values (child elements can)
• attributes are not easily expandable (for future changes)
• attributes cannot describe structures (child elements can)
• attributes are more difficult to manipulate by program code
• attribute values are not easy to test against a DTD
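A short sketch contrasting the two person records in code, assuming Python's standard xml.etree.ElementTree module. The phone element and attribute are hypothetical additions used only to illustrate the first point in the list.

```python
import xml.etree.ElementTree as ET

as_attribute = ET.fromstring(
    '<person sex="female"><firstname>Anna</firstname>'
    "<lastname>Smith</lastname></person>"
)
as_element = ET.fromstring(
    "<person><sex>female</sex><firstname>Anna</firstname>"
    "<lastname>Smith</lastname></person>"
)

# Both forms carry the same information, read through different calls.
print(as_attribute.get("sex"))     # female
print(as_element.findtext("sex"))  # female

# A child element may repeat (hypothetical <phone> element) ...
multi = ET.fromstring("<person><phone>111</phone><phone>222</phone></person>")
phones = [p.text for p in multi.findall("phone")]
print(phones)  # ['111', '222']

# ... but repeating an attribute on one element is not well-formed.
try:
    ET.fromstring('<person phone="111" phone="222"/>')
    duplicate_attribute_ok = True
except ET.ParseError:
    duplicate_attribute_ok = False
print(duplicate_attribute_ok)  # False
```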
15.5 Document Type Definitions
(DTDs) (9)
<note date="12/11/2002">
<to>Tove</to>
<from>Jani</from>
<heading>Reminder</heading>
<body>Don't forget me this
weekend!</body>
</note>
<note>
<date>12/11/2002</date>
<to>Tove</to>
<from>Jani</from>
<heading>Reminder</heading>
<body>Don't forget me this
weekend!</body>
</note>
<note>
<date>
<day>12</day>
<month>11</month>
<year>2002</year>
</date>
<to>Tove</to>
<from>Jani</from>
<heading>Reminder</heading>
<body>Don't forget me this
weekend!</body>
</note>
15.5 Document Type Definitions
(DTDs) (10)
• ENTITY: to define shortcuts to special characters
• Internal declaration
DTD:
<!ENTITY writer "Donald Duck.">
<!ENTITY copyright "Copyright W3Schools.">
XML:
<author>&writer;&copyright;</author>
• External declaration
DTD:
<!ENTITY writer SYSTEM "http://www.w3schools.com/entities.dtd">
<!ENTITY copyright SYSTEM "http://www.w3schools.com/entities.dtd">
XML:
<author>&writer;&copyright;</author>
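For the internal declaration, the expansion of entity references can be observed with a parser. The sketch below assumes Python's standard xml.etree.ElementTree module and places the two entity declarations in an internal DTD subset; wrapping them in a DOCTYPE for an author root element is an assumption made so the example is a complete document.

```python
import xml.etree.ElementTree as ET

DOC = """<?xml version="1.0"?>
<!DOCTYPE author [
<!ENTITY writer "Donald Duck.">
<!ENTITY copyright "Copyright W3Schools.">
]>
<author>&writer;&copyright;</author>"""

author = ET.fromstring(DOC)
# The parser substitutes each entity reference with its declared text.
expanded = author.text
print(expanded)
```

The application never sees the references themselves, only the replacement text.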
letter2.xml and letter.dtd
Document Type Definition (DTD) for a
business letter
15.7 XML Vocabularies
• XML allows authors to create their own tags to
describe data precisely
– People and organizations in various fields of study
have created many different kinds of XML for
structuring data
– Some of these markup languages are:
• MathML (Mathematical Markup Language)
– describes mathematical expressions for display
• Scalable Vector Graphics (SVG)
15.7 XML Vocabularies (2)
• Wireless Markup Language (WML)
• Extensible Business Reporting Language (XBRL)
• Extensible User Interface Language (XUL)
• Product Data Markup Language (PDML)
• W3C XML Schema
• Extensible Stylesheet Language (XSL)
Mathml2.mml
file:///H:/TP2543/textbookcode/ch15/Fig15_15/mathml2.mml
Firefox
15.8 Extensible Stylesheet Language
and XSL Transformations
• Convert XML into any text-based document
• XSL documents have the extension .xsl
• XSL is a group of three technologies:
– XSL-FO (XSL Formatting Objects): specifying
formatting
– XPath (XML Path Language): locating structures and
data (such as specific elements and attributes)
– XSLT (XSL Transformations): transforming the structure
of the XML document data to another structure
15.8 Extensible Stylesheet Language
and XSL Transformations (2)
• For example, XSLT allows you to convert a simple
XML document to an HTML5 document that
presents the XML document’s data (or a subset of
the data) formatted for display in a web browser
• Transforming an XML document using XSLT
involves two tree structures
– the source tree (i.e., the XML document to transform)
– the result tree (i.e., the XML document to create)
sports.xml, sports.xsl, style.css
http://test.deitel.com/iw3htp5/ch15/Fig15_18-19/sports.xml
sorting.xml, sorting.xsl, style.css
15.8 Extensible Stylesheet Language
and XSL Transformations (3)
• XPath character / (a forward slash)
– Selects the document root
– In XPath, a leading forward slash specifies that we are using absolute
addressing
– An XPath expression with no beginning forward slash uses relative addressing
• XPath @ symbol
– Retrieves an attribute’s value
• XPath name() function
– Retrieves the current node’s element name
• XPath text() node test
– Retrieves the text between an element’s start and end tags
• XPath expression //*
– Selects all the element nodes in an XML document
• Fig. 15.22 for XSL style-sheet elements
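A rough sketch of these selections in code, assuming Python's standard xml.etree.ElementTree module. That module supports only a subset of XPath and no XSLT, so paths are written relative to the root element, and attribute values, element names and text are read through the Element API rather than through @, name() and text(). The sample data is hypothetical and is not the content of sports.xml.

```python
import xml.etree.ElementTree as ET

SPORTS_XML = """<sports>
  <game id="783"><name>Cricket</name><paragraph>Played with a bat.</paragraph></game>
  <game id="239"><name>Baseball</name><paragraph>Played on a diamond.</paragraph></game>
</sports>"""

root = ET.fromstring(SPORTS_XML)

# Relative addressing: every element below the root (compare //*).
all_elements = root.findall(".//*")
print(len(all_elements))  # 6

# Attribute values (compare @id).
ids = [game.get("id") for game in root.findall("./game")]
print(ids)  # ['783', '239']

# Element names (compare name()).
tag_names = sorted({element.tag for element in all_elements})
print(tag_names)  # ['game', 'name', 'paragraph']

# Text between start and end tags (compare text()),
# selected with an attribute predicate.
selected = root.find("./game[@id='239']/name").text
print(selected)  # Baseball
```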
The End
Thank You