XML Lec. notes

ISOM
Standards in Information Management: XML
Arijit Sengupta
Learning Objectives
• Learn what XML is
• Learn the various ways in which XML is used
• Learn the key companion technologies
• See how XML is being used in industry as a meta-language
Agenda
• Overview
• Syntax and Structure
• The XML Alphabet Soup
• XML as a meta-language
Overview
What is XML?
• A tag-based meta-language
• Designed for structured data representation
• Represents data hierarchically (in a tree)
• Provides context to data (makes it meaningful)
  - Self-describing data
• Separates presentation (HTML) from data (XML)
• An open W3C standard
• A subset of SGML
  - vs. HTML, which is an application of SGML
Overview
What is XML?
• XML is a “use everywhere” data specification
[Figure: XML as the common format linking Application X with documents, a repository, a database, and configuration]
Overview
Documents vs. Data
• XML is used to represent two main types of things:
  - Documents: lots of text, with tags to identify and annotate portions of the document
  - Data: hierarchical data structures
Overview
XML and Structured Data
• Pre-XML representation of data:
"PO-1234","CUST001","X9876","5","14.98"
• XML representation of the same data:
<PURCHASE_ORDER>
  <PO_NUM> PO-1234 </PO_NUM>
  <CUST_ID> CUST001 </CUST_ID>
  <ITEM_NUM> X9876 </ITEM_NUM>
  <QUANTITY> 5 </QUANTITY>
  <PRICE> 14.98 </PRICE>
</PURCHASE_ORDER>
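The difference between the two representations can be sketched in a few lines of Python. This is only an illustration using the standard library; the function names `read_csv_order` and `read_xml_order` are hypothetical. The delimited record must be read by position, while the XML record can be read by name.

```python
import csv
import io
import xml.etree.ElementTree as ET

CSV_RECORD = '"PO-1234","CUST001","X9876","5","14.98"'

XML_RECORD = """<PURCHASE_ORDER>
  <PO_NUM> PO-1234 </PO_NUM>
  <CUST_ID> CUST001 </CUST_ID>
  <ITEM_NUM> X9876 </ITEM_NUM>
  <QUANTITY> 5 </QUANTITY>
  <PRICE> 14.98 </PRICE>
</PURCHASE_ORDER>"""


def read_csv_order(text):
    """Return the fields as a list; the reader must know what each position means."""
    return next(csv.reader(io.StringIO(text)))


def read_xml_order(text):
    """Return the fields as a dict keyed by tag name; the data describes itself."""
    root = ET.fromstring(text)
    return {child.tag: child.text.strip() for child in root}


csv_fields = read_csv_order(CSV_RECORD)
xml_fields = read_xml_order(XML_RECORD)

print(csv_fields[3])           # quantity, but only if you know it is the 4th field
print(xml_fields["QUANTITY"])  # quantity, identified by its tag
```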
Overview
Benefits of XML
• Open W3C standard
• Representation of data across heterogeneous environments
  - Cross-platform
  - Allows for a high degree of interoperability
• Strict rules
  - Syntax
  - Structure
  - Case sensitive
Overview
Who Uses XML?
• Submissions by
  - Microsoft
  - IBM
  - Hewlett-Packard
  - Fujitsu Laboratories
  - Sun Microsystems
  - Netscape (AOL), and others…
• Technologies using XML
  - SOAP, ebXML, BizTalk, WebSphere, many others…
Syntax and Structure
Components of an XML Document
• Elements
  - Each element has a beginning and an ending tag: <TAG_NAME>...</TAG_NAME>
  - Elements can be empty (<TAG_NAME />)
• Attributes
  - Describe an element; e.g. data type, data range, etc.
  - Can only appear on the beginning tag (or on an empty-element tag)
• Prologue and declarations
  - XML declaration with encoding specification (Unicode by default)
  - Processing instructions (e.g. a stylesheet reference)
  - Namespace declarations (written as xmlns attributes on elements)
  - Schema or DTD declaration
Syntax and Structure
Components of an XML Document
<?xml version="1.0" ?>
<?xml-stylesheet type="text/xsl" href="template.xsl"?>
<ROOT>
  <ELEMENT1><SUBELEMENT1 /><SUBELEMENT2 /></ELEMENT1>
  <ELEMENT2> </ELEMENT2>
  <ELEMENT3 type='string'> </ELEMENT3>
  <ELEMENT4 type='integer' value='9.3'> </ELEMENT4>
</ROOT>
• First two lines: Prologue (processing instructions)
• ELEMENT1, ELEMENT2: Elements
• ELEMENT3, ELEMENT4: Elements with Attributes
Syntax and Structure
Rules For Well-Formed XML
• There must be one, and only one, root element
• Sub-elements must be properly nested
  - A tag must end within the tag in which it was started
• Attributes are optional
  - Defined by an optional schema
• Attribute values must be enclosed in "" or ''
• Processing instructions are optional
• XML is case-sensitive
  - <tag> and <TAG> are not the same type of element
Syntax and Structure
Well-Formed XML?
<?xml version="1.0" ?>
<PARENT>
  <CHILD1>This is element 1</CHILD1>
  <CHILD2><CHILD3>Number 3</CHILD2></CHILD3>
</PARENT>
• No, CHILD2 and CHILD3 do not nest properly
Syntax and Structure
Well-Formed XML?
<?xml version="1.0" ?>
<PARENT>
  <CHILD1>This is element 1</CHILD1>
</PARENT>
<PARENT>
  <CHILD1>This is another element 1</CHILD1>
</PARENT>
• No, there are two root elements
Syntax and Structure
Well-Formed XML?
<?xml version="1.0" ?>
<PARENT>
  <CHILD1>This is element 1</CHILD1>
  <CHILD2/>
  <CHILD3></CHILD3>
</PARENT>
• Yes
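The three "Well-Formed XML?" questions can be checked mechanically. The sketch below assumes Python's standard `xml.etree.ElementTree` parser, which rejects any document that is not well-formed; the helper name `is_well_formed` and the constant names are hypothetical.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if the parser accepts the text, False on any syntax error."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


BAD_NESTING = """<?xml version="1.0" ?>
<PARENT>
  <CHILD1>This is element 1</CHILD1>
  <CHILD2><CHILD3>Number 3</CHILD2></CHILD3>
</PARENT>"""

TWO_ROOTS = """<?xml version="1.0" ?>
<PARENT>
  <CHILD1>This is element 1</CHILD1>
</PARENT>
<PARENT>
  <CHILD1>This is another element 1</CHILD1>
</PARENT>"""

GOOD = """<?xml version="1.0" ?>
<PARENT>
  <CHILD1>This is element 1</CHILD1>
  <CHILD2/>
  <CHILD3></CHILD3>
</PARENT>"""

# Case sensitivity: <tag> is not closed by </TAG>.
CASE_MISMATCH = "<tag>text</TAG>"

for name, doc in [("bad nesting", BAD_NESTING), ("two roots", TWO_ROOTS),
                  ("good", GOOD), ("case mismatch", CASE_MISMATCH)]:
    print(name, is_well_formed(doc))
```

Note that this only tests well-formedness; it says nothing about validity against a schema.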
Syntax and Structure
An XML Document
<?xml version='1.0'?>
<bookstore>
  <book genre='autobiography' publicationdate='1981' ISBN='1-861003-11-0'>
    <title>The Autobiography of Benjamin Franklin</title>
    <author>
      <first-name>Benjamin</first-name>
      <last-name>Franklin</last-name>
    </author>
    <price>8.99</price>
  </book>
  <book genre='novel' publicationdate='1967' ISBN='0-201-63361-2'>
    <title>The Confidence Man</title>
    <author>
      <first-name>Herman</first-name>
      <last-name>Melville</last-name>
    </author>
    <price>11.99</price>
  </book>
</bookstore>
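As an illustration of how an application might consume this document, the following sketch walks the tree with Python's standard `xml.etree.ElementTree`. The function name `summarize` and the shape of the returned records are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

BOOKSTORE = """<?xml version='1.0'?>
<bookstore>
  <book genre='autobiography' publicationdate='1981' ISBN='1-861003-11-0'>
    <title>The Autobiography of Benjamin Franklin</title>
    <author>
      <first-name>Benjamin</first-name>
      <last-name>Franklin</last-name>
    </author>
    <price>8.99</price>
  </book>
  <book genre='novel' publicationdate='1967' ISBN='0-201-63361-2'>
    <title>The Confidence Man</title>
    <author>
      <first-name>Herman</first-name>
      <last-name>Melville</last-name>
    </author>
    <price>11.99</price>
  </book>
</bookstore>"""


def summarize(xml_text):
    """Return one dict per <book>, mixing attribute and sub-element data."""
    root = ET.fromstring(xml_text)
    rows = []
    for book in root.findall("book"):
        author = book.find("author")
        rows.append({
            "title": book.findtext("title"),
            "author": f"{author.findtext('first-name')} {author.findtext('last-name')}",
            "genre": book.get("genre"),            # attribute on the start tag
            "price": float(book.findtext("price")),  # element content, converted by us
        })
    return rows


books = summarize(BOOKSTORE)
for row in books:
    print(row)
```

The conversion of `price` to a number happens in application code here; without a schema, XML itself treats every value as text.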
Syntax and Structure
Namespaces: Overview
• Part of XML’s extensibility
• Allow authors to differentiate between tags of the same name (using a prefix)
  - Frees the author to focus on the data and decide how best to describe it
  - Allows multiple XML documents from multiple authors to be merged
• Identified by a URI (Uniform Resource Identifier)
  - When a URL is used, it does NOT have to represent a live server
Syntax and Structure
Namespaces: Declaration
Namespace declaration examples:
xmlns:bk="http://www.example.com/bookinfo/"
xmlns:bk="urn:mybookstuff.org:bookinfo"
Parts of a declaration, using xmlns:bk="http://www.example.com/bookinfo/":
  - xmlns: namespace declaration
  - bk: prefix
  - "http://www.example.com/bookinfo/": URI (URL)
Syntax and Structure
Namespaces: Examples
<BOOK xmlns:bk="http://www.bookstuff.org/bookinfo">
  <bk:TITLE>All About XML</bk:TITLE>
  <bk:AUTHOR>Joe Developer</bk:AUTHOR>
  <bk:PRICE currency='US Dollar'>19.99</bk:PRICE>
</BOOK>
<bk:BOOK xmlns:bk="http://www.bookstuff.org/bookinfo"
         xmlns:money="urn:finance:money">
  <bk:TITLE>All About XML</bk:TITLE>
  <bk:AUTHOR>Joe Developer</bk:AUTHOR>
  <bk:PRICE money:currency='US Dollar'>19.99</bk:PRICE>
</bk:BOOK>
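A parser resolves each prefix to its URI, so the prefix itself is only a shorthand. The sketch below, which assumes Python's standard `xml.etree.ElementTree`, shows this for the second example; the short prefixes `b` and `m` in the lookup table are deliberately different from those in the document to make the point.

```python
import xml.etree.ElementTree as ET

DOC = """<bk:BOOK xmlns:bk="http://www.bookstuff.org/bookinfo"
         xmlns:money="urn:finance:money">
  <bk:TITLE>All About XML</bk:TITLE>
  <bk:AUTHOR>Joe Developer</bk:AUTHOR>
  <bk:PRICE money:currency="US Dollar">19.99</bk:PRICE>
</bk:BOOK>"""

root = ET.fromstring(DOC)

# ElementTree stores qualified names as "{namespace-uri}localname".
root_tag = root.tag

# The prefixes used for lookup are local to this code; only the URIs must match.
NS = {"b": "http://www.bookstuff.org/bookinfo", "m": "urn:finance:money"}
title = root.findtext("b:TITLE", namespaces=NS)
price = root.find("b:PRICE", NS)
currency = price.get("{urn:finance:money}currency")

print(root_tag)
print(title, price.text, currency)
```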
Syntax and Structure
Namespaces: Default Namespace
• An XML namespace declared without a prefix becomes the default namespace for all sub-elements
• All elements without a prefix will belong to the default namespace:
<BOOK xmlns="http://www.bookstuff.org/bookinfo">
  <TITLE>All About XML</TITLE>
  <AUTHOR>Joe Developer</AUTHOR>
</BOOK>
Syntax and Structure
Namespaces: Scope
• Unqualified elements belong to the inner-most default namespace.
  - BOOK, TITLE, and AUTHOR belong to the default book namespace
  - PUBLISHER and NAME belong to the default publisher namespace
<BOOK xmlns="www.bookstuff.org/bookinfo">
  <TITLE>All About XML</TITLE>
  <AUTHOR>Joe Developer</AUTHOR>
  <PUBLISHER xmlns="urn:publishers:publinfo">
    <NAME>Microsoft Press</NAME>
  </PUBLISHER>
</BOOK>
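The scoping rule can be confirmed by asking a parser which namespace each element ended up in. This is a small sketch assuming Python's standard `xml.etree.ElementTree`; the helper `split_tag` is hypothetical.

```python
import xml.etree.ElementTree as ET

DOC = """<BOOK xmlns="www.bookstuff.org/bookinfo">
  <TITLE>All About XML</TITLE>
  <AUTHOR>Joe Developer</AUTHOR>
  <PUBLISHER xmlns="urn:publishers:publinfo">
    <NAME>Microsoft Press</NAME>
  </PUBLISHER>
</BOOK>"""


def split_tag(tag):
    """Split ElementTree's "{uri}local" form into (uri, local)."""
    if tag.startswith("{"):
        uri, local = tag[1:].split("}", 1)
        return uri, local
    return None, tag


root = ET.fromstring(DOC)
namespace_of = {}
for element in root.iter():
    uri, local = split_tag(element.tag)
    namespace_of[local] = uri

for local, uri in namespace_of.items():
    print(f"{local:10} {uri}")
```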
Syntax and Structure
Namespaces: Attributes
• Unqualified attributes do NOT belong to any namespace
  - Even if there is a default namespace
• This differs from elements, which belong to the default namespace
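A short sketch of this asymmetry, assuming Python's standard `xml.etree.ElementTree`. The `edition` attribute is a hypothetical addition to the BOOK example, included only so there is an unqualified attribute to inspect.

```python
import xml.etree.ElementTree as ET

# "edition" is a made-up attribute used purely for illustration.
DOC = """<BOOK xmlns="http://www.bookstuff.org/bookinfo" edition="2">
  <TITLE>All About XML</TITLE>
</BOOK>"""

root = ET.fromstring(DOC)

element_name = root.tag               # qualified by the default namespace
attribute_names = list(root.attrib)   # left unqualified

print(element_name)
print(attribute_names)
```

The element name carries the default namespace URI, while the attribute name is reported without any namespace.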
Syntax and Structure
Entities
• Entities provide a mechanism for textual substitution, e.g.
  Entity    Substitution
  &lt;      <
  &amp;     &
• You can define your own entities
• Parsed entities can contain text and markup
• Unparsed entities can contain any data
  - JPEG photos, GIF files, movies, etc.
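Entity substitution can be observed with a parser. The sketch below assumes Python's standard `xml.etree.ElementTree` and `xml.sax.saxutils`; the entity name `pub` and the sample strings are hypothetical. It shows the predefined entities, a user-defined entity declared in an internal DTD subset, and the reverse step of escaping text before writing it into XML.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Predefined entities: &lt; and &amp; stand in for < and &.
PREDEFINED = "<NOTE>Price &lt; 20 &amp; in stock</NOTE>"

# A user-defined (parsed) entity, declared in the internal DTD subset.
CUSTOM = """<!DOCTYPE NOTE [
  <!ENTITY pub "Microsoft Press">
]>
<NOTE>Published by &pub;</NOTE>"""

predefined_text = ET.fromstring(PREDEFINED).text
custom_text = ET.fromstring(CUSTOM).text

# Going the other way: escape special characters before embedding text in XML.
escaped = escape("AT&T <rocks>")

print(predefined_text)
print(custom_text)
print(escaped)
```

Parsers that expand user-defined entities should only be given trusted input, since deeply nested entity definitions can be abused to exhaust memory.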
The XML ‘Alphabet Soup’
• XML itself is fairly simple
• Most of the learning curve is knowing about all of the related technologies
The XML ‘Alphabet Soup’
• XML (Extensible Markup Language): Defines XML documents
• Infoset (Information Set): Abstract model of XML data; definition of terms
• DTD (Document Type Definition): Non-XML schema
• XSD (XML Schema): XML-based schema language
• XDR (XML Data Reduced): An earlier XML schema
• CSS (Cascading Style Sheets): Allows you to specify styles
• XSL (Extensible Stylesheet Language): Language for expressing stylesheets; consists of XSLT and XSL-FO
• XSLT (XSL Transformations): Language for transforming XML documents
• XSL-FO (XSL Formatting Objects): Language to describe precise layout of text on a page
The XML ‘Alphabet Soup’
• XPath (XML Path Language): A language for addressing parts of an XML document, designed to be used by both XSLT and XPointer
• XPointer (XML Pointer Language): Supports addressing into the internal structures of XML documents
• XLink (XML Linking Language): Describes links between XML documents
• XQuery (XML Query Language, draft): Flexible mechanism for querying XML data as if it were a database
• DOM (Document Object Model): API to read, create and edit XML documents; creates in-memory object model
• SAX (Simple API for XML): API to parse XML documents; event-driven
• Data Island: XML data embedded in an HTML page
• Data Binding: Automatic population of HTML elements from XML data
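The contrast between DOM (in-memory tree) and SAX (event-driven) is easiest to see side by side. The sketch below uses the `xml.dom.minidom` and `xml.sax` modules from Python's standard library to extract the same value both ways; the function and class names are hypothetical.

```python
import xml.dom.minidom
import xml.sax

DOC = "<BOOK><TITLE>All About XML</TITLE><AUTHOR>Joe Developer</AUTHOR></BOOK>"


def titles_with_dom(text):
    """DOM: load the whole document into memory, then navigate the tree."""
    dom = xml.dom.minidom.parseString(text)
    return [node.firstChild.data for node in dom.getElementsByTagName("TITLE")]


class TitleHandler(xml.sax.ContentHandler):
    """SAX: react to start/characters/end events as the parser streams the input."""

    def __init__(self):
        super().__init__()
        self.titles = []
        self._buffer = None  # collects text only while inside a TITLE element

    def startElement(self, name, attrs):
        if name == "TITLE":
            self._buffer = []

    def characters(self, content):
        if self._buffer is not None:
            self._buffer.append(content)

    def endElement(self, name):
        if name == "TITLE":
            self.titles.append("".join(self._buffer))
            self._buffer = None


def titles_with_sax(text):
    handler = TitleHandler()
    xml.sax.parseString(text.encode("utf-8"), handler)
    return handler.titles


dom_titles = titles_with_dom(DOC)
sax_titles = titles_with_sax(DOC)
print(dom_titles, sax_titles)
```

DOM is convenient when the document fits in memory and needs editing; SAX keeps memory use low for large documents at the cost of managing state by hand.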
The XML ‘Alphabet Soup’
Schemas: Overview
• DTD (Document Type Definitions)
  - Not written in XML
  - No support for data types or namespaces
• XSD (XML Schema Definition)
  - Written in XML
  - Supports data types
  - Current standard recommended by W3C
The XML ‘Alphabet Soup’
Schemas: Purpose
• Define the “rules” (grammar) of the document
  - Data types
  - Value bounds
• An XML document that conforms to a schema is said to be valid
  - More restrictive than well-formed XML
• Define which elements are present and in what order
• Define the structural relationships of elements
The XML ‘Alphabet Soup’
Schemas: DTD Example
• XML document:
<BOOK>
  <TITLE>All About XML</TITLE>
  <AUTHOR>Joe Developer</AUTHOR>
</BOOK>
• DTD schema:
<!DOCTYPE BOOK [
  <!ELEMENT BOOK   (TITLE+, AUTHOR) >
  <!ELEMENT TITLE  (#PCDATA) >
  <!ELEMENT AUTHOR (#PCDATA) >
]>
The XML ‘Alphabet Soup’
Schemas: XSD Example
• XML document:
<CATALOG>
  <BOOK>
    <TITLE>All About XML</TITLE>
    <AUTHOR>Joe Developer</AUTHOR>
  </BOOK>
  …
</CATALOG>
The XML ‘Alphabet Soup’
Schemas: XSD Example
<xsd:schema id="NewDataSet" targetNamespace="http://tempuri.org/schema1.xsd"
            xmlns="http://tempuri.org/schema1.xsd"
            xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            xmlns:msdata="urn:schemas-microsoft-com:xml-msdata">
  <xsd:element name="book">
    <xsd:complexType>
      <xsd:all>
        <xsd:element name="title" minOccurs="0" type="xsd:string"/>
        <xsd:element name="author" minOccurs="0" type="xsd:string"/>
      </xsd:all>
    </xsd:complexType>
  </xsd:element>
  <xsd:element name="Catalog" msdata:IsDataSet="True">
    <xsd:complexType>
      <xsd:choice maxOccurs="unbounded">
        <xsd:element ref="book"/>
      </xsd:choice>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>
The XML ‘Alphabet Soup’
Schemas: Why You Should Use XSD
• Newest W3C standard
• Broad support for data types
• Reusable “components”
  - Simple data types
  - Complex data types
• Extensible
• Inheritance support
• Namespace support
• Ability to map to relational database tables
• XSD support in Visual Studio .NET
The XML ‘Alphabet Soup’
Transformations: XSL
• Language for expressing document styles
• Specifies the presentation of XML
  - More powerful than CSS
• Consists of:
  - XSLT
  - XPath
  - XSL Formatting Objects (XSL-FO)
The XML ‘Alphabet Soup’
Transformations: Overview
• XSLT – a language used to transform XML data into a different form (commonly XML or HTML)
[Figure: XSLT takes XML as input and produces XML, HTML, …]
The XML ‘Alphabet Soup’
Transformations: XSLT
• The language used for converting XML documents into other forms
• Describes how the document is transformed
• Expressed as an XML document (.xsl)
• Template rules
  - Patterns match nodes in the source document
  - Templates are instantiated to form part of the result document
• Uses XPath for querying, sorting, etc.
The XML ‘Alphabet Soup’
XPath (XML Path Language)
• General-purpose query language for identifying nodes in an XML document
• Declarative (vs. procedural)
• Contextual – the results depend on the current node
• Supports standard comparison, Boolean and mathematical operators (=, <, and, or, *, +, etc.)
The XML ‘Alphabet Soup’
XPath Operators
Operator   Usage Description
/          Child operator – selects only immediate children
           (when at the beginning of the pattern, context is root)
//         Recursive descent – selects elements at any depth
           (when at the beginning of the pattern, context is root)
.          Indicates current context
..         Selects the parent of the current node
*          Wildcard
@          Prefix to attribute name (when alone, it is an attribute wildcard)
[ ]        Applies filter pattern
The XML ‘Alphabet Soup’
XPath Query Examples
./author
    (finds all author elements within the current context)
/bookstore
    (finds the bookstore element at the root)
/*
    (finds the root element)
//author
    (finds all author elements anywhere in the document)
/bookstore[@specialty = "textbooks"]
    (finds all bookstores where the specialty attribute = "textbooks")
//book[@style = /bookstore/@specialty]
    (finds all books where the style attribute = the specialty attribute of the bookstore element at the root)
More XPath Examples
Path expression and result:
/bookstore/book[1]
    Selects the first book element that is the child of the bookstore element
/bookstore/book[last()]
    Selects the last book element that is the child of the bookstore element
/bookstore/book[last()-1]
    Selects the last but one book element that is the child of the bookstore element
/bookstore/book[position()<3]
    Selects the first two book elements that are children of the bookstore element
//title[@lang]
    Selects all the title elements that have an attribute named lang
//title[@lang='eng']
    Selects all the title elements that have an attribute named lang with a value of 'eng'
/bookstore/book[price>35.00]
    Selects all the book elements of the bookstore element that have a price element with a value greater than 35.00
/bookstore/book[price>35.00]/title
    Selects all the title elements of the book elements of the bookstore element that have a price element with a value greater than 35.00
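Several of these expressions can be tried with Python's standard `xml.etree.ElementTree`, which implements only a subset of XPath. In this sketch the sample bookstore (titles, prices, `lang` values) is hypothetical, paths are written relative to the root element (so `/bookstore/book[1]` becomes `./book[1]`), and the numeric comparison `price>35.00`, which the subset does not support, is done in Python instead.

```python
import xml.etree.ElementTree as ET

# Made-up sample data for illustration only.
DOC = """<bookstore specialty="textbooks">
  <book><title lang="eng">Learning XML</title><price>39.95</price></book>
  <book><title lang="eng">XML Basics</title><price>29.99</price></book>
  <book><title>Untagged Notes</title><price>49.00</price></book>
</bookstore>"""

root = ET.fromstring(DOC)  # root is the <bookstore> element

first = root.find("./book[1]/title").text              # /bookstore/book[1]
last = root.find("./book[last()]/title").text          # /bookstore/book[last()]
second_last = root.find("./book[last()-1]/title").text  # /bookstore/book[last()-1]

with_lang = [t.text for t in root.findall(".//title[@lang]")]       # //title[@lang]
eng = [t.text for t in root.findall(".//title[@lang='eng']")]       # //title[@lang='eng']

# /bookstore/book[price>35.00]/title, with the comparison done in Python.
expensive = [
    book.findtext("title")
    for book in root.findall("./book")
    if float(book.findtext("price")) > 35.00
]

print(first, last, second_last)
print(with_lang, eng, expensive)
```

A full XPath engine would evaluate all of the expressions in the table directly; the workaround above is only needed because of the standard library's limited subset.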
XPath Functions
• Accessor functions:
  - node-name, data, base-uri, document-uri
• Numeric value functions:
  - abs, ceiling, floor, round, …
• String functions:
  - compare, concat, substring, string-length, upper-case, lower-case, starts-with, ends-with, matches, replace, …
• Other functions include functions on boolean values, dates, nodes, etc.
The XML ‘Alphabet Soup’
Data Islands
• XML embedded in an HTML document
• Manipulated via client-side script or data binding
<XML id="XMLID">
  <BOOK>
    <TITLE>All About XML</TITLE>
    <AUTHOR>Joe Developer</AUTHOR>
  </BOOK>
</XML>
<XML id="XMLID" src="mydocument.xml"></XML>
The XML ‘Alphabet Soup’
Data Islands
• Can be embedded in an HTML SCRIPT element
• XML is accessible via the DOM:
<SCRIPT language="xml" id="XMLID">
<SCRIPT type="text/xml" id="XMLID">
<SCRIPT language="xml" id="XMLID" src="mydocument.xml">
The XML ‘Alphabet Soup’
XML-Based Applications
• Microsoft SQL Server
  - Retrieve relational data as XML
  - Query XML data
  - Join XML data with existing database tables
  - Update the database via XML Updategrams
  - New XML data type in SQL Server 2005
• Microsoft Exchange Server
  - XML is the native representation of many types of data
  - Used to enhance performance of UI scenarios (for example, Outlook Web Access (OWA))
XML as a Meta-Language
A Language to create Languages
[Figure: XML/DTD shown with its related technologies (SAX, DOM, CSS, DSSSL, XSL, XLL, XSLT, XSchema, XPath, XPointer, XQL) and XML-based languages (GO, CML, MathML, WML, BeanML)]
Gene Ontology (GO)
• Describes and manipulates information about the molecular function, biological process and cellular component of gene products
• Gene Ontology website:
  - http://www.geneontology.org
• GO DTD:
  - ftp://ftp.geneontology.org/pub/go/xml/dtd/go.dtd
• GO Browsers and tools:
  - http://www.geneontology.org/#tools
• GO Resources and samples:
  - http://www.geneontology.org/#annotations
Math ML
• Describes and manipulates mathematical notation
• MathML website
  - www.w3.org/Math
• MathML DTD
  - www.w3.org/Math/DTD
• MathML Browser
  - www.w3.org/Amaya
• MathML Resources
  - www.webeq.com/mathml (see sample documents here)
Chemical ML
• Represents molecular and chemical information
• CML website
  - www.xml-cml.org
• CML DTD
  - www.xml-cml.org/dtdschema/index.html
• CML Browser and Authoring Environment
  - www.xml-cml.org/jumbo.html
• CML Resources
  - www.xml-cml.org/chimeral/index.html
  - See sample documents here
  - Some require plug-in downloads and can be slow
Wireless ML
• Allows web pages to be displayed on mobile devices
• WML works with WAP to deliver the content
• Underlying model: a deck of cards that the user can sift through
• WAP/WML website
  - www.wapforum.org
• WML DTD
  - www.wapforum.org/DTD/wml_1.1.xml
• WAP/WML Resources
  - www.oasis-open.org/cover/wap-wml.html
  - www.w3scripts.com/wap (tutorial on WML; also see the WAP Demo)
Scalable Vector Graphics
• Describes vector graphics data for use over the web
• Rendering is done in the browser
• Lower bandwidth demands, easier scaling
• SVG website
  - www.w3.org/Graphics/SVG
• SVG Plug-Ins
  - www.adobe.com/svg
• SVG Resources
  - www.irt.org/articles/js176 (1999 article and good, brief tutorial)
  - planet.svg (an example from Deitel)
Bean ML
• Describes software components such as Java Beans
• Defines how the components are interconnected and can be used
• Bean ML Specs and Tools
  - www.alphaworks.ibm.com/aw.nsf/techmain/bml
• Bean ML Resources
  - www.oasis-open.org/cover/beanML.html
• With Bean ML
  - You can mark up beans using Bean ML
  - And invoke different operations on beans
  - Includes the BML Scripting Framework
XBRL
• Extensible Business Reporting Language
• Captures and represents financial and accounting information
• Used in a variety of situations
  - e.g. publishing reports, extracting data for analysis, regulatory forms, etc.
• Initiated under the direction of AICPA
• XBRL website
  - www.xbrl.org
• XBRL DTDs and Schemas
  - http://www.xbrl.org/Core/2000-07-31/default.htm
• Demos and Tools
  - http://www.xbrl.org/Demos/demos.htm
  - http://www.xbrl.org/Tools.htm
News ML
• Designed to be media-independent
• Initiated by the International Press Telecommunications Council
• Enables tracking of news stories over time
• NewsML website
  - www.newsml.org
• NewsML DTD
  - http://www.oasis-open.org/cover/newsML.html
• SportsML DTD – derived from the NewsML DTD
  - http://xml.coverpages.org/sportsML.html
cXML
• CommerceXML from Ariba plus 40 other companies
• cXML website
  - www.cxml.org
• Primary set of tools/implementations to support cXML
  - http://www.ariba.com/solutions/solutions_overview.cfm
  - See also the Whitepapers link explaining how these can be used for e-procurement, e-fulfillment, and others
xCBL
• xCBL from Microsoft, SAP, Sun
• xCBL website
  - www.xcbl.org
  - Marketed as an XML component library for B2B e-commerce
• Available Resources (see internal links)
  - DTDs and Schemas
  - XDK: SOX Parser and an XSLT Engine
  - Example Documents
ebXML
• UN/CEFACT: the United Nations body whose mandate covers worldwide policy and technical development in the area of trade facilitation and electronic business
  - www.uncefact.org
• ebXML website
  - www.ebxml.org
• Current Endorsements
  - http://www.ebxml.org/endorsements.htm
  - Still needs buy-in from the larger IS/IT vendors
• Related Effort: RosettaNet
  - http://www.rosettanet.org/rosettanet/Rooms/DisplayPages/LayoutInitial
  - Business processes for IT, component and chip companies
Conclusion
• Overview
• Syntax and Structure
• The XML Alphabet Soup
• XML as a meta-language
Resources
• http://www.xml.com/
• http://www.w3.org/xml/
• http://www.w3schools.com/
• http://msdn.microsoft.com/xml/