xml - University at Albany

advertisement
Extensible Markup Language
XML
MSI 602 – Spring 2003
Databases
Types
• Data is facts and figures
• Database is a related set of data
Kinds of databases
• Unstructured
– Meaning of data interpreted by user
•
Semi-Structured
– Structure of data wrapped around data
•
Structured
– Fixed structure of data
– Data added to the fixed structure
XML
Definition and Example
•
XML is a text based markup language that is fast becoming a
standard of data interchange
–
–
An open standard from W3C
A direct descendant from SGML
Example: Product Inventory Data
<Product>
<Name>Refrigerator</Name>
<Model Number>R3456d2h</Model Number>
<Manufacturer>General Electric</Manufacturer>
<Price>1290.00</Price>
<Quantity>1200</Quantity>
</Product>
XML
Data Interchange
•
•
XMLs key role is data interchange
Two business partners want to exchange customer data
–
–
•
Agree on a set of tags
Exchange data without having to change internal databases
Other business partners can participate by using the same
tagset
–
New tags can be added to extend the functionality
Key to successful data interchange is building
consensus and standardizing of tag sets
XML
Universal Data
 Universal Networking
 Universal Rendering
 Universal Code
 Universal Data
•
•
•
•
TCP/IP
HTML
Java
XML
•
Numerous standard bodies are set up for standardization of
tags in different domains
–
–
–
–
ebXML
XBRL
MML
CML
HTML vs. XML
Comparison
•
Both are markup languages
–
–
•
Usage
–
–
•
HTML tags specify how to display data
XML tags specify semantics of the data
Tag Interpretation
–
–
•
HTML has fixed set of tags
XML allows user to specify the tags based on requirements
HTML specifies what each tag and attribute means
XML tags delimit data & leave interpretation to the parsing application
Well formedness
–
–
HTML very tolerant of rule violations (nesting, matching tags)
XML very strictly follows rules of well formedness
XML
Structure
•
Prolog
–
–
•
Instructs the parser as to what it it parsing
Contains processing instructions for processor
Body
–
–
–
Tags
Attributes
Comments
- Entities
- Properties of Entities
- Statements for clarification in the document
Example
<?xml version=“1.0” encoding=“UTF-8” standalone=“yes” ?>
<contact>
<name>
<first name>Sanjay</first name>
<last name>Goel</last name>
</name>
<address>
<street>56 Della Street</street>
<city>Phoenix</city>
<state>AZ</state>
<zip>15784</zip>
</address>
</contact>
 Prolog
 Body
XML
Prolog
•
•
•
Syntax: <?xml version=“1.0” encoding=“UTF-8”
standalone=“yes” ?>
Contains eclaration that identifies a document as xml
Version
–
–
•
Encoding
–
–
•
Identifies the character set used to encode the data
Default compressed Unicode: UTF-8
Standalone
–
•
Version of XML markup language used in the data
Not optional
Tells whether or not this document references external entity
May contain entity definitions and tag specifications
XML Syntax
Elements & Attributes
•
•
Uses less-than and greater-than characters (<…>) as
delimiters
Every opening tag must having an accompanying closing tag
–
–
–
•
Tags can have attributes which must be enclosed in double
quotes
–
•
<name first=“Sanjay” last=“Goel”)
Elements should be properly nested
–
–
•
<First Name>Sanjay</First Name>
Empty tags do not require an accompanying closing tag.
Empty tags have a forward slash before the greater-than sign e.g.
<Name/>
The nesting can not be interleaved
Each document must have one single root element
Elements and attribute names are case sensitive
Tree Structure
Elements
•
XML documents have a tree structure containing multiple
levels of nested tags.
–
–
Root element is a single XML element which encloses all of the other
XML elements and data in the document
All other elements are children of the root element
<?xml version=“1.0” encoding=“UTF-8” standalone=“yes” ?>
<contact>
 Root Element
<name>
<first name>Sanjay</first name>
<last name>Goel</last name>
</name>
<address>
<street>56 Della Street</street>
 Child Elements
<city>Phoenix</city>
<state>AZ</state>
<zip>15784</zip>
</address>
</contact>
Attributes
Definition and Example
•
•
Attributes are properties associated with an element
Each attribute is a name value pair
–
–
No element may contain two attributes with same name
Name and value are strings
Example
<?xml version=“1.0” encoding=“UTF-8” standalone=“yes” ?>
<contact>
<name first=“Sanjay” last=“Goel”></name>
 Attributes
<address>
<street>56 Della Street</street>
 Nested Elements
<city>Phoenix</city>
<state>AZ</state>
<zip>15784</zip>
</address>
</contact>
Elements vs. Attributes
Comparison
•
•
Data should be stored in Elements
Information about data (meta-data) should be stored in
attributes
–
•
When in doubt use elements
Rules of thumb
–
–
–
Elements should have information which some one may want to read.
Attributes are appropriate for information about document that has
nothing to do with content of document
e.g. URLs, units, references, ids belong to attributes
What is your meta-data may be some ones data
Comments
Basics
•
XML comments begin with “<!--”and end with “-->”
–
–
•
•
•
All data between these delimiters is discarded
<!-- This is a list of names of people -->
Comments should not come before XML declaration
Comments can not be placed inside a tag
Comments may be used to hide and surround tags
<Name>
<first>Sanjay</first>
<!-- <last>Goel</last> -->
 Last tag is ignored
</Name>
•
“--” string may not occur inside a comment except as part of
its opening and closing tag
–
<!-- the Red door -- that is the second -->
 Illegal
Namespaces
Basics
•
XML documents come from different sources
– Combining elements from different sources can result in
name conflict
– Namespaces allow the interpreter to resolve the elements
•
Namespaces
– Declared within element start-tag using attribute xmlns
– Represented as an actual URI (since namespaces are
globally unique)
–
–
•
e.g. <Collection xmlns:book="http://www.mjyOnline.com/books"
xmlns:cd=http://www.mjyOnline.com/books>
Here book and cd are short hands for the full namespace name
Default namespace is used if no other namespace is defined
–
It does not have any prefix associated with it
Namespaces
Example
<?xml version="1.0"?>
<!-- File Name: Collection.xml -->
<COLLECTION
xmlns:book="http://www.mjyOnline.com/books"
xmlns:cd="http://www.mjyOnline.com/cds">
<ITEM Status="in">
<TITLE>The Adventures of Huckleberry
Finn</book:TITLE>
<AUTHOR>Mark Twain</book:AUTHOR>
<PRICE>$5.49</book:PRICE>
</ITEM>
<ITEM Status="in">
<TITLE>The Marble Faun</TITLE>
<AUTHOR>Nathaniel Hawthorne</AUTHOR>
<PRICE>$10.95</PRICE>
</ITEM>
<ITEM>
<ITEM Status="out">
<TITLE>Leaves of Grass</TITLE>
<AUTHOR>Walt Whitman</AUTHOR>
<PRICE>$7.75</PRICE>
</ITEM>
<ITEM Status="out">
<TITLE>The Legend of Sleepy Hollow</TITLE>
<AUTHOR>Washington Irving</AUTHOR>
<PRICE>$2.95</PRICE>
</ITEM>
<?xml version="1.0"?>
<!-- File Name: Collection.xml -->
<COLLECTION
<ITEM>
<TITLE>Violin Concertos Numbers 1, 2, and 3</TITLE>
<COMPOSER>Mozart</COMPOSER>
<PRICE>$16.49</PRICE>
</ITEM>
<TITLE>Violin Concerto in D</TITLE>
<COMPOSER>Beethoven</COMPOSER>
<PRICE>$14.95</PRICE>
</ITEM>
</COLLECTION>
Books and CDs are tracked in different files if combined will lead to conflicts
Namespaces
Example
<?xml version="1.0"?>
<!-- File Name: Collection.xml -->
<COLLECTION
xmlns:book="http://www.mjyOnline.com/books"
xmlns:cd="http://www.mjyOnline.com/cds">
<book:ITEM Status="in">
<book:TITLE>The Adventures of Huckleberry
Finn</book:TITLE>
<book:AUTHOR>Mark Twain</book:AUTHOR>
<book:PRICE>$5.49</book:PRICE>
</book:ITEM>
<cd:ITEM>
<cd:TITLE>Violin Concerto in D</cd:TITLE>
<cd:COMPOSER>Beethoven</cd:COMPOSER>
<cd:PRICE>$14.95</cd:PRICE>
</cd:ITEM>
<book:ITEM Status="out">
<book:TITLE>Leaves of Grass</book:TITLE>
<book:AUTHOR>Walt Whitman</book:AUTHOR>
<book:PRICE>$7.75</book:PRICE>
</book:ITEM>
<cd:ITEM>
<cd:TITLE>Violin Concertos Numbers 1, 2, and
3</cd:TITLE>
<cd:COMPOSER>Mozart</cd:COMPOSER>
<cd:PRICE>$16.49</cd:PRICE>
</cd:ITEM>
<book:ITEM Status="out">
<book:TITLE>The Legend of Sleepy
Hollow</book:TITLE>
<book:AUTHOR>Washington Irving</book:AUTHOR>
<book:PRICE>$2.95</book:PRICE>
</book:ITEM>
<book:ITEM Status="in">
<book:TITLE>The Marble Faun</book:TITLE>
<book:AUTHOR>Nathaniel Hawthorne</book:AUTHOR>
<book:PRICE>$10.95</book:PRICE>
</book:ITEM>
</COLLECTION>
Display XML
Style Sheets
•
•
A style sheet is a file that contains instructions for
rendering individual elements in an XML document
Two kinds of style sheets exist
– Cascading Style Sheets (CSS)
– Extensible Stylesheet language (XSLT)
•
Please refer to the following web site for
comprehensive information on style sheets
– http://www.w3schools.com/css/default.asp
Cascading Style Sheets
Example
<?xml version="1.0"?>
<!-- File Name: Inventory01.xml -->
<?xml-stylesheet type="text/css"
href="Inventory01.css"?>
<INVENTORY>
<BOOK>
<TITLE>The Adventures of Huckleberry
Finn</TITLE>
<AUTHOR>Mark Twain</AUTHOR>
<BINDING>mass market paperback</BINDING>
<PAGES>298</PAGES>
<PRICE>$5.49</PRICE>
</BOOK>
<BOOK>
<TITLE>Leaves of Grass</TITLE>
<AUTHOR>Walt Whitman</AUTHOR>
<BINDING>hardcover</BINDING>
<PAGES>462</PAGES>
<PRICE>$7.75</PRICE>
</BOOK>
<BOOK>
<TITLE>The Legend of Sleepy Hollow</TITLE>
<AUTHOR>Washington Irving</AUTHOR>
<BINDING>mass market paperback</BINDING>
<PAGES>98</PAGES>
<PRICE>$2.95</PRICE>
</BOOK>
<BOOK>
<TITLE>The Marble Faun</TITLE>
<AUTHOR>Nathaniel Hawthorne</AUTHOR>
<BINDING>trade paperback</BINDING>
<PAGES>473</PAGES>
<PRICE>$10.95</PRICE>
</BOOK>
<BOOK>
<TITLE>Moby-Dick</TITLE>
<AUTHOR>Herman Melville</AUTHOR>
<BINDING>hardcover</BINDING>
<PAGES>724</PAGES>
<PRICE>$9.95</PRICE>
</BOOK>
</INVENTORY>
Cascading Style Sheets
Example
/* File Name: Inventory02.css */
BOOK
{display:block;
margin-top:12pt;
font-size:10pt}
TITLE
{display:block;
font-size:12pt;
font-weight:bold;
font-style:italic}
AUTHOR
{display:block;
margin-left:15pt;
font-weight:bold}
BINDING
{display:block;
margin-left:15pt}
PAGES
{display:none}
PRICE
{display:block;
margin-left:15pt}
Cascading Style Sheets
Display
Formal Languages/Grammars
Basics
•
A formal language is a set of strings
–
–
–
•
It is characterized by a set of rules which determine which strings are
a part of the language and which are not
In case of programming languages, programs which compile are
grammatical corret (others are not)
In a natural language, like English, correct sentences follows rules of
the English language grammar
More precisely grammar a defines four things
–
–
–
–
A vocabulary out of which the strings are constructed (terminal
symbols)
Vocabulary that is used to formulate grammar rules (non terminal
symbols)
Grammar rules (productions), each of which has a lhs and a rhs
A designated start symbol
Validated XML Document
Basics
•
An XML document is valid if it conforms to the grammar of
the language
–
•
Two ways to specify the grammar of the language
–
–
•
Document Type Definition (DTD)
XML Schema
Why bother with the language grammar
–
–
–
•
Validity is different from well-formedness
It provides the blueprint of the language
Ensures that the data is interchangable
Eliminates processing errors in custom software which expects a
particular document content and structure
Validity of the document is checked by using a validator
Document Type Declaration
Basics
•
Document type declaration is a block of XML markup added
to the prolog of the document
–
–
•
It defines the content and structure of the language
–
•
Without a document type declaration or schema a document is merely
checked for well-formedness and not validity
Why bother with the language grammar
–
–
–
•
It has to follow the XML declaration
It has to be outside of other markup language
It provides the blueprint of the language
Ensures that the data is interchangable
Eliminates processing errors in custom software which expects a
particular document content and structure
The form of a document type declaration is:
–
–
–
<!DOCTYPE Name DTD>
DTD is document type definition
Name specifies the name of the document element
Document Type Definitions
Basics
•
Document type definition (DTD) consists of a series of
markup declarations enclosed in square brackets
<?xml version=“1.0” standalone=“yes”?>
<!DOCTYPE GREETING [
<!ELEMENT GREETING (#PCDATA)>
]>
<GREETING>
Hello XML!
</GREETING>
•
A DTD can also be stored separately from the XML document
and referenced in it.
Document Type Definitions
Syntax
• Element Type Declaration
–
–
–
•
Example:
–
•
Syntax: <!Element Name contentspec>
Name is the name of the element
contentspec is the content specification
<!Element Title (#PCDATA)>
Content specification can have four types of values
–
–
–
–
EMPTY content – Element must not have content
<!Element Image EMPTY>
ANY Content – Can contain any thing
<!Element misc ANY>
Element Content – Child elements but no character data
<!DOCTYPE BOOK [
<!ELEMENT BOOK (TITLE, AUTHOR)>
<!ELEMENT TITLE (#PCDATA)>
<!ELEMENT AUTHOR (#PCDATA)>
Mixed Content – character data and child elements interspersed
Element Content Specification
Types
• Content Specification indicates allowed child elements and
their order
–
•
If element has element content it can not contain any character data
Types of content specifications
–
–
Sequence: Indicates that each element must have a specific sequence
of child elements
Example
<!Doctype Mountain [
<!ELEMENT MOUNTAIN (NAME, HEIGHT, STATE)>
<!ELEMENT NAME (#PCDATA)
<!ELEMENT HEIGHT (#PCDATA)
<!ELEMENT STATE (#PCDATA)
]>
–
Valid XML
<MOUNTAIN>
<NAME>Wheeler</NAME>
<HEIGHT>13161</HEIGHT>
<STATE>New Mexico</STATE>
</MOUNTAIN>
Element Content Specification
Types
• Types of content specifications
–
–
–
Choice: Indicates that element can have one of a series of child
elements
Each element is separated by a | sign
Example
<!Doctype FILM [
<!ELEMENT FILM (STAR | NARRATOR | INSTRUCTOR)>
<!ELEMENT STAR (#PCDATA)>
<!ELEMENT NARRATOR (#PCDATA)>
<!ELEMENT INSTRUCTOR (#PCDATA)>
]>
–
Valid XML
<FILM>
<STAR>ROBERT REDFORD</STAR>
</FILM>
–
Invalid XML
<FILM>
<NARRATOR>Sir Gregory Parsloe</NARRATOR>
<INSTRUCTOR>Galahad Threepwood</INSTRUCTOR>
</FILM>
Element Content Specification
Number of Elements
•
Specifying the number of elements allowed
–
–
–
–
? zero or one
+ one or more
* zero or more
Example
<!Doctype Mountain [
<!ELEMENT MOUNTAIN (NAME+, HEIGHT?, STATE)>
<!ELEMENT NAME (#PCDATA)
<!ELEMENT HEIGHT (#PCDATA)
<!ELEMENT STATE (#PCDATA)
]>
–
Valid XML
<MOUNTAIN>
<NAME>Peublo Peak</NAME>
<NAME>Taos Mountain</NAME>
<STATE>New Mexico</STATE>
</MOUNTAIN>
Element Content Specification
Modification
•
Modifying a group of elements
– Example
<!Doctype FILM [
<!ELEMENT FILM (STAR | NARRATOR | INSTRUCTOR)+>
<!ELEMENT STAR (#PCDATA)>
<!ELEMENT NARRATOR (#PCDATA)>
<!ELEMENT INSTRUCTOR (#PCDATA)>
]>
– Valid XML
<FILM>
<NARRATOR>Sir Gregory Parsloe</NARRATOR>
<STAR>ROBERT REDFORD</STAR>
<NARRATOR>PLUG BASHMAN</NARRATOR>
</FILM>
Element Content Specification
Nesting
•
Nesting in specification
–
Example
<!Doctype FILM [
<!ELEMENT FILM TITLE, CLASS,(STAR | NARRATOR | INSTRUCTOR)+>
<!ELEMENT TITLE (#PCDATA)>
<!ELEMENT CLASS (#PCDATA)>
<!ELEMENT STAR (#PCDATA)>
<!ELEMENT NARRATOR (#PCDATA)>
<!ELEMENT INSTRUCTOR (#PCDATA)>
]>
–
Valid XML
<FILM>
<TITLE>The Net</TITLE>
<CLASS>Action</CLASS>
<STAR>Sandra Bullock</STAR>
</FILM>
Element Content Specification
Mixed Content Model
•
Mixed Content Model: Allows element to contain
–
–
–
•
Character Data
Child elements in any position and any frequency (zero or more
repetitions)
Child elements can be interspersed with data
Character data only
–
Example
<!ELEMENT TITLE (#PCDATA)>
•
Character data and elements
–
Example:
<!ELEMENT TITLE (#PCDATA | SUBTITLE)+>
<!ELEMENT SUBTITLE (#PCDATA)>
–
Valid XML
<TITLE>Moby Dick <SUBTITLE>Or, The Whale</SUBTITLE></TITLE>
<TITLE><SUBTITLE>Or, The Whale</SUBTITLE>Moby Dick</TITLE>
Attribute Specification
Basics
•
All attributes in the document need to be specified using an
attribute declaration list. It defines
–
–
–
•
Syntax: <!ATTLIST Name Attdefs>
–
–
•
Defines the name of the attribute
Defines the data type of each attribute
Specifies whether an attribute is required or noe
Name is the name of the element
Attdefs is a series of one or more attribute definitions
Attribute definition Syntax: Name AttType DefaultDecl
–
–
–
–
Name is the attribute name
AttType is the type of the attribute (CDATA, Token Type,
Enumerated)
DefaultDecl specifies if attribute is required & default values
Example:
<!ELEMENT FILM (TITLE, (STAR | NARRATOR | INSTRUCTOR))>
<!ATTLIST FILM Class CDATA “fictional” Year CDATA #REQUIRED>
Entity Specification
Types
•
There are two kinds of entities in XML documents1
– Character entities (referred by character unicode number)
– Named entities, referred to by name
XML Parsing
Definition and Types
•
•
An XML parser is a program that reads an XML document
and makes its contents available for processing
There are two standard types of parsers for XML
–
–
•
Document Object Model (DOM) which makes the document available
as a tree
Simple XML Parser (SAX) which associates an event with each tag
and each block of text
XML parsers are available from many vendors
–
–
–
–
Each vendor conforms to the standardized XML interfaces
One of the best parsers is the xerces parser
Suns API for XML parsing is JAXP (supports basic classes and
interfaces that a Java XML parser should support)
Often SAX parsers are used for writing DOM parsers
SAX Parser
Basics
•
As the parser scans the document it sends notifications of
events, for instance
–
–
–
•
Element start
Element end
Character sequence between two elements is found
SAX provides standard names for these callback functions that
are triggerd by these events
void characters (char[] ch, int start, int length): notification of character data
void startDocument(): notification of start of document
void endDocument(): notification of end of document
void startElement(String name, AttributeList atts): notification of start of element
void endElement(String name): notification of end of element
void processingInstruction(String target, String data): notification of processing
instruction
SAX Parser
Example
From professional JSP page 658
XSLT Parser
Definition and Uses
•
XSLT is an XML structure transforming language
– Any treee transforming language needs an ability to refer
to tree paths
– Xpath is the sub-language underneath XSLT for tree path
description
•
There are two scenarios for use of XSLT
– Browser contains an XSLT and uses it to render XML
documents
– XSLT is used for changing the structure of an existing
XML document
•
To run XSLT the following components are required
– Java 1.4 standard development kit
– James Clark’s xt (xt.jar)
XSLT Parser
Basics
•
•
XSLT style sheet is an XML document
Consists of two parts
–
–
•
Standard XML declaration including namespace declaractions
Top level elements that set up the general framework for the output,
e.g., variables or import parameters from the command line
Processing involves the following
–
–
–
–
–
A current list of nodes from the source document is created by
matching a pattern
Output to the current node is generated by instantiating a template
corresponding the current pattern
In process of transformation new nodes can be added to the list
The processing begins by processing a list containing the entire
document
Transformation ends when the node list is empty
XSLT Parser
Example
•
XSLT
Web Services
Web Services
Definition
•
Web Services are software programs that use XML to exchange
information with other software programs via common Internet
protocols.
–
–
–
•
Web services communicate over the network to provide specific
methods that other applications can invoke.
Thus applications residing on different computer can work
synergistically by invoking methods on each other
Http is the key protocol used for Web Services.
Characteristics
–
–
–
–
–
Programmable
Encapsulate a task
XML based data exchange allows programs on heterogenous platforms
to communicate (SOAP)
Self-describing (WSDL)
Discoverable (UDDI)
Web Services
SOAP
•
SOAP – Simple Object Access Protocol
–
–
–
•
•
SOAP consists of standardized XML schemas
Defines a format for transmitting XML messages over network
–
•
Enables data transfer between systems distributed over a network
A SOAP method send to the a Web Service invokes a method provided
by the service
Web Service may return the result via another SOAP message
Includes data types and message structure
Layered over an Internet protocol, such as HTTP and can be
used to transfer data across the Web and other networks
–
Http allows message transfer across firewall since Http messages are
usually accepted by firewalls
Web Services
SOAP
•
SOAP message consists of three parts
–
–
–
•
•
•
•
Envelope
Header
Body
Envelope wraps the entire message and contains header and
body
Header (optional) provides information on security and routing
Body contains application specific data that is being transferred
Other alternative to SOAP are XML-RPC
–
SOAP de facto standard due to simplicity, extensibility and
interoperability
Web Services
WSDL
•
•
WSDL – Web Services Description Language
Provides means to provide information about a web service
–
–
•
•
Instructions of its use
Capability of the service
Provides information on connection to the service and
communicate
Syntax is fairly complex
–
–
Normally created using automated tools
Not important to understand the precise syntax of WSDL while
developing web services
Web Services
UDDI
•
UDDI – Universal Description, Discovery and Integration
–
–
•
Allows developers and businesses to publish and locate web services on
a network via use of registries
The registries can be made private or public
Structure similar to a phone book
–
–
–
White pages contain contact information and textual description
Yellow pages provides classification information about companies and
details of company’s electronic capability
Green pages list technical data relating to services and business
processes
Download