Appendix A:
XML and XML Schema
Service-Oriented Computing: Semantics, Processes, Agents
– Munindar P. Singh and Michael N. Huhns, Wiley, 2005
Highlights of this Chapter

XML and Vocabularies
Well-Formedness
Namespaces and Qualified Names
XML Extensions
XML Schema
XML Query Languages
XPath
XSLT
Limitations
Brief Introduction to XML

Basics
Parsing
Storage
Transformations
Markup History

None
Ad hoc tags
SGML (Standard Generalized Markup Language): complex, few reliable tools
HTML (HyperText Markup Language): simple, unprincipled, mixes structure and display
XML (eXtensible Markup Language): simple, yet extensible subset of SGML to capture new vocabularies
  Machine processible
  Comprehensible to people: easier debugging
XML Basics and Namespaces
<?xml version="1.0"?> <!-- not part of the document per se -->
<arbitrary:toptag xmlns="http://one.default.namespace/if-needed"
    xmlns:arbitrary="http://wherever.it.might.be/arbit-ns"
    xmlns:random="http://another.one/random-ns">
  <arbitrary:atag attr1="v1" attr2="v2">
    Optional text also known as PCDATA
    <arbitrary:btag attr1="v1" attr2="v2" />
  </arbitrary:atag>
  <random:simple_tag/>
  <random:atag attr3="v3"/> <!-- compare with arbitrary:atag above -->
</arbitrary:toptag>
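To see how a parser resolves prefixes and the default namespace, here is a minimal Java sketch using the JDK's built-in DOM parser. The class name NamespaceDemo, the method expandedName, and the unprefixed element plain are assumptions made for illustration; the namespace URIs are the placeholders from the slide.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

final class NamespaceDemo {
    // A trimmed version of the slide's document; <plain/> is added to show the default namespace.
    static final String XML =
        "<arbitrary:toptag xmlns='http://one.default.namespace/if-needed'"
      + " xmlns:arbitrary='http://wherever.it.might.be/arbit-ns'"
      + " xmlns:random='http://another.one/random-ns'>"
      + "<arbitrary:atag attr1='v1'/>"
      + "<random:atag attr3='v3'/>"
      + "<plain/>"
      + "</arbitrary:toptag>";

    /** Returns "{namespace-URI}local-name" for the i-th child of the root element. */
    static String expandedName(int i) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // JAXP parsers ignore namespaces unless asked
            Element root = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(XML)))
                    .getDocumentElement();
            Element child = (Element) root.getChildNodes().item(i);
            return "{" + child.getNamespaceURI() + "}" + child.getLocalName();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        for (int i = 0; i < 3; i++) {
            System.out.println(expandedName(i));
        }
    }
}
```

The two atag elements share a local name but differ in namespace URI, so they are different qualified names; the unprefixed element falls into the default namespace.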
Parsing and Validating

An XML document maps to a parse tree.
  Each tag ends once: nesting structure (one root)
  Each attribute occurs at most once; quoted string
Well-formed XML documents can be parsed
Applications have an explicit or implicit syntax for their particular XML-based tags
  If explicit, may be expressed in DTDs and XML Schemas
    Best referred to definitions elsewhere
    XML Schemas, expressed in XML, are superior to DTDs
When docs are produced by external components, they should be validated
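The well-formedness rules above can be checked by simply asking a parser to build the tree. The following is a minimal Java sketch with the JDK's DOM parser; the class name WellFormedCheck and the method isWellFormed are hypothetical names chosen for this example.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

final class WellFormedCheck {
    /** Returns true if a parse tree can be built, false on a well-formedness error. */
    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors and keeps the parser from logging to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // e.g. a tag that never ends, a second root, a repeated attribute
        } catch (IOException | ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isWellFormed("<a><b/></a>"));      // properly nested, one root
        System.out.println(isWellFormed("<a><b></a>"));       // <b> is never closed
        System.out.println(isWellFormed("<a x='1' x='2'/>")); // attribute occurs twice
        System.out.println(isWellFormed("<a/><b/>"));         // two roots
    }
}
```

This checks only well-formedness; validity against a DTD or XML Schema is a separate step.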
XML Schema

A data definition language for XML: defines a notion of schema validity
  Same syntax as regular XML documents
  Local scoping of subelement names
  Incorporates namespaces
  Types
    Primitive (built-in): string, integer, float, date, …
    Primitive (built-in): ID (key), IDREF (foreign key)
    simpleType constructors: list, union
    Restrictions: intervals, lengths, enumerations, regex patterns
  Flexible ordering of elements
  Key and referential integrity constraints
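As an illustration of simpleType restrictions, the sketch below validates instances against a small inline schema using the JDK's javax.xml.validation API. The schema (a Song element with an enumerated genre and a rating limited to the interval 1 to 5) and the class name RestrictionDemo are assumptions invented for this example, not taken from the slides.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

final class RestrictionDemo {
    // Hypothetical schema: an enumeration and an interval restriction on attribute values.
    static final String XSD =
        "<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema'>"
      + "<xsd:simpleType name='Genre'>"
      + "<xsd:restriction base='xsd:string'>"
      + "<xsd:enumeration value='jazz'/><xsd:enumeration value='rock'/>"
      + "</xsd:restriction>"
      + "</xsd:simpleType>"
      + "<xsd:simpleType name='Rating'>"
      + "<xsd:restriction base='xsd:integer'>"
      + "<xsd:minInclusive value='1'/><xsd:maxInclusive value='5'/>"
      + "</xsd:restriction>"
      + "</xsd:simpleType>"
      + "<xsd:element name='Song'>"
      + "<xsd:complexType>"
      + "<xsd:attribute name='genre' type='Genre' use='required'/>"
      + "<xsd:attribute name='rating' type='Rating'/>"
      + "</xsd:complexType>"
      + "</xsd:element>"
      + "</xsd:schema>";

    /** Returns true if xml is schema-valid with respect to XSD. */
    static boolean isValid(String xml) {
        try {
            SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            Schema schema = factory.newSchema(new StreamSource(new StringReader(XSD)));
            // With no ErrorHandler set, the validator throws on the first validity error.
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid("<Song genre='jazz' rating='3'/>"));
        System.out.println(isValid("<Song genre='polka'/>"));
    }
}
```

Because the complexType declares attributes but no content, Song is also an example of an empty element.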
XML Schema: complexType

Specifies types of elements with structure:
  Must use a compositor if ≥ 1 subelements
  Subelements with types
  Min and max occurrences (default 1) of subelements
  Elements with text content not easy: ignore
  EMPTY elements: easy. Example?
XML Schema: Compositors

Sequence: ordered
  Can occur within other compositors
  Allows varying min and max occurrence
All: unordered
  Must occur directly below root element
  Max occurrence of each element is 1
Choice: exclusive or
  Can occur within other compositors
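The three compositors can be compared side by side with a small validation sketch. The schema below (a Song whose content is a sequence containing a nested choice, and a Label whose content is an all group) is a hypothetical example, as is the class name CompositorDemo; only standard JDK validation calls are used.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

final class CompositorDemo {
    static final String XSD =
        "<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema'>"
      + "<xsd:element name='Song'><xsd:complexType>"
      + "<xsd:sequence>"                                   // ordered
      + "<xsd:element name='title' type='xsd:string'/>"
      + "<xsd:element name='artist' type='xsd:string' maxOccurs='unbounded'/>"
      + "<xsd:choice>"                                     // exactly one of the two
      + "<xsd:element name='cd' type='xsd:string'/>"
      + "<xsd:element name='vinyl' type='xsd:string'/>"
      + "</xsd:choice>"
      + "</xsd:sequence>"
      + "</xsd:complexType></xsd:element>"
      + "<xsd:element name='Label'><xsd:complexType>"
      + "<xsd:all>"                                        // any order, each at most once
      + "<xsd:element name='name' type='xsd:string'/>"
      + "<xsd:element name='city' type='xsd:string'/>"
      + "</xsd:all>"
      + "</xsd:complexType></xsd:element>"
      + "</xsd:schema>";

    /** Returns true if xml is schema-valid with respect to XSD. */
    static boolean isValid(String xml) {
        try {
            SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(XSD)))
                    .newValidator()
                    .validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid(
            "<Song><title>t</title><artist>a</artist><artist>b</artist><cd>x</cd></Song>"));
        System.out.println(isValid("<Label><city>c</city><name>n</name></Label>"));
    }
}
```

Swapping title and artist breaks the sequence, supplying both cd and vinyl breaks the choice, and repeating name breaks the all group.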
XML Schema: Key Namespaces

http://www.w3.org/2001/XMLSchema
  Conventional prefix: xsd
  Terms for defining schemas: schema, element, attribute, …
  The tag schema has an attribute targetNamespace
http://www.w3.org/2001/XMLSchema-instance
  Conventional prefix: xsi
  Terms for use in instances: schemaLocation, nil
targetNamespace: user-defined
XML Schema Instance Doc
<Music xmlns="http://a.b.c/Muse"
    xmlns:xsi="the standard-xsi"
    xsi:schemaLocation="a-schema-as-a-URI
        a-schema-location-as-a-URL">
  …
</Music>
Define null values as <aTag xsi:nil="true"/>
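A null value written with xsi:nil is only valid if the schema declares the element nillable. The sketch below illustrates this with a hypothetical year element of type xsd:integer; the schema and the class name NilDemo are assumptions for this example.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

final class NilDemo {
    static final String XSI = "http://www.w3.org/2001/XMLSchema-instance";

    // Hypothetical schema: a single integer-valued element that may be nil.
    static final String XSD =
        "<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema'>"
      + "<xsd:element name='year' type='xsd:integer' nillable='true'/>"
      + "</xsd:schema>";

    /** Returns true if xml is schema-valid with respect to XSD. */
    static boolean isValid(String xml) {
        try {
            SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(XSD)))
                    .newValidator()
                    .validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid("<year>1959</year>"));
        System.out.println(isValid("<year xmlns:xsi='" + XSI + "' xsi:nil='true'/>"));
        System.out.println(isValid("<year/>")); // empty content is not an integer
    }
}
```

An empty year element without xsi:nil is rejected because the empty string is not an integer, which is why an explicit null marker is needed.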
Creating Schema Docs: 1
<schema xmlns="the-standard-xsd"
    targetNamespace="the-target">
  <include schemaLocation="part-one.xsd"/>
  <include schemaLocation="part-two.xsd"/>
  <!-- schemaLocation as in xsd, not xsi -->
</schema>
Included definitions go into the same namespace as the including schema.
Creating Schema Docs: 2

Use imports instead of include
  Specify namespaces from which schemas are to be imported
  Location of schemas not required and may be ignored if provided
Document Object Model (DOM)

Basis for parsing XML, which provides a node-labeled tree in its API
  Conceptually simple: traverse by requesting tag, its attribute values, and its children
  Processing program reflects document structure
  Can edit documents
  Inefficient for large documents: parses them first entirely to build the tree even if a tiny part is needed
DOM Example [Simeoni 2003]
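// d is an org.w3c.dom.Document produced by a DOM parser
Element s = d.getDocumentElement();
NodeList l = s.getElementsByTagName("member");
Element m = (Element) l.item(0);        // first member element
String code = m.getAttribute("code");   // getAttribute returns a String, not an int
NodeList kids = m.getChildNodes();
Node kid = kids.item(0);
// The cast assumes the first child is an element rather than whitespace text
String tagName = ((Element) kid).getTagName();
…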
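For a version of the traversal above that can be run as is, here is a minimal sketch. The staff document is an assumption made up for this example (the slide does not show the input), and the class name DomDemo is hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class DomDemo {
    // Hypothetical input document.
    static final String XML =
        "<staff>"
      + "<member code='42'><name>Ada</name><project>XML</project></member>"
      + "<member code='7'><name>Bob</name></member>"
      + "</staff>";

    /** Returns "code:tag" for the first member and its first element child. */
    static String firstMember() {
        try {
            // The whole document is parsed into a tree before any node is visited.
            Document d = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(XML)));
            Element s = d.getDocumentElement();
            NodeList l = s.getElementsByTagName("member");
            Element m = (Element) l.item(0);
            String code = m.getAttribute("code");
            // Skip text or comment nodes so the cast below is safe.
            Node kid = m.getFirstChild();
            while (kid != null && kid.getNodeType() != Node.ELEMENT_NODE) {
                kid = kid.getNextSibling();
            }
            String tagName = ((Element) kid).getTagName();
            return code + ":" + tagName;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(firstMember()); // prints 42:name
    }
}
```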
Simple API for XML (SAX)

Parser generates a sequence of events:
  startElement, endElement, …
Programmer implements these as callbacks
  More control for the programmer
  Processing program does not reflect document structure
SAX Example [Simeoni 2003]
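import java.io.CharArrayWriter;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;
class MemberProcess extends DefaultHandler {
  // Field declarations are not shown on the slide; these types are assumptions
  private final CharArrayWriter buffer = new CharArrayWriter();
  private String code, name;
  private boolean inProject;
  // n is the local name, which is only populated by a namespace-aware parser
  public void startElement(String uri, String n, String qName, Attributes attrs) {
    if (n.equals("member")) code = attrs.getValue("code");
    if (n.equals("project")) inProject = true;
    buffer.reset();
  }
  public void characters(char[] ch, int start, int length) {
    buffer.write(ch, start, length);  // assumed: collect text for endElement to read
  }
  public void endElement(String uri, String n, String qName) {
    if (n.equals("project")) inProject = false;
    if (n.equals("member") && !inProject)
      name = buffer.toString().trim();
  }
}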
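To show the callbacks being driven by a parser end to end, here is a small runnable sketch. It is not the slide's handler: the staff document, the class SaxDemo, and the choice to record "code=name" pairs are assumptions for illustration. It matches on qName so that it works with the JDK's default, non-namespace-aware SAX parser.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

final class SaxDemo {
    static final class MemberHandler extends DefaultHandler {
        final List<String> found = new ArrayList<String>();
        private final StringBuilder text = new StringBuilder();
        private String code;

        @Override
        public void startElement(String uri, String local, String qName, Attributes attrs) {
            if (qName.equals("member")) {
                code = attrs.getValue("code"); // remember state; no tree is kept for us
            }
            text.setLength(0);
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            text.append(ch, start, length); // may arrive in several chunks
        }

        @Override
        public void endElement(String uri, String local, String qName) {
            if (qName.equals("name")) {
                found.add(code + "=" + text.toString().trim());
            }
        }
    }

    /** Streams through xml once and returns "code=name" for every member. */
    static List<String> members(String xml) {
        try {
            MemberHandler handler = new MemberHandler();
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
            return handler.found;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(members(
            "<staff><member code='42'><name> Ada </name></member>"
          + "<member code='7'><name>Bob</name></member></staff>"));
    }
}
```

The handler has to carry its own state (code, text) between events, which is the sense in which the processing program does not reflect the document structure.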
Programming with XML

Current approaches concentrate on structure but ignore meaning
  Difficult to construct and maintain
  Treat everything as a string
  Inadequate type checking can hide errors
Emerging approaches (e.g., JAXB) provide superior binding from XML to programming languages
  Primitives such as unmarshal to materialize an object from XML
Uses of XML

Exchanging information across software components
Storing information in nonproprietary format
XML documents represent structured descriptions:
  Products, services, catalogs
  Contracts
  Queries, requests, invocations (as in SOAP)
Data-centric versus document-centric (irregular, heterogeneous data, depend on entire doc for app-specific meaning) views
Data-Centric View
<relation>
  <tuple><attr1>V11</attr1>…<attrn>V1n</attrn></tuple>
  …
  <tuple><attr1>Vm1</attr1>…<attrn>Vmn</attrn></tuple>
</relation>

Extract and store into DB via mapping to DB model
Regular, homogeneous tags
May be expensive if repeatedly parsed and instantiated
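The mapping from this regular shape to a relational model can be sketched as follows: each tuple becomes a row and each subelement a column. The class name RelationDemo and the column names a and b in the usage are assumptions for illustration; a real loader would go on to insert the rows into a database.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class RelationDemo {
    /** Maps each tuple element to a row of column-name/value pairs, in document order. */
    static List<Map<String, String>> rows(String xml) {
        try {
            Document d = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList tuples = d.getElementsByTagName("tuple");
            List<Map<String, String>> out = new ArrayList<Map<String, String>>();
            for (int i = 0; i < tuples.getLength(); i++) {
                Map<String, String> row = new LinkedHashMap<String, String>();
                NodeList cols = tuples.item(i).getChildNodes();
                for (int j = 0; j < cols.getLength(); j++) {
                    Node c = cols.item(j);
                    if (c.getNodeType() == Node.ELEMENT_NODE) { // skip whitespace text
                        row.put(c.getNodeName(), c.getTextContent());
                    }
                }
                out.add(row);
            }
            return out;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(rows(
            "<relation><tuple><a>1</a><b>x</b></tuple>"
          + "<tuple><a>2</a><b>y</b></tuple></relation>"));
    }
}
```

Every value comes back as a string and the whole document is re-parsed on each call, which illustrates the cost noted above when this is done repeatedly.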
Document-Centric View

Storing docs in DBs
  Use character large objects (CLOBs) within DB
  Store paths to external files containing docs
  Combine with some structured elements with search conditions for both structured elements and unstructured CLOBs or files
Heterogeneity also complicates mappings to traditional typed OO programming languages
Directions

Limitations of XML
  Doesn’t represent meaning
  Enables multiple representations for the same information; transform if models known
Trends: sophisticated approaches for
  Querying and manipulating XML, e.g., XSLT
  Binding to PLs and DBs
  Semantics, e.g., RDF, DAML, OWL, …
XML Query Languages

XPath
XPointer
XSLT
XQuery
XPath

Model XML documents as trees with nodes
  Elements
  Attributes
  Text (PCDATA)
  Comments
  Root node: above root of document
Achtung!

Parent in XPath is like parent as traditionally in computer science
Child in XPath is confusing:
  An attribute is not the child of its parent
  Makes a difference for certain kinds of recursion (e.g., apply-templates discussed in XSLT)
Our terminology is based on the traditional terminology:
  e-children, a-children, t-children
  Sets via et- or ta-, etc.
XPath Paths

Leading /: root
/: indicates walking down a tree
.: current node
..: parent node
@attr: to access values for the given attribute
text()
comment()
XPath Navigation

Select children according to position, e.g., [j], where j could be 1 … last()
Descendant-or-self operator, //
  .//elem finds all elems under the current node
  //elem finds all elems in the document
Ancestors: not needed in this course
Wildcard, *:
  collects e-children of the node where it is applied, but omits the t-children
  @*: finds all attribute values
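These navigation steps can be tried out with the JDK's javax.xml.xpath API. The sketch below evaluates each expression to a string against a small Music document; the document contents and the class name XPathNavDemo are assumptions made for this example.

```java
import java.io.StringReader;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

final class XPathNavDemo {
    // Hypothetical document; no whitespace between tags, so there are no stray text nodes.
    static final String XML =
        "<Music>"
      + "<Song genre='jazz' year='1959'><title>So What</title>"
      + "<group>Miles Davis Sextet</group></Song>"
      + "<Song genre='rock'><title>Kashmir</title><group>Led Zeppelin</group></Song>"
      + "</Music>";

    /** Evaluates expr against XML and returns the result converted to a string. */
    static String eval(String expr) {
        try {
            return XPathFactory.newInstance().newXPath()
                    .evaluate(expr, new InputSource(new StringReader(XML)));
        } catch (XPathExpressionException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(eval("/Music/Song[2]/title"));           // position
        System.out.println(eval("/Music/Song[last()]/@genre"));     // last() and @attr
        System.out.println(eval("count(//title)"));                 // // from the root
        System.out.println(eval("//title[.='Kashmir']/../@genre")); // . and ..
        System.out.println(eval("count(/Music/Song[1]/*)"));        // * gives e-children only
        System.out.println(eval("count(/Music/Song[1]/@*)"));       // @* gives all attributes
        System.out.println(eval("/Music/Song[1]/title/text()"));    // t-children
    }
}
```

When a path selects several nodes, the string result is the string value of the first one in document order; count() is used where the size of the node set is the point.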
XPath Queries

Incorporate selection conditions in XPath
  Attributes: //Song[@genre="jazz"]
  Elements: //Song[starts-with(.//group, "Led")]
  Existence of attribute: //Song[@genre]
  Existence of subelement: //Song[group]
  Boolean operators: and, or, not()
  Set operator: union (|); no others
  Comparison operators: >, <, …
  String functions: contains(), concat(), string-length(), …
  Aggregates: sum(), count()
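The selection conditions listed above can be exercised with a short sketch using javax.xml.xpath. The Music document and the class name XPathQueryDemo are assumptions invented for this example; the predicates mirror the ones on the slide.

```java
import java.io.StringReader;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

final class XPathQueryDemo {
    // Hypothetical document: only the first Song carries a year attribute.
    static final String XML =
        "<Music>"
      + "<Song genre='jazz' year='1959'><title>So What</title>"
      + "<group>Miles Davis Sextet</group></Song>"
      + "<Song genre='rock'><title>Kashmir</title><group>Led Zeppelin</group></Song>"
      + "</Music>";

    /** Evaluates expr against XML and returns the result converted to a string. */
    static String eval(String expr) {
        try {
            return XPathFactory.newInstance().newXPath()
                    .evaluate(expr, new InputSource(new StringReader(XML)));
        } catch (XPathExpressionException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String[] queries = {
            "//Song[@genre='jazz']/title",                    // attribute value
            "//Song[starts-with(.//group, 'Led')]/title",     // element content
            "count(//Song[@year])",                           // attribute exists
            "count(//Song[group])",                           // subelement exists
            "//Song[not(@year)]/title",                       // boolean function
            "count(//title | //group)",                       // union
            "//Song[@year > 1950]/title",                     // comparison
            "concat(//Song[2]/group, ': ', //Song[2]/title)", // string function
            "sum(//Song/@year)"                               // aggregate
        };
        for (String q : queries) {
            System.out.println(q + " -> " + eval(q));
        }
    }
}
```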
XPointer

Combines XPath with URLs
URL to get to a document; XPath to walk down the document
Can be used to formulate queries, e.g.,
  SongURL#xpointer(//Song[@genre="jazz"])
XSLT

A functional programming language
A stylesheet specifies transformations on a document
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="URL-to-dot-xsl"?> <!-- the sheet to use -->
<main-tag>
  …
</main-tag>
XSLT Stylesheets

Use the XSLT namespace, conventionally abbreviated as xsl
Includes primitives:
  <copy-of select="…">
  <for-each select="…">
  <if test="…">
  <choose>
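A stylesheet using for-each and if can be run from Java with the JDK's javax.xml.transform API. The sketch below is an illustration under assumptions: the stylesheet, the Music input, and the class name XsltDemo are invented for this example, and text output is chosen so the result is easy to read.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class XsltDemo {
    // Hypothetical stylesheet: list song titles, flagging the jazz ones.
    static final String XSL =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='text'/>"
      + "<xsl:template match='/'>"
      + "<xsl:for-each select='Music/Song'>"
      + "<xsl:value-of select='title'/>"
      + "<xsl:if test=\"@genre='jazz'\"> (jazz)</xsl:if>"
      + "<xsl:text>;</xsl:text>"
      + "</xsl:for-each>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    /** Applies XSL to xml and returns the text output. */
    static String transform(String xml) {
        try {
            Transformer t = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(XSL)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(transform(
            "<Music><Song genre='jazz'><title>So What</title></Song>"
          + "<Song genre='rock'><title>Kashmir</title></Song></Music>"));
        // prints So What (jazz);Kashmir;
    }
}
```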
XSLT Templates: 1

A pattern to specify where a given transform should apply
This match only works on the root:
<xsl:template match="/">
  …
</xsl:template>
Only anonymous templates in this course

XSLT Templates: 2

Can be applied recursively on the et-children via
<xsl:apply-templates/>

By default, if no other template matches, recursively apply to et-children of current node (ignores attributes) and to root:
<xsl:template match="*|/">
  <xsl:apply-templates/>
</xsl:template>

Can over-apply; to override the default, may need an empty template:
<xsl:template match="…"/> <!-- e.g., match all text() -->
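The over-application described above is easiest to see by running the same stylesheet with and without an empty text() template. In this sketch the stylesheets, the Music input, and the class name TemplateDemo are assumptions for illustration; the default rules themselves are built into every XSLT processor and need not be written out.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class TemplateDemo {
    static final String XML =
        "<Music><Song><title>So What</title><group>Miles Davis Sextet</group></Song>"
      + "<Song><title>Kashmir</title><group>Led Zeppelin</group></Song></Music>";

    // One explicit template; everything else falls through to the default rules.
    static final String TITLE_ONLY =
        "<xsl:template match='title'>[<xsl:value-of select='.'/>]</xsl:template>";

    // Same, plus an empty template that silences the default copying of text.
    static final String TITLE_NO_TEXT = TITLE_ONLY + "<xsl:template match='text()'/>";

    /** Wraps templates in a text-output stylesheet and applies it to XML. */
    static String run(String templates) {
        String xsl =
            "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
          + "<xsl:output method='text'/>" + templates + "</xsl:stylesheet>";
        try {
            StringWriter out = new StringWriter();
            TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(xsl)))
                    .transform(new StreamSource(new StringReader(XML)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(run(TITLE_ONLY));    // group text leaks through the defaults
        System.out.println(run(TITLE_NO_TEXT)); // only the bracketed titles remain
    }
}
```

In the first run, group has no matching template, so the default rule recurses into it and its text is copied to the output; the empty text() template in the second run suppresses that.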
XSLT Templates: 3

Subtleties of XSLT matching are beyond our scope
Discuss some examples
Appendix A Summary

XML enables information sharing
XML is well established
  Several aspects are worked out
  Lots of tools
  Works with databases and programming languages
XML provides a useful substrate for service-oriented computing