XML - mindproject

Canonical lexical representation:
A canonical lexical representation is a set of literals from among the valid set of literals for a
datatype such that there is a one-to-one mapping between literals in the canonical lexical
representation and values in the value space.
Canonical lexical form:
The canonical lexical form of a base64Binary data value is the base64 encoding of the value
which matches the Canonical-base64Binary production in the following grammar:
Canonical-base64Binary ::= (B64 B64 B64 B64)* ((B64 B64 B16 '=') | (B64 B04 '=='))?
B04 ::= [AQgw]
B16 ::= [AEIMQUYcgkosw048]
B64 ::= [A-Za-z0-9+/]
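The grammar above can be checked mechanically. The following is a minimal sketch, not a reference implementation: it transcribes the production into a regular expression and compares it with the output of the standard java.util.Base64 encoder. The class and method names are made up for illustration.

```java
import java.util.Base64;
import java.util.regex.Pattern;

public class CanonicalBase64Sketch {
    // Character classes copied from the grammar.
    private static final String B64 = "[A-Za-z0-9+/]";
    private static final String B16 = "[AEIMQUYcgkosw048]";
    private static final String B04 = "[AQgw]";

    // (B64 B64 B64 B64)* ((B64 B64 B16 '=') | (B64 B04 '=='))?
    private static final Pattern CANONICAL = Pattern.compile(
        "(?:" + B64 + "{4})*"
        + "(?:" + B64 + "{2}" + B16 + "=|" + B64 + B04 + "==)?");

    /** Returns true if the literal matches the Canonical-base64Binary production. */
    public static boolean isCanonical(String literal) {
        return CANONICAL.matcher(literal).matches();
    }

    public static void main(String[] args) {
        // The basic encoder emits no line breaks and pads with '='.
        String encoded = Base64.getEncoder().encodeToString(new byte[] {1});
        System.out.println(encoded + " canonical? " + isCanonical(encoded));
        // 'R' carries non-zero bits in the unused positions, so B04 rejects it.
        System.out.println("AR== canonical? " + isCanonical("AR=="));
    }
}
```

The B16 and B04 classes exist because the last character before the padding may only use the bits that actually belong to the data; any other character would encode the same bytes with stray bits set, which would break the one-to-one mapping.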
Data binding:
JAXB (Java Architecture for XML Binding):
o Binds Java objects to XML using an XML Schema;
o Works at the data (object) level;
Castor:
o Fairly new;
o Moving code from JAXB to Castor requires very little change;
o Does not need an XML Schema to achieve data binding;
o Provides JDO (Java Data Objects) capabilities
DOM:
Document Object Model, a platform- and language-independent standard object model for
representing HTML or XML and related formats.
Because the DOM supports navigation in any direction (e.g., parent and previous sibling) and
allows for arbitrary modifications, an implementation must at least buffer the document that
has been read so far (or some parsed form of it). Hence the DOM is likely to be best suited for
applications where the document must be accessed repeatedly or out of sequence order. If the
application is strictly sequential and one-pass, the SAX model is likely to be faster and use less
memory.
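To make the "any direction" point concrete, here is a small sketch using the DOM implementation that ships with the JDK. The sample document and the class name are invented for illustration. It walks the children of the root backwards via getPreviousSibling, which is only possible because the whole tree is held in memory.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class DomNavigationSketch {
    /** Parses the XML text into an in-memory DOM tree. */
    public static Document parse(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            return builder.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse document", e);
        }
    }

    /** Walks from the last child of the root to the first and joins the element names. */
    public static String childNamesInReverse(Document doc) {
        StringBuilder names = new StringBuilder();
        Node node = doc.getDocumentElement().getLastChild();
        for (; node != null; node = node.getPreviousSibling()) {
            if (node instanceof Element) {
                if (names.length() > 0) {
                    names.append(',');
                }
                names.append(node.getNodeName());
            }
        }
        return names.toString();
    }

    public static void main(String[] args) {
        Document doc = parse("<order><item/><price/><date/></order>");
        System.out.println(childNamesInReverse(doc));

        // Arbitrary modification: drop the first child, then walk again.
        Element root = doc.getDocumentElement();
        root.removeChild(root.getFirstChild());
        System.out.println(childNamesInReverse(doc));
    }
}
```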
JAXP:
The Java API for XML Processing, or JAXP (pronounced jaks-p), is one of the Java XML
programming APIs. This API specifies certain common tasks that the DOM and SAX
standards leave out. Specifically, creating parser objects is not defined by the DOM
or SAX standards, and DOM does not define turning features of those parsers on
and off. It provides the capability of validating and parsing XML documents. The two basic
parsing interfaces are:
* the Document Object Model parsing interface, or DOM interface
* the Simple API for XML parsing interface, or SAX interface
A third interface, StAX (the Streaming API for XML), was added in Java SE 6 ("Mustang"),
released in late 2006.
In addition to the parsing interfaces, the API provides an XSLT interface to provide data and
structural transformations on an XML document.
Namespace:
Ex. <customer_summary
xmlns:addr="http://www.xyz.com/addresses/"
xmlns:books="http://www.zyx.com/books/"
xmlns:mortgage="http://www.yyz.com/title/"
>
The string in a namespace declaration is just an identifier. Although it looks like a URL, it
is not used to fetch anything. URIs are used only because they are a convenient source of unique names.
NCName:
Represents XML "non-colonized" names, that is, names that do not contain a colon;
QName:
Qualified Name, QName represents XML qualified names. The ·value space· of QName is the set
of tuples {namespace name, local part}, where namespace name is an anyURI and local part is
an NCName:
QName ::= PrefixedName | UnprefixedName
PrefixedName ::= Prefix ':' LocalPart
UnprefixedName ::= LocalPart
Prefix ::= NCName
LocalPart ::= NCName
The Prefix provides the namespace prefix part of the qualified name, and MUST be associated
with a namespace URI reference in a namespace declaration. [Definition: The LocalPart provides
the local part of the qualified name.]
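The {namespace name, local part} tuple can be seen in the JDK's javax.xml.namespace.QName class. The sketch below reuses the addr namespace URI from the Namespace example; the local name "street" and the class name are assumptions made for illustration. Note that the prefix is carried along but does not take part in equality.

```java
import javax.xml.namespace.QName;

public class QNameSketch {
    // Namespace URI taken from the Namespace example; "street" is a made-up local part.
    static final String ADDR_NS = "http://www.xyz.com/addresses/";

    /** Builds the same qualified name under an arbitrary prefix. */
    public static QName street(String prefix) {
        return new QName(ADDR_NS, "street", prefix);
    }

    public static void main(String[] args) {
        QName a = street("addr");
        QName b = street("a");
        // Lexical form: Prefix ':' LocalPart
        System.out.println(a.getPrefix() + ":" + a.getLocalPart());
        // Value: {namespace name}local part
        System.out.println(a);
        // Equality compares namespace name and local part only, not the prefix.
        System.out.println(a.equals(b));
    }
}
```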
SAX:
The Simple API for XML (SAX) is a serial access parser API for XML. SAX provides a mechanism
for reading data from an XML document.
A parser which implements SAX (i.e., a SAX parser) functions as a stream parser, with an
event-driven API. The user defines a number of callback methods that will be called when events
occur during parsing.
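A minimal sketch of the callback style, using the SAX parser bundled with the JDK. The handler just records the order in which events arrive; the class name and the sample document are invented for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class SaxEventsSketch {
    /** Parses the XML text and returns the start/end/text events in arrival order. */
    public static List<String> events(String xml) {
        List<String> log = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                log.add("start:" + qName);
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                log.add("end:" + qName);
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                // A parser may deliver one text run in several chunks; each chunk is logged.
                String text = new String(ch, start, length).trim();
                if (!text.isEmpty()) {
                    log.add("text:" + text);
                }
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
        return log;
    }

    public static void main(String[] args) {
        // The parser drives the loop; our code only reacts to the callbacks.
        events("<order><item>pen</item></order>").forEach(System.out::println);
    }
}
```

Nothing is kept in memory unless the handler stores it, which is why this model suits strictly sequential, one-pass processing.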
Schema Validation:
The process of checking to see if an XML document conforms to a schema is called validation,
which is separate from XML's core concept of syntactic well-formedness. All XML documents
must be well-formed, but it is not required that a document be valid unless the XML parser is
"validating", in which case the document is also checked for conformance with its associated
schema.
Documents are only considered valid if they satisfy the requirements of the schema with which
they have been associated. These requirements typically include such constraints as:
* Elements and attributes that must/may be included, and their permitted structure;
* The structure is specified by regular expression syntax;
* How character data is to be interpreted, e.g. as a number, a date, a URL, a Boolean, etc.
XML Schema validation can be performed by a validating parser reached through an API such as
SAX or DOM, or during unmarshalling by a data binding framework such as JAXB.
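The difference between well-formed and valid can be tried out with the javax.xml.validation package in the JDK. This is only a sketch: the one-element schema and the class name are hypothetical, chosen to keep the example short.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

public class SchemaValidationSketch {
    // Hypothetical schema: a single <quantity> element holding a positive integer.
    static final String XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
        + "<xs:element name='quantity' type='xs:positiveInteger'/>"
        + "</xs:schema>";

    /** Returns true only if the document is well-formed and valid against XSD. */
    public static boolean isValid(String xml) {
        try {
            SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            Schema schema = factory.newSchema(new StreamSource(new StringReader(XSD)));
            Validator validator = schema.newValidator();
            // With no ErrorHandler set, errors and fatal errors are thrown as SAXException.
            validator.validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // not well-formed, or well-formed but not valid
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid("<quantity>3</quantity>"));   // well-formed and valid
        System.out.println(isValid("<quantity>-3</quantity>"));  // well-formed, not valid
        System.out.println(isValid("<quantity>3"));              // not even well-formed
    }
}
```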
StAX:
Streaming API for XML;
XOM:
XOM is a new XML object model. It is an open source (LGPL), object-oriented (that is, it uses
Java objects to represent the corresponding parts of an XML document), tree-based API for
processing XML with Java that strives for correctness, simplicity, and performance;
* XOM claims to be the only XML API that makes no compromises on correctness;
* strives for maximum simplicity;
* dual streaming/tree-based API:
Individual nodes in the tree can be processed while the document is still being built. This
enables XOM programs to operate almost as fast as the underlying parser can supply data.
You don't need to wait for the document to be completely parsed before you can start
working with it.
* memory efficient:
If you read an entire document into memory, XOM uses as little memory as possible. More
importantly, XOM allows you to filter documents as they're built so you don't have to build
the parts of the tree you aren't interested in. For instance, you can skip building text nodes
that only represent boundary white space, if such white space is not significant in your
application. You can even process a document piece by piece and throw away each piece
when you're done with it. XOM has been used to process documents that are gigabytes in
size.
* it depends on an underlying SAX parser to read the document;
* Prefer classes to interfaces:
o Interfaces (and the corresponding factory methods) are harder to use than classes (and
constructors);
o It is difficult for interface-based code to determine which class it is actually using. In
the ideal world, this shouldn’t matter. Any implementation of the interface should be able
to take the place of any other. In practice this simply isn’t true;
o Interfaces cannot verify constraints on an object. There is no way to assert in an
interface that the name of an element must be a legal XML 1.0 name, or that the text
content of an element cannot contain nulls and unmatched halves of surrogate pairs. You
must rely on the good faith of implementers not to violate such important preconditions.

XPath:
XPath (XML Path Language) is an expression language for addressing portions of an XML
document, or for computing values (strings, numbers, or boolean values) based on the content
of an XML document.
The XPath language is based on a tree representation of the XML document, and provides
the ability to navigate around the tree, selecting nodes by a variety of criteria.
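As a small illustration of both uses (addressing nodes and computing values), the sketch below evaluates a few XPath 1.0 expressions with the javax.xml.xpath API from the JDK. The library document and the class name are made up for the example.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

public class XPathSketch {
    // Hypothetical sample document.
    static final String XML =
        "<library>"
        + "<book year='1999'><title>A</title></book>"
        + "<book year='2005'><title>B</title></book>"
        + "</library>";

    /** Evaluates the expression against XML and returns the result converted to a string. */
    public static String evalString(String expression) {
        XPath xpath = XPathFactory.newInstance().newXPath();
        try {
            return xpath.evaluate(expression, new InputSource(new StringReader(XML)));
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("bad expression: " + expression, e);
        }
    }

    public static void main(String[] args) {
        // Select a node by a criterion on one of its attributes.
        System.out.println(evalString("/library/book[@year='2005']/title"));
        // Compute a number from the document.
        System.out.println(evalString("count(//book)"));
        // Compute a boolean from the document.
        System.out.println(evalString("boolean(//book[@year < 2000])"));
    }
}
```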
XPP:
XML Pull Parser;
XML parsing:

XML parser:
A piece of code that reads an XML document and analyzes its structure.
Parsers support different kinds of XML parsing APIs (that is, the parser implements the
interface defined by the standard), such as DOM and SAX;


The most common problems with the various XML APIs have been with namespaces.
XML APIs are too complicated, too simple, or both;

Five styles of XML API
o A push model: SAX and XNI (Xerces Native Interface)
It is called push because the parser pushes data at the client program;
The parser is in control;
Streaming parse;
Fast and easy to implement for parser vendors;
Uses very little memory;
o A pull model: StAX and XPP (AXIOM in Axis2 is built on top of StAX)
The client application asks the parser for the next piece of information whenever it
wants it, so the client application is in control of the parsing process;
Streaming parse;
Fast and memory efficient;
o A tree-based API: DOM, JDOM, DOM4J and XOM
The parser generates an object model, typically a tree with nodes for elements,
attributes, comments and so forth. This object is stored in memory;
Consumes more memory;
o A data binding API: JAXB and Castor
Similar to a tree API in that a tree of objects is built;
But instead of representing the XML elements themselves, the objects represent the concepts
the XML describes (so a book element might become a Book object);
o A query API: XPath, and XSLT through TrAX (Transformation API for XML)
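For the pull model, here is a minimal sketch using StAX (javax.xml.stream) as shipped with the JDK. The class name and the sample document are invented for illustration. In contrast to the push model, the loop belongs to the application, which asks for the next event only when it wants one.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class StaxPullSketch {
    /** Pulls events one at a time and collects the names of all start tags. */
    public static List<String> elementNames(String xml) {
        List<String> names = new ArrayList<>();
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
            // The client is in control: it could stop pulling at any point.
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                    names.add(reader.getLocalName());
                }
            }
            reader.close();
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("could not read document", e);
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(elementNames("<order><item/><price/></order>"));
    }
}
```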
XML Transformation:
The great challenge of a transformation API is how to deal with all the possible combinations of
inputs and outputs, without becoming specialized for any of the given types.
Transformations may be described by Java code, Perl code, XSLT Stylesheets, other types of
script, or by proprietary formats. The inputs, one or multiple, to a transformation, may be a URL,
XML stream, a DOM tree, SAX Events, or a proprietary format or data structure. The output
types are pretty much the same types as the inputs, but different inputs may need to be
combined with different outputs.
XML Information Set (Infoset):
An abstract data set whose purpose is to provide a consistent set of definitions for use in other
specifications that need to refer to the information in a well-formed XML document.
An XML document's information set consists of a number of information items; the
information set for any well-formed XML document will contain at least a document information
item and several others. An information item is an abstract description of some part of an XML
document: each information item has a set of associated named properties. In the Infoset
specification, the property names are shown in square brackets, [thus].
An XML document has an information set if it is well-formed and satisfies the namespace
constraints described in the Infoset specification. There is no requirement for an XML document
to be valid in order to have an information set.
An information set can contain up to eleven different types of information item:
1. There is exactly one document information item in the information set;
2. There is an element information item for each element appearing in the XML document;
3. There is an attribute information item for each attribute (specified or defaulted) of each
element in the document;
4. There is a processing instruction information item for each processing instruction in the
document;
5. There is an unexpanded entity reference information item for each entity reference that
the processor has not expanded;
6. There is a character information item for each data character that appears in the
document;
7. There is a comment information item for each XML comment in the original document,
except for those appearing in the DTD;
8. There is a single document type declaration information item if the document has a
document type declaration;
9. There is an unparsed entity information item for each unparsed general entity declared
in the DTD;
10. There is a notation information item for each notation declared in the DTD;
11. Each element in the document has a namespace information item for each namespace
that is in scope for that element.
Example:
Consider the following example XML document:
<?xml version="1.0"?>
<msg:message doc:date="19990421"
xmlns:doc="http://doc.example.org/namespaces/doc"
xmlns:msg="http://message.example.org/"
>Phone home!</msg:message>
The information set for this XML document contains the following information items:
* A document information item.
* An element information item with namespace name "http://message.example.org/", local
part "message", and prefix "msg".
* An attribute information item with the namespace name
"http://doc.example.org/namespaces/doc", local part "date", prefix "doc", and normalized value
"19990421".
* Three namespace information items for the http://www.w3.org/XML/1998/namespace,
http://doc.example.org/namespaces/doc, and http://message.example.org/ namespaces.
* Two attribute information items for the namespace attributes.
* Eleven character information items for the character data.
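Several of the properties listed above can be observed through a namespace-aware DOM parse of the same document. This is only an approximation: DOM is not the Infoset, and for instance it reports the two namespace declarations as ordinary attributes. The class name is invented for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Attr;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

public class InfosetSketch {
    static final String DOC_NS = "http://doc.example.org/namespaces/doc";

    // The example document from the text.
    static final String XML =
        "<?xml version=\"1.0\"?>"
        + "<msg:message doc:date=\"19990421\""
        + " xmlns:doc=\"http://doc.example.org/namespaces/doc\""
        + " xmlns:msg=\"http://message.example.org/\""
        + ">Phone home!</msg:message>";

    /** Parses the example with namespace processing on and returns the root element. */
    public static Element root() {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // needed for namespace name and local part
            return factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(XML)))
                .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse example", e);
        }
    }

    public static void main(String[] args) {
        Element message = root();
        System.out.println("namespace name: " + message.getNamespaceURI());
        System.out.println("local part:     " + message.getLocalName());
        System.out.println("prefix:         " + message.getPrefix());

        Attr date = message.getAttributeNodeNS(DOC_NS, "date");
        System.out.println("attribute:      " + date.getPrefix() + ":" + date.getLocalName()
            + " = " + date.getValue());

        // One character information item per character of "Phone home!".
        System.out.println("characters:     " + message.getTextContent().length());
    }
}
```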
XML wire format:
An XML wire format describes the physical representation of a message that can be parsed as
XML, that is, a message written according to the standards given in the W3C Extensible Markup
Language (XML) specification. The wire format defines information that is used to parse or
write XML messages in a runtime environment such as a broker.
XML Schema:
1. complexType and simpleType:
In XML Schema, there is a basic difference between complex types which allow
elements in their content and may carry attributes, and simple types which cannot have
element content and cannot carry attributes.
simpleType:
We use the simpleType element to define and name the new simple type. We use the
restriction element to indicate the existing (base) type and the facets that constrain it. The
example below restricts xsd:integer to the range 10000 to 99999 with the minInclusive and
maxInclusive facets. (List types take a different set of facets: length, minLength, maxLength,
pattern and enumeration. union is a way of deriving a type, not a facet.)
<xsd:simpleType name="myInteger">
  <xsd:restriction base="xsd:integer">
    <xsd:minInclusive value="10000"/>
    <xsd:maxInclusive value="99999"/>
  </xsd:restriction>
</xsd:simpleType>
2. restriction:
A. For simple types, use the simpleType element to define and name the new type,
the restriction element to indicate the existing (base) type, and "facets" to
constrain the range of values.
B. Restriction of complex types is conceptually the same as restriction of simple
types, except that the restriction of complex types involves a type's declarations
rather than the acceptable range of a simple type's values. A complex type
derived by restriction is very similar to its base type, except that its declarations
are more limited than the corresponding declarations in the base type. In fact,
the values represented by the new type are a subset of the values represented
by the base type (as is the case with restriction of simple types);
3. group, sequence, choice and all:
group: represents a named set of elements that can be reused in other groups and types;
sequence: the child elements must appear in the order in which they are declared;
choice: allows only one of its children to appear in an instance;
all: All the elements in the group may appear once or not at all, and they may appear in
any order.
4. simpleContent, mixed content, empty content and complexContent:

simpleContent: unlike a simpleType, the type can carry attributes, but its content
must still be a simple type (no child elements);
<xsd:element name="internationalPrice">
  <xsd:complexType>
    <xsd:simpleContent>
      <xsd:extension base="xsd:decimal">
        <xsd:attribute name="currency" type="xsd:string"/>
      </xsd:extension>
    </xsd:simpleContent>
  </xsd:complexType>
</xsd:element>

mixed content: character data and child elements can appear together;
<letterBody>
<salutation>Dear Mr.<name>Robert Smith</name>.</salutation>
Your order of <quantity>1</quantity> <productName>Baby
Monitor</productName> shipped from our warehouse on
<shipDate>1999-05-21</shipDate>. ....
</letterBody>
<xsd:element name="letterBody">
  <xsd:complexType mixed="true">
    <xsd:sequence>
      <xsd:element name="salutation">
        <xsd:complexType mixed="true">
          <xsd:sequence>
            <xsd:element name="name" type="xsd:string"/>
          </xsd:sequence>
        </xsd:complexType>
      </xsd:element>
      <xsd:element name="quantity" type="xsd:positiveInteger"/>
      <xsd:element name="productName" type="xsd:string"/>
      <xsd:element name="shipDate"
                   type="xsd:date"
                   minOccurs="0"/>
      <!-- etc. -->
    </xsd:sequence>
  </xsd:complexType>
</xsd:element>


empty content: the element has no content at all (it may still carry attributes);
complexContent: The complexContent element signals that we intend to restrict
or extend the content model of a complex type;
5. ref:
<xs:element name="book">
  <xs:complexType>
    <xs:sequence>
      <xs:element ref="title"/> <!-- title is declared globally elsewhere in the schema -->
    </xs:sequence>
  </xs:complexType>
</xs:element>
6. unique, key and keyref:

unique: The location of the xs:unique element in the schema gives the context
node in which the constraint holds. The selector picks the nodes that must be unique
and the field gives the value that identifies them.
<xs:unique name="charName">
  <xs:selector xpath="character"/>
  <xs:field xpath="name"/>
</xs:unique>

key: similar to xs:unique, except that the value must be present (it cannot be missing or nil)
<xs:key name="charName">
  <xs:selector xpath="character"/>
  <xs:field xpath="name"/>
</xs:key>

keyref: allows us to define a reference to an xs:key or an xs:unique
Ex: To indicate that friend-of needs to refer to a character from this same book, we
write the following at the same level as our key constraint:
<xs:keyref name="charNameRef" refer="charName"> <!-- refers to the charName key above -->
  <xs:selector xpath="character"/>
  <xs:field xpath="friend-of"/>
</xs:keyref>
7. import, include and redefine:
import: reuses definitions from another namespace
<xs:import namespace="http://www.w3.org/XML/1998/namespace"
           schemaLocation="myxml.xsd"/>
include: pulls in definitions from a schema document with the same target namespace (or
none). Because it is a plain inclusion, it does not allow the included definitions to be overridden;
<xs:include schemaLocation="character.xsd"/>
redefine: similar to xs:include, except that it lets you redefine declarations from the
included schema. The declarations that are redefined must be placed inside the
xs:redefine element.
<xs:redefine schemaLocation="character12.xsd">
  <xs:simpleType name="nameType">
    <xs:restriction base="xs:string">
      <xs:maxLength value="40"/>
    </xs:restriction>
  </xs:simpleType>
</xs:redefine>
8. abstract type and final type:
abstract: works much like an abstract class in OOP. An abstract element cannot appear in
an instance document; only the members of its substitution group can:
<xs:element name="name-elt" type="xs:string" abstract="true"/>
<xs:element name="name" type="xs:string"
            substitutionGroup="name-elt"/>
<xs:element name="surname" type="xs:string"
            substitutionGroup="name-elt"/>
Without an abstract head, the element surname can be used anywhere the element name is
allowed:
<xs:element name="name" type="xs:string"/>
<xs:element name="surname" type="xs:string"
            substitutionGroup="name"/>
final: prevents further derivation from a type. The value of final can be #all, restriction
or extension:
<xs:complexType name="characterType" final="#all">
  <xs:sequence>
    <xs:element name="name" type="nameType"/>
    <xs:element name="since" type="sinceType"/>
    <xs:element name="qualification" type="descType"/>
  </xs:sequence>
</xs:complexType>
A facet can likewise be fixed so that derived types cannot change it:
<xs:simpleType name="nameType">
  <xs:restriction base="xs:string">
    <xs:maxLength value="32" fixed="true"/>
  </xs:restriction>
</xs:simpleType>
9. Namespace:
<xs:schema targetNamespace="http://example.org/ns/books/"
           xmlns:xml="http://www.w3.org/XML/1998/namespace"
           xmlns:bk="http://example.org/ns/books/"
           xmlns:xs="http://www.w3.org/2001/XMLSchema"
           elementFormDefault="qualified"
           attributeFormDefault="unqualified">
  .../...
</xs:schema>
elementFormDefault and attributeFormDefault specify whether locally declared elements
and attributes are, by default, qualified (in the target namespace) or unqualified;
10. any and anyAttribute:
For instance, if we want to extend the definition of our description type to any
XHTML tag, we could declare:
<xs:complexType name="descType" mixed="true">
  <xs:sequence>
    <xs:any namespace="http://www.w3.org/1999/xhtml"
            processContents="skip" minOccurs="0"
            maxOccurs="unbounded"/>
  </xs:sequence>
</xs:complexType>
The xs:anyAttribute gives the same functionality for attributes.
11. XMLSchema instance(xsi):
Namespace: http://www.w3.org/2001/XMLSchema-instance
Used in xml instance document:
* xsi:noNamespaceSchemaLocation and xsi:schemaLocation attributes allow
you to tie a document to its W3C XML Schema;
* xsi:type, which lets you define the simple or complex type of an element;
* xsi:nil, which lets you specify a nil (null) value for an element (that has to be
defined as nillable in the schema using a nillable="true" attribute).
XSD:
XML Schema Definition;
XSLT:
Extensible Stylesheet Language Transformations (XSLT) is an XML-based language used for the
transformation of XML documents into other XML or "human-readable" documents.
The original document is not changed; rather, a new document is created based on the
content of an existing one. The new document may be serialized (output) by the processor in
standard XML syntax or in another format, such as HTML or plain text. XSLT is most often used
to convert data between different XML schemas, to convert XML data into HTML or XHTML
documents for (possibly dynamic) web pages, or to produce an intermediate XML format
that can be converted to PDF documents.
An XSLT processor reads both an input XML document and an XSLT stylesheet (which is itself
an XML document because XSLT is an XML application) and produces a result tree as output.
This result tree may then be serialized into a file or written onto a stream. Documents can be
transformed using a standalone program or as part of a larger program that communicates with
the XSLT processor through its API.
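The last case, a program talking to the XSLT processor through its API, might look like the sketch below. It uses the javax.xml.transform (TrAX) classes in the JDK; the stylesheet, the greeting element and the class name are all made up for illustration.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class XsltSketch {
    // Hypothetical stylesheet: turns <greeting to="..."/> into a line of plain text.
    static final String STYLESHEET =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='text'/>"
        + "<xsl:template match='/greeting'>Hello, <xsl:value-of select='@to'/>!</xsl:template>"
        + "</xsl:stylesheet>";

    /** Applies STYLESHEET to the input document and returns the serialized result. */
    public static String transform(String xml) {
        try {
            // The stylesheet is itself an XML document read from a Source.
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
            StringWriter out = new StringWriter();
            // The input is left untouched; the result tree is serialized to 'out'.
            transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalArgumentException("transformation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(transform("<greeting to='world'/>"));
    }
}
```

Source and Result are interfaces with stream, DOM and SAX variants, which is how the API copes with the many input and output combinations without specializing for any one of them.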