Canonical lexical representation: A canonical lexical representation is a set of literals, chosen from among the valid literals for a datatype, such that there is a one-to-one mapping between literals in the canonical lexical representation and values in the value space.

Canonical lexical form: The canonical lexical form of a base64Binary value is the base64 encoding of the value that matches the Canonical-base64Binary production in the following grammar:
    Canonical-base64Binary ::= (B64 B64 B64 B64)* ((B64 B64 B16 '=') | (B64 B04 '=='))?
    B04 ::= [AQgw]
    B16 ::= [AEIMQUYcgkosw048]
    B64 ::= [A-Za-z0-9+/]

Data binding:
JAXB (Java Architecture for XML Binding):
 o Binds Java objects to XML using an XML schema;
 o Works at the data (object) level.
Castor:
 o Fairly new;
 o Moving code from JAXB to Castor requires very few changes;
 o Does not need an XML schema to achieve data binding;
 o Provides JDO (Java Data Objects) capabilities.

DOM: The Document Object Model, a platform- and language-independent standard object model for representing HTML, XML and related formats. Because the DOM supports navigation in any direction (e.g., to the parent or the previous sibling) and allows arbitrary modifications, an implementation must at least buffer the document that has been read so far (or some parsed form of it). Hence the DOM is best suited to applications where the document must be accessed repeatedly or out of sequence. If the application is strictly sequential and one-pass, the SAX model is likely to be faster and use less memory.

JAXP: The Java API for XML Processing, or JAXP (pronounced "jaks-p"), is one of the Java XML programming APIs. It specifies certain common tasks that the DOM and SAX standards leave out: neither standard defines how to create parser objects, and DOM does not define how to turn parser features on and off. JAXP provides the capability of validating and parsing XML documents.
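As an illustration of JAXP's validation capability, here is a minimal sketch assuming the JDK's built-in javax.xml.validation implementation. The schema, the zip element and the SchemaCheck class are hypothetical; the schema's type restricts xsd:integer with minInclusive/maxInclusive facets.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

/** Sketch: schema validation through JAXP's javax.xml.validation API. */
final class SchemaCheck {
    // Hypothetical schema: a single <zip> element whose type restricts xsd:integer to 10000..99999.
    static final String XSD =
          "<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema'>"
        + "<xsd:simpleType name='myInteger'>"
        + "<xsd:restriction base='xsd:integer'>"
        + "<xsd:minInclusive value='10000'/>"
        + "<xsd:maxInclusive value='99999'/>"
        + "</xsd:restriction>"
        + "</xsd:simpleType>"
        + "<xsd:element name='zip' type='myInteger'/>"
        + "</xsd:schema>";

    /** Returns true if the instance document is well-formed and valid against XSD. */
    static boolean isValid(String xml) throws SAXException, IOException {
        // JAXP, not DOM or SAX, defines how the schema-aware machinery is created.
        SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        Schema schema = factory.newSchema(new StreamSource(new StringReader(XSD)));
        Validator validator = schema.newValidator();
        try {
            // With no ErrorHandler set, the validator throws on the first error.
            validator.validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(isValid("<zip>12345</zip>")); // within 10000..99999
        System.out.println(isValid("<zip>123</zip>"));   // violates minInclusive
    }
}
```

Because no ErrorHandler is installed, the Validator stops at the first error; to collect every error you would register one with setErrorHandler.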
The two basic JAXP parsing interfaces are:
 o the Document Object Model parsing interface, or DOM interface;
 o the Simple API for XML parsing interface, or SAX interface.
A third interface, StAX (the Streaming API for XML), was added in Java SE 6 (code-named Mustang), released in December 2006. In addition to the parsing interfaces, the API provides an XSLT interface for data and structural transformations of an XML document.

Namespace: Ex.
    <customer_summary
        xmlns:addr="http://www.xyz.com/addresses/"
        xmlns:books="http://www.zyx.com/books/"
        xmlns:mortgage="http://www.yyz.com/title/">
The string in a namespace declaration is just a string. Although it looks like a URL, it is not used as one: the parser never fetches it. URLs are used only because they are an easy way to obtain unique identifiers.

NCName: represents XML "non-colonized" names, i.e. names that contain no colon.

QName: Qualified Name. QName represents XML qualified names. The value space of QName is the set of tuples {namespace name, local part}, where namespace name is an anyURI and local part is an NCName:
    QName          ::= PrefixedName | UnprefixedName
    PrefixedName   ::= Prefix ':' LocalPart
    UnprefixedName ::= LocalPart
    Prefix         ::= NCName
    LocalPart      ::= NCName
The Prefix provides the namespace prefix part of the qualified name, and MUST be associated with a namespace URI reference in a namespace declaration. The LocalPart provides the local part of the qualified name.

SAX: The Simple API for XML (SAX) is a serial-access parser API for XML. SAX provides a mechanism for reading data from an XML document. A parser that implements SAX (i.e., a SAX parser) functions as a stream parser with an event-driven API: the user defines a number of callback methods that are called when events occur during parsing.

Schema validation: The process of checking whether an XML document conforms to a schema is called validation; it is separate from XML's core concept of syntactic well-formedness.
All XML documents must be well-formed, but a document is not required to be valid unless the XML parser is "validating", in which case the document is also checked for conformance with its associated schema. A document is considered valid only if it satisfies the requirements of the schema with which it has been associated. These requirements typically include constraints such as:
 * which elements and attributes must or may be included, and their permitted structure (usually specified with a regular-expression-like syntax);
 * how character data is to be interpreted, e.g. as a number, a date, a URL, a Boolean, etc.
XML Schema validation is performed by a schema-aware validating parser or validator, for example a validating SAX or DOM parser obtained through JAXP, or the JAXP javax.xml.validation API. JAXB is a data binding API rather than a parser, although it can validate while unmarshalling.

StAX: Streaming API for XML.

XOM: XOM is a new XML object model: an open source (LGPL), object-oriented (that is, it uses Java objects to represent the corresponding parts of an XML document), tree-based API for processing XML with Java that strives for correctness, simplicity, and performance.
 o Correctness: its authors describe XOM as the only XML API that makes no compromises on correctness;
 o Simplicity: it strives for maximum simplicity;
 o Dual streaming/tree-based: individual nodes in the tree can be processed while the document is still being built. This enables XOM programs to operate almost as fast as the underlying parser can supply data; you don't need to wait for the document to be completely parsed before you can start working with it;
 o Memory efficiency: if you read an entire document into memory, XOM uses as little memory as possible. More importantly, XOM lets you filter documents as they are built, so you don't have to build the parts of the tree you aren't interested in. For instance, you can skip building text nodes that only represent boundary white space, if such white space is not significant in your application. You can even process a document piece by piece and throw away each piece when you're done with it.
XOM has been used to process documents that are gigabytes in size. It depends on an underlying SAX parser to read the document.
Prefer classes to interfaces (one of XOM's design principles):
 o Interfaces (and the corresponding factory methods) are harder to use than classes (and constructors), and it is difficult for interface-based code to determine which class it is actually using. In an ideal world this wouldn't matter, since any implementation of an interface should be able to take the place of any other; in practice this simply isn't true.
 o Interfaces cannot verify constraints on an object. There is no way to assert in an interface that the name of an element must be a legal XML 1.0 name, or that the text content of an element cannot contain nulls or unmatched halves of surrogate pairs. You must rely on the good faith of implementers not to violate such important preconditions.

XPath: XPath (XML Path Language) is an expression language for addressing portions of an XML document, or for computing values (strings, numbers, or Boolean values) based on the content of an XML document. The XPath language is based on a tree representation of the XML document and provides the ability to navigate around the tree, selecting nodes by a variety of criteria.

XPP: XML Pull Parser.

XML parsing:
XML parser: a piece of code that reads an XML document and analyzes its structure. Parsers support different kinds of XML parsing API (that is, the parser implements the interface defined by a standard), such as DOM, SAX and so on. The most common problems with the various XML APIs have been with namespaces.
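Since namespaces are the usual source of trouble, here is a minimal sketch of a classic pitfall, assuming the JDK's built-in JAXP DOM parser: DocumentBuilderFactory is not namespace-aware by default, so DOM's namespace properties come back null. The catalog/book document and the NamespaceAwareness class are hypothetical; the namespace URI is the books one from the Namespace example.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

/** Sketch: the same document parsed with and without namespace awareness. */
final class NamespaceAwareness {
    // Hypothetical sample document using a prefixed namespace.
    static final String XML =
        "<books:catalog xmlns:books='http://www.zyx.com/books/'><books:book/></books:catalog>";

    static Element parseRoot(String xml, boolean namespaceAware) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(namespaceAware); // the JAXP default is false
        return factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getDocumentElement();
    }

    public static void main(String[] args) throws Exception {
        for (boolean aware : new boolean[] {false, true}) {
            Element root = parseRoot(XML, aware);
            // Without namespace awareness only the raw tag name "books:catalog" is meaningful.
            System.out.println("namespaceAware=" + aware
                    + " tagName=" + root.getTagName()
                    + " localName=" + root.getLocalName()
                    + " namespaceURI=" + root.getNamespaceURI());
        }
    }
}
```

Code that relies on getLocalName() or getNamespaceURI() should therefore call setNamespaceAware(true) explicitly rather than trust the factory default.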
XML APIs are too complicated, too simple, or both. There are five styles of XML API:
 o A push model, e.g. SAX and XNI (Xerces Native Interface). It is called push because the parser pushes data at the client program; the parser is in control. Streaming parse; fast and easy for parser vendors to implement; uses very little memory.
 o A pull model, e.g. StAX, XPP, and AXIOM in Axis2. The client application asks the parser for the next piece of information whenever it wants it, so it is the client application that controls the parsing process. Streaming parse; fast and memory efficient.
 o A tree-based API, e.g. DOM, JDOM, DOM4J and XOM. The parser generates an object model, typically a tree with nodes for elements, attributes, comments and so forth, and stores it in memory, so it consumes more memory than the streaming models.
 o A data binding API, e.g. JAXB and Castor. Similar to a tree API in that a tree of objects is built, but instead of representing the XML elements, the objects represent the concepts the XML represents (so a book element might become a Book object).
 o A query API, e.g. XPath, XQuery, or XSLT run through TrAX (Transformation API for XML).

XML Transformation: The great challenge of a transformation API is how to deal with all the possible combinations of inputs and outputs without becoming specialized for any particular type. Transformations may be described by Java code, Perl code, XSLT stylesheets, other types of script, or proprietary formats. The inputs to a transformation (one or several) may be a URL, an XML stream, a DOM tree, SAX events, or a proprietary format or data structure. The output types are much the same as the input types, but different inputs may need to be combined with different outputs.

XML Information Set (Infoset): an abstract data set whose purpose is to provide a consistent set of definitions for use in other specifications that need to refer to the information in a well-formed XML document.
An XML document's information set consists of a number of information items; the information set of any well-formed XML document contains at least a document information item and several others. An information item is an abstract description of some part of an XML document: each information item has a set of associated named properties. In the Infoset specification, property names are shown in square brackets, e.g. [children]. An XML document has an information set if it is well-formed and satisfies the Infoset's namespace constraints (in essence, it must be namespace-well-formed). There is no requirement for an XML document to be valid in order to have an information set.
An information set can contain up to eleven different types of information item:
 1. Document: there is exactly one document information item in the information set;
 2. Element: there is an element information item for each element appearing in the XML document;
 3. Attribute: there is an attribute information item for each attribute (specified or defaulted) of each element in the document;
 4. Processing instruction: there is a processing instruction information item for each processing instruction in the document;
 5. Unexpanded entity reference: there is an unexpanded entity reference information item for each reference to an external parsed general entity that the processor did not expand;
 6. Character: there is a character information item for each data character that appears in the document;
 7. Comment: there is a comment information item for each XML comment in the original document, except for those appearing in the DTD;
 8. Document type declaration: if the document has a document type declaration, there is exactly one document type declaration information item;
 9. Unparsed entity: there is an unparsed entity information item for each unparsed general entity declared in the DTD;
 10. Notation: there is a notation information item for each notation declared in the DTD;
 11. Namespace: each element in the document has a namespace information item for each namespace that is in scope for that element.
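The Infoset itself is abstract, but a parser API exposes many of its properties. A minimal sketch, assuming the JDK's built-in StAX implementation (a pull parser), reads the root element of a small namespaced document and reports its namespace name, local part, prefix, attribute value and number of characters. The InfosetPeek class and its summary format are illustrative only.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

/** Sketch: pulling infoset-like properties of the root element with StAX. */
final class InfosetPeek {
    static final String XML = "<?xml version='1.0'?>"
        + "<msg:message doc:date='19990421'"
        + " xmlns:doc='http://doc.example.org/namespaces/doc'"
        + " xmlns:msg='http://message.example.org/'>Phone home!</msg:message>";

    /** One-line summary of the first element (illustrative format). */
    static String describeRoot(String xml) throws Exception {
        XMLStreamReader r = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
        try {
            int event = r.next();
            while (event != XMLStreamConstants.START_ELEMENT) {
                event = r.next(); // the client pulls past prolog events to the root element
            }
            String summary = r.getNamespaceURI() + " " + r.getLocalName() + " " + r.getPrefix()
                    // StAX reports xmlns:* declarations separately from ordinary attributes
                    + " attrs=" + r.getAttributeCount()
                    + " nsDecls=" + r.getNamespaceCount()
                    + " date=" + r.getAttributeValue("http://doc.example.org/namespaces/doc", "date");
            // Character data of the element: one character information item per char
            return summary + " chars=" + r.getElementText().length();
        } finally {
            r.close();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(describeRoot(XML));
    }
}
```

nsDecls counts only the two declarations written on the element; the namespace bound to the xml prefix is always in scope as well, but it is implicit and never declared.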
Infoset example: Consider the following XML document:
    <?xml version="1.0"?>
    <msg:message doc:date="19990421"
        xmlns:doc="http://doc.example.org/namespaces/doc"
        xmlns:msg="http://message.example.org/"
    >Phone home!</msg:message>
The information set for this XML document contains the following information items:
 * A document information item.
 * An element information item with namespace name "http://message.example.org/", local part "message", and prefix "msg".
 * An attribute information item with the namespace name "http://doc.example.org/namespaces/doc", local part "date", prefix "doc", and normalized value "19990421".
 * Three namespace information items, for the http://www.w3.org/XML/1998/namespace (always in scope through the xml prefix), http://doc.example.org/namespaces/doc, and http://message.example.org/ namespaces.
 * Two attribute information items for the namespace attributes.
 * Eleven character information items for the character data ("Phone home!" is eleven characters long).

XML wire format: The XML wire format is the physical representation of a message that can be parsed as XML, that is, a message written according to the W3C Extensible Markup Language (XML) specification. The wire format defines the information used to parse or write XML messages in a runtime environment such as a broker.

XML Schema:
1. complexType and simpleType: In XML Schema there is a basic difference between complex types, which allow elements in their content and may carry attributes, and simple types, which cannot have element content and cannot carry attributes.
simpleType: We use the simpleType element to define and name the new simple type.
We use the restriction element to indicate the existing (base) type and to identify the facets that constrain its range of values, for example:
    <xsd:simpleType name="myInteger">
      <xsd:restriction base="xsd:integer">
        <xsd:minInclusive value="10000"/>
        <xsd:maxInclusive value="99999"/>
      </xsd:restriction>
    </xsd:simpleType>
Several facets can be applied to list types: length, minLength, maxLength, pattern, and enumeration. (union is not a facet; it is a separate way of deriving a simple type.)
2. restriction:
 A. Use the simpleType element to define and name the new simple type. Use the restriction element to indicate the existing (base) type and to identify the "facets" that constrain the range of values.
 B. Restriction of complex types is conceptually the same as restriction of simple types, except that it involves a type's declarations rather than the acceptable range of a simple type's values. A complex type derived by restriction is very similar to its base type, except that its declarations are more limited than the corresponding declarations in the base type. In fact, the values represented by the new type are a subset of the values represented by the base type (as is the case with restriction of simple types).
3. group, sequence, choice and all:
 group: represents just a named set of elements that can be reused in other groups and content models;
 sequence: the child elements must appear in the order in which they are declared;
 choice: only one of its children may appear in an instance;
 all: all the elements in the group may appear once or not at all, and they may appear in any order.
4. simpleContent, mixed content and empty content:
 simpleContent: as with a simpleType, only simple content is allowed, but unlike a simpleType the element can also carry attributes:
    <xsd:element name="internationalPrice">
      <xsd:complexType>
        <xsd:simpleContent>
          <xsd:extension base="xsd:decimal">
            <xsd:attribute name="currency" type="xsd:string"/>
          </xsd:extension>
        </xsd:simpleContent>
      </xsd:complexType>
    </xsd:element>
 mixed content: the element can hold character data and child elements at the same time (declared with mixed="true"):
    <letterBody>
      <salutation>Dear Mr.<name>Robert Smith</name>.</salutation>
      Your order of <quantity>1</quantity> <productName>Baby Monitor</productName>
      shipped from our warehouse on <shipDate>1999-05-21</shipDate>. ....
    </letterBody>

    <xsd:element name="letterBody">
      <xsd:complexType mixed="true">
        <xsd:sequence>
          <xsd:element name="salutation">
            <xsd:complexType mixed="true">
              <xsd:sequence>
                <xsd:element name="name" type="xsd:string"/>
              </xsd:sequence>
            </xsd:complexType>
          </xsd:element>
          <xsd:element name="quantity" type="xsd:positiveInteger"/>
          <xsd:element name="productName" type="xsd:string"/>
          <xsd:element name="shipDate" type="xsd:date" minOccurs="0"/>
          <!-- etc. -->
        </xsd:sequence>
      </xsd:complexType>
    </xsd:element>
 empty content: the element can hold no content (it may still carry attributes);
 complexContent: the complexContent element signals that we intend to restrict or extend the content model of a complex type.
5. ref: refers to a globally declared element instead of declaring a new one:
    <element name="book">
      <complexType>
        <sequence>
          <element ref="title"/>  (title is declared elsewhere as a global element)
        </sequence>
      </complexType>
    </element>
6. unique, key and keyref:
 unique: the location of the xs:unique element in the schema gives the context node in which the constraint holds.
    <xs:unique name="charName">
      <xs:selector xpath="character"/>
      <xs:field xpath="name"/>
    </xs:unique>
 key: similar to xs:unique, except that the value must be present (not null) for every selected node. (Shown here as an alternative to the xs:unique above, since both use the name charName.)
    <xs:key name="charName">
      <xs:selector xpath="character"/>
      <xs:field xpath="name"/>
    </xs:key>
 keyref: defines a reference to an xs:key or an xs:unique. Ex.: to indicate that friend-of must refer to a character from the same book, we write the following at the same level as the key constraint; refer="charName" points to the key above:
    <xs:keyref name="charNameRef" refer="charName">
      <xs:selector xpath="character"/>
      <xs:field xpath="friend-of"/>
    </xs:keyref>
7. import, include and redefine:
 import: reuses definitions from another namespace:
    <xs:import namespace="http://www.w3.org/XML/1998/namespace" schemaLocation="myxml.xsd"/>
 include: includes a schema with the same (or no) target namespace; since it is a plain inclusion, it does not let you override the definitions of the included schema:
    <xs:include schemaLocation="character.xsd"/>
 redefine: similar to xs:include, except that it lets you redefine declarations from the included schema; the redefined declarations must be placed inside the xs:redefine element, and a redefined type must be derived from its own original definition (hence base="nameType"):
    <xs:redefine schemaLocation="character12.xsd">
      <xs:simpleType name="nameType">
        <xs:restriction base="nameType">
          <xs:maxLength value="40"/>
        </xs:restriction>
      </xs:simpleType>
    </xs:redefine>
8. abstract and final:
 abstract: used as in OOP. An element declared abstract="true" cannot appear in an instance document; members of its substitution group appear in its place:
    <xs:element name="name-elt" type="xs:string" abstract="true"/>
    <xs:element name="name" type="xs:string" substitutionGroup="name-elt"/>
    <xs:element name="surname" type="xs:string" substitutionGroup="name-elt"/>
 A substitution group can also be headed by a concrete element:
    <xs:element name="name" type="xs:string"/>
    <xs:element name="surname" type="xs:string" substitutionGroup="name"/>
 Here the element surname can be used anywhere the element name is allowed.
 final: prevents further derivation of a type; the value can be "#all", "restriction" or "extension":
    <xs:complexType name="characterType" final="#all">
      <xs:sequence>
        <xs:element name="name" type="nameType"/>
        <xs:element name="since" type="sinceType"/>
        <xs:element name="qualification" type="descType"/>
      </xs:sequence>
    </xs:complexType>
 A single facet can be frozen with fixed="true", so that types derived by restriction cannot change it:
    <xs:simpleType name="nameType">
      <xs:restriction base="xs:string">
        <xs:maxLength value="32" fixed="true"/>
      </xs:restriction>
    </xs:simpleType>
9. Namespace:
    <xs:schema targetNamespace="http://example.org/ns/books/"
        xmlns:xml="http://www.w3.org/XML/1998/namespace"
        xmlns:bk="http://example.org/ns/books/"
        xmlns:xs="http://www.w3.org/2001/XMLSchema"
        elementFormDefault="qualified"
        attributeFormDefault="unqualified">
      .../...
    </xs:schema>
 elementFormDefault and attributeFormDefault set whether locally declared elements and attributes are, by default, qualified (in the target namespace) or unqualified; an individual declaration can override the default with its form attribute.
10. any and anyAttribute: for instance, to extend the definition of our description type to allow any XHTML element, we could declare:
    <xs:complexType name="descType" mixed="true">
      <xs:sequence>
        <xs:any namespace="http://www.w3.org/1999/xhtml" processContents="skip"
            minOccurs="0" maxOccurs="unbounded"/>
      </xs:sequence>
    </xs:complexType>
 xs:anyAttribute provides the same functionality for attributes.
11. XML Schema instance (xsi):
 Namespace: http://www.w3.org/2001/XMLSchema-instance
 Used in XML instance documents:
 o the xsi:noNamespaceSchemaLocation and xsi:schemaLocation attributes tie a document to its W3C XML Schema;
 o xsi:type lets you specify the simple or complex type of an element;
 o xsi:nil lets you specify a nil (null) value for an element, which must be declared nillable in the schema with a nillable="true" attribute.

XSD: XML Schema Definition.

XSLT: Extensible Stylesheet Language Transformations (XSLT) is an XML-based language used to transform XML documents into other XML documents or into "human-readable" documents. The original document is not changed; rather, a new document is created based on the content of an existing one. The new document may be serialized (output) by the processor in standard XML syntax or in another format, such as HTML or plain text. XSLT is most often used to convert data between different XML schemas, to convert XML data into HTML or XHTML documents for web pages (creating dynamic web pages), or to convert it into an intermediate XML format that can then be converted to PDF documents. An XSLT processor reads both an input XML document and an XSLT stylesheet (which is itself an XML document, because XSLT is an XML application) and produces a result tree as output. This result tree may then be serialized into a file or written to a stream. Documents can be transformed using a standalone program or as part of a larger program that communicates with the XSLT processor through its API.
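The processor workflow described in the XSLT entry (input document plus stylesheet in, serialized result tree out) maps onto TrAX (javax.xml.transform) in the JDK. A minimal sketch; the stylesheet, the library/book/title vocabulary and the XsltDemo class are hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

/** Sketch: running an XSLT 1.0 stylesheet through TrAX. */
final class XsltDemo {
    // Hypothetical stylesheet: turns each book title into an HTML list item.
    static final String XSL =
          "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='html'/>"
        + "<xsl:template match='/'>"
        + "<ul><xsl:for-each select='library/book'>"
        + "<li><xsl:value-of select='title'/></li>"
        + "</xsl:for-each></ul>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    static String transform(String xml, String xsl) throws TransformerException {
        // The stylesheet is itself an XML document, compiled into a Transformer.
        Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xsl)));
        StringWriter out = new StringWriter();
        // The input document is not modified; the result tree is serialized into 'out'.
        transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws TransformerException {
        String xml = "<library><book><title>First</title></book>"
                + "<book><title>Second</title></book></library>";
        System.out.println(transform(xml, XSL));
    }
}
```

The same Transformer could instead write to a file (a StreamResult over a File) or to a DOM tree (a DOMResult); TrAX keeps the inputs and outputs pluggable through the Source and Result interfaces.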