DOM
(Document Object Model)
Cheng-Chia Chen
Transparency No. 1
DOM
What is DOM?
- DOM (Document Object Model)
  - A tree-based data model of XML documents
  - An API for XML document processing
    - Usable across multiple programming languages
    - Language neutral
    - Defined in terms of CORBA IDL
    - Language-specific bindings supplied for ECMAScript, Java, ….
Document Object Model
- Defines how XML and HTML documents are represented as objects in programs
- W3C Standard
- Defined in IDL; thus language independent
- HTML as well as XML
- Writing as well as reading
- Covers everything except the internal and external DTD subsets
Trees
- An XML document can be represented as a tree.
  - It has a root.
  - It has nodes.
  - It is amenable to recursive processing.
DOM (Document Object Model)
- What is the tree view of the following document?
<?xml version="1.0" encoding="UTF-8" ?>
<TABLE>
  <TBODY>
    <TR>
      <TD>紅樓夢</TD>
      <TD>曹雪芹</TD>
    </TR>
    <TR>
      <TD>三國演義</TD>
      <TD>羅貫中</TD>
    </TR>
  </TBODY>
</TABLE>
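Tree view (DOM view) of an XML Document
(whitespace-only text nodes between elements are omitted)
#document                 (document node; root)
  TABLE                   (element node)
    TBODY                 (element node)
      TR                  (element node)
        TD                (element node)
          紅樓夢           (text node)
        TD                (element node)
          曹雪芹           (text node)
      TR                  (element node)
        TD                (element node)
          三國演義         (text node)
        TD                (element node)
          羅貫中           (text node)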
DOM Evolution
- DOM Level 0:
- DOM Level 1, a W3C Standard
- DOM Level 2, a W3C Standard
- DOM Level 3, W3C Standards:
  - Document Object Model (DOM) Level 3 Core Specification
  - Document Object Model (DOM) Level 3 Load and Save Specification
  - Document Object Model (DOM) Level 3 Validation Specification
- DOM Level 3, W3C Working Group Notes:
  - Document Object Model (DOM) Level 3 XPath Specification Version 1.0
  - Document Object Model (DOM) Level 3 Views and Formatting Specification
  - Document Object Model (DOM) Level 3 Events Specification Version 1.0
- W3C DOM Working Group
- W3C DOM Tech Reports
DOM Implementations for Java
- Apache XML Project's Xerces parsers:
  - http://xml.apache.org/xerces2-j/index.html
- Oracle/Sun's Java API for XML
  - http://jaxp.java.net/
- GNU JAXP:
  - http://www.gnu.org/software/classpathx/jaxp/jaxp.html
  - Now part of GNU Classpath
Modules
- Modules:
  - Core: org.w3c.dom (L1~L3)
  - Traversal: org.w3c.dom.traversal (L2)
  - XPath, Load and Save, Validation (L3)
  - Range: org.w3c.dom.range (L2)
  - HTML: org.w3c.dom.html (L2)
  - Views: org.w3c.dom.views (L2)
  - StyleSheets: org.w3c.dom.stylesheets
  - CSS: org.w3c.dom.css
  - Events: org.w3c.dom.events (L2)
- Only the Core, Traversal, XPath, Load and Save, and Validation modules really apply to XML. The others are for HTML.
DOM Trees
- The entire document is represented as a tree.
- A tree contains nodes.
- Some nodes may contain other nodes (depending on node type).
- Each document node contains:
  - zero or one doctype nodes
  - one root element node
  - zero or more comment and processing instruction nodes
org.w3c.dom
- 17 interfaces:
  - Attr
  - CDATASection
  - CharacterData
  - Comment
  - Document
  - DocumentFragment
  - DocumentType
  - DOMImplementation
  - Element
  - Entity
  - EntityReference
  - NamedNodeMap
  - Node
  - NodeList
  - Notation
  - ProcessingInstruction
  - Text
- Plus one exception:
  - DOMException
- Plus a number of HTML-specific interfaces in org.w3c.dom.html and other packages
The DOM Interface Hierarchy
(F = fundamental interface, E = extended interface)
Node (F)
  Document (F)
  DocumentFragment (F)
  DocumentType (E)
  Element (F)
  Attr (F)
  CharacterData (F)
    Text (F)
      CDATASection (E)
    Comment (F)
  Entity (E)
  EntityReference (E)
  Notation (E)
  ProcessingInstruction (E)
Not derived from Node: NamedNodeMap (F), NodeList (F), DOMImplementation (F), DOMException (F)
Steps to use DOM
- Create a parser using library-specific code.
- Use the parser to parse the document and return a DOM org.w3c.dom.Document object.
- The entire document is stored in memory.
- Use DOM methods and interfaces to extract data from this object.
Parsing documents with a (Xerces) DOM Parser Example
import com.sun.org.apache.xerces.internal.parsers.*;
// For the stand-alone Apache distribution use instead:
// import org.apache.xerces.parsers.*;
import org.w3c.dom.*;
import org.xml.sax.*;
import java.io.*;

public class DOMParserMaker {
    public static void main(String[] args) {
        DOMParser parser = new DOMParser();
        for (int i = 0; i < args.length; i++) {
            try {
                // Read the entire document into memory
                parser.parse(args[i]);
                Document d = parser.getDocument();
                // work with the document...
            }
            catch (SAXException e) {
                System.err.println(e);
            }
            catch (IOException e) {
                System.err.println(e);
            }
        }
    }
}
Parsing process using JAXP
- javax.xml.parsers.DocumentBuilderFactory.newInstance() creates a DocumentBuilderFactory.
- Configure the factory.
- The factory's newDocumentBuilder() method creates a DocumentBuilder.
- Configure the builder.
- The builder parses the document and returns a DOM org.w3c.dom.Document object.
- The entire document is stored in memory.
- DOM methods and interfaces are used to extract data from this object.
JAXP's DOM pluggability mechanism
Parsing documents with a JAXP DocumentBuilder
import javax.xml.parsers.*;
import org.w3c.dom.*;
import org.xml.sax.*;
import java.io.*;

public class JAXPParserMaker {
    public static void main(String[] args) {
        try {
            DocumentBuilderFactory builderFactory
                = DocumentBuilderFactory.newInstance();
            builderFactory.setNamespaceAware(true);
            DocumentBuilder parser
                = builderFactory.newDocumentBuilder();
            for (int i = 0; i < args.length; i++) {
                try {
                    // Read the entire document into memory
                    Document d = parser.parse(args[i]);
                    // work with the document...
                }
                catch (SAXException e) {
                    System.err.println(e);
                }
                catch (IOException e) {
                    System.err.println(e);
                }
            } // end for
        }
        catch (ParserConfigurationException e) {
            System.err.println("You need to install a JAXP aware parser.");
        }
    }
}
The Node Interface
package org.w3c.dom;
public interface Node {
    // NodeType
    public static final short ELEMENT_NODE                = 1;
    public static final short ATTRIBUTE_NODE              = 2;
    public static final short TEXT_NODE                   = 3;
    public static final short CDATA_SECTION_NODE          = 4;
    public static final short ENTITY_REFERENCE_NODE       = 5;
    public static final short ENTITY_NODE                 = 6;
    public static final short PROCESSING_INSTRUCTION_NODE = 7;
    public static final short COMMENT_NODE                = 8;
    public static final short DOCUMENT_NODE               = 9;
    public static final short DOCUMENT_TYPE_NODE          = 10;
    public static final short DOCUMENT_FRAGMENT_NODE      = 11;
    public static final short NOTATION_NODE               = 12;
    // (the interface continues on the following slides)
The Node interface
- Node properties
  - name (qualified name, namespace URI, local name, prefix), type, value
    public String getNodeName();
    public String getNamespaceURI();
    public String getPrefix();
    public void   setPrefix(String prefix) throws DOMException;
    public String getLocalName();
    public String getNodeValue() throws DOMException;
    public void   setNodeValue(String value) throws DOMException;
    public short  getNodeType();
The Node interface
- Tree navigation
    public Node         getParentNode();
    public NodeList     getChildNodes();
    public Node         getFirstChild();
    public Node         getLastChild();
    public Node         getPreviousSibling();
    public Node         getNextSibling();
    public NamedNodeMap getAttributes();
    public Document     getOwnerDocument();
    public boolean      hasChildNodes();
    public boolean      hasAttributes();
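The navigation methods above can be combined into a simple sibling walk. The following is a minimal sketch, not part of the original slides; the class name SiblingWalk and the helper parse() are hypothetical names introduced only for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class SiblingWalk {
    // Hypothetical helper: parses an XML string, wrapping checked exceptions.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Returns the names of the element children of the root, in document order.
    static List<String> childElementNames(String xml) {
        Node parent = parse(xml).getDocumentElement();
        List<String> names = new ArrayList<>();
        // Start at the first child and follow nextSibling until it is null.
        for (Node c = parent.getFirstChild(); c != null; c = c.getNextSibling()) {
            if (c.getNodeType() == Node.ELEMENT_NODE) {
                names.add(c.getNodeName());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(childElementNames("<TR><TD>a</TD><TD>b</TD></TR>"));
    }
}
```

Walking with getFirstChild()/getNextSibling() avoids indexing into a NodeList; text children are skipped here by testing the node type.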
Node navigation
[Figure: the navigation properties of a node ("this"): parentNode, previousSibling, nextSibling, firstChild, lastChild and childNodes.]
The Node interface
- Tree modification
    public Node insertBefore(Node newNode, Node refNode) throws DOMException;
    public Node appendChild(Node newNode) throws DOMException;
    public Node replaceChild(Node newNode, Node refNode) throws DOMException;
    public Node removeChild(Node node) throws DOMException;
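A short sketch of the four modification methods in sequence may help. It is an illustration only; the class name EditChildren, the helper newDocument() and the element names are assumptions, not taken from the slides.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

public class EditChildren {
    // Hypothetical helper: creates an empty document via JAXP.
    static Document newDocument() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Applies each modification method once and reports the final child order
    // followed by the name of the node returned by replaceChild().
    static String demo() {
        Document doc = newDocument();
        Element list = doc.createElement("list");
        doc.appendChild(list);
        Element a = doc.createElement("a");
        Element b = doc.createElement("b");
        Element c = doc.createElement("c");
        Element d = doc.createElement("d");
        list.appendChild(a);                 // children: a
        list.appendChild(c);                 // children: a c
        list.insertBefore(b, c);             // children: a b c
        Node old = list.replaceChild(d, a);  // children: d b c (returns a)
        list.removeChild(b);                 // children: d c
        StringBuilder names = new StringBuilder();
        for (Node n = list.getFirstChild(); n != null; n = n.getNextSibling()) {
            names.append(n.getNodeName());
        }
        return names + "/" + old.getNodeName();
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Note that replaceChild() returns the node that was replaced, which stays usable and can be inserted elsewhere.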
Node manipulation
[Figure: the effect of this.appendChild(newNode), this.insertBefore(newNode, refNode), this.replaceChild(newNode, refNode) and this.removeChild(refNode) on the childNodes of "this" (firstChild, ..., refNode, ..., lastChild).]
The Node interface
- Utilities
    public Node cloneNode(boolean deep);
    public void normalize();
      - Merges all adjacent text nodes into one.
      - CDATA section boundaries are preserved.
      - Leaves no empty text nodes.
    public boolean isSupported(String feature, String version);
      - Tests whether the DOM implementation implements a specific feature and whether that feature is supported by this node.
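The effect of normalize() and of the deep flag of cloneNode() can be observed by counting children. This is a minimal sketch under the assumption that a JAXP parser is available; NormalizeDemo and newDocument() are hypothetical names.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

public class NormalizeDemo {
    // Hypothetical helper: creates an empty document via JAXP.
    static Document newDocument() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Returns child counts: before normalize, after normalize,
    // of a shallow clone, and of a deep clone.
    static int[] counts() {
        Document doc = newDocument();
        Element p = doc.createElement("p");
        doc.appendChild(p);
        p.appendChild(doc.createTextNode("Hello, "));
        p.appendChild(doc.createTextNode("world"));
        int before = p.getChildNodes().getLength();   // two adjacent text nodes
        p.normalize();                                // merges them into one
        int after = p.getChildNodes().getLength();
        Node shallow = p.cloneNode(false);            // element only, no children
        Node deep = p.cloneNode(true);                // element plus its subtree
        return new int[] {
            before, after,
            shallow.getChildNodes().getLength(),
            deep.getChildNodes().getLength()
        };
    }

    public static void main(String[] args) {
        System.out.println(java.util.Arrays.toString(counts()));
    }
}
```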
Node (continued): new in DOM 3
- String getTextContent()
  - Returns the text content of this node and its descendants,
    i.e., the string-value of the node in the XPath view (except for Document, for which getTextContent() is null).
- void setTextContent(String arg)
  - Replaces all children of this node with a single text node containing arg.
- boolean isDefaultNamespace(String namespaceURI)
  - Checks whether the specified namespaceURI is the default namespace.
- boolean isEqualNode(Node arg) // Tests if two nodes are equal.
  - When are two nodes equal?
    => Same type, names, attributes, value and childNodes
- boolean isSameNode(Node other)
  - Returns whether this node is the same node as the given one.
- String lookupNamespaceURI(String prefix)
  - Looks up the namespace URI associated with the given prefix, starting from this node.
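The difference between isEqualNode() and isSameNode(), and the behaviour of the text-content accessors, can be sketched as follows. The class name Dom3NodeDemo, the helper parse() and the sample document are assumptions made for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class Dom3NodeDemo {
    // Hypothetical helper: parses an XML string, wrapping checked exceptions.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    static String demo() {
        Document doc = parse("<book><title>DOM</title><year>2004</year></book>");
        Element book = doc.getDocumentElement();
        String text = book.getTextContent();      // text of all descendants, concatenated
        Node copy = book.cloneNode(true);
        boolean equal = book.isEqualNode(copy);   // same structure and values
        boolean same = book.isSameNode(copy);     // but not the same object
        book.setTextContent("replaced");          // children become one text node
        int children = book.getChildNodes().getLength();
        return text + "|" + equal + "|" + same + "|" + children;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```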
- short compareDocumentPosition(Node other)
  - Compares 'this' node with the 'other' node.
  - Possible values: Node.DOCUMENT_POSITION_PRECEDING, _FOLLOWING, _CONTAINS, _CONTAINED_BY, _DISCONNECTED, _IMPLEMENTATION_SPECIFIC
- String getBaseURI() // lookup order: xml:base -> entity -> document
  - The absolute base URI of this node, or null if the implementation wasn't able to obtain an absolute URI.
- Object getUserData(String key)
  - Retrieves the object associated with a key on this node.
- Object setUserData(String key, Object data, UserDataHandler handler)
  - Associates an object with a key on this node.
  - The handler can handle events (clone, delete, rename, etc.) for the node.
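Because compareDocumentPosition() returns a bit mask, its result is normally tested with a bitwise AND rather than compared for equality. The sketch below illustrates this together with setUserData()/getUserData(); PositionDemo, parse() and the key "note" are hypothetical names, and the handler argument is simply left null.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class PositionDemo {
    // Hypothetical helper: parses an XML string, wrapping checked exceptions.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    static boolean[] demo() {
        Document doc = parse("<r><a/><b/></r>");
        Node r = doc.getDocumentElement();
        Node a = r.getFirstChild();
        Node b = r.getLastChild();
        // The result describes where the argument lies relative to the receiver.
        short bRelativeToA = a.compareDocumentPosition(b);
        short aRelativeToR = r.compareDocumentPosition(a);
        // Attach application data to a node; no UserDataHandler is registered.
        a.setUserData("note", "first child", null);
        return new boolean[] {
            (bRelativeToA & Node.DOCUMENT_POSITION_FOLLOWING) != 0,     // b follows a
            (aRelativeToR & Node.DOCUMENT_POSITION_CONTAINED_BY) != 0,  // a is inside r
            "first child".equals(a.getUserData("note"))
        };
    }

    public static void main(String[] args) {
        System.out.println(java.util.Arrays.toString(demo()));
    }
}
```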
UserDataHandler (skipped!)
- Gets called when the node the object is associated with is being cloned, adopted, deleted, imported, or renamed.
- Can be used by the application to implement various behaviors regarding the data it associates with the DOM nodes.
- void handle(short operation, String key, Object data, Node src, Node dst)
  - This method is called whenever the node for which this handler is registered is imported or cloned.
  - operation - Specifies the type of operation that is being performed on the node: { adopt, clone, import, delete, rename }
    - methods: Doc.adoptNode(), Node.cloneNode(), Doc.importNode(), Node.removeChild(node), Doc.renameNode()
  - key - Specifies the key for which this handler is being called.
  - data - Specifies the data for which this handler is being called.
  - src - Specifies the node being cloned, adopted, imported, or renamed. This is null when the node is being deleted.
  - dst - Specifies the node newly created, if any, or null.
The NodeList Interface
- Represents an ordered collection of nodes, without defining or constraining how this collection is implemented.
    package org.w3c.dom;
    public interface NodeList {
        // 0-based
        public Node item(int index); // access by position!
        public int getLength();
    }
- Why not just List<Node>?
  - That would be applicable to Java only,
  - but DOM is not defined only for Java.
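In Java code it is often convenient to copy a NodeList into a java.util.List. The following adapter is a sketch, not part of the DOM API; NodeLists, toList() and names() are hypothetical names.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class NodeLists {
    // Copies a NodeList into a List using the position-based item()/getLength() API.
    static List<Node> toList(NodeList nodes) {
        List<Node> out = new ArrayList<>(nodes.getLength());
        for (int i = 0; i < nodes.getLength(); i++) {
            out.add(nodes.item(i));
        }
        return out;
    }

    // Convenience for the example: names of the root element's children.
    static List<String> names(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            List<String> names = new ArrayList<>();
            for (Node n : toList(doc.getDocumentElement().getChildNodes())) {
                names.add(n.getNodeName());
            }
            return names;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(names("<r><a/><b/></r>"));
    }
}
```

The copy is a snapshot: it does not change if the tree is modified afterwards, which also makes it safe to remove nodes while iterating over it.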
The NamedNodeMap interface
- Represents collections of nodes that can be accessed by name.
public interface NamedNodeMap {
    public Node item(int index);              // same as NodeList
    public int  getLength();
    public Node getNamedItem(String name);    // key = nodeName
    // Inserts or replaces a node, depending on whether the map already has
    // a node with the same name as arg.getNodeName();
    // the old node is returned if this is a replacement.
    public Node setNamedItem(Node arg) throws DOMException;
    public Node removeNamedItem(String name) throws DOMException;
    // Introduced in DOM Level 2: key = namespace URI + local name
    public Node getNamedItemNS(String namespaceURI, String localName);
    public Node setNamedItemNS(Node arg) throws DOMException;
    public Node removeNamedItemNS(String namespaceURI, String localName)
        throws DOMException;
}
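The most common NamedNodeMap is the one returned by Node.getAttributes(). The sketch below reads it both by position and by name; AttributeMapDemo and the sample element are assumptions for illustration. The names are sorted because the order of items in a NamedNodeMap is not specified.

```java
import java.io.StringReader;
import java.util.Map;
import java.util.TreeMap;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class AttributeMapDemo {
    // Returns the attributes of the root element as a name-sorted map.
    static Map<String, String> attributes(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NamedNodeMap attrs = doc.getDocumentElement().getAttributes();
            Map<String, String> out = new TreeMap<>();
            for (int i = 0; i < attrs.getLength(); i++) {   // access by position
                Node attr = attrs.item(i);
                out.put(attr.getNodeName(), attr.getNodeValue());
            }
            return out;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(attributes("<img src='a.png' alt='A'/>"));
    }
}
```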
DOMStringList, NameList
- DOMStringList // List<String>
  - An ordered collection of DOMString (i.e., Java String) values.
  - boolean contains(String str)
    - Tests if a string is part of this DOMStringList.
  - plus getLength() and item(int)
- NameList // List<(prefixed name, namespace URI)>
  - An ordered collection of pairs of (prefixed) name and namespace values (which could be null values).
  - int getLength()
  - String getName(int index)
  - String getNamespaceURI(int index)
  - boolean contains(String str)
    - Tests if a name is part of this NameList.
  - boolean containsNS(String namespaceURI, String name)
    - Tests if the pair namespaceURI/name is part of this NameList.
NodeReporter
import javax.xml.parsers.*;
import org.w3c.dom.*;
import org.xml.sax.*;
import java.io.*;

public class NodeReporter {

    public static void main(String[] args) {
        try {
            DocumentBuilderFactory builderFactory
                = DocumentBuilderFactory.newInstance();
            DocumentBuilder parser
                = builderFactory.newDocumentBuilder();
            NodeReporter iterator = new NodeReporter();
            for (int i = 0; i < args.length; i++) {
                try {
                    // Read the entire document into memory
                    Document doc = parser.parse(args[i]);
                    iterator.followNode(doc);
                }
                catch (SAXException ex) {
                    System.err.println(args[i] + " is not well-formed.");
                }
                catch (IOException ex) {
                    System.err.println(ex);
                }
            }
        }
        catch (ParserConfigurationException ex) {
            System.err.println("You need to install a JAXP aware parser.");
        }
    } // end main

    // note use of recursion
    public void followNode(Node node) {
        processNode(node);
        if (node.hasChildNodes()) {
            NodeList children = node.getChildNodes();
            for (int i = 0; i < children.getLength(); i++) {
                followNode(children.item(i));
            }
        }
    }

    public void processNode(Node node) {
        String name = node.getNodeName();
        String type = typeName[node.getNodeType()];
        System.out.println("Type " + type + ": " + name);
    }

    // Type2TypeName: maps a node type code (the array index) to a readable name
    public String[] typeName = new String[] {
        "Unknown Type",
        "Element",
        "Attribute",
        "Text",
        "CDATA Section",
        "Entity Reference",
        "Entity",
        "Processing Instruction",
        "Comment",
        "Document",
        "Document Type Declaration",
        "Document Fragment",
        "Notation"
    };
}
Values of nodeName, nodeValue and attributes in a Node

Interface              nodeName                    nodeValue                       attributes
---------------------  --------------------------  ------------------------------  ------------
Attr                   name of attribute           value of attribute              null
CDATASection           #cdata-section              content                         null
Comment                #comment                    content                         null
Document               #document                   null                            null
DocumentFragment       #document-fragment          null                            null
DocumentType           document type name          null                            null
Element                tag name                    null                            NamedNodeMap
Entity                 entity name                 null                            null
EntityReference        name of entity referenced   null                            null
Notation               notation name               null                            null
ProcessingInstruction  target                      content excluding target        null
Text                   #text                       content of the text node        null
The Document Node
- The root node representing the entire document; not the same as the root element
- Contains:
  - zero or more processing instruction nodes
  - zero or more comment nodes
  - zero or one document type node
  - one element node
The Document Interface
package org.w3c.dom;
public interface Document extends Node {
    public DocumentType      getDoctype();
    public DOMImplementation getImplementation();
    public Element           getDocumentElement();
    public String            getDocumentURI();   // DOM Level 3
        // null if not specified or if the document was created using
        // DOMImplementation.createDocument()
    public NodeList getElementsByTagName(String tagname);
    public NodeList getElementsByTagNameNS(String namespaceURI, String localName);
    public Element  getElementById(String elementId);
    // (the interface continues on the next slide)
The Document Interface
    // Factory methods
    public Element createElement(String tagName) throws DOMException;
    public Element createElementNS(String namespaceURI, String qName)
        throws DOMException;
    public DocumentFragment createDocumentFragment();
    public Text         createTextNode(String data);
    public Comment      createComment(String data);
    public CDATASection createCDATASection(String data) throws DOMException;
    public ProcessingInstruction createProcessingInstruction(String target, String data)
        throws DOMException;
    public Attr createAttribute(String name) throws DOMException;
    public Attr createAttributeNS(String namespaceURI, String qName)
        throws DOMException;
    public EntityReference createEntityReference(String name) throws DOMException;
    public Node importNode(Node importedNode, boolean deep) throws DOMException;
}
New in Document V3
- Node adoptNode(Node node):
  - Adopts (i.e., moves) the tree rooted at node from its owner document to this document.
  - The node is detached from its parent if it has one.
  - c.f. importNode(Node, deep) // this makes a copy
- DOMConfiguration getDomConfig()
  - DOMConfiguration is a table of (key, value) parameters used to control how Document.normalizeDocument() behaves.
- normalizeDocument()
  - Acts as if the document was going through a save and load cycle, putting the document in a "normal" form.
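The contrast between importNode() (copy) and adoptNode() (move) can be made concrete with two small documents. This is a sketch under the assumption that both documents come from the same DOM implementation; AdoptVsImport and parse() are hypothetical names.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class AdoptVsImport {
    // Hypothetical helper: parses an XML string, wrapping checked exceptions.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    static boolean[] demo() {
        Document src = parse("<a><x/></a>");
        Document dst = parse("<b/>");
        Node x = src.getDocumentElement().getFirstChild();

        // importNode: dst receives a copy, x stays where it was.
        Node copy = dst.importNode(x, true);
        dst.getDocumentElement().appendChild(copy);
        boolean originalKept = x.getParentNode() == src.getDocumentElement();

        // adoptNode: x itself is detached from src and now belongs to dst.
        // (adoptNode may return null if the implementation cannot adopt the node.)
        Node moved = dst.adoptNode(x);
        boolean detached = moved != null && !src.getDocumentElement().hasChildNodes();
        boolean reowned = moved != null && moved.getOwnerDocument() == dst;
        return new boolean[] { originalKept, detached, reowned };
    }

    public static void main(String[] args) {
        System.out.println(java.util.Arrays.toString(demo()));
    }
}
```

An adopted node still has no parent in the new document; it must be inserted explicitly, for example with appendChild().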
- Use case:
    DOMConfiguration docConfig = myDocument.getDomConfig();
    docConfig.setParameter("infoset", Boolean.TRUE);
    myDocument.normalizeDocument();
- Node renameNode(Node n, String namespaceURI, String qualifiedName) throws DOMException
  - Renames an existing node of type ELEMENT_NODE or ATTRIBUTE_NODE.
- getXmlVersion(), getXmlEncoding(), getXmlStandalone(): boolean
  - Get the respective values from the XML declaration of a document, e.g.
    <?xml version="1.1" encoding="UTF-8" standalone="yes" ?>
- setXmlVersion(String), setXmlStandalone(boolean)
Element Nodes
- Represents a complete element including its
  - start-tag,
  - end-tag, and
  - content
- Content may contain:
  - Element nodes
  - ProcessingInstruction nodes
  - Comment nodes
  - Text nodes
  - CDATASection nodes
  - EntityReference nodes
The Element Interface
    public String   getTagName();   // = getNodeName();
    public NodeList getElementsByTagName(String name);
    public NodeList getElementsByTagNameNS(String uri, String localName);
    public String   getAttribute(String name);
    public String   getAttributeNS(String uri, String localName);
    public void setAttribute(String name, String value) throws DOMException;
    public void setAttributeNS(String uri, String qName, String value)
        throws DOMException;
    public void removeAttribute(String name) throws DOMException;
    public void removeAttributeNS(String uri, String localName) throws DOMException;
    public Attr getAttributeNode(String name);
    public Attr getAttributeNodeNS(String namespaceURI, String localName);
    public Attr setAttributeNode(Attr newAttr) throws DOMException;
    public Attr setAttributeNodeNS(Attr newAttr) throws DOMException;
    public Attr removeAttributeNode(Attr oldAttr) throws DOMException;
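A brief sketch of the string-based attribute methods follows. ElementAttributes, newDocument() and the attribute names are assumptions for illustration. One detail worth noting is that getAttribute() returns an empty string, not null, for a missing attribute.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class ElementAttributes {
    // Hypothetical helper: creates an empty document via JAXP.
    static Document newDocument() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    static String demo() {
        Document doc = newDocument();
        Element log = doc.createElement("log");
        log.setAttribute("id", "42");
        String id = log.getAttribute("id");
        // A missing attribute yields "" rather than null.
        boolean missingIsEmpty = log.getAttribute("missing").isEmpty();
        log.removeAttribute("id");
        boolean stillThere = log.hasAttribute("id");
        return id + "|" + missingIsEmpty + "|" + stillThere;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Use hasAttribute() when the code must distinguish an absent attribute from one whose value is the empty string.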
Example application
- RSS-based list of Web logs
<?xml version="1.0"?>
<!-- <!DOCTYPE foo SYSTEM "http://msdn.microsoft.com/xml/general/htmlentities.dtd">
-->
<weblogs>
  <log>
    <name>MozillaZine</name>
    <url>http://www.mozillazine.org</url>
    <changesUrl>http://www.mozillazine.org/contents.rdf</changesUrl>
    <ownerName>Jason Kersey</ownerName>
    <ownerEmail>kerz@en.com</ownerEmail>
    <description>THE source for news on the Mozilla Organization. DevChats, Reviews,
    Chats, Builds, Demos, Screenshots, and more.</description>
    <imageUrl></imageUrl>
    <adImageUrl>http://static.userland.com/weblogMonitor/ads/kerz@en.com.gif
    </adImageUrl>
  </log> …
</weblogs>
DOM Design
- We want to find all URLs in the logs.
- The character data of each url element needs to be read. Everything else can be ignored.
- The getElementsByTagName() method in Document gives us a quick list of all the url elements.
The program
WeblogsDOM.java
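The source of WeblogsDOM.java is not reproduced in these slides. The design described for this example (collect the character data of every url element) could be sketched roughly as follows; UrlLister and its methods are hypothetical names, and the sketch reads the document from a string rather than from the network.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class UrlLister {
    // Returns the text of every <url> element in the document, in document order.
    static List<String> urls(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList urlElements = doc.getElementsByTagName("url");
            List<String> out = new ArrayList<>();
            for (int i = 0; i < urlElements.getLength(); i++) {
                // getTextContent() (DOM Level 3) gathers the element's character data.
                out.add(urlElements.item(i).getTextContent().trim());
            }
            return out;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<weblogs><log><name>MozillaZine</name>"
                   + "<url>http://www.mozillazine.org</url></log></weblogs>";
        for (String url : urls(xml)) {
            System.out.println(url);
        }
    }
}
```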
CharacterData interface
- Represents things that are basically text holders
- Super interface of
  - Text
  - Comment
  - CDATASection
The CharacterData Interface
- Note: applicable to Comment, Text and CDATASection
public interface CharacterData extends Node {
    // content retrieval
    public String getData() throws DOMException;
    public int    getLength();
    public String substringData(int offset, int count)
        throws DOMException;   // 0-based
    // content modification
    public void setData(String data) throws DOMException;
    public void appendData(String arg) throws DOMException;
    public void insertData(int offset, String arg) throws DOMException;
    public void deleteData(int offset, int count) throws DOMException;
    public void replaceData(int offset, int count, String arg)
        throws DOMException;
}
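The offset-based editing methods can be traced step by step on a single text node. The sketch below is illustrative only; CharacterDataDemo and newDocument() are hypothetical names, and the intermediate values are shown in the comments.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Text;

public class CharacterDataDemo {
    // Hypothetical helper: creates an empty document via JAXP.
    static Document newDocument() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    static String demo() {
        Text t = newDocument().createTextNode("Hello World");
        String head = t.substringData(0, 5);   // "Hello" (offsets are 0-based)
        t.appendData("!");                     // "Hello World!"
        t.insertData(5, ",");                  // "Hello, World!"
        t.deleteData(0, 7);                    // "World!"
        t.replaceData(0, 5, "DOM");            // "DOM!"
        return head + "|" + t.getData() + "|" + t.getLength();
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```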
Text Nodes
- Represents the text content of
  - an element or
  - an attribute
- Contains only pure text, no markup
- Parsers will return a single maximal text node for each contiguous run of pure text.
- Editing may change this.
The Text Interface
public interface Text extends CharacterData {
    // Splits this node into two: this node becomes the first part
    // and the last part is returned.
    public Text splitText(int offset) throws DOMException;
    // Returns all text of Text nodes logically adjacent to this node,
    // concatenated in document order.
    String getWholeText();
    // Returns whether this text node contains ignorable whitespace.
    boolean isElementContentWhitespace();
    // Replaces the text of the current node and all logically adjacent
    // text nodes with the specified text;
    // returns the Text node created with the new specified content.
    Text replaceWholeText(String content);
}
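How splitText() and getWholeText() relate can be sketched as follows; SplitTextDemo, newDocument() and the sample text are assumptions for illustration.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Text;

public class SplitTextDemo {
    // Hypothetical helper: creates an empty document via JAXP.
    static Document newDocument() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    static String demo() {
        Document doc = newDocument();
        Element p = doc.createElement("p");
        doc.appendChild(p);
        Text t = doc.createTextNode("HelloWorld");
        p.appendChild(t);
        // After the split, t keeps "Hello" and the returned node holds "World";
        // the new node is inserted as the next sibling of t.
        Text tail = t.splitText(5);
        int children = p.getChildNodes().getLength();
        // getWholeText() sees both adjacent text nodes as one logical run.
        return t.getData() + "|" + tail.getData() + "|" + children + "|" + t.getWholeText();
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```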
CDATA section Nodes
- Represents a CDATA section like this example from a hypothetical SVG tutorial:
<p>You can use a default <code>xmlns</code> attribute to avoid
having to add the svg prefix to all your elements:</p>
<![CDATA[
<svg xmlns="http://www.w3.org/2000/svg"
     width="12cm" height="10cm">
  <ellipse rx="110" ry="130" />
  <rect x="4cm" y="1cm" width="3cm" height="6cm" />
</svg>
]]>
- No children
The CDATASection Interface
// no additional methods other than those from Text
public interface CDATASection extends Text {
}
DocumentType Nodes
- Represents a document type declaration
- Has no children
The DocumentType Interface
public interface DocumentType extends Node {
    public String getName();
    public String getPublicId();
    public String getSystemId();
    // Returns all general entities, both external and internal,
    // declared in the DTD.
    public NamedNodeMap getEntities();
    public NamedNodeMap getNotations();
    // Returns the internal subset as a string, or null if there is none.
    public String getInternalSubset();
}
Example
<!DOCTYPE html PUBLIC
    "-//W3C//DTD XHTML 1.0 Strict//EN"
    "DTD/xhtml1-strict.dtd" [
    <!ENTITY foo "foo"> <!ENTITY bar "bar"> <!ENTITY bar "bar2"> <!ENTITY % baz "baz">
]>
- name = "html"
- publicId = "-//W3C//DTD XHTML 1.0 Strict//EN"
- systemId = "DTD/xhtml1-strict.dtd"
- internalSubset = "<!ENTITY foo "foo"> … "
- getEntities() => [ 'foo': entity(foo), 'bar': entity(bar) ] // read-only
- getNotations() => empty (no notations are declared in the internal subset)
Attr Nodes
- Represents an attribute
- Contains:
  - Text nodes
  - Entity reference nodes
The Attr Interface
public interface Attr extends Node {
    public String  getName();
    public boolean getSpecified();   // false => the value comes from the DTD
    public String  getValue();
    public void    setValue(String value) throws DOMException;
    public Element getOwnerElement();
        // Attr.getParentNode() == null
        // An Attr is not a member of Element.getChildNodes()
    // namespaceURI, prefix, localName are inherited from Node
    public boolean  isId();               // DOM Level 3: is this an ID attribute?
    public TypeInfo getSchemaTypeInfo();  // the type information associated with this attribute
}
TypeInfo
- Represents a type referenced from Element or Attr nodes, specified in the schemas associated with the document.
- = [namespace URI] + [type name]
public interface TypeInfo {
    // The name of a type declared for the associated element or attribute,
    // or null if unknown.
    String getTypeName();
    // The namespace of the type declared for the associated element or
    // attribute, or null if the element does not have a declaration or if
    // no namespace information is available.
    String getTypeNamespace();
    // true if this type is derived from the given type definition;
    // derivationMethod = one of { DERIVATION_EXTENSION, DERIVATION_RESTRICTION,
    // DERIVATION_UNION, DERIVATION_LIST }
    boolean isDerivedFrom(String typeNamespaceArg,
        String typeNameArg, int derivationMethod);
}
ProcessingInstruction Nodes
- Represents a processing instruction like
    <?robots index="yes" follow="no"?>
- No children
The ProcessingInstruction Interface
public interface ProcessingInstruction extends Node {
    public String getTarget();
    public String getData();
    public void   setData(String data) throws DOMException;
}
Ex: <?robots index="yes" follow="no" ?>
- target = [robots]
- data = [index="yes" follow="no" ]
Comment Nodes
- Represents a comment like this example from the XML 1.0 spec:
    <!--* This is a comment -->
- No children
- The Comment Interface
    package org.w3c.dom;
    public interface Comment extends CharacterData { }
- Note: Text, CDATASection and Comment are all subinterfaces of CharacterData and can use all methods defined in it.
Entity (for internal/external, parsed/unparsed general entities)
public interface Entity extends Node {   // parameter entities (PEs) are not represented
    public String getPublicId();       // external only
    public String getSystemId();       // external only
    public String getNotationName();   // unparsed only
    public String getXmlEncoding();    // external only
        // encoding given in the text declaration;
        // null if not given or not an external entity
    public String getInputEncoding();  // external only
        // actual encoding used for parsing; = xmlEncoding if it is given
    public String getXmlVersion();     // external only
}
// An entity's replacement text is stored as its read-only childNodes, if available.
// Its text can be obtained with getTextContent().
Notation, Entity and EntityReference
public interface Notation extends Node {
    public String getPublicId();
    public String getSystemId();
}
public interface EntityReference extends Node {
    // The contents of the referenced entity are the children of this node.
    // nodeName contains the name of the referenced entity.
}
DOMException
- A runtime exception, but you should catch it
- Error code accessible from the public code field
- Error code gives more detailed information:
    import static org.w3c.dom.DOMException.*;
  - DOMException.INDEX_SIZE_ERR
    - Index or size is negative, or greater than the allowed value
  - DOMSTRING_SIZE_ERR
    - The specified range of text does not fit into a String
  - HIERARCHY_REQUEST_ERR
    - Attempt to insert a node somewhere it doesn't belong
  - WRONG_DOCUMENT_ERR
    - A node is used in a different document than the one that created it (that doesn't support it)
  - INVALID_CHARACTER_ERR
    - An invalid or illegal character is specified, such as in a name
  - NO_DATA_ALLOWED_ERR
    - Attempt to add data to a node which does not support data
DOMException
  - NO_MODIFICATION_ALLOWED_ERR
    - Attempt to modify a read-only object
  - NOT_FOUND_ERR
    - Attempt to reference a node in a context where it does not exist
  - NOT_SUPPORTED_ERR
    - The implementation does not support the type of object requested
  - INUSE_ATTRIBUTE_ERR
    - Attempt to add an attribute to an element that already has that attribute
  - INVALID_STATE_ERR
    - An attempt is made to use an object that is not, or is no longer, usable
  - SYNTAX_ERR
    - An invalid or illegal string is specified
  - INVALID_MODIFICATION_ERR
    - An attempt to modify the type of the underlying object
  - NAMESPACE_ERR
    - An attempt is made to create or change an object in a way which is incorrect with regard to namespaces
  - INVALID_ACCESS_ERR
    - A parameter or an operation is not supported by the underlying object
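Catching a DOMException and inspecting its code field might look like the sketch below. The two failing operations were chosen for illustration (an illegal element name and a second root element); DomErrors and newDocument() are hypothetical names, and the exact codes reported are what the JDK's bundled parser is expected to produce.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.DOMException;
import org.w3c.dom.Document;

public class DomErrors {
    // Hypothetical helper: creates an empty document via JAXP.
    static Document newDocument() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Returns the error codes raised by two deliberately invalid operations.
    static short[] codes() {
        Document doc = newDocument();
        short first = 0;
        short second = 0;
        try {
            doc.createElement("not a name");        // spaces are illegal in names
        } catch (DOMException e) {
            first = e.code;                         // expected: INVALID_CHARACTER_ERR
        }
        doc.appendChild(doc.createElement("root"));
        try {
            doc.appendChild(doc.createElement("secondRoot"));  // only one root allowed
        } catch (DOMException e) {
            second = e.code;                        // expected: HIERARCHY_REQUEST_ERR
        }
        return new short[] { first, second };
    }

    public static void main(String[] args) {
        short[] c = codes();
        System.out.println(c[0] + " " + c[1]);
    }
}
```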
The DOMImplementation interface
- Creates new Document objects
- Creates new DocumentType objects
- Tests features supported by this implementation
- Gets implementations of other features
DOMImplementation interface
package org.w3c.dom;
public interface DOMImplementation {
    public boolean hasFeature(String feature, String version);
    public Object  getFeature(String feature, String version);
    public DocumentType createDocumentType(String qName,
        String publicID, String systemID) throws DOMException;
    public Document createDocument(String uri, String qName,
        DocumentType doctype) throws DOMException;
}
org.apache.xerces.dom.DOMImplementationImpl
- The Xerces-specific class that implements DOMImplementation
package org.apache.xerces.dom;   // or
// package com.sun.org.apache.xerces.internal.dom;
public class DOMImplementationImpl implements DOMImplementation {
    // factory method
    public static DOMImplementation getDOMImplementation()
    public boolean hasFeature(String feature, String version)
    public Object  getFeature(String feature, String version)
    public DocumentType createDocumentType(String qName,
        String publicID, String systemID)
    public Document createDocument(String uri, String qName,
        DocumentType doctype) throws DOMException
}
Examples of creating DOM documents in memory
- FibonacciDOM.java using Xerces-J
- FibonacciJAXP.java using JAXP
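The two Fibonacci programs are not reproduced in these slides. As a rough sketch of what building a document in memory through JAXP and DOMImplementation.createDocument() can look like, consider the following; the class name FibonacciSketch and the element and attribute names are assumptions and are not taken from the original files.

```java
import java.math.BigInteger;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.DOMImplementation;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class FibonacciSketch {
    // Builds <Fibonacci_Numbers> with one <fibonacci index="i"> child per number.
    static Document build(int count) {
        try {
            DOMImplementation impl = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().getDOMImplementation();
            // No namespace, root element name, no DOCTYPE.
            Document doc = impl.createDocument(null, "Fibonacci_Numbers", null);
            Element root = doc.getDocumentElement();
            BigInteger low = BigInteger.ONE;
            BigInteger high = BigInteger.ONE;
            for (int i = 1; i <= count; i++) {
                Element number = doc.createElement("fibonacci");
                number.setAttribute("index", Integer.toString(i));
                number.appendChild(doc.createTextNode(low.toString()));
                root.appendChild(number);
                BigInteger next = low.add(high);   // advance the sequence
                low = high;
                high = next;
            }
            return doc;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Document doc = build(10);
        System.out.println(doc.getDocumentElement().getChildNodes().getLength());
    }
}
```

Every node is created by the Document that will own it (createElement(), createTextNode()) and only then attached to the tree.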
Which modules and features are supported?
- A DOM application can use the hasFeature() method of the DOMImplementation interface to determine whether a module is supported or not.
- Feature strings:
  - XML Module: "XML"
  - HTML Module: "HTML"
  - Views Module: "Views"
  - StyleSheets Module: "StyleSheets"
  - CSS Module: "CSS"
  - CSS (extended interfaces) Module: "CSS2"
  - Events Module: "Events"
  - User Interface Events (UIEvent interface) Module: "UIEvents"
  - Mouse Events Module: "MouseEvents"
  - Mutation Events Module: "MutationEvents"
  - HTML Events Module: "HTMLEvents"
  - Traversal Module: "Traversal"
  - Range Module: "Range"
Which modules are supported?
import org.apache.xerces.dom.DOMImplementationImpl;
import org.w3c.dom.*;

public class ModuleChecker {
    public static void main(String[] args) {
        // parser dependent
        DOMImplementation implementation
            = DOMImplementationImpl.getDOMImplementation();
        String[] features = { "XML", "HTML", "Views", "StyleSheets",
            "CSS", "CSS2", "Events", "UIEvents", "MouseEvents",
            "MutationEvents", "HTMLEvents", "Traversal", "Range" };
        for (int i = 0; i < features.length; i++) {
            if (implementation.hasFeature(features[i], "2.0")) {
                System.out.println("Implementation supports " + features[i]);
            } else {
                System.out.println("Implementation does not support " + features[i]);
            }
        }
    }
}
The result
> java ModuleChecker
Implementation supports XML
Implementation does not support HTML
Implementation does not support Views
Implementation does not support StyleSheets
Implementation does not support CSS
Implementation does not support CSS2
Implementation supports Events
Implementation does not support UIEvents
Implementation does not support MouseEvents
Implementation supports MutationEvents
Implementation does not support HTMLEvents
Implementation supports Traversal
Implementation supports Range
>
Which modules are supported?
import javax.xml.parsers.*;
import org.w3c.dom.*;

public class ModuleChecker {
    public static void main(String[] args) throws ParserConfigurationException {
        // use JAXP 1.3 (parser independent)
        DOMImplementation implementation = DocumentBuilderFactory
            .newInstance().newDocumentBuilder().getDOMImplementation();
        String[] features = { "XML", "HTML", "Views", "StyleSheets",
            "CSS", "CSS2", "Events", "UIEvents", "MouseEvents",
            "MutationEvents", "HTMLEvents", "Traversal", "Range" };
        for (int i = 0; i < features.length; i++) {
            if (implementation.hasFeature(features[i], "3.0")) {
                System.out.println("Implementation supports " + features[i]);
            } else {
                System.out.println("Implementation does not support " + features[i]);
            }
        }
    }
}
The result
> java ModuleChecker
Implementation supports XML
Implementation does not support HTML
Implementation does not support Views
Implementation does not support StyleSheets
Implementation does not support CSS
Implementation does not support CSS2
Implementation supports Events
Implementation does not support UIEvents
Implementation does not support MouseEvents
Implementation supports MutationEvents
Implementation does not support HTMLEvents
Implementation supports Traversal
Implementation supports Range
>
Transparency No. 72
DOM
Serialization
 The process of taking an in-memory DOM tree and converting it to
a stream of characters that can be written onto an output stream
 Not a standard part of DOM Level 2
 The org.apache.xml.serialize package:
 public interface DOMSerializer
 public interface Serializer
 public abstract class BaseMarkupSerializer extends Object
implements DocumentHandler, org.xml.sax.ext.LexicalHandler,
DTDHandler, org.xml.sax.ext.DeclHandler, DOMSerializer, Serializer
 public class HTMLSerializer extends BaseMarkupSerializer
 public final class TextSerializer extends BaseMarkupSerializer
 public final class XHTMLSerializer extends HTMLSerializer
 public final class XMLSerializer extends BaseMarkupSerializer
Transparency No. 73
DOM
Example
 A DOM program that writes Fibonacci numbers onto
System.out
 FibonacciDOMSerializer.java
Transparency No. 74
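FibonacciDOMSerializer.java itself is not reproduced on the slide. The following is only a minimal sketch of the same idea: it builds a small DOM tree of Fibonacci numbers and serializes it. It assumes JAXP's identity Transformer in place of the Xerces XMLSerializer so that it runs on a plain JDK; the class name FibonacciSketch and the element names Fibonacci_Numbers and fibonacci are illustrative assumptions, not taken from the original program.

```java
import java.io.StringWriter;
import java.math.BigInteger;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.DOMImplementation;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class FibonacciSketch {
    // Build a document whose root has one <fibonacci index="i"> child per number.
    static Document build(int count) throws Exception {
        DOMImplementation impl = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().getDOMImplementation();
        Document doc = impl.createDocument(null, "Fibonacci_Numbers", null);
        Element root = doc.getDocumentElement();
        BigInteger low = BigInteger.ONE;
        BigInteger high = BigInteger.ONE;
        for (int i = 1; i <= count; i++) {
            Element number = doc.createElement("fibonacci");
            number.setAttribute("index", Integer.toString(i));
            number.appendChild(doc.createTextNode(low.toString()));
            root.appendChild(number);
            BigInteger next = low.add(high);
            low = high;
            high = next;
        }
        return doc;
    }

    // Serialize with an identity transformation (assumed stand-in for XMLSerializer).
    static String serialize(Document doc) throws Exception {
        Transformer identity = TransformerFactory.newInstance().newTransformer();
        identity.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
        identity.setOutputProperty(OutputKeys.INDENT, "yes");
        StringWriter out = new StringWriter();
        identity.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(serialize(build(10)));
    }
}
```

The exact indentation of the output depends on the Transformer implementation in use.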
DOM
OutputFormat
 For pretty format of output.
package org.apache.xml.serialize;
public class OutputFormat extends Object {
public OutputFormat( [String method, String encoding, boolean indenting] )
public OutputFormat( Document doc [, String encoding, boolean indenting] )
// the method property; typical values: "xml", "html" and "text"
public String getMethod(); public void setMethod(String method)
// other public properties:
int indent, lineWidth;
boolean indenting, OmitXMLDeclaration, Standalone, PreserveSpace;
String encoding, version, mediaType, LineSeparator, DoctypePublic, DoctypeSystem;
public void setDoctype(String publicID, String systemID)
// Elements whose text children should be output as CDATA
public String[] getCDataElements()
public boolean isCDataElement(String tagName)
public void setCDataElements(String[] cdataElements)
Transparency No. 75
DOM
OutputFormat
// Non-escaping elements, i.e., elements whose text children are output
// without using character references
public String[] getNonEscapingElements()
public boolean isNonEscapingElement(String tagName)
public void setNonEscapingElements(String[] nonEscapingElements)
// last printable character in the encoding
public char getLastPrintable()
 Query methods
public static String whichMethod(Document doc)
public static String whichDoctypePublic(Document doc)
public static String whichDoctypeSystem(Document doc)
public static String whichMediaType(String method)
Transparency No. 76
DOM
Better formatted output
 UTF-8 encoding, Indentation, Word wrapping
 Document type declaration
try {
// Now that the document is created we need to *serialize* it
OutputFormat format = new OutputFormat(fibonacci, "UTF-8", true);
format.setLineSeparator("\r\n");
format.setLineWidth(72);
format.setDoctype(null, "fibonacci.dtd");
XMLSerializer serializer = new XMLSerializer(System.out, format);
serializer.serialize(root);
}
catch (IOException e) { System.err.println(e); }
 > java domexample.PrettyFibonacciDOMSerializer
Transparency No. 77
DOM
DOM based XMLPrettyPrinter
import java.io.IOException;
import org.apache.xerces.parsers.DOMParser;
import org.apache.xml.serialize.OutputFormat;
import org.apache.xml.serialize.XMLSerializer;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;
public class DOMPrettyPrinter {
  public static void main(String[] args) {
    DOMParser parser = new DOMParser();
    for (int i = 0; i < args.length; i++) {
      try {
        // Read the entire document into memory
        parser.parse(args[i]);
        Document document = parser.getDocument();
        // set output format & serialize
        OutputFormat format = new OutputFormat(document, "UTF-8", true);
        format.setLineSeparator("\r\n");
        format.setIndent(2);
        format.setPreserveSpace(false);
        format.setIndenting(true);
        format.setLineWidth(72);
        XMLSerializer serializer = new XMLSerializer(System.out, format);
        serializer.serialize(document);
      }
      catch (SAXException e) {
        System.err.println(e);
      }
      catch (IOException e) {
        System.err.println(e);
      }
    }
  } // end main
}
Transparency No. 78
DOM
Notes
 Using the DOM to write documents automatically
maintains well-formedness constraints
 Validity is not automatically maintained.
Transparency No. 79
DOM
References
 Much of the code in this presentation comes from:
http://www.cafeconleche.org/slides/sd2004west/saxdom
 Processing XML with Java, Elliotte Rusty Harold, Chapters 9-13:
Chapter 9, The Document Object Model
Chapter 10, Creating New XML Documents with DOM
Chapter 11, The Document Object Model Core
Chapter 12, The DOM Traversal Module
Chapter 13, Output from DOM
 DOM Level 2 Core Specification
 DOM Level 2 Traversal and Range Specification
Transparency No. 80
DOM
JAXP (Java API for XML Processing) for DOM
Transparency No. 1
DOM
DOMParsers and DOMImplementations
XML Document -> DOM Parser -> DOM Document
Problems:
 How do we get a DOM Document object from an XML document?
 Get a DOM parser, parse the XML document, and then get the
DOM document from the parser.
 How do we construct DOM objects directly in a program?
 Get a DOMImplementation and invoke createDocument() to get the
initial DOM document.
 How do we get a DOM object from an XML document and
modify it in a program?
 Get a DOM document by parsing the XML document, use the
factory methods of Document to create nodes, and use Node
methods to add them to the resulting tree.
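The third case (parse, then modify) can be sketched as follows. This is a minimal illustration using the JAXP DocumentBuilder that is introduced later in these slides rather than a Xerces-specific parser class; the class name ParseAndModify and the sample TABLE/TR/TD markup are assumptions chosen to match the earlier table example.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

class ParseAndModify {
    // Parse the XML text, then append a new <TR><TD>new cell</TD></TR> to the root.
    static Document parseAndAppend(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xml)));

        // Nodes are created by the factory methods of Document ...
        Element row = doc.createElement("TR");
        Element cell = doc.createElement("TD");
        cell.appendChild(doc.createTextNode("new cell"));

        // ... and attached to the tree with Node methods.
        row.appendChild(cell);
        doc.getDocumentElement().appendChild(row);
        return doc;
    }

    public static void main(String[] args) throws Exception {
        Document doc = parseAndAppend("<TABLE><TR><TD>a</TD></TR></TABLE>");
        System.out.println(doc.getElementsByTagName("TR").getLength() + " rows");
    }
}
```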
Transparency No. 82
DOM
Use Apache's Xerces for DOM
 XML2DOM:
// find the DOM parser implementation class:
// import org.apache.xerces.parsers.DOMParser
// or import com.sun.org.apache.xerces.internal.parsers.DOMParser
DOMParser parser = new DOMParser();
parser.setFeature("http://xml.org/sax/features/validation", true);
parser.setFeature("http://xml.org/sax/features/namespaces", true); …
parser.parse(url_or_inputSource);
Document doc = parser.getDocument();
DOMImplementation dm = doc.getImplementation();
Transparency No. 83
DOM
Construct DOM from scratch
// find the DOMImplementation class:
// org.apache.xerces.dom.DOMImplementationImpl
DOMImplementation dm = new DOMImplementationImpl();
// or dm = DOMImplementationImpl.getDOMImplementation();
Document doc = dm.createDocument(…);
Element e = doc.createElement(…);
Attr attr = doc.createAttributeNS(…);
Text txt = doc.createTextNode("…");
Transparency No. 84
DOM
JAXP (Java API for XML Processing)
 Sun’s Java API for XML Processing
 three modules:
 for DOM Processing
 for SAX Processing
 for Transformation
 5 packages
 1. javax.xml.parsers
 Provides classes allowing the processing of XML
documents.
 Two types of pluggable parsers are supported:
 SAX (Simple API for XML)
 DOM (Document Object Model)
 2. javax.xml.transform ( + … )
 APIs for processing transformation instructions, and
performing a transformation from source to result.
Transparency No. 85
DOM
JAXP's DOM pluggability mechanism
Transparency No. 86
DOM
JAXP API for DOM
 javax.xml.parsers.DocumentBuilder
 Using this class, an application programmer can
obtain a Document from XML.
 javax.xml.parsers.DocumentBuilderFactory
 a factory class for obtaining a DocumentBuilder.
 abstract class
 A concrete subclass can be obtained by the static
method:
DocumentBuilderFactory.newInstance()
 The desired capabilities of the parser can be specified
by setting the various properties of the obtained
factory instance.
Transparency No. 87
DOM
Example code snippet
import java.io.IOException;
import javax.xml.parsers.*;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;
DocumentBuilder builder;
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
factory.setNamespaceAware(true);
factory.setValidating(true);
String location = "http://myserver/mycontent.xml";
try {
builder = factory.newDocumentBuilder();
Document doc1 = builder.parse(location);
Document doc2 = builder.newDocument(); //empty document
} catch (SAXException se) {// handle error
} catch (IOException ioe) { // handle error
} catch (ParserConfigurationException pce){// handle error
}
Transparency No. 88
javax.xml.parsers.DocumentBuilder
DOM
 abstract DOMImplementation getDOMImplementation()
 Obtain an instance of a DOMImplementation object.
 abstract Document newDocument()
 Obtain a new instance of a DOM Document object to build a DOM tree with.
 abstract boolean isNamespaceAware()
 Indicates whether or not this parser is configured to understand namespaces.
 abstract boolean isValidating()
 Indicates whether or not this parser is configured to validate XML documents.
 Document parse(File | InputSource | InputStream [, systemId] | uriString )
 Parse the content of the given file as an XML document and return a new
DOM Document object.
 abstract void setEntityResolver(EntityResolver er)
 Specify the EntityResolver to be used to resolve entities present in the XML
document to be parsed.
 abstract void setErrorHandler(ErrorHandler eh)
 Specify the ErrorHandler to be used to report errors present in the XML
document to be parsed.
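As an illustration of setErrorHandler(), here is a minimal sketch of a handler that records every problem the parser reports instead of letting the default handler print it. The class name ErrorHandlerSketch and the message prefixes are assumptions made for this example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

class ErrorHandlerSketch {
    // Parse the text and return the list of problems reported by the parser.
    static List<String> problems(String xml) throws Exception {
        final List<String> messages = new ArrayList<>();
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        builder.setErrorHandler(new ErrorHandler() {
            public void warning(SAXParseException e) {
                messages.add("warning: " + e.getMessage());
            }
            public void error(SAXParseException e) {
                messages.add("error: " + e.getMessage());
            }
            public void fatalError(SAXParseException e) throws SAXException {
                messages.add("fatal: " + e.getMessage());
                throw e; // well-formedness errors cannot be recovered from
            }
        });
        try {
            builder.parse(new InputSource(new StringReader(xml)));
        } catch (SAXException e) {
            // already recorded by the handler above
        }
        return messages;
    }

    public static void main(String[] args) throws Exception {
        for (String message : problems("<a><b></a>")) {
            System.out.println(message);
        }
    }
}
```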
Transparency No. 89
DOM
javax.xml.parsers.DocumentBuilderFactory
 Object getAttribute(String name)
 void setAttribute(String name, Object value)
 Allows users to set/get specific attributes on the underlying
implementation.
 boolean isIgnoringComments() , setIgnoringComments(boolean)
 Indicates whether or not the factory is configured to produce parsers
that ignore comments.
 Other properties:
 IgnoringElementContentWhitespace ; ExpandEntityReferences;
 Coalescing; // merge adjacent texts and CDATA into a text node
 NamespaceAware; Validating;
 abstract DocumentBuilder newDocumentBuilder()
 Creates a new instance of a DocumentBuilder using the currently
configured parameters.
 static DocumentBuilderFactory newInstance()
 Obtain a new instance of a DocumentBuilderFactory.
Transparency No. 90
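The effect of the factory properties listed on the preceding slide can be observed with a small experiment. The sketch below, with an assumed class name FactoryOptionsSketch and an assumed sample document, parses the same text twice and compares the number of child nodes with and without IgnoringComments and Coalescing.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class FactoryOptionsSketch {
    // A comment, a text node, and a CDATA section inside one element.
    static final String XML = "<a><!-- note -->x<![CDATA[y]]></a>";

    // Count the children of the root with both options switched on or off.
    static int childCount(boolean tidy) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setIgnoringComments(tidy); // drop comment nodes
        factory.setCoalescing(tidy);       // merge CDATA into adjacent text
        Document doc = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(XML)));
        return doc.getDocumentElement().getChildNodes().getLength();
    }

    public static void main(String[] args) throws Exception {
        System.out.println("default settings: " + childCount(false) + " children");
        System.out.println("ignoring comments, coalescing: " + childCount(true) + " children");
    }
}
```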
DOM
How DocumentBuilderFactory finds its instance
 Use the javax.xml.parsers.DocumentBuilderFactory
system property.
 Use the same property in the file
"%JAVA_HOME%/lib/jaxp.properties" in the JRE
directory.
 Look for the class name in the file META-INF/services/
javax.xml.parsers.DocumentBuilderFactory in jars
available to the runtime.
 Fall back to the platform default DocumentBuilderFactory
instance, which is
"com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderFactoryImpl"
for JDK 1.5 and Java SE 6.
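To see which implementation the lookup selects on a given runtime, one can simply print the class of the factory instance. This is a small sketch with an assumed class name FactoryLookupSketch; the output depends on the JDK and on the classpath.

```java
import javax.xml.parsers.DocumentBuilderFactory;

class FactoryLookupSketch {
    static final String PROPERTY = "javax.xml.parsers.DocumentBuilderFactory";

    // Name of the concrete factory class chosen by the lookup procedure.
    static String factoryClassName() {
        return DocumentBuilderFactory.newInstance().getClass().getName();
    }

    public static void main(String[] args) {
        // Null unless the property was set, e.g. with -Djavax.xml.parsers.DocumentBuilderFactory=...
        System.out.println("system property: " + System.getProperty(PROPERTY));
        System.out.println("factory in use:  " + factoryClassName());
    }
}
```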
Transparency No. 91
DOM
DOM Level 3 Load and Save Specification
 provides an API for loading/saving DOM objects
 specification; JavaDoc API
 Main interfaces
DOMImplementationLS
LSInput : where to get XML text
LSOutput : where to put XML text
LSParser : XMLtext2DOM
LSSerializer : DOM2XMLText
LSLoadEvent : { getInput(); getNewDocument(); }
LSParserFilter
LSProgressEvent : { getInput(); getPosition(); getTotalSize() }
LSResourceResolver : like EntityResolver in SAX
LSSerializerFilter
Transparency No. 92
DOM
DOMImplementationLS
 Main factory interface for instances of the other LS classes
 methods:
 createLSInput() : LSInput // create an empty LSInput
 createLSOutput() : LSOutput // create an empty LSOutput
 createLSParser(short mode, String schemaType) : LSParser
mode: MODE_SYNCHRONOUS or MODE_ASYNCHRONOUS
schemaType: DTD -> "http://www.w3.org/TR/REC-xml"
XML Schema -> "http://www.w3.org/2001/XMLSchema"
 createLSSerializer() : LSSerializer
Transparency No. 93
DOM
How to get a DOMImplementationLS implementation?
 get a DOMImplementation:
DOMImplementation impl = DocumentBuilderFactory.newInstance()
.newDocumentBuilder().getDOMImplementation();
/* other implementation-dependent code is possible, e.g.,
DOMImplementation impl
= org.apache.xerces.dom.DOMImplementationImpl.getDOMImplementation();
*/
 cast impl to DOMImplementationLS if it supports LS:
DOMImplementationLS ls = null;
if (impl.hasFeature("LS", "3.0")) {
ls = (DOMImplementationLS) impl;
} else { System.out.println("LS 3.0 not supported!"); System.exit(1); }
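Put together as a compilable unit, the two steps might look like the sketch below. The class name LsLookup is an assumption, and throwing an exception in place of exiting is a choice made for this example only.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.DOMImplementation;
import org.w3c.dom.ls.DOMImplementationLS;

class LsLookup {
    // Obtain the DOMImplementation through JAXP and cast it if it supports LS 3.0.
    static DOMImplementationLS loadSave() throws Exception {
        DOMImplementation impl = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().getDOMImplementation();
        if (!impl.hasFeature("LS", "3.0")) {
            throw new UnsupportedOperationException("LS 3.0 not supported!");
        }
        return (DOMImplementationLS) impl;
    }

    public static void main(String[] args) throws Exception {
        DOMImplementationLS ls = loadSave();
        System.out.println("LS implementation: " + ls.getClass().getName());
    }
}
```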
Transparency No. 94
DOM
LSOutput
 represents an output destination for data.
 may wrap one or more of
 a character stream
 a byte stream (+ character encoding)
 a system id
 attributes of LSOutput:
characterStream : Writer
byteStream : OutputStream
systemId : String
encoding : String // see http://www.w3.org/TR/2004/REC-xml-20040204/#charencoding
for its format.
 code snippet:
lsout = ls.createLSOutput();
lsout.setCharacterStream(new FileWriter("file1.xml"));
lsout.setEncoding("UTF-8");
lsout.setSystemId("file:file2.xml");
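A self-contained variant of the snippet is sketched below. It wraps a StringWriter rather than a FileWriter so that nothing is written to disk; the class name LSOutputSketch and the sample element are assumptions.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.DOMImplementation;
import org.w3c.dom.Document;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSOutput;

class LSOutputSketch {
    // Create a one-element document and write it to an in-memory character stream.
    static String write(String rootName, String text) throws Exception {
        DOMImplementation impl = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().getDOMImplementation();
        Document doc = impl.createDocument(null, rootName, null);
        doc.getDocumentElement().setTextContent(text);

        DOMImplementationLS ls = (DOMImplementationLS) impl;
        LSOutput lsout = ls.createLSOutput();
        StringWriter sink = new StringWriter();
        lsout.setCharacterStream(sink); // the character stream is used as the destination
        lsout.setEncoding("UTF-8");     // encoding named in the XML declaration
        ls.createLSSerializer().write(doc, lsout);
        return sink.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(write("greeting", "hello"));
    }
}
```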
Transparency No. 95
DOM
LSInput
 represents an input source for data.
 the actual input is looked for in the following order:
 characterStream
 byteStream
 stringData
 systemId
 publicId
 attributes of LSInput:
characterStream : Reader
byteStream : InputStream
stringData : String
systemId : String
publicId : String
encoding : String
baseURI : String
certifiedText : boolean (can be converted to UTF-* form)
 code snippet:
lsin = ls.createLSInput();
lsin.setCharacterStream(new FileReader("file1.xml"));
lsin.setEncoding("UTF-8");
lsin.setBaseURI("file:file1.xml");
lsin.setSystemId("file:file2.xml");
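As a runnable counterpart, the sketch below feeds an LSInput from stringData and parses it with a synchronous LSParser. The class name LSInputSketch is an assumption, and passing null as the schema type is assumed to be acceptable to the implementation in use.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSInput;
import org.w3c.dom.ls.LSParser;

class LSInputSketch {
    // Parse XML text supplied through the stringData attribute of an LSInput.
    static Document parse(String xml) throws Exception {
        DOMImplementationLS ls = (DOMImplementationLS) DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().getDOMImplementation();
        LSInput lsin = ls.createLSInput();
        lsin.setStringData(xml); // no character or byte stream is set, so stringData is used
        LSParser parser = ls.createLSParser(DOMImplementationLS.MODE_SYNCHRONOUS, null);
        return parser.parse(lsin);
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse("<a><b/></a>");
        System.out.println("root: " + doc.getDocumentElement().getNodeName());
    }
}
```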
Transparency No. 96
DOM
LSSerializer
 provides an API for serializing (writing) a DOM document
out into XML.
 The XML data is written to a string or an output stream.
 A filter can be used to determine which nodes should be
serialized.
 To control an LSSerializer, we should:
cfg = serializer.getDomConfig();
cfg.setParameter(…);
…
serializer.write(doc|node, …)
 attributes and methods of LSSerializer:
domConfig : DOMConfiguration (read-only)
newLine : String
filter : LSSerializerFilter
write(Node, LSOutput) : boolean
writeToString(Node) : String
writeToURI(Node, String uri) : boolean
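The configure-then-write pattern can be sketched as follows. The class name LSSerializerSketch is an assumption; format-pretty-print is an optional parameter, so the sketch checks canSetParameter() before setting it.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.DOMConfiguration;
import org.w3c.dom.Document;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSSerializer;
import org.xml.sax.InputSource;

class LSSerializerSketch {
    // Parse the text, then serialize it without an XML declaration,
    // pretty-printed if the implementation supports that parameter.
    static String prettyPrint(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xml)));

        DOMImplementationLS ls = (DOMImplementationLS) builder.getDOMImplementation();
        LSSerializer serializer = ls.createLSSerializer();
        DOMConfiguration cfg = serializer.getDomConfig();
        if (cfg.canSetParameter("format-pretty-print", Boolean.TRUE)) {
            cfg.setParameter("format-pretty-print", Boolean.TRUE);
        }
        cfg.setParameter("xml-declaration", Boolean.FALSE);
        return serializer.writeToString(doc);
    }

    public static void main(String[] args) throws Exception {
        System.out.println(prettyPrint("<a><b>x</b><c/></a>"));
    }
}
```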
Transparency No. 97
DOM
org.w3c.dom.DOMConfiguration
 represents the configuration of a document and
maintains a table of recognized parameters.
 affects Document.normalizeDocument() behavior, such as
replacing the CDATASection nodes with Text nodes or
specifying the schema to be used for validation.
 used in [DOM Level 3 Load and Save] by the LSParser
and LSSerializer interfaces.
 DOMConfiguration
canSetParameter(String name, Object value) : boolean
setParameter(String name, Object value) : void
getParameter(String name) : Object
getParameterNames() : DOMStringList
 DOMStringList
contains(String str) : boolean
getLength() : int
item(int index) : String
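The effect on Document.normalizeDocument() can be sketched with the two parameters "comments" and "cdata-sections". The class name DomConfigSketch and the sample input are assumptions; the parser is made namespace-aware here because normalizeDocument() performs namespace processing by default.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.DOMConfiguration;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class DomConfigSketch {
    // Parse the text, then normalize it so that comments are dropped
    // and CDATA sections are replaced by ordinary text nodes.
    static Document normalize(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        Document doc = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));

        DOMConfiguration cfg = doc.getDomConfig();
        cfg.setParameter("comments", Boolean.FALSE);
        cfg.setParameter("cdata-sections", Boolean.FALSE);
        doc.normalizeDocument();
        return doc;
    }

    public static void main(String[] args) throws Exception {
        Document doc = normalize("<a><!--c--><![CDATA[x]]></a>");
        System.out.println(doc.getDocumentElement().getChildNodes().getLength() + " child node(s)");
    }
}
```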
Transparency No. 98
DOM
Configurable parameters of DOMConfiguration
parameter | possible value (Type) | meaning
canonical-form | true/false | do XML canonicalization
discard-default-content | true/false | don't show default content in document
format-pretty-print | true/false | pretty-print the output; exact transformation not specified in this spec.
ignore-unknown-character-denormalizations | true/false | issue a warning (true) or raise an error (false) when an unknown character is encountered
normalize-characters | true/false | fully normalize characters
xml-declaration | true/false | show XML or text declaration
Transparency No. 99
DOM
Additional parameters
parameter | possible value (Type) | meaning
cdata-sections | true/false | preserve CDATA sections
comments | true/false | preserve comments
element-content-whitespace | true/false | keep ignorable whitespace
entities | true/false | keep EntityReference nodes in the document
error-handler | DOMErrorHandler | error handler
well-formed | true/false | check well-formedness w.r.t. the XML version
validate | true/false | validate the document when parsing or normalizing
validate-if-schema | true/false | validate if a schema (DTD or …) for the doc element can be found
schema-type | String (absolute URI) | schema language used (DTD, XML Schema, …)
schema-location | String (whitespace-separated URIs) | location of schemas
Transparency No. 100
DOM
LSParser
 An interface to an object that is able to build, or augment, a
DOM tree from various input sources.
 methods:
getDomConfig()
get/setFilter(LSParserFilter)
parse(LSInput), parseURI(String uri) : Document
parseWithContext(LSInput input, Node contextArg,
short action) : Node
Parses an XML fragment from an LSInput and
inserts the content at the position specified by context/action.
possible actions: append-as-children, replace-children,
insert-before, insert-after, replace
Transparency No. 101
DOM
LSParser
abort() // abort parsing
getBusy() : boolean // check if the parser is busy parsing
getAsync() : boolean // check if this is an asynchronous parser
An asynchronous parser should also implement events.EventTarget.
 org.w3c.dom.events.*
 EventTarget:
add/removeEventListener(String type, EventListener listener, boolean
useCapture)
dispatchEvent(org.w3c.dom.events.Event)
 EventListener
handleEvent(Event) : void
 LSLoadEvent extends Event : … getType(), getTarget(), …
Transparency No. 102
DOM
LSParserFilter
 methods:
 short acceptNode(Node nodeArg)
 This method will be called by the parser at the completion of the
parsing of each node.
 values: FILTER_ACCEPT, _REJECT(subtree), _SKIP(a node only),
_INTERRUPT(accept & terminate)
 int getWhatToShow()
 Tells the LSParser what types of nodes to show to the method
LSParserFilter.acceptNode.
 constants used defined in [DOM Level 2 Traversal and Range:
NodeFilter] .
 unpassed nodes are built automatically into the resulting
document.
 short startElement(Element elementArg)
 The parser calls this method after each Element start tag has been scanned, but
before the remainder of the Element is processed.
 purpose: allow an element to be skipped quickly. The return value is the same as for
acceptNode().
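A minimal sketch of a filter that uses startElement() to reject a whole subtree is shown below. The class names FilterSketch and SkipSecretFilter and the element name "secret" are assumptions made for this example, and the behavior relies on the LSParser in use supporting filters.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSInput;
import org.w3c.dom.ls.LSParser;
import org.w3c.dom.ls.LSParserFilter;
import org.w3c.dom.traversal.NodeFilter;

class FilterSketch {
    // Rejects every <secret> element as soon as its start tag has been scanned.
    static class SkipSecretFilter implements LSParserFilter {
        public short startElement(Element element) {
            return "secret".equals(element.getNodeName()) ? FILTER_REJECT : FILTER_ACCEPT;
        }
        public short acceptNode(Node node) {
            return FILTER_ACCEPT;
        }
        public int getWhatToShow() {
            return NodeFilter.SHOW_ELEMENT;
        }
    }

    // Parse the text with the filter installed on a synchronous LSParser.
    static Document parseFiltered(String xml) throws Exception {
        DOMImplementationLS ls = (DOMImplementationLS) DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().getDOMImplementation();
        LSParser parser = ls.createLSParser(DOMImplementationLS.MODE_SYNCHRONOUS, null);
        parser.setFilter(new SkipSecretFilter());
        LSInput input = ls.createLSInput();
        input.setStringData(xml);
        return parser.parse(input);
    }

    public static void main(String[] args) throws Exception {
        Document doc = parseFiltered(
                "<doc><public>a</public><secret><key>k</key></secret></doc>");
        System.out.println("secret elements kept: "
                + doc.getElementsByTagName("secret").getLength());
    }
}
```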
Transparency No. 103