DOM (Document Object Model)
Cheng-Chia Chen
Transparency No. 1

What is DOM?
  DOM (Document Object Model) is:
    a tree-based data model of XML documents
    an API for XML document processing
      language neutral, usable across multiple programming languages
      defined in terms of CORBA IDL
      language-specific bindings supplied for ECMAScript, Java, ….

Document Object Model
  Defines how XML and HTML documents are represented as objects in programs
  W3C Standard
  Defined in IDL; thus language independent
  HTML as well as XML
  Writing as well as reading
  Covers everything except internal and external DTD subsets

Trees
  An XML document can be represented as a tree.
    It has a root.
    It has nodes.
    It is amenable to recursive processing.

DOM (Document Object Model)
  What is the tree view of the document?

  <?xml version="1.0" encoding="UTF-8" ?>
  <TABLE>
    <TBODY>
      <TR> <TD>紅樓夢</TD> <TD>曹雪芹</TD> </TR>
      <TR> <TD>三國演義</TD> <TD>羅貫中</TD> </TR>
    </TBODY>
  </TABLE>

Tree view (DOM view) of an XML document
  [Figure: tree diagram of the TABLE document above, with the document node as root, element nodes below it, and the text nodes 紅樓夢, 曹雪芹, 三國演義, 羅貫中 as leaves]

DOM Evolution
  DOM Level 0:
  DOM Level 1, a W3C Standard
  DOM Level 2, a W3C Standard
  DOM Level 3: W3C Standards
    Document Object Model (DOM) Level 3 Core Specification
    Document Object Model (DOM) Level 3 Load and Save Specification
    Document Object Model (DOM) Level 3 Validation Specification
  DOM Level 3: W3C Working Group Notes
    Document Object Model (DOM) Level 3 XPath Specification Version 1.0
    Document Object Model (DOM) Level 3 Views and Formatting Specification
    Document Object Model (DOM) Level 3 Events Specification Version 1.0
  W3C DOM Working Group
  W3C DOM Tech Reports

DOM Implementations for Java
  Apache XML Project's Xerces parsers: http://xml.apache.org/xerces2-j/index.html
  Oracle/Sun's Java API for XML: http://jaxp.java.net/
  GNU JAXP: http://www.gnu.org/software/classpathx/jaxp/jaxp.html
    Now part of GNU Classpath

DOM Modules
  Modules:
    Core: org.w3c.dom (L1~L3)
    Traversal: org.w3c.dom.traversal (L2)
    XPath, Load and Save, Validation (L3)
    Range: org.w3c.dom.range (L2)
    HTML: org.w3c.dom.html (L2)
    Views: org.w3c.dom.views (L2)
    StyleSheets: org.w3c.dom.stylesheets
    CSS: org.w3c.dom.css
    Events: org.w3c.dom.events (L2)
  Only the Core, Traversal, XPath, Load and Save, and Validation modules really apply to XML. The others are for HTML.

DOM Trees
  The entire document is represented as a tree.
  A tree contains nodes.
  Some nodes may contain other nodes (depending on node type).
  Each document node contains:
    zero or one doctype nodes
    one root element node
    zero or more comment and processing instruction nodes

org.w3c.dom
  17 interfaces:
    Attr, CDATASection, CharacterData, Comment, Document, DocumentFragment,
    DocumentType, DOMImplementation, Element, Entity, EntityReference,
    NamedNodeMap, Node, NodeList, Notation, ProcessingInstruction, Text
  plus one exception:
    DOMException
  plus a bunch of HTML stuff in org.w3c.dom.html and other packages

The DOM Interface Hierarchy
  [Figure: interface hierarchy diagram]
  Fundamental interfaces: DOMException, DOMImplementation, NodeList, NamedNodeMap, Node, Document, DocumentFragment, Element, Attr, CharacterData, Text, Comment
  Extended interfaces: CDATASection, DocumentType, Notation, Entity, EntityReference, ProcessingInstruction

Steps to use DOM
  Create a parser using library-specific code.
  Use the parser to parse the document and return a DOM org.w3c.dom.Document object.
  The entire document is stored in memory.
  DOM methods and interfaces are used to extract data from this object.

Parsing documents with a (Xerces) DOM Parser
  Example

  import com.sun.org.apache.xerces.internal.parsers.*;
  // import org.apache.xerces.parsers.*;
  import org.w3c.dom.*;
  import org.xml.sax.*;
  import java.io.*;

  public class DOMParserMaker {
    public static void main(String[] args) {
      DOMParser parser = new DOMParser();
      for (int i = 0; i < args.length; i++) {
        try {
          // Read the entire document into memory
          parser.parse(args[i]);
          Document d = parser.getDocument();
          // work with the document...
        } catch (SAXException e) {
          System.err.println(e);
        } catch (IOException e) {
          System.err.println(e);
        }
      }
    }
  }

Parsing process using JAXP
  javax.xml.parsers.DocumentBuilderFactory.newInstance() creates a DocumentBuilderFactory.
  Configure the factory.
  The factory's newDocumentBuilder() method creates a DocumentBuilder.
  Configure the builder.
  The builder parses the document and returns a DOM org.w3c.dom.Document object.
  The entire document is stored in memory.
  DOM methods and interfaces are used to extract data from this object.

JAXP's DOM pluggability mechanism

Parsing documents with a JAXP DocumentBuilder

  import javax.xml.parsers.*;
  import org.w3c.dom.*;
  import org.xml.sax.*;
  import java.io.*;

  public class JAXPParserMaker {
    public static void main(String[] args) {
      try {
        DocumentBuilderFactory builderFactory = DocumentBuilderFactory.newInstance();
        builderFactory.setNamespaceAware(true);
        DocumentBuilder parser = builderFactory.newDocumentBuilder();
        for (int i = 0; i < args.length; i++) {
          try {
            // Read the entire document into memory
            Document d = parser.parse(args[i]);
            // work with the document...
          } catch (SAXException e) {
            System.err.println(e);
          } catch (IOException e) {
            System.err.println(e);
          }
        } // end for
      } catch (ParserConfigurationException e) {
        System.err.println("You need to install a JAXP aware parser.");
      }
    }
  }

The Node Interface

  package org.w3c.dom;

  public interface Node {
    // NodeType
    public static final short ELEMENT_NODE                = 1;
    public static final short ATTRIBUTE_NODE              = 2;
    public static final short TEXT_NODE                   = 3;
    public static final short CDATA_SECTION_NODE          = 4;
    public static final short ENTITY_REFERENCE_NODE       = 5;
    public static final short ENTITY_NODE                 = 6;
    public static final short PROCESSING_INSTRUCTION_NODE = 7;
    public static final short COMMENT_NODE                = 8;
    public static final short DOCUMENT_NODE               = 9;
    public static final short DOCUMENT_TYPE_NODE          = 10;
    public static final short DOCUMENT_FRAGMENT_NODE      = 11;
    public static final short NOTATION_NODE               = 12;

The Node interface
  Node properties: name (qname, uri, lname, prefix), type, value

    public String getNodeName();
    public String getNamespaceURI();
    public String getPrefix();
    public void   setPrefix(String prefix) throws DOMException;
    public String getLocalName();
    public String getNodeValue() throws DOMException;
    public void   setNodeValue(String value) throws DOMException;
    public short  getNodeType();

The Node interface
  Tree navigation

    public Node         getParentNode();
    public NodeList     getChildNodes();
    public Node         getFirstChild();
    public Node         getLastChild();
    public Node         getPreviousSibling();
    public Node         getNextSibling();
    public NamedNodeMap getAttributes();
    public Document     getOwnerDocument();
    public boolean      hasChildNodes();
    public boolean      hasAttributes();

Node navigation
  [Figure: navigation from this node: parentNode, previousSibling, nextSibling, firstChild, lastChild, childNodes]

The Node interface
  Tree modification

    public Node insertBefore(Node newNode, Node refNode) throws DOMException;
    public Node appendChild(Node newNode) throws DOMException;
    public Node replaceChild(Node newNode, Node refNode) throws DOMException;
    public Node removeChild(Node node) throws DOMException;
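
The four tree modification methods can be tried on a small in-memory document. The following is only a sketch: the class name TreeModificationSketch, its helper childNames(), and the element names root, a, b, c are made up for illustration, and the empty document comes from JAXP's DocumentBuilder.newDocument().

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

// Hypothetical class name; a sketch of the tree modification methods of Node.
public class TreeModificationSketch {

  // Returns the node names of parent's children in document order, separated by spaces.
  static String childNames(Node parent) {
    StringBuilder sb = new StringBuilder();
    for (Node c = parent.getFirstChild(); c != null; c = c.getNextSibling()) {
      if (sb.length() > 0) sb.append(' ');
      sb.append(c.getNodeName());
    }
    return sb.toString();
  }

  static String demo() {
    Document doc;
    try {
      doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
    } catch (ParserConfigurationException e) {
      throw new IllegalStateException(e);
    }
    Element root = doc.createElement("root");
    doc.appendChild(root);

    Element b = doc.createElement("b");
    root.appendChild(b);          // children of root: b
    Element a = doc.createElement("a");
    root.insertBefore(a, b);      // children of root: a b
    Element c = doc.createElement("c");
    root.replaceChild(c, b);      // children of root: a c (b is detached and returned)
    root.removeChild(a);          // children of root: c
    root.appendChild(b);          // children of root: c b
    return childNames(root);
  }

  public static void main(String[] args) {
    System.out.println(demo());   // prints "c b"
  }
}
```

Each method returns a node (the inserted, replaced, or removed one), so a detached node such as b above can be attached again later.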

Node manipulation
  [Figure: manipulation of this node's childNodes (firstChild … refNode … lastChild) using a newNode:
    this.appendChild(newNode)
    this.insertBefore(newNode, refNode)
    this.replaceChild(newNode, refNode)
    this.removeChild(refNode)]

The Node interface
  Utilities

    public Node cloneNode(boolean deep);
    public void normalize();
      merges all adjacent text nodes into one
      CDATA section delimiters are preserved
      no empty text nodes remain
    public boolean isSupported(String feature, String version);
      Tests whether the DOM implementation implements a specific feature and that feature is supported by this node.

Node (continued)
  New in DOM 3

    String getTextContent()
      Returns the text content of this node and its descendants, i.e., the string-value of the node in the XPath view (except for Document, for which getTextContent() is null).
    void setTextContent(String arg)
      Makes a text node containing arg the only child of this node.
    boolean isDefaultNamespace(String namespaceURI)
      Checks whether the specified namespaceURI is the default namespace.
    boolean isEqualNode(Node arg)
      Tests whether two nodes are equal.
      When are two nodes equal? Same type, names, attributes, value, and childNodes.
    boolean isSameNode(Node other)
      Returns whether this node is the same node as the given one.
    String lookupNamespaceURI(String prefix)
      Looks up the namespace URI associated with the given prefix, starting from this node.

    short compareDocumentPosition(Node other)
      Compares 'this' node with the 'other' node.
      Possible values: Node.DOCUMENT_POSITION_PRECEDING, _FOLLOWING, _CONTAINS, _CONTAINED_BY, _DISCONNECTED, _IMPLEMENTATION_SPECIFIC
    String getBaseURI() // lookup order: xml:base, entity, document
      The absolute base URI of this node, or null if the implementation wasn't able to obtain an absolute URI.
    Object getUserData(String key)
      Retrieves the object associated with a key on this node.
    Object setUserData(String key, Object data, UserDataHandler handler)
      Associates an object with a key on this node.
      The handler can handle events (clone, delete, rename, etc.) for the node.

UserDataHandler (skipped!)
  Gets called when the node the object is associated with is being cloned, adopted, deleted, imported, or renamed.
  Can be used by the application to implement various behaviors regarding the data it associates with the DOM nodes.

    void handle(short operation, String key, Object data, Node src, Node dst)
      This method is called whenever the node for which this handler is registered is imported or cloned.
      operation - Specifies the type of operation that is being performed on the node: { adopt, clone, import, delete, rename }
        methods: Doc.adoptNode(), Node.cloneNode(), Doc.importNode(), Node.removeChild(node), Doc.renameNode()
      key - Specifies the key for which this handler is being called.
      data - Specifies the data for which this handler is being called.
      src - Specifies the node being cloned, adopted, imported, or renamed. This is null when the node is being deleted.
      dst - Specifies the node newly created, if any, or null.

The NodeList Interface
  Represents an ordered collection of nodes, without defining or constraining how this collection is implemented.

    package org.w3c.dom;

    public interface NodeList {           // 0-based
      public Node item(int index);        // access by position!
      public int  getLength();
    }

  Why not just List<Node>?
    List<Node> exists only in Java, but DOM is defined for other languages as well.

The NamedNodeMap interface
  Represents collections of nodes that can be accessed by name.

    public interface NamedNodeMap {
      public Node item(int index);               // same as NodeList
      public int  getLength();
      public Node getNamedItem(String name);     // key = nodeName
      public Node setNamedItem(Node arg) throws DOMException;
        // inserts or replaces a node, depending on whether the map already
        // has a node with the same name as arg.getNodeName()
        // the old node is returned if this is a replacement!
      public Node removeNamedItem(String name) throws DOMException;
      // Introduced in DOM Level 2: key = URI + localName
      public Node getNamedItemNS(String namespaceURI, String localName);
      public Node setNamedItemNS(Node arg) throws DOMException;
      public Node removeNamedItemNS(String namespaceURI, String localName) throws DOMException;
    }

DOMStringList, NameList
  DOMStringList // like List<String>
    An ordered collection of DOMString (i.e., Java String) values.
    boolean contains(String str)
      Tests if a string is part of this DOMStringList.
    + getLength() + item(int)
  NameList // like List<(name, namespaceURI)>
    An ordered collection of pairs of (prefixed) name and namespace values (which could be null values).
    int getLength()
    String getName(int index)
    String getNamespaceURI(int index)
    boolean contains(String str)
      Tests if a name is part of this NameList.
    boolean containsNS(String namespaceURI, String name)
      Tests if the pair namespaceURI/name is part of this NameList.

NodeReporter

  import javax.xml.parsers.*;
  import org.w3c.dom.*;
  import org.xml.sax.*;
  import java.io.*;

  public class NodeReporter {
    public static void main(String[] args) {
      try {
        DocumentBuilderFactory builderFactory = DocumentBuilderFactory.newInstance();
        DocumentBuilder parser = builderFactory.newDocumentBuilder();
        NodeReporter iterator = new NodeReporter();
        for (int i = 0; i < args.length; i++) {
          try {
            // Read the entire document into memory
            Document doc = parser.parse(args[i]);
            iterator.followNode(doc);
          } catch (SAXException ex) {
            System.err.println(args[i] + " is not well-formed.");
          } catch (IOException ex) {
            System.err.println(ex);
          }
        }
      } catch (ParserConfigurationException ex) {
        System.err.println("You need to install a JAXP aware parser.");
      }
    } // end main

    // note use of recursion
    public void followNode(Node node) {
      processNode(node);
      if (node.hasChildNodes()) {
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
          followNode(children.item(i));
        }
      }
    }

    public void processNode(Node node) {
      String name = node.getNodeName();
      String type = typeName[node.getNodeType()];
      System.out.println("Type " + type + ": " + name);
    }

Type2TypeName

    // indexed by the node type constants of Node (1..12)
    public String[] typeName = new String[] {
      "Unknown Type",
      "Element",
      "Attribute",
      "Text",
      "CDATA Section",
      "Entity Reference",
      "Entity",
      "Processing Instruction",
      "Comment",
      "Document",
      "Document Type Declaration",
      "Document Fragment",
      "Notation"
    };
  } // end class NodeReporter

Values of nodeName, nodeValue and attributes in a Node

  Interface              nodeName                    nodeValue                       attributes
  Attr                   name of attribute           value of attribute              null
  CDATASection           #cdata-section              content                         null
  Comment                #comment                    content                         null
  Document               #document                   null                            null
  DocumentFragment       #document-fragment          null                            null
  DocumentType           document type name          null                            null
  Element                tag name                    null                            NamedNodeMap
  Entity                 entity name                 null                            null
  EntityReference        name of entity referenced   null                            null
  Notation               notation name               null                            null
  ProcessingInstruction  target                      content excluding the target    null
  Text                   #text                       content of the text node        null

The Document Node
  The root node representing the entire document; not the same as the root element
  Contains:
    zero or more processing instruction nodes
    zero or more comment nodes
    zero or one document type node
    one element node
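
The nodeName/nodeValue table and the contents of the document node can be checked with a short recursive walk. This is a minimal sketch, assuming a JAXP parser is available; the class name NodeNameValueSketch and the sample document are made up for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical class name; reports nodeName=nodeValue for every node in tree order.
public class NodeNameValueSketch {

  // Illustrative document: a comment and a processing instruction before the root element.
  static final String XML =
      "<?xml version=\"1.0\"?><!--hi--><?robots index=\"yes\"?><a id=\"1\">text</a>";

  // Recursive walk in document order; attributes are not children, so they are not visited.
  static void walk(Node node, List<String> out) {
    out.add(node.getNodeName() + "=" + node.getNodeValue());
    for (Node c = node.getFirstChild(); c != null; c = c.getNextSibling()) {
      walk(c, out);
    }
  }

  static List<String> report() {
    try {
      Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
          .parse(new InputSource(new StringReader(XML)));
      List<String> out = new ArrayList<>();
      walk(doc, out);
      return out;
    } catch (Exception e) {
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    // Expected lines: #document=null, #comment=hi, robots=index="yes", a=null, #text=text
    for (String line : report()) {
      System.out.println(line);
    }
  }
}
```

The id attribute does not show up in the walk because attribute nodes are reached through getAttributes(), not through the child list.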

The Document Interface

  package org.w3c.dom;

  public interface Document extends Node {
    public DocumentType      getDoctype();
    public DOMImplementation getImplementation();
    public Element           getDocumentElement();
    public String            getDocumentURI();   // DOM Level 3
      // null if not specified or if the document was created using
      // DOMImplementation.createDocument()
    public NodeList getElementsByTagName(String tagname);
    public NodeList getElementsByTagNameNS(String namespaceURI, String localName);
    public Element  getElementById(String elementId);

The Document Interface

    // Factory methods
    public Element createElement(String tagName) throws DOMException;
    public Element createElementNS(String namespaceURI, String qName) throws DOMException;
    public DocumentFragment createDocumentFragment();
    public Text         createTextNode(String data);
    public Comment      createComment(String data);
    public CDATASection createCDATASection(String data) throws DOMException;
    public ProcessingInstruction createProcessingInstruction(String target, String data) throws DOMException;
    public Attr createAttribute(String name) throws DOMException;
    public Attr createAttributeNS(String namespaceURI, String qName) throws DOMException;
    public EntityReference createEntityReference(String name) throws DOMException;
    public Node importNode(Node importedNode, boolean deep) throws DOMException;
  }

New in Document V3
  Node adoptNode(Node node)
    Adopts (i.e., moves) the tree rooted at node from its owner document to this document.
    The node is detached from its parent if it has one.
    cf. importNode(Node, deep) // this makes a copy
  DOMConfiguration getDomConfig()
    DOMConfiguration is a table of (key, value) parameters used to control how Document.normalizeDocument() behaves.
  normalizeDocument()
    Acts as if the document was going through a save and load cycle, putting the document in a "normal" form.

  Use case:
    DOMConfiguration docConfig = myDocument.getDomConfig();
    docConfig.setParameter("infoset", Boolean.TRUE);
    myDocument.normalizeDocument();

  Node renameNode(Node n, String namespaceURI, String qualifiedName) throws DOMException
    Renames an existing node of type ELEMENT_NODE or ATTRIBUTE_NODE.
  getXmlVersion(), getXmlEncoding(), getXmlStandalone(): boolean
    Get the respective values from the XML declaration of a document.
    <?xml version="1.1" encoding="UTF-8" standalone="yes" ?>
  setXmlVersion(String), setXmlStandalone(boolean)

Element Nodes
  Represents a complete element including its start-tag, end-tag, and content
  Content may contain:
    Element nodes
    ProcessingInstruction nodes
    Comment nodes
    Text nodes
    CDATASection nodes
    EntityReference nodes

The Element Interface

    public String   getTagName(); // = getNodeName();
    public NodeList getElementsByTagName(String name);
    public NodeList getElementsByTagNameNS(String uri, String localName);
    public String   getAttribute(String name);
    public String   getAttributeNS(String uri, String localName);
    public void     setAttribute(String name, String value) throws DOMException;
    public void     setAttributeNS(String uri, String qName, String value) throws DOMException;
    public void     removeAttribute(String name) throws DOMException;
    public void     removeAttributeNS(String uri, String localName) throws DOMException;
    public Attr     getAttributeNode(String name);
    public Attr     getAttributeNodeNS(String namespaceURI, String localName);
    public Attr     setAttributeNode(Attr newAttr) throws DOMException;
    public Attr     setAttributeNodeNS(Attr newAttr) throws DOMException;
    public Attr     removeAttributeNode(Attr oldAttr) throws DOMException;
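
The attribute methods of Element can be exercised on a single element. The sketch below is illustrative only: the class name ElementAttributeSketch and the attribute names lang and nope are assumptions, not part of the original example set.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Attr;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical class name; a sketch of string-based and node-based attribute access.
public class ElementAttributeSketch {

  static String demo() {
    Document doc;
    try {
      doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
    } catch (ParserConfigurationException e) {
      throw new IllegalStateException(e);
    }
    Element log = doc.createElement("log");
    doc.appendChild(log);

    log.setAttribute("lang", "en");
    String lang = log.getAttribute("lang");        // value of an existing attribute
    String missing = log.getAttribute("nope");     // empty string (not null) when absent

    Attr attr = log.getAttributeNode("lang");      // the same attribute as a node
    boolean owned = log.isSameNode(attr.getOwnerElement());
    boolean noParent = attr.getParentNode() == null; // attributes are not children

    log.removeAttribute("lang");
    int remaining = log.getAttributes().getLength();

    return lang + "|" + missing.isEmpty() + "|" + owned + "|" + noParent + "|" + remaining;
  }

  public static void main(String[] args) {
    System.out.println(demo()); // prints "en|true|true|true|0"
  }
}
```

The string-based methods are usually enough; the Attr-based methods matter when the attribute node itself (for example its getSpecified() flag) is of interest.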

Example application
  RSS-based list of Web logs

  <?xml version="1.0"?>
  <!-- <!DOCTYPE foo SYSTEM "http://msdn.microsoft.com/xml/general/htmlentities.dtd"> -->
  <weblogs>
    <log>
      <name>MozillaZine</name>
      <url>http://www.mozillazine.org</url>
      <changesUrl>http://www.mozillazine.org/contents.rdf</changesUrl>
      <ownerName>Jason Kersey</ownerName>
      <ownerEmail>kerz@en.com</ownerEmail>
      <description>THE source for news on the Mozilla Organization. DevChats, Reviews, Chats, Builds, Demos, Screenshots, and more.</description>
      <imageUrl></imageUrl>
      <adImageUrl>http://static.userland.com/weblogMonitor/ads/kerz@en.com.gif </adImageUrl>
    </log>
    …
  </weblogs>

DOM Design
  Want to find all URLs in the logs
  The character data of each url element needs to be read.
  Everything else can be ignored.
  The getElementsByTagName() method in Document gives us a quick list of all the url elements.

The program
  WeblogsDOM.java

CharacterData interface
  Represents things that are basically text holders
  Super interface of
    Text
    Comment
    CDATASection

The CharacterData Interface
  Note: applicable to Comment, Text and CDATASection

  public interface CharacterData extends Node {
    // content retrieval
    public String getData() throws DOMException;
    public int    getLength();
    public String substringData(int offset, int count) throws DOMException; // 0-based
    // content modification
    public void setData(String data) throws DOMException;
    public void appendData(String arg) throws DOMException;
    public void insertData(int offset, String arg) throws DOMException;
    public void deleteData(int offset, int count) throws DOMException;
    public void replaceData(int offset, int count, String arg) throws DOMException;
  }

Text Nodes
  Represents the text content of an element or an attribute
  Contains only pure text, no markup
  Parsers will return a single maximal text node for each contiguous run of pure text.
  Editing may change this.

The Text Interface

  public interface Text extends CharacterData {
    public Text splitText(int offset) throws DOMException;
      Splits this node into two; this node becomes the first part and the last part is returned.
    String getWholeText()
      Returns all text of Text nodes logically adjacent to this node, concatenated in document order.
    boolean isElementContentWhitespace()
      Returns whether this text node contains ignorable whitespace.
    Text replaceWholeText(String content)
      Replaces the text of the current node and all logically adjacent text nodes with the specified text.
      Returns the Text node created with the new specified content.
  }

CDATA section Nodes
  Represents a CDATA section like this example from a hypothetical SVG tutorial:

  <p>You can use a default <code>xmlns</code> attribute to avoid having to add the svg prefix to all your elements:</p>
  <![CDATA[
  <svg xmlns="http://www.w3.org/2000/svg" width="12cm" height="10cm">
    <ellipse rx="110" ry="130" />
    <rect x="4cm" y="1cm" width="3cm" height="6cm" />
  </svg>
  ]]>

  No children

The CDATASection Interface

  // no additional methods other than those from Text
  public interface CDATASection extends Text { }

DocumentType Nodes
  Represents a document type declaration
  Has no children

The DocumentType Interface

  public interface DocumentType extends Node {
    public String getName();
    public String getPublicId();
    public String getSystemId();
    public NamedNodeMap getEntities();
      Returns all general entities, both external and internal, declared in the DTD.
    public NamedNodeMap getNotations();
    public String getInternalSubset();
      Returns the internal subset as a string, or null if there is none.
  }

Example

  <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
      "DTD/xhtml1-strict.dtd" [
    <!ENTITY foo "foo">
    <!ENTITY bar "bar">
    <!ENTITY bar "bar2">
    <!ENTITY % baz "baz">
  ]>

  name           = "html"
  publicId       = "-//W3C//DTD XHTML 1.0 Strict//EN"
  systemId       = "DTD/xhtml1-strict.dtd"
  internalSubset = "<!ENTITY foo "foo"> … "
  getEntities()    [ 'foo': entity(foo), 'bar': entity(bar) ] // read-only
  getNotations()   null

Attr Nodes
  Represents an attribute
  Contains:
    Text nodes
    Entity reference nodes

The Attr Interface

  public interface Attr extends Node {
    public String  getName();
    public boolean getSpecified();   // false => default value from the DTD
    public String  getValue();
    public void    setValue(String value) throws DOMException;
    public Element getOwnerElement();
      // Attr.getParentNode() == null
      // An Attr is not a member of Element.getChildNodes()
      // namespaceURI, prefix, localName are inherited from Node
    public boolean isId();           // DOM Level 3: is this an ID attribute?
    TypeInfo getSchemaTypeInfo();
      // The type information associated with this attribute.
  }

TypeInfo
  Represents a type referenced from Element or Attr nodes, specified in the schemas associated with the document.
  = [namespace URI] + [type name]

  public interface TypeInfo {
    String getTypeName()
      The name of a type declared for the associated element or attribute, or null if unknown.
    String getTypeNamespace()
      The namespace of the type declared for the associated element or attribute, or null if the element does not have a declaration or if no namespace information is available.
    boolean isDerivedFrom(String typeNamespaceArg, String typeNameArg, int derivationMethod)
      true if this type is derived from the type definition given by the arguments.
      derivationMethod = one of { DERIVATION_EXTENSION, DERIVATION_RESTRICTION, DERIVATION_UNION, DERIVATION_LIST }
  }

ProcessingInstruction Nodes
  Represents a processing instruction like
    <?robots index="yes" follow="no"?>
  No children

The ProcessingInstruction Interface

  public interface ProcessingInstruction extends Node {
    public String getTarget();
    public String getData();
    public void   setData(String data) throws DOMException;
  }

  Ex: <?robots index="yes" follow="no" ?>
    target = [robots]
    data   = [index="yes" follow="no" ]

Comment Nodes
  Represents a comment like this example from the XML 1.0 spec:
    <!--* This is a comment -->
  No children

  The Comment Interface

  package org.w3c.dom;
  public interface Comment extends CharacterData { }

  Notes: Text, CDATASection, Comment are all subinterfaces of CharacterData and can use all methods defined in it.

Entity
  (for internal/external, parsed/unparsed general entities)

  public interface Entity extends Node {   // parameter entities (PEs) are not represented
    public String getPublicId();       // external only
    public String getSystemId();       // external only
    public String getNotationName();   // unparsed only
    public String getXmlEncoding();    // external only
      // encoding given in the text declaration; null if not given or not an external entity
    public String getInputEncoding();  // external only
      // actual encoding used for parsing; = XmlEncoding if it is given
    public String getXmlVersion();     // external only
  }
  // An Entity's replacement text is stored as its read-only childNodes, if available.
  // Its text can be obtained with getTextContent().

Notation, Entity and EntityReference

  public interface Notation extends Node {
    public String getPublicId();
    public String getSystemId();
  }

  public interface EntityReference extends Node { }
  // The contents of the referenced entity are children of this node.
  // nodeName contains the name of the entity referenced.

DOMException
  A runtime exception, but you should catch it
  The error code is accessible from the public code field
  The error code gives more detailed information:

  import static org.w3c.dom.DOMException.*;

    INDEX_SIZE_ERR
      Index or size is negative, or greater than the allowed value
    DOMSTRING_SIZE_ERR
      The specified range of text does not fit into a String
    HIERARCHY_REQUEST_ERR
      Attempt to insert a node somewhere it doesn't belong
    WRONG_DOCUMENT_ERR
      A node is used in a different document than the one that created it (that doesn't support it)
    INVALID_CHARACTER_ERR
      An invalid or illegal character is specified, such as in a name.
    NO_DATA_ALLOWED_ERR
      Attempt to add data to a node which does not support data

DOMException
    NO_MODIFICATION_ALLOWED_ERR
      Attempt to modify a read-only object
    NOT_FOUND_ERR
      Attempt to reference a node in a context where it does not exist
    NOT_SUPPORTED_ERR
      The implementation does not support the type of object requested
    INUSE_ATTRIBUTE_ERR
      Attempt to add an attribute to an element that already has that attribute
    INVALID_STATE_ERR
      An attempt is made to use an object that is not, or is no longer, usable.
    SYNTAX_ERR
      An invalid or illegal string is specified.
    INVALID_MODIFICATION_ERR
      An attempt to modify the type of the underlying object.
    NAMESPACE_ERR
      An attempt is made to create or change an object in a way which is incorrect with regard to namespaces.
    INVALID_ACCESS_ERR
      A parameter or an operation is not supported by the underlying object.

The DOMImplementation interface
  Creates new Document objects
  Creates new DocumentType objects
  Tests features supported by this implementation
  Gets implementations of other features
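
The error codes listed on the DOMException slides can be inspected through the public code field. The sketch below provokes two of them; the class name DOMExceptionSketch and the element names are made up, and the codes shown are what a Xerces-based JAXP implementation (such as the one bundled with the JDK) is expected to report with its default error checking.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.DOMException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical class name; a sketch of catching DOMException and reading its code.
public class DOMExceptionSketch {

  static Document newDocument() {
    try {
      return DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
    } catch (ParserConfigurationException e) {
      throw new IllegalStateException(e);
    }
  }

  // A document may have only one root element; a second one does not belong there.
  static short secondRootCode() {
    Document doc = newDocument();
    doc.appendChild(doc.createElement("root"));
    try {
      doc.appendChild(doc.createElement("another"));
      return 0; // no exception raised
    } catch (DOMException e) {
      return e.code; // expected: DOMException.HIERARCHY_REQUEST_ERR
    }
  }

  // A node created by one document cannot be inserted directly into another.
  static short foreignNodeCode() {
    Document doc1 = newDocument();
    Document doc2 = newDocument();
    Element root = doc1.createElement("root");
    doc1.appendChild(root);
    try {
      root.appendChild(doc2.createElement("foreign"));
      return 0; // no exception raised
    } catch (DOMException e) {
      return e.code; // expected: DOMException.WRONG_DOCUMENT_ERR
    }
  }

  public static void main(String[] args) {
    System.out.println(secondRootCode() == DOMException.HIERARCHY_REQUEST_ERR);
    System.out.println(foreignNodeCode() == DOMException.WRONG_DOCUMENT_ERR);
  }
}
```

The second case is where importNode() or adoptNode() from the Document interface would be used instead of a direct appendChild().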

DOMImplementation interface

  package org.w3c.dom;

  public interface DOMImplementation {
    public boolean hasFeature(String feature, String version)
    public Object  getFeature(String feature, String version)
    public DocumentType createDocumentType(String qName, String publicID, String systemID, String internalSubset)
    public Document createDocument(String uri, String qName, DocumentType doctype) throws DOMException
  }

org.apache.xerces.dom.DOMImplementationImpl
  The Xerces-specific class that implements DOMImplementation

  package org.apache.xerces.dom;
  // or
  // package com.sun.org.apache.xerces.internal.dom;

  public class DOMImplementationImpl implements DOMImplementation {
    // factory method
    public static DOMImplementation getDOMImplementation()
    public boolean hasFeature(String feature, String version)
    public Object  getFeature(String feature, String version)
    public DocumentType createDocumentType(String qName, String publicID, String systemID, String internalSubset)
    public Document createDocument(String uri, String qName, DocumentType doctype) throws DOMException
  }

Examples of creating DOM documents in memory
  FibonacciDOM.java, using Xerces-J
  FibonacciJAXP.java, using JAXP

Which modules and features are supported?
  A DOM application can use the hasFeature() method of the DOMImplementation interface to determine whether a module is supported or not.
    XML Module: "XML"
    HTML Module: "HTML"
    Views Module: "Views"
    StyleSheets Module: "StyleSheets"
    CSS Module: "CSS"
    CSS (extended interfaces) Module: "CSS2"
    Events Module: "Events"
    User Interface Events (UIEvent interface) Module: "UIEvents"
    Mouse Events Module: "MouseEvents"
    Mutation Events Module: "MutationEvents"
    HTML Events Module: "HTMLEvents"
    Traversal Module: "Traversal"
    Range Module: "Range"

Which modules are supported?

  import org.apache.xerces.dom.DOMImplementationImpl;
  import org.w3c.dom.*;
  import java.io.*;

  public class ModuleChecker {
    public static void main(String[] args) {
      // parser dependent
      DOMImplementation implementation = DOMImplementationImpl.getDOMImplementation();
      String[] features = { "XML", "HTML", "Views", "StyleSheets", "CSS", "CSS2",
                            "Events", "UIEvents", "MouseEvents", "MutationEvents",
                            "HTMLEvents", "Traversal", "Range" };
      for (int i = 0; i < features.length; i++) {
        if (implementation.hasFeature(features[i], "2.0")) {
          System.out.println("Implementation supports " + features[i]);
        } else {
          System.out.println("Implementation does not support " + features[i]);
        }
      }
    }
  }

The result

  > java ModuleChecker
  Implementation supports XML
  Implementation does not support HTML
  Implementation does not support Views
  Implementation does not support StyleSheets
  Implementation does not support CSS
  Implementation does not support CSS2
  Implementation supports Events
  Implementation does not support UIEvents
  Implementation does not support MouseEvents
  Implementation supports MutationEvents
  Implementation does not support HTMLEvents
  Implementation supports Traversal
  Implementation supports Range
  >

Which modules are supported?

  import javax.xml.parsers.*;
  import org.w3c.dom.*;

  public class ModuleChecker {
    public static void main(String[] args) throws ParserConfigurationException {
      // use JAXP 1.3; no parser-specific class is needed
      DOMImplementation implementation = DocumentBuilderFactory
          .newInstance().newDocumentBuilder().getDOMImplementation();
      String[] features = { "XML", "HTML", "Views", "StyleSheets", "CSS", "CSS2",
                            "Events", "UIEvents", "MouseEvents", "MutationEvents",
                            "HTMLEvents", "Traversal", "Range" };
      for (int i = 0; i < features.length; i++) {
        if (implementation.hasFeature(features[i], "3.0")) {
          System.out.println("Implementation supports " + features[i]);
        } else {
          System.out.println("Implementation does not support " + features[i]);
        }
      }
    }
  }

The result

  > java ModuleChecker
  Implementation supports XML
  Implementation does not support HTML
  Implementation does not support Views
  Implementation does not support StyleSheets
  Implementation does not support CSS
  Implementation does not support CSS2
  Implementation supports Events
  Implementation does not support UIEvents
  Implementation does not support MouseEvents
  Implementation supports MutationEvents
  Implementation does not support HTMLEvents
  Implementation supports Traversal
  Implementation supports Range
  >

Serialization
  The process of taking an in-memory DOM tree and converting it to a stream of characters that can be written onto an output stream
  Not a standard part of DOM Level 2
  The org.apache.xml.serialize package:

    public interface DOMSerializer
    public interface Serializer
    public abstract class BaseMarkupSerializer extends Object
        implements DocumentHandler, org.xml.sax.misc.LexicalHandler, DTDHandler,
                   org.xml.sax.misc.DeclHandler, DOMSerializer, Serializer
    public class HTMLSerializer extends BaseMarkupSerializer
    public final class TextSerializer extends BaseMarkupSerializer
    public final class XHTMLSerializer extends HTMLSerializer
    public final class XMLSerializer extends BaseMarkupSerializer

Example
  A DOM program that writes Fibonacci numbers onto System.out
  FibonacciDOMSerializer.java

OutputFormat
  Controls pretty-printing of the output.
package org.apache.xml.serialize;
public class OutputFormat extends Object {
  public OutputFormat([String method, String encoding, boolean indenting])
  public OutputFormat([Document doc,] String encoding, boolean indenting)
  // the getter/setter pair below is abbreviated as: public property String method;
  // typical values: "xml", "html" and "text"
  public String getMethod();
  public void setMethod(String method)
  // other public properties:
  //   int indent, lineWidth;
  //   boolean indenting, omitXMLDeclaration, standalone, preserveSpace;
  //   String lineSeparator, encoding, version, mediaType, doctypePublic, doctypeSystem;
  public void setDoctype(String publicID, String systemID)
  // Elements whose text children should be output as CDATA
  public String[] getCDataElements()
  public boolean isCDataElement(String tagName)
  public void setCDataElements(String[] cdataElements)
Transparency No. 75

OutputFormat
  // Non-escaping elements, i.e., text children are output without using character references
  public String[] getNonEscapingElements()
  public boolean isNonEscapingElement(String tagName)
  public void setNonEscapingElements(String[] nonEscapingElements)
  // last printable character in the encoding
  public char getLastPrintable()
Query methods
  public static String whichMethod(Document doc)
  public static String whichDoctypePublic(Document doc)
  public static String whichDoctypeSystem(Document doc)
  public static String whichMediaType(String method)
Transparency No. 76

Better formatted output
UTF-8 encoding, Indentation, Word wrapping
Document type declaration
try {
  // Now that the document is created we need to *serialize* it
  OutputFormat format = new OutputFormat(fibonacci, "UTF-8", true);
  format.setLineSeparator("\r\n");
  format.setLineWidth(72);
  format.setDoctype(null, "fibonacci.dtd");
  XMLSerializer serializer = new XMLSerializer(System.out, format);
  serializer.serialize(root);
} catch (IOException e) {
  System.err.println(e);
}
> java domexample.PrettyFibonacciDOMSerializer
Transparency No.
77

DOM based XMLPrettyPrinter
import java.io.IOException;
import org.apache.xerces.parsers.DOMParser;
import org.apache.xml.serialize.OutputFormat;
import org.apache.xml.serialize.XMLSerializer;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;

public class DOMPrettyPrinter {
  public static void main(String[] args) {
    DOMParser parser = new DOMParser();
    for (int i = 0; i < args.length; i++) {
      try {
        // Read the entire document into memory
        parser.parse(args[i]);
        Document document = parser.getDocument();
        // set output format & serialize
        OutputFormat format = new OutputFormat(document, "UTF-8", true);
        format.setLineSeparator("\r\n");
        format.setIndent(2);
        format.setPreserveSpace(false);
        format.setIndenting(true);
        format.setLineWidth(72);
        XMLSerializer serializer = new XMLSerializer(System.out, format);
        serializer.serialize(document);
      } catch (SAXException e) {
        System.err.println(e);
      } catch (IOException e) {
        System.err.println(e);
      }
    }
  } // end main
}
Transparency No. 78

Notes
Using the DOM to write documents automatically maintains well-formedness constraints.
Validity is not automatically maintained.
Transparency No. 79

References
Much of the code in this presentation comes from: http://www.cafeconleche.org/slides/sd2004west/saxdom
Processing XML with Java, Elliotte Rusty Harold, Chapters 9-13:
  Chapter 9, The Document Object Model
  Chapter 10, Creating New XML Documents with DOM
  Chapter 11, The Document Object Model Core
  Chapter 12, The DOM Traversal Module
  Chapter 13, Output from DOM
DOM Level 2 Core Specification
DOM Level 2 Traversal and Range Specification
Transparency No. 80

JAXP (Java API for XML Processing) for DOM
Transparency No. 1

DOMParsers and DOMImplementations
XML Document -> DOM Parser -> DOM Document
Problems:
How to get a DOM Document object from an XML document?
  Get a DOM parser, parse the XML document, and then get the DOM document from the parser.
How to construct DOM objects directly by program?
  Get a DOMImplementation and invoke createDocument() to get the initial DOM document.
How to get a DOM object from an XML document and modify it by program?
  Get a DOM document by parsing the XML document, use the factory methods of Document to create nodes, and use Node methods to add them to the result tree.
Transparency No. 82

Use Apache's Xerces for DOM
XML2DOM:
  // find the DOM parser implementation class:
  //   import org.apache.xerces.parsers.DOMParser
  //   or import com.sun.org.apache.xerces.internal.parsers.DOMParser
  DOMParser parser = new DOMParser();
  parser.setFeature("http://xml.org/sax/features/validation", true);
  parser.setFeature("http://xml.org/sax/features/namespaces", true);
  …
  parser.parse(url_or_inputSource);
  Document doc = parser.getDocument();
  DOMImplementation dm = doc.getImplementation();
Transparency No. 83

Construct DOM from scratch
  // find the DOMImplementation class:
  //   org.apache.xerces.dom.DOMImplementationImpl
  DOMImplementation dm = new DOMImplementationImpl();
  // or dm = DOMImplementationImpl.getDOMImplementation();
  Document doc = dm.createDocument(…);
  Element e = doc.createElement(…);
  Attr attr = doc.createAttributeNS(…);
  Text txt = doc.createTextNode("…");
Transparency No. 84

JAXP (Java API for XML Processing)
Sun's Java API for XML Processing
three modules:
  for DOM processing
  for SAX processing
  for transformation
5 packages
1. javax.xml.parsers
   Provides classes allowing the processing of XML documents.
   Two types of pluggable parsers are supported:
     SAX (Simple API for XML)
     DOM (Document Object Model)
2. javax.xml.transform ( + … )
   APIs for processing transformation instructions, and performing a transformation from source to result.
Transparency No. 85

JAXP's DOM pluggability mechanism
Transparency No. 86

JAXP API for DOM
javax.xml.parsers.DocumentBuilder
  Using this class, an application programmer can obtain a Document from XML.
javax.xml.parsers.DocumentBuilderFactory
  A factory class for obtaining a DocumentBuilder.
  It is an abstract class.
  A concrete subclass can be obtained by the static method DocumentBuilderFactory.newInstance().
  The desired capabilities of the parser can be specified by setting the various properties of the obtained factory instance.
Transparency No. 87

Example code snippet
import java.io.IOException;
import javax.xml.parsers.*;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;

DocumentBuilder builder;
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
factory.setNamespaceAware(true);
factory.setValidating(true);
String location = "http://myserver/mycontent.xml";
try {
  builder = factory.newDocumentBuilder();
  Document doc1 = builder.parse(location);   // parsed from XML
  Document doc2 = builder.newDocument();     // empty document
} catch (SAXException se) {
  // handle error
} catch (IOException ioe) {
  // handle error
} catch (ParserConfigurationException pce) {
  // handle error
}
Transparency No. 88

javax.xml.parsers.DocumentBuilder
abstract DOMImplementation getDOMImplementation()
  Obtain an instance of a DOMImplementation object.
abstract Document newDocument()
  Obtain a new instance of a DOM Document object to build a DOM tree with.
abstract boolean isNamespaceAware()
  Indicates whether or not this parser is configured to understand namespaces.
abstract boolean isValidating()
  Indicates whether or not this parser is configured to validate XML documents.
Document parse(File | InputSource | InputStream [, systemId] | uriString)
  Parse the content of the given source as an XML document and return a new DOM Document object.
abstract void setEntityResolver(EntityResolver er)
  Specify the EntityResolver to be used to resolve entities present in the XML document to be parsed.
abstract void setErrorHandler(ErrorHandler eh)
  Specify the ErrorHandler to be used to report errors present in the XML document to be parsed.
Transparency No. 89

javax.xml.parsers.DocumentBuilderFactory
Object getAttribute(String name)
void setAttribute(String name, Object value)
  Allows users to set/get specific attributes on the underlying implementation.
boolean isIgnoringComments(), setIgnoringComments(boolean)
  Indicates whether or not the factory is configured to produce parsers which ignore comments.
Other properties:
  IgnoringElementContentWhitespace;
  ExpandEntityReferences;
  Coalescing;   // merge adjacent text and CDATA sections into a single text node
  NamespaceAware;
  Validating;
abstract DocumentBuilder newDocumentBuilder()
  Creates a new instance of a DocumentBuilder using the currently configured parameters.
static DocumentBuilderFactory newInstance()
  Obtain a new instance of a DocumentBuilderFactory.
Transparency No. 90

How DocumentBuilderFactory finds its instance
The following places are tried in order:
  Use the javax.xml.parsers.DocumentBuilderFactory system property.
  Use the same property in the file "%JAVA_HOME%/lib/jaxp.properties" in the JRE directory.
  Look for the class name in the file META-INF/services/javax.xml.parsers.DocumentBuilderFactory in jars available to the runtime.
  Use the platform default DocumentBuilderFactory instance, which is "com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderFactoryImpl" for JDK 1.5 and J2SE 6.
Transparency No. 91

DOM Level 3 Load and Save Specification
provides an API for loading/saving DOM objects
specification; JavaDoc API
Main interfaces
  DOMImplementationLS
  LSInput : where to get XML text
  LSOutput : where to put XML text
  LSParser : XML text to DOM
  LSSerializer : DOM to XML text
  LSLoadEvent : { getInput(); getNewDocument(); }
  LSParserFilter
  LSProgressEvent : { getInput(); getPosition(); getTotalSize() }
  LSResourceResolver : like EntityResolver in SAX
  LSSerializerFilter
Transparency No. 92

DOMImplementationLS
Main factory interface for instances of the other LS classes
methods:
  createLSInput() : LSInput     // create an empty LSInput
  createLSOutput() : LSOutput   // create an empty LSOutput
  createLSParser(short mode, String schemaType) : LSParser
    mode: MODE_SYNCHRONOUS or MODE_ASYNCHRONOUS
    schemaType:
      DTD         "http://www.w3.org/TR/REC-xml"
      XML Schema  "http://www.w3.org/2001/XMLSchema"
  createLSSerializer() : LSSerializer
Transparency No.
93

How to get a DOMImplementationLS implementation?
get a DOMImplementation:
  DOMImplementation impl = DocumentBuilderFactory.newInstance()
      .newDocumentBuilder().getDOMImplementation();
  /* other implementation dependent code possible. e.g.,
     org.apache.xerces.dom.DOMImplementationImpl
     impl = DOMImplementationImpl.getDOMImplementation(); */
cast impl to DOMImplementationLS if it supports LS:
  DOMImplementationLS ls = null;
  if (impl.hasFeature("LS", "3.0")) {
    ls = (DOMImplementationLS) impl;
  } else {
    System.out.println("LS 3.0 not supported!");
    System.exit(1);
  }
Transparency No. 94

LSOutput
represents an output destination for data.
may wrap one or more of:
  a character stream
  a byte stream (+ character encoding)
  a system id
LSOutput attributes:
  characterStream : Writer
  byteStream : OutputStream
  systemId : String
  encoding : String   // see http://www.w3.org/TR/2004/REC-xml-20040204/#charencoding for its format
code snippet:
  LSOutput lsout = ls.createLSOutput();
  lsout.setCharacterStream(new FileWriter("file1.xml"));
  lsout.setEncoding("UTF-8");
  lsout.setSystemId("file:file2.xml");
Transparency No. 95

LSInput
represents an input source for data.
the actual input is looked up in this order:
  characterStream, byteStream, stringData, systemId, publicId
LSInput attributes:
  characterStream : Reader
  byteStream : InputStream
  stringData : String
  systemId : String
  publicId : String
  encoding : String
  baseURI : String
  certifiedText : boolean (can be converted to UTF-* form)
code snippet:
  LSInput lsin = ls.createLSInput();
  lsin.setCharacterStream(new FileReader("file1.xml"));
  lsin.setEncoding("UTF-8");
  lsin.setBaseURI("file:file1.xml");
  lsin.setSystemId("file:file2.xml");
Transparency No. 96

LSSerializer
provides an API for serializing (writing) a DOM document out into XML.
The XML data is written to a string or an output stream.
A filter can be used to determine which nodes should be serialized.
To control an LSSerializer, we should:
  cfg = serializer.getDomConfig();
  cfg.setParameter(…), …
  serializer.write(doc|node, …)
LSSerializer attributes and methods:
  domConfig : DOMConfiguration (read-only)
  newLine : String
  filter : LSSerializerFilter
  write(Node, LSOutput) : boolean
  writeToString(Node) : String
  writeToURI(Node, String uri) : boolean
Transparency No. 97

org.w3c.dom.DOMConfiguration
represents the configuration of a document and maintains a table of recognized parameters.
affects the behavior of Document.normalizeDocument(), such as replacing the CDATASection nodes with Text nodes or specifying the schema to be used for validation.
also used in [DOM Level 3 Load and Save] by the LSParser and LSSerializer interfaces.
DOMConfiguration
  canSetParameter(String name, Object value) : boolean
  setParameter(String name, Object value) : void
  getParameter(String name) : Object
  getParameterNames() : DOMStringList
DOMStringList
  boolean contains(String str)
  int getLength()
  String item(int index)
Transparency No. 98

Configurable parameters of DOMConfiguration
parameter                                   possible value (Type)   meaning
canonical-form                              true/false              do XML canonicalization
discard-default-content                     true/false              don't show default content in document
format-pretty-print                         true/false              exact transformation not specified in this spec
ignore-unknown-character-denormalizations   true/false              warn or raise an error when an unknown character is encountered
normalize-characters                        true/false
xml-declaration                             true/false              show XML or text declaration
Transparency No.
99

Additional parameters
parameter                    possible value (Type)                meaning
cdata-sections               true/false                           preserve CDATA sections
comments                     true/false                           preserve comments
element-content-whitespace   true/false                           keep ignorable whitespace
entities                     true/false                           keep EntityReference nodes in the document
error-handler                DOMErrorHandler                      error handler
well-formed                  true/false                           check well-formedness w.r.t. the XML version
validate                     true/false                           validate the document during parsing or normalization
validate-if-schema           true/false                           validate if a schema (DTD or …) for the doc element can be found
schema-type                  String (absolute URI)                schema language used (DTD, XML Schema, …)
schema-location              String (whitespace-separated URIs)   location of schemas
Transparency No. 100

LSParser
An interface to an object that is able to build, or augment, a DOM tree from various input sources.
methods:
  getDomConfig()
  get/setFilter(LSParserFilter)
  parse(LSInput), parseURI(String uri) : Document
  parseWithContext(LSInput input, Node contextArg, short action)
    Parse an XML fragment from an LSInput and insert the content at the position specified by context/action.
    possible actions: ACTION_APPEND_AS_CHILDREN, ACTION_REPLACE_CHILDREN, ACTION_INSERT_BEFORE, ACTION_INSERT_AFTER, ACTION_REPLACE
Transparency No. 101

LSParser
  abort()                 // abort parsing
  getBusy() : boolean     // check if the parser is busy parsing
  getAsync() : boolean    // check if this is an asynchronous parser
An asynchronous parser should also implement events.EventTarget.
org.w3c.dom.events.*
  EventTarget:
    add/removeEventListener(String type, EventListener listener, boolean useCapture)
    dispatchEvent(org.w3c.dom.events.Event)
  EventListener:
    handleEvent(Event) : void
  LSLoadEvent extends Event : … getType(), getTarget(), …
Transparency No. 102

LSParserFilter
methods:
short acceptNode(Node nodeArg)
  This method will be called by the parser at the completion of the parsing of each node.
  return values: FILTER_ACCEPT, FILTER_REJECT (drop the subtree), FILTER_SKIP (drop the node only), FILTER_INTERRUPT (accept & terminate)
int getWhatToShow()
  Tells the LSParser what types of nodes to show to the method LSParserFilter.acceptNode.
  The constants used are defined in [DOM Level 2 Traversal and Range: NodeFilter].
  Nodes that are not shown to the filter are built automatically into the resulting document.
short startElement(Element elementArg)
  The parser calls this method after each Element start tag has been scanned, but before the remainder of the Element is processed.
  purpose: allow an element to be skipped quickly.
  return value: same as acceptNode().
Transparency No. 103
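
The Load and Save slides above can be tied together in one small program. The following is only a sketch, assuming the DOM implementation bundled with the JDK supports the "LS" 3.0 feature; the class name LSFilterDemo, the filter DropDraftFilter, the helper filterAndSerialize and the element name "draft" are hypothetical names chosen for illustration. It obtains a DOMImplementationLS through JAXP, parses a string with an LSParser whose LSParserFilter rejects every draft element in startElement(), and writes the remaining tree with an LSSerializer.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.DOMConfiguration;
import org.w3c.dom.DOMImplementation;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSInput;
import org.w3c.dom.ls.LSParser;
import org.w3c.dom.ls.LSParserFilter;
import org.w3c.dom.ls.LSSerializer;
import org.w3c.dom.traversal.NodeFilter;

public class LSFilterDemo {

    /** Hypothetical filter: rejects each "draft" element together with its subtree. */
    static class DropDraftFilter implements LSParserFilter {
        @Override
        public short startElement(Element element) {
            // Called right after the start tag is scanned, so the subtree can be skipped early.
            return "draft".equals(element.getTagName()) ? FILTER_REJECT : FILTER_ACCEPT;
        }

        @Override
        public short acceptNode(Node node) {
            // Every completed node that is shown to the filter is kept.
            return FILTER_ACCEPT;
        }

        @Override
        public int getWhatToShow() {
            // Only element nodes are passed to acceptNode().
            return NodeFilter.SHOW_ELEMENT;
        }
    }

    /** Parses xml with the filter installed and returns the serialized result. */
    static String filterAndSerialize(String xml) {
        DOMImplementation impl;
        try {
            impl = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().getDOMImplementation();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
        // Assumption: the platform implementation supports Load and Save.
        if (!impl.hasFeature("LS", "3.0")) {
            throw new IllegalStateException("LS 3.0 not supported");
        }
        DOMImplementationLS ls = (DOMImplementationLS) impl;

        // LSInput: where the XML text comes from.
        LSInput input = ls.createLSInput();
        input.setCharacterStream(new StringReader(xml));

        // LSParser: XML text to DOM, with the filter applied while the tree is built.
        LSParser parser = ls.createLSParser(DOMImplementationLS.MODE_SYNCHRONOUS, null);
        parser.setFilter(new DropDraftFilter());
        Document doc = parser.parse(input);

        // LSSerializer: DOM to XML text, configured through its DOMConfiguration.
        LSSerializer serializer = ls.createLSSerializer();
        DOMConfiguration cfg = serializer.getDomConfig();
        if (cfg.canSetParameter("xml-declaration", Boolean.FALSE)) {
            cfg.setParameter("xml-declaration", Boolean.FALSE);
        }
        return serializer.writeToString(doc);
    }

    public static void main(String[] args) {
        String xml = "<doc><keep>a</keep><draft><x>b</x></draft></doc>";
        System.out.println(filterAndSerialize(xml));
    }
}
```

Rejecting in startElement() rather than in acceptNode() lets the parser discard the subtree before it is built, which is the purpose the LSParserFilter slide gives for that method.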