XML
Written by Dr. Yaron Kanza, edited by Liron Blecher

Agenda
• What is XML
• Parsing XML
• DOM
• SAX
• XML Schema
• JAXB Binding

What is XML
• XML stands for eXtensible Markup Language
• It is a meta-language that describes the content of a document (self-describing data)
• If Java = portable programs, then XML = portable data
• XML does not specify the tag set or the grammar of the language:
  • Tag set – the markup tags that have meaning to a language processor
  • Grammar – the rules that define correct usage of a language's tags

What is XML
• An XML file has the following syntax rules:
  • All data is contained within tags
  • Tags are written as <xxxxx>, where xxxxx is the tag name
  • Each tag must be closed (using a </xxxxx> tag)
  • The data between the opening and closing tags is the value of the tag
  • Tags can be nested

What is XML
• If a tag does not contain any value, it can be opened and closed in one step: <xxxxx />
• An XML document must contain exactly one root tag
• Each tag can also have multiple attributes, defined in its opening tag, for example:
  <country name="Israel" capital="Jerusalem" />
  • This tag has two attributes: name and capital
• Each attribute value must be enclosed in quotation marks (single or double)

Parsing
• Parsing means reading some input and analyzing it according to grammar rules
• In regular text, the grammar rules are things like line endings, word spacing, etc.
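The syntax rules listed above are exactly what an XML parser checks before anything else: a document that obeys them is called well-formed, and a conforming parser rejects one that does not. A minimal sketch, assuming the JDK's built-in parser (javax.xml.parsers, the DOM API covered later); the class WellFormedCheck and the method isWellFormed are hypothetical names chosen for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

public class WellFormedCheck {

    // Hypothetical helper: true if the text obeys the XML syntax rules (is well-formed)
    public static boolean isWellFormed(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // Rethrow fatal errors instead of letting the default handler print them to stderr
            builder.setErrorHandler(new ErrorHandler() {
                @Override public void warning(SAXParseException e) { }
                @Override public void error(SAXParseException e) { }
                @Override public void fatalError(SAXParseException e) throws SAXException { throw e; }
            });
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // a syntax rule was broken
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String[] samples = {
            "<country name=\"Israel\" capital=\"Jerusalem\" />", // empty tag, quoted attributes
            "<country><name>Israel</name></country>",            // properly nested and closed
            "<country><name>Israel</country>",                   // <name> is never closed
            "<country name=Israel />",                           // attribute value not quoted
            "<a/><b/>"                                           // two root tags
        };
        for (String s : samples) {
            System.out.println(isWellFormed(s) + "\t" + s);
        }
    }
}
```

The first two samples follow every rule; each of the last three breaks exactly one (unclosed tag, unquoted attribute value, more than one root).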
[Figure: Input → Parser (driven by a formal grammar) → Analyzed data: the structure(s) of the input, according to the atomic elements and their relationships as described in the grammar]

Parsing XML
• There are 2+1 methods for parsing XML files:
  • DOM – Document Object Model
  • SAX – Simple API for XML
  • JAXB Binding – typically used when you have a definition of the XML structure (usually a schema file with the .xsd extension)

DOM
• The parser creates a tree object out of the document
• The user accesses the data by traversing the tree
• The tree and its traversal conform to a W3C standard
• The API allows for constructing, accessing and manipulating the structure and content of XML documents

DOM – Example
[Figure: XML file → DOM parser → DOM tree in memory, which the application accesses through the API]

DOM – Example
    <?xml version="1.0"?>
    <countries>
      <country continent="Asia">
        <name>Israel</name>
        <population year="2001">6,199,008</population>
        <city capital="yes"><name>Jerusalem</name></city>
        <city capital="no"><name>Ashdod</name></city>
      </country>
      <country continent="Europe">
        <name>France</name>
        <population year="2004">60,424,213</population>
      </country>
    </countries>

DOM – Example (The DOM Tree)
[Figure: the DOM tree of the document above: a Document node whose root element is countries; each country element carries a continent attribute and has name, population (with a year attribute) and city (with a capital attribute) children, with the text values (Israel, 6,199,008, Jerusalem, Ashdod, France, 60,424,213) as leaves]

DOM – Example (Creating the tree)
• A DOM tree is generated by a DocumentBuilder
• The builder is generated by a factory, in order to be implementation independent
• The factory is chosen according to the system configuration
    import javax.xml.parsers.DocumentBuilder;
    import javax.xml.parsers.DocumentBuilderFactory;
    import org.w3c.dom.Document;
    DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
    DocumentBuilder builder = factory.newDocumentBuilder(); // throws ParserConfigurationException
    Document doc = builder.parse("world.xml");              // throws SAXException, IOException

DOM – Example (configuring the factory)
• The methods of the document-builder factory let you configure how the document is built
• You can also add the schema file to the
  factory for additional validation of the XML structure (factory.setSchema(...))
• For example:
  • factory.setValidating(true) – validate the document against its DTD while parsing
  • factory.setIgnoringComments(false) – keep comment nodes in the tree

DOM – Example (the Node interface)
The nodes of the DOM tree include:
• A special root (denoted document)
  • The Document interface returned by builder.parse(…) extends the Node interface
• Element nodes
• Text nodes
• Attributes
• Comments
• and more ...
Every node in the DOM tree implements the Node interface.

DOM – Example (interfaces in the DOM tree)
    Node
      Attr
      CharacterData
        Comment
        Text
          CDATASection
      Document
      DocumentFragment
      DocumentType
      Element
      Entity
      EntityReference
      Notation
      ProcessingInstruction
    NodeList (a list of nodes, not itself a Node)
    NamedNodeMap (nodes accessed by name, e.g. attributes; not itself a Node)
Figure as appears in: “The XML Companion” - Neil Bradley

DOM – Example (interfaces in the DOM tree)
[Figure: a sample DOM tree containing Document, DocumentType, Element, Attribute, Text, Comment and EntityReference nodes]

DOM – Example (Node Navigation)
Every node has a specific location in the tree. The Node interface specifies methods for tree navigation:
• Node getFirstChild();
• Node getLastChild();
• Node getNextSibling();
• Node getPreviousSibling();
• Node getParentNode();
• NodeList getChildNodes();
• NamedNodeMap getAttributes();

DOM – Example (Node Navigation)
[Figure: the navigation methods relative to a node: getParentNode(), getFirstChild(), getLastChild(), getChildNodes(), getPreviousSibling(), getNextSibling()]

DOM – Example (Node Properties)
Every node has:
• a type
• a name
• a value
• attributes
The roles of these properties differ according to the node type. Nodes of different types implement different interfaces (that extend Node).

DOM – Example (Node Type)
• ELEMENT_NODE = 1
• ATTRIBUTE_NODE = 2
• TEXT_NODE = 3
• CDATA_SECTION_NODE = 4
• ENTITY_REFERENCE_NODE = 5
• ENTITY_NODE = 6
• PROCESSING_INSTRUCTION_NODE = 7
• COMMENT_NODE = 8
• DOCUMENT_NODE = 9
• DOCUMENT_TYPE_NODE = 10
• DOCUMENT_FRAGMENT_NODE = 11
• NOTATION_NODE = 12
    if (myNode.getNodeType() == Node.ELEMENT_NODE) {
        // process the element node ...
    }

DOM – Node Manipulation
Children of a node in a
DOM tree can be manipulated: added, edited, deleted, moved, copied, etc.
• To construct new nodes, use the methods of Document:
  • createElement, createAttribute, createTextNode, etc.
• To manipulate a node, use the methods of Node:
  • appendChild, insertBefore, removeChild, replaceChild, setNodeValue, cloneNode(boolean deep), etc.

DOM – Example (Node Manipulation)
[Figure: replaceChild swaps an old child node for a new one; cloneNode with deep = false copies only the node itself (for an element, including its attributes), while deep = true also copies all of its descendants]
Figure as appears in “The XML Companion” - Neil Bradley

DEMO: examples.xml

SAX
• The XML is read sequentially
• When a parsing event happens (e.g., an opening tag, text, or a closing tag is read), the parser invokes the corresponding method of the corresponding handler
• The handlers are the programmer's implementation of a standard Java API (i.e., interfaces and classes)
• We won't go deeper into this type of parser: it is more complicated to program against, and it is not needed for small XML files. Its main advantage is that it does not build the whole tree in memory.

JAXB Binding
• Marshaling – converting Java objects into XML
• Un-Marshaling – converting XML into Java objects

DEMO: JAXB binding
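The DOM slides above (creating the tree, navigating with getFirstChild()/getNextSibling(), testing getNodeType(), and manipulating nodes) can be combined into one short sketch. It is an illustration only, not the code from the examples.xml demo: the countries document is inlined as a string instead of being read from world.xml, and the class DomCountries, its helper methods, and the <note> element it adds are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class DomCountries {

    // The countries document from the slides, inlined so no world.xml file is needed
    static final String XML = "<?xml version=\"1.0\"?><countries>"
        + "<country continent=\"Asia\"><name>Israel</name>"
        + "<population year=\"2001\">6,199,008</population>"
        + "<city capital=\"yes\"><name>Jerusalem</name></city>"
        + "<city capital=\"no\"><name>Ashdod</name></city></country>"
        + "<country continent=\"Europe\"><name>France</name>"
        + "<population year=\"2004\">60,424,213</population></country>"
        + "</countries>";

    // Factory -> builder -> Document, as on the "Creating the tree" slide
    static Document parse(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            return builder.parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
    }

    // Depth-first walk using getFirstChild()/getNextSibling() and getNodeType()
    static void render(Node node, String indent, StringBuilder out) {
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                out.append(indent).append(child.getNodeName());
                NamedNodeMap attrs = child.getAttributes();
                for (int i = 0; i < attrs.getLength(); i++) {
                    Node attr = attrs.item(i);
                    out.append(' ').append(attr.getNodeName()).append('=').append(attr.getNodeValue());
                }
                out.append('\n');
                render(child, indent + "  ", out);
            } else if (child.getNodeType() == Node.TEXT_NODE) {
                String text = child.getNodeValue().trim();
                if (!text.isEmpty()) {
                    out.append(indent).append('"').append(text).append("\"\n");
                }
            }
            // Other node types (comments, processing instructions, ...) are skipped
        }
    }

    // Node manipulation: create + appendChild, shallow vs. deep cloneNode, replaceChild
    static Node manipulate(Document doc) {
        Element israel = (Element) doc.getElementsByTagName("country").item(0);

        // New nodes are created by the Document, then attached with appendChild
        Element note = doc.createElement("note"); // hypothetical element, illustration only
        note.appendChild(doc.createTextNode("added with createElement"));
        israel.appendChild(note);

        Element population = (Element) israel.getElementsByTagName("population").item(0);
        Node shallow = population.cloneNode(false); // element + attributes, no children
        Node deep = population.cloneNode(true);     // element + attributes + text child

        // Swap the original <population> for its shallow (text-less) copy
        israel.replaceChild(shallow, population);
        return deep;
    }

    public static void main(String[] args) {
        Document doc = parse(XML);
        Node deep = manipulate(doc);
        StringBuilder out = new StringBuilder();
        render(doc, "", out);
        System.out.print(out);
        System.out.println("deep clone text: " + deep.getTextContent());
    }
}
```

Replacing the population element with its shallow clone is deliberately artificial: it makes the difference between deep = false and deep = true visible in the printed tree.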
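For comparison with DOM, a minimal SAX sketch over a trimmed-down version of the same countries document. It assumes the JDK's javax.xml.parsers.SAXParserFactory and org.xml.sax.helpers.DefaultHandler; the class SaxCities and its capital-city logic are hypothetical. Because SAX builds no tree, the handler has to remember for itself which element it is currently inside.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class SaxCities {

    // Handler: the parser calls these methods as it reads the document sequentially
    static class CityHandler extends DefaultHandler {
        final List<String> capitals = new ArrayList<>();
        private boolean inCapitalCity = false; // inside <city capital="yes">?
        private boolean inName = false;        // inside a <name> element?
        private final StringBuilder text = new StringBuilder();

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            if (qName.equals("city")) {
                inCapitalCity = "yes".equals(atts.getValue("capital"));
            } else if (qName.equals("name")) {
                inName = true;
                text.setLength(0);
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            // May be called several times for one text node, so accumulate
            if (inName) {
                text.append(ch, start, length);
            }
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if (qName.equals("name")) {
                inName = false;
                if (inCapitalCity) {
                    capitals.add(text.toString().trim());
                }
            } else if (qName.equals("city")) {
                inCapitalCity = false;
            }
        }
    }

    // Hypothetical helper: names of all cities marked capital="yes"
    static List<String> capitals(String xml) {
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            CityHandler handler = new CityHandler();
            parser.parse(new InputSource(new StringReader(xml)), handler);
            return handler.capitals;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<countries><country continent=\"Asia\"><name>Israel</name>"
            + "<city capital=\"yes\"><name>Jerusalem</name></city>"
            + "<city capital=\"no\"><name>Ashdod</name></city></country></countries>";
        System.out.println(capitals(xml));
    }
}
```

Here main prints [Jerusalem]: the country's own <name>Israel</name> is ignored because it is not inside a capital city.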