XML Technologies
XML
Dr Alexiei Dingli

What is XML?
• XML stands for eXtensible Markup Language
• XML is a markup language much like HTML
• XML was designed to carry data, not to display data
• XML tags are not predefined: you must define your own tags
• XML is designed to be self-descriptive
• XML is a W3C Recommendation

XML vs. HTML
• XML
  – Is a metalanguage
  – Focuses on transporting and storing data
  – Is not a replacement for HTML!
• HTML
  – Is a vocabulary (application) of SGML
  – Focuses on display (formatting)

XML is just pure information
• XML was not designed to do anything ...
• ... it just structures, stores and transports information

    <stickynote>
      <to>Joseph</to>
      <from>Tom</from>
      <body>Purchase tickets!</body>
    </stickynote>

Format
• A .xml document is plain text
• Only XML-aware applications can interpret its structure correctly
• But anyone can easily view or edit it with a simple text editor
Tip: Internet Explorer can be used as a viewer and validator of XML (Eg1, Eg2)

Let’s be creative ...
• The tags in the example above (such as <to> and <from>) are not defined in any XML standard. They were "invented" by the author of the XML document
• That is because XML has no predefined tags: it’s a metalanguage!
• The tags used in HTML (and the structure of HTML) are predefined. HTML documents can only use tags defined in the HTML standard (such as <p>, <h1>, etc.)
• XML allows authors to define their own tags and their own document structure

Definition
XML is a software- and hardware-independent tool for carrying information.
Note: the specification of the language can be found at http://www.w3.org/XML/

Content vs. Layout
• Displaying dynamic data in an HTML document takes a lot of work if the HTML has to be edited each time the data changes
• With XML, the data can be stored in separate XML files
• Authors can then concentrate on using HTML for layout and display, knowing that changes in the underlying data will not require any changes to the HTML
• With a few lines of JavaScript, a page can read an external XML file and update the data content of the HTML

Simple data sharing
• Most systems store data in incompatible formats
• XML is stored as plain text, so it is software- and hardware-independent
• This makes it much easier to share information between systems

XML is a metalanguage!
• As such, you can use it to create new languages ...
  – XHTML, a reformulation of HTML in XML
  – WSDL for describing available web services
  – WML (part of the WAP suite), a markup language for handheld devices
  – RSS for news feeds
  – RDF and OWL for describing resources and ontologies
  – SMIL for describing multimedia for the web

XML Tree (1)
• Every XML document forms a tree that starts at a single root element

    <root>
      <child>
        <subchild>.....</subchild>
      </child>
    </root>

XML Tree (2)
[Figure: XML tree diagram]

XML Tree (3)
• Simple example ...

    <stickynote>
      <to>Joseph</to>
      <from>Tom</from>
      <body>Purchase tickets!</body>
    </stickynote>

XML Tree (4)
• Root element: <stickynote>
• Child elements: <to>, <from>, <body>

XML Tree (5)
[Figure: XML tree diagram]

34U exercise
• Create the tree for ...

    <bookstore>
      <book category="COOKING">
        <title lang="en">Everyday Italian</title>
        <author>Giada De Laurentiis</author>
        <year>2005</year>
        <price>30.00</price>
      </book>
      <book category="CHILDREN">
        <title lang="en">Harry Potter</title>
        <author>J K. Rowling</author>
        <year>2005</year>
        <price>29.99</price>
      </book>
    </bookstore>

XML Commandments

Commandment 1
For every opening tag, there must be a closing tag (an empty element may instead use a self-closing tag such as <br/>)
    Incorrect: <p>This is a paragraph
    Correct:   <p>This is a paragraph</p>

Commandment 2
XML tags are case sensitive
    Incorrect: <Message>This is incorrect</message>
    Correct:   <message>This is correct</message>

Commandment 3
XML elements must be properly nested
    Incorrect: <b><i>This text is bold and italic</b></i>
    Correct:   <b><i>This text is bold and italic</i></b>

Commandment 4
XML documents must have a single root element

    <root>
      <child>
        <subchild>.....</subchild>
      </child>
    </root>

Commandment 5
XML attribute values must be quoted
    Incorrect: <stickynote date=1/10/2008>
    Correct:   <stickynote date="12/11/2007">

Commandment 6
Some characters have a special meaning in XML and can be written as predefined entity references:

    Entity    Symbol   Meaning
    &lt;      <        less than
    &gt;      >        greater than
    &amp;     &        ampersand
    &apos;    '        apostrophe
    &quot;    "        quotation mark

Only < and & are strictly illegal in text content. Escaping the others is always allowed, and is needed where they would be ambiguous, for example a quotation mark inside an attribute value delimited by the same kind of quote. These two elements are equivalent:
    <message>Meet me at Tom's place</message>
    <message>Meet me at Tom&apos;s place</message>

Commandment 7
Comments in XML
    <!-- This is a comment -->

Elements vs. Attributes

    <book category="CHILDREN">
      <title>Harry Potter</title>
    </book>

• book is an element, which can contain
  – other elements (such as title)
  – or text content (such as Harry Potter in title)
• category is an attribute
  – whose value is CHILDREN

What’s in a name?
• Naming rules ...
  – Names can contain letters, digits, hyphens, underscores and periods
  – Names must not start with a digit or a punctuation character (a leading underscore is allowed)
  – Names must not start with the letters xml (or XML, Xml, etc.), which are reserved
  – Names cannot contain spaces

Best (Name) Practices
• Make names descriptive. Names with an underscore separator are nice: <first_name>, <last_name>.
• Names should be short and simple, like <book_title>, not like <the_title_of_the_book_which_i_am_currently_reading>.
• Avoid "-" characters. If you name something "first-name", some software may think you want to subtract "name" from "first".
• Avoid "." characters. If you name something "first.name", some software may think that "name" is a property of the object "first".
• Avoid ":" characters. Colons are reserved for namespaces.
• XML documents often have a corresponding database. A good practice is to use the naming rules of your database for the elements in the XML documents.
• Non-English letters like éòá are perfectly legal in XML, but watch out for problems if your software vendor doesn't support them.

More about attributes ...
• Generally used to provide additional information (metadata) that is not part of the data itself
• Attribute values must be quoted. For a quote within a quote, either delimit the value with the other kind of quote or use &quot;
• Some limitations of attributes
  – attributes cannot contain multiple values
  – attributes cannot contain tree structures
  – attributes are not easily expandable for future changes
• If in doubt, use elements

Spot the difference ...

    <note date="10/01/2008">
      <to>Tove</to>
      <from>Jani</from>
      <heading>Reminder</heading>
      <body>Don't forget me this weekend!</body>
    </note>

    <note>
      <date>10/01/2008</date>
      <to>Tove</to>
      <from>Jani</from>
      <heading>Reminder</heading>
      <body>Don't forget me this weekend!</body>
    </note>

    <note>
      <date>
        <day>10</day>
        <month>01</month>
        <year>2008</year>
      </date>
      <to>Tove</to>
      <from>Jani</from>
      <heading>Reminder</heading>
      <body>Don't forget me this weekend!</body>
    </note>

Well-formed documents ...
1. XML documents must have a root element
2. XML elements must have a closing tag
3. XML tags are case sensitive
4. XML elements must be properly nested
5. XML attribute values must be quoted

Valid documents ...
• A valid document is a "well-formed" XML document that also conforms to the rules of a Document Type Definition (DTD)

    <!DOCTYPE note SYSTEM "Note.dtd">
    <note> … </note>

Example DTD
• A DTD defines the structure of an XML document, but the DTD itself is not written in XML!
    <!DOCTYPE note [
      <!ELEMENT note (to,from,heading,body)>
      <!ELEMENT to (#PCDATA)>
      <!ELEMENT from (#PCDATA)>
      <!ELEMENT heading (#PCDATA)>
      <!ELEMENT body (#PCDATA)>
    ]>

Example XML Schema
• XML Schema (XSD) is an XML-based alternative to a DTD

    <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
      <xs:element name="note">
        <xs:complexType>
          <xs:sequence>
            <xs:element name="to" type="xs:string"/>
            <xs:element name="from" type="xs:string"/>
            <xs:element name="heading" type="xs:string"/>
            <xs:element name="body" type="xs:string"/>
          </xs:sequence>
        </xs:complexType>
      </xs:element>
    </xs:schema>

Errors!
• Errors in XML documents will stop your XML applications: a parser must report a well-formedness error rather than guess at a repair
• This strictness keeps XML software small, fast and compatible
• HTML browsers will display documents with errors (like missing end tags)
• HTML browsers are big and inconsistent with one another because they contain a lot of extra code to deal with (and display) HTML errors

XML Viewing
• Just open the file in a normal browser ...
  – simple.xml
  – cd_catalog.xml
  – plant_catalog.xml

Better XML Viewing
• Style the XML with a Cascading Style Sheet (CSS)
  – Without CSS
  – With CSS

The CSS

    CATALOG {
      background-color: #ffffff;
      width: 100%;
    }
    CD {
      display: block;
      margin-bottom: 30pt;
      margin-left: 0;
    }
    TITLE {
      color: #FF0000;
      font-size: 20pt;
    }
    ARTIST {
      color: #0000FF;
      font-size: 20pt;
    }
    COUNTRY, PRICE, YEAR, COMPANY {
      display: block;
      color: #000000;
      margin-left: 20pt;
    }

Even better XML viewing
• Use XSLT
  – XSLT is the recommended style sheet language for XML
  – XSLT (eXtensible Stylesheet Language Transformations) is far more sophisticated than CSS
  – One way to use XSLT is to transform XML into HTML before it is displayed

XSL Example
• The XML
• The XSL
• The result

Exercise
• Amazon has just commissioned you to create an XML file for the following book:
  – Title: A.I. a modern approach
  – Author: Russell and Norvig
  – Publisher: Prentice Hall
  – Date of publication: 2000
  – ISBN: 1234567
  – Dimensions: 10 x 5
  – Number of pages: 500
  – Comments: 2 in store 1, 3 in store 2
  – Review: Quite interesting!
  – Image: http://www.amazon.com/AIBook
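Any XML parser can be used to try out the well-formedness rules and the "Errors!" slide. The following is a minimal sketch under stated assumptions. It needs Python 3.9+ and uses the standard-library modules xml.etree.ElementTree and xml.sax.saxutils. The helper names is_well_formed and tree_lines are hypothetical, written for illustration only, and are not part of any library.

The sketch does three things:
• It feeds incorrect and correct snippets modelled on the commandments to the parser.
• It prints the "Spot the difference" note as an indented tree.
• It uses escape() to apply Commandment 6.

ElementTree only checks that a document is well formed. Validating against a DTD or XML Schema needs a validating parser, for example the third-party lxml library, which is not shown here.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Incorrect/correct snippets modelled on the "XML Commandments" slides,
# each mapped to whether a conforming parser should accept it.
SAMPLES = {
    "<p>This is a paragraph": False,                    # 1: no closing tag
    "<p>This is a paragraph</p>": True,
    "<Message>This is incorrect</message>": False,      # 2: case mismatch
    "<message>This is correct</message>": True,
    "<b><i>Bold and italic</b></i>": False,             # 3: bad nesting
    "<b><i>Bold and italic</i></b>": True,
    "<to>Joseph</to><from>Tom</from>": False,           # 4: no single root
    "<stickynote date=1/10/2008></stickynote>": False,  # 5: unquoted value
    '<stickynote date="12/11/2007"></stickynote>': True,
    "<message>Fish & chips</message>": False,           # 6: bare ampersand
}

# The "Spot the difference" note, with the date stored as an attribute.
NOTE = """<note date="10/01/2008">
  <to>Tove</to>
  <from>Jani</from>
  <heading>Reminder</heading>
  <body>Don't forget me this weekend!</body>
</note>"""


def is_well_formed(text: str) -> bool:
    """Return True if ElementTree parses `text`, False on a ParseError."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


def tree_lines(elem: ET.Element, depth: int = 0) -> list[str]:
    """Walk the element tree depth-first, one indented line per element."""
    attrs = [f'{name}="{value}"' for name, value in elem.attrib.items()]
    line = "  " * depth + " ".join([elem.tag] + attrs)
    text = (elem.text or "").strip()  # ignore whitespace-only text
    if text:
        line += ": " + text
    lines = [line]
    for child in elem:
        lines.extend(tree_lines(child, depth + 1))
    return lines


if __name__ == "__main__":
    for snippet, expected in SAMPLES.items():
        assert is_well_formed(snippet) == expected, snippet
    print("All commandment samples behave as expected")

    for line in tree_lines(ET.fromstring(NOTE)):
        print(line)

    # Commandment 6: escape() replaces & < > with &amp; &lt; &gt;
    fixed = f"<message>{escape('Fish & chips')}</message>"
    print(fixed, is_well_formed(fixed))
```

Running the script prints a confirmation line, then the note as a tree, and finally the escaped <message> element together with True. In the tree, the date appears on the root line because it is an attribute, while to, from, heading and body appear as child elements. Keeping the snippets in one table makes it easy to try further broken examples.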