DT228/3 Web Development
Introduction to XML

Definition
XML is a cross-platform, software- and hardware-independent tool for transmitting information.
“XML is going to be everywhere” - W3C

Introduction
• XML is an important technology used throughout web applications
• XML stands for EXtensible Markup Language
• XML is a markup language much like HTML
• XML was designed to describe data
• XML tags are not predefined in XML. You must define your own tags
• XML uses a Document Type Definition (DTD) or an XML Schema to describe the data
• XML documents are human readable
• XML file names end with .xml, e.g. note.xml

The Web was created to publish information for people
  – “Eyes-only” was the dominant design perspective
  – Hard to search
  – Hard to automate processing

The Web is using XML to become a platform for information exchange between computers (and people)
  – Overcomes HTML’s inherent limitations
  – Enables the new business models of the network economy

eXtensible Markup Language
• Instead of a fixed set of format-oriented tags like HTML, XML allows you to create whatever set of tags is needed for your type of information.
• This makes any XML instance “self-describing” and easily understood by computers and people.
• XML-encoded information is smart enough to support new classes of Web and e-commerce applications.

Why XML?
Sample catalog entry in HTML:

    <TITLE> Laptop Computer </TITLE>
    <BODY>
    <UL>
    <LI> IBM Thinkpad 600E
    <LI> 400 MHz
    <LI> 64 Mb
    <LI> 8 Gb
    <LI> 4.1 pounds
    <LI> $3200
    </UL>
    </BODY>

• How can I parse the content? E.g. price?
• Need a more flexible mechanism than HTML to interpret content.
Sample catalog entry using XML:

    <COMPUTER TYPE="Laptop">
      <MANUFACTURER>IBM</MANUFACTURER>
      <LINE>ThinkPad</LINE>
      <MODEL>600E</MODEL>
      <SPECIFICATIONS>
        <SPEED UNIT="MHz">400</SPEED>
        <MEMORY UNIT="MB">64</MEMORY>
        <DISK UNIT="GB">8</DISK>
        <WEIGHT UNIT="POUND">4.1</WEIGHT>
        <PRICE CURRENCY="USD">3200</PRICE>
      </SPECIFICATIONS>
    </COMPUTER>

Smart processing using XML
• <COMPUTER> and <SPECIFICATIONS> provide logical containers for extracting and manipulating product information as a unit
  – Sort by <MANUFACTURER>, <SPEED>, <WEIGHT>, <PRICE>, etc.
• Explicit identification of each part enables its automated processing
  – Convert <PRICE> from “USD” to Euro, Yen, etc.

Document exchange
• Use of XML allows companies to exchange information that can be processed automatically without human intervention, e.g.
  – Purchase orders
  – Invoices
  – Catalogues, etc.

Difference between HTML and XML?
• XML was designed to carry data.
• XML is not a replacement for HTML.
• XML and HTML were designed with different goals:
  – XML was designed to describe data and to focus on what data is.
  – HTML was designed to display data and to focus on how data looks.
• HTML is about displaying information, while XML is about describing information.

XML: Describes Data
• XML was not designed to DO anything.
• XML was created to structure, store and send information, BUT it requires software to send, receive or display it
• The following example is a note to Angela from Jimmy, stored as XML:

    <note>
      <to>Angela</to>
      <from>Jimmy</from>
      <heading>Reminder</heading>
      <body>Get the paper!</body>
    </note>

The note has a header and a message body. It also has sender and receiver information. But still, this XML document does not DO anything. It is just pure information wrapped in XML tags. Someone must write a piece of software to send, receive or display it.

XML is free and extensible
• XML tags are not predefined. You must "invent" your own tags.
• The tags used to mark up HTML documents and the structure of HTML documents are predefined. The author of an HTML document can only use tags that are defined in the HTML standard (like <p>, <h1>, etc.).
• XML allows authors to define their own tags and their own document structure.
• The tags in the example above (like <to> and <from>) are not defined in any XML standard. These tags are "invented" by the author of the XML document.

XML Document Structure

    XML declaration
    <root>
      <child>
        <subchild>.....</subchild>
      </child>
    </root>

XML Document: Example

    <?xml version="1.0" encoding="ISO-8859-1"?>
    <note>
      <to>Angela</to>
      <from>Jimmy</from>
      <heading>Reminder</heading>
      <body>Get the paper!</body>
    </note>

• The first line in the document - the XML declaration - defines the XML version and the character encoding used in the document. In this case the document conforms to the 1.0 specification of XML and uses the ISO-8859-1 (Latin-1/West European) character set.
• <note> is the root element (what the document is); </note> is the end of the root element.
• <to>, <from>, <heading> and <body> are child elements.

XML Syntax rules
1. All elements must have a closing </…> tag*
e.g. in HTML it’s okay to write:
• <p> this is a paragraph
HTML allows some closing tags to be left out. In XML, you would have to write:
• <p> this is a paragraph </p>
* The XML declaration is the exception: it is not an element and has no closing tag.

XML Syntax rules
2. All element names are case sensitive
Unlike HTML, XML tags are case sensitive

    <letter> this is incorrect </Letter>
    <letter> this is correct </letter>

XML Syntax rules
3. All elements must be properly nested

    <b><i> this is incorrect in XML </b></i>     (tolerated by HTML browsers, but an error in XML)
    <b><i> this is correct in XML </i></b>

XML Syntax rules
4. All XML documents must have a root element
• All XML documents must contain a single tag pair to define a root element - all other elements must be within this root element.
• All elements can have sub elements (child elements).
• Sub elements must be correctly nested within their parent element:

    <root>
      <child>
        <subchild>.....</subchild>
      </child>
    </root>

XML Syntax rules
5. Attribute values must always be in quotation marks ""
• XML elements can have attributes in name/value pairs just like in HTML. In XML the attribute value must always be quoted. Study the two XML documents below. The first one is incorrect, the second is correct:

Incorrect

    <?xml version= etc
    <note date=12/11/2002>
      <to>Tove</to>
      <from>Jani</from>
    </note>

Correct

    <?xml version= etc
    <note date="12/11/2002">
      <to>Tove</to>
      <from>Jani</from>
    </note>

XML Syntax rules
6. Valid element naming
XML elements must follow these naming rules:
• Names can contain letters, numbers, and other characters
• Names must start with a letter or an underscore, not with a number or other punctuation character
• Names must not start with the letters xml (or XML or Xml ..)
• Names cannot contain spaces

Child/parent elements

    <book>
      <title>My First XML</title>
      <prod id="33-657" media="paper"></prod>
      <chapter>Introduction to XML
        <para>What is HTML</para>
        <para>What is XML</para>
      </chapter>
      <chapter>XML Syntax
        <para>Elements must have tag</para>
        <para>Elements must be nested</para>
      </chapter>
    </book>

Note: Indentation helps readability.
Book is the root element. Title, prod, and chapter are child elements of book. Book is the parent element of title, prod, and chapter. Title, prod, and chapter are siblings (or sister elements) because they have the same parent.

Child elements vs attributes

1.
    <note date="12/11/2002">
      <to>Tove</to>
      <from>Jani</from>
      <heading>Reminder</heading>
      <body>Hello!</body>
    </note>

2.
    <note>
      <date>
        <day>12</day>
        <month>11</month>
        <year>2002</year>
      </date>
      <to>Tove</to>
      <from>Jani</from>
      <heading>Reminder</heading>
      <body>Hello!</body>
    </note>

Both 1 and 2 contain exactly the same information. 1 uses an attribute for the date. 2 uses child elements. Child elements are usually the better choice: they are easier to read and maintain.
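
The syntax rules above can be checked mechanically by any XML parser. The following is a minimal sketch, assuming Python and its standard xml.etree.ElementTree module (neither is part of these slides); is_well_formed is a helper name invented for this illustration. It runs the incorrect examples from the rules through the parser, then reads the date from both versions of the note.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if the text parses as XML, False otherwise (hypothetical helper)."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# One sample per syntax rule, taken from the examples in the slides.
samples = {
    "rule 1, missing closing tag": "<p>this is a paragraph",
    "rule 2, case mismatch": "<letter>text</Letter>",
    "rule 3, bad nesting": "<b><i>text</b></i>",
    "rule 4, two root elements": "<note></note><note></note>",
    "rule 5, unquoted attribute": "<note date=12/11/2002><to>Tove</to></note>",
    "correct": '<note date="12/11/2002"><to>Tove</to></note>',
}
results = {name: is_well_formed(text) for name, text in samples.items()}

# Version 1 of the note: the date is an attribute of <note>.
note_attr = ET.fromstring('<note date="12/11/2002"><to>Tove</to></note>')
date_from_attr = note_attr.get("date")

# Version 2 of the note: the date is built from child elements.
note_child = ET.fromstring(
    "<note><date><day>12</day><month>11</month><year>2002</year></date>"
    "<to>Tove</to></note>"
)
date_el = note_child.find("date")
date_from_children = "/".join(
    date_el.findtext(part) for part in ("day", "month", "year")
)

for name, ok in results.items():
    print(f"{name}: {'well formed' if ok else 'NOT well formed'}")
print(date_from_attr, date_from_children)
```

With child elements the day, month and year are available separately, whereas the attribute version leaves the program to split the string itself.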
Well formed XML documents
A “well formed” XML document adheres to the syntax rules described above.
You can check whether an XML document has valid syntax at:
http://www.w3schools.com/dom/dom_validate.asp
OR simply open your XML document in a web browser such as Internet Explorer; it will only display if it is well formed.

To manually create or write an XML document:
• Figure out what the overall document is (-> root tag)
• Figure out what the key fields of information are (-> tag names)
• Figure out which information is data (-> tag contents)

Example: Write an XML document for the following address book

    Name            Telephone   Address
    Jeremy Cannon   0098837     22, Marlboro Ct
    Naomi Murphy    992887      39, Alma Road
    Sheila Zheng    999287      Apt 4, Hyde Road

• Figure out what the overall document is (-> root tag)
  --- Address book
• Figure out what the key fields of information are (-> tag names)
  --- Information is Name (which can be broken into first name, surname), Telephone, Address (which can be broken down into house number, street name)
• Figure out which information is data (-> tag contents)

Example:

    <addressbook>
      <person>
        <name>
          <firstname>Jeremy</firstname>
          <surname>Cannon</surname>
        </name>
        <telephone>0098837</telephone>
        <address>
          <housenumber>22</housenumber>
          <street>Marlboro Ct</street>
        </address>
      </person>
      <person>
        <name>
          <firstname>Naomi</firstname>
          … etc
      </person>
    </addressbook>

To check your document..
• Save it as a .xml file
• Open it in a browser and see if any errors are produced
OR
• Go to a specialist XML validator for better error diagnosis, e.g. http://www.w3schools.com/dom/dom_validate.asp

Valid XML documents
A “valid” XML document adheres to a definition of what it can contain (e.g.
it shouldn’t be possible to put ‘13’ as a <month>, and only allowed elements may appear).
There are two ways to provide this definition:
(1) Document Type Definition (DTD)
(2) XML Schema
Brief intro to both…

DTD Declaration
You usually specify the DTD for your XML document by providing a reference to it near the top of the XML document.

    <?xml version="1.0"?>
    <!DOCTYPE note SYSTEM "note.dtd">
    <note>
      <to>Tove</to>
      <from>Jani</from>
      <heading>Reminder</heading>
      <body>Don't forget me this weekend!</body>
    </note>

Syntax: <!DOCTYPE root-element SYSTEM "filename">

Document Type Definitions
The DTD for the XML document note, shown here wrapped in a DOCTYPE declaration, is:

    <!DOCTYPE note [
      <!ELEMENT note (to,from,heading,body)>
      <!ELEMENT to (#PCDATA)>
      <!ELEMENT from (#PCDATA)>
      <!ELEMENT heading (#PCDATA)>
      <!ELEMENT body (#PCDATA)>
    ]>

It lists the document type (note), the valid elements, and the type of content each can accept. (An external file such as note.dtd contains only the <!ELEMENT ...> lines.)

Why use a DTD?
• With a DTD, each of your XML files can carry a description of its own format with it.
• With a DTD, independent groups of people can agree to use a common DTD for interchanging data.
• Your application can use a standard DTD to verify that the data you receive from the outside world is valid.

XML Schema
XML Schema is an XML-based alternative to DTD. An XML Schema describes the structure of an XML document.
XML Schemas will probably be used in most Web applications as a replacement for DTDs because:
• XML Schemas are supported by the W3C
• XML Schemas are richer and more useful than DTDs
• XML Schemas are written in XML
• XML Schemas support data types
• XML Schemas are extensible to future additions
• XML Schemas support namespaces
For the XML document note.xml (the XML Schema follows the document):

    <?xml version="1.0"?>
    <note>
      <to>Tove</to>
      <from>Jani</from>
      <heading>Reminder</heading>
      <body>Don't forget me this weekend!</body>
    </note>

XML Schema: Example

    <?xml version="1.0"?>
    <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
               targetNamespace="http://www.w3schools.com"
               xmlns="http://www.w3schools.com"
               elementFormDefault="qualified">
      <xs:element name="note">
        <xs:complexType>
          <xs:sequence>
            <xs:element name="to" type="xs:string"/>
            <xs:element name="from" type="xs:string"/>
            <xs:element name="heading" type="xs:string"/>
            <xs:element name="body" type="xs:string"/>
          </xs:sequence>
        </xs:complexType>
      </xs:element>
    </xs:schema>

• <xs:schema> is the root element of the schema.
• The <xs:element> declarations inside it define the elements allowed in the .xml document.
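
To make the idea of a “valid” document concrete, here is a minimal sketch in Python. It is a hand-written stand-in for the rule that a note contains to, from, heading and body in that order; it is not a real DTD or XML Schema validator, which would need a validating parser that Python's standard library does not provide. The function name check_note and the list EXPECTED_CHILDREN are assumptions made for this illustration.

```python
import xml.etree.ElementTree as ET

# The order of child elements that the note definition requires (assumed constant name).
EXPECTED_CHILDREN = ["to", "from", "heading", "body"]


def check_note(text):
    """Return a list of problems; an empty list means the note has the expected structure.

    Hypothetical helper: it mimics one structural rule by hand and does not
    read a DTD or schema file.
    """
    try:
        root = ET.fromstring(text)
    except ET.ParseError as err:
        # A document that is not well formed cannot be valid either.
        return [f"not well formed: {err}"]

    problems = []
    if root.tag != "note":
        problems.append(f"root element is <{root.tag}>, expected <note>")
    tags = [child.tag for child in root]
    if tags != EXPECTED_CHILDREN:
        problems.append(f"children are {tags}, expected {EXPECTED_CHILDREN}")
    for child in root:
        if len(child) > 0:
            problems.append(f"<{child.tag}> must contain only text")
    return problems


good_note = """<?xml version="1.0"?>
<note>
  <to>Tove</to>
  <from>Jani</from>
  <heading>Reminder</heading>
  <body>Don't forget me this weekend!</body>
</note>"""

# Well formed, but <from> and <heading> are missing, so it is not valid.
bad_note = "<note><to>Tove</to><body>Hello!</body></note>"

print(check_note(good_note))
print(check_note(bad_note))
```

The second note shows the difference between the two terms: it parses without error (well formed) but does not match the required structure (not valid).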