XML
CS 105

What is XML?
• XML stands for Extensible Markup Language.
• XML is a markup language like HTML.
• XML was designed to describe data.
• You must define your own tags.

An XML Example
<note>
  <to>John</to>
  <from>Merry</from>
  <heading>Reminder</heading>
  <body>Don’t forget to pick up the box</body>
</note>

Why XML?
• The web holds a huge number of websites, each posting tons of web pages, and the number of websites grows every day.
• It is getting harder to find the information we are looking for.

Why XML?
• We need search engines to locate the information that we are looking for.
• Most WWW documents are in HTML.
• A search engine can return tons of pages.
• Humans are smart: most of the time we can understand the content of a web page by reading it. Computer programs cannot, and a search engine is a computer program.

XML is the solution.
• HTML describes how to display a page.
• But it does nothing to describe the contents of a page.
• We need a language to convey the content of a web page.
• XML is the solution.

XML
• XML indicates:
  – What information is contained within a web page.
  – Where that information is on the page.

XML and HTML
• XML is not a replacement for HTML.
• XML was designed to describe data in a format of your choosing and to focus on what the data is.
• HTML was designed to display data in a format that you prefer.
• HTML is about displaying information, while XML is about describing information.

XML and HTML
• XML and HTML complement each other. XML describes the data. HTML displays the data.

eXtensible Markup Language
• XML is supported by many platforms and many tools.
• A lot of web applications are using XML now.

XML vs. HTML
<HTML>
<BODY>
<UL>
<LI> 1 kg apple
<LI> 1/2 kg sugar
<LI> 1 kg white flour
<LI> 250 g butter
</UL>
<P> Place apples in a bowl. Toss with flour until covered. then, ...
</BODY>
</HTML>

<RECIPE>
  <INGREDIENTS>
    <INGREDIENT> 1 kg apple </INGREDIENT>
    <INGREDIENT> 1/2 kg sugar </INGREDIENT>
    <INGREDIENT> 1 kg white flour </INGREDIENT>
    <INGREDIENT> 250 g butter </INGREDIENT>
  </INGREDIENTS>
  <INSTRUCTIONS>
    <INSTRUCTION> Place apples in a bowl. </INSTRUCTION>
    <INSTRUCTION> Toss with flour until covered. </INSTRUCTION>
    …
  </INSTRUCTIONS>
</RECIPE>

XML
• Labels the data
• Structures the data
• Example

Rules for Well-Formed XML
• Every XML document should begin with the XML declaration <?xml version="1.0"?>, which tells the web browser that it is an XML file. (XML 1.0 makes the declaration optional but recommended; its quotation marks must be straight, not curly.)
• Every XML document must have a single, all-enclosing “root tag”.
• Every XML element must have a corresponding closing tag (an empty element may instead be written in self-closing form, such as <br/>).
• Just as in HTML, XML elements must be properly nested.
• XML tags are case sensitive: <FOOD> and <food> are different.
• The value of an attribute must appear in quotation marks (either double or single quotation marks work).

XML Element Names
• Names can contain letters, digits, and other characters such as hyphens, underscores, and periods.
• Names must begin with a letter or an underscore; they cannot begin with a digit, a hyphen, or a period.
• Names should not begin with the prefix “XML” or “xml” or “xML” etc.; such names are reserved.
• Names cannot contain spaces.

Good, Bad, and Unparsable
• Common errors in XML: misspelling tag names, having more than one root-level element, forgetting to close an open quote.
• If an XML file is well-formed, the browser displays it in a hierarchical fashion; otherwise, the browser cannot load the file.
• A well-formed XML file can still be poorly designed.
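
The rules and common errors above can be checked mechanically. The following is a minimal sketch, assuming Python's standard `xml.etree.ElementTree` parser stands in for the browser. The helper name `is_well_formed` and the sample strings are illustrative and do not come from the slides.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if text parses as well-formed XML, False otherwise."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Hypothetical sample documents, one per rule or common error.
samples = {
    "single root, properly nested": "<note><to>John</to></note>",
    "two root-level elements": "<to>John</to><from>Merry</from>",
    "case mismatch in closing tag": "<FOOD>apple</food>",
    "missing closing tag": "<note><to>John</note>",
    "unclosed attribute quote": '<note lang="en><to>John</to></note>',
    "improper nesting": "<a><b></a></b>",
}

# Map each label to the parser's verdict.
results = {label: is_well_formed(text) for label, text in samples.items()}

for label, ok in results.items():
    print(f"{label}: {'well-formed' if ok else 'NOT well-formed'}")
```

Only the first sample should pass. A check like this says nothing about design quality: a file can pass and still be poorly structured.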
Good, Bad and Unparsable
• Bad example:
<TVGuide>
  <channel>…</channel>
  <time>…</time>
  <title>…</title>
  <description>…</description>
  <channel>…</channel>
  <time>…</time>
  <title>…</title>
  <description>…</description>
</TVGuide>

Good, Bad and Unparsable
• Solution:
<TVGuide>
  <program>
    <channel>…</channel>
    <time>…</time>
    <title>…</title>
    <description>…</description>
  </program>
  <program>
    <channel>…</channel>
    <time>…</time>
    <title>…</title>
    <description>…</description>
  </program>
</TVGuide>

Design Tips
• Think logically and design accordingly.
  Ex. Use <CD>…</CD> for CD collections.
• If you want it, tag it.
  Ex. Don’t use
    <Time>10:00pm <name>…</name> </Time>
  Use
    <Time> <slot>10:00pm</slot> <name>…</name> </Time>
• Think generically.
  Ex. Don’t use <NBC>…</NBC> <ABC>…</ABC> to group different networks.
  Use <name>NBC</name> <name>ABC</name>.
• Think hierarchically.
  Start with a meaningful root element and work downward to successive levels of detail.

What is XSL?
• XSL (Extensible Stylesheet Language) is a language for describing to a browser how to process an XML file.
• XSL can convert an XML file into an XML file with a different format.
• XSL can convert an XML file into a non-XML file.

XSL
• The most common type of XSL processing is converting an XML file into an HTML file that browsers can display. We will focus on this use of XSL.
• XSL is the bridge between XML and HTML.
• We can use XSL to produce different HTML formats of the same data represented in XML.
• XSL separates data (contents) from style tags (display commands).
• Example: