<XML> and the Future of Internet-based Computing
11 March 2002
Ian GRAHAM
Emerging Business Strategy, Bank of Montreal
E: <ian.graham@bmo.com> or <ian.graham@utoronto.ca>
T: (416) 513.5656 / F: (416) 513.5590
Web: http://www.utoronto.ca/ian/talks/
Emerging Business Strategy, IBS ian.graham@bmo.com / 416.513.5656

Overview
A history lesson
– The Web and the birth of XML
– when, why, and who
What does XML give us?
Examples, illustrations, and applications
The future

In The Beginning .....
[Diagram labels: Ftp, News, Email; Internet communication protocols; Web Server; Db & other software]
…. was the birth of the Web (Tim Berners-Lee, 1992)
• HTML (data/display), e.g. a page reading “Hello There. Here’s a zippy HTML page, with lots of Colors and Links ...!!! Fun, Eh?”
• HTTP (transfer)
• URLs (location), e.g. http://www.foo.org/boo.html

Three Core Concepts
HTTP -- HyperText Transfer Protocol
– A protocol for transferring data between machines on the Internet
URL -- Uniform Resource Locator
– A scheme for referencing, using a simple text string, the specific location of a resource (Web page, audio file, program) somewhere on the Internet (e.g. http://www.utoronto.ca/ian/talks/ )
HTML -- HyperText Markup Language
– A markup language for encoding information to be read / viewed by people
HTTP and URLs have stood the test of time pretty well. But by 1996, HTML was already showing signs of age ....

Simple HTML Example
HTML (not XML) markup, shown on the slide beside its browser rendering:
    <HTML>
    <HEAD>
    <TITLE>The XML Specification Guide - Website Home Page</TITLE>
    <LINK REL="stylesheet" HREF="style.css">
    </HEAD>
    <BODY BGCOLOR="#FFFFFF" TEXT="black" LINK="#0066CB" ALINK="#00A000" VLINK="#808080">
    <TABLE WIDTH="100%" CELLPADDING="0" CELLSPACING="0" BORDER="0">
    <TR>
    <TD VALIGN="top" ALIGN="left"><FONT CLASS="toolbar" FACE="arial,helvetica" SIZE="-1">The XML Specification Guide</FONT></TD>
    …….. More tags and text ….
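A small sketch of the URL concept from the Three Core Concepts slide. It uses Python's standard `urllib.parse` (my choice of tool, not something the talk uses) to split the talk's own example URL into the parts a browser needs before it can issue an HTTP request. The variable names are illustrative only.

```python
from urllib.parse import urlsplit

# Example URL taken from the Three Core Concepts slide.
url = "http://www.utoronto.ca/ian/talks/"

parts = urlsplit(url)
scheme = parts.scheme  # which protocol to speak (here, HTTP)
host = parts.netloc    # which machine on the Internet to contact
path = parts.path      # which resource to ask that machine for

print(scheme, host, path)
```

The point is only that one short text string carries the protocol, the machine, and the resource location together.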
The Problems with HTML
HTML was designed to serve one role: simple hypertext documents, with simple user interaction (forms, etc.).
But people soon wanted to display other types of data:
– mathematical expressions, literary text
– graphics, multimedia, interactive content ...
– commercial forms, purchase orders, generic data
... and “connect” these parts together (so they can interact)
... and dynamically mix/edit chunks of data together
... and build dynamic networks that exchange information
... and make sure this works reliably, anywhere.

HTML Scope was Too Limited
– Single model for data (hypertext)
– Syntax too lenient ... It’s easy to create HTML that can be misprocessed by other systems
– Result:
  • can’t create arbitrary custom data that can be universally understood
[Diagram: Web evolution beyond HTML -- presentation of different types of data; modeling different types of data; interchange of data between machines]

The Birth of XML...
... happened in 1996, when a group of experts assembled to try to find a way out of the problem.
First draft came out in late 1996 ...
Final version of the XML 1.0 specification came out in February 1998
– Large Canadian contribution -- 3 out of 18 WG members, plus one of the three editors [Tim Bray]
– Followed in 1999 by a second ‘core’ XML specification (also with Tim Bray as co-editor)
Core Principles
– Simple
  • But not as simple as HTML, in particular with stricter formal syntax
– Extensible
  • So you can create your own tags, or elements
– Distributed-environment friendly
  • like HTML, but better

An XML Example
    <?xml version="1.0" ?>
    <partorders xmlns="http://myco.org/Spec/partorders">
      <order ref="x23-2112-2342" date="25aug1999-12:34:23h">
        <desc> Gold sprockel grommets, with matching hamster </desc>
        <part number="23-23221-a12" />
        <quantity units="gross"> 12 </quantity>
        <deliveryDate date="27aug1999-12:00h" />
      </order>
      <order ref="x23-2112-2342" date="25aug1999-12:34:23h">
        . . . Order something else . . .
      </order>
    </partorders>

What is XML?
Specification of a syntax for “encoding” text-based data (words, phrases, numbers, ...), with strict syntax rules about how to do so.
A text-based syntax -- written using printable characters (no explicit binary data)
Extensible -- you can define your own tags (essentially data types), within the constraints of the syntax rules
Universal -- the syntax rules ensure that all XML processing software MUST handle a given piece of XML identically. If you can read and process it, so can anybody else

Example Revisited
    <partorders xmlns="http://myco.org/Spec/partorders">
      <order ref="x23-2112-2342" date="25aug1999-12:34:23h">
        <desc> Gold sprockel grommets, with matching hamster </desc>
        <part number="23-23221-a12" />
        <quantity units="gross"> 12 </quantity>
        <deliveryDate date="27aug1999-12:00h" />
      </order>
      <order ref="x23-2112-2342" date="25aug1999-12:34:23h">
        . . . Order something else . . .
      </order>
    </partorders>
Slide callouts: element tags; attribute of this quantity element (units="gross"); hierarchical, structured information

Processing XML -- creating data structures
    <partorders xmlns="...">
      <order date="..." ref="...">
        <desc> ..text.. </desc>
        <part />
        <quantity />
        <delivery-date />
      </order>
      <order ref=".." .../>
    </partorders>
[Diagram: the tree built from this markup. Node labels: partorders (xmlns=), order (ref=, date=), desc, text, part, quantity, text, delivery-date, order (ref=, date=)]
XML syntax rules guarantee the same result, always

XML: Why it's this way
Simple (like HTML)
– But not quite so simple
– Stricter syntax rules, to eliminate processing errors
– Syntax defines structure (hierarchically), and names structural parts (element names) -- it is self-describing data
Extensible (unlike HTML, vocabulary is not fixed)
– Can create your own language of tags/elements, with rules
– Strict syntax ensures that custom tags can be reliably processed
Designed for a distributed environment (like HTML)
– Can have data all over the place: can retrieve and use it reliably
Can mix different data types together (unlike HTML)
– Can mix one set of tags with another set: resulting data can still be reliably processed

Mixing dialects together: name spaces
    <?xml version="1.0" encoding="iso-8859-1"?>
    <html xmlns="http://www.w3.org/1999/xhtml"
          xmlns:mt="http://www.w3.org/1998/Math/MathML">
      <head> <title> Title of XHTML Document </title> </head>
      <body>
        <div class="myDiv">
          <h1> Heading of Page </h1>
          <mt:math> <mt:msup> ...… MathML markup … </mt:math>
          <p> more html stuff goes here </p>
        </div>
      </body>
    </html>
Slide callouts: default ‘type’ is xhtml; mt: prefix indicates 'type' mathml (a different language)

XML Specification(s) Chart
[Chart. Legend: W3C rec. Shown: XML 1.0, XML names]

Classes of XML Dialects
XML gives us a tool for expressing data in a universally shareable way.
Many XML 'dialects,' optimised for different roles.
Can roughly break these down into five categories
– presentation & data: stuff people read, look at, or exchange
– metadata: for describing things; for use by other software
– distributed apps: data delivery; distributed applications, Web services
– XML utilities: XSLT, Schemas, …
– software utilities: variety of things …
We’ll now look at some examples from the first three categories.
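The claims above (strict syntax, the same tree for everyone, namespaces that keep dialects apart) can be tried with a short sketch. It uses Python's standard `xml.etree.ElementTree`, which is my choice and not a tool named in the talk; the document is a shortened copy of the part-order example, and the `po` prefix is a made-up local shorthand.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the part-order example from the slides.
DOC = """<?xml version="1.0"?>
<partorders xmlns="http://myco.org/Spec/partorders">
  <order ref="x23-2112-2342" date="25aug1999-12:34:23h">
    <desc>Gold sprockel grommets, with matching hamster</desc>
    <part number="23-23221-a12"/>
    <quantity units="gross">12</quantity>
    <deliveryDate date="27aug1999-12:00h"/>
  </order>
</partorders>"""

# "po" is a hypothetical prefix chosen here; only the URI matters.
NS = {"po": "http://myco.org/Spec/partorders"}

root = ET.fromstring(DOC)

# The parser records each element's namespace as part of its name.
root_tag = root.tag

# Walk the hierarchy: partorders -> order -> quantity.
order = root.find("po:order", NS)
ref = order.get("ref")
quantity_el = order.find("po:quantity", NS)
quantity = int(quantity_el.text)
units = quantity_el.get("units")

# Strict syntax: a mismatched tag is rejected outright, not guessed at.
try:
    ET.fromstring("<order><desc>oops</order>")
    accepted = True
except ET.ParseError:
    accepted = False

print(root_tag, ref, quantity, units, accepted)
```

Any conforming XML parser should build the same structure from `DOC` and refuse the malformed fragment, which is the reliability argument the slides make.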
Classes of XML Dialects
1) Presentational Language (for people/applications)
– SMIL -- for multimedia (RealPlayer, multimedia players)
– WML -- wireless (WAP phones)
– XUL -- user interface (Netscape 6)
– VoiceXML -- voice interfaces (telephone-based ...)
– XHTML -- XMLized version of HTML
– …
Some languages with specific academic relevance:
– TEI -- text encoding http://www.tei-c.org/
– MathML -- for mathematics http://www.w3.org/Math
– XHTML -- new HTML http://www.w3.org/MarkUp
– SVG -- for graphics http://www.w3.org/Graphics/SVG
– HEML -- historical events http://www.heml.org

TEI -- Text Encoding Initiative
“... represent all kinds of literary and linguistic texts for online research and teaching, using an encoding scheme that is maximally expressive and minimally obsolescent.” †
Recently migrated to be compatible with XML (TEI-Lite)
– Namespaces let you re-use XHTML ‘links’
– XML also has its own more expressive linking/pointing mechanisms
Some online examples via .... [ www.utoronto.ca/ian/talks/11mar02/examples.html ]
Gain: universally accessible literary/academic texts, with networked capabilities
† From: TEI home page, http://www.tei-c.org, 16 Jan 2002

MathML, SVG: for Mathematics and Graphics
XML dialects that model essential “types” of data for presentations and display.
The “namespace” mechanism lets you mix these different types of information together, and with other dialects (like XHTML)
Some online examples .... [ www.utoronto.ca/ian/talks/11mar02/examples.html ]
Advantages: Can communicate both structural and semantic information (how it looks and what it means)
– Interactive mathematical example documents
– Interfaces with tools like Mathematica, Maple
– Non-proprietary languages, interfaces

HEML: Historical Event Markup and Linking
“... elements that are flexible enough to represent most known events in the past while working well with existing document encoding schemes, such as XHTML, TEI-Lite and Docbook.” †
Online examples at ...
[ www.utoronto.ca/ian/talks/11mar02/examples.html ]
A “web” of historical events, cross-linking documents with resources, timelines, etc.
† From: HEML home page, http://www.heml.org, 16 Jan 2002

And others
CML - Chemical Markup Lang
CellML - biological models
BSML - bioinformatic sequences
MAGE-ML - Microarray Gene Expression
XSTAR - for archaeological research
XMLMARC - MARC in XML
AML - astronomy markup language
... many (dozens and dozens) more ...
There has been an explosion of activity towards developing “universal” XML formats for encoding, exchanging and linking information.
“Evolutionary” forces still at play (many languages are born, but only a few will survive)
Prediction -- this will lead to a big change in how academic information is created, shared, and stored.

Informational Data: Metadata and Packages
Can use XML to encode information about data
– Indexes, catalog records, etc.
– data about non-text resources (images, people, whatever)
Can also use XML to package up information (data + catalog)
Example: IMS Content packaging
– A standard for “packaging” Web content relevant to Web-based instructional applications
– Will allow for interoperable content -- so it can be moved between different IMS-compliant learning systems.
– A growing number of learning systems, including WebCT, support this standard
One of the core components for creating learning objects

Distributed Data
The networking of the data is becoming more important than the data itself
XML is becoming the tool for creating such networks, and for transporting data from place to place in that network.
The preceding example languages can sometimes do this sort of thing, but there are also specific XML languages aimed at this role.
These ideas -- and some of the existing tools -- can be used in Portal / Website development, creation of distributed databases, etc.
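To make "XML that describes a non-text resource, then travels to another system" a little more concrete, here is a minimal sketch in Python's standard `xml.etree.ElementTree` (my choice of tool). Every element name, attribute name, and URL in it is hypothetical; it is not the IMS packaging format or any other real metadata vocabulary.

```python
import xml.etree.ElementTree as ET

# Hypothetical catalog record describing an image (a non-text resource).
record = ET.Element("record", {"resource": "http://example.org/images/campus-map.png"})
ET.SubElement(record, "title").text = "Map of the campus"
ET.SubElement(record, "type").text = "image"
ET.SubElement(record, "keyword").text = "maps"

# Serialize to text: this string is what would be sent across the network.
xml_text = ET.tostring(record, encoding="unicode")

# A different program, possibly on a different machine, reads it back.
received = ET.fromstring(xml_text)
title = received.findtext("title")
resource = received.get("resource")

print(xml_text)
print(title, resource)
```

The receiving side needs no knowledge of how the record was produced, only of the agreed element names.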
Distributed data application: Open Directory
RDF -- Resource Description Framework
– A language for encoding metadata about resources
– Used by the Open Directory Project to create an open, shareable directory of Web resources
– Can search the directory site (like Yahoo), or download the entire directory and integrate it into your own.
Current directory has:
– 46,000 human editors
– 45,000 categories
– millions of ‘resources’ catalogued
– re-used by ~290 sites around the world
Online examples from ... [ www.utoronto.ca/ian/talks/11mar02/examples.html ]

Open Directory Model
[Diagram: dmoz.org publishes RDF data feeds (<XML>); infospace, Ask Jeeves, Google, and Labour party UK download the XML data from a well-known location]

Distributed data application: RSS
RSS -- Rich/Resource/RDF Site Summaries
– A language for encoding summary data about Web pages/sites, and related metadata (update interval, etc.)
– Designed for syndicated distribution of information about pages
– Rather like headlines for newspapers
There are currently 850+ syndicators of such data, and several thousand RSS ‘feeds’
– News agencies
– Web sites with updated content
– individuals with ‘blogs’
Online examples from ... [ www.utoronto.ca/ian/talks/11mar02/examples.html ]

RSS Syndication Model
[Diagram: sites ... feed an RSS aggregator, which serves RSS consumers: Web site, desktop app (e.g., Headline Viewer), JavaScript component, other ... (aggregator, ...). Black lines: <XML> (‘one-way’ XML)]
Simple querying of ‘aggregator’ via URLs: http://ag.org/?news

Distributed data application: Jabber
“Open, XML-based protocol for instant messaging and presence. Jabber-based software is deployed on thousands of servers across the internet and is used by over a million people worldwide.”
A complete XML-based distributed application toolset.
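Because Jabber traffic is XML, each message or presence update is a small XML fragment that ordinary XML tools can read. The sketch below, using Python's standard `xml.etree.ElementTree` (my choice of tool), reads two simplified fragments and dispatches on the element name. The addresses and text are invented, the `describe` helper is hypothetical, and the enclosing stream and namespace declarations that real Jabber traffic carries are left out.

```python
import xml.etree.ElementTree as ET

# Simplified fragments; real traffic arrives inside a long-lived XML stream.
MESSAGE = ('<message from="bob@example.com" to="alice@example.org" type="chat">'
           '<body>Lunch at noon?</body></message>')
PRESENCE = ('<presence from="bob@example.com">'
            '<show>away</show><status>In a meeting</status></presence>')


def describe(fragment_text):
    """Hypothetical helper: summarize one incoming fragment as plain text."""
    fragment = ET.fromstring(fragment_text)
    sender = fragment.get("from")
    if fragment.tag == "message":
        return f"{sender} says: {fragment.findtext('body')}"
    if fragment.tag == "presence":
        # No <show> child is treated here as plain availability.
        show = fragment.findtext("show", default="available")
        return f"{sender} is {show}"
    return f"unhandled element: {fragment.tag}"


print(describe(MESSAGE))
print(describe(PRESENCE))
```

Dispatching on the element name is one simple way to let a single connection carry several kinds of traffic (chat, presence, other services).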
† From: TEI home page, http://www.tei-c.org, 16 Jan 2002

Jabber:
• Presence
• User directory
• Proxies to Yahoo, ICQ
• Other services
[Diagram: Jabber clients connected to two Jabber servers]

Jabber Example
• Connect, register presence
• Lookup user
• Send text message
[Diagram: two Jabber clients and two Jabber servers, each server with a contact database]
Requests and responses all sent in XML
Generic XML protocol for exchanging messages, plus some services.
Can be extended to non-text messaging applications

XML for networked applications
XML for encoding data
XML for transporting information between applications
XML for encoding instructions to send to another application
– XML interfaces to other applications
Creation of Web Services
– Software made available to others via a generic XML interface, with supporting facilities (directory service for ‘finding’ them, etc.)
XML is becoming the core tool for building distributed, dynamically configured applications

How can this be used?
[Diagram: an Integrated Application connected to Web site, News Feeds, Jabber/chat, and Banking through an XML interface (SOAP, XML-RPC, other...)]
• Web content distribution
• Calendar aggregation
• Portlets for Web sites
• Distributed catalogs / db’s

The result of all this activity
Enormous drive to create all the XML technologies needed behind the scenes
Many “core” XML languages, plus many supporting standards
Evolution has been very quick, as the new Web model is not that n

XML (and related) Specifications
[Chart. Legend: W3C rec, industry std, W3C draft, ‘Open’ std. Group labels: XML Core, APIs, Style, Protocols, Web Services, Application areas, Data/presentation. Specifications shown: XML 1.0, Xfragment, XML names, RDF, Canonical, Xpath, MathML, XSLT, JDOM, Xpointer, SMIL 1 & 2, XML base, VoiceXML, JAXP, Xlink, XSL, DOM 1, DOM 2, DOM 3, XML signature, Infoset, XHTML events, XML query, UDDI, RSS, SOAP, Biztalk, CSS 1, CSS 2, CSS 3, WDDX, IFX, IMS, XML-RPC, XMI, ebXML, Jabber, WSDL, CellML, XHTML 1.0, XHTML basic, Xforms, XML schema, SAX 1, SAX 2, TEI, HEML, SVG, Modularized XHTML, Docbook, XUL, 100's more ....]

In Conclusion
XML is changing the way we think about ‘raw’ information
– Open
– Universal
– Shareable
– Distributable
– Collective, complex, and emergent
.. and with the Internet model is changing the way we think about applications
– Networked (via XML) collections of individually simple apps.
– Value in aggregation, not the individual parts

Conclusion II
“A large part of how we think about music is influenced by the methods by which it has conventionally been distributed. We think of pop songs as being three or four minutes long because 40 years ago that was all that could fit on one side of a vinyl single.” Moby
We think of Internet-based computing in the same way -- in terms of what we know or knew -- not what it can be, or will become
Our great opportunity is to help define this future