ISOM Standards in Information Management: XML
Arijit Sengupta

Learning Objectives
• Learn what XML is
• Learn the various ways in which XML is used
• Learn the key companion technologies
• See how XML is being used in industry as a meta-language

Agenda
• Overview
• Syntax and Structure
• The XML Alphabet Soup
• XML as a meta-language

Overview - What is XML?
• A tag-based meta-language
• Designed for structured data representation
• Represents data hierarchically (in a tree)
• Provides context to data (makes it meaningful)
    Self-describing data
• Separates presentation (HTML) from data (XML)
• An open W3C standard
• A simplified subset of SGML, whereas HTML is an application of SGML

Overview - What is XML?
• XML is a "use everywhere" data specification
[Diagram labels: XML; Application X; Documents; Repository; Database; Configuration]

Overview - Documents vs. Data
• XML is used to represent two main types of things:
    Documents
      • Lots of text with tags to identify and annotate portions of the document
    Data
      • Hierarchical data structures

Overview - XML and Structured Data
• Pre-XML representation of data:
    "PO-1234","CUST001","X9876","5","14.98"
• XML representation of the same data:
    <PURCHASE_ORDER>
      <PO_NUM> PO-1234 </PO_NUM>
      <CUST_ID> CUST001 </CUST_ID>
      <ITEM_NUM> X9876 </ITEM_NUM>
      <QUANTITY> 5 </QUANTITY>
      <PRICE> 14.98 </PRICE>
    </PURCHASE_ORDER>

Overview - Benefits of XML
• Open W3C standard
• Representation of data across heterogeneous environments
    Cross platform
    Allows for high degree of interoperability
• Strict rules
    Syntax
    Structure
    Case sensitive

Overview - Who Uses XML?
• Submissions by
    Microsoft
    IBM
    Hewlett-Packard
    Fujitsu Laboratories
    Sun Microsystems
    Netscape (AOL), and others…
• Technologies using XML
    SOAP, ebXML, BizTalk, WebSphere, many others…

Syntax and Structure - Components of an XML Document
• Elements
    Each element has a beginning and ending tag
      • <TAG_NAME>...</TAG_NAME>
    Elements can be empty (<TAG_NAME />)
• Attributes
    Describe an element; e.g. data type, data range, etc.
    Can only appear on the beginning tag
• Processing instructions and declarations
    XML declaration with encoding specification (UTF-8 by default)
    Namespace declaration
    Schema declaration

Syntax and Structure - Components of an XML Document
    <?xml version="1.0" ?>
    <?xml-stylesheet type="text/xsl" href="template.xsl"?>
    <ROOT>
      <ELEMENT1><SUBELEMENT1 /><SUBELEMENT2 /></ELEMENT1>
      <ELEMENT2> </ELEMENT2>
      <ELEMENT3 type='string'> </ELEMENT3>
      <ELEMENT4 type='integer' value='9.3'> </ELEMENT4>
    </ROOT>
Callouts: the first two lines are the prolog (XML declaration and processing instructions); ELEMENT1 and ELEMENT2 are plain elements; ELEMENT3 and ELEMENT4 are elements with attributes.

Syntax and Structure - Rules For Well-Formed XML
• There must be one, and only one, root element
• Sub-elements must be properly nested
    A tag must end within the tag in which it was started
• Attributes are optional
    Defined by an optional schema
• Attribute values must be enclosed in "" or ''
• Processing instructions are optional
• XML is case-sensitive
    <tag> and <TAG> are not the same type of element

Syntax and Structure - Well-Formed XML?
    <?xml version="1.0" ?>
    <PARENT>
      <CHILD1>This is element 1</CHILD1>
      <CHILD2><CHILD3>Number 3</CHILD2></CHILD3>
    </PARENT>
• No, CHILD2 and CHILD3 do not nest properly

Syntax and Structure - Well-Formed XML?
    <?xml version="1.0" ?>
    <PARENT>
      <CHILD1>This is element 1</CHILD1>
    </PARENT>
    <PARENT>
      <CHILD1>This is another element 1</CHILD1>
    </PARENT>
• No, there are two root elements

Syntax and Structure - Well-Formed XML?
    <?xml version="1.0" ?>
    <PARENT>
      <CHILD1>This is element 1</CHILD1>
      <CHILD2/>
      <CHILD3></CHILD3>
    </PARENT>
• Yes

Syntax and Structure - An XML Document
    <?xml version='1.0'?>
    <bookstore>
      <book genre='autobiography' publicationdate='1981' ISBN='1-861003-11-0'>
        <title>The Autobiography of Benjamin Franklin</title>
        <author>
          <first-name>Benjamin</first-name>
          <last-name>Franklin</last-name>
        </author>
        <price>8.99</price>
      </book>
      <book genre='novel' publicationdate='1967' ISBN='0-201-63361-2'>
        <title>The Confidence Man</title>
        <author>
          <first-name>Herman</first-name>
          <last-name>Melville</last-name>
        </author>
        <price>11.99</price>
      </book>
    </bookstore>

Syntax and Structure - Namespaces: Overview
• Part of XML's extensibility
• Allow authors to differentiate between tags of the same name (using a prefix)
    Frees the author to focus on the data and decide how best to describe it
    Allows multiple XML documents from multiple authors to be merged
• Identified by a URI (Uniform Resource Identifier)
    When a URL is used, it does NOT have to represent a live server

Syntax and Structure - Namespaces: Declaration
Namespace declaration examples:
    xmlns:bk="http://www.example.com/bookinfo/"
    xmlns:bk="urn:mybookstuff.org:bookinfo"
Parts of a declaration:
    xmlns:bk="http://www.example.com/bookinfo/"
    xmlns is the namespace declaration, bk is the prefix, and the quoted value is the URI (here a URL)

Syntax and Structure - Namespaces: Examples
Only the child elements are qualified (BOOK itself has no prefix):
    <BOOK xmlns:bk="http://www.bookstuff.org/bookinfo">
      <bk:TITLE>All About XML</bk:TITLE>
      <bk:AUTHOR>Joe Developer</bk:AUTHOR>
      <bk:PRICE currency='US Dollar'>19.99</bk:PRICE>
    </BOOK>
All elements are qualified, and the currency attribute belongs to a second namespace:
    <bk:BOOK xmlns:bk="http://www.bookstuff.org/bookinfo"
             xmlns:money="urn:finance:money">
      <bk:TITLE>All About XML</bk:TITLE>
      <bk:AUTHOR>Joe Developer</bk:AUTHOR>
      <bk:PRICE money:currency='US Dollar'>19.99</bk:PRICE>
    </bk:BOOK>

Syntax and Structure - Namespaces: Default Namespace
• An XML namespace declared without a prefix becomes the default namespace for the declaring element and all of its sub-elements
• All elements without a prefix will belong to the default namespace:
    <BOOK xmlns="http://www.bookstuff.org/bookinfo">
      <TITLE>All About XML</TITLE>
      <AUTHOR>Joe Developer</AUTHOR>
    </BOOK>

Syntax and Structure - Namespaces: Scope
• Unqualified elements belong to the inner-most default namespace.
    BOOK, TITLE, and AUTHOR belong to the default book namespace
    PUBLISHER and NAME belong to the default publisher namespace
    <BOOK xmlns="www.bookstuff.org/bookinfo">
      <TITLE>All About XML</TITLE>
      <AUTHOR>Joe Developer</AUTHOR>
      <PUBLISHER xmlns="urn:publishers:publinfo">
        <NAME>Microsoft Press</NAME>
      </PUBLISHER>
    </BOOK>

Syntax and Structure - Namespaces: Attributes
• Unqualified attributes do NOT belong to any namespace
    Even if there is a default namespace
• This differs from elements, which belong to the default namespace

Syntax and Structure - Entities
• Entities provide a mechanism for textual substitution, e.g.
    Entity    Substitution
    &lt;      <
    &amp;     &
• You can define your own entities
• Parsed entities can contain text and markup
• Unparsed entities can contain any data
    JPEG photos, GIF files, movies, etc.
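The scoping rules from the namespace slides above (unqualified elements take the inner-most default namespace, unqualified attributes take none) can be checked with a short script. This is a minimal sketch, assuming Python's standard-library `xml.etree.ElementTree` parser; the `edition` attribute and the `split_name` helper are hypothetical additions for illustration and do not appear in the original slides.

```python
import xml.etree.ElementTree as ET

# The BOOK/PUBLISHER document from the "Namespaces: Scope" slide.
# The edition attribute is an illustrative addition, not part of the slide.
DOC = """\
<BOOK xmlns="http://www.bookstuff.org/bookinfo" edition="2">
  <TITLE>All About XML</TITLE>
  <AUTHOR>Joe Developer</AUTHOR>
  <PUBLISHER xmlns="urn:publishers:publinfo">
    <NAME>Microsoft Press</NAME>
  </PUBLISHER>
</BOOK>
"""


def split_name(qualified):
    """Split ElementTree's '{uri}local' notation into (uri, local).

    Names that belong to no namespace come back as (None, name).
    """
    if qualified.startswith("{"):
        uri, local = qualified[1:].split("}", 1)
        return uri, local
    return None, qualified


root = ET.fromstring(DOC)

# Map every element name to the namespace the parser resolved for it.
element_namespaces = {}
for element in root.iter():
    uri, local = split_name(element.tag)
    element_namespaces[local] = uri

# Do the same for the attributes of the root element.
attribute_namespaces = {}
for key in root.attrib:
    uri, local = split_name(key)
    attribute_namespaces[local] = uri

for local, uri in element_namespaces.items():
    print(f"element   {local:10} -> {uri}")
for local, uri in attribute_namespaces.items():
    print(f"attribute {local:10} -> {uri}")
```

Running it should report BOOK, TITLE and AUTHOR in the book namespace, PUBLISHER and NAME in the publisher namespace, and no namespace at all for the unqualified attribute, even though a default namespace is in scope.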
The XML 'Alphabet Soup'
• XML itself is fairly simple
• Most of the learning curve is knowing about all of the related technologies

The XML 'Alphabet Soup'
Acronym | Name | Description
XML | Extensible Markup Language | Defines XML documents
Infoset | Information Set | Abstract model of XML data; definition of terms
DTD | Document Type Definition | Non-XML schema
XSD | XML Schema | XML-based schema language
XDR | XML Data Reduced | An earlier XML schema
CSS | Cascading Style Sheets | Allows you to specify styles
XSL | Extensible Stylesheet Language | Language for expressing stylesheets; consists of XSLT and XSL-FO
XSLT | XSL Transformations | Language for transforming XML documents
XSL-FO | XSL Formatting Objects | Language to describe precise layout of text on a page

The XML 'Alphabet Soup'
Acronym | Name | Description
XPath | XML Path Language | A language for addressing parts of an XML document, designed to be used by both XSLT and XPointer
XPointer | XML Pointer Language | Supports addressing into the internal structures of XML documents
XLink | XML Linking Language | Describes links between XML documents
XQuery | XML Query Language (draft) | Flexible mechanism for querying XML data as if it were a database
DOM | Document Object Model | API to read, create and edit XML documents; creates in-memory object model
SAX | Simple API for XML | API to parse XML documents; event-driven
Data Island | | XML data embedded in an HTML page
Data Binding | | Automatic population of HTML elements from XML data

The XML 'Alphabet Soup' - Schemas: Overview
• DTD (Document Type Definitions)
    Not written in XML
    No support for data types or namespaces
• XSD (XML Schema Definition)
    Written in XML
    Supports data types
    Current standard recommended by W3C

The XML 'Alphabet Soup' - Schemas: Purpose
• Define the "rules" (grammar) of the document
    Data types
    Value bounds
• An XML document that conforms to a schema is said to be valid
    More restrictive than well-formed XML
• Define which elements are present and in what order
• Define the structural relationships of elements

The XML 'Alphabet Soup' - Schemas: DTD Example
• XML document:
    <BOOK>
      <TITLE>All About XML</TITLE>
      <AUTHOR>Joe Developer</AUTHOR>
    </BOOK>
• DTD schema:
    <!DOCTYPE BOOK [
      <!ELEMENT BOOK   (TITLE+, AUTHOR) >
      <!ELEMENT TITLE  (#PCDATA) >
      <!ELEMENT AUTHOR (#PCDATA) >
    ]>

The XML 'Alphabet Soup' - Schemas: XSD Example
• XML document:
    <CATALOG>
      <BOOK>
        <TITLE>All About XML</TITLE>
        <AUTHOR>Joe Developer</AUTHOR>
      </BOOK>
      …
    </CATALOG>

The XML 'Alphabet Soup' - Schemas: XSD Example
    <xsd:schema id="NewDataSet"
        targetNamespace="http://tempuri.org/schema1.xsd"
        xmlns="http://tempuri.org/schema1.xsd"
        xmlns:xsd="http://www.w3.org/2001/XMLSchema"
        xmlns:msdata="urn:schemas-microsoft-com:xml-msdata">
      <xsd:element name="BOOK">
        <xsd:complexType>
          <xsd:all>
            <xsd:element name="TITLE" minOccurs="0" type="xsd:string"/>
            <xsd:element name="AUTHOR" minOccurs="0" type="xsd:string"/>
          </xsd:all>
        </xsd:complexType>
      </xsd:element>
      <xsd:element name="CATALOG" msdata:IsDataSet="True">
        <xsd:complexType>
          <xsd:choice maxOccurs="unbounded">
            <xsd:element ref="BOOK"/>
          </xsd:choice>
        </xsd:complexType>
      </xsd:element>
    </xsd:schema>

The XML 'Alphabet Soup' - Schemas: Why You Should Use XSD
• Newest W3C standard
• Broad support for data types
• Reusable "components"
    Simple data types
    Complex data types
• Extensible
• Inheritance support
• Namespace support
• Ability to map to relational database tables
• XSD support in Visual Studio .NET

The XML 'Alphabet Soup' - Transformations: XSL
• Language for expressing document styles
• Specifies the presentation of XML
    More powerful than CSS
• Consists of:
    XSLT
    XPath
    XSL Formatting Objects (XSL-FO)

The XML 'Alphabet Soup' - Transformations: Overview
• XSLT: a language used to transform XML data into a different form (commonly XML or HTML)
[Diagram: XML → XSLT → XML, HTML, …]

The XML 'Alphabet Soup' - Transformations: XSLT
• The language used for converting XML documents into other forms
• Describes how the document is transformed
• Expressed as an XML document (.xsl)
• Template rules
    Patterns match nodes in the source document
    Templates are instantiated to form part of the result document
• Uses XPath for querying, sorting, etc.

The XML 'Alphabet Soup' - XPath (XML Path Language)
• General purpose query language for identifying nodes in an XML document
• Declarative (vs. procedural)
• Contextual: the results depend on the current node
• Supports standard comparison, Boolean and mathematical operators (=, <, and, or, *, +, etc.)

The XML 'Alphabet Soup' - XPath Operators
Operator | Usage Description
/ | Child operator: selects only immediate children (when at the beginning of the pattern, context is root)
// | Recursive descent: selects elements at any depth (when at the beginning of the pattern, context is root)
. | Indicates current context
.. | Selects the parent of the current node
* | Wildcard
@ | Prefix to attribute name (when alone, it is an attribute wildcard)
[ ] | Applies filter pattern

The XML 'Alphabet Soup' - XPath Query Examples
    ./author
        (finds all author elements within current context)
    /bookstore
        (find the bookstore element at the root)
    /*
        (find the root element)
    //author
        (find all author elements anywhere in document)
    /bookstore[@specialty = "textbooks"]
        (find all bookstores where the specialty attribute = "textbooks")
    //book[@style = /bookstore/@specialty]
        (find all books where the style attribute = the specialty attribute of the bookstore element at the root)

More XPath Examples
Path Expression | Result
/bookstore/book[1] | Selects the first book element that is the child of the bookstore element
/bookstore/book[last()] | Selects the last book element that is the child of the bookstore element
/bookstore/book[last()-1] | Selects the last but one book element that is the child of the bookstore element
/bookstore/book[position()<3] | Selects the first two book elements that are children of the bookstore element
//title[@lang] | Selects all the title elements that have an attribute named lang
//title[@lang='eng'] | Selects all the title elements that have an attribute named lang with a value of 'eng'
/bookstore/book[price>35.00] | Selects all the book elements of the bookstore element that have a price element with a value greater than 35.00
/bookstore/book[price>35.00]/title | Selects all the title elements of the book elements of the bookstore element that have a price element with a value greater than 35.00

XPath Functions
• Accessor functions: node-name, data, base-uri, document-uri
• Numeric value functions: abs, ceiling, floor, round, …
• String functions: compare, concat, substring, string-length, upper-case, lower-case, starts-with, ends-with, matches, replace, …
• Other functions include functions on boolean values, dates, nodes, etc.

The XML 'Alphabet Soup' - Data Islands
• XML embedded in an HTML document
• Manipulated via client side script or data binding
    <XML id="XMLID">
      <BOOK>
        <TITLE>All About XML</TITLE>
        <AUTHOR>Joe Developer</AUTHOR>
      </BOOK>
    </XML>

    <XML id="XMLID" src="mydocument.xml"></XML>

The XML 'Alphabet Soup' - Data Islands
• Can be embedded in an HTML SCRIPT element
• XML is accessible via the DOM:
    <SCRIPT language="xml" id="XMLID">
    <SCRIPT type="text/xml" id="XMLID">
    <SCRIPT language="xml" id="XMLID" src="mydocument.xml">

The XML 'Alphabet Soup' - XML-Based Applications
• Microsoft SQL Server
    Retrieve relational data as XML
    Query XML data
    Join XML data with existing database tables
    Update the database via XML Updategrams
    New XML data type in SQL Server 2005
• Microsoft Exchange Server
    XML is the native representation of many types of data
    Used to enhance performance of UI scenarios (for example, Outlook Web Access (OWA))

XML as a Meta-Language
• A language to create languages
[Diagram labels: SAX; DOM; CSS; DSSL; XSL; XML/DTD; XLL; XSLT; XSchema; GO; CML; XPath; MathML; XPointer; XQL; WML; BeanML]

Gene Ontology (GO)
• Describing and manipulating information about the molecular function, biological process and cellular component of gene products.
• Gene Ontology website: http://www.geneontology.org
• GO DTD: ftp://ftp.geneontology.org/pub/go/xml/dtd/go.dtd
• GO Browsers and tools: http://www.geneontology.org/#tools
• GO Resources and samples: http://www.geneontology.org/#annotations

Math ML
• Describing and manipulating mathematical notations
• MathML website
    www.w3.org/Math
• MathML DTD
    www.w3.org/Math/DTD
• MathML Browser
    www.w3.org/Amaya
• MathML Resources
    www.webeq.com/mathml (sample documents)

Chemical ML
• Representing molecular and chemical information
• CML website
    www.xml-cml.org
• CML DTD
    www.xml-cml.org/dtdschema/index.html
• CML Browser and Authoring Environment
    www.xml-cml.org/jumbo.html
• CML Resources
    www.xml-cml.org/chimeral/index.html (sample documents; some require plug-in downloads and can be slow)

Wireless ML
• Allows web pages to be displayed on mobile devices
• WML works with WAP to deliver the content
• Underlying model: a deck of cards that the user can sift through
• WAP/WML website
    www.wapforum.org
• WML DTD
    www.wapforum.org/DTD/wml_1.1.xml
• WAP/WML Resources
    www.oasis-open.org/cover/wap-wml.html
    www.w3scripts.com/wap (tutorial on WML; also see the WAP demo)

Scalable Vector Graphics
• Describing vector graphics data for use over the web
• Rendering is done in the browser
• Lower bandwidth demands, easier scaling
• SVG website
    www.w3.org/Graphics/SVG
• SVG Plug-Ins
    www.adobe.com/svg
• SVG Resources
    www.irt.org/articles/js176 (1999 article and good, brief tutorial)
    planet.svg (an example from Deitel)

Bean ML
• Describing software components such as Java Beans
• Defines how the components are interconnected and can be used
• Bean ML Specs and Tools
    www.alphaworks.ibm.com/aw.nsf/techmain/bml
• Bean ML Resources
    www.oasis-open.org/cover/beanML.html
• With Bean ML
    You can mark up beans using Bean ML
    And invoke different operations on beans
    Includes the BML Scripting Framework

XBRL
• Extensible Business Reporting Language
• Capturing and representing financial and accounting information
• Used in a variety of situations, e.g. publishing reports, extracting data for analysis, regulatory forms, etc.
• Initiated under the direction of AICPA
• XBRL website
    www.xbrl.org
• XBRL DTDs and Schemas
    http://www.xbrl.org/Core/2000-07-31/default.htm
• Demos and Tools
    http://www.xbrl.org/Demos/demos.htm
    http://www.xbrl.org/Tools.htm

News ML
• Designed to be media-independent
• Initiated by the International Press Telecommunications Council
• Enables tracking of news stories over time
• NewsML website
    www.newsml.org
• NewsML DTD
    http://www.oasis-open.org/cover/newsML.html
• SportsML DTD (derived from the NewsML DTD)
    http://xml.coverpages.org/sportsML.html

cXML
• CommerceXML from Ariba plus 40 other companies
• cXML website
    www.cxml.org
• Primary Set of Tools/Implementations to support cXML
    http://www.ariba.com/solutions/solutions_overview.cfm
    See also the Whitepapers link explaining how these can be used for
      • E-procurement
      • E-fulfillment
      • And others

xCBL
• xCBL from Microsoft, SAP, Sun
• xCBL website
    www.xcbl.org
    Marketed as an XML component library for B2B e-commerce
• Available Resources (see internal links)
    DTDs and Schemas
    XDK: SOX Parser and an XSLT Engine
    Example Documents

ebXML
• UN/CEFACT: the United Nations body whose mandate covers worldwide policy and technical development in the area of trade facilitation and electronic business.
    www.uncefact.org
• ebXML website
    www.ebxml.org
• Current Endorsements
    http://www.ebxml.org/endorsements.htm
    Still needs buy-in from the larger IS/IT vendors
• Related Effort: RosettaNet
    http://www.rosettanet.org/rosettanet/Rooms/DisplayPages/LayoutInitial
    Business Processes for IT, Component and Chip companies

Conclusion
• Overview
• Syntax and Structure
• The XML Alphabet Soup
• XML as a meta-language

Resources
• http://www.xml.com/
• http://www.w3.org/xml/
• http://www.w3schools.com/
• http://msdn.microsoft.com/xml/
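
Worked example: several of the XPath expressions from the XPath example slides can be tried against the bookstore document from the "An XML Document" slide. This is a minimal sketch, assuming Python's standard-library `xml.etree.ElementTree`, which implements only a subset of XPath; paths are written relative to the root `bookstore` element, and the price comparison is done in plain Python because that subset does not cover numeric predicates such as `price>35.00`. The threshold of 10.00 is an illustrative choice so that the sample data yields a match.

```python
import xml.etree.ElementTree as ET

# The bookstore document from the "An XML Document" slide.
BOOKSTORE = """\
<bookstore>
  <book genre="autobiography" publicationdate="1981" ISBN="1-861003-11-0">
    <title>The Autobiography of Benjamin Franklin</title>
    <author>
      <first-name>Benjamin</first-name>
      <last-name>Franklin</last-name>
    </author>
    <price>8.99</price>
  </book>
  <book genre="novel" publicationdate="1967" ISBN="0-201-63361-2">
    <title>The Confidence Man</title>
    <author>
      <first-name>Herman</first-name>
      <last-name>Melville</last-name>
    </author>
    <price>11.99</price>
  </book>
</bookstore>
"""

# fromstring returns the root element, so it plays the role of /bookstore.
root = ET.fromstring(BOOKSTORE)

# //author : recursive descent, written as .//author from the root element.
last_names = [a.findtext("last-name") for a in root.findall(".//author")]

# /bookstore/book[1] and /bookstore/book[last()] : positional filters.
first_title = root.find("book[1]/title").text
last_title = root.find("book[last()]/title").text

# //book[@genre='novel'] : filter on an attribute value.
novel_titles = [b.findtext("title") for b in root.findall(".//book[@genre='novel']")]

# /bookstore/book[price>10.00]/title : the numeric test is applied in Python.
pricey_titles = [
    b.findtext("title")
    for b in root.findall("book")
    if float(b.findtext("price")) > 10.00
]

print("authors:       ", last_names)
print("first book:    ", first_title)
print("last book:     ", last_title)
print("novels:        ", novel_titles)
print("price > 10.00: ", pricey_titles)
```

A full XPath engine would accept the expressions exactly as written on the slides, including the absolute `/bookstore/...` forms and the numeric predicate.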