Inside SQLXML Virtual Directory Structure

This white paper is for informational purposes only. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS DOCUMENT.

Complying with all applicable copyright laws is the responsibility of the user. Without limiting the rights under copyright, no part of this document may be reproduced, stored in or introduced into a retrieval system, or transmitted in any form or by any means (electronic, mechanical, photocopying, recording, or otherwise), or for any purpose, without the express written permission of Microsoft Corporation.

Microsoft may have patents, patent applications, trademarks, copyrights, or other intellectual property rights covering subject matter in this document. Except as expressly provided in any written license agreement from Microsoft, the furnishing of this document does not give you any license to these patents, trademarks, copyrights, or other intellectual property.

Unless otherwise noted, the example companies, organizations, products, domain names, e-mail addresses, logos, people, places and events depicted herein are fictitious, and no association with any real company, organization, product, domain name, email address, logo, person, place or event is intended or should be inferred.

© 2002 Microsoft Corporation. All rights reserved.

Microsoft, BizTalk Server, and Windows are either registered trademarks or trademarks of Microsoft Corporation in the United States and/or other countries. The names of actual companies and products mentioned herein may be the trademarks of their respective owners.

Table of Contents

Introduction
Welcoming XML Documents into Your Database
    A Brief Introduction to XML for The DBA
Using XML Views to Create Virtual Documents
    Virtual Document Requirements
    Matrix for Virtual Document Function Compatibility
    Virtual Document Architectural Overview
    FOR XML Queries: The Ultimate Source of Virtual Documents
    Leveraging Mapping Schemas to Supercharge Your SQLXML Tier
Conclusion

Introduction

The software development community has seen many changes in its history, but database programming paradigms have been stable since the relational model. The add-ons of increased scalability, binary storage, and replication were all natural extensions of the same design model. If you wanted a document in a database, you stored it as a BLOB, as an image data type, or as text. Even in the first and second revolutions of the Internet (TCP/IP and then HTTP), the axis mundi of an application, the database, kept doing what it had always been doing: responding to SQL queries and generating result sets. The third revolution of XML-based Web services brings the Internet much closer to the database and forces new thinking in terms of what a document is and how it will exist in the next generation of applications.

This white paper provides an orientation to XML and clarifies the role of XML documents in the future of Microsoft® SQL Server™ programmability. After a brief orientation to the context for XML integration with SQL Server, you will learn about the various technologies required to bring XML into your data tier and ways to present your existing data as a virtual document. This white paper is intended to be an introduction to XML for the database professional; it does not encompass all issues regarding XML integration.
After reading the entire white paper, you will have a better understanding of why XML is coming to your database and what to do to get your virtual documents rolling.

Welcoming XML Documents into Your Database

Most database developers have specialized in performance tuning, logical entities, and a data access API or two. Neither XML nor HTML has had a place in these worlds, and the need for a document technology in the data tier is at first confusing. The Microsoft Windows® DNA programming model for scalability places the onus of presentation on other layers. Some have argued that XML allows you to do everything that you could do previously with a recordset but with more complexity and less speed.

So why change your database to accommodate XML? The answer is that you do not change your data; you only express it in a different form. All of your well-normalized entities, finely tuned queries and well-groomed indexes will remain intact if you plan properly for XML integration. The .NET Framework includes two major changes: the Common Language Runtime, which is beyond the scope of this white paper, and the pervasive use of XML. Whether they build from scratch or integrate shipping technology, developers from every project that accesses the database will come to your door needing several different types of documents. If you still need context for where XML will help your development community, go to http://msdn.microsoft.com/xml.

A Brief Introduction to XML for The DBA

If you are already familiar with the basic structures of XML documents and terminology, you can skip to the next section.

XML has three functional components: the XML document itself, a validating schema, and any namespaces used within both. Namespaces are a way of declaring nodes (elements or attributes) that have a specific meaning to an application that will read the document.
Most namespaces are read by the MSXML parser, but SQLXML uses its own namespaces (these are discussed in detail later). There are many similarities between a document’s schema and a table’s schema: you can define a data type, specify which values are required or optional, and add constraints. For a document to be valid, its structure must obey the rules defined in the schema. The following is an example of an XML document in which the schema has been included in the document as an inline schema:

    <?xml version="1.0" encoding="utf-8" ?>
    <root>
      <Schema name="Schema1"
              xmlns="urn:schemas-microsoft-com:xml-data"
              xmlns:dt="urn:schemas-microsoft-com:datatypes">
        <ElementType name="Employees" content="empty" model="closed">
          <AttributeType name="EmployeeID" dt:type="i4" />
          <AttributeType name="FirstName" dt:type="string" />
          <AttributeType name="lastName" dt:type="string" />
          <attribute type="EmployeeID" />
          <attribute type="FirstName" />
          <attribute type="lastName" />
        </ElementType>
      </Schema>
      <Employees xmlns="x-schema:#Schema1" EmployeeID="1" FirstName="Nancy" lastName="Davolio" />
      <Employees xmlns="x-schema:#Schema1" EmployeeID="2" FirstName="Andrew" lastName="Fuller" />
      <Employees xmlns="x-schema:#Schema1" EmployeeID="3" FirstName="Janet" lastName="Leverling" />
    </root>

Using XML Views to Create Virtual Documents

For the applications that want to use your data in new ways, you will generate XML documents based on schemas that are provided. What makes these documents virtual is that you do not have to use text or image columns for storage. Instead, you create a view on your existing data that just happens to wrap your column headers, rows, and tuples in XML encoding, as opposed to Tabular Data Stream packet recordsets. This ability puts a thin veil over your database: an XML-based application cannot distinguish between your database’s translation to and from XML (the SQLXML tier) and a flat file XML stream.
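The idea of wrapping column headers and rows in XML encoding can be illustrated outside the database. The following is a minimal Python sketch, not SQLXML itself: the function name rows_to_raw_xml and the hard-coded sample rows are assumptions made for illustration, and the output only approximates the attribute-centric shape of a row-per-tuple document.

```python
import xml.etree.ElementTree as ET

def rows_to_raw_xml(columns, rows, root_name="root"):
    """Wrap a tabular result set in attribute-centric XML, one <row> per tuple.

    Hypothetical helper for illustration only; it is not part of SQLXML.
    """
    root = ET.Element(root_name)
    for row in rows:
        # Each column value becomes an attribute named after its column header.
        attributes = {col: str(val) for col, val in zip(columns, row)}
        ET.SubElement(root, "row", attributes)
    return ET.tostring(root, encoding="unicode")

# Sample data standing in for a recordset (values taken from the paper's examples).
columns = ["EmployeeID", "FirstName", "LastName"]
rows = [(1, "Nancy", "Davolio"), (2, "Andrew", "Fuller"), (3, "Janet", "Leverling")]

virtual_document = rows_to_raw_xml(columns, rows)
print(virtual_document)
```

Nothing is persisted here: the document exists only for as long as the caller holds the string, which is the sense in which such a document is virtual.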
The benefits of using virtual documents as opposed to a persisted flat file include the following:

- No extra storage space is used for creating a final format.
- The most recent updates are available on demand.
- Security for document access maps directly to SQL models.
- Schemas exist independent from the tables and can be modified without affecting the underlying database schema.

Unless you need an audit trail for a document state, the virtual document model makes a transparent addition to your database access.

Virtual Document Requirements

There are some mandatory and optional components for managing virtual documents. With the exception of OPENXML, all XML-based programming in SQL Server uses the ICommandStream interface, which was introduced into SQLOLEDB with version 2.6 of Microsoft Data Access Components (MDAC). Some of the technologies mentioned in this white paper shipped with and apply only to SQL Server 2000, but others (XML Bulk Load, Updategrams, and XSD support) require installing SQLXML 3.0, referred to in this paper simply as SQLXML.

Matrix for Virtual Document Function Compatibility

The following table shows the compatibility levels for different functions.

    Function           SQL Server   Tier   SQLXML
    SOAP               2000         IIS    3.0 or higher
    XPath Queries      2000         IIS    1.0 or higher
    Updategrams        2000         IIS    1.0 or higher
    Template Queries   2000         IIS    1.0 or higher
    XML Bulk Load      2000         Both   1.0 or higher
    XSD Support        2000         IIS    2.0 or higher
    OPENXML            2000         SQL    N/A
    FOR XML            2000         SQL    N/A

Virtual Document Architectural Overview

The following diagram outlines the entire document workflow available using XML in SQL Server 2000.

Figure 1: Virtual Document Architecture

As you can see, the majority of the virtual document structures eliminate any references to XML before the SQL command is forwarded to the relational engine. The only place where the command that is forwarded to SQL Server has any indication that the data needs to be in an XML document is in the FOR XML clause.
We will now go into detail about effectively using FOR XML to get the type of documents you want and how to leverage its power.

FOR XML Queries: The Ultimate Source of Virtual Documents

SQL Server 2000 introduced revisions to the Transact-SQL language that provide DBAs and developers an easy method to generate XML documents without having to master XML: the FOR XML clause. The FOR XML clause has three basic modes of tokenizing a recordset into an XML document:

- FOR XML RAW
- FOR XML AUTO
- FOR XML EXPLICIT

The following sections indicate the type of generic documents you can generate with each mode of FOR XML.

Virtual Documents from FOR XML RAW Queries

    SELECT '<root>';
    SELECT TOP 3 EmployeeID, FirstName, LastName
    FROM Employees
    FOR XML RAW, XMLDATA
    SELECT '</root>';

    <root>
      <Schema name="Schema1"
              xmlns="urn:schemas-microsoft-com:xml-data"
              xmlns:dt="urn:schemas-microsoft-com:datatypes">
        <ElementType name="row" content="empty" model="closed">
          <AttributeType name="EmployeeID" dt:type="i4" />
          <AttributeType name="FirstName" dt:type="string" />
          <AttributeType name="LastName" dt:type="string" />
          <attribute type="EmployeeID" />
          <attribute type="FirstName" />
          <attribute type="LastName" />
        </ElementType>
      </Schema>
      <row xmlns="x-schema:#Schema1" EmployeeID="1" FirstName="Nancy" LastName="Davolio" />
      <row xmlns="x-schema:#Schema1" EmployeeID="2" FirstName="Andrew" LastName="Fuller" />
      <row xmlns="x-schema:#Schema1" EmployeeID="3" FirstName="Janet" LastName="Leverling" />
    </root>

The example above creates the fastest but most generic XML representation of a recordset using the RAW mode of a virtual document. It does not have any business process context unless your schema uses a default row element with column attributes. When using SQL Server-based virtual directories, your data can be communicated using HTTP to any XML-savvy application on any platform.
Keep in mind that the XMLDATA keyword is used to generate an XDR-style schema inline in the same XML document and is not mandatory. You can learn the relationship between schemas and document structure more easily by comparing the document structure with the schema structure. Once you understand a SELECT statement with a FOR XML clause, creating the XML document that you need comes naturally.

Virtual Documents from FOR XML AUTO Queries

    SELECT '<root>';
    SELECT TOP 3 EmployeeID, FirstName, LastName
    FROM Employees AS EmployeeDetail
    FOR XML AUTO, ELEMENTS, XMLDATA
    SELECT '</root>';

    <root>
      <Schema name="Schema1"
              xmlns="urn:schemas-microsoft-com:xml-data"
              xmlns:dt="urn:schemas-microsoft-com:datatypes">
        <ElementType name="EmployeeDetail" content="eltOnly" model="closed" order="many">
          <element type="EmployeeID" />
          <element type="FirstName" />
          <element type="LastName" />
        </ElementType>
        <ElementType name="EmployeeID" content="textOnly" model="closed" dt:type="i4" />
        <ElementType name="FirstName" content="textOnly" model="closed" dt:type="string" />
        <ElementType name="LastName" content="textOnly" model="closed" dt:type="string" />
      </Schema>
      <EmployeeDetail xmlns="x-schema:#Schema1">
        <EmployeeID>1</EmployeeID>
        <FirstName>Nancy</FirstName>
        <LastName>Davolio</LastName>
      </EmployeeDetail>
      <EmployeeDetail xmlns="x-schema:#Schema1">
        <EmployeeID>2</EmployeeID>
        <FirstName>Andrew</FirstName>
        <LastName>Fuller</LastName>
      </EmployeeDetail>
      <EmployeeDetail xmlns="x-schema:#Schema1">
        <EmployeeID>3</EmployeeID>
        <FirstName>Janet</FirstName>
        <LastName>Leverling</LastName>
      </EmployeeDetail>
    </root>

When an XML schema for a virtual document maps directly to one table or translates to simple hierarchies, you can use FOR XML in AUTO mode. By default, all columns appear as attributes of an element named after the table or, as in the query above, the table alias.
The ELEMENTS keyword is recognized only by FOR XML in AUTO mode and is an all-or-nothing setting: column information is displayed as subelements instead of attributes. This is the most common way to extend existing Transact-SQL-based views used in existing reports and presentation layers into an XML document form when the XML schema is being created based on the existing uses of the data.

Virtual Documents from FOR XML EXPLICIT Queries

However, XML document structure is by no means one-size-fits-all. The majority of migrations to XML-based workflows will present database developers with an existing XML annotated schema from Microsoft BizTalk™ Server 2000, a DTD rewritten as an XSD or XDR schema, or some other form of XML specification to which you must mold your data.

Before delving into how the EXPLICIT mode works, you must understand how SQL Server 2000 builds an arbitrary XML structure. In a well-formed XML document, all nodes (elements and attributes) exist in relationship to one another. The Tag and Parent columns identify, respectively, a node's own tag number and the tag number of its parent node. Specific rules govern the names of the columns and how they are translated into XML; these are covered in the topic "Using EXPLICIT Mode" in SQL Server Books Online.
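The role of the Tag and Parent columns can be sketched outside SQL Server. The Python below is a simplified model under stated assumptions, not the server's actual algorithm: the function build_from_universal_table and the helper make_row are hypothetical, only the element directive is handled, and the rows are assumed to arrive sorted so that every parent row precedes its children.

```python
import xml.etree.ElementTree as ET

def build_from_universal_table(rows, root_name="root"):
    """Nest rows by Tag/Parent, loosely imitating how EXPLICIT mode reads its rowset.

    Column names follow the pattern ElementName!TagNumber!AttributeName!Directive.
    Hypothetical simplification: only the "element" directive is supported.
    """
    root = ET.Element(root_name)
    last_open = {}  # tag number -> most recently created element with that tag
    for row in rows:
        tag, parent = row["Tag"], row["Parent"]
        # Only the columns whose tag number matches this row's Tag contribute data.
        mine = [(key.split("!"), value) for key, value in row.items()
                if "!" in key and int(key.split("!")[1]) == tag]
        element_name = mine[0][0][0]
        container = root if parent is None else last_open[parent]
        elem = ET.SubElement(container, element_name)
        for parts, value in mine:
            attr = parts[2] if len(parts) > 2 else ""
            directive = parts[3] if len(parts) > 3 else ""
            if value is None:
                continue  # NULL columns produce no output
            if attr == "":
                elem.text = str(value)            # value becomes element content
            elif directive == "element":
                ET.SubElement(elem, attr).text = str(value)
            else:
                elem.set(attr, str(value))        # default: value becomes an attribute
        last_open[tag] = elem
    return root

def make_row(tag, parent, emp_id, nickname=None, surname=None):
    """Build one universal-table row (hypothetical helper for the sample data)."""
    return {"Tag": tag, "Parent": parent,
            "Employee!1!EmployeeID": emp_id,
            "EmployeeDetail!2!!element": None,
            "Nickname!3!!element": nickname,
            "Surname!4!!element": surname}

universal_table = [
    make_row(1, None, 1), make_row(2, 1, 1),
    make_row(3, 2, 1, nickname="Nancy"), make_row(4, 2, 1, surname="Davolio"),
    make_row(1, None, 2), make_row(2, 1, 2),
    make_row(3, 2, 2, nickname="Andrew"), make_row(4, 2, 2, surname="Fuller"),
]

document = build_from_universal_table(universal_table)
print(ET.tostring(document, encoding="unicode"))
```

The sketch makes one point visible: nesting depends entirely on row order, because each row attaches to the most recent row carrying its Parent tag number.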
    SELECT '<root>';
    SELECT 1 AS Tag, NULL AS Parent,
           E.EmployeeID AS [Employee!1!EmployeeID],
           NULL AS [EmployeeDetail!2!!element],
           NULL AS [Nickname!3!!element],
           NULL AS [Surname!4!!element]
    FROM (SELECT TOP 3 EmployeeID, FirstName, LastName FROM Employees) E
    UNION ALL
    SELECT 2, 1,
           E.EmployeeID AS [Employee!1!EmployeeID],
           NULL AS [EmployeeDetail!2!!element],
           NULL AS [Nickname!3!!element],
           NULL AS [Surname!4!!element]
    FROM (SELECT TOP 3 EmployeeID, FirstName, LastName FROM Employees) E
    UNION ALL
    SELECT 3, 2,
           E.EmployeeID AS [Employee!1!EmployeeID],
           NULL AS [EmployeeDetail!2!!element],
           E.FirstName AS [Nickname!3!!element],
           NULL AS [Surname!4!!element]
    FROM (SELECT TOP 3 EmployeeID, FirstName, LastName FROM Employees) E
    UNION ALL
    SELECT 4, 2,
           E.EmployeeID AS [Employee!1!EmployeeID],
           NULL AS [EmployeeDetail!2!!element],
           NULL AS [Nickname!3!!element],
           E.LastName AS [Surname!4!!element]
    FROM (SELECT TOP 3 EmployeeID, FirstName, LastName FROM Employees) E
    -- Tag is included in the sort so that each parent row precedes its child rows.
    ORDER BY [Employee!1!EmployeeID], Tag
    FOR XML EXPLICIT, XMLDATA
    SELECT '</root>';

    <root>
      <Schema name="Schema1"
              xmlns="urn:schemas-microsoft-com:xml-data"
              xmlns:dt="urn:schemas-microsoft-com:datatypes">
        <ElementType name="Employee" content="mixed" model="open">
          <AttributeType name="EmployeeID" dt:type="i4" />
          <attribute type="EmployeeID" />
        </ElementType>
        <ElementType name="EmployeeDetail" content="mixed" model="open" />
        <ElementType name="Nickname" content="mixed" model="open" />
        <ElementType name="Surname" content="mixed" model="open" />
      </Schema>
      <Employee xmlns="x-schema:#Schema1" EmployeeID="1">
        <EmployeeDetail>
          <Nickname>Nancy</Nickname>
          <Surname>Davolio</Surname>
        </EmployeeDetail>
      </Employee>
      <Employee xmlns="x-schema:#Schema1" EmployeeID="2">
        <EmployeeDetail>
          <Nickname>Andrew</Nickname>
          <Surname>Fuller</Surname>
        </EmployeeDetail>
      </Employee>
      <Employee xmlns="x-schema:#Schema1" EmployeeID="3">
        <EmployeeDetail>
          <Nickname>Janet</Nickname>
          <Surname>Leverling</Surname>
        </EmployeeDetail>
      </Employee>
    </root>

In the example above,
we have taken the same columns and rows displayed using the RAW and AUTO modes and presented them in a more complex form: a mixed child node format (both elements and attributes in one document) with additional nodes that could not be readily expressed from table entities.

Leveraging Mapping Schemas to Supercharge Your SQLXML Tier

While FOR XML queries can be written into stored procedures to gain the benefits of cached plans for server-side execution, the bulk of XML processing will occur off the database server, on Web services. These applications already have XML schemas and will not need a direct connection to the database for online transaction processing. User communities will make the business case for some dynamic form of ad hoc support for the ever-changing interactions between loosely coupled systems. Virtual documents will require a flexible architecture for XML schemas independent of the database server.

SQLXML 3.0 uses two different namespaces that can be added as annotations to an existing schema. The namespace first shipped with SQL Server 2000, "urn:schemas-microsoft-com:xml-sql", encompasses functionality for transforming any XDR schema into an XML view of a SQL Server table or view. SQLXML 2.0 introduced "urn:schemas-microsoft-com:mapping-schema", which extends the XML view capability for XSD-based schemas.
Because an XML schema is annotated to produce a view of the mapped tables and columns in XML form, the terms XML View, Mapping Schema, and Annotated Schema are used interchangeably, regardless of which style of XML schema you bring to your database.

XDR-Based Virtual Documents Using XML View Mapper

If you extract the inline references and add some data type information to the schema from the FOR XML EXPLICIT example output above, you can create the following schema:

    <?xml version="1.0" ?>
    <Schema name="Schema1"
            xmlns="urn:schemas-microsoft-com:xml-data"
            xmlns:dt="urn:schemas-microsoft-com:datatypes">
      <ElementType name="Employee">
        <AttributeType name="EmployeeID" dt:type="i4" />
        <attribute type="EmployeeID" />
      </ElementType>
      <ElementType name="EmployeeDetail" />
      <ElementType name="Nickname" dt:type="string" />
      <ElementType name="Surname" dt:type="string" />
    </Schema>

Released alongside SQLXML is a tool called XML View Mapper, which provides a development environment for associating elements and their child nodes with their respective database tables and columns. Figure 2 shows how to drag and drop a view or table from a particular database and link it to an existing XDR schema.

Figure 2. ViewMapper 1.0 example schema mapping

XML View Mapper performs implicit mapping only when the table and node names are the same; all other mapping must be accomplished using the interface or the included XDR Editor.
After the database-to-XML node mapping is complete, the following annotated schema can be exported and used for creating virtual documents:

    <?xml version="1.0" ?>
    <!-- Generated by XMLMapper.exe XDR Publisher -->
    <Schema xmlns="urn:schemas-microsoft-com:xml-data"
            xmlns:dt="urn:schemas-microsoft-com:datatypes"
            xmlns:sql="urn:schemas-microsoft-com:xml-sql">
      <ElementType name="EmployeeDetail" content="mixed" order="many" sql:relation="Employees">
        <element type="Nickname" minOccurs="1" maxOccurs="1" />
        <element type="Surname" minOccurs="1" maxOccurs="1" />
      </ElementType>
      <ElementType name="Nickname" content="textOnly" order="many" dt:type="string"
                   sql:relation="Employees" sql:field="FirstName" />
      <ElementType name="Surname" content="textOnly" order="many" dt:type="string"
                   sql:relation="Employees" sql:field="LastName" />
      <ElementType name="Employee" content="mixed" order="many" sql:relation="Employees">
        <AttributeType name="EmployeeID" dt:type="i4" />
        <attribute type="EmployeeID" required="no" />
        <element type="EmployeeDetail" minOccurs="1" maxOccurs="1" sql:relation="Employees">
          <sql:relationship key-relation="Employees" key="EmployeeID"
                            foreign-relation="Employees" foreign-key="EmployeeID" />
        </element>
      </ElementType>
    </Schema>

Note that in addition to automatically cleaning up our schema to include some more specifics about the nature of the XML document, XML View Mapper has added some elements and attributes from the "urn:schemas-microsoft-com:xml-sql" namespace. The attributes sql:relation and sql:field map a node to a Northwind database table and column, respectively. The sql:relationship subelement on the EmployeeDetail element is the equivalent of a JOIN statement between two different sql:relation element instantiations in a schema. In this case, a self-join is required to produce our virtual document output.
This functionality is used in several SQLXML tier functions, so no code on the data tier is necessary to create virtual documents.

Virtual Document Queries Using XPath

After you have created a mapping schema, you can treat the data exposed in this view like any XML document. In the same way that tables are typically accessed with a SELECT statement with a WHERE clause, XML documents can be navigated using a node navigation convention known as XPath. SQLXML allows XPath queries to run from an HTTP request when using a virtual directory enabled for SQLXML support. Using the IIS Virtual Directory Management for SQL Server utility on your Microsoft Internet Information Services (IIS) server, you can map a virtual directory to a SQLOLEDB connection string and physical locations for XML schemas. In Figure 3, IIS is running on the same server as SQL Server 2000, although the SQLXML tier is best scaled on its own Web server.

Figure 3: IIS Virtual Directory Manager for SQL Server

In Figure 3, a virtual directory named vdir has been created that contains connection information for logging in to the Northwind database on the local SQL Server as SA. This example uses virtual directories named schema and template for storing mapping schemas and template queries, respectively, but in production these folders can have any name. After you have created the mapping schema and virtual directory configuration, you can run an XPath query from any Web browser and return a document. If we save the annotated schema shown earlier as Virtdoc.xdr in a schema folder exposed as the virtual directory schema, you can return the virtual document from the FOR XML EXPLICIT example with the following query:

    http://localhost/vdir/schema/virtdoc.xdr/Employee[@EmployeeID<4]?root=root

Because the result of the query is not expected to have a single root node, SQLXML provides an external mechanism, the root parameter, for defining a root as needed to ensure a well-formed document.
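The parts of such a query URL can be assembled programmatically. The following Python sketch is an illustration under stated assumptions: the function xpath_query_url is hypothetical, the host and directory names are the ones used in this paper's example, the XPath targets the Employee element defined in the mapping schema, and percent-encoding the XPath is an assumption about how a client might send characters such as [ ] @ and <, not something this paper specifies.

```python
from urllib.parse import quote, urlencode

def xpath_query_url(host, vdir, schema_dir, schema_file, xpath, root="root"):
    """Compose a URL from virtual directory, schema path, XPath, and root parameter.

    Hypothetical helper; the URL layout mirrors the example in this paper.
    """
    # Percent-encode the XPath so that reserved characters survive in the URL path.
    encoded_xpath = quote(xpath, safe="")
    path = "/".join([vdir, schema_dir, schema_file, encoded_xpath])
    return f"http://{host}/{path}?{urlencode({'root': root})}"

url = xpath_query_url("localhost", "vdir", "schema", "virtdoc.xdr",
                      "Employee[@EmployeeID<4]")
print(url)
```

Keeping the XPath as a separate argument makes it easy to see which portion of the URL is schema location and which is the query.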
For details of the parameters available for all Internet-based queries, see "Executing SQL Statements Using HTTP" in the MSDN Library at http://msdn.microsoft.com/. For more information about virtual directory configuration, see the topic "Using IIS Virtual Directory Management for SQL Server Utility" in SQL Server Books Online, and the documentation accompanying the Web releases of SQLXML. Virtual directory features in caching, query types, and error handling have changed in each Web release.

In order to build the most efficient mapping schemas, you need to understand how a virtual document is built from a SQL query. In the examples above, we could write stored procedures and ad hoc queries and specify the exact order and columns for each SELECT that became part of the ultimate UNION and ORDER BY. To see how SQLXML creates the virtual document, use SQL Profiler to capture the query generated from the simple XPath query above, paste it into Query Analyzer, and run it without the FOR XML clause, as shown in Figure 4.

Figure 4. SQL Server Query Analyzer with XPath Query example

Without the definition of an sql:relationship, SQLXML will create a NULL in the Employee!1!EmployeeID column and the ORDER BY clause will not create the desired node nesting. A wide variety of additional namespace attributes give you very granular control of joins, CDATA sections, nodes with no SQL mapping, and other scenarios. For more information, see the SQL Server Books Online topic "Creating XML Views Using Annotated XDR Schemas." SQLXML Books Online has additional documentation about new mappings and changes for XSD schemas.

Template Queries

XPath queries are extremely powerful, but they do require knowledge of the schema in order to create the correct node navigation pattern, and they may compromise security.
Additionally, your user community may choose to search on columns beyond those for which indexes were initially requested, causing the performance problems associated with suboptimal index structures, such as blocking and slow execution time. Your database is now a virtual document factory and is still subject to all of the rules of good performance tuning that applied before it entered the brave new world of XML. In pre-XML applications, you would use stored procedures to limit user privileges and performance impact. SQLXML gives you this same power with template queries.

Templates are XML documents that support the "urn:schemas-microsoft-com:xml-sql" namespace, but for mapping XML nodes to query components instead of database structures. For example, we can create a Template.xml file that contains the following XML to allow access to only one employee record at a time while leveraging the same mapping schema from our XPath query:

    <EmployeeSet xmlns:sql="urn:schemas-microsoft-com:xml-sql">
      <sql:header>
        <sql:param name="ID">1</sql:param>
      </sql:header>
      <sql:xpath-query mapping-schema="..\schema\virtdoc.xdr">/Employee[@EmployeeID=$ID]</sql:xpath-query>
    </EmployeeSet>

The sample above introduces the primary features of a template. Beginning with SQLXML 2.0, you can pass parameters to XPath queries, as defined in the sql:param element, which here supplies 1 as the default value if no parameter is passed in. Templates require a hard-coded root element, in this case EmployeeSet, because the entire template must be well formed in order to execute. The mapping-schema attribute can use local, UNC, or URL-based paths, but in this case refers to our original example. To execute the template query, save the above XML document in your template virtual directory as Template.xml and type this URL in your browser:

    http://localhost/vdir/template/template.xml?ID=3

Template queries map directly to any query batch sent to SQL Server. Conceivably, you could run any query (DDL, DBCC, and so on) in an sql:query element.
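How a template's default parameter interacts with a value supplied on the URL can be sketched in Python. This is an illustration only, not how SQLXML is implemented: the function resolve_template is hypothetical, and the sketch merely reads the template shown earlier and reports which parameter values would be in effect, without contacting any server or substituting values into the XPath text.

```python
import xml.etree.ElementTree as ET

SQL_NS = "urn:schemas-microsoft-com:xml-sql"

# The template from this section, held as a raw string so backslashes stay literal.
TEMPLATE = r"""<EmployeeSet xmlns:sql="urn:schemas-microsoft-com:xml-sql">
  <sql:header>
    <sql:param name="ID">1</sql:param>
  </sql:header>
  <sql:xpath-query mapping-schema="..\schema\virtdoc.xdr">/Employee[@EmployeeID=$ID]</sql:xpath-query>
</EmployeeSet>"""

def resolve_template(template_xml, supplied):
    """Return (mapping schema path, XPath text, effective parameters).

    Hypothetical helper: defaults come from sql:param elements, and any
    caller-supplied values (for example, from a URL query string) override them.
    """
    doc = ET.fromstring(template_xml)  # fails if the template is not well formed
    params = {p.get("name"): (p.text or "") for p in doc.iter(f"{{{SQL_NS}}}param")}
    params.update(supplied)
    query = doc.find(f"{{{SQL_NS}}}xpath-query")
    return query.get("mapping-schema"), query.text, params

# Equivalent of requesting template.xml?ID=3, then of requesting it with no parameters.
schema_path, xpath_text, params = resolve_template(TEMPLATE, {"ID": "3"})
_, _, default_params = resolve_template(TEMPLATE, {})
print(schema_path, xpath_text, params, default_params)
```

Because the XPath is fixed in the template and only the parameter value varies, callers cannot widen the query beyond what the template author allowed.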
Template queries can also include a combination of XPath and SQL-based queries.

Conclusion

This white paper has attempted to encompass the Internet-based uses of SQLXML as an introduction to SQL Server as a full "Web citizen," capable of generating XML documents from all data to interact with other tireless applications connected in loosely coupled systems. You will encounter many scenarios in which you will need additional tools. The examples above use an Internet-integrated workflow, but you can accomplish all of the functionality encompassed in virtual directories using ADO-based XML streams. Both ADO and HTTP-based XPath and template queries trade programming flexibility for speed, because virtual directories interact directly with the OLE DB interface.

Virtual documents are not just for show; they can continue the isolation of the data tier from a document tier and still allow for changes to your databases as needed. Mapping schemas can assist in managing modification of underlying tables using either SQLXML bulk load for a BCP-style insert and schema generation, or transaction-based modifications using updategrams and diffgrams. All three methods present SQLXML with an XML view of the documents and their states to generate the correlated INSERT, UPDATE, and DELETE statements. When a mapping schema is not available, or when more complicated server-side interactions with OLE automation and other stored procedures are necessary, you can use OPENXML.

Whether you use mapping schemas and virtual directories or ADO-based templates against FOR XML-based stored procedures is a balancing act between performance and business goals. Regardless of the method chosen, you have taken your first step into the SQLXML tier and are ready for the thousand-mile journey of an XML-integrated world.