XML

Microsoft introduced built-in XML support with SQL Server 2000. This chapter will provide a brief overview
of XML itself, as well as SQL Server 2000 XML support and related technologies. For a detailed overview of
the topics touched on here, see SQL Server 2000 XML Distilled by Kevin Williams et al. (ISBN 1-904347-08-8).
XML
XML stands for Extensible Markup Language. XML is used primarily to make documents self-describing. Self-describing data is easier to interpret when it is moved from one business unit to another, or from one
company to another.
XML looks similar to HTML, in that it makes use of tags. Unlike HTML, however, XML tags do not have
predefined meanings. In XML, you define your own tags. Tags in HTML describe the format of the data,
whereas XML tags describe the data itself. For example, in HTML, the following block means that the string
Welcome! is presented in bold characters:
<b> Welcome! </b>
In XML, the tag <b> doesn't have a predefined meaning.
XML is not a programming language. XML is a data format. XML presents data hierarchically, and not in a
relational format.
The following is an example of relational data from the Leaders table versus a hierarchical XML document.
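Relational data:
ID  vchTitle        vchCountry      vchLastName  vchFirstName
--  --------------  --------------  -----------  ------------
1   President       Afghanistan     Karzai       Hamid
2   President       Albania         Moisiu       Alfred
3   Prime Minister  United Kingdom  Blair        Tony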
Hierarchical XML data:
<?xml version="1.0" encoding="ISO-8859-1"?>
<TheWorld>
    <Leaders ID="1">
        <vchTitle>President</vchTitle>
        <vchCountry>Afghanistan</vchCountry>
        <vchLastName>Karzai</vchLastName>
        <vchFirstName>Hamid</vchFirstName>
    </Leaders>
    <Leaders ID="2">
        <vchTitle>President</vchTitle>
        <vchCountry>Albania</vchCountry>
        <vchLastName>Moisiu</vchLastName>
        <vchFirstName>Alfred</vchFirstName>
    </Leaders>
    <Leaders ID="3">
        <vchTitle>Prime Minister</vchTitle>
        <vchCountry>United Kingdom</vchCountry>
        <vchLastName>Blair</vchLastName>
        <vchFirstName>Tony</vchFirstName>
    </Leaders>
</TheWorld>
We can use XML to share data more easily by formatting company data in shared dialects and schemas. XML
data can be transferred across the Internet, rendered in web pages, stored in XML files, and converted to
various file types. This flexibility translates into a format that allows you to separate data from presentation,
as well as keeping the data platform independent.
The Anatomy of an XML Document
In the previous section's example of an XML document, the first line was:
<?xml version="1.0" encoding="ISO-8859-1"?>
This is the XML declaration. A well-formed XML document is one that follows the basic XML syntax rules.
In XML 1.0 the declaration is optional, so a document without one can still be well-formed, but including it
is good practice because it states the version and character encoding explicitly. Most XML parsers (involved
in consuming, translating, or transforming the document) will reject a document that is not well-formed. The
declaration defines the XML version of the document (1.0 in these examples, as defined by the World Wide Web
Consortium (W3C)). The encoding, which can differ, defines the character set of the document.
The previous example also included elements and attributes. As in HTML, elements are delimited by tags. The
following XML chunk is an element with an opening tag, <vchTitle>, and a closing tag, </vchTitle> (notice the
forward slash that marks the end of the element):
<vchTitle>President</vchTitle>
The next example shows the use of an attribute. The element is <Leaders>, and the ID="3" is the attribute
describing the <Leaders> element. Attributes are normally used to describe elements, or provide additional
information about elements:
<Leaders ID="3">
There are no fixed rules on when to use attributes and when to use elements. However, if an item can occur
many times under the same parent, it must be an element, since an attribute name cannot appear more than
once on a single element.
A well-formed XML document must have a root element. The XML document hierarchy is made up of parent
and child elements, all of which must be within the root element; the root element is the parent of all
elements within a document.
In the following example <TheWorld> is the root element, and all child elements exist between the <TheWorld>
opening tag and the </TheWorld> closing tag:
<TheWorld>
...
</TheWorld>
Elements within other elements are called child elements; for example, the <Leaders></Leaders>
elements were child elements of the <TheWorld> root element, and <vchTitle> and other elements are
child elements of the <Leaders> parent. Child and parent elements must always be properly nested,
meaning that their open and closing tags should not overlap, but rather should always be contained within
one another, beginning with the root node.
Other requirements for an XML document are that each opening tag be matched by a closing tag (or that an
empty element use the self-closing form, such as <Country/>). XML parsers are much less forgiving than HTML
parsers (usually a web browser) regarding this. Also, be aware that element and attribute names are
case-sensitive, and that attribute values must always appear within quotes, either double or single (for
example ID="3", not ID=3).
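One way to see these rules enforced from Transact-SQL is to hand a document to the parser through sp_xml_preparedocument (covered in section 18.3). The following is a minimal sketch, not a definitive test harness: the variable names are illustrative, and the exact error text for the second call depends on the installed MSXML parser.

```sql
DECLARE @hdoc int, @rc int

-- Well-formed: one root element, properly nested tags, quoted attribute value.
EXEC @rc = sp_xml_preparedocument @hdoc OUTPUT,
    '<TheWorld><Leaders ID="3"><vchTitle>Prime Minister</vchTitle></Leaders></TheWorld>'

IF @rc = 0
BEGIN
    PRINT 'Document parsed'
    -- Release the in-memory copy as soon as it is no longer needed.
    EXEC sp_xml_removedocument @hdoc
END

-- Not well-formed: <vchTitle> and <Leaders> overlap instead of nesting,
-- so the parser is expected to reject this document with an error.
EXEC @rc = sp_xml_preparedocument @hdoc OUTPUT,
    '<TheWorld><Leaders ID="3"><vchTitle>Prime Minister</Leaders></vchTitle></TheWorld>'
```

sp_xml_preparedocument returns 0 on success and a non-zero value on failure, which is why the sketch checks @rc before using or releasing the handle.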
XML Technologies
There are a number of XML-related technologies. Some of these technologies are used for displaying XML,
like XHTML (HTML reformulated as XML), XSL (Extensible Stylesheet Language), and XSLT (XSL Transformations).
Others are used to model and map the XML document, such as DTD (Document Type Definition) or XML Schema. DOM
(Document Object Model) and SAX (Simple API for XML) are used to manipulate the contents of an XML
document programmatically, and XPath (XML Path Language), XLink (XML Linking Language), and XQL
(XML Query Language) are used to address, link, and query the contents of an XML document.
SOAP (Simple Object Access Protocol) is used to transfer arbitrary XML documents between systems. Shortly
after SQL Server 2000 was released, Microsoft began offering free downloads of SQLXML (XML for SQL Server),
which further extended interoperability between XML and SQL Server 2000. At the time of writing, SQLXML is
at version 3.0 SP1 (Service Pack 1), and this version includes SOAP functionality. SOAP with SQLXML enables
SQL Server stored procedures, user-defined functions, and other SQLXML technologies (described next) to be
exposed as web services. Such web services can then be accessed from multiple platforms or programming
languages by using SOAP.
In addition to the aforementioned XML-related technologies, there are multiple methods available for
importing, exporting, and manipulating XML data as it relates to SQL Server 2000:
• OPENXML
Reviewed in more detail later, OPENXML uses Transact-SQL extensions and system stored procedures
to load an XML document into the SQL Server 2000 memory space. The source XML document must be
supplied as a CHAR, NCHAR, VARCHAR, NVARCHAR, TEXT, or NTEXT value, for example from a table column
or a local variable. Once the document is in memory, we can use a rowset view of the XML data. We can
use the results of this rowset within other Transact-SQL operations (such as importing the results into a
table). OPENXML uses the Microsoft XML parser, called Microsoft XML Core Services (MSXML), for
parsing the document.
• MSXML
MSXML can be used with or independently of SQL Server, and is available as a free download from the
Microsoft site. MSXML includes an XML parser, XSLT engine, DOM APIs, XSD (XML Schema definition
language) support, XPath, and SAX. At the time of writing, MSXML is at version 4.0 SP1. Using a
programming language such as Visual Basic 6.0, and ADO (Microsoft's data access interface), you can
export or load data to or from SQL Server by accessing the DOM exposed by MSXML. MSXML also
includes two programming classes that enable HTTP access.
• Microsoft XML OLE DB Simple Provider
ADO can also work with the Microsoft XML OLE DB Simple Provider, which can be used to read XML
documents into a recordset, and then used for importing into SQL Server 2000. For more information
on this provider, see Microsoft Knowledge Base article 271722, HOWTO: Access Hierarchical XML Data
with the XML OLE DB Simple Provider.
• SQLXML XML Bulk Load Utility
SQLXML version 1.0 introduced the XML Bulk Load utility, which allows high-speed bulk loads of data
packaged inside XML tags into SQL Server. Unlike OPENXML, the entire XML document is not loaded
into memory, so XML Bulk Load can be used to load very large documents. XML Bulk Load is a standalone COM object, and can be referenced and invoked by COM compliant programming languages.
• FOR XML
The FOR XML clause is reviewed in the next section, and is used within a Transact-SQL SELECT statement to
output data in XML hierarchical format.
• SQLXML Client-side XML Processing
SQLXML 3.0 SP1 includes client-side XML processing, which converts a relational result set to a
hierarchical XML document format on the client side. If you call a SELECT FOR XML query, with
client-side XML formatting enabled, only the SELECT statement (without the FOR XML) is passed to
SQL Server. The rowset is then converted to an XML document by SQLXML on the client workstation.
• XML Schema Definition (XSD) Annotated Schemas
XSD is the W3C XML Schema definition language. Microsoft added annotations to it, producing annotated
schemas, which allow you to avoid complicated FOR XML EXPLICIT statements by binding the XSD definition
directly to the database schema. These bindings are also called XML views, and they can be
queried with the XML Path language (XPath). You should consider using annotated XSD schemas, instead of
FOR XML, if you are more familiar with XML and XPath than Transact-SQL. XPath is not as expressive as
Transact-SQL (for example, XPath lacks wildcards, has limited data types, and doesn't have a UNION clause).
• Direct URL Queries
Internet Information Services 5.0 includes XML and SQL Server 2000 integration features. In
conjunction with IIS, you can embed SQL queries directly into URL strings. SQL Server can then return
the results as an XML document, and display the results in the browser.
• XML Templates
Also in conjunction with IIS 5.0, XML templates contain SQL statements that incoming URL requests
can invoke. Templates are more secure and flexible than direct URL queries, and are not limited in size
or complexity. XML templates are stored on the server itself.
• SQLXML Updategrams
Introduced in SQLXML version 1.0, updategrams allow you to modify data in SQL Server by using
special XML tags. With updategrams, you use an XML grammar to specify before and after images for
fragments of the modified data. Updategrams implement an XML-to-SQL mapping that eliminates the
need to write Transact-SQL update queries. This may be beneficial for those developers unfamiliar
with Transact-SQL and more comfortable with XML and related XML languages.
• SQLXML Diffgrams
Diffgrams are similar to updategrams, but they can be generated automatically from an ADO.NET
DataSet object (ADO.NET is the next-generation data access model for Microsoft's .NET Framework).
Aside from ADO.NET, diffgrams can also be used with ADO version 2.6.
• SQLXML Managed Classes
Introduced in SQLXML version 2.0, the SQLXML managed classes consist of .NET objects (classes) that
allow programmers unfamiliar with traditional SQL to use XML templates or server-side XPath queries
against SQL Server instead. To use these classes, the .NET Framework and the free SQLXML download
must be installed on the machine where you plan to use them.
Some excellent references on the Web include:
• SQLXML SQL Server 2000 (a Kevin Williams site) at http://www.sqlxml.org/
• PerfectXML at www.PerfectXML.com/SQLXML.asp
• Microsoft's XML Web Services page at http://www.microsoft.com/sql/techinfo/xml/default.asp
Kevin Williams' book, SQL Server 2000 XML Distilled (ISBN 1-904347-08-8), is also a great place for more
information.
18.1 How to… Use FOR XML
The FOR XML clause of the Transact-SQL SELECT statement allows you to convert relational rowset data into
hierarchical XML output.
FOR XML has three modes which impact how the hierarchical data is formatted: AUTO, RAW, and EXPLICIT.
AUTO mode returns each table used in the FROM clause of the query as an element, and each column
referenced in the SELECT clause as an attribute associated with the element. AUTO mode is ideal for queries
using complex JOIN operations, easing the conversion to hierarchical format.
The GROUP BY clause and aggregate functions are not allowed in conjunction with FOR XML AUTO.
The following is an example of using FOR XML AUTO clause:
SELECT TOP 2
    Leaders.ID,
    Leaders.vchTitle,
    Leaders.vchCountry,
    Leaders.vchLastName,
    Country.iPopulation
FROM Leaders, Country
WHERE Leaders.vchCountry = Country.vchName
FOR XML AUTO
This returns:
<Leaders ID="1" vchTitle="President"
         vchCountry="Afghanistan" vchLastName="Karzai">
    <Country iPopulation="26668251"/>
</Leaders>
<Leaders ID="2" vchTitle="President"
         vchCountry="Albania" vchLastName="Moisiu">
    <Country iPopulation="3119000"/>
</Leaders>
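In AUTO mode, the nesting of the output follows the order in which the tables are first referenced in the SELECT list. As a rough sketch using the same Leaders and Country tables (this query is an illustration, not one of the original examples, and its output is not shown because it was not taken from a live server), listing the Country column first should make Country the outer element and Leaders the nested one:

```sql
-- Same join as the FOR XML AUTO example, written with explicit JOIN syntax.
-- Country.iPopulation is listed first in the SELECT list, so <Country>
-- becomes the parent element and <Leaders> is nested inside it.
SELECT TOP 2
    Country.iPopulation,
    Leaders.ID,
    Leaders.vchLastName
FROM Country
    INNER JOIN Leaders ON Leaders.vchCountry = Country.vchName
FOR XML AUTO
```

Reordering the SELECT list is therefore the simplest way to control which table forms the top level of an AUTO mode hierarchy.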
AUTO mode can also be combined with the ELEMENTS option, which maps columns to elements instead of
attributes. For example:
SELECT TOP 2
    Leaders.ID,
    Leaders.vchTitle,
    Leaders.vchCountry,
    Leaders.vchLastName,
    Country.iPopulation
FROM Leaders, Country
WHERE Leaders.vchCountry = Country.vchName
FOR XML AUTO, ELEMENTS
This returns:
<Leaders>
    <ID>1</ID>
    <vchTitle>President</vchTitle>
    <vchCountry>Afghanistan</vchCountry>
    <vchLastName>Karzai</vchLastName>
    <Country>
        <iPopulation>26668251</iPopulation>
    </Country>
</Leaders>
<Leaders>
    <ID>2</ID>
    <vchTitle>President</vchTitle>
    <vchCountry>Albania</vchCountry>
    <vchLastName>Moisiu</vchLastName>
    <Country>
        <iPopulation>3119000</iPopulation>
    </Country>
</Leaders>
RAW mode generates one row element for each row of the query result set, and includes the column data as
the row element's attributes. RAW mode cannot return binary data unless the BINARY BASE64 option
(described below) is specified.
The following is an example of using the FOR XML RAW clause:
SELECT TOP 2
    Leaders.ID,
    Leaders.vchTitle,
    Leaders.vchCountry,
    Leaders.vchLastName,
    Country.iPopulation
FROM Leaders, Country
WHERE Leaders.vchCountry = Country.vchName
FOR XML RAW
This returns:
<row ID="1" vchTitle="President" vchCountry="Afghanistan"
     vchLastName="Karzai" iPopulation="26668251"/>
<row ID="2" vchTitle="President" vchCountry="Albania"
     vchLastName="Moisiu" iPopulation="3119000"/>
By adding the XMLDATA keyword to the FOR XML clause, a schema data type definition is prepended to the
XML output. This is useful for returning the data types of each column. For example:
SELECT TOP 2
    Leaders.ID,
    Leaders.vchTitle,
    Leaders.vchCountry
FROM Leaders
FOR XML AUTO, XMLDATA
This returns two XML blocks, the first defining the schema of the XML document (defining the elements,
attributes, and associated data types), and the second containing the XML data itself:
<Schema name="Schema4" xmlns="urn:schemas-microsoft-com:xml-data"
        xmlns:dt="urn:schemas-microsoft-com:datatypes">
    <ElementType name="Leaders" content="empty" model="closed">
        <AttributeType name="ID" dt:type="i4"/>
        <AttributeType name="vchTitle" dt:type="string"/>
        <AttributeType name="vchCountry" dt:type="string"/>
        <attribute type="ID"/>
        <attribute type="vchTitle"/>
        <attribute type="vchCountry"/>
    </ElementType>
</Schema>
<Leaders xmlns="x-schema:#Schema4" ID="1" vchTitle="President"
         vchCountry="Afghanistan"/>
<Leaders xmlns="x-schema:#Schema4" ID="2" vchTitle="President"
         vchCountry="Albania"/>
The BINARY BASE64 keyword is used with FOR XML AUTO, and FOR XML EXPLICIT (described next), when
binary information needs to be embedded in the XML document as base64-encoded format.
For example, the picture column from the Categories table is output using FOR XML AUTO and BINARY
BASE64 (results not shown, as the BINARY BASE64 output of one row alone takes up several pages):
SELECT TOP 1
    CategoryID,
    picture
FROM Categories
FOR XML AUTO, BINARY BASE64
EXPLICIT mode is the most complex mode to learn, but offers the most control. Using FOR XML EXPLICIT,
you can specify exactly which columns are elements or attributes in your query result set. You can specify
nesting, and use subqueries and UNION queries to create sophisticated XML hierarchies.
EXPLICIT format is based on the concept of a universal table, which contains information about the
hierarchical structure of the XML document. The first two columns of a FOR XML EXPLICIT SELECT statement
must be named TAG and PARENT. The TAG column defines the tag number of the current element, and the
PARENT column stores the tag number of the parent element (which is NULL for the top level of the hierarchy).
After the TAG and PARENT values are defined in the SELECT statement, the elements and attributes must then
be defined via the XML identifier syntax:
ElementName!TagNumber!AttributeName!Directive
The parts of the identifier are:
• ElementName
The ElementName defines the generic identifier of the element.
• TagNumber
TagNumber defines the nesting value of the element within the XML tree.
• AttributeName
AttributeName is the name of the XML attribute if Directive is not specified; otherwise it is the
name of the contained element when Directive is defined as xml, cdata, or element.
• Directive
The Directive field controls the XML format returned. Permitted values include:
Directive value     Description
xml                 Directs the column to act as a contained element, without entity encoding taking place (that is, without converting special characters to their entity equivalents).
cdata               Wraps the data in a CDATA section, without entity encoding.
element             Generates a contained element with a specified name, with entity encoding applied to the contents of the element.
ID, IDREF, IDREFS   These three types all facilitate XML intra-document links.
hide                The attribute will not be displayed, but can still be used for ordering.
xmltext             Wraps the contents of the column in a single tag, to be consumed or parsed later on, or used for overflow data.
Example 18.1.1: Using FOR XML EXPLICIT
In this example, the top two rows are returned from the Leaders table. The ID column is defined as an
attribute named Leaders (using the id directive) and the vchLastName column as an attribute named
vchLastName, both on the Leaders element; the vchCountry column is defined as a child element:
SELECT TOP 2
    1 as TAG,
    NULL as PARENT,
    ID as [Leaders!1!Leaders!id],
    vchCountry as [Leaders!1!vchCountry!element],
    vchLastName as [Leaders!1!vchLastName]
FROM dbo.Leaders
FOR XML EXPLICIT
This returns:
<Leaders Leaders="1" vchLastName="Karzai">
    <vchCountry>Afghanistan</vchCountry>
</Leaders>
<Leaders Leaders="2" vchLastName="Moisiu">
    <vchCountry>Albania</vchCountry>
</Leaders>
For multiple levels in the XML tree use the UNION ALL operator with additional SELECT statements.
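A two-level hierarchy might be sketched as follows. This is an illustration under assumptions, not one of the original examples: it reuses the Leaders and Country tables and the vchCountry = vchName join from the FOR XML AUTO queries earlier in this section, and the column aliases are chosen for this sketch.

```sql
-- Level 1: one <Leaders> element per leader (tag 1, no parent).
-- The Country column is declared here so the universal table has it,
-- but it is left NULL for parent rows.
SELECT 1             AS TAG,
       NULL          AS PARENT,
       L.ID          AS [Leaders!1!ID],
       L.vchLastName AS [Leaders!1!vchLastName],
       NULL          AS [Country!2!iPopulation]
FROM dbo.Leaders AS L
UNION ALL
-- Level 2: one <Country> element nested under its <Leaders> parent (tag 2, parent 1).
-- L.ID is repeated only so the child row sorts next to its parent row.
SELECT 2,
       1,
       L.ID,
       NULL,
       C.iPopulation
FROM dbo.Leaders AS L
    INNER JOIN dbo.Country AS C ON L.vchCountry = C.vchName
ORDER BY [Leaders!1!ID], TAG
FOR XML EXPLICIT
```

The ORDER BY clause matters: each child row must follow its parent row in the universal table, otherwise the Country elements would not be nested under the correct Leaders element.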
18.2 How to… Use sp_makewebtask to Output XML Documents
You can use FOR XML to output your relational data as hierarchical data; but how do you get it out of SQL
Server? This chapter briefly reviewed various technologies that can assist you with outputting data into XML
format and files (URL queries, templates, SQLXML); however, there is one easy method that is built into SQL
Server 2000: the sp_makewebtask procedure.
You can use the sp_makewebtask procedure to produce an HTML or XML document. This procedure has a
total of 33 parameters, but for our purposes, we will only need to use three of them. The user executing this
procedure must have SELECT permissions for the query that returns the data (using FOR XML in this case),
CREATE PROCEDURE permissions in the database where the query is run, and permissions to
write the generated HTML or XML document to the selected location.
The syntax of sp_makewebtask using just the 3 parameters (see Microsoft SQL Server Books Online for the
full array of choices) is as follows:
sp_makewebtask [@outputfile =] 'outputfile',
[@query =] 'query'
[, [@templatefile =] 'templatefile']
Parameter                        Description
@outputfile = 'outputfile'       The location of the HTML or XML file to be generated (UNC names are allowed for remote computers). Depending on your security context, your SQL Server service account, SQL Server Agent, or proxy accounts should be configured to have permissions to write to this location.
@query = 'query'                 The Transact-SQL query used to output the results.
@templatefile = 'templatefile'   Template files are used to generate the HTML or XML document, and contain placeholders and formatting instructions. The <%insert_data_here%> tag is used to indicate where query results should be added to an HTML table. <%begindetail%> and <%enddetail%> tags are used to define a complete row format (see SQL Server Books Online for a review of the formatting options).
For our upcoming example, create a template file in Notepad called
C:\temp\leaders.tpl, the contents of which should be:
<leaders>
<%begindetail%>
<%insert_data_here%>
<%enddetail%>
</leaders>
Notice that we have placed a <leaders> root tag to make this
a well-formed document.
Example 18.2.1: Using sp_makewebtask to generate an XML document
Once you have created a template file, as shown in the syntax table above, you can execute the
sp_makewebtask stored procedure in Query Analyzer, with the following example parameters:
EXEC sp_makewebtask
    @outputfile = 'C:\temp\leaders.xml',
    @query = 'SELECT * FROM Leaders FOR XML AUTO',
    @templatefile = 'C:\temp\leaders.tpl'
This produces a leaders.xml file containing the results of the SELECT * FROM Leaders query in XML format. If
you open this file with Microsoft Internet Explorer (version 5 or higher), the document is displayed as a
collapsible XML tree.
18.3 How to… Use OPENXML
The OPENXML Transact-SQL command allows us to query an XML document as if it were a relational table.
OPENXML allows retrieval of both elements and attributes from an XML document or fragment.
The generated rowset can then easily be used within other SQL statements (a subquery or an INSERT
statement, for example).
The sp_xml_preparedocument system stored procedure is used in conjunction with OPENXML to move the
XML document into memory. The document is stored in the internal cache of SQL Server, using up to one
eighth of the total available SQL Server memory. The procedure reads the XML document and parses it
internally using the MSXML (Microsoft XML) parser.
The sp_xml_preparedocument stored procedure returns a handle that we can use to access the internal
representation of the XML document. The handle remains valid for the duration of the user connection to
SQL Server, or until it is de-allocated with sp_xml_removedocument, which clears the document from memory
once we have finished with it.
The syntax for sp_xml_preparedocument is as follows:
sp_xml_preparedocument hdoc OUTPUT
[, xmltext]
[, xpath_namespaces]
Parameter          Description
hdoc OUTPUT        This is an integer OUTPUT parameter value that indicates the handle of the newly created XML document.
xmltext            This is the text of the XML document. The value can be of the CHAR, NCHAR, VARCHAR, NVARCHAR, TEXT, or NTEXT data type.
xpath_namespaces   This parameter specifies the namespace declarations used in the row and column XPath expressions. This value defaults to <root xmlns:mp="urn:schemas-microsoft-com:xml-metaprop">. A namespace is a collection of element type and attribute names that is significant to the consuming parser or XML-related technology (in this case XPath).
The syntax for sp_xml_removedocument is as follows:
sp_xml_removedocument hdoc
This procedure takes only one parameter, hdoc, which is the integer handle of the XML document loaded into
memory with sp_xml_preparedocument. Always remember to de-allocate your XML documents using
sp_xml_removedocument when you have finished using them; otherwise, the memory will not be
de-allocated until the user session that loaded the document disconnects.
OPENXML uses XPath, which is a general-purpose language for addressing and filtering the elements and
attributes of an XML document, including the text within them.
OPENXML is used in the FROM clause, or anywhere else a rowset provider (table or view) is allowed.
The syntax is:
OPENXML (idoc, rowpattern, flags)
[WITH (SchemaDeclaration or TableName)]
Parameter                               Description
idoc                                    Integer document handle of the XML document.
rowpattern                              XPath pattern used for identification of nodes to be processed as rows.
flags                                   Indicates the mapping between the XML data and the relational rowset:
                                        0 defaults to attribute-centric mapping.
                                        1 uses attribute-centric mapping. When combined with 2 (a value of 3), attribute-centric mapping is applied first, and then element-centric mapping for columns not yet processed.
                                        2 uses element-centric mapping.
                                        8 can be combined with 1 or 2, and specifies that consumed data should not be copied to the overflow property @mp:xmltext.
WITH (SchemaDeclaration or TableName)   When WITH (SchemaDeclaration) is used, each column declaration is made up of a column name (rowset name), column type (valid data type), column pattern (optional XPath pattern), and optional meta data properties (meta data about the XML nodes). If TableName is used instead of a SchemaDeclaration, a table must already exist for holding the rowset data.
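To make the flags and column pattern options more concrete, here is a minimal sketch of element-centric mapping. The sample document and variable names are assumptions made for this illustration; only sp_xml_preparedocument, OPENXML, and sp_xml_removedocument come from the text above.

```sql
DECLARE @idoc int
DECLARE @doc varchar(1000)

-- Illustrative document: ID is an attribute, the other values are child elements.
SELECT @doc =
'<TheWorld>
    <Leaders ID="1">
        <vchLastName>Karzai</vchLastName>
        <vchCountry>Afghanistan</vchCountry>
    </Leaders>
    <Leaders ID="2">
        <vchLastName>Moisiu</vchLastName>
        <vchCountry>Albania</vchCountry>
    </Leaders>
</TheWorld>'

EXEC sp_xml_preparedocument @idoc OUTPUT, @doc

-- flags = 2: columns map to child elements of each <Leaders> row node.
-- The optional column pattern '@ID' overrides that default for one column
-- and reads the ID attribute instead.
SELECT *
FROM OPENXML (@idoc, '/TheWorld/Leaders', 2)
     WITH (ID          int          '@ID',
           vchLastName varchar(100),
           vchCountry  varchar(100))

-- Release the memory held by the parsed document.
EXEC sp_xml_removedocument @idoc
```

A column pattern is the more explicit choice when a document mixes attributes and elements, because it does not depend on which flags value is in effect.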
Example 18.3.1: Using OPENXML
This example extracts the ID and vchLastName fields from an XML document (held in a VARCHAR local
variable), and presents the data in a columnar report format.
The beginning of this example declares an integer variable @idoc for use by the sp_xml_preparedocument
system stored procedure. The @doc VARCHAR(1000) local variable holds the XML document string.
sp_xml_preparedocument then parses the @doc string into memory and returns the handle of the in-memory
document in the @idoc integer variable.
Lastly, OPENXML is used with @idoc, along with an XPath row pattern, to return the values of the ID
and vchLastName columns as a relational result set:
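DECLARE @idoc int
DECLARE @doc varchar(1000)
SELECT @doc =
'<TheWorld>
    <Leaders ID="1" vchLastName="Karzai">
        <vchCountry>Afghanistan</vchCountry>
    </Leaders>
    <Leaders ID="2" vchLastName="Moisiu">
        <vchCountry>Albania</vchCountry>
    </Leaders>
</TheWorld>'
-- Parse the document into memory and return its handle in @idoc
EXEC sp_xml_preparedocument @idoc OUTPUT, @doc
-- flags = 1 (attribute-centric): ID and vchLastName are read from attributes
SELECT *
FROM OPENXML (@idoc, '/TheWorld/Leaders', 1)
    WITH (ID int, vchLastName varchar(100))
-- Release the memory held by the parsed document
EXEC sp_xml_removedocument @idoc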
This returns:
ID          vchLastName
----------- -----------
1           Karzai
2           Moisiu
OPENXML works best with smaller documents (1MB or less), since documents are loaded into memory.
Documents can only consume up to 1/8 of the total available SQL Server memory. This limit can be reached
when different user sessions load the same document concurrently, or when a single document exceeds the
maximum memory available.
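As noted at the start of this section, the OPENXML rowset can feed other Transact-SQL statements. The following sketch loads the rowset into a table with INSERT...SELECT. The temporary table #LeaderImport and the sample document are hypothetical names and data used only for this illustration.

```sql
DECLARE @idoc int
DECLARE @doc varchar(1000)

SELECT @doc =
'<TheWorld>
    <Leaders ID="1" vchLastName="Karzai"/>
    <Leaders ID="2" vchLastName="Moisiu"/>
</TheWorld>'

-- Hypothetical staging table, created only for this sketch.
CREATE TABLE #LeaderImport (ID int NOT NULL, vchLastName varchar(100) NULL)

EXEC sp_xml_preparedocument @idoc OUTPUT, @doc

-- The OPENXML rowset is used like any other table source.
INSERT INTO #LeaderImport (ID, vchLastName)
SELECT ID, vchLastName
FROM OPENXML (@idoc, '/TheWorld/Leaders', 1)
     WITH (ID int, vchLastName varchar(100))

-- Free the document as soon as the rows have been copied,
-- so it does not count against the memory limit described above.
EXEC sp_xml_removedocument @idoc

SELECT ID, vchLastName FROM #LeaderImport
DROP TABLE #LeaderImport
```

Copying the rows out and releasing the handle immediately keeps the in-memory document short-lived, which helps when several sessions import XML at the same time.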