
The Need for SOA Database for Storing SOA Data
Divya Gade
Rejitha Rajasekhar
ITK478
Spring ’07
Service-oriented architecture (SOA) is a collection of services that share business logic, data, and processes
through a programmatic interface across a network. It allows applications from different sources to
communicate through standards-based interfaces. This kind of communication is made possible through an
iterative process of creating or “exposing” new services, aggregating or “composing” these services into
larger composite applications, and making the outputs available for consumption by the business users [1].
XML is the most common message format for implementing SOA services. This in turn
implies that a database designed to store SOA data must be able to handle XML data with ease.
The goal of SOA is to achieve loose coupling among the interacting software agents. With loose coupling and
service-based partitioning, large systems are more resilient to failure, and resources can
be added incrementally throughout the architecture as needs change [2]. This presents a new set of
challenges to enterprise database systems supporting these scenarios. There are three main requirements for
the efficient storage of SOA data [3]. The first requirement is the need for federated information
management. Once SOA implementations began to see success, businesses began to deploy hundreds or even
thousands of services on them, and service consumers were suddenly exposed to all of them.
To comprehend these services, a federated view of the data and services is
required. The second requirement is the need for SOA data caching. To provide fast and reliable access to
the large amounts of data resulting from a successful SOA implementation, some kind of caching would be
needed. The third requirement is that the data and services need an unprecedented level of governance to
ensure their validity. SOA governance is the process of defining and enforcing organizational policies and
standards. These policies and standards are needed to manage business liabilities and dependencies,
to ensure continuity of business operations, and to reduce costs.
SOA data is usually stored in relational databases and file systems. However, SOA data is fundamentally XML, and
XML data cannot be easily modeled in relational databases [4]. Relational databases are slow and scale poorly when
storing XML, and they do not do a good job of persisting and indexing hierarchical, semi-structured data such as XML.
They usually store XML in a field type called Binary Large Object (BLOB), which is opaque to the database and is
therefore neither efficient nor easily indexed for rapid searching. A secondary problem is that relational databases
do not handle sparse data efficiently. The choices are a
single table with lots of NULLs, which wastes space, or many sparsely populated tables, which are
expensive to join. Relational databases are also not normally optimized to work with streaming data. XML
messages sent across a Web service-based network are well suited to a stream-based processing approach that is
foreign to relational databases [6]. An XML document can be shredded, that is, decomposed into rows and columns, to
fit it into relational tables, but this method is a poor choice when it comes to indexing and fast queries.
Shredding also loses details such as element order, processing instructions,
comments, white space, and other features that are important in many applications. Field and record
boundaries also tend not to match the boundaries of an XML document. Applications that care about these
details, such as publishing systems, need to look beyond the relational database
for their information storage needs [7]. For some applications, however, such as e-commerce systems that use XML as a
data transport, the data will most probably have a highly regular structure and will be used by non-XML
applications. Furthermore, details such as entities and the encodings used by XML documents are probably
not important, because such an application is interested in the data and not in how it is stored in an
XML document. In this case, all that is needed is a relational database and some software to transfer
the data between XML documents and the database [5]. In contrast, in an SOA application the data has a
less regular structure and is largely streaming and transient in nature. Thus, relational databases are not
well suited to handling SOA data. File systems, for their part, do not provide the advanced querying and
management capabilities that an SOA typically needs. For these reasons, we believe
that if SOA data is created in XML, it should be persisted, managed, and treated as XML, and that
the best way to do so is to use a native XML database.
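The loss of detail caused by shredding can be illustrated with a minimal Python sketch. The message, the element names, and the order_items table are hypothetical and chosen only for illustration; the sketch shreds a small XML document into a single relational table and then rebuilds the best XML it can from the stored rows.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical SOA message; element names are illustrative only.
MESSAGE = """<order id="42">
  <!-- rush delivery requested by phone -->
  <item>widget</item>
  <note>fragile</note>
  <item>gadget</item>
</order>"""


def shred(xml_text, conn):
    """Shred an order document into a flat (order_id, item) table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS order_items (order_id TEXT, item TEXT)"
    )
    root = ET.fromstring(xml_text)  # the default parser already drops comments
    for item in root.findall("item"):
        conn.execute(
            "INSERT INTO order_items VALUES (?, ?)", (root.get("id"), item.text)
        )
    conn.commit()


def rebuild(conn, order_id):
    """Rebuild an XML document from the shredded rows."""
    root = ET.Element("order", {"id": order_id})
    # The table has no column recording document order, so the rows are
    # sorted by value here only to make the result deterministic.
    rows = conn.execute(
        "SELECT item FROM order_items WHERE order_id = ? ORDER BY item",
        (order_id,),
    )
    for (item,) in rows:
        ET.SubElement(root, "item").text = item
    return ET.tostring(root, encoding="unicode")


conn = sqlite3.connect(":memory:")
shred(MESSAGE, conn)
rebuilt = rebuild(conn, "42")
print(rebuilt)
```

In this sketch the rebuilt document no longer contains the comment, the note element (for which the assumed schema has no column), the white space, or the original order of the item elements. A richer schema could keep some of these, at the cost of more tables and joins.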
SOAs need persistence mechanisms for information such as the state of a business step in an application and
the lists of available Web services. Much of this information is requested frequently,
so providing caching in the middle tier can alleviate the performance
bottleneck caused by multiple requests to the same information store.
As described in [4], since SOA data and metadata are basically XML, the mid-tier caching architecture can
include an XML database as a mid-tier cache, along with a number of XQuery-powered services that are
technology-independent, reusable, and rich in functionality, thereby improving SOA scalability and performance.
Such an SOA repository can increase the performance, reliability, functionality, and usability of SOA
data. The authors state that this is made possible by an effective mid-tier caching architecture driven
by several important services: a policy-based caching service, a data repurposing service, and a data
abstraction service.
A policy-based caching service allows XQuery-based policies to be set up to cache the result
sets of low-performing services. These policies can include the time-to-live before the cache is refreshed.
Policies based on the time of day of a request can determine whether the data in the cache can be used for that
request or whether the originating source must be used. Policies based on service availability ensure that if
the service is unavailable for some reason, the required results can still be obtained from the cache.
The cache can be refreshed depending on time and other configurable parameters.
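As a rough illustration of two of these policies, time-to-live and service availability, consider the following minimal Python sketch. It is not the XQuery-based mechanism described in [4]; the PolicyCache class, the ServiceUnavailable exception, and the fetch callback are hypothetical names introduced here.

```python
import time


class ServiceUnavailable(Exception):
    """Raised by a backing service that cannot answer right now."""


class PolicyCache:
    """Hypothetical mid-tier cache with a TTL and an availability policy."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock      # injectable so the policy can be tested
        self._entries = {}      # key -> (stored_at, value)

    def get(self, key, fetch):
        """Return a cached result, calling fetch() only when needed."""
        now = self.clock()
        entry = self._entries.get(key)
        # Time-to-live policy: a fresh entry is served without a service call.
        if entry is not None and now - entry[0] < self.ttl:
            return entry[1]
        try:
            value = fetch()
        except ServiceUnavailable:
            # Availability policy: fall back to the stale entry if one exists.
            if entry is not None:
                return entry[1]
            raise
        self._entries[key] = (now, value)
        return value
```

A caller would wrap each slow service call, for example cache.get("quotes", call_quote_service), where call_quote_service is whatever function invokes the underlying Web service. A time-of-day policy could be added in the same place as the TTL check.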
A data repurposing service can provide additional filtering and search criteria on the content returned from
a given service. In addition, XQuery can be used to drive transformations for repurposing the content and
also provide analytics and reporting on the returned content. XQuery can also use portions of different
result sets and generate a final result set based on aggregation of content from multiple services.
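The kind of filtering and aggregation described above can be sketched in Python, used here only as a stand-in for XQuery. The two result sets, their element names, and the repurpose function are assumptions made for illustration; they do not come from [4].

```python
import xml.etree.ElementTree as ET

# Hypothetical result sets returned by two different services.
CATALOG_RESULT = """<products>
  <product sku="A1"><name>widget</name><price>9.50</price></product>
  <product sku="B2"><name>gadget</name><price>24.00</price></product>
</products>"""

INVENTORY_RESULT = """<stock>
  <level sku="A1">12</level>
  <level sku="B2">0</level>
</stock>"""


def repurpose(catalog_xml, inventory_xml, max_price):
    """Combine two result sets into one filtered report document."""
    catalog = ET.fromstring(catalog_xml)
    inventory = ET.fromstring(inventory_xml)
    # Index the second result set by SKU so the two can be joined.
    levels = {lvl.get("sku"): int(lvl.text) for lvl in inventory.findall("level")}
    report = ET.Element("available")
    for product in catalog.findall("product"):
        sku = product.get("sku")
        price = float(product.findtext("price"))
        # Additional filtering: keep only affordable, in-stock products.
        if price <= max_price and levels.get(sku, 0) > 0:
            entry = ET.SubElement(
                report, "product", {"sku": sku, "inStock": str(levels[sku])}
            )
            entry.text = product.findtext("name")
    return ET.tostring(report, encoding="unicode")


result = repurpose(CATALOG_RESULT, INVENTORY_RESULT, max_price=20.0)
print(result)
```

With these sample inputs, only the product with SKU A1 passes both filters, so the report contains a single product element that carries data drawn from both services.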
A data abstraction service eliminates the need for Web services to be aware of individual data
sources. Figure 1 illustrates how a data abstraction service eliminates the need to develop
separate clients and Web services for each operation. This service can also enable data source management for
different data sources such as JDBC, HTTP, WSDL, and file systems.
Figure 1. A data abstraction service eliminates the need for different Web services clients for different
operations. Source: [4]
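One way to picture such a service is as a single entry point that dispatches to per-source adapters. The following minimal Python sketch rests on that assumption; the DataAbstractionService class, the URI-scheme convention, and the in-memory adapters are hypothetical and stand in for real JDBC, HTTP, or file system access.

```python
from typing import Callable, Dict


class DataAbstractionService:
    """Hypothetical facade: one client API, many pluggable data sources."""

    def __init__(self) -> None:
        self._adapters: Dict[str, Callable[[str], str]] = {}

    def register(self, scheme: str, adapter: Callable[[str], str]) -> None:
        """Attach an adapter that knows how to read one kind of source."""
        self._adapters[scheme] = adapter

    def fetch(self, uri: str) -> str:
        """Route a request to the adapter named by the URI scheme."""
        scheme, sep, location = uri.partition("://")
        if not sep or scheme not in self._adapters:
            raise LookupError(f"no adapter registered for {uri!r}")
        return self._adapters[scheme](location)


# In-memory stand-ins for real sources, for illustration only.
FAKE_TABLES = {"customers": "<rows><row id='1'>Ada</row></rows>"}
FAKE_FILES = {"policies.xml": "<policies><ttl>60</ttl></policies>"}

service = DataAbstractionService()
service.register("jdbc", lambda table: FAKE_TABLES[table])
service.register("file", lambda path: FAKE_FILES[path])

# Callers use the same call regardless of where the data lives.
print(service.fetch("jdbc://customers"))
print(service.fetch("file://policies.xml"))
```

Under this design, supporting a new kind of data source means registering one more adapter, while the clients of the service remain unchanged.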
In addition, since these services can run on any system, an SOA repository can be used to facilitate the
federation of services in an SOA. It can also alleviate performance issues for middle-tier processes
by placing the data as close as possible to where it is processed.
Although there are several advantages to using an XML database for SOA, there are also some drawbacks.
XQuery is the query language for XML databases, just as SQL is for relational databases. However,
XQuery as currently specified is not as powerful as SQL: it can retrieve information from an XML
database, but it cannot add documents to the database, delete documents from the database, or modify existing
documents. For now, existing XML databases work around this shortcoming by
providing some kind of XQuery extension, such as XUpdate, dbXML, etc. In addition, XML
databases are still under development, and the standards for them are still being written and cover only part of
what is needed to interface with such a database. So, it will be a few more years before SOA databases
become a mainstream choice for storing SOA data.
References:
[1] Microsoft BizTalk, Learn about service oriented architecture, Available online:
http://www.microsoft.com/biztalk/solutions/soa/overview.mspx
[2] David Campbell, Service Oriented Database Architecture: App Server-Lite, in the
Proceedings of the 2005 ACM SIGMOD International Conference on Management of Data, 2005.
[3] Ash Parikh and Murty Gurajada, SOA for the real world, November 2006. Available online:
http://www.javaworld.com/javaworld/jw-11-2006/jw-1129-soa.html?page=3
[4] Ash Parikh, Robert Smik, Premal Parikh, The power behind the SOA repository, in the XML 2005
Proceedings. Available online: http://www.javaworld.com/javaworld/jw-06-2005/jw-0627webservices.html?page=1
[5] Ronald Bourret, XML And Databases, September 2005.
Available online: http://www.rpbourret.com/xml/XMLAndDatabases.htm
[6] Frank Cohen, FastSOA: Accelerate SOA with XML, XQuery, and native XML database technology, The
role of a mid-tier SOA cache architecture, Feb 2006.
Available online: http://www-128.ibm.com/developerworks/xml/library/x-accsoa/
[7] Elliotte Harold, Managing XML data: Native XML databases, June 2005.
Available online: http://www-128.ibm.com/developerworks/xml/library/x-mxd4.html