The Need for SOA Database for Storing SOA Data
Divya Gade
Rejitha Rajasekhar
ITK478 Spring ’07

Service-oriented architecture (SOA) is a collection of services that share business logic, data, and processes through a programmatic interface across a network. It allows applications from different sources to communicate through standards-based interfaces. This communication is made possible through an iterative process of creating or “exposing” new services, aggregating or “composing” these services into larger composite applications, and making the outputs available for consumption by business users [1]. The most common message format used for implementing SOA services is XML, which implies that a database designed to store SOA data must be able to handle XML data with ease.

The goal of SOA is to achieve loose coupling among the interacting software agents. Because of loose coupling and service-based partitioning, large systems exhibit more resilience to failure, and resources may be added incrementally throughout the architecture as needs change [2]. This presents a new set of challenges to the enterprise database systems supporting these scenarios.

There are three main requirements for the efficient storage of SOA data [3]. The first is the need for federated information management. When SOA implementations began to see success, businesses began to put hundreds and thousands of services on them, and service consumers were suddenly exposed to all of them. To comprehend these services, a federated view of the data and services is required. The second requirement is the need for SOA data caching: providing fast and reliable access to the large amounts of data that result from a successful SOA implementation requires some form of caching. The third requirement is that the data and services need an unprecedented level of governance to ensure their validity.
SOA governance is the process of defining and enforcing organizational policies and standards. These policies and standards are required to manage liabilities and dependencies in the business, to ensure continuity of business operations, and to reduce costs.

SOA data is usually stored in relational databases and file systems. However, SOA data is fundamentally XML, and XML data cannot be easily modeled in relational databases [4]. Storing XML in a relational database is inefficient because it is slow and does not scale. Relational databases also do a poor job of persisting and indexing hierarchical, unstructured data such as XML. They usually store XML in a field type called Binary Large Object (BLOB), which is neither efficient nor easily indexed for rapid searching. A secondary problem is that they do not handle sparse data efficiently: the choices are a single table with many NULLs, which wastes space, or many sparsely populated tables, which are expensive to join. Relational databases are also not normally optimized to work with streaming data. XML messages sent across a Web service-based network are ideal for processing with a stream-based approach that is foreign to relational databases [6].

An XML document can be shredded until it fits into relational tables, but this method is a poor choice when it comes to indexing and fast queries. In addition, shredding loses details such as element order, processing instructions, comments, white space, and other features that are important in many applications. Field and record boundaries also tend not to match the boundaries of an XML document. Applications that care about these details, for example publishing systems, need to look beyond the relational database for their information storage needs [7].
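The loss of detail caused by shredding can be illustrated with a small sketch. It is a deliberately simplified model, not how any particular database shreds documents: the functions `shred` and `rebuild` and the sample document are hypothetical, and sorting by tag name merely stands in for the fact that a relational table has no inherent row order.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document with a comment and interleaved element types.
DOC = (
    "<article><!-- reviewed --><title>SOA</title>"
    "<para>First</para><note>aside</note><para>Second</para></article>"
)


def shred(xml_text):
    """Shred the root's children into one 'column' (list of values) per tag."""
    root = ET.fromstring(xml_text)  # the default parser discards comments
    table = {}
    for child in root:
        table.setdefault(child.tag, []).append(child.text)
    return table


def rebuild(root_tag, table):
    """Rebuild a document from the shredded columns.

    The original interleaving of elements was never stored, so the best
    we can do is emit the columns in some arbitrary (here: sorted) order.
    """
    root = ET.Element(root_tag)
    for tag in sorted(table):
        for value in table[tag]:
            ET.SubElement(root, tag).text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(rebuild("article", shred(DOC)))
```

In the rebuilt document the comment is gone and the `note` element no longer sits between the two `para` elements, which is the kind of loss a publishing system cannot tolerate.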
But for some applications, such as e-commerce applications that use XML as a data transport, the data involved will most probably have a highly regular structure and will be used by non-XML applications. Furthermore, things like entities and the encodings used by XML documents are probably not important, because such an application is interested in the data and not in how it is stored in an XML document. In this case, all that is needed is a relational database and some software to transfer the data between XML documents and the database [5]. In contrast, the data in an SOA application has a less regular structure and is basically streaming and transient in nature. Thus, relational databases are not entirely capable of handling SOA data. File systems, for their part, do not provide the advanced querying and management capabilities that an SOA typically needs.

For these reasons, we believe that if SOA data is created in XML, it should be persisted, managed, and treated as XML. When the data is XML, the best method is to use a native XML database.

SOAs need persistence mechanisms for information such as the state of a business step in an application or the list of available Web services. Much of this information is requested frequently, so caching in the middle tier alleviates the performance bottleneck that multiple requests to the same information store can cause. As described in [4], since SOA data and metadata are basically XML, the mid-tier caching architecture can include an XML database as a mid-tier cache along with a number of XQuery-powered, technology-independent, reusable, and functionality-rich services, thereby improving SOA scalability and performance. Such an SOA repository can increase the performance, reliability, functionality, and usability of SOA data.
The authors state that this is made possible through an effective mid-tier caching architecture driven by several important services: a policy-based caching service, a data repurposing service, and a data abstraction service.

A policy-based caching service makes it possible to set up XQuery-based policies that cache the result sets of low-performing services. These policies can include the time-to-live before the cache is refreshed. Policies based on the time of day of a request can determine whether the data in the cache can be used for that request or whether the originating source must be used. Policies based on service availability ensure that if the service is unavailable for some reason, the required results can still be obtained from the cache. The cache can be refreshed depending on time and other configurable parameters.

A data repurposing service can provide additional filtering and search criteria on the content returned from a given service. In addition, XQuery can be used to drive transformations that repurpose the content and to provide analytics and reporting on the returned content. XQuery can also take portions of different result sets and generate a final result set that aggregates content from multiple services.

A data abstraction service eliminates the need for Web services to be aware of individual data sources. Figure 1 illustrates how a data abstraction service eliminates the need to develop separate clients and Web services for each operation. This service can also enable data source management for different data sources such as JDBC, HTTP, WSDL, and file systems.

Figure 1. A data abstraction service eliminates the need for different Web services clients for different operations. Source: [4]

In addition, since these services can run on any system, an SOA repository can be used to facilitate the federation of services in an SOA.
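The time-to-live and service-availability policies described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the architecture from [4]: the policies there are XQuery-based, whereas the `CachePolicy` and `PolicyCache` classes below are hypothetical names written in plain Python, and the injectable `clock` parameter exists only to make the behavior easy to test.

```python
import time
from dataclasses import dataclass
from typing import Any, Callable, Dict, Tuple


@dataclass
class CachePolicy:
    """Hypothetical policy: how long a result stays fresh, and whether a
    stale result may be served when the originating service is down."""
    ttl_seconds: float
    serve_stale_on_failure: bool = True


class PolicyCache:
    def __init__(self, policy: CachePolicy,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.policy = policy
        self.clock = clock
        # key -> (time the value was stored, cached value)
        self._entries: Dict[str, Tuple[float, Any]] = {}

    def get(self, key: str, fetch: Callable[[], Any]) -> Any:
        """Return a fresh cached value, or call fetch() to refresh it."""
        now = self.clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self.policy.ttl_seconds:
            return entry[1]  # still within time-to-live
        try:
            value = fetch()  # consult the originating service
        except Exception:
            # Availability policy: fall back to the stale copy if allowed.
            if entry is not None and self.policy.serve_stale_on_failure:
                return entry[1]
            raise
        self._entries[key] = (now, value)
        return value
```

A caller would wrap each slow service call, for example `cache.get("service-list", fetch_service_list)`, where `fetch_service_list` is whatever function invokes the real service. A time-of-day policy could be added as a further check before the freshness test.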
An SOA repository can also be used to alleviate performance issues for middle-tier processes by locating the data as close as possible to where it is processed.

Although there are several advantages to using an XML database for SOA, there are also some drawbacks. XML databases are queried with XQuery, much as relational databases are queried with SQL. But XQuery is not as powerful as SQL, in the sense that XQuery can retrieve information from an XML database but cannot add documents to the database, delete documents from it, modify existing documents, or do anything else. As of now, existing XML databases handle this shortcoming by providing XQuery extensions such as XUpdate, dbXML, etc. Another drawback is that XML databases are still under development: standards are only now being developed, and they cover only part of what is needed to interface with such a database. So it will be a few more years before SOA databases become a mainstream choice for storing SOA data.

References:
[1] Microsoft BizTalk, Learn about service oriented architecture. Available online: http://www.microsoft.com/biztalk/solutions/soa/overview.mspx
[2] David Campbell, Service Oriented Database Architecture: App Server-Lite, in Proceedings of the 2005 ACM SIGMOD International Conference on Management of Data, 2005.
[3] Ash Parikh and Murty Gurajada, SOA for the real world, November 2006. Available online: http://www.javaworld.com/javaworld/jw-11-2006/jw-1129-soa.html?page=3
[4] Ash Parikh, Robert Smik, and Premal Parikh, The power behind the SOA repository, in the XML 2005 Proceedings. Available online: http://www.javaworld.com/javaworld/jw-06-2005/jw-0627webservices.html?page=1
[5] Ronald Bourret, XML and Databases, September 2005. Available online: http://www.rpbourret.com/xml/XMLAndDatabases.htm
[6] Frank Cohen, FastSOA: Accelerate SOA with XML, XQuery, and native XML database technology, The role of a mid-tier SOA cache architecture, February 2006. Available online: http://www-128.ibm.com/developerworks/xml/library/x-accsoa/
[7] Elliotte Harold, Managing XML data: Native XML databases, June 2005. Available online: http://www-128.ibm.com/developerworks/xml/library/x-mxd4.html