Service Oriented Databases

Abhishek Khanolkar

Abstract: This paper introduces the concept of service oriented databases and compares several proposed architectures. We begin by differentiating central databases from service oriented databases. To date, several architectures for service oriented databases have been proposed, and some have even been implemented in commercial DBMSs. This paper evaluates a few of them.

Introduction: Service Oriented Architecture (SOA) is an emerging architecture, although it is not necessarily a new one [1]; companies have been implementing it for some time. To understand SOA we first need to understand the concept of a ‘service’. A service is defined as “a software component that encapsulates a function, has a well defined interface that includes a set of messages that the service receives and sends, and a set of named operations” [1]. Real-world examples of services are abundant: hairdressing, cooking and cleaning are good examples.

With ‘services’ defined, it is easier to discuss SOA. An SOA can be divided into three parts: 1) services, 2) applications that discover and use services, and 3) an infrastructure, protocol or medium that connects applications to services [2]. Web services can also be used to build and deploy an SOA. Web services are defined as follows: “A Web service is a software system designed to support interoperable machine-to-machine interaction over a network. It has an interface described in a machine-processable format (specifically WSDL)” [4]. SOA can also be defined in the context of Web services: “Service-Oriented Architecture is a collection of distributed, self contained Web services that communicate with each other independently of the context or state of other services” [3].

SOA and databases: Databases play a vital role in any SOA. Any service, however small, requires data to access and add value to.
It is important to note that any service operates to add value to the task it works on. A service could be used to insert data into the database or to retrieve data from it. Database systems are integrated with related applications and can be accessed only through the interfaces provided by the Application Server (AS) or middleware. There are many kinds of middleware: RPC (Remote Procedure Call), CORBA (Common Object Request Broker Architecture), J2EE, now Java EE (Java Platform, Enterprise Edition), IBM MQ, TIBCO (The Information Bus Company), webMethods, MOM (Message Oriented Middleware) and JMS (Java Message Service). All of these are design patterns, standards or commercial products used to create an SOA.

Databases are also divided into two kinds: central databases and distributed databases. Service oriented databases cover both kinds, but mostly distributed databases. In this paper, when we talk about service oriented databases we are also talking about database middleware.

Service oriented database or Database Middleware Systems: Database middleware can be defined as “systems used to integrate collections of data sources over computer network” [5]. Database middleware systems use the concept of ‘data integration servers’ [5]. A data integration server provides applications with a uniform view of the data. There are two ways of deploying an integration server: 1) Database Gateway and 2) Database Mediator [5]. In the Database Gateway approach, a commercial database is configured to access a remote database through a ‘gateway’, which is responsible for providing access methods to the remote data [5]. In the Database Mediator approach, the integration server uses a mediator server for distributed query processing. The mediator uses ‘wrappers’ to access and modify the data.
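The mediator approach can be illustrated with a minimal sketch. It is not taken from [5]: the class names (CsvLikeWrapper, TupleWrapper, Mediator), the in-memory sources and the single select operation are all assumptions made for illustration. The point is only that each wrapper translates one source's native format into a common row shape, so the mediator can evaluate a query over all sources uniformly.

```python
from typing import Callable, Dict, Iterable, List, Protocol, Tuple

Row = Dict[str, object]  # the uniform view: every source is exposed as dict rows


class Wrapper(Protocol):
    """Hypothetical wrapper interface: expose a source as uniform rows."""

    def rows(self) -> Iterable[Row]: ...


class CsvLikeWrapper:
    """Wraps a source that stores 'name,price' strings (illustrative only)."""

    def __init__(self, lines: List[str]) -> None:
        self._lines = lines

    def rows(self) -> Iterable[Row]:
        for line in self._lines:
            name, price = line.split(",")
            yield {"name": name, "price": float(price)}


class TupleWrapper:
    """Wraps a source that stores (price, name) tuples (illustrative only)."""

    def __init__(self, records: List[Tuple[float, str]]) -> None:
        self._records = records

    def rows(self) -> Iterable[Row]:
        for price, name in self._records:
            yield {"name": name, "price": float(price)}


class Mediator:
    """Runs one query over every wrapped source and merges the results."""

    def __init__(self, wrappers: List[Wrapper]) -> None:
        self._wrappers = wrappers

    def select(self, predicate: Callable[[Row], bool]) -> List[Row]:
        # The predicate is evaluated here, at the mediator, on uniform rows.
        return [row for w in self._wrappers for row in w.rows() if predicate(row)]


mediator = Mediator([
    CsvLikeWrapper(["pen,1.5", "lamp,25.0"]),
    TupleWrapper([(4.0, "cup"), (80.0, "desk")]),
])
cheap = mediator.select(lambda row: row["price"] < 10)
```

In this sketch every row travels to the mediator before the predicate is applied, and the operator code must already be present at the mediator.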
In both the gateway and the mediator approach, the user-defined types and the query operators are defined globally and contained in libraries. These user-defined libraries must be linked to the clients in the system. This design has two deficiencies: 1) the inability to deploy application-specific functionality and 2) the inability to process user-defined types efficiently. The MOCHA architecture tries to address these problems.

MOCHA: MOCHA stands for Middleware based On a Code SHipping Architecture. It is a self-extensible database middleware system designed to interconnect hundreds of distributed data sources [5]. Self-extensibility is defined as follows: “A self-extensible middleware system is one in which new application-specific functionality needed for query processing is deployed to remote sites in automatic fashion by the middleware system itself” [5]. MOCHA achieves this by shipping Java code that implements the new capabilities to remote sites, where it can then be used to manipulate data. This differs from existing database middleware systems, in which an administrator has to install all the code manually. MOCHA deploys code automatically in order to provide efficient query processing.

MOCHA distinguishes between two classes of operators: data-reducing and data-inflating. Data-reducing operators, such as aggregates and predicates, produce less data than they consume; the AVG function in Oracle is one example. MOCHA ships data-reducing operators to the data sources and evaluates them there. Data-inflating operators, which increase the size of the original data, are evaluated near the client. Since in most cases the code is smaller than the data, this placement speeds up query processing because less data has to be transferred over the network.
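The placement rule described above can be sketched in a few lines. This is a simplified illustration, not MOCHA's actual optimizer: the Operator class, the byte counts and the function names are assumptions, and the cost model counts only traffic between the data source and the client side.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Operator:
    """A query operator described only by data volumes (hypothetical model)."""

    name: str
    input_bytes: int   # bytes the operator reads
    output_bytes: int  # bytes the operator produces
    code_bytes: int    # size of the code that would be shipped


def choose_site(op: Operator) -> str:
    """Data-reducing operators run at the data source, all others near the client."""
    if op.output_bytes < op.input_bytes:
        return "data_source"
    return "client_side"


def bytes_on_network(op: Operator) -> int:
    """Estimate the traffic between data source and client side for the chosen site."""
    if choose_site(op) == "data_source":
        # Ship the code to the source, then send back only the reduced result.
        return op.code_bytes + op.output_bytes
    # Send the raw input toward the client; the inflated result stays local.
    # (Any code transfer on the client side is ignored in this simplification.)
    return op.input_bytes


# Illustrative numbers only.
avg = Operator("avg", input_bytes=1_000_000, output_bytes=8, code_bytes=2_000)
enlarge = Operator("enlarge_image", input_bytes=50_000, output_bytes=400_000, code_bytes=3_000)

for op in (avg, enlarge):
    print(op.name, choose_site(op), bytes_on_network(op))
```

With these made-up figures, evaluating the aggregate at the source moves about 2 KB instead of 1 MB, while the inflating operator moves its 50 KB input rather than its 400 KB output.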
Comparison: The MOCHA approach is very different from existing database middleware, in which either the data is processed at the integration server or the data source evaluates only those operators already present in its environment [5], and no code shipping takes place. MOCHA was motivated by the Internet and the variety of data sets spread across it: database middleware will be effective only if it is scalable and offers efficient query processing, and MOCHA offers both. If existing database middleware were to deploy code, all the libraries would also have to be configured, and this deployment effort would escalate as the number of applications across the network grows. Developers might also need to add extra functionality to the applications to make them work. MOCHA, on the other hand, provides application-specific functionality to interested sites in automatic fashion, and the query operators are then evaluated at those sites.

The MOCHA architecture is based on three components: 1) client applications, 2) the Query Processing Coordinator (QPC) and 3) the Data Access Provider (DAP). The client can be an applet, a servlet or a stand-alone Java application. It receives a request from the browser and passes it to the QPC. The QPC's job is to process and optimize the query. The DAP helps the QPC with query processing and code deployment, and it also works with the remote databases. Query operators in MOCHA are of two types: 1) projections and predicates and 2) aggregates. Aggregates are implemented in Java. The operators are executed according to the plans created by the QPC.

Important Note: In MOCHA the code deployment phase occurs on-line as an automatic process without human intervention, and no process needs to be restarted in order to use the functionality received through code deployment.

Some Technical issues with MOCHA:
1) MOCHA implemented an aggressive policy of object allocation and re-use. This improved memory management.
This was lacking in some of the existing JDBC drivers used in database middleware, where objects were created and used just once.
2) Communication between the QPC and the DAP was originally done using RMI, which was found to be slow, so the MOCHA authors built a communication layer on Java sockets.
3) Security was provided so that dangerous code cannot be executed on the host machine; the Java infrastructure was used for this purpose.

We now discuss a different architecture, one that is also implemented in a commercial product.

SODA: SODA stands for Service Oriented Database Architecture [6]; it was developed for the Microsoft SQL Server DBMS. SODA was developed with one need in mind: loosely coupled databases for loosely coupled applications. It was found that the core of many loosely coupled SOA systems was a monolithic, clustered database. The SQL Server development team therefore decided to add functionality that makes the system more service oriented, and the result was SODA. The following features are added in the upcoming SQL Server release:
1) SQLCLR: SQL (Structured Query Language) and CLR (Common Language Runtime). The CLR is embedded in the core database engine.
2) Database Change Notification: complex queries are represented so that database change notification can be provided without the need to poll the database.
3) Native Web Service Access: an administrator can publish a procedure as a Web method, which allows SOAP-compliant clients to access services directly from the database engine.
4) Service Broker: SQL Server is service-centric rather than message-centric, which is a considerable benefit.

SODA, what a database SOA must include:
1) It must support and act as a service host.
2) It may be able to directly process and transform service requests.
3) It must provide efficient and easily programmed logic.

SODA, why SOA in the database?
1) Need to maintain messages in queues persistently.
2) Ability to scale services up and down.
3) Easy support for grid computing.

Conclusion: If a comparison is made between these two architectures, I would prefer SODA to MOCHA. MOCHA is an excellent system for service oriented architectures, with features such as memory management and automatic extension and deployment of code. SODA, however, takes care of all those SOA-oriented demands while also taking care of the typical client-server demands. Moreover, if an existing architecture already uses SQL Server or a similar commercial database, it is easier to adopt the additional features provided by such a system than to move to an entirely new system or architecture like MOCHA. That is the business perspective; from the technical perspective both systems offer great opportunities. In future it is expected that SQL Server and other commercial systems may include more features such as those offered by MOCHA. It is also important to note that MOCHA was proposed at the turn of the century, about seven years ago, whereas SODA is quite current. Although good architectures are proposed, it takes ample time to implement them in commercial systems.

References:
[1] Gennaro (Jerry) Cuomo, IBM SOA “on the Edge”, SIGMOD 2005.
[2] Mira Kajko-Mattsson, Grace A. Lewis, Dennis B. Smith, A Framework for Roles for Development, Evolution and Maintenance of SOA-Based Systems, International Workshop on Systems Development in SOA Environments (SDSOA'07), 2007.
[3] Dov Dori, SODA: Not Just a Drink!
From an Object-Centered to a Balanced Object-Process Model-Based Enterprise Systems Development, Proceedings of the Fourth Workshop on Model-Based Development of Computer-Based Systems and Third International Workshop on Model-Based Methodologies for Pervasive and Embedded Software, 2006.
[4] Web Services Architecture, http://www.w3.org/TR/2004/NOTE-ws-arch-20040211/, 2004.
[5] Manuel Rodríguez-Martínez, Nick Roussopoulos, MOCHA: A Self-Extensible Database Middleware System for Distributed Data Sources, SIGMOD 2000, Dallas, TX, USA.
[6] David Campbell, Service Oriented Database Architecture: App Server-Lite?, SIGMOD 2005.