Proceedings of the 7th Annual ISC Graduate Research Symposium
ISC-GRS 2013
April 24, 2013, Rolla, Missouri

Sima Das
Department of Electrical and Computer Engineering
Missouri University of Science and Technology, Rolla, MO 65409

INTEROPERABILITY AMONG HETEROGENEOUS MOBILE MULTI-DATABASES

ABSTRACT

Advances in technology allow mobile nodes to own databases and perform transaction processing in a mobile ad hoc network (MANET). The implicit heterogeneity and autonomy of the individual databases permit us to view the underlying database system as a multi-database system. Our problem formulation is based on both mobile users and mobile data servers. In this context we propose interoperability among heterogeneous, mobile multi-databases and achieve it by integrating the underlying network architecture with the summary schema structure. Over semantically inter-related and possibly replicated databases, we consider transaction management using the concept of semantic serializability over the summary schema model. Further, we analyze the effect of mobility on the summary schema structure and on transaction management.

1. INTRODUCTION

With advances in technology, mobile devices have enough storage to act as databases and enough processing ability to process transactions. Since the data servers are mobile, they need to be connected over wireless links. Thus, a dynamically configurable network, commonly called a mobile ad hoc network (MANET), becomes part of the overall network model. A wide range of applications conforms to this model. In this environment the nodes, or mobile hosts, act as database servers. Taking into account highly mobile database servers, we consider a (manned/unmanned) vehicular network as the application domain for this paper. The application could, however, range from a set of communicating and collaborating drones performing certain tasks to ground search, rescue, and critical missions, with the latter involving lower mobility of individual nodes.
The implicit heterogeneity and autonomy of the underlying distributed database servers permit us to view the system as a mobile multi-database system, where each mobile host can access multiple databases to process global transactions. This requires interoperability among multiple heterogeneous, autonomous mobile databases. Further, to achieve higher performance we need suitable transaction management techniques over mobile multi-databases.

Transaction management over static database servers and mobile hosts is studied in [1]. There, Ongtang et al. used mobile agents over the summary schema model to manage transactions over static multi-databases. For each submitted transaction, a global agent is created to take care of the required actions. The agents act as an abstraction over the underlying network and over the mapping among the schema hierarchy. Xing et al. [2,3] proposed an optimistic concurrency control algorithm called sequential order dynamic adjustment (SODA) for centralized and distributed mobile ad hoc network databases. In SODA, a history of committed transactions is maintained to validate committing transactions. The list is dynamically adjusted during the validation process to avoid unnecessary aborts, which yields a sequential ordering among the transactions. In [3] the authors considered an energy-efficient dynamic cluster construction algorithm over a MANET, an approach that also underlies many (mobile) wireless sensor networks because of their energy constraints. In a MANET, energy is not always a constraint; it depends on the specific application. For example, applications such as vehicular networks and co-operating drones accomplishing a specific task have no energy constraint, whereas ground search, rescue, and critical missions may have one.
Further, on an energy-constrained mobile database node it is not always practical to execute time-consuming transactions, irrespective of whether the cluster head is selected dynamically by energy consumption rate or by remaining energy.

Brayner et al. [4] proposed concurrency control based on semantic serializability for MANET databases. This approach has the intrinsic assumption that databases are disjoint and that updates on a database depend only on values of data in the same database. In many practical applications, however, databases can be interdependent because of the corresponding organizational structure, and then the assumptions of semantic serializability do not hold. In this approach, global transactions are serialized at each site using strict 2PL, while each site must maintain the consistency of its own local database at the same time. Since global serializability is relaxed, there is no co-ordination among servers, and the locks held by the sub-transactions of a global transaction are released once they complete at the local sites, the limited bandwidth is saved and transaction execution time is reduced.

Ongtang et al. [1] used the summary schema model for transaction management in static multi-databases. Since the summary schema model represents a semantic abstraction over databases, parsing the transaction over the schema hierarchy is one way of achieving semantic serializability. Here the transaction commits only when all its sub-transactions are ready to commit (similar to the strict 2PL scheme), and the issue of indirect conflict is addressed, as global order is always maintained. Still, the work in [1] implicitly assumes that databases are disjoint and that updates in a database depend only on values of data in that database. The situation in which a transaction updates attributes in one database that are inter-related with, or replicated in, other databases is not discussed.
The research in [1-4] does not consider the effect of the underlying network architecture on transaction management. In this context we consider interoperability among heterogeneous mobile multi-databases as an integration of network architecture and database concepts for better transaction management. For our study, we take the (manned/unmanned) vehicular network as the standard application domain. In Section 2 we consider the underlying network architecture. In Section 3 we discuss interoperability among heterogeneous mobile multi-databases. Section 4 considers transaction management over the summary schema model. Section 5 discusses the effect of mobility on the schema hierarchy and transaction management.

2. UNDERLYING ARCHITECTURE

A (manned/unmanned) vehicular communication network consists of two types of communication. In Vehicle-to-Vehicle (V2V) communication, vehicles communicate directly with each other over WLAN using the IEEE 802.11p standard, resulting in a vehicular ad hoc network. Thus, a V2V network is self-organized, restricted to the local level (a communication range of 300 meters), and has a high data rate. Large-scale deployment over WLAN leads to congestion and frequent disconnection. In Vehicle-to-Infrastructure (V2I) communication, vehicles communicate directly with the infrastructure (base stations) over WWAN using, say, 3G. The powerful communication devices in base stations give a wider range (usually 20 miles), but WWANs have lower data rates and give limited bandwidth to vehicles.

Vehicles, or mobile nodes, are considered to have data servers (possibly heterogeneous) and can both initiate and process transactions. Vehicles move along pre-existing roads, have no energy constraint, have less computing power than base stations, and have both 802.11 and 3G interfaces. Thus, a V2V network is a specific class of MANET.
In contrast, base stations are considered to be fixed, have servers associated with them, have larger storage and computing power, and can communicate with each other over optical fiber or wireless links. We consider the server associated with each base station to have an auxiliary database. We refer to the area under each base station as a zone. If every mobile node used V2I communication, the limited 3G bandwidth would give rise to congestion, as base stations serve a much wider transmission range. Thus, we consider a combination of the two communication models, called the hybrid model.

Over this hybrid model, clusters are formed involving vehicles and base stations to give a hierarchical structure that fits the network constraints. We visualize three kinds of clusters: one involving a subset of vehicles, where communication takes place over WLAN; another involving each base station and the set of cluster heads within its zone; and a third, static cluster involving only base stations. Since there is no energy constraint, the best way to form the clusters is to use relative mobility and signal strength as parameters, as the network changes frequently with mobility. The corresponding cluster head is selected based on predefined heuristics (such as the ratio of the total number of mobile nodes within communication range to the number of mobile nodes with above-average signal level, or relative displacement). The literature contains a number of clustering and cluster head election algorithms for VANETs [5,6]. In the general MANET domain, dynamic clustering and cluster head election algorithms additionally need to take the energy constraint into account [7]. Fig. 1 represents this basic hierarchical network structure. In the following section we consider the database architecture needed over this network hierarchy to facilitate interoperability over mobile multi-databases.

3. INTEROPERABILITY AMONG HETEROGENEOUS MOBILE MULTI-DATABASES

To achieve interoperability among heterogeneous mobile multi-databases, we use the summary schema model. Ceri et al. [8] proposed the global schema model for distributed databases. Batini et al. [9] considered database schema integration methodologies. Extending the global schema structure to multi-databases, Bright et al. introduced the summary schema model in [10].

3.1. Underlying Summary Schema Structure

We consider two specific types of summary schema structures to achieve interoperability. A summary schema structure represents a semantic and structural abstraction of the underlying schema.

(1) One summary schema model is built over the autonomous, heterogeneous, mobile multi-databases.
(2) Another summary schema is built over the geographic regions of the road network.

Since the application domain is a vehicular network, the summary schema model over the geographic regions contains a hierarchy of interstate and intrastate highways, local roads, corresponding exits, significant places, etc., with the corresponding base station's subnet IP address at the lowest level of the schema hierarchy. We assume that the database designer/linguist has already defined the semantic and structural relationships among attributes for the specific application domain (here, the vehicular network). On top of this, an automatic, dynamic schema summarization algorithm is used to generate the summary schema hierarchy and to map an incoming transaction to a semantically and structurally similar schema. Further, we consider the summary schema model to be built up at four levels.

(a) WLAN-Summary Schema: Each vehicle or mobile node has a database and its corresponding local schema. When a local cluster (WLAN) is formed and a cluster head is elected, all the local schemas within it are used to generate the corresponding WLAN-summary schema at the cluster head.
This summary schema is built automatically and dynamically (as the set of mobile nodes can change) using the schema summarization algorithm. The lowest-level cells of the schema table point to the IP addresses of the vehicles with matching semantic and structural attributes, and the cells of an upper-level schema table point to the corresponding matched lower-level schema table.

(b) WWAN-Summary Schema / Intra-Zonal Summary Schema: Each cluster head in a zone communicates with the corresponding base station. The WLAN-summary schemas are used to build the intra-zonal summary schema at the base station, using the automatic schema summarization algorithm. The intra-zonal summary schema is built over all the mobile nodes of the respective zone. As in the WLAN-summary schema, the cells of the lowest-level schema table point to the IP addresses of the respective mobile nodes, and the cells of the upper-level schema tables point to the lower-level schema tables based on semantic and structural matching.

(c) Auxiliary Summary Schema: This is either a complete or a partial replica of the intra-zonal summary schema. It is built (replicated) when a mobile node leaves a zone and its corresponding data is copied to the auxiliary database. The auxiliary database and the corresponding schema are used to parse a transaction when no matching schema is found in the intra-zonal summary schema and the transaction refers to that zone only. For example, suppose a zone is completely empty and a transaction is directed to that zone to gather information about road conditions; in this case the auxiliary summary schema is used to scan the auxiliary database for any available information. When there is no vehicle with the matching schema attributes, the auxiliary database holds the most recent information for that set of attributes.
This is because a leaving vehicle updates the auxiliary database if there is no other node with matching schema attributes and a read or update operation is being performed on it. In the case of an update operation, the updated value is kept in the auxiliary database for future use. We consider the details of the use of the auxiliary schema and auxiliary database in Section 5.

(d) Inter-Zonal Summary Schema: This summary schema is static and can be built directly by the database designer, owing to the fixed base stations. It is built over the geographic road network and the corresponding base stations' subnet IP addresses, as mentioned earlier. It is used to find either the destination base station for a geographic region or the gateway base stations to the destination. It plays a role similar to that of a DNS server in a network.

The finer details of the above summary schema structure fall under the domain of the schema summarization process. The inter-zonal summary schema gives users flexibility in submitting transactions without worrying about their current zone. If the destination for transaction processing is outside the current zone, then the specific route to the destination is taken, without broadcasting through all adjacent base stations. We will see the use of these schema structures in providing interoperability during transaction execution.

3.2. Interoperability with Multiple Summary Schema Structure

The multiple layers of the summary schema structure allow transactions to be processed under their most precise domain. The suitable domain for processing a transaction can be determined using the summary schemas in the following way, once a transaction is issued at a node:

(1) Semantic and structural matching is done to check whether the transaction involves the node's own database. If it involves only the node's own database domain and
(a) A read operation, then the information is accessed locally.
(b) An update operation, then the corresponding attributes are updated, the WLAN-summary schema is searched for any other matching node, and the attribute value is updated at the respective nodes.
(2) If the outcome of the semantic and structural matching on the node's own database is NULL, then the WLAN-summary schema is searched for any matching node. If a match is found, then:
(a) If it is a read operation, the information is extracted from one of the matching nodes.
(b) If it is an update operation, all matched nodes are updated.
(3) If the outcome of matching against the WLAN-summary schema is NULL, then the transaction is passed to the base station via the cluster head. There it is matched against the inter-zonal summary schema to find its corresponding zone and base station.
(I) If it is the same base station, then the transaction is matched against the intra-zonal summary schema to find the destination node to process the transaction.
(II) If it is a different base station, then the transaction is routed to the destination subnet IP address and matched against the intra-zonal summary schema at the destination base station.
For a read or update operation, the corresponding steps are taken as in (2)(a) and (2)(b).

We see that the above summary-schema-based approach allows us to route the transaction to the destination in a guided manner, rather than by undirected broadcast. Here we limited our discussion to interoperability, but there is more to transaction execution. Above, we considered the transactions to be simple ones involving single nodes. In reality, the processing of a single transaction can span multiple nodes, and the nodes themselves may not be semantically and structurally disjoint. We consider the details of this in the discussion of transaction management in the following section.

4. TRANSACTION MANAGEMENT OVER SUMMARY SCHEMA MODEL

In order to achieve higher performance, transactions are interleaved.
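As a concrete reference point for the rest of this section, the schema-guided lookup of Section 3.2 can be illustrated with a minimal Python sketch. It is only an illustration under strong assumptions: semantic and structural matching is reduced to an exact dictionary lookup, every schema is flattened to a mapping from an attribute term to node addresses, and all names (`resolve`, `targets`, the parameters, and the example region labels) are hypothetical rather than taken from the paper.

```python
from typing import Dict, List, Set

# Assumption: a summary schema is flattened to {attribute term -> node addresses}.
Schema = Dict[str, Set[str]]


def targets(matches: Set[str], is_update: bool) -> List[str]:
    """A read needs one matching node; an update must reach all of them."""
    ordered = sorted(matches)
    return ordered if is_update else ordered[:1]


def resolve(term: str, region: str, is_update: bool,
            own_addr: str, own_terms: Set[str],
            wlan: Schema,
            inter_zonal: Dict[str, str],
            intra_zonal: Dict[str, Schema]) -> List[str]:
    """Return the node addresses that should execute the operation."""
    # Step (1): the issuing node's own database matches.
    if term in own_terms:
        if not is_update:
            return [own_addr]
        # An update is also propagated to other matching nodes in the WLAN.
        others = wlan.get(term, set()) - {own_addr}
        return [own_addr] + sorted(others)
    # Step (2): search the WLAN-summary schema held by the cluster head.
    if wlan.get(term):
        return targets(wlan[term], is_update)
    # Step (3): the base station maps the region to a zone (inter-zonal schema),
    # then the intra-zonal schema of that zone's base station is searched.
    station = inter_zonal.get(region)
    if station is None:
        return []
    return targets(intra_zonal.get(station, {}).get(term, set()), is_update)
```

The sketch returns an empty list when nothing matches; in the scheme described in the text, that case would instead fall through to the auxiliary summary schema at the base station.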
The idea underlying concurrent transaction execution that preserves atomicity and consistency is to impose a sequential order on the conflicting transactions. By conflicting transactions we mean a set of transactions that want to access the same data (semantically and structurally) at the same time, where at least one of them performs an update operation. One approach to achieving a sequential order among conflicting transactions is semantic and structural parsing. Since a schema represents a semantic and structural abstraction of the underlying attributes, it is justified to parse through the summary schema hierarchy of [10]. Further, the summary schema hierarchy provides parallelism by allowing multiple entry points for a transaction: an incoming transaction can be matched against summary schema nodes at all levels of the hierarchy.

In [1] the authors considered the summary schema hierarchy but implicitly assumed disjoint and non-replicated databases. Further, they used agents as an abstraction over the underlying network and for parsing over the schema hierarchy. Here we consider databases that can have semantically and structurally identical attributes and can be completely replicated. In addition, transaction execution can span semantically and structurally related databases involving common attributes. Further, we consider a non-agent-based model, where transactions are passed over the network as packets and parsed over the summary schema hierarchy by a semantic parser. After being parsed over the summary schema model, transactions reach the desired destination database for processing.

Let $R_i$ and $R_j$ be databases placed at two distinct nodes, and let $R_i(a_k)$ and $R_j(a_l)$ denote the attribute sets of relations $R_i$ and $R_j$, respectively. We write $\cup$ and $\cap$ for semantic and structural union and intersection, respectively. Let $R_i(a_k) \cap R_j(a_l) = \{S_i\} \neq \emptyset$. We write $T_i\text{-}O_k$ for operation $k$ of transaction $T_i$.
Let $\{a_m\} \subseteq \{S_i\}$.

(a) Suppose $T_i\text{-}O_k(\{a_m\}) = \text{Write}$ and $T_i\text{-}O_l(\{a_m\}) = \text{Read}$, or $T_i\text{-}O_k(\{a_m\}) = \text{Write}$ and $T_j\text{-}O_{k'}(\{a_m\}) = \text{Read}$. That is, sub-transactions are parsed to different local databases, or operations of different transactions are parsed to different databases, but they involve common attributes. Then $T_i\text{-}O_k(\{a_m\}) < T_i\text{-}O_l(\{a_m\})$.

(b) Suppose $T_i\text{-}O_k(\{a_m\}) = \text{Write}$ and $T_i\text{-}O_l(\{a_m\}) = \text{Write}$, or $T_i\text{-}O_k(\{a_m\}) = \text{Write}$ and $T_j\text{-}O_{k'}(\{a_m\}) = \text{Write}$. Similarly to the above situation, $T_i\text{-}O_k(\{a_m\}) < T_i\text{-}O_l(\{a_m\})$ or $T_i\text{-}O_l(\{a_m\}) < T_i\text{-}O_k(\{a_m\})$.

(c) Suppose $\{S_i\} = R_i(a_k) = R_j(a_l)$. If $T_i\text{-}O_k(\{S_i\}) = \text{Write} = T_i\text{-}O_k(R_i(a_k))$, then $\forall R_j(a_l) = R_i(a_k)$: $T_i\text{-}O_k(R_j(a_l)) = \text{Write}$. Further, $\forall T_j > T_i$: $T_i\text{-}O_k < T_j\text{-}O_{k'}$.

The above set of constraints provides conflict serializability for transactions parsed through the summary schema hierarchy when the databases are inter-related and replicated.

5. EFFECT OF MOBILITY ON SCHEMA HIERARCHY AND TRANSACTION MANAGEMENT

In this section we consider two aspects of mobility: whether mobility has any effect on the performance of the summary schema structure, and how transactions are handled when the nodes that process and/or submit them are mobile. The summary schema structure is a logical hierarchical graph based on semantic and structural heterogeneity. A mapping along this graph is a traversal from generalized to precise attributes, each of which satisfies the semantic and structural properties of all higher-level nodes. In our example vehicular network application, all mobile nodes are highly mobile. In the following paragraphs we consider the different scenarios in which a summary schema is built and maintained, and analyze the effect of mobility on each.

Let us first consider the WLAN. Here, the cluster head contains the summary schema hierarchy for the nodes within its own communication range. The cells of the lowest-level summary schema node (SSN) point to the IP addresses of the vehicles.
When a vehicle moves away from the cluster, and hence out of the communication range of the cluster head, the respective cell pointer at the lowest-level SSN is set to NULL. At the same time, there might be other mobile nodes with matching semantics within the range of the cluster head, so the lowest-level SSN may not get deleted. If there is no remaining node with the matching semantic attributes of the SSN, then the pointer to this SSN is set to NULL at its upper-level SSN. Thus, deletion depends on the heterogeneity of the semantic and structural information of the mobile nodes and on the number of heterogeneous nodes moving out. In the worst case, if the cluster is dissolved, that is, all the nodes move out of the cluster (irrespective of whether they are heterogeneous or homogeneous), the graph becomes empty. An upper-level SSN is set to NULL iff all its lower-level nodes (children) are empty. Each summary schema node is thus like a table that is populated (a pointer is added) when a match is found with its metadata fields, and from which an entry is eliminated when there is no longer a match. This hierarchical graph structure is stored in memory, like a routing table in a router that is populated and pruned as nodes are connected to and removed from its network. The cost of eliminating SSNs in the logical structure is, in the worst case, O(h), where h is the height of the semantic tree. Further, in most application domains mobile nodes move in groups, with the relative distance among nodes remaining mostly the same. Thus, changes to the cell pointers in the WLAN are comparatively unlikely.

Similarly, when a node is added to the cluster and a (new) cluster head is elected, the summary schema is built. The cost (time) to build this graph depends on the efficiency of the semantic translator.
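The pointer maintenance described above can be illustrated with a minimal Python sketch. It is a sketch under stated assumptions, not the schema summarization algorithm itself: the class `SSN`, the functions `add_host` and `remove_host`, and the representation of a match as an explicit list of terms from the root are all hypothetical, and "setting a pointer to NULL" is modelled by deleting the entry. Both operations touch one node per level, so their cost is proportional to the height h of the hierarchy.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass(eq=False)
class SSN:
    """One summary schema node (hypothetical layout)."""
    term: str
    parent: Optional["SSN"] = None
    children: Dict[str, "SSN"] = field(default_factory=dict)
    hosts: Set[str] = field(default_factory=set)  # vehicle IPs, lowest level only

    def is_empty(self) -> bool:
        return not self.children and not self.hosts


def add_host(root: SSN, path: List[str], ip: str) -> None:
    """Walk (or create) the SSNs along `path`, then record `ip` at the lowest level."""
    node = root
    for term in path:
        if term not in node.children:
            node.children[term] = SSN(term, parent=node)
        node = node.children[term]
    node.hosts.add(ip)


def remove_host(root: SSN, path: List[str], ip: str) -> int:
    """Drop `ip` and prune every ancestor left empty; return the number of SSNs pruned."""
    node: Optional[SSN] = root
    for term in path:
        node = node.children.get(term)
        if node is None:
            return 0  # no such SSN, nothing to remove
    node.hosts.discard(ip)
    pruned = 0
    # An SSN is eliminated only when it has no hosts and no children left.
    while node is not root and node.is_empty():
        parent = node.parent
        del parent.children[node.term]
        node = parent
        pruned += 1
    return pruned
```

In this sketch, a vehicle leaving while another vehicle with the same matched attributes remains prunes nothing, whereas the departure of the last matching vehicle prunes upward until an SSN with other children is reached.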
The worst-case time to add a pointer to an already existing SSN (table), or to add a new SSN, is O(h), where h is the height of the schema graph. When a new SSN is created, the nodes in the upper layers need to be adjusted to accommodate the addition. This too has worst-case cost O(h), giving an overall worst-case cost of O(h).

In the case of the WWAN, when a node moves out of its current zone the respective cell pointer in the corresponding intra-zonal schema is set to NULL. Similarly, when a node enters a zone the corresponding intra-zonal summary schema is populated. The procedure and cost are the same as for the WLAN-summary schema structure. Mobility has no effect on the inter-zonal summary schema. The effect of mobility on the auxiliary summary schema structure is discussed in the following paragraphs, along with transactions.

First, we consider that a transaction is allocated to the matched node only if that node has not initiated a handoff procedure. If the node initiates handoff while executing the transaction, the following situations can arise.

(1) The transaction is completed before the handoff procedure is done. In this case the result is returned to the destination via the base station and/or cluster heads.
(2) The transaction is yet to be completed when the handoff procedure is initiated.
(a) The transaction is aborted over that database. If there are additional matching nodes in the same zone, then the transaction is restarted there. Further, all other transactions allocated to the transiting node are transferred to a suitable matched node.
(b) There is no matched node in the intra-zonal summary schema. The state of the database, that is, the values of the corresponding data, is stored in the auxiliary database. The associated (partial) summary schema model is copied/formed as part of the auxiliary summary schema structure. The aborted transaction can now start executing at the auxiliary database.
Further, the allocated but waiting transactions are transferred for execution over the auxiliary database.
(3) If the transaction does not have any matching node in the intra-zonal schema, then the auxiliary summary schema is searched for a possible match. If a match is found, the transaction is transferred to the auxiliary database and executed there.

When a node processing a transaction moves out of a WLAN but remains within the zone, it sends the result back to the original cluster head via the new cluster head and the base station. Any cluster head keeps track of the submitted but waiting transactions until it gets the corresponding result and delivers it to the requesting node. Thus, an auxiliary database is updated when a node leaves a zone and there is no other node with the matching semantic/structural information.

Now we consider the situation where the requesting node has moved out of its WLAN or WWAN network. In either case the result is routed to the node via the base station and cluster head. The result is returned to the base station of the transaction-processing node. From there, the result is forwarded to the current location of the requesting node.

7. CONCLUSIONS

In this paper we introduced interoperability among mobile heterogeneous multi-databases by taking into consideration the underlying network architecture and the summary schema model. The summary schema approach is not suitable for resource-constrained mobile ad hoc networks, nor is time-consuming transaction processing suitable for energy-constrained mobile ad hoc networks. In some application domains, such as vehicular networks, the absence of energy and resource constraints makes the network suitable for multi-database transaction processing, for which we considered the summary schema model. Further, we considered transaction management over inter-related and replicated databases. Finally, we considered the effect of mobility on the summary schema hierarchy and on transaction management.
As a follow-up to this paper, we need to give a correctness proof of the transaction management approach over mobile multi-databases. For future research in this domain, we would like to consider novel concurrency control algorithms suitable for resource- and energy-constrained mobile ad hoc networks. Such an algorithm should be independent of the dynamic cluster head selection.

8. ACKNOWLEDGMENTS

I am grateful to the Intelligent Systems Center for its support of this research work.

9. REFERENCES

[1] Ongtang, M., Hurson, Ali R., and Jiao Y., 2009, “Agent-based Infrastructure for Data and Transaction Management in Mobile Heterogeneous Environment,” International Conference on Communication and Mobile Computing, CMC-Vol. 3, pp. 314-318.
[2] Xing, Z., Grunewald, L., and Phang, K. K., 2008, “SODA: an Algorithm to Guarantee Correctness of Concurrent Transaction Execution in Mobile P2P Databases,” Proceedings of the 19th International Conference on DEXA Workshop, pp. 337-341.
[3] Xing, Z., and Grunewald, L., 2011, “An Energy-efficient Concurrency Control Algorithm for Mobile Ad-hoc Network Databases,” Proceedings of the 22nd International Conference on Database and Expert Applications Systems, pp. 496-510.
[4] Brayner, A., and Alencar, F. S., 2005, “A Semantic-serializability Based Fully-Distributed Concurrency Control Mechanism for Mobile Multi-database Systems,” Proceedings of the 16th International Workshop on DEXA, pp. 1085-1089.
[5] Souza, E. D., 2010, “A New Aggregate Local Mobility Clustering Algorithm for VANET,” IEEE International Conference on Communications, pp. 7200-7204.
[6] Weiwei, L., 2012, “Robust Clustering for Connected Vehicles using Local Network,” IEEE International Conference on Communications, pp. 7157-7161.
[7] Bandyopadhyay, S., 2003, “An Energy Efficient Hierarchical Clustering Algorithm for Wireless Sensor Networks,” INFOCOMM-Vol. 3, pp. 1713-1723.
[8] Ceri, S., Pernici, B., and Wiederhold, G., 1987, “Distributed Database Design Methodologies,” Proc. IEEE 75, Vol.-5, pp. 533-546.
[9] Batini, C., Lenzerini, M., and Navathe, S. B., 1986, “A Comparative Analysis of Methodologies for Database Schema Integration,” ACM Comput. Survey, 18, Vol.4, pp. 322-364.
[10] Bright, M. W., and Hurson, A. R., 1990, “Summary schemas in Multi-database Systems,” Computer Engineering Tech. Rep. TR-90-076, The Pennsylvania State Univ., University Park, Penn.

[Figure 1; labels in the figure: WWAN1, WWAN2, BS1, BS2, WLAN1, WLAN2, WLAN4, CH, MN]

Fig. 1: Logical clusters with communication topology. The figure is not to scale.
CH: cluster heads, one in each cluster drawn with dotted black lines; they communicate with the BS.
MN: mobile nodes, shown as grey nodes, that communicate with other nodes within each cluster.
BS: base stations, which communicate with cluster heads and other base stations.
Dotted black lines show a logical cluster (WLAN).
Solid black links connect nodes within a cluster that can communicate directly with each other.
Solid green links represent communication between cluster heads and the BS.
Solid black rectangles represent WWANs.
The solid red line represents the communication line (optical fiber or wireless) between two base stations.