A Global Garbage Collector for Federated Database Management Systems Mulatéro Frédéric Thévenin Jean-Marc Bazex Pierre MIRA-UTM/IRIT UT1/IRIT IRIT Équipe MIRA - UFR MISSEG Université Toulouse Le Mirail 5 Allée Antonio Machado 31 052 Toulouse Cedex France mulatero@cict.fr Université Toulouse 1 Place A. France 31 042 Toulouse Cedex France IRIT Université Paul Sabatier 118, route de Narbonne 31 062 Toulouse Cedex 04 France bazex@irit.fr thevenin@univ-tlse1.fr Abstract A new trend is to federate existing and independent DataBase Management Systems (DBMS) into MultiDataBase Systems (MDBS). In this approach it is fundamental to preserve the autonomy of the federated DBMS. Indeed, the emergency of global applications should not disturb the preexisting local applications running on each local DBMS. This paper presents a global Garbage Collector (GC) integrated in a MultiDataBase System architecture which preserves DBMS autonomy. Each DBMS is supposed to have its own local GC and none assumption is made on the behavior of the local GC. In addition, there is no interaction between the global GC and the local GC. The global GC proposed is an adaptation of reference listing combined with a reverse mark and sweep technique. It has the following interesting properties: it is incremental and requires few interactions with transactions; the reverse mark and sweep technique is able to detect dead object cycles that are frequent in a DBMS context; it is able to collect objects without accessing the whole database and global synchronization of DBMS sites is not required. The Global GC works exclusively on entry and exit items without accessing global object cells stored in DBMS. Consequently it implies few I/O overhead for the DBMS. Keywords Incremental garbage collector, multidatabase system, DBMS autonomy, dead cycle detection 1. Introduction A MultiDataBase System (MDBS) [ÖV91] provides uniform access to data managed by autonomous DataBase Management Systems (DBMS) that can be heterogeneous and that are distributed across a network. Data being distributed and shared, it may be impossible for an application to determine which data are still in use and which data should be discarded. Referential persistency models conjointly used with a garbage collector provide a nice solution to ease this process. Referential persistency models [ABC+83] rely on the following rules. The user is allowed to give names to objects. Named objects are so called persistency roots. Each object referenced through a persistency root directly or indirectly is persistent. Other objects are transient and should be destroyed by the Garbage Collector (GC) if they are no more used by any application. Thus the role of the GC is to detect objects that are neither connected to persistency roots nor connected to objects in the stacks of applications, in order to discard them. A lot of works have been dedicated to GC in charge to collect objects in centralized environments [Wil92] and distributed environments [PS95, Fer96, BEN+94, LPQ92]. Some works have also been dedicated to GC in charge to collect database objects [AFG95, Bjö89, CWZ94, Det90, Gru92, KW93, ML94, MRV91, ONG93, SPD92, SP96, YNY94]. The specificity of database objects is that they have to respect the ACID properties [GR93], namely : Atomicity, Coherency, Isolation and Durability. Consequently, database objects are manipulated through transactions that run concurrently with the GC. A new trend is to federate existing and independent DBMS into MDBS. In this approach it is fundamental to preserve the autonomy of the federated DBMS. Indeed, the emergency of global applications should not disturb the preexisting local application running on each local DBMS. In this paper we propose a solution to implement a global GC for a MDBS composed of federated DBMS. This solution aims at respecting DBMS autonomy and supposes that each DBMS has its own local GC. The data of the MDBS are complex in the sense that there is no restriction in the localization of global objects in order to combine them. The algorithm proposed for the global GC is based on reference listing [ML94], a variant of reference counting [Col60]. Such algorithms do not detect object cycles which are quite frequent in object DBMS [AFG95]. To solve the problem, the proposed algorithm implements reverse mark and sweep based on inverse reference lists. This technique has two main advantages compared to regular mark and sweep techniques [McC60]: it is able to collect objects without accessing the whole database and objects are accessed only once. In addition, while running in a distributed context, it does not require global synchronization between DBMS sites. The following of the paper is organized as follows. Section 2 introduces our vocabulary and gives the main issues concerning global and local objects. Section 3 describes the architecture that supports these concepts. Section 4 presents the global GC protocol. Section 5 discusses important aspects on the global GC behavior. Section 6 concludes. 2. Global and local objects Two types of applications coexist in a federated MDBS. Local applications are limited to the manipulation of local objects stored in one DBMS. These manipulations escape to the control of the MDBS. Global applications transparently access to shared data located in the different DBMS federated by the MDBS. For each DBMS involved in the federation, objects are partitioned into three disjoint sets (see Figure 1). Local objects are only accessed by local applications. References between local objects are local to one DBMS. Global objects are shared by global applications under the control of the MDBS. A global object may contain external references, that is references to global objects located in remote DBMS. Shared local objects are accessed both by local and global applications. Such objects are local objects that are made visible to the global applications. They do not contain external references. Thus local application not under the control of the MDBS can not reach a global object through a shared local object. DBMS1 Applying referential persistency, objects are said to be persistent if they are named or referenced through a named object. In order to preserve DBMS autonomy, each DBMS manages its own set of local names for persistency roots of local objects and the federated MDBS manages a unique set of global names for persistency roots of global objects as pictured in Figure 1. SL L G G DBMS2 L External reference Local reference Global object Local object Global name Local name SL Shared Local object object Figure 1: Local, Global and Shared Local Objects 2 As mentioned above, two types of reference coexist in the federated MDBS. Local references are used to identify or send operations to local and shared local objects inside a local DBMS. External references are necessary to access global and shared local objects in the federated MDBS context. While local references provide direct access to objects inside a DBMS, external reference provide indirect access to objects across the network. As in [LPQ92], an external reference to object o is composed of (see Figure 2) a local reference to an exit item stored in the local DBMS, which in turn references an entry item stored in the DBMS containing object o, which itself contains a local reference to object o. The reference between an entry item and an exit item is a remote reference that can be implemented as the concatenation of a DBMS identifier and a local reference. In the following, if oi denotes a global object then Eidi, Xidij and Lidi respectively denote the external reference of oi which is also the reference of the entry item of oi, the reference of exit item from oj to oi, and the local reference of object oi. Local reference Remote reference Entry items EId2 LId2 EId1 XId23 LId3 EId3 LId1 XId13 Exit items Figure 2: Entry and Exit Items The operations defined on external references are the following : create external reference, assign external reference, traverse external reference, return external reference, and unassign external reference. Create external reference is used for global object creation and to change the status of a local object into shared local object. Assign external reference is necessary to establish a link between two global objects. Traverse external reference is used each time a global or shared local object is accessed. Return external reference allows a global objects method to return an external reference owned by the global object. Unassign external reference is necessary to inform that a link between two global objects has been cut. These operations are implemented through operations on exit and entry items so that the use of external references is transparent to global applications. Operations on local references are defined by the DBMS involved in the federation. 3 General Architecture It was mentioned in the introduction that there is no restriction in combining global objects to construct complex objects. Figure 3 shows an example where global object o1 stored in DBMS1 is composed of global object o2 stored in DBMS2 in turn composed of object o3 stored in DBMS3 in turn composed of object o4 stored in DBMS1. Entry item O1 Exit item O4 DBMS1 O3 DBMS3 O2 DBMS2 Local reference Global reference Figure 3: Combining Global Objects 3 This example shows that there is no limitation in the length of the composition string of complex objects and that there is no hierarchy between the DBMS involved in the composition. As a consequence, the same DBMS can in turn play the role of a client or a server. It is considered as a client when it calls the interface of a global object stored in a remote DBMS, while it is considered as a server when it implements the interface of a global object. In order to integrate a new DBMS into the federation, two modules have to be developed, namely the DBMS client module and the DBMS server module, so that the DBMS can play its two roles (see Figure 4 for an example). Named global object Global GC module Local application context DBMS1 Server module Client module Global application context Server module Server module Global GC module DBMS3 DBMS2 Global GC module Client module Figure 4: General Architecture 3.1. DBMS client module The client module is in charge to provide both the MDBS interface to global applications and the global object interface. The MDBS interface is composed of the following operations: connection to the federated MDBS, disconnection, commit of a transaction and abort of a transaction. The connect operation initiates a first transaction for the global application and the disconnect operation commit the last transaction of the application. The commit operation must ensure the durability of all the operations performed by the transaction (Atomicity and Durability) and must follow a two step commit protocol since we are in a distributed context. The abort operation must ensure that all the operations of a transaction are undone (Atomicity). The transactional aspects of this interface are further discussed in section 3.4. The global object interface consists in sending global object method calls to the appropriate DBMS server module as well as to provide access to the list of global names. This can be done following CORBA [OMG91] by mean of proxy whose objective is to encapsulate object method calls through exit items. Note that among global object methods a special method is required for global object creation. These aspects are not further discussed in this paper. It is important to mention here that since we are in a transactional context, each global object operation is performed for one transaction whose identifier is transmitted somehow. The simplest way to do it is to add the transaction identifier as a parameter of the method call mechanism. Exit items are managed by the client module. Exit items are implemented as a class of local objects stored in the DBMS associated to the client module so that they can be referenced through a local reference. The main operations of this class are : create exit item, assign exit item, traverse exit item and unassign exit item. Create exit item takes an external reference as entry parameter and returns a local reference to an exit item that can be affected to a global object. This operation is used each time an external reference is returned by a method so that an external reference is always encapsulated into an exit item. The result exit item is then bind to the application context (see Figure 4). Assign exit item and unassign exit item are used respectively to link and unlink a global object to an exit item. If the exit item is already assigned to a global object the assign exit item operation create a new exit item so that there is one exit item per external references held in the DBMS even if they refer to the 4 same object (see Figure 2). Having several exit items pointing to the same object avoids contention on exit items due to the DBMS transaction management mechanism (an exit item updated by one transaction is locked for the whole transaction). Recall that exit item should not be assigned to local shared objects. Traverse exit item returns the external reference associated to an exit item. This interface is used to send a method call to the global object associated to an exit item. It is also used to return external references: when a method of global object o needs to return the reference of an object affected to o it is necessary to return the external reference instead of the exit item. 3.2. DBMS server module The server module provides the mirror interfaces of the client module interfaces on the server side. These interfaces provide access to the transactional facilities mentioned above and transmit global objects methods calls to the appropriate object stored in a local DBMS through an entry item. This can be done by mean of stubs like in CORBA. Entry items are managed by the server module. Entry items can be implemented as a class that should provide the following interface: create entry item, traverse entry item, delete entry item. Create entry item takes a local reference to a global object cell as entry parameter and returns an external reference. This operation is used both by the method which creates global objects and to change the status of a local object into shared local object. Traverse entry item takes an external reference as entry parameter and returns a local reference to a global object cell. It is used to send a method to call the appropriate object. Delete entry item is used by the global GC to collect entry items. Contrarily to exit items, entry items are not stored in the local DBMS. In this way it is easier to provide uniform remote references in each server module. Another advantage of implementing entry items outside DBMS is to ensure that coherent parallel updates can be performed on entry items by mean of latches [ML89] instead of transactions. Latches are instant duration locks that guarantee the coherency property of parallel updates without providing the atomicity property of transactions. Latches are used to minimize synchronization between the global GC and transactions. Indeed next sections show that transactions and the global GC are run concurrently and that both update entry items. However, the global GC does not require the atomicity property for its updates. 3.3 Garbage Collection The global GC proposed aims at respecting DBMS autonomy. Consequently each DBMS is supposed to have its own local GC and none assumption is made on the behavior of the local GC. In addition, there should not be interactions between the global GC and the local GC of the DBMS involved in the federation. The purpose of the global GC is to make sure that entry items, global object cells and exit items are destroyed when no more used. On the other side, the purpose of a local GC is to collect the local objects no more used in a DBMS. We detail below how these tasks are run independently. Exit items and global objects cells are implemented as local objects and their destruction is under the responsibility of the local GC of the DBMS in which they are stored. However, only the global GC of the MDBS can decide whether a global object is alive or dead. Exit items and global objects cells are kept alive in the DBMS context by mean of local names allocated to them at creation time. The local name of an exit item can be directly stored in the exit item cell. As an exit item is unassigned, its local name is destroyed so that the local CG can collect the exit item. The local name of a global object cell is stored in the associated entry item. The global GC collect a global objects oi as follows. First each exit item Xidji pointed by oi is unassigned from the exit item of oi so that it is collected as mentioned above. Then the local name of o i stored in the entry item Eidj is destroyed so that the local GC can collect the global object's cell. Finally the entry item can be destroyed directly by the global GC since it is stored outside the DBMS. The global GC algorithm is a distributed algorithm implemented by global GC modules associated to each DBMS. This algorithm is detailed in section 4. 5 3.4 About Transaction Management and Garbage Collection Global objects as well as local objects must satisfy ACID properties. Consequently objects can only be accessed through transactions. Transaction management is not the purpose of this paper however a few hypothesis have to be stated. The federated MDBS distinguish global transactions accessing global and shared local objects from local transactions executed on each DBMS. A global transaction is supported by one local transaction per DBMS involved in the global transaction. As a global transaction commits, the commitment of the involved local transactions is synchronized though a two phase commit protocol [GR93]. Commercial DBMS provide a standard interface for the two phase commit protocol. However this interface is encapsulated by the DBMS client and server modules interface for synchronization between global transactions and the global GC. The global GC algorithm is incremental so that it runs concurrently with transactions. A first difficulty for the GC is to consider as alive volatile objects created by active transactions and not yet linked to any persistency root. The ACID properties of transactions bring the following problems [AFG95]. Atomicity brings the problem that a transaction which cuts a link between two global objects can be rolled back. This violates the fundamental assumption that dead objects always remain dead. Coherency and Isolation introduce overhead and contention problems in transaction execution. The GC should not increase this overhead and should avoid to interfere with transactions. It should not be considered as a transaction. Durability invalidates the hypothesis of some GC algorithms that the number of alive objects is small. In addition, objects visited by the GC may be stored on disk which introduce undesirable I/O overhead. Finally to enforce Durability the DBMS provide recovery mechanism that are able to re-execute actions of committed transaction in case of system crash or disk crash. The GC actions are not taken into account by the DBMS recovery mechanisms and may be lost after a crash. This may have consequences on the database recovery. The atomicity problem and the problem of volatile objects are solved using a read barrier [Bak78] implemented by the interface of entry items. The purpose of the read barrier in this context is to mark with a transaction identifier the entry item of each global object involved in a global transaction. These marks allow the global GC to considerate as alive both newly created objects and objects whose external reference have been unassigned by an active transaction. As a global transaction commits all its marks must be removed. Exit items bind to the transaction and that have not been affected to a global object are also collected by the commit operation. For each global object oi, a link is maintained between the entry item of oi and the set of exit items referenced by oi so that the global GC can traverse global object graphs without accessing the global object cells stored in the DBMS. The benefit is that the global CG works exclusively on entry and exit items without accessing the objects manipulated by DBMS transactions. In addition, the global GC doesn't need to be a transaction. Indeed exit items are only accessed in read mode by the global GC and entry items are stored outside the DBMS. A last advantage is that exit items are very small objects that can be clustered together so that accessing exit items should imply few I/O overhead for DBMS. As mentioned in the previous section, the global GC does not update or destroy DBMS objects directly. It simply destroys local names of objects that can be collected by a local GC. Consequently, the interactions with the recovery mechanism of a local DBMS are postponed to its local GC. However, transactions update entry items that are not stored in the DBMS. The server module should implement a recovery mechanism for entry items in order to ensure their durability. This aspect is not detailed in this paper. 4. Global Garbage Collector The global GC algorithm proposed is a variant of GC algorithms based on reference counting [Bak78]. In this context, each object maintains a reference counter set to the number of objects which reference it. Objects to be collected are unnamed objects with a reference counter set to zero. GC algorithms based on reference counting have the following advantages in a DBMS context. They are incremental by nature and do not require a complete scan of all the database objects for dead object detection. They also have the following drawbacks. It is impossible to detect dead object cycles, that is objects referencing each other without being reachable from a persistency root, using reference counters. Indeed, objects involved in a dead cycle have their reference counters at least set to one. A second problem is propagation. As the reference counter of object o is set to zero, it is necessary to send a decrement message to each object referenced by o before to collect o. This process can be recursive. A last drawback in a distributed context is that increment and decrement messages are not idempotent 6 if the network does not guarantee that messages are not lost nor duplicated. As a consequence, objects could be collected while still alive. The main solution to solve the cycle detection problem is to use a complementary mark and sweep technique [Hug85, LL86, LPQ92, SDP92]. The mark and sweep techniques are based on two steps [McC60]: a first step starts from roots and marks all reachable objects. Then a second step scans all objects and collects unmarked objects. To avoid that these two steps scan all the objects, some solutions are proposed to cluster object referencing each other into disjoint partitions. These two steps can then be performed independently on distinct partitions. However mark and sweep techniques do not work in perfect harmony with reference counting techniques. Several solutions have been proposed to solve the message problem [Bjö89, LQP92, SDP92, BEN+94, ML94, LC95]. Most of these solutions replace reference counters by reference lists taking into account the fact that the insert and the delete messages used to update lists are idempotent. Objects are collected when their reference lists are empty. The global GC algorithm is an adaptation of the solutions mentioned above. It combines the use of inverse reference lists with a reverse mark and sweep algorithm in order to solve both the dead object cycles and the propagation problems. An inverse reference list is assigned to each object. The inverse reference list of object o j is augmented with the global reference of oi when the external reference of oj is assigned to oi. When a link between oi and oj is cut, the corresponding exit item associated with oi is unaffected without propagating the update to the inverse reference list of oj. The reverse mark and sweep algorithm selects an object suspected to be dead and checks through its inverse references if it is reachable from a persistent root or a global transaction. If not, the suspected object and its predecessors are collected. Inverse reference lists are updated during this process. An important advantage of the reverse mark and sweep algorithm compared to other mark and sweep techniques is that it does not require a complete scan of the database objects for dead object detection and then keeps the advantage of reference counting techniques. The following of this section presents the data structures used by the global GC, explains how they are updated during object management and then details the reverse mark and sweep algorithm. 4.1. Entry and Exit Item Data Structures It was mentioned section 3.3 that the global GC can traverse global object graphs accessing only entry and exit items. Indeed all the data necessary for the GC are stored in entry and exit items in order to minimize interactions between the global GC and the DBMS transactions. Entry and Exit items are implemented as class. The main operations of these class have been presented in section 3.1 and 3.2. This section details additional operations of these required by the GC to access its data. Let us consider the entry item instance associated to global object o i. The entry item class must provide access to the following data: the local reference of oi namely Lidi; the local name of oi; the inverse reference list of objects connected to oi; the list of transactions that accessed oi and the status of oi. The local reference of oi is not used by the global GC but is required to traverse the exit item associated to o i. The local name of oi is required by the global GC to collect oi. The inverse reference list is used by the global GC to check if o i is reachable through its predecessors. The list of transaction identifiers is used to check if o i is reachable through an application transaction. This list implements the transaction marks mentioned in section 3.4. The status of o i is an information used by the reverse mark and sweep algorithm to detect objects involved in a cycle. This status initiated to quiet can change successively into suspected then cycle detector or cycle. An operation is also required to check if an exit item previously affected to o i still exists. This operation is required by the reverse mark and sweep algorithm to update the inverse reference list of o j when reference from oi to oj has been cut. Finally operations on the set of entry items managed by DBMS server are required to select the suspected objects that should be inspect by the reverse mark and sweep algorithm. Let us consider an exit item instance affected to object oi and which references object oj. The exit item class must provide access to the following data: the external reference of o j namely Eidj (traverse exit item); the local name of the exit item and the local reference to the oi entry item namely Eidi. Eidj is used to traverse the exit item. The local name is used to delete the local name of the exit item. Eid i is used conjointly with Eidj by the global GC to check if the exit item referencing oj previously affected to oi still exists. 7 4.2. Updating Entry and Exit Item Let us first consider the updates performed on entry and exit items by transaction T during the following operations: create oi, assign external reference of oj to oi, unassign external reference of oj in oi, traverse external reference of oi, return external reference of oi and commit or rollback. Operation create oi implies a call to create entry item with parameter Lidi. Local reference Lidi is affected to Eidi and T is inserted in the transaction list of Eidi. Operation assign external reference of oj to oi implies a call to assign exit item Xidij with parameter Eidi. This operation ask the server containing oj to insert Eidi in the inverse reference list of Eidj. It is not necessary to put T in the transaction list of Eidj since it is already in the transaction list of Eidi. Operation unassign external reference of oj in oi implies a call to unassign exit item Xidij. The reference Eidi is simply removed from exit item Xidij. The inverse reference list is not updated. Operation traverse external reference of o i implies a call to traverse exit item Xidki then to traverse entry item Eidi. Operation traverse entry item put T in the transaction list of Eidi. Operation return external reference of oi implies a call to traverse exit item Xidki which does not perform any update. The commit and rollback operations remove T from the transaction list of each entry item traversed by T. They also unname each exit items created by T which are not assigned. The global GC only performs two types of updates on entry items. The first one consists in updating inverse reference lists of entry items associated to global objects whose reference has been unaffected. The second one consists in changing the status of objects during the reverse mark and sweep process. It is important to note that these updates are never performed on objects marked by transactions. Thus there is very little interactions between transactions and the global GC for updating entry items. 4.3. Reverse Mark and Sweep Algorithm The reverse mark and sweep algorithm exploits a diffusing computation [DS80, CMH83] initiated at any global object suspected to be dead. It exploits inverse reference lists to find if the predecessors of the suspected object are either connected to persistency root or used by an application. In order to ease the presentation, the algorithm is first presented without considering object cycles. Adaptations of this algorithm are then proposed to collect dead object cycles. The algorithm is based on the following rules: rule 0: rule 1: rule 2: rule 3: rule 4: rule 5: rule 6: rule 7: rule 8: rule 9: the initiating object takes the status suspected; a suspected object sends a question to each of its predecessors according to its inverse reference list; a queried object that is named or marked by a transaction answers alive; a queried object that does not satisfy rule 2 and having unaffected the exit item corresponding to the query answers disconnected; a queried object that does not satisfy rule 3 and with no predecessor answers dead and is collected; a queried object that does not satisfy rule 4 takes the status suspected; as soon as an object receives an alive answer, it takes the status quiet and answers alive; as an object receives a disconnected answer, it removes the corresponding reference from its inverse reference list; if all answers are dead or disconnected, the object answers dead and is collected; when the initiating object has received all its answers, the computation is done. A new initiating object is selected. The reverse mark and sweep algorithm is composed of two steps. The first step consists in choosing an initiating object. This step can be done locally by a GC module. An initiating object must satisfy the two following conditions : (i) it is not named; (ii) it is not used by application processes. Such objects are selected through the entry items managed by the DBMS server module associated to the GC module. A global object satisfying conditions (i), (ii) and without predecessor is destroyed directly without initiating the computation (reference counting). The second step is a distributed computation that can be implemented by two operations associated with entry items in charge respectively to answer a question addressed to an entry item and to send questions to the entry item's predecessors according to rules 0 to 9. This process is illustrated in Figure 5 for the initiator o1. Object o1 takes the status suspected according to rule 0. Then o1 sends question q1 to its predecessor o2 according to rule 1. Object o2 is not named nor marked by a transaction but has three predecessors. According to rule 5 o2 takes the status suspected and sends question q2 to its first predecessor o4. The link between o4 and o2 has been cut (the exit item associated to o4 and referencing o2 has been suppressed) and o4 answers disconnected to q2 according to rule 3. According to rule 7, o2 removes o4 from its inverse reference list. Then object o2 sends question q3 to object o3. Object o3 answers dead and is collected according to rule 4 because it is unnamed and 8 has no predecessor. Then object o2 sends question q4 to object o5. Object o5 is named or marked by a transaction and answers alive according to rule 2. As o2 receives the answer alive to question 4, o2 takes the status quiet again and answers alive to q1 according to rule 6. As o1 receives the answer alive to question q1, o1 takes the status quiet again and the process stops according to rule 9. O3 O3 O4 O5 O2 O4 Inverse reference Initiating object q4 O2 External reference O5 q3 q2 q1 Named or marked object O1 O1 An object graph portion Querying inverse references starting from O1 Figure 5: Reverse Mark and Sweep Starting from O1 Note that the rules supervising step 2 are ordered. This order is important. Let us consider the example of Figure 5 again. If the transaction which has cut the link between o4 and o2 was not completed, o4 would be marked and would have answered alive to q2 according to rule 2. Indeed such a transaction can abort which would have the consequence to reintroduce the link between o4 and o2. This example shows that the global GC must be conservative (i.e. an object is not collected as soon as it becomes unreachable) which is the case due to rule 2. Rules 0 to 9 guarantee that the global GC satisfies the properties of safety that is: objects reachable from persistent roots or from active transactions must not be reclaimed. First, rule 2 guarantees that an object which is named or used by an application is not suspected. Then, rule 6 guarantees that a suspected object which has at least one reachable predecessor is not collected. Finally, rule 8 guarantees that a suspected object is collected if and only if all its predecessors answered dead (unreachable). Consequently if an answer message is lost the corresponding object is not collected. The global GC also satisfies the liveness property that is: unreachable objects should be reclaimed eventually. An unreachable object can be suspected then collected for the following reasons. The first step of the global GC can select any object as initiator using the set of entry items. All the marks inserted by a transaction in entry item's transaction lists are removed when the transaction ends (commit or abort). Rules 3 and 7 guarantee that the inverse reference list of a suspected object which has been unaffected by a commited transaction is updated. Finally an unreachable object which is queried is either directly collected according to rule 4 or suspected then collected according to rules 5 and 8. Compared to mark and sweep algorithms of the literature, the reverse mark and sweep algorithm has several important properties. Dead objects are collected during the marking phase so that the sweep phase is unnecessary. Consequently objects are accessed only once. In addition, a complete marking phase starting from an initiator access a small portion of the database and stops before reaching named objects if objects are marked by transactions. Due to these properties the reverse mark and sweep algorithm is well adapted to a DBMS context where number of alive objects is important. In addition it works in harmony with the reference counting technique aimed by the global GC. Cycle Collection A cycle is detected as a suspected object receives a question. This is illustrated in Figure 6.a with the cycle between o2 and o3. Object o2 receives a first question q1. It takes the status suspected and sends question q2 to object o3. Object o3 takes the status suspected and sends question q3 to object o2. Object o2 receives a question while being in suspected state, the cycle is detected. Figure 6.b shows that if the initiating object is involved in a cycle it will be the cycle detector. 9 O3 q2 Inverse reference Initiating object q3 O2 q1 O2 O1 q1 q2 O1 (a) (b) Figure 6: Cycle Detection In order to manage the cycle, the object that detects a cycle change its state from suspected to cycle detector and answers cycle. An object that receives a cycle answer can be in one of the two following situations (see Figure 7): either all its predecessor are dead (case of o3) or it has a predecessor not yet queried which is alive (case of o2). In the first case the object cannot answer that it is dead because an other object involved in the cycle can be alive (objects have not yet queried all their predecessors). In this case the object propagates the cycle answer. In the second case the object answers alive and is responsible to change the state from cycle to quiet for all the objects having already answered cycle (except for the cycle detector which will receive the alive answer). If the cycle detector does not receive an alive answer from one of its predecessors it is in charge to collect all the objects involved in the cycle. Note that only the cycle detector can start the cycle collection because it knows all the answers of its predecessors. These adaptations to the original algorithm are taken into account by replacing the rules 5, 6 and 8 by the rules 5a, 5b, 6a, 6b, 6c, 6d, 8a, 8b, 8c, 8d and 8e detailed below. Inverse reference Initiating object O4 q4 q2 O3 O2 Named or marked object q3 q5 q1 O5 O1 Figure 7: Deciding Whether a Cycle is Dead or Alive rule 5a: rule 5b: rule 5c: rule 6a: rule 6b: rule 6c: rule 6d: rule 8a: rule 8b: rule 8c: a queried object that does not satisfy rule 4 and having the status quiet takes the status suspected; a queried object that does not satisfy rule 5a takes the status cycle detector and answers cycle; an object that receives an cycle answer takes the status cycle; as soon as an object having the status suspected receives an alive answer, it takes the status quiet and answers alive; as soon as an object having the status cycle or cycle detector receives an alive answer, it takes the status quiet, asks its predecessor that answered cycle to change its status from cycle to quiet and answers alive; an object invoked for changing its status from cycle to quiet changes its status to quiet if it is cycle and propagates the request to its predecessor that answered cycle. an object invoked for changing its status from cycle to quiet stops the propagation of the request if it is cycle detector or quiet; if all answers are dead or disconnected and the object is suspected, the object answers dead and is collected; if all answers are dead or disconnected and the object has the status cycle, the object answers cycle; if all answers are dead or disconnected and the object is cycle detector, the object asks its predecessor that answered cycle to be collected and answers dead; 10 rule 8d: an object invoked for being collected and whose state is cycle propagates the request to its predecessor that answered cycle and is collected; rule 8e: an object invoked for being collected and whose state is cycle detector stops the propagation of the request and is collected; Let us apply this adaptation of the reverse mark and sweep algorithm to the example of Figure 7. Object o1 takes the status suspected according to rule 0 and sends question q1 to its predecessor o2 according to rule 1. Object o2 takes the status suspected and sends question q2 to its first predecessor o3 according to rules 5a and 1. In turn, object o3 takes the status suspected and sends question q3 to its first predecessor o2. Object o2 receives a question while being in status suspected, then o2 takes the status cycle detector according to rule 5b and answers cycle. Object o3 takes the status cycle according to rule 5c and sends question q4 to its second predecessor o4. Object o4 answers dead and is collected according to rule 4 because it is unnamed and has no predecessor. Object o3 answers cycle to q2 according to rule 8b. Object o2 sends question q5 to object o5. Object o5 is reachable and answers alive according to rule 2. As o2 receives the alive answer to question 5, o2 takes the status quiet, ask o3 to change its status into quiet and answers alive to q1 according to rule 6b. Object o3 asks o2 to change its status to quiet according to rule 6c and object o2 stops the propagation according to rule 6d. As o1 receives the answer alive to question q1, o1 takes the status quiet again and the process stops according to rule 9. In some cases objects can be involved in nested cycles (no elementary cycles) like in Figure 8. In such cases several nodes can take the status of cycle detector (in the worst case each cycle has a cycle detector). As for simple cycles, it is necessary to know the answers of all the predecessors of objects involved in circuits to decide that a circuit is not reachable. Consequently only one cycle detector can decide to collect the set of nested cycles, precisely the last cycle detector which does not have queried all its predecessors. The choice of the appropriate cycle detector in charge to collect a circuit depends on the nested cycles topology. This decision requires the management of a list of cycle detector identifiers propagated through the answers as pictured in Figure 8. Each time a cycle detector is detected, it answers cycle and put its identifier in the list of cycle detectors. When a cycle detector receives the answer of its last predecessor and this answer is not alive, it removes its identifier from the list of cycle detectors. If this list becomes empty, the cycle detector can decide to collect the set of cycles and answers dead. Otherwise it changes its status into cycle and propagates the cycle answer. Let us consider example Figure 8. Object o3 takes the status cycle detector when receiving question q4 and puts its reference in the cycle answer list. Object o1 takes the status cycle detector when receiving question q5 and puts its reference in the cycle answer list. Object o2 performs the union of the lists of answers to q4 and q5. When receiving answer to q2 object o3 has no more predecessors. It removes its reference from the answers list, changes its state into cycle and propagates the cycle answer to o1. Object o1 is the last cycle detector and can decide to collect the nested cycles if it is not reachable. The complete algorithm is described in [Mul97] with a proof of correctness. q1 a1 : cycle {O1} q5 O1 a3 : cycle {O1} O2 q4 O3 a4 : cycle {O3} q2 a2 : cycle {O1, O3} q3 a3 : cycle {O1, O3} ai : cycle { } stands for a cycle answer to question qi with its list of cycle detectors Figure 8: Nested Cycles 11 O4 5. Discussion The global GC algorithm can have several behaviors depending on how the first step in charge to select an initiating object is managed. A first solution is to select the initiating object among all the global objects. This solution requires a global synchronization between all the global GC modules. A second solution is to choose alternatively a global GC module in charge to select locally an initiating object. When the reverse mark and sweep process initiated at a global GC module is finished, this module can choose a new global GC module in charge to select locally a new initiator so that all the global GC modules will perform the first step one after the other. The synchronization is then limited to couples of global GC modules. A last solution is to allow the global GC modules to select initiating objects in parallel. This last solution avoids synchronization between the global GC modules but implies that several mark and sweep process can be run in parallel. If several mark and sweep processes are run in parallel the following fact occurs: a same global object can receive questions coming from distinct initiating objects. This fact raises two issues. First it is necessary to guarantee the integrity of parallel updates of the status of entry items. Secondly the questions must be sequenced since an object can receive queries from distinct initiators. The first issue is solved using latches as discussed in section 3.2. The second issue is managed as follows. A timestamp is used in order to distinguish from which initiator questions are issued. The timestamp is set by the initiator and is propagated through questions during the diffusing computation. This timestamp is stored in the entry item of each object whose status changes from quiet to suspected. If a suspected object receives a question with a timestamp different from its timestamp, the question is detected as issued from an another initiating object and is put in a queue associated to the object's entry item. Delayed questions stay in the queue until the suspected object recovers the quiet status or is collected. Indeed, according to rules 6a, 6b, 6c, 8a, 8d and 8d a suspected object either recovers the quiet status or is collected. Then, delayed questions are extracted from the queue and are answered respectively with a live or dead answer. Cycles, however, can cause deadlocks due to several diffusing computations mutually waiting for each other. Such deadlocks can be avoided with the following adaptation. As a suspected object receives a question with a timestamp greater than its timestamp, the question is not put into the queue but is answered with a message asking to suspend the computation. Such an answer can be processed as an alive answer. 6. Conclusion This paper has presented a global GC integrated in a MultiDataBase System architecture which preserves DBMS autonomy. None assumption is made on the behavior of the DBMS's local GC and the global GC works independently of the local GC. The global GC proposed is an adaptation of reference listing combined with a reverse mark and sweep technique. It has the following interesting properties: it is incremental and requires few interactions with transactions; the reverse mark and sweep technique is able to detect object cycles that are frequent in a DBMS context; it is able to collect objects without accessing the whole database. The global GC works exclusively on entry and exit items without accessing global object cells stored in DBMS. Consequently it implies few I/O overhead for the DBMS. In addition the global GC satisfies the fundamental properties of safety and liveness. Performance issues have not been directly addressed in this paper. However it was mentioned in section 5 that the global GC algorithm can have several behaviors depending on how the first step is managed. It was shown that several global GC can be run in parallel so that global synchronization of DBMS sites is not required. A parallel implementation of the global GC is in progress in order to evaluate its impact on the MDBS performances. Indeed an important objective is to avoid that garbage collection introduces overhead for the DBMS sites. An other opened issue is to avoid that the reverse mark and sweep process always inspect the same objects. Several heuristics can be applied during the step in charge to select an initiator. A simple one is to use timestamps as mentioned in section 5 in order to select the object with the smallest one. An other strategy is to take into consideration the fact that applications access a small portion of the database and to suspect in priority objects recently accessed. 12 References [ABC+83] M. Atkinson, P. Bailey, K. Chisholm, P. Cockshott, R. Morrison, An Approach to Persistent Programming, Computer Journal, 26(4), 1983. [AFG95] L. Amsaleg, M. Franklin, O. Gruber, Efficient Incremental Garbage Collection for Client-Server Object Database Systems, In Proc. of the 21th VLDB Int. Conf., Zurich, Switzerland, September 1995. [Bak78] H. G. Baker, List Processing in Real Time on a Serial Computer, CACM, 21(4):280-294, April 1978. [BEN+94] A. Birrell, D. Evers, G. Nelson, S. Owicki, E. Wobber, Distributed Garbage Collection for Network Object, Digital System Research Center Technical Report 115, 1994. [Bis77] P. B. Bishop, Computer System with a Very Large Address Space and Garbage Collection, PhD Thesis, MIT, Laboratory for Computer Science, Cambridge, MA, USA, May 1977, MIT/LCS/TR-178. [Bjö89] A. Björnerstedt, Secondary Storage Garbage Collection for Decentralized Object-Based Systems, Tsichritzis D. C. Editor Object Oriented Development, Genève, Centre Universitaire d'Informatique, 1989. [CMH83] K. M. Chandy, J. Misra, L. M. Haas, Distributed Deadlock Detection, ACM TOCS, 1(2), 1983. [Col60] G. E. Collins, A Method for Overlapping and Erasure of Lists, CACM, 2(12), December, 1960. [CWZ94] J. Cook, A. Wolf, B. Zorn, Partition Selection Policies in Object Database Garbage Collection, In SIGMOD Conf., Mineapolis, MN, May 1994. [Det90] D. Detlefs, Position Paper : Concurrent Atomic Garbage Collection, Workshop on Garbage Collection in Object-Oriented Systems, ECOOP-OOPSLA'90 Conf., Ottawa Canada, October 1990. [DS80] E. W. D. Dijkstra, C. S. Scholten, Termination Detection for Diffusing Computations, Information Processing Letters, 11(4), 1980. [Fer96] P. Ferreira, Larchant : Ramasse-Miettes dans une Mémoire Partagée Répartie avec Persistance par Atteignabilité, Thesis, Université Paris IV, Juillet 1996. [GR93] J. Gray, A. Reuter, Transaction Processing: Concepts and Techniques, Morgan Kaufmann, 1993. [Gru92] O. Gruber, Eos an Environment for Persistent and Distributed Applications over a Shared Object Space, Thesis, Université de Paris VI, France, December 1992. [Hug85] J. Hugues, A Distributed Garbage Collection Algorithm, In ACM Conf. on Functional Programming Languages and Computer Architecture, LNCS 201, Springer-Verlag, September 1985. [KW93] E. Kolodner, W. Weihl, Atomic Incremental Garbage Collection and Recovery for Large Stable Heap, In Proc. of the ACM SIGMOD Int. Conf., Washington D. C., June 1993. [LC95] S. Louboutin, V. Cahill, Lazy per Cluster Log-Keeping Mechanism for Global Garbage Detection on Amadeus, Distributed System Group, Trinity College, Dublin, Ireland, Technical Report TCD-CS-95-13, 1995. [LL86] B. Liskov, R. Ladin, Highly-Available Distributed Services and Fault-Tolerant Distributed Garbage Collection, In Proc. of the 5th Symposium of the Principles of Distributed Computing, Vancouver, Canada, August 1986. [LQP92] B. Lang, C. Queinnec, J. Piquer, Garbage Collecting the World, Conf. Record of the Nineteenth Annual ACM Symposium of Principles of Programming Languages, 1992. [McC60] J. McCarthy, Recursive Functions of Symbolic Expressions and their Computation by Machine, CACM, 3(4), April 1960. [ML94] U. Maheshwari, B. Liskov, Fault-Tolerant Garbage Collection in a Client-Server Object-Oriented Database, In Proc. of the 3rd PDIS Int. Conf., Austin Texas, September 1994. [ML89] C. Mohan, F. Levine, ARIES/IM: An Efficient and High Concurrency Index Management Method Using WriteAhead, IBM Research Report RJ6846, IBM Almaden Research Center, August 1989. [MRV91] L. Mancini, V. Rotella, S. Venosa, Copying Garbage Collection for Distributed Object Stores, In SRDS Conf., Pisa, Italy, September 1991. [Mul97] F. Mulatéro, Contrôle de Concurrence Sémantique et Ramasse-Miettes dans un Système Multibase de Données, Thesis, Université Paul Sabatier, Toulouse, France, Juin 1997. [OMG91] Object Management Group, The Common Object Request Broker Architecture : Architecture and Spec., OMG Document Number 91.12.1 Revision 1.1, 1991. [ONG93] J. Otoole, S. Nettles, D. Gifford, Concurrent Compacting Garbage Collection of a Persistent Heap, In Proc. of the 14th SOSP Int. Conf., Asheville North California, Vol. 27, Number 5, December 1993. [ÖV91] M. T. Özsu, P. Valduriez, Principles of Distributed Database Systems, Prentice-Hall Int. Editions, 1991. [PS95] D. Plaifossé, M. Shapiro, A Survey of Distributed Garbage Collection Techniques, Int. Workshop on Memory Management, Kinross Scotland, September 1995. [SDP92] M. Shapiro, P. Dickman, D. Plainfossé, Robust Distributed References and Acyclic Garbage Collection, Symposium on Principles of Distributed Computing, Vancouver Canada, August 1992. [SP96] M. Skubiszewski, N. Porteix, GC-consistent Cuts of Databases, INRIA, RA No. 2681, April 1996. [Wil92] P. R. Wilson, Uniprocessor Garbage Collection Techniques, In Proc. of the Int. Workshop on Memory Management, Number 637 in Lecture Notes in Computer Science, Springer-Verlag, Saint-Malo France, September 1992. [YNY94] V. Yong., J. Naughton, J. Yu, Storage Reclamation and Reorganization in Client-Server Persistent Object Stores, In Proc. of the Data Engineering Int. Conf., Houston Texas, February 1994. 13