A Global Garbage Collector for
Federated Database Management Systems
Mulatéro Frédéric
MIRA-UTM/IRIT
Équipe MIRA - UFR MISSEG
Université Toulouse Le Mirail
5 Allée Antonio Machado
31 052 Toulouse Cedex
France
mulatero@cict.fr

Thévenin Jean-Marc
UT1/IRIT
Université Toulouse 1
Place A. France
31 042 Toulouse Cedex
France
thevenin@univ-tlse1.fr

Bazex Pierre
IRIT
Université Paul Sabatier
118, route de Narbonne
31 062 Toulouse Cedex 04
France
bazex@irit.fr
Abstract
A recent trend is to federate existing and independent DataBase Management Systems (DBMS) into
MultiDataBase Systems (MDBS). In this approach it is fundamental to preserve the autonomy of the federated
DBMS. Indeed, the emergence of global applications should not disturb the preexisting local applications
running on each local DBMS. This paper presents a global Garbage Collector (GC) integrated into a
MultiDataBase System architecture which preserves DBMS autonomy. Each DBMS is assumed to have its own
local GC, and no assumption is made on the behavior of the local GC. In addition, there is no interaction
between the global GC and the local GC. The proposed global GC is an adaptation of reference listing
combined with a reverse mark and sweep technique. It has the following interesting properties: it is
incremental and requires little interaction with transactions; the reverse mark and sweep technique is able to
detect dead object cycles, which are frequent in a DBMS context; it is able to collect objects without accessing the
whole database; and global synchronization of DBMS sites is not required. The global GC works exclusively on
entry and exit items without accessing global object cells stored in the DBMS. Consequently it incurs little I/O
overhead for the DBMS.
Keywords
Incremental garbage collector, multidatabase system, DBMS autonomy, dead cycle detection
1. Introduction
A MultiDataBase System (MDBS) [ÖV91] provides uniform access to data managed by autonomous DataBase
Management Systems (DBMS) that can be heterogeneous and distributed across a network. Since data are
distributed and shared, it may be impossible for an application to determine which data are still in use and which
should be discarded. Referential persistency models used in conjunction with a garbage collector provide an
elegant solution to this problem.
Referential persistency models [ABC+83] rely on the following rules. The user is allowed to give names to
objects. Named objects are called persistency roots. Each object referenced through a persistency root,
directly or indirectly, is persistent. Other objects are transient and should be destroyed by the Garbage Collector
(GC) if they are no longer used by any application. Thus the role of the GC is to detect objects that are neither
connected to persistency roots nor connected to objects in the stacks of applications, in order to discard them.
A large body of work has been dedicated to GCs collecting objects in centralized environments [Wil92] and
distributed environments [PS95, Fer96, BEN+94, LPQ92]. Some work has also been dedicated to GCs
collecting database objects [AFG95, Bjö89, CWZ94, Det90, Gru92, KW93, ML94, MRV91, ONG93,
SPD92, SP96, YNY94]. The specificity of database objects is that they have to respect the ACID properties
[GR93], namely: Atomicity, Consistency, Isolation and Durability. Consequently, database objects are
manipulated through transactions that run concurrently with the GC.
A recent trend is to federate existing and independent DBMS into MDBS. In this approach it is fundamental to
preserve the autonomy of the federated DBMS. Indeed, the emergence of global applications should not disturb
the preexisting local applications running on each local DBMS.
In this paper we propose a solution to implement a global GC for an MDBS composed of federated DBMS. This
solution aims at respecting DBMS autonomy and assumes that each DBMS has its own local GC. The data of
the MDBS are complex in the sense that there is no restriction on the location of the global objects that are
combined to form them.
The algorithm proposed for the global GC is based on reference listing [ML94], a variant of reference counting
[Col60]. Such algorithms do not detect object cycles, which are quite frequent in object DBMS [AFG95]. To
solve this problem, the proposed algorithm implements a reverse mark and sweep based on inverse reference lists.
This technique has two main advantages compared to regular mark and sweep techniques [McC60]: it is able to
collect objects without accessing the whole database, and objects are accessed only once. In addition, when
running in a distributed context, it does not require global synchronization between DBMS sites.
The remainder of the paper is organized as follows. Section 2 introduces our vocabulary and presents the main
issues concerning global and local objects. Section 3 describes the architecture that supports these concepts.
Section 4 presents the global GC protocol. Section 5 discusses important aspects of the global GC behavior.
Section 6 concludes.
2. Global and local objects
Two types of applications coexist in a federated MDBS. Local applications are limited to the manipulation of
local objects stored in one DBMS. These manipulations escape the control of the MDBS. Global
applications transparently access shared data located in the different DBMS federated by the MDBS.
For each DBMS involved in the federation, objects are partitioned into three disjoint sets (see Figure 1). Local
objects are only accessed by local applications. References between local objects are local to one DBMS.
Global objects are shared by global applications under the control of the MDBS. A global object may contain
external references, that is, references to global objects located in remote DBMS. Shared local objects are
accessed both by local and global applications. Such objects are local objects that have been made visible to
global applications. They do not contain external references; thus a local application, which is not under the
control of the MDBS, cannot reach a global object through a shared local object.
Applying referential persistency, objects are said to be persistent if they are named or referenced through a
named object. In order to preserve DBMS autonomy, each DBMS manages its own set of local names for
persistency roots of local objects and the federated MDBS manages a unique set of global names for persistency
roots of global objects as pictured in Figure 1.
[Figure 1: Local, Global and Shared Local Objects — two DBMS holding local (L), global (G) and shared local (SL) objects, together with local names, global names, local references and external references.]
As mentioned above, two types of references coexist in the federated MDBS. Local references are used to
identify or send operations to local and shared local objects inside a local DBMS. External references are
necessary to access global and shared local objects in the federated MDBS context. While local references
provide direct access to objects inside a DBMS, external references provide indirect access to objects across the
network. As in [LPQ92], an external reference to object o is composed of (see Figure 2) a local reference to an
exit item stored in the local DBMS, which in turn references an entry item stored in the DBMS containing object
o, which itself contains a local reference to object o. The reference between an exit item and an entry item is a
remote reference that can be implemented as the concatenation of a DBMS identifier and a local reference. In
the following, if oi denotes a global object then Eidi, Xidij and Lidi respectively denote the external reference of oi,
which is also the reference of the entry item of oi; the reference of the exit item from oj to oi; and the local
reference of object oi.
[Figure 2: Entry and Exit Items — entry items (EId1, EId2, EId3, holding local references LId1, LId2, LId3) reached through remote references from exit items (XId13, XId23).]
The operations defined on external references are the following: create external reference, assign external
reference, traverse external reference, return external reference, and unassign external reference. Create
external reference is used for global object creation and to change the status of a local object into a shared local
object. Assign external reference is necessary to establish a link between two global objects. Traverse external
reference is used each time a global or shared local object is accessed. Return external reference allows a global
object's method to return an external reference owned by the global object. Unassign external reference signals
that a link between two global objects has been cut. These operations are implemented through operations on
exit and entry items, so that the use of external references is transparent to global applications. Operations on
local references are defined by the DBMS involved in the federation.
3. General Architecture
It was mentioned in the introduction that there is no restriction in combining global objects to construct complex
objects. Figure 3 shows an example where global object o1 stored in DBMS1 is composed of global object o2
stored in DBMS2 in turn composed of object o3 stored in DBMS3 in turn composed of object o4 stored in
DBMS1.
[Figure 3: Combining Global Objects — O1 in DBMS1 references O2 in DBMS2, which references O3 in DBMS3, which references O4 back in DBMS1, each link passing through an exit item and an entry item.]
This example shows that there is no limit on the length of the composition chain of complex objects and that
there is no hierarchy between the DBMS involved in the composition. As a consequence, the same DBMS can in
turn play the role of a client or a server. It is considered a client when it calls the interface of a global object
stored in a remote DBMS, and a server when it implements the interface of a global object.
In order to integrate a new DBMS into the federation, two modules have to be developed, namely the DBMS
client module and the DBMS server module, so that the DBMS can play its two roles (see Figure 4 for an
example).
[Figure 4: General Architecture — each DBMS is extended with a client module, a server module and a global GC module; local and global application contexts access named global objects through these modules.]
3.1. DBMS client module
The client module is in charge of providing both the MDBS interface to global applications and the global object
interface. The MDBS interface is composed of the following operations: connection to the federated MDBS,
disconnection, commit of a transaction and abort of a transaction. The connect operation initiates a first
transaction for the global application and the disconnect operation commits the last transaction of the application.
The commit operation must ensure the durability of all the operations performed by the transaction (Atomicity
and Durability) and must follow a two-phase commit protocol since we are in a distributed context. The abort
operation must ensure that all the operations of a transaction are undone (Atomicity). The transactional aspects
of this interface are further discussed in Section 3.4.
The global object interface consists in forwarding global object method calls to the appropriate DBMS server
module and in providing access to the list of global names. This can be done, following CORBA [OMG91],
by means of proxies whose objective is to encapsulate object method calls through exit items. Note that among
the global object methods a special method is required for global object creation. These aspects are not further
discussed in this paper. It is important to mention here that, since we are in a transactional context, each global
object operation is performed on behalf of one transaction whose identifier is transmitted somehow. The simplest
way to do so is to add the transaction identifier as a parameter of the method call mechanism.
Exit items are managed by the client module. Exit items are implemented as a class of local objects stored in the
DBMS associated with the client module, so that they can be referenced through a local reference. The main
operations of this class are: create exit item, assign exit item, traverse exit item and unassign exit item. Create
exit item takes an external reference as input parameter and returns a local reference to an exit item that can be
assigned to a global object. This operation is used each time an external reference is returned by a method, so that
an external reference is always encapsulated in an exit item. The resulting exit item is then bound to the application
context (see Figure 4). Assign exit item and unassign exit item are used respectively to link and unlink a global
object to an exit item. If the exit item is already assigned to a global object, the assign exit item operation creates a
new exit item, so that there is one exit item per external reference held in the DBMS even if several refer to the
same object (see Figure 2). Having several exit items pointing to the same object avoids contention on exit items
due to the DBMS transaction management mechanism (an exit item updated by one transaction is locked for the
whole transaction). Recall that exit items should not be assigned to shared local objects. Traverse exit item returns
the external reference associated with an exit item. This interface is used to send a method call to the global object
associated with an exit item. It is also used to return external references: when a method of global object o needs to
return the reference of an object assigned to o, it is necessary to return the external reference instead of the exit
item.
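The four exit item operations can be sketched as a small class. The structure below, including the fresh-item-on-reassignment behavior, is a hypothetical illustration of the interface just described, not the paper's implementation:

```python
class ExitItemTable:
    """Sketch of the client-side exit item class; all names are illustrative."""
    def __init__(self):
        self._items = {}   # local reference -> {"eid": external ref, "owner": global object}
        self._next = 0

    def create(self, external_ref):
        """create exit item: wrap an external reference, return a local reference."""
        lid = f"X{self._next}"
        self._next += 1
        self._items[lid] = {"eid": external_ref, "owner": None}
        return lid

    def assign(self, lid, owner):
        """assign exit item: link it to a global object; if it is already
        assigned, create a fresh exit item so each held reference gets its own
        item (avoiding lock contention between transactions)."""
        item = self._items[lid]
        if item["owner"] is not None:
            lid = self.create(item["eid"])
            item = self._items[lid]
        item["owner"] = owner
        return lid

    def traverse(self, lid):
        """traverse exit item: return the external reference it encapsulates."""
        return self._items[lid]["eid"]

    def unassign(self, lid):
        """unassign exit item: drop its local name so the local GC may collect it."""
        del self._items[lid]

table = ExitItemTable()
x1 = table.assign(table.create("Eid_i"), "o_j")
x2 = table.assign(x1, "o_k")   # x1 already assigned: a fresh exit item is created
```

Here `x1` and `x2` are distinct exit items that both encapsulate the same external reference `"Eid_i"`, as in Figure 2.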
3.2. DBMS server module
The server module provides, on the server side, the mirror interfaces of the client module interfaces. These
interfaces provide access to the transactional facilities mentioned above and transmit global object method calls
to the appropriate object stored in a local DBMS through an entry item. This can be done by means of stubs, as
in CORBA.
Entry items are managed by the server module. Entry items can be implemented as a class that should provide
the following interface: create entry item, traverse entry item, delete entry item. Create entry item takes a local
reference to a global object cell as input parameter and returns an external reference. This operation is used both
by the method which creates global objects and to change the status of a local object into a shared local object.
Traverse entry item takes an external reference as input parameter and returns a local reference to a global object
cell. It is used to dispatch a method call to the appropriate object. Delete entry item is used by the global GC to
collect entry items.
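A corresponding sketch of the entry item interface, kept outside the DBMS as the next paragraph explains; the class name and the tuple encoding of remote references are assumptions:

```python
class EntryItemServer:
    """Hypothetical server-side entry item table (held outside the DBMS)."""
    def __init__(self, dbms_id):
        self.dbms_id = dbms_id
        self.entries = {}            # external reference -> local object reference
        self._next = 0

    def create(self, local_ref):
        """create entry item: local reference of a cell -> external reference."""
        eid = (self.dbms_id, f"E{self._next}")
        self._next += 1
        self.entries[eid] = local_ref
        return eid

    def traverse(self, eid):
        """traverse entry item: external reference -> local object reference."""
        return self.entries[eid]

    def delete(self, eid):
        """delete entry item: invoked only by the global GC."""
        del self.entries[eid]

server = EntryItemServer("DBMS2")
eid_i = server.create("Lid_i")
```

Encoding the external reference as (DBMS identifier, entry item reference) matches the remote reference concatenation described in Section 2.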
Unlike exit items, entry items are not stored in the local DBMS. In this way it is easier to provide uniform
remote references in each server module. Another advantage of implementing entry items outside the DBMS is to
ensure that coherent parallel updates can be performed on entry items by means of latches [ML89] instead of
transactions. Latches are instant-duration locks that guarantee the coherency of parallel updates without
providing the atomicity property of transactions. Latches are used to minimize synchronization between the
global GC and transactions. Indeed, the next sections show that transactions and the global GC run concurrently
and that both update entry items. However, the global GC does not require the atomicity property for its updates.
3.3 Garbage Collection
The proposed global GC aims at respecting DBMS autonomy. Consequently each DBMS is assumed to have its
own local GC and no assumption is made on the behavior of the local GC. In addition, there should be no
interaction between the global GC and the local GC of the DBMS involved in the federation. The purpose of
the global GC is to make sure that entry items, global object cells and exit items are destroyed when no longer
used. On the other hand, the purpose of a local GC is to collect the local objects no longer used in a DBMS. We
detail below how these tasks are run independently.
Exit items and global object cells are implemented as local objects and their destruction is under the
responsibility of the local GC of the DBMS in which they are stored. However, only the global GC of the MDBS
can decide whether a global object is alive or dead. Exit items and global object cells are kept alive in the
DBMS context by means of local names allocated to them at creation time. The local name of an exit item can be
directly stored in the exit item cell. When an exit item is unassigned, its local name is destroyed so that the local GC
can collect the exit item. The local name of a global object cell is stored in the associated entry item. The global
GC collects a global object oi as follows. First, each exit item held by oi is unassigned, so that it is collected as
mentioned above. Then the local name of oi stored in the entry item Eidi is destroyed, so that the local GC can
collect the global object's cell. Finally, the entry item can be destroyed directly by the global GC since it is stored
outside the DBMS.
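The three-step destruction order can be sketched as follows, with plain dictionaries standing in for the DBMS name table and the entry item store (all names are illustrative assumptions):

```python
def collect_global_object(entry_item, exit_items, local_names, entry_store):
    """Sketch of how the global GC reclaims a global object o_i.
    Steps 1 and 2 only drop local names; the cells themselves are
    reclaimed later by the local GC of the hosting DBMS."""
    # 1. Unassign every exit item held by o_i: dropping an exit item's
    #    local name makes it collectable by the local GC.
    for lid in entry_item["held_exit_items"]:
        exit_items.pop(lid, None)
    # 2. Destroy the local name of o_i's cell (stored in its entry item),
    #    so the local GC can collect the cell.
    local_names.discard(entry_item["local_name"])
    # 3. The entry item lives outside the DBMS: the global GC deletes it directly.
    entry_store.pop(entry_item["eid"], None)

exit_items = {"Xid_ij": "..."}
local_names = {"name_of_oi", "some_other_name"}
entry_store = {"Eid_i": {"held_exit_items": ["Xid_ij"],
                         "local_name": "name_of_oi", "eid": "Eid_i"}}
collect_global_object(entry_store["Eid_i"], exit_items, local_names, entry_store)
```

After the call, the exit item and the cell's local name are gone (leaving actual cell reclamation to the local GC) and the entry item has been removed directly.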
The global GC algorithm is a distributed algorithm implemented by global GC modules associated to each
DBMS. This algorithm is detailed in section 4.
3.4 About Transaction Management and Garbage Collection
Global objects as well as local objects must satisfy the ACID properties. Consequently, objects can only be accessed
through transactions. Transaction management is not the purpose of this paper; however, a few hypotheses have to
be stated. The federated MDBS distinguishes global transactions accessing global and shared local objects from
local transactions executed on each DBMS. A global transaction is supported by one local transaction per
DBMS involved in the global transaction. When a global transaction commits, the commitment of the involved
local transactions is synchronized through a two-phase commit protocol [GR93]. Commercial DBMS provide a
standard interface for the two-phase commit protocol. However, this interface is encapsulated by the DBMS
client and server module interfaces for synchronization between global transactions and the global GC.
The global GC algorithm is incremental, so that it runs concurrently with transactions. A first difficulty for the GC
is to consider as alive the volatile objects created by active transactions and not yet linked to any persistency root.
The ACID properties of transactions bring the following problems [AFG95]. Atomicity brings the problem that
a transaction which cuts a link between two global objects can be rolled back. This violates the fundamental
assumption that dead objects always remain dead. Consistency and Isolation introduce overhead and contention
problems in transaction execution. The GC should not increase this overhead and should avoid interfering with
transactions. It should not be considered as a transaction. Durability invalidates the hypothesis of some GC
algorithms that the number of alive objects is small. In addition, objects visited by the GC may be stored on disk,
which introduces undesirable I/O overhead. Finally, to enforce Durability the DBMS provides recovery mechanisms
that are able to re-execute actions of committed transactions in case of a system crash or a disk crash. The GC actions
are not taken into account by the DBMS recovery mechanisms and may be lost after a crash. This may have
consequences on the database recovery.
The atomicity problem and the problem of volatile objects are solved using a read barrier [Bak78] implemented
by the interface of entry items. The purpose of the read barrier in this context is to mark with a transaction
identifier the entry item of each global object involved in a global transaction. These marks allow the global GC
to consider as alive both newly created objects and objects whose external reference has been unassigned by
an active transaction. When a global transaction commits, all its marks must be removed. Exit items bound to the
transaction that have not been assigned to a global object are also collected by the commit operation.
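A minimal sketch of this read barrier, assuming entry items are plain records carrying a set of transaction marks (names are illustrative):

```python
def traverse_entry_item(entries, eid, txn_id):
    """Read barrier: every traversal marks the entry item with the
    transaction identifier, so the global GC sees the object as alive."""
    entry = entries[eid]
    entry["txns"].add(txn_id)
    return entry["local_ref"]

def commit(entries, txn_id):
    """On commit (or rollback), the transaction's marks are removed
    from every entry item it traversed."""
    for entry in entries.values():
        entry["txns"].discard(txn_id)

entries = {"Eid_i": {"local_ref": "Lid_i", "txns": set()}}
lid = traverse_entry_item(entries, "Eid_i", "T1")
```

While "T1" appears in the mark set of Eid_i, the reverse mark and sweep of Section 4 must answer alive for that object; after `commit(entries, "T1")` the object is again a candidate for suspicion.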
For each global object oi, a link is maintained between the entry item of oi and the set of exit items referenced by
oi, so that the global GC can traverse global object graphs without accessing the global object cells stored in the
DBMS. The benefit is that the global GC works exclusively on entry and exit items, without accessing the objects
manipulated by DBMS transactions. In addition, the global GC does not need to be a transaction. Indeed, exit
items are only accessed in read mode by the global GC and entry items are stored outside the DBMS. A last
advantage is that exit items are very small objects that can be clustered together, so that accessing exit items
should incur little I/O overhead for the DBMS.
As mentioned in the previous section, the global GC does not update or destroy DBMS objects directly. It
simply destroys the local names of objects, which can then be collected by a local GC. Consequently, the
interactions with the recovery mechanism of a local DBMS are delegated to its local GC. However, transactions
update entry items, which are not stored in the DBMS. The server module should implement a recovery
mechanism for entry items in order to ensure their durability. This aspect is not detailed in this paper.
4. Global Garbage Collector
The proposed global GC algorithm is a variant of GC algorithms based on reference counting [Bak78]. In this
context, each object maintains a reference counter set to the number of objects which reference it. Objects to be
collected are unnamed objects with a reference counter set to zero. GC algorithms based on reference counting
have the following advantages in a DBMS context. They are incremental by nature and do not require a
complete scan of all the database objects for dead object detection. They also have the following drawbacks. It
is impossible to detect dead object cycles, that is, objects referencing each other without being reachable from a
persistency root, using reference counters. Indeed, objects involved in a dead cycle have their reference counters
set to at least one. A second problem is propagation. When the reference counter of object o drops to zero, it is
necessary to send a decrement message to each object referenced by o before collecting o. This process can be
recursive. A last drawback in a distributed context is that increment and decrement messages are not idempotent:
if the network does not guarantee that messages are neither lost nor duplicated, objects could be collected while
still alive.
The main solution to the cycle detection problem is to use a complementary mark and sweep technique
[Hug85, LL86, LPQ92, SDP92]. Mark and sweep techniques are based on two steps [McC60]: a first step
starts from the roots and marks all reachable objects; a second step then scans all objects and collects the
unmarked ones. To avoid having these two steps scan all the objects, some solutions cluster objects
referencing each other into disjoint partitions. The two steps can then be performed independently on distinct
partitions. However, mark and sweep techniques do not work in perfect harmony with reference counting
techniques. Several solutions have been proposed to solve the message problem [Bjö89, LPQ92, SDP92,
BEN+94, ML94, LC95]. Most of these solutions replace reference counters by reference lists, taking advantage
of the fact that the insert and delete messages used to update lists are idempotent. Objects are collected when
their reference lists are empty.
The global GC algorithm is an adaptation of the solutions mentioned above. It combines the use of inverse
reference lists with a reverse mark and sweep algorithm in order to solve both the dead object cycle and the
propagation problems. An inverse reference list is assigned to each object. The inverse reference list of object oj
is augmented with the global reference of oi when the external reference of oj is assigned to oi. When a link
between oi and oj is cut, the corresponding exit item associated with oi is unassigned without propagating the
update to the inverse reference list of oj. The reverse mark and sweep algorithm selects an object suspected to be
dead and checks, through its inverse references, whether it is reachable from a persistency root or a global
transaction. If not, the suspected object and its dead predecessors are collected. Inverse reference lists are
updated during this process. An important advantage of the reverse mark and sweep algorithm compared to other
mark and sweep techniques is that it does not require a complete scan of the database objects for dead object
detection, and thus keeps the advantage of reference counting techniques.
The remainder of this section presents the data structures used by the global GC, explains how they are updated
during object management, and then details the reverse mark and sweep algorithm.
4.1. Entry and Exit Item Data Structures
It was mentioned in Section 3.3 that the global GC can traverse global object graphs accessing only entry and exit
items. Indeed, all the data necessary for the GC are stored in entry and exit items in order to minimize
interactions between the global GC and the DBMS transactions. Entry and exit items are implemented as classes.
The main operations of these classes have been presented in Sections 3.1 and 3.2. This section details the
additional operations required by the GC to access its data.
Let us consider the entry item instance associated with global object oi. The entry item class must provide access to
the following data: the local reference of oi, namely Lidi; the local name of oi; the inverse reference list of objects
connected to oi; the list of transactions that accessed oi; and the status of oi. The local reference of oi is not used
by the global GC but is required to traverse the exit item associated with oi. The local name of oi is required by the
global GC to collect oi. The inverse reference list is used by the global GC to check if oi is reachable through its
predecessors. The list of transaction identifiers is used to check if oi is reachable through an application
transaction. This list implements the transaction marks mentioned in Section 3.4. The status of oi is
information used by the reverse mark and sweep algorithm to detect objects involved in a cycle. This status,
initialized to quiet, can change successively to suspected, then cycle detector or cycle. An operation is also
required to check whether an exit item previously assigned to oi still exists. This operation is required by the reverse
mark and sweep algorithm to update the inverse reference list of oj when the reference from oi to oj has been cut.
Finally, operations on the set of entry items managed by a DBMS server are required to select the suspected objects
that should be inspected by the reverse mark and sweep algorithm.
Let us consider an exit item instance assigned to object oi and which references object oj. The exit item class must
provide access to the following data: the external reference of oj, namely Eidj (traverse exit item); the local name
of the exit item; and the reference of the oi entry item, namely Eidi. Eidj is used to traverse the exit item. The
local name is used to delete the exit item's local name. Eidi is used in conjunction with Eidj by the global GC to
check whether the exit item referencing oj previously assigned to oi still exists.
4.2. Updating Entry and Exit Item
Let us first consider the updates performed on entry and exit items by a transaction T during the following
operations: create oi, assign external reference of oj to oi, unassign external reference of oj in oi, traverse external
reference of oi, return external reference of oi, and commit or rollback. Operation create oi implies a call to create
entry item with parameter Lidi. Local reference Lidi is assigned to Eidi and T is inserted in the transaction list of
Eidi. Operation assign external reference of oj to oi implies a call to assign exit item Xidij with parameter Eidi.
This operation asks the server containing oj to insert Eidi in the inverse reference list of Eidj. It is not necessary to
put T in the transaction list of Eidj since it is already in the transaction list of Eidi. Operation unassign external
reference of oj in oi implies a call to unassign exit item Xidij. The reference Eidi is simply removed from exit
item Xidij. The inverse reference list is not updated. Operation traverse external reference of oi implies a call to
traverse exit item Xidki and then to traverse entry item Eidi. Operation traverse entry item puts T in the transaction
list of Eidi. Operation return external reference of oi implies a call to traverse exit item Xidki, which does not
perform any update. The commit and rollback operations remove T from the transaction list of each entry item
traversed by T. They also unname each exit item created by T which is not assigned.
The global GC performs only two types of updates on entry items. The first one consists in updating the inverse
reference lists of entry items associated with global objects whose reference has been unassigned. The second one
consists in changing the status of objects during the reverse mark and sweep process. It is important to note that
these updates are never performed on objects marked by transactions. Thus there is very little interaction
between transactions and the global GC when updating entry items.
4.3. Reverse Mark and Sweep Algorithm
The reverse mark and sweep algorithm exploits a diffusing computation [DS80, CMH83] initiated at any global
object suspected to be dead. It exploits inverse reference lists to find out whether the predecessors of the suspected
object are either connected to a persistency root or used by an application. In order to ease the presentation, the
algorithm is first presented without considering object cycles. Adaptations of this algorithm are then proposed to
collect dead object cycles. The algorithm is based on the following rules:
rule 0: the initiating object takes the status suspected;
rule 1: a suspected object sends a question to each of its predecessors according to its inverse reference list;
rule 2: a queried object that is named or marked by a transaction answers alive;
rule 3: a queried object that does not satisfy rule 2 and that has unassigned the exit item corresponding to the query answers disconnected;
rule 4: a queried object that does not satisfy rule 3 and that has no predecessor answers dead and is collected;
rule 5: a queried object that does not satisfy rule 4 takes the status suspected;
rule 6: as soon as an object receives an alive answer, it takes the status quiet and answers alive;
rule 7: when an object receives a disconnected answer, it removes the corresponding reference from its inverse reference list;
rule 8: if all answers are dead or disconnected, the object answers dead and is collected;
rule 9: when the initiating object has received all its answers, the computation is done. A new initiating object is selected.
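Under the simplifying assumptions of this presentation (no cycles, and a single-threaded in-memory graph standing in for the distributed diffusing computation), rules 0 to 9 can be sketched as follows; the dictionary encoding and field names are illustrative:

```python
def collect(graph, oid):
    del graph[oid]

def query(graph, pred, child):
    """Ask predecessor `pred` whether it keeps `child` alive (rules 2 to 8)."""
    if pred not in graph:                      # predecessor already collected
        return "disconnected"
    node = graph[pred]
    if node["named"] or node["txn_marked"]:    # rule 2
        return "alive"
    if child not in node["refs"]:              # rule 3: the exit item was unassigned
        return "disconnected"
    if not node["preds"]:                      # rule 4
        collect(graph, pred)
        return "dead"
    return sweep(graph, pred)                  # rule 5: pred becomes suspected in turn

def sweep(graph, oid):
    """Rules 0/5, 1, 6, 7 and 8 for one suspected object."""
    node = graph[oid]
    node["status"] = "suspected"               # rules 0 and 5
    for p in list(node["preds"]):              # rule 1: query each predecessor
        answer = query(graph, p, oid)
        if answer == "alive":                  # rule 6
            node["status"] = "quiet"
            return "alive"
        if answer == "disconnected":           # rule 7
            node["preds"].remove(p)
    collect(graph, oid)                        # rule 8
    return "dead"

# The example of Figure 5: o1 is the initiator, the o4 -> o2 link has been
# cut, o3 is unnamed with no predecessor, and o5 is named.
graph = {
    "o1": {"named": False, "txn_marked": False, "refs": [],     "preds": ["o2"], "status": "quiet"},
    "o2": {"named": False, "txn_marked": False, "refs": ["o1"], "preds": ["o4", "o3", "o5"], "status": "quiet"},
    "o3": {"named": False, "txn_marked": False, "refs": ["o2"], "preds": [], "status": "quiet"},
    "o4": {"named": False, "txn_marked": False, "refs": [],     "preds": [], "status": "quiet"},
    "o5": {"named": True,  "txn_marked": False, "refs": ["o2"], "preds": [], "status": "quiet"},
}
sweep(graph, "o1")                             # rule 9: one diffusion completes
```

On this graph the dead object o3 is collected, while the named object o5 keeps o2 and hence o1 alive, both returning to the status quiet; if instead all answers had been dead or disconnected, rule 8 would have collected the initiator itself.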
The reverse mark and sweep algorithm is composed of two steps. The first step consists in choosing an initiating
object. This step can be done locally by a GC module. An initiating object must satisfy the two following
conditions: (i) it is not named; (ii) it is not used by application processes. Such objects are selected through the
entry items managed by the DBMS server module associated with the GC module. A global object satisfying
conditions (i) and (ii) and without predecessors is destroyed directly without initiating the computation (reference
counting).
The second step is a distributed computation that can be implemented by two operations associated with entry
items, in charge respectively of answering a question addressed to an entry item and of sending questions to the
entry item's predecessors according to rules 0 to 9. This process is illustrated in Figure 5 for the initiator o1. Object
o1 takes the status suspected according to rule 0. Then o1 sends question q1 to its predecessor o2 according to
rule 1. Object o2 is neither named nor marked by a transaction but has three predecessors. According to rule 5, o2
takes the status suspected and sends question q2 to its first predecessor o4. The link between o4 and o2 has been
cut (the exit item associated with o4 and referencing o2 has been deleted), so o4 answers disconnected to q2
according to rule 3. According to rule 7, o2 removes o4 from its inverse reference list. Then object o2 sends
question q3 to object o3. Object o3 answers dead and is collected according to rule 4, because it is unnamed and
has no predecessor. Then object o2 sends question q4 to object o5. Object o5 is named or marked by a
transaction and answers alive according to rule 2. As o2 receives the answer alive to question q4, o2 takes the
status quiet again and answers alive to q1 according to rule 6. As o1 receives the answer alive to question q1, o1
takes the status quiet again and the process stops according to rule 9.
[Figure: left, an object graph portion over O1 to O5; right, the querying of inverse references q1 to q4 starting from O1. Legend: inverse reference, external reference, initiating object, named or marked object.]
Figure 5: Reverse Mark and Sweep Starting from O1
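On an acyclic object graph, the behavior of rules 0 to 9 can be sketched as a sequential, in-memory simulation. In the real system each question is a message exchanged between entry items on possibly distinct DBMS sites; the Obj class, its field names and the recursive flattening of the diffusing computation are illustrative, not the paper's data structures:

```python
# Sequential, in-memory simulation of rules 0-9 on an acyclic graph.
# The Obj class and its fields are illustrative assumptions.

class Obj:
    def __init__(self, oid, reachable=False):
        self.oid = oid
        self.reachable = reachable  # named or marked by a transaction (rule 2)
        self.preds = []             # inverse reference list (predecessors)
        self.exits = set()          # exit items: oids of referenced objects
        self.status = "quiet"

def link(src, dst):
    """src references dst: record the exit item and the inverse reference."""
    src.exits.add(dst.oid)
    dst.preds.append(src)

def query(obj, about, collected):
    """Answer a question sent to obj about its reference to `about`."""
    if obj.reachable:                    # rule 2: answers alive
        return "alive"
    if about.oid not in obj.exits:       # rule 3: exit item suppressed
        return "disconnected"
    if not obj.preds:                    # rule 4: dead, collected
        collected.add(obj.oid)
        return "dead"
    return suspect(obj, collected)       # rule 5: becomes suspected

def suspect(obj, collected):
    """Rules 1, 6, 7 and 8 for a suspected object."""
    obj.status = "suspected"
    for pred in list(obj.preds):         # rule 1: query each predecessor
        answer = query(pred, obj, collected)
        if answer == "alive":            # rule 6: quiet again, answer alive
            obj.status = "quiet"
            return "alive"
        if answer == "disconnected":     # rule 7: drop the stale reference
            obj.preds.remove(pred)
    collected.add(obj.oid)               # rule 8: all dead or disconnected
    return "dead"

def collect_from(initiator):
    """Rules 0 and 9: run one reverse mark and sweep from an initiator."""
    collected = set()
    if not initiator.preds:              # direct reference counting
        collected.add(initiator.oid)
    else:
        suspect(initiator, collected)
    return collected

# The configuration of Figure 5: o5 is named or marked, the link from o4
# to o2 has been cut but o4 still appears in o2's inverse reference list.
o1, o2, o3, o4 = Obj("o1"), Obj("o2"), Obj("o3"), Obj("o4")
o5 = Obj("o5", reachable=True)
link(o2, o1)
o2.preds.append(o4)   # stale inverse reference (exit item suppressed)
link(o3, o2)
link(o5, o2)
collected = collect_from(o1)
```

The result matches the walkthrough: only o3 is collected, o4 is removed from o2's inverse reference list, and o1 and o2 end up quiet again. Note that this sketch would recurse forever on a cycle; the adaptations of the next subsection address that case.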
Note that the rules supervising step 2 are ordered, and this order is important. Let us consider the example of Figure 5 again. If the transaction which cut the link between o4 and o2 were not completed, o4 would be marked and would answer alive to q2 according to rule 2. Indeed, such a transaction can abort, which would reintroduce the link between o4 and o2. This example shows that the global GC must be conservative (i.e. an object is not collected as soon as it becomes unreachable), which is the case due to rule 2.
Rules 0 to 9 guarantee that the global GC satisfies the safety property, that is: objects reachable from persistent roots or from active transactions must not be reclaimed. First, rule 2 guarantees that an object which is named or used by an application is not suspected. Then, rule 6 guarantees that a suspected object which has at least one reachable predecessor is not collected. Finally, rule 8 guarantees that a suspected object is collected if and only if all its predecessors answered dead (unreachable). Consequently, if an answer message is lost the corresponding object is not collected. The global GC also satisfies the liveness property, that is: unreachable objects should eventually be reclaimed. An unreachable object can be suspected then collected for the following reasons. The first step of the global GC can select any object as initiator using the set of entry items. All the marks inserted by a transaction in entry item transaction lists are removed when the transaction ends (commit or abort). Rules 3 and 7 guarantee that the inverse reference list of a suspected object is updated when a committed transaction has suppressed the corresponding exit item. Finally, an unreachable object which is queried is either directly collected according to rule 4, or suspected then collected according to rules 5 and 8.
Compared to the mark and sweep algorithms of the literature, the reverse mark and sweep algorithm has several important properties. Dead objects are collected during the marking phase, so that the sweep phase is unnecessary. Consequently, objects are accessed only once. In addition, a complete marking phase starting from an initiator accesses only a small portion of the database and stops before reaching named objects or objects marked by transactions. Due to these properties the reverse mark and sweep algorithm is well adapted to a DBMS context, where the number of alive objects is large. In addition, it works in harmony with the reference counting technique used by the global GC.
Cycle Collection
A cycle is detected when a suspected object receives a question. This is illustrated in Figure 6.a with the cycle between o2 and o3. Object o2 receives a first question q1. It takes the status suspected and sends question q2 to object o3. Object o3 takes the status suspected and sends question q3 to object o2. Object o2 receives a question while being in the suspected state: the cycle is detected. Figure 6.b shows that if the initiating object is involved in a cycle, it will be the cycle detector.
[Figure: (a) a cycle between O2 and O3 queried from the initiator O1 with questions q1 to q3; (b) the initiating object O1 itself involved in a cycle with O2, with questions q1 and q2. Legend: inverse reference, initiating object.]
Figure 6: Cycle Detection
In order to manage the cycle, the object that detects a cycle changes its state from suspected to cycle detector and answers cycle. An object that receives a cycle answer can be in one of the two following situations (see Figure 7): either all its predecessors are dead (case of o3), or it has a not-yet-queried predecessor which is alive (case of o2). In the first case the object cannot answer that it is dead, because another object involved in the cycle can be alive (objects have not yet queried all their predecessors). In this case the object propagates the cycle answer. In the second case the object answers alive and is responsible for changing the state from cycle to quiet for all the objects having already answered cycle (except for the cycle detector, which will receive the alive answer). If the cycle detector does not receive an alive answer from one of its predecessors, it is in charge of collecting all the objects involved in the cycle. Note that only the cycle detector can start the cycle collection, because it knows all the answers of its predecessors. These adaptations to the original algorithm are taken into account by replacing rules 5, 6 and 8 by rules 5a, 5b, 5c, 6a, 6b, 6c, 6d, 8a, 8b, 8c, 8d and 8e detailed below.
[Figure: objects O1 to O5 with a cycle between O2 and O3; O4 is an unreachable predecessor of O3 and O5 a named or marked predecessor of O2; questions q1 to q5 starting from the initiator O1. Legend: inverse reference, initiating object, named or marked object.]
Figure 7: Deciding Whether a Cycle is Dead or Alive
rule 5a: a queried object that does not satisfy rule 4 and has the status quiet takes the status suspected;
rule 5b: a queried object that does not satisfy rule 5a takes the status cycle detector and answers cycle;
rule 5c: an object that receives a cycle answer takes the status cycle;
rule 6a: as soon as an object having the status suspected receives an alive answer, it takes the status quiet and answers alive;
rule 6b: as soon as an object having the status cycle or cycle detector receives an alive answer, it takes the status quiet, asks its predecessor that answered cycle to change its status from cycle to quiet, and answers alive;
rule 6c: an object invoked for changing its status from cycle to quiet changes its status to quiet if it is cycle and propagates the request to its predecessor that answered cycle;
rule 6d: an object invoked for changing its status from cycle to quiet stops the propagation of the request if it is cycle detector or quiet;
rule 8a: if all answers are dead or disconnected and the object is suspected, the object answers dead and is collected;
rule 8b: if all answers are dead or disconnected and the object has the status cycle, the object answers cycle;
rule 8c: if all answers are dead or disconnected and the object is cycle detector, the object asks its predecessor that answered cycle to be collected and answers dead;
rule 8d: an object invoked for being collected and whose state is cycle propagates the request to its predecessor that answered cycle and is collected;
rule 8e: an object invoked for being collected and whose state is cycle detector stops the propagation of the request and is collected.
Let us apply this adaptation of the reverse mark and sweep algorithm to the example of Figure 7. Object o1 takes the status suspected according to rule 0 and sends question q1 to its predecessor o2 according to rule 1. Object o2 takes the status suspected and sends question q2 to its first predecessor o3 according to rules 5a and 1. In turn, object o3 takes the status suspected and sends question q3 to its first predecessor o2. Object o2 receives a question while being in the status suspected, so o2 takes the status cycle detector according to rule 5b and answers cycle. Object o3 takes the status cycle according to rule 5c and sends question q4 to its second predecessor o4. Object o4 answers dead and is collected according to rule 4 because it is unnamed and has no predecessor. Object o3 answers cycle to q2 according to rule 8b. Object o2 sends question q5 to object o5. Object o5 is reachable and answers alive according to rule 2. As o2 receives the alive answer to question q5, o2 takes the status quiet, asks o3 to change its status to quiet, and answers alive to q1 according to rule 6b. Object o3 asks o2 to change its status to quiet according to rule 6c, and object o2 stops the propagation according to rule 6d. As o1 receives the alive answer to question q1, o1 takes the status quiet again and the process stops according to rule 9.
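The status transitions prescribed by rules 5a to 8e can be summarized as a small state table. The event names on the left are illustrative shorthand for the rule triggers, not the paper's terminology:

```python
# Transitions of an entry item status under the cycle-handling rules.
# Event names are illustrative shorthand; "collected" marks reclamation.
TRANSITIONS = {
    ("quiet",          "queried"):          "suspected",       # rule 5a
    ("suspected",      "queried"):          "cycle detector",  # rule 5b
    ("suspected",      "cycle answer"):     "cycle",           # rule 5c
    ("suspected",      "alive answer"):     "quiet",           # rule 6a
    ("cycle",          "alive answer"):     "quiet",           # rule 6b
    ("cycle detector", "alive answer"):     "quiet",           # rule 6b
    ("cycle",          "quiet request"):    "quiet",           # rule 6c
    ("cycle detector", "quiet request"):    "cycle detector",  # rule 6d: stop
    ("suspected",      "all dead answers"): "collected",       # rule 8a
    ("cycle",          "all dead answers"): "cycle",           # rule 8b: answers cycle
    ("cycle detector", "all dead answers"): "collected",       # rule 8c
    ("cycle",          "collect request"):  "collected",       # rule 8d
    ("cycle detector", "collect request"):  "collected",       # rule 8e
}

def next_status(status, event):
    """Return the next status; unlisted pairs leave the status unchanged."""
    return TRANSITIONS.get((status, event), status)
```

For instance, a suspected object that is queried again becomes the cycle detector, while a quiet object receiving a quiet request stops the propagation unchanged.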
In some cases objects can be involved in nested (non-elementary) cycles, like in Figure 8. In such cases several nodes can take the status cycle detector (in the worst case each cycle has a cycle detector). As for simple cycles, it is necessary to know the answers of all the predecessors of the objects involved in the circuits to decide that a circuit is not reachable. Consequently, only one cycle detector can decide to collect the set of nested cycles, namely the last cycle detector which has not yet queried all its predecessors. The choice of the appropriate cycle detector in charge of collecting a circuit depends on the nested cycles topology. This decision requires the management of a list of cycle detector identifiers propagated through the answers, as pictured in Figure 8. Each time a cycle detector is detected, it answers cycle and puts its identifier in the list of cycle detectors. When a cycle detector receives the answer of its last predecessor and this answer is not alive, it removes its identifier from the list of cycle detectors. If this list becomes empty, the cycle detector can decide to collect the set of cycles and answers dead. Otherwise it changes its status to cycle and propagates the cycle answer. Let us consider the example of Figure 8. Object o3 takes the status cycle detector when receiving question q4 and puts its reference in the cycle answer list. Object o1 takes the status cycle detector when receiving question q5 and puts its reference in the cycle answer list. Object o2 performs the union of the lists of the answers to q4 and q5. When receiving the answer to q2, object o3 has no more predecessors to query. It removes its reference from the answer list, changes its state to cycle and propagates the cycle answer to o1. Object o1 is the last cycle detector and can decide to collect the nested cycles if it is not reachable. The complete algorithm is described in [Mul97] with a proof of correctness.
[Figure: nested cycles among objects O1 to O4; questions q1 to q5 with cycle answers a1: cycle {O1}, a2: cycle {O1, O3}, a3: cycle {O1} then cycle {O1, O3}, a4: cycle {O3}.]
ai : cycle { } stands for a cycle answer to question qi with its list of cycle detectors
Figure 8: Nested Cycles
5. Discussion
The global GC algorithm can have several behaviors depending on how the first step in charge to select an
initiating object is managed. A first solution is to select the initiating object among all the global objects. This
solution requires a global synchronization between all the global GC modules. A second solution is to choose
alternatively a global GC module in charge to select locally an initiating object. When the reverse mark and
sweep process initiated at a global GC module is finished, this module can choose a new global GC module in
charge to select locally a new initiator so that all the global GC modules will perform the first step one after the
other. The synchronization is then limited to couples of global GC modules. A last solution is to allow the
global GC modules to select initiating objects in parallel. This last solution avoids synchronization between the
global GC modules but implies that several mark and sweep process can be run in parallel.
If several mark and sweep processes run in parallel, the same global object can receive questions coming from distinct initiating objects. This fact raises two issues. First, it is necessary to guarantee the integrity of parallel updates of the status of entry items. Second, the questions must be sequenced, since an object can receive queries from distinct initiators. The first issue is solved using latches, as discussed in section 3.2. The second issue is managed as follows.
A timestamp is used in order to distinguish from which initiator questions are issued. The timestamp is set by the initiator and is propagated through questions during the diffusing computation. This timestamp is stored in the entry item of each object whose status changes from quiet to suspected. If a suspected object receives a question with a timestamp different from its own, the question is detected as issued from another initiating object and is put in a queue associated with the object's entry item. Delayed questions stay in the queue until the suspected object recovers the quiet status or is collected. Indeed, according to rules 6a, 6b, 6c, 8a, 8d and 8e, a suspected object either recovers the quiet status or is collected. Then, delayed questions are extracted from the queue and are answered respectively with an alive or dead answer. Cycles, however, can cause deadlocks due to several diffusing computations mutually waiting for each other. Such deadlocks can be avoided with the following adaptation. When a suspected object receives a question with a timestamp greater than its own, the question is not put into the queue but is answered with a message asking to suspend the computation. Such an answer is processed as an alive answer.
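The decision taken by a suspected object on an incoming question can be sketched as follows; the function name and return values are illustrative:

```python
# Decision taken by a suspected object on an incoming question when several
# diffusing computations run in parallel. Names are illustrative.

def handle_question(own_timestamp, question_timestamp):
    """Route a question arriving at a suspected object."""
    if question_timestamp == own_timestamp:
        return "process"   # same computation: apply the rules normally
    if question_timestamp > own_timestamp:
        # Deadlock avoidance: answer with a message asking the younger
        # computation to suspend; the sender treats it like an alive answer.
        return "suspend"
    return "queue"         # delay until quiet again or collected
```

Queued questions are released with an alive or dead answer once the object's fate is settled, which bounds the waiting among concurrent computations.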
6. Conclusion
This paper has presented a global GC integrated in a MultiDataBase System architecture which preserves DBMS autonomy. No assumption is made on the behavior of the DBMSs' local GCs, and the global GC works independently of them.
The global GC proposed is an adaptation of reference listing combined with a reverse mark and sweep technique. It has the following interesting properties: it is incremental and requires few interactions with transactions; the reverse mark and sweep technique is able to detect the dead object cycles that are frequent in a DBMS context; and it is able to collect objects without accessing the whole database. The global GC works exclusively on entry and exit items, without accessing the global object cells stored in the DBMS. Consequently it implies little I/O overhead for the DBMS. In addition, the global GC satisfies the fundamental properties of safety and liveness.
Performance issues have not been directly addressed in this paper. However, it was mentioned in section 5 that the global GC algorithm can have several behaviors depending on how the first step is managed. It was shown that several global GCs can run in parallel, so that global synchronization of DBMS sites is not required. A parallel implementation of the global GC is in progress in order to evaluate its impact on MDBS performance. Indeed, an important objective is to avoid that garbage collection introduces overhead for the DBMS sites. Another open issue is to avoid that the reverse mark and sweep process always inspects the same objects. Several heuristics can be applied during the step in charge of selecting an initiator. A simple one is to use the timestamps mentioned in section 5 and to select the object with the smallest one. Another strategy is to take into account the fact that applications access a small portion of the database, and to suspect recently accessed objects first.
References
[ABC+83] M. Atkinson, P. Bailey, K. Chisholm, P. Cockshott, R. Morrison, An Approach to Persistent Programming, Computer Journal, 26(4), 1983.
[AFG95] L. Amsaleg, M. Franklin, O. Gruber, Efficient Incremental Garbage Collection for Client-Server Object Database Systems, In Proc. of the 21st VLDB Int. Conf., Zurich, Switzerland, September 1995.
[Bak78] H. G. Baker, List Processing in Real Time on a Serial Computer, CACM, 21(4):280-294, April 1978.
[BEN+94] A. Birrell, D. Evers, G. Nelson, S. Owicki, E. Wobber, Distributed Garbage Collection for Network Objects, Digital Systems Research Center Technical Report 115, 1994.
[Bis77] P. B. Bishop, Computer Systems with a Very Large Address Space and Garbage Collection, PhD Thesis, MIT, Laboratory for Computer Science, Cambridge, MA, USA, May 1977, MIT/LCS/TR-178.
[Bjö89] A. Björnerstedt, Secondary Storage Garbage Collection for Decentralized Object-Based Systems, in D. C. Tsichritzis (Ed.), Object Oriented Development, Centre Universitaire d'Informatique, Genève, 1989.
[CMH83] K. M. Chandy, J. Misra, L. M. Haas, Distributed Deadlock Detection, ACM TOCS, 1(2), 1983.
[Col60] G. E. Collins, A Method for Overlapping and Erasure of Lists, CACM, 3(12), December 1960.
[CWZ94] J. Cook, A. Wolf, B. Zorn, Partition Selection Policies in Object Database Garbage Collection, In Proc. of the ACM SIGMOD Conf., Minneapolis, MN, May 1994.
[Det90] D. Detlefs, Position Paper: Concurrent Atomic Garbage Collection, Workshop on Garbage Collection in Object-Oriented Systems, ECOOP/OOPSLA'90 Conf., Ottawa, Canada, October 1990.
[DS80] E. W. Dijkstra, C. S. Scholten, Termination Detection for Diffusing Computations, Information Processing Letters, 11(4), 1980.
[Fer96] P. Ferreira, Larchant : Ramasse-Miettes dans une Mémoire Partagée Répartie avec Persistance par Atteignabilité, PhD Thesis, Université Paris IV, July 1996.
[GR93] J. Gray, A. Reuter, Transaction Processing: Concepts and Techniques, Morgan Kaufmann, 1993.
[Gru92] O. Gruber, Eos, an Environment for Persistent and Distributed Applications over a Shared Object Space, PhD Thesis, Université de Paris VI, France, December 1992.
[Hug85] J. Hughes, A Distributed Garbage Collection Algorithm, In ACM Conf. on Functional Programming Languages and Computer Architecture, LNCS 201, Springer-Verlag, September 1985.
[KW93] E. Kolodner, W. Weihl, Atomic Incremental Garbage Collection and Recovery for a Large Stable Heap, In Proc. of the ACM SIGMOD Int. Conf., Washington, D.C., June 1993.
[LC95] S. Louboutin, V. Cahill, Lazy per-Cluster Log-Keeping Mechanism for Global Garbage Detection on Amadeus, Technical Report TCD-CS-95-13, Distributed Systems Group, Trinity College, Dublin, Ireland, 1995.
[LL86] B. Liskov, R. Ladin, Highly-Available Distributed Services and Fault-Tolerant Distributed Garbage Collection, In Proc. of the 5th ACM Symposium on Principles of Distributed Computing, Vancouver, Canada, August 1986.
[LQP92] B. Lang, C. Queinnec, J. Piquer, Garbage Collecting the World, In Conf. Record of the 19th Annual ACM Symposium on Principles of Programming Languages, 1992.
[McC60] J. McCarthy, Recursive Functions of Symbolic Expressions and their Computation by Machine, CACM, 3(4), April 1960.
[ML94] U. Maheshwari, B. Liskov, Fault-Tolerant Garbage Collection in a Client-Server Object-Oriented Database, In Proc. of the 3rd PDIS Int. Conf., Austin, Texas, September 1994.
[ML89] C. Mohan, F. Levine, ARIES/IM: An Efficient and High Concurrency Index Management Method Using Write-Ahead Logging, IBM Research Report RJ6846, IBM Almaden Research Center, August 1989.
[MRV91] L. Mancini, V. Rotella, S. Venosa, Copying Garbage Collection for Distributed Object Stores, In Proc. of the SRDS Conf., Pisa, Italy, September 1991.
[Mul97] F. Mulatéro, Contrôle de Concurrence Sémantique et Ramasse-Miettes dans un Système Multibase de Données, PhD Thesis, Université Paul Sabatier, Toulouse, France, June 1997.
[OMG91] Object Management Group, The Common Object Request Broker Architecture: Architecture and Specification, OMG Document Number 91.12.1, Revision 1.1, 1991.
[ONG93] J. O'Toole, S. Nettles, D. Gifford, Concurrent Compacting Garbage Collection of a Persistent Heap, In Proc. of the 14th SOSP Conf., Asheville, North Carolina, Vol. 27, Number 5, December 1993.
[ÖV91] M. T. Özsu, P. Valduriez, Principles of Distributed Database Systems, Prentice-Hall Int. Editions, 1991.
[PS95] D. Plainfossé, M. Shapiro, A Survey of Distributed Garbage Collection Techniques, In Proc. of the Int. Workshop on Memory Management, Kinross, Scotland, September 1995.
[SDP92] M. Shapiro, P. Dickman, D. Plainfossé, Robust Distributed References and Acyclic Garbage Collection, In Proc. of the Symposium on Principles of Distributed Computing, Vancouver, Canada, August 1992.
[SP96] M. Skubiszewski, N. Porteix, GC-consistent Cuts of Databases, INRIA, Research Report No. 2681, April 1996.
[Wil92] P. R. Wilson, Uniprocessor Garbage Collection Techniques, In Proc. of the Int. Workshop on Memory Management, Number 637 in Lecture Notes in Computer Science, Springer-Verlag, Saint-Malo, France, September 1992.
[YNY94] V. Yong, J. Naughton, J. Yu, Storage Reclamation and Reorganization in Client-Server Persistent Object Stores, In Proc. of the Data Engineering Int. Conf., Houston, Texas, February 1994.