In the name of God
Distributed Database Systems
(Technical report 1)
University of Tehran
Electrical and Computer Engineering Dept.
Directed By: Dr. M. Rahgozar
By: Samira Tasharofi
Reza Basseda
Summer 2005
Abstract: A distributed database system is a database whose relations reside at different sites, are replicated at different sites, or are split between different sites. The advantages that can be achieved by a distributed database system have motivated a large body of research aimed at extending its use to different environments. This paper provides an overview of the different aspects of distributed database systems and the state of the art for each of them. It also highlights the points that need further work.
1. Introduction
Distributed Database System (DDS) technology lies at the intersection of two technologies, namely database systems and computer networks.
A distributed computing system consists of a number of processing elements (not necessarily homogeneous) that are interconnected by a computer network and that cooperate in performing their assigned tasks. A distributed database can be defined as a collection of multiple, logically interrelated databases distributed over a computer network. A distributed database management system is a software system that permits the management of the distributed database and makes the distribution transparent to the users. A typical distributed database has the following features:
- The entire database is distributed over a number of distant sites, potentially with replication of some portion of the data.
- Every site has the means to access data situated at a remote site.
- The records kept at a site are accessed more frequently by locally submitted transactions than by remote transactions.
A distributed database provides a number of advantages, such as:
- Local autonomy
- Improved performance (through proper fragmentation)
- Improved reliability/availability (through replication)
- Greater expandability
- Greater shareability
It also has some disadvantages, such as:
- Higher complexity
- Higher software and hardware cost
- Synchronization and coordination among the sites
- Higher maintenance overhead in the case of replication
- Greater security problems
Section 2 of this paper covers distributed data storage. Distributed transactions and their commit protocols are described in Sections 3 and 4 respectively. Section 5 considers concurrency control in distributed database systems. Section 6 focuses on the availability of data and on approaches that improve availability. Sections 7 and 8 investigate query evaluation and heterogeneous distributed databases. Finally, Section 9 provides the conclusion.
2. Distributed Data Storage
Data allocation is a critical aspect of distributed database systems: a poorly designed data allocation can lead to inefficient computation, high access costs, and high network loads, whereas a well-designed data allocation can enhance data availability, diminish access time, and minimize the overall usage of resources. It is thus very important to provide distributed database systems with an efficient means of achieving effective data allocation. Two important issues in distributed data storage are fragmentation and replication. Some advanced features have also been proposed and are described below.
2.1 Fragmentation
In order to distribute data in a distributed database, the information must be fragmented. The goal of fragmentation is to minimize the total data transfer cost incurred in executing a set of queries.
There are three kinds of fragmentation, horizontal, vertical and mixed, as described below [1]:
Vertical fragments are created by dividing a global relation R on its attributes by applying the project operator: R_j = Π_{{A_j}, key}(R), where 1 ≤ j ≤ m, {A_j} is a set of attributes not in the primary key upon which the vertical fragment is defined, and m is the maximum number of fragments. A vertical fragmentation schema is complete when every attribute in the original global relation can be found in some vertical fragment defined on that relation. The reconstruction rule is then satisfied by a join on the primary key(s):
R_j ∈ {R_1, R_2, …, R_m} : R = ⋈_key R_j
The disjointness rule does not apply in a strict sense to vertical fragmentation, since the reconstruction rule can only be satisfied when the primary key is included in each fragment.
Horizontal fragmentation divides a global relation R on its tuples by use of the selection operator: R_j = σ_{P_j}(R), where 1 ≤ j ≤ m, P_j is the selection condition expressed as a simple predicate, and m is the maximum number of fragments. The horizontal fragmentation schema satisfies the completeness rule if the selection predicates are complete. Furthermore, if a horizontal fragmentation schema is complete, the reconstruction rule is satisfied by a union operation over all the fragments:
R_j ∈ {R_1, R_2, …, R_m} : R = ∪ R_j
Disjointness is ensured when the selection predicates defining the fragments are mutually exclusive.
Mixed fragmentation is a hybrid fragmentation schema; it is a combination of horizontal and vertical fragments. If the correctness and disjointness rules are satisfied for the constituent fragments, they are implicitly satisfied for the entire hybrid schema. Reconstruction is achieved by applying the reconstruction operators in the reverse order of fragment definition.
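As a minimal illustration (not taken from [1]), the following Python sketch applies horizontal and vertical fragmentation to a relation represented as a list of dictionaries and then reconstructs it; the relation "employee", its attributes and the predicates are hypothetical.

# Sketch of horizontal/vertical fragmentation and reconstruction.
employee = [
    {"id": 1, "name": "Ann", "dept": "HR", "salary": 1000},
    {"id": 2, "name": "Bob", "dept": "IT", "salary": 1500},
]

# Horizontal fragmentation: selection with mutually exclusive predicates.
frag_hr = [t for t in employee if t["dept"] == "HR"]
frag_it = [t for t in employee if t["dept"] != "HR"]
reconstructed_h = frag_hr + frag_it                      # union over the fragments

# Vertical fragmentation: projection, always keeping the primary key "id".
frag_v1 = [{"id": t["id"], "name": t["name"]} for t in employee]
frag_v2 = [{"id": t["id"], "dept": t["dept"], "salary": t["salary"]} for t in employee]

# Reconstruction: join the vertical fragments on the primary key.
by_id = {t["id"]: dict(t) for t in frag_v1}
for t in frag_v2:
    by_id[t["id"]].update(t)
reconstructed_v = list(by_id.values())

assert sorted(reconstructed_h, key=lambda t: t["id"]) == employee
assert sorted(reconstructed_v, key=lambda t: t["id"]) == employee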
2.2 Replication
A relation or a fragment of a relation is replicated if it is stored redundantly at two or more sites. Reliability and performance are the two major purposes of data replication. Replication improves availability, which is especially valuable for distributed databases, but it also makes it harder to keep the copies consistent. Replication can be complete (the data is replicated at all sites) or partial. Fully redundant databases are those in which every site contains a copy of the entire database [1, 8].
2.3 Dynamic Fragmentation
Traditionally, fragmentation in distributed databases has been determined by off-line analysis and optimization. However, some enterprises have users accessing their databases under changing access patterns, which calls for an approach to dynamic fragmentation, i.e., an algorithm that can reallocate data while the database is on-line. One of the approaches described in [1] for dynamic fragmentation and for improving the performance of data access in distributed database systems is the RBy (bound) algorithm, which is based on horizontal fragmentation with partial replication, as described below:
1. For each query requested, a slave computer increments a counter (ctr) for the user that made the request.
2. If ctr reaches the bound number (a parameter of the algorithm), then this computer is a candidate to have a set of records replicated and follows steps 3 and 4; otherwise it goes to step 5.
3. Request the set of records that the user is asking for and save this information into the slave database.
4. Reset the user's local counter to zero.
5. End.
This approach keeps the database available even when the connection between the slave and the master database is broken.
In order to reduce the search space, the slave-master search (SMS) technique is proposed, in which data access is based on searching the slave (local) database before sending the query to the master computer.
2.4 Data Relocation
In a distributed system, long-running distributed services are often confronted with changes to the configuration of the underlying hardware. If the service is required to be highly available, it needs to adapt to these changes while still providing its service and keeping the changes transparent to client processes. The problem becomes harder when multiple changes can occur concurrently. One of the approaches for data relocation, presented in [2], addresses these problems and is explained in this section.
In this approach, it is assumed that the service is implemented by a set of server processes, with each server located on a different machine and managing (or hosting) a disjoint subset of the records (no replication is assumed). Every record is always hosted by a single server, which is called the record's hosting server. The distributed service provides its services to external client processes.
The load distribution policy is captured by a data structure called the mapping, which defines for each record its hosting server and can be used for redirecting requests to the appropriate servers. Mappings are managed by a separate service, called the configuration service, which is responsible for building an updated mapping that reflects a configuration change and for providing the new mapping to all the servers whenever such a change occurs.
To invoke an operation, a client arbitrarily picks a server and submits its request to it. The selected server then looks in its copy of the mapping to determine the record's hosting server and forwards the request accordingly. When a server receives a new mapping, it starts relocating records; this includes the transfer of mappings between a server and the configuration service, as well as the relocation of records between servers.
The algorithm is proposed for several situations of data redistribution, described in the following:
Single Redistribution:
For single redistribution, the solution follows three steps:
- Initialization: Initially, all the servers have a local copy of the authoritative mapping, M, which is used for forwarding requests to the proper hosting server. When the configuration service receives the notification of a configuration change, it computes a new mapping M' that reflects the change and distributes M' to all the servers of the distributed service.
- Record relocation: Each server receives the new mapping M' and ships its own records if needed. During the record relocation step, servers continue to forward client requests using the authoritative mapping M for not-yet-shipped records; requests for already-shipped records are forwarded to the record's new hosting server as dictated by M'.
- Termination: As soon as a server completes its record relocation step, it notifies the configuration service. When the configuration service receives completion notifications from all servers, it in turn notifies all the servers that the termination step can start, in which each server discards the mapping M and replaces it with the new mapping M' as the new authoritative mapping.
Different mappings are serviced sequentially. To make the record relocation step more efficient, a server does not discard a record after it has been shipped to its new host. Instead, the server keeps handling lookup requests for such a record, but only for as long as that record remains consistent with the copy at its new hosting server, that is, until the first update request for that record is made.
Alternative Design Considerations
There are two strategies to deal with requests for already-shipped data while redistribution is in progress:
1. Reject the request and let the client keep trying until the redistribution is completed. This contradicts the transparency goal.
2. Always handle the request locally, independently of whether it is a lookup or an update request. In the case of an update request for a record that has already been relocated, the authoritative hosting server propagates the record's modified value to its new hosting server in order to keep the two copies of the record consistent. This solution has the advantage that update requests are processed slightly faster, but it introduces additional complexity for keeping the records consistent.
Also, there are two approaches for the initial forwarding of requests:
1. Forwarding a request to the record's authoritative hosting server and having it forwarded further if the record has already been shipped.
2. Forwarding the request to the record's new hosting server and having the record fetched on demand.
The first strategy favors frequent lookups and rare updates; the second strategy favors more frequent updates, as it eliminates the extra forwarding of every single request for a shipped record that has been updated.
Overlapping Redistributions: In the following approaches efficiency is improved by introducing concurrency (let R1, R2, ..., Rn be the sequence of upcoming redistributions and M1, M2, ..., Mn their respective mappings; M is the current authoritative mapping of the distributed service as a whole).
Approach I: Per-server Sequential Redistribution
In this case, the configuration service generates a new mapping and distributes it to
the servers as soon as it receives a notification for a new configuration change. The
servers themselves are responsible for locally queuing incoming mappings and
processing them one at a time in the order received. Each server maintains a queue of
mappings, which always contains at least one mapping. A server that has relocated all
records for redistribution R1 can start carrying out the record relocation for the next
redistribution R2 before all other servers have completed redistribution R1. The
authoritative mapping as known to the server (i.e., M) is removed from the server’s queue
only upon receiving a notification from the configuration service stating that
redistribution R1 has been completed by all servers. The forwarding is based on first
preference virtual mapping.
Approach II: Per-server Mixed but Ordered Redistributions
The main idea in this case is that there are cases where a server does not need to
complete a redistribution to start working on the next one. Assume a server is currently
going through its set of records, checking which ones are to be shipped based on
redistribution Ri and it comes across a record that is not remapped by Ri. The server can
then ship this record based on a successive redistribution Rj(j > i), even if it has not
finished Ri yet. As in Approach I, requests are forwarded based on the virtual mapping with first preference.
Approach III: Direct Shipping to Final Destination
The optimization introduced in Approach III entails that a record is shipped directly
to the record’s hosting server according to the last known redistribution. This policy
keeps a record from being shipped from server to server when it is already known that it
needs to be shipped further. Instead, the record is sent directly to the last server in the chain of servers it is mapped to. This policy prevents unnecessary network traffic and redistribution delay.
The main difference between this approach and Approaches I and II is that a server ships records based on the virtual mapping with last preference among all the mappings in its queue. The records are thus directly relocated to the proper hosting server. However, servers still use the virtual mapping with first preference among these mappings to forward requests that cannot be handled locally.
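One simplified reading of the first-preference versus last-preference virtual mappings is sketched below in Python (the per-server queue representation and names are hypothetical):

class ServerMappingQueue:
    """Sketch: a server's queue of mappings, oldest (authoritative) first.
    Each mapping is a dict record -> hosting server."""

    def __init__(self, authoritative):
        self.queue = [authoritative]

    def push(self, new_mapping):
        self.queue.append(new_mapping)       # queued in the order received

    def forward_target(self, record):
        # Requests are forwarded using the virtual mapping with FIRST preference.
        return self.queue[0][record]

    def shipping_target(self, record):
        # Approach III ships a record directly to the host given by the virtual
        # mapping with LAST preference, i.e., its final known destination.
        return self.queue[-1][record]

q = ServerMappingQueue({"r1": "s1"})
q.push({"r1": "s2"})
q.push({"r1": "s3"})
print(q.forward_target("r1"), q.shipping_target("r1"))   # -> s1 s3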
3. Distributed Transactions
A multi-database system (MDBS) is a collection of autonomous local databases
(LDBSs) and it is viewed as a single unit. There are two types of transactions in MDBS.
A local transaction, which accesses a local database only, is submitted directly to LDBS.
Global transaction is a set of sub-transactions, where each sub-transaction is a
transaction accessing the data items at a single local site and the component local
database systems do not support prepare to commit stage. There are three types of subtransactions:
 Rtriable: It is guaranteed to commit after e finite number of submissions when
executed from any consistent database state.
 Compensatable: the effect of its execution can be semantically undone after
commitment by executing a compensating sub-transaction at its local site.
 Pivot: it is neither retriable nor compensatable.
In each global transaction, at most one sub-transaction can be pivot.
Global transaction management requires cooperation from local sites to ensure the consistent and reliable execution of global transactions in a distributed database system. In a heterogeneous distributed database (or multi-database) environment, the various local sites make conflicting assertions of autonomy over the execution of global transactions.
Global serializability is an accepted correctness criterion for the execution of (non-flexible) global and local transactions in the HDDBS environment. A global schedule S is globally serializable if the committed projection from S of both the global transactions in the HDDBS environment and the transactions that run independently at local sites is conflict-equivalent to some serial execution of those transactions.
3.1 Flexible Transactions
A flexible transaction model for the specification of global transactions makes it
possible to deal robustly with these conflicting requirements. In heterogeneous database
systems, flexible transactions can increase the failure resilience of global transactions by
allowing alternate (but in some sense equivalent) executions to be attempted when a local database system fails or some sub-transactions of the global transaction abort.
The flexible transaction model supports flexible execution control flow by
specifying two types of dependencies among the sub-transactions of a global transaction:
Execution ordering dependencies between two sub-transactions and alternative
dependencies between two subsets of sub-transactions.
3.1.1 Semi-Atomicity in Flexible Transactions
In [3], semi-atomicity is presented as a weaker form of atomicity for flexible transactions that allows local sites to autonomously maintain serializability and recoverability.
Let T = {t1, t2, ..., tn} be a repertoire of sub-transactions and P(T) the collection of all subsets of T. Let ti, tj ∈ T and let Ti ⊆ T be a subset of T with a precedence relation <i defined on Ti. (Ti, <i) is a representative partial order, abbreviated as <-rpo, if the execution of the sub-transactions in Ti represents the execution of the entire flexible transaction T.
The execution of a flexible transaction T preserves the property of semi-atomicity if one of the following conditions is satisfied:
- All its sub-transactions in one <-rpo commit, and all attempted sub-transactions not in the committed <-rpo are either aborted or have their effects undone.
- No partial effects of its sub-transactions remain permanent in the local databases.
The preservation of the weaker property of semi-atomicity renders flexible transactions more resilient to failures than traditional global transactions. This property is preserved through a combination of compensation and retry approaches. The construction of recoverable flexible transactions that are executable in the error-prone MDBS environment demonstrates that the flexible transaction model indeed enhances the scope of global transaction management beyond that offered by the traditional global transaction model. Using flexible transactions, the blocking that may be caused by the 2PC protocol can be prevented. Compensating sub-transactions may be subject to fewer restrictions in such an environment.
3.1.2 Global Serializability: Some sub-transactions of a flexible transaction which do not belong to the committed <-rpo (as noted above) may have committed and had their effects compensated. Such sub-transactions, called invalid sub-transactions, together with their compensating transactions are termed surplus transactions. In the case of flexible transactions, a global schedule S is serializable if the projection of committed local, flexible and surplus transactions is conflict-equivalent to some serial execution of these transactions.
3.1.3 F-Serializability: F-serializability is a concurrency control criterion for flexible and local transactions that is stricter than global serializability in that it prevents transactions which are serialized between a flexible transaction and its compensating sub-transactions from affecting any data items that have been updated by the flexible transaction, as described in [4].
A global schedule S is compensation-interference free if for any sub-transaction tj which is serialized between a sub-transaction ti and its compensating transaction cti in S, WC(ti) ∩ AC(tj) = ∅, where WC(t) denotes the set of data items that t writes and commits and AC(t) denotes the set of data items that t accesses and commits.
Let S be a global schedule of a set of well-formed flexible transactions and local transactions. S is F-serializable if it is globally serializable and compensation-interference free.
If we consider the traditional definition of global serializability, in which all sub-transactions of a flexible transaction and their compensating sub-transactions at a local site are treated as a logically atomic sub-transaction, then the set of F-serializable schedules would be a superset of the globally serializable schedules.
Scheduling protocol
To achieve a GTM scheduling protocol (which assumes that each global sub-transaction and compensating transaction predeclares its read and write sets) that ensures F-serializability of the execution of local and flexible transactions and avoids cascading aborts, a Stored Sub-transaction Execution Graph (SSEG) is maintained. The SSEG of a set of flexible transactions in a global schedule S is a directed graph whose nodes are the global sub-transactions and compensating sub-transactions of those flexible transactions, and whose edges ti → tj indicate that tj must serialize before ti due to preference, precedence, or conflict.
Proper insertion and deletion rules for nodes and edges are also defined on this graph.
4. Commit Protocols
Distributed database systems implement a transaction commit protocol to ensure
transaction atomicity.
In distributed database systems, a single transaction may have to execute in many sites,
depending on the location of the data that it needs to process. In such cases, some sites of a transaction may decide to commit while others could decide to abort it, resulting in a violation of
transaction atomicity. To address this problem, distributed database systems use transaction
commit protocols. The job of the commit protocol is to make sure that all participating sites of a
transaction agree upon the final outcome (commit or abort) of the transaction. Most importantly,
this assurance has to hold even in the presence of network failures. This section gives an overview of the commit protocols presented in [5, 6].
4.1 General Commit Protocols
A variety of commit protocols has been proposed (for non-real-time database systems) for distributed database systems, including:
- Two Phase Commit (2PC): It operates in two phases. In the first phase, called the "voting phase", the master reaches a global decision (commit or abort) based on the local decisions of the cohorts. In the second phase, called the "decision phase", the master conveys this decision to the cohorts. In this protocol, cohorts use logging mechanisms so that their updates can be undone if the transaction aborts (a minimal sketch of the two phases appears after this list).
- Presumed Abort (PA): a variant of the 2PC protocol that behaves identically to 2PC for committing transactions but has reduced message and logging overhead for aborted transactions. It is not necessary for cohorts to send ACKs for ABORT messages from the master, or to force-write the abort record to the log.
- Presumed Commit (PC): another variant of 2PC, in which cohorts do not send ACKs for a commit decision sent from the master and do not force-write the commit log record. In addition, the master does not write an end log record. In other words, the overheads are reduced for committing transactions rather than for aborted transactions.
- Three Phase Commit (3PC): A fundamental problem with all of the above protocols is that cohorts may become blocked in the event of a site failure and remain blocked until the failed site recovers. This protocol addresses the blocking problem by inserting an extra phase, called the "precommit phase", between the two phases of the 2PC protocol, at the price of increased communication (messaging) and logging overhead.
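The sketch below (Python; hypothetical cohort interface, with no logging or failure handling) shows only the control flow of the two phases referred to above:

def two_phase_commit(cohorts):
    """Minimal control-flow sketch of 2PC at the master.
    `cohorts` is a list of objects with vote() and decide(decision) methods;
    a real protocol also forces log records and handles timeouts/failures."""
    # Phase 1 (voting phase): collect local decisions from all cohorts.
    votes = [c.vote() for c in cohorts]                 # each vote is "commit" or "abort"
    decision = "commit" if all(v == "commit" for v in votes) else "abort"
    # Phase 2 (decision phase): convey the global decision to the cohorts.
    for c in cohorts:
        c.decide(decision)
    return decision

class Cohort:
    def __init__(self, ok=True):
        self.ok = ok
        self.outcome = None
    def vote(self):
        return "commit" if self.ok else "abort"
    def decide(self, decision):
        self.outcome = decision                          # apply or undo using the local log

print(two_phase_commit([Cohort(), Cohort(ok=False)]))    # -> abort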
4.2 Real-Time Commit Protocols
Distributed commit processing can have considerably more effect than distributed data processing on real-time performance. It is therefore important to use commit protocols that take the constraints of real-time database systems into account.
The semantics of a firm deadline in distributed real-time database systems is that a transaction should either be committed before its deadline or be killed when the deadline
expires. A distributed firm-deadline real-time transaction is said to be committed if the
master has reached the commit decision (that is, forced the commit log record to the disk)
before the expiry of the deadline at its site. This definition applies irrespective of whether
the cohorts have also received and recorded the commit decision by the deadline.
- Permits Reading Of Modified Prepared-data for Timeliness (PROMPT)
It is the best-performing two-phase protocol in DRTDBS. This protocol allows transactions to "optimistically" borrow, in a controlled manner, the updated data of transactions currently in their commit phase. This controlled borrowing reduces the data inaccessibility and the priority inversion that are inherent in distributed real-time commit processing. To further improve its real-time performance, three additional features are included in the PROMPT protocol:
1. Active Abort: cohorts inform the master as soon as they decide to abort locally, rather than only upon explicit request by the master.
2. Silent Kill: there is no need for the master to invoke the abort protocol when a deadline expires, since the cohorts of the transaction can independently detect that the deadline has been missed (assuming global clock synchronization).
3. Healthy Lending: a health factor is associated with each transaction, computed at the point in time when the master is ready to send PREPARE messages and defined as the ratio TimeLeft/MinTime, where TimeLeft is the time left until the transaction's deadline and MinTime is the minimum time required for commit processing. In this scheme, a transaction is allowed to lend its data only if its health factor is greater than a (system-specified) minimum value MinHF, since lending by transactions that are close to their deadlines results in the aborts of all the associated borrowers (a small numeric sketch follows this list).
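A minimal sketch of the Healthy Lending check, under the assumption that both times are expressed in the same units (the MinHF value used here is purely illustrative):

def may_lend(time_left, min_commit_time, min_hf=1.2):
    """Healthy Lending: lend prepared data only if the health factor
    HF = TimeLeft / MinTime exceeds the system-specified minimum MinHF."""
    health_factor = time_left / min_commit_time
    return health_factor > min_hf

print(may_lend(time_left=300, min_commit_time=200))   # HF = 1.5  -> True, may lend
print(may_lend(time_left=210, min_commit_time=200))   # HF = 1.05 -> False, too close to the deadline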
- Early Prepare (EP) Commit Protocol (a one-phase protocol)
This protocol uses the PC protocol to eliminate one round of messages for a distributed transaction that executes in the absence of failures. It further reduces the communication overhead by making each cohort enter the prepared state after it performs its work and before it replies to the master with the WORKDONE message. The master may have to force multiple MEMBERSHIP records, because the transaction membership may grow as transaction execution progresses. Also, the master must record a cohort's identity in its stable log before sending a work request to that cohort. The steps of EP's execution are as below:
- The master forces one or more MEMBERSHIP log records and sends a STARTWORK request to each cohort.
- Each cohort executes its work request, forces a PREPARE log record, and replies to the master with a WORKDONE message.
- The master forces a COMMIT log record, sends a COMMIT message to each cohort, and forgets about the transaction. A commit decision is reached if all cohorts have performed their jobs successfully and are thereby ready to commit.
- Each cohort appends to its log (but need not force) a COMMIT log record and then forgets about the transaction.
In contrast to 2PC or PC, when EP is used and several requests are sent to each
cohort, each cohort forces only one PREPARE log record regardless of the number of
work requests it received. Also, a cohort using EP synchronously forces each PREPARE
record as early as possible, while a cohort using PC synchronously forces the PREPARE
record as late as possible.
Comparison of Real-Time Protocols
EP implicitly incorporates one strong feature of PROMPT, namely active abort. As a result, EP outperforms PROMPT when the cohorts of a distributed transaction execute in parallel. However, the performance of EP is rather poor in environments where the cohorts of a distributed transaction execute sequentially. Under very high workloads in the sequential execution environment, EP performs extremely well compared to any two-phase protocol in the presence of both resource and data contention.
4.3 A Two-Phase Commit Protocol for Mobile Wireless Environment (M-2PC
protocol)
As noted in [7], if traditional 2PC is executed in a mobile environment, disconnections will increase the number of (possibly wrong) abort decisions, because if a Fixed Host (FH) tries to communicate with a disconnected Mobile Host (MH), the attempt is treated as a failure.
The more frequent the disconnections, the more transactions are aborted. This is not acceptable in mobile environments, because frequent disconnections are not exceptions but rather part of the normal mode of operation, so they should not be treated as failures. The M-2PC protocol is therefore proposed for mobile environments. In this protocol, the coordinator must reside in the fixed part of the network so that it is directly reachable by the fixed participants. Some situations considered in this protocol are the following:
- The case of a mobile client and fixed servers: To handle disconnections, the client delegates its commit duties to the coordinator, which is always available during protocol execution. The client sends the request for commit to the coordinator along with its logs. Afterwards, the client can disconnect. The coordinator sends vote messages to all participants and decides whether to commit or abort according to the traditional 2PC principle. After receiving the acknowledgements, the coordinator informs the client, which may be in another cell, of the result. The coordinator waits for the client's acknowledgement before forgetting about the transaction (releasing resources). To mitigate unforeseeable breakdowns, the client must force-write the identity and location information of the coordinator just before sending the commit request.
- The case of a mobile client and mobile servers: in this situation, the mobile server is called a participant. A representation agent for the mobile server, called the participant-agent, works on behalf of the mobile server, which is free to disconnect from the moment it delegates its commitment duties to its representation agent. The participant-agent is responsible for transmitting the result to the participant at reconnection time and also for keeping logs and eventually recovering in the case of failure. The participant is free to move to another cell during protocol execution. When it registers with a new base station (BS), the participant MH (or mobile participant) informs its participant-agent about its new location. Again, the workload is shifted to the fixed part of the network, thus preserving processing power and communication resources and minimizing traffic cost over the wireless links.
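A simplified sketch of the mobile-client case follows (Python; hypothetical classes, with log handling and acknowledgements reduced to comments):

class M2PCCoordinator:
    """Sketch of the fixed-network coordinator to which a mobile client
    delegates its commit duties before disconnecting."""

    def __init__(self, participants):
        self.participants = participants

    def commit_request(self, client_logs):
        # The client has force-written the coordinator's identity/location,
        # sent its logs with the commit request, and may now disconnect.
        votes = [p.vote() for p in self.participants]          # 2PC voting phase
        decision = "commit" if all(v == "commit" for v in votes) else "abort"
        for p in self.participants:
            p.decide(decision)                                  # decision phase, then ACKs
        return decision                                         # delivered to the client on reconnection

class FixedParticipant:
    def vote(self):
        return "commit"
    def decide(self, decision):
        pass                                                    # apply the decision, send ACK

coord = M2PCCoordinator([FixedParticipant(), FixedParticipant()])
print(coord.commit_request(client_logs={}))                     # -> commit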
5. Concurrency Control
Concurrency control has been actively investigated for the past several years, and
the problem for non-distributed DBMSs is well understood. A broad mathematical theory
has been developed to analyze the problem, and one approach, called two-phase locking,
has been accepted as a standard solution. Current research on non-distributed
concurrency control is focused on evolutionary improvements to two-phase locking,
detailed performance analysis and optimization, and extensions to the mathematical
theory.
The concurrency control problem is exacerbated in a distributed DBMS (DDBMS)
because (1) users may access data stored in many different computers in a distributed
system, and (2) a concurrency control mechanism at one computer cannot instantaneously
know about interactions at other computers.
More than 20 concurrency control algorithms have been proposed for DDBMS’s,
and several have been, or are being, implemented. These algorithms are usually complex,
hard to understand, and difficult to prove correct (indeed, many are incorrect). Because
they are described in different terminologies and make different assumptions about the
underlying DDBMS environment, it is difficult to compare the many proposed
algorithms, even in qualitative terms.
In fact, the sub-algorithms used by all practical DDBMS concurrency control algorithms are variations of just three basic techniques:
- Two-phase locking
- Timestamp ordering
- Optimistic
Thus the state of the art is far more coherent than a review of the literature would seem to indicate. Well-known centralized concurrency control techniques can be extended to solve the problem of concurrency control in distributed databases, but not all concurrency control techniques are suitable for a distributed database. One example is serialization graph testing, which works well in a centralized database system given relatively powerful processors compared to I/O speed. In a distributed environment, however, keeping the graph updated at all times is prohibitively expensive because of the communication costs. During the last years, several distributed database systems have been realized. Usually, concurrency control in these systems has been done by some kind of two-phase locking, but as processor speed increases relative to I/O and communication speed, it is expected that timestamp ordering should be able to compete with two-phase locking in performance. In theory, timestamp ordering scheduling should be capable of good performance in distributed systems: it is deadlock free and avoids much of the communication needed for synchronization and lock management. The work that has been done has been mostly theoretical, but some interesting simulation models have been developed and simulated at the University of Wisconsin.
Bernstein and Goodman review many of the proposed algorithms and describe how additional algorithms may be synthesized by combining basic mechanisms from the locking and timestamp classes [17]. They present a framework for the design and analysis of distributed database concurrency control algorithms. The framework has two main components: (1) a system model that provides common terminology and concepts for describing a variety of concurrency control algorithms, and (2) a problem decomposition that decomposes concurrency control algorithms into read-write and write-write synchronization sub-algorithms. They consider synchronization sub-algorithms outside the context of specific concurrency control algorithms. Virtually all known database synchronization algorithms are variations of two basic techniques: two-phase locking (2PL) and timestamp ordering (T/O). They describe the principal variations of each technique, though they do not claim to have exhausted all possible variations. In addition, they describe ancillary problems (e.g., deadlock resolution) that must be solved to make each variation effective. They show how to integrate the described techniques to form complete concurrency control algorithms. They list 47 concurrency control algorithms, describing 25 in detail:
5.1. Basic 2PL: An implementation of 2PL amounts to building a 2PL scheduler, a software module that receives lock requests and lock releases and processes them according to the 2PL specification. The basic way to implement 2PL in a distributed database is to distribute the schedulers along with the database, placing the scheduler for data item x at the DM where x is stored. In this implementation readlocks may be implicitly requested by dm-reads and writelocks may be implicitly requested by prewrites. If the requested lock cannot be granted, the operation is placed on a waiting queue for the desired data item (this can produce a deadlock). Writelocks are implicitly released by dm-writes. However, to release readlocks, special lock-release operations are required. These lock releases may be transmitted in parallel with the dm-writes, since the dm-writes signal the start of the shrinking phase. When a lock is released, the operations on the waiting queue of that data item are processed in first-in/first-out (FIFO) order.
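The following Python sketch of a single-site lock table (hypothetical, and ignoring deadlock handling) illustrates how implicit lock requests and FIFO waiting queues could be handled by such a scheduler:

from collections import deque

class LockTable:
    """Toy 2PL lock manager. Each data item has its current mode,
    its holders, and a FIFO queue of waiting requests."""

    def __init__(self):
        self.locks = {}     # item -> {"mode": "R"/"W"/None, "holders": set, "queue": deque}

    def request(self, txn, item, mode):
        e = self.locks.setdefault(item, {"mode": None, "holders": set(), "queue": deque()})
        compatible = (e["mode"] is None) or (e["mode"] == "R" and mode == "R" and not e["queue"])
        if compatible:
            e["mode"] = mode if e["mode"] is None else e["mode"]
            e["holders"].add(txn)
            return True                     # lock granted
        e["queue"].append((txn, mode))      # wait in FIFO order (may deadlock)
        return False

    def release(self, txn, item):
        e = self.locks[item]
        e["holders"].discard(txn)
        if not e["holders"]:
            e["mode"] = None
            if e["queue"]:                  # grant the lock to the first waiter (FIFO)
                t, m = e["queue"].popleft()
                e["mode"], e["holders"] = m, {t}

lt = LockTable()
assert lt.request("T1", "x", "R")           # readlock granted to T1
assert not lt.request("T2", "x", "W")       # T2's writelock request is queued
lt.release("T1", "x")                        # T2 now holds the writelock on x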
5.2. Primary Copy 2PL: Primary copy 2PL is a 2PL technique that pays attention to data redundancy. One copy of each logical data item is designated the primary copy; before accessing any copy of the logical data item, the appropriate lock must be obtained on the primary copy. For readlocks this technique requires more communication than basic 2PL. Suppose x1 is the primary copy of logical data item X, and suppose transaction T wishes to read some other copy xn of X. To read X, T must communicate with two DMs: the DM where x1 is stored (so T can lock x1) and the DM where xn is stored. By contrast, under basic 2PL, T would only communicate with xn's DM. For writelocks, however, primary copy 2PL does not incur extra communication. Suppose T wishes to update X. Under basic 2PL, T would issue prewrites to all copies of X (thereby requesting writelocks on these data items) and then issue dm-writes to all copies. Under primary copy 2PL the same operations would be required, but only the prewrite for x1 would request a writelock. That is, prewrites would be sent for x1, ..., xm, but the prewrites for x2, ..., xm would not implicitly request writelocks.
5.3 Voting 2PL: Voting 2PL (or majority consensus 2PL) is another 2PL implementation that exploits data redundancy. Voting 2PL is derived from the majority consensus technique of Thomas and is only suitable for ww synchronization. To understand voting, we must examine it in the context of two-phase commit. Suppose transaction T wants to write into X. Its TM sends prewrites to each DM holding a copy of X. For the voting protocol, the DM always responds immediately. It acknowledges receipt of the prewrite and says "lock set" or "lock blocked". (In the basic implementation it would not acknowledge at all until the lock is set.) After the TM receives acknowledgments from the DMs, it counts the number of "lock set" responses: if the number constitutes a majority, then the TM behaves as if all locks were set. Otherwise, it waits for "lock set" operations from DMs that originally said "lock blocked". Deadlocks aside, it will eventually receive enough "lock set" operations to proceed. Since only one transaction can hold a majority of locks on X at a time, only one transaction writing into X can be in its second commit phase at any time. All copies of X thereby have the same sequence of writes applied to them. A transaction's locked point occurs when it has obtained a majority of its writelocks on each data item in its writeset. When updating many data items, a transaction must obtain a majority of locks on every data item before it issues any dm-writes.
5.4. Centralized 2PL: Instead of distributing the 2PL schedulers, one can centralize the scheduler at a single site. Before accessing data at any site, appropriate locks must be obtained from the central 2PL scheduler. So, for example, to perform dm-read(x) where x is not stored at the central site, the TM must first request a readlock on x from the central site, wait for the central site to acknowledge that the lock has been set, and then send dm-read(x) to the DM that holds x. (To save some communication, one can have the TM send both the lock request and dm-read(x) to the central site and let the central site directly forward dm-read(x) to x's DM; the DM then responds to the TM when dm-read(x) has been processed.) Like primary copy 2PL, this approach tends to require more communication than basic 2PL, since dm-reads and prewrites usually cannot implicitly request locks.
5.5. Basic T/O Implementation: An implementation of T/O amounts to building a T/O scheduler, a software module that receives dm-reads and dm-writes and outputs these operations according to the T/O specification. In practice, prewrites must also be processed through the T/O scheduler for two-phase commit to operate properly. As was the case with 2PL, the basic T/O implementation distributes the schedulers along with the database.
5.6 The Thomas Write Rule: For ww synchronization the basic T/O scheduler can be optimized using the following observation. Let W be a dm-write(x), and suppose ts(W) < W-ts(x). Instead of rejecting W we can simply ignore it. We call this the Thomas Write Rule (TWR). Intuitively, TWR applies to a dm-write that tries to place obsolete information into the database. The rule guarantees that the effect of applying a set of dm-writes to x is identical to what would have happened had the dm-writes been applied in timestamp order. If TWR is used, there is no need to incorporate two-phase commit into the ww synchronization algorithm; the ww scheduler always accepts prewrites and never buffers dm-writes.
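A minimal sketch of TWR in Python (the record structure is hypothetical):

def apply_dm_write(item, new_value, write_ts):
    """Thomas Write Rule: ignore a dm-write whose timestamp is older than the
    item's write timestamp W-ts(x); otherwise install the value and advance W-ts."""
    if write_ts < item["w_ts"]:
        return False                 # obsolete write: silently ignored, not rejected
    item["value"], item["w_ts"] = new_value, write_ts
    return True

x = {"value": "v0", "w_ts": 10}
apply_dm_write(x, "v1", 12)          # applied, W-ts(x) becomes 12
apply_dm_write(x, "v_old", 11)       # ignored by TWR
print(x)                             # {'value': 'v1', 'w_ts': 12}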
For rw synchronization the basic T/O scheduler can be improved using multiversion
data items. For each data item x there is a set of R-ts's and a set of (W-ts, value) pairs,
called versions. The R-ts's of x record the timestamps of all executed dm-read(x)
operations, and the versions record the timestamps and values of all executed dm-write(x)
operations.
5.7. Conservative timestamp ordering: Conservative timestamp ordering is a technique for eliminating restarts during T/O scheduling. When a scheduler receives an operation O that might cause a future restart, the scheduler delays O until it is sure that no future restarts are possible.
Conservative T/O requires that each scheduler receive dm-reads (or dm-writes) from each TM in timestamp order. For example, if scheduler Sj receives dm-read(x) followed by dm-read(y) from TMi, then ts(dm-read(x)) ≤ ts(dm-read(y)). Since the network is assumed to be a FIFO channel, this timestamp ordering is accomplished by requiring that TMi send dm-reads (or dm-writes) to Sj in timestamp order. Conservative T/O buffers dm-reads and dm-writes as part of its normal operation. When a scheduler buffers an operation, it remembers the TM that sent it. Let min-R-ts(TMi) be the minimum timestamp of any buffered dm-read from TMi, with min-R-ts(TMi) = -∞ if no such dm-read is buffered. Define min-W-ts(TMi) analogously. Conservative T/O performs rw synchronization as follows:
1. Let R be a dm-read(x). If ts(R) > min-W-ts(TM) for any TM in the system, R is buffered. Else R is output.
2. Let W be a dm-write(x). If ts(W) > min-R-ts(TM) for any TM, W is buffered. Else W is output.
3. When R or W is output or buffered, this may increase min-R-ts(TM) or min-W-ts(TM); buffered operations are retested to see if they can now be output.
The effect is that R is output if and only if (a) the scheduler has a buffered dm-write from every TM, and (b) ts(R) < minimum timestamp of any buffered dm-write. Similarly, W is output if and only if (a) there is a buffered dm-read from every TM, and (b) ts(W) < minimum timestamp of any buffered dm-read. Thus R (or W) is output if and only if the scheduler has received every dm-write (or dm-read) with smaller timestamp that it will ever receive. Ww synchronization is accomplished as follows:
1. Let W be a dm-write(x). If ts(W) > min-W-ts(TM) for any TM in the system, W is buffered; else it is output.
2. When W is buffered or output, this may increase min-W-ts(TM); buffered dm-writes are retested accordingly.
The effect is that the scheduler waits until it has a buffered dm-write from every TM and then outputs the dm-write with the smallest timestamp. Two-phase commit need not be tightly integrated into conservative T/O because dm-writes are never rejected. Although prewrites must be issued for all data items updated, the conservative T/O schedulers do not process these operations.
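The rw rule for dm-reads can be sketched as follows (Python; the buffering structure is hypothetical, and -∞ is represented by negative infinity when a TM has nothing buffered):

import math

def can_output_read(read_ts, buffered_writes_by_tm, all_tms):
    """Conservative T/O rw rule: a dm-read R may be output only if ts(R) does not
    exceed min-W-ts(TM) for every TM; min-W-ts(TM) is -infinity when no dm-write
    from that TM is buffered, so R waits until every TM has a buffered dm-write
    with a timestamp at least ts(R)."""
    for tm in all_tms:
        pending = buffered_writes_by_tm.get(tm, [])
        min_w_ts = min(pending) if pending else -math.inf
        if read_ts > min_w_ts:
            return False          # a smaller-timestamped write could still arrive: buffer R
    return True                   # safe to output R

tms = ["TM1", "TM2"]
print(can_output_read(5, {"TM1": [7], "TM2": [6]}, tms))   # True: R is output
print(can_output_read(5, {"TM1": [7]}, tms))               # False: TM2 has nothing buffered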
5.8. Certifier: In the certification approach dm-reads and prewrites are processed by DMs first-come/first-served, with no synchronization whatsoever. DMs do maintain summary information about rw and ww conflicts, which they update every time an operation is processed. However, dm-reads and prewrites are never blocked or rejected on the basis of the discovery of such a conflict. Synchronization occurs when a transaction attempts to terminate. When a transaction T issues its END, the DBMS decides whether or not to certify, and thereby commit, T. To understand how this decision is made, we must distinguish between "total" and "committed" executions. A total execution of transactions includes the execution of all operations processed by the system up to a particular moment. The committed execution is the portion of the total execution that only includes dm-reads and dm-writes processed on behalf of committed transactions. That is, the committed execution is the total execution that would result from aborting all active transactions (and not restarting them). When T issues its END, the system tests whether the committed execution augmented by T's execution is serializable, that is, whether after committing T the resulting committed execution would still be serializable. If so, T is committed; otherwise T is restarted.
There are two properties of certification that distinguish it from other approaches. First, synchronization is accomplished entirely by restarts, never by blocking. Second, the decision to restart or not is made after the transaction has finished executing. No concurrency control method discussed above satisfies both these properties.
A certification concurrency control method must include a summarization algorithm for storing information about dm-reads and prewrites when they are processed, and a certification algorithm for using that information to certify transactions when they terminate. The main problem in the summarization algorithm is avoiding the need to store information about already-certified transactions. The main problem in the certification algorithm is obtaining a consistent copy of the summary information. To do so the certification algorithm often must perform some synchronization of its own, the cost of which must be included in the cost of the entire method.
5.9. Thomas' Majority Consensus Algorithm: Thomas' algorithm assumes a fully
redundant database, with every logical data item stored at every site. Each copy carries
the timestamp of the last transaction that wrote into it. Transactions execute in two
phases. In the first phase each transaction executes locally at one site called the
transaction's home site. Since the database is fully redundant, any site can serve as the
home site for any transaction. The transaction is assigned a unique timestamp when it
begins executing. During execution it keeps a record of the timestamp of each data item it reads and, when it executes a write on a data item, processes the write by recording the new value in an update list. Note that each transaction must read a copy of a data item
before it writes into it. When the transaction terminates, the system augments the update
list with the list of data items read and their timestamps at the time they were read. In
addition, the timestamp of the transaction itself is added to the update list. This completes
the first phase of execution. In the second phase the update list is sent to every site. Each
site (including the site that produced the update list) votes on the update list. Intuitively
speaking, a site votes yes on an update list if it can certify the transaction that produced it.
After a site votes yes, the update list is said to be pending at that site. To cast the vote, the
site sends a message to the transaction's home site, which, when it receives a majority of
yes or no votes, informs all sites of the outcome. If a majority voted yes, then all sites are
required to commit the update, which is then installed using TWR. If a majority voted no,
all sites are told to discard the update, and the transaction is restarted. The rule that
determines when a site may vote "yes" on a transaction is pivotal to the correctness of the
algorithm. To vote on an update list U, a site compares the timestamp of each data item in
the readset of U with the timestamp of that same data item in the site's local database. If
any data item has a timestamp in the database different from that in U, the site votes no.
Otherwise, the site compares the readset and writeset of U with the readset and writeset
of each pending update list at that site, and if there is no rw conflict between U and any of
the pending update lists, it votes yes. If there is an rw conflict between U and one of those
pending requests, the site votes pass (abstain) if U's timestamp is larger than that of all
pending update lists with which it conflicts. If there is an rw conflict but U's timestamp is
smaller than that of the conflicting pending update list, then it sets U aside on a wait
queue and tries again when the conflicting request has either been committed or aborted
at that site.
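The voting rule can be sketched as follows (Python; the structures used for the update list and the pending lists are hypothetical):

def vote_on_update_list(U, local_db, pending):
    """Sketch of a site's vote in Thomas' majority consensus algorithm.
    U has a readset {item: timestamp-read}, a writeset (set of items) and a
    timestamp; local_db maps items to their current timestamps; pending holds
    the update lists this site has already voted yes on."""
    # Vote no if any item read by U has since been written at this site.
    for item, ts_read in U["readset"].items():
        if local_db[item] != ts_read:
            return "no"
    # Check rw conflicts against the pending update lists.
    conflicting = [P for P in pending
                   if (set(U["readset"]) & P["writeset"]) or (U["writeset"] & set(P["readset"]))]
    if not conflicting:
        return "yes"
    if all(U["ts"] > P["ts"] for P in conflicting):
        return "pass"          # abstain: U is younger than all conflicting pending lists
    return "wait"              # set U aside until the older conflicting request commits or aborts

db = {"x": 5, "y": 7}
U = {"readset": {"x": 5}, "writeset": {"y"}, "ts": 20}
print(vote_on_update_list(U, db, pending=[]))     # -> yes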
5.10. Ellis’ Ring Algorithm: Ellis' algorithm solves the distributed concurrency control
problem with the following restrictions:
1. The database must be fully redundant.
2. The communication medium must be a ring, so each site can only communicate
with its successor on the ring.
3. Each site-to-site communication link is pipelined.
4. Each site can supervise no more than one active update transaction at a time.
5. To update any copy of the database, a transaction must first obtain a lock on the
entire database at all sites.
The effect of restriction 5 is to force all transactions to execute serially; no
concurrent processing is ever possible. For this reason alone, the algorithm is
fundamentally impractical. To execute, an update transaction migrates around the ring,
(essentially) obtaining a lock on the entire database at each site. However, the lock
conflict rules are nonstandard. A lock request from a transaction that originated at site A
conflicts at site C with a lock held by a transaction that originated from site B if B=C and
either A=B or A's priority < B's priority. The daisy-chain communication induced by the
ring combined with this locking rule produces a deadlock-free algorithm that does not
require deadlock detection and never induces restarts. There are several problems with
this algorithm in a distributed database environment. First, as mentioned above, it forces
transactions to execute serially. Second, it only applies to a fully redundant database. And third, the daisy-chain communication requires that each transaction obtain its lock at one site at a time, which causes communication delay to be (at least) linearly proportional to the number of sites in the system.
This list includes almost all concurrency control algorithms described previously in the literature, plus several new ones. This extreme consolidation of the state of the art is possible in large part because of the framework set up earlier. The focus of [17] has primarily been the structure and correctness of synchronization techniques and concurrency control algorithms; a very important issue, namely performance, is left open.
5.11. Modeling Concurrency Control in Distributed Databases: The main performance metrics for concurrency control algorithms are system throughput and transaction response time. Four cost factors influence these metrics: intersite communication, local processing, transaction restarts, and transaction blocking. The impact of each cost factor on system throughput and response time varies from algorithm to algorithm, system to system, and application to application. This impact is not understood in detail, and a comprehensive quantitative analysis of performance is beyond the state of the art. The authors provide a model of the database, express serializability conditions and the algorithms in that model, and argue their correctness.
Figure 1 Hierarchical Transaction Structure
In [18] the concurrency control problem is framed in terms of three questions:
1. How do the performance characteristics of the various basic algorithm classes
compare under alternative assumptions about the nature of the database, the
workload, and the computational environment?
2. How does the distributed nature of transactions affect the behavior of the various
classes of concurrency control algorithms?
3. How much of a performance penalty must be incurred for synchronization and
updates when data is replicated for availability or query performance reasons?
The first of these questions remains unanswered due to shortcomings of past studies that have examined multiple algorithm classes. The most comprehensive of these studies suffer from unrealistic modeling assumptions.
[18] reports the first phase of a study aimed at addressing the questions raised above. Four concurrency control algorithms are examined in this study: two locking algorithms, a timestamp algorithm, and an optimistic algorithm. The algorithms considered span a wide range of characteristics in terms of how conflicts are detected and resolved. The authors define a hierarchical structure for transactions, shown in Figure 1, and briefly describe the algorithms under investigation. They then provide a model, which comprises four main parts: the source, the transaction manager, the resource manager and the concurrency control component, and they run test workloads and database parameters on it. Figure 2 shows the model in detail.
Figure 2 The Model of Database
The distributed database is modeled as a collection of sites, each comprising these components, as shown in Figure 3.
Finally, they present their initial performance results for the four concurrency control algorithms mentioned above under various assumptions about data replication, CPU cost for sending and receiving messages, transaction locality, and sequential versus parallel execution. The simulator used to obtain these results was written in the DeNet simulation language, which allowed them to preserve the modular structure of their model when implementing it. They describe the performance experiments and results following a discussion of the performance metrics of interest and the parameter settings used. There are four experiments. In the first experiment, the algorithms are evaluated with respect to replication; the purpose is to investigate the performance of the four algorithms as the system load varies, and to see how different levels of data replication impact performance. In the second experiment, they examine the impact of message cost on the performance of the algorithms. The data layout, workload, and transaction execution pattern used here are identical to those of the first experiment. In the third experiment, they consider a situation where a transaction may access non-local data. The data layout and transaction execution pattern used here are the same as in the first and second experiments, and all of the files needed by a given transaction still reside on a single site.
Figure 3 Database in [18] methodology
The purpose of the fourth experiment is to investigate performance under a parallel transaction execution pattern. In this case, the data layout is different and a bit more complex.
In [21] Carey and Livny describe a distributed DBMS model, an extension of their centralized model. Different simulation parameters are examined through simulations. Several papers about concurrency control have also been written by Thomasian et al. [19, 20]. The most important difference between the object-oriented approach and the earlier approaches is that the former focuses on data-shipping page-server OODBs, while the earlier work has been done in the context of query-shipping relational database systems. Also, inter-operation and inter-transaction times are expected to be much smaller in this kind of system.
5.12. Simulating Concurrency Control in Distributed Database: In [22] Norvag,
Sandsta and Bratbergsengen provided a simulator for simulating a distributed database.
While distributed relational database systems usually use query shipping, data shipping is
most common in object-oriented database systems. That means, instead of sending the
queries to the data, data is sent to the queries. The most popular data granularity is pages.
This is the easiest to implement, the most common in today’s object-oriented DBMS, and
also the granularity that gives the best performance. In [22], they provide a brief view of
DBsim’s architecture.
In addition to simulating and comparing schedulers, one of the main goals in the
development of DBsim was that it should be useful as a framework for simulating
schedulers and be easy to extend with new ones. The DBsim architecture is
object-oriented; all the major components are implemented as classes in C++. The
program consists of a collection of cooperating objects, the most important being the event
controller (the main loop), the transaction manager (TM), the scheduler, the data manager
(DM) and the bookkeeper. Extending the simulator with new scheduling strategies is easy:
if someone wants to test a new scheduler, it can be implemented as a subtype of the
generic base scheduler class defined in DBsim, which defines the necessary methods for
cooperating with the transaction and data managers.
The simulation is event-driven, and each transaction can be thought of as a thread.
In the main loop, an event is picked from the event queue and executed. Events in the
queue consist of an event type and the time at which the event is to be executed. If the
event is a TM-event, the TM is called, and if the event is a DM-event, the DM is called.
Typical events are a transaction requesting an operation, or the data manager finishing a
read or write operation on disk. They introduced three new concepts to the distributed
simulation model:
1. Number of sites that the database is distributed to.
2. Locality of data.
3. Type of network.
The main architecture of their simulator can be seen in Figure 4, which shows the
simulator as a collection of cooperating objects. For each simulated node there is one
data manager object and one scheduler object.
Figure 4 DBsim Architecture
In their model there is only one global transaction manager object, which is
responsible for creating transactions and issuing operations to the underlying scheduler
and data manager objects. A bookkeeper object collects statistics about the ongoing
activities.
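The event-driven structure described above might be pictured as follows; this is our own minimal sketch, not DBsim code, and the event kinds and handler functions are invented for illustration.

```cpp
#include <cstdio>
#include <queue>
#include <vector>

// Minimal sketch of an event-driven simulation loop in the style described
// for DBsim; all names and types are illustrative.
enum class EventKind { TM, DM };

struct Event {
    double time;     // simulated time at which the event fires
    EventKind kind;  // TM-event or DM-event
    int transactionId;
};

// Order the priority queue so the earliest event is processed first.
struct Later {
    bool operator()(const Event& a, const Event& b) const { return a.time > b.time; }
};

void handleTmEvent(const Event& e) {
    std::printf("t=%.1f TM handles transaction %d\n", e.time, e.transactionId);
}
void handleDmEvent(const Event& e) {
    std::printf("t=%.1f DM finished disk I/O for transaction %d\n", e.time, e.transactionId);
}

int main() {
    std::priority_queue<Event, std::vector<Event>, Later> queue;
    queue.push({1.0, EventKind::TM, 1});   // a transaction requests an operation
    queue.push({3.5, EventKind::DM, 1});   // the data manager finishes a read/write

    // Main loop: pick the next event and dispatch it to the TM or the DM.
    while (!queue.empty()) {
        Event e = queue.top();
        queue.pop();
        if (e.kind == EventKind::TM) handleTmEvent(e);
        else                         handleDmEvent(e);
    }
}
```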
They evaluated the following parameters (a configuration sketch follows the list):
Number of Nodes: The number of nodes in the system they simulate. These nodes are
connected by a network.
Number of Transactions: The number of transactions to complete after the initial
warm-up phase. The reason for having a warm-up phase is to make sure that the start-up
phase of the simulated system is not taken into account when they start collecting
statistics about the simulation. In their simulations they used a warm-up phase
consisting of 2000 transactions before they started collecting data.
Size of Address Space: The number of elements (pages) in the address space. In their
simulations they set the address space to 20000 elements.
Data Declustering: The distribution of data elements to nodes, i.e., the percentage of
the database elements located at a particular node.
Data Locality: The probability that a transaction accesses a data element located on the
same node as the transaction.
Non-Local Data Access: When a transaction in the system is accessing a data
element not located at its home node, this is the probability that the remote access will
access data at a particular node.
Hot Spot Probability: The probability that an operation addresses the hot spot part of
the address space. The hot spot area in their model is the first 10% of the address space at
each node.
Multi Programming Level: Number of concurrently executing transactions.
Transaction Distribution: The probability that a new transaction in the system is
started on a particular node.
Short Transaction Probability: The probability of a transaction being short. The
rest of the transactions are long transactions.
Abort Probability: The probability of a transaction requesting an abort before
committing. This probability is the same for both long and short transactions.
Read Probability: The probability of a data access operation being a read operation.
In their simulations, they have used a value of 80%. This gives a write probability of
20%.
Burst Probability: The probability that a transaction asks for operations in a burst; the
time between operations in a burst is shorter than normal. The lengths of short and long
transactions, the time between transactions, the time between operation requests from a
transaction, and the number of operations in a burst are drawn from a uniform distribution
with parameters as shown in Table 1.
Restart Delay: When using a timestamp ordering scheduler, transactions might get
aborted because of conflicts. If restarted immediately, the probability for the same
conflict to occur is quite high. To avoid this problem, the restart of the transaction is
delayed a certain amount of time. This delay is a value drawn from a uniform
distribution, multiplied by the number of retries.
Network: The kind of network to simulate. It can be one of three types: cluster, LAN, or
WAN.
Scheduler Type: Scheduler used for concurrency control. In the current version, this
has to be either two-phase locking or timestamp ordering.
Disk Operation Time: The time interval for performing an access to the disk.
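For concreteness, the parameters above can be gathered into a single configuration record. In the sketch below only the values marked "from [22]" (the 2000-transaction warm-up, the 20000-page address space, and the 80% read probability) come from the text; every other default is a placeholder of our own.

```cpp
// Sketch of a configuration record for the simulation parameters listed above.
// Only values marked "from [22]" come from the text; other defaults are placeholders.
enum class NetworkType { Cluster, LAN, WAN };
enum class SchedulerType { TwoPhaseLocking, TimestampOrdering };

struct SimulationParameters {
    int    numberOfNodes         = 8;       // placeholder
    int    numberOfTransactions  = 10000;   // counted after the warm-up phase (placeholder)
    int    warmUpTransactions    = 2000;    // from [22]
    int    addressSpacePages     = 20000;   // from [22]
    double dataLocality          = 0.8;     // placeholder: probability of a local access
    double hotSpotProbability    = 0.2;     // placeholder; hot spot = first 10% of each node's pages
    int    multiprogrammingLevel = 16;      // placeholder: concurrently executing transactions
    double shortTransactionProb  = 0.8;     // placeholder
    double abortProbability      = 0.01;    // placeholder; same for long and short transactions
    double readProbability       = 0.8;     // from [22]; write probability is therefore 0.2
    double burstProbability      = 0.1;     // placeholder
    NetworkType   network        = NetworkType::LAN;
    SchedulerType scheduler      = SchedulerType::TwoPhaseLocking;
    double diskOperationTimeMs   = 10.0;    // placeholder
};

int main() {
    SimulationParameters params;   // defaults as above; individual fields are overridden per run
    (void)params;
    return 0;
}
```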
Using the above parameters, they evaluated various factors such as throughput,
number of nodes, MPL, response time, abort frequency and different data placements.
The results can be summarized as follows:
 With a mix of long and short transactions, the TO scheduler has a higher
throughput than the 2PL scheduler.
 With only short transactions, the two schedulers perform almost identically.
 The TO scheduler has much higher abort probabilities than the 2PL scheduler.
 The 2PL scheduler favors long transactions, and the number of long transactions
that manage to finish successfully is much higher for the 2PL scheduler.
 The network is not the bottleneck for a reasonable load; only under heavy load and
with a slow network does it severely affect performance.
6. Availability:
Availability is one of the most important aspects of designing a distributed database. To
provide high availability for services such as mail or bulletin boards, data must be
replicated.
6.1 Providing Availability using Lazy Operations: The availability of data in
distributed databases can be increased by replication. If the data is replicated on several
sites, it may still be available after site failures. However, implementing an object with
several copies residing on different sites may introduce inconsistencies between copies of
the same object. To be consistent, a system should be one-copy equivalent; that is, it
should behave as if each object has only one copy in so far as the user can tell. A replica
control protocol is one that ensures that the database is one-copy equivalent. In [25]
Abbadi and Toueg consider database systems that are prone to both site and link failures.
Sites may fail by crashing or by failing to send or receive messages. Links may fail by
crashing, delaying, or failing to deliver messages. Several replica control protocols have
been proposed that tolerate different types of failures. In [25] they present a replica
control protocol that allows the accessing of data even when the database is partitioned. It
can be combined with any available concurrency control protocol to ensure the
correctness of a database.
They describe the formal database model and their correctness criteria. They
propose a replica control protocol, and then they prove it correct. Finally they present
several optimizations to the protocol. Their replica control protocol assumes two types of
transactions: user transactions, issued by the users of the database, and update
transactions, issued by the protocol. It is assumed that all transactions follow a
conflict-preserving concurrency control protocol, for example, two-phase locking. Such a
protocol ensures that logs are CP-serializable only at the level of copies (but not at the
object level). They present a replica control protocol that ensures that all logs are one-copy
serializable, and hence that transactions are serializable at the object level.
A user transaction t that is initiated at a site with view v is said to execute in v.
Informally, view v determines which objects t can read and write, as well as which copies
it can access or write. Views are totally ordered according to a unique view-id assigned to
each view, and two sites are said to have the same view if they have identical view-ids.
Their protocol ensures one-copy serializability by (1) ensuring that all transactions
executed in one view are one-copy serializable, and (2) ensuring that all transactions
executing in a “lower” view are serialized before transactions executing in a “higher”
view. Satisfying conditions (1) and (2) enforces a serialization of all transactions
executing in all views.
With each object x, the protocol associates read and write accessibility thresholds, Ar[x]
and Aw[x], respectively. An object x is read (write) accessible in a view only if Ar[x]
(Aw[x]) copies reside on sites in that view. The accessibility thresholds Ar[x] and Aw[x]
must satisfy:

Ar[x] + Aw[x] > n[x]

This relationship ensures that a set of copies of x of size Ar[x] has at least one
copy in common with any set of copies of x of size Aw[x].
In each view v, every object x is assigned a read and a write quorum, qr[x, v] and
qw[x, v]; these specify how many physical access and write operations are needed to read
and write an object x in view v. Let n[x, v] be the number of copies of x that reside on
sites in view v (formally, n[x, v] = |sites[x] ∩ v|). For each view v, the quorums of object
x must satisfy the following relations:

qr[x, v] + qw[x, v] > n[x, v]
2 qw[x, v] > n[x, v]
1 ≤ qr[x, v] ≤ n[x, v]
Aw[x] ≤ qw[x, v] ≤ n[x, v]

These relations ensure that, in a view v, a set of copies of x of size qw[x, v] has at
least one copy in common with any set of copies of x of size qr[x, v], qw[x, v], or Ar[x].
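As a small illustration (using our own names rather than the notation of [25]), a candidate assignment of thresholds and quorums for one object can be checked against these relations as follows.

```cpp
#include <cstdio>

// Sketch: checking the accessibility-threshold and quorum relations described above.
// All fields refer to a single object x; the names are illustrative.
struct ReplicationSettings {
    int n;      // n[x]: total number of copies of x
    int Ar;     // Ar[x]: read accessibility threshold
    int Aw;     // Aw[x]: write accessibility threshold
    int nInV;   // n[x,v]: copies of x residing on sites in view v
    int qr;     // qr[x,v]: read quorum in view v
    int qw;     // qw[x,v]: write quorum in view v
};

bool thresholdsValid(const ReplicationSettings& s) {
    // Ar[x] + Aw[x] > n[x]: read- and write-accessible copy sets must intersect.
    return s.Ar + s.Aw > s.n;
}

bool quorumsValid(const ReplicationSettings& s) {
    return s.qr + s.qw > s.nInV        // read and write quorums intersect in v
        && 2 * s.qw   > s.nInV         // two write quorums intersect in v
        && 1 <= s.qr && s.qr <= s.nInV
        && s.Aw <= s.qw && s.qw <= s.nInV;
}

int main() {
    // Example: 5 copies in total, 4 of them residing in the current view.
    ReplicationSettings s{5, 3, 3, 4, 2, 3};
    std::printf("thresholds ok: %d, quorums ok: %d\n",
                thresholdsValid(s), quorumsValid(s));
}
```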
Read operations use the version numbers associated with each copy to identify
(and read) the most “up-to-date” copy accessed. In their protocol, version numbers
consist of two fields (v-id, k). Intuitively, if a copy has version number (v-id, k), then this
copy was last written by a transaction t executing in a view v with view-id v-id, and t is
the kth transaction to write x in view v. A version number (v1-id, k1) is less than (v2-id,
k2) if v1-id < v2-id, or v1-id = v2-id and k1 < k2. Initially, all sites have a common view v0
with view-id v0-id, and all copies have version number (v0-id, 0).
A user transaction t executing in view v can read (write) an object x only if x is read
(write) accessible in view v. (Note that a site can determine whether an object is read or
write accessible from its local view alone, i.e., without accessing any copies.)
Furthermore, t can only read or write copies of x that reside on sites with view v (a
restriction relaxed by the optimizations in [25]). If object x is read accessible in view v, t
executes the logical operation r[x] by
1. Physically accessing qr[x,v] copies of x residing on sites in v (with view v),
2. Determining vnmax, the maximum version number of the selected copies, and
3. Reading the accessed copy with version number vnmax.
If object x is write accessible in view v, with view-id v-id, t executes the logical
operation w[x] by the following steps (a sketch of both logical operations follows this list):
1. Selecting qw[x, v] copies of x residing on sites in v (with view v),
2. Determining vnmax, the maximum version number of the selected copies, and
3. Writing all the selected copies and updating their version numbers to (v-id, l),
where l >= 1 is the smallest integer such that (v-id, l) is greater than vnmax.
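The sketch below illustrates the version-number ordering and the two logical operations; the data structures are our own simplification, and the physical access of copies across sites is abstracted away.

```cpp
#include <algorithm>
#include <cstdio>
#include <vector>

// Sketch of version numbers and quorum-based logical read/write as described above.
struct VersionNumber {
    int viewId;  // view-id of the view in which the copy was last written
    int k;       // sequence number of the writing transaction within that view
};

// (v1-id, k1) < (v2-id, k2) iff v1-id < v2-id, or v1-id = v2-id and k1 < k2.
bool lessThan(const VersionNumber& a, const VersionNumber& b) {
    return a.viewId < b.viewId || (a.viewId == b.viewId && a.k < b.k);
}

struct Copy {
    VersionNumber version;
    int value;   // stand-in for the object's state
};

// Logical read r[x]: access qr copies and read the one with the maximum version number.
int logicalRead(const std::vector<Copy>& accessedCopies) {
    auto it = std::max_element(accessedCopies.begin(), accessedCopies.end(),
                               [](const Copy& a, const Copy& b) {
                                   return lessThan(a.version, b.version);
                               });
    return it->value;
}

// Logical write w[x] in view v: write all selected copies and give them a version
// number (v-id, l) with the smallest l >= 1 that exceeds the maximum version seen.
void logicalWrite(std::vector<Copy>& selectedCopies, int viewId, int newValue) {
    VersionNumber vnMax{0, 0};
    for (const Copy& c : selectedCopies)
        if (lessThan(vnMax, c.version)) vnMax = c.version;
    int l = (vnMax.viewId == viewId) ? vnMax.k + 1 : 1;
    for (Copy& c : selectedCopies) {
        c.version = {viewId, l};
        c.value = newValue;
    }
}

int main() {
    std::vector<Copy> copies{{{1, 2}, 10}, {{2, 1}, 20}};   // qr = qw = 2 here
    std::printf("read -> %d\n", logicalRead(copies));       // reads the (2,1) copy
    logicalWrite(copies, 2, 30);                            // new version is (2, 2)
    std::printf("after write -> %d\n", logicalRead(copies));
}
```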
If a user transaction tries to access a copy that resides on a site with a view different
from the view of the site where the issuing transaction is initiated, that transaction is
aborted.
The first two quorum relations ensure that all logically conflicting operations issued
by user transactions executing in the same view also physically conflict. Furthermore,
since all transactions use version numbers and a conflict-preserving concurrency control
protocol, one can show that all transactions executing in the same view are one-copy
serializable.
There are trade-offs between their protocol and the quorum consensus protocol in
terms of cost. The quorum consensus protocol is designed for multi-version databases; it
must maintain a quorum assignment table and ensure that this table always satisfies the
quorum intersection invariant. This overhead allows transactions to run at increasingly
higher levels (by a process called inflation) without incurring update costs. However, to
satisfy the quorum intersection invariant, read quorums must increase monotonically with
the level number, making read operations more expensive at higher levels.
One way to guarantee consistency of replicated data is to force service operations to
occur in the same order at all sites, but this approach is expensive. For some applications
a weaker causal operation order can preserve consistency while providing better
performance. In [23], a new way of implementing causal operations, called lazy replication, is described.
Lazy replication is intended for an environment in which individual computers, or
nodes, are connected by a communication network. Both the nodes and the network may
fail, but it is assumed that the failures are not Byzantine. The nodes are fail-stop processors.
The network can partition, and messages can be lost, delayed, duplicated, and
delivered out of order. The configuration of the system can change; nodes can leave and
join the network at any time. They assume nodes have loosely synchronized clocks.
There are practical protocols, such as NTP, that synchronize clocks at low cost in
geographically distributed networks.
A replicated application is implemented by a service consisting of replicas running at
different nodes in a network. To hide replication from clients, the system also provides
front-end code that runs at client nodes. To call an operation, a client makes a local call to
the front end, which sends a call message to one of the replicas. The replica executes the
requested operation and sends a reply message back to the front end. Replicas
communicate new information (e.g., about updates) among themselves by lazily
exchanging gossip messages.
There are two kinds of operations: update operations modify but do not observe the
application state, while query operations observe the state but do not modify it.
(Operations that both update and observe the state can be treated as an update followed
by a query.) When requested to perform an update, the front end returns to the client
immediately and communicates with the service in the background. To perform a query,
the front end waits for the response from a replica, and then returns the result to the
client.
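A minimal sketch of this front-end behaviour is given below; it is our own simplification and omits the dependency timestamps that the lazy replication scheme of [23] attaches to gossip messages.

```cpp
#include <cstdio>
#include <deque>
#include <string>

// Minimal sketch of a lazy-replication front end: updates return immediately
// and are propagated in the background; queries wait for a replica's reply.
// Names are illustrative; the dependency tracking of [23] is omitted.
class Replica {
public:
    void apply(const std::string& update) { state_ += update + ";"; }
    std::string query() const { return state_; }
private:
    std::string state_;
};

class FrontEnd {
public:
    explicit FrontEnd(Replica& replica) : replica_(replica) {}

    // Update: acknowledge the client at once, ship the operation lazily.
    void update(const std::string& op) {
        pending_.push_back(op);              // control returns to the client immediately
    }

    // Query: flush pending updates (a stand-in for background gossip),
    // then wait for the replica's answer and return it to the client.
    std::string query() {
        gossip();
        return replica_.query();
    }

private:
    void gossip() {                          // background propagation, simplified
        while (!pending_.empty()) {
            replica_.apply(pending_.front());
            pending_.pop_front();
        }
    }

    Replica& replica_;
    std::deque<std::string> pending_;
};

int main() {
    Replica r;
    FrontEnd fe(r);
    fe.update("post(msg1)");                 // returns immediately
    std::printf("board: %s\n", fe.query().c_str());
}
```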
They proved the correctness of their solution methodically. There are many
solutions that use laziness to increase availability. There is a trade-off between
consistency and availability: to increase availability, replication is increased in different
ways, and this makes concurrency control harder.
6.2 Providing Availability using Agent-Based Design: Using heuristics to design
intelligent agents that increase availability is common. It is important for each agent in
such a system to possess its own DBMS, which is autonomous in all respects; dependency
on a distributed DBMS in order to gain access to data on another agent's database would
reduce the autonomy of these agents.
The system being modelled is essentially a multi-agent environment where each agent
has local data that is unique to its environment and cannot be found on another agent.
This also means that there is no replication of data, unlike in traditional synchronous
or asynchronous distributed databases.
Data will be partitioned vertically in this model. This means columns (parts of a
table) will be placed on agents based on relevance. For example, a relation for a machine
can be vertically partitioned so that columns used primarily by the manufacturing
department can be distributed to its computer, while the rest of the columns can be
distributed to the engineering department’s computer. In a multi-user database, multiple
statements inside different transactions could attempt to update the same data. This could
lead to the data becoming inconsistent. This is undesirable, and commercial systems use
rather complex techniques to handle concurrent transactions. Another issue is
determining the location of remote data. Agents are responsible for obtaining information
about data distribution and for planning how to acquire locks on data and make it available.
6.3. Availability through improving fault tolerance: In this approach, given an ordered
set of nodes, one can usually devise a rule that unambiguously imposes a desired logical
structure on this set. Then, the read and write operations can rely on this rule rather than
on the knowledge of a statically structured network in determining which replica sets
include quorums. If, in addition, at any time all operations can agree on a set of replicas
from which the quorums are drawn, then the protocol can adjust this set dynamically to
reflect detected failures and repairs, and at the same time guarantee consistency.
In this protocol, it’s assumed assume that each node is assigned a name and all
names are linearly ordered [14]. Among all nodes replicating the data item, we identify a
set of nodes considered the current epoch. At arty time, the data item may have only one
current epoch associated with it. Originally all replicas of the data item form the current
epoch. The system periodically runs a special operation, epoch checking, that polls all
replicas of the data item. If any members of the current epoch are not accessible (failures
detected), or any replicas outside the current epoch have been successfully contacted
(repairs detected), an attempt is made to forma new epoch. (Their epoch numbers
distinguishes Epochs, with later epochs assigned greater epoch numbers.) For this attempt
to be successful, the new epoch must contain a write quorum of the previous epoch, and
the list of the new epoch members (the epoch list) along with the new epoch number must
be recorded on every member of the new epoch. Then, due to the intersection property of
the quorums, it is possible to guarantee that, if the network partitions, the attempt to form
a new epoch will be successful in at most one partition and hence the uniqueness of the
current epoch will be preserved. For the same reason, any successful read or write
operation must contact at least one member of the current epoch and therefore obtain the
current epoch list. Hence, the operation can reconstruct the logical structure of the current
epoch and use it to identify read or write quorums. Similarly to dynamic voting, the
system will be available as long as some small number of nodes (the number depends on
the specific protocol) is up and connected.
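A condensed sketch of the epoch-checking step follows; it is our own formulation, uses a simple majority as a stand-in for the write quorum of the previous epoch, and abstracts away failure detection and the recording of the epoch list at every member.

```cpp
#include <algorithm>
#include <cstdio>
#include <iterator>
#include <set>

// Sketch of epoch checking for the fault-tolerance protocol described above.
// Node names are linearly ordered integers; failure detection is omitted.
struct Epoch {
    int number;               // later epochs get greater epoch numbers
    std::set<int> members;    // the epoch list
};

// Attempt to form a new epoch from the replicas currently reachable.
// The attempt succeeds only if the reachable set contains a write quorum
// (here simplified to a majority) of the previous epoch, which preserves
// uniqueness of the current epoch across network partitions.
bool tryNewEpoch(Epoch& current, const std::set<int>& reachable) {
    std::set<int> common;
    std::set_intersection(current.members.begin(), current.members.end(),
                          reachable.begin(), reachable.end(),
                          std::inserter(common, common.begin()));
    if (2 * common.size() <= current.members.size())
        return false;                      // no write quorum of the old epoch reachable
    current = Epoch{current.number + 1, reachable};
    return true;                           // new epoch list would be recorded on every member
}

int main() {
    Epoch e{0, {1, 2, 3, 4, 5}};
    std::set<int> reachable{1, 2, 3};      // nodes 4 and 5 appear to have failed
    bool formed = tryNewEpoch(e, reachable);
    std::printf("new epoch formed: %d (epoch %d, %zu members)\n",
                formed, e.number, e.members.size());
}
```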
7. Query Evaluation:
In [26], Yu and Chang describe a notation for expressing queries and use that notation
to show query evaluation plans. They note that the performance of a distributed
query-processing algorithm depends to a significant extent on the estimation algorithm
used to evaluate the expected sizes of intermediate relations, so the choice of a reasonable
estimation algorithm is extremely important. They categorize queries into two main
classes: tree queries and cyclic queries. There are optimal strategies for simple queries and
tree queries; heuristics are then presented to obtain good strategies for all queries.
7.1. Query Broker: In [27], Vu and Collet present their work on supporting flexible
query evaluation over large distributed, heterogeneous, and autonomous sources.
Flexibility means that the query evaluation process can be configured according to
application-specific contexts and resource constraints, and can also interact with its
execution environment. Their query evaluation is based on Query Brokers as basic units,
which allow the query processing to interact with its environment. Queries are evaluated
under query broker contexts defining the constraints of the evaluation task. Query Brokers
have dynamic behaviors towards their execution environment. Their paper focuses on the
definition and the role of query brokers in the query evaluation task in large-scale
mediation systems, and also shows how query brokers ensure the flexibility of this task.
Generally speaking, in mediation systems queries are formulated over a global
schema, also called the mediation schema. These queries, called global queries, are then
rewritten on local schemas, i.e. the schemas of the component sources, and decomposed
into sub-queries, called local queries. The sub-queries are evaluated by their appropriate
sources, and mediators assemble the intermediate results.
They propose Query Brokers as the basic units for evaluating queries: the query
evaluation process can be viewed as a set of query brokers, each of them ensuring the
evaluation of a sub-query. The execution context is specified through Query Brokers in
order to take specific requirements into consideration while processing queries and to
respect the system's available resources. This provides a means to adapt the query
evaluation process to its context. In addition, adaptivity of the query evaluation task is
enabled by interactions of each Query Broker with its execution environment (users,
execution circumstances) during the evaluation phase.
Figure 5. Hierarchical Mediation Architecture
They assume a mediator-based system, i.e. a set of wrapped sources and a mediator
with the task of responding to queries formulated on its global schema by using the
underlying sources. Mediators can be hierarchically organized as shown in Figure 5.
Following this approach, a mediator is built on other mediators and/or wrappers. Many
mediators can work together to respond to different queries. This approach is suitable for
building large-scale systems where mediators can be distributed over the network.
However, communication between mediators must be taken into consideration while
processing queries. As a result, query processing is distributed through mediator
hierarchies. This hypothesis allows them to generalize the mediation architecture, and also
the query processing architecture. As mentioned previously, the static optimization
approach is not suitable for processing queries in distributed and scalable mediation
systems because of the lack of statistics and the unpredictability of the execution
environment. They consider a query-processing architecture (cf. Figure 6) including the
following phases:
1. A parsing phase (parser), which syntactically and semantically analyses
queries. This phase is similar to that of traditional query processing;
2. A rewriting phase (rewriter), which translates the global query into local
queries on source schemas. This phase depends on the way mappings between
schemas are defined;
3. A preparing phase (preparation), which generates a query evaluation plan
(QEP). The form of a QEP is described in [27];
4. An evaluation phase (evaluator), which communicates with other components,
i.e. mediators or wrappers, for evaluating sub-queries.
Figure 6 Query Processing Architecture
Figure 7 presents Query Brokers (QBs), which wrap one or several query
operators. In other words, a QB corresponds to a sub-query and is the basic unit of
query evaluation. As a result, a QEP is represented as a graph of QBs such as the one in
Figure 7. This hierarchy of QBs fits the hierarchical mediation architecture presented in
Figure 5: each mediator corresponds to one or several query brokers processing
sub-queries. A Query Broker is defined by the following four parts (a sketch follows the list):
1. A context, which determines constraints for executing the query and meta-information
driving the evaluation tasks, i.e. optimization, execution of the sub-query wrapped by this
broker, and communication with other QBs. Examples of constraints are limits on
execution time, acceptance of partial results, limits on economic access cost, etc.
2. Operator(s), which determine how this QB processes data. Operators can be
built-in operators, i.e. pre-defined operators such as algebraic operators (e.g.
selection, projection, join, union) and communication operators (e.g. send,
receive), or user-defined (external) operators.
3. Buffer(s), which are used for separating the data streams between two QBs. Buffers
can take different forms, from simple buffers for synchronizing two QBs
operating at different speeds to more or less complicated caches for materializing
intermediate results; more details of buffer management are discussed in [27].
4. Rules (E-C-A), which define the behavior of the QB towards changes in the execution
environment, e.g. delayed data arrival, inaccessible data, query refinement, etc.
Using rules, QBs can change evaluation strategies, e.g. re-schedule or re-optimize
sub-queries, change the implementation of certain operators such as joins, and
react to query refinements during the execution phase.
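The sketch below mirrors this four-part definition; the class, operator, buffer and rule types are simplified stand-ins of our own and are not the interfaces of [27].

```cpp
#include <cstdio>
#include <functional>
#include <map>
#include <queue>
#include <string>
#include <vector>

// Illustrative sketch of a Query Broker as defined above; all names are ours.
using Tuple = std::vector<std::string>;

struct Context {                              // 1. evaluation constraints and meta-information
    double maxExecutionSeconds = 30.0;
    bool   acceptPartialResults = true;
    std::map<std::string, std::string> meta;  // e.g. arrive-data-rate, source-access
};

using Operator = std::function<Tuple(const Tuple&)>;   // 2. wrapped operator(s)

struct EcaRule {                              // 4. Event-Condition-Action rule
    std::string event;                        // e.g. "data-delayed"
    std::function<bool()> condition;
    std::function<void()> action;             // e.g. switch the join implementation
};

class QueryBroker {
public:
    QueryBroker(Context ctx, Operator op) : ctx_(std::move(ctx)), op_(std::move(op)) {}
    void addRule(EcaRule rule) { rules_.push_back(std::move(rule)); }

    // 3. Buffer separating the data streams of two brokers.
    void push(const Tuple& t) { input_.push(t); }

    void run() {
        while (!input_.empty()) {
            Tuple out = op_(input_.front());
            input_.pop();
            std::printf("QB produced: %s\n", out.empty() ? "<empty>" : out[0].c_str());
        }
    }

    void notify(const std::string& event) {   // react to the execution environment
        for (auto& r : rules_)
            if (r.event == event && r.condition()) r.action();
    }

private:
    Context ctx_;
    Operator op_;
    std::queue<Tuple> input_;
    std::vector<EcaRule> rules_;
};

int main() {
    QueryBroker qb(Context{}, [](const Tuple& t) { return t; });   // identity operator
    qb.addRule({"data-delayed", [] { return true; },
                [] { std::printf("re-scheduling sub-query\n"); }});
    qb.push({"row1"});
    qb.run();
    qb.notify("data-delayed");
}
```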
An overview of the functional architecture of a QB is given in [27]. The main modules
are a Buffer Manager, a Context Manager, a Rule Manager, an Evaluator, and a Monitor.
In the following, these modules are discussed.
The QB context consists of a set of parameters. They identify four categories of
parameters: those related to user requirements, e.g. limits on execution time (timeout), the
type of partial result (partial-result), limits on the economic cost of processing queries
(cost), data of interest (preference), etc.; availability of resources, e.g. memory-size,
CPU-time, etc.; meta-information, e.g. arrive-data-rate, source-access, etc.; and other query
variables which will be specified during query execution.
To achieve a flexible query evaluation framework, they adopted a rule-based
approach for defining the behavior of QBs. E-C-A rules allow Query Broker behavior
towards execution circumstances to be specified. The techniques for re-scheduling and
re-optimizing queries can be integrated into QBs as rules.
Figure 7 Interconnected Query Brokers
In [28], Evrendilek and Dogac provide a method to optimize a query over a distributed
database. They suppose that three steps are necessary to process global queries: first, a
global query is decomposed into sub-queries such that the data needed by each sub-query
are available from one local database; next, each sub-query is translated into a query
or queries of the corresponding local database system and sent to that local database system
for execution; third, the results returned by the sub-queries are combined into the answer.
They consider the optimization of query decomposition in the presence of data
replication and the optimization of inter-site joins, that is, the joins of the
results returned by the sub-queries. The optimization algorithm presented for inter-site
joins can easily be generalized to any operation required for inter-site query processing.
The algorithm presented in [28] is distributed and takes the federated nature of the
problem into account.
Nowadays, P2P systems are becoming more common. In such systems there is no global
knowledge: neither a global schema nor information about data distribution or indexes. The
only information a participating peer has is information about its neighbors, i.e. which
peers are reachable and which data they provide. The suitability of this approach was
already demonstrated by the success of well-known file sharing systems like Napster
and Gnutella. Furthermore, in such systems one cannot assume the existence of a global
schema, not even as the sum of the schemas of the neighbor peers, because adding a new
peer could trigger schema modifications for all other peers in the system. There are
several obvious advantages of such a schema-based P2P system. A main advantage is that
adding a new source (peer) is simplified because it requires only defining
correspondences to one peer already part of the system. Through this neighbor the new peer
becomes accessible from all other peers. Of course, such advantages are not for free.
7.2. Incomplete Schema and P2P Query Evaluation: In [29], Karnstedt, Hose and
Sattler address this problem by investigating different strategies for processing queries on
incomplete schemas. Their contribution is (1) dealing with the problem of incomplete
schema information and (2) a detailed comparison of different query processing strategies
in this context. They describe a system based on XML data. Without complete schema
information, two issues must be dealt with: first, correspondences between two schemas
must be expressed, and second, queries must be formulated without complete schema
information.
In distributed systems the general question is whether to execute the query at the
initiator’s side or at the peers that store the relevant data. In the first case the data is
moved to the initiator and all operations are executed there. This is called data shipping.
The second approach is called query shipping, because in this case the query is moved to
the data and only that part satisfying the query is shipped to the requestor for further
processing. Applying this strategy, the amount of data moved through the network is
reduced, because only data that a queried peer cannot process itself is sent to other
peers. Query and data shipping are the two general approaches to processing queries in a
distributed setting, but neither query shipping nor data shipping is the best policy for query
processing in all situations. Other techniques, which try to combine the advantages of both
approaches, have been developed; an example is hybrid shipping. In the query
shipping approach, the first step is to decompose the query into sub-queries
according to the known peers and their querying capabilities. In this way each peer
receives the part of the query that it (or the peers connected to it) is expected to support.
After decomposition the peer computes the corresponding result, or forwards the query to
other peers if it does not provide all the queried data. Another technique that has evolved is
called Mutant Query Plans: an execution plan constructed from the original query is sent as
a whole to other peers. Each peer decides by itself whether it can deliver any data; if so, it
writes the data into the plan, replacing the corresponding part of the query. Using such
mutating plans also provides the opportunity to optimize (parts of) the plan in a
decentralized way.
The query shipping technique they implemented is a variant of mutating query plans:
the query plan is shipped to the connected peers in parallel, and each peer inserts the data
it can provide. Besides the general approaches based on flooding, additional approaches
using global knowledge (all data at all peers is known to each peer) have been tested in
order to highlight the benefits of query shipping even more. The difference is that the
flooding-based approach generates more control messages than the global-knowledge
approach.
The approaches mentioned above, suitable for distributed systems, are not suitable for
real P2P systems without modifications. One cannot assume that all defined
correspondences are known to every peer. The general techniques for processing queries
must be modified in order to accomplish the required tasks of query transformation and data
collection step by step, with each peer responsible for querying its local neighbors using
only the locally defined mappings. If the processed query is formulated in a schema
unknown to the processing peer, the simplest way of processing it is to query all
neighbors, which is called flooding. This is an applicable strategy even if no
correspondences are defined, because by sending the query to each peer in the network
(up to a certain horizon) it will finally be shipped to those peers that know the used
schema.
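Flooding with a bounded horizon can be pictured as in the following sketch; the peer identifiers, the schema check, and the hop limit are invented for illustration.

```cpp
#include <cstdio>
#include <map>
#include <set>
#include <string>
#include <vector>

// Sketch of query flooding with a horizon in a schema-based P2P network.
// A peer that knows the query's schema answers; otherwise it forwards the
// query to its neighbours until the hop limit (horizon) is reached.
struct Peer {
    std::string knownSchema;
    std::vector<int> neighbours;
};

void flood(const std::map<int, Peer>& net, int peerId, const std::string& schema,
           int hopsLeft, std::set<int>& visited) {
    if (hopsLeft < 0 || visited.count(peerId)) return;
    visited.insert(peerId);
    const Peer& p = net.at(peerId);
    if (p.knownSchema == schema) {
        std::printf("peer %d can answer the query\n", peerId);
        return;                                // answer locally, stop forwarding
    }
    for (int n : p.neighbours)                 // forward to all neighbours
        flood(net, n, schema, hopsLeft - 1, visited);
}

int main() {
    std::map<int, Peer> net{
        {1, {"schemaA", {2, 3}}},
        {2, {"schemaB", {1, 3}}},
        {3, {"schemaC", {1, 2, 4}}},
        {4, {"schemaB", {3}}},
    };
    std::set<int> visited;
    flood(net, 1, "schemaB", /*horizon=*/2, visited);   // reaches peers 2 and 4
}
```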
They provide methods to route a query in the network despite the restrictions
encountered in P2P systems. The problem of routing is to decide which of the known peers
is most suitable for answering the query. A strategy adapted to their needs has to use
the partial information available. A possible approach is to use the defined
correspondences for routing: the defined mappings provide each peer with information
about what data is stored at its neighbors.
In order to reduce the number of messages, routing indexes can be quite useful. A
routing index is a data structure that allows queries to be routed only to peers that may store
the queried data; therefore, the data stored at each peer must somehow be associated with
data identifiers. Routing indexes may also be used to generate a list of priorities for
querying a peer's neighbors.
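The routing-index idea can be pictured as a per-peer map from data identifiers to the neighbours through which that data is reachable; the structure below is our own simplification, and a real index would also be kept approximate and updated lazily.

```cpp
#include <cstdio>
#include <map>
#include <set>
#include <string>

// Sketch of a routing index: for each data identifier, the set of neighbour
// peers through which that data is reachable. Identifiers and peer numbers
// are purely illustrative.
class RoutingIndex {
public:
    void add(const std::string& dataId, int neighbour) {
        index_[dataId].insert(neighbour);
    }
    // Route the query only to neighbours that may store the queried data.
    std::set<int> candidates(const std::string& dataId) const {
        auto it = index_.find(dataId);
        return it == index_.end() ? std::set<int>{} : it->second;
    }
private:
    std::map<std::string, std::set<int>> index_;
};

int main() {
    RoutingIndex ri;
    ri.add("orders", 2);
    ri.add("orders", 5);
    ri.add("customers", 3);
    for (int peer : ri.candidates("orders"))
        std::printf("forward query for 'orders' to neighbour %d\n", peer);
}
```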
7.3. Dynamic Query Evaluation: In [31], Jim and Suciu provide a method that
evaluates queries despite topology changes.
They propose a new paradigm for distributed query evaluation on the Web and
present the paradigm in terms of a simple query language, called dynamically
distributed datalog (d3log). The simple addition of dynamic site discovery changes the
character of query evaluation and motivates novel evaluation techniques. The main
paradigm change is the introduction of an intentional answer, in which a server responds
to a query not with a table, but rather with a set of rules. They develop a
framework for studying possible query evaluation techniques in the presence of dynamic
site discovery.
They started by studying query evaluation with site discovery for a non-recursive
query language similar to SQL. They soon discovered, however, that query evaluation in
this new setting is deeply and intimately connected to recursion, for three reasons. First,
recursion may be impossible to prohibit: the Web is not under centralized control, and
one cannot constrain what links other sites may have. Second, it is difficult to detect: sites
are unknown in advance, so recursion must be detected dynamically. Finally, they found
some applications, such as security infrastructures, where recursion arises naturally
and inevitably.
8. Heterogeneous Distributed Database:
Heterogeneous distributed databases link distributed database topics to legacy systems
and to the design of heterogeneous systems. Nowadays, using mediators to integrate
various types of systems is very common.
In [24], Stephen and Huhns describe a technique for creating mediators: information
system components that operate among distributed applications and distributed databases.
The databases are distributed logically and physically, typically residing on different
platforms and having different semantics. A mediator automates the process of merging
widely differing databases into a seamless, unified whole to be presented to the user.
Their system's underlying strength is its homogeneity: all components are modeled as
Java agents that communicate with each other in a common protocol (KQML) and with a
common semantics (the ontology).
The mediator-based information system consists of a mediator-reasoning core, a
universal ontology, wrappers to align the databases semantically with the ontology, and
agents that handle connectivity issues. Each component is modeled as an agent. The
mediator agent operates as a forward-chaining, rule-based system written declaratively. It
is multithreaded to support simultaneous queries and interactions. The mediator’s
intelligence is embedded in a fact and rule base residing in a text file that is easily edited.
By changing only the rule base, the mediator may be customized for any application.
Mediators mitigate the heterogeneity and distribution of information sources, but are
difficult to construct. The major problems involve semantics and communications. The
structure of a system for querying and updating heterogeneous databases, consisting of the
components listed above, is shown in Figure 8.
The information flow in the system consists of the following. Database and ontology
agents communicate their semantics to the mediator agent. A user can formulate database
commands in terms of the common ontology; the user’s agent sends these commands to
the mediator agent. The mediator reasons about schemas and ontologies to determine
relevant databases or other information resources. The mediator communicates to each
information resource’s “wrapper” agent, which maps from terms in the common ontology
to the resource’s schema.
The common ontology of concepts includes both entities and relationships among the
entities. A system user formulates database commands in terms of the common ontology.
Each database is accessed through a “wrapper” that maps from terms in the common
ontology to the database schema. A mediator takes the ontology and database schema
information and maps the user command to the appropriate database wrappers. The
database wrappers are agents that translate the command into the local database schema.
The wrappers return results to the mediator in terms of the common ontology. For query
commands, the mediator gathers the results and passes them to the user.
The mediator is based on a uniform interaction among the components of the
information system, with each component modeled as an agent. There are separate agents
for each user interface, the ontology, and each database wrapper.
Figure 8 Agent-based Mediator Structure
In the agent-based mediator of [24], for simplicity, all interagent communication occurs
through a special agent called the router. The agent router allows Java applets to
exchange messages with any registered agent on the Internet. (Netscape's security
restrictions prohibit a given agent from communicating with an agent not spawned on the
same host.) The agent router allows any registered agent to send messages to any other
registered agent by making a single socket connection to the agent router. Messages are
forwarded without the sending agent having to know the receiving agent's absolute
address or make a separate socket connection, as with the usual Agent Name Server
(ANS) infrastructure. Like an e-mail server, the agent router buffers all messages so that
they are not lost due to transient network problems. If an individual agent goes down or
logs out, it can retrieve its messages at a later time.
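The router's role can be sketched as a store-and-forward mailbox keyed by agent name; this is our own simplification and ignores the KQML message format and the socket handling of the real system.

```cpp
#include <cstdio>
#include <deque>
#include <map>
#include <string>

// Sketch of a buffering agent router: agents register by name, senders address
// messages by receiver name only, and messages are held until retrieved, so a
// temporarily disconnected agent can pick them up later. Names are illustrative.
class AgentRouter {
public:
    void registerAgent(const std::string& name) { mailboxes_[name]; }

    void send(const std::string& from, const std::string& to, const std::string& body) {
        mailboxes_[to].push_back(from + ": " + body);   // buffered until retrieval
    }

    std::deque<std::string> retrieve(const std::string& name) {
        std::deque<std::string> msgs;
        msgs.swap(mailboxes_[name]);                    // hand over and clear the mailbox
        return msgs;
    }

private:
    std::map<std::string, std::deque<std::string>> mailboxes_;
};

int main() {
    AgentRouter router;
    router.registerAgent("mediator");
    router.send("user-agent", "mediator", "SELECT * FROM parts");
    for (const auto& m : router.retrieve("mediator"))
        std::printf("mediator received -> %s\n", m.c_str());
}
```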
The mediator contains rules that logically connect the input ontology and database
schemas. When these rules fire, a mediator agent is instantiated for the given input of
ontology and schemas. The resulting agent has a mapping between the domain ontology
and the database schemas. This mapping is the basis for the query mediation.
At run time, an application program submits an SQL command to the interface
agent, which forwards the command to the mediator agent. The mediator agent parses the
query, reasons about which databases may have relevant information, reasons about any
necessary decomposition of the query, and then sends the decomposed query to the
resource agents for the relevant databases. These agents accept the SQL commands and,
using JDBC, connect to, open, and query the databases.
9. Conclusion:
A great deal of theoretical work has been done on the design of distributed database
systems and transaction processing. Many concepts, models, and algorithms have been
developed for concurrency control, crash recovery, communication, query optimization,
nested transactions, site failures, network partitions, and replication. Simultaneously, a
major effort has been put into implementing these ideas in experimental and
commercial systems spanning a wide spectrum of distributed database systems, distributed
operating systems and programming languages. Implementers have discarded many
ideas while adopting others, such as two-phase locking, two-phase commit, and remote
procedure calls. They have built workable systems but still wrestle with the problems of
reliability and replication. The true benefits of distributed database systems, such as high
availability and good response time, appear to be within reach.
In this paper, we have presented a general view of recent work in distributed database
systems and surveyed related topics. There are many new approaches used to improve
distributed database systems. Using intelligent mediators is very common in managing
heterogeneous databases. An agent-based tool is also well suited to the problem of providing
inference prevention capability in a distributed database system. There are many advantages
to using agents as DDBMS components, some of which are listed below:
1. Since the agents work in parallel and are local to the databases, the performance
benefit of distribution is not lost. There is no bottleneck through which all queries
must pass.
2. Similarly, the survivability benefit of distribution is not lost. The potential single
point of failure represented by a centralized Rational Downgrader is avoided.
3. The compartmentalization provided by a distributed scheme is preserved.
Databases can prevent the inference of sensitive data in other databases without
knowing exactly what the nature of that data is.
4. Interoperability is ensured. Heterogeneous databases can participate in the
inference prevention effort as long as they are compliant with the SQL standard.
5. A separation of concerns is maintained. Changes to the inference prevention
scheme do not require changes to the database management systems and vice
versa.
New distributed databases such as P2P systems raise new requirements for query
evaluation, concurrency control algorithms, and so on. The Web, a distributed system that
can be viewed as a data source, requires its own kind of research related to distributed
database systems. In this paper, we have considered some recent approaches for designing
distributed databases, such as agent-based approaches.
10. References
[1] D. Pinto and G. Torres. “On Dynamic Fragmentation of Distributed Databases Using
Partial Fragmentation replication”, Facultad de Ciencias de la Computación Benemérita
Universidad Autónoma de Puebla, 2002
[2] S. Voulgaris, M.V. Steen, A. Baggio, and G. Ballintjn. “Transparent Data Relocation
in Highly Available Distributed Systems”. Studia Informatica Universalis. 2002
[3] A.Zhang, M.Nodine, B.Bhargava and O.Bukhres, Ensuring Relaxed Atomicity for
Flexible Transaction in Multi-database Systems, SIGMOD’94 ACM conference
[4] A. Zhang, M. Nodine and B. Bhargava, Global scheduling for flexible Transactions in
Heterogeneous distributed Database Systems
[5] J.H.Harista, K.Ramamritham and R.Gupta. The PROMPT Real-Time Commit
Protocol, IEEE,1999
[6] P.Saha, One-Phase Real-Time Commit Protocols, Master engineering thesis in
computer science and engineering, Indian Institute of Science , 1999
[7] N. Nouali, A. Doucet and H. Drias, “A Two-Phase Commit Protocol for Mobile
Wireless Environment”, Australian Computer Society, 6th Australasian Database
Conference (ADC 2005),2005
[8] A. Silberschatz, H. F. Korth, S. Sudarshan, "Database System Concepts", 4th edition,
McGraw-Hill, 2002
[9] V. Goenka, “Peterson’s algorithm in a multi agent database system”, Knowledge
representation and reasoning Journal, July 2003
[10] R. J. Peris, M. P. Martynez, and S. Arevalo. “An Algorithm for Deterministic
Scheduling for Transactional Multithreaded Replicas and its Correctness”, In IEEE
SRDS, 2000.
[11] R. J. Peris and M. P. Martynez, “Deterministic Scheduling and Online Recovery for
Replicated Multithreaded Transactional Servers”, In Workshop on Dependable
Middleware-Based Systems, 2002.
[12] B. K. S. Khoo, S. Chandramouli, “An Agent-Based Approach to Collaborative
Schema Design”, Workshop on Agent Based Information Systems at Autonomous Agent
Conference, May, 1999
[13] B. Bhargava, “Building Distributed Database Systems”, 26th VLDB Conference
Cairo, Egypt 2003
[14] M. Rabinovich, E. D. Lazowska, ” Improving Fault Tolerance and Supporting
Partial writes in Structured Coterie Protocols for Replicated Objects”, ACM SIGMOD,
USA, 1992
[15] L. M. Haas, C. Mohan, R. F. Wilms, R. A. Yost, “Computation and Communication in R*: A
Distributed Database Manager”, ACM Transactions on Computer Systems, Vol. 2, No. 1.
1984
[16] M. Blakey, “Models a Very Large Distributed Database”, ACM Transactions on
Computer Systems, Vol. 10, No. 6. 1992
[17] P. A. Bernstein and N. Goodman, ”Concurrency Control in Distributed Database
Systems”, Computing Surveys, Vol. 13, No. 2, June 1981
[18] M. J. Carey and M. Livny “Distributed Concurrency Control Performance: A Study
of Algorithms, Distribution, and Replication”, 14th VLDB Conference Los Angeles,
California 1988
[19] P.A. Franaszek, J. T. Robinson and Thomson, “Concurrency Control for High
Contention Environments” ACM Transactions on Database Systems, 1992
[20] A. Thomasian, “Performance Limits of Two-Phase Locking”. IEEE International
Conference on Data Engineering, 1991
[21] M.J. Carey, and M. Livny “Parallelism and Concurrency Control Performance in
Distributed Database Machines”, 1989 ACM SIGMOD, 1989
[22] Kj. Norvag, O. Sandsta and K. Bratbergsengen, "Concurrency Control in Distributed
Object-Oriented Database Systems", Advances in Databases and Information Systems,
1997
[23] R. Ladin, B. Liskov, L. Shirra, S. Ghemawat, “Providing High Availability Using
Lazy Replication “, ACM Transactions on Computer Systems, Vol. 10, No. 4. 1992
[24] L. M. Stephen, M. N. Huhns, “Database Connectivity Using an Agent-Based
Mediator System”, Nov 2004
[25] A. El Abbadi, S. Toueg, "Maintaining Availability in Partitioned Replicated
Databases", ACM Transactions on Database Systems, Vol. 14, No. 2, 1989
[26] C. T. Yu, C. C. Chang, “Distributed Query Processing”, Computing Surveys, Vol
14, No. 2, Dec 1984
[27] T. T. Vu, C. Collet, “ Query Brokers for Distributed and Flexible Query Evaluation”,
ACM Transactions on Database Systems, 2004
[28] C. Evrendilek, A. Dogac, “Query Decomposition, Optimization and Processing in
Multidatabase Systems”, ACM Transactions on Computer Systems, Vol. 11, No. 4. 2003
[29] M. Karnstedt, K. Hose, K. U. Sattler, "Distributed Query Processing in P2P
Systems with Incomplete Schema Information", ACM Transactions on Database Systems,
2004
[30] J. Smith, A. Gounarlis, P. Watson, N. W. Paton, A. A. A. Fernandes, R. Sakellariou,
“Distributed Query Processing On The Grid”, International Journal of High Performance
Computing Applications, Vol 17, No. 4 , Winter 2003
[31] T. Jim, D. Suciu, “Dynamically Distributed Query Evaluation”, ACM Transactions
on Database Systems, 2003