Chapter 11
Grid Concurrency Control
11.1 A Grid Database Environment
11.2 An Example
11.3 Grid Concurrency Control (GCC)
11.4 Correctness of GCC
11.5 Features of GCC Protocol
11.6 Summary
11.7 Bibliographical Notes
11.8 Exercises
Grid Concurrency Control

A concurrency control protocol helps maintain the consistency of data in a database

A concurrency control protocol addresses the 'C' (consistency) and 'I' (isolation) of the ACID properties

Serializability is the most widely accepted correctness criterion

Different DB architectures need different concurrency control protocols, i.e. the concurrency control protocol for a centralized DBMS will be different from that of a distributed DBMS
D. Taniar, C.H.C. Leung, W. Rahayu, S. Goel: High-Performance Parallel Database Processing and Grid Databases, John Wiley & Sons, 2008
11.1 A Grid Database Environment
Data is geographically distributed in a Grid environment. The typical operation of a database in a Grid architecture is shown in the figure
[Figure: transactions T1 and T2 are submitted at DB1; the Grid middleware forwards subtransactions ST12 and ST22 to DB2, and ST13 and ST23 to DB3]
Legend:
T1 : Transaction 1
T2 : Transaction 2
STij : Subtransaction of transaction i at site j
A distributed Grid DB with 3 sites is shown: DB1, DB2, and DB3 (connected via the Grid middleware)
Transactions can be submitted at any site and may need to access data from all the sites
The originator / coordinator is the site where a transaction is submitted
Transactions T1 and T2 are submitted to DB1 and need to access data from DB2 and DB3 as well
Subtransaction identifiers are suffixed with the transaction and site identifiers, e.g. T1 has subtransactions ST12 and ST13, and T2 has subtransactions ST22 and ST23
11.1 A Grid Database Environment (Cont’d)

Data access must be synchronized to maintain the correctness of data

Global lock tables, global logs, etc. cannot be implemented in a Grid environment, as there is no global management layer over the autonomous sites

Different DB sites may implement different concurrency control protocols, e.g. one site may use locking whereas another site may use an optimistic concurrency control protocol

This situation is unavoidable in a Grid architecture because of the heterogeneous DB sites
11.2 An Example
The following example shows that using traditional concurrency control protocols in the Grid environment may corrupt the data
Example

Consider four data objects stored in two databases, DB2 and DB3:

DB2 = O1 and O2
DB3 = O3 and O4

Two transactions are submitted to the database DB1, as shown
below:
T1 = r1(O1) r1(O2) w1(O3) w1(O1) C1
T2 = r2(O1) r2(O3) w2(O4) w2(O1) C2

The transactions are submitted to the Grid middleware and the
metadata service forms required sub-transactions as follows:

Sub-transactions of T1:
ST12 = r12(O1) r12(O2) w12(O1) C12   (11.1)
ST13 = w13(O3) C13   (11.2)

Sub-transactions of T2:
ST22 = r22(O1) w22(O1) C22   (11.3)
ST23 = r23(O3) w23(O4) C23   (11.4)
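The split performed by the metadata service can be sketched in a few lines of Python. This is a minimal sketch, assuming a hypothetical catalogue `CATALOG` that maps each data object to the site storing it, sites named `DB<j>`, and the `rX(Oy)` / `wX(Oy)` notation used above; `split_trans` is an illustrative name, not the book's implementation.

```python
import re

# Hypothetical metadata catalogue: data object -> site that stores it.
CATALOG = {"O1": "DB2", "O2": "DB2", "O3": "DB3", "O4": "DB3"}

# Matches read/write operations such as "r1(O1)" or "w2(O4)".
OP = re.compile(r"([rw])(\d+)\((\w+)\)")


def split_trans(txn_id, ops, catalog=CATALOG):
    """Group the operations of global transaction `txn_id` by the site that
    holds each object; every group becomes one subtransaction STij."""
    subs = {}
    for op in ops:
        m = OP.fullmatch(op)
        if m is None:
            continue  # the global commit (e.g. "C1") is replaced by local commits
        kind, _, obj = m.groups()
        j = catalog[obj][2:]  # "DB2" -> "2" (assumes sites are named "DB<j>")
        subs.setdefault(f"ST{txn_id}{j}", []).append(f"{kind}{txn_id}{j}({obj})")
    for name, sub_ops in subs.items():
        sub_ops.append("C" + name[2:])  # each subtransaction commits locally
    return subs


t1 = split_trans(1, "r1(O1) r1(O2) w1(O3) w1(O1) C1".split())
t2 = split_trans(2, "r2(O1) r2(O3) w2(O4) w2(O1) C2".split())
for name, ops in {**t1, **t2}.items():
    print(name, "=", " ".join(ops))
```

Running it prints the four subtransactions of equations 11.1 to 11.4, one per line.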
11.2 An Example (Cont’d)


The sub-transactions are submitted to respective sites, i.e. ST12 and
ST22 are submitted to DB2 and ST13 and ST23 are submitted to DB3
As all DB sites are autonomous, schedules/histories are created independently. Say DB2 creates the following history:
H2 = r12(O1) r12(O2) w12(O1) C12 r22(O1) w22(O1) C22   (11.5)
and DB3 creates the following history:
H3 = r23(O3) w23(O4) C23 w13(O3) C13   (11.6)
From equation 11.5 (conflicts on O1), the serializability order is T1 before T2, and from equation 11.6 (conflict on O3) it is T2 before T1
There is no problem in executing histories H2 and H3 in isolation, but when both histories are combined the serializability graph contains a cycle T1 → T2 → T1
Traditional distributed DBs handle this situation with a global management layer, which is not possible in Grid databases. Next, the Grid Concurrency Control protocol is discussed
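The cycle can be checked mechanically. The sketch below is a minimal illustration, assuming the operation notation of this example (single-digit transaction and site indices); it derives conflict arcs between global transactions from each local history and tests the combined graph for a cycle with a depth-first search. The function names are hypothetical.

```python
import re
from itertools import combinations

# "r12(O1)" -> read by transaction 1 at site 2 on object O1 (single-digit ids assumed)
OP = re.compile(r"([rw])(\d)(\d)\((\w+)\)")


def conflict_edges(history):
    """Arcs Ti -> Tk between global transactions, from one local history."""
    ops = []
    for tok in history.split():
        m = OP.fullmatch(tok)
        if m:  # commit operations such as C12 never conflict
            kind, txn, _site, obj = m.groups()
            ops.append((kind, "T" + txn, obj))
    edges = set()
    # combinations() keeps history order, so the first operation came first
    for (k1, t1, o1), (k2, t2, o2) in combinations(ops, 2):
        if o1 == o2 and t1 != t2 and "w" in (k1, k2):
            edges.add((t1, t2))
    return edges


def has_cycle(edges):
    """Depth-first search for a cycle in a directed graph given as arcs."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set())
    state = {}  # node -> "visiting" or "done"

    def visit(node):
        state[node] = "visiting"
        for nxt in graph[node]:
            if state.get(nxt) == "visiting" or (nxt not in state and visit(nxt)):
                return True
        state[node] = "done"
        return False

    return any(node not in state and visit(node) for node in graph)


H2 = "r12(O1) r12(O2) w12(O1) C12 r22(O1) w22(O1) C22"
H3 = "r23(O3) w23(O4) C23 w13(O3) C13"
e2, e3 = conflict_edges(H2), conflict_edges(H3)
print(e2, e3)                                            # {('T1', 'T2')} {('T2', 'T1')}
print(has_cycle(e2), has_cycle(e3), has_cycle(e2 | e3))  # False False True
```

Each site on its own sees an acyclic graph; the cycle only appears once the arcs from both sites are combined, which is exactly the global view that a Grid environment lacks.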
11.3 Grid Concurrency Control (GCC)


The above example is the motivation for GCC: although individual sites generate serializable schedules, the transactions may be ordered incorrectly in the global view
Functions required by GCC:





DB_Accessed(T): takes a global transaction as argument and returns the set of databases where the sub-transactions of the global transaction are submitted
Split_Trans(T): takes a global transaction as argument and returns the set of its sub-transactions
Active_Trans(DB): takes a database as argument and returns the set of global transactions having any sub-transaction running in that database
Cardinality(Any Set): takes any set, e.g. a set of databases or a set of sub-transactions, and returns the number of elements in the set
Append_TS(Subtransaction): takes a sub-transaction as argument and attaches a timestamp to it. The timestamp is unique to each global transaction: sub-transactions of the same global transaction have the same timestamp value
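The five functions can be sketched over a toy in-memory state. This is a minimal sketch, assuming a hypothetical `PLAN` table (the split of T1 and T2 from Section 11.2) and a hypothetical snapshot `RUNNING` of the subtransactions currently executing at each site; the lower-case Python names stand in for the functions listed above and are not an API from the book.

```python
from itertools import count

# Hypothetical metadata: global transaction -> {site: subtransaction}.
PLAN = {
    "T1": {"DB2": "ST12", "DB3": "ST13"},
    "T2": {"DB2": "ST22", "DB3": "ST23"},
}
# Hypothetical snapshot of the subtransactions currently running at each site.
RUNNING = {"DB2": {"ST12", "ST22"}, "DB3": {"ST23"}}

_clock = count(1)  # timestamp service: yields 1, 2, 3, ...
_txn_ts = {}       # global transaction -> its timestamp


def global_txn(sub):
    """'ST12' -> 'T1' (assumes single-digit site identifiers)."""
    return "T" + sub[2:-1]


def db_accessed(txn):
    return set(PLAN[txn])


def split_trans(txn):
    return set(PLAN[txn].values())


def active_trans(db):
    return {global_txn(sub) for sub in RUNNING.get(db, set())}


def cardinality(any_set):
    return len(any_set)


def append_ts(sub):
    """Attach a timestamp; all subtransactions of one global transaction share it."""
    txn = global_txn(sub)
    if txn not in _txn_ts:
        _txn_ts[txn] = next(_clock)
    return sub, _txn_ts[txn]


print(cardinality(db_accessed("T1")), sorted(active_trans("DB3")))  # 2 ['T2']
print(append_ts("ST12"), append_ts("ST13"), append_ts("ST23"))      # ('ST12', 1) ('ST13', 1) ('ST23', 2)
```

A test such as `cardinality(split_trans(T)) > 1` is one way to express the distinction, used later by GCC, between global transactions with one sub-transaction and those with more than one.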
11.3 Grid Concurrency Control (GCC) (Cont’d)
Grid Serializability Theorem

Traditional conflict serializability is not sufficient to ensure consistency in a Grid database environment

A Grid serializability theorem is needed to ensure the correctness of data

Global transactions can be classified into 2 categories:
Global transactions with only one sub-transaction, and
Global transactions having more than one sub-transaction
Total order is defined as below:
11.3 Grid Concurrency Control (GCC) (Cont’d)




In traditional serializability theory, a serial history is considered correct. On the same grounds, a Grid-serial history is considered correct in a Grid architecture
A Grid-serial history is defined as below:
Condition (1) of Definition 11.2 is very strict and does not allow interleaving of operations
Hence a more practical notion, the Grid-serializable history, is used
11.3 Grid Concurrency Control (GCC) (Cont’d)

Grid serializable history:

Grid serializability is analysed using the Grid serializability graph

If the graph is acyclic, the history is Grid serializable
11.3 Grid Concurrency Control (GCC) (Cont’d)

Grid Serializability graph is defined as below:
11.3 Grid Concurrency Control (GCC) (Cont’d)





Condition (1) considers local transactions in the Grid serializability graph
Condition (2) only considers those global transactions having more than one sub-transaction
Condition (3) defines the arcs between conflicting transactions
The Grid serializability graph is stored at the local sites, as there is no global management layer
The following types of conflicts are possible:
Conflict between global transactions (global-global conflict)
Conflict between a global transaction and a local transaction (global-local conflict)
Conflict between local transactions (local-local conflict)
An acyclic Grid serializability graph is used to resolve global-local conflicts
Total order is used to resolve global-global conflicts
11.3 Grid Concurrency Control (GCC) (Cont’d)

Based on the Grid serializability graph and the total order, the Grid serializability theorem is as follows:
11.3 Grid Concurrency Control (GCC) (Cont’d)



Example of a Grid serializability graph:
In addition to the global transactions of the earlier example, consider the following local transactions (LT12 is read as local transaction 1 at database site DB2):
LT12 = lr12(O1) lw12(O2) lC12
LT13 = lw13(O3) lC13

Now consider the following modified histories:
H2 = lr12(O1) r12(O1) r12(O2) w12(O1) C12 r22(O1) w22(O1) lw12(O2) C22 lC12
H3 = r23(O3) w23(O4) lw13(O3) C23 w13(O3) C13 lC13
11.3 Grid Concurrency Control (GCC) (Cont’d)
The following figure shows the Grid serializability graphs at sites DB2 and DB3
[Figure: Grid serializability graph at site DB2 (nodes ST12, ST22, LT12) and at site DB3 (nodes ST13, ST23, LT13)]
Three possible types of conflicts are discussed below:
Global-global conflict: At site DB2, ST12 precedes ST22 (i.e. T1 precedes T2), and at site DB3, ST23 precedes ST13 (i.e. T2 precedes T1). Thus a cycle is formed across different sites, and it may be impossible to identify this cycle without a global management layer. The total order used in Grid serializability prevents the formation of cycles across distributed sites
Global-local conflict: Can be identified and resolved by the local DBMS, e.g. between ST12 and LT12 in DB2
Local-local conflict: Can be identified and resolved by the local DBMS, as in a traditional DBMS
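The arcs behind these graphs can be derived from the modified histories H2 and H3. This is a minimal sketch, assuming the notation above (a leading `l` marks an operation of a local transaction, and indices are single digits); it labels each arc with its conflict type and reports pairs of transactions with arcs in both directions, the simplest form of a local cycle. The names `site_graph` and `mutual_arcs` are hypothetical.

```python
import re
from itertools import combinations

# "lr12(O1)" -> local transaction LT12; "r12(O1)" -> subtransaction ST12.
OP = re.compile(r"(l?)([rw])(\d)(\d)\((\w+)\)")


def site_graph(history):
    """Arcs of one site's Grid serializability graph, mapped to conflict type."""
    ops = []
    for tok in history.split():
        m = OP.fullmatch(tok)
        if m:  # commit operations such as C12 or lC12 are ignored
            local, kind, i, j, obj = m.groups()
            ops.append((("LT" if local else "ST") + i + j, kind, obj))
    arcs = {}
    # combinations() keeps history order, so arc (a, b) means a's operation came first
    for (a, k1, o1), (b, k2, o2) in combinations(ops, 2):
        if o1 == o2 and a != b and "w" in (k1, k2):
            kinds = sorted("local" if n.startswith("LT") else "global" for n in (a, b))
            arcs[(a, b)] = "-".join(kinds)
    return arcs


def mutual_arcs(arcs):
    """Pairs with arcs both ways: a 2-cycle the local DBMS must resolve."""
    return {frozenset(pair) for pair in arcs if pair[::-1] in arcs}


H2 = "lr12(O1) r12(O1) r12(O2) w12(O1) C12 r22(O1) w22(O1) lw12(O2) C22 lC12"
H3 = "r23(O3) w23(O4) lw13(O3) C23 w13(O3) C13 lC13"
for site, h in (("DB2", H2), ("DB3", H3)):
    g = site_graph(h)
    print(site, sorted(g.items()), mutual_arcs(g))
```

At DB2 the global-global arc ST12 → ST22 orders T1 before T2, while at DB3 the arc ST23 → ST13 orders T2 before T1; neither site can see the other's arc. The arcs in both directions between ST12 and LT12 at DB2 (through O1 and O2) form a global-local conflict that the local DBMS can detect on its own.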
11.3 Grid Concurrency Control (GCC) (Cont’d)
Grid Concurrency Control Protocol

It has 2 phases: submission and termination

The site where a transaction is submitted is called the originator

The Split_Trans(T) function is used to generate the sub-transactions of a global transaction

Sub-transactions are then submitted to the participating sites

A timestamp, unique to the global transaction, is attached to each sub-transaction before submission

Sub-transactions at local databases are executed in total order

A local scheduler does not distinguish between a local transaction and a sub-transaction of a global transaction

A global transaction with only one sub-transaction does not need to follow the total order, as it cannot conflict with other global transactions at more than one site
GCC (Cont’d)

Submission phase of GCC
11.3 Grid Concurrency Control (GCC) (Cont’d)

Step-1) Checks whether data from multiple sites needs to be accessed
If data from only the originator is required, the transaction is treated as a local transaction
If multiple DBs need to be accessed, the transaction is submitted to the metadata services, and the Split_Trans(T) function is used to create its sub-transactions
Step-2) Global transactions are added to a set, named Active_Trans, which stores all currently executing global transactions
Step-3) The middleware appends a timestamp to all sub-transactions before submitting them to the respective databases
Step-4) If more than one active global transaction accessing more than one database exists simultaneously, their sub-transactions are executed in total order (according to the timestamp)
Step-5) When all sub-transactions of a global transaction finish execution, the global transaction is removed from the Active_Trans set (details in the termination phase of GCC)
Note: Active_Trans is the set of currently active global transactions, whereas Active_Trans(DB) is a function that takes a DB site as argument and returns the active global transactions executing in that database
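The submission steps can be sketched as a toy middleware. This is a minimal sketch under stated assumptions: `GCCSubmission`, `submit` and `next_at` are hypothetical names, the Split_Trans result is passed in as a `{site: subtransaction}` dictionary, and each site keeps a priority queue ordered by timestamp. It ignores message delays, i.e. it assumes all subtransactions are enqueued before a site dequeues any of them.

```python
import heapq
from itertools import count


class GCCSubmission:
    """Toy model of the GCC submission phase; all names are hypothetical."""

    def __init__(self, originator):
        self.originator = originator
        self.clock = count(1)      # Step-3: timestamp service
        self.active_trans = set()  # Step-2: Active_Trans
        self.queues = {}           # site -> heap of (timestamp, subtransaction)

    def submit(self, txn, split):
        """`split` is the Split_Trans(T) result as {site: subtransaction}."""
        sites = set(split)
        if sites == {self.originator}:
            return "local"         # Step-1: only originator data, treat as local
        if len(sites) == 1:
            return "single-site"   # one subtransaction: no total order (not tracked here)
        self.active_trans.add(txn)  # Step-2
        ts = next(self.clock)       # Step-3: one timestamp per global transaction
        for site, sub in split.items():
            heapq.heappush(self.queues.setdefault(site, []), (ts, sub))
        return ts

    def next_at(self, site):
        """Step-4: each site takes multi-site subtransactions in timestamp order."""
        queue = self.queues.get(site)
        return heapq.heappop(queue)[1] if queue else None


gcc = GCCSubmission("DB1")
print(gcc.submit("T1", {"DB2": "ST12", "DB3": "ST13"}))  # 1
print(gcc.submit("T2", {"DB3": "ST23", "DB2": "ST22"}))  # 2
print(gcc.next_at("DB3"), gcc.next_at("DB3"))            # ST13 ST23
```

Step-5 (removing a finished transaction from Active_Trans) belongs to the termination phase and is not modelled here.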
11.3 Grid Concurrency Control (GCC) (Cont’d)
Termination phase of GCC

A global transaction remains active as long as even one of its sub-transactions is executing

The steps of termination are as follows:
When a sub-transaction finishes execution, the originator is informed
The Active Transactions, Conflicting Active Transactions and Databases Accessed by the global transaction sets are updated accordingly
Check whether the completed sub-transaction is the last sub-transaction of the global transaction
If it is not the last, then the sub-transactions waiting in the queue cannot be scheduled yet
If it is the last sub-transaction of the global transaction, then other conflicting sub-transactions can be scheduled. Sub-transactions from the queue then follow the normal submission steps
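The termination steps can likewise be sketched. This is a minimal sketch, assuming hypothetical names (`GCCTermination`, `start`, `enqueue`, `finished`) and, for simplicity, that each waiting subtransaction is blocked by a single global transaction; the set of still-running subtransactions per global transaction plays the role of Active_Trans.

```python
class GCCTermination:
    """Toy model of the GCC termination phase; names are hypothetical and each
    waiting subtransaction is assumed to wait on a single global transaction."""

    def __init__(self):
        self.pending = {}  # active global txn -> its unfinished subtransactions
        self.waiting = []  # (blocking global txn, queued subtransaction)

    @property
    def active_trans(self):
        return set(self.pending)

    def start(self, txn, subs):
        self.pending[txn] = set(subs)

    def enqueue(self, sub, blocked_by):
        self.waiting.append((blocked_by, sub))

    def finished(self, txn, sub):
        """Called when the originator is told that `sub` of `txn` has finished.
        Returns the queued subtransactions that may now be submitted."""
        self.pending[txn].discard(sub)
        if self.pending[txn]:
            return []          # not the last one: the queue stays blocked
        del self.pending[txn]  # last one: txn leaves Active_Trans
        released = [s for b, s in self.waiting if b == txn]
        self.waiting = [(b, s) for b, s in self.waiting if b != txn]
        return released        # these follow the normal submission steps


term = GCCTermination()
term.start("T1", ["ST12", "ST13"])
term.enqueue("ST22", blocked_by="T1")
term.enqueue("ST23", blocked_by="T1")
print(term.finished("T1", "ST13"))  # []
print(term.finished("T1", "ST12"))  # ['ST22', 'ST23']
print(term.active_trans)            # set()
```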
11.3 Grid Concurrency Control (GCC) (Cont’d)

Termination phase of GCC
11.3 Grid Concurrency Control (GCC) (Cont’d)
Revisiting the example of section 11.2

Say transaction T1's timestamp is 1 and T2's timestamp is 2

History H2, produced by site DB2, is a serial history (equation 11.5) with T1 preceding T2

GCC will not schedule transactions as in H3 (equation 11.6) because of Step-4 of the submission phase of GCC; it always follows the total order based on timestamps. Hence, sub-transactions of T1 will always be scheduled before sub-transactions of T2, and GCC will generate the histories H2 and H3 as follows:
H2 = r12(O1) r12(O2) w12(O1) C12 r22(O1) w22(O1) C22 (same as (11.5))
H3 = w13(O3) C13 r23(O3) w23(O4) C23 (execution order corrected by the GCC protocol)

Thus both schedules order the transactions in total order with T1 preceding T2
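The effect of the total order at the two sites can be reproduced in a few lines of Python. This is a minimal sketch, assuming (as above) timestamp 1 for T1 and 2 for T2 and single-digit transaction indices; `gcc_history` is a hypothetical helper that runs each site's subtransactions serially, a simplification of what the local schedulers may actually interleave.

```python
# Subtransactions from equations 11.1-11.4 and the timestamps assumed above.
SUBTXNS = {
    "ST12": "r12(O1) r12(O2) w12(O1) C12",
    "ST13": "w13(O3) C13",
    "ST22": "r22(O1) w22(O1) C22",
    "ST23": "r23(O3) w23(O4) C23",
}
TS = {"T1": 1, "T2": 2}


def gcc_history(arrivals):
    """Run the subtransactions that reach one site in timestamp order,
    regardless of arrival order (serial execution is a simplification)."""
    ordered = sorted(arrivals, key=lambda sub: TS["T" + sub[2]])
    return " ".join(SUBTXNS[sub] for sub in ordered)


print(gcc_history(["ST12", "ST22"]))  # r12(O1) r12(O2) w12(O1) C12 r22(O1) w22(O1) C22
print(gcc_history(["ST23", "ST13"]))  # w13(O3) C13 r23(O3) w23(O4) C23
```

Even though ST23 reaches DB3 first, sorting by the global timestamp places ST13 ahead of it, which yields the corrected H3.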
11.3 Grid Concurrency Control (GCC) (Cont’d)
Comparison with traditional concurrency control protocols
[Figure: coordinator site (typically where the transaction is submitted), central site managing global information (e.g. global lock table), and all participating sites (1, 2, …, n); messages: lock request, lock granted, operation command, operation decision, release lock request]
Operations of a general centralised locking protocol (e.g. centralised two-phase locking) in a homogeneous distributed DBMS
11.3 Grid Concurrency Control (GCC) (Cont’d)
[Figure: coordinator site (typically where the transaction is submitted) and all participating sites (1, 2, …, n), with the participant's image of global information; messages: operation command embedded with lock request, operation, end of operation, release lock request]
Operations of a general distributed locking protocol (e.g. decentralised two-phase locking) in a homogeneous distributed DBMS
11.3 Grid Concurrency Control (GCC) (Cont’d)
[Figure: originator site (where the transaction is submitted), multidatabase management system (global management layer), and all participants (1, 2, …, n); messages: operation request embedded with global information, talk to participant depending on its local protocol, check with multi-DBMS layer if required, MDBS reply, forward final decision to the originator, final decision]
Operations of a general Multi-DBMS protocol
11.3 Grid Concurrency Control (GCC) (Cont’d)
[Figure: originator site (where the transaction is submitted), Grid middleware services (metadata and timestamp services for this purpose), and all participants (1, 2, …, n); messages: operation request, forward operation request to participants, forward final decision to the originator, final decision]
Operations of the GCC protocol
11.4 Correctness of GCC Protocol


A Grid-serializable schedule is considered correct in a Grid environment
A concurrency control protocol conforming to Theorem 11.1 is Grid serializable and is thus correct

Proposition 11.1: All local transactions and global sub-transactions submitted to any local scheduler are scheduled in serializable order.

Proposition 11.2: Any two global transactions having more than one sub-transaction and actively executing simultaneously must follow the total order.

Based on Propositions 11.1 and 11.2, the following theorem can be proved:

Theorem 11.2: Every schedule produced by the GCC protocol is Grid-serializable.
11.5 Features of GCC Protocol

Concurrency control in a heterogeneous environment - Does not use a global lock table etc., and hence can work in an autonomous, heterogeneous environment

Reducing the load on the originator site - As GCC does not use centralized scheduling schemes, originator sites carry a reduced load

Reducing the number of messages in the inter-network - Communication between the originator and the other participating sites is reduced

However, due to the absence of a global management layer, some valid interleavings may not be allowed, and hence GCC may produce more restrictive schedules
11.6 Summary

A global management layer cannot be used in a Grid environment

The GCC protocol maintains the correctness of data in a Grid environment

The GCC protocol can work in a heterogeneous environment

Optimizing the scheduling process may be hard

The focus was to maintain the consistency of data in Grid databases