ICS 2415 ADVANCED DATABASE SYSTEMS

Distributed Computing System (DCS)

Definition
A Distributed Computing System (DCS) is a collection of processors interconnected by a communication network, in which each processor has its own local memory and other peripherals, and communication between any two processors of the system takes place by message passing over the network (i.e. for a particular processor, its own resources are local).

Rationale for Distributed Systems
1. Inherently Distributed Applications: many applications are inherently distributed in nature and thus require distributed computing for their realization, e.g. for collecting, preprocessing and accessing data. Examples include computerized worldwide airline reservation systems, banking systems, lending systems etc.
2. Information Sharing Among Distributed Users: there is a need for efficient person-to-person communication achieved by sharing information over great distances, e.g. two far-off users can work on the same project.
3. Resource Sharing: resources, both h/w and s/w, can be shared.
4. Better Price-Performance Ratio: compared to centralized systems, distributed systems benefit from the rapidly increasing power and falling price of microprocessors, together with the increasing speed of communication networks, and they facilitate resource sharing among multiple computers.
5. Shorter Response Time and Higher Throughput: they give better performance than single-processor centralized systems because of their multiple processors.
6. Higher Reliability: owing to the multiplicity of processors and storage devices, multiple copies of critical information are maintained and redundancy is achieved. Geographical distribution also limits the scope of failures caused by natural disasters. An important aspect of reliability is "availability", i.e. the fraction of time a system is available for use.
7. Extensibility and Incremental Growth: they are capable of incremental growth, i.e.
additional resources, both s/w and h/w, can be added. Distributed systems with these qualities are referred to as "open distributed systems."
8. Better Flexibility in Meeting Users' Needs: a distributed system may have a pool of different types of computers, so that the most appropriate one can be selected for processing a user's job.

Distributed Database Systems
A distributed database is a set of databases in a distributed system that can appear to applications as a single data source. Distributed database systems employ a distributed processing architecture to process transactions and allow applications to access data from local and remote databases, using a client/server architecture to process information requests. From the Oracle perspective there are three types of distributed database architecture, namely:
• Homogeneous Distributed Database Systems
• Heterogeneous Distributed Database Systems
• Client/Server Database Architecture

Homogeneous Distributed Database Systems
A homogeneous distributed database system is a network of two or more Oracle databases that reside on one or more machines. The figure below illustrates a distributed system that connects three databases: hq, mfg, and sales. An application can simultaneously access or modify the data in several databases in a single distributed environment. For example, a single query from a Manufacturing client on the local database mfg can retrieve joined data from the products table on the local database and the dept table on the remote hq database.
For a client application, the location and platform of the databases are transparent. You can also create synonyms for remote objects in the distributed system so that users can access them with the same syntax as local objects. For example, if you are connected to database mfg but want to access data on database hq, creating a synonym on mfg for the remote dept table enables you to issue this query:
SELECT * FROM dept;
In this way, a distributed system gives the appearance of native data access.
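A minimal sketch of this synonym setup, assuming a database link named hq and an illustrative net service name (both are assumptions, not part of the original example):

```sql
-- Connected to mfg: define a link to the remote hq database,
-- then hide the remote location behind a local synonym.
-- The link name 'hq' and service name 'hq_svc' are illustrative.
CREATE DATABASE LINK hq USING 'hq_svc';
CREATE SYNONYM dept FOR dept@hq;

-- Users on mfg can now query the remote table with local syntax:
SELECT * FROM dept;
```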
Users on mfg do not have to know that the data they access resides on remote databases.

Heterogeneous Distributed Database Systems
In a heterogeneous distributed database system, at least one of the databases is a non-Oracle system. To the application, the heterogeneous distributed database system appears as a single, local Oracle database; the local Oracle database server hides the distribution and heterogeneity of the data.
The Oracle database server accesses the non-Oracle system using Oracle Heterogeneous Services in conjunction with an agent. If you access the non-Oracle data store using an Oracle Transparent Gateway, then the agent is a system-specific application. For example, if you include a Sybase database in an Oracle distributed system, then you need to obtain a Sybase-specific transparent gateway so that the Oracle databases in the system can communicate with it. Alternatively, you can use generic connectivity to access non-Oracle data stores so long as the non-Oracle system supports the ODBC or OLE DB protocols.
Heterogeneous Services (HS) is an integrated component within the Oracle database server and the enabling technology for the current suite of Oracle Transparent Gateway products. HS provides the common architecture and administration mechanisms for Oracle gateway products and other heterogeneous access facilities. It also provides upwardly compatible functionality for users of most earlier Oracle Transparent Gateway releases.

Client/Server Database Architecture
A database server is the Oracle software managing a database, and a client is an application that requests information from a server. Each computer in a network is a node that can host one or more databases. Each node in a distributed database system can act as a client, a server, or both, depending on the situation.
In the figure below, the host for the hq database acts as a database server when a statement is issued against its local data (for example, the second statement in each transaction issues a statement against the local dept table), but acts as a client when it issues a statement against remote data (for example, the first statement in each transaction is issued against the remote table emp in the sales database).
A client can connect directly or indirectly to a database server. A direct connection occurs when a client connects to a server and accesses information from a database contained on that server. For example, if you connect to the hq database and access the dept table on this database as in the figure above, you can issue the following:
SELECT * FROM dept;
This query is direct because you are not accessing an object on a remote database.
In contrast, an indirect connection occurs when a client connects to a server and then accesses information contained in a database on a different server. For example, if you connect to the hq database but access the emp table on the remote sales database as in the figure, you can issue the following:
SELECT * FROM emp@sales;
This query is indirect because the object you are accessing is not on the database to which you are directly connected.

Transparent Gateway Agents
For each non-Oracle system that you access, Heterogeneous Services can use a transparent gateway agent to interface with the specified non-Oracle system. The agent is specific to the non-Oracle system, so each type of system requires a different agent. The transparent gateway agent facilitates communication between Oracle and non-Oracle databases and uses the Heterogeneous Services component in the Oracle database server. The agent executes SQL and transactional requests at the non-Oracle system on behalf of the Oracle database server.
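From the client's point of view, access through a gateway looks like ordinary link usage. The sketch below is hedged: the link name, credentials, and net service name are hypothetical, and the service is assumed to be routed by the listener to a Sybase gateway agent:

```sql
-- Hypothetical link to a Sybase system through a transparent gateway.
-- 'sybs' is assumed to be a net service name resolving to the gateway.
CREATE DATABASE LINK sybs
  CONNECT TO sybase_user IDENTIFIED BY sybase_pwd
  USING 'sybs';

-- Heterogeneous Services and the agent translate this Oracle SQL
-- into the equivalent Sybase request:
SELECT * FROM customers@sybs;
```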
Generic Connectivity
Generic connectivity enables you to connect to non-Oracle data stores by using either a Heterogeneous Services ODBC agent or a Heterogeneous Services OLE DB agent; both are included with your Oracle product as a standard feature. Any data source compatible with the ODBC or OLE DB standards can be accessed using a generic connectivity agent. The advantage of generic connectivity is that you may not need to purchase and configure a separate system-specific agent; instead, you use an ODBC or OLE DB driver that can interface with the agent. However, some data access features are available only with transparent gateway agents.

Distributed Databases Versus Distributed Processing
The terms distributed database and distributed processing are closely related, yet have distinct meanings. Their definitions are as follows:
• Distributed database: a set of databases in a distributed system that can appear to applications as a single data source.
• Distributed processing: the operations that occur when an application distributes its tasks among different computers in a network. For example, a database application typically distributes front-end presentation tasks to client computers and allows a back-end database server to manage shared access to a database. Consequently, a distributed database application system of this kind is commonly referred to as a client/server database application system.
Oracle distributed database systems employ a distributed processing architecture. For example, an Oracle database server acts as a client when it requests data that another Oracle database server manages. Distributed databases use a client/server architecture to process information requests.

Distributed Databases Versus Replicated Databases
The terms distributed database system and database replication are related, yet distinct.
In a pure (that is, not replicated) distributed database, the system manages a single copy of all data and supporting database objects. Typically, distributed database applications use distributed transactions to access both local and remote data and modify the global database in real time.
The term replication refers to the operation of copying and maintaining database objects in multiple databases belonging to a distributed system. While replication relies on distributed database technology, database replication offers applications benefits that are not possible within a pure distributed database environment. Most commonly, replication is used to improve local database performance and to protect the availability of applications, because alternate data access options exist. For example, an application may normally access a local database rather than a remote server to minimize network traffic and achieve maximum performance. Furthermore, the application can continue to function if the local server experiences a failure, provided that other servers with replicated data remain accessible.

Data Replication
A popular option for data distribution, as well as for fault tolerance of a database, is to store a separate copy of the database at each of two or more sites. A copy of each fragment can be maintained at several sites. Data replication is the design process of deciding which fragments will be replicated.
An Oracle distributed database system can incorporate Oracle databases of different versions. All supported releases of Oracle can participate in a distributed database system. Nevertheless, the applications that work with the distributed database must understand the functionality that is available at each node in the system: a distributed database application cannot expect an Oracle7 database to understand the SQL extensions that are only available with Oracle9i.

Database Links
The central concept in distributed database systems is a database link.
A database link is a connection between two physical database servers that allows a client to access them as one logical database.

What Are Database Links?
A database link is a pointer that defines a one-way communication path from an Oracle database server to another database server. The link pointer is actually defined as an entry in a data dictionary table. To access the link, you must be connected to the local database that contains the data dictionary entry.
A database link connection is one-way in the sense that a client connected to local database A can use a link stored in database A to access information in remote database B, but users connected to database B cannot use the same link to access data in database A. If local users on database B want to access data on database A, then they must define a link that is stored in the data dictionary of database B.
A database link connection allows local users to access data on a remote database. For this connection to occur, each database in the distributed system must have a unique global database name in the network domain. The global database name uniquely identifies a database server in a distributed system. The figure shows an example of user John accessing the emp table on the remote database with the global name hq.acme.com (Figure: Database Link).
Database links are either private or public. If they are private, then only the user who created the link has access; if they are public, then all database users have access. One principal difference among database links is the way that connections to a remote database occur. Users access a remote database through the following types of links:
• Connected user link: users connect as themselves, which means that they must have an account on the remote database with the same username as their account on the local database.
• Fixed user link: users connect using the username and password referenced in the link.
For example, if Jane uses a fixed user link that connects to the hq database with the username and password John/tiger, then she connects as John; Jane has all the privileges in hq granted to John directly, and all the default roles that John has been granted in the hq database.
• Current user link: a user connects as a global user. A local user can connect as a global user in the context of a stored procedure, without storing the global user's password in a link definition. For example, Jane can access a procedure that John wrote, accessing John's account and John's schema on the hq database. Current user links are an aspect of Oracle Advanced Security.

Why Use Database Links?
The great advantage of database links is that they allow users to access another user's objects in a remote database while remaining bound by the privilege set of the object's owner. In other words, a local user can access a link to a remote database without having to be a user on the remote database.
For example, assume that employees submit expense reports to Accounts Payable (A/P), and further suppose that a user running an A/P application needs to retrieve information about employees from the hq database. The A/P users should be able to connect to the hq database and execute a stored procedure in the remote hq database that retrieves the desired information. The A/P users should not need to be hq database users to do their jobs; they should only be able to access hq information in a controlled way, as limited by the procedure.
Database links allow you to grant limited access on remote databases to local users. By using current user links, you can create centrally managed global users whose password information is hidden from both administrators and non-administrators. For example, A/P users can access the hq database as John, but unlike with fixed user links, John's credentials are not stored where database users can see them.
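The A/P scenario above can be sketched as follows; the procedure name, table columns, role name, and link name are all illustrative assumptions, not part of the original example:

```sql
-- On hq: a procedure that exposes only the employee data A/P needs.
CREATE OR REPLACE PROCEDURE get_emp_name (
  p_empno IN  NUMBER,
  p_ename OUT VARCHAR2
) AS
BEGIN
  SELECT ename INTO p_ename FROM emp WHERE empno = p_empno;
END;
/

-- A/P users receive EXECUTE on the procedure, not access to emp itself.
GRANT EXECUTE ON get_emp_name TO ap_clerk;
```

A caller on the A/P database could then invoke the procedure through a link to hq (for example, get_emp_name@hq), touching only what the procedure allows.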
By using fixed user links, you can create non-global users whose password information is stored in unencrypted form in the LINK$ data dictionary table. Fixed user links are easy to create and require low overhead because there are no SSL or directory requirements, but a security risk results from the storage of password information in the data dictionary.

Global Database Names in Database Links
To understand how a database link works, you must first understand what a global database name is. Each database in a distributed database is uniquely identified by its global database name. Oracle forms a database's global database name by prefixing the database's network domain, specified by the DB_DOMAIN initialization parameter at database creation, with the individual database name, specified by the DB_NAME initialization parameter.
For example, the figure below illustrates a representative hierarchical arrangement of databases throughout a network (Figure: Hierarchical Arrangement of Networked Databases). The name of a database is formed by starting at the leaf of the tree and following a path to the root. For example, the mfg database is in division3 of the acme_tools branch of the com domain. The global database name for mfg is created by concatenating the nodes in the tree as follows:
• mfg.division3.acme_tools.com
While several databases can share an individual name, each database must have a unique global database name. For example, the network domains us.americas.acme_auto.com and uk.europe.acme_auto.com each contain a sales database. The global database naming system distinguishes the sales database in the americas division from the sales database in the europe division as follows:
• sales.us.americas.acme_auto.com
• sales.uk.europe.acme_auto.com

Creation of Database Links: (Lecture 3) Examples
Create database links using the CREATE DATABASE LINK statement.
The following examples show SQL statements that create database links in a local database to the remote sales.us.americas.acme_auto.com database:

• Private connected user link:
CREATE DATABASE LINK sales.us.americas.acme_auto.com USING 'sales_us';
Connects to the sales database using net service name sales_us, as the connected user.

• Private current user link:
CREATE DATABASE LINK foo CONNECT TO CURRENT_USER USING 'am_sls';
Connects to the sales database using net service name am_sls, as the current global user.

• Private fixed user link:
CREATE DATABASE LINK sales.us.americas.acme_auto.com CONNECT TO John IDENTIFIED BY tiger USING 'sales_us';
Connects to the sales database using net service name sales_us, as John with password tiger.

• Public fixed user link:
CREATE PUBLIC DATABASE LINK sales CONNECT TO John IDENTIFIED BY tiger USING 'rev';
Connects to the sales database using net service name rev, as John with password tiger.

• Shared public fixed user link:
CREATE SHARED PUBLIC DATABASE LINK sales.us.americas.acme_auto.com CONNECT TO John IDENTIFIED BY tiger AUTHENTICATED BY anupam IDENTIFIED BY bhide USING 'sales';
Connects to the sales database using net service name sales, as John with password tiger, authenticated as anupam with password bhide.

Schema Objects and Database Links
After you have created a database link, you can execute SQL statements that access objects on the remote database. For example, to access remote object emp using database link foo, you can issue:
SELECT * FROM emp@foo;
You must also be authorized in the remote database to access specific remote objects. Constructing properly formed object names using database links is an essential aspect of data manipulation in distributed systems.

Naming of Schema Objects Using Database Links
Oracle uses the global database name to name schema objects globally, using the following scheme:
schema.schema_object@global_database_name
where:
• schema is a collection of logical structures of data, or schema objects. A schema is owned by a database user and has the same name as that user. Each user owns a single schema.
• schema_object is a logical data structure like a table, index, view, synonym, procedure, package, or database link.
• global_database_name is the name that uniquely identifies a remote database. This name must be the same as the concatenation of the remote database's initialization parameters DB_NAME and DB_DOMAIN, unless the parameter GLOBAL_NAMES is set to FALSE, in which case any name is acceptable.
For example, using a database link to database sales.division3.acme.com, a user or application can reference remote data as follows:
SELECT * FROM John.emp@sales.division3.acme.com; # emp table in John's schema
SELECT loc FROM John.dept@sales.division3.acme.com;

Authorization for Accessing Remote Schema Objects
To access a remote schema object, you must be granted access to the remote object in the remote database. Further, to perform any updates, inserts, or deletes on the remote object, you must be granted the SELECT privilege on the object, along with the UPDATE, INSERT, or DELETE privilege. Unlike when accessing a local object, the SELECT privilege is necessary for accessing a remote object because Oracle has no remote describe capability: Oracle must do a SELECT * on the remote object in order to determine its structure.

Database Link Restrictions
You cannot perform the following operations using database links:
• Grant privileges on remote objects
• Execute DESCRIBE operations on some remote objects. The following remote objects, however, do support DESCRIBE operations:
o Tables
o Views
o Procedures
o Functions
• Analyze remote objects
• Define or enforce referential integrity
• Grant roles to users in a remote database
• Obtain nondefault roles on a remote database. For example, if Jane connects to the local database and executes a stored procedure that uses a fixed user link connecting as John, Jane receives John's default roles on the remote database. Jane cannot issue SET ROLE to obtain a nondefault role.
• Execute hash query joins that use shared server connections
• Use a current user link without authentication through SSL, password, or NT native authentication

Distributed Database Administration
The concepts below relate to database management in a distributed database system:
• Site Autonomy
• Distributed Database Security
• Auditing Database Links
• Administration Tools

Site Autonomy
Site autonomy means that each server participating in a distributed database is administered independently from all other databases. Although several databases can work together, each database is a separate repository of data that is managed individually. Some of the benefits of site autonomy in an Oracle distributed database include:
• Nodes of the system can mirror the logical organization of companies or groups that need to maintain independence.
• Local administrators control corresponding local data. Therefore, each database administrator's domain of responsibility is smaller and more manageable.
• Independent failures are less likely to disrupt other nodes of the distributed database. No single database failure need halt all distributed operations or be a performance bottleneck.
• Administrators can recover from isolated system failures independently from other nodes in the system.
• A data dictionary exists for each local database; a global catalog is not necessary to access local data.
• Nodes can upgrade software independently.
Although Oracle permits you to manage each database in a distributed database system independently, you should not ignore the global requirements of the system. For example, you may need to:
• Create additional user accounts in each database to support the links that you create to facilitate server-to-server connections.
• Set additional initialization parameters such as COMMIT_POINT_STRENGTH and OPEN_LINKS.
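For instance, a node that opens several links concurrently might adjust these two parameters. The sketch below uses illustrative values; both parameters are static, so the change is written to the server parameter file and takes effect at the next restart:

```sql
-- Illustrative settings; both parameters are static (SCOPE = SPFILE).
ALTER SYSTEM SET open_links = 8 SCOPE = SPFILE;              -- max open links per session
ALTER SYSTEM SET commit_point_strength = 100 SCOPE = SPFILE; -- bias toward commit point site
```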
Distributed Database Security

Authentication Through Database Links
Database links are either private or public, authenticated or non-authenticated. You create public links by specifying the PUBLIC keyword in the link creation statement. For example, you can issue:
CREATE PUBLIC DATABASE LINK foo USING 'sales';
You create authenticated links by specifying the CONNECT TO clause, the AUTHENTICATED BY clause, or both clauses together in the database link creation statement. For example, you can issue:
CREATE DATABASE LINK sales CONNECT TO John IDENTIFIED BY tiger USING 'sales';
CREATE SHARED PUBLIC DATABASE LINK sales CONNECT TO mick IDENTIFIED BY jagger AUTHENTICATED BY david IDENTIFIED BY bowie USING 'sales';

Supporting User Accounts and Roles
In a distributed database system, you must carefully plan the user accounts and roles that are necessary to support applications using the system. Note that:
• The user accounts necessary to establish server-to-server connections must be available in all databases of the distributed database system.
• The roles necessary to make application privileges available to distributed database application users must be present in all databases of the distributed database system.
As you create the database links for the nodes in a distributed database system, determine which user accounts and roles each site needs to support server-to-server connections that use the links. In a distributed environment, users typically require access to many network services. When you must configure separate authentications for each user to access each network service, security administration can become unwieldy, especially for large systems.

Centralized User and Privilege Management
Oracle provides different ways for you to manage the users and privileges involved in a distributed system. For example, you have these options:
• Enterprise user management.
You can create global users who are authenticated through SSL or by using passwords, then manage these users and their privileges in a directory through an independent enterprise directory service.
• Network authentication service. This common technique simplifies security management for distributed environments. You can use the Oracle Advanced Security option to enhance Oracle Net and the security of an Oracle distributed database system. Windows NT native authentication is an example of a non-Oracle authentication solution.

Schema-Dependent Global Users
One option for centralizing user and privilege management is to create the following:
• A global user in a centralized directory
• A user in every database that the global user must connect to
For example, you can create a global user called fred with the following SQL statement:
CREATE USER fred IDENTIFIED GLOBALLY AS 'CN=fred adams,O=Oracle,C=England';
This solution allows a single global user to be authenticated by a centralized directory. The schema-dependent global user solution has the consequence that you must create a user called fred on every database that this user must access. Because most users need permission to access an application schema but do not need their own schemas, the creation of a separate account in each database for every global user creates significant overhead. Because of this problem, Oracle also supports schema-independent users, which are global users that can access a single, generic schema in every database.

Administration Tools
The database administrator has several choices for tools to use when managing an Oracle distributed database system:
• Enterprise Manager
• Third-Party Administration Tools
• SNMP Support

Enterprise Manager
Enterprise Manager is Oracle's database administration tool that provides a graphical user interface (GUI). Enterprise Manager provides administrative functionality for distributed databases through an easy-to-use interface.
You can use Enterprise Manager to:
• Administer multiple databases. You can use Enterprise Manager to administer a single database or to simultaneously administer multiple databases.
• Centralize database administration tasks. You can administer both local and remote databases running on any Oracle platform in any location worldwide. In addition, these Oracle platforms can be connected by any network protocols supported by Oracle Net.
• Dynamically execute SQL, PL/SQL, and Enterprise Manager commands. You can use Enterprise Manager to enter, edit, and execute statements. Enterprise Manager also maintains a history of statements executed. Thus, you can re-execute statements without retyping them, a particularly useful feature if you need to execute lengthy statements repeatedly in a distributed database system.
• Manage security features such as global users, global roles, and the enterprise directory service.

Third-Party Administration Tools
Currently more than 60 companies produce more than 150 products that help manage Oracle databases and networks, providing a truly open environment.

SNMP Support
Besides its network administration capabilities, Oracle Simple Network Management Protocol (SNMP) support allows an Oracle database server to be located and queried by any SNMP-based network management system. SNMP is the accepted standard underlying many popular network management systems such as:
• HP's OpenView
• Digital's POLYCENTER Manager on NetView
• IBM's NetView/6000
• Novell's NetWare Management System
• SunSoft's SunNet Manager

Transaction Processing in a Distributed System
A transaction is a logical unit of work constituted by one or more SQL statements executed by a single user. A transaction begins with the user's first executable SQL statement and ends when it is committed or rolled back by that user. A remote transaction contains only statements that access a single remote node. A distributed transaction contains statements that access more than one node.
The following topics define important concepts in transaction processing and explain how transactions access data in a distributed database:
• Remote SQL Statements
• Distributed SQL Statements
• Shared SQL for Remote and Distributed Statements
• Remote Transactions
• Distributed Transactions
• Two-Phase Commit Mechanism
• Database Link Name Resolution
• Schema Object Name Resolution

Remote SQL Statements
A remote query statement is a query that selects information from one or more remote tables, all of which reside at the same remote node. For example, the following query accesses data from the dept table in the John schema of the remote sales database:
SELECT * FROM John.dept@sales.us.americas.acme_auto.com;
A remote update statement is an update that modifies data in one or more tables, all of which are located at the same remote node. For example, the following statement updates the dept table in the John schema of the remote mktng database:
UPDATE John.dept@mktng.us.americas.acme_auto.com
  SET loc = 'NEW YORK'
  WHERE deptno = 10;

Distributed SQL Statements
A distributed query statement retrieves information from two or more nodes. For example, the following query accesses data from the local database as well as the remote sales database:
SELECT ename, dname
  FROM John.emp e, John.dept@sales.us.americas.acme_auto.com d
  WHERE e.deptno = d.deptno;
A distributed update statement modifies data on two or more nodes. A distributed update is possible using a PL/SQL subprogram unit, such as a procedure or trigger, that includes two or more remote updates accessing data on different nodes. For example, the following PL/SQL program unit updates tables on the local database and the remote sales database:
BEGIN
  UPDATE John.dept@sales.us.americas.acme_auto.com
    SET loc = 'NEW YORK'
    WHERE deptno = 10;
  UPDATE John.emp
    SET deptno = 11
    WHERE deptno = 10;
END;
COMMIT;
Oracle sends the statements in the program to the remote nodes, and their execution succeeds or fails as a unit.
Shared SQL for Remote and Distributed Statements
The mechanics of a remote or distributed statement using shared SQL are essentially the same as those of a local statement. The SQL text must match, and the referenced objects must match. If available, shared SQL areas can be used for the local and remote handling of any statement or decomposed query.

Remote Transactions
A remote transaction contains one or more remote statements, all of which reference a single remote node. For example, the following transaction contains two statements, each of which accesses the remote sales database:
UPDATE John.dept@sales.us.americas.acme_auto.com
  SET loc = 'NEW YORK'
  WHERE deptno = 10;
UPDATE John.emp@sales.us.americas.acme_auto.com
  SET deptno = 11
  WHERE deptno = 10;
COMMIT;

Distributed Transactions
A distributed transaction is a transaction that includes one or more statements that, individually or as a group, update data on two or more distinct nodes of a distributed database. For example, this transaction updates the local database and the remote sales database:
UPDATE John.dept@sales.us.americas.acme_auto.com
  SET loc = 'NEW YORK'
  WHERE deptno = 10;
UPDATE John.emp
  SET deptno = 11
  WHERE deptno = 10;
COMMIT;
There are two types of permissible operations in distributed transactions:
• DML and DDL Transactions
• Transaction Control Statements

DML and DDL Transactions
The following DML and DDL operations are supported in a distributed transaction:
• CREATE TABLE AS SELECT
• DELETE
• INSERT (default and direct load)
• LOCK TABLE
• SELECT
• SELECT FOR UPDATE
You can execute DML and DDL statements in parallel, and INSERT direct load statements serially, but note the following restrictions:
• All remote operations must be SELECT statements.
• These statements must not be clauses in another distributed transaction.
• If the table referenced in the table_expression_clause of an INSERT, UPDATE, or DELETE statement is remote, then execution is serial rather than parallel.
• You cannot perform remote operations after issuing parallel DML/DDL or direct load INSERT.
• If the transaction begins using XA or OCI, it executes serially.
• No loopback operations can be performed on the transaction originating the parallel operation. For example, you cannot reference a remote object that is actually a synonym for a local object.
• If you perform a distributed operation other than a SELECT in the transaction, no DML is parallelized.

Transaction Control Statements
The following list describes the supported transaction control statements:
o COMMIT
o ROLLBACK
o SAVEPOINT

Properties of a Transaction
A transaction has four properties that lead to the consistency and reliability of a distributed database: Atomicity, Consistency, Isolation, and Durability.
· Atomicity: A transaction is treated as a single unit of operation. Either all the actions related to a transaction are completed or none of them is carried out.
· Consistency: The consistency of a transaction is its correctness. In other words, a transaction is a correct program that maps one consistent database state into another.
· Isolation: Each transaction should see a consistent database at all times. Consequently, no other transaction can read or modify data that is being modified by another transaction.
· Durability: Once a transaction commits, its results are permanent and cannot be erased from the database. Whatever happens after the COMMIT of a transaction, whether it is a system crash or aborts of other transactions, the results already committed are not modified or undone.

Session Trees for Distributed Transactions
As the statements in a distributed transaction are issued, Oracle defines a session tree of all nodes participating in the transaction. A session tree is a hierarchical model that describes the relationships among sessions and their roles.
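The atomicity property above can be illustrated with a small sketch (an in-memory toy, not how a DBMS implements it: real systems use logging and recovery rather than state copies):

```python
# Atomicity as all-or-nothing: updates are applied to a private copy of the
# state, which replaces the original only if every update succeeds. The keys
# below are hypothetical stand-ins for rows.

def run_atomically(db, updates):
    """Apply every update or none; each update is a function mutating a dict."""
    trial = dict(db)            # work on a private copy of the state
    try:
        for update in updates:
            update(trial)
    except Exception:
        return db               # abort: the original state is untouched
    return trial                # commit: all effects become visible at once

db = {"dept10_loc": "BOSTON", "emp7369_deptno": 10}

ok = run_atomically(db, [
    lambda d: d.__setitem__("dept10_loc", "NEW YORK"),
    lambda d: d.__setitem__("emp7369_deptno", 11),
])

def failing(d):
    d["dept10_loc"] = "CHICAGO"
    raise RuntimeError("simulated crash mid-transaction")

bad = run_atomically(db, [failing])   # partial effect is discarded entirely
```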
The figure illustrates a session tree: Example of a Session Tree
All nodes participating in the session tree of a distributed transaction assume one or more of the following roles:
• Client: A node that references information in a database belonging to a different node.
• Database server: A node that receives a request for information from another node.
• Global coordinator: The node that originates the distributed transaction.
• Local coordinator: A node that is forced to reference data on other nodes to complete its part of the transaction.
• Commit point site: The node that commits or rolls back the transaction as instructed by the global coordinator.
The role a node plays in a distributed transaction is determined by:
• Whether the transaction is local or remote
• The commit point strength of the node ("Commit Point Site")
• Whether all requested data is available at a node, or whether other nodes need to be referenced to complete the transaction
• Whether the node is read-only

Clients
A node acts as a client when it references information from another node's database. The referenced node is a database server. In the figure above, the node sales is a client of the nodes that host the warehouse and finance databases.

Database Servers
A database server is a node that hosts a database from which a client requests data. In the figure above, an application at the sales node initiates a distributed transaction that accesses data from the warehouse and finance nodes. Therefore, sales.acme.com has the role of client node, and warehouse and finance are both database servers. In this example, sales is both a database server and a client because the application also modifies data in the sales database.

Local Coordinators
A node that must reference data on other nodes to complete its part in the distributed transaction is called a local coordinator.
In the figure above, sales is a local coordinator because it coordinates the nodes it directly references: warehouse and finance. The node sales also happens to be the global coordinator because it coordinates all the nodes involved in the transaction. A local coordinator is responsible for coordinating the transaction among the nodes it communicates directly with by:
• Receiving and relaying transaction status information to and from those nodes
• Passing queries to those nodes
• Receiving queries from those nodes and passing them on to other nodes
• Returning the results of queries to the nodes that initiated them

Global Coordinator
The node where the distributed transaction originates is called the global coordinator. The database application issuing the distributed transaction is directly connected to the node acting as the global coordinator. For example, in the figure, the transaction issued at the node sales references information from the database servers warehouse and finance. Therefore, sales.acme.com is the global coordinator of this distributed transaction. The global coordinator becomes the parent or root of the session tree. The global coordinator performs the following operations during a distributed transaction:
• Sends all of the distributed transaction's SQL statements, remote procedure calls, and so forth to the directly referenced nodes, thus forming the session tree
• Instructs all directly referenced nodes other than the commit point site to prepare the transaction
• Instructs the commit point site to initiate the global commit of the transaction if all nodes prepare successfully
• Instructs all nodes to initiate a global rollback of the transaction if there is an abort response

Commit Point Site
The job of the commit point site is to initiate a commit or rollback operation as instructed by the global coordinator.
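The roles in the example above can be sketched as a toy session tree. The node names mirror the example; the commit point strengths are made-up values, since the text only says the administrator assigns them:

```python
# Session-tree roles for the example: sales originates the transaction and
# directly references warehouse and finance. A simplification of the model
# described above.

session_tree = {"sales": ["warehouse", "finance"],  # parent -> directly referenced nodes
                "warehouse": [], "finance": []}

def roles(tree, origin):
    r = {n: set() for n in tree}
    r[origin].add("global coordinator")      # the node where the transaction originates
    for parent, children in tree.items():
        if children:
            r[parent].add("client")          # it references other nodes' data
            r[parent].add("local coordinator")
        for child in children:
            r[child].add("database server")  # it serves requests from the parent
    return r

def choose_commit_point_site(strengths):
    # the node with the highest commit point strength becomes the commit point site
    return max(strengths, key=strengths.get)

r = roles(session_tree, origin="sales")
site = choose_commit_point_site({"sales": 100, "warehouse": 75, "finance": 50})
```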
The system administrator always designates one node to be the commit point site in the session tree by assigning all nodes a commit point strength. The node selected as commit point site should be the node that stores the most critical data.
The commit point site is distinct from all other nodes involved in a distributed transaction in these ways:
• The commit point site never enters the prepared state. Consequently, if the commit point site stores the most critical data, this data never remains in-doubt, even if a failure occurs. In failure situations, failed nodes remain in a prepared state, holding necessary locks on data until in-doubt transactions are resolved.
• The commit point site commits before the other nodes involved in the transaction. In effect, the outcome of a distributed transaction at the commit point site determines whether the transaction at all nodes is committed or rolled back: the other nodes follow the lead of the commit point site. The global coordinator ensures that all nodes complete the transaction in the same manner as the commit point site.

How a Distributed Transaction Commits
A distributed transaction is considered committed after all non-commit-point sites are prepared and the transaction has been actually committed at the commit point site. The online redo log at the commit point site is updated as soon as the distributed transaction is committed at this node. Because the commit point log contains a record of the commit, the transaction is considered committed even though some participating nodes may still be only in the prepared state and the transaction not yet actually committed at these nodes. In the same way, a distributed transaction is considered not committed if the commit has not been logged at the commit point site.

Two-Phase Commit Mechanism
Unlike a transaction on a local database, a distributed transaction involves altering data on multiple databases.
Consequently, distributed transaction processing is more complicated, because Oracle must coordinate the committing or rolling back of the changes in a transaction as a self-contained unit. In other words, the entire transaction commits, or the entire transaction rolls back. Oracle ensures the integrity of data in a distributed transaction using the two-phase commit mechanism. In the prepare phase, the initiating node in the transaction asks the other participating nodes to promise to commit or roll back the transaction. During the commit phase, the initiating node asks all participating nodes to commit the transaction. If this outcome is not possible, then all nodes are asked to roll back.
All participating nodes in a distributed transaction should perform the same action: they should either all commit or all perform a rollback of the transaction. Oracle automatically controls and monitors the commit or rollback of a distributed transaction and maintains the integrity of the global database (the collection of databases participating in the transaction) using the two-phase commit mechanism. This mechanism is completely transparent, requiring no programming on the part of the user or application developer.
The commit mechanism has the following distinct phases, which Oracle performs automatically whenever a user commits a distributed transaction:
• Prepare Phase
• Commit Phase
• Forget Phase

Prepare Phase
The first phase in committing a distributed transaction is the prepare phase. In this phase, Oracle does not actually commit or roll back the transaction. Instead, all nodes referenced in a distributed transaction (except the commit point site, described in "Commit Point Site") are told to prepare to commit.
By preparing, a node:
• Records information in the online redo logs so that it can subsequently either commit or roll back the transaction, regardless of intervening failures
• Places a distributed lock on modified tables, which prevents reads
When a node responds to the global coordinator that it is prepared to commit, the prepared node promises to either commit or roll back the transaction later, but does not make a unilateral decision on whether to commit or roll back. The promise means that if an instance failure occurs at this point, the node can use the redo records in the online log to recover the database back to the prepare phase.

Prepared Response
When a node has successfully prepared, it issues a prepared message. The message indicates that the node has records of the changes in the online log, so it is prepared either to commit or to perform a rollback. The message also guarantees that locks held for the transaction can survive a failure.

Read-Only Response
When a node is asked to prepare, and the SQL statements affecting the database do not change the node's data, the node responds with a read-only message. The message indicates that the node will not participate in the commit phase.

Abort Response
When a node cannot successfully prepare, it performs the following actions:
1. Releases resources currently held by the transaction and rolls back the local portion of the transaction.
2. Responds to the node that referenced it in the distributed transaction with an abort message.
These actions then propagate to the other nodes involved in the distributed transaction so that they can roll back the transaction and guarantee the integrity of the data in the global database. This response enforces the primary rule of a distributed transaction: all nodes involved in the transaction either all commit or all roll back the transaction at the same logical time.
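The three prepare-phase responses above can be sketched as a small decision function. This is a toy model, not Oracle's implementation; the node fields are hypothetical flags standing in for "data changed", "redo logged", and "locks can survive a failure":

```python
# One node's answer to a prepare request: read-only if its data is unchanged,
# prepared if it has logged redo and can hold its locks across a failure,
# abort otherwise (after rolling back its local portion).

def prepare(node):
    if not node["data_changed"]:
        return "read-only"                  # the node sits out the commit phase
    if node["redo_logged"] and node["locks_survive_failure"]:
        return "prepared"                   # promise to commit or roll back later
    node["rolled_back"] = True              # release resources, undo local changes
    return "abort"

votes = [prepare(n) for n in (
    {"data_changed": True,  "redo_logged": True,  "locks_survive_failure": True},
    {"data_changed": False, "redo_logged": False, "locks_survive_failure": False},
    {"data_changed": True,  "redo_logged": False, "locks_survive_failure": False},
)]

# the primary rule: a single abort response forces a global rollback
decision = "rollback" if "abort" in votes else "commit"
```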
Steps in the Prepare Phase
To complete the prepare phase, each node excluding the commit point site performs the following steps:
1. The node requests that its descendants, that is, the nodes subsequently referenced, prepare to commit.
2. The node checks to see whether the transaction changes data on itself or its descendants. If there is no change to the data, then the node skips the remaining steps and returns a read-only response (see "Read-Only Response").
3. The node allocates the resources it needs to commit the transaction if data is changed.
4. The node saves redo records corresponding to changes made by the transaction to its online redo log.
5. The node guarantees that locks held for the transaction are able to survive a failure.
6. The node responds to the initiating node with a prepared response (see "Prepared Response") or, if its attempt or the attempt of one of its descendants to prepare was unsuccessful, with an abort response (see "Abort Response").
These actions guarantee that the node can subsequently commit or roll back the transaction. The prepared nodes then wait until a COMMIT or ROLLBACK request is received from the global coordinator. After the nodes are prepared, the distributed transaction is said to be in-doubt (see "In-Doubt Transactions"). It retains in-doubt status until all changes are either committed or rolled back.

Commit Phase
The second phase in committing a distributed transaction is the commit phase. Before this phase occurs, all nodes other than the commit point site referenced in the distributed transaction have guaranteed that they are prepared, that is, they have the necessary resources to commit the transaction.

Steps in the Commit Phase
The commit phase consists of the following steps:
1. The global coordinator instructs the commit point site to commit.
2. The commit point site commits.
3. The commit point site informs the global coordinator that it has committed.
4.
The global and local coordinators send a message to all nodes instructing them to commit the transaction.
5. At each node, Oracle commits the local portion of the distributed transaction and releases locks.
6. At each node, Oracle records an additional redo entry in the local redo log, indicating that the transaction has committed.
7. The participating nodes notify the global coordinator that they have committed.
When the commit phase is complete, the data on all nodes of the distributed system is consistent.

Guaranteeing Global Database Consistency
Each committed transaction has an associated system change number (SCN) to uniquely identify the changes made by the SQL statements within that transaction. The SCN functions as an internal Oracle timestamp that uniquely identifies a committed version of the database. In a distributed system, the SCNs of communicating nodes are coordinated when all of the following actions occur:
• A connection occurs using the path described by one or more database links
• A distributed SQL statement executes
• A distributed transaction commits
Among other benefits, the coordination of SCNs among the nodes of a distributed system ensures global read-consistency at both the statement and transaction level. If necessary, global time-based recovery can also be completed. During the prepare phase, Oracle determines the highest SCN at all nodes involved in the transaction. The transaction then commits with the high SCN at the commit point site. The commit SCN is then sent to all prepared nodes with the commit decision.

Forget Phase
After the participating nodes notify the commit point site that they have committed, the commit point site can forget about the transaction. The following steps occur:
1. After receiving notice from the global coordinator that all nodes have committed, the commit point site erases status information about this transaction.
2.
The commit point site informs the global coordinator that it has erased the status information.
3. The global coordinator erases its own information about the transaction.

In-Doubt Transactions
The two-phase commit mechanism ensures that all nodes either commit or perform a rollback together. What happens if any of the three phases fails because of a system or network error? The transaction becomes in-doubt. Distributed transactions can become in-doubt in the following ways:
• A server machine running Oracle software crashes
• A network connection between two or more Oracle databases involved in distributed processing is disconnected
• An unhandled software error occurs
The RECO process automatically resolves in-doubt transactions when the machine, network, or software problem is resolved. Until RECO can resolve the transaction, the data is locked for both reads and writes. Oracle blocks reads because it cannot determine which version of the data to display for a query.

Distributed Database Application Development
Application development in a distributed system raises issues that are not applicable in a nondistributed system. This section contains the following topics relevant to distributed application development:
• Transparency in a Distributed Database System
• Remote Procedure Calls (RPCs)
• Distributed Query Optimization

Transparency in a Distributed Database System
With minimal effort, you can develop applications that make an Oracle distributed database system transparent to the users who work with the system. The goal of transparency is to make a distributed database system appear as though it is a single Oracle database. Consequently, the system does not burden developers and users with complexities that would otherwise make distributed database application development challenging and detract from user productivity. The following sections explain more about transparency in a distributed database system.
Location Transparency
An Oracle distributed database system has features that allow application developers and administrators to hide the physical location of database objects from applications and users. Location transparency exists when a user can universally refer to a database object, such as a table, regardless of the node to which an application connects. Location transparency has several benefits, including:
• Access to remote data is simple, because database users do not need to know the physical location of database objects.
• Administrators can move database objects with no impact on end users or existing database applications.
Typically, administrators and developers use synonyms to establish location transparency for the tables and supporting objects in an application schema. For example, the following statements create synonyms in a database for tables in another, remote database:
CREATE PUBLIC SYNONYM emp
FOR John.emp@sales.us.americas.acme_auto.com;
CREATE PUBLIC SYNONYM dept
FOR John.dept@sales.us.americas.acme_auto.com;
Now, rather than accessing the remote tables with a query such as:
SELECT ename, dname
FROM John.emp@sales.us.americas.acme_auto.com e,
John.dept@sales.us.americas.acme_auto.com d
WHERE e.deptno = d.deptno;
an application can issue a much simpler query that does not have to account for the location of the remote tables:
SELECT ename, dname
FROM emp e, dept d
WHERE e.deptno = d.deptno;
In addition to synonyms, developers can also use views and stored procedures to establish location transparency for applications that work in a distributed database system.

SQL and COMMIT Transparency
Oracle's distributed database architecture also provides query, update, and transaction transparency. For example, standard SQL statements such as SELECT, INSERT, UPDATE, and DELETE work just as they do in a non-distributed database environment.
Additionally, applications control transactions using the standard SQL statements COMMIT, SAVEPOINT, and ROLLBACK; there is no requirement for complex programming or other special operations to provide distributed transaction control.
• The statements in a single transaction can reference any number of local or remote tables.
• Oracle guarantees that all nodes involved in a distributed transaction take the same action: they either all commit or all roll back the transaction.
• If a network or system failure occurs during the commit of a distributed transaction, the transaction is automatically and transparently resolved globally. Specifically, when the network or system is restored, the nodes either all commit or all roll back the transaction.
Internal to Oracle, each committed transaction has an associated system change number (SCN) to uniquely identify the changes made by the statements within that transaction. In a distributed database, the SCNs of communicating nodes are coordinated when:
• A connection is established using the path described by one or more database links.
• A distributed SQL statement is executed.
• A distributed transaction is committed.
Among other benefits, the coordination of SCNs among the nodes of a distributed database system allows global distributed read-consistency at both the statement and transaction level. If necessary, global distributed time-based recovery can also be completed.

Remote Procedure Calls (RPCs)
Developers can code PL/SQL packages and procedures to support applications that work with a distributed database. Applications can make local procedure calls to perform work at the local database and remote procedure calls (RPCs) to perform work at a remote database. When a program calls a remote procedure, the local server passes all procedure parameters to the remote server in the call.
For example, the following PL/SQL program unit calls the packaged procedure del_emp located at the remote sales database and passes it the parameter 1257:
BEGIN
emp_mgmt.del_emp@sales.us.americas.acme_auto.com(1257);
END;
In order for the RPC to succeed, the called procedure must exist at the remote site, and the connected user must have the proper privileges to execute the procedure. When developing packages and procedures for distributed database systems, developers must code with an understanding of what program units should do at remote locations, and how to return the results to a calling application.

Distributed Query Optimization
Distributed query optimization is an Oracle feature that reduces the amount of data transfer required between sites when a transaction retrieves data from remote tables referenced in a distributed SQL statement. Distributed query optimization uses Oracle's cost-based optimization to find or generate SQL expressions that extract only the necessary data from remote tables, process that data at a remote site (or sometimes at the local site), and send the results to the local site for final processing. This operation reduces the amount of required data transfer compared with transferring all the table data to the local site for processing. Using cost-based optimizer hints such as DRIVING_SITE, NO_MERGE, and INDEX, you can control where Oracle processes the data and how it accesses the data.

CONCURRENCY CONTROL IN DISTRIBUTED DATABASE SYSTEMS
1. DESCRIPTION OF THE PROBLEM
Today's Database Management Systems (DBMSs) work in a multiuser environment where users access the database concurrently. Therefore the DBMSs must control the concurrent execution of user transactions so that the overall correctness of the database is maintained. A transaction is a user program accessing the database.
Research in database concurrency control has advanced in a different direction from seemingly related areas such as operating systems concurrency. Database concurrency control permits users to access a database in a multiprogrammed fashion while preserving the illusion that each user is executing alone on a dedicated system. The main difficulty in achieving this goal is to prevent database updates performed by one user from interfering with database retrievals and updates performed by another.
As an example, consider an on-line airline reservation system. Suppose two customers, Customer A and Customer B, simultaneously try to reserve a seat on the same flight. In the absence of concurrency control, these two activities could interfere, as illustrated in Figure 1. Let seat No. 18 be the first available seat. Both transactions could read the reservation information at approximately the same time, reserve seat No. 18 for Customer A and Customer B respectively, and store the result back into the database. The net effect is incorrect: although two customers reserved a seat, the database reflects only one activity; the other reservation is lost by the system.
As is apparent from the example, it is necessary to establish a correctness criterion for the execution of concurrent user transactions. Serializability is that correctness criterion. The execution of concurrent transactions, which is termed a history or a log, is serializable if it produces the same output and has the same effect on the database as some serial execution of the same transactions. A log is serial if, for every pair of transactions, all of the operations of one transaction execute before any of the operations of the other. However, deciding whether an equivalent serial log exists is an NP-complete problem; that is, there is no known algorithm which will decide in polynomial time whether any given log is serializable.
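The lost-update anomaly of Figure 1 can be reproduced in a few lines. This sketch hard-codes one bad interleaving (both reads before either write) rather than simulating real concurrency:

```python
# The Figure 1 anomaly: both transactions read the same "first available
# seat" before either writes back, so one reservation silently overwrites
# the other and is lost.

def interleaved_booking():
    db = {"next_free_seat": 18, "reservations": {}}
    seat_a = db["next_free_seat"]          # A reads: seat 18 is free
    seat_b = db["next_free_seat"]          # B reads the same value before A writes
    db["reservations"][seat_a] = "Customer A"
    db["next_free_seat"] = seat_a + 1
    db["reservations"][seat_b] = "Customer B"   # overwrites A's reservation
    db["next_free_seat"] = seat_b + 1
    return db

db = interleaved_booking()
# two reservations were made, but the database records only one
```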
Since the serializability problem is NP-complete, several subclasses of serializable logs having polynomial-time membership tests have been introduced. The popular subclasses are: the class of serial logs (class S), the class of logs produced by two-phase locking schedulers (class 2PL), the class of logs produced by basic timestamp ordering schedulers (class BTO), and the class of conflict-preserving serializable logs (class CPSR), whose scheduler is based on serialization graph testing (SGT).
Figure 1. Example of an anomaly in the database in the absence of concurrency control.
Concurrency control in DBMSs is achieved by a program, called the scheduler, whose goal is to order the operations of transactions in such a way that the resulting log is serializable. Practically, 2PL is the most popular scheduling technique for centralized DBMSs. However, for distributed DBMSs, 2PL induces high communication cost because of the deadlock problem. Therefore, improved algorithms for concurrency control in distributed DBMSs are one of the active research areas in database theory. Theoretically, the class CPSR had been the most attractive log class until 1987, because CPSR was the largest known class of serializable logs in P. However, in 1987 a new class of serializable logs in P, called the class WRW, was introduced, and it was proved that the class WRW is a proper superset of the class CPSR. Almost at the same time the class HD was introduced, and it was proved that HD is a proper superset of the class WRW, which makes the HD class the largest known serializable log class in P.

2. DATABASE SYSTEM MODEL
A database is a structured collection of data items, denoted {...,x,y,z}, that can be accessed concurrently by several transactions. The size of the data contained in a data item is called the granularity of the data item. The granularity is not important for the scope of this study; practically, it could be chosen as a file, a record of a file, a field of a record, or a page of a disk.
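The serialization graph testing (SGT) idea behind the CPSR class above can be sketched directly: draw an edge Ti -> Tj for every pair of conflicting operations in which Ti's operation comes first, and accept the log iff the graph is acyclic. The operation encoding below is illustrative:

```python
# SGT sketch: operations are (transaction, kind, item) triples, where kind
# is "R" or "W". The log is conflict-serializable iff its serialization
# graph has no cycle.

def conflicts(op1, op2):
    (t1, k1, x1), (t2, k2, x2) = op1, op2
    return t1 != t2 and x1 == x2 and "W" in (k1, k2)

def serialization_graph(log):
    return {(a[0], b[0]) for i, a in enumerate(log)
            for b in log[i + 1:] if conflicts(a, b)}

def is_acyclic(edges):
    active = {n for e in edges for n in e}
    remaining = set(edges)
    while active:                            # repeatedly peel off source nodes
        with_incoming = {dst for _, dst in remaining}
        free = active - with_incoming        # nodes with no incoming edges
        if not free:
            return False                     # everyone waits on someone: a cycle
        active -= free
        remaining = {e for e in remaining if e[0] in active}
    return True

good = [("T1", "R", "x"), ("T1", "W", "y"), ("T2", "W", "x"), ("T2", "R", "y")]
bad = [("T1", "R", "x"), ("T2", "W", "x"), ("T2", "W", "y"), ("T1", "W", "y")]
```

In `good` every conflict orders T1 before T2, so the graph is acyclic; in `bad` the conflicts on x and y point in opposite directions, producing a cycle.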
The values of the data items at any time comprise the state of the database. A Database System (DBS) is a collection of software and hardware modules that supports the operations performed by the transactions. Users interact with the DBS through transactions. A transaction interacts with the outside world by issuing only read and write operations to the DBS or by doing terminal I/O. Users access data items by issuing read and write operations. A transaction, denoted by Ti, is a set of operations on data items that transforms a database from one consistent state to another. That is, transactions are assumed to be complete and correct computation units: if each transaction were executed alone on an initially consistent database, it would terminate, produce correct results, and leave the database in a consistent state.
A DBS simply consists of the following modules: the Transaction Manager (TM), the Data Manager (DM) and the Scheduler (Figure 2). A Distributed Database System (DDBS) is a collection of sites connected by a communication network, where each site is simply a DBS. However, in a DDBS each site runs one or more of the following software modules: a TM, a DM or a scheduler. In a DDBS the schedulers may be distributed, that is, there may be a local scheduler at each site. However, the local schedulers must cooperate to preserve the consistency of the database: the distributed schedulers must behave as if there were a single global scheduler in the system (Figure 3). The TM performs the preprocessing of the operations it receives from transactions and the DM manages the actual database, while the scheduler controls the relative order in which these operations are executed.
Figure 2. Database System Model
Figure 3. Distributed Database System Model
Each transaction issues its operations to a single TM, which receives the operations issued by transactions and forwards them to the scheduler. In a DDBS, the TM is also responsible for determining which scheduler should process the operation submitted by a transaction.
The scheduler is responsible for the consistency of the database. However, a scheduler pays no attention to the computations performed by the transactions; it makes its decisions solely by considering the type of the operations and the data items related to the operations. The scheduler controls the order in which the DMs process the read and write operations. When a scheduler receives a read or a write operation, it can either output the operation to the related DM, delay the operation by holding it for later action, or reject the operation. If an operation of a transaction is rejected, then the transaction should be aborted. Furthermore, every transaction that read a value written by the aborted transaction should also be aborted. This phenomenon, where one abort triggers other aborts, is called cascading aborts, and it is usually avoided by not allowing a transaction Ti to read another transaction Tj's output until Tj is committed, that is, until the DBS is certain that the transaction Tj will not be aborted. Therefore, an incomplete transaction cannot reveal its results to other transactions before its commitment, which is called isolation.
Usually, a scheduler decides whether to accept, reject or delay an operation every time it receives the operation. Another approach is to schedule each operation immediately as it is received and, when a transaction terminates, apply a validation test to the transaction. If the validation test terminates successfully, then the transaction is committed; otherwise it is aborted. Such schedulers are called optimistic schedulers because they optimistically assume that transactions will not be aborted. These schedulers are also called certifiers.
The DM executes each read and write operation it receives. For a read operation, the DM looks into its local database and returns the requested value. For a write operation, the DM modifies its local database and returns an acknowledgment.
The DM sends the returned value or acknowledgment to the scheduler, which relays it back to the TM, which relays it back to the transaction.
The read and write operations performed by transaction Ti on some data item x are denoted by Ri[x] and Wi[x] respectively. The read operation Ri[x] returns the value stored in data item x to transaction Ti, and the write operation Wi[x] changes the value of data item x to the one computed by Ti. Read operations have no effect on the consistency of the database. However, since write operations update the values of the data items, they cause a change in the state of the database. Two operations belonging to different transactions conflict if they operate on the same data item and at least one of them is a write. The read set of a transaction, denoted by S(Ri), is the set of data items the transaction reads, and the write set of a transaction, denoted by S(Wi), is the set of data items the transaction writes. The access set, or base set, of a transaction is the union of its read and write sets. When two or more transactions execute concurrently, their operations are executed in an interleaved fashion.
Each transaction starts with a begin operation and ends with a commit or abort operation. Commit indicates that the transaction has completed its execution and that the effects of the transaction on the database, that is, every write operation processed on behalf of the transaction, should be made permanent. Abort indicates that the transaction has terminated abnormally and that its effects should be undone by restoring the old values of the data items. The most common commitment protocol is two-phase commitment (2PC). In the first phase of 2PC, the values of the data items in the write set of the transaction are copied into secure storage at the related sites without overwriting the old values. If the first phase terminates successfully, the transaction commits and it cannot be aborted from this point on.
Then the commit message is sent to the related sites, and the effects of the transaction are made permanent by writing the values from the secure storage into the actual database. If a failure occurs during the first phase of 2PC, the transaction is aborted by simply discarding the values copied into the secure storage. If a failure occurs during the second phase of 2PC, there is no need for abortion; the values copied into the secure storage are written into the actual database when the failed site recovers. With 2PC, cascading aborts are also avoided, because the write operations are applied to the database only when the transactions commit.

3. TWO PHASE LOCKING METHOD

The two phase locking (2PL) scheduler is the most popular type of scheduler in commercial products. The 2PL technique uses two types of locks on the data items: read locks and write locks. The read lock is a shared lock whereas the write lock is an exclusive lock. That is, a transaction can hold a read lock on a data item only if no other transaction holds a write lock on that data item, and a transaction can hold a write lock on a data item only if no other transaction holds a read lock or a write lock on that data item. In a database system using the 2PL mechanism for concurrency control, each transaction obeys the following rules:

1. a transaction does not request a read or write lock on a data item if it already has that type of lock on that data item;
2. a transaction must have a read lock on a data item before reading it, and must have a write lock on a data item before writing it;
3. a transaction does not request any lock after it has released a lock.

Because of the third rule, each transaction has two phases; this is why the technique is called two phase locking. During the first phase, which is called the growing phase, a transaction obtains its locks without releasing any lock.
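The lock-compatibility rules above (shared read locks, exclusive write locks) can be sketched as a small lock table. This is an illustrative sketch only, not code from any particular DBMS; the class and method names are assumptions:

```python
# Minimal sketch of a 2PL-style lock table: read locks are shared,
# write locks are exclusive. All names here are illustrative.
class LockTable:
    def __init__(self):
        self.read_locks = {}   # data item -> set of transaction ids holding read locks
        self.write_locks = {}  # data item -> transaction id holding the write lock

    def read_lock(self, tid, item):
        # A read lock is granted unless another transaction holds a write lock.
        owner = self.write_locks.get(item)
        if owner is None or owner == tid:
            self.read_locks.setdefault(item, set()).add(tid)
            return True
        return False  # a real scheduler would delay (block) the request here

    def write_lock(self, tid, item):
        # A write lock requires that no OTHER transaction holds any lock on the item.
        other_readers = self.read_locks.get(item, set()) - {tid}
        owner = self.write_locks.get(item)
        if not other_readers and (owner is None or owner == tid):
            self.write_locks[item] = tid
            return True
        return False
```

For example, two transactions may read-lock the same item concurrently, but a write lock on that item is refused until the readers release their locks.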
The point at the end of the growing phase, when a transaction owns all the locks it will ever own, is called the locked point of the transaction. During the second phase, which is called the shrinking phase, the transaction releases the locks it obtained in the first phase. Note that a transaction cannot request a lock in the shrinking phase. When the transaction terminates, all the locks it obtained are released. The locked points of the transactions in a log L determine the serialization order of the transactions.

Two phase locking is sufficient to preserve serializability. However, 2PL is not sufficient to preserve isolation. If a transaction Ti releases some of its write locks before its commitment, then some other transaction Tj may read a value written by Ti. If Ti is then aborted, Tj and all the other transactions that have read some data item from Ti must also be aborted. To guarantee isolation, transactions are therefore required to hold all of their locks until their commitment at termination.

4. TIME STAMP ORDERING METHODS

The time stamp ordering (TO) technique is based on the idea that an operation is allowed to proceed only if all the conflicting operations of older transactions have already been processed; in this way the serializability of the transactions is preserved. This requires knowing which transactions are younger than others. In the implementation of a distributed timing system, each site in the distributed system contains a local clock, or a counter used as a clock. The clock is assumed to tick at least once between any two events, so the events within a site are totally ordered. For total ordering of the events at different sites, each site is assigned a unique number, and this number is concatenated, as the least significant bits, to the current value of the local clock.
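This clock scheme can be sketched in a few lines: a timestamp is a (counter, site_id) pair, compared lexicographically so that the site identifier acts as the least significant part. The sketch below also includes the usual Lamport rule of advancing the local clock when a message is received; class and method names are illustrative assumptions:

```python
# Illustrative sketch of a Lamport-style logical clock for one site.
# Timestamps are (counter, site_id) pairs; lexicographic comparison
# makes the site id the tie-breaking least significant part.
class SiteClock:
    def __init__(self, site_id):
        self.site_id = site_id
        self.counter = 0

    def tick(self):
        # The clock ticks at least once between any two local events.
        self.counter += 1
        return (self.counter, self.site_id)

    def receive(self, message_time):
        # On receiving a message, advance the local clock so the
        # receive event is later than the send event at the origin site.
        self.counter = max(self.counter, message_time[0]) + 1
        return (self.counter, self.site_id)
```

For instance, if site 1 sends a message stamped (5, 1) to site 2 whose counter is only 2, site 2 advances to (6, 2), so the receive timestamp is greater than the send timestamp.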
Furthermore, each message carries the local time, at its site of origin, at which the message was sent. Upon receiving a message, the local clock at the receiver site is advanced to be later than the time carried by the message (the Lamport clock rule). In a distributed system having such a clock facility, a timestamp, denoted TS(A), is assigned to an event A: the local time at which the event occurs, concatenated with the site identifier. TS(A) uniquely identifies the event, and for any two events A and B, if event A happened before event B, then TS(A) < TS(B).

In the timestamp ordering technique, each transaction Ti is assigned a unique timestamp TS(Ti). Each operation issued by transaction Ti is assigned the same timestamp TS(Ti), and conflicting operations are executed in the order of their timestamps. That is, the transactions obey the TO rule, which states that if pi[x] and qj[x] are conflicting operations belonging to different transactions Ti and Tj, then pi is executed before qj if and only if TS(Ti) < TS(Tj). Therefore, the transactions are processed such that their execution is equivalent to the execution of a serial log in which the transactions are ordered by their timestamps.

5. SERIALIZATION GRAPH TESTING METHOD (SGT OR CPSR)

The serialization graph for a log L, denoted SG(L), is a directed graph whose nodes are the transactions in the log. In SG(L) there is an edge from node Ti to node Tj if and only if pi[x] and qj[x] are conflicting operations belonging to transactions Ti and Tj respectively and the operation pi[x] precedes qj[x] in the log. A serialization graph testing (SGT) or conflict preserving serializability (CPSR) scheduler works by explicitly building a serialization graph.
When an SGT scheduler receives an operation pi[x], it adds the node Ti to the graph if Ti is not already present, and then adds an edge from Tj to Ti for every previously scheduled operation qj[x] that conflicts with pi[x]. If the resulting graph is cyclic, the operation pi[x] is rejected, transaction Ti is aborted, and the serialization graph is repaired by removing all the edges incident on node Ti, as well as node Ti itself.

A distributed database management system

A distributed database management system (DDBMS) is the software that manages the distributed databases and provides an access mechanism that makes this distribution transparent to the user. The objective of a DDBMS is to control the management of a distributed database (DDB) in such a way that it appears to the user as a centralized database. This image of a centralized environment is accomplished with the support of various kinds of transparencies, such as: Location Transparency, Performance Transparency, Copy Transparency, Transaction Transparency, Fragment Transparency, Schema Change Transparency, and Local DBMS Transparency.

Distributed Database - Fragmentation

Fragmentation involves breaking a relation (table) into two or more pieces, either horizontally (Horizontal Fragmentation) or vertically (Vertical Fragmentation) or both (Hybrid), mainly to improve the availability of data to end users and end-user programs.

Let us start this section with an example. Consider XYZ bank, which currently has around 1000 branches all over the country. Assume that it maintains its database at a single location, say New Delhi (Head office - Central Site). The problem is that all the requests generated from any part of the country can only be handled at the central site (New Delhi).
The requests might be generated for withdrawal of money, balance inquiry, PIN change, transfer of funds, POS purchase, etc., through ATMs, net banking, or POS terminals. Think about the number of transactions that could be generated, and the network traffic created, if thousands of the bank's customers use the above modes for their daily transactions, including direct transactions at the bank counters. One possible solution for handling such a huge number of transactions is to have a distributed database. But this puts a set of questions in front of us:

• How are we going to fragment a table?
• How many fragments should be created?
• Which strategy of fragmentation would help improve performance?
• Should all the tables in a database be fragmented, or only a few?
• Where do we keep the fragments after fragmentation? (Allocation problem)

Answers to these questions will help us in understanding, fragmenting, and improving the overall system.

Types of Fragmentation:

The first question, 'How are we going to fragment a table?', can be answered here. We have the following types of fragmentation.

1. Horizontal Fragmentation
2. Vertical Fragmentation
3. Hybrid Fragmentation

1. Horizontal Fragmentation: A relation (table) is partitioned into multiple subsets horizontally using simple conditions. Let us take a relation with schema Account(Acno, Balance, Branch_Name, Type). If the permitted values for the Branch_Name attribute are 'New Delhi', 'Chennai', and 'Mumbai', then the following SQL query would extract the set of tuples (records) satisfying a simple condition:

SELECT * FROM account WHERE branch_name = 'Chennai';

This query gets all the records pertaining to the 'Chennai' branch, without any change to the schema of the table. We can get three such sets of records by changing the branch_name value in the WHERE clause of the above query: one for 'Chennai', one for 'New Delhi', and one for 'Mumbai'.
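Outside SQL, the same branch-wise partitioning can be sketched in a few lines of Python. The sample rows below are illustrative and merely follow the Account(Acno, Balance, Branch_Name) shape used in these notes:

```python
# Illustrative sketch: horizontal fragmentation of Account rows by Branch_Name.
rows = [
    ("A101", 5000, "Mumbai"),
    ("A103", 10000, "New Delhi"),
    ("A104", 2000, "Chennai"),
    ("A102", 12000, "Chennai"),
]

def fragment_by_branch(rows, branch):
    # Equivalent to: SELECT * FROM account WHERE branch_name = <branch>
    return [row for row in rows if row[2] == branch]

# One fragment per permitted Branch_Name value.
fragments = {b: fragment_by_branch(rows, b)
             for b in ("Chennai", "New Delhi", "Mumbai")}
```

Each fragment keeps the full schema; only the set of rows is restricted, and together the fragments cover every row of the original relation.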
This way of horizontally slicing the whole table into multiple subsets without altering the table structure is called Horizontal Fragmentation. The concept is usually used to keep tuples (records) at the places where they are used the most, to minimize data transfer between distant locations. Horizontal Fragmentation has two variants:

1. Primary Horizontal Fragmentation (PHF)
2. Derived Horizontal Fragmentation (DHF)

1.1 Primary Horizontal Fragmentation (PHF)

Primary Horizontal Fragmentation is about fragmenting a single table horizontally (row wise) using a set of simple predicates (conditions).

What is a simple predicate? Given a table R with a set of attributes [A1, A2, …, An], a simple predicate Pi can be expressed as follows:

Pi : Aj θ Value

where θ can be any of the symbols in the set {=, <, >, ≤, ≥, ≠}, and Value can be any value stored in the table for the attribute Aj. For example, consider the following table Account given in Figure 1:

Acno  Balance  Branch_Name
A101  5000     Mumbai
A103  10000    New Delhi
A104  2000     Chennai
A102  12000    Chennai
A110  6000     Mumbai
A115  6000     Mumbai
A120  2500     New Delhi

Figure 1: Account table

For the above table, we can define simple predicates such as Branch_name = ‘Chennai’, Branch_name = ‘Mumbai’, or Balance < 10000 using the above expression “Aj θ Value”.

What is a set of simple predicates? A set of simple predicates is the set of all conditions collectively required to fragment a relation into subsets.
For a table R, a set of simple predicates can be defined as:

P = { P1, P2, …, Pn }

Example 1

As an example, for the above table Account, if the simple conditions are Balance < 10000 and Balance ≥ 10000, then the set of simple predicates is P1 = {Balance < 10000, Balance ≥ 10000}.

Example 2

As another example, if the simple conditions are Branch_name = ‘Chennai’, Branch_name = ‘Mumbai’, Balance < 10000, and Balance ≥ 10000, then the set of simple predicates is P2 = { Branch_name = ‘Chennai’, Branch_name = ‘Mumbai’, Balance < 10000, Balance ≥ 10000 }.

What is a Min-term Predicate? When we fragment a relation horizontally, we use a single condition, or a set of simple predicates, to filter the data. Given a relation R and a set of simple predicates, we can fragment the relation horizontally as follows (relational algebra expression):

Fragment, Ri = σFi(R), 1 ≤ i ≤ n

where Fi is a conjunction of simple predicates, also called a min-term predicate, which can be written as follows:

Min-term predicate, Mi = P1 Λ P2 Λ P3 Λ … Λ Pn

Here, each Pk may appear either as Pk itself or as its negation ¬(Pk). Using the conjunction of the simple predicates in different combinations, we can derive many such min-term predicates. For Example 1 stated previously, we can derive the set of min-term predicates using the rules stated above. We will get 2^n min-term predicates, where n is the number of simple predicates in the given predicate set. For P1, we have 2 simple predicates; hence we will get 4 (2^2) possible min-term predicates:

m1 = {Balance < 10000 Λ Balance ≥ 10000}
m2 = {Balance < 10000 Λ ¬(Balance ≥ 10000)}
m3 = {¬(Balance < 10000) Λ Balance ≥ 10000}
m4 = {¬(Balance < 10000) Λ ¬(Balance ≥ 10000)}

Our next step is to choose the min-term predicates that can satisfy certain conditions to fragment a table, and to eliminate the others, which are not useful.
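This generate-and-eliminate step can be mechanized. The rough sketch below treats each simple predicate as a boolean function on a row and enumerates all 2^n sign combinations; contradictory min-terms show up as fragments that are empty for every possible row. All names are illustrative:

```python
from itertools import product

# Simple predicates of Example 1 as boolean functions of a row (a dict).
predicates = [
    lambda row: row["Balance"] < 10000,
    lambda row: row["Balance"] >= 10000,
]

def minterms(preds):
    # Each min-term picks, for every simple predicate, either the predicate
    # or its negation, giving 2**n conjunctions in total.
    for signs in product([True, False], repeat=len(preds)):
        yield lambda row, signs=signs: all(p(row) == s for p, s in zip(preds, signs))

rows = [{"Balance": 5000}, {"Balance": 12000}]
fragments = [[r for r in rows if m(r)] for m in minterms(predicates)]
# Contradictory min-terms such as (Balance < 10000 AND Balance >= 10000)
# can never be satisfied, so their fragments stay empty and are eliminated.
nonempty = [f for f in fragments if f]
```

Here four min-terms are generated, the two contradictory ones produce empty fragments, and the remaining two correspond to m2 and m3 of the example.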
For example, each of the above min-term predicates can be applied as a formula Fi in the rule above to produce a fragment Ri, as follows:

Account1 = σBalance < 10000 Λ Balance ≥ 10000(Account)

which can be written as the equivalent SQL query:

Account1 <-- SELECT * FROM account WHERE balance < 10000 AND balance >= 10000;

Account2 = σBalance < 10000 Λ ¬(Balance ≥ 10000)(Account)

which can be written as the equivalent SQL query:

Account2 <-- SELECT * FROM account WHERE balance < 10000 AND NOT balance >= 10000;

where NOT balance >= 10000 is equivalent to balance < 10000.

Account3 = σ¬(Balance < 10000) Λ Balance ≥ 10000(Account)

which can be written as the equivalent SQL query:

Account3 <-- SELECT * FROM account WHERE NOT balance < 10000 AND balance >= 10000;

where NOT balance < 10000 is equivalent to balance >= 10000.

Account4 = σ¬(Balance < 10000) Λ ¬(Balance ≥ 10000)(Account)

which can be written as the equivalent SQL query:

Account4 <-- SELECT * FROM account WHERE NOT balance < 10000 AND NOT balance >= 10000;

where NOT balance < 10000 is equivalent to balance >= 10000 and NOT balance >= 10000 is equivalent to balance < 10000; this is exactly the same contradiction as in the query for fragment Account1.

From these examples, it is clear that the first query, for fragment Account1 (min-term predicate m1), is invalid: an attribute cannot have two values in one record. That is, the condition (Balance < 10000 Λ Balance ≥ 10000) requires the balance to be both less than 10000 and greater than or equal to 10000, which is impossible, so this min-term can be eliminated. For fragment Account2 (min-term predicate m2), the condition reduces to (balance < 10000 AND balance < 10000), which simply means balance < 10000, and is therefore valid. Likewise, fragment Account3 is valid and Account4 must be eliminated. Finally, we use the min-term predicates m2 and m3 to fragment the Account relation.
The fragments of Account can be derived as follows:

Account2 <-- SELECT * FROM account WHERE balance < 10000;

Acno  Balance  Branch_Name
A101  5000     Mumbai
A104  2000     Chennai
A120  2500     New Delhi
A110  6000     Mumbai
A115  6000     Mumbai

Account3 <-- SELECT * FROM account WHERE balance >= 10000;

Acno  Balance  Branch_Name
A103  10000    New Delhi
A102  12000    Chennai

Correctness of Fragmentation

We have chosen the set of min-term predicates to be used to horizontally fragment a relation (table) into pieces. Our next step is to validate the chosen fragments for their correctness; we need to verify that we did not miss anything. We use the following rules to ensure that we have not changed the semantic information of the table being fragmented.

1. Completeness – If a relation R is fragmented into a set of fragments, then every tuple (record) of R must be found in at least one of the fragments. This rule ensures that we have not lost any records during fragmentation.

2. Reconstruction – After fragmenting a table, we must be able to reconstruct it in its original form, without any data loss, through some relational operation. This rule ensures that we can construct the base table back from its fragments without losing any information; that is, we can write a query combining the fragments to get the original relation back.

3. Disjointness – If a relation R is fragmented into a set of sub-tables R1, R2, …, Rn, a record that belongs to one fragment is not found in any other fragment. This ensures that the fragments do not overlap.

For example, consider the Account table in Figure 1 and its fragments Account2 and Account3, created using the min-term predicates we derived. From the tables Account2 and Account3 it is clear that the fragmentation is Complete: we have not missed any records, and every record is included in one of the sub-tables. When we apply a Union operation between Account2 and Account3, we get the original relation Account back.
(SELECT * FROM account2) UNION (SELECT * FROM account3);

The above query gets us Account back without loss of any information; hence, the fragments can be reconstructed. Finally, if we write the following query, we get an empty set as output, which shows that the Disjointness property is satisfied:

(SELECT * FROM account2) INTERSECT (SELECT * FROM account3);

We get an empty set as the result of this query because there is no record common to the relations Account2 and Account3.

For Example 2, recall the set of simple predicates, which was:

Set of simple predicates P2 = { Branch_name = ‘Chennai’, Branch_name = ‘Mumbai’, Balance < 10000, Balance ≥ 10000 }

We can derive the following min-term predicates:

m1 = { Branch_name = ‘Chennai’ Λ Branch_name = ‘Mumbai’ Λ Balance < 10000 Λ Balance ≥ 10000 }
m2 = { Branch_name = ‘Chennai’ Λ Branch_name = ‘Mumbai’ Λ Balance < 10000 Λ ¬(Balance ≥ 10000) }
m3 = { Branch_name = ‘Chennai’ Λ Branch_name = ‘Mumbai’ Λ ¬(Balance < 10000) Λ Balance ≥ 10000 }
m4 = { Branch_name = ‘Chennai’ Λ ¬(Branch_name = ‘Mumbai’) Λ Balance < 10000 Λ Balance ≥ 10000 }
…
mn = { ¬(Branch_name = ‘Chennai’) Λ ¬(Branch_name = ‘Mumbai’) Λ ¬(Balance < 10000) Λ ¬(Balance ≥ 10000) }

As in the previous example, out of the 16 (2^4) min-term predicates, those which are not valid should be eliminated. In the end, we are left with the following set of valid min-term predicates.
m1 = { Branch_name = ‘Chennai’ Λ ¬(Branch_name = ‘Mumbai’) Λ ¬(Balance < 10000) Λ Balance ≥ 10000 }
m2 = { Branch_name = ‘Chennai’ Λ ¬(Branch_name = ‘Mumbai’) Λ Balance < 10000 Λ ¬(Balance ≥ 10000) }
m3 = { ¬(Branch_name = ‘Chennai’) Λ Branch_name = ‘Mumbai’ Λ ¬(Balance < 10000) Λ Balance ≥ 10000 }
m4 = { ¬(Branch_name = ‘Chennai’) Λ Branch_name = ‘Mumbai’ Λ Balance < 10000 Λ ¬(Balance ≥ 10000) }
m5 = { ¬(Branch_name = ‘Chennai’) Λ ¬(Branch_name = ‘Mumbai’) Λ ¬(Balance < 10000) Λ Balance ≥ 10000 }
m6 = { ¬(Branch_name = ‘Chennai’) Λ ¬(Branch_name = ‘Mumbai’) Λ Balance < 10000 Λ ¬(Balance ≥ 10000) }

The horizontal fragments using the above set of min-term predicates can be generated as follows:

Fragment 1: SELECT * FROM account WHERE branch_name = ‘Chennai’ AND balance >= 10000;

Account1
Acno  Balance  Branch_Name
A102  12000    Chennai

Fragment 2: SELECT * FROM account WHERE branch_name = ‘Chennai’ AND balance < 10000;

Account2
Acno  Balance  Branch_Name
A104  2000     Chennai

Fragment 3: SELECT * FROM account WHERE branch_name = ‘Mumbai’ AND balance >= 10000;

Account3
Acno  Balance  Branch_Name
(no tuples)

Fragment 4: SELECT * FROM account WHERE branch_name = ‘Mumbai’ AND balance < 10000;

Account4
Acno  Balance  Branch_Name
A101  5000     Mumbai
A110  6000     Mumbai
A115  6000     Mumbai

In the Account table we have the third branch, ‘New Delhi’, which was not specified in the set of simple predicates. Hence, in the fragmentation process we must not leave out the tuples with the value ‘New Delhi’.
That is the reason we have included the min-term predicates m5 and m6, whose fragments can be derived as follows:

Fragment 5: SELECT * FROM account WHERE branch_name <> ‘Mumbai’ AND branch_name <> ‘Chennai’ AND balance >= 10000;

Account5
Acno  Balance  Branch_Name
A103  10000    New Delhi

Fragment 6: SELECT * FROM account WHERE branch_name <> ‘Mumbai’ AND branch_name <> ‘Chennai’ AND balance < 10000;

Account6
Acno  Balance  Branch_Name
A120  2500     New Delhi

Correctness of fragmentation:

Completeness: The tuples of the Account table are distributed among the different fragments, and no records are omitted. Hence, the above fragmentation is Complete.

Reconstruction: By performing the Union operation over all the fragments Account1 through Account6, we are able to get the original Account table back without any information loss. Hence, the fragmentation is correct and the Reconstruction property is satisfied.

Disjointness: When we perform the Intersect operation between any two of the above fragments, we get an empty set as the result, as no record is common to two fragments. Hence, the Disjointness property is satisfied.
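The three correctness rules can also be checked mechanically on any proposed fragmentation. The sketch below runs the checks over the sample Account rows of Figure 1; the function names are illustrative, not from any standard library:

```python
# Check completeness, reconstruction and disjointness of a horizontal
# fragmentation. Rows are tuples; fragments are lists of rows.
account = [
    ("A101", 5000, "Mumbai"), ("A103", 10000, "New Delhi"),
    ("A104", 2000, "Chennai"), ("A102", 12000, "Chennai"),
    ("A110", 6000, "Mumbai"), ("A115", 6000, "Mumbai"),
    ("A120", 2500, "New Delhi"),
]

def is_complete(relation, fragments):
    # Every tuple of the relation appears in at least one fragment.
    return all(any(row in f for f in fragments) for row in relation)

def reconstructs(relation, fragments):
    # The union of all fragments gives the original relation back.
    return set().union(*map(set, fragments)) == set(relation)

def is_disjoint(fragments):
    # No tuple appears in more than one fragment.
    seen = set()
    for f in fragments:
        if seen & set(f):
            return False
        seen |= set(f)
    return True

# A simple two-way fragmentation on Balance, as in Example 1.
fragments = [
    [r for r in account if r[1] < 10000],
    [r for r in account if r[1] >= 10000],
]
```

Running all three checks on this fragmentation confirms that it is complete, reconstructible by union, and disjoint; duplicating a row across fragments would make the disjointness check fail.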