CS 407 Distributed System & Databases
Engr. Muhammad Nadeem

WEEK # 13 - DDBMS Architecture, Dimensions, Architectural Alternatives
DAY # 37

Distribution Design Issues
Three key issues: Fragmentation, Allocation, Replication.

Fragmentation
A relation may be divided into a number of sub-relations, which are then distributed.

Allocation
Each fragment is stored at the site with “optimal” distribution.

Replication
A copy of a fragment may be maintained at several sites.

Why Fragment?
Usage
o Applications work with views rather than entire relations.
Efficiency
o Data is stored close to where it is most frequently used.
o Data that is not needed by local applications is not stored.
Parallelism
o With fragments as the unit of distribution, a transaction can be divided into several subqueries that operate on fragments.
Security
o Data not required by local applications is not stored and so is not available to unauthorized users.
Disadvantages
o Performance,
o Integrity.

Types of Fragmentation
Four types of fragmentation:
o Horizontal,
o Vertical,
o Mixed,
o Derived.
Other possibility is no fragmentation:
o If a relation is small and not updated frequently, it may be better not to fragment it.

Horizontal Fragmentation
Consists of a subset of the tuples of a relation.
Defined using the Selection operation of relational algebra: σ_p(R)
For example:
P1 = σ_{type=‘House’}(PropertyForRent)
P2 = σ_{type=‘Flat’}(PropertyForRent)
This strategy is determined by looking at the predicates used by transactions.
Involves finding a set of minimal (complete and relevant) predicates.
A set of predicates is complete if and only if any two tuples in the same fragment are referenced with the same probability by any application.
A predicate is relevant if there is at least one application that accesses the fragments differently.

Vertical Fragmentation
Consists of a subset of the attributes of a relation.
Defined using the Projection operation of relational algebra: Π_{a1, ..., an}(R)
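The Selection and Projection definitions above can be tried out on a small in-memory relation. The following is only a minimal Python sketch: the helper names `select` and `project`, the sample rows, and the attributes `propertyNo` and `rent` are assumptions made for illustration; only `PropertyForRent` and `type` come from the notes.

```python
# Toy PropertyForRent relation; each tuple is a dict.
# The rows and the propertyNo/rent attributes are hypothetical sample data.
property_for_rent = [
    {"propertyNo": "PA14", "type": "House", "rent": 650},
    {"propertyNo": "PL94", "type": "Flat", "rent": 400},
    {"propertyNo": "PG4", "type": "Flat", "rent": 350},
    {"propertyNo": "PG36", "type": "Flat", "rent": 375},
]


def select(relation, predicate):
    """Selection sigma_p(R): keep the tuples that satisfy the predicate p."""
    return [row for row in relation if predicate(row)]


def project(relation, attributes):
    """Projection Pi_{a1..an}(R): keep only the named attributes, without duplicates."""
    result = []
    for row in relation:
        reduced = {name: row[name] for name in attributes}
        if reduced not in result:
            result.append(reduced)
    return result


# Horizontal fragments, matching P1 and P2 in the notes.
p1 = select(property_for_rent, lambda row: row["type"] == "House")
p2 = select(property_for_rent, lambda row: row["type"] == "Flat")

# In this sample every tuple is a House or a Flat, so the union of the
# two fragments contains exactly the tuples of the original relation.
by_key = lambda row: row["propertyNo"]
union_matches_original = sorted(p1 + p2, key=by_key) == sorted(property_for_rent, key=by_key)

# A vertical fragment: a subset of the attributes, keeping the key propertyNo.
v1 = project(property_for_rent, ["propertyNo", "rent"])
```

Keeping the key attribute in the vertical fragment is a design choice of this sketch: it is what would allow the fragment to be joined back to the rest of the relation.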
For example:
S1 = Π_{staffNo, position, sex, DOB, salary}(Staff)
S2 = Π_{staffNo, fName, lName, branchNo}(Staff)
Determined by establishing the affinity of one attribute to another.

Mixed Fragmentation
Consists of a horizontal fragment that is vertically fragmented, or a vertical fragment that is horizontally fragmented.
Defined using the Selection and Projection operations of relational algebra:
σ_p(Π_{a1, ..., an}(R)) or Π_{a1, ..., an}(σ_p(R))
Example:
S1 = Π_{staffNo, position, sex, DOB, salary}(Staff)
S2 = Π_{staffNo, fName, lName, branchNo}(Staff)
S21 = σ_{branchNo=‘B003’}(S2)
S22 = σ_{branchNo=‘B005’}(S2)
S23 = σ_{branchNo=‘B007’}(S2)

Derived Horizontal Fragmentation
A horizontal fragment that is based on the horizontal fragmentation of a parent relation.
Ensures that fragments that are frequently joined together are at the same site.
Defined using the Semijoin operation of relational algebra:
o R_i = R ⋉_F S_i, 1 ≤ i ≤ w
If a relation contains more than one foreign key, one of the referenced relations must be selected as the parent.
The choice can be based on the fragmentation used most frequently or the fragmentation with better join characteristics.

Data Allocation
Four alternative strategies regarding placement of data: Centralized, Partitioned (or Fragmented), Complete Replication, Selective Replication.

Centralized
Consists of a single database and DBMS stored at one site, with users distributed across the network.

Partitioned
Database is partitioned into disjoint fragments, each fragment assigned to one site.

Complete Replication
Consists of maintaining a complete copy of the database at each site.

Selective Replication
Combination of partitioning, replication, and centralization.

Data Replication
• Fully replicated database:
– Stores multiple copies of each database fragment at multiple sites
– Can be impractical due to amount of overhead
• Partially replicated database:
– Stores multiple copies of some database fragments at multiple sites
– Most DDBMSs are able to handle the partially replicated database well
• Unreplicated database:
– Stores each database fragment at a single site
– No duplicate database fragments

WEEK # 13 - DDBMS Architecture, Dimensions, Architectural Alternatives
DAY # 38

C. J. Date’s 12 Rules for a DDBMS
Fundamental Principle: To the user, a distributed system should look exactly like a non-distributed system.
1. Local Autonomy
2. No Reliance on a Central Site
3. Continuous Operation
4. Location Independence
5. Fragmentation Independence
6. Replication Independence
7. Distributed Query Processing
8. Distributed Transaction Processing
9. Hardware Independence
10. Operating System Independence
11. Network Independence
12. Database Independence
The last four rules are ideals.

Briefly define the Distributed Query Processing Methodology using a diagram.
[Figure: Distributed Query Processing Methodology]

Differentiate between homogeneous and heterogeneous DDBMS.

Homogeneous DDBMS
All sites use the same DBMS product.
Much easier to design and manage.
Approach provides incremental growth and allows increased performance.

Heterogeneous DDBMS
Sites may run different DBMS products, with possibly different underlying data models.
Occurs when sites have implemented their own databases and integration is considered later.
Translations are required to allow for:
o Different hardware.
o Different DBMS products.
o Different hardware and different DBMS products.
Typical solution is to use gateways.

What do you understand by Transaction Transparency and Concurrency Transparency in a Distributed Database System?

Transaction Transparency
Ensures that all distributed transactions maintain the distributed database’s integrity and consistency.
A distributed transaction accesses data stored at more than one location.
Each transaction is divided into a number of sub-transactions, one for each site that has to be accessed.
The DDBMS must ensure the indivisibility of both the global transaction and each of its sub-transactions.

Concurrency Transparency
All transactions must execute independently and be logically consistent with the results that would be obtained if the transactions were executed one at a time, in some arbitrary serial order.
Same fundamental principles as for a centralized DBMS.
The DDBMS must ensure that global and local transactions do not interfere with each other.
Similarly, the DDBMS must ensure the consistency of all sub-transactions of a global transaction.

WEEK # 13 - DDBMS Architecture, Dimensions, Architectural Alternatives
DAY # 39

Lab - Replication in MS SQL Server
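As a warm-up before the lab, the three replication scenarios from DAY # 37 (fully replicated, partially replicated, unreplicated) can be modelled in a few lines of Python. This is a toy sketch under stated assumptions, not SQL Server's replication mechanism or API: the functions `classify` and `sites_to_update`, the site names, and the allocation dictionaries are all hypothetical; only the fragment names S1, S21 and S22 are borrowed from the earlier examples.

```python
def classify(allocation):
    """Classify a fragment -> list-of-sites mapping using the DAY # 37 definitions."""
    copies = [len(set(sites)) for sites in allocation.values()]
    if all(count == 1 for count in copies):
        return "unreplicated"          # each fragment stored at a single site
    if all(count > 1 for count in copies):
        return "fully replicated"      # multiple copies of each fragment
    return "partially replicated"      # multiple copies of some fragments


def sites_to_update(allocation, fragment):
    """Sites that must apply a write so that every copy of the fragment stays consistent."""
    return sorted(set(allocation[fragment]))


# Hypothetical allocations of three fragments across three sites.
unreplicated = {"S1": ["site_A"], "S21": ["site_B"], "S22": ["site_C"]}
partial = {"S1": ["site_A", "site_B"], "S21": ["site_B"], "S22": ["site_C"]}
full = {
    "S1": ["site_A", "site_B", "site_C"],
    "S21": ["site_A", "site_B", "site_C"],
    "S22": ["site_A", "site_B", "site_C"],
}

kinds = [classify(unreplicated), classify(partial), classify(full)]

# A single write to S1 touches one site, two sites, or all three sites,
# which illustrates the update overhead that grows with replication.
write_cost = [len(sites_to_update(a, "S1")) for a in (unreplicated, partial, full)]
```

The `write_cost` list makes the trade-off from the notes concrete: the more copies of a fragment exist, the more sites each update has to reach.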