CS 407 Distributed System & Databases

WEEK # 13 - DDBMS Architecture, Dimensions, Architectural Alternatives
DAY # 37
Distribution Design Issues
Three key issues:
 Fragmentation,
 Allocation,
 Replication.
Fragmentation
Relation may be divided into a number of sub-relations, which are then distributed.
Allocation
Each fragment is stored at the site that gives an “optimal” distribution.
Replication
Copy of fragment may be maintained at several sites.
Why Fragment?

Usage
o Applications work with views rather than entire relations.

Efficiency
o Data is stored close to where it is most frequently used.
o Data that is not needed by local applications is not stored.

Parallelism
o With fragments as unit of distribution, transaction can be divided into several subqueries
that operate on fragments.

Security
o Data not required by local applications is not stored and so not available to unauthorized
users.
Disadvantages
o Performance,
o Integrity.
Engr. Muhammad Nadeem
Types of Fragmentation

Four types of fragmentation:
o Horizontal,
o Vertical,
o Mixed,
o Derived.

Other possibility is no fragmentation:
o If relation is small and not updated frequently, may be better not to fragment relation.
Horizontal Fragmentation

Consists of a subset of the tuples of a relation.

Defined using Selection operation of relational algebra:

σ_p(R)

For example:

P1 = σ_{type=‘House’}(PropertyForRent)
P2 = σ_{type=‘Flat’}(PropertyForRent)

This strategy is determined by looking at predicates used by transactions.

Involves finding set of minimal (complete and relevant) predicates.

Set of predicates is complete, if and only if, any two tuples in same fragment are referenced with
same probability by any application.

Predicate is relevant if there is at least one application that accesses fragments differently.
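
The selection-based definition above can be sketched in Python. This is only an illustration: the sample rows, the propertyNo and rent attributes, and the helper names select and keys are assumptions, and only the type predicate comes from the example in the notes. The two checks at the end are standard sanity checks on a horizontal fragmentation and are added here, not taken from the notes.

```python
# Hypothetical sample data; only the 'type' attribute appears in the notes.
property_for_rent = [
    {"propertyNo": "PA14", "type": "House", "rent": 650},
    {"propertyNo": "PL94", "type": "Flat", "rent": 400},
    {"propertyNo": "PG4", "type": "Flat", "rent": 350},
    {"propertyNo": "PG21", "type": "House", "rent": 600},
]


def select(relation, predicate):
    """Relational selection: keep the tuples that satisfy the predicate."""
    return [row for row in relation if predicate(row)]


def keys(rows):
    """Set of primary-key values (propertyNo is an assumed key)."""
    return {row["propertyNo"] for row in rows}


# P1 and P2 from the example: one fragment per value of 'type'.
p1 = select(property_for_rent, lambda r: r["type"] == "House")
p2 = select(property_for_rent, lambda r: r["type"] == "Flat")

# Every tuple lands in some fragment, and no tuple lands in two.
covers_all_tuples = keys(p1) | keys(p2) == keys(property_for_rent)
fragments_disjoint = not (keys(p1) & keys(p2))

# The original relation is recovered as the union of the fragments.
reconstructed = sorted(p1 + p2, key=lambda r: r["propertyNo"])
```

Because each predicate tests a single value of type and every row has exactly one type, the union of P1 and P2 gives back PropertyForRent.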
Vertical Fragmentation

Consists of a subset of attributes of a relation.

Defined using Projection operation of relational algebra:

Π_{a1, ..., an}(R)

For example:

S1 = Π_{staffNo, position, sex, DOB, salary}(Staff)
S2 = Π_{staffNo, fName, lName, branchNo}(Staff)
Determined by establishing affinity of one attribute to another.
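
A minimal sketch of the S1/S2 example, assuming staffNo is the primary key of Staff and using made-up rows. The helpers project and join_on_key are hypothetical names introduced for illustration. Both fragments keep staffNo, which is what allows Staff to be rebuilt by joining them.

```python
# Hypothetical Staff rows; attribute names follow the S1/S2 example.
staff = [
    {"staffNo": "SL21", "fName": "John", "lName": "White", "position": "Manager",
     "sex": "M", "DOB": "1965-10-01", "salary": 30000, "branchNo": "B005"},
    {"staffNo": "SG37", "fName": "Ann", "lName": "Beech", "position": "Assistant",
     "sex": "F", "DOB": "1980-11-10", "salary": 12000, "branchNo": "B003"},
]


def project(relation, attributes):
    """Relational projection. No duplicate removal is needed here because
    the key (staffNo) is kept in every fragment."""
    return [{a: row[a] for a in attributes} for row in relation]


def join_on_key(left, right, key):
    """Natural join of two fragments that share only the key attribute."""
    right_by_key = {row[key]: row for row in right}
    return [{**l, **right_by_key[l[key]]} for l in left if l[key] in right_by_key]


s1 = project(staff, ["staffNo", "position", "sex", "DOB", "salary"])
s2 = project(staff, ["staffNo", "fName", "lName", "branchNo"])

# Joining the vertical fragments on the shared key recovers Staff.
rebuilt = join_on_key(s1, s2, "staffNo")
```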
Mixed Fragmentation

Consists of a horizontal fragment that is vertically fragmented, or a vertical fragment that is
horizontally fragmented.

Defined using Selection and Projection operations of relational algebra:

σ_p(Π_{a1, ..., an}(R))
or
Π_{a1, ..., an}(σ_p(R))

Example:
S1 = Π_{staffNo, position, sex, DOB, salary}(Staff)
S2 = Π_{staffNo, fName, lName, branchNo}(Staff)
S21 = σ_{branchNo=‘B003’}(S2)
S22 = σ_{branchNo=‘B005’}(S2)
S23 = σ_{branchNo=‘B007’}(S2)
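
The mixed example can be sketched by composing a projection with selections. The rows and the helper names select and project are assumptions made for illustration. The attribute lists and branch numbers are the ones used in the example.

```python
# Hypothetical Staff rows covering the three branches in the example.
staff = [
    {"staffNo": "SG37", "fName": "Ann", "lName": "Beech", "position": "Assistant",
     "sex": "F", "DOB": "1980-11-10", "salary": 12000, "branchNo": "B003"},
    {"staffNo": "SG14", "fName": "David", "lName": "Ford", "position": "Supervisor",
     "sex": "M", "DOB": "1978-03-24", "salary": 18000, "branchNo": "B003"},
    {"staffNo": "SL21", "fName": "John", "lName": "White", "position": "Manager",
     "sex": "M", "DOB": "1965-10-01", "salary": 30000, "branchNo": "B005"},
    {"staffNo": "SA9", "fName": "Mary", "lName": "Howe", "position": "Assistant",
     "sex": "F", "DOB": "1990-02-19", "salary": 9000, "branchNo": "B007"},
]


def select(relation, predicate):
    """Relational selection."""
    return [row for row in relation if predicate(row)]


def project(relation, attributes):
    """Relational projection (the key staffNo is kept, so no duplicates arise)."""
    return [{a: row[a] for a in attributes} for row in relation]


# Vertical step: split the attributes into S1 and S2.
s1 = project(staff, ["staffNo", "position", "sex", "DOB", "salary"])
s2 = project(staff, ["staffNo", "fName", "lName", "branchNo"])

# Horizontal step: split S2 by branch into S21, S22, S23.
s21 = select(s2, lambda r: r["branchNo"] == "B003")
s22 = select(s2, lambda r: r["branchNo"] == "B005")
s23 = select(s2, lambda r: r["branchNo"] == "B007")
```

Only S2 can be split by branch here, because S1 does not carry branchNo.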
Derived Horizontal Fragmentation

A horizontal fragment that is based on horizontal fragmentation of a parent relation.

Ensures that fragments that are frequently joined together are at same site.

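Defined using Semijoin operation of relational algebra:
o Ri = R ⋉_F Si, 1 ≤ i ≤ w
  where w is the number of horizontal fragments defined on the parent relation S, and F is the join attribute.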

If relation contains more than one foreign key, need to select one as parent.

Choice can be based on fragmentation used most frequently or fragmentation with better join
characteristics.
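
A minimal sketch of derived horizontal fragmentation, assuming Staff is the parent relation (already fragmented by branchNo) and PropertyForRent is the child that refers to it through staffNo. The rows, the choice of parent, and the helper name semijoin are illustrative assumptions, not taken from the notes.

```python
# Hypothetical parent (Staff) and child (PropertyForRent) relations.
staff = [
    {"staffNo": "SG37", "branchNo": "B003"},
    {"staffNo": "SG14", "branchNo": "B003"},
    {"staffNo": "SL41", "branchNo": "B005"},
    {"staffNo": "SA9", "branchNo": "B007"},
]
property_for_rent = [
    {"propertyNo": "PG4", "staffNo": "SG37"},
    {"propertyNo": "PG36", "staffNo": "SG37"},
    {"propertyNo": "PG21", "staffNo": "SG14"},
    {"propertyNo": "PL94", "staffNo": "SL41"},
    {"propertyNo": "PA14", "staffNo": "SA9"},
]


def semijoin(r, s, attr):
    """R semijoin S on attr: tuples of R that have a matching tuple in S."""
    s_values = {row[attr] for row in s}
    return [row for row in r if row[attr] in s_values]


# Horizontal fragments of the parent, one per branch.
staff_fragments = {
    branch: [row for row in staff if row["branchNo"] == branch]
    for branch in ("B003", "B005", "B007")
}

# Derived fragments of the child: Ri = R semijoin Si on staffNo.
property_fragments = {
    branch: semijoin(property_for_rent, fragment, "staffNo")
    for branch, fragment in staff_fragments.items()
}
```

Each property fragment holds exactly the properties managed by staff in the matching staff fragment, so a join of the two can be evaluated at one site.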
Data Allocation

Four alternative strategies regarding placement of data:
o Centralized,
o Partitioned (or Fragmented),
o Complete Replication,
o Selective Replication.
Centralized
Consists of single database and DBMS stored at one site with users distributed across the network.
Partitioned
Database partitioned into disjoint fragments, each fragment assigned to one site.
Complete Replication
Consists of maintaining complete copy of database at each site.
Selective Replication
Combination of partitioning, replication, and centralization.
Data Replication
Fully replicated database:
o Stores multiple copies of each database fragment at multiple sites
o Can be impractical due to amount of overhead
Partially replicated database:
o Stores multiple copies of some database fragments at multiple sites
o Most DDBMSs are able to handle the partially replicated database well
Unreplicated database:
o Stores each database fragment at a single site
o No duplicate database fragments
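
The three replication scenarios can be told apart by counting how many sites hold each fragment. The sketch below assumes an allocation map from fragment name to the set of sites storing a copy. The function name classify_replication, the fragment names, and the site names are hypothetical.

```python
def classify_replication(allocation):
    """Classify an allocation map {fragment: set of sites holding a copy}.

    Assumes every fragment is stored at one site at least.
    """
    if not allocation or any(len(sites) == 0 for sites in allocation.values()):
        raise ValueError("every fragment must be stored at one or more sites")
    copies = [len(sites) for sites in allocation.values()]
    if all(count == 1 for count in copies):
        return "unreplicated"          # each fragment at a single site
    if all(count > 1 for count in copies):
        return "fully replicated"      # every fragment has multiple copies
    return "partially replicated"      # only some fragments have multiple copies


# Hypothetical allocations for three fragments over three sites.
unreplicated = {"S1": {"siteA"}, "S21": {"siteB"}, "S22": {"siteC"}}
partial = {"S1": {"siteA", "siteB"}, "S21": {"siteB"}, "S22": {"siteC"}}
full = {
    "S1": {"siteA", "siteB", "siteC"},
    "S21": {"siteA", "siteB", "siteC"},
    "S22": {"siteA", "siteB", "siteC"},
}
```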
WEEK # 13 - DDBMS Architecture, Dimensions, Architectural Alternatives
DAY # 38
C. J. Date’s 12 Rules for a DDBMS

Fundamental principle: to the user, a distributed system should look exactly like a non-distributed system.
1. Local Autonomy
2. No Reliance on a Central Site
3. Continuous Operation
4. Location Independence
5. Fragmentation Independence
6. Replication Independence
7. Distributed Query Processing
8. Distributed Transaction Processing
9. Hardware Independence
10. Operating System Independence
11. Network Independence
12. Database Independence

Last four rules are ideals.
Briefly Define the Distributed Query Processing Methodology Using a Diagram.
[Figure: Distributed Query Processing Methodology]
Differentiate between Homogeneous and Heterogeneous DDBMS.
Homogeneous DDBMS

All sites use same DBMS product.

Much easier to design and manage.

Approach provides incremental growth and allows increased performance.
Heterogeneous DDBMS

Sites may run different DBMS products, with possibly different underlying data models.

Occurs when sites have implemented their own databases and integration is considered later.

Translations required to allow for:
 Different hardware.
 Different DBMS products.
 Different hardware and different DBMS products.

Typical solution is to use gateways.
What do you understand by Transaction Transparency and Concurrency Transparency in a Distributed Database System?
Transaction Transparency

Ensures that all distributed transactions maintain distributed database’s integrity and consistency.

Distributed transaction accesses data stored at more than one location.

Each transaction is divided into a number of sub-transactions, one for each site that has to be
accessed.

DDBMS must ensure the indivisibility of both the global transaction and each of its sub-transactions.
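
The all-or-nothing requirement can be illustrated with a toy model: the global transaction is split into one sub-transaction per site, each sub-transaction works on a private copy, and the copies are installed only if every sub-transaction succeeds. This is a single-process sketch under simplifying assumptions (no failures, no network), not a real commit protocol. The Site class, run_global_transaction, and the account data are all hypothetical.

```python
class Site:
    """A site holding account balances (a stand-in for a local database)."""

    def __init__(self, name, balances):
        self.name = name
        self.balances = dict(balances)


def run_global_transaction(sites, operations):
    """Apply (site_name, account, delta) operations atomically.

    Returns True if the global transaction commits, False if it aborts.
    """
    # Divide the global transaction into sub-transactions, one per site.
    subtransactions = {}
    for site_name, account, delta in operations:
        subtransactions.setdefault(site_name, []).append((account, delta))

    # Run each sub-transaction against a private copy of its site's data.
    staged = {}
    for site_name, ops in subtransactions.items():
        working = dict(sites[site_name].balances)
        for account, delta in ops:
            if account not in working or working[account] + delta < 0:
                return False  # abort: no site has been changed yet
            working[account] += delta
        staged[site_name] = working

    # Every sub-transaction succeeded, so install all results together.
    for site_name, working in staged.items():
        sites[site_name].balances = working
    return True


sites = {
    "siteA": Site("siteA", {"acc1": 100}),
    "siteB": Site("siteB", {"acc2": 50}),
}

# A transfer touching two sites: both sub-transactions succeed.
committed = run_global_transaction(
    sites, [("siteA", "acc1", -30), ("siteB", "acc2", +30)]
)

# An overdraft at siteA aborts the whole transaction, including siteB's part.
aborted = run_global_transaction(
    sites, [("siteB", "acc2", +500), ("siteA", "acc1", -500)]
)
```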
Concurrency Transparency

All transactions must execute independently and be logically consistent with the results obtained if the
transactions were executed one at a time, in some arbitrary serial order.

Same fundamental principles as for centralized DBMS.

DDBMS must ensure both global and local transactions do not interfere with each other.

Similarly, DDBMS must ensure consistency of all sub-transactions of global transaction.
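
The "some arbitrary serial order" condition can be made concrete with two toy transactions on a single data item. The sketch below compares the result of an interleaved schedule with the results of every serial order. It checks result equivalence for one initial value only, which is weaker than a formal serializability test. The transactions and function names are hypothetical.

```python
from itertools import permutations


def add_100(x):
    """T1: add 100 to the data item."""
    return x + 100


def double(x):
    """T2: double the data item."""
    return x * 2


def serial_results(initial, transactions):
    """Final values produced by running the transactions in every serial order."""
    results = set()
    for order in permutations(transactions):
        value = initial
        for transaction in order:
            value = transaction(value)
        results.add(value)
    return results


def lost_update_schedule(initial):
    """Interleaving where both transactions read before either one writes."""
    read_by_t1 = initial
    read_by_t2 = initial
    value = add_100(read_by_t1)  # T1 writes its result
    value = double(read_by_t2)   # T2 overwrites it, so T1's update is lost
    return value


allowed = serial_results(100, [add_100, double])
interleaved = lost_update_schedule(100)
is_consistent_with_serial = interleaved in allowed
```

With an initial value of 100, the serial orders give 400 (T1 then T2) or 300 (T2 then T1), while the interleaved schedule gives 200, so it matches no serial order.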
WEEK # 13 - DDBMS Architecture, Dimensions, Architectural Alternatives
DAY # 39
Lab - Replication in MS SQL Server