File Systems and Databases Assessment

advertisement

Database Management - Assignment 5

Good work. See comment below…

Grade: 10 out of 10 points

Questions

1. Explain the following statement: a transaction is a logical unit of work.

A transaction is generated by events and it is a logical unit of work. That means, to record a transaction, all related events should be completed and no partial states are acceptable to avoid data inconsistency.

For example, when a student checks in a book (book_id = “12345”), that transaction consists of two events: updating the BOOK table by changing the book status

(check_out) “Y” to “N” and deleting a row that corresponds to the same book in the

CHECKOUTBOOK table. Changing the status of book to “Checked in” does not complete the check_in transaction and an incomplete transaction violates data integrity and creates an inconsistent database. That is why a transaction is a logical unit of work or events.

Using SQL, to check in a book can be completed by the following two statements.

UPDATE book

SET check_out = “N”

WH ERE book_id = “12345”;

DELETE FROM checkoutbook

WHERE book_id = “12345”;

You would need to use a COMMIT statement to save both these changes to the database.

2. List and discuss the four transaction properties.

The four transaction properties are atomicity, durability, serializability and isolation.

- Atomicity means that a transaction is indivisible and it cannot be partially completed to avoid inconsistent data.

- Durability means the permanent state of a transaction. For example, when an update occurs the value is changed from an old value to a new value. Durability means the state of a database reaches the state with a new value after completion of a transaction.

- Serializability means the occurrence of concurrent execution of multiple transactions, one after another. This is important in multi-user distributed database systems.

- Isolation means that the same data cannot be accessed and updated by several transactions. In other words, the second transaction can access the same data after the first transaction completed.

In a single-user DBMS, all transactions done by a single user are serial and isolated because one transaction is executed at a time. It ensures the serializability and isolation and it is

necessary to use controls for the atomicity and durability of a transaction only. However, in a multi-user DBMS environment, serializability and isolation are important in addition to atomicity and durability in order to meet data integrity and consistency since several concurrent transactions are executed the same data.

For instance, in a Athena stand-alone library system, although circulation transactions are done one by one, in the Accent centralized library system which is being used in all LAUSD district schools, these four transaction properties are important because multiple users access the same database at the same time.

3. What is concurrency control, and what is its objective?

Concurrency control is the management of simultaneous executions of transactions.

The objective of concurrency control is to make sure that the simultaneous transactions are done one after another in the multi-user database environment. Without having this, because multiple users access the same data at the same time over a distributed database system, there may be problems of losing data updates, loss of data integrity and data inconsistency.

4. What three levels of backup may be used in database recovery management? Briefly, describe what each of those three backup levels does.

When any failure happens because of the software, hardware or external factors, the data from the backup of the database is what you can use to recover the previous data.

Therefore, to backup the database or to schedule automatic database backup is important for data recovery. In database recovery management, there are three levels of backups:

1. Full backup level

Full backup gives an exact copy of the entire database.

2. Differential backup level

Differential backup copies a part of the database that contains the updates completed after the latest backup copy.

3. Transaction log backup level

This backup copies the transactions that are recorded in the transaction log between after the previous backup and just before the failure.

5. List three components of a DDBMS, and list three advantages and three disadvantages of a DDBMS?

The three components of a DDBMS are:

1. Computer workstations – Computers are needed to form the network system to distribute the database.

2. The network components – Networking software and hardware components are needed in each workstation to interact with each other.

3. The data processors – The data processors on each computer send and retrieve the data locally.

The three advantages of a DDBMS are: faster data access, faster data processing and a User-friendly interface. In DDBMS, several workstations are added to the network system and data is locally stored and accessed at different sites. Therefore, the database

system delivers faster data access and faster data processing. In addition, end users are more familiar with the regular pcs and work stations rather than the main frame computers.

This is why DDBMS gives the user-friendly and easy interface to users.

The three disadvantages of a DDBMS are:-

1. Complexity of management and control – In DDBMS, transaction management, concurrency control, data security, data recovery play important roles to make sure the database is consistent. That is why data management and control is more complex in

DDBMS than that in traditional systems.

2. DDBMS increases the storage needs —Because the data is stored in different workstations, additional disk storage is needed.

3. DDBMS increases the training costs —Users require more training for the complex data management in a distributed system and it increases the cost.

6. Explain the need for the two-phase commit protocol. Then describe the two phases.

Although the centralized database system requires only one data processor (DP), in the distributed database multiple-site data environment one DP is needed for each site because multiple processes are done by multiple sites. In this case, it is important that each transaction operation is committed by each local DP. Each DP maintains its own transaction log. When one of the DPs can not commit the transactions while each transaction is committed, that results in a inconsistent database. Because the two-phase commit ensures that all nodes commit their part of the transaction, it is required to solve the data inconsistency in the distributed multi-site data environment.

Two-phase commit protocol is implemented in two phases: the Preparation Phase and the Final Commit Phase.

- In the Preparation Phase, first, the coordinator ensures all subordinates are prepared to commit. The subordinate replies to the “YES/NO” message to the coordinator by writing to the transaction log using the write-ahead protocol. If all subordinates are ready to commit, the transaction continues to phase 2, otherwise, the coordinator aborts the transaction.

- In th e Final Commit Phase, the coordinator sends a “COMMIT” message and waits for a reply from the subordinators. The subordinates use the “DO” protocol to update the database. The coordinator cancels all changes by using the “UNDO” protocol if at least one o f the subordinates replies “NOT COMMITTED”.

7. Describe the three data fragmentation strategies. Give some examples.

1. Horizontal fragmentation –

In the horizontal fragmentation strategy, a table is divided into groups of rows logically and each fragment (group/sunset) contains unique rows and it is stored at a different node. All rows have the same attributes and the SELECT statement produces the contents of the fragments.

2. Vertical fragmentation –

In the vertical fragmentation strategy, the table is divided into logical groups of attributes (columns). Each fragment contains unique columns and is stored at a different location. The content in the fragment is obtained by using the “PROJECT” statement.

3. Mixed fragmentation –

This strategy is the combination of horizontal and vertical strategies. In other words, each row fragment may be a combination of groups of attributes.

For example, suppose that the following is the student table that is centralized in the school district.

Student Table

S_NO S_FNAME S_LNAME S_DOB LOC_CODE LOCATION

101 Clay Gediman 100978 8571 Canoga Park

STATUS OD_AMT

G $0.00

102 Melissa Rentchler 020477 8556 El Camino D $0.50

103 Mimi

104 David

105 Aung

Rangtha 090580

Gillham 100978

Min 020477

8571 Canoga Park G

8522 Taft G

8556 El Camino G

$0.00

$0.00

$0.00

106 Diane Anderson 090580 8522 Taft D $4.50

- Suppose that the district is interested to organize the student by location code, in the horizontal fragmentation strategy, the student table is divided into 3 fragments as shown below.

FRAG_NAME LOCATION CONDITION NODE_NAME S_NO

NO. OF

ROWS

STU_H1

STU_H2

Canoga Park LOC_CODE="8571" CP

El Camino LOC_CODE="8556" EC

101,103

102,105

2

2

2 STU_H3 Taft LOC_CODE="8522" TF 104,106

- Suppose that the district has two departments: library services and student information. In the vertical fragmentation of the student table, the table is divided into two fragments according to the attributes as below.

FRAG_NAME LOCATION NODE_NAME ATTRIBUTE NAMES

S_NO, S_FNAME, S_LNAME, S_DOB,

STU_V1 Library Service

Student

Information

LIBSER LOC_CODE

-

STU_V2 SINFO S_NO, STATUS, OD_AMT

In the mixed fragmentation, the fragmentations of the student table will be as below.

FRAG_NAME LOCATION

HORIZONTAL

CRITERIA

NODE

NAME

STU_M1

ROWS ATTRIBUTE NAMES

S_NO, S_FNAME, S_LNAME,

101,103 S_DOB, LOC_CODE

STU_M2

Library Service LOC_CODE="8571" CP_S

Student

Information LOC_CODE="8571" CP_L

STU_M3

101,103 S_NO, STATUS, OD_AMT

S_NO, S_FNAME, S_LNAME,

102,105 S_DOB, LOC_CODE

STU_M4

Library Service LOC_CODE="8556" EC_S

Student

Information LOC_CODE="8556" EC_L 102,105 S_NO, STATUS, OD_AMT

S_NO, S_FNAME, S_LNAME,

STU_M5

STU_M6

Library Service LOC_CODE="8522" TF_S

Student

Information LOC_CODE="8522" TF_L

104,106 S_DOB, LOC_CODE

104,106 S_NO, STATUS, OD_AMT

8. What is data replication, and what are the three replication strategies?

Data replication means that copies of data fragments are stored at multiple sites to enhance data availability and response time. If the database is replicated, it is required to

update all copies of the database at all sites for a ‘WRITE’ operation to maintain data consistency.

There are three replication strategies: fully replicated, partially replicated and unreplicated. The key factors to decide to use data replication are size of the database, usage frequency and costs.

(1). Fully replicated strategy

In this strategy, each database fragment is copied and stored at multiple sites. But fully replication increases the overhead cost and it is not practical.

(2). Partially replicated strategy

In this strategy, some database fragments are copied and stored at multiple sites. Partially replication is used by most DDBMSs.

(3). Unreplicated strategy

This strategy does not copy any fragment of the database and just stores the fragment at a single site.

9. You've been hired to integrate all of the libraries in the California State University system.

Describe a high-level design you might use to set this up as a distributed database.

I might use heterogeneous distributed database design to integrate all of the libraries in the California State University system because first of all, I believe all libraries are using different platforms, different DBMSs and different operating systems. Only

Heterogeneous distributed database design can handle different DBMS that are running under different computer systems. Using heterogeneous distributed database design is more cost effective than using a homogeneous system since all libraries have started different systems under different platforms. In addition, using distributed database systems enhance data availability and response time as opposed to using the centralized database system.

10. C.J Date's Twelve Commandments for Distributed Databases are listed on page 514.

Choose what you think are the three most important commandments and the three least likely to implemented commandments as they relate to library science. Explain your rationale.

Deciding on the most important commandments and the ones least likely to be implemented depends on the distributed database system used.

Suppose that all of the libraries in the California State University system are integrated under the distributed database system to minimize the costs in the long run. For this system, I think the three most important commandments are Hardware independence,

Operating system independence and Database independence; this is because libraries are using different hardware, operating systems and databases according to each library’s needs. To be able to work together in libraries, the distributed databases with independent hardware, operating system and database are highly required.

The three least likely to be implemented commandments are Failure independence,

Location transparency and Distributed query processing because every distributed system already satisfies these three capabilities. The database system is able to continue when a node failure in the network and the users do not need to know where the data came from.

Finally, the transaction processor is one of the components of the distributed database system and it does the distributed query processing. This is why these three commandments are basic features of a distributed database and they are least likely to be implemented for the desired distributed database library system.

Download