CSCI315 Database Design and Implementation

advertisement

SCSSE

School of Information Technology and Computer Science

CSCI315

Database Design and Implementation

This paper is for students studying at:

Wollongong Batemans Bay Bega

Moss Vale Shoalhaven Sydney

Family Name

First Name

...........................................................

...........................................................

Student Number ...........................................................

Table Number ...........................................................

Loftus

X Singapore (SIM)

SESSION 2 2008 EXAMINATION

Time Allowed: 3 hours 15 minutes

Number of Questions: 7 questions

Assessment: This exam is worth 70 % of the total marks for the subject ( 70 marks)

DIRECTIONS TO CANDIDATES

1.

Please attempt all questions

2.

All answers must be written in the exam booklet provided and must be written neat and legible; otherwise the examination will not be marked.

No Examination Aids or Materials Allowed

THIS EXAMINATION PAPER MUST NOT BE REMOVED FROM THE EXAMINATION ROOM

Question 1 (10 marks)

We would like to design a conceptual schema of a database that contains information about a network of hospitals. Read the following description of a sample database domain.

A description of a hospital consists of a unique hospital name, address of a main building including city, street, and building number, phone and fax numbers, and email address. The phone number, fax number, and email address can also be used for the identification purposes. A hospital consists of a number of buildings. A building number is a local identifier of a building.

Some of the buildings have the names, which are unique within a hospital.

A hospital consists of wards and wards consist of decks. Each ward has a unique name and is located in one building. A ward may occupy several levels of a building. A deck is identified by a number unique within a ward and it is located at one level of a building. The database should contain the complete information about the locations of buildings and decks.

A hospital employs the doctors, nurses, administration and maintenance staff members. The employees are described by an employee number unique within a hospital, first name, last name, date of birth and salary. The doctors are additional described by a status and specialization. The nurses are also described by a category. Administration and maintenance staff members are described by a position occupied. The doctors are assigned to the wards and the nurses are assigned to the decks. Additionally, a hospital contracts a number of consultants. The consultants are not assigned to the wards and are employed by a hospital for a limited time only.

The consultants are described by first and last name, specialization, phone number, and amount of charge per hour, start and end of contract with a hospital. The consultants can be employed by more that one hospital at a time.

A hospital admits the patients for the treatments. A patient is described by medical insurance company name and insurance policy number. Date and time are recorded on admittance to a hospital. A patient may be admitted to and discharged from a hospital many times. On arrival to a hospital a patient passes through entry physical examinations, which consists of a number of tests. The database should contain information about the names of tests applied and the results of the tests conducted just after arrival to a hospital. When, an illness a patient suffers from is identified he/she is located at a ward and at a deck of a ward.

During his/her stay in a hospital a patient passes through a number of physical examinations and through a number of treatments. A physical examination consists of a number of uniquely named tests. The results of the tests are recorded in the database together with date and time when each test has been conducted and all medical staff involved in each test. A treatment is described by a name, date/time when it occurred and all medical staff (doctors and nurses) involved. Some of the treatments are the operations conducted on a patient. An additional description of an operation includes a unique operation name and time spent on operation.

Design a conceptual schema for the database domain described above. Use a notation of simplified UML object class diagrams explained during the lecture classes and applied for conceptual modelling in the assignments and class test. Remember to determine the multiplicities of associations, identifiers of classes of objects, multivalued, optional and derived attributes (if any).

Question 2 (8 marks)

Consider the following conceptual schema:

Service-to departure-location 1..*

AIRPORT name ID address capacity

1..* Services

1..*

1..*

CITY country ID city-name ID population

The objective of this task is to extend the schema in the ways listed below.

(1) We would like to record information about the airplane connections between the airports. A connection is described by flight number

(identifier), airplane-type, departure time, arrival time, and airline servicing a connection.

(2) We would like to record information about the airplanes used to service the connections. An airplane is identified by a unique number and it is described by a type of airplane (e.g. 747-400), total number of seats in the first, business, and economy classes.

(3) We would like to record information about the instances of flights between the airports. The instances of flights implement the connections between the airports. The airplanes are used for the instances of flights. A flight instance is described by date when it occurred, and total number of passengers on the flight.

(4) We would like to record information about crew members servicing the instances of flights. A crew consists of two pilots, a navigator and a number of passenger service people. Each crew member is described by an employee number, which is a local identifier for the same airlines, first name and last name.

Extend the conceptual schema given above such that it correctly represents the additional information listed above. Redraw the conceptual schema after all modifications.

1..*

Question 3 (8 marks)

Consider the following conceptual schema.

EMPLOYEE e# ID first-name last-name dob

Works-on

1..* 1..*

PROJECT title deadline description

ID

Consists-of stage#

0..1

STAGE description time-required

SOFTWARE-MODULE completion-date

Consists-of

1..*

Implements

1..* s

PROGRAM-UNIT path ID1 file-name ID1 completion-date ID2

(1) (2 marks)

Redraw the schema to a canonical form, i.e. replace many-to-many associations, multivalued attributes, qualified associations, association classes, and link attributes with the equivalent constructions.

(2) (3 marks)

Translate the schema obtained from a step above into a collection of relational schemas.

For each relational schema determine a primary key, candidate keys (if any), foreign key

(if any), referential integrity constraints (if any), optional and mandatory attributes (if any).

(3) (3 marks)

Consider the database applications consistent with the following template.

Find the first and the last name ( first-name , last-name ) of all employees who implemented at least one program unit in a given year ( completion-date )

A sample application consistent with the template above is as follows.

Find the first and the last name ( first-name , last-name ) of all employees who implemented at least one program unit in 2007 ( TO_CHAR(completiondate,'YYYY')='2007' )

We would like to improve the performance of all applications consistent with the template given above through the denormalization of a conceptual schema in canonical form obtained from task (1). Propose the denormalization of the schema and redraw entire schema after the denormalization.

Question 4 (8 marks)

A script q4create.sql

listed below creates a small relational database that contains information about parts, orders involving parts, and lines from the orders.

CREATE TABLE Part(

PNumber

PName

PPrice

PRating

NUMBER(10)

VARCHAR(30)

NUMBER(7,2)

VARCHAR(10)

/* Part description */

NOT NULL, /* Number */

NOT NULL, /* Name

NOT NULL, /* Price

NULL, /* Rating

CONSTRAINT Part_pkey PRIMARY KEY( PNumber ) );

*/

*/

*/

CREATE TABLE Orders(

ONumber

ODate

OCustomer

NUMBER(10)

DATE

/* Order description

NOT NULL, /* Number

NOT NULL, /* Date when issued

VARCHAR(255) NOT NULL, /* Customer involved

CONSTRAINT Orders_pkey PRIMARY KEY( ONumber ) );

*/

*/

*/

*/

CREATE TABLE OrderLine(

LOrder

LLine

LPart

NUMBER(10)

NUMBER(3)

NUMBER(10)

/* Ordered parts

NOT NULL, /* Order number

NOT NULL, /* Line number

NOT NULL, /* Part number

LQuantity NUMBER(6) NOT NULL, /* Quantity

CONSTRAINT OrderLine_pkey PRIMARY KEY( LOrder, LLine ),

CONSTRAINT OrderLine_fkey1 FOREIGN KEY( LPart )

REFERENCES Part( PNumber ) );

*/

*/

*/

*/

*/

Modify the script listed above in the following ways:

(i) The script should create three locally managed tablespaces with the manual allocation of extents and with the size of each extent equal to 64 Kbytes. The size of a tablespace for a relational table Part should be 3 Mbytes, the size of a tablespace for the relational tables Orders and OrderLine should be 5 Mbytes, the size of the tablespace for the indexes on the attributes Parts(PNumber) and

Parts(Pname) should be 2 Mbytes. It should NOT be possible to automatically extend any of the tablespaces created and each tablespace should consist of only one file. All other parameters of the tablespaces are up to you.

(ii) The script should create a new user account that will own the database.

(iii) The script should grant the roles RESOURCE and CONNECT to the new user and revoke UNLIMITED TABLESPACE privilege from the account.

(iv) The script should grant to the user access to all disk space available in the tablespaces created in the previous step.

(v) The script should connect as the new user, create the tables and indexes in the appropriate tablespaces.

Question 5 (8 marks)

Road and Traffic Authority would like to record information about the accidents that happen from time to time our the roads. The database should contain information about all accidents that have happened in a period of the last 10 years. The total number of different cars is equal to 10 6 . Statistical data show that a car is involved in 0.1

of an accident per year.

An identifier of an accident ( aid ) consists of 10 bytes, an average length of a value of attribute location is equal to 100 bytes, implementation of a list of participants

( participants ) needs 200 bytes, description of an accident takes ( description ) needs 1000 bytes, and finally type of an accident ( recorded ) requires 10 bytes.

A database designer created a relational table:

ACCIDENT(aid, location, participants, description, type) to keep information about the accidents.

Consider a database system where: size of database disk block is equal to 8K , size of block header is equal to 100 bytes, we ignore table headers, size of an entry in a row directory is equal to

20% of each data block should be left empty.

4 bytes

Estimate how much storage is required to keep in the database information about all accidents that have happened in the last 10 yearss. Use a model of data block explained to you during the lecture classes in this subject (presentation 22. Relational table size ). Show all calculations.

Question 6 (8 marks)

The scenarios listed below are related to the following relational tables.

EMPLOYEE(e#, fname, lname, dept, salary, evaluation) primary key = (e#) foreign key = (dept) references DEPARTMENT(name)

DEPARTMENT(name, budget, manager, objectives, category)

Primary key = (name)

For each one of the scenarios given below propose a solution that eliminates the problems listed after a scenario and justify your proposal.

Scenario 1 (2 marks)

Your system has 2 Kbytes data block size. The rows in a relational table EMPLOYEE are longer than 2 Kbytes due to long values of evaluation attribute. There is a hash based index on attribute e# .

Problem

A relational table EMPLOYEE suffers from overflow chaining when new rows are added.

Scenario 2 (2 marks)

Consider a relational table DEPARTMENT . Suppose that there are 30 DEPARTMENT rows per one data block. There are 3 categories of departments.

Problem

Should you put a nonclustering B*-tree index on an attribute category to improve the performance of multipoint queries like:

SELECT *

FROM DEPARTMENT

WHERE category = …;

Scenario 3 (2 marks)

Consider a relational table DEPARTMENT . Suppose that there are 30 DEPARTMENT rows per one data block. Each department has a different manager.

Problem

Should you put a nonclustering B*-tree index on an attribute category to improve the performance of multipoint queries like:

SELECT *

FROM DEPARTMENT

WHERE manager = …;

Scenario 4 (2 marks)

Auditors take a copy of EMPLOYEE table to which they wish to apply a statistical analysis. They want to execute the following queries:

(i) Count all employees that have certain salary.

(ii) Find all employees that have maximum (minimum) salary within a particular department.

(iii) Find all employees with a given value of attribute e# .

Problem

Propose the best indexing schema to improve the performance of statistical analysis.

Question 7 (10 marks)

Consider a relational table EMPLOYEE(e#, name, salary, position) where an attribute e# is a primary key.

Assume that:

(i) a relational table EMPLOYEE occupies 10 2 data blocks,

(ii) a relational table EMPLOYEE contains 10 3 rows,

(iii) an attribute name has 800 distinct values,

(iv) an attribute salary has 20 distinct values,

(v) an attribute position has 50 distinct values,

(vi) a primary key is automatically indexed,

(vii) the attributes salary and position are indexed,

(viii) all indexes are implemented as B*-trees with a fanout equal to 10 ,

(ix) a leaf level of an index on attribute salary consists of 5 data blocks,

(x) a leaf level of an index on attribute position consists of 20 data blocks.

Find the total number of read block operations needed to compute the following queries:

(1) (2 marks)

SELECT DISTINCT salary

FROM EMPLOYEE;

(2) (2 marks)

SELECT *

FROM EMPLOYEE

WHERE name = 'James';

(3) (2 marks)

SELECT *

FROM EMPLOYEE

WHERE e# = 007 AND position = 'boss';

(4) (2 marks)

SELECT *

FROM EMPLOYEE

WHERE position = 'boss';

(5) (2 marks)

SELECT *

FROM EMPLOYEE

WHERE position = 'boss ' AND salary = 1000;

Show all calculations.

Download