SOLUTIONS - CIS209 - INTERNAL - 2003

PROBLEM 1 [25]

Question 1 [6]
A database is defined as “a shared collection of logically related persistent data as part of the information system of an organisation”. Explain in brief the meaning of “shared”, “logically related” and “persistent” in this definition.

Answer
“shared” means that different parts of an organisation (represented either by direct users or by applications) put their data together in one repository, the database, which is then shared by all of them; [2]
“logically related” means that the different items of data stored in a database are not independent of each other; different links/relationships exist between them; [2]
“persistent” means that, once stored, data does not disappear unless users instruct the DBMS to remove it; data persists whether or not the application programs that access it are running, and it persists even when the DBMS goes down; [2]
Award 2 marks per explanation if the main idea was captured by the answer (i.e., even if the answer does not provide the above level of detail).
TOTAL [6]

Question 2 [6]
Define the notion of “candidate key”. Can a relation have more than one candidate key? Give an example. Define the notion of “primary key” in terms of the “candidate key”.

Answer
A candidate key of a relation R is a (sub)set of attributes CK of R with the following properties:
a) no distinct tuples in R can have the same value for CK (uniqueness); and [1]
b) no proper subset of CK has the uniqueness property (irreducibility). [1]
The alternative definition, “A candidate key of R is a subset of attributes which can be used to uniquely identify each tuple in R”, should be rewarded 1 mark only.
A relation can have more than one candidate key.
[1]
Example: R(StudentNo, FirstName, LastName, DOB, Address, Programme)
CK: StudentNo and CK: (FirstName, LastName, DOB) [2]
(award 1 mark for each key correctly identified, but only if the relation does indeed have two CKs)
A primary key is a candidate key arbitrarily chosen by the developer of the database system. [1]
(Note: in fact, the PK is used by many DBMSs in joins, and thus the choice is not completely arbitrary, since the matter of efficiency arises; however, within the context of the relational model no such efficiency arguments arise, thus the choice is arbitrary.)
TOTAL [6]

CIS209 – IS52003A 2003 Internal Solutions 1

Question 3 [3]
Explain in brief what is meant by physical program-data independence (or simply program-data independence) in the context of database systems.

Answer
The internal/physical and the logical levels of a database system are separated from each other [and are linked through a schema mapping]. [1]
Application programs access the data of a database at the logical level [or the external]. [1]
Physical program-data independence is the immunity of application programs to changes at the internal/physical level (assuming that the conceptual level does not change). [1]
Award full marks if the general idea is conveyed, but less rigorously.
TOTAL [3]

Question 4 [3]
Enumerate three benefits of the database approach to data management (as opposed to a bare file-based approach).

Answer
- reduced redundancy;
- fewer possibilities for storing inconsistent data;
- easier integrity maintenance;
- shared data;
- program-data independence;
- better maintenance of the data/information of the organisation (and of the overall information system);
- data security can be more easily enforced;
- recovery mechanisms exist;
- standards can be enforced;
- conflicting requirements (for the overall information system) can be balanced.
Award 1 mark per advantage/benefit correctly stated, but not more than 3 marks.
TOTAL [3]

Question 5 [4]
Data can be stored in files, and application programs can share this data by having direct access to the respective files (refer to Diagram 1, below). However, data-centred applications normally employ a database management system (refer to Diagram 2, below).

(Diagram 1: Program/Application 1 ... Program/Application n, working directly with data files 1 ... data files n.)
(Diagram 2: Program/Application 1 ... Program/Application n, reaching the data through a DBMS.)

How does the redundancy of data compare in the two approaches?

Answer
In the first approach (file based) data is not integrated, and thus the probability of ending up with a lot of redundancy in the overall information system is quite high (different departments/parts of the system sometimes share big chunks of data). [2]
In the database approach data is integrated and shared. [Provided the design is good] the redundancy of data can be greatly reduced. [In fact, redundancy can be almost completely eliminated, provided there are no other reasons to maintain it (e.g., the efficiency of queries that would otherwise employ joins).] [2]
The text in square brackets is not a necessary part of the answer.
TOTAL [4]

Question 6 [3]
A union, intersection or difference can only be performed between two relations if they are type compatible. What is meant by type compatibility? Give an example of two type compatible and two non type compatible relations (only the headings are required).

Answer
Two relations are type compatible if [and only if] they have the same headings. [1]
Alternatively, two relations are type compatible if [and only if] they have the same set of attributes.
The following two relations (having the same headings) are type compatible
Men {<name : varchar>, <dob : date>, <address : varchar>}
Women {<name : varchar>, <dob : date>, <address : varchar>} [1]
whereas the following two relations are not type compatible
Students {<name : varchar>, <address : varchar>, <programme : varchar>}
Tutors {<name : varchar>, <address : varchar>, <main topic : char(6)>} [1]
Award 1 mark for any correct respective example. If the examples are correct but consist only of attribute names (and not of <attribute-name : attribute type> pairs), then award only 1 mark.
TOTAL [3]

PROBLEM 2 [25]

Question 1 [19]
Draw an ER diagram for the following description. The diagram should illustrate the entity types, including their attributes, the relationships between them and the multiplicity of each relationship (note that the textual description does not specify the multiplicity of all the relationships; you will have to state it yourself, according to your understanding of the problem). Work according to the following conventions:
- many-to-many relationships do not have to be transformed into one-to-many relationships;
- attributes could be composite and/or multi-valued;
- relationships may have attributes.

An airline company intends to develop a database system for storing information about flights and their passengers. The information about flights is of two types: general flight information (such as the BA123 flight from London to Paris, departing at 12:00, available daily) and specific flight information (such as the BA123 flight on 02/04/2003 whose captain is John Smith). Note that a passenger may book a place on a specific flight but cannot book a general flight.
The general flight information should consist of flight number, destination airport, starting-point airport, intermediate stops (a list of airports), departure time and arrival time.
The specific flight information should consist of date of flight, captain, delay at departure and delay at arrival.
The airline company has a fleet of aircraft. The required information on aircraft consists of model/type (e.g., Airbus 540), seat capacity, normal flying altitude, flight autonomy (how long it can fly without refuelling), and an internal aircraft identifier used in case the company has more than one aircraft of the same model/type. A specific flight is always assigned one aircraft.
Passengers book (specific) flights. The information regarding a booking that is to be stored in the database consists of ticket number and payment details. It would be useful to also have recorded in the database the time when the booking was made and the name of the staff member who performed the operation.
The information required for each passenger consists of name, contact details (these are made of house number, street, city, postcode, country and telephone) and whether the person is smoking or non-smoking.

Answer
(ER diagram, given here as text)
Entity types and attributes:
- Passenger: name, contactDetails (houseNo, street, city, postcode, country, telephone [1..2]), smokingNonSmoking
- GenericFlight: flightNo, destination, startingPoint, intermediateStops, departureTime, arrivalTime
- SpecificFlight: date, captain, delayAtDeparture, delayAtArrival
- Aircraft: id, type, seatCapacity, altitude, autonomy
Relationships and multiplicities:
- Books: Passenger [1..*] to SpecificFlight [1..*]; attributes ticketNo, paymentDetails, date, staff
- Has: GenericFlight [1] to SpecificFlight [0..*]
- IsAssignedTo: SpecificFlight [0..*] to Aircraft [1]

Award
- 4 marks for the correct identification of entities (1 per entity) (the names of the entities may be different, provided they “preserve” the same meaning as above);
- 4 marks for the correct identification of attributes (1 per entity) (the names of the attributes may be different, provided they “preserve” the same meaning as above; the marking should also accommodate “small” variations, e.g. if one attribute is missing, but the others are correctly identified, full marks should be awarded);
- 3 marks for the correct identification of relationships (1 per relationship) (the names and direction of the relationships may be different, provided they “preserve” the same meaning as above);
- 6 marks for the correct identification of the multiplicity of relationships (2 per relationship, one at each end);
- 2 marks for the correct identification of the attributes of the Books relationship; still award 2 marks even if ‘date’ and ‘staff’ are not included.

Possible correct variations:
- contact details considered as a separate entity;
- telephone may be considered as a single-valued attribute;
- relationship Books replaced with an entity Booking with the same attributes as the relationship; there will be two one-to-many relationships between Booking, on one side, and Passenger and SpecificFlight, on the other;
- the multiplicity of IsAssignedTo on the SpecificFlight side may be [1..*];
- the multiplicity of Has on the SpecificFlight side may be [1..*];
- Aircraft may be divided into two entities: a generic aircraft entity (detailing the type/model) and a specific aircraft entity (containing the id of each particular aircraft and other specific details); alternatively, the ‘id’ attribute as it is now in Aircraft could be regarded as multi-valued.
TOTAL [19]

Question 2 [6]
Consider the ER structure depicted in Figure 1 below. Find an application (e.g. library, hospital, software development company, university, etc.) for which this structure could be used to model a part of its information system and illustrate this model, i.e., find meaningful names for the entity types (E1 and E2), for the attributes of each entity (a1, ..., a5, b1, ..., b4), for the relationship R and for its attribute c.
(Figure 1: entity type E1, with attributes a1, a2, a3, a4, a5, and entity type E2, with attributes b1, b2, b3, b4, are linked by relationship R, which has attribute c; the multiplicities at the two ends of R are [0..*] and [1..*].)

Answer
Sample solution:
E1 : Student(studNo, fName, lName, address, dOB)
E2 : Course(code, title, level, shortSyllabus)
R : Takes (each student must take at least one course, but there may be (new optional) courses that are not taken by anyone (yet))
c : result
Award:
- 1 mark for a correct illustration of E1, a1 ... a5;
- 1 mark for a correct illustration of E2, b1 ... b4;
- 2 marks for a correct illustration of R;
- 2 marks for a correct illustration of c.
TOTAL [6]

PROBLEM 3 [25]

Question 1 [4]
Consider the following relation.
(patient_id, patient_name, p_DOB, disease, doctor, speciality, diagnosis, treatment)
Consider also the following assumptions:
- a doctor gives a unique diagnosis for the disease of one patient; however, a doctor may give different diagnoses for the same disease (for different patients);
- each diagnosis has a unique associated treatment;
- a doctor has a unique speciality;
- a patient has a unique patient_id.
(a) In each of the expressions below, substitute the question marks with sets of attributes from the above relation to obtain expressions representing functional dependencies. [3]
patient_id → ?
? → diagnosis
? → treatment
(b) Choose a primary key for this relation. [1]

Answer
a) patient_id → patient_name (or p_DOB) [1]
   patient_id, disease, doctor → diagnosis [1]
   diagnosis → treatment [1]
b) PK : (patient_id, disease, doctor) [1]
TOTAL [4]

Question 2 [21]
Consider the following relation.
(project, task, max_budget, duration, payment_rate, contractor, contracted_time)
and the following functional dependencies:
(project, task) → max_budget, duration
  // there is a unique max_budget and period of work (duration) per project task
(task, max_budget, duration) → payment_rate
  // the contracting payment rate is unique given a certain task, max_budget and duration
(project, task, contractor) → contracted_time
  // contractors are employed on project tasks
Assume they completely express all the functional dependencies existing in the given relation (i.e., the others are either trivial or can be deduced from the given ones).
a) State the primary key for this relation (there is a unique candidate key). [2]
b) State the definition of Boyce-Codd Normal Form (BCNF). [2]
c) State a reason why this relation is not in BCNF. [2]
d) State Heath’s theorem (for non-loss decomposition). [3]
e) Decompose/transform (non-loss) the given relation into a set of relations in BCNF. Explain how you apply Heath’s theorem for each decomposition you make. State the end result clearly. Also, state the candidate keys for each resulting BCNF relation. [12]
Note that the order in which you employ the above functional dependencies in normalisation is important: some orders may lead to the loss of certain dependencies. You are advised to start with the second functional dependency.

Answer
a) primary key (PK) : (project, task, contractor) [2]
b) A relation is in BCNF if and only if each of its non-trivial (left-)irreducible functional dependencies has a candidate key as its determinant.
The “softer” (incorrect) version, whereby “non-trivial” and “irreducible” are not mentioned, should also be accepted (awarded full marks), provided the following question is also correctly answered. Otherwise award only 1 mark. [2]
c) The first two functional dependencies (FDs) do not have a candidate key as their determinant; therefore they cause the relation not to be in BCNF.
If either is mentioned as a cause, award full marks. [2]
d) Let R be a relation and A, B and C subsets of attributes of R satisfying the condition “heading(R) = A ∪ B ∪ C”. If R satisfies the functional dependency “A → B” then R is equal to the join of its projections on (A, B) and (A, C). (Alternatively: “If R satisfies the functional dependency A → B then R can be non-loss decomposed into (A, B) and (A, C)”.) [3]
e) (1) Heath’s theorem for R (the initial relation), based on ‘task, max_budget, duration → payment_rate’, leads to:
R1 (task, max_budget, duration, payment_rate)   CK/PK : (task, max_budget, duration)
R2 (project, task, max_budget, duration, contractor, contracted_time)   CK/PK : (project, task, contractor)
R1 is in BCNF.
R2 is not in BCNF, due to ‘project, task → max_budget, duration’.
(2) Heath’s theorem for R2, based on ‘project, task → max_budget, duration’, leads to:
R21 (project, task, max_budget, duration)   CK/PK : (project, task)
R22 (project, task, contractor, contracted_time)   CK/PK : (project, task, contractor)
R21 is in BCNF.
R22 is in BCNF.
All of the initial FDs have been expressed.
Result:
(task, max_budget, duration, payment_rate)
(project, task, max_budget, duration)
(project, task, contractor, contracted_time)
Award 6 marks for step (1) and 6 marks for step (2). Alternatively, award 4 marks for the correct set of normalised relations (refer to “Result”, above; this should include the specification of CKs), since this means that the student had an intuition of the correct answer. The remaining 8 marks should be awarded for a correct normalisation process (application of Heath’s theorem (2 marks) + identification of relations in or not in BCNF (2 marks)). [12]
NOTE: A correct answer to question e) accompanied by incorrect answers to questions b) and/or d) should look suspicious. Although it is possible that the student knows how to apply the definitions without being able to state them, this is improbable and the marker should consider such cases with care.
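
The answers to a), c) and e) can be checked mechanically. The following Python sketch is not part of the marking scheme; it encodes the relation and the three functional dependencies from the question and uses attribute closure to test superkeys. The function names (closure, candidate_keys, bcnf_violations, heath_applies) are illustrative choices, and bcnf_violations inspects only the three listed dependencies, which is enough for this relation but is a simplification in general.

```python
from itertools import combinations

# Heading and functional dependencies of the relation in Problem 3, Question 2.
R = frozenset({"project", "task", "max_budget", "duration",
               "payment_rate", "contractor", "contracted_time"})
FDS = [
    (frozenset({"project", "task"}), frozenset({"max_budget", "duration"})),
    (frozenset({"task", "max_budget", "duration"}), frozenset({"payment_rate"})),
    (frozenset({"project", "task", "contractor"}), frozenset({"contracted_time"})),
]


def closure(attrs, fds):
    """Return every attribute functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return frozenset(result)


def is_superkey(attrs, heading, fds):
    return heading <= closure(attrs, fds)


def candidate_keys(heading, fds):
    """Brute-force search for the irreducible superkeys of heading."""
    keys = []
    for size in range(1, len(heading) + 1):
        for combo in combinations(sorted(heading), size):
            attrs = frozenset(combo)
            # Smaller keys were found first, so a superset of one is reducible.
            if is_superkey(attrs, heading, fds) and not any(k <= attrs for k in keys):
                keys.append(attrs)
    return keys


def bcnf_violations(heading, fds):
    """Listed FDs lying inside heading whose determinant is not a superkey."""
    return [(lhs, rhs) for lhs, rhs in fds
            if (lhs | rhs) <= heading and not is_superkey(lhs, heading, fds)]


def heath_applies(heading, part1, part2, fds):
    """True if the shared attributes determine all of one projection,
    which is the condition under which Heath's theorem gives a non-loss split."""
    determined = closure(part1 & part2, fds)
    return (part1 | part2 == heading
            and (part1 <= determined or part2 <= determined))


# The relations produced in steps (1) and (2) of the answer to e).
R1 = frozenset({"task", "max_budget", "duration", "payment_rate"})
R2 = R - {"payment_rate"}
R21 = frozenset({"project", "task", "max_budget", "duration"})
R22 = frozenset({"project", "task", "contractor", "contracted_time"})

if __name__ == "__main__":
    print("candidate keys of R:", [sorted(k) for k in candidate_keys(R, FDS)])
    print("BCNF violations in R:", len(bcnf_violations(R, FDS)))
    print("step (1) non-loss:", heath_applies(R, R1, R2, FDS))
    print("step (2) non-loss:", heath_applies(R2, R21, R22, FDS))
    for name, rel in (("R1", R1), ("R21", R21), ("R22", R22)):
        print(name, "in BCNF:", not bcnf_violations(rel, FDS))
```

The brute-force key search is exponential in the number of attributes, which is acceptable for a seven-attribute exam relation but would not be a sensible choice for large schemas.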
TOTAL [21]

PROBLEM 4 [25]

Consider the following database schema (some tuples are provided for explanatory purposes; arrows denote foreign keys; foreign keys are in italics; primary keys are in bold and underlined). The arrows and the typographic emphasis are not reproduced in this text version.

Customer
name | dOB | address | occupation

AccountType
name            | category | minBalance | interest
Student Current | current  | -500       | 0.5%
Golden Current  | current  | -2000      | 1%
Golden Savings  | savings  | 5000       | 3.5%

Account
accNo     | type            | balance | owner
C-110-221 | Student Current | -245    | Joe Bloggs
S-009-677 | Golden Savings  | 5500    | Mary Bear

Transaction
accNo     | date       | time  | transType      | valIn | valOut
C-110-221 | 12/04/2003 | 12:30 | cash withdraw  | 0     | 30
C-110-221 | 21/04/2003 | 9:15  | cheque payment | 200   | 0

Question 1 [19]
Referring to the above schema, express the following natural language queries in SQL:

(a) Find the minimum balance and interest for the ‘Golden Savings’ account type. [1]
SELECT minBalance, interest
FROM AccountType
WHERE name = 'Golden Savings';

(b) List the account names and their corresponding interest rates ordered according to the interest rates for all the ‘savings’ account types. [2]
SELECT name, interest
FROM AccountType
WHERE category = 'savings'
ORDER BY interest;

(c) List the date, time, transaction type, value in and value out for all the transactions incurred between ‘1/01/2003’ and ‘1/04/2003’ on Joe Bloggs’s ‘Student Current’ account. [3]
SELECT date, time, transType, valIn, valOut
FROM Transaction T, Account A
WHERE T.accNo = A.accNo
  AND date BETWEEN '01/01/2003' AND '01/04/2003'
  AND owner = 'Joe Bloggs'
  AND type = 'Student Current';
/* note that it makes no difference whether Joe Bloggs has one or more “Student Current” accounts */

(d) List how much money ‘Mary Bear’ has in all her ‘savings’ accounts. [3]
SELECT SUM(balance) AS totalSavings
FROM Account A, AccountType AT
WHERE type = name
  AND owner = 'Mary Bear'
  AND category = 'savings';

(e) List the name, address, occupation and total balance for all the customers whose total balance is negative; for each customer, their “total balance” means the sum of the balances of all their accounts. [3]
SELECT name, address, occupation, SUM(balance) AS totalBalance
FROM Account A, Customer C
WHERE owner = name
GROUP BY name, address, occupation
HAVING totalBalance < 0;
‘address’ and ‘occupation’ are semantically redundant in the GROUP BY clause, but they are syntactically necessary in SQL; however, still award full marks even if the student uses only ‘name’ in the GROUP BY clause (i.e. assume that s/he would be able to “fix” such an omission if s/he were executing the command). Also award full marks if the student does not use a name for the computed field and reuses the SUM expression in the HAVING clause.

(f) List the account number, balance, account name/type and the interest on the respective account, for the account on which ‘Joe Bloggs’ has the highest balance. [3]
SELECT accNo, balance, type, interest   -- 'name' may be used instead of 'type'
FROM Account, AccountType
WHERE type = name
  AND owner = 'Joe Bloggs'
  AND balance IN ( SELECT MAX(balance)
                   FROM Account
                   WHERE owner = 'Joe Bloggs' );
‘=’ could have been used instead of ‘IN’ to introduce the subquery, i.e., award full marks if that solution is proposed. The reason for the solution proposed above is its compliance with the relational model (the result of any select statement should be a relation/set).

(g) List the account category, number of accounts and average balance per category of account for all the customers whose occupation is ‘student’.
[4]
SELECT category, COUNT(accNo) AS noAccounts, AVG(balance) AS avgBalance
FROM Account, AccountType AT, Customer C
WHERE type = AT.name
  AND owner = C.name
  AND occupation = 'student'
GROUP BY category;
Note that ‘COUNT(*)’ could have been used instead of ‘COUNT(accNo)’.
TOTAL [19]

Question 2 [6]
Referring to the above schema, express the following integrity constraints in SQL:

a) The balance on each individual account (stored in ‘Account’) should not go below the minimum balance for its type (as stated in ‘AccountType’). [3]
CREATE ASSERTION BalanceLimit CHECK (
  NOT EXISTS ( SELECT *
               FROM Account, AccountType
               WHERE type = name
                 AND balance < minBalance ));

b) The value-out for cash withdrawals (see Transaction) for any “Student Current” account cannot be greater than 100 per individual transaction (note that this does not prevent the owner from withdrawing more than 100 in consecutive transactions). [3]
CREATE ASSERTION WithdrawLimit CHECK (
  NOT EXISTS ( SELECT *
               FROM Transaction T, Account A
               WHERE T.accNo = A.accNo
                 AND transType = 'cash withdraw'
                 AND type = 'Student Current'
                 AND valOut > 100 ));
TOTAL [6]

PROBLEM 5 [25]

Question 1 [10]
a) Consider the relation “Absences (student, adate)”, with the primary key “(student, adate)”, which records the dates when students are absent from university. For illustration, a small extension is given in Figure 1 below:

Absences
student  | adate
S. Allen | 12/01/2003
P. Clark | 03/02/2003
S. Allen | 05/03/2003
M. Lewis | 05/03/2003
Figure 1

CREATE VIEW AbsCount AS
SELECT student, COUNT(adate) AS noAbsences
FROM Absences
GROUP BY student;
Figure 2

Consider the view “AbsCount”, as defined in Figure 2. This represents the number of absences per student. Lastly, consider the following update operation attempted on AbsCount:
UPDATE AbsCount
SET noAbsences = 3
WHERE student = 'S. Allen';
State whether this update operation could be performed by a relational DBMS and explain your answer.
Draw a general rule regarding views from the above example. Although the syntax is that of SQL, you should consider the problem independently of any specific database language and/or DBMS. [6]
b) State two restrictions imposed by SQL2 on update operations to views. [4]

Answer
a) This operation simply cannot be performed by any relational DBMS. [1]
Explanation
Any update on a view has to be propagated to the base relations on which the view is defined. The view ‘AbsCount’ aggregates tuples from ‘Absences’. The proposed update would have to lead to the insertion or deletion of tuples in ‘Absences’ (obviously, only in the cases when the new value in the update statement is different from the old value). To insert a tuple into ‘Absences’, both ‘student’ and ‘adate’ must be provided. ‘adate’ is neither given in the update statement, nor can it be generated automatically for the tuples to be inserted/deleted. [4]
If the student’s explanation is coherent and illustrates an understanding of the problem, but is not as comprehensive as above, still award full marks.
Rule
Any coherent rule that illustrates an understanding of the problem, even if it is incorrect, should be awarded full marks (i.e. 1). For example: “Updates are not possible through views that are defined via some aggregate functions” is incorrect but should be given 1 mark. [1]
b) In SQL2 updates on the following views are not possible:
- views defined on two or more tables;
- views defined via UNION, INTERSECT, EXCEPT;
- views whose SELECT statement contains the word DISTINCT;
- views whose column specifications contain elements other than simple references to columns of the underlying base tables;
- ... etc. (see p. 83 of Study Guide, Vol. 1)
Award 2 marks per correct statement. [4]
TOTAL [10]

Question 2 [8]
a) What is a transaction? Give a simple example.
[4]
b) State and succinctly explain the two mechanisms customarily provided by the transaction manager (of a DBMS) for the implementation of transactions. [4]

Answer
a) A transaction is a sequence of database operations that represents a logical unit of work. [2]
An example could be given in the context of a database that stores some redundant data (e.g., in a library database each loan is stored explicitly, but the total number of loans is also stored explicitly for each borrower); a transaction is required when such data is updated. [2]
b) COMMIT - is issued at the end of a transaction (i.e. after all the operations of the transaction have been successfully executed); once a DBMS has received a COMMIT, the respective transaction is guaranteed to be executed. [2]
ROLLBACK - is issued during a transaction if an error has occurred in the execution of one of its operations; once a DBMS has received a ROLLBACK, all the performed operations of the respective transaction are guaranteed to be undone. [2]
Accept ‘the locking mechanism’ and the system’s ‘log’. Award 2 marks for a correct description of each, but the total in this case should not exceed 3 marks.
TOTAL [8]

Question 3 [7]
a) Explain what is meant by impedance mismatch in the context of relational database systems. [5]
b) Consider two real-life systems, A and B. Each requires the support of a database system. System A consists of very many types of data objects (or entities), but each type (or entity) has only a few instances. System B consists of a moderate number of types of data objects (or entities), but each type (or entity) has very many instances. Disregarding any other constraints, for which system would you propose the use of a relational DBMS?
[2]

Answer
a) In applications based on relational databases, data has to be translated between the way it is stored/represented in the database (the database’s data types) and the way it is represented in the application programs (the data types of the programming language). Usually, the data types used by a relational database do not coincide with the data types used by a programming language. This is called impedance mismatch, and it may cause the corruption of data. [5]
b) System B. [2]
TOTAL [7]
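
The translation described in the answer to Question 3 a) can be made concrete with a small sketch. It is not part of the marking scheme. It assumes Python's standard sqlite3 module as a stand-in DBMS, borrows the Absences table from Problem 5 purely for illustration, and uses two hypothetical helper functions (to_db, from_db) to convert between the program's date type and the text representation held in the database.

```python
import sqlite3
from datetime import date

# In-memory database used only for this illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Absences (student TEXT, adate TEXT, "
    "PRIMARY KEY (student, adate))"
)


def to_db(d: date) -> str:
    """Hypothetical helper: program type -> database representation (ISO text)."""
    return d.isoformat()


def from_db(s: str) -> date:
    """Hypothetical helper: database representation -> program type."""
    return date.fromisoformat(s)


absence = date(2003, 1, 12)
conn.execute("INSERT INTO Absences VALUES (?, ?)", ("S. Allen", to_db(absence)))

# The value comes back in the database's representation, not as a date object.
raw = conn.execute(
    "SELECT adate FROM Absences WHERE student = ?", ("S. Allen",)
).fetchone()[0]
restored = from_db(raw)

if __name__ == "__main__":
    print(type(raw).__name__, raw)
    print(type(restored).__name__, restored)
```

If the two conversion helpers disagree (for instance, one writes day/month/year and the other reads month/day/year), the value read back differs from the value written, which is the kind of data corruption the answer refers to.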