INF2603/202/2/2019

Tutorial letter 202/2/2019
Databases 1
INF2603
Semester 2

ASSIGNMENT 02 Marking Rubric/Guide

School of Computing

IMPORTANT INFORMATION: This tutorial letter contains important information about your module.

Marking Guide of Assignment 02 S2
TOTAL MARKS: 100

Due date: 20 September 2019
Tutorial matter covered in prescribed book: All chapters in the syllabus (Chapter 1 to 8)
Unique number: 842182

Question 1 [18]

1.1. What is a subtype discriminator? Give an example of its use. (3)
1.2. Under what circumstances are composite primary keys appropriate? (4)
1.3. What are time-variant data, and how would you deal with such data from a database design point of view? (2)
1.4. Why are entity integrity and referential integrity important in a database? (2)
1.5. What two conditions must be met before an entity can be classified as a weak entity? Give an example of a weak entity. (3)
1.6. Briefly, but precisely, explain the difference between single-valued attributes and simple attributes. Give an example of each. (4)

ANSWER

1.1. What is a subtype discriminator? Give an example of its use. (3)
Read section 5-1d on page 172. 2 marks for the description and 1 mark for giving an example.

1.2. Under what circumstances are composite primary keys appropriate? (4)
Read pages 177-179 (4 marks for discussing two cases to the fullest).

1.3. What are time-variant data, and how would you deal with such data from a database design point of view? (2)
Read section 5-4b on page 182 (1 mark for defining time-variant data and 1 mark for explaining how to deal with data that change over time from a database design point of view).

1.4. Why are entity integrity and referential integrity important in a database? (2)
Solution
Entity integrity and referential integrity are important because they are the basis for expressing and implementing relationships in the entity relationship model. Entity integrity ensures that each row is uniquely identified by the primary key.
Therefore, entity integrity means that a proper search for an existing tuple (row) will always be successful. (And the failure to find a match on a row search will always mean that the row for which the search is conducted does not exist in that table.) Referential integrity means that, if the foreign key contains a value, that value refers to an existing valid tuple (row) in another relation. Therefore, referential integrity ensures that it will be impossible to assign a non-existing foreign key value to a table.

1.5. What two conditions must be met before an entity can be classified as a weak entity? Give an example of a weak entity. (3)
Read section 4-1g on page 125. 2 marks for discussing the two conditions and 1 mark for giving an example (pages 126-127).

1.6. Briefly, but precisely, explain the difference between single-valued attributes and simple attributes. Give an example of each. (4)
Read section 4-1b about attributes on page 117. 2 marks for the explanation and 1 mark for each example.

Question 2 [10]

Suppose you are working within the framework of the conceptual model shown in the ERD below.

The ERD FIGURE

a. Write the business rules reflected in this ERD.

ANSWER:
Learn more about business rules in sections 2-3 and 2-4. The figure has 6 entities and 5 relationships. Please note that there are optional and mandatory entities. In your answer, each relationship must be read in both directions, producing 2 business rules per relationship (1 mark for each). Read about the application of business rules throughout the book. Also know how to differentiate 1:M or 1..*, M:N or *..*, and 1:1 or 1..1 relationships.

Question 3 [46]

Suppose that you have been given the table structure and data shown in Table 3.1, which was imported from an Excel spreadsheet. The data reflect that a professor can have multiple advisees, can serve on multiple committees, and can edit more than one journal.
Table 3.1: Sample PROFESSOR Records

Attribute Name | Sample Value | Sample Value | Sample Value | Sample Value
EMP_NUM | 123 | 104 | 118 |
PROF_RANK | Professor | Asst. Professor | Assoc. Professor | Assoc. Professor
EMP_NAME | Ghee | Rankin | Ortega | Smith
DEPT_CODE | CIS | CHEM | CIS | ENG
DEPT_NAME | Computer Info. Systems | Chemistry | Computer Info. Systems | English
PROF_OFFICE | KDD-567 | BLF-119 | KDD-562 | PRT-345
ADVISEE | 1215, 2312, 3233, 2218, 2098 | 3102, 2782, 3311, 2008, 2876, 2222, 3745, 1783, 2378 | 2134, 2789, 3456, 2002, 2046, 2018, 2764 | 2873, 2765, 2238, 2901, 2308
COMMITTEE_CODE | PROMO, TRAF, APPL, DEV | DEV | SPR, TRAF | PROMO, SPR, DEV
JOURNAL_CODE | JMIS, QED, JMGT | | JCIS, JMGT |

Given the information in Table 3.1:
a. Draw the dependency diagram. (6)
b. Identify the multivalued dependencies. (3)
c. Create the dependency diagrams to yield a set of table structures in 3NF. (4)
d. Eliminate the multivalued dependencies by converting the affected table structures to 4NF. (7)
e. Draw the Crow’s Foot ERD to reflect the dependency diagrams you drew in Part c. (Note: You might have to create additional attributes to define the proper PKs and FKs. Make sure that all of your attributes conform to the naming conventions.) (26) [8,4,7,7]

ANSWERS:
Suppose that you have been given the table structure and data shown in Table 3.1 above, which was imported from an Excel spreadsheet. The data reflect that a professor can have multiple advisees, can serve on multiple committees, and can edit more than one journal.

Given the information in Table 3.1:

a. Draw the dependency diagram.
The dependency diagram is shown in Figure 3.1.

Figure 3.1 The Dependency Diagram for Problem 3.a
[Diagram attributes: EMP_NUM, PROF_RANK, EMP_NAME, DEPT_CODE, DEPT_NAME, PROF_OFFICE, ADVISEE, COMMITTEE_CODE, JOURNAL_CODE. Label in the diagram: transitive dependency.]
Figure note: If each professor has a private office, PROF_OFFICE is a determinant of EMP_NUM. However, if an office can be shared among two or more professors, the dependency shown here does not exist. Because this dependency is not clear-cut, the dependency line is shown as a dashed line.

Note that Figure 3.1 reflects several ambiguities. For example, although each PROF_OFFICE value shown in Table 3.1 is unique, does that limited information indicate that each professor has a private office? If so, the office number identifies the professor who uses that office. This condition yields a dependency. However, this dependency is not a transitive one, because a non-key attribute, PROF_OFFICE, determines the value of a key attribute, EMP_NUM. (We have indicated this potential dependency through a dashed dependency line.)

NOTE
The assumption that PROF_OFFICE → EMP_NUM is a rather restrictive one, because it would mean that professors cannot share an office. One could safely assume that administrators at all levels would not care to be tied by such a restrictive office assignment requirement. Therefore, we will remove this restriction in the remaining problem solutions.

Also, note that there is no reliable way to identify the effect of multivalued attributes on the dependencies. For example, EMP_NUM = 123 could identify any one of five advisees.
Therefore, knowing the EMP_NUM does not identify a specific ADVISEE value. The same is true for the COMMITTEE_CODE and JOURNAL_CODE attributes. Therefore, these attributes are not marked with a solid arrow line.

However, if you know that EMP_NUM = 123, you will also know all five advisees, all four committee codes, and all three journal codes for that employee number value. But you do not have a unique identification for each of those attribute values. Therefore, you cannot conclude that EMP_NUM → ADVISEE, nor can you conclude that EMP_NUM → COMMITTEE_CODE or that EMP_NUM → JOURNAL_CODE.

b. Identify the multivalued dependencies.
Table 3.1 shows several professor attributes (ADVISEE, COMMITTEE_CODE, and JOURNAL_CODE) that represent multivalued dependencies.

c. Create the dependency diagrams to yield a set of table structures in 3NF.
The dependency diagrams are shown in Figure 3.2. Note that we have assumed that it is possible that professors can share an office.

Figure 3.2 The Dependency Diagram for Problem 3c
[Diagram attributes: EMP_NUM, PROF_RANK, EMP_NAME, PROF_OFFICE, ADVISEE, COMMITTEE_CODE, JOURNAL_CODE, DEPT_CODE, DEPT_NAME. Two structures are shown, each labeled 3NF.]

d. Eliminate the multivalued dependencies by converting the affected table structures to 4NF.
The structures shown in Figure 3.3 conform to the 4NF requirement. Yet this normalization does not yield a viable database design. Here is another opportunity to stress that normalization without data modeling is a poor way to generate useful databases. (Note that we have assumed that an advisee can have only one advisor, but that an advisor can have many advisees.)

Figure 3.3 The Initial Dependency Diagrams for Problem 3d
[Diagram attributes: EMP_NUM, PROF_RANK, EMP_NAME, PROF_OFFICE, ADVISEE, COMMITTEE_CODE, JOURNAL_CODE, DEPT_CODE, DEPT_NAME. EMP_NUM appears in four of the structures.]

Problem: This “solution” has limited value, because the relationship between EMP_NUM and JOURNAL_CODE and between EMP_NUM and COMMITTEE_CODE is M:N.
(A professor can write for many journals and each journal includes articles by many professors. Similarly, a professor can serve on many committees and each committee is composed of several professors.)

The dependency diagrams shown in Figure 3.3 constitute an attempt to eliminate the shortcomings of the “system” shown in Figure 3.2. Unfortunately, while this solution meets the normalization requirements, it lacks the ability to properly link the professors to committees and journals. (That’s because the relationships between professors and journals and between professors and committees are M:N.) This solution would yield Tables 3.2 and 3.3. (One would expect a professor to be an employee, so it’s reasonable to assume that, at some point, we’ll have to create a supertype/subtype relationship between employee and professor.) To save space, we show only the first three EMP_NUM value sets from Table 3.1.

Table 3.2 Implementation of the M:N Relationship between EMP_NUM and COMMITTEE_CODE

EMP_NUM   COMMITTEE_CODE
123       PROMO
123       TRAF
123       APPL
123       DEV
104       DEV
118       SPR
118       TRAF

The PK of the table shown in Table 3.2 is EMP_NUM + COMMITTEE_CODE.

Table 3.3 Implementation of the M:N Relationship between EMP_NUM and JOURNAL_CODE

EMP_NUM   JOURNAL_CODE
123       JMIS
123       QED
123       JMGT
118       JCIS
118       JMGT

The PK of the table shown in Table 3.3 is EMP_NUM + JOURNAL_CODE. Because EMP_NUM = 104 does not show any entries in JOURNAL_CODE, that employee number does not occur in Table 3.3.

The preceding table structures create multiple redundancies. Therefore, this solution is not acceptable. Here is yet another indication that normalization, while very useful, is not always (usually?) capable of producing implementable solutions. For example, the preceding examples illustrate that multivalued attributes and M:N relationships cannot be effectively modeled without first using the ERD.
(After the ERD has done its work, you should, of course, use dependency diagrams to check for data redundancies!) Figure 3.4 shows a more practical solution to the problem, and its structures all conform to the normalization requirements.

e. Draw the Crow’s Foot ERD to reflect the dependency diagrams you drew in Part c. (Note: You might have to create additional attributes to define the proper PKs and FKs. Make sure that all of your attributes conform to the naming conventions.)
Given the discussion in the previous problem segment d, we have incorporated additional features in the Crow’s Foot ERD shown in Figure 3.4. Note that we have eliminated the M:N relationships in this design by creating composite entities. This design is implementable and it meets design standards. Normalization was part of the process that led to this solution, but it was only a part of that solution. Normalization does not replace design!

Figure 3.4 The Crow’s Foot ERD for Problem 3e

Question 4 [7]

a. What three join types are included in the OUTER JOIN classification? (3)
Read page 85 and pages 212-284 about the different outer join classes (1 mark for each).

b. What is the difference between the COUNT aggregate function and the SUM aggregate function? (2)
Read pages 282-284 (1 mark for explaining each).

c. What Oracle function should you use to calculate the number of days between your birthday and the current date? (2)
Read about Oracle DATE FUNCTIONS in Table 7.9, pages 304 to 305. NB: When you subtract your birth date from the current date using date arithmetic, the number of days is returned (1 mark for this). Note that in Oracle, the SQL statement requires the use of the FROM clause (1 mark for using the virtual table called DUAL).

Question 5 [19]

Consider the contents of the ASSIGNMENT table and use SQL commands to answer questions 5a-e.

Table 5.1: ASSIGNMENT

NOTE: All SQL syntax must be correct; missing characters will be penalized!

a. Write the SQL query to list the assignment number, project number, employee number and assignment charge per hour for all the projects that were assigned after the 24th of March 2018. Sort the results by employee number. (3)

b. Write the SQL code that will list only the distinct project numbers in the ASSIGNMENT table, sorted by project number. (4)

c. Write the SQL code to validate the ASSIGN_CHARGE values in the ASSIGNMENT table. Your query should retrieve the assignment number, employee number, project number, the stored assignment charge (ASSIGN_CHARGE), and the calculated assignment charge (calculated by multiplying ASSIGN_CHG_HR by ASSIGN_HOURS). Sort the results by the assignment number. (5)

d. Wrong question given to students, not referring to the given tables:
Write the SQL query to display the book number, title, and cost for all books that cost R599.95 sorted by book number as in the table below. (2)
Correct question:
Please use this TABLE: BOOK (Book_Num, Book_Title, Book_Year, Book_Cost, Book_Subject, Pat_ID)
Write the SQL query to display the book number, title, and cost for all books that cost R599.95 sorted by book number as in the table below. (2)
The students received 2 free marks, although most students got it right.

e. Write a query to produce the total number of hours and charges for each of the projects represented in the ASSIGNMENT table, sorted by project number. The output is shown below: (5)

PROJ_NUM   SumOfASSIGN_HOURS   SumOfASSIGN_CHARGE
15         20.5                1806.52
18         23.7                1544.80
22         27.0                2593.16
25         19.4                1668.16

ANSWER

Consider the contents of the ASSIGNMENT table (Table 5.1) and use SQL commands to answer questions 5a-e.

a. Write the SQL query to list the assignment number, project number, employee number and assignment charge per hour for all the projects that were assigned after the 24th of March 2018. Sort the results by employee number. (3)

ANSWER
SELECT ASSIGN_NUM, PROJ_NUM, EMP_NUM, ASSIGN_CHG_HR
FROM ASSIGNMENT
WHERE ASSIGN_DATE > '24-MAR-2018'
ORDER BY EMP_NUM;

(The date literal must be enclosed in single quotes. Its exact format depends on the DBMS; '24-MAR-2018' follows the default Oracle date format.)

b. Write the SQL code that will list only the distinct project numbers in the ASSIGNMENT table, sorted by project number. (4)

ANSWER
SELECT DISTINCT PROJ_NUM
FROM ASSIGNMENT
ORDER BY PROJ_NUM;

c. Write the SQL code to validate the ASSIGN_CHARGE values in the ASSIGNMENT table. Your query should retrieve the assignment number, employee number, project number, the stored assignment charge (ASSIGN_CHARGE), and the calculated assignment charge (calculated by multiplying ASSIGN_CHG_HR by ASSIGN_HOURS). Sort the results by the assignment number. (5)

ANSWER
SELECT ASSIGN_NUM, EMP_NUM, PROJ_NUM, ASSIGN_CHARGE,
       ROUND(ASSIGN_CHG_HR * ASSIGN_HOURS, 2) AS CALC_ASSIGN_CHARGE
FROM ASSIGNMENT
ORDER BY ASSIGN_NUM;

d. Write the SQL query to display the book number, title, and cost for all books that cost R599.95 sorted by book number as in the table below. (2)
The question referred to a table from another assignment, hence we gave you 2 free marks.

e. Write a query to produce the total number of hours and charges for each of the projects represented in the ASSIGNMENT table, sorted by project number. The output is shown below: (5)

PROJ_NUM   SumOfASSIGN_HOURS   SumOfASSIGN_CHARGE
15         20.5                1806.52
18         23.7                1544.80
22         27.0                2593.16
25         19.4                1668.16

ANSWER
SELECT PROJ_NUM,
       ROUND(SUM(ASSIGN_HOURS), 1) AS SumOfASSIGN_HOURS,
       ROUND(SUM(ASSIGN_CHARGE), 2) AS SumOfASSIGN_CHARGE
FROM ASSIGNMENT
GROUP BY PROJ_NUM
ORDER BY PROJ_NUM;

© 2019
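
As an illustration of the Question 5 answers, the following minimal sketch runs the 5a and 5e queries against a few made-up rows. Only the table name, the column names, and the shape of the two queries come from this letter. Everything else is an assumption: the sample rows are hypothetical (the actual ASSIGNMENT data is not reproduced here), Python's sqlite3 module is used purely as a convenient way to execute the SQL, and dates are stored as ISO text (yyyy-mm-dd), so the date literal is written in that form rather than in an Oracle date format.

```python
import sqlite3

# Hypothetical sample rows, in column order:
# (ASSIGN_NUM, ASSIGN_DATE, PROJ_NUM, EMP_NUM,
#  ASSIGN_CHG_HR, ASSIGN_HOURS, ASSIGN_CHARGE)
SAMPLE_ROWS = [
    (1001, "2018-03-22", 18, 103, 84.50, 3.5, 295.75),
    (1002, "2018-03-22", 22, 117, 34.55, 4.2, 145.11),
    (1003, "2018-03-25", 18, 117, 34.55, 2.0, 69.10),
    (1004, "2018-03-26", 15, 103, 84.50, 5.9, 498.55),
    (1005, "2018-03-26", 22, 108, 96.75, 2.2, 212.85),
]


def build_db():
    """Create an in-memory ASSIGNMENT table filled with the sample rows."""
    conn = sqlite3.connect(":memory:")
    # ASSIGN_DATE is stored as ISO text, so plain string comparison
    # orders the dates correctly.
    conn.execute(
        """CREATE TABLE ASSIGNMENT (
               ASSIGN_NUM    INTEGER PRIMARY KEY,
               ASSIGN_DATE   TEXT,
               PROJ_NUM      INTEGER,
               EMP_NUM       INTEGER,
               ASSIGN_CHG_HR REAL,
               ASSIGN_HOURS  REAL,
               ASSIGN_CHARGE REAL)"""
    )
    conn.executemany(
        "INSERT INTO ASSIGNMENT VALUES (?, ?, ?, ?, ?, ?, ?)", SAMPLE_ROWS
    )
    return conn


conn = build_db()

# Question 5a: assignments made after 24 March 2018, sorted by employee number.
after_24_march = conn.execute(
    """SELECT ASSIGN_NUM, PROJ_NUM, EMP_NUM, ASSIGN_CHG_HR
       FROM ASSIGNMENT
       WHERE ASSIGN_DATE > '2018-03-24'
       ORDER BY EMP_NUM"""
).fetchall()

# Question 5e: total hours and charges per project, sorted by project number.
totals = conn.execute(
    """SELECT PROJ_NUM,
              ROUND(SUM(ASSIGN_HOURS), 1)  AS SumOfASSIGN_HOURS,
              ROUND(SUM(ASSIGN_CHARGE), 2) AS SumOfASSIGN_CHARGE
       FROM ASSIGNMENT
       GROUP BY PROJ_NUM
       ORDER BY PROJ_NUM"""
).fetchall()

for row in after_24_march:
    print("5a:", row)
for row in totals:
    print("5e:", row)
```

With these made-up rows, the 5e query returns one line per project, because GROUP BY PROJ_NUM collapses all assignments of a project into a single row before SUM is applied. The totals differ from the expected output in the question, since the sample data is not the real table.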