INF3703/202/01/2011 PRINCIPLES OF DATABASES Tutorial letter 202/ Studiebrief 202 CONTENTS / INHOUD: Solution to assignment 2 / Oplossing vir werkopdrag 2 SCHOOL OF COMPUTING SKOOL VIR REKENAARKUNDE INF3703/202/01/2011 1. Answers to Assignment 1 Question 1 1.1 What is normalization? Answer to 1.1: Normalization is the process for assigning attributes to entities. Properly executed, the normalization process eliminates uncontrolled data redundancies, thus eliminating the data anomalies and the data integrity problems that are produced by such redundancies. Normalization does not eliminate data redundancy; instead, it produces the carefully controlled redundancy that lets us properly link database tables. 1.2 When is a table in 1NF? Answer to 1.2: A table is in 1NF when all the key attributes are defined (no repeating groups in the table) and when all remaining attributes are dependent on the primary key. However, a table in 1NF still may contain partial dependencies, i.e., dependencies based on only part of the primary key and/or transitive dependencies that are based on a non-key attribute. 1.3 When is a table in 2NF? Answer to 1.3: A table is in 2NF when it is in 1NF and it includes no partial dependencies. However, a table in 2NF may still have transitive dependencies, i.e., dependencies based on attributes that are not part of the primary key. 1.4 When is a table in 3NF? Answer to 1.4: A table is in 3NF when it is in 2NF and it contains no transitive dependencies. 1.5 When is a table in BCNF? Answer to 1.5: A table is in Boyce-Codd Normal Form (BCNF) when it is in 3NF and every determinant in the table is a candidate key. For example, if the table is in 3NF and it contains a nonprime attribute that determines a prime attribute, the BCNF requirements are not met.This description clearly yields the following conclusions: If a table is in 3NF and it contains only one candidate key, 3NF and BCNF are equivalent. BCNF can be violated only if the table contains more than one candidate key. Putting it another way, there is no way that the BCNF requirement can be violated if there is only one candidate key. 2 INF3703/202/01/2011 Question 2 2.1 Using the INVOICE table structure shown in table 1, write the relational schema, draw its dependency diagram and identify all dependencies (including all partial and transitive dependencies). You can assume that the table does not contain repeating groups and that any invoice number may reference more than one product. (Hint: This table uses a composite primary key.) Table 1 Attribute Name INV_NUM PROD_NUM SALE_DATE PROD_LABEL VEND_CODE VEND_NAME QUANT_SOLD PROD_PRICE Sample Value 211347 AAE3422QW 15-Jan-2006 Rotary sander 211 NeverFail, Inc. 1 $49.95 Sample Value Sample Value Sample Value Sample Value 211347 QD-300932X 211347 RU-995748G 211348 AA-E3422QW 211349 GH-778345P 15-Jan-2006 0.25-in. drill bit 15-Jan-2006 Band saw 15-Jan-2006 Rotary sander 16-Jan-2006 Power drill 211 NeverFail, Inc. 309 BeGood, Inc. 211 NeverFail, Inc. 157 ToughGo, Inc. 8 $3.45 1 $39.99 2 $49.95 1 $87.75 Answer to 2.1: Note: QUANT_SOLD = NUM_SOLD 3 INF3703/202/01/2011 2.2 Using the initial dependency diagram drawn in question 2.1, remove all partial dependencies, draw the new dependency diagrams, and identify the normal forms for each table structure you created. Answer to 2.2: 2.3 Using the table structures you created in question 2.2, remove all transitive dependencies and draw the new dependency diagrams. Also identify the normal forms for each table structure you created Answer to 2.3: 4 INF3703/202/01/2011 Question 3 Some Tiny University staff employees are information technology (IT) personnel. Some IT personnel provide technology support for academic programs. Some IT personnel provide technology infrastructure support. Some IT personnel provide technology support for academic programs and technology infrastructure support. IT personnel are not professors. IT personnel are required to take periodic training to retain their technical expertise. Tiny University tracks all IT personnel training by date, type, and results (completed vs. not completed). Given that information, create the complete ERD containing all primary keys, foreign keys, and main attributes. Answer: These are optional 5 INF3703/202/01/2011 Question 4 4.1 Explain the following statement: a transaction is a logical unit of work. Answer to 4.1: A transaction is a logical unit of work that must be entirely completed of aborted; no intermediate states are accepted. In other words, a transaction, composed of several database requests, is treated by the DBMS as a unit of work in which all transaction steps must be fully completed if the transaction is to be accepted by the DBMS. Acceptance of an incomplete transaction will yield an inconsistent database state. To avoid such a state, the DBMS ensures that all of a transaction's database operations are completed before they are committed to the database. For example, a credit sale requires a minimum of three database operations: 4.2 What is concurrency control? Answer to 4.2: Concurrency control is the activity of coordinating the simultaneous execution of transactions in a multiprocessing or multi-user database management system. The objective of concurrency control is to ensure the serializability of transactions in a multi-user database management system. (The DBMS's scheduler is in charge of maintaining concurrency control.) Because it helps to guarantee data integrity and consistency in a database system, concurrency control is one of the most critical activities performed by a DBMS. If concurrency control is not maintained, three serious problems may be caused by concurrent transaction execution: lost updates, uncommitted data, and inconsistent retrievals. 4.3 What is the advantages of DDBMS? Answer to 4.3: Data are located near the greatest demand site Faster access Faster data access Growth facilitation 4.4 What is the disadvantages of DDBMS? Answer to 4.4: Complexity of management and control Security Lack of standards Increased storage requirements 6 INF3703/202/01/2011 Question 5 5.1 Explain the difference between a distributed database and distributed processing Answer to 5.1: Distributed processing, a database’s logical processing is shared among two or more physically independent sites that are connected through a network. For example the data input/output(I/O), data selection and data validation might be performed on one computer, and a report based on that data might be on another computer. Distributed database, on the other hand, stores a logically related database over two or more physically independent sites. The sites are connected via computer network. 5.2 Describe the different types of database requests and transactions. Answer to 5.2: A database transaction is formed by one or more database requests. Each database request is the equivalent of a single SQL statement. The basic difference between a local transaction and a distributed transaction is that the latter can update or request data from several remote sites on a network. In a DDBMS, a database request and a database transaction can be of two types: remote or distributed. A remote request accesses data located at a single remote database processor (or DP site). In other words, an SQL statement (or request) can reference data at only one remote DP site. A remote transaction, composed of several requests, accesses data at only a single remote DP site. A distributed transaction allows a transaction to reference several different local or remote DP sites. Although each single request can reference only one local or remote DP site, the complete transaction can reference multiple DP sites because each request can reference a different site. Use Figure 12.12 to illustrate the distributed transaction. A distributed request lets us reference data from several different DP sites. Since each request can access data from more than one DP site, a transaction can access several DP sites. The ability to execute a distributed request requires fully distributed database processing because we must be able to: Partition a database table into several fragments. 7 INF3703/202/01/2011 Reference one or more of those fragments with only one request. In other words, we must have fragmentation transparency. The distributed request feature also allows a single request to reference a physically partitioned table. 5.3 What is the objective of query optimisation function? Answer to 5.3: The objective of query optimization functions is to minimize the total costs associated with the execution of a database request. The costs associated with a request are a function of:the access time (I/O) cost involved in accessing the physical data stored on disk the communication cost associated with the transmission of data among nodes in distributed database systems the CPU time cost. It is difficult to separate communication and processing costs. Query-optimization algorithms use different parameters, and the algorithms assign different weight to each parameter. For example, some algorithms minimize total time, others minimize the communication time, and still others do not factor in the CPU time, considering it insignificant relative to the other costs. Query optimization must provide distribution and replica transparency in distributed database systems. 5.4 What is XML and why is it important? Answer to 5.4: Extensible Markup Language (XML) is a metalanguage used to represent and manipulate data elements. XML is designed to facilitate the exchange of structured documents, such as orders and invoices, over the internet. 8 INF3703/202/01/2011 2.1 Extras exercises to complete Question 1: Problem 7 on page 200 of textbook Question 2: Problem 7 on page 234 of textbook Question 3: Problem 11 and 12 on page 284 of textbook 9 INF3703/202/01/2011 Answer to Question 1: Solution to Problem 7 on page 200. 10 INF3703/202/01/2011 Answer to Question 2: Solution to Problem 7on page 238 11 INF3703/202/01/2011 Answer to Question 3: Solution to Problem 11 & 12 on page 284 EMP_CODE EMP_LNAME EMP_EDUCATION DEPT_CODE DEPT_NAME DEPT_MANAGER Transitive Dependencies Continued…. EMP_DEPENDENTS EMP_DOB EMP_HIRE_DATE EMP_TRAINING Continued…. JOB_TITLE JOB_CLASS EMP_BASE_SALARY EMP_COMMISSION_RATE Transitive Dependencies EMPLOYEE EMP_CODE EMP_LNAME DEPT_CODE JOB_CLASS EMP_DOB EMP_HIRE_DATE DEPARTMENT DEPT_CODE DEPT_NAME EMP_CODE QUALIFICATION EDUCATION EMP_CODE EDUC_CODE QUAL_DATE EDUC_CODE EDUC_DESCRIPTION DEPENDENT EMP_CODE DEP_NUM DEP_FNAME DEP_TYPE JOB JOB_CLASS JOB_TITLE JOB_BASE_SALARY 12 INF3703/202/01/2011 The relational schemas are written as: EMPLOYEE (EMP_CODE, EMP_LNAME, DEPT_CODE, JOB_CLASS, EMP_DOB,EMP_HIREDATE) DEPENDENT(EMP_CODE, DEP_NUM, DEP_FNAME, DEP_TYPE) DEPARTMENT(DEPT_CODE, DEPT_NAME, EMP_CODE) JOB(JOB_CLASS, JOB_TITLE, JOB_BASE_SALARY) EDUCATION(EDUC_CODE, EDUC_DESCRIPTION) QUALIFICATION(EMP_CODE, EDUC_CODE, QUAL_DATE_EARNED) UNISA 2011 13