Exercises (Course: Database Management Systems)

Review Questions and Exercises (Course: Database Management Systems)
Chapter 1: Disk Storage, Basic File Structures and Hashing

Review Questions

Q1. What is the difference between a file organization and an access method?
Q2. What is the difference between static and dynamic files?
Q3. What are the typical record-at-a-time operations for accessing a file? Which of these depend on the current record of a file?
Q4. Discuss the advantages and disadvantages of using (a) an unordered file, (b) an ordered file, and (c) a static hashing file with buckets and chaining. Which operations can be performed efficiently on each of these organizations, and which operations are expensive?
Q5. Discuss the techniques for allowing a hash file to expand and shrink dynamically. What are the advantages and disadvantages of each?

Exercises

E1. Consider a disk with the following characteristics: block size B = 512 bytes; interblock gap size G = 128 bytes; number of blocks per track = 20; number of tracks per surface = 400. A disk pack consists of 15 double-sided disks.
   a. What is the total capacity of a track, and what is its useful capacity (excluding interblock gaps)?
   b. How many cylinders are there?
   c. What are the total capacity and the useful capacity of a cylinder?
   d. What are the total capacity and the useful capacity of a disk pack?
   e. Suppose that the disk drive rotates the disk pack at a speed of 2400 rpm (revolutions per minute); what are the transfer rate (tr) in bytes/msec and the block transfer time (btt) in msec? What is the average rotational delay (rd) in msec? What is the bulk transfer rate? (See Appendix C.)
   f. Suppose that the average disk seek time is 30 msec. How much time does it take (on average) in msec to locate and transfer a single block, given its block address?
   g.
      Calculate the average time it would take to transfer 20 random blocks, and compare this with the time it would take to transfer 20 consecutive blocks using double buffering to save seek time and rotational delay.

E2. A file has r = 20000 STUDENT records of fixed length. Each record has the following fields: NAME (30 bytes), SSN (9 bytes), ADDRESS (40 bytes), PHONE (9 bytes), BIRTHDATE (8 bytes), SEX (1 byte), MAJORDEPTCODE (4 bytes), MINORDEPTCODE (4 bytes), CLASSCODE (4 bytes, integer), and DEGREEPROGRAM (3 bytes). An additional byte is used as a deletion marker. The file is stored on the disk whose parameters are given in Exercise E1.
   a. Calculate the record size R in bytes.
   b. Calculate the blocking factor bfr and the number of file blocks b, assuming an unspanned organization.
   c. Calculate the average time it takes to find a record by doing a linear search on the file if (i) the file blocks are stored contiguously and double buffering is used; (ii) the file blocks are not stored contiguously.
   d. Assume that the file is ordered by SSN; calculate the time it takes to search for a record, given its SSN value, by doing a binary search.

E3. A PARTS file with Part# as hash key includes records with the following Part# values: 2369, 3760, 4692, 4871, 5659, 1821, 1074, 7115, 1620, 2428, 3943, 4750, 6975, 4981, 9208. The file uses eight buckets, numbered 0 to 7. Each bucket is one disk block and holds two records. Load these records into the file in the given order, using the hash function h(K) = K mod 8. Calculate the average number of block accesses for a random retrieval on Part#.

E4. Load the records of Exercise E3 into expandable hash files based on extendible hashing. Show the structure of the directory at each step, and the global and local depths. Use the hash function h(K) = K mod 128.

E5. Load the records of Exercise E3 into an expandable hash file, using linear hashing.
    Start with a single disk block, using the hash function h0 = K mod 2^0, and show how the file grows and how the hash functions change as the records are inserted. Assume that blocks are split whenever an overflow occurs, and show the value of n at each stage.

E6. Suppose that we have a hash file of fixed-length records, and suppose that overflow is handled by chaining. Outline algorithms for insertion, deletion, and modification of a file record. State any assumptions you make.

E7. Can you think of techniques other than chaining to handle bucket overflow in external hashing?

E8. Write pseudo-code for the insertion algorithms for linear hashing and for extendible hashing.


Review Questions and Exercises
Chapter 2: Indexing Structures for Files

Review Questions

Q1. Define the following terms: indexing field, primary key field, clustering field, secondary key field, block anchor, dense index, nondense (sparse) index.
Q2. What are the differences among primary, secondary, and clustering indexes? How do these differences affect the ways in which these indexes are implemented? Which of the indexes are dense, and which are not?
Q3. Why can we have at most one primary or clustering index on a file, but several secondary indexes?
Q4. How does multilevel indexing improve the efficiency of searching an index file?
Q5. What is the order p of a B-tree? Describe the structure of B-tree nodes.
Q6. What is the order p of a B+-tree? Describe the structure of both internal and leaf nodes of a B+-tree.
Q7. How does a B-tree differ from a B+-tree? Why is a B+-tree usually preferred as an access structure to a data file?

Exercises

E1. Consider a disk with block size B = 512 bytes. A block pointer is P = 6 bytes long, and a record pointer is PR = 7 bytes long. A file has r = 30,000 EMPLOYEE records of fixed length.
    Each record has the following fields: NAME (30 bytes), SSN (9 bytes), DEPARTMENTCODE (9 bytes), ADDRESS (40 bytes), PHONE (9 bytes), BIRTHDATE (8 bytes), SEX (1 byte), JOBCODE (4 bytes), SALARY (4 bytes, real number). An additional byte is used as a deletion marker.
   a. Calculate the record size R in bytes.
   b. Calculate the blocking factor bfr and the number of file blocks b, assuming an unspanned organization.
   c. Suppose that the file is ordered by the key field SSN and we want to construct a primary index on SSN. Calculate (i) the index blocking factor bfr_i (which is also the index fan-out fo); (ii) the number of first-level index entries and the number of first-level index blocks; (iii) the number of levels needed if we make it into a multilevel index; (iv) the total number of blocks required by the multilevel index; and (v) the number of block accesses needed to search for and retrieve a record from the file, given its SSN value, using the primary index.
   d. Suppose that the file is not ordered by the key field SSN and we want to construct a secondary index on SSN. Repeat the previous exercise (part c) for the secondary index and compare with the primary index.
   e. Suppose that the file is not ordered by the nonkey field DEPARTMENTCODE and we want to construct a secondary index on DEPARTMENTCODE, using blocks of record pointers, with an extra level of indirection that stores record pointers. Assume there are 1000 distinct values of DEPARTMENTCODE and that the EMPLOYEE records are evenly distributed among these values.
      Calculate (i) the index blocking factor bfr_i (which is also the index fan-out fo); (ii) the number of blocks needed by the level of indirection that stores record pointers; (iii) the number of first-level index entries and the number of first-level index blocks; (iv) the number of levels needed if we make it into a multilevel index; (v) the total number of blocks required by the multilevel index and the blocks used in the extra level of indirection; (vi) the approximate number of block accesses needed to search for and retrieve all records in the file that have a specific DEPARTMENTCODE value, using the index.
   f. Suppose that the file is ordered by the nonkey field DEPARTMENTCODE and we want to construct a clustering index on DEPARTMENTCODE that uses block anchors (every new value of DEPARTMENTCODE starts at the beginning of a new block). Assume there are 1000 distinct values of DEPARTMENTCODE and that the EMPLOYEE records are evenly distributed among these values. Calculate (i) the index blocking factor bfr_i (which is also the index fan-out fo); (iii) the number of levels needed if we make it into a multilevel index; (iv) the total number of blocks required by the multilevel index; (vi) the number of block accesses needed to search for and retrieve all records in the file that have a specific DEPARTMENTCODE value, using the clustering index (assume that multiple blocks in a cluster are contiguous).
   g. Suppose that the file is not ordered by the key field SSN and we want to construct a B+-tree access structure (index) on SSN.
      Calculate (i) the orders p and p_leaf of the B+-tree; (ii) the number of leaf-level blocks needed if blocks are approximately 69% full (rounded up for convenience); (iii) the number of levels needed if internal nodes are also 69% full (rounded up for convenience); (iv) the total number of blocks required by the B+-tree; and (v) the number of block accesses needed to search for and retrieve a record from the file, given its SSN value, using the B+-tree.

E2. A PARTS file with Part# as key field includes records with the following Part# values: 23, 65, 37, 60, 46, 92, 48, 71, 56, 59, 18, 21, 10, 74, 78, 15, 16, 20, 24, 28, 39, 43, 47, 50, 69, 75, 8, 49, 33, 38. Suppose that the search field values are inserted in the given order in a B+-tree of order p = 4 and p_leaf = 3; show how the tree will expand and what the final tree will look like.

E3. Repeat Exercise E2, but use a B-tree of order p = 4 instead of a B+-tree.

E4. Suppose that the following search field values are deleted, in the given order, from the B+-tree of Exercise E2; show how the tree will shrink and show the final tree. The deleted values are 65, 75, 43, 18, 20, 92, 59, 37.

E5. Repeat Exercise E4, but use a B-tree of order p = 4 instead of a B+-tree.


Review Questions and Exercises
Chapter 3: Algorithms for Query Processing and Optimization

Review Questions

Q1. Discuss the reasons for converting SQL queries into relational algebra queries before optimization is done.
Q2. Discuss the different algorithms for implementing each of the following relational operators and the circumstances under which each algorithm can be used: SELECT, JOIN, UNION, INTERSECT, SET DIFFERENCE, CARTESIAN PRODUCT.
Q3. What is a query execution plan?
Q4. What is meant by the term heuristic optimization? Discuss the main heuristics that are applied during query optimization.
Q5. How does a query tree represent a relational algebra expression? What is meant by an execution of a query tree?
    Discuss the rules for transformation of query trees and identify when each rule should be applied during optimization.
Q6. How many different orders are there for a query that joins 10 relations?
Q7. What is the difference between pipelining and materialization?
Q8. Discuss the cost components for a cost function that is used to estimate query execution cost. Which cost components are used most often as the basis for cost functions?

Exercises

E1. Given a database consisting of the following relations:

    And given the following SQL queries:

    Q1:  SELECT FNAME, LNAME, ADDRESS
         FROM EMPLOYEE, DEPARTMENT
         WHERE DNAME='Research' AND DNUMBER=DNO
         (Retrieve the name and address of all employees who work for the 'Research' department.)

    Q8:  SELECT E.FNAME, E.LNAME, S.FNAME, S.LNAME
         FROM EMPLOYEE E, EMPLOYEE S
         WHERE E.SUPERSSN=S.SSN
         (For each employee, retrieve the employee's name and the name of his or her immediate supervisor.)

    Q4:  (SELECT PNAME
          FROM PROJECT, DEPARTMENT, EMPLOYEE
          WHERE DNUM=DNUMBER AND MGRSSN=SSN AND LNAME='Smith')
         UNION
         (SELECT PNAME
          FROM PROJECT, WORKS_ON, EMPLOYEE
          WHERE PNUMBER=PNO AND ESSN=SSN AND LNAME='Smith')
         (Make a list of all project numbers for projects that involve an employee whose last name is 'Smith' as a worker or as a manager of the department that controls the project.)

    Q27: SELECT FNAME, LNAME, 1.1*SALARY
         FROM EMPLOYEE, WORKS_ON, PROJECT
         WHERE SSN=ESSN AND PNO=PNUMBER AND PNAME='ProductX'
         (Show the effect of giving all employees who work on the 'ProductX' project a 10% raise.)

   a. Draw at least two query trees that can represent each of these queries. Under what circumstances would you use each of your query trees?
   b. Draw the initial query tree for each of these queries, then show how the query tree is optimized.
   c. For each query, compare your own query trees of part (a) with the initial and final query trees of part (b).

E2. A file of 4096 blocks is to be sorted with an available buffer space of 64 blocks.
    How many passes will be needed in the merge phase of the external sort-merge algorithm?

E3. Extend the sort-merge join algorithm to implement the left outer join operation.

E4. Given the following relations:

    EMPLOYEE(ename, ssn, bdate, address, sex, salary, dno)
    PROJECT(pname, pnumber, plocation)
    WORKS_ON(essn, pno, hours)

    and the query: "Find the names of the employees whose birthdates are after 1957 and who currently work for the project Aquarius."

    Apply the heuristic optimization transformation rules to find an efficient query execution plan for the above query, which is described by the following query tree:

    π ename ( σ Pname = 'Aquarius' AND Pnumber = Pno AND Essn = Ssn AND Bdate > '31-12-1957' ( Employee × Project × Works_on ) )

E5. Given the following three relations:

    Supplier(Supp#, Name, City, Specialty)
    Project(Proj#, Name, City, Budget)
    Order(Supp#, Proj#, Part-name, Quantity, Cost)

    and the SQL query:

    SELECT Supplier.Name, Project.Name
    FROM Supplier, Order, Project
    WHERE Supplier.City = 'New York City' AND Project.Budget > 10000000
          AND Supplier.Supp# = Order.Supp# AND Order.Proj# = Project.Proj#

   a. Write the relational algebra expression that is equivalent to the above query and draw a query tree for the expression.
   b. Apply the heuristic optimization transformation rules to find an efficient query execution plan for the above query. Assume that the number of suppliers in New York is larger than the number of projects with budgets greater than 10000000.


Review Questions and Exercises
Chapter 4: Introduction to Transaction Processing Concepts and Theory

Review Questions

Q1. Discuss the actions taken by the read_item and write_item operations on a database.
Q2. What is the system log used for? What are the typical kinds of records in a system log? What are transaction commit points, and why are they important?
Q3. Discuss the atomicity, durability, isolation, and consistency preservation properties of a database transaction.
Q4.
    What is a serial schedule? What is a serializable schedule? Why is a serial schedule considered correct? Why is a serializable schedule considered correct?
Q5. Discuss how serializability is used to enforce concurrency control in a database system. Why is serializability sometimes considered too restrictive as a measure of correctness for schedules?

Exercises

E1. Which of the following schedules are (conflict) serializable? For each serializable schedule, determine the equivalent serial schedules.
   a. r1(X); r3(X); w1(X); r2(X); w3(X);
   b. r1(X); r3(X); w3(X); w1(X); r2(X);
   c. r3(X); r2(X); w3(X); r1(X); w1(X);
   d. r3(X); r2(X); r1(X); w3(X); w1(X);

E2. Consider the three transactions T1, T2, and T3, and the schedules S1 and S2 given below. Draw the serializability (precedence) graphs for S1 and S2, and state whether each schedule is serializable or not. If a schedule is serializable, write down the equivalent serial schedule(s).

    T1: r1(X); r1(Z); w1(X);
    T2: r2(Z); r2(Y); w2(Z); w2(Y);
    T3: r3(X); r3(Y); w3(Y);
    S1: r1(X); r2(Z); r1(Z); r3(X); r3(Y); w1(X); w3(Y); r2(Y); w2(Z); w2(Y);
    S2: r1(X); r2(Z); r3(X); r1(Z); r2(Y); r3(Y); w1(X); w2(Z); w3(Y); w2(Y);

E3. Given the following two transactions:

    T1: r1(A); w1(A); r1(B); w1(B);
    T2: r2(A); w2(A); r2(B); w2(B);

    Prove that the schedule
    S: r1(A); w1(A); r2(A); w2(A); r1(B); w1(B); r2(B); w2(B);
    is conflict-serializable. (Hint: reorder the nonconflicting operations in S until you obtain the equivalent serial schedule.)


Review Questions and Exercises
Chapter 5: Concurrency Control

Review Questions

Q1. What is the two-phase locking protocol? How does it guarantee serializability?
Q2. What are some variations of the two-phase locking protocol? Why is strict or rigorous two-phase locking often preferred?
Q3. Discuss the problems of deadlock and starvation, and the different approaches to dealing with these problems.
Q4. Describe the wait-die and wound-wait protocols for deadlock prevention.
Q5.
    What is a timestamp? How does the system generate timestamps?
Q6. Discuss the timestamp ordering protocols for concurrency control. How does strict timestamp ordering differ from basic timestamp ordering?
Q7. Discuss two multiversion techniques for concurrency control.
Q8. What is a certify lock? What are the advantages and disadvantages of using certify locks?
Q9. How do optimistic concurrency control techniques differ from other concurrency control techniques? Why are they also called validation or certification techniques? Discuss the typical phases of an optimistic concurrency control method.
Q10. How does the granularity of data items affect the performance of concurrency control? What factors affect selection of granularity size for data items?
Q11. What is multiple granularity locking? Under what circumstances is it used?
Q12. What are intention locks?

Exercises

E1. Consider the schedule shown in the following figure. Draw the wait-for graph before and after the last action write_lock(A) of transaction T3.

    T1 read_lock(A) read_item(A) T2 T3 T4 write_lock(B) write_item(B) read_lock(B) read_lock(C) read_item(C) write_lock(C) write_lock(B) write_lock(A)

E2. Consider the set of transactions accessing database element A shown in the following figure. These transactions are operating under an ordinary timestamp-based scheduler. Explain why the transaction T3 has to be aborted. What happens if these transactions are operating under a multiversion timestamp-based scheduler? (Note: In the figure, r means read and w means write.)

    T1 (150)   T2 (200)   T3 (175)   T4 (225)   A
                                                RT=0, WT=0
    r1(A)                                       RT=150
    w1(A)                                       WT=150
               r2(A)                            RT=200
               w2(A)                            WT=200
                          r3(A)
                          Abort
                                     r4(A)      RT=225

E3.
    Consider the relation Movie(title, year, length, studioName).

    Transaction T1 consists of the query:
    SELECT * FROM Movie WHERE title = 'King Kong'

    Transaction T2 consists of the query:
    UPDATE Movie SET year = 1939 WHERE title = 'Gone with the wind'

    Assume that there are two records in relation Movie with the title 'King Kong' and one record with the title 'Gone with the wind'. Suggest the collection of locks for this situation.

E4. Consider the three transactions T1, T2, and T3, and the schedules S1 and S2 given below. Draw the serializability graphs for S1 and S2, and state whether each schedule is conflict-serializable or not. If a schedule is conflict-serializable, write down the equivalent serial schedule.

    T1: r1(B); w1(B);
    T2: r2(A); w2(A); r2(B); w2(B);
    T3: r3(A); w3(A);
    S1: r2(A); r1(B); w2(A); r3(A); w1(B); w3(A); r2(B); w2(B);
    S2: r2(A); r1(B); w2(A); r2(B); r3(A); w1(B); w3(A); w2(B);


Review Questions and Exercises
Chapter 6: Database Recovery Techniques

Review Questions

Q1. How are buffering and caching techniques used by the recovery subsystem?
Q2. Describe the write-ahead logging protocol.
Q3. Discuss the UNDO and REDO operations and the recovery techniques that use each.
Q4. Discuss the deferred update technique of recovery. What are the advantages and disadvantages of this technique? Why is it called the NO-UNDO/REDO method?
Q5. How can recovery handle transaction operations that do not affect the database, such as the printing of reports by a transaction?
Q6. Discuss the immediate update recovery technique in both single-user and multi-user environments. What are the advantages and disadvantages of immediate update?
Q7. Describe the shadow paging recovery technique. Under what circumstances does it not require a log?
Q8. Describe the three phases of the ARIES recovery method.
Q9. What are log sequence numbers (LSNs) in ARIES? How are they used?
    What information do the Dirty Page Table and the Transaction Table contain? Describe how fuzzy checkpointing is used in ARIES.
Q10. Multiple-choice questions from 19.28 to 19.37 in the textbook.

Exercises

E1. Suppose that the system crashes before the [read_item, T3, A] entry is written to the following log:

                                        A    B    C    D
                                        30   15   40   20
        [start_transaction, T3]
        [read_item, T3, C]
     *  [write_item, T3, B, 15, 12]          12
        [start_transaction, T2]
        [read_item, T2, B]
     ** [write_item, T2, B, 12, 18]          18
        [start_transaction, T1]
        [read_item, T1, A]
        [read_item, T1, D]
        [write_item, T1, D, 20, 25]                    25
        [read_item, T2, D]
     ** [write_item, T2, D, 25, 26]                    26
        [read_item, T3, A]
        ---- system crash ----

     *  T3 is rolled back because it did not reach its commit point.
     ** T2 is rolled back because it reads the value of item B written by T3.

    Will that make any difference in the recovery process?

E2. Suppose that the system crashes before the [write_item, T2, D, 25, 26] entry is written to the log given in Exercise E1. Will that make any difference in the recovery process?

E3. The log corresponding to a particular schedule at the point of a system crash for four transactions T1, T2, T3, and T4 is given as follows:

        [start_transaction, T1]
        [read_item, T1, A]
        [read_item, T1, D]
        [write_item, T1, D, 20, 25]
        [commit, T1]
        [checkpoint]
        [start_transaction, T2]
        [read_item, T2, B]
        [write_item, T2, B, 12, 18]
        [start_transaction, T4]
        [read_item, T4, D]
        [write_item, T4, D, 25, 15]
        [start_transaction, T3]
        [write_item, T3, C, 30, 40]
        [read_item, T4, A]
        [write_item, T4, A, 30, 20]
        [commit, T4]
        [read_item, T2, D]
        [write_item, T2, D, 15, 25]
        ---- system crash ----

    Suppose that we use the immediate update protocol with checkpointing. Describe the recovery process from the system crash. Specify which transactions are rolled back, which operations in the log are redone and which (if any) are undone, and whether any cascading rollback takes place.

E4. Suppose that we use the deferred update protocol for the example in Exercise E3.
    Show how the log would be different in the case of deferred update by removing the unnecessary log entries; then describe the recovery process, using your modified log. Assume that only REDO operations are applied, and specify which operations in the log are redone and which are ignored.

Multiple-choice questions:

1. Incremental logging with deferred updates implies that the recovery system must necessarily
   a. store the old value of the updated item in the log.
   b. store the new value of the updated item in the log.
   c. store both the old and new value of the updated item in the log.
   d. store only the Begin Transaction and Commit Transaction records in the log.
2. The write-ahead logging (WAL) protocol simply means that
   a. the writing of a data item should be done ahead of any logging operation.
   b. the log record for an operation should be written before the actual data is written.
   c. all log records should be written before a new transaction begins execution.
   d. the log never needs to be written to disk.
3. In case of transaction failure under a deferred update incremental logging scheme, which of the following will be needed:
   a. an undo operation.
   b. a redo operation.
   c. an undo and redo operation.
   d. none of the above.
4. For incremental logging with immediate updates, a log record for a transaction would contain:
   a. a transaction name, data item name, old value of item, new value of item.
   b. a transaction name, data item name, old value of item.
   c. a transaction name, data item name, new value of item.
   d. a transaction name and data item name.
5. For correct behavior during recovery, undo and redo operations must be
   a. commutative
   b. associative
   c. idempotent
   d. distributive
6. When a failure occurs, the log is consulted and each operation is either undone or redone. This is a problem because:
   a. searching the entire log is time consuming.
   b. many redos are unnecessary.
   c. both (a) and (b).
   d. none of the above.
7.
   When using a log-based recovery scheme, it might improve performance as well as provide a recovery mechanism by
   a. writing the log records to disk when each transaction commits.
   b. writing the appropriate log records to disk during the transaction's execution.
   c. waiting to write the log records until multiple transactions commit and writing them in a batch.
   d. never writing the log records to disk.
8. There is a possibility of a cascading rollback when
   a. a transaction writes items that have been written only by a committed transaction.
   b. a transaction writes an item that is previously written by an uncommitted transaction.
   c. a transaction reads an item that is previously written by an uncommitted transaction.
   d. both (b) and (c).
9. To cope with media (disk) failures, it is necessary
   a. for the DBMS to only execute transactions in a single-user environment.
   b. to keep a redundant copy of the database.
   c. to never abort a transaction.
   d. all of the above.
10. If the shadowing approach is used for flushing a data item back to disk, then
   a. the item is written to disk only after the transaction commits.
   b. the item is written to a different location on disk.
   c. the item is written to disk before the transaction commits.
   d. the item is written to the same disk location from which it was read.


Review Questions and Exercises
Chapter 7: Data Warehousing

Review Questions

Q1. A data warehouse has four important characteristics. What are they?
Q2. Why is time usually a dimension in a data warehouse?
Q3. How do the star schema and the snowflake schema differ?
Q4. Explain the terms extract, transform, and load in the context of a data warehouse.
Q5. Explain the concept of a data cube and describe its use in modeling a data warehouse.
Q6. Briefly describe three types of operations commonly found in OLAP tools.
Q7. Answer the following multiple-choice questions:
   1.
      In which of the following data warehouse architectures does the data mart not exist physically?
      A. generic two-level architecture
      B. independent data mart architecture
      C. generic three-level architecture
      D. logical data mart and @ctive data warehouse architecture
   2. Reconciled data corresponds to the data in which of the following layers?
      A. Data mart
      B. Data warehouse
      C. Operational database
      D. None of the above
   3. In OLAP analysis, the drill-down operation is performed by
      A. generalizing upward in the dimension hierarchy, e.g., from city → state
      B. specializing downward in the dimension hierarchy, e.g., from state → city
      C. adding a new dimension
      D. removing some dimensions
   4. In a snowflake schema,
      A. the fact table is usually normalized
      B. the fact table is usually denormalized
      C. the dimension tables are normalized
      D. the dimension tables are denormalized
Q8. Match the following terms and definitions:

    ……… event                  a) previous data content is lost
    ……… periodic data          b) detailed, historical data
    ……… data mart              c) converts data formats
    ……… star schema            d) corrects errors in source data
    ……… data mining            e) data are not altered or deleted
    ……… reconciled data        f) a database action (e.g., create)
    ……… dependent data mart    g) data warehouse of limited scope
    ……… data visualization     h) dimension and fact tables
    ……… transient data         i) form of knowledge discovery
    ……… snowflake schema       j) filled from data warehouse
    ……… data transformation    k) results from hierarchical dimensions
    ……… data scrubbing         l) data represented in graphical formats

Exercises

E1. Millenium College wants you to help them design a star schema to record grades for courses completed by students. There are four dimension tables with attributes as follows:
   - Course_Section. Attributes: Course_ID, Section_number, Course_Name, Units, Room_ID, Room_Capacity.
   - Professor. Attributes: Prof_ID, Prof_Name, Title, Department_ID, Department_Name.
   - Student. Attributes: Student_ID, Student_Name, Major.
   - Period. Attributes: Semester_ID, Year.
    The only fact that is to be recorded in the fact table is Course_Grade. Design a star schema for this problem.

E2. Having mastered the principles of normalization, you recognize immediately that the star schema you developed for Millenium College in Exercise E1 is not in third normal form. Using these principles, convert the star schema to a snowflake schema.

E3. Consider the ER diagram for a database application given in the following figure. In this application the transaction "medical examination" (represented by the binary relationship "examines") is the main subject of the data warehouse to be constructed. Design a star schema for this data warehouse.

    [Figure: ER diagram with entities DOCTOR, PATIENT, and CITY; relationships examines and Lives-in; attribute labels Pat_ID, Specialty, Age, Sex, Name, Doc_ID, date, charge, city-name, state]
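
As a possible starting point for Exercise E1 of Chapter 7, the sketch below lays out one candidate star schema in Python using the standard sqlite3 module. It is an illustration under assumptions, not a model answer: the table names, the composite key on Course_Section, the use of Semester_ID as the Period key, the numeric type for Course_Grade, and every sample row are hypothetical choices made only to show the shape of a fact table surrounded by dimension tables.

```python
import sqlite3

# One candidate star schema: four dimension tables and a single fact table.
# Key choices and column types are assumptions, not part of the exercise text.
DDL = """
CREATE TABLE course_section (
    course_id      TEXT,
    section_number INTEGER,
    course_name    TEXT,
    units          INTEGER,
    room_id        TEXT,
    room_capacity  INTEGER,
    PRIMARY KEY (course_id, section_number)
);
CREATE TABLE professor (
    prof_id         TEXT PRIMARY KEY,
    prof_name       TEXT,
    title           TEXT,
    department_id   TEXT,
    department_name TEXT
);
CREATE TABLE student (
    student_id   TEXT PRIMARY KEY,
    student_name TEXT,
    major        TEXT
);
CREATE TABLE period (
    semester_id TEXT PRIMARY KEY,
    year        INTEGER
);
CREATE TABLE grade_fact (
    course_id      TEXT,
    section_number INTEGER,
    prof_id        TEXT REFERENCES professor(prof_id),
    student_id     TEXT REFERENCES student(student_id),
    semester_id    TEXT REFERENCES period(semester_id),
    course_grade   REAL,  -- assumed to be stored as grade points
    PRIMARY KEY (course_id, section_number, prof_id, student_id, semester_id),
    FOREIGN KEY (course_id, section_number)
        REFERENCES course_section(course_id, section_number)
);
"""


def build_star_schema():
    """Create the schema in an in-memory SQLite database and return the connection."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(DDL)
    return conn


def average_grade_by_major(conn):
    """Typical star-schema query: join the fact table to one dimension and aggregate."""
    return conn.execute(
        "SELECT s.major, AVG(f.course_grade) "
        "FROM grade_fact f JOIN student s ON s.student_id = f.student_id "
        "GROUP BY s.major ORDER BY s.major"
    ).fetchall()


conn = build_star_schema()

# Made-up sample rows, used only to exercise the schema.
conn.executemany("INSERT INTO course_section VALUES (?, ?, ?, ?, ?, ?)", [
    ("DB101", 1, "Databases", 3, "R1", 40),
    ("OS201", 1, "Operating Systems", 3, "R2", 30),
])
conn.execute("INSERT INTO professor VALUES ('P1', 'Lee', 'Professor', 'D1', 'Computer Science')")
conn.executemany("INSERT INTO student VALUES (?, ?, ?)", [
    ("S1", "Ann", "CS"),
    ("S2", "Bob", "Math"),
])
conn.execute("INSERT INTO period VALUES ('2024A', 2024)")
conn.executemany("INSERT INTO grade_fact VALUES (?, ?, ?, ?, ?, ?)", [
    ("DB101", 1, "P1", "S1", "2024A", 4.0),
    ("OS201", 1, "P1", "S1", "2024A", 3.0),
    ("DB101", 1, "P1", "S2", "2024A", 3.0),
])

print(average_grade_by_major(conn))
```

In this layout Department_Name depends on Department_ID and Room_Capacity depends on Room_ID inside their dimension tables, which is the kind of dependency Exercise E2 asks you to factor out when moving to a snowflake schema.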
