ROLL NO.:
NAME:

CS 543 – Data Warehousing
Midterm Exam
June 17, 2004
Duration: 75 minutes (3:30 to 4:45 PM)

Instructions:
(1) Please write legibly. Unreadable answers will NOT be graded. Use a BALL-POINT pen and write in a readily readable font size.
(2) Write only in the space provided for each question. If you need to continue an answer in a different space, have it signed by the instructor/invigilator; otherwise, it will not be graded.
(3) Distribute your time accordingly, as some questions might be more involved than others.
(4) Make all the usual assumptions if something is not stated explicitly in the question. If you feel your answer depends on an unstated assumption, state it explicitly.
(5) This is a closed-book, closed-notes exam.
(6) There are 19 questions in this exam. All questions (except one) have equal weight.

1. Which of the following is NOT a reason for the emergence of the data warehousing environment in the 1990s?
    a. Growing information crisis
    b. Globalization of organizations
    c. Need to be successful in a competitive environment
    d. Improvements in hardware and software technologies
    e. Automation
    f. None of the above

2. Suppose you have used the top-down approach to build a data warehouse for a multinational company. The company now wants to have subject-specific or departmental data marts. The best approach would be to
    a. Partition the data warehouse along subject lines
    b. Build conformed data marts
    c. Use the bottom-up approach to create the data marts
    d. Develop a GUI that allows subject-specific views and accesses of the data warehouse
    e. Decline the project as it would be too difficult

3. Suppose there are on average 10,000 sales transactions in a grocery store daily. If each transaction contains an average of 4 items (from 250,000 items in the store), how many rows would be added to the base fact table of the store's data mart every week?

CS 543 (Su 03/04) – Dr. Asim Karim                                        Page 1 of 5

4. Refer to question 3 above.
    Suppose the grocery store's data mart also has a 2-way aggregate fact table that summarizes weekly sales of item categories. If there are on average 100 items per category and items from all categories are sold each week, how many additional rows would be added to this fact table per week?

5. What is a snapshot fact table? Explain briefly.

6. A scalable computing platform for data warehousing should have
    a. Expandable disk storage space
    b. Parallel processing capabilities
    c. Multiple information delivery channels
    d. Query optimizations
    e. Excellent communication links
    f. a and b only
    g. a, b and c

7. What level of data granularity is best when you are unsure of the types of queries that users will pose to the data warehouse?
    a. High (i.e., coarse information)
    b. Low
    c. Medium
    d. There is no "best" level of data granularity

8. What is the difference between normalization and sub-dimensioning of a star schema? Explain briefly with an example.

9. Indicate true or false in front of each of the following statements.
    a. ERP systems may be substituted for data warehouses
    b. Metadata standards facilitate deploying a combination of best-of-breed products
    c. A separate data staging platform is essential for a data warehousing environment
    d. Business dimensions can be identified from operational system databases
    e. Transactional fact tables can grow in size very quickly over time

10. As you move up the dimension hierarchy for an aggregate fact table, the percentage of valid/filled rows
    a. Increases
    b. Decreases
    c. Remains the same
    d. Cannot predict from this information

11. (10 points) Match the columns (write the letter of the matching statement in front of each number).
    1. nonvolatile data           A. roadmap for users
    2. dual data granularity      B. subject-oriented
    3. dependent data mart        C. knowledge discovery
    4. disparate data             D. private spreadsheets
    5. decision support           E. application flavor
    6. data staging               F. because of multiple sources
    7. data mining                G. details and summary
    8. metadata                   H. read-only
    9. operational systems        I. workbench for data integration
    10. internal data             J. data from main data warehouse

12. Would you recommend normalizing a fact table? Answer yes or no, and explain very briefly why.

13. What is JAD?

14. The star schema is better than the snowflake schema in the following way:
    a. It wastes less storage space
    b. It is easier to maintain and keep consistent
    c. It is easier for end-users to understand
    d. It is more efficient for transaction storage
    e. None of the above

15. List at least two problems with using operational system keys as primary keys for the dimension tables.

16. Suppose a medical data warehouse has the dimensions patient, clinic, time, and payment_method. The fact table records medical check-up events or instances with attributes such as bill_amount, cost, body_temperature, eye_color, and number_of_tests. Draw a snowflake schema for the data warehouse. Show all relevant attributes in the dimension tables, list all primary and foreign keys, and show the relationships between the tables. (Also read questions 17 and 18; use the back side of the previous page to answer this question.)

17. Refer to the medical data warehouse described in question 16 above. For each attribute in the fact table, state whether it is additive, semi-additive, non-additive, or degenerate.

18. Refer to the medical data warehouse snowflake schema that you created in question 16 above. Suppose there are 25,000 patients, 100 clinics, 365 days, and 4 payment methods. Assume that on average 250 patients are checked per clinic per day.
    a. How many rows of the fact table are required to answer the query: What is the total cost of medical check-ups at clinic XYZ during the first 7 days of May 2004?
    b. How many table joins are needed to answer the query in 18(a) based on the snowflake schema you designed in question 16?

19. Consider the following star schema:
    Create a new star schema that includes a 1-way aggregate fact table (along product category) and a 2-way aggregate fact table (along product category and time quarters). Answer on the back side of the previous page.
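The row-count arithmetic behind questions 3, 4 and 18(a) can be sketched in a few lines of Python. This is a minimal sketch under assumptions the exam does not state: the base fact table of question 3 is taken to hold one row per item line of a sales transaction, the 2-way aggregate of question 4 one row per item category per week for this single store, and the fact table of question 18 one row per check-up. The function names are hypothetical. A different grain (for example, one row per item per day) would give different counts.

```python
# Hedged sketch of the row-count arithmetic for questions 3, 4 and 18(a).
# ASSUMPTIONS (not stated in the exam): Q3 base fact grain = one row per item
# line of a transaction; Q4 aggregate grain = one row per category per week for
# one store; Q18 fact grain = one row per check-up. Function names are hypothetical.

DAYS_PER_WEEK = 7


def base_rows_per_week(transactions_per_day: int, items_per_transaction: int) -> int:
    """Q3: one fact row per item line, accumulated over one week."""
    return transactions_per_day * items_per_transaction * DAYS_PER_WEEK


def category_aggregate_rows_per_week(total_items: int, items_per_category: int) -> int:
    """Q4: one aggregate row per category per week (every category sells each week)."""
    return total_items // items_per_category


def checkup_rows(patients_per_clinic_per_day: int, days: int, clinics: int = 1) -> int:
    """Q18(a): fact rows read for the given clinics over a range of days."""
    return patients_per_clinic_per_day * days * clinics


if __name__ == "__main__":
    print(base_rows_per_week(10_000, 4))                   # 280000
    print(category_aggregate_rows_per_week(250_000, 100))  # 2500
    print(checkup_rows(250, 7))                            # 1750
```

Under these assumptions the number of categories (250,000 / 100) fixes the weekly aggregate row count, independent of how many transactions occur.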