ROLL NO.:                NAME:

CS 543 – Data Warehousing
Midterm Exam Solution
April 15, 2006
Duration: 75 minutes (15:30 to 16:45)
CS 543 (Sp 05-06) – Dr. Asim Karim

1. Order the following decision support technologies from oldest to most recent. Write the order number (1 to 5, where 1 is the oldest and 5 is the most recent) in front of each technology.
    - (3) Executive information systems
    - (1) Ad-hoc reporting
    - (4) Data warehousing
    - (5) Data mining
    - (2) Expert systems

2. The following statement is not true regarding ERP:
    a. ERP provides a unified interface to the organization’s processes
    b. ERP integrates both internal and external data sources
    c. ERP does not create a separate integrated data repository
    d. ERP can be supported by a data warehousing environment
    e. ERP is essentially for the day-to-day operations of the organization
    f. None of the above

3. The following is not true regarding the corporate information factory (CIF) architecture:
    a. It adopts the top-down approach to data warehouse construction
    b. It advocates the use of relational modeling
    c. It creates dependent data marts
    d. It allows users direct access to the enterprise data warehouse
    e. It is advocated by Claudia Imhoff
    f. None of the above

4. Are dependent data marts conformed? Explain briefly.

    Yes, in a way, dependent data marts are conformed. They are constructed from the enterprise data warehouse, which is a consistent and integrated source of data. Dependent data marts thus conform to the standard of the enterprise data warehouse.

5. List the uses of the metadata in a data warehousing environment. Be clear and concise.

    - Promotes automation of the data warehousing environment by providing a single resource for all applications/tools
    - Provides (business) rules for ETL
    - Maintains information for the smooth operation and maintenance of the data warehouse
    - Provides instructions for the end-users on how to query and interpret the results
    - Provides information regarding the syntax and semantics of the data models in the data warehouse and the source databases

6. Draw the infrastructure diagram for a multi-tier client server data warehousing environment labeling the key functional components.

7. List at least 5 myths of the multidimensional modeling approach as pointed out by Kimball. Be clear and concise.

    - Dimensional models and data marts are for summary data only
    - Dimensional models and data marts are departmental, not enterprise, solutions
    - Dimensional models and data marts are not scalable
    - Dimensional models and data marts are only appropriate when there is a predictable usage pattern
    - Dimensional models and data marts cannot be integrated and therefore lead to stovepipe solutions

8. Suppose a retail sales fact table has the following attributes: Store (FK), Product (FK), Date (FK), Customer (FK), Invoice Number, Qty Sold, Cost, Sale Amount, Profit.

    a. Define the grain of the fact table (precisely) in words.

        Each row in the fact table describes an event (a sale) defined by one store, one product, one date, and one customer.

    b. What is the type of the non-key attributes (additive, junk, degenerate, etc.)?

        - Invoice Number: degenerate dimension (not a fact)
        - Qty Sold: fully additive fact
        - Cost: fully additive fact
        - Sale Amount: fully additive fact
        - Profit: fully additive fact. If profit is expressed as a percentage, it is non-additive.

9. Updates to a dimension table can be slow or rapid. Give an example of each type for a Customer dimension in a customer-oriented business (such as a telecommunications company).

    The customer dimension table of a telecom company can have several million rows.
    Examples of attributes that update slowly: marital status, education, citizenship, etc.
    Examples of attributes that update rapidly: age, income, address, number of children, etc.

10. Consider the following attributes of a Date dimension of a retail company: Date Key, Day, Week, Month, Year, Day of Week, Holiday Flag, Event Flag, Fiscal Month, Fiscal Quarter, Fiscal Year. Draw the hierarchies of the attributes in the dimension. Identify attributes that fall in these hierarchies and those that do not occur in any hierarchy.

    Hierarchy 1 (from summary to detail): Year, Month, Week, Day
    Hierarchy 2 (from summary to detail): Fiscal Year, Fiscal Quarter, Fiscal Month
    Attributes that do not belong to these hierarchies: Day of Week, Holiday Flag, Event Flag

11. Which of the following dimension types require creation of a new dimension table in the schema:
    a. Sub-dimension
    b. Junk dimension
    c. Degenerate dimension
    d. a and b
    e. a, b, and c

12. The primary purpose of a data warehouse is strategic information delivery. Identify additional/side benefits to an organization that constructs a data warehouse. Be clear and concise.

    - Improvement in the quality of data
    - Explicit analysis and formalization of business processes and rules
    - Central repository for the organization’s data
    - Standardization of terms and terminologies across the organization
    - Enhanced awareness of the importance of data collection and analysis

13. A bank wants to analyze month-ending balances in its various accounts. An efficient way to perform such analyses is to have a
    a. Transaction fact table
    b. Base fact table
    c. Snapshot fact table
    d. Core fact table
    e. ________________ fact table (specify your own, and justify)

14. Suppose a data mart has 4 dimensions with 10,000, 20,000, 50,000, and 100,000 rows respectively. What is the number of allowable events according to this dimensional model? In general, how many rows would the fact table have (fewer, more, equal, not able to decide)?
Justify your answer.

    Number of allowable events = 10,000 * 20,000 * 50,000 * 100,000 = 10^18 (1,000,000,000,000,000,000)
    In general, the fact table will have fewer rows, as the number of actual events is always less than or equal to the number of allowable events.

15. Match the columns (write the matching statement’s letter in front of the number).

    1. nonvolatile data        (D)    A. subject-oriented
    2. dual data granularity   (C)    B. private spreadsheets
    3. dependent data mart     (E)    C. details and summary
    4. internal data           (B)    D. read-only
    5. decision support        (A)    E. data from main data warehouse

16. What is a data warehouse bus matrix? How is it related to an information package diagram? Explain briefly with an example.

    The data warehouse bus matrix identifies the business processes and the dimensions along which these processes are analyzed. The business processes are listed in rows and the dimensions in columns, with a cross (X) wherever a process uses a dimension. The information package diagram captures information for one business process only. It identifies dimensions, dimension attributes, and facts.

17. (10 points) Design a dimensional model for course registration in the School of Arts and Sciences at LUMS. The system maintains student registration in courses every quarter over several years. Several majors are offered in the school, and the school is made up of several academic departments to which faculty are associated. As a minimum, the model should be able to answer the following queries efficiently:
    - How many courses did student XYZ register/enroll in during Spring 2005?
    - What is the total number of courses offered by faculty in the CS department in 2005?
    - How many units of courses were offered in the ECON major during Spring 2005?
    - What was the average GPA of students majoring in CS during Spring 2005?
Identify the dimensions, facts, and their attributes. Draw the resulting dimensional model schema. Provide justifications for your design.
Ensure that your design follows good practices of dimensional modeling. (Do this on the back-side of the PREVIOUS page)

18. (10 points) Consider the following dimensional model for postpaid billing at a wireless telecom company. Clearly identify the problems in this design and suggest ways of resolving them. Present your answer as a list clearly numbering the problem, describing the problem, and suggesting the solution. Draw the final corrected schema.
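
Note on question 14: the count of allowable events is simply the product of the dimension table sizes, which can be checked with a short script. The sketch below is illustrative only and is not part of the original solution. The function names and the fact row count used for the density calculation are hypothetical values chosen for the example.

```python
from math import prod

# Dimension row counts taken from question 14.
dimension_rows = [10_000, 20_000, 50_000, 100_000]


def allowable_events(rows):
    """Return the number of distinct key combinations (product of dimension sizes)."""
    return prod(rows)


def density(actual_fact_rows, rows):
    """Return the fraction of allowable key combinations that occur as fact rows."""
    return actual_fact_rows / allowable_events(rows)


total = allowable_events(dimension_rows)
print(f"Allowable events: {total:,}")

# Hypothetical fact table size, assumed here only to illustrate sparsity.
hypothetical_fact_rows = 500_000_000
print(f"Density: {density(hypothetical_fact_rows, dimension_rows):.2e}")
```

Even a fact table with hundreds of millions of rows would fill only a tiny fraction of the allowable combinations, which is why the fact table is expected to have far fewer rows than the product of the dimension sizes.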