ROLL NO.:
NAME:
CS 543 – Data Warehousing
Midterm Exam Solution
April 15, 2006
Duration: 75 minutes (15:30 to 16:45)
1. Order the following decision support technologies from oldest to most recent. Write
the order number (1 to 5, where 1 is the oldest and 5 is the most recent) in front of
each technology.
- (3) Executive information systems
- (1) Ad-hoc reporting
- (4) Data warehousing
- (5) Data mining
- (2) Expert systems
2. The following statement is not true regarding ERP:
a. ERP provides a unified interface to the organization’s processes
b. ERP integrates both internal and external data sources
c. ERP does not create a separate integrated data repository
d. ERP can be supported by a data warehousing environment
e. ERP is essentially for the day-to-day operations of the organization
f. None of the above
3. The following is not true regarding the corporate information factory (CIF)
architecture:
a. It adopts the top-down approach to data warehouse construction
b. It advocates the use of relational modeling
c. It creates dependent data marts
d. It allows users direct access to the enterprise data warehouse
e. It is advocated by Claudia Imhoff
f. None of the above
4. Are dependent data marts conformed? Explain briefly.
Yes, in a way, dependent data marts are conformed. They are constructed from the
enterprise data warehouse, which is a consistent and integrated source of data. Dependent
data marts thus conform to the standards of the enterprise data warehouse.
5. List the uses of the metadata in a data warehousing environment. Be clear and
concise.

- Promotes automation of the data warehousing environment by providing a single
  resource for all applications/tools
CS 543 (Sp 05-06) – Dr. Asim Karim
Page 1 of 5

- Provides (business) rules for ETL
- Maintains information for the smooth operation and maintenance of the data
  warehouse
- Provides instructions for end users on how to query and interpret the results
- Provides information on the syntax and semantics of the data models in the
  data warehouse and the source databases
6. Draw the infrastructure diagram for a multi-tier client server data warehousing
environment labeling the key functional components.
7. List at least 5 myths of the multidimensional modeling approach as pointed out by
Kimball. Be clear and concise.
- Dimensional models and data marts are for summary data only
- Dimensional models and data marts are departmental, not enterprise, solutions
- Dimensional models and data marts are not scalable
- Dimensional models and data marts are only appropriate when there is a
  predictable usage pattern
- Dimensional models and data marts cannot be integrated and therefore lead to
  stovepipe solutions
8. Suppose a retail sales fact table has the following attributes: Store (FK), Product (FK),
Date (FK), Customer (FK), Invoice Number, Qty Sold, Cost, Sale Amount, Profit.
a. Define the grain of the fact table (precisely) in words
Each row in the fact table describes an event (a sale) defined by one store, one product,
one date, and one customer
b. What is the type of the non-key attributes? (additive, junk, degenerate, etc.)
Invoice Number – degenerate dimension (not a fact)
Qty Sold – fully additive fact
Cost – fully additive fact
Sale Amount – fully additive fact
Profit – fully additive fact. If profit is expressed as a percentage (e.g., profit margin),
it is non-additive: it must be recomputed from summed profit and sales, not summed.
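As a rough illustration of the difference between additive facts and a percentage, the sketch below uses a few made-up fact rows (the store, product, customer, and amount values are hypothetical, not from the exam). Qty Sold, Cost, Sale Amount, and Profit can be summed along any dimension, whereas a profit percentage gives a wrong answer if it is summed or averaged row by row and has to be recomputed from the summed facts.

```python
from collections import defaultdict

# Hypothetical fact rows: (store, product, date, customer, invoice_no,
#                          qty_sold, cost, sale_amount, profit)
facts = [
    ("S1", "P1", "2006-04-01", "C1", "INV-1", 2, 80.0, 100.0, 20.0),
    ("S1", "P2", "2006-04-01", "C2", "INV-2", 1, 45.0, 50.0, 5.0),
    ("S2", "P1", "2006-04-02", "C3", "INV-3", 3, 120.0, 150.0, 30.0),
]


def total_by_store(rows):
    """Sum the additive facts grouped by store (any dimension would work)."""
    totals = defaultdict(lambda: {"qty": 0, "cost": 0.0, "sales": 0.0, "profit": 0.0})
    for store, _prod, _date, _cust, _inv, qty, cost, sales, profit in rows:
        t = totals[store]
        t["qty"] += qty
        t["cost"] += cost
        t["sales"] += sales
        t["profit"] += profit
    return dict(totals)


def profit_margin_pct(rows):
    """Profit as a percentage of sales, recomputed from the summed facts."""
    total_profit = sum(r[8] for r in rows)
    total_sales = sum(r[7] for r in rows)
    return 100.0 * total_profit / total_sales


# Per-row margins are 20%, 10%, and 20%. Their sum (50%) is meaningless, and
# their plain average (about 16.67%) differs from the true overall margin.
row_margins = [100.0 * r[8] / r[7] for r in facts]
```

Here `profit_margin_pct(facts)` gives about 18.33% (55 profit on 300 sales), which is the figure a report should show; averaging `row_margins` would not reproduce it.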
9. Updates to a dimension table can be slow or rapid. Give an example of each type for a
Customer dimension in a customer-oriented business (such as a telecommunications
company).
The customer dimension table of a telecom company can have several million rows.
Examples of attributes that update slowly: marital status, education, citizenship, etc.
Examples of attributes that update rapidly: age, income, address, number of children, etc.
10. Consider the following attributes of a Date dimension of a retail company:
Date Key, Day, Week, Month, Year, Day of Week, Holiday Flag, Event Flag, Fiscal
Month, Fiscal Quarter, Fiscal Year
Draw the hierarchies of the attributes in the dimension. Identify attributes that fall in
these hierarchies and those that do not occur in any hierarchy.
Hierarchy 1 (from summary to detail)
Year, Month, Week, Day
Hierarchy 2
Fiscal Year, Fiscal Quarter, Fiscal Month
Attributes that do not belong to these hierarchies
Day of Week, Holiday Flag, Event Flag
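The two hierarchies and the free-standing attributes can be seen by deriving one Date dimension row from a calendar date. This is a minimal sketch under stated assumptions: the fiscal year is assumed to start in July and to be named after the calendar year in which it ends, and the holiday and event sets and the function name `date_row` are hypothetical. Week is taken as the ISO week number.

```python
from datetime import date

FISCAL_YEAR_START_MONTH = 7          # assumption: fiscal year begins in July
HOLIDAYS = {date(2006, 3, 23)}       # hypothetical holiday list
EVENT_DAYS = set()                   # hypothetical promotional-event days
DAY_NAMES = ("Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday")


def date_row(d: date) -> dict:
    # Fiscal month 1 is the first month of the (assumed) fiscal year.
    fiscal_month = (d.month - FISCAL_YEAR_START_MONTH) % 12 + 1
    # Assumed convention: fiscal year named after the year in which it ends.
    fiscal_year = d.year + 1 if d.month >= FISCAL_YEAR_START_MONTH else d.year
    return {
        "date_key": int(d.strftime("%Y%m%d")),
        # Calendar hierarchy (summary to detail)
        "year": d.year,
        "month": d.month,
        "week": d.isocalendar()[1],
        "day": d.day,
        # Fiscal hierarchy (summary to detail)
        "fiscal_year": fiscal_year,
        "fiscal_quarter": (fiscal_month - 1) // 3 + 1,
        "fiscal_month": fiscal_month,
        # Attributes that belong to neither hierarchy
        "day_of_week": DAY_NAMES[d.weekday()],
        "holiday_flag": d in HOLIDAYS,
        "event_flag": d in EVENT_DAYS,
    }
```

For example, `date_row(date(2006, 4, 15))` places that Saturday in fiscal month 10, fiscal quarter 4 of fiscal year 2006 under the July-start assumption, while Day of Week and the two flags are descriptive attributes that do not roll up into anything.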
11. Which of the following dimension types require creation of a new dimension table in
the schema:
a. Sub-dimension
b. Junk dimension
c. Degenerate dimension
d. a and b
e. a, b, and c
12. The primary purpose of a data warehouse is strategic information delivery. Identify
additional/side benefits to an organization that constructs a data warehouse. Be clear
and concise.
- Improvement in the quality of data
- Explicit analysis and formalization of business processes and rules
- Central repository for the organization's data
- Standardization of terms and terminologies across the organization
- Enhanced awareness of the importance of data collection and analysis
13. A bank wants to analyze month-ending balances in its various accounts. An efficient
way to perform such analyses is to have
a. Transaction fact table
b. Base fact table
c. Snapshot fact table
d. Core fact table
e. ________________ fact table (specify your own, and justify)
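To see why a periodic snapshot suits month-end balances, the sketch below rolls hypothetical account transactions up into one row per account per month holding the closing balance, which is the kind of row a monthly snapshot fact table stores. The account IDs, amounts, and the function name `month_end_snapshot` are illustrative assumptions. Months with no activity still receive a row, because the balance is carried forward.

```python
from collections import defaultdict
from datetime import date

# Hypothetical transaction facts: (account, posting date, signed amount)
transactions = [
    ("A1", date(2006, 1, 5), 1000.0),
    ("A1", date(2006, 1, 20), -200.0),
    ("A2", date(2006, 1, 11), 500.0),
    ("A1", date(2006, 2, 3), 300.0),
]


def month_end_snapshot(txns):
    """Return {(account, (year, month)): closing balance} for every month from
    an account's first transaction to the last month present in the data."""
    # Net change per account per month.
    net = defaultdict(float)
    for acct, d, amt in txns:
        net[(acct, (d.year, d.month))] += amt
    last_month = max((d.year, d.month) for _, d, _ in txns)

    snapshot = {}
    for acct in sorted({a for a, _, _ in txns}):
        y, m = min(ym for a, ym in net if a == acct)
        balance = 0.0
        # Carry the balance forward so inactive months still get a row.
        while (y, m) <= last_month:
            balance += net.get((acct, (y, m)), 0.0)
            snapshot[(acct, (y, m))] = balance
            y, m = (y + 1, 1) if m == 12 else (y, m + 1)
    return snapshot
```

Querying month-end balances then becomes a direct lookup or filter on the snapshot rows instead of a scan over all transactions. The balance itself is semi-additive: it can be summed across accounts for one month, but not across months for one account.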
14. Suppose a data mart has 4 dimensions with 10,000, 20,000, 50,000, and 100,000 rows
respectively. What is the number of allowable events according to this dimensional
model? In general, how many rows would the fact table have? (lesser, greater, equal,
not able to decide). Justify your answer.
Number of allowable events = 10,000 * 20,000 * 50,000 * 100,000 = 10^18
(1,000,000,000,000,000,000)
In general, the fact table will have fewer rows, as the number of actual events is always
less than or equal to the number of allowable events, and in practice most combinations of
dimension rows never occur (the fact table is sparse).
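The number of allowable events is the size of the Cartesian product of the dimension tables, so the arithmetic can be checked in a few lines. The variable names are illustrative, and the one-billion-row fact table is a hypothetical figure used only to show how sparse even a very large fact table would be.

```python
from math import prod

# Row counts of the four dimension tables in the question.
dimension_rows = [10_000, 20_000, 50_000, 100_000]

# Every allowable event pairs one row from each dimension, so the count is
# the size of the Cartesian product of the dimension tables.
allowable_events = prod(dimension_rows)

# Hypothetical fact table size, used only to illustrate sparsity.
hypothetical_fact_rows = 1_000_000_000
density = hypothetical_fact_rows / allowable_events

print(f"{allowable_events:,}")  # 1,000,000,000,000,000,000
```

Even at a billion rows, the fact table would occupy a fraction of 10^-9 (0.0000001%) of the allowable event space.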
15. Match the columns (write the matching statements’ letter in front of the number).
1. nonvolatile data (D)         A. subject-oriented
2. dual data granularity (C)    B. private spreadsheets
3. dependent data mart (E)      C. details and summary
4. internal data (B)            D. read-only
5. decision support (A)         E. data from main data warehouse
16. What is a data warehouse bus matrix? How is it related to an information package
diagram? Explain briefly with an example.
The data warehouse bus matrix identifies the business processes and the dimensions
along which these processes are analyzed. The business processes are listed in rows and
the dimensions along columns with a cross (X) if a process uses a dimension.
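The information package diagram captures information for one business process only. It
identifies that process's dimensions, dimension attributes, and facts. Each row of the bus
matrix therefore corresponds to one information package diagram, and the matrix relates
these diagrams to one another through the dimensions they share.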
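A small example may help. The sketch below encodes a bus matrix for a hypothetical retailer (the processes and dimensions are made up for illustration) as a mapping from each business process to the dimensions it uses. It renders the matrix with processes in rows, dimensions in columns, and an X where a process uses a dimension, and it lists the dimensions shared by more than one process, which are the ones that must be conformed.

```python
# Hypothetical bus matrix: business process -> dimensions it is analyzed by.
BUS_MATRIX = {
    "Retail sales": {"Date", "Product", "Store", "Promotion", "Customer"},
    "Store inventory": {"Date", "Product", "Store"},
    "Purchase orders": {"Date", "Product", "Vendor"},
}
DIMENSIONS = ["Date", "Product", "Store", "Promotion", "Customer", "Vendor"]


def render(matrix, dims):
    """Return the matrix as text: processes in rows, dimensions in columns."""
    width = max(len(p) for p in matrix) + 2
    lines = [" " * width + " ".join(f"{d:<9}" for d in dims)]
    for process, used in matrix.items():
        cells = " ".join(f"{('X' if d in used else ''):<9}" for d in dims)
        lines.append(f"{process:<{width}}" + cells)
    return "\n".join(lines)


def shared_dimensions(matrix, dims):
    """Dimensions used by more than one process (candidates for conforming)."""
    return {d for d in dims if sum(d in used for used in matrix.values()) > 1}


print(render(BUS_MATRIX, DIMENSIONS))
```

Reading one row, for example `BUS_MATRIX["Store inventory"]`, gives the dimension list that an information package diagram for the inventory process would then expand with dimension attributes and facts.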
17. (10 points) Design a dimensional model for course registration in the School of Arts
and Sciences at LUMS. The system maintains student registration in courses every
quarter over several years. Several majors are offered in the school, and the school is
made up of several academic departments to which faculty are associated. As a
minimum, the model should be able to answer the following queries efficiently:
- How many courses did student XYZ register/enroll in Spring 2005?
- What is the total number of courses offered by faculty in the CS department in
2005?
- How many units of courses were offered in the ECON major during Spring 2005?
- What was the average GPA of students majoring in CS during Spring 2005?
Identify the dimensions, facts, and their attributes. Draw the resulting dimensional model
schema. Provide justifications for your design. Ensure that your design follows good
practices of dimensional modeling.
(Do this on the back-side of the PREVIOUS page)
18. (10 points) Consider the following dimensional model for postpaid billing at a
wireless telecom company.
Clearly identify the problems in this design and suggest ways of resolving them. Present
your answer as a list clearly numbering the problem, describing the problem, and
suggesting the solution.
Draw the final corrected schema.