ROLL NO.:
NAME:
CS 543 – Data Warehousing
Midterm Exam
June 17, 2004
Duration: 75 minutes (3:30 to 4:45 PM)
Instructions:
(1) Please write legibly. Unreadable answers will NOT be graded. Use a BALL-POINT
and write in a readily readable font size.
(2) Write in the space provided for each question only. If you need to continue a question
in a different space, have it signed by the instructor/invigilator. Otherwise, it will not
be graded.
(3) Distribute your time accordingly as some questions might be more involved than
others.
(4) Make all usual assumptions if something is not stated in the question explicitly. If you
feel your answer depends on an unstated assumption, then state it explicitly.
(5) This is a closed-book, closed-notes exam.
(6) There are 19 questions in this exam. All questions (except one) have equal weight.
1. The following is NOT a reason for the emergence of the data warehousing
environment in the 1990s:
a. Growing information crisis
b. Globalization of organizations
c. Need to be successful in a competitive environment
d. Improvements in hardware and software technologies
e. Automation
f. None of the above
2. Suppose you have used the top-down approach to build a data warehouse for a
multinational company. The company now wants to have subject-specific or
departmental data marts. The best approach would be to
a. Partition the data warehouse along subject lines
b. Build conformed data marts
c. Use the bottom-up approach to create the data marts
d. Develop a GUI that allows subject-specific views of, and access to, the data
warehouse
e. Decline the project as it would be too difficult
3. Suppose there are on average 10,000 sale transactions in a grocery store daily. If the
transactions contain an average of 4 items (from 250,000 items in store), how many
rows would be added to the store’s data mart’s base fact table every week?
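One way to set up the arithmetic for this question is sketched below. It is a minimal sketch, not an official solution: it assumes the base fact table stores one row per item line within a transaction and that the store trades 7 days a week, neither of which the question states. The function name `base_fact_rows_per_week` is hypothetical.

```python
# Assumptions (not stated in the question): the grain of the base fact table
# is one row per item line in a transaction, and the store is open 7 days a week.
TRANSACTIONS_PER_DAY = 10_000
ITEMS_PER_TRANSACTION = 4
DAYS_PER_WEEK = 7


def base_fact_rows_per_week(transactions_per_day, items_per_transaction, days):
    """Rows added per week when each item line becomes one fact row."""
    return transactions_per_day * items_per_transaction * days


rows_per_week = base_fact_rows_per_week(
    TRANSACTIONS_PER_DAY, ITEMS_PER_TRANSACTION, DAYS_PER_WEEK
)
print(rows_per_week)  # 280000
```

Under this grain the total of 250,000 items in the store does not enter the count; it only bounds how many distinct item keys can appear.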
CS 543 (Su 03/04) – Dr. Asim Karim
Page 1 of 5
4. Refer to question 3 above. Suppose the grocery store’s data mart also has a 2-way
aggregate fact table that summarizes weekly sales of item categories. If there are on
average 100 items per category and items from all categories are sold each week, how
many additional rows would be added to this fact table per week?
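The count for this question can be sketched in the same way. This is a minimal sketch under stated assumptions rather than an official solution: it takes the 250,000 items from question 3, assumes a single store, and assumes the aggregate table holds one row per (item category, week) combination.

```python
# Assumptions (not stated in the question): a single store, and an aggregate
# grain of one row per (item category, week).
ITEMS_IN_STORE = 250_000      # from question 3
ITEMS_PER_CATEGORY = 100      # average given in question 4
WEEKS_ADDED = 1               # rows are counted for one new week

# Number of item categories implied by the averages.
categories = ITEMS_IN_STORE // ITEMS_PER_CATEGORY

# Every category sells each week, so each contributes one row per week.
aggregate_rows_per_week = categories * WEEKS_ADDED
print(aggregate_rows_per_week)  # 2500
```

If the data mart covered several stores and the aggregate kept the store dimension, the figure would be multiplied by the number of stores.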
5. What is a snapshot fact table? Explain briefly.
6. A scalable computing platform for data warehousing should have
a. Expandable disk storage space
b. Parallel processing capabilities
c. Multiple information delivery channels
d. Query optimizations
e. Excellent communication links
f. a and b only
g. a, b and c
7. What level of data granularity is best when you are unsure of the types of queries that
users will pose to the data warehouse?
a. High (i.e. coarse information)
b. Low
c. Medium
d. There is no “best” level of data granularity
8. What is the difference between normalization and sub-dimensioning of a star schema?
Explain briefly with an example.
9. Indicate true or false in front of the following statements.
a. ERP systems may be substituted for data warehouses
b. Metadata standards facilitate deploying a combination of best-of-breed
products
c. A separate data staging platform is essential for a data warehousing
environment
d. Business dimensions can be identified from operational system databases
e. Transactional fact tables can grow in size very quickly over time
10. As you move up the dimension hierarchy for an aggregate fact table, the percentage
of valid/filled rows
a. Increases
b. Decreases
c. Remains the same
d. Cannot predict from this information
11. (10 points) Match the columns (write the matching statement's letter in front of the
number).
1. nonvolatile data            A. roadmap for users
2. dual data granularity       B. subject-oriented
3. dependent data mart         C. knowledge discovery
4. disparate data              D. private spreadsheets
5. decision support            E. application flavor
6. data staging                F. because of multiple sources
7. data mining                 G. details and summary
8. metadata                    H. read-only
9. operational systems         I. workbench for data integration
10. internal data              J. data from main data warehouse
12. Would you recommend normalizing a fact table? Answer yes or no, and explain very
briefly why.
13. What is JAD?
14. The star schema is better than the snowflake schema in the following way:
a. It wastes less storage space
b. It is easier to maintain and keep consistent
c. It is easier for end-users to understand
d. It is more efficient for transaction storage
e. None of the above
15. List at least two problems with using operational system keys as primary keys for the
dimension tables.
16. Suppose a medical data warehouse has the dimensions patient, clinic, time, and
payment_method. The fact table records medical check-up events or instances with
attributes like bill_amount, cost, body_temperature, eye_color, and number_of_tests.
Draw a snowflake schema for the data warehouse. Show all relevant attributes in the
dimension tables. List all primary and foreign keys. Show the relationships between
the tables (Read questions 17 and 18 also; use the backside of the previous page to
answer this question).
17. Refer to the medical data warehouse described in question 16 above. For each of the
attributes in the fact table, mention whether it is additive, semi-additive, non-additive,
or degenerate.
18. Refer to the medical data warehouse snowflake schema that you created in question
16 above. Suppose there are 25,000 patients, 100 clinics, 365 days, and 4 payment
methods. Assume on average 250 patients are checked per clinic per day.
a. How many rows of the fact table are required to answer the query: What is the
total cost of medical check-ups at clinic XYZ during the first 7 days of May,
2004?
b. How many table joins are needed to answer the query in 18(a) based on the
snowflake schema you designed in question 16?
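For part (a), the row count can be estimated as sketched below. This is a minimal sketch, not an official solution: it assumes one fact row per check-up event and that clinic XYZ sees the stated average of 250 patients per day. Part (b) depends on the schema drawn in question 16, so it is not computed here.

```python
# Assumptions (not stated in the question): one fact row per check-up event,
# and clinic XYZ matches the stated per-clinic daily average.
CHECKUPS_PER_CLINIC_PER_DAY = 250
DAYS_IN_QUERY = 7       # first 7 days of May 2004
CLINICS_IN_QUERY = 1    # clinic XYZ only

# Fact rows that must be read to total the cost for the query in 18(a).
fact_rows_needed = CHECKUPS_PER_CLINIC_PER_DAY * DAYS_IN_QUERY * CLINICS_IN_QUERY
print(fact_rows_needed)  # 1750
```

The patient and payment_method counts do not affect this estimate, because the query constrains only the clinic and time dimensions.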
19. Consider the following star schema:
Create a new star schema that includes a 1-way aggregate fact table (along product
category) and a 2-way aggregate fact table (along product category and time quarters).
Answer on the backside of the previous page.