Exercises (Course: Database Management Systems)

Review Questions and Exercises
(Course: Database Management Systems)
Chapter 1
Disk Storage, Basic File Structures and Hashing
Review Questions
Q1. What is the difference between a file organization and an access method?
Q2. What is the difference between static and dynamic files?
Q3. What are the typical record-at-a-time operations for accessing a file? Which of these
depend on the current record of a file?
Q4. Discuss the advantages and disadvantages of using (a) an unordered file, (b) an
ordered file, and (c) a static hash file with buckets and chaining. Which operations can
be performed efficiently on each of these organizations, and which operations are
expensive?
Q5. Discuss the techniques for allowing a hash file to expand and shrink dynamically.
What are the advantages and disadvantages of each?
Exercises
E1
Consider a disk with the following characteristics: block size B = 512 bytes, interblock gap
size G = 128 bytes; number of blocks per track = 20; number of tracks per surface = 400.
A disk pack consists of 15 double-sided disks.
a. What is the total capacity of a track, and what is its useful capacity (excluding interblock
gaps)?
b. How many cylinders are there?
c. What are the total capacity and the useful capacity of a cylinder?
d. What are the total capacity and the useful capacity of a disk pack?
e. Suppose that the disk drive rotates the disk pack at a speed of 2400 rpm (revolutions
per minute); what are the transfer rate (tr) in bytes/msec and the block transfer time (btt)
in msec? What is the average rotational delay (rd) in msec? What is the bulk transfer
rate? (See Appendix C)
f. Suppose that the average disk seek time is 30 msec. How much time does it take (on
the average) in msec to locate and transfer a single block, given its block address?
g. Calculate the average time it would take to transfer 20 random blocks, and compare
this with the time it would take to transfer 20 consecutive blocks using double buffering
to save seek time and rotational delay.
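The track and timing quantities asked for in parts (a) and (e) can be sketched in Python as follows. This is a minimal sketch using the usual textbook definitions (transfer rate = track size / time per revolution, bulk transfer rate = tr * B / (B + G)); the function name disk_timings and the sample values are illustrative assumptions, not the values of this exercise.

```python
def disk_timings(block_size, gap_size, blocks_per_track, rpm):
    """Return per-track capacity (bytes) and timing figures (msec)."""
    track_total = blocks_per_track * (block_size + gap_size)  # gaps included
    track_useful = blocks_per_track * block_size               # data only
    rotation_ms = 60_000 / rpm                                 # one revolution
    tr = track_total / rotation_ms                             # bytes per msec
    return {
        "track_total": track_total,
        "track_useful": track_useful,
        "tr": tr,                                   # transfer rate
        "btt": block_size / tr,                     # block transfer time
        "rd": rotation_ms / 2,                      # average rotational delay
        "btr": tr * block_size / (block_size + gap_size),  # bulk transfer rate
    }


# Illustrative values only (assumed, not the exercise's parameters).
figures = disk_timings(block_size=1024, gap_size=256, blocks_per_track=10, rpm=3000)
for name, value in figures.items():
    print(name, value)
```

The time to locate and transfer one block (part f) is then the seek time plus rd plus btt.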
E2
A file has r = 20000 STUDENT records of fixed length. Each record has the following
fields: NAME ( 30 bytes), SSN (9 bytes), ADDRESS (40 bytes), PHONE (9 bytes),
BIRTHDATE (8 bytes), SEX (1 byte), MAJORDEPTCODE (4 bytes),
MINORDEPTCODE (4 bytes), CLASSCODE (4 bytes, integer), and
DEGREEPROGRAM (3 bytes). An additional byte is used as a deletion marker. The file
is stored on the disk whose parameters are given in Exercise E1.
a. Calculate the record size R in bytes.
b. Calculate the blocking factor bfr and the number of file blocks b, assuming an unspanned organization.
c. Calculate the average time it takes to find a record by doing a linear search on the file
if (i) the file blocks are stored contiguously, and double buffering is used; (ii) the file blocks
are not stored contiguously.
d. Assume that the file is ordered by SSN; calculate the time it takes to search for a
record given its SSN value, by doing a binary search.
E3.
A PARTS file with Part# as hash key includes records with the following Part# values:
2369, 3760, 4692, 4871, 5659, 1821, 1074, 7115, 1620, 2428, 3943, 4750, 6975, 4981,
9208. The file uses eight buckets, numbered 0 to 7. Each bucket is one disk block and
holds two records. Load these records into the file in the given order, using the hash
function h(K) = K mod 8. Calculate the average number of block accesses for a random
retrieval on Part#.
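The loading procedure can be sketched in Python as below. This is a sketch under stated assumptions: overflow records are kept in a per-bucket chain, a record in a primary bucket costs one block access, and the i-th record of an overflow chain costs 1 + i accesses (one chained block per overflow record). The function names and the sample keys are hypothetical and are not the keys of this exercise.

```python
def load_static_hash(keys, num_buckets, bucket_capacity):
    """Place keys with h(K) = K mod num_buckets; extra keys go to a chain."""
    buckets = [[] for _ in range(num_buckets)]
    overflow = [[] for _ in range(num_buckets)]
    for key in keys:
        b = key % num_buckets
        if len(buckets[b]) < bucket_capacity:
            buckets[b].append(key)
        else:
            overflow[b].append(key)   # bucket full: chain the record
    return buckets, overflow


def average_accesses(buckets, overflow):
    """Average block accesses per retrieval under the cost assumption above."""
    primary = sum(len(b) for b in buckets)           # 1 access each
    chained = sum(position + 2                       # 2, 3, 4, ... accesses
                  for chain in overflow
                  for position in range(len(chain)))
    records = primary + sum(len(chain) for chain in overflow)
    return (primary + chained) / records


# Illustrative keys only: 10, 18 and 26 all hash to bucket 2.
buckets, overflow = load_static_hash([10, 18, 26, 5, 7], num_buckets=8, bucket_capacity=2)
print(buckets[2], overflow[2])               # [10, 18] [26]
print(average_accesses(buckets, overflow))   # (4 * 1 + 1 * 2) / 5
```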
E4.
Load the records of Exercise E3 into expandable hash files based on extendible hashing.
Show the structure of the directory at each step, and the global and local depths. Use the
hash function h(K) = K mod 128.
E5.
Load the records of Exercise E3 into an expandable hash file, using linear hashing. Start
with a single disk block, using the hash function h0 = K mod 2^0, and show how the file
grows and how the hash functions change as the records are inserted. Assume that blocks
are split whenever an overflow occurs, and show the value of n at each stage.
E6.
Suppose that we have a hash file of fixed-length records, and suppose that overflow is
handled by chaining. Outline algorithms for insertion, deletion, and modification of a file
record. State any assumptions you make.
E7.
Can you think of techniques other than chaining to handle bucket overflow in external
hashing?
E8.
Write pseudo-code for the insertion algorithms for linear hashing and for extendible
hashing.
Chapter 2
Indexing Structures for Files
Review Questions
Q1. Define the following terms : indexing field, primary key field, clustering field,
secondary key field, block anchor, dense index, nondense (sparse) index
Q2. What are the differences among primary, secondary, and clustering indexes? How do
these differences affect the ways in which these indexes are implemented? Which of the
indexes are dense, and which are not?
Q3. Why can we have at most one primary or clustering index on a file, but several secondary
indexes?
Q4. How does multilevel indexing improve the efficiency of searching an index file?
Q5. What is the order p of a B-tree? Describe the structure of B-tree nodes.
Q6. What is the order p of a B+-tree? Describe the structure of both internal and leaf
nodes of a B+-tree.
Q7. How does a B-tree differ from a B+-tree? Why is a B+-tree usually preferred as an
access structure to a data file?
Exercises
E1.
Consider a disk with block size B = 512 bytes. A block pointer is P = 6 bytes long, and a
record pointer is PR = 7 bytes long. A file has r = 30,000 EMPLOYEE records of fixed
length. Each record has the following fields: NAME (30 bytes), SSN (9 bytes),
DEPARTMENTCODE (9 bytes), ADDRESS (40 bytes), PHONE (9 bytes),
BIRTHDATE (8 bytes), SEX (1 byte), JOBCODE (4 bytes), SALARY (4 bytes, real
number). An additional byte is used as a deletion marker.
a. Calculate the record size R in bytes.
b. Calculate the blocking factor bfr and the number of file blocks b, assuming an unspanned organization.
c. Suppose that the file is ordered by the key field SSN and we want to construct a
primary index on SSN. Calculate (i) the index blocking factor bfri (which is also the
index fanout fo); (ii) the number of first-level index entries and the number of first-level
index blocks; (iii) the number of levels needed if we make it into a multilevel index; (iv)
the total number of blocks required by the multilevel index; and (v) the number of block
accesses needed to search for and retrieve a record from the file – given its SSN value –
using the primary index.
d. Suppose that the file is not ordered by the key field SSN and we want to construct a
secondary index on SSN. Repeat the previous exercise (part c) for the secondary index
and compare with the primary index.
e. Suppose that the file is not ordered by the nonkey field DEPARTMENTCODE and we
want to construct a secondary index on DEPARTMENTCODE, using blocks of record
pointers, with an extra level of indirection that stores record pointers. Assume there are
1000 distinct values of DEPARTMENTCODE and that the EMPLOYEE records are
evenly distributed among these values. Calculate (i) the index blocking factor bfri (which
is also the index fanout fo); (ii) the number of blocks needed by the level of indirection
that stores record pointers; (iii) the number of first-level index entries and the number of
first-level index blocks; (iv) the number of levels needed if we make it into a multilevel
index; (v) the total number of blocks required by the multilevel index and the blocks used
in the extra level of indirection; (vi) the approximate number of block accesses needed to
search for and retrieve all records in the file that have a specific DEPARTMENTCODE
value – using the index.
f. Suppose that the file is ordered by the nonkey field DEPARTMENTCODE and we
want to construct a clustering index on DEPARTMENTCODE that uses block anchors
(every new value of DEPARTMENTCODE starts at the beginning of a new block).
Assume there are 1000 distinct values of DEPARTMENTCODE and that the
EMPLOYEE records are evenly distributed among these values. Calculate (i) the index
blocking factor bfri, (which is also the index fanout fo); (iii) the number of levels needed
if we make it into a multilevel index; (iv) the total number of blocks required by the
multilevel index; (vi) the number of block accesses needed to search for and retrieve all
records in the file that have a specific DEPARTMENTCODE value – using the clustering
index (assume that multiple blocks in a cluster are contiguous).
g. Suppose that the file is not ordered by the key field SSN and we want to construct a
B+-tree access structure (index) on SSN. Calculate (i) the orders p and pleaf of the B+-tree;
(ii) the number of leaf-level blocks needed if blocks are approximately 69% full (rounded
up for convenience); (iii) the number of levels needed if internal nodes are also 69
percent full (rounded up for convenience); (iv) the total number of blocks required by the
B+-tree; and (v) the number of block accesses needed to search for and retrieve a record
from the file – given its SSN value – using the B+-tree.
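The block-count arithmetic that recurs in parts (b) to (f) can be sketched in Python as follows. This is a minimal sketch assuming unspanned records and an index whose every level has the same fanout; the helper names and the sample sizes are illustrative assumptions, not the values of this exercise.

```python
def blocking_factor(block_size, record_size):
    """Records (or index entries) per block, unspanned organization."""
    return block_size // record_size


def blocks_needed(num_records, bfr):
    """Blocks required to hold num_records, rounded up."""
    return -(-num_records // bfr)


def multilevel_index_blocks(first_level_entries, fanout):
    """Blocks per index level, from the first level up to the single top block."""
    if fanout < 2:
        raise ValueError("fanout must be at least 2")
    levels = []
    entries = first_level_entries
    while True:
        blocks = blocks_needed(entries, fanout)
        levels.append(blocks)
        if blocks == 1:
            return levels
        entries = blocks            # next level has one entry per block below


# Illustrative sizes only: 1024-byte blocks, 100-byte records, 9995 records,
# 10-byte key plus 6-byte block pointer per index entry.
bfr = blocking_factor(1024, 100)              # 10 records per block
b = blocks_needed(9995, bfr)                  # 1000 file blocks
fanout = blocking_factor(1024, 10 + 6)        # 64 entries per index block
levels = multilevel_index_blocks(b, fanout)   # primary index: one entry per file block
print(bfr, b, fanout, levels)
print("block accesses for one record:", len(levels) + 1)
```

For a primary index the number of first-level entries equals the number of file blocks; for a dense secondary index on a key field it equals the number of records.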
E2.
A PARTS file with Part# as key field includes records with the following Part# values:
23, 65, 37, 60, 46, 92, 48, 71, 56, 59, 18, 21, 10, 74, 78, 15, 16, 20, 24, 28, 39, 43, 47, 50,
69, 75, 8, 49, 33, 38. Suppose that the search field values are inserted in the given order
in a B+-tree of order p = 4 and pleaf = 3; show how the tree will expand and what the final
tree will look like.
E3. Repeat Exercise E2, but use a B-tree of order p = 4 instead of a B+-tree.
E4. Suppose that the following search field values are deleted, in the given order, from
the B+-tree of Exercise E2; show how the tree will shrink and show the final tree. The
deleted values are 65, 75, 43, 18, 20, 92, 59, 37.
E5. Repeat Exercise E4, but use a B-tree of order p = 4 instead of a B+-tree.
Chapter 3
Algorithms for Query Processing and Optimization
Review Questions
Q1. Discuss the reasons for converting SQL queries into relational algebra queries before
optimization is done.
Q2. Discuss the different algorithms for implementing each of the following relational
operators and the circumstances under which each algorithm can be used:
SELECT
JOIN
UNION, INTERSECT, SET DIFFERENCE, CARTESIAN PRODUCT
Q3. What is a query execution plan?
Q4. What is meant by the term heuristic optimization? Discuss the main heuristics that
are applied during query optimization?
Q5. How does a query tree represent a relational algebra expression? What is meant by
an execution of a query tree? Discuss the rules for transformation of query trees and
identify when each rule should be applied during optimization.
Q6. How many different join orders are there for a query that joins 10 relations?
Q7. What is the difference between pipelining and materialization?
Q8. Discuss the cost components for a cost function that is used to estimate query
execution cost. Which cost components are used most often as the basis for cost
functions?
Exercises
E1. Given a database consisting of the following relations:
And given the following SQL queries:
Q1: SELECT FNAME, LNAME, ADDRESS
    FROM EMPLOYEE, DEPARTMENT
    WHERE DNAME='Research' AND DNUMBER=DNO
(Retrieve the name and address of all employees who work for the 'Research'
department.)
Q8: SELECT E.FNAME, E.LNAME, S.FNAME, S.LNAME
    FROM EMPLOYEE E, EMPLOYEE S
    WHERE E.SUPERSSN=S.SSN
(For each employee, retrieve the employee's name, and the name of his or her immediate
supervisor.)
Q4: (SELECT PNAME
     FROM PROJECT, DEPARTMENT, EMPLOYEE
     WHERE DNUM=DNUMBER AND MGRSSN=SSN AND LNAME='Smith')
    UNION
    (SELECT PNAME
     FROM PROJECT, WORKS_ON, EMPLOYEE
     WHERE PNUMBER=PNO AND ESSN=SSN AND LNAME='Smith')
(Make a list of all project numbers for projects that involve an employee whose last name
is 'Smith' as a worker or as a manager of the department that controls the project.)
Q27: SELECT FNAME, LNAME, 1.1*SALARY
     FROM EMPLOYEE, WORKS_ON, PROJECT
     WHERE SSN=ESSN AND PNO=PNUMBER AND PNAME='ProductX'
(Show the effect of giving all employees who work on the 'ProductX' project a 10%
raise.)
a. Draw at least two query trees that can represent each of these queries. Under what
circumstances would you use each of your query trees?
b. Draw the initial query tree for each of these queries, then show how the query tree is
optimized.
c. For each query, compare your own query trees of part (a) and the initial and final query
trees of part (b).
E2.
A file of 4096 blocks is to be sorted with an available buffer space of 64 blocks. How
many passes will be needed in the merge phase of the external sort-merge algorithm?
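The pass count can be sketched in Python as below, assuming the usual external sort-merge scheme: the sort phase produces ceil(b / nB) initial runs, and each merge pass combines up to nB - 1 runs (one buffer block is reserved for output). The function name merge_passes and the sample values are illustrative assumptions, not the values of this exercise.

```python
def merge_passes(file_blocks, buffer_blocks):
    """Number of merge passes after the initial run-forming (sort) phase."""
    if buffer_blocks < 3:
        raise ValueError("need at least 3 buffer blocks to merge 2 runs")
    runs = -(-file_blocks // buffer_blocks)   # initial sorted runs, rounded up
    degree = buffer_blocks - 1                # runs merged per step
    passes = 0
    while runs > 1:
        runs = -(-runs // degree)             # runs left after one merge pass
        passes += 1
    return passes


# Illustrative values only: 1024 blocks, 5 buffer blocks.
# Runs per pass: 205 -> 52 -> 13 -> 4 -> 1.
print(merge_passes(1024, 5))
```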
E3
Extend the sort-merge join algorithm to implement the left outer join operation.
E4. Given the following relations:
EMPLOYEE(ename, ssn, bdate, address, sex, salary, dno)
PROJECT(pname, pnumber, plocation)
WORKS_ON(essn, pno, hours)
and the query:
“Find the names of the employees whose birthdates are after 1957 and currently work for
the project Aquarius”
Apply the heuristic optimization transformation rules to find an efficient query execution
plan for the above query, which is described by the following query tree.
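[Figure: initial query tree. Root: projection on ename. Below it: selection with condition
Pname = ‘Aquarius’ and Pnumber = Pno and Essn = Ssn and Bdate > ’31-12-1957’.
Leaves: Employee, Project, Works_on, combined by Cartesian products.]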
E5
Given the three following relations:
Supplier(Supp#, Name, City, Specialty)
Project(Proj#, Name, City, Budget)
Order(Supp#, Proj#, Part-name, Quantity, Cost)
and the SQL query:
SELECT Supplier.Name, Project.Name
FROM Supplier, Order, Project
WHERE Supplier.City = 'New York City' AND Project.Budget > 10000000 AND
      Supplier.Supp# = Order.Supp# AND Order.Proj# = Project.Proj#
a. Write the relational algebraic expression that is equivalent to the above query and
draw a query tree for the expression.
b. Apply the heuristic optimization transformation rules to find an efficient query
execution plan for the above query. Assume that the number of suppliers in
New York City is larger than the number of projects with budgets of more than
$10000000.
Chapter 4
Introduction to Transaction Processing Concepts and Theory
Review Questions
Q1. Discuss the actions taken by the read_item and write_item operations on a database.
Q2. What is the system log used for? What are the typical kinds of records in a system
log? What are transaction commit points, and why are they important?
Q3. Discuss the atomicity, durability, isolation, and consistency preservation properties
of a database transaction.
Q4. What is a serial schedule? What is a serializable schedule? Why is a serial schedule
considered correct? Why is a serializable schedule considered correct?
Q5. Discuss how serializability is used to enforce concurrency control in a database
system. Why is serializability sometimes considered too restrictive as a measure of
correctness for schedules?
Exercises
E1. Which of the following schedules are (conflict) serializable? For each serializable
schedule, determine the equivalent serial schedules.
a. r1(X); r3(X); w1(X); r2(X); w3(X);
b. r1(X); r3(X); w3(X); w1(X); r2(X);
c. r3(X); r2(X); w3(X); r1(X); w1(X);
d. r3(X); r2(X); r1(X); w3(X); w1(X);
E2. Consider the three transactions T1, T2 and T3, and the schedules S1 and S2 given
below. Draw the serializability (precedence) graphs for S1 and S2, and state whether each
schedule is serializable or not. If a schedule is serializable, write down the equivalent
serial schedule(s).
T1: r1(X); r1(Z); w1(X);
T2: r2(Z); r2(Y); w2(Z); w2(Y);
T3: r3(X); r3(Y); w3(Y);
S1: r1(X); r2(Z); r1(Z); r3(X); r3(Y); w1(X); w3(Y); r2(Y); w2(Z); w2(Y);
S2: r1(X); r2(Z); r3(X); r1(Z); r2(Y); r3(Y); w1(X); w2(Z); w3(Y); w2(Y);
E3. Given the following two transactions:
T1: r1(A); w1(A); r1(B);w1(B);
T2: r2(A); w2(A); r2(B);w2(B);
Prove that the schedule
S: r1(A);w1(A); r2(A); w2(A); r1(B);w1(B); r2(B);w2(B);
is conflict-serializable. (Hint: reorder the nonconflicting operations in S until you form
the equivalent serial schedule.)
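The precedence-graph test used in Exercises E1 and E2 can be sketched in Python as below. This is a minimal sketch, not a full scheduler: it parses the r/w notation used above, adds an edge Ti -> Tj whenever an operation of Ti precedes a conflicting operation of Tj (same item, different transactions, at least one write), and then looks for a topological order. All function names are hypothetical, and the two schedules are illustrative rather than taken from the exercises.

```python
def parse_schedule(text):
    """Turn 'r1(X); w2(Y);' into [('r', 1, 'X'), ('w', 2, 'Y')]."""
    ops = []
    for token in text.replace(",", ";").split(";"):
        token = token.strip()
        if token:
            paren = token.index("(")
            ops.append((token[0], int(token[1:paren]), token[paren + 1:-1]))
    return ops


def precedence_edges(ops):
    """Edges (Ti, Tj): an operation of Ti conflicts with a later one of Tj."""
    edges = set()
    for i, (a1, t1, x1) in enumerate(ops):
        for a2, t2, x2 in ops[i + 1:]:
            if t1 != t2 and x1 == x2 and "w" in (a1, a2):
                edges.add((t1, t2))
    return edges


def serial_order(ops):
    """One equivalent serial order, or None if the graph has a cycle."""
    edges = precedence_edges(ops)
    remaining = {t for _, t, _ in ops}
    order = []
    while remaining:
        # Transactions with no incoming edge from another remaining transaction.
        free = sorted(t for t in remaining
                      if not any(u in remaining and v == t for u, v in edges))
        if not free:
            return None             # cycle: not conflict-serializable
        order.append(free[0])
        remaining.remove(free[0])
    return order


print(serial_order(parse_schedule("r1(X); w1(X); r2(X);")))   # [1, 2]
print(serial_order(parse_schedule("r1(X); w2(X); w1(X);")))   # None
```

When several transactions are free at the same step, any of them may be chosen, which is why a serializable schedule can have more than one equivalent serial schedule.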
Chapter 5
Concurrency Control
Review Questions
Q1. What is the two-phase locking protocol? How does it guarantee serializability?
Q2. What are some variations of the two-phase locking protocol? Why is strict or
rigorous two-phase locking often preferred?
Q3. Discuss the problems of deadlock and starvation, and the different approaches to
dealing with these problems.
Q4. Describe the wait-die and wound-wait protocols for deadlock prevention.
Q5. What is a timestamp? How does the system generate timestamps?
Q6. Discuss the timestamp ordering protocols for concurrency control. How does strict
timestamp ordering differ from basic timestamp ordering?
Q7. Discuss two multiversion techniques for concurrency control.
Q8. What is a certify lock? What are the advantages and disadvantages of using certify
locks?
Q9. How do optimistic concurrency control techniques differ from other concurrency
control techniques? Why are they also called validation or certification techniques?
Discuss the typical phases of an optimistic concurrency control method.
Q10. How does the granularity of data items affect the performance of concurrency
control? What factors affect selection of granularity size for data items?
Q11. What is multiple granularity locking? Under what circumstances is it used?
Q12. What are intention locks?
Exercises
E1. Consider the schedule shown in the following figure. Draw the wait-for graph before
and after the last action write_lock(A) of transaction T3.
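[Figure: schedule of transactions T1, T2, T3 and T4, one column per transaction.
Operations in order of appearance: read_lock(A); read_item(A); write_lock(B);
write_item(B); read_lock(B); read_lock(C); read_item(C); write_lock(C); write_lock(B);
write_lock(A). The final write_lock(A) belongs to T3; the column layout of the other
operations is not preserved here.]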
E2. Consider the set of transactions accessing database element A shown in the following
figure. These transactions are operating under an ordinary timestamp-based scheduler.
Explain why the transaction T3 has to be aborted. What happens if these transactions are
operating under a multiversion timestamp-based scheduler?
(Note: In the figure r means read and w means write.)
Timestamps: T1 = 150, T2 = 200, T3 = 175, T4 = 225. Initially RT(A) = 0 and WT(A) = 0.

T1       T2       T3       T4       A
r1(A)                               RT=150
w1(A)                               WT=150
         r2(A)                      RT=200
         w2(A)                      WT=200
                  r3(A)
                  Abort
                           r4(A)    RT=225
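The read and write rules of basic timestamp ordering can be sketched in Python as below. This is a sketch of the basic protocol only (no Thomas write rule, no commit bits, no multiversion storage), and it assumes that the operations arrive in the order r1(A), w1(A), r2(A), w2(A), r3(A), r4(A); the function name and the tuple encoding of operations are hypothetical.

```python
def basic_timestamp_ordering(ops, timestamps):
    """Apply basic TO rules; return (aborted transactions, RT map, WT map)."""
    rt, wt = {}, {}          # read and write timestamps per item, default 0
    aborted = set()
    for action, txn, item in ops:
        if txn in aborted:
            continue
        ts = timestamps[txn]
        if action == "r":
            if ts < wt.get(item, 0):
                aborted.add(txn)     # item already written by a younger transaction
            else:
                rt[item] = max(rt.get(item, 0), ts)
        else:
            if ts < rt.get(item, 0) or ts < wt.get(item, 0):
                aborted.add(txn)     # a younger transaction already read or wrote it
            else:
                wt[item] = ts
    return aborted, rt, wt


timestamps = {1: 150, 2: 200, 3: 175, 4: 225}
ops = [("r", 1, "A"), ("w", 1, "A"), ("r", 2, "A"),
       ("w", 2, "A"), ("r", 3, "A"), ("r", 4, "A")]
aborted, rt, wt = basic_timestamp_ordering(ops, timestamps)
print(aborted, rt, wt)
```

Under these assumptions T3 is the only transaction aborted, because its timestamp 175 is smaller than WT(A) = 200 when it tries to read A.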
E3. Consider the relation Movie(title, year, length, studioName)
Transaction T1 consists of the query:
    SELECT * FROM Movie
    WHERE title = 'King Kong'
Transaction T2 consists of the query:
    UPDATE Movie SET year = 1939
    WHERE title = 'Gone with the wind'
Assume that there are two records in relation Movie with the title ‘King Kong’ and there
is one record with the title ‘Gone with the wind’.
Suggest the collection of locks for this situation.
E4. Consider the three transactions T1, T2, and T3, and the schedules S1 and S2 given
below. Draw the serializability graph for S1 and S2, and state whether each schedule is
conflict-serializable or not. If a schedule is conflict-serializable, write down the
equivalent serial schedule.
T1: r1(B); w1(B);
T2: r2(A); w2(A); r2(B); w2(B);
T3: r3(A);w3(A);
S1: r2(A); r1(B); w2(A); r3(A); w1(B); w3(A); r2(B); w2(B);
S2: r2(A); r1(B); w2(A); r2(B); r3(A); w1(B); w3(A); w2(B);
Chapter 6
Database Recovery Techniques
Review Questions
Q1. How are buffering and caching techniques used by the recovery subsystem?
Q2. Describe the write-ahead logging protocol.
Q3. Discuss the UNDO and REDO operations and the recovery techniques that use each.
Q4. Discuss the deferred update technique of recovery. What are the advantages and
disadvantages of this technique? Why is it called the NO-UNDO/REDO method?
Q5. How can recovery handle transaction operations that do not affect the database such
as the printing of reports by a transaction?
Q6. Discuss the immediate update recovery technique in both single-user and multi-user
environments. What are the advantages and disadvantages of immediate update?
Q7. Describe the shadow paging recovery technique. Under what circumstances does it
not require a log?
Q8. Describe the three phases of the ARIES recovery method.
Q9. What are log sequence numbers (LSNs) in ARIES? How are they used? What
information do the Dirty Page Table and the Transaction Table contain? Describe how
fuzzy checkpointing is used in ARIES.
Q10. Multiple-choice questions from 19.28 to 19.37 in the text book
Exercises
E1. Suppose that the system crashes before the [read_item, T3, A] entry is written to the
following log. Will that make any difference in the recovery process?

                                    A    B    C    D
                                    30   15   40   20
    [start_transaction, T3]
    [read_item, T3, C]
*   [write_item, T3, B, 15, 12]          12
    [start_transaction, T2]
    [read_item, T2, B]
**  [write_item, T2, B, 12, 18]          18
    [start_transaction, T1]
    [read_item, T1, A]
    [read_item, T1, D]
    [write_item, T1, D, 20, 25]                    25
    [read_item, T2, D]
**  [write_item, T2, D, 25, 26]                    26
    [read_item, T3, A]
    ---- system crash ----
*  T3 is rolled back because it did not reach its commit point.
** T2 is rolled back because it reads the value of item B written by T3.
E2. Suppose that the system crashes before the [write_item, T2, D, 25, 26] entry is
written to the log given in Exercise E1. Will that make any difference in the recovery
process?
E3. The log corresponding to a particular schedule at the point of a system crash for four
transactions T1, T2, T3 and T4 is given as follows:
[start_transaction, T1]
[read_item, T1, A]
[read_item, T1, D]
[write_item, T1, D, 20, 25]
[commit, T1]
[checkpoint]
[start_transaction, T2]
[read_item, T2, B]
[write_item, T2, B, 12, 18]
[start_transaction, T4]
[read_item, T4, D]
[write_item, T4, D, 25, 15]
[start_transaction, T3]
[write_item, T3, C, 30, 40]
[read_item, T4, A]
[write_item, T4, A, 30, 20]
[commit, T4]
[read_item, T2, D]
[write_item, T2, D, 15, 25]
---- system crash ----
Suppose that we use the immediate update protocol with checkpointing. Describe the
recovery process from the system crash. Specify which transactions are rolled back,
which operations in the log are redone and which (if any) are undone, and whether any
cascading rollback takes place.
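The first step of such a recovery, deciding which transactions to redo and which to undo, can be sketched in Python as below. This is a minimal sketch assuming that a [checkpoint] record means all updates of transactions committed so far are already on disk, so those transactions need no further action; the tuple encoding of log entries, the function name and the sample log are hypothetical and differ from the log of this exercise.

```python
def classify_transactions(log):
    """Return (redo, undo) transaction lists for an UNDO/REDO recovery."""
    started = []          # transactions in order of their start records
    committed = set()     # transactions with a commit record
    settled = set()       # committed before the most recent checkpoint
    for entry in log:
        kind = entry[0]
        if kind == "start_transaction":
            started.append(entry[1])
        elif kind == "commit":
            committed.add(entry[1])
        elif kind == "checkpoint":
            settled |= committed
    redo = [t for t in started if t in committed and t not in settled]
    undo = [t for t in started if t not in committed]
    return redo, undo


# Illustrative log only (item names and values are made up).
log = [
    ("start_transaction", "T1"),
    ("write_item", "T1", "X", 1, 2),
    ("commit", "T1"),
    ("checkpoint",),
    ("start_transaction", "T2"),
    ("write_item", "T2", "Y", 5, 6),
    ("start_transaction", "T3"),
    ("write_item", "T3", "Z", 7, 8),
    ("commit", "T3"),
]
redo, undo = classify_transactions(log)
print("redo:", redo, "undo:", undo)   # redo: ['T3'] undo: ['T2']
```

The write_item entries of the undo list are then undone in reverse log order using their old values, and those of the redo list are redone in forward order using their new values.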
E4. Suppose that we use the deferred update protocol for the example in Exercise E3.
Show how the log would be different in the case of deferred update by removing the
unnecessary log entries; then describe the recovery process, using your modified log.
Assume that only REDO operations are applied, and specify which operations in the log
are redone and which are ignored.
Multiple-choice questions:
1. Incremental logging with deferred updates implies that the recovery system must
necessarily
a. store the old value of the updated item in the log.
b. store the new value of the updated item in the log.
c. store both the old and new value of the updated item in the log.
d. store only the Begin Transaction and Commit Transaction records in the log.
2. The write ahead logging (WAL) protocol simply means that
a. the writing of a data item should be done ahead of any logging operation.
b. the log record for an operation should be written before the actual data is written.
c. all log records should be written before a new transaction begins execution.
d. the log never needs to be written to disk.
3. In case of transaction failure under a deferred update incremental logging scheme,
which of the following will be needed:
a. an undo operation
b. a redo operation
c. an undo and redo operation.
d. none of the above.
4. For incremental logging with immediate updates, a log record for a transaction would
contain:
a. a transaction name, data item name, old value of item, new value of item.
b. a transaction name, data item name, old value of item.
c. a transaction name, data item name, new value of item.
d. a transaction name and data item name.
5. For correct behavior during recovery, undo and redo operations must be
a. commutative
b. associative
c. idempotent
d. distributive
6. When a failure occurs, the log is consulted and each operation is either undone or
redone. This is a problem because:
a. searching the entire log is time consuming.
b. many redo’s are unnecessary
c. both (a) and (b)
d. none of the above.
7. When using a log based recovery scheme, it might improve performance as well as
providing a recovery mechanism by
a. writing the log records to disk when each transaction commits.
b. writing the appropriate log records to disk during the transaction’s execution.
c. waiting to write the log records until multiple transactions commit and writing them in
a batch.
d. never writing the log records to disk.
8. There is a possibility of a cascading rollback when
a. a transaction writes items that have been written only by a committed transaction.
b. a transaction writes an item that is previously written by an uncommitted transaction.
c. a transaction reads an item that is previously written by an uncommitted transaction.
d. both (b) and (c).
9. To cope with media (disk) failures, it is necessary
a. for the DBMS to only execute transactions in a single user environment.
b. to keep a redundant copy of the database.
c. to never abort a transaction
d. all of the above.
10. If the shadowing approach is used for flushing a data item back to disk, then
a. the item is written to disk only after the transaction commits.
b. the item is written to a different location on disk.
c. the item is written to disk before the transaction commits.
d. the item is written to the same disk location from which it was read.
Chapter 7
Data Warehousing
Review Questions
Q1. A data warehouse has four important characteristics. What are they?
Q2. Why is time usually a dimension in a data warehouse?
Q3. How do the star schema and the snowflake schema differ?
Q4. Explain the terms extract, transform, and load in the context of a data warehouse.
Q5. Explain the concept of a data cube and describe its use in modeling a data
warehouse.
Q6. Briefly describe three types of operations commonly found in OLAP tools.
Q7. Answer the following multiple-choice questions:
1. In which of the following data warehouse architectures does the data mart not exist physically?
A. the generic two-level architecture
B. the independent data mart architecture
C. the generic three-level architecture
D. the logical data mart and @ctive data warehouse architecture.
2. Reconciled data corresponds to the data in which of the following layers?
A. Data mart
B. Data warehouse
C. Operational database
D. None of the above
3. In OLAP analysis, the drill-down operation is performed by
A. generalizing up the dimension hierarchy, e.g., from city to state
B. moving down to more detail in the dimension hierarchy, e.g., from state to city
C. adding a new dimension
D. removing some dimensions
4. In a snowflake schema,
A. The fact table is usually normalized
B. The fact table is usually denormalized
C. The dimension tables are normalized
D. The dimension tables are denormalized
Q8. Match the following terms and definitions
……… event                    a) previous data content is lost
……… periodic data            b) detailed, historical data
……… data mart                c) converts data formats
……… star schema              d) corrects errors in source data
……… data mining              e) data are not altered or deleted
……… reconciled data          f) a database action (e.g., create)
……… dependent data mart      g) data warehouse of limited scope
……… data visualization       h) dimension and fact tables
……… transient data           i) form of knowledge discovery
……… snowflake schema         j) filled from data warehouse
……… data transformation      k) results from hierarchical dimensions
……… data scrubbing           l) data represented in graphical formats
Exercises
E1.
Millenium College wants you to help them design a star schema to record grades for
courses completed by students. There are four dimension tables with attributes as
follows:
• Course_Section. Attributes: Course_ID, Section_number, Course_Name, Units,
Room_ID, Room_Capacity.
• Professor. Attributes: Prof_ID, Prof_Name, Title, Department_ID, Department_Name.
• Student. Attributes: Student_ID, Student_Name, Major.
• Period. Attributes: Semester_ID, Year.
The only fact that is to be recorded in the fact table is Course_Grade.
Design a star schema for this problem.
E2.
Having mastered the principles of normalization, you recognize immediately that the star
schema you developed for Millenium College in exercise E1 is not in third normal form.
Using these principles, convert the star schema to a snowflake schema.
E3.
Given the ER diagram for a database application as in the following figure. In this
application the transaction “medical examination” (represented by the binary relationship
“examines”) is the main subject of the data warehouse to be constructed.
Design a star schema for this data warehouse.
[Figure: ER diagram with entity types DOCTOR, PATIENT and CITY, relationship types
examines and Lives-in, and attributes Doc_ID, Name, Specialty, Pat_ID, Age, Sex, date,
charge, city-name, state.]