Homework 4 Key - Computer Science Department

Database Systems: Homework 4 Key
Due 25 November, 2013
Team: Key
1. (5 points) A disk drive has the following characteristics: block size B = 512 bytes, average seek time s = 30 ms (milliseconds), average rotational delay rd = 12.5 ms, and transfer rate tr = 512 B/ms (bytes per millisecond). On average, how much time does it take to locate and transfer a single block, given its block address?
Average time to locate and transfer a block:
$$s + rd + \frac{B}{tr} = 30\text{ ms} + 12.5\text{ ms} + \frac{512\text{ B}}{512\text{ B/ms}} = 43.5\text{ ms}.$$
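The same arithmetic as a small Python helper. This is only a sketch; the function name `block_access_time_ms` and its parameter names are illustrative, not part of the course materials.

```python
def block_access_time_ms(seek_ms: float, rot_delay_ms: float,
                         block_bytes: int, transfer_bytes_per_ms: float) -> float:
    """Average time to locate and transfer one block: s + rd + B/tr."""
    transfer_ms = block_bytes / transfer_bytes_per_ms  # block transfer time B / tr
    return seek_ms + rot_delay_ms + transfer_ms


# Parameters from question 1.
t = block_access_time_ms(seek_ms=30, rot_delay_ms=12.5,
                         block_bytes=512, transfer_bytes_per_ms=512)
print(f"{t} ms")  # 43.5 ms
```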
2. (15 points, 3 each) The drive mentioned above is used to store a database file of r = 20,000 STUDENT records with fixed length. Each record has the following fields: Name (30 bytes), Ssn (9 bytes), Address (40 bytes), Phone (10 bytes), Birth date (8 bytes), Sex (1 byte), Major dept code (4 bytes), Minor dept code (4 bytes), Class code (4 bytes, integer), and Degree program (3 bytes). An additional byte is used as a deletion marker.
(a) Calculate the record size R in bytes.
R = 30 B + 9 B + 40 B + 10 B + 8 B + 1 B + 4 B + 4 B + 4 B + 3 B + 1 B = 114 B
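(b) Calculate the blocking factor bfr and the number of file blocks b, assuming an unspanned organization.
With an unspanned organization only whole records fit in a block, so the blocking factor rounds down, while the block count rounds up:
$$bfr = \left\lfloor \frac{B}{R} \right\rfloor = \left\lfloor \frac{512\text{ B}}{114\text{ B}} \right\rfloor = 4\ \frac{\text{records}}{\text{block}}; \qquad b = \left\lceil \frac{r}{bfr} \right\rceil = \left\lceil \frac{20{,}000\text{ records}}{4\ \text{records/block}} \right\rceil = 5000\text{ blocks}.$$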
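Parts (a) and (b) can be checked with a few lines of Python. This is a sketch: the dictionary keys are just the field names from the question written with underscores, and the floor/ceiling choices follow the unspanned organization described above.

```python
import math

# Field sizes in bytes from question 2, plus the 1-byte deletion marker.
FIELD_SIZES = {
    "Name": 30, "Ssn": 9, "Address": 40, "Phone": 10, "Birth_date": 8,
    "Sex": 1, "Major_dept_code": 4, "Minor_dept_code": 4, "Class_code": 4,
    "Degree_program": 3, "deletion_marker": 1,
}

B = 512       # block size in bytes
r = 20_000    # number of STUDENT records

R = sum(FIELD_SIZES.values())  # fixed record length in bytes
bfr = B // R                   # unspanned: floor(B / R) whole records per block
b = math.ceil(r / bfr)         # ceil(r / bfr) blocks; the last may be partly empty

print(R, bfr, b)  # 114 4 5000
```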
(c) Calculate the average time it takes to find a record by doing a linear search on the file if the blocks are stored contiguously and double buffering is used.
On average, half of the 5000 blocks must be examined to find a specific record. If the file is contiguous and double-buffered, the seek time and rotational delay occur only once; after that, the 2500 contiguous blocks are read back to back (double buffering lets the next block be read while the current one is being processed), each taking one block transfer time B/tr. So:
$$s + rd + 2500 \cdot \frac{B}{tr} = 30\text{ ms} + 12.5\text{ ms} + 2500 \cdot \frac{512\text{ B}}{512\text{ B/ms}} = 2542.5\text{ ms} \approx 2.54\text{ s}.$$
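A minimal sketch of the contiguous, double-buffered case, assuming (as above) that exactly one seek and one rotational delay are paid before the blocks stream past. The function name is hypothetical.

```python
def contiguous_linear_search_ms(b: int, s: float, rd: float, btt: float) -> float:
    """Average linear-search time over b contiguous, double-buffered blocks.

    One seek and one rotational delay position the head; after that, half of
    the blocks (b / 2 on average) are read at one block transfer time each.
    """
    return s + rd + (b / 2) * btt


# Question 2 values: b = 5000 blocks, s = 30 ms, rd = 12.5 ms, btt = B / tr = 1 ms.
t = contiguous_linear_search_ms(b=5000, s=30, rd=12.5, btt=512 / 512)
print(t, "ms =", t / 1000, "s")  # 2542.5 ms = 2.5425 s
```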
(d) Calculate the average time it takes if, instead, the blocks are stored randomly.
Once again 2500 blocks must be examined, but the seek and rotational times happen
for each block. That was calculated as 43.5 ms in the first question, so: 2500 ·
43.5 ms = 108.75 s.
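For contrast, a sketch of the random-placement case, where each of the b/2 block reads pays the full 43.5 ms from question 1. The function name is again only illustrative.

```python
def random_linear_search_ms(b: int, s: float, rd: float, btt: float) -> float:
    """Average linear-search time when the b blocks are scattered on disk.

    Every block read pays a full seek, rotational delay, and transfer.
    """
    return (b / 2) * (s + rd + btt)


t = random_linear_search_ms(b=5000, s=30, rd=12.5, btt=1)
print(t, "ms =", t / 1000, "s")  # 108750.0 ms = 108.75 s
```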
(e) Assume the file is ordered by Ssn. Calculate the average time it takes to search
for a record given its Ssn using binary search.
The number of blocks that must be examined is $\lceil \log_2 b \rceil = \lceil \log_2 5000 \rceil = 13$. Binary search jumps between blocks that are far apart on disk instead of reading them in order, so each access again costs, on average, the full 43.5 ms. The average search time is therefore 13 · 43.5 ms = 565.5 ms. Note that this is more than 4 times faster than the linear search of the contiguous file in (c) (565.5 ms versus 2542.5 ms), even though the seek and rotational delays now occur for every block.
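A short Python sketch of the binary-search estimate. It computes the ceiling of log2 with integer arithmetic (`(n - 1).bit_length()`) to avoid floating-point edge cases; both helper names are hypothetical.

```python
def ceil_log2(n: int) -> int:
    """Exact ceil(log2(n)) for n >= 1, without floating-point log."""
    return (n - 1).bit_length()


def binary_search_ms(b: int, s: float, rd: float, btt: float) -> float:
    """Binary search on a file of b blocks ordered by the search key.

    About ceil(log2 b) blocks are probed; successive probes are far apart on
    disk, so each one costs a full random access s + rd + btt.
    """
    return ceil_log2(b) * (s + rd + btt)


probes = ceil_log2(5000)
t = binary_search_ms(b=5000, s=30, rd=12.5, btt=1)
print(probes, t)  # 13 565.5
```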
3. (15 points, 3 each) Recall the database and file parameters for the in-class questions from Chapter 18. Those parameters were:
Block size (B): 512 B
Block pointer length (P): 6 B
Record pointer length (PR): 7 B
Number of EMPLOYEE records (r): 30,000
EMPLOYEE fields and sizes:
Name: 30 B
Ssn: 9 B
Department code: 9 B
Address: 40 B
Phone: 10 B
Birth date: 8 B
Sex: 1 B
Job code: 4 B
Salary: 4 B
deletion marker: 1 B
Suppose that the file is not ordered by the key field Ssn (this will matter) and we want to construct a B+-tree access structure (index) on Ssn.
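(a) Calculate p, the branch-node order, and $p_{leaf}$, the leaf-node order, of the tree.
Let V = 9 B be the size of the Ssn search key. Branch nodes will contain Ssn search keys and block pointers to other nodes; a branch node of order p holds p block pointers and p − 1 keys, so
$$p \cdot P + (p - 1) \cdot V \le B \quad\Rightarrow\quad p = \left\lfloor \frac{B - P}{V + P} \right\rfloor + 1 = \left\lfloor \frac{512\text{ B} - 6\text{ B}}{9\text{ B} + 6\text{ B}} \right\rfloor + 1 = 33 + 1 = 34.$$
Leaf nodes will have a record pointer for each key value (because the file is not ordered on that field) plus one block pointer to the next leaf node, so
$$p_{leaf} \cdot (V + P_R) + P \le B \quad\Rightarrow\quad p_{leaf} = \left\lfloor \frac{B - P}{V + P_R} \right\rfloor = \left\lfloor \frac{512\text{ B} - 6\text{ B}}{9\text{ B} + 7\text{ B}} \right\rfloor = 31.$$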
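The two node-order inequalities can be solved with integer division. A sketch under the question's parameters, where V is the 9-byte Ssn key size:

```python
B = 512   # block size in bytes
P = 6     # block pointer length
PR = 7    # record pointer length
V = 9     # Ssn search-key size

# Branch node: p block pointers and p - 1 keys must fit: p*P + (p - 1)*V <= B.
p = (B - P) // (V + P) + 1
# Leaf node: p_leaf (key, record pointer) pairs plus one next-leaf pointer:
# p_leaf*(V + PR) + P <= B.
p_leaf = (B - P) // (V + PR)

print(p, p_leaf)  # 34 31
```

Checking the largest value that still fits (rather than trusting a closed form) is a cheap sanity test: 34 pointers and 33 keys take 501 bytes, while 35 pointers and 34 keys would take 516.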
(b) What is the number of leaf-level blocks needed if blocks are about 69% full? (Assume each leaf node contains $\lceil 0.69 \cdot p_{leaf} \rceil$ index records.)
There will be $\lceil 0.69 \cdot p_{leaf} \rceil = \lceil 0.69 \cdot 31 \rceil = 22$ search keys per leaf node. Since the file is not ordered on that field, there must be a search record for each record in the file, or 30,000 of them. So the number of leaf blocks is $\lceil 30{,}000 / 22 \rceil = 1364$. (The ceiling is taken because the last block will be only partially full.)
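A sketch of the leaf-level count, assuming (as the question states) a fixed 69% occupancy and one index entry per EMPLOYEE record:

```python
import math

p_leaf = 31    # leaf order from part (a)
r = 30_000     # EMPLOYEE records; dense index, one entry per record
fill = 0.69    # assumed leaf occupancy

keys_per_leaf = math.ceil(fill * p_leaf)    # ceil(21.39) = 22 entries per leaf
leaf_blocks = math.ceil(r / keys_per_leaf)  # ceil(1363.6...) = 1364 leaf blocks
print(keys_per_leaf, leaf_blocks)  # 22 1364
```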
(c) What is x, the number of levels needed by the tree, if branch nodes are also 69% full? (Again assume each branch node contains $\lceil 0.69 \cdot p \rceil$ index records.)
By the same reasoning there will be $\lceil 0.69 \cdot p \rceil = \lceil 0.69 \cdot 34 \rceil = 24$ block pointers per branch node. One way to compute x is iteratively: if there are 1364 blocks in the first (leaf) level of the tree, there must be 1364 block pointers in the second level, so the second level has $\lceil 1364 / 24 \rceil = 57$ blocks. Similarly, the third level has $\lceil 57 / 24 \rceil = 3$ blocks, and clearly the fourth level has 1 block. So x = 4.
Alternatively, use the logarithmic-height formula, where fo = 24 is the fan-out and $b_1 = 1364$ is the number of leaf blocks:
$$x = \lceil \log_{fo}(b_1) \rceil + 1 = \lceil \log_{24}(1364) \rceil + 1 = 3 + 1 = 4.$$
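The iterative level count translates directly into a loop; the sketch below also evaluates the logarithmic-height formula for comparison. The helper name `level_sizes` is hypothetical.

```python
import math


def level_sizes(leaf_blocks: int, fan_out: int) -> list[int]:
    """Number of blocks at each B+-tree level, leaf level first.

    Each level needs one pointer per block in the level below, and each
    branch block holds fan_out pointers, so divide and round up until a
    single root block remains.
    """
    sizes = [leaf_blocks]
    while sizes[-1] > 1:
        sizes.append(math.ceil(sizes[-1] / fan_out))
    return sizes


fo = math.ceil(0.69 * 34)                  # 24 pointers per 69%-full branch node
sizes = level_sizes(1364, fo)
x = len(sizes)
x_log = math.ceil(math.log(1364, fo)) + 1  # logarithmic-height formula
print(fo, sizes, x, x_log)  # 24 [1364, 57, 3, 1] 4 4
```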
(d) What is the total number of blocks required by the B+-tree?
1364 at the first level, 57 at the second, 3 at the third, and 1 at the fourth. The sum is 1425 blocks.
(e) How many block accesses are needed to retrieve an entire record from the file, given its Ssn value, using this B+-tree?
Four accesses to traverse the tree from the root down to a leaf (one node per level), plus one to read the full record from the primary data file, for a total of 5.
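Parts (d) and (e) are simple arithmetic on the level sizes from part (c); a sketch, assuming one block read per tree level and one more for the data block:

```python
# Level sizes from part (c), leaf level first.
level_sizes = [1364, 57, 3, 1]

total_blocks = sum(level_sizes)  # part (d)
x = len(level_sizes)             # number of levels in the tree
# Part (e): one node per level from root to leaf, then one data-file block.
accesses = x + 1
print(total_blocks, accesses)  # 1425 5
```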
4. (20 points, 10 each) For each of the SQL queries below, give the canonical query tree and an optimized tree based on the heuristics from Section 19.7. Refer to the Library database schema in Figure 5.
(a)
    select Title, Author_name, Publisher_name
    from   Book B, Book_Authors A, Publisher P
    where  P.Name = Publisher_name and A.Book_id = B.Book_id and
           P.Phone like '406-%';
See figures 1 and 2.
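Heuristic optimization must not change a query's result, only its cost. As an illustration, the sketch below runs query (a) as written and a hand-written SQL version shaped like the optimized tree (selection and projection on Publisher pushed down, cross products replaced by joins) against an in-memory SQLite database, and checks that both return the same rows. The table rows are hypothetical toy data, and only the columns the query touches are modeled. Note that SQLite applies its own optimizer to both statements, so this checks equivalence of results, not the plans SQLite actually uses.

```python
import sqlite3

# Hypothetical toy rows; only the columns query (a) uses are modeled.
SETUP = """
CREATE TABLE Publisher (Name TEXT PRIMARY KEY, Phone TEXT);
CREATE TABLE Book (Book_id INTEGER PRIMARY KEY, Title TEXT, Publisher_name TEXT);
CREATE TABLE Book_Authors (Book_id INTEGER, Author_name TEXT);
INSERT INTO Publisher VALUES ('Big Sky Press', '406-555-0100'),
                             ('Coastal Books', '503-555-0199');
INSERT INTO Book VALUES (1, 'Prairie Tales', 'Big Sky Press'),
                        (2, 'Harbor Lights', 'Coastal Books'),
                        (3, 'Mountain Math', 'Big Sky Press');
INSERT INTO Book_Authors VALUES (1, 'A. Smith'), (2, 'B. Jones'),
                                (3, 'C. Lee'), (3, 'D. Kim');
"""

# Query (a) as written: cross products filtered by one WHERE clause.
CANONICAL = """
select Title, Author_name, Publisher_name
from Book B, Book_Authors A, Publisher P
where P.Name = Publisher_name and A.Book_id = B.Book_id and P.Phone like '406-%'
"""

# Shaped like the optimized tree: filter and project Publisher first,
# join Book with Book_Authors, then join the two results.
OPTIMIZED = """
select BA.Title, BA.Author_name, BA.Publisher_name
from (select B.Title, A.Author_name, B.Publisher_name
      from Book B join Book_Authors A on A.Book_id = B.Book_id) as BA
join (select Name from Publisher where Phone like '406-%') as P
  on P.Name = BA.Publisher_name
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SETUP)
canonical = sorted(conn.execute(CANONICAL).fetchall())
optimized = sorted(conn.execute(OPTIMIZED).fetchall())
conn.close()

assert canonical == optimized  # both forms return the same rows
for row in canonical:
    print(row)
```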
(b)
    select Name, Address, Phone, Title
    from   Borrower B, Book_Loans BL, (
             select Book.Book_id as BID, Title
             from   Book, Book_Copies C, Library_Branch L
             where  Book.Book_id = C.Book_id and C.No_of_copies = 1 and
                    C.Branch_id = L.Branch_id and (
                    L.Branch_name = 'Montevideo' or
                    L.Branch_name = 'Bird Island'))
    where  B.Card_no = BL.Card_no and BID = BL.Book_id;
See figures 3 and 4.
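$$\pi_{\text{Title},\,\text{Author\_name},\,\text{Publisher\_name}}\Big(\sigma_{P.\text{Name}=\text{Publisher\_name}\,\wedge\,A.\text{Book\_id}=B.\text{Book\_id}\,\wedge\,P.\text{Phone}\;\text{LIKE}\;\text{'406-\%'}}\big((B \times A) \times P\big)\Big)$$
B = Book, A = Book_Authors, P = Publisher
Figure 1: Canonical query tree, first query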
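$$\pi_{\text{Title},\,\text{Author\_name},\,\text{Publisher\_name}}\Big(\pi_{\text{Title},\,\text{Author\_name},\,\text{Publisher\_name}}\big(B \bowtie_{A.\text{Book\_id}=B.\text{Book\_id}} A\big) \bowtie_{P.\text{Name}=\text{Publisher\_name}} \pi_{P.\text{Name}}\big(\sigma_{P.\text{Phone}\;\text{LIKE}\;\text{'406-\%'}}(P)\big)\Big)$$
B = Book, A = Book_Authors, P = Publisher
Figure 2: Heuristic-optimized query tree, first query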
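$$\pi_{\text{Name},\,\text{Address},\,\text{Phone},\,\text{Title}}\Big(\sigma_{B.\text{Card\_no}=BL.\text{Card\_no}\,\wedge\,\text{BID}=BL.\text{Book\_id}}\Big((B \times BL) \times \rho_{\text{BID},\,\text{Title}}\Big(\pi_{\text{Book\_id},\,\text{Title}}\big(\sigma_{\text{Book.Book\_id}=C.\text{Book\_id}\,\wedge\,C.\text{No\_of\_copies}=1\,\wedge\,C.\text{Branch\_id}=L.\text{Branch\_id}\,\wedge\,(L.\text{Branch\_name}=\text{'Montevideo'}\,\vee\,L.\text{Branch\_name}=\text{'Bird Island'})}((\text{Book} \times C) \times L)\big)\Big)\Big)\Big)$$
B = Borrower, BL = Book_Loans, C = Book_Copies, L = Library_Branch
Figure 3: Canonical query tree, second query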
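$$\pi_{\text{Name},\,\text{Address},\,\text{Phone},\,\text{Title}}\Big(\pi_{\text{Name},\,\text{Address},\,\text{Phone},\,\text{Book\_id}}\big(B \bowtie_{B.\text{Card\_no}=BL.\text{Card\_no}} \pi_{\text{Book\_id},\,\text{Card\_no}}(BL)\big) \bowtie_{\text{BID}=BL.\text{Book\_id}} \rho_{\text{BID},\,\text{Title}}\Big(\pi_{\text{Book\_id},\,\text{Title}}\Big(\pi_{\text{Book\_id},\,\text{Branch\_id},\,\text{Title}}\big(\pi_{\text{Book\_id},\,\text{Title}}(\text{Book}) \bowtie_{\text{Book.Book\_id}=C.\text{Book\_id}} \sigma_{C.\text{No\_of\_copies}=1}(C)\big) \bowtie_{C.\text{Branch\_id}=L.\text{Branch\_id}} \pi_{\text{Branch\_id}}\big(\sigma_{L.\text{Branch\_name}=\text{'Montevideo'}\,\vee\,L.\text{Branch\_name}=\text{'Bird Island'}}(L)\big)\Big)\Big)\Big)$$
B = Borrower, BL = Book_Loans, C = Book_Copies, L = Library_Branch
Figure 4: Heuristic-optimized query tree, second query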
Figure 5: Library database schema