Database Systems: Homework 4 Key
Due 25 November, 2013
Team: Key

1. (5 points) A disk drive has the following characteristics: block size B = 512 bytes, average seek time s = 30 ms (milliseconds), average rotational delay rd = 12.5 ms, and transfer rate tr = 512 B/ms (bytes per millisecond). How much time does it take on average to locate and transfer a single block, given its block address?

Average time to locate and transfer a block = $s + rd + B/tr = 30\text{ ms} + 12.5\text{ ms} + \frac{512\text{ B}}{512\text{ B/ms}} = 43.5\text{ ms}$.

2. (15 points, 3 each) The drive mentioned above is used to store a database file of r = 20,000 fixed-length STUDENT records. Each record has the following fields: Name (30 bytes), Ssn (9 bytes), Address (40 bytes), Phone (10 bytes), Birth date (8 bytes), Sex (1 byte), Major dept code (4 bytes), Minor dept code (4 bytes), Class code (4 bytes, integer), and Degree program (3 bytes). An additional byte is used as a deletion marker.

(a) Calculate the record size R in bytes.

$R = 30\text{ B} + 9\text{ B} + 40\text{ B} + 10\text{ B} + 8\text{ B} + 1\text{ B} + 4\text{ B} + 4\text{ B} + 4\text{ B} + 3\text{ B} + 1\text{ B} = 114\text{ B}$.

(b) Calculate the blocking factor bfr and the number of file blocks b, assuming an unspanned organization.

$bfr = \lfloor B/R \rfloor = \lfloor 512\text{ B} / 114\text{ B} \rfloor = 4$ records/block; $b = \lceil r/bfr \rceil = \lceil 20{,}000 / 4 \rceil = 5000$ blocks.

(c) Calculate the average time it takes to find a record by doing a linear search on the file if the blocks are stored contiguously and double buffering is used.

On average, half of the 5000 blocks must be examined to find a specific record. If the file is contiguous and double-buffered, the seek time and rotational delay are incurred only once, after which 2500 contiguous blocks are read. So: $s + rd + 2500 \cdot (B/tr) = 30\text{ ms} + 12.5\text{ ms} + 2500 \cdot \frac{512\text{ B}}{512\text{ B/ms}} = 2542.5\text{ ms} \approx 2.54\text{ s}$.

(d) Calculate the average time it takes if, instead, the blocks are stored randomly.

Once again 2500 blocks must be examined, but the seek time and rotational delay are incurred for each block.
The time to access one block was calculated as 43.5 ms in question 1, so: $2500 \cdot 43.5\text{ ms} = 108{,}750\text{ ms} = 108.75\text{ s}$.

(e) Assume the file is ordered by Ssn. Calculate the average time it takes to search for a record given its Ssn using binary search.

The number of blocks that must be examined is $\lceil \log_2 b \rceil = \lceil \log_2 5000 \rceil = 13$. Binary search does not read the blocks sequentially, so each access again takes 43.5 ms on average. The average search time is therefore $13 \cdot 43.5\text{ ms} = 565.5\text{ ms}$. Note that this is over 4 times faster than the linear search of the contiguous file (2542.5 ms), even though the seek and rotational delays are incurred for every block.

3. (15 points, 3 each) Recall the database and file parameters for the in-class questions from chapter 18. Those parameters were:

    Block size (B)                      512 B
    Block pointer length (P)            6 B
    Record pointer length ($P_R$)       7 B
    Number of EMPLOYEE records (r)      30,000
    EMPLOYEE fields and sizes:
      Name                              30 B
      Ssn                               9 B
      Department code                   9 B
      Address                           40 B
      Phone                             10 B
      Birth date                        8 B
      Sex                               1 B
      Job code                          4 B
      Salary                            4 B
      deletion marker                   1 B

Suppose that the file is not ordered by the key field Ssn (this will matter) and we want to construct a B+-tree access structure (index) on Ssn.

(a) Calculate p, the branch-node order, and $p_{leaf}$, the leaf-node order, of the tree.

Branch nodes contain Ssn search keys (V = 9 B) and block pointers to other nodes. A branch node of order p holds p block pointers and $p - 1$ keys, so $p \cdot P + (p - 1) \cdot V \le B$, which gives $p = \left\lfloor \frac{B - P}{V + P} \right\rfloor + 1 = \left\lfloor \frac{512 - 6}{9 + 6} \right\rfloor + 1 = 34$. Leaf nodes hold a record pointer for each key value (because the file is not ordered on that field) plus one block pointer to the next leaf node, so $p_{leaf} \cdot (V + P_R) + P \le B$, which gives $p_{leaf} = \left\lfloor \frac{B - P}{V + P_R} \right\rfloor = \left\lfloor \frac{512 - 6}{9 + 7} \right\rfloor = 31$.

(b) What is the number of leaf-level blocks needed if blocks are about 69% full? (Assume each leaf node contains $\lceil 0.69 \cdot p_{leaf} \rceil$ index records.)

There will be $\lceil 0.69 \cdot p_{leaf} \rceil = \lceil 0.69 \cdot 31 \rceil = 22$ search keys per leaf node. Since the file is not ordered on that field, there must be an index entry for each record in the file, or 30,000 of them. So the number of leaf blocks is $\lceil 30{,}000 / 22 \rceil = 1364$ blocks. (The ceiling is needed because the last block will be only partially full.)

(c) What is x, the number of levels needed by the tree, if branch nodes are also 69% full? (Again assume each branch node contains $\lceil 0.69 \cdot p \rceil$ index records.)

By the same reasoning there will be $\lceil 0.69 \cdot p \rceil = \lceil 0.69 \cdot 34 \rceil = 24$ block pointers per branch node. One way to compute x is iteratively, counting levels from the leaves up. If there are 1364 blocks at the first (leaf) level of the tree, there must be 1364 block pointers at the second level, so the second level has $\lceil 1364 / 24 \rceil = 57$ blocks. Similarly, the third level has $\lceil 57 / 24 \rceil = 3$ blocks, and the fourth level clearly has 1 block (the root). So x = 4. Alternatively, use the logarithmic height formula with fan-out fo = 24 and $b_1 = 1364$ leaf blocks: $x = \lceil \log_{fo}(b_1) \rceil + 1 = \lceil \log_{24}(1364) \rceil + 1 = 3 + 1 = 4$.

(d) What is the total number of blocks required by the B+-tree?

1364 at the first level, 57 at the second, 3 at the third, and 1 at the fourth: $1364 + 57 + 3 + 1 = 1425$ blocks.

(e) How many block accesses are needed to retrieve an entire record from the file, given its Ssn value, using this B+-tree?

Four to traverse the height of the tree, plus one to access the full record in the data file, for a total of 5.

4. (20 points, 10 each) For each of the SQL queries below, give the canonical query tree and an optimized tree based on the heuristics from section 19.7. Refer to the Library database schema in Figure 5.

(a)
    select Title, Author_name, Publisher_name
    from Book B, Book_Authors A, Publisher P
    where P.Name = Publisher_name
      and A.Book_id = B.Book_id
      and P.Phone like '406-%';

See Figures 1 and 2.

(b)
    select Name, Address, Phone, Title
    from Borrower B, Book_Loans BL,
         (select Book_id as BID, Title
          from Book, Book_Copies C, Library_Branch L
          where Book.Book_id = C.Book_id
            and C.No_of_copies = 1
            and C.Branch_id = L.Branch_id
            and (L.Branch_name = 'Montevideo' or L.Branch_name = 'Bird Island'))
    where B.Card_no = BL.Card_no
      and BID = BL.Book_id;

See Figures 3 and 4.

Figure 1: Canonical query tree, first query (P = Publisher, B = Book, A = Book_Authors)
$\pi_{\text{Title, Author\_name, Publisher\_name}}\big(\sigma_{\text{P.Name} = \text{Publisher\_name} \,\wedge\, \text{A.Book\_id} = \text{B.Book\_id} \,\wedge\, \text{P.Phone LIKE '406-\%'}}(P \times B \times A)\big)$

Figure 2: Heuristic-optimized query tree, first query
$\pi_{\text{Title, Author\_name, Publisher\_name}}\Big(\pi_{\text{Title, Author\_name, Publisher\_name}}\big(B \bowtie_{\text{A.Book\_id} = \text{B.Book\_id}} A\big) \bowtie_{\text{P.Name} = \text{Publisher\_name}} \pi_{\text{P.Name}}\big(\sigma_{\text{P.Phone LIKE '406-\%'}}(P)\big)\Big)$

Figure 3: Canonical query tree, second query (B = Borrower, BL = Book_Loans, L = Library_Branch, C = Book_Copies)
$\pi_{\text{Name, Address, Phone, Title}}\Big(\sigma_{\text{B.Card\_no} = \text{BL.Card\_no} \,\wedge\, \text{BID} = \text{BL.Book\_id}}\big(B \times BL \times \rho_{(\text{BID, Title})}\big(\pi_{\text{Book\_id, Title}}(\sigma_{\theta}(\text{Book} \times L \times C))\big)\big)\Big)$
where $\theta \equiv \text{Book.Book\_id} = \text{C.Book\_id} \wedge \text{C.No\_of\_copies} = 1 \wedge \text{C.Branch\_id} = \text{L.Branch\_id} \wedge (\text{L.Branch\_name} = \text{'Montevideo'} \vee \text{L.Branch\_name} = \text{'Bird Island'})$

Figure 4: Heuristic-optimized query tree, second query
$\pi_{\text{Name, Address, Phone, Title}}\Big(\pi_{\text{Name, Address, Phone, Book\_id}}\big(B \bowtie_{\text{B.Card\_no} = \text{BL.Card\_no}} \pi_{\text{Book\_id, Card\_no}}(BL)\big) \bowtie_{\text{BID} = \text{BL.Book\_id}} \rho_{(\text{BID, Title})}\Big(\pi_{\text{Book\_id, Title}}\big(\pi_{\text{Book\_id, Branch\_id, Title}}\big(\pi_{\text{Book\_id, Title}}(\text{Book}) \bowtie_{\text{Book.Book\_id} = \text{C.Book\_id}} \sigma_{\text{C.No\_of\_copies} = 1}(C)\big) \bowtie_{\text{C.Branch\_id} = \text{L.Branch\_id}} \pi_{\text{Branch\_id}}\big(\sigma_{\text{L.Branch\_name} = \text{'Montevideo'} \vee \text{L.Branch\_name} = \text{'Bird Island'}}(L)\big)\big)\Big)\Big)$

Figure 5: Library database schema
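As a cross-check on questions 1 and 2, the following Python sketch recomputes each figure from the stated parameters. It assumes the same simple cost model the answers use: a randomly placed block costs s + rd + B/tr, and a contiguous, double-buffered scan pays the seek time and rotational delay only once. The variable and function names (for example `block_access_ms`) are illustrative choices, not part of the assignment.

```python
import math

# Disk parameters (question 1).
B = 512      # block size, bytes
s = 30.0     # average seek time, ms
rd = 12.5    # average rotational delay, ms
tr = 512.0   # transfer rate, bytes per ms


def block_access_ms() -> float:
    """Locate and transfer one block at a random address: s + rd + B/tr."""
    return s + rd + B / tr


# STUDENT file (question 2): Name, Ssn, Address, Phone, Birth date, Sex,
# Major dept code, Minor dept code, Class code, Degree program.
r = 20_000
field_sizes = [30, 9, 40, 10, 8, 1, 4, 4, 4, 3]
R = sum(field_sizes) + 1        # plus one deletion-marker byte
bfr = B // R                    # unspanned: floor(B / R) records per block
b = math.ceil(r / bfr)          # number of file blocks

avg_blocks = b / 2              # a linear search examines half the blocks on average

# Contiguous + double buffering: one seek and rotational delay, then sequential transfers.
linear_contiguous_ms = s + rd + avg_blocks * (B / tr)
# Randomly placed blocks: every block pays the full access cost.
linear_random_ms = avg_blocks * block_access_ms()
# Binary search on the Ssn-ordered file: ceil(log2 b) random block accesses.
binary_probes = math.ceil(math.log2(b))
binary_ms = binary_probes * block_access_ms()

if __name__ == "__main__":
    print("block access ms:", block_access_ms())
    print("R, bfr, b:", R, bfr, b)
    print("linear (contiguous) ms:", linear_contiguous_ms)
    print("linear (random) ms:", linear_random_ms)
    print("binary search ms:", binary_ms)
```

Evaluating it gives 43.5 ms per block, R = 114 B, bfr = 4, b = 5000, and average search times of 2542.5 ms (contiguous linear), 108,750 ms (random linear), and 565.5 ms (binary), matching the answers to questions 1 and 2.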
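The B+-tree sizing in question 3 can likewise be sketched in Python. This assumes, as the answers do, that a branch node of order p satisfies $p \cdot P + (p - 1) \cdot V \le B$, that each leaf holds (key, record pointer) pairs plus one next-leaf pointer, and that every node is filled to $\lceil 0.69 \cdot \text{order} \rceil$ entries. The helper `level_sizes` is a hypothetical name used only for illustration.

```python
import math

# Parameters from question 3 (B+-tree index on Ssn).
B = 512        # block size, bytes
P = 6          # block pointer, bytes
P_R = 7        # record pointer, bytes
V = 9          # Ssn search-key size, bytes
r = 30_000     # EMPLOYEE records
FILL = 0.69    # fraction of each node assumed to be in use

# Branch node of order p: p block pointers and p - 1 keys, p*P + (p-1)*V <= B.
p = (B - P) // (V + P) + 1
# Leaf node: p_leaf (key, record pointer) pairs plus one next-leaf pointer.
p_leaf = (B - P) // (V + P_R)

keys_per_leaf = math.ceil(FILL * p_leaf)   # index entries per leaf
fan_out = math.ceil(FILL * p)              # block pointers per branch node


def level_sizes(n_entries: int) -> list[int]:
    """Number of blocks at each level, from the leaf level up to the root."""
    levels = [math.ceil(n_entries / keys_per_leaf)]
    while levels[-1] > 1:
        levels.append(math.ceil(levels[-1] / fan_out))
    return levels


levels = level_sizes(r)          # file unordered on Ssn: one index entry per record
x = len(levels)                  # number of tree levels
total_blocks = sum(levels)
accesses = x + 1                 # one block per level, plus the data block
# Cross-check with the height formula x = ceil(log_fo(b1)) + 1.
x_formula = math.ceil(math.log(levels[0], fan_out)) + 1

if __name__ == "__main__":
    print("p, p_leaf:", p, p_leaf)
    print("levels (leaf to root):", levels)
    print("x, total blocks, accesses:", x, total_blocks, accesses)
```

It reproduces p = 34, $p_{leaf}$ = 31, level sizes 1364, 57, 3 and 1 (x = 4, 1425 blocks in total), and 5 block accesses per lookup. The iterative loop and the logarithmic formula agree here; the formula form assumes more than one leaf block.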