COP 5725 Advanced Database Systems Fall 2023, Assignment 1
Instructor: Peixiang Zhao
TA: Jiyang Bai
Due date: Tuesday, 11/07/2021, via Canvas

Problem 1 SQL [10 points]

We define a database for FSU libraries as follows:

Books: (bookID INTEGER, title TEXT, genre TEXT, lib INTEGER REFERENCES Libraries, PRIMARY KEY (bookID));
Libraries: (libraryID INTEGER, name TEXT, PRIMARY KEY (libraryID));
Checkouts: (book INTEGER REFERENCES Books, day DATETIME, PRIMARY KEY (book, day));

1. [3 points] Write a SQL query that returns the bookID and genre of each book that has ever been checked out.
2. [3 points] Write a SQL query that returns the bookID of the book that has been checked out the most times, together with its checkout count. Assume that each book was checked out a unique number of times.
3. [4 points] Write a SQL query that finds all pairs of libraries that have books with matching titles. In the output, report the names of both libraries (the first name must be alphabetically less than the second) and the matching titles of the books.

Problem 2 Storage [10 points]

We record the following disk page access pattern: A B C D A F A D G D G E D F, where each letter represents a disk page. Assume the buffer pool can accommodate four pages, all of which are available at the beginning.

1. [4 points] Compute the hit rate if the LRU policy is adopted.
2. [4 points] Compute the hit rate if the MRU policy is adopted.
3. [2 points] Operating systems already implement buffer management policies. Why wouldn’t we use the OS policies directly in database systems?

Problem 3 B+ Tree [10 points]

Consider the following table schema:

Beers(Name VARCHAR(64), Age INTEGER, Price INTEGER);

Suppose we’ve built a B+-tree index upon the attribute Age.
This index is two levels deep (in addition to the root node), and the actual tuples are not stored at the leaf nodes; instead, the leaf nodes reference tuples stored in separate data pages.

1. [3 points] Consider the following SQL query:

   SELECT * FROM Beers WHERE Age = 28

   What is the worst-case number of I/O’s to execute this query if there is only 1 matching tuple? What about 2 matching tuples? What about 3 matching tuples? We assume the B+-tree index upon Age is unclustered.
2. [3 points] Consider the following SQL query:

   SELECT * FROM Beers WHERE Price = 75

   What is the worst-case number of I/O’s to execute this query if there are 100 data pages?
3. [4 points] Consider the following SQL query:

   DELETE FROM Beers WHERE Age = 20

   Assume this query deletes one tuple from the table. How many I/O’s are required to execute this query?

Problem 4 kd-Tree [5 points]

Consider a perfectly balanced kd-tree index over two dimensions (e.g., salary and age). For a query in which only one of the two dimensions is specified (e.g., age = 35), prove that we wind up looking at about √n out of the n leaves of the kd-tree index to answer the query.

Problem 5 kd-Tree [25 points]

In class, we learned that the kd-tree (k-dimensional tree) is primarily used for multidimensional search. In this question we consider a degenerate kd-tree that indexes one-dimensional data (k = 1). Given an array A of n values, A = [a_1, a_2, ..., a_n], we first build a kd-tree T for A, where the leaves of T store the data points of A and the internal nodes of T store the splitting values (i.e., medians) that guide the search. Let v denote the value stored at each split node t.
The left subtree of t contains all data points smaller than or equal to v, and the right subtree of t contains all the data points strictly greater than v. The splitting value v is the median of the data points at t: if there are an even number x of data points at t, the median is the data point at rank x/2; otherwise, the median is the data point at rank ⌈x/2⌉. See Figure 1 for some toy examples.

Figure 1: Two examples of kd-tree for (a) A = [1, 2] and (b) A = [1, 2, 3], respectively

1. [10 points] Given A = [63, 60, 110, 23, 81, 38, 50, 10, 5, 71, 30, 100, 90, 20], construct and draw the kd-tree for A. What is the time complexity of kd-tree construction for a one-dimensional array A of size n?
2. [15 points] Given a range query [l, r], where l ≤ r, write an efficient algorithm to report all the data points x in the one-dimensional kd-tree with n data points, where l ≤ x ≤ r. What is the time complexity of the algorithm?

Problem 6 Quad-Tree [10 points]

Place all the data of Table 1 into a quad tree with dimensions speed and ram. Assume that the range for speed is 1.00 to 5.00, and for ram it is 500 to 3,500. No leaf of the quad tree should have more than two points (Hint: you may choose the format of Figure 14.43 in the textbook).

Model   Speed   RAM    Hard disk
1001    2.66    1024   250
1002    2.10    512    250
1003    1.42    512    80
1004    2.80    1024   250
1005    3.20    512    250
1006    3.20    1024   320
1007    2.20    1024   200
1008    2.20    2048   250
1009    2.00    1024   250
1010    2.80    2048   300
1011    1.86    2048   160
1012    2.80    1024   160

Table 1: Some PC’s and their characteristics
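
The median rule and the left/right subtree convention stated in Problem 5 can be illustrated with a minimal Python sketch. It is only one possible reading of the definition, not a reference solution: the names Leaf, Split, median, and build are hypothetical, the data points are assumed to be distinct, and the code favors clarity over efficiency. It is run here only on the toy array A = [1, 2] mentioned in the Figure 1 caption.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Leaf:
    value: int  # a single data point of A


@dataclass
class Split:
    v: int                        # splitting value (median of the points at this node)
    left: Union["Split", Leaf]    # subtree holding data points <= v
    right: Union["Split", Leaf]   # subtree holding data points > v


def median(points: List[int]) -> int:
    """Median under the rule of Problem 5, with 1-based ranks.

    `points` must be sorted. (x + 1) // 2 equals x/2 for even x
    and ceil(x/2) for odd x.
    """
    x = len(points)
    rank = (x + 1) // 2
    return points[rank - 1]


def build(points: List[int]) -> Union[Split, Leaf]:
    """Build the one-dimensional tree; assumes distinct values (an assumption)."""
    points = sorted(points)
    if len(points) == 1:
        return Leaf(points[0])
    v = median(points)
    left = [p for p in points if p <= v]
    right = [p for p in points if p > v]
    return Split(v, build(left), build(right))


print(build([1, 2]))
# Split(v=1, left=Leaf(value=1), right=Leaf(value=2))
```

For A = [1, 2, 3] the same rule picks the point at rank ⌈3/2⌉ = 2 as the root split, so the left subtree holds {1, 2} and the right subtree is the single leaf 3.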