COP 5725 Advanced Database Systems
Fall 2023, Assignment 1
Instructor: Peixiang Zhao
TA: Jiyang Bai
Due date: Tuesday, 11/07/2023, via Canvas
Problem 1
SQL [10 points]
We define a database for FSU libraries as follows:
Books: (bookID INTEGER, title TEXT, genre TEXT, lib INTEGER REFERENCES
Libraries, PRIMARY KEY (bookID));
Libraries: (libraryID INTEGER, name TEXT, PRIMARY KEY (libraryID));
Checkouts: (book INTEGER REFERENCES Books, day DATETIME, PRIMARY KEY
(book, day)).
1. [3 points] Write a SQL query that returns the bookID and genre of each book
that has ever been checked out;
2. [3 points] Write a SQL query that returns the bookID of the book that has
been checked out the most times, and the corresponding checked-out count.
We assume that each book was checked out a unique number of times;
3. [4 points] Write a SQL query that finds the names of all pairs of libraries
that have books with matching titles. In the output, report the names of both
libraries (the first name is alphabetically less than the second) and the matching
titles of the books.
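To try out queries against this schema, it can be loaded into an in-memory SQLite database. The sketch below is only a test scaffold: the sample rows (library names, titles, dates) are made up for illustration and are not part of the assignment, and SQLite's handling of `DATETIME` and `REFERENCES` may differ from the DBMS used in class.

```python
import sqlite3

# In-memory database; nothing is written to disk.
conn = sqlite3.connect(":memory:")

# Libraries is created first because Books references it.
conn.executescript("""
CREATE TABLE Libraries (libraryID INTEGER, name TEXT, PRIMARY KEY (libraryID));
CREATE TABLE Books (bookID INTEGER, title TEXT, genre TEXT,
                    lib INTEGER REFERENCES Libraries, PRIMARY KEY (bookID));
CREATE TABLE Checkouts (book INTEGER REFERENCES Books, day DATETIME,
                        PRIMARY KEY (book, day));
""")

# Hypothetical sample data, chosen only so that every query has something to return.
conn.executemany("INSERT INTO Libraries VALUES (?, ?)",
                 [(1, "Library A"), (2, "Library B")])
conn.executemany("INSERT INTO Books VALUES (?, ?, ?, ?)",
                 [(10, "Title X", "fiction", 1),
                  (11, "Title X", "fiction", 2),
                  (12, "Title Y", "science", 2)])
conn.executemany("INSERT INTO Checkouts VALUES (?, ?)",
                 [(10, "2023-10-01 09:00:00"),
                  (10, "2023-10-05 14:30:00"),
                  (12, "2023-10-02 11:15:00")])

# Sanity check: row counts per table.
for table in ("Libraries", "Books", "Checkouts"):
    count = conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
    print(table, count)
```

A candidate query can then be run with `conn.execute("...").fetchall()` and checked by hand against the sample rows.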
Problem 2
Storage [10 points]
We record the following disk page access patterns: A B C D A F A D G D G E D F,
where each letter represents a disk page. Assume the buffer pool can accommodate
four pages, and they are all available at the beginning.
1. [4 points] Please compute the hit rate if the LRU policy is adopted;
2. [4 points] Please compute the hit rate if the MRU policy is adopted;
3. [2 points] As operating systems have implemented the buffer management policies, why wouldn’t we directly use OS policies in database systems?
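A hand trace of the two replacement policies can be checked with a small simulator. The following is a minimal sketch, assuming that a hit refreshes a page's recency under both policies and that eviction happens only when all frames are full; `count_hits` is a hypothetical helper name.

```python
def count_hits(trace, capacity, policy="LRU"):
    """Count buffer hits for a page trace under LRU or MRU replacement."""
    pool = []  # ordered from least recently used (front) to most recently used (back)
    hits = 0
    for page in trace:
        if page in pool:
            hits += 1
            pool.remove(page)          # will be re-appended as most recent
        elif len(pool) == capacity:
            # LRU evicts the front of the list, MRU evicts the back.
            pool.pop(0 if policy == "LRU" else -1)
        pool.append(page)
    return hits


trace = "A B C D A F A D G D G E D F".split()
for policy in ("LRU", "MRU"):
    hits = count_hits(trace, capacity=4, policy=policy)
    print(policy, f"{hits}/{len(trace)}")
```

The list-based pool keeps the code short; for long traces an ordered dictionary would avoid the linear-time membership test.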
Problem 3
B+ Tree [10 points]
Consider the following table schema:
Beers(Name VARCHAR(64), Age INTEGER, Price INTEGER);
Suppose we’ve built a B+-tree index upon the attribute Age. This index is two levels
deep (in addition to the root node), and the actual tuples are not stored in the leaf
nodes; instead, the leaf nodes reference tuples stored on separate data pages.
1. [3 points] Consider the following SQL query
SELECT * FROM Beers WHERE Age = 28
What is the worst-case number of I/O’s to execute this query if there is only
1 matching tuple? What about 2 matching tuples? What about 3 matching
tuples? We assume the B+-tree index upon Age is unclustered.
2. [3 points] Consider the following SQL query
SELECT * FROM Beers WHERE Price = 75
What is the worst-case number of I/O’s to execute this query if there are 100
data pages?
3. [4 points] Consider the following SQL query
DELETE FROM Beers WHERE Age = 20
Assume this query deletes one tuple from the table. How many I/O’s are required to execute this query?
Problem 4
kd-Tree [5 points]
Given a kd-tree index that is perfectly balanced, the index concerns two dimensions
(e.g., salary and age). For a query in which only one of the two dimensions is specified
(e.g., age = 35), prove that we wind up looking at about √n out of the n leaves of the
kd-tree index to answer the query.
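The claim can be sanity-checked numerically before proving it. The sketch below is an idealized count, not a proof: it assumes a perfectly balanced tree of even depth whose levels alternate between the two dimensions (starting with the specified one), and that the query value never coincides with a split value. `leaves_visited` is a hypothetical helper.

```python
import math


def leaves_visited(depth):
    """Leaves reached by a query that fixes only one of the two dimensions."""
    visited = 1
    for level in range(depth):
        # Even levels split on the specified dimension: follow one child.
        # Odd levels split on the other dimension: follow both children.
        if level % 2 == 1:
            visited *= 2
    return visited


depth = 10
n = 2 ** depth  # number of leaves in a perfectly balanced binary tree
print(leaves_visited(depth), math.isqrt(n))
```

The count doubles on only half of the levels, which is where the square root comes from.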
Problem 5
kd-Tree [25 points]
In the class, we learned that the kd-tree (k-dimensional tree) is primarily for
multidimensional search. In this question we consider a degenerate kd-tree that is used
to index one-dimensional data (where k = 1). Given an array A of n values,
A = [a1, a2, ..., an], we first build a kd-tree T for A, where the leaves of T store
the data points of A, and the internal nodes of T store the splitting values (i.e.,
medians) to guide the search. Let v denote the value stored at a split node t. The
left subtree of t contains all data points smaller than or equal to v, and the right
subtree of t contains all the data points strictly greater than v. The splitting value v
is the median of the data points at t: if there is an even number x of data points at t,
the median is the data point at rank x/2; otherwise, the median is the data point at
rank ⌈x/2⌉. See Figure 1 for some toy examples.
Figure 1: Two examples of kd-tree for (a) A = [1, 2] and (b) A = [1, 2, 3], respectively
1. [10 points] Given A = [63, 60, 110, 23, 81, 38, 50, 10, 5, 71, 30, 100, 90, 20], construct and draw the kd-tree for A. What is the time complexity of kd-tree
construction for a one-dimensional array A of size n?
2. [15 points] Given a range query [l, r], where l ≤ r, write an efficient algorithm to
report all the data points x in the one-dimensional kd-tree with n data points,
where l ≤ x ≤ r. What is the time complexity of the algorithm?
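The median rule and the search pruning described in this problem can be prototyped in a few lines. This is a minimal sketch rather than a reference solution: it assumes distinct values, represents a leaf as a bare number and a split node as a tuple `(v, left, right)`, and the names `build` and `range_query` are illustrative.

```python
def build(values):
    """Build a one-dimensional kd-tree from an unsorted list of distinct values."""
    return _build(sorted(values))


def _build(pts):
    if len(pts) == 1:
        return pts[0]                      # leaf: a single data point
    # Rank x/2 for even x and ceil(x/2) for odd x (ranks are 1-based).
    rank = (len(pts) + 1) // 2
    v = pts[rank - 1]
    # Left subtree holds points <= v, right subtree holds points > v.
    return (v, _build(pts[:rank]), _build(pts[rank:]))


def range_query(node, lo, hi, out=None):
    """Collect all data points x with lo <= x <= hi, in increasing order."""
    if out is None:
        out = []
    if not isinstance(node, tuple):        # leaf
        if lo <= node <= hi:
            out.append(node)
        return out
    v, left, right = node
    if lo <= v:                            # left subtree may contain points >= lo
        range_query(left, lo, hi, out)
    if hi > v:                             # right subtree may contain points <= hi
        range_query(right, lo, hi, out)
    return out


A = [63, 60, 110, 23, 81, 38, 50, 10, 5, 71, 30, 100, 90, 20]
tree = build(A)
print(range_query(tree, 30, 71))
```

Sorting once up front and slicing keeps the construction simple; the query descends only into subtrees that can intersect [l, r], so subtrees entirely outside the range are never visited.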
Problem 6
Quad-Tree [10 points]
Place all the data of Table 1 into a quad tree with dimensions speed and ram. Assume
that the range for speed is 1.00 to 5.00, and for ram it is 500 to 3,500. No leaf of
the quad tree should have more than two points (Hint: you may choose the format of
Figure 14.43 in the textbook).
Model   Speed   RAM    Hard disk
1001    2.66    1024   250
1002    2.10    512    250
1003    1.42    512    80
1004    2.80    1024   250
1005    3.20    512    250
1006    3.20    1024   320
1007    2.20    1024   200
1008    2.20    2048   250
1009    2.00    1024   250
1010    2.80    2048   300
1011    1.86    2048   160
1012    2.80    1024   160
Table 1: Some PC’s and their characteristics
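A hand-drawn quad tree can be cross-checked with a short script. The sketch below assumes that each overfull region is split at the midpoint of both ranges and that a point lying exactly on a dividing line goes to the upper or right quadrant; these conventions, the quadrant labels, and the helper names are assumptions, so the textbook's figure format may differ. It would also recurse without end if more than two points were identical, which is not the case for Table 1.

```python
# (model, speed, ram) rows copied from Table 1
PCS = [
    (1001, 2.66, 1024), (1002, 2.10, 512), (1003, 1.42, 512),
    (1004, 2.80, 1024), (1005, 3.20, 512), (1006, 3.20, 1024),
    (1007, 2.20, 1024), (1008, 2.20, 2048), (1009, 2.00, 1024),
    (1010, 2.80, 2048), (1011, 1.86, 2048), (1012, 2.80, 1024),
]


def build_quadtree(points, x_lo, x_hi, y_lo, y_hi, capacity=2):
    """Return a leaf (list of points) or a dict of four child quadrants."""
    if len(points) <= capacity:
        return list(points)
    x_mid = (x_lo + x_hi) / 2
    y_mid = (y_lo + y_hi) / 2
    buckets = {"SW": [], "SE": [], "NW": [], "NE": []}
    for point in points:
        _, speed, ram = point
        # Assumption: points on a dividing line go to the upper/right side.
        key = ("N" if ram >= y_mid else "S") + ("E" if speed >= x_mid else "W")
        buckets[key].append(point)
    return {
        "SW": build_quadtree(buckets["SW"], x_lo, x_mid, y_lo, y_mid, capacity),
        "SE": build_quadtree(buckets["SE"], x_mid, x_hi, y_lo, y_mid, capacity),
        "NW": build_quadtree(buckets["NW"], x_lo, x_mid, y_mid, y_hi, capacity),
        "NE": build_quadtree(buckets["NE"], x_mid, x_hi, y_mid, y_hi, capacity),
    }


def leaves(node):
    """Flatten the tree into a list of leaves (each leaf is a list of points)."""
    if isinstance(node, list):
        return [node]
    result = []
    for child in node.values():
        result.extend(leaves(child))
    return result


tree = build_quadtree(PCS, 1.00, 5.00, 500, 3500)
for leaf in leaves(tree):
    print([model for model, _, _ in leaf])
```

Each printed list is one leaf of the tree, given by model number; empty lists correspond to empty quadrants.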