2015-DBII-Exam-Cairo-CS-ver5-solution

German University in Cairo
Faculty of Media Engineering and Technology
Exam Solution
CSEN 604: Databases II
(MET – CSEN)
Spring 2015 Semester
Dr. Wael Abouelsaadat
Date: May 23rd, 2015
Duration: 3 hours
Do not turn this page until you have received the signal to start.
In the meantime, read the instructions below carefully.
This exam consists of 6 questions (numbered 1 to 6) on 14 pages (including this one and an aid sheet on the last
page), printed on one side of the paper. When you receive the signal to start, please make sure that your copy of the
examination is complete.
Answer each question directly on the examination paper, in the space provided, and use the reverse side of
the page for rough work. If you need more space for one of your solutions, use the reverse side of the page
and indicate clearly the part of your work that should be marked.
1. (Index Structures)        _________________ / 25
2. (Result Size Estimation)  _________________ / 12
3. (I/O Cost Estimation)     _________________ / 16
4. (Concurrency Control)     _________________ / 16
5. (Logs & Recovery)         _________________ / 20
6. (SQL Transaction Modes)   _________________ / 11
TOTAL                        _________________ / 100
Question 1. Index Structures [25 marks total]
a) [4 marks] Suppose we store a relation R(x,y) in a grid file. Both attributes have a range of values
from 0 to 1000. The partitions of this grid file happen to be uniformly spaced; for x there are
partitions every 20 units, at 20, 40, 60, and so on, while for y the partitions are every 50 units, at
50, 100, 150, and so on.
How many buckets do we have to examine to answer the range query:
SELECT * FROM R WHERE 310 < x AND x < 400 AND 520 < y AND y < 730;
Solution & marking:
25 [Either gets it or none]
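The count of 25 can be checked with a short sketch. It assumes stripe k of a dimension covers the half-open interval [k*width, (k+1)*width) and that the query bounds are strict, as in the SQL above; the function name `stripes_touched` is only illustrative.

```python
def stripes_touched(lo, hi, width):
    """Count grid stripes of the given width that intersect the open interval (lo, hi)."""
    first = lo // width            # stripe holding values just above lo
    last = -(-hi // width) - 1     # ceil(hi / width) - 1: stripe holding values just below hi
    return last - first + 1

x_stripes = stripes_touched(310, 400, 20)   # stripes starting at 300, 320, 340, 360, 380
y_stripes = stripes_touched(520, 730, 50)   # stripes starting at 500, 550, 600, 650, 700
buckets = x_stripes * y_stripes
print(x_stripes, y_stripes, buckets)        # 5 5 25
```

Each dimension contributes 5 stripes, so the query rectangle overlaps 5 x 5 = 25 buckets.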
b) [6 marks] Find any/all violations of the B+ tree structure in the following diagram. Circle each
bad node and give a brief explanation of each error. Assume the order of the tree is 4 (n=4; 4
keys, 5 pointers)
Solution & marking:
- [1 mark] Interior Node 10: below min key req
- [1 mark] Interior Node 13,20: key 20 duplicated from root.
- [1 mark] Leaf node 20,21: both keys are not less than root.
- [1 mark] Leaf node 22,23: key 22 not less than parent key 22
- [2 marks] Leaf nodes 8,9 and 6,7: swapped positions – leaf key 6 less than parent key 8
c) [15 marks] Consider the extensible hashing index shown below. To insert an entry, it is
translated to binary and the first n bits from the right are used. For example, 36 is binary 100100
while 51 is binary 110011.
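As a small illustration of "the first n bits from the right", the sketch below extracts the low-order bits of a hash value. The helper name `low_bits` is an assumption made for this example; it is not part of the exam.

```python
def low_bits(value, n):
    """Return the n rightmost bits of value as a zero-padded binary string."""
    return format(value & ((1 << n) - 1), f"0{n}b")

for key in (36, 51):
    print(key, format(key, "b"), low_bits(key, 2), low_bits(key, 3))
# 36 100100 00 100
# 51 110011 11 011
```

With a directory of depth 2, 36 would be routed by the bits 00 and 51 by the bits 11; at depth 3 the bits are 100 and 011.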
i. [2 marks] Is it possible to identify the last entry that was inserted into the index? If yes,
specify the entry.
Solution & marking:
[2 marks] No, it could be any one of the data entries in the index. [Justification is not
required since I did not ask to justify why not] We can always find a sequence of
insertions and deletions with a particular key value, among the key values shown in the
index, as the last insertion.
ii. [2 marks] Which entry is guaranteed to be not the last one inserted?
Solution & marking:
10
[Either gets it or none]. [Justification not required since I did not ask to justify]
iii. [2 marks] Suppose you are told that there have been no deletions from this index so
far. Which buckets were last split?
Solution:
The last insertion which caused a split cannot be in Bucket C. Buckets B and C or C and
D could have made a possible bucket-split combination but the total number of data
entries in these combinations is 4 and the absence of deletions demands a sum of at least
5 data entries for such combinations. Buckets B and D can form a possible bucket-split
combination because they have a total of 6 data entries between themselves. So do A and
E. But for the B and D to be split images, the starting global depth should have been 1. If
the starting global depth is 2, then the last insertion causing a split would be in A or E.
[Either gets it or none]. [Justification not required since I did not ask to justify]
iv. [3 marks] Show the index after inserting an entry with hash value 68 (1000100).
Solution:
[In case of error, partial marks should be given]
v. [3 marks] Show the index after inserting entries with hash values 17 (10001) and 69
(1000101) into the original index
Solution:
[In case of error, partial marks should be given]
vi. [3 marks] Show the index after deleting the entry with hash value 10 from the original
index. Is a merge of buckets triggered by this deletion? If not, explain why.
Solution:
[In case of error, partial marks should be given]
Question 2. Result Size Estimation [12 marks total]
Consider the following tables:
- table student with attributes ID, name, major, credits
- table course with title, instructor, credits
- table registered with attributes student and course
- registered.student is a foreign key to student ID.
- Attribute course of relation registered is a foreign key to attribute title of relation course.
Given are the following statistics:
T(student) = 30,000
V (student, ID) = 30,000
V (student, name) = 29,500
V (student, major) = 20
T(course) = 80
V (course, title) = 80
V (course, instructor) = 50
V (course, credits) = 6
T(registered) = 10,000
V (registered, student) = 3,000
V (registered, course) = 30
V (student, credits) = 32
The min and max values for some of the columns are:
min(course, credits) = 0
max(course, credits) = 36
min(student, credits) = 0
max(student, credits) = 36
a) [2 marks] Estimate the number of result tuples for the query q = σ_{major=CS}(student)
Solution:
b) [3 marks] Estimate the number of result tuples for the following query of OR’ed terms
q = σ_{major=CS ∨ major=Bio}(student)
Solution:
c) [3 marks] Estimate the number of result tuples for the following query with ANDed terms
q = σ_{credits≥32 ∧ credits≤34}(student)
Your solution must take into consideration the given min and max values for credits.
Solution:
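The following is a hedged sketch of how the selection estimates in parts a) to c) could be computed with common textbook estimators. It is not the official marking scheme. It assumes uniformly distributed values, treats credits as integers in the range given by min and max, and shows two alternative estimators for the OR query; all function names are illustrative.

```python
T_STUDENT = 30_000
V_MAJOR = 20
CREDITS_MIN, CREDITS_MAX = 0, 36

def equality_estimate(t, v):
    """Equality selection: T(R) / V(R, a)."""
    return t / v

def disjoint_or_estimate(t, v, k):
    """OR of k equalities on the same attribute: the terms cannot overlap, so add them."""
    return k * t / v

def independent_or_estimate(t, selectivities):
    """Generic OR estimator assuming independent terms: T * (1 - prod(1 - s_i))."""
    miss = 1.0
    for s in selectivities:
        miss *= 1.0 - s
    return t * (1.0 - miss)

def integer_range_estimate(t, lo, hi, dom_min, dom_max):
    """Closed range lo..hi over an integer domain dom_min..dom_max, uniform values assumed."""
    lo, hi = max(lo, dom_min), min(hi, dom_max)
    if hi < lo:
        return 0.0
    return t * (hi - lo + 1) / (dom_max - dom_min + 1)

a = equality_estimate(T_STUDENT, V_MAJOR)                              # 30,000 / 20
b_disjoint = disjoint_or_estimate(T_STUDENT, V_MAJOR, 2)               # 2 * 30,000 / 20
b_independent = independent_or_estimate(T_STUDENT, [1 / V_MAJOR] * 2)  # 30,000 * (1 - 0.95^2)
c = integer_range_estimate(T_STUDENT, 32, 34, CREDITS_MIN, CREDITS_MAX)  # 30,000 * 3 / 37
print(a, b_disjoint, round(b_independent), round(c))
```

Under these assumptions the estimates are 1,500 for a), 3,000 (disjoint terms) or about 2,925 (independence formula) for b), and about 2,432 for c). A grader may accept a different range estimator, for example one based on (34 - 32) / (36 - 0).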
d) [4 marks] Estimate the number of result tuples for the following join query
q = student ⋈_{ID=student} registered ⋈_{course=title} course
Solution:
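A sketch of the join estimate, assuming the usual textbook formula T(R ⋈ S) = T(R) * T(S) / max(V(R, a), V(S, a)) and assuming that the number of distinct course values in the intermediate result stays at V(registered, course) = 30. This is an illustration under those assumptions, not the official solution.

```python
def join_estimate(t_left, t_right, v_left, v_right):
    """Equi-join size estimate: T(R) * T(S) / max(V(R, a), V(S, a))."""
    return t_left * t_right / max(v_left, v_right)

# student JOIN registered on ID = student
step1 = join_estimate(30_000, 10_000, 30_000, 3_000)
# (student JOIN registered) JOIN course on course = title
step2 = join_estimate(step1, 80, 30, 80)
print(step1, step2)   # 10000.0 10000.0
```

Both joins follow foreign keys, so the estimate stays at T(registered) = 10,000 tuples, which matches the intuition that every registration pairs with exactly one student and one course.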
Question 3. I/O Cost Estimation [16 marks total]
a) [9 marks] Consider two relations R and S with B(R) = 3,500 and B(S) = 2,300. You have M =
101 memory pages available. Compute the number of I/O operations for each join method below.
i) [3 marks] Block nested-loop join
Solution:
ii) [3 marks] Merge-join (inputs not sorted)
Solution:
[An answer that multiplies by 5, as in the lecture slides, will also be accepted as correct, since the
two are alternative estimation techniques]
iii) [3 marks] Hash-join
Solution:
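The three costs can be sketched with the standard textbook formulas. The sketch assumes that block nested-loop join loads the outer relation in chunks of M - 1 pages (some texts use M - 2) and that merge-join and hash-join run in two passes at 3 * (B(R) + B(S)) I/Os; the factor 5 is the alternative sort-then-merge estimate mentioned in the marking note. Function names are illustrative.

```python
def block_nested_loop(b_outer, b_inner, m):
    """Read the outer once; scan the inner once per chunk of m - 1 outer pages."""
    chunks = -(-b_outer // (m - 1))   # ceiling division
    return b_outer + chunks * b_inner

def two_pass_cost(b_r, b_s, factor=3):
    """Two-pass sort-merge or hash join: factor * (B(R) + B(S)); output is not counted."""
    return factor * (b_r + b_s)

B_R, B_S, M = 3_500, 2_300, 101

print(block_nested_loop(B_S, B_R, M))   # S as outer: 2300 + 23 * 3500 = 82800
print(block_nested_loop(B_R, B_S, M))   # R as outer: 3500 + 35 * 2300 = 84000
print(two_pass_cost(B_R, B_S))          # 17400
print(two_pass_cost(B_R, B_S, 5))       # 29000
print(B_R + B_S <= M * M)               # True: two-pass sort-merge fits in memory
```

Using the smaller relation S as the outer gives the cheaper nested-loop plan. The two-pass formulas apply because B(R) + B(S) = 5,800 is below M squared = 10,201.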
b) [7 marks] Assume you have a database with the following relations:
Customers(CustID, Name, Age, Gender), stored on 100,000 disk pages (aka blocks)
Purchases(CustID, Product, Date, Location, Amount), stored on 2,000,000 disk pages
SalesCalls(CustID, Salesperson, Date, Result), stored on 300,000 disk pages
From the database query log, you have observed the following query mix on this database:
10% queries selecting on Customers.CustID
30% queries selecting on Customers.Name
35% queries selecting on Purchases.Product
10% queries selecting on SalesCalls.SalesPerson
15% queries selecting on SalesCalls.Date
You want to create indices over these relations to speed up queries over these relations and you have
enough resources to build these indices. You may assume that the index allows you to retrieve the
answer to the query with significantly less cost than doing a table scan. You can also assume that the
savings obtained by building an index on an attribute is proportional to the number of pages in the
relation multiplied by the number of queries that use the index.
Which two attributes are best to build indices on? (because they will achieve the best performance
enhancement). Justify your answer.
Solution:
Purchases.Product and SalesCalls.Date
Purchases is the largest relation and Purchases.Product is the most queried attribute (0.35 x 2,000,000 = 700,000).
For the second index, we have to choose between Customers.Name and SalesCalls.Date. Of these two,
SalesCalls.Date is more useful since the savings obtained by building an index on an attribute is
proportional to the number of pages in the corresponding relation multiplied by the number of queries
using that index: 0.15 x 300,000 = 45,000 > 0.30 x 100,000 = 30,000
[each 3.5; 1.5 marks for choice and 2 marks for justification]
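The ranking can be reproduced with a short sketch that scores each attribute as query share times relation size, as the question permits. The dictionary layout and variable names are assumptions made for illustration.

```python
PAGES = {"Customers": 100_000, "Purchases": 2_000_000, "SalesCalls": 300_000}

# Query mix from the log, as whole percentages.
QUERY_MIX = {
    ("Customers", "CustID"): 10,
    ("Customers", "Name"): 30,
    ("Purchases", "Product"): 35,
    ("SalesCalls", "Salesperson"): 10,
    ("SalesCalls", "Date"): 15,
}

# Benefit score: percentage of queries * pages in the relation / 100.
benefit = {
    f"{rel}.{attr}": pct * PAGES[rel] // 100
    for (rel, attr), pct in QUERY_MIX.items()
}
best_two = sorted(benefit, key=benefit.get, reverse=True)[:2]
print(benefit)
print(best_two)   # ['Purchases.Product', 'SalesCalls.Date']
```

The scores are 700,000 for Purchases.Product and 45,000 for SalesCalls.Date, ahead of 30,000 each for Customers.Name and SalesCalls.Salesperson and 10,000 for Customers.CustID.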
Question 4. Concurrency Control [16 marks total]
a) [3 marks] Consider a database that is read-only (i.e., no transactions change any data in the
database). Suppose serializability needs to be supported. Place a check mark in front of each
correct statement:
__T__ No locking is necessary.
_____ Only read locks are necessary and they need to be held until end of transaction.
_____ Only read locks are necessary but they can be released as soon as the read is
complete.
_____ Both read and write locks are necessary and locking must be done in two phases.
_____ None of the above.
b) [4 marks] In the schedule given below, the label Ri(X) indicates a read of element X by
transaction Ti, and Wi(X) indicates a write of element X by transaction Ti. Draw the precedence
graph for the schedule below. Is the schedule conflict-serializable? If so, what is the order of the
three transactions if run serially?
R2(A) R1(C) R2(B) R3(B) W2(B) R1(A) R3(C) W3(C) W1(A)
Solution:
Schedule is not conflict-serializable because the precedence graph has a cycle.
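The cycle can be confirmed with a small sketch that builds the precedence graph from the schedule string. It assumes the usual conflict rule (two operations of different transactions on the same element, at least one of them a write); the function names are illustrative.

```python
import re

SCHEDULE = "R2(A) R1(C) R2(B) R3(B) W2(B) R1(A) R3(C) W3(C) W1(A)"

def precedence_edges(schedule):
    """Return edges (Ti, Tj) meaning an operation of Ti conflicts with a later one of Tj."""
    ops = [(m.group(1), int(m.group(2)), m.group(3))
           for m in re.finditer(r"([RW])(\d+)\((\w+)\)", schedule)]
    edges = set()
    for i, (a1, t1, x1) in enumerate(ops):
        for a2, t2, x2 in ops[i + 1:]:
            if x1 == x2 and t1 != t2 and "W" in (a1, a2):
                edges.add((t1, t2))
    return edges

def has_cycle(edges):
    """Depth-first search for a back edge in the directed graph."""
    graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set())
    state = {}   # node -> "visiting" or "done"

    def visit(u):
        state[u] = "visiting"
        for v in graph[u]:
            if state.get(v) == "visiting":
                return True
            if v not in state and visit(v):
                return True
        state[u] = "done"
        return False

    return any(visit(u) for u in graph if u not in state)

edges = precedence_edges(SCHEDULE)
print(sorted(edges))      # [(1, 3), (2, 1), (3, 2)]
print(has_cycle(edges))   # True
```

The edges are T2 -> T1 (on A), T3 -> T2 (on B) and T1 -> T3 (on C), which form the cycle T1 -> T3 -> T2 -> T1.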
c) [1 mark] In the case of 3 transactions T1, T2, T3, list all possible serial schedules.
Solution:
T1,T2,T3
T2,T3,T1
T1,T3,T2
T3,T1,T2
T2,T1,T3
T3,T2,T1
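The six schedules are the 3! permutations of the three transactions, which a short sketch can enumerate:

```python
from itertools import permutations

serial_orders = [",".join(p) for p in permutations(["T1", "T2", "T3"])]
print(len(serial_orders))   # 6
print(serial_orders)
```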
d) [3 marks] Justify why running any one of the serial schedules in your answer to c) is valid
despite the fact that each schedule might result in a different database state.
Solution:
Each serial schedule will leave the database in a new consistent state. It does not matter
which transaction is executed first; as long as the schedule is serial, the result is acceptable.
e) [2 marks] What is two-phase locking?
Solution:
A transaction must obtain all the locks it needs before releasing any lock, i.e., every lock
request precedes every unlock request (a growing phase followed by a shrinking phase).
f) [3 marks] Describe an example of two transactions, each with a sequence of read/write steps,
running concurrently, where using locking but not two-phase locking will produce an inconsistent
database state. Specify values for the records you are reading/writing to demonstrate your
solution.
Solution:
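One possible illustration, offered as a sketch and not as the official solution: two records A and B start at 25 and must stay equal. T1 adds 100 to each, T2 doubles each. Every update is done under a lock, but the lock is released right after the update, so T1 unlocks A before it locks B, which violates two-phase locking. The values, helper names and interleaving are all assumptions chosen for this example.

```python
def run(steps, db):
    """Apply steps in order; each step locks one item, updates it, and unlocks it at once."""
    db = dict(db)
    held = {}   # item -> transaction currently holding its lock
    for txn, item, update in steps:
        assert item not in held        # the previous holder has already released the lock
        held[item] = txn               # lock
        db[item] = update(db[item])    # read and write under the lock
        del held[item]                 # unlock immediately: this is what breaks 2PL
    return db

def add100(v):
    return v + 100

def double(v):
    return v * 2

initial = {"A": 25, "B": 25}

interleaved = [("T1", "A", add100), ("T2", "A", double),
               ("T2", "B", double), ("T1", "B", add100)]
serial_t1_t2 = [("T1", "A", add100), ("T1", "B", add100),
                ("T2", "A", double), ("T2", "B", double)]

print(run(interleaved, initial))    # {'A': 250, 'B': 150}
print(run(serial_t1_t2, initial))   # {'A': 250, 'B': 250}
```

Either serial order keeps A equal to B (250 each, or 150 each for T2 before T1), while the interleaved non-2PL execution ends with A = 250 and B = 150.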
Question 5. Logs & Recovery [20 marks total]
a) [6 marks] Undo Logging
Consider the following sequence of UNDO log records with a non-quiescent checkpoint:
<START S>
<S,A,60>
<COMMIT S>
<START T>
<T,A,10>
<START U>
<U,B,20>
<T,C,30>
<START V>
<U,D,40>
<START CKPT(T,U,V)>
<V,F,70>
<COMMIT U>
<T,E,50>
<COMMIT T>
<V,B,80>
<COMMIT V>
i. [1 mark] When is the <END CKPT> record written?
Solution:
<END CKPT> is written once all transactions listed in the checkpoint (T, U and V) have committed,
so it appears immediately after <COMMIT V>.
ii. [5 marks] For each possible point at which a crash could occur, how far back in the log must
we look to find all possible incomplete transactions?
Solution:
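The backward scan for an undo log with a non-quiescent checkpoint can be sketched as follows. This is an illustration under stated assumptions, not the official solution: <END CKPT> is assumed to be written right after <COMMIT V>, there is no earlier checkpoint, and the tuple encoding of records and the function name `undo_scan_limit` are invented for this example.

```python
LOG = [
    ("START", "S"), ("UPDATE", "S", "A", 60), ("COMMIT", "S"),
    ("START", "T"), ("UPDATE", "T", "A", 10),
    ("START", "U"), ("UPDATE", "U", "B", 20), ("UPDATE", "T", "C", 30),
    ("START", "V"), ("UPDATE", "U", "D", 40),
    ("START_CKPT", ("T", "U", "V")),
    ("UPDATE", "V", "F", 70), ("COMMIT", "U"), ("UPDATE", "T", "E", 50),
    ("COMMIT", "T"), ("UPDATE", "V", "B", 80), ("COMMIT", "V"),
    ("END_CKPT",),   # assumed position: after the last listed transaction commits
]

def undo_scan_limit(prefix):
    """Index of the earliest record that recovery must examine for this log prefix."""
    committed = set()
    for i in range(len(prefix) - 1, -1, -1):
        rec = prefix[i]
        if rec[0] == "COMMIT":
            committed.add(rec[1])
        elif rec[0] == "END_CKPT":
            # Checkpoint finished: stop at the matching START_CKPT.
            for j in range(i - 1, -1, -1):
                if prefix[j][0] == "START_CKPT":
                    return j
        elif rec[0] == "START_CKPT":
            # Crash during the checkpoint: go back to the earliest start
            # of a listed transaction that has not committed.
            incomplete = [t for t in rec[1] if t not in committed]
            if not incomplete:
                return i
            return min(j for j in range(i)
                       if prefix[j][0] == "START" and prefix[j][1] in incomplete)
    return 0   # no checkpoint seen: scan the whole log

for last in (9, 10, 12, 14, 16, 17):
    print(LOG[last], "->", LOG[undo_scan_limit(LOG[:last + 1])])
```

Under these assumptions, a crash before <START CKPT(T,U,V)> requires scanning to the beginning of the log; a crash after <START CKPT> but before <COMMIT T> requires going back to <START T>; after <COMMIT T> but before <COMMIT V>, back to <START V>; and once V has committed or <END CKPT> is on disk, only back to <START CKPT(T,U,V)>.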
b) [7 marks] Redo Logging
Consider the following set of redo log records. For each record listed in i to iii below, explain what
happens to both the disk and the log if a failure occurs and that record is the last one to appear on disk:
<START A>
<A, X, 4>
<A, Y, 2>
<START B>
<A, Z, 3>
<START C>
<B, M, 100>
<B, N, 50>
<C, L, 20>
<COMMIT B>
<START D>
<COMMIT C>
<D, O, 12>
<D, P, 13>
<COMMIT D>
<COMMIT A>
<START E>
<E, Q, 85>
<E, R, 32>
<COMMIT E>
i. [1 mark] <COMMIT C>
Solution:
ii. [3 marks] <START E>
Solution:
iii. [3 marks] <B, N, 50>
Solution:
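A sketch of redo-log recovery for the three crash points, assuming the standard rule: updates of committed transactions are redone in forward order, updates of uncommitted transactions are ignored (with redo logging they never reached disk), and an <ABORT> record is appended for each incomplete transaction. The tuple encoding and the name `redo_recover` are illustrative, and this is not the official solution.

```python
REDO_LOG = [
    ("START", "A"), ("WRITE", "A", "X", 4), ("WRITE", "A", "Y", 2),
    ("START", "B"), ("WRITE", "A", "Z", 3), ("START", "C"),
    ("WRITE", "B", "M", 100), ("WRITE", "B", "N", 50), ("WRITE", "C", "L", 20),
    ("COMMIT", "B"), ("START", "D"), ("COMMIT", "C"),
    ("WRITE", "D", "O", 12), ("WRITE", "D", "P", 13),
    ("COMMIT", "D"), ("COMMIT", "A"), ("START", "E"),
    ("WRITE", "E", "Q", 85), ("WRITE", "E", "R", 32), ("COMMIT", "E"),
]

def redo_recover(log, last):
    """Return (writes to redo in order, transactions that get an ABORT record)."""
    prefix = log[: log.index(last) + 1]
    committed = {r[1] for r in prefix if r[0] == "COMMIT"}
    redo = [(r[2], r[3]) for r in prefix if r[0] == "WRITE" and r[1] in committed]
    aborts = [r[1] for r in prefix if r[0] == "START" and r[1] not in committed]
    return redo, aborts

print(redo_recover(REDO_LOG, ("COMMIT", "C")))
# ([('M', 100), ('N', 50), ('L', 20)], ['A', 'D'])
print(redo_recover(REDO_LOG, ("START", "E")))
print(redo_recover(REDO_LOG, ("WRITE", "B", "N", 50)))
# ([], ['A', 'B', 'C'])
```

For <COMMIT C>, the writes of B and C are redone and A and D are aborted. For <START E>, all writes of A, B, C and D are redone and E is aborted. For <B, N, 50>, nothing is redone and A, B and C are aborted.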
c) [7 marks] Undo/Redo Logging
Consider the following set of undo/redo log records. For each record listed in i to iii below, explain
what happens to both the disk and the log if a failure occurs and that record is the last one to appear on disk:
<START A>
<A, X, 4, 41>
<A, Y, 2, 21>
<START B>
<A, Z, 3, 31>
<START C>
<B, M, 100, 101>
<B, N, 50, 51>
<C, L, 20, 21>
<COMMIT B>
<START D>
<COMMIT C>
<D, O, 12, 11>
<D, P, 13, 15>
<COMMIT D>
<COMMIT A>
<START E>
<E, Q, 85, 81>
<E, R, 32, 31>
<COMMIT E>
i. [1 mark] <COMMIT C>
Solution:
ii. [3 marks] <START E>
Solution:
iii. [3 marks] <B, N, 50, 51>
Solution:
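A sketch of undo/redo recovery for the three crash points, assuming the usual record layout <T, X, old, new> and the usual policy: redo the writes of committed transactions in forward order (install the new value), undo the writes of uncommitted transactions in backward order (restore the old value), and append an <ABORT> record for each incomplete transaction. The encoding and the name `undo_redo_recover` are illustrative; this is not the official solution.

```python
UNDO_REDO_LOG = [
    ("START", "A"), ("WRITE", "A", "X", 4, 41), ("WRITE", "A", "Y", 2, 21),
    ("START", "B"), ("WRITE", "A", "Z", 3, 31), ("START", "C"),
    ("WRITE", "B", "M", 100, 101), ("WRITE", "B", "N", 50, 51),
    ("WRITE", "C", "L", 20, 21), ("COMMIT", "B"), ("START", "D"),
    ("COMMIT", "C"), ("WRITE", "D", "O", 12, 11), ("WRITE", "D", "P", 13, 15),
    ("COMMIT", "D"), ("COMMIT", "A"), ("START", "E"),
    ("WRITE", "E", "Q", 85, 81), ("WRITE", "E", "R", 32, 31), ("COMMIT", "E"),
]

def undo_redo_recover(log, last):
    """Return (redo writes, undo writes, aborted transactions) for the given crash point."""
    prefix = log[: log.index(last) + 1]
    committed = {r[1] for r in prefix if r[0] == "COMMIT"}
    # Redo: new values of committed transactions, earliest first.
    redo = [(r[2], r[4]) for r in prefix
            if r[0] == "WRITE" and r[1] in committed]
    # Undo: old values of uncommitted transactions, latest first.
    undo = [(r[2], r[3]) for r in reversed(prefix)
            if r[0] == "WRITE" and r[1] not in committed]
    aborts = [r[1] for r in prefix if r[0] == "START" and r[1] not in committed]
    return redo, undo, aborts

print(undo_redo_recover(UNDO_REDO_LOG, ("COMMIT", "C")))
# ([('M', 101), ('N', 51), ('L', 21)], [('Z', 3), ('Y', 2), ('X', 4)], ['A', 'D'])
print(undo_redo_recover(UNDO_REDO_LOG, ("START", "E")))
print(undo_redo_recover(UNDO_REDO_LOG, ("WRITE", "B", "N", 50, 51)))
# ([], [('N', 50), ('M', 100), ('Z', 3), ('Y', 2), ('X', 4)], ['A', 'B', 'C'])
```

For <COMMIT C>, B and C are redone while A is undone and A and D are aborted. For <START E>, A, B, C and D are redone and E is aborted with nothing to undo. For <B, N, 50, 51>, no transaction has committed, so every logged write is undone and A, B and C are aborted.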
Question 6. SQL Transaction Modes [11 marks total]
In this question, you will provide an example with 2 transactions to show how each
transaction isolation level works. Your example must be different from the one in the lecture slides. You
will show the result of running the two transactions at the same time (i.e., concurrently) in the same
isolation level.
Note: below is the example from the lecture slides. Students are not allowed to use it. Each student should
come up with his/her own example solution.
T1: (max) SELECT MAX(price) FROM Sells WHERE bakery = 'Joe''s Bakery';
    (min) SELECT MIN(price) FROM Sells WHERE bakery = 'Joe''s Bakery';
T2: (del) DELETE FROM Sells WHERE bakery = 'Joe''s Bakery';
    (ins) INSERT INTO Sells VALUES ('Joe''s Bakery', 'French', 3.50);
a) [1 mark] Draw a table with column names and some data inside that you will use to answer
the rest of this question:
b) [2 marks] Write two SQL transactions that operate on the table you defined in a).
One of the two transactions must include update and insert SQL statements. The content of your
transactions must be relevant to this question and not just arbitrary SQL.
Solution:
Transaction 1 SQL statements:
Transaction 2 SQL statements:
c) [2 marks] For the SERIALIZABLE isolation level, what is the result of running T1 and T2
both at the same time in that level? Show the order of execution of the statements in T1 and T2.
d) [2 marks] For the REPEATABLE READ isolation level, what is the result of running T1 and
T2 both at the same time in that level? Show the order of execution of the statements in T1 and
T2.
e) [2 marks] For the READ COMMITTED isolation level, what is the result of running T1 and
T2 both at the same time in that level? Show the order of execution of the statements in T1 and
T2.
f) [2 marks] For the READ UNCOMMITTED isolation level, what is the result of running T1
and T2 both at the same time in that level? Show the order of execution of the statements in T1
and T2.
End of Exam