CS4320 Fall 2017
Final Exam
CS4320 Exam
December 7th, 2017
(150 minutes working time)
Name: __________________________________
Cornell NETID: _________
I understand and will adhere to the Cornell Code of Academic Integrity.
---------------------------------------------------------------Signature
Maximum number of points possible: 120. This exam counts for 30% of your overall
grade. Questions vary in difficulty. Do not get stuck on one question.
In all problems, whenever you think a problem is underspecified, make assumptions and
clearly state them.
Good luck!
You have 150 minutes working time
for this exam.
Part A) SQL Queries. (20 points)
Consider the database created by the following SQL commands. This database, used as example
throughout the lecture, stores information on sailors, boats, and reservations for boats by sailors.
CREATE TABLE sailors(sid INTEGER PRIMARY KEY,
name VARCHAR(100));
CREATE TABLE boats (bid INTEGER PRIMARY KEY,
color VARCHAR(5));
CREATE TABLE reserves(sid INTEGER, bid INTEGER, PRIMARY
KEY(sid, bid), FOREIGN KEY (sid) REFERENCES sailors(sid),
FOREIGN KEY (bid) REFERENCES boats(bid));
In the following, we ask you to find equivalent reformulations for given SQL queries (i.e., to find
another SQL query that produces exactly the same result for each possible database instance that
is consistent with all constraints implied by the schema definition).
A.1) Reformulate the following SQL query without using the AND keyword:
SELECT * FROM boats WHERE (bid <> 1 AND bid <> 2);
(5 points)
SELECT * FROM boats WHERE NOT (bid = 1 OR bid = 2);
A.2) Reformulate the following SQL query without using the sailors table in any FROM
clause:
SELECT S.sid FROM sailors S WHERE EXISTS (SELECT * FROM
reserves WHERE sid = S.sid);
(5 points)
SELECT DISTINCT sid FROM reserves;
A.3) Reformulate the following SQL query without using the ISNULL keyword:
SELECT S.sid FROM sailors S LEFT OUTER JOIN reserves R ON
(S.sid = R.sid) WHERE R.sid ISNULL;
(5 points)
SELECT S.sid FROM sailors S WHERE NOT EXISTS (SELECT * FROM
reserves R WHERE R.sid = S.sid);
A.4) Reformulate the following SQL query without using the HAVING keyword:
SELECT sid AS sailor, COUNT(*) AS count FROM reserves GROUP
BY sid HAVING COUNT(*) > 2;
(5 points)
SELECT * FROM (SELECT sid AS sailor, COUNT(*) AS count FROM
reserves GROUP BY sid) AS S WHERE count > 2;
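One way to sanity-check the four reformulations in Part A is to run each original query and its rewrite against a small sample instance and compare the results. The sketch below uses Python's built-in sqlite3 module. The sample rows and the helper names (build_sample_db, compare_all) are assumptions made for illustration, and agreement on a single instance is only evidence of equivalence, not a proof.

```python
import sqlite3

SCHEMA = """
CREATE TABLE sailors(sid INTEGER PRIMARY KEY, name VARCHAR(100));
CREATE TABLE boats(bid INTEGER PRIMARY KEY, color VARCHAR(5));
CREATE TABLE reserves(sid INTEGER, bid INTEGER, PRIMARY KEY(sid, bid),
    FOREIGN KEY (sid) REFERENCES sailors(sid),
    FOREIGN KEY (bid) REFERENCES boats(bid));
"""

# (original query, reformulation) for each sub-question of Part A.
PAIRS = {
    "A.1": ("SELECT * FROM boats WHERE (bid <> 1 AND bid <> 2)",
            "SELECT * FROM boats WHERE NOT (bid = 1 OR bid = 2)"),
    "A.2": ("SELECT S.sid FROM sailors S WHERE EXISTS "
            "(SELECT * FROM reserves WHERE sid = S.sid)",
            "SELECT DISTINCT sid FROM reserves"),
    "A.3": ("SELECT S.sid FROM sailors S LEFT OUTER JOIN reserves R "
            "ON (S.sid = R.sid) WHERE R.sid ISNULL",
            "SELECT S.sid FROM sailors S WHERE NOT EXISTS "
            "(SELECT * FROM reserves R WHERE R.sid = S.sid)"),
    "A.4": ("SELECT sid AS sailor, COUNT(*) AS count FROM reserves "
            "GROUP BY sid HAVING COUNT(*) > 2",
            "SELECT * FROM (SELECT sid AS sailor, COUNT(*) AS count "
            "FROM reserves GROUP BY sid) AS S WHERE count > 2"),
}


def build_sample_db():
    """Create an in-memory database with a few hypothetical rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO sailors VALUES (?, ?)",
                     [(1, "Ann"), (2, "Bob"), (3, "Cy"), (4, "Dee")])
    conn.executemany("INSERT INTO boats VALUES (?, ?)",
                     [(1, "red"), (2, "blue"), (3, "green"), (4, "red")])
    conn.executemany("INSERT INTO reserves VALUES (?, ?)",
                     [(1, 1), (1, 2), (1, 3), (2, 1), (2, 4), (3, 2)])
    return conn


def compare_all():
    """Return {label: (rows of original, rows of rewrite)}, both sorted."""
    conn = build_sample_db()
    return {label: (sorted(conn.execute(original).fetchall()),
                    sorted(conn.execute(rewrite).fetchall()))
            for label, (original, rewrite) in PAIRS.items()}


if __name__ == "__main__":
    for label, (original_rows, rewrite_rows) in compare_all().items():
        print(label, original_rows == rewrite_rows, original_rows)
```

Rows are sorted before comparison because none of the queries specifies an ORDER BY, so the two formulations are free to return rows in different orders.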
Part B) Relational Operators. (15 points)
The following questions refer to two relations, R and S. R contains 100,000 tuples with 1,000
tuples per disc page. S contains 50,000 tuples with 50 tuples per disc page.
B.1) We join R and S by an index nested loops join. A suitable index is defined on S; we assume
for simplicity that each index access requires exactly one disc page read. Calculate the number of
disc accesses for the join (do not consider the cost of writing out the join result).
(5 points)
We need to read the relation without index, R, which costs 100,000 / 1,000 = 100 disc reads.
Then, for each tuple in R, we access the index once: 100,000 disc reads. In total, we have
100,100 disc reads.
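The arithmetic for B.1 can be written as a small cost function. This is a sketch under the question's simplifying assumption of one page read per index access; the function and parameter names are made up for illustration.

```python
def index_nested_loops_reads(outer_tuples, outer_tuples_per_page,
                             reads_per_probe=1):
    """Disc reads: scan the outer relation once, then probe the index
    once for every outer tuple."""
    outer_pages = -(-outer_tuples // outer_tuples_per_page)  # ceiling division
    return outer_pages + outer_tuples * reads_per_probe


# R is the outer relation: 100,000 tuples, 1,000 tuples per page.
b1_reads = index_nested_loops_reads(100_000, 1_000)
print(b1_reads)  # 100100
```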
B.2) We join R and S by a block nested loops join. Assume that 102 buffer pages are available.
Choose the outer relation (and block size) leading to minimal join cost and calculate the number
of disc reads required for the join (do not count cost for writing out the join result).
(5 points)
Reserving one buffer page for the output and one as input buffer for the inner relation, 100 buffer
pages remain to store blocks of the outer operand. We choose the smaller relation, R with
100,000 / 1,000 = 100 pages versus S with 50,000 / 50 = 1,000 pages, as outer operand. Then, the
join cost is 100 + 1 * 1,000 = 1,100 disc reads.
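To see why R is the better outer relation in B.2, the cost can be evaluated for both choices. The sketch below assumes, as in the answer, that two buffer pages are reserved (one for output, one for the inner input); the function name is hypothetical.

```python
import math


def block_nested_loops_reads(outer_pages, inner_pages, buffer_pages):
    """Disc reads: read the outer relation once, and scan the inner
    relation once per block of outer pages."""
    block_pages = buffer_pages - 2  # one page for output, one for inner input
    outer_blocks = math.ceil(outer_pages / block_pages)
    return outer_pages + outer_blocks * inner_pages


r_pages = 100_000 // 1_000  # 100 pages
s_pages = 50_000 // 50      # 1,000 pages

cost_r_outer = block_nested_loops_reads(r_pages, s_pages, 102)
cost_s_outer = block_nested_loops_reads(s_pages, r_pages, 102)
print(cost_r_outer, cost_s_outer)  # 1100 2000
```

With S as the outer relation, its 1,000 pages need 10 blocks, so the inner relation R is scanned 10 times instead of S being scanned once.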
B.3) We join R and S by the hash join seen in the lecture (we assume that only one partitioning
pass is necessary). Calculate the number of times that a hash function is evaluated in order to
perform the join.
(5 points)
During the partitioning phase, a hash function is evaluated for each input tuple from both
relations. During the next phase, we use a second hash function on each input tuple to match
tuples in the same partition. In total, we have 2 * (100,000 + 50,000) = 300,000 evaluations.
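The count in B.3 can be reproduced with a toy partitioned hash join that tallies hash evaluations. This is a minimal sketch of a Grace-style hash join, assumed here to match the lecture's variant; the names h1, h2, and grace_hash_join are illustrative, and the inputs are scaled down to 100 and 50 tuples.

```python
from collections import defaultdict


def grace_hash_join(r, s, num_partitions=4):
    """Join two lists of (key, payload) tuples on key.
    Returns (matching pairs, number of hash function evaluations)."""
    evaluations = 0

    def h1(key):  # partitioning hash function
        nonlocal evaluations
        evaluations += 1
        return hash(key) % num_partitions

    def h2(key):  # second hash function, used for the in-memory table
        nonlocal evaluations
        evaluations += 1
        return hash((key, "h2"))

    # Phase 1: partition both inputs with h1 (one evaluation per tuple).
    r_parts, s_parts = defaultdict(list), defaultdict(list)
    for t in r:
        r_parts[h1(t[0])].append(t)
    for t in s:
        s_parts[h1(t[0])].append(t)

    # Phase 2: per partition, build a table on R with h2, then probe with S.
    result = []
    for p in range(num_partitions):
        table = defaultdict(list)
        for t in r_parts.get(p, []):
            table[h2(t[0])].append(t)
        for t in s_parts.get(p, []):
            for match in table.get(h2(t[0]), []):
                if match[0] == t[0]:  # guard against h2 collisions
                    result.append((match, t))
    return result, evaluations


r = [(i % 10, f"r{i}") for i in range(100)]
s = [(i % 20, f"s{i}") for i in range(50)]
pairs, evaluations = grace_hash_join(r, s)
print(evaluations)  # 300, i.e. 2 * (100 + 50)
```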
Part C) Schema Design and Normalization. (20 points)
C.1) Draw an ER diagram capturing the following scenario. There are two types of entities,
employees and stores. Employees are characterized by two attributes, their name and their
employee ID (the employee ID is unique). Stores are characterized by their store ID (which is
unique). Employees may manage other employees and each employee has at most one manager.
Employees work at stores and each employee is assigned to at least one store.
Make sure that all entity types, relationships, attributes, and constraints implied by the
description are also represented in your ER diagram.
(10 points)
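[ER diagram: entity Employee (attributes ID and name), entity Store (attribute store_id), relationship Manages connecting Employee to itself with roles Manager and Subordinate, and relationship Worksat connecting Employee and Store.]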
C.2) Consider a relation schema R with seven attributes: ABCDEFG. Attribute A is a key of the
relation. The following functional dependencies hold in addition:
BC → A, DE → F, and B → G.
Decompose R (via lossless-join decomposition) by the method seen in the lecture until Boyce-Codd Normal Form (BCNF) is reached. Justify for each single decomposition step why it is
required (by pointing out why the current schema is not in BCNF yet). Justify for the final result
why it is in BCNF.
(10 points)
ABCDEFG
Not in BCNF due to B → G: B is not a key, and the dependency is not trivial.
Decompose ABCDEFG into two relations: ABCDEF and BG.
ABCDEF is not in BCNF due to DE → F: DE is not a key, and the dependency is not trivial.
Decompose ABCDEF into ABCDE and DEF.
The result (i.e., relations ABCDE, DEF, and BG) is in BCNF since the left-hand sides of the
remaining dependencies (A and BC in ABCDE, DE in DEF, and B in BG) are keys of their respective relations.
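The decomposition steps can be double-checked mechanically with attribute closures. The sketch below is a brute-force check written for this example, not the lecture's procedure: it tries every proper subset of a relation's attributes as a left-hand side, which is only practical for small schemas. The function names are hypothetical.

```python
from itertools import combinations

# "A is a key" is encoded as A -> BCDEFG, followed by the three given FDs.
FDS = [("A", "BCDEFG"), ("BC", "A"), ("DE", "F"), ("B", "G")]


def closure(attrs, fds=FDS):
    """Attribute closure of attrs under the functional dependencies fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def bcnf_violation(relation, fds=FDS):
    """Return (lhs, rhs) of a nontrivial dependency within relation whose
    left-hand side is not a superkey, or None if relation is in BCNF."""
    rel = set(relation)
    for size in range(1, len(rel)):
        for lhs in combinations(sorted(rel), size):
            implied = closure(lhs, fds) & rel
            if implied != set(lhs) and implied != rel:
                return "".join(lhs), "".join(sorted(implied - set(lhs)))
    return None


print(bcnf_violation("ABCDEFG"))  # ('B', 'G')
print(bcnf_violation("ABCDEF"))   # ('DE', 'F')
print([bcnf_violation(rel) for rel in ("ABCDE", "DEF", "BG")])
```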
Part D) Concurrency Control. (25 points)
In the following, we ask you to write out schedules with certain properties. Use the notation seen
in the lecture, with the transaction number T written directly after the operation letter: WT(A) means
transaction T writes object A, RT(A) means transaction T reads object A, CT means transaction T
commits, and AT means transaction T aborts. For example, W1(A) is a write of A by transaction 1.
D.1) Propose a schedule involving two transactions that is not conflict-serializable. (4 points)
W1(A) R2(A) W1(A)
D.2) Propose a schedule involving two transactions that exposes the unrepeatable read anomaly.
(4 points)
R1(A) W2(A) C2 R1(A)
D.3) Transform the following schedule into an equivalent serial schedule (i.e., a serial schedule
containing the same operations that has the same conflict graph):
R3(A) R2(A) R3(C) W3(A) R1(C) W2(B) W1(B) W1(C)
(5 points)
R2(A) W2(B) R3(A) R3(C) W3(A) R1(C) W1(B) W1(C)
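The answers to D.1 and D.3 can be checked by building the conflict graph and sorting it topologically. The sketch below is one way to do that in Python 3.9+ using the standard graphlib module; the parsing format and function names are assumptions made for this illustration.

```python
import re
from graphlib import CycleError, TopologicalSorter


def parse(schedule):
    """Turn 'R3(A) W1(B)' into [('R', 3, 'A'), ('W', 1, 'B')]."""
    return [(op, int(txn), obj)
            for op, txn, obj in re.findall(r"([RW])(\d+)\((\w+)\)", schedule)]


def conflict_edges(ops):
    """Edges (Ti, Tj): an operation of Ti precedes a conflicting one of Tj.
    Two operations conflict if they belong to different transactions,
    touch the same object, and at least one of them is a write."""
    edges = set()
    for i, (op1, t1, obj1) in enumerate(ops):
        for op2, t2, obj2 in ops[i + 1:]:
            if t1 != t2 and obj1 == obj2 and "W" in (op1, op2):
                edges.add((t1, t2))
    return edges


def serial_order(schedule):
    """Return an equivalent serial transaction order, or None if the
    conflict graph has a cycle (not conflict-serializable)."""
    ops = parse(schedule)
    graph = {txn: set() for _, txn, _ in ops}
    for before, after in conflict_edges(ops):
        graph[after].add(before)  # TopologicalSorter maps node -> predecessors
    try:
        return list(TopologicalSorter(graph).static_order())
    except CycleError:
        return None


print(serial_order("R3(A) R2(A) R3(C) W3(A) R1(C) W2(B) W1(B) W1(C)"))  # [2, 3, 1]
print(serial_order("W1(A) R2(A) W1(A)"))  # None
```

For the D.3 schedule the graph has edges T2 → T3, T3 → T1, and T2 → T1, so T2, T3, T1 is the only valid serial order, which matches the answer above.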
D.4) Name one advantage of conservative two-phase locking compared to non-conservative two-phase locking (i.e., name one reason why conservative two-phase locking may lead to better
performance). (4 points)
Conservative two-phase locking avoids deadlocks which can be costly.
D.5) What is the “wait-die” policy? Explain in fewer than five sentences its purpose and how it
works (i.e., explain what happens in different cases). (4 points)
This policy is used for avoiding deadlocks. If a transaction with higher priority requests a lock
held by a transaction with lower priority then the former transaction waits. If a transaction with
lower priority requests a lock held by a transaction with higher priority then the former
transaction aborts.
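The two cases can be expressed as a single decision function. The sketch below assumes that priority is derived from start timestamps, with an older transaction (smaller timestamp) having higher priority; the answer above only speaks of priorities, so the timestamp convention and the function name are assumptions.

```python
def wait_die(requester_start, holder_start):
    """Decide what a lock requester does when the lock is held.
    Assumption: smaller start timestamp = older = higher priority."""
    if requester_start < holder_start:
        return "wait"   # higher-priority requester waits for the holder
    return "abort"      # lower-priority requester dies


print(wait_die(requester_start=1, holder_start=2))  # wait
print(wait_die(requester_start=2, holder_start=1))  # abort
```

Because a transaction only ever waits for one with lower priority, the waits-for graph cannot contain a cycle, which is why the policy prevents deadlocks.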
D.6) During the validation of a transaction Tj in optimistic concurrency control, we consider an
earlier transaction Ti that finished before Tj started its write phase. Under which condition on Ti
and Tj will Tj need to be aborted? (4 points)
We need to abort Tj if the write set of Ti (i.e., set of objects written by Ti) overlaps with the read
set of Tj (i.e., set of objects read by Tj).
Part E) Logging and Recovery with ARIES. (20 points)
Consider the following (simplified) log entries:
0 T2 Updates P2
5 T1 Updates P3
10 begin_checkpoint
15 end_checkpoint
20 T1 Abort
25 T3 Updates P7
30 CLR: Undo T1 LSN 5
35 T3 Updates P2
40 T3 Commit
We assume that those are the last log entries before a system crash. The ARIES algorithm is used
for recovery, starting from the checkpoint shown. At the checkpoint, the dirty page table contains
only page P2 with recLSN=0 (i.e., time at which page became dirty) and the transaction table
contains transaction T2 with lastLSN=0 (i.e., last log entry by transaction) and T1 with
lastLSN=5. Both transactions are active (i.e., neither committed nor aborted) at the checkpoint.
E.1) The log entries do not give any information on when data pages are written back to hard
disc (e.g., due to “page stealing”). Based on all available information, point out one page that
must have been written back to hard disc and justify in at most two sentences. (5 points)
Page P3 does not appear in the dirty page table at the checkpoint despite the update at LSN 5.
Hence, it was written back to disc (and taken out of the dirty page table) before the checkpoint.
E.2) Fill in the following table, representing the state of the dirty page table after the analysis
phase is completed (you may use fewer than the available number of rows). (5 points)
Page    recLSN
P2      0
P7      25
P3      30
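The dirty page table for E.2 can be reproduced by replaying the log records after the checkpoint. The sketch below is a simplified analysis pass written for this example: the tuple layout of LOG and the function name are assumptions, the CLR at LSN 30 is attributed to page P3 because it undoes the update at LSN 5, and a commit is treated as the end of the transaction.

```python
# (LSN, transaction, record kind, page) for the log shown above.
LOG = [
    (0, "T2", "update", "P2"),
    (5, "T1", "update", "P3"),
    (10, None, "begin_checkpoint", None),
    (15, None, "end_checkpoint", None),
    (20, "T1", "abort", None),
    (25, "T3", "update", "P7"),
    (30, "T1", "clr", "P3"),  # CLR undoing LSN 5, which touched P3
    (35, "T3", "update", "P2"),
    (40, "T3", "commit", None),
]


def analysis(log, checkpoint_lsn, dirty_pages, txn_table):
    """Replay records after the checkpoint and return the resulting
    (dirty page table, transaction table)."""
    dpt = dict(dirty_pages)
    txns = dict(txn_table)
    for lsn, txn, kind, page in log:
        if lsn <= checkpoint_lsn or txn is None:
            continue
        if kind == "commit":
            txns.pop(txn, None)  # simplification: commit ends the transaction
            continue
        txns[txn] = lsn  # track lastLSN
        if kind in ("update", "clr") and page not in dpt:
            dpt[page] = lsn  # first record that dirtied the page
    return dpt, txns


dpt, txns = analysis(LOG, checkpoint_lsn=10,
                     dirty_pages={"P2": 0},
                     txn_table={"T2": 0, "T1": 5})
print(dpt)   # {'P2': 0, 'P7': 25, 'P3': 30}
print(txns)  # {'T2': 0, 'T1': 30}
```

The transactions left in the table after the pass (T2 and T1) are the ones the undo phase has to consider.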
E.3) Which compensation log records are written during the undo phase? Specify those entries in
the format used above (i.e., CLR: Undo TX LSN Y for transaction X and log entry number Y).
(5 points)
CLR: Undo T2 LSN 0
E.4) The ARIES algorithm requires writing parts of the log to stable storage under certain
conditions (write-ahead logging). Which log entries in the log above must have caused such a
“log flush”? Justify in at most two sentences.
(5 points)
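LSN 40 causes a log flush due to the transaction commit.
In addition, P3 is written back to disc between LSN 5 and LSN 15; write-ahead logging requires
the log entry at LSN 5 to reach stable storage before that page write, which also causes a log flush.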
Part F) Distributed DBMS, MapReduce. (20 points)
F.1) We join two relations located at two different sites, connected via a network. Explain in fewer
than four sentences why a bloom-join might be faster than an approach based on a semi-join.
(5 points)
The semi-join approach ships a projection on the join column from one site to the other while the
bloom-join only ships a bit vector. The bit vector is typically smaller and therefore faster to send.
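The size difference can be made concrete with a toy Bloom filter. Everything numeric in the sketch below is an assumption chosen for illustration: 8-byte join keys, a filter with 8 bits per key, 4 hash functions derived from SHA-256, and the two key ranges standing in for the join columns at each site.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter backed by a byte array."""

    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key):
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key):
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(key))


site_a_keys = range(0, 20_000, 2)  # join column values at site A
site_b_keys = range(15_000)        # join column values at site B

# Site A builds the filter and ships it to site B.
bloom = BloomFilter(num_bits=8 * len(site_a_keys), num_hashes=4)
for key in site_a_keys:
    bloom.add(key)

# Site B keeps only tuples whose key might have a join partner at site A.
candidates = [key for key in site_b_keys if bloom.might_contain(key)]
true_matches = set(site_a_keys) & set(site_b_keys)

bloom_bytes = len(bloom.bits)              # shipped by the bloom-join
projection_bytes = 8 * len(site_a_keys)    # shipped by the semi-join
print(bloom_bytes, projection_bytes)       # 10000 80000
print(len(true_matches), len(candidates))  # candidates may include false positives
```

The filter never drops a true match, but it can let through some non-matching keys, so the bloom-join trades a smaller first message for a few extra tuples shipped back.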
F.2) A distributed DBMS stores N replicas of a data set; R designates the number of replicas
accessed for each read operation, and W the number of replicas that need to be updated for a
successful write. What inequality on N, R, and W must hold to guarantee strong consistency?
(5 points)
We must have R + W > N.
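The inequality says that every read quorum must share at least one replica with every write quorum, so a read always reaches a replica holding the latest write. The brute-force sketch below checks this for small N; the function name is hypothetical.

```python
from itertools import combinations


def quorums_always_overlap(n, r, w):
    """True if every set of r replicas intersects every set of w replicas."""
    replicas = range(n)
    return all(set(read_set) & set(write_set)
               for read_set in combinations(replicas, r)
               for write_set in combinations(replicas, w))


# Compare the exhaustive check with the inequality R + W > N.
agrees = all(quorums_always_overlap(n, r, w) == (r + w > n)
             for n in range(1, 6)
             for r in range(1, n + 1)
             for w in range(1, n + 1))
print(agrees)  # True
```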
F.3) What is the two-phase commit protocol? Explain briefly, in at most three sentences, what
happens in each of the two phases.
(5 points)
It’s a consensus protocol for distributed transaction processing. In the first phase (voting phase),
the coordinator queries the subordinates to find out whether they are able to commit the current
transaction. In the second phase (termination phase), the coordinator sends the final decision
(whether to commit or to abort) to all subordinates.
F.4) What are “stragglers”? Describe, in at most three sentences, one strategy by which typical
MapReduce implementations try to minimize their negative impact.
(5 points)
Stragglers are map or reduce tasks that take unusually long. The MapReduce framework typically
creates multiple task instances (“backup tasks”) and proceeds as soon as the first task instance
completes.
CS4320 Final Exam
This page will be used for grading your exam. Do not write anything on this page.
SECTION   QUESTION                 SCORE   SECTION TOTAL
Part A    A.1 (Max: 5 points)              (Max: 20 points)
          A.2 (Max: 5 points)
          A.3 (Max: 5 points)
          A.4 (Max: 5 points)
Part B    B.1 (Max: 5 points)              (Max: 15 points)
          B.2 (Max: 5 points)
          B.3 (Max: 5 points)
Part C    C.1 (Max: 10 points)             (Max: 20 points)
          C.2 (Max: 10 points)
Part D    D.1 (Max: 4 points)              (Max: 25 points)
          D.2 (Max: 4 points)
          D.3 (Max: 5 points)
          D.4 (Max: 4 points)
          D.5 (Max: 4 points)
          D.6 (Max: 4 points)
Part E    E.1 (Max: 5 points)              (Max: 20 points)
          E.2 (Max: 5 points)
          E.3 (Max: 5 points)
          E.4 (Max: 5 points)
Part F    F.1 (Max: 5 points)              (Max: 20 points)
          F.2 (Max: 5 points)
          F.3 (Max: 5 points)
          F.4 (Max: 5 points)
Total (Max: 120 points)