
University of California, Berkeley

College of Engineering

Computer Science Division – EECS

Fall 2004 Prof. Michael J. Franklin

MIDTERM I

CS 186 Introduction to Database Systems

NAME : _____ D.B. Guru ____ STUDENT ID:_____ 123456789 ___

IMPORTANT:

Circle the last two letters of your class account:

cs186 a b c d e f g h i j k l m n o p q r s t u v w x y z

a b c d e f g h i j k l m n o p q r s t u v w x y z

DISCUSSION SECTION DAY & TIME: _ W 2am ________ TA NAME: __ ???? ___

This is a closed book examination – but you are allowed one 8.5” x 11” sheet of notes (double sided). You should answer as many questions as possible. Partial credit will be given where appropriate. There are 100 points in all. You should read all of the questions before starting the exam, as some of the questions are substantially more time-consuming than others.

Write all of your answers directly on this paper. Be sure to clearly indicate your final answer for each question. Also, be sure to state any assumptions that you are making in your answers.

GOOD LUCK!!!

Problem                        Possible   Score
1. Buffer Manager                 20
2. B-Trees                        18
3. Hash Indexes                   20
4. Formal Query Languages         22
5. Short Answer                   20
TOTAL                            100

SID:____________________

Question 1 – Buffer Management [3 parts, 20 points total]

a) [10 points] You are given a database system with four buffer frames (A, B, C, D) and a file of five disk pages (1, 2, 3, 4, 5). Assume that you start with an empty buffer pool. A sequence of requests is made to the buffer manager as described in the Request column (below). At certain times a Pin request is immediately followed by an Unpin request (represented as Pin/Unpin), but at other times Pin and Unpin requests are interleaved with other requests.

Fill in the following table using the MRU page replacement policy.

Note: Total points for this problem is 10 (1 each for correct buffer contents at T5-T14).

                      Buffer Frames
Time   Request        A   B   C   D   Points (8)
T1     Pin 5          5               0
T2     Pin/Unpin 2    5   2           0
T3     Pin/Unpin 3    5   2   3       0
T4     Pin 4          5   2   3   4   0
T5     Pin 1          5   2   1   4   1
T6     Unpin 5        5   2   1   4   1
T7     Unpin 4        5   2   1   4   1
T8     Pin/Unpin 1    5   2   1   4   1
T9     Pin 1          5   2   1   4   1
T10    Pin/Unpin 5    5   2   1   4   1
T11    Pin 4          5   2   1   4   1
T12    Unpin 1        5   2   1   4   1
T13    Pin 3          3   2   1   4   1
T14    Pin 5          3   5   1   4   1
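The table above can be re-derived mechanically. The following is a minimal Python sketch, not the course's buffer manager: the function name simulate_mru is hypothetical, and it assumes that a page's "use" time is the time of its most recent Pin and that free frames are filled in order A, B, C, D.

```python
def simulate_mru(requests, num_frames=4):
    """Replay pin/unpin requests; return the frame contents after each one."""
    frames = [None] * num_frames   # page held by frames A, B, C, D
    pin_count = {}                 # page -> number of outstanding pins
    last_pin = {}                  # page -> time of its most recent Pin (assumed "use" time)
    history = []
    for t, (op, page) in enumerate(requests, start=1):
        if op in ("pin", "pin/unpin"):
            if page not in frames:
                if None in frames:
                    slot = frames.index(None)      # a free frame is used first
                else:
                    # MRU: among unpinned pages, evict the most recently used one.
                    # max() raises ValueError if every page is pinned.
                    unpinned = [p for p in frames if pin_count[p] == 0]
                    slot = frames.index(max(unpinned, key=last_pin.get))
                frames[slot] = page
                pin_count[page] = 0
            pin_count[page] += 1
            last_pin[page] = t
        if op in ("unpin", "pin/unpin"):
            pin_count[page] -= 1
        history.append(tuple(frames))
    return history


# Requests T1..T14 from the question.
requests = [
    ("pin", 5), ("pin/unpin", 2), ("pin/unpin", 3), ("pin", 4), ("pin", 1),
    ("unpin", 5), ("unpin", 4), ("pin/unpin", 1), ("pin", 1), ("pin/unpin", 5),
    ("pin", 4), ("unpin", 1), ("pin", 3), ("pin", 5),
]
history = simulate_mru(requests)
for t, row in enumerate(history, start=1):
    print(f"T{t}", row)
```

Note that page 1 is pinned twice (T5 and T9) but unpinned only at T8 and T12 after its second and third pins, so it still has a pin count of 1 at T13 and cannot be chosen as the victim.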

CS 186 Midterm I October 4, 2004 Page 2 of 12


Question 1 – Buffer Management (continued)

Now, consider the Full 2Q buffer replacement policy (use the new pseudocode algorithm that you implemented in part 2 of homework 1). Assume that Kout is 50% of the total buffer pool. Answer the following two questions given the state of the buffer pool below:

Buffer Frames
A            B            C    D
1 (pinned)   2 (pinned)   4    3

Queue State
A1in   Am   A1out
4      3    3

b) [5 points] Assuming that Kin is 25% of the total number of unpinned pages, show the state of the buffer pool (including the frame contents and the queue state) after a request is made to pin page 5.

Buffer Frames
A      B      C      D
1(P)   2(P)   5(P)   3

Queue State
A1in      Am   A1out
(empty)   3    3, 4 <head

c) [5 points] Starting with the initial buffer pool contents (i.e., shown at the top of this page), but now assuming that Kin is 50% of the total number of unpinned pages, show the state of the buffer pool (including the frame contents and the queue state) after a request is made to pin page 5.

Buffer Frames
A      B      C    D
1(P)   2(P)   4    5(P)

Queue State
A1in   Am        A1out
4      (empty)   3
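The two answers differ only in which queue gives up a page. The sketch below is one way to express that choice, assuming the victim-selection rule of the published Full 2Q algorithm (evict from A1in when it holds more than Kin pages, otherwise from Am); the homework pseudocode itself is not reproduced here, and the function name reclaim_2q is hypothetical. Queues are Python lists with the tail at index 0 and the head at the end, matching the "3, 4 <head" notation.

```python
def reclaim_2q(a1in, am, a1out, kin, kout):
    """Choose a victim among unpinned pages and update the queues in place."""
    if len(a1in) > kin:
        victim = a1in.pop(0)     # tail of A1in (FIFO order)
        a1out.append(victim)     # remember its id at the head of A1out
        if len(a1out) > kout:
            a1out.pop(0)         # forget the oldest remembered id
    else:
        victim = am.pop(0)       # tail of Am
    return victim


# Part (b): 2 unpinned pages, Kin = 25% of 2 = 0.5, Kout = 50% of 4 frames = 2.
b_a1in, b_am, b_a1out = [4], [3], [3]
b_victim = reclaim_2q(b_a1in, b_am, b_a1out, kin=0.5, kout=2)
print(b_victim, b_a1in, b_am, b_a1out)

# Part (c): same starting state, Kin = 50% of 2 = 1.
c_a1in, c_am, c_a1out = [4], [3], [3]
c_victim = reclaim_2q(c_a1in, c_am, c_a1out, kin=1, kout=2)
print(c_victim, c_a1in, c_am, c_a1out)
```

The sketch models only victim selection. It assumes, as the answers above suggest, that the newly pinned page 5 is not placed on any queue while it is pinned.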


Question 2 – B+Trees [6 parts, 18 points total]

For each of the following B+ Trees, decide whether it is a valid B+ Tree (i.e., one that could exist after numerous inserts and deletes) or if it is invalid. Circle your choice, and if it is invalid, describe in one sentence the single main reason why. Note: All of the trees are of order d=2. An “*” next to a key indicates that it is a “data entry”. Be sure to look carefully.

a) [3 points] circle one: valid invalid

If invalid, why? Invalid: Tree not balanced. Leaf nodes on different levels.

Root:      [20 40 60 80]
Level 2:   [3* 8* 14* 17*]  [23 29]  [45 50 55]  [60* 62* 65* 75*]  [90* 93* 95* 99*]
Level 3:   under [23 29]:    [20* 21* 22*]  [24* 26* 27* 28*]  [31* 32* 39*]
           under [45 50 55]: [40* 42* 43*]  [45* 47* 48* 49*]  [50* 52* 54*]  [56* 57* 58*]

b) [3 points] circle one: valid invalid

If invalid, why? Valid

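Root:       [20 45]
Internal:   [10 17]  [25 32]  [56 70 90]
Leaves:     under [10 17]:    [2* 3* 5*]  [10* 12* 14* 15*]  [17* 18* 19*]
            under [25 32]:    [20* 22* 23*]  [26* 28* 30*]  [32* 35* 44*]
            under [56 70 90]: [45* 47* 52*]  [58* 62* 68*]  [70* 72* 74* 85*]  [92* 95* 96* 98*]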


Question 2 – B+Trees (continued)

c) [3 points] circle one: valid invalid

If invalid, why? Valid

Root:     [10]
Leaves:   [2* 3* 5*]  [10* 12* 14* 15*]

d) [3 points] circle one: valid invalid

If invalid, why? Invalid: nodes 25 and 92 are below the minimum occupancy for a tree of order d=2.

Root:       [20 45]
Internal:   [10 17]  [25]  [56 70 90]
Leaves:     under [10 17]:    [2* 3* 5*]  [10* 12* 14* 15*]  [17* 18* 19*]
            under [25]:       [20* 22* 23*]  [26* 28* 30*]
            under [56 70 90]: [45* 47* 52*]  [58* 62* 68*]  [70* 72* 74* 85*]  [92*]


Question 2 – B+Trees (continued)

e) [3 points] circle one: valid invalid

If invalid, why? Invalid: 21, 22, 23 are out of place. Note: 24 and 47 can appear in multiple non-leaf nodes.

Root:       [24 47]
Internal:   [10 17]  [24 36]  [47 70 91]
Leaves:     under [10 17]:    [2* 3* 5*]  [10* 12* 14* 15*]  [18* 19* 20*]
            under [24 36]:    [21* 22* 23*]  [26* 28* 30*]  [36* 38* 44*]
            under [47 70 91]: [47* 49* 52*]  [58* 62* 68*]  [70* 72* 74* 85*]  [92* 95* 96*]

f) [3 points] circle one: valid invalid

If invalid, why? Invalid: 41 and 83 are out of place.

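Root:       [40 84]
Internal:   [30 37]  [50 65 73]  [90 95]
Leaves:     under [30 37]:    [20* 22* 25*]  [30* 32* 35* 36*]  [37* 38* 39* 41*]
            under [50 65 73]: [43* 46* 47*]  [50* 53* 62*]  [67* 68* 70* 72*]  [74* 76* 82*]
            under [90 95]:    [83* 85* 86* 89*]  [91* 92* 94*]  [95* 98* 99*]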
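The three kinds of defect seen in this question (unbalanced leaves, under-full nodes, keys outside the range implied by their ancestors) can be checked mechanically. The sketch below is an illustration only: the function validate and the nested dict/list encoding of a tree are hypothetical, and it assumes the usual convention that a separator key k sends keys < k to the left child and keys >= k to the right child. It is applied to tree (f).

```python
def validate(tree, d):
    """Return a list of problems found in a B+ tree of order d."""
    problems, leaf_depths = [], set()

    def visit(node, lo, hi, depth, is_root):
        is_leaf = isinstance(node, list)          # leaves are plain lists of keys
        keys = node if is_leaf else node["keys"]
        # Every non-root node must hold between d and 2d entries.
        if len(keys) > 2 * d or (not is_root and len(keys) < d):
            problems.append(f"occupancy: {keys}")
        # Every key must lie in the half-open range set by the ancestors.
        for k in keys:
            if not (lo <= k < hi):
                problems.append(f"out of place: {k}")
        if is_leaf:
            leaf_depths.add(depth)
            return
        bounds = [lo] + node["keys"] + [hi]
        for i, child in enumerate(node["children"]):
            visit(child, bounds[i], bounds[i + 1], depth + 1, False)

    visit(tree, float("-inf"), float("inf"), 0, True)
    if len(leaf_depths) > 1:
        problems.append("unbalanced: leaves on different levels")
    return problems


# Tree (f) from the question.
tree_f = {"keys": [40, 84], "children": [
    {"keys": [30, 37], "children": [
        [20, 22, 25], [30, 32, 35, 36], [37, 38, 39, 41]]},
    {"keys": [50, 65, 73], "children": [
        [43, 46, 47], [50, 53, 62], [67, 68, 70, 72], [74, 76, 82]]},
    {"keys": [90, 95], "children": [
        [83, 85, 86, 89], [91, 92, 94], [95, 98, 99]]},
]}
problems_f = validate(tree_f, d=2)
print(problems_f)
```

The sketch does not check that each internal node has exactly one more child than it has keys, so it is a partial validator rather than a complete one.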


Question 3 – Hashing [2 parts, 20 points]:

a) [10 points] Extendible Hashing

Consider the following 4 update operations.

operation no.   operation   key value (binary)
1               insert      20 (010100)
2               insert      46 (101110)
3               insert      18 (010010)
4               insert      23 (010111)

Now, consider an extendible hash structure where each bucket can hold up to 4 entries, with a hash function h(n) = n mod 2^depth and an initial state as shown below.

Draw the extendible hash structure and its contents after the 4 operations have occurred in the order shown. We recommend that you do your scratch work on this page at first. But, this page will not be graded. You MUST put your final answer on the following page!!

Global depth: 2

Directory   Bucket
00          local depth 2: 8 16
01          local depth 1: 5 7 13 21
10          local depth 2: 6 10 22
11          (same bucket as 01)


Final answer for Question 3(a) - Extendible Hashing:

Only this page will be graded for question 3(a).

The final structure should have a directory of size 8 so use the template below.

Show all buckets and pointers.

Label the directory entries with their corresponding hash value (as on the previous page).

Make sure to include local depths for all buckets and the global depth of the directory.

Global depth: 3

Directory   Bucket
000         local depth 2: 8 16 20
001         local depth 2: 5 13 21
010         local depth 3: 10 18
011         local depth 2: 7 23
100         (same bucket as 000)
101         (same bucket as 001)
110         local depth 3: 6 22 46
111         (same bucket as 011)
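The final structure can be reproduced by replaying the four inserts. The following is a minimal sketch under stated assumptions, not a reference implementation: the Bucket class and the insert function are hypothetical names, the directory is a Python list indexed by the low-order bits of the key (h(n) = n mod 2^depth), and a directory doubling is modelled by concatenating the list with itself.

```python
class Bucket:
    def __init__(self, depth, keys=()):
        self.depth = depth          # local depth
        self.keys = list(keys)


def insert(directory, global_depth, key, capacity=4):
    """Insert key; return the (possibly doubled) directory and global depth."""
    while True:
        bucket = directory[key % (1 << global_depth)]
        if len(bucket.keys) < capacity:
            bucket.keys.append(key)
            return directory, global_depth
        if bucket.depth == global_depth:
            # No spare directory bit: double the directory.
            directory = directory + directory
            global_depth += 1
        # Split the full bucket on one more low-order bit.
        bucket.depth += 1
        bit = 1 << (bucket.depth - 1)
        image = Bucket(bucket.depth, [k for k in bucket.keys if k & bit])
        bucket.keys = [k for k in bucket.keys if not k & bit]
        for i, b in enumerate(directory):
            if b is bucket and i & bit:
                directory[i] = image    # redirect half of the pointers


# Initial state from the question: global depth 2, entries 01 and 11 share a bucket.
b00 = Bucket(2, [8, 16])
b01 = Bucket(1, [5, 7, 13, 21])
b10 = Bucket(2, [6, 10, 22])
directory, global_depth = [b00, b01, b10, b01], 2

for key in (20, 46, 18, 23):
    directory, global_depth = insert(directory, global_depth, key)

for i, b in enumerate(directory):
    print(format(i, "03b"), "local depth", b.depth, sorted(b.keys))
```

Inserting 18 finds bucket 10 full with local depth equal to the global depth, which forces the directory to double. Inserting 23 splits the depth-1 bucket without doubling, because its local depth is below the global depth.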


Question 3 – Hashing (continued)

b) [10 points] Linear Hashing

Now, answer the question from part (a) for linear hashing. That is, given the initial state:

8 16
5 7 13 21
6 10 22

Draw the linear hashing structure that results after performing the four insert operations below (same ones as in part a). Be sure to fill in the values for N, Level, and Next for your answer. Also, please clearly indicate your answer so we can find it!

operation no.   operation   key value (binary)
1               insert      20 (010100)
2               insert      46 (101110)
3               insert      18 (010010)
4               insert      23 (010111)

N = 2
Level = 1
Next = 0

Buckets:
8 16 20        <- Next
5 13 21
6 10 22 46     -> overflow page: 18
7 23
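The answer above can be checked by simulation. The sketch below makes several assumptions that the extracted text does not state: the initial state is taken to be N = 2, Level = 0, Next = 1 (bucket 0 already split into the buckets holding 8 16 and 6 10 22), a split of bucket Next is triggered whenever an insert overflows a bucket, and each bucket is stored as one list whose entries beyond the capacity of 4 represent its overflow page. The function name lh_insert is hypothetical.

```python
def lh_insert(st, key, capacity=4):
    """Insert key into a linear hashing state dict, splitting on overflow."""
    size = st["N"] * 2 ** st["level"]       # buckets at the start of this round
    b = key % size
    if b < st["next"]:                      # bucket already split this round
        b = key % (2 * size)
    st["buckets"][b].append(key)
    if len(st["buckets"][b]) > capacity:    # overflow page needed: split bucket Next
        nxt = st["next"]
        old = st["buckets"][nxt]
        st["buckets"][nxt] = [k for k in old if k % (2 * size) == nxt]
        st["buckets"].append([k for k in old if k % (2 * size) != nxt])
        st["next"] += 1
        if st["next"] == size:              # every bucket of this round has split
            st["level"] += 1
            st["next"] = 0


# Assumed initial state (see the lead-in above).
st = {"N": 2, "level": 0, "next": 1,
      "buckets": [[8, 16], [5, 7, 13, 21], [6, 10, 22]]}
for key in (20, 46, 18, 23):
    lh_insert(st, key)

print("N =", st["N"], "Level =", st["level"], "Next =", st["next"])
for i, keys in enumerate(st["buckets"]):
    print(i, keys[:4], "overflow:", keys[4:])
```

Note that the bucket that overflows (the one receiving 18) is not the bucket that gets split. Linear hashing always splits bucket Next, which here is the bucket holding 5 7 13 21.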


Question 4 – Formal Query Languages [4 parts, 22 points total]

Consider the following schema that records information about students and the courses they have taken (Primary keys are underlined):

Student (sid, sname, address, age)

Course (cid, cname, credits)

Grade(sid, cid, semester, year, grade)

Assume that all students have taken at least one course. Express the following queries in the language requested.

NOTE: We use English for our solutions instead of Greek symbols because we wrote this in Word, but we expected you to use the corresponding Greek symbols. Also, there are many correct answers for some of these queries. We just give a sample.

a) [5 points] Find the sids of students who have received A’s in all classes they have taken. Use Relational ALGEBRA.

Project_sid(Grade) – Project_sid(Select_grade != ‘A’(Grade))

b) [5 points] Find the sids of students who have received A’s in all classes they have taken. Use Relational CALCULUS.

{X | EXISTS S in Student (X.sid = S.sid AND FORALL G in Grade (G.sid = S.sid IMPLIES G.grade = ‘A’))}

c) [6 points] Find the sids of students who have taken both “Interpretive Dance” and “Data Structures”. Use either Algebra or Calculus, but not both.

Project_sid(Select_cname='Interpretive Dance'(Grade Join Course))
Intersection
Project_sid(Select_cname='Data Structures'(Grade Join Course))

d) [6 points] Find the cids of courses that have been taken by no students. Use either Algebra or Calculus, but not both.

Project_cid(Course) – Project_cid(Grade)
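The algebra answers for parts (a), (c) and (d) translate directly into set operations. The sketch below uses Python sets over a tiny made-up instance; the rows, sids and cids are invented purely for illustration and are not part of the exam, and the semester and year columns of Grade are omitted for brevity.

```python
# Hypothetical instance: Grade(sid, cid, grade) and Course(cid, cname).
grade = {
    (1, "c1", "A"), (1, "c2", "A"),
    (2, "c1", "A"), (2, "c2", "B"),
    (3, "c3", "A"),
}
course = {
    ("c1", "Interpretive Dance"), ("c2", "Data Structures"),
    ("c3", "Databases"), ("c4", "Compilers"),
}

# (a) Project_sid(Grade) - Project_sid(Select_grade != 'A'(Grade))
all_sids = {sid for sid, _, _ in grade}
non_a_sids = {sid for sid, _, g in grade if g != "A"}
straight_a = all_sids - non_a_sids


# (c) Project_sid(Select_cname=...(Grade Join Course)), once per course name
def takers(cname):
    return {sid for sid, cid, _ in grade
            for c, name in course if c == cid and name == cname}


both = takers("Interpretive Dance") & takers("Data Structures")

# (d) Project_cid(Course) - Project_cid(Grade)
untaken = {cid for cid, _ in course} - {cid for _, cid, _ in grade}

print(straight_a, both, untaken)
```

Student 2 shows why part (a) needs set difference rather than a plain selection on grade = 'A': that student has one A and one B, so selecting A grades alone would wrongly include them.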


Question 5 – Short Answer [7 parts, 20 points total]

a) [2 points] In theory, hash structures such as Linear Hashing have the potential to require fewer disk reads than B+Trees for exact match searches (i.e., equality predicate on a search key). In practice, this isn’t always the case. Which of the following are valid reasons why linear hashing may not provide a savings in disk I/Os compared to B+Trees? (there may be more than one):

Answer is B and D

A. Computing hash functions is very expensive.
B. The upper levels of the B+tree may be in memory.
C. B+Trees have less internal (i.e., within a block) fragmentation.
D. Hash buckets may have long overflow chains.
E. None of the above.

b) [3 points] Disk manufacturers are building disks that spin faster than they used to (e.g., modern disks can spin at 15,000 RPM, compared to 7,200 RPM before). The main components of the cost of performing a disk read are seek time, rotational delay, and transfer time. For each of these three components state whether or not it is reduced by having faster spinning disks:

Component          Reduced by faster spinning? (Yes or No)
Seek Time          NO
Rotational Delay   YES
Transfer Time      YES

c) [3 points] Given a relational algebra expression and the schemas of the relations on which it is written, which of the following is true? (circle one only)

Answer is: A

A. You can always know the arity of the result.
B. You can always know the cardinality of the result.
C. You can always know both.
D. You can’t know either for sure.


d) [3 points] Consider a B+Tree index that starts out empty. One million data entries with unique keys are then inserted in sorted order (with keys 1 to 1,000,000). Assuming that the inserts are done using the regular B+Tree insert (i.e., no bulk loading), and that we use alternative 2 data entries (<key_value, tuple_id> pairs), which of the following is true after the inserts are done?

(Circle one only) Answer is B (try it!)

A. The leaf pages will be maximally full.
B. The leaf pages will be minimally full.
C. Not enough information to tell.

e) [3 points] Given a relation instance with multiple columns and lots of rows, which of the following is true? (Circle one only) Answer is B – key is property of schema, not any one instance.

A. You can always determine with certainty the primary key of the relation.

B. You can never determine with certainty the primary key of the relation.

C. You can sometimes determine with certainty the primary key of the relation.
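Returning to part (d), the "try it!" hint can be followed with a few lines of Python. This is a sketch of the leaf level only, under the assumption that an overflowing leaf of order d (capacity 2d) is split so that the left leaf keeps d entries and the right leaf gets d + 1; the function name leaf_fill_after_sorted_inserts is hypothetical, and a smaller key count is used to keep the run short.

```python
def leaf_fill_after_sorted_inserts(n, d):
    """Insert keys 1..n in sorted order; return the entry count of each leaf."""
    leaves = [[]]
    for key in range(1, n + 1):
        leaf = leaves[-1]             # sorted input always lands in the rightmost leaf
        leaf.append(key)
        if len(leaf) > 2 * d:         # overflow: split into d and d + 1 entries
            leaves[-1] = leaf[:d]
            leaves.append(leaf[d:])
    return [len(leaf) for leaf in leaves]


fills = leaf_fill_after_sorted_inserts(10_000, d=2)
print(set(fills[:-1]), fills[-1])
```

Once a leaf has been split, no later key is small enough to land in its left half, so every leaf except the rightmost stays at exactly d entries.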

For the remaining two questions, consider the following relations:

CREATE TABLE Customers (cid CHAR(10), cname CHAR(50), zipcode INT,
    PRIMARY KEY (cid))

CREATE TABLE Products (pid CHAR(10), pname CHAR(50),
    PRIMARY KEY (pid))

CREATE TABLE Orders (cid CHAR(10), pid CHAR(10), amount INT,
    PRIMARY KEY (cid, pid),
    FOREIGN KEY (cid) REFERENCES Customers,
    FOREIGN KEY (pid) REFERENCES Products)

f) [3 points] Suppose someone tries to delete a “Customer” tuple for which orders exist. List three possible actions that the database system could do in response to this:

1. “No Action” – means reject the delete.

2. Cascade – means delete the customer and all orders that reference that customer.

3. Delete the customer and set the cid field in orders that reference that customer to a special value (NULL or a default).

g) [3 points] For the Customers relation described above, suppose you knew that the combination of “cname” and “zipcode” was a candidate key and that “cid” was a candidate key. What is an argument for choosing “cid” as the primary key over “cname,zipcode”?

Two good possible answers: 1) cid is smaller/simpler than the composite key, so it is easier to index, cheaper to store as a foreign key, etc. 2) zipcode is more likely to change than cid (which is presumably system-assigned). NOTE: “cname, zipcode not unique” is NOT a correct answer, since the question stated that the combination is a candidate key, meaning that it is unique by definition.
