CS 405G: Introduction to Database Systems Lecture 10: Normalization and Transactions

Instructor: Chen Qian

3/28 Quiz 4
7/1/2016
Chen Qian @ University of Kentucky
Normalization

Normalization is the process of organizing the fields
and tables of a relational database to minimize
redundancy and dependency.

A normal form is a certification that a relation schema
satisfies a particular set of conditions.
First Normal Form (1NF)

A normal form characterizes a relation (not an attribute,
a key, etc.)
We can only say “this relation or table is in 1NF”

A relation is in first normal form if the domain of each
attribute contains only atomic values, and the value of
each attribute contains only a single value from that
domain.
Second Normal Form (2NF)

An attribute A of a relation R is a nonprimary attribute
(also called a nonprime attribute) if it is not part of any
key in R; otherwise, A is a primary attribute.
R is in (general) 2nd normal form if every nonprimary
attribute A in R is not partially functionally dependent
on any key of R
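
The 2NF test above can be sketched in a few lines of Python. This is only an illustration: the helper names (`closure`, `candidate_keys`, `partial_dependencies`) are made up for this note, and the functional dependencies PID -> Pname and {EID, PID} -> Hours are assumptions read off the WorkOn table used in the decomposition slides, not something the lecture states explicitly.

```python
from itertools import combinations

def closure(attrs, fds):
    """Return every attribute functionally determined by attrs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def candidate_keys(schema, fds):
    """Enumerate minimal attribute sets whose closure is the whole schema."""
    keys = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            s = set(combo)
            if closure(s, fds) == schema and not any(k <= s for k in keys):
                keys.append(s)
    return keys

def partial_dependencies(schema, fds):
    """List (part of a key, nonprimary attribute) pairs that violate 2NF."""
    keys = candidate_keys(schema, fds)
    primary = set().union(*keys)
    violations = []
    for key in keys:
        for size in range(1, len(key)):
            for combo in combinations(sorted(key), size):
                part = set(combo)
                for attr in sorted(closure(part, fds) - primary):
                    violations.append((frozenset(part), attr))
    return violations

# Assumed schema and FDs (hypothetical reading of the WorkOn example).
work_on = {"EID", "PID", "Ename", "email", "Pname", "Hours"}
fds = [({"EID"}, {"Ename", "email"}),
       ({"PID"}, {"Pname"}),
       ({"EID", "PID"}, {"Hours"})]

keys = candidate_keys(work_on, fds)
bad = partial_dependencies(work_on, fds)
```

Under these assumptions the only key is {EID, PID}, and Ename, email, and Pname each depend on just part of it, so the relation is not in 2NF.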
Decomposition

EID   PID  Ename         email           Pname         Hours
1234  10   John Smith    jsmith@ac.com   B2B platform  10
1123  9    Ben Liu       bliu@ac.com     CRM           40
1234  9    John Smith    jsmith@ac.com   CRM           30
1023  10   Susan Sidhuk  ssidhuk@ac.com  B2B platform  40

Decomposition (EID in the second table is a foreign key
referencing the first table)

EID   Ename         email
1234  John Smith    jsmith@ac.com
1123  Ben Liu       bliu@ac.com
1023  Susan Sidhuk  ssidhuk@ac.com

EID   PID  Pname         Hours
1234  10   B2B platform  10
1123  9    CRM           40
1234  9    CRM           30
1023  10   B2B platform  40


Decomposition eliminates redundancy
To get back to the original relation, use natural join.
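
As a quick check of the natural-join claim, here is a minimal Python sketch that decomposes the example table and joins it back. The helper names `project` and `natural_join` and the variable names are hypothetical; rows are plain dictionaries rather than anything a real DBMS would use.

```python
def project(rows, attrs):
    """Project rows onto attrs, dropping duplicate tuples."""
    result = []
    for row in rows:
        t = {a: row[a] for a in attrs}
        if t not in result:
            result.append(t)
    return result

def natural_join(r, s):
    """Join two lists of dict rows on every attribute name they share."""
    joined = []
    for x in r:
        for y in s:
            common = x.keys() & y.keys()
            if all(x[a] == y[a] for a in common):
                joined.append({**x, **y})
    return joined

columns = ["EID", "PID", "Ename", "email", "Pname", "Hours"]
work_on = [dict(zip(columns, row)) for row in [
    (1234, 10, "John Smith", "jsmith@ac.com", "B2B platform", 10),
    (1123, 9, "Ben Liu", "bliu@ac.com", "CRM", 40),
    (1234, 9, "John Smith", "jsmith@ac.com", "CRM", 30),
    (1023, 10, "Susan Sidhuk", "ssidhuk@ac.com", "B2B platform", 40),
]]

employee = project(work_on, ["EID", "Ename", "email"])
assignment = project(work_on, ["EID", "PID", "Pname", "Hours"])
rejoined = natural_join(employee, assignment)
```

The employee table shrinks to three rows (John Smith is stored once), and joining on the shared attribute EID reproduces exactly the four original tuples.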
Decomposition

Decomposition may be applied recursively

EID   PID  Pname         Hours
1234  10   B2B platform  10
1123  9    CRM           40
1234  9    CRM           30
1023  10   B2B platform  40

is decomposed into

PID  Pname
10   B2B platform
9    CRM

EID   PID  Hours
1234  10   10
1123  9    40
1234  9    30
1023  10   40
Third normal form
• 3NF requires that there are no non-trivial
functional dependencies of non-key attributes on
something other than a superset of a candidate key.
• Recall: a non-trivial FD is one whose RHS is not a
subset of its LHS (e.g., A->A is trivial).
• In summary, all non-key attributes are mutually
independent.
Boyce-Codd normal form (BCNF)
• BCNF requires that there are no non-trivial
functional dependencies of attributes on something
other than a superset of a candidate key (called a
superkey).
• All attributes are dependent on a key, a whole key
and nothing but a key (excluding trivial
dependencies, like A->A).
• A table is said to be in BCNF if and only if
it is in 3NF and every non-trivial, left-irreducible
functional dependency has a candidate key as its
determinant.
• In more informal terms, a table is in BCNF if it
is in 3NF and the only determinants are the
candidate keys.
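
The superkey condition can be checked mechanically with attribute closures. The sketch below is illustrative only: `closure` and `bcnf_violations` are made-up helper names, and the FD {EID, PID} -> hours is an assumption about the WorkOn relation (the lecture only lists EID -> Ename, email explicitly).

```python
def closure(attrs, fds):
    """Return every attribute functionally determined by attrs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def bcnf_violations(schema, fds):
    """Return the given FDs that are non-trivial and whose LHS is not a superkey."""
    return [(lhs, rhs) for lhs, rhs in fds
            if not rhs <= lhs and closure(lhs, fds) != schema]

work_on = {"EID", "Ename", "email", "PID", "hours"}
fds = [({"EID"}, {"Ename", "email"}),
       ({"EID", "PID"}, {"hours"})]       # second FD is an assumption

violations = bcnf_violations(work_on, fds)
```

Checking only the listed FDs is enough to decide whether the whole relation is in BCNF; checking a decomposed sub-relation in general requires the FDs implied on its attributes as well.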
BCNF decomposition example

WorkOn (EID, Ename, email, PID, hours)
BCNF violation: EID -> Ename, email
    Student (EID, Ename, email)    BCNF
    Grade (EID, PID, hours)        BCNF
Another example

WorkOn (EID, Ename, email, PID, hours)
BCNF violation: email -> EID
    StudentID (email, EID)    BCNF
    StudentGrade’ (email, Ename, PID, hours)
    BCNF violation: email -> Ename
        StudentName (email, Ename)    BCNF
        Grade (email, PID, hours)     BCNF
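
The decomposition steps in these examples can be sketched as a small recursive procedure: find a set X that determines extra attributes without being a superkey, split off X together with what it determines, and recurse. The function names are hypothetical, the search over attribute subsets is exponential and only meant for tiny schemas, and the FD {EID, PID} -> hours is an assumption.

```python
from itertools import combinations

def closure(attrs, fds):
    """Return every attribute functionally determined by attrs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def find_violation(rel, fds):
    """Return (X, attributes of rel determined by X) for a BCNF violation, or None."""
    for size in range(1, len(rel)):
        for combo in combinations(sorted(rel), size):
            x = set(combo)
            determined = closure(x, fds) & rel
            # Violation: X determines something new but not all of rel.
            if determined != x and determined != rel:
                return x, determined
    return None

def bcnf_decompose(rel, fds):
    """Split rel on violations until every piece is in BCNF."""
    violation = find_violation(rel, fds)
    if violation is None:
        return [rel]
    x, determined = violation
    r1 = determined              # X plus everything X determines
    r2 = x | (rel - determined)  # X plus the remaining attributes
    return bcnf_decompose(r1, fds) + bcnf_decompose(r2, fds)

fds = [({"EID"}, {"Ename", "email"}),
       ({"EID", "PID"}, {"hours"})]       # second FD is an assumption
pieces = bcnf_decompose({"EID", "Ename", "email", "PID", "hours"}, fds)
```

With these FDs the procedure yields (EID, Ename, email) and (EID, PID, hours), matching the first example. Different choices of violating FD can lead to different, equally valid decompositions, as the second example shows.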
Normalization
There is a sequence to normal forms:
1NF is considered the weakest,
2NF is stronger than 1NF,
3NF is stronger than 2NF, and
BCNF is considered the strongest
Also,
any relation that is in BCNF, is in 3NF;
any relation in 3NF is in 2NF; and
any relation in 2NF is in 1NF.
In 3NF, but not in BCNF:

Relation: (student_no, course_no, instr_no)

An instructor teaches one course only.
A student takes a course and has one instructor.

{student_no, course_no} -> instr_no
instr_no -> course_no

Not in BCNF since we have instr_no -> course_no, but
instr_no is not a candidate key.
(student_no, course_no, instr_no) is decomposed into

(student_no, instr_no)
    {student_no, instr_no} -> student_no
    {student_no, instr_no} -> instr_no

(course_no, instr_no)
    instr_no -> course_no
In 2NF, but not in 3NF, nor in BCNF:

Relation: (inv_no, line_no, prod_no, prod_desc, qty)

since prod_no is not a candidate key and we have:
prod_no -> prod_desc.
Summary

Philosophy behind BCNF:
Data should depend on the key, the whole key, and
nothing but the key!

Philosophy behind 3NF:
… But not at the expense of more expensive constraint
enforcement!
Basic knowledge

Transaction view of DBMS
    Read(x)
    Write(x)

ACID
    Atomicity: TX’s are either completely done or not done at all
    Consistency: TX’s should leave the database in a consistent state
    Isolation: TX’s must behave as if they are executed in isolation
    Durability: Effects of committed TX’s are resilient against failures

SQL transactions
-- Begins implicitly
SELECT …;
UPDATE …;
ROLLBACK | COMMIT;
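
To see ROLLBACK and COMMIT in action, here is a small sketch using Python's built-in sqlite3 module with an in-memory database. The table name `account` and the balance values are made up for illustration; the sketch relies on sqlite3's default behavior of opening a transaction implicitly before an UPDATE.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO account VALUES ('A', 400)")
conn.commit()

def balance():
    """Read the current balance of account A."""
    return conn.execute(
        "SELECT balance FROM account WHERE name = 'A'").fetchone()[0]

# A transaction begins implicitly before this UPDATE.
conn.execute("UPDATE account SET balance = balance - 100 WHERE name = 'A'")
conn.rollback()            # atomicity: the update is undone completely
after_rollback = balance()

conn.execute("UPDATE account SET balance = balance - 100 WHERE name = 'A'")
conn.commit()              # the change is now permanent
after_commit = balance()
```

After the rollback the balance is still 400; after the commit it is 300.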
Concurrency control

Goal: ensure the “I” (isolation) in ACID
T1:             T2:
read(A);        read(A);
write(A);       write(A);
read(B);        read(C);
write(B);       write(C);
commit;         commit;

(Figure: T1 accesses A and B; T2 accesses A and C.)
Good versus bad schedules

Good!
T1        T2
r(A)
w(A)
r(B)
w(B)
          r(A)
          w(A)
          r(C)
          w(C)

Bad!
T1                       T2
r(A)  Read 400
                         r(A)  Read 400
w(A)  Write 400 - 100
                         w(A)  Write 400 - 50
r(B)
                         r(C)
w(B)
                         w(C)

Good! (But why?)
T1        T2
r(A)
w(A)
          r(A)
          w(A)
r(B)
          r(C)
w(B)
          w(C)
Serial schedule

Execute transactions in order, with no interleaving of
operations





T1.r(A), T1.w(A), T1.r(B), T1.w(B), T2.r(A), T2.w(A),
T2.r(C), T2.w(C)
T2.r(A), T2.w(A), T2.r(C), T2.w(C), T1.r(A), T1.w(A),
T1.r(B), T1.w(B)
Isolation achieved by definition!
Problem: no concurrency at all
Question: how can operations be reordered to allow more
concurrency?
Conflicting operations

Two operations on the same data item conflict if at least
one of the operations is a write






r(X) and w(X) conflict
w(X) and r(X) conflict
w(X) and w(X) conflict
r(X) and r(X) do not
r/w(X) and r/w(Y) do not
Order of conflicting operations matters

E.g., if T1.r(A) precedes T2.w(A), then conceptually, T1
should precede T2
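
The conflict rules above fit in one small predicate. In this sketch the `Op` tuple and the function name `conflicts` are hypothetical; the check that the two operations come from different transactions is an added condition, since that is the case that matters when ordering transactions.

```python
from typing import NamedTuple

class Op(NamedTuple):
    txn: str    # transaction name, e.g. "T1"
    kind: str   # "r" for read or "w" for write
    item: str   # data item, e.g. "A"

def conflicts(a: Op, b: Op) -> bool:
    """Same item, different transactions, and at least one write."""
    return a.txn != b.txn and a.item == b.item and "w" in (a.kind, b.kind)

read_then_write = conflicts(Op("T1", "r", "A"), Op("T2", "w", "A"))
two_reads = conflicts(Op("T1", "r", "A"), Op("T2", "r", "A"))
different_items = conflicts(Op("T1", "w", "A"), Op("T2", "w", "B"))
```

Here `read_then_write` is true, while `two_reads` and `different_items` are false.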
Precedence graph


A node for each transaction
A directed edge from Ti to Tj if an operation of Ti
precedes and conflicts with an operation of Tj in the
schedule

Good: no cycle (edge T1 -> T2 only)
T1        T2
r(A)
w(A)
          r(A)
          w(A)
r(B)
          r(C)
w(B)
          w(C)

Bad: cycle (edges T1 -> T2 and T2 -> T1)
T1        T2
r(A)
          r(A)
w(A)
          w(A)
r(B)
          r(C)
w(B)
          w(C)
Conflict-serializable schedule


A schedule is conflict-serializable iff its precedence
graph has no cycles
A conflict-serializable schedule is equivalent to some
serial schedule (and therefore is “good”)


In that serial schedule, transactions are executed in the
topological order of the precedence graph
You can get to that serial schedule by repeatedly
swapping adjacent, non-conflicting operations from
different transactions
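
The precedence-graph test can be sketched with the standard library's `graphlib` module (Python 3.9+), which raises `CycleError` when no topological order exists. The schedule string format and the function names below are made up for this illustration.

```python
from graphlib import CycleError, TopologicalSorter

def parse(schedule):
    """Turn 'T1.r(A) T2.w(A)' into [('T1', 'r', 'A'), ('T2', 'w', 'A')]."""
    ops = []
    for token in schedule.split():
        txn, rest = token.split(".")
        ops.append((txn, rest[0], rest[2:-1]))
    return ops

def precedence_graph(ops):
    """Map each transaction to the set of transactions that must precede it."""
    preds = {txn: set() for txn, _, _ in ops}
    for i, (ti, ki, xi) in enumerate(ops):
        for tj, kj, xj in ops[i + 1:]:
            if ti != tj and xi == xj and "w" in (ki, kj):
                preds[tj].add(ti)   # edge Ti -> Tj
    return preds

def equivalent_serial_order(schedule):
    """Return a serial order of transactions, or None if the graph has a cycle."""
    graph = precedence_graph(parse(schedule))
    try:
        return list(TopologicalSorter(graph).static_order())
    except CycleError:
        return None

good = "T1.r(A) T1.w(A) T2.r(A) T2.w(A) T1.r(B) T2.r(C) T1.w(B) T2.w(C)"
bad = "T1.r(A) T2.r(A) T1.w(A) T2.w(A) T1.r(B) T2.r(C) T1.w(B) T2.w(C)"
good_order = equivalent_serial_order(good)
bad_order = equivalent_serial_order(bad)
```

For the first schedule the only edge is T1 -> T2, so the equivalent serial order is T1 then T2; the second schedule has edges in both directions and is rejected.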
Remember those from OS class?

Lock: a high-level concept that describes the state of a
data item with respect to read/write operations
    Spinlock
    Semaphore
    Monitor

Deadlock:
    A set of processes is deadlocked if each process is waiting
    for an event that only another process in the set can cause

Starvation:
    A program continues to run indefinitely but fails to make
    any progress
Next

Guaranteeing conflict-serializable schedules with two-phase
locking
Locking

Rules



If a transaction wants to read an object, it must first request a
shared lock (S mode) on that object
If a transaction wants to modify an object, it must first request
an exclusive lock (X mode) on that object
Allow one exclusive lock, or multiple shared locks

Compatibility matrix (Grant the lock?)

Mode of lock(s) currently      Mode of the lock requested
held by other transactions     S        X
S                              Y        N
X                              N        N
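
The compatibility matrix translates directly into a lookup table. This is a minimal sketch; the names `COMPATIBLE` and `can_grant` are hypothetical.

```python
# (mode held by another transaction, mode requested) -> grant?
COMPATIBLE = {
    ("S", "S"): True,
    ("S", "X"): False,
    ("X", "S"): False,
    ("X", "X"): False,
}

def can_grant(requested, held_by_others):
    """Grant only if the request is compatible with every lock others hold."""
    return all(COMPATIBLE[(held, requested)] for held in held_by_others)

shared_with_readers = can_grant("S", ["S", "S"])
exclusive_with_reader = can_grant("X", ["S"])
exclusive_when_free = can_grant("X", [])
```

A shared request is granted alongside other shared locks, an exclusive request is refused while any other lock is held, and any request is granted on an unlocked object.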
Basic locking is not enough

T1: Add 1 to both A and B (preserves A=B)
T2: Multiply both A and B by 2 (preserves A=B)

Possible schedule under locking

T1                         T2
lock-X(A)
r(A)  Read 100
w(A)  Write 100+1
unlock(A)
                           lock-X(A)
                           r(A)  Read 101
                           w(A)  Write 101*2
                           unlock(A)
                           lock-X(B)
                           r(B)  Read 100
                           w(B)  Write 100*2
                           unlock(B)
lock-X(B)
r(B)  Read 200
w(B)  Write 200+1
unlock(B)

A ≠ B!
But still not conflict-serializable!
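
The anomaly can be replayed with a few lines of Python. This sketch ignores the lock and unlock steps (the schedule already respects them) and only executes the reads and writes; the names `UPDATE` and `run` are made up, and both items are assumed to start at 100.

```python
# What each transaction does to a value it has read.
UPDATE = {"T1": lambda v: v + 1,    # T1 adds 1
          "T2": lambda v: v * 2}    # T2 multiplies by 2

def run(schedule, a=100, b=100):
    """Execute (txn, action, item) steps and return the final database state."""
    db = {"A": a, "B": b}
    local = {}                       # private copy of each value read
    for txn, action, item in schedule:
        if action == "r":
            local[(txn, item)] = db[item]
        else:                        # "w": write back the updated local copy
            db[item] = UPDATE[txn](local[(txn, item)])
    return db

interleaved = run([("T1", "r", "A"), ("T1", "w", "A"),
                   ("T2", "r", "A"), ("T2", "w", "A"),
                   ("T2", "r", "B"), ("T2", "w", "B"),
                   ("T1", "r", "B"), ("T1", "w", "B")])
serial = run([("T1", "r", "A"), ("T1", "w", "A"),
              ("T1", "r", "B"), ("T1", "w", "B"),
              ("T2", "r", "A"), ("T2", "w", "A"),
              ("T2", "r", "B"), ("T2", "w", "B")])
```

The interleaved schedule ends with A = 202 and B = 201, while running T1 and then T2 serially leaves both at 202.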
Two-phase locking (2PL)

All lock requests precede all unlock requests

Phase 1: obtain locks, phase 2: release locks

T1              T2
lock-X(A)
r(A)
w(A)
lock-X(B)
unlock(A)
                lock-X(A)
                r(A)
                w(A)
                lock-X(B)    Cannot obtain the lock on B
                             until T1 unlocks
r(B)
w(B)
unlock(B)
                r(B)
                w(B)

2PL guarantees a conflict-serializable schedule

T1        T2
r(A)
w(A)
          r(A)
          w(A)
r(B)
w(B)
          r(B)
          w(B)
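
Whether a single transaction obeys the two-phase rule can be checked by scanning its actions once. The function name `is_two_phase` and the string encoding of actions are assumptions made for this sketch.

```python
def is_two_phase(actions):
    """True if no lock request follows an unlock in the action list."""
    released = False
    for action in actions:
        if action.startswith("unlock"):
            released = True             # phase 2 has begun
        elif action.startswith("lock") and released:
            return False                # lock request after an unlock
    return True

t1_2pl = ["lock-X(A)", "r(A)", "w(A)", "lock-X(B)",
          "unlock(A)", "r(B)", "w(B)", "unlock(B)"]
t1_basic = ["lock-X(A)", "r(A)", "w(A)", "unlock(A)",
            "lock-X(B)", "r(B)", "w(B)", "unlock(B)"]

follows_2pl = is_two_phase(t1_2pl)
basic_follows_2pl = is_two_phase(t1_basic)
```

T1 from the 2PL schedule passes, while T1 from the basic locking example fails because it locks B after unlocking A.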
Problem of 2PL
T1: r(A), w(A), r(B), w(B), Abort!
T2: r(A), w(A), r(B), w(B)
(T2 reads A after T1 has written it)

T2 has read uncommitted data written by T1
If T1 aborts, then T2 must abort as well
Cascading aborts possible if other transactions have read
data written by T2

Even worse, what if T2 commits before T1?
Schedule is not recoverable if the system crashes right
after T2 commits
Strict 2PL
Only release locks at commit/abort time



A writer will block all other readers until the writer
commits or aborts
Used in most commercial DBMS
Next ...

A few examples
Non-2PL, A = 1000, B = 2000, Output = ?

T1                  T2
Lock_X(A)
Read(A)
A := A - 50
Write(A)
                    Lock_S(A)
Unlock(A)
                    Read(A)
                    Unlock(A)
                    Lock_S(B)
Lock_X(B)
                    Read(B)
                    Unlock(B)
                    PRINT(A+B)
Read(B)
B := B + 50
Write(B)
Unlock(B)
2PL, A = 1000, B = 2000, Output = ?

T1                  T2
Lock_X(A)
Read(A)
A := A - 50
Write(A)
Lock_X(B)
Unlock(A)
                    Lock_S(A)
                    Read(A)
Read(B)
B := B + 50
Write(B)
Unlock(B)
                    Lock_S(B)
                    Unlock(A)
                    Read(B)
                    Unlock(B)
                    PRINT(A+B)
Strict 2PL, A = 1000, B = 2000, Output = ?

T1                  T2
Lock_X(A)
Read(A)
A := A - 50
Write(A)
Lock_X(B)
Read(B)
B := B + 50
Write(B)
Unlock(A)
Unlock(B)
                    Lock_S(A)
                    Read(A)
                    Lock_S(B)
                    Read(B)
                    PRINT(A+B)
                    Unlock(A)
                    Unlock(B)
Lock Management

Lock and unlock requests handled by Lock Manager (LM)

LM keeps an entry for each currently held lock.
Entry contains:
    List of xacts currently holding lock
    Type of lock held (shared or exclusive)
    Queue of lock requests
Lock Management, cont.

When lock request arrives:
    Does any other xact hold a conflicting lock?
        If no, grant the lock.
        If yes, put requestor into wait queue.

Lock upgrade:
    Shared lock can request to upgrade to exclusive
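
A lock manager along these lines might be sketched as follows. This is a toy, single-threaded illustration with hypothetical names (`LockManager`, `request`, `release`); a real lock manager would also need latching, fairness between readers and queued writers, and deadlock handling.

```python
from collections import deque

class LockManager:
    """Toy lock table: item -> mode held, set of holders, FIFO queue of waiters."""

    def __init__(self):
        self.table = {}

    def _entry(self, item):
        return self.table.setdefault(
            item, {"mode": None, "holders": set(), "queue": deque()})

    def _try_grant(self, entry, txn, mode):
        others = entry["holders"] - {txn}
        # Conflict unless every lock involved is shared.
        if others and not (mode == "S" and entry["mode"] == "S"):
            return False
        # A sole holder asking for X upgrades; a first holder sets the mode.
        if not others and (mode == "X" or txn not in entry["holders"]):
            entry["mode"] = mode
        entry["holders"].add(txn)
        return True

    def request(self, txn, item, mode):
        """Grant the lock (True) or put the requestor in the wait queue (False)."""
        entry = self._entry(item)
        if self._try_grant(entry, txn, mode):
            return True
        entry["queue"].append((txn, mode))
        return False

    def release(self, txn, item):
        """Release txn's lock and return the waiters that can now be granted."""
        entry = self._entry(item)
        entry["holders"].discard(txn)
        granted = []
        while entry["queue"] and self._try_grant(entry, *entry["queue"][0]):
            granted.append(entry["queue"].popleft()[0])
        return granted

lm = LockManager()
first = lm.request("T1", "A", "S")       # granted
second = lm.request("T2", "A", "S")      # granted: shared with T1
third = lm.request("T3", "A", "X")       # queued behind the readers
after_t1 = lm.release("T1", "A")         # T2 still holds S, T3 keeps waiting
after_t2 = lm.release("T2", "A")         # T3 now gets the exclusive lock
```

The same `request` call also handles an upgrade: a transaction that is the sole holder of a shared lock is granted the exclusive lock immediately.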
Deadlocks

Deadlock: Cycle of transactions waiting for locks to be
released by each other.

Two ways of dealing with deadlocks:
    • prevention
    • detection

Many systems just punt and use Timeouts
    • What are the dangers with this approach?
Deadlock Detection


Create and maintain a “waits-for” graph
Periodically check for cycles in graph
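
Checking a waits-for graph for cycles is a depth-first search. The sketch below is one possible version; the function name `find_cycle` and the dictionary encoding (transaction -> list of transactions it waits for) are assumptions, and the edges are illustrative.

```python
def find_cycle(waits_for):
    """Return a list of transactions forming a cycle, or None if there is none."""
    WHITE, GREY, BLACK = 0, 1, 2     # unvisited, on the DFS stack, finished
    color = {t: WHITE for t in waits_for}
    stack = []

    def visit(t):
        color[t] = GREY
        stack.append(t)
        for u in waits_for.get(t, ()):
            if color.get(u, WHITE) == GREY:
                return stack[stack.index(u):]    # back edge closes a cycle
            if color.get(u, WHITE) == WHITE:
                found = visit(u)
                if found:
                    return found
        stack.pop()
        color[t] = BLACK
        return None

    for t in list(waits_for):
        if color[t] == WHITE:
            found = visit(t)
            if found:
                return found
    return None

deadlocked = find_cycle({"T1": ["T2"], "T2": ["T3"], "T3": ["T1"], "T4": ["T2"]})
no_deadlock = find_cycle({"T1": ["T2"], "T2": ["T3"], "T3": [], "T4": ["T2"]})
```

In the first graph T1, T2, and T3 wait on each other in a cycle; T4 is blocked too but is not part of the cycle. Aborting any one transaction on the cycle breaks the deadlock.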
Deadlock Detection (Continued)
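Example:
T1: S(A), S(D), S(B)
T2: X(B), X(C)
T3: S(D), S(C), X(A)
T4: X(B)

Waits-for graph (nodes T1, T2, T3, T4):
    T1 -> T2 (T1 wants S(B), T2 holds X(B))
    T2 -> T3 (T2 wants X(C), T3 holds S(C))
    T3 -> T1 (T3 wants X(A), T1 holds S(A))
    T4 -> T2 (T4 wants X(B), T2 holds X(B))

T1, T2, and T3 form a cycle: Deadlock!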
Deadlock Prevention


Assign priorities based on timestamps (an older transaction
has higher priority).
Say Ti wants a lock that Tj holds
Two policies are possible:
Wait-Die: If Ti has higher priority, Ti waits for Tj; otherwise Ti aborts
Wound-Wait: If Ti has higher priority, Tj aborts; otherwise Ti waits

Why do these schemes guarantee no deadlocks?

Important detail: If a transaction re-starts, make sure it gets its
original timestamp. -- Why?
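
The two policies can be written as small decision functions. This sketch assumes that a smaller timestamp means an older transaction and therefore higher priority; the function names and the returned strings are made up for illustration.

```python
def wait_die(ts_i, ts_j):
    """Ti (timestamp ts_i) requests a lock held by Tj (timestamp ts_j)."""
    # Older requester waits; younger requester dies (aborts).
    return "Ti waits" if ts_i < ts_j else "Ti aborts"

def wound_wait(ts_i, ts_j):
    """Ti (timestamp ts_i) requests a lock held by Tj (timestamp ts_j)."""
    # Older requester wounds (aborts) the holder; younger requester waits.
    return "Tj aborts" if ts_i < ts_j else "Ti waits"

older_requests = (wait_die(1, 2), wound_wait(1, 2))
younger_requests = (wait_die(2, 1), wound_wait(2, 1))
```

In both policies waiting only ever goes in one direction of the timestamp order (old waits for young under Wait-Die, young waits for old under Wound-Wait), so a cycle of waiting transactions cannot form.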
Summary

Correctness criterion for isolation is “serializability”.


In practice, we use “conflict serializability,”
which is somewhat more restrictive but easy to enforce.
Two Phase Locking and Strict 2PL: Locks implement the notions
of conflict directly.


The lock manager keeps track of the locks issued.
Deadlocks may arise; can either be prevented or detected.