miniworld Requirements & collection analysis Database Requirements Conceptual Design Conceptual Schema ( ER diagram ) DBMS independent DBMS specific Data Model Mapping Conceptual Schema ( Relations ) • primary key constraint • foreign key constraint refinement Schema Refinement and Normal Forms •Conceptual database design gives us a set of relation schemas and integrity constraints •Given a design, how do we know it is good or not? •A design can be evaluated from various perspectives, our focus is on data redundancy Conceptual design Schemas ICs The Evils of Redundancy • Redundancy is at the root of several problems associated with relational schemas: – redundant storage – Insertion/update/deletion anomalies Example • Schema – • Hourly_Emps (ssn, name, lot, rating, hrly_wages, hrs_worked) Constraints: 1. 2. ssn is the primary key If two tuples have the same value on rating, they have the same value on hrly_wages SSN Name H Attishoo Lot Rd W d 48 8 10 123-22-3666 231-31-5368 Smiley 22 8 10 30 131-24-3650 Smethurst 35 5 7 30 434-26-3751 Guldu 35 5 7 32 612-67-4134 Madayan 35 8 10 40 40 •Solution: Decomposition If we break Hourly_Emps into Hourly_Emps2 and Wages, then we don’t have updates, insertion, deletion anomalies. Wages Hourly_Emps2 S N L R H 123-22-3666 Attishoo 48 8 40 231-31-5368 Smiley 22 8 30 131-24-3650 Smethurst 35 5 30 434-26-3751 Guldu 35 5 32 612-67-4134 Madayan 35 8 40 R W 8 10 5 7 Decomposition Concerns •Should a relation be decomposed? •If a relation is not in certain form, some problems (e.g., redundancy) will arise, are these problems tolerable? • Aforementioned anomalies • Potential performance loss: Queries over the original relation may required to join the decomposed relations •How to decompose a relation? Two properties must be preserved: •lossless-join: the data in the original relation can be recovered from the smaller relations •dependency-preservation: all constraints on the original relation must still hold by enforcing some constraints on each of the small relations Functional Dependencies (FDs) In a relation schema R, a set of attributes X functionally determines a set of attributes Y if and only if whenever two tuples of R agree on X value, they must necessarily agree on the Y value. XY t1, t 2 r( R), where r(R) is an instance of R, t1[ X ] t 2[ X ] t1[Y ] t 2[Y ] X R and Y R XY: Y is functionally dependent on X, or X uniquely determines Y or X functionally determines Y, or X determines Y X X1 X1 X2 Y Y2 Y2 Y2 Z Z1 Z2 Z3 X X1 X1 X1 Y Y1 Y1 Y2 Z Z1 Z2 Z1 Does this data set violate X->Y? Does this data set violate Z->Y? Does this data set violate X->Y? Does this data set violate XY->Z? Does this data set violate Z->X? • An FD is a statement about all allowable relations. – Must be identified based on semantics of application. – Given some allowable instance r1 of R, we can check if it violates some FD f, but we cannot tell if f holds over R! • A primary key constraint is a special case of an FD – The attributes in the key play the role of X, and the set of all attributes in the relation plays the role of Y Example 1 Hourly_Emps (ssn, name, lot, rating, hrly_wages, hrs_worked) • Notation: We will denote this relation schema by listing the attributes: SNLRWH – – This is really the set of attributes {S,N,L,R,W,H}. Sometimes, we will refer to all attributes of a relation by using the relation name. (e.g., Hourly_Emps for SNLRWH) • Some FDs on Hourly_Emps: – – ssn is the key: SSNLRWH (or {S}{S,N,L,R,W,H}) rating determines hrly_wages : RW Example 2 since dname name ssn did lot Employees Works_for budget Departments Additional Constraints: Employees are assigned parking lots based on their department. All employees in the same department is given the same lot. FD: did->lot Works_for(ssn,name,did,since) Department (did,dname,budget,lot); Dependency Reasoning A set of dependencies may imply some additional dependencies. EMP_DEPT(ENAME,SSN,BDATE,ADDRESS,DNUMBER,DNAME,DMGRSSN) F={SSN->{ENAME,BDATE,ADDRESS,DNUMBER}, DNUMBER->{DNAME,DMGRSSN} } F infers the following additional functional dependencies: F {SSN}->{DNAME,DMGRSSN} F {SSN}->{SSN} F {DNUMBER}->{DNAME} Dependency Reasoning A set of dependencies may imply some additional dependencies. Some important questions 1. Given a set of attributes X, what attributes can be determined by X 2. Given an FD set, what other dependencies are implied 3. Given an FD set F, what is the minimum set of dependencies that is equivalent to F Armstrong’s Axioms • Armstrong’s attributes: Axioms where X, Y, Z are sets of –Reflexivity: If X Y, then XY. –Augmentation: –Transitivity: If XY, then XZ YZ for any Z. If X Y and YZ, then XZ. PROOFS •Reflexive rule: If X Y, then XY. Let {t1,t2} r(R) such that t1[X]=t2[X] Since Y X, t1[X]=t2[X] t1[Y]=t2[Y] XY. PROOFS (Cont’d) •Transitive rule: If XY and YZ, then XZ. Let XY and YZ t1, t 2 r ( R) such that t1[X]=t2[X], we have: (1) t1[Y]=t2[Y] (2)&(4) t1[Z]=t2[Z] (3)&(5) XZ (1) (2) (3) (4) (5) PROOFS (Cont’d) •Augmentation rule: If XY, then XZYZ. Assume that the Augmentation rule is not true. t1, t 2 r ( R) t1[X] = t2[X] t1[Y] = t2[Y] t1[XZ] = t2[XZ] t1[YZ] != t2[YZ] (1)&(3) (2)&(5) (1) (2) (3) (4) t1[Z]=t2[Z] (5) t1[YZ]=t2[YZ] (6) (6) Contradicts (4) Additional Inference Rules for Functional Dependencies – Union: If X Y and X Z, then X YZ. – Decomposition: If XYZ, then XY and XZ. – Pseudotransitive Rule: If XY and WYZ then WXZ. PROOFS (Cont’d) •Union rule: If XY and XZ, then XYZ. Given XY and XZ. (1) (2) Applying Augmentation rule on (1), we have XXXY XXY. (3) Applying Augmentation rule on (2), we have XYZY XYYZ . (4) Applying Transitive rule on (3) and (4), we have XYZ. PROOFS (Cont’d) •Decomposition rule: If XYZ then XY and XZ. Given XYZ. Since Y YZ, reflexive rule gives YZY. (1) (2) Applying Transitive rule on (1) and (2), we have XY. XZ is derived in a similar way. PROOFS (Cont’d) •Pseudotransitive rule: If XY and WYZ, then WXZ. Given XY and WYZ. (1) (2) Applying Augmentation rule on (1), we have WXWY. (3) Applying Transitive rule on (3)&(2), we have WXZ. Exercise • Prove or disprove the following inference rules 1. 2. 3. 4. • • {WY,XZ} |= {WXY} {XY,XW,WYZ} |= {XZ} {XY} |= {XYZ} {XY, Z Y} |= {XZY} Prove using inference rules Disprove by showing a counter example