The Relational Model, Normalization, and Modification Anomalies

Outline
• Part 1: Introduction: overview of normalization; big names in normalization
• Part 2: Normal Forms: definitions and techniques
• Part 3: Tokenization: concepts and application
• Part 4: Epilog: BCNF revisited, related activities, and reflections

Part 1: Introduction
• Overview of normalization approaches
• Review of normal forms
• Some big names in the history of normalization
• 3 of Codd's 12 rules

Normalization
• Normalization: a process of evaluating and converting a relation to reduce modification anomalies
• Modification Anomaly: an unexpected consequence resulting from maintenance of the data in a database
• Two normalization approaches:
  – Top-Down: relationships among entities
  – Bottom-Up: relationships among attributes

Top-Down Normalization
• Many-to-many associations are "factored"
• Net outcomes:
  – More entities
  – More relations
• Quality: many-to-many associations are eliminated
[Diagram: a many-to-many association between entities A and B is factored into two one-to-many associations through a new intersection entity C]

Bottom-Up Normalization
• Application of a sequence of transformations that results in an improved data model at each stage
• Quality:
  – A quality rating system for a relation
  – Elimination of maintenance problems and minimization of data replication
• Introduced by E. F. Codd with the Relational Model
[Diagram: a relation with attributes A, B, C, D is decomposed step by step into smaller relations]

Known Normal Forms
• Bottom-Up: a specific sequence of normal forms, ordered from lowest to highest (next slide)
• Top-Down: no specific normal form; the goal might be viewed as the elimination of many-to-many associations
• The known normal forms are related one to the other

Ordering of Normal Forms
– First Normal Form (1NF): elimination of repeating field types; "atomic" fields; no duplicate rows
– Second Normal Form (2NF): elimination of partial key dependencies
– Third Normal Form (3NF): elimination of transitive key dependencies among non-key attributes
– Boyce-Codd Normal Form (BCNF): elimination of dependencies whose determinants are not candidate keys
– Fourth Normal Form (4NF): elimination of multi-valued dependencies
– Fifth Normal Form (5NF): elimination of join anomalies
– Domain/Key Normal Form (DKNF): elimination of all modification anomalies
DKNF => 5NF => 4NF => BCNF => 3NF => 2NF => 1NF

A Normalization Strategy
• For the initial Logical Model (LM):
  – Apply the Top-Down approach, removing all many-to-many associations
  – Creates new objects and relationships
• For the resulting LM:
  – Apply the Bottom-Up approach, to the level of quality desired (usually 3NF or BCNF)
  – Creates new objects and relationships
• Reassess the resulting LM:
  – LM quality is the least quality level of any object in the LM
  – Optimization/tuning

Note
• "No major application will run in Third Normal Form." (George Koch, formerly a senior vice president of Oracle)

Normalization: Contributors
• Dr. E. F. Codd
  – "A Relational Model of Data for Large Shared Data Banks," CACM, Vol. 13, No. 6, June 1970
  – Introduced the Relational Model
  – Identified the first three normal forms
• Dr. R. F. Boyce: extended Codd's original three forms
• Dr. R. Fagin: extended the theory proposed by Codd and introduced another way of evaluating a design
• Dr. David M. Kroenke, author and educator: instrumental in clarifying the theory of normal forms
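To make the idea of a modification anomaly concrete before turning to Codd's rules, here is a minimal SQL sketch of an update anomaly. The table, columns, and data are hypothetical, loosely based on the dorm example used in Part 2 of these slides.

```sql
-- Hypothetical denormalized table: DORM determines DORM_FEE,
-- but the fee is repeated on every student row.
CREATE TABLE STUDENT_DORM (
  STUD_ID   INTEGER PRIMARY KEY,
  STUD_NAME VARCHAR(40),
  DORM      VARCHAR(20),
  DORM_FEE  DECIMAL(8,2)
);

INSERT INTO STUDENT_DORM VALUES (1, 'Adams', 'Delta Hall', 1200.00);
INSERT INTO STUDENT_DORM VALUES (2, 'Baker', 'Delta Hall', 1200.00);
INSERT INTO STUDENT_DORM VALUES (3, 'Clark', 'Gamma Hall', 1500.00);

-- Update anomaly: raising Delta Hall's fee must touch every matching
-- row; missing even one row leaves the database inconsistent.
UPDATE STUDENT_DORM SET DORM_FEE = 1300.00 WHERE DORM = 'Delta Hall';
```

Because DORM alone determines DORM_FEE, the fee is stored once per student rather than once per dorm; that replicated fact is exactly what normalization will remove.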
Codd's 12 Rules (for defining a fully relational DBMS)
• Published in the 1980s by Codd to defend his original notion of a relational DBMS
• Really 13 rules (0-12)
• Rules 1, 2, and 3 are relevant to a discussion of modification anomalies

Codd's 12 Rules (continued)
• Rule 1: Information Rule
  – All information in the database should be represented in one and only one way: as values in a table
  – Note: rows and columns are not ordered
• Source: http://www.cse.ohio-state.edu/~sgomori/570/coddsrules.html

Codd's 12 Rules (continued)
• Rule 2: Guaranteed Access Rule
  – Each and every datum (atomic value) is guaranteed to be logically accessible by resorting to a combination of table name, primary key value, and column name
  – Note: no repeating rows, no pointers, no repeated fields; this eliminates the CODASYL and OO models
• Source: http://www.cse.ohio-state.edu/~sgomori/570/coddsrules.html

Codd's 12 Rules (continued)
• Rule 3: Systematic Treatment of Null Values
  – Null values (distinct from the empty character string or a string of blank characters, and distinct from zero or any other number) are supported in the fully relational DBMS for representing missing information in a systematic way, independent of data type
  – Note: primary keys may not be NULL (see Rule 1)
• Source: http://www.cse.ohio-state.edu/~sgomori/570/coddsrules.html

Part 2: Normal Forms
• Definitions and terminology:
  – Functional dependency
  – Key
  – The first 4 normal forms
  – Selecting a key

Review: Functional Dependency
• For non-empty attribute collections A and B of a relation R, B is Functionally Dependent on A if for each value of A there is exactly one value of B
• Remarks:
  – A is said to Functionally Determine B
  – A is called a Determinant
  – The relationship is written A → B

Review: Key
• Definitions: given that ε is the set of all attributes of a relation R,
  – A is an Identifier for R if A → ε
  – K is a Key for R if and only if
    • K is an identifier for R, and
    • no non-empty proper subset of K is an identifier for R
• Notes:
  – All relations have a key (covered in later slides)
  – An attribute that belongs to the selected key is called a Key Attribute; all other attributes are called Non-Key Attributes

Example
• ε = {STUD-ID, STUD-NAME, DORM, DORM-FEE}; A = {STUD-ID}; B = {DORM}; C = {DORM-FEE}
• Two identified functional dependencies:
  – A → ε
  – B → C
• A is a key: it cannot be reduced, since it has only one attribute
• B is not a key: it does not determine all attributes

Review: First Normal Form
• A relation R is in FIRST NORMAL FORM (1NF) if and only if
  – R has no repeating attribute types, AND
  – all attribute types of R are "atomic", AND
  – R has no repeated rows
• Not a functional-dependency condition, but essential to comply with Codd's Rule 2 for a relational DBMS

Review: Second Normal Form
• Definitions:
  – A relation R with key K has a Partial Key Dependency if and only if a collection of non-key attributes is determined by (functionally dependent on) a non-empty proper subset of K
  – A relation R is in SECOND NORMAL FORM (2NF) if and only if
    • R is in 1NF, AND
    • R has no partial key dependencies
• Note: by definition, any relation in 2NF is in 1NF; thus the LM is improved

Review: Third Normal Form
• Definitions:
  – For attribute collections A and C of a relation R, there is a Transitive Dependency of C upon A if there is an attribute collection B of R for which
    • A → B, and
    • B → C
  – A relation R is in THIRD NORMAL FORM (3NF) if and only if
    • R is in 2NF, AND
    • R has no transitive dependencies of one non-key attribute collection upon another non-key attribute collection
• Note: 3NF implies 2NF

Review: Boyce-Codd Normal Form
• Definitions:
  – Attribute collections A and B of a relation R are Candidate Keys for R if and only if
    • A is a key for R, and
    • B is a key for R, and
    • A is not equal to B
  – A relation R is in BOYCE-CODD NORMAL FORM (BCNF) if and only if
    • R is in 3NF, AND
    • every determinant of R is a candidate key
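Before moving on to key selection, a minimal SQL sketch (hypothetical names, continuing the dorm example above) of the decomposition that removes the transitive dependency STUD-ID → DORM → DORM-FEE:

```sql
-- STUD_ID -> DORM and DORM -> DORM_FEE is a transitive dependency,
-- so DORM_FEE moves to a new parent table keyed by DORM.
CREATE TABLE DORM (
  DORM     VARCHAR(20) PRIMARY KEY,
  DORM_FEE DECIMAL(8,2)
);

CREATE TABLE STUDENT (
  STUD_ID   INTEGER PRIMARY KEY,
  STUD_NAME VARCHAR(40),
  DORM      VARCHAR(20) REFERENCES DORM(DORM)  -- foreign key to the parent
);

-- Each fee is now stored exactly once, keyed by its determinant DORM;
-- both tables satisfy 3NF (and BCNF) for the stated dependencies.
```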
Selecting a Key
1. Identify all determinants.
2. The set of all attributes is a finite set that is an identifier for the relation; place these attributes on the "Key" side of the LM diagram.
3. One by one, move an attribute determined by other attribute collections from the "Key" side to the "Non-Key" side.
4. Repeat step 3 until no more attributes on the "Key" side are determined by attributes on the "Key" side.
5. The attributes remaining on the "Key" side are a key.

Key Selection
[Diagram: start with all attributes A, B, C, D on the "Key" side and ask, "What is the minimal set of attributes that uniquely identifies the row?" Based on the functional dependencies, each determined attribute (here C, then D) is moved to the "Non-Key" side; the attributes left on the "Key" side (A, B) are the minimal set that uniquely identifies the row.]

Summary of Normal Forms
[Diagram: a relation with key attributes A, B and non-key attributes C, D; annotations mark the steps: find the key, then remove the 2NF, 3NF, and BCNF dependencies.]

Part 3: Tokenization
• Modification anomalies and design flaws
• Tokenized tables:
  – Functional dependencies: 2NF, 3NF, and BCNF
  – Tokenized tables: an approach permitting a narrow focus on the dependency, not the actual data
• Observe the result of a normalization process on:
  – Maintenance of data
  – The Logical Model

Normalization (Review)
• Normalization: a process of evaluating and converting a relation to reduce modification anomalies
• Modification Anomaly: an unexpected consequence resulting from maintenance of the data in a database

Anomalies: Types
• Two types associated with databases:
  – Modification Anomaly, with three basic types:
    • Insertion,
    • Deletion, and
    • Update
  – Design Anomaly: a flaw in the logical design of the database itself
• Connecting the two types of anomalies:
  – For each modification anomaly there is a corresponding design anomaly, and
  – whenever there is a design anomaly there are modification anomalies which may surface

Anomalies: Design
• Types have been classified, and criteria for their removal developed
• Normalization: the process of removing design anomalies
• A normal form identifies a type of anomaly

Tokenized Tables
• Designed to emphasize the relationship between modification anomalies and normal forms using abstractions of actual data tables: Tokenized Tables
• A supplement to the traditional, textbook approach
• Daigle, Roy (1996), "Teaching Normalization Concepts with Abstraction," AIS

Tokenized Tables
• Most approaches to normalization use context-based data tables and intuition to examine the relationship
• Definitions:
  – Problem abstraction: removing the irrelevant, i.e., the context
  – Solution generalization: exhibiting data anomalies independent of context
  – Constraint: a "faithful" state of the database
  – Verification: demonstrate that normalization removes the data anomalies

Tokenized Tables
• "Tokenize" refers to the process of converting context-based data into a symbolic representation:
  – Each attribute is assigned its own variable name (A, B, C, etc.)
  – Each data value is assigned a distinct symbolic value (a1, a2, …; b1, b2, …; etc.)

Tokenization (context)

COURSE ID  SECTION  COURSE TITLE    INSTRUCTOR NAME  INSTRUCTOR LOCATION
ISC 285    101      Programming II  Chapman          FCW 9
ITE 285    101      Programming II  Chapman          FCW 9
ACC 201    501      Fund Acctg      Miller           MCOB 310
MKT 300    801      Intro Mktg      Bennett          MCOB 310
MKT 300    802      Intro Mktg      Beatty           MCOB 333

Determinants:
1. COURSE ID → COURSE TITLE (courses could be cross-listed)
2. COURSE ID, SECTION → INSTRUCTOR NAME (a course could be taught by different faculty)
3. INSTRUCTOR NAME → INSTRUCTOR LOCATION (faculty could share an office location)
4. COURSE ID, SECTION is a key for this table
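Before the context is stripped away, here is a minimal SQL sketch of this context table (the underscore column spellings are my own) so the determinants can be checked against actual rows:

```sql
-- Context-based table from the slide; the key is (COURSE_ID, SECTION).
CREATE TABLE COURSE_SECTION (
  COURSE_ID           VARCHAR(10),
  SECTION             VARCHAR(5),
  COURSE_TITLE        VARCHAR(30),
  INSTRUCTOR_NAME     VARCHAR(20),
  INSTRUCTOR_LOCATION VARCHAR(10),
  PRIMARY KEY (COURSE_ID, SECTION)
);

INSERT INTO COURSE_SECTION VALUES ('ISC 285', '101', 'Programming II', 'Chapman', 'FCW 9');
INSERT INTO COURSE_SECTION VALUES ('ITE 285', '101', 'Programming II', 'Chapman', 'FCW 9');
INSERT INTO COURSE_SECTION VALUES ('ACC 201', '501', 'Fund Acctg',     'Miller',  'MCOB 310');
INSERT INTO COURSE_SECTION VALUES ('MKT 300', '801', 'Intro Mktg',     'Bennett', 'MCOB 310');
INSERT INTO COURSE_SECTION VALUES ('MKT 300', '802', 'Intro Mktg',     'Beatty',  'MCOB 333');

-- Any rows returned here would falsify determinant 3
-- (INSTRUCTOR_NAME -> INSTRUCTOR_LOCATION):
SELECT INSTRUCTOR_NAME
FROM   COURSE_SECTION
GROUP  BY INSTRUCTOR_NAME
HAVING COUNT(DISTINCT INSTRUCTOR_LOCATION) > 1;
```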
Tokenization (context free)
• Map the context table to symbols: COURSE ID → A, SECTION → B, COURSE TITLE → C, INSTRUCTOR NAME → D, INSTRUCTOR LOCATION → E

A   B   C   D   E
a1  b1  c1  d1  e1
a2  b1  c1  d1  e1
a3  b2  c2  d2  e2
a4  b3  c3  d3  e2
a4  b4  c3  d4  e3

Tokenization (abstraction)
• Conceptual diagram: key (A, B) determines C, D, E
• Determinants (without context):
  1. A → C
  2. A, B → D
  3. D → E
  4. A, B is a key

Tokenization (initial, then faithful representation)
• Start from the conceptual diagram and an empty tokenized table with columns A through E
• Populate the table so that it is a faithful state of the database: every row honors every determinant; the result is the tokenized table above

An Illustration: The Normalization Procedure (3NF case)
• Step 1: Draw the tokenized conceptual diagram of the unnormalized relation: key A; A → B; B → C
• Step 2: Build a faithful tokenized data table:

A   B   C
a1  b1  c1
a2  b2  c2
a3  b1  c1
a4  b3  c1
a5  b2  c2

• Step 3: Locate the modification anomalies:
  – Insertion: (b4, c3)
  – Deletion: row a4
  – Update: (b1, c1) to (b1, c5), or (a3, b1) to (a3, b2)
• Step 4: Normalize: decompose into the relations A → B and B → C
• Step 5: Build the corresponding projections:

A   B        B   C
a1  b1       b1  c1
a2  b2       b2  c2
a3  b1       b3  c1
a4  b3
a5  b2

• Step 6: Verify that the modification anomalies are removed:
  – Insertion: removed
  – Deletion: removed
  – Update: removed
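A sketch of Step 3 in SQL, using a hypothetical table R that mirrors the tokenized table above: with A as the primary key, the anomalies surface as concrete statement failures and data loss.

```sql
-- Tokenized relation from the illustration; the key is A.
CREATE TABLE R (
  A VARCHAR(5) PRIMARY KEY,
  B VARCHAR(5),
  C VARCHAR(5)
);

INSERT INTO R VALUES ('a1', 'b1', 'c1');
INSERT INTO R VALUES ('a2', 'b2', 'c2');
INSERT INTO R VALUES ('a3', 'b1', 'c1');
INSERT INTO R VALUES ('a4', 'b3', 'c1');
INSERT INTO R VALUES ('a5', 'b2', 'c2');

-- Insertion anomaly: the fact (b4, c3) has no A value yet, and a NULL
-- primary key is forbidden (Codd's Rule 3), so the insert is rejected.
INSERT INTO R VALUES (NULL, 'b4', 'c3');  -- fails: primary key violation

-- Deletion anomaly: removing a4 also loses the only record of b3 -> c1.
DELETE FROM R WHERE A = 'a4';
```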
Guidelines for Locating Modification Anomalies
• Insertion of a "small" entity produces a KEY problem for the "large" entity (violates Codd's Rule 3)
• Deletion of a "large" entity loses a "small" entity (loss of data integrity)
• Updates to the "small" entity need to be performed in several places (violates Codd's Rule 2)

2NF (Steps 1 and 2)
• Step 1: Initial design. Conceptual diagram: key (A, B); A, B → C; B → D (a partial key dependency)
• Step 2: Create a faithful table state. Building the table: is it a faithful representation of the determinants? Problems?

A   B   C   D
a1  b1  c1  d1
a1  b2  c2  d2
a2  b1  c1  d1
a2  b2  c3  d2
a3  b3  c2  d1

2NF (Step 3: find the anomalies)
Modification anomalies to look for: Insert (b4, d3); Delete row (a3, b3); Update (b1, d1) to (b1, d4).
• Insertion anomaly: recording (b4, d3) would require the row (null, b4, null, d3), and you can't have null in the primary key
• Deletion anomaly: deleting row (a3, b3) loses the information (b3, d1)
• Update anomaly: changing (b1, d1) to (b1, d4) in one row forces the same change in every other row containing b1 ("you have to change it here too")

2NF (Step 4: remove the partial key dependency; normalize to 2NF)
• Pull out the partial dependency, then the non-partial dependency:
  – Child: A, B → C, with B as a foreign key
  – Parent: B → D

2NF (Step 5: build the new tables from the original with SQL: projection/distinct rows)

Child:            Parent:
A   B   C         B   D
a1  b1  c1        b1  d1
a1  b2  c2        b2  d2
a2  b1  c1        b3  d1
a2  b2  c3
a3  b3  c2

2NF (Step 5: verify removal of the anomalies)
• Insertion: (b4, d3) now goes into the parent table alone; a value for A is not required, since A is not an attribute of that table
• Deletion: deleting (a3, b3) from the child no longer loses (b3, d1); it remains in the parent
• Update: for (b1, d1) to (b1, d4), there is now only one place to change the value
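Step 5 says the new tables are built from the original with SQL (projection/distinct rows). A minimal sketch of how that might look; ORIGINAL, CHILD, and PARENT are hypothetical names, and CREATE TABLE ... AS SELECT syntax varies by DBMS:

```sql
-- Parent: the partial dependency B -> D, one row per distinct B value.
CREATE TABLE PARENT AS
SELECT DISTINCT B, D
FROM   ORIGINAL;

-- Child: the remaining attributes; B stays behind as a foreign key.
CREATE TABLE CHILD AS
SELECT DISTINCT A, B, C
FROM   ORIGINAL;
```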
3NF (Steps 1 and 2)
• Step 1: Conceptual diagram: key A; A → B; B → C (a transitive relationship)
• Step 2: Tokenized data table. We're building the table: does it capture what could happen over time?

A   B   C
a1  b1  c1
a2  b2  c2
a3  b1  c1
a4  b3  c1
a5  b2  c2

3NF (Step 3: find the anomalies)
Modification anomalies to look for: Insert (b4, c3); Delete row a4; Update (b1, c1) to (b1, c5), OR (a3, b1) to (a3, b2).
• Insertion anomaly: (null, b4, c3): can't have null in the primary key
• Deletion anomaly: deleting row a4 loses the information (b3, c1)
• Update anomaly: changing (b1, c1) to (b1, c5) in one row forces the same change in every other row containing b1 ("you have to change it here too")
• What about the other update? With context (A = SID, B = Building, C = Fee), changing (a3, b1) to (a3, b2) raises a question: should c1 also change to c2, or should b2's fee c2 change to c1?

3NF (Step 4: normalize to 3NF)
• Move the transitive dependency to a new parent table:
  – Child: A → B, with B as a foreign key
  – Parent: B → C

3NF (Step 5: projection/distinct rows)

A   B        B   C
a1  b1       b1  c1
a2  b2       b2  c2
a3  b1       b3  c1
a4  b3
a5  b2

Assignment: verify that the anomalies were removed!

BCNF (Steps 1 and 2)
• Step 1: Conceptual diagram: key (A, B); A, B → C, D; and D → B. "D" is a determinant but not a candidate key
• Step 2: Tokenized data table. We're building the table: does it capture what could happen over time?

A   B   C   D
a1  b1  c1  d1
a2  b1  c2  d1
a1  b2  c2  d2
a2  b2  c1  d2
a3  b1  c1  d3

BCNF (Step 3: find the anomalies)
Modification anomalies to look for: Insert (d4, b3); Delete row (a3, b1); Update (d1, b1) to (d1, b2), OR (a2, b1) d1 to (a2, b1) d2.
• Insertion anomaly: (null, b3, null, d4): can't have null in the primary key
• Deletion anomaly: deleting row (a3, b1) loses the information (d3, b1)
• Update anomaly: changing (d1, b1) to (d1, b2) forces the same change in every row containing d1 ("you have to change it here too")
• The other update, (a2, b1) d1 to (a2, b1) d2: hmm… (elsewhere d2 is paired with b2, so the changed row would contradict D → B)

BCNF (Step 4: normalize to BCNF)
• Decompose into:
  – Child: A, D → C, with D as a foreign key
  – Parent: D → B

BCNF (Step 5: projection/distinct rows)

A   C   D        D   B
a1  c1  d1       d1  b1
a2  c2  d1       d2  b2
a1  c2  d2       d3  b1
a2  c1  d2
a3  c1  d3

Assignment: verify that the anomalies were removed!

BCNF (another example, with context)
• Constraints: an advisor can only advise on one project; a project can have multiple advisors; a student can be on multiple projects
• Conceptual diagram: key (A, B); A, B → C, D; and D → A. "D" is a determinant but not a candidate key

A   B   C   D
a1  b1  c1  d1
a1  b2  c1  d1
a1  b3  c3  d2
a2  b4  c4  d3
a2  b1  c5  d3
a2  b2  c6  d4

Normalized to BCNF:
• Child: B, D → C, with D as a foreign key
• Parent: D → A
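For the assignment above ("verify that the anomalies were removed"), here is a sketch using the first BCNF example's decomposition; R1 and R2 are hypothetical names. Each operation that was anomalous in Step 3 is now safe:

```sql
-- After BCNF decomposition: the key of R2 is D, the key of R1 is (A, D).
CREATE TABLE R2 (
  D VARCHAR(5) PRIMARY KEY,
  B VARCHAR(5)
);
CREATE TABLE R1 (
  A VARCHAR(5),
  C VARCHAR(5),
  D VARCHAR(5) REFERENCES R2(D),
  PRIMARY KEY (A, D)
);

-- Insertion: the fact (d4, b3) no longer needs values for A or C.
INSERT INTO R2 VALUES ('d4', 'b3');

-- Deletion: removing (a3, d3) from R1 leaves (d3, b1) intact in R2.
DELETE FROM R1 WHERE A = 'a3' AND D = 'd3';

-- Update: (d1, b1) -> (d1, b2) is now a single-row change.
UPDATE R2 SET B = 'b2' WHERE D = 'd1';
```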
Epilog
• BCNF revisited
• Related activities
• Reflections:
  – Data redundancy vs. data replication
  – Learning about modification anomalies
  – Past … future
  – A hypothesis?

Another Way to Understand BCNF
• I always had heartburn about BCNF…
• So I searched for another approach: my approach
  – Authors gave a definition involving "candidate keys," but I never found an author who demonstrated how to use a candidate key in the normalization step
• The approach:
  1. Find another candidate key (I believe this is why authors shied away from this approach: you have to "prove" it is a key!)
  2. Revise the diagram using the newly found candidate key
  3. Ask the question: can this diagram be normalized using known normalization steps?
• The process is illustrated on the next two slides…

BCNF: Revisited
• The textbook step is NOT INTUITIVE! Definition?… Candidate keys?
• Original diagram: key (A, B); A, B → C, D; D → A
• Target: the normalized relations B, D → C and D → A

BCNF: Revise the Diagram Using a Different Key!
• Need to show that B, D is a candidate key!
• Here's the proof, for those of you who are interested. Assert: B, D is a key (uses Armstrong's Axioms: look them up!)
  1. Given: A, B → C and D → A
  2. B, D → B and B, D → D (reflexivity)
  3. D → A (given)
  4. B, D → A, B (from 2 and 3)
  5. A, B → C (given)
  6. B, D → C (4, 5, and transitivity)
• B, D determines every attribute of the relation, so B, D is a key: the diagram with key (A, B) can be rewritten as the diagram with key (B, D)
• The rewritten diagram shows a partial key dependency: D → A

BCNF: Normalize to 2NF
• With key (B, D) and the partial key dependency D → A, apply the ordinary 2NF step:
  – Child: B, D → C, with D as a foreign key
  – Parent: D → A
• Does this help you better understand the original normalization step?

Related Activities
• SQL exercises (see the sketch at the end of this deck):
  – Write SQL statements to create the new (normalized) tables from the original table
  – Drop the original table
  – The original table can be made into a view over the new tables (why bother?)
• E-R diagrams:
  – Supplement the impact on normal forms
  – Establish associations among the newly created objects

Reflections: Data Redundancy vs. Data Replication
• Data Redundancy: unnecessary data replication
• Applying normal forms (bottom-up) transforms relationships among attributes into relationships among objects
• For the relational model, replication of Foreign Keys (FK) is necessary to retain the original relationship among attributes through the new relationships among the objects created as a consequence of a normalization process

Reflections: Learning about Modification Anomalies
• Can the relationship between modification anomalies and design flaws be more clearly examined (and learned) by abstracting the relationship?
• This is an empirical question.

Reflections: Past … Future
• Past: a Windows-based tool (Hari Munikar)
  – permitted students to:
    • construct a tokenized table (in 1NF)
    • find a key for the table
    • normalize in steps to 3NF
  – desired functionality:
    • construct a context-based table
    • automatic conversion to tokenized form
    • extend to BCNF
    • construct an E-R diagram from the resulting diagram
• Evaluation of the approach: is it effective?
  – An open research project… anyone interested?

Reflections: A Hypothesis?
• Premises:
  – Students will benefit from an examination that focuses on the underlying principles, because of problem abstraction, solution generalization, and verification
  – Context-based examples can be a source of confusion
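A sketch of the SQL exercise described under Related Activities, using the 3NF example's relations; ORIGINAL, CHILD, and PARENT are hypothetical names, and CREATE TABLE ... AS SELECT syntax varies by DBMS:

```sql
-- 1. Create the normalized tables from the original by projection.
CREATE TABLE PARENT AS SELECT DISTINCT B, C FROM ORIGINAL;
CREATE TABLE CHILD  AS SELECT DISTINCT A, B FROM ORIGINAL;

-- 2. Drop the original table.
DROP TABLE ORIGINAL;

-- 3. Recreate the original as a view over the new tables ("why bother?"):
--    existing queries keep working, while all maintenance now goes
--    through the anomaly-free base tables.
CREATE VIEW ORIGINAL AS
SELECT CHILD.A, CHILD.B, PARENT.C
FROM   CHILD
JOIN   PARENT ON PARENT.B = CHILD.B;
```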