CSE4701 Midterm Exam Advice and Hints

Prof. Steven A. Demurjian, Sr.
Computer Science & Engineering Department
The University of Connecticut
191 Auditorium Road, Box U-155
Storrs, CT 06269-3155
steve@engr.uconn.edu
http://www.engr.uconn.edu/~steve
(860) 486 - 4818

Core Material
  No Questions on … Chapters 1/2/4/5 (Intro/SQL)
  Chapters 3 & 6: Relational Model and Algebra
  Chapter 7: ER Model
    Conceptual Database Design
  Chapter 8: Extended ER Model
    Extension with Inheritance and other Features
  Chapter 9: ER to Relational Translation
    Detailed Algorithm for Translation
  Chapters 15 & 16: FDs and Normalization
    Guidelines for “Good” Design
    Normal Forms and Normalization
    Through Slide … (set in class!)

Schema for the Exam – BasketBall
  http://www.engr.uconn.edu/~steve/Cse4701/cse4701BBall.doc
  PLAYER(PLName, PFName, StartYear, NumYears, UniformNumber);
  COACH(CLName, CFName, StartYear, EndYear);
  TEAM(TeamID, Year, Squad);
  ROSTERS(TeamID, PLName, CLName);
  RSRECORD(TeamID, Wins, Losses);
  PORECORD(TeamID, Wins, Losses);
  STATISTICS(PLName, TeamID, PPG, RPG, APG);
  TITLES(TeamID, TitleType);
  RS - Regular Season, PO - Playoff

Schema for the Exam – BasketBall: Explaining Tables
  Player/Coach Tables: These track information on the players and coaches for the basketball teams (Mens/Womens). Since last names are unique, it is not possible to represent the case where a former player is now a coach (e.g., Kevin Ollie).
  Team Table: This tracks information on each team, namely, the ID of the team, its year, and squad (Mens or Womens).
  Rosters Table: This tracks information on each roster by TeamID, tracking the players and coach for each team.
  RSRecord/PORecord: These tables contain parallel information for a team's regular season and post season win/loss records.
  Statistics Table: This tracks statistics for each player, namely, PPG, RPG, and APG on a team basis.
  Titles Table: This tracks the titles that a team has. Since a team can have multiple titles, the primary key consists of both attributes (TeamID and TitleType).

Schema for the Exam – BasketBall: Assumptions
  Last names of coaches and players are unique.
  Year, StartYear, and EndYear attributes have values such as 1968, 1971, 1994, 1999, 2001.
  In the TEAM table, a year such as 1997 means the season that finishes in 1997 but started in November 1996.
  EndYear is null for active coaches.
  NumYears represents the number of complete years a player is in the program.
  STATISTICS contains values for each player ONLY after the season is complete.
  ROSTERS is a ternary relationship among PLAYER, COACH, and TEAM.
  There is one COACH per year, i.e., coaches are not fired during the season.
  RSRECORD is for the regular season record; PORECORD is for playoffs.
  PPG, RPG, and APG are Points, Rebounds, and Assists Per Game.
  The attribute Squad has the values Mens and Womens, and the attribute TitleType has the values BigEastRS, BigEastCC, NCAA, NIT.

Hints for Taking Exam
  Read the Questions Carefully!
  Ask Questions if you are Confused!
  Answer Questions in Any Order
    Organized to fit on minimum number of pages
    Answer “Easiest” questions for you!
  Assess Points per Time Unit
    75 minutes = 75 points
    15 points = 15 minutes
  For Essay/Short Answer Questions - Length of Answer Matches Points
    5 points = 1/4 page = 3 or 4 sentences
    30 points - if 1/4 page - likely few points!
  Exam Designed to be Longer than 75 Minutes!

Hints for Taking Exam
  Don't Define Concepts
    E.g., if I Ask About Concept X, Don't Explain Concept X; Just Answer the Question and I'll Know If You Know Concept X
  Don't Panic, Read and Review Course Materials Prior to Exam!
  Don't Be Afraid to Not Answer a Question
    60% Correct for 100 Points = 60 Points
    90% Correct for 80 Points = 72 Points
  Partial Credit Is the Norm
    If I Ask You to Pick and Analyze a Concept for a 5 Pt Problem, You Get 1 for the Concept and 4 for the Analysis.

Possible Questions
  Open Notes, Book, and Online (Web)
  5 Total Questions
  Possibilities…
    Constructive and Algorithm Questions
      Relational Algebra
    Understanding Concepts and Applying
    Problem Solving
      ER Design (no Conversion Algorithm)
  Know your Algorithms and Constructs
  Show All Work to Receive Partial (Any) Credit
    Do Not Jump to Final Answer
    Avoid Run-on Explanations
  Covered Material - See Remaining Slides!

Chapters 3 & 6: Relational Model and Algebra
  Covered Material
    Definition of a Relation
    Relational Algebra Including Select, Project, Join, Theta-Join, Natural Join, Union, Intersection, etc.
    Key Concepts (Superkey, Candidate Key, etc.)
    Referential Integrity
    Equivalence of Various Relational Operations
  No Questions …
    Insert, Delete and Modify Operations on Relations
    Semi and Outer Join

Chapters 7 and 8: ER and EER
  Chapter 7: ER Model
    Conceptual Database Design
    Basic ER Concepts
    1-1, 1-m, and m-n Relationships
  Chapter 8: Extended ER Model
    Extension with Inheritance
    Understanding the Differences and their Usage
      Disjoint vs. Overlapping
      Specialization vs. Generalization
      Constraints: Partial and Disjoint
      Categories
  Focus on Understanding Various ER and EER Concepts and Constructs

Chapters 7 and 8: ER and EER
  No Questions on …
    Min/Max Notation
    ER Complications
      Recursive Relationship
      Multiple Relationships Between Two Entity Types
      Participation Constraints (Existence Constraints)
      Strong and Weak Entities
      Relationships Among More than Two Entity Types
      Connection Traps
      Simplification Techniques

Chapter 9: ER Model to Relational Model
  Covered Material
    Detailed Algorithm for Translation
    All Eight Steps and Their Application
    For Step 8, Focus on the Different Approaches that are Utilized to Map Different Types of EER Inheritance
  Make sure that you Clearly Understand the Translation Process!

Chapters 15 & 16: Functional Dependencies and Normalization
  Covered Topics Include …
    Four Guidelines for “Good” Design
    Update Anomalies (Insert, Delete, Modify)
    Functional Dependencies (FDs)
      Single and Multi-Valued Dependencies
      Ability to Define FDs for a Relational Schema
    Normal Forms and Normalization
      1st, 2nd, and 3rd Normal Forms (no BCNF)

More Detailed Summary/Study Guide

Chapters 3 & 6: Relational Model and Algebra
  Covered Material
    Definition of a Relation
    Relational Algebra Including Select, Project, Join, Theta-Join, Natural Join, Union, Intersection, etc.
    Key Concepts (Superkey, Candidate Key, etc.)
    Referential Integrity
    Equivalence of Various Relational Operations
  No Questions …
    Insert, Delete and Modify Operations on Relations
      Slides 36-43 of cse4701chap7.pptx
    Semi and Outer Join
      Slides 86-89 of cse4701chap7.pptx

Basic Concepts - Relation Schema
  A Schema of a Relation
    Denoted as R(A1:D1, A2:D2, ..., An:Dn)
    Set of Attributes That Describe a Relation
    Denoted by {A1:D1, A2:D2, ..., An:Dn}, where Ai (i=1, …, n) is the Attribute Name and Di is the Domain Over Which Ai is Defined
  Domain
    The Set of Values From which the Values of an Attribute Aj are Drawn, Denoted by Domain(Aj)
  Example
    STUDENT(s#, sname, email, dept)
    Domain(s#): Number(9)
    Domain(sname): Char(30)
    Domain(email): Char(20)
    Domain(dept): Char(15)

Relation Instances
  A Relation (Relation Instance)
    An Occurrence of a Relation Scheme R(A1:D1, A2:D2, ..., An:Dn)
    Defined as a Subset of the Cartesian Product of the Domains that Define its Schema, Denoted by R(r) = {T1, T2, ..., Tm}
    Ti (i=1, …, m) is a Member of the Cartesian Product Domain(A1) × Domain(A2) × … × Domain(An)
  R is also Called the Intension of a Relation
  r is also Called the Extension of a Relation

Relation Instances
  EMP
    ENO  ENAME      TITLE
    E1   J. Doe     Elect. Eng.
    E2   M. Smith   Syst. Anal.
    E3   A. Lee     Mech. Eng.
    E4   J. Miller  Programmer
    E5   B. Casey   Syst. Anal.
    E6   L. Chu     Elect. Eng.
    E7   R. Davis   Mech. Eng.
    E8   J. Jones   Syst. Anal.

  WORKS
    ENO  PNO  RESP        DUR
    E1   P1   Manager     12
    E2   P1   Analyst     24
    E2   P2   Analyst     6
    E3   P3   Consultant  10
    E3   P4   Engineer    48
    E4   P2   Programmer  18
    E5   P2   Manager     24
    E6   P4   Manager     48
    E7   P3   Engineer    36
    E7   P5   Engineer    23
    E8   P3   Manager     40

  PROJ
    PNO  PNAME              BUDGET
    P1   Instrumentation    150000
    P2   Database Develop.  135000
    P3   CAD/CAM            250000
    P4   Maintenance        310000
    P5   CAD/CAM            500000

  PROJ[PNO]: P1, P2, P3, P4, P5
  EMP[TITLE]: Elect. Eng., Syst. Anal., Mech. Eng., Programmer

Key Constraints
  Superkey (SK): Any Subset of Attributes Whose Values are Guaranteed to Distinguish Among Tuples
  Candidate Key (CK): A Superkey with a Minimal Set of Attributes (No Attribute Can Be Removed Without Destroying the Uniqueness -- Minimal Identity)
    A Value of an Attribute or a Set of Attributes in a Relation That Uniquely Identifies a Tuple
    There may be Multiple Candidate Keys

Key Constraints
  Primary Key (PK): Choose One From the Candidate Keys
    The Primary Key Attributes are Underlined
  Foreign Key (FK): An Attribute or a Combination of Attributes (Say A) of Relation R1 Which Occurs as the Primary Key of Another Relation R2 (Defined on the Same Domain)
    Allows Linkages Between Relations that are Tracked and Establish Dependencies
    Useful to Capture ER Relationships

Referential Integrity Constraints
  A Constraint Involving Two Relations
    Used to Specify a Relationship Among Tuples in the Referencing Relation and the Referenced Relation
  Definition: R1 and R2 have a Referential Integrity Constraint If
    Tuples in the Referencing Relation R1 have a Set of Foreign Key (FK) Attributes That Reference the Primary Key PK of the Referenced Relation R2
    A Tuple T1 in R1(A1, A2, ..., An) is Said to Reference a Tuple T2 in R2 if ∃ FK ⊆ {A1, A2, ..., An} such that T1[FK] = T2[PK]

Examples
  The EMP, WORKS, and PROJ instances are the same as on the Relation Instances slide, with one additional tuple shown for WORKS: (E9, P3, Engineer, 30). Note that E9 has no matching ENO in EMP.

Referential Integrity Constraints
  A Referential Integrity Constraint Can Be Displayed in a Relational Database Schema as a Directed Arc From R1.FK to R2.PK
    EMP(ENO, ENAME, TITLE)
    WORKS(ENO, PNO, RESP, DUR)
    PROJ(PNO, PNAME, BUDGET)
  WORKS[ENO] is a subset of EMP[ENO]
  WORKS[PNO] is a subset of PROJ[PNO]

What is Relational Algebra?
  Relational Algebra is a Procedural Paradigm
    You Need to Tell What/How to Construct the Result
  Consists of a Set of Operators Which, When Applied to Relations, Yield Relations (Closed Algebra)
  Basic Relational Operations:
    Unary Operations
      SELECT σ
      PROJECT π
    Binary Operations
      Set operations: UNION ∪, INTERSECTION ∩, DIFFERENCE –, CARTESIAN PRODUCT ×
      JOIN operations ⋈

Relational Algebra
  R ∪ S                       union
  R ∩ S                       intersection
  R \ S                       set difference
  R × S                       Cartesian product
  π A1, A2, ..., An (R)       projection
  σ F (R)                     selection
  R ⋈ S                       natural join
  R ⋈θ S                      theta-join
  R ÷ S                       division
  ρ [A1 → B1, ..., An → Bn]   rename

Relational Algebra
  Fundamental Operators
    Selection
    Projection
    Union
    Difference
    Cartesian Product
  Derivable from the Fundamental Operators
    Intersection
    Join, Equi-join, Natural Join

All Relational Algebra Operations
  A Set of Relational Algebra Operations Is Called a Complete Set, If and Only If
    Any Relational Algebra Operator in the Set Cannot be Derived in Terms of a Sequence of Others in the Set
    Any Relational Algebra Operator Not in the Set Can Be Derived in Terms of a Sequence of Only the Operators in the Set
  Important Concepts:
    The Set of Algebra Operations {σ, π, ∪, –, ×} is a Complete Set of Relational Algebra Operations
    Any Query Language Equivalent to These Five Operations is Called Relationally Complete

Relational Algebra: Summary
  Fundamental Operators
    Selection
    Projection
    Union
    Set Difference
    Cartesian Product
  Additional Operators
    Join
    Intersection
    Quotient (Division)
  Union Compatibility
    Same Degree
    Corresponding Attributes Defined Over the Same Domain
  Form: <Operator> <Operand(s)> → <Result>, where the Operands are Relation(s) and the Result is a Relation

Chapters 7 and 8: ER and EER
  No Questions on …
    Chapter 1 and 2
    Min/Max Notation (slides 31-33 of cse4701chap3and4.pptx)
    ER Complications (slides 45-55 of cse4701chap3and4.pptx)
      Recursive Relationship
      Multiple Relationships Between Two Entity Types
      Participation Constraints (Existence Constraints)
      Strong and Weak Entities
      Relationships Among More than Two Entity Types
      Connection Traps
      Simplification Techniques

Summary of ER-Diagram Notation
  [Figure: table of symbols and their meanings. The meanings listed are:]
    ENTITY TYPE
    WEAK ENTITY TYPE
    RELATIONSHIP TYPE
    IDENTIFYING RELATIONSHIP TYPE
    ATTRIBUTE
    KEY ATTRIBUTE
    MULTIVALUED ATTRIBUTE
    COMPOSITE ATTRIBUTE
    DERIVED ATTRIBUTE
    TOTAL PARTICIPATION OF E2 IN R
    CARDINALITY RATIO 1:N FOR E1:E2 IN R
    STRUCTURAL CONSTRAINT (min, max) ON PARTICIPATION OF E IN R

Example COMPANY Database (Cont.)
  Store Each Employee’s Social Security Number, Address, Salary, Sex, and Birthdate
  Each Employee Works for One Department but May Work on Several Projects
  We Keep Track of the Number of Hours Per Week that an Employee Currently Works on Each Project
  We Keep Track of the Direct Supervisor of Each Employee
  Each Employee May have a Number of Dependents
  For Each Dependent, We Keep Track of their Name, Sex, Birthdate, and Relationship to the Employee

ER Diagram for the Company Database
  [Figure: ER diagram for the COMPANY database]

Enhanced ER Model
  Object-Oriented Extensions to the E-R Model
  EER Concepts
    Specialization
    Attribute Inheritance
    Generalization
    Subclasses
    Superclasses
    Constraints on Specialization and Generalization
    Categorization

Enhanced ER Constructs and Notation
  [Figure: EER constructs and notation]

Specialization/Attribute Inheritance
  An Entity Type E1 is a Specialization of another Entity Type E2 if E1 has the Same Properties of E2 and Perhaps Even More.
  E1 IS-A E2
  [Figure: MANAGER IS-A EMPLOYEE; the attributes shown include Employee No, Employee Name, Title, Address, Salary, Condo, and Expense Act.]

Generalization
  [Figure: EMPLOYEE (Employee No, Employee Name, Title, Salary, Address) with a disjoint (d) set of subclasses: ENGINEER (Project, Office), SECRETARY (Specialty, Office), SALESPERSON (Region, Car)]

Constraints
  disjoint, total
  disjoint, partial
  overlapping, total
  overlapping, partial
  [Figure: notation using the d and o circle symbols, with an overlapping (o) example of Part, Manufactured_Part, and Purchased_Part; the attributes shown are PartNo, Description, M_date, BatchNo, DrawingNo, SupplierName, and ListPrice]

Total and Partial Disjoint
  [Figure: EMPLOYEE (Employee No, Employee Name, Title, Salary, Address) with two disjoint (d) specializations: HOURLY_EMP (Hourly Rate) and SALARIED_EMP (Salary); ENGINEER (Project, Office), SECRETARY (Specialty, Office), and SALESPERSON (Region, Car)]

Total Overlapping
  [Figure: PART (Part No, Part Name, QTY, WGT) with an overlapping (o) specialization into MANUFACTURED_PART (Batch No, Drawing No) and PURCHASED_PART (Price)]

Categories
  A Category is a Subclass of a Union of Two or More Entity Types.
  The Concept of Category
    Superclass Relationship with Two or More Superclasses
    A Category Can Be Total or Partial
  OWNER and REG_VEHICLE are both categories
  [Figure: OWNER is a category (u) over PERSON, BANK, and COMPANY; REG_VEHICLE is a category (u) over CAR and TRUCK; OWNER (M) OWNS (N) REG_VEHICLE]

Chapter 9: ER Model to Relational Model
  Covered Material
    Detailed Algorithm for Translation
    All Eight Steps and Their Application
    For Step 8, Focus on the Different Approaches that are Utilized to Map Different Types of EER Inheritance
  You are Responsible for All Slides in cse4701chap9.pptx
  Make sure that you Clearly Understand the Translation Process!
  No Questions on …
    Does not apply

ER-to-Relational Mapping Algorithm
  Step 1: For Each Regular Entity Type E
    Create a Relation RE
    Include only the Simple Attributes of a Composite Attribute
  Step 2: For Each Weak Entity Type W with Owner Entity Type E
    Create a Relation RW
    Include as Attributes
      All Simple Attributes of W
      Primary Key Attribute(s) of the Relation that Corresponds to W’s Owner Entity Type E

ER-to-Relational Mapping Algorithm
  Step 3: For Each 1:1 Relationship
    Identify the Relations R1 and R2
    Include as Foreign Key of One Relation the Primary Key of the Other Relation
  Step 4: For Each Regular 1:n Relationship
    Include as Foreign Key in the Entity Type at the n-side of the Relationship, the Primary Key of the Entity Type at the 1-side of the Relationship

ER-to-Relational Mapping Algorithm
  Step 5: For Each Binary n:m Relationship
    Create a New Relation, whose Attributes Include
      All Simple Attributes of the n:m Relationship as Non-key Attributes
      PKs of the Relations that Represent the Participating Entity Types, as FK Attributes in this New Relation
  Step 6: For Each Multi-valued Attribute A
    Create a New Relation R that Includes
      An Attribute Corresponding to A
      The PK Attribute of the Relation Whose Corresponding Entity Type or Relationship Has A as an Attribute

ER-to-Relational Mapping Algorithm
  Step 7: For Each n-ary Relationship R, n>2
    Create a New Relation to Represent R
  Step 8: Convert Each Specialization for Superclass C with Attributes {k, A1, …, An} (k is the PK), where C has m Subclasses {S1, ..., Sm}
    Create a Relation Si for each Subclass Entity (1 <= i <= m) with Attributes Attrs(Si) = {k} ∪ {attributes of Si}, and PK(Si) = k
    Note that the Relation for C was Created in an Earlier Step
    Note also that there are Three Other Options for Mapping Specialization Hierarchies

Transformation of EER to ...
  [Figure: EER diagram with entity types ACCOUNT, SUPPLIER, PROJECT, PART (overlapping specialization into MANUFACTURED_PART and PURCHASED_PART), and EMPLOYEE (disjoint specialization into ENGINEER, SECRETARY, and SALESPERSON), and relationship types RECORDS, SUPPLY, MANAGES, WORKS ON, and CONTAIN]

… Final Set of Relations
  EMPLOYEE(ENO, ENAME, TITLE, SALARY, APT#, STREET, CITY, PJNO, DURATION, RESP)
  PROJECT(PJNO, PNAME, BUDGET, MGR)
  SUPPLIER(SNO, SNAME, CREDIT, LOCATION)
  PART(PNO, PNAME, WGT, COLOR, MAN, PURC, BATCH#, DRAWING#, PRICE)
  ENGINEER(ENO, PROJECT, OFFICE)
  SECRETARY(ENO, OFFICE, SPECIALTY)
  SALESPERSON(ENO, CAR, REGION)
  SUPPLY(SNO, PJNO, PNO, AMOUNT, DATE)
  LOC(PJNO, LOCATION)
  CONTAIN(PNO, CPNO, QTY)
  ACCOUNT(PJNO, ACNO, INCOME, EXPENSES, BALANCE)

Chapters 15 & 16: Functional Dependencies
  Covered Topics Include …
    Four Guidelines for “Good” Design
    Update Anomalies (Insert, Delete, Modify)
    Functional Dependencies (FDs)
    Ability to Define FDs for a Relational Schema
    Normalization
      1, 2, 3NF
    Multi-Valued Dependencies
  No Questions on …
    FD Closure, Equivalence

Guideline 1
  GUIDELINE 1: Informally, Each Tuple in a Relation Should Represent One Entity or Relationship Instance (Applies to Individual Relations and their Attributes)
    Attributes of Different Entities should not be Mixed in the Same Relation
    Only FKs should be used to Refer to Other Entities
    Entity and Relationship Attributes should be Kept Apart as Much as Possible
  Bottom Line:
    Design a Schema that can be Explained Easily Relation by Relation
    The Semantics of Attributes should be Easy to Interpret

Guideline 2
  Guideline 2: Design a Schema that does not Suffer from Insertion, Deletion and Update Anomalies
    If There are any Present, then Note them so that Applications can take them into Account
  Reasons for Update Anomalies
    Dependencies Caused by One Relation used to Represent Two or More Entities/Relationships
    Redundancy Caused by Dependencies/Other Factors
  Strive for Independence (Double Edged Sword)
    Eliminates the Anomalies
    Query Performance May Suffer (e.g., Always Require Many Joins for Queries)

Redundant Information/Update Anomalies
  Mixing Attributes of Multiple Entities (see Prior Two Slides) May Cause Problems
  Key Problem: Information is Stored Redundantly
  There are Two Consequences:
    Wasting Storage
    Problems with Update Anomalies
      Insertion Anomalies - Inserting New Tuples
      Deletion Anomalies - Removing Existing Tuples
      Modification Anomalies - Changing Existing Tuples

Guideline 3
  Guideline 3: Relations should be Designed such that their Tuples will have as Few NULL Values as Possible
    Attributes that are NULL Frequently Could Be Placed in Separate Relations (With the Primary Key)
  Reasons for Null Values
    Attribute Not Applicable or Invalid
    Attribute Value Unknown (May Exist)
    Value Known to Exist, but Unavailable

Guideline 4
  Guideline 4: The Relations should be Designed to Satisfy the Lossless Join Condition
    No Spurious Tuples Should Be Generated by Doing a Natural-join of Any Relations
  Two Important Properties of Decompositions:
    a. Non-additive (Losslessness) of the Corresponding Join
    b. Preservation of the Functional Dependencies
  Property (a) is Extremely Important and Cannot Be Sacrificed
  Property (b) is Less Stringent and May Be Sacrificed

Guideline 4: Lost Information
  A First Example of Lost Information
  What is Lost in the Join of R and S?

  R = (A, B, C)
    A   B   C
    a1  b2  c1
    a2  b2  c1
    a3  b4  c2

  S = (D, C)
    D   C
    d1  c1
    d2  c2
    d4  c2
    d5  c3

  R ⋈ S (A, B, C, D)
    A   B   C   D
    a1  b2  c1  d1
    a2  b2  c1  d1
    a3  b4  c2  d2
    a3  b4  c2  d4

  Lost info.: the S tuple (d5, c3) has no match in R and does not appear in the join.

Guideline 4: Spurious Tuples
  A Second Example of Spurious Tuples
  What are Spurious in the Join of R1 and R2?

  R(A, B, C, D)
    A   B   C   D
    a1  b1  c1  d1
    a2  b2  c2  d1
    a3  b1  c1  d2
    a4  b2  c2  d3

  R1(B, D)
    B   D
    b1  d1
    b2  d1
    b1  d2
    b2  d3

  R2(A, D)
    A   D
    a1  d1
    a2  d1
    a3  d2
    a4  d3

  R1 and R2 Join (as shown on the slide)
    A   B   C   D
    a1  b1  c1  d1
    a2  b1  c2  d1
    a1  b2  c2  d1
    a2  b2  c2  d1
    a3  b1  c1  d2
    a4  b2  c2  d3

Functional Dependencies (FDs)
  FDs are used to Specify Formal Measures of the "Goodness" of Relational Designs
  FDs and Keys are used to Define Normal Forms for Relations
  FDs are Constraints that are Derived from the Meaning and Interrelationships of the Data Attributes
  A Set of Attributes X Functionally Determines a Set of Attributes Y if the Value of X Determines a Unique Value for Y
  FDs are Derived from the Real-World Constraints on the Attributes
  A Relational Schema is Relations with Keys and FDs!

Example of FDs
  Social Security Number Determines Employee Name
    SSN -> ENAME
  Project Number Determines Project Name and Location
    PNUMBER -> {PNAME, PLOCATION}
  SSN and Project Number Determine the Hours Per Week That the Employee Works on the Project
    {SSN, PNUMBER} -> HOURS
  Notes:
    An FD is a Property of Attributes in the Schema
    FDs Must Hold on Every Relation Instance of R
    If K is a Key of R, then K Functionally Determines All Attributes in R (Since we Never have Two Distinct Tuples with t1[K] = t2[K])

Inference Rules for FDs
  Given a Set of FDs F, we can Infer Additional FDs that Hold whenever the FDs in F Hold
  For Example, Consider:
    F = { SSN -> {EName, BDate, Address, DNumber},
          DNumber -> {DName, DMGRSSN} }
  What are Additional FDs?
    SSN -> EName
    SSN -> Address
    SSN -> BDate
    SSN -> SSN
    SSN -> DNumber
    DNumber -> DName
    DNumber -> DMGRSSN

Inference Rules
  Armstrong’s Inference Rules
    1. Reflexive: If X ⊇ Y, then X -> Y.
    2. Augmentation: If { X -> Y } then XZ -> YZ.
    3. Transitive: If { X -> Y, Y -> Z } then X -> Z.
  Derived Inference Rules
    4. Decomposition: If { X -> YZ } then X -> Y.
    5. Additive (Union): If { X -> Y, X -> Z } then X -> YZ.
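
The FD definition and the transitive rule above can be tried out on a small relation instance. The following is a minimal Python sketch, not part of the original slides: the function name `fd_holds` and the sample `emp_dept` tuples are hypothetical. Keep in mind that a single instance can only refute an FD; an FD is a property of the schema and must hold on every instance.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if the FD lhs -> rhs is not violated by this instance.

    rows: list of dicts (one dict per tuple); lhs, rhs: lists of attribute names.
    """
    seen = {}  # maps each lhs value combination to the rhs values first seen with it
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        # Two tuples that agree on lhs must also agree on rhs.
        if seen.setdefault(x, y) != y:
            return False
    return True


# Hypothetical sample instance (values invented for illustration only).
emp_dept = [
    {"SSN": "111", "EName": "Doe", "DNumber": 5, "DName": "Research"},
    {"SSN": "222", "EName": "Smith", "DNumber": 5, "DName": "Research"},
    {"SSN": "333", "EName": "Lee", "DNumber": 4, "DName": "Admin"},
]

results = {
    "SSN -> DNumber": fd_holds(emp_dept, ["SSN"], ["DNumber"]),
    "DNumber -> DName": fd_holds(emp_dept, ["DNumber"], ["DName"]),
    # Follows from the two FDs above by the transitive rule.
    "SSN -> DName": fd_holds(emp_dept, ["SSN"], ["DName"]),
    # Violated: department 5 appears with two different SSNs.
    "DNumber -> SSN": fd_holds(emp_dept, ["DNumber"], ["SSN"]),
}

for fd, ok in results.items():
    print(fd, "holds on this instance" if ok else "is violated")
```

On this sample data the first three FDs are not violated and the last one is, which matches the idea that a key (SSN) determines every attribute while a non-key attribute (DNumber) need not.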

Summary of 1NF, 2NF, 3NF Concepts
  1NF
    Test: Relation should have no nonatomic attributes or nested relations.
    Remedy (Normalization): Form new relations for each nonatomic attribute or nested relation.
  2NF
    Test: For relations where the primary key contains multiple attributes, no nonkey attribute should be functionally dependent on a part of the primary key.
    Remedy (Normalization): Decompose and set up a new relation for each partial key with its dependent attribute(s). Make sure to keep a relation with the original primary key and any attributes that are fully functionally dependent on it.
  3NF
    Test: Relation should not have a nonkey attribute functionally determined by another nonkey attribute (or by a set of nonkey attributes). That is, there should be no transitive dependency of a nonkey attribute on the primary key.
    Remedy (Normalization): Decompose and set up a relation that includes the nonkey attribute(s) that functionally determine(s) other nonkey attribute(s).

Comparing the Normal Forms
  [Figure: progression from a Poor Relational Schema Design through 1NF, 2NF, and 3NF to BCNF, annotated with the labels below]
  Developed as Stepping Stone
  Eliminate the non-trivial functional dependencies of non-key attributes to key
  1NF to 2NF: Eliminate partial FDs of non-key attributes to key
  2NF to 3NF: Eliminate transitive FDs of non-key attributes to key
  3NF to BCNF: Eliminate partial and transitive FDs of key attributes to key
  Most 3NF Relations are in BCNF - BCNF Eliminates All Update Anomalies

Summary of Normalization
  1NF to 2NF: Eliminate the Partial Functional Dependencies of Non-prime Attributes to Key Attributes
  2NF to 3NF: Eliminate the Transitive Functional Dependencies of Non-prime Attributes to Key Attributes
  Up to 3NF: Lossless Decomposition and Dependency Preserving
  3NF to BCNF: Eliminate the Partial and Transitive Functional Dependencies of Prime (Key) Attributes to Key
  BCNF: Lossless Decomposition but not Dependency Preserving

What are Multi-Valued Dependencies?
  Focused on the Concept of Multi-Valued Dependencies
  An MVD X ->> Y Indicates that a Value of X Corresponds to Multiple Values of Y
  Consider EMP with MVDs:
    ENAME ->> PNAME (E works on many P)
    ENAME ->> DNAME (E has many Dependents)
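
The EMP example above can be checked mechanically. The sketch below is not from the slides: it assumes the usual textbook formalization of an MVD (within each group of tuples that agree on X, every Y value must appear together with every value of the remaining attributes), and the function name `mvd_holds` and the sample tuples are hypothetical.

```python
from itertools import product


def mvd_holds(rows, attrs, x, y):
    """Return True if the MVD x ->> y is not violated by this instance.

    rows: list of dicts; attrs: all attribute names of the relation;
    x, y: lists of attribute names. z is everything not in x or y.
    """
    z = [a for a in attrs if a not in x and a not in y]

    def pick(row, names):
        return tuple(row[a] for a in names)

    present = {(pick(r, x), pick(r, y), pick(r, z)) for r in rows}

    # For each x value, collect the y values and z values seen with it.
    groups = {}
    for xv, yv, zv in present:
        ys, zs = groups.setdefault(xv, (set(), set()))
        ys.add(yv)
        zs.add(zv)

    # The MVD holds iff every (y, z) combination in a group is present.
    return all(
        (xv, yv, zv) in present
        for xv, (ys, zs) in groups.items()
        for yv, zv in product(ys, zs)
    )


# Hypothetical EMP(ENAME, PNAME, DNAME) instance: Smith works on projects
# X and Y and has dependents John and Anna.
attrs = ["ENAME", "PNAME", "DNAME"]
emp = [
    {"ENAME": "Smith", "PNAME": "X", "DNAME": "John"},
    {"ENAME": "Smith", "PNAME": "Y", "DNAME": "Anna"},
    {"ENAME": "Smith", "PNAME": "X", "DNAME": "Anna"},
    {"ENAME": "Smith", "PNAME": "Y", "DNAME": "John"},
]

full_ok = mvd_holds(emp, attrs, ["ENAME"], ["PNAME"])         # all 4 combinations present
partial_ok = mvd_holds(emp[:3], attrs, ["ENAME"], ["PNAME"])  # (Y, John) is missing
print(full_ok, partial_ok)
```

The need to store all four project/dependent combinations for one employee is the kind of redundancy that independent multi-valued facts in a single relation introduce.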