Lecture 14
Functional Dependency

Designing a database for an enterprise
• Objective
  – Create an accurate representation of the data, the relationships between the data, and the constraints on the data that are pertinent to the enterprise.
• Techniques
  – Entity-Relationship (ER) Modeling
  – Normalization

Informal Design Guidelines for Relational Databases
• Relational database design: the grouping of attributes to form "good" relation schemas
• Two levels of relation schemas:
  – The logical "user view" level
  – The storage "base relation" level
• Design is concerned mainly with base relations.
• Criteria for "good" base relations:
  – Discuss informal guidelines for good relational design
  – Discuss formal concepts of functional dependencies and normal forms (1NF, 2NF, 3NF, BCNF)

Semantics of the Relation Attributes
• Each tuple in a relation should represent one entity or relationship instance
  – Only foreign keys should be used to refer to other entities
  – Entity and relationship attributes should be kept apart as much as possible
  – Design a schema that can be explained easily, relation by relation. The semantics of attributes should be easy to interpret.

Example tables suffering from poor design by mixing attributes from distinct real-world entities.

Redundant Information in Tuples and Update Anomalies
• Mixing attributes of multiple entities may cause problems
  – Information is stored redundantly, wasting storage
• Problems with update anomalies:
  – Insertion anomalies
  – Deletion anomalies
  – Modification anomalies

Normalization Problem
• Consider designing a table EMP_DEPT with the attributes ENAME, SSN, BDATE, ADDRESS, DNUMBER, DNAME, DMGRSSN.
• What's wrong with just using a single entity class (or table)?
  EMP_DEPT(ENAME, SSN, BDATE, ADDRESS, DNUMBER, DNAME, DMGRSSN)

Answer: What's Wrong …
• Redundancy: DNAME and DMGRSSN are repeated for every employee of a department
• Extra work: if we change the name of a department, we have to do it in multiple places
• Anomalies: DNUMBER could be changed without changing DNAME, or vice versa
• Too many NULLs: if an employee is unassigned, DNUMBER, DNAME and DMGRSSN would all be NULL.

[Figure: example table with the repeated values marked "redundancy"]

Redundancy: DNUMBER → {DNAME, DMGRSSN}
  – Entities with the same value for DNUMBER have the same value for DNAME and DMGRSSN
  – Including DNAME and DMGRSSN in the entity class is redundant, since they can be derived from DNUMBER
• Redundancy causes duplicate work
• Suppose the company wants to change DNUMBER 5 to be the Sales department. That change must be made in the tuple of every employee in department 5.

Redundancy and Anomaly
• Redundancy can cause anomalies (inconsistencies) if modifications are not done carefully
• Update Anomaly:
  – Updating a value in a single cell can make the database inconsistent
• Insertion Anomaly:
  – Adding an entity can make the database inconsistent
• Deletion Anomaly:
  – Deleting some information can make the database inconsistent or cause unintended loss of information

EXAMPLE OF AN UPDATE ANOMALY
• Changing the manager of department 5 from 333445555 to 999999999 requires the update to be made in the tuple of every employee who works in that department; otherwise the database becomes inconsistent.

EXAMPLE OF AN INSERT ANOMALY

  ENAME        SSN        BDATE       ADDRESS               DNUMBER  DNAME  DMGRSSN
  NULL         NULL       NULL        NULL                  6        Sales  999999999
  James, Bill  111111111  1980-03-12  333 Sims Av, Chicago  NULL     NULL   NULL

• Cannot insert a new department unless an employee is assigned to it, or NULL values are inserted in the employee attributes for that department.
• Conversely, cannot insert a new employee unless he/she is assigned to a department, or NULL values are inserted in the department attributes if the employee does not work for a department yet.

EXAMPLE OF A DELETE ANOMALY
• When a department is deleted, all the employees who work in that department are deleted with it.
• Alternatively, if an employee is the sole employee in a department, deleting that employee also deletes the corresponding department.
  – The information concerning that department is lost from the database.
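
The update anomaly described above can be reproduced in a few lines. The following is a minimal sketch, assuming the single-table EMP_DEPT design is held as a list of Python dictionaries; the employee names, employee SSNs and department 4 are made up for illustration, and the helper names `managers_per_department` and `inconsistent_departments` are hypothetical. It flags any department that is stored with more than one manager, i.e. any violation of DNUMBER → DMGRSSN.

```python
# Hypothetical EMP_DEPT rows (BDATE and ADDRESS omitted for brevity).
# Employee names/SSNs and department 4 are invented for this sketch.
emp_dept = [
    {"ENAME": "Alice", "SSN": "100000001", "DNUMBER": 5,
     "DNAME": "Research", "DMGRSSN": "333445555"},
    {"ENAME": "Bob", "SSN": "100000002", "DNUMBER": 5,
     "DNAME": "Research", "DMGRSSN": "333445555"},
    {"ENAME": "Carol", "SSN": "100000003", "DNUMBER": 4,
     "DNAME": "Administration", "DMGRSSN": "987654321"},
]


def managers_per_department(rows):
    """Map each DNUMBER to the set of DMGRSSN values stored for it."""
    managers = {}
    for row in rows:
        managers.setdefault(row["DNUMBER"], set()).add(row["DMGRSSN"])
    return managers


def inconsistent_departments(rows):
    """Return the departments recorded with more than one manager."""
    found = managers_per_department(rows)
    return sorted(d for d, mgrs in found.items() if len(mgrs) > 1)


before = inconsistent_departments(emp_dept)

# Careless update: change the manager of department 5 in one tuple only.
emp_dept[0]["DMGRSSN"] = "999999999"
after_partial = inconsistent_departments(emp_dept)

# Careful update: change it in every tuple of department 5.
for row in emp_dept:
    if row["DNUMBER"] == 5:
        row["DMGRSSN"] = "999999999"
after_full = inconsistent_departments(emp_dept)

print(before, after_partial, after_full)
```

In a design where the manager is stored once per department, the careless case cannot arise, because there is only one cell to update.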
Anomalies and good design
• Ideally, design a schema that does not suffer from insertion, deletion and update anomalies.
• Normalization is a process that can often be used to arrive at such schemas.

Null Values in Tuples
• Relations should be designed so that their tuples have as few NULL values as possible
• The NULL value means 'no data' and is different from values such as 0 for numeric types or the empty string for string types.
• Attributes that are frequently NULL could be placed in separate relations (with the primary key)
• NULLs can have multiple interpretations:
  – Attribute not applicable or invalid
  – Value unknown
  – Value known but absent

Decomposing Relations
• The problems we came across in the example can be eliminated by splitting (decomposing) the relation schema into multiple relations.
  – Care should be taken not to lose information in the process; a decomposition that loses information is called a Lossy Decomposition
  – We should achieve a Lossless Decomposition

Illustration of lossy decomposition
• The original database schema is as follows:
  EMP_DEPT(ENAME, SSN, BDATE, ADDRESS, DNUMBER, DNAME, DMGRSSN)
• Suppose the relation is decomposed into
  DEPARTMENT(DNUMBER, DNAME, DMGRSSN)
  EMPLOYEE(SSN, ENAME, BDATE, ADDRESS)
• This design
  – eliminates the need for NULL values
  – gets rid of the types of anomalies we have come across
• BUT we have LOST the information about the relationship between an EMPLOYEE and the DEPARTMENT in which the employee works.

Lossless Decomposition
• To prevent the LOSS of information, the solution is to place a foreign key in one of the relations.
• A foreign key is a set of attributes in one relation that refers to the primary key of another relation.
• The solution:
  DEPARTMENT(DNUMBER, DNAME, DMGRSSN)
  EMPLOYEE(SSN, DNUMBER, ENAME, BDATE, ADDRESS)
• What happens when SSN is the foreign key in DEPARTMENT instead of DNUMBER being the foreign key in EMPLOYEE?
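
The difference between the two decompositions can be checked mechanically: project the original relation onto each new schema, join the pieces back together, and compare the result with the original. The following is a minimal sketch under that idea; the sample rows, the helper names `project`, `natural_join` and `same_relation`, and the omission of BDATE and ADDRESS are assumptions made for illustration.

```python
# Hypothetical EMP_DEPT rows (BDATE and ADDRESS omitted for brevity).
emp_dept = [
    {"SSN": "100000001", "ENAME": "Alice", "DNUMBER": 5,
     "DNAME": "Research", "DMGRSSN": "333445555"},
    {"SSN": "100000002", "ENAME": "Bob", "DNUMBER": 5,
     "DNAME": "Research", "DMGRSSN": "333445555"},
    {"SSN": "100000003", "ENAME": "Carol", "DNUMBER": 4,
     "DNAME": "Administration", "DMGRSSN": "987654321"},
]


def project(rows, attrs):
    """Project rows onto attrs, removing duplicate tuples."""
    result = []
    for row in rows:
        projected = {a: row[a] for a in attrs}
        if projected not in result:
            result.append(projected)
    return result


def natural_join(left, right):
    """Join on all shared attributes (a cross product if none are shared)."""
    joined = []
    for l_row in left:
        for r_row in right:
            common = set(l_row) & set(r_row)
            if all(l_row[a] == r_row[a] for a in common):
                joined.append({**l_row, **r_row})
    return joined


def same_relation(a, b):
    """Compare two relations as sets of tuples, ignoring row order."""
    def canon(rows):
        return {tuple(sorted(r.items())) for r in rows}
    return canon(a) == canon(b)


department = project(emp_dept, ["DNUMBER", "DNAME", "DMGRSSN"])

# Decomposition without a foreign key: nothing links the two relations.
employee_no_fk = project(emp_dept, ["SSN", "ENAME"])
lossy_join = natural_join(employee_no_fk, department)

# Decomposition with DNUMBER kept in EMPLOYEE as a foreign key.
employee_fk = project(emp_dept, ["SSN", "DNUMBER", "ENAME"])
lossless_join = natural_join(employee_fk, department)

print(len(lossy_join), same_relation(lossy_join, emp_dept))
print(len(lossless_join), same_relation(lossless_join, emp_dept))
```

Without the foreign key the join pairs every employee with every department, so the original assignment cannot be recovered; with DNUMBER in EMPLOYEE the join returns exactly the original tuples.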
Candidate Keys of a relation
• A candidate key for a relation is a set of its attributes that satisfies:
  – Uniqueness: the values of the attributes uniquely identify a tuple.
  – Minimality: no proper subset of the attributes has the uniqueness property.
• If uniqueness is satisfied (but not necessarily minimality), the attributes are said to form a superkey.

Candidate Keys of a relation: Examples
• Example: Employee table
• Key: {SSN}
• Candidate key: {SSN}
• Superkeys: {SSN}, {SSN, ENAME}, {SSN, ENAME, BDATE}
• {SSN, ENAME} and {SSN, ENAME, BDATE} are superkeys but not candidate keys, because they are not minimal.

Candidate Keys vs Primary Keys
• Note that the concept of a candidate key is defined with respect to the relation (schema), and not with respect to any particular instance of the relation.
• The primary key of a relation in a DBMS should be a candidate key, but there could be several candidate keys to choose from. When talking about normalization, it is irrelevant which key is chosen as the primary key.

Functional Dependencies
• Functional dependencies (FDs) are used to specify formal measures of the "goodness" of relational designs
• FDs and keys are used to define normal forms for relations
• FDs are constraints that are derived from the meaning and interrelationships of the data attributes
• FDs specify which attributes in a table or entity class are determined by other attributes

Simple Functional Dependencies (FDs)
• Dependencies among attributes: A → B
  – A functionally determines B
  – B functionally depends on A
  – The value of A uniquely determines a single value for B
  – If two or more tuples (of a specific table or entity class) have the same value for A, they have the same value for B (e.g. every employee with the same value for DNUMBER, say 5, has the same value for DNAME, say Research)

FDs for a Normalized Example
EMPLOYEE(SSN, ENAME, BDATE, ADDRESS)
  SSN → ENAME
  SSN → BDATE
  SSN → ADDRESS
• SSN can be used to look up (and therefore uniquely determines) all the other attributes of an Employee tuple
• This can also be written as
  SSN → ENAME, BDATE, ADDRESS
  or
  SSN → {ENAME, BDATE, ADDRESS}
• Also SSN → SSN (this is a trivial FD, which we usually don't write)

An Example Functional Dependency

Simple Candidate Keys
• A simple candidate key is any single attribute of an entity class or table which uniquely identifies a tuple (UNIQUE and NOT NULL)
  Employee(EMPNO, SSN, ENAME, BDATE, ADDRESS)
• Both EMPNO and SSN uniquely identify an employee, so both are simple candidate keys
• A designer chooses the primary key from among the candidate keys

Determinants & Dependents
  empno → addr
• empno is the determinant and addr is the dependent
• In a simple FD, the determinant is a single attribute

Example tables suffering from poor design by mixing attributes from distinct real-world entities.

Full Functional Dependency
• A full functional dependency X → Y is a functional dependency where the removal of any attribute from X makes X → Y invalid.
• For example, if we have a primary key {EMPNO, SSN} and we define a full FD based on it, then
  {EMPNO, SSN} → {ENAME, BDATE, ADDRESS}
• If either EMPNO or SSN is removed, the FD no longer holds (i.e. it becomes invalid), because both are required for the primary key.
• This example assumes that neither EMPNO nor SSN identifies an employee on its own. If either one were itself a candidate key, as in the Simple Candidate Keys example, the FD would still hold after removing the other attribute and so would not be full.

Inference rules
• (Reflexive): If Y is a subset of X, then X → Y
• (Augmentation): If X → Y, then XZ → YZ
  – (Notation: XZ stands for X ∪ Z)
• (Transitive): If X → Y and Y → Z, then X → Z
• (Decomposition): If X → YZ, then X → Y and X → Z
• (Union): If X → Y and X → Z, then X → YZ
• (Pseudotransitive): If X → Y and WY → Z, then WX → Z

Inference rules for functional dependencies: Example
• Given
  F = { SSN → {ENAME, BDATE, ADDRESS, DNUMBER},
        DNUMBER → {DNAME, DMGRSSN} }
  we can infer
  SSN → {DNAME, DMGRSSN}
  SSN → SSN
  DNUMBER → DNUMBER

Question 1
Examine Table 1 shown below. This table represents the hours worked per week for temporary staff at each branch of a company.
a) Describe the concept of functional dependency.
b) Identify the functional dependencies represented by the data shown in Table 1. State any assumptions you make about the data (if necessary).
c) Table 1 is susceptible to anomalies. Provide examples of how such anomalies could occur on this table.

References
• Fundamentals of Database Systems, by Elmasri and Navathe (chapter 10)
• http://www.openlineconsult.com/db
• Database Systems, by Carolyn E. Begg and Thomas M. Connolly (chapters 13 and 14)
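
Worked example: checking inferred FDs

Returning to the inference rules example, one way to check whether an FD follows from a given set F is to compute the closure of its left-hand side, i.e. every attribute that the inference rules allow us to derive from it. The attribute-closure procedure is a standard textbook technique that the slides do not spell out, so the following is only a minimal sketch; the function names `closure` and `implies` are hypothetical.

```python
def closure(attrs, fds):
    """Return all attributes functionally determined by attrs under fds.

    fds is a list of (lhs, rhs) pairs, each side a set of attribute names.
    """
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If we already have the whole left side, we gain the right side.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def implies(fds, lhs, rhs):
    """True if lhs -> rhs can be inferred from fds."""
    return set(rhs) <= closure(lhs, fds)


# F from the inference rules example.
F = [
    ({"SSN"}, {"ENAME", "BDATE", "ADDRESS", "DNUMBER"}),
    ({"DNUMBER"}, {"DNAME", "DMGRSSN"}),
]

print(implies(F, {"SSN"}, {"DNAME", "DMGRSSN"}))  # transitive rule
print(implies(F, {"SSN"}, {"SSN"}))               # reflexive rule
print(implies(F, {"DNUMBER"}, {"SSN"}))           # does not follow from F
print(sorted(closure({"DNUMBER"}, F)))
```

The same closure can be used to test for a superkey: a set of attributes is a superkey exactly when its closure contains every attribute of the relation.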