Functional Dependency
CS157a Sec. 2
Koichiro Hongo

What is Functional Dependency?
● A functional dependency is a constraint between two sets of attributes in a relation from a database.
● An attribute or set of attributes X is said to functionally determine another attribute or set of attributes Y (written X → Y) if and only if each X value is associated with at most one Y value. Customarily we call X the determinant set and Y the dependent set.
● So if we are given the value of X, we can determine the value of Y.

Why is Functional Dependency Important?
● The determination of functional dependencies is an important part of designing databases in the relational model, and of database normalization and denormalization.
● The functional dependencies, along with the attribute domains, are selected so as to generate constraints that exclude as much data inappropriate to the user domain from the system as possible.

Functional Dependency Inference Rules
Reflexivity: If B is a subset of A, then A functionally determines B.
For example: {name, location} → {name}

Augmentation: If B is a subset of A and C functionally determines D, then A and C together functionally determine B and D. Setting B = A gives the more familiar form: if C → D, then AC → AD.
For example: {name, location, birthdate, time} → {name, age} (as {name} is a subset of {name, location} and {birthdate, time} functionally determines {age})

Transitivity: If A functionally determines B and B functionally determines C, then A functionally determines C.
For example: {name, location} → {initials} (as {name, location} functionally determines {name} and {name} functionally determines {initials})

Pseudo-transitivity: If A functionally determines B, and B and C together functionally determine D, then A and C together functionally determine D.
Union: If A functionally determines B and A functionally determines C, then A functionally determines B and C together.
For example: {name, location, birthdate, time} → {initials, age} (as {name, location, birthdate, time} → {initials} and {name, location, birthdate, time} → {age})

Decomposition: If A functionally determines B and C together, then A functionally determines B and A functionally determines C.
For example: {name, location, birthdate, time} → {initials, age} implies that {name, location, birthdate, time} → {initials} and {name, location, birthdate, time} → {age}

Trivial Functional Dependencies
Some functional dependencies are said to be trivial because they are satisfied by all relations.
For example: A → A, and X → Y whenever Y is a subset of X.

Keys and Functional Dependencies
● Keys and, more generally, functional dependencies are constraints on the database that require relations to satisfy certain properties. Relations that satisfy all such constraints are legal relations.

What is Superkey? (review)
A superkey is defined in the relational model as a set of attributes of a relation such that, in every instance of the relation, no two distinct tuples have the same values for the attributes in this set. Equivalently, a superkey is a set of attributes of a relation on which all attributes of the relation are functionally dependent.

Keys and Functional Dependencies (cont.)
Functional dependencies allow us to express constraints that we cannot express with superkeys. Let's consider the schema of the example in the textbook (p. 265), Figure 7.2.

Figure 7.2 Partial list of tuples in relations loan, borrower, and bor_loan

loan
loan_number   amount
...           ...
L-100         10000
...           ...

borrower
customer_id   loan_number
...           ...
23-652        L-100
15-202        L-100
23-521        L-100
...           ...

bor_loan
customer_id   loan_number   amount
...           ...           ...
23-652        L-100         10000
15-202        L-100         10000
23-521        L-100         10000
...           ...           ...

In Figure 7.2, we consider the schema bor_loan = (customer_id, loan_number, amount), in which the functional dependency loan_number → amount holds because for each loan (identified by loan_number) there is a unique amount. We denote the fact that the pair of attributes (customer_id, loan_number) forms a superkey for bor_loan by writing:
customer_id, loan_number → customer_id, loan_number, amount
or, equivalently,
customer_id, loan_number → bor_loan

We shall use functional dependencies in two ways:
1. To test relations to see whether they are legal under a given set of functional dependencies. If a relation r is legal under a set F of functional dependencies, we say that r satisfies F.
2. To specify constraints on the set of legal relations. We shall thus concern ourselves with only those relations that satisfy a given set of functional dependencies. If we wish to constrain ourselves to relations on schema R that satisfy a set F of functional dependencies, we say that F holds on R.

Database Normalization
● Database normalization relates to the level of redundancy in a relational database's structure. The key idea is to reduce the chance of having multiple different versions of the same data, like an address, by storing all potentially duplicated data in different tables and linking to them instead of using a copy. Updating the address in one place then changes it everywhere the address is used.
● Well-normalized databases have a schema that reflects the true dependencies between tracked quantities. This means that updates can be performed quickly with little risk of data becoming inconsistent.

History of Database Normalization
● Edgar F. Codd was a British computer scientist who made seminal contributions to the theory of relational databases.
● While working for IBM, he created the relational model for database management.
● Codd first proposed the process of normalization, and what came to be known as the first normal form, in his paper "A Relational Model of Data for Large Shared Data Banks," Communications of the ACM, Vol. 13, No. 6, June 1970, pp. 377-387.

Normal Forms
● In the relational model, formal methods exist for quantifying "how normalized" a database is. These classifications are called normal forms, and there are algorithms for converting a given database between them.
● Edgar F. Codd originally established three normal forms: 1NF, 2NF and 3NF. There are now others that are generally accepted, but 3NF is widely considered to be sufficient for many practical applications. Most tables that reach 3NF are also in BCNF.

1NF (First Normal Form)
In the relational model, we formalize the idea that attributes do not have any substructure. A domain is atomic if elements of the domain are considered to be indivisible units. We say that a relation schema R is in first normal form if the domains of all attributes of R are atomic. In other words, a relation schema R is in 1NF if there are no multivalued attributes.

1NF (First Normal Form) (cont.)
To understand first normal form (1NF), consider these two examples of things you might know:
"What is your favorite color?"
"What food will you not eat?"
The difference between these two examples is that you can have only one favorite color, but there is very little limit on the number of foods you might not eat. Data that has a single value, such as a person's favorite color, is inherently in first normal form. Storing such data has changed little since Codd wrote and needs no further explanation here. Data that has multiple values must be stored differently.

2NF (Second Normal Form)
Second normal form (2NF) prescribes full functional dependency on the primary key. It most commonly applies to tables that have composite primary keys, where two or more attributes comprise the primary key.
2NF requires that there are no non-trivial functional dependencies of a non-key attribute on a part (proper subset) of a candidate key. A table is said to be in 2NF if and only if it is in 1NF and every non-key attribute is irreducibly dependent on the primary key.

2NF (Second Normal Form) (cont.)
How about this example?
Example: Machine_parts(part_number, supplier_name, price, supplier_address)
In this case, with (part_number, supplier_name) as the primary key, price is fully dependent on the primary key (different suppliers may charge different prices for the same part). However, supplier_address is only partially dependent on it, because it depends on supplier_name alone. Therefore, this table is not in 2NF.

3NF (Third Normal Form)
Third normal form (3NF) requires that there are no non-trivial functional dependencies of non-key attributes on something other than a superset of a candidate key. A table is in 3NF if it is in 2NF and none of the non-primary-key attributes is a fact about any other non-primary-key attribute. In summary, all non-key attributes are mutually independent (i.e., there should be no transitive dependencies).

3NF (Third Normal Form) (cont.)
How about this example?
Example: Machine_parts(part_number, supplier_name, supplier_address)
In this case, supplier_address is dependent on supplier_name. Assuming part_number is the key, this gives the chain part_number → supplier_name → supplier_address, so there is a transitive dependency in the table. It is therefore not in 3NF.

BCNF (Boyce-Codd Normal Form)
Boyce-Codd normal form (BCNF) requires that there are no non-trivial functional dependencies of attributes on something other than a superset of a candidate key. At this stage, all attributes are dependent on a key, the whole key, and nothing but the key (excluding trivial dependencies, like A → A). A table is said to be in BCNF if and only if it is in 3NF and every non-trivial, left-irreducible functional dependency has a candidate key as its determinant. In more informal terms, a table is in BCNF if it is in 3NF and the only determinants are the candidate keys.

Works Cited:
Silberschatz, Avi, Hank Korth, and S. Sudarshan. Database System Concepts, Fifth Edition. New York: The McGraw-Hill Companies, Inc., 2005. 263-309.
Wikipedia:
<http://en.wikipedia.org/wiki/Functional_dependency>
<http://en.wikipedia.org/wiki/Database_normalization>
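
Worked sketch (Python)
The main definitions in these slides can be tried out on the bor_loan tuples from Figure 7.2: testing whether a relation instance satisfies a functional dependency, computing which attributes a set determines, testing for a superkey, and listing dependencies that break the BCNF condition. The following is a minimal sketch and not code from the textbook or the original slides. The function names fd_holds, closure, is_superkey and bcnf_violations are hypothetical, and each dependency is represented as an assumed (determinant, dependent) pair of tuples.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if, in these rows, each lhs value has only one rhs value."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # setdefault stores the first value seen for this key and returns it.
        if seen.setdefault(key, value) != value:
            return False
    return True


def closure(attrs, fds):
    """Return every attribute functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If we already have the determinant, we gain the dependent set.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def is_superkey(attrs, schema, fds):
    """A superkey functionally determines every attribute of the schema."""
    return closure(attrs, fds) >= set(schema)


def bcnf_violations(schema, fds):
    """List the non-trivial FDs whose determinant is not a superkey."""
    return [
        (lhs, rhs)
        for lhs, rhs in fds
        if not set(rhs) <= set(lhs) and not is_superkey(lhs, schema, fds)
    ]


# The three bor_loan tuples shown in Figure 7.2.
bor_loan_rows = [
    {"customer_id": "23-652", "loan_number": "L-100", "amount": 10000},
    {"customer_id": "15-202", "loan_number": "L-100", "amount": 10000},
    {"customer_id": "23-521", "loan_number": "L-100", "amount": 10000},
]
bor_loan_schema = ("customer_id", "loan_number", "amount")
bor_loan_fds = [(("loan_number",), ("amount",))]  # loan_number -> amount

print(fd_holds(bor_loan_rows, ["loan_number"], ["amount"]))       # True
print(fd_holds(bor_loan_rows, ["loan_number"], ["customer_id"]))  # False
print(is_superkey(["customer_id", "loan_number"],
                  bor_loan_schema, bor_loan_fds))                 # True
print(is_superkey(["loan_number"], bor_loan_schema, bor_loan_fds))  # False
print(bcnf_violations(bor_loan_schema, bor_loan_fds))
```

Note that fd_holds only reports whether one particular instance satisfies a dependency. A dependency that holds on a schema is a design decision about all legal instances, so a True result here is consistent with the dependency but does not establish it. The last line reports loan_number → amount as a violation, since loan_number alone is not a superkey of bor_loan.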