RELATIONAL DATABASE DESIGN

Basic Concepts
• a database is a collection of logically related records or files
• a relational database stores its data in two-dimensional tables
• a table is a two-dimensional structure made up of rows (tuples, records) and columns (attributes, fields)
• example: a table of students engaged in sports activities, where a student is allowed to participate in at most one activity

StudentID  Activity  Fee
100        Skiing    200
150        Swimming   50
175        Squash     50
200        Swimming   50

Table Characteristics
• each row is unique and stores data about one entity
• row order is unimportant
• each column has a unique attribute name
• each column (attribute) description (metadata) is stored in the database
• in Microsoft Access, this metadata is viewed and edited in a table's Design View, where each row describes one column
• column order is unimportant
• all entries in a column have the same data type

MIS-DB-Design 1 PA-5-Appendix

Primary Keys
• a primary key is an attribute, or a collection of attributes, whose value(s) uniquely identify each row in a relation
• a primary key must be minimal (that is, it must not contain unnecessary attributes)
• we assume that a student is allowed to participate in at most one activity
• the primary key in the one-activity table above is StudentID
• what if we allow the students to participate in more than one activity?
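The two requirements for a primary key, uniqueness and minimality, can be tested against sample rows. The sketch below is a minimal illustration in Python that assumes rows are stored as dictionaries. The helpers is_unique and is_minimal_key are hypothetical names and are not part of any DBMS. Sample rows can only show that a candidate fails. Whether StudentID really is a key depends on a business rule (one activity per student), not on the data.

```python
from itertools import combinations

# Rows of the one-activity-per-student table (StudentID, Activity, Fee).
ROWS = [
    {"StudentID": 100, "Activity": "Skiing", "Fee": 200},
    {"StudentID": 150, "Activity": "Swimming", "Fee": 50},
    {"StudentID": 175, "Activity": "Squash", "Fee": 50},
    {"StudentID": 200, "Activity": "Swimming", "Fee": 50},
]


def is_unique(rows, attrs):
    """Return True if no two rows share the same values for all attrs."""
    seen = set()
    for row in rows:
        values = tuple(row[a] for a in attrs)
        if values in seen:
            return False
        seen.add(values)
    return True


def is_minimal_key(rows, attrs):
    """Return True if attrs is unique and no proper, non-empty subset is."""
    if not is_unique(rows, attrs):
        return False
    return not any(
        is_unique(rows, subset)
        for size in range(1, len(attrs))
        for subset in combinations(attrs, size)
    )


print(is_minimal_key(ROWS, ["StudentID"]))              # True
print(is_minimal_key(ROWS, ["StudentID", "Activity"]))  # False: not minimal
print(is_unique(ROWS, ["Activity"]))                    # False: Swimming repeats
```

{StudentID, Activity} is unique but fails the minimality test, because StudentID alone already identifies each row.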
StudentID  Activity  Fee
100        Skiing    200
100        Golf       65
175        Squash     50
175        Swimming   50
200        Swimming   50
200        Golf       65

• in this table, the two attributes {StudentID, Activity} together constitute the primary key
• a multi-attribute primary key is called a concatenated key (or composite key), and its members are called key attributes

Foreign Keys
• a foreign key is an attribute, or a collection of attributes, in a relation whose values must match the values of a primary key in some (usually another) relation
• example: the STATE and CITY relations below

STATE relation:
StateAbbrev  StateName     UnionOrder  StateBird       StatePopulation
CT           Connecticut            5  American robin        3,287,116
MI           Michigan              26  robin                 9,295,297
SD           South Dakota          40  pheasant                696,004
TN           Tennessee             16  mockingbird           4,877,185
TX           Texas                 28  mockingbird          16,986,510

CITY relation:
StateAbbrev  CityName   CityPopulation
CT           Hartford          139,739
CT           Madison            14,031
CT           Portland            8,418
MI           Lansing           127,321
SD           Madison             6,257
SD           Pierre             12,906
TN           Nashville         488,374
TX           Austin            465,622
TX           Portland           12,224

• primary key in the STATE relation: StateAbbrev
• primary key in the CITY relation: {StateAbbrev, CityName}
• foreign key in the CITY relation: StateAbbrev (it references the primary key of STATE)

Alternate Database Representations
• an alternative representation of the previous database is:
• STATE = {StateAbbrev, StateName, UnionOrder, StateBird, StatePopulation}
• CITY = {StateAbbrev, CityName, CityPopulation}

Functional Dependency
• a functional dependency is a relationship among attributes
• attribute B is functionally dependent on attribute A if, given a value of attribute A, we can uniquely look up the corresponding value of attribute B
• attribute A is the determinant of attribute B if attribute B is functionally dependent on attribute A
• in the STATE relation above, StateAbbrev is a determinant of all other attributes, since specifying its value would allow us to determine the values of all other attributes uniquely by
table lookup
• in the STATE relation, the attribute StateName is also a determinant of all other attributes
• in the CITY relation above, the attributes StateAbbrev and CityName together are a determinant of the attribute CityPopulation
• in the CITY relation, the attribute CityName alone is not a determinant of CityPopulation, because several cities in the table have the same name (Madison and Portland each appear twice)

Formally, given two attributes A and B, we say that B is functionally dependent on A (written A → B) if, for any two records i and j,

$$ t_i(A) = t_j(A) \;\Rightarrow\; t_i(B) = t_j(B) $$

where $t_i(A)$ means the value of A in the ith record. Notice that the reverse is not necessarily true.
Example: CustomerName is dependent on CustomerID, but the reverse is not true, assuming that two customers can have the same name.

Dependency Diagrams
• a dependency diagram is a pictorial representation of all functional dependencies in a database
• an attribute is represented by a rectangle
• an arrow is drawn from the rectangle for attribute A to the rectangle for attribute B whenever attribute A is the determinant of attribute B
• example (students sports activity, version I: at most one activity per student): ACTIVITY = {StudentID, Activity, Fee}, primary key StudentID, data as in the first student table above
  [Dependency diagram: StudentID → Activity; Activity → Fee]
• example (students sports activity, version II: several activities per student): ACTIVITY = {StudentID, Activity, Fee}, primary key {StudentID, Activity}, data as in the multi-activity table above
  [Dependency diagram: {StudentID, Activity} → Fee; Activity → Fee]

Partial Dependencies
• a partial dependency is a functional dependency in which the determinant is only part of a (composite) primary key
• example: ACTIVITY = {StudentID, Activity, Fee} with primary key {StudentID, Activity} (version II)
• the dependency Activity → Fee is a partial dependency, because Activity is only part of the primary key
  [Dependency diagram: {StudentID, Activity} → Fee; Activity → Fee (partial)]

Transitive Dependencies
• a transitive dependency is a functional dependency in which none of the attributes is part of the primary key: a nonkey attribute depends on another nonkey attribute, and so depends on the primary key only indirectly
• example: ACTIVITY = {StudentID, Activity, Fee} with primary key StudentID (version I)
• the dependency Activity → Fee is a transitive dependency, because StudentID → Activity and Activity → Fee
  [Dependency diagram: StudentID → Activity → Fee]

Database Anomalies
• anomalies are problems caused by bad database design
• here, "problems" means undesirable irregularities that arise when rows are inserted, deleted, or updated
• example: ACTIVITY = {StudentID, Activity, Fee} (version II)

StudentID  Activity  Fee
100        Skiing    200
100        Golf       65
175        Squash     50
175        Swimming   50
200        Swimming   50
200        Golf       65

• an insertion anomaly occurs when a row cannot be added to a relation because not all of its data is available
• example: we want to store the fact that scuba diving costs $175, but we cannot enter this fact into the table until a student takes up scuba diving, because StudentID is part of the primary key
• a deletion anomaly occurs when deleting data from a relation unintentionally removes other critical data
• example: by deleting the records of StudentID 100, the fact that skiing costs $200 is lost
• an update anomaly occurs when a single fact changes but the DBMS must make more than one change to reflect it
• example: if the cost of swimming changes, every row with Activity = Swimming must be changed too

Cause of Anomalies
• anomalies are mostly caused by the following:
  • data redundancy (replication of the same field in multiple tables; repeating sections)
  • partial dependency
  • transitive dependency
• example: ACTIVITY = {StudentID, Activity, Fee}, with the version II data and dependency diagram above
• in this example, there is a partial dependency (Activity → Fee), which is the cause of all the anomalies
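The formal definition of a functional dependency translates directly into a check over the rows of a table. For each A value, record the first B value seen, and report a violation if a different B value appears later. The minimal Python sketch below applies this check to the multi-activity ACTIVITY rows. It labels a dependency as partial or transitive by comparing the determinant with the primary key. fd_holds and classify are hypothetical helper names. As with keys, sample rows can refute a dependency but cannot prove it.

```python
# Rows of the ACTIVITY table in which a student may take several activities.
ROWS = [
    {"StudentID": 100, "Activity": "Skiing", "Fee": 200},
    {"StudentID": 100, "Activity": "Golf", "Fee": 65},
    {"StudentID": 175, "Activity": "Squash", "Fee": 50},
    {"StudentID": 175, "Activity": "Swimming", "Fee": 50},
    {"StudentID": 200, "Activity": "Swimming", "Fee": 50},
    {"StudentID": 200, "Activity": "Golf", "Fee": 65},
]
KEY = ("StudentID", "Activity")


def fd_holds(rows, lhs, rhs):
    """Check lhs -> rhs: any two rows that agree on lhs must agree on rhs."""
    first_seen = {}
    for row in rows:
        left = tuple(row[a] for a in lhs)
        right = tuple(row[a] for a in rhs)
        # setdefault stores the first rhs value seen for this lhs value.
        if first_seen.setdefault(left, right) != right:
            return False
    return True


def classify(lhs, key):
    """Label a dependency whose dependent is a nonkey attribute."""
    lhs, key = set(lhs), set(key)
    if lhs < key:
        return "partial"      # determinant is a proper part of the key
    if lhs.isdisjoint(key):
        return "transitive"   # determinant contains no key attribute
    return None               # whole key (or a mix): neither kind


print(fd_holds(ROWS, ["Activity"], ["Fee"]))    # True
print(fd_holds(ROWS, ["StudentID"], ["Fee"]))   # False: 100 pays 200 and 65
print(classify(["Activity"], KEY))              # partial
print(classify(["Activity"], ["StudentID"]))    # transitive (key StudentID)
```

The same dependency, Activity → Fee, is partial when the key is {StudentID, Activity} and transitive when StudentID alone is the key. The label depends on the key, not on the data.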
• a two-table solution:
• STUDENTS = {StudentID, Activity}
• ACTIVITIES = {Activity, Fee}

STUDENTS:
StudentID  Activity
100        Skiing
100        Golf
150        Swimming
175        Squash
175        Swimming
200        Swimming
200        Golf

ACTIVITIES:
Activity     Fee
Skiing       200
Golf          65
Swimming      50
Squash        50
ScubaDiving  200

[Dependency diagrams: STUDENTS has the key {StudentID, Activity} and no nonkey attributes; ACTIVITIES: Activity → Fee]
• the above relations have none of the anomalies
• we can add the cost of scuba diving to ACTIVITIES even though no one has taken it in STUDENTS
• if StudentID 100 drops Skiing, no skiing-related data is lost
• if the cost of swimming changes, it needs to be changed in only one place (the ACTIVITIES table)
• the Activity field is replicated in the two tables, but without this replication we could not join the two tables

Good Database Design Principles
1. no redundancy
  • a field is stored in only one table, unless it happens to be a foreign key
  • replication of foreign keys is permissible, because they allow two tables to be joined together
2. no partial dependencies
  • the dependency diagram of any relation in the database should contain no partial dependencies
3. no transitive dependencies
  • the dependency diagram of any relation in the database should contain no transitive dependencies
• normalization is the process of eliminating partial and transitive dependencies
• as we normalize the relations, larger tables are split into smaller tables that share a common foreign key field
• the commonly taught normal forms (NF), from weakest to strongest, are:
  First Normal Form (1NF)
  Second Normal Form (2NF)
  Third Normal Form (3NF)
  Boyce-Codd Normal Form (BCNF)
  Fourth Normal Form (4NF)
  Fifth Normal Form (5NF)

First Normal Form
• a relation is said to be in first normal form (1NF) if it does not contain any nested relation; in other words, all attributes are atomic
• IMPORTANT NOTE: many authors call nested relations "repeating sections". This name is misleading.
A nested relation holds many different instances of the same attributes for one record; there is nothing repeating in this anomaly.
• example: the CLIENT table has nested relations (a client may own several pets):

ClientID  ClientName         VetID  VetName  PetID  PetName  PetType
2173      Barbara Hennessey  27     PetVet   1      Sam      Bird
                                             2      Hoober   Dog
                                             3      Tom      Hamster
4519      Vernon Noordsy     31     PetCare  2      Charlie  Cat
8005      Sandra Amidon      27     PetVet   1      Beefer   Dog
                                             2      Kirby    Cat
8112      Helen Wandzell     24     PetsRUs  3      Kirby    Dog

CLIENT = {ClientID, ClientName, VetID, VetName, {PetID, PetName, PetType}}

• to eliminate the nested relation, pull it out and form a new table
• be sure to include the old key (ClientID) in the new table so that you can connect the tables back together
• when a table contains no nested relations, we say that it is in first normal form
  [Dependency diagrams after 1NF: CLIENT = {ClientID, ClientName, VetID, VetName} with ClientID → ClientName, VetID and VetID → VetName (transitive); PETS = {ClientID, PetID, PetName, PetType} with {ClientID, PetID} → PetName, PetType]

Second Normal Form
• to eliminate partial dependencies, split the table again
• a table in 1NF is said to be in second normal form (2NF) if it does not contain any partial dependencies, that is, each nonkey column in the table depends on the entire key
• example: the CLIENT relations above contain no partial dependencies (CLIENT has a single-attribute key, and PetName and PetType depend on the whole key {ClientID, PetID}), so we need not do anything
• the relations still have some anomalies, caused by the transitive dependency VetID → VetName

Third Normal Form
• to eliminate transitive dependencies, split the table again
• a table in 2NF is said to be in third normal form (3NF) if it does not contain any transitive dependencies, that is, each nonkey column depends on the key, the whole key, and nothing but the key
• in 3NF, every nonkey column is determined only by the primary key (the stricter requirement that every determinant be a candidate key defines Boyce-Codd Normal Form)
• example: conversion of the CLIENT relation to 3NF:
• CLIENTS = {ClientID, ClientName, VetID}
• PETS = {ClientID, PetID, PetName, PetType}
• VETS = {VetID, VetName}
  [Dependency diagrams: ClientID → ClientName, VetID; {ClientID, PetID} → PetName, PetType; VetID → VetName]
• note that joining the three tables reproduces the CLIENT data in first normal form
• the ClientID and VetID fields are replicated, but they are both foreign keys
• example: the CLIENT database in third normal form

CLIENTS:
ClientID  ClientName         VetID
2173      Barbara Hennessey  27
4519      Vernon Noordsy     31
8005      Sandra Amidon      27
8112      Helen Wandzell     24

VETS:
VetID  VetName
27     PetVet
31     PetCare
24     PetsRUs

PETS:
ClientID  PetID  PetName  PetType
2173      1      Sam      Bird
2173      2      Hoober   Dog
2173      3      Tom      Hamster
4519      2      Charlie  Cat
8005      1      Beefer   Dog
8005      2      Kirby    Cat
8112      3      Kirby    Dog

[Figure: table relationships for the CLIENT database]
• the database consists of three types of entities, stored as distinct relations in separate tables:
  • clients (CLIENTS)
  • pets (PETS)
  • vets (VETS)
• there is no redundancy (only foreign keys are replicated)
• there are no partial or transitive dependencies

Example: Hardware Store Database
• the ORDERS table:

OrderNumb  CustCode  OrderDate  CustName  ProdDescr      ProdPrice  Quantity
10001      5217      11/22/94   Williams  Hammer         $8.99      2
10001      5217      11/22/94   Williams  Screwdriver    $4.45      1
10002      5021      11/22/94   Johnson   Clipper        $18.22     1
10002      5021      11/22/94   Johnson   Screwdriver    $4.45      3
10002      5021      11/22/94   Johnson   Crowbar        $11.07     1
10002      5021      11/22/94   Johnson   Saw            $14.99     1
10003      4118      11/22/94   Lorenzo   Hammer         $8.99      1
10004      6002      11/22/94   Kopiusko  Saw            $14.99     1
10004      6002      11/22/94   Kopiusko  Screwdriver    $4.45      2
10005      5021      11/23/94   Johnson   Cordlessdrill  $34.95     1
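The dependency test can also be written in SQL. A dependency A → B is contradicted exactly when some group of rows with equal A values contains more than one distinct B value. The minimal sketch below uses Python's built-in sqlite3 module and an in-memory copy of the ORDERS rows, with prices stored as plain numbers. The table name orders and the helper fd_violations are assumptions for illustration.

```python
import sqlite3

# ORDERS rows: OrderNumb, CustCode, OrderDate, CustName, ProdDescr, ProdPrice, Quantity
ORDERS = [
    (10001, 5217, "11/22/94", "Williams", "Hammer", 8.99, 2),
    (10001, 5217, "11/22/94", "Williams", "Screwdriver", 4.45, 1),
    (10002, 5021, "11/22/94", "Johnson", "Clipper", 18.22, 1),
    (10002, 5021, "11/22/94", "Johnson", "Screwdriver", 4.45, 3),
    (10002, 5021, "11/22/94", "Johnson", "Crowbar", 11.07, 1),
    (10002, 5021, "11/22/94", "Johnson", "Saw", 14.99, 1),
    (10003, 4118, "11/22/94", "Lorenzo", "Hammer", 8.99, 1),
    (10004, 6002, "11/22/94", "Kopiusko", "Saw", 14.99, 1),
    (10004, 6002, "11/22/94", "Kopiusko", "Screwdriver", 4.45, 2),
    (10005, 5021, "11/23/94", "Johnson", "Cordlessdrill", 34.95, 1),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (OrderNumb INTEGER, CustCode INTEGER, OrderDate TEXT,"
    " CustName TEXT, ProdDescr TEXT, ProdPrice REAL, Quantity INTEGER)"
)
conn.executemany("INSERT INTO orders VALUES (?, ?, ?, ?, ?, ?, ?)", ORDERS)


def fd_violations(conn, lhs, rhs):
    """Return the lhs values that map to more than one distinct rhs value.

    An empty list means the stored rows do not contradict lhs -> rhs.
    Column names are pasted into the SQL text, so pass trusted names only.
    """
    sql = (
        f"SELECT {lhs} FROM orders GROUP BY {lhs} "
        f"HAVING COUNT(DISTINCT {rhs}) > 1 ORDER BY {lhs}"
    )
    return conn.execute(sql).fetchall()


print(fd_violations(conn, "OrderNumb", "CustCode"))   # []
print(fd_violations(conn, "ProdDescr", "ProdPrice"))  # []
print(fd_violations(conn, "CustCode", "CustName"))    # []
print(fd_violations(conn, "CustCode", "OrderDate"))   # [(5021,)]
```

With {OrderNumb, ProdDescr} as the key, the first two empty results correspond to the partial dependencies of this table. The third corresponds to the transitive dependency CustCode → CustName. Customer 5021 placed orders on two dates, so OrderDate depends on OrderNumb, not on CustCode.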
• ORDERS = {OrderNumb, ProdDescr, CustCode, OrderDate, CustName, ProdPrice, Quantity}, with primary key {OrderNumb, ProdDescr}
• dependency diagram of the ORDERS table:
  [Dependency diagram: partial: OrderNumb → CustCode, OrderDate, CustName; partial: ProdDescr → ProdPrice; transitive: CustCode → CustName; {OrderNumb, ProdDescr} → Quantity]
• conversion of the hardware store database to 2NF:
• QUANTITY = {OrderNumb, ProdDescr, Quantity}
• PRODUCTS = {ProdDescr, ProdPrice}
• ORDERS = {OrderNumb, CustCode, OrderDate, CustName}
• dependency diagram of the hardware store database in 2NF:
  [Dependency diagram: {OrderNumb, ProdDescr} → Quantity; ProdDescr → ProdPrice; OrderNumb → CustCode, OrderDate, CustName; transitive: CustCode → CustName]
• conversion of the ORDERS relation to 3NF:
• QUANTITY = {OrderNumb, ProdDescr, OrderQuant}
• PRICE = {ProdDescr, ProdPrice}
• ORDERS = {OrderNumb, CustCode, OrderDate}
• CUSTOMERS = {CustCode, CustName}
• dependency diagram of the hardware store database in 3NF:
  [Dependency diagram: {OrderNumb, ProdDescr} → OrderQuant; ProdDescr → ProdPrice; OrderNumb → CustCode, OrderDate; CustCode → CustName]
• [Figure: table relationships for the hardware store database]

Example: Video Store Database
• the CUSTOMER relation:

CustomerID  Phone         LastName    FirstName  Address          City           State  ZipCode
1           502-666-7777  Johnson     Martha     125 Main St.     Alvaton        KY     42122
2           502-888-6464  Smith       Jack       873 Elm St.      Bowling Green  KY     42101
3           502-777-7575  Washington  Elroy      95 Easy St.      Smith's Grove  KY     42171
4           502-333-9494  Adams       Samuel     746 Brown Dr.    Alvaton        KY     42122
5           502-474-4746  Steinmetz   Susan      15 Speedway Dr.  Portland       TN     37148
…

• the RENTAL-FORM relation:

TransID  RentDate  CustomerID  VideoID  Copy#  Title                Rent
1        4/18/95   3           1        2      2001:SpaceOdyssey    $1.50
1        4/18/95   3           6        3      Clockwork Orange     $1.50
2        4/18/95   7           8        1      Hopscotch            $1.50
2        4/18/95   7           2        1      Apocalypse Now       $2.00
2        4/18/95   7           6        1      Clockwork Orange     $1.50
3        4/18/95   8           9        1      Luggage of the Gods  $2.50
…
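Splitting RENTAL-FORM into smaller tables is only safe if joining them back reproduces the original rows without losing any and without inventing new ones. The minimal Python sketch below uses the six sample rows above and the split into RENTALS, VIDEOS and VideosRented that this example arrives at. project, natural_join and as_set are hypothetical helpers. A check on sample rows illustrates a lossless decomposition but does not prove one.

```python
# RENTAL-FORM sample rows: (TransID, RentDate, CustomerID, VideoID, Copy#, Title, Rent).
FIELDS = ("TransID", "RentDate", "CustomerID", "VideoID", "Copy#", "Title", "Rent")
RENTAL_FORM = [
    dict(zip(FIELDS, values))
    for values in [
        (1, "4/18/95", 3, 1, 2, "2001:SpaceOdyssey", 1.50),
        (1, "4/18/95", 3, 6, 3, "Clockwork Orange", 1.50),
        (2, "4/18/95", 7, 8, 1, "Hopscotch", 1.50),
        (2, "4/18/95", 7, 2, 1, "Apocalypse Now", 2.00),
        (2, "4/18/95", 7, 6, 1, "Clockwork Orange", 1.50),
        (3, "4/18/95", 8, 9, 1, "Luggage of the Gods", 2.50),
    ]
]


def project(rows, attrs):
    """Relational projection: keep only attrs and drop duplicate rows."""
    unique = {tuple(row[a] for a in attrs) for row in rows}
    return [dict(zip(attrs, values)) for values in sorted(unique)]


def natural_join(left, right):
    """Combine every pair of rows that agree on all shared attributes."""
    common = set(left[0]) & set(right[0])  # assumes both inputs are non-empty
    return [
        {**l_row, **r_row}
        for l_row in left
        for r_row in right
        if all(l_row[a] == r_row[a] for a in common)
    ]


def as_set(rows):
    """Order-insensitive view of a table, for comparison."""
    return {tuple(sorted(row.items())) for row in rows}


RENTALS = project(RENTAL_FORM, ("TransID", "RentDate", "CustomerID"))
VIDEOS = project(RENTAL_FORM, ("VideoID", "Title", "Rent"))
VIDEOS_RENTED = project(RENTAL_FORM, ("TransID", "VideoID", "Copy#"))

rejoined = natural_join(natural_join(VIDEOS_RENTED, RENTALS), VIDEOS)
print(len(RENTALS), len(VIDEOS), len(VIDEOS_RENTED))  # 3 5 6
print(as_set(rejoined) == as_set(RENTAL_FORM))        # True
```

The round trip works because TransID → RentDate, CustomerID and VideoID → Title, Rent hold in the data. Suppose one VideoID carried two different titles, for example a misspelled one. VIDEOS would then hold two rows for that VideoID, and the join would produce rows that were never in RENTAL-FORM.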
• a customer can rent multiple videos as part of the same transaction
• however, the VideoID values will be different for each video in a transaction, so {TransID, VideoID} identifies each row
• multiple copies of the same video are allowed
• the Copy# field stores the number of the copy
• the rental fee depends on the title, not on the day
• the database still contains some anomalies
• relations for the video store database:
• CUSTOMER = {CustomerID, Phone, Name, Address, City, State, ZipCode}
• RENTAL-FORM = {TransID, RentDate, CustomerID, VideoID, Copy#, Title, Rent}
• dependency diagram for the video store database:
  [Dependency diagram: CustomerID → Phone, Name, Address, City, State, ZipCode; TransID → RentDate, CustomerID; VideoID → Title, Rent; {TransID, VideoID} → Copy#]
• video store database after eliminating partial dependencies:
• CUSTOMER = {CustomerID, Phone, Name, Address, City, State, ZipCode}
• RENTALS = {TransID, RentDate, CustomerID}
• VIDEOS = {VideoID, Title, Rent}
• VideosRented = {TransID, VideoID, Copy#}
• dependency diagram for the video store database in 3NF:
  [Dependency diagram: CustomerID → Phone, Name, Address, City, State, ZipCode; TransID → RentDate, CustomerID; VideoID → Title, Rent; {TransID, VideoID} → Copy#]
• [Figure: table relationships for the video store database]

Summary of Guidelines for Database Design
• identify the entities involved in the database
• identify the fields relevant to each entity and define the corresponding relations
• determine the primary key of each relation
• avoid data redundancy, but keep some common fields so that tables can be joined together
• ensure that all the required database processing can be done using the defined relations
• normalize the relations by splitting them into smaller ones
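The guideline to ensure that all required processing can still be done with the defined relations can be checked on the 3NF video store design. The idea is to declare its keys to the DBMS and confirm that a join still produces the original rental form. The minimal sketch below uses Python's built-in sqlite3 module. Several details are assumptions: the column types, the rename of Copy# to CopyNo (to avoid quoting), and the single Name column holding "Elroy Washington". Only transaction 1 from the sample data is loaded.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript(
    """
    CREATE TABLE CUSTOMER (
        CustomerID INTEGER PRIMARY KEY,
        Phone TEXT, Name TEXT, Address TEXT, City TEXT, State TEXT, ZipCode TEXT
    );
    CREATE TABLE RENTALS (
        TransID INTEGER PRIMARY KEY,
        RentDate TEXT,
        CustomerID INTEGER REFERENCES CUSTOMER (CustomerID)
    );
    CREATE TABLE VIDEOS (
        VideoID INTEGER PRIMARY KEY,
        Title TEXT,
        Rent REAL
    );
    CREATE TABLE VideosRented (
        TransID INTEGER REFERENCES RENTALS (TransID),
        VideoID INTEGER REFERENCES VIDEOS (VideoID),
        CopyNo INTEGER,  -- Copy# in the relation above
        PRIMARY KEY (TransID, VideoID)
    );
    """
)

# Transaction 1 from the sample data: customer 3 rents videos 1 and 6.
conn.execute(
    "INSERT INTO CUSTOMER VALUES (?, ?, ?, ?, ?, ?, ?)",
    (3, "502-777-7575", "Elroy Washington", "95 Easy St.", "Smith's Grove", "KY", "42171"),
)
conn.executemany(
    "INSERT INTO VIDEOS VALUES (?, ?, ?)",
    [(1, "2001:SpaceOdyssey", 1.50), (6, "Clockwork Orange", 1.50)],
)
conn.execute("INSERT INTO RENTALS VALUES (1, '4/18/95', 3)")
conn.executemany("INSERT INTO VideosRented VALUES (?, ?, ?)", [(1, 1, 2), (1, 6, 3)])

# Joining the 3NF tables rebuilds the rental form for transaction 1.
form = conn.execute(
    """
    SELECT r.TransID, r.RentDate, c.Name, v.Title, vr.CopyNo, v.Rent
    FROM RENTALS AS r
    JOIN CUSTOMER AS c ON c.CustomerID = r.CustomerID
    JOIN VideosRented AS vr ON vr.TransID = r.TransID
    JOIN VIDEOS AS v ON v.VideoID = vr.VideoID
    WHERE r.TransID = 1
    ORDER BY vr.VideoID
    """
).fetchall()
total = sum(row[5] for row in form)

# A rental for a customer who does not exist is rejected by the foreign key.
try:
    conn.execute("INSERT INTO RENTALS VALUES (99, '4/18/95', 42)")
    fk_enforced = False
except sqlite3.IntegrityError:
    fk_enforced = True

for row in form:
    print(row)
print(total, fk_enforced)
```

The foreign keys are what keep the replicated fields (CustomerID, TransID, VideoID) consistent across tables. The join shows that no information needed for the rental form was lost by normalizing.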