Database Management Systems & Programming LIS 558 - Week 5 ER Model Transformation Normalization Faculty of Information & Media Studies Summer 2000 Class Outline E-R Transformation E-R Transformation Exercises Break Normalization Normalization Exercises Steps to E-R ModelTransformation 1. Identify entities 2. Identify relationships 3. Determine relationship type 4. Determine level of participation 5. Assign an identifier for each entity 6. Draw completed E-R diagram 7. Deduce a set of preliminary skeleton tables along with a proposed primary key for each table (using cases provided) 8. Develop a list of all attributes of interest (not already listed and systematically assign each to a table in such a way to achieve a 3NF design (i.e., no repeating groups, no partial dependencies, and no transitive dependencies) Transforming an E-R Model General Rules Governing Relationships among Tables 1. All primary keys must be defined as NOT NULL. 2. Define all foreign keys to conform to the following requirements for binary relationships. – 1:M Relationship – M:N Relationship – 1:1 Relationship – Weak Entity Transforming an E-R Model 1:M Relationships • Create the foreign key by putting the primary key of the “one” (parent) in the table of the “many” (dependent). • Foreign Key Rules: Null On Delete On Update If both sides are MANDATORY NOT NULL RESTRICT CASCADE If both sides are OPTIONAL NULL ALLOWED SET NULL CASCADE If one side is OPTIONAL and the other MANDATORY NULL ALLOWED SET NULL or RESTRICT CASCADE Transforming an E-R Model Weak Entity • Put the key of the parent table (strong entity) in the weak entity. • The weak entity relationship conforms to the same rules as the 1:M relationship, except foreign key restrictions: NOT NULL ON DELETE CASCADE ON UPDATE CASCADE M:N Relationship • Convert the M:N relationship to a composite (bridge) entity consisting of (at least) the parent tables’ primary keys. Transforming an E-R Model 1:1 Relationships • If both entities are in mandatory participation in the relationship and they do not participate in other relationships, it is most likely that the two entities should be part of the same entity. Transforming an E-R Model Case 1: M:N, Both sides MANDATORY Transforming an E-R Model Case 2: M:N, Both sides OPTIONAL Transforming an E-R Model Case 3: M:N, One side OPTIONAL Transforming an E-R Model Cases 1-3: M:N 1 M PATIENT N 1 prescribed DRUG PATIENT (PATIENT_ID, PATIENT_LNAME, PATIENT_PHYSICIAN,...) DRUG (DRUG_ID, DRUG_NAME, DRUG_MANUFACTURER, ...) PRESCRIBE(PATIENT_ID, DRUG_ID, DOSAGE, DATE…) NOTE: The relationship may have its own attributes. Example of decomposing entities with a binary M:N relationship Students:Classes have an M:N relationship, therefore, decompose to three tables. bridge table Transforming an E-R Model Case 4: 1:M, Both sides MANDATORY EMPLOYEE 1 checks M PRODUCT EMPLOYEE (EMP_ID, EMP_DEPT, …) PRODUCT (PROD_ID, PROD_NAME, PROD_%FIBRE, EMP_ID... ) Transforming an E-R Model Case 5: 1:M, Both sides OPTIONAL PHYSIOTHERAPIST 1 has M CLIENTS PHYSIOTHERAPIST (PT_ID, PT_LNAME, ...) CLIENT (CLIENT_ID, CLIENT_LNAME, CLIENT_OHIP#, …PT_ID) Transforming an E-R Model Case 6: 1:M, Many side OPTIONAL, one side MANDATORY MACHINE 1 contains M PARTS MACHINE (MACH_ID, MACH_NAME, MACH_DEPT, ...) PART (PART_ID, PART_NAME, PART_CATEGORY, …, MACH_ID) Transforming an E-R Model Case 7: 1:M, One side OPTIONAL, many side MANDATORY BAND 1 accepts M MUSICIAN BAND (BAND_ID, BAND_NAME, MUSIC_TYPE...) MUSICIAN (MUSICIAN_ID, MUSICIAN_INSTRUMENT, … BAND_ID) Transforming an E-R Model Case 8: 1:1, Both Sides MANDATORY Transforming an E-R Model Case 8: 1:1, Both Sides MANDATORY PLUMBER 1 1 assigned BUILDING PLUMBER (PLUMBER_ID, PLUMBER_LNAME,…BUILDING_ID) BUILDING (BUILDING_ID, BUILDING_ADDRESS,...) EMPLOYEE 1 has a 1 JOB-DESCRIPTION EMPLOYEE (EMP_NUM, EMP_LNAME,…, JOB_DESC) Transforming an E-R Model Case 9: 1:1, Both Sides OPTIONAL EXERCISER 1 has 1 TRAINER EXERCISER (EXERCISER_ID, EXERCISER_LNAME, …TRAINER_ID) TRAINER (TRAINER_ID, TRAINER_LNAME, ...) Transforming an E-R Model Case 10: 1:1, One Side OPTIONAL, One Side MANDATORY EMPLOYEE 1 has 1 AUTO EMPLOYEE (EMP_ID, EMP_LNAME, EMP_PHONE,…) AUTO (LIC_NUM, SERIAL_NUM, MAKE, MODEL,, …, EMP_ID) Transforming an E-R Model Case 11: Weak Entity (Foreign key located in weak entity) Case 11. Decomposing Weak Entities When the relationship type of a binary relationship is 1:M between an entity and its weak entity, two tables are required: one for each entity, with the entity key from each entity serving as the primary key for the corresponding table. Additionally, the entity that has a dependency on the existence of another entity has a primary key that is partially or totally derived from the parent entity of the relationship. Weak entities must be deleted when the strong entity is deleted. HOSPITAL 1 contains M UNIT HOSPITAL (HOSP_ID, HOSP_NAME, HOSP_ADDRESS, ...) UNIT (HOSP_ID, UNIT_NAME, HEAD_NURSE, ...) Transforming an E-R Model Case 12: Multivalued Attributes Decomposing an IS-A Relationship CLIENT 1 INDIVIDUAL CLIENT CORPORATE CLIENT Entity CLIENT contains ClientNumber ClientName Address AmountDue SocialInsuranceNumber TaxIdentificationNumber ContactPerson Phone Problem: Too many NULL values Solution: Separate into CLIENT entity plus several subtypes Decomposing an IS-A Relationship Create a table for the parent entity and for each of the child entities or subtypes Move the associated attributes from the parent entity into the child table to which they correspond From the parent entity take the entity key and add it as the primary key to the corresponding table for each child entity In the event a table corresponding to a child entity already has a primary key then simply add the entity key from the parent entity as an attribute of the table corresponding to the child entity CLIENT 1 INDIVIDUAL CLIENT CORPORATE CLIENT CLIENT (CLIENT_ID, AMOUNT_DUE, …) INDIVIDUAL_CLIENT (CLIENT_ID, SIN#, …) CORPORATE_CLIENT(CLIENT_ID, GST#, …) Transforming Recursive Relationships 1:1 - create a foreign key field (duplicate values not allowed) that contains the domain of primary key Stud_ID Stud_FName Stud_LName 1 2 3 4 Rodney Joki Francine Anne Jones Singh Moire Abel Locker Partner 4 3 2 1 1:M - create a foreign key field (duplicate values allowed) that contains the domain of primary key Prod_ID Prod_Name Base_Prod 1 2 3 4 5 6 Chicken burger Raw Chicken Weiner Schnitzel Fried Chicken Ground pork Pork dumplings 2 5 2 5 Transforming M:N Recursive Relationships M:N - create a second relation that contains two foreign keys: one for each side of the relationship “course requires course.” Decomposing Ternary relationships When a relationship is three-way (ternary) four preliminary tables are required: one for each entity, with the entity key from each entity serving as the primary key for the corresponding table, and one for the relationship. The table corresponding to the relationship will have among its attributes the entity keys from each entity Similarly, when a relationship is N-way, N+1 preliminary tables are required. Transforming an E-R Diagram Converting an E-R Model into a Database Structure • A painter might paint many paintings. The cardinality is (1,N) in the relationship between PAINTER and PAINTING. • Each painting is painted by one (and only one) painter. • A painting might (or might not) be exhibited in a gallery; i.e., the GALLERY is optional to PAINTING. Transforming an E-R Model Transformed schema for ARTIST database PAINTER(PRT_NUM, PRT_LASTNAME, PRT_FIRSTNAME, PRT_INITIAL, PTR_AREACODE, PRT_PHONE) Case 4 PAINTING(PNTG_NUM, PNTG_TITLE, PNTG_PRICE, PTR_NUM, GAL_NUM) Case 7 GALLERY(GAL_NUM, GAL_OWNER, GAL_AREACODE, GAL_PHONE, GAL_RATE) A Data Dictionary for the ARTIST Database Library Database Example writes AUTHOR M BOOK N M publishes 1 PUBLISHER PUBLISHER (Pub_ID, ___, ___, ___, ___, …) BOOK (ISBN, Pub_ID, ___, ___, ___, ___, …) AUTHOR (Author_ID, ___, ___, ___, ___, …) WRITES(ISBN, Author_ID, ___, ___, ___, ___, …) Case 6 Case 2 University Example M STUDENT takes N COURSE M M taught by Case 6 ENROLL (StudID, CourseID, ___, ...) STUDENT (StudID, ___, ___, FacID, …) Case 2 COURSE (CourseID, ___, ___, ___, …) FACULTY (FacID, ___, ___, ___, ___, …) TEACH (FacID, CourseID,…) Case 2 N 1 FACULTY E-R Modeling & Transformation Exercise E-R Modeling & Transformation Exercise Create an E-R model and define its table structures for the following requirements. - An INVOICE is written by a SALESREP. Each sales representative can write many invoices, but each invoice is written by a single sales representative. - The INVOICE is written for a single CUSTOMER. However, each customer may have many invoices. - An INVOICE may include many detail lines (LINE) which describe the products bought by the customer. - The product information is stored in a PRODUCT entity. - The product's vendor information is found in a VENDOR entity. E-R Modeling & Transformation Exercise 1 1 CUSTOMER VENDOR (1,1) (1,N) (1,N) gene rate s M (1,1) SHIPMENT 1 M INVOICE M (1,1) writes 1 (1,N) SALES RE P M M 1 PRO DUCT INV_LINE (1,N) (1,1) (1,1) (1,N) (1,1) (1,N) M 1 E-R Modeling & Transformation Exercise • Keep in mind that the preceding E-R diagram reflects a set of business rules that may easily be modified • For example, if customers are supplied via a commercial customer list, many of the customers on that list will not (yet!) have bought anything, so INVOICE would be optional to CUSTOMER • We are assuming here that a product can be supplied by many vendors and that each vendor can supply many products. The PRODUCT may be optional to VENDOR if the vendor list includes potential vendors from which you have not (yet) ordered anything. • Some products may never sell, so LINE is optional to PRODUCT... because an unsold product will never appear in an invoice line. • LINE may be shown as weak to INVOICE, because it borrows the invoice number as part of its primary key and it is existencedependent on INVOICE • The design depends on the exact nature of the business rules. E-R Modeling & Transformation Exercise 1 VENDOR (0,N) sh ips (1,1) 1 CUSTOMER ORDER (1,N) (1,1) gene rate s M (1,1) (1,1) writes 1 (1,N) SALES RE P M sh ows in M 1 INVOICE M M contains (1,N) M INV_LINE (1,1) 1 is in (1,1) (1,N) PRO DUCT (0,N) 1 E-R Modeling & Transformation Exercise CUSTOMER (CustomerID, …) INVOICE (InvoiceID, CustomerID, SalesRepID,…) LINE (InvoiceID, LineID, ProdID,…) PRODUCT (ProductID, …) SALESREP (SalesRepID, …) VENDOR (VendorID,…) ORDER (OrderID, ProductID, VendorID,…) Further E-R Transformation Exercises ER Modeling I handout - Q1 DIVISION (DivisionID,…ManagerID) DEPARTMENT (DeptID,…DivisionID) not null EMPLOYEE (EmpID, …DeptID) PROJECT (ProjectID,…) EMPLOYEE_PROJECT (EmpID, ProjectID,…) null allowed ER Modeling I - Q2 INSTRUCTOR (InstructorID, HighestDegree, …) COURSE (CourseID, ClassTitle, …) CLASS (ClassID, CourseID, InstructorID, Term…) TRAINEE (TraineeID, …) ENROLL (TraineeID, ClassID, Term…)* All foreign keys not null. * Optionally, create an EnrollmentID attribute to use as primary key. ER Modeling I - Q3 CUSTOMER (CustomerID, …) INVOICE (InvoiceID, CustomerID, SalesRepID,…) LINE (InvoiceID, LineID, ProdID,…) PRODUCT (ProductID, …) SALESREP (SalesRepID, …) VENDOR (VendorID,…) SHIP (ShipID, ProductID, VendorID,…) All foreign keys not null ER Modeling I - Q4 AGENT (AgentID, LName, Region…) CLIENT (ClientID, LName,…) MUSICIAN (MusicianID, AgentID, Name, DaysAvailable,…) EVENT (EventID, ClientID, MusicianID, Date, Time, Location…) INSTRUMENT (InsturmentID, …) MUSICIAN_INSTRUMENT (MusicianID, InstrumentID, YearsExperience…) All foreign keys not null. ER Modeling I - Q5 CITY (CityID, …) TEAM (TeamID, CoachID, CityID, …) PLAYER (PlayerID, TeamID,…) COACH (CoachID, TeamID,…) GAME (GameID, HomeTeamID, VisitorTeamID,…) All foreign keys not null. ER Modeling II - Q1 COMPANY (CompanyID, …) DEPARTMENT (DepartmentID, CompanyID…) EMPLOYEE (EmployeeID, DepartmentID, …) DEPENDENT (EmployeeID, DependentID, …) EMPLOYEE_HISTORY (EmployeeID, HistoryID, …) All foreign keys are not null ER Modeling II - Q2 MEMBER (MemberID, …) WORKOUT (WorkoutID, MemberID, Date…) EXERCISE (ExerciseID…) WORKOUT_EXERCISE (WorkoutID, ExerciseID, NumberSets, NumberReps,…) ER Modeling II - Q3 EMPLOYEE (EmployeeID, Name…PositionID) PART_TIME_EMPLOYEE (EmployeeID, HourlyRate…) FULL_TIME_EMPLOYEE (EmployeeID, Salary, OfficeRoom, …) POSITION (PositionID, Title, Job_Description…) All foreign keys not null. ER Modeling II - Q4 USER (UserID, Name, Department,…) PROBLEM (ProblemID, TimeSpent, UserID, ResolverID,…) HARDWARE (ProblemID, Description, Solution…) SOFTWARE (ProblemID, SoftwareVersion, …) RESOLVER (ResolverID, Name, Phone, Level, …) All foreign keys not null. ER Modeling II - Q5 EMPLOYEES (EmployeeID, SupervisorID, …) SKILLS (SkillID, SkillName, …) EMPLOYEE_SKILL (EmployeeID, SkillID, DateAcquired, Certification,…) PROJECTS (ProjectID, ProjectName, ManagerID, StartDate…) EMPLOYEE_PROJECT (EmployeeID, ProjectID, Role…) PROJECT_SKILL (ProjectID, SkillID, SkillLevelRequired, NumberStaff,…)* DEPENDENTS (EmployeeID, DependentID, DateOfBirth…) WORK_HISTORY(EmployeeID, HistoryID,…) BENEFITS (BenefitID, BenefitType, Company, Contact,…) EMPLOYEE_BENEFIT (EmployeeID, BenefitID,…) All foreign keys are not null. * Optionally, create a ProjectSkill_ID attribute to use as primary key. ER Modeling II - Q6 ORCHARD (OrchardID, Location, …) SPECIES (SpeciesID, Name, OrchardID…) DISEASE (DiseaseID, Symptoms, Treatment,…) SPECIES_DISEASE (SpeciesDiseaseID, SpeciesID, DiseaseID, Date,…)* CUSTOMER (CustomerID, …) ORDER (OrderID, CustomerID, …) ORDERDETAILS (OrderID, DetailID, SpeciesID,…) All foreign keys not null. * Optionally, use the combination of SpeciesID, DiseaseID and Date as primary key and remove SpeciesDiseaseID entirely. Class Outline E-R Transformation E-R Transformation Exercises Break Normalization Normalization Exercises Transformation & Normalization 1. Identify entities 2. 3. 4. 5. 6. 7. Identify relationships Determine relationship type Determine level of participation Assign an identifier for each entity Draw completed E-R diagram Deduce a set of preliminary skeleton tables along with a proposed primary key for each table (using rules provided) 8. Develop a list of all attributes of interest (not already listed and systematically assign each to a table in such a way to achieve a 3NF design (i.e., no repeating groups, no partial dependencies, and no transitive dependencies) Database Design Problems Database design is the process of separating information into multiple tables that are related to each other Single table designs work only for the simplest of situations in which data integrity problems are easy to correct Anomalies (abnormalities) often arise in single table designs as a result of inserting, deleting, or updating records Some tables are better structured than others (i.e., result in fewer anomalies) Database Design Problems Numerous anomalies can arise during the design of databases • • • • • Redundancy Multi-valued problems Update anomalies Insertion anomalies Deletion anomalies The Problem with Nulls 1. Nulls used in mathematical expressions - unknown quantity leads to unknown total value - misleading value of all inventory Product ID Product Description Category Accessories Price Quantity Total Value 801 Shur-Lock U-Lock 75.00 802 SpeedRite Cyclecomputer 60.00 20 1,200.00 803 SteelHead Microshell HelmetAccessories 40.00 40 1,600.00 804 SureStop 133-MB Brakes Components 25.00 10 250.00 805 Diablo ATM Mountain Bike Bikes 806 Ultravision Helmet Mount Mirrors 10 74.50 1,200.00 7.45 Total: Category Total Occurences 0 Accessories 2 Bikes 1 Components 1 3,124.50 2. Nulls used in aggregate functions - blanks exist under category - cannot be counted because they don’t exist! Database Design Problems Use of the relational database model removes some database anomalies Further removal of database anomalies relies on a structured technique called normalization Presence of some of these anomalies is sometimes justified in order to enhance performance Database design consists of balancing the art of design with the science of design Normalization Goal in database design to create well-structured tables Transform E-R models to tables following the rules provided Assuring tables are well-structured with minimal problems (redundancy, multi-valued attributes, update anomalies, insertion anomalies, deletion anomalies) is achieved using structured technique called normalization Normalization Normalization is the structured decomposition of one table into two or more tables using a procedure designed to determine the most appropriate split Normalization our method of making sure the E-R design was correct in the first place Rules for Normalization Basic #1 Rule • The attribute values in a relational table should be functionally dependent (FD) on the primary key value. • In any table, a field A is said to be functionally dependent on field B if, regardless of any insertions or deletions, the value of B determines the value of A (in other words only one value of A occurs with a particular value of B) Rules for Normalization First Normal Form (1NF) • A table cannot have repeating fields or groups (i.e., must remove redundant data) • Repeating groups are removed by creating another table which holds those attributes that repeat. This second table is then linked to the original table with an identifier (i.e., foreign key) Rules for Normalization Second Normal Form (2NF) • Table is in 1NF • All nonkey fields in a table must be functionally dependent on all of the key (i.e., remove all partial dependencies) • 2NF is primarily concerned with dependencies involving a concatenated primary key (nonkey fields must be functionally dependent on the entire concatenated key not just one attribute of the composite key) Rules for Normalization Third Normal Form (3NF) • Table is in 2NF • A nonkey field cannot be functionally dependent on another nonkey field (i.e., remove transitive dependencies by placing attributes involved in a new relational table) Rules for Normalization Fourth Normal Form (4NF) Boyce-Codd Normal Form (BCNF) Fifth Normal Form (5NF) Domain-Key Normal Form (DKNF) For most database designs 3NF is sufficient 3NF is level for designing in this course First Normal Form A table is in first normal form if it meets the following criteria: The data are stored in a twodimensional table with no two rows identical and there are no repeating groups. • The following table in NOT in first normal form because it contains a multi-valued attribute (an attribute with more than one value in each row). Member_ID Memb_FName Memb_LName Hobbies 1 Rodney Jones hiking, cooking 3 Francine Moire golf, theatre, hiking 2 Anne Abel concerts Handling multi-valued attributes: Incorrect Solutions Member_ID Memb_FName Memb_LName Hobbies 1 Rodney Jones hiking, cooking 3 Francine Moire golf, theatre, hiking 2 Anne Abel concerts Member_ID Memb_FName Memb_LName Hobby1 Hobby2 Hobby3 1 Rodney Jones hiking cooking 3 Francine Moire golf theatre hiking 2 Anne Abel concerts Member_ID Memb_FName Memb_LName Hobbies 1 Rodney Jones fishing 1 Rodney Jones cooking 3 Francine Moire golf 3 Francine Moire theatre 3 Francine Moire hiking 2 Anne Abel concerts Handling multi-valued attributes: Correct Solution Create another entity (table) to handle multiple instances of the repeating group. This second table is then linked to the original table with an identifier (i.e., foreign key). This solution has the following advantages: • no limit to the number of hobbies per member • no waste of disk space • searching becomes much easier within a column (e.g., who likes hiking?) Member_ID Memb_FName Memb_LName Hobbies 1 Rodney Jones hiking, cooking 3 Francine Moire golf, theatre, hiking 2 Anne Abel concerts Member_ID Memb_FName Memb_LName 1 Rodney Jones 3 Francine Moire 2 Anne Abel Member_ID 1 1 3 3 3 2 Hobby hiking cooking golf theatre hiking concerts Handling Repeating Groups An attribute can have a group of several data entries. Repeating groups can be removed by creating another table which holds those attributes that repeat. This second table (validation table) is then linked to the original table with an identifier (i.e., foreign key) Advantages: fewer characters tables; reduces miskeying, update anomalies Product_ID Product_Name Category Price 801 Shur-Lock U-Lock Accessory 75.00 802 SpeedRite Cyclecomputer Component 60.00 803 SteelHead Microshell Helmet Accessory 40.00 804 SureStop 133-MB Brakes Component 25.00 805 Diablo ATM Mountain Bike Bike 806 Ultravision Helmet Mount Mirrors Accessory Product_ID Product_Name Category 801 Shur-Lock U-Lock 1 802 SpeedRite Cyclecomputer 2 803 SteelHead Microshell Helmet 1 804 SureStop 133-MB Brakes 2 805 Diablo ATM Mountain Bike 3 806 Ultravision Helmet Mount Mirrors 1 Price 75.00 60.00 40.00 25.00 1200.00 7.45 1,200.00 7.45 Category_ID Category 1 Accessory 2 Component 3 Bike Second Normal Form A table is in second normal form if it meets the following criteria: The relation is in first normal form, and, all nonkey attributes are functionally dependent on the entire primary key (no partial dependencies). • Applies only to tables that have a composite primary key. • In the following table, both the EmpID and Training (composite primary key) determine Date, whereas, only EmpID (part of the primary key) determines Dept. EmpID Training 1 Word 3 Excel 2 Excel 1 Access Date 12-Sep-99 14-Oct-99 14-Oct-99 23-Nov-99 Dept Oncology Paediatrics Renal Oncology Removing Partial Dependencies Remove partial dependencies by separating the relation into two relations. Reduces the problems of: • • • • update anomalies delete anomalies insert anomalies redundancies EmpID 1 3 2 1 EmpID 1 3 2 1 Training Word Excel Excel Access Date 12-Sep-99 14-Oct-99 14-Oct-99 23-Nov-99 Training Word Excel Excel Access Date 12-Sep-99 14-Oct-99 14-Oct-99 23-Nov-99 EmpID 1 2 3 Dept Oncology Paediatrics Renal Oncology Dept Oncology Renal Paediatrics Third Normal Form A table is in third normal form if it meets the following criteria: The relation is in second normal form, and, a nonkey field is not functionally dependent on another nonkey field (no transitive dependencies). • The following table is in second normal form but NOT in third normal form because Member_Id (the primary key) does not determine every attribute (does not determine RegistrationFee). RegistrationFee is determined by Sport. Member_ID 1 3 2 4 Memb_FName Rodney Francine Anne Goro Memb_LName Jones Moire Abel Azuma Sport Swimming Tennis Tennis Skiing RegistrationFee $100 $200 $200 $150 Member ID FName, LName, Lesson; Lesson Cost Removing non-key Transitive Dependencies Remove transitive dependencies by placing attributes involved in a new relational table. Reduces the problems of: • • • • update anomalies delete anomalies insert anomalies redundancies MemberID 1 3 2 4 MemberID 1 3 2 4 MembFName Rodney Francine Anne Goro MembFName Rodney Francine Anne Goro MembLName Sport Jones 1 Moire 2 Abel 2 Azuma 1 MembLName Jones Moire Abel Azuma Sport RegFee Swimming $100 Tennis $200 Tennis $200 Skiing $150 SportID Sport RegFee 1 Swimming $100 2 Tennis $200 3 Skiing $150 Normalization Example: Video Store A video rental shop tracks all of their information in one table. There are now 20,000 records in it. Is it possible to achieve a more efficient design? (They charge $10/movie/day.) Cust_Name Rodney Jones Francine Moire Anne Abel Rodney Jones Cust_address 23 Richmond St. 750-12 Kipps Lane 5 Sarnia Road 23 Richmond St. Cust_Phone 681-9854 672-9999 432-1120 681-9854 Rental_date Video_1 Video_2 15-Oct-99 Gone with the Wind Braveheart 4-Nov-99 Manhatten 3-Sep-99 Manhatten The African Queen 22-Sep-99 Never Say Never Silence Again of the Lambs Video_3 VideoType_1 Mississippi Burning Classic Comedy Comedy Adventure Return_date VideoType_2 VideoType3 17-Oct-99 Adventure Adventure Classic Horror 4-Sep-99 26-Sep-99 TotalPrice Paid? $ 60.00 yes $ $ 20.00 yes 80.00 yes VIDEO (Cust_name, Cust_address, Cust_phone, Rental_date, Video_1, Video_2, Video_3, VideoType_1, VideoType_2, VideoType3, Return_date, Total_Price, Paid?) Normalization Example: Video Store Cust_Name Rodney Jones Francine Moire Anne Abel Rodney Jones Cust_address 23 Richmond St. 750-12 Kipps Lane 5 Sarnia Road 23 Richmond St. Video_1 Video_2 Gone with the Wind Braveheart Manhatten Manhatten The African Queen Never Say Never Silence Again of the Lambs Cust_Phone 681-9854 672-9999 432-1120 681-9854 Video_3 VideoType_1 Mississippi Burning Classic Comedy Comedy Adventure Return_date 17-Oct-99 TotalPrice Paid? $ 60.00 yes 4-Sep-99 26-Sep-99 $ $ 20.00 yes 80.00 yes Rental_date 15-Oct-99 4-Nov-99 3-Sep-99 22-Sep-99 VideoType_2 VideoType3 Adventure Adventure Classic Horror Is the Video store in 1NF? No attributes should form repeating groups - remove them by creating another table. There are repeating groups for videos and customers. CUSTOMER (Cust_Num, Cust_Name, Cust_address_Cust_phone Cust_Num Cust_Name 1 Rodney Jones 2 Francine Moire 3 Anne Abel Cust_address Cust_Phone 23 Richmond St.681-9854 750-12 Kipps Lane 672-9999 5 Sarnia Road 432-1120 VIDEO (VideoNum, VideoName, VideoType VideoNum 1 2 3 4 5 6 7 VideoName VideoType Gone with the Wind Classic Manhatten Comedy Never Say Never Again A dventure Braveheart Adventure Mississippi Burning Adventure The African Queen Classic Silence of the Lambs Horror RENTAL (Cust_num, VideoNum, Rental_date, Return_date, TotalPrice, Paid?) Cust_Num VideoNum Rental_date Return_date 1 1,4,5 15-Oct-99 17-Oct-99 2 2 4-Nov-99 3 2,6 3-Sep-99 4-Sep-99 1 3,7 22-Sep-99 26-Sep-99 TotalPrice $ 60.00 $ $ 20.00 80.00 Paid? yes yes yes Video Store: 1NF (cont’d) Have not yet removed all repeating groups - video is a multivalued attribute - move to another table. Cust_Num VideoNum Rental_date Return_date 1 1,4,5 15-Oct-99 17-Oct-99 2 2 4-Nov-99 3 2,6 3-Sep-99 4-Sep-99 1 3,7 22-Sep-99 26-Sep-99 RentalNum Cust_Num 1 1 2 2 3 3 4 1 Rental_date Return_date TotalPrice Paid? 15-Oct-99 17-Oct-99 $ 60.00 yes 4-Nov-99 3-Sep-99 4-Sep-99 $ 20.00 yes 22-Sep-99 26-Sep-99 $ 80.00 yes RENTAL (RentalNum, Cust_Num, Rental_date, Return_Date, TotalPrice, Paid?) TotalPrice $ 60.00 $ $ 20.00 80.00 Paid? yes yes yes RentalNum VideoNum 1 1 1 4 1 5 2 2 3 2 3 6 4 3 4 7 RENTALDETAILS (RentalNum, VideoNum) The Video Store is now in 1NF CUSTOMER (Cust_Num, Cust_Name, Cust_address, Cust_phone Cust_Num Cust_Name 1 Rodney Jones 2 Francine Moire 3 Anne Abel Cust_address Cust_Phone 23 Richmond St.681-9854 750-12 Kipps Lane 672-9999 5 Sarnia Road 432-1120 VideoNum VideoName VideoType 1 Gone with the Wind Classic 2 Manhatten Comedy VIDEO (VideoNum, VideoName, VideoType 3 Never Say Never Again Adventure 4 Braveheart Adventure RentalNum VideoNum 5 Mississippi Burning Adventure 1 1 6 The African Queen Classic 1 4 7 Silence of the Lambs Horror 1 5 2 2 RentalNum Cust_Num Rental_date Return_date TotalPrice Paid? 3 2 1 1 15-Oct-99 17-Oct-99 $ 60.00 yes 3 6 2 2 4-Nov-99 4 3 3 3 3-Sep-99 4-Sep-99 $ 20.00 yes 4 7 4 1 22-Sep-99 26-Sep-99 $ 80.00 yes RENTALDETAILS RENTAL (RentalNum, Cust_Num, Rental_date, Return_Date, TotalPrice, Paid?) (RentalNum, VideoNum) Is the Video Store in 2NF? The only table that has a composite primary key has no other fields, therefore, yes. Cust_Num Cust_Name 1 Rodney Jones 2 Francine Moire 3 Anne Abel VideoNum 1 2 3 4 5 6 7 Cust_address Cust_Phone CUSTOMER (Cust_Num, 23 Richmond St.681-9854 Cust_Name, Cust_address, 750-12 Kipps Lane 672-9999 Cust_phone 5 Sarnia Road 432-1120 VideoName VideoType Gone with the Wind Classic Manhatten Comedy Never Say Never Again Adventure Braveheart Adventure Mississippi Burning Adventure The African Queen Classic Silence of the Lambs Horror RentalNum Cust_Num 1 1 2 2 3 3 4 1 VIDEO (VideoNum, VideoName, VideoType Rental_date Return_date TotalPrice 15-Oct-99 17-Oct-99 $ 60.00 4-Nov-99 3-Sep-99 4-Sep-99 $ 20.00 22-Sep-99 26-Sep-99 $ 80.00 RENTAL (RentalNum, Cust_Num, Rental_date, Return_Date, TotalPrice, Paid?) Paid? yes yes yes RentalNum VideoNum 1 1 1 4 1 5 2 2 3 2 3 6 4 3 4 7 RENTALDETAILS (RentalNum, VideoNum) Is the Video Store in 3NF? Does each attribute in each table depend upon the primary key? Cust_Num Cust_Name 1 Rodney Jones 2 Francine Moire 3 Anne Abel VideoNum 1 2 3 4 5 6 7 Cust_address Cust_Phone 23 Richmond St. 681-9854 750-12 Kipps Lane 672-9999 5 Sarnia Road 432-1120 VideoName VideoType Gone with the Wind Classic Manhatten Comedy Never Say Never Again Adventure Braveheart Adventure Mississippi BurningAdventure The African Queen Classic Silence of the LambsHorror RentalNum Cust_Num 1 1 2 2 3 3 4 1 Rental_date Return_date TotalPrice 15-Oct-99 17-Oct-99 $ 60.00 4-Nov-99 3-Sep-99 4-Sep-99 $ 20.00 22-Sep-99 26-Sep-99 $ 80.00 RentalNum VideoNum 1 1 1 4 1 5 2 2 3 2 3 6 4 3 4 7 Paid? yes yes yes The Video Store is now in 3NF Because, in each table every attribute depends on the primary key and not on any other key. Cust_Num Cust_Name 1 Rodney Jones 2 Francine Moire 3 Anne Abel VideoNum 1 2 3 4 5 6 7 Cust_address Cust_Phone 23 Richmond St. 681-9854 CUSTOMER (Cust_Num, Cust_Name, 750-12 Kipps Lane 672-9999 Cust_address, 5 Sarnia Road 432-1120 Cust_phone) VideoName VideoType Gone with the Wind Classic Manhatten Comedy Never Say Never Again Adventure Braveheart Adventure Mississippi BurningAdventure The African Queen Classic Silence of the LambsHorror RentalNum Cust_Num 1 1 2 2 3 3 4 1 Rental_date 15-Oct-99 4-Nov-99 3-Sep-99 22-Sep-99 RENTAL (RentalNum, Cust_Num, Rental_date) VIDEO (VideoNum, VideoName, VideoType) RentalNum VideoNum ReturnDate 1 1 16-Oct-99 1 4 17-Oct-99 1 5 16-Oct-99 2 2 5-Nov-99 3 2 4-Sep-99 3 6 6-Sep-99 4 3 24-Sep-99 4 7 16-Sep-99 Amt_Paid $10 $20 $10 $10 0 0 $5 0 RENTALDETAILS (RentalNum, VideoNum, ReturnDate, Amt_Paid) Normalization Example: ARTIST Checking Transformed ER Model Transformed schema for ARTIST database PAINTER(PRT_NUM, PRT_LASTNAME, PRT_FIRSTNAME, PRT_INITIAL, PTR_AREACODE, PRT_PHONE) PAINTING(PNTG_NUM, PNTG_TITLE, PNTG_PRICE, PTR_NUM, GAL_NUM) GALLERY(GAL_NUM, GAL_OWNER, GAL_AREACODE, GAL_PHONE, GAL_RATE) 1NF? 2NF? 3NF? Checking Transformed RE Model 1 VENDOR (0,N) sh ips (1,1) 1 CUSTOMER ORDER (1,N) (1,1) gene rate s M (1,1) (1,1) writes 1 (1,N) SALES RE P M sh ows in M 1 INVOICE M M contains (1,N) M INV_LINE (1,1) 1 is in (1,1) (1,N) PRO DUCT (0,N) 1 Checking Transformed ER Model CUSTOMER (CustomerID, …) INVOICE (InvoiceID, CustomerID, SalesRepID,…) LINE (InvoiceID, LineID, ProdID,…) PRODUCT (ProductID, …) SALESREP (SalesRepID, …) VENDOR (VendorID,…) ORDER (OrderID, ProductID, VendorID,…) 1NF? 2NF? 3NF? depends on placement of attributes Normalization Exercises Normalization Exercises To keep track of office furniture, computers, printers, and so on, the FOUNDIT company uses the following table structure: Attribute name Sample value ITEM_ID ITEM_DESCRIPTION BLDG_ROOM BLDG_CODE BLDG_NAME BLDG_MANAGER 2311345-678 HP DeskJet 660C printer 325 DEL Dawn's Early Light E. R. Rightonit Given this information, draw the dependency diagram. Make sure you label the transitive and/or partial dependencies. Normalization Exercises ITE M_ ID ITE M_ DESC RI PTION BLDG_ RO OM B LDG_C ODE BL DG_ NAM E Transitive De pend encie s BLDG_ MAN AGE R Normalization Exercises All ta bles in 3NF ITEM_ID ITEM_DESCRIPTION ITEM_ROOM BLDG_COD E BLDG_CODE BLDG_NAME EMP_CODE EMP_CODE EMP_LNAME EMP_FNAME EMP_INI TIAL Normalization Exercises 1 EMPLOYEE (0,N) manages (1,1) M 1 BUI LDING M contains (1,N) EMPLOYEE ITEM (1,1) BUI LDING ITEM EMP_CODE BLDG_CODE ITEM_ID EMP_LNA ME BLDG_NA ME ITEM_DESCRIPTION EMP_FNAME EMP_CODE ITEM_ROOM EMP_INI TI AL BLDG_CODE Normalization Exercises Conflicting Goals of Design Database design must reconcile the following requirements: • Design elegance requires that the design must adhere to design rules concerning nulls, derived attributes, redundancies, relationship types, etc. • Information requirements are dictated by the end users • Operational (transaction) speed requirements are also dictated by the end users Clearly, an elegant database design that fails to address end user information requirements or one that forms the basis for an implementation whose use progresses at a snail's pace has little practical use. Characteristics of Fields Each field within a table must have a unique name (avoid spaces and special characters). Data within a field must be of the same data type. The following are common data types: • • • • • • • • character (text or string) memo (large character field) integer (whole numbers for calculations) number (values with decimals for calculations) currency (formatted number) logical or Boolean (true/false; 0,-1; yes/no) date/ time (use computer’s internal calendar/clock) graphic (picture) Guidelines for Ideal Table Design Each table should represents a single theme or subject or entity or transaction Tables should include primary keys that uniquely identify each record of each table Avoid the use of smart keys that attempt to embed meaning into primary keys (keys should be meaningless) A primary key should be a unique, random or sequential collection of alphabetic, numeric or alphanumeric characters The domain of primary keys should be large enough to accommodate the identification of unique rows for the entire potential universe of records Use the suffix ID in constructing primary keys to ensure they are readily identifiable Tables should not contain any of the following: multipart fields, multivalued fields, calculated or derived fields or unnecessary duplicate fields There should be a minimum amount of redundant data Common Errors in Database Design Flat file database Duplicate field names Too much data Cryptic field or table names Compound fields Referential integrity Missing keys Database Security Bad keys Missing relationships Unnecessary relationships Missing or incorrect business rules Missing or incorrect constraints Incorrect relationships Ashenfelter, J. P. (March 26, 1999). Common Database Mistakes. Found online at <http://webreview.com/wr/pub/1999/03/26/feature/index3.html> (June 5, 2000). The Well-Structured Database E-R modeling is top-down method of designing Transforming an E-R model does not guarantee the best design (e.g., E-R model could be way off) Best to transform E-R model and then check the design according to the Cases of normalization Normalization is bottom-up method of designing a database Use both approaches to develop a wellstructured database For Week 6 Assignment #2 due June 12 Structured Query Language Discuss project assignment Read Rob, Chapter 3.1-3.6 and Chapter 6 Work on Adamski Tutorial #5