4 Building the Database E-R Diagrams and Normalization of Database Tables MIS 304 Winter 2006 4 Review of Last Class • The relational database is set of linked tables. • The links are accomplished by using keys: – Primary keys to identify a unique value in each row (entity instance). – Foreign keys use an attribute from another table(s) primary key to connect the tables. • In order to query two linked tables we need to do a join. – Joins are powerful but potentially costly in terms of computer resources. 2 4 Class Objective • • • • • Fill out our understanding of E-R diagrams. Look at how databases get “loaded” Understand the concept of a “Normal Form” Understand why these concepts are important. Be able to move a database or table from one level “Form” to another. 3 4 The Building Blocks of an E-R Diagram • • • • Entity Attribute Relationship Identifier 4 4 Data Models: Degrees of Data Abstraction Figure 3.1 5 4 The Entity Relationship (E-R) Model • Represents conceptual view • Main Components – Entities • Corresponds to entire table, not row • Represented by rectangle – Attributes – Relationships 6 4 Attributes • Characteristics of entities • Domain is set of possible values • Primary keys underlined Figure 3.6 7 4 The Entity Relationship (E-R) Model – A derived attribute is not physically stored within the database; instead, it is derived by using an algorithm. • Example: AGE can be derived from the data of birth and the current date. 8 Basic E-R Model Entity Presentation 4 9 4 Multivalued Attribute in Relational DBMS • The relational DBMS cannot implement multivalued attributes. • Possible courses of action for the designer – Within the original entity, create several new attributes, one for each of the original multivalued attribute’s components (Figure 4.9). – Create a new entity composed of the original multivalued attribute’s components 10 A New Entity Set Composed of Multivalued Attribute’s Components 4 11 Splitting the Multivalued Attributes into New Attributes 4 12 4 Connectivity and Cardinality in an ERD Figure 3.12 13 Weak Entity 4 • Existence-dependent on another entity • Has primary key that is partially or totally derived from parent entity 14 Relationship Participation 4 • The participation is optional if one entity occurrence does not require a corresponding entity occurrence in a particular relationship. • An optional entity is shown by a small circle on the side of the optional entity. 15 Relationship Degree 4 Unary – Single entity, “Recursive” – Exists between occurrences of same entity set • Binary – Two entities associated • Ternary – Three entities associated 16 4 17 4 The Chen Representation of the Invoicing Problem 18 The Crow’s Foot Representation of the Invoicing Problem 4 19 4 Database Tables and Normalization • Normalization is a process for assigning attributes to entities. It reduces data redundancies and helps eliminate the data anomalies. • Normalization works with data modeling to make the database more useful. • The highest level of normalization is not always the most desirable. 20 4 Normalization Stages • Normalization works through a series of stages called normal forms: – – – – – First normal form (1NF) Second normal form (2NF) Third normal form (3NF) Fourth normal form (4NF) Fifth normal form (5NF) 21 4 The Need for Normalization • Case of a Construction Company – Building project -- Project number, Name, Employees assigned to the project. – Employee -- Employee number, Name, Job classification – The company charges its clients by billing the hours spent on each project. The hourly billing rate is dependent on the employee’s position. – Periodically, a report is generated. – The table whose contents correspond to the reporting requirements is shown in Table 5.1. 22 4 Artifacts • Companies have ingrained processes and procedures. • There are often visible by looking at the paper trail. • Data reports and forms are a big part of that trail. • You can look at the output artifacts as well as the input ones. 23 4 24 A Table Whose Structure Matches the Report Format 4 25 4 Database Tables and Normalization • Problems with the Figure 5.1 – The project number is intended to be a primary key, but it contains nulls. – The table displays data redundancies. – The table entries invite data inconsistencies. – The data redundancies yield the following anomalies: • Update anomalies. • Addition anomalies. • Deletion anomalies. 26 4 Conversion to First Normal Form • A relational table must not contain repeating groups. – Repeating groups can be eliminated by adding the appropriate entry in at least the primary key column(s). 27 Data Organization: First Normal Form 4 28 4 What do you notice? • How much redundancy is there in the table? • What are the keys? 29 4 Another Example Region West Salesperson Jones Smith East Brinkley Month Total Jan 1550 Feb 1290 Jan 1990 Feb 2010 Jan 2202 30 4 Tools to help in Normalization • Data Models • Dependency Diagrams, A new tool. 31 4 Data Models • Last time we saw how to use a data model to evaluate the Cost/Benefit of adding and removing relations between tables. 32 4 Dependency Diagram (1NF) • Dependency Diagram – The primary key components are bold, underlined, and shaded in a different color. – The arrows above entities indicate all desirable dependencies, i.e., dependencies that are based on PK. – The arrows below the dependency diagram indicate less desirable dependencies -- partial dependencies and transitive dependencies. Figure 4.4 33 4 Forms of Dependency • Dependency • Partial Dependency • Transitive Dependency 34 4 Database Tables and Normalization • 1NF Definition – The term first normal form (1NF) describes the tabular format in which: • All the key attributes are defined. • There are no repeating groups in the table. • All attributes are dependent on the primary key. 35 4 Conversion to Second Normal Form • Starting with the 1NF format, the database can be converted into the 2NF format by using the Dependency Diagram and: – Writing each key component on a separate line, and then writing the original key on the last line and – Writing the dependent attributes after each new key. PROJECT (PROJ_NUM, PROJ_NAME) EMPLOYEE (EMP_NUM, EMP_NAME, JOB_CLASS, CHG_HOUR) ASSIGN (PROJ_NUM, EMP_NUM, HOURS) 36 Second Normal Form (2NF) Conversion Results 4 37 4 2NF Definition • A table is in 2NF if: – It is in 1NF and – It includes no partial dependencies; that is, no attribute is dependent on only a portion of the primary key. (It is still possible for a table in 2NF to exhibit transitive dependency; that is, one or more attributes may be functionally dependent on nonkey attributes.) 38 4 Effect on Queries • How does this differ from the 1NF version. • How does this effect the SQL queries you are likely to run on the database. – Number of “Joins” – Number of “Sub Queries” 39 4 Conversion to Third Normal Form • Create a separate table with attributes in a transitive functional dependence relationship. PROJECT (PROJ_NUM, PROJ_NAME) ASSIGN (PROJ_NUM, EMP_NUM, HOURS) EMPLOYEE (EMP_NUM, EMP_NAME, JOB_CLASS) JOB (JOB_CLASS, CHG_HOUR) 40 4 3NF Definition • A table is in 3NF if: – It is in 2NF and – It contains no transitive dependencies. 41 The Completed Database 4 42 4 The Query Effects • Now how many “Joins” do we have? • What is stored where? 43 4 Additional DB Enhancements Figure 4.6 44 4 Boyce-Codd Normal Form (BCNF) • A table is in Boyce-Codd normal form (BCNF) if every determinant in the table is a candidate key. (A determinant is any attribute whose value determines other values with a row.) • If a table contains only one candidate key, the 3NF and the BCNF are equivalent. • BCNF is a special case of 3NF. 45 4 3NF Table Not in BCNF Figure 4.7 46 Sample Data for a BCNF Conversion 4 Table 5.2 47 4 Decomposition into BCNF Figure 4.9 48 4 Normalization and Database Design • Database Design and Normalization Example: (Construction Company) – Summary of Operations: • The company manages many projects. • Each project requires the services of many employees. • An employee may be assigned to several different projects. • Some employees are not assigned to a project and perform duties not specifically related to a project. Some employees are part of a labor pool, to be shared by all project teams. • Each employee has a (single) primary job classification. This job classification determines the hourly billing rate. • Many employees can have the same job classification. 49 4 Normalization and Database Design • Normalization should be part of the design process • E-R Diagram provides macro view • Normalization provides micro view of entities – Focuses on characteristics of specific entities – May yield additional entities • Difficult to separate normalization from E-R diagramming • Business rules must be determined 50 4 Initial ERD for Contracting Company Figure 4.10 51 4 Normalization and Database Design • Three Entities After Transitive Dependency Removed PROJECT (PROJ_NUM, PROJ_NAME) EMPLOYEE (EMP_NUM, EMP_LNAME, EMP_FNAME, EMP_INITIAL, JOB_CODE) JOB (JOB_CODE, JOB_DESCRIPTION, JOB_CHG_HOUR) 52 4 Modified ERD for Contracting Company Figure 4.11 53 4 Initial ERD for Contracting Company Figure 4.10 54 4 Modified ERD for Contracting Company Figure 4.11 55 4 Final ERD for Contracting Company Figure 4.12 56 4 Normalization and Database Design • Attribute ASSIGN_HOUR is assigned to the composite entity ASSIGN. • “Manages” relationship is created between EMPLOYEE and PROJECT. PROJECT (PROJ_NUM, PROJ_NAME, EMP_NUM) EMPLOYEE (EMP_NUM, EMP_LNAME, EMP_FNAME, EMP_INITIAL, EMP_HIREDATE, JOB_CODE) JOB (JOB_CODE, JOB_DESCRIPTION, JOB_CHG_HOUR) ASSIGN (ASSIGN_NUM, ASSIGN_DATE, PROJ_NUM, EMP_NUM, ASSIGN_HOURS) 57 The Relational Schema For The Contracting Company 4 58 4 Higher-Level Normal Forms • 4NF Definition – A table is in 4NF if it is in 3NF and has no multiple sets of multivalued dependencies. 59 4 Conversion to 4NF Figure 4.15 Set of Tables in 4NF Figure 4.14 Multivalued Dependencies 60 4 Multivalued Attributes • Can you think of a kind of database that has lots of Multivalued attributes? 61 4 Fifth normal form (5NF) • • • A table can be reconstructed from other tables There exists some rule that enables a relation to be inferred Base case – Consultants provide skills to one more firms and firms can use many consultants; a consultant has many skills and a skill can be used by many firms; and a firm can have a need for many skills and the same skill can be required by many firms CONSULTANT ASSIGNMENT *consultid … FIRM *firmid … SKILL *skilldesc … 62 4 Fifth normal form (5NF) • The rule – If a consultant has a certain skill (e.g., database) and has a contract with the firm that requires that skill (e.g., IBM), then the consultant advises the firm on that skill (i.e., he advises IBM on database) CONSULTANT CONTRACT *consultid ... ADVISE FIRM *firmid ... SKILL REQUIRE *skilldesc ... 63 4 Why is this useful? • Multi Locations • Disaster Recovery • Query optimization 64 4 Denormalization • Normalization is one of many database design goals • Normalized table requirements – Additional processing – Loss of system speed • Normalization purity is difficult to sustain due to conflict in: – Design efficiency – Information requirements – Processing 65 4 Unnormalized Table Defects • Data updates less efficient • Indexing more cumbersome • No simple strategies for creating views 66 The Initial 1NF Structure 4 67 4 Identifying the Possible PK Attributes 68 4 Table Structures Based On The Selected PKs 69 4 70 4 Closing the Loop • Data Modeling, Table Design, and Normalization fit into an overall Systems Development Life Cycle (SDLC) that describes the history of any project. • The Database component(s) of the system exhibits its own independent Data Base Life Cycle (DBLC) • The Data Base Administrator’s job is to shepherd the database through the: Analysis > Design > Implementation > Support path that characterizes the SDLC. • At any two points in history the same data may be organized very differently. 71 Systems Development Life Cycle 4 72 Database Lifecycle (DBLC) 4 Figure 6.3 73 4 How it fits together 74 4 Execution • None of these paths are Prescribed in this methodology. • Time is often the enemy and multiple trips through the system slows the project down. • Remember the Brooks “Mythical Man Month” e.g. No nine women, working together in any known way, cannot produce a baby in one month. 75