CSIS 254 Oracle Normalization

Relational Databases (Review)
• In relational databases, all data is stored in tables, which correspond roughly to entities
• Each table is two-dimensional, consisting of rows and columns
• Each row in a table, called a tuple, corresponds to an occurrence of the entity
• Columns in each table contain similar data across all rows in the table

Relational Database Example
The following table is an example of a relational table describing classes that students have taken at a mythical college used in the rest of this lesson

Student Id  Student Name  Course Id  Course Name   Grade  Term    Teacher
0194327     Joe Adams     CSIS-840   VB Concepts   C      Spr-02  Wilkins
0194327     Joe Adams     CSIS-824   Intro to C++  B      Fal-02  Smythe
1850243     Joe Adams     CSIS-740   Oracle Admin  A      Spr-03  Wallace
1850243     Jane Smith    CSIS-941   Systems Des.  B      Fal-02  Evans
1850243     Jane Smith    CSIS-840   VB Concepts   B      Spr-02  Wolkins
8502432     Ida Know      CSIS-184   Networks      A      Sum-03  Farmer
7402943     Eunice Eye    CSIS-824   PowerPoint    W      Spr-02  Simpson

Relational Database Example
• Each row (or tuple) in the table describes a Class taken by a Student in a term at our college
• The data in each column is consistent throughout the table
• However, there are three inconsistencies in the table itself. Can you find them?

Primary Keys (Review)
• Each row in a table has a primary key, which is the column or set of columns identified to our DBMS that uniquely distinguishes it from every other row in the table
• No attribute value in a primary key can be NULL
• A table can have only one primary key
• If a primary key is not specified, Oracle does not create one for us; every row in an ordinary table still has a ROWID, but that is a physical row address rather than a primary key
• What would be the primary key for our sample database?
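The primary-key rules above can be tried out with a minimal sketch. It uses Python's built-in sqlite3 module as a stand-in for Oracle (an assumption; the lesson itself shows no SQL), and the student table and the try_insert helper are hypothetical names chosen for illustration. The table is deliberately not the sample table, so the question above stays open.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical STUDENT table. NOT NULL is spelled out because SQLite,
# unlike Oracle, does not always imply it for PRIMARY KEY columns.
conn.execute("""
    CREATE TABLE student (
        student_id   TEXT NOT NULL PRIMARY KEY,
        student_name TEXT NOT NULL
    )
""")


def try_insert(student_id, name):
    """Return True if the row is accepted, False if a key rule rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO student (student_id, student_name) VALUES (?, ?)",
                (student_id, name),
            )
        return True
    except sqlite3.IntegrityError:
        return False


results = [
    try_insert("0194327", "Joe Adams"),  # first row with this key
    try_insert("0194327", "Joe Adams"),  # same key again: not unique
    try_insert(None, "Ida Know"),        # NULL key value
    try_insert("8502432", "Ida Know"),   # new key
]
row_count = conn.execute("SELECT COUNT(*) FROM student").fetchone()[0]
print(results, row_count)
```

Only the first and last inserts succeed: the duplicate key and the NULL key are both rejected by the DBMS itself, without any checking code in the application.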
Foreign Keys (Review)
• An attribute (or group of attributes) in a table can also be a foreign key, meaning that it references the primary key (or at least a unique attribute) of another table
• An example would be a Customer Id attribute on an invoice header, which would reference the customer account information for that invoice

Normalization
Let’s begin our discussion of normalization with an example. We want to expand the sample relational table for our mythical college by tracking data for:
  – students
  – courses
  – departments
  – teachers
  – classes (courses offered during a term)
  – teachers assigned to each class
  – students enrolled in each class

Database Normalization Example
We might start off with an entity for each Class that looks something like this

CLASS
    Course Id
    Term Offered
    Department Name
    Course Description
    Classroom (or “Internet”)
    Credits / Hours
    Teacher Id
    Teacher Name
    Student #1 Data
    Student #2 Data
    ….
    Student #30 Data

Database Normalization Example
The information stored for each student would be

CLASS (exploded)
    …
    Student #1 Data
        Id
        Full Name
        e-mail Addresses
        Grade for Class
        GPA
    Student #2 Data
        Id
        Full Name
        e-mail Addresses
        Grade for Class
        GPA

What problems can you see with this scheme?

Problems with Our Example
• We can’t have more than 30 students in a class
• There’s lots of duplicate information in our tables
  – This design would require many updates whenever a change was made to data about a department, a teacher, a student, etc.
• Does it make sense for us to have to know, for example, a course number in order to look up a teacher’s name?

Problems with Our Example (continued)
• Removing a class entity occurrence might remove valuable information from our database
• We don’t have any data verification checks
  – We might wind up with inconsistent data across two or more records (is this necessarily bad if we are trying to take snapshots?)
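Two of the problems listed above, inconsistent duplicates and lost information on delete, can be reproduced with a short sketch. Assumptions: Python's sqlite3 module stands in for Oracle; class_flat is a hypothetical, cut-down version of the CLASS entity; and the teacher id T01, the Fal-03 offering, and the misspelling "Wallis" are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Cut-down, unnormalized CLASS entity: the teacher's name is repeated
# on every class that the teacher conducts.
conn.execute("""
    CREATE TABLE class_flat (
        course_id    TEXT NOT NULL,
        term         TEXT NOT NULL,
        teacher_id   TEXT NOT NULL,
        teacher_name TEXT NOT NULL,
        PRIMARY KEY (course_id, term)
    )
""")
conn.executemany(
    "INSERT INTO class_flat VALUES (?, ?, ?, ?)",
    [
        ("CSIS-740", "Spr-03", "T01", "Wallace"),
        ("CSIS-740", "Fal-03", "T01", "Wallace"),
    ],
)

# Update problem: the name is changed on one class row but not the other,
# so the same teacher id now carries two different names.
conn.execute(
    "UPDATE class_flat SET teacher_name = 'Wallis' "
    "WHERE course_id = 'CSIS-740' AND term = 'Spr-03'"
)
names_for_t01 = sorted(
    row[0]
    for row in conn.execute(
        "SELECT DISTINCT teacher_name FROM class_flat WHERE teacher_id = 'T01'"
    )
)
print(names_for_t01)

# Delete problem: removing the classes also removes the only record
# that teacher T01 exists at all.
conn.execute("DELETE FROM class_flat WHERE course_id = 'CSIS-740'")
rows_left_for_t01 = conn.execute(
    "SELECT COUNT(*) FROM class_flat WHERE teacher_id = 'T01'"
).fetchone()[0]
print(rows_left_for_t01)
```

Nothing in this design stops either outcome, which is the point of the "no data verification checks" bullet.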
Normalization Goal #1
Remove redundant data
• Duplicated data wastes disk space
• Duplicated data may not necessarily be consistent, that is, stored in exactly the same way
• Redundant data creates problems for our coders
  – Every copy of a value has to be stored (and changed) in exactly the same way in all locations, which is not only time-consuming for the system’s programmers but also takes computer resources to perform once the system is implemented

Normalization Goal #2
Remove dependency issues
• It is not intuitive for a user of our new system to look in the CLASS entity to find, for example, a student’s e-mail address.
• It would probably make more sense to move this information into a separate entity (i.e., a database table that defines students).

Normalization
The Bottom Line
“In summary, normal forms insure that we do not compromise the integrity of our data by either creating false data or destroying true data.”
Ensor & Stevenson

Forms of Normalization
• To accomplish these goals, we have created a set of rules which define normal forms or levels.
• There are five normal forms, each progressively more restrictive, which are called first normal form (1NF), second normal form (2NF), …
• Most database designers only consider the first three forms in their work, as we will
• As we shall see, there might be good reasons to deviate from these normal forms

First Normal Form (1NF)
• A database is in first normal form (1NF) if each attribute of the database is simple, single-valued (atomic), and does not repeat
  – Let’s assume column definitions are consistent across rows
• Method:
  – Reduce all attributes into atomic components
  – Eliminate duplicative columns (repeating groups) and multivalued attributes from the same table
  – Create a separate table for each group of related data
  – Identify each row with a unique column or set of columns (a primary key)

Our Sample Database
Here’s what our database entity for classes at our college currently looks like

CLASS
    Course Id
    Term Offered
    Department Name
    Course Description
    Classroom (or “Internet”)
    Credits / Hours
    Teacher Id
    Teacher Name
    Student #1 Data
    Student #2 Data
    ….
    Student #30 Data

Our Sample Database in 1NF
We should divide the Course Id into a Department Id and Course Number (e.g., Course Id “CSIS-254” would be divided into Department Id “CSIS”, Course Number “254”)
(Won’t this make the Department Name redundant?)

CLASS
    Department Id (added)
    Course Number (added)
    Term Offered
    Department Name
    Course Description
    Classroom (or “Internet”)
    Credits / Hours
    Teacher Id
    Teacher Last Name
    Student #1 Data
    Student #2 Data
    ….
    Student #30 Data

Our Sample Database in 1NF
Next, break out Student Ids, Names, e-mail Addresses, and Grades into a separate entity, eliminating the repeating Student groups.

CLASS / STUDENT
    Department Id
    Course Number
    Term Offered
    Student Id
    Student Full Name
    Student e-mail Addresses
    Student Grade for Class
    Student GPA

Our Sample Database in 1NF
We need to break down the Student’s Names into their simpler components

CLASS / STUDENT
    Department Id
    Course Number
    Term Offered
    Student Id
    Student Full Name
        First Name
        Middle Name
        Last Name
    Student e-mail Addresses
    Student Grade for Class
    Student GPA

Our Sample Database in 1NF
Finally, we need to break out Student e-mail Addresses into another entity, where each occurrence represents a single e-mail address

CLASS / STUDENT E-MAIL ADDRESS
    Department Id
    Course Number
    Term Offered
    Student Id
    Address Number or Id
    Student e-mail Address

Our Sample Database in 1NF

CLASS
    Department Id
    Course Number
    Term Offered
    Department Name
    Course Description
    Classroom (or “Internet”)
    Credits / Hours
    Teacher Id
    Teacher Last Name

CLASS / STUDENT
    Department Id
    Course Number
    Term Offered
    Student Id
    Student Full Name
        First Name
        Middle Name
        Last Name
    Student Grade for Class
    Student GPA

Our Sample Database in 1NF

CLASS / STUDENT
    Department Id
    Course Number
    Term Offered
    Student Id
    Student Full Name
        First Name
        Middle Name
        Last Name
    Student Grade for Class
    Student GPA

CLASS / STUDENT E-MAIL ADDRESS
    Department Id
    Course Number
    Term Offered
    Student Id
    Address Number or Id
    Student e-mail Address

1NF Advantages
• Removes limits artificially introduced into a database design by using repeating groups
• Ensures that attributes are broken into their most basic units and are not multi-valued

Exercise
Put the following table in 1NF, then draw an ERD for your new system

FAVORITE TV SHOWS
    TV Show Name
    Category
    Main Star Name #1
    Main Star Name #2
    Main Star Name #3
    Day and Time Shown
    Network Channel
    My Rating (1-10)

One Possible Answer

SHOW / STARS
    TV Show Name
    Star Number
    Star Name

FAVORITE TV SHOWS
    TV Show Name
    Category
    My Rating (1-10)

SHOW TIMES
    TV Show Name
    Slot Number
    Date and Time
    Network Channel

Second Normal Form (2NF)
• 2NF implies 1NF by definition
• Every non-key attribute must be fully dependent on the entire primary key
  – In other words, a non-key attribute cannot depend on only part of the primary key
  – This restriction applies only to tables with composite keys
• 2NF reduces redundant data in a table by extracting it, placing it in new table(s), then creating relationships between those tables.

Second Normal Form (2NF)
• Method:
  – Remove subsets of data that appear in multiple rows of a table, and place them into separate tables
  – Create relationships between these new tables and their predecessors through the use of foreign keys.

Our Sample Database in 2NF
We can break out the Department Name from the CLASS entity, as it will be the same for each Class having the same Department

DEPARTMENT
    Department Id
    Department Name

Our Sample Database in 2NF
We also can break out the Course Description from this entity, as it also will be the same for each Class referencing the same Course

COURSE
    Department Id
    Course Number
    Course Description
    Credits / Hours

Note that we’ve kept the Department Id in this entity. Why?
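The DEPARTMENT and COURSE break-out can be sketched as follows. Assumptions: Python's sqlite3 module stands in for Oracle; the table name class_offering, the department name "Computer Information Systems", the credit value 3, the Fal-02 offering, and the new description are all invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when asked

conn.executescript("""
    CREATE TABLE department (
        department_id   TEXT NOT NULL PRIMARY KEY,
        department_name TEXT NOT NULL
    );
    CREATE TABLE course (
        department_id      TEXT NOT NULL REFERENCES department (department_id),
        course_number      TEXT NOT NULL,
        course_description TEXT NOT NULL,
        credits            INTEGER NOT NULL,
        PRIMARY KEY (department_id, course_number)
    );
    -- What is left of CLASS for this sketch: one row per course per term.
    CREATE TABLE class_offering (
        department_id TEXT NOT NULL,
        course_number TEXT NOT NULL,
        term_offered  TEXT NOT NULL,
        PRIMARY KEY (department_id, course_number, term_offered),
        FOREIGN KEY (department_id, course_number)
            REFERENCES course (department_id, course_number)
    );
""")

conn.execute("INSERT INTO department VALUES ('CSIS', 'Computer Information Systems')")
conn.execute("INSERT INTO course VALUES ('CSIS', '840', 'VB Concepts', 3)")
conn.executemany(
    "INSERT INTO class_offering VALUES ('CSIS', '840', ?)",
    [("Spr-02",), ("Fal-02",)],
)

# The description now lives in exactly one row, so one UPDATE is enough.
conn.execute(
    "UPDATE course SET course_description = 'Visual Basic Concepts' "
    "WHERE department_id = 'CSIS' AND course_number = '840'"
)

# A join puts the pieces back together for reporting.
rows = conn.execute("""
    SELECT c.term_offered, co.course_description, d.department_name
    FROM class_offering AS c
    JOIN course AS co
      ON co.department_id = c.department_id
     AND co.course_number = c.course_number
    JOIN department AS d
      ON d.department_id = co.department_id
    ORDER BY c.term_offered
""").fetchall()
print(rows)
```

Both class rows pick up the new description through the join, without either of them being touched by the update.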

Our Sample Database in 2NF
We can also break out the information about each Teacher, since it also will be the same for each Class that a Teacher conducts, irrespective of the Class

TEACHER
    Teacher Id
    Teacher Last Name

Our Sample Database in 2NF
Our new CLASS / STUDENT entity can also have its student-related attributes (names and GPA) broken out, that is, attributes that do not change with the class number

STUDENT
    Student Id
    Student Full Name
        First Name
        Middle Name
        Last Name
    Student GPA

Our Sample Database in 2NF
Student e-mail Addresses are not dependent upon Department Id, Course Number, or Term, so remove those attributes from the e-mail entity

STUDENT E-MAIL ADDRESS
    Department Id (deleted)
    Course Number (deleted)
    Term Offered (deleted)
    Student Id
    Address Number or Id
    Student e-mail Address

Our Sample Database in 2NF
Our final CLASS / STUDENT entity, minus all of the attributes that have been moved to other entities, looks like

CLASS / STUDENT
    Department Id
    Course Id
    Student Id
    Term
    Student Grade for Class

2NF and Foreign Keys
• To ensure data integrity, we would implement four foreign keys in our CLASS and CLASS / STUDENT entities
  – Department Id must reference a row in DEPARTMENT
  – Course Id must reference a row in COURSE
  – Student Id must reference a row in STUDENT
  – Teacher Id must reference a row in TEACHER
• Would we implement a similar restriction on our student e-mail address entity?

2NF Advantages
• All advantages of 1NF
• Common data is forced to be consistent, since it is stored in only one place in the database
• We can store data about separate entities without implying the existence of others
  – In our original database design, we can’t store information about Students, Teachers, or Departments if we don’t have any classes in which they are involved.
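A reduced sketch of the foreign-key rule and of the last advantage above: only the Student Id reference is declared here, and the other three would follow the same pattern. Assumptions: Python's sqlite3 module stands in for Oracle; the teacher id T01, the student id 9999999, and the enroll helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # off by default in SQLite

conn.executescript("""
    CREATE TABLE student (
        student_id TEXT NOT NULL PRIMARY KEY,
        last_name  TEXT NOT NULL
    );
    CREATE TABLE teacher (
        teacher_id TEXT NOT NULL PRIMARY KEY,
        last_name  TEXT NOT NULL
    );
    CREATE TABLE class_student (
        department_id TEXT NOT NULL,
        course_id     TEXT NOT NULL,
        term          TEXT NOT NULL,
        student_id    TEXT NOT NULL REFERENCES student (student_id),
        grade         TEXT,
        PRIMARY KEY (department_id, course_id, term, student_id)
    );
""")

with conn:
    # A teacher and a student can now exist before any class involves them.
    conn.execute("INSERT INTO teacher VALUES ('T01', 'Farmer')")
    conn.execute("INSERT INTO student VALUES ('8502432', 'Know')")


def enroll(student_id):
    """Return True if the enrollment is stored, False if a constraint rejects it."""
    try:
        with conn:
            conn.execute(
                "INSERT INTO class_student VALUES ('CSIS', '184', 'Sum-03', ?, 'A')",
                (student_id,),
            )
        return True
    except sqlite3.IntegrityError:
        return False


known_student = enroll("8502432")    # Student Id exists in STUDENT
unknown_student = enroll("9999999")  # no such row in STUDENT
teacher_count = conn.execute("SELECT COUNT(*) FROM teacher").fetchone()[0]
enrollment_count = conn.execute("SELECT COUNT(*) FROM class_student").fetchone()[0]
print(known_student, unknown_student, teacher_count, enrollment_count)
```

The enrollment for the unknown student is refused by the database, while the teacher row is stored even though no class refers to it yet.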

Exercise
Convert the following table into 2NF, and draw a new ERD

SALES ORDER
    Order Number
    Customer Account Number
    Customer Account Name
    Customer Address
    Date of Entry of Order
    Date of Requested Shipment
    Item Numbers
    Item Descriptions
    Quantities Ordered
    Unit Prices
    Extended Prices
    Total Order Price

Third Normal Form (3NF)
• 3NF implies 2NF (which implies 1NF)
• A database is in third normal form (3NF) if every non-key column of each row (occurrence) in a table (entity) is dependent ONLY upon the key
  – In general, any time the contents of a group of fields may apply to more than a single record in the table, consider placing those fields in a separate table.
  – This means that derived attributes are not allowed in 3NF

Third Normal Form (3NF)
• All attributes depend upon the key, the whole key, and nothing but the key
• Method:
  – Remove all derived columns
  – Move all remaining columns not dependent on the key into a new table

Our Sample Database in 3NF
Our STUDENT entity cannot contain a GPA, since that is a derived attribute (the average of all of the Grades received)

STUDENT
    Student Id
    Student Names
        First Name
        Middle Name
        Last Name
    Student GPA (deleted)

Advantages of 3NF
• All advantages of 1NF and 2NF
• Information is stored in one and only one place in the database
• All entities are now 2-dimensional, non-redundant, and can be implemented in relational tables

Disadvantages of Normalization
• Proliferation of tables, resulting in increased system complexity
  – Can be overcome with views for end-users
• Performance hits through added tables and lack of derived attributes
  – May be partially offset by reduced computing needs of maintaining data only once
• We will discuss these in detail next week...

Last Slide
Next Week’s Assignment
• Draw a complete ERD for our normalized 3NF mythical college database. Does it make sense to you?
• Normalize the two organizations / systems that you used in last week’s homework by updating their ERDs (Engineering Method only).
• Introduce at least two derived attributes that you might include in your design, and explain why.
• Prepare for a quiz next week on what we have covered so far in class: Stages of SDLC, Entities, Attributes, Relationships, Diagramming, and Normalization
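
As background for the derived-attribute question, here is a sketch of how the GPA removed from STUDENT in 3NF could be computed on demand through a view instead of being stored. Assumptions: Python's sqlite3 module stands in for Oracle; the grade_points table, the 4-point scale, the unweighted average, and the exclusion of W grades are hypothetical, since the lesson does not say how the college calculates GPA; the Sum-03 row for student 0194327 is invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE class_student (
        student_id TEXT NOT NULL,
        course_id  TEXT NOT NULL,
        term       TEXT NOT NULL,
        grade      TEXT NOT NULL,
        PRIMARY KEY (student_id, course_id, term)
    );
    -- Hypothetical lookup table; grades without a row here (such as W)
    -- drop out of the join and so do not count toward the average.
    CREATE TABLE grade_points (
        grade  TEXT NOT NULL PRIMARY KEY,
        points REAL NOT NULL
    );
    -- GPA is never stored; it is derived each time the view is queried.
    CREATE VIEW student_gpa AS
        SELECT cs.student_id, AVG(gp.points) AS gpa
        FROM class_student AS cs
        JOIN grade_points AS gp ON gp.grade = cs.grade
        GROUP BY cs.student_id;
""")
conn.executemany(
    "INSERT INTO grade_points VALUES (?, ?)",
    [("A", 4.0), ("B", 3.0), ("C", 2.0), ("D", 1.0), ("F", 0.0)],
)
conn.executemany(
    "INSERT INTO class_student VALUES (?, ?, ?, ?)",
    [
        ("0194327", "CSIS-840", "Spr-02", "C"),
        ("0194327", "CSIS-824", "Fal-02", "B"),
        ("7402943", "CSIS-824", "Spr-02", "W"),
    ],
)

gpas_before = dict(conn.execute("SELECT student_id, gpa FROM student_gpa"))
print(gpas_before)

# A new grade (invented row) changes the GPA with no second update to make.
conn.execute("INSERT INTO class_student VALUES ('0194327', 'CSIS-184', 'Sum-03', 'A')")
gpa_after = conn.execute(
    "SELECT gpa FROM student_gpa WHERE student_id = '0194327'"
).fetchone()[0]
print(gpa_after)
```

The trade-off is the one named under the disadvantages of normalization: the value can never be stale, but it is recalculated on every read.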