Database Theory and Terminology, Part 2

How Many Tables?
• Databases for real businesses tend to have a lot of tables, but not always the right number.
• Normalization generally results in more tables.
• However, beginning database designers frequently create too many tables in ways that have nothing to do with normalization. The most common of these are:
  – Using two tables in a one-to-one relationship.
  – Making separate tables based on an attribute.

One-to-one Relationship
• A one-to-one relationship between two tables is when each record in one table corresponds to one or zero records in the other table.
• One-to-one relationships can legitimately be used in supertype/subtype situations (coming soon), and rarely in other situations.
• Beginners frequently use them unnecessarily, using two tables where only one is needed.
• The next slide gives an example.

• The two tables on top are in a one-to-one relationship on the StudentId field.
• This only complicates the database. The tables are easily combined into one.
BAD EXAMPLE!
BETTER!

Separating Tables by an Attribute
• The most common type of error (at least for 373 students) is creating multiple tables for a single entity, separating the records based on the value of a single attribute.
• This results in a database with a lot of tables which is slow and difficult to query.
• Several examples follow.

Too Many Tables
• It is not uncommon for beginning database designers to think that different tables are used to represent different categories.
• Here is a design for a database meant to hold the chemical elements.
BAD EXAMPLE!

• As you can see, each table has exactly the same fields.
• The only thing separating the tables is the “Series” of the elements: Actinides, NobleGases, Nonmetals, etc.
• By recognizing that Series is really just another attribute of elements, all of these tables can be combined into one table containing all elements.
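The combined design can be sketched in a few lines. This is only an illustration using Python's built-in sqlite3 module rather than Access. The Elements table and its Series attribute come from the slides, while the other column names (AtomicNumber, Symbol, Name) and the helper elements_in_series are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# One table for every element; Series is just another column.
# AtomicNumber, Symbol and Name are hypothetical column names.
conn.execute("""
    CREATE TABLE Elements (
        AtomicNumber INTEGER PRIMARY KEY,
        Symbol       TEXT NOT NULL,
        Name         TEXT NOT NULL,
        Series       TEXT NOT NULL
    )
""")
conn.executemany(
    "INSERT INTO Elements VALUES (?, ?, ?, ?)",
    [
        (2, "He", "Helium", "NobleGases"),
        (8, "O", "Oxygen", "Nonmetals"),
        (10, "Ne", "Neon", "NobleGases"),
        (92, "U", "Uranium", "Actinides"),
    ],
)


def elements_in_series(series):
    """Return the names of all elements in one series (hypothetical helper)."""
    rows = conn.execute(
        "SELECT Name FROM Elements WHERE Series = ? ORDER BY AtomicNumber",
        (series,),
    )
    return [name for (name,) in rows]


# What used to be a separate "NobleGases" table is now a simple filter.
noble = elements_in_series("NobleGases")

# A question that spans every series needs one query instead of one per table.
counts = dict(conn.execute("SELECT Series, COUNT(*) FROM Elements GROUP BY Series"))
```

With separate tables per series, the second query would need a UNION across every table, and adding a new series would mean adding a new table.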
• Adding a “Series” column allows all of the elements to be stored in a single table.
GOOD EXAMPLE!

Same fields? One table.
• Obviously, these “tables” came from the Elements table, which is where the data actually belongs.
• Note that the Elements table has all of the same fields as each table on the previous slide, plus a Series field. This allows elements from all series to be stored in a single table, which is more efficient and easier to query.
• At least at the level of chemistry we are looking at here, “Series” is an attribute of the “Element” entity, not an entity in itself.
• Breaking up a single entity into multiple tables based on one attribute is bad database design.

Same Fields--Baseball
BAD EXAMPLE!

BETTER EXAMPLE
• This is a big improvement over the previous slide; however,
• The “TEAM” field is not a good choice to be a part of the primary key, since it uses the names of the teams.
• If this were a database actually used by Major League Baseball or ESPN, teams would be assigned a TeamNumber surrogate key which would be used in all related tables (like players, schedules, results).

Same Fields--Players
BAD EXAMPLE!

OKAY EXAMPLE
• In a simple database, this could be an acceptable table. It has all of the sports in a single table, and it has a good primary key.
• However, in a more heavy-duty database, StudentName would likely be divided into LastName, FirstName, and MiddleInitial fields, and SportName would be replaced with a SportID foreign key field which would link to a Sports table.

Multi-Single Table Parents
BAD EXAMPLE!

BETTER EXAMPLE

Same Fields, Same Table
• If you have two tables that have exactly the same fields, they almost certainly represent the same entity. Therefore,
• The tables should be combined, adding a field to hold the attribute that you had used to separate them.

Different Fields? Different Tables.
• The Customers and Products entities from GuateTours have no attributes in common.
• Trying to put them into the same table would make no sense.
• It would also violate every conceivable level of normalization.

But…
Isn’t there something in between “all fields” and “no fields” in common?
• Good question! How about the Customers and Employees tables in GuateTours? They share three fields in common, and even the primary keys are pretty similar. Should we combine them into a single table or not?

Another good question!
• In this case, we could try to combine employees and customers into a single Persons table, with a “PersonType” field to tell us whether a particular record is an employee or a customer.
• However, this ends up with a lot of blank cells, and some confusion as well. Who is the next customer’s boss? What is employee Jose’s PartySize?

Separate Tables
• In this case, keeping Employees and Customers in separate tables is the right choice.
• They have enough different fields, and
• It is unlikely that anyone will run frequent queries to get information from both tables, such as a list of the names and phone numbers of all employees and customers.
• Although both are examples of people, to the business they are treated completely differently.
• Therefore, separate tables.

Super Types and Sub Types
• This example is based on pages 184 to 188 of “Databases Demystified” (available on reserve).
• Here’s the relationship diagram; explanation on the following slides:

Super Types and Sub Types
• The Customer table is called a “super type”; it contains the fields shared by all types of customers of a particular business.
• The IndividualCustomer and CommercialCustomer tables are called “sub types”; they contain fields specific to those types of customers.

Super Types and Sub Types
• Both sub type tables are linked to the Customer table with one-to-one relationships; every customer in either sub type is matched with a single record in the Customer table, and each customer in the Customer table appears at most once in a sub type table.
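A super type with two sub types might be sketched like this. It is a minimal illustration using Python's sqlite3 module, not the book's or Access's exact design. The table names and the CompanyType field come from the slides; CustomerID, Name, Phone, DateOfBirth and all sample rows are assumptions. Making the sub type's primary key also a foreign key to Customer is one common way to get the one-to-one link.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces foreign keys when asked

conn.executescript("""
    -- Super type: fields shared by every customer (column names are hypothetical).
    CREATE TABLE Customer (
        CustomerID INTEGER PRIMARY KEY,
        Name       TEXT NOT NULL,
        Phone      TEXT
    );
    -- Sub types: the primary key is also a foreign key to Customer,
    -- so each customer can appear at most once in each sub type table.
    CREATE TABLE IndividualCustomer (
        CustomerID  INTEGER PRIMARY KEY REFERENCES Customer(CustomerID),
        DateOfBirth TEXT
    );
    CREATE TABLE CommercialCustomer (
        CustomerID  INTEGER PRIMARY KEY REFERENCES Customer(CustomerID),
        CompanyType TEXT
    );
""")

# Made-up sample rows.
conn.execute("INSERT INTO Customer VALUES (1, 'Ana Lopez', '555-0101')")
conn.execute("INSERT INTO Customer VALUES (2, 'Volcano Coffee', '555-0102')")
conn.execute("INSERT INTO IndividualCustomer VALUES (1, '1990-04-12')")
conn.execute("INSERT INTO CommercialCustomer VALUES (2, 'Partnership')")

# Rebuild the "complete" individual customer by joining sub type to super type.
individuals = conn.execute("""
    SELECT c.CustomerID, c.Name, c.Phone, i.DateOfBirth
    FROM Customer AS c
    INNER JOIN IndividualCustomer AS i ON i.CustomerID = c.CustomerID
""").fetchall()

# A calling list for all customers only needs the super type table.
call_list = conn.execute(
    "SELECT Name, Phone FROM Customer ORDER BY CustomerID"
).fetchall()
```

Nothing in this sketch stops a customer from appearing in both sub type tables; if that matters to the business, it needs an extra rule.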
Super Types and Sub Types
• After you have learned to create queries in Access and in SQL, you will see that:
  – We could easily recreate a “complete” list of Individual Customers by running an INNER JOIN query between the IndividualCustomer and Customer tables.
  – We can also quickly prepare a mailing or calling list for all customers with a simple query on the Customer table.

One-to-One Relationships
• The relationship between a super type table and its related sub type tables is one-to-one.
• Each record in one table corresponds to at most one record in the related table.
• The relationship between a supertype and its subtypes is one of the few places where it is necessary or appropriate to have one-to-one relationships.

Super Types, Sub Types Summary
• Breaking up the Customer table into subtypes while retaining common fields in the super type Customer table makes sense.
  – It provides organization, recognizing that the two types of customers share attributes, but
  – It also avoids the confusion that would be caused if all customers were included in a single table (what is the CompanyType of an individual?).
  – For many purposes, a company will treat all customers the same way (mailings, sale prices).
  – In contrast, most businesses would not treat customers and employees the same way:
    • not only would many fields be different, but
    • how they are used in the database is different. Therefore,
    • keeping them in separate tables is appropriate.

Lookup Tables
• I cropped this part of the relationship diagram out of the earlier slides.
• This shows that the “CompanyType” field of the CommercialCustomer table is related to the only field in the CustomerTypes table.
• That table is called a “Lookup” table—a limited set of values from which a particular field should be chosen.

Lookup Tables
• It is also common to have a two-field lookup table—the allowable values along with a numeric primary key.
• The advantage of either type of lookup table is that it doesn’t allow database users to make up their own entries, which might be incorrect, misspelled, or otherwise inappropriate.

Lookup Tables
• The table below demonstrates what can happen if you use text fields instead of lookups.
• Try writing a query to find all sole proprietorships in that table! (Assuming there are a lot more records.) Actually, don’t.

There are constructs in programming very similar to lookup tables. Anyone know what they are called? (Jeopardy music…)

Redundancy is Bad in Tables, Not in Lectures!
• Good relational database design is about optimizing how the data is STORED, not how it is DISPLAYED.
• Most “tables” you have seen—in books, in lectures, on the web—were probably optimized for display, not for storage.
• Relational database tables are designed for consistency and to reduce redundancy. They are not designed for appearance.
• When we learn SQL and Visual Basic, we will look at various ways to display the data stored in relational database tables.

Relationships
• In the Guate Tours database, go to the Database Tools tab on the ribbon.
• Click on “Relationships”. You should see this:

What the relationship diagram shows
• This is the relationship diagram for this database.
• This diagram basically tells Access which fields in a table are foreign keys—that is, which fields are primary keys of other tables.
• For example, the EmployeeID field in the Tours table is a foreign key—it is linked to the primary key of the Employees table.
• The “1” and the “∞” symbols indicate that this relationship is “one-to-many”.
• That is, each tour has one employee, but each employee can work on many tours.

What Relationships Are
• The technical term for a relationship is “foreign-key constraint”.
• This means that when you place a value in a foreign-key field, it should have a matching primary-key value in the related table.
• For example, we assign an employee to a tour by putting his/her EmployeeID number in the EmployeeID field in the Tours table.
• The relationship (foreign-key constraint) requires that the matching EmployeeID already exists in the Employees table.

Examining Relationships
• If you right-click on one of the relationship lines, a context menu appears:
• Selecting “Edit Relationship” brings up this window:
• It shows the fields in the two tables that are related.

Enforce Referential Integrity
• “Enforce Referential Integrity” means that you are in a serious relationship; you’re not going to get out of this one easily!
• If you check this (as I will require you to do for assignments), Access will not allow you to enter a value which doesn’t exist in the related table.
• You see that the Tours table’s EmployeeID field is related to the Employees table’s primary key.
• Watch what happens if I try to assign a tour to a non-existent employee.

Access as Assistant
• “You cannot add or change a record because a related record is required in table ‘Employees.’”
• In other words, Access is telling me “You asked me to enforce referential integrity, and that’s what I’m doin’! You gotta problem with that?”
• Basically, Access is helping me to teach you about foreign keys. One of the things you’ll learn to hate about Access, but I’ve learned to like.

Creating Relationships
• To create relationships, you need to open the Relationships window. You do this from the Database Tools tab on the ribbon.
• The easiest way to create a relationship is to drag a field from one table to another. The relationship properties box will appear:

Referential Integrity Must Be Enforced!
• As I said before, I will require you to check the “Enforce Referential Integrity” box in your relationships. This will accomplish three things:
  1. It will protect the integrity of your data.
  2. It will give Access the opportunity to teach you a lesson or two.
  3. It will annoy and frustrate you at times.
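The same behavior can be reproduced outside Access. The sketch below uses Python's sqlite3 module as a stand-in, so the error text differs from the Access message. Employees, Tours and EmployeeID come from GuateTours; TourID, TourName, LastName, the sample rows and the helper assign_tour are assumptions for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Rough equivalent of checking "Enforce Referential Integrity":
# SQLite leaves foreign-key enforcement off unless this pragma is set.
conn.execute("PRAGMA foreign_keys = ON")

conn.executescript("""
    CREATE TABLE Employees (
        EmployeeID INTEGER PRIMARY KEY,
        LastName   TEXT NOT NULL          -- hypothetical column
    );
    CREATE TABLE Tours (
        TourID     INTEGER PRIMARY KEY,   -- hypothetical column
        TourName   TEXT NOT NULL,         -- hypothetical column
        EmployeeID INTEGER NOT NULL REFERENCES Employees(EmployeeID)
    );
""")
conn.execute("INSERT INTO Employees VALUES (1, 'Morales')")


def assign_tour(tour_id, tour_name, employee_id):
    """Try to add a tour; report whether the foreign-key constraint allowed it."""
    try:
        conn.execute(
            "INSERT INTO Tours VALUES (?, ?, ?)",
            (tour_id, tour_name, employee_id),
        )
        return "ok"
    except sqlite3.IntegrityError as err:
        return f"rejected: {err}"


first = assign_tour(1, "Tikal", 1)      # employee 1 exists
second = assign_tour(2, "Antigua", 99)  # employee 99 does not exist
tour_count = conn.execute("SELECT COUNT(*) FROM Tours").fetchone()[0]
```

The second call is refused and the Tours table still holds only the valid row, which is the protection the checkbox buys you.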
Cascading
• I don’t want you to check the two other checkboxes: Cascade Update Related Fields and Cascade Delete Related Records.
• Cascade Update might happen if you changed a primary key value. Perhaps you have a customer named Joe Superstitious, who just happens to have been assigned customer number 13. He thinks that’s bad luck, so you agree to change it for him. Cascade updates would cause all records in related tables (Orders, for example) to change CustomerID values of 13 to his new CustomerID.
• Maybe he’s so superstitious he won’t ever shop with you again; he wants to cancel his account. Cascade Delete would remove all related records, such as all the orders that Joe had placed over the years.
• There are other ways to deal with these situations (simply adding a True/False “Active” column to the Customers table does the trick). Cascade update and delete destroy data and are therefore dangerous and not recommended.

Relationship Types
• The most common types of relationships in databases are one-to-many and many-to-many.
• Oftentimes the distinction depends on how the business is run. In our example, the Employees to Tours relationship is one-to-many: One employee can work on many tours, but each tour has only one employee assigned.

Many-to-Many Relationships
• If your tours became larger, it is certainly possible that you might have more than one employee assigned to a tour. The relationship would then be many-to-many. One employee can work many tours, and one tour can have many employees.
• The Guate Tours database already has two many-to-many relationships: Customers-Tours, and Orders-Products. A tour usually has many customers, and a customer can sign up for many tours. An order can contain many types of products, and a particular product can be a part of many orders.
Representing Many-to-Many Relationships
• Access won’t allow you to directly define a many-to-many relationship (neither will any other DBMS).
• Many-to-many relationships are created using an intersection table: a table with a compound primary key which is composed of the primary keys of the two related tables.
• The intersection table is then related to each of the other tables with one-to-many relationships.

Many-to-Many Examples
• Look back at the Relationships diagram in GuateTours.
• The two intersection tables (which implement the many-to-many relationships) are CustomerTour and OrderDetails.
• Note that the primary key of CustomerTour includes the keys from the two related tables PLUS the TourDate (since a customer might take the same tour more than once).
• The primary key of OrderDetails is composed of the primary keys of Orders and Products. Quantity is included here because it is a property of the combination of the order and the product: How many of THIS product are included in THIS order.

A Business Decision
• Whether a relationship is one-to-many or many-to-many is frequently a business decision.
• Suppose that you buy your office supplies from Office Depot, Office Max, or Staples.
• For simplicity, you buy all of your paper from Office Depot, all of your printer supplies from Office Max, and all of your tacks and staples from Staples.
• In this case, each supplier supplies many products, but each product comes from only one supplier. This is a one-to-many relationship:

More flexible, more complex
• Using only one supplier for each product is simple, but it could be costing you money. Why not buy all products from all suppliers when they are on sale?
• This creates a many-to-many relationship:
• Note that Price has been moved to the intersection table, since the price for each product may vary from store to store.
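The supplier example could be sketched as follows, again with Python's sqlite3 module standing in for Access. The supplier names and the Price field come from the slides; the table and column names (Suppliers, Products, SupplierProducts and their IDs), the prices and the helper cheapest_supplier are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

conn.executescript("""
    CREATE TABLE Suppliers (
        SupplierID   INTEGER PRIMARY KEY,
        SupplierName TEXT NOT NULL
    );
    CREATE TABLE Products (
        ProductID   INTEGER PRIMARY KEY,
        ProductName TEXT NOT NULL
    );
    -- Intersection table: compound primary key built from the two foreign keys.
    -- Price lives here because it depends on BOTH the supplier and the product.
    CREATE TABLE SupplierProducts (
        SupplierID INTEGER NOT NULL REFERENCES Suppliers(SupplierID),
        ProductID  INTEGER NOT NULL REFERENCES Products(ProductID),
        Price      REAL NOT NULL,
        PRIMARY KEY (SupplierID, ProductID)
    );
    INSERT INTO Suppliers VALUES (1, 'Office Depot'), (2, 'Office Max'), (3, 'Staples');
    INSERT INTO Products  VALUES (1, 'Paper'), (2, 'Staples');
    -- Made-up prices.
    INSERT INTO SupplierProducts VALUES
        (1, 1, 5.25), (2, 1, 4.75), (3, 1, 5.00),
        (2, 2, 2.50), (3, 2, 2.25);
""")


def cheapest_supplier(product_name):
    """Return (SupplierName, Price) for the lowest price on record, or None."""
    return conn.execute(
        """
        SELECT s.SupplierName, sp.Price
        FROM SupplierProducts AS sp
        JOIN Suppliers AS s ON s.SupplierID = sp.SupplierID
        JOIN Products  AS p ON p.ProductID  = sp.ProductID
        WHERE p.ProductName = ?
        ORDER BY sp.Price
        LIMIT 1
        """,
        (product_name,),
    ).fetchone()


paper_deal = cheapest_supplier("Paper")
staples_deal = cheapest_supplier("Staples")
```

This is the flexibility the many-to-many design pays for: the question "who is cheapest right now?" cannot even be asked when each product is tied to a single supplier.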
Many-to-Many
• It is harder to design many-to-many relationships, and to write application code for them; however,
• Chances are that in many cases where you think that a one-to-many relationship is enough, you will eventually need the flexibility of a many-to-many relationship.
  – Will employees really do ONLY one thing?
  – Will players play ONLY one position?
• If the answer is “Maybe not,” use a many-to-many relationship.

Reflexive Relationships
• Sometimes, a field in a table relates to another field in the same table.
• This usually indicates some sort of hierarchy within the records in the table.
• In GuateTours, I added a BossID field to the Employees table. This field gets filled with the EmployeeID of that employee’s boss.
• Some DBMSs allow you to draw relationship diagrams which show reflexive relationships directly—an arrow from BossID up to EmployeeID.
• Access doesn’t let you do this. To show a reflexive relationship, you must show a second copy of the Employees table and create the relationship between the original and the copy.

Quick Review
• You have now been introduced to much of the theory and terminology of relational databases.
• Being comfortable with the terminology will be crucial to your understanding the theory and practice of database design using third normal form.
• Therefore, here’s a quick review of some of the definitions you’ve seen (and will need to know for the rest of this lecture, as well as for exams):

Definitions
• Database: A collection of interrelated data items that are managed as a single unit.
• Relational Database: A collection of tables, the relationships between them, and auxiliary items such as views and stored procedures. The tables are organized according to the principles first described by E.F. Codd.
• DBMS: Database Management System—the computer software that organizes the data on computers and manages access to it.
  Examples include Oracle, MySQL, DB2, and Microsoft’s SQL Server (for large-scale databases) and Access (for smaller databases).

Definitions
• Relation: A set of ordered tuples. Relations are represented by tables in databases (not by relationships!)
• Entity: A generic noun, representing a class of things, but not one particular thing.
• Attribute, Field, Column: Properties possessed by entities. These are known as “fields” or “columns” in database tables.

Definitions
• Tuple, Record, Row: The theorist’s “tuple” becomes a “record” or “row” in a database table.
• The three anomalies: Insert, Update, and Delete. These are caused by trying to store information about more than one entity in a single table. We’ll look at these further next week.
• 3NF: Third Normal Form. This will be the main topic for next week.

Definitions
• OLTP: Online Transaction Processing. This type of database is used in the day-to-day operation of a business. It is designed to handle frequent changes, frequent requests for small amounts of data, and multiple concurrent users. It is the type of database that requires 3NF, and what we will be discussing next week.
• OLAP: Online Analytical Processing. Databases composed of historical data which isn’t being constantly updated. OLAP databases are used for analyzing performance, not for day-to-day operations. They do not require 3NF.
• Normalization: Modifying the design of a database so that its tables are in 3NF.

Definitions
• Table Design: Defining the fields that make up a table, including identifying data types and assigning primary keys.
• Populating a Table: Adding rows of data to a table.
• Constraint: A restriction on values that can be entered into a column. Setting the data type is one type of constraint; adding numeric ranges or min/max text lengths is another; and primary and foreign keys are a third type of constraint.
• Primary Key: One or more columns in a table which (together) uniquely identify a row (distinguish it from all others in the table).
• Candidate Key: Any field or combination of fields that could serve as the primary key.

Definitions
• Simple Key: A primary key consisting of one field.
• Compound Key: A primary key consisting of two or more fields.
• Natural Key: A pre-existing or ready-made field which can serve as the primary key for a table.
• Surrogate Key: A field (usually numeric) added to a table as the primary key when no natural keys are available.

Definitions
• Foreign Key: A field (or fields) in a table that is not the primary key in that table, but IS the primary key in another table.
• Referential Integrity: This is a property of a relationship in Access which tells Access to take the relationship seriously by enforcing the foreign-key constraint. Entering a value in the foreign-key column of one table will require that that value already exist in the primary-key column of the other.
• Intersection Table: A table used to implement a many-to-many relationship. The primary key of the intersection table is the combination of the primary keys of the two related tables.
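Several of these definitions can be seen working together in one small schema. The sketch below uses Python's sqlite3 module rather than Access. CustomerTour and TourDate come from GuateTours; the other table and column names, the sample rows and the helper sign_up are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # enforce referential integrity

conn.executescript("""
    CREATE TABLE Customers (
        CustomerID   INTEGER PRIMARY KEY,   -- simple surrogate key
        CustomerName TEXT NOT NULL
    );
    CREATE TABLE Tours (
        TourID   INTEGER PRIMARY KEY,       -- simple surrogate key
        TourName TEXT NOT NULL
    );
    -- Intersection table: each ID is a foreign key, and together with
    -- TourDate they form the compound primary key.
    CREATE TABLE CustomerTour (
        CustomerID INTEGER NOT NULL REFERENCES Customers(CustomerID),
        TourID     INTEGER NOT NULL REFERENCES Tours(TourID),
        TourDate   TEXT    NOT NULL,
        PRIMARY KEY (CustomerID, TourID, TourDate)
    );
    INSERT INTO Customers VALUES (1, 'Ana');
    INSERT INTO Tours VALUES (10, 'Lake Atitlan');
""")


def sign_up(customer_id, tour_id, tour_date):
    """Return True if the booking was stored, False if a constraint refused it."""
    try:
        conn.execute(
            "INSERT INTO CustomerTour VALUES (?, ?, ?)",
            (customer_id, tour_id, tour_date),
        )
        return True
    except sqlite3.IntegrityError:
        return False


results = [
    sign_up(1, 10, "2024-03-01"),  # first booking
    sign_up(1, 10, "2024-06-15"),  # same customer and tour, different date
    sign_up(1, 10, "2024-03-01"),  # exact duplicate: violates the compound key
    sign_up(2, 10, "2024-03-01"),  # customer 2 does not exist: violates the foreign key
]
```

Leaving TourDate out of the primary key would make the second booking fail too, which is why the key includes it.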