
Database Theory and
Terminology, Part 2
How Many Tables?
• Databases for real businesses tend to have a
lot of tables, but not always the right number.
• Normalization generally results in more tables.
• However, beginning database designers
frequently create too many tables in ways that
have nothing to do with normalization. The
most common of these are:
– Using two tables in a one-to-one relationship.
– Making separate tables based on an attribute.
One-to-one Relationship
• A one-to-one relationship between two tables is when each
record in one table corresponds to one or zero records in
the other table.
• One-to-one relationships can legitimately be used in
supertype/subtype situations (coming soon), and rarely in
other situations.
• Beginners frequently use them unnecessarily, using two
tables where only one is needed.
• The next slide gives an example.
• The two tables on top are in a one-to-one relationship on
the StudentId field.
• This only complicates the database. The tables are easily
combined into one.
[Figure: BAD EXAMPLE! (two tables related one-to-one on StudentId) and BETTER! (one combined table)]
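The slide's tables are not reproduced here, so the following is only a minimal sketch of the same idea, using Python's built-in sqlite3 module in place of Access. The table names Students, StudentContacts, and StudentsCombined, the StudentName and Phone columns, and the sample rows are all hypothetical; only the StudentId field comes from the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical "bad" design: one student split across two tables.
    CREATE TABLE Students (StudentId INTEGER PRIMARY KEY, StudentName TEXT);
    CREATE TABLE StudentContacts (StudentId INTEGER PRIMARY KEY, Phone TEXT);
    INSERT INTO Students VALUES (1, 'Ana'), (2, 'Luis');
    INSERT INTO StudentContacts VALUES (1, '555-0101'), (2, '555-0102');

    -- Combined design: the same facts in a single table.
    CREATE TABLE StudentsCombined (
        StudentId   INTEGER PRIMARY KEY,
        StudentName TEXT,
        Phone       TEXT
    );
    INSERT INTO StudentsCombined
        SELECT s.StudentId, s.StudentName, c.Phone
        FROM Students AS s
        LEFT JOIN StudentContacts AS c ON c.StudentId = s.StudentId;
""")

# The split design needs a join just to list each student with a phone number.
split_rows = conn.execute("""
    SELECT s.StudentId, s.StudentName, c.Phone
    FROM Students AS s
    JOIN StudentContacts AS c ON c.StudentId = s.StudentId
    ORDER BY s.StudentId
""").fetchall()

# The combined design answers the same question with a plain SELECT.
combined_rows = conn.execute(
    "SELECT StudentId, StudentName, Phone FROM StudentsCombined ORDER BY StudentId"
).fetchall()
```

Both queries return the same rows; the second table added a join to every query without storing anything new.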
Separating Tables by an Attribute
• The most common type of error (at least for
373 students) is creating multiple tables for a
single entity, separating the records based on
the value of a single attribute.
• This results in a database with a lot of tables,
which is slow and difficult to query.
• Several examples follow.
Too Many Tables
• It is not uncommon for beginning database designers to think that
different tables are used to represent different categories.
• Here is a design for a database meant to hold the chemical elements.
[Figure: BAD EXAMPLE! One table per series of elements]
• As you can see, each table has exactly the same fields.
• The only thing separating the tables is the “Series” of the elements—
Actinides, NobleGases, Nonmetals, etc.
• By recognizing that Series is really just another attribute of elements, all of
these tables can be combined into one table containing all elements.
• Adding a “Series” column allows all of the
elements to be stored in a single table.
[Figure: GOOD EXAMPLE! A single Elements table with a Series column]
Same fields? One table.
• Obviously, these “tables” came from the Elements
table, which is where the data actually belongs.
• Note that the Elements table has all of the same fields as each table
on the previous slide, plus a Series field. This allows elements from
all series to be stored in a single table, which is more efficient and
easier to query.
• At least at the level of chemistry we are looking at here, “Series” is
an attribute of the “Element” entity; not an entity in itself.
• Breaking up a single entity into multiple tables based on one
attribute is bad database design.
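To see why the single Elements table is easier to query, here is a small sketch using Python's sqlite3 module rather than Access. The column names AtomicNumber, Symbol, and Name are assumptions (the slide's actual fields are not shown here); Series and the series names come from the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Elements (
        AtomicNumber INTEGER PRIMARY KEY,
        Symbol       TEXT NOT NULL,
        Name         TEXT NOT NULL,
        Series       TEXT NOT NULL   -- the attribute that used to split the tables
    );
    INSERT INTO Elements VALUES
        (2,  'He', 'Helium',  'NobleGases'),
        (8,  'O',  'Oxygen',  'Nonmetals'),
        (10, 'Ne', 'Neon',    'NobleGases'),
        (92, 'U',  'Uranium', 'Actinides');
""")

# One series: a WHERE clause replaces the separate NobleGases table.
noble_gases = conn.execute(
    "SELECT Symbol FROM Elements WHERE Series = ? ORDER BY AtomicNumber",
    ("NobleGases",),
).fetchall()

# All series at once: impossible without UNIONs in the one-table-per-series design.
series_counts = conn.execute(
    "SELECT Series, COUNT(*) FROM Elements GROUP BY Series ORDER BY Series"
).fetchall()
```

With one table per series, the second query would need a UNION across every table, and a new series would mean a new table instead of a new row.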
Same Fields--Baseball
[Figure: BAD EXAMPLE! and BETTER EXAMPLE]
• This is a big improvement over the previous slide; however,
• The “TEAM” field is not a good choice to be a part of the primary key, since it uses the names of the
teams.
• If this were a database actually used by Major League Baseball or ESPN, teams would be assigned a
TeamNumber surrogate key which would be used in all related tables (like players, schedules, results).
Same Fields--Players
[Figure: BAD EXAMPLE! and OKAY EXAMPLE]
• In a simple database, this could be an acceptable table. It has all of the sports in a single table, and
it has a good primary key.
• However, in a more heavy-duty database, StudentName would likely be divided into LastName,
FirstName, and MiddleInitial fields, and SportName would be replaced with a SportID foreign key
field which would link to a Sports table.
Multi-Single Table Parents
[Figure: BAD EXAMPLE! and BETTER EXAMPLE]
Same Fields, Same Table
• If you have two tables that have exactly the
same fields, they almost certainly represent
the same entity. Therefore,
• The tables should be combined, adding a field
to hold the attribute that you had used to
separate them.
Different Fields? Different Tables.
• The Customers and Products entities from
GuateTours have no attributes in common.
• Trying to put them into the same table would
make no sense.
• It would also violate every conceivable level of
normalization.
But… Isn’t there something in between
“all fields” and “no fields” in common?
• Good question! How about the Customers
and Employees tables in GuateTours?
• They share three fields in common, and even the
primary keys are pretty similar. Should we
combine them into a single table or not?
Another good question!
• In this case, we could try to
combine employees and
customers into a single Persons
table, with a “PersonType” field
to tell us whether a particular
record is an employee or a
customer.
• However, this ends up with a lot
of blank cells, and some
confusion as well. Who is the
next customer’s boss? What is
employee Jose’s PartySize?
Separate Tables
• In this case, keeping Employees and Customers in
separate tables is the right choice.
• They have enough different fields, and
• It is unlikely that anyone will run frequent queries
to get information from both tables, such as a list
of the names and phone numbers of all
employees and customers.
• Although both are examples of people, to the
business they are treated completely differently.
• Therefore, separate tables.
Super Types and Sub Types
• This example is based
on pages 184 to 188 of
“Databases
Demystified” (available
on reserve).
• Here’s the relationship
diagram; explanation on
the following slides:
Super Types and Sub Types
• The Customer table is
called a “super type”; it
contains the fields shared
by all types of customers
of a particular business.
• The IndividualCustomer
and CommercialCustomer
tables are called “sub
types”; they contain fields
specific to those types of
customers.
Super Types and Sub Types
• Both sub type tables are
linked to the Customer
table with one-to-one
relationships; every
customer in either sub
type is matched with a
single record in the
Customer table, and each
customer in the Customer
table appears at most
once in a sub type table.
Super Types and Sub Types
• After you have learned to
create queries in Access and
using SQL, you will see that:
– We could easily recreate a
“complete” list of Individual
Customers by running an
INNER JOIN query between
the IndividualCustomer and
Customer tables.
– We can also quickly prepare a
mailing or calling list for all
customers with a simple query
on the Customer table.
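As a preview of those two queries, here is a minimal sketch in Python's sqlite3 module (not Access). The table names Customer, IndividualCustomer, and CommercialCustomer and the CompanyType field come from the slides; the remaining columns (Name, Phone, DateOfBirth) and the sample rows are assumptions made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces foreign keys when asked
conn.executescript("""
    -- Super type: fields shared by every customer.
    CREATE TABLE Customer (
        CustomerID INTEGER PRIMARY KEY,
        Name       TEXT NOT NULL,
        Phone      TEXT
    );
    -- Sub types: the primary key is also a foreign key, which makes the link one-to-one.
    CREATE TABLE IndividualCustomer (
        CustomerID  INTEGER PRIMARY KEY REFERENCES Customer(CustomerID),
        DateOfBirth TEXT
    );
    CREATE TABLE CommercialCustomer (
        CustomerID  INTEGER PRIMARY KEY REFERENCES Customer(CustomerID),
        CompanyType TEXT
    );
    INSERT INTO Customer VALUES (1, 'Ana Lopez', '555-0101'), (2, 'Tienda Maya', '555-0102');
    INSERT INTO IndividualCustomer VALUES (1, '1990-04-12');
    INSERT INTO CommercialCustomer VALUES (2, 'Sole Proprietorship');
""")

# "Complete" individual customers: shared fields plus sub type fields.
individuals = conn.execute("""
    SELECT c.CustomerID, c.Name, i.DateOfBirth
    FROM Customer AS c
    INNER JOIN IndividualCustomer AS i ON i.CustomerID = c.CustomerID
    ORDER BY c.CustomerID
""").fetchall()

# Calling list for all customers: the super type alone is enough.
calling_list = conn.execute(
    "SELECT Name, Phone FROM Customer ORDER BY CustomerID"
).fetchall()
```

Declaring CustomerID as both the primary key and a foreign key in each sub type table is one common way to get the "at most once" behavior described on the earlier slide.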
One-to-One Relationships
• The relationship between a super type table
and its related sub type tables is one-to-one.
• Each record in one table corresponds to at
most one record in the related table.
• The relationship between a supertype and its
subtypes is one of the few places where it is
necessary or appropriate to have one-to-one
relationships.
Super Types, Sub Types Summary
• Breaking up the Customer table into subtypes while
retaining common fields in the super type Customer table
makes sense.
– It provides organization, recognizing that the two types of
customers share attributes, but
– It also avoids the confusion that would be caused if all
customers were included in a single table (what is the
CompanyType of an individual?).
– For many purposes, a company will treat all customers the same
way (mailings, sale prices).
– In contrast, most businesses would not treat customers and
employees the same way:
• not only would many fields be different, but
• how they are used in the database is different. Therefore,
• keeping them in separate tables is appropriate.
Lookup Tables
• I cropped this part of the
relationship diagram out of the
earlier slides.
• This shows that the
“CompanyType” field of the
CommercialCustomer table is
related to the only field in the
CustomerTypes table.
• That table is called a “Lookup”
table—a limited set of values
from which a particular field
should be chosen.
Lookup Tables
• It is also common to have a
two-field lookup table—the
allowable values along with a
numeric primary key.
• The advantage of either type
of lookup table is that it
doesn’t allow database users
to make up their own entries,
which might be incorrect,
misspelled, or otherwise
inappropriate.
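A one-field lookup table can be sketched as follows with Python's sqlite3 module (again standing in for Access). CustomerTypes, CommercialCustomer, and CompanyType are the names used in the slides; the name of the lookup table's single field, the CompanyName column, and the list of allowed values are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # needed for SQLite to enforce the lookup
conn.executescript("""
    -- One-field lookup table: the allowed values are the primary key.
    CREATE TABLE CustomerTypes (CompanyType TEXT PRIMARY KEY);
    INSERT INTO CustomerTypes VALUES
        ('Corporation'), ('Partnership'), ('Sole Proprietorship');

    CREATE TABLE CommercialCustomer (
        CustomerID  INTEGER PRIMARY KEY,
        CompanyName TEXT NOT NULL,
        CompanyType TEXT NOT NULL REFERENCES CustomerTypes(CompanyType)
    );
""")

# A value that exists in the lookup table is accepted.
conn.execute(
    "INSERT INTO CommercialCustomer VALUES (1, 'Tienda Maya', 'Sole Proprietorship')"
)

# A made-up or misspelled value is rejected by the foreign-key constraint.
try:
    conn.execute("INSERT INTO CommercialCustomer VALUES (2, 'Cafe Luna', 'Sole Propietor')")
    misspelling_rejected = False
except sqlite3.IntegrityError:
    misspelling_rejected = True

# Because every stored value is spelled the same way, an exact match finds them all.
sole_proprietorships = conn.execute(
    "SELECT CompanyName FROM CommercialCustomer WHERE CompanyType = 'Sole Proprietorship'"
).fetchall()
```

The two-field variant would add a numeric primary key to CustomerTypes and store that number in CommercialCustomer instead of the text.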
Lookup tables
• The table below demonstrates what can happen if
you use text fields instead of lookups.
• Try writing a query to find all sole proprietorships in
that table! (Assuming there are a lot more records.)
Actually, don’t.
• There are constructs in programming very
similar to lookup tables. Anyone know what
they are called? (Jeopardy music…)
Redundancy is Bad in Tables, Not in
Lectures!
• Good relational database design is about
optimizing how the data is STORED, not how it is
DISPLAYED.
• Most “tables” you have seen—in books, in
lectures, on the web—were probably optimized
for display, not for storage.
• Relational database tables are designed for
consistency and to reduce redundancy. They are
not designed for appearance.
• When we learn SQL and Visual Basic, we will look
at various ways to display the data stored in
relational database tables.
Relationships
• In the GuateTours database, go to the Database
Tools tab on the ribbon.
• Click on “Relationships”. You should see this:
What the relationship diagram shows
• This is the relationship diagram for this database.
• This diagram basically tells Access which fields in
a table are foreign keys—that is, which fields are
primary keys of other tables.
• For example, the EmployeeID field in the Tours
table is a foreign key—it is linked to the primary
key of the Employees table.
• The “1” and the “∞” symbols indicate that this
relationship is “one-to-many”.
• That is, each tour has one employee, but each
employee can work on many tours.
What Relationships Are
• The technical term for a relationship is “foreign-key constraint”.
• This means that when you place a value in a
foreign-key field, it should have a matching
primary-key value in the related table.
• For example, we assign an employee to a tour by
putting his/her EmployeeID number in the
EmployeeID field in the Tours table.
• The relationship (foreign-key constraint) requires
that the matching EmployeeID already exists in
the Employees table.
Examining Relationships
• If you right-click on one of the relationship lines, a context menu appears:
• Selecting “Edit Relationship” brings up this window:
• It shows the fields in the
two tables that are
related.
Enforce Referential Integrity
• “Enforce Referential Integrity” means that you
are in a serious relationship; you’re not going to
get out of this one easily!
• If you check this (as I will require you to do for
assignments), Access will not allow you to enter a
value which doesn’t exist in the related table.
• You see that the Tours table’s EmployeeID field is
related to the Employees table’s primary key.
• Watch what happens if I try to assign a tour to a
non-existent employee.
Access as Assistant
• “You cannot add or change a record because a
related record is required in table ‘Employees.’”
• In other words, Access is telling me “You asked
me to enforce referential integrity, and that’s
what I’m doin’! You gotta problem with that?”
• Basically, Access is helping me to teach you
about foreign keys. One of the things you’ll
learn to hate about Access, but I’ve learned to
like.
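The same behavior can be reproduced outside Access. The sketch below uses Python's sqlite3 module, where the `PRAGMA foreign_keys` setting plays roughly the role of the “Enforce Referential Integrity” checkbox; that analogy is mine, not a statement about how Access works internally. The Employees and Tours tables and the EmployeeID field come from GuateTours; the other columns, the tour name, and the helper `try_orphan_tour` are hypothetical.

```python
import sqlite3

SCHEMA = """
    CREATE TABLE Employees (
        EmployeeID INTEGER PRIMARY KEY,
        FirstName  TEXT NOT NULL
    );
    CREATE TABLE Tours (
        TourID     INTEGER PRIMARY KEY,
        TourName   TEXT NOT NULL,
        EmployeeID INTEGER NOT NULL REFERENCES Employees(EmployeeID)
    );
    INSERT INTO Employees VALUES (1, 'Jose');
"""

def try_orphan_tour(enforce):
    """Try to assign a tour to EmployeeID 99, which does not exist."""
    conn = sqlite3.connect(":memory:")
    conn.execute(f"PRAGMA foreign_keys = {'ON' if enforce else 'OFF'}")
    conn.executescript(SCHEMA)
    try:
        conn.execute("INSERT INTO Tours VALUES (1, 'Volcano Hike', 99)")
        return "accepted"
    except sqlite3.IntegrityError as exc:
        return f"rejected: {exc}"
    finally:
        conn.close()

enforced_result = try_orphan_tour(enforce=True)     # constraint enforced
unenforced_result = try_orphan_tour(enforce=False)  # constraint declared but ignored
```

With enforcement off, the tour is stored with an EmployeeID that points at nobody, which is exactly the kind of bad data the checkbox is meant to prevent.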
Creating Relationships
• To create relationships, you need to open the
Relationships window. You do this from the
Database Tools tab on the ribbon.
• The easiest way to create a relationship is to
drag a field from one table to another. The
relationship properties box will appear:
Referential Integrity Must Be Enforced!
• As I said before, I will require you to check the
“Enforce Referential Integrity” box in your
relationships. This will accomplish three
things:
1. It will protect the integrity of your data.
2. It will give Access the opportunity to teach
you a lesson or two.
3. It will annoy and frustrate you at times.
Cascading
• I don’t want you to check the two other checkboxes: Cascade Update
Related Fields and Cascade Delete Related Records.
• Cascade Update might happen if you changed a primary key value.
Perhaps you have a customer named Joe Superstitious, who just happens
to have been assigned customer number 13. He thinks that’s bad luck, so
you agree to change it for him. Cascade updates would cause all records in
related tables (Orders, for example) to change CustomerID values of 13 to
his new CustomerID.
• Maybe he’s so superstitious he won’t ever shop with you again; he wants
to cancel his account. Cascade Delete would remove all related records,
such as all the orders that Joe had placed over the years.
• There are other ways to deal with these situations (simply adding a
True/False “Active” column to the Customers table does the trick).
Cascade update and delete destroy data and are therefore dangerous and
not recommended.
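Joe's case can be sketched with Python's sqlite3 module, where `ON DELETE CASCADE` is the closest equivalent of Access's Cascade Delete Related Records. The Customers and Orders tables, CustomerID, and the “Active” column come from the slide; OrderID, Name, the two sample orders, and the helper functions are assumptions.

```python
import sqlite3

def make_db(cascade):
    """Build a tiny Customers/Orders database, with or without cascading deletes."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    on_delete = "ON DELETE CASCADE" if cascade else ""
    conn.executescript(f"""
        CREATE TABLE Customers (
            CustomerID INTEGER PRIMARY KEY,
            Name       TEXT NOT NULL,
            Active     INTEGER NOT NULL DEFAULT 1   -- 1 = True, 0 = False
        );
        CREATE TABLE Orders (
            OrderID    INTEGER PRIMARY KEY,
            CustomerID INTEGER NOT NULL REFERENCES Customers(CustomerID) {on_delete}
        );
        INSERT INTO Customers (CustomerID, Name) VALUES (13, 'Joe Superstitious');
        INSERT INTO Orders VALUES (1, 13), (2, 13);
    """)
    return conn

def order_count(conn):
    return conn.execute("SELECT COUNT(*) FROM Orders").fetchone()[0]

# Cascade delete: removing Joe silently removes his order history too.
cascade_db = make_db(cascade=True)
cascade_db.execute("DELETE FROM Customers WHERE CustomerID = 13")
orders_after_cascade = order_count(cascade_db)

# "Active" flag: Joe is retired from the customer list, but his orders survive.
flag_db = make_db(cascade=False)
flag_db.execute("UPDATE Customers SET Active = 0 WHERE CustomerID = 13")
orders_after_flag = order_count(flag_db)
active_customers = flag_db.execute(
    "SELECT COUNT(*) FROM Customers WHERE Active = 1"
).fetchone()[0]
```

After the cascade, the order count drops to zero; with the flag, both orders are still there for sales reports.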
Relationship Types
• The most common types of relationships in
databases are one-to-many and many-to-many.
• Oftentimes the distinction depends on how
the business is run. In our example, the
Employees to Tours relationship is one-to-many:
One employee can work on many tours,
but each tour has only one employee
assigned.
Many-to-Many Relationships
• If your tours became larger, it is certainly possible that
you might have more than one employee assigned to a
tour. The relationship would then be many-to-many.
One employee can work many tours, and one tour can
have many employees.
• The GuateTours database already has two many-to-many
relationships: Customers-Tours and Orders-Products.
A tour usually has many customers, and a
customer can sign up for many tours. An order can
contain many types of products, and a particular
product can be a part of many orders.
Representing Many-to-Many
Relationships
• Access won’t allow you to directly define a many-to-many
relationship (neither will any other DBMS).
• Many-to-many relationships are created using an
intersection table: a table with a compound
primary key which is composed of the primary
keys of the two related tables.
• The intersection table is then related to each of
the other tables with one-to-many relationships.
Many-to-Many Examples
• Look back at the Relationships diagram in GuateTours.
• The two intersection tables (which implement the many-to-many
relationships) are CustomerTour and OrderDetails.
• Note that the primary key of CustomerTour includes the
keys from the two related tables PLUS the TourDate (since a
customer might take the same tour more than once).
• The primary key of OrderDetails is composed of the
primary keys of Orders and Products. Quantity is included
here because it is a property of the combination of the
order and the product: How many of THIS product are
included in THIS order.
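The CustomerTour intersection table can be sketched like this in Python's sqlite3 module. CustomerTour, TourDate, and the idea of a three-part compound key come from the slide; the exact key field names (CustomerID, TourID), the Name and TourName columns, and all sample rows and dates are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE Customers (CustomerID INTEGER PRIMARY KEY, Name TEXT NOT NULL);
    CREATE TABLE Tours (TourID INTEGER PRIMARY KEY, TourName TEXT NOT NULL);

    -- Intersection table: two one-to-many links plus TourDate in the compound key.
    CREATE TABLE CustomerTour (
        CustomerID INTEGER NOT NULL REFERENCES Customers(CustomerID),
        TourID     INTEGER NOT NULL REFERENCES Tours(TourID),
        TourDate   TEXT    NOT NULL,
        PRIMARY KEY (CustomerID, TourID, TourDate)
    );

    INSERT INTO Customers VALUES (1, 'Ana'), (2, 'Luis');
    INSERT INTO Tours VALUES (10, 'Volcano Hike'), (20, 'Lake Cruise');
    INSERT INTO CustomerTour VALUES
        (1, 10, '2024-03-01'),
        (1, 10, '2024-06-15'),   -- same customer, same tour, different date: allowed
        (1, 20, '2024-03-02'),
        (2, 10, '2024-03-01');
""")

# The exact same customer/tour/date combination violates the compound primary key.
try:
    conn.execute("INSERT INTO CustomerTour VALUES (1, 10, '2024-03-01')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# One customer, many tours...
tours_per_customer = conn.execute("""
    SELECT c.Name, COUNT(DISTINCT ct.TourID)
    FROM Customers AS c
    JOIN CustomerTour AS ct ON ct.CustomerID = c.CustomerID
    GROUP BY c.CustomerID, c.Name
    ORDER BY c.Name
""").fetchall()

# ...and one tour, many customers.
customers_on_volcano_hike = conn.execute("""
    SELECT DISTINCT c.Name
    FROM Customers AS c
    JOIN CustomerTour AS ct ON ct.CustomerID = c.CustomerID
    WHERE ct.TourID = 10
    ORDER BY c.Name
""").fetchall()
```

Without TourDate in the key, the second Volcano Hike booking for Ana would be rejected as a duplicate. OrderDetails follows the same pattern, with Quantity as an ordinary non-key column.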
A Business Decision
• Whether a relationship is one-to-many or many-to-many is
frequently a business decision.
• Suppose that you buy your office supplies from Office
Depot, Office Max, or Staples.
• For simplicity, you buy all of your paper from Office Depot,
all of your printer supplies from Office Max, and all of your
tacks and staples from Staples.
• In this case, each supplier supplies many products, but each
product comes from only one supplier. This is a one-to-many
relationship:
More flexible, more complex
• Using only one supplier for each product is
simple, but it could be costing you money.
Why not buy all products from all suppliers
when they are on sale?
• This creates a many-to-many relationship:
• Note that Price has been moved to the
intersection table, since the price for each
product may vary from store to store.
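Here is a rough sketch of the many-to-many version in Python's sqlite3 module. The supplier names and the Price field come from the slides; the table names Suppliers, Products, and SupplierProduct, the key fields, and especially the prices are made up for illustration (prices are stored as whole cents to keep the comparison exact).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE Suppliers (SupplierID INTEGER PRIMARY KEY, SupplierName TEXT NOT NULL);
    CREATE TABLE Products (ProductID INTEGER PRIMARY KEY, ProductName TEXT NOT NULL);

    -- Price lives here because it depends on BOTH the supplier and the product.
    CREATE TABLE SupplierProduct (
        SupplierID INTEGER NOT NULL REFERENCES Suppliers(SupplierID),
        ProductID  INTEGER NOT NULL REFERENCES Products(ProductID),
        Price      INTEGER NOT NULL,   -- in cents; values below are invented
        PRIMARY KEY (SupplierID, ProductID)
    );

    INSERT INTO Suppliers VALUES (1, 'Office Depot'), (2, 'Office Max'), (3, 'Staples');
    INSERT INTO Products VALUES (1, 'Paper'), (2, 'Toner');
    INSERT INTO SupplierProduct VALUES
        (1, 1, 650), (2, 1, 599), (3, 1, 625),
        (1, 2, 4200), (3, 2, 3995);
""")

# For each product, find the supplier currently offering the lowest price.
cheapest = conn.execute("""
    SELECT p.ProductName, s.SupplierName, sp.Price
    FROM SupplierProduct AS sp
    JOIN Products  AS p ON p.ProductID  = sp.ProductID
    JOIN Suppliers AS s ON s.SupplierID = sp.SupplierID
    WHERE sp.Price = (
        SELECT MIN(inner_sp.Price)
        FROM SupplierProduct AS inner_sp
        WHERE inner_sp.ProductID = sp.ProductID
    )
    ORDER BY p.ProductName
""").fetchall()
```

In the one-to-many design, Price could sit in the Products table because each product had exactly one supplier; once a product has several suppliers, a single Price column there can no longer be correct.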
Many-to-Many
• It is harder to design many-to-many relationships,
and to write application code for them; however,
• Chances are that in many cases where you think
that a one-to-many relationship is enough, you
will eventually need the flexibility of a many-to-many
relationship.
– Will employees really do ONLY one thing?
– Will players play ONLY one position?
• If the answer is “Maybe not,” use a many-to-many
relationship.
Reflexive Relationships
• Sometimes, a field in a table relates to another field in the same
table.
• This usually indicates some sort of hierarchy within the records in
the table.
• In GuateTours, I added a BossID field to the Employees table. This
field gets filled with the EmployeeID of that employee’s boss.
• Some DBMSs allow you to draw relationship diagrams which show
reflexive relationships directly—an arrow from BossID up to
EmployeeID.
• Access doesn’t let you do this. To show a reflexive relationship, you
must show a second copy of the Employees table and create the
relationship between the original and the copy.
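In SQL, the "second copy" of the table becomes a table alias in a self-join. The sketch below uses Python's sqlite3 module; Employees, EmployeeID, and BossID come from GuateTours, while the FirstName column and the sample hierarchy are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE Employees (
        EmployeeID INTEGER PRIMARY KEY,
        FirstName  TEXT NOT NULL,
        -- Reflexive relationship: BossID points back into this same table.
        BossID     INTEGER REFERENCES Employees(EmployeeID)   -- NULL for the top boss
    );
    INSERT INTO Employees VALUES
        (1, 'Maria', NULL),
        (2, 'Jose',  1),
        (3, 'Elena', 1),
        (4, 'Pedro', 2);
""")

# "e" is the employee and "b" is a second view of the same table, used for the boss.
employee_and_boss = conn.execute("""
    SELECT e.FirstName, b.FirstName
    FROM Employees AS e
    LEFT JOIN Employees AS b ON b.EmployeeID = e.BossID
    ORDER BY e.EmployeeID
""").fetchall()
```

The LEFT JOIN keeps Maria in the result even though she has no boss; an INNER JOIN would drop her.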
Quick Review
• You have now been introduced to much of the
theory and terminology of relational databases.
• Being comfortable with the terminology will be
crucial to your understanding the theory and
practice of database design using third normal
form.
• Therefore, here’s a quick review of some of the
definitions you’ve seen (and will need to know for
the rest of this lecture, as well as for exams):
Definitions
• Database: a database is a collection of interrelated data
items that are managed as a single unit.
• Relational Database: A collection of tables, the
relationships between them, and auxiliary items such
as views and stored procedures. The tables are
organized according to the principles first described by
E.F. Codd.
• DBMS: Database Management System—the computer
software that organizes the data on computers and
manages access to it. Examples include Oracle, MySQL,
DB2, and Microsoft’s SQL Server (for large-scale
databases) and Access (for smaller databases).
Definitions
• Relation: A set of ordered tuples. Relations are
represented by tables in databases (not by
relationships!)
• Entity: A generic noun, representing a class of
things, but not one particular thing.
• Attribute, Field, Column: Properties possessed
by entities. These are known as “fields” or
“columns” in database tables.
Definitions
• Tuple, Record, Row: The theorist’s “tuple”
becomes a “record” or “row” in a database table.
• The three anomalies: Insert, Update, and Delete.
These are caused by trying to store information
about more than one entity in a single table.
We’ll look at these further next week.
• 3NF: Third Normal Form. This will be the main
topic for next week.
Definitions
• OLTP: Online Transaction Processing. This type of database
is used in the day-to-day operation of a business. It is
designed to handle frequent changes, frequent requests for
small amounts of data, and multiple concurrent users. It is
the type of database that requires 3NF, and what we will be
discussing next week.
• OLAP: Online Analytical Processing. Databases composed of
historical data which isn’t being constantly updated. OLAP
databases are used for analyzing performance, not for day-to-day
operations. They do not require 3NF.
• Normalization: Modifying the design of a database so that
its tables are in 3NF.
Definitions
• Table Design: Defining the fields that make up a table,
including identifying data types and assigning primary keys.
• Populating a Table: Adding rows of data to a table.
• Constraint: A restriction on values that can be entered into
a column. Setting the data type is one type of constraint;
adding numeric ranges or min/max text lengths is another;
and primary and foreign keys are a third type of constraint.
• Primary Key: One or more columns in a table which
(together) uniquely identify a row (distinguish it from all
others in the table).
• Candidate Key: Any field or combination of fields that could
serve as the primary key.
Definitions
• Simple Key: A primary key consisting of one field.
• Compound Key: A primary key consisting of two
or more fields.
• Natural Key: A pre-existing or ready-made field
which can serve as the primary key for a table.
• Surrogate Key: A field (usually numeric) added to
a table as the primary key when no natural keys
are available.
Definitions
• Foreign Key: a field (or fields) in a table that is not the
primary key in that table, but IS the primary key in another
table.
• Referential Integrity: This is a property of a relationship in
Access which tells Access to take the relationship seriously
by enforcing the foreign-key constraint. Entering a value in
the foreign-key column of one table will require that that
value already exist in the primary-key column of the other.
• Intersection Table: A table used to implement a many-to-many
relationship. The primary key of the intersection
table is the combination of the primary keys of the two
related tables.