Tips for drawing an ERD

advertisement
Practical tips for creating
entity relationship diagrams (ERDs)
Chitu Okoli
Associate Professor in Business Technology Management
John Molson School of Business, Concordia University, Montréal
Entities
What is an ENTITY in a business case?
 An entity is a noun (person, place, thing or event) that contains
details (the details will be attributes)
 If it contains no details whatsoever (not even a name or
description), then it might be an attribute of something else, not
an entity in its own right
 Entities describe the contents of entire rows
 Strictly speaking, an entity refers to any specific row in a table,
not the table as a whole
 Specifically, the name of the table is what the primary key of
each row describes
 For example, each attribute in one row in the Employee table should
refer to an individual employee
 Attributes are also nouns (persons, places, things or dates), but
they are not entities if they describe other things
 E.g. a name or date is a noun, but it describes a thing, e.g. a
person or product; therefore name or date is the attribute, and
the Person or Product is the entity
Common false entities:
The Organization that owns or
operates the database
 The organization that owns the database is NEVER
modeled as an entity within the database
 Similarly, the database itself or the information system in
which the database runs cannot be modeled as an entity
 However, its customers, suppliers, and other
organizations for which it stores data will be modeled
 Multiple branches or locations within the organization
might be modeled, but never the overall organization
itself
Common false entities:
“Inventory”, “Stock”, “Warehouse”, etc.
 These entity names violate the rule that “an entity refers to any
specific row in a table, not the table as a whole”
 What does one row of an Inventory table describe? An individual
inventory? That makes no sense.
 The entity should be named according to what each row describes
 Appropriate names: Product, Item, Part, etc.
 The key is: you should name the entire entity based on the contents
of the individual rows
 To handle the “inventory” or “stock” aspect, there should be an
attribute called something like QuantityInStock for each row
 This assumes that each row represents a specific product model,
e.g. you have 5
Common false entities:
Forms and reports
 In general, a form or a report is NOT an entity
 A form is an input tool that receives data that needs to go into a database
 A report is an output tool that displays data taken from a database
 Reports include schedules, summaries, bills, receipts, queries, etc.
 Forms and reports are great sources for identifying entities and attributes, but they
normally represent multiple entities at the same time
 They are a good point to start table normalization, but rarely survive as valid entities
 However, the EVENT associated with a form or report is often an entity
 E.g. An order form is not an entity, but an Order event is an entity
 E.g. A job application form is not an entity, but the JobApplication event is an entity
 Even when a form or report represents an event, multiple entities are usually
involved
 Notable exception (which is not really an exception):
 An Invoice is often expressed as an entity, with InvoiceLine as a bridge entity (M:M)
connecting the Invoice to Product
 Actually, the “Invoice” entity is just an Sale event (an invoice is just a report for a
single sale), and “InvoiceLine” could just as well be called “Product_Sale”
Relationships
What kind of RELATIONSHIP is it?
 To accurately determine a relationship, you must always test
each side in turn. For each side, you must ask:
 One instance of EntityA can have zero/one (minimum) or
one/many (maximum) instance(s) of EntityB?
 One instance of EntityB can have zero/one (minimum) or
one/many (maximum) instance(s) of EntityA?
 Examples:
 One department can employ zero or many employees; one
employee can work for one and only one department
 1:M: One ( II ) on the department side, many ( O< ) on the employee side
 One department can be managed by zero or one managers; one
manager can manage zero or one departments
 1:1: One ( IO ) on the department side, one ( OI ) on the manager side
 One project can employ zero or many employees; one employee
can work for zero or many projects
 M:M: Many ( >O ) on the project side, many ( O< ) on the employee side
Maximum and minimum cardinalities
 1:1, 1:M and M:M refer to maximum cardinalities
 E.g. 1:M means maximum one of EntityA and maximum many of EntityB
 Minimum cardinalities are expressed with Crow’s foot notation:




OI: Minimum zero, maximum one
II: Minimum one, maximum one
O<: Minimum zero, maximum many
I< : Minimum one, maximum many
 Minimum zero (OI or O<) means that this side of the relationship
is optional
 The other entity can exist in the database without this side
 Minimum one (II or I<) means that this side of the relationship is
mandatory (required)
 The other entity cannot exist in the database without this side
 For a bridge entity (representing a M:M relationship), the one
side is always II (minimum one, maximum one).
 If either main entity were absent, the record would not exist.
How COMMON are various kinds
of relationship?
 One-to-many (1:M) is the most common
 Many-to-many (M:M) is quite common, but in an ERD,
it must always be decomposed into two 1:M
relationships
 One-to-one (1:1) is actually rather rare
 Most of the time when you think it’s 1:1, you’ve done
something wrong
 Double- and triple-check to make sure it really is 1:1
Legitimate 1:1 relationships

IS-A or supertype/subtype:
 E.g. A person is a specific type of person: an Employee is an Accountant, a
Doctor is a Specialist, a Student is an Undergraduate
 E.g. A product is a specific type of product: a Bicycle is a type of Vehicle, a Violin
is a type of MusicalInstrument

Boss, where the rule is that you can only have one boss, and a boss can
be boss of only one organizational unit
 E.g. Employee manages a Department, Student is president of a Club

Some kinds of scarce resource allocation
 E.g. one employee can be assigned only one company car


Marriage: one Person can be married to only one other Person at a time
Matching reservations
 E.g. an advance car Reservation is matched to one eventual car Rental

NOTE: None of these relationships can store historical information
 If you add dates to track historical changes, it becomes M:M
 In addition to these, there are other database performance
scenarios where 1:1 relationships could make sense
Naming conventions
REQUIREMENTS for
database naming conventions
 Many variations exist for naming conventions, but one rule
is absolute: Whatever you choose,
BE CONSISTENT!
 Entity and attribute names:
 Never uses spaces (“ ”) in instead, use either
CamelCase or underscores_between_words
 Spaces can seriously mess up database programming in unexpected
places
 Relationship names
 Relationship names are required in an ERD
 For relationship names in an ERD, spaces are perfectly OK, since they
never appear in programming
You can’t use reserved words
 Each database has a list of keywords that have special
meaning; you can’t use these as entity or attribute
names
 E.g. name, date, order, row, number, etc.
 Some DBMSs let you use them (maybe you have to use
quotes or brackets), but trouble is almost guaranteed
when you start programming
 Lists of reserved names for popular DBMSs





Oracle SQL Reserved Words
MySQL Keywords and Reserved Words
Microsoft Access SQL Reserved Words
Microsoft SQL Server Reserved Keywords
You can’t memorize them all, but it’s a good idea to at least scan
through the list once when you start using a new DBMS
SUGGESTIONS for
database naming conventions
 My preferred (optional) tips:
 Avoid abbreviating table or attribute names (e.g. don’t use CustID
for CustomerID, or EmpName for EmployeeName)
 Requires a little more typing, but clarity in coding is priceless
 Avoid redundant table prefixes such as “tbl” in tblCustomer; just
Customer is fine
 The context of use makes it clear that it’s a table
 Always use a surrogate primary key (PK) for every table and call it
ID, and always call the foreign key (FK) TableID or Table_ID.
 For example, the PK for Customer table is ID (Customer.ID) and its
corresponding FK in the Order table is CustomerID (Order.CustomerID)
 Detailed specifications (these are just suggestions):
 Jason Mauss’s detailed guide
 Justin Watts’ tips
SUGGESTIONS for naming
Entities and Attributes
 Entity and attribute names are always a singular noun,
not plural (e.g. Customer, not Customers)
 For bridge/associative entities from M:M relationships,
there are two choices:
 Be creative in forming a noun from the verb of the relationship. E.g.
for Employee and Project, name the bridge entity “Assignment”
 Simply form a name by merging the two constituent entity names.
E.g. for Employee and Project, name it Employee_Project or
Project_Employee
SUGGESTIONS for naming
Relationships
 Never use “has” to name relationships—it is
meaningless
 E.g. Don’t use “Customer has Sale” or “Sale has Product”. Instead,
use “Customer purchases Sale” or “Sale offers Product”
 Anything is better than “has”
 Relationship names are always verbs, usually lower
case; spaces are allowed
 Relationships are named from the one side to the
many side (in a 1:M relationship)
 For 1:1 relationships, they are named from the side with the
PK to the side with the FK
 For bridge entity relationships, name them as if the many
side were directly connected to the original entity
Major stages for designing an ERD
1.
2.
3.
Identify the entities, attributes and relationships from the business case
description
Verify the attributes, select primary keys, add foreign keys and normalize
Finalize the ERD using modeling software
Designing an ERD: Stage 1
Identify entities, attributes and relationships
1.
Identify entities (main nouns) and attributes (nouns that are details of
entities)




2.
Identify relationships between entities




3.
Analyze the case top-down first (entities, then attributes)
Analyze again, this time bottom-up (attributes, then entities)
Each attribute must apply specifically and only to its own entity
Leave out primary keys and foreign keys at this stage
It is never the case that all entities are related to all relationships. You must
make sure that the relationship exists.
For a relationship to exist, the case description must mention the entities
involved together in the same sentence. Otherwise, there is no relationship.
Use the two-step procedure for verifying 1:M, M:M or 1:1 (first verify one
side of the relationship, and then the other side)
Verify minimum and maximum cardinalities for each relationship (e.g.
minimum 0, maximum many; minimum 1, maximum 1
Convert any M:M relationships into two 1:M relationships
Designing an ERD: Stage 2
Verify attributes, add keys, normalize
1. Populate all your entities with attributes
2. Review and revise attributes

In particular, verify if some of the attributes identified earlier might
actually belong in the new bridge entities created for M:M
3. Decide on the primary keys (PK, unique identifiers) for
each entity
4. Add any appropriate foreign keys (FK) to all entities
5. Normalize each entity to fourth normal form (4NF)


Normalization might require you to make major revisions to some
entities
If revisions are required, reiterate until you are satisfied with all
the entities
In which entity does the foreign key go?

1:M relationships (including bridge entities):
 The PK from the one side must always be copied to the many
side as a FK

1:1 relationships:
 Copy the FK to only one of the two sides, never both
 (1,1):(0,1): Foreign key must go on the (0,1) side
 (1,0):(0,1): You can choose either side, but never both
 Preferably, place the FK on whichever side will cause fewer nulls
 (1,1):(1,1): This relationship almost never exists in the real
world. You probably made a mistake. Verify your cardinalities,
or merge the two tables into one.
 Exception: this relationship is sometimes used deliberately for database
performance optimization
Designing an ERD: Stage 3 (modeling software)
Draw the ERD
1. Draw the final ERD using modeling software

Stages 1 and 2 are drafts (maybe even on paper); you
don’t consider it final until you’ve got your drafts polished
If you draw something in software at the beginning, it is
much harder to see and correct your errors

2. Verify various items:


All relationships must be named
By default, all relationships are non-identifying/weak
(dashed lines)


Identifying/strong relationships (solid lines) are used only
when an FK is part of one entity’s PK
Required attributes (including all PKs and some FKs)
should be bolded
Download