Practical tips for creating entity relationship diagrams (ERDs) Chitu Okoli Associate Professor in Business Technology Management John Molson School of Business, Concordia University, Montréal Entities What is an ENTITY in a business case? An entity is a noun (person, place, thing or event) that contains details (the details will be attributes) If it contains no details whatsoever (not even a name or description), then it might be an attribute of something else, not an entity in its own right Entities describe the contents of entire rows Strictly speaking, an entity refers to any specific row in a table, not the table as a whole Specifically, the name of the table is what the primary key of each row describes For example, each attribute in one row in the Employee table should refer to an individual employee Attributes are also nouns (persons, places, things or dates), but they are not entities if they describe other things E.g. a name or date is a noun, but it describes a thing, e.g. a person or product; therefore name or date is the attribute, and the Person or Product is the entity Common false entities: The Organization that owns or operates the database The organization that owns the database is NEVER modeled as an entity within the database Similarly, the database itself or the information system in which the database runs cannot be modeled as an entity However, its customers, suppliers, and other organizations for which it stores data will be modeled Multiple branches or locations within the organization might be modeled, but never the overall organization itself Common false entities: “Inventory”, “Stock”, “Warehouse”, etc. These entity names violate the rule that “an entity refers to any specific row in a table, not the table as a whole” What does one row of an Inventory table describe? An individual inventory? That makes no sense. The entity should be named according to what each row describes Appropriate names: Product, Item, Part, etc. The key is: you should name the entire entity based on the contents of the individual rows To handle the “inventory” or “stock” aspect, there should be an attribute called something like QuantityInStock for each row This assumes that each row represents a specific product model, e.g. you have 5 Common false entities: Forms and reports In general, a form or a report is NOT an entity A form is an input tool that receives data that needs to go into a database A report is an output tool that displays data taken from a database Reports include schedules, summaries, bills, receipts, queries, etc. Forms and reports are great sources for identifying entities and attributes, but they normally represent multiple entities at the same time They are a good point to start table normalization, but rarely survive as valid entities However, the EVENT associated with a form or report is often an entity E.g. An order form is not an entity, but an Order event is an entity E.g. A job application form is not an entity, but the JobApplication event is an entity Even when a form or report represents an event, multiple entities are usually involved Notable exception (which is not really an exception): An Invoice is often expressed as an entity, with InvoiceLine as a bridge entity (M:M) connecting the Invoice to Product Actually, the “Invoice” entity is just an Sale event (an invoice is just a report for a single sale), and “InvoiceLine” could just as well be called “Product_Sale” Relationships What kind of RELATIONSHIP is it? To accurately determine a relationship, you must always test each side in turn. For each side, you must ask: One instance of EntityA can have zero/one (minimum) or one/many (maximum) instance(s) of EntityB? One instance of EntityB can have zero/one (minimum) or one/many (maximum) instance(s) of EntityA? Examples: One department can employ zero or many employees; one employee can work for one and only one department 1:M: One ( II ) on the department side, many ( O< ) on the employee side One department can be managed by zero or one managers; one manager can manage zero or one departments 1:1: One ( IO ) on the department side, one ( OI ) on the manager side One project can employ zero or many employees; one employee can work for zero or many projects M:M: Many ( >O ) on the project side, many ( O< ) on the employee side Maximum and minimum cardinalities 1:1, 1:M and M:M refer to maximum cardinalities E.g. 1:M means maximum one of EntityA and maximum many of EntityB Minimum cardinalities are expressed with Crow’s foot notation: OI: Minimum zero, maximum one II: Minimum one, maximum one O<: Minimum zero, maximum many I< : Minimum one, maximum many Minimum zero (OI or O<) means that this side of the relationship is optional The other entity can exist in the database without this side Minimum one (II or I<) means that this side of the relationship is mandatory (required) The other entity cannot exist in the database without this side For a bridge entity (representing a M:M relationship), the one side is always II (minimum one, maximum one). If either main entity were absent, the record would not exist. How COMMON are various kinds of relationship? One-to-many (1:M) is the most common Many-to-many (M:M) is quite common, but in an ERD, it must always be decomposed into two 1:M relationships One-to-one (1:1) is actually rather rare Most of the time when you think it’s 1:1, you’ve done something wrong Double- and triple-check to make sure it really is 1:1 Legitimate 1:1 relationships IS-A or supertype/subtype: E.g. A person is a specific type of person: an Employee is an Accountant, a Doctor is a Specialist, a Student is an Undergraduate E.g. A product is a specific type of product: a Bicycle is a type of Vehicle, a Violin is a type of MusicalInstrument Boss, where the rule is that you can only have one boss, and a boss can be boss of only one organizational unit E.g. Employee manages a Department, Student is president of a Club Some kinds of scarce resource allocation E.g. one employee can be assigned only one company car Marriage: one Person can be married to only one other Person at a time Matching reservations E.g. an advance car Reservation is matched to one eventual car Rental NOTE: None of these relationships can store historical information If you add dates to track historical changes, it becomes M:M In addition to these, there are other database performance scenarios where 1:1 relationships could make sense Naming conventions REQUIREMENTS for database naming conventions Many variations exist for naming conventions, but one rule is absolute: Whatever you choose, BE CONSISTENT! Entity and attribute names: Never uses spaces (“ ”) in instead, use either CamelCase or underscores_between_words Spaces can seriously mess up database programming in unexpected places Relationship names Relationship names are required in an ERD For relationship names in an ERD, spaces are perfectly OK, since they never appear in programming You can’t use reserved words Each database has a list of keywords that have special meaning; you can’t use these as entity or attribute names E.g. name, date, order, row, number, etc. Some DBMSs let you use them (maybe you have to use quotes or brackets), but trouble is almost guaranteed when you start programming Lists of reserved names for popular DBMSs Oracle SQL Reserved Words MySQL Keywords and Reserved Words Microsoft Access SQL Reserved Words Microsoft SQL Server Reserved Keywords You can’t memorize them all, but it’s a good idea to at least scan through the list once when you start using a new DBMS SUGGESTIONS for database naming conventions My preferred (optional) tips: Avoid abbreviating table or attribute names (e.g. don’t use CustID for CustomerID, or EmpName for EmployeeName) Requires a little more typing, but clarity in coding is priceless Avoid redundant table prefixes such as “tbl” in tblCustomer; just Customer is fine The context of use makes it clear that it’s a table Always use a surrogate primary key (PK) for every table and call it ID, and always call the foreign key (FK) TableID or Table_ID. For example, the PK for Customer table is ID (Customer.ID) and its corresponding FK in the Order table is CustomerID (Order.CustomerID) Detailed specifications (these are just suggestions): Jason Mauss’s detailed guide Justin Watts’ tips SUGGESTIONS for naming Entities and Attributes Entity and attribute names are always a singular noun, not plural (e.g. Customer, not Customers) For bridge/associative entities from M:M relationships, there are two choices: Be creative in forming a noun from the verb of the relationship. E.g. for Employee and Project, name the bridge entity “Assignment” Simply form a name by merging the two constituent entity names. E.g. for Employee and Project, name it Employee_Project or Project_Employee SUGGESTIONS for naming Relationships Never use “has” to name relationships—it is meaningless E.g. Don’t use “Customer has Sale” or “Sale has Product”. Instead, use “Customer purchases Sale” or “Sale offers Product” Anything is better than “has” Relationship names are always verbs, usually lower case; spaces are allowed Relationships are named from the one side to the many side (in a 1:M relationship) For 1:1 relationships, they are named from the side with the PK to the side with the FK For bridge entity relationships, name them as if the many side were directly connected to the original entity Major stages for designing an ERD 1. 2. 3. Identify the entities, attributes and relationships from the business case description Verify the attributes, select primary keys, add foreign keys and normalize Finalize the ERD using modeling software Designing an ERD: Stage 1 Identify entities, attributes and relationships 1. Identify entities (main nouns) and attributes (nouns that are details of entities) 2. Identify relationships between entities 3. Analyze the case top-down first (entities, then attributes) Analyze again, this time bottom-up (attributes, then entities) Each attribute must apply specifically and only to its own entity Leave out primary keys and foreign keys at this stage It is never the case that all entities are related to all relationships. You must make sure that the relationship exists. For a relationship to exist, the case description must mention the entities involved together in the same sentence. Otherwise, there is no relationship. Use the two-step procedure for verifying 1:M, M:M or 1:1 (first verify one side of the relationship, and then the other side) Verify minimum and maximum cardinalities for each relationship (e.g. minimum 0, maximum many; minimum 1, maximum 1 Convert any M:M relationships into two 1:M relationships Designing an ERD: Stage 2 Verify attributes, add keys, normalize 1. Populate all your entities with attributes 2. Review and revise attributes In particular, verify if some of the attributes identified earlier might actually belong in the new bridge entities created for M:M 3. Decide on the primary keys (PK, unique identifiers) for each entity 4. Add any appropriate foreign keys (FK) to all entities 5. Normalize each entity to fourth normal form (4NF) Normalization might require you to make major revisions to some entities If revisions are required, reiterate until you are satisfied with all the entities In which entity does the foreign key go? 1:M relationships (including bridge entities): The PK from the one side must always be copied to the many side as a FK 1:1 relationships: Copy the FK to only one of the two sides, never both (1,1):(0,1): Foreign key must go on the (0,1) side (1,0):(0,1): You can choose either side, but never both Preferably, place the FK on whichever side will cause fewer nulls (1,1):(1,1): This relationship almost never exists in the real world. You probably made a mistake. Verify your cardinalities, or merge the two tables into one. Exception: this relationship is sometimes used deliberately for database performance optimization Designing an ERD: Stage 3 (modeling software) Draw the ERD 1. Draw the final ERD using modeling software Stages 1 and 2 are drafts (maybe even on paper); you don’t consider it final until you’ve got your drafts polished If you draw something in software at the beginning, it is much harder to see and correct your errors 2. Verify various items: All relationships must be named By default, all relationships are non-identifying/weak (dashed lines) Identifying/strong relationships (solid lines) are used only when an FK is part of one entity’s PK Required attributes (including all PKs and some FKs) should be bolded