The Entity-Relationship (ER) Model
Chapter 2
Instructor: Jianping Fan
http://www.cs.uncc.edu/~jfan
Database Management Systems, Raghu Ramakrishnan

Overview of Database Design
a. What is a database?
--- A database consists of many tables and their inter-table relationships.
--- Each table holds “similar” tuples.
b. What should a database address?
--- Tables and their relationships
--- Submission and processing of queries
Three key issues: description, organization, and search.

Example: University Database
Tables:
Students: SID, sname, year, GPA
Departments: DID, dname, office
Faculties: ssn, fname, f-office, phone, salary
Courses: CID, cname, time, room, credit-hour
Inter-table relationships:
Students enroll in Courses.
Faculties teach Courses.
Faculties work for Departments.
Queries on multiple tables are allowed!

Overview of Database Design
Requirement Analysis -> ER Design -> Relational Schema -> DBMS
Requirement analysis: analysis of the data and of users’ requirements, leading to entity sets and relationship sets
Conceptual database design: ER model for description
Logical database design: ER model to relational database
Beyond the ER model:
Schema refinement: analyze the collection of relations
Physical database design: database indexing
Security design: access control

Overview of Database Design
Requirement analysis: analysis of the data and of users’ requirements, leading to entity sets and relationship sets
a. What kinds of attributes should be included for tuple description?
b. Which attributes should be indexed? (Frequent operations and access)
c. Long schema or short schema (balance of storage and access efficiency)? Table integration or separation?
d. Updating frequency
e. Query optimization framework?

Overview of Database Design
Conceptual design: (ER Model is used at this stage.)
a.
Should a concept be modeled as an entity or an attribute?
b. Should a concept be modeled as an entity or a relationship?
c. What are the relationship sets and their participating entity sets?
d. Should we use binary or ternary relationships?
e. Should we use aggregation?

Overview of Database Design
Logical database design: ER model to relational database
a. How do we create physical database tables?
b. How do we transform ER models (conceptual database design) into physical database tables?
c. How do we generate physical database indexes for fast queries?

Overview of Database Design
Schema refinement: analyze the collection of relations
a. Which tables should be split into multiple smaller tables?
b. Which tables can be merged into a single larger table?
c. Which attributes should be added to the existing schema?
d. Which attributes can be deleted from the existing schema?

Overview of Database Design
Physical database design: database indexing
a. Which attributes should be selected for indexing? The attributes most frequently used in query formulation.
b. What kind of index structure should be selected? Range search or equality search?

Overview of Database Design
Security design: access control
Who can access what in the database, and under which conditions?

Overview of Database Design
Conceptual design: (ER Model is used at this stage.)
– What are the entities and relationships in the enterprise?
– What information about these entities and relationships should we store in the database (i.e., attributes)?
– What are the integrity constraints or business rules that hold?
– A database “schema” in the ER model can be represented pictorially (ER diagrams).
– An ER diagram can be mapped into a relational schema.

University Database
The university database contains employees and departments, which are described by certain attributes.
The university database also contains relationships between employees and departments, which are also described by certain attributes.

ER Model Basics
[Figure: entity set Employees with attributes ssn, name, lot]
Entity: A real-world object distinguishable from other objects. An entity is described (in the DB) using a set of attributes.
ID, name, year, GPA
999-80-3267, John Smith, 2003, 3.5
999-32-0847, James Gary, 2006, 3.0

ER Model Basics
[Figure: entity set Employees with attributes ssn, name, lot]
Entity Set: A collection of similar entities, e.g., all employees.
– All entities in an entity set have the same set of attributes.
– Each entity set has a key.
– Each attribute has a domain.
What is the key? How many keys can one object have?

Entity, Entity Set, Attribute, Schema & Domain
ID or SSN     Name          Year   Age   GPA
999-38-4431   John Smith    1999   21    3.68
999-28-3341   Miki Jordan   2000   28    3.45
331-43-4567   David Kim     2000   25    4.00
535-34-5678   Paul Lee      1998   26    3.89

ER Model Basics (Contd.)
[Figure: relationship set Works_In (attribute since) between Employees (ssn, name, lot) and Departments (did, dname, budget)]
Relationship: An association among two or more entities.
Examples: Fan works in the Computer Science Department. Smith works in the Electronic Engineering Department.

ER Model Basics (Contd.)
[Figure: relationship set Works_In (attribute since) between Employees (ssn, name, lot) and Departments (did, dname, budget)]
Relationship Set: A collection of similar relationships.
– An n-ary relationship set R relates n entity sets E1 ...
En; each relationship in R involves entities e1 ∈ E1, ..., en ∈ En.
The same entity set could participate in different relationship sets, or in different “roles” in the same set.

Entity vs. Entity Set
Entity (Student):
John Smith (999-21-3415, jsmith@, John Smith, 18, 3.5)
Entity set (Students): students in ITCS3160
999-21-3415, jsmith@, John Smith, 18, 3.5
999-31-2356, jzhang@, Jie Zhang, 20, 3.0
999-32-1234, ajain@, Anil Jain, 21, 3.8

Entity Keys
Primary key, candidate key
999-21-3415, jsmith@, John Smith, 18, 3.5
999-31-2356, jzhang@, Jie Zhang, 20, 3.0
999-32-1234, ajain@, Anil Jain, 21, 3.8

Relationship vs. Relationship Set
Relationship:
John Smith (999-21-3415, jsmith@, John Smith, 18, 3.5)
ITCS3160 (3160, ITCS, DBMS, J. Fan, 3, Kenn. 236)

Relationship vs. Relationship Set
Relationship set:
999-21-3415, jsmith@, John Smith, 18, 3.5
999-31-2356, jzhang@, Jie Zhang, 20, 3.0
999-32-1234, ajain@, Anil Jain, 21, 3.8
3160, ITCS, DBMS, J. Fan, 3, Kenn. 236
6157, ITCS, Visual DB, J. Fan, 3, Kenn.
236

Potential Relationship Types
1-to-1, 1-to-many, many-to-1, many-to-many

Example 1
Build an ER diagram for the following information:
– Students have an Id, Name, Login, Age, GPA
– Courses have an Id, Name, Credit Hours
– Students enroll in courses and receive a grade

Example 1 Answer
[Figure: ER diagram with entity set Students (Id, Name, Login, Age, GPA), entity set Courses (Id, Name, Credit), and relationship set Enrolled_In (Grade)]

Example 2
Build an ER diagram for the following information:
– Patients: Name, Address, Phone #, Age
– Drugs: Name, Manufacturer, Expiration Date
– Patients are prescribed drugs: Dosage, # Days

Example 2 Answer
[Figure: ER diagram with entity set Patients (Name, Addr, Phone, Age), entity set Drug (Name, Manuf, Exp), and relationship set Prescribed (Dosage, #days)]

Example 3
Build an ER diagram for the following information:
– Students have an Id, Name, Login, Age, GPA
– Courses have an Id, Name, Credit Hours
– Students enroll in courses and receive a grade
– Faculties: Name, Address, Phone #, Age
– Faculties teach courses: semester

Entity vs. Attribute: Ternary Relationship
Should address be an attribute of Employees or an entity (connected to Employees by a relationship)?
Depends upon the use we want to make of address information, and the semantics of the data:
– If we have several addresses per employee, address must be an entity (since attributes cannot be set-valued).
– If the structure (city, street, etc.) is important, e.g., we want to retrieve employees in a given city, address must be modeled as an entity (since attribute values are atomic).

[Figure: relationship set Works_In2 (attributes from, to) between Employees (ssn, name, lot) and Departments (did, dname, budget)]
Works_In2 does not allow an employee to work in a department for two or more periods.
Similar to the problem of wanting to record several addresses for an employee: we want to record several values of the descriptive attributes for each instance of this relationship.
[Figure: relationship set Works_In3 among Employees (ssn, name, lot), Departments (did, dname, budget), and Duration (from, to)]
The same employee can work in the same department in different periods.

Entity vs. Relationship: Ternary Relationship
The first ER diagram is OK if a manager gets a separate discretionary budget for each dept.
What if a manager gets a discretionary budget that covers all managed depts? Then the first diagram has two problems:
– Redundancy of dbudget, which is stored for each dept managed by the manager.
– Misleading: suggests dbudget is tied to the managed dept.
[Figure: two ER diagrams. First: Manages2 (attributes since, dbudget) between Employees (ssn, name, lot) and Departments (did, dname, budget). Second: Manages3 (attribute since) relating Employees, Departments, and Mgr_Appts (apptnum, dbudget).]

Binary vs. Ternary Relationships*
If each policy is owned by just 1 employee:
– A key constraint on Policies would mean a policy can only cover 1 dependent!
[Figure: bad design, with a single relationship set Covers among Employees (ssn, name, lot), Policies (policyid, cost), and Dependents (pname, age); better design, with two relationship sets, Purchaser and Beneficiary, in place of Covers]

Binary vs. Ternary Relationships (Contd.)
[Figure: Works_In3 drawn two ways. First: among Employees (ssn, name, lot), Departments (did, dname, budget), and Duration (from, to). Second: between Employees and Departments, with attributes from and to.]

Binary vs. Ternary Relationships (Contd.)
The previous example illustrated a case where two binary relationships were better than one ternary relationship.
An example in the other direction: a ternary relationship set Contracts relates entity sets Parts, Departments, and Suppliers, and has the descriptive attribute qty. No combination of binary relationships is an adequate substitute:
– S “can-supply” P, D “needs” P, and D “deals-with” S do not imply that D has agreed to buy P from S.
– How do we record qty?

Key Constraints
Consider Works_In: An employee can work in many departments; a dept can have many employees.
[Figure: relationship set Works_In (attribute since) between Employees (ssn, name, lot) and Departments (did, dname, budget)]

Key Constraints
Consider Works_In: An employee can work in at most one department; a dept can have many employees.
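One way to see what the at-most-one-department rule means once the diagram becomes a table is the minimal sketch below. It assumes Python's built-in sqlite3 module and a table named after Works_In whose key is ssn alone. The column types, the sample values, and the helper name try_assign are illustrative assumptions, not part of the slides.

```python
import sqlite3

# In-memory database. Table and column names follow the Works_In diagram;
# the types and sample values are assumptions made for this sketch.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE Works_In (
        ssn   TEXT PRIMARY KEY,   -- key on ssn alone: at most one department per employee
        did   INTEGER NOT NULL,   -- many employees may share the same did
        since TEXT
    )
    """
)


def try_assign(ssn, did, since):
    """Hypothetical helper: True if the row was accepted, False if the key rejected it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO Works_In VALUES (?, ?, ?)", (ssn, did, since))
        return True
    except sqlite3.IntegrityError:
        return False


first = try_assign("999-21-3415", 51, "2003-01-01")   # employee joins dept 51
second = try_assign("999-31-2356", 51, "2004-01-01")  # another employee, same dept
third = try_assign("999-21-3415", 56, "2005-01-01")   # same employee, second dept
print(first, second, third)  # prints: True True False
```

If the key were declared as (ssn, did) instead, the third insert would be accepted, which corresponds to the many-to-many reading of Works_In on the previous slide.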
[Figure: relationship set Works_In (attribute since) between Employees (ssn, name, lot) and Departments (did, dname, budget)]

Key Constraints
In contrast, each dept has at most one manager, according to the key constraint on Manages. At most one!
[Figure: relationship set Manages (attribute since) between Employees (ssn, name, lot) and Departments (did, dname, budget); label: Key Constraint (time constraint)]

Participation Constraints
Does every department have a manager?
– If so, this is a participation constraint: the participation of Departments in Manages is said to be total (vs. partial).
Every did value in the Departments table must appear in a row of the Manages table (with a non-null ssn value!).
[Figure: Manages and Works_In (each with attribute since) between Employees (ssn, name, lot) and Departments (did, dname, budget); edge labels: Partial, Total, Total w/key constraint, Total]

What are the policies behind this ER model?
[Figure: the same Manages/Works_In diagram with edge labels Total, Total, Total w/key constraint, Total]

Any Difference?
[Figure: two versions of the Manages/Works_In diagram, one without participation labels and one with edge labels Partial, Total, Total w/key constraint, Total]

Weak Entities vs. Owner Entities
A weak entity can be identified uniquely only by considering the primary key of another (owner) entity.
– The owner entity set and the weak entity set must participate in a one-to-many relationship set (1 owner, many weak entities).
– The weak entity set must have total participation in this identifying relationship set.
[Figure: Employees (ssn, name, lot) connected through identifying relationship Policy (cost) to weak entity set Dependents (pname, age); label: Primary Key for weak entity]

Ternary Relationship
[Figure: relationship set Works_In3 among Employees (ssn, name, lot), Departments (did, dname, budget), and Duration (from, to)]
Why?
[Figure: relationship set Works_In (attribute since) between Employees (ssn, name, lot) and Departments (did, dname, budget)]

ISA (“is a”) Hierarchies
As in C++, or other PLs, attributes are inherited.
If we declare A ISA B, every A entity is also considered to be a B entity.
[Figure: Employees (ssn, name, lot) with ISA subclasses Hourly_Emps (hourly_wages, hours_worked) and Contract_Emps (contractid)]
Overlap constraints: Can Joe be an Hourly_Emps as well as a Contract_Emps entity? (Allowed/disallowed)
Covering constraints: Does every Employees entity also have to be an Hourly_Emps or a Contract_Emps entity? (Yes/no)
Reasons for using ISA:
– To add descriptive attributes specific to a subclass.
– To identify entities that participate in a relationship.

Aggregation
Used when we have to model a relationship involving (entity sets and) a relationship set.
– Aggregation allows us to treat a relationship set as an entity set for purposes of participation in (other) relationships.
– Monitors is mapped to a table like any other relationship set.
[Figure: Employees (ssn, name, lot) connected by Monitors (until) to the aggregation of Sponsors, which relates Projects (pid, started_on, pbudget) and Departments (did, dname, budget)]

Real Database Design
Build an ER diagram for the following information:
– Walmart Stores: Store Id, Address, Phone #
– Products: Product Id, Description, Price
– Manufacturers: Name, Address, Phone #
– Walmart Stores carry products: Amount in store
– Manufacturers make products: Amount in factory/warehouses

Conceptual Design Using the ER Model
Design choices:
– Should a concept be modeled as an entity or an attribute?
– Should a concept be modeled as an entity or a relationship?
– Identifying relationships: Binary or ternary? Aggregation?

Entity vs.
Attribute
Should address be an attribute of Employees or an entity (connected to Employees by a relationship)?
Depends upon the use we want to make of address information, and the semantics of the data:
– If we have several addresses per employee, address must be an entity (since attributes cannot be set-valued).
– If the structure (city, street, etc.) is important, e.g., we want to retrieve employees in a given city, address must be modeled as an entity (since attribute values are atomic).

Binary vs.
Ternary Relationships (Contd.)
[Figure: Works_In3 drawn two ways. First: among Employees (ssn, name, lot), Departments (did, dname, budget), and Duration (from, to). Second: between Employees and Departments, with attributes from and to.]
The previous example illustrated a case where two binary relationships were better than one ternary relationship.
An example in the other direction: a ternary relationship set Contracts relates entity sets Parts, Departments, and Suppliers, and has the descriptive attribute qty. No combination of binary relationships is an adequate substitute:
– S “can-supply” P, D “needs” P, and D “deals-with” S do not imply that D has agreed to buy P from S.
– How do we record qty?

Summary of Conceptual Design
Conceptual design follows requirements analysis.
– It yields a high-level description of the data to be stored.
The ER model is popular for conceptual design.
– Its constructs are expressive, close to the way people think about their applications.
Basic constructs: entities, relationships, and attributes (of entities and relationships).
Some additional constructs: weak entities, ISA hierarchies, and aggregation.
Note: There are many variations on the ER model.

Summary of ER (Contd.)
Several kinds of integrity constraints can be expressed in the ER model: key constraints, participation constraints, and overlap/covering constraints for ISA hierarchies. Some foreign key constraints are also implicit in the definition of a relationship set.
– Some constraints (notably, functional dependencies) cannot be expressed in the ER model.
– Constraints play an important role in determining the best database design for an enterprise.

Summary of ER (Contd.)
ER design is subjective. There are often many ways to model a given scenario! Analyzing alternatives can be tricky, especially for a large enterprise.
Common choices include:
– Entity vs. attribute, entity vs. relationship, binary or n-ary relationship, whether or not to use ISA hierarchies, and whether or not to use aggregation.
Ensuring good database design: the resulting relational schema should be analyzed and refined further. FD information and normalization techniques are especially useful.

ERwin ER Modeling Tool
http://www.cai.com/products/alm/erwin.htm
Demo ERwin and its capabilities
– Open the sample movies model: Erwin_3.5.2/models
– Build Example 2 using ERwin

Homework Assignment
Problem 2.4 at the end of Chapter 2 (page 53)
Due next Thursday: hard copy to instructor
Format for homework: name, ID.

Homework Assignment
[Figure: ER diagram with Employees (ssn, name, lot), Departments (did, dname, budget), and Dependents (pname, age); relationship sets Manages (since), Works_In (since), and Policy (cost); edge labels: Partial, Total, Total w/key constraint, partial, Total, Key/total]
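The slides note that an ER diagram can be mapped into a relational schema and that queries on multiple tables are allowed. As a rough sketch of both points for Example 1 (Students, Courses, Enrolled_In), the code below uses Python's built-in sqlite3 module. The column types, the choice of (sid, cid) as the key of Enrolled_In, and the grade values are assumptions made for illustration; the student and course values are taken from the sample rows earlier in the deck.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled
conn.executescript(
    """
    CREATE TABLE Students (
        sid TEXT PRIMARY KEY, name TEXT, login TEXT, age INTEGER, gpa REAL
    );
    CREATE TABLE Courses (
        cid TEXT PRIMARY KEY, name TEXT, credit INTEGER
    );
    -- The relationship set becomes its own table: one row per (student, course)
    -- pair, with the descriptive attribute grade stored on the relationship.
    CREATE TABLE Enrolled_In (
        sid   TEXT REFERENCES Students(sid),
        cid   TEXT REFERENCES Courses(cid),
        grade TEXT,
        PRIMARY KEY (sid, cid)
    );
    """
)

conn.executemany(
    "INSERT INTO Students VALUES (?, ?, ?, ?, ?)",
    [("999-21-3415", "John Smith", "jsmith@", 18, 3.5),
     ("999-31-2356", "Jie Zhang", "jzhang@", 20, 3.0)],
)
conn.executemany(
    "INSERT INTO Courses VALUES (?, ?, ?)",
    [("3160", "DBMS", 3), ("6157", "Visual DB", 3)],
)
# Grades below are made up for the sketch.
conn.executemany(
    "INSERT INTO Enrolled_In VALUES (?, ?, ?)",
    [("999-21-3415", "3160", "A"),
     ("999-21-3415", "6157", "B"),
     ("999-31-2356", "3160", "B")],
)

# A query across all three tables: who is enrolled in what, with which grade.
rows = conn.execute(
    """
    SELECT s.name, c.name, e.grade
    FROM Enrolled_In AS e
    JOIN Students AS s ON s.sid = e.sid
    JOIN Courses  AS c ON c.cid = e.cid
    ORDER BY s.name, c.name
    """
).fetchall()
for row in rows:
    print(row)

# A relationship row must refer to existing entities: an unknown sid is rejected.
try:
    conn.execute("INSERT INTO Enrolled_In VALUES ('000-00-0000', '3160', 'C')")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True
print(orphan_rejected)
```

The foreign keys in Enrolled_In are an instance of the point made in the summary: some foreign key constraints are implicit in the definition of a relationship set.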