Entity Relationship Model (ER Model) Dimitra Vista 1 Levels of Design Database applications are large and sophisticated • Requirements Analysis – what are we building? • Conceptual database design – high level description of the data • Logical database design – formal schema (must choose a DBMS) • Schema refinement – reduce data redundancy • Physical database design – choose indexes, storage, etc. • Application and security design – complete software design Domain experts must drive requirements Dimitra Vista 2 Conceptual Design • Conceptual database design – high level description of the data • Collect user requirements • Information that needs to be represented • Operations that need to be supported • Several techniques for representing this information (e.g., UML) • Develop a conceptual schema for the database • A high-level representation • Follows a specific data model (ER model) • Often represented graphically (ER diagram) Dimitra Vista 3 Conceptual Design – Why Do We Care? • Typical large company • 1000s of databases (90% of them relational) • 100s of tables in each database • 50-200 columns in each table • Huge amount of effort to resolve inconsistencies / violations • It is hard to design applications Dimitra Vista 4 Entity-Relationship Model (ER Model) • Used for the conceptual design of a database • Introduced by Chen in 1976 • A very common model for database design • Allows the specification of the schema in graphical terms • A picture is worth a thousand words! • Easier for other participants to understand • Basic concepts are simple, but model allows for sophisticated concepts • Can be mapped automatically to the relational model Key insight: Some designs are clearly wrong, but there are often multiple correct ones! Dimitra Vista 5 ER Model Basics • Entity - a real-world object distinguishable from other objects, e.g., Mary Jones, CS123 • Entity set - a collection of similar entities, where similar means (for now) that they share the same attributes • Attribute - a characteristic of an entity, e.g., name, age, social security number (SSN) Dimitra Vista 6 Alternative Representations Dimitra Vista 7 Keys • Each entity set must have a key attribute (a.k.a. an identifier) – a key uniquely identifies an entity within the entity set – a key is either simple (1 attribute) or composite (multiple attributes) • Keys are important! They allow us to: – unambiguously refer to an entity within a set – link together entities across entity sets Dimitra Vista 8 Attributes • Each attribute has a domain - a set of possible values (e.g., GPA, gender, name) • Required vs. optional (e.g., last name vs. middle name) • Simple vs. composite (e.g., age vs. address) • Single-valued vs. multi-valued (e.g., car make vs. color) • Stored vs. derived (e.g., DOB vs. age) - advantages / disadvantages? Dimitra Vista 9 Relationships and Relationship Sets • A relationship is an association between two or more entities, e.g., Jane works in IT • A relationship set is a collection of similar relationships, e.g., Employees work in Departments Dimitra Vista 10 Descriptive Attributes A relationship set can have descriptive attributes Dimitra Vista 11 N-ary Relationship Sets An n-ary relationship set R associates n entity sets E1, E2, ...., En. Each relationship r relates entities e1∈ E1, e2 ∈ E2, ..., en∈ En. This is a ternary relationship Dimitra Vista 12 What is the difference? • Should a relationship be a ternary relationship or two binary relationships? Lecturer Lecturer teaches lectures Course Room vs Course Dimitra Vista teaches Room 13 What is the difference? • Should a relationship be a ternary relationship or two binary relationships? Lecturer Lecturer teaches lectures Course Room vs Course teaches Room Problem if the lecturer teaches more than one course: Second design cannot represent which room is assigned for which course Dimitra Vista 14 Recursive Relationship Sets The same entity set can take on different roles in the same relationship set. This is called a recursive relationship set. Dimitra Vista 15 An Instance of a Relationship Set • A relationship belongs to (is an element of) a relationship set, and so must be uniquely identifiable by participating entities. • Meaning: each (employee, department) pair appears in the relationship set only once. Because it’s a set! Dimitra Vista 16 Examples Draw ER diagrams representing entity sets and relationship sets described below. Cars(vin, plate, state) Parking(lot, address) Assigned_To() vin plate Car Dimitra Vista lot state assigned_do address Parking 17 Entity vs Attribute • It is not always easy to tell whether an attribute warrants creating an entity set of its own • Difficult cases – composite attributes, e.g., address – multi-valued attributes, e.g., colors of a car (exterior /interior) • Rule of thumb: – composite attributes should be modeled as entity sets if we care about their structure • Rule of thumb: – multi-valued attributes should be modeled as entity sets Dimitra Vista 18 Composite Attributes • Composite attributes allow us to divide attributes into subparts (other attributes). street name city state address EMPLOYEES Dimitra Vista 19 What is the difference? street street city city state state Address name address EMPLOYEES Dimitra Vista name lives vs EMPLOYEES 20 What is the difference? street street city city state state Address name address EMPLOYEES name lives vs EMPLOYEES 1) The second permits multiple addresses for an employee 2) Composite attributes should be modeled as entity sets if we care about their structure Dimitra Vista 21 Multi-Valued Attributes • Multi-valued attributes allow us to have attributes that can have multiple values ssn name phone EMPLOYEES Dimitra Vista 22 Multi-Valued Attributes vs Entity Set? • Multi-valued attributes allow us to have attributes that can have multiple values area name ssn EMPLOYEES phone number Phone vs ssn name has EMPLOYEES Rule of thumb: multi-valued attributes should be modeled as entity sets Dimitra Vista 23 Derived Attributes • Derived attributes allow us to compute rather than store the value of an attribute ssn dob age EMPLOYEES Dimitra Vista 24 What’s wrong with this design? Dimitra Vista 25 What’s wrong with this design? • For each pair of employee and department you can have only one value for from/to • What’s a better design? Dimitra Vista 26 A Better Design Dimitra Vista 27 Examples Draw ER diagrams representing entity sets and relationship sets described below. • Courses are offered by departments. A course may be offered in multiple terms, and each offering must be recorded. cid did name Courses Offered_by name Departments term Offering Dimitra Vista 28 Examples Draw ER diagrams representing entity sets and relationship sets described below. • Courses are offered by departments. A course may be offered in multiple terms, and only the most recent offering must be recorded. cid name Courses Dimitra Vista term Offered_by did name Departments 29 ER Model – Beyond the Basics • cardinality • key constraints • participation constraints • weak entities • ISA (is a) relationship • entity clustering Dimitra Vista 30 Cardinality Dimitra Vista 31 Constraining Relationships • Key constraints: entity relates at most once Teacher • teaches Participation constraints: entity relates at least once Teacher Dimitra Vista teaches 32 Key Constraints • Business rule: A professor may teach 0, 1 or multiple classes. Each class is taught by at most one professor. Dimitra Vista 33 Key Constraints • Business rule: A professor teaches at most one class. Each class is taught by at most one professor. Dimitra Vista 34 Participation Constraints Business rule: A professor may teach 0, 1 or multiple classes. Each class is taught by some professor. Dimitra Vista 35 Participation Constraints Business rule: Each professor teaches some class. Each class is taught by some professor. Dimitra Vista 36 Participation Constraints Business rule: Each professor teaches some class. Each class is taught by exactly one professor. Dimitra Vista 37 Explain the Business Rules Lecturer teaches Course Business rules: Each lecturer teaches at most one course Each course must be taught by some lecturer Dimitra Vista 38 Explain the Business Rules Student favorite Course Business rules: Each student has exactly one favorite course A course does not have to be the favorite of some student Dimitra Vista 39 Explain the Business Rules Business rules: Each department is managed by exactly one employee. Each employee works in some department. Each department has some employee working in it. Dimitra Vista 40 Example Employees have managers. An employee has at most one manager. A manager may manage any number of employees. Dimitra Vista 41 Example (Solution) Employees have managers. An employee has at most one manager. A manager may manage any number of employees. Dimitra Vista 42 Example Professors advise students. Each professor advises at most one student. Each student has exactly one advisor. Dimitra Vista 43 Example (Solution) Professors advise students. Each professor advises at most one student. Each student has exactly one advisor. Dimitra Vista 44 Example • An author is described by a name and a date of birth (dob).No two authors have the same combination of name and dob. • A book is described by an ISBN, a title and a year when it was written (year). No two books have the same ISBN number. • A book is written by exactly one author. • An author is only included in the database is she has authored at least one book. Dimitra Vista 45 Example (Solution) • An author is described by a name and a date of birth (dob).No two authors have the same combination of name and dob. • A book is described by an ISBN, a title and a year when it was written (year). No two books have the same ISBN number. • A book is written by exactly one author. • An author is only included in the database is she has authored at least one book. Dimitra Vista 46 Example • A scientist is described by a name and a date of birth (dob). No two scientists have the same combination of name and dob. • A discovery has a name. No two discoveries have the same name. • An award has a name, which uniquely identifies it. • Scientists make discoveries. A discovery is made by one or several scientists. A scientist who made no discoveries is not tracked in our database. • Scientists receive awards. A scientist may receive the same award more than once in different years. All years in which a scientist received a particular award must be recorded. Dimitra Vista 47 Example (Solution) • A scientist is described by a name and a date of birth (dob). No two scientists have the same combination of name and dob. • A discovery has a name. No two discoveries have the same name. • An award has a name, which uniquely identifies it. • Scientists make discoveries. A discovery is made by one or several scientists. A scientist who made no discoveries is not tracked in our database. • Scientists receive awards. A scientist may receive the same award more than once in different years. All years in which a scientist received a particular award must be recorded. Dimitra Vista 48 Example Chefs work at restaurants. A chef is uniquely identified by a SSN, and is also described by a name and a cuisine in which she specialized. A restaurant is uniquely identified by a combination of name and city. Each chef works in at least one restaurant, and each restaurant must have at least one chef working at it. Some chefs own restaurants, and if a chef owns a restaurant - she is its sole owner. Dimitra Vista 49 Example (Solution) Chefs work at restaurants. A chef is uniquely identified by a SSN, and is also described by a name and a cuisine in which she specialized. A restaurant is uniquely identified by a combination of name and city. Each chef works in at least one restaurant, and each restaurant must have at least one chef working at it. Some chefs own restaurants, and if a chef owns a restaurant - she is its sole owner. Dimitra Vista 50 A Note on Keys • A key in a binary relationship is clear Lecturer A lecturer teaches at most one course If you know the lecturer, you know the course teaches Course Dimitra Vista 51 A Note on Keys A key in a ternary relationship is not so clear • Lecturer Lecturer Course teaches Lecturer is the key Dimitra Vista Room Course teaches Room Lecturer is a key Course is also a key 52 Key and Participation Constraints: Recap • Key constraint – denoted by a line with an arrowhead towards the relationship set – occurs only in 1-to-many or 1-to-1 relationship sets – example: a course is taught by one / at most one / exactly one professor • Participation constraint – denoted by a bold line towards the relationship set – can occur on the 1 or on the many side of a relationship set – example: a course must be taught by a professor – example: a course is taught by at least one professor Dimitra Vista 53 Weak Entities • The identifying owner entity and the weak entity must participate in a one-to-many relationship set. This relationship set is called the identifying relationship set of the weak entity. • The weak entity must have total participation in the identifying relationship set. Dimitra Vista 54 In Summary At most one At least one Exactly one Weak entity Dimitra Vista 55 Extended ER-Model • Aggregation (aka entity clustering) – Models the relationships of a relationship • ”is-a” relationship – Like subclasses in an object-oriented language – Avoids data redundancy Dimitra Vista 56 Extended ER Model - Aggregation • Aggregation is used when we have to model a relationship with another relationship • Allows us to treat a relationship set as an entity set for purposes of participation in (other) relationships Dimitra Vista 57 A Night at the Opera “Let’s develop the conceptual data model for an opera production.” • What are the entities? • What are the relationships? • What are the business rules? Dimitra Vista 58 Opera: Entity Sets • Opera(title, year) - may have more than one opera by the same title, but not in the same year (e.g., Otello - Verdi 1887 /Rossini 1816) • Composer (name, birth, death) - may have more than one composer with the same name, but not also with the same birth and death years • Librettist (name, birth, death) • Company (name) • Venue (name, city) • Production - an entity set or a relationship set, linking Opera and Company? • Performance - an entity set or a relationship set, linking Production and Venue? Dimitra Vista 59 Opera: Relationship Sets • An Opera is written by Composer / Librettist – each Composer composed at least one Opera – each Librettist wrote the libretto for at least one Opera • An Opera by the same name may be written by a different (Composer, Librettist) pair • A piece is produced by a Company • A production is performed at a Venue Dimitra Vista 60 Opera: The Model Dimitra Vista 61 ISA (is a) Relationships • A ”ISA” B means that every A is also a B – A inherits all the attributes from A – Use them to add descriptive attributes only to the subclass – Use it to describe relationships with the superclass name ssn PERSON salary INSTRUCTORS Dimitra Vista ISA gpa STUDENTS 62 Is-A Relationships • Like subclasses in an object-oriented language • They reduce data redundancy (you don’t have to repeat data that exist in the parent) – Subclasses inherit attributes from the parent – Subclasses inherit relationships from the parent No multiple inheritance, single parent hierarchy (tree) !!! Dimitra Vista 63 ISA (is-a) Relationships • Overlap constraints determine whether two subclasses are allowed to contain the same entity • Covering constraints determine whether the entities in the subclasses collectively include all entities in the superclass PERSON ISA INSTRUCTORS PERSON Overlaps STUDENTS A person can be both an instructor and a student Dimitra Vista ISA INSTRUCTORS Covers STUDENTS A person must be either an instructor or a student 64 Conceptual Design Using The ER Model Design Choices • Entity vs attribute? • Entity vs relationship? • Binary vs ternary relationship? • Aggregation vs ternary relationship? Constraints • A lot of data semantics can (and should) be captured • But some constraints cannot be captured in ER diagrams • Check the book for more examples and more explanations Dimitra Vista 65 Design Choices: Entity or Attribute? • Should address be an attribute of Employee or an entity (connected to Employee by a relationship? ssn Employee num ssn address vs Employee has street city Address It depends. • If we have several addresses per employee, address must be an entity (since attributes cannot be set-valued) • If the address is important as a structure (e.g., we are doing locationbased analytics), address must be modeled as an entity Dimitra Vista 66 Design Choices: Entity or Relationship? • Should visit be a relationship or an entity? date date Patient • visit Visit Doctor vs Patient sees Doctor If visit is a relationship only one date is allowed Dimitra Vista 67 Design Choices: Binary or Ternary Relationship? • Should a relationship between 3 entities be a ternary relationship or two binary relationships? Policy Policy Employee • covers Dependent vs purchases covers Employee Dependent First design does not allow a policy to cover multiple dependents Dimitra Vista 68 Design Choices: Binary or Ternary Relationship? • Should a relationship between 3 entities be a ternary relationship or two binary relationships? Part Supplier Part contract Department vs cansupply needs Supplier Department qty • In the second design, where should the qty attribute go? Dimitra Vista 69 Design Choices: Ternary Relationship or Aggregation? Should a relationship be a ternary relationship or an aggregation? • Opera Company Opera written written Composer produced Composer Company vs • First design does not allow an opera to be written unless it is produced by a company Dimitra Vista 70 In Summary • Lots of subtle points • In database design, these choices matter a lot! • Hard to consider all possible scenarios in advance, but in principle, if you want to work with a database, you must make those choices Database design is an iterative process! Dimitra Vista 71 ER Modeling Tools Many of them translate to SQL automatically • ER/Studio • Enterprise Architect • ERWin • PgModeler, PgDesigner • UML tools • and many others https://dbmstools.com/categories/database-diagram-tools/postgresql Precise visualization may differ, concepts are similar Dimitra Vista 72 Alternative Visualization This is called crow’s foot notation Dimitra Vista 73 Music Brainz Database Dimitra Vista 74 Lessons • Lesson 1: A picture is worth a thousand words • Lesson 2: Conceptual modeling follows requirements analysis • Lesson 3: ER models are not complete, but approximate designs (subjective) Dimitra Vista 75 Acknowledgements Some slides in this course are inspired or copied from • https://www.cs.drexel.edu/~julia/cs461/ by Julia Stoyanovich, Drexel U, Spring 2018 • https://sfu-db.github.io/cmpt354/ by Jiannan Wang - Simon Fraser, Fall 2018 • https://www.db-book.com/db7/slides-dir/index.html by Avi Silberschatz, Henry F. Korth, S. Sudarshan • https://www.youtube.com/watch?v=i7iugcmrBJ8&list=PLXPbT_PYOiRipf X8zrv_9EpnSOpK9P__j, Immanuel Trummer, Connell University, Intro to Database Systems • https://www.youtube.com/playlist?list=PLSE8ODhjZXjbohkNBWQs_otTrB Trjyohi, Andy Pavlo, CMU, Intro to Database Systems Dimitra Vista 76