Chapter 2: Entity-Relationship Model
• Basic Concepts
• Constraints
• Design Issues
• Entity-Relationship Diagram
• Strong and Weak Entity Sets
• Extended ER Features
• Mapping an ER Schema to Tables

Entity-Relationship Model
• ER modeling is a graphical technique for understanding and organizing data independently of the actual database implementation.
• OR: the ER model describes data as entities, relationships and attributes. It is a technique used to analyze and model the data in an organization by means of an Entity-Relationship (E-R) diagram.
• The basic notions of the ER model are:
  – Entity types / sets
  – Relationship types / sets
  – Attributes

Entity Types, Entity Sets, Keys and Value Sets
• Entity: a thing or object in the real world that is distinguishable from all other objects. OR: anything in the real world that has an independent existence and about which we intend to collect data.
• An entity may have a physical existence (a person, book, car or house) or a conceptual existence (a company, a job or a university course).
• Each entity has attributes, the particular properties that describe it.
• Entity type: a set of entities of the same type that share the same attributes. Each entity type is described by its name and attributes. OR: a collection of entities that share common properties or characteristics.
• Entity set: the collection of all entities of a particular entity type in the database at any point in time.
• Extension of the entity set: the individual entities that constitute the set.
• Entity sets need not be disjoint (for example, a person can be both a customer and an employee).
• In an ERD, an entity type is represented by a rectangular box enclosing the type name.

Key Attributes of an Entity Type
• Key attribute: an attribute whose values are distinct for each individual entity in the entity set, so its values can be used to identify each entity uniquely
  – e.g. the account number of an account, the employee id of an employee
• Composite key: if several attributes together form a key, the combination of their values must be distinct for each entity. A composite key must be minimal.
• In an ERD, each key attribute has its name underlined inside the oval / ellipse.

Value Sets (Domains) of Attributes
• Each simple attribute of an entity type is associated with a value set, or domain, which specifies the set of values that may be assigned to that attribute for each individual entity
  – If the ages allowed for employees range from 16 to 70, the value set of the age attribute is the set of integers between 16 and 70
  – The value set of the name attribute is the set of strings of alphabetic characters separated by blank spaces
• Value sets are not displayed in an ERD. They are typically specified using the basic data types available in programming languages.

Attributes
• An entity is represented by a set of attributes.
• Attributes are descriptive properties possessed by each member of an entity set.
• Each entity may have its own value for each attribute.
• The set of permitted values for an attribute is called the domain of the attribute
  – The domain of the attribute marital status is the set of four values: single, married, divorced, widowed
  – The domain of the attribute month is the set of twelve values from January to December

Types of Attributes
Simple vs. Composite Attributes
• Simple / atomic: cannot be divided into smaller subparts / simpler components
  – the age of an employee
• Composite: can be divided into smaller subparts / simpler components, each of which represents an attribute with an independent meaning
  – the joining date of an employee can be divided into day, month and year
• If a composite attribute is referred to only as a whole, there is no need to divide it into component attributes; the whole composite attribute can be treated as a simple attribute.
(Figure: Composite attributes)

Single-Valued vs. Multi-Valued Attributes
• Single-valued: can take only a single value for each particular entity
  – the age of a person.
    There can be only one value for this.
• Multi-valued: can take a set of values for the same entity
  – the skill set of an employee, the colors of a car, the degrees of a person, the dependents of an employee, the nominees of an account holder

Stored vs. Derived Attributes
• Stored: attribute values that need to be stored permanently
  – the name of an employee
• Derived: attribute values that can be calculated / derived from the values of other attributes
  – the years of service of an employee can be calculated from the date of joining and the current date

Null Attributes
• An attribute takes a null value when an entity does not have a value for it. A null value may indicate "not applicable" (the value does not exist for the entity) or that the value is unknown. An unknown value may be either missing (the value exists but is not recorded) or not known (it is not known whether the value exists).
  – e.g. the telephone number of a person

Relationship Types, Relationship Sets, Roles and Constraints
• A relationship is an association among several entities.
  – Example: the customer entity Hayes is associated with the account entity A-102 through the relationship set depositor.
• A relationship set is a mathematical relation among n ≥ 2 entities, each taken from an entity set (the entity sets need not be distinct):
  $\{(e_1, e_2, \ldots, e_n) \mid e_1 \in E_1, e_2 \in E_2, \ldots, e_n \in E_n\}$
  where $(e_1, e_2, \ldots, e_n)$ is a relationship
  – Example: $(\text{Hayes}, \text{A-102}) \in \text{depositor}$
(Figure: Relationship set borrower)
• An attribute can also be a property of a relationship set; such an attribute is called a descriptive attribute.
  – For instance, the depositor relationship set between the entity sets customer and account may have the attribute access-date.

Degree of a Relationship Set
• The degree is the number of entity sets that participate in a relationship set:
  – one entity set: unary
  – two entity sets: binary
  – three entity sets: ternary
• Relationship sets that involve two entity sets are binary (of degree two). Most relationship sets in a database system are binary.
• Relationship sets may involve more than two entity sets.
  – E.g. suppose employees of a bank may have jobs (responsibilities) at multiple branches, with different jobs at different branches.
    Then there is a ternary relationship set among the entity sets employee, job and branch.
• Relationships among more than two entity sets are rare; most relationships are binary.

Mapping Constraints
• An E-R schema may define certain constraints to which the contents of the database must conform. The two most important types of constraints are:
  – Existence dependencies
  – Mapping cardinalities

Mapping Constraints: Existence Dependencies
• If the existence of entity x depends on the existence of entity y, then x is said to be existence dependent on y: if y is deleted, so is x. Entity y is said to be the dominant entity and x the subordinate entity.

Mapping Constraints: Mapping Cardinalities
• Express the number of entities to which another entity can be associated via a relationship set.
• Most useful in describing binary relationship sets.
• For a binary relationship set, the mapping cardinality must be one of the following types:
  – One to one
  – One to many
  – Many to one
  – Many to many
(Figure: Mapping cardinalities. Some elements in A and B may not be mapped to any elements in the other set.)

E-R Diagram
• Expresses the overall logical structure of a database graphically. Its major components are:
• Rectangles represent entity sets.
• Diamonds represent relationship sets.
• Lines: single lines link attributes to entity sets and entity sets to relationship sets; double lines indicate total participation of an entity set in a relationship set.
• Ellipses represent attributes
  – Double ellipses represent multi-valued attributes.
  – Dashed ellipses denote derived attributes.
• Underlining indicates primary key attributes.
(Figure: E-R diagram with composite, multi-valued and derived attributes)
(Figure: Relationship sets with attributes)

E-R Diagram: Cardinalities
• Relationships can have different connectivity:
  – one-to-one (1:1)
  – one-to-many (1:N)
  – many-to-one (M:1)
  – many-to-many (M:N)
• OR: we express cardinality constraints by drawing either a directed line (→), signifying "one", or an undirected line, signifying "many", between the relationship set and the entity set.
• Examples:
  – Employee head-of Department (1:1)
  – Lecturer offers Course (1:N), assuming each course is taught by a single lecturer
  – Student enrolls-in Course (M:N)
• The minimum and maximum values of this connectivity are called the cardinality of the relationship.

Entity's Role
• The entity sets of a relationship need not be distinct.
• The labels "manager" and "worker" are called roles; they specify how employee entities interact via the works-for relationship set.
• Roles are indicated in E-R diagrams by labeling the lines that connect diamonds to rectangles.
• Role labels are optional and are used to clarify the semantics of the relationship.
• The function that an entity plays in a relationship is called that entity's role.

Participation of an Entity Set in a Relationship Set
• Total participation (indicated by a double line): every entity in the entity set participates in at least one relationship in the relationship set.
• Partial participation: some entities may not participate in any relationship in the relationship set.
(Figure: Participation of an entity set in a relationship set)
(Figure: Alternative notation for cardinality limits)

Weak Entity Types
• Strong / regular entity type: an entity type that has a key attribute of its own
  – Employee, Student, Customer, Department
• Weak / child / subordinate entity type: an entity type that does not have a key attribute of its own
  – the dependents of an employee, the nominees of an account holder
• Identifying / owner entity type: entities belonging to a weak entity type are identified by being related to specific entities of another entity type, in combination with one of their own attribute values. That other entity type is called the identifying / owner / parent / dominant entity type.
(Figure: Weak entity example)
• Identifying relationship: the relationship that relates a weak entity type to its owner.
• A weak entity type always has a total participation constraint (i.e. an existence dependency) with respect to its identifying relationship.
• Not every existence dependency results in a weak entity type
  – A Driver_license entity cannot exist unless it is related to a Person entity, but it has its own key, so Driver_license is not a weak entity type.

Weak Entity
• In an ERD, a weak entity type and its identifying relationship are drawn as a double-lined rectangle and a double-lined diamond respectively. The partial key attribute is underlined with a dashed or dotted line.

Design Issues
• Use of entity sets vs. attributes
  The choice mainly depends on the structure of the enterprise being modeled and on the semantics associated with the attribute in question.
• Use of entity sets vs. relationship sets
  A possible guideline is to designate a relationship set to describe an action that occurs between entities.
• Binary vs. n-ary relationship sets
  Although it is possible to replace any non-binary (n-ary, for n > 2) relationship set by a number of distinct binary relationship sets, an n-ary relationship set shows more clearly that several entities participate in a single relationship.
• Placement of relationship attributes

An Example Database Application
• The company is organized into departments. Each department has a unique name, a unique number and a particular employee who manages the department. We keep track of the start date when that employee began managing the department. A department may have several locations.
• A department controls a number of projects, each of which has a unique name, a unique number and a single location.
• We store each employee's name, number, address, salary, gender and date of birth (DOB). An employee is assigned to one department but may work on several projects, which are not necessarily controlled by the same department. We keep track of the direct supervisor of each employee.
• We want to keep track of the dependents of each employee for insurance purposes. We keep each dependent's first name, gender, DOB and relationship to the employee.

Enhanced / Extended ER Modeling (EER)
• Since the late 1970s, designers of database applications have needed more accurate database schemas that reflect data properties and constraints more precisely, for newer database applications such as GIS (Geographic Information Systems) and CAD/CAM. These databases have more complex requirements than traditional applications. This led to the development of additional semantic data modeling concepts that were incorporated into conceptual data models such as the ER model.
  The EER model includes all the modeling concepts of the ER model.

Sub-classes, Super-classes and Inheritance
• Sub-classes
  – In many cases an entity type has sub-groupings of its entities that are meaningful and need to be represented explicitly because of their significance to the database application.
  – The entities of the Employee entity type may be grouped into Secretary, Engineer, Manager, Salaried_Employee and Hourly_Employee. Each subgroup is a sub-class of the Employee entity type, and Employee is called the super-class of each of these sub-classes.
• An entity in a sub-class is the same real-world entity as in the super-class, but in a distinct, specific role.
• The relationship between a super-class and any one of its sub-classes is called a super-class / sub-class or class / subclass relationship. It is often called an IS-A (or IS-AN) relationship.
• Type inheritance: an entity that is a member of a sub-class inherits all the attributes of that entity as a member of the super-class.
• A sub-class entity also inherits all the relationships in which the super-class participates.

Specialization
• Specialization is the process of defining a set of sub-classes of an entity type; this entity type is called the super-class of the specialization. The sub-classes are defined on the basis of some distinguishing characteristic of the entities in the super-class. It is a top-down process.
• In an EERD, the sub-classes that define a specialization are attached by lines to a circle, which is connected to the super-class. The subset symbol on each line connecting a sub-class to the circle indicates the direction of the super-class / sub-class relationship. Attributes that apply only to a sub-class are attached to the rectangle representing that sub-class and are called the local / specific attributes of the sub-class.
  For a disjoint specialization, the symbol "d" is written inside the circle ("o" denotes an overlapping specialization).
(Figure: Specialization in an EERD, with super-class Employee and sub-classes Secretary, Engineer and Manager)
(Figure: Specialization example)

Generalization
• The reverse process of abstraction, in which the differences among several entity types are suppressed: their common features are identified and generalized into a single super-class. It is a bottom-up process.
• The term generalization refers to the process of defining a generalized entity type from the given entity types.

Specialization and Generalization
• An entity set can have multiple specializations based on different features
  – permanent-employee vs. temporary-employee, in addition to secretary, manager, engineer
• Each particular employee would then be
  – a member of one of permanent-employee or temporary-employee,
  – and also a member of one of secretary, manager, engineer.

Design Constraints on a Specialization / Generalization
• Constraint on which entities can be members of a given lower-level entity type:
  – condition-defined
    E.g. all customers over 65 years are members of the senior-citizen entity set; senior-citizen ISA person.
  – user-defined

Design Constraints on a Specialization / Generalization
• Constraint on whether or not entities may belong to more than one lower-level entity type within a single generalization:
  – Disjoint: an entity can belong to only one lower-level entity type. Noted in an E-R diagram by writing "disjoint" next to the ISA triangle.
  – Overlapping: an entity can belong to more than one lower-level entity type.
• Completeness constraint: specifies whether or not an entity in the higher-level entity set must belong to at least one of the lower-level entity sets within a generalization
  – total: an entity must belong to one of the lower-level entity types
  – partial: an entity need not belong to any of the lower-level entity types
(Figure: Summary of the notation for ERDs)
(Figure: Alternative E-R notations)

Aggregation
• Consider the ternary relationship works-on among employee, branch and job.
• Suppose we want to record managers for tasks performed by an employee at a branch. A separate relationship set manages among employee, branch, job and manager would repeat information already held in works-on.
• Eliminate this redundancy via aggregation:
  – Treat the relationship as an abstract entity
  – This allows relationships between relationships
  – The relationship is abstracted into a new entity
(Figure: E-R diagram with aggregation)

Keys
• A super key of an entity set is a set of one or more attributes whose values uniquely determine each entity.
• A candidate key of an entity set is a minimal super key
  – customer-id is a candidate key of customer
  – account-number is a candidate key of account
• Although several candidate keys may exist, one of them is selected to be the primary key.
• Candidate key: a minimal set of attributes that identifies a row.
• Super key: any superset of a candidate key (at least as wide as a candidate key).
• Composite primary key: a primary key that is a combination of more than one attribute.
• Overlapping candidate keys: two candidate keys overlap if they have at least one attribute in common.
  – e.g., in an Employee table where {E#, Ename} and {Emailid, Ename} are both candidate keys, the two keys overlap because both contain Ename.

Example
• Consider a table with the fields <Name>, <Age>, <SSN> and <Phone Extension>. This table has many possible super keys; three of them are <SSN>, <Phone Extension, Name> and <SSN, Name>. <SSN, Name> is not a candidate key, because it contains SSN, which is already unique on its own. <SSN> is a candidate key, and <Phone Extension, Name> is a candidate key only if neither Phone Extension nor Name is unique by itself.
• Non-key attributes: the attributes other than the candidate key attributes, i.e. the attributes that do not participate in any candidate key.
• Foreign key: a "copy" of a primary key that has been exported from one relation into another to represent the existence of a relationship between them. A foreign key is a copy of the whole of its parent primary key, i.e. if the primary key is composite, then so is the foreign key.

Weak Entity Sets: Example
• A weak entity set is represented by a double rectangle.
• The discriminator of a weak entity set is underlined with a dashed line.
  – payment-number is the discriminator of the payment entity set
  – The primary key of payment is (loan-number, payment-number)
• Note: the primary key of the strong entity set is not explicitly stored with the weak entity set, since it is implicit in the identifying relationship.
• If loan-number were explicitly stored, payment could be made a strong entity, but then the relationship between payment and loan would be duplicated by an implicit relationship defined by the attribute loan-number, common to payment and loan.

Case Study: A College
Assumptions:
• A college contains many departments.
• Each department can offer any number of courses.
• Many instructors can work in a department.
• An instructor can work in only one department.
• Each department has a head.
• An instructor can be the head of only one department.
• Each instructor can teach any number of courses.
• A course can be taught by only one instructor.
• A student can enroll in any number of courses.
• Each course can have any number of students.

Steps in ER Modeling
• Identify the entities
• Find the relationships
• Identify the key attributes for every entity
• Identify other relevant attributes
• Draw the complete E-R diagram with all attributes, including primary keys
• Review your results with your business users

Step 1: Identify the Entities
• DEPARTMENT
• STUDENT
• COURSE
• INSTRUCTOR

Step 2: Find the Relationships
• A course is taken by multiple students and a student enrolls in multiple courses, hence the cardinality between course and student is many to many.
• A department offers many courses and each course belongs to only one department, hence the cardinality between department and course is one to many.
• A department has multiple instructors and an instructor belongs to one and only one department, hence the cardinality between department and instructor is one to many.
• Each department has a "head of department", and an instructor can be head of only one department, hence the cardinality of the head-of relationship between instructor and department is one to one.
• A course is taught by only one instructor, but an instructor teaches many courses, hence the cardinality between course and instructor is many to one.
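The cardinalities found in Step 2 can be cross-checked on a small sample instance by treating each binary relationship set as a set of (e1, e2) pairs, exactly as in the mathematical definition of a relationship set given earlier. The following Python sketch is illustrative only: the functions `cardinality` and `is_total`, the relationship name `employs` and all sample entity names are hypothetical. Note also that a sample instance can only be consistent with a constraint or violate it; it cannot prove that the constraint holds for every future state of the database.

```python
from collections import Counter


def cardinality(pairs):
    """Classify a binary relationship set between entity sets A and B, given as a
    set of (a, b) pairs, as '1:1', '1:N', 'N:1' or 'M:N' (read as A:B).
    The result only describes this particular instance."""
    per_a = Counter(a for a, _ in pairs)  # how many B entities each a is related to
    per_b = Counter(b for _, b in pairs)  # how many A entities each b is related to
    a_to_many = any(n > 1 for n in per_a.values())
    b_to_many = any(n > 1 for n in per_b.values())
    if a_to_many and b_to_many:
        return "M:N"
    if a_to_many:
        return "1:N"
    if b_to_many:
        return "N:1"
    return "1:1"


def is_total(entity_set, pairs, position):
    """True if every entity in entity_set appears at `position` (0 or 1) of at
    least one pair, i.e. the entity set participates totally."""
    participating = {pair[position] for pair in pairs}
    return entity_set <= participating


# Hypothetical sample instance of the college case study
departments = {"CS", "EE"}
instructors = {"Rao", "Iyer", "Khan"}

offers = {("CS", "DB"), ("CS", "OS"), ("EE", "Circuits")}        # (department, course)
employs = {("CS", "Rao"), ("CS", "Iyer"), ("EE", "Khan")}        # (department, instructor)
head_of = {("Rao", "CS"), ("Khan", "EE")}                        # (instructor, department)
teaches = {("DB", "Rao"), ("OS", "Rao"), ("Circuits", "Khan")}   # (course, instructor)
enrolls = {("S1", "DB"), ("S1", "OS"), ("S2", "DB")}             # (student, course)

if __name__ == "__main__":
    for name, rel in [("offers", offers), ("employs", employs), ("head_of", head_of),
                      ("teaches", teaches), ("enrolls", enrolls)]:
        print(name, cardinality(rel))
    print("every department has a head:", is_total(departments, head_of, 1))
    print("every instructor is a head:", is_total(instructors, head_of, 0))
```

On this instance, `teaches` (course, instructor) comes out as N:1, matching "many to one" above, and `head_of` shows that Department participates totally while Instructor participates only partially.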
Step 3: Identify the Key Attributes
• Deptname is the key attribute of the Department entity, as it identifies each department uniquely.
• Course# (CourseId) is the key attribute of the Course entity.
• Student# (Student Number) is the key attribute of the Student entity.
• Instructor Name is the key attribute of the Instructor entity.

Step 4: Identify Other Relevant Attributes
• Department: location
• Course: course name, duration, prerequisite
• Instructor: room#, telephone#
• Student: student name, date of birth

Summary of ER-Diagram Notation for ER Schemas
• Rectangle: entity type
• Double rectangle: weak entity type
• Diamond: relationship type
• Double diamond: identifying relationship type
• Ellipse: attribute
• Ellipse with underlined name: key attribute
• Double ellipse: multivalued attribute
• Ellipse connected to its component ellipses: composite attribute
• Dashed ellipse: derived attribute
• Double line between E2 and R: total participation of E2 in R
• "1" and "N" on the lines from E1 and E2 to R: cardinality ratio 1:N for E1:E2 in R
• (min, max) on the line between E and R: structural constraint (min, max) on the participation of E in R

(Figure: ER diagram; the entity types are EMPLOYEE, DEPARTMENT, PROJECT and DEPENDENT)

Designing an ER Diagram
Consider the following set of requirements for a university database. Design an ER diagram for this application:
• The university keeps track of each student's name, student number, social security number, current address and phone number, permanent address and phone number, birth date, sex, class (e.g. freshman, graduate), major department, minor department (if any) and degree program (B.A., B.S., ..., Ph.D.). Some user applications need to refer to the city, state and zip code of the student's permanent address and to the student's last name. Both the social security number and the student number are unique for each student. Every student has at least a major department.
• Each department is described by a name, department code, office number, office phone and college. Both the name and the code have unique values for each department.
• Each course has a course name, description, course number, number of credits, level and offering department. The course number is unique for each course.
• Each section has an instructor, semester, year, course and section number. The section number distinguishes sections of the same course that are taught during the same semester/year; its value is an integer (1, 2, 3, ..., up to the number of sections taught during each semester).
• A grade report must be generated for each student that lists the section, letter grade and numeric grade (0, 1, 2, 3 or 4) for each section taken, and calculates his or her average GPA.

(Figure: University ER diagram)
(Figure: ER diagram for a BANK database. © The Benjamin/Cummings Publishing Company, Inc. 1994, Elmasri/Navathe, Fundamentals of Database Systems, Second Edition)
(FIGURE 3.17: An ER diagram for an AIRLINE database schema)
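The university requirements combine several attribute types from earlier in the chapter: a composite name and a composite permanent address whose components (last name; city, state, zip) must be individually accessible, a numeric grade restricted to the value set {0, 1, 2, 3, 4}, and GPA, which is a derived attribute. The Python sketch below shows one way these could look in application code. All class and field names are hypothetical, and GPA is assumed to be the plain mean of the numeric grades, since the requirement does not say whether it is weighted by credits.

```python
from __future__ import annotations

from dataclasses import dataclass, field

VALID_NUMERIC_GRADES = {0, 1, 2, 3, 4}  # value set (domain) of the numeric grade


@dataclass
class Name:  # composite attribute: applications need the last name on its own
    first: str
    last: str


@dataclass
class Address:  # composite attribute: city, state and zip are referenced separately
    street: str
    city: str
    state: str
    zip_code: str


@dataclass
class GradeEntry:
    section_id: str  # hypothetical identifier of a Section entity
    letter_grade: str
    numeric_grade: int

    def __post_init__(self) -> None:
        # Reject values outside the domain of the numeric grade attribute
        if self.numeric_grade not in VALID_NUMERIC_GRADES:
            raise ValueError(f"numeric grade {self.numeric_grade} is not in 0..4")


@dataclass
class Student:
    student_number: str  # key attribute
    ssn: str             # also unique, so a second candidate key
    name: Name
    permanent_address: Address
    grades: list[GradeEntry] = field(default_factory=list)

    @property
    def gpa(self) -> float | None:
        """Derived attribute: computed from the grade entries, never stored."""
        if not self.grades:
            return None
        return sum(g.numeric_grade for g in self.grades) / len(self.grades)


if __name__ == "__main__":
    s = Student("17", "123-45-6789", Name("Ann", "Lee"),
                Address("1 Main St", "Springfield", "IL", "62701"))
    s.grades.append(GradeEntry("DB-1", "A", 4))
    s.grades.append(GradeEntry("OS-2", "B", 3))
    print(s.name.last, s.permanent_address.city, s.gpa)  # Lee Springfield 3.5
```

Because `gpa` is a property, it is recomputed whenever a grade entry is added, which is the practical meaning of a derived attribute not being stored.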
Reduction of an E-R Schema to Tables
• Each strong entity set becomes a table.
• Each single-valued attribute becomes a column.
• Derived attributes are not stored.
• Composite attributes are represented by their components.
• Multi-valued attributes are represented by a separate table.
• The key attribute of the entity set becomes the primary key of the table.

Entity Example
• Here address is a composite attribute.
• Years of service is a derived attribute (it can be calculated from the date of joining and the current date).
• Skill set is a multi-valued attribute.
• The relational schema:
  Employee (E#, Name, Door_No, Street, City, Pincode, Joining_Date)
  Emp_Skillset (E#, Skillset)

Converting Weak Entity Types
• A weak entity type is converted into a table of its own, with the primary key of the owner (strong) entity acting as a foreign key in that table.
• This foreign key, together with the partial key of the weak entity (the discriminator), forms the composite primary key of the table.
• The relational schema:
  Employee (E#, ...)
  Dependant (Employee, Dependant_ID, Name, Address)

Converting Relationships
• How a relationship is represented depends on its cardinality and its degree.
• The possible cardinalities are 1:1, 1:N and M:N.
• The degrees are unary, binary and ternary.

Binary 1:1
Case 1: One total and one partial participant
• The primary key of the partial participant becomes a foreign key of the total participant:
  Employee (E#, Name, ...)
  Department (Dept#, Name, ..., Head)
(Figure: Binary 1:1)

Case 2: Uniform participation types
• The primary key of either participant can become a foreign key in the other:
  Employee (E#, Name, ...)
  Chair (Item#, Model, Location, Used_By)
  or
  Employee (E#, Name, ..., Sits_On)
  Chair (Item#, ...)
(Figure: Binary 1:1)

Binary 1:N
• The primary key of the relation on the "1" side of the relationship becomes a foreign key in the relation on the "N" side:
  Teacher (ID, Name, Telephone, ...)
  Subject (Code, Name, ..., Teacher)
(Figure: Binary 1:N)

Binary M:N
• A new table is created to represent the relationship.
• The new table contains two foreign keys, one from each of the participants in the relationship.
• The primary key of the new table is the combination of the two foreign keys:
  Student (Sid#, Title, ...)
  Course (C#, CName, ...)
  Enrolls (Sid#, C#)
(Figure: Binary M:N)

Unary 1:1
• Consider employees who are married to each other.
• The primary key field itself becomes a foreign key in the same table:
  Employee (E#, Name, ..., Spouse)
• Employee table: EmpCode (PK), EmpName, DateofJoining, Spouse (FK)

Unary 1:N
• As in unary 1:1, the primary key field itself becomes a foreign key in the same table:
  Employee (E#, Name, ..., Manager)
• Employee table: EmpCode (PK), EmpName, DateofJoining, Manager (FK)

Unary M:N
• There are two resulting tables, one to represent the entity and another to represent the M:N relationship:
  Employee (E#, Name, ...)
  Guaranty (Guarantor, Beneficiary)
• Employee table: EmpCode (PK), EmpName, DateofJoining
• Guaranty table: Guarantor (PK, FK), Beneficiary (PK, FK)

Ternary Relationship
(Figure: Ternary relationship)
• Represented by a new table.
• The new table contains three foreign keys, one from each of the participating entities.
• The primary key of the new table is the combination of all three foreign keys:
  Prescription (Doctor#, Patient#, Medicine_Name)

Representing Specialization as Tables
• Method 1:
  – Create a table for the higher-level entity set.
  – Create a table for each lower-level entity set, including the primary key of the higher-level entity set and the local attributes.
      table      attributes
      person     name, street, city
      customer   name, credit-rating
      employee   name, salary
  – Drawback: getting information about an employee requires accessing two tables.
• Method 2:
  – Create a table for each entity set with all local and inherited attributes.
      table      attributes
      customer   name, street, city, credit-rating
      employee   name, street, city, salary
  – If the specialization is total, there is no need to create a table for the generalized entity set (person).
  – Drawback: street and city may be stored redundantly for persons who are both customers and employees.

Relations Corresponding to Aggregation
• To represent aggregation, create a table containing:
  – the primary key of the aggregated relationship,
  – the primary key of the associated entity set,
  – any descriptive attributes.
• To represent the aggregation manages between the relationship works-on and the entity set manager, create the table
  manages (employee-id, branch-name, title, manager-name)
• Table works-on is redundant, provided we are willing to store null values for the attribute manager-name in table manages.

Weak Entity Sets
• The existence of a weak entity set depends on the existence of an identifying entity set
  – it must relate to the identifying entity set via a total, one-to-many relationship set from the identifying to the weak entity set
  – the identifying relationship is depicted using a double diamond
• The discriminator (or partial key) of a weak entity set is the set of attributes that distinguishes among all the entities of the weak entity set that depend on one particular strong entity.
• The primary key of a weak entity set is formed by the primary key of the strong entity set on which the weak entity set is existence dependent, plus the weak entity set's discriminator.

Specialization
• A top-down design process: we designate sub-groupings within an entity set that are distinctive from other entities in the set.
• These sub-groupings become lower-level entity sets that have attributes or participate in relationships that do not apply to the higher-level entity set.
• Depicted by a triangle component labeled ISA (e.g. customer "is a" person).
• Attribute inheritance: a lower-level entity set inherits all the attributes and relationship participations of the higher-level entity set to which it is linked.

Generalization
• A bottom-up design process: a number of entity sets that share the same features are combined into a higher-level entity set.
• Specialization and generalization are simple inversions of each other; they are represented in an E-R diagram in the same way.
• The terms specialization and generalization are used interchangeably.
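The reduction rules described earlier for strong entities, multi-valued attributes, weak entities and relationships can be tried out in SQLite through Python's built-in `sqlite3` module. The sketch below is one possible translation, not the only one: E# is renamed `emp_no` (because `#` is not valid in an unquoted SQL identifier), the column types and sample rows are assumptions, and `ON DELETE CASCADE` is an added design choice that lets the database enforce the existence dependency of `Dependant` and of the skill-set rows on their owning employee. SQLite only enforces foreign keys after `PRAGMA foreign_keys = ON`.

```python
import sqlite3

SCHEMA = """
-- Strong entity: composite 'address' is flattened into its components and the
-- derived 'years of service' is not stored. 'manager' is a unary 1:N self-reference.
CREATE TABLE Employee (
    emp_no       INTEGER PRIMARY KEY,
    name         TEXT NOT NULL,
    door_no      TEXT,
    street       TEXT,
    city         TEXT,
    pincode      TEXT,
    joining_date TEXT,
    manager      INTEGER REFERENCES Employee(emp_no)
);

-- Multi-valued attribute: one row per (employee, skill).
CREATE TABLE Emp_Skillset (
    emp_no INTEGER REFERENCES Employee(emp_no) ON DELETE CASCADE,
    skill  TEXT,
    PRIMARY KEY (emp_no, skill)
);

-- Weak entity: owner's primary key + discriminator form the composite primary key.
CREATE TABLE Dependant (
    emp_no       INTEGER REFERENCES Employee(emp_no) ON DELETE CASCADE,
    dependant_id INTEGER,
    name         TEXT,
    PRIMARY KEY (emp_no, dependant_id)
);

-- Binary M:N: a separate table whose primary key combines both foreign keys.
CREATE TABLE Student (sid INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE Course  (cno TEXT PRIMARY KEY, cname TEXT);
CREATE TABLE Enrolls (
    sid INTEGER REFERENCES Student(sid),
    cno TEXT    REFERENCES Course(cno),
    PRIMARY KEY (sid, cno)
);
"""


def build_database() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # foreign key checks are off by default
    conn.executescript(SCHEMA)
    return conn


def load_sample(conn: sqlite3.Connection) -> None:
    """Insert a few hypothetical rows."""
    conn.execute("INSERT INTO Employee (emp_no, name) VALUES (1, 'Asha')")
    conn.execute("INSERT INTO Employee (emp_no, name, manager) VALUES (2, 'Ravi', 1)")
    conn.executemany("INSERT INTO Emp_Skillset VALUES (?, ?)", [(2, "SQL"), (2, "Java")])
    conn.execute("INSERT INTO Dependant VALUES (2, 1, 'Meena')")
    conn.execute("INSERT INTO Student VALUES (10, 'Mr')")
    conn.execute("INSERT INTO Course VALUES ('DB', 'Databases')")
    conn.execute("INSERT INTO Enrolls VALUES (10, 'DB')")


if __name__ == "__main__":
    conn = build_database()
    load_sample(conn)
    # Unary 1:N: who reports to employee 1?
    print(conn.execute("SELECT name FROM Employee WHERE manager = 1").fetchall())  # [('Ravi',)]
    # Existence dependency: deleting the owner removes its dependants and skills
    conn.execute("DELETE FROM Employee WHERE emp_no = 2")
    print(conn.execute("SELECT COUNT(*) FROM Dependant").fetchone()[0])  # 0
```

With foreign keys enabled, inserting a `Dependant` row for a non-existent employee, or enrolling the same student in the same course twice, raises `sqlite3.IntegrityError`; the composite primary keys and foreign keys are what enforce the rules listed for each mapping case.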