Review of the Entity-Relationship Model
Slides courtesy of Amol Deshpande; material from ch. 2 of Korth & Silberschatz, Database System Concepts

Data Modeling
Goals:
• Conceptual representation of the data
• "Reality" meets "bits and bytes"
• Must make sense, and be usable by other people
Review:
• Entity-relationship Model
• Relational Model

Motivation
You've just been hired by Bank of America as their DBA for their online banking web site. You are asked to create a database that monitors:
• customers
• accounts
• loans
• branches
• transactions, …
Now what??!!!

Database Design Steps (Three Levels of Modeling)
• Conceptual DB design: produces the Conceptual Data Model; the Entity-relationship Model is typically used here
• Logical DB design: produces the Logical Data Model; the Relational Model is typically used here
• Physical DB design: produces the Physical Data Model

Entity-Relationship Model
Two key concepts.
Entities:
• An object that exists and is distinguishable from other objects
  – Examples: Bob Smith, BofA, CMSC424
• Have attributes (people have names and addresses)
• Form entity sets with other entities of the same type that share the same properties
  – Set of all people, set of all classes
• Entity sets may overlap
  – Customers and Employees
Relationships:
• Relate 2 or more entities
  – E.g. Bob Smith has account at College Park Branch
• Form relationship sets with other relationships of the same type that share the same properties
  – Customers have accounts at Branches
• Can have attributes
  – has account at may have an attribute start-date
• Can involve more than 2 entities
  – Employee works at Branch at Job

ER Diagram: Starting Example
[Figure: entity set customer (cust-id, cust-name, cust-street, cust-city) connected through relationship set has (attribute access-date) to entity set account (number, balance)]
• Rectangles: entity sets
• Diamonds: relationship sets
• Ellipses: attributes

Review Roadmap
• Details of the ER Model
• How to represent various types of constraints, semantic information, etc.
• Design issues
• A detailed example

Relationship Cardinalities
We may know:
• One customer can only open one account
OR
• One customer can open multiple accounts
Representing this is important. Why?
• Better manipulation of data
• Can enforce such a constraint
• Remember: if it is not represented in the conceptual model, the domain knowledge may be lost

Mapping Cardinalities
• Express the number of entities to which another entity can be associated via a relationship set
• Most useful in describing binary relationship sets
[Figure: four customer/has/account diagrams illustrating One-to-One, One-to-Many, Many-to-One and Many-to-Many]

Types of Attributes
• Simple vs. composite
• Single-valued vs. multi-valued (does the attribute hold a single value?)
  – E.g. phone numbers are multi-valued
• Derived
  – If date-of-birth is present, age can be derived
Distinguishing these types can help in avoiding redundancy, enforcing constraints, etc.
[Figure: the customer/has/account diagram extended with date-of-birth, a composite attribute (month, day, year); age, a derived attribute (dashed ellipse); and phone no., a multi-valued attribute (double ellipse)]

Next: Keys
Key = set of attributes identifying individual entities or relationships

Entity Keys
[Figure: entity set customer with attributes cust-id, cust-name, cust-street, cust-city, date-of-birth, age, phone no.]
Possible keys:
• {cust-id}
• {cust-name, cust-city, cust-street}
• {cust-id, age}
• cust-name?? Probably not.
Domain knowledge dependent!!
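Whether each possible key above really identifies customers is domain knowledge, but once that knowledge is written down, sorting keys into "identifying" and "minimal" is mechanical. The Python sketch below anticipates the definitions that follow (a superkey identifies entities; a candidate key is a minimal superkey). It assumes, as domain knowledge, that {cust-id} and {cust-name, cust-city, cust-street} each identify a customer; `IDENTIFYING_SETS`, `is_superkey`, `is_candidate_key` and `candidate_keys` are hypothetical names, and the attribute names follow the diagram.

```python
from itertools import combinations

# Attributes of the customer entity set (names taken from the diagram).
ATTRIBUTES = ["cust-id", "cust-name", "cust-street", "cust-city", "date-of-birth", "age"]

# ASSUMPTION (domain knowledge, not derivable from data):
# each of these attribute sets identifies a customer.
IDENTIFYING_SETS = [
    frozenset({"cust-id"}),
    frozenset({"cust-name", "cust-city", "cust-street"}),
]

def is_superkey(attrs):
    """A superkey contains at least one identifying set, so it distinguishes entities."""
    attrs = frozenset(attrs)
    return any(ident <= attrs for ident in IDENTIFYING_SETS)

def is_candidate_key(attrs):
    """A candidate key is a superkey from which no single attribute can be
    removed without losing key-ness."""
    attrs = frozenset(attrs)
    return is_superkey(attrs) and not any(is_superkey(attrs - {a}) for a in attrs)

def candidate_keys():
    """Brute-force search over all attribute subsets (fine for a handful of attributes)."""
    return [
        frozenset(combo)
        for size in range(1, len(ATTRIBUTES) + 1)
        for combo in combinations(ATTRIBUTES, size)
        if is_candidate_key(combo)
    ]
```

Under these assumptions `is_superkey({"cust-id", "age"})` is True while `is_candidate_key({"cust-id", "age"})` is False, since age can be dropped; `is_superkey({"cust-name"})` is False, matching the "Probably not." above. Changing `IDENTIFYING_SETS` (i.e. the domain knowledge) changes every answer.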
Entity Keys
• Superkey: any attribute set that can distinguish entities
• Candidate key: a minimal superkey
  – Can't remove any attribute and preserve key-ness
  – {cust-id, age} is a superkey, but not a candidate key (age can be removed)
  – {cust-name, cust-city, cust-street} is a candidate key, assuming cust-name alone is not unique
• Primary key: the candidate key chosen as the key by the DBA
  – Underlined in the ER diagram

Entity Keys
• {cust-id} is a natural primary key
• Typically, SSN forms a good primary key
• Try to use a candidate key that rarely changes
  – e.g. something involving the address is not a great idea

Relationship Set Keys
What attributes are needed to represent a relationship completely and uniquely?
• The union of the primary keys of the entities involved, and the relationship attributes
• {cust-id, access-date, account number} describes a relationship completely
[Figure: customer (key cust-id) connected through has (attribute access-date) to account (key number)]

Is {cust-id, access-date, account number} a candidate key?
• No. Attribute access-date can be removed from this set without losing key-ness

Is {cust-id, account-number} a candidate key?
• Depends on the mapping cardinality
• If the relationship is one-to-one, either {cust-id} or {account-number} is sufficient
  – Since a given customer can have only one account, she can participate in only one relationship
  – Ditto for accounts
• If the relationship is one-to-many (as shown), {account-number} is a candidate key
  – A given customer can have many accounts, but each account is allowed at most one account holder

Relationship Set Keys
General rule for binary relationships:
• one-to-one: primary key of either entity set
• one-to-many: primary key of the entity set on the many side
• many-to-many: union of the primary keys of the associated entity sets
n-ary relationships: more complicated rules

Data Constraints
Representing semantic data constraints
• We already saw constraints on relationship cardinalities

Participation Constraint
Given an entity set E and a relationship set R it participates in:
• If every entity in E participates in at least one relationship in R, the participation is total
• Otherwise it is partial
[Figure: the customer/has/account diagram illustrating total participation]

Cardinality Constraints
How many relationships can an entity participate in?
[Figure: customer (cust-id) linked to has with label 0..*; account (number) linked to has with label 1..1; has carries access-date]
• customer 0..*: minimum 0, maximum no limit
• account 1..1: minimum 1, maximum 1
So a customer may hold any number of accounts (including none), while every account belongs to exactly one customer.

Recursive Relationships
Sometimes a relationship set associates an entity set with itself.
[Figure: employee (emp-id, emp-name, emp-street, emp-city) linked twice to works-for, in the roles manager and worker]
• Must be declared with roles

Weak Entity Sets
An entity set without enough attributes to form a primary key. E.g.:
Transaction entity attributes:
• transaction-number, transaction-date, transaction-amount, transaction-type
• transaction-number may not be unique across accounts

Weak Entity Sets
• A weak entity set must be associated with an identifying (owner) entity set
  – Account is the owner entity set for Transaction
• We still need to be able to distinguish between different weak entities associated with the same strong entity
[Figure: account (number, balance) connected through has to the weak entity set Transaction (trans-number, trans-date, trans-amt, trans-type)]
• Discriminator: a set of attributes that can be used for that
• Primary key: primary key of the associated strong entity set + discriminator attribute set
  – For Transaction: {account-number, transaction-number}

Specialization
Consider entity set person:
• Attributes: name, street, city
• Further classification:
  – customer: additional attributes customer-id, credit-rating
  – employee: additional attributes employee-id, salary
• Note the similarities to object-oriented programming
[Figure: specialization example]

Aggregation
• No relationships between relationships
  – E.g.: associate account officers with the has account relationship set
[Figure: customer, has and account, with employee linked through account officer to an undetermined target "?"]
• Associate an account officer with each account?
  – What if different customers for the same account can have different account officers?
• Solution: aggregation
[Figure: aggregation of customer, has and account, related to employee through account officer]

More…
Read Chapter 2 for:
• Specialization/Aggregation details
  – Different types of specializations, etc.
• Generalization: the opposite of specialization
  – Lower- and higher-level entities
  – Attribute inheritance
• …

E/R Data Model
Design Issue #1: Entity Sets vs. Attributes
An example: employees can have multiple phones
(a) Employee with attributes phone_no and phone_loc
vs.
(b) Employee related through Uses to an entity set Phone with attributes no and loc
To resolve, determine how phones are used:
1. Can many employees share a phone? (If yes, then (b))
2. Can employees have multiple phones? (If yes, then (b), or (a) with multivalued attributes)
3. Else (a), perhaps with composite attributes
[Figure: Employee with a composite attribute phone made of no and loc]

E/R Data Model
Design Issue #2: Entity Sets vs. Relationship Sets
An example: how to model bank loans
(a) Customer (ssn, name) related through Borrows to an entity set Loan (lno, amt)
vs.
(b) Customer (ssn, name) related to Branch (bname, bcity) through a relationship set Loans with attributes lno, amt
To resolve, determine how loans are issued:
1. Can there be more than one customer per loan?
   • If yes, then (a). With (b), the loan info would have to be replicated for each customer (wasteful, potential update anomalies)
2. Is loan a noun or a verb?
   • Both, but more of a noun to a bank (hence (a) is probably more appropriate)

E/R Data Model
Design Issue #3: N-ary vs. Binary Relationship Sets
An example: Works_At
• Ternary: Employee, Branch and Dept related by Works_At, e.g. (Joe, Moody, Acct)
vs.
• Binary: a new entity set WA related to Employee, Branch and Dept through WAE, WAB and WAD; the same fact becomes (Joe, w3) in WAE, (Moody, w3) in WAB and (Acct, w3) in WAD
Choose n-ary when possible!
(Avoids redundancy and update anomalies)

Example Design
We will model a university database. Main entities:
• Professor
• Projects
• Departments
• Graduate students
• etc…
[Figure sequence: the university ER diagram built up over several slides. Entity sets professor, project, dept and grad, with attributes including SSN, name, area, rank, proj-number, sponsor, start, budget, dept-no, office, homepage, age and degree; relationship sets PI, Co-PI, Appt, Chair, Supervises, RA (with attribute Time (%)), Major, and Mentor (roles advisor and advisee). And so on…]

Summary
Entity-relationship Model:
• Intuitive diagram-based representation of domain knowledge, data properties, etc.
• Two key concepts:
  – Entities
  – Relationships
• Additional details:
  – Relationship cardinalities
  – Keys
  – Participation constraints
  – …

Review: Entity-Relationship Model Basics
[Figure: entity sets E1 (attributes a1, …, an) and E2 (attributes b1, …, bm) connected by relationship set R (attributes c1, …, ck)]
• Rectangle: entity set (e.g. E1)
• Diamond: relationship set (e.g. R)
• Ellipse: attribute (primary key if underlined)

Thoughts…
• Nothing about actual data
  – How is it stored?
• No talk about the query languages
  – How do we access the data?
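To recap the E/R basics in a form a program can check, here is a small Python sketch that records entity sets and a binary relationship set as plain data and applies the general key rule for binary relationships given earlier. In keeping with the point that the E/R model says nothing about stored data, it works purely at the schema level. The class and function names, and the "1:n"-style cardinality strings (read left to right), are assumptions made for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class EntitySet:
    name: str
    attributes: tuple
    primary_key: tuple  # attributes identifying an entity (underlined in the diagram)

@dataclass(frozen=True)
class BinaryRelationshipSet:
    name: str
    left: EntitySet
    right: EntitySet
    cardinality: str        # "1:1", "1:n", "n:1" or "n:m", read from left to right
    attributes: tuple = ()  # descriptive attributes of the relationship set

def relationship_keys(r):
    """Candidate key(s) of a binary relationship set, by the general rule:
    one-to-one -> key of either entity set; one-to-many -> key of the 'many' side;
    many-to-many -> union of both keys. Relationship attributes are never needed."""
    left_key, right_key = r.left.primary_key, r.right.primary_key
    if r.cardinality == "1:1":
        return [left_key, right_key]
    if r.cardinality == "1:n":   # one left entity, many right entities
        return [right_key]
    if r.cardinality == "n:1":   # many left entities, one right entity
        return [left_key]
    if r.cardinality == "n:m":
        return [left_key + right_key]
    raise ValueError(f"unknown cardinality: {r.cardinality!r}")
```

For the customer/has/account example declared one-to-many ("1:n"), the rule returns only account's key, ("number",), matching the slide's {account-number}; declared many-to-many, it returns ("cust-id", "number").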
Semantic vs. Syntactic Data Models
• Remember: the E/R Model is used for conceptual modeling
• Many conceptual models have the same properties
• They are much more about representing the knowledge than about database storage/querying

Thoughts…
Basic design principles:
• Faithful
  – Must make sense
• Satisfies the application requirements
• Models the requisite domain knowledge
  – If not modeled, lost afterwards
• Avoid redundancy
  – Potential for inconsistencies
• Go for simplicity
Typically an iterative process that goes back and forth

Relational Data Model
Introduced by Ted Codd (late 60's – early 70's)
• Before = "Network Data Model" (Cobol as DDL, DML)
• Very contentious: Database Wars (Charlie Bachman vs. Mike Stonebraker)
Relational data model contributes:
1. Separation of logical and physical data models (data independence)
2. Declarative query languages
3. Formal semantics
4. Query optimization (key to commercial success)

Key Abstraction: Relation
Account =
bname    | acct_no | balance
Downtown | A-101   | 500
Brighton | A-201   | 900
Brighton | A-217   | 500
Terms:
• Tables (aka: Relations)
Why called Relations?
Mathematical relations
Given sets R = {1, 2, 3} and S = {3, 4}:
• R × S = { (1, 3), (1, 4), (2, 3), (2, 4), (3, 3), (3, 4) }
• A relation on R, S is any subset (⊆) of R × S (e.g.: { (1, 4), (3, 4) })

Database relations
Given attribute domains:
• Branches = { Downtown, Brighton, … }
• Accounts = { A-101, A-201, A-217, … }
• Balances = ℝ
Account ⊆ Branches × Accounts × Balances, e.g.
{ (Downtown, A-101, 500), (Brighton, A-201, 900), (Brighton, A-217, 500) }

Relations
The Account table above is considered equivalent to the set
{ (Downtown, A-101, 500), (Brighton, A-201, 900), (Brighton, A-217, 500) }
Relational database semantics are defined in terms of mathematical relations.
Terms:
• Tables (aka: Relations)
• Rows (aka: tuples)
• Columns (aka: attributes)
• Schema (e.g.: Acct_Schema = (bname, acct_no, balance))

Definitions
1. Relation Schema (or Schema)
   • A list of attributes and their domains
   • We will require the domains to be atomic
   • Programming language equivalent: a variable (e.g. x)
   • E.g.:
     account(account-number, branch-name, balance)
2. Relation Instance
   • A particular instantiation of a relation with actual values
   • Will change with time
   • Programming language equivalent: the value of a variable
   bname    | acct_no | balance
   Downtown | A-101   | 500
   Brighton | A-201   | 900
   Brighton | A-217   | 500

Rest of the Class
• Converting from an E/R diagram to a relational schema
  – Remember: we still use E/R models for conceptual modeling of the database
• Relational Algebra
  – Data retrieval language

E/R Diagrams → Relations
Convert entity sets into a relational schema with the same set of attributes:
[Figure: entity sets Customer (cname, ccity, cstreet) and Branch (bname, bcity, assets)]
• Customer_Schema(cname, ccity, cstreet)
• Branch_Schema(bname, bcity, assets)

E/R Diagrams → Relations
Convert relationship sets into a relational schema as well.
Remember: a relationship is completely described by the primary keys of the associated entities and its own attributes.
[Figure: Customer (cname, ccity, cstreet) connected through Depositor (attribute access-date) to Account (acct-no, balance)]
• Account_Schema(acct-no, balance)
• Depositor_Schema(cname, acct-no, access-date)
• Customer_Schema(cname, ccity, cstreet)
Well… Not quite. We can do better.
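The basic translation rules on this slide can be sketched in Python, representing a schema as a (name, attribute-tuple) pair. All function names are hypothetical. `fold_into_many_side` anticipates the cardinality-based improvement that comes next: for a one-to-many relationship set, the entity set on the "many" side absorbs the other side's key and the relationship's attributes instead of the relationship getting its own table.

```python
def entity_schema(name, attributes):
    """Entity set -> relation schema with the same set of attributes."""
    return (name, tuple(attributes))

def relationship_schema(name, participant_keys, own_attributes=()):
    """Relationship set -> relation schema: the primary-key attributes of each
    participating entity set, followed by the relationship set's own attributes."""
    attrs = [a for key in participant_keys for a in key]
    attrs.extend(own_attributes)
    return (name, tuple(attrs))

def fold_into_many_side(many_side_schema, one_side_key, own_attributes=()):
    """Refinement for a one-to-many relationship set: no separate table.
    Each entity on the 'many' side relates to at most one entity on the other
    side, so its row can carry that entity's key and the relationship attributes."""
    name, attrs = many_side_schema
    return (name, attrs + tuple(one_side_key) + tuple(own_attributes))
```

For Depositor, `relationship_schema("Depositor_Schema", [["cname"], ["acct-no"]], ["access-date"])` gives `("Depositor_Schema", ("cname", "acct-no", "access-date"))`; if many accounts belong to one customer, folding into Account gives `("Account_Schema", ("acct-no", "balance", "cname", "access-date"))`.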
It depends on the relationship cardinality.

E/R Diagrams → Relations
Say Depositor is one-to-many from Customer to Account (many accounts per customer):
[Figure: Customer (cname, ccity, cstreet) connected through Depositor (attribute access-date) to Account (acct-no, balance)]
• Account_Schema(acct-no, balance, cname, access-date)
• Customer_Schema(cname, ccity, cstreet)
Exactly the same information, fewer tables.

E/R Diagrams → Relations
[Figure: entity set E1 (a1, …, an) connected through relationship set R (c1, …, ck) to entity set E2 (b1, …, bm); a1: E1's key, b1: E2's key, c1, …, ck: attributes of R]
Entity sets:
• E = (a1, …, an)
Relationship sets (not the whole story):
• R = (a1, b1, c1, …, ck)
By relationship cardinality:
• n:m: E1 = (a1, …, an), E2 = (b1, …, bm), R = (a1, b1, c1, …, ck)
• n:1: E1 = (a1, …, an, b1, c1, …, ck), E2 = (b1, …, bm)
• 1:n: E1 = (a1, …, an), E2 = (b1, …, bm, a1, c1, …, ck)
• 1:1: treat as n:1 or 1:n

Translating E/R Diagrams to Relations
[Figure: bank ER diagram with entity sets Account (acct_no, balance), Branch (bname, bcity, assets), Customer (cname, ccity, cstreet) and Loan (lno, amt), and relationship sets Acct-Branch, Loan-Branch, Depositor and Borrower]
Q. How many tables does this get translated into?
A. 6 (account, branch, customer, loan, depositor, borrower)

Bank Database
• Account(bname, acct_no, balance)
• Branch(bname, bcity, assets)
• Depositor(cname, acct_no)
• Borrower(cname, lno)
• Customer(cname, cstreet, ccity)
• Loan(bname, lno, amt)

Bank Database (sample instance)
Account
bname    | acct_no | balance
Downtown | A-101   | 500
Mianus   | A-215   | 700
Perry    | A-102   | 400
R.H.     | A-305   | 350
Brighton | A-201   | 900
Redwood  | A-222   | 700
Brighton | A-217   | 750

Branch
bname    | bcity      | assets
Downtown | Brooklyn   | 9M
Redwood  | Palo Alto  | 2.1M
Perry    | Horseneck  | 1.7M
Mianus   | Horseneck  | 0.4M
R.H.     | Horseneck  | 8M
Pownel   | Bennington | 0.3M
N. Town  | Rye        | 3.7M
Brighton | Brooklyn   | 7.1M

Depositor
cname   | acct_no
Johnson | A-101
Smith   | A-215
Hayes   | A-102
Turner  | A-305
Johnson | A-201
Jones   | A-217
Lindsay | A-222

Customer
cname    | cstreet   | ccity
Jones    | Main      | Harrison
Smith    | North     | Rye
Hayes    | Main      | Harrison
Curry    | North     | Rye
Lindsay  | Park      | Pittsfield
Turner   | Putnam    | Stanford
Williams | Nassau    | Princeton
Adams    | Spring    | Pittsfield
Johnson  | Alma      | Palo Alto
Glenn    | Sand Hill | Woodside
Brooks   | Senator   | Brooklyn
Green    | Walnut    | Stanford

Borrower
cname    | lno
Jones    | L-17
Smith    | L-23
Hayes    | L-15
Jackson  | L-14
Curry    | L-93
Smith    | L-11
Williams | L-17
Adams    | L-16

Loan
bname    | lno  | amt
Downtown | L-17 | 1000
Redwood  | L-23 | 2000
Perry    | L-15 | 1500
Downtown | L-14 | 1500
Mianus   | L-93 | 500
R.H.     | L-11 | 900
Perry    | L-16 | 1300

E/R Diagrams & Relations
Weak entity sets:
[Figure: strong entity set E1 (a1, …, an) connected through identifying relationship IR to weak entity set E2 (b1, …, bm)]
• E1 = (a1, …, an)
• E2 = (a1, b1, …, bm)

Multivalued attributes:
[Figure: Employee (ssn, name) with a multivalued attribute phone]
• Emp = (ssn, name)
• Emp-Phones = (ssn, phone)
Emp
ssn | name
001 | Smith
…   | …
Emp-Phones
ssn | phone
001 | 4-1234
001 | 4-5678
…   | …

Subclasses:
[Figure: entity set E (a1, …, an) specialized via ISA into E1 (b1, …, bm) and E2 (c1, …, ck)]
Method 1:
• E = (a1, …, an)
• E1 = (a1, b1, …, bm)
• E2 = (a1, c1, …, ck)
Method 2:
• E1 = (a1, …, an, b1, …, bm)
• E2 = (a1, …, an, c1, …, ck)

Subclasses example:
Method 1:
• Account = (acct_no, balance)
• SAccount = (acct_no, interest)
• CAccount = (acct_no, overdraft)
Method 2:
• SAccount = (acct_no, balance, interest)
• CAccount = (acct_no, balance, overdraft)
Q: When is method 2 not possible?
A: When subclassing is partial (an account that is in neither subclass would have no table to go in)

Keys and Relations
As in the E/R Model:
1. Superkeys
   • set of attributes of a table for which every row has a distinct set of values
2. Candidate keys
   • "minimal" superkeys
3. Primary keys
   • DBA-chosen candidate keys
Act as integrity constraints
• i.e., guard against illegal/invalid instances of a given schema
E.g., Branch = (bname, bcity, assets), with bname as the key:
bname    | bcity    | assets
Brighton | Brooklyn | 5M
Brighton | Boston   | 3M
Invalid!! (two rows share the key value Brighton)
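Treating a key as an integrity constraint amounts to rejecting any instance in which two rows agree on the key attributes. A minimal Python sketch, assuming rows are dicts keyed by attribute name; `key_violations` and the `Table` class are hypothetical helpers (a real DBMS enforces this through its own key constraints), and `branch_rows` is the invalid Branch instance above.

```python
from collections import Counter

def key_violations(rows, key):
    """Key values shared by more than one row; an empty list means the instance
    satisfies the key. (An instance can refute a key, never prove one.)"""
    counts = Counter(tuple(row[a] for a in key) for row in rows)
    return [value for value, n in counts.items() if n > 1]

class Table:
    """A relation instance that enforces its primary key on every insert."""

    def __init__(self, attributes, primary_key):
        self.attributes = tuple(attributes)
        self.primary_key = tuple(primary_key)
        self._rows = {}  # primary-key value -> row

    def insert(self, row):
        if set(row) != set(self.attributes):
            raise ValueError(f"expected attributes {self.attributes}")
        key = tuple(row[a] for a in self.primary_key)
        if key in self._rows:
            raise ValueError(f"duplicate primary key value {key}")
        self._rows[key] = dict(row)

    def __len__(self):
        return len(self._rows)

# The invalid Branch instance from the slide, with bname as the key.
branch_rows = [
    {"bname": "Brighton", "bcity": "Brooklyn", "assets": "5M"},
    {"bname": "Brighton", "bcity": "Boston", "assets": "3M"},
]
```

Here `key_violations(branch_rows, ["bname"])` reports `("Brighton",)`, and inserting both rows into `Table(["bname", "bcity", "assets"], ["bname"])` raises `ValueError` on the second one. {bname, bcity} shows no violation on this instance, but that only means this data does not refute it; whether it is a key remains the designer's decision.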