CMSC424: Database Design
Instructor: Amol Deshpande
amol@cs.umd.edu
CMSC424, Spring 2005

Data Modeling
• Goals:
  • Conceptual representation of the data
  • “Reality” meets “bits and bytes”
  • Must make sense, and be usable by other people
• Today:
  • Entity-relationship Model
  • Relational Model

Motivation
• You’ve just been hired by Bank of America as their DBA for their online banking web site.
• You are asked to create a database that monitors:
  • customers
  • accounts
  • loans
  • branches
  • transactions, …
• Now what??!!!

Database Design Steps
[Figure: “info” feeds three design steps, each paired with one of the Three Levels of Modeling]
• Conceptual DB design: Conceptual Data Model
  • Entity-relationship Model: typically used for conceptual database design
• Logical DB design: Logical Data Model
  • Relational Model: typically used for logical database design
• Physical DB design: Physical Data Model

Entity-Relationship Model
• Two key concepts
• Entities:
  • An object that exists and is distinguishable from other objects
    • Examples: Bob Smith, BofA, CMSC424
  • Have attributes (people have names and addresses)
  • Form entity sets with other entities of the same type that share the same properties
    • Set of all people, set of all classes
  • Entity sets may overlap
    • Customers and Employees

Entity-Relationship Model
• Two key concepts
• Relationships:
  • Relate 2 or more entities
    • E.g.
Bob Smith has an account at the College Park Branch
  • Form relationship sets with other relationships of the same type that share the same properties
    • Customers have accounts at Branches
  • Can have attributes:
    • “has account at” may have an attribute start-date
  • Can involve more than 2 entities
    • Employee works at Branch at Job

ER Diagram: Starting Example
[Figure: entity sets customer (cust-id, cust-name, cust-street, cust-city) and account (number, balance), joined by relationship set has (access-date)]
• Rectangles: entity sets
• Diamonds: relationship sets
• Ellipses: attributes

Rest of the class
• Details of the ER Model
  • How to represent various types of constraints/semantic information etc.
• Design issues
• A detailed example

Next: Relationship Cardinalities
• We may know:
  One customer can only open one account
  OR
  One customer can open multiple accounts
• Representing this is important
• Why ?
  • Better manipulation of data
  • Can enforce such a constraint
  • Remember: If not represented in conceptual model, the domain knowledge may be lost

Mapping Cardinalities
• Express the number of entities to which another entity can be associated via a relationship set
• Most useful in describing binary relationship sets

Mapping Cardinalities
[Figure: each case drawn as customer, has, account]
• One-to-One
• One-to-Many
• Many-to-One
• Many-to-Many

Mapping Cardinalities
• N-ary relationships ?

Next: Types of Attributes
• Simple vs Composite
  • Single value per attribute ?
• Single-valued vs Multi-valued
  • E.g.
Phone numbers are multi-valued
• Derived
  • If date-of-birth is present, age can be derived
• Can help in avoiding redundancy, enforcing constraints etc…

Types of Attributes
[Figure: the customer, has, account diagram, with date-of-birth, age, and phone no. added to customer]
• multi-valued (double ellipse): phone no.
• derived (dashed ellipse): age

Types of Attributes
[Figure: the same diagram, with month, day, and year shown as the components of a Composite Attribute]

Next: Keys
• Key = set of attributes identifying individual entities or relationships

Entity Keys
[Figure: customer with attributes cust-id, cust-name, cust-street, cust-city, date-of-birth, age, phone no.]
Possible Keys:
• {cust-id}
• {cust-name, cust-city, cust-street}
• {cust-id, age}
• cust-name ?? Probably not.
Domain knowledge dependent !!

Entity Keys
• Superkey
  • any attribute set that can distinguish entities
• Candidate key
  • a minimal superkey
    • Can’t remove any attribute and preserve key-ness
  • {cust-id, age} is a superkey but not a candidate key: age can be removed
  • {cust-name, cust-city, cust-street} is a candidate key
    • assuming cust-name is not unique
• Primary key
  • Candidate key chosen as the key by DBA
  • Underlined in the ER Diagram

Entity Keys
• {cust-id} is a natural primary key
• Typically, SSN forms a good primary key
• Try to use a candidate key that rarely changes
  • e.g. something involving address not a great idea

Relationship Set Keys
• What attributes are needed to represent a relationship completely and uniquely ?
  • Union of primary keys of the entities involved, and relationship attributes
[Figure: customer (cust-id), has (access-date), account (number)]
  • {cust-id, access-date, account number} describes a relationship completely

Relationship Set Keys
• Is {cust-id, access-date, account number} a candidate key ?
  • No. Attribute access-date can be removed from this set without losing key-ness
  • In fact, union of primary keys of associated entities is always a superkey

Relationship Set Keys
• Is {cust-id, account-number} a candidate key ?
  • Depends
  • If one-to-one relationship, either {cust-id} or {account-number} sufficient
    • Since a given customer can only have one account, she can only participate in one relationship
    • Ditto account

Relationship Set Keys
• Is {cust-id, account-number} a candidate key ?
  • Depends
[Figure: customer (cust-id), has (access-date), account (number), drawn as one-to-many]
  • If one-to-many relationship (as shown), {account-number} is a candidate key
    • A given customer can have many accounts, but at most one account holder per account allowed

Relationship Set Keys
• General rule for binary relationships
  • one-to-one: primary key of either entity set
  • one-to-many: primary key of the entity set on the many side
  • many-to-many: union of primary keys of the associated entity sets
• n-ary relationships
  • More complicated rules

Next: Data Constraints
• Representing semantic data constraints
• We already saw constraints on relationship cardinalities

Participation Constraint
• Given an entity set E, and a relationship R it participates in:
  • If every entity in E participates in at least one relationship in R, it is total participation
  • partial otherwise

Participation Constraint
[Figure: the customer, has, account diagram, annotated “Total participation”]

Cardinality Constraints
How many relationships can an entity participate in ?
[Figure: customer (cust-id), has (access-date), account (number), with the links labeled 0..* and 1..1]
• 0..*: Minimum 0, Maximum no limit
• 1..1: Minimum 1, Maximum 1

Next: Recursive Relationships
• Sometimes a relationship associates an entity set to itself

Recursive Relationships
[Figure: employee (emp-id, emp-name, emp-street, emp-city) related to itself by works-for, with roles manager and worker]
• Must be declared with roles

Next: Weak Entity Sets
• An entity set without enough attributes to have a primary key
• E.g.
Transaction Entity
  • Attributes:
    • transaction-number, transaction-date, transaction-amount, transaction-type
  • transaction-number: may not be unique across accounts

Weak Entity Sets
• A weak entity set must be associated with an identifying or owner entity set
• Account is the owner entity set for Transaction

Weak Entity Sets
• Still need to be able to distinguish between different weak entities associated with the same strong entity
• Discriminator: a set of attributes that can be used to distinguish them
[Figure: account (number, balance), has, Transaction (trans-number, trans-date, trans-type, trans-amt)]

Weak Entity Sets
• Primary key:
  • Primary key of the associated strong entity + discriminator attribute set
  • For Transaction:
    • {account-number, transaction-number}

Next: Specialization
• Consider entity person:
  • Attributes: name, street, city
• Further classification:
  • customer
    • Additional attributes: customer-id, credit-rating
  • employee
    • Additional attributes: employee-id, salary
• Note similarities to object-oriented programming

Specialization: Example
[Figure: specialization example]

Finally: Aggregation
• The E/R model has no relationships between relationships
  • E.g.: Associate account officers with the “has account” relationship set
[Figure: customer, has, account, plus account officer and employee; how account officer attaches is marked “?”]

Finally: Aggregation
• Associate an account officer with each account ?
  • What if different customers for the same account can have different account officers ?
[Figure: the same diagram, with the attachment of account officer still marked “?”]
Finally: Aggregation
• Solution: Aggregation
[Figure: customer, has, account, with account officer and employee attached through aggregation]

More…
• Read Chapter 2 for:
  • Specialization/Aggregation details
  • Different types of specializations etc.
  • Generalization: opposite of specialization
  • Lower- and higher-level entities
  • Attribute inheritance
  • …

E/R Data Model Design Issue #1: Entity Sets vs. Attributes
An Example: Employees can have multiple phones
(a) entity set Employee with attributes phone_no, phone_loc
vs
(b) entity sets Employee and Phone (no, loc), relationship set Uses
To resolve, determine how phones are used
1. Can many employees share a phone? (If yes, then (b))
2. Can employees have multiple phones? (If yes, then (b), or (a) with multivalued attributes)
3. Else (a), perhaps with composite attributes (Employee with composite attribute phone: no, loc)

E/R Data Model Design Issue #2: Entity Sets vs. Relationship Sets
An Example: How to model bank loans
(a) entity sets Customer (ssn, name) and Loan (lno, amt), relationship set Borrows
vs
(b) entity sets Customer (ssn, name) and Branch (bname, bcity), relationship set Loans (lno, amt)
To resolve, determine how loans are issued
1. Can there be more than one customer per loan?
  • If yes, then (a). With (b), loan info must be replicated for each customer (wasteful, potential update anomalies)
2. Is loan a noun or a verb?
  • Both, but more of a noun to a bank (hence (a) is probably more appropriate)

E/R Data Model Design Issue #3: N-ary vs Binary Relationship Sets
An Example: Works_At
Ternary: Employee, Branch, and Dept related by Works_At
  • e.g. (Joe, Moody, Acct) in Works_At
vs
Binary: an entity set WA linked to Employee by WAE, to Branch by WAB, and to Dept by WAD
  • e.g. (Joe, w3) in WAE, (Moody, w3) in WAB, (Acct, w3) in WAD
Choose n-ary when possible!
(Avoids redundancy, update anomalies)

Example Design
• We will model a university database
• Main entities:
  • Professor
  • Projects
  • Departments
  • Graduate students
  • etc…

[Figure: ER diagram with entity sets professor, project, dept, and grad; attribute labels SSN, name, area, rank, proj-number, sponsor, start, budget, dept-no, name, office, homepage, SSN, name, age, degree]

[Figure: the same diagram extended with relationship labels PI, Co-PI, Supervises, RA, Appt, Chair, Major, and Mentor (roles advisor and advisee), plus attribute Time (%)]

And so on…

Summary
• Entity-relationship Model
  • Intuitive diagram-based representation of domain knowledge, data properties etc…
• Two key concepts:
  • Entities
  • Relationships
• We also looked at:
  • Relationship cardinalities
  • Keys
  • Participation Constraints
  • …

Summary
• Details unimportant
• Key idea: We can represent many data properties and constraints conceptually using this
• Read Chapter 2
  • Assignment will require you to do this anyway !
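
The superkey and candidate-key definitions from the Entity Keys slides can be illustrated with a small Python sketch. This is only a sketch under stated assumptions: the sample rows and the helper names is_superkey and is_candidate_key are hypothetical, not part of the course material. A sample can only refute a key, never prove one, because key-ness depends on domain knowledge rather than on the rows that happen to be stored.

```python
from itertools import combinations

# Hypothetical sample of the customer entity set; every value is made up.
customers = [
    {"cust-id": 1, "cust-name": "Bob", "cust-street": "Main", "cust-city": "College Park", "age": 30},
    {"cust-id": 2, "cust-name": "Bob", "cust-street": "Elm", "cust-city": "College Park", "age": 30},
    {"cust-id": 3, "cust-name": "Ann", "cust-street": "Main", "cust-city": "College Park", "age": 41},
    {"cust-id": 4, "cust-name": "Bob", "cust-street": "Main", "cust-city": "Greenbelt", "age": 52},
]


def is_superkey(rows, attrs):
    """Return True if no two rows agree on every attribute in attrs."""
    ordered = sorted(attrs)
    seen = set()
    for row in rows:
        value = tuple(row[a] for a in ordered)
        if value in seen:
            return False  # two rows are indistinguishable on attrs
        seen.add(value)
    return True


def is_candidate_key(rows, attrs):
    """Return True if attrs is a superkey and no proper non-empty subset is one."""
    attrs = set(attrs)
    if not is_superkey(rows, attrs):
        return False
    return not any(
        is_superkey(rows, subset)
        for size in range(1, len(attrs))
        for subset in combinations(sorted(attrs), size)
    )


print(is_superkey(customers, {"cust-id", "age"}))       # True
print(is_candidate_key(customers, {"cust-id", "age"}))  # False: age can be removed
print(is_candidate_key(customers, {"cust-id"}))         # True
print(is_candidate_key(customers, {"cust-name", "cust-city", "cust-street"}))  # True
```

On this sample, {cust-id, age} distinguishes every row but is not minimal, while {cust-name, cust-city, cust-street} is minimal only because the made-up rows repeat every single attribute and every pair of them.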