Conceptual Design with ER Model Lecture #2 1 Lecture Outline • Logistics • Steps in building a database application • Conceptual design with ER model 2 Steps in Building a DB Application • Step 1: Pick an application – something where you think you will have quite a bit of data – need to manage multiple users who process/query the data • Step 2: Application analysis and conceptual design – – – – discuss what to model for the application need a modeling language to express what you want ER model is the most popular such language output: an ER diagram of the application 3 name category Example ER Diagram name cid ssn Takes Course Student quarter Advises Teaches Professor address name field 4 Steps in Building a DB Application • Step 3: Translate the ER diagram into a relational schema – using a set of rules • Step 4: Refine the relational schema (normalization) – use a set of schema refinement rules to transform the above rel. schema into a good rel. schema • At this point – you have a good relational schema on paper 5 Example Relational Schema Students: SSN 123-45-6789 234-56-7890 Takes: Name Charles Dan … Category undergrad grad … SSN 123-45-6789 123-45-6789 234-56-7890 CID CSE444 CSE444 CSE142 … Courses: CID CSE444 CSE541 Name Databases Operating systems Quarter fall winter 6 Steps in Building a DB Application • Step 5: Pick a relational DB management system (RDBMS) – MS SQL server, IBM DB2, Oracle – MySQL, PostreSQL – SQLite, ... • Step 6: Implement your DB in that RDBMS – first, implement the schema – next, add the tuples – you can do all these using a language called SQL 7 Steps in Building a DB Application • Step 7: Query your DB using SQL – now that your DB is in place, you can query it using SQL in the “command-line interface” Find all courses that Mary takes select C.name from Students S, Takes T, Courses C where S.name = “Mary” and S.ssn = T.ssn and T.cid = C.cid 8 Steps in Building a DB Application • Subsequent steps – you may not want to access the DB just via the command line interface • ordinary users may not know how to interact with the database directly & ask SQL queries • and the database also cannot do everything you want; SQL is a limited language (not Turing complete, can’t do recursive stuff) – hence you often want to write an application program in C++, Java, Perl, etc to give lay users an easier way to query the DB and take care of things that the database cannot do – many such applications user a Web interface to interact 9 with users Database management system (RDBMS) DB 2 DB 1 Web browser PhP code interface App 2 DB 3 10 11 Database management system (RDBMS) DB 2 DB 1 Lay users PhP code DB 3 Database developer Database system administrator (DBA) 12 Steps in Building a DB Application • Subsequent steps – When the tables have many tuples, queries may run very slowly – so you may need to build indexes to speed up query execution – in the long run, need to maintain the DB, expand, clean, etc. • This class will discuss the above steps, and will look into the internals of RDBMSs 13 We Will Start with Step 2 • Step 1: Pick an application – something where you think you will have quite a bit of data – may need to manage multiple users who process/query the data • Step 2: Application analysis and conceptual design – – – – discuss what to model for the application need a modeling language to express what you want ER model is the most popular such language output: an ER diagram of the application 14 ER Model • Gives us a language to specify – what information the db must hold – what are the relationships among components of that information • Proposed by Peter Chen in 1976 • What we will cover – – – – basic stuff: entities, relations constraints weak entity sets design principles 15 Basic Concepts name category name price makes Company Product stockprice buys employs Person address name ssn 16 Entities and Attributes • Entities – real-world objects distinguishable from other objects – described using a set of attributes price name Product name category Company • Attributes stockprice – each has an atomic domain: string, integers, reals, etc. • Entity set: a collection of similar entities 17 Relations • A mathematical definition: – if A, B are sets, then a relation R is a subset of A x B • A={1,2,3}, B={a,b,c,d}, 1 a 2 b R = {(1,a), (1,c), (3,b)} A= c 3 B= d makes is a subset of Product x Company: makes Company Product 18 ER Diagram name category name price makes Company Product stockprice buys employs Person address name ssn 19 More about relationships ... 20 Multiplicity of E/R Relations • one-one: • many-one a b c d 1 2 3 1 2 3 • many-many 1 2 3 a b c d a b c d makes Company Product 21 Multiplicity of E/R Relations • one-one: • many-one a b c d 1 2 3 1 2 3 • many-many 1 2 3 a b c d 1 c 2 a 3 d 1 b 2 b 3 d a b c d makes Company Product 22 Important: Difference with the Cow Book (Raghu Book) • We will use Employee Manages Department • Cow book use (see Page 33, 3rd edition) Employee Manages Department • You should use the notations in the lectures 23 Multiway Relationships How do we model a purchase relationship between buyers, products and stores? Product Purchase Store Person Can still model as a mathematical set (how?) 24 • When creating an ER diagram, you will create various relationships • Then decide on the “cardinality” of those – should they be 1-1, 1-many, many-1, many-many, etc • Then specify these cardinalities on the ER diagram • You may or may not be able to specify the *exact* cardinality requirements – specially for multiway relationships • May need to approximate 25 Example Given the following relationship Invoice VideoStore Rental Movie Person Suppose invoice determines the rest How would we specify this? 26 Example There is no good way; here’s a possible solution Invoice VideoStore Rental Movie Person Q: Why is this incomplete ? 27 A: Because of the meaning of such an arrow sign Invoice VideoStore Rental Movie Person If I know the store, person, invoice, I know the movie too 28 Roles in Relationships What if we need an entity set twice in one relationship? Product Purchase Person salesperson Store Person buyer 29 • This is a bit confusing, because we have no idea what role each Person entity would play • So we should label the edges to indicate the roles Product Purchase salesperson Person Store buyer Person 30 We can condense this further by collapsing the two Persons into one Product Purchase Store buyer salesperson Person 31 Attributes on Relationships date Product Purchase Store Person 32 Attributes of Entities and Relationships always have atomic values! 33 Consider Attribute Phone of Professors name category name cid ssn Takes Course Student quarter Advises Teaches Professor address name phone 34 Converting Multiway Relationships to Binary date ProductOf Product StoreOf Store BuyerOf Person Purchase 35 Relationships: Summary • Modeled as a mathematical set • Binary and multiway relationships • Converting a multiway one into many binary ones • Constraints on the degree of the relationship – many-one, one-one, many-many – limitations of arrows • Attributes of relationships – not necessary, but useful 36 Subclasses in ER Diagrams name category price Product isa Software Product platforms isa Educational Product 37 Age Group Subclasses • Subclass = special case = fewer entities = more properties. • Example: Ales are a kind of beer. – Not every beer is an ale, but some are. – Let us suppose that in addition to all the properties (attributes and relationships) of beers, ales also have the attribute color. 38 Subclasses in ER Diagrams • Assume subclasses form a tree. – I.e., no multiple inheritance. • Isa triangles indicate the subclass relationship. – Point to the superclass. 39 Example name Beers manf isa color Ales 40 Constraints • A constraint = an assertion about the database that must be true at all times • Part of the database schema • Very important in database design • When creating the ER diagram, you need to find as many constraints as possible • And specify them on the ER diagram 41 Modeling Constraints Finding constraints is part of the modeling process. Commonly used constraints: Keys: social security number uniquely identifies a person. Single-value constraints: a person can have only one father. Referential integrity constraints: if you work for a company, it must exist in the database. Domain constraints: peoples’ ages are between 0 and 150. General constraints: all others (at most 50 students enroll in a class) 42 Why Constraints are Important? • Give more semantics to the data – help us better understand it • Prevent wrong data entry • Allow us to refer to entities (e.g, using keys) • Enable efficient storage, data lookup, etc. 43 Keys in E/R Diagrams name Underline: category price No formal way to specify multiple keys in E/R diagrams Product Person address name ssn 44 More about Keys • Every entity set must have a key – why? • A key can consist of more than one attribute • There can be more than one key for an entity set – one key will be designated as primary key • Requirement for key in an isa hierarchy – not covered in this lecture 45 Single Value Constraint • At most one value play a particular role • An attribute of a entity set has a single value – we can specify if the value must be present or can be missing (represented with say NULL or -1) – example in real-estate domain • price vs. house-style • A many-one relation implies single value const. 46 Referential Integrity Constraint • Single value constraint: at most one value exists in a given role • Ref. int. constraint: exactly one value exists in a given role • An attribute has a non-null, single value – this can be considered a kind of ref. int. constraint • However, we more commonly use such constraints to refer to relationships 47 Referential Integrity Constraints • In some formalisms we may refer to other object but get garbage instead – e.g. a dangling pointer in C/C++ • the Referential Integrity Constraint on relationships explicitly requires a reference to exists 48 Referential Integrity Constraints Product makes Company Product makes Company • This will be even clearer once we get to relational databases 49 Other Kinds of Constraints • • • • Domain constraints Constraints on degree of a relationship Other more general constraints See the book 50 Weak Entity Sets Entity sets are weak when their key attributes come from other classes to which they are related. This happens if: - part-of hierarchies - splitting n-ary relations to binary. affiliation Team sport number ) University name 51 Weak Entity Sets • Occasionally, entities of an entity set need “help” to identify them uniquely. • Entity set E is said to be weak if in order to identify entities of E uniquely, we need to follow one or more many-one relationships from E and include the key of the related entities from the connected entity sets. 52 Now, about design techniques ... 53 Design Principles 1: Be Faithful Product Country Instructor Purchase President Teaches Person Person Course 54 Design Principles 2: Avoid Redundancy date Product Purchase personAddr Store personName 55 Design Principles 3: KISS date Dates Product Purchase Store Person • Read the book for more design principles 56 More on Design Techniques (from Ullman's slides) 1. Avoid redundancy. 2. Limit the use of weak entity sets. 3. Don’t use an entity set when an attribute will do. 57 Avoiding Redundancy • Redundancy occurs when we say the same thing in two different ways. • Redundancy wastes space and (more importantly) encourages inconsistency. – The two instances of the same fact may become inconsistent if we change one and forget to change the other, related version. 58 Example: Good name Beers name ManfBy addr Manfs This design gives the address of each manufacturer exactly once. 59 Example: Bad name Beers name ManfBy addr Manfs manf This design states the manufacturer of a beer twice: as an attribute and as a related entity. 60 Example: Bad name manf manfAddr Beers This design repeats the manufacturer’s address once for each beer; loses the address if there are temporarily no beers for a manufacturer. 61 Entity Sets Versus Attributes • An entity set should satisfy at least one of the following conditions: – – It is more than the name of something; it has at least one nonkey attribute. or It is the “many” in a many-one or many-many relationship. 62 Example: Good name Beers name ManfBy addr Manfs •Manfs deserves to be an entity set because of the nonkey attribute addr. •Beers deserves to be an entity set because it is the “many” of the many-one relationship ManfBy. 63 Example: Good name manf Beers There is no need to make the manufacturer an entity set, because we record nothing about manufacturers besides their name. 64 Example: Bad name Beers name ManfBy Manfs Since the manufacturer is nothing but a name, and is not at the “many” end of any relationship, it should not be an entity set. 65 Don’t Overuse Weak Entity Sets • Beginning database designers often doubt that anything could be a key by itself. – They make all entity sets weak, supported by all other entity sets to which they are linked. • In reality, we usually create unique ID’s for entity sets. – Examples include social-security numbers, automobile VIN’s etc. 66 When Do We Need Weak Entity Sets? • The usual reason is that there is no global authority capable of creating unique ID’s. • Example: it is unlikely that there could be an agreement to assign unique player numbers across all football teams in the world. 67 ER Review • Basic stuff – – – – entity, attribute, entity set relation: binary, multiway, converting from multiway relationship roles, attributes on relationships subclasses (is-a) • Constraints – on relations • many-one, one-one, many-many • limitations of arrows – keys, single-valued, ref integrity, domain & general constraints 68 ER Review • Weak entity set • Design principles – be faithful – avoid redundancy – KISS 69