CS 405G: Introduction to Database Systems
Lecture 4: Relational Model
Instructor: Chen Qian
7/1/2016

Review
- A data model is a group of concepts for describing data.
- What are the two terms used by the ER model to describe a miniworld? Entity and Relationship.
- What makes a good conceptual database design?

Today's Outline
- Relational Model
- Relational Model and Relational Database Schemas
  - Informal definition, not so formal, and formal
- Relational Model Constraints

Why Study the Relational Model?
- Most widely used model.
- "Legacy systems" in older models, e.g., IBM's IMS
- Object-oriented concepts merged in the "Object-Relational" model
  - Early work done in the POSTGRES research project at Berkeley
- XML features in most relational systems
  - Can export XML interfaces
  - Can embed XML inside relational fields

Historically
- The model was first proposed by Dr. E.F. Codd of IBM in 1970 in the following paper: "A Relational Model of Data for Large Shared Data Banks," Communications of the ACM, June 1970.
- The paper caused a major revolution in the field of database management and earned Ted Codd the coveted ACM Turing Award.
- The picture is from Wikipedia.

Database Design

Relational Model Concepts
- Relational database: a set of relations.
- Relation: made up of 2 parts:
  - Schema: specifies the name of the relation, plus the name and type of each attribute. E.g., Students(sid: string, name: string, login: string, age: integer, gpa: real)
  - Instance: a table, with rows and columns. #rows = cardinality; #fields = degree / arity

Relation
- RELATION: a table of values
- A relation may be thought of as a set of rows (table view).
- Each row represents a fact that corresponds to a real-world entity or relationship.
- Each row has a value of an item or set of items that uniquely identifies that row in the table.
- Sometimes row-ids or sequential numbers are assigned to identify the rows in the table.
- A relation may alternately be thought of as a set of columns (schema view).
- Each column is typically called by its column name, column header, or attribute name.

A (Slightly) Formal Definition
- A database is a collection of relations (or tables)
- Each relation is identified by a name and a list of attributes (or columns)
- Each attribute has a name and a domain (or type), such as SID: string
  - Set-valued attributes not allowed
- Simplicity is a virtue!

Schemas
- Relation schema = relation name + attributes + types of attributes, in order
- Example: Beers(name, manf) or Beers(name: string, manf: string)
- Database = collection of relations.
- Database schema = set of all relation schemas in the database.

Schema versus instance
- Schema (metadata)
  - Students(sid: string, name: string, login: string, age: integer, gpa: real)
  - Specification of how data is to be structured logically
  - Defined at set-up; rarely changes
- Instance
  - Content
  - Changes rapidly, but always conforms to the schema
- Compare to a type and the objects of that type in a programming language
- Entity and entity type?

Example
- Schema
  - Student (SID integer, name string, age integer, GPA float)
  - Course (CID string, title string)
  - Enroll (SID integer, CID string)
- Instance
  - { (142, Amy, 20, 3.3), (123, Bob, 22, 3.1), ... }
  - { (CS405G, Intro. to Database Systems), ... }
  - { (142, CS405G), (142, CS314), ... }

Formal Definition (Set Theory)
- Formally, given sets D1, D2, …, Dn, a relation r is a subset of D1 × D2 × … × Dn
- ×: Cartesian product. For sets A and B, the Cartesian product A × B is the set of all ordered pairs (a, b) where a ∈ A and b ∈ B.
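The Cartesian product definition above can be tried out directly. The following is a minimal Python sketch; the two small domains and the helper name is_relation_over are made up for illustration and are not part of the lecture.

```python
from itertools import product

# Hypothetical small domains, chosen only for illustration.
customer_name = {"Jones", "Smith"}
customer_city = {"Harrison", "Rye"}

# A x B: the set of all ordered pairs (a, b) with a in A and b in B.
cartesian = set(product(customer_name, customer_city))

# A relation is any subset of the Cartesian product of its domains.
r = {("Jones", "Harrison"), ("Smith", "Rye")}


def is_relation_over(rel, *domains):
    """Return True if every tuple in rel takes its i-th value from the i-th domain."""
    return all(
        len(t) == len(domains) and all(v in d for v, d in zip(t, domains))
        for t in rel
    )


print(len(cartesian))                                     # size of A x B
print(r <= cartesian)                                     # r is a subset of A x B
print(is_relation_over(r, customer_name, customer_city))
```

Note that the order of the domains matters: the pair (Rye, Jones) is not a member of customer_name × customer_city.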
- Thus, a relation is a set of n-tuples (a1, a2, …, an) where each ai ∈ Di

Example
- If
  customer_name = {Jones, Smith, Curry, Lindsay, …}
  customer_street = {Main, North, Park, …}
  customer_city = {Harrison, Rye, Pittsfield, …}
- Then r = { (Jones, Main, Harrison), (Smith, North, Rye), (Curry, North, Rye), (Lindsay, Park, Pittsfield) } is a relation over customer_name × customer_street × customer_city

Attribute Types
- Each attribute of a relation has a name, designating the role of the attribute
- The set of allowed values for each attribute is called the domain of the attribute
- Attribute values (domain members) are required to be atomic; that is, indivisible
  - E.g., the value of an attribute can be an account number, but cannot be a set of account numbers
- A domain is said to be atomic if all its members are atomic
- The special value null is a member of every domain

Relation Schema
- A1, A2, …, An are attributes
- R = (A1, A2, …, An) is a relation schema
  - Example: Customer_schema = (customer_name, customer_street, customer_city)
- r(R) denotes a relation r on the relation schema R
  - Example: customer (Customer_schema)

Relation Instance
- The current values (relation instance) of a relation are specified by a table
- An element t of r is a tuple, represented by a row in a table

  customer (attributes are the columns, tuples are the rows)
  customer_name  customer_street  customer_city
  Jones          Main             Harrison
  Smith          North            Rye
  Curry          North            Rye
  Lindsay        Park             Pittsfield

Definition Summary

  Informal Terms      Formal Terms
  Table               Relation
  Column              Attribute/Domain
  Row                 Tuple
  Values in a column  Domain
  Table Definition    Schema of a Relation
  Populated Table     Extension

Characteristics of Relation
- The tuples in a relation r(R) are not considered to be ordered, even though they appear to be ordered in the tabular form.
- We consider the attributes in R(A1, A2, ..., An) and the values in t = <v1, v2, ..., vn> to be ordered.
- All values are considered atomic (indivisible).
- A special null value is used to represent values that are unknown or inapplicable to certain tuples.

Characteristics of Relation
- Notation: we refer to component values of a tuple t by t[Ai] = vi (the value of attribute Ai for tuple t).
- Similarly, t[Au, Av, ..., Aw] refers to the subtuple of t containing the values of attributes Au, Av, ..., Aw, respectively.

Relational Integrity Constraints
- Integrity constraints are conditions that must hold on all valid relation instances.
- There are four main types of constraints:
  1. Domain constraints: the value of an attribute must come from its domain
  2. Key constraints
  3. Entity integrity constraints
  4. Referential integrity constraints

Primary Key Constraints
- A set of fields is a candidate key (abbreviated as key) for a relation if:
  1. No two distinct tuples can have the same values in all key fields, and
  2. Property 1 is not true for any proper subset of the key.
- What if Part 2 is false? A superkey: a set of fields that contains a key.
- If there are multiple keys for a relation, one of the keys is chosen (by the DBA) to be the primary key.

Key Example
- E.g., given a schema Student(sid: string, name: string, gpa: float) we have:
  - sid is a key for Student. (What about name?)
  - The set {sid, gpa} is a superkey.
- CAR(licence_num: string, Engine_serial_num: string, make: string, model: string, year: integer)
  - What are the candidate keys?
  - Which one might you use as the primary key?
  - What are the superkeys?

Entity Integrity
- Entity integrity: the primary key attributes (PK) of each relation schema R cannot have null values in any tuple of r(R).
- Other attributes of R may be similarly constrained to disallow null values, even though they are not members of the primary key.

Foreign Keys, Referential Integrity
- Foreign key: a set of fields in one relation that is used to "refer" to a tuple in another relation. (Must correspond to the primary key of the second relation.) Like a "logical pointer".
- Foreign key constraint: every foreign key value in the referencing relation must match the primary key value of some tuple in the referenced relation.
- E.g., sid is a foreign key referring to Student:
  - Student(sid: string, name: string, gpa: float)
  - Enrolled(sid: string, cid: string, grade: string)
- If all foreign key constraints are enforced, referential integrity is achieved, i.e., no dangling references.

Foreign Key Constraints
- Only students listed in the Students relation should be allowed to enroll for courses.

  Enrolled
  sid    cid          grade
  53666  Carnatic101  C
  53666  Reggae203    B
  53650  Topology112  A
  53666  History105   B

  Students
  sid    name   login       age  gpa
  53666  Jones  jones@cs    18   3.4
  53688  Smith  smith@eecs  18   3.2
  53650  Smith  smith@math  19   3.8

- Possible violation: add <50000, History105, B> to Enrolled.
- Possible violation: delete <53650, Smith, …> from Students.

Update Operations on Relations
- Update operations
  - INSERT a tuple.
  - DELETE a tuple.
  - MODIFY a tuple.
- Constraints should not be violated in updates

From E/R Diagrams to Relations
- Called logical design (different from conceptual design)
- Entity sets become relations with the same set of attributes.
- Relationships become relations whose attributes are only:
  - The keys of the connected entity sets.
  - Attributes of the relationship itself.

Design principles
- KISS: Keep It Simple, Stupid
- Avoid redundancy: redundancy wastes space, complicates updates and deletes, promotes inconsistency
- Capture essential constraints, but don't introduce unnecessary restrictions
- Use your common sense
Luke Huan, Univ. of Kansas

Entity Set -> Relation
[E/R diagram: entity set Beers with attributes name, manf]
- Relation: Beers(name, manf)

Relationship -> Relation
- To represent a relationship, the attributes of the relation include:
  1. the primary key attributes of each participating entity set, becoming foreign keys.
  2. the descriptive attributes of the relationship set
[E/R diagram: entity set employee (name, addr), relationship Work (duration), entity set department (name, location)]
- Work(employee name, dept name, duration)

Relationship -> Relation
- The set of nondescriptive attributes is a candidate key, if there are no key constraints.
[E/R diagram: entity set employee (name, addr), relationship Work (duration), entity set department (name, location)]
- Work(employee name, dept name, duration)

Relationship -> Relation
- If there is a key constraint, the key of the entity with an arrow is the candidate key of the relation.
[E/R diagram: entity set employee (name, addr), relationship manage (duration), entity set department (name, location)]
- Manage(employee name, dept name, duration)

Relationship -> Relation
[E/R diagram: entity sets Drinkers (name, addr) and Beers (name, manf); relationships Likes and Favorite between Drinkers and Beers; Buddies on Drinkers with roles 1 and 2; Married on Drinkers with roles husband and wife]
- Likes(drinker name, beer name)
- Favorite(drinker name, beer name)
- Buddies(name1, name2)
- Married(husband name, wife name)

Combining Relations
- It is OK to combine the relation for an entity set E with the relation R for a many-one relationship from E to another entity set.
- Example: Drinkers(name, addr) and Favorite(drinker, beer) combine to make Drinker1(name, addr, favBeer).
[E/R diagram: entity set Drinkers (name, addr), relationship Favorite, entity set Beers (name, manf)]

Combining Relations
- Risk with many-many relationships: combining Drinkers with Likes would be a mistake. It leads to redundancy.
[E/R diagram: entity set Drinkers (name, addr), relationship Likes, entity set Beers (name, manf)]

  name   addr       beer
  Sally  123 Maple  Bud
  Sally  123 Maple  Miller

  Redundancy: the addr value 123 Maple is stored once per beer.

Handling Weak Entity Sets
- The relation for a weak entity set must include attributes for its complete key (including those belonging to other entity sets), as well as its own, nonkey attributes.
- An identifying (double-diamond) relationship is redundant and yields no relation.

Translating weak entity sets
- Remember the "borrowed" key attributes
- Watch out for attribute name conflicts
[E/R diagram: entity sets Buildings (name, year), Rooms (number, capacity), Seats (number, L/R?); identifying relationship In between Rooms and Buildings, and In between Seats and Rooms]
- Building (building_name, year)
- Rooms (building_name, room_number, capacity)
- Seats (building_name, room_number, seat_number, left_or_right)
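One way to see the "borrowed" key attributes at work is to declare the three relations in a real engine. The sketch below uses Python's built-in sqlite3 module. The column types, the sample rows, and the helper try_insert are assumptions added for illustration; the slides give only the attribute names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled

# Column types are assumed; the slides list attribute names only.
conn.executescript("""
CREATE TABLE Building (
    building_name TEXT NOT NULL PRIMARY KEY,
    year          INTEGER
);
CREATE TABLE Rooms (
    building_name TEXT NOT NULL REFERENCES Building(building_name),
    room_number   INTEGER NOT NULL,
    capacity      INTEGER,
    PRIMARY KEY (building_name, room_number)      -- borrowed key + own key
);
CREATE TABLE Seats (
    building_name TEXT NOT NULL,
    room_number   INTEGER NOT NULL,
    seat_number   INTEGER NOT NULL,
    left_or_right TEXT,
    PRIMARY KEY (building_name, room_number, seat_number),
    FOREIGN KEY (building_name, room_number)
        REFERENCES Rooms(building_name, room_number)
);
""")


def try_insert(sql, params):
    """Hypothetical helper: True if the row is accepted, False if a constraint rejects it."""
    try:
        conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False


# Made-up sample rows.
try_insert("INSERT INTO Building VALUES (?, ?)", ("Hall", 1990))
try_insert("INSERT INTO Rooms VALUES (?, ?, ?)", ("Hall", 101, 40))

seat_ok = try_insert("INSERT INTO Seats VALUES (?, ?, ?, ?)", ("Hall", 101, 1, "L"))
seat_no_room = try_insert("INSERT INTO Seats VALUES (?, ?, ?, ?)", ("Hall", 999, 1, "R"))
seat_duplicate = try_insert("INSERT INTO Seats VALUES (?, ?, ?, ?)", ("Hall", 101, 1, "R"))
print(seat_ok, seat_no_room, seat_duplicate)
```

No separate table is created for either In relationship: the composite keys of Rooms and Seats already record which building and room each tuple belongs to.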
Example
[E/R diagram: weak entity set Logins (name, time), relationship At, entity set Hosts (name)]

Example
[E/R diagram: weak entity set Logins (name, time), relationship At, entity set Hosts (name)]
- Hosts(hostName)
- Logins(loginName, hostName, time)
- At(loginName, hostName, hostName2)
  - hostName and hostName2 must be the same
  - At becomes part of Logins

Mapping of N-ary Relationship Types
- For each n-ary relationship type R, where n > 2, create a new relation to represent R.
- Include, as foreign keys, the primary keys of all participating entity types.
- Include any attributes of the n-ary relationship type.

[Figure: Ternary relationship types. (a) The SUPPLY relationship.]

[Figure: Mapping the n-ary relationship type SUPPLY]

Some exercise
- Consider the relations Students, Faculty, Courses, Rooms, Enrolled, Teaches, and Meets.
  1. List all the foreign key constraints among these relations.
  2. Give an example of a (plausible) constraint involving one or more of these relations that is not a primary key or foreign key constraint.

Some exercise
  1. No foreign keys for Students, Faculty, Courses, Rooms.
     - Enrolled: sid and cid should both have foreign key constraints (FKCs) placed on them. (Real students must be enrolled in real courses.)
     - Teaches: fid and cid
     - Meets: cid and rno
  2. The length of sid, cid, and fid could be standardized; limits could be placed on the size of the numbers entered into the credits, room/course capacity, and faculty salary; an enumerated type should be assigned to the grade field; etc.

Example
- We have the following relational schemas
  - Student(sid: string, name: string, gpa: float)
  - Course(cid: string, department: string)
  - Enrolled(sid: string, cid: string, grade: character)
- We have the following sequence of database update operations. (Assume all tables are empty before we apply any operations.)
- INSERT <‘1234’, ‘John Smith’, ‘3.5’> into Student

  Student
  sid   name        gpa
  1234  John Smith  3.5

Chen Qian, University of Kentucky

Example (Cont.)
- INSERT <‘647’, ‘EECS’> into Course
- INSERT <‘1234’, ‘647’, ‘B’> into Enrolled
- UPDATE the grade in the Enrolled tuple with sid = 1234 and cid = 647 to ‘A’.
- DELETE the Enrolled tuple with sid 1234 and cid 647

  Student
  sid   name        gpa
  1234  John Smith  3.5

  Course
  cid  department
  647  EECS

  Enrolled
  sid   cid  grade
  1234  647  B, updated to A, then the tuple is deleted

Exercise
- INSERT <‘108’, ‘MATH’> into Course
- INSERT <‘1234’, ‘108’, ‘B’> into Enrolled
- INSERT <‘1123’, ‘Mary Carter’, ‘3.8’> into Student

  Student
  sid   name         gpa
  1234  John Smith   3.5
  1123  Mary Carter  3.8

  Course
  cid  department
  647  EECS
  108  MATH

  Enrolled
  sid   cid  grade
  1234  108  B

Exercise (cont.)
A little bit tricky
- INSERT <‘1125’, ‘Bob Lee’, ‘good’> into Student: fails due to the domain constraint
- INSERT <‘1123’, NULL, ‘B’> into Enrolled: fails due to entity integrity
- INSERT <‘1233’, ‘647’, ‘A’> into Enrolled: fails due to referential integrity
- All three tables are unchanged.

Exercise (cont.)
A more tricky one
- UPDATE the cid in the tuple from Course where cid = 108 to 109

  Student
  sid   name         gpa
  1234  John Smith   3.5
  1123  Mary Carter  3.8

  Course
  cid  department
  647  EECS
  109  MATH  (was 108)

  Enrolled
  sid   cid  grade
  1234  109  B  (cid was 108)

Update Operations on Relations
- In case of an integrity violation, several actions can be taken:
  - Cancel the operation that causes the violation (REJECT option)
  - Perform the operation but inform the user of the violation
  - Trigger additional updates so the violation is corrected (CASCADE option, SET NULL option)
  - Execute a user-specified error-correction routine

Next class
- Relational algebra (hard part!)
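The update exercise above can be replayed against a real engine. The following is a sketch using Python's built-in sqlite3 module, under several assumptions that are not in the slides: (sid, cid) is taken as the primary key of Enrolled, the domain constraint on gpa is approximated with a CHECK on the stored type (SQLite is dynamically typed), violations are handled with the REJECT option, and the cid update uses the CASCADE option. The helper attempt is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled

conn.executescript("""
CREATE TABLE Student (
    sid  TEXT NOT NULL PRIMARY KEY,
    name TEXT,
    gpa  REAL CHECK (typeof(gpa) = 'real')   -- assumed stand-in for the domain constraint
);
CREATE TABLE Course (
    cid        TEXT NOT NULL PRIMARY KEY,
    department TEXT
);
CREATE TABLE Enrolled (
    sid   TEXT NOT NULL REFERENCES Student(sid),
    cid   TEXT NOT NULL REFERENCES Course(cid) ON UPDATE CASCADE,
    grade TEXT,
    PRIMARY KEY (sid, cid)                   -- assumed key; NOT NULL gives entity integrity
);
""")


def attempt(sql, params=()):
    """Run one statement; True if accepted, False if a constraint rejects it (REJECT option)."""
    try:
        conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False


# Operations from the slides that should succeed.
attempt("INSERT INTO Student VALUES (?, ?, ?)", ("1234", "John Smith", 3.5))
attempt("INSERT INTO Course VALUES (?, ?)", ("647", "EECS"))
attempt("INSERT INTO Enrolled VALUES (?, ?, ?)", ("1234", "647", "B"))
attempt("UPDATE Enrolled SET grade = 'A' WHERE sid = '1234' AND cid = '647'")
attempt("DELETE FROM Enrolled WHERE sid = '1234' AND cid = '647'")
attempt("INSERT INTO Course VALUES (?, ?)", ("108", "MATH"))
attempt("INSERT INTO Enrolled VALUES (?, ?, ?)", ("1234", "108", "B"))
attempt("INSERT INTO Student VALUES (?, ?, ?)", ("1123", "Mary Carter", 3.8))

# The three "tricky" inserts.
domain_ok = attempt("INSERT INTO Student VALUES (?, ?, ?)", ("1125", "Bob Lee", "good"))
entity_ok = attempt("INSERT INTO Enrolled VALUES (?, ?, ?)", ("1123", None, "B"))
referential_ok = attempt("INSERT INTO Enrolled VALUES (?, ?, ?)", ("1233", "647", "A"))

# CASCADE option: changing cid 108 to 109 in Course also rewrites the Enrolled tuple.
attempt("UPDATE Course SET cid = '109' WHERE cid = '108'")
enrolled = conn.execute("SELECT sid, cid, grade FROM Enrolled").fetchall()

print(domain_ok, entity_ok, referential_ok, enrolled)
```

Dropping ON UPDATE CASCADE from the Enrolled definition would make the last UPDATE fail instead, which corresponds to the REJECT option.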