Chap 3,6 6e & 7 5e : Relational Model Parts 1 & 2 CSE 4701 Prof. Steven A. Demurjian, Sr. Computer Science & Engineering Department The University of Connecticut 191 Auditorium Road, Box U-155 Storrs, CT 06269-3155 steve@engr.uconn.edu http://www.engr.uconn.edu/~steve (860) 486 - 4818 A large portion of these slides are being used with the permission of Dr. Ling Lui, Associate Professor, College of Computing, Georgia Tech. The remainder of these slides have been adapted from the AWL web site for the textbook. Chapter 3-1 Combining Chapters 3 and 6 6ed CSE 4701 What is a Relational Data Model? Schema, Tables, Attributes/Columns, Tuples Characteristics and Examples Referential Integrity Superkeys, Candidate Primary and Foreign Keys Referential Integrity Constrains Relational Algebra Selection, Project, Join, Union, Intersection Advanced Concepts Operations on Relations Chapter 3-2 Essentials of Relational Approach CSE 4701 Relational Model of Data is Based on the Concept of Relations A Relation is a Mathematical Concept Based on Sets Theory of Relations Provides a Formal Foundation for the Relational Data Model The Model Was First Proposed by Dr. E.F. Codd (IBM) in 1970 in the Paper, Entitled "A Relational Model for Large Shared Data Banks," Communications of the ACM, June 1970 Chapter 3-3 Relational Data Model: Data Structure CSE 4701 Relational Data Model Structures a Database as a Set of Relations. A Relation Set of Tuples and Typically Shown as a Table With Columns and Rows. Column (Field) Represents an Attribute Row (Tuple) Represents an Entity or a Relationship Attributes relation name t1 t2 A1 v11 v21 tm vm1 R Tuples . . . A2 v12 v22 vm2 ...... An v1n v2n t1[An] vmn Chapter 3-4 Two Versions of a Student Relation CSE 4701 Chapter 3-5 Basic Concepts - Relation Schema CSE 4701 A Schema of a Relation Denoted as R(A :D , A :D , ..., A :D ) 1 1 2 2 n n Set of Attributes That Describe a Relation Denoted by {A1:D1, A2:D2 , ..., An:Dn}, where Ai (i=1, …, n) is Attribute Name and Di is Domain Over Which Ai is Defined Domain The Set of Values From which the Values of an Attribute Aj are Drawn, Denoted by Domain(Aj) Example STUDENT (s#, sname, email, dept) Domain(s#): Number(9) Domain(sname): Char(30) Domain(email): Char(20) Domain(dept): Char(15) Chapter 3-6 Basic Concepts CSE 4701 Relation Scheme - Definition of a Relation Set of Attributes that Describe a Relation e.g., R( A1, A2 , ..., An) Domain - Set of Values from which the Values of an Attribute Are Drawn Denoted by Domain(aj) Relation (Relation Instance) Subset of the Cartesian Product of Domains that Defines its Schema Occurrence of a Relation Scheme R(r) = {T1, T2, ..., Tm}. Cardinality is the Number of Tuples Chapter 3-7 Basic Concepts CSE 4701 Tuple A Row in a Relational Table - ti = {vi1,vi2, ...,vin} Attribute A Column in a Relational Table Projection of an Attribute Aj is {v1j,v2j, ...,vmj}, a Subset of Domain(Aj.) Several Attributes may be Defined on the same Domain (e.g., date of purchase, date of order, etc.) Null Value Special Value Meaning “not known” or “not applicable” … Must be a Value - Even if it is Null Degree - the Number of Attributes Chapter 3-8 What is an Example? CSE 4701 R(A, B) is a Relation Schema Defined over A and B Let domain(A) = {a1, a2} and domain(B) = {0, 1, 2} Tuples are: <a1, 0>, <a1, 2>, <a2, 2> etc. How Many Possible Tuples are there? Entire Relation is a set: {<a1, 0>, <a1, 2>, <a2, 2> etc. } Chapter 3-9 Relation Schemes CSE 4701 Example EMP(ENO, ENAME, TITLE, SAL) PROJ (PNO, PNAME, BUDGET) WORKS(ENO, PNO, RESP, DUR) Underlined Attributes are Relation Keys which Uniquely Distinguish Among Tuples (Rows) Tabular Form EMP ENO ENAME TITLE PROJ PNO PNAME BUDGET WORKS ENO PNO RESP SAL DUR Chapter 3-10 Relation Instances CSE 4701 EMP WORKS ENO ENAME TITLE E1 E2 E3 E4 E5 E6 E7 E8 J. Doe M. Smith A. Lee J. Miller B. Casey L. Chu R. Davis J. Jones Elect. Eng. Syst. Anal. Mech. Eng. Programmer Syst. Anal. Elect. Eng. Mech. Eng. Syst. Anal. ENO PNO E1 E2 E2 E3 E3 E4 E5 E6 E7 E7 E8 P1 P1 P2 P3 P4 P2 P2 P4 P3 P5 P3 RESP DUR Manager Analyst Analyst Consultant Engineer Programmer Manager Manager Engineer Engineer Manager 12 24 6 10 48 18 24 48 36 23 40 PROJ PNO PNAME BUDGET P1 P2 P3 P4 P5 Instrumentation Database Develop. CAD/CAM Maintenance CAD/CAM 150000 135000 250000 310000 500000 PROJ[PNO] P1 P2 P3 P4 P5 EMP[TITLE] Elect.Eng Syst. Anal Mech. Eng Programmer Chapter 3-11 Examples (cont.) CSE 4701 Quiz: R(A, B) is a Relation Schema Defined over A and B Let domain(A) = {a1, a2} and domain(B) = {0, 1, 2} Which of the Following are Relations of R? {(a1, 1), (a1, 2), (a2, 0)} {(a1, 0), (a1, 1), (a1, 2)} {(a1, 1), (a2, 2}, (a0, 0)} {(a1, 1), (a2, a2}, (a0, a0)} {(a1, 1, c1), (a2, 2)} What if Attribute A is a Key? Chapter 3-12 Characteristics of Attributes CSE 4701 Attribute Name Attribute Value - Must have a Value An Attribute Name Refers to a Position in a Tuple by Name Rather than Position An Attribute Name Indicates the Role of a Domain in a Relation Attribute Names must be Unique Within Relations By Using Attribute Names we can Disregard the Ordering of Field Values in Tuples Must Be an Atomic Value Can Be a Null Value Meaning “Not Known”, “Not Applicable” ... Not Possible to have Undefined Value Chapter 3-13 Characteristics of Relations CSE 4701 No Duplicate Tuples It is a Set! The Primary Key Always Exists No Explicit or Implicit Ordering of Tuples No Ordering of Attributes (If They Are Referred to by Their Names) All Attribute Values Are Atomic A Special Null Value is Used to Represent Values that are Unknown or Inapplicable to Certain Tuples Thus - If “No” Value is Desired, “Null” is Used Chapter 3-14 Other Examples CSE 4701 Which of Following are Relations in a Relational Model? R1 A B C D a2 {b1, b2} c1 d5 a2 b7 c9 d5 a2 b23 c22 d1 …... Employee E# Ename AGE E2 Diamond 45 E1 Smith 30 E3 Evan R2 A a2 a2 a2 B …... b2 b7 b7 C D c6 d1 c9 d5 c9 d5 ADDRESS 1888 Buford Hyw. 3302 Peachtree Rd., Atlanta, GA Baker Ct. Atlanta Chapter 3-15 Data Structure: Summary CSE 4701 Relational Schema R( A1:D1, A2 :D2 , ..., An :Dn) Relation R(r) With Tuples of n Columns Denoted as Ti = {vi1,vi2, ...,vin} Attributes Aj (I=1,…,m) and R[aj] = {v1j,v2j, ...,vmj}, Domain(aj.) is a Subset of D , and Several Attributes 1 may be Defined on the Same Domain Degree N: Number of Attributes Cardinality M: Number of Tuples Chapter 3-16 Quiz CSE 4701 R(A, B) is a Relation Schema Defined Over A and B Let Domain(A) = {a1, a2} and Domain(B) = {0, 1, 2} Is R(A, B) Equivalent to R(B, A)? Yes How May Possible Tuples? 2×3=6 Is the Set {(a1, 1), (a2, 2}, (a0, 0)} a Relation of Schema R? No What is the Degree of a Relation of Schema R? 2 What is the Cardinality of the Following Relation 3 {(a1, 1), (a1, 2), (a2, 0)} of Schema R? Chapter 3-17 Summary of Model: Common Terms CSE 4701 Informal Table Column Row (Instance) Table Definition Populated Table Formal Relation Attribute Tuple Schema of Relation Extension Chapter 3-18 Summary of Model: Theoretical Foundation CSE 4701 Notion of Relation and Tuple is Modeled as in Set Theory Changes From Set Theory Existence of Null Value in the Tuples Most Implementation Allow Duplicate Tuples in Result Sets (such as Projection) Interpretation of Relations: Interpretation Linguistic Logical Schema declaration assertion Tuple fact instance of assertion Logical predicate values of satisfying predicate Chapter 3-19 Summary of Model: Features CSE 4701 Simple and Mathematically Elegant Simple, Uniform Data Structure Solid Theoretical Foundation Advantage of the Relational Model: Simplicity Separation Between Data and Data Access Easier to Define Data and Data Structure Easier to Write Queries (Specify What Not How) Relational DBMS can do More for You PC-Based Systems have Brought DB to Masses MS Access - Easy to Use Integration with Office Tools (Word, Excel) Chapter 3-20 Relational Integrity Constraints CSE 4701 Integrity Constraints (ICs): Conditions that Must Hold on All Valid Relation Instances at Any Given DB State Why are Integrity Constraints Needed? Multiple Flights, Customers, and Cust/Flight/Date What Happens when we try to Delete a Flight? FLT-SCHEDULE FLT# DepT CUSTOMER Dest ArrT CUST# CUST-NAME RESERVATION FLT# DATE CUST# Chapter 3-21 Relational Integrity Constraints Classification CSE 4701 There are Three Main Types of Constraints: Key Constraints Entity Integrity Constraints Referential Integrity Constraints Other Types of Semantic Constraints: Domain Constraints Transition Constraints Set Constraints DBMSs Handle Some But Not All Constraints Think About Programming Related Constraints Throughout Our Upcoming Discussion Chapter 3-22 Key Constraints CSE 4701 Superkey (SK): Any Subset of Attributes Whose Values are Guaranteed to Distinguish Among Tuples Could be All Attributes of Entire Relation Candidate Key (CK): A Superkey with a Minimal Set of Attributes (No Attribute Can Be Removed Without Destroying the Uniqueness -- Minimal Identity) A Value of an Attribute or a Set of Attributes in a Relation That Uniquely Identifies a Tuple There may be Multiple Candidate Keys Chapter 3-23 Key Constraints CSE 4701 Primary Key (PK): Choose One From Candidate Keys The Primary Key Attributed are Underlined Foreign Key (FK): An Attribute or a Combination of Attributes (Say A) of Relation R1 Which Occurs as the Primary Key of another Relation R2 (Defined on the Same Domain) Allows Linkages Between Relations that are Tracked and Establish Dependencies What are Foreign Keys in: Chapter 3-24 Superkeys and Candidate Keys: Examples CSE 4701 Example: The CAR relation schema: CAR(State, Reg#, SerialNo, Make, Model, Year) Its primary key is {State, Reg#} It has two candidate keys Key1 = {State, Reg#} Key2 = {SerialNo} which are also superkeys {SerialNo, Make} is a Superkey but not a Key Why? If Remove SerialNo, Make is not a Primary Key Chapter 3-25 Another Schema with Key CSE 4701 CAR(License#, EngineSerialNumber, Make, Model, Year) What are Typically Used as Keys for Cars? Chapter 3-26 A Complete Schema with Keys ... CSE 4701 Keys Allow us to Establish Links Between Relations Chapter 3-27 …and Corresponding DB Tables CSE 4701 Which Represent Tuples/Instances of Each Relation A S C null W B null null 1 4 5 5 Chapter 3-28 …with Remaining DB Tables CSE 4701 Chapter 3-29 Another View: What Do Arrows Represent? CSE 4701 Chapter 3-30 Entity Integrity Constraint CSE 4701 Relational Database Schema: A Set S of Relation Schemas (R1, R2, ..., Rn) That Belong to the Same Database S is the Name of the Database S = {R1, R2, ..., Rn} Entity Integrity: For Any Ri in S, Pki is the Primary Key of R Attributes in Pki Cannot Have Null Values in any Tuple of R(ri) T[pki] < > Null for Any Tuple T in R(r) Chapter 3-31 Referential Integrity Constraints CSE 4701 A Constraint Involving Two Relations Used to Specify a Relationship Among Tuples in Referencing Relation and Referenced Relation Definition: R1and R2 have a Referential Integrity Constraint If Tuples in the Referencing Relation R1 have a Set of Foreign Key (FK) Attributes That Reference the Primary Key PK of the Referenced Relation R2 A Tuple T1 in R1( A1, A2 , ..., An) is Said to Reference a Tuple T2 in R2 if $ FK {A1, A2 , ..., An} such that T1[fk] = T2[pk] Chapter 3-32 Examples CSE 4701 WORKS EMP ENO ENAME TITLE ENO PNO E1 E2 E3 E4 E5 E6 E7 E8 J. Doe M. Smith A. Lee J. Miller B. Casey L. Chu R. Davis J. Jones Elect. Eng. Syst. Anal. Mech. Eng. Programmer Syst. Anal. Elect. Eng. Mech. Eng. Syst. Anal. E1 E2 E2 E3 E3 E4 E5 E6 E7 E7 E8 P1 P1 P2 P3 P4 P2 P2 P4 P3 P5 P3 PROJ RESP Manager Analyst Analyst Consultant Engineer Programmer Manager Manager Engineer Engineer Manager DUR 12 24 6 10 48 18 24 48 36 23 40 Can we Add this Tuple to WORKS? PNO PNAME BUDGET P1 P2 P3 P4 P5 Instrumentation Database Develop. CAD/CAM Maintenance CAD/CAM 150000 135000 250000 310000 500000 E9 P3 Engineer 30 Chapter 3-33 Referential Integrity Constraints CSE 4701 A Referential Integrity Constraint Can Be Displayed in a Relational Database Schema as a Directed Arc From R1.FK to R2.PK EMP PROJ ENO ENAME TITLE WORK ENO PNO PNO PNAME BUDGET RESP DUR WORK[ENO] is a subset of EMP[ENO] WORK[PNO] is a subset of PROJ[PNO] Chapter 3-34 Integrity Constraints Summary CSE 4701 Relational Database: Set of Relations Satisfying the Integrity Constraints Integrity Constraints (ICs): Conditions that Must Hold on All Valid Relation Instances Key Constraints - Uniqueness of Keys Entity ICs - No Primary Key Value is Null Referential ICs Between Two Relations, Cross References Must Point to Existing Tuples Domain ICs are Limits on the Value of Particular Attribute Transition ICs Indicate the Way Values Changes Due to Database Update Chapter 3-35 What is Relational Algebra? CSE 4701 Relational Algebra is a Procedural Paradigm You Need to Tell What/How to Construct the Result Consists of a Set of Operators Which, When Applied to Relations, Yield Relations (Closed Algebra) Basic Relational Operations: Unary Operations SELECT s or P. Binary Operations Set operations: UNION INTERSECTION DIFFERENCE – CARTESIAN PRODUCT JOIN operations PROJECT Chapter 3-36 Relational Algebra CSE 4701 RS RS R-S RS union intersection set difference Cartesian product A1, A2, ..., An (R) projection sF (R) selection R S natural join R S theta-join RS division [A1 B1,.., An Bn]rename Chapter 3-37 Selection CSE 4701 Selects the Tuples (Rows) From a Relation, Which Satisfy a Selection Condition Selection Produces a Horizontal Subset of the Operand Relation General Form sF (R) R is a Relation F is a Boolean Expression on the Attributes of R Resulting Relation Has the Same Schema as R Select Finds and Retrieves All Relevant Rows (Tuples) of Table/Relation R which Includes ALL of its Columns (Attributes) Chapter 3-38 Selection Example CSE 4701 EMP ENO ENAME TITLE E1 E2 E3 J. Doe M. Smith A. Lee Elect. Eng. Syst. Anal. Mech. Eng. E4 E5 E6 J. Miller B. Casey L. Chu Programmer Syst. Anal. Elect. Eng. E7 E8 R. Davis J. Jones Mech. Eng. Syst. Anal. s TITLE='Elect. Eng.'(EMP) ENO E1 E6 ENAME J. Doe L. Chu TITLE Elect. Eng Elect. Eng. s TITLE='Elect. Eng.’ OR TITLE=‘Mech.Eng’(EMP) Chapter 3-39 Another Selection Example CSE 4701 A S C null W B null null Chapter 3-40 Selection Condition A SELECT Condition is a Boolean Expression Form F1 Y F2 Y ..., Y Fq (Q>=1), Where Fi (I=1,…,q) are Atomic Boolean Expressions of the Form ac or ab, a, b are Attributes of R and c is a Constant. The Operator Q is one of the Arithmetic Comparison Operators: <, >, =, <>, >=, <= CSE 4701 The Operator Y is one of the Logical Operators: , , ¬ Nesting: ( ) Conditions are Essentially “Conditional Expressions” in PLs Chapter 3-41 Projection CSE 4701 Extract Only Certain Columns (Attributes) Specified in an Attribute List X From a Relation R Produces a New Relation, which is a Vertical Subset of the Operand Relation R The Schema (Columns) of the Resulting Relation is X General Form X(R) R is a Relation X is a Subset of the Attributes of R Over Which the Projection is Performed Project Retrieves Specified Columns of Table/Relation R which Includes ALL of its Rows (Tuples) Chapter 3-42 Projection Example CSE 4701 PROJ PNO PNAME BUDGET P1 Instrumentation 150000 P2 Database Develop. 135000 P3 CAD/CAM 250000 P4 P5 Maintenance CAD/CAM 310000 500000 PNO,BUDGET(PROJ) PNO BUDGET P1 150000 P2 135000 P3 P4 P5 250000 310000 500000 Chapter 3-43 Other Projection Examples CSE 4701 Chapter 3-44 Characteristics of Projection CSE 4701 The PROJECT Operation Eliminates Duplicate Tuples in the Resulting Relation Why? Projection Must Maintain a Mathematical Set (No Duplicate Elements) EMP TITLE(PROJ) ENO ENAME TITLE TITLE E1 E2 E3 J. Doe M. Smith A. Lee Elect. Eng. Syst. Anal. Mech. Eng. E4 E5 E6 E7 E8 J. Miller B. Casey L. Chu R. Davis J. Jones Programmer Syst. Anal. Elect. Eng. Mech. Eng. Syst. Anal. Elect.Eng Syst.Anal Mec.Eng Programmer Chapter 3-45 Relational Algebra Expression CSE 4701 Several Operations can be Combined to form a Relational Algebra Expression (query) Example: Retrieve all Customers over age 60? Method 1: CNAME, ADDRESS, AGE (s AGE>60(CUSTOMER) ) Method 2: Senior-CUST(C#, Addr, Age) = CNAME, ADDRESS, AGE (s AGE>60(CUSTOMER) ) Method 3: CNAME, ADDRESS, AGE (C) where C = s AGE>60(CUSTOMER) Chapter 3-46 Selection with Projection Example CSE 4701 Chapter 3-47 Union CSE 4701 General Form RS where R, S are Relations Result contains Tuples from both R and S Duplications are Removed The two Operands R, S should be union-compatible (type-compatible w.r.t Columns/Attributes) Example: “find students registered for course C1 or C3” s#(sCNO=‘C1’ (S-C)) s#(sCNO=‘C3’ (S-C)) Chapter 3-48 Union Compatibility CSE 4701 Two Relations R1(A1, A2, ..., An) and R2(B1, B2, ..., Bn) are said Union-compatible If and Only If They Have The Same Number of Attributes The Domains of Corresponding Attributes are Compatible, i.e., Dom(Ai)=dom(Bi) for I=1, 2, ..., N Names Do Not Have to be Same! For Relational Union and Difference Operations, the Operand Relations Must Be Union Compatible The Resulting Relation for Relational Set Operations Has the Same Attribute Names as the First Operand Relation R1 (by Convention) Chapter 3-49 Set Difference General Form CSE 4701 R–S where R and S are Relations Result Contains all tuples that are in R, but not in S. R – S <> S – R Again, there Must be Compatibility Example “Find the students who registered course C1 but not C3” s#(sCNO=‘C1’ (S-C)) – s#(sCNO=‘C3’ (S-C)) Chapter 3-50 Set Intersection General Form CSE 4701 RS where R and S are Relations Result Contains all Tuples that are in R and S. R S = R – (R – S) Again, there Must be Compatibility Example “find the students who registered for both C1 and C3” s#(sCNO=‘C1’ (S-C)) s#(sCNO=‘C3’ (S-C)) Chapter 3-51 Union, Difference, Intersection Examples CSE 4701 What are these Other Three Result Tables? Chapter 3-52 Cartesian Product CSE 4701 Given Relations R of Degree k1 and Cardinality card1 S of Degree k2 and Cardinality card2 Cartesian Product RS is a Relation of Degree (k1+ k2) and Consists of Tuples of Degree (k1+ k2) where each Tuple is a Concatenation of one Tuple of R with one Tuple of S Cardinality of the Result of the Cartesian Product R S is card1 * card2 What is One Problem with Cartesian Product w.r.t. the Result Set? Chapter 3-53 Cartesian Product: Example CSE 4701 R A B C a1 a2 a3 b1 b1 b4 c3 c5 c7 R S S A a1 a1 a2 a2 a3 a3 B b1 b1 b1 b1 b4 b4 C c3 c3 c5 c5 c7 c7 E F e1 e2 f1 f5 E e1 e2 e1 e2 e1 e2 F f1 f5 f1 f5 f1 f5 Chapter 3-54 Cartesian Product CSE 4701 Given R(A1, …,An) and S(B1,…,Bm), the result of a Cartesian product R S is a relation of schema R’(A1, …, An, B1, …, Bm). Example “Get a list containing (S#, C#) for all students who live in Storrs but are not registered for the database course” (S#(scity=‘Storrs’(STUDENT)) C# (sCNAME=‘Database’(COURSE))) – S#, C#(S-C) Chapter 3-55 Cartesian Product Example Generates Lots of “Data” that Doesn’t Make Sense CSE 4701 EMP ENO EMP SAL ENAME TITLE E1 E2 J. Doe M. Smith Elect. Eng Syst. Anal. E3 A. Lee Mech. Eng. E4 E5 E6 J. Miller B. Casey L. Chu Programmer Syst. Anal. Elect. Eng. E7 E8 R. Davis J. Jones Mech. Eng. Syst. Anal. SAL TITLE Elect. Eng. Syst. Anal. Mech. Eng. Programmer SAL 40000 34000 27000 24000 ENO ENAME EMP.TITLE SAL.TITLE SAL E1 E1 E1 J. Doe J. Doe J. Doe Elect. Eng. Elect. Eng. Elect. Eng. Elect. Eng. Syst. Anal. Mech. Eng. 40000 34000 27000 E1 J. Doe Elect. Eng. Programmer 24000 E2 E2 E2 M. Smith M. Smith M. Smith Syst. Anal. Syst. Anal. Syst. Anal. Elect. Eng. Syst. Anal. Mech. Eng. 40000 34000 27000 E2 E3 E3 M. Smith A. Lee A. Lee Syst. Anal. Mech. Eng. Mech. Eng. Programmer Elect. Eng. Syst. Anal. 24000 40000 34000 E3 E3 A. Lee A. Lee Mech. Eng. Mech. Eng. Mech. Eng. Programmer 27000 24000 E8 E8 E8 J. Jones J. Jones J. Jones Syst. Anal. Syst. Anal. Syst. Anal. Elect. Eng. Syst. Anal. Mech. Eng. 40000 34000 27000 E8 J. Jones Syst. Anal. Programmer 24000 Chapter 3-56 Homework 2 Spring 15 Problems CSE 4701 Problem 6.18, parts a, b, c, and d – Using Cartesian Product Chapter 3-57 The Library Schema (Figure 6.14) CSE 4701 Chapter 3-58 Parts a and b CSE 4701 a. How many copies of the book titled The Lost Tribe are owned by the library branch whose name is ‘Sharpstown’? (BaB – books at Branches) BaB= (BOOKCOPIES × (sTitle=‘The Lost Tribe’ (BOOK))) ) Ans = No_Of_Copies( (sBranchName=‘Sharpstown’ (LIBRARY-BRANCH)) × BaB) b. How many copies of the book titled The Lost Tribe are owned by each library branch? (CaB- Copies at Branches) CaB = BOOKCOPIES × LIBRARY_BRANCH) Ans = BranchName, No_Of_Copies( (sTitle=‘The Lost Tribe’ (BOOK)) × CaB) Chapter 3-59 Reminder – Discuss Homework 1 CSE 4701 Chapter 3-60 Theta Join (-Join) CSE 4701 General Form R S where R, S are Relations, F is a Boolean Expression, called a Join Condition. A Derivative of Cartesian Product R S = s (R S) R(A1, A2, ..., Am, B1, B2, ..., Bn) is the Resulting Schema of a -Join over R1 and R2: R1(A1, A2, ..., Am) R2 (B1, B2, ..., Bn) Chapter 3-61 -Join Condition CSE 4701 A -Join Condition is a Boolean Expression of the form F1 y1 F2 y2 ..., yn-1 Fq (q>=1), where Fi (i=1,…,q) are Atomic Boolean Expressions of the form Ai Bj, Ai, Bj are Attributes of R1 and R2 Respectively is one of the Algorithmic Comparison Operators =, <>, >, <. >=, <= The Operator yi (i=1,…,n-1) is Either a Logical AND operator or a logical OR operator Again – a Conditional Expression Chapter 3-62 -Join Example CSE 4701 EMP Notice “Absence” of Nonsensical Data from Cart Product. ENO E1 E2 E3 E4 E5 E6 E7 E8 ENAME J. Doe M. Smith A. Lee J. Miller B. Casey L. Chu R. Davis J. Jones TITLE Elect. Eng Syst. Anal. Mech. Eng. Programmer Syst. Anal. Elect. Eng. Mech. Eng. Syst. Anal. SAL TITLE Elect. Eng. Syst. Anal. Mech. Eng. Programmer SAL 40000 34000 27000 24000 EMP E.TITLE=SAL.TITLE SAL TITLE SAL.TITLE SAL J. Doe M. Smith Elect. Eng. Analyst Elect. Eng. Analyst 40000 34000 E3 A. Lee Mech. Eng. Mech. Eng. 27000 E4 J. Miller Programmer E5 B. Casey Syst. Anal. Programmer 24000 Syst. Anal. 34000 E6 L. Chu Elect. Eng. Elect. Eng. 40000 E7 E8 R. Davis J. Jones Mech. Eng. Syst. Anal. Mech. Eng. Syst. Anal. 27000 34000 ENO ENAME E1 E2 Chapter 3-63 Other Types of Join CSE 4701 Equi-join (EQUIJOIN) The Expression only Contains one or more Equality Comparisons Involving Attributes from R1 and R2 User Specifies Equality Conditions Natural Join Denoted as R S Special Equi-join of Two Relations R and S Over a Set of Attributes Common to both R and S By Common, it means that each Join Attribute in A has not only Compatible Domains but also the Same Name in both Relations R and S System Automatically Picks the Equality Conditions Chapter 3-64 Examples CSE 4701 R A B C a1 a2 a3 b1 b1 b4 c3 c5 c7 S EQUIJOIN R R.B=S.B S A R.B S.B a1 b1 b1 a2 b1 b1 B E b1 b5 e1 e2 Natural Join R S C E A R.B C c3 c5 e1 e1 a1 a2 c3 e1 c5 e1 b1 b1 E Chapter 3-65 Homework 2 Spring 15 Problems CSE 4701 Problem 6.18, parts a, b, c, and d – with Join Chapter 3-66 The Library Schema (Figure 6.14) CSE 4701 Chapter 3-67 Parts a and b CSE 4701 a. How many copies of the book titled The Lost Tribe are owned by the library branch whose name is ‘Sharpstown’? (BaB – books at Branches) BaB= (BOOKCOPIES * (sTitle=‘The Lost Tribe’ (BOOK))) ) BookId Ans = No_Of_Copies( (sBranchName=‘Sharpstown’ (LIBRARY-BRANCH)) * BaB) BranchID b. How many copies of the book titled The Lost Tribe are owned by each library branch? (CaB- Copies at Branches) CaB = BOOKCOPIES * LIBRARY_BRANCH) BranchId Ans = BranchName, No_Of_Copies( (sTitle=‘The Lost Tribe’ (BOOK)) * CaB) BookId Chapter 3-68 Parts c and d CSE 4701 c. Retrieve the names of all borrowers who do not have any books checked out. NO_CHECKOUT_B =CardNo(BORROWER) - CardNo(BOOK_LOANS) Ans = Name(BORROWER * NO_CHECKOUT_B) d. For each book that is loaned out from the Sharpstown branch and whose Due_date is today, retrieve the book title, the borrower’s name, and the borrower’s address. S =BranchID(s BranchName=‘Sharpstwon’ (LIBRARY-BRANCH)) B_FROM_S = BookID, CardNo ( (s DueDate=‘Today’ (BOOKLOANS)) * S ) BranchId Ans = Title, Name, Address ( BOOK * BORROWER * B_FROM_S ) BookId CardNo Chapter 3-69 Natural Join CSE 4701 Natural Join Combines Relations on Attributes with the Same Names STUDENT(S#, SN, CITY, Email) S-C(S#, C#, G) Example Query 1: “list of students with complete course grade info” STUDENT S-C All Natural Joins can be Expressed by a Combination of Primitive Operators Example Query 2: “print all students info (courses taken and grades)” S#, SN, CITY, Email, C#, G (sSTUDENT.S# = S-C.S# (STUDENT S-C)) Chapter 3-70 Natural Join Example CSE 4701 EMP ENO ENAME TITLE EMP SAL E1 E2 J. Doe M. Smith Elect. Eng Syst. Anal. E3 A. Lee Mech. Eng. ENO ENAME E.TITLE E4 E5 E6 J. Miller B. Casey L. Chu Programmer Syst. Anal. Elect. Eng. E1 J. Doe Elect. Eng. 70000 E2 M. Smith Syst. Anal. 80000 E7 E8 R. Davis J. Jones Mech. Eng. Syst. Anal. E3 E4 A. Lee J. Miller Mech. Eng. Programmer 56000 60000 SAL TITLE Elect. Eng. Syst. Anal. Mech. Eng. Programmer SAL 70000 80000 56000 60000 SAL E5 B.Casey Syst.Anal 80000 E6 L. Chu Elect.Eng 70000 E7 R.Davis Mech.Eng 56000 E8 J. Jones Syst. Anal. 80000 Chapter 3-71 Another Natural Join Example CSE 4701 Chapter 3-72 Yet Another Natural Join Example CSE 4701 1 4 5 5 Chapter 3-73 Quotient (Division) CSE 4701 Given Relations R(T,U) of degree r S(U) of degree s The Division of R by S, R÷S Results is a Relation of Degree (rs) Consists of all (rs)-tuples t such that for all s-tuples u in S, the tuple tu is in R. Chapter 3-74 Division Example CSE 4701 R ENO PNO E1 E2 E2 E3 E3 E4 E5 E6 E7 E8 P1 P1 P2 P1 P4 P2 P2 P4 P3 P3 PNAME BUDGET Instrumentation Instrumentation Database Develop. Instrumentation Maintenance Instrumentation Instrumentation Maintenance CAD/CAM CAD/CAM 150000 150000 135000 150000 310000 150000 150000 310000 250000 250000 S Find the employees who work for both project P1 and project P4? R÷S PNO PNAME P1 P4 Instrumentation Maintenance BUDGET 150000 310000 ENO E3 Chapter 3-75 Relational Algebra CSE 4701 Selection Projection Union Difference Cartesian Product Intersection Join, Equi-join, Natural Join Derivable from the fundamental operators Fundamental Operators Chapter 3-76 All Relational Algebra Operations CSE 4701 A Set of Relational Algebra Operations Is Called a Complete Set, If and Only If Any Relational Algebra Operator in the Set Cannot be Derived in Terms of a Sequence of Others in Set Any Relational Algebra Operator Not in the Set Can Be Derived in Terms of a Sequence of Only the Operators in the Set Important Concepts: 1. The Set of Algebra Operations {, s, , –, } is a Complete Set of Relational Algebra Operations 2. Any Query Language Equivalent to These Five Operations is Called Relationally Complete Chapter 3-77 Relational Algebra: Summary CSE 4701 Fundamental Operators Selection Projection Union Set Difference Cartesian Product Additional Operators Join Intersection Quotient (Division) Union Compatibility Same Degree Corresponding Attributes Defined Over the Same Domain Form: <Operator><Operand(s)> <Result> Relation (s) Relation Chapter 3-78 Semi-Join CSE 4701 General form: R FS R is Target Relation, S is the Source Relation Semi-join A Form of Join where the Result Contains ONLY those Tuples of the Target Relation, which Participate in the Join with the Source Relation Benefit: Decreases the Number of Tuples that need to be Handled in the Join R F S = P(R FS) = P(R) FP(S) = R FP(S) where A is a set of Attributes of R, B is a set of attributes of S Chapter 3-79 Semi-Join Example CSE 4701 R B A a1 a2 a3 a4 b1 b1 b3 b4 R SAL =R S B b2 b2 b3 b4 b5 b6 b7 b8 b9 b10 C D c1 c1 c1 c2 c2 c2 c3 c3 c4 c4 d9 d8 d7 d9 d8 d7 d5 d4 d4 d9 A a3 a4 (P B S) B b3 b4 Chapter 3-80 Aggregate Functions CSE 4701 Commonly Used Aggregate Functions: SUM, COUNT, AVERAGE, MIN, MAX They Often Applied to sets of Values or sets of Tuples in Database Applications [<grouping attributes> ] f <function list> (R) The Grouping Attributes are Optional Chapter 3-81 Examples of Aggregate Functions CSE 4701 Example 1: Retrieve the average salary of all employees (no grouping): R(AVG_SAL) = s AVG(SALARY) (EMPLOYEE) Example 2: For each department, retrieve the department number, the number of employees, and the average salary (in the department): R(D#,Num_E,AVG_SAL) = GROUPDNO( s DNO, COUNT(ENO), AVG(SALARY)(EMPLOYEE)) DNO is called the grouping attribute in this example Chapter 3-82 Operations on Relations CSE 4701 A DBMS Operates via User Queries to Read and Change Data in a Database Changes Can be Inserting, Deleting, or Updating (Equivalent to a Delete followed by Insert) One Critical Issue in DB Operations is Integrity Constraints Maintenance in the Presence of INSERTING a Tuple DELETING a Tuple UPDATING/MODIFYING a Tuple. We’ll discuss Each case in Turn What is Constraint Maintenance Similar to in PL? Chapter 3-83 Problem Statements CSE 4701 Integrity Constraints (ICs) Should Not Be Violated by Update Operations To Maintain ICs, Updates may Need to be Propagated and Cause Other Updates Automatically Common Method: Group Several Update Operations Together As a Single Transaction If Integrity Violation, Several Actions Can Be Taken: Cancel Operation that Caused Violation (REJECT) Perform the Operation but Inform User of Violation Trigger Additional Updates So the Violation is Corrected (CASCADE Option, SET NULL Option) Execute a User-specified Error-Correction Routine (Similar to What in a PL Like Java?) Chapter 3-84 Insertion Operations on Relations CSE 4701 Insert a Duplicate Key Violates Key Integrity: Check If Duplicates Occur Insert a Null Key Violates Entity Integrity: Check If Null is in Any Key Insert a Tuple Whose Foreign Key Attribute Pointing to an Non-existent Tuple Violates Referential Integrity: Check the Existence of Referred Tuple Correction Actions: Reject the Update Correct the Violation - Change Null, Duplicate, Etc. Cascade the Access - Insert a New Tuple That Did Not Exist/Delete Tuples that are being Referenced Chapter 3-85 Examples CSE 4701 EMP ? ? WORKS ENO ENAME TITLE E1 E2 E3 E4 E5 J. Doe M. Smith A. Lee J. Miller B. Casey Elect. Eng. Syst. Anal. Mech. Eng. Programmer Syst. Anal. E6 L. Chu E3 R. Davis ENO PNO E1 E2 E2 E3 E3 E4 E5 P1 P1 P2 P3 P4 P2 P2 RESP Manager Analyst Analyst Consultant Engineer Programmer Manager DUR 12 24 6 10 48 18 24 Mech. Eng. E1 PROJ PNO PNAME BUDGET P1 P2 P3 P4 P5 Instrumentation Database Develop. CAD/CAM Maintenance CAD/CAM 150000 135000 250000 310000 500000 Engineer E1 P5 Engineer E8 P3 Manager 36 40 ? ? ? Chapter 3-86 Deletion Operations on Relations CSE 4701 Deleting a Tuple Referred to by Other Tuples in Database (via FKs) would Violate Referential Integrity Action: Check for Incoming Pointers of the Deleted Tuple. Group the Deletion and the Post-processing of the Referencing Pointers in a Single Transaction Three Options If Deletion Causes a Violation Reject the Deletion Attempt to Cascade (Propagate) the Deletion by Deleting the Tuples which Reference the Tuple being or to be Deleted Modify the Referencing Attribute Values that Cause the Violation; Each Values is Set to Null or Changed to Reference to Another Valid Tuple Chapter 3-87 Example CSE 4701 EMP WORKS ENO ENAME TITLE E1 E2 E3 E4 E5 E6 J. Doe M. Smith A. Lee J. Miller B. Casey L. Chu Elect. Eng. Syst. Anal. Mech. Eng. Programmer Syst. Anal. Elect. Eng. ENO PNO E1 E2 E2 E3 E3 E4 E5 E6 P1 P1 P2 P3 P5 P2 P2 P4 RESP Manager Analyst Analyst Consultant Engineer Programmer Manager Manager DUR 12 24 6 10 48 18 24 48 1. Cascading Deleting this tuple? PROJ PNO PNAME BUDGET P1 P2 P3 P4 P5 Instrumentation Database Develop. CAD/CAM Maintenance CAD/CAM 150000 135000 250000 310000 500000 E5 2. reference revision? Chapter 3-88 Modify Operations on Relations CSE 4701 Modify Operation Changes Values of One or More Attributes in a Tuple (or Tuples) of a Given Relation R Maintaining ICs Requires to Check If the Modifying Attributes Are Primary Key or Foreign Keys. Integrity Check Actions: Case 1: If the Attributes to be Modified are Neither a Primary Key nor a Foreign Key, Modify Causes No Problems Must Check and Confirm that the New Value is of Correct Data Type and Domain Case 2: Modifying a Primary Key Value Similar to Deleting One Tuple and Insert Another in its Place Chapter 3-89 Constraints and Update Operations CSE 4701 Three Types of Update Operations: INSERT, DELETE, MODIFY Constraint Maintenance During Updates The Types of Constraints That Most DBMSs Maintain are Key Constraints Entity Constraints Referential Integrity Constraints Other Semantic Constraints Need to Be Maintained by Application Developers/programmers Transition Constraints Domain Constraints Etc. Some DB Do Maintain Domain Constraints via Enumeration and Value-Range Data Types Chapter 3-90 Concluding Remarks CSE 4701 What have we Seen in Chapters 3 & 6? Basic Concepts of Relational Model Including Relation/Table, Tuple/Row, Attribute/Column, Domain/Attribute Value Concept of SK, CK, PK, and FK for Identification and Referential Integrity Integrity Constraints as they Relate to Referential Dependencies Check for Modification Operations Overall, Relational Theory is Basis for SQL, Normal Forms, ER-Relational Translation, etc. Chapter 3-91