Unit 4: Relational Algebra

Relational Algebra
• The relational algebra is a procedural query language.
• It consists of a set of operations that take one or two relations as input and produce a new relation as their result.
• The fundamental operations in the relational algebra are select, project, union, set difference, Cartesian product, and rename.
• Additional operations, such as set intersection, natural join, and assignment, can be defined in terms of the fundamental operations.

Fundamental Operations
[Figure: the instructor relation]
• The select, project, and rename operations are called unary operations, because they operate on one relation.
• The other three operations (union, set difference, and Cartesian product) operate on pairs of relations and are, therefore, called binary operations.

The Select Operation
• The select operation selects tuples that satisfy a given predicate. We use the lowercase Greek letter sigma (σ) to denote selection.
• The predicate appears as a subscript to σ. The argument relation is in parentheses after the σ.
    σ_{salary > 90000}(instructor)
    σ_{dept_name = “Physics”}(instructor)
• The selection predicate allows comparisons using =, ≠, <, ≤, >, and ≥.
• Several predicates can be combined into a larger predicate by using the connectives and (∧), or (∨), and not (¬).
    σ_{dept_name = “Physics” ∧ salary > 90000}(instructor)

The Project Operation
• Suppose we want to list all instructors’ ID, name, and salary, but do not care about the dept_name. The project operation allows us to produce this relation.
• The project operation is a unary operation that returns its argument relation, with certain attributes left out.
• Projection is denoted by the uppercase Greek letter pi (Π).
    Π_{ID, name, salary}(instructor)
• Find the names of all instructors in the Physics department.
    Π_{name}(σ_{dept_name = “Physics”}(instructor))

The Union Operation
• Consider a query to find the set of all courses taught in the Fall 2009 semester, the Spring 2010 semester, or both.
• To find the set of all courses taught in the Fall 2009 semester:
    Π_{course_id}(σ_{semester = “Fall” ∧ year = 2009}(section))
• To find the set of all courses taught in the Spring 2010 semester:
    Π_{course_id}(σ_{semester = “Spring” ∧ year = 2010}(section))
• To answer the query, we need the union of these two sets; that is, we need all course IDs that appear in either or both of the two relations.
    Π_{course_id}(σ_{semester = “Fall” ∧ year = 2009}(section)) ∪ Π_{course_id}(σ_{semester = “Spring” ∧ year = 2010}(section))

The Set-Difference Operation
• The set-difference operation, denoted by −, allows us to find tuples that are in one relation but are not in another.
• The expression r − s produces a relation containing those tuples in r but not in s.
• Courses offered in the Fall 2009 semester but not in the Spring 2010 semester:
    Π_{course_id}(σ_{semester = “Fall” ∧ year = 2009}(section)) − Π_{course_id}(σ_{semester = “Spring” ∧ year = 2010}(section))

The Cartesian-Product Operation
• The Cartesian-product operation, denoted by a cross (×), allows us to combine information from any two relations.
• We write the Cartesian product of relations r1 and r2 as r1 × r2.
• Because the same attribute name may appear in both r1 and r2, we need a naming scheme to distinguish between these attributes.
• We do so by attaching to an attribute the name of the relation from which the attribute originally came. For instructor × teaches, the schema is:
    (instructor.ID, instructor.name, instructor.dept_name, instructor.salary, teaches.ID, teaches.course_id, teaches.sec_id, teaches.semester, teaches.year)
• For those attributes that appear in only one of the two schemas, we shall usually drop the relation-name prefix:
    (instructor.ID, name, dept_name, salary, teaches.ID, course_id, sec_id, semester, year)
[Figure: the teaches relation]
• Assume that we have n1 tuples in instructor and n2 tuples in teaches. Then there are n1 ∗ n2 ways of choosing a pair of tuples, one from each relation, so instructor × teaches contains n1 ∗ n2 tuples.
• In general, if we have relations r1(R1) and r2(R2), then r1 × r2 is a relation whose schema is the concatenation of R1 and R2.
• The relation r1 × r2 contains all tuples t for which there is a tuple t1 in r1 and a tuple t2 in r2 for which t[R1] = t1[R1] and t[R2] = t2[R2].
• To restrict the product to instructors in the Physics department:
    σ_{dept_name = “Physics”}(instructor × teaches)
[Figure: result of instructor × teaches]
• The Cartesian product pairs every instructor tuple with every teaches tuple, so we also select on instructor.ID = teaches.ID to keep only the courses that each instructor actually taught. To find the names of all Physics instructors together with the course_id of the courses they taught:
    Π_{name, course_id}(σ_{instructor.ID = teaches.ID}(σ_{dept_name = “Physics”}(instructor × teaches)))
• Another way of writing the same query applies the Physics selection before the product:
    Π_{name, course_id}(σ_{instructor.ID = teaches.ID}((σ_{dept_name = “Physics”}(instructor)) × teaches))

The Rename Operation
• Unlike relations in the database, the results of relational-algebra expressions do not have a name that we can use to refer to them.
• It is useful to be able to give them names; the rename operator, denoted by the lowercase Greek letter rho (ρ), lets us do this.
    ρ_x(E)
• This expression returns the result of expression E under the name x.
• A second form of the rename operation is as follows. Assume that a relational-algebra expression E has arity n. Then the expression
    ρ_{x(A1, A2, ..., An)}(E)
  returns the result of expression E under the name x, and with the attributes renamed to A1, A2, ..., An.
• Example: find the largest salary in the university.
• Step 1: Compute a temporary relation that consists of the salaries that are not the largest. To do so, we compare every salary with every other salary, which requires the Cartesian product of instructor with itself; renaming one copy to d lets us tell the two copies apart.
    Π_{instructor.salary}(σ_{instructor.salary < d.salary}(instructor × ρ_d(instructor)))
• The result contains all salaries except the largest one.
• Step 2: Subtract the temporary relation from the set of all salaries to find the largest salary in the university.
    Π_{salary}(instructor) − Π_{instructor.salary}(σ_{instructor.salary < d.salary}(instructor × ρ_d(instructor)))

Formal Definition of the Relational Algebra
• A basic expression in the relational algebra consists of one of the following:
    • A relation in the database
    • A constant relation
• A constant relation is written by listing its tuples within { }, for example { (22222, Einstein, Physics, 95000), (76543, Singh, Finance, 80000) }.
• Let E1 and E2 be relational-algebra expressions.
Then, the following are all relational-algebra expressions:
    E1 ∪ E2
    E1 − E2
    E1 × E2
    σ_P(E1), where P is a predicate on attributes in E1
    Π_S(E1), where S is a list consisting of some of the attributes in E1
    ρ_x(E1), where x is the new name for the result of E1

Additional Relational-Algebra Operations

The Set-Intersection Operation
• Suppose that we wish to find the set of all courses taught in both the Fall 2009 and the Spring 2010 semesters.
    Π_{course_id}(σ_{semester = “Fall” ∧ year = 2009}(section)) ∩ Π_{course_id}(σ_{semester = “Spring” ∧ year = 2010}(section))
• Note that we can rewrite any relational-algebra expression that uses set intersection by replacing the intersection operation with a pair of set-difference operations:
    r ∩ s = r − (r − s)

The Natural-Join Operation
• The natural join is a binary operation that allows us to combine certain selections and a Cartesian product into one operation. It is denoted by the join symbol ⋈.
• The natural join forms the Cartesian product of its two arguments, selects the tuples that agree on the attributes appearing in both relation schemas, and finally removes the duplicate attributes.
[Figures: the instructor relation; the teaches relation; the natural join of the instructor relation with the teaches relation]
• Find the names of all instructors together with the course_id of all courses they taught.
    Π_{name, course_id}(instructor ⋈ teaches)

The Assignment Operation
• The assignment operation, denoted by ←, works like assignment in a programming language.
• The result of the expression to the right of the ← is assigned to the relation variable on the left of the ←.
• The relation variable may then be used in subsequent expressions.

The Division Operation
• The division operation, written r ÷ s, is suited to queries that include the phrase “for all”.
• Let r and s be relations on schemas R and S respectively, where
    R = (A1, …, Am, B1, …, Bn)
    S = (B1, …, Bn)
• The result of r ÷ s is a relation on schema R − S = (A1, …, Am):
    r ÷ s = { t | t ∈ Π_{R−S}(r) ∧ ∀ u ∈ s (tu ∈ r) }
  where tu denotes the concatenation of tuples t and u.
• Find all customers who have an account at all branches located in Brooklyn city.
    Π_{customer-name, branch-name}(depositor ⋈ account) ÷ Π_{branch-name}(σ_{branch-city = “Brooklyn”}(branch))

Extended Relational-Algebra Operations

Generalized Projection
• Generalized projection extends the projection operation by allowing operations such as arithmetic and string functions to be used in the projection list.
• The generalized-projection operation has the form
    Π_{F1, F2, ..., Fn}(E)
  where E is any relational-algebra expression, and each of F1, F2, ..., Fn is an arithmetic expression involving constants and attributes in the schema of E.
    Π_{ID, name, dept_name, salary / 12}(instructor)
• This expression gives the ID, name, dept_name, and the monthly salary of each instructor.

Outer Join
• An extension of the join operation that avoids loss of information.
• Computes the join and then adds to the result the tuples from one relation that do not match any tuple in the other relation.
• Uses null values to pad the added tuples:
    • null signifies that the value is unknown or does not exist.
    • All comparisons involving null are, roughly speaking, false by definition; more precisely, they evaluate to the truth value unknown, which a selection treats as false.

Relation loan
    loan-number   branch-name   amount
    L-170         Downtown      3000
    L-230         Redwood       4000
    L-260         Perryridge    1700

Relation borrower
    customer-name   loan-number
    Jones           L-170
    Smith           L-230
    Hayes           L-155

Inner Join
• loan ⋈ borrower
    loan-number   branch-name   amount   customer-name
    L-170         Downtown      3000     Jones
    L-230         Redwood       4000     Smith

Left Outer Join
• loan ⟕ borrower
    loan-number   branch-name   amount   customer-name
    L-170         Downtown      3000     Jones
    L-230         Redwood       4000     Smith
    L-260         Perryridge    1700     null

Right Outer Join
• loan ⟖ borrower
    loan-number   branch-name   amount   customer-name
    L-170         Downtown      3000     Jones
    L-230         Redwood       4000     Smith
    L-155         null          null     Hayes

Full Outer Join
• loan ⟗ borrower
    loan-number   branch-name   amount   customer-name
    L-170         Downtown      3000     Jones
    L-230         Redwood       4000     Smith
    L-260         Perryridge    1700     null
    L-155         null          null     Hayes

Null Values
• It is possible for tuples to have a null value, denoted by null, for some of their attributes.
• null signifies an unknown value or that a value does not exist.
• The result of any arithmetic expression involving null is null.
• Aggregate functions simply ignore null values.
• For duplicate elimination and grouping, null is treated like any other value, and two nulls are assumed to be the same.
• Comparisons with null values return the special truth value unknown.
• If false were used instead of unknown, then not (A < 5) would not be equivalent to A >= 5.
• Three-valued logic using the truth value unknown:
    • OR: (unknown or true) = true, (unknown or false) = unknown, (unknown or unknown) = unknown
    • AND: (true and unknown) = unknown, (false and unknown) = false, (unknown and unknown) = unknown
    • NOT: (not unknown) = unknown
• In SQL, “P is unknown” evaluates to true if predicate P evaluates to unknown.
• The result of a select predicate is treated as false if it evaluates to unknown.

Modification of the Database
• The content of the database may be modified using the following operations:
    • Deletion
    • Insertion
    • Updating
• All these operations are expressed using the assignment operator.

Deletion
• A delete request is expressed similarly to a query, except that instead of displaying tuples to the user, the selected tuples are removed from the database.
• We can delete only whole tuples; we cannot delete values of only particular attributes.
• A deletion is expressed in relational algebra by
    r ← r − E
  where r is a relation and E is a relational-algebra query.
• Delete all account records in the Perryridge branch.
    account ← account − σ_{branch-name = “Perryridge”}(account)
• Delete all loan records with amount in the range of 0 to 50.
    loan ← loan − σ_{amount ≥ 0 ∧ amount ≤ 50}(loan)

Insertion
• To insert data into a relation, we either
    • specify a tuple to be inserted, or
    • write a query whose result is a set of tuples to be inserted.
• In relational algebra, an insertion is expressed by
    r ← r ∪ E
  where r is a relation and E is a relational-algebra expression.
• The insertion of a single tuple is expressed by letting E be a constant relation containing one tuple.
• Insert information in the database specifying that Smith has $1200 in account A-973 at the Perryridge branch.
    account ← account ∪ {(“Perryridge”, A-973, 1200)}
    depositor ← depositor ∪ {(“Smith”, A-973)}

Updating
• A mechanism to change a value in a tuple without changing all values in the tuple.
• Use the generalized projection operator to do this task:
    r ← Π_{F1, F2, …, Fl}(r)
• Each Fi is either
    • the ith attribute of r, if the ith attribute is not updated, or
    • if the attribute is to be updated, an expression, involving only constants and the attributes of r, which gives the new value for the attribute.
• Make interest payments by increasing all balances by 5 percent.
    account ← Π_{AN, BN, BAL * 1.05}(account)
  where AN, BN and BAL stand for account-number, branch-name and balance, respectively.

Aggregation
• Aggregate functions take a collection of values and return a single value as a result.
• For example, the aggregate function sum takes a collection of values and returns the sum of the values.
• Applied to the collection {1, 1, 3, 4, 4, 11}, sum returns the value 24. For the same collection, avg = 4, count = 6, min = 1, and max = 11.
• The collections on which aggregate functions operate can have multiple occurrences of a value; the order in which the values appear is not relevant.
• Such collections are called multisets. Sets are a special case of multisets where there is only one copy of each element.
• The aggregate operation is written with the symbol 𝒢, for example 𝒢_{sum(salary)}(instructor) for the sum of all salaries. The symbol 𝒢 is the letter G in calligraphic font; read it as “calligraphic G.”
• If we do want to eliminate duplicates, we use the same function names as before, with the hyphenated string “distinct” appended to the end of the function name (for example, count-distinct).
• There are circumstances where we would like to apply the aggregate function not to a single set of tuples, but instead to a group of sets of tuples.
• Find the average salary in each department.
    _{dept_name}𝒢_{average(salary)}(instructor)
• Find the average salary of all instructors.
    𝒢_{average(salary)}(instructor)
• The general form of the aggregation operation 𝒢 is as follows:
    _{G1, G2, ..., Gn}𝒢_{F1(A1), F2(A2), ..., Fm(Am)}(E)
• E is any relational-algebra expression; G1, G2, ..., Gn constitute a list of attributes on which to group; each Fi is an aggregate function; and each Ai is an attribute name.

The Tuple Relational Calculus
• When we write a relational-algebra expression, we provide a sequence of procedures that generates the answer to our query.
• The tuple relational calculus, by contrast, is a nonprocedural query language.
• It describes the desired information without giving a specific procedure for obtaining that information.
• A query in the tuple relational calculus is expressed as
    {t | P(t)}
• It is the set of all tuples t such that predicate P is true for t.
• Find the ID, name, dept_name, salary for instructors whose salary is greater than $80,000.
    {t | t ∈ instructor ∧ t[salary] > 80000}
• There exists: the construct ∃ t ∈ r (Q(t)) means “There exists a tuple t in relation r such that predicate Q(t) is true.”
• “Find the instructor ID for each instructor with a salary greater than $80,000.”
    {t | ∃ s ∈ instructor (t[ID] = s[ID] ∧ s[salary] > 80000)}
• We read this as “The set of all tuples t such that there exists a tuple s in relation instructor for which the values of t and s for the ID attribute are equal, and the value of s for the salary attribute is greater than $80,000.”
• “Find the names of all instructors whose department is in the Watson building.”
    {t | ∃ s ∈ instructor (t[name] = s[name] ∧ ∃ u ∈ department (u[dept_name] = s[dept_name] ∧ u[building] = “Watson”))}
• Tuple variable u is restricted to departments that are located in the Watson building, while tuple variable s is restricted to instructors whose dept_name matches that of tuple variable u.
• To find the set of all courses taught in the Fall 2009 semester, the Spring 2010 semester, or both:
    {t | ∃ s ∈ section (t[course_id] = s[course_id] ∧ s[semester] = “Fall” ∧ s[year] = 2009) ∨ ∃ u ∈ section (t[course_id] = u[course_id] ∧ u[semester] = “Spring” ∧ u[year] = 2010)}
• Courses that are offered in both the Fall 2009 and Spring 2010 semesters:
    {t | ∃ s ∈ section (t[course_id] = s[course_id] ∧ s[semester] = “Fall” ∧ s[year] = 2009) ∧ ∃ u ∈ section (t[course_id] = u[course_id] ∧ u[semester] = “Spring” ∧ u[year] = 2010)}
• Find all the courses taught in the Fall 2009 semester but not in the Spring 2010 semester:
    {t | ∃ s ∈ section (t[course_id] = s[course_id] ∧ s[semester] = “Fall” ∧ s[year] = 2009) ∧ ¬∃ u ∈ section (t[course_id] = u[course_id] ∧ u[semester] = “Spring” ∧ u[year] = 2010)}
• The query that we shall consider next uses implication, denoted by ⇒. The formula P ⇒ Q means “P implies Q”; that is, “if P is true, then Q must be true.” Note that P ⇒ Q is logically equivalent to ¬P ∨ Q.
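The equivalence between P ⇒ Q and ¬P ∨ Q can be checked by enumerating the four truth assignments. The following is a minimal Python sketch; the helper name `implies` and the variable names are illustrative assumptions, not part of the calculus.

```python
from itertools import product


def implies(p: bool, q: bool) -> bool:
    """P ⇒ Q is false only when P is true and Q is false."""
    return not (p and not q)


# Compare P ⇒ Q with ¬P ∨ Q on all four truth assignments.
truth_table = [
    (p, q, implies(p, q), (not p) or q)
    for p, q in product([True, False], repeat=2)
]

# True when the two formulas agree on every row of the table.
equivalent = all(lhs == rhs for _, _, lhs, rhs in truth_table)
```

Note that `implies(False, q)` is true for either value of q: an implication with a false antecedent holds vacuously, which is what lets a “for all” query ignore the tuples that do not satisfy the condition.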
• Find all students who have taken all courses offered in the Biology department.
• For all: the construct ∀ t ∈ r (Q(t)) means “Q is true for all tuples t in relation r.”
    {t | ∃ r ∈ student (r[ID] = t[ID]) ∧ (∀ u ∈ course (u[dept_name] = “Biology” ⇒ ∃ s ∈ takes (t[ID] = s[ID] ∧ s[course_id] = u[course_id])))}
• We read this as “The set of all students (that is, (ID) tuples t) such that, for all tuples u in the course relation, if the value of u on attribute dept_name is ’Biology’, then there exists a tuple in the takes relation that includes the student ID and the course_id.”

Safety of Expressions
• There is one final issue to be addressed. A tuple-relational-calculus expression may generate an infinite relation.
    {t | ¬(t ∈ instructor)}
• To help us define a restriction of the tuple relational calculus, we introduce the concept of the domain of a tuple relational formula, P.
• The domain of P, denoted dom(P), is the set of all values referenced by P.
• They include values mentioned in P itself, as well as values that appear in a tuple of a relation mentioned in P.
    dom(t ∈ instructor ∧ t[salary] > 80000)
• It is the set containing 80000 as well as the set of all values appearing in any attribute of any tuple in the instructor relation.
• The expression {t | P(t)} is safe if all values that appear in the result are values from dom(P).
• The expression {t | ¬(t ∈ instructor)} is not safe.
• Note that dom(¬(t ∈ instructor)) is the set of all values appearing in instructor. However, it is possible to have a tuple t not in instructor that contains values that do not appear in instructor.

The Domain Relational Calculus
• The domain relational calculus uses domain variables that take on values from an attribute's domain, rather than values for an entire tuple.
• Find the instructor ID, name, dept_name, and salary for instructors whose salary is greater than $80,000.
    {< i, n, d, s > | < i, n, d, s > ∈ instructor ∧ s > 80000}
• Find all instructor IDs for instructors whose salary is greater than $80,000.
    {< i > | ∃ n, d, s (< i, n, d, s > ∈ instructor ∧ s > 80000)}
• In the tuple calculus, when we write ∃ s for some tuple variable s, we bind it immediately to a relation by writing ∃ s ∈ r.
• However, when we write ∃ n in the domain calculus, n refers not to a tuple, but rather to a domain value.
Thus, the domain of variable n is unconstrained until the subformula < i, n, d, s > ∈ instructor constrains n to instructor names that appear in the instructor relation.
• Find the names of all instructors in the Physics department together with the course_id of all courses they teach.
    {< n, c > | ∃ i, a, se, y (< i, c, a, se, y > ∈ teaches ∧ ∃ d, s (< i, n, d, s > ∈ instructor ∧ d = “Physics”))}
• Find the set of all courses taught in the Fall 2009 semester, the Spring 2010 semester, or both.
    {< c > | ∃ a, s, y, b, r, t (< c, a, s, y, b, r, t > ∈ section ∧ s = “Fall” ∧ y = 2009) ∨ ∃ a, s, y, b, r, t (< c, a, s, y, b, r, t > ∈ section ∧ s = “Spring” ∧ y = 2010)}
• Find all students who have taken all courses offered in the Biology department.
    {< i > | ∃ n, d, t (< i, n, d, t > ∈ student) ∧ ∀ x, y, z, w ((< x, y, z, w > ∈ course ∧ z = “Biology”) ⇒ ∃ a, b, r, p (< i, x, a, b, r, p > ∈ takes))}
• An expression { < x1, x2, …, xn > | P(x1, x2, …, xn) } in the domain relational calculus is safe if all of the following hold:
    1. All values that appear in tuples of the expression are values from dom(P) (that is, the values appear either in P or in a tuple of a relation mentioned in P).
    2. For every “there exists” subformula of the form ∃ x (P1(x)), the subformula is true if and only if there is a value of x in dom(P1) such that P1(x) is true.
    3. For every “for all” subformula of the form ∀ x (P1(x)), the subformula is true if and only if P1(x) is true for all values x from dom(P1).
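The safety conditions above are what make a domain-calculus query computable: every variable only needs to be tested against the finite set dom(P). The following Python sketch evaluates the query “find all instructor IDs for instructors whose salary is greater than $80,000” in that way. It is an illustration under stated assumptions, not an evaluation strategy from the source: the sample tuples and the function names `dom` and `ids_with_salary_above` are hypothetical, and a real database system would not enumerate dom(P) by brute force.

```python
from itertools import product

# Hypothetical sample data following the (ID, name, dept_name, salary) schema.
instructor = {
    ("22222", "Einstein", "Physics", 95000),
    ("76543", "Singh", "Finance", 80000),
    ("10101", "Srinivasan", "Comp. Sci.", 65000),
}


def dom(relations, constants):
    """Return dom(P): the constants in P plus every value in the relations P mentions."""
    values = set(constants)
    for relation in relations:
        for tup in relation:
            values.update(tup)
    return values


def ids_with_salary_above(relation, threshold):
    """Evaluate {<i> | ∃ n, d, s (<i, n, d, s> ∈ relation ∧ s > threshold)}.

    Every variable ranges over dom(P) only, so all loops are finite.
    """
    d_p = dom([relation], [threshold])
    result = set()
    for i in d_p:
        # Existential check restricted to dom(P), as safety condition 2 permits.
        # Membership is tested first, so s is a number whenever it is compared.
        if any(
            (i, n, d, s) in relation and s > threshold
            for n, d, s in product(d_p, repeat=3)
        ):
            result.add((i,))
    return result


high_earner_ids = ids_with_salary_above(instructor, 80000)
```

With the sample data, only Einstein's salary exceeds 80000, so `high_earner_ids` contains the single tuple `("22222",)`. An unsafe expression such as {t | ¬(t ∈ instructor)} cannot be evaluated this way, because its result contains values that lie outside dom(P).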