Relational Algebra Manipulating Databases To access information in a database we use a query Ex: How many customers have the first name = `John’? Good query writing follows a formal model called relational algebra Relational Algebra Relational algebra: a collection of mathematical operations that manipulate tables Familiarity with relational algebra helps understanding the logic behind complex queries and ease the way for writing them Tables and Queries Recall: a table is a set of rows/records having the same number and types of attributes When you send a query to the database, it Finds the appropriate rows of information in the stored tables Performs the requested operations on the data Represents the results in a new temporary table Delivers the table of results to the user Example Ex: How many customers have the first name = `John’? The database creates a table containing all customers whose first name is `John’ and returns the table to the user Basic types of queries There are 4 basic types of queries A projection operation produces a result table with A selection operation produces a result table with All of the columns of the input table Only those rows of its input table that satisfy some criteria. A join or product operation produces a result table by Only some of the columns of its input table. Combining the columns of two input tables. A set operation produces a result table by Combining rows from one or the other of its input tables Projection operation: A projection query selects some of the columns of the input table project T onto (attribute1, attribute2, …) Relational algebra form: attribute1, attribute2,...(T) Example firstName,lastName(Customer) accountId lastName firstName street city state zipcode balance 101 Block Jane 345 Randolph Circle Apopka FL 30458- $0.00 102 Hamilton Cherry 3230 Dade St. Dade City FL 30555- $3.00 103 Harrison Katherine 103 Landis Hall Bratt FL 30457- $31.00 104 Breaux Carroll 76 Main St. Apopka FL 30458- $35.00 106 Morehouse Anita 9501 Lafayette St. Houma LA 44099- $0.00 111 Doe Jane 123 Main St. Apopka FL 30458- $0.00 201 Greaves Joseph 14325 N. Bankside St. Godfrey IL 43580- $0.00 444 Doe Jane Cawthon Dorm, room 642 Tallahassee FL 32306- $10.55 Example … Notice that the result table has fewer rows Duplicate rows have been removed because the attributes do not contain a key firstName lastName Anita Morehouse Jane Block Carroll Breaux Cherry Hamilton Catherine Harrison Jane Doe Joseph Greaves Storing the temporary results We can store the result of a query in a table T’ as follows: T’ attribute1, attribute2,...(T) This will create a table T’ with attributes: attribute1, attribute2, … containing the result of the query Selection queries: A selection query selects rows that match a selection criteria from a table Relational algebra form <condition>(T) Each row is checked to see if it satisfies the condition and selected accordingly Example lastName=‘Doe’(Customer) accountId firstName lastName street city state zipcode balance 111 Jane Doe 123 Main St. Apopka FL 34331 0.00 444 Jane Doe Cawthon Dorm, Tallahassee FL room 642 32306 10.55 Complex selection criteria The selection criterion can be any boolean expression containing operators like: and, or, =, , <, >, , , etc … Example T ssn=’376-77-0099’ and date < ’01-mar-2002’(TimeCard) ssn date startTime endTime storeId paid 145-09-0967 01/14/2002 8:15 12:00 3 yes 245-11-4554 01/14/2002 8:15 12:00 3 yes 376-77-0099 02/23/2002 14:00 22:00 5 yes 376-77-0099 03/21/2002 14:00 22:00 5 yes 145-09-0967 01/16/2002 8:15 12:00 3 yes 376-77-0099 01/03/2002 10:00 14:00 5 yes 376-77-0099 01/03/2002 15:00 19:00 5 yes Product queries: X The product query takes two tables and produce a table which is the cross product of the two, i.e., combines every row of one table with every row of other table R(A1, A2 , … , An) × S(B1, B2 , … , Bm)= Q(A1, A2 , … , An, B1, B2 , … , Bm) Relational algebra form: RS Example Employee TimeCard Employee.ssn lastName firstName TimeCard.ssn date startTime endTime storeId paid 145-09-0967 Uno Jane 145-09-0967 1/14/2002 8:15:00 AM 12:00:00 PM 3 Yes 245-11-4554 Toulouse Jie 145-09-0967 1/14/2002 8:15:00 AM 12:00:00 PM 3 Yes 376-77-0099 Threat Ayisha 145-09-0967 1/14/2002 8:15:00 AM 12:00:00 PM 3 Yes 479-98-0098 Fortune Julian 145-09-0967 1/14/2002 8:15:00 AM 12:00:00 PM 3 Yes 579-98-8778 Fivozinsky Bruce 145-09-0967 1/14/2002 8:15:00 AM 12:00:00 PM 3 Yes 145-09-0967 Uno Jane 145-09-0967 1/16/2002 8:15:00 AM 12:00:00 PM 3 Yes 245-11-4554 Toulouse Jie 145-09-0967 1/16/2002 8:15:00 AM 12:00:00 PM 3 Yes 376-77-0099 Threat Ayisha 145-09-0967 1/16/2002 8:15:00 AM 12:00:00 PM 3 Yes 479-98-0098 Fortune Julian 145-09-0967 1/16/2002 8:15:00 AM 12:00:00 PM 3 Yes 579-98-8778 Fivozinsky Bruce 145-09-0967 1/16/2002 8:15:00 AM 12:00:00 PM 3 Yes 145-09-0967 Uno Jane 245-11-4554 1/14/2002 8:15:00 AM 12:00:00 PM 3 Yes 245-11-4554 Toulouse Jie 245-11-4554 1/14/2002 8:15:00 AM 12:00:00 PM 3 Yes Product queries … If two attributes in two tables T and R have the same name, we prefix them with the relation name: T.<attribute> Ex: Employee.ssn, TimeCard.ssn Remark. Many of the resulting rows in the previous example don’t make sense Join queries: ⋈ In the previous table we are only interested in the rows that match: rows with Employee.ssn = TimeCard.ssn We are interested in the query: Employee.ssn=TimeCrad.ssn(Employee x TimeCard) Join operations … A join query is a cross product with a restriction on the result rows Typical join condition is equality of attributes The join condition determines which rows match Only matching rows are in the result table It is called equi-join Relational algebra form: R ⋈<condition>S Example Some rows from the table Employee ⋈Employee.ssn=TimeCard.ssn TimeCard Employee. ssn last Name first Name TimeCard. ssn date start Time storeId paid end Time 145-09-0967 Uno Jane 145-09-0967 01/14/2002 8:15 3 no 12:00 145-09-0967 Uno Jane 145-09-0967 01/16/2002 8:15 3 no 12:00 245-11-4554 Toulouse Jie 245-11-4554 01/14/2002 8:15 3 no 12:00 376-77-0099 Threat Ayisha 376-77-0099 02/23/2002 14:00 5 no 22:00 Natural join: * Frequently, when doing an equi-join, the attributes have the same name A natural join is an equi-join with an equality condition on the common attributes: Employee ⋈ssn TimeCard Employee * TimeCard In natural join the common attributes appear once Queries with multiple joins Consider the Video-Rental schema, and suppose we want to retrieve for every currently-rented video, the renter’s account number, video number, rental date, due date, title of the movie, and cost Solution accountId, videoId, dateRented, dateDue, title, cost ((Rental ⋈videoId Video) ⋈movieId Movie) Combining operations Suppose we want to find the following info. For customer with account = 113, find all the videos that he is renting: For each video, find the video number, the title of the movie, and the due date Solution videoId, title, dateDue (( accountId=113(Rental) ⋈videoId Video) ⋈movieId Movie) Or: T1= accountId=113(Rental) T2=T1 ⋈videoId Video T3 = T2 ⋈movieId Movie T4 = videoId, title, dateDue ( T3) More examples List all comedy movies that were rented on December 21, 2001. For every movie list the customer’s name, movie title, and date returned Solution T1= daterented=‘December 21 2001’(PreviousRental) T2=T1 ⋈videoId Video T3= genre=‘comedy’ (Movie) T4= T2 ⋈movieId T3 T5= T4 ⋈accountId Customer T6 = firstName, lastName, title, dateReturned ( T5) Set operations Set operations include: Union, intersection, and difference Relational algebra form: , , Set operations can be applied to any tables with the same shape (compatible) The same order and type of attributes Attribute names do not have to agree Set operations If R and S are two compatible tables: R S is the table that contains the set of rows that are either in R or in S R S is the table that contains the set of rows that are both in R and S R - S is the table that contains the set of rows that are in R but not in S Example of Retrieve all the videos that are currently or were previously rented EverRented= Rental PreviousRental Example of Retrieve the video id of all the videos that are currently rented and have been rented at least once before Veterans= videoId, ( Rental) videoId, ( PreviousRental) Example of Retrieve the video id of all the videos that are currently rented and have never been rented before FirstTime= videoId, ( Rental) - videoId, ( PreviousRental) Aggregate functions Not all queries can be expressed using the basic operations described previously. What if we want to compute the average salary of all employees? Aggregate functions What if we want to count the number of employees in each department? For such queries, we use aggregate functions. Relational algebra form <grouping attributes><function list>(T) Aggregate functions … The function list includes: average, sum, count, maximum, minimum The result of the query will be a table containing the results The attributes consist of the grouping attributes + function parameters Examples Ex1: compute the average salary of all the employees Average(salary)(Employee) The resulting table contains one attribute: Average_Salary and one value Ex2: compute the number of employees in each department DNO Count(ssn)(Employee) The resulting table contains two attributes: DNO and Count_ssn. There is a row for every dept. containing the DNO value and the number of employees Renaming attributes It is sometimes convenient to rename the attributes in the resulting relation: R(DEPTNUM, NUM_EMPL) DNO Count(ssn) (Employee) Recursive operations Compute all the employees supervised by ``Pinochio’’ Compute all the emplyees supervised by ``Pinochio’’ at level two Compute all the employees supervisod by ``Pinochio’’ at any level!!! Answers A1: Pinochio_ssn <Result1 ssn (fname=`pinochio’(Employee)) ssn (Pinochio ⋈ssn=superssn Employee) A2: Result2 ssn (Result1 ⋈ssn=superssn Employee) Result Result1 Result2 A3: is not supported by standard relational algebra Outer Join Left Outer Join Ex: list the employee names and also the name of the department they manage in case it exists Right Outer Join Full Outer Join Examples from (Emp-Dept-Proj schema) List everybody who makes more than $30000. List names of everybody working for the research department. List employees with a dependent. List employees that have a daughter. List employees without dependents. List employees working on a project in Houston. List all supervisors. List names of all managers. List names of managers with at least one dependent Examples from (Emp-Dept-Proj schema …) For every project located in ‘Chicago’, list the project number, the controlling department number, the department manager’s last name, address, and birthdate. Make a list of project numbers for projects involving an employee whose first name is `Pinochio’ either as a worker on the project, or as a manager of the department that controls the project. Find the names of all employees who are directly supervised by `Isaac Newton’ For each department, retrieve the department name and average salary of its employees. Retrieve the average salary of all female employees For each project, list the project name and the total number of hours spent on the project.