Relational algebra

Relational Algebra
Manipulating Databases

To access information in a database we
use a query
Ex: How many customers have the first
name = 'John'?

Good query writing follows a formal
model called relational algebra
Relational Algebra

Relational algebra: a collection of
mathematical operations that
manipulate tables

Familiarity with relational algebra helps in
understanding the logic behind complex
queries and makes them easier to write
Tables and Queries


Recall: a table is a set of rows/records having
the same number and types of attributes
When you send a query to the database, it:

1. Finds the appropriate rows of information in the stored tables
2. Performs the requested operations on the data
3. Represents the results in a new temporary table
4. Delivers the table of results to the user
Example

Ex: How many customers have the first
name = 'John'?
The database creates a table containing
all customers whose first name is
'John' and returns the table to the user
Basic types of queries

There are 4 basic types of queries

- A projection operation produces a result table with
    - Only some of the columns of its input table.
- A selection operation produces a result table with
    - All of the columns of the input table
    - Only those rows of its input table that satisfy some criteria.
- A join or product operation produces a result table by
    - Combining the columns of two input tables.
- A set operation produces a result table by
    - Combining rows from one or the other of its input tables
Projection operation: π

A projection query selects some of the
columns of the input table

project T onto (attribute1, attribute2, …)
Relational algebra form:

π attribute1, attribute2, ... (T)
Example

π firstName, lastName (Customer)

The Customer table:

accountId  lastName   firstName  street                  city         state  zipcode  balance
101        Block      Jane       345 Randolph Circle     Apopka       FL     30458-   $0.00
102        Hamilton   Cherry     3230 Dade St.           Dade City    FL     30555-   $3.00
103        Harrison   Katherine  103 Landis Hall         Bratt        FL     30457-   $31.00
104        Breaux     Carroll    76 Main St.             Apopka       FL     30458-   $35.00
106        Morehouse  Anita      9501 Lafayette St.      Houma        LA     44099-   $0.00
111        Doe        Jane       123 Main St.            Apopka       FL     30458-   $0.00
201        Greaves    Joseph     14325 N. Bankside St.   Godfrey      IL     43580-   $0.00
444        Doe        Jane       Cawthon Dorm, room 642  Tallahassee  FL     32306-   $10.55
Example …

Notice that the result table has fewer rows
Duplicate rows have been removed because the
projected attributes do not contain a key

firstName  lastName
Anita      Morehouse
Jane       Block
Carroll    Breaux
Cherry     Hamilton
Katherine  Harrison
Jane       Doe
Joseph     Greaves
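The duplicate removal described above can be sketched in Python. This is only an illustration, not how a database implements projection: a table is assumed to be a list of dicts, the helper name project is made up for this sketch, and the data is a four-row subset of the Customer table.

```python
# A table is modeled as a list of rows; each row maps attribute names to values.
# The rows are a subset of the Customer table from the example.
customer = [
    {"accountId": 101, "lastName": "Block", "firstName": "Jane"},
    {"accountId": 111, "lastName": "Doe", "firstName": "Jane"},
    {"accountId": 201, "lastName": "Greaves", "firstName": "Joseph"},
    {"accountId": 444, "lastName": "Doe", "firstName": "Jane"},
]

def project(table, attributes):
    """Keep only the named attributes and drop duplicate rows (hypothetical helper)."""
    seen = set()
    result = []
    for row in table:
        key = tuple(row[a] for a in attributes)
        if key not in seen:  # set semantics: each distinct row appears once
            seen.add(key)
            result.append(dict(zip(attributes, key)))
    return result

names = project(customer, ["firstName", "lastName"])
print(names)
```

Accounts 111 and 444 both project to (Jane, Doe), so only one of the two rows survives. Projecting onto accountId as well would keep both, because accountId is a key.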
Storing the temporary results

We can store the result of a query in a
table T' as follows:
T' ← π attribute1, attribute2, ... (T)
This will create a table T' with attributes
attribute1, attribute2, … containing the
result of the query
Selection queries: σ

A selection query selects the rows of a table
that match a selection criterion
Relational algebra form:

σ <condition> (T)

Each row is checked to see if it satisfies
the condition and selected accordingly
Example

σ lastName='Doe' (Customer)

accountId  firstName  lastName  street                  city         state  zipcode  balance
111        Jane       Doe       123 Main St.            Apopka       FL     34331    0.00
444        Jane       Doe       Cawthon Dorm, room 642  Tallahassee  FL     32306    10.55
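A selection can be sketched as a filter over the rows. The list-of-dicts table model and the helper name select are assumptions of this sketch, and only a few Customer rows and attributes are used.

```python
# Subset of the Customer table; each row maps attribute names to values.
customer = [
    {"accountId": 101, "firstName": "Jane", "lastName": "Block", "balance": 0.00},
    {"accountId": 111, "firstName": "Jane", "lastName": "Doe", "balance": 0.00},
    {"accountId": 201, "firstName": "Joseph", "lastName": "Greaves", "balance": 0.00},
    {"accountId": 444, "firstName": "Jane", "lastName": "Doe", "balance": 10.55},
]

def select(table, condition):
    """Return the rows for which condition(row) is true; all columns are kept."""
    return [row for row in table if condition(row)]

does = select(customer, lambda row: row["lastName"] == "Doe")
print([row["accountId"] for row in does])
```

Unlike projection, selection never changes the set of columns; it only decides which rows pass through.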
Complex selection criteria

The selection criterion can be any
boolean expression containing
operators like: and, or, =, ≠, <, >, ≤, ≥,
etc …
Example
T ← σ ssn='376-77-0099' and date < '01-mar-2002' (TimeCard)

The TimeCard table:

ssn          date        startTime  endTime  storeId  paid
145-09-0967  01/14/2002  8:15       12:00    3        yes
245-11-4554  01/14/2002  8:15       12:00    3        yes
376-77-0099  02/23/2002  14:00      22:00    5        yes
376-77-0099  03/21/2002  14:00      22:00    5        yes
145-09-0967  01/16/2002  8:15       12:00    3        yes
376-77-0099  01/03/2002  10:00      14:00    5        yes
376-77-0099  01/03/2002  15:00      19:00    5        yes

Only the third, sixth, and seventh rows satisfy both parts of the condition.
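A compound condition is just a boolean expression evaluated per row. The sketch below assumes dates are stored as datetime.date values so that < compares them chronologically; the helper select and the reduced set of columns are choices made for this illustration.

```python
from datetime import date

# A few TimeCard rows from the example, with only three of the attributes.
time_card = [
    {"ssn": "145-09-0967", "date": date(2002, 1, 14), "storeId": 3},
    {"ssn": "376-77-0099", "date": date(2002, 2, 23), "storeId": 5},
    {"ssn": "376-77-0099", "date": date(2002, 3, 21), "storeId": 5},
    {"ssn": "376-77-0099", "date": date(2002, 1, 3), "storeId": 5},
]

def select(table, condition):
    """Return the rows for which condition(row) is true."""
    return [row for row in table if condition(row)]

# Both parts of the condition must hold for a row to be selected.
t = select(
    time_card,
    lambda row: row["ssn"] == "376-77-0099" and row["date"] < date(2002, 3, 1),
)
print([row["date"].isoformat() for row in t])
```

The row dated 03/21/2002 has the right ssn but fails the date test, so it is left out.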
Product queries: ×

The product query takes two tables and
produces a table which is the cross product of
the two, i.e., it combines every row of one table
with every row of the other table
R(A1, A2, … , An) × S(B1, B2, … , Bm) =
Q(A1, A2, … , An, B1, B2, … , Bm)
Relational algebra form:

R × S
Example
Employee × TimeCard (partial listing)

Employee.ssn  lastName    firstName  TimeCard.ssn  date       startTime   endTime      storeId  paid
145-09-0967   Uno         Jane       145-09-0967   1/14/2002  8:15:00 AM  12:00:00 PM  3        Yes
245-11-4554   Toulouse    Jie        145-09-0967   1/14/2002  8:15:00 AM  12:00:00 PM  3        Yes
376-77-0099   Threat      Ayisha     145-09-0967   1/14/2002  8:15:00 AM  12:00:00 PM  3        Yes
479-98-0098   Fortune     Julian     145-09-0967   1/14/2002  8:15:00 AM  12:00:00 PM  3        Yes
579-98-8778   Fivozinsky  Bruce      145-09-0967   1/14/2002  8:15:00 AM  12:00:00 PM  3        Yes
145-09-0967   Uno         Jane       145-09-0967   1/16/2002  8:15:00 AM  12:00:00 PM  3        Yes
245-11-4554   Toulouse    Jie        145-09-0967   1/16/2002  8:15:00 AM  12:00:00 PM  3        Yes
376-77-0099   Threat      Ayisha     145-09-0967   1/16/2002  8:15:00 AM  12:00:00 PM  3        Yes
479-98-0098   Fortune     Julian     145-09-0967   1/16/2002  8:15:00 AM  12:00:00 PM  3        Yes
579-98-8778   Fivozinsky  Bruce      145-09-0967   1/16/2002  8:15:00 AM  12:00:00 PM  3        Yes
145-09-0967   Uno         Jane       245-11-4554   1/14/2002  8:15:00 AM  12:00:00 PM  3        Yes
245-11-4554   Toulouse    Jie        245-11-4554   1/14/2002  8:15:00 AM  12:00:00 PM  3        Yes
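A cross product pairs every row of one table with every row of the other. In the sketch below the helpers qualify and cross are hypothetical, and every attribute is prefixed with its table name for simplicity (the slides prefix only the clashing ones).

```python
from itertools import product

# Two employees and three time cards taken from the example tables.
employee = [
    {"ssn": "145-09-0967", "lastName": "Uno", "firstName": "Jane"},
    {"ssn": "245-11-4554", "lastName": "Toulouse", "firstName": "Jie"},
]
time_card = [
    {"ssn": "145-09-0967", "date": "1/14/2002"},
    {"ssn": "145-09-0967", "date": "1/16/2002"},
    {"ssn": "245-11-4554", "date": "1/14/2002"},
]

def qualify(table_name, row):
    """Prefix every attribute with its table name, e.g. Employee.ssn."""
    return {f"{table_name}.{attr}": value for attr, value in row.items()}

def cross(r_name, r, s_name, s):
    """Combine every row of r with every row of s."""
    return [{**qualify(r_name, a), **qualify(s_name, b)} for a, b in product(r, s)]

pairs = cross("Employee", employee, "TimeCard", time_card)
print(len(pairs))  # 2 employees x 3 time cards
```

The result has |R| × |S| rows, and only some of them pair an employee with that employee's own time card.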
Product queries …

If two attributes in two tables T and R
have the same name, we prefix them
with the relation name: T.<attribute>
Ex: Employee.ssn, TimeCard.ssn
Remark. Many of the resulting rows in
the previous example don't make sense
Join queries: ⋈
In the previous table we are only
interested in the rows that match: rows
with Employee.ssn = TimeCard.ssn
We are interested in the query:

σ Employee.ssn=TimeCard.ssn (Employee × TimeCard)
Join operations …

A join query is a cross product with a
restriction on the result rows
- The join condition determines which rows match
- Only matching rows are in the result table

The typical join condition is equality of attributes
- It is called an equi-join

Relational algebra form:

R ⋈<condition> S
Example
Some rows from the table
Employee ⋈Employee.ssn=TimeCard.ssn TimeCard

Employee.ssn  lastName  firstName  TimeCard.ssn  date        startTime  endTime  storeId  paid
145-09-0967   Uno       Jane       145-09-0967   01/14/2002  8:15       12:00    3        no
145-09-0967   Uno       Jane       145-09-0967   01/16/2002  8:15       12:00    3        no
245-11-4554   Toulouse  Jie        245-11-4554   01/14/2002  8:15       12:00    3        no
376-77-0099   Threat    Ayisha     376-77-0099   02/23/2002  14:00      22:00    5        no
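Since a join is a cross product followed by a selection, it can be sketched as a nested loop that keeps only the matching pairs. The helper theta_join and the list-of-dicts table model are assumptions of this sketch; the data is a subset of the example tables.

```python
employee = [
    {"ssn": "145-09-0967", "lastName": "Uno", "firstName": "Jane"},
    {"ssn": "245-11-4554", "lastName": "Toulouse", "firstName": "Jie"},
    {"ssn": "376-77-0099", "lastName": "Threat", "firstName": "Ayisha"},
]
time_card = [
    {"ssn": "145-09-0967", "date": "01/14/2002"},
    {"ssn": "145-09-0967", "date": "01/16/2002"},
    {"ssn": "245-11-4554", "date": "01/14/2002"},
    {"ssn": "376-77-0099", "date": "02/23/2002"},
]

def theta_join(r_name, r, s_name, s, condition):
    """Cross product of r and s restricted to the combined rows that satisfy condition."""
    result = []
    for a in r:
        for b in s:
            row = {f"{r_name}.{k}": v for k, v in a.items()}
            row.update({f"{s_name}.{k}": v for k, v in b.items()})
            if condition(row):  # the join condition decides which pairs match
                result.append(row)
    return result

# Equi-join: the condition is equality of two attributes.
joined = theta_join(
    "Employee", employee, "TimeCard", time_card,
    lambda row: row["Employee.ssn"] == row["TimeCard.ssn"],
)
print(len(joined))
```

Of the 12 pairs in the cross product only 4 survive, one per time card. Real systems avoid building the full product, but the result is the same.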
Natural join: *

Frequently, when doing an equi-join, the
attributes have the same name
A natural join is an equi-join with an
equality condition on the common attributes:
Employee ⋈ssn TimeCard
is written
Employee * TimeCard

In a natural join the common attributes appear
only once
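A natural join can be sketched by finding the attribute names the two tables share, matching on all of them, and merging the rows so that each common attribute is kept once. The helper natural_join is hypothetical and assumes every row of a table has the same attributes.

```python
employee = [
    {"ssn": "145-09-0967", "lastName": "Uno", "firstName": "Jane"},
    {"ssn": "245-11-4554", "lastName": "Toulouse", "firstName": "Jie"},
]
time_card = [
    {"ssn": "145-09-0967", "date": "01/14/2002"},
    {"ssn": "145-09-0967", "date": "01/16/2002"},
    {"ssn": "245-11-4554", "date": "01/14/2002"},
]

def natural_join(r, s):
    """Equi-join on all attributes the two tables have in common."""
    if not r or not s:
        return []
    common = [attr for attr in r[0] if attr in s[0]]
    result = []
    for a in r:
        for b in s:
            if all(a[c] == b[c] for c in common):
                # Merging the dicts keeps each common attribute only once.
                result.append({**a, **b})
    return result

joined = natural_join(employee, time_card)
print(list(joined[0]))
```

Here the only common attribute is ssn, so the result has a single ssn column instead of Employee.ssn and TimeCard.ssn.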
Queries with multiple joins

Consider the Video-Rental schema, and
suppose we want to retrieve for every
currently-rented video, the renter’s
account number, video number, rental
date, due date, title of the movie, and
cost
Solution
π accountId, videoId, dateRented, dateDue, title, cost
((Rental ⋈videoId Video) ⋈movieId Movie)
Combining operations

Suppose we want to find the following
information: for the customer with accountId = 113,
find all the videos that the customer is currently renting. For
each video, list the video number, the
title of the movie, and the due date
Solution
π videoId, title, dateDue
((σ accountId=113 (Rental) ⋈videoId Video) ⋈movieId Movie)
Or:
T1 = σ accountId=113 (Rental)
T2 = T1 ⋈videoId Video
T3 = T2 ⋈movieId Movie
T4 = π videoId, title, dateDue (T3)
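The step-by-step form T1 to T4 maps directly onto a chain of function calls. Everything in the sketch below is illustrative: the helpers select, join_on, and project are made up, and the rows of Rental, Video, and Movie are invented sample data, since the slides do not list those tables.

```python
# Invented sample data; only the attribute names come from the solution above.
rental = [
    {"accountId": 113, "videoId": 1, "dateDue": "2002-01-10"},
    {"accountId": 113, "videoId": 2, "dateDue": "2002-01-12"},
    {"accountId": 101, "videoId": 3, "dateDue": "2002-01-11"},
]
video = [
    {"videoId": 1, "movieId": 10},
    {"videoId": 2, "movieId": 20},
    {"videoId": 3, "movieId": 10},
]
movie = [
    {"movieId": 10, "title": "Movie A"},
    {"movieId": 20, "title": "Movie B"},
]

def select(table, condition):
    return [row for row in table if condition(row)]

def join_on(r, s, attr):
    """Equi-join on one shared attribute; the shared attribute is kept once."""
    return [{**a, **b} for a in r for b in s if a[attr] == b[attr]]

def project(table, attributes):
    """Keep the named attributes and drop duplicate rows."""
    result = []
    for row in table:
        projected = {a: row[a] for a in attributes}
        if projected not in result:
            result.append(projected)
    return result

t1 = select(rental, lambda row: row["accountId"] == 113)
t2 = join_on(t1, video, "videoId")
t3 = join_on(t2, movie, "movieId")
t4 = project(t3, ["videoId", "title", "dateDue"])
print(t4)
```

Selecting before joining, as T1 does, keeps the intermediate tables small: only the rentals of account 113 are ever joined with Video and Movie.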
More examples

List all comedy movies that were rented
on December 21, 2001. For every movie
list the customer’s name, movie title,
and date returned
Solution
T1 = σ dateRented='December 21 2001' (PreviousRental)
T2 = T1 ⋈videoId Video
T3 = σ genre='comedy' (Movie)
T4 = T2 ⋈movieId T3
T5 = T4 ⋈accountId Customer
T6 = π firstName, lastName, title, dateReturned (T5)
Set operations

Set operations include: union, intersection,
and difference
Relational algebra form: ∪, ∩, −
Set operations can be applied to any tables
with the same shape (compatible)
- The same order and type of attributes
- Attribute names do not have to agree
Set operations …

If R and S are two compatible tables:
R ∪ S is the table that contains the set of rows
that are either in R or in S
R ∩ S is the table that contains the set of rows
that are both in R and S
R − S is the table that contains the set of rows
that are in R but not in S
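The three definitions above correspond to Python's built-in set operators when each row is represented as a tuple. The tables r and s below are invented; they are compatible because every row has the same number and types of values, and no attribute names are involved at all.

```python
# Each row is a tuple; both tables have the shape (integer, string).
r = {(1, "a"), (2, "b"), (3, "c")}
s = {(2, "b"), (3, "c"), (4, "d")}

union = r | s         # rows that are in r or in s
intersection = r & s  # rows that are in both r and s
difference = r - s    # rows that are in r but not in s

print(sorted(union))
print(sorted(intersection))
print(sorted(difference))
```

Because rows are compared position by position, two tables whose attributes have different names can still be combined, which matches the remark that attribute names do not have to agree.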
Example of ∪

Retrieve all the videos that are currently
or were previously rented
EverRented = Rental ∪ PreviousRental
Example of ∩

Retrieve the video id of all the videos
that are currently rented and have been
rented at least once before
Veterans = π videoId (Rental) ∩
π videoId (PreviousRental)
Example of −
Retrieve the video id of all the videos
that are currently rented and have
never been rented before
FirstTime = π videoId (Rental) −
π videoId (PreviousRental)
Aggregate functions

Not all queries can be expressed using
the basic operations described
previously.
- What if we want to compute the average salary of all employees?
- What if we want to count the number of employees in each department?
For such queries, we use aggregate
functions.
Relational algebra form:

<grouping attributes> ℱ <function list> (T)
Aggregate functions …

- The function list includes: average, sum, count, maximum, minimum
- The result of the query will be a table containing the results
- Its attributes consist of the grouping attributes plus one attribute for each function in the function list
Examples
Ex1: compute the average salary of all the employees

ℱ Average(salary) (Employee)

The resulting table contains one attribute, Average_Salary, and
one value
Ex2: compute the number of employees in each
department

DNO ℱ Count(ssn) (Employee)

The resulting table contains two attributes: DNO and
Count_ssn. There is a row for every department containing
the DNO value and the number of employees
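Grouping and aggregation can be sketched by collecting the rows of each group and applying each function to one attribute of the group. The helper aggregate, the way functions are passed as (output name, function, attribute) triples, and the three Employee rows are all assumptions made for this sketch.

```python
from collections import defaultdict

# Invented Employee rows; only the attribute names come from the examples.
employee = [
    {"ssn": "111", "DNO": 1, "salary": 30000},
    {"ssn": "222", "DNO": 1, "salary": 40000},
    {"ssn": "333", "DNO": 2, "salary": 50000},
]

def average(values):
    return sum(values) / len(values)

def aggregate(table, grouping, functions):
    """Group rows by the grouping attributes, then apply each function per group.

    functions is a list of (output name, function, attribute) triples.
    """
    groups = defaultdict(list)
    for row in table:
        groups[tuple(row[g] for g in grouping)].append(row)
    result = []
    for key, rows in groups.items():
        out = dict(zip(grouping, key))  # the grouping attributes come first
        for name, func, attr in functions:
            out[name] = func([row[attr] for row in rows])
        result.append(out)
    return result

# Ex1: no grouping attributes, so all rows form a single group.
ex1 = aggregate(employee, [], [("Average_Salary", average, "salary")])
# Ex2: one group, and therefore one result row, per DNO value.
ex2 = aggregate(employee, ["DNO"], [("Count_ssn", len, "ssn")])
print(ex1)
print(ex2)
```

The result attributes are exactly the grouping attributes followed by one attribute per function, as described above.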
Renaming attributes

It is sometimes convenient to rename
the attributes in the resulting relation:
R(DEPTNUM, NUM_EMPL) ← DNO ℱ Count(ssn) (Employee)
Recursive operations

1. Compute all the employees supervised
by 'Pinochio'
2. Compute all the employees supervised
by 'Pinochio' at level two
3. Compute all the employees supervised
by 'Pinochio' at any level
Answers
A1:
Pinochio_ssn ← π ssn (σ fname='Pinochio' (Employee))
Result1 ← π Employee.ssn (Pinochio_ssn ⋈Pinochio_ssn.ssn=Employee.superssn Employee)
A2:
Result2 ← π Employee.ssn (Result1 ⋈Result1.ssn=Employee.superssn Employee)
Result ← Result1 ∪ Result2
A3: is not supported by standard relational algebra
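The answers above can be mirrored in Python to show why the third query is different: each level needs one more join, and the number of levels is not known in advance. The helper supervised_by and the six Employee rows are invented for this sketch; the loop at the end is ordinary programming, not a relational algebra expression.

```python
# Invented supervision hierarchy: 1 supervises 2 and 3, 2 supervises 4, 4 supervises 5.
employee = [
    {"ssn": "1", "fname": "Pinochio", "superssn": None},
    {"ssn": "2", "fname": "Ann", "superssn": "1"},
    {"ssn": "3", "fname": "Bob", "superssn": "1"},
    {"ssn": "4", "fname": "Cy", "superssn": "2"},
    {"ssn": "5", "fname": "Di", "superssn": "4"},
    {"ssn": "6", "fname": "Ed", "superssn": None},
]

def supervised_by(ssns, table):
    """One join step: ssn of every employee whose supervisor is in ssns."""
    return {row["ssn"] for row in table if row["superssn"] in ssns}

pinochio_ssn = {row["ssn"] for row in employee if row["fname"] == "Pinochio"}
result1 = supervised_by(pinochio_ssn, employee)  # A1: direct reports
result2 = supervised_by(result1, employee)       # A2: level two
result = result1 | result2

# A3: repeat the join step until no new employees appear.
all_levels = set()
frontier = result1
while frontier:
    all_levels |= frontier
    frontier = supervised_by(frontier, employee) - all_levels

print(sorted(result1), sorted(result2), sorted(all_levels))
```

Subtracting all_levels from the new frontier guarantees the loop stops even if the supervision data contained a cycle.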
Outer Join

- Left Outer Join
    - Ex: list the employee names and also the
      name of the department they manage,
      in case it exists
- Right Outer Join
- Full Outer Join
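A left outer join keeps every row of the left table: rows with no match are padded with empty values for the right table's attributes. The sketch below follows the department-manager example, but the helper left_outer_join, the attribute names lname, dname, and mgrssn, and the data are assumptions, and None stands in for a null value.

```python
# Invented data: only Wong manages a department.
employee = [
    {"ssn": "1", "lname": "Smith"},
    {"ssn": "2", "lname": "Wong"},
]
department = [
    {"dname": "Research", "mgrssn": "2"},
]

def left_outer_join(r, s, condition):
    """Join r and s, keeping unmatched rows of r padded with None."""
    s_attrs = list(s[0]) if s else []
    result = []
    for a in r:
        matches = [b for b in s if condition(a, b)]
        if matches:
            result.extend({**a, **b} for b in matches)
        else:
            # No matching row in s: keep the left row and fill the rest with None.
            result.append({**a, **{attr: None for attr in s_attrs}})
    return result

managed = left_outer_join(
    employee, department, lambda e, d: e["ssn"] == d["mgrssn"]
)
print([(row["lname"], row["dname"]) for row in managed])
```

An ordinary join would drop Smith entirely. A right outer join does the same for the right table, and a full outer join keeps the unmatched rows of both.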
Examples from (Emp-Dept-Proj
schema)

- List everybody who makes more than $30000.
- List names of everybody working for the research department.
- List employees with a dependent.
- List employees that have a daughter.
- List employees without dependents.
- List employees working on a project in Houston.
- List all supervisors.
- List names of all managers.
- List names of managers with at least one dependent.
Examples from (Emp-Dept-Proj
schema …)

- For every project located in 'Chicago', list the project number,
  the controlling department number, and the department manager's
  last name, address, and birthdate.
- Make a list of project numbers for projects involving an
  employee whose first name is 'Pinochio', either as a worker on
  the project or as a manager of the department that controls
  the project.
- Find the names of all employees who are directly supervised by
  'Isaac Newton'.
- For each department, retrieve the department name and the
  average salary of its employees.
- Retrieve the average salary of all female employees.
- For each project, list the project name and the total number of
  hours spent on the project.