Relational Algebra

advertisement
Relational Algebra
(CB Chapter 5.1)
CPSC 356 Database
Ellen Walker
Hiram College
Relational Algebra
• Mathematical language for operating on
relations
• Each operator takes one or more relations as
input, and produces one relation as output
• Relational algebra operators can be
implemented as functions in a programming
language
• A relational algebra expression indicates
which operators to use, in which order
Example Relation (PhoneBook)
First
Last
Dept
Email
Phone Title
Louis
Oliphant
CPSC Louis
5275
Asst
Ellen
Walker
CPSC WalkerEL
5250
Prof
Laura
VanWormer
PHYS VanwormerLA
5249
Prof
Cathy
Mansor
WEC
MansorCN
5957
Dean
Brad
Goodner
BIOL
GoodnerBW
5260
Prof
Another Relation (Dept)
Name
Building
CPSC
Colton
WEC
Hinsdale
PHYS
Gerstacker
BIOL
Gerstacker
MGMT
Hinsdale
Selection (condition )
•
•
The output relation has all rows of the input
relation that satisfy the condition
(Title=“prof”) Phonebook
First
Last
Dept
Email
Laura
VanWormer
PHYS VanwormerLA
5249
Prof
Ellen
Walker
CPSC WalkerEL
5250
Prof
Brad
Goodner
BIOL
5260
Prof
GoodnerBW
Phone Title
Valid Conditions
• Basic condition
– Dept = ‘CS’
– First < Last
attribute compare to const
attribute compare to attribute
• Combination of conditions using AND, OR, or
NOT
• Note: All attributes must come from the
relation in the selection
Pseudocode for Selection
Select (input relation, condition, &output rel)
Do for every tuple in the input relation
If the tuple satisfies the condition
Copy the tuple to the output relation
Time = O(number of tuples in input relation)
Projection (attributes)
• Create a new relation with only the listed
attributes in it.
• last, phone PhoneBook
Last
Phone
Oliphant
5275
Walker
5250
VanWormer
5249
Mansor
5957
Goodner
5260
Relations Have No Duplicates
• dept PhoneBook
• Only 4 rows, even though PhoneBook had
5!
Dept
CPSC
PHYS
WEC
BIOL
Pseudocode for Projection
Project (input relation, attribute-list, &output rel)
Do for every tuple in the input relation
Do for every attribute in the input relation
If the attribute is in the attribute-list
copy the value to the output relation
Remove duplicates in the output relation
Time = O(number of tuples in relation + time for
duplicate-removal)
Combining Select & Project
• first, last ( title=prof PhoneBook )
First
Last
Laura
VanWormer
Ellen
Walker
Brad
Goodner
Remember: A Relation is a Set
• A set is an unordered collection of unique elements
Set S = {1,2,3}
{a,b,c} = {a,c,b}
“1 is an element of S”
• A subset of a set is another set whose elements all
come from the original set.
{a,b} is a subset of {a,c,b}
{1,2,3} is a subset of {1,2,3}
{1,2,4} is not a subset of {1,2,3}
{} (the empty set) is a subset of every set!
Basic Set Operations
• Union: the set of all elements in either or
both original sets
– {1,2} union {2,3} = {1,2,3}
• Intersection: the set of all elements in both
original sets (only)
– {1,2} intersect {2,3} = {2}
• Set Difference: the set of all elements in the
first but not the second set
– {1,2} – {2,3} = {1}
Applying to Relations
• Relations must be “comparable”
– Same set of attributes in each relation!
• Union = all tuples
• Intersection = all matching tuples
• Set Difference = all tuples from first but not
second
Basic Operation Examples
• R1 = dept PhoneBook
• R2 = name as “dept” Dept
– Rename attribute to be same
•
•
•
•
R1  R2 = R2 (in this case)
R1  R2 = R1 (in this case)
R1 – R2 = { } (empty set)
R2 – R1 =
Dept
MGMT
R1
R2
Dept
Dept
CPSC
CPSC
PHYS
PHYS
WEC
WEC
BIOL
BIOL
MGMT
Another Set Operation
• Cartesian product: a set of ordered pairs,
where each contains one element from each
original set
{1,2,3} x {a, b} =
{(1,a), (1,b), (2,a), (2,b), (3,a), (3,b)}
• For Relations: create a new relation with
every combination of tuples
Cartesian Product (X)
• dept, last PhoneBook X Dept
Last
Dept
Name
Bldg
Oliphant
CPSC
CPSC
Colton
Walker
CPSC
CPSC
Colton
VanWorme
r
PHYS
CPSC
Colton
Mansor
WEC
CPSC
Colton
Goodner
BIOL
CPSC
Colton
Oliphant
CPSC
WEC
Hinsdale
Walker
CPSC
WEC
Hinsdale
VanWorme
r
PHYS
WEC
Hinsdale
Mansor
WEC
WEC
Hinsdale
Goodner
BIOL
WEC
Hinsdale
Oliphant
CPSC
PHYS
Colton
Walker
CPSC
PHYS
Colton
Etc….
Pseudocode for Cartesian Product
Product (relation1, relation2, &output rel)
Do for every tuple in relation1
Do for every tuple in relation2
Build a row with all attributes from both relations
Add it to the output relation
Time = O(number of tuples in relation1 * number of
tuples in relation 2)
This is the most expensive operation in relational
algebra!
Join Combines X and Select
• Theta Join:
– any condition R1 X R2
• Equijoin:
– equality condition R1 X R2
• Natural Join:
– equality condition R1 X R2
– Project to remove one copy of each equal attribute
• Left or Right Outer Join:
– Include all tuples from (left or right) side, even if they don’t
have a match
Naïve Pseudocode for Join
• Join (rel1, rel2, condition, output rel)
• product (rel1, rel2, tmp)
• select (tmp, condition, output rel)
• Time: Same as Cartesian Product
• To keep time down, keep the size of the
relations down -- we’ll look at this later!
Let’s Try Some Examples:
• What are the first and last names of all
professors who don’t work in Hinsdale?
• What are the telephone extensions of people
who work in Hinsdale?
• Which buildings contain people whose phone
numbers are between 5000 and 5200?
• List the Dept. Name, Building Name, and
phone numbers for All departments (even
those without phone numbers).
First and last names of all professors
who don’t work in Hinsdale
• Select “all professors”
– Title=“Prof” (Phonebook)
• Select “Departments not in Hinsdale”
– Bldg != “Hinsdale” (Dept)
• Connect these relations where depts match
– (Title=“Prof” (Phonebook)) |X| Dept=Name (Bldg != “Hinsdale” (Dept))
• One project to get the final result
– first,last ((Title=“Prof” (Phonebook)) |X| Dept=Name (Bldg != “Hinsdale”
(Dept)))
Telephone extensions of people who
work in Hinsdale
• Select “Departments in Hinsdale” and project to just
Dept to make the table smaller
– Name (Bldg = “Hinsdale” (Dept))
• Join with PhoneBook to get only those in the right
departments
– Name (Bldg = “Hinsdale” (Dept)) |X| Dept=Name PhoneBook)
• Project to get just the extensions
– Phone ( Name (Bldg = “Hinsdale” (Dept)) |X| Dept=Name PhoneBook))
Buildings with phone numbers
between 5000 and 5300
• Select to get phone numbers from 5000 to 5300 and
Project to have only the dept attribute
– dept (5000<=Phone && 5300>=Phone (PhoneBooks))
• Join with Dept to associate department names with
buildings
– (dept (5000<=Phone && 5300>=Phone (PhoneBooks)) |X| dept=name
(Dept))
• Project to get just the building names
– bldg ((dept (5000<=Phone && 5300>=Phone (PhoneBooks)) |X|
dept=name (Dept)))
Dept. Name, Building, and phone
numbers for all departments
• Join to include info from *ALL* departments.
This requires an outer join
– Dept X| name=dept PhoneBook
• Project to get the right attributes
– name, bldg,phone Dept X| name=dept PhoneBook
Aggregation
• Operations that allow you to combine all the
values in a table (column) in some way:
•
•
•
•
•
COUNT
SUM
AVG
MIN
MAX
– Examples:
• How many CPSC majors are there?
• What is the average GPA of CPSC majors?
Grouping
• Aggregate all elements in one column based
on values in another column
• Aggregation operators (COUNT, SUM, AVG,
MIN, MAX)
• Format:
–
group-by-attribute,
F OPER attribute (table)
• Note: F is a backward cursive F in the book.
Grouping Examples
• List average GPAs by major
major
F AVG GPA(Student)
• What is the average GPA of CPSC students?
σmajor=CPSC (major F AVG GPA(Student))
• List number of faculty in each department
•
dept
F COUNT id(Faculty)
Download