Lecture 5

advertisement
Relational Algebra
1
Motivation
• Write a Java program
• Translate it into a program in assembly language
• Execute the assembly language program
• As a rough analogy, here we write a SQL query
• Translate it into a program in relational algebra
language
• Execute the program
2
Example
Sells( bar,
Joe’s
Joe’s
Sue’s
Sue’s
beer,
Bud
Miller
Bud
Coors
price )
2.50
2.75
2.50
3.00
Bars( bar, addr
)
Joe’s Maple St.
Sue’s River Rd.
Query: Find all locations that sell beer for less than 2.75.
Select addr
From Sells, Bars
Where (Sells.bar = Bars.bar) and (Sells.price < 2.75)
3
Translating into a program in RA
PROJECTBars.addr
SELECTSells.price < 2.75
JOIN
Sells
Sells.bar = Bars.bar
Bars
4
Relational Algebra at a Glance
• Operators: relations as input, new relation as output
• Five basic RA operations:
– Basic Set Operations
• union, difference (no intersection, no complement)
– Selection: s
– Projection: p
– Cartesian Product: X
• When our relations have attribute names:
– Renaming: r
• Derived operations:
– Intersection, complement
– Joins (natural,equi-join, theta join, semi-join)
5
Five Basic RA Operations
6
Set Operations
• Union, difference
• Binary operations
7
Set Operations: Union
•
•
•
•
•
Union: all tuples in R1 or R2
Notation: R1 U R2
R1, R2 must have the same schema
R1 U R2 has the same schema as R1, R2
Example:
– ActiveEmployees U RetiredEmployees
8
Set Operations: Difference
•
•
•
•
•
Difference: all tuples in R1 and not in R2
Notation: R1 – R2
R1, R2 must have the same schema
R1 - R2 has the same schema as R1, R2
Example
– AllEmployees - RetiredEmployees
9
Selection
•
•
•
•
•
Returns all tuples which satisfy a condition
Notation: sc(R)
c is a condition: =, <, >, and, or, not
Output schema: same as input schema
Find all employees with salary more than
$40,000:
– sSalary > 40000 (Employee)
10
Selection Example
Employee
SSN
999999999
777777777
888888888
Name
John
Tony
Alice
DepartmentID
1
1
2
Salary
30,000
32,000
45,000
Find all employees with salary more than $40,000.
s Salary > 40000 (Employee)
SSN
Name
888888888 Alice
DepartmentID
2
Salary
45,000
11
Ullman: Selection
• R1 := SELECTC (R2)
– C is a condition (as in “if” statements) that refers to
attributes of R2.
– R1 is all those tuples of R2 that satisfy C.
12
Example
Relation Sells:
bar
Joe’s
Joe’s
Sue’s
Sue’s
beer
Bud
Miller
Bud
Miller
price
2.50
2.75
2.50
3.00
JoeMenu := SELECTbar=“Joe’s”(Sells):
bar
beer
price
Joe’s
Bud
2.50
Joe’s
Miller
2.75
13
Projection
•
•
•
•
•
•
•
Unary operation: returns certain columns
Eliminates duplicate tuples !
Notation: P A1,…,An (R)
Input schema R(B1,…,Bm)
Condition: {A1, …, An}  {B1, …, Bm}
Output schema S(A1,…,An)
Example: project social-security number and
names:
– P SSN, Name (Employee)
14
Projection Example
Employee
SSN
999999999
777777777
888888888
Name
John
Tony
Alice
DepartmentID
1
1
2
Salary
30,000
32,000
45,000
P SSN, Name (Employee)
SSN
999999999
777777777
888888888
Name
John
Tony
Alice
15
Projection
• R1 := PROJL (R2)
– L is a list of attributes from the schema of R2.
– R1 is constructed by looking at each tuple of R2,
extracting the attributes on list L, in the order
specified, and creating from those components a tuple
for R1.
– Eliminate duplicate tuples, if any.
16
Example
Relation Sells:
bar
Joe’s
Joe’s
Sue’s
Sue’s
beer
Bud
Miller
Bud
Miller
price
2.50
2.75
2.50
3.00
Prices := PROJbeer,price(Sells):
beer
price
Bud
2.50
Miller
2.75
Miller
3.00
17
Cartesian Product
•
•
•
•
•
•
•
•
Each tuple in R1 with each tuple in R2
Notation: R1 x R2
Input schemas R1(A1,…,An), R2(B1,…,Bm)
Condition: {A1,…,An}  {B1,…Bm} = F
Output schema is S(A1, …, An, B1, …, Bm)
Notation: R1 x R2
Example: Employee x Dependents
Very rare in practice; but joins are very common
18
Cartesian Product Example
Employee
Name
John
Tony
SSN
999999999
777777777
Dependents
EmployeeSSN
999999999
777777777
Dname
Emily
Joe
Employee x Dependents
Name
SSN
EmployeeSSN
John
999999999 999999999
John
999999999 777777777
Tony
777777777 999999999
Tony
777777777 777777777
Dname
Emily
Joe
Emily
Joe
19
Product
• R3 := R1 * R2
– Pair each tuple t1 of R1 with each tuple t2 of R2.
– Concatenation t1t2 is a tuple of R3.
– Schema of R3 is the attributes of R1 and R2, in
order.
– But beware attribute A of the same name in R1
and R2: use R1.A and R2.A.
20
Example: R3 := R1 * R2
R1(
A,
1
3
B)
2
4
R2(
B,
5
7
9
C )
6
8
10
R3(
A,
1
1
1
3
3
3
R1.B,
2
2
2
4
4
4
R2.B,
5
7
9
5
7
9
C )
6
8
10
6
8
10
21
Renaming
•
•
•
•
•
•
Does not change the relational instance
Changes the relational schema only
Notation: r B1,…,Bn (R)
Input schema: R(A1, …, An)
Output schema: S(B1, …, Bn)
Example:
rLastName, SocSocNo (Employee)
22
Renaming Example
Employee
Name
John
Tony
SSN
999999999
777777777
rLastName, SocSocNo (Employee)
LastName
John
Tony
SocSocNo
999999999
777777777
23
Renaming
• The RENAME operator gives a new schema
to a relation.
• R1 := RENAMER1(A1,…,An)(R2) makes R1 be
a relation with attributes A1,…,An and the
same tuples as R2.
• Simplified notation: R1(A1,…,An) := R2.
24
Example
Bars( name, addr
)
Joe’s Maple St.
Sue’s River Rd.
R(bar, addr) := Bars
R(
bar, addr
)
Joe’s Maple St.
Sue’s River Rd.
25
Derived RA Operations
1) Intersection
2) Most importantly: Join
26
Set Operations: Intersection
•
•
•
•
•
All tuples both in R1 and in R2
Notation: R1 R2
R1, R2 must have the same schema
R1  R2 has the same schema as R1, R2
Example
– UnionizedEmployees  RetiredEmployees
• Intersection is derived:
– R1  R2 = R1 – (R1 – R2)
why ?
27
Joins
•
•
•
•
•
•
•
Theta join
Natural join
Equi-join
Semi-join
Inner join
Outer join
etc.
28
Theta Join
•
•
•
•
•
•
A join that involves a predicate
Notation: R1
where q is a condition
q R2
Input schemas: R1(A1,…,An), R2(B1,…,Bm)
{A1,…An}  {B1,…,Bm} = f
Output schema: S(A1,…,An,B1,…,Bm)
Derived operator:
R1
q R2 = s q (R1 x R2)
29
Theta-Join
• R3 := R1 JOINC R2
– Take the product R1 * R2.
– Then apply SELECTC to the result.
• As for SELECT, C can be any boolean-valued
condition.
– Historic versions of this operator allowed only A theta
B, where theta was =, <, etc.; hence the name “thetajoin.”
30
Example
Sells( bar,
Joe’s
Joe’s
Sue’s
Sue’s
beer,
Bud
Miller
Bud
Coors
price )
2.50
2.75
2.50
3.00
BarInfo := Sells JOIN
BarInfo( bar,
Joe’s
Joe’s
Sue’s
Sue’s
Bars( name, addr
)
Joe’s Maple St.
Sue’s River Rd.
Sells.bar = Bars.name
beer,
Bud
Miller
Bud
Coors
price,
2.50
2.75
2.50
3.00
Bars
name, addr
)
Joe’s Maple St.
Joe’s Maple St.
Sue’s River Rd.
Sue’s River Rd.
31
Natural Join
• Notation: R1
R2
• Input Schema: R1(A1, …, An), R2(B1, …, Bm)
• Output Schema: S(C1,…,Cp)
– Where {C1, …, Cp} = {A1, …, An} U {B1, …, Bm}
• Meaning: combine all pairs of tuples in R1 and R2
that agree on the attributes:
– {A1,…,An}  {B1,…, Bm} (called the join attributes)
• Equivalent to a cross product followed by selection
• Example Employee
Dependents
32
Natural Join Example
Employee
Name
John
Tony
SSN
999999999
777777777
Dependents
SSN
999999999
777777777
Dname
Emily
Joe
Employee
Dependents =
PName, SSN, Dname(s SSN=SSN2(Employee x rSSN2, Dname(Dependents))
Name
John
Tony
SSN
999999999
777777777
Dname
Emily
Joe
33
Natural Join
• R=
• R
A
B
B
C
X
Y
Z
U
X
Z
V
W
Y
Z
Z
V
Z
V
S =
S=
A
B
C
X
Z
U
X
Z
V
Y
Z
U
Y
Z
V
Z
V
W
34
Natural Join
• Given the schemas R(A, B, C, D), S(A, C, E),
what is the schema of R S ?
• Given R(A, B, C), S(D, E), what is R
• Given R(A, B), S(A, B), what is R
S ?
S ?
35
Natural Join
• A frequent type of join connects two relations by:
– Equating attributes of the same name, and
– Projecting out one copy of each pair of equated
attributes.
• Called natural join.
• Denoted R3 := R1 JOIN R2.
36
Example
Sells( bar,
Joe’s
Joe’s
Sue’s
Sue’s
beer,
Bud
Miller
Bud
Coors
price )
2.50
2.75
2.50
3.00
Bars( bar, addr
)
Joe’s Maple St.
Sue’s River Rd.
BarInfo := Sells JOIN Bars
Note Bars.name has become Bars.bar to make the natural
join “work.”
BarInfo(
bar,
Joe’s
Joe’s
Sue’s
Sue’s
beer,
Bud
Milller
Bud
Coors
price,
2.50
2.75
2.50
3.00
addr
)
Maple St.
Maple St.
River Rd.
River Rd.
37
Equi-join
• Most frequently used in practice:
R1
A=B
R2
• Natural join is a particular case of equi-join
• A lot of research on how to do it efficiently
38
Semijoin
• R
S = P A1,…,An (R
S)
• Where the schemas are:
– Input: R(A1,…An), S(B1,…,Bm)
– Output: T(A1,…,An)
39
Semijoin
Applications in distributed databases:
• Product(pid, cid, pname, ...) at site 1
• Company(cid, cname, ...) at site 2
• Query: sprice>1000(Product)
cid=cid Company
• Compute as follows:
T1 = sprice>1000(Product)
T2 = Pcid(T1)
send T2 to site 2
T3 = T2
Company
send T3 to site 1
Answer = T3 T1
site 1
site 1
(T2 smaller than T1)
site 2 (semijoin)
(T3 smaller than Company)
site 1 (semijoin)
40
Joins
•
•
•
•
•
•
•
Theta join
Natural join
Equi-join
Semi-join
Inner join
Outer join
etc.
41
Relational Algebra at a Glance
• Operators: relations as input, new relation as output
• Five basic RA operations:
– Basic Set Operations
• union, difference (no intersection, no complement)
– Selection: s
– Projection: p
– Cartesian Product: X
• When our relations have attribute names:
– Renaming: r
• Derived operations:
– Intersection, complement
– Joins (natural,equi-join, theta join, semi-join)
42
Building Complex Expressions
•
Example
–
•
•
•
in arithmetic algebra:
(x + 4)*(y - 3)
Relational algebra allows the same.
That is, we can build complex expressions
using relational algebra operators
Three notations, just as in arithmetic:
1. Sequences of assignment statements.
2. Expressions with several operators.
3. Expression trees.
43
Sequences of Assignments
• Create temporary relation names.
• Renaming can be implied by giving relations a
list of attributes.
• Example: R3 := R1 JOINC R2 can be written:
R4 := R1 * R2
R3 := SELECTC (R4)
44
Expressions with Several Operators
•
•
Example: the theta-join R3 := R1 JOINC R2 can
be written: R3 := SELECTC (R1 * R2)
Precedence of relational operators:
1. Unary operators --- select, project, rename --- have
highest precedence, bind first.
2. Then come products and joins.
3. Then intersection.
4. Finally, union and set difference bind last.
 But you can always insert parentheses to force
the order you desire.
45
Expression Trees
• Leaves are operands --- either variables standing
for relations or particular, constant relations.
• Interior nodes are operators, applied to their child
or children.
46
Example
• Using the relations Bars(name, addr) and
Sells(bar, beer, price), find the names of all the
bars that are either on Maple St. or sell Bud for
less than $3.
Sells( bar,
Joe’s
Joe’s
Sue’s
Sue’s
beer,
Bud
Miller
Bud
Coors
price )
2.50
2.75
2.50
3.00
Bars(name, addr
)
Joe’s Maple St.
Sue’s River Rd.
47
As a Tree:
• Using the relations Bars(name, addr) and Sells(bar, beer,
price), find the names of all the bars that are either on
Maple St. or sell Bud for less than $3. name
UNION
Joe’s
Sue’s
name
RENAMER(name)
Joe’s
Joe’s
Sue’s
PROJECTname
PROJECTbar
name
address
Joe’s
Maple St.
SELECTaddr = “Maple St.”
Bars
bar
SELECT
bar
beer price
Joe’s
Bud
2.5
Sue’s Bud
2.5
price<3 AND beer=“Bud”
Sells
48
Example
• Using Sells(bar, beer, price), find the bars that
sell two different beers at the same price.
Sells( bar,
Joe’s
Joe’s
Sue’s
Joe’s
beer,
Bud
Miller
Bud
Coors
price )
2.50
2.75
2.50
2.50
49
Strategy
• Using Sells(bar, beer, price), find the bars that
sell two different beers at the same price.
Sells( bar,
Joe’s
Joe’s
Sue’s
Joe’s
beer,
Bud
Miller
Bud
Coors
price )
2.50
2.75
2.50
2.50
S( bar,
Joe’s
Joe’s
Sue’s
Joe’s
beer1, price )
Bud 2.50
Miller 2.75
Bud 2.50
Coors 2.50
• Sells natural-join S = (bar, beer, beer1, price)
– need a selection condition though. What is it?
50
Strategy Summary
• By renaming, define a copy of Sells, called
S(bar, beer1, price). The natural join of Sells
and S consists of quadruples (bar, beer, beer1,
price) such that the bar sells both beers at this
price.
51
The Tree
PROJECTbar
SELECTbeer != beer1
JOIN
RENAMES(bar, beer1, price)
Sells
Sells
52
Complex Queries
Product ( pid, name, price, category, maker-cid)
Purchase (buyer-ssn, seller-ssn, store, pid)
Company (cid, name, stock price, country)
Person(ssn, name, phone, city)
Find phone numbers of people who bought gizmos from Fred.
Find telephony products that somebody bought
53
Expression Tree
P phone
buyer-ssn=ssn
pid=pid
seller-ssn=ssn
P ssn
Person
Purchase
P pid
sname=fred
sname=gizmo
Person
Product
54
Exercises
Product ( pid, name, price, category, maker-cid)
Purchase (buyer-ssn, seller-ssn, store, pid)
Company (cid, name, stock price, country)
Person(ssn, name, phone number, city)
Ex #1: Find people who bought telephony products.
Ex #2: Find names of people who bought American products
55
Exercises
Product ( pid, name, price, category, maker-cid)
Purchase (buyer-ssn, seller-ssn, store, pid)
Company (cid, name, stock price, country)
Person(ssn, name, phone number, city)
Ex #3: Find names of people who bought American products and did
not buy French products
Ex #4: Find names of people who bought American products and they
live in Madison.
56
Exercises
Product ( pid, name, price, category, maker-cid)
Purchase (buyer-ssn, seller-ssn, store, pid)
Company (cid, name, stock price, country)
Person(ssn, name, phone number, city)
Ex #5: Find people who bought stuff from Joe or bought products
from a company whose stock prices is more than $50.
57
Operations on Bags
(and why we care)
• Union: {a,b,b,c} U {a,b,b,b,e,f,f} = {a,a,b,b,b,b,b,c,e,f,f}
– add the number of occurrences
• Difference: {a,b,b,b,c,c} – {b,c,c,c,d} = {a,b,b,d}
– subtract the number of occurrences
• Intersection: {a,b,b,b,c,c} {b,b,c,c,c,c,d} = {b,b,c,c}
– minimum of the two numbers of occurrences
• Selection: preserve the number of occurrences
• Projection: preserve the number of occurrences (no duplicate
elimination)
• Cartesian product, join: no duplicate elimination
58
Summary of Relational Algebra
• Why bother ? Can write any RA expression
directly in C++/Java, seems easy.
• Two reasons:
– Each operator admits sophisticated implementations
(think of
, s C)
– Expressions in relational algebra can be rewritten:
optimized
59
Example
Sells( bar,
Joe’s
Joe’s
Sue’s
Sue’s
beer,
Bud
Miller
Bud
Coors
price )
2.50
2.75
2.50
3.00
Bars( bar, addr
)
Joe’s Maple St.
Sue’s River Rd.
Query: Find all locations that sell beer for less than 2.75.
Select addr
From Sells, Bars
Where (Sells.bar = Bars.bar) and (Sells.price < 2.75)
60
Example
PROJECTBars.addr
SELECTSells.price < 2.75
JOIN
Sells
Sells.bar = Bars.bar
Bars
61
What is an “Algebra”
• Mathematical system consisting of:
– Operands --- variables or values from which new
values can be constructed.
– Operators --- symbols denoting procedures that
construct new values from given values.
62
What is Relational Algebra?
• An algebra whose operands are relations or
variables that represent relations.
• Operators are designed to do the most common
things that we need to do with relations in a
database.
– The result is an algebra that can be used as a query
language for relations.
63
Glimpse Ahead: Efficient
Implementations of Operators
• s(age >= 30 AND age <= 35)(Employees)
– Method 1: scan the file, test each employee
– Method 2: use an index on age
– Which one is better ? Depends a lot…
• Employees
–
–
–
–
–
Relatives
Iterate over Employees, then over Relatives
Iterate over Relatives, then over Employees
Sort Employees, Relatives, do “merge-join”
“hash-join”
etc
64
Glimpse Ahead: Optimizations
Product ( pid, name, price, category, maker-cid)
Purchase (buyer-ssn, seller-ssn, store, pid)
Person(ssn, name, phone number, city)
• Which is better:
sprice>100(Product) (Purchase
(sprice>100(Product) Purchase)
scity=seaPerson)
scity=seaPerson
• Depends ! This is the optimizer’s job…
65
Finally: RA has Limitations !
• Cannot compute “transitive closure”
Name1
Name2
Relationship
Fred
Mary
Father
Mary
Joe
Cousin
Mary
Bill
Spouse
Nancy
Lou
Sister
• Find all direct and indirect relatives of Fred
• Cannot express in RA !!! Need to write C program
66
Apply the Lessons You Have Learned
• Suppose you work at Facebook
• Now in charge of helping FB applications
manage the FB friendship graph
• What should you do?
67
Download