Relational Algebra 1 Motivation • Write a Java program • Translate it into a program in assembly language • Execute the assembly language program • As a rough analogy, here we write a SQL query • Translate it into a program in relational algebra language • Execute the program 2 Example Sells( bar, Joe’s Joe’s Sue’s Sue’s beer, Bud Miller Bud Coors price ) 2.50 2.75 2.50 3.00 Bars( bar, addr ) Joe’s Maple St. Sue’s River Rd. Query: Find all locations that sell beer for less than 2.75. Select addr From Sells, Bars Where (Sells.bar = Bars.bar) and (Sells.price < 2.75) 3 Translating into a program in RA PROJECTBars.addr SELECTSells.price < 2.75 JOIN Sells Sells.bar = Bars.bar Bars 4 Relational Algebra at a Glance • Operators: relations as input, new relation as output • Five basic RA operations: – Basic Set Operations • union, difference (no intersection, no complement) – Selection: s – Projection: p – Cartesian Product: X • When our relations have attribute names: – Renaming: r • Derived operations: – Intersection, complement – Joins (natural,equi-join, theta join, semi-join) 5 Five Basic RA Operations 6 Set Operations • Union, difference • Binary operations 7 Set Operations: Union • • • • • Union: all tuples in R1 or R2 Notation: R1 U R2 R1, R2 must have the same schema R1 U R2 has the same schema as R1, R2 Example: – ActiveEmployees U RetiredEmployees 8 Set Operations: Difference • • • • • Difference: all tuples in R1 and not in R2 Notation: R1 – R2 R1, R2 must have the same schema R1 - R2 has the same schema as R1, R2 Example – AllEmployees - RetiredEmployees 9 Selection • • • • • Returns all tuples which satisfy a condition Notation: sc(R) c is a condition: =, <, >, and, or, not Output schema: same as input schema Find all employees with salary more than $40,000: – sSalary > 40000 (Employee) 10 Selection Example Employee SSN 999999999 777777777 888888888 Name John Tony Alice DepartmentID 1 1 2 Salary 30,000 32,000 45,000 Find all employees with salary more than $40,000. s Salary > 40000 (Employee) SSN Name 888888888 Alice DepartmentID 2 Salary 45,000 11 Ullman: Selection • R1 := SELECTC (R2) – C is a condition (as in “if” statements) that refers to attributes of R2. – R1 is all those tuples of R2 that satisfy C. 12 Example Relation Sells: bar Joe’s Joe’s Sue’s Sue’s beer Bud Miller Bud Miller price 2.50 2.75 2.50 3.00 JoeMenu := SELECTbar=“Joe’s”(Sells): bar beer price Joe’s Bud 2.50 Joe’s Miller 2.75 13 Projection • • • • • • • Unary operation: returns certain columns Eliminates duplicate tuples ! Notation: P A1,…,An (R) Input schema R(B1,…,Bm) Condition: {A1, …, An} {B1, …, Bm} Output schema S(A1,…,An) Example: project social-security number and names: – P SSN, Name (Employee) 14 Projection Example Employee SSN 999999999 777777777 888888888 Name John Tony Alice DepartmentID 1 1 2 Salary 30,000 32,000 45,000 P SSN, Name (Employee) SSN 999999999 777777777 888888888 Name John Tony Alice 15 Projection • R1 := PROJL (R2) – L is a list of attributes from the schema of R2. – R1 is constructed by looking at each tuple of R2, extracting the attributes on list L, in the order specified, and creating from those components a tuple for R1. – Eliminate duplicate tuples, if any. 16 Example Relation Sells: bar Joe’s Joe’s Sue’s Sue’s beer Bud Miller Bud Miller price 2.50 2.75 2.50 3.00 Prices := PROJbeer,price(Sells): beer price Bud 2.50 Miller 2.75 Miller 3.00 17 Cartesian Product • • • • • • • • Each tuple in R1 with each tuple in R2 Notation: R1 x R2 Input schemas R1(A1,…,An), R2(B1,…,Bm) Condition: {A1,…,An} {B1,…Bm} = F Output schema is S(A1, …, An, B1, …, Bm) Notation: R1 x R2 Example: Employee x Dependents Very rare in practice; but joins are very common 18 Cartesian Product Example Employee Name John Tony SSN 999999999 777777777 Dependents EmployeeSSN 999999999 777777777 Dname Emily Joe Employee x Dependents Name SSN EmployeeSSN John 999999999 999999999 John 999999999 777777777 Tony 777777777 999999999 Tony 777777777 777777777 Dname Emily Joe Emily Joe 19 Product • R3 := R1 * R2 – Pair each tuple t1 of R1 with each tuple t2 of R2. – Concatenation t1t2 is a tuple of R3. – Schema of R3 is the attributes of R1 and R2, in order. – But beware attribute A of the same name in R1 and R2: use R1.A and R2.A. 20 Example: R3 := R1 * R2 R1( A, 1 3 B) 2 4 R2( B, 5 7 9 C ) 6 8 10 R3( A, 1 1 1 3 3 3 R1.B, 2 2 2 4 4 4 R2.B, 5 7 9 5 7 9 C ) 6 8 10 6 8 10 21 Renaming • • • • • • Does not change the relational instance Changes the relational schema only Notation: r B1,…,Bn (R) Input schema: R(A1, …, An) Output schema: S(B1, …, Bn) Example: rLastName, SocSocNo (Employee) 22 Renaming Example Employee Name John Tony SSN 999999999 777777777 rLastName, SocSocNo (Employee) LastName John Tony SocSocNo 999999999 777777777 23 Renaming • The RENAME operator gives a new schema to a relation. • R1 := RENAMER1(A1,…,An)(R2) makes R1 be a relation with attributes A1,…,An and the same tuples as R2. • Simplified notation: R1(A1,…,An) := R2. 24 Example Bars( name, addr ) Joe’s Maple St. Sue’s River Rd. R(bar, addr) := Bars R( bar, addr ) Joe’s Maple St. Sue’s River Rd. 25 Derived RA Operations 1) Intersection 2) Most importantly: Join 26 Set Operations: Intersection • • • • • All tuples both in R1 and in R2 Notation: R1 R2 R1, R2 must have the same schema R1 R2 has the same schema as R1, R2 Example – UnionizedEmployees RetiredEmployees • Intersection is derived: – R1 R2 = R1 – (R1 – R2) why ? 27 Joins • • • • • • • Theta join Natural join Equi-join Semi-join Inner join Outer join etc. 28 Theta Join • • • • • • A join that involves a predicate Notation: R1 where q is a condition q R2 Input schemas: R1(A1,…,An), R2(B1,…,Bm) {A1,…An} {B1,…,Bm} = f Output schema: S(A1,…,An,B1,…,Bm) Derived operator: R1 q R2 = s q (R1 x R2) 29 Theta-Join • R3 := R1 JOINC R2 – Take the product R1 * R2. – Then apply SELECTC to the result. • As for SELECT, C can be any boolean-valued condition. – Historic versions of this operator allowed only A theta B, where theta was =, <, etc.; hence the name “thetajoin.” 30 Example Sells( bar, Joe’s Joe’s Sue’s Sue’s beer, Bud Miller Bud Coors price ) 2.50 2.75 2.50 3.00 BarInfo := Sells JOIN BarInfo( bar, Joe’s Joe’s Sue’s Sue’s Bars( name, addr ) Joe’s Maple St. Sue’s River Rd. Sells.bar = Bars.name beer, Bud Miller Bud Coors price, 2.50 2.75 2.50 3.00 Bars name, addr ) Joe’s Maple St. Joe’s Maple St. Sue’s River Rd. Sue’s River Rd. 31 Natural Join • Notation: R1 R2 • Input Schema: R1(A1, …, An), R2(B1, …, Bm) • Output Schema: S(C1,…,Cp) – Where {C1, …, Cp} = {A1, …, An} U {B1, …, Bm} • Meaning: combine all pairs of tuples in R1 and R2 that agree on the attributes: – {A1,…,An} {B1,…, Bm} (called the join attributes) • Equivalent to a cross product followed by selection • Example Employee Dependents 32 Natural Join Example Employee Name John Tony SSN 999999999 777777777 Dependents SSN 999999999 777777777 Dname Emily Joe Employee Dependents = PName, SSN, Dname(s SSN=SSN2(Employee x rSSN2, Dname(Dependents)) Name John Tony SSN 999999999 777777777 Dname Emily Joe 33 Natural Join • R= • R A B B C X Y Z U X Z V W Y Z Z V Z V S = S= A B C X Z U X Z V Y Z U Y Z V Z V W 34 Natural Join • Given the schemas R(A, B, C, D), S(A, C, E), what is the schema of R S ? • Given R(A, B, C), S(D, E), what is R • Given R(A, B), S(A, B), what is R S ? S ? 35 Natural Join • A frequent type of join connects two relations by: – Equating attributes of the same name, and – Projecting out one copy of each pair of equated attributes. • Called natural join. • Denoted R3 := R1 JOIN R2. 36 Example Sells( bar, Joe’s Joe’s Sue’s Sue’s beer, Bud Miller Bud Coors price ) 2.50 2.75 2.50 3.00 Bars( bar, addr ) Joe’s Maple St. Sue’s River Rd. BarInfo := Sells JOIN Bars Note Bars.name has become Bars.bar to make the natural join “work.” BarInfo( bar, Joe’s Joe’s Sue’s Sue’s beer, Bud Milller Bud Coors price, 2.50 2.75 2.50 3.00 addr ) Maple St. Maple St. River Rd. River Rd. 37 Equi-join • Most frequently used in practice: R1 A=B R2 • Natural join is a particular case of equi-join • A lot of research on how to do it efficiently 38 Semijoin • R S = P A1,…,An (R S) • Where the schemas are: – Input: R(A1,…An), S(B1,…,Bm) – Output: T(A1,…,An) 39 Semijoin Applications in distributed databases: • Product(pid, cid, pname, ...) at site 1 • Company(cid, cname, ...) at site 2 • Query: sprice>1000(Product) cid=cid Company • Compute as follows: T1 = sprice>1000(Product) T2 = Pcid(T1) send T2 to site 2 T3 = T2 Company send T3 to site 1 Answer = T3 T1 site 1 site 1 (T2 smaller than T1) site 2 (semijoin) (T3 smaller than Company) site 1 (semijoin) 40 Joins • • • • • • • Theta join Natural join Equi-join Semi-join Inner join Outer join etc. 41 Relational Algebra at a Glance • Operators: relations as input, new relation as output • Five basic RA operations: – Basic Set Operations • union, difference (no intersection, no complement) – Selection: s – Projection: p – Cartesian Product: X • When our relations have attribute names: – Renaming: r • Derived operations: – Intersection, complement – Joins (natural,equi-join, theta join, semi-join) 42 Building Complex Expressions • Example – • • • in arithmetic algebra: (x + 4)*(y - 3) Relational algebra allows the same. That is, we can build complex expressions using relational algebra operators Three notations, just as in arithmetic: 1. Sequences of assignment statements. 2. Expressions with several operators. 3. Expression trees. 43 Sequences of Assignments • Create temporary relation names. • Renaming can be implied by giving relations a list of attributes. • Example: R3 := R1 JOINC R2 can be written: R4 := R1 * R2 R3 := SELECTC (R4) 44 Expressions with Several Operators • • Example: the theta-join R3 := R1 JOINC R2 can be written: R3 := SELECTC (R1 * R2) Precedence of relational operators: 1. Unary operators --- select, project, rename --- have highest precedence, bind first. 2. Then come products and joins. 3. Then intersection. 4. Finally, union and set difference bind last. But you can always insert parentheses to force the order you desire. 45 Expression Trees • Leaves are operands --- either variables standing for relations or particular, constant relations. • Interior nodes are operators, applied to their child or children. 46 Example • Using the relations Bars(name, addr) and Sells(bar, beer, price), find the names of all the bars that are either on Maple St. or sell Bud for less than $3. Sells( bar, Joe’s Joe’s Sue’s Sue’s beer, Bud Miller Bud Coors price ) 2.50 2.75 2.50 3.00 Bars(name, addr ) Joe’s Maple St. Sue’s River Rd. 47 As a Tree: • Using the relations Bars(name, addr) and Sells(bar, beer, price), find the names of all the bars that are either on Maple St. or sell Bud for less than $3. name UNION Joe’s Sue’s name RENAMER(name) Joe’s Joe’s Sue’s PROJECTname PROJECTbar name address Joe’s Maple St. SELECTaddr = “Maple St.” Bars bar SELECT bar beer price Joe’s Bud 2.5 Sue’s Bud 2.5 price<3 AND beer=“Bud” Sells 48 Example • Using Sells(bar, beer, price), find the bars that sell two different beers at the same price. Sells( bar, Joe’s Joe’s Sue’s Joe’s beer, Bud Miller Bud Coors price ) 2.50 2.75 2.50 2.50 49 Strategy • Using Sells(bar, beer, price), find the bars that sell two different beers at the same price. Sells( bar, Joe’s Joe’s Sue’s Joe’s beer, Bud Miller Bud Coors price ) 2.50 2.75 2.50 2.50 S( bar, Joe’s Joe’s Sue’s Joe’s beer1, price ) Bud 2.50 Miller 2.75 Bud 2.50 Coors 2.50 • Sells natural-join S = (bar, beer, beer1, price) – need a selection condition though. What is it? 50 Strategy Summary • By renaming, define a copy of Sells, called S(bar, beer1, price). The natural join of Sells and S consists of quadruples (bar, beer, beer1, price) such that the bar sells both beers at this price. 51 The Tree PROJECTbar SELECTbeer != beer1 JOIN RENAMES(bar, beer1, price) Sells Sells 52 Complex Queries Product ( pid, name, price, category, maker-cid) Purchase (buyer-ssn, seller-ssn, store, pid) Company (cid, name, stock price, country) Person(ssn, name, phone, city) Find phone numbers of people who bought gizmos from Fred. Find telephony products that somebody bought 53 Expression Tree P phone buyer-ssn=ssn pid=pid seller-ssn=ssn P ssn Person Purchase P pid sname=fred sname=gizmo Person Product 54 Exercises Product ( pid, name, price, category, maker-cid) Purchase (buyer-ssn, seller-ssn, store, pid) Company (cid, name, stock price, country) Person(ssn, name, phone number, city) Ex #1: Find people who bought telephony products. Ex #2: Find names of people who bought American products 55 Exercises Product ( pid, name, price, category, maker-cid) Purchase (buyer-ssn, seller-ssn, store, pid) Company (cid, name, stock price, country) Person(ssn, name, phone number, city) Ex #3: Find names of people who bought American products and did not buy French products Ex #4: Find names of people who bought American products and they live in Madison. 56 Exercises Product ( pid, name, price, category, maker-cid) Purchase (buyer-ssn, seller-ssn, store, pid) Company (cid, name, stock price, country) Person(ssn, name, phone number, city) Ex #5: Find people who bought stuff from Joe or bought products from a company whose stock prices is more than $50. 57 Operations on Bags (and why we care) • Union: {a,b,b,c} U {a,b,b,b,e,f,f} = {a,a,b,b,b,b,b,c,e,f,f} – add the number of occurrences • Difference: {a,b,b,b,c,c} – {b,c,c,c,d} = {a,b,b,d} – subtract the number of occurrences • Intersection: {a,b,b,b,c,c} {b,b,c,c,c,c,d} = {b,b,c,c} – minimum of the two numbers of occurrences • Selection: preserve the number of occurrences • Projection: preserve the number of occurrences (no duplicate elimination) • Cartesian product, join: no duplicate elimination 58 Summary of Relational Algebra • Why bother ? Can write any RA expression directly in C++/Java, seems easy. • Two reasons: – Each operator admits sophisticated implementations (think of , s C) – Expressions in relational algebra can be rewritten: optimized 59 Example Sells( bar, Joe’s Joe’s Sue’s Sue’s beer, Bud Miller Bud Coors price ) 2.50 2.75 2.50 3.00 Bars( bar, addr ) Joe’s Maple St. Sue’s River Rd. Query: Find all locations that sell beer for less than 2.75. Select addr From Sells, Bars Where (Sells.bar = Bars.bar) and (Sells.price < 2.75) 60 Example PROJECTBars.addr SELECTSells.price < 2.75 JOIN Sells Sells.bar = Bars.bar Bars 61 What is an “Algebra” • Mathematical system consisting of: – Operands --- variables or values from which new values can be constructed. – Operators --- symbols denoting procedures that construct new values from given values. 62 What is Relational Algebra? • An algebra whose operands are relations or variables that represent relations. • Operators are designed to do the most common things that we need to do with relations in a database. – The result is an algebra that can be used as a query language for relations. 63 Glimpse Ahead: Efficient Implementations of Operators • s(age >= 30 AND age <= 35)(Employees) – Method 1: scan the file, test each employee – Method 2: use an index on age – Which one is better ? Depends a lot… • Employees – – – – – Relatives Iterate over Employees, then over Relatives Iterate over Relatives, then over Employees Sort Employees, Relatives, do “merge-join” “hash-join” etc 64 Glimpse Ahead: Optimizations Product ( pid, name, price, category, maker-cid) Purchase (buyer-ssn, seller-ssn, store, pid) Person(ssn, name, phone number, city) • Which is better: sprice>100(Product) (Purchase (sprice>100(Product) Purchase) scity=seaPerson) scity=seaPerson • Depends ! This is the optimizer’s job… 65 Finally: RA has Limitations ! • Cannot compute “transitive closure” Name1 Name2 Relationship Fred Mary Father Mary Joe Cousin Mary Bill Spouse Nancy Lou Sister • Find all direct and indirect relatives of Fred • Cannot express in RA !!! Need to write C program 66 Apply the Lessons You Have Learned • Suppose you work at Facebook • Now in charge of helping FB applications manage the FB friendship graph • What should you do? 67