Relational Algebra – Basis for Relational Query Languages Based on presentation by Juliana Freire Formal relational query languages Is this the Algebra you know? Algebra -> operators and atomic operands Expressions -> applying operators to atomic operands and/or other expressions Algebra of arithmetic: operands are variables and constants, and operators are the usual arithmetic operators E.g., (x+y)*2 or ((x+7)/(y-3)) + x Relational algebra: operands are variables that stand for relations and relations (sets of tuples), and operations include union, intersection, selection, projection, Cartesian product, etc – E.g., (π c-ownerChecking-account) ∩ (π s-ownerSavings-account) What is a query? A query is applied to relation instances, and the result of a query is also a relation instance. (view, query) – Schemas of input and output fixed, but instances not. • Operators refer to relation attributes by position or name: – E.g., Account(number, owner, balance, type) – Positional notation easier for formal definitions, named-field notation more readable. – Both used in SQL Relational Algebra Operations The usual set operations: union, intersection, difference •Operations that remove parts of relations: selection, projection •Operations that combine tuples from two relations: Cartesian product, join •Since each operation returns a relation, operations can be composed! Removing Parts of Relations • Selection – rows • Projection - columns Selection: Example Another selection Example of Projection Projection removes duplicates Set Operations • Union • Intersection • Difference What happens when sets unite? Union Operation – Example • Relations r, s:A B A B 1 2 2 3 1 s r r s: A B 1 2 1 3 Union Example Union Compatibility Intersection Set Difference Operation – Example • Relations r, s:A B A B 1 2 2 3 1 s r r – s: A B 1 1 Difference Another way to show intersection? Summary so far: • • • • • • E1 U E2 : union E1 - E2 : difference E1 x E2 : cartesian product c(E1) : select rows, c = condition (book has p for predicate) IIs(E1) : project columns : s =selected columns x(c1,c2) (E1) : rename, x is new name of E1, c1 is new name of column Combining Tuples of Two Relations • Cross product (Cartesian product) • Joins Cartesian-Product Operation – Example Relations r, s: A B C D E 1 2 10 10 20 10 a a b b r s r x s: A B C D E 1 1 1 1 2 2 2 2 10 10 20 10 10 10 20 10 a a b b a a b b Cross Product Example Cross Product • How to resolve???? Renaming operator: Rename whole relation: Teacher X secondteacher(Teacher) Teacher.t-num, Teacher.t-name, secondteacher.t-num, secondteacher.t-name OR rename attribute before combining: Teacher X secondteacher(t-num2, t-name2)(Teacher) t-num, t-name, t-num2, t-name2 OR rename after combining c(t-num1, t-name1, t-num2, t-name2)(Teacher X Teacher) t-num1, t-name1, t-num2, t-name2 Join: Example Join : Example Condition Join Equi and Natural Join Divide operator Divide Operation Divide Definition When to divide? Division Example Dividing without division sign Working out an example Assignment operation Why would we use Relational Algebra????? Equivalencies help ER vs RA • Both ER and the Relational Model can be used to model the structure of a database. • Why is it the case that there are only Relational Databases and no ER databases? RA vs Full Programming Language • Relational Algebra is not Turing complete. There are operations that cannot be expressed in relational algebra. • What is the advantage of using this language to query a database? Summary of Operators updated • • • • • • • Summary so far: E1 U E2 : union E1 - E2 : difference E1 x E2 : cartesian product c(E1) : select rows, c = condition (book has p for predicate) IIs(E1) : project columns : s =selected columns x(c1,c2) (E1) : rename, x is new name of E1, c1 is new name of column • E1 E2 : division • E1 E2 : join, c = match condition Practice • Find names of stars who’ve appeared in a 1994 movie • Information about movie year available in Movies; so need an extra join: σyear=1994(πname(Stars ⋈ AppearIn ⋈ Movies)) • A more efficient solution: πname(Stars ⋈ AppearIn ⋈ (σyear=1994( Movies)) • An even more efficient solution: πname(Stars ⋈ πname(AppearIn ⋈ (πmovieIdσyear=1994(Movies))) A query optimizer can find this, given the first solution! Extended Relational Algebra Operations • Generalized projection • Outer join • Aggregate functions Generalized projection – calculate fields Aggregate Operation – Example • Relatio n r: g sum(c) (r) A B C 7 sum(c ) 27 7 3 10 Aggregate • Functions on more than one tuple • Samples: – – – – – – Sum Count-distinct Max Min Count Avg • Use “as” to rename branchname g sum(balance) as totalbalance (account) Aggregate Operation – Example • Relation account by branch-name: branch_namegrouped account_number balance Perryridge Perryridge Brighton Brighton Redwood branch_name g A-102 A-201 A-217 A-215 A-222 400 900 750 750 700 sum(balance) (account) branch_name Perryridge Brighton Redwood sum(balance) 1300 1500 700 Outer Join • Keep the outer side even if no join • Fill in missing fields with nulls Outer Join – Example • Relation loan loan_number branch_name L-170 L-230 L-260 Downtown Redwood Perryridge amount 3000 4000 1700 Relation borrower customer_name loan_number Jones Smith Hayes L-170 L-230 L-155 Outer Join – Example • Inner Join loan Borrower loan_number branch_name L-170 L-230 Downtown Redwood amount customer_name 3000 4000 Jones Smith Left Outer Join loan Borrower loan_number branch_name L-170 L-230 L-260 Downtown Redwood Perryridge amount customer_name 3000 4000 1700 Jones Smith null Outer Join – Example Right Outer Join loan borrower loan_number branch_name L-170 L-230 L-155 Downtown Redwood null amount customer_name 3000 4000 null Jones Smith Hayes Full Outer Join loan borrower loan_number branch_name L-170 L-230 L-260 L-155 Downtown Redwood Perryridge null amount customer_name 3000 4000 1700 null Jones Smith null Hayes Summary of Operators - Full • • • • • • E1 U E2 : union E1 - E2 : difference E1 x E2 : cartesian product c(E1) : select rows, c = condition (book has p for predicate) IIs(E1) : project columns : s =selected columns separated by commas, can have calculations included x(c1,c2) (E1) : rename, x is new name of E1, c1 is new name of column • E1 E2 : division • E1 • • • E1 E2 : outer join, c = match condition, keep the side with the arrows : assignment – give a new name to an expression to make it easy to read as : rename a calculated column • attribute1g function (attribute2) (E1) : perform function on attribute2 whenever attribute1 changes E2 : join, c = match condition