Discussion #23: Relational Algebra

Topics
• Algebras
• Relational Algebra
  – use of standard notation
  – set operators ∪, ∩, −
  – renaming ρ
  – selection σ
  – projection π
  – cross product ×
  – join ⋈
• Queries (from English)
• Query optimization
• SQL

Relational Algebra
• What is an algebra?
  – a pair: (set of values, set of operations)
  – cf. ADT, type, class, object
    e.g. stack: (set of all stacks, {pop, push, top, …})
         integer: (set of all integers, {+, -, *, ÷})
• What is relational algebra?
  – (set of relations, set of relational operators)
  – {∪, ∩, −, ρ, σ, π, ×, ⋈}

Relational Algebra is Closed
• Closed: all operations produce values in the value set
  – (reals, {+, *, −}) closed
  – (reals, {+, *, −, ÷}) not closed (divide by 0)
  – (reals, {+, *, >}) not closed (T/F not in value set)
  – (computer reals, {+, *, −}) not closed (overflow, roundoff)
  – (relations, relational operators) closed
• Implication: we can always nest relational operators; we can't for algebras that are not closed.
  – e.g. after overflow, can do nothing
  – e.g. can't always nest: (2 < 3) + 5 = ?

Set Operations: ∪, ∩, and −
• Relations are sets; thus set operations should work.
• Examples:

    R = A B      S = A B
        1 2          2 2
        2 2          2 3
        2 3          4 2
                     5 5

    R ∪ S = A B      R ∩ S = A B      R − S = A B      S − R = A B
            1 2              2 2              1 2              4 2
            2 2              2 3                               5 5
            2 3
            4 2
            5 5

Set Operations (continued …)
• Definition: schema(R) = {A, B} = AB, i.e. the set of attributes
• We sometimes write R(AB) to mean the relation R with schema AB.
• Definition: union compatible
  – schema(R) = schema(S)
  – required precondition for ∪, ∩, −
• Definitions:
  – R ∪ S = { t | t ∈ R ∨ t ∈ S }
  – R ∩ S = { t | t ∈ R ∧ t ∈ S }
  – R − S = { t | t ∈ R ∧ t ∉ S }

Tuple Restriction: [X]
• Restriction is a tuple operator (not a relational operator).
• t[X] restricts tuple t to the attributes in X.
  Example:

        A B C
    t = 1 2 3      t[A] = (1)      t[AC] = (1,3)

    t = (1,2,3)
    t[A] = (1,2,3)[A] = {(A,1), (B,2), (C,3)}[A] = {(A,1)} = (1)

Renaming: ρ
• ρ_{A→B} R renames attribute A to be B.
  – A must be in schema(R)
  – B must not be in schema(R)
• Example: let

    R = A B      Q = A C
        1 2          2 2
        2 2          3 2
        2 3

  R ∪ Q = ?  Not union compatible.
• But with ρ:

    ρ_{C→B} Q = A B      R ∪ ρ_{C→B} Q = A B
                2 2                      1 2
                3 2                      2 3
                                         2 2
                                         3 2

Renaming (continued…)
• Q = ρ_{A→B} R renames attribute A to B; the result is Q.
• Precondition:
  – A ∈ schema(R)
  – B ∉ schema(R)
• Postcondition:
  – schema(Q) = (schema(R) − {A}) ∪ {B}
  – Q = { t' | ∃t (t ∈ R ∧ t' = (t − {(A, t[A])}) ∪ {(B, t[A])}) }

    R = { {(A,1), (C,2)}, {(A,2), (C,2)} }
    Q = ρ_{A→B} R = { {(B,1), (C,2)}, {(B,2), (C,2)} }

Selection: σ
• The selection operation selects the tuples that satisfy a condition.

    R = A B      σ_{A=1} R = A B      σ_{B=2} R = A B
        1 2                  1 2                  1 2
        2 2                                       2 2
        2 3

    σ_{A=2 ∧ B≥2} R = A B      σ_{A=3} R = A B
                      2 2                  (empty)
                      2 3

  Note: σ_{A=3} R is empty, but it still retains the schema.

    σ_{P} R = { t | t ∈ R ∧ P(t) }

  Meaning: apply predicate P to tuple t by substituting the appropriate values of t into P.
• Precondition: each attribute mentioned in P must be in schema(R).
• Postcondition:
  – σ_{P} R = { t | t ∈ R ∧ P(t) }
  – schema(σ_{P} R) = schema(R)

Projection: π
The projection operation restricts the tuples in a relation to the attributes designated in the operation.

    R = A B      π_{A} R = A      π_{B} R = B
        1 2                1                2
        2 2                2                3
        2 3

    π_{AB} R = R = π_{A,B} R = π_{{A,B}} R

    Q = A B C      π_{BC} Q = B C
        1 1 1                 1 1
        2 1 1                 4 5
        3 4 5

Precondition: X ⊆ schema(R)
Postcondition:
  π_{X} R = { t' | ∃t (t ∈ R ∧ t' = t[X]) }
  schema(π_{X} R) = X

Cross Product: ×
Standard cartesian product adapted for relational algebra.

    R = A B      S = C D      R × S = A B C D
        1 2          1 1              1 2 1 1
        2 2          2 2              1 2 2 2
                     3 3              1 2 3 3
                                      2 2 1 1
                                      2 2 2 2
                                      2 2 3 3

Cross Product (continued…)
Precondition: schema(R) ∩ schema(S) = ∅
Postcondition:
  R × S = { t | ∃t' ∃t'' (t' ∈ R ∧ t'' ∈ S ∧ t = t' ∪ t'') }
  schema(R × S) = schema(R) ∪ schema(S)

    R = A B                S = C D
        1 2   = t'             1 1
        2 2                    2 2
                               3 3   = t''

    t' = { (A,1), (B,2) }
    t'' = { (C,3), (D,3) }
    t' ∪ t'' = { (A,1), (B,2), (C,3), (D,3) }

Cross Product (continued…)
What if R and S have the same attribute, e.g. A?

    R = A B                                S = C A
        1 2   = t' = { (A,1), (B,2) }          1 1   = t'' = { (C,1), (A,1) }
        2 2                                    2 2
                                               3 3   = t''' = { (C,3), (A,3) }

    t' ∪ t'' = { (A,1), (B,2), (C,1), (A,1) }

Attribute A occurs twice, so we can't do the cross product.
Solution: rename, ρ_{A→A'} S.

    R × ρ_{A→A'} S = A B C A'
                     1 2 1 1
                     1 2 2 2
                     1 2 3 3
                     2 2 1 1
                     2 2 2 2
                     2 2 3 3

Natural Join: ⋈

    R = A B      S = B C      R ⋈ S = A B C
        1 2          1 2              1 2 1
        2 2          2 1              2 2 1
                     3 2

    R ⋈ S = π_{ABC} σ_{B=B'} (R × ρ_{B→B'} S)

  Reading from the inside out: renaming (ρ), cross product (×), selection (σ), projection (π).

    R × ρ_{B→B'} S = A B B' C
                     1 2 1  2
                     1 2 2  1    ← B = B'
                     1 2 3  2
                     2 2 1  2
                     2 2 2  1    ← B = B'
                     2 2 3  2

Join (continued …)
• In general, we can equate 0, 1, 2, or more attributes using ⋈.
• A join is defined as:
    schema(R ⋈ S) = schema(R) ∪ schema(S)
    R ⋈ S = { t | t[schema(R)] ∈ R ∧ t[schema(S)] ∈ S }
• There are no preconditions: join always works.
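
The definitions above can be tried out in a few lines of Python. This is only a sketch under stated assumptions, not part of the original slides: a relation is modeled as a set of tuples, each tuple a frozenset of (attribute, value) pairs as in the { (A,1), (B,2) } notation, and the function names (relation, restrict, rename, select, project, cross, natural_join) are made up for illustration. One simplification to keep in mind: an empty relation here does not remember its schema.

```python
# Sketch only: relations as sets of frozensets of (attribute, value) pairs.
# All names below are illustrative assumptions, not a standard API.

def relation(schema, rows):
    """Build a relation from attribute names (e.g. 'AB') and rows of values."""
    return {frozenset(zip(schema, row)) for row in rows}

def restrict(t, attrs):
    """Tuple restriction t[X]: keep only the attributes in attrs."""
    return frozenset((a, v) for a, v in t if a in attrs)

def rename(rel, old, new):
    """Renaming: attribute `old` becomes `new` in every tuple."""
    return {frozenset((new if a == old else a, v) for a, v in t) for t in rel}

def select(rel, pred):
    """Selection: keep the tuples for which pred(tuple-as-dict) is true."""
    return {t for t in rel if pred(dict(t))}

def project(rel, attrs):
    """Projection: restrict every tuple to attrs (duplicates collapse)."""
    return {restrict(t, attrs) for t in rel}

def cross(r, s):
    """Cross product: assumes the two schemas are disjoint."""
    return {t1 | t2 for t1 in r for t2 in s}

def natural_join(r, s):
    """Join: keep t' ∪ t'' when t' and t'' agree on all shared attributes."""
    out = set()
    for t1 in r:
        for t2 in s:
            d1, d2 = dict(t1), dict(t2)
            if all(d1[a] == d2[a] for a in d1.keys() & d2.keys()):
                out.add(t1 | t2)
    return out

# The natural-join example from the slides.
R = relation("AB", [(1, 2), (2, 2)])
S = relation("BC", [(1, 2), (2, 1), (3, 2)])

direct = natural_join(R, S)
# Same join built from renaming, cross product, selection, projection.
composed = project(
    select(cross(R, rename(S, "B", "B'")), lambda t: t["B"] == t["B'"]),
    {"A", "B", "C"},
)
```

Both direct and composed should contain exactly the tuples (1,2,1) and (2,2,1) over schema ABC. The pairwise test in natural_join is equivalent to the set definition t[schema(R)] ∈ R ∧ t[schema(S)] ∈ S, because a tuple over schema(R) ∪ schema(S) is determined by its two restrictions.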

Join (continued…)
0 attributes in common (full cross product)

    R = A B      S = C D      R ⋈ S = A B C D
        1 1          1 1              1 1 1 1
        2 3          1 5              1 1 1 5
        4 1                           2 3 1 1
                                      2 3 1 5
                                      4 1 1 1
                                      4 1 1 5

1 attribute in common

    R = A B      S = B C      R ⋈ S = A B C
        1 2          1 1              1 2 2
        2 2          2 2              2 2 2
        2 3          3 3              2 3 3

2 attributes in common

    R = A B C      S = A B D      R ⋈ S = A B C D
        1 2 3          1 1 1              2 2 4 2
        2 2 4          2 2 2              2 2 4 1
        2 3 5          2 2 1

Join (continued…)
• We can use renaming to control the ⋈.

    R = A B      S = B C      S' = ρ_{C→A} S = B A  =  A B
        1 2          1 2                       1 2     2 1
        2 2          2 1                       2 1     1 2
                     3 2                       3 2     2 3

    R ⋈ ρ_{C→A} S = A B      R ⋈ S' = A B
                    1 2               1 2

• BTW, observe the equivalence with intersection: R and S' have the same schema, so R ⋈ S' = R ∩ S'.

Relational Algebra Expressions
• Relational operators are closed. Thus we can nest expressions:

    R = A B      S = B C D
        1 2          2 5 1
        3 4          2 7 2
                     3 2 3
                     4 5 4

    R ⋈ S = A B C D      π_{D} σ_{C=5} (R ⋈ S) = D
            1 2 5 1                              1
            1 2 7 2                              4
            3 4 5 4

• Unary operators have precedence over binary operators; binary operators are left associative.
• We can now do something very useful: ask and answer with relational algebra (almost) any query we can dream up.

Relational Algebra Queries
• List the prerequisites for EE200.

    π_{Prerequisite} σ_{Course='EE200'} cp = Prerequisite
                                             EE005
                                             CS100

• When does CS101 meet?

    π_{Day,Hour} σ_{Course='CS101'} cdh = Day Hour
                                          M   9AM
                                          W   9AM
                                          F   9AM

• When and where does EE200 meet?

    π_{Day,Hour,Room} σ_{Course='EE200'} (cdh ⋈ cr) = Day Hour Room
                                                      Tu  10AM 25 Ohm Hall
                                                      W   1PM  25 Ohm Hall
                                                      Th  10AM 25 Ohm Hall

  Our answers are in (cdh ⋈ cr). We select Course to be EE200. Then we project on Day, Hour, Room.

Queries (continued…)
• Where can I find Snoopy at 9 am on Monday?

  Relations involved (conditions on the right; * marks the attribute we want):
    snap(StudentID, Name, Address, Phone)    Name = 'Snoopy'
    csg(Course, StudentID, Grade)
    cr(Course, Room*)
    cdh(Course, Day, Hour)                   Day = 'M', Hour = '9AM'

    π_{Room} σ_{Name='Snoopy' ∧ Day='M' ∧ Hour='9AM'} (snap ⋈ csg ⋈ cr ⋈ cdh) = Room
                                                                                Turing Aud.

• Can we rewrite the query more optimally?
• What rules should we use?
  – Associativity and commutativity of join
  – Distributive laws for select and project
• What strategy should we use?
  – Eliminate unnecessary operations
  – Make joins as small as possible before execution

Query Optimization
• "Intuitively" we can write
    π_{Room} σ_{Name='Snoopy' ∧ Day='M' ∧ Hour='9AM'} (snap ⋈ csg ⋈ cr ⋈ cdh)
  as
    π_{Room} (σ_{Name='Snoopy'} snap ⋈ csg ⋈ cr ⋈ σ_{Day='M' ∧ Hour='9AM'} cdh)
• Why does this execute faster?
• What laws hold that will let us do this?
    R ⋈ S = S ⋈ R
    σ_{P1 ∧ P2} E = σ_{P1} σ_{P2} E
    σ_{P} (R ⋈ S) = R ⋈ σ_{P} S    (if all the attributes of P are in S)
• How do we know they hold?

Proofs for Laws
• To prove σ_{P1 ∧ P2} E = σ_{P1} σ_{P2} E, we need to prove that two sets are equal. We prove A = B by showing A ⊆ B ∧ B ⊆ A. We show that A ⊆ B by showing that x ∈ A ⇒ x ∈ B.
• Thus, we can do two proofs to prove σ_{P1 ∧ P2} E = σ_{P1} σ_{P2} E as follows:

    1. t ∈ σ_{P1 ∧ P2} E              [premise]
    2. t ∈ E ∧ (P1 ∧ P2)(t)           [def.: σ_{P} R = { t | t ∈ R ∧ P(t) }]
    3. t ∈ E ∧ P1(t) ∧ P2(t)          [identical substitutions & operations]
    4. t ∈ E ∧ P2(t) ∧ P1(t)          [∧ commutative]
    5. t ∈ σ_{P2} E ∧ P1(t)           [def. of σ]
    6. t ∈ σ_{P1} σ_{P2} E            [def. of σ]

    1. t ∈ σ_{P1} σ_{P2} E            [premise]
    2. …                              [just go backwards from 6 to 1 in the proof above]

Alternate Proof
(Derive the right-hand side from the left-hand side.)
Thus, we can prove σ_{P1 ∧ P2} E = σ_{P1} σ_{P2} E as follows:

    σ_{P1 ∧ P2} E
      = { t | t ∈ E ∧ (P1 ∧ P2)(t) }        [def.: σ_{P} R = { t | t ∈ R ∧ P(t) }]
      = { t | t ∈ E ∧ P1(t) ∧ P2(t) }       [identical substitutions & operations]
      = { t | t ∈ E ∧ P2(t) ∧ P1(t) }       [∧ commutative]
      = { t | t ∈ σ_{P2} E ∧ P1(t) }        [def. of σ]
      = { t | t ∈ σ_{P1} σ_{P2} E }         [def. of σ]
      = σ_{P1} σ_{P2} E                     [def. of a relation]

Proofs for Laws (continued …)
• To prove σ_{P} (R ⋈ S) = R ⋈ σ_{P} S, where all attributes of P are in S, we again need to prove that two sets are equal.
• As before, we can convert the lhs to the rhs.

    σ_{P} (R ⋈ S)
      = { t | t ∈ σ_{P} (R ⋈ S) }                                        [def. of a relation]
      = { t | t ∈ R ⋈ S ∧ P(t) }                                         [def.: σ_{P} R = { t | t ∈ R ∧ P(t) }]
      = { t | t[schema(R)] ∈ R ∧ t[schema(S)] ∈ S ∧ P(t) }               [def.: R ⋈ S = { t | t[schema(R)] ∈ R ∧ t[schema(S)] ∈ S }]
      = { t | t[schema(R)] ∈ R ∧ t[schema(S)] ∈ S ∧ P(t[schema(S)]) }    [all attributes of P are in S]
      = { t | t[schema(R)] ∈ R ∧ t[schema(S)] ∈ σ_{P} S }                [def. of σ]
      = { t | t ∈ R ⋈ σ_{P} S }                                          [def. of ⋈]
      = R ⋈ σ_{P} S                                                      [def. of a relation]

SQL Correspondence with Relational Algebra
Assume we have relations R(AB) and S(BC).

    π_{A} σ_{B=1} R
        select A
        from R
        where B = 1

    π_{B} R − π_{B} S
        select B from R
        except
        select B from S

    π_{A, R.B, C} σ_{R.B = S.B} (R × S) = R ⋈ S
        select A, R.B, C
        from R, S
        where R.B = S.B

SQL Correspondence with Relational Algebra (alternative SQL forms)
Assume we have relations R(AB) and S(BC).

    π_{B} R − π_{B} S
        select R.B
        from R
        where R.B not in (select S.B from S)

    R ⋈ S
        select *
        from R natural join S

SQL Queries
• List the prerequisites for EE200.

    select Prerequisite
    from cp
    where Course = 'EE200'

    Prerequisite
    EE005
    CS100

• When does CS101 meet?

    select Day, Hour
    from cdh
    where Course = 'CS101'

    Day Hour
    M   9AM
    W   9AM
    F   9AM

• When and where does EE200 meet?

    select cdh.Course, Day, Hour, Room
    from cdh, cr
    where cdh.Course = 'EE200' and cdh.Course = cr.Course

  or, with a natural join (the shared column Course is then written without a table qualifier):

    select Course, Day, Hour, Room
    from cdh natural join cr
    where Course = 'EE200'

    Course Day Hour Room
    EE200  Tu  10AM 25 Ohm Hall
    EE200  W   1PM  25 Ohm Hall
    EE200  Th  10AM 25 Ohm Hall

SQL Queries
• List all prerequisite courses.

    select Prerequisite
    from cp

    Prerequisite
    CS100
    EE005
    CS100
    CS101
    CS120
    CS101
    CS121
    CS205

  To eliminate the duplicates:

    select distinct Prerequisite
    from cp

    Prerequisite
    CS100
    CS101
    CS120
    CS121
    CS205
    EE005

SQL Queries
• Where can I find Snoopy at 9 am on Monday? (Answer: Room = Turing Aud.)

    select Room
    from snap, csg, cr, cdh
    where Name = 'Snoopy' and Day = 'M' and Hour = '9AM'
      and snap.StudentID = csg.StudentID
      and csg.Course = cr.Course
      and cr.Course = cdh.Course

• List all prereqs of CS750 (including prereqs of prereqs.)
  – Not possible with non-recursive SQL (unless the nesting depth is known)
  – Is possible with Datalog

    Rules:
      prereqOf(x, y) :- cp(y, x).
      prereqOf(x, y) :- prereqOf(x, z), cp(y, z).
    Query:
      prereqOf(x, 'CS750')?

• To gain more power and flexibility, we typically embed SQL in a high-level language.

SQL Queries
• List all prereqs of CS750 (including prereqs of prereqs.)

    select cp.Prerequisite
    from cp
    where cp.Course = 'CS750'
    union
    select cp1.Prerequisite
    from cp cp1, cp cp2
    where cp1.Course = cp2.Prerequisite and cp2.Course = 'CS750'
    union
    …
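
The chain of unions above has no fixed length, which is why a host language helps. As a rough sketch (an assumption about how one might do it, not something from the slides), the Python loop below repeats the "one more level of prerequisites" step until nothing new turns up, which mirrors the two Datalog rules. The function name prereqs_of and the sample cp rows are hypothetical; the real cp table is not given in full here.

```python
# Sketch only: cp is a set of (Course, Prerequisite) pairs.
# The function name and the sample data are hypothetical.

def prereqs_of(cp, course):
    """Return all direct and indirect prerequisites of `course`."""
    found = set()          # prerequisites discovered so far
    frontier = {course}    # courses whose prerequisites we still need
    while frontier:
        # One "join step": prerequisites of everything in the frontier,
        # minus those already seen (this also stops on cyclic data).
        new = {p for (c, p) in cp if c in frontier} - found
        found |= new
        frontier = new
    return found

# Made-up sample rows, for illustration only.
sample_cp = {
    ("CS750", "CS700"),
    ("CS700", "CS600"),
    ("CS600", "CS500"),
    ("EE200", "EE005"),
}

all_prereqs = prereqs_of(sample_cp, "CS750")
```

With the made-up rows above, all_prereqs holds CS700, CS600, and CS500. Each pass of the loop corresponds to one more select … union branch in the SQL, and the loop stops on its own once a pass adds nothing.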