DEFINITIONS

Super key: a single attribute or a group of attributes that uniquely identifies tuples in a table.

Candidate key: a minimal super key, i.e., a super key with the unneeded attributes removed. It can be just one attribute.

Primary key: the candidate key chosen to retrieve unique tuples. Each table can have only one. The primary key must be non-NULLable in order to maintain the entity integrity constraint.

Foreign key: a key that is primary in another (parent) table but is included in the instant (host) table to maintain referential integrity constraints.

Cardinality: the total number of values in a domain (i.e., the total number of values in a range).

Constraints:

1. Entity Constraint: refers to the use of a primary key to uniquely identify the tuples in a relation. This is violated if the key is set to NULL.
2. Key Constraint: refers to the uniqueness that a primary key must have across all tuples. It can be violated by an INSERT operation if the inserted tuple has a primary key value already in use (e.g., a user has an SSN and an insert adds a user with the same SSN).
3. Referential Constraint: refers to the use of a foreign key to allow a tuple in a parent table to be uniquely identified from a child table.
4. Semantic Integrity Constraint: refers to business rules used to maintain the range of values for attributes, involving triggers and assertions.
5. Transition Constraint: refers to transactions in the database that deal with changes in state. It involves the operations Update (Modify), Delete, and Insert.
6. Domain Constraint: requires each attribute value to match the attribute's datatype. It is violated by a datatype mismatch (e.g., during an INSERT operation).

Domain: refers to the datatype used by a column in a database.

Degree (arity): refers to the number of attributes.

Update Operation Transaction and Constraint Violations

Insert Operation

This can violate many if not all constraints.
For example, it can violate (i) the entity integrity constraint (which requires a non-NULL key), (ii) the key constraint (i.e., that no two tuples in a relation have the same key value), and (iii) referential integrity (but only if the instant relation is a child with a foreign key).

Delete Operation

This can only violate referential integrity, and only when the instant relation is a parent. Put another way, a violation can only occur when some other relation has a foreign key pointing to the instant relation. There are a number of remedies. For example:

Restrict: just stop the delete operation.

Cascade: in addition to deleting the instant (parent) tuple, all (child) tuples that have foreign keys pointing to the instant (parent) tuple are deleted.

Set Null or Set Default (this is bad): once the parent tuple is deleted, each child tuple that references the parent tuple has its foreign key (the referencing attribute) set to NULL or to a default value. However, this can be a big problem when the referencing attribute is part of the key. BAD!

Fourth option?: once the parent tuple is deleted, the child relation is deleted (i.e., DROPPED) altogether, thereby removing all tuples in the child.

Update Operation

This operation can cause two (maybe three) types of violations: referential integrity, the primary key constraint/entity integrity constraint, and the semantic constraint.

The primary key and entity integrity constraints are very similar and may be the same. A violation can happen during an UPDATE because the UPDATE can be on the attribute that is designated as the primary key. Accordingly, the update operation is really just a type of INSERT and has all its problems.

The semantic constraint can be a problem, but the DBMS usually checks that the values are of the right datatype/domain.

Referential integrity can be a problem if a child relation has its foreign key modified to reference an item in the parent that doesn't exist.
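The violations described above can be tried out with Python's built-in sqlite3 module. This is only a minimal sketch: the tables person, phone, and email and all of their rows are hypothetical names invented for illustration, and SQLite is just one DBMS (it enforces foreign keys only after PRAGMA foreign_keys = ON).

```python
import sqlite3


def violates(conn, sql):
    """Run one statement; return True if the DBMS rejects it with an integrity error."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True


conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when asked

# Hypothetical parent table and two hypothetical child tables.
conn.execute("CREATE TABLE person (ssn TEXT PRIMARY KEY NOT NULL, name TEXT)")
conn.execute(
    "CREATE TABLE phone (number TEXT PRIMARY KEY NOT NULL,"
    " owner_ssn TEXT REFERENCES person(ssn) ON DELETE RESTRICT)"
)
conn.execute(
    "CREATE TABLE email (addr TEXT PRIMARY KEY NOT NULL,"
    " owner_ssn TEXT REFERENCES person(ssn) ON DELETE CASCADE)"
)
conn.execute("INSERT INTO person VALUES ('111', 'Ann')")
conn.execute("INSERT INTO phone VALUES ('555-0100', '111')")
conn.commit()

results = {
    # INSERT: entity integrity (NULL key), key constraint (duplicate key),
    # referential integrity (child points at a parent that does not exist).
    "insert_null_key": violates(conn, "INSERT INTO person VALUES (NULL, 'Bob')"),
    "insert_duplicate_key": violates(conn, "INSERT INTO person VALUES ('111', 'Cy')"),
    "insert_dangling_fk": violates(conn, "INSERT INTO phone VALUES ('555-0101', '999')"),
    # DELETE on a parent that a child still references (remedy: Restrict).
    "delete_referenced_parent": violates(conn, "DELETE FROM person WHERE ssn = '111'"),
    # UPDATE that moves a foreign key to a parent that does not exist.
    "update_fk_to_missing_parent": violates(conn, "UPDATE phone SET owner_ssn = '999'"),
}

# Remedy: Cascade. Deleting the parent also deletes the child tuples.
conn.execute("INSERT INTO person VALUES ('222', 'Dee')")
conn.execute("INSERT INTO email VALUES ('dee@example.com', '222')")
conn.execute("DELETE FROM person WHERE ssn = '222'")
emails_left = conn.execute("SELECT COUNT(*) FROM email").fetchone()[0]

print(results)
print("emails left after cascade:", emails_left)
```

Every entry in results should come back True (the statement was rejected), and the cascade delete should leave no email rows behind.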
EER Relational Mapping Lecture

ET

If we have an entity type ET, then we have a "relation" ET. If we have ET with a property type, then relation ET gets an attribute. If we have ET with an identifying property type, then relation ET gets a primary key attribute. If we have ET with a composite property type C made up of D and E, then ET gets two attributes. [1]

[1] We lose the C in the relational model because of the way a relation is defined: as a subset of tuples over domains of atomic values. The relational model is "flat."

ET-A-F (Multivalue Property turn to Foreign Key)

We don't have multi-values in the relational model (I guess because it is flat and requires atomic domains). As such, we need to put the multi-valued property in a separate relation with a key.

[Figure 3: exception to the rule above, namely when we have total participation.]

In the figure above, we want to put the foreign key in the entity that has total participation because ALL of its instances map to the other entity, but not the other way around.

ET1-R-ET2-1-N (1 to Many)

The key to understanding how to form a relational model is to figure out how the instances map. That is, does each instance map to one or to many? That is, we are looking for uniqueness.

ET1-R-ET2-N-M (Many to Many)

For the relational model, we must generate another (third) relation. The third relation's key will be a combination of A_key and B_key. This allows us to create a combination (permutation) of values.

[Figure: many-to-many mapping with a third relation called "R" having foreign keys A and B.]

Mapping Weak Entity Types

[Figure: weak entity type.]
In the figure above, we know from our EER diagram that the "B" shown is a weak key attribute. That is, it ultimately relies on A for uniqueness. As such, in our flat relational model, we need the combination A-B as the key, in which the A part is a foreign key pointing to A.

Disjoint and Overlapping Case Overview

All overlapping cases may have replication depending on the implementation. By contrast, disjoint cases do not have the problem with replication.

Mandatory Disjoint Case (2 possibilities)

In the instant case we have full participation of ET. That is, ET1 and ET2 must have all the attributes of the superset ET. Accordingly, we have a table with attributes A and B, with C and D corresponding to ET1 and ET2.

The first option (which the teacher does not like, given that C or D will have NULLs) involves having a single table with A, B, C, and D along with a type. C and D will have the following combinations respectively: (i) NULL and SET, (ii) SET and NULL, and (iii) SET and SET. [2] The first option has "consistency problems."

[2] There is no case whereby C and D are set to NULL and NULL since ET has full participation.

The second option (which the teacher does like) involves the use of foreign keys. ET1 has a foreign key pointing to ET (see the figure above). ET2 has a foreign key pointing to ET (see the figure above). However, there may be a "replication" of the tuple value in ET1 and ET2 (i.e., both ET1 and ET2 point to ET).

Non-mandatory Overlap Case (4 possibilities with replication)

In the instant case, there may be: a foreign key in ET1 that points to ET, a foreign key in ET2 that points to ET, no foreign key at all, or foreign keys in both ET1 and ET2 that point to ET (i.e., replication).
That is, there may be no instances in ET1 and ET2 that point to ET.

Going back to the "bad" option: since there is not full participation, C and D of the (A, B, C, D, Type) tuple may or may not be set. So C and D may be as follows: (i) SET and SET, (ii) SET and NULL, (iii) NULL and SET, and (iv) NULL and NULL.

Non-mandatory Disjoint Case (3 possibilities, no replication)

In the instant case, there are three possibilities: (i) neither ET1 nor ET2 has a foreign key pointing to ET, (ii) only ET1 points to ET, and (iii) only ET2 points to ET. There is no replication since this is disjoint.

Union Types

Relational Algebra

A closed algebra system is one in which the result is of the same kind as the operands, e.g., operations on rational numbers give a rational number as the result. We can build high-level models from previous models that build off each other.

Relational Algebra Operators

Set Operators
  Union:              R ∪ S
  Intersection:       R ∩ S
  Set Diff:           R \ S

Constructors
  Natural Join:       R * S
  Outer Join:         R ⟕ S
  Theta Join:         R ⋈_θ S
  Cartesian Product:  R × S

Misc
  Projection:         π_{A1, A2, ..., An}
  Selection:          σ_{expression}(R)
  Divide:             R ÷ S
  Rename:             ρ[A1 B1, ..., An Bn]

We should view the "joins" as first starting with a Cartesian product. From the Cartesian product, we remove the rows that violate the condition.

There are four groups:
1) Set operators
2) Projection (eliminates columns) and selection (eliminates rows)
3) Joins (constructor ops.)
4) Divide by and rename

The algebra system is a "closed" system that allows us to nest operations.

Selection Op.

The sigma op just looks into a table and returns all the tuples from the table that satisfy its expression.
Simple Expression

Selection can have a "simple expression" with an attribute name compared with either a constant or another attribute.

With a constant:
1) Attribute name = constant
2) Attribute name < constant
3) Attribute name > constant
4) Attribute name >= constant
5) Attribute name <= constant
6) Attribute name != constant

With another attribute:
7) Attribute name = Attribute_2
8) Attribute name < Attribute_2
9) Attribute name > Attribute_2
10) Attribute name >= Attribute_2
11) Attribute name <= Attribute_2
12) Attribute name != Attribute_2

Based on the results, tuples will be returned. Note: all columns will be present.

Composite Expression

For example:

  σ_{CurrentCity = HomeTown OR HomeTown = 'Atlanta'}(RegularUser)

The expression above is a composite expression. It searches within the RegularUser table and returns tuples with CurrentCity (an attribute) matching HomeTown (a second attribute) OR HomeTown (an attribute) matching the constant 'Atlanta'.

Expressions include:
  Expression1 AND Expression2
  Expression1 OR Expression2
  (Expression)
  NOT(Expression)

Looking at Result rows 3 and 4, Austin == Austin and Dallas == Dallas. That is, the CurrentCity attribute equals the HomeTown attribute. Looking at Result rows 1 and 2, the HomeTown attribute matches the constant 'Atlanta'.

Projection

The projection operator just selects the named columns.

  π_{Email, BirthYear, Sex}(σ_{HomeTown = 'Atlanta'}(RegularUser))

This means: select all users from the table RegularUser where HomeTown == 'Atlanta'. From that set of tuples, return the columns Email, BirthYear, and Sex. The example above removes the CurrentCity and HomeTown columns.

According to the teacher, the example above is "interesting" because it nests the operators pi and sigma. This can only be done when the algebra system is "closed."
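The nesting of pi and sigma can be sketched in Python by treating a relation as a list of dictionaries. This is only an illustration, not how a DBMS implements the operators: the helper names select and project are assumptions, and the four RegularUser rows (including the example.com emails) are hypothetical values made up for the sketch. Because both helpers take a relation and return a relation, one can be fed into the other, which is the "closed" property.

```python
def select(relation, predicate):
    """sigma: keep the rows that satisfy the predicate; all columns stay."""
    return [row for row in relation if predicate(row)]


def project(relation, attributes):
    """pi: keep only the named columns and drop duplicate rows."""
    seen = set()
    result = []
    for row in relation:
        key = tuple(row[a] for a in attributes)
        if key not in seen:  # a relation is a set, so duplicates are removed
            seen.add(key)
            result.append(dict(zip(attributes, key)))
    return result


# Hypothetical RegularUser rows, invented for this sketch.
regular_user = [
    {"Email": "u1@example.com", "BirthYear": 1988, "Sex": "F",
     "CurrentCity": "Seattle", "HomeTown": "Atlanta"},
    {"Email": "u2@example.com", "BirthYear": 1965, "Sex": "M",
     "CurrentCity": "Atlanta", "HomeTown": "Atlanta"},
    {"Email": "u3@example.com", "BirthYear": 1988, "Sex": "F",
     "CurrentCity": "Dallas", "HomeTown": "Atlanta"},
    {"Email": "u4@example.com", "BirthYear": 1974, "Sex": "F",
     "CurrentCity": "Austin", "HomeTown": "Austin"},
]

# Composite expression: CurrentCity = HomeTown OR HomeTown = 'Atlanta'
composite = select(
    regular_user,
    lambda r: r["CurrentCity"] == r["HomeTown"] or r["HomeTown"] == "Atlanta",
)

# Nested operators: pi over sigma.
from_atlanta = select(regular_user, lambda r: r["HomeTown"] == "Atlanta")
nested = project(from_atlanta, ["Email", "BirthYear", "Sex"])
birth_years = project(from_atlanta, ["BirthYear"])

print(len(composite), len(nested), birth_years)
```

With these made-up rows, all four users satisfy the composite expression, three rows survive the nested query, and projecting only BirthYear leaves 1988 and 1965 once the repeated 1988 is dropped.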
Also, the projection operator allows us to remove "duplicates."

  BirthYear   Sex   HomeTown
  1988        F     Atlanta
  1965        M     Atlanta
  1988        F     Atlanta
  1974        F     Austin

  π_{BirthYear}(σ_{HomeTown = 'Atlanta'}(RegularUser))

This is what the table looks like after running the sigma operator:

  BirthYear   Sex   HomeTown
  1988        F     Atlanta
  1965        M     Atlanta
  1988        F     Atlanta

The pi operator allows us to get ONLY the BirthYear column:

  BirthYear
  1988
  1965
  1988

However, this is not the final result. We need to remove ALL DUPLICATES:

  BirthYear
  1988
  1965

Union ∪ (OR)

Union is related to "OR". Using a Venn diagram, it is either A or B or both. For union, A and B must be "type compatible." That is, the number of attributes of the operands must be the same, and the "types" of corresponding attributes must be compatible.

Find all cities that are a CurrentCity or a HomeTown for some RegularUser.

  Email    BirthYear   CurrentCity     HomeTown
  User4    1988        San Francisco   Atlanta
  User9    1988        Las Vegas       Atlanta
  User10   1986        Dallas          Dallas
  User12   1974        College Park    Austin

  π_{CurrentCity}(RegularUser) ∪ π_{HomeTown}(RegularUser)

  CurrentCity       HomeTown
  San Francisco     Atlanta
  Las Vegas         Atlanta
  Dallas            Dallas
  College Park      Austin

  Result
  San Francisco
  Las Vegas
  Dallas
  College Park
  Atlanta
  Austin

The above removes all the duplicates (e.g., Atlanta and Dallas). The union of A and B in this example cannot have more than eight (8) tuples.

Intersection ∩ (AND)

Find all cities that are a CurrentCity for some RegularUser and a HomeTown for some RegularUser.

  Email    BirthYear   CurrentCity     HomeTown
  User4    1988        San Francisco   Atlanta
  User9    1988        Las Vegas       Atlanta
  User10   1986        Dallas          Dallas
  User12   1974        College Park    Austin

  Result of Intersection
  Dallas

Set Difference \

Find all cities that are a CurrentCity for some RegularUser, but exclude those that are a HomeTown for some RegularUser.
  Email    BirthY   Sex   CurrentCity    HomeTown
  User1    1985     M     Seattle        Atlanta
  User9    1986     M     College Park   Atlanta
  User10   1988     F     Las Vegas      Atlanta
  User12   1986     F     Dallas         Dallas
  User13   1974     M     College Park   Austin

  Result
  Seattle
  College Park
  Las Vegas

We remove Dallas because it shows up in HomeTown. We remove the second College Park because we remove duplicates.

Natural Join * (Inner Join) (Constructor Operator)

Find Email, Year, Sex, and Event when the BirthYear of the RegularUser is the same as the EventYear of the Major60sEvents.

  RegularUser
  Email   Year   Sex
  User1   1985   M
  User2   1969   M
  User3   1967   M
  User8   1968   M

  Major60sEvents
  Year   Event
  1963   March on Wash
  1963   Ich bin ein Berliner speech
  1963   JFK
  1962   Cuban Missile Crisis
  1961   Berlin Wall up
  1968   Tet Off
  1968   Bloody Sunday
  1968   MLK killed
  1969   Moon Landing
  1967   Doors: Alabama
  1966   Rolling Stones: Paint it Black

  Result
  Email   Year   Sex   Event
  User2   1969   M     Moon Landing
  User3   1967   M     Doors: Alabama
  User8   1968   M     Tet Off
  User8   1968   M     Bloody Sunday
  User8   1968   M     MLK killed

User8 is listed multiple times because it matches multiple rows in the Major60sEvents table.

Theta Join ⋈_θ (Inner Join) (Constructor)

Find Email, BirthYear, Sex, and EventYear when the BirthYear of the RegularUser is before the EventYear of the Major60sEvents.

Difference Between Natural Join and Theta Join

The theta join is the genus of which the natural join is a species. That is, the theta join is the same as the natural join when the condition is an equality. In the theta join, you have to specify two attributes: a first attribute out of table A and a second attribute out of table B. Also make sure you have both of the attributes in your result table!

The inputs are the same RegularUser and Major60sEvents tables as in the natural join, with the year columns named BirthYear and EventYear:

  Email   BirthYear   Sex   Matching events
  User1   1985        M     (none)
  User2   1969        M     (none; look for EventYear greater than 1969)
  User3   1967        M     (4 hits)
  User8   1968        M     (1 hit)

  Result
  Email   BirthYear   Sex   EventYear   Event
  User3   1967        M     1968        Tet Off
  User3   1967        M     1968        Bloody Sunday
  User3   1967        M     1968        MLK killed
  User3   1967        M     1969        Moon Landing
  User8   1968        M     1969        Moon Landing

MAKE SURE YOU GET THE BirthYear VALUES FROM THE A TABLE, NOT B.

(Left) Outer Join ⟕

The inputs are again the RegularUser and Major60sEvents tables shown above.

  Result
  Email   BirthYear   Sex   Event
  User2   1969        M     Moon Landing
  User3   1967        M     Doors: Alabama
  User8   1968        M     Tet Off
  User8   1968        M     Bloody Sunday
  User8   1968        M     MLK killed
  User1   1985        M     NULL

The first five rows (highlighted in yellow in the original) are called the "inner" part of the query, whereas the User1 row is called the "outer" part of the query. The outer part will have a NULL. The teacher says it is a special case of the theta join, but I don't know why.

Cartesian Product ×

The number of tuples in the product of A and B is the number of tuples in A times the number of tuples in B.

Let us send an email blast to all users and notify them of all the interests they do not have. First we need to do some type of expansion, which is what the Cartesian product is good for.

  π_{Email}(RegularUser) × π_{Interest}(UserInterests)

  RegularUser
  Email    BirthY   Sex
  User1    1985     M
  User2    1969     M
  User3    1967     M
  User12   1974     F

  UserInterests
  Email   Interest     SinceAge
  User1   Music        10
  User2   Blogging     13
  User2   Meditation   21
  User3   Music        11

The first pi would return (A):

  Email
  User1
  User2
  User3
  User12

The second pi would return (B):

  Interest
  Music
  Blogging
  Meditation

The result should be 12 long (3x4).
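The expansion can be sketched in Python with itertools.product, using the RegularUser emails and UserInterests rows from the tables above. The final subtraction step, which keeps only the interests a user does not already have, is an assumption about how the email blast would be finished, and the variable names are made up for the sketch.

```python
from itertools import product

# Rows copied from the UserInterests table in the notes: (Email, Interest, SinceAge).
user_interests = [
    ("User1", "Music", 10),
    ("User2", "Blogging", 13),
    ("User2", "Meditation", 21),
    ("User3", "Music", 11),
]
# Emails copied from the RegularUser table in the notes.
regular_user_emails = ["User1", "User2", "User3", "User12"]

# A = pi_Email(RegularUser), B = pi_Interest(UserInterests); sets drop duplicates.
a = set(regular_user_emails)
b = {interest for _, interest, _ in user_interests}

# Cartesian product: every email paired with every interest.
a_times_b = set(product(a, b))

# Assumed final step: remove the (email, interest) pairs that already exist.
already_has = {(email, interest) for email, interest, _ in user_interests}
email_blast = a_times_b - already_has

print(len(a_times_b), len(email_blast))  # prints "12 8"
```

The product has 4 x 3 = 12 pairs, and removing the 4 existing pairs leaves 8 invitations.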
The product A × B is:

  Email    Interest
  User1    Music
  User1    Blogging
  User1    Meditation
  User2    Music
  User2    Blogging
  User2    Meditation
  User3    Music
  User3    Blogging
  User3    Meditation
  User12   Music
  User12   Blogging
  User12   Meditation

Divide By ÷

Find the emails of all users with at least all the interests of User1.

  π_{Email, Interest}(UserInterests) ÷ π_{Interest}(σ_{Email = 'User1'}(UserInterests))

  R(A,B) ÷ S(B) = {r.A | r ∈ R and ∀(s ∈ S)(∃(t ∈ R)(t.A = r.A and t.B = s.B))}

R(A,B) means that R has attributes A and B. S(B) means that S has attribute B. This is always the structure of the divide by operator. The output of the divide operator will have A, NOT B.

R(A,B): pi of Email and Interest of UserInterests just drops the SinceAge column:

  Email   Interest
  User1   Music
  User1   Reading
  User1   Tennis
  User2   Swim
  User2   Tennis
  User3   Swim
  User3   Tennis
  User3   Music
  User3   Reading
  User4   DIV
  User4   Music
  User4   Reading

S(B): the divisor needs to have one attribute, and that attribute must be B (Interest), not A (Email). Taking pi of Email where Email = 'User1' gives User1 three times, or just User1 once duplicates are removed, which is S(A) and not a usable divisor. Taking pi of Interest where Email = 'User1' gives:

  Interest
  Music
  Reading
  Tennis

R(A,B) ÷ S(B) = Z(A), where A is Email and B is Interest:

  Email
  User1
  User3

Relational Calculus

Difference Between Relational Algebra (RA) and Calculus (RC)

Relational algebra is procedural in nature. That is, it is operator based. It is a series of results (i.e., steps). Calculus, by contrast, is declarative in nature. That is, you describe what you want, not how it is done (i.e., not the steps).

Factoids

SQL is mostly based on tuple calculus.
RC and RA are equivalent in terms of horsepower.
It is called "calculus" because queries have variables that range over tuples.

Discussion

{t | P(t)}: P is the predicate. Find the tuples "t" that satisfy the predicate "P".

Predicates are made up of atoms.

Range Expression: t ∈ R is the same as R(t). This says that t is a tuple of relation R.
Attribute Value: t.A means the value of t on attribute A.
Constant: c is a constant.

Atoms: (i) t ∈ R, (ii) r.A θ s.B, or (iii) r.A θ c, where θ is a comparison operator.
Atoms are predicates. Predicates can be:

  Nested:   (P_1)
  Negated:  not(P_1)
  Or'ed:    P_1 OR P_2
  And'ed:   P_1 AND P_2
  Implies:  P_1 -> P_2

If P(t) is a predicate, t is a free variable in P, and R is a relation, then ∃(t ∈ R)(P(t)) and ∀(t ∈ R)(P(t)) are predicates.

Selection

Without Composite

Find all regular users.

  {r | r ∈ RegularUser}

With Composite

Find all regular users who have the same CurrentCity and HomeTown or have HomeTown Atlanta.

  {r | r ∈ RegularUser AND (r.CurrentCity = r.HomeTown OR r.HomeTown = 'Atlanta')}

Projection

Find Email, BirthYear, and Sex for RegularUsers with HomeTown Atlanta.

  {r.Email, r.BirthYear, r.Sex | r ∈ RegularUser [4] AND (r.HomeTown = 'Atlanta')}

[4] This is the selection predicate P(r).

Union (Related to OR)

Just like in relational algebra, the union is related to the logical OR. Think Venn diagrams.

Find all cities that are a CurrentCity or a HomeTown for some RegularUser.

  {s.City | ∃(r ∈ RegularUser)(s.City = r.CurrentCity) OR ∃(r ∈ RegularUser)(s.City = r.HomeTown)}

Intersection (Related to AND)

Find all cities that are a CurrentCity for some RegularUser and a HomeTown for some RegularUser.

  {s.City | ∃(r ∈ RegularUser)(s.City = r.CurrentCity) AND ∃(r ∈ RegularUser)(s.City = r.HomeTown)}

Difference (Related to AND NOT)

Think of the Venn diagram for relational algebra.

Find all cities that are a CurrentCity for some RegularUser but exclude those that are a HomeTown for some RegularUser.

  {s.City | ∃(r ∈ RegularUser)(s.City = r.CurrentCity) AND NOT ∃(r ∈ RegularUser)(s.City = r.HomeTown)}

Natural Join

There are relations R and S, with result T.
Find Email, Year, Sex, and Event when the BirthYear of the RegularUser is the same as the EventYear of the Major60sEvents.

  {t.Email, t.Year, t.Sex, t.Event | ∃(r ∈ RegularUser) ∃(s ∈ Major60sEvents)(r.Year = s.Year AND t.Email = r.Email AND t.Year = r.Year AND t.Sex = r.Sex AND t.Event = s.Event)}

Cartesian Product

Combine all RegularUser tuples with all UserInterests tuples.

  {r, s | (r ∈ RegularUser) AND (s ∈ UserInterests)}

For an email blast, combine all users with the interests they don't have so they can be invited to join groups with those interests. The result should have Email and Interest attributes.

  {r.Email, s.Interest | (r ∈ RegularUser) AND (s ∈ UserInterests) AND NOT(∃(t ∈ UserInterests)(r.Email = t.Email AND s.Interest = t.Interest))}

Divide By

  {r.Email | r ∈ UserInterests AND ∀(s ∈ UserInterests)((s.Email != 'User1') OR ∃(t ∈ UserInterests)(r.Email = t.Email AND t.Interest = s.Interest))}
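The divide-by query can be checked by transcribing the calculus expression almost symbol for symbol into Python, with all() standing in for the universal quantifier and any() for the existential one. This is a sketch only: the function name divide_by_user is an assumption, and the rows are the R(A,B) table from the relational algebra divide-by example, with the interest "DIV" kept exactly as written in the notes.

```python
# (Email, Interest) rows copied from the R(A,B) table in the divide-by example.
user_interests = {
    ("User1", "Music"), ("User1", "Reading"), ("User1", "Tennis"),
    ("User2", "Swim"), ("User2", "Tennis"),
    ("User3", "Swim"), ("User3", "Tennis"), ("User3", "Music"), ("User3", "Reading"),
    ("User4", "DIV"), ("User4", "Music"), ("User4", "Reading"),
}


def divide_by_user(relation, target_email):
    """Emails r.Email such that, for every tuple s in the relation, either
    s does not belong to target_email, or some tuple t pairs r.Email with
    s.Interest."""
    return {
        r_email
        for (r_email, _r_interest) in relation
        if all(  # forall s in UserInterests
            s_email != target_email
            or any(  # exists t in UserInterests
                t_email == r_email and t_interest == s_interest
                for (t_email, t_interest) in relation
            )
            for (s_email, s_interest) in relation
        )
    }


result = divide_by_user(user_interests, "User1")
print(sorted(result))
```

On this data the query keeps User1 and User3: User2 lacks Music and Reading, and User4 lacks Tennis.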