Uploaded by dennis Kyritsis

Final Notes for test 2

DEFINITIONS
Super key: a single attribute or a group of attributes that uniquely identifies tuples in a table.
Candidate key: a minimal super key, i.e., a super key with the unneeded attributes removed. It can be just one attribute.
Primary key: each table can have only one; it is used to retrieve unique tuples. The primary key is chosen from the candidate keys. The primary key should be non-NULLable in order to maintain the entity integrity constraint.
Foreign key: a key that is primary in another (parent) table but is included in the instant (host) table to maintain referential integrity constraints.
Cardinality: the total number of values for a domain (i.e., the total number of values for a range).
Constraints:
1. Entity Constraint: refers to the use of a primary key to uniquely identify the tuples of a relation. This is violated if the key is set to NULL.
2. Key Constraint: refers to the uniqueness that a primary key must have across all tuples. Can be violated by an INSERT operation if the inserted tuple has a primary key already in use (e.g., a user has an SSN and an insert adds a user with the same SSN).
3. Referential Constraint: refers to the use of a foreign key to allow a tuple to be uniquely identified in a parent table from a child table.
4. Semantic Integrity Constraint: refers to business rules used to maintain the range of values for attributes, enforced with triggers and assertions.
5. Transition Constraint: refers to transactions in the database that deal with changes in state. Involves the operations of Update (Modify), Delete, and Insert.
6. Domain Constraint: requires each attribute value to belong to the attribute's domain; it is violated by a mismatch of datatypes (e.g., during an INSERT operation) for an attribute.
Domain: refers to the datatype used by a column in a database
Degree (arity): refers to the number of attributes
Update Operation Transaction and Constraint Violations
Insert Operation
This can violate many if not all constraints.
For example, it can violate (i) the entity integrity constraint (which requires a non-NULLable key), (ii) the key constraint (i.e., that no two tuples in a relation have the same key value), and (iii) referential integrity (but only if the instant relation is a child with a foreign key).
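The three insert checks above can be sketched in a few lines of Python. This is only an illustration: the `departments` and `employees` tables, their rows, and the `check_insert` helper are all hypothetical names made up for the example, not part of the lecture.

```python
# Hypothetical in-memory tables; names and rows are illustrative assumptions.
departments = {10: {"name": "Sales"}}        # parent table, keyed by primary key
employees = {"111-11-1111": {"dept": 10}}    # child table, keyed by SSN


def check_insert(ssn, dept):
    """Return the constraints an INSERT into employees would violate."""
    violations = []
    if ssn is None:
        violations.append("entity integrity")       # primary key is NULL
    elif ssn in employees:
        violations.append("key constraint")         # primary key already in use
    if dept is not None and dept not in departments:
        violations.append("referential integrity")  # foreign key has no parent
    return violations


print(check_insert("222-22-2222", 10))   # a clean insert
print(check_insert(None, 10))            # NULL primary key
print(check_insert("111-11-1111", 99))   # duplicate key and dangling foreign key
```

A real DBMS performs these checks itself; the sketch only makes explicit which condition belongs to which constraint.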
Delete Operation
This can only violate referential integrity. However, it requires that the instant relation be a parent. Put another way, a violation can only occur when the instant relation has a foreign key pointing to it.
There are a number of remedies. For example:
Restrict: just stop the delete operation.
Cascade: in addition to deleting the instant (parent) tuple, all (child) tuples that have foreign keys pointing to the instant (parent) tuple are deleted.
Set Null or Set Default (this is bad): once the parent tuple is deleted, the child tuple that references the parent tuple has its foreign key (referencing attribute) set to NULL or to a default value. However, this can be a big problem when the referencing attribute is part of the key. BAD!
Fourth option?: once the parent tuple is deleted, the child relation table is deleted (i.e., DROPPED) altogether, thereby removing all tuples in the child.
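As a rough sketch of the first three remedies, assuming the parent table is a dictionary keyed by primary key and the child table maps each child id to its foreign key (the function name `delete_parent` and the sample rows are hypothetical):

```python
def delete_parent(parents, children, key, policy):
    """Delete parents[key]; children maps child id -> foreign key (or None)."""
    referencing = [c for c, fk in children.items() if fk == key]
    if policy == "restrict":
        if referencing:
            raise ValueError("delete rejected: child tuples still reference the parent")
    elif policy == "cascade":
        for c in referencing:
            del children[c]          # remove every child pointing at the parent
    elif policy == "set null":
        for c in referencing:
            children[c] = None       # keep the child, drop the reference
    else:
        raise ValueError(f"unknown policy: {policy}")
    del parents[key]


parents = {1: "A", 2: "B"}
children = {"x": 1, "y": 1, "z": 2}
delete_parent(parents, children, 1, "cascade")
print(parents, children)
```

Note that "set null" is only safe here because the foreign key is not part of the child's key, which is exactly the caveat raised above.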
Update Operation
This operation can cause two types (maybe three) of violations: referential integrity, primary key constraint/entity integrity constraint, and semantic constraint.
The primary key and entity integrity constraints are very similar and maybe the same. A violation can happen during an update because an UPDATE can be on the attribute that is designated as the primary key. Accordingly, such an update operation is really just a type of INSERT and has all its problems.
The semantic constraint can be a problem. But this is usually checked by the DBMS to confirm that the values are of the right datatype/domain.
Referential integrity can be a problem if a child relation has its foreign key modified to reference an item in the parent that doesn't exist.
EER Relational Mapping Lecture
ET
If we have ET, then we have a "relation" ET.
If we have ET with a property, then "relation" ET gets an attribute.
If we have ET with a property of type identifying, then "relation" ET gets a primary key attribute.
If we have ET with a composite property type C made up of D and E, then ET gets two attributes. 1
ET-A-F (Multivalue Property turn to Foreign Key)
We don't have multi-values in the relational model (I guess because it is flat and requires atomic domains). As such, we need to put the multi-valued property in a separate relation with a key.
1 We lose the C in the relational model. However, that follows from the way a relation is defined: as a subset of tuples over domains of atomic values. The relational model is "flat."
Figure 3 (Exception to the rule above, namely when we have total participation).
In the FIG above, we want to put the foreign key in the entity that has total participation because ALL
the instances map to the other entity but not the other way around.
ET1-R-ET2-1-N (1 to Many)
The key to understanding how to form a relational model is to figure out how the instances map. That is, does each instance map to one or to many? In other words, we are looking for uniqueness.
ET1-R-ET2-N-M (Many to Many)
For the relational model, we must generate another (third) relation. The third relation's key will be a combination of A_key and B_key. This allows us to create a combination (permutation) of values for the many-to-many relationship.
(Handwritten note: the third relation, called "R", has foreign keys A and B.)
Mapping Weak Entity Types
In the FIG above, the "B" shown in our EER diagram is a weak key attribute. That is, it ultimately relies on A for uniqueness. As such, in our flat relational model, we need the combination A-B as the key, where the A part is a foreign key pointing to A.
Disjoint and Overlapping Case Overview
All overlapping cases may have replication depending on the implementation. By contrast, disjoint cases do not have the problem with replication.
Mandatory Disjoint Case (2 possibilities)
In the instant case, we have full participation of ET. That is, ET1 and ET2 must have all the attributes of the superset ET. Accordingly, we have a table with attributes A and B, with C and D corresponding to ET1 and ET2.
The first option (which the teacher does not like, given that C or D will have NULLs) involves having a single table with A, B, C, and D along with a type. C and D will have the following combinations respectively: (i) NULL and SET, (ii) SET and NULL, and (iii) SET and SET. 2 The first option has "consistency problems."
The second option (which the teacher does like) involves the use of foreign keys. ET1 has a foreign key pointing to ET, and ET2 has a foreign key pointing to ET (both marked in the figure above).
However, there may be a "replication" of the tuple value in ET1 and ET2 (i.e., both ET1 and ET2 point to ET).
Non-mandatory Overlap Case (4 possibilities with replication)
2 There is no case whereby C and D are set to NULL and NULL, since ET has full participation.
In the instant case, there may be: a foreign key in ET1 that points to ET, a foreign key in ET2 that points to ET, no foreign key at all, or foreign keys of both ET1 and ET2 that point to ET (i.e., replication). That is, there may be no instances in ET1 and ET2 that point to ET.
Going back to the "bad" option: since there is not full participation, C and D of the (A, B, C, D, Type) tuple may or may not be set. So C and D may be as follows: (i) SET and SET, (ii) SET and NULL, (iii) NULL and NULL, and (iv) NULL and SET.
Non-mandatory Disjoint Case (3 possibilities no replication)
In the instant case, there are three possibilities: (i) neither ET1 nor ET2 has a foreign key pointing to ET, (ii) only ET1 points to ET, and (iii) only ET2 points to ET. There is no replication since this is disjoint.
Union Types
Relational Algebra
A closed algebra system is one in which the result of an operation is the same kind of thing as its operands (e.g., an operation on rational numbers yields a rational number).
We can build high-level models from previous models that build off each other.
Relational Algebra Operators
Set Operators
Union: R ∪ S
Intersection: R ∩ S
Set Diff: R \ S
Constructors
Natural Join: R * S
Outer Join: R ⟕ S
Theta Join: R ⋈θ S
Cartesian Product: R × S 3
Misc
Projection: π A1,A2,...,An
Selection: σ expression (R)
Divide: R ÷ S
Rename: ρ [A1 B1, ..., An Bn]
3 We should view the "joins" as first starting with a Cartesian product. From the Cartesian product, we remove rows that violate the condition.
There are four groups:
1) Set operators
2) Projection op eliminates columns and selection eliminates rows
3) Joins (construct ops.)
4) Divide-by and rename
The algebra system is a "closed" system that allows us to nest operations.
Selection Op.
The sigma op just looks into a table and returns all the tuples from the table that satisfy the selection expression.
Simple Expression
Selection can have a "simple expression" with an attribute name compared with either constant or
another attribute:
With a constant:
1) Attribute name = constant
2) Attribute name < constant
3) Attribute name > constant
4) Attribute name >= constant
5) Attribute name <= constant
6) Attribute name != constant
With another Attribute:
7) Attribute name = Attribute_2
8) Attribute name < Attribute_2
9) Attribute name > Attribute_2
10) Attribute name >= Attribute_2
11) Attribute name <= Attribute_2
12) Attribute name != Attribute_2
Based on the results, tuples will be returned. Note: all columns will be present.
Composite Expression
For example, σ CurrentCity=HomeTown OR HomeTown='Atlanta' (RegularUser)
The expression above is a composite expression. It searches within the RegularUser table and returns tuples with CurrentCity (which is an attribute) matching HomeTown (which is a second attribute) OR HomeTown (which is an attribute) matching the constant 'Atlanta'.
Expressions include:
Expression1 AND Expression2
Expression1 OR Expression2
(Expression)
NOT(Expression)
Looking at the Result rows 3 and 4, Austin== Austin and Dallas== Dallas. That is, the CurrentCity
Attribute equals the HomeTown Attribute. Looking at the Result rows 1 and 2, the HomeTown attribute
matches the constant of Atlanta.
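A minimal Python sketch of the same selection follows. The three sample rows and the `select` helper are assumptions made up for illustration; they are not the lecture's RegularUser table.

```python
# Illustrative RegularUser rows; the values are assumptions, not the lecture's table.
regular_user = [
    {"Email": "user1", "CurrentCity": "Seattle", "HomeTown": "Atlanta"},
    {"Email": "user2", "CurrentCity": "Austin", "HomeTown": "Austin"},
    {"Email": "user3", "CurrentCity": "San Diego", "HomeTown": "Portland"},
]


def select(relation, predicate):
    """Sigma: keep the rows for which predicate(row) is true; all columns stay."""
    return [row for row in relation if predicate(row)]


# Composite expression: CurrentCity = HomeTown OR HomeTown = 'Atlanta'
result = select(
    regular_user,
    lambda r: r["CurrentCity"] == r["HomeTown"] or r["HomeTown"] == "Atlanta",
)
print([r["Email"] for r in result])
```

The predicate is an ordinary boolean expression, so AND, OR, NOT, and parentheses compose the same way they do in the algebra.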
Projection
The projection operator just selects the listed columns.
π Email,BirthYear,Sex (σ HomeTown='Atlanta' (RegularUser))
The expression above means: select all users from the RegularUser table whose HomeTown == Atlanta. From that set of tuples, return the columns Email, BirthYear, and Sex.
The example above removes the CurrentCity and HomeTown columns. According to the teacher, the example is "interesting" because it nests the operators pi and sigma. This can only be done because the algebra system is "closed."
Also, the projection operator allows us to remove "duplicates."
RegularUser (the Email values are not legible):
BirthYear  Sex  HomeTown
1988       F    Atlanta
1965       M    Atlanta
1988       F    Atlanta
1974       F    Austin
π BirthYear (σ HomeTown='Atlanta' (RegularUser))
This is what the table looks like after running the sigma operator:
BirthYear  Sex  HomeTown
1988       F    Atlanta
1965       M    Atlanta
1988       F    Atlanta
The pi operator allows us to get ONLY the BirthYear column:
BirthYear
1988
1965
1988
However, this is not the final result. We need to remove ALL DUPLICATES:
BirthYear
1988
1965
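The duplicate removal can be sketched in Python as follows. The BirthYear, Sex, and HomeTown values follow the table in these notes; the Email values (`u1` to `u4`) and the `project` helper are placeholders assumed for the example.

```python
# BirthYear/Sex/HomeTown follow the notes; the Email values are placeholders.
regular_user = [
    {"Email": "u1", "BirthYear": 1988, "Sex": "F", "HomeTown": "Atlanta"},
    {"Email": "u2", "BirthYear": 1965, "Sex": "M", "HomeTown": "Atlanta"},
    {"Email": "u3", "BirthYear": 1988, "Sex": "F", "HomeTown": "Atlanta"},
    {"Email": "u4", "BirthYear": 1974, "Sex": "F", "HomeTown": "Austin"},
]


def project(relation, attributes):
    """Pi: keep only the named columns and drop duplicate rows."""
    seen, result = set(), []
    for row in relation:
        key = tuple(row[a] for a in attributes)
        if key not in seen:          # a relation is a set, so repeats are dropped
            seen.add(key)
            result.append(key)
    return result


atlanta = [r for r in regular_user if r["HomeTown"] == "Atlanta"]  # sigma
birth_years = project(atlanta, ["BirthYear"])                      # pi
print(birth_years)
```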
Union ∪ (OR)
Union is related to "OR". Using a Venn diagram, it is either A or B or both. For Union, both A and B must be "type compatible." That is, the number of attributes of the operands must be the same. Also, the "types" must be compatible.
Find all cities that are a CurrentCity or a HomeTown for some RegularUser.
RegularUser
Email   BirthYear  CurrentCity    HomeTown
User4   1988       San Francisco  Atlanta
User9   1988       Las Vegas      Atlanta
User10  1986       Dallas         Dallas
User12  1974       College Park   Austin
π CurrentCity(RegularUser) ∪ π HomeTown(RegularUser)
CurrentCity
San Francisco
Las Vegas
Dallas
College Park
HomeTown
Atlanta
Atlanta
Dallas
Austin
Result
San Francisco
Las Vegas
Dallas
College Park
Atlanta
Austin
The above removes all the duplicates (e.g., Atlanta and Dallas). The Union in this example of A and B cannot have more than eight (8) tuples.
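Because relations are sets, Python's built-in `set` gives a direct sketch of the union. The rows follow the table in these notes; the tuple layout and variable names are choices made for the example.

```python
# Rows as (Email, BirthYear, CurrentCity, HomeTown), following the notes' table.
regular_user = [
    ("User4", 1988, "San Francisco", "Atlanta"),
    ("User9", 1988, "Las Vegas", "Atlanta"),
    ("User10", 1986, "Dallas", "Dallas"),
    ("User12", 1974, "College Park", "Austin"),
]

current_city = {row[2] for row in regular_user}  # pi CurrentCity
home_town = {row[3] for row in regular_user}     # pi HomeTown (Atlanta kept once)

union = current_city | home_town
print(sorted(union))
print(len(union), "<=", 2 * len(regular_user))   # never more than 4 + 4 = 8
```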
Intersection ∩ (AND)
Find all cities that are a CurrentCity for some RegularUser and a HomeTown for some RegularUser.
RegularUser
Email   BirthYear  CurrentCity    HomeTown
User4   1988       San Francisco  Atlanta
User9   1988       Las Vegas      Atlanta
User10  1986       Dallas         Dallas
User12  1974       College Park   Austin
Result of Intersection
Dallas
Set Difference \
Find all cities that are a CurrentCity for some RegularUser, but exclude those that are a HomeTown for some RegularUser.
RegularUser
Email   BirthYear  Sex  CurrentCity   HomeTown
User1   1985       M    Seattle       Atlanta
User9   1986       M    College Park  Atlanta
User10  1988       F    Las Vegas     Atlanta
User12  1986       F    Dallas        Dallas
User13  1974       M    College Park  Austin
Result
Seattle
College Park
Las Vegas
(We remove Dallas because it shows up in HomeTown. The second College Park is dropped because we remove duplicates.)
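The set difference, and the intersection from the previous section, can be sketched on the five-row table with Python sets. The rows follow the table in these notes; the tuple layout is an assumption of the example.

```python
# Rows as (Email, BirthYear, Sex, CurrentCity, HomeTown), following the notes' table.
regular_user = [
    ("User1", 1985, "M", "Seattle", "Atlanta"),
    ("User9", 1986, "M", "College Park", "Atlanta"),
    ("User10", 1988, "F", "Las Vegas", "Atlanta"),
    ("User12", 1986, "F", "Dallas", "Dallas"),
    ("User13", 1974, "M", "College Park", "Austin"),
]

current_city = {row[3] for row in regular_user}  # College Park is kept only once
home_town = {row[4] for row in regular_user}

difference = current_city - home_town    # CurrentCity \ HomeTown
intersection = current_city & home_town  # CurrentCity ∩ HomeTown
print(sorted(difference))
print(sorted(intersection))
```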
Natural Join * (Inner Join) (Constructor Operator)
Find Email, Year, Sex, and Event when the BirthYear of the RegularUser is the same as the EventYear of the Major60sEvents.
RegularUser
Email  Year  Sex
User1  1985  M
User2  1969  M
User3  1967  M
User8  1968  M
Major60sEvents
Year  Event
1963  March on Wash
1963  Ich bin ein Berliner speech
1963  JFK
1962  Cuban Missile Crisis
1961  Berlin Wall up
1968  Tet Off
1968  Bloody Sunday
1968  MLK killed
1969  Moon Landing
1967  Doors: Alabama
1966  Rolling Stones: Paint it Black
Result
Email  Year  Sex  Event
User2  1969  M    Moon Landing
User3  1967  M    Doors: Alabama
User8  1968  M    Tet Off
User8  1968  M    Bloody Sunday
User8  1968  M    MLK killed
(User8 should be listed multiple times because of multiple matching rows in the Major60sEvents table.)
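A short Python sketch of this natural join, written as a filtered product over the shared Year attribute. The rows are transcribed from the tables in these notes (the user years are taken from the theta join and outer join sections); the tuple layout is an assumption of the example.

```python
# Rows transcribed from the notes: (Email, Year, Sex) and (Year, Event).
regular_user = [
    ("User1", 1985, "M"), ("User2", 1969, "M"),
    ("User3", 1967, "M"), ("User8", 1968, "M"),
]
major_60s_events = [
    (1963, "March on Wash"), (1963, "Ich bin ein Berliner speech"), (1963, "JFK"),
    (1962, "Cuban Missile Crisis"), (1961, "Berlin Wall up"), (1968, "Tet Off"),
    (1968, "Bloody Sunday"), (1968, "MLK killed"), (1969, "Moon Landing"),
    (1967, "Doors: Alabama"), (1966, "Rolling Stones: Paint it Black"),
]

# Natural join on Year: pair every user with every event, keep equal years,
# and emit the shared Year column only once.
natural_join = [
    (email, year, sex, event)
    for email, year, sex in regular_user
    for event_year, event in major_60s_events
    if year == event_year
]
for row in natural_join:
    print(row)
```

User1 (1985) matches no event and therefore does not appear, while User8 (1968) appears three times.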
Theta Join ⋈θ (Inner Join) (Constructor)
Find Email, BirthYear, Sex, and EventYear when the BirthYear of the RegularUser is before the EventYear of the Major60sEvents.
Difference Between Natural Join and Theta Join
.....-
, ,..,
1
~-
-.
- _}_,, / -::_.}
_ -
J --:-· ; ,/ L
(J., ~ /
I
The theta join is a genius of the species of nature join. That is, theta join is the same as nature join when
there is an equality. In the theta joint, you have specify two attributes-namely, a first attribute out of A
table and a second attribute out of B table.
Also make sure you have both of the attributes specified in your result table!
RegularUser
Email  BirthYear  Sex
User1  1985       M    (none)
User2  1969       M    (look for greater than 1969)
User3  1967       M    (4 hits)
User8  1968       M    (1 hit)
Major60sEvents
EventYear  Event
1963       March on Wash
1963       Ich bin ein Berliner speech
1963       JFK
1962       Cuban Missile Crisis
1961       Berlin Wall up
1968       Tet Off
1968       Bloody Sunday
1968       MLK killed
1969       Moon Landing
1967       Doors: Alabama
1966       Rolling Stones: Paint it Black
Result
Email  BirthYear  EventYear  Sex  Event
User3  1967       1968       M    Tet Off
User3  1967       1968       M    Bloody Sunday
User3  1967       1968       M    MLK killed
User3  1967       1969       M    Moon Landing
User8  1968       1969       M    Moon Landing
(MAKE SURE YOU GET THE BIRTH YEARS FROM THE A TABLE, NOT B.)
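A sketch of the theta join as "Cartesian product, then filter", in Python. The rows are transcribed from the tables in these notes; the `theta_join` helper is a hypothetical name for the example.

```python
# Rows transcribed from the notes: (Email, BirthYear, Sex) and (EventYear, Event).
regular_user = [
    ("User1", 1985, "M"), ("User2", 1969, "M"),
    ("User3", 1967, "M"), ("User8", 1968, "M"),
]
major_60s_events = [
    (1963, "March on Wash"), (1963, "Ich bin ein Berliner speech"), (1963, "JFK"),
    (1962, "Cuban Missile Crisis"), (1961, "Berlin Wall up"), (1968, "Tet Off"),
    (1968, "Bloody Sunday"), (1968, "MLK killed"), (1969, "Moon Landing"),
    (1967, "Doors: Alabama"), (1966, "Rolling Stones: Paint it Black"),
]


def theta_join(r, s, theta):
    """Cartesian product of r and s, keeping pairs where theta(row_r, row_s) holds."""
    return [row_r + row_s for row_r in r for row_s in s if theta(row_r, row_s)]


# theta: BirthYear (from table A) < EventYear (from table B)
result = theta_join(regular_user, major_60s_events, lambda u, e: u[1] < e[0])
for row in result:
    print(row)   # (Email, BirthYear, Sex, EventYear, Event): both years are kept
```

Swapping the lambda for `u[1] == e[0]` gives the equality case, which is how the natural join relates to the theta join (apart from the natural join dropping the repeated year column).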
(Left) Outer Join ⟕
RegularUser
Email  BirthYear  Sex
User1  1985       M
User2  1969       M
User3  1967       M
User8  1968       M
Major60sEvents
EventYear  Event
1963       March on Wash
1963       Ich bin ein Berliner speech
1963       JFK
1962       Cuban Missile Crisis
1961       Berlin Wall up
1968       Tet Off
1968       Bloody Sunday
1968       MLK killed
1969       Moon Landing
1967       Doors: Alabama
1966       Rolling Stones: Paint it Black
Result
Email  BirthYear  Sex  Event
User2  1969       M    Moon Landing
User3  1967       M    Doors: Alabama
User8  1968       M    Tet Off
User8  1968       M    Bloody Sunday
User8  1968       M    MLK killed
User1  1985       M    NULL
The matching rows (highlighted in yellow in the original) are called the "inner" part of the query, whereas only User1 is the "outer" part of the query. The outer part will have a NULL. The teacher says it is a special case of the theta join, but I don't know why.
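A sketch of the left outer join in Python, using `None` to stand in for NULL. The rows are transcribed from the tables in these notes; the `left_outer_join` helper is a hypothetical name, and the row order of its output differs from the table above (order does not matter in a relation).

```python
# Rows transcribed from the notes: (Email, BirthYear, Sex) and (EventYear, Event).
regular_user = [
    ("User1", 1985, "M"), ("User2", 1969, "M"),
    ("User3", 1967, "M"), ("User8", 1968, "M"),
]
major_60s_events = [
    (1963, "March on Wash"), (1963, "Ich bin ein Berliner speech"), (1963, "JFK"),
    (1962, "Cuban Missile Crisis"), (1961, "Berlin Wall up"), (1968, "Tet Off"),
    (1968, "Bloody Sunday"), (1968, "MLK killed"), (1969, "Moon Landing"),
    (1967, "Doors: Alabama"), (1966, "Rolling Stones: Paint it Black"),
]


def left_outer_join(users, events):
    """Join on equal years; users without a match are kept with Event = None."""
    result = []
    for email, year, sex in users:
        matches = [event for event_year, event in events if event_year == year]
        if matches:
            result.extend((email, year, sex, event) for event in matches)  # inner part
        else:
            result.append((email, year, sex, None))                        # outer part
    return result


result = left_outer_join(regular_user, major_60s_events)
for row in result:
    print(row)
```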
Cartesian Product ×
The product of A and B should have (the number of tuples in A) times (the number of tuples in B) tuples.
Let us send an email blast to all users and notify them of all the interests they do not have. First we need to do some type of expansion, which is what the Cartesian product is good for.
π Email(RegularUser) × π Interest(UserInterests)
RegularUser
Email   BirthYear  Sex
User1   1985       M
User2   1969       M
User3   1967       M
User12  1974       F
UserInterests
Email  Interest    SinceAge
User1  Music       10
User2  Blogging    13
User2  Meditation  21
User3  Music       11
The first pi would return (A):
User1
User2
User3
User12
The second pi would return (B):
Music
Blogging
Meditation
The result should be 12 long (3x4). The product A × B is:
Email   Interest
User1   Music
User1   Blogging
User1   Meditation
User2   Music
User2   Blogging
User2   Meditation
User3   Music
User3   Blogging
User3   Meditation
User12  Music
User12  Blogging
User12  Meditation
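The product of the two projections can be sketched with `itertools.product` from the Python standard library. The rows follow the tables in these notes; `dict.fromkeys` is used here only as an order-preserving way to drop duplicates.

```python
from itertools import product

# Rows following the notes: (Email, BirthYear, Sex) and (Email, Interest, SinceAge).
regular_user = [
    ("User1", 1985, "M"), ("User2", 1969, "M"),
    ("User3", 1967, "M"), ("User12", 1974, "F"),
]
user_interests = [
    ("User1", "Music", 10), ("User2", "Blogging", 13),
    ("User2", "Meditation", 21), ("User3", "Music", 11),
]

# The two projections, with duplicates removed but row order preserved.
emails = list(dict.fromkeys(row[0] for row in regular_user))       # A: 4 values
interests = list(dict.fromkeys(row[1] for row in user_interests))  # B: 3 values

pairs = list(product(emails, interests))   # every email with every interest
print(len(pairs))
for pair in pairs:
    print(pair)
```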
Divide By ÷
Find the emails of all users with at least all the interests of User1.
π Email,Interest(UserInterests) ÷ π Interest(σ Email='User1'(UserInterests))
R ÷ S = {r.A | r ∈ R and ∀(s ∈ S)(∃(t ∈ R)(t.A = r.A and t.B = s.B))}
R(A,B) means that R has attributes A and B. S(B) means that S has attribute B. This is always the structure of the divide-by operator. The output of the divide operator will have A, NOT B.
R(A,B): pi of Email and Interest of UserInterests just drops the SinceAge column:
Email  Interest
User1  Music
User1  Reading
User1  Tennis
User2  Swim
User2  Tennis
User3  Swim
User3  Tennis
User3  Music
User3  Reading
User4  DIY
User4  Music
User4  Reading
S(B): pi of Interest over the UserInterests rows with Email = User1 (duplicates removed):
Interest
Music
Reading
Tennis
R(A,B) ÷ S(B) = Z(A), whereby A is Email and B is Interest. The divisor S needs to have just the one attribute B.
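A sketch of divide-by in Python, assuming R is a set of (Email, Interest) pairs and S is the set of User1's interests. The rows are one reading of the R(A,B) table in these notes and should be treated as illustrative; the `divide` helper is a hypothetical name.

```python
# R(A, B) as (Email, Interest) pairs; an illustrative reading of the notes' table.
r = {
    ("User1", "Music"), ("User1", "Reading"), ("User1", "Tennis"),
    ("User2", "Swim"), ("User2", "Tennis"),
    ("User3", "Swim"), ("User3", "Tennis"), ("User3", "Music"), ("User3", "Reading"),
    ("User4", "DIY"), ("User4", "Music"), ("User4", "Reading"),
}

# S(B): the interests of User1.
s = {interest for email, interest in r if email == "User1"}


def divide(r, s):
    """R(A, B) / S(B): the A values that are paired in r with every B value in s."""
    candidates = {a for a, _ in r}
    return {a for a in candidates if s <= {b for a2, b in r if a2 == a}}


print(sorted(divide(r, s)))
```

User3 qualifies because its interests are a superset of User1's; User4 is missing Tennis and User2 is missing Music and Reading.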
Relational Calculus
Difference Between Relational Algebra (RA) and Calculus (RC)
Relational Algebra is procedural in nature. That is, it is operator based. It is a series of results (i.e., steps).
Calculus by contrast is declarative in nature. That is, you describe what you want, not how it is done (i.e.,
steps).
Factoids
SQL is mostly based on tuple calculus. Both RC and RA are equivalent in terms of horsepower. It is called
"calculus" because queries have variables over ranges of tuples.
Discussion
{t | P(t)} ... P is the predicate. Find tuples called "t" that satisfy the predicate "P". Predicates are made up of atoms.
Range Expression: t ∈ R is the same as R(t); this says that t is a tuple of relation R.
Attribute Value: t.A means the value of t on attribute A.
Constant: c is a constant.
Atoms: (i) t ∈ R, (ii) r.A θ s.B, or (iii) r.A θ c, where θ is a comparison operator (e.g., =, <, >).
Atoms are predicates.
Predicates can be:
Nested: (P_1), (P_2)
Negated: not(P_1)
Or'ed: P_1 OR P_2
And'ed: P_1 AND P_2
Implies: P_1 -> P_2
If P(t) is a predicate, t is a free variable in P, and R is a relation, then ∃(t ∈ R)(P(t)) and ∀(t ∈ R)(P(t)) are predicates.
Selection
Without Composite
Find all regular users
{r | r ∈ RegularUser}
With Composite
Find all RegularUsers who have the same CurrentCity and HomeTown or have HomeTown Atlanta
{r | r ∈ RegularUser AND (r.CurrentCity = r.HomeTown OR r.HomeTown='Atlanta')}
Projection
Find Email, BirthYear, and Sex for RegularUsers with HomeTown Atlanta
{r.Email, r.BirthYear, r.Sex | r ∈ RegularUser AND (r.HomeTown='Atlanta') 4}
Union (Related to OR)
Just like relational algebra, the Union is related to the logical OR. Think Venn diagrams.
Find all cities that are a CurrentCity or a HomeTown for some RegularUser
{s.City | ∃(r ∈ RegularUser)(s.City=r.CurrentCity) OR ∃(r ∈ RegularUser)(s.City=r.HomeTown)}
Intersection (Related to AND)
Find all cities that are a CurrentCity for some RegularUser and a HomeTown for some RegularUser
{s.City | ∃(r ∈ RegularUser)(s.City=r.CurrentCity) AND ∃(r ∈ RegularUser)(s.City=r.HomeTown)}
Difference (Related to AND NOT)
Think of the Venn diagram for relational algebra.
4 This is the selection predicate P(r).
Find all cities that are a CurrentCity for some RegularUser but exclude those that are a HomeTown for some RegularUser
{s.City | ∃(r ∈ RegularUser)(s.City=r.CurrentCity) AND NOT ∃(r ∈ RegularUser)(s.City=r.HomeTown)}
Natural Join
There are R and S tables with result T.
Find Email, Year, Sex, and Event when the BirthYear of RegularUser is the same as the EventYear of the Major60sEvents
{t.Email, t.Year, t.Sex, t.Event | ∃(r ∈ RegularUser) ∃(s ∈ Major60sEvents) (r.Year = s.Year AND t.Email = r.Email AND t.Year = s.Year AND t.Sex = r.Sex AND t.Event = s.Event)}
Cartesian Product
Combine all RegularUser tuples with all UserInterests tuples
{r, s | (r ∈ RegularUser) AND (s ∈ UserInterests)}
For an email blast, combine all users with the interests they don't have so they can be invited to join groups with those interests.
The results should have the Email and Interest attributes.
{r.Email, s.Interest | (r ∈ RegularUser) AND (s ∈ UserInterests) AND NOT(∃(t ∈ UserInterests)(r.Email = t.Email AND s.Interest = t.Interest))}
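The declarative flavor of the calculus maps closely onto a Python set comprehension, with `any(...)` playing the role of the existential quantifier. This is only a sketch; the rows reuse the RegularUser and UserInterests tables from the Cartesian product section of these notes.

```python
# Rows following the notes: (Email, BirthYear, Sex) and (Email, Interest, SinceAge).
regular_user = [
    ("User1", 1985, "M"), ("User2", 1969, "M"),
    ("User3", 1967, "M"), ("User12", 1974, "F"),
]
user_interests = [
    ("User1", "Music", 10), ("User2", "Blogging", 13),
    ("User2", "Meditation", 21), ("User3", "Music", 11),
]

# {r.Email, s.Interest | r in RegularUser AND s in UserInterests
#                        AND NOT exists t in UserInterests
#                            (r.Email = t.Email AND s.Interest = t.Interest)}
blast = {
    (r[0], s[1])
    for r in regular_user
    for s in user_interests
    if not any(t[0] == r[0] and t[1] == s[1] for t in user_interests)
}
for pair in sorted(blast):
    print(pair)
```

Of the 12 email/interest combinations, the 4 that already exist in UserInterests are excluded, leaving 8 invitations.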
Divide By
{r.Email | r ∈ UserInterests AND ∀(s ∈ UserInterests)((s.Email != 'User1') OR ∃(t ∈ UserInterests)(r.Email = t.Email AND t.Interest = s.Interest))}