the relational model - Computer Science and Engineering

advertisement
The Relational Model
Relations
A relation is a more
concrete construction, of
something we have seen
before, the ER diagram.
A relation is (just!) a
table!
We will use table and
relation interchangeably,
except where there is a
possibility of confusion.
S.S.N
street
name
city
students
name
S.S.N
street
city
Lisa 1272 Blaine Riverside
Bart 5592 Apple Irvine
Lisa 7552 11th
Riverside
Sue
5555 Main
Oceanside
The students relation
A relation consists of a relational schema and a relational instance.
A relation schema is essentially a list of column names with their
data types. In this case…
students(name : string, S.S.N : string, street : string, city : string)
• An relation instance
is made up of zero of
more tuples (rows,
records)
name
S.S.N
street
city
Lisa 1272 Blaine Riverside
Bart 5592 Apple Irvine
Lisa 7552 11th
Riverside
Sue
5555 Main
Oceanside
A schema specifies a relation’s name.
students(name : string, S.S.N : string, street : string, city : string)
A schema also specifies the name of each field, and its domain.
Fields are often referred to as columns, attributes, dimensions
A minor, but important point about relations, they are unordered.
name
S.S.N
street
city
name
S.S.N
city
street
Lisa
1272
Blaine
Riverside
Lisa
1272
Riverside
Blaine
Bart
5592
Apple
Irvine
Bart
5592
Irvine
Apple
Lisa
7552
11th
Riverside
Lisa
7552
Riverside
11th
Sue
5555
Main
Oceanside
Sue
5555
Oceanside Main
This is not a problem, since we refer to fields by name.
However sometimes, we refer to the fields by their column number,
in which case the ordering becomes important. I will point this out
when we get there.
Also, the tuples are unordered too!
Note that every tuple in our instance is unique. This is not a
coincidence. The definition of relation demands it.
Later we will see how we can represent weak entities in relations.
name
S.S.N
street
city
Lisa 1272 Blaine Riverside
Bart 5592 Apple Irvine
Lisa 7552 11th
Riverside
Sue
5592 Main
Oceanside
The number of fields is called the
degree (or arity, or dimensionality
of the relation).
Below we have a table of degree 4.
The number of tuples
cardinality of the relation
Of course, we don’t count the
row that has the labels!
To the right we have a table
of cardinality 3.
name
S.S.N
street
city
Lisa 1272 Blaine Riverside
Bart 5592 Apple Irvine
Lisa 7552 11th
Riverside
students(name : string, S.S.N : string, street : string, city : string)
Note that relations have primary keys, just like ER
diagrams. Remember that the primary key might not be
one field, it may be a combination of two or more fields.
name
S.S.N
street
city
Lisa 1272 Blaine Riverside
Bart 5592 Apple Irvine
Lisa 7552 11th
Riverside
Sue
5555 Main
Oceanside
Translating ER diagrams into Relations
We need to figure out how to translate ER diagrams into relations.
There are only three cases to worry about.
• Strong entity sets
• Weak entity sets
• Relationship sets
days
Name
Number
Course
Name
PID
Teaches
Professor
• Strong entity sets
days
Name
Number
Course
Name
PID
Teaches
Professor
professor(PID : string, name : string)
This is trivial, the primary key
of the ER diagram becomes
the primary key of the
relation. All other fields are
copied in (in any order)
PID
name
1234
3421
2342
4531
Keogh
Lee
Smyth
Lee
• Weak entity sets
days
Name
Number
Teaches
Course
Name
PID
Professor
course(PID : string, number : string, name : string)
PID
number
name
1234
3421
2342
4531
CS12
CS11
CS12
CS15
C++
Java
C++
LISP
The primary key of the relation
consists of the union of the primary
key of the strong entity set and the
discriminator of the weak entity set.
The “imported” key from the strong
entity set is called the foreign key.
All other fields are copied in (in any
order)
• Relationship entity sets
days
Name
Number
Teaches
Course
Name
PID
Professor
teaches(PID : string, days : string )
For one-to-one relationship sets,
the relation’s primary key can be
that of either entity set.
• For many-to-many relationship
sets, the union of the primary keys
becomes the relation’s primary key
•For the other cases, the the
relation’s primary key is taken from
the strong entity set.
PID
days
1234
3421
2342
4531
mwf
wed
tue
sat
So, this ER Model…
days
Name
Number
Course
Name
PID
Teaches
Professor
… maps to this database schema
professor(PID : string, name : string)
course(PID : string, number : string, name : string)
teaches(PID : string, days : string)
We have seen how to create a database
schema, how do we create an actual
database on our computers?
professor(PID : string, name : string)
course(PID : string, number : string, name : string)
teaches(PID : string, days : string)
…how do we create an actual database
on our computers?
We use SQL, a language that allows us
to build, modify and query databases.
professor(PID : string, name : string)
SQL (Structured Query Language)
• SQL is a language that allows us to build, modify and
query databases.
• SQL is an ANSI standard language. American National Standards Institute
• SQL is the “engine” behind Oracle, Sybase, Microsoft
SQL Server, Informix,Access, Ingres, etc.
• Most of these systems have build GUIs on top of the
command line interface, so you don’t normally write
statements directly in SQL (although you can).
Important Note
• In our textbook, the authors introduce SQL at the same
time as they introduce the relational model (Chapter 3).
• My plan is a little different. I plan to discuss operations
on databases (using relational algebra) in a more abstract
way, and revisit SQL later in the course.
• I encourage you to glance at the SQL material as you
read about the relational model in Chapter 3, but don’t
worry about the details of SQL just yet.
Relational Algebra
• Procedural language
• Five basic operators
• selection
• projection
• union
• set difference
• Cross product
SQL is closely based
on relational algebra.
select
project
(why no intersection?)
difference
Cartesian product
• The are some other operators which are composed of the above
operators. These show up so often that we give them special names.
• The operators take one or two relations as inputs and give a new
relation as a result.
Select Operation – Example
• Relation r
• A=B ^ D > 5 (r)
lowercase
Greek
sigma
A
B
C
D


1
7


5
7


12
3


23 10
A
B
C
D


1
7


23 10
Intuition: The select operation
allows us to retrieve some rows
of a relation (by “some” I mean
anywhere from none of them to
all of them)
Here I have retrieved all the
rows of the relation r where
either the value in field A
equals the value in field B, or
the value in field D is greater
than 5.
Select Operation
• Notation:  p(r)
• p is called the selection predicate
• Defined as:
lowercase Greek sigma 
p(r) = {t | t  r and p(t)}
Where p is a formula in propositional calculus consisting
of terms connected by :  (and),  (or),  (not)
Each term is one of:
<attribute> op
<attribute> or <constant>
where op is one of: =, , >, . <. 
• Example of selection:
 name=“Keogh(professor)
Project Operation – Example I
• Relation r:
• A,C (r)
Greek capital
letter pi
A
B
C

10
7

20
1

30
1

40
2
A
C

7

1

1

2
Intuition: The project operation
allows us to retrieve some
columns of a relation (by
“some” I mean anywhere from
none of them to all of them)
Here I have retrieved columns
A and C.
Project Operation – Example II
• Relation r:
• A,C (r)
A
B
C

10
1

20
1

30
1

40
2
Intuition: The project
operation removes
duplicate rows, since
relations are sets.
A
C
A
C

1

1

1

1

1

2

2
=
Here there are two rows
with A =  and C = 1. So
one was discarded.
Project Operation
• Notation:
A1, A2, …, Ak (r)
Greek capital letter pi
where A1, A2 are attribute names and r is a
relation name.
• The result is defined as the relation of k
columns obtained by erasing the columns that
are not listed
• Duplicate rows removed from result, since
relations are sets.
Union Operation – Example
Relations r, s:
A
B
A
B

1

2

2

3

1
s
r
r  s:
A
B

1

2

1

3
Intuition: The union
operation concatenates
two relations, and removes
duplicate rows (since
relations are sets).
Here there are two rows
with A =  and B = 2. So
one was discarded.
Union Operation
• Notation: r  s
• Defined as:
r  s = {t | t  r or t  s}
For r  s to be valid.
1. r, s must have the same arity (same number of attributes)
2. The attribute domains must be compatible (e.g., 2nd column
of r deals with the same type of values as does the 2nd
column of s).
Although the field types must be the same, the names can be
different. For example I can union professor and lecturer where:
professor(PID : string, name : string)
lecturer(LID : string, first_name : string)
Set Difference Operation – Example
Relations r, s:
A
B
A
B

1

2

2

3

1
s
r
r – s:
A
B

1

1
Intuition: The set
difference operation
returns all the rows that
are in r but not in s.
Set Difference Operation
• Notation r – s
• Defined as:
r – s = {t | t  r and t  s}
• Set differences must be taken between
compatible relations.
– r and s must have the same arity
– attribute domains of r and s must be compatible
• Note that in general r – s  s – r
Cross-Product Operation -Example
Relations r, s:
A
B

1

2
r
r x s:
C
D
E




10
10
20
10
a
a
b
b
s
A
B
C
D
E








1
1
1
1
2
2
2
2








10
19
20
10
10
10
20
10
a
a
b
b
a
a
b
b
Intuition: The cross
product operation
returns all possible
combinations of rows in
r with rows in s.
In other words the result
is every possible pairing
of the rows of r and s.
Cross-Product Operation-Example
Relations r, s:
A
B

1

2
r
r x s:
C
D
E




10
10
20
10
a
a
b
b
s
A
B
C
D
E








1
1
1
1
2
2
2
2








10
19
20
10
10
10
20
10
a
a
b
b
a
a
b
b
Intuition: The cross
product operation
returns all possible
combinations of rows in
r with rows in s.
In other words the result
is every possible pairing
of the rows of r and s.
Cross-Product Operation
• Notation r x s
• Defined as:
r x s = {t q | t  r and q  s}
• Assume that attributes of r(R) and s(S) are
disjoint. (That is, R  S = ).
• If attributes names of r(R) and s(S) are not
disjoint, then renaming must be used.
Composition of Operations
• We can build expressions using
multiple operations
• Example: A= C(r x s)
A
B

1

2
r
“take the cross product of r
and s, then return only the
rows where A equals B”
r x s:
C
D
E




10
10
20
10
a
a
b
b
s
A=C(r x s)
A
B
C
D
E








1
1
1
1
2
2
2
2








10
10
20
10
10
10
20
10
a
a
b
b
a
a
b
b
A
B
C
D
E



1
2
2
 10
 20
 20
a
a
b
Rename Operation
• Allows us to name, and therefore
to refer to, the results of
relational-algebra expressions.
Example:
 myRelation (r – s)
Take the set difference of r and s,
and call the result myRelation
Renaming in relational algebra is
essentiality the same as assignment
in a programming language
A
B

A
B
1


2
2


3
1
s
r
A
B

1

1
myRelation
Rename Operation
If a relational-algebra expression E
has arity n, then
x (A1, A2, …, An) (E)
A
B

A
B
1


2
2


3
1
returns the result of expression E under
the name X, and with the attributes
renamed to A1, A2, …., An.
Example
 myRelation(E,K) (r – s)
Take the set difference of
r and s, and call the
result myRelation,
while renaming the first
field E and the second
field K.
s
r
E
K

1

1
myRelation
Banking Examples
branch (branch-name, branch-city, assets)
customer (customer-name, customer-street, customer-only)
account (account-number, branch-name, balance)
loan (loan-number, branch-name, amount)
depositor (customer-name, account-number)
borrower (customer-name, loan-number)
Note that I have not indicated primary keys here for simplicity.
Quick note on notation
good_customers
bad_customers
customer-name loan-number
customer-name loan-number
Patty
1234
Seymour
3432
Apu
3421
Marge
3467
Selma
2342
Selma
7625
Ned
4531
Abraham
3597
If we have two or more relations which feature
the same attribute names, we could confuse them.
To prevent this we can use dot notation.
For example
good_customers.loan-number
Example Queries
• Find all loans of over $1200
amount > 1200 (loan)
loan
amount > 1200 (loan)
“select from the relation loan,
only the rows which have a
amount greater than 1200”
loan-number
1234
3421
2342
4531
branch-name amount
Riverside
1,923.03
Irvine
123.00
Dublin
56.25
Prague
120.03
1234
Riverside
1,923.03
Example Queries
• Find the loan number for each loan of an amount greater than $1200
loan-number (amount > 1200 (loan))
“select from the relation loan,
only the rows which have a
amount greater than 1200, then
project out just the
loan_number”
loan
amount > 1200 (loan)
loan-number (amount > 1200 (loan))
loan-number branch-name
amount
1234
Riverside
1,923.03
3421
Irvine
123.00
2342
Dublin
56.25
4531
Prague
120.03
1234
Riverside
1234
1,923.03
Example Queries
• Find all loans greater than $1200 or less than $75
amount > 1000 or amount < 75(loan)
“select from the relation loan, only
the rows which have a amount
greater than 1000 or an amount less
than 75
loan
amount > 1000 or amount < 75(loan)
loan-number branch-name
amount
1234
Riverside
1,923.03
3421
Irvine
123.00
2342
Dublin
56.25
4531
Prague
120.03
1234
Riverside
2342
Dublin
1,923.03
56.25
Example Queries
• Find the names of all customers who have a loan, an account, or
both, from the bank
customer-name (borrower)  customer-name (depositor)
depositor
borrower
customer-name loan-number
customer-name account-number
Patty
1234
Moe
3467
Apu
3421
Apu
2312
Selma
2342
Patty
9999
Ned
4531
Krusty
3423
customer-name (borrower)
Moe
Apu
customer-name (depositor)
Patty
Patty
Moe
Apu
Krusty
Apu
Selma
Selma
Patty
Ned
Ned
Krusty
Example Queries
Note this example is
split over two slides!
Find the names of all customers who have a loan at the Riverside branch.
customer-name (branch-name=“Riverside ” (borrower.loan-number = loan.loan-number(borrower x loan)))
borrower
We retrieve
borrower and
loan…
…we calculate
their cross
product…
loan
customer-name loan-number
loan-number branch-name amount
Patty
1234
1234
Riverside
Apu
3421
3421
Irvine
customer-name
borrower.loan
-number
loan.loannumber
branch-name
Patty
1234
1234
Riverside
Patty
1234
3421
Irvine
Apu
3421
1234
Riverside
Apu
3421
3421
Irvine
1,923.03
123.00
amount
1,923.03
123.00
1,923.03
123.00
customer-name (branch-name=“Riverside ” (borrower.loan-number = loan.loan-number(borrower x loan)))
…we calculate
their cross
product…
…we select the
rows where
borrower.loannumber is equal to
loan.loan-number…
…we select the
rows where
branch-name is
equal to
customer-name
borrower.loan
-number
loan.loannumber
branch-name
Patty
1234
1234
Riverside
Patty
1234
3421
Irvine
Apu
3421
1234
Riverside
Apu
3421
3421
Irvine
customer-name
borrower.loan
-number
loan.loannumber
branch-name
Patty
1234
1234
Riverside
Apu
3421
3421
Irvine
customer-name
borrower.loan
-number
loan.loannumber
branch-name
Patty
1234
1234
Riverside
“Riverside”
…we project out
the customer-name.
Patty
amount
1,923.03
123.00
1,923.03
123.00
amount
1,923.03
123.00
amount
1,923.03
Example Queries
Note this example is
split over three slides!
Find the largest account balance
...we will need to rename account relation as d...
balance(account) - account.balance(account.balance < d.balance (account x d (account)))
d
account
We do a rename to
get a “copy” of
account which we
call d…
… next we will do
a cross product…
account- balance
number
account- balance
number
Apu
100.30
Apu
100.30
Patty
12.34
Patty
12.34
Lenny
45.34
Lenny
45.34
balance(account) - account.balance(account.balance < d.balance (account x d (account)))
account.accountnumber
… do a cross
product…
…select out all rows
where account.balance
is less than
d.balance…
.. next we project…
account.
balance
d.account- d.balance
number
Apu
100.30 Apu
100.30
Apu
100.30 Patty
12.34
Apu
100.30 Lenny
45.34
Patty
12.34 Apu
100.30
Patty
12.34 Patty
12.34
Patty
12.34 Lenny
45.34
Lenny
45.34 Apu
100.30
Lenny
45.34 Patty
12.34
Lenny
45.34 Lenny
45.34
account.accountnumber
account.
balance
d.account- d.balance
number
Patty
12.34 Apu
Patty
12.34 Lenny
Lenny
45.34 Apu
100.30
45.34
100.30
balance(account) - account.balance(account.balance < d.balance (account x d (account)))
account.accountnumber
.. next we project out
account.balance…
…then we do a set
difference between it
an the original
account.balance from
the account relation…
… the set difference
leaves us with one
number, the largest
value!
account.
balance
d.account- d.balance
number
Patty
12.34 Apu
Patty
12.34 Lenny
Lenny
45.34 Apu
account.
balance
12.34
account
account- balance
number
Apu
100.30
Patty
12.34
Lenny
45.34
100.30
12.34
45.34
100.30
45.34
100.30
Formal Definition
• A basic expression in the relational algebra consists of
either one of the following:
– A relation in the database
– A constant relation
• Let E1 and E2 be relational-algebra expressions; the
following are all relational-algebra expressions:
– E1  E2
– E1 - E2
– E1 x E2
– p (E1), P is a predicate on attributes in E1
– s(E1), S is a list consisting of some of the attributes in E1
–  x (E1), x is the new name for the result of E1
Additional Operations
We define additional operations that do not add
any power to the relational algebra, but that
simplify common queries.
– Natural join
–
–
–
–
Conditional Join
Equi join
Division
Set intersection
All joins are really
special cases of
conditional join
Natural-Join Operation: Motivation
Very often we have a query and the
answer is not contained in a single
relation. For example, I might wish to
know where Apu banks.
The classic relational algebra way to do
such queries is a cross product, followed
by a selection which tests for equality on
some pair of fields.
borrower.l-number = loan.l-number(borrower x
loan)))
While this works…
• it is unintuitive
• it requires a lot of memory
• the notation is cumbersome
borrower
loan
cust-name l-number
l-number
branch
Patty
1234
1234
Dublin
Apu
3421
3421
Irvine
cust-name
borrower.l-number
loan.l-number
branch
Patty
1234
1234
Dublin
Patty
1234
3421
Irvine
Apu
3421
1234
Dublin
Apu
3421
3421
Irvine
cust-name
borrower.l-number
loan.l-number
branch
Patty
1234
1234
Dublin
Apu
3421
3421
Irvine
Note that is this example the two relations are the same size (2 by
2), this does not have to be the case.
So we have a more intuitive way of achieving the same effect,
the natural join, denoted by the
symbol
Natural-Join Operation: Intuition
Natural join combines a cross product and a selection into one
operation. It performs a selection forcing equality on those attributes
that appear in both relation schemes. Duplicates are removed as in
all relation operations.
So if the relations have one attribute in common, as in the last slide
(“l-number”), for example, we have…
borrower
loan
= borrower.l-number = loan.l-number(borrower x loan)))
There are two special cases:
• If the two relations have no attributes in common, then their
natural join is simply their cross product.
• If the two relations have more than one attribute in common,
then the natural join selects only the rows where all pairs of
matching attributes match. (lets see an example on the next slide).
A
l-name
f-name
age
Bouvier
Selma
40
Bouvier
Patty
40
Smith
Maggie 2
Both the l-name and the
f-name match, so select.
Only the f-names match,
so don’t select.
Only the l-names match,
so don’t select.
We remove duplicate
attributes…
The natural join of A and B
Note that this is just a way to visualize the natural join, we
don’t really have to do the cross product as in this example
B
l-name
f-name
ID
Bouvier
Selma
1232
Smith
Selma
4423
l-name
f-name
age
l-name
f-name
ID
Bouvier
Selma
40
Bouvier
Selma
1232
Bouvier
Patty
40
Smith
Selma
4423
Smith
Maggie 2
Bouvier
Selma
1232
Bouvier
Selma
40
Smith
Selma
4423
Bouvier
Patty
40
Bouvier
Selma
1232
Smith
Maggie 2
Smith
Selma
4423
l-name
f-name
age
l-name
f-name
ID
Bouvier
Selma
40
Bouvier
Selma
1232
A
B=
l-name
f-name
age
ID
Bouvier
Selma
40
1232
Natural-Join Operation
• Notation: r s
• Let r and s be relations on schemas R and S respectively.The result is
a relation on schema R  S which is obtained by considering each
pair of tuples tr from r and ts from s.
• If tr and ts have the same value on each of the attributes in R  S, a
tuple t is added to the result, where
– t has the same value as tr on r
– t has the same value as ts on s
• Example:
R = (A, B, C, D)
S = (E, B, D)
• Result schema = (A, B, C, D, E)
• r s is defined as:
r.A, r.B, r.C, r.D, s.E (r.B = s.B r.D = s.D (r x s))
Natural Join Operation – Example
• Relations r, s:
A
B
C
D
B
D
E





1
2
4
1
2





a
a
b
a
b
1
3
1
2
3
a
a
a
b
b





r
r
s
s
A
B
C
D
E





1
1
1
1
2





a
a
a
a
b





How did we get here?
Lets do a trace over the
next few slides…
Warning! Example spread
over many slides, you may
wish to edit before printing.
A
B
C
D
B
D
E





1
2
4
1
2





a
a
b
a
b
1
3
1
2
3
a
a
a
b
b





r
s
First we note which attributes the two relations have in common…
A
B
C
D
B
D
E





1
2
4
1
2





a
a
b
a
b
1
3
1
2
3
a
a
a
b
b





r
s
A
B
C
D
E


1
1


a
a


There are two rows in s that match our first row in r, (in the relevant
attributes) so both are joined to our first row…
A
B
C
D
B
D
E





1
2
4
1
2





a
a
b
a
b
1
3
1
2
3
a
a
a
b
b





r
s
A
B
C
D
E


1
1


a
a


…there are no rows in s that match our second row in r, so do
nothing…
A
B
C
D
B
D
E





1
2
4
1
2





a
a
b
a
b
1
3
1
2
3
a
a
a
b
b





r
s
A
B
C
D
E


1
1


a
a


…there are no rows in s that match our third row in r, so do
nothing…
A
B
C
D
B
D
E





1
2
4
1
2





a
a
b
a
b
1
3
1
2
3
a
a
a
b
b





r
s
A
B
C
D
E




1
1
1
1




a
a
a
a




There are two rows in s that match our fourth row in r, so both are
joined to our fourth row…
A
B
C
D
B
D
E





1
2
4
1
2





a
a
b
a
b
1
3
1
2
3
a
a
a
b
b





r
s
A
B
C
D
E





1
1
1
1
2





a
a
a
a
b





There is one row that matches our fifth row in r,.. so it is joined to
our fifth row and we are done!
Conditional-Join Operation:
The conditional join is actually the most general type of join. I
introduced the natural join first only because it is more intuitive
and.. natural!
Just like natural join, conditional join combines a cross product and
a selection into one operation. However instead of only selecting
rows that have equality on those attributes that appear in both
relation schemes, we allow selection based on any predicate.
r
c
s = c(r x s)
Where c is any predicate
the attributes of r and/or s
Duplicate rows are removed as always, but duplicate columns are
not removed!
Conditional-Join Example:
We want to find all women that are older than their husbands…
r
l-name
f-name marr-Lic
age
l-name
Simpson
Marge
777
35
Simpson Homer
Lovejoy
Helen
234
38
Flanders
Maude 555
24
Krabappel Edna
r
978
s
Lovejoy
f-name
marr-Lic age
777
36
Timothy 234
36
Simpson Bart
null
9
40
r.age > s.age AND r.Marr-Lic = r.Marr-Lic
s
r.l-name
r.f-name
r.Marr-Lic
r.age
s.l-name
s.f-name
s.marr-Lic
s.age
Lovejoy
Helen
234
38
Lovejoy
Timothy
234
36
Note we have removed ambiguity of attribute names by using “dot” notation
Also note the redundant information in the marr-lic attributes
Set-Intersection Operation - Example
Relation r, s:
A
B



1
2
1
A


2
3
s
r
rs
B
A
B

2
Intuition: The
intersection operation
returns all the rows that
are in both r and s.
Set-Intersection Operation
•
•
•
•
Notation: r  s
Defined as:
r  s ={ t | t  r and t  s }
Assume:
– r, s have the same arity
– attributes of r and s are compatible
• Note: r  s = r - (r - s)
r /s
Division Operation
• Suited to queries that include the phrase “for all”.
• Let r and s be relations on schemas R and S
respectively where
– R = (A1, …, Am, B1, …, Bn)
– S = (B1, …, Bn)
The result of r / s is a relation on schema
R – S = (A1, …, Am)
r / s = { t | t   R-S(r)   u  s ( tu  r ) }
Division Operation – Example
Relations r, s:
A
B
B











1
2
3
1
1
1
3
4
6
1
2
1
2
s
r
 occurs in the presence of both 1 and 2, so it is returned.
 occurs in the presence of both 1 and 2, so it is returned.
 does not occur in the presence of both 1 and 2, so is ignored.
...
A
r / s:


Another Division Example
Relations r, s:
A
B
C
D
E
D
E








a
a
a
a
a
a
a
a








a
a
b
a
b
a
b
b
1
1
1
1
3
1
1
1
a
b
1
1
r
s
r /s:
A
B
C


a
a


<, a , > occurs in the presence of both <a,1> and <b,1>, so it is returned.
< , a , > occurs in the presence of both <a,1> and <b,1>, so it is returned.
<, a , > does not occur in the presence of both <a,1> and <b,1>, so it is ignored.
Assignment Operation
• The assignment operation () provides a convenient way to
express complex queries, write query as a sequential program
consisting of a series of assignments followed by an
expression whose value is displayed as a result of the query.
• Assignment must always be made to a temporary relation
variable.
• Example: Write r  s as
temp1  R-S (r)
temp2  R-S ((temp1 x s) – R-S,S (r))
result = temp1 – temp2
– The result to the right of the  is assigned to the relation variable on the
left of the .
– May use variable in subsequent expressions.
Extended Relational-Algebra-Operations
• Generalized Projection
• Outer Join
• Aggregate Functions
Generalized Projection
• Extends the projection operation by allowing arithmetic
functions to be used in the projection list.
 F1, F2, …, Fn(E)
• E is any relational-algebra expression
• Each of F1, F2, …, Fn are are arithmetic expressions
involving constants and attributes in the schema of E.
• Given relation credit-info(customer-name, limit, creditbalance), find how much more each person can spend:
customer-name, limit – credit-balance (credit-info)
Generalized Projection
Given relation credit-info(customer-name, limit, creditbalance), find how much more each person can spend:
customer-name, limit – credit-balance (credit-info)
customer-name
credit-info
limit
credit-balance
Simpson, Marge
500 400
Lovejoy, Helen
2000 1500
Flanders, Maude
0 0
Krabappel, Edna
50 11
100
500
0
39
Aggregate Functions and Operations
• Aggregation function takes a collection of values and returns a
single value as a result.
avg: average value
min: minimum value
max: maximum value
sum: sum of values
count: number of values
• Aggregate operation in relational algebra
G1, G2, …, Gn
–
–
–
–
g F1( A1), F2( A2),…, Fn( An) (E)
E is any relational-algebra expression
G1, G2 …, Gn is a list of attributes on which to group (can be empty)
Each Fi is an aggregate function (i.e avg, min, max etc)
Each Ai is an attribute name
Aggregate Operation – Example
• Relation r:
g sum(c) (r)
A
B
C








7
7
3
10
sum-C
27
i.e we want to find the sum of all the numbers in attribute C
Aggregate Operation – Example
Relation account grouped by last-name:
account
i.e calculate the
total balances,
grouped by lastname.
last-name
g
last-name
account-number
balance
Simpson
Simpson
Flanders
Flanders
Nahasapeemapetilon
A-102
A-201
A-217
A-215
A-222
400
900
750
750
11700
sum(balance)
(account)
last-name
balance
Simpson
Flanders
Nahasapeemapetilon
1300
1500
11700
Yes yes, I make
good money, but
I was shot 14
times last year
Outer Join
• An extension of the join operation that avoids loss of
information.
• Computes the join and then adds tuples from one
relation that does not match tuples in the other
relation to the result of the join.
• Uses null values:
– null signifies that the value is unknown or does not exist
– All comparisons involving null are (roughly speaking) false
by definition.
• Will study precise meaning of comparisons with nulls later
Outer Join – Example
• Relation loan
 Relation borrower
loan-number
L-170
L-230
L-260
branch-name
Springfield
Shelbyville
Dublin
amount
3000
4000
1700
customer-name loan-number
Simpson
Wiggum
Flanders
L-170
L-230
L-155
Outer Join – Example
loan-number
• Inner Join
loan
L-170
L-230
branch-name
Springfield
Shelbyville
amount
3000
4000
customer-name
Simpson
Wiggum
Borrower
• Left Outer Join
loan
borrower
loan-number
L-170
L-230
L-260
branch-name
Springfield
Shelbyville
Dublin
amount
3000
4000
1700
customer-name
Simpson
Wiggum
null
Outer Join – Example
loan-number
Right Outer Join
loan
L-170
L-230
L-155
Springfield
Shelbyville
null
loan-number
branch-name
amount
3000
4000
null
customer-name
Simpson
Wiggum
Flanders
borrower
Full Outer Join
loan
branch-name
borrower
L-170
L-230
L-260
L-155
Springfield
Shelbyville
Dublin
null
amount
3000
4000
1700
null
customer-name
Simpson
Wiggum
null
Flanders
Null Values
• It is possible for tuples to have a null value, denoted by null, for
some of their attributes
• null signifies an unknown value or that a value does not exist.
• The result of any arithmetic expression involving null is null.
• Aggregate functions simply ignore null values
– Is an arbitrary decision. Could have returned null as result instead.
– We follow the semantics of SQL in its handling of null values
• For duplicate elimination and grouping, null is treated like any
other value, and two nulls are assumed to be the same
– Alternative: assume each null is different from each other
– Both are arbitrary decisions, so we simply follow SQL
Null Values
• Comparisons with null values return the special truth value unknown
– If false was used instead of unknown, then not (A < 5)
would not be equivalent to
A >= 5
• Three-valued logic using the truth value unknown:
– OR: (unknown or true)
= true,
(unknown or false)
= unknown
(unknown or unknown) = unknown
– AND: (true and unknown)
= unknown,
(false and unknown)
= false,
(unknown and unknown) = unknown
– NOT: (not unknown) = unknown
– In SQL “P is unknown” evaluates to true if predicate P evaluates to unknown
• Result of select predicate is treated as false if it evaluates to
unknown
Download