How to Write Correct SQL and Know It:
A Relational Approach to SQL
a technical seminar for DBAs, data architects,
DBMS implementers, database application programmers,
and other database professionals
by
C. J. Date
Copyright © C.J. Date 2008. All rights reserved. No part of this material may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical,
photographic, or otherwise, without the explicit written permission of the copyright owner.
THESIS :
1. You’re an SQL professional
2. But SQL is complicated and difficult (much more so than
SQL advocates would have you believe)
3. And testing can never be exhaustive
4. So to have a hope of writing correct SQL, you must
follow some discipline
5. Q: What discipline?
A: Discipline of using SQL relationally
6. So you must know relational theory thoroughly too
(as well as SQL itself)
Copyright C. J. Date 2008
page 1
USING SQL RELATIONALLY :
Why is this a good idea? What does it mean? Isn’t SQL
relational anyway?
And in any case ... What does "SQL" mean?*
Objectives:
1. Cover relational theory thoroughly
/* what it is but not always why */
2. Apply that theory to SQL practice
/* and explain esoteric SQL features */
* Ignore (e.g.) OLAP, dynamic SQL, user defined types,
and other nonrelational stuff
PREREQUISITES :
This seminar is not for complete beginners ...
but it's not just a refresher course, either!
Aimed at database professionals:
• Know SQL reasonably well
• Know that relational theory is A Good Thing
Sadly, if your "relational" knowledge derives from SQL
alone, you won't know the relational model as well as you
should, and you might know some things that ain't so
SQL ≠ the relational model !!!
FOR EXAMPLE :
• What exactly is first normal form?
• What’s the connection between relations and predicates?
• What’s semantic optimization?
• What’s an image relation?
• What’s semidifference and why is it important?
• Why doesn’t deferred integrity checking make sense?
• What’s a relation variable?
• What’s prenex normal form?
• Can a relation have an attribute whose values are themselves relations?
• Is SQL relationally complete?
• What’s The Information Principle?
• How does XML fit with the relational model?
TERMINOLOGY :
Relational terms when discussing relational theory—
relation, tuple, attribute (etc.); SQL terms when discussing
SQL—table, row, column (etc.)
Note: The equivalences are not exact!
One term I’ll use in connection with both relational theory
and SQL: operator (SQL uses operator, function,
procedure, routine, method, but they all mean the same
thing, pretty much)
Thus, e.g., "=", ":=", "+", SELECT,
DISTINCT, UNION, SUM,
GROUP BY, etc., etc., are all operators
WHY DO YOU NEED TO KNOW
RELATIONAL THEORY ???
Because it's PRINCIPLES ... FOUNDATIONS ...
Professionals should know the foundations of their field
Technology and products (and SQL) change all the time,
but principles ENDURE ... Hence emphasis on:
• Principles, not products
• Foundations, not fads
Compromises and tradeoffs might be necessary in "the real
world" but should always be made from a position of
conceptual strength
SOME NICE QUOTES :
Those who are enamored of practice without theory are
like a pilot who goes into a ship without rudder or
compass and never has any certainty where he [sic] is
going. Practice should always be based on a sound
knowledge of theory.
—Leonardo da Vinci (1452-1519)
Languages die ... mathematical ideas do not.
—G. H. Hardy (1877-1947)
THEORY
IS
PRACTICAL !
UNFORTUNATELY ...
The gap between theory and practice
is not as wide in theory
as it is in practice
—Anon.
CODD’S ORIGINAL RELATIONAL MODEL :
AN OVERVIEW
STRUCTURE:
  types ("domains")
  n-ary relations
  attributes, tuples
  keys: candidate, primary, foreign
  e.g., DEPT ( DNO , LOC , BUDGET ) and EMP ( ENO , ENAME , DNO , SAL )
INTEGRITY:
  entity integrity
  referential integrity
  /* but I don't believe in nulls !!! */
MANIPULATION:
  relational algebra: /* see later re relational calculus */
    intersection, union, difference, product
    restrict, project, join, divide
  relational assignment
CODD’S ORIGINAL RELATIONAL ALGEBRA :
AN OVERVIEW
[Diagram: schematic pictures of the eight original operators: restrict (select), project, product, union, intersect, difference, (natural) join, divide]
THE SUPPLIERS-AND-PARTS DATABASE :
S
SNO   SNAME   STATUS   CITY
S1    Smith   20       London
S2    Jones   10       Paris
S3    Blake   30       Paris
S4    Clark   20       London
S5    Adams   30       Athens

P
PNO   PNAME   COLOR   WEIGHT   CITY
P1    Nut     Red     12.0     London
P2    Bolt    Green   17.0     Paris
P3    Screw   Blue    17.0     Oslo
P4    Screw   Red     14.0     London
P5    Cam     Blue    12.0     Paris
P6    Cog     Red     19.0     London

SP
SNO   PNO   QTY
S1    P1    300
S1    P2    200
S1    P3    400
S1    P4    200
S1    P5    100
S1    P6    100
S2    P1    300
S2    P2    400
S3    P2    200
S4    P2    200
S4    P4    300
S4    P5    400
MODEL vs. IMPLEMENTATION :
Unfortunately the term "data model" is used in the IT world
with two very different meanings:
Data model (first sense): An abstract, self-contained,
logical definition of the objects, operators, and so forth, that
together make up the abstract machine with which users
interact. The objects allow us to model the structure of
data. The operators allow us to model its behavior.
Implementation: The physical realization on a real
machine of the components of the abstract machine that
together constitute the data model in question.
MODEL vs. IMPLEMENTATION (cont.) :
Data model (second sense): A model of the persistent data
of some particular enterprise (i.e., a logical DB design).
First meaning: Like a programming language, whose
constructs can be used to solve many specific problems,
but in and of themselves have no direct connection with any
such specific problem
Second meaning: Like a specific program written in that
language—uses the facilities provided by the model (first
meaning) to solve some specific problem
MODEL vs. IMPLEMENTATION (cont.) :
From here on "model" means the first sense (barring explicit
statements to the contrary)
Don’t confuse model vs. implementation !!! ... e.g., don’t
confuse keys vs. unique indexes
Model vs. implementation implies (physical) data
independence ... Hence protection of investment
Everything to do with performance is primarily an
implementation, not a model, issue!
/* and recommendations to follow are almost NEVER */
/* driven by performance concerns ...             */
E.g., "JOINS ARE SLOW" :
MAKES NO SENSE !!!
S JOIN SP      /* good */
vs.            /* bad  */
do for all tuples in S ;
fetch S tuple into TS , TN , TT , TC ;
do for all tuples in SP with SNO = TS ;
fetch SP tuple into TS , TP , TQ ;
emit tuple TS, TN , TT , TC , TP , TQ ;
end ;
end ;
Recommendation: Don’t do this!
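The point can be sketched in Python. This is only an illustration under simplifying assumptions: relations are modeled as sets of tuples built from dicts, types are ignored, and the function names `nested_loop_join` and `hash_join` are made up for this sketch. Two different implementation strategies yield the same relation, which is all that S JOIN SP specifies; speed belongs to the strategy, not to the join.

```python
# Relations sketched as lists of dicts; attribute names follow the slides.
S = [
    {"SNO": "S1", "SNAME": "Smith", "STATUS": 20, "CITY": "London"},
    {"SNO": "S2", "SNAME": "Jones", "STATUS": 10, "CITY": "Paris"},
]
SP = [
    {"SNO": "S1", "PNO": "P1", "QTY": 300},
    {"SNO": "S1", "PNO": "P2", "QTY": 200},
    {"SNO": "S2", "PNO": "P1", "QTY": 300},
]

def as_relation(rows):
    """Turn rows into a set of tuples (no duplicates, no ordering)."""
    return {frozenset(r.items()) for r in rows}

def nested_loop_join(s, sp):
    """One possible implementation: the hand-coded loop from the slide."""
    return as_relation({**ts, **tsp} for ts in s for tsp in sp
                       if ts["SNO"] == tsp["SNO"])

def hash_join(s, sp):
    """Another possible implementation: build a lookup table on SNO first."""
    by_sno = {}
    for tsp in sp:
        by_sno.setdefault(tsp["SNO"], []).append(tsp)
    return as_relation({**ts, **tsp} for ts in s
                       for tsp in by_sno.get(ts["SNO"], []))

result_a = nested_loop_join(S, SP)
result_b = hash_join(S, SP)
```

Writing the loop by hand in application code fixes one strategy forever; stating the join leaves the choice to the DBMS.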
PROPERTIES OF RELATIONS :
• Every relation has a heading
(set of attribute names; more precisely,
attribute-name:type-name pairs, but
informally we often ignore the types)
and a body (set of tuples)
• No. of attributes = degree, no. of tuples = cardinality
• Relations never contain duplicate tuples
/* SQL fails here */
• The tuples of a relation are unordered, top to bottom
• The attributes of a relation are unordered, left to right
/* SQL fails here */
NOTE THAT :
• Every subset of a tuple is a tuple ... Every subset of a
heading is a heading ... Every subset of a body is a body
• Tuple equality: Two tuples EQUAL iff (= if and only if)
   - Same attributes (i.e., same attribute-name/type-name pairs)
   - And attributes with same name have same attribute value
I.e., iff they're the same tuple !!!
• Two tuples are duplicates iff they're equal
• MANY features of the relational model rely on the above
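These properties can be illustrated with a small Python sketch. It is an approximation only: a tuple is modeled as a frozenset of (attribute name, value) pairs, type names are left out, and `make_tuple` is a helper invented for the illustration.

```python
def make_tuple(**components):
    """A tuple sketched as a set of (attribute name, value) pairs."""
    return frozenset(components.items())

t1 = make_tuple(SNO="S1", CITY="London")
t2 = make_tuple(CITY="London", SNO="S1")   # same components, different written order
t3 = make_tuple(SNO="S2", CITY="Paris")

# A relation body is a set of tuples, so duplicates collapse automatically.
body = {t1, t2, t3}

# Every subset of a tuple is a tuple.
sub = frozenset(c for c in t1 if c[0] == "SNO")
```

Here `t1` and `t2` are the very same tuple, so `body` has cardinality two, and `sub` is the tuple with the single component SNO 'S1'.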
MORE ON RELATIONS :
• Relations are always normalized (i.e., in first normal
form, 1NF)
• A relation and a table aren’t the same thing!
   - A table can be regarded as a CONCRETE picture of
     an ABSTRACT idea (but it’s a significant advantage
     of the relational model that its fundamental data
     objects have such a simple and easily understood
     concrete representation)
• Base vs. derived relations /* see next page */
BASE vs. DERIVED RELATIONS :
Rel ops let us start with given rels and derive further rels
(e.g., by doing queries) ... Given rels are base ones, others
are derived
Must be able to define base ones (CREATE TABLE in SQL)
and base ones must be named
Certain derived rels—in particular, views (aka virtual rels)—
are named too: e.g.,
CREATE VIEW SST_PARIS
AS SELECT SNO , STATUS
FROM S
WHERE CITY = 'Paris' ;
Value of view at time t = result of evaluating defining
expression at time t
Can operate on views as if they were base rels ...
Can think of view as being conceptually materialized
at time of reference
• But it isn’t really materialized!
/* at least, we hope not */
• And materialization wouldn’t work for updates anyway
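A short sketch using Python's standard sqlite3 module shows the "value at time t" idea for the SST_PARIS view. The sample rows are taken from the suppliers table; the helper `view_value` and the choice of SQLite are assumptions made for the illustration, not part of the seminar.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE S (SNO TEXT NOT NULL UNIQUE, SNAME TEXT NOT NULL,
                    STATUS INTEGER NOT NULL, CITY TEXT NOT NULL);
    INSERT INTO S VALUES ('S1','Smith',20,'London'), ('S2','Jones',10,'Paris'),
                         ('S3','Blake',30,'Paris');
    CREATE VIEW SST_PARIS AS SELECT SNO, STATUS FROM S WHERE CITY = 'Paris';
""")

def view_value():
    """Value of the view now = result of evaluating its defining expression now."""
    return set(conn.execute("SELECT SNO, STATUS FROM SST_PARIS"))

before = view_value()
conn.execute("UPDATE S SET CITY = 'Paris' WHERE SNO = 'S1'")
after = view_value()   # reflects the base table as of this moment
```

No copy of the view was refreshed between the two calls; the second result simply comes from evaluating the same expression against the new value of S.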
POPULAR MISCONCEPTIONS :
What you often hear:
• Base rels "physically exist"
• Views don’t "physically exist"
Wrong!
RM deliberately has nothing to say about physical storage
matters!
Also ... it’s all relations !!!
FROM A RECENT TEXTBOOK :
"[It] is important to make a distinction between stored
relations, which are tables, and virtual relations, which
are views ... [We] shall use relation only where a table
or a view could be used. When we want to emphasize that
a relation is stored, rather than a view, we shall sometimes
use the term base relation or base table."
• How many confusions here?
• No wonder there's so much confusion out there, if this
is typical of the quality of the teaching (which it
probably is)
ONE FURTHER (important) PRELIMINARY :
RELATIONS vs. RELVARS
Historically there has been much confusion between
relations as such (i.e., relation values) and relation
variables
Consider: DECLARE N INTEGER ...      /* programming language */
N is an integer variable whose values are
integers per se
Likewise: CREATE TABLE T ...         /* SQL */
T is a relation variable whose values are
relations per se /* ignoring SQL quirks */
For example:
Relation variable S, current relation value:

SNO   SNAME   STATUS   CITY
S1    Smith   20       London
S2    Jones   10       Paris
S3    Blake   30       Paris

DELETE S WHERE CITY = 'Paris' ;

Shorthand for:

S := S WHERE NOT ( CITY = 'Paris' ) ;

Relation variable S, current relation value after the assignment:

SNO   SNAME   STATUS   CITY
S1    Smith   20       London
HENCE :
• INSERT / DELETE / UPDATE are all shorthand for some
relational assignment, and—by definition—they all
assign some relation value to some relation variable
• A relation variable or relvar is a variable whose permitted
values are relations
Base (or real) relvar: One that isn’t virtual
Virtual relvar: One that’s defined by means of some
specified relational expression in terms
of one or more other relvars
• Henceforth: “Relation” means relation / “relvar” means
relvar! ... and we ought to start again
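The "update is shorthand for assignment" idea can be sketched in Python, where a set value is immutable in the relevant sense and a name plays the part of the relvar. As a simplification, tuples are positional here (SNO, SNAME, STATUS, CITY), which a true relational tuple is not.

```python
S = {
    ("S1", "Smith", 20, "London"),
    ("S2", "Jones", 10, "Paris"),
    ("S3", "Blake", 30, "Paris"),
}
old_value = S   # a relation value; it never changes

# DELETE S WHERE CITY = 'Paris'  is shorthand for
# S := S WHERE NOT ( CITY = 'Paris' )
S = {t for t in S if not (t[3] == "Paris")}

# INSERT is likewise shorthand for  S := S UNION <some relation>
S = S | {("S4", "Clark", 20, "London")}
```

After both assignments the variable S holds a different relation value, while the value it held originally (`old_value`) is exactly what it always was.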
BY THE WAY :
SQL doesn’t support relational assignment as such ...
So foregoing example
S := S WHERE NOT ( CITY = 'Paris' ) ;
is expressed in Tutorial D ... Self-explanatory (?) "toy"
language used by Date and Darwen to illustrate the ideas
of The Third Manifesto
In what follows, I’ll use Tutorial D to illustrate relational
concepts (as well as showing SQL analogs where
applicable)
ASIDE : THE THIRD MANIFESTO
C. J. Date and Hugh Darwen: Databases, Types, and the
Relational Model: The Third Manifesto (3rd edition,
Addison-Wesley, 2006)
• Proposal for future direction of data and DBMSs
• D = any language that conforms to Manifesto principles
(generic name)
• Tutorial D = language used in Manifesto book as a
basis for examples
See www.thethirdmanifesto.com
VALUES vs. VARIABLES IN GENERAL :
• VALUE : an "individual constant"
   - no location in time or space
   - can’t be changed
   - can be represented in memory (by some encoding)
• VARIABLE : a holder for (the representation of) a value
   - has location in time and space
   - can be updated (i.e., current value can be
     replaced by another)
Important note: Values and variables (more fundamentally,
types) can be arbitrarily complex
Hard to imagine people getting confused over such a basic
distinction, but they do ...
VALUE vs. VARIABLE CONFUSION :
AN EXAMPLE :
"We distinguish the declared type of a variable from ...
the type of the object that is the current value of the
variable ...
(so an object is a value)
"... we distinguish objects from values ...
(so an object isn't a value after all) — ???
"... a MUTATOR [is an operation such that it's] possible
to observe its effect on some object."
(in fact, an object is a variable) — ?????
A GUIDING PRINCIPLE AND
A GREAT AID TO CLEAR THINKING :
All logical differences are big differences
—Wittgenstein
Examples:
• Model vs. implementation
• Value vs. variable
• Relation vs. relvar
• Base relvar vs. view
• Data model (1st sense) vs. data model (2nd sense)
• Relation vs. table
• Attribute vs. column
• Tuple vs. row
• SQL vs. relational model
• DB vs. DBMS
• Expression vs. statement
STRUCTURE OF PRESENTATION :
1. Setting the scene
2. Types and domains
3. Tuples and relations, rows and tables
4. No duplicates, no nulls
5. Base relvars, base tables
6. SQL and algebra I: The original operators
7. SQL and algebra II: Additional operators
8. SQL and constraints
9. SQL and views
10. SQL and logic I: Relational calculus
11. SQL and logic II: Using logic to write SQL
12. Further SQL topics
13. Appendix: The relational model
14. Appendix: DB design
RELATIONS ARE DEFINED OVER TYPES :
RM implies support for user defined types—hence, user
defined operators also—hence, an "object/relational"
DBMS done right is just a relational DBMS done right!
RM attributes can be of any type whatsoever, except (a) no
pointer valued attributes; (b) relation r cannot have an
attribute of the same type as r itself (see later)
But whole point about user defined types is: They look just
like system defined types to other users ... So I’ll just
assume types are system defined (mostly)
RM prescribes type BOOLEAN ... Assume CHAR,
INTEGER, FIXED available too /* see later for SQL */
DOMAINS AND TYPES ARE THE SAME THING :
1. Equality comparisons and "domain check override" (DCO):
domains really are types ...
Note: Assume for sake of discussion that SNO attribs
in S and SP are of user defined type SNO ...
PNO attribs in P and SP are of user defined type PNO
Caveat: Only fair to warn you that I discuss "DCO" only
to dismiss it ... as we’ll see
2. Data value atomicity and first normal form:
... of arbitrary complexity
EQUALITY COMPARISONS :
• "Everyone knows" that two values can be tested for equality
only if they come from the same domain
E.g., with suppliers and parts:
SP.SNO = S.SNO /* OK */
SP.PNO = S.SNO /* not OK */
• Any relational op (join, union, etc.) that calls for an explicit or
implicit equality comparison between values from different
domains should fail /* at compile time */
E.g., SELECT S.SNO , S.SNAME , S.STATUS , S.CITY
      FROM   S
      WHERE  NOT EXISTS
           ( SELECT *
             FROM   SP
             WHERE  SP.PNO = S.SNO )   /* not OK */
The comparison SP.PNO = S.SNO is probably a typo
EQUALITY COMPARISONS (cont.) :
• Comparison "SP.PNO = S.SNO" is INVALID,
unless user insists ...
(Codd's "domain check override" ops)
• BUT, according to Codd:
P.WEIGHT = SP.QTY         /* not OK        */
P.WEIGHT - SP.QTY = 0     /* OK ... ?!?!?  */
"... DBMS checks that the basic data types are the same"
[Codd's book on RM/V2 p.47, italics added]
• So there’s something strange about Codd-style domain checks
in the first place, let alone "domain check override"
"DOMAIN CHECK OVERRIDE" :
Indeed, "domain check override" (DCO) is not the appropriate
concept (in fact, it makes no sense AT ALL*) ...
Consider comparisons:
S.SNO = 'X4'        valid
P.PNO = 'X4'        valid
S.SNO = P.PNO       invalid
What's going on ???
Well ...
* Stems from failure to recognize another logical difference!
(see next page)
• SNO, PNO are types, represented internally in terms of
type CHAR, say, but representation is (or should be)
irrelevant and HIDDEN! (it’s an implementation issue)
... Logical difference between type and representation
• Also selector operators SNO, PNO that effectively
convert CHAR values to types SNO, PNO, invoked
implicitly in:
S.SNO = 'X4'
P.PNO = 'X4'
(i.e., strings coerced to type SNO or PNO: see later)
• Plus operators for inverse conversions too (in effect)
• This mechanism provides domain checking and "DCO"
capability in a clean, fully orthogonal, non ad hoc manner
• What we’re really talking about is
STRONG TYPING
• Which incidentally would correctly deal with
expressions such as
P.WEIGHT * SP.QTY      ( valid: result of type WEIGHT )
P.WEIGHT + SP.QTY      ( invalid )
SPX.QTY + SPY.QTY      ( valid: result of type QTY ) /* SPX and SPY both shipments */
etc., etc.
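A rough Python sketch of selectors plus type-checked equality follows. The classes `SNO` and `PNO`, the field name `value`, and the function `eq` are hypothetical names for this illustration, and the check happens at run time here, whereas the slides call for it at compile time.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SNO:
    value: str   # hidden representation (assumed here to be a character string)

@dataclass(frozen=True)
class PNO:
    value: str

def eq(a, b):
    """Type-checked "=": both comparands must be of the same type."""
    if type(a) is not type(b):
        raise TypeError(f"cannot compare {type(a).__name__} with {type(b).__name__}")
    return a == b

same = eq(SNO("X4"), SNO("X4"))   # selector invocations, then comparison

try:
    eq(SNO("X4"), PNO("X4"))      # supplier number vs. part number
    mixed = "allowed"
except TypeError:
    mixed = "rejected"

# No special "override" is needed: convert both sides explicitly and compare
# the resulting character strings.
override = eq(SNO("X4").value, PNO("X4").value)
```

The last line is the clean counterpart of "domain check override": the user asks for the conversion explicitly, and the comparison is then between two values of the same type.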
DATA VALUE ATOMICITY :
• First normal form (1NF) requires every attribute value
in every tuple to be "atomic"
• Codd defines atomic as "nondecomposable by the DBMS
(excluding certain special functions)"
• But this defn is a trifle puzzling, and/or not very precise ...
What about
   strings   (SUBSTR, LIKE, etc.)?
   numbers   (INTEGER, FRACTION, etc.)?
   dates     (YEAR, MONTH, DAY)?
   times     (HOUR, MIN, SEC)?
Not to mention, e.g., view defns in the catalog
NOW WATCH VERY CAREFULLY !!!
R1
SNO   PNO
S2    P1
S2    P2
S3    P2
S4    P2
S4    P4
S4    P5
This one is clearly 1NF...

R2
SNO   PNO
S2    P1,P2
S3    P2
S4    P2,P4,P5
This one is clearly NOT 1NF... PNO is
"repeating group" or "multivalued" (?)

R3
SNO   PNO_SET
S2    {P1,P2}
S3    {P2}
S4    {P2,P4,P5}
But this one is 1NF again !!!
Values of PNO_SET in R3 are no more and no less
"decomposable by the DBMS" than are strings, dates,
etc.
(R3 might not be a good DESIGN—that’s a separate
issue)
The real point:
"Atomicity" has no absolute meaning!
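The relationship between R1 and R3 can be sketched in Python, with a frozenset standing in for a PNO_SET value. The function names `group` and `ungroup` are chosen for this sketch, and tuples are positional for brevity.

```python
R1 = {("S2", "P1"), ("S2", "P2"), ("S3", "P2"),
      ("S4", "P2"), ("S4", "P4"), ("S4", "P5")}

def group(r1):
    """Build R3: one tuple per supplier, with a set-valued PNO_SET attribute."""
    snos = {sno for sno, _ in r1}
    return {(sno, frozenset(p for s, p in r1 if s == sno)) for sno in snos}

def ungroup(r3):
    """Recover R1 from R3."""
    return {(sno, pno) for sno, pno_set in r3 for pno in pno_set}

R3 = group(R1)
```

Each PNO_SET value is a single value of a set type, no more decomposable than a string is into characters; `ungroup(R3)` gives back R1, so the two carry the same information and the choice between them is a design question.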
A CLOSER LOOK AT R3 :
SNO   PNO_REL                                   /* note name change */
S2    relation with heading { PNO } and body { P1 , P2 }
S3    relation with heading { PNO } and body { P2 }
..    ....
Values in PNO_REL position are
RELATIONS!
… PNO_REL is a relation-valued attribute (RVA)
/* no "table valued columns" in   */
/* SQL, though SQL does support   */
/* columns with values that are   */
/* "multisets of rows"            */
A DOMAIN IS A DATA TYPE (summary) :
Domains, and therefore attributes, can contain ABSOLUTELY
ANYTHING !!! (any values, that is)
• Arrays, lists, relations, XML docs, photos, ...
• I.e., values of ARBITRARY COMPLEXITY
• Without violating first normal form!
DOMAIN ≡ TYPE
Recap: RM implies support for user defined types, hence
user defined ops also, hence an "O/R" DBMS done
right is just a relational DBMS done right!
From here on, favor type over domain
TO SPELL IT OUT ONE MORE TIME :
THE QUESTION AS TO WHAT TYPES ARE SUPPORTED IS
ORTHOGONAL TO THE QUESTION OF SUPPORT FOR THE
RELATIONAL MODEL
More succinctly:
TYPES ARE ORTHOGONAL TO TABLES
The relational model has NEVER prescribed data types
(it's never been implemented either—but that's
another matter)
SO WHAT’S A TYPE ???
Basically, a named set of values—e.g., all possible
integers (INTEGER); all possible character strings (CHAR);
all possible supplier numbers (SNO); all XML docs ... all
fingerprints ... all X rays ... etc., etc.
Every value (in partic, every relation) is of some type—in
fact, exactly one type /* so types disjoint */ unless type
inheritance is supported—and carries its type with it
Every variable (in partic, every relvar), every attribute of
every relation, every operator that returns a result, and
every parameter of every operator is declared to be of
some type
To say that variable V is of type T is to say that every value
v that can legally be assigned to V is of type T
Aside: To say that V is a variable is to say that V is
"assignable to" (i.e., updatable)
Every expression denotes some value and is of some type
= type of value in question = type of value returned by
outermost operator
E.g., type of
(a+b)*(x-y)
is whatever the declared type of "*" is
Associated with type T is a set of ops for operating on values
and variables of type T ... ("associated with" means op in
question has parameter of declared type T)
E.g., system-defined type INTEGER:
• System defines ":=", "=", "<", etc., for assigning and
comparing integers
• And "+", "*", etc., for arithmetic on integers
• Perhaps CAST to convert integers to char strings
• But not "||", SUBSTR, etc.
E.g., user-defined type SNO:
• Type definer defines ":=", "=", and maybe "<" etc., for
assigning and comparing supplier numbers
• But not "+", "*", etc.
Similarly for other types:
• Subscript ops for arrays
• Special arith ops for dates and times
• XQuery ops for XML docs ... and so on
DEFINING A NEW TYPE INVOLVES AT LEAST
ALL OF THE FOLLOWING :
1. Specifying a name for the type
2. Specifying the values that make up the type /* see later */
3. Specifying the physical representation /* ignore */
4. Specifying a selector op for selecting values of the type
/* see later */
5. Specifying ops that apply to values and variables of the
type ... Must include "=" and ":=" !!!
6. For those ops that return a result, specifying the type of the
result (so DBMS knows which expressions are legal, and
type of result of every legal expression)
EXAMPLE (Tutorial D) :
Define type:
TYPE POINT ... /* geometric points in 2D space */ ;
Define op REFLECT that, given point (x,y), returns inverse
point (-x,-y):
OPERATOR REFLECT ( P POINT ) RETURNS POINT ;
RETURN POINT ( - THE_X ( P ) , - THE_Y ( P ) ) ;
/* POINT selector invocation ... takes two */
/* arguments (unlike SNO selector earlier) */
END OPERATOR ;
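For comparison, here is a rough Python counterpart of the POINT type and the REFLECT operator. It is a sketch only: Python does not enforce the annotations, and the field names `x` and `y` stand in for what THE_X and THE_Y expose in Tutorial D.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Point:
    """Geometric points in 2D space; Point(x, y) plays the part of the selector."""
    x: float
    y: float

def reflect(p: Point) -> Point:
    """Given point (x, y), return the inverse point (-x, -y)."""
    return Point(-p.x, -p.y)   # selector invocation with two arguments

q = reflect(Point(1.0, -2.0))
```

In the call `reflect(Point(1.0, -2.0))` the expression `Point(1.0, -2.0)` is the argument, while `p` in the definition is the parameter.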
POINTS ARISING (sorry) :
• Another important logical difference:
argument vs. parameter
• And another:
operator vs. invocation
• Selector is a generalization of the familiar concept of
a literal
NOTE TOO THAT :
• The values that make up a given type exist BEFORE the DB
exists, WHILE the DB exists, and AFTER the DB exists ...
Better: They "have no location in time or space"
• Defining type T just means "now we're interested in a certain
set of values and we want to call it T"
• Similarly for dropping type T
• Values and sets of values don't "belong" to any
particular DB!
SCALAR vs. NONSCALAR
/* informal distinction */ :
• Type is scalar if no user visible components, nonscalar
otherwise
• Values, variables, etc., of type T are scalar if T is scalar,
nonscalar otherwise
• Nonscalar example (Tutorial D):
VAR S BASE
RELATION { SNO CHAR , SNAME CHAR ,
STATUS INTEGER , CITY CHAR }
KEY { SNO } ;
• RELATION {...} is a relation type (nonscalar)
/* order in which attribs specified insignificant */
TYPE GENERATORS :
• RELATION {...} is also a generated type ...
obtained by invoking RELATION type generator
(not defined by separate TYPE statement)
• Example involving TUPLE type generator:
VAR SINGLE_SUPPLIER
TUPLE { STATUS INTEGER , SNO CHAR ,
CITY CHAR , SNAME CHAR } ;
• Code fragment /* illustrating "tuple extraction" */ :
SINGLE_SUPPLIER := TUPLE FROM ( S WHERE SNO = 'S1' ) ;
• Note logical difference between tuple t and relation r
containing just tuple t !!!
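Tuple extraction can be sketched in Python as follows. The function `tuple_from` is a hypothetical stand-in for Tutorial D's TUPLE FROM, and the sample relation carries only two attributes to keep the example short.

```python
def tuple_from(relation):
    """Sketch of TUPLE FROM: defined only for relations of cardinality one."""
    if len(relation) != 1:
        raise ValueError("TUPLE FROM needs a relation of cardinality one")
    (t,) = relation
    return t

S = {
    frozenset({("SNO", "S1"), ("CITY", "London")}),
    frozenset({("SNO", "S2"), ("CITY", "Paris")}),
}

r = {t for t in S if ("SNO", "S1") in t}   # S WHERE SNO = 'S1': still a relation
single_supplier = tuple_from(r)            # the tuple itself
```

The relation `r` and the tuple `single_supplier` are different things: one is a set containing a tuple, the other is that tuple.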
SCALAR TYPES IN SQL :
BOOLEAN
CHARACTER(n)
CHARACTER VARYING(n)
FLOAT(p)
NUMERIC(p,q)
DECIMAL(p,q)
INTEGER
SMALLINT
DATE
TIME
TIMESTAMP
INTERVAL
1. Various defaults, abbreviations, alternative spellings
2. Literals (more or less conventional)
3. Scalar assignment:
SET <scalar var ref> = <scalar exp> ;
Plus implicit scalar assignments on FETCH etc.
4. Scalar equality comparison:
<scalar exp> = <scalar exp>
Plus implicit comparisons on DISTINCT, UNION, etc.
Unfortunately "=" support is badly flawed!
• Can give TRUE even if comparands clearly
distinguishable /* discuss in a moment */
• Can fail to give TRUE even if comparands not
distinguishable /* see nulls, later */
5. BOOLEAN might not be supported ... If it isn’t:
• Boolean exps can still appear in WHERE, ON, HAVING
• But no table can have a column of type BOOLEAN, and
no variable can be declared to be of type BOOLEAN
• So workarounds might be needed ...
6. SQL also supports "domains" ... But SQL domains aren’t
types at all ... In fact, completely unnecessary, now that
SQL does support user defined types ... Use them if you
like, but don’t mistake them for true relational domains
SQL TYPE CHECKING AND COERCIONS :
SQL supports a weak form of strong typing (!) on assignment
and equality comparisons:
• BOOLEAN : BOOLEAN
• Character string : Character string
• Number : Number
(plus various rules for dates, times, etc.)
In other words, SQL often does coercions
One bizarre consequence: Certain unions (etc.) can yield
result with rows not appearing in either operand!
FOR EXAMPLE :
T1 ( X INTEGER , Y NUMERIC(5,1) )
X     Y
0     1.0
0     2.0

T2 ( X NUMERIC(5,1) , Y INTEGER )
X     Y
0.0   0
0.0   1
1.0   2

SELECT X , Y FROM T1
UNION
SELECT X , Y FROM T2 ... Result:

X     Y
0.0   1.0
0.0   2.0
0.0   0.0
1.0   2.0
RECOMMENDATIONS :
1. Ensure that columns with the same name are always of
the same type /* see later */
2. Avoid type conversions where possible
3. When they can’t be avoided, do them explicitly:
SELECT CAST ( X AS NUMERIC(5,1) ) AS X , Y FROM T1
UNION
SELECT X , CAST ( Y AS NUMERIC(5,1) ) AS Y FROM T2
I.e., avoid coercions! /* general good practice */
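The effect of the coercions in the T1/T2 union can be imitated in Python with the decimal module. This is a simulation under stated assumptions, not a run of any SQL product: `to_numeric_5_1` stands in for CAST ( ... AS NUMERIC(5,1) ), and `typed` is a helper invented to tell an INTEGER 0 from a NUMERIC 0.0.

```python
from decimal import Decimal

# T1: X INTEGER, Y NUMERIC(5,1)      T2: X NUMERIC(5,1), Y INTEGER
T1 = [(0, Decimal("1.0")), (0, Decimal("2.0"))]
T2 = [(Decimal("0.0"), 0), (Decimal("0.0"), 1), (Decimal("1.0"), 2)]

def to_numeric_5_1(v):
    """Explicit conversion, standing in for CAST ( v AS NUMERIC(5,1) )."""
    return Decimal(v).quantize(Decimal("0.1"))

def cast_row(row):
    return tuple(to_numeric_5_1(v) for v in row)

def typed(row):
    """Tag each value with its type so that 0 and 0.0 are told apart."""
    return tuple((type(v).__name__, str(v)) for v in row)

union = {cast_row(r) for r in T1} | {cast_row(r) for r in T2}
operand_rows = {typed(r) for r in T1 + T2}
new_rows = {r for r in union if typed(r) not in operand_rows}
```

The union has four rows, and once types are taken into account none of them appears in T1 or T2 as written. Doing the conversions explicitly, as in the recommendation above, at least makes that visible in the query text.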
UNFORTUNATELY :
Certain coercions are built into the definition of SQL and
can’t be avoided! Just for the record:
• If table exp tx is used as a row subquery, then the table t
denoted by tx should have just one row r, and t is coerced
to r
• If table exp tx is used as a scalar subquery, then the table t
denoted by tx should have just one column and just one
row and hence contain just one value v, and t is doubly
coerced to v
• If the "row exp" rx in the ALL or ANY comparison rx theta
sq (where theta is, e.g., >ALL or <ANY and sq is a
subquery) is in fact a scalar exp, the scalar value v
denoted by that exp is coerced to a row that contains
just v
SQL COLLATIONS :
Type checking and coercion for character strings are more
complex than I’ve been pretending ...
• Given string consists of chars from one character set
and has one collation
• Given collation = rule for specific character set ...
Governs comparison of strings of chars from that set
• Let C be a collation for character set S, and let a and b
be any two characters from S. Then C must be such
that exactly one of
a < b
a = b
a > b
gives TRUE and the other two give FALSE (under C)
COMPLICATIONS :
• Either PAD SPACE or NO PAD can apply to collation C
Under PAD SPACE, distinct strings (e.g., 'AB' and 'AB ')
can "compare equal"
Recommendation: Don’t use PAD SPACE!
• But distinct strings might still "compare equal" even with
NO PAD ... E.g., if C is CASE_INSENSITIVE
Recommendation: Don’t do this ... or if you must,
then be very careful!
Call v1 and v2 "equal but distinguishable" if they’re distinct
but v1 = v2 gives TRUE
In UNION, JOIN, MATCH, LIKE, UNIQUE, etc., implicit
equality rule is indeed "equal even if distinguishable"
In UNION, JOIN, GROUP BY, DISTINCT, etc., DBMS
might have to choose which "equal but distinguishable"
value is to appear in some column in some result row
SQL gives little guidance in such situations!
Hence, certain SQL expressions are indeterminate!
... or "possibly nondeterministic" (SQL term)
For example,
SELECT MAX ( Z )
FROM T
might return ‘ZZZ’ on one occasion and ‘zzz’ on another,
even if T hasn’t changed in the interim!
One important consequence: Many SQL table exps aren’t
allowed in constraints !!!
Strong recommendation: Avoid possibly nondeterministic
expressions as much as you can!
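"Equal but distinguishable" can be observed with Python's sqlite3 module. The sketch relies on SQLite's built-in RTRIM and NOCASE collations as rough analogs of PAD SPACE and a case-insensitive collation; that mapping, the table T and the helper `one` are assumptions of this illustration, and other products spell these things differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

def one(sql):
    """Run a query and return the single value it yields."""
    return conn.execute(sql).fetchone()[0]

pad_like = one("SELECT 'AB' = 'AB ' COLLATE RTRIM")     # trailing spaces ignored
nocase = one("SELECT 'ZZZ' = 'zzz' COLLATE NOCASE")     # case ignored
binary = one("SELECT 'ZZZ' = 'zzz'")                    # default BINARY collation

conn.executescript("""
    CREATE TABLE T (Z TEXT COLLATE NOCASE);
    INSERT INTO T VALUES ('ZZZ'), ('zzz');
""")
distinct_rows = conn.execute("SELECT DISTINCT Z FROM T").fetchall()
max_z = one("SELECT MAX ( Z ) FROM T")
```

Under the case-insensitive collation, DISTINCT keeps one of the two stored strings and MAX returns one of them; which one is up to the implementation, so code should not depend on it.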
SQL ROW TYPES :
Recall:
VAR SINGLE_SUPPLIER
TUPLE { STATUS INTEGER , SNO CHAR ,
CITY CHAR , SNAME CHAR } ;
SQL analog of TUPLE type generator = ROW type constructor
DECLARE SINGLE_SUPPLIER /* SQL row variable */
ROW ( SNO VARCHAR(5) , SNAME VARCHAR(25) ,
STATUS INTEGER , CITY VARCHAR(20) ) ;
But "field" [sic] order matters! ... 4 fields can be arranged into
24 distinct row types!
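The count of 24 is simply 4! = 4 × 3 × 2 × 1. A few lines of Python make the contrast with tuple types concrete; the field strings are taken from the ROW declaration above, and modeling a heading as a frozenset is a simplification.

```python
from itertools import permutations

fields = ["SNO VARCHAR(5)", "SNAME VARCHAR(25)",
          "STATUS INTEGER", "CITY VARCHAR(20)"]

# Tutorial D: a heading is a set, so the written order is irrelevant.
tuple_types = {frozenset(p) for p in permutations(fields)}

# SQL: field order is part of the row type.
row_types = {"ROW ( " + " , ".join(p) + " )" for p in permutations(fields)}
```

All 24 orderings denote one and the same tuple type, but 24 different SQL row types.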
SQL ROW TYPES (cont.) :
Row assignment: e.g.,
SET SINGLE_SUPPLIER = ( S WHERE SNO = 'S1' ) ;   /* row subquery ... */
Note the coercion here !!!
Row comparison: /* see later */
WHAT ABOUT SQL TABLE TYPES ???
SQL doesn’t really have a TABLE type generator (or
constructor) at all !!! Recall:
VAR S BASE
RELATION { SNO CHAR , SNAME CHAR ,
STATUS INTEGER , CITY CHAR }
KEY { SNO } ;
SQL analog:
CREATE TABLE S
     ( SNO     VARCHAR(5)    NOT NULL ,
       SNAME   VARCHAR(25)   NOT NULL ,
       STATUS  INTEGER       NOT NULL ,
       CITY    VARCHAR(20)   NOT NULL ,
       UNIQUE ( SNO ) ) ;
/* note strange jumble of column and constraint defns */
No sequence of linguistic tokens in that CREATE TABLE
statement that can logically be labeled "an invocation of the
TABLE type constructor"
If table S has any type at all, it’s just bag of rows, where the
rows are of type
ROW ( SNO     VARCHAR(5) ,
      SNAME   VARCHAR(25) ,
      STATUS  INTEGER ,
      CITY    VARCHAR(20) )
ASIDE : "TYPED TABLES"
Very bad term! ... If "typed table" TT defined to be "of type
T," then TT is not of type T, and nor are its rows!
Avoid such tables anyway, because they’re inextricably
intertwined with SQL’s support for pointers ...
RM prohibits pointers ... But SQL allows a column in one
table to have values that are pointers to rows in some other
table ... Pointers are reference values, columns containing
them are of some REF type ... Why?
Strong recommendation: Don’t use such tables, nor any
features related to them!
A SAMPLE TUPLE VALUE
(tuple for short) :
SNO : CHAR    SNAME : CHAR    STATUS : INTEGER    CITY : CHAR
S1            Smith           20                  London

In SNO : CHAR, SNO is the attribute name and CHAR is the type name;
S1 is the attribute value; degree = 4
Attribute : attribute name + type name
Component : attribute + attribute value
Heading :
{ SNO CHAR , SNAME CHAR , STATUS INTEGER , CITY CHAR }
Type : TUPLE { SNO CHAR , SNAME CHAR , STATUS INTEGER , CITY CHAR }
• By definition, no left to right ordering to components
(so ordering arbitrary in written form)
• By definition, every tuple contains exactly one value,
of approp type, for each attribute
No nulls !!! (nulls aren’t values)
Recommendation: Never say "null value"!
• Sample tuple selector invocation (tuple literal):
TUPLE { SNO 'S1' , SNAME 'Smith' , STATUS 20 , CITY 'London' }
/* keyword TUPLE does double duty in Tutorial D */
• Two tuples equal ("duplicates") iff very same tuple ("=" and
"≠" make sense, "<" and ">" don’t)
• Every subset of a heading is a heading ...
Every subset of a tuple is a tuple: e.g.,
SNO : CHAR    CITY : CHAR          SNO : CHAR
S1            London               S1
• The empty set is a subset of every set ... So the empty
tuple (or 0-tuple) is a valid tuple! (and there’s only one)
... Type and value both TUPLE{} in Tutorial D
• Tuple assignment and comparisons: Already discussed
ATTRIBUTE EXTRACTION :
Note logical difference between value v and tuple t (of
degree one) that contains just v !!!
Let t be a tuple—say the tuple for supplier S1 in current
value of suppliers-and-parts DB
Tutorial D:
CITY FROM t
—"extracts" CITY value from t
SQL analog: t.CITY
SQL ROWS :
Tutorial D term:          SQL analog (approx.):
tuple (value)             row
TUPLE type generator      row type constructor
tuple selector            row value constructor
tuple variable            row variable (?)
But SQL rows have left to right ordering to their "fields" ...
e.g., ROW(1,2) ≠ ROW(2,1) * ... Fields identified by ordinal
position, not by name
No "0-row"
* Keyword ROW optional in row value constructors and
usually omitted
ROW ASSIGNMENT :
SET syntax (as for scalars) /* already discussed */
Row assignments also involved (in effect) in UPDATE: e.g.,
UPDATE S
SET    STATUS = 20 , CITY = 'London'
WHERE  CITY = 'Paris' ;
Logically equivalent to:
UPDATE S
SET    ( STATUS , CITY ) = ( 20 , 'London' )
WHERE  CITY = 'Paris' ;
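The equivalence of the two UPDATE forms can be checked mechanically. A minimal sketch, assuming Python's built-in sqlite3 module as a stand-in for a standard SQL DBMS (SQLite accepts the row form in recent versions); the three sample rows and the helper name run_update are illustrative assumptions, not part of the seminar material.

```python
import sqlite3

def run_update(update_sql):
    """Apply one UPDATE to a fresh copy of a small suppliers table; return its rows."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE S (SNO TEXT NOT NULL, STATUS INTEGER NOT NULL, "
                "CITY TEXT NOT NULL, UNIQUE (SNO))")
    con.executemany("INSERT INTO S (SNO, STATUS, CITY) VALUES (?, ?, ?)",
                    [("S1", 20, "London"), ("S2", 10, "Paris"), ("S3", 30, "Paris")])
    con.execute(update_sql)
    rows = con.execute("SELECT SNO, STATUS, CITY FROM S ORDER BY SNO").fetchall()
    con.close()
    return rows

# Column-by-column form and row-assignment form of the same update
scalar_form = run_update(
    "UPDATE S SET STATUS = 20, CITY = 'London' WHERE CITY = 'Paris'")
row_form = run_update(
    "UPDATE S SET (STATUS, CITY) = (20, 'London') WHERE CITY = 'Paris'")

print(scalar_form == row_form)
```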
ROW COMPARISONS :
Believe it or not, most boolean exps in SQL, even simple
"scalar" comparisons, are defined in terms of rows, not
scalars!
Example involving "genuine" row comparison:
SELECT SNO
FROM   S
WHERE  ( STATUS , CITY ) = ( 20 , 'London' )
Logically equivalent to:
SELECT SNO
FROM   S
WHERE  STATUS = 20 AND CITY = 'London'

SELECT SNO
FROM   S
WHERE  ( STATUS , CITY ) <> ( 20 , 'London' )
Logically equivalent to:
SELECT SNO
FROM   S
WHERE  STATUS <> 20
OR     CITY <> 'London'

Because row components have left to right ordering, SQL
can support "<" and ">" on rows:
SELECT SNO
FROM   S
WHERE  ( STATUS , CITY ) > ( 20 , 'London' )
Logically equivalent to:
SELECT SNO
FROM   S
WHERE  STATUS > 20
OR   ( STATUS = 20 AND CITY > 'London' )
/* hmmm ... */

But most row comparisons involve rows of degree one:
SELECT SNO
FROM   S
WHERE  ( STATUS ) = ( 20 )
Syntax rule: Parens can be dropped from row value
constructors of degree one ... Thus:
SELECT SNO
FROM   S
WHERE  STATUS = 20
But this "scalar" comparison is still technically a row
comparison (scalar comparands coerced to rows)
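The three stated equivalences ("=", "<>", ">") can be spot-checked against the usual five suppliers. A minimal sketch, assuming Python's sqlite3 module (SQLite supports row value comparisons in recent versions); the helper snos and the variable names are illustrative assumptions. Agreement on one sample table is evidence only, not a proof of logical equivalence.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE S (SNO TEXT NOT NULL, STATUS INTEGER NOT NULL, "
            "CITY TEXT NOT NULL, UNIQUE (SNO))")
con.executemany("INSERT INTO S (SNO, STATUS, CITY) VALUES (?, ?, ?)",
                [("S1", 20, "London"), ("S2", 10, "Paris"), ("S3", 30, "Paris"),
                 ("S4", 20, "London"), ("S5", 30, "Athens")])

def snos(where):
    """Supplier numbers satisfying the given WHERE condition, in SNO order."""
    sql = "SELECT SNO FROM S WHERE " + where + " ORDER BY SNO"
    return [row[0] for row in con.execute(sql)]

# (row comparison, scalar expansion) pairs taken from the text above
pairs = [
    ("(STATUS, CITY) = (20, 'London')",  "STATUS = 20 AND CITY = 'London'"),
    ("(STATUS, CITY) <> (20, 'London')", "STATUS <> 20 OR CITY <> 'London'"),
    ("(STATUS, CITY) > (20, 'London')",
     "STATUS > 20 OR (STATUS = 20 AND CITY > 'London')"),
]
results = [(snos(row_cmp), snos(scalar_cmp)) for row_cmp, scalar_cmp in pairs]

# Left to right ordering matters: (1,2) and (2,1) are different rows
order_matters = con.execute("SELECT (1, 2) = (2, 1)").fetchone()[0]

for (row_cmp, _), (a, b) in zip(pairs, results):
    print(row_cmp, a, a == b)
```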
RECOMMENDATION :
Unless the rows being compared are of degree one (i.e.,
effectively scalars):
Don’t use "<", "<=", ">", and ">=" comparisons
• They rely on left to right column ordering
• No straightforward relational counterpart
• Error prone
In this connection ... it’s worth noting that the SQL
standardizers took several iterations to get the semantics
right!
A SAMPLE RELATION VALUE
(relation for short) :
SNO : CHAR    SNAME : CHAR    STATUS : INTEGER    CITY : CHAR
S1            Smith           20                  London
S2            Jones           10                  Paris
S3            Blake           30                  Paris
S4            Clark           20                  London
S5            Adams           30                  Athens
Heading : { SNO CHAR , SNAME CHAR , STATUS INTEGER , CITY CHAR }
/* tuple heading as previously defined */
/* … same attributes and same degree */
Type : RELATION { SNO CHAR , SNAME CHAR , STATUS INTEGER , CITY CHAR }
Body : { tuples all with specified heading }
Cardinality : cardinality of body
NOTE THAT :
• "Relations contain tuples" only indirectly true!
• By definition:
  - No relation contains duplicate tuples (including results
    of relational operators)
  - No top to bottom ordering to tuples, no left to right
    ordering to attributes
  - Every tuple of every relation contains exactly one value,
    of appropriate type, for each attribute (i.e., relations are
    always normalized)
    No nulls!
• Every subset of a body is a body (loosely, every subset of a
rel is a rel), empty subset included ("empty relation")
Given rel type RT, there’s exactly one empty rel of type RT
• Tuple extraction: Already discussed
• t ∈ r : TRUE iff t appears in r ... SQL example:
SELECT SNO , SNAME , STATUS , CITY
FROM   S
WHERE  SNO IN            /* SNO coerced to ROW(SNO) */
     ( SELECT SNO
       FROM   SP )
ANOTHER POINT :
RELATIONS ARE n-DIMENSIONAL (massive confusion on
this simple point!) …
A couple of quotes:
1. "When you’re well trained in relational modeling, you
begin to believe the world is two-dimensional … You
think you can get anything into the rows and columns of
a table" —Douglas Barry, Executive Director, ODMG
2. "There is simply no way to mask the complexities
involved in assembling two-dimensional data into a
multi-dimensional form"—Richard Finkelstein
But a relation with n attributes (i.e., of degree n) represents
points in n-dimensional space ... It’s n-dimensional, not
2-dimensional !!!
Of course a relation looks flat when pictured in tabular
form on paper … but a picture of a thing isn’t the thing
itself !!!
• A major logical difference here, in fact!
• Let’s all vow never to say "flat relations" ever again
RELATIONAL COMPARISONS :
Must be able to test rels for equality, of course:
e.g.,
S { CITY } = P { CITY }     /* FALSE */
Other useful comparison ops:
• ≠ (not equal)
• ⊆ (subset of)
• ⊂ (proper subset of)
• ⊇ (superset of)
• ⊃ (proper superset of)
Useful shorthands:
IS_EMPTY ( r )     IS_NOT_EMPTY ( r )
RELATIONS OF DEGREE ZERO :
Empty heading is a valid heading ... So a relation can be of
degree zero! Type is RELATION{} in Tutorial D
(Such rels are a little hard to draw)
Can a relation with no attributes have any tuples?
Yes, it can have AT MOST ONE TUPLE (the 0-tuple)
One tuple: TABLE_DEE     /* DEE for short */
No tuples: TABLE_DUM     /* DUM for short */
Fundamentally important! (perhaps surprisingly)
But not supported in SQL ...
WHY ARE THEY SO IMPORTANT ?
Because DEE corresponds to YES (or TRUE) and
DUM corresponds to NO (or FALSE) !!!
/* see later for further explanation */
Also ... DEE and DUM (especially DEE) play a role in the
relational algebra analogous to the role played by 0 in
conventional arithmetic
/* again, see later for further explanation */
SQL TABLES :
I.e., table values, unless context demands otherwise
/* see later re table variables */
SQL has no "table type" notion ... An SQL table is just a
bag of rows of some row type ... Hence, no "TABLE type
generator" (though SQL does support ROW, ARRAY,
MULTISET type generators)
But SQL table value constructor is analogous (somewhat)
to a relation selector. E.g. /* "table literal" */
VALUES ( 1, 2 ), ( 2, 1 ), ( 1,1 ), ( 1,2 )
Denotes table with 2 unnamed columns and 4 (not 3!) rows
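The "4 (not 3!) rows" claim is easy to confirm. A minimal sketch, assuming Python's sqlite3 module, which accepts a bare VALUES expression as a query; the variable names are illustrative.

```python
import sqlite3

con = sqlite3.connect(":memory:")

# The table literal from the text: ( 1, 2 ) appears twice
rows = con.execute("VALUES (1, 2), (2, 1), (1, 1), (1, 2)").fetchall()

row_count = len(rows)            # rows in the bag SQL actually produces
distinct_count = len(set(rows))  # rows a relation (a set) would contain

print(row_count, distinct_count)
```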
ANOTHER EXAMPLE :
VALUES ( 'S1' , 'Smith' , 20 , 'London' ) ,
       ( 'S2' , 'Jones' , 10 , 'Paris'  ) ,
       ( 'S3' , 'Blake' , 30 , 'Paris'  ) ,
       ( 'S4' , 'Clark' , 20 , 'London' ) ,
       ( 'S5' , 'Adams' , 30 , 'Athens' )
Recommendations:
1. For each column, ensure all values are of the same
type
2. Don’t specify same row twice
TABLE COMPARISONS ???
No direct support, but workarounds are available ...
E.g., SQL analog of
S { CITY } = P { CITY }
is:
NOT EXISTS ( SELECT CITY FROM S
EXCEPT
SELECT CITY FROM P )
AND
NOT EXISTS ( SELECT CITY FROM P
EXCEPT
SELECT CITY FROM S )
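The workaround can be exercised directly. A minimal sketch, assuming Python's sqlite3 module; the sample city data is an assumption chosen so that the two projections differ at first and agree after one update.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE S (SNO TEXT NOT NULL, CITY TEXT NOT NULL, UNIQUE (SNO));
CREATE TABLE P (PNO TEXT NOT NULL, CITY TEXT NOT NULL, UNIQUE (PNO));
INSERT INTO S (SNO, CITY) VALUES ('S1','London'), ('S2','Paris'), ('S5','Athens');
INSERT INTO P (PNO, CITY) VALUES ('P1','London'), ('P2','Paris'), ('P3','Oslo');
""")

# SQL analog of  S { CITY } = P { CITY } : neither difference has any rows
EQUAL_CITIES = """
SELECT NOT EXISTS ( SELECT CITY FROM S EXCEPT SELECT CITY FROM P )
   AND NOT EXISTS ( SELECT CITY FROM P EXCEPT SELECT CITY FROM S )
"""

before = con.execute(EQUAL_CITIES).fetchone()[0]   # Athens vs. Oslo: not equal
con.execute("UPDATE S SET CITY = 'Oslo' WHERE SNO = 'S5'")
after = con.execute(EQUAL_CITIES).fetchone()[0]    # same set of cities now

print(before, after)
```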
COLUMN NAMING (very important!) :
RM attribute naming discipline:
No anonymous attributes
No duplicate attribute names
SQL enforces analogous discipline for tables that are
current values of table variables (CREATE TABLE or
CREATE VIEW) but not for tables resulting from evaluation
of some table expression
Very strong recommendation: /* Why? See later */
Use AS to enforce discipline if SQL doesn’t!*
* But you can’t, with VALUES expressions
EXAMPLES :
SELECT DISTINCT SNAME , 'Supplier' AS TAG
FROM   S

SELECT DISTINCT SNAME , 2 * STATUS AS DOUBLE_STATUS
FROM   S

CREATE VIEW SDS
AS SELECT DISTINCT SNAME , 2 * STATUS AS DOUBLE_STATUS
   FROM   S ;

SELECT DISTINCT S.CITY AS SCITY , P.CITY AS PCITY
FROM   S , SP , P
WHERE  S.SNO = SP.SNO
AND    SP.PNO = P.PNO
SELECT TEMP.*
FROM ( S JOIN P ON S.CITY > P.CITY ) AS TEMP
     ( SNO , SNAME , STATUS , SCITY ,
       PNO , PNAME , COLOR , WEIGHT , PCITY )

SELECT MAX ( WEIGHT ) AS MBW
FROM   P
WHERE  COLOR = 'Blue'

Note:
Can ignore recommendation if no need to reference
column subsequently: e.g.,
SELECT ... WHERE WEIGHT < ( SELECT MAX ( WEIGHT )
                            FROM   P
                            WHERE  P.COLOR = 'Blue' )
WHY IS COLUMN NAMING IMPORTANT ???
Rel alg ops (e.g., UNION) rely on proper attrib naming
One reason: Avoids complexities caused by relying on
ordinal position!
To use SQL relationally, must apply same discipline to SQL
analogs ... As a prereq:
Very strong recommendation: If two columns represent
"the same kind of information," give them the same name
wherever possible!
E.g., SNO and SNO, not (say) SNO and SNUM
If two columns represent different kinds of information, give
them different names (usually)
Only situation where foregoing recommendation can’t be
followed = when two columns in same table represent
same kind of information ... E.g.:
CREATE TABLE EMP ( ENO ... , MNO ... , ... ) ;
So column renaming sometimes necessary: e.g.,
( SELECT ENO , MNO FROM EMP ) AS TEMP1
NATURAL JOIN
( SELECT ENO AS MNO , ... FROM EMP ) AS TEMP2
/* join EMP to itself on MNO in "1st copy" and ENO in "2nd copy" */
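The renaming trick can be run end to end. A minimal sketch, assuming Python's sqlite3 module; the ENAME column, the MNAME rename, and the three sample employees are assumptions standing in for the elided "..." columns.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE EMP (ENO TEXT NOT NULL, MNO TEXT NOT NULL, ENAME TEXT NOT NULL,
                  UNIQUE (ENO));
INSERT INTO EMP (ENO, MNO, ENAME)
VALUES ('E1','E1','Ann'), ('E2','E1','Bob'), ('E3','E2','Cy');
""")

# TEMP2 renames ENO to MNO, so the only common column is MNO and the
# natural join pairs each employee with the row for that employee's manager
rows = con.execute("""
SELECT ENO , MNO , MNAME
FROM ( SELECT ENO , MNO FROM EMP ) AS TEMP1
     NATURAL JOIN
     ( SELECT ENO AS MNO , ENAME AS MNAME FROM EMP ) AS TEMP2
ORDER BY ENO
""").fetchall()

print(rows)
```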
But what if DB already violates naming discipline?
Possible strategy:
• For each base table T, define view V identical to T
except for column renaming
• Ensure V abides by column naming discipline
• Operate in terms of V instead of T
Referred to subsequently as the "operate via views
strategy"
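A minimal sketch of the operate via views strategy, assuming Python's sqlite3 module. The base table SUPP and its columns SNUM and TOWN are hypothetical names invented here to play the part of a table that violates the naming discipline.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
-- Hypothetical base table whose column names break the discipline
CREATE TABLE SUPP (SNUM TEXT NOT NULL, TOWN TEXT NOT NULL, UNIQUE (SNUM));
CREATE TABLE SP   (SNO  TEXT NOT NULL, PNO  TEXT NOT NULL, UNIQUE (SNO, PNO));
INSERT INTO SUPP (SNUM, TOWN) VALUES ('S1','London'), ('S2','Paris');
INSERT INTO SP   (SNO, PNO)   VALUES ('S1','P1'), ('S2','P2');

-- View identical to SUPP except for column renaming
CREATE VIEW S AS SELECT SNUM AS SNO , TOWN AS CITY FROM SUPP;
""")

# Operating in terms of the view lets NATURAL JOIN match on SNO by name
rows = con.execute(
    "SELECT SNO , CITY , PNO FROM S NATURAL JOIN SP ORDER BY SNO , PNO").fetchall()

print(rows)
```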
BUT ...
Impossible to ignore ordinal position 100 percent ...
Columns still have ordinal position even when they don’t need
to (in base tables and views in particular)
Strong recommendation: Never write SQL code that relies
on ordinal position!
Contexts in which SQL attaches significance to ordinal
position:
• SELECT *
• JOIN, UNION, INTERSECT, EXCEPT
• VALUES
• INSERT if column name commalist omitted
• Column name commalist in CREATE VIEW and range variable definitions
• ALL and ANY comparisons
WHY DUPLICATE ROWS ARE BAD NEWS :
I assume you know:
• Relational DBMSs include an optimizer ...
Purpose is to figure out the best way to implement user
queries etc. ("best" = best performing)
• Optimizers transform relational expressions ("query
rewrite")* ...
Replace exp1 by exp2, where exp1 and exp2
guaranteed to produce same result when evaluated but
exp2 has better performance (we hope)
* But watch out for this term (has other meanings too)
DUPLICATE ROWS (cont.) :
• If a table permits duplicates, IT’S NOT A RELATION
• RM doesn’t recognize duplicates
• Example (with acknowledgments to Nat Goodman):
P    PNO   PNAME          SP    SNO   PNO
     P1    Screw                S1    P1
     P1    Screw                S1    P1
     P1    Screw                S1    P2
     P2    Screw
No CKs !!! Violate the Information Principle !!! Meanings hidden !!!
• Find part nos. for parts that either are screws or
are supplied by supplier S1
DUPLICATE ROWS (cont.) :
SELECT P.PNO FROM P                      /* result: P1*3 P2*1 */
WHERE  P.PNAME = 'Screw'
OR     P.PNO IN
     ( SELECT SP.PNO FROM SP
       WHERE  SP.SNO = 'S1' )

SELECT SP.PNO FROM SP                    /* result: P1*2 P2*1 */
WHERE  SP.SNO = 'S1'
OR     SP.PNO IN
     ( SELECT P.PNO FROM P
       WHERE  P.PNAME = 'Screw' )

SELECT P.PNO FROM P, SP                  /* result: P1*9 P2*3 */
WHERE ( SP.SNO = 'S1' AND
        P.PNO = SP.PNO )
OR     P.PNAME = 'Screw'

SELECT SP.PNO FROM P, SP                 /* result: P1*8 P2*4 */
WHERE ( SP.SNO = 'S1' AND
        P.PNO = SP.PNO )
OR     P.PNAME = 'Screw'

SELECT P.PNO FROM P                      /* result: P1*5 P2*2 */
WHERE  P.PNAME = 'Screw'
UNION  ALL
SELECT SP.PNO FROM SP
WHERE  SP.SNO = 'S1'

SELECT DISTINCT P.PNO FROM P             /* result: P1*3 P2*2 */
WHERE  P.PNAME = 'Screw'
UNION  ALL
SELECT SP.PNO FROM SP
WHERE  SP.SNO = 'S1'

SELECT P.PNO FROM P                      /* result: P1*4 P2*2 */
WHERE  P.PNAME = 'Screw'
UNION  ALL
SELECT DISTINCT SP.PNO FROM SP
WHERE  SP.SNO = 'S1'

SELECT P.PNO FROM P                      /* result: P1*1 P2*1 */
WHERE  P.PNAME = 'Screw'
UNION
SELECT SP.PNO FROM SP
WHERE  SP.SNO = 'S1'
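The degrees of duplication quoted for the eight formulations can be reproduced. A minimal sketch, assuming Python's sqlite3 module and the keyless P and SP tables from the example; the string constants and variable names are illustrative. Each entry of degrees is (occurrences of P1, occurrences of P2).

```python
import sqlite3
from collections import Counter

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE P  (PNO TEXT NOT NULL, PNAME TEXT NOT NULL);  -- no key: duplicates permitted
CREATE TABLE SP (SNO TEXT NOT NULL, PNO   TEXT NOT NULL);  -- no key here either
INSERT INTO P  (PNO, PNAME)
VALUES ('P1','Screw'), ('P1','Screw'), ('P1','Screw'), ('P2','Screw');
INSERT INTO SP (SNO, PNO) VALUES ('S1','P1'), ('S1','P1'), ('S1','P2');
""")

SCREWS = "FROM P WHERE P.PNAME = 'Screw'"
BY_S1 = "FROM SP WHERE SP.SNO = 'S1'"
JOINED = ("FROM P, SP WHERE (SP.SNO = 'S1' AND P.PNO = SP.PNO) "
          "OR P.PNAME = 'Screw'")

formulations = [
    f"SELECT P.PNO {SCREWS} OR P.PNO IN (SELECT SP.PNO {BY_S1})",
    f"SELECT SP.PNO {BY_S1} OR SP.PNO IN (SELECT P.PNO {SCREWS})",
    f"SELECT P.PNO {JOINED}",
    f"SELECT SP.PNO {JOINED}",
    f"SELECT P.PNO {SCREWS} UNION ALL SELECT SP.PNO {BY_S1}",
    f"SELECT DISTINCT P.PNO {SCREWS} UNION ALL SELECT SP.PNO {BY_S1}",
    f"SELECT P.PNO {SCREWS} UNION ALL SELECT DISTINCT SP.PNO {BY_S1}",
    f"SELECT P.PNO {SCREWS} UNION SELECT SP.PNO {BY_S1}",
]

# Count how many times each part number appears in each result
counts = [Counter(row[0] for row in con.execute(sql)) for sql in formulations]
degrees = [(c["P1"], c["P2"]) for c in counts]

for sql, d in zip(formulations, degrees):
    print(d, sql)
```

Only the last formulation (plain UNION) returns each part number once; the other seven disagree with it and, mostly, with each other.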
DUPLICATE ROWS (cont.) :
• Either (a) the user cares about the degree of duplication,
or (b) the user does not care…
• Expression transformation is inhibited!
• Performance suffers
• DBMS code quality suffers
• Law-abiding users suffer
• Particularly annoying if the user does NOT care !!!
DUPLICATE ROWS : FURTHER ISSUES
• If a table is a plot of points in some n-dimensional
space, duplicates don’t add anything—just mean
plotting the same point twice
• If table T permits duplicates, we can’t distinguish
"genuine" duplicates and duplicates arising from data
entry errors!
• If something is true, saying it twice doesn’t make it
more true
Much more could be said ....
Please write out one googol times:
There’s no such thing as a duplicate.
(Anon.)
AVOIDING DUPLICATES IN SQL :
RM prohibits duplicates ... So to use SQL relationally, we
must prevent them from occurring
Base tables: Specify at least one key /* see later */
Derived tables: SELECT ALL / UNION ALL / VALUES can
all produce dup rows ...
VALUES already discussed ... Regarding ALL vs. DISTINCT:
Can appear in SELECT / UNION / INTERSECT / EXCEPT /
invocation of "set function" such as SUM /* this case is a
little special ... see later */
DISTINCT is default for UNION / INTERSECT / EXCEPT ...
ALL is default in other cases
SELECT / UNION / etc. :
Obvious recommendations: Always specify DISTINCT ...
preferably do so explicitly ... and never specify ALL
Unfortunately ... /* quote ex book */ :
At this point in the original draft, I added that if you find the
discipline of always specifying DISTINCT annoying, don’t complain
to me—complain to the SQL vendors instead. But my reviewers
reacted with almost unanimous horror to my suggestion that you
should always specify DISTINCT. One wrote: "Those who really
know SQL well will be shocked at the thought of coding SELECT
DISTINCT by default." Well, I’d like to suggest, politely, that (a)
those who are "shocked at the thought" probably know the
implementations well, not SQL, and (b) their shock is probably due
to their recognition that those implementations do such a poor job
of optimizing away unnecessary DISTINCTs.
If I write SELECT DISTINCT SNO FROM S ..., that
DISTINCT can safely be ignored. If I write either EXISTS
(SELECT DISTINCT ...) or IN (SELECT DISTINCT ...),
those DISTINCTs can safely be ignored. If I write SELECT
DISTINCT SNO FROM SP ... GROUP BY SNO, that
DISTINCT can safely be ignored. If I write SELECT
DISTINCT ... UNION SELECT DISTINCT ..., those
DISTINCTs can safely be ignored. And so on. Why should
I, as a user, have to devote time and effort to figuring out
whether some DISTINCT is going to be a performance hit
and whether it’s logically safe to omit it?—and to
remembering all of the details of SQL’s inconsistent rules
for when duplicates are automatically eliminated and when
they’re not?
Well, I could go on. However, I decided—against my own
better judgment, but in the interest of maintaining good
relations (with my reviewers, I mean)—not to follow my own
advice elsewhere in this book but only to request duplicate
elimination explicitly when it seemed to be logically
necessary to do so. It wasn’t always easy to decide when
that was, either. But at least now I can add my voice to
those complaining to the vendors, I suppose.
SADLY, THEREFORE :
Recommendations:
• Make sure you know when SQL eliminates duplicates
without you asking it to
• When you do have to ask, make sure you know whether
it matters if you don’t
• When it does matter, specify DISTINCT
/* but be annoyed about it */
• And never specify ALL!
WHY NULLS ARE BAD NEWS :
I assume you know:
• Any comparison in which at least one comparand is null
evaluates to UNKNOWN, not TRUE or FALSE
Rationale: Null means "value unknown" …
Hence three-valued logic (3VL)
• 3VL truth tables for NOT, AND, OR:
NOT |          AND | T  U  F        OR | T  U  F
----+---       ----+---------       ---+---------
T   | F        T   | T  U  F        T  | T  T  T
U   | U        U   | U  U  F        U  | T  U  U
F   | T        F   | F  F  F        F  | T  U  F
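The three truth tables can be regenerated from a running SQL engine. A minimal sketch, assuming Python's sqlite3 module; SQLite has no UNKNOWN keyword, so NULL stands in for the third truth value and 1 and 0 for TRUE and FALSE (an assumption of this sketch). The helper ev and the table names are illustrative.

```python
import sqlite3

con = sqlite3.connect(":memory:")

LITERAL = {"T": "1", "U": "NULL", "F": "0"}   # SQL text for each truth value
NAME = {1: "T", None: "U", 0: "F"}            # what comes back from SQLite

def ev(expr):
    """Evaluate a boolean expression and report it as T, U, or F."""
    return NAME[con.execute("SELECT " + expr).fetchone()[0]]

not_table = {a: ev(f"NOT ({LITERAL[a]})") for a in "TUF"}
and_table = {(a, b): ev(f"({LITERAL[a]}) AND ({LITERAL[b]})")
             for a in "TUF" for b in "TUF"}
or_table = {(a, b): ev(f"({LITERAL[a]}) OR ({LITERAL[b]})")
            for a in "TUF" for b in "TUF"}

for a in "TUF":
    print(a, not_table[a],
          [and_table[(a, b)] for b in "TUF"],
          [or_table[(a, b)] for b in "TUF"])
```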
NULLS (cont.) :
S    SNO   CITY           P    PNO   CITY
     S1    London              P1    "null"
Nothing at all in CITY slot for part P1 !!!
Get SNO/PNO pairs where either the supplier and part
cities are different or the part city isn’t Paris (or both):
SELECT DISTINCT S.SNO, P.PNO
FROM   S, P
WHERE  S.CITY <> P.CITY
OR     P.CITY <> 'Paris'
NULLS (cont.) :
Boolean expression in the WHERE clause:
( S.CITY <> P.CITY ) OR ( P.CITY <> ‘Paris’ )
For the only data we have, this becomes
( S.CITY <> null ) OR ( null <> ‘Paris’ )
UNKNOWN OR UNKNOWN
UNKNOWN
Nothing retrieved!
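The empty result can be observed directly. A minimal sketch, assuming Python's sqlite3 module; the CITY columns are deliberately left nullable here (against the recommendations of this seminar) so that the example data can be stored at all.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE S (SNO TEXT NOT NULL, CITY TEXT, UNIQUE (SNO));  -- CITY nullable on purpose
CREATE TABLE P (PNO TEXT NOT NULL, CITY TEXT, UNIQUE (PNO));
INSERT INTO S (SNO, CITY) VALUES ('S1', 'London');
INSERT INTO P (PNO, CITY) VALUES ('P1', NULL);
""")

# Both comparisons evaluate to UNKNOWN, so the only candidate row is rejected
pairs = con.execute("""
SELECT DISTINCT S.SNO, P.PNO
FROM   S, P
WHERE  S.CITY <> P.CITY
OR     P.CITY <> 'Paris'
""").fetchall()

print(pairs)
```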
NULLS (cont.) :
But part P1 does have some corresponding city … i.e.,
the null does stand for some real value, say c
Either c is Paris or it is not
If it is, boolean expression becomes
( ‘London’ <> ‘Paris’ ) OR ( ‘Paris’ <> ‘Paris’ ) : TRUE
If it is not, boolean expression becomes
( ‘London’ <> c ) OR ( c <> ‘Paris’ ) : TRUE
because c is not Paris
So TRUE is the right answer … hence, 3VL DOES NOT
MATCH REALITY !!! (Showstopper !!!)
EVEN MORE TRIVIAL EXAMPLE :
SELECT PNO
FROM P
WHERE CITY = CITY
Message:
If you have nulls in your DB ...
you’re getting wrong answers !!!
Note: Foregoing arguments apply to nulls and 3VL in
general ... But SQL manages to introduce
additional flaws of its own!
In particular, SQL represents "the third truth value" by NULL,
not UNKNOWN (even though it does support an UNKNOWN
keyword) ... Just as bad as representing zero by NULL !!!
TO SUM UP :
By definition, a null isn’t a value … THEREFORE:
• A "type" that contains a null isn’t a type
• A "tuple" that contains a null isn’t a tuple
• A "relation" that contains a null isn’t a relation
• In fact, nulls violate The Information Principle
/* see later */
• Which means the entire edifice crumbles, and
ALL BETS ARE OFF !!!
MUCH more could be said, but not here ...
AVOIDING NULLS IN SQL :
RM prohibits nulls ... So to use SQL relationally, we must
prevent them from occurring
Base tables: Specify NOT NULL for every column
Derived tables: Many ops can produce nulls ...
• "Set functions" such as SUM all return null if argument is
empty (except for COUNT and COUNT(*), which correctly
return zero)
• If scalar subquery evaluates to an empty table, that table
is coerced to null
• If row subquery evaluates to an empty table, that table is
coerced to a row of all nulls /* not a null row! */
• Outer join, union join
• If ELSE omitted from CASE, ELSE NULL assumed
• If x = y, NULLIF(x,y) returns null
• ON DELETE SET NULL, ON UPDATE SET NULL
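Several of these null-producing cases show up even when every column is declared NOT NULL. A minimal sketch, assuming Python's sqlite3 module (where a null comes back as None); the helper one and the variable names are illustrative, and only the cases SQLite supports directly are covered.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE SP (SNO TEXT NOT NULL, PNO TEXT NOT NULL, "
            "QTY INTEGER NOT NULL, UNIQUE (SNO, PNO))")   # left empty on purpose

def one(sql):
    """Return the single value produced by a one-row, one-column query."""
    return con.execute(sql).fetchone()[0]

empty_sum = one("SELECT SUM(QTY) FROM SP")                         # set function, empty argument
empty_count = one("SELECT COUNT(*) FROM SP")                       # COUNT is the exception
scalar_subquery = one("SELECT (SELECT QTY FROM SP WHERE SNO = 'S1')")
case_no_else = one("SELECT CASE WHEN 1 = 2 THEN 'x' END")          # ELSE NULL assumed
nullif_equal = one("SELECT NULLIF(5, 5)")                          # x = y

print(empty_sum, empty_count, scalar_subquery, case_no_else, nullif_equal)
```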
STRONG RECOMMENDATIONS :
• Base tables: Specify NOT NULL for every column
/* is this a duplicate recommendation? */
• Don’t use NULL keyword in any other context
• Don’t use UNKNOWN keyword anywhere
• Don’t omit ELSE from CASE
• Don’t use NULLIF
• Don’t use outer join except as noted below
• Don’t use union join
• Don’t specify PARTIAL or FULL on MATCH
STRONG RECOMMENDATIONS (cont.) :
• Don’t use MATCH on foreign key constraints
• Don’t use IS DISTINCT FROM
• Don’t use IS [NOT] TRUE or IS [NOT] FALSE
• Do use COALESCE on every exp that might otherwise
"evaluate to null" ... e.g.:
SELECT S.SNO ,
     ( SELECT COALESCE ( SUM ( ALL QTY ) , 0 )     /* this ALL is OK! */
       FROM   SP
       WHERE  SP.SNO = S.SNO ) AS TOTQ
FROM   S
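The effect of the COALESCE can be seen with one supplier who has shipments and one who has none. A minimal sketch, assuming Python's sqlite3 module; the sample rows are assumptions, and the ALL inside SUM is omitted here (it is the default) to keep the sketch independent of how a given product parses it.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE S  (SNO TEXT NOT NULL, UNIQUE (SNO));
CREATE TABLE SP (SNO TEXT NOT NULL, PNO TEXT NOT NULL, QTY INTEGER NOT NULL,
                 UNIQUE (SNO, PNO));
INSERT INTO S  (SNO) VALUES ('S1'), ('S5');
INSERT INTO SP (SNO, PNO, QTY) VALUES ('S1','P1',300), ('S1','P2',200);
""")

TOTALS = """
SELECT S.SNO ,
     ( SELECT {total}
       FROM   SP
       WHERE  SP.SNO = S.SNO ) AS TOTQ
FROM   S
ORDER  BY S.SNO
"""

with_coalesce = con.execute(
    TOTALS.format(total="COALESCE ( SUM ( QTY ) , 0 )")).fetchall()
without_coalesce = con.execute(
    TOTALS.format(total="SUM ( QTY )")).fetchall()

print(with_coalesce)      # S5 gets a proper value, zero
print(without_coalesce)   # S5 gets a null (None in Python)
```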
A REMARK ON OUTER JOIN :
Should generally be avoided (shotgun marriage): Forces
tables into a kind of union [sic!] even when they fail to
conform to requirements for union /* see later */ by, in effect,
padding with nulls before doing the union
But why not pad with proper values?
SELECT SNO , PNO
FROM   SP
UNION
SELECT SNO , 'nil' AS PNO
FROM   S
WHERE  SNO NOT IN
     ( SELECT SNO FROM SP )
Result:
SNO   PNO
S1    P1
S1    P2
S1    P3
..    ..
S5    nil
A REMARK ON OUTER JOIN (cont.) :
Could achieve same result via disciplined (“clean”) use of
explicit outer join plus COALESCE:
SELECT SNO , COALESCE ( PNO , ‘nil’ ) AS PNO
FROM ( S NATURAL LEFT OUTER JOIN SP ) AS POINTLESS
/* re that POINTLESS ... don’t even ask (yet?) */
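That the UNION formulation and the outer join plus COALESCE formulation agree can be checked on a small sample. A minimal sketch, assuming Python's sqlite3 module; the sample rows are assumptions, and the parenthesized join with its POINTLESS alias is dropped here to stay within syntax SQLite is known to accept.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE S  (SNO TEXT NOT NULL, SNAME TEXT NOT NULL, UNIQUE (SNO));
CREATE TABLE SP (SNO TEXT NOT NULL, PNO   TEXT NOT NULL, UNIQUE (SNO, PNO));
INSERT INTO S  (SNO, SNAME) VALUES ('S1','Smith'), ('S5','Adams');
INSERT INTO SP (SNO, PNO)   VALUES ('S1','P1'), ('S1','P2');
""")

# Pad with a proper value via UNION
union_form = sorted(con.execute("""
SELECT SNO , PNO
FROM   SP
UNION
SELECT SNO , 'nil' AS PNO
FROM   S
WHERE  SNO NOT IN ( SELECT SNO FROM SP )
"""))

# Outer join, with the nulls replaced immediately by COALESCE
join_form = sorted(con.execute("""
SELECT DISTINCT SNO , COALESCE ( PNO , 'nil' ) AS PNO
FROM   S NATURAL LEFT OUTER JOIN SP
"""))

print(union_form == join_form, union_form)
```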
BASE RELVARS, BASE TABLES :
Assume for simplicity until further notice that:
• All relvars are base relvars
• All table variables are base table variables
Special considerations* that apply to other kinds of relvars /
other kinds of table variables (to views in particular) will be
covered later
* Such as they are
DATA DEFINITIONS :
VAR S BASE RELATION
{ SNO    CHAR ,
  SNAME  CHAR ,
  STATUS INTEGER ,
  CITY   CHAR }
KEY { SNO } ;

CREATE TABLE S
( SNO    VARCHAR(5)  NOT NULL ,
  SNAME  VARCHAR(25) NOT NULL ,
  STATUS INTEGER     NOT NULL ,
  CITY   VARCHAR(20) NOT NULL ,
  UNIQUE ( SNO ) ) ;

VAR P BASE RELATION
{ PNO    CHAR ,
  PNAME  CHAR ,
  COLOR  CHAR ,
  WEIGHT FIXED ,
  CITY   CHAR }
KEY { PNO } ;

CREATE TABLE P
( PNO    VARCHAR(6)   NOT NULL ,
  PNAME  VARCHAR(25)  NOT NULL ,
  COLOR  CHAR(10)     NOT NULL ,
  WEIGHT NUMERIC(5,1) NOT NULL ,
  CITY   VARCHAR(20)  NOT NULL ,
  UNIQUE ( PNO ) ) ;
VAR SP BASE RELATION
{ SNO CHAR ,
  PNO CHAR ,
  QTY INTEGER }
KEY { SNO , PNO }
FOREIGN KEY { SNO }
REFERENCES S
FOREIGN KEY { PNO }
REFERENCES P ;

CREATE TABLE SP
( SNO VARCHAR(5) NOT NULL ,
  PNO VARCHAR(6) NOT NULL ,
  QTY INTEGER    NOT NULL ,
  UNIQUE ( SNO, PNO ) ,
  FOREIGN KEY ( SNO )
  REFERENCES S ( SNO ) ,
  FOREIGN KEY ( PNO )
  REFERENCES P ( PNO ) ) ;
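The three SQL definitions can be loaded and their constraints exercised. A minimal sketch, assuming Python's sqlite3 module; SQLite enforces foreign keys only after PRAGMA foreign_keys = ON, and it treats the declared lengths in VARCHAR(n) as documentation only. The helper rejected and the sample rows are illustrative assumptions.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")   # SQLite-specific: FK checking is off by default
con.executescript("""
CREATE TABLE S
( SNO VARCHAR(5) NOT NULL , SNAME VARCHAR(25) NOT NULL ,
  STATUS INTEGER NOT NULL , CITY VARCHAR(20) NOT NULL ,
  UNIQUE ( SNO ) ) ;
CREATE TABLE P
( PNO VARCHAR(6) NOT NULL , PNAME VARCHAR(25) NOT NULL ,
  COLOR CHAR(10) NOT NULL , WEIGHT NUMERIC(5,1) NOT NULL ,
  CITY VARCHAR(20) NOT NULL ,
  UNIQUE ( PNO ) ) ;
CREATE TABLE SP
( SNO VARCHAR(5) NOT NULL , PNO VARCHAR(6) NOT NULL , QTY INTEGER NOT NULL ,
  UNIQUE ( SNO, PNO ) ,
  FOREIGN KEY ( SNO ) REFERENCES S ( SNO ) ,
  FOREIGN KEY ( PNO ) REFERENCES P ( PNO ) ) ;
INSERT INTO S  (SNO, SNAME, STATUS, CITY) VALUES ('S1','Smith',20,'London');
INSERT INTO P  (PNO, PNAME, COLOR, WEIGHT, CITY) VALUES ('P1','Nut','Red',12.0,'London');
INSERT INTO SP (SNO, PNO, QTY) VALUES ('S1','P1',300);
""")

def rejected(sql):
    """True if the DBMS refuses the statement on integrity grounds."""
    try:
        con.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True

duplicate_key = rejected("INSERT INTO SP (SNO, PNO, QTY) VALUES ('S1','P1',100)")
unmatched_fk = rejected("INSERT INTO SP (SNO, PNO, QTY) VALUES ('S9','P1',100)")
null_column = rejected(
    "INSERT INTO S (SNO, SNAME, STATUS, CITY) VALUES ('S2','Jones',10,NULL)")

print(duplicate_key, unmatched_fk, null_column)
```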
UPDATING IS SET LEVEL
/* actually ALL rel ops are set level */ :
• INSERT inserts a set of tuples / DELETE deletes a set
of tuples / UPDATE updates a set of tuples
• Thus, e.g., "UPDATE tuple t" really means "update a
set of tuples that happens to be of cardinality one" ...
• ... and isn’t always possible!
  - Suppose suppliers S1 and S4 must be in the same city
    (integrity constraint for relvar S)
  - Then updating, e.g., just the city for S1 must fail
  - Instead (e.g.):
UPDATE S
WHERE SNO = 'S1'
OR SNO = 'S4' :
{ CITY := 'New York' } ;      /* Tutorial D */

UPDATE S
SET    CITY = 'New York'
WHERE  SNO = 'S1'
OR     SNO = 'S4' ;           /* SQL */
• Implications: (a) Integrity checking and triggered actions
mustn’t be done till all updating has been done (set level op
is not a sequence of tuple level ops) /* more on integrity
later */ ... (b) UPDATE / DELETE via cursor make no sense!
• Recommendation: Avoid row level ops (cursor updates in
particular) unless you know integrity problems won’t occur
WHAT’S MORE :
• Tuples are values and CAN'T be updated!
• "Updating a set of tuples" really means replacing one set
of tuples by another ...
R := ( R MINUS old ) UNION new ;
where old and new are relations (of same type as R)
containing the old and new tuples, respectively
• Likewise: "Updating attribute A within tuple t" is also
sloppy (though useful!) shorthand
RELATIONAL ASSIGNMENT :
• R := rx ; /* generic form */
• "INSERT R rx ;" shorthand for:
R := R D_UNION rx ;     /* D_UNION = "disjoint union" */
• "DELETE R WHERE bx ;" shorthand for:
R := R WHERE NOT ( bx ) ;
• "UPDATE R WHERE bx : { ... } ;" shorthand for:
/* see later */
/* { ... } = attribute assignment commalist */
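The shorthands can be mimicked with ordinary set operations, which makes the "replace one set of tuples by another" reading concrete. A minimal sketch in plain Python, not Tutorial D: a relation value is modelled as a frozenset of named tuples, and the names Supplier, d_union, and the sample data are assumptions of this sketch. The UPDATE expansion used is R := ( R MINUS old ) UNION new, as given earlier.

```python
from collections import namedtuple

# A tuple with named attributes; a relation value is a frozenset of such tuples
Supplier = namedtuple("Supplier", ["SNO", "CITY"])

R = frozenset({Supplier("S1", "London"),
               Supplier("S2", "Paris"),
               Supplier("S4", "London")})

def d_union(r, rx):
    """Disjoint union: an error if the operands have any tuple in common."""
    if r & rx:
        raise ValueError("D_UNION: operands have tuples in common")
    return r | rx

# INSERT R rx          =>  R := R D_UNION rx
after_insert = d_union(R, frozenset({Supplier("S5", "Athens")}))

# DELETE R WHERE bx    =>  R := R WHERE NOT ( bx ),  here bx is CITY = 'Paris'
after_delete = frozenset(t for t in R if not t.CITY == "Paris")

# UPDATE R WHERE bx : { CITY := 'New York' }  =>  R := ( R MINUS old ) UNION new
old = frozenset(t for t in R if t.SNO in ("S1", "S4"))
new = frozenset(t._replace(CITY="New York") for t in old)
after_update = (R - old) | new

print(sorted(after_update))
```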
UPDATING IN SQL :
• INSERT / DELETE / UPDATE directly analogous to Tutorial D
counterparts ... Two points on INSERT:
INSERT INTO T [ ( column name commalist ) ] tx ;
1. tx often but not always a VALUES exp ... INSERT really does
insert a set of rows /* not true historically! */
2. Recommendation: State column names explicitly. E.g.:
INSERT INTO SP ( PNO , SNO , QTY )     /* good */
VALUES ( 'P6' , 'S4' , 700 ) ,
       ( 'P6' , 'S5' , 250 ) ;
INSERT INTO SP                         /* bad: relies on column ordering */
VALUES ( 'S4' , 'P6' , 700 ) ,
       ( 'S5' , 'P6' , 250 ) ;
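With the column names stated, the order in which they are listed stops mattering. A minimal sketch, assuming Python's sqlite3 module and an SP table created without foreign keys so that the two sample rows can be inserted on their own.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE SP (SNO TEXT NOT NULL, PNO TEXT NOT NULL, "
            "QTY INTEGER NOT NULL, UNIQUE (SNO, PNO))")

# Columns named explicitly, in a different order from the table definition
con.execute("INSERT INTO SP ( PNO , SNO , QTY ) "
            "VALUES ( 'P6' , 'S4' , 700 ) , ( 'P6' , 'S5' , 250 )")

rows = con.execute("SELECT SNO , PNO , QTY FROM SP ORDER BY SNO").fetchall()
print(rows)
```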
No SQL counterpart to relational assignment as such ...
Best approximation:
Tutorial D:   R := rx ;
SQL:          DELETE FROM T ;
              INSERT INTO T ( ... ) tx ;
SQL could fail where Tutorial D succeeds
• The Assignment Principle:
After assignment of v to V, v = V must give TRUE
Very simple ... but far reaching consequences!
EVERY RELVAR HAS AT LEAST ONE
CANDIDATE KEY (why?) :
Let K be a subset of the heading of relvar R. Then K is a
candidate key (or just key) for R iff:
1. Uniqueness:
No possible value of R has two distinct tuples with
the same value for K
2. Irreducibility:
No proper subset of K has the uniqueness property
E.g., {SNO}, {PNO}, {SNO,PNO} for relvars S, P, SP, resp.
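The two conditions can be tested against a particular relation value, with one important caveat: keys are a property of the relvar (every possible value), so a check on one value can refute a proposed key but never establish it. A minimal sketch in plain Python under that assumption; the function names and the three sample shipment tuples are hypothetical.

```python
from itertools import combinations

def is_unique(rows, attrs):
    """True if no two rows of this value agree on all of the given attributes."""
    names = sorted(attrs)
    seen = set()
    for row in rows:
        key_value = tuple(row[a] for a in names)
        if key_value in seen:
            return False
        seen.add(key_value)
    return True

def consistent_with_key(rows, attrs):
    """True if, in this one value, attrs is unique and no proper subset is.

    A True result does not prove attrs is a key of the relvar; a False
    result for uniqueness does prove it is not.
    """
    attrs = frozenset(attrs)
    if not is_unique(rows, attrs):
        return False
    proper_subsets = (sub for n in range(len(attrs))
                      for sub in combinations(attrs, n))
    return not any(is_unique(rows, sub) for sub in proper_subsets)

sp = [{"SNO": "S1", "PNO": "P1", "QTY": 300},
      {"SNO": "S1", "PNO": "P2", "QTY": 200},
      {"SNO": "S2", "PNO": "P1", "QTY": 300}]

print(consistent_with_key(sp, {"SNO", "PNO"}))          # unique and irreducible here
print(consistent_with_key(sp, {"SNO", "PNO", "QTY"}))   # unique but reducible: a superkey
print(consistent_with_key(sp, {"SNO"}))                 # not unique
```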
POINTS ARISING :
Strong recommendation:
Every CREATE TABLE should
have at least one UNIQUE
and/or PRIMARY KEY specification
Note: We don’t insist on primary keys as such, but do
usually follow PK discipline ourselves (marked by
double underlining)
Key values are tuples! Key uniqueness relies on tuple equality! ...
Number of attributes is degree of key
Keys apply to relvars, not relations (why?)
Note: System can enforce uniqueness but can’t enforce irreducibility
Why irreducibility?
Because if system knows only that, e.g., {SNO,CITY} values
have uniqueness property, it will be enforcing the WRONG
INTEGRITY CONSTRAINT
Recommendation: Never lie to the DBMS!
A subset SK of the heading of R that’s unique but not necessarily
irreducible is a superkey
Uniqueness of SK implies that the functional dependence
/* see later */ SK → A is satisfied by R for all subsets A of the
heading of R
i.e., ALWAYS have "arrows out of superkeys"
RELVARS CAN HAVE N KEYS
(N > 1) :
VAR TAX_BRACKET BASE RELATION
{ LOW MONEY, HIGH MONEY, PERCENTAGE INTEGER }
KEY { LOW }
KEY { HIGH }
KEY { PERCENTAGE } ;
VAR ROSTER BASE RELATION
{ DAY DAY, HOUR HOUR, GATE GATE, PILOT NAME }
KEY { DAY, HOUR, GATE }
KEY { DAY, HOUR, PILOT } ;
VAR MARRIAGE BASE RELATION
{ SPOUSE_A NAME, SPOUSE_B NAME, DATE_OF_MARRIAGE DATE }
KEY { SPOUSE_A, DATE_OF_MARRIAGE }
KEY { DATE_OF_MARRIAGE, SPOUSE_B }
KEY { SPOUSE_B, SPOUSE_A } ;
SOME RELVARS HAVE FOREIGN KEYS :
• Let R1 and R2 be relvars, not necessarily distinct, and let K
be a key for R1
• Let FK be a subset of the heading of R2 such that there
exists a possibly empty sequence of attribute renamings on
R1 that maps K into K’ (say), where K’ and FK contain
exactly the same attributes
• Let R2 and R1 be subject to the constraint that, at all times,
every tuple t2 in R2 has an FK value that’s the K’ value for
some (necessarily unique) tuple t1 in R1 at the time in
question
• Then FK is a foreign key (with the same degree as K); the
associated constraint is a referential constraint; and R2 and
R1 are the referencing relvar and the corresponding
referenced relvar, respectively, for that constraint
E.g., {SNO} and {PNO} in relvar SP
Referential integrity rule: DB must never contain any
unmatched FK values
Note reliance on tuple equality again ... Another example:
VAR EMP BASE RELATION
{ ENO CHAR ,
  MNO CHAR ,
  ... }
KEY { ENO }
FOREIGN KEY { MNO }
REFERENCES EMP { ENO }
RENAME ( ENO AS MNO ) ;

CREATE TABLE EMP
( ENO VARCHAR(6) NOT NULL ,
  MNO VARCHAR(6) NOT NULL ,
  ..... ,
  UNIQUE ( ENO ) ,
  FOREIGN KEY ( MNO )
  REFERENCES EMP ( ENO ) ) ;
Column matching in SQL done by ordinal position, not by name,
so renaming not necessary ... though corresponding columns must be of
same type (no coercion)
Recommendation: Nevertheless, ensure that corresponding columns
do have the same name if possible
Can’t follow this recommendation if either:
• Table T has FK matching key of T itself (as in EMP)
• Table T2 has two distinct FKs both matching same key
in table T1 (as in bill of materials)
So do the best you can ...
REFERENTIAL ACTIONS
/* e.g., cascade delete */ :
Not part of RM as such ... Supported by SQL but not by
Tutorial D /* yet */
RM = foundation of the DB field, but only the foundation ...
Nothing wrong with additional features, so long as they
don’t violate RM and are in spirit of RM and are useful:
• Type theory
• Recovery and concurrency (?)
• Triggered procedures ... Referential actions a special
case, though specified declaratively ... OK so long as
set level not row level (?) ... OK so long as they don’t
violate The Assignment Principle (but they usually do)
(Very important!) WAY OF THINKING
ABOUT RELVARS :
Heading corresponds to a predicate (truth valued function):
e.g.,
Supplier SNO is under contract, is named SNAME, has
status STATUS, and is located in CITY
Parameters (SNO, SNAME, STATUS, CITY in the example)
stand for values of the relevant types
Tuples represent true propositions ("instantiations" of the
predicate that evaluate to TRUE), obtained by substituting
arguments for the parameters: e.g.,
Supplier S1 is under contract, is named Smith, has status
20, and is located in London
THUS :
• Every relvar has associated relvar predicate (or
meaning or intended interpretation or intension)
• If relvar R has predicate P, then every tuple t in R at
time x represents proposition p, derived by invoking (or
instantiating) P at time x with t’s attrib values as
arguments
• Body of R at time x is extension of P at time x
• The Closed World Assumption: Relvar R contains, at
any given time, all and only the tuples that represent
true propositions (true instantiations of the predicate for
R) at the time in question
Loosely: Everything the DB says (or implies) is true,
everything else is false
RELATIONS vs. TYPES :
TYPES are sets of things we can talk about;
RELATIONS are (true) statements about those things!
Note three very important corollaries ...
1. Types and relations are both NECESSARY
2. They're not the same thing (logical difference!)
3. They're SUFFICIENT (as well as necessary)*
A DB (with ops) is a logical system!
This was Codd’s great insight ... and it’s why RM is rock
solid, and "right," and will endure ... and why other "data
models" are just not in the same ballpark
* Need relvars too for changes over time
A NICE ANALOGY :
TYPES are to RELATIONS
as
NOUNS are to SENTENCES
STRUCTURE OF PRESENTATION :
1. Setting the scene
2. Types and domains
3. Tuples and relations, rows and tables
4. No duplicates, no nulls
5. Base relvars, base tables
6. SQL and algebra I: The original operators
7. SQL and algebra II: Additional operators
8. SQL and constraints
9. SQL and views
10. SQL and logic I: Relational calculus
11. SQL and logic II: Using logic to write SQL
12. Further SQL topics
13. Appendix: The relational model
14. Appendix: DB design
SOME PRELIMINARIES :
• Reminder re closure and nested exps
• Ops are generic and read-only
• But exps (op invocations) can include relvar refs: e.g.,
R1 UNION R2 /* R1 and R2 are relvar names */
• Relvar ref is itself a rel exp* (op is "return value of")
• INSERT / DELETE / UPDATE / relational assignment
are rel ops but not rel algebra ops: Caveat lector!
* Not in SQL, though! E.g., T1 UNION T2 illegal and so
is T (must say, e.g., SELECT * FROM T)
Tutorial D vs. SQL :
Overriding point = when correspondence needs to be
established between operand attributes (as in JOIN):

Tutorial D requires corresponding attributes to be,
formally, the very same attribute ... E.g.:
P JOIN S /* join P and S "on CITY" */

SQL uses different techniques in different contexts:
ordinal position, explicit specification, same name
(not always same type) ... E.g.:
SELECT P.PNO , P.PNAME , P.COLOR , P.WEIGHT , P.CITY /* or S.CITY */ ,
       S.SNO , S.SNAME , S.STATUS
FROM   P , S
WHERE  P.CITY = S.CITY /* explicit specification */
OR :
SELECT P.PNO , P.PNAME , P.COLOR , P.WEIGHT , P.CITY /* or S.CITY */ ,
       S.SNO , S.SNAME , S.STATUS
FROM   P JOIN S
ON     P.CITY = S.CITY

SELECT P.PNO , P.PNAME , P.COLOR , P.WEIGHT , CITY /* not P.CITY or S.CITY! */ ,
       S.SNO , S.SNAME , S.STATUS
FROM   P JOIN S
USING  ( CITY )

SELECT P.PNO , P.PNAME , P.COLOR , P.WEIGHT , CITY ,
       S.SNO , S.SNAME , S.STATUS
FROM   P NATURAL JOIN S
POINTS ARISING :
•
SQL permits, and sometimes requires, dot qualified names;
Tutorial D doesn’t
•
Tutorial D sometimes needs to rename attributes to avoid
naming clashes or mismatches; SQL usually doesn’t
(though it does support column renaming for other reasons)
•
Tutorial D has no need for "correlation names"
/* see later */
•
SQL supports features of rel calculus as well as features of
rel algebra; Tutorial D doesn’t /* see later */
•
SQL requires most queries to conform to SELECT - FROM
- WHERE template; Tutorial D has nothing analogous
/* see later */
MORE ON CLOSURE :
Result of every rel op is a relation ... Any op that produces
a result that’s not a rel isn’t a rel op!*
E.g., in SQL, any op that produces a result with:
• Duplicate rows
• Nulls
• Left to right column ordering
• Anonymous columns
• Duplicate column names
Strong recommendation: Don’t use any op that violates
closure if you want the result to be amenable to further
relational processing
* Except for relational inclusion (?)
Closure doesn’t mean intermediate results have to be
materialized (popular misconception!) ... E.g.:
( P JOIN S ) WHERE PNAME > SNAME
SELECT P.* , SNO , SNAME , STATUS
FROM   P , S
WHERE  P.CITY = S.CITY
AND    P.PNAME > S.SNAME
Can pipeline join result to restriction op
But another important point here:
"PNAME > SNAME" applies to result of P JOIN S ... so names
PNAME and SNAME refer to attributes of that result !!!
How do we know that result has such attributes? What is the
heading of that result? More generally: What’s the heading for
the result of any algebraic operation?
Need relation type inference rules such that, given headings
(and hence types) of input rels, we can infer heading (and hence
type) of output rel
RM includes such rules ... E.g., P JOIN S is of type:
RELATION { PNO CHAR , PNAME CHAR , COLOR CHAR , WEIGHT FIXED ,
CITY CHAR , SNO CHAR , SNAME CHAR , STATUS INTEGER }
In fact need for such rules is implied by closure
RENAME :
S RENAME ( CITY AS SCITY )
SELECT SNO , SNAME , STATUS , S.CITY AS SCITY
FROM   S
Result identical to current value of S except for renaming:
SNO  SNAME  STATUS  SCITY
S1   Smith  20      London
S2   Jones  10      Paris
S3   Blake  30      Paris
S4   Clark  20      London
S5   Adams  30      Athens
Note: Relvar S not changed in the DB! ... not like ALTER TABLE in SQL
Needed primarily as a preliminary to performing, e.g.,
UNION or JOIN /* see later */
HOW DOES SQL HANDLE
"TABLE TYPE" INFERENCE ???
Answer: Not very well!
• No proper notion of table type anyway
• Result can have anonymous columns
• Result can have duplicate column names
(• Result has left to right column ordering)
Strong recommendation: Use column renaming discipline
described earlier—which effectively relied on SQL-style
column renaming (AS specifications)—to ensure that SQL
conforms as far as possible to relational rules
EXAMPLE REVISITED :
ANOTHER POINT
( P JOIN S ) WHERE PNAME > SNAME
SELECT P.* , SNO , SNAME , STATUS
FROM   P , S
WHERE  P.CITY = S.CITY
AND    P.PNAME > S.SNAME
“P.PNAME > S.SNAME” applies to result of join ... ???
Actually quite difficult to explain this at all ... The standard does
explain it, but the machinations involved are much more
complicated than RM type inference rules ... Details beyond the
scope of this seminar !!!
In any case, you’re supposed to know SQL, so you already know
how this works (right?) ... Or had you never thought about this
issue before?
THE ORIGINAL OPERATORS :
restriction /* aka selection */
projection
JOIN, TIMES
theta join /* see later */
UNION, INTERSECT, MINUS
DIVIDEBY /* see much later */
RESTRICT :
P WHERE WEIGHT < 12.5
(boolean exp in which every attrib ref identifies attrib of P
and there are no relvar refs)
SELECT P.*
FROM   P
WHERE  WEIGHT < 12.5
Note: WHERE in Tutorial D is more general
Result has same heading as P and body = tuples of P for
which boolean exp evaluates to TRUE
PNO  PNAME  COLOR  WEIGHT  CITY
P1   Nut    Red    12.0    London
P5   Cam    Blue   12.0    Paris
PROJECT :
P { COLOR , CITY }
SELECT DISTINCT COLOR , CITY
FROM   P
Result has heading as specified:
COLOR  CITY
Red    London
Green  Paris
Blue   Oslo
Blue   Paris
Note: Duplicates eliminated!
Tutorial D also supports
projection on ALL BUT specified
attribs ... Similarly for other
ops where it makes sense
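The role of DISTINCT in the SQL form of projection can be checked with a minimal sketch. It assumes Python's built-in sqlite3 module as a stand-in SQL engine (not part of the seminar) and loads the six sample parts shown in the EXTEND example elsewhere in this material; the SQLite column types are an assumption.

```python
import sqlite3

# In-memory SQLite database standing in for "an SQL product" (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE P (PNO TEXT PRIMARY KEY, PNAME TEXT, COLOR TEXT, "
    "WEIGHT REAL, CITY TEXT)"
)
conn.executemany(
    "INSERT INTO P VALUES (?, ?, ?, ?, ?)",
    [
        ("P1", "Nut", "Red", 12.0, "London"),
        ("P2", "Bolt", "Green", 17.0, "Paris"),
        ("P3", "Screw", "Blue", 17.0, "Oslo"),
        ("P4", "Screw", "Red", 14.0, "London"),
        ("P5", "Cam", "Blue", 12.0, "Paris"),
        ("P6", "Cog", "Red", 19.0, "London"),
    ],
)

# Without DISTINCT the result is a bag: one row per part, duplicates kept.
bag_rows = conn.execute("SELECT COLOR, CITY FROM P").fetchall()

# With DISTINCT the result matches the relational projection P { COLOR , CITY }.
set_rows = conn.execute("SELECT DISTINCT COLOR, CITY FROM P").fetchall()

print(len(bag_rows), "rows without DISTINCT;", len(set_rows), "rows with DISTINCT")
```

Only the DISTINCT form yields a result with no duplicate rows, which is why it is the one paired with the Tutorial D projection.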
(Natural) JOIN :
Rels r1 and r2 joinable iff attribs with same name are of same
type (i.e., iff set theory union of headings is a legal heading)
/* concept relevant to other ops as well as join */
P JOIN S
SELECT P.* , SNO , SNAME , STATUS
FROM   P , S
WHERE  P.CITY = S.CITY
Result heading = set theory union of headings of P and S ...
Result body = set of all tuples t where t is the set theory union
of tuple from P and tuple from S
PNO  PNAME  COLOR  WEIGHT  CITY    SNO  SNAME  STATUS
P1   Nut    Red    12.0    London  S1   Smith  20
..   ...    ...    ....    ......  ..   .....  ..
P6   Cog    Red    19.0    London  S4   Clark  20
ALTERNATIVE SQL FORMULATION :
SELECT *
FROM P NATURAL JOIN S
Result heading has columns
CITY, PNO, PNAME, COLOR, WEIGHT, SNO, SNAME, STATUS
in that order ... but don’t write code that relies on
this ordering!
POINTS ARISING :
Let r1 and r2 be joinable
Let common attributes (set theory intersection of headings) be
{Y} ... Let other attributes of r1 and r2 be {X} and {Z}, resp. ...
Join has heading = set theory union of {X}, {Y}, and {Z}
If {X} and {Z} are empty, {Y} = entire heading of r1 and r2, and
r1 JOIN r2 degenerates to r1 INTERSECT r2
E.g.: S { CITY } JOIN P { CITY }
same as
S { CITY } INTERSECT P { CITY }
If {Y} is empty, r1 and r2 have no common attrib names,
and r1 JOIN r2 degenerates to r1 TIMES r2
E.g.: S { ALL BUT CITY } JOIN P { ALL BUT CITY }
same as
S { ALL BUT CITY } TIMES P { ALL BUT CITY }
Direct support for TIMES included for psychological reasons
rather than logical ones (likewise for INTERSECT)
Note: For TIMES, operand rels must have no common
attrib names
Can usefully define n-adic JOIN also (n ≥ 0)*
JOIN { r1 , r2 , ... , rn }
JOIN { r } ≡ r
JOIN { } ≡ ??? Answer: TABLE_DEE !!!
* TABLE_DEE is the identity with respect to JOIN
/* important! */
Why exactly is this possible? See later ...
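The degenerate cases of JOIN described above (INTERSECT when the headings coincide, TIMES when they are disjoint, TABLE_DEE as identity) can be checked with a small sketch. The representation is an assumption made for illustration: a tuple is a frozenset of (attribute, value) pairs and a relation is a Python set of such tuples; `natural_join` and `rel` are hypothetical helper names, not Tutorial D.

```python
def natural_join(r1, r2):
    """Natural join of two relations (sets of frozensets of (attr, value) pairs)."""
    result = set()
    for t1 in r1:
        for t2 in r2:
            d1, d2 = dict(t1), dict(t2)
            common = d1.keys() & d2.keys()
            # Tuples combine only if they agree on every common attribute.
            if all(d1[a] == d2[a] for a in common):
                result.add(frozenset({**d1, **d2}.items()))
    return result


def rel(*rows):
    """Build a relation from dict-style rows (hypothetical helper)."""
    return {frozenset(row.items()) for row in rows}


# Same heading on both sides: JOIN behaves like INTERSECT.
s_city = rel({"CITY": "London"}, {"CITY": "Paris"}, {"CITY": "Athens"})
p_city = rel({"CITY": "London"}, {"CITY": "Paris"}, {"CITY": "Oslo"})
same_heading = natural_join(s_city, p_city)

# No common attributes: JOIN behaves like TIMES (cartesian product).
s_sno = rel({"SNO": "S1"}, {"SNO": "S2"})
p_pno = rel({"PNO": "P1"}, {"PNO": "P2"}, {"PNO": "P3"})
no_common = natural_join(s_sno, p_pno)

# TABLE_DEE: no attributes, exactly one (empty) tuple.
table_dee = {frozenset()}
dee_joined = natural_join(table_dee, s_city)

print(same_heading == (s_city & p_city), len(no_common), dee_joined == s_city)
```

The empty tuple has no attributes in common with anything, so it combines with every tuple and adds nothing to it; that is the sense in which TABLE_DEE acts as the identity here.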
EXPLICIT JOINS IN SQL :
1. t1 NATURAL JOIN t2
/* already explained */
2. t1 JOIN t2 ON bx
3. t1 JOIN t2 USING ( C1 , C2 , ... , Cn )
4. t1 CROSS JOIN t2 /* ≡ ( SELECT * FROM t1 , t2 ) */

2. t1 JOIN t2 ON bx ... logically equivalent to:
( SELECT * FROM t1 , t2 WHERE bx )
EXPLICIT JOINS IN SQL (cont.) :
3. t1 JOIN t2 USING ( C1 , C2 , ... , Cn ) equivalent to:
( SELECT * FROM t1 , t2
WHERE t1.C1 = t2.C1 AND ... AND t1.Cn = t2.Cn )
—except that columns C1, C2, ..., Cn appear only once
in result, and result column ordering is:
first C1, C2, ..., Cn (in that order)
then other columns of t1 (in same order as in t1),
then other columns of t2 (in same order as in t2)
/* Do you begin to see what a pain this left to right ordering business is ??? */
RECOMMENDATIONS :
1. NATURAL JOIN: First choice ... Usually most succinct
if other recommendations followed ... But make sure
columns with same name are of same type (joinability)
2. Avoid JOIN ON: Virtually guaranteed to produce
duplicate column names (unless ... ???) ... If you must
use it, do renaming as well
3. JOIN USING: Make sure columns with same name are
of same type
4. CROSS JOIN: Make sure no common column names
5. WHERE (original syntax): As Case 2 (JOIN ON)
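The point about JOIN ON and duplicate column names can be observed directly. The following is a sketch only, assuming Python's sqlite3 module as the SQL engine; the names SQLite reports for unaliased result columns are implementation behavior and could differ in other products or releases. The tables are left empty because only the result headings matter here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE S (SNO TEXT, SNAME TEXT, STATUS INTEGER, CITY TEXT)")
conn.execute("CREATE TABLE P (PNO TEXT, PNAME TEXT, COLOR TEXT, WEIGHT REAL, CITY TEXT)")


def column_names(sql):
    """Return the result column names the engine reports for a query."""
    return [d[0] for d in conn.execute(sql).description]


# JOIN ON keeps both CITY columns, so the result has a duplicate column name.
on_cols = column_names("SELECT * FROM P JOIN S ON P.CITY = S.CITY")

# NATURAL JOIN keeps a single CITY column.
natural_cols = column_names("SELECT * FROM P NATURAL JOIN S")

# JOIN ON combined with explicit renaming (AS) avoids the clash.
renamed_cols = column_names(
    "SELECT P.PNO, P.CITY AS PCITY, S.SNO, S.CITY AS SCITY "
    "FROM P JOIN S ON P.CITY = S.CITY"
)

print(on_cols)
print(natural_cols)
print(renamed_cols)
```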
UNION, INTERSECT, MINUS :
Operands must be of same type, result is of same type also ...
Suppose parts have extra attribute STATUS, of type INTEGER:
P { STATUS , CITY } UNION S { CITY , STATUS }
SELECT STATUS , CITY
FROM   P
UNION  CORRESPONDING
SELECT CITY , STATUS
FROM   S
Note: Duplicates eliminated! (unless ALL specified, in SQL);
result has attributes (columns) STATUS and CITY (in
that order, in SQL)
If CORRESPONDING not specified, column matching done on
basis of ordinal position ... Don’t do this!
UNION, INTERSECT, MINUS (cont.) :
P { STATUS , CITY } INTERSECT S { CITY , STATUS }
SELECT STATUS , CITY
FROM   P
INTERSECT CORRESPONDING
SELECT CITY , STATUS
FROM   S
P { STATUS , CITY } MINUS S { CITY , STATUS }
SELECT STATUS , CITY
FROM   P
EXCEPT CORRESPONDING
SELECT CITY , STATUS
FROM   S
RECOMMENDATIONS :
• Make sure corresponding columns have same name
and type
• Always specify CORRESPONDING if possible ...
• ... otherwise, make sure columns line up properly
(because matching done by ordinal position): e.g.,
SELECT STATUS , CITY FROM P
UNION
SELECT STATUS , CITY FROM S /* note reordering */
• Don’t use "BY (column name commalist)"
• Never specify ALL! Note: Usual "justification" for
ALL is performance ...
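Why the columns must "line up properly" when CORRESPONDING is unavailable can be shown with a small sketch. It assumes Python's sqlite3 module, which has no CORRESPONDING and is lenient about column types (many products would reject the mismatched query outright). The two-column tables P and S below are simplified, hypothetical stand-ins for the projections used above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Simplified stand-ins: same two columns, declared in different orders.
conn.execute("CREATE TABLE P (STATUS INTEGER, CITY TEXT)")
conn.execute("CREATE TABLE S (CITY TEXT, STATUS INTEGER)")
conn.execute("INSERT INTO P VALUES (20, 'London')")
conn.execute("INSERT INTO S VALUES ('London', 20)")

# Matching by ordinal position: STATUS is paired with CITY and vice versa,
# so the "same" fact shows up as two different rows.
misaligned = conn.execute(
    "SELECT STATUS, CITY FROM P UNION SELECT CITY, STATUS FROM S"
).fetchall()

# Reordering the second select list so the columns line up by position.
aligned = conn.execute(
    "SELECT STATUS, CITY FROM P UNION SELECT STATUS, CITY FROM S"
).fetchall()

print(misaligned)
print(aligned)
```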
ONE LAST POINT :
Tutorial D also supports:
• “Disjoint union” (D_UNION) /* see defn of INSERT earlier */
• n-adic UNION, INTERSECT, D_UNION (n > 0) /* but not MINUS !!! */
WHICH OPERATORS ARE PRIMITIVE ???
Already seen that INTERSECT and TIMES can be defined
in terms of join ... i.e., not all ops primitive
Difference between primitive and useful !!!
One possible primitive set:
restrict
project
join
union
difference
But what about rename?
"WITH" SPECIFICATIONS
/* very useful feature */ :
Get pairs of supplier numbers such that the suppliers are
colocated (i.e., in same city):
( ( ( S RENAME ( SNO AS SA ) ) { SA , CITY } JOIN
    ( S RENAME ( SNO AS SB ) ) { SB , CITY } )
  WHERE SA < SB ) { SA , SB }
Or:
WITH ( S RENAME ( SNO AS SA ) ) { SA , CITY } AS R1 ,
( S RENAME ( SNO AS SB ) ) { SB , CITY } AS R2 ,
R1 JOIN R2 AS R3 ,
R3 WHERE SA < SB AS R4 :
R4 { SA, SB }
"WITH" IN SQL :

• Operands the other way around: WITH name AS exp
• No colon separator
• In Tutorial D, WITH can be used with exps of any kind; in
SQL, WITH can be used with table exps only
WITH T1 AS ( SELECT SNO AS SA , CITY FROM S ) ,
     T2 AS ( SELECT SNO AS SB , CITY FROM S ) ,
     T3 AS ( SELECT * FROM T1 NATURAL JOIN T2 ) ,
     T4 AS ( SELECT * FROM T3 WHERE SA < SB )
SELECT SA , SB FROM T4
WHAT DO RELATIONAL EXPRESSIONS MEAN?
Recall: Every relvar has a relvar predicate (i.e., what the relvar
means)
This notion extends naturally to arbitrary rel exps!
E.g., consider projection S {SNO,SNAME,STATUS} ...
Denotes rel containing all tuples of the form
TUPLE { SNO sno , SNAME sn , STATUS st }
such that a tuple of the form
TUPLE { SNO sno , SNAME sn , STATUS st , CITY sc }
currently exists in relvar S for some CITY value sc ...
In other words:
Specified exp denotes current extension of predicate:
There exists some city CITY such that supplier SNO is under
contract, is named SNAME, has status STATUS, and is
located in city CITY
Or just: Supplier SNO is under contract, is named SNAME, has
status STATUS, and is located somewhere
This predicate = meaning of S {SNO,SNAME,STATUS} ...
Has three parameters (relation has three attributes);
CITY is a bound variable, not a param /* see later */
Pred for arb rel exp can be determined from preds for relvars
involved plus semantics of rel ops involved
THETA JOIN :
E.g.: "unequal" join of S and P on cities /* SQL only */ :
SELECT SNO , SNAME , STATUS , S.CITY AS SCITY ,
       PNO , PNAME , COLOR , WEIGHT , P.CITY AS PCITY /* 3. "project" */
FROM   S , P /* 1. cartesian product */
WHERE  S.CITY <> P.CITY /* 2. restrict */
Note the conceptual algorithm for evaluating a SELECT - FROM - WHERE exp
(i.e., formal definition of semantics of such exps)
By the way: What if theta had been "=" ???
EXPRESSION TRANSFORMATION :
("query rewrite") :
Example: Suppliers who supply part P2, with corresp
quantities (Tutorial D):
( ( S JOIN SP ) WHERE PNO = ‘P2’ ) { ALL BUT PNO }
• DB : 100 suppliers, 100,000 shipments (500 for P2)
• No optimization at all (worst case) :
1. Join: 10,000,100 reads, 100,000 writes
2. Restrict (result 500 tuples): 100,000 reads, no writes
3. Project: no reads, no writes
TOTAL: 10,200,100 tuple I/Os
AN OBVIOUS IMPROVEMENT :
1. Restrict (result 500 tuples): 100,000 reads, no writes
2. Join (result 500 tuples): 100 reads, no writes
3. Project: no reads, no writes
TOTAL: 100,100 tuple I/Os
(100 times better)
In effect, optimizer has transformed original exp into
S JOIN ( SP WHERE PNO = ‘P2’ ) /* ignore projection */
Such transformations are one of the two great ideas at the
heart of optimization
Other = cost based optimizing: E.g., index or hash on SP.PNO
will reduce 100,000 reads in Step 1 to 500, and overall
procedure now roughly 17,000 times better than the original
(600 tuple I/Os vs. 10,200,100)
But such optimizing has little to do with RM per se, except for
strong logical vs. physical separation, which keeps access
strategies out of applications
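The tuple I/O counts in this example follow from simple arithmetic, sketched below. The cost model is the one implied by the figures above (every supplier read once, the full shipment set reread per supplier in the unoptimized join, intermediate results written once); the variable names are illustrative assumptions.

```python
suppliers = 100
shipments = 100_000
p2_shipments = 500

# No optimization: join first, then restrict, then project.
join_reads = suppliers + suppliers * shipments   # 100 + 100 * 100,000
join_writes = shipments                          # intermediate join result
restrict_reads = shipments                       # reread the join result
naive_total = join_reads + join_writes + restrict_reads

# Restrict first: one pass over shipments, then one pass over suppliers.
restrict_first_total = shipments + suppliers

# Restrict first, with an index or hash on SP.PNO: only the P2 shipments are read.
indexed_total = p2_shipments + suppliers

print("naive:", naive_total)
print("restrict first:", restrict_first_total)
print("restrict first + index:", indexed_total)
print("improvement factors:",
      round(naive_total / restrict_first_total),
      round(naive_total / indexed_total))
```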
THE DISTRIBUTIVE LAW :
E.g., SQRT ( a * b ) ≡ SQRT ( a ) * SQRT ( b )
"SQRT distributes over multiplication"
/* but not over addition */
In RM, restrict distributes over UNION / INTERSECT / MINUS
... also JOIN if restriction condition = AND of two separate
conditions, one for each join operand
I.e., ( r1 WHERE bx1 ) JOIN ( r2 WHERE bx2 ) ≡
( r1 JOIN r2 ) WHERE bx1 AND bx2
This law was used in the example
Net effect: Can do restrictions early
Project distributes over UNION
I.e., ( r1 UNION r2 ) { X } ≡
r1 { X } UNION r2 { X }
Also distributes over JOIN provided all joining attribs
are included in the projection
Can do projections early
THE COMMUTATIVE LAW :
Dyadic Op is commutative iff a Op b ≡ b Op a
• In arith, "+" and "*" are commutative,
"-" and "/" aren’t
• In RM, UNION / INTERSECT / JOIN are commutative,
MINUS isn’t
• Hence, in (e.g.) r1 JOIN r2, system is free to choose,
smaller of r1 and r2 (say) as "outer" rel
and other as "inner" rel
THE ASSOCIATIVE LAW :
Dyadic Op is associative iff a Op ( b Op c ) ≡ ( a Op b ) Op c
• In arith, "+" and "*" are associative,
"-" and "/" aren’t
• In RM, UNION / INTERSECT / JOIN are associative,
MINUS isn’t
• Hence, in (e.g.) r1 JOIN r2 JOIN r3:
No parens necessary
System is free to choose join sequence
THE IDEMPOTENCE AND ABSORPTION LAWS :
Dyadic Op is idempotent iff a Op a ≡ a
• In logic, AND and OR are idempotent
• In RM, UNION / INTERSECT / JOIN are idempotent,
MINUS isn’t
Absorption laws:
r1 UNION ( r1 INTERSECT r2 ) ≡ r1
r1 INTERSECT ( r1 UNION r2 ) ≡ r1
All such transformations can be done without regard for actual
data values or access paths!
Important note:
Many such transformations available for sets ...
But fewer for bags ...
And fewer still if column ordinal position has to be taken into
account ...
And far fewer if nulls and 3VL have to be taken into account ...
What do you conclude?
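One concrete instance of a law that holds for sets but not for bags is idempotence of union. The sketch below is illustrative only: it models a set-level UNION with Python's set union and a bag-level union (in the manner of SQL's UNION ALL) with collections.Counter addition; both modeling choices are assumptions.

```python
from collections import Counter

cities = ["London", "Paris", "Paris"]

# Set semantics: r UNION r is r again (idempotence holds).
as_set = set(cities)
set_union = as_set | as_set

# Bag semantics: combining r with itself doubles every multiplicity,
# so the result differs from r (idempotence fails).
as_bag = Counter(cities)
bag_union = as_bag + as_bag

print(set_union == as_set)
print(bag_union == as_bag)
```

An optimizer that must respect bag semantics therefore cannot simplify such an expression to its operand, which is one transformation lost.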
BUT DOESN’T RELYING ON ATTRIBUTE NAMES
MAKE FOR FRAGILE CODE ???
E.g., P JOIN S ... What if STATUS attribute added to P?
Popular misconception!
[Figure: three arrangements of program (pgm), DB definition (DB def), and DB]
1960s/1970s: pgm, DB ... Not much data independence
Today: pgm, DB def, DB ... More data independence (but ...)
The right way: pgm, DB def, DB def, DB ... Full data independence*
Note: Views should have solved this problem but didn’t
... because mapping specified as part of the view
definition instead of separately
Recommendation: Adopt the "operate via views strategy"!
* Full logical data independence, to be precise
STRUCTURE OF PRESENTATION :
1. Setting the scene
2. Types and domains
3. Tuples and relations, rows and tables
4. No duplicates, no nulls
5. Base relvars, base tables
6. SQL and algebra I: The original operators
7. SQL and algebra II: Additional operators
8. SQL and constraints
9. SQL and views
10. SQL and logic I: Relational calculus
11. SQL and logic II: Using logic to write SQL
12. Further SQL topics
13. Appendix: The relational model
14. Appendix: DB design
ADDITIONAL OPERATORS :
MATCHING, NOT MATCHING
EXTEND
image relations
DIVIDEBY
aggregate operators
SUMMARIZE
GROUP, UNGROUP
"what if"
ORDER BY (?)
SEMIJOIN AND SEMIDIFFERENCE :
Most exps involving join or difference really require semijoin
or semidifference
r1 MATCHING r2 ≡ ( r1 JOIN r2 ) { H1 }
where {H1} = heading of r1
S MATCHING SP
SELECT S.* FROM S
WHERE SNO IN
( SELECT SNO FROM SP )
r1 NOT MATCHING r2 ≡ r1 MINUS ( r1 MATCHING r2 )
S NOT MATCHING SP
SELECT S.* FROM S
WHERE SNO NOT IN
( SELECT SNO FROM SP )
If r1 and r2 of same type, r1 NOT MATCHING r2 degenerates
to r1 MINUS r2 /* analogous remark NOT true of semijoin */
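The IN / NOT IN formulations of semijoin and semidifference can be tried out with a short sketch. It assumes Python's sqlite3 module and the usual suppliers-and-parts sample values (five suppliers, twelve shipments, S5 with none); this section shows those values only in part, so treat the full shipment list as an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE S (SNO TEXT PRIMARY KEY, SNAME TEXT)")
conn.execute("CREATE TABLE SP (SNO TEXT, PNO TEXT, PRIMARY KEY (SNO, PNO))")
conn.executemany("INSERT INTO S VALUES (?, ?)", [
    ("S1", "Smith"), ("S2", "Jones"), ("S3", "Blake"),
    ("S4", "Clark"), ("S5", "Adams"),
])
conn.executemany("INSERT INTO SP VALUES (?, ?)", [
    ("S1", "P1"), ("S1", "P2"), ("S1", "P3"), ("S1", "P4"), ("S1", "P5"), ("S1", "P6"),
    ("S2", "P1"), ("S2", "P2"),
    ("S3", "P2"),
    ("S4", "P2"), ("S4", "P4"), ("S4", "P5"),
])

# S MATCHING SP: suppliers with at least one shipment.
matching = [row[0] for row in conn.execute(
    "SELECT S.* FROM S WHERE SNO IN ( SELECT SNO FROM SP ) ORDER BY SNO"
)]

# S NOT MATCHING SP: suppliers with no shipments at all.
not_matching = [row[0] for row in conn.execute(
    "SELECT S.* FROM S WHERE SNO NOT IN ( SELECT SNO FROM SP ) ORDER BY SNO"
)]

print(matching)
print(not_matching)
```

ORDER BY is used here only to make the printed lists deterministic. Note that both results carry just the attributes of S, which is what distinguishes a semijoin from a join.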
EXTEND :
EXTEND P ADD ( WEIGHT * 454 AS GMWT )
SELECT P.* , WEIGHT * 454 AS GMWT
FROM   P
PNO  PNAME  COLOR  WEIGHT  CITY    GMWT
P1   Nut    Red    12.0    London  5448.0
P2   Bolt   Green  17.0    Paris   7718.0
P3   Screw  Blue   17.0    Oslo    7718.0
P4   Screw  Red    14.0    London  6356.0
P5   Cam    Blue   12.0    Paris   5448.0
P6   Cog    Red    19.0    London  8626.0
Note: Relvar P not changed in the DB! ... not like ALTER TABLE in SQL
HENCE :
Get PNO and gram weight for parts with gram weight > 7000.0:
( ( EXTEND P ADD ( WEIGHT * 454 AS GMWT ) )
WHERE GMWT > 7000.0 ) { PNO, GMWT }
Contrast SQL:
SELECT PNO, ( WEIGHT * 454 ) AS GMWT
FROM P
WHERE ( WEIGHT * 454 ) > 7000.0 /* not GMWT > 7000.0 */
SELECT - FROM - WHERE template too rigid! (Lack of
orthogonality) ... Need to apply WHERE to SELECT result,
not FROM result
Actually the standard does allow:
SELECT TEMP.PNO , TEMP.GMWT
FROM ( SELECT P.PNO , ( WEIGHT * 454 ) AS GMWT
FROM P ) AS TEMP
WHERE TEMP.GMWT > 7000.0
But does your favorite product support subqueries in
the FROM clause?
Also, this style leads to references appearing (possibly a
long way) before definitions ...
IMAGE RELATIONS :
Image relation = "image" in some rel of some tuple
(usually a tuple in some other rel)
E.g., image in SP of tuple in S for S4:
( SP WHERE SNO = ‘S4’ ) { ALL BUT SNO }
PNO  QTY
P2   200
P4   300
P5   400
Very useful and widely applicable concept! So we define a
shorthand ...
S WHERE ( !!SP ) { PNO } = P { PNO }
/* !!SP = image in SP of "current" tuple in S ... "=" here is relational equality */
I.e., get suppliers who supply all parts!
SNO  SNAME  STATUS  CITY
S1   Smith  20      London
Image relation ref can’t appear wherever rel exp in general can
appear, only in contexts where pertinent tuple well defined (e.g.,
WHERE clause)
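The image relation idea can be mimicked with plain Python sets, as a rough sketch rather than a Tutorial D implementation. Relations are reduced to the attributes that matter here (SNO for S, PNO for P, SNO/PNO pairs for SP); `image_in_sp` is a hypothetical helper standing in for ( !!SP ) { PNO }, and the shipment list is the usual sample data, assumed here.

```python
S = {"S1", "S2", "S3", "S4", "S5"}                     # supplier numbers only
P = {"P1", "P2", "P3", "P4", "P5", "P6"}               # part numbers only
SP = {
    ("S1", "P1"), ("S1", "P2"), ("S1", "P3"), ("S1", "P4"), ("S1", "P5"), ("S1", "P6"),
    ("S2", "P1"), ("S2", "P2"),
    ("S3", "P2"),
    ("S4", "P2"), ("S4", "P4"), ("S4", "P5"),
}


def image_in_sp(sno):
    """Part numbers in SP for the given supplier: the image of that supplier in SP."""
    return {pno for (s, pno) in SP if s == sno}


# S WHERE ( !!SP ) { PNO } = P { PNO }: compare each image with the set of all parts.
supplies_all_parts = {sno for sno in S if image_in_sp(sno) == P}

print(image_in_sp("S4"))
print(supplies_all_parts)
```

A supplier with no shipments simply has an empty image, so no special casing is needed for S5.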
SQL has no direct support for image rels ... SQL analog of
foregoing example: /* can be simplified */
SELECT *
FROM   S
WHERE  NOT EXISTS
     ( SELECT PNO
       FROM   SP
       WHERE  SP.SNO = S.SNO
       EXCEPT
       SELECT PNO
       FROM   P )
AND    NOT EXISTS
     ( SELECT PNO
       FROM   P
       EXCEPT
       SELECT PNO
       FROM   SP
       WHERE  SP.SNO = S.SNO )
ANOTHER EXAMPLE :
S  { SNO }       /* suppliers */
SP { SNO, PNO }  /* supplier supplies part */
PJ { PNO, JNO }  /* part is used in project */
J  { JNO }       /* projects */
Get all sno/jno pairs such that:
• SNO sno currently appears in S
• JNO jno currently appears in J
• Supplier sno supplies all parts used in project jno
( S JOIN J ) WHERE !!PJ ⊆ !!SP
Easy ... but try it in SQL!
DIVIDEBY :
Should be dropped, IMHO
/* so can skip this topic if you like */
• Any query that can be done via divide can be done better
via image rels
• There are at least seven different divides!
• Doesn’t solve the problem it was originally, and
specifically, meant to address
Original and simplest version:
Let heading of r2 be subset of heading of r1 (so r1 and r2
definitely joinable, by the way)
[Figure: dividend r1 with attributes X and Y, divisor r2 with attribute Y, result with attribute X]
r1 DIVIDEBY r2 ≡ r1 { X } NOT MATCHING
( ( r1 { X } JOIN r2 ) NOT MATCHING r1 )
E.g., let RP be ( P WHERE COLOR = ‘Red’ ) ... Then
SP { SNO , PNO } DIVIDEBY RP { PNO }
SNO
S1
Loosely (?): SNOs for suppliers who supply all red parts ...
Probably needs to be joined to S (?)
AGGREGATE OPERATORS /* digression (?) */ :
In RM, agg op = op that derives a single value from the bag or set of
values of some attribute of some relation (or, for COUNT, from the
entire rel). E.g.:
X := COUNT ( S ) ; /* X = 5 */
SELECT COUNT ( * ) AS X
FROM   S
Y := COUNT ( S { STATUS } ) ; /* Y = 3 */
SELECT COUNT ( DISTINCT STATUS ) AS Y
FROM   S
Tutorial D syntax:
<agg op name> ( <relation exp> [, <exp> ] )
Tutorial D EXAMPLES :
SUM ( SP { QTY } ) /* 1000 */
SUM ( SP , QTY ) /* 3100 */
AVG ( SP , 3 * QTY ) /* 775 */
Legal <agg op name>s include:
COUNT, SUM, AVG, MAX, MIN, AND, OR, XOR
The <exp> can include <attribute ref>s (in practice, almost
always does)
The <exp> must be omitted for COUNT ... Otherwise, can be
omitted only if rel denoted by <relation exp> is of degree one, as
in first example above
WHAT ABOUT SQL ???
SELECT COUNT ( * ) AS X FROM S
SELECT COUNT ( DISTINCT STATUS ) AS Y FROM S
SQL doesn’t really support agg ops at all!
Foregoing exps are summarizations, not aggregations; they don’t
evaluate to 5 and 3, resp. ... instead, they evaluate to tables
containing those counts:
X
5

Y
3
/* COUNT invocations are agg op invocations, perhaps ... */
/* but they can’t appear as "stand alone" exps ... only inside table exps */
IN OTHER WORDS :
Aggregation is treated in SQL as a special case of
summarization (i.e., loosely, what’s represented by a SELECT
exp with a GROUP BY) ... Note that the foregoing SELECT
exps do have implicit GROUP BYs:
SELECT COUNT ( * ) AS X
FROM
S
GROUP BY ( )
SELECT COUNT ( DISTINCT STATUS ) AS Y
FROM
S
GROUP BY ( )
SQL "aggregation" is, loosely, a SELECT exp without an
explicit GROUP BY
Aggregation and summarization are often confused! ...
Perhaps you can begin to see why
Picture confused still further because SQL often coerces table
resulting from an "aggregation" to the single row it contains, or
even doubly coerces it to the single value that row contains,
as here:
SET X = ( SELECT COUNT ( * ) FROM S ) ;
SET Y = ( SELECT COUNT ( DISTINCT STATUS ) FROM S ) ;
Another oddity: Logical error in connection with SQL-style
aggregation and empty tables (I don’t mean the nulls problem) ...
Details beyond the scope of this seminar
BACK TO Tutorial D :
Image rels can be very useful in connection with agg ops
... e.g.:
Suppliers for whom total shipment quantity, taken over all
shipments, is less than 1000
S WHERE SUM ( !!SP , QTY ) < 1000
SQL "analog" (but note the trap!):
SELECT S.SNO , S.SNAME , S.STATUS , S.CITY
FROM   S , SP
WHERE  S.SNO = SP.SNO
GROUP  BY S.SNO , S.SNAME , S.STATUS , S.CITY
HAVING SUM ( SP.QTY ) < 1000
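The trap in the GROUP BY / HAVING formulation is that a supplier with no shipments never reaches the HAVING clause. A minimal sketch, assuming Python's sqlite3 module, the usual sample shipments, and a supplier table cut down to SNO alone; the COALESCE-based alternative is one possible repair, not the only one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE S (SNO TEXT PRIMARY KEY)")
conn.execute("CREATE TABLE SP (SNO TEXT, PNO TEXT, QTY INTEGER)")
conn.executemany("INSERT INTO S VALUES (?)",
                 [("S1",), ("S2",), ("S3",), ("S4",), ("S5",)])
conn.executemany("INSERT INTO SP VALUES (?, ?, ?)", [
    ("S1", "P1", 300), ("S1", "P2", 200), ("S1", "P3", 400),
    ("S1", "P4", 200), ("S1", "P5", 100), ("S1", "P6", 100),
    ("S2", "P1", 300), ("S2", "P2", 400),
    ("S3", "P2", 200),
    ("S4", "P2", 200), ("S4", "P4", 300), ("S4", "P5", 400),
])

# Join + GROUP BY + HAVING: S5 has no SP rows, so it drops out in the join.
group_by_sql = """
    SELECT S.SNO
    FROM   S, SP
    WHERE  S.SNO = SP.SNO
    GROUP  BY S.SNO
    HAVING SUM(SP.QTY) < 1000
    ORDER  BY S.SNO
"""

# Correlated subquery: SUM over no rows is null, so COALESCE maps it to 0.
subquery_sql = """
    SELECT S.SNO
    FROM   S
    WHERE  COALESCE((SELECT SUM(SP.QTY) FROM SP WHERE SP.SNO = S.SNO), 0) < 1000
    ORDER  BY S.SNO
"""

via_group_by = [row[0] for row in conn.execute(group_by_sql)]
via_subquery = [row[0] for row in conn.execute(subquery_sql)]

print(via_group_by)
print(via_subquery)
```

Only the second result agrees with S WHERE SUM ( !!SP , QTY ) < 1000, where the sum over an empty image is zero.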
•
Suppliers with fewer than three shipments:
S WHERE COUNT ( !!SP ) < 3
•
Suppliers where maximum shipment quantity < twice
minimum shipment quantity:
S WHERE MAX ( !!SP , QTY ) < 2 * MIN ( !!SP , QTY )
•
Update suppliers where total shipment quantity < 1000,
halving their status:
UPDATE S WHERE SUM ( !!SP , QTY ) < 1000 :
{ STATUS := 0.5 * STATUS } ;
SUMMARIZE :
SUMMARIZE SP PER ( S { SNO } ) ADD ( COUNT ( PNO ) AS PCT )
/* Tutorial D (see later for SQL analog) ... */
/* call this "SX1" for subsequent reference */
SNO  PCT
S1   6
S2   2
S3   1
S4   3
S5   0    /* note this tuple in particular! */
Note: COUNT ( PNO ) is not an invocation of the agg op
called COUNT! (which takes a rel as its argument) ...
So what is it ??? Hmm ...
SUMMARIZE (cont.) :
Heading of PER rel must = that of some projection of SUMMARIZE
rel ... If it actually is such a projection, can replace PER spec by BY
spec as in SX2 here:
SUMMARIZE SP BY { SNO } ADD ( COUNT ( PNO ) AS PCT )
SNO  PCT
S1   6
S2   2
S3   1
S4   3
Misses S5, with count of 0 ...
because BY { SNO } is shorthand for
PER ( SP { SNO } )
EXAMPLE SX2 HAS A DIRECT SQL ANALOG :
SELECT SNO , COUNT ( ALL PNO ) AS PCT
FROM
SP
GROUP BY SNO
Summarizations typically formulated in SQL by means of
SELECT exp with explicit GROUP BY /* but see later */
(Recall that "aggregations" typically have implicit GROUP BY)
But what about Example SX1 ??? Straightforward GROUP
BY doesn’t do the job ... Instead:
EXAMPLE SX1 IN SQL :
SELECT S.SNO , ( SELECT COUNT ( ALL PNO ) /* AS PCT ??? */
                 FROM   SP
                 WHERE  SP.SNO = S.SNO ) AS PCT /* double coercion */
FROM   S
Example SX2 could be done the same way:
SELECT DISTINCT SPX.SNO ,
       ( SELECT COUNT ( ALL SPY.PNO )
         FROM   SP AS SPY
         WHERE  SPY.SNO = SPX.SNO ) AS PCT
FROM   SP AS SPX
GROUP BY is logically redundant!
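The difference between SX1 and SX2 shows up as soon as both SQL forms are run against data in which some supplier has no shipments. The sketch below assumes Python's sqlite3 module and the usual sample shipments, and writes COUNT ( PNO ) without the ALL keyword, since support for that keyword inside aggregate calls varies by product.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE S (SNO TEXT PRIMARY KEY)")
conn.execute("CREATE TABLE SP (SNO TEXT, PNO TEXT)")
conn.executemany("INSERT INTO S VALUES (?)",
                 [("S1",), ("S2",), ("S3",), ("S4",), ("S5",)])
conn.executemany("INSERT INTO SP VALUES (?, ?)", [
    ("S1", "P1"), ("S1", "P2"), ("S1", "P3"), ("S1", "P4"), ("S1", "P5"), ("S1", "P6"),
    ("S2", "P1"), ("S2", "P2"),
    ("S3", "P2"),
    ("S4", "P2"), ("S4", "P4"), ("S4", "P5"),
])

# SX2 analog: GROUP BY over SP sees only suppliers that appear in SP.
sx2 = conn.execute(
    "SELECT SNO, COUNT(PNO) AS PCT FROM SP GROUP BY SNO ORDER BY SNO"
).fetchall()

# SX1 analog: one row per supplier in S, with a correlated count.
sx1 = conn.execute(
    "SELECT S.SNO, (SELECT COUNT(PNO) FROM SP WHERE SP.SNO = S.SNO) AS PCT "
    "FROM S ORDER BY S.SNO"
).fetchall()

print(sx2)
print(sx1)
```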
/* SX3 : Slight variation on SX1 */
SUMMARIZE SP PER ( S { SNO } ) ADD ( SUM ( QTY ) AS TOTQ )
/* SQL analog ... or is it? */
SELECT S.SNO , ( SELECT SUM ( ALL QTY )
FROM
SP
WHERE SP.SNO = S.SNO ) AS TOTQ
FROM
S
/* SX4 : Slight variation on SX3 */
( SUMMARIZE SP PER ( S { SNO } ) ADD ( SUM ( QTY ) AS TOTQ ) )
WHERE TOTQ > 250
SQL ANALOG /* or is it? */ :
SELECT SNO , SUM ( ALL QTY ) AS TOTQ
FROM SP
GROUP BY SNO
HAVING SUM ( ALL QTY ) > 250 /* not TOTQ > 250 !!! */
Or:
SELECT DISTINCT SPX.SNO , ( SELECT SUM ( ALL SPY.QTY )
FROM
SP AS SPY
WHERE SPY.SNO = SPX.SNO )
AS
TOTQ
FROM
SP AS SPX
WHERE ( SELECT SUM ( ALL SPY.QTY )
FROM
SP AS SPY
WHERE
SPY.SNO = SPX.SNO ) > 250
HAVING is logically redundant!
GROUP BY / HAVING formulations often more succinct
On the other hand, they sometimes give the "wrong"
answer, or at least not the answer really wanted
Recommendations:
• If you use GROUP BY or HAVING, make sure you’re
summarizing the right table (typically suppliers rather
than shipments, in terms of our example)
• Watch out for empty sets ... Use COALESCE wherever
necessary
BACK TO Tutorial D :
•
Image rels can be very useful in connection with
summarization ... In fact, they make SUMMARIZE logically
redundant!
SUMMARIZE SP PER ( S { SNO } )
ADD ( COUNT ( PNO ) AS PCT )
Or: EXTEND S { SNO } ADD ( COUNT ( !!SP ) AS PCT )
•
For each supplier, get supplier details and total, maximum,
and minimum shipment quantity:
EXTEND S ADD ( SUM ( !!SP , QTY ) AS TOTQ ,
MAX ( !!SP , QTY ) AS MAXQ ,
MIN ( !!SP , QTY ) AS MINQ )
/* note use of "multiple EXTEND" */
• For each supplier, get supplier details, total shipment quantity,
and grand total shipment quantity:
EXTEND S ADD
( SUM ( !!SP , QTY ) AS TOTQ ,
SUM ( SP , QTY ) AS GTOTQ )
SNO  TOTQ  GTOTQ
S1   1300  3100
..   ....  ....
S5   0     3100
• For each city c, get c and total and average shipment quantities
for all shipments for which supplier and part city are both c
WITH ( S JOIN SP JOIN P ) AS TEMP :
EXTEND TEMP { CITY } ADD ( SUM ( !!TEMP , QTY ) AS TOTQ ,
AVG ( !!TEMP , QTY ) AS AVGQ )
RECALL THESE RELATIONS :
R1
SNO  PNO
S2   P1
S2   P2
S3   P2
S4   P2
S4   P4
S4   P5
R4
SNO  PNO_REL
S2   { P1 , P2 }
S3   { P2 }
S4   { P2 , P4 , P5 }
(each PNO_REL value is a relation with the single attribute PNO)
Type of R4 =
RELATION { SNO CHAR ,
PNO_REL RELATION { PNO CHAR } }
GROUP AND UNGROUP :
R1 GROUP ( { PNO } AS PNO_REL ) : gives R4
R4 UNGROUP ( PNO_REL ) : gives R1
SQL has no direct counterparts
Exercise: What does this do?—
EXTEND R1 { SNO } ADD ( !!R1 AS PNO_REL )
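GROUP and UNGROUP can be imitated on the R1 and R4 sample values with Python sets, as an illustrative sketch. Here a relation-valued attribute is modeled as a frozenset of part numbers; the functions `group` and `ungroup` are hypothetical names and cover only this two-attribute case.

```python
R1 = {
    ("S2", "P1"), ("S2", "P2"),
    ("S3", "P2"),
    ("S4", "P2"), ("S4", "P4"), ("S4", "P5"),
}


def group(r1):
    """R1 GROUP ( { PNO } AS PNO_REL ): one tuple per SNO, holding its set of PNOs."""
    snos = {sno for (sno, _) in r1}
    return {(sno, frozenset(pno for (s, pno) in r1 if s == sno)) for sno in snos}


def ungroup(r4):
    """R4 UNGROUP ( PNO_REL ): flatten each nested set back into SNO/PNO pairs."""
    return {(sno, pno) for (sno, pno_rel) in r4 for pno in pno_rel}


R4 = group(R1)

for sno, pno_rel in sorted(R4):
    print(sno, sorted(pno_rel))
print(ungroup(R4) == R1)
```

The round trip recovers R1 here because no nested set is empty; an R4 tuple with an empty PNO_REL would contribute nothing on ungrouping.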
"WHAT IF" QUERIES :
What if parts in Paris were in Nice and their weight doubled?
UPDATE P
WHERE CITY = ‘Paris’ :
{ CITY := ‘Nice’ ,
WEIGHT := 2 * WEIGHT }
/* read-only op !!! */
WITH T1 AS
( SELECT P.*
FROM   P
WHERE  CITY = ‘Paris’ ) ,
T2 AS
( SELECT T1.* , ‘Nice’ AS NC ,
2 * WEIGHT AS NW
FROM   T1 )
SELECT PNO , PNAME , COLOR ,
NW AS WEIGHT ,
NC AS CITY
FROM   T2
Tutorial D EXPRESSION IS
SHORTHAND FOR :
WITH ( P WHERE CITY = ‘Paris’ ) AS R1 ,
( EXTEND R1 ADD ( ‘Nice’ AS NC ,
2 * WEIGHT AS NW ) ) AS R2 ,
R2 { ALL BUT CITY , WEIGHT } AS R3 :
R3 RENAME ( NC AS CITY , NW AS WEIGHT )
/* can now explain expansion of UPDATE statement: */
UPDATE P WHERE CITY = ‘Paris’ :
{ CITY := ‘Nice’ , WEIGHT := 2 * WEIGHT } ;
Expansion:
P := ( P WHERE CITY ≠ ‘Paris’ )
UNION
( UPDATE P WHERE CITY = ‘Paris’ :
{ CITY := ‘Nice’ , WEIGHT := 2 * WEIGHT } ) ;
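The difference between the read-only "what if" expression and the UPDATE statement's expansion can be sketched in Python as follows. The `Part` type, the function names, and the sample part values are assumptions for illustration; the point is only that the expression returns a new relation, while the statement assigns the union of the untouched tuples and the substituted ones.

```python
from typing import NamedTuple


class Part(NamedTuple):
    PNO: str
    PNAME: str
    COLOR: str
    WEIGHT: float
    CITY: str


# Assumed sample value for relvar P.
P = frozenset({
    Part("P1", "Nut", "Red", 12.0, "London"),
    Part("P2", "Bolt", "Green", 17.0, "Paris"),
    Part("P3", "Screw", "Blue", 17.0, "Oslo"),
    Part("P5", "Cam", "Blue", 12.0, "Paris"),
})


def what_if(p):
    """Read-only: the relation denoted by the UPDATE expression."""
    return frozenset(
        t._replace(CITY="Nice", WEIGHT=2 * t.WEIGHT)
        for t in p if t.CITY == "Paris"
    )


def update_statement(p):
    """Value the expansion would assign to P:
    ( P WHERE CITY is not Paris ) UNION ( the what-if result )."""
    return frozenset(t for t in p if t.CITY != "Paris") | what_if(p)


preview = what_if(P)          # P itself is untouched
new_P = update_statement(P)   # what P := ... would assign
```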
page 225
WHAT ABOUT "ORDER BY" ???
Not a relational op (because result is not a relation) ...
So not legal in relational exps, and hence not in view
definitions etc.
• Produces ordered list or sequence of tuples
Also, not a function
• Result indeterminate (in general) …
/* like many SQL expressions, in fact */
Also, produces a sequence of tuples, yet "<" and ">"
aren't defined for tuples!
page 226
STRUCTURE OF PRESENTATION :
1. Setting the scene
2. Types and domains
3. Tuples and relations, rows and tables
4. No duplicates, no nulls
5. Base relvars, base tables
6. SQL and algebra I: The original operators
7. SQL and algebra II: Additional operators
8. SQL and constraints
9. SQL and views
10. SQL and logic I: Relational calculus
11. SQL and logic II: Using logic to write SQL
12. Further SQL topics
13. Appendix: The relational model
14. Appendix: DB design
page 227
INTEGRITY CONSTRAINTS :
An integrity constraint is, loosely, a boolean expression
that must evaluate to TRUE
Two basic kinds: Type constraints / database constraints
Constraints = really what DB management is all about!
•
Talking of poor quality of education ...
Constraints are vital, and proper DBMS support for them is
vital as well
•
I don’t care how fast your system runs if I can’t trust the
answers it’s giving me!
page 228
TYPE CONSTRAINTS :
Define values that make up a given type ... For system
defined types, not much to say ... So suppose for sake of
example that quantities are of a user defined type, say QTY:
TYPE QTY /* quantities */
POSSREP QPR { Q INTEGER
CONSTRAINT Q ≥ 0 AND Q ≤ 5000 } ;
TYPE POINT /* geometric points in 2D space */
POSSREP CARTESIAN { X FIXED, Y FIXED
CONSTRAINT SQRT ( X ** 2 + Y ** 2 ) ≤ 100.0 } ;
Checked "immediately" (actually during selector operator
invocations ... see next page)
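One way to picture "checked during selector invocation" is a value class whose constructor enforces the constraint. The Python sketch below is only an analogy: the class name `Qty`, the function `the_q`, and the inclusive bounds 0 to 5000 (taken from the SQL CHECK shown later in this section) are assumptions, not part of Tutorial D.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Qty:
    """Hypothetical stand-in for TYPE QTY with one possrep component Q."""
    q: int

    def __post_init__(self):
        # The type constraint is checked here, i.e. at selector invocation,
        # so no invalid Qty value can ever exist.
        if not isinstance(self.q, int) or not (0 <= self.q <= 5000):
            raise ValueError(f"{self.q!r} is not a valid quantity")


def the_q(qz: Qty) -> int:
    """Counterpart of a THE_ operator: expose the Q component."""
    return qz.q


ok = Qty(250)  # analogous to the selector invocation QPR ( 250 )
```

Since the check lives in the constructor, code that receives a `Qty` never needs to re-validate it, which is the practical payoff of a type constraint.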
page 229
SELECTORS AND THE_ OPERATORS :
One selector per possrep
One THE_ op per possrep component
Examples:
• QPR ( 250 )
/* selector invocation ... actually a literal */
Simplify QTY type def:
TYPE QTY POSSREP { Q INTEGER
CONSTRAINT Q ≥ 0 AND Q ≤ 5000 } ;
Selector invocation becomes:
QTY ( 250 )
page 230
SELECTORS AND THE_ OPERATORS
(cont.) :
Examples (cont.):
• THE_Q ( QZ ) /* THE_ op invocation */
/* (QZ is of type QTY) */
Simplify POINT type def:
TYPE POINT POSSREP { X FIXED , Y FIXED CONSTRAINT ... } ;
• POINT ( PX , PY ) /* POINT selector invocation */
• POINT ( 5.7 , -3.9 ) /* POINT literal */
• THE_X ( P ) /* THE_ op invocation */
page 231
WHAT ABOUT SQL ???
SQL doesn’t support type constraints at all!
E.g.: CREATE TYPE QTY AS INTEGER FINAL ;
/* all available integers denote valid quantities ?!? */
So to constrain quantities further, must specify approp
database constraint on every use of the type ... E.g.:
CREATE TABLE SP
( SNO VARCHAR(5) NOT NULL ,
PNO VARCHAR(6) NOT NULL ,
QTY QTY NOT NULL , … ,
CONSTRAINT SPQC CHECK ( QTY >= QTY(0) AND
QTY <= QTY(5000) ) ) ;
page 232
SQL does support selectors and THE_ ops (in effect), but
doesn’t use these terms and support not entirely
straightforward ... Further details beyond scope of this
seminar
POINT example in SQL:
CREATE TYPE POINT AS
( X NUMERIC(5,1) , Y NUMERIC(5,1) ) NOT FINAL ;
Recommendation: Use database constraints to make up
for SQL’s lack of type constraints
Duplication of effort much better than having bad data in the
database!
page 233
DATABASE CONSTRAINTS :
CONSTRAINT CX1 IS_EMPTY
( S WHERE STATUS < 1 OR STATUS > 100 ) ;
CREATE ASSERTION CX1 CHECK
( NOT EXISTS
( SELECT * FROM S
WHERE STATUS < 1 OR STATUS > 100 ) ) ;
CONSTRAINT CX2 IS_EMPTY
( S WHERE CITY = ‘London’ AND STATUS ≠ 20 ) ;
CREATE ASSERTION CX2 CHECK
( NOT EXISTS
( SELECT * FROM S
WHERE CITY = ‘London’ AND STATUS <> 20 ) ) ;
• CX1 and CX2 are "tuple" (or "row") constraints:
Deprecated terms
page 234
CONSTRAINT CX3
COUNT ( S ) =
COUNT ( S { SNO } ) ;
CREATE ASSERTION CX3 CHECK
( UNIQUE ( SELECT SNO
FROM S ) ) ;
• {SNO} is a superkey for S
• In practice would use KEY or UNIQUE shorthand
• Note: UNIQUE in SQL returns TRUE iff every row in its
argument table is distinct /* more later */
• Alternative SQL formulation:
CREATE ASSERTION CX3 CHECK
( ( SELECT COUNT ( SNO ) FROM S ) =
( SELECT COUNT ( DISTINCT SNO ) FROM S ) ) ;
page 235
CONSTRAINT CX4
COUNT ( S { SNO } ) =
COUNT ( S { SNO , CITY } ) ;
CREATE ASSERTION CX4 CHECK
( NOT EXISTS ( SELECT *
FROM S AS SX
WHERE EXISTS ( SELECT *
FROM S AS SY
WHERE SX.SNO = SY.SNO
AND
SX.CITY <> SY.CITY ) ) ) ;
• Functional dependence {SNO} → {CITY}
• In practice this FD implied by fact that {SNO} is a superkey,
so no need to state CX4 explicitly ... but not all FDs are
consequences of keys
• But most will be, if DB well designed!
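The counting formulations of CX3 and CX4 translate directly into code. The Python sketch below uses the supplier values that appear elsewhere in this seminar; the helper names `project`, `is_superkey`, and `satisfies_fd` are hypothetical.

```python
# Supplier relation as a list of dicts.
S = [
    {"SNO": "S1", "SNAME": "Smith", "STATUS": 20, "CITY": "London"},
    {"SNO": "S2", "SNAME": "Jones", "STATUS": 10, "CITY": "Paris"},
    {"SNO": "S3", "SNAME": "Blake", "STATUS": 30, "CITY": "Paris"},
    {"SNO": "S4", "SNAME": "Clark", "STATUS": 20, "CITY": "London"},
    {"SNO": "S5", "SNAME": "Adams", "STATUS": 30, "CITY": "Athens"},
]


def project(rel, attrs):
    """Projection as a set of value tuples, so duplicates disappear."""
    return {tuple(t[a] for a in attrs) for t in rel}


def is_superkey(rel, attrs):
    """CX3 pattern: COUNT ( rel ) = COUNT ( rel { attrs } )."""
    all_attrs = sorted(rel[0]) if rel else []
    return len(project(rel, all_attrs)) == len(project(rel, attrs))


def satisfies_fd(rel, lhs, rhs):
    """CX4 pattern: COUNT ( rel { lhs } ) = COUNT ( rel { lhs , rhs } )."""
    return len(project(rel, lhs)) == len(project(rel, lhs + rhs))
```

For this sample value, {SNO} is a superkey and {SNO} → {CITY} holds, whereas {CITY} → {STATUS} does not (Paris appears with statuses 10 and 30).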
page 236
CONSTRAINT CX5 IS_EMPTY
( ( S JOIN SP )
WHERE STATUS < 20 AND PNO = ‘P6’ ) ;
CREATE ASSERTION CX5 CHECK
( NOT EXISTS
( SELECT *
FROM S NATURAL JOIN SP
WHERE STATUS < 20 AND PNO = ‘P6’ ) ) ;
• "Multi-relvar" constraint: Slightly deprecated term
• CX1-CX4 were single-relvar constraints, or just relvar
constraints for short: Slightly deprecated terms
page 237
CONSTRAINT CX6
SP { SNO } ⊆ S { SNO } ;
CREATE ASSERTION CX6 CHECK
( NOT EXISTS
( SELECT SNO FROM SP
EXCEPT
SELECT SNO FROM S ) ) ;
• Foreign key constraint from SP to S
• In practice would use FOREIGN KEY shorthand
(at least in SQL)
page 238
DATABASE CONSTRAINTS IN SQL :
Any DB constraint expressible in Tutorial D can be expressed
in SQL via CREATE ASSERTION (unless "possibly
nondeterministic" ???)
But SQL also supports base table constraints ... e.g.:
CREATE TABLE SP
( ... ,
CONSTRAINT CX5 CHECK
( PNO <> ‘P6’ OR ( SELECT STATUS FROM S
WHERE SNO = SP.SNO ) >= 20 ) ) ;
Equivalent formulation could be specified on base table S
instead—or any base table in the database!
Useful for "row constraints" but not for other kinds
page 239
CREATE TABLE S ( ... ,
CONSTRAINT CX1 CHECK ( STATUS >= 1 AND STATUS <= 100 ) ) ;
CREATE TABLE S ( ... ,
CONSTRAINT CX2 CHECK ( STATUS = 20 OR CITY <> ‘London’ ) ) ;
SQL also supports column constraints ... e.g., NOT NULL, and
key constraints for keys of degree one
Note:
• Base table constraint for T automatically satisfied if T
is empty (!)
• (Important) Most current products support simple row
constraints (plus key and FK constraints) only !!!
page 240
OK, so I saved the bad news till last ...
Recommendations:
• State constraints declaratively wherever possible
• Use triggered procedures to enforce constraints that can’t
be stated declaratively
See Applied Mathematics for Database Professionals,
by Lex de Haan and Toon Koppelaars (Apress, 2007)
• Lobby the vendors!
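As a hedged illustration of the triggered-procedure fallback, the following sketch uses Python's built-in sqlite3 module to enforce constraint CX5 on inserts into SP. The trigger name, the reduced table layouts, and the helper `try_insert` are assumptions; a complete enforcement would also need triggers for UPDATE on SP and on S, which are omitted here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE S  ( SNO TEXT PRIMARY KEY , STATUS INTEGER NOT NULL ) ;
CREATE TABLE SP ( SNO TEXT NOT NULL , PNO TEXT NOT NULL ,
                  QTY INTEGER NOT NULL ,
                  PRIMARY KEY ( SNO , PNO ) ) ;

-- CX5: no supplier with status less than 20 may supply part P6.
-- This trigger covers INSERT on SP only (a partial enforcement).
CREATE TRIGGER CX5_SP_INSERT BEFORE INSERT ON SP
WHEN NEW.PNO = 'P6'
     AND ( SELECT STATUS FROM S WHERE SNO = NEW.SNO ) < 20
BEGIN
    SELECT RAISE ( ABORT , 'CX5 violated' ) ;
END ;

INSERT INTO S VALUES ( 'S1' , 20 ) , ( 'S2' , 10 ) ;
""")


def try_insert(sno, pno, qty):
    """Attempt one insert; report whether the DBMS accepted it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO SP VALUES ( ? , ? , ? )",
                         (sno, pno, qty))
        return True
    except sqlite3.DatabaseError:
        return False
```

The design cost is visible even in this small case: the constraint is one declarative sentence, but the procedural version must be repeated for every operation that could violate it.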
page 241
Distinction single- vs. multi-relvar constraints is more
pragmatic than logical ... because:
Like single-relvar constraints, multi-relvar constraints
must be checked "immediately" !!!
All constraints must be satisfied at statement boundaries
—no "deferred" or COMMIT-time checking at all!
(contrary to SQL standard and some commercial products)
In order to explain this unorthodox view, I need to digress
for a moment and talk about transactions ...
page 242
THE "ACID" PROPERTIES :
Atomicity: Transactions are "all or nothing"
Consistency: Transactions transform a consistent state of
the DB into another consistent state, without necessarily
preserving consistency at all intermediate points
Isolation: Any given transaction's updates are concealed
from all other transactions until the given transaction
commits
Durability: Once a transaction commits, its updates survive
in the DB, even if there's a subsequent system crash
page 243
One argument in favor of transactions has always been
that transactions are supposed to be a unit of integrity
(see "Consistency" on previous page)
But I no longer believe this argument!—I now think
statements have to be that "unit of integrity”—i.e.,
to repeat, constraints must be satisfied at statement
boundaries
Why have I changed my mind?
For at least five reasons:
page 244
FIRST AND MOST IMPORTANT :
As we have seen, a DB can be regarded as a collection
of propositions, assumed by convention to be ones that
evaluate to TRUE
And if that collection is ever allowed to include any
inconsistencies, then all bets are off!
 I'll come back to this point later ...
The "I" property might mean that only one transaction ever
sees any particular inconsistency, but that particular
transaction does see the inconsistency and can thus produce
wrong answers
page 245
SECOND :
I don't agree that any given inconsistency can be seen by
only one transaction, anyway ... E.g.:
• Suppose transaction TX1 obtains some incorrect information
from the DB and writes it to file F
• Suppose transaction TX2 now reads that same information
from file F
TX1 has "infected" TX2 ... TX1 and TX2 aren't really isolated
from each other ... Even if they run at totally different times!
I don't believe in the "I" property of transactions
page 246
THIRD :
• Don't want every program or other “code unit” to have
to cater for the possibility that the DB might be
inconsistent when it runs!
  • Severe loss of orthogonality if a procedure that
  assumes consistency becomes unsafe to use when
  checking is deferred
• Desirable to be able to specify a code unit
independently of whether that unit is to run as a
transaction per se or as part of a transaction
  • In fact, I’d like nested transactions ... but that's
  a topic for another day
page 247
FOURTH :
The Principle of Interchangeability (of base relvars and
views—see later) implies that the very same constraint might
be a single-relvar constraint with one design for the DB and
a multi-relvar constraint with another
E.g., VAR LS VIRTUAL ( S WHERE CITY = ‘London’ ) ;
VAR NLS VIRTUAL ( S WHERE CITY ≠ ‘London’ ) ;
Instead of S being real and LS and NLS virtual, we could
make LS and NLS real and S virtual!—S is the union of
restrictions LS and NLS, and mapping works both ways
/* more on interchangeability later */
page 248
• SNO unique in S: single-relvar constraint
• SNO unique across LS and NLS: multi-relvar constraint
CONSTRAINT CX7 IS_EMPTY
( LS { SNO } JOIN
NLS { SNO } ) ;
CREATE ASSERTION CX7 CHECK
( NOT EXISTS
( SELECT *
FROM LS , NLS
WHERE LS.SNO = NLS.SNO ) ) ;
page 249
FIFTH :
Semantic optimization uses constraints to simplify queries
(for performance reasons) ... E.g.:
Constraint: All red parts must be stored in London
Query: Find suppliers who supply only red parts and are
located in the same city as at least one of the parts
they supply
Simplifies to: Find London suppliers who supply only red parts
Payoff could be orders of magnitude greater than that from
conventional optimization ... but it requires DB to be consistent
at all times, not just transaction boundaries (if constraints
aren’t satisfied, simplifications will be invalid, and answers will
be wrong)
page 250
BUT DOESN'T SOME CHECKING HAVE
TO BE DEFERRED ???
E.g., "Supplier S1 and part P1 are in the same city":
• If supplier S1 moves from London to Paris, then part P1
must move from London to Paris as well
• Conventional solution /* SQL */ :
START TRANSACTION ;
UPDATE S SET CITY = ‘Paris’ WHERE SNO = ‘S1’ ;
UPDATE P SET CITY = ‘Paris’ WHERE PNO = ‘P1’ ;
COMMIT ; /* integrity check done here */
• If this transaction asks "Are supplier S1 and part P1
in the same city?" between the two UPDATEs, it will
get the answer no
page 251
Tutorial D SOLUTION :
The multiple assignment operator lets us carry out several
assignments as a single operation, without any integrity
checking being done until all assignments have been executed:
UPDATE S WHERE SNO = ‘S1’ : { CITY := ‘Paris’ } ,
UPDATE P WHERE PNO = ‘P1’ : { CITY := ‘Paris’ } ;
Note comma separator … One statement, not two!
Shorthand for:
S := … , P := … ;
page 252
SEMANTICS /* slightly simplified */ :
1. Evaluate source expressions
2. Execute individual assignments in sequence
3. Do integrity checking
No individual assignment depends on any other ... No way
for the transaction to see an inconsistent state of the DB
between the two UPDATEs, because notion of "between
the two UPDATEs" has no meaning ... Now no need for
deferred checking at all!
Note: I’m not saying we don’t need transactions !!!
By the way: SQL already has some multiple assignment!
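The three-step semantics above can be sketched in Python. This is a simplified model under stated assumptions: the database is a dict from relvar names to values, each assignment is a function of the old state, and `multiple_assign` and `same_city` are hypothetical names.

```python
def multiple_assign(db, assignments, constraints):
    """Sketch of multiple assignment: sources first, then assignments,
    then a single integrity check at the statement boundary."""
    # 1. Evaluate every source expression against the OLD state.
    new_values = {name: source(db) for name, source in assignments.items()}
    # 2. Execute the individual assignments on a working copy.
    candidate = dict(db)
    for name, value in new_values.items():
        candidate[name] = value
    # 3. Do integrity checking once, after all assignments.
    for check in constraints:
        if not check(candidate):
            raise ValueError("constraint violated; database left unchanged")
    db.clear()
    db.update(candidate)


# Toy state: city of supplier S1 and city of part P1.
db = {"S": {"S1": "London"}, "P": {"P1": "London"}}


def same_city(d):
    return d["S"]["S1"] == d["P"]["P1"]


# Moving only the supplier is rejected ...
try:
    multiple_assign(db, {"S": lambda d: {**d["S"], "S1": "Paris"}},
                    [same_city])
    single_ok = True
except ValueError:
    single_ok = False

# ... but moving supplier and part in ONE statement succeeds.
multiple_assign(
    db,
    {"S": lambda d: {**d["S"], "S1": "Paris"},
     "P": lambda d: {**d["P"], "P1": "Paris"}},
    [same_city],
)
```

No caller can observe the state "between" the two assignments, because the working copy only replaces the database after the check passes.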
page 253
Recommendation:
Given the state of today’s SQL products, some constraint
checking will probably have to be deferred ...
In which case, you should do whatever it takes—probably
terminate the transaction—to force the check to be done
before performing any operation that might rely on the
constraint being satisfied
page 254
CONSTRAINTS AND PREDICATES :
Relvar predicate for R is "intended interpretation" for R … but
it (and corresp propositions) aren’t and can’t be understood
by the system
• System can't know what it means for a "supplier" to
"be located" somewhere, etc.: that's interpretation
• System can't know a priori whether what the user tells
it is true! It can only check the integrity constraints ...
If OK, system accepts user assertion as true from
this point forward
System can't enforce truth, only consistency !!!
page 255
• Correct implies consistent
Converse not true
• Inconsistent implies incorrect
Converse not true
DB is correct iff it fully reflects the true state of
affairs in the real world ... but the best the system
can do is ensure the DB is consistent (= satisfies all
known integrity constraints)
page 256
Let C1, C2, ..., Cn be all of the DB constraints that
mention base relvar R. Then:
( C1 ) AND ( C2 ) AND ... AND ( Cn ) AND TRUE
is THE (total) relvar constraint for R
Let R1, R2, ..., Rm be all of the base relvars in DB, and
let corresp (total) relvar constraints be RC1, RC2, ...,
RCm, respectively. Then:
( RC1 ) AND ( RC2 ) AND ... AND ( RCm ) AND TRUE
is THE (total) database constraint for DB
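The "AND TRUE" at the end of each conjunction guarantees a well-defined result even when no constraints are stated. Python's `all` behaves the same way on an empty sequence, which makes for a compact sketch; the names `total_constraint`, `cx1`, and `cx2` and the dict-based database encoding are assumptions for illustration.

```python
def total_constraint(constraints):
    """Conjunction ( C1 ) AND ( C2 ) AND ... AND ( Cn ) AND TRUE.
    all() over an empty list is True, mirroring the trailing AND TRUE."""
    return lambda db: all(c(db) for c in constraints)


# CX1 and CX2 from earlier, written as predicates on a database value.
def cx1(db):
    return not any(t["STATUS"] < 1 or t["STATUS"] > 100 for t in db["S"])


def cx2(db):
    return not any(t["CITY"] == "London" and t["STATUS"] != 20
                   for t in db["S"])


relvar_constraint_for_S = total_constraint([cx1, cx2])

good = {"S": [{"SNO": "S1", "STATUS": 20, "CITY": "London"}]}
bad = {"S": [{"SNO": "S1", "STATUS": 30, "CITY": "London"}]}
```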
page 257
The Golden Rule:
• No database is ever allowed to violate its
total DB constraint
/* and therefore: */
• No relvar is ever allowed to violate its
total relvar constraint
Criterion for acceptability of updates ...
Total relvar constraint for R is system’s best
approximation to relvar predicate for R
page 258
CONSTRAINTS ARE VITAL !!!
Recall that a DB can be regarded as a collection of
propositions ... and if that collection is ever allowed
to include any inconsistencies, all bets are off!
Proof:
• Suppose DB implies both p and NOT p are TRUE
(there's the inconsistency)
• Let q be any arbitrary proposition
• From truth of p, infer truth of p OR q
• From truth of p OR q and truth of NOT p, infer
truth of q ... but q was arbitrary !!!
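Each inference step in the proof can be checked mechanically by enumerating truth values. The short Python sketch below does this by brute force; the variable names are illustrative, and `implies` encodes IF a THEN b as (NOT a) OR b.

```python
from itertools import product


def implies(a, b):
    """IF a THEN b, i.e. ( NOT a ) OR b."""
    return (not a) or b


cases = list(product((True, False), repeat=2))

# Step: from p, infer p OR q.
step_or_intro = all(implies(p, p or q) for p, q in cases)

# Step: from p OR q and NOT p, infer q.
step_disjunctive_syllogism = all(
    implies((p or q) and (not p), q) for p, q in cases
)

# Overall: from p AND NOT p, anything (q) follows.
explosion = all(implies(p and (not p), q) for p, q in cases)
```

All three checks come out true for every combination of p and q, so both steps are valid and the overall implication is a tautology.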
page 259
VIRTUAL RELVARS ("VIEWS") :
• A view is a relvar that "looks and feels" just like a base
relvar but doesn’t exist independently of other relvars (it’s
defined in terms of them)
  • Repeat: A view is a relvar! ("CREATE TABLE" vs.
  "CREATE VIEW" was at least a psychological mistake)
• A view is a derived relvar
  • All virtual relvars are derived but some derived ones
  aren’t virtual /* see snapshots, later */
• A view is a window into underlying relvars ... Ops on
view are "really" ops on those underlying relvars
• A view is a "canned query" (i.e., named rel exp)
page 261
VIEWS ARE RELVARS :
A view V is a relvar whose value at time t = result of evaluating
certain rel exp at time t ... View defining expression specified
when V is defined and must mention at least one relvar
VAR LS VIRTUAL
( S WHERE CITY = ‘London’ ) ;
CREATE VIEW LS AS
( SELECT *
FROM S
WHERE CITY = ‘London’ )
WITH CHECK OPTION ;
VAR NLS VIRTUAL
( S WHERE CITY ≠ ‘London’ ) ;
CREATE VIEW NLS AS
( SELECT *
FROM S
WHERE CITY <> ‘London’ )
WITH CHECK OPTION ;
page 262
CREATE VIEW allows parenthesized column name commalist
after view name ... E.g.
CREATE VIEW SDS ( SNAME , DOUBLE_STATUS )
AS ( SELECT DISTINCT SNAME , 2 * STATUS
FROM S ) ;
Recommendation: Don’t do this. Instead:
CREATE VIEW SDS
AS ( SELECT DISTINCT SNAME , 2 * STATUS AS DOUBLE_STATUS
FROM S ) ;
Tell DBMS once not twice that SNAME column is called SNAME!
page 263
THE PRINCIPLE OF INTERCHANGEABILITY :
Instead of S being real and LS and NLS virtual, we could make
LS and NLS real and S virtual—S is the union of restrictions LS
and NLS, and mapping works both ways:
VAR LS BASE RELATION
{ SNO CHAR , SNAME CHAR , STATUS INTEGER , CITY CHAR }
KEY { SNO } ;
VAR NLS BASE RELATION
{ SNO CHAR , SNAME CHAR , STATUS INTEGER , CITY CHAR }
KEY { SNO } ;
VAR S VIRTUAL ( LS D_UNION NLS ) ; /* disjoint union */
/* plus certain constraints on, e.g., CITY */
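The "mapping works both ways" claim can be illustrated with plain sets. In this Python sketch the relations are sets of tuples, and `d_union` and `restrict` are hypothetical helpers; `d_union` rejects operands that share a tuple, which is the sense of "disjoint union" assumed here.

```python
# Supplier tuples: (SNO, SNAME, STATUS, CITY).
S = {
    ("S1", "Smith", 20, "London"),
    ("S2", "Jones", 10, "Paris"),
    ("S3", "Blake", 30, "Paris"),
    ("S4", "Clark", 20, "London"),
    ("S5", "Adams", 30, "Athens"),
}
CITY = 3  # position of the CITY attribute in each tuple


def restrict(r, pred):
    """Restriction: the tuples of r that satisfy pred."""
    return {t for t in r if pred(t)}


def d_union(a, b):
    """Disjoint union: like union, but an error if a and b overlap."""
    if a & b:
        raise ValueError("operands of D_UNION must be disjoint")
    return a | b


# Design 1: S is base, LS and NLS are derived from it.
LS = restrict(S, lambda t: t[CITY] == "London")
NLS = restrict(S, lambda t: t[CITY] != "London")

# Design 2: LS and NLS are base, S is derived from them.
S_virtual = d_union(LS, NLS)
```

Either design can be reconstructed from the other, which is what makes them information equivalent.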
page 264
Designs are information equivalent ... So: Which
relvars are base ones and which virtual is arbitrary
(formally speaking, at least) ... Hence:
The Principle of Interchangeability: There must be no
arbitrary and unnecessary distinctions between base and
virtual relvars ... Virtual relvars should "look and feel" just
like base ones to the user
• Having keys or not
• Integrity in general
• Tuple IDs
• "Entity integrity"
... and we MUST be able to "update views" !!!
page 265
RELATION CONSTANTS
/* digression */ :
View defining exp must mention at least one relvar ...
Otherwise the "variable" isn’t a variable! Consider,
e.g., following SQL view defn:
CREATE VIEW S_CONST ( SNO , SNAME , STATUS , CITY )
AS VALUES ( ‘S1’ , ‘Smith’ , 20 , ‘London’ ) ,
( ‘S2’ , ‘Jones’ , 10 , ‘Paris’ ) ,
( ‘S3’ , ‘Blake’ , 30 , ‘Paris’ ) ,
( ‘S4’ , ‘Clark’ , 20 , ‘London’ ) ,
( ‘S5’ , ‘Adams’ , 30 , ‘Athens’ ) ;
Not updatable! Really a named relation constant
page 266
NAMED CONSTANTS ARE USEFUL :
CONST PERIODIC_TABLE INIT ( RELATION
{ TUPLE { ELEMENT ‘Hydrogen’ , SYMBOL ‘H’ , ATOMICNO 1 } ,
TUPLE { ELEMENT ‘Helium’ , SYMBOL ‘He’ , ATOMICNO 2 } ,
................................................................................................
TUPLE { ELEMENT ‘Uranium’ , SYMBOL ‘U’ , ATOMICNO 92 } } ) ;
Note: TABLE_DUM and TABLE_DEE are system defined
"relcons"
• Can simulate relcons via view mechanism, but there’s a
logical difference between variables and constants ...
• ... also between constants and literals
page 267
VIEWS AND PREDICATES :
A view is a relvar and has a relvar predicate, derived from
preds for underlying relvars ... E.g., view LS:
Supplier SNO is under contract, is named SNAME, has
status STATUS, and is located in city CITY AND city CITY
is London
More colloquially:
Supplier SNO is under contract, is named SNAME, has
status STATUS, and is located in London
But latter obscures fact that CITY is a parameter ... It is a
parameter, but corresp argument is constant (in practice,
would probably project away CITY attribute)
page 268
RETRIEVAL OPERATIONS :
User operates on views as if they were real ... DBMS maps
operations into corresponding operations on base relvars in
terms of which views are (ultimately) defined
Read-only operations are straightforward: e.g.,
SELECT SNO
FROM LS
WHERE STATUS > 10
maps to
SELECT LS.SNO
FROM ( SELECT S.*
FROM S
WHERE S.CITY = ‘London’ ) AS LS
WHERE LS.STATUS > 10
and then (?) to
SELECT S.SNO
FROM S
WHERE S.CITY = ‘London’
AND S.STATUS > 10
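The substitution can be tried out with Python's built-in sqlite3 module. This is a small experiment rather than a statement about any product's internals: it only confirms that the query against the view and the hand-expanded query against the base table return the same rows. The table layout is assumed, and WITH CHECK OPTION is omitted because SQLite does not support it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE S ( SNO TEXT PRIMARY KEY , SNAME TEXT NOT NULL ,
                 STATUS INTEGER NOT NULL , CITY TEXT NOT NULL ) ;
INSERT INTO S VALUES
    ( 'S1' , 'Smith' , 20 , 'London' ) ,
    ( 'S2' , 'Jones' , 10 , 'Paris'  ) ,
    ( 'S3' , 'Blake' , 30 , 'Paris'  ) ,
    ( 'S4' , 'Clark' , 20 , 'London' ) ,
    ( 'S5' , 'Adams' , 30 , 'Athens' ) ;
CREATE VIEW LS AS SELECT * FROM S WHERE CITY = 'London' ;
""")

# Query as the user writes it, against the view.
via_view = sorted(conn.execute(
    "SELECT SNO FROM LS WHERE STATUS > 10"
).fetchall())

# Query after substituting the view definition and simplifying.
expanded = sorted(conn.execute(
    "SELECT S.SNO FROM S WHERE S.CITY = 'London' AND S.STATUS > 10"
).fetchall())
```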
page 269
RETRIEVAL OPERATIONS
(cont.) :
Foregoing substitution procedure works because of closure!
Didn’t always work in early versions of SQL ... E.g.:
CREATE VIEW V AS
( SELECT CITY , SUM ( STATUS ) AS ST
FROM S
GROUP BY CITY ) ;
SELECT CITY
FROM V
WHERE ST > 25
maps to (???)
SELECT S.CITY
FROM S
WHERE SUM ( S.STATUS ) > 25 /* not legal: aggregate in WHERE */
GROUP BY S.CITY
So some products implement some view retrievals by
materialization instead of substitution (!)
page 270
VIEWS AND CONSTRAINTS :
A view is a relvar and has a (total) relvar constraint, derived
from constraints for underlying relvars
E.g., view LS: {SNO} is a key ... AND CITY = ‘London’
Even though derived, nice to be able to declare such view
constraints explicitly ... (a) DBMS might not be able to do the
derivation; (b) documentation (explain semantics); (c) another
reason to come! E.g.:
VAR LS VIRTUAL ( S WHERE CITY = ‘London’ )
KEY { SNO };
page 271
Recommendation: In SQL, include such specifications
as comments. E.g.:
CREATE VIEW LS
AS ( SELECT *
FROM S
WHERE CITY = ‘London’ )
/* UNIQUE ( SNO ) */
WITH CHECK OPTION ;
Note: "View constraints" can always be formulated via
CREATE ASSERTION (if supported!)
Of course, we don’t want "the same" constraint to be
checked twice ...
page 272
A MORE COMPLEX EXAMPLE :
CREATE TABLE FDH
( FLIGHT ... , DESTINATION ... , HOUR ... ,
UNIQUE ( FLIGHT ) ) ;
CREATE TABLE DFGP
( DAY ... , FLIGHT ... , GATE ... , PILOT ... ,
UNIQUE ( DAY , FLIGHT ) ) ;
Constraints:
BTCX1: IF ( f1,n1,h ), ( f2,n2,h ) IN FDH AND
( d,f1,g,p1 ), ( d,f2,g,p2 ) IN DFGP
THEN f1 = f2 AND p1 = p2
BTCX2: IF ( f1,n1,h ), ( f2,n2,h ) IN FDH AND
( d,f1,g1,p ), ( d,f2,g2,p ) IN DFGP
THEN f1 = f2 AND g1 = g2
page 273
CREATE ASSERTION BTCX1 CHECK
( NOT ( EXISTS ( SELECT * FROM FDH AS FX WHERE
EXISTS ( SELECT * FROM FDH AS FY WHERE
EXISTS ( SELECT * FROM DFGP AS DX WHERE
EXISTS ( SELECT * FROM DFGP AS DY WHERE
FY.HOUR = FX.HOUR AND
DX.FLIGHT = FX.FLIGHT AND
DY.FLIGHT = FY.FLIGHT AND
DY.DAY = DX.DAY AND
DY.GATE = DX.GATE AND
( FX.FLIGHT <> FY.FLIGHT OR
DX.PILOT <> DY.PILOT ) ) ) ) ) ) ) ;
BTCX2 is analogous
page 274
BUT :
CREATE VIEW V AS
( FDH NATURAL JOIN DFGP ,
UNIQUE ( DAY , HOUR , GATE ) ,
UNIQUE ( DAY , HOUR , PILOT ) ) ;
/* hypothetical syntax !!! */
Or /* valid syntax */ :
CREATE VIEW V AS FDH NATURAL JOIN DFGP ;
CREATE ASSERTION VCX1 CHECK
( UNIQUE ( SELECT DAY , HOUR , GATE FROM V ) ) ;
CREATE ASSERTION VCX2 CHECK
( UNIQUE ( SELECT DAY , HOUR , PILOT FROM V ) ) ;
/* Could replace "V" by defn */
page 275
UPDATE OPERATIONS :
The Principle of Interchangeability implies that views must
be updatable!
(What? Really? Even views like S JOIN P?)
Well, certain updates on certain base relvars can’t be done,
either! ... Fail on violations of either The Golden Rule or The
Assignment Principle (ignore latter possibility for simplicity)
So to support updates on view V, DBMS needs to know total
relvar constraint VC for V ... i.e., needs to do constraint
inference
Today’s products don’t and are therefore very weak on view
updating
page 276
UPDATE OPERATIONS
(cont.) :
Today’s products typically don’t allow updating views any
more complex than simple restrictions and/or projections of
single underlying base table (and even here there are
problems) ... e.g., DELETE on view LS probably OK ... but
what about INSERT ???
Recommendation: Specify WITH CASCADED CHECK
OPTION on view definitions
whenever possible
Note: SQL’s support for view updating is not only limited and
ad hoc—it’s also extremely hard to understand
From the SQL standard:
page 277
[The] <query expression> QE1 is updatable if and only if for
every <query expression> or <query specification> QE2 that
is simply contained in QE1:
a) QE1 contains QE2 without an intervening <non join
query expression> that specifies UNION DISTINCT,
EXCEPT ALL, or EXCEPT DISTINCT.
b) If QE1 simply contains a <non join query expression>
NJQE that specifies UNION ALL, then:
i) NJQE immediately contains <query expression> LO
and a <query term> RO such that no leaf generally
underlying table of LO is also a leaf generally
underlying table of RO.
(cont.)
page 278
ii) For every column of NJQE, the underlying columns in
the tables identified by LO and RO, respectively, are
either both updatable or not updatable.
c) QE1 contains QE2 without an intervening <non join
query term> that specifies INTERSECT.
d) QE2 is updatable.
page 279
OBSERVE THAT :
• Foregoing is just one of many rules that have to be taken
in combination in order to determine whether a given
SQL view is updatable
• Rules scattered over many different parts of the
document
• Rules rely on many additional concepts and constructs
(e.g., updatable columns, leaf generally underlying tables,
<non join query term>s) defined in still further parts of
the document
page 280
LOOSELY, FOLLOWING SQL VIEWS
ARE UPDATABLE :
1. Restriction and/or projection of single base table
2. One to one or one to many join of two base tables (many
side only, in latter case)
3. UNION ALL or INTERSECT of two distinct base tables
4. Certain combinations of Cases 1-3 above
Even these cases are treated incorrectly, because of (a) lack
of constraint inference; (b) duplicates; (c) nulls
page 281
Picture complicated still further ... A view can be:
• Updatable
• Potentially updatable
• Simply updatable
• Insertable into
Note implication that some views might permit some
updates but not others ... and further implication that
DELETE and INSERT might not be inverses
Recommendation: Lobby the vendors!
page 282
WHAT ARE VIEWS FOR ???
1. User U1 who defines view V is aware of exp X that defines V
... U1 can use name V wherever exp X is intended, but such
uses are really just shorthand
E.g., U1 might have perception
S and SP (for updates)
plus V = S JOIN SP (for retrievals)
but U1 knows these relvars aren’t all independent
2. User U2 who is merely informed that V is available for use
should typically not be aware of exp X ... To U2, V should look
just like a base relvar (logical data independence)
/* have been assuming this case */
page 283
VIEWS AND SNAPSHOTS :
Contrast views and snapshots—also derived, but real
not virtual ... e.g.:
VAR LSS SNAPSHOT ( S WHERE CITY = ‘London’ )
KEY { SNO }
REFRESH EVERY DAY ;
SQL has CREATE TABLE AS ... but no REFRESH
Many applications can tolerate—might even require—data
"as of" some point in time (e.g., end of an accounting period)
page 284
WATCH OUT FOR TERMINOLOGY !
Much current DB literature refers to snapshots as
"materialized views" ... which is a contradiction in terms,
pretty much (whole point about views as far as RM is
concerned is that they’re virtual)
And then typically goes on to abbreviate "materialized view"
to just view (!) ...
So ubiquitously, in fact, that the unqualified term view has
come to mean, almost always, a snapshot instead (at least
in the academic world), and we no longer have a good term
for view in its original sense
Recommendations: Never use the term view, unqualified,
to mean a snapshot; never use the term materialized view;
and watch out for violations of these recommendations!
page 285
SQL AND LOGIC :
Relational calculus:
• Alternative to relational algebra
• Queries, constraints, view definitions, etc. can be stated
in calculus terms as well as algebraic ones
/* sometimes one is easier, sometimes the other */
• Applied form of predicate calculus (aka predicate logic)
• RDB language can be based on either algebra or
calculus ... Tutorial D? SQL?
page 287
LOGIC : PROPOSITIONS
A proposition is a declarative sentence, or statement, that’s
categorically either true or false. Examples:
1. 2 + 3 = 5
2. 2 + 3 > 7
3. Jupiter is a star
4. Mars has two moons
5. Venus is between Earth and Mercury
page 288
POINTS ARISING :
• Don’t fall into the common trap of thinking
propositions are always true ... A false proposition is
still a valid proposition
• Informally, P is a valid proposition if and only if the
following is a valid question: "Is it true that P?"
• Very fine point (which I’m mostly going to ignore):
The proposition isn’t really the declarative sentence
as such; rather, it’s the assertion made by that
sentence ... E.g., "It’s hot" and "Il fait chaud" denote
the same proposition
page 289
SO HOW MANY OF THE FOLLOWING
ARE PROPOSITIONS ???
1. Bach is the greatest musician who ever lived.
2. What’s the time?
3. Supplier S2 is located in some city x.
4. Some countries have a female president.
5. All politicians are corrupt.
6. Supplier S1 is located in London.
7. We both have the same favorite author x.
8. Nothing is heavier than lead.
9. It will rain tomorrow.
10. Supplier S6’s city is unknown.
page 290
LOGIC : CONNECTIVES
Operators for combining propositions to make further
(compound) propositions ... Simple proposition = one with
no connectives ... Truth tables:
(In each dyadic table, rows give the first operand and columns the second.)

NOT |
 t  | f
 f  | t

OR  | t  f
 t  | t  t
 f  | t  f

AND | t  f
 t  | t  f
 f  | f  f

IF  | t  f
 t  | t  f
 f  | t  t

IFF | t  f
 t  | t  f
 f  | f  t

Negation:
E.g., NOT (Jupiter is a star) : TRUE
Disjunction:
E.g., (Mars has two moons) OR (2 + 3 > 7) : TRUE
Conjunction:
E.g., (Mars has two moons) AND (2 + 3 > 7) : FALSE
page 291
Implication (IMPLIES, also written IF ... THEN ...):
E.g.,
IF (Mars has two moons)
THEN (Venus is between Earth and Mercury) : TRUE
/* see later */
Bi-implication (BI-IMPLIES, also written IF AND ONLY IF
or IFF or "≡") : E.g.,
(2 + 3 = 5) IFF (Jupiter is a star)
: FALSE
In practice we use symbols for the connectives (usually)
and adopt precedence rules that allow us to drop parens
page 292
CAVEAT :
Connectives are close but not identical to their natural
language counterparts ... because they’re meant to be
context independent
E.g., p AND q ≡ q AND p
But "and" is not necessarily commutative in natural
language... Contrast:
•
I voted for a change in leadership and I was seriously
disappointed
•
I was seriously disappointed and I voted for a change in
leadership
page 293
A NOTE ON IMPLICATION :
Truth table not symmetric (i.e., op not commutative):
• IF p THEN q is TRUE if p is FALSE and q is TRUE,
but FALSE if p is TRUE and q is FALSE
FALSE implies anything!
• IF p THEN q ≡ ( NOT p ) OR q
Aside: This latter is a tautology ... Evaluates to
TRUE no matter what p and q stand for*
And here’s a contradiction: p AND NOT p
* Tautologies of form a ≡ b are particularly important
page 294
RE "FALSE IMPLIES ANYTHING" :
Consider integrity constraint on suppliers:
If supplier s is located in London, then supplier s
must have status 20
Formally, this is an implication:*
IF s.CITY = ‘London’ THEN s.STATUS = 20
Don’t want the check to fail if the city isn’t London!
* Slightly simplified for sake of the example
Again consider following constraint:
IF s.CITY = ‘London’ THEN s.STATUS = 20
Following is logically equivalent:
IF NOT ( s.STATUS = 20 ) THEN NOT ( s.CITY = ‘London’ )
i.e., IF s.STATUS ≠ 20 THEN s.CITY ≠ ‘London’
Contrapositive of original ... More generally:
IF p THEN q ≡ IF NOT q THEN NOT p
HOW MANY OF THE FOLLOWING PROPOSITIONS
ARE LOGICALLY DISTINCT ???
1. ( P.WEIGHT > 17.0 ) IMPLIES ( P.CITY ≠ ‘Paris’ )
2. ( P.CITY = ‘Paris’ ) IMPLIES ( P.WEIGHT ≤ 17.0 )
3. ( P.WEIGHT ≤ 17.0 ) OR ( P.CITY ≠ ‘Paris’ )
4. NOT ( ( P.CITY = ‘Paris’ ) AND ( P.WEIGHT > 17.0 ) )
HOW MANY OF THE FOLLOWING PROPOSITIONS
ARE LOGICALLY DISTINCT ???
Let x = ( P.WEIGHT > 17.0 ), y = ( P.CITY ≠ ‘Paris’ )
• IF x THEN y
• IF NOT y THEN NOT x
• ( NOT x ) OR y
• NOT ( ( NOT y ) AND x )
Lessons learned:
• Manipulations can be done purely formally!
• Equivalences not always immediately obvious!
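The "purely formal" check can also be done mechanically. The following is a minimal Python sketch (not part of the seminar material): it treats x and y as plain truth values, defines IF ... THEN ... by its truth table (the name IMPLIES is ours), and compares the truth-table columns of the four formulations.

```python
from itertools import product

# Truth table for IF p THEN q, keyed by (p, q).
IMPLIES = {(True, True): True, (True, False): False,
           (False, True): True, (False, False): True}

# The four formulations, over truth values x and y.
forms = {
    "IF x THEN y":         lambda x, y: IMPLIES[(x, y)],
    "IF NOT y THEN NOT x": lambda x, y: IMPLIES[(not y, not x)],
    "(NOT x) OR y":        lambda x, y: (not x) or y,
    "NOT ((NOT y) AND x)": lambda x, y: not ((not y) and x),
}

# One truth-table column per formulation, rows ordered tt, tf, ft, ff.
assignments = list(product((True, False), repeat=2))
columns = {name: tuple(f(x, y) for x, y in assignments)
           for name, f in forms.items()}

# Logically distinct propositions have distinct columns.
distinct = set(columns.values())
print(len(distinct))
```

All four columns coincide, so under these definitions there is only one logically distinct proposition.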
MORE CONNECTIVES :
(rows give p, columns give q)

XOR | t  f      NOR | t  f      NAND | t  f
 t  | f  t       t  | f  f       t   | f  t
 f  | t  f       f  | f  t       f   | t  t

XOR: p or q but not both
NOR: NOT (p OR q) = neither p nor q; Peirce arrow, p ↓ q
NAND: NOT (p AND q) = not both p and q; Sheffer stroke*, p | q
Exactly 4 monadic / 16 dyadic connectives in total
(not all named)
* Slightly unfortunate because "|" is also used for OR
THE 4 MONADICS :
 p | (unnamed)  (unnamed)  NOT  (unnamed)
 t |    t          t        f       f
 f |    t          f        t       f
THE 16 DYADICS :
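Each row is one connective; columns give its value for each
combination (p,q), in the order the tables appear on the slide:

 #   name        (t,t)  (t,f)  (f,t)  (f,f)
 1   (unnamed)     t      t      t      t
 2   OR            t      t      t      f
 3   (unnamed)     t      f      t      f
 4   (unnamed)     t      t      f      t
 5   IFF           t      f      f      t
 6   (unnamed)     t      t      f      f
 7   AND           t      f      f      f
 8   IF            t      f      t      t
 9   NAND          f      t      t      t
10   (unnamed)     f      f      t      t
11   XOR           f      t      t      f
12   (unnamed)     f      f      t      f
13   (unnamed)     f      t      f      t
14   NOR           f      f      f      t
15   (unnamed)     f      t      f      f
16   (unnamed)     f      f      f      f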
COMPLETENESS :
A logical system is truth functionally complete if and only
if all possible connectives can be expressed in terms of the
given ones
The 20 possible connectives are not all primitive
Primitive sets:
{ NOT, OR }
{ NOT, AND }
{ NOR }
{ NAND }
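As an illustration of why { NAND } alone is a primitive set, here is a small Python sketch (the helper names not_, and_, or_ are ours). It builds NOT, AND, and OR from NAND only and checks them against Python's built-in operators; since { NOT, AND } and { NOT, OR } are primitive sets, everything else follows.

```python
from itertools import product

def nand(p, q):
    # NOT (p AND q)
    return not (p and q)

def not_(p):
    # NOT p  ==  p NAND p
    return nand(p, p)

def and_(p, q):
    # p AND q  ==  NOT (p NAND q)
    return nand(nand(p, q), nand(p, q))

def or_(p, q):
    # p OR q  ==  (NOT p) NAND (NOT q)
    return nand(nand(p, p), nand(q, q))

BOOLS = (True, False)
nand_suffices = all(
    not_(p) == (not p)
    and and_(p, q) == (p and q)
    and or_(p, q) == (p or q)
    for p, q in product(BOOLS, repeat=2)
)
print(nand_suffices)
```

The same construction works for NOR by duality.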
TRUTH TABLES REVISITED :
Alternative style (example):

 p  q | p AND q
 t  t |    t
 t  f |    f
 f  t |    f
 f  f |    f

This style can be used to show truth value of arbitrary
logical expression in terms of truth values of components: e.g.,
(NOT q) IMPLIES (NOT p)

 p  q | NOT p  NOT q | (NOT q) IMPLIES (NOT p)
 t  t |   f      f   |           t
 t  f |   f      t   |           f
 f  t |   t      f   |           t
 f  f |   t      t   |           t
EXAMPLES :
• Prove (NOT p) OR q ≡ p IMPLIES q

 p  q | NOT p | (NOT p) OR q | p IMPLIES q
 t  t |   f   |      t       |     t
 t  f |   f   |      f       |     f
 f  t |   t   |      t       |     t
 f  f |   t   |      t       |     t

• Prove (NOT p) AND ( p OR q ) IMPLIES q is a tautology
  (column * is the complete expression)

 p  q | NOT p | p OR q | (NOT p) AND ( p OR q ) | *
 t  t |   f   |   t    |           f            | t
 t  f |   f   |   t    |           f            | t
 f  t |   t   |   t    |           t            | t
 f  f |   t   |   f    |           f            | t
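Proofs of this kind can be automated by enumerating every row of the truth table. A minimal Python sketch follows; truth_table, is_tautology, and implies are illustrative names, not anything from the seminar.

```python
from itertools import product

def truth_table(expr, names):
    """Return [(values, result), ...] for every assignment, rows ordered tt, tf, ft, ff."""
    rows = []
    for values in product((True, False), repeat=len(names)):
        env = dict(zip(names, values))
        rows.append((values, expr(**env)))
    return rows

def is_tautology(expr, names):
    # A tautology evaluates to TRUE on every row.
    return all(result for _, result in truth_table(expr, names))

def implies(p, q):
    # Defined directly from the truth table: FALSE only for (TRUE, FALSE).
    return {(True, False): False}.get((p, q), True)

# (NOT p) OR q  is equivalent to  p IMPLIES q
equivalence_holds = is_tautology(
    lambda p, q: ((not p) or q) == implies(p, q), ("p", "q"))

# (NOT p) AND (p OR q) IMPLIES q  is a tautology
second_example_holds = is_tautology(
    lambda p, q: implies((not p) and (p or q), q), ("p", "q"))

# p AND NOT p  is a contradiction: FALSE on every row
contradiction = not any(r for _, r in truth_table(lambda p: p and not p, ("p",)))

print(equivalence_holds, second_example_holds, contradiction)
```

Enumeration is exponential in the number of variables, which is fine for the two or three variables typical of a constraint or WHERE clause.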
CONNECTIVES REVISITED :
OR and AND are fundamentally dyadic ... but n-adic versions
can be defined (why, exactly?). Let p1, p2, ..., pn (n ≥ 0) be
propositions. Then:
• OR {p1,p2,...,pn} is equivalent to:
FALSE OR (p1) OR (p2) OR ... OR (pn)
Note: If none of the p’s involves any ORs, this prop is in
disjunctive normal form (DNF)
• AND {p1,p2,...,pn} is equivalent to:
TRUE AND (p1) AND (p2) AND ... AND (pn)
Note: If none of the p’s involves any ANDs, this prop
is in conjunctive normal form (CNF)
LOGIC : PREDICATES
A predicate is a truth valued function. Examples:
1. x is a star
2. x has two moons
3. x has m moons
4. x is between Earth and y
5. x is between y and z
Note parameters (or placeholders or free variables) ...
Invoking ("instantiating") predicate involves replacing
parameters by arguments and yields a proposition
(which evaluates to TRUE or FALSE, by definition)
Arguments satisfy predicate iff resulting proposition
evaluates to TRUE ... E.g., the sun satisfies "x is a star,"
the moon doesn’t
Predicate with n parameters is n-place or n-adic (and
if n = 0 the predicate is a proposition)
Connectives apply to predicates as well as propositions
... Simple/compound terminology applies too
Terminology: Predicate logic (aka predicate calculus)
= study of predicates, connectives, and logical inferences
that can be made using such predicates and connectives
LOGIC : INFERENCE
Logic includes rules of inference by which new truths
(theorems) can be inferred from given truths (axioms
and/or previously proved theorems)
Modus Ponens: If p IMPLIES q is true and p is true, we
can infer that q is true ("direct reasoning")
E.g., given the truth of both "If I have no money then I
will have to wash dishes" and "I have no money," we
can infer truth of "I will have to wash dishes"
Modus Tollens: If p IMPLIES q is true and q is false, we
can infer that p is false ("indirect reasoning")
LOGIC : QUANTIFICATION
Another way to get a proposition from a predicate ...
Consider monadic predicate p(x) (parameter shown for
clarity). Then these are propositions:
• EXISTS x ( p ( x ) )   /* existential quantifier ∃, "backward E" */
Meaning: At least one value a exists such that p(a)
evaluates to TRUE
• FORALL x ( p ( x ) )   /* universal quantifier ∀, "upside down A" */
Meaning: All possible values a are such that p(a)
evaluates to TRUE
EXAMPLES :
• EXISTS x ( x is a logician )
TRUE (e.g., take x to be Bertrand Russell)
Single example suffices to show truth
• FORALL x ( x is a logician )
FALSE (e.g., take x to be George W. Bush)
Single counterexample suffices to show falsity
Note: Parameter x must "range over" some set of
permissible values—see later
LET x AND y RANGE OVER PERSONS :
Consider dyadic predicate "x is taller than y"
Quantify over x (using EXISTS, for definiteness):
EXISTS x ( x is taller than y )
Monadic predicate ... Invoke ("instantiate") with
argument Steve:
EXISTS x ( x is taller than Steve )
Proposition: TRUE iff there exists at least one person,
say Arnold, taller than Steve
ALTERNATIVELY :
Quantify over both parameters (using EXISTS, again
for definiteness):
EXISTS x ( EXISTS y ( x is taller than y ) )
Proposition: TRUE iff there are at least two persons
not of the same height
Given an n-adic predicate, quantifying over m parameters
(m ≤ n) yields a k-adic predicate, where k = n - m
EXISTS x ( EXISTS y ( x is taller than y ) ) ≡
EXISTS y ( EXISTS x ( x is taller than y ) )
Similarly for FORALL ... Series of like quantifiers can be
written in any sequence without changing semantics
SIX POSSIBLE "FULL QUANTIFICATIONS"
(and six distinct meanings) :
Assuming at least two distinct persons:
1. EXISTS x EXISTS y ( x is taller than y )
Meaning: Somebody is taller than somebody else; TRUE,
unless everybody is the same height
2. EXISTS x FORALL y ( x is taller than y )
Meaning: Somebody is taller than everybody; FALSE
3. FORALL x EXISTS y ( x is taller than y )
Meaning: Everybody is taller than somebody; FALSE
4. EXISTS y FORALL x ( x is taller than y )
Meaning: Somebody is shorter than everybody; FALSE
/* But need to explain that predicates "x is taller than y" */
/* and "y is shorter than x" are logically equivalent!     */
5. FORALL y EXISTS x ( x is taller than y )
Meaning: Everybody is shorter than somebody; FALSE
6. FORALL x FORALL y ( x is taller than y )
Meaning: Everybody is taller than everybody; FALSE
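The six full quantifications can be evaluated directly over a small finite set. In the Python sketch below, any plays the role of EXISTS and all the role of FORALL; the names and heights are made-up sample data (any set with at least two distinct heights behaves the same way).

```python
# Hypothetical heights in cm, for illustration only.
height = {"Arnold": 188, "Steve": 175, "Maria": 175}
people = list(height)

def taller(x, y):
    # The dyadic predicate "x is taller than y".
    return height[x] > height[y]

q1 = any(any(taller(x, y) for y in people) for x in people)  # EXISTS x EXISTS y
q2 = any(all(taller(x, y) for y in people) for x in people)  # EXISTS x FORALL y
q3 = all(any(taller(x, y) for y in people) for x in people)  # FORALL x EXISTS y
q4 = any(all(taller(x, y) for x in people) for y in people)  # EXISTS y FORALL x
q5 = all(any(taller(x, y) for x in people) for y in people)  # FORALL y EXISTS x
q6 = all(all(taller(x, y) for y in people) for x in people)  # FORALL x FORALL y

results = [q1, q2, q3, q4, q5, q6]
print(results)
```

Only the first evaluates to TRUE: nobody is taller than themselves, so every formulation that involves FORALL fails on at least one pair.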
LOGIC : FREE AND BOUND VARIABLES
Recap: A free variable is just a parameter
Quantifying over a free variable makes it bound
E.g.:
• x is taller than y   /* x, y both free */
• EXISTS x ( x is taller than y )   /* x bound, y free */
• EXISTS x EXISTS y ( x is taller than y )   /* x, y both bound */
So a proposition is a predicate with no free variables!
THE TERMINOLOGY ISN’T VERY GOOD :
Free variables = parameters; but bound variables have no
exact counterpart in conventional programming terms ...
They serve as a kind of dummy, linking the predicate inside
the parens to the quantifier outside. E.g.:
EXISTS x ( x > 3 )
vs.
EXISTS y ( y > 3 )
By contrast, consider:
EXISTS x ( x > 3 ) AND x < 0   /* two different x’s !!! */
EXISTS y ( y > 3 ) AND x < 0
EXISTS y ( y > 3 ) AND y < 0
"Free" and "bound" really apply to variable occurrences in
expressions, not to variables as such ... (sigh)
EXERCISE (Honest Abe) :
"You can fool some of the people some of the time, and
some of the people all the time, but you cannot fool all
the people all of the time."
Is this statement unambiguous? What does it mean?

Analysis: Statement involves three simple predicates (or
propositions?) ANDed together:
you can fool some of the people some of the time
AND
you can fool some of the people all the time
AND /* "but" maps to AND */
you cannot fool all the people all of the time
EXERCISE (cont.) :
Denote "you can fool person x at time y" by fool(x,y)
"You can fool some of the people some of the time":
EXISTS x EXISTS y ( fool (x, y ) )
— easy enough
"You can fool some of the people all the time":
FORALL y EXISTS x ( fool (x, y ) )
— ???
EXISTS x FORALL y ( fool (x, y ) )
— ???
"You cannot fool all the people all of the time":
I’ll leave this one to you!
RELATIONAL CALCULUS :
SNO and STATUS for suppliers in Paris who supply part P2:
( S WHERE CITY = ‘Paris’ ) { SNO , STATUS }
MATCHING ( SP WHERE PNO = ‘P2’ )
Relational calculus:
RANGEVAR SX RANGES OVER S ;
RANGEVAR SPX RANGES OVER SP ;
{ SX.SNO , SX.STATUS } WHERE SX.CITY = ‘Paris’ AND
EXISTS SPX ( SPX.SNO = SX.SNO AND SPX.PNO = ‘P2’ )
Generic form /* of rel calc exp per se */ :
proto tuple WHERE predicate
SQL ANALOG OF EXAMPLE :
SELECT SX.SNO , SX.STATUS
FROM   S AS SX
WHERE  SX.CITY = ‘Paris’
AND    EXISTS
     ( SELECT *
       FROM   SP AS SPX
       WHERE  SPX.SNO = SX.SNO
       AND    SPX.PNO = ‘P2’ )
So SQL does support range variables /* see next page */
SQL also supports EXISTS, but indirectly: EXISTS sq gives
TRUE if table denoted by sq nonempty, FALSE otherwise*
/* sq usually "correlated" */
* Never UNKNOWN !!!
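To try the query, here is a minimal sketch using Python's built-in sqlite3 module. The table contents are sample rows assumed purely for illustration, and the ORDER BY is added only to make the output order deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE S  (SNO TEXT PRIMARY KEY, SNAME TEXT, STATUS INTEGER, CITY TEXT);
    CREATE TABLE SP (SNO TEXT, PNO TEXT, QTY INTEGER, PRIMARY KEY (SNO, PNO));
    -- Assumed sample data
    INSERT INTO S VALUES ('S1','Smith',20,'London'), ('S2','Jones',10,'Paris'),
                         ('S3','Blake',30,'Paris'),  ('S4','Clark',20,'London'),
                         ('S5','Adams',30,'Athens');
    INSERT INTO SP VALUES ('S1','P1',300), ('S1','P2',200), ('S2','P1',300),
                          ('S2','P2',400), ('S3','P2',200), ('S4','P4',300);
""")

# SX and SPX are explicit range variables (correlation names).
rows = conn.execute("""
    SELECT SX.SNO , SX.STATUS
    FROM   S AS SX
    WHERE  SX.CITY = 'Paris'
    AND    EXISTS
         ( SELECT *
           FROM   SP AS SPX
           WHERE  SPX.SNO = SX.SNO
           AND    SPX.PNO = 'P2' )
    ORDER  BY SX.SNO
""").fetchall()
print(rows)
```

With this data the result is the two Paris suppliers, S2 and S3, each of whom has a shipment of P2.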
SQL RANGE VARIABLES CAN BE IMPLICIT :
SELECT S.SNO , S.STATUS
FROM   S                  /* implicit: AS S */
WHERE  S.CITY = ‘Paris’
AND    EXISTS
     ( SELECT *
       FROM   SP          /* implicit: AS SP */
       WHERE  SP.SNO = S.SNO
       AND    SP.PNO = ‘P2’ )
"S." and "SP." do not refer to tables S and SP !!!
—they refer to implicit range variables (implicit
correlation names, in SQL terms)
MORE EXAMPLES :
• SNAMEs for suppliers who supply all parts
/* range variable defns omitted */ :
{ SX.SNAME } WHERE
FORALL PX ( EXISTS SPX ( SPX.SNO = SX.SNO AND
SPX.PNO = PX.PNO ) )
Quantifier order important!
SQL analog ??? /* see later */
• SNAMEs for suppliers who supply all red parts:
{ SX.SNAME } WHERE
FORALL PX ( IF PX.COLOR = ‘Red’ THEN
EXISTS SPX ( SPX.SNO = SX.SNO AND
SPX.PNO = PX.PNO ) )
PRENEX NORMAL FORM :
{ SX.SNAME } WHERE FORALL PX
( EXISTS SPX ( IF PX.COLOR = ‘Red’ THEN
SPX.SNO = SX.SNO AND
SPX.PNO = PX.PNO ) )
A predicate is in prenex normal form (PNF) iff (a) it’s quantifier
free or (b) it’s of the form EXISTS x (p) or FORALL x (p), where p
is in PNF in turn:
Q1 x1 ( Q2 x2 ( ... ( Qn xn ( q ) ) ... ) )
where n ≥ 0, each Qi is either EXISTS or FORALL, and q is
quantifier free
PNF is no more correct than any other form, but often easiest
to write
MORE QUERIES :
• Pairs of SNOs where the suppliers are colocated:
{ SX.SNO AS SA , SY.SNO AS SB } WHERE SX.CITY = SY.CITY
AND SX.SNO < SY.SNO
• SNAMEs for suppliers who don’t supply part P2:
{ SX.SNAME } WHERE NOT EXISTS SPX ( SPX.SNO = SX.SNO AND
SPX.PNO = ‘P2’ )
• For each shipment, shipment details, including total shipment
weight:
{ SPX , PX.WEIGHT * SPX.QTY AS SHIPWT } WHERE PX.PNO = SPX.PNO
• For each part, PNO and total shipment quantity:
{ PX.PNO , SUM ( SPX WHERE SPX.PNO = PX.PNO , QTY ) AS TOTQ }
[ WHERE TRUE ]
• Cities that store more than five red parts:
{ PX.CITY } WHERE COUNT
( PY WHERE PY.CITY = PX.CITY AND PY.COLOR = ‘Red’ ) > 5
CONSTRAINTS :
• STATUS must be in the range 1 to 100 inclusive:
CONSTRAINT CX1
FORALL SX ( SX.STATUS > 0 AND SX.STATUS < 101 ) ;
SQL base table constraint (on base table S):
CONSTRAINT CX1 CHECK ( STATUS > 0 AND STATUS < 101 )
Elides the quantifier (and explicit range variable)
• Suppliers in London must have status 20:
CONSTRAINT CX2 FORALL SX ( IF SX.CITY = ‘London’
THEN SX.STATUS = 20 ) ;
• No two suppliers have same SNO:
CONSTRAINT CX3
FORALL SX ( FORALL SY ( IF SX.SNO = SY.SNO THEN
SX.SNAME = SY.SNAME AND
SX.STATUS = SY.STATUS AND
SX.CITY = SY.CITY ) ) ;
• No supplier with status less than 20 can supply part P6:
CONSTRAINT CX5
FORALL SX ( IF SX.STATUS < 20 THEN
NOT EXISTS SPX ( SPX.SNO = SX.SNO AND
SPX.PNO = ‘P6’ ) ) ;
• Every SNO in SP must appear in S:
CONSTRAINT CX6
FORALL SPX ( EXISTS SX ( SX.SNO = SPX.SNO ) ) ;
/* more on this one later */
• No SNO appears in both LS and NLS:
CONSTRAINT CX7 FORALL LX ( FORALL NX ( LX.SNO ≠ NX.SNO ) ) ;
• There must always be at least one supplier:
CONSTRAINT CX9 EXISTS SX ( TRUE ) ;
MORE ON THE QUANTIFIERS :
1. WE DON’T NEED BOTH
EXISTS x ( x is taller than Steve )
NOT FORALL x ( NOT x is taller than Steve )
Say the same thing! More generally:
EXISTS x ( p ( x ) ) ≡ NOT FORALL x ( NOT p ( x ) )
Likewise:
FORALL x ( p ( x ) ) ≡ NOT EXISTS x ( NOT p ( x ) )
So we don’t need both ... but it’s nice to have both. E.g.:
"GET SUPPLIERS WHO SUPPLY ALL PARTS" :
Compare and contrast:
SX WHERE FORALL PX ( EXISTS SPX
( SX.SNO = SPX.SNO AND SPX.PNO = PX.PNO ) )
vs.
SELECT SX.*
FROM   S AS SX
WHERE  NOT EXISTS
     ( SELECT PX.*
       FROM   P AS PX
       WHERE  NOT EXISTS
            ( SELECT SPX.*
              FROM   SP AS SPX
              WHERE  SX.SNO = SPX.SNO
              AND    SPX.PNO = PX.PNO ) )
MORE ON THE QUANTIFIERS :
2. EMPTY RANGES
EXISTS x ( p ( x ) ) ≡ NOT FORALL x ( NOT p ( x ) )
Suppose there are no x’s; then LHS evaluates to FALSE
So RHS evaluates to FALSE
So FORALL x ( NOT p ( x ) ) evaluates to TRUE
But p was arbitrary ...
So FORALL x ( q ( x ) ) evaluates to TRUE,
regardless of the predicate q(x) !
SOME CONSEQUENCES :
• Business rule or constraint of the form FORALL x (...)
is "automatically" satisfied if there aren’t any x’s.
E.g., "all taxpayers with taxable income > $1 billion must pay
supertax" automatically satisfied if no taxpayer has such a
large taxable income
• Certain queries produce "unexpected" results (if you
don’t know logic). E.g., "get suppliers who supply all
purple parts":
SX WHERE FORALL PX ( IF PX.COLOR = ‘Purple’
THEN EXISTS SPX ( SX.SNO = SPX.SNO AND
SPX.PNO = PX.PNO ) )
This returns all suppliers if there are no purple parts (!)
MORE ON THE QUANTIFIERS :
3. DEFINITIONS
Consider p(x); let x range over {x1,x2,...,xn}. Then:
EXISTS x ( p ( x ) ) ≡
FALSE OR p ( x1 ) OR p ( x2 ) OR ... OR p ( xn )
FORALL x ( p ( x ) ) ≡
TRUE AND p ( x1 ) AND p ( x2 ) AND ... AND p ( xn )
E.g.: let p(x) = x has a moon;
let x range over {Mercury, Venus, Earth, Mars}
But foregoing definitions are valid only because the sets are
all finite! (And even though the quantifiers are thus "just
shorthand," they’re very useful shorthand!)
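The iterated OR / iterated AND definitions translate almost literally into a fold. The Python sketch below (function names exists and forall are ours) starts the fold from FALSE and TRUE respectively, which is exactly what makes the empty-range case come out right.

```python
from functools import reduce

def exists(p, xs):
    # FALSE OR p(x1) OR p(x2) OR ... OR p(xn)
    return reduce(lambda acc, x: acc or p(x), xs, False)

def forall(p, xs):
    # TRUE AND p(x1) AND p(x2) AND ... AND p(xn)
    return reduce(lambda acc, x: acc and p(x), xs, True)

# Number of moons per planet.
moons = {"Mercury": 0, "Venus": 0, "Earth": 1, "Mars": 2}
planets = list(moons)

def has_moon(planet):
    return moons[planet] > 0

some_have_moons = exists(has_moon, planets)   # Earth and Mars do
all_have_moons = forall(has_moon, planets)    # Mercury and Venus do not

# Empty range: EXISTS is FALSE and FORALL is TRUE, whatever the predicate.
empty_exists = exists(has_moon, [])
empty_forall = forall(has_moon, [])

# Duality: EXISTS x (p(x)) is equivalent to NOT FORALL x (NOT p(x)).
duality = exists(has_moon, planets) == (not forall(lambda x: not has_moon(x), planets))
print(some_have_moons, all_have_moons, empty_exists, empty_forall, duality)
```

Python's built-in any and all follow the same convention for empty iterables.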
MORE ON THE QUANTIFIERS :
4. ADDITIONAL KINDS
Possibilities include:
• There exist at least three x’s such that
• A majority of x’s are such that
• An odd number of x’s are such that
and so on ... One important one:
• There exists exactly one x such that ("UNIQUE")
E.g.: UNIQUE x ( x has social security number y )
Meaning: Exactly one person has social security number y
CONSTRAINT CX6 REVISITED :
• Every shipment must have a supplier:
CONSTRAINT CX6
FORALL SPX ( EXISTS SX ( SX.SNO = SPX.SNO ) ) ;
Better:
CONSTRAINT CX6
FORALL SPX ( UNIQUE SX ( SX.SNO = SPX.SNO ) ) ;
• SQL has very indirect support:
UNIQUE sq where sq is (SELECT * FROM T WHERE bx)
gives TRUE if at most one row in T satisfies bx, else FALSE
So CX6 becomes:
CREATE ASSERTION CX6 CHECK
( NOT EXISTS ( SELECT *
               FROM   SP AS SPX
               WHERE  NOT EXISTS
                    ( SELECT *
                      FROM   S AS SX
                      WHERE  SX.SNO = SPX.SNO )
               OR     NOT UNIQUE
                    ( SELECT *
                      FROM   S AS SX
                      WHERE  SX.SNO = SPX.SNO ) ) ) ;
/* but "OR ... (...)" could be dropped */
/* because (SNO) is key for S          */
SOME EQUIVALENCES :
If IS_EMPTY supported, quantifiers need not be:
EXISTS x ( p ) ≡ NOT ( IS_EMPTY ( X WHERE p ) )
FORALL x ( p ) ≡ IS_EMPTY ( X WHERE NOT ( p ) )
/* x ranges over X */
These equivalences explain SQL’s EXISTS (which is really an
operator, not a quantifier, in SQL) ... and SQL’s lack of support
for FORALL
EXISTS x ( p ) ≡ COUNT ( X WHERE p ) > 0
FORALL x ( p ) ≡ COUNT ( X WHERE p ) = COUNT ( X )
UNIQUE x ( p ) ≡ COUNT ( X WHERE p ) = 1
Recommendation: Don’t use COUNT in preference to EXISTS
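The COUNT equivalences can be checked against the quantifiers themselves on a few small ranges, including the empty one. A Python sketch, with helper names and sample values of our own choosing:

```python
def count(xs, p=lambda x: True):
    # COUNT ( X WHERE p ); with no predicate, COUNT ( X )
    return sum(1 for x in xs if p(x))

def exists_by_count(xs, p):
    return count(xs, p) > 0

def forall_by_count(xs, p):
    return count(xs, p) == count(xs)

def unique_by_count(xs, p):
    return count(xs, p) == 1

statuses = [10, 20, 30]   # hypothetical range X
predicates = [lambda x: x > 15, lambda x: x > 25, lambda x: x > 0]

# Compare with any/all (EXISTS/FORALL) on a nonempty and an empty range.
checks = []
for xs in (statuses, []):
    for p in predicates:
        checks.append(exists_by_count(xs, p) == any(p(x) for x in xs))
        checks.append(forall_by_count(xs, p) == all(p(x) for x in xs))

exactly_one_above_25 = unique_by_count(statuses, lambda x: x > 25)
exactly_one_above_15 = unique_by_count(statuses, lambda x: x > 15)
print(all(checks), exactly_one_above_25, exactly_one_above_15)
```

The equivalences hold, but the counting forms do more work than necessary and read less directly, which is the point of the recommendation above.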
RELATIONAL COMPLETENESS :
For every expression of the rel algebra, there exists an
expression of the rel calculus that’s logically equivalent
(i.e., has same semantics) ...
So rel calculus is at least as “powerful” (better: expressive)
as rel algebra
Not obvious (?), but converse is true too
Both are relationally complete
/* basic measure of expressive power of lang */
What about SQL ???
TO SUM UP :
DB professionals in general and SQL practitioners in
particular should have at least a basic understanding of
logic or relational calculus (it comes to the same thing) !!!
Here’s a quote:
Surely it’s worth investing a little effort up front in becoming
familiar with [basic logic] in order to avoid the problems
associated with ambiguous business rules. Ambiguity in
business rules leads to implementation delays at best or
implementation errors at worst (possibly both). And such
delays and errors certainly have costs associated with them,
costs that are likely to outweigh those initial learning costs
many times over. In other words, framing business rules
properly is a serious matter, and it requires a certain level of
technical competence.
These remarks are set in the context of business rules
specifically, but they’re of wider applicability—as we’ll see
Yes, I know the counterarguments ... but I don’t agree
with them
Reviewer: "Counterarguments to what? Surely not to the
assertion that it would be better if the rule
designer were trained in logic? If so, I’d like to
be told them, and perhaps some others would
feel the same."
Yes, that’s what I meant ... Claim is:
Logic is simply too difficult for most people to deal with
Might be true in general (big subject!) ... but don’t need to
understand the whole of logic for the purpose at hand ...
and the benefits are so huge!
Small effort up front pays for itself many times over in
avoiding errors in rules, and constraints, and queries,
and on and on
A FINAL REMARK :
Logic is very solid !!!
Began with the ancient Greeks: Aristotle 384-322 BCE
Leibniz 1646-1716: Laid foundations of modern logic
Boole 1815-1864: Laws of Thought (1854)
Frege 1848-1925: Quantifiers (1879)
Wittgenstein 1889-1951: Truth tables (1922)
Etc., etc., etc.
STRUCTURE OF PRESENTATION :
1. Setting the scene
2. Types and domains
3. Tuples and relations, rows and tables
4. No duplicates, no nulls
5. Base relvars, base tables
6. SQL and algebra I: The original operators
7. SQL and algebra II: Additional operators
8. SQL and constraints
9. SQL and views
10. SQL and logic I: Relational calculus
11. SQL and logic II: Using logic to write SQL
12. Further SQL topics
13. Appendix: The relational model
14. Appendix: DB design
HOW TO WRITE CORRECT SQL AND
KNOW IT :
SQL is complicated and difficult—much more so than SQL
advocates would have you believe ... In fact, it’s
unteachable !!! (so my title might be an overclaim)
So to have a hope of writing correct SQL, you must follow
some discipline
Logic is a HUGE help!
• Formulate query (or ...) in logic or rel calc
• Map that formulation systematically to SQL
In other words, expression transformation once again
SOME IMPORTANT TRANSFORMATION LAWS :
Law of the form exp1 ≡ exp2 implies that if some exp
contains an occurrence of exp1, it can be rewritten as an
exp containing an occurrence of exp2 without changing the
meaning /* crucial point */ ... E.g.
SELECT SNO
FROM   S
WHERE  ( STATUS > 10 AND CITY = ‘London’ )
OR     ( STATUS > 10 AND CITY = ‘Athens’ )
Boolean exp here clearly equivalent to:
STATUS > 10 AND ( CITY = ‘London’ OR CITY = ‘Athens’ )
Thanks to distributivity (of AND over OR)

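The distributive laws:
p AND ( q OR r ) ≡ ( p AND q ) OR ( p AND r )
p OR ( q AND r ) ≡ ( p OR q ) AND ( p OR r )
Here and elsewhere p, q, r denote arb boolean exps
Truth table for the first law (columns numbered 1 to 6;
"2 OR 3" and "p AND 5" refer to those column numbers):

   1         2         3         4        5         6
 p q r | p AND q | p AND r | 2 OR 3 | q OR r | p AND 5
 T T T |    T    |    T    |   T    |   T    |    T
 T T F |    T    |    F    |   T    |   T    |    T
 T F T |    F    |    T    |   T    |   T    |    T
 T F F |    F    |    F    |   F    |   F    |    F
 F T T |    F    |    F    |   F    |   T    |    F
 F T F |    F    |    F    |   F    |   T    |    F
 F F T |    F    |    F    |   F    |   T    |    F
 F F F |    F    |    F    |   F    |   F    |    F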

• The implication law:
IF p THEN q ≡ ( NOT p ) OR q
• The double negation law:
NOT ( NOT p ) ≡ p
• De Morgan’s laws:
NOT ( p AND q ) ≡ ( NOT p ) OR ( NOT q )
NOT ( p OR q ) ≡ ( NOT p ) AND ( NOT q )
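Each of these laws is a tautology of the form a ≡ b, so each can be confirmed by brute force over all assignments. A minimal Python sketch (the dictionary keys are just labels; IF ... THEN ... is taken from its truth table):

```python
from itertools import product

# Truth table for IF p THEN q, keyed by (p, q).
IMPLIES = {(True, True): True, (True, False): False,
           (False, True): True, (False, False): True}

# Each law is written as "left side == right side" over p, q, r.
laws = {
    "distributive (AND over OR)":
        lambda p, q, r: (p and (q or r)) == ((p and q) or (p and r)),
    "distributive (OR over AND)":
        lambda p, q, r: (p or (q and r)) == ((p or q) and (p or r)),
    "implication":
        lambda p, q, r: IMPLIES[(p, q)] == ((not p) or q),
    "double negation":
        lambda p, q, r: (not (not p)) == p,
    "De Morgan (AND)":
        lambda p, q, r: (not (p and q)) == ((not p) or (not q)),
    "De Morgan (OR)":
        lambda p, q, r: (not (p or q)) == ((not p) and (not q)),
}

# A law holds if both sides agree on all eight assignments.
holds = {
    name: all(law(p, q, r) for p, q, r in product((True, False), repeat=3))
    for name, law in laws.items()
}
print(holds)
```

Note that this check is for two-valued logic only; it says nothing about what happens once nulls and SQL's three-valued logic enter the picture.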

• The quantification law:
FORALL x ( p ( x ) ) ≡ NOT EXISTS x ( NOT p ( x ) )
/* repeated application of De Morgan */
• De Morgan’s "first" law revisited:
NOT ( p AND q ) ≡ ( NOT p ) OR ( NOT q )
Often applied to result of prior application of implication
law ... So restate, replacing q by NOT q:
NOT ( p AND NOT q ) ≡ ( NOT p ) OR q
EXAMPLE 1:
LOGICAL IMPLICATION
All red parts must be stored in London ... i.e.:
IF COLOR = ‘Red’ THEN CITY = ‘London’ /* for given part */
Apply implication law /* add parens for clarity */ :
( NOT ( COLOR = ‘Red’ ) ) OR CITY = ‘London’
Map to base table constraint (SQL):
CONSTRAINT BTCX1 CHECK
( NOT ( COLOR = ‘Red’ ) OR CITY = ‘London’ )
Simplify /* i.e., more transformations! */ :
CONSTRAINT BTCX1 CHECK ( COLOR <> ‘Red’ OR CITY = ‘London’ )
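A quick way to see the transformed constraint at work is to declare it in SQLite through Python's sqlite3 module. This is a sketch under assumptions: the cut-down table P and the rows are hypothetical, and the columns are declared NOT NULL so that the CHECK never evaluates to UNKNOWN.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE P (
        PNO   TEXT PRIMARY KEY,
        COLOR TEXT NOT NULL,
        CITY  TEXT NOT NULL,
        -- IF COLOR = 'Red' THEN CITY = 'London', after the implication law
        CONSTRAINT BTCX1 CHECK ( COLOR <> 'Red' OR CITY = 'London' )
    )
""")

conn.execute("INSERT INTO P VALUES ('P1', 'Red',   'London')")  # satisfies BTCX1
conn.execute("INSERT INTO P VALUES ('P2', 'Green', 'Paris')")   # not red: check passes

try:
    # Hypothetical row that violates the rule: a red part outside London.
    conn.execute("INSERT INTO P VALUES ('P9', 'Red', 'Paris')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

accepted = [row[0] for row in conn.execute("SELECT PNO FROM P ORDER BY PNO")]
print(rejected, accepted)
```

The green part in Paris is accepted because FALSE implies anything: the constraint only bites when the part is red.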
EXAMPLE 2:
UNIVERSAL QUANTIFICATION
FORALL PX ( IF PX.COLOR = ‘Red’ THEN PX.CITY = ‘London’ )
Apply quantification law:
NOT EXISTS PX ( NOT ( IF PX.COLOR = ‘Red’
THEN PX.CITY = ‘London’ ) )
/* henceforth add/drop parens freely */
Implication law:
NOT EXISTS PX ( NOT ( NOT ( PX.COLOR = ‘Red’ )
OR PX.CITY = ‘London’ ) )
Could now map to SQL, but let’s tidy it up first:
De Morgan:
NOT EXISTS PX ( NOT ( NOT ( ( PX.COLOR = ‘Red’ )
AND NOT ( PX.CITY = ‘London’ ) ) ) )
Double negation (and drop some parens):
NOT EXISTS PX ( PX.COLOR = ‘Red’
AND NOT ( PX.CITY = ‘London’ ) )
One more obvious transformation:
NOT EXISTS PX ( PX.COLOR = ‘Red’ AND PX.CITY ≠ ‘London’ )
TRANSFORM FINAL EXP TO SQL :
• NOT maps to NOT
• EXISTS PX ( bx ) maps to EXISTS ( SELECT *
                                   FROM   P AS PX
                                   WHERE  ( sbx ) )
/* sbx is SQL analog of bx */
• Parens around sbx can be dropped
• Wrap up entire exp inside CREATE ASSERTION
CREATE ASSERTION ... CHECK
( NOT EXISTS ( SELECT *
               FROM   P AS PX
               WHERE  PX.COLOR = ‘Red’
               AND    PX.CITY <> ‘London’ ) ) ;
EXAMPLE 3:
IMPLIES AND FORALL
PNAMEs for parts whose weight is different from that of every
part in Paris:
{ PX.PNAME } WHERE FORALL PY ( IF PY.CITY = ‘Paris’
THEN PY.WEIGHT ≠ PX.WEIGHT )
Quantification law:
{ PX.PNAME } WHERE NOT EXISTS PY ( NOT ( IF PY.CITY = ‘Paris’
THEN PY.WEIGHT ≠ PX.WEIGHT ) )
Implication law:
{ PX.PNAME } WHERE
NOT EXISTS PY ( NOT ( NOT ( PY.CITY = ‘Paris’ )
OR ( PY.WEIGHT ≠ PX.WEIGHT ) ) )
De Morgan:
{ PX.PNAME } WHERE
NOT EXISTS PY ( NOT ( NOT ( ( PY.CITY = ‘Paris’ )
AND NOT ( PY.WEIGHT ≠ PX.WEIGHT ) ) ) )
Tidy up:
{ PX.PNAME } WHERE NOT EXISTS PY ( PY.CITY = ‘Paris’ AND
PY.WEIGHT = PX.WEIGHT )
Map to SQL:
SELECT DISTINCT PX.PNAME /* DISTINCT needed here! */
FROM   P AS PX
WHERE  NOT EXISTS
     ( SELECT *
       FROM   P AS PY
       WHERE  PY.CITY = ‘Paris’
       AND    PY.WEIGHT = PX.WEIGHT )
But ... suppose there’s at least one part in Paris, but such
parts all have a null weight
Original query now can’t be answered ... Any definite result
is a lie!
But foregoing SQL exp will return all PNAMEs in table P
WHAT’S MORE :
SELECT DISTINCT PX.PNAME
FROM   P AS PX
WHERE  PX.WEIGHT NOT IN ( SELECT PY.WEIGHT
                          FROM   P AS PY
                          WHERE  PY.CITY = ‘Paris’ )
Looks equivalent ...
Is equivalent in 2VL ...
But gives different but equally incorrect result: viz.,
empty table! (under same conditions as before)
Moral ???
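Both behaviors can be reproduced with a few rows. The sketch below uses Python's sqlite3 module and made-up data in which every Paris part has a null weight; ORDER BY is added only so the output order is predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE P (PNO TEXT PRIMARY KEY, PNAME TEXT, WEIGHT REAL, CITY TEXT);
    -- Hypothetical data: the only Paris parts have unknown (null) weight
    INSERT INTO P VALUES ('P1','Nut', 12.0,'London'), ('P2','Bolt',NULL,'Paris'),
                         ('P5','Cam', NULL,'Paris'),  ('P6','Cog', 19.0,'London');
""")

# NOT EXISTS formulation: the inner comparison is never TRUE, so the
# subquery is always empty and every part qualifies.
not_exists = [r[0] for r in conn.execute("""
    SELECT DISTINCT PX.PNAME
    FROM   P AS PX
    WHERE  NOT EXISTS
         ( SELECT *
           FROM   P AS PY
           WHERE  PY.CITY = 'Paris'
           AND    PY.WEIGHT = PX.WEIGHT )
    ORDER  BY PX.PNAME
""")]

# NOT IN formulation: comparing against nulls yields UNKNOWN, so no
# part qualifies.
not_in = [r[0] for r in conn.execute("""
    SELECT DISTINCT PX.PNAME
    FROM   P AS PX
    WHERE  PX.WEIGHT NOT IN ( SELECT PY.WEIGHT
                              FROM   P AS PY
                              WHERE  PY.CITY = 'Paris' )
    ORDER  BY PX.PNAME
""")]
print(not_exists, not_in)
```

Two formulations that are interchangeable in two-valued logic give two different answers here, and neither answer is justified by the data.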
EXAMPLE 4:
CORRELATED SUBQUERIES
Names of suppliers who supply both part P1 and part P2:
{ SX.SNAME } WHERE
EXISTS SPX ( SPX.SNO = SX.SNO AND SPX.PNO = ‘P1’ ) AND
EXISTS SPX ( SPX.SNO = SX.SNO AND SPX.PNO = ‘P2’ )
SELECT DISTINCT SX.SNAME
FROM   S AS SX
WHERE  EXISTS ( SELECT *
                FROM   SP AS SPX
                WHERE  SPX.SNO = SX.SNO
                AND    SPX.PNO = ‘P1’ )
AND    EXISTS ( SELECT *
                FROM   SP AS SPX
                WHERE  SPX.SNO = SX.SNO
                AND    SPX.PNO = ‘P2’ )
Correlated subqueries often contraindicated from a performance
point of view,* because (conceptually, at least) they have to be
evaluated once for each row in the outer table, instead of just
once and for all
So eliminate them? ... Easy (for subqueries in EXISTS):
SELECT DISTINCT SX.SNAME
FROM   S AS SX
WHERE  SX.SNO IN ( SELECT SPX.SNO
                   FROM   SP AS SPX
                   WHERE  SPX.PNO = ‘P1’ )
AND    SX.SNO IN ( SELECT SPX.SNO
                   FROM   SP AS SPX
                   WHERE  SPX.PNO = ‘P2’ )
* Mirabile dictu ...
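The two formulations can be compared side by side. A minimal sketch with Python's sqlite3 module follows; the rows are sample data assumed for illustration (no nulls), and ORDER BY is added only for a stable output order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE S  (SNO TEXT PRIMARY KEY, SNAME TEXT);
    CREATE TABLE SP (SNO TEXT, PNO TEXT, PRIMARY KEY (SNO, PNO));
    -- Assumed sample data
    INSERT INTO S  VALUES ('S1','Smith'), ('S2','Jones'), ('S3','Blake'), ('S4','Clark');
    INSERT INTO SP VALUES ('S1','P1'), ('S1','P2'), ('S2','P1'),
                          ('S2','P2'), ('S3','P2'), ('S4','P4');
""")

# Correlated formulation: the subqueries refer to the outer range variable SX.
correlated = [r[0] for r in conn.execute("""
    SELECT DISTINCT SX.SNAME
    FROM   S AS SX
    WHERE  EXISTS ( SELECT * FROM SP AS SPX
                    WHERE  SPX.SNO = SX.SNO AND SPX.PNO = 'P1' )
    AND    EXISTS ( SELECT * FROM SP AS SPX
                    WHERE  SPX.SNO = SX.SNO AND SPX.PNO = 'P2' )
    ORDER  BY SX.SNAME
""")]

# Uncorrelated formulation: each subquery can be evaluated once.
uncorrelated = [r[0] for r in conn.execute("""
    SELECT DISTINCT SX.SNAME
    FROM   S AS SX
    WHERE  SX.SNO IN ( SELECT SPX.SNO FROM SP AS SPX WHERE SPX.PNO = 'P1' )
    AND    SX.SNO IN ( SELECT SPX.SNO FROM SP AS SPX WHERE SPX.PNO = 'P2' )
    ORDER  BY SX.SNAME
""")]
print(correlated, uncorrelated)
```

Whether the uncorrelated form actually runs faster depends on the optimizer; the sketch only shows that the two return the same rows on null-free data.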
SELECT sic /* "select item commalist" */
FROM   T1
WHERE  [ NOT ] EXISTS ( SELECT *
                        FROM   T2
                        WHERE  T2.C = T1.C
                        AND    bx )
Maps to:
SELECT sic
FROM   T1
WHERE  T1.C [ NOT ] IN ( SELECT T2.C
                         FROM   T2
                         WHERE  bx )
But what if there are nulls?
EXAMPLE 5:
NAMING SUBEXPRESSIONS
Get supplier details for suppliers who supply all purple parts
{ SX } WHERE FORALL PX ( IF PX.COLOR = ‘Purple’ THEN
EXISTS SPX ( SPX.SNO = SX.SNO AND SPX.PNO = PX.PNO ) )
Implication law:
{ SX } WHERE FORALL PX ( NOT ( PX.COLOR = ‘Purple’ ) OR
EXISTS SPX ( SPX.SNO = SX.SNO AND SPX.PNO = PX.PNO ) )
De Morgan:
{ SX } WHERE FORALL PX ( NOT ( ( PX.COLOR = ‘Purple’ ) AND
NOT EXISTS SPX
( SPX.SNO = SX.SNO AND SPX.PNO = PX.PNO ) ) )
Quantification law:
{ SX } WHERE NOT EXISTS PX
( NOT ( NOT ( ( PX.COLOR = ‘Purple’ ) AND
NOT EXISTS SPX
( SPX.SNO = SX.SNO AND SPX.PNO = PX.PNO ) ) ) )
Double negation:
{ SX } WHERE NOT EXISTS PX ( ( PX.COLOR = ‘Purple’ ) AND
NOT EXISTS SPX
( SPX.SNO = SX.SNO AND SPX.PNO = PX.PNO ) )
Drop some parens and map to SQL:
SELECT *
FROM   S AS SX
WHERE  NOT EXISTS
     ( SELECT *
       FROM   P AS PX
       WHERE  PX.COLOR = ‘Purple’
       AND    NOT EXISTS
            ( SELECT *
              FROM   SP AS SPX
              WHERE  SPX.SNO = SX.SNO
              AND    SPX.PNO = PX.PNO ) )
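Running this query shows the empty-range effect directly. The sketch below uses Python's sqlite3 module with cut-down tables and assumed sample rows (there are red parts but no purple ones); the color is passed as a parameter, and SNO is selected instead of * to keep the output short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE S  (SNO TEXT PRIMARY KEY);
    CREATE TABLE P  (PNO TEXT PRIMARY KEY, COLOR TEXT);
    CREATE TABLE SP (SNO TEXT, PNO TEXT, PRIMARY KEY (SNO, PNO));
    -- Assumed sample data: no purple parts at all
    INSERT INTO S  VALUES ('S1'), ('S2'), ('S3'), ('S4'), ('S5');
    INSERT INTO P  VALUES ('P1','Red'), ('P2','Green'), ('P4','Red');
    INSERT INTO SP VALUES ('S1','P1'), ('S1','P2'), ('S1','P4'),
                          ('S2','P1'), ('S2','P2'), ('S4','P4');
""")

QUERY = """
    SELECT SX.SNO
    FROM   S AS SX
    WHERE  NOT EXISTS
         ( SELECT *
           FROM   P AS PX
           WHERE  PX.COLOR = ?
           AND    NOT EXISTS
                ( SELECT *
                  FROM   SP AS SPX
                  WHERE  SPX.SNO = SX.SNO
                  AND    SPX.PNO = PX.PNO ) )
    ORDER  BY SX.SNO
"""

def suppliers_of_all(color):
    # Suppliers for whom no part of the given color is left unsupplied.
    return [row[0] for row in conn.execute(QUERY, (color,))]

red = suppliers_of_all("Red")
purple = suppliers_of_all("Purple")
print(red, purple)
```

Only S1 supplies every red part, but every supplier, including those with no shipments, "supplies all purple parts" because there are none.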
A BETTER APPROACH :
Introduce names for subexpressions:
exp1 : PX.COLOR = ‘Purple’
exp2 : EXISTS SPX ( SPX.SNO = SX.SNO AND SPX.PNO = PX.PNO )
/* both map fairly directly to SQL */
Original rel calc formulation:
{ SX } WHERE FORALL PX ( IF exp1 THEN exp2 )
Can see the forest as well as the trees! ... and can apply usual
transformations—but in a different sequence, because we
now have better grasp of the big picture
Quantification law:
{ SX } WHERE NOT EXISTS PX ( NOT ( IF exp1 THEN exp2 ) )
Implication law:
{ SX } WHERE NOT EXISTS PX ( NOT ( NOT ( exp1 ) OR ( exp2 ) ) )
De Morgan:
{ SX } WHERE NOT EXISTS PX ( NOT ( NOT ( exp1
AND NOT exp2 ) ) )
Double negation:
{ SX } WHERE NOT EXISTS PX ( exp1 AND NOT ( exp2 ) )
Can now expand exp1 and exp2 and map to SQL
EXAMPLE 6:
NAMING SUBEXPRESSIONS bis
Get suppliers such that every part they supply is in the same
city as that supplier
{ SX } WHERE FORALL PX
( IF EXISTS SPX ( SPX.SNO = SX.SNO AND SPX.PNO = PX.PNO )
THEN PX.CITY = SX.CITY )
{ SX } WHERE FORALL PX ( IF exp1 THEN exp2 )
{ SX } WHERE NOT EXISTS PX ( NOT ( IF exp1 THEN exp2 ) )
{ SX } WHERE NOT EXISTS PX ( NOT ( NOT ( exp1 ) OR exp2 ) )
{ SX } WHERE NOT EXISTS PX ( NOT ( NOT ( exp1 AND
NOT ( exp2 ) ) ) )
{ SX } WHERE NOT EXISTS PX ( exp1 AND NOT ( exp2 ) )
Expand exp1 and exp2 and map to SQL:
SELECT *
FROM   S AS SX
WHERE  NOT EXISTS
     ( SELECT *
       FROM   P AS PX
       WHERE  EXISTS
            ( SELECT *
              FROM   SP AS SPX
              WHERE  SPX.SNO = SX.SNO
              AND    SPX.PNO = PX.PNO )
       AND    PX.CITY <> SX.CITY )
EXAMPLE 7:
DEALING WITH AMBIGUITY
Get suppliers such that every part they supply is in the
same city
Possible interpretations include:
• Get suppliers SX such that for all parts PX and PY, if SX
supplies both of them, then PX.CITY = PY.CITY
• Get suppliers SX such that for all parts PX and PY, if SX
supplies both of them and they’re distinct, then PX.CITY =
PY.CITY
Assume first interpretation ...
{ SX } WHERE FORALL PX ( FORALL PY
( IF EXISTS SPX ( SPX.SNO = SX.SNO AND SPX.PNO = PX.PNO )
AND EXISTS SPY ( SPY.SNO = SX.SNO AND SPY.PNO = PY.PNO )
THEN PX.CITY = PY.CITY ) )
{ SX } WHERE FORALL PX ( FORALL PY
( IF exp1 AND exp2 THEN exp3 ) )
{ SX } WHERE NOT EXISTS PX ( NOT FORALL PY
( IF exp1 AND exp2 THEN exp3 ) )
{ SX } WHERE NOT EXISTS PX ( NOT ( NOT EXISTS PY ( NOT
( IF exp1 AND exp2 THEN exp3 ) ) ) )
{ SX } WHERE NOT EXISTS PX ( EXISTS PY ( NOT
( IF exp1 AND exp2 THEN exp3 ) ) )
{ SX } WHERE NOT EXISTS PX ( EXISTS PY ( NOT
( NOT ( exp1 AND exp2 ) OR exp3 ) ) )
{ SX } WHERE NOT EXISTS PX ( EXISTS PY ( NOT
( NOT ( exp1 ) OR NOT ( exp2 ) OR ( exp3 ) ) ) )
{ SX } WHERE NOT EXISTS PX ( EXISTS PY (
( exp1 AND exp2 AND NOT ( exp3 ) ) ) )
SELECT *
FROM   S AS SX
WHERE  NOT EXISTS
     ( SELECT *
       FROM   P AS PX
       WHERE  EXISTS
            ( SELECT *
              FROM   P AS PY
              WHERE  EXISTS
                   ( SELECT *
                     FROM   SP AS SPX
                     WHERE  SPX.SNO = SX.SNO
                     AND    SPX.PNO = PX.PNO )
              AND    EXISTS
                   ( SELECT *
                     FROM   SP AS SPY
                     WHERE  SPY.SNO = SX.SNO
                     AND    SPY.PNO = PY.PNO )
              AND    PX.CITY <> PY.CITY ) )
EXAMPLE 8:
USING COUNT
Get suppliers such that every part they supply is in the same
city /* same as Example 7 */ ... Or:
• Get suppliers SX such that the number of cities for parts
supplied by SX is less than or equal to one
{ SX } WHERE COUNT ( PX.CITY WHERE EXISTS SPX
( SPX.SNO = SX.SNO AND SPX.PNO = PX.PNO ) ) ≤ 1
SELECT *
FROM   S AS SX
WHERE  ( SELECT COUNT ( DISTINCT PX.CITY )
         FROM   P AS PX
         WHERE  EXISTS ( SELECT *
                         FROM   SP AS SPX
                         WHERE  SPX.SNO = SX.SNO
                         AND    SPX.PNO = PX.PNO ) ) <= 1




• Reminder: Don’t use COUNT when EXISTS is what you
mean
• Is that DISTINCT in the COUNT invocation necessary?
• Can you formulate the query in terms of GROUP BY and
HAVING?
• If so, what are the logical steps involved in constructing
that formulation?
EXAMPLE 11*:
ALL OR ANY COMPARISON
• E.g., P.WEIGHT >ALL ( SELECT ... )
General form: rx theta sq
  rx    : row expression (usually scalar in practice: coercion)
  theta : =, <, (etc.) followed by ALL or ANY
  sq    : subquery, denoting table t
• ALL : TRUE iff comparison without ALL returns TRUE for all
rows in t (hence, TRUE if t empty)
• ANY : TRUE iff comparison without ANY returns TRUE for at
least one row in t (hence, FALSE if t empty)
* For Examples 9 and 10, see the book
PNAMEs for parts with weight > that of every blue part:
SELECT DISTINCT PX.PNAME
FROM   P AS PX
WHERE  PX.WEIGHT >ALL ( SELECT PY.WEIGHT
                        FROM   P AS PY
                        WHERE  PY.COLOR = 'Blue' )
Recommendation: Don’t use ALL or ANY comparisons!
• Error prone (e.g., replace "every" by "any" in example?)
• Redundant ... e.g., consider:
SELECT DISTINCT SNAME
FROM   S
WHERE  CITY <>ANY ( SELECT CITY FROM P )
SNAMEs for suppliers whose city isn’t equal to any part city?
Wrong! Actually equivalent* to:
SELECT DISTINCT SNAME
FROM S
WHERE EXISTS ( SELECT *
FROM P
WHERE P.CITY <> S.CITY )
ALL or ANY comparisons can always be transformed into
equivalent exps involving EXISTS (as above) ... Can also
usually be transformed into exps involving MAX or MIN
* Is it? What if cities could be null?
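The gap between the English reading and what <>ANY actually means shows up on the suppliers-and-parts sample values. The sketch below assumes Python's built-in sqlite3 module, which does not accept quantified comparisons such as <>ANY, so it runs the EXISTS form given above as the equivalent, next to the NOT IN form that the English wording really asks for. The helper `names` and the cut-down table definitions are illustrative assumptions; no nulls are involved.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE S (SNO TEXT PRIMARY KEY, SNAME TEXT NOT NULL, CITY TEXT NOT NULL);
    CREATE TABLE P (PNO TEXT PRIMARY KEY, CITY TEXT NOT NULL);
    INSERT INTO S VALUES ('S1','Smith','London'), ('S2','Jones','Paris'),
                         ('S3','Blake','Paris'),  ('S4','Clark','London'),
                         ('S5','Adams','Athens');
    INSERT INTO P VALUES ('P1','London'), ('P2','Paris'), ('P3','Oslo'),
                         ('P4','London'), ('P5','Paris'), ('P6','London');
""")

def names(query):
    """Run a query and return the supplier names it yields, sorted."""
    return sorted(row[0] for row in conn.execute(query))

# What CITY <>ANY ( SELECT CITY FROM P ) really means:
# "some part is in a city different from the supplier's city".
exists_form = names("""
    SELECT DISTINCT SNAME
    FROM   S
    WHERE  EXISTS ( SELECT * FROM P WHERE P.CITY <> S.CITY )
""")

# What the English wording asks for:
# "the supplier's city is not the city of any part".
not_in_form = names("""
    SELECT DISTINCT SNAME
    FROM   S
    WHERE  CITY NOT IN ( SELECT CITY FROM P )
""")

print(exists_form)
print(not_in_form)
```

With these rows the EXISTS form returns all five supplier names, while NOT IN returns only Adams (Athens is the one supplier city with no parts).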
        ANY           ALL
=       IN            (use EXISTS)
<>      (use EXISTS)  NOT IN
<       < MAX         < MIN
<=      <= MAX        <= MIN
>       > MIN         > MAX
>=      >= MIN        >= MAX
=ANY equivalent to IN
<>ALL equivalent to NOT IN
=ALL, <>ANY ... Use EXISTS
FOR EXAMPLE :
1. SELECT DISTINCT PX.PNAME
   FROM   P AS PX
   WHERE  PX.WEIGHT >ALL ( SELECT PY.WEIGHT
                           FROM   P AS PY
                           WHERE  PY.COLOR = 'Blue' )
2. SELECT DISTINCT PX.PNAME
   FROM   P AS PX
   WHERE  PX.WEIGHT > ( SELECT MAX ( PY.WEIGHT )
                        FROM   P AS PY
                        WHERE  PY.COLOR = 'Blue' )
Exercise: What coercions are involved in the above?
BUT :
MAX gives null if argument is empty ...
1. SELECT DISTINCT PX.PNAME
   FROM   P AS PX
   WHERE  PX.WEIGHT >ALL ( SELECT PY.WEIGHT
                           FROM   P AS PY
                           WHERE  PY.COLOR = 'Blue' )
2. SELECT DISTINCT PX.PNAME
   FROM   P AS PX
   WHERE  PX.WEIGHT > ( SELECT MAX ( PY.WEIGHT )
                        FROM   P AS PY
                        WHERE  PY.COLOR = 'Blue' )
No blue parts: Exp 1 gives all PNAMEs ...
Exp 2 gives empty !!!
2. SELECT DISTINCT PX.PNAME
   FROM   P AS PX
   WHERE  PX.WEIGHT >
        ( SELECT COALESCE ( MAX ( PY.WEIGHT ) , 0.0 )
          FROM   P AS PY
          WHERE  PY.COLOR = 'Blue' )
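The empty-argument case can be reproduced directly. The sketch below assumes Python's built-in sqlite3 module and a parts table that deliberately contains no blue parts (the rows and the helper `pnames` are illustrative). SQLite does not accept >ALL, so Exp 1 is written in its equivalent NOT EXISTS form; the MAX and COALESCE versions are as on the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE P (PNO TEXT PRIMARY KEY, PNAME TEXT NOT NULL,
                    COLOR TEXT NOT NULL, WEIGHT REAL NOT NULL);
    -- Deliberately no blue parts.
    INSERT INTO P VALUES ('P1','Nut','Red',12.0),   ('P2','Bolt','Green',17.0),
                         ('P4','Screw','Red',14.0), ('P6','Cog','Red',19.0);
""")

def pnames(query):
    """Run a query and return the part names it yields, sorted."""
    return sorted(row[0] for row in conn.execute(query))

# Exp 1 (>ALL), rewritten with NOT EXISTS because SQLite lacks >ALL.
all_form = pnames("""
    SELECT DISTINCT PX.PNAME
    FROM   P AS PX
    WHERE  NOT EXISTS ( SELECT * FROM P AS PY
                        WHERE  PY.COLOR = 'Blue'
                        AND    NOT ( PX.WEIGHT > PY.WEIGHT ) )
""")

# Exp 2: MAX over an empty argument is null, so the comparison is never TRUE.
max_form = pnames("""
    SELECT DISTINCT PX.PNAME
    FROM   P AS PX
    WHERE  PX.WEIGHT > ( SELECT MAX ( PY.WEIGHT )
                         FROM   P AS PY
                         WHERE  PY.COLOR = 'Blue' )
""")

# Exp 2 repaired with COALESCE (assumes every weight is greater than 0.0).
coalesce_form = pnames("""
    SELECT DISTINCT PX.PNAME
    FROM   P AS PX
    WHERE  PX.WEIGHT > ( SELECT COALESCE ( MAX ( PY.WEIGHT ) , 0.0 )
                         FROM   P AS PY
                         WHERE  PY.COLOR = 'Blue' )
""")

print(all_form, max_form, coalesce_form)
```

The repaired version agrees with Exp 1 here only because 0.0 is below every weight in the table; that is an assumption about the data, not something COALESCE guarantees.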
EXAMPLE 12:
GROUP BY AND HAVING
For each part supplied by no more than two suppliers,
get part number and city and total quantity supplied
{ PX.PNO , PX.CITY ,
SUM ( SPX WHERE SPX.PNO = PX.PNO , QTY ) AS TPQ }
WHERE COUNT ( SPY WHERE SPY.PNO = PX.PNO ) <= 2
SELECT PX.PNO , PX.CITY ,
     ( SELECT COALESCE ( SUM ( SPX.QTY ) , 0 ) AS TPQ
       FROM   SP AS SPX
       WHERE  SPX.PNO = PX.PNO ) AS TPQ
FROM   P AS PX
WHERE  ( SELECT COUNT ( * )
         FROM   SP AS SPY
         WHERE  SPY.PNO = PX.PNO ) <= 2
OR :
SELECT   PX.PNO , PX.CITY ,
         COALESCE ( SUM ( SPX.QTY ) , 0 ) AS TPQ
FROM     P AS PX , SP AS SPX
WHERE    PX.PNO = SPX.PNO
GROUP BY PX.PNO
HAVING   COUNT ( * ) <= 2
• Easier to understand?
• Is PX.CITY in SELECT clause legal?
• Correct for parts supplied by no suppliers at all?
/* No */
• Are formulations equivalent in presence of nulls?
Or duplicates?
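The "/* No */" answer can be seen by running both formulations of Example 12 against a database containing a part that nobody supplies. This is a sketch only, assuming Python's built-in sqlite3 module and hypothetical rows (P3 is the unsupplied part). SQLite happens to accept PX.CITY in the SELECT clause of the GROUP BY version; that says nothing about what the SQL standard or any other product allows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE P  (PNO TEXT PRIMARY KEY, CITY TEXT NOT NULL);
    CREATE TABLE SP (SNO TEXT, PNO TEXT, QTY INTEGER NOT NULL,
                     PRIMARY KEY (SNO, PNO));
    -- Hypothetical data: P1 has two suppliers, P2 has three, P3 has none.
    INSERT INTO P  VALUES ('P1','London'), ('P2','Paris'), ('P3','Oslo');
    INSERT INTO SP VALUES ('S1','P1',300), ('S2','P1',300),
                          ('S1','P2',200), ('S2','P2',400), ('S3','P2',200);
""")

# First formulation: scalar subqueries in SELECT and WHERE.
subquery_form = sorted(conn.execute("""
    SELECT PX.PNO , PX.CITY ,
         ( SELECT COALESCE ( SUM ( SPX.QTY ) , 0 )
           FROM   SP AS SPX
           WHERE  SPX.PNO = PX.PNO ) AS TPQ
    FROM   P AS PX
    WHERE  ( SELECT COUNT ( * )
             FROM   SP AS SPY
             WHERE  SPY.PNO = PX.PNO ) <= 2
""").fetchall())

# Second formulation: join, GROUP BY, HAVING.
group_by_form = sorted(conn.execute("""
    SELECT   PX.PNO , PX.CITY ,
             COALESCE ( SUM ( SPX.QTY ) , 0 ) AS TPQ
    FROM     P AS PX , SP AS SPX
    WHERE    PX.PNO = SPX.PNO
    GROUP BY PX.PNO
    HAVING   COUNT ( * ) <= 2
""").fetchall())

print(subquery_form)
print(group_by_form)
```

P3 appears with a total of 0 in the first result and is missing from the second: the join discards parts with no shipments before GROUP BY ever sees them, so the COALESCE in the second formulation never has anything to do.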
STRUCTURE OF PRESENTATION :
 1. Setting the scene
 2. Types and domains
 3. Tuples and relations, rows and tables
 4. No duplicates, no nulls
 5. Base relvars, base tables
 6. SQL and algebra I: The original operators
 7. SQL and algebra II: Additional operators
 8. SQL and constraints
 9. SQL and views
10. SQL and logic I: Relational calculus
11. SQL and logic II: Using logic to write SQL
12. Further SQL topics
13. Appendix: The relational model
14. Appendix: DB design
FURTHER SQL TOPICS :
• Implementation defined vs. implementation dependent
• SELECT *
• Explicit tables
• Dot qualification
• Range variables
• Subqueries
• "Possibly nondeterministic" expressions
• Empty sets
• BNF grammar for SQL table expressions
THANK YOU FOR LISTENING !!!
THESIS (stake in the ground) :
DBs are not just "data stores" !!!
I claim that, if you think about the issue at the appropriate level
of abstraction, you’re inexorably led to the position:
DBs must be relational
All other "models"—inverted lists, IMS-style hierarchies,
CODASYL-style networks, objects (= CODASYL
warmed over), XML or "semistructured model" (= IMS
warmed over), etc., etc.—are simply ad hoc storage
structures that have been elevated above their station
and will not endure
JUSTIFICATION :
Want to record "true facts": e.g., Joe’s salary is 50K
... i.e., true propositions
Easily encoded as ordered pairs: e.g., <Joe,50K>
(Joe is a value of type NAME, 50K is a value of type MONEY)
But not just arbitrary propositions ... Rather, all true
instantiations of certain predicates ... In the example:
x’s salary is y
(x is a value of type NAME, y is a value of type MONEY)
JUSTIFICATION (cont.) :
In other words, we want to record extension of "x’s salary is
y"—i.e., a set of ordered pairs—i.e., a binary relation! ...
which we can depict as a table:
values of type NAME    values of type MONEY
Joe                    50K
Amy                    60K
Sue                    45K
...                    ...
Ron                    60K
Actually a function, because each person has just one salary
Subset of cartesian product of set of all names ("type
NAME") and set of all money values ("type MONEY"), in that
order
JUSTIFICATION (cont.) :
Humble (but very solid) beginnings! But Codd realized:
1. Need n-adic predicates and propositions (not just
dyadic); hence n-ary relations (not just binary) and
n-tuples (not just pairs)—tuples for short
2. Ordering OK for pairs but soon gets unwieldy for n > 2
... So replace attribute ordinal positions by attribute
names and (re)define relation concept accordingly
3. Representation obviously not the end of the story ...
Need operators for deriving further relations from given
("base") ones for queries etc.—e.g., "Find all persons
with salary 60K" ... Hence relational calculus (logic) /
relational algebra (set theory)
EXAMPLE REVISITED :
heading   PERSON                     SALARY
          (attribute of type NAME)   (attribute of type MONEY)
body      Joe                        50K
          Amy                        60K
          Sue                        45K
          ...                        ...
          Ron                        60K
• No "first" or "second" attribute
• Note logical difference between attribute and underlying type
From this point forward relation means a relation in above
sense, barring explicit statements to the contrary
THE RELATIONAL MODEL DEFINED :
1. An open-ended collection of scalar types (including in
   particular the type boolean or truth value)
2. A relation type generator and an intended interpretation
   for relations of types generated thereby
3. Facilities for defining relation variables of such generated
   relation types
4. A relational assignment operation for assigning relation
   values to such relation variables
5. An open-ended collection of generic relational operators
   for deriving relation values from other relation values
SOME IMPLICATIONS :
1. User defined types and user defined operators
2. Users can specify individual relation types
3. Relvars are the only variables allowed inside an RDB,
   in accordance with Codd's Information Principle:
   Entire information content of the DB is represented in
   one and only one way: as explicit attribute values within
   tuples within relations
4. INSERT / DELETE / UPDATE just shorthand
5. System defined operators (plus user-defined ones?),
   used for many purposes, including constraints in
   particular
WHAT REMAINS TO BE DONE ???
• Proper implementation
    • The Third Manifesto
    • The TransRelational™ Model
• Further foundation issues: e.g.,
    • Constraint inference
    • Database design
    • "Missing information"
    • Etc.
• Higher level abstractions
    • PACK and UNPACK
    • "U_" ops, keys, etc.
    • Etc.
• Higher level interfaces
    • Propositions
    • Data mining, decision support, etc.
• What about SQL ???
SOME REMARKS ON DATABASE DESIGN :
DB design theory is not part of RM as such; rather,
it builds on RM
• Obviously true for physical design! But true of
logical design too, to some extent
• Concepts such as further normalization on which
design theory is based are themselves based on
more fundamental concepts that are part of RM
So I'll be brief ... Quick look at:
• Normalization
• Denormalization
• Orthogonality
FUNCTIONAL DEPENDENCIES :
"Everyone knows" that 2NF, 3NF, BCNF all depend on
functional dependencies (FDs)
Let A and B be subsets of the heading of R; then R
satisfies the FD
A → B
iff, whenever two tuples of R agree on A, they also
agree on B
E.g., given EMP { ENO , SALARY , DNO , MNO } :
{ DNO } → { MNO }
Reminder: If SK is a superkey for R and A is any subset of
the heading of R, then R satisfies SK → A
The fact that a given FD holds for R is a relvar constraint
on R (of course): e.g., for EMP as on previous page,
CONSTRAINT FDX COUNT ( EMP { DNO } ) =
               COUNT ( EMP { DNO , MNO } ) ;
Likewise for multi-valued dependencies (MVDs), which are
relevant to "4NF", and join dependencies (JDs), which are
relevant to "5NF" (CONSTRAINT formulations left as an
exercise)
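The COUNT formulation of an FD constraint works because projection eliminates duplicates: A → B holds exactly when projecting on A yields as many tuples as projecting on A and B together. A minimal Python sketch of that test follows, with a relation modeled as a list of dicts; the functions `project` and `satisfies_fd` and the EMP rows are illustrative assumptions, not part of the seminar.

```python
def project(rows, attrs):
    """Projection of a relation (a list of dicts) on attrs.

    Returns a set of tuples, so duplicates are eliminated, as in the
    relational model.
    """
    return {tuple(row[a] for a in attrs) for row in rows}

def satisfies_fd(rows, lhs, rhs):
    """True iff the FD lhs -> rhs holds, using the COUNT test:
    COUNT ( R { lhs } ) = COUNT ( R { lhs , rhs } ).
    """
    return len(project(rows, lhs)) == len(project(rows, lhs + rhs))

# Hypothetical EMP value: each department has exactly one manager.
EMP = [
    {"ENO": "E1", "SALARY": 50, "DNO": "D1", "MNO": "M1"},
    {"ENO": "E2", "SALARY": 60, "DNO": "D1", "MNO": "M1"},
    {"ENO": "E3", "SALARY": 45, "DNO": "D2", "MNO": "M2"},
]

# Adding an employee who pairs department D2 with a second manager
# violates { DNO } -> { MNO }.
BAD_EMP = EMP + [{"ENO": "E4", "SALARY": 60, "DNO": "D2", "MNO": "M3"}]

fd_holds = satisfies_fd(EMP, ["DNO"], ["MNO"])
fd_broken = satisfies_fd(BAD_EMP, ["DNO"], ["MNO"])
print(fd_holds, fd_broken)
```

In BAD_EMP the projection on DNO still has two tuples, but the projection on DNO and MNO has three, so the counts differ and the constraint fails.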
NORMAL FORMS :
• 1NF : All relvars are in 1NF, even with relation-valued
attributes (RVAs), though RVAs usually contraindicated
• 2NF, 3NF : Mainly historical interest
• BCNF : R is in BCNF iff for every nontrivial FD X → A
satisfied by R, X is a superkey
Loosely: Every fact is a fact about the key, the whole key,
and nothing but the key
(The FD A → B is trivial iff it can't possibly be violated,
i.e., iff B ⊆ A)
• 4NF : Mainly historical interest
JOIN DEPENDENCIES :
Let R be a relvar, and let A, B, ..., C be subsets of the
heading of R. Then R satisfies the join dependency (JD)
* { A , B , … , C }
if and only if every legal value of R is equal to the join
of its projections on A, B, ..., C (i.e., if and only if R
can be nonloss decomposed into those projections)
E.g.: Relvar S satisfies JD * { SN , SS , SC }
where SN = { SNO , SNAME }, etc.
Note: UNION { A , B , … , C } must equal heading
Every MVD is a JD, every FD is an MVD
EVERY FD IS A JD (example) :
Suppose relvar S satisfies additional FD:
{ CITY } → { STATUS } /* see next page */
Then S can be nonloss decomposed into projections on:
{ SNO , SNAME , CITY }
{ CITY , STATUS }
In other words, S satisfies following JD:
* { SNC , CS }
where SNC = { SNO , SNAME , CITY }
      CS  = { CITY , STATUS }
SAMPLE VALUE OF RELVAR S
SATISFYING { CITY } → { STATUS } :
S    SNO  SNAME  STATUS  CITY
     S1   Smith  20      London
     S2   Jones  30      Paris     <- note the change
     S3   Blake  30      Paris
     S4   Clark  20      London
     S5   Adams  30      Athens
NONLOSS DECOMPOSE :
SNC  SNO  SNAME  CITY
     S1   Smith  London
     S2   Jones  Paris
     S3   Blake  Paris
     S4   Clark  London
     S5   Adams  Athens
CS   CITY    STATUS
     Athens  30
     London  20
     Paris   30
S = SNC JOIN CS ... In other words, S satisfies
* { { SNO , SNAME , CITY } , { CITY , STATUS } }
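Nonloss decomposition can be checked by projecting and rejoining. The sketch below models a relation as a Python set of (SNO, SNAME, STATUS, CITY) tuples; the function `rejoin` is an illustrative helper, and the second relation, in which S2 has status 10 so that { CITY } → { STATUS } no longer holds, is included only for contrast.

```python
def rejoin(s):
    """Project s on { SNO, SNAME, CITY } and { CITY, STATUS }, then join
    the two projections back together on CITY."""
    snc = {(sno, sname, city) for (sno, sname, status, city) in s}
    cs = {(city, status) for (sno, sname, status, city) in s}
    return {(sno, sname, status, city)
            for (sno, sname, city) in snc
            for (cs_city, status) in cs
            if cs_city == city}

# Sample value satisfying { CITY } -> { STATUS }.
S_WITH_FD = {
    ("S1", "Smith", 20, "London"),
    ("S2", "Jones", 30, "Paris"),
    ("S3", "Blake", 30, "Paris"),
    ("S4", "Clark", 20, "London"),
    ("S5", "Adams", 30, "Athens"),
}

# Same suppliers, but S2 has status 10: Paris now has two statuses,
# so the FD no longer holds.
S_WITHOUT_FD = (S_WITH_FD - {("S2", "Jones", 30, "Paris")}) | {
    ("S2", "Jones", 10, "Paris"),
}

nonloss = rejoin(S_WITH_FD) == S_WITH_FD
lossy = rejoin(S_WITHOUT_FD) != S_WITHOUT_FD
print(nonloss, lossy, len(rejoin(S_WITHOUT_FD)))
```

Without the FD, the join pairs each Paris supplier with both Paris statuses, producing seven tuples where the original had five; the extra tuples are exactly the information lost by the decomposition.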
NORMAL FORMS (cont.) :
• 5NF : The "final" normal form!* R is in 5NF iff, for every
nontrivial JD * {A,B,...,C} satisfied by R, each of A,B,...,C
is a superkey [and keys can be ordered such that each
adjacent pair is included in at least one of A, B, ..., C]
The JD * {A,B,...,C} is trivial iff at least one of A, B, ..., C
= heading
• Theorem (Date & Fagin 1991): 3NF and no composite keys
implies 5NF
• And another: BCNF and not all key implies 5NF
* Well .... except for 6NF
NORMAL FORMS (cont.) :
• 6NF : The true final normal form: R is in 6NF iff
the only JDs it satisfies are trivial ones
E.g., SP (but not S or P)
R is in 6NF iff the only JDs it satisfies are of the form
*{...,{H},...}, where {H} is the heading
R is in 6NF iff it’s in 5NF, is of degree n, and has no key
of degree less than n-1 ... 6NF implies 5NF
E.g., PLUS{A,B,C} : 6NF (every key is of degree two)
Note: 6NF has extended defn in temporal DB context
OBJECTIVES OF NORMALIZATION :
• Reduce redundancy
• Avoid update anomalies
• "Better" representation of semantics
• Easier enforcement of constraints (normalization to 5NF
gives us a simple way of enforcing certain important and
commonly occurring constraints)
    • Only need to enforce KEY UNIQUENESS
    • All JDs (and so all MVDs and all FDs) will then be
      enforced automatically
SOME REASONS WHY NORMALIZATION
IS NOT A PANACEA :
• Enforces certain constraints very simply, but JDs etc.
are not the ONLY kind of constraint
• Decomposition is not unique, in general
• Not all redundancies can be removed by taking projections
• BCNF and "dependency preservation" objectives can conflict
    • In fact, normalization can cause some FDs (etc.) to
      cease to be FDs (etc.), since they now span relvars!
• Some design issues are simply not addressed
• Nevertheless ... DENORMALIZE ONLY AS A LAST RESORT !!!
DENORMALIZATION CONSIDERED HARMFUL:
Almost always, anything less than full normalization is
strongly contraindicated, even in a "direct image"
implementation !!! /* big topic in its own right */
• Fully normalized design is a "good" representation of the
real world: intuitively easy to understand, good base
for future growth
• Everyone knows denormalization makes update harder ... but
it can make retrieval harder too (see next page)
• Can be bad for performance as well! Usually means
improving the performance of one application at the
expense of others
DENORMALIZATION BAD FOR RETRIEVAL
(example) :
• Again suppose suppliers satisfy { CITY } → { STATUS }:
S    SNO  SNAME  STATUS  CITY
     S1   Smith  20      London
     S2   Jones  30      Paris     <- note the change
     S3   Blake  30      Paris
     S4   Clark  20      London
     S5   Adams  30      Athens
• Can be regarded as denormalization of SNC and CS
/* see earlier */
• "Find average city status" (i.e., 26.667)
SELECT DISTINCT AVG (STATUS) AS REQD
FROM   S
/* result (incorrect): 26 */
SELECT DISTINCT AVG (DISTINCT STATUS) AS REQD
FROM   S
/* result (incorrect): 25 */
SELECT DISTINCT CITY, AVG (STATUS) AS REQD
FROM   S
GROUP  BY CITY
/* gives avg status per city, not overall avg */
SELECT DISTINCT CITY, AVG (AVG (STATUS)) AS REQD
FROM   S
GROUP  BY CITY
/* syntax error */
SELECT DISTINCT AVG (STATUS) AS REQD
FROM ( SELECT DISTINCT CITY, STATUS
       FROM   S ) AS POINTLESS
/* correct (at last!) ... but is it supported? */
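The three single-value attempts can be run against the denormalized sample value of S. This is a sketch under stated assumptions: it uses Python's built-in sqlite3 module, the helper `scalar` is illustrative, and SQLite returns averages as floating point values (so 26 shows up as 26.0). The nested-aggregate attempt is left out because it is rejected rather than evaluated.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE S (SNO TEXT PRIMARY KEY, SNAME TEXT NOT NULL,
                    STATUS INTEGER NOT NULL, CITY TEXT NOT NULL);
    -- Sample value satisfying { CITY } -> { STATUS } (S2 has status 30).
    INSERT INTO S VALUES ('S1','Smith',20,'London'), ('S2','Jones',30,'Paris'),
                         ('S3','Blake',30,'Paris'),  ('S4','Clark',20,'London'),
                         ('S5','Adams',30,'Athens');
""")

def scalar(query):
    """Run a query that yields one row with one column; return that value."""
    return conn.execute(query).fetchone()[0]

# Averages over suppliers, so London and Paris are each counted twice.
per_supplier = scalar("SELECT DISTINCT AVG (STATUS) AS REQD FROM S")

# Averages over distinct status values, so Paris and Athens collapse into one.
per_status_value = scalar("SELECT DISTINCT AVG (DISTINCT STATUS) AS REQD FROM S")

# Averages over distinct (CITY, STATUS) pairs: one status per city.
per_city = scalar("""
    SELECT DISTINCT AVG (STATUS) AS REQD
    FROM ( SELECT DISTINCT CITY, STATUS
           FROM   S ) AS POINTLESS
""")

print(per_supplier, per_status_value, round(per_city, 3))
```

Only the last query averages 30 (Athens), 20 (London), and 30 (Paris), giving 80 / 3, the 26.667 quoted earlier. Against the normalized design the same question would be a plain AVG over CS.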
ORTHOGONALITY (a little more science!) :
Design theory is about reducing redundancy (true fact!)
—but what’s redundancy ??? Well, certainly:
If DB is such that if tuple t appears at all it must appear
more than once, then DB clearly involves some
redundancy
Note that normalization is precisely about eliminating
redundant appearances of the same tuple!
E.g., suppose once again that suppliers satisfy FD
{ CITY } → { STATUS }
S    SNO  SNAME  STATUS  CITY
     S1   Smith  20      London
     S2   Jones  30      Paris     <- note the change
     S3   Blake  30      Paris
     S4   Clark  20      London
     S5   Adams  30      Athens
(Sub)tuples <20,London> and <30,Paris> both appear twice
(and do represent redundancy) ... /* recall that every
subset of a tuple is a tuple */
So normalize
SNC  SNO  SNAME  CITY
     S1   Smith  London
     S2   Jones  Paris
     S3   Blake  Paris
     S4   Clark  London
     S5   Adams  Athens
CS   CITY    STATUS
     Athens  30
     London  20
     Paris   30
Now <20,London> and <30,Paris> both appear just once
BUT WHAT ABOUT :
LP   /* part weight <= 17.0 pounds */
     P#   PNAME  COLOR  WEIGHT  CITY
     P1   Nut    Red    12.0    London
     P2   Bolt   Green  17.0    Paris
     P3   Screw  Blue   17.0    Oslo
     P4   Screw  Red    14.0    London
     P5   Cam    Blue   12.0    Paris
HP   /* part weight >= 17.0 pounds */
     P#   PNAME  COLOR  WEIGHT  CITY
     P2   Bolt   Green  17.0    Paris
     P3   Screw  Blue   17.0    Oslo
     P6   Cog    Red    19.0    London
Normalization doesn’t help … but problem is easy to
see!
• Relvar predicates for LP and HP "overlap"
• I.e., they require tuples for parts with weight 17.0
pounds to appear in both relvars:
CONSTRAINT LP_AND_HP_OVERLAP
   ( LP WHERE WEIGHT = 17.0 ) =
   ( HP WHERE WEIGHT = 17.0 ) ;
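The overlap constraint and the update anomaly it causes can be sketched in a few lines of Python, with each relvar value modeled as a set of (P#, PNAME, COLOR, WEIGHT, CITY) tuples. The names `where_weight` and `overlap_constraint_holds` are illustrative helpers, not Tutorial D.

```python
WEIGHT = 3  # position of WEIGHT in each (P#, PNAME, COLOR, WEIGHT, CITY) tuple

LP = {
    ("P1", "Nut",   "Red",   12.0, "London"),
    ("P2", "Bolt",  "Green", 17.0, "Paris"),
    ("P3", "Screw", "Blue",  17.0, "Oslo"),
    ("P4", "Screw", "Red",   14.0, "London"),
    ("P5", "Cam",   "Blue",  12.0, "Paris"),
}
HP = {
    ("P2", "Bolt",  "Green", 17.0, "Paris"),
    ("P3", "Screw", "Blue",  17.0, "Oslo"),
    ("P6", "Cog",   "Red",   19.0, "London"),
}

def where_weight(relvar_value, weight):
    """Restriction: relvar_value WHERE WEIGHT = weight."""
    return {t for t in relvar_value if t[WEIGHT] == weight}

def overlap_constraint_holds(lp, hp):
    """( LP WHERE WEIGHT = 17.0 ) = ( HP WHERE WEIGHT = 17.0 )"""
    return where_weight(lp, 17.0) == where_weight(hp, 17.0)

# The tuples that are stored twice.
duplicated = LP & HP

# Deleting P2 from LP alone leaves the database in violation of the
# constraint: the redundancy forces every such update to be done twice.
lp_after_delete = {t for t in LP if t[0] != "P2"}

holds_before = overlap_constraint_holds(LP, HP)
holds_after = overlap_constraint_holds(lp_after_delete, HP)
print(len(duplicated), holds_before, holds_after)
```

The tuples for P2 and P3 are the redundant ones, and no amount of projection removes them, since LP and HP are restrictions of the same relation rather than projections.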
So:
THE PRINCIPLE OF ORTHOGONAL DESIGN :
First version: No two base relvars should be such that
their relvar constraints might require the same tuple to
appear in both
—McGoveran & Date 1994
but somewhat revised here

Solves the LP / HP problem
Remember that (as far as the user is concerned) all
relvars in the DB are base relvars!
Orthogonality principle as stated applies to relvars of
the same type … But what about:
SX   SNO  SNAME  STATUS
     S1   Smith  20
     S2   Jones  10
     S3   Blake  30
     S4   Clark  20
     S5   Adams  30
SY   SNO  SNAME  CITY
     S1   Smith  London
     S2   Jones  Paris
     S3   Blake  Paris
     S4   Clark  London
     S5   Adams  Athens
Second version: Let A and B be distinct relvars. Then there
should not exist nonloss decompositions of A and B into
projections A1, …, Am and B1, …, Bn, respectively, such that
the relvar constraints for some Ai and some Bj might require
the same tuple to appear in both.
Subsumes first version … But what about:
SX   SNO  SNAME  STATUS
     S1   Smith  20
     S2   Jones  10
     S3   Blake  30
     S4   Clark  20
     S5   Adams  30
SY   ID   LABEL  CITY
     S1   Smith  London
     S2   Jones  Paris
     S3   Blake  Paris
     S4   Clark  London
     S5   Adams  Athens
Oh, all right ...
THE PRINCIPLE OF ORTHOGONAL DESIGN
(final version) :
Let A and B be distinct relvars. Replace A and B by nonloss
decompositions into projections A1, …, Am and B1, …, Bn,
respectively, such that every Ai (i = 1, …, m) and every Bj (j
= 1, …, n) is in 6NF. Let some i and j be such that there
exists a sequence of zero or more attribute renamings with
the property that (a) when applied to Ai, it produces Ak, and
(b) Ak and Bj are of the same type. Then there must not
exist a constraint to the effect that, at all times, (Ak WHERE
ax) = (Bj WHERE bx), where ax and bx are restriction
conditions, neither of which is a contradiction.
Subsumes second version
ORTHOGONALITY COMPLEMENTS NORMALIZATION :
Consider again decomposition of S into SX and SY:
SX { SNO, SNAME, STATUS }
SY { SNO, SNAME, CITY }
Satisfies all normalization principles!
• Both projections in 5NF
• Decomposition nonloss
• Dependencies preserved
• Both projections needed in reconstruction
Orthogonality, not normalization, tells us the decomposition
is bad
POINTS ARISING :
• FORMALIZED COMMON SENSE (like normalization)
• Reduces redundancy, avoids update anomalies
(like normalization)
• If R is decomposed via restriction, restrictions should be
pairwise disjoint (and R should be reconstructable via
disjoint union)
• Orthogonal decomposition: Any decomposition that abides
by The Principle of Orthogonal Design
• No strong logical reasons for horizontal decomposition?
(Contrast normalization)
• Horizontal and vertical decomposition both lead to need for
multi-relvar ("database") constraints
ONE FURTHER POINT :
Much confusion over this topic, even though the idea is
basically so simple (mea culpa) …
Example where orthogonality is NOT violated (acks Hugh
Darwen) … Consider predicates:
• Employee ENO is on vacation
• Employee ENO is awaiting phone number allocation
Obvious design:
ON_VACATION { ENO }
   KEY { ENO }
NEEDS_PHONE { ENO }
   KEY { ENO }
Same tuple can appear in ON_VACATION and
NEEDS_PHONE—but different propositions / no redundancy /
no violation of orthogonality
Note difference in kind between this example and LP / HP
example:
• For LP / HP, there’s a formal constraint that a tuple
must satisfy in order to be accepted into either relvar …
and constraints "overlap"
• For ON_VACATION / NEEDS_PHONE, no analogous
property holds … DBMS must just trust the user!
TO SUM UP : LOGICAL DATABASE DESIGN ...
... is, precisely, specifying constraints !!!
DB is supposed to be "a faithful representation of the real
world" ... It's constraints that represent semantics ... So:
1. Pin down relvar predicates as carefully as possible
(albeit informally)
2. Map the output from the first step into relvars and
corresponding constraints (some of which will
involve FDs, MVDs, JDs in particular)
Note: "E/R modeling" is almost totally incapable of
dealing with constraints!
Note: All of the above is highly relevant to what the
commercial world calls business rules
SOME REMARKS ON PHYSICAL DESIGN :
• Should follow logical design; automatable ??? … Not a
farfetched idea
• RM deliberately meant to give implementers freedom to
implement the model any way they liked … But typically:
base relvar            physical table
....attributes...      .....fields......
tuple                  record
tuple                  record
tuple                  record
tuple                  record
tuple                  record
• Many things wrong with direct image style …
In particular, almost no data independence !!!
• Hence "denormalize for performance" (etc.)
• But something better is on its way:
The TransRelational™ Model
    • No penalty for full normalization!
    • MANY other advantages … including, possibly, a basis
      for a relational approach to missing information
Suppose just two suppliers S1 and S2, and S2’s status is
unknown ...
SN   SNO  SNAME
     S1   Smith
     S2   Jones
SS   SNO  STATUS
     S1   20
SC   SNO  CITY
     S1   London
     S2   Paris
If you don’t know something, better to say nothing at all!
/* but be careful over relvar predicates */
Wovon man nicht reden kann, darüber muss man schweigen
("whereof one cannot speak, thereon one must remain silent")
—Wittgenstein
THANK YOU FOR LISTENING !!!