CS 157A Chap 2

Chapter 2
THE RELATIONAL MODEL OF DATA
Dr. T. Y. Lin | SJSU | CS 157A | Fall 2011
Chapter 2
The Relational Model of Data
2.1 An Overview of Data Models
2.2 Basics of Relational Model
2.3 Defining a Relation Schema in SQL
2.4 An Algebraic Query Language
2.5 Constraints on Relations
2.6 Summary of Chapter 2
2.7 References for Chapter 2
Section 2.1
AN OVERVIEW OF DATA MODELS
2.1 An Overview of Data Models
2.1.1 What is a Data Model?
2.1.2 Important Data Models
2.1.3 The Relational Data Model in Brief
2.1.4 The Semi-structured Model in Brief
2.1.5 Other Data Models
2.1.6 Comparison of Modeling Approaches
2.1.1 What is a Data Model?
A data model is a "small set of notations/mathematics" (a mathematical model; see the definition in discrete mathematics) for describing data.
The description generally consists of three parts:
1. Structure: it can be imagined as an object in Java or a struct in C, but in the database world the structure of data is at a higher level than the physical data model. That is why we refer to it as a conceptual model.
2. Operations: a limited set of queries (retrieving data) and modifications (changing data).
3. Constraints: limitations on what the data can be.
Structure
[Figure: three levels of data description: external view (e.g., CS), conceptual view (e.g., knowledge), internal view (e.g., the Library of Congress)]
2.1.2 Important Data Models
Currently, there are two important data models:
1. The relational data model (including the object-relational extension), which is present in all commercial DBMSs.
2. The semi-structured data model (including XML), which is an added feature of most DBMSs.
2.1.3 The Relational Data Model in Brief
The relational model is based on tables. For instance, the following table shows three movies, but you can imagine that there are many more rows.

Title              | Year | Length | Genre
Gone with the Wind | 1939 | 231    | Drama
Star Wars          | 1977 | 124    | Sci-fi
Wayne's World      | 1992 | 95     | Comedy
2.1.3 The Relational Data Model in Brief (cont'd)
We are not going to discuss how the structure of the tables is implemented; that is postponed to more advanced database courses.
There are operations we can perform on the tables. For example, we can query for the rows whose genre is 'Comedy'.
As an example of a constraint, we may decide that no two movies in this table can have the same title and year.
2.1.4 The Semi-structured Model in Brief
Semi-structured data resembles trees or graphs, rather than tables or arrays.
The principal manifestation of this viewpoint today is XML, a way to represent data by hierarchically nested tagged elements.
TYLIN: IBM ARC abandoned it.
The tags are similar to those used in HTML.
Tags play a role similar to that of the column headers in the relational model.
The next slide shows the same movie data in XML.
2.1.4 The Semi-structured Model in Brief (cont'd)
<Movies>
  <Movie title="Gone with the Wind">
    <Year>1939</Year>
    <Length>231</Length>
    <Genre>drama</Genre>
  </Movie>
  <Movie title="Star Wars">
    <Year>1977</Year>
    <Length>124</Length>
    <Genre>Scifi</Genre>
  </Movie>
  <Movie title="Wayne's World">
    <Year>1992</Year>
    <Length>95</Length>
    <Genre>Comedy</Genre>
  </Movie>
</Movies>
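To make the correspondence between the two representations concrete, here is a small Python sketch that flattens the XML above back into the tuples of the relational table. It assumes the standard-library xml.etree.ElementTree module; the function name xml_to_tuples is our own, and the document is retyped with straight quotes because XML does not accept curly ones.

```python
import xml.etree.ElementTree as ET

# The XML from the slide, retyped with straight quotes (XML requires them).
MOVIES_XML = """<Movies>
  <Movie title="Gone with the Wind">
    <Year>1939</Year><Length>231</Length><Genre>drama</Genre>
  </Movie>
  <Movie title="Star Wars">
    <Year>1977</Year><Length>124</Length><Genre>Scifi</Genre>
  </Movie>
  <Movie title="Wayne's World">
    <Year>1992</Year><Length>95</Length><Genre>Comedy</Genre>
  </Movie>
</Movies>"""


def xml_to_tuples(text):
    """Flatten each <Movie> element into a (title, year, length, genre) tuple."""
    root = ET.fromstring(text)
    rows = []
    for movie in root.findall("Movie"):
        rows.append((
            movie.get("title"),              # XML attribute -> title component
            int(movie.findtext("Year")),     # child elements -> remaining components
            int(movie.findtext("Length")),
            movie.findtext("Genre"),
        ))
    return rows


for row in xml_to_tuples(MOVIES_XML):
    print(row)
```

Here the tags play exactly the role of column headers; a less regular XML document (missing or repeated child elements) would not flatten this cleanly, which is where the flexibility of the semi-structured model shows.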
2.1.5 Other Data Models
A modern trend is to add object-oriented features to the relational model.
Object orientation affects relations in two ways:
1. Values can have structure, rather than being elementary types such as integers or strings.
2. Relations can have associated methods.
These extensions are called the object-relational model.
2.1.5 Other Data Models (cont'd)
Earlier DBMSs used other models, such as the hierarchical model and the network model.
The hierarchical model was a tree-oriented model that, unlike modern DBMSs, really operated at the physical level.
The network model was a graph-oriented model that also operated at the physical level.
2.1.6 Comparison of Modeling Approaches (IBM ARC; no more ...)
Semi-structured models are more flexible, but the relational model is still preferred.
In large databases, efficient access to data and efficient modification of data are of great importance.
Ease of use is another important factor in using a DBMS.
Both of these qualities can be found in relational DBMSs.
Moreover, SQL, the Structured Query Language, is a powerful language for database operations in spite of its simplicity.
Section 2.2
BASICS OF RELATIONAL MODEL
2.2 Basics of Relational Model
2.2.1 Attributes
2.2.2 Schemas
2.2.3 Tuples
2.2.4 Domains
2.2.5 Equivalent Representations of a Relation
2.2.6 Relation Instances
2.2.7 Keys of Relations
2.2.8 An Example Database Schema
2.2.9 Exercises for Section 2.2
2.2 Basics of Relational Model
The relational model gives us a single way to represent data: as a two-dimensional table called a relation.
The Movies relation:

Title              | Year | Length | Genre
Gone with the Wind | 1939 | 231    | Drama
Star Wars          | 1977 | 124    | Sci-fi
Wayne's World      | 1992 | 95     | Comedy

Each row (tuple) represents a movie, and each column (attribute) represents a property of the movie.
2.2.1 Attributes
The columns of a relation are called attributes. In the Movies relation (previous slide), title, year, length, and genre are attributes.
Attributes appear at the top of the columns.
Just as with variable names in ordinary programming languages, attribute names should be chosen to describe the contents of their columns.
2.2.2 Schemas
The name of a relation together with the set of its attributes is called the schema of the relation.
We write a schema as the relation name followed by a parenthesized list of its attributes.
For instance, the following is the schema of the relation Movies:
Movies(title, year, length, genre)
2.2.2 Schemas (cont'd)
Note that the attributes form a set, not a list, but when we talk about relations we often specify a standard order for the attributes.
A database consists of one or more relations.
The set of schemas for the relations of a database is called a relational database schema, or just a database schema.
2.2.3 Tuples
The rows of a relation, other than the header row containing the attribute names, are called tuples.
A tuple has one component for each attribute of the relation.
For example, in the Movies relation, the first tuple has four components: 'Gone with the Wind', 1939, 231, and 'Drama', for the attributes title, year, length, and genre, respectively.
2.2.3 Tuples (cont'd)
When we write a tuple in isolation, not as part of a relation, we normally use commas to separate the components, like this:
('Gone with the Wind', 1939, 231, 'Drama')
Note that a tuple written in isolation lists its components in the same attribute order as the relation's schema.
2.2.4 Domains
Tylin 8/29: Past, present, future data
The relational model requires that each component of each tuple be atomic; that is, it must be of some elementary type such as integer or string.
A value may not be a set, list, array, or any other type that can reasonably be broken into smaller components.
It is further assumed that each attribute of a relation is associated with a domain, i.e., a particular elementary type, and every component for that attribute must belong to that domain.
2.2.4 Domains (cont'd)
We can include the domain of each attribute in a schema as follows:
Movies(title:string, year:integer, length:integer, genre:string)
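As an illustration of what the domain annotations promise, the following Python sketch checks a tuple against the schema above. The mapping of "string" to Python's str and "integer" to int, and the function name conforms, are our own assumptions for the example, not part of the relational model itself.

```python
# Hypothetical mapping of the schema's domains to Python types (string -> str, integer -> int).
MOVIES_SCHEMA = {"title": str, "year": int, "length": int, "genre": str}


def conforms(tup, schema):
    """True if tup has one component per attribute and each lies in that attribute's domain."""
    if len(tup) != len(schema):
        return False
    for value, domain in zip(tup, schema.values()):
        # bool is a subclass of int in Python, so exclude it explicitly.
        # Non-atomic values (lists, sets, ...) fail because no domain here is a collection type.
        if isinstance(value, bool) or not isinstance(value, domain):
            return False
    return True


print(conforms(("Gone with the Wind", 1939, 231, "Drama"), MOVIES_SCHEMA))  # True
print(conforms(("Star Wars", 1977, [121, 124], "Sci-fi"), MOVIES_SCHEMA))  # False: a list is not atomic
```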
2.2.5 Equivalent Representations of a Relation
Relations are sets of tuples, not lists of tuples; the order in which the tuples appear has no significance.
Moreover, we can reorder the attributes of a relation without changing the relation.
Note that when we reorder the attributes, we must reorder the components of every tuple in the same way.
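A minimal Python sketch of these two equivalences, assuming a relation is modeled as an attribute tuple plus a Python set of value tuples (the helper names reorder and as_mappings are hypothetical): the set ignores tuple order, and an attribute permutation is harmless as long as every tuple is permuted with it.

```python
# The same two Movies tuples under the attribute order (title, year, length, genre).
schema1 = ("title", "year", "length", "genre")
r1 = {("Star Wars", 1977, 124, "Sci-fi"), ("Wayne's World", 1992, 95, "Comedy")}


def reorder(schema, tuples, new_schema):
    """Permute every tuple's components to follow new_schema's attribute order."""
    positions = [schema.index(a) for a in new_schema]
    return new_schema, {tuple(t[p] for p in positions) for t in tuples}


def as_mappings(schema, tuples):
    """Order-free view: each tuple becomes a set of (attribute, value) pairs."""
    return {frozenset(zip(schema, t)) for t in tuples}


# Listing the tuples in another order still yields the same set.
r1_other_order = set(reversed(list(r1)))
print(r1 == r1_other_order)  # True

# Reordering the attributes, and every tuple's components with them, gives the same relation.
schema2, r2 = reorder(schema1, r1, ("year", "genre", "title", "length"))
print(as_mappings(schema1, r1) == as_mappings(schema2, r2))  # True
```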
2.2.6 Relation Instances
Relations change over time; they are not static.
For example, we insert tuples into the Movies relation over time, and we may also delete or update tuples.
Even the schema can change: we may add attributes to, or delete attributes from, the schema.
We call a set of tuples for a given relation an instance of that relation.
The current instance is the set of tuples that exists now.
2.2.7 Keys of Relations
The relational model allows us to place constraints on a schema.
One important kind of constraint is the key constraint, or simply key.
A set of one or more attributes forms a key for a relation if no two tuples of the relation can have the same values in all the attributes of the key.
2.2.7 Keys of Relations (cont'd)
Example 2.1
For the Movies relation, we can declare the attributes title and year together to be the key.
Then the relation cannot have two tuples with the same title and year.
Note that title by itself does not form a key, because over the years many movies have had the same title. In other words, the title alone cannot identify a movie uniquely.
2.2.7 Keys of Relations (cont'd)
We indicate the attribute(s) of the key by underlining them; in the schema below, title and year would be underlined:
Movies(title, year, length, genre)
Note that the key is a constraint on all possible instances of the relation, not just on a specific instance.
In practice, relations often use an artificial key. For example, we could add a new attribute movie_id to the Movies relation and declare it the key; by giving every movie a distinct movie_id, we make sure the key is unique for all possible tuples.
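Because a key constrains all possible instances, looking at one instance can only refute a candidate key, never prove it. The Python sketch below searches an instance for two tuples that agree on every key attribute. The data are illustrative (a film and its remake share a title), and the function name key_violation is hypothetical.

```python
SCHEMA = ("title", "year", "length", "genre")

# Illustrative instance: a film and its remake share a title but not a year.
MOVIES = [
    ("King Kong", 1933, 100, "Adventure"),
    ("King Kong", 2005, 187, "Adventure"),
    ("Star Wars", 1977, 124, "Sci-fi"),
]


def key_violation(tuples, schema, key):
    """Return two tuples that agree on every key attribute, or None if there are none."""
    idx = [schema.index(a) for a in key]
    seen = {}
    for t in tuples:
        k = tuple(t[i] for i in idx)
        if k in seen:
            return seen[k], t
        seen[k] = t
    return None


print(key_violation(MOVIES, SCHEMA, ("title",)))          # the two King Kong tuples
print(key_violation(MOVIES, SCHEMA, ("title", "year")))   # None
```

A None result only says this instance does not contradict the key; it is the schema designer, not the data, who declares {title, year} to be a key.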
2.2.8 An Example Database Schema
The database schema used throughout the book is as follows:
Movies(title:string, year:integer, length:integer, genre:string, studioName:string, producerC#:integer)
MovieStar(name:string, address:string, gender:char, birthdate:date)
2.2.8 An Example Database Schema (cont'd)
StarsIn(movieTitle:string, movieYear:integer, starName:string)
MovieExec(name:string, address:string, cert#:integer, netWorth:integer)
Studio(name:string, address:string, presC#:integer)
2.2.9 Exercises for Section 2.2
Section 2.3
DEFINING A RELATION SCHEMA IN SQL
2.3 Defining a Relation Schema in SQL
2.3.1 Relations in SQL
2.3.2 Data Types
2.3.3 Simple Table Declaration
2.3.4 Modifying Relation Schemas
2.3.5 Default Values
2.3.6 Declaring Keys
2.3.7 Exercises for Section 2.3
2.3 Defining a Relation Schema in SQL
SQL (Structured Query Language, often pronounced "sequel") is the principal language used to describe and manipulate relational databases.
There is a standard called SQL-99; most commercial databases implement something similar, but not identical, to the standard.
SQL has two sub-languages:
DDL: Data Definition Language
DML: Data Manipulation Language
2.3.1 Relations in SQL
SQL distinguishes three kinds of relations:
Stored relations, called tables: these relations exist in the database, and they are what we usually deal with.
Views: relations that are not stored but are constructed when needed.
Temporary tables: constructed temporarily by the SQL processor while it executes queries or other tasks.
In this chapter we discuss tables. Views are covered in Chapter 8, and temporary tables are never declared.
2.3.2 Data Types
All attributes must have a data type.
The primitive data types supported by SQL are:
Character strings
  CHAR(n): fixed-length string of n characters; shorter strings are padded with trailing blanks to make n characters.
  VARCHAR(n): variable-length string of up to n characters; an end marker or a stored length marks the end of the string, which saves space.
  Values longer than n are rejected with an error by the SQL standard (unless the excess characters are blanks); some DBMSs truncate them to fit instead.
Bit strings
  BIT(n): fixed-length bit string of length n.
  BIT VARYING(n): bit string of length up to n.
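2.3.2 Data Types (cont'd)
The primitive data types (cont'd):
BOOLEAN: a logical value of TRUE, FALSE, or UNKNOWN (represented by NULL).
INT or INTEGER: an integer value.
SHORTINT (spelled SMALLINT in standard SQL): a short integer; its range is implementation-dependent and no larger than that of INTEGER (typically 16 bits versus 32 bits).
FLOAT or REAL: a floating-point number.
DOUBLE PRECISION: a double-precision floating-point number.
DECIMAL(n, d): a fixed-point number of n decimal digits, d of which are to the right of the decimal point; e.g., DECIMAL(6, 2) can hold 0123.45.
NUMERIC(n, d): a synonym for DECIMAL(n, d).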
2.3.2 Data Types (cont'd)
The primitive data types (cont'd):
DATE: a date value of the form 'yyyy-mm-dd'.
TIME: a time value of the form 'HH:mm:ss' or 'HH:mm:ss.d' (d is a fraction of a second).
A date constant is written like this: DATE '2011-08-24'
A time constant is written like this: TIME '16:09:25' or TIME '16:09:25.378'
Most databases also have a TIMESTAMP data type of the form 'yyyy-mm-dd HH:mm:ss'.
2.3.3 Simple Table Declaration

The simplest form of relation declaration:
CREATE TABLE tabName(
attrib1 type,
attrib2 type,
...
attribn type);
2.3.3 Simple Table Declaration (cont'd)
Example 2.2
The relation Movies can be declared as follows:
CREATE TABLE Movies (
    title       CHAR(100),
    year        INT,
    length      INT,
    genre       CHAR(10),
    studioName  CHAR(30),
    producerC#  INT
);
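One way to try Example 2.2 is Python's built-in sqlite3 module with an in-memory SQLite database, sketched below. Two assumptions differ from the slide: producerC# is written as the quoted identifier "producerC#", since '#' is not an ordinary identifier character in SQLite, and SQLite records but does not enforce the declared CHAR lengths.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "producerC#" is double-quoted so that SQLite accepts the '#' in the name.
conn.execute("""
    CREATE TABLE Movies (
        title        CHAR(100),
        year         INT,
        length       INT,
        genre        CHAR(10),
        studioName   CHAR(30),
        "producerC#" INT
    )
""")

# PRAGMA table_info yields one row per column: (cid, name, type, notnull, dflt_value, pk).
for cid, name, col_type, *_ in conn.execute("PRAGMA table_info(Movies)"):
    print(cid, name, col_type)
```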
2.3.3 Simple Table Declaration (cont'd)
Example 2.3
The relation MovieStar can be declared as follows:
CREATE TABLE MovieStar (
    name       CHAR(30),
    address    VARCHAR(255),
    gender     CHAR(1),
    birthdate  DATE
);
The gender attribute holds 'M' or 'F'.
2.3.4 Modifying Relation Schemas
To drop a relation R, execute the following SQL statement:
DROP TABLE R;
To alter the schema, we have several options.
To add an attribute:
ALTER TABLE R ADD attrib type;
To drop an attribute:
ALTER TABLE R DROP attrib;
(Some DBMSs accept several ADD or DROP clauses in a single ALTER TABLE statement; the exact syntax varies.)
2.3.4 Modifying Relation Schemas (cont'd)
Example 2.4
Add an attribute phone to MovieStar:
ALTER TABLE MovieStar ADD phone CHAR(16);
Note that phone will be NULL for all existing tuples.
Drop the birthdate attribute:
ALTER TABLE MovieStar DROP birthdate;
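The claim that a newly added attribute is NULL for existing tuples can be observed with Python's sqlite3 module, as in this sketch (the inserted tuple is illustrative). Only ADD is shown, because SQLite supports dropping a column only in relatively recent versions (3.35.0 and later), which an installed Python may not bundle.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MovieStar (name CHAR(30), address VARCHAR(255), "
             "gender CHAR(1), birthdate DATE)")
# An existing tuple (illustrative values; unspecified attributes are NULL).
conn.execute("INSERT INTO MovieStar (name, gender) VALUES ('Carrie Fisher', 'F')")

conn.execute("ALTER TABLE MovieStar ADD phone CHAR(16)")

# The new attribute is NULL (None in Python) for the tuple that already existed.
print(conn.execute("SELECT name, phone FROM MovieStar").fetchall())  # [('Carrie Fisher', None)]
```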
2.3.5 Default Values
When we insert or modify a tuple, we sometimes do not have values for some attributes and wish to assign default values to them.
To assign a default value to attrib1, we use the following syntax:
CREATE TABLE tabName(
attrib1 type DEFAULT defaultValue,
...
attribn type);
2.3.5 Default Values (cont'd)
Example 2.5
Assign the default value '?' for gender and the default value '0000-00-00' for birthdate:
CREATE TABLE MovieStar (
    name       CHAR(30),
    address    VARCHAR(255),
    gender     CHAR(1) DEFAULT '?',
    birthdate  DATE DEFAULT DATE '0000-00-00'
);
Note that '0000-00-00' is not a valid date in standard SQL, and some DBMSs reject it.
We can also assign a default value when altering a schema:
ALTER TABLE MovieStar ADD phone CHAR(16) DEFAULT 'unlisted';
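The sketch below exercises both kinds of default from Example 2.5 using Python's sqlite3 module: one declared in CREATE TABLE and one added with ALTER TABLE, which also fills in tuples that already exist. The birthdate default is left out on the assumption that SQLite does not accept the DATE '...' literal form; the inserted name is illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE MovieStar (
        name      CHAR(30),
        address   VARCHAR(255),
        gender    CHAR(1) DEFAULT '?',
        birthdate DATE
    )
""")
# Only name is supplied; gender falls back to its declared default.
conn.execute("INSERT INTO MovieStar (name) VALUES ('Harrison Ford')")

# A default given in ALTER TABLE also applies to the tuple that already exists.
conn.execute("ALTER TABLE MovieStar ADD phone CHAR(16) DEFAULT 'unlisted'")

print(conn.execute("SELECT name, gender, phone FROM MovieStar").fetchall())
```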
2.3.6 Declaring Keys

There are two ways to declare an attribute or a
set of attributes to be a key:


Method 1:
CREATE TABLE tabName(
attrib1 type PRIMARY KEY,
...
attribn type);
Method 2:
CREATE TABLE tabName(
attrib1 type,
...
attribn type,
PRIMARY KEY(attrib1,...,attribk));
2.3.6 Declaring Keys (cont'd)
Note that if the key consists of several attributes, we must use Method 2; if the key is a single attribute, either method can be used.
Two declarations may be used to indicate a key:
PRIMARY KEY
UNIQUE
Both forbid two tuples from agreeing on all the attributes of the key. The differences are that with PRIMARY KEY none of the key attributes may be NULL, whereas with UNIQUE NULL values are permitted, and that a table can have only one PRIMARY KEY but several UNIQUE declarations.
2.3.6 Declaring Keys (cont'd)
Example 2.6
Declare the name attribute as the primary key of the MovieStar relation:
CREATE TABLE MovieStar (
    name       CHAR(30) PRIMARY KEY,
    address    VARCHAR(255),
    gender     CHAR(1) DEFAULT '?',
    birthdate  DATE DEFAULT DATE '0000-00-00'
);
2.3.6 Declaring Keys (cont'd)
Example 2.6 (cont'd)
Alternatively, we can declare the key separately:
CREATE TABLE MovieStar (
    name       CHAR(30),
    address    VARCHAR(255),
    gender     CHAR(1) DEFAULT '?',
    birthdate  DATE DEFAULT DATE '0000-00-00',
    PRIMARY KEY (name)
);
Note that UNIQUE can replace PRIMARY KEY in either version.
2.3.6 Declaring Keys (cont'd)
Example 2.7
Declare the title and year attributes together as the primary key of the Movies relation:
CREATE TABLE Movies (
    title       CHAR(100),
    year        INT,
    length      INT,
    genre       CHAR(10),
    studioName  CHAR(30),
    producerC#  INT,
    PRIMARY KEY (title, year)
);
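The effect of the two-attribute key in Example 2.7 can be seen with Python's sqlite3 module, as sketched below with illustrative values: a repeated title is accepted, but a repeated (title, year) pair raises sqlite3.IntegrityError. Only the duplicate check is shown, because SQLite (for historical reasons) permits NULL in most PRIMARY KEY columns, which departs from the SQL rule stated above; "producerC#" is quoted as SQLite requires.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Movies (
        title CHAR(100), year INT, length INT, genre CHAR(10),
        studioName CHAR(30), "producerC#" INT,
        PRIMARY KEY (title, year)
    )
""")
insert = "INSERT INTO Movies (title, year, length) VALUES (?, ?, ?)"
conn.execute(insert, ("King Kong", 1933, 100))   # illustrative values
conn.execute(insert, ("King Kong", 2005, 187))   # same title, different year: allowed

rejected = False
try:
    conn.execute(insert, ("King Kong", 2005, 200))  # same (title, year): violates the key
except sqlite3.IntegrityError as err:
    rejected = True
    print("rejected:", err)
```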
2.3.7 Exercises for Section 2.3
Section 2.4
AN ALGEBRAIC QUERY LANGUAGE
2.4 An Algebraic Query Language
2.4.1 Why Do We Need a Special Query Language?
2.4.2 What is an Algebra?
2.4.3 Overview of Relational Algebra
2.4.4 Set Operations on Relations
2.4.5 Projection
2.4.6 Selection
2.4.7 Cartesian Product
2.4.8 Natural Joins
2.4.9 Theta-Joins
2.4 An Algebraic Query Language (cont'd)
2.4.10 Combining Operations to Form Queries
2.4.11 Naming and Renaming
2.4.12 Relationships Among Operations
2.4.13 A Linear Notation for Algebraic Expressions
2.4.14 Exercises for Section 2.4
2.4 An Algebraic Query Language
A DBMS needs a way to query data and to modify data.
We begin our study of operations on relations with a special algebra called relational algebra.
Relational algebra was used directly by some early DBMS prototypes, but it is not the query language of current commercial DBMSs.
However, the real query language, SQL, incorporates relational algebra at its core: a DBMS typically translates an SQL query into relational algebra, or a very similar internal representation, and optimizes the query in that form.
2.4.1 Why Do We Need a Special Query Language?
Why don't we simply use Java or C to retrieve the data we need?
For example, we could represent a tuple by a Java object and a relation by an array of such objects. What would be the problem?
Surprisingly, relational algebra is useful precisely because it is less powerful than Java or C.
Being less powerful brings two important advantages: queries are easier to write, and the compiler can produce highly optimized code for them.
2.4.2 What is an Algebra?
In general, an algebra consists of operators and atomic operands.
In arithmetic, the atomic operands are variables such as x and y and constants such as 10, and the operators are the usual arithmetic operators +, -, *, and /.
Any algebra allows us to build expressions by combining operators and atomic operands.
Relational algebra is another example of an algebra. Its atomic operands are variables that stand for relations and constants, which are finite relations. The operators are covered in the following subsections.
2.4.3 Overview of Relational Algebra
The operations fall into four classes:
1. Set operations: union, intersection, and difference.
2. Operations that remove parts of a relation:
   Selection eliminates some tuples.
   Projection eliminates some attributes.
3. Operations that combine the tuples of two relations:
   Cartesian product pairs the tuples of two relations in all possible ways.
   Various kinds of joins (covered later) pair tuples selectively.
4. Renaming: changes the relation schema without changing the tuples.
We refer to expressions of relational algebra as queries.
2.4.4 Set Operations on Relations
The three most common operations on sets are:
Union: R ∪ S is the set of elements that are in R or S or both.
Intersection: R ∩ S is the set of elements that are in both R and S.
Difference: R − S is the set of elements that are in R but not in S.
Note that an element appears in a set only once; duplicates are not allowed.
When we apply these operations to relations, R and S must satisfy some conditions.
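2.4.4 Set Operations on Relations (cont'd)
The conditions on R and S:
R and S must have schemas with identical sets of attributes, and the type (domain) of each attribute must be the same in R and S.
The order of the attributes matters: before the operation is computed, the columns of R and S must be ordered so that the attributes appear in the same order in both.
If the attribute names differ but the types are the same, we can rename the attributes temporarily with the renaming operator.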
2.4.4 Set Operations on Relations (cont'd)
Example 2.8
Given relations R and S as follows, compute R ∪ S, R ∩ S, and R − S.

Relation R
Name          | Gender | Birthdate
Carrie Fisher | F      | 9/9/99
Mark Hamill   | M      | 8/8/88

Relation S
Name          | Gender | Birthdate
Carrie Fisher | F      | 9/9/99
Harrison Ford | M      | 7/7/77

2.4.4 Set Operations on Relations (cont'd)
Example 2.8 (cont'd)

R ∪ S
Name          | Gender | Birthdate
Carrie Fisher | F      | 9/9/99
Mark Hamill   | M      | 8/8/88
Harrison Ford | M      | 7/7/77

R ∩ S
Name          | Gender | Birthdate
Carrie Fisher | F      | 9/9/99

R − S
Name          | Gender | Birthdate
Mark Hamill   | M      | 8/8/88
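Since relations are sets of tuples, Example 2.8 maps directly onto Python's built-in set operators, as in the sketch below. It assumes both relations already list their components in the same attribute order (Name, Gender, Birthdate), which is exactly the schema condition stated for these operations.

```python
# Both relations use the attribute order (Name, Gender, Birthdate).
R = {("Carrie Fisher", "F", "9/9/99"), ("Mark Hamill", "M", "8/8/88")}
S = {("Carrie Fisher", "F", "9/9/99"), ("Harrison Ford", "M", "7/7/77")}

print(sorted(R | S))   # union: Carrie Fisher appears only once
print(sorted(R & S))   # intersection
print(sorted(R - S))   # difference
```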
2.4.5 Projection
The projection operator produces a new relation that has only some of the attributes of the original relation.
The projection operator in relational algebra is written:
π A1, A2, ..., An (R)
This operator applies to the relation R and produces a new relation with only the attributes A1, A2, …, An of R. In other words, the schema of the new relation has the following set of attributes:
{A1, A2, …, An}
2.4.5 Projection (cont'd)
Example 2.9
Given the relation Movies, project it onto its first three attributes: π title, year, length (Movies).

Movies
Title         | Year | Length | Genre  | studioName | producerC#
Star Wars     | 1977 | 124    | sciFi  | Fox        | 12345
Galaxy Quest  | 1999 | 104    | Comedy | DreamWorks | 67890
Wayne's World | 1992 | 95     | Comedy | Paramount  | 99999

The result relation
Title         | Year | Length
Star Wars     | 1977 | 124
Galaxy Quest  | 1999 | 104
Wayne's World | 1992 | 95

2.4.5 Projection (cont'd)
Example 2.9 (cont'd)
Project the Genre attribute: π genre (Movies).
The result relation
Genre
sciFi
Comedy
Note that in the relational algebra of sets, duplicate tuples are always eliminated. That is why the result contains one 'Comedy' tuple instead of two.
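A minimal Python sketch of projection over the Movies data of Example 2.9, assuming a relation is an attribute tuple plus a set of value tuples (the function name project is our own). Collecting the projected tuples into a set is what removes the duplicate ('Comedy',) tuple.

```python
SCHEMA = ("title", "year", "length", "genre", "studioName", "producerC#")
MOVIES = {
    ("Star Wars", 1977, 124, "sciFi", "Fox", 12345),
    ("Galaxy Quest", 1999, 104, "Comedy", "DreamWorks", 67890),
    ("Wayne's World", 1992, 95, "Comedy", "Paramount", 99999),
}


def project(schema, tuples, attrs):
    """pi_attrs(R): keep only the listed attributes of every tuple."""
    idx = [schema.index(a) for a in attrs]
    # Collecting the results in a set drops duplicate tuples, as the set-based algebra requires.
    return tuple(attrs), {tuple(t[i] for i in idx) for t in tuples}


print(project(SCHEMA, MOVIES, ("title", "year", "length")))
print(project(SCHEMA, MOVIES, ("genre",)))   # two tuples, not three
```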
2.4.6 Selection
The selection operator, applied to a relation R, produces a new relation with a subset of R's tuples.
The tuples in the resulting relation are those that satisfy some condition C involving the attributes of R.
The selection operator is denoted by:
σC (R)
The schema of the resulting relation is the same as R's schema.
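2.4.6 Selection (cont'd)
The operands in condition C are either constants or attributes of R.
We apply C to each tuple t of R by substituting, for each attribute A appearing in C, the component of t for attribute A. If C is then true, t is in the result σC (R); otherwise it is not.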
2.4.6 Selection (cont'd)
Example 2.10
Given the Movies relation as follows, find σ length >= 100 (Movies).

Title         | Year | Length | Genre  | studioName | producerC#
Star Wars     | 1977 | 124    | sciFi  | Fox        | 12345
Galaxy Quest  | 1999 | 104    | Comedy | DreamWorks | 67890
Wayne's World | 1992 | 95     | Comedy | Paramount  | 99999

2.4.6 Selection (cont'd)
Example 2.10 (cont'd)
The first two tuples satisfy the condition, so the result relation is:

Title         | Year | Length | Genre  | studioName | producerC#
Star Wars     | 1977 | 124    | sciFi  | Fox        | 12345
Galaxy Quest  | 1999 | 104    | Comedy | DreamWorks | 67890
2.4.6 Selection (cont'd)
Example 2.11
The Movies relation is given below.
Find the set of tuples that represent Fox movies at least 100 minutes long.
So we are looking for:
σ length >= 100 AND studioName = 'Fox' (Movies)

Title         | Year | Length | Genre  | studioName | producerC#
Star Wars     | 1977 | 124    | sciFi  | Fox        | 12345
Galaxy Quest  | 1999 | 104    | Comedy | DreamWorks | 67890
Wayne's World | 1992 | 95     | Comedy | Paramount  | 99999

2.4.6 Selection (cont'd)
Example 2.11 (cont'd)
The result is:

Title         | Year | Length | Genre  | studioName | producerC#
Star Wars     | 1977 | 124    | sciFi  | Fox        | 12345
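The substitution rule for selection can be sketched in Python by turning each tuple into an attribute-to-value mapping and evaluating the condition on it. The relation encoding and the name select are our own; the conditions are those of Examples 2.10 and 2.11.

```python
SCHEMA = ("title", "year", "length", "genre", "studioName", "producerC#")
MOVIES = {
    ("Star Wars", 1977, 124, "sciFi", "Fox", 12345),
    ("Galaxy Quest", 1999, 104, "Comedy", "DreamWorks", 67890),
    ("Wayne's World", 1992, 95, "Comedy", "Paramount", 99999),
}


def select(schema, tuples, condition):
    """sigma_C(R): keep the tuples for which condition is true; the schema is unchanged."""
    # dict(zip(schema, t)) substitutes t's components for the attribute names used in C.
    return schema, {t for t in tuples if condition(dict(zip(schema, t)))}


# Example 2.10: sigma_{length >= 100}(Movies)
_, long_movies = select(SCHEMA, MOVIES, lambda t: t["length"] >= 100)
# Example 2.11: sigma_{length >= 100 AND studioName = 'Fox'}(Movies)
_, fox_long = select(SCHEMA, MOVIES,
                     lambda t: t["length"] >= 100 and t["studioName"] == "Fox")
print(sorted(long_movies))
print(fox_long)
```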
2.4.7 Cartesian Product
The Cartesian product (or cross product, or just product) of two sets R and S is the set of pairs that can be formed by choosing the first element of the pair to be any element of R and the second to be any element of S.
This product is denoted by R X S.
In relational algebra, the sets are relations and the elements are tuples; each pair of tuples is concatenated into one longer tuple whose attributes are those of R followed by those of S.
2.4.7 Cartesian Product (cont'd)
Example 2.12

Relation R
A | B
1 | 2
3 | 4

Relation S
B | C  | D
2 | 5  | 6
4 | 7  | 8
9 | 10 | 11

Result R X S
A | R.B | S.B | C  | D
1 | 2   | 2   | 5  | 6
1 | 2   | 4   | 7  | 8
1 | 2   | 9   | 10 | 11
3 | 4   | 2   | 5  | 6
3 | 4   | 4   | 7  | 8
3 | 4   | 9   | 10 | 11

Note that attribute B appears in both schemas, so in the result the two copies are written R.B and S.B to disambiguate them.
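The product of Example 2.12 can be sketched in Python with itertools.product, concatenating each R tuple with each S tuple. The function name cross and the R./S. prefixing scheme are our own choices, modeled on the R.B and S.B names in the result table.

```python
from itertools import product

R_SCHEMA, R = ("A", "B"), {(1, 2), (3, 4)}
S_SCHEMA, S = ("B", "C", "D"), {(2, 5, 6), (4, 7, 8), (9, 10, 11)}


def cross(r_schema, r, s_schema, s, r_name="R", s_name="S"):
    """R X S: concatenate every tuple of R with every tuple of S."""
    shared = set(r_schema) & set(s_schema)
    # Qualify attribute names that occur in both schemas (B becomes R.B and S.B).
    schema = (tuple(f"{r_name}.{a}" if a in shared else a for a in r_schema)
              + tuple(f"{s_name}.{a}" if a in shared else a for a in s_schema))
    return schema, {rt + st for rt, st in product(r, s)}


schema, rs = cross(R_SCHEMA, R, S_SCHEMA, S)
print(schema)
print(sorted(rs))   # 2 x 3 = 6 tuples
```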
2.4.8 Natural Joins
More often than not, the cross product is not what we want. Usually we want to pair only those tuples that match in some way.
The simplest way is the natural join of two relations R and S.
In this join we pair only those tuples that agree on all the attributes common to the schemas of R and S.
The natural join is denoted by:
R ⋈ S
A tuple that fails to pair with any tuple of the other relation is said to be a dangling tuple.
2.4.8 Natural Joins (cont'd)
Example 2.13

Relation R
A | B
1 | 2
3 | 4

Relation S
B | C  | D
2 | 5  | 6
4 | 7  | 8
9 | 10 | 11

Result R ⋈ S
A | B | C | D
1 | 2 | 5 | 6
3 | 4 | 7 | 8

Note that attribute B is in both schemas, and since the two B components must be equal in every result tuple, one copy of B is enough in the result. The S tuple (9, 10, 11) is dangling.
2.4.8 Natural Joins (cont'd)
Example 2.14

Relation U
A | B | C
1 | 2 | 3
6 | 7 | 8
9 | 7 | 8

Relation V
B | C | D
2 | 3 | 4
2 | 3 | 5
7 | 8 | 10

Result U ⋈ V
A | B | C | D
1 | 2 | 3 | 4
1 | 2 | 3 | 5
6 | 7 | 8 | 10
9 | 7 | 8 | 10

Note that attributes B and C are in both schemas, and since their values must be equal in every result tuple, one copy of each is enough in the result.
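A straightforward nested-loop sketch of the natural join in Python, run on the relations of Examples 2.13 and 2.14. The encoding (attribute tuple plus set of value tuples) and the function name natural_join are our own; a real DBMS would use far more efficient join algorithms, but the result is the same.

```python
def natural_join(r_schema, r, s_schema, s):
    """R natural-join S: pair tuples that agree on all shared attributes; keep one copy of each."""
    common = [a for a in r_schema if a in s_schema]
    s_only = [a for a in s_schema if a not in r_schema]
    result = set()
    for rt in r:
        r_row = dict(zip(r_schema, rt))
        for st in s:
            s_row = dict(zip(s_schema, st))
            if all(r_row[a] == s_row[a] for a in common):
                result.add(rt + tuple(s_row[a] for a in s_only))
    return tuple(r_schema) + tuple(s_only), result


# Example 2.13: the S tuple (9, 10, 11) is dangling and does not appear in the result.
_, rs = natural_join(("A", "B"), {(1, 2), (3, 4)},
                     ("B", "C", "D"), {(2, 5, 6), (4, 7, 8), (9, 10, 11)})
print(sorted(rs))

# Example 2.14: B and C are both shared attributes.
U = {(1, 2, 3), (6, 7, 8), (9, 7, 8)}
V = {(2, 3, 4), (2, 3, 5), (7, 8, 10)}
schema, uv = natural_join(("A", "B", "C"), U, ("B", "C", "D"), V)
print(schema, sorted(uv))
```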
2.4.9 Theta-Joins
The natural join pairs tuples in only one way: by equating the shared attributes.
It is sometimes desirable to pair tuples from two relations on some other basis.
Historically, "theta" refers to an arbitrary condition, written θ; we use C rather than θ for the condition.
The theta-join is denoted by:
R ⋈C S
Its result is obtained by taking the product R X S and keeping only the tuples that satisfy C.
2.4.9 Theta-Joins (cont'd)
Example 2.15

Relation U
A | B | C
1 | 2 | 3
6 | 7 | 8
9 | 7 | 8

Relation V
B | C | D
2 | 3 | 4
2 | 3 | 5
7 | 8 | 10

Result: U ⋈A<D V
A | U.B | U.C | V.B | V.C | D
1 | 2   | 3   | 2   | 3   | 4
1 | 2   | 3   | 2   | 3   | 5
1 | 2   | 3   | 7   | 8   | 10
6 | 7   | 8   | 7   | 8   | 10
9 | 7   | 8   | 7   | 8   | 10
2.4.9 Theta-Joins (cont'd)
Example 2.16

Relation U
A | B | C
1 | 2 | 3
6 | 7 | 8
9 | 7 | 8

Relation V
B | C | D
2 | 3 | 4
2 | 3 | 5
7 | 8 | 10

Result: U ⋈A<D AND U.B <> V.B V
A | U.B | U.C | V.B | V.C | D
1 | 2   | 3   | 7   | 8   | 10
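Following the definition of the theta-join as a product followed by a selection, this Python sketch reproduces Examples 2.15 and 2.16. The function name theta_join, the U./V. prefixes for shared attributes, and writing conditions as Python lambdas are all our own assumptions.

```python
from itertools import product

U_SCHEMA, U = ("A", "B", "C"), {(1, 2, 3), (6, 7, 8), (9, 7, 8)}
V_SCHEMA, V = ("B", "C", "D"), {(2, 3, 4), (2, 3, 5), (7, 8, 10)}


def theta_join(r_schema, r, s_schema, s, cond, r_name="U", s_name="V"):
    """R join_C S: pair every tuple of R with every tuple of S, keep pairs satisfying cond."""
    shared = set(r_schema) & set(s_schema)
    schema = (tuple(f"{r_name}.{a}" if a in shared else a for a in r_schema)
              + tuple(f"{s_name}.{a}" if a in shared else a for a in s_schema))
    result = set()
    for rt, st in product(r, s):
        t = rt + st
        if cond(dict(zip(schema, t))):   # evaluate C on the combined tuple
            result.add(t)
    return schema, result


# Example 2.15: condition A < D
schema, ex215 = theta_join(U_SCHEMA, U, V_SCHEMA, V, lambda t: t["A"] < t["D"])
# Example 2.16: condition A < D AND U.B <> V.B
_, ex216 = theta_join(U_SCHEMA, U, V_SCHEMA, V,
                      lambda t: t["A"] < t["D"] and t["U.B"] != t["V.B"])
print(schema)
print(sorted(ex215))
print(ex216)
```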
2.4.10 Combining Operations to Form Queries
Relational algebra, like all other algebras, allows us to form complex expressions by applying operators to the results of other operations.
We construct expressions of relational algebra by applying operators to subexpressions, using parentheses when necessary to indicate the grouping of operands.
It is also possible to represent expressions as expression trees.
2.4.10 Combining Operations to Form Queries (cont'd)
Example 2.17
What are the titles and years of movies made by Fox that are at least 100 minutes long?
One solution is:
1. Select those Movies tuples that have length >= 100.
2. Select those Movies tuples that have studioName = 'Fox'.
3. Compute the intersection of (1) and (2).
4. Project the relation from (3) onto the attributes title and year.
2.4.10 Combining Operations to Form Queries (cont'd)
Example 2.17 (cont'd)
Here is the suggested expression tree:

                π title, year
                      |
                      ∩
                   /     \
     σ length >= 100     σ studioName = 'Fox'
            |                    |
          Movies               Movies
2.4.10 Combining Operations to Form Queries (cont'd)
Example 2.17 (cont'd)
Alternatively, we could write the same expression in a linear notation as follows:
π title,year (σ length>=100 (Movies) ∩ σ studioName='Fox' (Movies))
There is usually more than one expression for the same query. For instance, the following expression produces the same result, but more efficiently. Can you say why?
π title,year (σ length>=100 AND studioName='Fox' (Movies))
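To check that the two expressions of Example 2.17 really agree, this Python sketch evaluates both on the Movies data used in the selection examples. The small select and project helpers are our own stand-ins for σ and π over sets of tuples, not a DBMS API.

```python
SCHEMA = ("title", "year", "length", "genre", "studioName", "producerC#")
MOVIES = {
    ("Star Wars", 1977, 124, "sciFi", "Fox", 12345),
    ("Galaxy Quest", 1999, 104, "Comedy", "DreamWorks", 67890),
    ("Wayne's World", 1992, 95, "Comedy", "Paramount", 99999),
}


def select(tuples, cond):
    """sigma: keep tuples of Movies-shaped data that satisfy cond."""
    return {t for t in tuples if cond(dict(zip(SCHEMA, t)))}


def project(tuples, attrs):
    """pi: keep only the named attributes (duplicates vanish in the set)."""
    idx = [SCHEMA.index(a) for a in attrs]
    return {tuple(t[i] for i in idx) for t in tuples}


# Expression-tree version: two selections, an intersection, then a projection.
tree_version = project(
    select(MOVIES, lambda t: t["length"] >= 100)
    & select(MOVIES, lambda t: t["studioName"] == "Fox"),
    ("title", "year"),
)
# Single-selection version with a compound condition.
single_version = project(
    select(MOVIES, lambda t: t["length"] >= 100 and t["studioName"] == "Fox"),
    ("title", "year"),
)
print(tree_version, single_version)
```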
2.4.11 Naming and Renaming




Sometimes we need to change the relation's
name or change its attributes names.
The following operator renames the relation R
to S and renames the attributes as well:
ρ S(A1, A2, …, An ) (R)
Note that the resulting relation has the same
tuples. In other words, the renaming operator
does not change the relation's contents.
If we only want to change the relation's name and keep its attribute names, we omit the attribute list:
ρ S (R)
2.4.11 Naming and Renaming (cont'd)
Example 2.18
This is the same as Example 2.12, but it uses the renaming operator to avoid ambiguity between the attributes: the B attribute of S is renamed X.

Relation R
A | B
1 | 2
3 | 4

Relation S
B | C  | D
2 | 5  | 6
4 | 7  | 8
9 | 10 | 11

Result: R X ρ S(X, C, D) (S)
A | B | X | C  | D
1 | 2 | 2 | 5  | 6
1 | 2 | 4 | 7  | 8
1 | 2 | 9 | 10 | 11
3 | 4 | 2 | 5  | 6
3 | 4 | 4 | 7  | 8
3 | 4 | 9 | 10 | 11
2.4.11 Naming and Renaming
(cont'd)
Example 2.18 (cont'd)
Alternatively, we could make the product first
and then rename the attributes as follows:
ρ RS(A, B, X, C, D) (R X S)
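A minimal Python sketch of the renaming operator, assuming a relation is modeled as a (name, schema, set of tuples) triple; the function name rename is hypothetical. It reproduces R X ρ S(X, C, D) (S) from Example 2.18 and shows that only the names change while the tuples are untouched.

```python
from itertools import product


def rename(relation, new_name, new_attrs=None):
    """rho_{S(A1,...,An)}(R): new relation name and, optionally, new attribute names."""
    _, schema, tuples = relation
    if new_attrs is None:            # rho_S(R): rename only the relation
        new_attrs = schema
    if len(new_attrs) != len(schema):
        raise ValueError("need exactly one new name per attribute")
    return new_name, tuple(new_attrs), tuples   # the tuples themselves are unchanged


# Relations are modeled here as (name, schema, set of tuples).
R = ("R", ("A", "B"), {(1, 2), (3, 4)})
S = ("S", ("B", "C", "D"), {(2, 5, 6), (4, 7, 8), (9, 10, 11)})

# After renaming, R and S share no attribute names, so no R./S. prefixes are needed.
S_renamed = rename(S, "S", ("X", "C", "D"))
product_schema = R[1] + S_renamed[1]
product_tuples = {rt + st for rt, st in product(R[2], S_renamed[2])}
print(product_schema)
print(sorted(product_tuples))
```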
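2.4.12 Relationships Among Operations
Some operations can be expressed in terms of other operations.
For instance, the following identity holds:
R ∩ S = R − (R − S)
Also, the theta-join can be expressed by cross product and selection as follows:
R ⋈C S = σC (R X S)
Likewise, the natural join can be expressed by cross product, selection, and projection:
R ⋈ S = πL (σC (R X S))
where C is the condition that equates each pair of attributes of R and S with the same name (R.A = S.A), and L is the list of the attributes of R followed by those attributes of S that are not also attributes of R.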
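These identities can be spot-checked in Python on small example sets; the sketch below uses illustrative sets for the intersection identity and the relations of Example 2.13 for the natural join. Positions in the product tuples are hard-coded for this particular schema, so it is a check of the identities on one instance, not a general implementation.

```python
from itertools import product

# Identity 1: R ∩ S = R − (R − S), for relations with the same schema (illustrative sets).
X = {(1, 2), (3, 4), (5, 6)}
Y = {(3, 4), (5, 6), (7, 8)}
assert X & Y == X - (X - Y)

# Natural join as pi_L(sigma_C(R X S)), using the relations of Example 2.13.
R = {(1, 2), (3, 4)}                          # schema (A, B)
S = {(2, 5, 6), (4, 7, 8), (9, 10, 11)}       # schema (B, C, D)
cross = {r + s for r, s in product(R, S)}     # components: A, R.B, S.B, C, D
selected = {t for t in cross if t[1] == t[2]}             # sigma_C with C: R.B = S.B
joined = {(t[0], t[1], t[3], t[4]) for t in selected}     # pi_L with L: A, R.B, C, D
print(sorted(joined))   # [(1, 2, 5, 6), (3, 4, 7, 8)]
```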
2.4.13 A Linear Notation for Algebraic Expressions
2.4.14 Exercises for Section 2.4
Section 2.5
CONSTRAINTS ON RELATIONS
2.5 Constraints on Relations
2.5.1 Relational Algebra as a Constraint Language
2.5.2 Referential Integrity Constraints
2.5.3 Key Constraints
2.5.4 Additional Constraint Examples
2.5.5 Exercises for Section 2.5
2.5 Constraints on Relations
A constraint restricts the data that may be stored in a database.
So far we have seen one kind of constraint: the key.
Constraints can be expressed in relational algebra.
2.5.1 Relational Algebra as a Constraint Language
There are two ways in which we can use expressions of relational algebra to express constraints:
1. If R is an expression of relational algebra, then R = ∅ is a constraint that says “the value of R must be empty,” or equivalently, “there are no tuples in R.”
2. If R and S are expressions of relational algebra, then R ⊆ S is a constraint that says “every tuple in R must also be in S.”
The two forms are interchangeable: R ⊆ S holds exactly when R − S = ∅.
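A minimal sketch of the two constraint forms, assuming relations are Python sets of tuples; `holds_empty` and `holds_subset` are illustrative names, not from the source. The final loop checks that R ⊆ S and R − S = ∅ agree.

```python
def holds_empty(r):
    """Form 1: the constraint R = ∅ holds when R has no tuples."""
    return len(r) == 0


def holds_subset(r, s):
    """Form 2: the constraint R ⊆ S holds when every tuple of R is in S."""
    return r <= s


# Hypothetical relations with the same schema.
R = {(1, "a"), (2, "b")}
S = {(1, "a"), (2, "b"), (3, "c")}

print(holds_subset(R, S))  # True
print(holds_subset(S, R))  # False

# R ⊆ S can always be rewritten in form 1 as R − S = ∅.
for x, y in [(R, S), (S, R)]:
    assert holds_subset(x, y) == holds_empty(x - y)
```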
2.5.2 Referential Integrity Constraints
A referential integrity constraint asserts that a value appearing in one relation must also appear in another, related relation.
For instance, in the Movies database, if we see a StarsIn tuple with a person p in the starName attribute, we would expect p to appear as the name of some star in the MovieStar relation.
2.5.2 Referential Integrity Constraints (cont'd)
In general, if a value v appears as the component for attribute A of some tuple in relation R, then v must also appear as the component for a particular attribute, say B, of some tuple in another relation S.
We can express this constraint in relational algebra as:
πA(R) ⊆ πB(S)
or equivalently as:
πA(R) − πB(S) = ∅
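The general constraint can be sketched in Python, assuming tuples are stored as dicts keyed by attribute name; `project` and `dangling_values` are hypothetical helper names. Returning the difference πA(R) − πB(S) instead of a bare true/false shows which values violate the constraint; the constraint holds exactly when that set is empty.

```python
def project(rows, attrs):
    """π over the named attributes; duplicates vanish because the result is a set."""
    return {tuple(row[a] for a in attrs) for row in rows}


def dangling_values(r, a_attrs, s, b_attrs):
    """Compute πA(R) − πB(S); an empty result means the constraint holds."""
    return project(r, a_attrs) - project(s, b_attrs)


# Hypothetical relations R(A, E) and S(B, F).
R = [{"A": 1, "E": "x"}, {"A": 2, "E": "y"}, {"A": 4, "E": "z"}]
S = [{"B": 1, "F": "p"}, {"B": 2, "F": "q"}, {"B": 3, "F": "r"}]

print(dangling_values(R, ["A"], S, ["B"]))  # {(4,)}
```

Because `project` takes a list of attribute names, the same helper also covers constraints that span several attributes.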
2.5.2 Referential Integrity Constraints (cont'd)
Example 2.21
Consider the following schemas:
Movies(title, year, length, genre, studioName, producerC#)
MovieExec(name, address, cert#, netWorth)
The producer of a movie should be an executive and therefore should have a tuple in MovieExec.
Thus every producerC# value in Movies should appear as the cert# of some tuple in MovieExec:
πproducerC#(Movies) ⊆ πcert#(MovieExec)
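Applied to Example 2.21, the check might look like this in Python. The rows are hypothetical, and only the attributes the constraint mentions are filled in; the other attributes of Movies and MovieExec are omitted for brevity.

```python
# Hypothetical rows; unrelated attributes are left out for brevity.
movies = [
    {"title": "Film1", "year": 2001, "producerC#": 100},
    {"title": "Film2", "year": 2002, "producerC#": 200},
    {"title": "Film3", "year": 2003, "producerC#": 999},  # no such executive
]
movie_exec = [
    {"name": "Exec1", "cert#": 100},
    {"name": "Exec2", "cert#": 200},
]

# πproducerC#(Movies) − πcert#(MovieExec) must be empty.
producers = {m["producerC#"] for m in movies}
certs = {e["cert#"] for e in movie_exec}
violations = producers - certs

print(violations)  # {999}
```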
2.5.2 Referential Integrity Constraints (cont'd)
Example 2.22 (multi-attribute referential integrity)
Consider the following schemas:
StarsIn(movieTitle, movieYear, starName)
Movies(title, year, length, genre, studioName, producerC#)
Each (movieTitle, movieYear) pair in StarsIn must appear as the (title, year) pair of some tuple in Movies:
πmovieTitle, movieYear(StarsIn) ⊆ πtitle, year(Movies)
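For Example 2.22, the projection must keep (title, year) pairs together. The hypothetical rows below show why: 'Film1' and 2002 each appear somewhere in Movies, yet the pair ('Film1', 2002) does not, so checking the two attributes separately would miss the violation.

```python
# Hypothetical rows; only the attributes used by the constraint are shown.
stars_in = [
    {"movieTitle": "Film1", "movieYear": 2001, "starName": "Star1"},
    {"movieTitle": "Film1", "movieYear": 2002, "starName": "Star2"},
]
movies = [
    {"title": "Film1", "year": 2001},
    {"title": "Film2", "year": 2002},
]

# πmovieTitle, movieYear(StarsIn) ⊆ πtitle, year(Movies), checked on pairs.
pairs_in_stars = {(s["movieTitle"], s["movieYear"]) for s in stars_in}
pairs_in_movies = {(m["title"], m["year"]) for m in movies}
pair_violations = pairs_in_stars - pairs_in_movies

# Checking each attribute on its own finds nothing wrong.
titles_ok = {s["movieTitle"] for s in stars_in} <= {m["title"] for m in movies}
years_ok = {s["movieYear"] for s in stars_in} <= {m["year"] for m in movies}

print(pair_violations)         # {('Film1', 2002)}
print(titles_ok and years_ok)  # True
```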
2.5.3 Key Constraints
Example 2.23
Consider the following schema:
MovieStar(name, address, gender, birthdate)
The attribute name is the key of this relation.
That is, if two tuples have the same name, then they must also have the same address, gender, and birthdate.
To express this constraint in relational algebra, we take the Cartesian product of the relation with itself:
2.5.3 Key Constraints (cont'd)
Example 2.23 (cont'd)
σMS1.name = MS2.name AND MS1.address <> MS2.address(MS1 × MS2) = ∅
Here MS1 and MS2 are two copies of MovieStar, obtained with the renaming operators
ρMS1(MovieStar)
ρMS2(MovieStar)
so that references to the two copies are unambiguous.
This expression enforces only that name determines address; analogous expressions are needed for gender and birthdate.
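A minimal Python sketch of the key check, with MovieStar stored as a set of (name, address, gender, birthdate) tuples; the rows are hypothetical. The self-product ranges over pairs (ms1, ms2), which play the roles of MS1 and MS2, and the constraint holds when the selection returns no pairs.

```python
from itertools import product

# Hypothetical MovieStar tuples: (name, address, gender, birthdate).
movie_star = {
    ("Star1", "Addr1", "F", "1970-01-01"),
    ("Star2", "Addr2", "M", "1980-02-02"),
}


def key_violations(rel):
    """σMS1.name = MS2.name AND MS1.address <> MS2.address(MS1 × MS2)."""
    return {
        (ms1, ms2)
        for ms1, ms2 in product(rel, repeat=2)
        if ms1[0] == ms2[0] and ms1[1] != ms2[1]
    }


print(key_violations(movie_star))  # set()

# A second Star1 tuple with a different address breaks the constraint.
bad = movie_star | {("Star1", "Addr9", "F", "1970-01-01")}
print(len(key_violations(bad)))  # 2: the offending pair appears in both orders
```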
2.5.4 Additional Constraint Examples
Example 2.24
Consider the following schema:
MovieStar(name, address, gender, birthdate)
Suppose we wish to specify that the only legal values for the gender attribute are 'F' and 'M'.
We can express this constraint in relational algebra as:
σgender <> 'F' AND gender <> 'M'(MovieStar) = ∅
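The selection in Example 2.24 translates directly into a filter; the constraint holds when nothing survives it. The tuples below are hypothetical, with gender at index 2 of (name, address, gender, birthdate).

```python
# Hypothetical MovieStar tuples: (name, address, gender, birthdate).
movie_star = {
    ("Star1", "Addr1", "F", "1970-01-01"),
    ("Star2", "Addr2", "M", "1980-02-02"),
    ("Star3", "Addr3", "?", "1990-03-03"),
}


def gender_violations(rel):
    """σgender <> 'F' AND gender <> 'M'(MovieStar)."""
    return {t for t in rel if t[2] != "F" and t[2] != "M"}


violations = gender_violations(movie_star)
print(violations)      # {('Star3', 'Addr3', '?', '1990-03-03')}
print(not violations)  # False: the constraint "= ∅" does not hold
```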
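2.5.4 Additional Constraint Examples (cont'd)
Example 2.25
Consider the following schemas:
MovieExec(name, address, cert#, netWorth)
Studio(name, address, presC#)
Suppose we wish to require that one must have a net worth of at least $10,000,000 to be the president of a movie studio.
The constraint can be expressed as:
σnetWorth < 10000000(Studio ⋈presC# = cert# MovieExec) = ∅
or, alternatively, by requiring every president's certificate number to be among those of executives worth at least $10,000,000:
πpresC#(Studio) ⊆ πcert#(σnetWorth ≥ 10000000(MovieExec))
The two forms agree whenever every presC# value also appears as a cert# in MovieExec; otherwise the second form is stronger, since it also rejects a president who has no MovieExec tuple.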
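Both formulations of Example 2.25 can be sketched in Python on hypothetical data, with Studio as (name, address, presC#) tuples and MovieExec as (name, address, cert#, netWorth) tuples. StudioC's president has no MovieExec tuple: the join-based form silently drops that studio, while the containment form reports it.

```python
LIMIT = 10_000_000

# Hypothetical rows. Studio: (name, address, presC#).
studio = {
    ("StudioA", "AddrA", 100),
    ("StudioB", "AddrB", 200),
    ("StudioC", "AddrC", 300),  # president 300 has no MovieExec tuple
}
# MovieExec: (name, address, cert#, netWorth).
movie_exec = {
    ("Exec1", "Addr1", 100, 25_000_000),
    ("Exec2", "Addr2", 200, 5_000_000),  # below the required net worth
}

# Form 1: σnetWorth < LIMIT(Studio ⋈presC# = cert# MovieExec) = ∅
# Joined tuples are (name, address, presC#, name, address, cert#, netWorth).
join = {s + e for s in studio for e in movie_exec if s[2] == e[2]}
form1_violations = {t for t in join if t[6] < LIMIT}

# Form 2: πpresC#(Studio) ⊆ πcert#(σnetWorth ≥ LIMIT(MovieExec))
presidents = {s[2] for s in studio}
rich_certs = {e[2] for e in movie_exec if e[3] >= LIMIT}
form2_violations = presidents - rich_certs

print(len(form1_violations))     # 1 (StudioB's president)
print(sorted(form2_violations))  # [200, 300]
```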
2.5.5 Exercises for Section 2.5
2.6 Summary of Chapter 2
2.7 References for Chapter 2