Relational Algebra
Spring 2012
Instructor: Hassan Khosravi
Querying relational databases
• Lecture given by Dr. Widom on querying relational models
2.1 An Overview of Data Models
• 2.1.1 What is a Data Model?
• 2.1.2 Important Data Models
• 2.1.3 The Relational Model in Brief
• 2.1.4 The Semi-structured Model in Brief
• 2.1.5 Other Data Models
• 2.1.6 Comparison of Modeling Approaches
2.1.1 What is a Data Model?
A data model is a notation for describing data or information. It maps a
part of the real world to a mathematical model and consists of three parts:
1. Structure of the data (for example, tuples).
2. Operations on the data: queries to retrieve information and
modifications to change it.
3. Constraints on the data: for example, year has to be an integer and
name has to be a string.

Important data models
• The relational model
• The semi-structured data model (XML)
Relational Model in Brief
title                year   length   genre
Gone with the Wind   1939   231      drama
Star Wars            1977   124      sciFi
Wayne’s World        1992   95       comedy

• The relational model is based on tables.
• Operations: query, modify.
• Constraints: for example, year is an integer between 1930 and 2012.
• The structure may appear to resemble an array of structs in C, where
the column headers are the field names and each row represents the
values of one struct in the array.
• The distinction is in the scale of relations:
  – Relations are not normally implemented as main-memory structures.
  – An implementation must take into account that relations are accessed
  on the hard drive.
The Semi-structured Model in Brief
• Semi-structured data resembles trees or graphs rather than tables or
arrays.
• Operations usually involve following paths in the tree.
  – Example: find the movies with the comedy genre.
• Constraints often involve the data types of values associated with a tag.
  – Example: values associated with the Length tag are integers.

<Movies>
  <Movie title="Gone with the Wind">
    <Year>1939</Year>
    <Length>231</Length>
    <Genre>drama</Genre>
  </Movie>
  <Movie title="Star Wars">
    <Year>1977</Year>
    <Length>124</Length>
    <Genre>sciFi</Genre>
  </Movie>
  <Movie title="Wayne's World">
    <Year>1992</Year>
    <Length>95</Length>
    <Genre>comedy</Genre>
  </Movie>
</Movies>
Comparison of Modeling Approaches
• Semi-structured models are more flexible than relations. However,
the relational model is still preferred in DBMSs, for two reasons:
1. Efficiency of access to data and efficiency of modifications to that
data are more important than flexibility.
2. Ease of use is more important than flexibility.
• SQL enables programmers to express their wishes at a very high
level. Its strongly limited set of operations can be optimized to run
very fast.
Basics of the Relational Model
title                year   length   genre
Gone with the Wind   1939   231      drama
Star Wars            1977   124      sciFi
Wayne’s World        1992   95       comedy

• Attributes: the columns of a relation are named by attributes.
• Schema: the name of the relation and the set of its attributes.
  – Movies(title, year, length, genre)
• Tuples: the rows of a relation, other than the header row.
• Domains: each attribute has a domain of values, and the value of each
attribute must be atomic (it cannot be a structure).
Equivalent Representations of a Relation
Relations are sets of tuples, not lists of tuples, so the order of tuples
does not matter. The attributes can be reordered too, as long as each value
stays under its attribute.

title                year   length   genre
Gone with the Wind   1939   231      drama
Star Wars            1977   124      sciFi
Wayne’s World        1992   95       comedy

year   genre    title                length
1977   sciFi    Star Wars            124
1992   comedy   Wayne’s World        95
1939   drama    Gone with the Wind   231
How many different ways can we present the given relation?
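One way to answer the question, assuming two presentations count as different whenever they differ in the order of the rows or the order of the columns, is to multiply the number of row orderings by the number of column orderings. The variable names below are illustrative only.

```python
from math import factorial

rows, columns = 3, 4  # the Movies instance above: 3 tuples, 4 attributes

# Any ordering of the rows can be combined with any ordering of the columns,
# and every combination presents the same relation.
presentations = factorial(rows) * factorial(columns)
print(presentations)  # 3! * 4! = 6 * 24 = 144
```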
Relation Instances and Keys
• A set of tuples for a given relation is called an instance of that
relation. The instance of a relation is expected to change over time.
  – Example: new movies are added to the table.
• It is less common for the schema of a relation to change. If a new
attribute is added to the schema, it is hard to supply a value for it in
every existing tuple.
• Keys of relations
  – Key constraint: a set of attributes forms a key if we do not allow
  two tuples in a relation instance to have the same values in all of
  those attributes.
  – We indicate the attributes that form a key by underlining them.
  – A key must hold for all possible instances of a relation, not just
  for a specific instance.
  – Example: in Movies(title, year, length, genre), title and year
  together form the key. Genre is not a key.
• What if our data does not have a key?
  – Generate an artificial ID, such as a student number.
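The point that a key must hold for every instance, not just the current one, can be illustrated with a small check. This is a sketch only: `violates_key` is a hypothetical helper, and the fourth row (Galaxy Quest) is added here just to produce a second comedy.

```python
def violates_key(instance, key_attrs):
    """Return True if two tuples of the instance agree on every attribute in key_attrs."""
    seen = set()
    for row in instance:
        key = tuple(row[a] for a in key_attrs)
        if key in seen:
            return True
        seen.add(key)
    return False

movies = [
    {"title": "Gone with the Wind", "year": 1939, "length": 231, "genre": "drama"},
    {"title": "Star Wars", "year": 1977, "length": 124, "genre": "sciFi"},
    {"title": "Wayne's World", "year": 1992, "length": 95, "genre": "comedy"},
]

# In this particular instance every genre is distinct, so genre *looks* like a key.
print(violates_key(movies, ["genre"]))           # False

# A later instance breaks it, which is why genre cannot be declared a key.
later = movies + [{"title": "Galaxy Quest", "year": 1999, "length": 104, "genre": "comedy"}]
print(violates_key(later, ["genre"]))            # True
print(violates_key(later, ["title", "year"]))    # False
```

A check like this can only refute a candidate key on a given instance; declaring a key is a statement about all instances the schema will ever hold.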
Database Schema about Movies
Movies(
    title : string,
    year : integer,
    length : integer,
    genre : string,
    studioName : string,
    producerC# : integer
)
MovieExec(
    name : string,
    address : string,
    cert# : integer,
    netWorth : integer
)
Studio(
    name : string,
    address : string,
    presC# : integer
)
MovieStar(
    name : string,
    address : string,
    gender : char,
    birthdate : date
)
StarsIn(
    movieTitle : string,
    movieYear : integer,
    starName : string
)
Defining a Relation Schema in SQL
• 2.3.1 Relations in SQL
• 2.3.2 Data Types
• 2.3.3 Simple Table Declarations
• 2.3.4 Modifying Relation Schemas
• 2.3.5 Default Values
• 2.3.6 Declaring Keys
• 2.3.7 Exercises for Section 2.3
2.3.1 Relations in SQL
• SQL (also pronounced "sequel") is the principal language used to
describe and manipulate relational databases.
• SQL makes a distinction between three kinds of relations:
  – Stored relations (tables): relations that exist in the database and
  that we can query and modify.
  – Views: relations defined by a computation. They are not stored
  but constructed when needed. We can query them (Chapter 8).
  – Temporary tables: constructed by the SQL language processor while it
  processes queries, for example during optimization. These are neither
  stored nor seen by the user.
Data Types
• CHAR(n): a fixed-length string of up to n characters. Shorter values are
padded with blanks.
  – 'foo' stored as CHAR(5) is 'foo  '.
• VARCHAR(n): a variable-length string of up to n characters.
  – 'foo' stored as VARCHAR(5) is 'foo'.
• BIT(n), BIT VARYING(n): fixed-length and variable-length strings of up
to n bits.
• BOOLEAN: TRUE, FALSE and, although it would surprise George Boole,
UNKNOWN.
• INT or INTEGER: typical integer values.
• FLOAT or REAL: typical real values.
• DECIMAL(n,d): n decimal digits with the decimal point d positions from
the right. A DECIMAL(6,2) value could be 0123.45.
• DATE and TIME: essentially character strings with constraints on their
form.
2.3.3 Simple Table Declarations
Movies(
    title : string,
    year : integer,
    length : integer,
    genre : string,
    studioName : string,
    producerC# : integer
)

CREATE TABLE Movies (
    title VARCHAR(255),
    year INTEGER,
    length INTEGER,
    genre CHAR(10),
    studioName CHAR(50),
    producerC# INTEGER
);

MovieStar(
    name : string,
    address : string,
    gender : char,
    birthdate : date
)

CREATE TABLE MovieStar (
    name CHAR(30),
    address VARCHAR2(50),
    gender CHAR(6),
    birthdate DATE
);
Modifying Relation Schemas
• We can delete a table R with the following SQL command:
  – DROP TABLE R;
• We can modify a table with the ALTER TABLE command:
  – ALTER TABLE MovieStar ADD phone CHAR(16);
  – ALTER TABLE MovieStar DROP birthdate;
• Default values
  – Use the character '?' as the default for an unknown gender.
  – Use the earliest possible date, DATE '0000-00-00', for an unknown
  birthdate.
  – gender CHAR(1) DEFAULT '?',
  – birthdate DATE DEFAULT DATE '0000-00-00',
  – ALTER TABLE MovieStar ADD phone CHAR(16) DEFAULT 'unlisted';
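The effect of the default values can be tried out with Python's built-in sqlite3 module. This is only a sketch under assumptions: SQLite stands in for whatever DBMS the course uses, the birthdate column is left out because the DATE '0000-00-00' literal is not SQLite syntax, and the inserted row is illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE MovieStar (
        name    CHAR(30),
        address VARCHAR(255),
        gender  CHAR(1) DEFAULT '?'
    )
""")

# No gender is supplied, so the declared default '?' is stored.
conn.execute(
    "INSERT INTO MovieStar (name, address) "
    "VALUES ('Carrie Fisher', '123 Maple St., Hollywood')"
)

# Existing tuples receive the default of the newly added attribute.
conn.execute("ALTER TABLE MovieStar ADD phone CHAR(16) DEFAULT 'unlisted'")

row = conn.execute("SELECT name, gender, phone FROM MovieStar").fetchone()
print(row)  # ('Carrie Fisher', '?', 'unlisted')
```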
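2.3.6 Declaring Keys
• There are two ways to declare keys in a CREATE TABLE statement, shown in
the two declarations below: as part of an attribute's declaration, or as a
separate element of the schema. A key is declared with PRIMARY KEY or
UNIQUE:
  – Attributes of a PRIMARY KEY cannot be NULL.
  – Attributes declared UNIQUE can be NULL.
  – Replace PRIMARY KEY with UNIQUE in the examples to get the
  corresponding UNIQUE declarations.
CREATE TABLE MovieStar (
    name CHAR(30) PRIMARY KEY,
    address VARCHAR(255),
    gender CHAR(1),
    birthdate DATE
);
CREATE TABLE MovieStar (
    name CHAR(30),
    address VARCHAR(255),
    gender CHAR(1),
    birthdate DATE,
    PRIMARY KEY (name)
);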
Example 2.7
• The relation Movies, whose key is the pair of attributes title and year,
must be declared like this:
CREATE TABLE Movies (
    title CHAR(100),
    year INTEGER,
    length INTEGER,
    genre CHAR(10),
    studioName CHAR(30),
    producerC# INTEGER,
    PRIMARY KEY (title, year)
);
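A two-attribute key can be exercised with Python's sqlite3 module. The sketch below makes some assumptions: SQLite is used only for illustration, the column producerC# is renamed producerC because SQLite does not accept # in an unquoted identifier, and the inserted rows (including a made-up 2077 film) are invented test data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Movies (
        title      CHAR(100),
        year       INTEGER,
        length     INTEGER,
        genre      CHAR(10),
        studioName CHAR(30),
        producerC  INTEGER,
        PRIMARY KEY (title, year)
    )
""")

conn.execute("INSERT INTO Movies VALUES ('Star Wars', 1977, 124, 'sciFi', 'Fox', 12345)")
# Same title, different year: accepted, because the key is the pair (title, year).
conn.execute("INSERT INTO Movies VALUES ('Star Wars', 2077, 100, 'sciFi', 'Fox', 12345)")

try:
    # Same title and same year as the first row: rejected by the key constraint.
    conn.execute("INSERT INTO Movies VALUES ('Star Wars', 1977, 90, 'drama', 'Fox', 1)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM Movies").fetchone()[0]
print(duplicate_rejected, row_count)  # True 2
```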
Quick summary
• Lecture given by Dr. Widom on Relational Model definition
2.4 An Algebraic Query Language
• 2.4.1 Why Do We Need a Special Query Language?
• 2.4.2 What is an Algebra?
• 2.4.3 Overview of Relational Algebra
• 2.4.4 Set Operations on Relations
• 2.4.5 Projection
• 2.4.6 Selection
• 2.4.7 Cartesian Product
• 2.4.8 Natural Joins
• 2.4.9 Theta-Joins
• 2.4.10 Combining Operations to Form Queries
• 2.4.11 Naming and Renaming
• 2.4.12 Relationships Among Operations
• 2.4.13 A Linear Notation for Algebraic Expressions
• 2.4.14 Exercises for Section 2.4
Why Do We Need a Special Query Language?
• Why not just use C or Java instead of introducing relational algebra?
  – Relational algebra is useful because it is less powerful than C and
  Java. Databases are one of the few areas where non-Turing-complete
  languages make sense.
  – For example, relational algebra cannot determine whether the number
  of tuples in a relation is odd or even.
• Being less powerful is helpful because it gives:
  – Ease of programming
  – Ease of compilation
  – Ease of optimization
Projection
• The projection operator, applied to a relation R, produces a new
relation with a subset of R’s columns.
• Duplicate tuples are eliminated.

Relation Movies
title           year   length   genre    studioName   producerC#
Star Wars       1977   124      sciFi    Fox          12345
Galaxy Quest    1999   104      comedy   DreamWorks   67890
Wayne’s World   1992   95       comedy   Paramount    99999

∏ title, year, length (Movies)
title           year   length
Star Wars       1977   124
Galaxy Quest    1999   104
Wayne’s World   1992   95

∏ genre (Movies)
genre
sciFi
comedy
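The two projections can be reproduced with a few lines of Python. This is a sketch, not how a DBMS implements projection: `project` is a hypothetical helper, and a relation is modelled as a list of dictionaries keyed by attribute name.

```python
# Movies instance from the slide; each tuple is a dict keyed by attribute name.
movies = [
    {"title": "Star Wars", "year": 1977, "length": 124,
     "genre": "sciFi", "studioName": "Fox", "producerC#": 12345},
    {"title": "Galaxy Quest", "year": 1999, "length": 104,
     "genre": "comedy", "studioName": "DreamWorks", "producerC#": 67890},
    {"title": "Wayne's World", "year": 1992, "length": 95,
     "genre": "comedy", "studioName": "Paramount", "producerC#": 99999},
]

def project(relation, attrs):
    """Keep only the listed attributes; building a set drops duplicate tuples."""
    return {tuple(row[a] for a in attrs) for row in relation}

titles = project(movies, ["title", "year", "length"])  # three tuples
genres = project(movies, ["genre"])                    # two tuples: comedy appears once
print(sorted(genres))  # [('comedy',), ('sciFi',)]
```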
Selection and Projection
• Lecture given by Dr. Widom on selection and projection
2.4.6 Selection
• The selection operator, applied to a relation R, produces a new relation
with a subset of R’s tuples.

Relation Movies
title           year   length   genre    studioName   producerC#
Star Wars       1977   124      sciFi    Fox          12345
Galaxy Quest    1999   104      comedy   DreamWorks   67890
Wayne’s World   1992   95       comedy   Paramount    99999

σ length >= 100 (Movies)
title           year   length   genre    studioName   producerC#
Star Wars       1977   124      sciFi    Fox          12345
Galaxy Quest    1999   104      comedy   DreamWorks   67890
Example for Selection
• The set of tuples in the relation Movies that represent Fox movies at
least 100 minutes long, for the same instance of Movies as in the previous
selection example:

σ length >= 100 AND studioName = 'Fox' (Movies)
title           year   length   genre    studioName   producerC#
Star Wars       1977   124      sciFi    Fox          12345
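Selection keeps whole tuples that satisfy a condition. A minimal Python sketch, assuming tuples in the attribute order (title, year, length, genre, studioName, producerC#) and a hypothetical helper `select` that takes the condition as a function:

```python
# Positions: 0 title, 1 year, 2 length, 3 genre, 4 studioName, 5 producerC#
movies = [
    ("Star Wars", 1977, 124, "sciFi", "Fox", 12345),
    ("Galaxy Quest", 1999, 104, "comedy", "DreamWorks", 67890),
    ("Wayne's World", 1992, 95, "comedy", "Paramount", 99999),
]

def select(relation, condition):
    """Return the tuples of the relation for which the condition is true."""
    return [row for row in relation if condition(row)]

long_movies = select(movies, lambda m: m[2] >= 100)
long_fox_movies = select(movies, lambda m: m[2] >= 100 and m[4] == "Fox")

print([m[0] for m in long_movies])      # ['Star Wars', 'Galaxy Quest']
print([m[0] for m in long_fox_movies])  # ['Star Wars']
```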
2.4.7 Cartesian Product
• The Cartesian product of two sets R and S is the set of pairs that can
be formed by choosing the first element from R and the second from
S.

Relation R
A   B
1   2
3   4

Relation S
B   C    D
2   5    6
4   7    8
9   10   11

Relation R × S
A   R.B   S.B   C    D
1   2     2     5    6
1   2     4     7    8
1   2     9     10   11
3   4     2     5    6
3   4     4     7    8
3   4     9     10   11

• If R and S have some attributes in common, we need to invent new
names for the identical attributes (here R.B and S.B).
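The product table can be generated with `itertools.product`. In this sketch relations are plain sets of tuples, so the attribute names (including the invented names R.B and S.B) live only in the comments.

```python
from itertools import product

R = {(1, 2), (3, 4)}                       # attributes A, B
S = {(2, 5, 6), (4, 7, 8), (9, 10, 11)}    # attributes B, C, D

# Every tuple of R is paired with every tuple of S and the two are concatenated.
# Attributes of the result: A, R.B, S.B, C, D
RxS = {r + s for r, s in product(R, S)}

print(len(RxS))     # 6, that is |R| * |S|
for row in sorted(RxS):
    print(row)
```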
Cartesian Product
• Lecture given by Dr. Widom on duplicates, cross product
2.4.8 Natural Joins
• The natural join of two relations R and S is the set of pairs of tuples
that agree on whatever attributes are common to the schemas of R and S.
  – Let A1, A2, …, An be the attributes in both R and S. A tuple r from R
  and a tuple s from S are successfully paired if and only if r and s
  agree on A1, A2, …, An.
  – Each common attribute appears only once in the result.

Relation R
A   B
1   2
3   4

Relation S
B   C    D
2   5    6
4   7    8
9   10   11

Relation R ⋈ S
A   B   C   D
1   2   5   6
3   4   7   8
Example for Natural Join
• A more complicated example for natural join

Relation U
A   B   C
1   2   3
6   7   8
9   7   8

Relation V
B   C   D
2   3   4
2   3   5
7   8   10

Result U ⋈ V
A   B   C   D
1   2   3   4
1   2   3   5
6   7   8   10
9   7   8   10
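The pairing rule can be written down directly. This is a nested-loop sketch for illustration, not an efficient join algorithm; `natural_join` is a hypothetical helper and tuples are modelled as dictionaries so that common attributes can be found by name.

```python
def natural_join(r_rows, s_rows):
    """Pair tuples that agree on all attributes the two relations share."""
    result = []
    for r in r_rows:
        for s in s_rows:
            common = r.keys() & s.keys()
            if all(r[a] == s[a] for a in common):
                # Merging the dicts keeps each common attribute once.
                result.append({**r, **s})
    return result

U = [dict(A=1, B=2, C=3), dict(A=6, B=7, C=8), dict(A=9, B=7, C=8)]
V = [dict(B=2, C=3, D=4), dict(B=2, C=3, D=5), dict(B=7, C=8, D=10)]

joined = natural_join(U, V)
for t in joined:
    print(t["A"], t["B"], t["C"], t["D"])
```

Note that one tuple of U can pair with several tuples of V (the first U tuple pairs with two), and several tuples of U can pair with the same tuple of V.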
• Lecture given by Dr. Widom on Natural Join
Theta-Joins
• It is sometimes desirable to pair tuples on a condition other than
equality of all the common attributes. The notation for a theta-join of
relations R and S based on condition C is R ⋈C S.
  – The result is constructed as follows:
    – Take the product of R and S.
    – Select the tuples that satisfy C.

Relation U
A   B   C
1   2   3
6   7   8
9   7   8

Relation V
B   C   D
2   3   4
2   3   5
7   8   10

U ⋈ A<D V
A   U.B   U.C   V.B   V.C   D
1   2     3     2     3     4
1   2     3     2     3     5
1   2     3     7     8     10
6   7     8     7     8     10
9   7     8     7     8     10
Example on Theta-Joins
• A theta-join of U and V with a more complex condition:
  – For a successful pairing we require not only that the A
  component of the U-tuple be less than the D component of the V-tuple,
  but also that the two tuples disagree on their respective B
  components.
• U and V are the same relations as in the previous theta-join example.

U ⋈ A<D AND U.B<>V.B V
A   U.B   U.C   V.B   V.C   D
1   2     3     7     8     10
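Both theta-joins follow the "product, then select" construction. A small Python sketch, with `theta_join` as a hypothetical helper and attribute positions noted in comments:

```python
from itertools import product

U = {(1, 2, 3), (6, 7, 8), (9, 7, 8)}      # attributes A, B, C
V = {(2, 3, 4), (2, 3, 5), (7, 8, 10)}     # attributes B, C, D

def theta_join(r, s, condition):
    """Take the product of r and s, then keep the tuples satisfying the condition."""
    return {u + v for u, v in product(r, s) if condition(u + v)}

# Positions in a combined tuple: 0 A, 1 U.B, 2 U.C, 3 V.B, 4 V.C, 5 D
a_lt_d = theta_join(U, V, lambda t: t[0] < t[5])
a_lt_d_and_b_differs = theta_join(U, V, lambda t: t[0] < t[5] and t[1] != t[3])

print(len(a_lt_d))            # 5
print(a_lt_d_and_b_differs)   # {(1, 2, 3, 7, 8, 10)}
```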
Combining Operations to Form Queries
• Example: “What are the titles and years of movies made by Fox that
are at least 100 minutes long?”
• As an expression tree: the root ∏ title,year is applied to the
intersection (∩) of σ length >= 100 (Movies) and
σ studioName = 'Fox' (Movies). In linear form:
∏ title,year (σ length >= 100 (Movies) ∩ σ studioName = 'Fox' (Movies))
• An equivalent expression with a single selection:
∏ title,year (σ length >= 100 AND studioName = 'Fox' (Movies))
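That the two expressions give the same answer can be checked on the small Movies instance used in the selection examples. A sketch with relations as sets of tuples; the variable names are illustrative.

```python
# Positions: 0 title, 1 year, 2 length, 3 genre, 4 studioName, 5 producerC#
movies = {
    ("Star Wars", 1977, 124, "sciFi", "Fox", 12345),
    ("Galaxy Quest", 1999, 104, "comedy", "DreamWorks", 67890),
    ("Wayne's World", 1992, 95, "comedy", "Paramount", 99999),
}

# Two selections, intersected, then projected onto (title, year).
long_movies = {m for m in movies if m[2] >= 100}
fox_movies = {m for m in movies if m[4] == "Fox"}
via_intersection = {(m[0], m[1]) for m in long_movies & fox_movies}

# One selection with the combined condition, then the same projection.
via_single_selection = {(m[0], m[1]) for m in movies if m[2] >= 100 and m[4] == "Fox"}

print(via_intersection)                          # {('Star Wars', 1977)}
print(via_intersection == via_single_selection)  # True
```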
Relational algebra
• An algebra in general consists of operators and atomic operands.
  – In the algebra of arithmetic, the operands are variables and
  constants and the operators are +, -, * and /.
• Any algebra allows us to build expressions by applying operators to
operands and to other expressions.
  – Example: (x+y)/z
• In relational algebra the operands are relations, such as R and S below.

Relation R
name            address                       gender   birthdate
Carrie Fisher   123 Maple St., Hollywood      F        9/9/99
Mark Hamill     456 Oak Rd., Brentwood        M        8/8/88

Relation S
name            address                       gender   birthdate
Carrie Fisher   123 Maple St., Hollywood      F        9/9/99
Harrison Ford   789 Palm Dr., Beverly Hills   M        7/7/77
Operations of relational algebra
• Union (R ∪ S): the set of elements that are in R, in S, or in both. An
element that is in both appears only once in the union.

R ∪ S, for the relations R and S shown earlier
name            address                       gender   birthdate
Carrie Fisher   123 Maple St., Hollywood      F        9/9/99
Mark Hamill     456 Oak Rd., Brentwood        M        8/8/88
Harrison Ford   789 Palm Dr., Beverly Hills   M        7/7/77
Operations of relational algebra
• Intersection (R ∩ S): the set of elements that are in both R and S.
Each such element appears only once in the intersection.

R ∩ S, for the relations R and S shown earlier
name            address                       gender   birthdate
Carrie Fisher   123 Maple St., Hollywood      F        9/9/99
Operations of relational algebra
• Difference (R – S): the set of elements that are in R and not in S.
Each such element appears only once in the difference.

R – S, for the relations R and S shown earlier
name            address                       gender   birthdate
Mark Hamill     456 Oak Rd., Brentwood        M        8/8/88
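Because relations are sets of tuples, the three set operations map directly onto Python's set operators. A sketch using the two MovieStar-style relations R and S from the slides; the variable names are illustrative.

```python
fisher = ("Carrie Fisher", "123 Maple St., Hollywood", "F", "9/9/99")
hamill = ("Mark Hamill", "456 Oak Rd., Brentwood", "M", "8/8/88")
ford = ("Harrison Ford", "789 Palm Dr., Beverly Hills", "M", "7/7/77")

R = {fisher, hamill}
S = {fisher, ford}

union = R | S          # tuples in R, in S, or in both; fisher appears once
intersection = R & S   # tuples in both R and S
difference = R - S     # tuples in R but not in S

print(len(union))                      # 3
print([t[0] for t in intersection])    # ['Carrie Fisher']
print([t[0] for t in difference])      # ['Mark Hamill']
```

The operations only make sense when R and S have the same schema, as they do here.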
• Lecture given by Dr. Widom on union, difference, intersection
2.4.11 Naming and Renaming
• The renaming operator ρ explicitly renames the attributes of a relation.
• ρ S(A1, A2, …, An) (R) results in a relation S that has exactly the same
tuples as R, but whose attributes are named A1, A2, …, An, starting
from the leftmost attribute.

Relation R
A   B
1   2
3   4

Relation S
B   C    D
2   5    6
4   7    8
9   10   11

R × ρ S(X,C,D) (S)
A   B   X   C    D
1   2   2   5    6
1   2   4   7    8
1   2   9   10   11
3   4   2   5    6
3   4   4   7    8
3   4   9   10   11
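Renaming changes attribute names and nothing else. In the sketch below tuples are dictionaries, `rename` is a hypothetical helper, and renaming B to X in S is what lets the product be formed without an attribute name clash.

```python
def rename(rows, old_attrs, new_attrs):
    """Return the same tuples with attributes renamed position by position."""
    mapping = dict(zip(old_attrs, new_attrs))
    return [{mapping[a]: v for a, v in row.items()} for row in rows]

R = [dict(A=1, B=2), dict(A=3, B=4)]
S = [dict(B=2, C=5, D=6), dict(B=4, C=7, D=8), dict(B=9, C=10, D=11)]

# rho_S(X, C, D)(S): the attribute B of S becomes X, so it no longer clashes with R.B.
S_renamed = rename(S, ["B", "C", "D"], ["X", "C", "D"])

# R x rho_S(X, C, D)(S): merge every pair of tuples; no key is overwritten.
r_times_s = [{**r, **s} for r in R for s in S_renamed]

print(len(r_times_s))   # 6
print(r_times_s[0])     # {'A': 1, 'B': 2, 'X': 2, 'C': 5, 'D': 6}
```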
• Lecture given by Dr. Widom on Renaming
Relationships Among Operations
• Intersection can be expressed using difference.
  – R ∩ S = R – (R – S)
  – See video
• A theta-join can be expressed by product and selection.
  – R ⋈C S = σC (R × S)
• A natural join can be rewritten using product, selection and projection.
  – Example: U ⋈ V = ∏ A, U.B, U.C, D (σ U.B = V.B AND U.C = V.C (U × V))

Relation U
A   B   C
1   2   3
6   7   8
9   7   8

Relation V
B   C   D
2   3   4
2   3   5
7   8   10

Result U ⋈ V
A   B   C   D
1   2   3   4
1   2   3   5
6   7   8   10
9   7   8   10

• These are the only redundancies: the remaining six operators (union,
difference, selection, projection, product, renaming) form an independent
set.
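Two of these identities can be checked on small examples in Python. This is a sanity check on particular instances, not a proof; relations are sets, and the positions of attributes in the product are given in comments.

```python
from itertools import product

# Intersection expressed through difference, on two sets of star names.
R = {"Carrie Fisher", "Mark Hamill"}
S = {"Carrie Fisher", "Harrison Ford"}
intersection_matches = (R & S) == R - (R - S)

# Natural join expressed through product, selection and projection.
U = {(1, 2, 3), (6, 7, 8), (9, 7, 8)}      # attributes A, B, C
V = {(2, 3, 4), (2, 3, 5), (7, 8, 10)}     # attributes B, C, D

u_times_v = {u + v for u, v in product(U, V)}   # 0 A, 1 U.B, 2 U.C, 3 V.B, 4 V.C, 5 D
selected = {t for t in u_times_v if t[1] == t[3] and t[2] == t[4]}
natural = {(t[0], t[1], t[2], t[5]) for t in selected}   # project onto A, U.B, U.C, D

print(intersection_matches)  # True
print(sorted(natural))       # [(1, 2, 3, 4), (1, 2, 3, 5), (6, 7, 8, 10), (9, 7, 8, 10)]
```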
2.5 Constraints on Relations
• 2.5.1 Relational Algebra as a Constraint Language
• 2.5.2 Referential Integrity Constraints
• 2.5.3 Key Constraints
• 2.5.4 Additional Constraint Examples
• 2.5.5 Exercises for Section 2.5
• 2.6 Summary of Chapter 2
• 2.7 References for Chapter 2
Referential Integrity Constraints
• A referential integrity constraint asserts that a value appearing in one
context also appears in another, related context.
• Every movie mentioned in StarsIn must appear in Movies:
  – StarsIn(movieTitle, movieYear, starName)
  – Movies(title, year, length, genre, studioName, producerC#)
  – ∏ movieTitle, movieYear (StarsIn) ⊆ ∏ title, year (Movies)
• Every producer of a movie must appear in MovieExec:
  – Movies(title, year, length, genre, studioName, producerC#)
  – MovieExec(name, address, cert#, netWorth)
  – ∏ producerC# (Movies) ⊆ ∏ cert# (MovieExec)
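The subset condition between the two projections is easy to test on data. In this sketch `referential_integrity_holds` is a hypothetical helper, Movies is already projected onto (title, year), and the StarsIn rows, including the dangling one, are illustrative.

```python
# pi_{title, year}(Movies), written out directly for brevity
movie_keys = {("Star Wars", 1977), ("Galaxy Quest", 1999), ("Wayne's World", 1992)}

# StarsIn(movieTitle, movieYear, starName)
stars_in = {
    ("Star Wars", 1977, "Carrie Fisher"),
    ("Star Wars", 1977, "Mark Hamill"),
    ("Star Wars", 1977, "Harrison Ford"),
}

def referential_integrity_holds(stars_in, movie_keys):
    """Check pi_{movieTitle, movieYear}(StarsIn) is a subset of pi_{title, year}(Movies)."""
    referenced = {(title, year) for title, year, _star in stars_in}
    return referenced <= movie_keys

print(referential_integrity_holds(stars_in, movie_keys))   # True

# A StarsIn tuple that mentions a movie missing from Movies violates the constraint.
dangling = stars_in | {("Unknown Film", 2000, "Some Actor")}
print(referential_integrity_holds(dangling, movie_keys))   # False
```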
Key Constraints
• Recall that name is the key for the relation
  – MovieStar(name, address, gender, birthdate)
• The requirement that no two tuples agree on name but differ on address
can be expressed by the algebraic expression
  – σ MS1.name = MS2.name AND MS1.address ≠ MS2.address (MS1 × MS2) = ∅
• MS1 in the product MS1 × MS2 is shorthand for the renaming
  – ρ MS1(name, address, gender, birthdate) (MovieStar)
  and MS2 is defined in the same way.
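The algebraic key constraint says that a certain selection over MovieStar paired with itself is empty. A Python sketch of that check, with `key_violations` as a hypothetical helper and one invented conflicting tuple:

```python
from itertools import product

def key_violations(movie_star):
    """Pairs from MS1 x MS2 with the same name but a different address."""
    return {
        (t1, t2)
        for t1, t2 in product(movie_star, repeat=2)
        if t1[0] == t2[0] and t1[1] != t2[1]
    }

# MovieStar(name, address, gender, birthdate)
stars = {
    ("Carrie Fisher", "123 Maple St., Hollywood", "F", "9/9/99"),
    ("Mark Hamill", "456 Oak Rd., Brentwood", "M", "8/8/88"),
}
print(key_violations(stars) == set())   # True: the constraint holds

# Invented second address for the same name: the selection is no longer empty.
conflicting = stars | {("Mark Hamill", "1 Invented Ave.", "M", "8/8/88")}
print(len(key_violations(conflicting)))  # 2, the offending pair in both orders
```

The expression on the slide only compares addresses; a full statement of the key constraint would compare gender and birthdate in the same way.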
Example 2.24
• The only legal values for the gender attribute are 'F' and 'M'. We can
express this constraint on the gender attribute of MovieStar
algebraically by:
  – σ gender ≠ 'F' AND gender ≠ 'M' (MovieStar) = ∅
Example 2.25
• Suppose one must have a net worth of at least $100,000,000 to be
president of a movie studio. The relations involved are:
  – MovieExec(name, address, cert#, netWorth)
  – Studio(name, address, presC#)
• First way: perform a theta-join of the two relations and require that no
president has a net worth below the limit.
  – σ netWorth < 100000000 (Studio ⋈ presC# = cert# MovieExec) = ∅
• Second way: require that every president is among the executives whose
net worth reaches the limit.
  – ∏ presC# (Studio) ⊆ ∏ cert# (σ netWorth ≥ 100000000 (MovieExec))
• Which one is more efficient?
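The two formulations of Example 2.25 can be compared on a toy instance. Everything in this sketch is illustrative: the executives, studios and addresses are invented, and `first_way` and `second_way` are hypothetical helper names. The two checks agree here because every presC# appears as a cert# in MovieExec.

```python
THRESHOLD = 100_000_000

# MovieExec(name, address, cert#, netWorth); invented rows
movie_exec = {
    ("Exec A", "Address 1", 101, 250_000_000),
    ("Exec B", "Address 2", 102, 50_000_000),
}

def first_way(studio, execs):
    """sigma_{netWorth < limit}(Studio join_{presC# = cert#} MovieExec) is empty."""
    joined = {s + e for s in studio for e in execs if s[2] == e[2]}
    # In a joined tuple, netWorth sits at position 3 (Studio width) + 3 = 6.
    return not any(t[6] < THRESHOLD for t in joined)

def second_way(studio, execs):
    """pi_{presC#}(Studio) is a subset of pi_{cert#}(sigma_{netWorth >= limit}(MovieExec))."""
    presidents = {s[2] for s in studio}
    rich_enough = {e[2] for e in execs if e[3] >= THRESHOLD}
    return presidents <= rich_enough

# Studio(name, address, presC#); invented rows
good_studio = {("Studio X", "Address 3", 101)}
bad_studio = {("Studio Y", "Address 4", 102)}

print(first_way(good_studio, movie_exec), second_way(good_studio, movie_exec))  # True True
print(first_way(bad_studio, movie_exec), second_way(bad_studio, movie_exec))    # False False
```

The first way builds a join before filtering, while the second filters MovieExec first and then compares two single-column sets, which is one angle from which to think about the efficiency question.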
Summary of Relational Algebra
• Lecture given by Dr. Widom on Relational Model