CS 405G: Introduction to Database Systems Lecture 4: Relational Model Instructor: Chen Qian

Review

A data model is a group of concepts for describing data.
What are the two terms used by the ER model to describe a miniworld?
Entity
Relationship
What makes a good conceptual database design?
7/1/2016
Today’s Outline

Relational Model

Relational Model and Relational Database Schemas


Informal definition, not so formal, and formal
Relational Model Constraints
Why Study the Relational Model?


Most widely used model.
“Legacy systems” in older models


e.g., IBM’s IMS
Object-oriented concepts merged in

“Object-Relational” model


Early work done in POSTGRES research project at Berkeley
XML features in most relational systems


Can export XML interfaces
Can embed XML inside relational fields
Historically

The model was first proposed by Dr. E.F. Codd of IBM in
1970 in the following paper:
"A Relational Model for Large Shared Data Banks,"
Communications of the ACM, June 1970.
The above paper caused a major revolution in the field of Database
management and earned Ted Codd the coveted ACM Turing Award.
Database Design
Relational Model Concepts


Relational database: a set of relations.
Relation: made up of 2 parts:
Schema: specifies the name of the relation, plus the name and type of each attribute.
E.g. Students(sid: string, name: string, login: string, age: integer, gpa: real)
Instance: a table, with rows and columns.
#rows = cardinality
#fields = degree / arity
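The two parts of a relation can be sketched in Python. This is only an illustration: the tuple-based layout and the helper name `conforms` are assumptions made for the sketch, and the rows are the Students rows that appear later in the lecture.

```python
# Schema: relation name plus (attribute name, type) pairs, in order.
students_schema = (
    "Students",
    (("sid", str), ("name", str), ("login", str), ("age", int), ("gpa", float)),
)

# Instance: a set of rows that must conform to the schema.
students_instance = {
    ("53666", "Jones", "jones@cs", 18, 3.4),
    ("53688", "Smith", "smith@eecs", 18, 3.2),
    ("53650", "Smith", "smith@math", 19, 3.8),
}


def conforms(row, schema):
    """True if the row has exactly one value of the declared type per attribute."""
    _, attributes = schema
    return len(row) == len(attributes) and all(
        isinstance(value, typ) for value, (_, typ) in zip(row, attributes)
    )


cardinality = len(students_instance)  # number of rows
degree = len(students_schema[1])      # number of fields (arity)
```

The schema stays fixed while rows are added to or removed from the instance; `cardinality` changes with the instance, `degree` only with the schema.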
Relation

RELATION: A table of values



A relation may be thought of as a set of rows (table view).
Each row represents a fact that corresponds to a real-world entity or relationship.
Each row has a value of an item or set of items that uniquely identifies that row in the table.
Relation



Sometimes row-ids or sequential numbers are assigned to identify the rows in the table.
A relation may alternately be thought of as a set of columns (schema view).
Each column typically is called by its column name or column header or attribute name.
A (Slightly) Formal Definition



A database is a collection of relations (or tables)
Each relation is identified by a name and a list of
attributes (or columns)
Each attribute has a name and a domain (or type)



Such as SID:string
Set-valued attributes not allowed
Simplicity is a virtue!
Schemas



Relation schema = relation name + attributes (in order) + types of attributes
Example: Beers(name, manf) or Beers(name: string, manf: string)
Database = collection of relations.
Database schema = set of all relation schemas in the database.
Schema versus instance
Schema (metadata)




Students(sid: string, name: string, login: string, age:
integer, gpa: real)
Specification of how data is to be structured logically
Defined at set-up; Rarely changes
Instance





Content
Changes rapidly, but always conforms to the schema
Compare to type and objects of type in a programming
language
Entity and entity type?
Example

Schema




Student (SID integer, name string, age integer, GPA float)
Course (CID string, title string)
Enroll (SID integer, CID string)
Instance



{ <142, Amy, 20, 3.3>, <123, Bob, 22, 3.1>, ... }
{ <CS405G, Intro. to Database Systems>, ... }
{ <142, CS405G>, <142, CS314>, ... }
Formal Definition (Set Theory)

Formally, given sets D1, D2, …, Dn, a relation r is a subset of D1 × D2 × … × Dn
×: Cartesian product
For sets A and B, the Cartesian product A × B is the set of all ordered pairs (a, b) where a ∈ A and b ∈ B.
Thus, a relation is a set of n-tuples (a1, a2, …, an) where each ai ∈ Di
Example


Example: If
customer_name = {Jones, Smith, Curry, Lindsay, …}
customer_street = {Main, North, Park, …}
customer_city = {Harrison, Rye, Pittsfield, …}
Then r = { (Jones, Main, Harrison), (Smith, North, Rye), (Curry, North, Rye), (Lindsay, Park, Pittsfield) }
is a relation over customer_name × customer_street × customer_city
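The set-theoretic definition can be checked directly in Python. This is a small sketch, assuming the three domains contain only the members listed on the slide (the slide's "…" is left out, so the domains here are smaller than the real ones).

```python
from itertools import product

# Domains, truncated to the members named on the slide (an assumption).
customer_name = {"Jones", "Smith", "Curry", "Lindsay"}
customer_street = {"Main", "North", "Park"}
customer_city = {"Harrison", "Rye", "Pittsfield"}

# The Cartesian product holds every possible (name, street, city) triple.
all_triples = set(product(customer_name, customer_street, customer_city))

# The relation r from the slide.
r = {
    ("Jones", "Main", "Harrison"),
    ("Smith", "North", "Rye"),
    ("Curry", "North", "Rye"),
    ("Lindsay", "Park", "Pittsfield"),
}

# r is a relation over the three domains exactly when it is a subset of the product.
is_relation = r <= all_triples
```

Most triples in the product are not in r: the product lists every combination that could be stored, while the relation records only the combinations that are facts.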
Attribute Types



Each attribute of a relation has a name, designating the role of the attribute
The set of allowed values for each attribute is called the domain of the attribute
Attribute values (domain members) are required to be atomic; that is, indivisible
E.g. the value of an attribute can be an account number, but cannot be a set of account numbers
A domain is said to be atomic if all its members are atomic
The special value null is a member of every domain
Relation Schema


A1, A2, …, An are attributes
R = (A1, A2, …, An) is a relation schema
Example: Customer_schema = (customer_name, customer_street, customer_city)
r(R) denotes a relation r on the relation schema R
Example: customer (Customer_schema)
Relation Instance


The current values (relation instance) of a relation are specified by a table
An element t of r is a tuple, represented by a row in a table

customer (attributes are the columns, tuples are the rows)
customer_name  customer_street  customer_city
Jones          Main             Harrison
Smith          North            Rye
Curry          North            Rye
Lindsay        Park             Pittsfield
Definition Summary
Informal Terms        Formal Terms
Table                 Relation
Column                Attribute/Domain
Row                   Tuple
Values in a column    Domain
Table Definition      Schema of a Relation
Populated Table       Extension
Characteristics of Relation



The tuples in a relation r(R) are not considered to be ordered, even though they appear in some order in the tabular form.
We consider the attributes in R(A1, A2, ..., An) and the values in t=<v1, v2, ..., vn> to be ordered.
All values are considered atomic (indivisible). A special null value is used to represent values that are unknown or inapplicable to certain tuples.
Characteristics of Relation

Notation: we refer to component values of a tuple t
by t[Ai] = vi (the value of attribute Ai for tuple t).
Similarly, t[Au, Av, ..., Aw] refers to the subtuple
of t containing the values of attributes Au, Av, ...,
Aw, respectively.
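The t[Ai] notation maps naturally onto a dictionary lookup. A minimal sketch, assuming a tuple is stored as a Python dict keyed by attribute name; the helper name `subtuple` is made up for this illustration.

```python
attributes = ("customer_name", "customer_street", "customer_city")

# The tuple t = <Jones, Main, Harrison>, keyed by attribute name.
t = dict(zip(attributes, ("Jones", "Main", "Harrison")))


def subtuple(t, names):
    """Return t[Au, Av, ..., Aw]: the values of the named attributes, in that order."""
    return tuple(t[a] for a in names)


city = t["customer_city"]                                  # t[customer_city]
name_and_city = subtuple(t, ("customer_name", "customer_city"))
```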
Relational Integrity Constraints

Integrity Constraints are conditions that must hold on
all valid relation instances.

There are four main types of constraints:
1. Domain constraints
   The value of an attribute must come from its domain
2. Key constraints
3. Entity integrity constraints
4. Referential integrity constraints
Primary Key Constraints

A set of fields is a candidate key (abbreviated as key)
for a relation if :
1. No two distinct tuples can have same values in all key
fields, and
2. Property 1 is not true for any subset of the key.

What if Part 2 is false? A super key: a set of fields that
contains a key.

If there are multiple keys for a relation, one of the keys
is chosen (by DBA) to be the primary key.
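The two properties can be tested against a given instance. This is a sketch with made-up function names; note that a key is a statement about every valid instance of the schema, so a check on one instance can refute a proposed key but cannot prove it.

```python
from itertools import combinations


def is_superkey(rows, fields):
    """Property 1: no two distinct rows agree on all the given fields."""
    seen = set()
    for row in rows:
        values = tuple(row[f] for f in fields)
        if values in seen:
            return False
        seen.add(values)
    return True


def is_candidate_key(rows, fields):
    """Property 1 holds for fields, and fails for every proper subset (property 2)."""
    if not is_superkey(rows, fields):
        return False
    return not any(
        is_superkey(rows, subset)
        for size in range(len(fields))
        for subset in combinations(fields, size)
    )


# Three Students rows used elsewhere in the lecture (sid, name, gpa only).
rows = [
    {"sid": "53666", "name": "Jones", "gpa": 3.4},
    {"sid": "53688", "name": "Smith", "gpa": 3.2},
    {"sid": "53650", "name": "Smith", "gpa": 3.8},
]
```

On these rows `("sid",)` passes as a candidate key, `("name",)` fails because of the two Smiths, and `("sid", "gpa")` is a superkey but not a candidate key. The gpa values also happen to be distinct here, which shows why one instance is not enough to declare a key.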
Key Example

E.g., given a schema Student(sid: string, name: string,
gpa: float) we have:


sid is a key for Students. (What about name?) The set
{sid, gpa} is a superkey.
CAR (licence_num: string, Engine_serial_num: string,
make: string, model: string, year: integer)



What are the candidate keys?
Which one might you use as the primary key?
What are the superkeys?
Entity Integrity

Entity Integrity: The primary key attributes (PK) of
each relation schema R cannot have null values in any
tuple of r(R).

Other attributes of R may be similarly constrained to disallow null values, even though they are not members of the primary key.
Foreign Keys, Referential Integrity

Foreign key: Set of fields in one relation that is used to 'refer' to a tuple in another relation. (Must correspond to primary key of the second relation.) Like a 'logical pointer'.
Foreign key constraint: The foreign key in the
referencing relation must match the primary key of the
referenced relation.
E.g. sid is a foreign key referring to Students:




Student(sid: string, name: string, gpa: float)
Enrolled(sid: string, cid: string, grade: string)
If all foreign key constraints are enforced, referential
integrity is achieved, i.e., no dangling references.
Foreign Key constraints

Only students listed in the Students relation should be
allowed to enroll for courses.
Enrolled
sid    cid          grade
53666  Carnatic101  C
53666  Reggae203    B
53650  Topology112  A
53666  History105   B

Students
sid    name   login       age  gpa
53666  Jones  jones@cs    18   3.4
53688  Smith  smith@eecs  18   3.2
53650  Smith  smith@math  19   3.8

Possible violation: Add <50000, History105, B> to Enrolled.
Possible violation: delete <53650, Smith, …> from Students.
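Both possible violations can be tried out with Python's built-in sqlite3 module. This is a sketch under some assumptions: SQLite is just one convenient engine, the column types are chosen for the illustration, and the helper name `violates` is made up. SQLite only enforces foreign keys after `PRAGMA foreign_keys = ON`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # off by default in SQLite
conn.execute(
    "CREATE TABLE Students (sid TEXT PRIMARY KEY, name TEXT, login TEXT, "
    "age INTEGER, gpa REAL)"
)
conn.execute(
    "CREATE TABLE Enrolled (sid TEXT REFERENCES Students(sid), cid TEXT, grade TEXT)"
)
conn.executemany(
    "INSERT INTO Students VALUES (?, ?, ?, ?, ?)",
    [
        ("53666", "Jones", "jones@cs", 18, 3.4),
        ("53688", "Smith", "smith@eecs", 18, 3.2),
        ("53650", "Smith", "smith@math", 19, 3.8),
    ],
)
conn.executemany(
    "INSERT INTO Enrolled VALUES (?, ?, ?)",
    [
        ("53666", "Carnatic101", "C"),
        ("53666", "Reggae203", "B"),
        ("53650", "Topology112", "A"),
        ("53666", "History105", "B"),
    ],
)


def violates(sql, params=()):
    """Run one statement; report whether the database rejected it."""
    try:
        conn.execute(sql, params)
    except sqlite3.IntegrityError:
        return True
    return False


# Enrolling a student who is not in Students.
insert_rejected = violates(
    "INSERT INTO Enrolled VALUES (?, ?, ?)", ("50000", "History105", "B")
)
# Deleting a student who still has an Enrolled row.
delete_rejected = violates("DELETE FROM Students WHERE sid = ?", ("53650",))
```

With no action specified on the foreign key, the engine simply rejects both statements, so no dangling reference is ever stored.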
Update Operations on Relations

Update operations
INSERT a tuple.
DELETE a tuple.
MODIFY a tuple.
Constraints should not be violated in updates
From E/R Diagrams to Relations

Called logical design (different from conceptual
design)

Entity sets become relations with the same set of
attributes.

Relationships become relations whose attributes are
only:


The keys of the connected entity sets.
Attributes of the relationship itself.
Design principles

KISS
Keep It Simple, Stupid
Avoid redundancy
Redundancy wastes space, complicates updates and deletes, promotes inconsistency
Capture essential constraints, but don’t introduce unnecessary restrictions
Use your common sense
Luke Huan Univ. of Kansas
Entity Set -> Relation
[Figure: entity set Beers with attributes name and manf]
Relation: Beers(name, manf)
Relationship -> Relation



To represent a relationship, the attributes of the relation include:
1. the primary key attributes of each participating entity set, becoming foreign keys.
2. the descriptive attributes of the relationship set
[Figure: E/R diagram. Entity sets employee (name, addr) and department (name, location) joined by relationship Work (duration)]
Work(employee name, dept name, duration)
Relationship -> Relation

The set of nondescriptive attributes is a candidate key, if there are no key constraints.
[Figure: E/R diagram. Entity sets employee (name, addr) and department (name, location) joined by relationship Work (duration)]
Work(employee name, dept name, duration)
Relationship -> Relation


If there is a key constraint, the key of the entity with an arrow is the candidate key of the relation.
[Figure: E/R diagram. Entity sets employee (name, addr) and department (name, location) joined by relationship manage (duration)]
Manage(employee name, dept name, duration)
Relationship -> Relation
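[Figure: E/R diagram. Entity sets Drinkers (name, addr) and Beers (name, manf); relationships Likes and Favorite between Drinkers and Beers; Buddies (roles 1 and 2) and Married (roles husband and wife) on Drinkers]
Likes(drinker name, beer name)
Favorite(drinker name, beer name)
Buddies(name1, name2)
Married(husband name, wife name)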
Combining Relations


It is OK to combine the relation for an entity-set E with the relation R for a many-one relationship from E to another entity set.
Example: Drinkers(name, addr) and Favorite(drinker, beer) combine to make Drinker1(name, addr, favBeer).
[Figure: E/R diagram. Drinkers (name, addr) connected to Beers (name, manf) by relationship Favorite]
Combining Relations


Risk with Many-Many Relationships:
Combining Drinkers with Likes would be a mistake. It leads to redundancy
[Figure: E/R diagram. Drinkers (name, addr) connected to Beers (name, manf) by relationship Likes]

name   addr       beer
Sally  123 Maple  Bud
Sally  123 Maple  Miller

Redundancy: Sally's address is stored once per beer she likes.
Handling Weak Entity Sets

Relation for a weak entity set must include attributes
for its complete key (including those belonging to
other entity sets), as well as its own, nonkey
attributes.

An identifying (double-diamond) relationship is
redundant and yields no relation.
Translating weak entity sets


Remember the “borrowed” key attributes
Watch out for attribute name conflicts
[Figure: E/R diagram. Buildings (name, year); weak entity set Rooms (number, capacity) connected to Buildings by In; weak entity set Seats (number, L/R?) connected to Rooms by In]
Building (building_name, year)
Rooms (building_name, room_number, capacity)
Seats (building_name, room_number, seat_number, left_or_right)
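The borrowed key attributes show up as composite primary keys when the three relations are declared. A sketch using Python's sqlite3 module; the column types, the foreign key declarations, and the sample rows ('Hall', room 101) are assumptions added for the illustration, not part of the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # off by default in SQLite
conn.executescript(
    """
    CREATE TABLE Buildings (
        building_name TEXT PRIMARY KEY,
        year INTEGER
    );
    CREATE TABLE Rooms (
        building_name TEXT REFERENCES Buildings(building_name),  -- borrowed key
        room_number INTEGER,
        capacity INTEGER,
        PRIMARY KEY (building_name, room_number)
    );
    CREATE TABLE Seats (
        building_name TEXT,     -- borrowed from Buildings, via Rooms
        room_number INTEGER,    -- borrowed from Rooms
        seat_number INTEGER,
        left_or_right TEXT,
        PRIMARY KEY (building_name, room_number, seat_number),
        FOREIGN KEY (building_name, room_number)
            REFERENCES Rooms(building_name, room_number)
    );
    """
)
conn.execute("INSERT INTO Buildings VALUES ('Hall', 1990)")  # hypothetical data
conn.execute("INSERT INTO Rooms VALUES ('Hall', 101, 40)")
conn.execute("INSERT INTO Seats VALUES ('Hall', 101, 1, 'L')")

try:
    # Room 999 does not exist, so this seat has no owner to borrow a key from.
    conn.execute("INSERT INTO Seats VALUES ('Hall', 999, 1, 'L')")
    orphan_seat_accepted = True
except sqlite3.IntegrityError:
    orphan_seat_accepted = False
```

No separate table is created for either In relationship: the borrowed columns in Rooms and Seats already carry that information.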
Example
[Figure: E/R diagram. Logins (name, time) connected to Hosts (name) by relationship At]
Hosts(hostName)
Logins(loginName, hostName, time)
At(loginName, hostName, hostName2)
In At, hostName and hostName2 must be the same, so At becomes part of Logins.
Mapping of N-ary Relationship Types

For each n-ary relationship type R, where n>2, create a new relation to represent R.
Include as foreign keys the primary keys of the relations for all participating entity types.
Also include any attributes of the n-ary relationship type.
Ternary relationship types. (a) The SUPPLY relationship.
Mapping the n-ary relationship type SUPPLY
Some exercise



Consider the relations Students, Faculty, Courses,
Rooms, Enrolled, Teaches, and Meets.
1. List all the foreign key constraints among these
relations.
2. Give an example of a (plausible) constraint involving
one or more of these relations that is not a primary key
or foreign key constraint.
Some exercise

1.
No foreign keys for Students, Faculty, Courses, Rooms
Enrolled: sid and cid should both have foreign key constraints (FKCs) placed on them. (Real students must be enrolled in real courses.)
Teaches: fid and cid
Meets: cid and rno.
2.
the length of sid, cid, and fid could be standardized;
limits could be placed on the size of the numbers entered into the credits, room/course capacity, and faculty salary;
an enumerated type should be assigned to the grade field
etc.
Example

We have the following relational schemas





Student(sid: string, name: string, gpa: float)
Course(cid: string, department: string)
Enrolled(sid: string, cid: string, grade: character)
We have the following sequence of database update
operations. (assume all tables are empty before we apply
any operations)
INSERT <‘1234’, ‘John Smith’, ‘3.5’> into Student

Student
sid   name        gpa
1234  John Smith  3.5

Chen Qian, University of Kentucky
Example (Cont.)




INSERT <‘647’, ‘EECS’> into Course
INSERT <‘1234’, ‘647’, ‘B’> into Enrolled
UPDATE the grade in the Enrolled tuple with sid = 1234 and cid = 647 to ‘A’.
DELETE the Enrolled tuple with sid 1234 and cid 647

Student
sid   name        gpa
1234  John Smith  3.5

Course
cid  department
647  EECS

Enrolled
sid   cid  grade
1234  647  A (was B before the UPDATE; the tuple is removed by the DELETE)
Exercise



INSERT <‘108’, ‘MATH’> into Course
INSERT <‘1234’, ‘108’, ‘B’> into Enrolled
INSERT <‘1123’, ‘Mary Carter’, ‘3.8’> into Student

Student
sid   name         gpa
1234  John Smith   3.5
1123  Mary Carter  3.8

Course
cid  department
647  EECS
108  MATH

Enrolled
sid   cid  grade
1234  108  B
Exercise (cont.)


A little bit tricky
INSERT <‘1125’, ‘Bob Lee’, ‘good’> into Student
Fails due to the domain constraint
INSERT <‘1123’, NULL, ‘B’> into Enrolled
Fails due to entity integrity
INSERT <‘1233’, ‘647’, ‘A’> into Enrolled
Fails due to referential integrity

Student
sid   name         gpa
1234  John Smith   3.5
1123  Mary Carter  3.8

Course
cid  department
647  EECS
108  MATH

Enrolled
sid   cid  grade
1234  108  B
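The three failures can be reproduced with a few hand-written checks. This is a sketch only: the function names are made up, the checks cover just the constraints this exercise touches (duplicate keys are not checked), and it assumes gpa is held as a Python float and that (sid, cid) is the primary key of Enrolled.

```python
students = {"1234", "1123"}  # sids currently in Student
courses = {"647", "108"}     # cids currently in Course


def check_student(sid, name, gpa):
    """Return the violated constraint for a Student insert, or None if it is fine."""
    if sid is None:
        return "entity integrity"      # primary key may not be null
    if not isinstance(gpa, float):
        return "domain constraint"     # gpa must come from the float domain
    return None


def check_enrolled(sid, cid, grade):
    """Return the violated constraint for an Enrolled insert, or None if it is fine."""
    if sid is None or cid is None:
        return "entity integrity"      # both are part of the primary key
    if sid not in students or cid not in courses:
        return "referential integrity"  # dangling reference
    return None


results = [
    check_student("1125", "Bob Lee", "good"),
    check_enrolled("1123", None, "B"),
    check_enrolled("1233", "647", "A"),
]
```

The order of the checks matters for the second insert: a NULL cid is reported as an entity integrity problem before any lookup in Course is attempted.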
Exercise (cont.)


A more tricky one
UPDATE the cid in the tuple from Course where cid = 108 to 109

Student
sid   name         gpa
1234  John Smith   3.5
1123  Mary Carter  3.8

Course
cid        department
647        EECS
108 → 109  MATH

Enrolled
sid   cid        grade
1234  108 → 109  B
Update Operations on Relations

In case of integrity violation, several actions can be taken:
Cancel the operation that causes the violation (REJECT option)
Perform the operation but inform the user of the violation
Trigger additional updates so the violation is corrected (CASCADE option, SET NULL option)
Execute a user-specified error-correction routine
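The REJECT, CASCADE and SET NULL options can be compared on the cid update from the exercise. A sketch using Python's sqlite3 module, where SQLite's `ON UPDATE RESTRICT` stands in for the REJECT option; the function name `run` is made up, and the Student table is left out to keep the example short.

```python
import sqlite3


def run(action):
    """Declare the foreign key with the given ON UPDATE action, then change cid 108 to 109."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # off by default in SQLite
    conn.execute("CREATE TABLE Course (cid TEXT PRIMARY KEY, department TEXT)")
    conn.execute(
        "CREATE TABLE Enrolled (sid TEXT, cid TEXT REFERENCES Course(cid) "
        f"ON UPDATE {action}, grade TEXT)"
    )
    conn.execute("INSERT INTO Course VALUES ('108', 'MATH')")
    conn.execute("INSERT INTO Enrolled VALUES ('1234', '108', 'B')")
    try:
        conn.execute("UPDATE Course SET cid = '109' WHERE cid = '108'")
    except sqlite3.IntegrityError:
        pass  # the update was rejected; both tables are unchanged
    enrolled_cids = conn.execute("SELECT cid FROM Enrolled").fetchall()
    conn.close()
    return enrolled_cids


rejected = run("RESTRICT")   # Enrolled keeps cid 108
cascaded = run("CASCADE")    # Enrolled follows the change to 109
nulled = run("SET NULL")     # Enrolled.cid becomes NULL
```

SET NULL only works here because cid is not declared as part of a primary key of Enrolled; with (sid, cid) as the primary key, entity integrity would forbid the null.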
Next class

Relational algebra (hard part!)