The Relational Model 1
Prof. Sin-Min Lee
Department of Mathematics and Computer Science
History of Relational Model
• First proposed by E.F. Codd in 1970 in the paper
“A Relational Model of Data for Large Shared Data Banks.”
• He linked the representation of data with that of
mathematical sets.
• First research started at IBM’s San Jose Research
Laboratory. The prototype was called System R.
• Commercial RDBMSs started to appear in the late 1970s
and early 1980s. The best known is Oracle.
Dr. Edgar F. Codd (1923-2003)

Codd completed his PhD at the
University of Michigan in 1963, and
presented a thesis on the topic of a
self-reproducing computer consisting
of a large number of simple identical
cells, each of which interacts in a
uniform manner with its four
immediate neighbors. Codd reported
this work in a book entitled Cellular
Automata published by Academic Press
in 1968.
Edgar Codd

The relational model devised by Codd was explored
during the 1970s, and commercial relational database
products began to emerge in the 1980s, originally for
mainframe systems and later for microcomputers.
Edgar Codd first wrote about the concept of relational
databases in his paper "A Relational Model of Data for
Large Shared Data Banks" in 1970. He used the term
"relation" in its strict mathematical sense: a set of
tuples drawn from the Cartesian product of a list of
domains, which can be pictured as a table with a few
special properties. Codd was thus describing a system
in which the whole database (data, structure, and rules)
is housed in simple tables of rows and columns.
While this may seem obvious to us today, it was by no
means obvious in 1970. Codd went on to define
relational databases more completely, laying out
twelve rules for relational databases in 1985. In
1990 he expanded the list to 333 requirements
(Codd 90).
Normalization for relational databases was introduced
by Dr. E. F. Codd back in 1970 when he wrote his
original paper. The concept has since been expounded
upon by other experts in the field.

Read his obituary:
http://www.intosaiitaudit.org/intoit_articles/18p60top62.pdf
Read:
http://www.mercurynews.com/mld/mercurynews/news/local/5676133.htm?1c


IBM database developer dead at 79
'RELATIONAL' MODEL IS BASIS OF TODAY'S TRANSACTIONS
By Lisa M. Krieger
Mercury News



Edgar F. Codd, an IBM computer pioneer who created the “relational database
model” that underlies a $7 billion industry of storing the world's online business data,
died of heart failure at home Friday in Williams Island, Fla. He was 79.
Bank accounts, credit cards, stock trading, travel reservations, online auctions and
innumerable other now-routine data transactions all rely on Codd's model, based on
highly abstract and complex mathematical theory.
Before Codd's landmark research paper in 1970, it was possible to store lots of
information -- but analyzing it was difficult, requiring lines and lines of code for even
simple tasks.
Codd’s Original Paper




“A Relational Model of Data for Large Shared Data
Banks”
Communications of the ACM, Volume 13, Number 6,
June 1970
Shelved on the lower level (basement) of the new Martin Luther
King, Jr. Library; you roll the shelves apart to access the journals.
Codd’s Reasons
• Data independence from the database
implementation, such as the machine representation
• Natural structure of data
• Can be analyzed mathematically (Codd was a
mathematician by training)
Alternative: Network Model


Charles A. Bachman 1973 ACM Turing Award
Lecture “The Programmer as Navigator”
Communications of the ACM, Volume 16,
Number 11, November 1973, pp. 653-657
INTRODUCTION
The relational model is the most widely used data model for
commercial data processing because it is simple to
use and to maintain.
A relational database consists of a collection of
tables. The user of the database system may query
these tables, insert new tuples, delete tuples, and
update (modify) tuples. There are several languages
for expressing these operations.
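A minimal sketch of these operations using Python's built-in sqlite3 module, assuming SQL as the language (the slide does not name one). The table mirrors the account example used later in these notes; the hyphens in its column names become underscores because unquoted SQL identifiers cannot contain them.

```python
import sqlite3

# In-memory database holding one table modeled on the account example.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account ("
    " account_number TEXT PRIMARY KEY,"
    " branch_name TEXT,"
    " balance INTEGER)"
)

# Insert new tuples.
conn.executemany(
    "INSERT INTO account VALUES (?, ?, ?)",
    [("A-101", "Downtown", 500), ("A-102", "Perryridge", 400)],
)

# Update (modify) an existing tuple.
conn.execute(
    "UPDATE account SET balance = balance + 100 WHERE account_number = ?",
    ("A-102",),
)

# Query the table.
rows = conn.execute(
    "SELECT account_number, balance FROM account ORDER BY account_number"
).fetchall()
print(rows)  # [('A-101', 500), ('A-102', 500)]
```

Each statement names only values and conditions; where and how the tuples are stored is left to the DBMS.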
Data Models
· Codd suggests that any data model has three components:
the data structures; the integrity constraints; the data
manipulation operations.
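Relational Data Structure
EMPLOYEE (relation)
Emp# | Name  | Sex    | Mgr Emp#    (heading: one attribute per column)
E1   | Jones | Male   | E65         (body: one tuple per row)
E6   | Smith | Male   | E28
E28  | Jones | Female | -
Domain Gender = {Female, Male}: the set of values the attribute Sex can take.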
Basic Structure
The account table below represents a relation in the
relational model. The three column titles are the
attributes; each attribute has a domain of permitted values.
Each row is called a tuple.
The account relation is a subset of the set of all possible
tuples, that is, of the Cartesian product of the three
attribute domains.

account-number | branch-name | balance
A-101          | Downtown    | 500
A-102          | Perryridge  | 400
A-201          | Brighton    | 900
A-215          | Mianus      | 700
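A small sketch of the "subset of all possible tuples" idea, using deliberately tiny, hypothetical domains (the real domain of balance would be far larger). itertools.product enumerates every possible tuple, and a relation instance is simply some subset of that set.

```python
from itertools import product

# Hypothetical, tiny domains chosen for illustration only.
D1 = {"A-101", "A-102"}          # account-number
D2 = {"Downtown", "Perryridge"}  # branch-name
D3 = {400, 500}                  # balance

# Every possible tuple: the Cartesian product D1 x D2 x D3.
all_tuples = set(product(D1, D2, D3))

# A relation instance is just a subset of those tuples.
account = {("A-101", "Downtown", 500), ("A-102", "Perryridge", 400)}

print(len(all_tuples))        # 8
print(account <= all_tuples)  # True
```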
The Domain
Employee
Emp# | Name  | Mgr#
E1   | Red   | -
E2   | Brown | E1
E3   | Black | E1
Domains of the attributes:
Person Name = {Red, Brown, Black, Blue}
Emp# = {E1, E2, E3, E4}
Seven Characteristics of a Relation
• The name of the relation is different from all others.
• Each cell of the relation contains only one value.
• Each attribute (or field) has a name that is distinct.
• All the values of a particular attribute are from the
same domain.
• The order of the attributes makes no difference.
• There are no duplicate tuples.
• The order of the tuples makes no difference.
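The characteristics that a single table's contents can violate (distinct attribute names, one atomic value per cell, values drawn from the attribute's domain, no duplicate tuples) can be checked mechanically. The sketch below is hypothetical: check_relation and its inputs are illustrative names, rows are kept as a list so duplicates stay visible, and the relation-name and ordering properties are not checked because they describe how a relation is interpreted rather than its contents.

```python
def check_relation(heading, domains, rows):
    """Return a list of violated characteristics (empty if none).

    heading: list of attribute names
    domains: dict mapping each attribute name to its set of allowed values
    rows:    list of tuples, kept as a list so duplicates can be detected
    """
    problems = []
    if len(set(heading)) != len(heading):
        problems.append("attribute names are not distinct")
    for row in rows:
        if len(row) != len(heading):
            problems.append(f"{row!r} has the wrong number of values")
            continue
        for name, value in zip(heading, row):
            # Each cell must hold exactly one atomic value, not a collection.
            if isinstance(value, (list, set, tuple, dict)):
                problems.append(f"{name}={value!r} is not atomic")
            elif value not in domains[name]:
                problems.append(f"{name}={value!r} is outside its domain")
    # No two tuples may be identical (pairwise check, so unhashable cells are fine).
    if any(rows[i] == rows[j] for i in range(len(rows)) for j in range(i + 1, len(rows))):
        problems.append("duplicate tuples")
    return problems


domains = {
    "Emp#": {"E1", "E2", "E3", "E4"},
    "Name": {"Red", "Brown", "Black", "Blue"},
}
heading = ["Emp#", "Name"]
good = [("E1", "Red"), ("E2", "Brown"), ("E3", "Black")]
bad = [("E1", "Red"), ("E1", "Red"), ("E5", ["Brown", "Black"])]

print(check_relation(heading, domains, good))  # []
print(check_relation(heading, domains, bad))
```

The second call reports the out-of-domain Emp# value, the non-atomic Name cell, and the duplicated tuple.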
Example of the Student table (primary key: SocialNum)
SocialNum   | FirstName | LastName | PhoneNum | Class#
556-34-2832 | John      | Smith    | 924-1000 | 32245
839-32-1929 | Jane      | Doe      | 924-1929 | 99839
312-39-5193 | Some      | Body     | 555-1000 | 11021
493-33-2910 | Any       | One      | 555-1020 | 49303
Other terms...
Cardinality = number of rows
Degree = number of columns
For the Student table: Degree = 5, Cardinality = 4
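Both quantities can be computed directly for the Student table as shown, holding the relation as a Python set of tuples (a representation chosen here for illustration): the degree is the length of the heading and the cardinality is the number of tuples.

```python
# Student relation from the example table: one tuple per row.
heading = ("SocialNum", "FirstName", "LastName", "PhoneNum", "Class#")
student = {
    ("556-34-2832", "John", "Smith", "924-1000", 32245),
    ("839-32-1929", "Jane", "Doe", "924-1929", 99839),
    ("312-39-5193", "Some", "Body", "555-1000", 11021),
    ("493-33-2910", "Any", "One", "555-1020", 49303),
}

degree = len(heading)       # number of columns (attributes)
cardinality = len(student)  # number of rows (tuples)
print(degree, cardinality)  # 5 4
```

The degree is fixed by the schema, while the cardinality changes every time a tuple is inserted or deleted.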
Database Schema
• Database schema is the logical design of the database.
• Database instance is a snapshot of the data in the
database at a given instant in time.
• Relation instance corresponds to the programming-language
notion of the value of a variable.
Database Schema
A relation schema consists of a list of
attributes and their corresponding domains.
By convention, relation schema names begin with an
uppercase letter and relation names with a lowercase
letter, so
Account-schema = (account-number, branch-name, balance)
and account(Account-schema) means that account is a
relation on Account-schema.
Database Schema
A relation instance is the set of values of a
relation at a specific moment in time. These
values may change over time as the relation
is updated.
The Relational Data Model
·
DATA STRUCTURES - domain,
attribute, relation, tuple, primary key,
degree, cardinality.

INTEGRITY CONSTRAINTS - entity
integrity and referential integrity.

DATA MANIPULATION OPERATIONS
- defined through relational algebra
and equivalent relational calculus.
Keys
• A superkey is a set of one or more attributes that allows
us to uniquely identify an entity in the entity set.
• Candidate keys are minimal superkeys; one of them
is selected to be the primary key.
• A primary key is the candidate key chosen to
identify entities within an entity set.
• A foreign key is an attribute (or set of attributes) of one
relation schema that refers to the primary key of a
relation schema (possibly the same one).
Keys
If $K \subseteq R$ is a superkey for R, then no relation r(R)
has two distinct tuples with the same values on K:
if $t_1, t_2 \in r$ and $t_1 \neq t_2$, then $t_1[K] \neq t_2[K]$.
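A brute-force sketch of this superkey test, applied to the R(A, B, C, D) exercise. It assumes the flattened exercise table is read column by column, which gives the four tuples in the code; the function names are hypothetical. An attribute set passes if no two tuples agree on all of its attributes, and the candidate keys are the passing sets with no passing proper subset.

```python
from itertools import combinations

ATTRS = ("A", "B", "C", "D")
# Assumption: the exercise data read column by column.
R = [
    (1, 2, 3, 4),
    (1, 1, 3, 4),
    (3, 2, 1, 4),
    (4, 1, 2, 3),
]


def is_superkey(attrs, rows):
    """True if no two tuples agree on every attribute in attrs."""
    idx = [ATTRS.index(a) for a in attrs]
    projected = [tuple(row[i] for i in idx) for row in rows]
    return len(set(projected)) == len(projected)


def candidate_keys(rows):
    """Minimal superkeys, found by trying smaller attribute sets first."""
    keys = []
    for size in range(1, len(ATTRS) + 1):
        for attrs in combinations(ATTRS, size):
            # Supersets of a key already found are superkeys but not minimal.
            if any(set(k) <= set(attrs) for k in keys):
                continue
            if is_superkey(attrs, rows):
                keys.append(attrs)
    return keys


print(candidate_keys(R))  # [('A', 'B'), ('B', 'C')]
```

A single instance can only rule a set out. A set that passes here is a superkey of this particular instance; whether it is a key of the schema depends on every legal instance, not on one sample of data.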
Find Candidate Keys
R(A, B, C, D)
A | B | C | D
1 | 2 | 3 | 4
1 | 1 | 3 | 4
3 | 2 | 1 | 4
4 | 1 | 2 | 3

✓ {A, B}
X {A, C}
X {A, D}
✓ {B, C}
X {B, D}
X {C, D}
X {A, C, D}
X {B, C, D}
✓ = okay (candidate key)
X = not okay
How to determine keys
• Strong entity set: the entity's primary key
becomes the relation's primary key.
• Weak entity set: the primary key of the relation
is the union of the primary key of the strong entity set
and the discriminator.
• Relationship set: the union of the primary keys of
the related entity sets becomes a superkey of the
relation.
How to determine keys
• Combined tables: in a many-to-one relationship, the
primary key of the entity set on the "many" side becomes
the relation's primary key. In a one-to-one relationship,
either primary key can be used.
• Multivalued attributes: the relation's primary key is
the entity's primary key together with the multivalued
attribute.
Schema Diagram
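A database schema with primary and foreign key
dependencies (primary-key attributes, shaded in the
diagram, are marked *):
account (account-number*, branch-name, balance)
branch (branch-name*, branch-city, assets)
depositor (customer-name*, account-number*)
customer (customer-name*, customer-street, customer-city)
loan (loan-number*, branch-name, amount)
borrower (customer-name*, loan-number*)
Foreign key dependencies:
account.branch-name → branch
depositor.customer-name → customer
depositor.account-number → account
loan.branch-name → branch
borrower.customer-name → customer
borrower.loan-number → loan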
Entity Integrity
·
No component of the Primary Key of a base
relation is allowed to accept nulls.
Surname | Given Name | Salary
Red     | John       | $40,000
Black   | Fred       | $50,000
Red     | -          | $60,000
Black   | -          | $70,000
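A sketch of entity integrity enforced by SQLite, assuming (Surname, Given Name) is the composite primary key of the example; the table name person is hypothetical. NOT NULL is written out explicitly because SQLite, for historical reasons, otherwise lets most PRIMARY KEY columns hold NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed composite primary key (surname, given_name), both declared NOT NULL.
conn.execute(
    "CREATE TABLE person ("
    " surname TEXT NOT NULL,"
    " given_name TEXT NOT NULL,"
    " salary INTEGER,"
    " PRIMARY KEY (surname, given_name))"
)
conn.execute("INSERT INTO person VALUES ('Red', 'John', 40000)")

try:
    # A tuple with a null key component violates entity integrity.
    conn.execute("INSERT INTO person VALUES ('Red', NULL, 60000)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

print(rejected)  # True
```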
Foreign Key
·
A foreign key is an attribute or attribute combination
of one relation R2 whose values are required to
match those of the primary key of relation R1 where
R1 and R2 are not necessarily distinct. Note that a
foreign key and the corresponding primary key
should be defined on the same domain(s).
Employee (Worksfordept is a foreign key referencing Dept)
Emp# | ename | Worksfordept
e1   | red   | d1
e2   | blue  | d2
e3   | brown | -
Dept
Dept | Dname
d1   | Pay
d2   | Tax
d3   | Art
Referential Integrity
If base relation R2 includes a foreign key FK
matching the primary key PK of some base relation
R1 then every value of FK in R2 must either
(a) be equal to the value of PK in some tuple of R1,
or
(b) be wholly null.
Note that PK and FK may comprise more
than one attribute and that R1 and R2 are not
necessarily distinct.
(Stated more simply, a foreign key value must either be
a valid primary key value or be null.)
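The rule translates directly into a check over two relations. This is a hypothetical sketch: rows are Python dicts, referential_violations is an illustrative name, e3 is given a null department as an assumption, and e4 is an invented row added to show a violation.

```python
def referential_violations(r2_rows, fk, r1_rows, pk):
    """Return the R2 tuples whose foreign key neither matches a primary key
    value in R1 (rule a) nor is wholly null (rule b).

    Rows are dicts; fk and pk are equal-length tuples of attribute names.
    """
    pk_values = {tuple(row[a] for a in pk) for row in r1_rows}
    bad = []
    for row in r2_rows:
        value = tuple(row[a] for a in fk)
        if all(v is None for v in value):
            continue  # rule (b): a wholly null foreign key is allowed
        if value not in pk_values:
            bad.append(row)  # fails rule (a); partly null values land here too
    return bad


dept = [
    {"Dept": "d1", "Dname": "Pay"},
    {"Dept": "d2", "Dname": "Tax"},
    {"Dept": "d3", "Dname": "Art"},
]
employee = [
    {"Emp#": "e1", "ename": "red", "Worksfordept": "d1"},
    {"Emp#": "e2", "ename": "blue", "Worksfordept": "d2"},
    {"Emp#": "e3", "ename": "brown", "Worksfordept": None},   # assumed null
    {"Emp#": "e4", "ename": "green", "Worksfordept": "d9"},   # hypothetical bad row
]

violations = referential_violations(employee, ("Worksfordept",), dept, ("Dept",))
print([row["Emp#"] for row in violations])  # ['e4']
```

Because fk and pk are tuples of names, the same function covers composite foreign keys, and passing the same list for both relations covers the case where R1 and R2 are the same relation.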
Foreign Key Rules
For each foreign key, three questions need to be answered:
• Can the foreign key accept nulls?
• What should happen on an attempt to delete the target of a
foreign key reference?
• What should happen on an attempt to update the target of a
foreign key reference?
Foreign Key Rules
When should foreign key rules be checked?
Dept (Dept#, Dname, Budget)
Emp (Emp#, Ename, Salary, WorksforDept#)
    WorksforDept# references Dept: delete cascades, update cascades
Depend (Emp#, Dependname, Date-of-birth)
    Emp# references Emp: delete cascades, update cascades
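The same rules can be declared in SQL. Below is a sketch in SQLite, with the attribute names renamed to plain identifiers (dept_no, emp_no, works_for_dept, and so on are assumptions) and invented sample rows. SQLite leaves foreign key enforcement off by default, so the PRAGMA turns it on for this connection.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # enforcement is off by default in SQLite

conn.executescript("""
CREATE TABLE dept (
    dept_no TEXT PRIMARY KEY,
    dname   TEXT,
    budget  INTEGER
);
CREATE TABLE emp (
    emp_no         TEXT PRIMARY KEY,
    ename          TEXT,
    salary         INTEGER,
    works_for_dept TEXT REFERENCES dept(dept_no)
                   ON DELETE CASCADE ON UPDATE CASCADE
);
CREATE TABLE depend (
    emp_no        TEXT REFERENCES emp(emp_no)
                  ON DELETE CASCADE ON UPDATE CASCADE,
    dependname    TEXT,
    date_of_birth TEXT
);
INSERT INTO dept VALUES ('d1', 'Pay', 1000), ('d2', 'Tax', 2000);
INSERT INTO emp VALUES ('e1', 'red', 40000, 'd1'), ('e2', 'blue', 50000, 'd2');
INSERT INTO depend VALUES ('e1', 'Ann', '2001-05-01');
""")

# Update cascades: renaming d2 is propagated to emp.works_for_dept.
conn.execute("UPDATE dept SET dept_no = 'd9' WHERE dept_no = 'd2'")
# Delete cascades: removing d1 removes e1, which in turn removes e1's dependents.
conn.execute("DELETE FROM dept WHERE dept_no = 'd1'")

emps = conn.execute("SELECT emp_no, works_for_dept FROM emp ORDER BY emp_no").fetchall()
dependents = conn.execute("SELECT COUNT(*) FROM depend").fetchone()[0]
print(emps)        # [('e2', 'd9')]
print(dependents)  # 0
```

Here the rules are checked immediately, statement by statement; other choices, such as deferring the check to commit time, are possible answers to the question of when foreign key rules should be checked.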
Example of the Class table.
Primary key: Class#
Class# | SectionNum | Professor
32245  | 2          | Lee
11021  | 1          | Agoston
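Foreign key example: Class# in the Student table refers to
the primary key Class# of the Class table.
Student
SocialNum   | FirstName | LastName | PhoneNum | Class#
556-34-2832 | John      | Smith    | 924-1000 | 32245
839-32-1929 | Jane      | Doe      | 924-1929 | 99839
312-39-5193 | Some      | Body     | 555-1000 | 11021
493-33-2910 | Any       | One      | 555-1020 | 49303
Class
Class# | SectionNum | Professor
32245  | 2          | Lee
11021  | 1          | Agoston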
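One way to list the Student rows that break referential integrity, sketched in SQLite with the column names renamed to plain identifiers (an assumption). With only the two Class tuples shown, a LEFT JOIN finds the students whose Class# matches no Class tuple; if the slide shows only an excerpt of Class, the real result would differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE class (class_no INTEGER PRIMARY KEY, section_num INTEGER, professor TEXT);
CREATE TABLE student (social_num TEXT PRIMARY KEY, first_name TEXT, last_name TEXT,
                      phone_num TEXT, class_no INTEGER);
INSERT INTO class VALUES (32245, 2, 'Lee'), (11021, 1, 'Agoston');
INSERT INTO student VALUES
  ('556-34-2832', 'John', 'Smith', '924-1000', 32245),
  ('839-32-1929', 'Jane', 'Doe',   '924-1929', 99839),
  ('312-39-5193', 'Some', 'Body',  '555-1000', 11021),
  ('493-33-2910', 'Any',  'One',   '555-1020', 49303);
""")

# Student rows whose Class# is not null and matches no Class tuple.
dangling = conn.execute("""
    SELECT s.social_num, s.class_no
    FROM student AS s LEFT JOIN class AS c ON s.class_no = c.class_no
    WHERE s.class_no IS NOT NULL AND c.class_no IS NULL
    ORDER BY s.social_num
""").fetchall()
print(dangling)  # [('493-33-2910', 49303), ('839-32-1929', 99839)]
```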
Relation instances in the Student relation
• The contents of the table (its set of rows) at a given
time form a relation instance.
• The rows of an instance are unordered, and no two rows
can be exactly alike.
Integrity Constraints
• Every DBMS must support some form of integrity
constraints (ICs) to prevent invalid data from being entered.
• Domain constraints specify the set of values that
may be used for each field.
• Other constraints, such as key or tuple constraints, may
further limit which values from the domain can be used
for a given field in a given instance.
• Key constraints require that the combination of values
in the key fields be unique for each entry.
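A sketch of a domain constraint and a key constraint declared in SQLite. The enrolled table, its columns, and the A to F grade domain are hypothetical; the CHECK clause plays the role of the domain constraint and the composite PRIMARY KEY the key constraint. Both violations surface as sqlite3.IntegrityError.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE enrolled ("
    " sid TEXT,"
    " class_no INTEGER,"
    " grade TEXT CHECK (grade IN ('A', 'B', 'C', 'D', 'F')),"  # domain constraint
    " PRIMARY KEY (sid, class_no))"                            # key constraint
)
conn.execute("INSERT INTO enrolled VALUES ('556-34-2832', 32245, 'A')")


def try_insert(row):
    """Return None on success, or the constraint error message on failure."""
    try:
        conn.execute("INSERT INTO enrolled VALUES (?, ?, ?)", row)
        return None
    except sqlite3.IntegrityError as exc:
        return str(exc)


r_domain = try_insert(("839-32-1929", 32245, "Z"))  # grade outside its domain
r_dup = try_insert(("556-34-2832", 32245, "B"))     # duplicate key value
r_ok = try_insert(("839-32-1929", 32245, "B"))      # valid entry
print(r_ok)  # None
```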
Enforcing Integrity Constraints
Each DBMS should have a means of resolving invalid
entries, for example:
• What happens if an entry that duplicates a key value
is entered?
• What should be done if the row referenced by a foreign
key is deleted? (A foreign key is a set of fields whose
values must match key values in another table, or in
the same table.)
• What happens when an invalid entry is entered?
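As one concrete answer to the duplicate-key question, SQLite lets the statement choose a conflict-resolution policy; this sketch shows three of them on a hypothetical account table. Other DBMSs offer similar choices with different syntax.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (account_number TEXT PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO account VALUES ('A-101', 500)")

# Policy 1 (SQLite's default, ABORT): reject the offending statement.
try:
    conn.execute("INSERT INTO account VALUES ('A-101', 999)")
except sqlite3.IntegrityError:
    pass

# Policy 2: silently skip the conflicting row.
conn.execute("INSERT OR IGNORE INTO account VALUES ('A-101', 111)")
after_ignore = conn.execute("SELECT balance FROM account").fetchone()[0]

# Policy 3: delete the existing row and store the new one.
conn.execute("INSERT OR REPLACE INTO account VALUES ('A-101', 700)")
after_replace = conn.execute("SELECT balance FROM account").fetchone()[0]

print(after_ignore, after_replace)  # 500 700
```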




Relation schema
Associated with each attribute is a set of values, called a
domain, that can be assigned to the entry of a tuple
corresponding to the attribute.
A relation schema is a set of attributes.
Example: EMP = {Name, SSN, DeptName, Salary, Birthdate}
Convention: EMP(Name, SSN, DeptName, Salary, Birthdate)
Relational DBMS Products
IBM Relational Products
• DB2 (MVS/370, MVS/XA)
• SQL/DS (VM/CMS, DOS/VSE)
• QMF: front-end to DB2 and SQL/DS
• CSP: application development tool
Numerous other RDBMSs
• ORACLE (SQL*Forms)
• INGRES from ASK Corp. (OSL, ABF)
• AIM/RDB from Fujitsu
• INFORMIX
• VAXSQL/Rdb from DEC
• NonStop SQL from Tandem
Microcomputer versions
• ORACLE
• INGRES
• dBase IV
• microSQL
• practically all micro DBMSs
• The relational model is based on set operations. Tables are
sets of rows. The actual storage structure is hidden from the
user. The relational model is just concerned with a logical
view of the data, not the physical view. There are no pointers
for the user to worry about. The only data are explicit values
in tables. All data values in the cells of tables are atomic
(also known as scalar): exactly one data value, not a set
or a repeating group, is allowed in each cell.
• Relational databases are the most widely used in the
world (90+%).
• A mathematical viewpoint helped to shape a database
industry.
• Future? Possibly the object-oriented database model.
Query Languages
Users request information from the database through
query languages; SQL is the most widespread.
There are two types of query language:
• Procedural language: the user instructs the
system to perform a sequence of operations on the
database.
• Nonprocedural language: the user describes the
desired information without giving a specific
procedure for obtaining it.
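The contrast can be sketched with a Python loop standing in for a procedural language and SQL (via sqlite3) for a nonprocedural one; the rows are the account table from earlier, and the query (accounts with a balance above 500) is an invented example.

```python
import sqlite3

accounts = [("A-101", "Downtown", 500), ("A-102", "Perryridge", 400),
            ("A-201", "Brighton", 900), ("A-215", "Mianus", 700)]

# Procedural: spell out, step by step, how to obtain the answer.
rich_procedural = []
for number, _branch, balance in accounts:
    if balance > 500:
        rich_procedural.append(number)
rich_procedural.sort()

# Nonprocedural: describe what is wanted and let the DBMS decide how.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (account_number TEXT, branch_name TEXT, balance INTEGER)")
conn.executemany("INSERT INTO account VALUES (?, ?, ?)", accounts)
rich_sql = [row[0] for row in conn.execute(
    "SELECT account_number FROM account WHERE balance > 500 ORDER BY account_number")]

print(rich_procedural)  # ['A-201', 'A-215']
print(rich_sql)         # ['A-201', 'A-215']
```

Both return the same answer, but only the loop commits to an access order; the SQL statement leaves that choice to the query optimizer.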