ER Review

Review of the Entity-Relationship Model
Slides courtesy of Amol Deshpande
Material from ch. 2 of Korth & Silberschatz, Database System Concepts
Data Modeling
Goals:
Conceptual representation of the data
“Reality” meets “bits and bytes”
Must make sense, and be usable by other people
Review:
Entity-relationship Model
Relational Model
Motivation
You’ve just been hired by Bank of America as their DBA for their
online banking web site.
You are asked to create a database that monitors:
customers
accounts
loans
branches
transactions, …
Now what??!!!
Database Design Steps
Three Levels of Modeling
info → Conceptual DB design → Conceptual Data Model
• Entity-relationship Model: typically used for conceptual database design
Conceptual Data Model → Logical DB design → Logical Data Model
• Relational Model: typically used for logical database design
Logical Data Model → Physical DB design → Physical Data Model
Entity-Relationship Model
Two key concepts
Entities:
• An object that exists and is distinguishable from other objects
– Examples: Bob Smith, BofA, CMSC424
• Have attributes (people have names and addresses)
• Form entity sets with other entities of the same type that share the
same properties
– Set of all people, set of all classes
• Entity sets may overlap
– Customers and Employees
Entity-Relationship Model
Two key concepts
Relationships:
• Relate 2 or more entities
– E.g. Bob Smith has account at College Park Branch
• Form relationship sets with other relationships of the same type that
share the same properties
– Customers have accounts at Branches
• Can have attributes:
– has account at may have an attribute start-date
• Can involve more than 2 entities
– Employee works at Branch at Job
ER Diagram: Starting Example
[Diagram: entity set customer (cust-id, cust-name, cust-street, cust-city) connected by relationship set has (access-date) to entity set account (number, balance)]
Rectangles: entity sets
Diamonds: relationship sets
Ellipses: attributes
Review Roadmap
Details of the ER Model
How to represent various types of constraints/semantic
information etc.
Design issues
A detailed example
Relationship Cardinalities
We may know:
One customer can only open one account
OR
One customer can open multiple accounts
Representing this is important
Why ?
Better manipulation of data
Can enforce such a constraint
Remember: If not represented in conceptual model, the domain knowledge
may be lost
Mapping Cardinalities
Express the number of entities to which another entity
can be associated via a relationship set
Most useful in describing binary relationship sets
Mapping Cardinalities
[Diagrams: customer, has, account drawn four times, once per cardinality]
One-to-One
One-to-Many
Many-to-One
Many-to-Many
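The four cases can be made concrete with a small Python sketch. It looks at one instance of a binary relationship set, given as (left, right) pairs, and reports the tightest cardinality that instance is consistent with. The function name and the sample pairs are illustrative assumptions; a real cardinality constraint is a statement about all legal instances, so this only checks an instance, it does not discover the constraint.

```python
from collections import Counter

def mapping_cardinality(pairs):
    """Classify one instance of a binary relationship set of (left, right) pairs."""
    pairs = set(pairs)  # a relationship set holds no duplicate relationships
    left_counts = Counter(left for left, _ in pairs)
    right_counts = Counter(right for _, right in pairs)
    left_repeats = any(n > 1 for n in left_counts.values())
    right_repeats = any(n > 1 for n in right_counts.values())
    if not left_repeats and not right_repeats:
        return "one-to-one"
    if left_repeats and not right_repeats:
        return "one-to-many"   # one left entity relates to many right entities
    if right_repeats and not left_repeats:
        return "many-to-one"
    return "many-to-many"

# Hypothetical (customer, account) pairs for the "has" relationship set.
has = [("c1", "A-101"), ("c1", "A-102"), ("c2", "A-201")]
print(mapping_cardinality(has))  # one-to-many: c1 has two accounts
```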
Types of Attributes
Simple vs Composite
Single-valued vs Multi-valued
Single value per attribute ?
E.g. Phone numbers are multi-valued
Derived
If date-of-birth is present, age can be derived
Can help in avoiding redundancy, enforcing constraints etc…
Types of Attributes
[Diagram: the starting example (customer, has, account) with date-of-birth, age and phone no. added to customer]
multi-valued (double ellipse): phone no.
derived (dashed ellipse): age
Composite Attribute: an attribute drawn with component attributes month, day, year
Next: Keys
Key = set of attributes identifying individual entities or
relationships
Entity Keys
[Diagram: customer with attributes cust-id, cust-name, cust-street, cust-city, date-of-birth, age, phone no.]
Possible Keys:
{cust-id}
{cust-name, cust-city, cust-street}
{cust-id, age}
cust-name ?? Probably not.
Domain knowledge dependent !!
Entity Keys
Superkey
any attribute set that can distinguish entities
Candidate key
a minimal superkey
• Can’t remove any attribute and preserve key-ness
– {cust-id, age} is a superkey but not a candidate key (age can be removed)
– {cust-name, cust-city, cust-street} is a candidate key
» assuming cust-name alone is not unique
Primary key
Candidate key chosen as the key by DBA
Underlined in the ER Diagram
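A minimal Python sketch of the superkey and candidate-key definitions, checked against one small instance. The helper names and the three sample customers are assumptions made for illustration. Key-ness is really a property of every legal instance, so a check like this can refute a proposed key but cannot prove one.

```python
from itertools import combinations

def is_superkey(rows, attrs):
    """True if no two rows agree on every attribute in attrs."""
    seen = set()
    for row in rows:
        value = tuple(row[a] for a in attrs)
        if value in seen:
            return False
        seen.add(value)
    return True

def is_candidate_key(rows, attrs):
    """A candidate key is a superkey none of whose proper subsets is a superkey."""
    attrs = tuple(attrs)
    if not is_superkey(rows, attrs):
        return False
    return not any(
        is_superkey(rows, subset)
        for size in range(len(attrs))            # sizes 0 .. len-1: proper subsets
        for subset in combinations(attrs, size)
    )

# Hypothetical customer entities.
customers = [
    {"cust-id": 1, "cust-name": "Bob Smith", "age": 30},
    {"cust-id": 2, "cust-name": "Bob Smith", "age": 41},
    {"cust-id": 3, "cust-name": "Ann Lee", "age": 30},
]

print(is_superkey(customers, ["cust-id", "age"]))       # True
print(is_candidate_key(customers, ["cust-id", "age"]))  # False: age is removable
print(is_candidate_key(customers, ["cust-id"]))         # True
print(is_superkey(customers, ["cust-name"]))            # False: two Bob Smiths
```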
Entity Keys
{cust-id} is a natural primary key
Typically, SSN forms a good primary key
Try to use a candidate key that rarely changes
e.g. something involving address not a great idea
[Diagram: customer with attributes cust-id, cust-name, cust-street, cust-city, date-of-birth, age, phone no.]
Relationship Set Keys
What attributes are needed to represent a relationship completely and uniquely ?
Union of primary keys of the entities involved, and relationship attributes
[Diagram: customer (cust-id), has (access-date), account (number)]
{cust-id, access-date, account number} describes a relationship completely
Relationship Set Keys
Is {cust-id, access-date, account number} a candidate key ?
No. Attribute access-date can be removed from this set without losing key-ness
Relationship Set Keys
Is {cust-id, account-number} a candidate key ?
Depends
If one-to-one relationship, either {cust-id} or {account-number} sufficient
Since a given customer can only have one account, she can only participate in one relationship
Ditto account
If one-to-many relationship (as shown), {account-number} is a candidate key
A given customer can have many accounts, but at most one account holder per account allowed
Relationship Set Keys
General rule for binary relationships
one-to-one: primary key of either entity set
one-to-many: primary key of the entity set on the many side
many-to-many: union of primary keys of the associated entity sets
n-ary relationships
More complicated rules
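The general rule for binary relationships can be written down as a lookup. This is only a sketch: the function name is made up here, and the cardinality is read left to right, so "one-to-many" means one left entity may relate to many right entities.

```python
def relationship_key(cardinality, left_key, right_key):
    """Candidate keys of a binary relationship set, following the general rule.

    Returns a list of candidate keys, each a frozenset of attribute names.
    """
    left, right = frozenset(left_key), frozenset(right_key)
    if cardinality == "one-to-one":
        return [left, right]        # primary key of either entity set
    if cardinality == "one-to-many":
        return [right]              # primary key of the many side
    if cardinality == "many-to-one":
        return [left]               # primary key of the many side
    if cardinality == "many-to-many":
        return [left | right]       # union of both primary keys
    raise ValueError(f"unknown cardinality: {cardinality}")

# customer "has" account, one customer to many accounts.
print(relationship_key("one-to-many", {"cust-id"}, {"account-number"}))
```

For the customer and account example this yields {account-number} in the one-to-many case and {cust-id, account-number} in the many-to-many case.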
Data Constraints
Representing semantic data constraints
We already saw constraints on relationship cardinalities
Participation Constraint
Given an entity set E, and a relationship R it participates
in:
If every entity in E participates in at least one relationship in R,
it is total participation
partial otherwise
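As a rough illustration, the definition can be checked on one instance in a few lines of Python. The function name, the `position` parameter and the sample data are assumptions made for this sketch.

```python
def participation(entity_set, relationship_set, position):
    """Return "total" if every entity appears in some relationship, else "partial".

    relationship_set holds tuples; position says which slot of each tuple
    holds members of entity_set.
    """
    participating = {rel[position] for rel in relationship_set}
    return "total" if set(entity_set) <= participating else "partial"

# Hypothetical instance of the "has" relationship set.
customers = {"c1", "c2", "c3"}
accounts = {"A-101", "A-102"}
has = {("c1", "A-101"), ("c2", "A-102")}

print(participation(customers, has, 0))  # partial: c3 has no account
print(participation(accounts, has, 1))   # total: every account has a customer
```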
Participation Constraint
[Diagram: the starting example (customer, has, account), with a participation marked as total]
Total participation
Cardinality Constraints
How many relationships can an entity participate in ?
[Diagram: customer (cust-id), has (access-date), account (number), with each side of has annotated with a min..max range]
0..*: Minimum 0, Maximum no limit
1..1: Minimum 1, Maximum 1
Recursive Relationships
Sometimes a relationship associates an entity set to itself
[Diagram: employee (emp-id, emp-name, emp-street, emp-city) linked to works-for twice, in the roles manager and worker]
Must be declared with roles
Weak Entity Sets
An entity set without enough attributes to have a primary
key
E.g. Transaction Entity
Attributes:
• transaction-number, transaction-date, transaction-amount,
transaction-type
• transaction-number: may not be unique across accounts
Weak Entity Sets
A weak entity set must be associated with an identifying
or owner entity set
Account is the owner entity set for Transaction
Weak Entity Sets
Still need to be able to distinguish between different weak entities associated with the same strong entity
[Diagram: account (number, balance), has, Transaction (trans-number, trans-date, trans-type, trans-amt)]
Weak Entity Sets
Discriminator: A set of attributes that distinguishes the weak entities associated with the same strong entity
Weak Entity Sets
Primary key:
Primary key of the associated strong entity + discriminator attribute set
For Transaction:
• {account-number, transaction-number}
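One way to see the combined key at work is a composite primary key in SQLite, driven from Python's standard `sqlite3` module. The table and column names below are assumptions for this sketch (the table is called `txn` because TRANSACTION is an SQL keyword), and the amounts are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE account (account_number TEXT PRIMARY KEY, balance INTEGER)")
conn.execute("""
    CREATE TABLE txn (
        account_number     TEXT NOT NULL REFERENCES account(account_number),
        transaction_number INTEGER NOT NULL,   -- the discriminator
        transaction_amount INTEGER,
        PRIMARY KEY (account_number, transaction_number)
    )
""")
conn.executemany("INSERT INTO account VALUES (?, ?)",
                 [("A-101", 500), ("A-201", 900)])

# The same transaction_number under two different accounts is allowed.
conn.executemany("INSERT INTO txn VALUES (?, ?, ?)",
                 [("A-101", 1, 50), ("A-201", 1, 75)])

# Repeating it within one account violates the composite key.
try:
    conn.execute("INSERT INTO txn VALUES (?, ?, ?)", ("A-101", 1, 20))
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(duplicate_rejected)  # True
```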
Specialization
Consider entity person:
Attributes: name, street, city
Further classification:
customer
• Additional attributes: customer-id, credit-rating
employee
• Additional attributes: employee-id, salary
Note similarities to object-oriented programming
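The object-oriented analogy can be sketched with Python dataclasses, where each specialized entity set inherits the attributes of person. Class names, field names and the sample values are illustrative assumptions.

```python
from dataclasses import dataclass

@dataclass
class Person:
    name: str
    street: str
    city: str

@dataclass
class Customer(Person):          # inherits name, street, city
    customer_id: str
    credit_rating: str

@dataclass
class Employee(Person):          # inherits name, street, city
    employee_id: str
    salary: int

bob = Customer("Bob Smith", "Main", "College Park", "c-1", "good")
print(bob.name, bob.customer_id)   # Bob Smith c-1
print(isinstance(bob, Person))     # True
```

The analogy is loose in one respect: entity sets may overlap, so the same person can be both a customer and an employee, which a single class per object does not capture directly.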
Specialization: Example
Aggregation
No relationships between relationships
E.g.: Associate account officers with has account relationship set
[Diagram: customer, has, account; employee linked through account officer, with the connection to has marked "?"]
Aggregation
Associate an account officer with each account ?
What if different customers for the same account can have different account officers ?
Aggregation
Solution: Aggregation
[Diagram: customer, has, account treated as a single aggregate, which is related to employee through account officer]
More…
Read Chapter 2 for:
Specialization/Aggregation details
• Different types of specializations etc.
Generalization: opposite of specialization
Lower- and higher-level entities
Attribute inheritance
…
E/R Data Model
Design Issue #1: Entity Sets vs. Attributes
An Example: Employees can have multiple phones
(a) Employee with attributes phone_no, phone_loc
vs
(b) Employee, related by Uses to entity set Phone (attributes no, loc)
To resolve, determine how phones are used
1. Can many employees share a phone?
(If yes, then (b))
2. Can employees have multiple phones?
(if yes, then (b), or (a) with multivalued attributes)
3. Else
(a), perhaps with composite attributes
[Diagram: Employee with composite attribute phone (no, loc)]
E/R Data Model
Design Issue #2: Entity Sets vs. Relationship Sets
An Example: How to model bank loans
(a) Customer (ssn, name), related by Borrows to entity set Loan (lno, amt)
vs
(b) Customer (ssn, name), related by relationship set Loans (lno, amt) to Branch (bname, bcity)
To resolve, determine how loans are issued
1. Can there be more than one customer per loan?
• If yes, then (a). Otherwise, loan info must be replicated for each customer (wasteful, potential update anomalies)
2. Is loan a noun or a verb?
• Both, but more of a noun to a bank. (hence (a) probably more appropriate)
E/R Data Model
Design Issue #3: N-ary vs Binary Relationship Sets
An Example: Works_At
Ternary:
[Diagram: Employee, Dept and Branch joined by one relationship set Works_at]
(Joe, Moody, Acct) ∈ Works_At
vs
Binary:
[Diagram: entity set WA related to Employee by WAE, to Branch by WAB, and to Dept by WAD]
(Joe, w3) ∈ WAE
(Moody, w3) ∈ WAB
(Acct, w3) ∈ WAD
Choose n-ary when possible! (Avoids redundancy, update anomalies)
Example Design
We will model a university database
Main entities:
• Professor
• Projects
• Departments
• Graduate students
• etc…
[Diagram, built up over several slides]
Entity sets and attributes:
• professor: SSN, name, area, rank
• project: proj-number, sponsor, start, budget
• dept: dept-no, name, office, homepage
• grad: SSN, name, age, degree
Relationship sets added in later steps: PI, Co-PI, RA, Supervises, Appt, Chair, Major, Mentor
Other labels: Time (%) (a relationship attribute), advisor and advisee (roles)
And so on…
Summary
Entity-relationship Model
Intuitive diagram-based representation of domain knowledge, data
properties etc…
Two key concepts:
• Entities
• Relationships
Additional Details:
• Relationship cardinalities
• Keys
• Participation Constraints
• …
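Review: Entity-Relationship Model
Basics
[Diagram: entity set E1 (a1, …, an) connected by relationship set R (c1, …, ck) to entity set E2 (b1, …, bm)]
Rectangle (E1): Entity set
Diamond (R): Relationship set
Ellipse (a): Attribute (primary key if underlined)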
Thoughts…
Nothing about actual data
How is it stored ?
No talk about the query languages
How do we access the data ?
Semantic vs Syntactic Data Models
Remember: E/R Model is used for conceptual modeling
Many conceptual models have the same properties
They are much more about representing the knowledge
than about database storage/querying
Thoughts…
Basic design principles
Faithful
Must make sense
Satisfies the application requirements
Models the requisite domain knowledge
If not modeled, lost afterwards
Avoid redundancy
Potential for inconsistencies
Go for simplicity
Typically an iterative process that goes back and forth
Relational Data Model
Introduced by Ted Codd (late 60’s – early 70’s)
• Before = “Network Data Model” (Cobol as DDL, DML)
• Very contentious: Database Wars (Charlie Bachman vs. Mike Stonebraker)
Relational data model contributes:
1. Separation of logical, physical data models (data independence)
2. Declarative query languages
3. Formal semantics
4. Query optimization (key to commercial success)
Key Abstraction: Relation
Account =
bname      acct_no   balance
Downtown   A-101     500
Brighton   A-201     900
Brighton   A-217     500
Terms:
• Tables (aka: Relations)
Why called Relations?
Why Called Relations?
Mathematical relations
Given sets: R = {1, 2, 3}, S = {3, 4}
• R × S = { (1, 3), (1, 4), (2, 3), (2, 4), (3, 3), (3, 4) }
• A relation on R, S is any subset (⊆) of R × S (e.g: { (1, 4), (3, 4)})
Database relations
Given attribute domains
Branches = { Downtown, Brighton, … }
Accounts = { A-101, A-201, A-217, … }
Balances = ℝ (the real numbers)
Account ⊆ Branches × Accounts × Balances
{ (Downtown, A-101, 500),
(Brighton, A-201, 900),
(Brighton, A-217, 500) }
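The two definitions can be tried out directly with Python sets. This sketch uses `itertools.product` for the cross product; the helper `is_relation_on` and its per-domain membership tests are assumptions introduced here, needed because the balance domain is infinite and cannot be enumerated.

```python
from itertools import product

# Mathematical relation: any subset of R × S.
R = {1, 2, 3}
S = {3, 4}
cross = set(product(R, S))
relation = {(1, 4), (3, 4)}
print(len(cross))          # 6
print(relation <= cross)   # True: relation is a subset of R × S

def is_relation_on(tuples, *domain_checks):
    """True if every tuple has one component per domain, each inside its domain."""
    return all(
        len(t) == len(domain_checks)
        and all(check(value) for check, value in zip(domain_checks, t))
        for t in tuples
    )

# Database relation: a subset of Branches × Accounts × Balances.
branches = {"Downtown", "Brighton"}
accounts = {"A-101", "A-201", "A-217"}
account = {("Downtown", "A-101", 500),
           ("Brighton", "A-201", 900),
           ("Brighton", "A-217", 500)}

print(is_relation_on(
    account,
    lambda b: b in branches,
    lambda a: a in accounts,
    lambda x: isinstance(x, (int, float)),   # stands in for the real numbers
))  # True
```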
Relations
Account =
bname      acct_no   balance
Downtown   A-101     500
Brighton   A-201     900
Brighton   A-217     500
Considered equivalent to…
{ (Downtown, A-101, 500),
(Brighton, A-201, 900),
(Brighton, A-217, 500) }
Relational database semantics defined in terms of mathematical relations
Terms:
• Tables (aka: Relations)
• Rows (aka: tuples)
• Columns (aka: attributes)
• Schema (e.g.: Acct_Schema = (bname, acct_no, balance))
Definitions
1. Relation Schema (or Schema)
A list of attributes and their domains
We will require the domains to be atomic
Programming language equivalent: A variable (e.g. x)
E.g. account(account-number, branch-name, balance)
2. Relation Instance
A particular instantiation of a relation with actual values
Will change with time
Programming language equivalent: Value of a variable
bname      acct_no   balance
Downtown   A-101     500
Brighton   A-201     900
Brighton   A-217     500
Rest of the Class
• Converting from an E/R diagram to a relational schema
– Remember: We still use E/R models for conceptual modeling of the database
• Relational Algebra
– Data retrieval language
E/R Diagrams → Relations
Convert entity sets into a relational schema with the same set of attributes
Customer (cname, ccity, cstreet) → Customer_Schema(cname, ccity, cstreet)
Branch (bname, bcity, assets) → Branch_Schema(bname, bcity, assets)
E/R Diagrams → Relations
Convert relationship sets also into a relational schema
Remember: A relationship is completely described by primary keys of associated entities and its own attributes
Account (acct-no, balance) → Account_Schema(acct-no, balance)
Depositor (access-date) → Depositor_Schema(cname, acct-no, access-date)
Customer (cname, ccity, cstreet) → Customer_Schema(cname, ccity, cstreet)
Well… Not quite. We can do better.
It depends on the relationship cardinality
E/R Diagrams → Relations
Say One-to-Many Relationship from Customer to Account
⇒ Many accounts per customer
Account + Depositor → Account_Schema(acct-no, balance, cname, access-date)
Customer → Customer_Schema(cname, ccity, cstreet)
Exactly same information, fewer tables
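The "same information" claim can be checked on a small instance: fold Depositor into Account, then project both original relations back out. This is a sketch under assumptions: the function name, the access dates and the use of None for an account with no depositor are all introduced here.

```python
def fold_depositor_into_account(account, depositor):
    """Merge Depositor(cname, acct_no, access_date) into Account(acct_no, balance)."""
    owner = {}
    for cname, acct_no, access_date in depositor:
        if acct_no in owner:
            raise ValueError(f"{acct_no} has two depositors: not one-to-many")
        owner[acct_no] = (cname, access_date)
    # Accounts with no depositor are padded with (None, None); this is an assumption.
    return {(acct_no, balance, *owner.get(acct_no, (None, None)))
            for acct_no, balance in account}

account = {("A-101", 500), ("A-201", 900), ("A-217", 750)}
depositor = {("Johnson", "A-101", "2024-01-05"),   # hypothetical access dates
             ("Johnson", "A-201", "2024-02-11"),
             ("Jones", "A-217", "2024-03-02")}

merged = fold_depositor_into_account(account, depositor)

# Projections of the merged table recover both original tables.
account_again = {(a, b) for a, b, _, _ in merged}
depositor_again = {(c, a, d) for a, _, c, d in merged if c is not None}
print(account_again == account and depositor_again == depositor)  # True
```

The fold only works because each account has at most one depositor; with a many-to-many relationship the function raises instead of silently dropping a customer.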
E/R Diagrams → Relations
E/R → Relational Schema
Entity Sets
[Diagram: entity set E1 with attributes a1, …, an]
E1 = (a1, …, an)
Relationship Sets
[Diagram: E1 (a1, …, an) connected by R (c1, …, ck) to E2 (b1, …, bm)]
R = (a1, b1, c1, …, ck)
a1: E1’s key
b1: E2’s key
c1, …, ck: attributes of R
Not the whole story for Relationship Sets …
E/R Diagrams → Relations
Relationship Cardinality → Relational Schema
[Diagram: E1 (a1, …, an) connected by R (c1, …, ck) to E2 (b1, …, bm); a1 and b1 are the keys]
n:m
E1 = (a1, …, an)
E2 = (b1, …, bm)
R = (a1, b1, c1, …, ck)
n:1
E1 = (a1, …, an, b1, c1, …, ck)
E2 = (b1, …, bm)
1:n
E1 = (a1, …, an)
E2 = (b1, …, bm, a1, c1, …, ck)
1:1
Treat as n:1 or 1:n
Translating E/R Diagrams to Relations
[Diagram: entity sets Account (acct_no, balance), Branch (bname, bcity, assets), Customer (cname, cstreet, ccity), Loan (lno, amt); relationship sets Acct-Branch, Loan-Branch, Depositor, Borrower]
Q. How many tables does this get translated into?
A. 6 (account, branch, customer, loan, depositor, borrower)
Bank Database
Account(bname, acct_no, balance)
Branch(bname, bcity, assets)
Depositor(cname, acct_no)
Borrower(cname, lno)
Customer(cname, cstreet, ccity)
Loan(bname, lno, amt)
Bank Database

Account
bname      acct_no   balance
Downtown   A-101     500
Mianus     A-215     700
Perry      A-102     400
R.H.       A-305     350
Brighton   A-201     900
Redwood    A-222     700
Brighton   A-217     750

Branch
bname      bcity        assets
Downtown   Brooklyn     9M
Redwood    Palo Alto    2.1M
Perry      Horseneck    1.7M
Mianus     Horseneck    0.4M
R.H.       Horseneck    8M
Pownel     Bennington   0.3M
N. Town    Rye          3.7M
Brighton   Brooklyn     7.1M

Depositor
cname      acct_no
Johnson    A-101
Smith      A-215
Hayes      A-102
Turner     A-305
Johnson    A-201
Jones      A-217
Lindsay    A-222

Customer
cname      cstreet     ccity
Jones      Main        Harrison
Smith      North       Rye
Hayes      Main        Harrison
Curry      North       Rye
Lindsay    Park        Pittsfield
Turner     Putnam      Stanford
Williams   Nassau      Princeton
Adams      Spring      Pittsfield
Johnson    Alma        Palo Alto
Glenn      Sand Hill   Woodside
Brooks     Senator     Brooklyn
Green      Walnut      Stanford

Borrower
cname      lno
Jones      L-17
Smith      L-23
Hayes      L-15
Jackson    L-14
Curry      L-93
Smith      L-11
Williams   L-17
Adams      L-16

Loan
bname      lno    amt
Downtown   L-17   1000
Redwood    L-23   2000
Perry      L-15   1500
Downtown   L-14   1500
Mianus     L-93   500
R.H.       L-11   900
Perry      L-16   1300
E/R Diagrams & Relations
E/R → Relational Schema
Weak Entity Sets
[Diagram: strong entity set E1 (a1, …, an) connected by identifying relationship IR to weak entity set E2 (b1, …, bm)]
E1 = (a1, …, an)
E2 = (a1, b1, …, bm)
E/R Diagrams & Relations
E/R → Relational Schema
Multivalued Attributes
[Diagram: Employee with attributes ssn, name and multivalued attribute phone]
Emp = (ssn, name)
Emp-Phones = (ssn, phone)

Emp
ssn   name
001   Smith
…     …

Emp-Phones
ssn   phone
001   4-1234
001   4-5678
…     …
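A short Python sketch of the same split: each employee record carries a list of phones, and the translation produces one Emp row per employee plus one Emp-Phones row per (ssn, phone) pair. The dictionary layout and the second employee are assumptions added for illustration.

```python
employees = [
    {"ssn": "001", "name": "Smith", "phones": ["4-1234", "4-5678"]},
    {"ssn": "002", "name": "Jones", "phones": []},   # hypothetical: no phone
]

# Emp = (ssn, name): one row per employee.
emp = [(e["ssn"], e["name"]) for e in employees]

# Emp-Phones = (ssn, phone): one row per value of the multivalued attribute.
emp_phones = [(e["ssn"], phone) for e in employees for phone in e["phones"]]

print(emp)         # [('001', 'Smith'), ('002', 'Jones')]
print(emp_phones)  # [('001', '4-1234'), ('001', '4-5678')]
```

An employee with no phone simply contributes no rows to Emp-Phones, so no placeholder values are needed.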
E/R Diagrams & Relations
E/R → Relational Schema
Subclasses
[Diagram: E (a1, …, an) with ISA subclasses E1 (b1, …, bm) and E2 (c1, …, ck)]
Method 1:
E = (a1, …, an)
E1 = (a1, b1, …, bm)
E2 = (a1, c1, …, ck)
Method 2:
E1 = (a1, …, an, b1, …, bm)
E2 = (a1, …, an, c1, …, ck)
E/R Diagrams & Relations
Subclasses example:
Method 1:
Account = (acct_no, balance)
SAccount = (acct_no, interest)
CAccount = (acct_no, overdraft)
Method 2:
SAccount = (acct_no, balance, interest)
CAccount = (acct_no, balance, overdraft)
Q: When is method 2 not possible?
A: When subclassing is partial
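The answer can be seen on a tiny instance. In this sketch, an account that is neither a savings nor a checking account (partial subclassing) survives method 1 but has nowhere to go under method 2. The record layout, function names and the interest and overdraft values are assumptions.

```python
accounts = [
    {"acct_no": "A-101", "balance": 500, "interest": 0.02},   # savings
    {"acct_no": "A-215", "balance": 700, "overdraft": 100},   # checking
    {"acct_no": "A-305", "balance": 350},                      # in neither subclass
]

def method1(rows):
    """Superclass table plus one table per subclass, linked by acct_no."""
    account = [(r["acct_no"], r["balance"]) for r in rows]
    saccount = [(r["acct_no"], r["interest"]) for r in rows if "interest" in r]
    caccount = [(r["acct_no"], r["overdraft"]) for r in rows if "overdraft" in r]
    return account, saccount, caccount

def method2(rows):
    """One table per subclass, each repeating the inherited attributes."""
    saccount = [(r["acct_no"], r["balance"], r["interest"])
                for r in rows if "interest" in r]
    caccount = [(r["acct_no"], r["balance"], r["overdraft"])
                for r in rows if "overdraft" in r]
    return saccount, caccount

account, _, _ = method1(accounts)
saccount2, caccount2 = method2(accounts)

kept_by_method1 = {row[0] for row in account}
kept_by_method2 = {row[0] for row in saccount2 + caccount2}
print(kept_by_method1 - kept_by_method2)  # {'A-305'}: lost under method 2
```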
Keys and Relations
As in the E/R Model:
1. Superkeys
• set of attributes of table for which every row has distinct set of values
2. Candidate keys
•“minimal” superkeys
3. Primary keys
•DBA-chosen candidate keys
Act as Integrity Constraints
i.e., guard against illegal/invalid instance of given schema
e.g., Branch = (bname, bcity, assets)
bname      bcity      assets
Brighton   Brooklyn   5M
Brighton   Boston     3M
Invalid!! (with bname as the key, two rows cannot share the value Brighton)
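A database system enforces this kind of constraint for you. The sketch below uses Python's standard `sqlite3` module and assumes bname is declared as the primary key of a `branch` table; the second insert is rejected.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE branch (bname TEXT PRIMARY KEY, bcity TEXT, assets TEXT)")
conn.execute("INSERT INTO branch VALUES ('Brighton', 'Brooklyn', '5M')")

# A second row with the same bname would make the instance invalid.
try:
    conn.execute("INSERT INTO branch VALUES ('Brighton', 'Boston', '3M')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

print(rejected)  # True
print(conn.execute("SELECT COUNT(*) FROM branch").fetchone()[0])  # 1
```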