Entity Relationship Model

advertisement
Introduction to Database
Design
Entity Relationship Model
Design of a Database

Design phases:

Requirement Analysis
• Talk to people and figure out what they want

Conceptual Database Design
• Do the design
• Many tools/modeling techniques


ER, UML, Rambaugh, Booch, Yordon
Logical Database Design
• Actual database tables in relational model, or OO model
or XML model
• Here – only relational model.
Overview of Database
Design

Conceptual design: (ER Model is used at this
stage.)





What are the entities and relationships in the enterprise?
What information about these entities and relationships
should we store in the database?
What are the integrity constraints or business rules that
hold?
A database `schema’ in the ER Model can be
represented pictorially (ER diagrams).
Can map an ER diagram into a relational schema.
Entity-Relationship Model
 Entity
Sets
 Relationship Sets
 Mapping Constraints
 Keys
 E-R Diagram
 Extended E-R Features
 Design Issues
 Design of an E-R Database Schema
 Reduction of an E-R Schema to Tables
Entity Sets

A database can be modeled as:





a collection of entities,
relationship among entities.
An entity is an object that exists and is
distinguishable from other objects.
Example: specific person, company, event, plant
Entities are described using attributes
Example: people have names and addresses
An entity set is a set of entities of the same type that
share the same properties.
Example: set of all persons, companies, trees,
holidays
Entity Sets customer and
loan
customer-id customer- customer- customerloan- amount
name
street
city
number
Attributes



An entity is represented by a set of attributes, that is descriptive
properties possessed by all members of an entity set.
Example:
customer = (customer-id, customer-name,
customer-street, customer-city)
loan = (loan-number, amount)
Domain – the set of permitted values for each attribute
Keys: Minimal set of attributes whose values uniquely identify an
entity in the set


Candidate Keys: all sets of attributes that can potentially be a key.
Primary Key: One of the candidate keys is chosen to be a “primary”
key.
Relationship Sets

A relationship is an association among several entities
Example:
Hayes depositor
A-102
customer entity relationship set

account entity
A relationship set is a mathematical relation among n 
2 entities, each taken from entity sets
{(e1, e2, … en) | e1  E1, e2  E2, …, en  En}
where (e1, e2, …, en) is a relationship



Example:
(Hayes, A-102)  depositor
There can be multiple relationship sets between the
same two entities.
A relationship must be uniquely identified by the
participating entities.
Relationship Set borrower
Descriptive Attributes


Descriptive attributes: used to record information
about the relationship
When was the last time that the customer accessed
his/her account.
E-R Diagrams
Rectangles represent entity sets.
Diamonds represent relationship sets.
Lines link attributes to entity sets and entity sets to relationship
sets.
Ellipses represent attributes
Underline indicates primary key attributes (coming up)
Ternary Relationships


Ternary relationships - used to record associations
between three entity sets.
Example: Each branch has several jobs that can be
worked on by

For this we need to record the association between
employees, branches and jobs.
Roles/Self Referential
Relationships

Entity sets of a relationship need not be distinct

The labels “manager” and “worker” are called roles; they
specify how employee entities interact via the works-for
relationship set.
Roles are indicated in E-R diagrams by labeling the lines that
connect diamonds to rectangles.
Role labels are optional, and are used to clarify semantics of
the relationship


Constraints in ER
 Key
Constraints
 Cardinality Constraints
 Participation Constraints
 Overlapping
Constraints (ISA)
 Coverage Constraints (ISA)
Key Constraints

Consider depositor relationship: A customer can deposit into
many accounts; an account can have many depositors.

Compare with: Each department has at most one Manager
Contrast with: Each customer can be the borrower on one loan. However,
each loan can have many borrowers.
The restriction that each customer can be borrower on one loan => Key
Constraint
Key Constraint II


Relationship set like borrower - sometimes said to
be one-to-many
Relationship set between customers and accounts ->
many-to-many
Key Constraint III

Additional Restriction: a loan may be borrowed by
only one customer -> one-to-one

Textbook clarification: arrow shown to go from
customer to borrower


Means same thing!
Implies that customer entity participates in the borrower
relationship set only once.
Key Constraints for Ternary
Relationships
 Key
constraints in binary relationships can
be easily extended to ternary.
Alternative Notation for
Cardinality Limits
 Cardinality limits can also express participation constraints
Participation Constraints
 Total participation (indicated by double/thick line): every
entity in the entity set participates in at least one relationship in
the relationship set
 E.g. participation of loan in borrower is total
 every loan must have a customer associated to it via
borrower
 Partial participation: some entities may not participate in any
relationship in the relationship set
 E.g. participation of customer in borrower is partial
 Not every customer has a loan
Keys


A super key of an entity set is a set of one or more
attributes whose values uniquely determine each
entity.
A candidate key of an entity set is a minimal super
key



Customer-id is candidate key of customer
account-number is candidate key of account
Although several candidate keys may exist, one of
the candidate keys is selected to be the primary
key.
Weak Entity Sets

Assumption so far:



Attributes associated with an entity contain a key (to uniquely
identify the entities)
Not always the case!
Example:


Employees can purchase policies to cover their dependents.
We need to record information about policies:
• Who is covered, Who owns the policy



This above is modeled via a Weak Entity Set.


Don’t really care about the dependents beyond that
If employee quits, policy is deleted and coverage for dependents
stopped!
An entity set that does not have a primary key is referred to as a
weak entity set.
Weak entity is uniquely identified by a conjunction of some of its
attributes and the primary key of another entity - Identifying
entity set
Weak Entity Sets

Restrictions:


it must relate to the identifying entity set via a one-tomany relationship set from the identifying to the weak
entity set
It must have total participation in the identifying
relationship set.
Weak Entity Sets (Cont.)

We depict a weak entity set by double rectangles.
 We underline the discriminator of a weak entity set
with a dashed line.
 payment-number – discriminator of the payment
entity set
 Primary key for payment – (loan-number, paymentnumber)
Conceptual Design Using the ER
Model

Design choices:




Should a concept be modeled as an entity or an attribute?
Should a concept be modeled as an entity or a relationship?
Identifying relationships: Binary or ternary? Aggregation?
Constraints in the ER Model:


A lot of data semantics can (and should) be captured.
But some constraints cannot be captured in ER diagrams.
•
Constraints on individual attributes of an entity

Employee enitites must have age > 24
Entity vs. Attribute



Remember – attribute values are atomic (cannot be
broken down further)
Should address be an attribute of Employees or an
entity (connected to Employees by a relationship)?
Depends upon the use of address information, and the
semantics of the data:
• If we have several addresses per employee, address must
be an entity (since attributes cannot be set-valued).
• If address is to be shared by many employees, address
should be an entity.
• If the structure (city, street, etc.) is important, e.g., we want
to retrieve employees in a given city, address must be
modeled as an entity (since attribute values are atomic).
Entity vs. Attribute (Contd.)
from


name
to
Works_In2 does not
did
ssn
lot
budget
allow an employee to
Departments
Works_In2
Employees
work in a department
for two or more periods.
Similar to the problem
of wanting to record
several addresses for an
name
dname
ssn
lot
did
budget
employee: we want to
record several values of
Works_In3
Departments
Employees
the descriptive attributes
for each instance of this
Duration
to
from
relationship.
dname
Entity vs. Relationship


First ER diagram OK if
name
a manager gets a
ssn
separate discretionary
Employees
budget for each dept.
What if a manager gets
a discretionary budget
name
ssn
that covers all
managed depts?
Employees

Redundancy of dbudget,
which is stored for each
dept managed by the
manager.
- Misleading: suggests dbudget
tied to managed dept.
since
dbudget
dname
lot
did
budget
Departments
Manages2
dname
lot
did
Manages3
budget
Departments
since
apptnum
Mgr_Appts
dbudget
Binary vs. Ternary
Relationships
name
ssn
pname
lot
Employees
age
Dependents
Covers
Policies
policyid


name
Consider Figure 1 - What
doeslotit
ssn
depict?
Employees
Additional constraints:



Purchaser
cost
pname
age
Dependents
Beneficiary
A policy cannot be owned jointly by two
employees
Policies
Every policy must be owned by some
employee
policyid
cost
Dependents is a weak entity set - uniquely
identified by policyId
Binary vs Ternary

Constraint 1: Add a key constraint on Policies
with respect to Covers


Constraint 2: Total participation constraint on
Policies


Side effect: policy can cover only one dependent
Ok if each policy covers at least one dependent
Constraint 3: Introduce an indentifying
relationship set
Better Solution
name
ssn
pname
lot
Employees
age
Dependents
Covers
Policies
policyid
cost
name
ssn
pname
age
lot
Dependents
Employees
Purchaser
Beneficiary
Policies
policyid
cost
Are you awake?
ER Group Exercise
Class (ISA) Hierarchies
 As
in C++ or
Java, attributes
are inherited
 If we declare A
ISA B, every A
entity is also
considered to
be a B entity.
ISA Hierarchy Constraints

Overlap Constraints: Can Joe be both an
employee and a customer?
(Allowed/Disallowed)
 Does every employee entity also have to be an
officer or teller or secretary entity? (Yes/No)
 Reasons for using ISA:


To add attributes specific to a subclass
To identify entities that participate in a relationship
Aggregation
name
ssn
lot
Employees

Used when we have
to model a
relationship
involving (entitity
sets and) a
pid
relationship set.

Monitors
started_on
pbudget
until
since
did
dname
budget
Aggregation allows
Sponsors
Departments
Projects
us to treat a
relationship set as an
entity set for
 Aggregation vs. ternary relationship:
purposes of
 Monitors is a distinct relationship,
participation in
(other) relationships.with a descriptive attribute.
Also, can say that each sponsorship
is monitored by at most one employee.

Case Study (from Text Book)
 See
Handout
 Addition to earlier exercise.
Summary of Conceptual
Design

Conceptual design follows requirements analysis,


ER model popular for conceptual design




Yields a high-level description of data to be stored
Constructs are expressive, close to the way people think
about their applications.
Basic constructs: entities, relationships, and
attributes (of entities and relationships).
Some additional constructs: weak entities, ISA
hierarchies, and aggregation.
Note: There are many variations on ER model.
Summary of ER (Contd.)

Several kinds of integrity constraints can be
expressed in the ER model: key constraints,
participation constraints, and overlap/covering
constraints for ISA hierarchies. Some foreign key
constraints are also implicit in the definition of a
relationship set.


Some constraints (notably, functional dependencies) cannot
be expressed in the ER model.
Constraints play an important role in determining the best
database design for an enterprise.
Summary of ER (Contd.)

ER design is subjective. There are often many
ways to model a given scenario! Analyzing
alternatives can be tricky, especially for a large
enterprise. Common choices include:


Entity vs. attribute, entity vs. relationship, binary or n-ary
relationship, whether or not to use ISA hierarchies, and
whether or not to use aggregation.
Ensuring good database design: resulting
relational schema should be analyzed and refined
further. FD information and normalization
techniques are especially useful.
Summary of Symbols Used
in E-R Notation
Summary of Symbols (Cont.)
Alternative E-R Notations
UML
 UML:
Unified Modeling Language
 UML has many components to graphically
model different aspects of an entire
software system
 UML Class Diagrams correspond to E-R
Diagram, but several differences.
Summary of UML Class Diagram
Notation
UML Class Diagrams
(Contd.)





Entity sets are shown as boxes, and attributes are shown within the
box, rather than as separate ellipses in E-R diagrams.
Binary relationship sets are represented in UML by just drawing a
line connecting the entity sets. The relationship set name is written
adjacent to the line.
The role played by an entity set in a relationship set may also be
specified by writing the role name on the line, adjacent to the entity
set.
The relationship set name may alternatively be written in a box,
along with attributes of the relationship set, and the box is
connected, using a dotted line, to the line depicting the relationship
set.
Non-binary relationships cannot be directly represented in UML -they have to be converted to binary relationships.
UML Class Diagram Notation
(Cont.)
*Note reversal of position in cardinality constraint depiction
UML Class Diagrams (Contd.)




Cardinality constraints are specified in the form l..h,
where l denotes the minimum and h the maximum
number of relationships an entity can participate in.
Beware: the positioning of the constraints is exactly the
reverse of the positioning of constraints in E-R diagrams.
The constraint 0..* on the E2 side and 0..1 on the E1 side
means that each E2 entity can participate in at most one
relationship, whereas each E1 entity can participate in
many relationships; in other words, the relationship is
many to one from E2 to E1.
Single values, such as 1 or * may be written on edges;
The single value 1 on an edge is treated as equivalent to
1..1, while * is equivalent to 0..*.
Download