
Title: ENHANCEMENT OF ERD
Research by:
Student Name : Wafa Ali Edrees
Student Id : 201130061
College of Computer Science and Information System
Level : 5
Teacher : Arshia Arjumand Banu
ENHANCEMENT OF ERD
ABSTRACT:
This paper describes the inclusion of normalization principles in the entity relationship
diagram (ERD), a technique widely used in data modeling. An ERD is developed during the
conceptual data modeling phase of the database development process. With the concept of
normalization, ERDs are then enhanced and transformed in the logical database design
phase. Applying normalization during ERD development allows for more robust requirement
analysis.
Keywords: EER, ERD, UML, DBMS, Normalization
INTRODUCTION:
Data modeling is an essential technique used to analyze business
requirements and an essential component of database design and development. The entity
relationship diagram (ERD) is one of the most widely used techniques for data
modeling.
Data modeling is performed during the initial phases of the database life cycle. In this
process, the first two phases are concerned with the information content of the database,
while the last two phases are concerned with the implementation of the database on
some commercial DBMS.
During the conceptual data modeling phase, data requirements are expressed through an
ERD. The conceptual data modeling phase in general is independent of a DBMS. The
logical design phase transforms the conceptual data model into a format understandable
to a DBMS. This phase may also enhance or refine the data model (ERD) of the previous
phase to ensure efficient utilization of the database.
One of the ways an ERD is enhanced during the logical design phase is through the
process of normalization. Normalization is one of the key tenets in relational model
design. It is the process of removing redundancy in a table so that the table is easier to
modify. It usually involves dividing an entity table into two or more tables and defining
relationships between the tables. The objective is to isolate data so that additions,
deletions, and modifications of an attribute can be made in just one table and then
propagated through the rest of the database via the defined relationships.
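The idea of changing an attribute in just one table and letting the relationship carry the
change can be sketched with Python's built-in sqlite3 module. The table names, column
names, and sample values below are hypothetical and chosen only for illustration; this is
a minimal sketch, not the schema used in this paper.

```python
import sqlite3

# In-memory database; all names and values are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE city (
        city   TEXT PRIMARY KEY,
        status TEXT NOT NULL
    );
    CREATE TABLE student (
        sid  INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        city TEXT NOT NULL REFERENCES city(city)
    );
    INSERT INTO city VALUES ('Northtown', 'small'), ('Southport', 'large');
    INSERT INTO student VALUES
        (1, 'Amal', 'Northtown'),
        (2, 'Noura', 'Northtown'),
        (3, 'Sara', 'Southport');
""")

# The status of a city is stored once, so one UPDATE in one table is enough.
conn.execute("UPDATE city SET status = 'medium' WHERE city = 'Northtown'")

# Every student row sees the new value through the defined relationship.
rows = conn.execute("""
    SELECT s.sid, s.name, c.status
    FROM student AS s
    JOIN city AS c ON c.city = s.city
    ORDER BY s.sid
""").fetchall()
print(rows)
```

Had the status been stored in every student row instead, the same change would have
required updating each matching row, with the risk of missing one.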
Normalization utilizes association among attributes within an entity table to accomplish
its objective. Since an ERD also utilizes association among attributes as a basis to
identify entity type structure, it is possible to apply normalization principles during the
conceptual data modeling phase. Performing normalization during ERD development
can improve the conceptual model, and speed its implementation. This paper outlines the
application of normalization principles to ERD development during the conceptual
modeling phase. There are various standards for ERD.
Related work:
In a traditional environment there is face-to-face interaction. In this environment, data
normalization would be introduced. Files act locally, whereas a DBMS saves data directly in
a database. In a file system transactions are not possible, whereas various operations such
as insert, delete, view, and update are possible in a DBMS. In a file system data is
accessed through a single file or several files, whereas in a DBMS tables (the schema) are
used to access data. A "file manager" stores all relationships in directories in a file
system, whereas a database manager (administrator) stores the relationships in the form of
structured tables. Data in databases is more secure than data in files.
Advantages
• Reduced data redundancy
• Reduced updating errors and increased consistency
• Greater data integrity and independence from application programs
• Improved data access to users through use of host and query languages
• Improved data security
• Reduced data entry, storage, and retrieval costs
• Facilitated development of new application programs
Disadvantages
• Database systems are complex, difficult, and time-consuming to design
• Substantial hardware and software start-up costs
• Damage to the database affects virtually all application programs
• Extensive conversion costs in moving from a file-based system to a database
system
• Initial training required for all programmers and users
Proposed work:
Normalized ERD
The representation of dependency concepts in an ERD is now used to apply the normal
forms. Each normal form rule and its application are outlined.
First Normal Form (1NF)
The first normal form rule is that there should be no nesting or repeating groups in a
table. An entity type in which each attribute holds only one value per entity instance
satisfies first normal form. So, in a way, any entity type with an entity identifier is
in first normal form by default. For example, the entity type Student is in first normal
form.
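As a rough illustration of the rule, the sketch below detects a repeating group in a
record and flattens it so that each attribute holds one value. The helper names and the
multi-valued Activity attribute on Student are assumptions made for this example only.

```python
def has_repeating_group(row):
    """Return True if any attribute of the row holds a collection of values."""
    return any(isinstance(value, (list, tuple, set)) for value in row.values())


def flatten(rows, attr):
    """Emit one row per value of `attr`, so each attribute holds a single value."""
    flat = []
    for row in rows:
        value = row[attr]
        values = value if isinstance(value, (list, tuple)) else [value]
        for single in values:
            flat.append({**row, attr: single})
    return flat


# Hypothetical Student instances with a repeating Activity group.
students = [
    {"SID": 1, "Name": "Amal", "Activity": ["Chess", "Tennis"]},
    {"SID": 2, "Name": "Noura", "Activity": ["Chess"]},
]

flat_students = flatten(students, "Activity")
for row in flat_students:
    print(row)
```

After flattening, SID alone no longer identifies a row; the identifier of the flattened
structure becomes the combination of SID and Activity.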
Second Normal Form (2NF)
The second normal form rule is that every non-key attribute depends on the whole key,
not on just part of it. A violation of second normal form occurs when there is a
composite key and part of the key determines some non-key attributes. In ERD terms, the
second normal form deals with the situation in which the entity identifier contains two
or more attributes and a non-key attribute depends on only part of the entity identifier.
For example, consider the modified entity type Student, which has a composite entity
identifier consisting of the SID and City attributes.
An entity instance of this entity type is shown in the following figure. Now, if there is a
functional dependency City → Status, then the entity type structure will violate the second
normal form.
To resolve the violation of the second normal form, a separate entity type City with a
one-to-many relationship is created. The relationship cardinalities can be further modified
to reflect organizational working. In general, the second normal form violation can be
avoided by ensuring that there is only one attribute as an entity identifier.
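The second normal form check described above can be sketched as a search for partial
dependencies. The function name and the way dependencies are written (pairs of attribute
sets) are assumptions for illustration; the sketch inspects only the dependencies it is
given, does not derive implied ones, and assumes a single entity identifier.

```python
def partial_dependencies(key, fds):
    """Return dependencies whose determinant is a proper subset of the key
    and which determine at least one non-key attribute."""
    key = frozenset(key)
    found = []
    for lhs, rhs in fds:
        lhs, rhs = frozenset(lhs), frozenset(rhs)
        non_key = rhs - key
        if lhs < key and non_key:  # "<" is the proper-subset test on sets
            found.append((lhs, non_key))
    return found


# Modified Student entity type: composite identifier (SID, City).
student_key = {"SID", "City"}
student_fds = [
    ({"SID", "City"}, {"Name", "Status"}),  # whole identifier determines the rest
    ({"City"}, {"Status"}),                 # part of the identifier determines Status
]

violations = partial_dependencies(student_key, student_fds)
print(violations)
```

Each reported dependency is a candidate for its own entity type, here City with the
attribute Status, linked back to Student by a one-to-many relationship.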
Third Normal Form (3NF)
The third normal form rule is that the non-key attributes should be independent. This
normal form is violated when there exists a dependency among non-key attributes in the
form of a transitive dependency. For example, consider the entity type Student. In this
entity type, there is a functional dependency BuildingName → Fee that violates the third
normal form.
Transitive dependency is resolved by moving the dependency attributes to a new entity
type with one-to-many relationship. In the new entity type the determinant of the
dependency becomes the entity identifier. The resolution of the third normal form is
shown in the following figure. The relationship cardinalities can be further modified to
reflect organizational working.
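The resolution described above, moving the dependent attributes to a new entity type
whose identifier is the determinant, can be sketched on sample data as follows. The
function name and the sample values are hypothetical.

```python
def split_on_dependency(rows, determinant, dependents):
    """Move `dependents` out of `rows` into a new table keyed by `determinant`.

    Raises ValueError if the data contradicts determinant -> dependents.
    """
    lookup = {}
    remaining = []
    for row in rows:
        key = row[determinant]
        values = tuple(row[attr] for attr in dependents)
        if lookup.setdefault(key, values) != values:
            raise ValueError(f"{determinant} does not determine {dependents}")
        remaining.append({k: v for k, v in row.items() if k not in dependents})
    new_rows = [{determinant: key, **dict(zip(dependents, values))}
                for key, values in lookup.items()]
    return remaining, new_rows


# Hypothetical Student instances with the dependency BuildingName -> Fee.
students = [
    {"SID": 1, "Name": "Amal", "BuildingName": "North", "Fee": 500},
    {"SID": 2, "Name": "Noura", "BuildingName": "North", "Fee": 500},
    {"SID": 3, "Name": "Sara", "BuildingName": "South", "Fee": 650},
]

student_rows, building_rows = split_on_dependency(students, "BuildingName", ["Fee"])
print(student_rows)
print(building_rows)
```

BuildingName stays in the Student rows as the link to the new Building entity type, so
each fee is now stored once per building rather than once per student.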
Boyce-Codd Normal Form (BCNF)
The Boyce-Codd normal form (BCNF) extends the third normal form. The Boyce-Codd
normal form rule is that every determinant is a candidate key. Even though Boyce-Codd
normal form and third normal form generally produce the same result, Boyce-Codd
normal form is a stronger definition than third normal form. Every table in Boyce-Codd
normal form is by definition in third normal form. Boyce-Codd normal form considers
two special cases not covered by third normal form:
1. Part of a composite entity identifier determines part of its attribute, and
2. A non entity identifier attribute determines part of an entity identifier attribute.
These situations are only possible if there is a composite entity identifier, and
dependencies exist from a non-entity identifier attribute to part of the entity identifier.
For example, consider the entity type Student Concentration. The entity type is in third
normal form, but since there is a dependency FacultyName → MajorMinor, it is not in
Boyce-Codd normal form.
To bring the Student Concentration entity type into Boyce-Codd normal form,
another entity type Faculty with a one-to-many relationship is constructed as shown in the
following figure. The relationship cardinalities can be further modified to reflect
organizational working.
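The rule that every determinant is a candidate key can be checked mechanically with the
standard attribute closure computation. In the sketch below the attribute list of Student
Concentration (SID, MajorMinor, FacultyName) and its composite identifier are assumptions,
since the figure is not reproduced here; the function names are also illustrative.

```python
def closure(attrs, fds):
    """Return every attribute determined by `attrs` under the dependencies `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def bcnf_violations(all_attrs, fds):
    """Return non-trivial dependencies whose determinant does not determine all attributes."""
    return [(lhs, rhs) for lhs, rhs in fds
            if not set(rhs) <= set(lhs)
            and closure(lhs, fds) != set(all_attrs)]


# Assumed structure of Student Concentration.
attrs = ("SID", "MajorMinor", "FacultyName")
fds = [
    (("SID", "MajorMinor"), ("FacultyName",)),  # composite entity identifier
    (("FacultyName",), ("MajorMinor",)),        # non-identifier determines part of the identifier
]

violations = bcnf_violations(attrs, fds)
print(violations)
```

The composite identifier passes because its closure covers every attribute, while
FacultyName is reported because it determines MajorMinor without being a key.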
Fourth Normal Form (4NF)
The fourth normal form rule is that there should not be more than one multi-valued
dependency in a table. For example, consider the Student Details entity type. If,
during requirements analysis, it is found that the Major Minor values of a student are
independent of the Activity performed by the student, then the entity type structure will
violate the fourth normal form. To resolve the violation of the fourth normal form,
separate weak entity types with identifying relationships are created. The Student Focus
and Student Activity entity types are weak entity types. The relationship cardinalities
can be further modified to reflect organizational working. It is now presumed that the
Student entity type has the functional dependency SID → Name, Street, City, Zip.
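The split into Student Focus and Student Activity can be sketched as two projections of
the Student Details data, followed by a check that joining them on SID gives back the
original combinations. The sample values and the helper name are hypothetical.

```python
def project(rows, attrs):
    """Return the distinct combinations of `attrs` found in `rows`, sorted."""
    return sorted({tuple(row[attr] for attr in attrs) for row in rows})


# Hypothetical Student Details instances: majors and activities vary independently.
details = [
    {"SID": 1, "MajorMinor": "Math", "Activity": "Chess"},
    {"SID": 1, "MajorMinor": "Math", "Activity": "Tennis"},
    {"SID": 1, "MajorMinor": "Physics", "Activity": "Chess"},
    {"SID": 1, "MajorMinor": "Physics", "Activity": "Tennis"},
    {"SID": 2, "MajorMinor": "History", "Activity": "Chess"},
]

# The two weak entity types, each identified by SID plus one multi-valued attribute.
student_focus = project(details, ("SID", "MajorMinor"))
student_activity = project(details, ("SID", "Activity"))

# Joining the two on SID reproduces the original combinations, so nothing is lost.
rejoined = sorted((sid, major, activity)
                  for sid, major in student_focus
                  for other_sid, activity in student_activity
                  if sid == other_sid)
print(student_focus)
print(student_activity)
print(rejoined == project(details, ("SID", "MajorMinor", "Activity")))
```

Student 1 needs four rows in Student Details but only two rows in each of the new entity
types, which is the redundancy the fourth normal form removes.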
Architecture:
The Logical Structures: Access to the data is made possible by a well-defined logical
organization composed of the following.
Logical structure: Description
Fields: A field holds a single piece of information, such as a name or an amount. A field
can hold one specific type of information. Fields are assembled into a structure called a
record. On its own, a field is not very useful, as it can hold only a limited amount of
information.
Records: A record is a logical structure assembled from an arbitrary number of fields. A
record stores a single entry in the database. The fields in a record store information
about important properties of the entry. Records are organized in tables.
Tables: A table can be thought of as an N times M matrix. Each of the N rows describes a
record and each of the M columns describes a field in the record. Tables are organized in
companies.
Companies: A company is a sub-database; its primary use is to separate and group large
portions of data together. A company can contain private tables as well as tables that are
shared with other companies.
The following illustration shows logical structures.
Application of Normalization to ERD:
Data modeling is an iterative process. Generally a preliminary data model is constructed
which is then refined many times. There are many guidelines (rules) for refining an
ERD. Some of these rules are as follows:
1. Transform attributes into entity types. This transformation involves the addition of
an entity type and a 1-M (one-to-many) relationship.
2. Split compound attributes into smaller attributes. A compound attribute contains
multiple kinds of data.
3. Expand entity types into two entity types and a relationship. This transformation
can be useful to record a finer level of detail about an entity.
4. Transform a weak entity type into a strong entity type. This transformation is most
useful for associative entity types.
5. Add historical details to a data model. Historical details may be necessary for
legal as well as strategic reporting requirements. This transformation can be
applied to attributes and relationships.
6. Add generalization hierarchies by transforming entity types into generalization
hierarchy.
Application of normalization principles toward ERD development enhances these
guidelines. To understand this application, (i) the representation of dependency concepts
in an ERD is outlined, followed by (ii) the representation of normal forms toward the
development of entity type structure. Guidelines for identifying the various
dependencies are omitted from this paper so as to focus more on their application. Only
the first four normal forms and the Boyce-Codd normal form are considered.
Representation of Dependencies
Functional dependency in an entity type is observed in the association between
the entity identifier and the other attributes as reflected in an entity instance. Each
entity instance represents a set of values taken by the non-entity-identifier attributes
for each primary key (entity identifier) value. So, in a way, an entity instance structure
also reflects an application of the functional dependency concept. For example, the Student
entity type can represent the functional dependency SID → Name, Street, City, Zip.
Each entity instance will now represent the functional dependency among the entity
attributes as shown.
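Whether a set of entity instances is consistent with a functional dependency can be
tested directly: equal determinant values must always come with equal dependent values.
The sketch below assumes instances are held as dictionaries; the function name and sample
values are illustrative. Sample data can only contradict a dependency, never prove that
it holds in general.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if no two rows agree on `lhs` but differ on `rhs`."""
    seen = {}
    for row in rows:
        left = tuple(row[attr] for attr in lhs)
        right = tuple(row[attr] for attr in rhs)
        if seen.setdefault(left, right) != right:
            return False
    return True


# Hypothetical Student instances.
students = [
    {"SID": 1, "Name": "Amal", "Street": "1 Elm St", "City": "Northtown", "Zip": "11111"},
    {"SID": 2, "Name": "Noura", "Street": "2 Oak St", "City": "Northtown", "Zip": "11111"},
    {"SID": 3, "Name": "Sara", "Street": "3 Ash St", "City": "Southport", "Zip": "22222"},
]

sid_determines_rest = fd_holds(students, ["SID"], ["Name", "Street", "City", "Zip"])
city_determines_name = fd_holds(students, ["City"], ["Name"])
print(sid_determines_rest, city_determines_name)
```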
During requirement analysis, some entity types may be identified through functional
dependencies, while others may be determined through database relationships. For
example, the statement, "A faculty teaches many offerings but an offering is taught by
one faculty" defines entity type Faculty and Offerings. Another important consideration
is to distinguish when one attribute alone is the entity identifier versus a composite entity
identifier. A composite entity identifier is an entity identifier with more than one
attribute. A functional dependency in which the determinant contains more than one
attribute usually represents a many-to-many relationship, which is better addressed
through higher normal forms. The notion of having a composite entity identifier is not
very common, and oftentimes is a matter of expediency rather than good entity
structure or design.
Transitive dependency in an entity type occurs if non-entity-identifier attributes have
a dependency among themselves. For example, consider the modified Student entity type.
In this entity type, suppose there is a functional dependency BuildingName → Fee.
Existence of the BuildingName → Fee dependency implies that the value assigned to the Fee
attribute is fixed for each distinct BuildingName attribute value. In other words, the Fee
attribute values are not specific to the SID value of a student, but rather to the
BuildingName value. The entity instance of transitive dependency is shown in the figure.
Multi-valued dependency equivalency in an ERD occurs when attributes within an entity
instance have more than one value. This is the situation when some attributes within an
entity instance have a maximum cardinality of N (more than 1). When an attribute has
multiple values in an entity instance, it can be set up either as part of a composite
entity identifier of the entity type, or split into a weak entity type. For example,
consider the entity type Student Details shown as follows.
The Student Details entity type has a composite entity identifier consisting of three
attributes: SID, Major Minor, and Activity. The composition of the entity identifier is due
to the fact that a student has multiple Major Minor values along with being involved in
multiple activities. However, a student has only one value for the Name, Street, City, and
Zip attributes, based on the functional dependency SID, Major Minor, Activity → Name,
Street, City, Zip. The multi-valued dependency affects the key structure. So, in the Student
Details entity type, there can be an MVD SID →→ Major Minor, Activity. This means that a
SID value is associated with multiple values of the Major Minor and Activity attributes, and
together they determine the other attributes. The entity instance of the Student Details
entity type is as follows.
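A multi-valued dependency can be tested on sample instances using its standard
definition: for each SID, every Major Minor value must appear with every combination of
the remaining attributes. The function name and the sample values below are assumptions
for illustration, and the sketch expects a non-empty list of instances.

```python
from itertools import product


def mvd_holds(rows, x, y):
    """Return True if the multi-valued dependency x ->> y holds in `rows`."""
    rest = [attr for attr in rows[0] if attr not in x and attr not in y]
    groups = {}
    for row in rows:
        x_value = tuple(row[attr] for attr in x)
        y_value = tuple(row[attr] for attr in y)
        rest_value = tuple(row[attr] for attr in rest)
        groups.setdefault(x_value, set()).add((y_value, rest_value))
    for pairs in groups.values():
        y_values = {pair[0] for pair in pairs}
        rest_values = {pair[1] for pair in pairs}
        # Within one x value, y must pair with every combination of the other attributes.
        if pairs != set(product(y_values, rest_values)):
            return False
    return True


# Hypothetical Student Details instances.
details = [
    {"SID": 1, "MajorMinor": "Math", "Activity": "Chess", "Name": "Amal"},
    {"SID": 1, "MajorMinor": "Math", "Activity": "Tennis", "Name": "Amal"},
    {"SID": 1, "MajorMinor": "Physics", "Activity": "Chess", "Name": "Amal"},
    {"SID": 1, "MajorMinor": "Physics", "Activity": "Tennis", "Name": "Amal"},
]

independent = mvd_holds(details, ["SID"], ["MajorMinor"])
# Dropping one combination makes Major Minor and Activity depend on each other.
dependent = mvd_holds(details[:3], ["SID"], ["MajorMinor"])
print(independent, dependent)
```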
Diagram:
The Database Design Life cycle
Database development is just one part of the much wider field of software
engineering, the process of developing and maintaining software. A core aspect of
software engineering is the subdivision of the development process into a series of
phases, or steps, each of which focuses on one aspect of the development. The collection
of these steps is sometimes referred to as a development life cycle. The software
product moves through this life cycle (sometimes repeatedly as it is refined or
redeveloped) until it is finally retired from use. Ideally, each phase in the life cycle can
be checked for correctness before moving on to the next phase. However, software
engineering is a very rich discipline with many different methods for the subdivision of
the development process and a detailed exploration of the many different ways in which
development can be structured is beyond the scope of this unit.
• Establishing requirements involves consultation with, and agreement
among, stakeholders as to what they want of a system, expressed as a statement of
requirements.
• Analysis starts by considering the statement of requirements and finishes by
producing a system specification. The specification is a formal representation of
what a system should do, expressed in terms that are independent of how it may be
realized.
• Design begins with a system specification and produces design documents, and
provides a detailed description of how a system should be constructed.
• Implementation is the construction of a computer system according to a given
design document and taking account of the environment in which the system will be
operating (for example specific hardware or software available for the
development). Implementation may be staged, usually with an initial system that
can be validated and tested before a final system is released for use.
• Testing compares the implemented system against the design documents and
requirements specification and produces an acceptance report or, more usually, a list
of errors and bugs that require a review of the analysis, design and implementation
processes to correct (testing is usually the task that leads to the waterfall model
iterating through the life cycle).
• Maintenance involves dealing with changes in the requirements, or the
implementation environment, bug fixing or porting of the system to new
environments (for example migrating a system from a standalone PC to a UNIX
workstation or a networked environment). Since maintenance involves the analysis
of the changes required, design of a solution, implementation and testing of that
solution over the lifetime of a maintained software system, the waterfall life cycle
will be repeatedly revisited.
Conclusion:
Instead of applying normalization principles during the relational design portion of
the logical database design phase, it is better to apply them during the conceptual
modeling phase. Due to the similarity between the notions of an entity type and a relation,
normalization concepts, when explained or applied to an ERD, may generate a richer model.
Also, such an application enables a better representation of user working requirements. There
should be only one dependency in each entity type where the determinant is the entity
identifier. There should not be any additional dependency among the non entity
identifier attributes. Any such additional dependency should be represented by a new
entity type with one-to-many relationship. If there is a composite entity identifier of
three or more attributes it should be ensured that there is only one multi-valued
dependency among them.
Future enhancement:
Normalization provides numerous benefits to a database. Some of the major benefits
include the following:
• Greater overall database organization
• Reduction of redundant data
• Data consistency within the database
• A much more flexible database design
• A better handle on database security
Organization is brought about by the normalization process,
making everyone's job easier, from the user who accesses tables to the database
administrator (DBA) who is responsible for the overall management of every object in
the database. Data redundancy is reduced, which simplifies data structures and conserves
disk space. Because duplicate data is minimized, the possibility of inconsistent data is
greatly reduced. For example, in one table an individual's name could read STEVE
SMITH, whereas the name of the same individual reads STEPHEN R. SMITH in
another table. Because the database has been normalized and broken into smaller tables,
you are provided with more flexibility as far as modifying existing structures. It is much
easier to modify a small table with little data than to modify one big table that holds all
the vital data in the database. Lastly, security is also provided in the sense that the DBA
can grant access to limited tables to certain users. Security is easier to control when
normalization has occurred. Data integrity is the assurance of consistent and accurate
data within a database.