Title: ENHANCEMENT OF ERD
Research by:
Student Name: Wafa Ali Edrees
Student Id: 201130061
College of Computer Science and Information System
Level: 5
Teacher: Arshia Arjumand Banu

ABSTRACT:
This paper describes the inclusion of normalization principles in the entity relationship diagram (ERD), a technique widely used in data modeling. An ERD is developed during the conceptual data modeling phase of the database development process, and normalization is normally used to enhance and transform it later, in the logical database design phase. Applying normalization during ERD development allows for a more robust requirements analysis.

Keywords: EER, ERD, UML, DBMS, Normalization

INTRODUCTION:
Data modeling is an essential technique for analyzing business requirements and an essential component of database design and development. The entity relationship diagram (ERD) is one of the most widely used data modeling techniques. Data modeling is performed during the initial phases of the database life cycle. In this life cycle, the first two phases are concerned with the information content of the database, while the last two phases are concerned with implementing the database on a commercial DBMS. During the conceptual data modeling phase, data requirements are expressed through an ERD; this phase is, in general, independent of any DBMS. The logical design phase transforms the conceptual data model into a format that a DBMS can understand. This phase may also enhance or refine the data model (ERD) of the previous phase to ensure efficient use of the database.

One of the ways an ERD is enhanced during the logical design phase is through the process of normalization. Normalization is one of the key tenets of relational model design. It is the process of removing redundancy from a table so that the table is easier to modify.
Normalization usually involves dividing an entity table into two or more tables and defining relationships between them. The objective is to isolate data so that additions, deletions, and modifications of an attribute can be made in just one table and then propagated through the rest of the database via the defined relationships.

Normalization uses the associations among attributes within an entity table to accomplish its objective. Since an ERD also uses the associations among attributes as the basis for identifying entity type structure, it is possible to apply normalization principles during the conceptual data modeling phase. Performing normalization during ERD development can improve the conceptual model and speed its implementation. This paper outlines the application of normalization principles to ERD development during the conceptual modeling phase. There are various standards for ERDs.

Related work:
In a traditional environment there is face-to-face interaction, and it is in this environment that data normalization would be introduced. Files are handled locally, whereas a DBMS saves data directly in a database. A file system does not support transactions, whereas a DBMS supports transactions made up of operations such as insert, delete, view, and update. In a file system, data is accessed through one or more files, whereas in a DBMS, tables (described by a schema) are used to access data. In a file system, a "file manager" stores all relationships in directories, whereas in a DBMS the database manager (administrator) stores relationships in the form of structured tables. Data in databases is also more secure than data in files.
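The transaction support attributed to a DBMS above can be illustrated with a minimal sketch using Python's built-in sqlite3 module, with an in-memory SQLite database standing in for a DBMS. The student table and the sample names are hypothetical. When a connection is used as a context manager, sqlite3 commits the transaction if the block succeeds and rolls it back if an exception escapes, so a failing insert also undoes an earlier update in the same transaction.

```python
import sqlite3

# An in-memory SQLite database stands in for a DBMS (illustrative only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (sid INTEGER PRIMARY KEY, name TEXT NOT NULL)")

# Transaction 1: both inserts are committed together when the block exits.
with conn:
    conn.execute("INSERT INTO student VALUES (1, 'Sara')")
    conn.execute("INSERT INTO student VALUES (2, 'Huda')")

# Transaction 2: the duplicate key makes the second statement fail, so the
# connection rolls back the whole transaction, including the UPDATE.
try:
    with conn:
        conn.execute("UPDATE student SET name = 'Mona' WHERE sid = 1")
        conn.execute("INSERT INTO student VALUES (2, 'Duplicate')")
except sqlite3.IntegrityError:
    pass  # the error is expected in this demonstration

rows = conn.execute("SELECT sid, name FROM student ORDER BY sid").fetchall()
print(rows)  # [(1, 'Sara'), (2, 'Huda')]
```

The final query shows that the attempted rename of student 1 was discarded together with the failing insert, which is the all-or-nothing behaviour a transaction provides.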
Advantages of a database system:
1. Reduced data redundancy
2. Reduced updating errors and increased consistency
3. Greater data integrity and independence from application programs
4. Improved data access for users through host and query languages
5. Improved data security
6. Reduced data entry, storage, and retrieval costs
7. Easier development of new application programs

Disadvantages:
1. Database systems are complex, difficult, and time-consuming to design
2. Substantial hardware and software start-up costs
3. Damage to the database affects virtually all application programs
4. Extensive conversion costs in moving from a file-based system to a database system
5. Initial training required for all programmers and users

Proposed work: Normalized ERD
This section applies the representation of dependency concepts in an ERD to the normal forms. Each normal form rule and its application is outlined below.

First Normal Form (1NF)
The first normal form rule is that there should be no nesting or repeating groups in a table. An entity type that holds only one value for each attribute in an entity instance satisfies first normal form. In this sense, any entity type with an entity identifier is by default in first normal form. For example, the entity type Student is in first normal form.

Second Normal Form (2NF)
The second normal form rule is that every non-key attribute must depend on the whole key. A violation of second normal form occurs when there is a composite key and part of the key determines some non-key attributes. In ERD terms, second normal form deals with the situation in which the entity identifier contains two or more attributes and a non-key attribute depends on only part of the entity identifier. For example, consider a modified entity type Student with a composite entity identifier made up of the SID and City attributes. An entity instance of this entity type is shown in the following figure.
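The partial-dependency test described above can be sketched in Python. In this minimal sketch, entity instances are represented as dictionaries with made-up values (they are not the instances from the paper's figure); fd_holds checks whether a functional dependency is consistent with a set of instances, and is_partial flags a determinant that is a proper subset of a composite entity identifier. Both function names are hypothetical helpers, not part of any library.

```python
from typing import Iterable, Mapping, Sequence

Row = Mapping[str, object]


def fd_holds(rows: Iterable[Row], lhs: Sequence[str], rhs: Sequence[str]) -> bool:
    """Return True if all rows that agree on the lhs attributes also agree on rhs."""
    seen: dict = {}
    for row in rows:
        determinant = tuple(row[a] for a in lhs)
        dependent = tuple(row[a] for a in rhs)
        # setdefault stores the first rhs value seen for this determinant
        # and returns it; any later mismatch refutes the dependency.
        if seen.setdefault(determinant, dependent) != dependent:
            return False
    return True


def is_partial(identifier: Sequence[str], lhs: Sequence[str]) -> bool:
    """True if lhs is a proper subset of a composite entity identifier."""
    return set(lhs) < set(identifier)


# Hypothetical instances of the modified Student entity type,
# identified by the composite identifier (SID, City).
students = [
    {"SID": "S1", "City": "Riyadh", "Status": 20},
    {"SID": "S2", "City": "Riyadh", "Status": 20},
    {"SID": "S3", "City": "Jeddah", "Status": 30},
]

identifier = ["SID", "City"]
print(fd_holds(students, ["City"], ["Status"]))  # True
print(is_partial(identifier, ["City"]))          # True, so a 2NF violation
```

Instance data can only refute a dependency, never prove it: fd_holds returning True means the sample is consistent with City → Status, while the dependency itself has to come from requirements analysis.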
Now, if there is a functional dependency City → Status, the entity type structure violates second normal form, because Status depends on only part of the entity identifier. To resolve the violation, a separate entity type City is created with a one-to-many relationship. The relationship cardinalities can be further modified to reflect how the organization works. In general, a second normal form violation can be avoided by ensuring that the entity identifier consists of only one attribute.

Third Normal Form (3NF)
The third normal form rule is that non-key attributes should be independent of one another. This normal form is violated when there is a dependency among non-key attributes in the form of a transitive dependency. For example, consider the entity type Student with the functional dependency BuildingName → Fee; this dependency violates third normal form. A transitive dependency is resolved by moving the dependent attributes to a new entity type with a one-to-many relationship. In the new entity type, the determinant of the dependency becomes the entity identifier. The resolution for third normal form is shown in the following figure. The relationship cardinalities can be further modified to reflect how the organization works.

Boyce-Codd Normal Form (BCNF)
Boyce-Codd normal form (BCNF) extends third normal form. The BCNF rule is that every determinant must be a candidate key. Even though BCNF and third normal form generally produce the same result, BCNF is a stronger definition than third normal form: every table in BCNF is by definition in third normal form. BCNF covers two special cases not covered by third normal form:
1. Part of a composite entity identifier determines part of another (candidate) entity identifier.
2. A non-entity-identifier attribute determines part of an entity identifier.
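The BCNF rule can be checked mechanically once the dependencies are written down. The minimal sketch below uses the standard attribute-closure algorithm: a determinant passes if its closure contains every attribute of the entity type, that is, if it is a superkey (the usual formal reading of "every determinant is a candidate key"). It is applied to the Student example from the third normal form discussion, assuming the dependencies SID → Name, BuildingName and BuildingName → Fee; the attribute set and the helper names closure and bcnf_violations are illustrative assumptions.

```python
from typing import FrozenSet, Iterable, List, Set, Tuple

FD = Tuple[FrozenSet[str], FrozenSet[str]]  # (determinant, dependents)


def closure(attrs: Iterable[str], fds: List[FD]) -> Set[str]:
    """Return every attribute functionally determined by attrs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If the whole determinant is already in the closure,
            # its dependents are determined too.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def bcnf_violations(attributes: Set[str], fds: List[FD]) -> List[FD]:
    """Return non-trivial FDs whose determinant is not a superkey."""
    return [
        (lhs, rhs)
        for lhs, rhs in fds
        if not rhs <= lhs and closure(lhs, fds) != attributes
    ]


student_attrs = {"SID", "Name", "BuildingName", "Fee"}
student_fds: List[FD] = [
    (frozenset({"SID"}), frozenset({"Name", "BuildingName"})),
    (frozenset({"BuildingName"}), frozenset({"Fee"})),
]

print(bcnf_violations(student_attrs, student_fds))
# [(frozenset({'BuildingName'}), frozenset({'Fee'}))]
```

BuildingName determines only itself and Fee, so it is a determinant that is not a candidate key; the check therefore flags the same dependency that the third normal form discussion resolves with a separate entity type.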
Both special cases are only possible if there is a composite entity identifier and dependencies exist from a non-entity-identifier attribute to part of the entity identifier. For example, consider the entity type Student Concentration. The entity type is in third normal form, but because of the dependency FacultyName → MajorMinor, it is not in BCNF. To bring the Student Concentration entity type into BCNF, another entity type, Faculty, is constructed with a one-to-many relationship, as shown in the following figure. The relationship cardinalities can be further modified to reflect how the organization works.

Fourth Normal Form (4NF)
The fourth normal form rule is that a table should not contain more than one independent multi-valued dependency (formally, the determinant of every non-trivial multi-valued dependency must be a superkey). For example, consider the Student Details entity type. If requirements analysis finds that the Major Minor values of a student are independent of the Activity performed by the student, the entity type structure violates fourth normal form. To resolve the violation, separate weak entity types with identifying relationships are created; here, Student Focus and Student Activity are the weak entity types. The relationship cardinalities can be further modified to reflect how the organization works. The Student entity type is now presumed to have the functional dependency SID → Name, Street, City, Zip.

Architecture:
The Logical Structures
Access to the data is made possible by a well-defined logical organization composed of the following structures.

Fields: A field holds a single piece of information, such as a name or an amount, and can hold one specific type of information. Fields are assembled into a structure called a record. On its own, a field is not very useful, as it can hold only a limited amount of information.
Records: A record is a logical structure assembled from an arbitrary number of fields.
A record stores a single entry in the database. The fields in a record store information about important properties of the entry. Records are organized in tables.
Tables: A table can be thought of as an N × M matrix. Each of the N rows describes a record, and each of the M columns describes a field in the record. Tables are organized in companies.
Companies: A company is a sub-database; its primary use is to separate and group large portions of data. A company can contain private tables as well as tables that are shared with other companies.

The following illustration shows the logical structures.

Application of Normalization to ERD:
Data modeling is an iterative process. Generally, a preliminary data model is constructed and then refined many times. There are many guidelines (rules) for refining an ERD, including the following:
1. Transform attributes into entity types. This transformation involves adding an entity type and a 1-M (one-to-many) relationship.
2. Split compound attributes into smaller attributes. A compound attribute contains multiple kinds of data.
3. Expand an entity type into two entity types and a relationship. This transformation can be useful for recording a finer level of detail about an entity.
4. Transform a weak entity type into a strong entity type. This transformation is most useful for associative entity types.
5. Add historical details to a data model. Historical details may be necessary for legal as well as strategic reporting requirements. This transformation can be applied to attributes and relationships.
6. Add generalization hierarchies by transforming entity types into a generalization hierarchy.

Applying normalization principles during ERD development enhances these guidelines. To explain this application, (i) the representation of dependency concepts in an ERD is outlined first, followed by (ii) the representation of normal forms in the development of entity type structure.
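The first refinement rule (transforming attributes into an entity type with a 1-M relationship) is also how the second and third normal form violations above are resolved. The minimal sketch below assumes entity instances are stored as Python dictionaries: split_entity moves the dependent attributes into a new entity type keyed by the determinant (the "one" side), and the original entity type keeps the determinant as a reference (the "many" side). The function names and the City/Status sample values are hypothetical; rejoin reverses the split to show that no information is lost.

```python
from typing import Dict, List, Sequence, Tuple

Row = Dict[str, object]


def split_entity(
    rows: List[Row], determinant: Sequence[str], dependents: Sequence[str]
) -> Tuple[List[Row], List[Row]]:
    """Split rows on the dependency determinant -> dependents."""
    parent: Dict[tuple, Row] = {}
    for row in rows:
        key = tuple(row[a] for a in determinant)
        projected = {a: row[a] for a in (*determinant, *dependents)}
        # The split is only safe if the dependency really holds.
        if parent.setdefault(key, projected) != projected:
            raise ValueError(f"dependency violated for {key}")
    # The original entity type keeps the determinant as a foreign key.
    child = [{a: v for a, v in row.items() if a not in dependents} for row in rows]
    return list(parent.values()), child


def rejoin(parent: List[Row], child: List[Row], determinant: Sequence[str]) -> List[Row]:
    """Natural join on the determinant, undoing split_entity."""
    index = {tuple(p[a] for a in determinant): p for p in parent}
    return [{**c, **index[tuple(c[a] for a in determinant)]} for c in child]


students = [
    {"SID": "S1", "City": "Riyadh", "Status": 20},
    {"SID": "S2", "City": "Riyadh", "Status": 20},
    {"SID": "S3", "City": "Jeddah", "Status": 30},
]

cities, student_rows = split_entity(students, ["City"], ["Status"])
print(cities)  # [{'City': 'Riyadh', 'Status': 20}, {'City': 'Jeddah', 'Status': 30}]
print(rejoin(cities, student_rows, ["City"]) == students)  # True
```

After the split, each Status value is stored once per city rather than once per student, and the ValueError guard refuses to split data that contradicts the stated dependency.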
Guidelines for identifying the various dependencies are omitted from this paper in order to focus on their application. Only the first four normal forms and Boyce-Codd normal form are considered.

Representation of Dependencies
A functional dependency in an entity type is observed through the association between the entity identifier and the other attributes, as reflected in an entity instance. Each entity instance represents the set of values taken by the non-entity-identifier attributes for one primary key (entity identifier) value. In this sense, an entity instance structure also reflects an application of the functional dependency concept. For example, the Student entity type can represent the functional dependency SID → Name, Street, City, Zip, and each entity instance then represents this functional dependency among the entity attributes, as shown.

During requirements analysis, some entity types may be identified through functional dependencies, while others may be determined through database relationships. For example, the statement "A faculty teaches many offerings but an offering is taught by one faculty" defines the entity types Faculty and Offerings.

Another important consideration is to distinguish between an entity identifier made of a single attribute and a composite entity identifier, which has more than one attribute. A functional dependency in which the determinant contains more than one attribute usually represents a many-to-many relationship, which is better addressed through higher normal forms. Composite entity identifiers are not very common, and they are often a matter of expediency rather than good entity structure or design.

A transitive dependency in an entity type occurs when non-entity-identifier attributes have dependencies among themselves. For example, consider the modified Student entity type, and suppose there is a functional dependency BuildingName → Fee.
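Following this definition, a transitive dependency can be detected from a list of dependencies by looking for one whose determinant and dependents are all non-entity-identifier attributes. The minimal sketch below assumes the modified Student entity type has the dependencies SID → Name, BuildingName, Fee and BuildingName → Fee; the helper name transitive_dependencies and the sample values are hypothetical. It also counts how often the same (BuildingName, Fee) pair is stored, which is the redundancy this dependency causes.

```python
from collections import Counter
from typing import List, Sequence, Tuple

FD = Tuple[Tuple[str, ...], Tuple[str, ...]]  # (determinant, dependents)


def transitive_dependencies(identifier: Sequence[str], fds: List[FD]) -> List[FD]:
    """Return FDs that hold only among non-identifier attributes."""
    ident = set(identifier)
    return [
        (lhs, rhs)
        for lhs, rhs in fds
        if ident.isdisjoint(lhs) and ident.isdisjoint(rhs)
    ]


fds: List[FD] = [
    (("SID",), ("Name", "BuildingName", "Fee")),
    (("BuildingName",), ("Fee",)),
]
print(transitive_dependencies(("SID",), fds))  # [(('BuildingName',), ('Fee',))]

# Hypothetical instances: the fee is repeated for every student in a building.
students = [
    {"SID": 1, "Name": "Sara", "BuildingName": "North Hall", "Fee": 500},
    {"SID": 2, "Name": "Huda", "BuildingName": "North Hall", "Fee": 500},
    {"SID": 3, "Name": "Mona", "BuildingName": "South Hall", "Fee": 400},
]
copies = Counter((s["BuildingName"], s["Fee"]) for s in students)
print(copies[("North Hall", 500)])  # 2
```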
The existence of the BuildingName → Fee dependency implies that the value assigned to the Fee attribute is fixed for each distinct BuildingName value. In other words, Fee values are not specific to a student's SID value but to the BuildingName value. An entity instance with this transitive dependency is shown in the figure.

A multi-valued dependency appears in an ERD when attributes within an entity instance have more than one value, that is, when some attributes within an entity instance have a maximum cardinality of N (more than 1). When an attribute has multiple values in an entity instance, it can be set up either as part of a composite entity identifier of the entity type or split into a weak entity type. For example, consider the entity type Student Details, as follows.

The Student Details entity type has a composite entity identifier consisting of three attributes: SID, Major Minor, and Activity. The identifier is composite because a student has multiple Major Minor values and is also involved in multiple activities. However, a student has only one value for the Name, Street, City, and Zip attributes, based on the functional dependency SID, Major Minor, Activity → Name, Street, City, Zip. The multi-valued dependency affects the key structure: in the Student Details entity type there can be the multi-valued dependency SID →→ Major Minor | Activity (that is, SID →→ Major Minor and SID →→ Activity). This means that a SID value is associated with multiple values of the Major Minor and Activity attributes, and together these attributes determine the other attributes. An entity instance of the Student Details entity type is as follows.

Diagram: The Database Design Life Cycle

Database development is just one part of the much wider field of software engineering, the process of developing and maintaining software. A core aspect of software engineering is the subdivision of the development process into a series of phases, or steps, each of which focuses on one aspect of the development.
The collection of these steps is sometimes referred to as a development life cycle. The software product moves through this life cycle (sometimes repeatedly, as it is refined or redeveloped) until it is finally retired from use. Ideally, each phase in the life cycle can be checked for correctness before moving on to the next phase. However, software engineering is a very rich discipline with many different methods for subdividing the development process, and a detailed exploration of them is beyond the scope of this paper.

Establishing requirements involves consultation with, and agreement among, stakeholders as to what they want from a system, expressed as a statement of requirements.

Analysis starts by considering the statement of requirements and finishes by producing a system specification. The specification is a formal representation of what a system should do, expressed in terms that are independent of how it may be realized.

Design begins with a system specification, produces design documents, and provides a detailed description of how a system should be constructed.

Implementation is the construction of a computer system according to a given design document, taking account of the environment in which the system will operate (for example, the specific hardware or software available for the development). Implementation may be staged, usually with an initial system that can be validated and tested before a final system is released for use.

Testing compares the implemented system against the design documents and requirements specification and produces an acceptance report or, more usually, a list of errors and bugs that require a review of the analysis, design, and implementation processes to correct (testing is usually the task that leads to the waterfall model iterating through the life cycle).
Maintenance involves dealing with changes in the requirements or the implementation environment, fixing bugs, or porting the system to new environments (for example, migrating a system from a standalone PC to a UNIX workstation or a networked environment). Since maintenance involves analyzing the required changes, designing a solution, and implementing and testing that solution over the lifetime of a maintained software system, the waterfall life cycle will be revisited repeatedly.

Conclusion:
Instead of applying normalization principles during the relational design portion of the logical database design phase, it is better to apply them during the conceptual modeling phase. Because an entity type and a relation are similar notions, normalization concepts explained or applied in an ERD may produce a richer model. Such an application also enables a better representation of users' working requirements. Each entity type should have only one dependency, in which the determinant is the entity identifier. There should not be any additional dependency among the non-entity-identifier attributes; any such dependency should be represented by a new entity type with a one-to-many relationship. If there is a composite entity identifier of three or more attributes, it should be ensured that there is only one multi-valued dependency among them.

Future enhancement:
Normalization provides numerous benefits to a database. Some of the major benefits include the following:
1. Greater overall database organization
2. Reduction of redundant data
3. Data consistency within the database
4. A much more flexible database design
5. A better handle on database security

Organization is brought about by the normalization process, making everyone's job easier, from the user who accesses tables to the database administrator (DBA), who is responsible for the overall management of every object in the database.
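The redundancy and consistency benefits listed above can be illustrated with a minimal sqlite3 sketch that reuses the STEVE SMITH / STEPHEN R. SMITH name example from this section. The table names, columns, and course codes are hypothetical. In the unnormalized table the name is repeated on every enrolment row, so an update that misses one copy leaves two different names for the same student; in the normalized design the name is stored once in a student table, so a single-row update is seen by every enrolment.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Unnormalized design: the student's name is repeated on every enrolment row.
conn.execute("CREATE TABLE enrolment_flat (sid INTEGER, name TEXT, course TEXT)")
conn.executemany(
    "INSERT INTO enrolment_flat VALUES (?, ?, ?)",
    [(1, "STEVE SMITH", "DB101"), (1, "STEVE SMITH", "IS202")],
)
# An update that reaches only one copy leaves two names for one student.
conn.execute(
    "UPDATE enrolment_flat SET name = 'STEPHEN R. SMITH' "
    "WHERE sid = 1 AND course = 'DB101'"
)
flat_names = conn.execute(
    "SELECT DISTINCT name FROM enrolment_flat WHERE sid = 1 ORDER BY name"
).fetchall()

# Normalized design: the name is stored once and referenced by sid.
conn.execute("CREATE TABLE student (sid INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE enrolment (sid INTEGER REFERENCES student(sid), course TEXT)")
conn.execute("INSERT INTO student VALUES (1, 'STEVE SMITH')")
conn.executemany("INSERT INTO enrolment VALUES (?, ?)", [(1, "DB101"), (1, "IS202")])
conn.execute("UPDATE student SET name = 'STEPHEN R. SMITH' WHERE sid = 1")
joined_names = conn.execute(
    "SELECT DISTINCT s.name FROM enrolment AS e "
    "JOIN student AS s ON s.sid = e.sid WHERE e.sid = 1"
).fetchall()

print(flat_names)    # [('STEPHEN R. SMITH',), ('STEVE SMITH',)]
print(joined_names)  # [('STEPHEN R. SMITH',)]
```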
Data redundancy is reduced, which simplifies data structures and conserves disk space. Because duplicate data is minimized, the possibility of inconsistent data is greatly reduced. For example, in an unnormalized database an individual's name could read STEVE SMITH in one table, whereas the same individual's name reads STEPHEN R. SMITH in another table.

Because the database has been normalized and broken into smaller tables, there is more flexibility in modifying existing structures. It is much easier to modify a small table with little data than to modify one big table that holds all the vital data in the database.

Lastly, security is improved in the sense that the DBA can grant certain users access to only a limited set of tables. Security is easier to control when normalization has occurred. Data integrity is the assurance of consistent and accurate data within a database.