PHASE 3: SYSTEM DESIGN

LESSON 8 DATABASE DESIGN

INTRODUCTION

In the previous lesson, we learned about several types of development strategies that should be considered in system development and how prototyping is used in development. System design is the third phase of the system development life cycle, in which we develop an understanding of how the system will operate. In this lesson we will discuss database design. This lesson consists of four sections:

- overview of database design
- conventional files vs. databases
- database concepts
- normalization

LEARNING OUTCOMES

At the end of this lesson, students should be able to:

- define database design
- compare and contrast conventional files and databases
- define and give examples of database concepts such as tables, fields and records
- carry out the normalization process

TERMINOLOGY

No | Word | Definition
1 | Database | A collection of tables that form an overall data structure
2 | Database management system | A collection of tools, features, and interfaces that enable users to add, update, manage, access, analyze and delete the content of a set of data
3 | Field | The smallest unit of named application data recognized by system software
4 | First normal form | A relation in which the intersection of each row and column contains one and only one value
5 | Full dependency | A condition where a1 and a2 are attributes of relation R. If a1 -> a2 and a2 does not depend on any proper subset of a1, then a2 is fully dependent on a1
6 | Normalization | A technique for producing a set of relations with desirable properties, given the data requirements of an enterprise
7 | Second normal form | A relation that is in 1NF and in which every non-key attribute is fully functionally dependent on the key
8 | Table | The relational database equivalent of a file
9 | Third normal form | A relation that is in first and second normal form and in which no non-key attribute is transitively dependent on the key
10 | Transitive dependency | A condition where a1, a2 and a3 are attributes of relation R. If a1 -> a2 and a2 -> a3, then a3 is transitively dependent on a1 via a2

8.1 OVERVIEW OF DATABASE DESIGN

[Figure 8-1: Database Design Activities in the System Design Phase. Phases shown: System Planning, System Analysis, System Design, System Implementation, System Maintenance. Activities under System Design: Files and Database (√), Input Design, Output Design, User Interfaces Design.]

A database is where all of the information system’s data are kept as records, to be accessed again for the system’s use. The database is a main component of any system that keeps records. Figure 8-1 shows that database design is one of the activities carried out during system design.

Data are a core element of a system. In the previous lesson, we learned how the Data Flow Diagram and the Entity-Relationship Diagram are used to depict the data requirements of a system. When designing a database, we should define data in a fundamental form called normalized data. The main purpose of database design is to organize the data in a stable structure so that the data are easy to manage and contain less redundancy.

8.2 CONVENTIONAL FILES AND DATABASES

Before we proceed to the core of database design, we need to know the differences between conventional files and databases. Both approaches have their own advantages and disadvantages, but for years now more and more organizations have been choosing databases as the data storage for their information systems.

8.2.1 Conventional Files

The conventional files approach is also known as file processing. It stores and manages data in one or more separate files. Organizations mainly use file processing to handle large volumes of structured data on a regular basis. Many older systems use file processing because this approach is well suited to mainframe hardware and batch input. It can also cost less and be more efficient than a database. However, problems arose with this approach, so organizations began changing to databases.
The three major problems in file processing are:

i) Data redundancy
Occurs when the same data are stored in several places. It requires more storage for the same data and raises the cost of updating and maintaining the data. Sometimes an update is made in one place but not in all places.

ii) Data integrity
Problems occur when updates are not applied in every file. Changing data in only some places causes inconsistent data and results in incorrect information, because the same entity should always refer to the same data.

iii) Rigid data structure
A business must make decisions based on company-wide data, and managers often require information from multiple units of the company. However, retrieving information from independent, file-based systems is slow and inefficient.

8.2.2 Databases

A properly designed database system provides a solution to the problems of file processing. In a database environment, several systems can be built around a single database. When designing a database, we need to consider what type of Database Management System will be used. A database management system (DBMS) is a collection of tools, features, and interfaces that enable users to add, update, manage, access, analyze and delete the content of a set of data.

Some of the advantages of a DBMS are:

i) Scalability
The system can be expanded, modified or downsized easily to meet the rapidly changing needs of an organization.

ii) Better support
In a client/server system, processing is distributed throughout the organization, which may require the power and flexibility of a database design.

iii) Economy of scale
The use of a database allows better utilization of hardware because database processing comes at a lower cost.

iv) Flexible data sharing
Data can be shared among parties, allowing more users to access more data and more than one user to access the same information.

8.2.3 Exercises

Answer TRUE or FALSE for each of the statements below.

1. The purpose of database design is to choose efficient and secure data storage technologies. Answer: TRUE
2. The major problems in file processing are data redundancy, data integrity and rigid data structure. Answer: TRUE
3. A database is a collection of interrelated files. Answer: TRUE
4. A database is necessarily dependent on the applications that use it. Answer: FALSE
5. A database should be developed using any DBMS. Answer: TRUE

8.3 DATABASE CONCEPTS

Before we proceed to database design, we first need to understand the concepts of a database. In an ER-Diagram, an entity is a person, thing, place or event for which data are collected and maintained. The relational model is based on the concept of a relation, which is physically represented as a table. In this section, we will explain the terminology and structural concepts of the relational model.

[Figure 8-2: A Table STUDENT in a Database (labels in the figure: Fields, Records, Primary key)]

8.3.1 Tables or Files

Data are organized into tables or files. In a database system, a file is called a table. A file is the set of all occurrences of a given record structure. A table is the relational database equivalent of a file. A table, or file, contains a set of related records that store data about a specific entity. Tables and files are shown as two-dimensional structures that consist of rows and columns. Columns represent fields, or characteristics of the entity, while each row represents a record, which refers to an individual instance. Figure 8-2 shows an example of a table named STUDENT in the STUDENT REGISTRATION SYSTEM.

8.3.2 Fields

Fields are common to files and databases. A field is the physical implementation of an attribute in the ER-D. It is the smallest unit of meaningful data to be stored in a file or database. An attribute is represented as a field in a table. A table usually consists of more than one attribute. Figure 8-2 shows an example of the fields in the STUDENT table. Fields can have values.
These values can be of fixed or variable length and can be alphabetic, numeric, or of another type, depending on the settings chosen during design; the set of permitted values is referred to as the domain. A domain is a set of allowable values for one or more attributes. There are four types of fields that can be stored:

- Primary key: a field whose value is used to identify one and only one record.
- Secondary key: a field that can be used as an alternate key to identify a single record in a table.
- Foreign key: a field that is used as a pointer to the records of a different file or table in a database.
- Descriptive field: a field that stores business data and is not one of the key types above.

8.3.3 Records

Fields are organized into records. Records are common to both files and tables. A record is a collection of data that have something in common with the entity. A record, also referred to as a tuple, is a set of related fields that describes one instance, or occurrence, of an entity. A record may have one or more fields, depending on what type of information is needed. Figure 8-2 shows an example of the records in the STUDENT table.

8.3.4 Exercises

Answer TRUE or FALSE for each of the statements below.

1. Tables and files are shown as two-dimensional structures that consist of rows and columns. Answer: TRUE
2. A field is the physical implementation of a data attribute. Answer: TRUE
3. Files are the smallest unit of meaningful data to be stored. Answer: FALSE
4. A primary key is a field whose values identify one and only one record in a file, and it cannot be NULL. Answer: TRUE
5. A record cannot have one or more fields. Answer: FALSE

8.4 NORMALIZATION

Normalization is a technique for producing a set of relations with desirable properties, given the data requirements of an enterprise (Connolly et al. 2002). Normalization is usually performed as a series of tests on a relation to determine whether it satisfies the requirements of a given normal form.
There are three commonly used normal forms: first normal form (1NF), second normal form (2NF) and third normal form (3NF). All of these normal forms are based on functional dependencies among the attributes in a relation. In this section, we will explain these three normal forms. Before that, we need to explain the concept of functional dependencies.

8.4.1 Functional Dependencies

To explain the concept of functional dependencies, let us assume that we have a relation R with several attributes named a1, a2, a3 and a4:

R(a1, a2, a3, a4)

If a1 and a2 are attributes of R, a2 is functionally dependent on a1 (denoted a1 -> a2) when each value of a1 is associated with exactly one value of a2. This means that a1 is the determinant of a2. Figure 8-3 shows that a2 is functionally dependent on a1. Determinant refers to the attribute or group of attributes on the left-hand side of the arrow of a functional dependency (Connolly et al. 2002).

[Figure 8-3: a1 is a determinant of a2 (diagram: a1 -> a2, labelled "a2 is functionally dependent on a1")]

For the rest of this section, we will use the example below. Assume that we have a relation STUDENT with the following attributes:

STUDENT(student_ID, student_name, program_code, course_id, course_name)

In this relation, note that student_name is functionally dependent on student_ID. student_ID is the determinant of student_name, as shown in Figure 8-4.

[Figure 8-4: student_ID is a determinant of student_name (diagram: student_ID -> student_name, labelled "student_name is functionally dependent on student_ID")]

8.4.2 First Normal Form

First normal form (1NF) is a relation in which the intersection of each row and column contains one and only one value. The process of normalization begins with a table in unnormalized form (UNF). To transform an unnormalized table to 1NF, we identify and remove the repeating groups in the table.
A repeating group is an attribute or group of attributes in the table that takes multiple values for a single occurrence of the key attribute of that table. Figure 8-5 shows a repeating group that occurs because course data are repeated for each student.

[Figure 8-5: A Table GRADE with repeating groups]

There are two approaches to removing repeating groups from an unnormalized form:

1. Remove the repeating groups by entering the appropriate data in the empty columns of the rows that contain the repeating data.
2. Remove the repeating groups by placing the repeating data, along with a copy of the original key attribute, in a separate new relation.

Using the first approach, we remove the repeating group by entering the appropriate data in each row. The result is shown in Figure 8-6. From here, we select StudId and CourseId together as the primary key. The GRADE relation is then defined as below:

GRADE(StudId, StudName, Tel, Major, CourseId, CourseTitle, TeachName, TeachRoom, Grade)

[Figure 8-6: First Normal Form]

8.4.3 Second Normal Form

Second normal form (2NF) is based on the concept of full functional dependency. A relation is in 2NF if it is in 1NF and every non-key attribute is fully functionally dependent on the key. If a1 and a2 are attributes of relation R, a2 is fully functionally dependent on a1 if it is functionally dependent on a1 but not on any proper subset of a1. A partial functional dependency is a condition where some attributes can be removed from a1 and the dependency still holds.

From the First Normal Form above, we identify the partial and full dependencies for each attribute. StudName, Tel and Major depend only on StudId, and CourseTitle, TeachName and TeachRoom depend only on CourseId, while Grade depends on both StudId and CourseId. Figure 8-7 shows the resulting Second Normal Form and its relations.

[Figure 8-7: Second Normal Form]

STUDENT(StudId, StudName, Tel, Major)
COURSETEACH(CourseId, CourseTitle, TeachName, TeachRoom)
REGISTRATION(StudId, CourseId, Grade)

8.4.4 Third Normal Form

Second normal form (2NF) has less redundancy than 1NF, but some redundancy still exists.
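
To make this remaining redundancy concrete, the following minimal Python sketch uses made-up sample rows for the COURSETEACH relation of Figure 8-7. The course, teacher and room values and the helper names `holds` and `repeated_facts` are illustrative assumptions, not data from the lesson. The sketch checks whether TeachName -> TeachRoom holds in the sample rows and counts how many rows restate a teacher's room that is already stored.

```python
# Hypothetical sample rows for COURSETEACH(CourseId, CourseTitle, TeachName, TeachRoom).
COURSETEACH = [
    {"CourseId": "C101", "CourseTitle": "Databases", "TeachName": "Aminah", "TeachRoom": "R12"},
    {"CourseId": "C102", "CourseTitle": "Networks", "TeachName": "Aminah", "TeachRoom": "R12"},
    {"CourseId": "C103", "CourseTitle": "Programming", "TeachName": "Lim", "TeachRoom": "R07"},
]


def holds(rows, determinant, dependent):
    """Return True if determinant -> dependent holds in the given rows."""
    seen = {}
    for row in rows:
        key = row[determinant]
        # The first value seen for a determinant must match every later one.
        if seen.setdefault(key, row[dependent]) != row[dependent]:
            return False
    return True


def repeated_facts(rows, determinant, dependent):
    """Count rows that restate a (determinant, dependent) pair already stored."""
    pairs = [(row[determinant], row[dependent]) for row in rows]
    return len(pairs) - len(set(pairs))


fd_holds = holds(COURSETEACH, "TeachName", "TeachRoom")
redundant_rows = repeated_facts(COURSETEACH, "TeachName", "TeachRoom")
print(fd_holds, redundant_rows)
```

In these sample rows the same teacher's room is stored more than once, so a change of room would have to be applied to several rows to keep the data consistent.
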
Third normal form (3NF) is a relation that is in first and second normal form and in which no non-key attribute is transitively dependent on the key. 3NF is based on the concept of transitive dependency. Transitive dependency is a condition where a1, a2 and a3 are attributes of relation R. If a1 -> a2 and a2 -> a3, then a3 is transitively dependent on a1 via a2. In other words, a transitive dependency arises when a non-key attribute depends on another non-key attribute. So, we should move every non-key attribute with a transitive dependency to a new relation, together with a copy of the attribute it depends on. Here, TeachRoom depends on TeachName, so it moves to a new TEACHER relation. Figure 8-8 shows the Third Normal Form and its relations.

[Figure 8-8: Third Normal Form]

STUDENT(StudId, StudName, Tel, Major)
COURSETEACH(CourseId, CourseTitle, TeachName)
REGISTRATION(StudId, CourseId, Grade)
TEACHER(TeachName, TeachRoom)

8.4.5 Exercises

Answer TRUE or FALSE for each of the statements below.

1. Normalization is a technique for producing a set of relations with desirable properties, given the data requirements of an enterprise. Answer: TRUE
2. An attribute cannot be functionally dependent on more than one attribute. Answer: FALSE
3. A partial dependency is a condition where a non-key attribute depends on only a part of the key. Answer: TRUE
4. A transitive dependency is a condition where a non-key attribute depends on key attributes. Answer: FALSE
5. Third normal form (3NF) is a relation that is in first and second normal form and in which no non-key attribute is transitively dependent on the key. Answer: TRUE

SUMMARY

This is the end of Lesson 8. In this lesson, we have learned:

- overview of database design
- conventional files vs. databases
- database concepts
- normalization

In the next lesson, we will discuss the second activity in system design: output and report design. Outputs present information to system users, so we need to design good and effective output.

SELF ASSESSMENT

Fill in the correct answer.

1. A ________________________ is where all of the information system’s data are kept as records, to be accessed again for the system’s use. Answer: Database
2. A ________________________ is a collection of tools, features, and interfaces that enable users to add, update, manage, access, analyze and delete the content of a set of data. Answer: Database Management System
3. A ________________________ is the physical implementation of an attribute in the Entity Relationship Diagram. Answer: Field
4. A ________________________ is a named, two-dimensional table of data. Answer: Relation
5. A ________________________ is a field whose value is used to identify one and only one record. Answer: Primary key
6. A ________________________ is a collection of data that have something in common with the entity and is also referred to as a ________________________. Answer: record, tuple
7. ________________________ is the process of converting complex data structures into simple, stable data structures. Answer: Normalization
8. A relation is in ________________________ if every non-primary-key attribute is functionally dependent on the whole primary key. Answer: second normal form
9. Second normal form (2NF) is based on the concept of ________________________. Answer: full functional dependency
10. A ________________________ is a condition where a non-key attribute depends on another non-key attribute. Answer: Transitive dependency
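
As a supplementary sketch for Section 8.4, the decomposition of the 1NF GRADE relation into the 3NF relations of Figure 8-8 can be imitated in Python by projecting rows onto each relation's attributes and dropping duplicate rows. The sample rows and the helper name `project` are hypothetical and serve only to illustrate the idea; they are not taken from the lesson's figures.

```python
# Hypothetical 1NF rows for
# GRADE(StudId, StudName, Tel, Major, CourseId, CourseTitle, TeachName, TeachRoom, Grade).
COLUMNS = ["StudId", "StudName", "Tel", "Major", "CourseId",
           "CourseTitle", "TeachName", "TeachRoom", "Grade"]
GRADE = [
    dict(zip(COLUMNS, values))
    for values in [
        ("S1", "Ali", "0123", "CS", "C101", "Databases", "Aminah", "R12", "A"),
        ("S1", "Ali", "0123", "CS", "C102", "Networks", "Aminah", "R12", "B"),
        ("S2", "Mei", "0456", "IT", "C101", "Databases", "Aminah", "R12", "B"),
    ]
]


def project(rows, attributes):
    """Keep only the given attributes and drop duplicate rows (order preserved)."""
    seen = set()
    result = []
    for row in rows:
        values = tuple(row[name] for name in attributes)
        if values not in seen:
            seen.add(values)
            result.append(dict(zip(attributes, values)))
    return result


# The four relations listed under Figure 8-8.
STUDENT = project(GRADE, ["StudId", "StudName", "Tel", "Major"])
COURSETEACH = project(GRADE, ["CourseId", "CourseTitle", "TeachName"])
REGISTRATION = project(GRADE, ["StudId", "CourseId", "Grade"])
TEACHER = project(GRADE, ["TeachName", "TeachRoom"])

for name, relation in [("STUDENT", STUDENT), ("COURSETEACH", COURSETEACH),
                       ("REGISTRATION", REGISTRATION), ("TEACHER", TEACHER)]:
    print(name, len(relation))
```

Each fact about a student, a course or a teacher now appears in one row only, while REGISTRATION keeps one row per student and course.
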