database design

PHASE 3 :
SYSTEM DESIGN
LESSON 8
DATABASE DESIGN
INTRODUCTION
In the previous lesson, we learned about several development strategies that should be
considered in system development and how prototyping is used during development.
System design is the third phase of the system development life cycle, in which we
develop an understanding of how the system will operate. In this lesson we will discuss
database design. This lesson consists of four sections:

overview of database design

conventional files vs. database

database concepts

normalization
LEARNING OUTCOMES
At the end of this lesson, students should be able to :

define database design

compare and contrast conventional files and databases

define and give examples of database concepts such as tables, fields and records

carry out the normalization process
TERMINOLOGY
No | Word | Definition
1 | Database | A collection of tables that form an overall data structure
2 | Database management system | A collection of tools, features, and interfaces that enable users to add, update, manage, access, analyze and delete the content of a set of data
3 | Field | The smallest unit of named application data recognized by system software
4 | First normal form | A relation in which the intersection of each row and column contains one and only one value
5 | Full dependency | A condition where a1 and a2 are attributes of relation R. If a1 -> a2 and a2 does not depend on any proper subset of a1, then a2 is fully dependent on a1
6 | Normalization | A technique for producing a set of relations with desirable properties, given the data requirements of an enterprise
7 | Second normal form | A relation that is in 1NF and in which every non-key attribute is fully functionally dependent on the key
8 | Table | The relational database equivalent of a file
9 | Third normal form | A relation that is in first and second normal form and in which no non-key attribute is transitively dependent on the key
10 | Transitive dependency | A condition where a1, a2 and a3 are attributes of relation R. If a1 -> a2 and a2 -> a3, then a3 is transitively dependent on a1 via a2
8.1
OVERVIEW OF DATABASE DESIGN
Figure 8-1: Database Design Activities in the System Design Phase
(Phases shown: System Planning, System Analysis, System Design, System Implementation,
System Maintenance. Activities under System Design: Files and Database (checked),
Input Design, Output Design, User Interfaces Design.)
A database is where all of an information system’s data are kept as records, to be accessed
again when the system needs them. It is a main component of any system in which we keep
records. Figure 8-1 shows that database design is one of the activities involved in system
design. Data are a core element of a system. In the previous lesson, we learned how the Data
Flow Diagram and the Entity-Relationship Diagram are used to depict the data requirements
of a system. When designing a database, we should define data in a fundamental form called
normalized data. The main purpose of database design is to organize the data in a stable
structure so that the data are easy to manage and redundancy is reduced.
8.2
CONVENTIONAL FILES AND DATABASES
Before we proceed to the core of database design, we need to know the differences between
conventional files and databases. Each approach has its own advantages and
disadvantages, but over the years more organizations have chosen databases
as the data storage for their information systems.
8.2.1
Conventional Files
Conventional files are also known as file processing. This approach stores and manages data
in one or more separate files. Organizations mainly use file processing to handle large volumes
of structured data on a regular basis. Many older systems use file processing because the
approach is well suited to mainframe hardware and batch input, and because it can cost less
and be more efficient than a database. However, problems arose that led organizations to
move to databases. Three major problems exist in file processing:
i) Data redundancy
Occurs when the same data are stored in several places. Redundancy requires more storage
for the same data and makes updating and maintaining the data more costly.
Sometimes an update is made in one place but not in all places.
ii) Data integrity
Problems occur when updates are not applied in every file. Changing data in only some places
causes inconsistent data and results in incorrect information, because the same entity
should always refer to the same data.
iii) Rigid data structure
A business must make decisions based on company-wide data, and managers often require
information from multiple units of the company. However, retrieving information from
independent, file-based systems is slow and inefficient.
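The redundancy and integrity problems above can be illustrated with a minimal Python sketch. The two "files" (`registration_file`, `library_file`), the student ID and the addresses are hypothetical; they only stand in for separate files that each keep their own copy of the same data.

```python
# Two independent "files" that each store their own copy of a student's address.
# All names and values here are made up for illustration.
registration_file = {"S001": {"name": "Aminah", "address": "12 Jalan Mawar"}}
library_file = {"S001": {"name": "Aminah", "address": "12 Jalan Mawar"}}

# The student moves, but only the registration file is updated.
registration_file["S001"]["address"] = "8 Jalan Melati"


def find_inconsistencies(file_a, file_b, field):
    """Return the keys whose value for `field` differs between the two files."""
    shared_keys = file_a.keys() & file_b.keys()
    return sorted(k for k in shared_keys if file_a[k][field] != file_b[k][field])


inconsistent = find_inconsistencies(registration_file, library_file, "address")
print(inconsistent)
```

Because the address is stored twice, an update applied to only one copy leaves the two files disagreeing about the same student.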
8.2.2
Databases
A properly designed database system provides a solution to the problems of file processing. In
a database environment, several systems can be built around a single database. When designing
a database, we need to consider what type of Database Management System will be used. A
database management system (DBMS) is a collection of tools, features, and interfaces that
enable users to add, update, manage, access, analyze and delete the content of a set of
data. Some advantages of a DBMS are:
i) Scalability
The system can be expanded, modified or downsized easily to meet the rapidly
changing needs of an organization.
ii) Better support
In a client/server system, processing is distributed throughout the organization. Such a
system may require the power and flexibility of a database design.
iii) Economy of scale
The use of a database allows better utilization of hardware, so processing is done
at a lower cost.
iv) Flexible data sharing
Data can be shared among parties, allowing more users to access more data and
more than one user to access the same information.
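The add, update, access and delete operations named in the DBMS definition above can be sketched with Python's built-in `sqlite3` module. This is only an illustration, assuming SQLite as the DBMS; the `student` table, its columns and the sample values are hypothetical.

```python
import sqlite3

# In-memory database; table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (student_id TEXT PRIMARY KEY, student_name TEXT)")

# Add two records.
conn.execute("INSERT INTO student VALUES (?, ?)", ("S001", "Aminah"))
conn.execute("INSERT INTO student VALUES (?, ?)", ("S002", "Badrul"))

# Update one record.
conn.execute(
    "UPDATE student SET student_name = ? WHERE student_id = ?", ("Aminah Ali", "S001")
)

# Delete one record.
conn.execute("DELETE FROM student WHERE student_id = ?", ("S002",))

# Access what remains.
rows = conn.execute(
    "SELECT student_id, student_name FROM student ORDER BY student_id"
).fetchall()
conn.close()
print(rows)
```

Every application that connects to the same database sees the same single copy of each record, which is what removes the duplicated updates of file processing.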
8.2.3
Exercises
Answer TRUE or FALSE for each of the statements below.
1. The purpose of database design is to choose efficient and secure data storage
technologies. TRUE
2. The major problems in file processing are data redundancy, data integrity and rigid
data structure. TRUE
3. A database is a collection of interrelated files. TRUE
4. A database is necessarily dependent on the applications that use it. FALSE
5. A database can be developed using any DBMS. TRUE
8.3
DATABASE CONCEPTS
Before we proceed to database design, we first need to understand the concepts of a database.
In an ER diagram, an entity is a person, thing, place or event for which data is collected and
maintained. The relational model is based on the concept of a relation, which is physically
represented as a table. In this section, we will explain the terminology and structural concepts of
the relational model.
Figure 8-2: A Table STUDENT in a Database (callouts: fields, records, primary key)
8.3.1
Tables or Files
Data is organized into tables or files. In a database system, a file is called a table. A file is the
set of all occurrences of a given record structure. A table is the relational database equivalent of
a file. A table or file contains a set of related records that store data about a specific entity.
Tables and files are shown as two-dimensional structures that consist of rows and columns.
Each column represents a field, or characteristic of the entity, while each row represents a
record, which refers to an individual instance. Figure 8-2 shows an example of a table named
STUDENT in the STUDENT REGISTRATION SYSTEM.
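The row-and-column structure described above can be seen in a small sketch using Python's `sqlite3` module. The columns and sample values are assumptions made for illustration; the actual contents of Figure 8-2 are not reproduced here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical STUDENT table; column names and data are illustrative only.
conn.execute(
    "CREATE TABLE student ("
    "student_id TEXT PRIMARY KEY, student_name TEXT, program_code TEXT)"
)
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?)",
    [("S001", "Aminah", "CS110"), ("S002", "Badrul", "IS220")],
)

cursor = conn.execute("SELECT * FROM student ORDER BY student_id")
columns = [d[0] for d in cursor.description]  # the fields (columns)
records = cursor.fetchall()                   # the records (rows)
conn.close()

print(columns)
for record in records:
    print(record)
```

Each tuple in `records` is one row of the table, and each position in the tuple corresponds to one of the names in `columns`.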
8.3.2
Fields
Fields are common to files and databases. A field is the physical implementation of an attribute
in an ER diagram. It is the smallest unit of meaningful data to be stored in a file or database.
An attribute is represented as a field in a table, and a table consists of one or more fields.
Fields can have values. Figure 8-2 shows an example of the fields in the STUDENT table. These
values can be of fixed or variable length, and can be alphabetic, numeric, or of another type,
depending on the settings chosen during design; this is referred to as the domain. A domain is
the set of allowable values for one or more attributes. There are four types of fields that can
be stored:

Primary key - a field whose value is used to identify one and only one record.

Secondary key - a field that can be used as an alternate key to identify a single record
in a table.

Foreign key - a field that can be used as a pointer to the records of a different file or
table in a database.

Descriptive field - a field that stores business data and is not one of the types of keys
above.
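The four field types above can be sketched as column constraints in SQLite, again through Python's `sqlite3` module. The `program` table, the `email` column and all sample values are hypothetical; treating a secondary key as a `UNIQUE` column is one reasonable mapping, not the only one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

conn.execute("CREATE TABLE program (program_code TEXT PRIMARY KEY, program_name TEXT)")
conn.execute(
    """
    CREATE TABLE student (
        student_id   TEXT PRIMARY KEY,                       -- primary key
        email        TEXT UNIQUE,                            -- secondary (alternate) key
        program_code TEXT REFERENCES program(program_code),  -- foreign key
        student_name TEXT                                    -- descriptive field
    )
    """
)
conn.execute("INSERT INTO program VALUES ('CS110', 'Computer Science')")
conn.execute("INSERT INTO student VALUES ('S001', 'aminah@example.edu', 'CS110', 'Aminah')")


def try_insert(row):
    """Return True if the row is accepted, False if a key constraint rejects it."""
    try:
        conn.execute("INSERT INTO student VALUES (?, ?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False


duplicate_primary = try_insert(("S001", "other@example.edu", "CS110", "Badrul"))
duplicate_secondary = try_insert(("S002", "aminah@example.edu", "CS110", "Badrul"))
dangling_foreign = try_insert(("S003", "chong@example.edu", "XX999", "Chong"))
valid_row = try_insert(("S004", "devi@example.edu", "CS110", "Devi"))
print(duplicate_primary, duplicate_secondary, dangling_foreign, valid_row)
```

The first three inserts are rejected because they repeat a primary key value, repeat a secondary key value, or point to a program that does not exist; the descriptive field `student_name` carries no such restriction.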
8.3.3
Records
Fields are organized into records. Records are common to both files and tables. A record is a
collection of data that have something in common with the entity. A record, also referred to as
a tuple, is a set of related fields that describes one instance, or occurrence, of an entity. A
record may have one or more fields, depending on what type of information is needed. Figure 8-2
shows an example of records in the STUDENT table.
8.3.4
Exercises
Answer TRUE or FALSE for each of the statements below.
1. Tables and files are shown as two-dimensional structures that consist of rows and columns.
TRUE
2. A field is the physical implementation of a data attribute. TRUE
3. Files are the smallest unit of meaningful data to be stored. FALSE
4. A primary key is a field whose values identify one and only one record in a file, and it cannot
be NULL. TRUE
5. A record cannot have more than one field. FALSE
8.4
NORMALIZATION
Normalization is a technique for producing a set of relations with desirable properties, given the
data requirements of an enterprise (Connolly et al., 2002). Normalization is performed as a
series of tests on a relation to determine whether it satisfies the requirements of a given normal
form. Three normal forms are commonly used: first normal form (1NF),
second normal form (2NF) and third normal form (3NF). These normal forms are normally
based on functional dependencies among the attributes of a relation. In this section, we will
explain these three normal forms. Before that, we need to explain the concept of functional
dependencies.
8.4.1
Functional Dependencies
To explain the concept of functional dependencies, let us assume that we have a
relation R with several attributes named a1, a2, a3 and a4:
R (a1, a2, a3, a4)
In this relation R, if each value of a1 is associated with exactly one value of a2, then a2 is
functionally dependent on a1 (denoted a1 -> a2). This means that a1 is the determinant of a2.
Figure 8-3 shows that a2 is functionally dependent on a1. A determinant is the attribute or group
of attributes on the left-hand side of the arrow of a functional dependency (Connolly et al., 2002).
a1 -> a2 (a2 is functionally dependent on a1)
Figure 8-3: a1 is a determinant of a2
For the rest of this section, we will use the example below. Assume that we have a relation
STUDENT with the following attributes:
STUDENT (student_ID, student_name, program_code, course_id, course_name)
In this relation, note that student_name is functionally dependent on student_ID. student_ID
is the determinant of student_name, as shown in the figure below.
student_ID -> student_name (student_name is functionally dependent on student_ID)
Figure 8-4: student_ID is a determinant of student_name
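A functional dependency such as student_ID -> student_name can be checked against sample data with a short Python sketch. The function name `holds` and the sample rows are hypothetical; only the attribute names come from the STUDENT relation above.

```python
def holds(rows, determinant, dependent):
    """Check whether determinant -> dependent holds in the given rows.

    The dependency holds if no two rows agree on the determinant attributes
    but disagree on the dependent attributes.
    """
    seen = {}
    for row in rows:
        left = tuple(row[a] for a in determinant)
        right = tuple(row[a] for a in dependent)
        # Remember the first right-hand value seen for each left-hand value.
        if seen.setdefault(left, right) != right:
            return False
    return True


# Hypothetical sample rows for the STUDENT relation.
student_rows = [
    {"student_ID": "S001", "student_name": "Aminah", "course_id": "C10"},
    {"student_ID": "S001", "student_name": "Aminah", "course_id": "C20"},
    {"student_ID": "S002", "student_name": "Badrul", "course_id": "C10"},
]

name_depends_on_id = holds(student_rows, ["student_ID"], ["student_name"])
course_depends_on_id = holds(student_rows, ["student_ID"], ["course_id"])
print(name_depends_on_id, course_depends_on_id)
```

Note that sample data can only disprove a dependency. Whether a dependency really holds is a rule of the business, so a check like this is a sanity test rather than a proof.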
8.4.2 First Normal Form
First normal form (1NF) is a relation in which the intersection of each row and column contains
one and only one value. The process of normalization begins with a table in unnormalized form
(UNF). To transform the unnormalized form to 1NF, we identify and remove the repeating
groups in the table. A repeating group is an attribute or group of attributes in the table that
has multiple values for a single occurrence of the key attribute of that table. Figure 8-5
shows the repeating groups that occur when course data is repeated for each student.
Figure 8-5: A Table GRADE with repeating groups
There are two approaches to removing repeating groups from an unnormalized form:
1. Remove the repeating groups by entering the appropriate data in the empty columns
of the rows containing repeating data.
2. Remove the repeating groups by placing the repeating data, along with a copy of the
original key attribute, in a separate new relation.
Using the first approach, we remove the repeating group by entering the appropriate data in
each row. The result is shown in Figure 8-6. From here, we select StudId and CourseId together
as the primary key. The GRADE relation is then defined as below:
GRADE(StudId, StudName, Tel, Major, CourseId, CourseTitle, TeachName, TeachRoom, Grade)
Figure 8-6: First Normal Form
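The first approach, filling in the student data on every course row, can be sketched in Python as follows. The attribute names follow the GRADE relation, but every value is invented, and representing the repeating group as a nested `Courses` list is an assumption made for this sketch.

```python
# Unnormalized data: each student carries a repeating group of courses.
# All values are hypothetical; only the attribute names follow the GRADE relation.
unnormalized = [
    {
        "StudId": "S001", "StudName": "Aminah", "Tel": "555-0101", "Major": "CS",
        "Courses": [
            {"CourseId": "C10", "CourseTitle": "Databases", "TeachName": "Lim",
             "TeachRoom": "R1", "Grade": "A"},
            {"CourseId": "C20", "CourseTitle": "Networks", "TeachName": "Tan",
             "TeachRoom": "R2", "Grade": "B"},
        ],
    },
    {
        "StudId": "S002", "StudName": "Badrul", "Tel": "555-0102", "Major": "IS",
        "Courses": [
            {"CourseId": "C10", "CourseTitle": "Databases", "TeachName": "Lim",
             "TeachRoom": "R1", "Grade": "C"},
        ],
    },
]


def to_first_normal_form(records):
    """Repeat the student data on every course row so each cell holds one value."""
    flat = []
    for record in records:
        student = {k: v for k, v in record.items() if k != "Courses"}
        for course in record["Courses"]:
            flat.append({**student, **course})
    return flat


grade_rows = to_first_normal_form(unnormalized)
for row in grade_rows:
    print(row["StudId"], row["CourseId"], row["Grade"])
```

After flattening, each row is identified by the combination of StudId and CourseId, and no cell contains a list.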
8.4.3
Second Normal Form
Second normal form (2NF) is based on the concept of full functional dependency. A relation
is in second normal form if it is in 1NF and every non-key attribute is fully functionally
dependent on the key. Full functional dependency means that, if a1 and a2 are attributes of
relation R, a2 is functionally dependent on a1 but not on any proper subset of a1.
A partial functional dependency is a condition in which some attribute can be removed from a1
and the dependency still holds. For example, StudName depends on StudId alone, which is only
part of the key (StudId, CourseId), while Grade depends on the whole key. From the First Normal
Form above, we identify the partial and full dependencies for each attribute. Figure 8-7 shows
the results for Second Normal Form and its relations.
Figure 8-7: Second Normal Form
STUDENT(StudId, StudName, Tel, Major)
COURSETEACH(CourseId, CourseTitle, TeachName, TeachRoom)
REGISTRATION(StudId, CourseId, Grade)
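The split into these three relations can be sketched in Python as a projection of the 1NF rows onto each group of attributes, dropping duplicate rows. The sample rows and the helper name `project` are hypothetical.

```python
# Hypothetical GRADE rows in 1NF; the key is (StudId, CourseId).
columns = ["StudId", "StudName", "Tel", "Major", "CourseId",
           "CourseTitle", "TeachName", "TeachRoom", "Grade"]
grade_rows = [
    ("S001", "Aminah", "555-0101", "CS", "C10", "Databases", "Lim", "R1", "A"),
    ("S001", "Aminah", "555-0101", "CS", "C20", "Networks", "Tan", "R2", "B"),
    ("S002", "Badrul", "555-0102", "IS", "C10", "Databases", "Lim", "R1", "C"),
]


def project(rows, names):
    """Keep only the named columns and drop duplicate rows (order preserved)."""
    indexes = [columns.index(n) for n in names]
    return list(dict.fromkeys(tuple(row[i] for i in indexes) for row in rows))


# Attributes that depend only on StudId, only on CourseId, or on the whole key.
student = project(grade_rows, ["StudId", "StudName", "Tel", "Major"])
courseteach = project(grade_rows, ["CourseId", "CourseTitle", "TeachName", "TeachRoom"])
registration = project(grade_rows, ["StudId", "CourseId", "Grade"])

print(len(student), len(courseteach), len(registration))
```

In this sample the student and course details are each stored once instead of once per registration, which is the redundancy that 2NF removes.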
8.4.4
Third Normal Form
Second normal form (2NF) has less redundancy than 1NF, but some redundancy still exists.
A relation is in third normal form (3NF) if it is in first and second normal form and
no non-key attribute is transitively dependent on the key. Third normal form is
based on the concept of transitive dependency. Transitive dependency is a condition where a1,
a2 and a3 are attributes of relation R: if a1 -> a2 and a2 -> a3, then a3 is transitively dependent
on a1 via a2. In other words, a non-key attribute depends on another non-key attribute.
So, we should move all non-key attributes with transitive dependencies to a new relation,
together with a copy of the attribute they depend on. Figure 8-8 shows Third Normal Form and
its relations.
Figure 8-8: Third Normal Form
STUDENT(StudId, StudName, Tel, Major)
COURSETEACH(CourseId, CourseTitle, TeachName)
REGISTRATION(StudId, CourseId, Grade)
TEACHER(TeachName, TeachRoom)
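The effect of moving the teacher's room into its own relation can be sketched with Python's `sqlite3` module. This is a minimal illustration that assumes SQLite and invented sample data; the room column is written `TeachRoom` here, matching COURSETEACH in the second normal form.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE TEACHER (TeachName TEXT PRIMARY KEY, TeachRoom TEXT);
    CREATE TABLE COURSETEACH (
        CourseId    TEXT PRIMARY KEY,
        CourseTitle TEXT,
        TeachName   TEXT REFERENCES TEACHER(TeachName)
    );
    INSERT INTO TEACHER VALUES ('Lim', 'R1'), ('Tan', 'R2');
    INSERT INTO COURSETEACH VALUES
        ('C10', 'Databases', 'Lim'),
        ('C20', 'Networks', 'Tan'),
        ('C30', 'Data Mining', 'Lim');
    """
)

# The room is stored once per teacher, so one UPDATE changes it for every course.
conn.execute("UPDATE TEACHER SET TeachRoom = 'R9' WHERE TeachName = 'Lim'")

# A join rebuilds the 2NF view of COURSETEACH whenever it is needed.
rows = conn.execute(
    """
    SELECT c.CourseId, c.CourseTitle, c.TeachName, t.TeachRoom
    FROM COURSETEACH AS c JOIN TEACHER AS t ON t.TeachName = c.TeachName
    ORDER BY c.CourseId
    """
).fetchall()
conn.close()
print(rows)
```

Because TeachRoom depends on TeachName rather than directly on CourseId, keeping it in TEACHER means a room change never has to be repeated for each course the teacher gives.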
8.4.5
Exercises
Answer TRUE or FALSE for each of the statements below.
1. Normalization is a technique for producing a set of relations with desirable properties, given
the data requirements of an enterprise. TRUE
2. An attribute cannot be functionally dependent on more than one attribute. FALSE
3. A partial dependency is a condition where a non-key attribute depends on only part of the
key. TRUE
4. A transitive dependency is a condition where a non-key attribute depends on the key
attributes. FALSE
5. A relation is in third normal form (3NF) if it is in first and second normal form and
no non-key attribute is transitively dependent on the key. TRUE
SUMMARY
This is the end of Lesson 8. In this lesson, we have learned:

overview of database design

conventional files vs. database

database concepts

normalization
In the next lesson, we will discuss the second activity in system design, output and report
design. Outputs present information to system users, so we need to design good and
effective output.
SELF ASSESSMENT
Fill in the blanks with the correct answer.
1. ________________________ is where all of an information system’s data are kept as records
to be accessed again when the system needs them. Database
2. ________________________ is a collection of tools, features, and interfaces that enable users
to add, update, manage, access, analyze and delete the content of a set of data.
Database Management System
3. A ________________________ is the physical implementation of an attribute in an Entity
Relationship Diagram. Field
4. A ________________________ is a named, two-dimensional table of data. Relation
5. ________________________ is a field whose value is used to identify one and only one
record. Primary key
6. A ________________________ is a collection of data that have something in common with
the entity and is also referred to as a ________________________. record, tuple
7. ________________________ is the process of converting complex data structures into
simple, stable data structures. Normalization
8. A relation is in ________________________ if every non-primary-key attribute is functionally
dependent on the whole primary key. second normal form
9. Second normal form (2NF) is based on the concept of ________________________. full
functional dependency
10. ________________________ is a condition where a non-key attribute depends on another
non-key attribute. Transitive dependency