A WEB-BASED ELECTRONIC FILING SYSTEM USING
CONVERSION OF IMAGE FILE TO TEXT FILE APPROACH
YOUSIF NABEIL YOUSIF
DISSERTATION SUBMITTED IN FULFILMENT OF THE
REQUIREMENTS FOR THE DEGREE OF
MASTER OF COMPUTER SCIENCE
FACULTY OF COMPUTER SCIENCE
AND INFORMATION TECHNOLOGY
UNIVERSITY OF MALAYA
KUALA LUMPUR
MARCH 2010
DEDICATION
This work is dedicated to my beloved parents and my beloved wife
Acknowledgment
The author wishes to extend his grateful appreciation to all those who have contributed directly and indirectly to the preparation of this thesis. In particular, the author wishes to thank Doctor Norizan Mohd Yasin, Project Supervisor, for her advice, guidance and encouragement throughout the preparation of this thesis.
Special thanks go to the Panel of assessors for their reviews, assessments and comments, which contributed significantly to the betterment of the thesis.
Finally, the author expresses his sincere thanks to his family members and his best friends Gassan and Samih for the encouragement, inspiration and patience they provided at every step of this course of study.
Abstract
The purpose of this thesis is to develop a document management system for the departments of the Faculty of Computer Science and Information Technology (FCSIT), University of Malaya, that makes administering and managing student files more efficient. The system can also be used in other departments of the faculty or the university. The system developed is called the Electronic Filing System (EFS). It covers the scanning, storing, indexing, archiving, retrieval and accessing of original documents. EFS also helps users save time when searching for documents, and it can prevent documents from being lost or damaged by disasters such as fire. The system also increases the productivity of FCSIT users and makes more efficient use of information and communication technology. This study employs a qualitative research method, using observation, document analysis and interviews for data collection. The findings of the data analysis are used as the functional requirements for developing the system.
TABLE OF CONTENTS
Acknowledgment
Abstract
List of Figures
List of Tables
Chapter 1: Introduction
1.1 What is a Paper Document
1.1.1 What is Document Management
1.2 What is an Electronic Document Management System
1.2.1 What is an Electronic Filing System
1.3 Problem Background
1.4 Objective
1.5 Expected Outcome
1.6 Project Scope
1.7 Research Significance
1.8 Research Methodology
1.9 Thesis Layout
1.10 Summary
Chapter 2: Literature Review
2.1 The Definition of Document and Document Management
2.2 Different Types of Documents
2.3 The Uses of Paper Files and Electronic Files
2.4 Document Management System
2.5 Elements of Document Management
2.6 Centralized Filing System
2.6.1 Benefits of Centralized Filing System
2.7 Electronic Document Management Systems
2.7.1 Advantages of EDMS
2.8 Neural Networks
2.8.1 What is Neural Networks
2.8.2 Neural Networks Types
2.8.2.1 The First Type is Perceptron
2.8.2.2 The Second Type is Multi-Layer-Perceptron
2.8.2.3 The Third Type is Back Propagation Net
2.9 Self-Organizing Map
2.9.1 What is Self Organizing Map
2.10 Document Management Application: Literature Review
2.11 Summary
Chapter 3: Research Methodology
3.1 Research Methodology
3.2 Data Collection Process
3.2.1 Observation
3.2.2 Interviews
3.3 System Development Methodology Approach
3.3.1 Waterfall Model: Introduction
3.3.2 Task Regions of Waterfall Model
3.3.2.1 Requirement Gathering and Analysis
3.3.2.2 System Design
3.3.2.3 Coding
3.3.2.4 Testing
3.3.2.5 Installation
3.3.2.6 Maintenance
3.4 Justification of Methodology Selection
3.5 Summary
Chapter 4: Case Study
4.1 University of Malaya
4.2 Quality Management & Enhancement Centre (QMEC)
4.2.1 Documentation Section
4.2.2 Internal Quality Audit Section
4.2.3 Training & Awareness Section
4.2.4 Quality Management Section
4.2.5 Customers Feedback & Continuous Improvement Section
4.3 Faculty of Computer Science and Information Technology (FCSIT)
4.4 Office of FCSIT
4.5 Staff of FCSIT
4.6 Students of FCSIT
4.7 Research Unit of Analysis
4.8 The Current Document Managing System in FCSIT
4.9 Current System Drawbacks
4.10 Summary
Chapter 5: Data Collection and Analysis
5.1 The Answers to the Interview Questions
5.2 Observation
5.3 Challenges of Current Systems
5.4 General System Requirements
5.5 Summary
Chapter 6: System Implementation and Design
6.1 System Requirements
6.1.1 General Requirements
6.1.2 User Management Requirements
6.1.3 Functional Requirements
6.1.4 Non-Functional Requirements
6.2 Systems Development Consideration
6.2.1 System Environment
6.2.2 Programming Language and Development Tools
6.2.2.1 PHP Programming Language
6.2.2.2 MySQL Database System
6.3 Development Tools
6.3.1 PHP Designer 2007 Personal
6.3.2 MySQL Query Browser
6.3.3 MySQL Administrator
6.4 System Design
6.4.1 Interface Module
6.4.2 Administrator Module
6.4.2.1 Administrator: Manage Users
6.4.2.2 Administrator: Update User Account
6.4.2.3 Administrator: Updating Screen
6.4.2.4 Administrator: Delete User Account
6.4.2.5 Administrator: Admission Records
6.4.2.6 Delete Student Form
6.4.3 User Operation
6.4.3.1 User Interface Screen
6.4.3.2 User: Admission Records Screen
6.4.3.3 User: View Student Form Details
6.4.3.4 Update Student Form
6.4.3.5 User: Printing Students Form
6.4.3.6 Other Services
6.4.4 Java-Based Application Interface
6.4.4.1 Document Scanning
6.4.4.2 The Scanning Processes
6.4.4.3 Document Format
6.4.4.4 Scanning and Saving the Scanned Students’ Registration Forms into the System
6.4.4.4.1 Scanning the Students’ Registration Forms
6.4.4.4.2 Browse for the Forms
6.4.4.5 Artificial Neural Network
6.4.4.5.1 Definition of Artificial Neural Network
6.4.4.6 Definition of Self-Organizing Map
6.4.4.7 Form Training
6.4.4.7.1 Training Procedures in Reading the Registration Form
6.4.4.7.2 Sending Email
6.5 System Testing
6.5.1 Unit Testing
6.6 Summary
Chapter 7: Conclusion
7.1 Project Objectives
7.2 Training Staff and Users
7.3 System Limitation
7.4 Future Enhancements
7.5 Summary
Appendix A
References
List of Figures
Figure 2.1 The Activity Profile for 8 Desk Officers and Chiefs
Figure 2.2 Electronic Document Management Systems
Figure 2.3 Perceptron Characteristics
Figure 2.4 Multi-Layer-Perceptron Characteristics
Figure 2.5 Back Propagation Net Characteristics
Figure 2.6 Link Node
Figure 3.1 General Overview of “Waterfall Model”
Figure 4.1 Current Students Registration
Figure 6.1 Use Case of System Architecture
Figure 6.2 Web-Based Interfaces
Figure 6.3 Use Case of Admin Privilege
Figure 6.4 Administrator Management Page
Figure 6.5 Administrator Updating Accounts
Figure 6.6 Administrator Delete Users Accounts
Figure 6.7 Use Cases of Administrator Admission Records
Figure 6.8 Administrators: Admission Records Screen
Figure 6.9 Delete Student Form from the Database
Figure 6.10 Web-Based User Interface Screen
Figure 6.11 Use Case of User: Admission Records Screen
Figure 6.12 Staff: Admission Records
Figure 6.13 Users: Student Form Screen
Figure 6.14 Users: Update Student Form
Figure 6.15 User: Printing the Admission Records Page
Figure 6.16 Users: Printing the Student Form
Figure 6.17 Java-Based Application Interfaces
Figure 6.18 Main Page of the Java-Based Application
Figure 6.19 Student Form
Figure 6.20 New Specially Designed Registration Form
Figure 6.21 Designing the New Registration Form Template
Figure 6.22 Selecting the Scanner
Figure 6.23 Scan Process
Figure 6.24 Process of Changing from Image to Text
Figure 6.25 Upload Result Image-to-Text
Figure 6.26 Browse and Select the Required Registration Form
Figure 6.27 Neural Network Processes
Figure 6.28 First Example of a Training Form
Figure 6.29 Second Example of a Training Form
Figure 6.30 Training Procedures
Figure 6.31 Selecting the Characters and Numbers in the Form
Figure 6.32 Sending Student Forms by Email
List of Tables
Table 2.1 Differences Between FreeOCR, OmniPage 17 and SimpleOCR Systems
Table 3.1 Data Collection Source Evidence
Table 6.1 EFS System Environment
Table 6.2 EFS Hardware Requirements
Table 6.3 Unit Testing for the Entire Administrator User Functionality Module
Table 6.4 Unit Testing for the Entire Staff User Functionality Module
List of Abbreviations
ANN: Artificial Neural Network
CAD: Computer-Aided Design
COLD: Computer Output to Laser Disk
DMS: Document Management System
EDMS: Electronic Document Management System
EFS: Electronic Filing System
FCSIT: Faculty of Computer Science and Information Technology
IDE: Integrated Development Environment
IPS: Institute of Postgraduate Studies
ISC: International Student Centre
IT: Information Technology
OCR: Optical Character Recognition
PHEI: Public Higher Education Institution
QMEC: Quality Management & Enhancement Centre
SOM: Self-Organizing Map
UM: University of Malaya
WM: Waterfall Model
Chapter 1: Introduction
Paper files are one of the foundations of office work. As the quantity of files generated by administrative transactions grew, people used paper to create and distribute documents. Now, however, they create electronic documents with word processors, or presentations with presentation software, on a personal computer, and distribute these documents via computer networks (AbuSafiya & Mazumdar 2004).
File storage systems have become increasingly important, especially as the world moves towards computerized systems (Konishi & Ikeda 2007). Great advances in electronic information technology have made the creation, storage and flow of electronic documents not only feasible but economical, and have consequently led to great increases in productivity. Yet paper documents exist in virtually every office and are involved in most business and non-business processes.
Some institutions hold a very large number of files, which require a large amount of storage space. According to AbuSafiya and Mazumdar (2004), this creates additional problems in managing and administering proper storage. Poor management of documents, for example files left lying on the office floor, can cause health and safety problems, because people who share the office space can accidentally kick the files and injure themselves. Electronic documents are easier and more efficient to manage and administer. They require less physical storage space because they are stored and saved electronically, and electronic systems enable the rapid creation and distribution of documents. It can therefore be expected that people will eventually replace paper documents with electronic documents and realize a paperless office.
The aim of the electronic filing system, referred to as EFS in this thesis, is to manage documents more efficiently and effectively, firstly by reducing storage space and secondly by ensuring the safety of the files: no one can access the system without the correct password, which protects the students' documents. The functions of EFS include storing, retrieving, reading and printing document files. Changes to the text are allowed only with permission. These functions are discussed in detail in Chapter 6.
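The password protection and permission-controlled editing described above can be illustrated with a minimal sketch. It is written in Python for illustration only (EFS itself is built with PHP and MySQL, according to the table of contents for Chapter 6), and it assumes a salted PBKDF2 password hash and a simple role table with two roles, head of department and staff, following the privileges mentioned in Section 1.7. The role names, action names, iteration count and the `AccountStore` class are hypothetical, not taken from EFS.

```python
import hashlib
import hmac
import secrets

# Assumed PBKDF2 work factor; a real deployment would tune this value.
ITERATIONS = 100_000

# Hypothetical privilege table: the actions each role may perform.
PERMISSIONS = {
    "head": {"view", "print", "update", "delete"},
    "staff": {"view", "print", "update"},
}


def hash_password(password, salt=None):
    """Derive a salted PBKDF2-HMAC-SHA256 digest; only salt and digest are stored."""
    if salt is None:
        salt = secrets.token_bytes(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest


class AccountStore:
    """In-memory stand-in for the user-account table."""

    def __init__(self):
        self._accounts = {}  # username -> (salt, digest, role)

    def add_user(self, username, password, role):
        if role not in PERMISSIONS:
            raise ValueError(f"unknown role: {role}")
        salt, digest = hash_password(password)
        self._accounts[username] = (salt, digest, role)

    def login(self, username, password):
        """Return the user's role if the password is correct, otherwise None."""
        account = self._accounts.get(username)
        if account is None:
            return None
        salt, digest, role = account
        _, candidate = hash_password(password, salt)
        # Constant-time comparison avoids leaking how many bytes matched.
        return role if hmac.compare_digest(candidate, digest) else None


def authorize(role, action):
    """Raise PermissionError unless the logged-in role may perform the action."""
    if role is None or action not in PERMISSIONS.get(role, set()):
        raise PermissionError(f"role {role!r} may not {action}")


store = AccountStore()
store.add_user("staff01", "s3cret", "staff")
staff_role = store.login("staff01", "s3cret")   # "staff"
authorize(staff_role, "update")                 # permitted, returns None
failed_role = store.login("staff01", "wrong")   # None
```

Only the salt and digest are kept, so a leaked account table does not directly reveal passwords. Under this assumed role table, `authorize(staff_role, "delete")` raises `PermissionError`, which models the rule that changes require permission.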
To store a file, the system uses paper files as input: users scan the paper source document with a scanner to enter the file into the system, which stores it in the allocated space (the database) in the system storage. This thesis uses the terms user and staff interchangeably to refer to anyone who uses the system.
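Storing a scanned form in the database can be sketched as follows. This minimal sketch uses Python's built-in `sqlite3` module as a stand-in for the MySQL database used by EFS; the table name `scanned_forms`, its columns and the student identifier are hypothetical, and the image bytes are a placeholder rather than a real scan.

```python
import sqlite3


def create_store(conn):
    """Create a hypothetical table holding one row per scanned form image."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS scanned_forms (
               id INTEGER PRIMARY KEY AUTOINCREMENT,
               student_id TEXT NOT NULL,
               scanned_on TEXT NOT NULL,
               image BLOB NOT NULL
           )"""
    )


def store_scan(conn, student_id, scanned_on, image_bytes):
    """Insert the raw image bytes and return the new row id."""
    cur = conn.execute(
        "INSERT INTO scanned_forms (student_id, scanned_on, image) VALUES (?, ?, ?)",
        (student_id, scanned_on, image_bytes),
    )
    conn.commit()
    return cur.lastrowid


def load_scan(conn, form_id):
    """Return the stored image bytes, or None if the id does not exist."""
    row = conn.execute(
        "SELECT image FROM scanned_forms WHERE id = ?", (form_id,)
    ).fetchone()
    return None if row is None else row[0]


conn = sqlite3.connect(":memory:")
create_store(conn)
placeholder_image = b"\x89PNG\r\n\x1a\n" + bytes(32)  # stands in for scanner output
form_id = store_scan(conn, "S0001", "2010-03-01", placeholder_image)
restored = load_scan(conn, form_id)
```

Parameterized queries (`?` placeholders) keep the binary image data and user-supplied identifiers out of the SQL text. MySQL offers BLOB column types, such as MEDIUMBLOB and LONGBLOB, for the same purpose.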
1.1 What is a Paper Document?
A paper document is any source of information, in material form, that can be used for reference, for study or as an authority. Examples of documents include manuscripts, printed matter, illustrations, diagrams and museum specimens.
1.1.1 What is Document Management?
Document management is the process of handling documents in such a way that information can be created, shared, organized and stored efficiently and appropriately (LaMarca et al. 2006).
1.2 What is an Electronic Document Management System?
An electronic document management system (EDMS) is a computer system, or suite of programs, designed to store and track electronic documents and other media.
1.2.1 What is an Electronic Filing System?
An electronic filing system holds a plurality of document files, each associated with an identifying mark for that document. It includes an input through which documents are registered, either as a single document or through a multiple-document registration process.
1.3 Problem Background
Managing documents is a problem in many organizations, and poor document management can reduce an organization's efficiency. An organization needs systematic methods and procedures for administering and managing its documents; however, many organizations still have problems in managing them.
Some institutions have to keep their documents/files for several years before they can be destroyed. This implies that there will be numerous paper files in each institution, all of which need to be kept and take up office space. This creates a storage space problem. In addition, whenever a document needs to be retrieved, time is needed to search the files for it.
In the current system, a few users need to work in the archive to search for required files, and the time taken to search the archive reduces users' productivity and efficiency. In contrast, the new system does not require dedicated staff to search for the required documents/files: any authorized user can find them quickly from a computer, which increases users' productivity and efficiency at work.
1.4 Objective
The main objectives of this research are:
• To study how files are managed in organizations.
• To investigate the work processes involved in managing files.
• To develop a system to manage files electronically.
• To develop a program that is able to read handwritten characters, so that a paper source document can be scanned as an image, converted into text that can be manipulated, and kept for later use.
1.5 Expected Outcome
The output of this research is a system that manages documents/files electronically. This system will overcome the problems identified in Section 1.3 by allowing documents/files to be stored electronically, which minimizes the large storage space needed for manual paper files. In addition, storing, organizing and retrieving electronic files is made easier and faster. Data input is also faster, because the source paper document is scanned with a scanning device and stored as an image in the system.
1.6 Project Scope
This research is intended to develop a system that manages documents for easy storage, organization and retrieval of files. Input to the system will be paper documents, which are scanned as images and stored in the computer system. Each image is then converted to text using a neural network method and displayed so that staff can manipulate it just like any other text file.
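The idea of recognising characters with a neural network can be illustrated with a toy example. This minimal sketch trains a multiclass perceptron, the simplest network type reviewed in Chapter 2, on three hypothetical 5×5 bitmaps of digits. It does not reproduce the self-organizing map discussed in Chapters 2 and 6, and it assumes that the scanned form has already been segmented into individual character images.

```python
# Hypothetical 5x5 bitmaps ("#" = ink) standing in for segmented handwritten digits.
GLYPHS = {
    "0": [" ### ",
          "#   #",
          "#   #",
          "#   #",
          " ### "],
    "1": ["  #  ",
          " ##  ",
          "  #  ",
          "  #  ",
          " ### "],
    "7": ["#####",
          "    #",
          "   # ",
          "  #  ",
          "  #  "],
}


def to_vector(rows):
    """Flatten a bitmap into 0/1 pixel features plus a constant bias input of 1."""
    return [1.0 if ch == "#" else 0.0 for row in rows for ch in row] + [1.0]


class MulticlassPerceptron:
    """One weight vector per character; the highest-scoring character wins."""

    def __init__(self, labels, n_features):
        self.labels = list(labels)
        self.weights = {label: [0.0] * n_features for label in self.labels}

    def score(self, label, x):
        return sum(w * xi for w, xi in zip(self.weights[label], x))

    def predict(self, x):
        # Ties go to the label listed first.
        return max(self.labels, key=lambda label: self.score(label, x))

    def train(self, samples, max_epochs=100):
        """Perceptron updates until an epoch has no mistakes; returns epochs used."""
        for epoch in range(max_epochs):
            mistakes = 0
            for x, target in samples:
                guess = self.predict(x)
                if guess != target:
                    # Move the correct class towards x and the wrong guess away from it.
                    self.weights[target] = [w + xi for w, xi in zip(self.weights[target], x)]
                    self.weights[guess] = [w - xi for w, xi in zip(self.weights[guess], x)]
                    mistakes += 1
            if mistakes == 0:
                return epoch + 1
        return max_epochs


samples = [(to_vector(rows), label) for label, rows in GLYPHS.items()]
model = MulticlassPerceptron(GLYPHS, n_features=len(samples[0][0]))
epochs = model.train(samples)

# A "7" with its top-left pixel missing, as a crude stand-in for handwriting variation.
NOISY_SEVEN = [" ####",
               "    #",
               "   # ",
               "  #  ",
               "  #  "]
print(model.predict(to_vector(NOISY_SEVEN)))  # 7
```

On these three bitmaps the perceptron converges in a few epochs. Recognising real handwriting would need many training examples per character and more robust features than raw pixels.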
1.7 Research Significance
This research identifies the need to develop and promote a comprehensive web-based electronic filing system in the Faculty of Computer Science and Information Technology (FCSIT), University of Malaya.
Research significance normally concerns two types of users: researchers and practitioners. For researchers, this research provides a good base for further study in the field of electronic filing. It studies in depth the concept, contents and important role of electronic filing in educational institutions.
For practitioners, the study of the system users identifies their awareness of the current system and their willingness to move from the conventional, manual file system to the modern method of managing files through a web portal. Moreover, the research replaces work-intensive, space-hogging file cabinets with a fully automated paperless environment. Users can be more productive with the electronic filing system and get an immediate response to a document inquiry by:
• Providing universal access to accurate administrative forms.
• Reducing the administrative time and costs of handling students' documents.
• Organizing and saving all student files in the system.
• Viewing all the student files that have been stored in the database.
• Using the system with different privileges for the head of department and the staff.
1.8 Research Methodology
The research used a qualitative research methodology. The instruments used to collect data were interviews and observation. The interviews were conducted with staff in the office of the Faculty of Computer Science and Information Technology (FCSIT), University of Malaya. All interview questions focused on the work process and the technical aspects of the current system and the proposed new system.
Further qualitative data came from observing staff using the current system. The observation data are very important because they reveal the technical aspects of the system.
1.9 Thesis Layout
The thesis is divided into the following chapters.
Chapter 1: Introduction
This chapter presents the background of the problem, the main objectives of the research and the scope of the research. It also outlines the methodology used to conduct the research.
Chapter 2: Literature Review
This chapter presents a thorough review of previous work in this field, including the models, definitions and techniques used by other researchers.
Chapter 3: Research Methodology
This chapter explains the data collection process, which includes the collection of primary data through interviews and observation, and the secondary data resources used in the research.
Chapter 4: Case Study
This chapter details the case study conducted at the Faculty of Computer Science and Information Technology, University of Malaya.
Chapter 5: Data Analysis and Findings
This chapter analyses the data collected in the case study. The findings of the analysis are then used as a guide to determine the user requirements of the proposed EFS.
Chapter 6: System Implementation and Design
This chapter focuses on the implementation of the electronic filing system. It describes the system design, development, implementation and testing, including the coding of the classes, the development of the user interface and the creation of an installation package for the system. It also describes in detail the design aspects of EFS, including the architecture design, functional design, data format design, user interface design and database design; the functional design is expressed in UML use case diagrams.
Chapter 7: Conclusion
This chapter concludes the research. It also covers the contributions of the research, future enhancements and suggestions for improving the system.
1.10 Summary
In summary, this chapter has set out the intention to build a web-based electronic filing system for use by staff in the university offices. The system stores students' forms and is developed to make the staff's work with students' documents easier.
Chapter 2: Literature Review
This chapter reviews the existing literature on document handling and storage. Prior research in this domain is reviewed thoroughly, and the capabilities and features of existing document management systems are critically analysed.
The objective of the literature review is to gain a greater understanding of the information systems that have been implemented and are already in use in similar situations. Suitable features of existing systems are examined and considered for incorporation into the proposed system.
2.1 The Definition of Document and Document Management
In general, the word document means a container of information (often paper) holding written or drawn information that is used for a specified purpose (Matheu 2005). Usually, a document is a piece of paper or a collection of papers, for example a memorandum, correspondence, a mission statement, a receipt for materials or a client statement. Central to the idea of a document is that it can normally be transported, stored and handled as a single unit without difficulty.
Over the last decade, the term document has undergone a radical change in definition, due in part to information technology. As a result, a large part of the documents used in business today are files stored on personal computers and handled by software applications and email systems. Information technology can now produce new electronic documents that hold graphics, text, computer-aided design (CAD) drawings and multimedia objects such as audio or video clips (Zantout & Marir 1999).
Documents are processed and stored in electronic form not as physical objects but as digital ones. A document is no longer a place where words are put on a page, but rather a collection of elements or objects related to a particular topic, brought together as one. Therefore, a new definition of a document in the electronic age emerges.
2.2 Different Types of Documents
a) Paper Document
The consumption of paper, which is usually made from wood fibre, exerts considerable pressure on the world's forest ecosystems. At face value, the rise of the computer and its capacity to store documents in electronic form would seem likely to reduce paper consumption, which would be welcome news for the world's forests (York 2006).
However, paper files are still widely used during the life cycle of a document, because paper documents are easy to annotate, provide an inexpensive way to display large amounts of information, are socially acceptable and are convenient in meetings, and interaction with paper is very flexible.
The paper file has several advantages over the electronic file, such as being able to lie flat on a desk and be read without the assistance of any other tool or software. In addition, folders, files and piles have a number of visually distinguishable attributes: size, location and the look of the topmost document. Paper is used extensively for reviewing documents and taking notes because of its versatility and simplicity (Sellen & Harper 1997). As users annotate printed documents to gather notes, it is easy and fast to find and locate information if the files are properly organized on the shelves by number or by a predetermined pattern.
While the paper file has its merits, it is not free from defects. Many companies still store hundreds or even thousands of paper documents in filing cabinets or boxes, which requires a large amount of storage space. Information stored in this manner will also be lost or damaged in a fire or natural disaster. The cost of maintaining these files and of copying the material to an off-site location can be very high, and if the files are not arranged properly, it takes a long time to find a particular file (Sellen & Harper 1997).
b) Electronic File
There are many definitions of the electronic file. In this thesis, however, the definition is limited to the following:
• A collection of data stored in a defined electronic format. An electronic file may be a single electronic record, or a group or series of transactions.
• A document that is created and stored on a computer.
Electronic files have been developed to support efforts to reduce the printing of documents such as letters and images (Matsuo, Nakamura & Tatekawa 2001). Electronic paper is an information terminal that stores letters and images as electronic information that can be freely edited, and it is configured like the familiar book or notebook. From one point of view, an electronic file is a block of arbitrary information, or a resource for storing information, that is available to software and is usually based on some kind of durable storage. A file is durable in the sense that it remains available for programs to use after the current program has finished. Computer files can be viewed as the modern counterpart of the paper documents that were traditionally kept in the files of offices and libraries, which is the source of the term.
An electronic file has many benefits. It resolves the conventional problem of displaying a document while it is being communicated, and exchanging electronic files is more efficient and faster. Electronic document storage also solves many problems; for example, electronic documents can be stored on a server (Aura, Kuhn & Roe 2006). Backing up electronic data is trivial, and electronic document storage is cheap, since the cost of hard drives, and hence of data storage, is constantly falling. Online data storage is a specific form of electronic document storage that allows documents to be kept securely in a data centre and accessed at any time from anywhere in the world. As part of their service, most online data storage providers back up their customers' data regularly.
The electronic file also has disadvantages. Two major disadvantages that can cause tremendous loss are damage to the computer and virus attacks. If the computer is damaged and there is no copy of the data it stored, the loss would be enormous. Virus attacks are also very prevalent: if a virus infects the computer and deletes all the files in the database, it would not be easy to recover and restore the files to their original form (Aura, Kuhn & Roe 2006).
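Because the main risk described above is losing the only copy of the data, a minimal sketch of a verified backup may be useful. It copies a file into a backup directory with Python's standard `shutil.copy2` and confirms the copy by comparing SHA-256 checksums. The directory layout, file name and function names are assumptions for illustration; a practical scheme would also keep copies on separate hardware or off site.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path):
    """Hash a file in chunks so large scans need not fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def back_up(source, backup_dir):
    """Copy source into backup_dir and verify the copy by comparing SHA-256 checksums."""
    backup_dir = Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)
    target = backup_dir / Path(source).name
    shutil.copy2(source, target)  # copies contents and metadata such as timestamps
    if sha256_of(source) != sha256_of(target):
        raise OSError(f"backup of {source} does not match the original")
    return target


with tempfile.TemporaryDirectory() as work:
    original = Path(work) / "student_form.txt"
    original.write_text("Name: Student A\nProgramme: MCS\n", encoding="utf-8")
    copy_path = back_up(original, Path(work) / "backup")
    copy_matches = copy_path.read_bytes() == original.read_bytes()
    copy_name = copy_path.name
```

Verifying the checksum right after copying catches a corrupt backup at the time it is made, rather than when the original has already been lost.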
2.3 The Uses of Paper Files and Electronic Files
Figure 2.1 shows how paper files and electronic files are used across a wide range of activities. It shows that files are used in nearly all of these activities (Sellen & Harper 1997):
[Bar chart: time spent on each activity (in minutes, 0 to 2500), divided into Paper Only, Electronic Only, Paper & Electronic, and No Documents. Activities include drafting, creating and editing own text and data, reviewing others' documents, collaborative authoring, conversations, meetings, reading, note taking, formatting, form filling, typing text, organising, photocopying, printing, searching, telephone and thinking.]
Figure 2.1 The Activity Profile for 8 Desk Officers and Chiefs
In Sellen and Harper's study, although there were 16 economists in the sample, only 8 of them provided complete data for the quantitative analysis, owing to their busy schedules and days away from the office. Figure 2.1 shows the resulting activity profile. As might be expected, a large proportion of their time was spent on authoring activities. The figure also shows the extent to which these processes relied on paper, or on a combination of paper and electronic tools. In particular, collaborative authoring processes, whether co-authoring a document or reviewing the documents of others, were heavily paper-based. Paper was also often present in the drafting and editing of their own text and data, although usually in conjunction with online tools (Sellen & Harper 1997).
Figure 2.1 also shows that over half of the conversations and the majority of meetings were supported by paper documents. Of further interest, paper also tended to be the preferred medium for reading documents, for document delivery, for thinking and planning activities, and for document organization.
2.4 Document Management System
A document management system organizes, stores and retrieves documents according to properties attached to them. Applications that work with hierarchical path names communicate with the document management system through a translator.
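The translator mentioned above, which lets path-based applications work with a property-based document store, can be sketched as follows. This minimal sketch assumes a hypothetical path layout `/<department>/<year>/<doc_type>` and maps each path component to a document property; the property names and sample documents are illustrative, not taken from any particular DMS.

```python
from dataclasses import dataclass, field

# Hypothetical mapping from path depth to document property.
PATH_KEYS = ("department", "year", "doc_type")


@dataclass
class Document:
    doc_id: int
    properties: dict = field(default_factory=dict)


def path_to_query(path):
    """Translate '/FCSIT/2009/registration' into {'department': 'FCSIT', ...}."""
    parts = [p for p in path.strip("/").split("/") if p]
    if len(parts) > len(PATH_KEYS):
        raise ValueError(f"too many path components in {path!r}")
    return dict(zip(PATH_KEYS, parts))


def lookup(documents, path):
    """Return documents whose properties match every component of the path."""
    query = path_to_query(path)
    return [d for d in documents
            if all(d.properties.get(k) == v for k, v in query.items())]


docs = [
    Document(1, {"department": "FCSIT", "year": "2009", "doc_type": "registration"}),
    Document(2, {"department": "FCSIT", "year": "2010", "doc_type": "registration"}),
    Document(3, {"department": "FCSIT", "year": "2009", "doc_type": "transcript"}),
]
print([d.doc_id for d in lookup(docs, "/FCSIT/2009")])  # [1, 3]
```

Because the folder hierarchy is only a view over the properties, the store itself never needs real directories; a shorter path simply constrains fewer properties.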
The most sophisticated method currently used to manage documents is the Document Management System (DMS), in which documents are stored centrally on a server and users interact with this central repository through interfaces implemented in standard web browsers (Omar 2005). DMSs are developed to provide a library or repository where documents can be created, managed and stored for easier access by departments and users across an enterprise.
A document management system (DMS) is a management control system used to regulate the creation, use and maintenance of documents electronically. It links paper, images and electronic documents into one flexible and powerful system. A DMS allows paper to be converted to electronic formats, and images, raw data, facsimile transmissions, e-mail, sound or video clips and paper records can be linked through a single indexing and retrieval application. Bar code technology applied to both paper and imaged documents allows all records to be indexed, tracked and retrieved through a single user application (Omar 2005).
Some systems also include scanning and network features that allow multiple users on a network to access the necessary documents remotely and simultaneously. Records can be indexed, stored, retrieved, printed or faxed by any authorized user on the network (Lea & Smith Judy Read 2002). Fax messages are captured, stored, routed or relayed, eliminating the need for hard copies. Electronic documents can be stored on optical as well as electronic media, and raw data can be located automatically and directly through searches on Computer Output to Laser Disk (COLD). COLD is a technique for transferring computer-generated output to optical disk so that it can be viewed and printed without using the original program. COLD also combines the capability of scanning paper documents created on another system and linking them to COLD documents.
Several crucial technologies are incorporated into document management systems, including full-text retrieval, electronic document imaging, film-based imaging and workflow systems. Workflow systems provide automatic work processes, as well as the scheduling, control and routing of electronic documents and other related work processes in an organization (Lea & Smith Judy Read 2002). A DMS can prevent records from being lost, save storage space, make records easy to manage, find documents quickly, make images centrally available and eliminate file cabinets. Document imaging not only keeps all documents organized but also allows them to be maintained and backed up daily, weekly, monthly or even yearly.
2.5 Elements of Document Management
A complete document management system consists of six components: scanning, storage, indexing, archiving, retrieval and access. The following paragraphs briefly describe each of these components.
a) Document Scanning
Scanning technology makes the conversion of paper documents to electronic format a fast, inexpensive and easy process. A good-quality scanner makes it easy to put paper files into a computer, and it should convert the original information accurately so that no details are lost in the process.
b) Document Storage
Storage, which is also called filing, is the placement of hard copies, or the saving of computer records, in a suitable location. The storage system creates an organized document filing system and makes retrieval simple and efficient. A stable storage system should be able to adapt to ever-changing documents, increasing volumes and advancing technology.
c) Document Indexing
Indexing is a very important component of a document management system. The indexing system produces an organized document filing system, which makes retrieval simple and efficient, and a proper indexing system enables more effective procedures. The index can incorporate physical location information (where the document is stored) and document identification information (the date created, the date archived and the subject matter of its contents). In addition, indexing is the mental process of determining the filing segment or name under which a record is stored, or the placing or listing of items in an order that follows a specific system (Lea & Smith Judy Read 2002).
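The index described above can be sketched as a small in-memory structure. This is a minimal illustration only, assuming a hypothetical `IndexEntry` record that holds the fields named in the text (storage location, date created, date archived and subject matter) and a hypothetical `DocumentIndex` that files document IDs under each subject in alphabetical order; it is not the EFS implementation.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, List, Optional


@dataclass
class IndexEntry:
    """Hypothetical index record: location plus identification information."""
    doc_id: str
    location: str                 # where the document is stored
    date_created: date
    subjects: List[str]           # subject matter of the document's contents
    date_archived: Optional[date] = None


class DocumentIndex:
    """Hypothetical index mapping each subject term to the IDs filed under it."""

    def __init__(self) -> None:
        self.entries: Dict[str, IndexEntry] = {}
        self.by_subject: Dict[str, List[str]] = {}

    def add(self, entry: IndexEntry) -> None:
        self.entries[entry.doc_id] = entry
        for subject in entry.subjects:
            ids = self.by_subject.setdefault(subject.lower(), [])
            if entry.doc_id not in ids:
                ids.append(entry.doc_id)
                ids.sort()  # keep a fixed, alphabetical filing order

    def find_by_subject(self, subject: str) -> List[IndexEntry]:
        return [self.entries[i] for i in self.by_subject.get(subject.lower(), [])]

    def location_of(self, doc_id: str) -> str:
        return self.entries[doc_id].location


# Hypothetical student files
index = DocumentIndex()
index.add(IndexEntry("S002", "Cabinet B / Drawer 1", date(2009, 3, 2), ["Registration"]))
index.add(IndexEntry("S001", "Cabinet A / Drawer 3", date(2009, 1, 15), ["Registration", "Transcript"]))
print([e.doc_id for e in index.find_by_subject("registration")])  # ['S001', 'S002']
print(index.location_of("S001"))  # Cabinet A / Drawer 3
```

Keeping each subject's ID list sorted mirrors the idea of listing items "in an order that follows a specific system"; a real system would typically keep such an index in a database rather than in memory.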
d) Document Archiving
An archive is a group of records or documents with specific characteristics; the term also refers to the location in which these documents are kept. Archiving is the long-term storage of electronic documents so that they can be retrieved in the future.
e) Document Retrieval
The system uses information about the documents, including the index and the text, to find images stored in the system (Lea & Smith Judy Read 2002). This makes it easy to find the right documents and retrieve them quickly. Retrieval is the process of identifying and removing a record or file from storage. It also covers the retrieval of information on a particular subject from the stored data.
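Retrieval through the text of stored documents is commonly done with an inverted index. The sketch below is an illustration under stated assumptions, not the EFS's method: it assumes the text of each scanned form has already been extracted, and the helper names `tokenize`, `build_inverted_index` and `search` are hypothetical. A query returns only the documents that contain every query word.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set


def tokenize(text: str) -> List[str]:
    """Lower-case the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(docs: Dict[str, str]) -> Dict[str, Set[str]]:
    """Map every word to the set of document IDs whose text contains it."""
    inverted: Dict[str, Set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for word in tokenize(text):
            inverted[word].add(doc_id)
    return inverted


def search(inverted: Dict[str, Set[str]], query: str) -> List[str]:
    """Return the IDs of documents containing every query word (AND semantics)."""
    words = tokenize(query)
    if not words:
        return []
    result = set(inverted.get(words[0], set()))
    for word in words[1:]:
        result &= inverted.get(word, set())
    return sorted(result)


# Hypothetical text extracted from three scanned forms
docs = {
    "F1": "Application for course registration, semester 1",
    "F2": "Examination result slip, semester 2",
    "F3": "Course registration amendment form",
}
idx = build_inverted_index(docs)
print(search(idx, "course registration"))  # ['F1', 'F3']
```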
f) Document Access
One major component of any document management system is access. Documents should be readily available for viewing by those who need them, while the system retains the flexibility to control who can access it. To access a document, the user must know the location of the file; otherwise it will be difficult to find.
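One common way to obtain the flexibility to control access is role-based access control. The following is a minimal sketch under assumptions: the roles (`admin`, `staff`, `student`), the document IDs and the `can_access` helper are all hypothetical, and the text does not specify which scheme the EFS uses. An action is permitted only if the user's role may perform it and the role is allowed to see that document.

```python
from typing import Dict, Set

# Hypothetical roles and the actions each may perform
ROLE_PERMISSIONS: Dict[str, Set[str]] = {
    "admin": {"view", "edit", "delete"},
    "staff": {"view", "edit"},
    "student": {"view"},
}

# Hypothetical per-document restrictions: which roles may see each document
DOCUMENT_ROLES: Dict[str, Set[str]] = {
    "transcript-001": {"admin", "staff", "student"},
    "salary-2009": {"admin"},
}


def can_access(role: str, doc_id: str, action: str) -> bool:
    """Allow the action only if the role may perform it and may see the document."""
    allowed_actions = ROLE_PERMISSIONS.get(role, set())
    allowed_roles = DOCUMENT_ROLES.get(doc_id, set())
    return action in allowed_actions and role in allowed_roles


print(can_access("student", "transcript-001", "view"))  # True
print(can_access("staff", "salary-2009", "view"))       # False
```

Unknown roles and unknown documents fall back to empty sets, so the check denies access by default.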
2.6 Centralized Filing System
Staff in organizations with many employees have almost certainly experienced the frustration of trying to find files and documents that someone else has used, or that have quite possibly been misplaced or lost, with no one able to remember where they were kept (Liu, McMahon & Culley 2008). The files might be hidden on someone else's desk, in a drawer somewhere, or at the bottom of one's own stack of items "to be dealt with later". Usually, the last person to use a file simply kept it in their work area. This happens when there is no central location, with accompanying procedures, for records that need to be handled by many people from many departments. A centralized filing system is therefore one solution to this situation.
A centralized filing system is a system in which all the records from all departments or units are kept in one central location. The management of these records is placed under the control of one staff member or, in the case of large centralized filing systems, several people.
Files are then conveniently available to all departments. Establishing a central filing system and its processes requires determining which information must be accessible to all staff and which should be available only to certain individuals. The director must be involved in all phases of the project, for obvious reasons but also because an assessment of information needs is required, and its results can show the benefits of the new system (Liu, McMahon & Culley 2008).
Consider this quotation from Information and Records Management: "No organization should permit bits and segments of its records to be scattered randomly wherever they happen to be created or to have accumulated" (Liu, McMahon & Culley 2008). On the other hand, an organization should not arbitrarily force the centralization of records without taking into account the practical needs of its offices. Careful planning and consideration are therefore important before starting to reorganize, electronically, the files that are used by many people from various departments.
2.6.1 Benefits of Centralized Filing System
Some of the main benefits of a centralized filing system are less duplication of files and more efficient use of equipment, supplies and space. All related data can also be kept together. Another benefit is that the organization is able to provide a uniform service (Liu, McMahon & Culley 2008). In addition, when access is provided to all staff positions, the frustration of finding and locating information is reduced, which in turn decreases bickering among staff. Finally, a centralized filing system can simplify routine maintenance and the annual and periodic archiving of files.
In establishing central files, it is also important to designate which records must not be available to everyone. The side benefits of creating a central filing system are not insignificant either: it would be unusual for the reorganization not to uncover some out-dated records that should be archived, or that are long overdue for destruction. These records are already taking up valuable space, and the reorganization is the best, and perhaps the only, opportunity to put records retention guidelines into effect.
2.7 Electronic Document Management Systems
An electronic document management system (EDMS) is the application of technology to save paper, speed up communication and increase the productivity of business operations. From a broader perspective, EDMS represents a significant expansion of the area of information management and a concomitant increase in the responsibilities of managers and executives (Zantout & Marir 1999).
Electronic document management systems have been around for several decades, and in recent years the technologies have developed to include a variety of features (Zantout & Marir 1999). Imaging technology makes it possible to replace a paper-based document management system with an online one, while multimedia technology involves the detailed display of various data types, together with the ability to retrieve the constituent objects of a multimedia document. In addition, systems may incorporate groupware, workflow and text retrieval functionality, with some overlap among them. These are discussed in the following sections. The most important functions of current document management systems enable users to:
• directly manipulate the documents,
• index and store the records so that they are retrievable,
• communicate through the exchange of documents,
• collaborate around documents,
• model and automate the flow of documents.
Sooner or later, every company will feel the need for some kind of electronic document management system to control its ever-growing number of documents and drawings. Companies often recognize this need but are deterred by the expense and difficulty involved in implementing an EDMS. Using an EDMS effectively requires a substantial change in working practices, even though most technical aspects have been resolved through the adoption of low-cost databases and easy integration with the Windows environment (Sprague Jr 1995). A useful EDMS should not only control documents but also provide access to them throughout the company, and even to customers or other project participants through the Internet or a network. An EDMS should also centralize data in an easily accessible environment, allowing users to store, access and modify information quickly and easily.
In addition, managing all the information necessary to design and build any major project is a real challenge, and many believe that more efficient information management is a main mechanism for companies to increase their productivity. A good system should provide the following standard features: a searching facility, viewing without the original application, red-lining and mark-up, printing and plotting, workflows and document life cycles, revision and version control, document security, document relationships, status reporting, issue/distribution management and remote access (Sprague Jr 1995).
Figure 2.2 shows electronic document management at its basic level. Data enter a stand-alone system either from CD or by being scanned into the system. The EDMS then provides data storage and retrieval, with outputs in the form of hard copies or computer files.
[Figure: EDMS workflow. 1. Scan: printed documents are scanned. 2. Register: all documents are registered with author, input data, content type and other information (document metadata). 3. Store: documents are indexed and stored in a database or file structures. 4. Retrieve: documents are retrieved through search and queries; the system provides version check-in/check-out, activity tracking, etc.]
Figure 2.2 Electronic Document Management Systems
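The version check-in/check-out step in Figure 2.2 can be illustrated with a small repository. This is a hypothetical sketch, not a specific EDMS product's API: `DocumentRepository`, `register`, `check_out` and `check_in` are illustrative names. A document can be checked out by only one user at a time, and each check-in by that user stores a new version.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ManagedDocument:
    doc_id: str
    versions: List[str] = field(default_factory=list)  # content of each version
    checked_out_by: Optional[str] = None


class DocumentRepository:
    """Hypothetical repository with simple check-out/check-in version control."""

    def __init__(self) -> None:
        self.docs: Dict[str, ManagedDocument] = {}

    def register(self, doc_id: str, content: str) -> None:
        if doc_id in self.docs:
            raise ValueError(f"{doc_id} is already registered")
        self.docs[doc_id] = ManagedDocument(doc_id, [content])

    def check_out(self, doc_id: str, user: str) -> str:
        doc = self.docs[doc_id]
        if doc.checked_out_by is not None:
            raise RuntimeError(f"{doc_id} is already checked out by {doc.checked_out_by}")
        doc.checked_out_by = user
        return doc.versions[-1]  # hand out the latest version for editing

    def check_in(self, doc_id: str, user: str, new_content: str) -> int:
        doc = self.docs[doc_id]
        if doc.checked_out_by != user:
            raise PermissionError(f"{user} has not checked out {doc_id}")
        doc.versions.append(new_content)
        doc.checked_out_by = None
        return len(doc.versions)  # new version number, counting from 1


repo = DocumentRepository()
repo.register("FORM-7", "draft")
text = repo.check_out("FORM-7", "alice")
version = repo.check_in("FORM-7", "alice", text + " (approved)")
print(version)  # 2
```

The exclusive lock prevents two users from overwriting each other's edits; real systems usually add timestamps and an activity log for tracking.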
2.7.1 Advantages of EDMS
Many companies use an EDMS to standardize the way users with the right privileges find and access the documents and information they want. An EDMS helps users do their jobs more easily and provides the company with security and reliability of data and management actions. Many of its features aim to save time, simplify work, protect the investment in creating these documents, enforce quality standards, enable auditing and ensure accountability (Groetzner, Guenthner & Streckeisen 2004).
The advantages of an EDMS generally include:
• efficient location and delivery of documentation,
• the ability to manage documents and system data regardless of source or format,
• the ability to integrate computer and paper documents,
• control of the access, distribution and modification of documents,
• provision of document editing and mark-up tools.
2.8 Neural Networks
In this research, a neural network is the method used to read scanned input data so that it can be saved in the system database: in developing the EFS, an artificial neural network is used to read the forms and convert them from image to text. Many researchers work on this class of algorithms, and each defines it in their own way. Some definitions of artificial neural networks are presented here for discussion.
2.8.1 What are Neural Networks?
Neural networks are a new way of programming a computer (Chen 1996). They perform exceptionally well in pattern recognition and other tasks that are very difficult to program using conventional techniques. Programs using neural networks are also able to learn and adapt to changing circumstances.
Neural networks are strong at modeling data: they can capture and represent complex relationships between inputs and outputs. The motivation for developing neural network technology stemmed from the desire to design a system that could perform "intelligent" tasks similar to those performed by the human brain. Neural networks resemble the human brain in the following ways (Chen 1996):
1. A neural network acquires knowledge through learning.
2. A neural network's knowledge is stored within inter-neuron connection strengths known as synaptic weights.
The real power of neural networks lies in their ability to represent both linear and non-linear relationships and to learn these relationships directly from the data being modeled. Traditional linear models are simply inadequate for modeling data with non-linear characteristics (Chen 1996).
Neural networks represent a different model of computing:
• Von Neumann machines are based on a processing/memory abstraction of human information processing.
• Neural networks are based on the parallel architecture of animal brains.
Neural networks are a form of multiprocessor computer system, with:
• simple processing elements,
• a high degree of interconnection,
• simple messages,
• adaptive interaction between elements.
2.8.2 Types of Neural Networks
There are several types of neural networks. They can be distinguished by their type (feed-forward or feedback), their structure and the learning algorithm they use. The type of a neural network indicates whether the neurons within one of the network's layers are connected to each other. Feed-forward neural networks only allow connections between neurons of different layers, while feedback networks can also have connections between neurons of the same layer (Cho 2000). This section presents a selection of neural networks.
2.8.2.1 The First Type is the Perceptron
The Perceptron was first introduced by F. Rosenblatt in 1958. It is a very simple type of neural network with two layers of neurons, which accepts only binary input and output values (0 or 1). The learning process is supervised, and the network is able to solve basic logical operations such as AND or OR. It is also used for pattern classification. As Figure 2.3 shows, more complicated logical operations such as XOR cannot be solved by a perceptron (Cho 2000), because the XOR outputs cannot be separated by a single straight line (they are not linearly separable).
Figure 2.3 Perceptron Characteristics
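The perceptron's behaviour on AND, OR and XOR can be checked with a short sketch. This assumes a single threshold output neuron with two binary inputs and a bias, trained with Rosenblatt's error-correction rule; the helper names `step` and `train_perceptron` are illustrative. Training stops as soon as a full pass over the truth table produces no errors, which never happens for XOR.

```python
from typing import List, Tuple

Sample = Tuple[Tuple[int, int], int]

# Truth tables for three logical operations (binary inputs and outputs)
AND: List[Sample] = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
OR: List[Sample] = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
XOR: List[Sample] = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]


def step(z: float) -> int:
    """Threshold activation: binary output 0 or 1."""
    return 1 if z >= 0 else 0


def train_perceptron(data: List[Sample], epochs: int = 20, lr: float = 1.0):
    """Rosenblatt's rule: w <- w + lr * (target - output) * x, and likewise for b."""
    w = [0.0, 0.0]
    b = 0.0
    for _ in range(epochs):
        errors = 0
        for (x1, x2), target in data:
            delta = target - step(w[0] * x1 + w[1] * x2 + b)
            if delta != 0:
                errors += 1
                w[0] += lr * delta * x1
                w[1] += lr * delta * x2
                b += lr * delta
        if errors == 0:  # a full pass with no mistakes: the table was learned
            return w, b, True
    return w, b, False


for name, table in [("AND", AND), ("OR", OR), ("XOR", XOR)]:
    _, _, learned = train_perceptron(table)
    print(name, "learned" if learned else "not learned")
# prints: AND learned / OR learned / XOR not learned
```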
2.8.2.2 The Second Type is the Multi-Layer Perceptron
The Multi-Layer Perceptron was introduced for the first time by M. Minsky and S. Papert in 1969, as shown in Figure 2.4. It is an extended Perceptron with one or more hidden layers of neurons between the input and output layers. Because of this extended structure, a multi-layer Perceptron is capable of solving all the logical operations, including the XOR problem (Cho 2000).
Figure 2.4 Multi-Layer-Perceptron Characteristics
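Why one hidden layer is enough for XOR can be shown with hand-picked weights. In this illustrative sketch the weights are chosen by hand rather than learned: one hidden neuron computes "x1 OR x2", another computes "x1 AND x2", and the output neuron fires only when OR is true and AND is not, which is exactly XOR.

```python
def step(z: float) -> int:
    """Threshold activation: binary output 0 or 1."""
    return 1 if z >= 0 else 0


def xor_mlp(x1: int, x2: int) -> int:
    # Hidden layer: h_or fires for "x1 OR x2", h_and fires for "x1 AND x2"
    h_or = step(x1 + x2 - 0.5)
    h_and = step(x1 + x2 - 1.5)
    # Output layer: fire when OR is true but AND is not
    return step(h_or - h_and - 0.5)


print([xor_mlp(a, b) for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]])  # [0, 1, 1, 0]
```

Each neuron on its own still draws a single straight line; combining two such lines in the hidden layer is what makes the non-linearly-separable XOR problem solvable.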
2.8.2.3 The Third Type is the Back-Propagation Net
The Back-Propagation Net was first published by G.E. Hinton, D.E. Rumelhart and R.J. Williams in 1986 and is one of the most powerful types of neural network (Cho 2000). It has the same structure as the Multi-Layer Perceptron and uses the back-propagation learning algorithm, as shown in Figure 2.5.
Figure 2.5 Back Propagation Net Characteristics
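The back-propagation algorithm can be sketched on the XOR problem. This is a minimal illustration, not the network used in the EFS. It assumes a 2-3-1 network of sigmoid units (the class name `TinyMLP` and the layer sizes are hypothetical), mean squared error, and full-batch gradient descent. The output error is propagated back through the output weights to compute each hidden unit's error term.

```python
import math
import random
from typing import List, Tuple

Sample = Tuple[Tuple[float, float], float]

# XOR truth table: the problem a single-layer perceptron cannot solve
XOR: List[Sample] = [((0.0, 0.0), 0.0), ((0.0, 1.0), 1.0),
                     ((1.0, 0.0), 1.0), ((1.0, 1.0), 0.0)]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


class TinyMLP:
    """2 inputs -> n_hidden sigmoid units -> 1 sigmoid output (hypothetical sizes)."""

    def __init__(self, n_hidden: int = 3, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.w_h = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(n_hidden)]
        self.b_h = [rng.uniform(-1, 1) for _ in range(n_hidden)]
        self.w_o = [rng.uniform(-1, 1) for _ in range(n_hidden)]
        self.b_o = rng.uniform(-1, 1)

    def forward(self, x: Tuple[float, float]) -> Tuple[List[float], float]:
        h = [sigmoid(w[0] * x[0] + w[1] * x[1] + b) for w, b in zip(self.w_h, self.b_h)]
        y = sigmoid(sum(wo * hj for wo, hj in zip(self.w_o, h)) + self.b_o)
        return h, y

    def loss(self, data: List[Sample]) -> float:
        """Mean squared error over the whole data set."""
        return sum((self.forward(x)[1] - t) ** 2 for x, t in data) / len(data)

    def train_epoch(self, data: List[Sample], lr: float) -> None:
        """One full-batch gradient-descent step with back-propagated errors."""
        n = len(self.w_o)
        g_wh = [[0.0, 0.0] for _ in range(n)]
        g_bh = [0.0] * n
        g_wo = [0.0] * n
        g_bo = 0.0
        for x, t in data:
            h, y = self.forward(x)
            d_out = 2.0 * (y - t) * y * (1.0 - y)  # error term at the output unit
            g_bo += d_out
            for j in range(n):
                g_wo[j] += d_out * h[j]
                # Error propagated back to hidden unit j through its output weight
                d_h = d_out * self.w_o[j] * h[j] * (1.0 - h[j])
                g_wh[j][0] += d_h * x[0]
                g_wh[j][1] += d_h * x[1]
                g_bh[j] += d_h
        m = len(data)
        for j in range(n):  # apply the averaged gradients
            self.w_o[j] -= lr * g_wo[j] / m
            self.w_h[j][0] -= lr * g_wh[j][0] / m
            self.w_h[j][1] -= lr * g_wh[j][1] / m
            self.b_h[j] -= lr * g_bh[j] / m
        self.b_o -= lr * g_bo / m


net = TinyMLP(n_hidden=3, seed=1)
before = net.loss(XOR)
for _ in range(10000):
    net.train_epoch(XOR, lr=1.0)
after = net.loss(XOR)
print(f"MSE before training: {before:.4f}, after: {after:.4f}")
print([round(net.forward(x)[1], 2) for x, _ in XOR])
```

With random initial weights, a network this small can occasionally settle in a poor local minimum on XOR, so the exact final outputs depend on the seed.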
2.9 Self-Organizing Map
The previous section gave an overview of the neural network algorithms that can be used to convert scanned documents to text format. In the EFS implementation, however, the self-organizing map (SOM) algorithm, a special type of neural network, is used. This section describes the SOM algorithm in detail.
2.9.1 What is a Self-Organizing Map?
The self-organizing map (SOM) is a new, effective software tool for the visualization of high-dimensional data (Oja, Kaski & Kohonen 2003). It implements an orderly mapping of a high-dimensional distribution onto a regular low-dimensional grid. It is thereby able to convert complex, non-linear statistical relationships between high-dimensional data items into simple geometric relationships on a low-dimensional display. Because it compresses information while preserving the most important topological and metric relationships of the primary data items on the display, it can also be thought of as producing some kind of abstraction. These two aspects, visualization and abstraction, can be used in a number of ways in complex tasks such as process analysis, machine perception, control and communication.
The SOM usually consists of a regular two-dimensional grid of nodes (Kohonen 1998). A model of some observation is associated with each node, as shown in Figure 2.6. The SOM algorithm computes the models so that they optimally describe the domain of (discrete or continuously distributed) observations. The models are automatically organized into a meaningful two-dimensional order in which similar models are closer to each other in the grid than more dissimilar ones. In this sense the SOM is a similarity graph and a clustering diagram, too. Its computation is a nonparametric, recursive regression process (Kohonen 1998).
Figure 2.6 Link Node
In this exemplary application, each processing element in the hexagonal grid holds a
model of a short-time spectrum of natural speech. Note that neighbouring models are
mutually similar.
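The recursive regression described above can be sketched with the usual online SOM update, in which each node's model m_i moves towards an input x by m_i <- m_i + alpha(t) * h_ci(t) * (x - m_i). Here c is the best-matching node and h_ci is a neighbourhood function on the grid. This is a minimal illustration under assumptions: the Gaussian neighbourhood, the linear decay of alpha and of the neighbourhood radius, the 4 x 4 grid and the two-cluster test data are all illustrative choices, not the EFS configuration.

```python
import math
import random
from typing import List, Tuple

Vector = List[float]
Grid = List[List[Vector]]


def best_matching_unit(models: Grid, x: Vector) -> Tuple[int, int]:
    """Grid position (row, col) of the model closest to x (squared Euclidean distance)."""
    best, best_d = (0, 0), float("inf")
    for i, row in enumerate(models):
        for j, m in enumerate(row):
            d = sum((a - b) ** 2 for a, b in zip(m, x))
            if d < best_d:
                best, best_d = (i, j), d
    return best


def train_som(data: List[Vector], rows: int, cols: int,
              epochs: int = 20, lr0: float = 0.5, seed: int = 0) -> Grid:
    rng = random.Random(seed)
    dim = len(data[0])
    # One model vector per node of the regular 2-D grid, randomly initialised
    models = [[[rng.random() for _ in range(dim)] for _ in range(cols)] for _ in range(rows)]
    sigma0 = max(rows, cols) / 2.0
    total = epochs * len(data)
    t = 0
    for _ in range(epochs):
        order = data[:]
        rng.shuffle(order)
        for x in order:
            frac = t / total
            lr = lr0 * (1.0 - frac)                  # learning rate decays linearly
            sigma = max(sigma0 * (1.0 - frac), 0.5)  # neighbourhood radius shrinks
            bi, bj = best_matching_unit(models, x)
            for i in range(rows):
                for j in range(cols):
                    # Gaussian neighbourhood measured on the grid, not in data space
                    h = math.exp(-((i - bi) ** 2 + (j - bj) ** 2) / (2.0 * sigma ** 2))
                    m = models[i][j]
                    for k in range(dim):
                        m[k] += lr * h * (x[k] - m[k])
            t += 1
    return models


# Hypothetical 2-D data: two well-separated clusters
rng = random.Random(42)
data = ([[rng.gauss(0.1, 0.05), rng.gauss(0.1, 0.05)] for _ in range(30)] +
        [[rng.gauss(0.9, 0.05), rng.gauss(0.9, 0.05)] for _ in range(30)])
som = train_som(data, rows=4, cols=4)
print(best_matching_unit(som, [0.1, 0.1]), best_matching_unit(som, [0.9, 0.9]))
```

Because neighbouring nodes are pulled towards the same inputs, nearby models end up similar, which is the ordering property shown in Figure 2.6.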
2.10 Document Management Applications - Literature Review
Allergan, founded in 1948, is a technology-driven health care company with its headquarters in Irvine, California. It develops and commercializes products for the eye care pharmaceutical, ophthalmic surgical device, over-the-counter contact lens care, movement disorder and dermatological markets throughout the world. Allergan markets its products in more than 100 countries and in 1997 generated approximately $1.1 billion in worldwide revenue.
To be successful, Allergan requires a very efficient and streamlined business operation strategy. To achieve this, Allergan's management teams invested a great deal of time in researching and selecting the best business systems to meet their needs as they planned ahead. They also had to ensure that their staff were adequately trained to reap the maximum benefits. This resulted in the purchase of IXOS-ARCHIVE¹ as their imaging and archiving solution for SAP™ R/3®.
Competitive global companies such as Allergan are implementing imaging and archiving solutions to manage the ever-increasing volume of data, so that system performance is optimized, data is secure and employee productivity increases. IXOS-ARCHIVE, the SAP-certified imaging and archiving product suite, addresses these concerns by storing and retrieving documents, reports, data and images digitally under the control of R/3 processes. System performance remains optimal with regular data archiving, data is stored securely, and employees become more productive as they simply access the information they need on-line, when and where it is needed, from one file system. The Accounts Payable Shared Service Centre was Allergan's first department to use IXOS-ARCHIVE in order to cut costs and save time. Instead of being processed as hard copies, invoices are now scanned and processed for payment on-line.
¹ Available from: http://www.ixos.com, last accessed 15/6/2009
IXOS-ARCHIVE allows the scanned documents to be manipulated before they are transported to the transaction processor. Pages that were inadvertently scanned upside down can be turned right side up, and the page order of multiple-page documents can be changed. Once a scanned invoice is transported to the transaction processor, the processor can send the invoice image to the appropriate person for payment approval and general ledger coding, or match the invoice on-line to the appropriate purchase order and receiver before processing it for payment. Before IXOS-ARCHIVE was implemented, approval requests had to be sent via interoffice mail, and the process could take as long as two weeks.
Allergan has sites in California, Texas, Massachusetts, Mexico, Puerto Rico and Canada that use IXOS-ARCHIVE. Very often, employees or Cost Centre Managers at these different locations needed to research a particular invoice, and would call the Shared Service Accounts Payable Centre in Irvine to ask for copies of specific documents to be faxed to them. With IXOS-ARCHIVE, most of these phone calls to the Shared Service Centre no longer have to be made. Authorized viewers can quickly access the appropriate invoice or related documents on their computer screens. In addition, outsourcing the paper documents to be microfilmed or microfiched (at considerable cost) is no longer necessary.
The Accounts Payable Shared Service Centre has realized considerable time savings as
they receive fewer phone calls asking for research. They no longer need to stand in front of
a photocopier making duplicates of the invoices which would later need to be sent out for
approval, general ledger coding, or auditing before being mailed to the requestors. This
translated to more efficient time management and improved performance.
IXOS-ARCHIVE has also been put to use in the Return Goods processing department, where it has helped to eliminate much manual paperwork and filing. In addition, SAP transaction history is being archived for permanent storage and retrieval. As the volume of the company's data grows, system performance becomes a greater priority. Allergan's management wisely chose to plan ahead before system performance was adversely affected, and began data archiving.
Sandy Howard, the Project Manager of the Business Systems Development Group at Allergan, noted: "We wanted to make sure we didn't run into any performance problems. Our financial ledger is growing by about 10 gigabytes per month, so we started archiving this data first. We'll also start archiving our SIS and LIS reporting data. IXOS required very little implementation time, and the IXOS SOFTWARE team has really impressed us with their technical knowledge and outstanding customer service."
In 1985, the timeshare administrator Hutchinson & Co was formed as a collection agent for just one resort. As the company grew, it gradually took over all the back-office administration, including the work of the trustee. As a result, five years later, in 1990, it formed Hutchinson & Co Trust Company Ltd. The company now supports more than 100 resorts in the United Kingdom, Europe and South East Asia from its headquarters in Camberley, Surrey.
Hutchinson is required to retain documentation relating to its customers' timeshare agreements for a period of 80 years. Storing and accessing such an amount of paperwork was becoming a very expensive problem. "We were surrounded by paperwork," said Hutchinson Systems Administrator David Earles. "Every single wall of the office was lined with shelves and files. We even had a separate building just to store the files." Earles said that the company had been researching available digital archiving systems for several years but found that the technology at that time was not yet able to meet its needs. "Back then, the machines were too expensive and too slow to make it a worthwhile option for us. Fortunately, the situation has improved a lot."
In June 2001, Earles and his colleagues began discussing a document management solution with Canon. "We went to Canon because we knew they were the best in the industry, and we needed a powerful, reliable solution," he said. The solution proposed by Canon combined the Canon DR5020 document scanner with the Scan-File 2000 archiving software.
The size and thickness of documents were not a problem for the Canon DR5020, as it can scan documents up to and including A3 size. As for speed, it can handle up to 75 pages per minute, and with automatic feeding and double-feed detection it can process batches of up to 500 sheets. The control panel is placed on the unit itself, making the DR5020 compact, lightweight and especially user-friendly. It also offers a variety of scanning options: a barcode unit for automatic indexing, an endorser for "post-stamping" documents and an imprinter for "pre-stamping" documents on the actual digital image.
Free OCR² is a free online Optical Character Recognition (OCR) tool. It can be used to perform OCR on any supplied image, free of charge; the user simply uploads the image file. Free OCR supports different file types such as JPG, GIF, TIFF, BMP and PDF, but it can process only the first page of a file.
The only restrictions on using the tool are that the image file must not be larger than 2 MB or wider or higher than 5000 pixels, and that there is a limit of 10 image uploads per hour. The image is also automatically pre-processed and optimized before it is fed into the OCR engine: background noise is reduced and the resolution is adjusted. The only remaining step is to de-skew the image if the skew is more than 10 degrees. Free OCR can now handle images with multi-column text, and it supports many languages, including Bulgarian, Catalan, Czech, Danish, Dutch, English, Finnish, French, German, Greek, Hungarian, Indonesian, Italian, Latvian, Lithuanian, Norwegian, Polish, Portuguese, Romanian, Russian, Serbian, Slovak, Slovene, Spanish, Swedish, Tagalog, Turkish, Ukrainian and Vietnamese.
The Free OCR software enables users to extract text from an image and convert it into an editable text document: a document is scanned, and the OCR tool converts the text in the image into editable text. The result is always plain text.
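The upload limits listed above can be expressed as a simple pre-check that a client might run before sending an image. This is a hypothetical sketch: `upload_problems` is not part of Free OCR, and it assumes that "2MB" means 2 x 1024 x 1024 bytes and that the hourly limit is a sliding 3600-second window.

```python
from typing import List

MAX_BYTES = 2 * 1024 * 1024  # "not larger than 2MB" (binary megabytes assumed)
MAX_SIDE_PX = 5000           # "no wider or higher than 5000 pixels"
MAX_UPLOADS_PER_HOUR = 10    # "a limit of 10 image uploads per hour"


def upload_problems(size_bytes: int, width: int, height: int,
                    recent_upload_times: List[float], now: float) -> List[str]:
    """Return the reasons an image would be rejected (an empty list means acceptable)."""
    problems = []
    if size_bytes > MAX_BYTES:
        problems.append("file larger than 2 MB")
    if width > MAX_SIDE_PX or height > MAX_SIDE_PX:
        problems.append("image wider or higher than 5000 pixels")
    # Count earlier uploads inside a sliding one-hour (3600-second) window
    in_last_hour = [t for t in recent_upload_times if now - t < 3600]
    if len(in_last_hour) >= MAX_UPLOADS_PER_HOUR:
        problems.append("hourly upload limit of 10 reached")
    return problems


# A hypothetical A4 scan at 300 dpi, about 500 KB, with no earlier uploads
print(upload_problems(500_000, 2480, 3508, [], now=0.0))  # []
```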
² Available from: http://www.free-ocr.com, last accessed 1/5/2009
OmniPage 17³ is marketed as the world's most accurate document conversion application. With a claimed character accuracy of more than 99%, it can convert PDF and paper files into electronic files that can be edited, searched and shared in the formats of the user's choice, turning documents that would take hours to re-type into formatted electronic documents in seconds. OmniPage 17 also supports the conversion of all popular image formats, such as TIF, JPG, BMP, PCX, GIF, PDF, MAX and more.
The tool also offers optical character recognition with up to 99% accuracy for 119 languages, and OmniPage now recognizes Simplified and Traditional Chinese, Japanese and Korean. According to the vendor, no better OCR application can be found for the price, and this increased level of accuracy greatly reduces the need for post-recognition proofreading and correction.
The vendor also claims that the accuracy of this document conversion software allows organizations to save significant amounts of time and money by radically improving the ways in which paper and digital documents are processed, archived and shared. The software is simple to use: tasks are automated, information is easily accessed, and productivity increases.
Even digital camera or iPhone pictures can be converted into files that can be edited in common PC applications, while scanned documents can be turned into electronic books specially formatted for easy reading on the Amazon Kindle, so that documents can be carried anywhere.
SimpleOCR⁴ is a popular OCR freeware application with thousands of users worldwide. SimpleOCR, which offers up to 99% accuracy, is also a royalty-free OCR SDK that developers can use in their custom applications. SimpleOCR currently supports English and French, and its developers state that recognition for additional languages is being added.
According to its developers, SimpleOCR is the fastest free way for users with a scanner to avoid retyping their documents. The SimpleOCR software is 100% free and not limited in any way: anyone can use it for free, including home users, educational institutions and even corporate users.
³ Available from: http://www.nuance.com/imaging, last accessed 13/8/2009
⁴ Available from: http://www.simpleocr, last accessed 3/8/2009
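Accuracy figures such as "up to 99%" are easier to interpret with a concrete measure. One common definition of character accuracy is 1 minus the edit (Levenshtein) distance between the correct text and the OCR output, divided by the length of the correct text. This is an illustrative assumption: the vendors above do not state how their figures are measured. At 99% accuracy, a hypothetical page of 2,000 characters would still contain about 20 character errors.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of insertions, deletions and substitutions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = cur
    return prev[-1]


def character_accuracy(reference: str, ocr_output: str) -> float:
    """1 - (edit distance / reference length), floored at 0."""
    if not reference:
        return 1.0 if not ocr_output else 0.0
    return max(0.0, 1.0 - levenshtein(reference, ocr_output) / len(reference))


# Typical OCR confusions: "l" read as "1", "m" read as "rn"
truth = "Faculty of Computer Science"
ocr = "Facu1ty of Cornputer Science"
print(levenshtein(truth, ocr))                         # 3
print(round(character_accuracy(truth, ocr), 3))       # 0.889
```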
According to the SimpleOCR developers, documents with multi-column layouts, non-standard fonts, poor quality or colour images require one of their commercial OCR applications or imaging SDKs for an accurate read. Their OCR Guide compares desktop and server OCR solutions from several major engines, including ABBYY, IRIS and Nuance (formerly ScanSoft). Their imaging solutions website, ScanStore.com, offers demo downloads and online ordering for all these applications through a ScanStore user account. The developers encourage users to try the SimpleOCR freeware with their own documents first, and to turn to these ScanStore products if more demanding applications need better features and accuracy.
SimpleOCR has a large dictionary. With more than 120,000 words, it is unlikely that SimpleOCR will encounter a word it does not know. In the rare event that it does, the improved text editor allows the user to add the new word to the dictionary easily, so SimpleOCR becomes better with every use.
As for input formats, SimpleOCR works with all fully compliant TWAIN scanners and accepts input from TIFF files. As for output formats, SimpleOCR can save the documents it acquires in text formats (TXT and RTF), which can be imported into almost every program, such as Word, WordPerfect, HTML editors and e-mail programs, either fully formatted or as plain text. It can also save scanned documents in the industry-standard TIFF format, a format as widely accepted as PDF.
Table 2.1 shows the differences between the Free OCR, OmniPage 17 and SimpleOCR systems.
Feature | Free OCR | OmniPage 17 | SimpleOCR
Cost | Free | Has to be purchased | Has to be purchased
File types | JPG, GIF, TIFF, BMP, PDF | TIF, JPG, BMP, PCX, GIF, PDF, MAX | JPG, GIF, TIFF, BMP, PDF, MAX
File size (maximum) | 2 MB | 4 MB | 2 MB
Languages | 32 | 119 | English, French
Online | Online OCR | Not online | Not online
Accuracy | Up to 89% | Up to 99% | Up to 99%
Table 2.1: Differences between Free OCR, OmniPage 17 and SimpleOCR
2.11 Summary
The literature review covered various topics, including the definitions of 'document' and 'document management', the different types of documents, document management systems, the elements of document management and centralized filing systems. Electronic document management systems and their benefits have been defined and discussed. In addition, different types of neural networks and the self-organizing map have been described. Finally, examples of current electronic filing systems have been discussed.
Chapter 3: Research Methodology
This chapter focuses on the criteria for selecting a suitable methodology for the data collection process, as well as for the project's software life cycle and development tools.
3.1 Research Methodology
According to Yin (1994) and Zikmund (1987), research can serve three purposes: exploratory, descriptive and explanatory.
Exploratory studies are a valuable means of finding out what is happening, seeking new insights, asking questions and assessing phenomena in a new light. Robson (2002) explained that an exploratory study is a particularly useful approach when one wishes to clarify the understanding of a problem. The advantage of exploratory research is that it is very flexible and adaptable to change. However, the flexibility inherent in exploratory research does not mean an absence of direction.
Descriptive research is conducted within problem areas where a substantial body of literature already exists, and its aim is to study events that have occurred or are happening at present. Descriptive research aims to describe the characteristics of a population or phenomenon, and seeks to answer who, what, when, where and how questions (Zikmund 1987). According to Robson (2002), the objective of descriptive research is to portray an accurate profile of persons, events or situations; it is often an extension of, or a forerunner to, a piece of exploratory research (Robson 2002). Zikmund (1987) noted that accuracy is of immense importance in descriptive research: although errors cannot be eliminated completely, good research strives for descriptive precision. Descriptive research is usually based on some previous knowledge and understanding of the nature of the research problem.
Explanatory research aims to establish causal relationships between variables. The emphasis is on studying a situation or a problem in order to explain the relationships between variables. Usually, exploratory and/or descriptive research precedes this kind of research (Zikmund 1987), and the researcher must be knowledgeable about the research subject.
The purpose of this study has been assessed as both exploratory and descriptive. The study is exploratory because knowledge about the research area is limited and the research aims to gain a deeper understanding of this field. It is also descriptive, as it attempts to describe the data collected.
3.2
Data Collection Process
40
The data collected can be classified as primary versus secondary data. Primary data is
gathered and assembled specifically for the research project at hand. Secondary data has
already been collected for purposes other than the problem at hand.
According to Yin (1994), there are six sources of evidence that can be the focus of data collection for case studies: documentation, archival records, interviews, direct observation, participant-observation, and physical artefacts. Each of these sources of evidence is explained in Table 3.1.
Table 3.1: Data Collection Source Evidence

Source of Evidence | Description
Documentation | The different types of documents include statistics, registrations, official publications, letters, diaries, newspapers, journals, branch literature and brochures. Documents are mostly used for collecting secondary data.
Archival Records | These can be, for example, service records, organisational records, maps and charts, survey data, and personal records. Archival records are often used in computerised form, also for collecting secondary data.
Interviews | The interviews mostly take an open-ended form, in which an investigator can ask key respondents for the facts of a matter, as well as for the respondents’ opinions about events. The interview can also take the form of a focused interview, in which a respondent is interviewed for a short period of time, an hour for example. Moreover, the interview can entail more structured questions, along the lines of a formal survey.
Direct Observation | This can involve observations of meetings, sidewalk activities, factory work, classrooms, and the like. Observational evidence is often useful in providing additional information about the topic being studied. To increase the reliability of observational evidence, a common procedure is to have more than a single observer making an observation, whether of the formal or the casual variety.
Participant-observation | Participant-observation is a special mode of observation in which the investigator is not merely a passive observer. Instead, the investigator may take a variety of roles within a case study situation and may actually participate in the events being studied.
Physical Artefacts | A final source of evidence is a physical or cultural artefact: a technological device, a tool or instrument, a work of art, or some other physical evidence. Such artefacts may be collected or observed as part of a field visit and have been used extensively in anthropological research.

Source: Adapted from Yin, 1994, p. 85
This research employs two methods of collecting data:
3.2.1 Observation
An observation can give useful insight into problems, work conditions, bottlenecks and work methods (Avison & Fitzgerald 2006).
Observation is the first method used to gather information regarding the development of a new electronic file storage system. Observation helps to identify the potential users of the system. For the purpose of this research, the FCSIT office staff were visited to observe the current system and to learn how they handle the students’ forms. An observation was also carried out with my supervisor over one semester to observe how the course portfolio is collected and prepared.
In the interview process the staff might not reveal all the needed information; observation therefore provides additional insight. The observation covered how the files are managed, including their storage, access and searching.
3.2.2 Interviews
The second method used to collect the data for this research is interviews. This method was chosen because interviews are a significant source of information for a case study (Yin 1994). The data obtained in this way is primary data, as it is collected by the researcher for a specific purpose. There are three different types of interviews, namely open-ended, focused and structured.
An open-ended interview is used when the respondent is allowed to answer the questions in his or her own words. A focused interview is more constrained: although it follows a set of questions, it is conducted in an informal, conversational manner. The third type, the structured interview, is based on a survey in which the researcher predetermines the questions and allows no flexibility (Yin 1994).
An interview can be conducted over the telephone or in person. Most qualitative interviews are conducted on a one-to-one, face-to-face basis. Two great advantages of interviewing someone in person are that the interview can include more complex questions and that it can be conducted over a longer period of time.
In this research, one-to-one question-and-answer sessions were held. The interviews were recorded and reviewed later together with the researcher’s additional remarks. Two interviews were held with FCSIT staff: my supervisor and the office staff who work with the student documents. The interviews with the office staff took place on 1/8/08 and were used to obtain the initial functional requirements of the application. The data gathered was then used to identify the data entities and, from them, the design of the application.
The findings of the interviews indicate the need for an electronic file storage system to improve work efficiency. Every student file must be kept in the office for seven years; together these files take up a large amount of office space, and staff find it difficult to search for a particular file when they need it.
The interviews asked relevant questions about the current system used to manage the students’ files, in order to identify whether there is a need for a new system that makes administering and managing the students’ files easier. The questions and the results of the interviews are presented in Chapter 5.
The questions asked during the interviews with the staff were:
• Are the staff satisfied with the current storage file system?
This question was asked to find out whether the users are satisfied with the system they currently use.
• What are the workflow steps in the current storage file system?
This question was asked to learn the workflow of the current system and how long it takes to complete.
• What are the advantages of the current storage file system?
This question was asked to identify the advantages of the current system so that they can be retained when the new EFS is developed.
• What are the disadvantages of the current storage file system?
This question was asked to identify the disadvantages of the current system so that they can be addressed, or a new way of handling them found, in the workflow of the new system.
• Of the steps mentioned before, which take a long time to complete in the current storage file system?
This question was asked to identify the steps that take a long time, find where the problem lies, and fix it.
• What do you think is the reason for the delay in the process of the current storage file system?
This question was asked because the users who work with the current system know where and why it is delayed.
• What changes must take place in the current storage file system?
This question was asked because there may be some processes that the users dislike or consider unnecessary.
• What are the capabilities of the staff in dealing with computers?
This question was asked to determine the users’ computer skills and hence how the new system must be developed.
• Will the staff accept the change from the current storage file system to a computerized system?
This question was asked because some users may see no need to change the current system or may not want to learn another new system.
• Do the staff need any training for the new computerized system?
This question was asked to find out whether the users need training if the new system is used.
3.3 System Development Methodology Approach
A methodology is a collection of techniques for building models, and it is applied across the software development life cycle. There are several categories of software development methodology: object-oriented methodologies, in which systems are modelled as collections of cooperating objects; structured methodologies, which are based on functional (algorithmic) decomposition; and data-driven methodologies, in which the structure of the system is derived by mapping system inputs to outputs. A good software design methodology provides at least three models: a structural model, a functional model and a control model.
3.3.1 Waterfall Model: Introduction
The waterfall model is the classic software life cycle model. It was the only widely accepted life cycle model until the early 1980s.
The waterfall model is the earliest method of structured system development. Although it has been criticised in recent years as too rigid and unrealistic when it comes to quickly meeting customers’ needs, the waterfall model is still widely used. It is credited with providing the theoretical basis for other process models, because it most closely resembles a generic model for software development (Royce 1987).
Figure 3.1 General Overview of Waterfall Model (phases in sequence: Requirement gathering and analysis, System Design, Coding, Testing, Installation, Maintenance)
The waterfall model was chosen for this project because it gives full attention to every aspect of software development. This is needed because the project directly involves integrating three technology elements: electronic document imaging, electronic workflow and an electronic centralized filing system.
3.3.2 Task Regions of Waterfall Model
According to Gilb (1985), the waterfall model consists of the following steps:
3.3.2.1 Requirement Gathering and Analysis
All possible requirements of the system to be developed are captured in this phase. Requirements are the set of functionalities and constraints that the end-user (who will be using the system) expects from the system. The requirements are gathered from the end-user through consultation. They are then analyzed for their validity, and the possibility of incorporating them into the system to be developed is also studied. Finally, a requirement specification document is created, which serves as the guideline for the next phase of the model.
3.3.2.2 System Design
Before actual coding starts, it is important to understand what system is to be created and what it should look like. The requirement specifications from the first phase
are studied in this phase and system design is prepared. System design helps in specifying
hardware and system requirements and also helps in defining overall system architecture.
The system design specifications serve as input for the next phase of the model.
3.3.2.3 Coding
Also known as programming, this step involves the creation of the system software. Requirement and system specifications from the system design step are translated into machine-readable computer code.
3.3.2.4 Testing
As the software is created and added to the developing system, testing is performed to
ensure that it is working correctly and efficiently. Testing is generally focused on two
areas: internal efficiency and external effectiveness. The goal of external effectiveness
testing is to verify that the software is functioning according to system design, and that it is
performing all necessary functions or sub-functions. The goal of internal efficiency testing is to make sure that the computer code is efficient, standardized, and well documented. Testing can be a labour-intensive process because of its iterative nature.
3.3.2.5 Installation
Once the system has been tested satisfactorily, it is delivered to the customer and installed for use. The introduction of the system has to be managed carefully to avoid unnecessary disruption and to minimize the risks of change.
3.3.2.6 Maintenance
This phase of the waterfall model is virtually never-ending. Problems with the developed system that were not found during the development life cycle generally come up after its practical use starts, so these issues are solved after the system has been deployed. Not all problems appear immediately; they arise from time to time and need to be solved, hence the process is referred to as maintenance.
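The strictly sequential hand-over described in Section 3.3.2 can be sketched in a few lines. In this model, each phase consumes the deliverable of the previous phase, for example the requirement specification document feeding system design. This is a minimal illustration, not part of EFS. It assumes that each phase can be modelled as a function from one deliverable (here just a string) to the next. The names PHASES, Phase and run_waterfall are hypothetical.

```python
from typing import Callable, List, Tuple

# The six phases, in the order described in Section 3.3.2.
PHASES: List[str] = [
    "Requirement gathering and analysis",
    "System design",
    "Coding",
    "Testing",
    "Installation",
    "Maintenance",
]

# Hypothetical model: a phase maps the previous deliverable to the next one.
Phase = Callable[[str], str]


def run_waterfall(handlers: List[Phase], initial_input: str) -> List[Tuple[str, str]]:
    """Run every phase exactly once, strictly in order, returning (phase, deliverable) pairs."""
    if len(handlers) != len(PHASES):
        raise ValueError("exactly one handler is required per phase")
    log: List[Tuple[str, str]] = []
    deliverable = initial_input
    for name, handler in zip(PHASES, handlers):
        deliverable = handler(deliverable)
        if not deliverable:
            # A phase that produces no deliverable blocks the next phase from starting.
            raise RuntimeError(f"phase '{name}' produced no deliverable")
        log.append((name, deliverable))
    return log


if __name__ == "__main__":
    # Each handler labels the deliverable it hands on to the next phase.
    outputs = ["requirement specification", "design specification", "source code",
               "tested system", "installed system", "maintained system"]
    handlers = [lambda prev, out=out: f"{out} (from {prev})" for out in outputs]
    for phase, deliverable in run_waterfall(handlers, "user needs"):
        print(f"{phase}: {deliverable}")
```

The sketch has no way to return to an earlier phase. This mirrors the rigidity for which the waterfall model is criticised in Section 3.3.1.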
3.4 Justification of Waterfall Methodology Selection
The selected methodology brings many benefits towards the final delivery of the proposed system. It brings a systematic development technique to the project. This approach is expected to produce a more scalable system because it models the real world through abstraction.
The selection of the Waterfall Model (WM) encourages planning before designing and enforces some important rules in the process of developing the proposed system. It breaks the system into sub-components, with milestones corresponding to the completion of intermediate products. Since the WM is a disciplined approach, it requires each stage of the software development to be documented. In addition, the correctness of the product is checked at each stage of building it. This ensures that only a correct product that fulfils the users’ requirements is built during the whole development process.
Another main reason for choosing the WM is that it is a stable and reliable model. Because it has been widely used in industry for a long time, its reliability has been tested and proven. Developers are also familiar with this model, as it is classic and popular. Given the limited time for the software development process, using a stable and familiar model helps reduce misunderstandings and problems in the system.
3.5 Summary
This chapter has discussed the research methodology, research techniques, and
research tools used in this dissertation. The research methodology provides the main guidelines for developing EFS. The research techniques are used to collect and capture requirements from end users, who were interviewed and observed during work time. The output of the data collection is presented and discussed in Chapter 5.
Chapter 4: Case Study
A university is an institution of higher education and research that grants academic degrees in a variety of subjects. A university provides both undergraduate and postgraduate degrees. On enrolment, each student is required to fill in a form with his or her biodata. Every year a university graduates hundreds of students from different disciplines, who go out into the real world to apply and develop what they have learned.
4.1 University of Malaya
University of Malaya is the first and oldest public university in Malaysia. University of Malaya (UM) traditionally provides education, research and service to society. The university had its roots in Singapore with the establishment of King Edward VII College of Medicine in 1905 and Raffles College in 1929 to meet the need for medical and tertiary education. On October 8, 1949, University of Malaya was formed through the amalgamation of both colleges. The amalgamation paved the way for University of Malaya to emerge as an educational institution that would cater for the tertiary education needs of the Federation of Malaya and Singapore (University of Malaya Student Handbook, 2007).
The growth of the university was very rapid during the first decade of its establishment, and this resulted in the setting up of two autonomous divisions in 1959, one located in Singapore and the other in Kuala Lumpur. In 1960, the governments of the two territories indicated their desire to change the status of the divisions into that of a national university.
Legislation was passed in 1961 and University of Malaya was established on January 1,
1962.
To date, University of Malaya has an estimated population of 25,000 registered
students, pursuing various levels of courses. The university has 12 faculties, 2 academies, 3
centres and 2 institutes.
4.2 Quality Management & Enhancement Centre (QMEC)
June 2002 was a turning point in UM history, with the formalization of the UM Quality Management System (QMS). Based on the framework and requirements of MS ISO 9001:2000, the UM QMS encompasses all core processes, including teaching and learning, research and consultation, and supporting services. On 24 December 2002, as listed in the Malaysia Book of Records, UM became the first public higher education institution (PHEI) to be certified with MS ISO 9001:2000 on a comprehensive basis.
The Quality Management & Enhancement Centre (QMEC) was formed on 27th July 2002 with the aim of managing and coordinating activities associated with the UM QMS. QMEC has been actively engaged in coordinating, strengthening and continually improving the UM QMS. These activities include conducting training sessions, courses and workshops in an effort to instil awareness among UM staff and to stress the importance of ensuring quality in all aspects of the organization. QMEC’s scope has since expanded to include other quality management frameworks, namely the Ministry of Higher Education quality management criteria, Research University, University Ranking, and ASEAN University Network quality management. QMEC has five main sections, as follows:
4.2.1 Documentation Section
The Documentation Section consists of a Document Manager, an e-Document Manager
and other members who are responsible for the management of controlled quality
documents which are currently available online through the QMEC website.
4.2.2 Internal Quality Audit Section
The Internal Quality Audit Section is headed by a Chief Auditor who is assisted by a
Deputy Chief Auditor and Assistant Auditors. This section coordinates the University's
internal quality audit exercises, which aim to check the University’s compliance with the UM QMS.
4.2.3 Training & Awareness Section
The Training & Awareness Section comprises a Manager and other QMEC members.
The section's main function is to coordinate training in all aspects that pertain to quality in
UM. Activities on awareness and appreciation of UM QMS are regularly and continually
conducted for all levels of UM staff.
4.2.4 Quality Assurance Section
The Quality Assurance Section is headed by a Manager. Its main responsibility is to coordinate activities with regard to the quality management of PHEI in UM. It is responsible for monitoring internal quality management activities, disseminating good practices and conducting awareness and training programmes in quality assurance.
4.2.5 Customers Feedback & Continuous Improvement Section
This section, consisting of a manager and other members, manages matters pertaining to
feedback/complaints from customers. The Customer's Satisfaction Survey as well as the
Continual Improvement Projects are also carried out and assessed by this section.
4.3 Faculty of Computer Science and Information Technology (FCSIT)
Historically, computer facilities and services at the University of Malaya were provided from mid-1967 by the Computer Centre, which was formed in 1965. In December 1969 the centre also took on the additional role of teaching and research in the field of computer science and information technology (FCSIT, Annual Report, 2002).
A postgraduate Diploma in Computer Science was then introduced in 1974. During the 1990/91 academic session the Centre began offering the Bachelor of Computer Science (CS) programme with a maiden intake of 50 students. After various proposals, the University’s Council agreed in September 1994 to the formation of the Faculty of Computer Science and Information Technology (FCSIT) and a separate Computer Services Division. The Bachelor of Information Technology (IT) commenced during the 1996/97 academic session. At present the faculty has four departments: Artificial Intelligence, Software Engineering, Information Science, and Computer Systems and Technology. Apart from the two Bachelor programmes, its graduate studies currently offer Masters and Doctor of Philosophy programmes in Computer Science, Information Technology, Software Engineering, and Library and Information Science.
4.4 Office of FCSIT
This section describes the office of FCSIT, which is responsible for everything related to the faculty, from lecturers and classrooms to students. The office in this faculty is divided into two: the postgraduate office, which manages doctoral and master’s students, and the undergraduate office, which covers all the departments of the faculty. These offices manage the students’ files. Each student has his or her own file holding all of his or her documents, such as the registration forms filled in at registration, certificates and payment vouchers. Unfortunately, the student files are paper files and are handled manually. According to the university’s regulations, all student files must be kept in the offices for seven years. This amounts to thousands of paper files, which take up a large amount of office space for storage.
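The seven-year retention rule above reduces to simple date arithmetic once a student file is represented as a record. The sketch below is illustrative only. It assumes a hypothetical StudentFile record whose field names are not taken from EFS. It also assumes that the seven years are counted from a single reference date, here called retention_start. The regulation quoted above does not state which event starts the period.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List

RETENTION_YEARS = 7  # retention period stated in the university's regulations


@dataclass
class StudentFile:
    """Hypothetical record for one student's file (field names are illustrative only)."""
    student_id: str
    department: str
    retention_start: date  # assumption: the date from which the seven years are counted
    documents: List[str] = field(default_factory=list)  # e.g. registration forms, certificates, payment vouchers

    def retain_until(self) -> date:
        """Date on which the seven-year retention period ends."""
        target_year = self.retention_start.year + RETENTION_YEARS
        try:
            return self.retention_start.replace(year=target_year)
        except ValueError:
            # 29 February has no counterpart in a non-leap year; fall back to 28 February.
            return self.retention_start.replace(year=target_year, day=28)

    def must_be_kept(self, today: date) -> bool:
        """True while the file is still inside its retention period."""
        return today < self.retain_until()


if __name__ == "__main__":
    f = StudentFile("S0001", "Software Engineering", date(2008, 8, 1),
                    ["registration form", "certificate", "payment voucher"])
    print(f.retain_until(), f.must_be_kept(date(2010, 3, 1)))  # prints: 2015-08-01 True
```

In an electronic system, a check like must_be_kept could be used to list files that are past their retention period, rather than inspecting cabinets by hand.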
4.5 Staff of FCSIT
Each office has its own staff who are responsible for the student files. The staff of the undergraduate office were interviewed. The interviews identified the problems they face in managing the student paper files.
Special cabinets have been installed in both offices. The staff keep all the students’ files in these cabinets, where the files accumulate over time; this creates storage problems and makes searching for a particular file tedious. Keeping the students’ paper files in this manner may also increase risks such as fragmentation, fire and other safety hazards.
4.6 Students of FCSIT
At University of Malaya, students can be categorized in two ways: (a) undergraduate and postgraduate students, and (b) Malaysian and foreign students, across the various disciplines within the university.
Undergraduate students register for the first time at the International Student Centre (ISC), and postgraduate students register for the first time at the Institute of Postgraduate Studies (IPS). Each student is required to fill in the student registration forms, enclose transcripts, certificates and payment vouchers, and submit them to the office.
The ISC and IPS send copies of each student’s file, holding every document, to the student’s faculty, since faculties must keep every student’s file in their offices. Every new semester, undergraduate students have to complete the registration process at the ISC and the faculty. Postgraduate students follow the same procedure, completing the registration process at the IPS and their faculties (see Figure 4.1).
Figure 4.1 Current Students Registration (flow: all students register at ISC or IPS; ISC and IPS send all student files to their faculties; the faculty office administers and manages the student files, with branches for new and existing students covering organizing files by names, years and departments, and updating student files)
4.7 Research Unit of Analysis
This study is conducted at the FCSIT office that manages the undergraduate students’ files. These files hold all the students’ documents from their first registration until they graduate.
4.8 The Current Document Managing System in FCSIT
In the undergraduate office of FCSIT there are three staff members who work with the students’ files. These staff members administer and manage hundreds of new and old students’ files that must be kept in the office, and these files are steadily taking up the office space. The staff make a single file for each student that holds all of his or her documents from the first registration day until the student completes his or her studies and graduates. These files are organized by discipline, name and year before being stored in the allocated space in the specially built cabinets.
4.9 Current System Drawbacks
• The staff find it difficult to organize files.
• The staff find it difficult to search for files.
• The paper files can be torn.
• Updates to a file make the form look very messy.
• If the head of a department needs to see a student’s file, he or she has to ask one of the staff to search for and bring the file, which takes time and effort.
• If the staff need to send a student file to another office, they have to make a copy of the file and one of them has to deliver it.
These problems arise from the way files are currently stored in the office. The new system aims to resolve them and to make the procedures simple for everyone involved.
4.10 Summary
This chapter discusses the case study in which the data collection took place. The unit of analysis, FCSIT, has been discussed in detail. The work process involved in managing students’ files, and the staff responsible for it, have also been explored.
Chapter 5: Data Analysis and Findings
This chapter, a continuation of the previous chapters of the study, focuses on the analysis of the data collected. The findings from the data analysis are then used as system requirements to develop the proposed electronic file storage system (EFS), which administers and manages students’ files more efficiently.
5.1 The Answers to the Interview Questions
The answers given by the staff, which were used in the analysis of the current system, are presented below.
• Are the staff satisfied with the current storage file system?
The answer to this question was yes: the staff are satisfied with the system because they have no other system or option to work with. The current storage system has been in place since their first day at FCSIT, and they noted that the senior staff have been working with this system all along.
• What are the work processes involved in the current file storage system?
In the case of a foreign student, when the student registers for the first time at IPS, a copy of all the student’s registration forms is sent by IPS to the faculty where the student has enrolled. The staff of the respective office receive the student registration forms and open a folder for each student before filing them away. Each student has his or her own paper-based folder. Most of the time the staff file the student folders by year or name and store them in the cabinets.
• What are the advantages of the current storage file system?
The only advantage the staff identified is that the files exist on paper, and users find it easier to read from paper than from a computer screen.
• What are the disadvantages of the current storage file system?
The current system has many disadvantages. At the beginning of every new semester a large number of student files is received, and these files must be managed, organized and stored in the cabinets. The large number of student files requires many cabinets, which take up a large area of the office floor. Searching for a particular student’s file can also be time consuming and tedious, and each file must be returned to the same location afterwards. If the staff need to send a student’s file to another office, they have to scan the file, save it on the computer as an electronic document and send it by email, or simply make a copy of the file and post it.
• Of the work process steps mentioned before, which took the longest time in the current storage file system?
The staff answered that all the steps take a long time. The most time-consuming step is organizing the student forms and keeping them in the cabinets: because of the large number of files, it may take a few days to organize them. Other work processes, such as searching for files, returning them to the same place and sending them to other offices, take no more than half an hour.
• What do you think is the reason for the delay in the process of the current storage file system?
The staff answered that they work with too many students’ files and folders, and that every process in the current system must be done by hand.
• What changes must take place in the current storage file system?
The staff answered that they hope for an electronic system that can save all the student files on a computer (i.e. in a database), so that it will be easy for them to administer and manage the students’ files.
• What are the capabilities of the staff in dealing with computers?
The staff answered that everyone in the office knows how to use a computer, because there are many computers in the office that are used for other work.
• Will the staff accept the change from the current storage file system to a computerized system?
The staff answered that if the new computerized system is easy to use and works well, they will of course accept the change, because it will reduce the time and effort they spend.
• Do the staff need any training for the new computerized system?
The staff answered yes: anyone working with the new system for the first time needs to be trained, so that they can use it correctly and without mistakes, because student data is very sensitive.
5.2 Observation
An observation can give useful insight into problems, work conditions, bottlenecks and work methods.
Observation is the first method used to gather information regarding the development of a new electronic file storage system. A visit was made to the FCSIT undergraduate office to observe the existing system and how the staff work with the student files. From the observation it is noted that:
• The work process of the current system is still done manually.
• The main stakeholders working with the students’ files are the administrator and staff of the faculty’s office.
Given these findings, the staff handling the students’ files were the main source for the interviews, because they work directly on processing the students’ files.
5.3 Challenges of Current Systems
Based on the data collected from the observation and interviews, the current system suffers from the difficulties summarized below:
1. Files storage problem: many cabinets have to be built in the faculty’s office to store all the paper-based student files.
2. Files organizing problem: the staff do not have a systematic way to organize the students’ files. Files are normally organized by name, department or year.
3. Files retrieving problem: the staff take too long to search for a particular student’s file, because the search is done manually and there are thousands of students’ files in the cabinets.
4. Files sending problem: if the staff need to send a student’s file to another location, they have to find the file and either send it by post or scan it as an image and send it by email.
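The organizing and retrieving problems above come from the fact that a file can only be found by walking through the cabinets. As a hedged illustration of how an electronic index could address them, the sketch below keeps one lookup table for each attribute the staff already sort by: name, department and year. It answers a query by intersecting those tables. FileIndex and its methods are hypothetical names and are not part of EFS. An in-memory structure is used here only for brevity; a deployed system would more likely keep such an index in its database.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Set, Tuple


class FileIndex:
    """Hypothetical in-memory index of student files by name, department and year."""

    def __init__(self) -> None:
        self._records: Dict[str, Tuple[str, str, int]] = {}
        self._by_name: Dict[str, Set[str]] = defaultdict(set)
        self._by_department: Dict[str, Set[str]] = defaultdict(set)
        self._by_year: Dict[int, Set[str]] = defaultdict(set)

    def add(self, file_id: str, name: str, department: str, year: int) -> None:
        """Index a file; re-adding an existing id replaces its old entries."""
        if file_id in self._records:
            self.remove(file_id)
        self._records[file_id] = (name, department, year)
        self._by_name[name.lower()].add(file_id)  # name lookups are case-insensitive
        self._by_department[department].add(file_id)
        self._by_year[year].add(file_id)

    def remove(self, file_id: str) -> None:
        name, department, year = self._records.pop(file_id)
        self._by_name[name.lower()].discard(file_id)
        self._by_department[department].discard(file_id)
        self._by_year[year].discard(file_id)

    def search(self, name: Optional[str] = None, department: Optional[str] = None,
               year: Optional[int] = None) -> List[str]:
        """Return sorted ids of files matching every criterion given; None means 'any'."""
        candidates = [set(self._records)]
        if name is not None:
            candidates.append(self._by_name.get(name.lower(), set()))
        if department is not None:
            candidates.append(self._by_department.get(department, set()))
        if year is not None:
            candidates.append(self._by_year.get(year, set()))
        return sorted(set.intersection(*candidates))
```

For example, `index.search(department="Software Engineering", year=2009)` would return only the matching file ids, without the staff having to scan every cabinet.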
5.4 General System Requirements
The requirements for the system were based on the findings of the literature review and of the interview sessions. After careful analysis of the data collected, the following application requirements were derived:
1. The application must be web-based so that users can access the system from anywhere.
2. The application must be developed on an open source platform to allow easier future upgrades and enhancements.
3. The interface of the system must be simple so that users can use it easily.
4. The users must be trained to use the new system.
There should be two main groups of users: administrator and staff. Each user shall have a different access level or privilege in the application. The following describes each user role:
- The administrator will have access to the entire system.
- The staff can access the whole system and are given the privilege to manage student files.
5.5 Summary
In this chapter, the data collected were analysed: the responses to the interviews were examined carefully and in depth. The findings of the analysis are used as input to determine the functions of the new student filing system. Chapter Six discusses how the system functions are determined and how the system is designed, developed, implemented and tested.
Chapter 6: System Design, Development and Implementation
This chapter describes the system design, development and implementation. The system in general is divided into two modules. The first module is the user interface, which offers a web-based interface. The second module is the document scanning module, a Java-based application used for scanning documents. The two modules are integrated into the proposed system to offer an efficient electronic file storage system.
6.1 System Requirements
The requirements for the system are based on the findings from literature reviews, as well
as from interview sessions with users. After careful analysis of the findings, the following
application requirements were derived.
6.1.1 General Requirements
The general requirements refer to the generic features of the application. These features
shall span across the application regardless of functionality and modularity. The derived
general requirements were as follows:
• The application must be a web-based thin client system. This is to allow users to have greater access to the application.
• The application must operate on an open-source platform to allow easier future upgrades and enhancements.
• Application navigation should be easy to use and self-explanatory.
6.1.2 User Management Requirements
There should be two main groups of users: the administrator and the administrative staff. Each user shall have a different access level or privilege in the application. The following describes each user’s role:
• The administrator has access to manage the system users and assign roles.
• The administrative staff have access to manage students’ forms in the system.
6.1.3 Functional Requirements
Functional requirements are important as they determine what the system should be able to do, and the functions it should perform to produce the output or outputs desired by the system users. The system has two main components, as explained in detail below:
1. The Administrator Subsystem includes modules to:
a. log in to the system using a user name and password
b. add administrative staff
c. add student forms by scanning the form
d. delete student forms from the database system
e. update student forms in the database system
f. print student forms
g. log out of the system
2. The Administrative Staff Subsystem includes five modules to:
a. log in to the system using a user name and password
b. add student forms by scanning the form
c. update student forms in the database system
d. print student forms
e. log out of the system
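Since the two subsystems differ only in which operations each group may perform, the lists above amount to a small access matrix. The following minimal Java sketch expresses that matrix with enums; the names `Operation`, `Role` and `AccessPolicyDemo` are hypothetical and do not come from the EFS source code, which may well keep roles in the database instead.

```java
import java.util.EnumSet;
import java.util.Set;

class AccessPolicyDemo {
    public static void main(String[] args) {
        for (Role role : Role.values()) {
            System.out.println(role + " may delete forms: " + role.may(Operation.DELETE_FORM));
        }
    }
}

/** Operations listed in the functional requirements (hypothetical names). */
enum Operation {
    LOGIN, ADD_STAFF, SCAN_FORM, DELETE_FORM, UPDATE_FORM, PRINT_FORM, LOGOUT
}

/** The two user groups and the operations each one is allowed to perform. */
enum Role {
    // The administrator subsystem covers every operation (items a-g).
    ADMINISTRATOR(EnumSet.allOf(Operation.class)),
    // The staff subsystem has five operations: no staff management and no deletion.
    STAFF(EnumSet.of(Operation.LOGIN, Operation.SCAN_FORM,
            Operation.UPDATE_FORM, Operation.PRINT_FORM, Operation.LOGOUT));

    private final Set<Operation> allowed;

    Role(Set<Operation> allowed) {
        this.allowed = allowed;
    }

    /** True if a user with this role may perform the given operation. */
    boolean may(Operation op) {
        return allowed.contains(op);
    }
}
```

Running `main` prints `true` for ADMINISTRATOR and `false` for STAFF, reflecting that deleting forms is reserved for the administrator.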
6.1.4 Non-Functional Requirements
Non-functional requirements are factors used to judge how the system operates. Unlike functional requirements, which describe the specific functions that the system has to deliver, non-functional requirements describe the quality of the system.
• Accessibility - The system should be accessible to any authorised user from anywhere without requiring excessive effort. This also includes compatibility across platforms. The system is designed as a web-based system that can be accessed through a web browser with an Internet connection. To log in to the system, the user must supply a valid username with the corresponding password. Once the user’s access rights have been authenticated, the user is signed in.
• Availability - The system should be readily available at any time of the day.
Available means.
• Maintainability - The system should be easy to maintain and should not demand too much effort to enhance or extend.
• Security - All passwords are encrypted, and usernames are unique to ensure that each system user is distinct from the others. This also ensures that only authorised users can use the functionalities of the system, based on the level of privilege and access rights granted. Besides this, only the system administrator is allowed to make changes to the internal features and structure of the system. It is crucial that the system is secure from malicious attacks.
• Usability - The system should require little effort to learn and use (refer to the interview findings). Thus, it is important that the layout of the system components and the workflow of the system are consistent, to accelerate familiarisation. Besides that, the auto calculation of evaluation scores will also enable higher efficiency, as the time required to accomplish the task is greatly reduced.
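The security requirement above states that passwords are stored encrypted. One common way to meet it, offered here as an assumption rather than as the EFS implementation, is to store a salted one-way hash instead of reversibly encrypted text, so that a leaked user table does not reveal the passwords. The sketch uses PBKDF2 from the standard `javax.crypto` package; the class name `PasswordHasher` and the iteration count, salt length and key length are illustrative choices.

```java
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.security.SecureRandom;
import java.util.Base64;
import javax.crypto.SecretKeyFactory;
import javax.crypto.spec.PBEKeySpec;

final class PasswordHasher {
    private static final int ITERATIONS = 100_000; // assumed work factor
    private static final int KEY_BITS = 256;       // assumed derived-key length
    private static final int SALT_BYTES = 16;      // assumed salt length
    private static final SecureRandom RANDOM = new SecureRandom();

    /** Returns "salt:hash", both Base64-encoded, suitable for a password column. */
    static String hash(char[] password) throws GeneralSecurityException {
        byte[] salt = new byte[SALT_BYTES];
        RANDOM.nextBytes(salt);
        Base64.Encoder enc = Base64.getEncoder();
        return enc.encodeToString(salt) + ":" + enc.encodeToString(derive(password, salt));
    }

    /** Re-derives the hash with the stored salt and compares in constant time. */
    static boolean verify(char[] password, String stored) throws GeneralSecurityException {
        String[] parts = stored.split(":");
        Base64.Decoder dec = Base64.getDecoder();
        byte[] salt = dec.decode(parts[0]);
        byte[] expected = dec.decode(parts[1]);
        return MessageDigest.isEqual(expected, derive(password, salt));
    }

    private static byte[] derive(char[] password, byte[] salt) throws GeneralSecurityException {
        PBEKeySpec spec = new PBEKeySpec(password, salt, ITERATIONS, KEY_BITS);
        try {
            SecretKeyFactory factory = SecretKeyFactory.getInstance("PBKDF2WithHmacSHA256");
            return factory.generateSecret(spec).getEncoded();
        } finally {
            spec.clearPassword(); // wipe the spec's internal copy of the password
        }
    }

    public static void main(String[] args) throws GeneralSecurityException {
        String stored = hash("s3cret".toCharArray());
        System.out.println("stored value: " + stored);
        System.out.println("correct password: " + verify("s3cret".toCharArray(), stored));
        System.out.println("wrong password: " + verify("guess".toCharArray(), stored));
    }
}
```

Because each call to `hash` draws a fresh random salt, two users with the same password still get different stored values.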
6.2 Systems Development Consideration
6.2.1 System Environment
The final system runs on a typical web environment setup, which consists of the following
components: relational database system, web server, web application, and the user interface
or browser. The system was developed and tested on a single machine or server with the
following software installations as listed in the table below:
Table 6.1: EFS System Environment
Item                Software Product
Operating System    Microsoft Windows XP Professional
Web Server          Apache Server 2.2
Web Application     PHP 4.3.10
Database            MySQL 5.0 Database Server – Community Edition
The server has the following hardware specifications as listed in the following table, which
also represents the recommended minimum hardware requirements.
Table 6.2: EFS Hardware Requirements
Item                Hardware Specifications
Processor           Intel Pentium 4 (1.8 GHz)
Memory              1GB DDR2 RAM at 533 MHz
Hard Disk           40GB SATA
Network Interface   10/100 Ethernet
6.2.2 Programming Language and Development Tools
This section examines the chosen scripting language and database, and looks at the development tools that were used in the construction of the application.
6.2.2.1 PHP Programming Language
PHP is a recursive acronym for "PHP: Hypertext Preprocessor". It is an open-source server-side scripting language that was first introduced in 1994. Since then, it has become the most popular open-source web-based programming language, used by over 6 million domains with a monthly growth rate of 15% (according to Netcraft, http://www.netcraft.com/survey/).
Amongst the benefits of using PHP are:
1. The scripting language is very easy to learn and there is an abundance of PHP resources
available on the Internet. This makes it easier to maintain and upgrade the PHP applications
compared to other scripting languages such as Perl or ASP.
2. PHP works on almost any operating system. This cross-platform compatibility makes it easier to deploy and install completed applications on existing Internet servers, such as Apache, Microsoft and Netscape server solutions. Thus, it is highly suitable for today’s heterogeneous network environments.
3. PHP also has built-in support for a wide variety of commercial as well as non-commercial databases, such as MySQL, Informix, mSQL, Microsoft SQL Server, PostgreSQL, Oracle and Sybase, and also for ODBC database connections.
4. PHP supports protocols such as POP3, LDAP, SNMP, HTTP, COM and IMAP, and also offers integration with various external libraries. This allows PHP developers to do almost anything, from generating PDF documents and creating graphic images to parsing XML documents. It can also work with other server-side technologies, such as Java and COM.
5. Being an open-source scripting language with wide distribution and a large community of users, PHP is very well supported. PHP bugs are found and fixed regularly, and the language enjoys continuous improvements to its capabilities thanks to its huge pool of open-source developers. Most importantly, all these benefits are available to its users without any hidden cost.
6.2.2.2 MySQL Database System
MySQL is a powerful, secure and scalable multi-threaded, multi-user relational database management system originally developed by the Swedish firm MySQL AB. Although small in size compared to other commercial relational databases, MySQL is known for its speed. It is widely used in large-scale web deployments; Google, for example, has used MySQL in parts of its infrastructure.
The main reason for using MySQL as the application database is PHP’s extensive built-in support for MySQL. PHP has numerous functions that allow developers to control and manipulate a MySQL database without having to code new procedures. This expedites application development, as less coding needs to be done.
6.3 Development Tools
The following tools were used during the development of the EFS system:
6.3.1 PHP Designer 2007 Personal
This tool is available as freeware and can be downloaded from the Internet. It is developed by MPSoftware and is an Integrated Development Environment (IDE) for PHP, designed to ease and enhance the process of editing, debugging and analysing PHP scripts.
6.3.2 MySQL Query Browser
MySQL Query Browser is a tool for creating, executing, optimising and testing SQL
queries for the MySQL Database server. It is available for free at http://www.mysql.com.
6.3.3 MySQL Administrator
MySQL Administrator is a free tool that is available from the MySQL website for
administering and managing MySQL databases. It provides database administrators with an easy-to-use but powerful visual interface that gives better visibility into how the databases operate.
6.4 System Design
This section describes the user interface and the scanning modules. Figure 6.1 illustrates the design of the system and how it works.
Figure 6.1 Use Case of System Architecture
The user first scans the forms using the Java desktop application, which saves them directly in the database, where all the forms are stored. A user who wants to open a student form can do so through the web-based interface from any location, provided the user has permission to enter the system.
The following sections describe the major functionalities of the system, and the basic
operations offered by the system.
6.4.1 Interface Module
This section describes the user interface module of the system. The user interface is composed of two layers. The first is the interface layer, a web-based layer developed in PHP. The second is the database layer, the internal storage layer of the system; it presents the structure of the database used to store the students’ forms.
Figure 6.2 shows the web-based interface of the system that has been developed to manage the student forms. All users must enter the web-based system from this page, because it is the main page of the system.
Figure 6.2 Web-Based Interfaces
There are two types of users in this system: the administrator and the staff user. The administrator is the global manager of the system and has full access permissions. The administrator controls the system and can provide usernames and other privileges to users. An administrator can also update and maintain the system.
On the other hand, a staff user is a restricted user with limited permissions compared to the administrator. Staff can work with both the web-based and the Java-based application and have all privileges except deleting information. Staff are also responsible for scanning students’ documents using the Java-based application.
Each user has their own username and password to enter the system. The web-based application is integrated with the Java-based application so that the two complement each other: the web-based application reads the scanned files that the Java-based application has stored in the database, as shown in Figure 6.1.
6.4.2 Administrator Module
The following sections describe all of the administrator operations.
6.4.2.1 Administrator: Manage Users
In this system module, the administrator is the only user given the privilege to access user management. These use cases show the privileges that are given only to the administrator and not to the staff, for security reasons.
Figure 6.3 Use Case of Administrator Privilege
6.4.2.2 Administrator: Update User Account
The administrator accesses the system with the administrator username and password, which grant the privilege to access the system. After that, the administrator clicks the Manage User link. Figure 6.4 shows the Manage User page.
Figure 6.4 Administrator Management Page
6.4.2.3 Administrator: Updating Screen
When administrators access the user management page, they can open any user account and update it (Figure 6.5).
Figure 6.5 Administrator Updating Accounts
After the administrator opens a user account, as seen in Figure 6.5, he/she can perform system maintenance, such as changing the username, user password and user access level, and then click the Update Data button for the system to accept the update. Users who are not administrators cannot see the user management link in the system.
6.4.2.4 Administrator: Delete User Account
In this module, the administrator must enter his/her own username and password to have the privilege to open the Manage User link, as seen in Figure 6.4. The administrator can then press the Delete button to delete any user account from the system, as seen in Figure 6.6.
Figure 6.6 Administrator Delete Users Accounts
6.4.2.5 Administrator: Admission Records
In this module, both the administrator and the staff users can access the admission records module of the system. The admission records module provides different privileges to users and to the administrator, as seen in Figure 6.7. The administrator has all the privileges of the system, such as viewing, updating, deleting and printing records, as well as controlling who may enter the system.
Figure 6.7 Use Cases of Administrator Admission Records
Figure 6.8 shows all the privileges of the administrator in the admission records module.
Figure 6.8 Administrators: Admission Records Screen
The main menu of the administrator admission records screen has four links. The first, HOME, takes the user to the main page; the second, MANAGE USER, takes the administrator to the page for creating the user names and passwords for entering the system; the third, ADMISSION RECORD, lets the user view all the student forms that have been stored in the database of the system; and the fourth, LOGOUT, signs the user out of the system.
6.4.2.6 Delete Student Form
The administrator is the only user who has the privilege to delete student forms from the system, for security reasons. As shown in Figure 6.9, when the administrator enters the system with his/her user name and password, the Delete button is available inside the admission records, and the administrator can delete a student form from the system.
Figure 6.9 Delete student form from the database
6.4.3 User Operation
The following sections describe all of the user operations.
6.4.3.1 User Interface Screen
Figure 6.10 depicts the user interface screen.
Figure 6.10 Web-Based User Interface Screen
The welcome note on the user interface explains the idea behind the system, how to use it, and its benefits for the user.
6.4.3.2 User: Admission Records Screen
Figure 6.11 Use Case of User: Admission Records Screen
The staff have several privileges in the system, such as viewing, updating and printing student forms, and scanning them using the Java interface of the system. The privilege of deleting cannot be given to the staff, because every form must be stored for at least seven years.
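The seven-year retention rule and the restriction of deletion to the administrator can be checked together before a Delete button is offered. A minimal sketch, assuming each stored form carries the date on which it was stored; the class `RetentionPolicy` is hypothetical, and applying the retention period to the administrator as well is an assumption, since the text only states it as the reason staff cannot delete.

```java
import java.time.LocalDate;

final class RetentionPolicy {
    /** Minimum number of years a student form must be kept (from the requirement). */
    static final int RETENTION_YEARS = 7;

    /**
     * A form may be deleted only by an administrator, and (assumption) only once
     * the retention period that started on the storage date has fully elapsed.
     */
    static boolean mayDelete(boolean isAdministrator, LocalDate storedOn, LocalDate today) {
        LocalDate earliestDeletion = storedOn.plusYears(RETENTION_YEARS);
        return isAdministrator && !today.isBefore(earliestDeletion);
    }

    public static void main(String[] args) {
        LocalDate stored = LocalDate.of(2003, 3, 1);
        System.out.println(mayDelete(true, stored, LocalDate.of(2010, 3, 1)));  // true
        System.out.println(mayDelete(true, stored, LocalDate.of(2010, 2, 28))); // false
        System.out.println(mayDelete(false, stored, LocalDate.of(2015, 1, 1))); // false
    }
}
```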
Figure 6.12 shows all the privileges of users in the admission records module. The links on this screen are the same as those explained for Figure 6.8.
Figure 6.12 Staff: Admission Records
6.4.3.3 User: View Student Form Details
In this system module, both the administrator and the staff user can view the details of the student forms. When a user presses the Details button, as seen in Figure 6.12, a page opens with all the details on the form. These details are presented on the web and are similar to the source document (form), because the form has been scanned and stored in the database of the system and can be accessed through the web. If the user wants to change the design of the web-based form, the user has to go to the Java-based application and choose the option to change the form design. Figure 6.13 presents the view of the web-based student form.
Figure 6.13 Users: Student Form Screen
This page shows the user all the details of the student forms that have been scanned and stored. From this page, the user can update and print the forms, as explained in the next sections.
6.4.3.4 User: Update Student Form
During their years of study, students may need to change their data, e.g. their postal address, phone number or email address. The students need to inform the FCSIT office about the change.
The staff responsible for the student forms will then update the students’ data in the system. To update a student’s details, users press the Update button, as seen in Figure 6.13, which opens the relevant page with all the details that the form holds. The user can then make the necessary update to the student’s record, as in Figure 6.14.
Figure 6.14 Users: Update Student Form
Figures 6.13 and 6.14 differ: in Figure 6.14 the marital status is single and the nationality is Malaysia, whereas in Figure 6.13 the marital status is married and the nationality is empty. This is the update that was made to the student’s form.
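The comparison of Figures 6.13 and 6.14 amounts to finding which fields changed between the stored record and the edited one, which is also what an update screen could log. A small sketch, assuming a form is held as a map from field name to value; the helper `FormDiff` and the field names are hypothetical, with values taken from the example in the figures.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Objects;

final class FormDiff {
    /**
     * Returns field -> {oldValue, newValue} for every field of the edited form
     * whose value differs from the stored form. Fields present only in the
     * stored form are ignored.
     */
    static Map<String, String[]> changedFields(Map<String, String> before, Map<String, String> after) {
        Map<String, String[]> changes = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : after.entrySet()) {
            String oldValue = before.get(e.getKey());
            if (!Objects.equals(oldValue, e.getValue())) {
                changes.put(e.getKey(), new String[] {oldValue, e.getValue()});
            }
        }
        return changes;
    }

    public static void main(String[] args) {
        Map<String, String> stored = new LinkedHashMap<>();
        stored.put("maritalStatus", "Married");
        stored.put("nationality", "");
        stored.put("email", "student@example.com");

        Map<String, String> edited = new LinkedHashMap<>(stored);
        edited.put("maritalStatus", "Single");
        edited.put("nationality", "Malaysia");

        changedFields(stored, edited).forEach((field, values) ->
                System.out.println(field + ": '" + values[0] + "' -> '" + values[1] + "'"));
    }
}
```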
6.4.3.5 User: Printing Students Form
Both the administrator and the staff user can print student forms from the system. The user can print the main page of the admission records, which holds the names of all the student forms saved in the system, as shown in Figure 6.15.
Figure 6.15 User: Printing the Admission Records Page
The user can also open any part of a student’s details and print it, as shown in Figure 6.16.
Figure 6.16 Users: Printing the Student Form
6.4.3.6 Other Services
To increase the use of the web-based electronic file storage system (EFS), the application is linked to the University of Malaya main page, giving access to the university and faculty websites. This service adds flexibility to the system and provides an easy way to keep users up to date with the university and faculty websites.
6.4.4 Java-Based Application Interface
This section describes the Java-based application, which is the basis of the EFS system. Not every user may access the Java-based application: only the administrator, using the administrator username and password, can grant access to the web-based system that provides the interface to the Java-based application. The main Java-based application interface is shown in Figure 6.17.
Figure 6.17 Java-Based Application Interfaces
After the administrator accesses the main page of the Java-based application, all the processes of the system are made available, as seen in Figure 6.18.
Figure 6.18 Main Page of the Java-Based Application
In Figure 6.18, the user will find three buttons on the left of the screen. The first button, SCAN, opens the scanning application for scanning the forms. The second button, VIEW, lets the user view all the students’ forms in the system. The third button, USER, is available only to the administrator and shows all the users of the system. In the middle of the screen is TEMPLATE, which the user can use to change the design of the forms in the system. At the bottom of the template is SOURCE, which shows the user all the printers or devices connected to the computer. The ACQUIRE button also takes the user to the scanning part. The TRAIN button opens all the training forms that have been used to train the system with different handwriting.
6.4.4.1 Document Scanning
This section discusses the Java-based application that has been used to scan all the students’ documents. This is the most important part of the whole system. The scanned documents are then saved in the database of the system.
6.4.4.2 The Scanning Processes
Scanning with the Java-based application involves several processes. The way to scan a document using the Java-based application is described in Section 6.4.4.4.
6.4.4.3 Document Format
This section describes the design of the student registration form at the University of Malaya and the changes made to the form to fit the new EFS system. Figure 6.19 illustrates the registration form that the University of Malaya currently uses.
Figure 6.19 Student Form
The current registration form is not suitable for the new EFS system, because students’ handwriting differs and some students do not separate the letters when they write. It is very difficult for the EFS system to read all the different handwriting patterns, because there are hundreds of different styles of handwriting. The EFS system needs to be trained to recognise the different handwriting.
A new registration form has been specially designed to work with the new EFS system. Figure 6.20 depicts an example of the new registration form designed for use with EFS.
Figure 6.20 New Specially Designed Registrations Form
The new registration form is designed so that each letter is separated from the others. This helps the algorithm of the Java-based application to read each character by itself, and hence to read the students’ handwriting as identified characters.
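Because every character sits in its own box, the recogniser can cut each box out of the scanned image and treat it independently. The sketch below shows that step with `java.awt.image.BufferedImage`, assuming the boxes of one field lie side by side in a single row at a known pixel position and size; the class `FieldSegmenter` and all layout values are hypothetical, as the form’s actual coordinates are not given here.

```java
import java.awt.image.BufferedImage;
import java.util.ArrayList;
import java.util.List;

final class FieldSegmenter {
    /**
     * Cuts one form field into per-character images. (x, y) is the top-left
     * corner of the first box; boxes are boxW x boxH pixels and are assumed to
     * be placed side by side with no gap.
     */
    static List<BufferedImage> cutBoxes(BufferedImage page, int x, int y,
                                        int boxW, int boxH, int count) {
        List<BufferedImage> boxes = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            boxes.add(page.getSubimage(x + i * boxW, y, boxW, boxH));
        }
        return boxes;
    }

    /** True if the box contains at least one pixel darker than the threshold (0-255). */
    static boolean hasInk(BufferedImage box, int threshold) {
        for (int py = 0; py < box.getHeight(); py++) {
            for (int px = 0; px < box.getWidth(); px++) {
                int rgb = box.getRGB(px, py);
                int gray = (((rgb >> 16) & 0xFF) + ((rgb >> 8) & 0xFF) + (rgb & 0xFF)) / 3;
                if (gray < threshold) {
                    return true;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Synthetic white "scan" with a single pen mark inside the third box.
        BufferedImage page = new BufferedImage(100, 20, BufferedImage.TYPE_INT_RGB);
        for (int py = 0; py < page.getHeight(); py++) {
            for (int px = 0; px < page.getWidth(); px++) {
                page.setRGB(px, py, 0xFFFFFF);
            }
        }
        page.setRGB(25, 10, 0x000000);
        List<BufferedImage> boxes = cutBoxes(page, 0, 0, 10, 20, 5);
        for (int i = 0; i < boxes.size(); i++) {
            System.out.println("box " + i + " written: " + hasInk(boxes.get(i), 128));
        }
    }
}
```

`getSubimage` shares pixel data with the page rather than copying it, so cutting many boxes is cheap.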
If the administrator wants to change the design of the registration form at any time, this can be done by pressing "NEW" in the template and designing the new form, as shown in Figure 6.21. The registration form can also be edited and deleted as needed.
Figure 6.21 Designs New Registration Form Template
6.4.4.4 Scanning and Saving the Scanned Students Registration Forms into the System
This section describes how to save the scanned students’ registration forms into the system. There are two ways to save the scanned forms.
6.4.4.4.1 Scanning the Students’ Registration Forms
Users first access the main page of the Java-based application and go to the option for the scanning process. Next, the user places the paper-based student registration form on the scanner device and then selects the scanner.
The user has to choose the scanner button option, as shown in Figure 6.22.
Figure 6.22 Selecting the Scanner
It is important to remember that the user must place the source form on the scanner before selecting the scanner, because once the scanner option has been selected and "OK" has been pressed, the EFS system scans the form automatically, as in Figure 6.23.
Figure 6.23 Scanning Process
The scanner resolution must be 300 dpi so that the system can read the characters. The resolution determines the size, in pixels, of the characters that the system reads from the forms.
After the system has scanned the registration form, the user presses "ACCEPT" so that the system can convert the image to text. The user can follow the conversion process, as in Figure 6.24.
Figure 6.24 Process of Changing From Image to Text
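The 300 dpi requirement fixes how many pixels a printed box occupies in the scanned image: a length of L millimetres spans L / 25.4 × dpi pixels, since one inch is 25.4 mm. A small helper makes the arithmetic explicit; the class name `ScanGeometry` and the 5 mm box size in the example are assumptions, not dimensions taken from the EFS form.

```java
final class ScanGeometry {
    static final double MM_PER_INCH = 25.4;

    /** Number of pixels covering a length given in millimetres at the given resolution. */
    static int mmToPixels(double mm, int dpi) {
        return (int) Math.round(mm / MM_PER_INCH * dpi);
    }

    public static void main(String[] args) {
        // A hypothetical 5 mm character box scanned at 300 dpi.
        System.out.println(mmToPixels(5.0, 300)); // 59
        // The same box at 150 dpi has roughly half as many pixels per side.
        System.out.println(mmToPixels(5.0, 150)); // 30
    }
}
```

At lower resolutions each handwritten character is described by fewer pixels, which is one reason a minimum resolution is specified.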
When the image-to-text conversion is finished, the system displays the output. If the user wants to save the scanned registration form in the database, the user just presses "OK"; otherwise the user presses "NO". The upload result is shown in Figure 6.25.
Figure 6.25 Upload Result Image-to-Texts
6.4.4.4.2 Browse for the Forms
When users access the main page of the Java-based application, they press the relevant button to select the source from which to acquire the image. The user chooses the file button to browse for the registration forms that need to be processed, as shown in Figure 6.26.
Figure 6.26 Browse and Select the Required Registration Form
After the user selects the required registration form, the system converts it from image to text, and the upload result is shown as in Figure 6.25.
6.4.4.5 Artificial Neural Network
The system contains an artificial neural network that is used to read the scanned paper-based registration form and convert it from image to text. The users scan the registration forms as images, and the forms are saved into the database of the system using the artificial neural network. The next section defines and explains the artificial neural network used.
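Before a handwritten character can be passed to a neural network, it has to be turned into a fixed-length vector of numbers. One common approach in Java character-recognition examples, shown here as an assumption rather than as the exact EFS code, is to crop the character to the bounding box of its ink and downsample it to a small grid (5 × 7 below), marking a cell as 1 if any dark pixel falls inside it. The class `GlyphDownsampler`, the grid size and the darkness threshold are hypothetical.

```java
import java.awt.image.BufferedImage;

final class GlyphDownsampler {
    static final int GRID_W = 5, GRID_H = 7; // assumed grid size

    static boolean isDark(BufferedImage img, int x, int y) {
        int rgb = img.getRGB(x, y);
        int gray = (((rgb >> 16) & 0xFF) + ((rgb >> 8) & 0xFF) + (rgb & 0xFF)) / 3;
        return gray < 128; // assumed threshold
    }

    /** Returns GRID_W * GRID_H values in row-major order, each 0.0 or 1.0. */
    static double[] downsample(BufferedImage glyph) {
        // 1. Find the bounding box of the dark pixels.
        int minX = glyph.getWidth(), minY = glyph.getHeight(), maxX = -1, maxY = -1;
        for (int y = 0; y < glyph.getHeight(); y++) {
            for (int x = 0; x < glyph.getWidth(); x++) {
                if (isDark(glyph, x, y)) {
                    minX = Math.min(minX, x);
                    maxX = Math.max(maxX, x);
                    minY = Math.min(minY, y);
                    maxY = Math.max(maxY, y);
                }
            }
        }
        double[] grid = new double[GRID_W * GRID_H];
        if (maxX < 0) {
            return grid; // empty box: all zeros
        }
        // 2. Map every dark pixel inside the bounding box to its grid cell.
        int boxW = maxX - minX + 1, boxH = maxY - minY + 1;
        for (int y = minY; y <= maxY; y++) {
            for (int x = minX; x <= maxX; x++) {
                if (isDark(glyph, x, y)) {
                    int col = (x - minX) * GRID_W / boxW;
                    int row = (y - minY) * GRID_H / boxH;
                    grid[row * GRID_W + col] = 1.0;
                }
            }
        }
        return grid;
    }

    public static void main(String[] args) {
        // Hypothetical glyph: a single vertical stroke, roughly the digit "1".
        BufferedImage glyph = new BufferedImage(20, 28, BufferedImage.TYPE_INT_RGB);
        for (int y = 0; y < 28; y++) {
            for (int x = 0; x < 20; x++) {
                glyph.setRGB(x, y, x == 10 ? 0x000000 : 0xFFFFFF);
            }
        }
        double[] v = downsample(glyph);
        for (int row = 0; row < GRID_H; row++) {
            StringBuilder sb = new StringBuilder();
            for (int col = 0; col < GRID_W; col++) {
                sb.append(v[row * GRID_W + col] > 0 ? '#' : '.');
            }
            System.out.println(sb);
        }
    }
}
```

Cropping to the bounding box first makes the vector independent of where in the box the student wrote the character.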
6.4.4.5.1 Definition of Artificial Neural Network
An artificial neural network (ANN), often just called a "neural network" (NN), is a mathematical or computational model based on biological neural networks. It consists of an interconnected group of artificial neurons and processes information using a connectionist approach to computation. In most cases an ANN is an adaptive system that changes its structure based on external or internal information that flows through the network during the learning phase (Matthews 2000).
In more practical terms neural networks are non-linear statistical data modelling tools.
They can be used to model complex relationships between inputs and outputs or to find
patterns in data (Chen 1996).
Figure 6.27 Neural Network Processes
A neural network is an interconnected group of nodes, akin to the vast network of
neurons in the human brain.
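Each node in such a network computes a weighted sum of its inputs and passes it through a non-linear activation function, which is what makes the network a non-linear model: for inputs $x_i$, weights $w_i$ and bias $b$, the output is $y = \sigma(\sum_i w_i x_i + b)$ with $\sigma(z) = 1/(1 + e^{-z})$. The sketch below shows one such neuron; the class `Neuron` and its weights are arbitrary illustrative values, not parameters of the EFS network, which uses the self-organizing map described next.

```java
final class Neuron {
    private final double[] weights;
    private final double bias;

    Neuron(double[] weights, double bias) {
        this.weights = weights.clone();
        this.bias = bias;
    }

    /** Logistic activation: squashes any real number into the range (0, 1). */
    static double sigmoid(double z) {
        return 1.0 / (1.0 + Math.exp(-z));
    }

    /** Output = sigmoid(w . x + b). */
    double output(double[] inputs) {
        if (inputs.length != weights.length) {
            throw new IllegalArgumentException("expected " + weights.length + " inputs");
        }
        double z = bias;
        for (int i = 0; i < inputs.length; i++) {
            z += weights[i] * inputs[i];
        }
        return sigmoid(z);
    }

    public static void main(String[] args) {
        Neuron n = new Neuron(new double[] {0.5, -1.0}, 0.5);
        // z = 0.5 + 0.5*1 + (-1.0)*1 = 0, so the output is sigmoid(0) = 0.5
        System.out.println(n.output(new double[] {1.0, 1.0}));
    }
}
```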
The section above presented an overview of neural networks as the algorithm used to convert the scanned document to text format. In the system implementation, however, the self-organizing map (SOM) algorithm, a special type of neural network, is used. The following section describes the SOM algorithm in detail.
6.4.4.6 Definition of Self-Organizing Map
A self-organizing map is a type of artificial neural network that is trained using unsupervised learning to produce a low-dimensional (typically two-dimensional), discretized representation of the input space of the training samples, called a map (Kohonen 1998). Self-organizing maps differ from other artificial neural networks in that they use a neighbourhood function to preserve the topological properties of the input space.
This makes SOM useful for visualizing low-dimensional views of high-dimensional
data, akin to multidimensional scaling. The model was first described as an artificial neural
network by the Finnish professor Teuvo Kohonen, and is sometimes called a Kohonen map.
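The training loop behind a Kohonen map is short: for each input, find the best-matching unit (the node whose weight vector is closest), then pull that node and its grid neighbours towards the input, with a learning rate and a Gaussian neighbourhood radius that shrink over time. The following is a minimal sketch of that standard algorithm on a two-dimensional grid; the class `SelfOrganizingMap`, the decay schedule and the seeds are assumptions, and labelling each node with the nearest training sample so that characters can be read off the map is an illustrative choice, not necessarily how EFS does it.

```java
import java.util.Random;

final class SelfOrganizingMap {
    private final int rows, cols;
    private final double[][] weights; // node n sits at grid position (n / cols, n % cols)
    private final String[] nodeLabels;

    SelfOrganizingMap(int rows, int cols, int dim, long seed) {
        this.rows = rows;
        this.cols = cols;
        this.weights = new double[rows * cols][dim];
        this.nodeLabels = new String[rows * cols];
        Random rnd = new Random(seed);
        for (double[] w : weights) {
            for (int d = 0; d < dim; d++) {
                w[d] = rnd.nextDouble(); // random start in [0, 1)
            }
        }
    }

    private static double squaredDistance(double[] a, double[] b) {
        double sum = 0;
        for (int d = 0; d < a.length; d++) {
            double diff = a[d] - b[d];
            sum += diff * diff;
        }
        return sum;
    }

    /** Best-matching unit: the node whose weight vector is closest to x. */
    int bestMatchingUnit(double[] x) {
        int best = 0;
        for (int n = 1; n < weights.length; n++) {
            if (squaredDistance(x, weights[n]) < squaredDistance(x, weights[best])) {
                best = n;
            }
        }
        return best;
    }

    /** Online training; learning rate and neighbourhood radius decay geometrically. */
    void train(double[][] samples, int steps, long seed) {
        Random rnd = new Random(seed);
        double rate0 = 0.5, rateEnd = 0.01;                            // assumed schedule
        double radius0 = Math.max(rows, cols) / 2.0, radiusEnd = 0.5;  // assumed schedule
        for (int t = 0; t < steps; t++) {
            double f = (double) t / steps;
            double rate = rate0 * Math.pow(rateEnd / rate0, f);
            double radius = radius0 * Math.pow(radiusEnd / radius0, f);
            double[] x = samples[rnd.nextInt(samples.length)];
            int bmu = bestMatchingUnit(x);
            int br = bmu / cols, bc = bmu % cols;
            for (int n = 0; n < weights.length; n++) {
                int dr = n / cols - br, dc = n % cols - bc;
                // Gaussian neighbourhood on the grid: this is what preserves topology.
                double h = Math.exp(-(dr * dr + dc * dc) / (2 * radius * radius));
                for (int d = 0; d < x.length; d++) {
                    weights[n][d] += rate * h * (x[d] - weights[n][d]);
                }
            }
        }
    }

    /** Gives every node the label of the training sample nearest to its weights. */
    void label(double[][] samples, String[] labels) {
        for (int n = 0; n < weights.length; n++) {
            int nearest = 0;
            for (int s = 1; s < samples.length; s++) {
                if (squaredDistance(samples[s], weights[n]) < squaredDistance(samples[nearest], weights[n])) {
                    nearest = s;
                }
            }
            nodeLabels[n] = labels[nearest];
        }
    }

    /** Recognises an input by the label of its best-matching unit. */
    String classify(double[] x) {
        return nodeLabels[bestMatchingUnit(x)];
    }
}
```

In use, the map would be trained on the character vectors from the training forms, labelled with the characters the user typed, and then asked to `classify` each new box; with two well-separated toy clusters, probes near each cluster are expected to receive that cluster’s label.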
6.4.4.7 Form Training
The artificial neural network used in the system must be trained to identify the different types of handwriting. In this system, the network was trained with six different handwriting samples, two of which are presented as examples in Figures 6.28 and 6.29. After the training, the system can identify the different handwriting.
Figure 6.28 First Examples for Training Form
Figure 6.29 Second Examples for Training Form
These are examples of two forms with the same design but different handwriting.
6.4.4.7.1 Training Procedures in Reading the Registration Form
This section describes how to train the system with different forms to improve its reading accuracy. There are a few simple steps to train the system to read the forms. After entering the Java-based application, the user chooses "TRAINING" in the template, presses the "NEW" button to browse for the form to be trained, and then selects the correct button, as shown in Figure 6.30.
Figure 6.30 Training Procedures
After the user selects the form and presses "OK", the training form opens, and the user has to enter all the characters and numbers line by line, together with how many there are in each line. The user then highlights them in the form to let the system know which characters and numbers must be trained in the selected form, as shown in Figure 6.31.
Figure 6.31 Selecting the Character and Numbers in the Form
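Entering the expected characters line by line, together with the number of characters per line, gives the system a label for every highlighted box. A minimal sketch of that pairing step, assuming the boxes arrive in reading order and each line is typed as one string; the class `TrainingLabels` is hypothetical. The count acts as a consistency check: if it disagrees with the typed text, boxes and labels would be paired wrongly, so the sketch rejects the input.

```java
import java.util.ArrayList;
import java.util.List;

final class TrainingLabels {
    /**
     * Expands the typed training text into one label per character box.
     * lines[i] is the text the user typed for line i; counts[i] is the number
     * of characters the user declared for that line.
     */
    static List<Character> expand(String[] lines, int[] counts) {
        if (lines.length != counts.length) {
            throw new IllegalArgumentException("one count is needed per line");
        }
        List<Character> labels = new ArrayList<>();
        for (int i = 0; i < lines.length; i++) {
            if (lines[i].length() != counts[i]) {
                throw new IllegalArgumentException("line " + (i + 1) + ": typed "
                        + lines[i].length() + " characters but declared " + counts[i]);
            }
            for (char c : lines[i].toCharArray()) {
                labels.add(c);
            }
        }
        return labels;
    }

    public static void main(String[] args) {
        System.out.println(expand(new String[] {"ABC", "123"}, new int[] {3, 3})); // [A, B, C, 1, 2, 3]
    }
}
```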
6.4.4.7.2 Sending Email
One of the operations the system can perform is sending a student’s form by email; the user must have a Gmail account or a university email account to send the email with the student’s form.
The user can send a student form by email only through the Java-based application. The student’s form cannot be sent through the web-based interface, because that interface is already connected to the Internet: any user with a username and password can view the students’ forms in detail in the web-based system.
When the user accesses the Java-based application and views the details of a student’s form, the user presses the "EMAIL" button, and the student form is sent, as seen in Figure 6.32.
Figure 6.32 Sending Student Forms by Email
6.5 System Testing
Testing is the process carried out to ensure that the system conforms to the specification and meets the requirements of its users, namely the administrator and staff of the office. Testing was conducted not only at the end but also during the development of the prototype system. Functional and interface testing were carried out for individual modules and for the whole system. Every link was checked to make sure that all links work correctly. Interface testing is carried out to confirm that the interfaces work correctly and that no faults are caused by interface errors.
6.5.1 Unit Testing
Unit testing tests software in terms of a unit: a module, a function, or a specific section of code. This testing occurs while the software is being developed and before it is completed.
For Unit Testing, test cases are designed to verify that an individual unit implements
all design decisions made in the unit's design specification. A thorough unit test
specification should include positive testing where the unit does what it is supposed to do,
and also negative testing where the unit does not do anything that it is not supposed to do.
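As an illustration of positive and negative test cases for the first row of Table 6.3, the sketch below exercises a hypothetical in-memory login check: one case confirms that valid credentials are accepted, the others that a wrong password, an unknown user and a missing password are rejected. The classes `LoginServiceTest` and `LoginService` and the plain-text credential map are assumptions made to keep the example self-contained; a real implementation would compare password hashes stored in the database. Plain checks are used instead of a framework such as JUnit so that the example runs with the JDK alone.

```java
import java.util.HashMap;
import java.util.Map;

final class LoginServiceTest {
    private static int failures = 0;

    private static void check(String name, boolean condition) {
        System.out.println((condition ? "PASS " : "FAIL ") + name);
        if (!condition) {
            failures++;
        }
    }

    /** Runs all test cases and returns the number of failures. */
    static int run() {
        failures = 0;
        LoginService service = new LoginService();
        service.register("staff01", "s3cret");
        // Positive test: the unit does what it is supposed to do.
        check("valid credentials accepted", service.login("staff01", "s3cret"));
        // Negative tests: the unit does not do what it is not supposed to do.
        check("wrong password rejected", !service.login("staff01", "guess"));
        check("unknown user rejected", !service.login("nobody", "s3cret"));
        check("missing password rejected", !service.login("staff01", null));
        return failures;
    }

    public static void main(String[] args) {
        System.exit(run() == 0 ? 0 : 1);
    }
}

/** Hypothetical unit under test: checks a username/password pair. */
final class LoginService {
    private final Map<String, String> credentials = new HashMap<>();

    void register(String user, String password) {
        credentials.put(user, password);
    }

    boolean login(String user, String password) {
        String stored = credentials.get(user);
        return stored != null && stored.equals(password);
    }
}
```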
Table 6.3 shows the unit testing results for the whole Administrator functionality module.
Table 6.3 Unit Testing for the Entire Administrator User Functionality Module
Functionality                                        Status
login to the system using user name and password     PASS
add administrative staff                             PASS
add students forms by scanning the form              PASS
delete students forms from the database system       PASS
update student forms in the database system          PASS
print students forms                                 PASS
log out of the system                                PASS
Similarly, Table 6.4 shows the unit testing results for the whole Staff functionality module.
Table 6.4 Unit Testing for the Entire Users Functionality Module
Functionality                                        Status
login to the system using user name and password     PASS
add students forms by scanning the form              PASS
update student forms in the database system          PASS
print students forms                                 PASS
logout of the system                                 PASS
Tables 6.3 and 6.4 show that all the listed functions passed their unit tests, indicating that the user requirements have been met and implemented.
6.6 Summary
This chapter began by discussing the design of the system functionality based on the findings of the data collection done in the case study. Different tools were considered for the development and implementation of the EFS system. Finally, the chapter discussed and explained how the electronic file storage system and the Java-based application interact. The process of scanning and storing the image file, and the process of converting the image file into a text file, have also been discussed.
Chapter 7: Conclusion
This chapter presents the conclusion of the study. It discusses the differences between
the faculty’s current file system and the new electronic file storage system, which has
been developed to simplify and facilitate the work of staff in administering and managing
students’ files. The chapter also discusses the limitations of the new electronic file
storage system and possible future enhancements.
7.1 Project Objectives
On the whole, the objective of this study has been achieved. As discussed in Chapter 1,
Section 1.4, the main objective of this research is to develop a system that improves the
process of administering and managing students’ files. An electronic file storage system
(EFS), which is a web-based system, has been developed for this purpose. It is anticipated
that if the EFS is used at full scale in FCSIT, it can help reduce the storage space needed
for files in the office, because the forms are scanned and stored electronically in the database.
The system also helps to protect files from natural disasters, and electronic file storage
simplifies and facilitates the work of the staff who handle the files. In addition, the system
allows students’ files to be shared by email directly from the system, using either
University email or Gmail (Google email).
7.2 Training Staff and Users
The objective of developing the new electronic file system is to improve the work
process in the FCSIT offices. The staff and users who work with the new system must
be trained to use it competently and correctly. Training will be required for both parts
of the EFS: (i) the web-based interface and (ii) the Java-based application.
During the training, all the system features and functions will be explained step by step
in detail. The training should be simple and user-friendly so that users have no difficulty in
understanding the system. Furthermore, a detailed user manual has also been developed for
reference.
The training for the EFS web-based interface differs from the training for the Java-based
application in the use of the scanner. Working with the scanner needs more training because
users need to know how to operate the scanner and the procedure for scanning the input
source, i.e. the students’ forms.
7.3 System Limitation
The system has the following limitation:
• Limitation in neural network training
The system is developed using a neural network algorithm, the self-organizing map
(SOM). This algorithm cannot read everyone’s handwriting, because there are hundreds
of different handwriting styles. For this reason, the network must be trained on at least
seven different types of handwriting, but not more than seven, because of the system’s
memory limitation. The system keeps only seven trained forms; each additional form
replaces the oldest form in memory.
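The replacement policy described above, in which only seven trained forms are kept and each new form replaces the oldest, behaves like a fixed-capacity first-in, first-out queue. The Java sketch below shows that policy under the assumption that trained forms can be identified by name; TrainedFormMemory and its methods are hypothetical and are not taken from the EFS source code.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/**
 * Hypothetical fixed-capacity store of trained handwriting forms: when it is
 * full, adding a new form evicts the oldest one (first in, first out).
 */
public class TrainedFormMemory {
    private final int capacity;
    private final Deque<String> forms = new ArrayDeque<>();

    public TrainedFormMemory(int capacity) {
        if (capacity <= 0) {
            throw new IllegalArgumentException("capacity must be positive");
        }
        this.capacity = capacity;
    }

    // Adds a form and returns the evicted (oldest) form, or null if none was evicted.
    public String add(String form) {
        String evicted = null;
        if (forms.size() == capacity) {
            evicted = forms.removeFirst();  // the oldest form is at the head
        }
        forms.addLast(form);                // the newest form goes to the tail
        return evicted;
    }

    // Returns the stored forms, oldest first.
    public List<String> contents() {
        return new ArrayList<>(forms);
    }

    public static void main(String[] args) {
        TrainedFormMemory memory = new TrainedFormMemory(7);
        for (int i = 1; i <= 8; i++) {
            String evicted = memory.add("form" + i);
            if (evicted != null) {
                System.out.println("form" + i + " replaced " + evicted); // form8 replaced form1
            }
        }
        System.out.println(memory.contents()); // [form2, form3, form4, form5, form6, form7, form8]
    }
}
```

Because the capacity is a constructor argument, a larger network memory would, in this sketch, only mean passing a value greater than 7.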
7.4 Future Enhancements
Two enhancements are proposed for this project:
• Improve the storage capability of the system by enlarging the memory of the neural
network, so that more training forms can be kept.
• Reduce the scanning time by using a more advanced and faster scanner.
7.5 Summary
This chapter presented the conclusion of the study. It summarized the project objective
and the training needs of the staff and users of the new EFS, and it outlined the
constraints of the new system and its future enhancements. The EFS solves all the problems
mentioned in the problem statement in Chapter 1. Since the system was developed, all
student documents have been stored in the system’s database for the university, which
means there is no longer any need to archive paper files that take up office space, or to
lose time searching for files. The whole work process has become electronic with the new system.
Appendix A: Source Code for Converting from Image to Text
package som;
/**
* Java Neural Network Example Handwriting Recognition
* Copyright 2005 by Heaton Research, Inc.
* by Jeff Heaton (http://www.heatonresearch.com) 10-2005
*/
public class KohonenNetwork extends Network
{
/**
* The weights of the output neurons, based on the input from the input neurons.
*/
double outputWeights[][];
/**
* The learning method.
*/
protected int learnMethod = 1;
/**
* The learning rate.
*/
protected double learnRate = 0.3;
/**
* Stop training once the error falls below this value.
*/
protected double quitError = 0.1;
/**
* How many retries before quitting.
*/
protected int retries = 10000;
/**
* Factor by which the learning rate is reduced after each iteration.
*/
protected double reduction = .99;
/**
* The owner object, to report to.
*/
//protected Applet owner;
/**
* Set to true to abort learning.
*/
public boolean halt = false;
/**
* The training set.
*/
protected TrainingSet train;
/**
* The constructor.
*
* @param inputCount Number of input neurons.
* @param outputCount Number of output neurons.
*/
public KohonenNetwork(int inputCount, int outputCount/*, Applet owner*/)
{
int n;
totalError = 1.0;
this.inputNeuronCount = inputCount;
this.outputNeuronCount = outputCount;
this.outputWeights = new double[outputNeuronCount][inputNeuronCount + 1];
this.output = new double[outputNeuronCount];
//this.owner = owner;
}
/**
* Set the training set to use.
*
* @param set The training set to use.
*/
public void setTrainingSet(TrainingSet set)
{
train = set;
}
/**
* Copy the weights from this network to another.
*
* @param dest The destination for the weights.
* @param source The source of the weights.
*/
public static void copyWeights(KohonenNetwork dest, KohonenNetwork source)
{
for (int i = 0; i < source.outputWeights.length; i++)
{
System.arraycopy(source.outputWeights[i], 0, dest.outputWeights[i], 0,
source.outputWeights[i].length);
}
}
/**
* Clear the weights.
*/
public void clearWeights()
{
totalError = 1.0;
for (int y = 0; y < outputWeights.length; y++)
for (int x = 0; x < outputWeights[0].length; x++)
outputWeights[y][x] = 0;
}
/**
* Normalize the input.
*
* @param input The input pattern.
* @param normfac Receives the normalization factor (the result).
* @param synth Receives the synthetic last input.
*/
void normalizeInput(final double input[], double normfac[], double synth[])
{
double length, d;
length = vectorLength(input);
// just in case it gets too small
if (length < 1.E-30)
length = 1.E-30;
normfac[0] = 1.0 / Math.sqrt(length);
synth[0] = 0.0;
}
/**
* Normalize weights
*
* @param w Input weights.
*/
void normalizeWeight(double w[])
{
int i;
double len;
len = vectorLength(w);
// just in case it gets too small
if (len < 1.E-30)
len = 1.E-30;
len = 1.0 / Math.sqrt(len);
for (i = 0; i < inputNeuronCount; i++)
w[i] *= len;
w[inputNeuronCount] = 0;
}
/**
* Try an input pattern. This can be used to present an input pattern to the
* network. Usually it is best to call winner() to get the winning neuron, though.
*
* @param input Input pattern.
*/
void trial(double input[])
{
int i;
double normfac[] = new double[1], synth[] = new double[1], optr[];
normalizeInput(input, normfac, synth);
for (i = 0; i < outputNeuronCount; i++)
{
optr = outputWeights[i];
output[i] = dotProduct(input, optr) * normfac[0] + synth[0]
* optr[inputNeuronCount];
// Remap from bipolar (-1,1) to unipolar (0,1)
output[i] = 0.5 * (output[i] + 1.0);
// account for rounding
if (output[i] > 1.0)
output[i] = 1.0;
if (output[i] < 0.0)
output[i] = 0.0;
}
}
/**
* Present an input pattern and get the winning neuron.
*
* @param input The input pattern.
* @param normfac Receives the normalization factor (the result).
* @param synth Receives the synthetic last input.
* @return The winning neuron number.
*/
public int winner(double input[], double normfac[], double synth[])
{
int i, win = 0;
double biggest, optr[];
normalizeInput(input, normfac, synth); // Normalize input
biggest = -1.E30;
for (i = 0; i < outputNeuronCount; i++)
{
optr = outputWeights[i];
output[i] = dotProduct(input, optr) * normfac[0] + synth[0]
* optr[inputNeuronCount];
// Remap from bipolar (-1,1) to unipolar (0,1)
output[i] = 0.5 * (output[i] + 1.0);
if (output[i] > biggest)
{
biggest = output[i];
win = i;
}
// account for rounding
if (output[i] > 1.0)
output[i] = 1.0;
if (output[i] < 0.0)
output[i] = 0.0;
}
return win;
}
/**
* This method does much of the work of the learning process. This method
* evaluates the weights against the training set.
*
* @param rate Learning rate.
* @param learn_method Method (0 = additive, 1 = subtractive).
* @param won Holds how many times a given neuron won.
* @param bigerr Returns the error.
* @param correc Returns the correction.
* @param work A work area.
* @exception java.lang.RuntimeException
*/
void evaluateErrors(double rate, int learn_method, int won[],
double bigerr[], double correc[][], double work[])
throws RuntimeException
{
int best, size, tset;
double dptr[], normfac[] = new double[1];
double synth[] = new double[1], cptr[], wptr[], length, diff;
// reset correction and winner counts
for (int y = 0; y < correc.length; y++)
{
for (int x = 0; x < correc[0].length; x++)
{
correc[y][x] = 0;
}
}
for (int i = 0; i < won.length; i++)
won[i] = 0;
bigerr[0] = 0.0;
// loop through all training sets to determine correction
for (tset = 0; tset < train.getTrainingSetCount(); tset++)
{
dptr = train.getInputSet(tset);
best = winner(dptr, normfac, synth);
won[best]++;
wptr = outputWeights[best];
cptr = correc[best];
length = 0.0;
for (int i = 0; i < inputNeuronCount; i++)
{
diff = dptr[i] * normfac[0] - wptr[i];
length += diff * diff;
if (learn_method != 0)
cptr[i] += diff;
else
work[i] = rate * dptr[i] * normfac[0] + wptr[i];
}
diff = synth[0] - wptr[inputNeuronCount];
length += diff * diff;
if (learn_method != 0)
cptr[inputNeuronCount] += diff;
else
work[inputNeuronCount] = rate * synth[0] + wptr[inputNeuronCount];
if (length > bigerr[0])
bigerr[0] = length;
if (learn_method == 0)
{
normalizeWeight(work);
for (int i = 0; i <= inputNeuronCount; i++)
cptr[i] += work[i] - wptr[i];
}
}
bigerr[0] = Math.sqrt(bigerr[0]);
}
/**
* This method is called at the end of a training iteration. This method
* adjusts the weights based on the previous trial.
*
* @param rate Learning rate.
* @param learn_method Method (0 = additive, 1 = subtractive).
* @param won Holds the number of times each neuron won.
* @param bigcorr Returns the largest weight correction, scaled by the learning rate.
* @param correc Holds the correction.
*/
void adjustWeights(double rate, int learn_method, int won[],
double bigcorr[], double correc[][])
{
double corr, cptr[], wptr[], length, f;
bigcorr[0] = 0.0;
for (int i = 0; i < outputNeuronCount; i++)
{
if (won[i] == 0)
continue;
wptr = outputWeights[i];
cptr = correc[i];
f = 1.0 / (double) won[i];
if (learn_method != 0)
f *= rate;
length = 0.0;
for (int j = 0; j <= inputNeuronCount; j++)
{
corr = f * cptr[j];
wptr[j] += corr;
length += corr * corr;
}
if (length > bigcorr[0])
bigcorr[0] = length;
}
// scale the correction
bigcorr[0] = Math.sqrt(bigcorr[0]) / rate;
}
/**
* If no neuron wins, then force a winner.
*
* @param won How many times each neuron won.
* @exception java.lang.RuntimeException
*/
void forceWin(int won[]) throws RuntimeException
{
int i, tset, best, size, which = 0;
double dptr[], normfac[] = new double[1];
double synth[] = new double[1], dist, optr[];
size = inputNeuronCount + 1;
dist = 1.E30;
for (tset = 0; tset < train.getTrainingSetCount(); tset++)
{
dptr = train.getInputSet(tset);
best = winner(dptr, normfac, synth);
if (output[best] < dist)
{
dist = output[best];
which = tset;
}
}
dptr = train.getInputSet(which);
best = winner(dptr, normfac, synth);
dist = -1.e30;
i = outputNeuronCount;
while ((i--) > 0)
{
if (won[i] != 0)
continue;
if (output[i] > dist)
{
dist = output[i];
which = i;
}
}
optr = outputWeights[which];
System.arraycopy(dptr, 0, optr, 0, dptr.length);
optr[inputNeuronCount] = synth[0] / normfac[0];
normalizeWeight(optr);
}
/**
* This method is called to train the network. It can run for a very long time.
* Progress reporting to the owner object is commented out in this version.
*
* @exception java.lang.RuntimeException
*/
public void learn() throws RuntimeException
{
int i, key, tset, iter, n_retry, nwts;
int won[], winners;
double work[], correc[][], rate, best_err, dptr[];
double bigerr[] = new double[1];
double bigcorr[] = new double[1];
KohonenNetwork bestnet; // Preserve best here
totalError = 1.0;
for (tset = 0; tset < train.getTrainingSetCount(); tset++)
{
dptr = train.getInputSet(tset);
if (vectorLength(dptr) < 1.E-30)
{
throw (new RuntimeException(
"Multiplicative normalization has null training case in training set " + tset));
}
}
bestnet = new KohonenNetwork(inputNeuronCount, outputNeuronCount/*, owner*/);
won = new int[outputNeuronCount];
correc = new double[outputNeuronCount][inputNeuronCount + 1];
if (learnMethod == 0)
work = new double[inputNeuronCount + 1];
else
work = null;
rate = learnRate;
initialize();
best_err = 1.e30;
// main loop:
n_retry = 0;
for (iter = 0;; iter++)
{
evaluateErrors(rate, learnMethod, won, bigerr, correc, work);
totalError = bigerr[0];
if (totalError < best_err)
{
best_err = totalError;
copyWeights(bestnet, this);
}
winners = 0;
for (i = 0; i < won.length; i++)
if (won[i] != 0)
winners++;
if (bigerr[0] < quitError)
break;
if ((winners < outputNeuronCount)
&& (winners < train.getTrainingSetCount()))
{
forceWin(won);
continue;
}
adjustWeights(rate, learnMethod, won, bigcorr, correc);
// owner.updateStats(n_retry,totalError,best_err);
if (halt)
{
// owner.updateStats(n_retry,totalError,best_err);
break;
}
Thread.yield();
if (bigcorr[0] < 1E-5)
{
if (++n_retry > retries)
break;
initialize();
iter = -1;
rate = learnRate;
continue;
}
if (rate > 0.01)
rate *= reduction;
}
// done
copyWeights(this, bestnet);
for (i = 0; i < outputNeuronCount; i++)
normalizeWeight(outputWeights[i]);
halt = true;
n_retry++;
// owner.updateStats(n_retry,totalError,best_err);
}
/**
* Called to initialize the Kohonen network.
*/
public void initialize()
{
int i;
double optr[];
clearWeights();
randomizeWeights(outputWeights);
for (i = 0; i < outputNeuronCount; i++)
{
optr = outputWeights[i];
normalizeWeight(optr);
}
}
}
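The winner() method above combines several steps. As a simplified, self-contained illustration, the following Java sketch reproduces only its winner-take-all step, assuming unit-length weight rows: the input is scaled by 1/sqrt(sum of squares) as in normalizeInput(), each neuron's response is the dot product with its weight row, the response is remapped from (-1,1) to (0,1), and the largest response wins. Because the synthetic last input is always 0 in normalizeInput(), the extra weight column is left out here. The class name WinnerSketch is hypothetical.

```java
/**
 * Simplified sketch of the winner-take-all step in KohonenNetwork.winner().
 * Hypothetical class; the synthetic last input (always 0) is omitted.
 */
public class WinnerSketch {

    // Returns the index of the winning output neuron and fills 'output'
    // with each neuron's activation, clamped to the range 0..1.
    public static int winner(double[] input, double[][] weights, double[] output) {
        // Normalization factor: 1 / sqrt(sum of squares), guarded against a zero input.
        double sumSq = 0.0;
        for (double v : input) sumSq += v * v;
        if (sumSq < 1.E-30) sumSq = 1.E-30;
        double normfac = 1.0 / Math.sqrt(sumSq);

        int win = 0;
        double biggest = -1.E30;
        for (int i = 0; i < weights.length; i++) {
            double dot = 0.0;
            for (int j = 0; j < input.length; j++) dot += input[j] * weights[i][j];
            // Remap the response from (-1,1) to (0,1).
            output[i] = 0.5 * (dot * normfac + 1.0);
            if (output[i] > biggest) {
                biggest = output[i];
                win = i;
            }
            // Clamp small rounding errors, as the original code does.
            output[i] = Math.min(1.0, Math.max(0.0, output[i]));
        }
        return win;
    }

    public static void main(String[] args) {
        // Two unit-length weight vectors: neuron 0 points along x, neuron 1 along y.
        double[][] weights = { {1.0, 0.0}, {0.0, 1.0} };
        double[] output = new double[weights.length];
        int win = winner(new double[] {0.2, 3.0}, weights, output);
        System.out.println("winner = " + win); // prints "winner = 1"
    }
}
```

Normalizing the input first means that the winner depends on the direction of the input vector rather than its magnitude, which is why the input (0.2, 3.0) is assigned to the neuron pointing along y.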
package som;
import java.io.*;
import java.util.*;
/**
* Java Neural Network Example Handwriting Recognition
* Copyright 2005 by Heaton Research, Inc.
* by Jeff Heaton (http://www.heatonresearch.com) 10-2005
*/
abstract public class Network implements Serializable
{
/**
* The value to consider a neuron on
*/
public final static double NEURON_ON = 0.9;
/**
* The value to consider a neuron off
*/
public final static double NEURON_OFF = 0.1;
/**
* Output neuron activations
*/
protected double output[];
/**
* Mean square error of the network
*/
protected double totalError;
/**
* Number of input neurons
*/
protected int inputNeuronCount;
/**
* Number of output neurons
*/
protected int outputNeuronCount;
/**
* Random number generator
*/
protected Random random = new Random(System.currentTimeMillis());
/**
* Called to learn from training sets.
*
* @exception java.lang.RuntimeException
*/
abstract public void learn() throws RuntimeException;
/**
* Called to present an input pattern.
*
* @param input The input pattern.
*/
abstract void trial(double[] input);
/**
* Called to get the output from a trial.
*/
double[] getOutput()
{
return output;
}
/**
* Called to calculate the trial errors.
*
* @param train The training set.
* @return The trial error.
* @exception java.lang.RuntimeException
*/
double calculateTrialError(TrainingSet train) throws RuntimeException
{
int i, size, tset, tclass;
double diff;
totalError = 0.0; // reset total error to zero
// loop through all samples
for (int t = 0; t < train.getTrainingSetCount(); t++)
{
// trial
trial(train.getInputSet(t));
tclass = (int) (train.getClassify(train.getInputCount() - 1));
for (i = 0; i < train.getOutputCount(); i++)
{
if (tclass == i)
diff = NEURON_ON - output[i];
else
diff = NEURON_OFF - output[i];
totalError += diff * diff;
}
for (i = 0; i < train.getOutputCount(); i++)
{
diff = train.getOutput(t, i) - output[i];
totalError += diff * diff;
}
}
totalError /= (double) train.getTrainingSetCount();
return totalError;
}
/**
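* Calculate the squared length (sum of squares) of a vector. Callers take
* the square root where the actual length is needed.
*
* @param v The vector.
* @return The squared vector length.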
*/
public static double vectorLength(double v[])
{
double rtn = 0.0;
for (int i = 0; i < v.length; i++)
rtn += v[i] * v[i];
return rtn;
}
/**
* Called to calculate a dot product.
*
* @param vec1 One vector.
* @param vec2 Another vector.
* @return The dot product.
*/
double dotProduct(double vec1[], double vec2[])
{
int k, m, v;
double rtn;
rtn = 0.0;
k = vec1.length / 4;
m = vec1.length % 4;
v = 0;
while ((k--) > 0)
{
rtn += vec1[v] * vec2[v];
rtn += vec1[v + 1] * vec2[v + 1];
rtn += vec1[v + 2] * vec2[v + 2];
rtn += vec1[v + 3] * vec2[v + 3];
v += 4;
}
while ((m--) > 0)
{
rtn += vec1[v] * vec2[v];
v++;
}
return rtn;
}
/**
* Called to randomize weights.
*
* @param weight A weight matrix.
*/
void randomizeWeights(double weight[][])
{
double r;
int temp = (int) (3.464101615 / (2. * Math.random())); // SQRT(12)=3.464...
for (int y = 0; y < weight.length; y++)
{
for (int x = 0; x < weight[0].length; x++)
{
r = (double) random.nextInt() + (double) random.nextInt()
- (double) random.nextInt() - (double) random.nextInt();
weight[y][x] = temp * r;
}
}
}
}
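The subtractive learning rule (learnMethod = 1) used by evaluateErrors() and adjustWeights() can be summarized as follows: each training pattern is normalized and assigned to its winning neuron, the differences between the normalized patterns and that neuron's weights are accumulated, and each winning neuron then moves a fraction rate of the way toward the mean of the patterns it won. The Java sketch below implements one such epoch and repeats it with the same learning-rate schedule as learn() (start at 0.3, multiply by 0.99 while the rate is above 0.01). It is a simplification: it omits the synthetic weight column, error tracking, forceWin() and the retry logic, and the names SubtractiveUpdateSketch and epoch() are hypothetical.

```java
import java.util.Arrays;

/**
 * Simplified sketch of the subtractive Kohonen update (learnMethod = 1).
 * Hypothetical class; omits the synthetic weight column, error tracking,
 * forced winners and retries of KohonenNetwork.learn().
 */
public class SubtractiveUpdateSketch {

    // Sum of squares, as computed by Network.vectorLength().
    static double sumSquares(double[] v) {
        double s = 0.0;
        for (double d : v) s += d * d;
        return s;
    }

    // One batch epoch: every winning neuron moves a fraction 'rate' of the
    // way toward the mean of the normalized inputs it won.
    public static void epoch(double[][] inputs, double[][] weights, double rate) {
        int n = weights[0].length;
        double[][] correc = new double[weights.length][n];
        int[] won = new int[weights.length];
        for (double[] x : inputs) {
            double nf = 1.0 / Math.sqrt(Math.max(sumSquares(x), 1.E-30));
            // Winner = largest dot product with the normalized input.
            int best = 0;
            double biggest = -1.E30;
            for (int i = 0; i < weights.length; i++) {
                double dot = 0.0;
                for (int j = 0; j < n; j++) dot += x[j] * nf * weights[i][j];
                if (dot > biggest) {
                    biggest = dot;
                    best = i;
                }
            }
            won[best]++;
            for (int j = 0; j < n; j++) correc[best][j] += x[j] * nf - weights[best][j];
        }
        for (int i = 0; i < weights.length; i++) {
            if (won[i] == 0) continue;        // neurons that never won are left unchanged
            double f = rate / won[i];
            for (int j = 0; j < n; j++) weights[i][j] += f * correc[i][j];
        }
    }

    public static void main(String[] args) {
        double[][] inputs = { {0.9, 0.1}, {1.0, 0.0}, {0.1, 0.9}, {0.0, 1.0} };
        double[][] weights = { {1.0, 0.0}, {0.0, 1.0} };
        double rate = 0.3;                    // initial value of learnRate
        for (int iter = 0; iter < 100; iter++) {
            epoch(inputs, weights, rate);
            if (rate > 0.01) rate *= 0.99;    // same decay as 'reduction'
        }
        System.out.println(Arrays.toString(weights[0]));
        System.out.println(Arrays.toString(weights[1]));
    }
}
```

With these two clusters of inputs, each neuron keeps winning the same patterns, so its weights settle close to the mean of the normalized patterns it wins; for example, neuron 0 ends up near the mean of the normalized forms of (0.9, 0.1) and (1.0, 0.0).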
References
Abusafiya, M. & Mazumdar, S. 2004, Accommodating paper in document databases, ACM
New York, NY, USA, pp. 155-162.
Aura, T., Kuhn, T.A. & Roe, M. 2006, Scanning electronic documents for personally
identifiable information, ACM New York, NY, USA, pp. 41-50.
Avison, D. & Fitzgerald, G. 2006, Information systems development methodologies, tools
and techniques, 4 edn, McGraw-Hill Education.
Chen, H.H. 1996, Neural network: software simulation on a massively parallel computer,
University of Portsmouth, Unpublished B. Sc.(Hons) Computing Final Year
Project.
Cho, J.M. 2000, 'Chromosome classification using back propagation neural networks',
IEEE Engineering in Medicine and Biology Magazine, vol. 19, no. 1, pp. 28-33.
Gilb, T. 1985, Evolutionary Delivery versus the "waterfall model", vol. 10, ACM New
York, NY, USA, pp. 49-61.
Groetzner, M., Guenthner, U. & Streckeisen, H. 2004, Method of storage management in
document databases, United States Patent 6704753, retrieved online from
http://www.freepatentsonline.com/6704753.html.
Kohonen, T. 1998, 'The self-organizing map', Neurocomputing, vol. 21, no. 1-3, pp. 1-6.
Konishi, K. & Ikeda, N.F.H. 2007, Data model and architecture of a paper-digital
document management system, ACM New York, NY, USA, pp. 29-31.
Lamarca, A.G., Dourish, J.P., Edwards, W.K. & Salisbury, M.P. 2006, Tagging related
files in a document management system, United States Patent 7086000, retrieved
online from http://www.freepatentsonline.com/7086000.html.
Lea, G.M. & Smith Judy Read, K.N.F. 2002, 'Records Management With Disk And
Practice Set With Disk (2 Books With Disks)'.
Liu, S., Mcmahon, C.A. & Culley, S.J. 2008, A review of structured document retrieval
(SDR) technology to improve information access performance in engineering
document management, vol. 59, Elsevier, pp. 3-16.
Matheu, F. 2005, Life cycle document management system for construction, Doctoral
Thesis, Universitat Politecnica De Catalunya, Spain, retrieved online from
http://www.tdx.cat/TDX-0518105-155912/#documents.
Matsuo, H., Nakamura, T. & Tatekawa, M. 2001, Electronic paper file, United States
Patent 7249324, retrieved online from
http://www.freepatentsonline.com/7249324.html.
Matthews, J. 2000, 'An Introduction to Neural Networks', Generation5 at the forefront of
Artificial Intelligence.
Oja, M., Kaski, S. & Kohonen, T. 2003, 'Bibliography of self-organizing map (SOM)
papers: 1998-2001 addendum', Neural Computing Surveys, vol. 3, no. 1, pp. 1-156.
Omar, M. 2005, Felda document management system, Master thesis, Universiti
Teknologi Malaysia.
Robson, C. 2002, Real world research: a resource for social scientists and practitioner
researchers, Blackwell, Oxford, UK.
Royce, W.W. 1987, 'Managing the development of large software systems: concepts and
techniques', IEEE Computer Society Press Los Alamitos, CA, USA, pp. 328-338.
Sellen, A. & Harper, R. 1997, Paper as an analytic resource for the design of new
technologies, ACM New York, NY, USA, pp. 319-326.
Sprague Jr, R.H. 1995, Electronic document management: Challenges and opportunities
for information systems managers, The Society for Information Management and
The Management Information Systems Research Center of the University of
Minnesota, pp. 29-49.
Yin, R.K. 1994, Case study research: design and methods, 2nd edn, Sage Publications,
Inc., Thousand Oaks.
York, R. 2006, Ecological paradoxes: William Stanley Jevons and the paperless office, vol.
13, Society for Human Ecology, p. 143.
Zantout, H. & Marir, F. 1999, Document management systems from current capabilities
towards intelligent information retrieval: an overview, vol. 19, Elsevier, pp. 471-484.
Zikmund, W.G. 1987, Business research methods, Dryden Press.