An Automation System to recognize Sinhala Handwritten Characters
using Artificial Neural Networks
Abstract
In the Government of Sri Lanka, most information-based activities are still carried out
manually. This research proposes a way to automate a fundamental public service: issuing
the National Identity Card (NIC). It presents an approach for recognizing Sinhala
handwritten characters in the application forms. Initially, a set of handwriting samples
from 30 individuals was collected; two thirds of the samples were used for training and
the remaining one third for testing. The scanned images of the characters were
preprocessed to prepare them for further processing. The preprocessor handles finding the
boundaries of the characters and normalizing them. After preprocessing, segmentation is
performed to obtain the individual characters from the list of characters. Standard image
processing techniques were employed to accomplish these tasks. The characters were then
used to train an Artificial Neural Network (ANN). The recognition of Sinhala characters is
done by the ANN, a technique widely used in applications involving uncertainty. Rules are
imposed on the results of the neural network (NN) to make the recognition process more
accurate. The details of the applicant are then appended to the database. The outcome of
this research will be beneficial to the general public at large.
Introduction
In many countries, e-government strategies are being used to make government
processes more efficient and accurate. Typically, in order to get something done by the
Gramma Niladari (GN), citizens have to fill out the required application forms. At
present, most of the forms are in Sinhala. Therefore, when the GN receives the
completed application forms, he has to go through them manually before any
processing can be done. If the Government had an integrated system able to connect
all the basic entities in the process, automation would become easier. This study is
therefore a starting point that supports the concept of e-government in the Sri Lankan
context. In Sri Lanka, citizens are most often connected to the Government through
the GN. Automation of the services provided by the GN plays an important role in
the development of a realistic e-government strategy. To achieve these goals, this study
proposes a system that allows the extraction of Sinhala handwritten characters from
the above-mentioned application forms.
The most important component of this task is extracting data from forms filled in by
citizens in the Sinhala language, which involves Sinhala handwritten character
recognition. Usually, people complete the applications in their own handwriting.
Nandasara (1995) states that identifying handwritten characters is a really challenging
task, since the variation that has to be captured among the characters is high.
Furthermore, the special structure of the Sinhala characters makes the recognition
process complex [1]. Much effort has previously been devoted to making a computer
recognize both handwritten and typed characters automatically. Until quite recently,
this effort focused on recognizing English characters. For Asian languages such as
Sinhala and Tamil, however, there have been few efforts. Methods widely used for
character recognition in these kinds of languages include pattern matching using
image processing techniques.
Material & Methods
1. Data Acquisition
In the data acquisition stage, handwriting samples from 30 people were collected. The
samples from 20 people were used for training the neural network and the remaining 10
were used for testing. To collect sample letters from individuals, a blank A4 sheet
with dotted pencil lines was used. Each person was asked to write a given set of letters
on those dotted lines. The pencil lines on the sheets were then erased and the sheets
were scanned with an HP Scanjet scanner at 200 dpi resolution. It should also be
mentioned that only a limited number of characters was collected, since some of the
characters are rarely used in this context.
Implementation of the NN consists of four main steps, represented in the following
diagram:
i. Pre Processing
ii. Segmentation
iii. Training
iv. Post Processing
Figure 1.0 – Main Steps of Word Processing
2. Pre Processing
The image is prepared for further processing. Initially, the image is passed through a
filtering process to remove the noise that may have been added during scanning. The
term noise is to be understood as anything that prevents the recognition system from
fulfilling its objective. Noise can be added to the image due to the roughness of the
paper. It was observed that the scanned image contains salt and pepper noise, so median
filtering was used to remove it. After filtering, the image is binarized, that is,
converted to black and white, to ease processing. Different people may write the letters
in different colors; binarization [10] is done to remove that effect. Most typical
character recognition systems follow these steps before the processing stage. After
binarization, the next step is to thin the characters. The goal of thinning is to
eliminate differences in pen thickness by making the strokes one pixel thick. When
letters are written, ink blots can make them much thicker; thinning removes this effect
and brings all the letters into one standard format. Morphological operations were
applied for thinning. These are the three steps followed in the preprocessing stage.
MATLAB [3] provides built-in functions for each of these image processing techniques,
and those built-in functions were used in the implementation.
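The first two preprocessing steps can be illustrated with a short sketch. The study itself relied on MATLAB built-in functions; the plain Python below is only an illustration, and the function names, the 3 x 3 window, the threshold of 128 and the tiny test image are all assumptions. Thinning is omitted because it needs a considerably longer morphological routine.

```python
def median_filter(image, size=3):
    """Replace each pixel by the median of its size x size neighbourhood.

    Border pixels use only the neighbours that lie inside the image.
    """
    height, width = len(image), len(image[0])
    radius = size // 2
    filtered = []
    for r in range(height):
        row = []
        for c in range(width):
            window = sorted(
                image[rr][cc]
                for rr in range(max(0, r - radius), min(height, r + radius + 1))
                for cc in range(max(0, c - radius), min(width, c + radius + 1))
            )
            row.append(window[len(window) // 2])
        filtered.append(row)
    return filtered


def binarize(image, threshold=128):
    """Map grey levels to 1 (ink) when darker than the threshold, else 0 (paper)."""
    return [[1 if pixel < threshold else 0 for pixel in row] for row in image]


# Hypothetical 6 x 6 grey-level scan: a dark stroke three pixels wide on the
# left, white paper on the right, one "pepper" pixel and one "salt" pixel.
page = [[0, 0, 0, 255, 255, 255] for _ in range(6)]
page[2][4] = 0      # pepper noise on the paper
page[3][1] = 255    # salt noise inside the stroke

clean = binarize(median_filter(page))
for row in clean:
    print(row)
```

Note that a 3 x 3 median filter also erases strokes that are only one pixel wide, which is one reason to filter before thinning rather than after.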
3. Segmentation
In the segmentation stage, the image is divided into characters. All the white space
surrounding each character is then removed.
Figure 2.0 - Finding Boundaries of a Character
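Finding the boundaries of a character amounts to locating the first and last rows and columns that contain ink. The following is a minimal sketch, assuming the character is a binary image stored as a list of rows with 1 for ink and 0 for paper; the function name `crop_to_ink` is hypothetical and not taken from the study.

```python
def crop_to_ink(glyph):
    """Return the smallest sub-image that contains every ink pixel (value 1)."""
    ink_rows = [r for r, row in enumerate(glyph) if any(row)]
    if not ink_rows:
        return []  # blank image: nothing to crop
    ink_cols = [c for c in range(len(glyph[0])) if any(row[c] for row in glyph)]
    top, bottom = ink_rows[0], ink_rows[-1]
    left, right = ink_cols[0], ink_cols[-1]
    return [row[left:right + 1] for row in glyph[top:bottom + 1]]


glyph = [
    [0, 0, 0, 0, 0],
    [0, 0, 1, 1, 0],
    [0, 1, 0, 1, 0],
    [0, 0, 0, 0, 0],
]
print(crop_to_ink(glyph))
```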
Projection profiles of the image are used to crop the image into text lines and then
into individual letters. First, the horizontal projection profile is used to detect the
text lines, and the image is segmented into those lines. The vertical projection
profile of each text line is then used to segment it into individual characters.
Since the scanned image consists of 9 text lines, the horizontal histogram also
consists of 9 bars, one for each text line. The boundaries of those bars can be
obtained and used to crop the image into text lines. Once a text line has been
obtained, its letters have to be cropped so that they can be input to the system for
processing. This is done using the vertical histogram of the text line, which shows how
the letters are distributed within the line and from which the boundaries of the
characters can be found. Once the boundaries are known, each character can be cropped
and isolated for input to the system.
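The two-pass projection procedure described above can be sketched as follows. This is an illustration rather than the implementation used in the study: it assumes a binary image (1 for ink), and it assumes that text lines are separated by at least one blank row and characters by at least one blank column. The helper names are hypothetical.

```python
def runs_of_ink(profile):
    """Return (start, end) pairs, end exclusive, for each run of non-zero values."""
    runs, start = [], None
    for i, value in enumerate(profile):
        if value and start is None:
            start = i                      # a bar of the histogram begins
        elif not value and start is not None:
            runs.append((start, i))        # the bar ends at a blank row/column
            start = None
    if start is not None:
        runs.append((start, len(profile)))
    return runs


def segment_characters(image):
    """Split a binary page into text lines, then each line into characters."""
    characters = []
    horizontal = [sum(row) for row in image]             # ink count per row
    for top, bottom in runs_of_ink(horizontal):
        line = image[top:bottom]
        vertical = [sum(column) for column in zip(*line)]  # ink count per column
        for left, right in runs_of_ink(vertical):
            characters.append([row[left:right] for row in line])
    return characters


# Hypothetical page with two text lines of two "characters" each.
page = [
    [0, 0, 0, 0, 0, 0],
    [1, 1, 0, 0, 1, 0],
    [1, 0, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 1, 0, 1, 1, 0],
    [0, 0, 0, 0, 0, 0],
]
characters = segment_characters(page)
print(len(characters))
```

Characters that touch each other, or characters made of parts separated by a blank column, would not be split correctly by this simple rule.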
This procedure was carried out for all the characters in the collected data set. The
input vector was then created from the column vectors obtained for the characters. An
NN had to be created for the segmented characters, and so the input vector and the test
vector were created. The NN then had to be trained with the input vector.
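The text does not spell out how a cropped character becomes a column vector. One common approach, shown here purely as an assumption, is to resample every character onto a fixed grid and then read the grid out row by row. The function names, the nearest-neighbour sampling and the 4 x 4 grid are illustrative choices, not details taken from the study.

```python
def resize_nearest(glyph, rows, cols):
    """Resample a binary image onto a rows x cols grid by nearest-neighbour sampling."""
    src_rows, src_cols = len(glyph), len(glyph[0])
    return [
        [glyph[r * src_rows // rows][c * src_cols // cols] for c in range(cols)]
        for r in range(rows)
    ]


def to_column_vector(glyph):
    """Flatten a grid row by row into a single list of pixel values."""
    return [pixel for row in glyph for pixel in row]


# Hypothetical 2 x 2 character normalised to a 4 x 4 grid.
glyph = [
    [1, 0],
    [0, 1],
]
vector = to_column_vector(resize_nearest(glyph, 4, 4))
print(len(vector), vector)
```

Because every character is mapped to the same grid, all vectors have the same length and can be presented to a network with a fixed number of inputs.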
Results and Discussion
The most salient features of NNs are their large number of processing units and their
interconnectivity. Unless handled carefully, the various parameters involved in the
architecture of the NN may cause the training process (adjusting the weights) to slow
down considerably. Some of these parameters are the number of layers, the number of
neurons in each layer, the initial values of the weights, the training coefficient and
the tolerance of correctness. The optimal selection of parameters varies depending on
the alphabet. To train the weights, an initial set of weights is tested against each
input vector. If an input vector is found for which the recognition fails, the weights
are adjusted to suit that particular input vector. However, this adjustment might also
affect the recognition of other input vectors that have already been tested, so the
entire model needs to be tested all over again from the beginning.
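The test, adjust and re-test cycle described above can be illustrated with the simplest possible case, a single threshold neuron trained with the perceptron rule. This is a deliberate simplification and not the multilayer network used in the study; the function name, the learning rate and the toy data (the logical AND of two inputs) are assumptions.

```python
def train_until_all_pass(samples, learning_rate=1.0, max_passes=100):
    """Adjust weights on every failure and repeat until a full pass has no failures."""
    n_inputs = len(samples[0][0])
    weights = [0.0] * n_inputs
    bias = 0.0
    for _ in range(max_passes):
        all_correct = True
        for inputs, target in samples:
            net = bias + sum(w * x for w, x in zip(weights, inputs))
            out = 1 if net > 0 else 0
            if out != target:
                # Recognition failed: move the weights towards this input vector.
                all_correct = False
                step = learning_rate * (target - out)
                weights = [w + step * x for w, x in zip(weights, inputs)]
                bias += step
        if all_correct:
            return weights, bias   # every vector passed after the last adjustment
    raise RuntimeError("no error-free pass within max_passes")


def recognise(weights, bias, inputs):
    """Output of the threshold neuron for one input vector."""
    return 1 if bias + sum(w * x for w, x in zip(weights, inputs)) > 0 else 0


samples = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]
weights, bias = train_until_all_pass(samples)
print(weights, bias)
```

An adjustment made for one vector can break a vector that passed earlier, which is why the loop only stops after a complete pass without any failure.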
ANNs are capable of abstracting the essence of a set of inputs. For example, a
network can be trained on a sequence of distorted versions of a letter. After
adequate training, application of such a distorted example will cause the network to
produce a perfectly formed letter. Experimental results have revealed that training on
more than 20 such distorted versions of the same letter produces correct results with a
very high percentage of accuracy.
Back propagation is a systematic method for training multilayer ANNs (multilayer
perceptrons). The Sigmoid compresses the range of NET, the weighted sum of a neuron's
inputs, so that the output OUT lies between zero and one. Since back propagation uses
the derivative of the squashing function [2], the function has to be differentiable
everywhere. The Sigmoid has this property and the additional advantage of
providing a form of automatic gain control. Properly trained back propagation
networks tend to give reasonable answers when presented with inputs that they have
never seen. Typically, a new input leads to an output similar to the correct output for
the training input vectors that resemble the new input. This generalization property
makes it possible to train a network on a representative set of input/target pairs and
get good results without training the network on all possible input/output pairs.
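For reference, the standard logistic Sigmoid and its derivative can be written in a few lines. The sketch below uses the usual definition OUT = 1 / (1 + exp(-NET)); the function names are illustrative, and the derivative is expressed through OUT itself, which is the form back propagation normally uses.

```python
import math


def sigmoid(net):
    """Logistic squashing function: maps any moderate NET value into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-net))


def sigmoid_derivative(net):
    """Derivative of the Sigmoid with respect to NET, written as OUT * (1 - OUT)."""
    out = sigmoid(net)
    return out * (1.0 - out)


for net in (-5.0, 0.0, 5.0):
    print(net, sigmoid(net), sigmoid_derivative(net))
```

The derivative is largest at NET = 0 and shrinks as NET grows in magnitude, so small signals are amplified more than large ones. This is the automatic gain control mentioned above.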
Conclusions
One of the major problems in doing this for Sinhala handwritten characters is that
the same features do not appear at the same relative location within the letter,
because different writers of the language write characters with different proportions [6].
Even the same person may not always write the same letter with the same proportions.
Normalizing the characters to a standard size does not completely eliminate this
effect, although it does help to some extent.
Training is the most important and the most time-consuming activity in
NN implementations. An efficient system should take the minimum training time
possible. To minimize the training time, experiments should be carried out on the
parameter values to find a set of values that reduces it. Certain factors affect
the training time and the performance of the networks. The following parameters
could be adjusted to minimize the training time:
a) Initial values of the weights
b) Number of neurons in the hidden layer
c) Training coefficient
d) Tolerance
e) Grid size used to extract bit patterns from the input image
f) Size of the training data set
g) Constituent characters in the training set
h) Form of the input (i.e. individual handwriting)
i) How representative the training set is
j) How representative the test set is for generalization
Therefore training is a process which has to be carried out carefully in order to obtain a
good recognition rate.
The performance plot does not indicate any major problems with the training. The
validation and test curves are very similar. If the test curve had increased
significantly before the validation curve increased, some overfitting might have
occurred. According to the plot, the mean squared error decreases over time for the
training, validation and test sets. The mean squared error is a good performance
measure.
Figure 3.0 - Performance Plot
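Reading a performance plot of this kind can be made concrete with a simple early-stopping rule: keep the epoch with the lowest validation error and stop once the validation error has failed to improve for a few epochs in a row. The sketch below is an illustration only; the function name, the patience value and the error values are hypothetical and are not the measurements behind Figure 3.0.

```python
def best_epoch(validation_errors, patience=3):
    """Return the epoch with the lowest validation error seen before stopping.

    Scanning stops once the error has failed to improve for `patience`
    consecutive epochs, so later epochs are never examined.
    """
    best, best_error, waited = 0, validation_errors[0], 0
    for epoch, error in enumerate(validation_errors[1:], start=1):
        if error < best_error:
            best, best_error, waited = epoch, error, 0
        else:
            waited += 1
            if waited >= patience:
                break
    return best


# Hypothetical mean squared errors on the validation set, one per epoch.
validation_mse = [0.90, 0.50, 0.30, 0.25, 0.27, 0.30, 0.35, 0.20]
print(best_epoch(validation_mse))
```

With these values the rule stops after epoch 6 and keeps epoch 3; the lower error at epoch 7 is never reached, which shows that the choice of patience is a trade-off.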
From the whole exercise of attempting to use NN techniques for the recognition
of characters in the Sinhala alphabet, it was discovered that a separate approach
could be developed by employing NN techniques together with image processing
techniques. The reasons for the limited recognition performance could be the
unreliability of the data and of the segments. Several attempts were made to improve
the results. Since the network was not sufficiently accurate, it was reinitialized and
trained again. Each time a feed forward network is initialized, the network parameters
are different and might produce different solutions. However, this did not achieve a
considerable recognition rate.
Since that was not successful, an attempt was made to improve the results by increasing
the number of hidden neurons above 20. A larger number of neurons in the hidden layer
gives the network more flexibility, because the network has more parameters it can
optimize. The layer size should be increased gradually: if the hidden layer is made too
large, the problem may become under-characterized, because the network must
optimize more parameters than there are data vectors to constrain them.
However, that effort was not successful either.
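The risk of an under-characterized problem can be checked with simple arithmetic by comparing the number of adjustable parameters with the number of training vectors. The sketch below assumes a fully connected network with one hidden layer. The paper does not state the grid size or the number of samples per writer, so the 16 x 16 input grid and one sample per writer per character are assumptions; the 20 training writers and eight characters are taken from the text.

```python
def count_parameters(n_inputs, n_hidden, n_outputs):
    """Weights plus biases of a fully connected network with one hidden layer."""
    return (n_inputs + 1) * n_hidden + (n_hidden + 1) * n_outputs


n_inputs = 16 * 16           # assumed grid size (not stated in the paper)
n_outputs = 8                # eight characters
training_vectors = 20 * 8    # 20 writers, assuming one sample per character each

for n_hidden in (20, 40):
    print(n_hidden, count_parameters(n_inputs, n_hidden, n_outputs), training_vectors)
```

Under these assumptions the network already has 5308 parameters with 20 hidden neurons and 10608 with 40, against only 160 training vectors, so adding hidden neurons alone would not be expected to help.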
The third option was to try a different training function. Bayesian regularization
training with trainbr, for example, can sometimes produce better generalization
capability than early stopping. In addition, training functions such as
trainscg and trainrp, which are more appropriate for character recognition
systems, were used. However, these did not achieve a considerable recognition rate
either. Since none of the above methods produced a better result, it was
decided to use additional data for the training and testing stages. The
handwriting styles in the training data set might be similar to each other, which
could be a reason for not obtaining a higher testing rate. Providing additional
data is more likely to produce a network that generalizes well to new
data. By increasing the number of character samples, better results can be expected,
which will help to achieve a more generalized trained network.
Training a neural network to achieve a high testing rate is also a challenge, because
overfitting can occur. In that case, the network has been trained to classify only the
items in the training set: it fits the patterns in the training set well, but it cannot
classify items it has never seen. When selecting the training sets, it is better
to group the characters according to their shape (such as round or square) or their
size. The results would then be more accurate, and the system would be a complete
one for character training and testing. More training sets and training programs are
required to develop such a system. A user interface can be introduced to make it user
friendly. A menu-driven program with push button controls and pictures would be
attractive. In its current state, the NN can be used only for training eight characters
of the Sinhala alphabet with initial guidance. A trainee can continue training while
enjoying it as a game.
References
[1] Rajapakse, Jagath (2000): “Neural Networks and Pattern Recognition”, Course
Notes, Nanyang Technological University, Singapore, December
[2] Aleksander, Igor & Morton, Helen (1991): “An Introduction to Neural Computing”,
Chapman & Hall, ISBN 0 412 37780 2
[3] Documentation, MATLAB Version 7.1.2 (R11), The Mathworks, Inc., Jan. 21, 1999
[4] Nandasara, S. T., Disanayake, J. B., Samaranayake, V. K., Seneviratne, E. K. and
Koannantakool, T. (1990): “Draft Standards for the use of Sinhala in Computer
Technology”, Computer & Information Council of Sri Lanka (CINTEC)
[5] Beale, R. and Jackson, T. (1990): “Neural Computing – An Introduction”, IOP Publishing
Ltd. ISBN 0 852774 262 2
[6] Disanayake, J. B. (1993): “Lets Read and Write Sinhala”, Pioneer Lanka
[7] Valluru, R. and Hayagriva, R. (1996): “C++ Neural Networks and Fuzzy Logic”, BPB
Publications
[8] Earl Gose, Richard Johnsonbaugh and Steve Jost (2003): “Pattern Recognition and
Image Analysis”, Prentice-Hall, India
[9] Hemakumar L. Premaratne and J. Bigun (2002): “Recognition of Printed Sinhala
Characters Using Linear Symmetry”, The 5th Asian Conference on Computer Vision
[10] Manning, Christopher D. and Hinrich Schütze (2000): “Foundations of Statistical
Natural Language Processing”, MIT Press