Integration of Structural and Statistical Information for Unconstrained
Handwritten Numeral Recognition
Jinhai Cai
Zhi-Qiang Liu
cai@cs.mu.oz.au
zliu@cs.mu.oz.au
Department of Computer Science
The University of Melbourne
Parkville, Victoria 3052, Australia
Abstract
In this paper, we propose an approach that integrates statistical and structural information for unconstrained handwritten numeral recognition. This approach uses a state-duration adapted transition probability distribution to overcome the weak state-duration modeling of conventional HMMs, and uses macro-states to tackle the difficulty HMMs have in modeling pattern structures. Consequently, the proposed method is superior to conventional approaches in many respects. The experimental results show that the proposed approach achieves high performance in terms of speed and accuracy.
1. Introduction
Recognition of unconstrained handwritten characters has gained considerable attention in different areas due to its potential applications. Since the late 1960s, research in this area has made impressive progress and many systems [1,2,4] have been developed, particularly for machine-printed and on-line character recognition. However, there is still a significant performance gap between humans and machines in off-line, totally unconstrained handwritten character recognition.
Generally, there are three categories of handwritten character classifiers: neural network approaches [1,2], statistical approaches [3] and structural approaches [3]. The hidden Markov model (HMM) has been widely used in automatic speech recognition [6]. HMMs provide a powerful statistical framework for modeling sequential inputs. They have also found significant applications in on-line handwritten word recognition, because dynamic information is available and features can easily be arranged as sequences of vectors. Recently, researchers have proposed using HMMs for the recognition of off-line handwritten words, with features extracted sequentially from pixels line by line (column or row) [4]. In some systems, directional projection histograms [5] are used as features due to their one-dimensional nature. However, these methods have difficulty modeling structural information.
This paper proposes a new method that fully uses the statistical and structural information for the recognition of unconstrained handwritten numerals. To do so, we define the state of a given observation as a micro-state and a collection of individual micro-states as a macro-state. The statistical information is described by micro-states using HMMs, and the structural information is modeled by singletons and the relationships between macro-states.
2. Feature Extraction
Feature extraction is important to the success of handwritten word recognition. We extract features from closed outer contours. A perfect contour must satisfy the condition that every pixel of the contour has exactly two neighbouring contour pixels in the 8-neighbour system. The perfect contour is extracted by the following steps:
1. Perform erosion once, which deletes all boundary pixels if they are deletable.
2. Perform standard dilation once.
3. Obtain the contour and delete redundant pixels.
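The three steps above can be sketched in a few lines of NumPy/SciPy. This is a minimal illustration, not the paper's implementation: plain morphological erosion is used as a stand-in for the paper's "delete deletable boundary pixels" test, and the redundant-pixel deletion of step 3 is approximated by taking the foreground pixels that touch the background.

```python
import numpy as np
from scipy import ndimage

def perfect_contour(img):
    """Approximate the paper's 3-step contour cleaning on a binary image.

    img: 2-D array, foreground == 1. Returns a boolean contour mask.
    """
    img = img.astype(bool)
    # Step 1: one erosion pass (stand-in for deleting deletable
    # boundary pixels; a real system would preserve thin strokes).
    eroded = ndimage.binary_erosion(img, structure=np.ones((3, 3)))
    # Step 2: one standard dilation pass.
    dilated = ndimage.binary_dilation(eroded, structure=np.ones((3, 3)))
    # Step 3: keep foreground pixels that touch the background in the
    # 8-neighbour system; these form the outer contour.
    interior = ndimage.binary_erosion(dilated, structure=np.ones((3, 3)))
    return dilated & ~interior
```

On a filled square, for example, the result is the one-pixel-wide ring around its border, each pixel of which has exactly two contour neighbours as the perfect-contour condition requires.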
In our system, we use chain-code-based features including locations, orientations and curvatures. The orientation of segment $k$ is defined as the direction from segment $k-1$ to $k+1$, encoded into one of 16 directions. The curvature vector of segment $i$ is defined as $\{c_{x_i}, c_{y_i}\} = \{x_{o_i} - x_i,\; y_{o_i} - y_i\}$, where $x_{o_i} = (x_{i+1} + x_{i-1})/2$ and $y_{o_i} = (y_{i+1} + y_{i-1})/2$. The curvature vectors associated with a particular orientation are quantized into three codebooks. Therefore, the curvature vectors and the orientations of line segments form $16 \times 3 = 48$ codebooks. In this way, an input image can be represented by a set of vectors:

$$O = \{O_1, O_2, \ldots, O_T\}, \qquad (1)$$

where $O_i = \{(x_i, y_i), (D_i)\}$, $D_i$ is the index of the codebook and $T$ is the length of the observation sequence.
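The orientation and curvature definitions above can be computed directly from the contour points. The sketch below assumes a closed contour given as an ordered $(T, 2)$ array of $(x, y)$ points; the final quantization of curvature vectors into the three per-orientation codebooks is omitted, as the paper does not specify the codebook training.

```python
import numpy as np

def chain_features(pts):
    """Orientation codes and curvature vectors along a closed contour.

    pts: (T, 2) array of (x, y) contour points in tracing order.
    """
    pts = np.asarray(pts, dtype=float)
    prev = np.roll(pts, 1, axis=0)   # point k-1 (contour is closed)
    nxt = np.roll(pts, -1, axis=0)   # point k+1
    # Orientation of segment k: direction from point k-1 to k+1,
    # quantized into one of 16 equal angular sectors.
    ang = np.arctan2(nxt[:, 1] - prev[:, 1], nxt[:, 0] - prev[:, 0])
    orient = np.round(ang / (2 * np.pi / 16)).astype(int) % 16
    # Curvature vector (c_x, c_y) = (x_o - x, y_o - y), with (x_o, y_o)
    # the midpoint of the two neighbouring points.
    mid = (prev + nxt) / 2.0
    curv = mid - pts
    return orient, curv
```

For collinear points the curvature vector is zero and the orientation code is that of the line direction, as expected from the definitions.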
3. The Hidden Markov Model
The outer contours can be represented by sequential feature vectors, which are regarded as the equivalent of time-varying signals. For unconstrained handwritten characters, there are large variations in writing styles. Taking the two "0"s shown in Fig.1(a) for example, the two "0"s can be modeled by HMMs with state sequences $j \to k \to q \to \cdots$ and $j \to o \to \cdots \to p \to q \to \cdots$, respectively. The two HMMs can be combined into one model, shown in Fig.1(b). Therefore, HMMs are able to deal with the variations and uncertainty present in handwritten images.
Figure 1. An example of using one HMM to model
a pattern class with huge variations.
Generally, an HMM with $N$ states can be characterised by: (1) the state transition probability matrix, $A = \{a_{ij}\}$; (2) the output probability matrix, $B = \{b_j(v)\}$, where $v$ can be an observation symbol or vector; and (3) the initial state distribution, $\pi = \{\pi_i\}$. In our system, we use the following additional parameters: the final state distribution $F = \{f_i\}$, the state duration distribution $D_s = \{p_i(d_i)\}$ and the model duration distribution $D_w = \{p_w(d_w)\}$. The complete parameter array of an HMM can be represented compactly as:

$$\lambda_w = (A, B, \pi, F, D_s, D_w). \qquad (2)$$
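The extended parameter array $\lambda_w$ of eq. (2) can be held in a simple container. The array shapes below are one plausible layout and are an assumption, not fixed by the paper: $N$ states, $V$ discrete observation symbols, and duration distributions truncated at some maximum duration $D_{\max}$.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class ExtendedHMM:
    """lambda_w = (A, B, pi, F, Ds, Dw) as in eq. (2)."""
    A: np.ndarray   # (N, N)     state transition probabilities a_ij
    B: np.ndarray   # (N, V)     output probabilities b_j(v)
    pi: np.ndarray  # (N,)       initial state distribution
    F: np.ndarray   # (N,)       final state distribution f_i
    Ds: np.ndarray  # (N, Dmax)  state duration probabilities p_i(d_i)
    Dw: np.ndarray  # (Dmax,)    model duration probabilities p_w(d_w)
```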
3.1. Recognition based on HMMs

For an HMM-based system, the classification decision is based on the conditional probability of $O$ for the given model $\lambda_w$, $p(O|\lambda_w)$, which is given by

$$p(O|\lambda_w) = \sum_{\text{all } S} p(O, S \mid \lambda_w), \qquad (3)$$

where $p(O, S|\lambda_w)$ is the probability of the observation sequence $O$ and the set of states $S$ for the given model $\lambda_w$. The amount of direct computation of (3) is infeasible [6]. Fortunately, there are two very efficient algorithms, namely, the Baum-Welch algorithm [6] and the Viterbi algorithm [7], available to calculate $p(O|\lambda_w)$. We adopted the Viterbi algorithm for the following reasons:
1. The Viterbi algorithm obtains the best state sequence, which is particularly useful in modeling state duration.
2. The Viterbi algorithm greatly reduces computation by replacing products with summations of logarithms.

3.2. Duration modelling and final state modelling

The modeling of state duration is one major weakness of conventional HMMs [6]. Vaseghi proposed a method [8] that uses a state-duration-dependent transition probability to cope with the problem without increasing the computation. As the model used in Vaseghi's method can skip over only one state while our model can skip over several states, we modify the state transition probabilities as:

$$a_{ij}(d_i) = \frac{N_{ij}}{\sum_{k>i} N_{ik}}\,[1 - a_{ii}(d_i)], \qquad (4)$$

where $N_{ij}$ is the number of state transitions from state $i$ to state $j$.

Further, the model duration probability and the final state probability can also be used to improve the performance of handwritten character recognition. Combining all these parameters and taking the computation into consideration, we modify the Viterbi algorithm as follows.

Step 1: Initialisation.
$$\delta_1(i) = \log \pi_i + \log b_i(O_1);$$
$$\psi_1(i) = 0, \quad 0 \le i \le N-1;$$
$$d_1^i = 1.$$

Step 2: Recursion. From time $t = 2$ to $T$:
$$\delta_t(j) = \max_i \,[\log \delta_{t-1}(i) + \log a_{ij}(d_{t-1}^i) + \log b_j(O_t)];$$
$$\psi_t(j) = \arg\max_i \,[\log \delta_{t-1}(i) + \log a_{ij}(d_{t-1}^i)];$$
$$d_t^j = d_{t-1}^i + 1 \;\text{if } i = j, \quad \text{or}\quad d_t^j = 1 \;\text{if } i < j.$$

Step 3: Termination. ($S_F$ is the final state set.)
$$\Delta_1(w) = \log p(O|\lambda_w) = \max_{s \in S_F} \,[\log \delta_T(s) + \log f_s + \log p_w(d_w)];$$
$$s_T = \arg\max_{s \in S_F} \,[\log \delta_T(s) + \log f_s + \log p_w(d_w)].$$

Step 4: State path backtracking. From time $t = T-1$ to $1$:
$$s_t = \psi_{t+1}(s_{t+1}).$$
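The four steps above can be sketched as follows for a left-to-right model (predecessors $i \le j$), in the log domain. This is an illustrative sketch, not the authors' code: the duration-adapted transitions are passed in as a function `log_a(i, j, d)`, and the model duration of Step 3 is taken as the sequence length $T$.

```python
import numpy as np

def modified_viterbi(obs, log_pi, log_f, log_a, log_b, log_pw):
    """Duration-aware Viterbi decoding (Steps 1-4), left-to-right HMM.

    obs:    length-T list of discrete observation indices.
    log_pi: (N,) log initial state distribution.
    log_f:  (N,) log final state distribution f_s.
    log_a:  function (i, j, d) -> log a_ij(d), duration-adapted.
    log_b:  (N, V) log output probabilities.
    log_pw: function d -> log model duration probability p_w(d).
    """
    T, N = len(obs), len(log_pi)
    delta = np.full((T, N), -np.inf)
    psi = np.zeros((T, N), dtype=int)
    dur = np.ones((T, N), dtype=int)          # state durations d_t^i
    # Step 1: initialisation.
    delta[0] = log_pi + log_b[:, obs[0]]
    # Step 2: recursion over predecessors i <= j.
    for t in range(1, T):
        for j in range(N):
            scores = [delta[t - 1, i] + log_a(i, j, dur[t - 1, i])
                      for i in range(j + 1)]
            i_best = int(np.argmax(scores))
            delta[t, j] = scores[i_best] + log_b[j, obs[t]]
            psi[t, j] = i_best
            dur[t, j] = dur[t - 1, j] + 1 if i_best == j else 1
    # Step 3: termination, adding final-state and model-duration terms.
    end = delta[T - 1] + log_f + log_pw(T)
    s = int(np.argmax(end))
    score = end[s]
    # Step 4: state path backtracking.
    path = [s]
    for t in range(T - 1, 0, -1):
        s = psi[t, s]
        path.append(s)
    return score, path[::-1]
```

Because the recursion only scans predecessors $i \le j$, skips over several states are allowed, matching the transition structure assumed by eq. (4).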
4. Structural Modeling
Another major weakness of HMMs is that they have difficulty modeling structural information. This problem can be solved by taking advantage of macro-states. The structure of a character can be modeled by macro-states (singletons) and the relationships between macro-states (two-nodes). The similarity between the character and the reference model is measured by matching the structure of the character against the best state sequence instead of the whole model.
A singleton can be described by three parameters: its orientations, $d_{1i}$ and $d_{2i}$, and its location $\{X_{mi}, Y_{mi}\}$, shown in Fig.2(a). The distributions of orientations cannot be directly modeled by a Gaussian or a mixture of Gaussians, because errors may occur in the estimation of average orientations. Therefore, the orientations are quantized into 16 codes.

The relationships between macro-states (two-nodes) are described by relative positions and relative orientations. For two macro-states $i$ and $j$ ($j \ne i$) shown in Fig.2(b), the relative position $\{X_{ij}, Y_{ij}\}$ is defined as

$$\{X_{ij}, Y_{ij}\} = \{X_{mj} - X_{mi},\; Y_{mj} - Y_{mi}\}. \qquad (5)$$

The relative orientation $d_{ij}$ between the two macro-states is defined as the angle between $d_{2i}$ and $d_{1j}$ measured anticlockwise (see Fig.2(c)). The relative orientation is also encoded into one of the 16 orientation codebooks.
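The two-node features are a one-liner each: the relative position follows eq. (5), and the relative orientation reduces to a difference of 16-level codes on the same angular circle. A minimal sketch, assuming the orientations are already given as codes in 0..15:

```python
def two_node(center_i, center_j, d2i, d1j):
    """Relative position (eq. 5) and relative orientation of two
    macro-states, with orientations as 16-level codes (0..15)."""
    xij = center_j[0] - center_i[0]
    yij = center_j[1] - center_i[1]
    # Anticlockwise angle from d2_i to d1_j, kept on the 16-code circle.
    dij = (d1j - d2i) % 16
    return (xij, yij), dij
```

The modular difference handles wrap-around, e.g. from code 15 to code 1 the anticlockwise relative orientation is 2 codes, not -14.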
Figure 2. (a) Singletons. (b) Two-node. (c) Relative orientation.

Table 1. Comparison of performance with and without SMs.
The structure matching is based on the cost of mismatching in terms of the structural description. The matching criterion is defined as

$$\Delta_2(w) = \sum_{i=0}^{N-1} \alpha_{1i}\, G_1(X_{mi}, Y_{mi}, d_{1i}, d_{2i} \mid M_w) + \sum_{i=0}^{N-1} \sum_{j=0,\, j \ne i}^{N-1} \alpha_{2ij}\, G_2(X_{ij}, Y_{ij}, d_{ij} \mid M_w), \qquad (6)$$

where $\alpha_{1i}$ and $\alpha_{2ij}$ are weights, $M_w$ is the structural model of class $w$, $G_1(\cdot)$ is a matching function for a singleton and $G_2(\cdot)$ is a matching function for a two-node pair. The problem of combining structural information now becomes one of designing the matching functions. Logarithmic Gaussian functions are used in this paper, as they are consistent with the functions used in the modified Viterbi algorithm described in Section 3.
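One plausible form of a logarithmic Gaussian matching function is sketched below; the paper does not give the exact parameterisation, so the per-feature mean/variance reference model and the treatment of location only (with the quantized orientation codes scored separately from a discrete histogram) are assumptions for illustration.

```python
import numpy as np

def log_gauss(x, mean, var):
    """Log of a 1-D Gaussian density: the 'logarithmic Gaussian' score."""
    return -0.5 * (np.log(2 * np.pi * var) + (x - mean) ** 2 / var)

def singleton_cost(x, y, ref, weight=1.0):
    """Sketch of a G1-style term: score a singleton's location against
    hypothetical model means/variances (ref = {'mx','my','vx','vy'})."""
    return weight * (log_gauss(x, ref["mx"], ref["vx"]) +
                     log_gauss(y, ref["my"], ref["vy"]))
```

A candidate singleton lying at the model mean scores higher than one displaced from it, so summing such terms over singletons and two-nodes as in eq. (6) rewards structurally consistent matches.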
5. Experimental Results
To evaluate the proposed approach, a standard subset of
CEDAR CDROM1 database is used. This database consists
of 18,468 digit images for training and 2711 digit images for
testing. In the test database (bindigis/bs), there is a subset
(goodbs) consisting of 2213 well segmented digit images.
Some examples are shown in Fig.3.
Figure 3. Some samples in the database.
In our experiments, the recognition of unconstrained handwritten numerals is performed by

$$w^* = \arg\max_w \,\{\Delta_1(w) + \Delta_2(w)\}. \qquad (7)$$

We conducted our experiments on the "bindigis" set and the "goodbs" set. The performance of the recognizer is given in Table 1. The structural models (SMs) reduce the recognition errors of HMMs by 15.45% and 18.18%, respectively. Our experimental results, achieved with one model per class, are comparable to the best results published recently [1,2,4,9]. Some other approaches use several models per class or combine several classifiers to recognize handwritten numerals. For instance, Hwang and Bang [9] use 50 models per class to achieve a 97.90% recognition rate. It is expected that the performance of the proposed approach can be further improved if some global features are used and multiple models per class are adopted.

As far as recognition speed is concerned, the average recognition speed measured on a Silicon Graphics workstation (433MHz IP7) is 7 digits per second.
            Correct Recognition Rates
Test Set    HMMs      SSMs      Error Reduction
bindigis    95.46%    96.16%    15.45%
goodbs      98.01%    98.37%    18.18%
6. Conclusion
A new approach for unconstrained handwritten numeral recognition has been presented. This approach integrates statistical and structural information. The success of the system lies in two points: the features extracted from perfect outer contours can be arranged into sequential vectors suitable for HMMs; and segmentation by macro-states avoids both the inconsistent segmentation problems of traditional structural approaches and time-consuming many-to-many matching. The experimental results show that the proposed approach, based on statistical-structural models, achieves high performance in terms of recognition speed and accuracy.
References
[1] S.W. Lee, “Off-Line Recognition of Totally Unconstrained
Handwritten Numerals Using Multilayer Cluster Neural Network,” IEEE Trans. on Pattern Analysis and Machine Intelligence, Vol.18, No.6, pp.648-652, June 1996.
[2] S.B. Cho, “Neural-Network Classifiers for Recognizing Totally Unconstrained Handwritten Numerals,” IEEE Trans. on
Neural Networks, Vol.8, No.1, pp.43-53, January 1997.
[3] R. Schalkoff, Pattern Recognition: Statistical, Structural
and Neural Approaches, John Wiley & Sons, 1992.
[4] A.J. Elms, The Representation and Recognition of Text Using Hidden Markov Models, PhD Thesis, Department of Electronic and Electrical Engineering, University of Surrey, 1996.
[5] H.S. Park and S.W. Lee, “Off-line recognition of large-set
handwritten characters with multiple hidden Markov Models,” Pattern Recognition, Vol.29, No.2, pp.231-244, 1996.
[6] L.R. Rabiner, “A tutorial on hidden Markov models and selected applications in speech recognition,” Proceedings of
The IEEE, Vol.77, No.2, pp.257-286, February 1989.
[7] A. Viterbi, “Error bounds for convolutional codes and an
asymptotically optimum decoding algorithm,” IEEE Transactions On Information Theory, Vol.13, No.2, pp.260-269,
April 1967.
[8] S.V. Vaseghi, “State duration modelling in hidden Markov
models,” Signal Processing, Vol.42, pp.31-41, 1995.
[9] Y.S. Hwang and S.Y. Bang, “Recognition of unconstrained
handwritten numerals by a radial basis function neural
network classifier,” Pattern Recognition Letters, Vol.18,
pp.657-664, 1997.