Integration of Structural and Statistical Information for Unconstrained Handwritten Numeral Recognition

Jinhai Cai and Zhi-Qiang Liu
cai@cs.mu.oz.au, zliu@cs.mu.oz.au
Department of Computer Science, The University of Melbourne, Parkville, Victoria 3052, Australia

Abstract

In this paper, we propose an approach that integrates statistical and structural information for unconstrained handwritten numeral recognition. The approach uses a state-duration-adapted transition probability distribution to overcome the weak state-duration modeling of conventional HMMs, and uses macro-states to address the difficulty HMMs have in modeling pattern structures. Consequently, the proposed method is superior to conventional approaches in many respects. Experimental results show that the proposed approach achieves high performance in terms of both speed and accuracy.

1. Introduction

Recognition of unconstrained handwritten characters has attracted considerable attention in many areas because of its potential applications. Since the late 1960s, research in this area has made impressive progress and many systems [1,2,4] have been developed, particularly for machine-printed and on-line character recognition. However, a significant performance gap between humans and machines remains in off-line recognition of totally unconstrained handwritten characters. Generally, handwritten character classifiers fall into three categories: neural network approaches [1,2], statistical approaches [3] and structural approaches [3].

The hidden Markov model (HMM) has been widely used in automatic speech recognition [6]. The HMM provides a powerful statistical framework for modeling sequential inputs. It has also found significant applications in on-line handwritten word recognition, because dynamic information is available and features can easily be arranged as sequences of vectors. Recently, researchers have proposed using HMMs for the recognition of off-line handwritten words.
In such systems, features are extracted from pixels sequentially, line by line (column or row) [4]. In some systems, directional projection histograms [5] are used as features because of their one-dimensional nature. However, these methods have difficulty modeling structural information.

This paper proposes a new method that makes full use of both statistical and structural information for the recognition of unconstrained handwritten numerals. To this end, we define the state of a given observation as a micro-state and a collection of individual micro-states as a macro-state. The statistical information is described by micro-states using HMMs, and the structural information is modeled by singletons and the relationships between macro-states.

2. Feature Extraction

Feature extraction is important to the success of handwritten word recognition. We extract features from closed outer contours. A perfect contour must satisfy the condition that every pixel of the contour has exactly two neighbouring pixels in the 8-neighbour system. The perfect contour is extracted by the following steps:

1. Perform erosion once, which deletes all boundary pixels if they are deletable.
2. Perform standard dilation once.
3. Obtain the contour and delete redundant pixels.

In our system, we use chain-code-based features including locations, orientations and curvatures. The orientation of segment k is defined as the direction from segment k-1 to segment k+1, encoded into one of 16 directions. The curvature vector of segment i is defined as {c_xi, c_yi} = {x_oi - x_i, y_oi - y_i}, where x_oi = (x_{i+1} + x_{i-1})/2 and y_oi = (y_{i+1} + y_{i-1})/2. The curvature vectors associated with a particular orientation are quantized into three codebooks. Therefore, the curvature vectors and the orientations of line segments form 16 x 3 = 48 codebooks.
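The orientation and curvature features described above can be sketched as follows. This is an illustrative implementation with our own function and variable names (not the authors' code); it computes, for each contour segment, the 16-direction orientation code and the curvature vector relative to the midpoint of its two neighbours, but omits the codebook quantization step.

```python
import math

def curvature_vectors(contour):
    """contour: list of (x, y) pixels on a closed outer contour.

    Returns, per segment i, (orientation code, c_xi, c_yi), where the
    curvature vector is (x_oi - x_i, y_oi - y_i) with x_oi, y_oi the
    midpoint of the neighbouring segments i-1 and i+1.
    """
    n = len(contour)
    features = []
    for i in range(n):
        x_prev, y_prev = contour[(i - 1) % n]   # segment i-1 (contour is closed)
        x, y = contour[i]
        x_next, y_next = contour[(i + 1) % n]   # segment i+1
        # orientation: direction from segment i-1 to i+1, one of 16 codes
        angle = math.atan2(y_next - y_prev, x_next - x_prev)
        orient = int(round(angle / (2 * math.pi / 16))) % 16
        # curvature vector relative to the midpoint of the two neighbours
        cx = (x_next + x_prev) / 2 - x
        cy = (y_next + y_prev) / 2 - y
        features.append((orient, cx, cy))
    return features
```

In the full system each curvature vector would additionally be quantized into one of the three codebooks associated with its orientation.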
In this way, an input image can be represented by a sequence of vectors:

O = {O_1, O_2, ..., O_T},   (1)

where O_i = {(x_i, y_i), (D_i)}, D_i is the index of the codebook and T is the length of the observation sequence.

3. The Hidden Markov Model

The outer contours can be represented by sequential feature vectors, which are regarded as the equivalent of time-varying signals. For unconstrained handwritten characters, there are large variations in writing styles. Taking the two "0"s shown in Fig. 1(a) as an example, the two "0"s can be modeled by HMMs with the state sequences j -> k -> q -> ... and j -> o -> ... -> p -> q -> ..., respectively. The two HMMs can be combined into the single model shown in Fig. 1(b). Therefore, HMMs are able to deal with the variations and uncertainty present in handwritten images.

[Figure 1. An example of using one HMM to model a pattern class with large variations.]

Generally, an HMM with N states can be characterised by: (1) the state transition probability matrix, A = {a_ij}; (2) the output probability matrix, B = {b_j(v)}, where v can be an observation symbol or vector; (3) the initial state distribution, pi = {pi_i}. In our system, we use the following additional parameters: the final state distribution F = {f_i}, the state duration distribution D_s = {p_i(d_i)} and the model duration distribution D_w = {p_w(d_w)}. The complete parameter set of an HMM can be represented compactly as:

lambda = (A, B, pi, F, D_s, D_w).   (2)

3.1. Recognition based on HMMs

For an HMM-based system, the classification decision is based on the conditional probability of O given the model lambda_w, p(O|lambda_w), which is given by

p(O|lambda_w) = sum_{all S} p(O, S|lambda_w),   (3)

where p(O, S|lambda_w) is the joint probability of the observation sequence O and the state sequence S for the given model lambda_w. Direct computation of (3) is infeasible [6]. Fortunately, two very efficient algorithms, the Baum-Welch algorithm [6] and the Viterbi algorithm [7], are available to calculate p(O|lambda_w). We adopted the Viterbi algorithm for the following reasons:

1. The Viterbi algorithm obtains the best state sequence, which is particularly useful in modeling state duration.
2. The Viterbi algorithm greatly reduces computation by working with logarithms, so that products are replaced by summations.

3.2. Duration modelling and final state modelling

The modeling of state duration is one major weakness of conventional HMMs [6]. Vaseghi proposed a method [8] that uses state-duration-dependent transition probabilities to cope with this problem without increasing the computation. As the model used in Vaseghi's method can skip over only one state whereas our model can skip over several states, we modify the state transition probabilities as:

a_ij(d_i) = (N_ij / sum_{k>i} N_ik) [1 - a_ii(d_i)],   (4)

where N_ij is the number of state transitions from state i to state j. Furthermore, the model duration probability and the final state probability can also be used to improve the performance of handwritten character recognition.

Combining all these parameters and taking the computation into consideration, we modify the Viterbi algorithm as follows.

Step 1: Initialisation.
delta_1(i) = log pi_i + log b_i(O_1); psi_1(i) = 0, 0 <= i <= N-1; d_i^1 = 1.

Step 2: Recursion. For time t = 2 to T:
delta_t(j) = max_i [delta_{t-1}(i) + log a_ij(d_i^{t-1})] + log b_j(O_t);
psi_t(j) = argmax_i [delta_{t-1}(i) + log a_ij(d_i^{t-1})];
d_j^t = d_j^{t-1} + 1 if i = j, or d_j^t = 1 if i < j.

Step 3: Termination. (S_F is the final state set.)
Lambda_1(w) = log p*(O|lambda_w) = max_{s in S_F} [delta_T(s) + log f_s + log p_w(d_w)];
s_T* = argmax_{s in S_F} [delta_T(s) + log f_s + log p_w(d_w)].

Step 4: State path backtracking. For time t = T-1 down to 1:
s_t* = psi_{t+1}(s_{t+1}*).

4. Structural Modeling

The other major weakness of HMMs is that structural information is difficult to model. This problem can be solved by taking advantage of the macro-states.
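The four-step modified Viterbi algorithm of Section 3.2 can be sketched in code as follows. This is a minimal illustrative implementation under our own naming, not the authors' system: `log_a(i, j, d)` stands for log a_ij(d_i), the model is assumed left-to-right (i <= j), and the model-duration term log p_w(d_w) is omitted for brevity.

```python
NEG_INF = float("-inf")

def viterbi_duration(obs, log_pi, log_b, log_a, log_f, n_states):
    """obs: observation indices; log_b[j][o]: log emission probability;
    log_a(i, j, d): duration-dependent log transition; log_f[s]: final-state score."""
    T = len(obs)
    delta = [[NEG_INF] * n_states for _ in range(T)]
    psi = [[0] * n_states for _ in range(T)]
    dur = [[1] * n_states for _ in range(T)]
    # Step 1: initialisation
    for i in range(n_states):
        delta[0][i] = log_pi[i] + log_b[i][obs[0]]
    # Step 2: recursion (left-to-right: predecessors i <= j, may skip states)
    for t in range(1, T):
        for j in range(n_states):
            best_i, best = 0, NEG_INF
            for i in range(j + 1):
                score = delta[t - 1][i] + log_a(i, j, dur[t - 1][i])
                if score > best:
                    best, best_i = score, i
            delta[t][j] = best + log_b[j][obs[t]]
            psi[t][j] = best_i
            # duration grows only while staying in the same state
            dur[t][j] = dur[t - 1][j] + 1 if best_i == j else 1
    # Step 3: termination over the final state set
    s_T = max(range(n_states), key=lambda s: delta[T - 1][s] + log_f[s])
    score = delta[T - 1][s_T] + log_f[s_T]
    # Step 4: state path backtracking
    path = [s_T]
    for t in range(T - 1, 0, -1):
        path.append(psi[t][path[-1]])
    return score, path[::-1]
```

A real system would add log p_w(d_w) at the termination step and restrict the maximum in Step 3 to the final state set S_F; here the restriction is emulated by giving non-final states log_f = -inf.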
The structure of a character can be modeled by macro-states (singletons) and the relationships between macro-states (two-nodes). The similarity between a character and a reference model is measured by matching the structure of the character against the best state sequence instead of against the whole model.

A singleton is described by three parameters: the orientations d_1i and d_2i, and the location {X_mi, Y_mi}, shown in Fig. 2(a). The distributions of orientations cannot be modeled directly by a Gaussian or a mixture of Gaussians, because errors may occur in estimating average orientations. Therefore, the orientations are quantized into 16 codes. The relationships between macro-states (two-nodes) are described by relative positions and relative orientations. For two macro-states i and j (j != i), shown in Fig. 2(b), the relative position {X_ij, Y_ij} is defined as

{X_ij, Y_ij} = {X_mj - X_mi, Y_mj - Y_mi}.   (5)

The relative orientation d_ij between the two macro-states is defined as the angle between d_2i and d_1j measured anticlockwise (see Fig. 2(c)). The relative orientation is also encoded into one of the 16 orientation codebooks.

[Figure 2. (a) Singletons. (b) Two-node. (c) Relative orientation.]

The structure matching is based on the cost of mismatching in terms of the structural description. The matching criterion is defined as

Lambda_2(w) = sum_{i=0}^{N-1} w_1i G_1(X_mi, Y_mi, d_1i, d_2i | M_w)
            + sum_{i=0}^{N-1} sum_{j=0, j!=i}^{N-1} w_2ij G_2(X_ij, Y_ij, d_ij | M_w),   (6)

where w_1i and w_2ij are weights, M_w is the structural model of class w, G_1(.) is the matching function for a singleton and G_2(.) is the matching function for a two-node pair. Now the problem of combining structural information becomes one of designing the matching functions.
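The two-node description of eq. (5) can be sketched as follows. This is a hypothetical illustration with our own names: each macro-state is a dict holding its centre location and end orientations, and for simplicity orientations are taken in radians and quantized into 16 codes here, whereas the paper works directly with 16-direction chain codes.

```python
import math

def two_node(ms_i, ms_j):
    """ms_*: dicts with centre ('xm', 'ym') and end orientations ('d1', 'd2')."""
    dx = ms_j["xm"] - ms_i["xm"]          # X_ij = X_mj - X_mi
    dy = ms_j["ym"] - ms_i["ym"]          # Y_ij = Y_mj - Y_mi
    # relative orientation d_ij: angle from d2 of state i to d1 of state j,
    # measured anticlockwise, quantized into one of 16 codes
    rel = (ms_j["d1"] - ms_i["d2"]) % (2 * math.pi)
    code = int(round(rel / (2 * math.pi / 16))) % 16
    return dx, dy, code
```

In the full criterion of eq. (6), these relative features would be scored by the two-node matching function G_2 against the structural model M_w.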
Logarithmic Gaussian functions are used in this paper, as they are consistent with the functions used in the modified Viterbi algorithm described in Section 3.

5. Experimental Results

To evaluate the proposed approach, a standard subset of the CEDAR CDROM1 database is used. This database consists of 18,468 digit images for training and 2,711 digit images for testing. The test database (bindigis/bs) contains a subset (goodbs) consisting of 2,213 well-segmented digit images. Some examples are shown in Fig. 3.

[Figure 3. Some samples in the database.]

In our experiments, the recognition of unconstrained handwritten numerals is performed by

w* = argmax_w {Lambda_1(w) + Lambda_2(w)}.   (7)

We conducted our experiments on the "bindigis" set and the "goodbs" set. The performance of the recognizer is given in Table 1. The structural models (SMs) reduce the recognition errors of the HMMs by 15.45% and 18.18%, respectively. Our experimental results, achieved with one model per class, are comparable to the best results [1,2,4,9] published recently. Some other approaches use several models per class or combine several classifiers to recognize handwritten numerals; for instance, Hwang and Bang [9] use 50 models per class to achieve a 97.90% recognition rate. It is expected that the performance of the proposed approach can be further improved if some global features are used and multiple models per class are adopted. As far as recognition speed is concerned, the average speed measured on a Silicon Graphics workstation (433 MHz IP7) is 7 digits per second.

Table 1. Comparison of performance with and without SMs.

Test Set  | Correct Recognition Rates | Error Reduction
          | HMMs     | SSMs           |
bindigis  | 95.46%   | 96.16%         | 15.45%
goodbs    | 98.01%   | 98.37%         | 18.18%

6. Conclusion

A new approach for unconstrained handwritten numeral recognition has been presented. The approach integrates statistical and structural information.
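The decision rule of eq. (7) amounts to adding, per class, the HMM log-likelihood from the modified Viterbi pass (Lambda_1) to the structural matching score (Lambda_2) and picking the best class. A minimal sketch, with hypothetical score dictionaries of our own:

```python
def classify(lambda1, lambda2):
    """lambda1, lambda2: dicts mapping class label -> log score.

    Implements w* = argmax_w [Lambda_1(w) + Lambda_2(w)]  (eq. 7).
    """
    return max(lambda1, key=lambda w: lambda1[w] + lambda2[w])
```

Because both terms are log-domain scores, the addition corresponds to multiplying the statistical likelihood by the structural match, so a class must score well under both models to win.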
The success of this system lies in the fact that the features extracted from perfect outer contours can be arranged into sequential vectors well suited to HMMs, and that segmentation by macro-states avoids both the inconsistent segmentation of traditional structural approaches and time-consuming many-to-many matching. The experimental results show that the proposed approach, based on statistical-structural models, achieves high performance in terms of recognition speed and accuracy.

References

[1] S.W. Lee, "Off-Line Recognition of Totally Unconstrained Handwritten Numerals Using Multilayer Cluster Neural Network," IEEE Trans. on Pattern Analysis and Machine Intelligence, Vol. 18, No. 6, pp. 648-652, June 1996.
[2] S.B. Cho, "Neural-Network Classifiers for Recognizing Totally Unconstrained Handwritten Numerals," IEEE Trans. on Neural Networks, Vol. 8, No. 1, pp. 43-53, January 1997.
[3] R. Schalkoff, Pattern Recognition: Statistical, Structural and Neural Approaches, John Wiley & Sons, 1992.
[4] A.J. Elms, The Representation and Recognition of Text Using Hidden Markov Models, PhD Thesis, Department of Electronic and Electrical Engineering, University of Surrey, 1996.
[5] H.S. Park and S.W. Lee, "Off-line recognition of large-set handwritten characters with multiple hidden Markov models," Pattern Recognition, Vol. 29, No. 2, pp. 231-244, 1996.
[6] L.R. Rabiner, "A tutorial on hidden Markov models and selected applications in speech recognition," Proceedings of the IEEE, Vol. 77, No. 2, pp. 257-286, February 1989.
[7] A. Viterbi, "Error bounds for convolutional codes and an asymptotically optimum decoding algorithm," IEEE Transactions on Information Theory, Vol. 13, No. 2, pp. 260-269, April 1967.
[8] S.V. Vaseghi, "State duration modelling in hidden Markov models," Signal Processing, Vol. 42, pp. 31-41, 1995.
[9] Y.S. Hwang and S.Y.
Bang, "Recognition of unconstrained handwritten numerals by a radial basis function neural network classifier," Pattern Recognition Letters, Vol. 18, pp. 657-664, 1997.