Pattern Recognition Letters 23 (2002) 45-56

A novel feature extraction method and hybrid tree classification for handwritten numeral recognition

Zhang Ping *, Chen Lihui
Digital Signal Processing Laboratory, S2-B4-a03, School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore 639798, Singapore

Received 8 March 2000; received in revised form 2 January 2001

* Corresponding author. Present address: 2000 St Marc, Apt 1003, Montreal, Que., Canada H3H 2N9. E-mail addresses: pin_zhan@cs.concordia.ca, Epzhang_2000@yahoo.com (Z. Ping).

Abstract

A hybrid classification system with a neural network and decision trees as the classifiers for handwritten numeral recognition is proposed. First, a variety of stable and reliable global features are defined and extracted based on the character geometric structure; a novel floating detector is then proposed to detect segments along the left and right profiles of a character image, which are used as local features. The recognition system consists of a hierarchical coarse classification and fine classification. For the coarse classifier, a three-layer feed-forward neural network with the back-propagation learning algorithm is employed to distinguish six subsets {0}, {6}, {8}, {1, 7}, {2, 3, 5}, {4, 9} based on the feature similarity of the extracted characters. Three character classes, namely {0}, {6} and {8}, are recognized directly by the artificial neural network (ANN). For each of the latter three subsets, a decision tree classifier is built for further fine classification as follows. First, the specific feature-class relationship between the feature primitives and the corresponding semantic class is deduced heuristically and empirically. Then, an iterative growing and pruning algorithm is used to form a tree classifier. Experiments demonstrate that the proposed recognition system is robust and flexible, and a high recognition rate is reported. © 2002 Elsevier Science B.V. All rights reserved.

Keywords: Handwritten numeral recognition; Feature extraction; Decision tree classifier; Neural networks

1. Introduction

Handwritten character recognition, with its extensive variety of writing styles, has long been an active research field due to its commercial potential. Many methodologies have been proposed, and character recognition systems have been commercialized in recent years. However, there is still room for further research on omnifont machine-printed and unconstrained handwritten character recognition in pursuit of higher recognition rates and faster processing times (Garris and Dimmick, 1996; Blue et al., 1994; Heutte and Paquet, 1998; Trier et al., 1996). Two of the most commonly used classifiers are the artificial neural network (ANN) classifier and the decision tree (DT) classifier. Artificial neural networks, with useful properties such as a highly parallel mechanism, excellent fault tolerance, adaptation and self-learning, have been increasingly developed and successfully used in character recognition (Weideman et al., 1995; Zhang et al., 1999; Gader and Khabou, 1996; Cho, 1997; Amin et al., 1996; Cao et al., 1995, 1997). Generally speaking, however, the decision-making process of a neural network is difficult to understand.
On the contrary, a decision tree classifier has long been investigated because of its conceptual simplicity and computational efficiency. A large variety of methods have been proposed for the design of classification trees (Breiman et al., 1984; Sethi and Sarvarayudu, 1982; Quinlan, 1986; Wang and Suen, 1987; Gelfand et al., 1991; Amit et al., 1997; Safavian and Landgrebe, 1991). Recently, some researchers have successfully combined ANNs with DTs to automatically design decision trees for various applications (Guo and Gelfand, 1992; Sethi, 1995; Krishna et al., 1999). However, how to combine a suitable feature space with an optimal decision tree for solving multi-class recognition problems still needs further investigation.

A pattern classifier uses a series of tests or decision functions to determine the identity of an unknown pattern or object. The evaluation of the classifier is planned in such a way that each successive outcome reduces the uncertainty about the unknown pattern being considered for classification. A more challenging approach is to configure a classification system by using a series of suitable features. In this paper, a hybrid classifier is presented which consists of two parts, coarse classification and fine classification, based on the similarity of the features extracted in this paper. For the coarse classifier, a three-layer feed-forward neural network with the back-propagation learning algorithm is employed to distinguish the six numeral subsets {0}, {6}, {8}, {1, 7}, {2, 3, 5}, {4, 9}. Three character classes {0}, {6} and {8} are recognized directly by the ANN. For each of the latter three subsets, a decision tree classifier is built for further fine classification.

The paper is organized as follows. In Section 2, a variety of global features of handwritten numeral characters are extracted, and a novel floating detector is introduced to detect character local features. In Section 3, a decision tree growing and pruning algorithm is reviewed. In Section 4, the hybrid recognition system is proposed, and some of the relationships between the features and the corresponding classes are addressed. In Section 5, the recognition rates of the ANN and of the proposed hybrid tree are compared. Finally, some conclusions are given.

2. Handwritten numeral feature extraction

Normally, the ideal geometric shape of a numeral is a series of connected strokes. Besides preprocessing steps such as filtering, segmentation and normalization, an additional broken-stroke connection and character slant correction algorithm (Cai and Liu, 1999) is employed for more accurate feature extraction. After preprocessing, the normalized character is scaled into a 32 × 24 matrix without severe slant distortion. Based on the preprocessed data, two types of feature extraction are performed. Some definitions follow.

2.1. Global features

2.1.1. Middle line feature
A middle line consists of a set of middle points between two neighbouring strokes; middle lines can be established in the horizontal or the vertical direction. In this paper, only vertical middle-line features are used. The algorithm for extracting the middle line feature is very simple: the middle point between two adjacent strokes is recorded while scanning the character image from left to right along the vertical direction. For example, the middle line of character "v" in the vertical direction is illustrated in Fig. 1; the symbol # represents the extracted middle points which form the middle line.
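As an illustration of the middle-point scan just described, the following minimal Python sketch (our own illustration, not the authors' implementation; it assumes a binary numpy image in which 1 marks a stroke pixel) records, for each scan line, the midpoint between the first two adjacent stroke runs:

import numpy as np

def middle_line_points(img):
    """Return (row, col) midpoints between two adjacent strokes on each row.

    img: 2-D numpy array of 0/1, where 1 marks a stroke pixel.
    A sketch of one plausible reading of the vertical middle-line
    extraction; rows with fewer than two stroke runs are skipped.
    """
    points = []
    for r, row in enumerate(img):
        # Find maximal runs of consecutive stroke pixels in this row.
        cols = np.flatnonzero(row)
        if cols.size == 0:
            continue
        runs = np.split(cols, np.where(np.diff(cols) > 1)[0] + 1)
        if len(runs) >= 2:
            # Midpoint between the first two adjacent strokes ('#' in Fig. 1).
            gap_start, gap_end = runs[0][-1], runs[1][0]
            points.append((r, (gap_start + gap_end) // 2))
    return points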
If the beginning/end point of the middle line is a cross point of two adjacent strokes, the open/close status of that point is defined as closed (1); otherwise it is open (0). The position and the open/close status of each terminal point are encoded as the middle line features. Some middle lines extracted by this method are shown in Fig. 2, denoted by thin lines in the character images.

Fig. 1. Middle line features of character "v" (in vertical direction).
Fig. 2. Middle lines extracted.

2.1.2. Concave feature
The concave feature describes the concavity of a character's outer profile viewed from the top, bottom, left or right. For example, a left concave feature is shown in Fig. 3. In the character's left profile A-B-C, A and C are the two outermost edge points and B is the innermost point. The middle point between point A and point C is assigned as the concave point. The following parameters are defined:

Dep1 = |A - B|,  Dep2 = |C - B|,
Depth: D = min(Dep1, Dep2),
Width: W = Wid1 + Wid2,
Concavity: C = D / W.     (1)

In our recognition system, if C > 0.3 the concave feature is taken into consideration; otherwise it is treated as invalid and ignored. The number of concavities and the position of each concave feature in each profile are used as features. The extracted middle line and concave features of characters "2" and "8" are shown in Fig. 4, indicated by thin lines and arrows, respectively.

Fig. 3. Concave feature.
Fig. 4. Concave feature.

2.1.3. Width feature
A normalized character is divided into four equal sub-regions along the vertical direction (the direction of the character height). The maximum width of each sub-region is calculated and denoted by m1, m2, m3, m4, respectively. In order to quantify the width of any sub-region x in {m1, m2, m3, m4} systematically, a scaling function f(x) is computed and used as the width feature:

f(x) = int(a (x - Mmin) / (Mmax - Mmin)),     (2)

where Mmax = max{mi}, Mmin = min{mi}, i = 1, ..., 4, and a is a scale factor, chosen as 3, which ensures that the width feature of each sub-region can be encoded in 2 bits.

2.1.4. Point feature
The end point, branch point and cross point features defined in (Amin et al., 1996) are applied in this system. These features are easily extracted and can be encoded as point features.
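To make Eqs. (1) and (2) above concrete, the following small Python sketch (our illustration; function and variable names are assumed, and the guard for equal slice widths is our addition) computes the concavity measure from the profile depths and widths, and the quantized width feature of the four slices:

def concavity(dep1, dep2, wid1, wid2):
    """Concavity C = D / W from Eq. (1); treated as valid only if C > 0.3."""
    depth = min(dep1, dep2)          # D = min(Dep1, Dep2)
    width = wid1 + wid2              # W = Wid1 + Wid2
    return depth / width

def width_feature(m, a=3):
    """Quantize sub-region widths m = [m1, m2, m3, m4] with Eq. (2)."""
    m_max, m_min = max(m), min(m)
    if m_max == m_min:               # guard (assumption): all slices equally wide
        return [0, 0, 0, 0]
    return [int(a * (x - m_min) / (m_max - m_min)) for x in m]

With a = 3 the quantized values lie in {0, 1, 2, 3}, so each slice width indeed fits in 2 bits, as stated above.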
2.2. Local feature extraction

The feature extraction methods mentioned above depict apparent global features. However, these global features are not sufficient to recognize freely handwritten characters under severe distortion, so the feature extraction needs to be investigated further. A floating feature detector is therefore proposed to detect tiny segments in the character image, which are used as local features.

2.2.1. Feature detector
FFD(w, h) is a floating detector with two parameters w and h that detects tiny segments along the character's outer profiles as local features. FFD(w, h) is set to 1 when a local feature is detected; otherwise FFD(w, h) = 0. In order to detect segments in different directions, eight templates are designed, as shown in Fig. 5; each is used for extracting segments in a specific direction. Detector (a) detects horizontal-like segments in the left profile, with the detector moving from bottom to top. Detector (b) also detects horizontal-like segments in the left profile, but moves from top to bottom. Detectors (c) and (d) both detect horizontal-like segments in the right profile. Detectors (e) and (f) detect vertical-like segments along the top profile, and detectors (g) and (h) detect vertical-like segments in the bottom profile.

Fig. 5. Detectors in eight directions.

In FFD(w, h), parameter h is called the height of the detector; any cursive segment can be detected with a changeable h. Parameter w stands for the detector width. If w is set too large, the detector overlooks many useful features; if w is set too small, many details such as zigzag noise and handwritten scribbles are detected, which results in feature variability for the same character. A suitable pair of parameters (w0, h0) needs to be initialized before the detector is used. In our system, w0 and h0 are empirically set to one-eighth of the character's width and height, respectively, in order to filter out unwanted details such as zigzag noise and handwritten scribbles.

For better comprehension, we describe in detail how the two floating detectors of Fig. 5(a) and (d) detect the local features of character "2", as shown in Fig. 6. The detector of Fig. 5(a) detects horizontal-like segments along the left profile of the character, moving from bottom to top, whereas that of Fig. 5(d) detects horizontal segments along the right profile in the opposite direction. For detecting the top-left horizontal tiny feature, the movement of the feature detector can be visualized by positions I, II and III; obviously, position III is the most likely to detect the horizontal segment.

Fig. 6. Floating detectors to detect horizontal local segments.

2.2.2. The procedure of local feature extraction
The procedure for seeking a segment in the left profile of character "2" in Fig. 6 is as follows (a small illustrative sketch of this scan is given at the end of this section):
Step 1: Place the detector at the bottom-left edge of the character.
Step 2: Initialize w0 and h0.
Step 3: Detect a tiny segment.
Step 4: If FFD(w0, h0) = 1 (one segment is detected), go to step 6.
Step 5: Move forward along the detecting direction with step h0; test whether the detector has reached the top of the character; if so, go to step 8, else go back to step 3.
Step 6: Stop moving the detector, and increment wi while detecting hi as follows: while FFD(wi, hi) = 1 { wi+1 = wi + 1; detect hi+1 }. Write down the parameter pair (wi, hi).
Step 7: Jump over the detected segment and test whether the detector has reached the top; if not, go to step 2.
Step 8: End of the procedure.

For instance, consider distinguishing the two different writing styles of characters "4" and "9" shown in Fig. 7. FFD(I) detects a left-profile segment in both characters "4" and "9". However, FFD(II) extracts a right-profile segment only in character "4", and FFD(III) detects the bottom-left segment of one writing style of character "9". Combined with the middle line feature, a decision tree can be constructed, as elaborated in Section 4.

Fig. 7. Floating detector.
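Since the detector templates of Fig. 5 are not reproduced here, the following Python sketch gives only one plausible reading of the bottom-to-top scan of detector (a): FFD(w, h) is interpreted as "within a window of h consecutive rows, the left profile shifts horizontally by at least w". All names and the exact detection test are assumptions for illustration.

import numpy as np

def left_profile(img):
    """Leftmost stroke column for each row (None if the row is empty)."""
    prof = []
    for row in img:
        cols = np.flatnonzero(row)
        prof.append(int(cols[0]) if cols.size else None)
    return prof

def detect_left_segments(img, w0=None, h0=None):
    """Sketch of the bottom-to-top scan of detector (a) in Fig. 5.

    w0, h0 default to one-eighth of the character width/height.
    Returns the list of (w_i, h_i) pairs recorded in step 6.
    """
    rows, cols = img.shape
    w0 = w0 or max(1, cols // 8)
    h0 = h0 or max(1, rows // 8)
    prof = left_profile(img)
    segments = []
    r = rows - 1                                   # step 1: bottom-left edge
    while r - h0 >= 0:                             # move bottom -> top
        top, bot = prof[r - h0 + 1], prof[r]
        if top is not None and bot is not None and abs(bot - top) >= w0:
            w, h = w0, h0                          # step 4: segment detected
            while abs(bot - top) >= w + 1:
                w += 1                             # step 6: widen while still detected
            segments.append((w, h))
            r -= h                                 # step 7: jump over the segment
        else:
            r -= h0                                # step 5: move forward by h0
    return segments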
3. Binary decision tree classifier

The design of a DT classifier can be decomposed into three tasks:
1. the choice of an appropriate tree structure;
2. the choice of the feature subsets to be used at each internal node;
3. the choice of the decision rule or strategy to be used at each internal node.
This section addresses how to build an appropriate tree for classification.

In a binary DT classifier, a sequence of decision rules is used to assign an unknown sample to a pattern class. A hierarchical tree structure T consists of several levels: level 0 contains one node, called the root node; level 1 contains nodes 2 and 3; level 2 contains nodes 4, 5, 6 and 7, and so on. In general, level i contains 2^i nodes, numbered from 2^i to 2^(i+1) - 1. Nodes with descending branches are non-terminal nodes (NTNs); nodes without descending branches are terminal nodes (TNs). Each NTN contains a decision rule, and each TN belongs to one of the recognized classes. A tree can be grown by recursively finding splitting rules (features and thresholds) until all terminal nodes have pure or nearly pure class membership or cannot be split further.

Following Guo and Gelfand (1992), let N be the number of training samples, N(t) the number of training samples which land in node t, Nj(t) the number of training samples which land in node t and belong to class j, and M the number of classes to be classified. Define

P(t) = N(t)/N,  PL(t) = P(tL)/P(t),  PR(t) = P(tR)/P(t),  P(j|t) = Nj(t)/N(t),

where P(t) is the probability that a randomly selected training sample lands in node t, PL(t) (PR(t)) is the conditional probability that a training sample goes to the left branch tL (right branch tR) given that it lands in node t, and P(j|t) is the conditional probability that a training sample belongs to class j given that it lands in node t. A tree splitting criterion is defined on a node impurity function such as the Gini criterion (Breiman et al., 1984):

g(t) = Σ_i Σ_{j≠i} P(i|t) P(j|t).     (3)

Next, define the change in node impurity ΔG(f, h, t) due to a split at node t, with feature vector f and threshold h, by

ΔG(f, h, t) = g(t) - g(tL) PL(t) - g(tR) PR(t).     (4)

The best feature f* and threshold h* at node t are obtained by maximizing the decrease in node impurity:

ΔG(f*, h*, t) = max{ΔG(f, h, t)},  f ∈ F,     (5)

where F is the feature set. In our recognition system, f is chosen from the global and local feature space F, whereas the recognized class is y ∈ {1, 2, 3, ..., 9, 0}. A decision rule d is a function that maps f into a class y, with d(f) representing the class label assigned to a feature vector f. The misclassification rate of the decision tree is denoted by

R(T) = P(d(f) ≠ y).     (6)

In practical applications, the misclassification rate is simply estimated by the ratio of misclassified samples to the total number of testing samples,

R(T) = Nerror / N,     (7)

where Nerror is the number of samples for which d(f) ≠ y.

There is a guideline on how to find a pruned tree from a tree T. A tree T1 is a pruned subtree of T if T1 has the same root node as T and has fewer NTNs or TNs; this is denoted by T1 < T. To seek an optimal pruned subtree T1 of T, many subtrees T' are constructed, and T1 must satisfy

R(T1) = min{R(T') : T' ≤ T}.     (8)
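As an aside, the split selection of Eqs. (3)-(5) can be written down directly. The Python sketch below (our illustration; the paper does not state how candidate thresholds are enumerated, so they are taken here from the observed feature values) computes the Gini impurity and exhaustively searches for the best (feature, threshold) pair:

import numpy as np

def gini(labels, classes):
    """Node impurity g(t) = sum over i != j of P(i|t) P(j|t), Eq. (3)."""
    p = np.array([np.mean(labels == c) for c in classes])
    return 1.0 - np.sum(p ** 2)        # algebraically equal to the double sum

def best_split(features, labels, classes):
    """Exhaustive search for (f*, h*) maximizing Delta G in Eqs. (4)-(5).

    features: (N, d) array of feature values at node t.
    labels:   (N,) array of class labels.
    """
    n = len(labels)
    g_t = gini(labels, classes)
    best = (None, None, 0.0)           # (feature index, threshold, Delta G)
    for f in range(features.shape[1]):
        for h in np.unique(features[:, f]):
            left = features[:, f] <= h
            n_l = left.sum()
            if n_l == 0 or n_l == n:   # skip degenerate splits
                continue
            dg = (g_t
                  - gini(labels[left], classes) * n_l / n
                  - gini(labels[~left], classes) * (n - n_l) / n)
            if dg > best[2]:
                best = (f, h, dg)
    return best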
In this paper, an iterative growing and pruning algorithm (Gelfand et al., 1991) is adopted for the construction of each decision tree. The training algorithm is as follows:
· The training data are split into two independent sets, called the first and second training sets.
· A large tree is grown on the first training set by splitting until all terminal nodes have pure class membership, have fewer than a specified number of samples, or cannot be split such that both descendants are non-empty.
· A pruned subtree is selected by minimizing the misclassification rate over the second training set.
· A tree is grown off the terminal nodes of the selected pruned subtree using the second training set, again splitting until all terminal nodes have pure class membership, have fewer than a specified number of samples, or cannot be split such that both descendants are non-empty.
· A pruned subtree is selected by minimizing the misclassification rate over the first training set.
· The procedure is iterated, successively interchanging the roles of the first and second training sets.
It can be shown that the sequence of selected pruned subtrees converges; the resulting tree is the final classifier.

4. Recognition system

A block diagram of the handwritten numeral recognition system is shown in Fig. 8. The recognition system consists of three main parts: feature extraction, coarse classification and fine classification. The features defined in Section 2 are used in the feature extraction stage.

Fig. 8. The system block diagram of the recognition system.

Coarse classification: a three-layer neural network with the back-propagation algorithm is employed. In total, 98 bits of global features and 8 bits of local features are fed into the input layer. The output layer is composed of six nodes representing the six character subsets {0}, {6}, {8}, {1, 7}, {2, 3, 5}, {4, 9}, formed according to the similarity of the features extracted in this paper; the grouping does not depend solely on the similarity of the geometric profiles of the characters. The middle line feature, concave feature, point feature, as well as the local features extracted by the FFD, can be used to distinguish character "6" from "5", character "8" from "6", character "0" from "9", and so on. For example, to distinguish character "2" from character "9", the FFD of Fig. 5(d) is applied to detect horizontal-like segments in the right profiles of both characters: normally, one bottom-right segment is detected in character "2", whereas no segment at the same position is detected in character "9". The network is fully connected between adjacent layers. The encoding schemes are briefly introduced below.

Middle line feature encoding: Only the three longest middle lines of each character are considered. The terminal point positions and the terminal point open/close status are encoded. For the position encoding of each middle line, the character image is divided equally into 4 × 4 sub-regions, and 4 bits encode the sub-region in which one terminal point is situated. For the terminal point status encoding, if the terminal point is sealed by adjacent strokes, the status is closed (1); otherwise the status is open (0). In total, 30 bits are used to encode the middle line features of a character.

Concave feature encoding: Every profile is divided into three sub-regions. The concave position is encoded as 1 when a concave feature is extracted in the corresponding sub-region, and 0 otherwise. 12 bits are needed to encode the concave features of a character.

Width feature encoding: A normalized character is partitioned into four slices along the vertical direction. 2 bits encode the width of each slice, so 8 bits are needed for the width feature encoding.

Point feature encoding: Three planes, referring to the end point plane, branch point plane and cross point plane, are masked on the character image with 4 × 4 grids.
Wherever an end point, a branch point or a cross point exists, the corresponding grid position on the respective plane is set to 1. The three feature planes need 16 × 3 = 48 bits to encode.

Local feature encoding: Only horizontal-like segments in the left and right profiles are considered. Each profile is divided into four equal sub-regions. The relevant sub-region is encoded as 1 when a segment is detected in it, and 0 otherwise. 8 bits are needed to encode the local features of the left and right profiles.

After being trained, the neural network is used as the coarse classifier. Reducing the number of classes from 10 to six has a great impact on speeding up learning and facilitating convergence. In the coarse classification, characters "0", "6" and "8" are recognized directly by the ANN, as these three classes have very stable and recognizable middle line and concave features which easily distinguish them from the others.

For the classification of the remaining three subsets {1, 7}, {2, 3, 5} and {4, 9}, special attention must be paid to the geometrical differences among the characters within each subset. A heuristic and empirical method is applied to build up the relationship between the feature vectors and the decision rule d. For example, to distinguish "1" from "7" in the {1, 7} subset, the two most stable and most distinguishing features (the width feature and the local segment feature) are chosen to build the feature-class relationship. Decision rules d can be deduced from the feature-class relationship, such as:

If {(one or more segments are detected in the left profile) or (the width features differ between slices)}
    the character belongs to "7"
Else
    the character belongs to "1"

After training, the decision tree distinguishing character "1" from "7" is shown in Appendix A; the rule above is also sketched in code below.

For the subset {4, 9} shown in Fig. 7, the middle line feature, the local segment feature and the point features are used to describe the relationship between the features and the corresponding class. The features extracted by the proposed method are listed in Table 1; with these features, a decision tree can be constructed to distinguish between characters "4" and "9".

Table 1
The list of extracted features of characters "4" and "9"

Character   Open (0)/Close (1) status    Segment detected by FFD      Point features
            of middle line               in the left/right edge
            Beg       End                Left      Right              End      Branch    Cross
4           0 (1)     1                  1         1                  4 (3)    1 (2)     1
9           1         1                  1         0                  1 (2)    1         0

For the writing styles of characters "4" and "9" shown in Fig. 7, the relevant features listed in Table 1 and the corresponding decision tree allow the characters to be recognized correctly. However, for character "4", if the writing style is the same as the one at the bottom-left of Fig. 7 (both the open/close status of the beginning and the end of the middle line are closed), and if the right-middle segment is not detected by FFD(II) in the right profile (the length of the right-middle segment is less than one-eighth of the character's width), the character may be recognized incorrectly, because under such circumstances there is no feature difference between character "4" and character "9" written in the above style.
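The heuristic {1, 7} rule quoted above can be rendered directly as a decision function. The Python sketch below is our illustration (function and argument names are assumed); it takes the FFD left-profile segments and the quantized slice widths of Eq. (2) as inputs:

def classify_1_or_7(left_segments, width_feature):
    """Heuristic decision rule deduced for the {1, 7} subset (see the rule above).

    left_segments: list of (w, h) pairs detected by the FFD in the left profile.
    width_feature: the four quantized slice widths f(m1)..f(m4) from Eq. (2).
    """
    has_left_segment = len(left_segments) >= 1
    widths_differ = max(width_feature) != min(width_feature)
    return '7' if (has_left_segment or widths_differ) else '1'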
For the subset {2, 3, 5}, global features such as the point features, concave features and local segment features are chosen to deduce several decision rules. Trees are generated by recursively partitioning the feature space in such a way that all terminal nodes belong to a specific class; the iterative growing and pruning algorithm is then used to prune and grow the trees in pursuit of an optimal tree design.

5. Experiments

Two sets of handwritten numerals were collected, and in both cases the usual pattern recognition convention was adopted for selecting training and testing data. For the first group (called Data 1), 10 000 freely handwritten numerals were collected from 200 people with unconstrained writing styles; 5000 of these characters were selected randomly as training samples (500 characters per class), and the remainder were used as testing samples. For the second group (called Data 2), 2500 characters from NIST Special Database 3 were chosen for training and another 2500 characters from the same database were used for testing. This ensures that the characters in the evaluation set are not in the training set.

The reject policy is incorporated into the neural networks with two rules. The first rule rejects a character when the highest activation level of the output neurons does not exceed a predetermined threshold. The second rule requires that the difference between the two highest activation levels of the output neurons be greater than a predetermined percentage of the highest activation level. If the output satisfies both conditions, the character is classified into the class associated with the unit having the highest activation level in the output layer; otherwise, the character is rejected (Ergenzinger and Thomsen, 1995; Karras and Perantonis, 1995). A small sketch of this reject policy is given after Table 2.

Two experiments were conducted. In experiment one, a three-layer feed-forward neural network was employed as the character recognizer. The network is fully connected between adjacent layers. The global features encoded with 98 bits and the local features encoded with 8 bits are fed together into the input layer. The recognition network has an output layer of 10 units (standing for the characters 0-9) and one hidden layer of 20 units. According to the rule of thumb that a connection weight can learn about 1.5 bits of information (Lang and Witbrock, 1988), the network structure 106-20-10 adopted in experiment one has only 106 × 20 + 20 × 10 = 2320 weights, which require about 2320 × 1.5 = 3480 training data. The networks are first trained using the training samples (5000 characters) in Data 1, and recognition is performed using the testing samples of both Data 1 and Data 2. Then the training samples (2500 characters) of Data 2 are additionally applied to continue training the networks, and recognition is again performed using both sets of testing samples. The overall recognition rates are tabulated in Table 2.

Table 2
The handwritten numeral recognition rate by ANN

Training samples               Testing samples   Recognition rate (%)   Rejection rate (%)   Misrecognition rate (%)
Data 1 (5000)                  Data 1 (5000)     96.70                  1.95                 1.35
Data 1 (5000)                  Data 2 (2500)     95.50                  2.50                 2.00
Data 1 (5000) + Data 2 (2500)  Data 1 (5000)     97.60                  1.30                 1.10
Data 1 (5000) + Data 2 (2500)  Data 2 (2500)     96.10                  1.60                 2.30
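For reference, the two-rule reject policy described above can be sketched as follows in Python (our illustration; the paper does not give the threshold values, so t_abs and t_margin are placeholders):

import numpy as np

def classify_with_reject(activations, t_abs=0.5, t_margin=0.2):
    """Two-rule reject policy of Section 5 (threshold values are assumptions).

    Rule 1: the highest output activation must exceed t_abs.
    Rule 2: the gap between the two highest activations must exceed
            t_margin times the highest activation.
    Returns the winning class index, or None for a rejected character.
    """
    a = np.asarray(activations, dtype=float)
    order = np.argsort(a)[::-1]                 # indices sorted by activation
    top, second = a[order[0]], a[order[1]]
    if top <= t_abs or (top - second) <= t_margin * top:
        return None                             # reject
    return int(order[0])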
In experiment two, the same network as in experiment one is employed as the coarse character recognizer shown in Fig. 8. The difference is that the output layer includes only 6 units, representing the six subsets {0}, {6}, {8}, {1, 7}, {2, 3, 5}, {4, 9}. Table 3 gives the coarse classification recognition rate; the recognition rate improves greatly because only six patterns are classified.

Table 3
Coarse classification recognition rate by using ANN

Training samples               Testing samples   Recognition rate (%)   Rejection rate (%)   Misrecognition rate (%)
Data 1 (5000)                  Data 1 (5000)     98.90                  0.65                 0.45
Data 1 (5000)                  Data 2 (2500)     98.60                  0.80                 0.60
Data 1 (5000) + Data 2 (2500)  Data 1 (5000)     99.20                  0.45                 0.35
Data 1 (5000) + Data 2 (2500)  Data 2 (2500)     98.80                  0.70                 0.50

The remaining character subsets are further recognized by the decision tree classifiers. The iterative growing and pruning algorithm is employed to construct three decision tree classifiers. During the growing and pruning procedure, the training data in Data 1 and Data 2 were divided into two equal sub-training sets and used to iteratively split and prune the trees. Table 4 lists the average number of nodes in the three trees.

Table 4
The number of nodes in the three decision trees

Type of tree    Number of nodes in pruned tree
{1, 7}          15
{2, 3, 5}       63
{4, 9}          31

The overall handwritten numeral recognition rates for testing Data 1 and testing Data 2 are shown in Table 5. The recognition rate as a function of the number of training samples for this hybrid classification system is visualized in Fig. 9.

Table 5
The handwritten numeral recognition rate by the hybrid classifier

Training samples               Testing samples   Recognition rate (%)   Rejection rate (%)   Misrecognition rate (%)
Data 1 (5000)                  Data 1 (5000)     97.80                  1.35                 0.85
Data 1 (5000)                  Data 2 (2500)     97.60                  1.25                 1.15
Data 1 (5000) + Data 2 (2500)  Data 1 (5000)     98.10                  1.15                 0.75
Data 1 (5000) + Data 2 (2500)  Data 2 (2500)     97.90                  1.20                 0.90

6. Conclusions

A good classifier combined with a stable and flexible feature extraction method is the most important factor in pattern recognition, especially for OCR. In this work, the geometrical features of characters are exploited in detail. First, a set of global features is defined and extracted, and a novel floating feature detector is developed; the former depicts the character's global geometrical features, while the latter describes its local features. A hybrid classifier is proposed which comprises two sub-classifiers, namely an ANN coarse classifier and three decision tree fine classifiers. In the coarse classifier, the characters with large differences in the extracted features are recognized directly, while the characters with similar geometric structures are classified further. An iterative growing and pruning tree algorithm is adopted to build three decision trees for recognizing the remaining character subsets. Compared with other classifier fusion methods, the proposed hybrid classifier combines two entirely different classification methods (ANN and DT) into a two-level hierarchical recognizer. For each recognizer, the training procedure is carried out individually and the interference between different data sets is minimized. The whole system is flexible and easily adjustable.
Experiments demonstrate that the proposed system achieves an improved character recognition rate compared with using only a feed-forward neural network with the back-propagation learning algorithm.

Fig. 9. Recognition rate versus sample size for the ANN method and the hybrid decision tree method.

Appendix A

[Figure: the decision tree distinguishing character "1" from "7"; see Section 4.]

References

Amin, A., Al-Sadoun, H., Fischer, S., 1996. Hand-printed Arabic character recognition system using an artificial network. Pattern Recognition 29 (4), 663-675.
Amit, Y., Geman, D., Wilder, K., 1997. Joint induction of shape features and tree classifiers. IEEE Trans. Pattern Anal. Machine Intell. 19 (11), 1300-1305.
Blue, J.L., Candela, G.T., Grother, P.J., Chellappa, R., Wilson, C.L., 1994. Evaluation of pattern classifiers for fingerprint and OCR applications. Pattern Recognition 18 (4), 485-501.
Breiman, L., Friedman, J.H., Olshen, R.A., Stone, C.J., 1984. Classification and Regression Trees. Wadsworth, Belmont, CA.
Cai, J., Liu, Z.-Q., 1999. Integration of structural and statistical information for unconstrained handwritten numeral recognition. IEEE Trans. Pattern Anal. Machine Intell. 21 (3), 263-270.
Cao, J. et al., 1995. Recognition of handwritten numerals with multiple feature and multi-stage classifier. Pattern Recognition 28 (2), 153-160.
Cao, J. et al., 1997. A hierarchical neural network architecture for handwritten numeral recognition. Pattern Recognition 30 (2), 289-299.
Cho, S.-B., 1997. Neural-network classifiers for recognizing totally unconstrained handwritten numerals. IEEE Trans. Neural Networks 8 (1), 43-53.
Ergenzinger, S., Thomsen, E., 1995. An accelerated learning algorithm for multilayer perceptrons: optimization layer by layer. IEEE Trans. Neural Networks 6 (1), 31-43.
Gader, P.D., Khabou, M.A., 1996. Automatic feature generation for handwritten digit recognition. IEEE Trans. Pattern Anal. Machine Intell. 18 (12), 1256-1261.
Garris, M.D., Dimmick, D.L., 1996. Form design for high accuracy optical character recognition. IEEE Trans. Pattern Anal. Machine Intell. 18 (6), 653-656.
Gelfand, S.B., Ravishankar, C.S., Delp, E.J., 1991. An iterative growing and pruning algorithm for classification tree design. IEEE Trans. Pattern Anal. Machine Intell. 13 (2), 163-174.
Guo, H., Gelfand, S.B., 1992. Classification trees with neural network feature extraction. IEEE Trans. Neural Networks 3 (6), 923-933.
Heutte, L., Paquet, T., et al., 1998. A structural/statistical feature based vector for handwritten character recognition. Pattern Recognition Lett. 19, 629-641.
Karras, D.A., Perantonis, S.J., 1995. An efficient constrained training algorithm for feedforward networks. IEEE Trans. Neural Networks 6 (6), 1420-1434.
Krishna, R., Sivakumar, G., Bhattacharya, P., 1999. Extracting decision trees from trained neural networks. Pattern Recognition 32, 1999-2009.
Lang, K.J., Witbrock, M.J., 1988. Learning to tell two spirals apart. In: Proceedings of the 1988 Connectionist Models Summer School. Morgan Kaufmann, Los Altos, CA, pp. 52-59.
Quinlan, J.R., 1986. Induction of decision trees. Machine Learning 1, 81-106.
Safavian, S.R., Landgrebe, D., 1991. A survey of decision tree classifier methodology. IEEE Trans. Systems, Man Cybernet. 21 (3), 660-674.
Sethi, I.K., 1995. Neural implementation of tree classifiers. IEEE Trans. Systems, Man Cybernet. 25 (8), 1243-1249.
Sethi, I.K., Sarvarayudu, G.P.R., 1982. Hierarchical classifier design using mutual information. IEEE Trans. Pattern Anal. Machine Intell. 4, 441-445.
Trier, Ø.D., Jain, A.K., Taxt, T., 1996. Feature extraction methods for character recognition - a survey. Pattern Recognition 29 (4), 641-662.
Wang, Q.R., Suen, C.Y., 1987. Large tree classifier with heuristic search and global training. IEEE Trans. Pattern Anal. Machine Intell. 9 (1), 91-102.
Weideman, W.E., Manry, M.T., Yau, H.-C., Gong, W., 1995. Comparisons of a neural network and a nearest-neighbor classifier via the numeric handprint recognition problem. IEEE Trans. Neural Networks 6 (6), 1524-1530.
Zhang, B., Fu, M., Yan, H., Jabri, M.A., 1999. Handwritten digit recognition by adaptive-subspace self-organizing map. IEEE Trans. Neural Networks 10 (4), 939-953.