
Pattern Recognition Letters 23 (2002) 45–56
www.elsevier.com/locate/patrec
A novel feature extraction method and hybrid tree classification for handwritten numeral recognition
Zhang Ping *, Chen Lihui
Digital Signal Processing Laboratory, S2-B4-a03, School of Electrical and Electronic Engineering,
Nanyang Technological University, Singapore 639798, Singapore
Received 8 March 2000; received in revised form 2 January 2001
Abstract
A hybrid classification system with a neural network and decision trees as the classifiers for handwritten numeral recognition is proposed. First, a variety of stable and reliable global features are defined and extracted based on the geometric structure of the characters; a novel floating detector is then proposed to detect segments along the left and right profiles of a character image, which are used as local features. The recognition system consists of hierarchical coarse classification and fine classification. For the coarse classifier, a three-layer feed-forward neural network with the back-propagation learning algorithm is employed to distinguish the six subsets {0}, {6}, {8}, {1, 7}, {2, 3, 5}, {4, 9} based on the similarity of the extracted features. Three character classes, namely {0}, {6} and {8}, are recognized directly by the artificial neural network (ANN). For each of the latter three subsets, a decision tree classifier is built for further fine classification as follows: first, the specific feature–class relationship between the feature primitives and the corresponding semantic class is deduced heuristically and empirically; then, an iterative growing and pruning algorithm is used to form a tree classifier. Experiments demonstrate that the proposed recognition system is robust and flexible, and a high recognition rate is reported. © 2002 Elsevier Science B.V. All rights reserved.
Keywords: Handwritten numeral recognition; Feature extraction; Decision tree classifier; Neural networks
1. Introduction
Handwritten character recognition, with its extensive variety of writing styles, has long been an active research field due to its potential commercial applications. Many methodologies have been proposed, and character recognition systems have been commercialized in recent years.
* Corresponding author. Present address: 2000 ST MARC, Apt 1003, Montreal, Que., Canada H3H 2N9.
E-mail addresses: pin_zhan@cs.concordia.ca, Epzhang_2000@yahoo.com (Z. Ping).
However, there is still room for further research on severely distorted, omnifont machine-printed and unconstrained handwritten character recognition in pursuit of higher recognition rates and faster processing (Garris and Dimmick, 1996; Blue et al., 1994; Heutte and Paquet, 1998; Trier et al., 1996).
As is well known, two of the most commonly used classifiers are the artificial neural network (ANN) classifier and the decision tree (DT) classifier. The artificial neural network, owing to useful properties such as its highly parallel mechanism, excellent fault tolerance, adaptation and self-learning, has been increasingly developed and successfully applied to character recognition (Weideman et al., 1995; Zhang et al.,
1999; Gader and Khabou, 1996; Cho, 1997; Amin et al., 1996; Cao et al., 1995, 1997). Generally speaking, however, the decision-making process of a neural network is difficult to understand. In contrast, the decision tree classifier has long been investigated because of its conceptual simplicity and computational efficiency, and a large variety of methods have been proposed for the design of classification trees (Breiman et al., 1984; Sethi and Sarvarayudu, 1982; Quinlan, 1986; Wang and Suen, 1987; Gelfand et al., 1991; Amit et al., 1997; Safavian and Landgrebe, 1991). Recently, some researchers have successfully combined ANNs with DTs to automatically design decision trees for various applications (Guo and Gelfand, 1992; Sethi, 1995; Krishna et al., 1999). However, how to combine a suitable feature space with an optimal decision tree for solving multi-class recognition problems still needs further investigation.
A pattern classifier uses a series of tests or decision functions to determine the identity of an unknown pattern or object. The evaluation of the classifier is planned in such a way that each successive outcome reduces the uncertainty about the unknown pattern being considered for classification. A more challenging approach is to configure a classification system using a series of suitable features. In this paper, a hybrid classifier is presented which consists of two parts, coarse classification and fine classification, based on the similarity of the extracted features. For the coarse classifier, a three-layer feed-forward neural network with the back-propagation learning algorithm is employed to distinguish the six numeric subsets {0}, {6}, {8}, {1, 7}, {2, 3, 5}, {4, 9}. Three character classes {0}, {6} and {8} are recognized directly from the ANN. For each of the latter three subsets, a decision tree classifier is built for further fine classification.
The paper is organized as follows. In Section 2, a variety of global features of handwritten numeral characters are extracted, and a novel floating detector is introduced to detect local character features. In Section 3, a decision tree growing and pruning algorithm is reviewed. In Section 4, the hybrid recognition system is proposed, and some of the relationships between the features and the corresponding classes are addressed. In Section 5, the recognition rates of the ANN and of the proposed hybrid tree are compared. Finally, some conclusions are given.
2. Handwritten numeral feature extraction
Normally, the ideal geometric shape of a numeral is a series of connected strokes. Besides the necessary preprocessing steps such as filtering, segmentation and normalization, an additional broken-stroke connection and character slant correction algorithm (Cai and Liu, 1999) is employed for more accurate feature extraction. After preprocessing, the normalized character is scaled into a 32 × 24 matrix without severe slant distortion.
Based on the preprocessed data, two types of feature extraction need to be performed. Here are some definitions.
2.1. Global feature
2.1.1. Middle line feature
The middle line consists of a set of middle points between two neighboring strokes; it can be established in the horizontal direction or in the vertical direction. In this paper, only vertical middle line features are used.
The algorithm for extracting the middle line feature is very simple: the middle point between two adjacent strokes is recorded while scanning the character image from left to right along the vertical direction. For example, the middle line of character ``v'' in the vertical direction is illustrated in Fig. 1, where the symbol # represents the extracted middle points that form the middle line.
If the beginning/end point of the middle line is a cross point of two adjacent strokes, we define the open/close status of that point as closed (1); otherwise it is open (0). The position of each terminal point and its open/close status are encoded as the middle line features. Some middle lines extracted by this method are shown in Fig. 2, denoted by thin lines in the character image.
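To make the scan concrete, the following sketch traces a vertical middle line on a binary character matrix. It is our own illustration, not the authors' code: the run-splitting logic and the image layout (rows scanned in order, 1 = stroke pixel) are assumptions.

```python
import numpy as np

def vertical_middle_line(img):
    """Trace the vertical middle line of a binary character image (sketch).

    img: 2-D array with 1 for stroke pixels, e.g. the 32 x 24 normalized matrix.
    For every row, the stroke pixels are grouped into runs; the midpoint of the
    gap between two neighbouring runs is recorded as a middle-line point.
    """
    points = []
    for y, row in enumerate(img):
        cols = np.flatnonzero(row)                      # columns containing stroke pixels
        if cols.size < 2:
            continue
        runs = np.split(cols, np.where(np.diff(cols) > 1)[0] + 1)
        for left_run, right_run in zip(runs, runs[1:]): # each pair of adjacent strokes
            mid_x = (left_run[-1] + right_run[0]) // 2  # midpoint of the gap between them
            points.append((y, int(mid_x)))
    return points
```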
Z. Ping, C. Lihui / Pattern Recognition Letters 23 (2002) 45±56
47
Fig. 1. Middle line features of character ``v'' (in vertical direction).
2.1.2. Concave feature
The concave feature describes a concavity in the character's outer profile viewed from the top, bottom, left or right direction. For example, a left concave feature is shown in Fig. 3.
On the character's left profile A–B–C, A and C are the two outermost edge points and B is the innermost point. The middle point between point A and point C is assigned as the concave point. Some parameters are defined as

Dep_1 = |A − B|,  Dep_2 = |C − B|,
Depth:  D = min(Dep_1, Dep_2),
Width:  W = Wid_1 + Wid_2,
Concavity:  C = D / W.                                   (1)
In our recognition system, the concave feature is taken into consideration only if C > 0.3; otherwise it is treated as invalid and ignored. The number of concavities and the position of each concave feature in each profile are used as features. The extracted middle line and concave features of characters ``2'' and ``8'' are shown in Fig. 4, indicated by thin lines and arrows, respectively.
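The concavity test of Eq. (1) can be written as a small helper. This is a sketch only; the points A, B, C and the widths Wid1, Wid2 are taken as given (they come from the profile analysis of Fig. 3, which is not reproduced here).

```python
def concavity(A, B, C, wid1, wid2, threshold=0.3):
    """Concavity measure of Eq. (1) (sketch).

    A, B, C: coordinates (along the scan direction) of the two outer-edge
    points and the innermost point of the profile; wid1, wid2: the two width
    components of Fig. 3 whose sum gives W.  Returns the concavity if it is
    valid (C > threshold), otherwise None.
    """
    dep1, dep2 = abs(A - B), abs(C - B)
    depth = min(dep1, dep2)          # D
    width = wid1 + wid2              # W
    conc = depth / width             # C = D / W
    return conc if conc > threshold else None
```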
2.1.3. Width feature
A normalized character is divided into four equal sub-regions vertically (along the direction of the character height). The maximum width of each sub-region is calculated and denoted by m1, m2, m3, m4, respectively. In order to quantify the width of each sub-region systematically, for x ∈ {m1, m2, m3, m4} a scaling function f(x) is calculated and used as the width feature:

f(x) = int(a (x − M_min) / (M_max − M_min)),             (2)

where M_max = max{m_i}, M_min = min{m_i}, i = 1, ..., 4; a is a scale factor, chosen as 3, which ensures that the width feature of each sub-region can be encoded in 2 bits.
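A direct reading of Eq. (2) gives the following sketch, with a = 3 so that each quantized width fits in 2 bits; the guard for equal widths is our own addition.

```python
def width_feature(m, a=3):
    """Eq. (2): quantize the maximum widths m = (m1, m2, m3, m4) of the four
    vertical sub-regions to integers in 0..a (2 bits each when a = 3)."""
    m_max, m_min = max(m), min(m)
    if m_max == m_min:               # all slices equally wide
        return [0] * len(m)
    return [int(a * (x - m_min) / (m_max - m_min)) for x in m]
```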
2.1.4. Point feature
End point, branch point and cross point features, as defined in (Amin et al., 1996), are applied in this system. These features are easily extractable and can be encoded as point features.
2.2. Local feature extraction
The feature extraction methods mentioned above can depict apparent global features. However, these global features are not sufficient for recognizing freely handwritten characters under serious distortion, so the feature extraction method needs to be investigated further.
Fig. 2. Middle lines extracted.
Fig. 5. Detectors in eight directions.
Fig. 3. Concave feature.
Fig. 4. Concave feature.
A floating feature detector is therefore proposed to detect tiny segments in the character image, which are used as local features.
2.2.1. Feature detector
FFD(w, h) is a floating detector with two parameters, w and h, which detects tiny segments along the character's outer profiles as local features. FFD(w, h) is set to 1 when a local feature is detected; otherwise FFD(w, h) = 0.
In order to detect segments in different directions, eight templates are designed, as shown in Fig. 5; each of them extracts segments in a specific direction.
Detector (a) detects horizontal-like segments in the left profile; its moving direction is from bottom to top. Detector (b) also detects horizontal-like segments in the left profile, but moves from top to bottom. Detectors (c) and (d) both detect horizontal-like segments in the right profile. Detectors (e) and (f) detect vertical-like segments along the top profile, and detectors (g) and (h) detect vertical-like segments in the bottom profile.
In FFD(w, h), the parameter h is called the height of the detector; any cursive segment can be detected by varying h. The parameter w stands for the detector width. If w is set too large, the detector will overlook many useful features; if w is set too small, many details such as zigzag noise and handwriting scribbles will be detected, which results in feature variability for the same character. A suitable pair of initial parameters (w0, h0) needs to be chosen before the detector can be used. In our system, w0 and h0 are empirically assigned to one-eighth of the character's width and height, respectively, in order to filter out unwanted details such as zigzag noise and handwriting scribbles.
For better comprehension, we describe in detail how the two floating detectors of Fig. 5(a) and (d) detect the local features of character ``2'', as shown in Fig. 6.
Fig. 7. Floating detector.
Fig. 6. Floating detectors to detect horizontal local segments.
The detector of Fig. 5(a) detects horizontal-like segments along the left profile of the character, moving from bottom to top, whereas that of Fig. 5(d) detects horizontal segments along the right profile in the opposite direction. For detecting the top-left horizontal tiny feature, the movement of the feature detector can be visualized by positions I, II and III in Fig. 6; obviously, position III is the most likely one to detect the horizontal segment.
2.2.2. The procedure of local feature extraction
The procedure for seeking segments in the left profile of character ``2'' in Fig. 6 is described as follows:
Step 1: Place the detector at the bottom-left edge of the character.
Step 2: Initialize w0 and h0.
Step 3: Detect a tiny segment.
Step 4: If FFD(w0, h0) == 1 (one segment is detected), go to step 6.
Step 5: Move forward along the detecting direction with step h0; test whether the detector has reached the top of the character; if so, go to step 8, else go back to step 3.
Step 6: Stop moving the detector, increment wi and detect hi as follows:
    While (FFD(wi, hi) == 1) { wi+1 = wi + 1; detect hi+1 }
    Write down the parameter pair (wi, hi).
Step 7: Jump over the detected segment and test whether the detector has reached the top; if not, go to step 2.
Step 8: End of the procedure.
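A schematic version of this scan is sketched below. The detector response is passed in as a callable ffd(y, w, h), since the template matching itself depends on Fig. 5; the adaptation of h in step 6 is simplified to keeping h fixed while w grows, which is an assumption on our part.

```python
def scan_profile(ffd, char_height, w0, h0):
    """Sketch of the Section 2.2.2 scan along one profile.

    ffd(y, w, h) is the detector response at position y (1 = a segment of at
    least width w and height h is present); w0, h0 are the empirical initial
    sizes (one-eighth of the character width and height).  Returns the (w, h)
    pairs of the detected segments.
    """
    segments = []
    y = 0                                   # step 1: start at the bottom edge
    while y < char_height:                  # stop when the detector reaches the top
        w, h = w0, h0                       # step 2: (re)initialize the detector
        if ffd(y, w, h) == 1:               # steps 3-4: a tiny segment is detected
            while ffd(y, w + 1, h) == 1:    # step 6: enlarge w while it is still detected
                w += 1
            segments.append((w, h))         # write down the parameter pair (wi, hi)
            y += h                          # step 7: jump over the detected segment
        else:
            y += h0                         # step 5: move forward by one step h0
    return segments
```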
For instance, consider distinguishing the two different writing styles of characters ``4'' and ``9'' shown in Fig. 7. FFD(I) can detect a left-profile segment in both ``4'' and ``9''; however, FFD(II) can only extract a right-profile segment in character ``4'', and FFD(III) detects the bottom-left segment of one writing style of character ``9''. Combined with the middle line feature, a decision tree can be constructed, which will be elaborated in Section 4.
3. Binary decision tree classifier
The design of a DT classifier can be decomposed into three parts:
1. The choice of an appropriate tree structure.
2. The choice of the feature subsets to be used at each internal node.
3. The choice of the decision rule or strategy to be used at each internal node.
In this section we address the issues related to building an appropriate tree for classification. In a binary DT classifier, a sequence of decision rules is used to assign an unknown sample to a pattern class. The hierarchical tree, denoted T, consists of several levels: level 0 contains one node, called the root node; level 1 contains nodes 2 and 3; level 2 contains nodes 4, 5, 6 and 7; and so on. Specifically, level i contains 2^i nodes, numbered from 2^i to 2^(i+1) − 1. Nodes with descending branches are the non-terminal nodes (NTNs); nodes without descending branches are the terminal nodes (TNs). Each NTN contains a decision rule, and each TN belongs to one of the recognized classes.
A tree can be grown by recursively finding splitting rules (features and thresholds) until all terminal nodes have pure or nearly pure class membership or cannot be split further. Following Guo and Gelfand (1992), let N be the number of training samples, N(t) the number of training samples that land in node t, N_j(t) the number of training samples that land in node t and belong to class j, and M the number of classes to be classified. Define

P(t) = N(t) / N,
P_L(t) = P(t_L) / P(t),
P_R(t) = P(t_R) / P(t),
P(j|t) = N_j(t) / N(t),

where P(t) is the probability that a randomly selected training sample lands in node t, P_L(t) (P_R(t)) is the conditional probability that a training sample belongs to the left branch t_L (right branch t_R) given that it lands in node t, and P(j|t) is the conditional probability that a training sample belongs to class j given that it lands in node t. A tree splitting criterion is defined based on a node impurity function such as the Gini criterion (Breiman et al., 1984):

g(t) = Σ_i Σ_{j≠i} P(i|t) P(j|t).                        (3)
Next, define the change in node impurity ΔG(f, h, t) due to a split at node t with feature vector f and threshold h:

ΔG(f, h, t) = g(t) − g(t_L) P_L(t) − g(t_R) P_R(t).      (4)

The best feature f* and threshold h* at node t are obtained by maximizing the decrease in node impurity:

ΔG(f*, h*, t) = max{ΔG(f, h, t)},  f ∈ F,                (5)

where F is the feature set.
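For clarity, the Gini criterion and the split search of Eqs. (3)–(5) can be sketched as follows for binary or small-integer coded features; this is an illustrative implementation, not the authors' code.

```python
import numpy as np

def gini(labels, n_classes):
    """g(t) = sum_i sum_{j != i} P(i|t) P(j|t) = 1 - sum_j P(j|t)^2 (Eq. (3)).
    labels: integer class indices of the samples landing in node t."""
    if labels.size == 0:
        return 0.0
    p = np.bincount(labels, minlength=n_classes) / labels.size
    return 1.0 - float(np.sum(p ** 2))

def best_split(X, y, n_classes):
    """Maximize the impurity decrease of Eqs. (4)-(5) by exhaustive search over
    every feature column and every observed threshold (binary splits X[:, f] <= h)."""
    n = y.size
    g_t = gini(y, n_classes)
    best_f, best_h, best_dg = None, None, -np.inf
    for f in range(X.shape[1]):
        for h in np.unique(X[:, f]):
            left = X[:, f] <= h
            n_left = int(left.sum())
            if n_left == 0 or n_left == n:          # both children must be nonempty
                continue
            p_l = n_left / n
            dg = g_t - p_l * gini(y[left], n_classes) \
                     - (1.0 - p_l) * gini(y[~left], n_classes)
            if dg > best_dg:
                best_f, best_h, best_dg = f, h, dg
    return best_f, best_h, best_dg
```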
In our recognition system, f is chosen from the global and local feature space F, whereas the recognized class is y ∈ {1, 2, 3, ..., 9, 0}. A decision rule d(·) is a function that maps f into a class y, with d(f) representing the class label assigned to a feature vector f. The misclassification rate of the decision tree is denoted by

R(T) = P(d(f) ≠ y).                                      (6)

In practical applications, the misclassification rate is simply estimated by the ratio of misclassified samples to the total number of testing samples,

R(T) = N_error / N,                                      (7)

where N_error is the number of samples with d(f) ≠ y. A guideline on how to find a pruned tree from a tree T is as follows: T_1 is a pruned subtree of T if T_1 has the same root node as T and has fewer NTNs or TNs; this is denoted by T_1 < T. The optimal pruned subtree T_1 of T satisfies

R(T_1) = min{R(T′) : T′ ≤ T}.                            (8)
In this paper, an iterative growing and pruning algorithm (Gelfand et al., 1991) is adopted for the construction of the decision trees. The training algorithm is described as follows (a code sketch follows the list):
· The training data are split into two independent sets, called the first and second training sets.
· A large tree is grown on the first training set by splitting until all terminal nodes have pure class membership, have fewer than a specified number of samples, or cannot be split such that both descendants are nonempty.
· A pruned subtree is selected by minimizing the misclassification rate over the second training set.
· A tree is grown off the terminal nodes of the selected pruned subtree on the second training set, again splitting until all terminal nodes have pure class membership, have fewer than a specified number of samples, or cannot be split such that both descendants are nonempty.
· A pruned subtree is selected by minimizing the misclassification rate over the first training set.
· The procedure is iterated, successively interchanging the roles of the first and second training sets. It can be shown that the sequence of selected pruned subtrees converges, at which point the tree is formed.
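The alternation between the two training sets can be summarized by the following sketch, in which grow, prune and the convergence test are placeholders for the tree operations of Gelfand et al. (1991).

```python
def iterative_grow_prune(grow, prune, size, set1, set2, max_iter=20):
    """High-level sketch of the iterative growing and pruning procedure.

    grow(tree, data)  -> tree grown (or regrown off its terminal nodes) on data,
    prune(tree, data) -> pruned subtree minimizing the misclassification rate on data,
    size(tree)        -> number of nodes (used here as a simple convergence check).
    All three are placeholders for the operations described above.
    """
    tree = grow(None, set1)                     # large initial tree from the first training set
    prev_size = -1
    for it in range(max_iter):
        other = set2 if it % 2 == 0 else set1   # roles of the two sets are interchanged
        tree = prune(tree, other)               # select the pruned subtree on the other set
        if size(tree) == prev_size:             # pruned subtree unchanged: converged
            break
        prev_size = size(tree)
        tree = grow(tree, other)                # regrow off the surviving terminal nodes
    return tree
```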
4. Recognition system
A block diagram of the handwritten numeral recognition system is shown in Fig. 8. The recognition system consists of three main parts, namely feature extraction, coarse classification and fine classification. The features extracted in Section 2 are used in the feature extraction stage.
Coarse classification: a three-layer neural network with the back-propagation algorithm is employed. In total, 98 bits of global features and 8 bits of local features are fed into the input layer. The output layer is composed of six nodes representing the six character subsets {0}, {6}, {8}, {1, 7}, {2, 3, 5}, {4, 9}, grouped by the similarity of the features extracted in this paper; the grouping does not depend entirely on the similarity of the geometrical profiles of the characters. The middle line feature, concave feature and point feature, as well as the local features extracted by the FFD, can be used to distinguish character ``6'' from ``5'', character ``8'' from ``6'', character ``0'' from ``9'', etc. For example, in order to distinguish character ``2'' from character ``9'', the FFD of Fig. 5(d) is applied to detect horizontal-like segments in the right profiles of both characters: normally, one bottom-right segment can be detected in character ``2'', whereas no segment at the same position can be detected in character ``9''. The network is fully connected between adjacent layers. The encoding scheme is briefly introduced below.
Fig. 8. The system block diagram of the recognition system.

Middle line feature encoding: only the three longest middle lines in each character are considered. Features such as the terminal point positions and the terminal point open/close status are encoded. For position encoding, the character image is equally divided into 4 × 4 sub-regions, and 4 bits encode the sub-region in which a terminal point is situated. For status encoding, if the terminal point is sealed by adjacent strokes, the status is closed (1); otherwise the status is open (0). In total, 30 bits are used to encode the middle line features of a character.
Concave feature encoding: every profile is divided into three sub-regions. The corresponding position is encoded as 1 as soon as a concave feature is extracted in that sub-region; otherwise it is set to 0. In total, 12 bits are needed to encode the concave features of a character.
Width feature encoding: a normalized character is partitioned into four slices along the vertical direction, and 2 bits encode the width of each slice, so 8 bits are needed for the width feature encoding.
Point feature encoding: three planes, namely the end point plane, the branch point plane and the cross point plane, are masked on the character image with a 4 × 4 grid. Where an end point, branch point or cross point exists, the corresponding grid position on the relevant plane is set to 1. The three feature planes need 16 × 3 = 48 bits to encode.
Local feature encoding: only horizontal-like segments in the left and right profiles are considered. Each profile is divided into four equal sub-regions. The relevant sub-region is encoded as 1 as soon as a segment is detected in it; otherwise it is set to 0. In total, 8 bits are needed to encode the local features of the left and right profiles.
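Putting the five encodings together gives the 106-bit input vector of the coarse classifier. The sketch below simply concatenates the bit groups; the ordering of the groups is our assumption.

```python
def assemble_input_vector(middle_line_bits, concave_bits, width_bits,
                          point_bits, local_bits):
    """Concatenate the encodings above into the 106-bit input vector (sketch)."""
    assert len(middle_line_bits) == 30   # 3 middle lines x (2 x 4-bit positions + 2 status bits)
    assert len(concave_bits) == 12       # 4 profiles x 3 sub-regions
    assert len(width_bits) == 8          # 4 slices x 2 bits
    assert len(point_bits) == 48         # 3 point planes x 16 grid cells
    assert len(local_bits) == 8          # left and right profiles x 4 sub-regions
    vector = (list(middle_line_bits) + list(concave_bits) + list(width_bits)
              + list(point_bits) + list(local_bits))
    assert len(vector) == 106            # 98 global bits + 8 local bits
    return vector
```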
After being trained, the neural network is used as the coarse classifier. Reducing the number of classes from ten to six greatly speeds up learning and facilitates convergence. In the coarse classification, characters ``0'', ``6'' and ``8'' are recognized directly from the ANN, since these three character classes have very stable and recognizable middle line and concave features that easily distinguish them from the others.
For the classification of the remaining three subsets {1, 7}, {2, 3, 5}, {4, 9}, special attention must be paid to the geometrical differences among the characters in each subset in order to distinguish them from one another. A heuristic and empirical method is applied to build the relationship between the feature vectors and the decision rule d(·).
For example, in order to distinguish ``1'' from ``7'' in the {1, 7} subset, the two most stable and most distinguishable features (the width feature and the local segment feature) are chosen to build the relationship between features and classes. Decision rules d(·) can then be deduced from the feature–class relationship vectors, for example:

If {(one or more segments in the left profile) or (the width feature differs between slices)}
    the character belongs to ``7''
Else
    the character belongs to ``1''
After training, the decision tree for distinguishing character ``1'' from ``7'' is shown in Appendix A.
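The quoted rule can be stated as a small function. The width condition is not fully specified in the text, so a simple spread test over the four 2-bit width codes stands in for it; both the function name and that test are our own illustration.

```python
def classify_1_or_7(n_left_segments, width_codes):
    """Heuristic rule for the {1, 7} subset (sketch).

    n_left_segments: number of FFD segments found in the left profile;
    width_codes: the four 2-bit width codes of Eq. (2).  The width condition is
    approximated here by a non-zero spread of the codes.
    """
    width_varies = max(width_codes) - min(width_codes) > 0
    return '7' if n_left_segments >= 1 or width_varies else '1'
```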
For the subset {4, 9} shown in Fig. 7, the middle line feature, the local segment feature and the point features are used to describe the relationship between the features and the corresponding class. The overall features extracted by the proposed method are listed in Table 1; from these features the corresponding decision tree can be constructed to distinguish character ``4'' from ``9''.
Using the relevant features extracted by the proposed method and listed in Table 1, together with the decision tree, the characters ``4'' and ``9'' with the different writing styles of Fig. 7 can be correctly recognized.
Table 1
The list of extracted features of characters ``4'' and ``9''

Character    Open (0)/Close (1) status    Segment detected by FFD    Point features
             of middle line               in the left/right edge
             Beg        End               Left        Right          End      Branch    Cross
4            0 (1)      1                 1           1              4 (3)    1 (2)     1
9            1          1                 1           0              1 (2)    1         0
However, for character ``4'', if the writing style is the same as that at the left-bottom of Fig. 7 (both the open/close status of the beginning and of the end of the middle line are closed), and if the right-middle segment is not detected by FFD(II) in the right profile (i.e., the length of the right-middle segment is less than one-eighth of the character's width), the character could be recognized incorrectly, because under such circumstances there is no feature difference between character ``4'' and character ``9'' in this writing style.
For the subset {2, 3, 5}, global features such as point features and concave features, together with local segment features, are chosen to deduce several decision rules.
The trees are generated by recursively partitioning the feature space in such a way that all of the terminal nodes belong to a specific class; the iterative growing and pruning algorithm is then used to prune and regrow the trees in pursuit of an optimal tree design.
5. Experiments
Two sets of handwritten numerals were collected. In both cases, we adopted the usual pattern recognition convention in selecting training and testing data, as follows. For the first group (called Data 1), 10 000 freely handwritten numerals were collected from 200 writers with unconstrained writing styles; 5000 of these characters were selected randomly as training samples (500 characters per class), and the others were used as testing samples. For the second group (called Data 2), 2500 characters from NIST Special Database 3 were chosen for training, and another 2500 characters from the same database were used for testing. This method ensures that the characters in the evaluation set are not in the training set.
The reject policy is incorporated into the neural networks with two rules. The first rule rejects a character when the highest activation level of the output neurons does not exceed a predetermined threshold. The second rule requires that the difference between the two highest activation levels of the output neurons be greater than a predetermined percentage of the highest activation level. If the output satisfies both conditions, the character is classified into the class associated with the unit with the highest activation level in the output layer; otherwise, the character is rejected (Ergenzinger and Thomsen, 1995; Karras and Perantonis, 1995).
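The two reject rules can be sketched as follows; the threshold values are illustrative, since the paper does not state the ones actually used.

```python
def accept_or_reject(outputs, t_abs=0.5, t_rel=0.2):
    """Two-rule reject policy (sketch; t_abs and t_rel are illustrative values).

    Rule 1: the highest output activation must exceed the absolute threshold t_abs.
    Rule 2: the gap to the second highest activation must exceed the fraction
            t_rel of the highest activation.
    Returns the index of the winning output unit, or None if the input is rejected.
    """
    order = sorted(range(len(outputs)), key=lambda i: outputs[i], reverse=True)
    top, second = outputs[order[0]], outputs[order[1]]
    if top <= t_abs:                      # rule 1 fails: activation too low
        return None
    if (top - second) <= t_rel * top:     # rule 2 fails: margin too small
        return None
    return order[0]
```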
Two experiments were conducted. In experiment one, a three-layer feed-forward neural network was employed as the character recognizer; the network is fully connected between adjacent layers. The global features encoded with 98 bits and the local features encoded with 8 bits are fed together into the input layer. The recognition network has 10 output units (standing for the characters 0–9) and one hidden layer of 20 units. Following the rule of thumb that a connection weight can easily learn 1.5 bits of information (Lang and Witbrock, 1988), the adopted 106 × 20 × 10 network structure has only 106 × 20 + 20 × 10 = 2320 weights, which require about 2320 × 1.5 = 3480 training samples.
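The quoted figures follow directly from the 106 × 20 × 10 structure (bias terms are not counted, matching the numbers in the text):

```python
# Back-of-the-envelope check of the 106-20-10 network structure.
n_in, n_hidden, n_out = 106, 20, 10
n_weights = n_in * n_hidden + n_hidden * n_out   # 106*20 + 20*10 = 2320 connection weights
n_samples_needed = int(n_weights * 1.5)          # 2320 * 1.5 = 3480 training samples
print(n_weights, n_samples_needed)               # -> 2320 3480
```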
The networks were first trained using the training samples of Data 1 (5000 characters), and recognition was then performed on the testing samples of both Data 1 and Data 2. The training samples of Data 2 (2500 characters) were then additionally applied to the networks to continue the training, and recognition was again performed on both testing sets. The overall recognition rates are tabulated in Table 2.
Table 2
The handwritten numeral recognition rate by ANN

Training samples                 Testing samples    Recognition rate (%)    Rejection rate (%)    Misrecognition rate (%)
Data 1 (5000)                    Data 1 (5000)      96.70                   1.95                  1.35
Data 1 (5000)                    Data 2 (2500)      95.50                   2.50                  2.00
Data 1 (5000) + Data 2 (2500)    Data 1 (5000)      97.60                   1.30                  1.10
Data 1 (5000) + Data 2 (2500)    Data 2 (2500)      96.10                   1.60                  2.30
Table 3
Coarse classification recognition rate by using the ANN

Training samples                 Testing samples    Recognition rate (%)    Rejection rate (%)    Misrecognition rate (%)
Data 1 (5000)                    Data 1 (5000)      98.90                   0.65                  0.45
Data 1 (5000)                    Data 2 (2500)      98.60                   0.80                  0.60
Data 1 (5000) + Data 2 (2500)    Data 1 (5000)      99.20                   0.45                  0.35
Data 1 (5000) + Data 2 (2500)    Data 2 (2500)      98.80                   0.70                  0.50
Table 4
The number of nodes in the three decision trees

Type of tree    Number of nodes in the pruned tree
{1, 7}          15
{2, 3, 5}       63
{4, 9}          31
In experiment two, the same network as in experiment one is employed as the coarse character recognizer shown in Fig. 8, the difference being that the output layer includes only six units representing the six subsets {0}, {6}, {8}, {1, 7}, {2, 3, 5}, {4, 9}. Table 3 gives the coarse recognition rate; it shows that the recognition rate improves greatly because only six patterns have to be classified. The remaining character subsets are further recognized by the decision tree classifiers.
An iterative growing and pruning algorithm is employed to construct three decision tree classifiers. During the decision tree growing and pruning procedure, the training data in Data 1 and Data 2 were divided into two equal sub-training sets and used to iteratively split and prune the trees. Table 4 lists the average number of nodes in the three trees. The overall handwritten numeral recognition rates on the testing sets of Data 1 and Data 2 are shown in Table 5, and the recognition rate tendency with respect to the training sample size of this hybrid classification system is visualized in Fig. 9.

Table 5
The handwritten numeral recognition rate by the hybrid classifier

Training samples                 Testing samples    Recognition rate (%)    Rejection rate (%)    Misrecognition rate (%)
Data 1 (5000)                    Data 1 (5000)      97.80                   1.35                  0.85
Data 1 (5000)                    Data 2 (2500)      97.60                   1.25                  1.15
Data 1 (5000) + Data 2 (2500)    Data 1 (5000)      98.10                   1.15                  0.75
Data 1 (5000) + Data 2 (2500)    Data 2 (2500)      97.90                   1.20                  0.90

Fig. 9. Recognition rate versus sample size for the ANN method and the hybrid decision tree method.
6. Conclusions
A good classifier combined with a stable and flexible feature extraction method is the most important factor in pattern recognition, especially for OCR. In this work, the geometrical features of characters are exploited in detail. First, a set of global features is defined and extracted; then a novel floating feature detector is developed. The former depicts the character's global geometrical features, while the latter describes its local features. A hybrid classifier is proposed which comprises two sub-classifiers, namely an ANN coarse classifier and three decision tree fine classifiers. In the coarse classifier, the characters with large differences in the features extracted by the proposed method are recognized directly; the characters with similar geometric structures need to be classified further.
An iterative growing and pruning tree algorithm is adopted to build three decision trees for recognizing the remaining character subsets. Compared to other classifier fusion methods, the proposed hybrid classifier combines two entirely different classification methods (ANN and DT) into a two-level hierarchical recognizer. For each recognizer, the training procedure is carried out individually, and the interference between different data sets is minimized. The whole system is flexible and easily adjustable. Experiments demonstrated that our proposed system improves the character recognition rate compared with systems that use only a feed-forward neural network with the back-propagation learning algorithm.
Appendix A
References
Amin, A., Al-Sadoun, H., Fischer, S., 1996. Hand-printed Arabic character recognition system using an artificial network. Pattern Recognition 29 (4), 663–675.
Amit, Y., Geman, D., Wilder, K., 1997. Joint induction of shape features and tree classifiers. IEEE Trans. Pattern Anal. Machine Intell. 19 (11), 1300–1305.
Blue, J.L., Candela, G.T., Grother, P.J., Chellappa, R., Wilson, C.L., 1994. Evaluation of pattern classifiers for fingerprint and OCR applications. Pattern Recognition 18 (4), 485–501.
Breiman, L., Friedman, J.H., Olshen, R.A., Stone, C.J., 1984. Classification and Regression Trees. Wadsworth, Belmont, CA.
Cai, J., Liu, Z.-Q., 1999. Integration of structural and statistical information for unconstrained handwritten numeral recognition. IEEE Trans. Pattern Anal. Machine Intell. 21 (3), 263–270.
Cao, J., et al., 1995. Recognition of handwritten numerals with multiple feature and multi-stage classifier. Pattern Recognition 28 (2), 153–160.
Cao, J., et al., 1997. A hierarchical neural network architecture for handwritten numeral recognition. Pattern Recognition 30 (2), 289–299.
Cho, S.-B., 1997. Neural-network classifiers for recognizing totally unconstrained handwritten numerals. IEEE Trans. Neural Networks 8 (1), 43–53.
Ergenzinger, S., Thomsen, E., 1995. An accelerated learning algorithm for multilayer perceptrons: optimization layer by layer. IEEE Trans. Neural Networks 6 (1), 31–43.
Gader, P.D., Khabou, M.A., 1996. Automatic feature generation for handwritten digit recognition. IEEE Trans. Pattern Anal. Machine Intell. 18 (12), 1256–1261.
Garris, M.D., Dimmick, D.L., 1996. Form design for high accuracy optical character recognition. IEEE Trans. Pattern Anal. Machine Intell. 18 (6), 653–656.
Gelfand, S.B., Ravishankar, C.S., Delp, E.J., 1991. An iterative growing and pruning algorithm for classification tree design. IEEE Trans. Pattern Anal. Machine Intell. 13 (2), 163–174.
Guo, H., Gelfand, S.B., 1992. Classification trees with neural network feature extraction. IEEE Trans. Neural Networks 3 (6), 923–933.
Heutte, L., Paquet, T., et al., 1998. A structural/statistical feature based vector for handwritten character recognition. Pattern Recognition Lett. 19, 629–641.
Karras, D.A., Perantonis, S.J., 1995. An efficient constrained training algorithm for feedforward networks. IEEE Trans. Neural Networks 6 (6), 1420–1434.
Krishna, R., Sivakumar, G., Bhattacharya, P., 1999. Extracting decision trees from trained neural networks. Pattern Recognition 32, 1999–2009.
Lang, K.J., Witbrock, M.J., 1988. Learning to tell two spirals apart. In: Proceedings of the 1988 Connectionist Models Summer School. Morgan Kaufmann, Los Altos, CA, pp. 52–59.
Quinlan, J.R., 1986. Induction of decision trees. Machine Learning 1, 81–106.
Safavian, S.R., Landgrebe, D., 1991. A survey of decision tree classifier methodology. IEEE Trans. Systems, Man Cybernet. 21 (3), 660–674.
Sethi, I.K., 1995. Neural implementation of tree classifiers. IEEE Trans. Systems, Man Cybernet. 25 (8), 1243–1249.
Sethi, I.K., Sarvarayudu, G.P.R., 1982. Hierarchical classifier design using mutual information. IEEE Trans. Pattern Anal. Machine Intell. 4, 441–445.
Trier, Ø.D., Jain, A.K., Taxt, T., 1996. Feature extraction methods for character recognition – a survey. Pattern Recognition 29 (4), 641–662.
Wang, Q.R., Suen, C.Y., 1987. Large tree classifier with heuristic search and global training. IEEE Trans. Pattern Anal. Machine Intell. 9 (1), 91–102.
Weideman, W.E., Manry, M.T., Yau, H.-C., Gong, W., 1995. Comparisons of a neural network and a nearest-neighbor classifier via the numeric handprint recognition problem. IEEE Trans. Neural Networks 6 (6), 1524–1530.
Zhang, B., Fu, M., Yan, H., Jabri, M.A., 1999. Handwritten digit recognition by adaptive-subspace self-organizing map. IEEE Trans. Neural Networks 10 (4), 939–953.