A Class-modularity for Character Recognition
Il-Seok Oh*, Jin-Seon Lee**, and Ching Y. Suen***
* Division of Electronics and Information Engineering, Chonbuk National University, Korea
** Department of Computer Engineering, Woosuk University, Korea
*** Centre for Pattern Recognition and Machine Intelligence, Concordia University, Canada
Abstract
A class-modular classifier can be characterized by two prominent features: low classifier complexity and independence of classes. While conventional character recognition systems adopting the class modularity are faithful to the first feature, they do not exploit the second one. Since a class can be handled independently of the other classes, a class-specific feature set and classifier architecture can be optimally designed for each class. Here we propose a general framework for the class modularity that fully exploits both features and present four types of class-modular architecture. The neural network classifier is used to test the framework. A simultaneous selection of the feature set and network architecture is performed by a genetic algorithm. The effectiveness of the class-specific features and classifier architectures is confirmed by experimental results on the recognition of handwritten numerals.
Keywords: class-specific feature selection, class-specific architecture selection, modular neural network, genetic
algorithm, handwritten numeral recognition
I. Introduction
In the class-modular concept, the original K-classification problem is decomposed into K 2-classification
subproblems. A modular architecture is adopted which consists of K subclassifiers, each responsible for
discriminating a class from the other K-1 classes. An integration module collects the outputs from K 2-classifiers and
decides the final output of the total classifier [Oh01].
The class-modular classifier has been shown to achieve a recognition performance superior to that of conventional non-modular structures. The class modularity has been tested with neural network classifiers [Tsay92, Mui94, Anand95, Oh01] and with rule-based classifiers [Suen92, Hong96]. All of these studies presented experimental data supporting its superiority in character recognition. The most recent paper among them reports a significant improvement in various character recognition problems ranging from small-set classification of the numerals (10 classes) to large-set classification of a subset of Korean characters (352 classes) [Oh01].
The most prominent features of the class modularity can be highlighted in two ways.
- Low classifier complexity: Each of the 2-classifiers has a much smaller number of parameters to be estimated by the training process. Since a 2-classifier uses the whole training set for its training, the parameter estimation can be accomplished in a more precise and stable manner. Oh and Suen presented experimental results on a variety of handwriting recognition problems and drew an important conclusion that the class-modular neural network converges much faster and to a more stable state than the non-modular neural network, both in MSE and in recognition rate [Oh01].
- Independence of classes: Each of the 2-classifiers can be designed and trained independently of the other classes. In the most general form, the 2-classifier module for a specific class can be viewed as a black box by the other K-1 classes, and its sole purpose is to discriminate the input samples with a very high precision. We can design the 2-classifier modules so that each has its own feature set and classifier architecture.
The papers reviewed above exploit only the first feature of the class modularity. Though they achieved an improvement, there is still room for further improvement by using the second feature. Oh et al. proposed a scheme of class-specific feature selection for the recognition of handwritten numerals [Oh99]. The class-specific features resulted in a significant improvement in recognition rates on the well-known CENPARMI and CEDAR handwritten numeral databases.
In this paper, we propose a general framework for the class modularity in the domain of character recognition.
Four types of architecture will be presented. In the most restricted form called Type 1, all the classes share a common
feature vector, and use a common classifier architecture. In Type 2, individual classes have their own feature vectors,
but a common classifier architecture is used. In Type 3, a feature vector is shared, but each class uses a different
classifier architecture. In the most general form called Type 4, classes have their own feature vectors and use
different classifier architectures.
Several important issues will be raised and discussed. The separate design of the classes poses new problems, such as the selection of class-specific features and classifier architectures, global optimization after locally optimizing each class, and the integration of outputs from the 2-classifiers. For example, as the heterogeneity of the 2-classifiers increases, a more complex integration scheme must be devised. Nevertheless, the class modularity concept opens a new way of improving the recognition performance of character recognition systems, and the choice of the independence level of the classes is entirely at our disposal.
To demonstrate the effectiveness of the class modularity, the neural network classifier is used. We choose the FFMLP (Feed-Forward Multi-Layer Perceptron) because it is one of the most popular classification schemes in current character recognition applications. Using the handwritten numeral recognition problem, Type 1 and Type 4 architectures are implemented and compared. In Type 4, each class is allowed to select its own feature subset and FFMLP architecture. For the feature and network architecture selection, we use a GA (Genetic Algorithm). Both the feature and the network architecture (i.e., number of hidden nodes) information are encoded in a binary chromosome, so the two selections are performed simultaneously. Crossover and mutation operators appropriate for our selection process are designed.
The experimental data are collected so that we can compare the recognition performance of the Type 1 and Type 4 architectures. Several performance graphs illustrate a clear superiority of Type 4 over Type 1. Since there still remains much room for raising the class independence, we assert that the class modularity concept is worthy of further investigation.
In Section II, a general framework for the class modularity is proposed and four types of architecture are presented. In Section III, the GA for the feature and network architecture selection is described. Section IV presents experimental results. Finally, Section V gives the concluding remarks.
II. Class Modularity in a General Framework
2.1 Conventional framework of character recognition
The conventional character recognition system is depicted in Figure 1. It consists of two major modules, one for
the feature extraction and the other for the classification. The vectors V, X, and D represent the measurement, feature,
and class decision vectors, respectively. The task of transforming d-dimensional X into K-dimensional D is denoted
by K-classification, and a program module for the K-classification is denoted by K-classifier. To achieve an effective
transformation from X to D, the classifier contains a set of free parameters denoted by P. For the Bayes classifier, P is
a set of statistical variables related to the means and covariance matrices to be estimated. For the neural network
classifier, the weights on the arcs connecting the nodes constitute the parameter set P. In handwriting recognition, the size of P is usually in the thousands or tens of thousands to accommodate the large variation of handwritten characters. For large-set classification, it grows to several hundred thousand. The classifier should be trained prior to recognition. Training is the procedure that adjusts P using a set of training samples. Let us denote the set of samples from the class ωi by Zi = {zi0, zi1, ..., zi(Ni-1)}, where Ni is the number of samples belonging to ωi.
After the classifier has been trained, the recognition procedure evaluates a formula involving the adjusted P in order to produce D. The class decision module determines a unique class label using D. The algorithm which chooses the class with the maximum value in D is the simplest and most popular one. There are other class decision algorithms which could improve the final recognition performance.
One of the distinct properties of the conventional framework is that all the K classes share the same feature
space and the same classifier architecture. The essential task in designing a character recognition system is to choose
a feature type with a good discriminative power and a classification algorithm discriminating the classes well in the
chosen feature space. Figure 2 illustrates examples for the task.
[Figure 1: block diagram of the conventional architecture. The measurement vector V (variable size) enters the feature extraction module, which produces the d-dimensional feature vector X; the K-classification module maps X to the K-dimensional decision vector D.]
Figure 1. Conventional architecture viewed as an unstructured black box.
[Figure 2: artificial scatter plots of four classes a, b, c, and d. Panel (a), in the (x1, x2) feature space, shows a good choice of features with well-separated classes; panel (b), in the (x3, x4) feature space, shows a bad choice with large overlaps, although class d is still separable from the others.]
Figure 2. Artificial sample distributions in feature space.
The first situation, in Figure 2(a), illustrates a good choice of features. The four classes, a, b, c, and d, can be perfectly discriminated using the feature vector X1=(x1,x2) and a linear classifier. However, this case rarely happens in the actual character recognition domain. Due to the considerable variability of handwritten characters, Figure 2(b) depicts the usual situation, where the different classes have large overlaps in the feature space. It is not an easy task to design an accurate K-classifier in this situation.
In Figure 2(b), one notable aspect is that while the feature vector X2=(x3,x4) is not good at discriminating all four classes, it is good at least at discriminating class d from the other three classes. However, since in the conventional architecture shown in Figure 1 all the classes share the feature space and the classifier, the feature vector X2 will not produce a good recognition performance. We must either add more features to X2 or give it up and find new kinds of features with a better discriminative power. Repeatedly adding or replacing features is a tedious task with practical limitations. In this sense, we argue that the conventional architecture has a rigid structure composed of two unstructured black boxes, as seen in Figure 1, for the feature and classification modules, in which all the K classes are intermingled within a box. The modules cannot be modified or optimized locally for each class.
2.2 General framework for the class modularity
A motivation for the class modularity can be found in the observation that the feature vector X2 in Figure 2(b) is good at discriminating the class d from the other three classes. In this situation, a linear function is enough for discriminating the patterns of the class set {d} from those of the class set {a, b, c}. For this partial classification, the feature vector X2 is so important that we must keep it.
Figure 3 illustrates another artificial situation where the classes are well discriminated in different feature spaces. Here the feature vector (x1,x2) and a linear classifier are good for classes a and b. The other feature vectors, (x3,x4) and (x5,x6), are designed for classes c and d, respectively. For the class d, a quadratic classifier should be designed. Although this is a simple artificial situation, it explains well the rationale and motivation for the class modularity.
[Figure 3: artificial scatter plots of the four classes in three different feature spaces. Panel (a), in (x1, x2), separates classes a and b well; panel (b), in (x3, x4), separates class c well; panel (c), in (x5, x6), separates class d well.]
Figure 3. An artificial situation motivating the class modularity.
In the class-modular classification, the K-classification problem is decomposed into K 2-classification
subproblems, one for each of the K classes. A 2-classification subproblem is solved by the 2-classifier specifically
designed for the corresponding class. The 2-classifier is only responsible for one specific class and discriminates that
class from the other K-1 classes. In the class-modular framework, K 2-classifiers solve the original K-classification
problem cooperatively and the class decision module integrates the outputs from the K 2-classifiers.
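To make the integration step concrete, a minimal sketch follows (Python is used here purely for illustration; the paper does not prescribe an implementation). It assumes each trained 2-classifier exposes a hypothetical score(x) method returning its estimated membership of x in its own class, and applies the simple maximum-value decision rule discussed above.

import numpy as np

def classify(x, two_classifiers):
    # two_classifiers is a list of K trained 2-classifiers;
    # two_classifiers[i].score(x) returns the estimated membership of x
    # in the class of module i (a value in [0, 1]).
    # The simplest and most popular decision rule picks the class
    # with the maximum score.
    scores = np.array([c.score(x) for c in two_classifiers])
    return int(np.argmax(scores))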
Both the feature and the classification modules in Figure 1 are targets for applying the class modularity. We denote the 2-classifier for the class ωi by 2-C(ωi); it considers two classes, Ω0 and Ω1, constructed by dividing the K original classes into two groups: Ω0 = {ωi} and Ω1 = {ωk | 0 ≤ k < K, k ≠ i}. The role of 2-C(ωi) is to determine the membership probabilities of the two classes Ω0 and Ω1 for an input pattern. A schematic diagram of 2-C(ωi) is shown in Figure 4. Note that D has two output lines because 2-C(ωi) is a 2-classifier. The vector D = (d0, d1) should represent the probabilities that the input pattern belongs to the classes Ω0 and Ω1. There is another option for the number of output lines of a 2-classifier. Since the output can be binary, we can use only one line; in this case, the value 1 means membership in Ω0 and the value 0 membership in Ω1. We can choose whichever of the two options is appropriate for the application.
[Figure 4: the feature vector X enters 2-C(ωi); in (a) the classifier has two output lines d0 and d1, in (b) it has a single output line.]
Figure 4. A schematic diagram of 2-C(ωi).
[Figure 5: block diagrams of the four class-modular architectures. (a) Type 1: a common feature extractor Fcommon maps V to Xcommon, which feeds K homogeneous modules 2-C(0)common, ..., 2-C(K-1)common. (b) Type 2: class-specific feature extractors F(ω0), ..., F(ωK-1) produce Xω0, ..., XωK-1, which feed homogeneous 2-classifiers. (c) Type 3: a common feature vector Xcommon feeds heterogeneous 2-classifiers 2-C(0), ..., 2-C(K-1). (d) Type 4: class-specific feature extractors feed heterogeneous 2-classifiers. The collected outputs form Dhomogeneous or Dheterogeneous, respectively.]
Figure 5. Four types of the class-modular architecture.
The K-classifier is composed of K modules 2-C(ωi), 0 ≤ i < K. The four types of class-modular architecture are summarized in Figure 5. Type 1, in Figure 5(a), is the most restricted architecture, which uses a common feature set and a common classifier architecture for all K classes. Fcommon and Xcommon represent the commonly used feature extractor and feature vector, respectively. The 2-C(ωi)common is a 2-classifier that uses the same classifier architecture as the other classes, but the K modules 2-C(ωi)common have their own classifier parameter sets and must be trained and managed independently, since the architecture has the class modularity. The Type 2 architecture allows the classes to have their own feature vectors. F(ωi) and Xωi represent the feature extractor and the feature vector for the class ωi, respectively. For example, class 0 uses mesh features, class 1 uses contour direction features, and so on, but they share the same classifier architecture. In the Type 3 architecture, a feature vector is shared by all the classes while different classifier architectures are used by the different classes. For example, while using a common mesh feature vector, class 0 uses a 3-layer FFMLP, class 1 uses a rule-based classifier, and so on. When several classes use the FFMLP, they can use different network structures, such as different numbers of hidden layers and hidden nodes, by determining the optimal values of those network parameters for each class. Type 4 is the most general architecture, allowing the classes to have both their own feature vectors and their own classifier architectures.
Each of 2-C(i) is trained independently of other classes. The same training algorithm as the original Kclassifier is applicable to each of 2-classifiers. What we should do is to prepare the training set for each of K 2classifiers. To train the 2-C(i), we reorganize the samples in the original training set into two groups, Z 0 and Z1
such that Z0 has the samples from the classes in 0 and Z1 those from the classes in 1, i.e.,
Z 0    Zi and Z 1    Zi .
i
0
i
1
Finishing the preparation of the training set for 2-C(i), the same training algorithm as the nonmodular Kclassifier is applied to train the 2-C(i). When all the classes complete their training, the whole training process is
complete. After the completion, we save the K sets of 2-classifier parameters and use them for the recognition at the
operational stage.
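As an illustration of this reorganization, the following Python sketch builds Z0 and Z1 for a given class; the function name and the (samples, labels) representation are assumptions for the example, not details from the paper.

def make_two_class_sets(samples, labels, target_class):
    # Split a K-class training set into Z0 and Z1 for 2-C(omega_i).
    # Z0 collects the samples of the target class (Omega_0 = {omega_i});
    # Z1 collects the samples of all the remaining K-1 classes (Omega_1).
    Z0 = [x for x, y in zip(samples, labels) if y == target_class]
    Z1 = [x for x, y in zip(samples, labels) if y != target_class]
    return Z0, Z1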
2.3 Class-modular FFMLP
In Figure 6(a), we can see a schematic diagram for a 2-classifier. It has one input layer, one hidden layer, and
one output layer. The d-m-1 represents the network architecture where d, m, and 1 denote the numbers of input,
hidden, and output nodes, respectively. The three layers are fully connected. The input layer has d nodes to accept the
d-dimensional feature vector. We use a 2-classifier with one output node. The number of hidden nodes is dynamically
determined by the network architecture selection process.
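For concreteness, the following numpy sketch shows one possible d-m-1 FFMLP 2-classifier with a single sigmoid output trained by plain gradient descent on the MSE; it is only a stand-in under these assumptions, since the paper does not specify the training algorithm or its parameter settings.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class TwoClassMLP:
    # A d-m-1 feed-forward MLP with one hidden layer and one output node.
    def __init__(self, d, m, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(0.0, 0.1, size=(d, m))   # input -> hidden weights
        self.b1 = np.zeros(m)
        self.W2 = rng.normal(0.0, 0.1, size=(m, 1))   # hidden -> output weights
        self.b2 = np.zeros(1)

    def forward(self, X):
        self.H = sigmoid(X @ self.W1 + self.b1)        # hidden activations
        self.Y = sigmoid(self.H @ self.W2 + self.b2)   # output in [0, 1]
        return self.Y

    def train(self, X, t, epochs=100, lr=0.1):
        # Minimize the MSE between the output and the 0/1 target vector t.
        t = np.asarray(t, dtype=float).reshape(-1, 1)
        for _ in range(epochs):
            Y = self.forward(X)
            dY = (Y - t) * Y * (1 - Y)                         # output delta
            dH = (dY @ self.W2.T) * self.H * (1 - self.H)      # hidden delta
            self.W2 -= lr * self.H.T @ dY / len(X)
            self.b2 -= lr * dY.mean(axis=0)
            self.W1 -= lr * X.T @ dH / len(X)
            self.b1 -= lr * dH.mean(axis=0)

    def score(self, x):
        # Membership estimate for the 2-classifier's own class (Omega_0).
        return float(self.forward(x.reshape(1, -1))[0, 0])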
The total classifiers of Type 1 and Type 4 for the numeral recognition are shown in Figures 6(b) and 6(c), respectively. The Type 1 classifier has a feature selector (denoted by F-selector) commonly used by the 10 classes. It selects k out of the d features, where k ≤ d. The 10 2-classifiers use the same network architecture, i.e., k-m-1. In the Type 4 classifier, each class has its own feature selector (denoted by F-selector(i)) and uses the class-specific feature subset whose dimension is ki, 0 ≤ i ≤ 9. Additionally, the 10 2-classifiers use different network architectures, ki-mi-1, 0 ≤ i ≤ 9.
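A minimal sketch of how the Type 4 total classifier could be assembled from class-specific feature masks (playing the role of F-selector(i)) and class-specific networks is shown below; it reuses the hypothetical TwoClassMLP from the previous sketch and is illustrative only.

import numpy as np

class Type4Classifier:
    # Total classifier of Type 4: each class i has its own feature mask
    # (the F-selector(i)) and its own k_i-m_i-1 network.
    def __init__(self, masks, networks):
        self.masks = masks          # list of boolean arrays of length d
        self.networks = networks    # list of trained TwoClassMLP objects

    def classify(self, x):
        # Apply the class-specific F-selector(i) before each 2-classifier,
        # then take the class with the maximum membership score.
        scores = [net.score(x[mask])
                  for mask, net in zip(self.masks, self.networks)]
        return int(np.argmax(scores))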
[Figure 6: (a) a single 2-classifier 2-C(ωi) with architecture d-m-1; (b) the total classifier of Type 1, in which the d-dimensional X passes through one common F-selector producing k features shared by the 10 modules 2-C(0), ..., 2-C(9), each with architecture k-m-1; (c) the total classifier of Type 4, in which each class i has its own F-selector(i) producing ki features for its module 2-C(i) with architecture ki-mi-1.]
Figure 6. Class-modular FFMLPs of Type 1 and Type 4.
III. Feature and Architecture Selection for Class-modular Neural Network by GA
3.1 GA for feature and network architecture selection
The feature selection is a process of selecting k features out of d features, where k is smaller than d. Its primary purposes are to gain computational efficiency and to improve the classification accuracy by removing useless and/or redundant features. It is well known that the optimal selection is computationally infeasible due to the exponential search space, so the available algorithms seek a sub-optimal solution. We provide two options depending on how k is specified. When an exact integer is given, we call it the number-based selection; if an integer range [L, H] is given, we call it the range-based selection. In the range-based scheme, we should select a feature subset whose cardinality is within the range.
As stated in Subsection 2.3, we use an FFMLP with one hidden layer for the classification, so the classifier architecture can be represented by I-M-O. The number I is the same as the number of features and is fixed by the feature selection. In our class-modular architecture, a 2-classifier has one output node, as seen in Figure 4(b), so O is 1. The number of hidden nodes (i.e., M) plays an important role in designing an accurate FFMLP classifier [Looney97]. The network architecture selection process in this paper attempts to estimate an optimal M.
One distinguishing characteristic of the selection processes is that in Type 4 the classes perform the selections separately to design their own feature vectors and network architectures. The other is the simultaneous selection of the feature set and the network architecture by a GA.
The reasons for choosing a GA for the selection processes are as follows.
- The GA's effectiveness for feature selection and optimal parameter estimation has been confirmed by many researchers [Yang98, Raymer00].
- By encoding the feature subset and the network architecture information in one chromosome, a simultaneous optimization of both selections can be accomplished.
- A two-level optimization, performing first a local optimization of the 2-classifiers and subsequently a global optimization of the total classifier, is applicable.
The last argument needs more explanation. The feature and network architecture selections render the best 2-classifier for each class. However, since the 2-classifiers are locally optimized independently of the other 2-classifiers, the total classifier composed of them is not guaranteed to be globally optimal. So, in the two-level optimization scheme, a global optimization of the total classifier is attempted. Since the GA maintains multiple 2-classifiers for each class in its populations, we can construct the total classifier in a variety of ways by choosing one 2-classifier for each class and integrating them. One method is to keep the top n 2-classifiers for each class and form the n^K combinations of them, where K is the number of classes. Since for a large n the search space grows exponentially, we can use a GA to find a sub-optimal total classifier.
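As an illustrative sketch of this two-level idea (the paper leaves it as future work), the exhaustive search over the n^K combinations might look as follows; the evaluation set and the score(x) interface are assumptions, and for large n the paper suggests a GA instead of this brute-force loop.

import itertools
import numpy as np

def best_total_classifier(top_n_per_class, X_eval, y_eval):
    # top_n_per_class[i] is a list of the n best 2-classifiers for class i;
    # each classifier exposes score(x) as in the earlier sketches.
    # Exhaustively scores all n^K combinations and returns the best one;
    # feasible only for small n and K.
    best_combo, best_rate = None, -1.0
    for combo in itertools.product(*top_n_per_class):
        preds = [int(np.argmax([c.score(x) for c in combo])) for x in X_eval]
        rate = float(np.mean(np.array(preds) == np.array(y_eval)))
        if rate > best_rate:
            best_combo, best_rate = combo, rate
    return best_combo, best_rate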
3.2 GA implementation
Five design issues must be considered in implementing a GA for our problem [Michalewicz96].
(1) Chromosome representation: In order to select the feature subset and the network architecture simultaneously, we design a binary chromosome as shown in Figure 7. In the first part of the chromosome, a binary digit is assigned to each feature; it has the value 1 if the feature is selected and 0 otherwise. The second part encodes the number of hidden nodes in binary notation. Since a neural network with zero hidden nodes does not function, the number of hidden nodes is the encoded value plus one. As an example, assume an 8-dimensional feature vector and a maximum of 16 hidden nodes. Then the first part has 8 bits and the second part has 4 bits. The chromosome '10000100/1010' represents the information that the first and sixth features are selected and the network has 11 hidden nodes.
C = feature_selection_info / number_hidden_nodes_info = F / H = (f1 f2 ... fd / m3 m2 m1 m0),
fi = 1 (selected) or 0 (not selected), number of hidden nodes = (m0 + 2·m1 + 4·m2 + 8·m3) + 1
Figure 7. A chromosome.
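A small decoding sketch for this chromosome layout is given below, assuming d feature bits followed by 4 hidden-node bits with the most significant bit first, as in the '10000100/1010' example.

def decode_chromosome(chromosome, d):
    # Decode a binary chromosome F/H of length d + 4.
    # The first d bits select features (1 = selected); the last 4 bits
    # encode the number of hidden nodes, MSB first, offset by one so that
    # a network always has at least one hidden node.
    feature_bits = chromosome[:d]
    hidden_bits = chromosome[d:]
    selected = [i for i, bit in enumerate(feature_bits) if bit == 1]
    hidden_nodes = int("".join(str(b) for b in hidden_bits), 2) + 1
    return selected, hidden_nodes

# The example from the text: '10000100/1010' -> features 1 and 6, 11 hidden nodes.
sel, m = decode_chromosome([1, 0, 0, 0, 0, 1, 0, 0, 1, 0, 1, 0], d=8)
assert sel == [0, 5] and m == 11   # 0-based indices of features 1 and 6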
(2) Fitness evaluation: We should evaluate the chromosomes in a population in order to give a higher probability of
survival to the fitter chromosomes. In our problem, the evaluation is straightforward since a chromosome has all the
information about the feature and network architecture and the network training gives a recognition rate for the
training set. For a more reasonable performance index, the recognition rate for a validation set can be used. But in
our experimentation, we do not have a separate validation set and so the recognition rate for the training set was used.
In our experiments, we use the range-based selection with a range [L,H]. In order to force a feature subset to be
within the range, we take the range as a constraint and give a penalty to the chromosomes breaking the constraint.
The fitness of a chromosome C = F/H is defined with a linear penalty as follows:
fitness(C) = goodness(C) - penalty(F),
goodness(C) = recognition rate on the training set,
penalty(F) = 0,                       if L ≤ cardinality(F) ≤ H,
penalty(F) = w·(cardinality(F) - H),  if cardinality(F) > H,
penalty(F) = w·(L - cardinality(F)),  if cardinality(F) < L,
in which w is a weight and cardinality(F) is the number of digits with value 1 in F. The linear penalty function is shown in Figure 8.
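A direct transcription of this fitness measure might look as follows; the default weight w=0.01 is purely illustrative, since the paper does not report the value of w.

def penalty(F, L, H, w):
    # Linear penalty of Figure 8: zero inside [L, H], growing linearly outside.
    k = sum(F)                      # cardinality(F): number of selected features
    if k > H:
        return w * (k - H)
    if k < L:
        return w * (L - k)
    return 0.0

def fitness(recognition_rate, F, L, H, w=0.01):
    # fitness(C) = goodness(C) - penalty(F), where goodness(C) is the
    # recognition rate on the training set for the decoded classifier.
    return recognition_rate - penalty(F, L, H, w)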
(3) GA operators: We use the standard crossover and mutation operators described in [Michalewicz96]. The operators are applied separately to the two parts, F and H, of a chromosome. The mutation of the part F is performed so that the number of 0-to-1 conversions equals the number of 1-to-0 conversions, which keeps the cardinality of the chromosome unchanged and thus helps it satisfy the range constraint. The mutation of the part H is done in such a way that the mutation probability decreases as the generation increases and as a bit becomes more significant. Since a change in a more significant bit has a stronger influence on the number of hidden nodes, we give a lower probability to such a bit. Figure 9 illustrates the dynamic mutation probabilities.
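The F-part and H-part mutations might be sketched as below. The balanced 1-to-0 / 0-to-1 swap for F preserves the cardinality of the selected subset; the decay schedule used for H is an assumption, since the paper only states that the probability decreases with the generation and with bit significance.

import random

def mutate_F(F, n_swaps=1):
    # Flip the same number of 1->0 and 0->1 bits so that cardinality(F)
    # is unchanged and the range constraint stays satisfied.
    ones = [i for i, b in enumerate(F) if b == 1]
    zeros = [i for i, b in enumerate(F) if b == 0]
    for i in random.sample(ones, min(n_swaps, len(ones))):
        F[i] = 0
    for i in random.sample(zeros, min(n_swaps, len(zeros))):
        F[i] = 1
    return F

def mutate_H(H, generation, max_generation, base_pm=0.1):
    # Mutate the hidden-node bits with a probability that decreases with
    # the generation and is lower for more significant bits (H = m3 m2 m1 m0).
    for pos, bit in enumerate(H):               # pos 0 is m3, the most significant bit
        significance = len(H) - pos             # 4 for m3 down to 1 for m0
        pm = base_pm * (1 - generation / max_generation) / significance
        if random.random() < pm:
            H[pos] = 1 - bit
    return H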
[Figure 8: the penalty is zero for cardinalities between L and H and grows linearly outside that range. Figure 9: the mutation probability Pm of the bits m0, m1, m2, m3 decreases with the generation, with lower curves for the more significant bits.]
Figure 8. Penalty function.
Figure 9. Dynamic mutation probabilities for H.
(4) Initial population: When a range [L,H] is given for the feature selection, we generate an initial F such that the probability of a binary digit being 1 is (L+H)/(2d), where d is the total number of features. An initial chromosome is therefore likely to have about (L+H)/2 features selected.
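An illustrative initializer following this rule is shown below; drawing the hidden-node bits uniformly is an assumption, since the paper only describes the initialization of F.

import random

def initial_chromosome(d, L, H, n_hidden_bits=4):
    # Random chromosome F/H whose expected cardinality of F is (L+H)/2.
    p_one = (L + H) / (2.0 * d)
    F = [1 if random.random() < p_one else 0 for _ in range(d)]
    Hbits = [random.randint(0, 1) for _ in range(n_hidden_bits)]
    return F + Hbits

def initial_population(size, d, L, H):
    return [initial_chromosome(d, L, H) for _ in range(size)]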
(5) Parameters: We use the following GA parameters.
Population size = 20, number of generations = 50, crossover probability = 0.6, mutation probability = 0.1.
3.3 GA procedures
The GA procedures for Type 1 and Type 4 are outlined below. Note that in Type 4 the GA process operates on each class separately. After each generation, the total classifier is constructed by choosing the best 2-classifier of each of the 10 classes, and it is evaluated.

Type 1 classifier:

GA_program()
{
  t=0;
  initial_population();
  evaluate_population();
  while (t<Max_Generation) {
    t++;
    select_new_population();
    crossover(); mutation();
    evaluate_population();
  }
}

Type 4 classifier:

GA_program()
{
  t=0;
  for(k=0;k<No_Classes;k++) {
    initial_population(k);
    evaluate_population(k);
  }
  evaluate_total_classifier();
  while (t<Max_Generation) {
    t++;
    for(k=0;k<No_Classes;k++) {
      select_new_population(k);
      crossover(k); mutation(k);
      evaluate_population(k);
    }
    evaluate_total_classifier();
  }
}
IV. Experiments
For our experiments, the CENPARMI handwritten numeral database is used. We use the DDD feature vector, which has 256 dimensions [Oh98]. We let the number of hidden nodes be at most 16, so a chromosome has 256+4=260 bits. We perform two experiments: Experiment 1 with the range [16,32] and Experiment 2 with the range [8,16].
As the generations proceed, we measure the actual recognition rates of the total classifiers on both the training set and the test set. Figures 10 and 11 show the recognition rate curves for Experiments 1 and 2, respectively. Each of Figures 10 and 11 has four curves, one for each combination of classifier (Type 1 or Type 4) and data set (training or test). Type 4 is superior to Type 1 in both experiments. The best recognition rates (i.e., from the fittest chromosome) on the training set in Experiment 1 are 96.07% at the 43rd generation and 98.42% at the 47th generation for Type 1 and Type 4, respectively. The corresponding recognition rates on the test set are 93.50% and 95.70% for Type 1 and Type 4, respectively. A similar analysis is made for Experiment 2. The best rates on the training set are 91.97% at the 22nd generation (89.40% on the test set) and 96.12% at the 48th generation (93.10% on the test set) for Type 1 and Type 4, respectively.
[Figure 10: recognition rates (roughly 86-100%) versus generations 1-51 for four curves: Type 1-train, Type 1-test, Type 4-train, Type 4-test.]
Figure 10. Graphs of recognition rates produced in Experiment 1 (range [16,32]).
[Figure 11: recognition rates (roughly 75-100%) versus generations 1-51 for four curves: Type 1-train, Type 1-test, Type 4-train, Type 4-test.]
Figure 11. Graphs of recognition rates produced in Experiment 2 (range [8,16]).
Table 1 summarizes the performance data as well as the network architectures. An interesting analysis can be made regarding the class-specific feature subsets and classifier architectures. In Type 1, the GA found the optimal network architectures 31-5-1 and 16-6-1 for Experiments 1 and 2, respectively, which are commonly used by the 10 classes. In Type 4, the 10 classes found their own optimal architectures, as shown in the table. In Experiment 1, class 2 uses the smallest number of features and class 7 uses the smallest number of hidden nodes.
The large gaps in recognition rate between Type 1 and Type 4 on the test set (2.20% in Experiment 1 and 3.70% in Experiment 2) are clearly shown in Table 1. We argue that Type 4 gains a significant improvement thanks to the class-specific features and network architectures shown in the table.
Table 1. Summary of performance and network architectures

Exp. 1 (range [16,32])                 Type 1                 Type 4
  Best classifier at generation        43rd                   46th
  Train set                            96.07%                 98.42%
  Test set                             93.50%                 95.70%
  Architectures (I-M-O (class))        31-5-1 (10 classes)    31-5-1(0); 28-3-1(1); 25-7-1(2); 32-7-1(3); 32-8-1(4); 31-7-1(5); 26-7-1(6); 28-4-1(7); 30-5-1(8); 30-6-1(9); 29.3-5.9-1 (average)

Exp. 2 (range [8,16])                  Type 1                 Type 4
  Best classifier at generation        22nd                   48th
  Train set                            91.97%                 96.12%
  Test set                             89.40%                 93.10%
  Architectures (I-M-O (class))        16-6-1 (10 classes)    16-6-1(0); 15-6-1(1); 16-6-1(2); 15-7-1(3); 16-4-1(4); 16-8-1(5); 15-3-1(6); 15-8-1(7); 16-6-1(8); 16-7-1(9); 15.6-6.1-1 (average)
V. Concluding Remarks
A general framework for the class modularity has been presented, along with the motivations for class-specific features and classifier architectures. Among the four types of class-modular architecture, Type 1 and Type 4 were tested on handwritten numeral recognition and their performances were compared. The GA was adopted as a tool for selecting the feature subset and the classifier architecture simultaneously. The experimental results confirm that the classifier designed with class-specific features and network architectures has a superior performance.
Several future directions can be identified. First, we expect that the two-level optimization would yield a further improvement; a GA program for the global optimization of the total classifier should be developed. Second, the class modularity can be applied to other classification methodologies such as k-NN and polynomial classifiers, and its applicability and effectiveness there should be explored. Third, since the 2-classifiers may have different output types and/or strength levels, an advanced integration module must be developed to improve the classification performance. Finally, application to other pattern recognition problems should be investigated.
References
[Anand95] R. Anand, K. Mehrotra, C.K. Mohan, and S. Ranka, “Efficient classification for multiclass problems
using modular neural networks,” IEEE Tr. on Neural Networks, Vol.6, No.1, pp.117-124, 1995.
[Hong96] K.-C. Hong, S.-M. Choi, J.-S. Lee, and I.-S. Oh, “Pipelining multiple algorithms for handwritten numeral
recognition,” Proceedings of 2nd Korean-French Workshop on Handwriting Recognition, pp.95-102, Paris, France,
1996.
[Looney97] C.G. Looney, Pattern Recognition Using Neural Networks, Oxford University Press, NewYork, 1997.
[Michalewicz96] Z. Michalewicz, Genetic Algorithms + Data Structures = Evolution Programs, Springer-Verlag, 1996.
[Mui94] L. Mui, A. Agarwal, A. Gupta, and P.S.P. Wang, "An adaptive modular neural network with application to unconstrained character recognition," in Document Image Analysis (Edited by H. Bunke, P.S.P. Wang, and H.S. Baird), World Scientific, Singapore, pp.1189-1203, 1994.
[Oh98] I.-S. Oh and C. Y. Suen, “Distance features for neural network-based recognition of handwritten characters,”
International Journal on Document Analysis and Recognition, Vol.1, No.2, pp.73-88, 1998.
[Oh99] I.-S. Oh, J.-S. Lee, and C.Y. Suen, “Analysis of class separation and combination of class-dependent features
for handwriting recognition,” IEEE Tr. on PAMI, Vol.21, No.10, pp.1089-1094, October 1999.
[Oh01] I.-S. Oh and C.Y. Suen, "A class-modular feedforward neural network for handwriting recognition," Pattern Recognition (in press).
[Raymer00] M.L. Raymer, W.F. Punch, E.D. Goodman, L.A. Kuhn, and A.K. Jain, "Dimensionality reduction using genetic algorithms," IEEE Tr. on Evolutionary Computation, Vol.4, No.2, July 2000.
[Suen92] C.Y. Suen, C. Nadal, R. Legault, T.A. Mai, and L. Lam, "Computer recognition of unconstrained handwritten numerals," Proceedings of the IEEE, Vol.80, No.7, pp.1162-1180, 1992.
[Tsay92] S.-C. Tsay, P.-R. Hong, and B.-C. Chieu, “Handwritten digits recognition system via OCON neural network
by pruning selective update,” Proceedings of 11th ICPR, pp.656-659, 1992.
[Yang98] J. Yang and V. Honavar, "Feature subset selection using a genetic algorithm," IEEE Intelligent Systems and Their Applications, Vol.13, No.2, pp.44-49, 1998.