A Class-modularity for Character Recognition
Il-Seok Oh*, Jin-Seon Lee**, and Ching Y. Suen***
* Division of Electronics and Information Engineering, Chonbuk National University, Korea
** Department of Computer Engineering, Woosuk University, Korea
*** Centre for Pattern Recognition and Machine Intelligence, Concordia University, Canada
Abstract
A class-modular classifier can be characterized by two prominent features: low classifier complexity and independence of classes. While conventional character recognition systems adopting the class modularity are faithful to the first feature, they do not exploit the second one. Since a class can be handled independently of the other classes, a class-specific feature set and classifier architecture can be optimally designed for each class. Here we propose a general framework for the class modularity that fully exploits both features and present four types of class-modular architecture. The neural network classifier is used to test the framework. A simultaneous selection of the feature set and network architecture is performed by a genetic algorithm. The effectiveness of the class-specific features and classifier architectures is confirmed by experimental results on the recognition of handwritten numerals.
Keywords: class-specific feature selection, class-specific architecture selection, modular neural network, genetic
algorithm, handwritten numeral recognition
I. Introduction
In the class-modular concept, the original K-classification problem is decomposed into K 2-classification
subproblems. A modular architecture is adopted which consists of K subclassifiers, each responsible for
discriminating a class from the other K-1 classes. An integration module collects the outputs from K 2-classifiers and
decides the final output of the total classifier [Oh01].
The class-modular classifier has been shown to achieve a recognition performance superior to that of conventional non-modular structures. The class modularity has been tested with neural network classifiers [Tsay92, Mui94, Anand95, Oh01] and with rule-based classifiers [Suen92, Hong96]. All of these studies presented experimental data supporting its superiority in character recognition. The most recent paper among them reports a significant improvement in various character recognition problems ranging from small-set classification of the numerals (10 classes) to large-set classification of a subset of Korean characters (352 classes) [Oh01].
The most prominent features of the class modularity can be highlighted in two ways.
- Low classifier complexity: Each of the 2-classifiers has a much smaller number of parameters to be estimated by the training process. Since a 2-classifier uses the whole training set for its training, the parameter estimation can be accomplished in a more precise and stable manner. Oh and Suen presented experimental results on a variety of handwriting recognition problems and drew an important conclusion that the class-modular neural network converges much faster and to a more stable state than the non-modular neural network, both in MSE and in recognition rate [Oh01].
- Independence of classes: Each of the 2-classifiers can be designed and trained independently of the other classes. In the most general form, the 2-classifier module for a specific class can be viewed as a black box by the other K-1 classes, and its sole purpose is to discriminate the input samples with a very high precision. We can design the 2-classifier modules so that each has its own feature set and classifier architecture.
The papers reviewed above exploit only the first feature of the class modularity. Though they achieved an improvement, there is still room for further improvement by using the second feature. Oh et al. proposed a scheme of class-specific feature selection for the recognition of handwritten numerals [Oh99]. The class-specific features resulted in a significant improvement in recognition rates on the well-known CENPARMI and CEDAR handwritten numeral databases.
In this paper, we propose a general framework for the class modularity in the domain of character recognition.
Four types of architecture will be presented. In the most restricted form called Type 1, all the classes share a common
feature vector, and use a common classifier architecture. In Type 2, individual classes have their own feature vectors,
but a common classifier architecture is used. In Type 3, a feature vector is shared, but each class uses a different
classifier architecture. In the most general form called Type 4, classes have their own feature vectors and use
different classifier architectures.
Several important issues will be raised and discussed. The separate design of the classes poses new problems, such as the selection of class-specific features and classifier architectures, global optimization after locally optimizing each class, and the integration of outputs from the 2-classifiers. For example, as the heterogeneity of the 2-classifiers increases, a more complex integration scheme must be devised. Nevertheless, the class modularity concept opens a new way of improving the recognition performance of character recognition systems, and the choice of the independence level of the classes is entirely at our disposal.
To demonstrate the effectiveness of the class modularity, the neural network classifier is used. We choose the FFMLP (Feed-Forward Multi-Layer Perceptron) because it is one of the most popular classification schemes in current character recognition applications. Using the handwritten numeral recognition problem, Type 1 and Type 4 architectures are implemented and compared. In Type 4, each class is allowed to select its own feature subset and FFMLP architecture. For the feature and network architecture selection, we use a GA (Genetic Algorithm). Both the feature and the network architecture (i.e., number of hidden nodes) information are encoded in a binary chromosome, so the two selections are performed simultaneously. Crossover and mutation operators appropriate for our selection process are designed.
The experimental data are collected so that we can compare the recognition performance of the Type 1 and Type 4 architectures. Several performance graphs illustrate a clear superiority of Type 4 over Type 1. Since there still remains much room for raising the class independence, we assert that the class modularity concept is worthy of further investigation.
In Section II, a general framework for the class modularity is proposed and four types of architecture are presented. In Section III, the GA for the feature and network architecture selection is described. Section IV presents experimental results. Finally, Section V gives the concluding remarks.
II. Class Modularity in a General Framework
2.1 Conventional framework of character recognition
The conventional character recognition system is depicted in Figure 1. It consists of two major modules, one for
the feature extraction and the other for the classification. The vectors V, X, and D represent the measurement, feature,
and class decision vectors, respectively. The task of transforming d-dimensional X into K-dimensional D is denoted
by K-classification, and a program module for the K-classification is denoted by K-classifier. To achieve an effective
transformation from X to D, the classifier contains a set of free parameters denoted by P. For the Bayes classifier, P is
a set of statistical variables related to the means and covariance matrices to be estimated. For the neural network
classifier, the weights on the arcs connecting the nodes constitute the parameter set P. In handwriting recognition, the size of P is usually in the thousands or tens of thousands to accommodate the large variation of handwritten characters. For large-set classification, it grows to several hundred thousand. The classifier should be trained prior to recognition. Training is the procedure that adjusts P using a set of training samples. Let us denote the set of samples from the class ωi by Zi = {zi0, zi1, ..., zi(Ni-1)}, where Ni is the number of samples belonging to ωi.
After the classifier has been trained, the recognition procedure evaluates a formula involving the adjusted P in order to produce D. The class decision module determines a unique class label using D. The algorithm which chooses the class with the maximum value in D is the simplest and most popular one. There are other class decision algorithms which could improve the final recognition performance.
One of the distinct properties of the conventional framework is that all the K classes share the same feature
space and the same classifier architecture. The essential task in designing a character recognition system is to choose
a feature type with a good discriminative power and a classification algorithm discriminating the classes well in the
chosen feature space. Figure 2 illustrates examples for the task.
[Figure 1: block diagram of the conventional architecture. The measurement vector V (variable size) enters the feature extraction module, which produces the d-dimensional feature vector X; the K-classification module maps X to the K-dimensional decision vector D.]
Figure 1. Conventional architecture viewed as an unstructured black box.
[Figure 2: artificial scatter plots of four classes a, b, c, and d. Panel (a), in the (x1, x2) feature space, shows a good choice of features with well-separated classes; panel (b), in the (x3, x4) feature space, shows a bad choice with large overlaps, although class d is still separable from the others.]
Figure 2. Artificial sample distributions in feature space.
The first situation, in Figure 2(a), illustrates a good choice of features. The four classes, a, b, c, and d, can be perfectly discriminated using the feature vector X1=(x1,x2) and a linear classifier. However, this case rarely happens in the actual character recognition domain. Due to the considerable variability of handwritten characters, Figure 2(b) depicts the usual situation, where the different classes have large overlaps in the feature space. It is not an easy task to design an accurate K-classifier in this situation.
In Figure 2(b), one notable aspect is that while the feature vector X2=(x3,x4) is not good at discriminating all four classes, it is good at least at discriminating class d from the other three classes. However, since in the conventional architecture shown in Figure 1 all the classes share the feature space and the classifier, the feature vector X2 will not produce a good recognition performance. We must either add more features to X2 or give it up and find new kinds of features with a better discriminative power. Repeatedly adding or replacing features is a tedious task with practical limitations. In this sense, we argue that the conventional architecture has a rigid structure composed of two unstructured black boxes, as seen in Figure 1, for the feature and classification modules, in which all the K classes are intermingled within a box. The modules cannot be modified or optimized locally for each class.
2.2 General framework for the class modularity
A motivation for the class modularity can be found in the observation that the feature vector X2 in Figure 2(b) is good at discriminating the class d from the other three classes. In this situation, a linear function is enough for discriminating the patterns of the class set {d} from those of the class set {a, b, c}. For this partial classification, the feature vector X2 is so important that we must keep it.
Figure 3 illustrates another artificial situation where the classes are well discriminated in different feature spaces. Here the feature vector (x1,x2) and a linear classifier are good for classes a and b. The other feature vectors, (x3,x4) and (x5,x6), are designed for classes c and d, respectively. For the class d, a quadratic classifier should be designed. Although this is a simple artificial situation, it explains well the rationale and motivation for the class modularity.
[Figure 3: artificial scatter plots of the four classes in three different feature spaces. Panel (a), in (x1, x2), separates classes a and b well; panel (b), in (x3, x4), separates class c well; panel (c), in (x5, x6), separates class d well.]
Figure 3. An artificial situation motivating the class modularity.
In the class-modular classification, the K-classification problem is decomposed into K 2-classification
subproblems, one for each of the K classes. A 2-classification subproblem is solved by the 2-classifier specifically
designed for the corresponding class. The 2-classifier is only responsible for one specific class and discriminates that
class from the other K-1 classes. In the class-modular framework, K 2-classifiers solve the original K-classification
problem cooperatively and the class decision module integrates the outputs from the K 2-classifiers.
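To make the integration step concrete, a minimal sketch follows (Python is used here purely for illustration; the paper does not prescribe an implementation). It assumes each trained 2-classifier exposes a hypothetical score(x) method returning its estimated membership of x in its own class, and applies the simple maximum-value decision rule discussed above.

import numpy as np

def classify(x, two_classifiers):
    # two_classifiers is a list of K trained 2-classifiers;
    # two_classifiers[i].score(x) returns the estimated membership of x
    # in the class of module i (a value in [0, 1]).
    # The simplest and most popular decision rule picks the class
    # with the maximum score.
    scores = np.array([c.score(x) for c in two_classifiers])
    return int(np.argmax(scores))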
Both the feature and the classification modules in Figure 1 are targets for applying the class modularity. We denote the 2-classifier for the class ωi by 2-C(ωi); it considers two classes, Ω0 and Ω1, constructed by dividing the K original classes into two groups: Ω0 = {ωi} and Ω1 = {ωk | 0 ≤ k < K, k ≠ i}. The role of 2-C(ωi) is to determine the membership probabilities of the two classes Ω0 and Ω1 for an input pattern. A schematic diagram of 2-C(ωi) is shown in Figure 4. Note that D has two output lines because 2-C(ωi) is a 2-classifier. The vector D = (d0, d1) should represent the probabilities that the input pattern belongs to the classes Ω0 and Ω1. There is another option for the number of output lines of a 2-classifier. Since the output can be binary, we can use only one line; in this case, the value 1 means membership in Ω0 and the value 0 membership in Ω1. We can choose whichever of the two options is appropriate for the application.
[Figure 4: the feature vector X enters 2-C(ωi); in (a) the classifier has two output lines d0 and d1, in (b) it has a single output line.]
Figure 4. A schematic diagram of 2-C(ωi).
[Figure 5: block diagrams of the four class-modular architectures. (a) Type 1: a common feature extractor Fcommon maps V to Xcommon, which feeds K homogeneous modules 2-C(0)common, ..., 2-C(K-1)common. (b) Type 2: class-specific feature extractors F(ω0), ..., F(ωK-1) produce Xω0, ..., XωK-1, which feed homogeneous 2-classifiers. (c) Type 3: a common feature vector Xcommon feeds heterogeneous 2-classifiers 2-C(0), ..., 2-C(K-1). (d) Type 4: class-specific feature extractors feed heterogeneous 2-classifiers. The collected outputs form Dhomogeneous or Dheterogeneous, respectively.]
Figure 5. Four types of the class-modular architecture.
The K-classifier is composed of K modules 2-C(ωi), 0 ≤ i < K. The four types of class-modular architecture are summarized in Figure 5. Type 1, in Figure 5(a), is the most restricted architecture, which uses a common feature set and a common classifier architecture for all K classes. Fcommon and Xcommon represent the commonly used feature extractor and feature vector, respectively. The 2-C(ωi)common is a 2-classifier that uses the same classifier architecture as the other classes, but the K modules 2-C(ωi)common have their own classifier parameter sets and must be trained and managed independently, since the architecture has the class modularity. The Type 2 architecture allows the classes to have their own feature vectors. F(ωi) and Xωi represent the feature extractor and the feature vector for the class ωi, respectively. For example, class 0 uses mesh features, class 1 uses contour direction features, and so on, but they share the same classifier architecture. In the Type 3 architecture, a feature vector is shared by all the classes while different classifier architectures are used by the different classes. For example, while using a common mesh feature vector, class 0 uses a 3-layer FFMLP, class 1 uses a rule-based classifier, and so on. When several classes use the FFMLP, they can use different network structures, such as different numbers of hidden layers and hidden nodes, by determining the optimal values of those network parameters for each class. Type 4 is the most general architecture, allowing the classes to have both their own feature vectors and their own classifier architectures.
Each of 2-C(i) is trained independently of other classes. The same training algorithm as the original Kclassifier is applicable to each of 2-classifiers. What we should do is to prepare the training set for each of K 2classifiers. To train the 2-C(i), we reorganize the samples in the original training set into two groups, Z 0 and Z1
such that Z0 has the samples from the classes in 0 and Z1 those from the classes in 1, i.e.,
Z 0    Zi and Z 1    Zi .
i
0
i
1
Finishing the preparation of the training set for 2-C(i), the same training algorithm as the nonmodular Kclassifier is applied to train the 2-C(i). When all the classes complete their training, the whole training process is
complete. After the completion, we save the K sets of 2-classifier parameters and use them for the recognition at the
operational stage.
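As an illustration of this reorganization, the following Python sketch builds Z0 and Z1 for a given class; the function name and the (samples, labels) representation are assumptions for the example, not details from the paper.

def make_two_class_sets(samples, labels, target_class):
    # Split a K-class training set into Z0 and Z1 for 2-C(omega_i).
    # Z0 collects the samples of the target class (Omega_0 = {omega_i});
    # Z1 collects the samples of all the remaining K-1 classes (Omega_1).
    Z0 = [x for x, y in zip(samples, labels) if y == target_class]
    Z1 = [x for x, y in zip(samples, labels) if y != target_class]
    return Z0, Z1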
2.3 Class-modular FFMLP
In Figure 6(a), we can see a schematic diagram for a 2-classifier. It has one input layer, one hidden layer, and
one output layer. The d-m-1 represents the network architecture where d, m, and 1 denote the numbers of input,
hidden, and output nodes, respectively. The three layers are fully connected. The input layer has d nodes to accept the
d-dimensional feature vector. We use a 2-classifier with one output node. The number of hidden nodes is dynamically
determined by the network architecture selection process.
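For concreteness, the following numpy sketch shows one possible d-m-1 FFMLP 2-classifier with a single sigmoid output trained by plain gradient descent on the MSE; it is only a stand-in under these assumptions, since the paper does not specify the training algorithm or its parameter settings.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class TwoClassMLP:
    # A d-m-1 feed-forward MLP with one hidden layer and one output node.
    def __init__(self, d, m, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(0.0, 0.1, size=(d, m))   # input -> hidden weights
        self.b1 = np.zeros(m)
        self.W2 = rng.normal(0.0, 0.1, size=(m, 1))   # hidden -> output weights
        self.b2 = np.zeros(1)

    def forward(self, X):
        self.H = sigmoid(X @ self.W1 + self.b1)        # hidden activations
        self.Y = sigmoid(self.H @ self.W2 + self.b2)   # output in [0, 1]
        return self.Y

    def train(self, X, t, epochs=100, lr=0.1):
        # Minimize the MSE between the output and the 0/1 target vector t.
        t = np.asarray(t, dtype=float).reshape(-1, 1)
        for _ in range(epochs):
            Y = self.forward(X)
            dY = (Y - t) * Y * (1 - Y)                         # output delta
            dH = (dY @ self.W2.T) * self.H * (1 - self.H)      # hidden delta
            self.W2 -= lr * self.H.T @ dY / len(X)
            self.b2 -= lr * dY.mean(axis=0)
            self.W1 -= lr * X.T @ dH / len(X)
            self.b1 -= lr * dH.mean(axis=0)

    def score(self, x):
        # Membership estimate for the 2-classifier's own class (Omega_0).
        return float(self.forward(x.reshape(1, -1))[0, 0])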
The total classifiers of Type 1 and Type 4 for the numeral recognition are shown in Figures 6(b) and 6(c), respectively. The Type 1 classifier has a feature selector (denoted by F-selector) commonly used by the 10 classes. It selects k out of the d features, where k ≤ d. The 10 2-classifiers use the same network architecture, i.e., k-m-1. In the Type 4 classifier, each class has its own feature selector (denoted by F-selector(i)) and uses the class-specific feature subset whose dimension is ki, 0 ≤ i ≤ 9. Additionally, the 10 2-classifiers use different network architectures, ki-mi-1, 0 ≤ i ≤ 9.
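A minimal sketch of how the Type 4 total classifier could be assembled from class-specific feature masks (playing the role of F-selector(i)) and class-specific networks is shown below; it reuses the hypothetical TwoClassMLP from the previous sketch and is illustrative only.

import numpy as np

class Type4Classifier:
    # Total classifier of Type 4: each class i has its own feature mask
    # (the F-selector(i)) and its own k_i-m_i-1 network.
    def __init__(self, masks, networks):
        self.masks = masks          # list of boolean arrays of length d
        self.networks = networks    # list of trained TwoClassMLP objects

    def classify(self, x):
        # Apply the class-specific F-selector(i) before each 2-classifier,
        # then take the class with the maximum membership score.
        scores = [net.score(x[mask])
                  for mask, net in zip(self.masks, self.networks)]
        return int(np.argmax(scores))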
[Figure 6: (a) a single 2-classifier 2-C(ωi) with architecture d-m-1; (b) the total classifier of Type 1, in which the d-dimensional X passes through one common F-selector producing k features shared by the 10 modules 2-C(0), ..., 2-C(9), each with architecture k-m-1; (c) the total classifier of Type 4, in which each class i has its own F-selector(i) producing ki features for its module 2-C(i) with architecture ki-mi-1.]
Figure 6. Class-modular FFMLPs of Type 1 and Type 4.
III. Feature and Architecture Selection for Class-modular Neural Network by GA
3.1 GA for feature and network architecture selection
The feature selection is a process of selecting k features out of d features, where k is smaller than d. Its primary purposes are to gain computational efficiency and to improve the classification accuracy by removing useless and/or redundant features. It is well known that the optimal selection is computationally infeasible due to the exponential search space, so the available algorithms seek a sub-optimal solution. We provide two options depending on how k is specified. When an exact integer is given, we call it the number-based selection; if an integer range [L, H] is given, we call it the range-based selection. In the range-based scheme, we should select a feature subset whose cardinality is within the range.
As stated in Subsection 2.3, we use an FFMLP with one hidden layer for the classification, so the classifier architecture can be represented by I-M-O. The number I is the same as the number of features and is fixed by the feature selection. In our class-modular architecture, a 2-classifier has one output node, as seen in Figure 4(b), so O is 1. The number of hidden nodes (i.e., M) plays an important role in designing an accurate FFMLP classifier [Looney97]. The network architecture selection process in this paper attempts to estimate an optimal M.
One distinguishing characteristic of the selection processes is that in Type 4 the classes perform the selections separately to design their own feature vectors and network architectures. The other is the simultaneous selection of the feature set and the network architecture by a GA.
The reasons for choosing a GA for the selection processes are as follows.
- The GA's effectiveness for feature selection and optimal parameter estimation has been confirmed by many researchers [Yang98, Raymer00].
- By encoding the feature subset and the network architecture information in one chromosome, a simultaneous optimization of both selections can be accomplished.
- A two-level optimization, performing first a local optimization of the 2-classifiers and subsequently a global optimization of the total classifier, is applicable.
The last argument needs more explanation. The feature and network architecture selections render the best 2-classifier for each class. However, since the 2-classifiers are locally optimized independently of the other 2-classifiers, the total classifier composed of them is not guaranteed to be globally optimal. So, in the two-level optimization scheme, a global optimization of the total classifier is attempted. Since the GA maintains multiple 2-classifiers for each class in its populations, we can construct the total classifier in a variety of ways by choosing one 2-classifier for each class and integrating them. One method is to keep the top n 2-classifiers for each class and form the n^K combinations of them, where K is the number of classes. Since for a large n the search space grows exponentially, we can use a GA to find a sub-optimal total classifier.
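As an illustrative sketch of this two-level idea (the paper leaves it as future work), the exhaustive search over the n^K combinations might look as follows; the evaluation set and the score(x) interface are assumptions, and for large n the paper suggests a GA instead of this brute-force loop.

import itertools
import numpy as np

def best_total_classifier(top_n_per_class, X_eval, y_eval):
    # top_n_per_class[i] is a list of the n best 2-classifiers for class i;
    # each classifier exposes score(x) as in the earlier sketches.
    # Exhaustively scores all n^K combinations and returns the best one;
    # feasible only for small n and K.
    best_combo, best_rate = None, -1.0
    for combo in itertools.product(*top_n_per_class):
        preds = [int(np.argmax([c.score(x) for c in combo])) for x in X_eval]
        rate = float(np.mean(np.array(preds) == np.array(y_eval)))
        if rate > best_rate:
            best_combo, best_rate = combo, rate
    return best_combo, best_rate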
3.2 GA implementation
Five design issues must be considered in implementing a GA for our problem [Michalewicz96].
(1) Chromosome representation: In order to select the feature subset and the network architecture simultaneously, we design a binary chromosome as shown in Figure 7. In the first part of the chromosome, a binary digit is assigned to each feature; it has the value 1 if the feature is selected and 0 otherwise. The second part encodes the number of hidden nodes in binary notation. Since a neural network with zero hidden nodes does not function, the number of hidden nodes is the encoded value plus one. As an example, assume an 8-dimensional feature vector and a maximum of 16 hidden nodes. Then the first part has 8 bits and the second part has 4 bits. The chromosome '10000100/1010' represents the information that the first and sixth features are selected and the network has 11 hidden nodes.
C = feature_selection_info / number_hidden_nodes_info = F / H = (f1 f2 ... fd / m3 m2 m1 m0),
fi = 1 (selected) or 0 (not selected), number of hidden nodes = (m0 + 2·m1 + 4·m2 + 8·m3) + 1
Figure 7. A chromosome.
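A small decoding sketch for this chromosome layout is given below, assuming d feature bits followed by 4 hidden-node bits with the most significant bit first, as in the '10000100/1010' example.

def decode_chromosome(chromosome, d):
    # Decode a binary chromosome F/H of length d + 4.
    # The first d bits select features (1 = selected); the last 4 bits
    # encode the number of hidden nodes, MSB first, offset by one so that
    # a network always has at least one hidden node.
    feature_bits = chromosome[:d]
    hidden_bits = chromosome[d:]
    selected = [i for i, bit in enumerate(feature_bits) if bit == 1]
    hidden_nodes = int("".join(str(b) for b in hidden_bits), 2) + 1
    return selected, hidden_nodes

# The example from the text: '10000100/1010' -> features 1 and 6, 11 hidden nodes.
sel, m = decode_chromosome([1, 0, 0, 0, 0, 1, 0, 0, 1, 0, 1, 0], d=8)
assert sel == [0, 5] and m == 11   # 0-based indices of features 1 and 6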
(2) Fitness evaluation: We should evaluate the chromosomes in a population in order to give a higher probability of
survival to the fitter chromosomes. In our problem, the evaluation is straightforward since a chromosome has all the
information about the feature and network architecture and the network training gives a recognition rate for the
training set. For a more reasonable performance index, the recognition rate for a validation set can be used. But in
our experimentation, we do not have a separate validation set and so the recognition rate for the training set was used.
In our experiments, we use the range-based selection with a range [L,H]. In order to force a feature subset to be
within the range, we take the range as a constraint and give a penalty to the chromosomes breaking the constraint.
The fitness of a chromosome C = F/H is defined with a linear penalty as follows:
fitness(C) = goodness(C) - penalty(F),
goodness(C) = recognition rate on the training set,
penalty(F) = 0,                       if L ≤ cardinality(F) ≤ H,
penalty(F) = w·(cardinality(F) - H),  if cardinality(F) > H,
penalty(F) = w·(L - cardinality(F)),  if cardinality(F) < L,
in which w is a weight and cardinality(F) is the number of digits with value 1 in F. The linear penalty function is shown in Figure 8.
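A direct transcription of this fitness measure might look as follows; the default weight w=0.01 is purely illustrative, since the paper does not report the value of w.

def penalty(F, L, H, w):
    # Linear penalty of Figure 8: zero inside [L, H], growing linearly outside.
    k = sum(F)                      # cardinality(F): number of selected features
    if k > H:
        return w * (k - H)
    if k < L:
        return w * (L - k)
    return 0.0

def fitness(recognition_rate, F, L, H, w=0.01):
    # fitness(C) = goodness(C) - penalty(F), where goodness(C) is the
    # recognition rate on the training set for the decoded classifier.
    return recognition_rate - penalty(F, L, H, w)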
(3) GA operators: We use the standard crossover and mutation operators described in [Michalewicz96]. The operators are applied separately to the two parts, F and H, of a chromosome. The mutation of the part F is performed so that the number of 0-to-1 conversions equals the number of 1-to-0 conversions, which keeps the cardinality of the chromosome unchanged and thus helps it satisfy the range constraint. The mutation of the part H is done in such a way that the mutation probability decreases as the generation increases and as a bit becomes more significant. Since a change in a more significant bit has a stronger influence on the number of hidden nodes, we give a lower probability to such a bit. Figure 9 illustrates the dynamic mutation probabilities.
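The F-part and H-part mutations might be sketched as below. The balanced 1-to-0 / 0-to-1 swap for F preserves the cardinality of the selected subset; the decay schedule used for H is an assumption, since the paper only states that the probability decreases with the generation and with bit significance.

import random

def mutate_F(F, n_swaps=1):
    # Flip the same number of 1->0 and 0->1 bits so that cardinality(F)
    # is unchanged and the range constraint stays satisfied.
    ones = [i for i, b in enumerate(F) if b == 1]
    zeros = [i for i, b in enumerate(F) if b == 0]
    for i in random.sample(ones, min(n_swaps, len(ones))):
        F[i] = 0
    for i in random.sample(zeros, min(n_swaps, len(zeros))):
        F[i] = 1
    return F

def mutate_H(H, generation, max_generation, base_pm=0.1):
    # Mutate the hidden-node bits with a probability that decreases with
    # the generation and is lower for more significant bits (H = m3 m2 m1 m0).
    for pos, bit in enumerate(H):               # pos 0 is m3, the most significant bit
        significance = len(H) - pos             # 4 for m3 down to 1 for m0
        pm = base_pm * (1 - generation / max_generation) / significance
        if random.random() < pm:
            H[pos] = 1 - bit
    return H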
[Figure 8: the penalty is zero for cardinalities between L and H and grows linearly outside that range. Figure 9: the mutation probability Pm of the bits m0, m1, m2, m3 decreases with the generation, with lower curves for the more significant bits.]
Figure 8. Penalty function.
Figure 9. Dynamic mutation probabilities for H.
(4) Initial population: When a range [L,H] is given for the feature selection, we generate an initial F such that the probability of a binary digit being 1 is (L+H)/(2d), where d is the total number of features. An initial chromosome is therefore likely to have about (L+H)/2 features selected.
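An illustrative initializer following this rule is shown below; drawing the hidden-node bits uniformly is an assumption, since the paper only describes the initialization of F.

import random

def initial_chromosome(d, L, H, n_hidden_bits=4):
    # Random chromosome F/H whose expected cardinality of F is (L+H)/2.
    p_one = (L + H) / (2.0 * d)
    F = [1 if random.random() < p_one else 0 for _ in range(d)]
    Hbits = [random.randint(0, 1) for _ in range(n_hidden_bits)]
    return F + Hbits

def initial_population(size, d, L, H):
    return [initial_chromosome(d, L, H) for _ in range(size)]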
(5) Parameters: We use the following GA parameters.
Population size = 20, number of generations = 50, crossover probability = 0.6, mutation probability = 0.1.
3.3 GA procedures
The GA procedures for Type 1 and Type 4 are outlined below. Note that in Type 4 the GA process operates on each class separately. After each generation, the total classifier is constructed by choosing the best 2-classifier of each of the 10 classes, and it is evaluated.

Type 1 classifier:

GA_program()
{
  t=0;
  initial_population();
  evaluate_population();
  while (t<Max_Generation) {
    t++;
    select_new_population();
    crossover(); mutation();
    evaluate_population();
  }
}

Type 4 classifier:

GA_program()
{
  t=0;
  for(k=0;k<No_Classes;k++) {
    initial_population(k);
    evaluate_population(k);
  }
  evaluate_total_classifier();
  while (t<Max_Generation) {
    t++;
    for(k=0;k<No_Classes;k++) {
      select_new_population(k);
      crossover(k); mutation(k);
      evaluate_population(k);
    }
    evaluate_total_classifier();
  }
}
IV. Experiments
For our experiments, the CENPARMI handwritten numeral database is used. We use the DDD feature vector, which has 256 dimensions [Oh98]. We let the number of hidden nodes be at most 16, so a chromosome has 256+4=260 bits. We perform two experiments: Experiment 1 with the range [16,32] and Experiment 2 with the range [8,16].
As the generations proceed, we measure the actual recognition rates of the total classifiers on both the training set and the test set. Figures 10 and 11 show the recognition rate curves for Experiments 1 and 2, respectively. Each of Figures 10 and 11 has four curves, one for each combination of classifier (Type 1 or Type 4) and data set (training or test). Type 4 is superior to Type 1 in both experiments. The best recognition rates (i.e., from the fittest chromosome) on the training set in Experiment 1 are 96.07% at the 43rd generation and 98.42% at the 47th generation for Type 1 and Type 4, respectively. The corresponding recognition rates on the test set are 93.50% and 95.70% for Type 1 and Type 4, respectively. A similar analysis is made for Experiment 2. The best rates on the training set are 91.97% at the 22nd generation (89.40% on the test set) and 96.12% at the 48th generation (93.10% on the test set) for Type 1 and Type 4, respectively.
[Figure 10: recognition rates (roughly 86-100%) versus generations 1-51 for four curves: Type 1-train, Type 1-test, Type 4-train, Type 4-test.]
Figure 10. Graphs of recognition rates produced in Experiment 1 (range [16,32]).
[Figure 11: recognition rates (roughly 75-100%) versus generations 1-51 for four curves: Type 1-train, Type 1-test, Type 4-train, Type 4-test.]
Figure 11. Graphs of recognition rates produced in Experiment 2 (range [8,16]).
Table 1 summarizes the performance data as well as the network architectures. An interesting analysis can be made regarding the class-specific feature subsets and classifier architectures. In Type 1, the GA found the optimal network architectures 31-5-1 and 16-6-1 for Experiments 1 and 2, respectively, which are commonly used by the 10 classes. In Type 4, the 10 classes found their own optimal architectures, as shown in the table. In Experiment 1, class 2 uses the smallest number of features and class 7 uses the smallest number of hidden nodes.
The large gaps in recognition rate between Type 1 and Type 4 on the test set (2.20% in Experiment 1 and 3.70% in Experiment 2) are clearly shown in Table 1. We argue that Type 4 gains a significant improvement thanks to the class-specific features and network architectures shown in the table.
Table 1. Summary of performance and network architectures

Exp. 1 (range [16,32])                 Type 1                 Type 4
  Best classifier at generation        43rd                   46th
  Train set                            96.07%                 98.42%
  Test set                             93.50%                 95.70%
  Architectures (I-M-O (class))        31-5-1 (10 classes)    31-5-1(0); 28-3-1(1); 25-7-1(2); 32-7-1(3); 32-8-1(4); 31-7-1(5); 26-7-1(6); 28-4-1(7); 30-5-1(8); 30-6-1(9); 29.3-5.9-1 (average)

Exp. 2 (range [8,16])                  Type 1                 Type 4
  Best classifier at generation        22nd                   48th
  Train set                            91.97%                 96.12%
  Test set                             89.40%                 93.10%
  Architectures (I-M-O (class))        16-6-1 (10 classes)    16-6-1(0); 15-6-1(1); 16-6-1(2); 15-7-1(3); 16-4-1(4); 16-8-1(5); 15-3-1(6); 15-8-1(7); 16-6-1(8); 16-7-1(9); 15.6-6.1-1 (average)
V. Concluding Remarks
A general framework for the class modularity has been presented, along with the motivations for class-specific features and classifier architectures. Among the four types of class-modular architecture, Type 1 and Type 4 were tested on handwritten numeral recognition and their performances were compared. The GA was adopted as a tool for selecting the feature subset and the classifier architecture simultaneously. The experimental results confirm that the classifier designed with class-specific features and network architectures has a superior performance.
Several future directions can be identified. First, we expect that the two-level optimization would yield a further improvement; a GA program for the global optimization of the total classifier should be developed. Second, the class modularity can be applied to other classification methodologies such as k-NN and polynomial classifiers, and its applicability and effectiveness there should be explored. Third, since the 2-classifiers may have different output types and/or strength levels, an advanced integration module must be developed to improve the classification performance. Finally, application to other pattern recognition problems should be investigated.
References
[Anand95] R. Anand, K. Mehrotra, C.K. Mohan, and S. Ranka, “Efficient classification for multiclass problems
using modular neural networks,” IEEE Tr. on Neural Networks, Vol.6, No.1, pp.117-124, 1995.
[Hong96] K.-C. Hong, S.-M. Choi, J.-S. Lee, and I.-S. Oh, “Pipelining multiple algorithms for handwritten numeral
recognition,” Proceedings of 2nd Korean-French Workshop on Handwriting Recognition, pp.95-102, Paris, France,
1996.
[Looney97] C.G. Looney, Pattern Recognition Using Neural Networks, Oxford University Press, NewYork, 1997.
[Michalewicz96] Z. Michalewicz, Genetic Algorithms + Data Structures = Evolution Programs, Springer-Verlag, 1996.
[Mui94] L. Mui, A. Agarwal, A. Gupta, and P.S.P. Wang, "An adaptive modular neural network with application to unconstrained character recognition," in Document Image Analysis (Edited by H. Bunke, P.S.P. Wang, and H.S. Baird), World Scientific, Singapore, pp.1189-1203, 1994.
[Oh98] I.-S. Oh and C. Y. Suen, “Distance features for neural network-based recognition of handwritten characters,”
International Journal on Document Analysis and Recognition, Vol.1, No.2, pp.73-88, 1998.
[Oh99] I.-S. Oh, J.-S. Lee, and C.Y. Suen, “Analysis of class separation and combination of class-dependent features
for handwriting recognition,” IEEE Tr. on PAMI, Vol.21, No.10, pp.1089-1094, October 1999.
[Oh01] I.-S. Oh and C.Y. Suen, "A class-modular feedforward neural network for handwriting recognition," Pattern Recognition (in press).
[Raymer00] M.L. Raymer, W.F. Punch, E.D. Goodman, L.A. Kuhn, and A.K. Jain, "Dimensionality reduction using genetic algorithms," IEEE Tr. on Evolutionary Computation, Vol.4, No.2, July 2000.
[Suen92] C.Y. Suen, C. Nadal, R. Legault, T.A. Mai, and L. Lam, "Computer recognition of unconstrained handwritten numerals," Proceedings of the IEEE, Vol.80, No.7, pp.1162-1180, 1992.
[Tsay92] S.-C. Tsay, P.-R. Hong, and B.-C. Chieu, “Handwritten digits recognition system via OCON neural network
by pruning selective update,” Proceedings of 11th ICPR, pp.656-659, 1992.
[Yang98] J. Yang and V. Honavar, "Feature subset selection using a genetic algorithm," IEEE Intelligent Systems and Their Applications, Vol.13, No.2, pp.44-49, 1998.