International Journal of Advancements in Research & Technology, Volume 2, Issue 6, June-2014, ISSN 2278-7763

Pattern Classification for Handwritten Marathi Characters using Gradient Descent of Distributed Error with Genetic Algorithm for Multilayer Feed Forward Neural Networks

1 Holkar Shrirang Raosaheb, 2 Dr. Manu Pratap Singh

1 Holkar Shrirang Raosaheb, Shri Venkateshwara University, Gajraula, Amroha (Uttar Pradesh), India (e-mail: shrirangholkar@rediff.com)
2 Dr. Manu Pratap Singh, Institute of Computer & Information Science, Dr. B.R. Ambedkar University, Agra-282002, Uttar Pradesh, India (e-mail: manu_p_singh@hotmail.com)

Abstract

In this paper the performance of a feedforward neural network with descent gradient of distributed error and genetic algorithm is evaluated for the recognition of handwritten characters of 'Marathi' script. The performance index for the feedforward multilayer neural network is considered here with distributed instantaneous unknown error, i.e. a different error for different layers. The genetic algorithm is applied to make the search process more efficient in determining the optimal weight vector from the population of weights. The genetic algorithm is applied with the distributed error, and the fitness function for the genetic algorithm is also taken as the mean of the squared distributed error, which is different for each layer. Hence convergence is obtained only when the minimum of the different errors is determined. In this performance evaluation it has been found that the proposed method of descent gradient of distributed error with genetic algorithm, commonly known as the hybrid distributed evolutionary technique for the multilayer feedforward neural network, performs better in terms of accuracy, epochs and number of optimal solutions for the given training set and test pattern sets for the pattern recognition problem.

Keywords: Hybrid evolutionary distributed technique, multilayer feedforward neural network, gradient descent.

1. Introduction

Pattern recognition is an emerging area of machine learning and intelligence. The problem of pattern recognition has been considered in many ways; one of the most popular is pattern classification. Pattern classification is a problem in which the machine must place different input stimuli into meaningful categories according to the features present in these inputs. This meaningful categorization can be made with respect to predefined classes, depending upon the nature of the problem. Pattern recognition and its applications have been studied for a very long time, and various methods have been proposed to accomplish the task of pattern classification [1-10]. The recognition of handwritten curve script, in the form of character classification and character association, has been considered a dominant area in the field of pattern recognition with machine learning techniques [11, 12].

Soft computing techniques have been identified as a powerful tool to perform the task of pattern recognition for handwritten curve script in the domain of machine learning [13-15]. Neural network techniques and evolutionary search methods have been used in various forms of hybrid evolutionary algorithms to accomplish the task of pattern classification of handwritten curve scripts of many languages [16-18]. The feedforward multilayer neural network with gradient descent of backpropagated error is widely used for generalized pattern classification [19].
The analysis of this neural network architecture with the generalized delta learning rule (backpropagation learning) has highlighted the performance and the limitations of this architecture, due to the unavailability of more information for the units of the output layer, for handwritten character recognition [20]. The recurrent neural network in the form of the backpropagation through time model (BPTT) offers a suitable framework for reusing the output values of the neural network in training; it exhibited some promising performance, but only for dynamic patterns, and showed inefficiency for static patterns [21]. Later it was investigated that the feedforward multilayer neural network with an enhanced and extended version of the backpropagation learning algorithm [22] is more suitable for handling complex pattern classification or recognition tasks, in spite of its inherited problems of local minima, slow rate of convergence and no guarantee of convergence [23-27].

It has been found that, to overcome the problems of descent gradient searching in a large search space, as in the case of a complex pattern recognition task with a multilayer feedforward neural network, the evolutionary search algorithm, i.e. the genetic algorithm (GA), is a better alternative [28]. The reason is quite obvious: this search technique is free from derivatives, and it evolves a population of possible partial solutions and applies the natural selection process to filter them until the global optimal solution is found [29]. Various prominent results have been reported in the literature for generalized classification on the handwritten English character recognition problem with the integration of the genetic algorithm and the backpropagation learning rule for the multilayer feedforward neural network architecture [30, 11]. In that approach the fitness of the weights has been considered with the back-propagated error of the current input pattern vector, so the performance of the network still depends upon the back-propagated instantaneous random error.

In this paper the performance of a feedforward neural network with descent gradient of distributed error and genetic algorithm is evaluated for the recognition of handwritten characters of 'Marathi' script. The performance index for the feedforward multilayer neural network is considered here with distributed instantaneous unknown error, i.e. a different error for different layers. The genetic algorithm is applied to make the search process more efficient in determining the optimal weight vector from the population of weights. The genetic algorithm is applied with the distributed error, and its fitness function is also taken as the mean of the squared distributed error, which is different for each layer. Hence convergence is obtained only when the minimum of the different errors is determined. The instantaneous square error is therefore not the same for each layer; it is different for each layer and is considered as the distributed error of the multilayer feedforward neural network, in which the numbers of units in the hidden layers and in the output layer are equal. Thus, the same desired output pattern for a presented input pattern is distributed to every unit of the hidden layers and the output layer, each of which produces a different actual output, so each layer has a different square error. The instantaneous error is therefore distributed instead of backpropagated. The proposed hybrid evolutionary technique, i.e. descent gradient of distributed error with genetic algorithm, is used to train the multilayer neural network architecture for the generalized classification of handwritten 'Marathi' script.
The rest of the paper is organized as follows: Section 2 presents the generalized descent gradient method for the instantaneous distributed error and the implementation of the genetic algorithm in a generalized way with unknown error. Section 3 explores the architecture and simulation design for the proposed method. Section 4 presents the results and discussion. Section 5 presents the conclusion, followed by the references.

2. Generalized descent gradient learning for distributed square error

A multilayer feedforward neural network with at least two intermediate layers, commonly known as hidden layers, in addition to the input and output layers, can perform any complex generalized pattern classification task. The generalized delta learning rule [23] is a very common and widely used technique to train multilayer feedforward neural networks for pattern classification and pattern mapping. In this learning the optimum weight vector may be obtained for the given training set if the weights are adjusted in such a way that gradient descent is made along the total error surface in the weight space. The error to be minimized is actually not the least mean square error over the entire training set; instead it is the instantaneous square error for each presented pattern at each time. Thus, for every pattern at each presentation there is an unknown local error, and the weights are updated incrementally for each local error. Each time the weights are updated to minimize this local error by propagating it back from the output layer to all hidden layers: the instantaneous error for each presented input pattern, i.e. the square difference between the desired pattern vector and the actual output of the units of the output layer, is backpropagated to the units of the hidden layers.

In the current work we consider the distributed error instead of the backpropagated error. The instantaneous square error is not the same for each layer, because each layer has its own actual output pattern vector. For each layer the instantaneous square error is computed as the square difference between the desired output pattern vector for the given input sample from the training set and the actual output pattern vector of the respective layer. This distributed instantaneous square error imposes a constraint on the architecture of the multilayer feedforward neural network: the number of units in the output layer and in the hidden layers should be the same, so that the desired output pattern for the presented input pattern can be accommodated conveniently by each layer. Thus, for every hidden layer and the output layer we have a different square error. Therefore the optimum weight vector can be obtained for each layer if the weights are adjusted in such a way that gradient descent is made along the instantaneous square error of that layer. This shows that we have more than one objective function, or minimum error, one for each layer except the input layer, for the presented pattern, which frames the problem as a multi-objective optimization problem. The objective here is the simultaneous minimization of each instantaneous square error in order to determine an optimum weight vector for the presented input pattern. The mean of the instantaneous square error of a layer is used to update the weights of that layer, and the gradient descent of each error, for each layer, is obtained at the same time. Therefore there is more than one gradient descent at a time, of the individual errors for the presented input pattern, depending on the number of hidden layers.
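Stated compactly (a restatement of the preceding paragraph rather than a formula given in the paper), for a network with hidden layers $H_1,\dots,H_m$ and output layer $O$ the goal for a presented pattern $l$ is the simultaneous minimization of one error surface per layer:

$$\min_{W}\;\Bigl(E_l^{H_1}(W),\,\ldots,\,E_l^{H_m}(W),\,E_l^{O}(W)\Bigr),
\qquad
E_l^{(\cdot)}=\tfrac{1}{2}\sum_{u}\bigl(d_{ul}-S_u\bigl(y_{ul}^{(\cdot)}\bigr)\bigr)^{2},$$

where the same desired pattern $d_l$ is used by every layer and $S_u(y_{ul}^{(\cdot)})$ is the actual output of unit $u$ of that layer.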
Hence the update of the weight vector for the units of the hidden layers and for the units of the output layer will be proportional to their corresponding gradient descents, so there is a different gradient for each layer. Thus, the optimal weight changes will be proportional to the gradient descent of the distributed instantaneous mean square errors for the presented input pattern. The generalized method for obtaining the weight updates for the hidden layers and the output layer is formulated as follows.

Let $(a_l, d_l)$ for $l = 1, 2, \ldots, L$ be the current input pattern vector pair from the training set of $L$ pattern samples presented to the multilayer feedforward neural network, for formulating the generalized descent gradient of the instantaneous square distributed error. As already discussed, the constraint on this multilayer feedforward neural network is to keep the number of units in the hidden and output layers the same, as shown in Figure 1.

Fig. 1: Multilayer Feed Forward Neural Network Architecture

The current random sample pattern $(a_l, d_l)$ of the training set defines the instantaneous squared error vector $e_l^O$ at the output layer and $e_l^H$ at the hidden layer as:

$$e_l^O = d_l - S(y_l^O) = \bigl(d_{1l} - S_1(y_{1l}^O),\ \ldots,\ d_{kl} - S_k(y_{kl}^O)\bigr) \qquad (1)$$

$$e_l^H = d_l - S(y_l^H) = \bigl(d_{1l} - S_1(y_{1l}^H),\ \ldots,\ d_{kl} - S_k(y_{kl}^H)\bigr) \qquad (2)$$

Therefore the instantaneous distributed mean square error for the output and hidden layer is defined, respectively, as:

$$E_l^O = \frac{1}{2}\sum_{k=1}^{K}\bigl[d_{kl}^{O} - S_k(y_{kl}^O)\bigr]^2 \qquad (3)$$

$$E_l^H = \frac{1}{2}\sum_{j=1}^{J}\bigl[d_{jl}^{H} - S_j(y_{jl}^H)\bigr]^2 \qquad (4)$$

Hence, the update in the weight for the $k$th unit of the output layer at iteration $t$ for the current input pattern vector is represented as:

$$\Delta w_{kj}^{l}(t) = -\eta_{kj}\,\frac{\partial E_l^O}{\partial w_{kj}} \qquad (5)$$

and the update in the weight for the $j$th unit of the hidden layer at iteration $t$ for the same current pattern is represented as:

$$\Delta w_{ji}^{l}(t) = -\eta_{ji}\,\frac{\partial E_l^H}{\partial w_{ji}} \qquad (6)$$

Here $\eta_{kj}$ and $\eta_{ji}$ are the learning rates for the output and hidden layer respectively.

Now, applying the chain rule to Equation 5, we have:

$$\Delta w_{kj}^{l}(t) = -\eta_{kj}\,\frac{\partial E_l^O}{\partial y_{kl}^O}\,\frac{\partial y_{kl}^O}{\partial w_{kj}}$$

Here $y_{kl}^O = \sum_{j=1}^{J} w_{kj}\,S_j(y_{jl}^H)$ is the activation value and $S_k(y_{kl}^O) = f(y_{kl}^O) = \dfrac{1}{1 + e^{-y_{kl}^O}}$ is the output signal, respectively.
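As a minimal illustrative sketch (not taken from the paper: the NumPy representation, the toy 16-5-5 layer sizes and the function names are assumptions), the distributed instantaneous errors of Equations 3 and 4 amount to scoring every layer's own output against the same target vector $d_l$:

```python
import numpy as np

def sigmoid(y):
    # Logistic output signal S(y) = 1 / (1 + e^-y), as used in the derivation above.
    return 1.0 / (1.0 + np.exp(-y))

def forward(a_l, weights, biases):
    """Forward pass; returns the output signal of every hidden and output layer."""
    signals, s = [], a_l
    for W, b in zip(weights, biases):
        s = sigmoid(W @ s + b)          # activation y = W s + b, output S(y)
        signals.append(s)
    return signals

def distributed_errors(d_l, signals):
    """Distributed instantaneous mean square error, one value per layer (cf. Eqs. 3-4):
    the same desired pattern d_l is compared with each layer's own output."""
    return [0.5 * np.sum((d_l - s) ** 2) for s in signals]

# Toy example: 16 inputs, one hidden layer of 5 units, an output layer of 5 units.
rng = np.random.default_rng(0)
weights = [rng.normal(size=(5, 16)), rng.normal(size=(5, 5))]
biases = [np.zeros(5), np.zeros(5)]
a_l, d_l = rng.random(16), np.array([0, 1, 0, 0, 1], dtype=float)
print(distributed_errors(d_l, forward(a_l, weights, biases)))  # [E_l^H, E_l^O]
```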
Or,

$$\Delta w_{kj}^{l}(t) = -\eta_{kj}\,\frac{\partial E_l^O}{\partial y_{kl}^O}\,S_j(y_{jl}^H)
= -\eta_{kj}\,\frac{\partial E_l^O}{\partial S_k(y_{kl}^O)}\,\frac{\partial S_k(y_{kl}^O)}{\partial y_{kl}^O}\,S_j(y_{jl}^H)$$

Hence we have:

$$\Delta w_{kj}^{l}(t) = -\eta_{kj}\,\frac{\partial E_l^O}{\partial S_k(y_{kl}^O)}\,S_k(y_{kl}^O)\bigl(1 - S_k(y_{kl}^O)\bigr)\,S_j(y_{jl}^H)$$

Now from Equation 3 we have:

$$\frac{\partial E_l^O}{\partial S_k(y_{kl}^O)} = -\sum_{k=1}^{K}\bigl[d_{kl} - S_k(y_{kl}^O)\bigr]$$

Hence we have:

$$\Delta w_{kj}^{l}(t) = \eta_{kj}\sum_{k=1}^{K}\bigl[d_{kl} - S_k(y_{kl}^O)\bigr]\,S_k(y_{kl}^O)\bigl(1 - S_k(y_{kl}^O)\bigr)\,S_j(y_{jl}^H) \qquad (7)$$

Thus, the weight at iteration $(t+1)$ for the units of the output layer, with momentum term, is presented as:

$$w_{kj}^{l}(t+1) = w_{kj}^{l}(t) + \eta_{kj}\sum_{k=1}^{K}\delta_k^{Ol}\,S_k(y_{kl}^O)\bigl(1 - S_k(y_{kl}^O)\bigr)\,S_j(y_{jl}^H) + \alpha\,\Delta w_{kj}^{l}(t) \qquad (8)$$

where $\delta_k^{Ol} = d_{kl} - S_k(y_{kl}^O)$. Here the momentum rate constant $\alpha$ is considered with $0 < \alpha < 1$ for the output layer.

Similarly, applying the chain rule to Equation 6, we have:

$$\Delta w_{ji}^{l}(t) = -\eta_{ji}\,\frac{\partial E_l^H}{\partial y_{jl}^H}\,\frac{\partial y_{jl}^H}{\partial w_{ji}}
= -\eta_{ji}\,\frac{\partial E_l^H}{\partial S_j(y_{jl}^H)}\,S_j(y_{jl}^H)\bigl(1 - S_j(y_{jl}^H)\bigr)\,a_i$$

Now, from Equation 4 we have:

$$\frac{\partial E_l^H}{\partial S_j(y_{jl}^H)} = -\sum_{j=1}^{J}\bigl[d_{kl} - S_j(y_{jl}^H)\bigr]$$

Hence we have:

$$\Delta w_{ji}^{l}(t) = \eta_{ji}\sum_{j=1}^{J}\bigl[d_{kl} - S_j(y_{jl}^H)\bigr]\,S_j(y_{jl}^H)\bigl(1 - S_j(y_{jl}^H)\bigr)\,a_i \qquad (9)$$

Thus, the weight at iteration $(t+1)$ for the units of the hidden layer, with momentum term, is presented as:

$$w_{ji}^{l}(t+1) = w_{ji}^{l}(t) + \eta_{ji}\sum_{j=1}^{J}\delta_j^{Hl}\,S_j(y_{jl}^H)\bigl(1 - S_j(y_{jl}^H)\bigr)\,a_i + \alpha\,\Delta w_{ji}^{l}(t) \qquad (10)$$

where $\delta_j^{Hl} = d_{kl} - S_j(y_{jl}^H)$. Here the momentum rate constant $\alpha$ is considered with $0 < \alpha < 1$ for the hidden layer.

An interesting observation concerns the number of terms appearing in the expression for the weight update of the hidden layer. It can be seen from Equation 9 that fewer terms are involved than in the weight update of the hidden layer under the backpropagation learning rule for the backpropagated instantaneous mean square error. Thus, less time complexity is involved in computing the weight update according to the descent gradient of the distributed instantaneous mean square error, and it should therefore exhibit faster convergence than the conventional generalized delta learning rule with backpropagated error.
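A hedged sketch of the update rules in Equations 7-10, under the paper's constraint that every hidden layer and the output layer have as many units as the target vector; the in-place NumPy update, the per-layer learning rates `etas` and the momentum bookkeeping are assumptions rather than the authors' code:

```python
import numpy as np

def sigmoid(y):
    return 1.0 / (1.0 + np.exp(-y))

def distributed_gradient_step(a_l, d_l, weights, biases, etas, alpha, prev_dW):
    """One sweep of the distributed-error updates (cf. Eqs. 7-10): every layer is
    corrected against the same target d_l using only its own local error term,
    instead of an error backpropagated from the layer above."""
    s_in = a_l
    for i, (W, b, eta) in enumerate(zip(weights, biases, etas)):
        s_out = sigmoid(W @ s_in + b)                  # this layer's own output S(y)
        delta = (d_l - s_out) * s_out * (1.0 - s_out)  # local derivative term (Eqs. 7, 9)
        dW = eta * np.outer(delta, s_in) + alpha * prev_dW[i]   # momentum term alpha
        W += dW                                        # weight update (Eqs. 8, 10)
        b += eta * delta                               # bias treated like a weight on input 1
        prev_dW[i] = dW
        s_in = s_out                                   # feed the next layer
    return weights, biases, prev_dW
```

The previous weight changes `prev_dW` would start as zero arrays with the same shapes as the entries of `weights`.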
2.1 Genetic algorithm with descent gradient of distributed error

The majority of implementations of the GA are derivatives of Holland's innovative specification. In our approach the genetic algorithm is incorporated with descent gradient learning of the distributed instantaneous mean square error in the multilayer feedforward neural network architecture for generalized pattern classification. The input pattern vector with its corresponding output pattern vector from the training set is presented to the neural network. The neural network with its current setting of weights obtains the actual output for each unit of the hidden layers and the output layer. The distributed instantaneous mean square error is obtained, and the proposed descent gradient learning rule for distributed error is applied for some fixed, arbitrary number of iterations n. Thus, the weights between the layers and the bias values of the units are updated for n iterations for the given input pattern and improved from their initial state. After this the weight updating stops and the genetic algorithm is employed to evolve the population of modified weights and bias values. The genetic algorithm is applied to obtain the optimal weight vector from the large weight space for the given training set, with the following three elements:

(i) the genetic code for representing the weight vector in the form of a chromosome;
(ii) the technique for evolving the population of weight vectors;
(iii) the fitness function for evaluating the performance of an evolved weight vector.

A lot of work has been reported on the evaluation of different neural networks with genetic algorithms [24]. The majority of this work indicates that the integration of genetic algorithms with neural networks occurs at the following three levels [25]: (i) connection weights, (ii) architectures, (iii) learning rules. The evolution of the weight vectors of the neural network is the area of interest in the current work. In this approach the genetic algorithm uses a different fitness evaluation function for each layer: the distributed instantaneous mean square error of a layer is considered as the fitness evaluation function of that layer. Generally the GA starts from a random initial solution and then converges to the optimal solution. In our approach the GA is applied after the weights have been updated for n iterations, so the initial population of solutions for the GA is not random; instead the initial population of weights is suboptimal, because the weights have already been updated in the direction of convergence. Thus, the GA explores from suboptimal solutions towards the multi-objective optimal solution for the given problem. The multi-objective optimal solution reflects the fact that every layer except the input layer has its own different error surface, or objective function.

Chromosome Representation

A chromosome is a collection of genes, each representing either a weight value or a bias value as a real number. The initial population of weights and biases used to form the basic, or initial, chromosome in our method is not random; instead the initial chromosome consists of suboptimal values of the weights and biases. The chromosome is therefore represented as a matrix of real numbers for the set of weight values and bias values. As discussed already, in our proposed multilayer neural network architecture the error is considered as a distributed instantaneous mean square error, i.e. a different error for different layers. Hence the chromosome is partitioned into sub-chromosomes corresponding to each layer, hidden layer and output layer. As per our general architecture of the neural network shown in Figure 1, there will be two sub-chromosomes: the first sub-chromosome has $(i \times j + j)$ genes and the second has $(j \times k + k)$ genes. Thus, the number of sub-chromosomes depends upon the number of hidden layers, but the number of genes in every sub-chromosome is the same, although the values of the genes may differ between sub-chromosomes.
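One possible realisation of the per-layer sub-chromosome encoding described above (a sketch; the flattening order, the function names and the 16-5-5 example shapes are assumptions, not the authors' implementation):

```python
import numpy as np

def encode_subchromosomes(weights, biases):
    """One real-valued sub-chromosome per trainable layer: the layer's weight matrix
    flattened, followed by its bias vector (i*j + j genes, j*k + k genes, ...)."""
    return [np.concatenate([W.ravel(), b]) for W, b in zip(weights, biases)]

def decode_subchromosome(gene_vector, shape):
    """Inverse mapping: split a sub-chromosome back into (W, b) for one layer."""
    n_out, n_in = shape
    W = gene_vector[: n_out * n_in].reshape(n_out, n_in)
    b = gene_vector[n_out * n_in:]
    return W, b

# For a 16-5-5 network: 16*5 + 5 = 85 genes and 5*5 + 5 = 30 genes.
rng = np.random.default_rng(1)
weights = [rng.normal(size=(5, 16)), rng.normal(size=(5, 5))]
biases = [np.zeros(5), np.zeros(5)]
print([len(c) for c in encode_subchromosomes(weights, biases)])   # [85, 30]
```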
The Mutation Operator

The mutation operator randomly selects a gene from a chromosome and modifies it by some random value to generate the next population of chromosomes. The probability of mutation is kept low to minimize the randomness in the genetic algorithm. In our approach the mutation operator is applied to each sub-chromosome: it randomly selects a gene from each sub-chromosome and adds a small random value between -1 and +1 to generate the next population of sub-chromosomes. Let $C^N$ be the chromosome for the network, which is partitioned into the two sub-chromosomes $C_H^N$ and $C_O^N$ for the hidden layer and output layer; $C_H^N$ contains $(i \times j + j) = m_H$ genes while $C_O^N$ contains $(j \times k + k) = m_O$ genes. The sizes of the next generated populations are $N_H + 1$ and $N_O + 1$ respectively. If the mutation operator is applied $n$ times over the old sub-chromosomes for the output layer and the hidden layer respectively, then we have the following new populations of sub-chromosomes [26]:

$$C_H^{N\_new} = C_H^{N\_old} \cup \bigcup_{i=1}^{n}\bigl[\,C_{H,m_H}^{N\_old} \leftarrow \bigl(C_{H,\lambda_H}^{N\_old} + \mu_H\bigr)\bigr] \qquad (11)$$

$$C_O^{N\_new} = C_O^{N\_old} \cup \bigcup_{i=1}^{n}\bigl[\,C_{O,m_O}^{N\_old} \leftarrow \bigl(C_{O,\lambda_O}^{N\_old} + \mu_O\bigr)\bigr] \qquad (12)$$

Here $\mu_H$ and $\mu_O$ are small randomly generated values between -1 and +1 for the sub-chromosomes of the hidden layer and the output layer respectively, $\lambda_H$ and $\lambda_O$ are randomly selected genes from the old sub-chromosomes $C_H^{old}$ and $C_O^{old}$ respectively, and $C_H^{N\_new}$ and $C_O^{N\_new}$ are the next populations of sub-chromosomes for the hidden and output layers respectively. The inner operator prepares a new sub-chromosome at each iteration of mutation, and the outer operator builds the new populations of sub-chromosomes $C_H^{N\_new}$ and $C_O^{N\_new}$.

Elitism

Elitism is used at the creation of each new population to carry the good old population into the next generation. This is significant in that a good solution from the previous population should not be lost through the application of the genetic operators. It involves copying the best encoded network unchanged into the new population, as in Equations 11 and 12, by including $C_H^{N\_old}$ and $C_O^{N\_old}$ when creating $C_H^{N\_new}$ and $C_O^{N\_new}$.

Selection

The selection process of the genetic algorithm selects the good, or fit, individuals from the newly generated population. Here the selection process simultaneously considers the newly generated sub-chromosomes of the hidden layer and the output layer, i.e. $C_H^{N\_new}$ and $C_O^{N\_new}$ respectively, to select the good population for the further cycle. A sub-chromosome $C_H^{Sel}$ is selected from $C_H^{N\_new}$ for which the distributed instantaneous mean square error for the hidden layer, i.e. $E_l^H$ for the pattern $l$, has reached its accepted minimum level. Likewise a sub-chromosome $C_O^{Sel}$ is selected from $C_O^{N\_new}$ for which the distributed instantaneous mean square error for the output layer, i.e. $E_l^O$ for the same pattern $l$, has reached its accepted minimum level.
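A hedged sketch of the mutation step of Equations 11 and 12: n mutants are produced from the old sub-chromosome by adding a random value in (-1, +1) to one randomly chosen gene, and the unmutated parent is kept in the new population (elitism). The function and parameter names are illustrative only:

```python
import numpy as np

def mutate_subchromosome(old, n, rng):
    """Build the new population of one sub-chromosome (cf. Eqs. 11-12): the old
    individual is carried over unchanged (elitism) plus n mutants, each differing
    in one randomly selected gene by a small random value mu in (-1, +1)."""
    population = [old.copy()]                  # elitism: keep the suboptimal parent
    for _ in range(n):
        child = old.copy()
        gene = rng.integers(len(child))        # randomly selected gene position (lambda)
        child[gene] += rng.uniform(-1.0, 1.0)  # small random perturbation (mu)
        population.append(child)
    return population                          # population of size n + 1

rng = np.random.default_rng(2)
pop_H = mutate_subchromosome(rng.normal(size=85), n=3, rng=rng)   # hidden-layer sub-chromosome
pop_O = mutate_subchromosome(rng.normal(size=30), n=3, rng=rng)   # output-layer sub-chromosome
print(len(pop_H), len(pop_O))   # 4 4
```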
Crossover

Crossover is a very important and useful operator of the genetic algorithm. Here the crossover operator considers the selected sub-chromosomes $C_H^{Sel}$ and $C_O^{Sel}$ and creates the next generation of the population separately for the hidden layer(s) and the output layer. We apply the uniform crossover operator $n$ times on the selected sub-chromosomes at different crossover points to obtain the next generation of the population. Let the selected sub-chromosomes $C_H^{Sel}$ and $C_O^{Sel}$ be considered for uniform crossover as shown in Figs. 2-4.

Fig. 4: After applying the crossover operator

Therefore, on applying the crossover operator $n$ times on the selected sub-chromosomes ($C_H^{Sel}$ and $C_O^{Sel}$), $n+1$ populations of sub-chromosomes each can be generated as [27]:

$$C_H^{next} = C_H^{Sel} \cup \bigcup_{i=1}^{n}\bigl[\bigl(C_{H,\rho_H}^{Sel} \leftrightarrow C_{H,\nu_H}^{Sel}\bigr)\bigr] \qquad (13)$$

$$C_O^{next} = C_O^{Sel} \cup \bigcup_{i=1}^{n}\bigl[\bigl(C_{O,\rho_O}^{Sel} \leftrightarrow C_{O,\nu_O}^{Sel}\bigr)\bigr] \qquad (14)$$

where $\rho$ and $\nu$ are the randomly selected gene positions from the sub-chromosomes $C_H^{Sel}$ and $C_O^{Sel}$, and $C_H^{next}$ and $C_O^{next}$ are the next generations of population, of size $n+1$. Thus, after the crossover operation we have $2(n+1)$ populations of chromosomes in total for the network, i.e. $n+1$ each for the hidden layer and for the output layer.

Fitness Evaluation Function

The fitness evaluation function of the genetic algorithm is used to evaluate the performance of the newly generated populations. It filters the populations found suitable as per the criteria of the fitness function. Here we use a separate fitness evaluation function for each layer. Therefore, as per our neural network architecture, two fitness evaluation functions are used: one for the output layer and a second for the hidden layer. The first fitness evaluation function estimates the performance of the sub-chromosomes of the hidden layer, i.e. $C_H^{next}$, and the second one estimates the performance of the sub-chromosomes of the output layer, i.e. $C_O^{next}$. The fitness function used here is proportional to the sum of the distributed instantaneous mean squared error on the respective layer. The fitness function $f_H$ for the hidden layer considers the instantaneous mean square error as specified in Equation 4 to evaluate the performance of the sub-chromosomes for the hidden layer, i.e. $C_H^{next}$. The fitness function $f_O$ for the output layer considers the instantaneous mean square error as specified in Equation 3 to evaluate the performance of the sub-chromosomes for the output layer, i.e. $C_O^{next}$. Thus, the genetic algorithm attempts to find the weight vectors and bias values for the different layers that minimize the corresponding instantaneous mean of squared error. The procedure for evaluating the performance of the weight vectors of the hidden and output layers can be represented as:

    min errorH = 1.0 && min errorO = 1.0
    Do for all n+1 chromosomes i {
        if ( $E_l^H(C_{H,i}^{next})$ < min errorH ) then { $C_H^{min}$ = $C_{H,i}^{next}$ ; min errorH = $E_l^H(C_{H,i}^{next})$ }
        if ( $E_l^O(C_{O,i}^{next})$ < min errorO ) then { $C_O^{min}$ = $C_{O,i}^{next}$ ; min errorO = $E_l^O(C_{O,i}^{next})$ }
    }

Here $C_H^{min}$ and $C_O^{min}$ represent the sub-chromosomes that have the minimum error for the hidden and output layers respectively. We also have the possibility of obtaining more than one optimal weight vector for the given training set, because there may be more than one sub-chromosome in the hidden and output layers evaluated as fit by the fitness evaluation functions of the respective layers.
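As a hedged sketch of the two remaining operators (the uniform-crossover form of Equations 13-14 and the minimum-error scan of the pseudocode above); the 0.5 gene-swap probability and the `layer_error` callable standing in for $E_l^H$ or $E_l^O$ are assumptions:

```python
import numpy as np

def uniform_crossover(parent_a, parent_b, n, rng):
    """Apply uniform crossover n times (cf. Eqs. 13-14): each child takes every gene
    from one of the two selected sub-chromosomes at random; the selected parent itself
    is retained, giving a next generation of size n + 1."""
    next_generation = [parent_a.copy()]
    for _ in range(n):
        mask = rng.random(len(parent_a)) < 0.5          # assumed gene-wise swap probability
        next_generation.append(np.where(mask, parent_a, parent_b))
    return next_generation

def best_subchromosome(population, layer_error):
    """Per-layer fitness scan from the pseudocode above: keep the sub-chromosome with
    the smallest distributed instantaneous mean square error for the presented pattern."""
    min_error, best = 1.0, None
    for candidate in population:
        e = layer_error(candidate)                      # stands in for E_l^H or E_l^O
        if e < min_error:
            min_error, best = e, candidate
    return best, min_error
```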
3. Simulation Design and Implementation

In this simulation design and implementation, two of the proposed multilayer feedforward neural networks are considered. Both neural networks are trained with the proposed descent gradient of distributed instantaneous mean square error algorithm. Since every input pattern consists of 16 distinct features, each neural network architecture contains 16 processing units in the input layer. The first neural network architecture consists of the input layer, two hidden layers with five units each, and one output layer with 5 units. The second neural network architecture consists of the input layer, one hidden layer of 5 units, and an output layer also with 5 units.

Feature Extraction

Five different samples of handwritten characters of 'Marathi' script from five different people were collected in this simulation as input stimuli for the training pattern set. These scanned images of distinct handwritten characters of 'Marathi' script are shown in Figure 5.

Fig. 5: Scanned images of handwritten distinct 'Marathi' scripts

The scanned images of handwritten characters of 'Marathi' script shown in Figure 5 are partitioned into sixteen equal parts, and the density values of the pixels in each part were calculated to obtain the centre of density gravity. Therefore, for each scanned image of a handwritten character of 'Marathi' script we obtain sixteen values as the input pattern vector of the training set. Thus we have the training set, which consists of sampled patterns of handwritten characters of 'Marathi' script, where each sample pattern is considered as a pattern vector of dimension 16×1 with real values. The output pattern vector corresponding to an input pattern vector is of dimension 5×1 with binary values. The test input pattern set is also constructed with the same method, from sample patterns that were not used in the training set. The sample test patterns were used to verify the performance of the trained neural networks.
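A hedged sketch of the sixteen-part density feature described above (the 4×4 partition, the binarisation threshold, the synthetic test image and the omission of the centre-of-gravity step are all assumptions, since the paper does not give these details):

```python
import numpy as np

def density_features(image, grid=4):
    """Partition a grayscale character image into grid*grid equal parts and return
    the ink-pixel density of each part as a feature vector (16 values for grid=4)."""
    h, w = image.shape
    ink = image < 128                        # assumed binarisation: dark pixels are ink
    features = []
    for r in range(grid):
        for c in range(grid):
            cell = ink[r * h // grid:(r + 1) * h // grid,
                       c * w // grid:(c + 1) * w // grid]
            features.append(cell.mean())     # fraction of ink pixels in this cell
    return np.asarray(features)              # 16x1 real-valued input pattern

# A synthetic 64x64 "scan", only to exercise the function.
image = np.random.default_rng(3).integers(0, 256, size=(64, 64))
print(density_features(image).shape)         # (16,)
```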
Simulation design for the 16-5-5-5 Neural Network Architecture

The simulation of the proposed feedforward multilayer neural network architecture with two hidden layers of 5 units each and one output layer of 5 units (16-5-5-5) involves three different instantaneous mean of square errors at the same time, i.e. $E^O$ for the output layer, $E^{H1}$ for the first hidden layer and $E^{H2}$ for the second hidden layer, presented for pattern $l$ as:

$$E_l^O = \frac{1}{2}\sum_{k=1}^{K}\bigl(d_{kl} - S_k(y_{kl}^O)\bigr)^2 \qquad (15)$$

$$E_l^{H1} = \frac{1}{2}\sum_{g=1}^{G}\bigl(d_{kl} - S_g(y_{gl}^{H1})\bigr)^2 \qquad (16)$$

$$E_l^{H2} = \frac{1}{2}\sum_{j=1}^{J}\bigl(d_{kl} - S_j(y_{jl}^{H2})\bigr)^2 \qquad (17)$$

The proposed descent gradient learning rule for the distributed instantaneous mean of square error updates the weight vector for up to $t$ iterations. After this the weight updating is stopped and the genetic algorithm is applied. The updated weight and bias values are taken as the initial population of chromosomes for the genetic algorithm. As per the proposed neural network architecture, in this simulation design we have three sub-chromosomes: one for each of the two hidden layers and one for the output layer. The first sub-chromosome, shown in Figure 6, has 85 genes, of which 80 are the weight values on the connection links and 5 are the biases for the units of the first hidden layer. The second and third sub-chromosomes have 30 genes each, of which 25 are weight values on the connection links and 5 are the biases for the units of the second hidden layer and the output layer respectively.

Fig. 6 (c): Sub-chromosome 3 for output layer of 30 genes

The mutation operator is applied simultaneously to all three sub-chromosomes by adding small random values between -1 and 1 to the selected genes to generate the new populations of these sub-chromosomes. After this, selection is applied to all three sub-chromosomes to select the better population of chromosomes for the next generation. This selection procedure considers the distributed instantaneous mean of square errors, as specified in Equations 15, 16 and 17, as the fitness evaluation functions to select the sub-chromosomes for the next generation. The crossover operator is then applied simultaneously on all the selected sub-chromosomes to generate the large population of the next generation; it generates the populations of sub-chromosomes for the first hidden layer, the second hidden layer and the output layer of 85 genes, 30 genes and 30 genes respectively. The selected population of weights and biases from each sub-chromosome determines the optimal solutions for the given training pattern set. Thus, a minimum of three optimal solutions is required for the convergence of the neural network.

Simulation design for the 16-5-5 Neural Network Architecture

The simulation of the proposed feedforward multilayer neural network architecture with one hidden layer of 5 units and one output layer of 5 units (16-5-5) involves two different instantaneous mean of square errors at the same time, i.e. $E^O$ for the output layer and $E^{H1}$ for the hidden layer, presented for pattern $l$ as:

$$E_l^O = \frac{1}{2}\sum_{k=1}^{K}\bigl(d_{kl} - S_k(y_{kl}^O)\bigr)^2 \qquad (18)$$

$$E_l^H = \frac{1}{2}\sum_{j=1}^{J}\bigl(d_{kl} - S_j(y_{jl}^H)\bigr)^2 \qquad (19)$$

In this experiment we divide the chromosome into two sub-chromosomes, one for the hidden layer and one for the output layer. The first sub-chromosome, shown in Figure 7, has 85 genes, of which 80 are the weight values on the connection links and 5 are the biases for the units of the hidden layer. The second sub-chromosome consists of 30 genes, of which 25 are weight values on the connection links and 5 are the biases for the units of the output layer.

Fig. 7 (b): Sub-chromosome 2 for output layer of 30 genes

The mutation operator is applied simultaneously to both sub-chromosomes by adding small random values between -1 and 1 to the selected genes to generate the new populations of these sub-chromosomes. After this, selection is applied to both sub-chromosomes to select the better population of chromosomes for the next generation. This selection procedure considers the distributed instantaneous mean of square errors, as specified in Equations 18 and 19, as the fitness evaluation functions to select the sub-chromosomes for the next generation. The crossover operator is then applied simultaneously on all the selected sub-chromosomes to generate the large population of the next generation; it generates the populations of sub-chromosomes for the hidden layer and the output layer of 85 genes and 30 genes respectively. The selected population of weights and biases from each sub-chromosome determines the optimal solutions for the given training pattern set. Thus, a minimum of two optimal solutions is required for the convergence of the neural network.
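A small sketch of the multi-objective convergence test implied here (illustrative only; the threshold values echo the MAXE_H and MAXE_O entries recovered in the parameter table below, but the pairing is an assumption):

```python
def converged(layer_errors, max_e_hidden, max_e_output):
    """Multi-objective convergence test (sketch): the network is accepted only when
    every hidden-layer error and the output-layer error simultaneously fall below
    their respective thresholds (three errors for 16-5-5-5, two for 16-5-5)."""
    *hidden_errors, output_error = layer_errors
    return all(e <= max_e_hidden for e in hidden_errors) and output_error <= max_e_output

# e.g. for the 16-5-5-5 network: [E_l^H1, E_l^H2, E_l^O]
print(converged([0.0008, 0.0009, 0.00005], max_e_hidden=0.001, max_e_output=0.0001))  # True
```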
3.3 Parameters used

The following parameters are used to accomplish the simulation of these two experiments for the given training set of handwritten characters of 'Marathi' script.

Genetic algorithm with backpropagated error. The parameters of the genetic algorithm with backpropagated error for the simulation of both experiments are as follows:

Parameter | Value
Learning rate for output layer ($\eta_O$) | 0.01
Learning rate for first hidden layer ($\eta_{H1}$) | 0.01
Learning rate for second hidden layer ($\eta_{H2}$) | 0.1
Momentum term ($\alpha$) | 0.9
Adaption rate (K) | 3.0
Mutation population size | 3
Crossover population size | 1000
Initial population | Randomly generated values between 0 and 1
Fitness evaluation function (one fitness function) | Backpropagated instantaneous squared error $E_l = \frac{1}{2}\sum_{k=1}^{K}\bigl(d_k - S_k(y_k^O)\bigr)^2$
Minimum error (MAXE) | 0.00001

Table 1: Parameters used for genetic algorithm with backpropagated error

Genetic algorithm with distributed error. The parameters used in the simulation of both experiments for the genetic algorithm with descent gradient learning for distributed error are as follows:

Parameter | Value
Learning rate for output layer ($\eta_O$) | 0.01
Learning rate for hidden layers ($\eta_{H1}$ & $\eta_{H2}$) | 0.1
Momentum term for output layer ($\alpha$) | 0.9
Momentum term for hidden layers ($\alpha$) | 0.7
Adaption rate (K) | 3.0
Minimum error for the output layer (MAXE_O) | 0.0001
Minimum error for the hidden layers (MAXE_H) | 0.001
Mutation probability | Smaller than 0.01
Mutation population size for sub-chromosome of output layer | 3
Mutation population size for sub-chromosomes of hidden layers | 3 each
Crossover population size for output layer | 1000
Crossover population size for first hidden layer | 1000
Crossover population size for second hidden layer (16-5-5-5 architecture) | 500
Crossover population size for hidden layer (16-5-5 architecture) | 1000
Number of iterations prior to applying GA | 5000
Initial population | Values of weights & biases in each sub-chromosome after 5000 iterations of descent gradient for distributed error
Fitness evaluation functions (two fitness functions for the 16-5-5 architecture, three for the 16-5-5-5 architecture) | Distributed instantaneous sum of squared errors $E_l^O$, $E_l^{H1}$, $E_l^{H2}$ as in Equations 15-17

Table 2: Parameters used for descent gradient learning with distributed error
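To tie the pieces together, here is a hedged end-to-end sketch of the hybrid procedure as described in Sections 2.1 and 3: run the descent gradient of distributed error for a fixed number of iterations, seed the GA's per-layer sub-chromosome populations with the resulting suboptimal weights, then apply mutation and per-layer selection until every layer's error reaches its threshold (crossover is omitted here for brevity). The function names, generation budget and random seed are assumptions, not the authors' code:

```python
import numpy as np

def sigmoid(y):
    return 1.0 / (1.0 + np.exp(-y))

def layer_outputs(a, weights, biases):
    outs, s = [], a
    for W, b in zip(weights, biases):
        s = sigmoid(W @ s + b)
        outs.append(s)
    return outs

def layer_errors(a, d, weights, biases):
    # Distributed instantaneous mean square error, one value per layer.
    return [0.5 * np.sum((d - s) ** 2) for s in layer_outputs(a, weights, biases)]

def hybrid_train(a, d, sizes=(16, 5, 5), eta=0.1, alpha=0.7,
                 n_gradient=5000, n_ga=200, thresholds=(0.001, 0.0001), seed=0):
    rng = np.random.default_rng(seed)
    weights = [rng.normal(scale=0.1, size=(o, i)) for i, o in zip(sizes, sizes[1:])]
    biases = [np.zeros(o) for o in sizes[1:]]
    prev = [np.zeros_like(W) for W in weights]

    # Phase 1: descent gradient of the distributed error (suboptimal starting point).
    for _ in range(n_gradient):
        s_in = a
        for i, (W, b) in enumerate(zip(weights, biases)):
            s_out = sigmoid(W @ s_in + b)
            delta = (d - s_out) * s_out * (1.0 - s_out)
            dW = eta * np.outer(delta, s_in) + alpha * prev[i]
            W += dW
            b += eta * delta
            prev[i] = dW
            s_in = s_out

    # Phase 2: GA per layer, seeded with the suboptimal weights (elitism + mutation).
    def err_of_layer(i, W, b):
        trial_W, trial_b = list(weights), list(biases)
        trial_W[i], trial_b[i] = W, b
        return layer_errors(a, d, trial_W, trial_b)[i]

    for _ in range(n_ga):
        errs = layer_errors(a, d, weights, biases)
        limits = (thresholds[0],) * (len(errs) - 1) + (thresholds[1],)
        if all(e <= t for e, t in zip(errs, limits)):
            break                                     # multi-objective convergence
        for i, (W, b) in enumerate(zip(weights, biases)):
            genes = np.concatenate([W.ravel(), b])    # sub-chromosome of layer i
            population = [genes.copy()]               # elitism
            for _ in range(3):                        # mutation population size 3
                child = genes.copy()
                child[rng.integers(len(child))] += rng.uniform(-1, 1)
                population.append(child)
            best = min(population, key=lambda g: err_of_layer(
                i, g[:W.size].reshape(W.shape), g[W.size:]))
            weights[i], biases[i] = best[:W.size].reshape(W.shape), best[W.size:]
    return weights, biases, layer_errors(a, d, weights, biases)

rng = np.random.default_rng(1)
a_l, d_l = rng.random(16), np.array([0, 1, 0, 0, 1], dtype=float)
# Budgets reduced here only so the demonstration runs quickly.
_, _, final_errors = hybrid_train(a_l, d_l, n_gradient=500, n_ga=50)
print(final_errors)
```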
4. Results and Discussion

The results from the simulation design and implementation for both neural network architectures, i.e. 16-5-5-5 and 16-5-5, are considered for 65 training sample examples of handwritten 'Marathi' script with the two hybrid techniques. The techniques used are the genetic algorithm with descent gradient of the backpropagated instantaneous mean square error and the genetic algorithm with descent gradient of the distributed instantaneous mean square error. The performance of both neural network architectures has been evaluated with these two hybrid learning techniques for the given training set, and a performance analysis has also been carried out. In this performance analysis it has been found that the 16-5-5-5 neural network architecture performed more optimally in terms of convergence, number of epochs and number of optimal solutions for the classification of patterns in the training set. The performance of the 16-5-5-5 architecture is also found to be efficient and more generalized for the test pattern set. The results of the performance evaluation are shown in Tables 5 and 6. The table entries present the mean values of iterations and the number of converged weight matrices over five trials with each hybrid technique for the given training set.
target pattern is used by each layer with their respective The network is converged only when both the objective different computed actual output pattern. Therefore in functions find their defined minimum error threshold. this approach the convergence for the given training IJoART Similarly in the neural network of 16-5-5-5 architecture samples is considered only when three different error we have the three different objective functions and the functions are minimized simultaneously. Hence, the network is converged only when all the three objective optimum solution is constraints with three objectives functions find their defined minimum error threshold. functions and this reflects the case of multi objective Thus, the performance of neural networks for descent optimization instead of single objective optimization as gradient of instantaneous mean square distributed error in the case of descent gradient of instantaneous mean considers as the multi-objective optimization. On the square backpropagated error. Therefore on the basis of other hand the GA with simulation descent gradient of instantaneous mean square back-propagated error results & analysis the following observations can be drawn: considers only one objective i.e. one common error function for objective function for all the layers. So 1. It can observe that the performance of GA with that, number of optimal solutions or counts are descent gradient of distributed error for multi reflecting only the converged weight matrices or objective optimization is better in most of the cases optimal weight matrices to obtain only one minimum of than GA with descent gradient of backpropagated error. Thus, it exhibits the case of single objective error for single optimization in terms of number of optimization. It can be seen from the result of Table 5 optimize solutions or counts. This is obvious that & 6 that the performance of neural network architecture number of iteration for GA with descent gradient with descent gradient of instantaneous mean square of distributed error are more because the in this distributed error for multi objective optimization is method there are three objective functions and all approximately same as GA with descent gradient with of them should minimize for the optimal solution. back-propagated error for single objective optimization 2. It can also see from the results that the behavior of GA with descent gradient of distributed error is Copyright © 2013 SciResPub. IJOART International Journal of Advancements in Research & Technology, Volume 2, Issue 6, June-2014 ISSN 2278-7763 107 more consistent & exhibiting less randomness in with different methods of image processing for compare to GA of feature extraction from the handwritten curve backpropagated error. There is also another scripts. These aspects can consider for future work interesting observation about the performance of to evaluate the performance for propose method on neural networks for GA with descent gradient of various problem domain. with descent gradient distributed error for the number of counts and iterations for the new pattern information and for References the same pattern information with different examples. Every time for the same pattern approach”, New Delhi: Tata McGraw-Hill counts are more & number of iteration are less and (2004) [2] Sun, Y., “Hopfield neural network based number of iterations are high. 
So that when we algorithms move from one unknown local error minimum to reconstruction-Part another unknown local error minimum there is less Simulations”, IEEE Transaction on Signal number of optimum solutions and it requires more Process vol. 48(7), pp. 2105-2118 (2000) [3] number of iterations to converge. for image I: restoration and Algorithms and Szu, H., Yang, X., Telfer, B. and Sheng, Y., IJoART Generally the GA starts form the random solutions “Neural network and wavwlet transform for and converge towards the optimal solution. Hence scale invariant data classification”, Phys. Rev. in multi objective optimization the randomness of E 48, pp. 1497-1501 (1993) GA more increases and possibility to obtain [4] Nagy, G., “Classification Algorithms in optimal solution decreases. In the proposed Pattern Recognition,” IEEE Transactions on technique, the GA does not start from random Audio and Electroacoustics, vol. 16(2), pp. population of solutions but instead of this it starts 203-212 (1968) from the sub-optimal solutions, because the GA is [5] Hoppensteadt, F.C. and Ihikevich, E.M., applied after the some iteration of descent gradient “Synchronization of instantaneous mean square distributed error. Associative These Neurocomputing,” Phys. Rev., vol. 62(E), pp. iterations explore the direction for GA starts from sub-optimal solutions and moves of Memory, Laser Oscillators, and Optical 4010-4013 (2000) convergence and from here the GA starts. Thus, 4. Kumar, S., “Neural Networks: A Class room information with different examples the number of for new pattern information these counts are low & 3. [1] [6] Keith, L.P., “Classification Of Cmi energy towards the optimal solutions. levels The multi objective optimization is a dominate networks,” Phys. Rev., vol. 41(A), pp. 2457- thrust area in soft computing research. There are 2461 (1990) various real world problems where multi objective [7] using counterpropagation neural Carlson, J.M., Langer, J.S. and Shaw, B.E., optimization is required. The proposed method “Dynamics of earthquake faults,” Reviews of may explore the possibility to achieve the optimal modern physics, vol. 66(2), pp. 657-670 solutions for various problems of multi objective (1994) optimization. The performance of GA with descent gradient of distributed error can be more improved Copyright © 2013 SciResPub. [8] Palaniappan, R., “Method of identifying individuals using VEP signals and neural IJOART International Journal of Advancements in Research & Technology, Volume 2, Issue 6, June-2014 ISSN 2278-7763 [9] [10] networks,” IEE Proc. Science Measurement Inst. Electron. Commun. Eng., vol.65(E), pp. and Technology, vol. 151(1), pp. 16-20 (2004) 107-114 (1982) Zhao, H., “Designing asymmetric neural [18] and Lam, L., “Computer recognition of Rev. vol. 70(6), pp. 137-141 (2004) unconstrained handwritten numerals,” Proc. Schutzhold, R., “Pattern recognition on a IEEE, vol. 80(7), pp. 1162-1180 (1992) [19] [13] “Handwritten digit recognition by neural Impedovo, S., “Fundamentals in Handwriting networks with single-layer training,” IEEE Recognition.” Trans. on Neural Networks, vol. 3, pp. 962- NATO-Advanced Study 968 (1992) Mori, S., Suen, C.Y. and Yamamoto, K., [20] Neural Network Architecture for Visual development,” Proceeding of the IEEE, vol.80 Pattern Recognition,” IEEE Trans. on Neural (7), pp. 1029-1058 (1992) Networks, vol. 8(2), pp. 331-340 (1997) Fukushima, K. and Wake, N., “Handwritten [21] inverting a deformable template model of IJoART handwritten digits,” Proc. Int. Conf. 
Artificial Networks, vol. 2(3), pp. 355-365 (1991) Neural Networks, Sorrento, Italy, pp. 961-964 Blackwell, K.T., Vogl, T.P., Hyman S.D., (1994) Approach to Handwritten [22] Co., Boston, MA (1996) [23] Rumelhart, D.E., Hinton G.E., and Williams 655-666 (1992) Boser, B., Denkar, J.S., R.J., “Learning internal representations by Henderson, D., Howard, R.E., Hubbard, W., error propagation.”, MIT Press, Cambridge, and vol. 1,pp. 318–362 (1986). Ie Cun, Y., Jackel, Recognition L.D., “Handwritten with a Digit Back-Propagation [24] Sprinkhuizen-Kuyer, I.G., and Boers, E.J.W., Network,” Advances in Neural Information “The Local Minima of the error surface of the Processing Systems, vol. 2, pp. 396-404 2-2-1 XOR network,” Annals of Mathematics (1990) and Artificial Intelligence, vol. 25(1-2), pp. Kharma, N.N., and Ward, R.K., “A novel 107-136 (1999) invariant mapping applied to hand-written [25] Zweiri, Y.H., Seneviratne, L.D., and Pattern Althoefer, K., “Stability Analysis of a Three- Recognition vol. 34(11), pp. 2115-2120 Term Backpropagation algorithm,” Neural (2001) Networks Journal, vol. 18(10), pp. 1341-1347 Arabic [17] Hagan, M.T., Demuth, H.B. and Beale, M.H., “Neural Network Design,” PWS Publishing Character Recognition,” Pattern Recognition vol. 25, pp. [16] Urbanczik, R., “A recurrent neural network neocognitron,” IEEE transaction on Neural Barbour, G.S. and Alkon, D.L., “A New [15] Lee, S.W., and Song, H.H., “A New Recurrent “Historical review of OCR research and alphanumeric character recognition by the [14] Knerr, S., Personnaz, L., and Dreyfus, G., pp. 311-316 (2003) Institute, vol. 124, Springer-Verlag (1994) [12] Suen, C.Y., Nadal, C., Lagault, R., Mai, T.A., networks with associative memory,” Phys. quantum computer,” Phys. Rev. vol. 67(A), [11] 108 Badi, character K. and recognition,” Shimura, M., (2005) “Machine recognition of Arabic cursive script,” Trans. [26] Abarbanel, H., Talathi, S., Gibb, L., and Rabinovich, M., “Synaptic plasticity with Copyright © 2013 SciResPub. IJOART International Journal of Advancements in Research & Technology, Volume 2, Issue 6, June-2014 ISSN 2278-7763 109 discrete state synapses,” Phys. Rev., vol. E, 72:031914 (2005) [27] Shrivastava, S. and Singh, M.P., “Performance evaluation of feed-forward neural network with soft computing techniques for hand written English alphabets”, Journal of Applied Soft Computing, vol. 11, pp. 1156-1182 (2011) IJoART Copyright © 2013 SciResPub. IJOART