International Journal of Advancements in Research & Technology, Volume 2, Issue 6, June-2014
ISSN 2278-7763
Pattern Classification for Handwritten Marathi Characters Using Gradient Descent of Distributed Error with Genetic Algorithm for Multilayer Feed Forward Neural Networks
Holkar Shrirang Raosaheb
Shri Venkateshwara University, Gajraula, Amroha (Uttar Pradesh), India
(e-mail: shrirangholkar@rediff.com)

Dr. Manu Pratap Singh
Institute of Computer & Information Science, Dr. B.R. Ambedkar University, Agra-282002, Uttar Pradesh, India
(e-mail: manu_p_singh@hotmail.com)
Abstract: In this paper the performance of a feedforward neural network trained with gradient descent of distributed error and a genetic algorithm is evaluated for the recognition of handwritten characters of the 'Marathi' script. The performance index for the multilayer feedforward neural network is considered here with a distributed instantaneous unknown error, i.e. a different error for each layer. The genetic algorithm is applied to make the search more efficient in determining the optimal weight vector from the population of weights. The genetic algorithm operates on the distributed error, and its fitness function is likewise the mean of the squared distributed error, which differs from layer to layer. Convergence is therefore obtained only when the minimum of each of these different errors is reached. The performance evaluation shows that the proposed method of gradient descent of distributed error with a genetic algorithm, referred to as a hybrid distributed evolutionary technique for the multilayer feedforward neural network, performs better in terms of accuracy, epochs and number of optimal solutions for the given training and test pattern sets of this pattern recognition problem.

Keywords: Hybrid evolutionary distributed technique, multilayer feedforward neural network, gradient descent.
1. Introduction
Pattern recognition is an emerging area of machine learning and machine intelligence. The problem of pattern recognition has been approached in many ways, one of the most popular being pattern classification. Pattern classification is the problem of making a machine distinguish different input stimuli into meaningful categories according to the features present in those inputs. This categorization may follow predefined classes, depending upon the nature of the problem. Pattern recognition and its applications have been studied for a long period of time, and various methods have been proposed to accomplish the task of pattern classification [1-10]. The recognition of handwritten cursive script, in the form of character classification and character association, has been considered a dominant area in the field of pattern recognition with machine learning techniques [11, 12].
Soft computing techniques have been identified as a powerful tool to perform pattern recognition for handwritten cursive script in the domain of machine learning [13-15]. Neural network techniques and evolutionary search methods have been used in various forms of hybrid evolutionary algorithms to accomplish the pattern classification of handwritten cursive scripts of many languages [16-18]. The feedforward multilayer neural network with gradient descent of backpropagated error is widely used for generalized pattern classification [19]. Analysis of this architecture with the generalized delta learning rule (backpropagation learning) has highlighted both its performance and its limitation, due to the unavailability of additional information for the units of the output layer, for handwritten character recognition [20].
Therefore the recurrent neural network in the form of the backpropagation through time model (BPTT) offers a suitable framework for reusing the output values of the neural network during training; it exhibited promising performance, but only for dynamic patterns, and proved inefficient for static patterns [21]. It was later shown that the feedforward multilayer neural network with an enhanced and extended version of the backpropagation learning algorithm [22] is more suitable for handling complex pattern classification or recognition tasks, in spite of its inherent problems of local minima, slow rate of convergence and lack of a convergence guarantee [23-27].

It has been found that, to overcome the problems of gradient descent search in a large search space, as in the case of a complex pattern recognition task with a multilayer feedforward neural network, an evolutionary search algorithm, i.e. the genetic algorithm (GA), is a better alternative [28]. The reason is quite obvious: this search technique is free from derivatives, and it evolves a population of possible partial solutions and applies a natural selection process to filter them until the globally optimal solution is found [29]. Various prominent results have been reported in the literature for the generalized classification of handwritten English characters using the integration of a genetic algorithm with the backpropagation learning rule for a multilayer feedforward neural network architecture [30, 11]. In that approach the fitness of the weights is evaluated with the backpropagated error of the current input pattern vector, so the performance of the network still depends upon a backpropagated, instantaneous, random and unknown error.

In this paper the performance of a feedforward neural network with gradient descent of distributed error and a genetic algorithm is evaluated for the recognition of handwritten characters of the 'Marathi' script. The performance index for the multilayer feedforward neural network is considered here with a distributed instantaneous unknown error, i.e. a different error for each layer. The genetic algorithm is applied to make the search process more efficient in determining the optimal weight vector from the population of weights. The genetic algorithm operates on the distributed error, and the fitness function of the genetic algorithm is also taken as the mean of the squared distributed error, which is different for each layer. Hence convergence is obtained only when the minimum of each of these different errors is determined. The instantaneous squared error is thus not the same for every layer; it is different for each layer and is treated as a distributed error for the multilayer feedforward neural network, in which the numbers of units in the hidden layers and the output layer are equal. The same desired output pattern for a presented input pattern is therefore distributed to every unit of the hidden layers and the output layer, each of which produces different actual outputs, so each layer has a different squared error. Thus the instantaneous error is distributed rather than backpropagated. The proposed hybrid evolutionary technique, i.e. gradient descent of distributed error with a genetic algorithm, is used to train the multilayer neural network architecture for the generalized classification of handwritten 'Marathi' script.

The rest of the paper is organized as follows. Section 2 presents the generalized gradient descent method for the instantaneous distributed error and the implementation of the genetic algorithm in a generalized way with distributed error. Section 3 explores the architecture and simulation design for the proposed method. Section 4 presents the results and discussion. Section 5 presents the conclusion, followed by the references.
2. Generalized descent gradient learning for distributed square error

A multilayer feedforward neural network with at least two intermediate layers, commonly known as hidden layers, in addition to the input and output layers, can perform any complex pattern classification task. The generalized delta learning rule [23] is a very common and widely used technique to train multilayer feedforward neural networks for pattern classification and pattern mapping. In this learning the optimum weight vector may be obtained for the given training set if the weights are adjusted in such a way that gradient descent is made along the total error surface in the weight space. The error being minimized is actually not the least mean square error for the entire training set but the instantaneous squared error for each presented pattern at each time. Thus, for every pattern at each time there is an unknown local error, and the weights are updated incrementally for each local error. Each time, the weights are updated to minimize this local error by propagating it back from the output layer to all hidden layers. The instantaneous error for each presented input pattern, i.e. the squared difference between the desired pattern vector and the actual output of the units of the output layer, is thus backpropagated to the units of the hidden layers.

In the current work we consider a distributed error instead of the backpropagated error. The instantaneous squared error is not the same for each layer, because each layer has its own actual output pattern vector. For each layer the instantaneous squared error is therefore computed as the squared difference between the desired output pattern vector for the given input sample of the training set and the actual output pattern vector of the respective layer. This distributed instantaneous squared error imposes a constraint on the architecture of the multilayer feedforward neural network: the numbers of units in the output layer and in the hidden layers must be the same, so that the desired output pattern for the presented input pattern can be accommodated conveniently by each layer. Thus, for every hidden layer and for the output layer we have a different squared error. The optimum weight vector can therefore be obtained for each layer if the weights are adjusted in such a way that gradient descent is made along the instantaneous squared error of that layer. This means that we have more than one objective function, or minimum error, one for each layer except the input layer, for the presented pattern, which makes this a multi-objective optimization problem. The objective here is to obtain the minimum of each instantaneous squared error simultaneously in order to determine the optimum weight vector for the presented input pattern. Therefore, the mean of the instantaneous squared error of a layer is used to update the weights of that layer, and the gradient descent of each error for each layer is obtained at the same time. Hence there is more than one gradient descent at one time, one per individual error for the presented input pattern, depending on the number of hidden layers, and the updates of the weight vectors for the units of the hidden layers and of the output layer are proportional to their corresponding gradients. There is thus a different gradient for each layer, and the optimal weight changes are proportional to the gradient descent of the distributed instantaneous mean square errors for the presented input pattern. The generalized method for obtaining the weight updates for the hidden layers and the output layer is formulated as follows.

Let $(a_l, d_l)$ for $l = 1, 2, \ldots, L$ be the current input pattern vector of a training set of $L$ pattern samples presented to the multilayer feedforward neural network for formulating the generalized descent gradient of the instantaneous square distributed error.
As we have already discussed, the constraint of this multilayer feedforward neural network is to keep the number of units in the hidden and output layers the same, as shown in Figure 1.

Fig. 1: Multilayer Feed Forward Neural Network Architecture

The current random sample pattern $(a_l, d_l)$ of the training set defines the instantaneous error vector $e_l^O$ at the output layer and $e_l^H$ at the hidden layer as:

$e_l^O = d_l - S(y_l^O) = \big(d_{1l} - S_1(y_{1l}^O), \ldots, d_{Kl} - S_K(y_{Kl}^O)\big)$   (1)

$e_l^H = d_l - S(y_l^H) = \big(d_{1l} - S_1(y_{1l}^H), \ldots, d_{Kl} - S_K(y_{Kl}^H)\big)$   (2)

Therefore the instantaneous distributed mean square errors for the output and hidden layer are defined respectively as:

$E_l^O = \frac{1}{2}\sum_{k=1}^{K}\big[d_{kl}^O - S_k(y_{kl}^O)\big]^2$   (3)

And,

$E_l^H = \frac{1}{2}\sum_{j=1}^{J}\big[d_{jl}^H - S_j(y_{jl}^H)\big]^2$   (4)

Hence, the update in the weight for the $k$-th unit of the output layer at iteration $t$ for the current input pattern vector is represented as:

$\Delta w_{kj}^l(t) = -\eta_{kj}\,\dfrac{\partial E_l^O}{\partial w_{kj}}$   (5)

And the update in the weight for the $j$-th unit of the hidden layer at iteration $t$ for the same current pattern is represented as:

$\Delta w_{ji}^l(t) = -\eta_{ji}\,\dfrac{\partial E_l^H}{\partial w_{ji}}$   (6)

Here $\eta_{kj}$ and $\eta_{ji}$ are the learning rates for the output and hidden layer respectively.

Now, applying the chain rule to Equation 5, we have:

$\Delta w_{kj}^l(t) = -\eta_{kj}\,\dfrac{\partial E_l^O}{\partial y_{kl}^O}\,\dfrac{\partial y_{kl}^O}{\partial w_{kj}}$

Here the activation value is $y_{kl}^O = \sum_{j=1}^{J} S_j(y_{jl}^H)\, w_{kj}$ and the output signal is $S_k(y_{kl}^O) = f(y_{kl}^O) = \dfrac{1}{1 + e^{-y_{kl}^O}}$.

Or,

$\Delta w_{kj}^l(t) = -\eta_{kj}\,\dfrac{\partial E_l^O}{\partial S_k(y_{kl}^O)}\,\dfrac{\partial S_k(y_{kl}^O)}{\partial y_{kl}^O}\, S_j(y_{jl}^H)$
so that

$\Delta w_{kj}^l(t) = -\eta_{kj}\,\dfrac{\partial E_l^O}{\partial S_k(y_{kl}^O)}\, S_k(y_{kl}^O)\big(1 - S_k(y_{kl}^O)\big)\, S_j(y_{jl}^H)$

Now, from Equation 3 we have:

$\dfrac{\partial E_l^O}{\partial S_k(y_{kl}^O)} = -\sum_{k=1}^{K}\big[d_{kl} - S_k(y_{kl}^O)\big]$

Hence we have:

$\Delta w_{kj}^l(t) = \eta_{kj}\sum_{k=1}^{K}\big[d_{kl} - S_k(y_{kl}^O)\big]\, S_k(y_{kl}^O)\big(1 - S_k(y_{kl}^O)\big)\, S_j(y_{jl}^H)$   (7)

Thus, the weight at iteration $(t+1)$ for the units of the output layer with the momentum term is presented as:

$w_{kj}^l(t+1) = w_{kj}^l(t) + \eta_{kj}\sum_{k=1}^{K}\delta_{kl}^O\, S_k(y_{kl}^O)\big(1 - S_k(y_{kl}^O)\big)\, S_j(y_{jl}^H) + \alpha\,\Delta w_{kj}^l(t)$   (8)

with $\delta_{kl}^O = d_{kl} - S_k(y_{kl}^O)$. Here the momentum rate constant $\alpha$ is considered with $0 \le \alpha \le 1$ for the output layer.

Similarly, applying the chain rule to Equation 6, we have:

$\Delta w_{ji}^l(t) = -\eta_{ji}\,\dfrac{\partial E_l^H}{\partial w_{ji}} = -\eta_{ji}\,\dfrac{\partial E_l^H}{\partial y_{jl}^H}\,\dfrac{\partial y_{jl}^H}{\partial w_{ji}} = -\eta_{ji}\,\dfrac{\partial E_l^H}{\partial S_j(y_{jl}^H)}\, S_j(y_{jl}^H)\big(1 - S_j(y_{jl}^H)\big)\, a_i$

Now, from Equation 4 we have:

$\dfrac{\partial E_l^H}{\partial S_j(y_{jl}^H)} = -\sum_{j=1}^{J}\big[d_{kl} - S_j(y_{jl}^H)\big]$

Hence we have:

$\Delta w_{ji}^l(t) = \eta_{ji}\sum_{j=1}^{J}\big[d_{kl} - S_j(y_{jl}^H)\big]\, S_j(y_{jl}^H)\big(1 - S_j(y_{jl}^H)\big)\, a_i$   (9)

Thus, the weight at iteration $(t+1)$ for the units of the hidden layer with the momentum term is presented as:

$w_{ji}^l(t+1) = w_{ji}^l(t) + \eta_{ji}\sum_{j=1}^{J}\delta_{jl}^H\, S_j(y_{jl}^H)\big(1 - S_j(y_{jl}^H)\big)\, a_i + \alpha\,\Delta w_{ji}^l(t)$   (10)

with $\delta_{jl}^H = d_{kl} - S_j(y_{jl}^H)$. Here the momentum rate constant $\alpha$ is considered with $0 \le \alpha \le 1$ for the hidden layer.

An interesting observation concerns the number of terms appearing in the expression for the weight update of the hidden layer. It can be seen from Equation 9 that fewer terms are involved than in the hidden-layer weight update of the backpropagation learning rule for the backpropagated instantaneous mean square error. Less computation is therefore required for the weight update under gradient descent of the distributed instantaneous mean square error, and faster convergence can be expected with respect to the conventional generalized delta learning rule with backpropagated error.
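To make the distributed-error update concrete, the following sketch applies one step in the spirit of Equations (7)-(10) to a single input pattern of a network with one hidden layer whose hidden and output layers both have K units, as required by the constraint above. It is a minimal illustration and not the simulation code of this paper; the function names, array shapes, learning rates and momentum handling are assumptions made for the example.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def distributed_error_update(a, d, W_h, b_h, W_o, b_o,
                             eta_h=0.01, eta_o=0.01, alpha=0.9, prev=None):
    """One update step for a single input pattern.

    a : input pattern (I,); d : desired output (K,).  The same target is
    used by the hidden and the output layer, so both layers have K units.
    W_h : (K, I) hidden weights, b_h : (K,) hidden biases
    W_o : (K, K) output weights, b_o : (K,) output biases
    prev : (dW_o, dW_h) of the previous step, used for the momentum term.
    """
    s_h = sigmoid(W_h @ a + b_h)           # hidden outputs S_j(y^H_jl)
    s_o = sigmoid(W_o @ s_h + b_o)         # output outputs S_k(y^O_kl)

    # distributed layer errors, Eqs. (3) and (4)
    E_o = 0.5 * np.sum((d - s_o) ** 2)
    E_h = 0.5 * np.sum((d - s_h) ** 2)

    # Eq. (7): output-layer change driven by the output-layer error only
    delta_o = (d - s_o) * s_o * (1.0 - s_o)
    dW_o = eta_o * np.outer(delta_o, s_h)

    # Eq. (9): hidden-layer change driven by the hidden-layer error; the
    # target is compared with the hidden output directly, nothing is
    # backpropagated through the output weights
    delta_h = (d - s_h) * s_h * (1.0 - s_h)
    dW_h = eta_h * np.outer(delta_h, a)

    # Eqs. (8) and (10): momentum term from the previous update
    if prev is not None:
        dW_o = dW_o + alpha * prev[0]
        dW_h = dW_h + alpha * prev[1]

    W_o = W_o + dW_o
    b_o = b_o + eta_o * delta_o
    W_h = W_h + dW_h
    b_h = b_h + eta_h * delta_h
    return (W_h, b_h, W_o, b_o), (dW_o, dW_h), (E_h, E_o)
```

Calling this repeatedly for a pattern drives both layer errors down simultaneously, which is exactly the multi-objective behaviour discussed above.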
2.1 Genetic algorithm with descent gradient of distributed error

The majority of GA implementations are derivatives of Holland's innovative specification. In our approach the genetic algorithm is incorporated with gradient descent of the distributed instantaneous mean square error learning in the multilayer feedforward neural network architecture for generalized pattern classification. The input pattern vector with its corresponding output pattern vector from the training set is presented to the neural network. The neural network, with its current setting of weights, obtains the actual output of each unit of the hidden layers and the output layer. The distributed instantaneous mean square error is computed and the proposed gradient descent learning rule for the distributed error is applied for up to some fixed, arbitrary number of iterations n. Thus the weights between the layers and the bias values of the units are updated for up to n iterations for the given input pattern and improved from their initial state. After this the weight-update iterations stop and the genetic algorithm is employed to evolve the population of modified weights and bias values. The genetic algorithm is applied to obtain the optimal weight vector from the large weight space for the given training set using the following three elements:

(i) the genetic code for representing the weight vector in the form of a chromosome;
(ii) the technique for evolving the population of weight vectors;
(iii) the fitness function for evaluating the performance of an evolved weight vector.

A lot of work has been reported on the evaluation of neural networks with genetic algorithms [24]. The majority of this work indicates that the integration of a genetic algorithm with a neural network occurs at the following three levels [25]:

(i) connection weights;
(ii) architectures;
(iii) learning rules.

The evaluation of weight vectors for the neural network is the aspect considered in the approach of the current work. In this approach the genetic algorithm uses a different fitness evaluation function for each layer: the distributed instantaneous mean square error of a layer is taken as the fitness evaluation function for that layer. Generally a GA starts from a random initial solution and then converges to the optimal solution. In our approach the GA is applied after the weights have been updated for n iterations, so the initial population of solutions for the GA is not random; instead, the initial population of weights is suboptimal, because the weights have already been updated in the direction of convergence. Thus the GA explores from a suboptimal solution to a multi-objective optimal solution for the given problem. The multi-objective optimal solution reflects the fact that every layer except the input layer has its own error surface, i.e. its own objective function.

Chromosome Representation

A chromosome is a collection of genes, each representing either a weight value or a bias value as a real number. The initial population of weights and biases for the basic or initial chromosome in our method is not random; instead, the initial chromosome consists of suboptimal values of the weights and biases. The chromosome is therefore represented as a matrix of real numbers for the set of weight values and bias values. As already discussed, in our proposed multilayer neural network architecture the error is considered as a distributed instantaneous mean square error, i.e. a different error for each layer. Hence the chromosome is partitioned into sub-chromosomes corresponding to each layer, hidden layer and output layer. As per our general architecture of the neural network shown in Figure 1, there are two sub-chromosomes. The first sub-chromosome contains $(i \times j + j)$ genes and the second contains $(j \times k + k)$ genes. Thus the number of sub-chromosomes depends upon the number of hidden layers, and the number of genes of a sub-chromosome is fixed by its layer, though the values of the genes may differ in each sub-chromosome.
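A minimal sketch of this encoding, under the assumption that a sub-chromosome is simply the flattened weight matrix of a layer with its bias vector appended; the helper names are illustrative and not taken from the paper.

```python
import numpy as np

def encode_subchromosome(W, b):
    """Flatten one layer's weights and biases into a sub-chromosome of
    (inputs*units + units) real-valued genes."""
    return np.concatenate([W.ravel(), b.ravel()])

def decode_subchromosome(genes, n_inputs, n_units):
    """Recover the weight matrix and bias vector from a sub-chromosome."""
    W = genes[:n_inputs * n_units].reshape(n_units, n_inputs)
    b = genes[n_inputs * n_units:]
    return W, b

# Example for the network of Figure 1 with 16 inputs and 5 units per layer:
# the hidden sub-chromosome has 16*5 + 5 = 85 genes and the output
# sub-chromosome has 5*5 + 5 = 30 genes, matching the counts used later.
rng = np.random.default_rng(0)
W_h, b_h = rng.normal(size=(5, 16)), rng.normal(size=5)
W_o, b_o = rng.normal(size=(5, 5)), rng.normal(size=5)
chrom_h = encode_subchromosome(W_h, b_h)   # 85 genes
chrom_o = encode_subchromosome(W_o, b_o)   # 30 genes
```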
The Mutation Operator

The mutation operator randomly selects a gene from a chromosome and modifies it with some random value to generate the next population of chromosomes. The probability of mutation is kept low to minimize the randomness of the genetic algorithm. In our approach the mutation operator is applied to each sub-chromosome: it randomly selects a gene from each sub-chromosome and adds a small random value between -1 and +1 to generate the next population of sub-chromosomes. Let the chromosome for the network be $C^N$, partitioned into the two sub-chromosomes $C_H^N$ and $C_O^N$ for the hidden layer and the output layer. $C_H^N$ contains $(i \times j + j) = m_H$ genes, while $C_O^N$ contains $(j \times k + k) = m_O$ genes. The sizes of the next generated populations are $N_H + 1$ and $N_O + 1$ respectively. If the mutation operator is applied n times over the old sub-chromosomes of the output layer and the hidden layer respectively, then we have the following new populations of sub-chromosomes [26]:

$C_H^{new} = C_H^{old} \cup \bigcup_{i=1}^{n}\big[C_{H,m_H}^{old} + \varepsilon_H\big(C_{H,\mu_H}^{old}\big)\big]$   (11)

And

$C_O^{new} = C_O^{old} \cup \bigcup_{i=1}^{n}\big[C_{O,m_O}^{old} + \varepsilon_O\big(C_{O,\mu_O}^{old}\big)\big]$   (12)

Here $\varepsilon_H$ and $\varepsilon_O$ are the small randomly generated values between -1 and +1 for the sub-chromosomes of the hidden layer and output layer respectively, $\mu_H$ and $\mu_O$ are the randomly selected genes of the old sub-chromosomes $C_H^{old}$ and $C_O^{old}$ respectively, and $C_H^{new}$ and $C_O^{new}$ are the next populations of sub-chromosomes for the hidden and output layer respectively. The inner operator prepares a new sub-chromosome at each iteration of mutation and the outer operator builds the new populations of sub-chromosomes, called $C_H^{new}$ and $C_O^{new}$.

Elitism

Elitism is used with the creation of each new population to carry the good old population into the next generation. Its significance is that a good solution of the previous population should not be lost through the application of the genetic operators. It involves copying the best encoded network unchanged into the new population, as given in Equations 11 and 12, by including $C_H^{old}$ and $C_O^{old}$ when creating $C_H^{new}$ and $C_O^{new}$.

Selection

The selection process of the genetic algorithm selects a good, or fit, population from the newly generated population. Here the selection process simultaneously considers the newly generated sub-chromosomes of the hidden layer and the output layer, i.e. $C_H^{new}$ and $C_O^{new}$ respectively, to select the good population for the next cycle. A sub-chromosome $C_H^{Sel}$ is selected from $C_H^{new}$ for which the distributed instantaneous mean square error of the hidden layer, i.e. $E_l^H$ for the pattern $l$, has reached its accepted minimum level. Likewise, a sub-chromosome $C_O^{Sel}$ is selected from $C_O^{new}$ for which the distributed instantaneous mean square error of the output layer, i.e. $E_l^O$ for the same pattern $l$, has reached its accepted minimum level.
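As a rough illustration of the mutation and selection steps above, the sketch below treats a sub-chromosome as a flat array of genes, as in the earlier encoding sketch. The single-gene perturbation, the elitist copy of the parent and the error-threshold test follow the description in the text, while the function names and the error evaluator passed in are assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

def mutate_population(parent, n):
    """Eqs. (11)-(12): keep the old sub-chromosome (elitism) and add n
    mutated copies, each with one randomly selected gene perturbed by a
    small random value in (-1, +1)."""
    population = [parent.copy()]
    for _ in range(n):
        child = parent.copy()
        gene = rng.integers(len(child))          # randomly selected gene
        child[gene] += rng.uniform(-1.0, 1.0)    # small random change
        population.append(child)
    return population

def select(population, layer_error, accepted_minimum):
    """Pick the sub-chromosome whose distributed instantaneous mean square
    error for the current pattern is smallest; report whether it has
    reached the accepted minimum level.  layer_error(genes) must return
    E_l^H or E_l^O for the corresponding layer."""
    errors = [layer_error(genes) for genes in population]
    best = int(np.argmin(errors))
    reached = errors[best] <= accepted_minimum
    return population[best], errors[best], reached
```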
Crossover

Crossover is a very important and useful operator of the genetic algorithm. Here the crossover operator considers the selected sub-chromosomes $C_H^{Sel}$ and $C_O^{Sel}$ and creates the next generation of the population separately for the hidden layers and the output layer. We apply the uniform crossover operator n times on the selected sub-chromosomes at different crossover points to obtain the next generation of the population. Let the selected sub-chromosomes $C_H^{Sel}$ and $C_O^{Sel}$ be considered for uniform crossover as shown in Figs. 2-4.

Fig. 4: After applying crossover operator

Therefore, on applying the crossover operator n times on the selected sub-chromosomes ($C_H^{Sel}$ and $C_O^{Sel}$), n+1 populations of sub-chromosomes each can be generated as [27]:

$C_H^{next} = C_H^{Sel} \cup \bigcup_{i=1}^{n}\big[\big(C_{\gamma,H}^{Sel} \otimes C_{\nu_H,H}^{Sel}\big) \cup \big(C_{\nu_H,H}^{Sel} \otimes C_{\gamma,H}^{Sel}\big)\big]$   (13)

And

$C_O^{next} = C_O^{Sel} \cup \bigcup_{i=1}^{n}\big[\big(C_{\gamma,O}^{Sel} \otimes C_{\nu_O,O}^{Sel}\big) \cup \big(C_{\nu_O,O}^{Sel} \otimes C_{\gamma,O}^{Sel}\big)\big]$   (14)

where $\gamma$ and $\nu_H$, $\nu_O$ are the randomly selected gene positions of the sub-chromosomes $C_H^{Sel}$ and $C_O^{Sel}$, $\otimes$ denotes the exchange of gene values at the selected crossover positions, and $C_H^{next}$ and $C_O^{next}$ are the next generations of the population of size n+1. Thus, after the crossover operation we have 2(n+1) total populations of chromosomes for the network, i.e. n+1 each for the hidden layer and for the output layer.

Fitness Evaluation Function

The fitness evaluation function of the genetic algorithm is used to evaluate the performance of the newly generated populations. It filters the populations found suitable as per the criteria of the fitness function. Here we use a separate fitness evaluation function for each layer; as per our neural network architecture, two fitness evaluation functions are used, one for the output layer and one for the hidden layer. The first fitness evaluation function estimates the performance of the sub-chromosomes of the hidden layer, i.e. $C_H^{next}$, and the second estimates the performance of the sub-chromosomes of the output layer, i.e. $C_O^{next}$. The fitness function used here is proportional to the sum of the distributed instantaneous mean squared error of the respective layer. The fitness function $f_H$ for the hidden layer considers the instantaneous mean square error specified in Equation 4 to evaluate the performance of the sub-chromosomes of the hidden layer, i.e. $C_H^{next}$. The fitness function $f_O$ for the output layer considers the instantaneous mean square error specified in Equation 3 to evaluate the performance of the sub-chromosomes of the output layer, i.e. $C_O^{next}$. Thus the genetic algorithm attempts to find the weight vectors and bias values of the different layers that minimize the corresponding instantaneous mean squared error.
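A minimal sketch of a uniform crossover step in the spirit of Equations (13)-(14). The paper applies the operator to the selected sub-chromosome of each layer at different crossover points; the sketch assumes a second parent of the same length (for instance the best sub-chromosome of the previous generation) and a random mask, which is one common realisation of uniform crossover rather than necessarily the paper's exact operator.

```python
import numpy as np

rng = np.random.default_rng(2)

def uniform_crossover(parent_a, parent_b, n):
    """Build a next-generation population of size n+1 for one layer: the
    selected parent is kept, and each of the n children takes every gene
    from parent_a or parent_b according to a random mask."""
    assert len(parent_a) == len(parent_b)
    population = [parent_a.copy()]
    for _ in range(n):
        mask = rng.random(len(parent_a)) < 0.5     # random crossover points
        population.append(np.where(mask, parent_a, parent_b))
    return population

# Applying this to the hidden-layer and output-layer selections gives
# n+1 sub-chromosomes per layer, i.e. 2(n+1) in total, as in the text.
```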
This procedure for evaluating the performance of the weight vectors of the hidden and output layer can be represented as:

min_errorH = 1.0 && min_errorO = 1.0
Do for all n+1 chromosomes
{
    if (min_errorH > $E_l^{C_{H,i}^{next}}$) then $C_H^{min} = C_{H,i}^{next}$ && min_errorH = $E_l^{C_{H,i}^{next}}$
    if (min_errorO > $E_l^{C_{O,i}^{next}}$) then $C_O^{min} = C_{O,i}^{next}$ && min_errorO = $E_l^{C_{O,i}^{next}}$
}

Here $C_H^{min}$ and $C_O^{min}$ represent the sub-chromosomes that have the minimum error for the hidden and output layers respectively. There is also the possibility of obtaining more than one optimal weight vector for the given training set, because more than one sub-chromosome of the hidden and output layers may be evaluated as fit by the fitness evaluation functions of the respective layers.
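The same procedure rendered in plain Python, for illustration only: it assumes each candidate sub-chromosome can be evaluated by a function returning that layer's distributed error $E_l$ for the presented pattern, and the initial thresholds of 1.0 follow the listing above.

```python
def best_subchromosomes(pop_h, pop_o, error_h, error_o):
    """Scan the n+1 candidates of each layer and keep the sub-chromosome
    with the minimum distributed instantaneous mean square error."""
    min_error_h = min_error_o = 1.0
    c_h_min = c_o_min = None
    for c_h, c_o in zip(pop_h, pop_o):
        e_h, e_o = error_h(c_h), error_o(c_o)
        if e_h < min_error_h:
            min_error_h, c_h_min = e_h, c_h
        if e_o < min_error_o:
            min_error_o, c_o_min = e_o, c_o
    return c_h_min, c_o_min, min_error_h, min_error_o
```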
3. Simulation Design and Implementation

In this simulation design and implementation, two of the proposed multilayer feedforward neural networks are considered. Both neural networks are trained with the proposed gradient descent of distributed instantaneous mean square error algorithm. Since every input pattern consists of 16 distinct features, each neural network architecture contains 16 processing units in the input layer. The first neural network architecture consists of the input layer, two hidden layers with five units each, and one output layer with 5 units. The second neural network architecture consists of the input layer, one hidden layer of 5 units, and an output layer also with 5 units.

Feature Extraction

Five different samples of handwritten characters of the 'Marathi' script from five different people are collected in this simulation as input stimuli for the training pattern set. These scanned images of distinct handwritten characters of the 'Marathi' script are shown in Figure 5.

Fig. 5: Scanned images of handwritten distinct 'Marathi' scripts

The scanned images of handwritten 'Marathi' characters shown in Figure 5 are partitioned into sixteen equal parts; the density values of the pixels of each part were calculated and the centre of density gravity was obtained. Therefore for each scanned image of a handwritten 'Marathi' character we obtain sixteen values as the input pattern vector of the training set. Thus we have the training set, which consists of sampled patterns of handwritten characters of the 'Marathi' script, and each sample pattern is considered as a pattern vector of dimension 16 × 1 with real-number values. The output pattern vector corresponding to an input pattern vector is of dimension 5 × 1 with binary values. The test input pattern set is constructed with the same method from sample patterns that were not used in the training set. The sample test patterns were used to verify the performance of the trained neural networks.
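A minimal sketch of the zoning step described above, assuming a binary character image split into a 4 × 4 grid with the mean ink density of each zone used as a feature; the centre-of-gravity computation mentioned in the text is omitted, and the image size is an arbitrary choice for the example.

```python
import numpy as np

def density_features(image, grid=4):
    """Partition a binary character image into grid*grid equal parts and
    return the pixel density of each part as a 16-dimensional pattern."""
    h, w = image.shape
    zh, zw = h // grid, w // grid
    feats = [image[r*zh:(r+1)*zh, c*zw:(c+1)*zw].mean()
             for r in range(grid) for c in range(grid)]
    return np.array(feats)                 # real-valued input vector, 16 x 1

# Example: a 32x32 scanned character yields one 16-feature training pattern.
img = np.random.default_rng(3).integers(0, 2, size=(32, 32))
pattern = density_features(img)
```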
Simulation design for the 16-5-5-5 Neural Network Architecture

The simulation of the proposed feedforward multilayer neural network architecture with two hidden layers of 5 units each and one output layer of 5 units (16-5-5-5) involves three different instantaneous mean square errors at the same time, i.e. $E^O$ for the output layer, $E^{H1}$ for the first hidden layer and $E^{H2}$ for the second hidden layer, which are presented for pattern $l$ as:

$E_l^O = \frac{1}{2}\sum_{k=1}^{K}\big(d_{kl} - S_k(y_{kl}^O)\big)^2$   (15)

$E_l^{H1} = \frac{1}{2}\sum_{g=1}^{G}\big(d_{kl} - S_g(y_{gl}^{H1})\big)^2$   (16)

And

$E_l^{H2} = \frac{1}{2}\sum_{j=1}^{J}\big(d_{kl} - S_j(y_{jl}^{H2})\big)^2$   (17)

The proposed gradient learning rule for the instantaneous mean square error updates the weight vector for up to t iterations. After this the weight updating is stopped and the genetic algorithm is applied. The updated weight and bias values are considered as the initial population of chromosomes for the genetic algorithm. As per our proposed neural network architecture, in this simulation design we have three sub-chromosomes, one for each hidden layer and one for the output layer. The first sub-chromosome, shown in Figure 6, is of 85 genes, of which 80 are the weight values of the connection links and 5 are the biases of the units of the hidden layer. The second and third sub-chromosomes are of 30 genes each, of which 25 are weight values of the connection links and 5 are the biases of the units of the second hidden layer and the output layer.

Fig. 6 (c): Sub-chromosome 3 for output layer of 30 genes

The mutation operator is applied simultaneously to all three sub-chromosomes by adding small random values between -1 and 1 to the selected genes to generate the new populations of these sub-chromosomes. After this, selection is applied to all three sub-chromosomes to select the better population of chromosomes for the next generation. This selection procedure uses the distributed instantaneous mean square error, as specified in Equations 15, 16 and 17, as the fitness evaluation function to select the sub-chromosomes for the next generation. Now the crossover operator is applied simultaneously to all the selected sub-chromosomes to generate the larger population of the next generation. Thus the crossover operator generates populations of sub-chromosomes for the first hidden layer, second hidden layer and output layer of 85 genes, 30 genes and 30 genes respectively. The selected population of weights and biases from each sub-chromosome determines the optimal solutions for the given training pattern set. Thus, a minimum of three optimal solutions is required for the convergence of the neural network.
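To show how these pieces fit together, the following self-contained sketch runs a miniature version of the 16-5-5-5 procedure on a single pattern: gradient descent of the distributed error, with each layer corrected against the common target as in Equations 15-17, followed by a simple mutate-and-select GA phase with elitism. The operators, learning rate, thresholds and the randomly generated pattern are illustrative assumptions, not the simulation code of the paper.

```python
import numpy as np

rng = np.random.default_rng(4)
sig = lambda x: 1.0 / (1.0 + np.exp(-x))

def forward(a, layers):
    """Layer outputs S(y^H1), S(y^H2), S(y^O) for one input pattern."""
    outs, x = [], a
    for W, b in layers:
        x = sig(W @ x + b)
        outs.append(x)
    return outs

def layer_errors(a, d, layers):
    """Eqs. (15)-(17): one distributed error per layer, all against d."""
    return [0.5 * np.sum((d - s) ** 2) for s in forward(a, layers)]

def gd_step(a, d, layers, eta=0.1):
    """Distributed-error gradient descent: every layer is corrected against
    the common target using its own error only (nothing backpropagated)."""
    new, x = [], a
    for W, b in layers:
        s = sig(W @ x + b)
        delta = (d - s) * s * (1.0 - s)
        new.append((W + eta * np.outer(delta, x), b + eta * delta))
        x = s
    return new

def ga_refine(a, d, layers, n=3, generations=200, thresholds=(1e-3, 1e-3, 1e-4)):
    """Mutate each layer's weights and keep the candidate with the smallest
    layer error (elitist selection); stop once every objective is below
    its threshold (MAXE_H, MAXE_H, MAXE_O)."""
    for _ in range(generations):
        if all(e <= t for e, t in zip(layer_errors(a, d, layers), thresholds)):
            break
        refreshed, x = [], a
        for W, b in layers:
            candidates = [(W, b)]
            for _ in range(n):                                 # n mutated copies
                Wm = W.copy()
                Wm.flat[rng.integers(Wm.size)] += rng.uniform(-1, 1)
                candidates.append((Wm, b.copy()))
            best = min(candidates,
                       key=lambda p: 0.5 * np.sum((d - sig(p[0] @ x + p[1])) ** 2))
            refreshed.append(best)
            x = sig(best[0] @ x + best[1])
        layers = refreshed
    return layers

# 16-5-5-5 architecture: two hidden layers of 5 units and a 5-unit output layer.
layers = [(rng.normal(scale=0.3, size=(5, 16)), np.zeros(5)),
          (rng.normal(scale=0.3, size=(5, 5)), np.zeros(5)),
          (rng.normal(scale=0.3, size=(5, 5)), np.zeros(5))]
a = rng.random(16)                                # one 16-feature input pattern
d = np.array([0., 1., 0., 0., 1.])                # its 5-bit target pattern

for _ in range(5000):                             # gradient-descent phase
    layers = gd_step(a, d, layers)
layers = ga_refine(a, d, layers)                  # GA refinement phase
print(layer_errors(a, d, layers))
```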
Simulation design for the 16-5-5 Neural Network Architecture

The simulation of the proposed feedforward multilayer neural network architecture with one hidden layer of 5 units and one output layer of 5 units (16-5-5) involves two different instantaneous mean square errors at the same time, i.e. $E^O$ for the output layer and $E^{H1}$ for the hidden layer, which are presented for pattern $l$ as:

$E_l^O = \frac{1}{2}\sum_{k=1}^{K}\big(d_{kl} - S_k(y_{kl}^O)\big)^2$   (18)

And

$E_l^H = \frac{1}{2}\sum_{j=1}^{J}\big(d_{kl} - S_j(y_{jl}^H)\big)^2$   (19)

In this experiment we divide the chromosome into two sub-chromosomes, one for the hidden layer and one for the output layer. The first sub-chromosome, shown in Figure 7, is of 85 genes, of which 80 are the weight values of the connection links and 5 are the biases of the units of the hidden layer. The second sub-chromosome consists of 30 genes, of which 25 are weight values of the connection links and 5 are the biases of the units of the output layer.

Fig. 7 (b): Sub-chromosome 2 for output layer of 30 genes

The mutation operator is applied simultaneously to both sub-chromosomes by adding small random values between -1 and 1 to the selected genes to generate the new populations of these sub-chromosomes. After this, selection is applied to both sub-chromosomes to select the better population of chromosomes for the next generation. This selection procedure uses the distributed instantaneous mean square error, as specified in Equations 18 and 19, as the fitness evaluation functions to select the sub-chromosomes for the next generation. Now the crossover operator is applied simultaneously to all the selected sub-chromosomes to generate the larger population of the next generation. Thus the crossover operator generates populations of sub-chromosomes for the hidden layer and the output layer of 85 genes and 30 genes respectively. The selected population of weights and biases from each sub-chromosome determines the optimal solutions for the given training pattern set. Thus, a minimum of two optimal solutions is required for the convergence of the neural network.

3.3 Parameters used

The following parameters are used to accomplish the simulation of these two experiments for the given training set of handwritten characters of the 'Marathi' script.

Genetic algorithm with backpropagated error: The parameters of the genetic algorithm with backpropagated error for the simulation of both experiments are as follows:

Parameter | Value
Learning rate for output layer (η_O) | 0.01
Learning rate for first hidden layer (η_H1) | 0.01
Learning rate for second hidden layer (η_H2) | 0.1
Momentum term (α) | 0.9
Adaption rate (K) | 3.0
Mutation population size | 3
Crossover population size | 1000
Initial population | Randomly generated values between 0 and 1
Fitness evaluation function (one fitness function) | Back-propagated instantaneous squared error $E_l = \frac{1}{2}\sum_{k=1}^{K}\big(d_k - S_k(y_k^O)\big)^2$
Minimum error (MAXE) | 0.00001

Table 1: Parameters used for the genetic algorithm with backpropagated error
Genetic algorithm with distributed error: The parameters used in the simulation of both experiments for the genetic algorithm with gradient descent learning for the distributed error are as follows:

Parameter | Value
Learning rate for output layer (η_O) | 0.01
Learning rate for hidden layers (η_H1 & η_H2) | 0.1
Momentum term for output layer (α) | 0.9
Momentum term for hidden layers (α) | 0.7
Adaption rate (K) | 3.0
Minimum error for the output layer (MAXE_O) | 0.0001
Minimum error for the hidden layers (MAXE_H) | 0.001
Mutation probability | Smaller than 0.01
Mutation population size for sub-chromosome of output layer | 3
Mutation population size for sub-chromosomes of hidden layers | 3 each
Crossover population size for output layer | 1000
Crossover population size for hidden layer (for 16-5-5 architecture) | 1000
Crossover population size for first hidden layer (for 16-5-5-5 architecture) | 1000
Crossover population size for second hidden layer (for 16-5-5-5 architecture) | 500
Number of iterations prior to applying GA | 5000
Initial population | Values of weights & biases in each sub-chromosome after up to 5000 iterations of gradient descent for the distributed error
Fitness evaluation functions (two fitness functions for the 16-5-5 architecture, three for the 16-5-5-5 architecture) | Distributed instantaneous sum of squared errors: $E_l^O = \frac{1}{2}\sum_{k=1}^{K}(d_{kl} - S_k(y_k^O))^2$, $E_l^{H1} = \frac{1}{2}\sum_{g=1}^{G}(d_{kl} - S_g(y_g^{H1}))^2$, $E_l^{H2} = \frac{1}{2}\sum_{j=1}^{J}(d_{kl} - S_j(y_j^{H2}))^2$

Table 2: Parameters used for gradient descent learning with distributed error
4. Results and Discussion

The results from the simulation design and implementation of both neural network architectures, i.e. 16-5-5-5 and 16-5-5, are considered for 65 training sample examples of handwritten 'Marathi' script with the two hybrid techniques. The techniques used are the genetic algorithm with gradient descent of the backpropagated instantaneous mean square error and the genetic algorithm with gradient descent of the distributed instantaneous mean square error. The performance of both neural network architectures has been evaluated with these two hybrid learning techniques for the given training set, and a performance analysis has also been carried out. In this performance analysis it has been found that the 16-5-5-5 neural network architecture performed more optimally in terms of convergence, number of epochs and number of optimal solutions for the classification of the patterns in the training set. The 16-5-5-5 architecture is also found to be efficient and more generalized for the test pattern set. The results of the performance evaluation are shown in Tables 5 and 6. The entries of the tables present the mean values of the iterations and the number of converged weight matrices over five trials with each hybrid technique for the given training set.
Table 5: Performance evaluation for GA with descent gradient of distributed error and backpropagated error for the 16-5-5 architecture
Table 6: Performance evaluation for GA with descent gradient of distributed error and backpropagated error for the 16-5-5-5 architecture

The results tables contain information about counts. The counts represent the number of optimum solutions, i.e. the number of weight matrices on which the network converges for the given training set. The integer value for the epochs in the tables represents the number of iterations performed by each learning method to classify the given input pattern. It has been observed from the results that no case of non-convergence is found; the network is thus able to converge successfully to more than one optimum weight vector, or solution, for the given input pattern. Table 5 of the simulated results shows the performance evaluation between the GA with gradient descent of the instantaneous mean square distributed error and the GA with gradient descent of the backpropagated error for the 16-5-5 network architecture. This evaluation considers the parameters of epochs, i.e. the number of iterations for convergence, and counts, i.e. the number of optimal converged weight vectors. The results of Table 5 are the means of five trials for the same input pattern. Table 6 of the simulated results shows the performance
evaluation between the GA with gradient descent of the instantaneous mean square distributed error and the GA with gradient descent of the backpropagated error for the 16-5-5-5 network architecture. This evaluation also considers the parameters of epochs, i.e. the number of iterations for convergence, and counts, i.e. the number of optimal converged weight vectors. The results of Table 6 are also the means of five trials for the same input pattern. An important analysis about the optimal solutions is also observed from this simulation. Here an optimal solution is obtained only when more than one objective function is satisfied at one time. In the case of our 16-5-5 network architecture there are two objective functions, one for the hidden layer and one for the output layer, and the network is converged only when both objective functions reach their defined minimum error threshold. Similarly, in the 16-5-5-5 network architecture we have three different objective functions, and the network is converged only when all three objective functions reach their defined minimum error threshold. Thus, the performance of the neural networks under gradient descent of the distributed instantaneous mean square error is regarded as multi-objective optimization. On the other hand, the GA with gradient descent of the instantaneous mean square backpropagated error considers only one objective, i.e. one common error function as the objective function for all the layers, so the number of optimal solutions, or counts, reflects only the converged or optimal weight matrices for a single minimum of error; this is a case of single-objective optimization. It can be seen from the results of Tables 5 and 6 that the performance of the neural network architecture with gradient descent of the instantaneous mean square distributed error for multi-objective optimization is approximately the same as that of the GA with gradient descent of the backpropagated error for single-objective optimization with respect to the number of iterations and the number of counts.

5. Conclusion

In this work we have considered the simulation of two neural network architectures for their performance evaluation with gradient descent of the instantaneous mean square distributed error with a GA and gradient descent of the instantaneous mean square backpropagated error with a GA, for the classification of handwritten 'Marathi' cursive script. We considered the instantaneous mean square distributed error as the mean of the squared difference between the target output pattern and the actual output pattern of each unit of each layer, separately for the presented input pattern. Thus the common target pattern is used by each layer with its own computed actual output pattern. In this approach, convergence for the given training samples is achieved only when the different error functions are minimized simultaneously. Hence the optimum solution is constrained by three objective functions, and this reflects the case of multi-objective optimization instead of the single-objective optimization of gradient descent with the backpropagated instantaneous mean square error. On the basis of the simulation results and analysis, the following observations can be drawn:

1. The performance of the GA with gradient descent of the distributed error for multi-objective optimization is better in most cases than that of the GA with gradient descent of the backpropagated error for single-objective optimization in terms of the number of optimal solutions, or counts. It is also evident that the number of iterations for the GA with gradient descent of the distributed error is higher, because in this method there are three objective functions and all of them must be minimized for an optimal solution.

2. It can also be seen from the results that the behaviour of the GA with gradient descent of the distributed error is
more consistent and exhibits less randomness in comparison to the GA with gradient descent of the backpropagated error. There is also another interesting observation about the performance of the neural networks for the GA with gradient descent of the distributed error, concerning the number of counts and iterations for new pattern information and for the same pattern information with different examples. Each time, for the same pattern information with different examples, the number of counts is higher and the number of iterations is lower, whereas for new pattern information the counts are low and the number of iterations is high. So when we move from one unknown local error minimum to another unknown local error minimum there are fewer optimum solutions and more iterations are required to converge.

3. Generally the GA starts from random solutions and converges towards the optimal solution. Hence in multi-objective optimization the randomness of the GA increases and the possibility of obtaining the optimal solution decreases. In the proposed technique the GA does not start from a random population of solutions; instead it starts from sub-optimal solutions, because the GA is applied after some iterations of gradient descent of the distributed instantaneous mean square error. These iterations explore the direction of convergence, and from there the GA starts. Thus the GA starts from sub-optimal solutions and moves towards the optimal solutions.

4. Multi-objective optimization is a dominant thrust area in soft computing research, and there are various real-world problems where multi-objective optimization is required. The proposed method may explore the possibility of achieving optimal solutions for various problems of multi-objective optimization. The performance of the GA with gradient descent of the distributed error can be further improved with different image processing methods for feature extraction from handwritten cursive scripts. These aspects can be considered as future work to evaluate the performance of the proposed method on various problem domains.

References

[1] Kumar, S., "Neural Networks: A Classroom Approach", New Delhi: Tata McGraw-Hill (2004)
[2] Sun, Y., "Hopfield neural network based algorithms for image restoration and reconstruction - Part I: Algorithms and Simulations", IEEE Transactions on Signal Processing, vol. 48(7), pp. 2105-2118 (2000)
[3] Szu, H., Yang, X., Telfer, B. and Sheng, Y., "Neural network and wavelet transform for scale invariant data classification", Phys. Rev. E 48, pp. 1497-1501 (1993)
[4] Nagy, G., "Classification Algorithms in Pattern Recognition," IEEE Transactions on Audio and Electroacoustics, vol. 16(2), pp. 203-212 (1968)
[5] Hoppensteadt, F.C. and Izhikevich, E.M., "Synchronization of Laser Oscillators, Associative Memory, and Optical Neurocomputing," Phys. Rev., vol. 62(E), pp. 4010-4013 (2000)
[6] Keith, L.P., "Classification of Cmi energy levels using counterpropagation neural networks," Phys. Rev., vol. 41(A), pp. 2457-2461 (1990)
[7] Carlson, J.M., Langer, J.S. and Shaw, B.E., "Dynamics of earthquake faults," Reviews of Modern Physics, vol. 66(2), pp. 657-670 (1994)
[8] Palaniappan, R., "Method of identifying individuals using VEP signals and neural networks," IEE Proc. Science, Measurement and Technology, vol. 151(1), pp. 16-20 (2004)
[9] Zhao, H., "Designing asymmetric neural networks with associative memory," Phys. Rev., vol. 70(6), pp. 137-141 (2004)
[10] Schutzhold, R., "Pattern recognition on a quantum computer," Phys. Rev., vol. 67(A), pp. 311-316 (2003)
[11] Impedovo, S., "Fundamentals in Handwriting Recognition," NATO Advanced Study Institute, vol. 124, Springer-Verlag (1994)
[12] Mori, S., Suen, C.Y. and Yamamoto, K., "Historical review of OCR research and development," Proceedings of the IEEE, vol. 80(7), pp. 1029-1058 (1992)
[13] Fukushima, K. and Wake, N., "Handwritten alphanumeric character recognition by the neocognitron," IEEE Transactions on Neural Networks, vol. 2(3), pp. 355-365 (1991)
[14] Blackwell, K.T., Vogl, T.P., Hyman, S.D., Barbour, G.S. and Alkon, D.L., "A New Approach to Handwritten Arabic Character Recognition," Pattern Recognition, vol. 25, pp. 655-666 (1992)
[15] Le Cun, Y., Boser, B., Denker, J.S., Henderson, D., Howard, R.E., Hubbard, W., and Jackel, L.D., "Handwritten Digit Recognition with a Back-Propagation Network," Advances in Neural Information Processing Systems, vol. 2, pp. 396-404 (1990)
[16] Kharma, N.N., and Ward, R.K., "A novel invariant mapping applied to hand-written Arabic character recognition," Pattern Recognition, vol. 34(11), pp. 2115-2120 (2001)
[17] Badi, K. and Shimura, M., "Machine recognition of Arabic cursive script," Trans. Inst. Electron. Commun. Eng., vol. 65(E), pp. 107-114 (1982)
[18] Suen, C.Y., Nadal, C., Lagault, R., Mai, T.A., and Lam, L., "Computer recognition of unconstrained handwritten numerals," Proc. IEEE, vol. 80(7), pp. 1162-1180 (1992)
[19] Knerr, S., Personnaz, L., and Dreyfus, G., "Handwritten digit recognition by neural networks with single-layer training," IEEE Transactions on Neural Networks, vol. 3, pp. 962-968 (1992)
[20] Lee, S.W., and Song, H.H., "A New Recurrent Neural Network Architecture for Visual Pattern Recognition," IEEE Transactions on Neural Networks, vol. 8(2), pp. 331-340 (1997)
[21] Urbanczik, R., "A recurrent neural network inverting a deformable template model of handwritten digits," Proc. Int. Conf. Artificial Neural Networks, Sorrento, Italy, pp. 961-964 (1994)
[22] Hagan, M.T., Demuth, H.B. and Beale, M.H., "Neural Network Design," PWS Publishing Co., Boston, MA (1996)
[23] Rumelhart, D.E., Hinton, G.E., and Williams, R.J., "Learning internal representations by error propagation," MIT Press, Cambridge, vol. 1, pp. 318-362 (1986)
[24] Sprinkhuizen-Kuyper, I.G., and Boers, E.J.W., "The local minima of the error surface of the 2-2-1 XOR network," Annals of Mathematics and Artificial Intelligence, vol. 25(1-2), pp. 107-136 (1999)
[25] Zweiri, Y.H., Seneviratne, L.D., and Althoefer, K., "Stability Analysis of a Three-Term Backpropagation Algorithm," Neural Networks Journal, vol. 18(10), pp. 1341-1347 (2005)
[26] Abarbanel, H., Talathi, S., Gibb, L., and Rabinovich, M., "Synaptic plasticity with discrete state synapses," Phys. Rev. E, 72:031914 (2005)
[27] Shrivastava, S. and Singh, M.P., "Performance evaluation of feed-forward neural network with soft computing techniques for hand written English alphabets," Journal of Applied Soft Computing, vol. 11, pp. 1156-1182 (2011)