Modified neocognitron for improved 2-D pattern
recognition
C.N.S. Ganesh Murthy
Y.V. Venkatesh
Indexing terms: Neocognitron, 2-D pattern recognition, Training
Abstract: Some modifications to an existing
neural network, the neocognitron, are proposed
in order to overcome some of its limitations and
to achieve an improved recognition of patterns
(for instance, characters). Motivation for the
present work arose from the results of extensive
simulation experiments on the neocognitron.
Inhibition is dispensed with during training, and is
included only during the testing phase of the
neocognitron. Even during testing, inhibition is
totally discarded in the initial layer, because it
otherwise leads to some undesirable results.
However, inhibition, which is feature-based, is
incorporated in the later stages. The number of
network parameters which are to be set manually
during training is reduced. The training is made
simple without involving multiple training
patterns of the same nature. A new layer has been
introduced after the C-layer (of the neocognitron)
to scale down the network size. Finally, the
response of the S-cell has been simplified, and the
blurring operation between the S- and the C-layers has been changed. The new architecture,
which is robust with respect to small variations in
the value of the network parameters, and the
associated training are believed to be simpler and
more efficient than those of the neocognitron.
1 Introduction
In visual pattern recognition, the objects to be recognised are subjected to various forms of transformation,
involving shift, scale and rotation. The standard techniques found in the literature on pattern recognition
come nowhere close to the human ability to perform
transformation-invariant
recognition.
Obviously
inspired by the remarkable power of the human vision
system, attempts are being made to design an artificial
neural vision machine for imitating some of its aspects.
A basic problem in this context is to synthesise a network of these artificial neurons, in order to endow it
with pattern-recognition capabilities.
The human visual system seems to have a hierarchical structure (see Hubel and Wiesel [1]), in which simple features are first extracted in the early layers from a
stimulus pattern, then these are integrated, in the
higher layers, into more complicated versions. In this
hierarchy, a cell in a higher stage is likely to receive signals from a wider area of the retina, and, perhaps, as a
consequence, is more insensitive to the position of the
stimulus.
Many techniques have been proposed to solve the
problem of character recognition. See, for instance, [2]
for a review of these techniques. It is now acknowledged that neural techniques offer advantages in terms
of speed, fault tolerance and adaptation. Many models
have been suggested in the literature on the application
of neural networks. These models differ in complexity
and capabilities, and can be classified as models with
and without self-organisation and feedforward networks consisting of layers of neurons. However, in this
report, we consider only those which have been proposed for recognition of 2-D patterns.
The multi-layer perceptron (MLP) [3,4] is a well
known architecture in feed-forward networks. However
the MLP treats a 2-D pattern as a stacked 1-D vector,
and hence, two similar patterns shifted by even one
pixel are regarded as two different patterns. For the
same reason, in an MLP, invariance to scale and distortion cannot be achieved in a straightforward way:
the patterns are to be pre-processed to extract invariant
features before using MLP as the classifier. Hence, a
stand alone MLP without preprocessing cannot be used
for recognition.
2 Review of related work
Fukushima has proposed neural networks [5-9] for the
recognition of 2-D patterns which might have been
subjected to certain types of deformation. His architecture consists of a cascade connection of pairs of cell
layers, called S (simple) and C (complex) layers. Each
layer consists of many planes; and each plane is a 2-D
array of cells. The output of an S-layer is the input to
the succeeding C-layer. The input to the first S-layer is
from the photo-receptor, and the output of the last C-layer is the result of the recognition operation. In the
last C-layer, each cell corresponds to a specific pattern
(see Fig. 1).
The neocognitron is a hierarchical multi-layered neural network capable, according to its creator, of deformation-invariant visual pattern recognition. Even if the
input pattern is deformed in shape, only one cell, corresponding to the category of the input pattern, is acti-
vated on the highest stage of the network. Other cells
respond to other categories. Fukushima has reported
that recognition of one of the ten numerals, with small
shifts within the 16 by 16 image, has been successful.
Fig. 1  Schematic structure of the neocognitron (inhibitory neurons are not shown)
The Cognitron [5], the earlier model of Fukushima,
uses unsupervised learning, and does not have the ability to correctly recognise position-shifted or shape-distorted patterns. In the neocognitron architecture [6], a
technique dealing with deformations and shifts in position is addressed, but scaling is not taken care of.
Because of unsupervised learning used in [6], the
number of planes in the Us1 and the Us2 layer can
only be figured out based on experimentation and heuristics. Among the 24 planes in the Us1 layer (see Fig.
12 in [6]), four of them respond to the vertical line (at different positions). Also, because of the type of blurring
and reduction of size from Us1 to Uc1 (a 16x16 plane in
Us1 corresponds to a 10x10 plane in Uc1), patterns presented
at some particular shifted positions will not be recognised properly, as demonstrated experimentally in
[10, 11]. To overcome these defects of [6], the recent
paper of Fukushima [8] deals with an application of the
neocognitron, and ostensibly attempts to make the network tolerant to deformations, shift and scaling. In [8],
Fukushima uses shifted and scaled versions of the
exemplars, to train the neocognitron to achieve shift
and scale invariance.
A recent paper by Li and Wu [12] discusses the problem of rotation invariant-pattern recognition using the
neocognitron. Li and Wu make a reference to the Fourier-Mellin transform approach of Casasent [ 131 for
invariant recognition of 2-D patterns. It is well known
that Fourier transform techniques entail high precision
arithmetic, and are not suitable for computer implementation. While the multi-layered structure of Li and
Wu is the same as the neocognitron, the input to their
second layer is a set of rotated versions of the object,
as though each rotated version is a separate object. The
functions of the other layers are similar to those of the
corresponding layers in the neocognitron.
Menon and Heinemann [14] have found that the neocognitron does not perform satisfactorily when it has to
discriminate between three (somewhat larger) objects
with larger shifts in a 128 by 128 image. A similar but
independent conclusion, that the neocognitron does not
possess a shift invariance property, has also been
reported in [10, 11]. It is found by Menon and Heinemann [14] that shift invariance can be obtained only by creating a model which simply responds to the total energy in the image. However, this is not a satisfactory result as far as neural nets are concerned. It may be noted here that Barnard and Casasent [15] have also analysed why the neocognitron fails to be an intrinsically shift-invariant pattern recogniser.

We now examine some aspects of the processing carried out by the neocognitron, which is supposed to recognise patterns, after an appropriate training phase, even in the presence of shifts, scale changes and distortions. In this network, the C-cells are meant to make the network robust against distortion. The input and the output of a C-cell are related as follows:

$C_L[p, j, k] = \Phi\!\left(\sum_{r}\sum_{s} D[r, s]\, S_L[p,\, j+r,\, k+s]\right)$   (1)

where

$\Phi(x) = \dfrac{x}{a_L + x} \;\; \text{if } x \ge 0, \qquad \Phi(x) = 0 \;\; \text{if } x < 0$

The parameter $a_L$ is a positive constant which determines the degree of saturation of the output, and the weights $D[r, s]$ are chosen so as to decrease monotonically with respect to the distance from $(r, s) = (0, 0)$.
Further, in the neocognitron, the feature extraction is carried out by the S-cells, whereas tolerance to positional shift is accomplished by the C-cells. The output of a typical S-cell is given by

$S_L[i, j, k] = r_L\, \varphi\!\left[\dfrac{1 + \sum_{p}\sum_{l,m} A_L[i, p, l, m]\, C_{L-1}[p,\, j+l,\, k+m]}{1 + \dfrac{r_L}{1 + r_L}\, b_L[i]\, V[j, k]} - 1\right]$   (2)

where $\varphi(x) = \max(x, 0)$, $A_L[i, p, l, m]$ are the excitatory weights of the S-cell, and $b_L[i]$ is the weight of its inhibitory input.
The parameter rL controls the selectivity of the S-cell
to an input pattern. The value of the inhibitory input
$V[j, k]$, to the S-cell is given by

$V[j, k] = \sqrt{\sum_{p}\sum_{r,s} g[r, s]\, C_{L-1}^{2}[p,\, j+r,\, k+s]}$
where the weights g [r, s] are chosen so as to decrease
monotonically with respect to the distance from (r, s) =
(0, 0).
The reinforcement strategy during training is as follows:

$\Delta A_L[i, p, r, s] = q_L\, g[r, s]\, C_{L-1}[p,\, \hat{j}+r,\, \hat{k}+s]$

where $(\hat{j}, \hat{k})$ denotes the coordinates of the representative S-cell, and $q_L$ is a positive constant which determines the amount of reinforcement.
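For concreteness, a minimal Python sketch of these cell responses is given below; it follows eqns. 1 and 2 and the expression for V as reconstructed above. The function names, the array shapes and the explicit inhibitory weight b are illustrative assumptions rather than quantities defined in the original text.

```python
import numpy as np

def c_cell(S_window, D, a_L):
    """C-cell response (eqn. 1): saturating, weighted sum of S-cell outputs.
    S_window, D : (h, w) arrays; D decreases away from the window centre."""
    x = float(np.sum(D * S_window))
    return x / (a_L + x) if x >= 0 else 0.0

def inhibitory_input(C_window, g):
    """Inhibitory input V: root of the g-weighted sum of squared C-cell outputs.
    C_window : (planes, h, w); g : (h, w), decreasing away from the centre."""
    return float(np.sqrt(np.sum(g * C_window ** 2)))

def s_cell(C_window, A, b, V, r_L):
    """S-cell response (eqn. 2, standard neocognitron form).
    C_window, A : (planes, h, w); b is the inhibitory weight (an assumption,
    not named explicitly in the text); r_L is the selectivity parameter."""
    excit = 1.0 + float(np.sum(A * C_window))
    inhib = 1.0 + (r_L / (1.0 + r_L)) * b * V
    return r_L * max(excit / inhib - 1.0, 0.0)   # phi(x) = max(x, 0)
```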
In order to provide the motivation for the present
paper, the neocognitron (Fig. 1) has been simulated for
verifying its ability to recognise two-dimensional patterns [10,11]. We present below an analysis of these
results.
3 Analysis of experimental results obtained from the neocognitron
We recall that the scale invariance of Fukushima [5-8]
actually requires training the network with a number of
scaled versions of the same pattern, and then grouping
the outputs. In order to show that the neocognitron
fails if the shifted and scaled versions of the patterns
are not used to train the network, the training set is
chosen to contain no shifted and scaled versions of the
exemplars. This does not mean that using the shifted
and scaled versions of the patterns to train the network
will enhance the performance of the network. The specifications of the neural network actuallv simulated are
given in Table 1.
Table 1: Specifications of the simulated network (Fukushima)

                                     Layer 1           Layer 2           Layer 3
Input layer                          1 plane (9 x 9)
S-layer                              4 planes (9 x 9)  8 planes (7 x 7)  4 planes (5 x 5)
Inhibitory input to the S-layer      1 plane (9 x 9)   1 plane (7 x 7)   1 plane (5 x 5)
Receptive area for each S-cell
  from each preceding C-plane        (3 x 3)           (3 x 3)           (3 x 3)
C-layer                              4 planes (7 x 7)  8 planes (5 x 5)  4 planes (1 x 1)
Receptive area for each C-cell
  from the corresponding S-plane     (3 x 3)           (3 x 3)           (3 x 3)

Fig. 2  Patterns used to train the various layers of the network (Fukushima). Top row: patterns used to train the four planes of layer 1; second and third rows: patterns used to train the eight planes of layer 2; bottom row: patterns used to train the four planes of layer 3

Fig. 3  Patterns used to test the performance of the network (Fukushima) and the corresponding results for V. The top two patterns are correctly recognised; the bottom two patterns are not
The training patterns used for layers 1, 2 and 3 are
shown in Fig. 2. For testing the pattern recognition
capabilities of the simulated neocognitron, specific patterns were presented at the photoreceptor. These patterns and the corresponding recognition results are
presented in Figs. 3-6.
On the basis of these experimental results, the following conclusions can be drawn:
* There is no robust way of choosing the fixed C-layer weights. It appears to be impossible to obtain a theoretical upper bound on the shift and scaling for the correct recognition of the learnt patterns.
* The weights D[r, s] in the C-layers decrease monotonically with respect to the distance from (r, s) = (0, 0). This results in the blurring of features, which, in turn, leads to ambiguity in the recognition of patterns.
* The global features extracted in the intermediate layers are a combination of averaged versions of the primitives which are input to the first layer. As a result, 'credit assignment' cannot be done uniquely.
Fig. 4  Patterns used to test the performance of the network (Fukushima) and the corresponding results for H. The top two patterns are correctly recognised; the bottom two patterns are not

Fig. 5  Patterns used to test the performance of the network (Fukushima) and the corresponding results for Z. The top two patterns are correctly recognised; the bottom two patterns are not

Fig. 6  Patterns used to test the performance of the network (Fukushima) and the corresponding results for T. The top two patterns are correctly recognised; the bottom two patterns are not

4 Modifications to the structure of the neocognitron
We now deal with the modifications to be incorporated in the structure of the neocognitron [5-8], in order to achieve improved recognition capabilities. Inhibition is completely discarded during training, but incorporated during the testing phase in the second and the third layers only. The patterns used for training the network are shown in Figs. 7-15. Note that, in the first layer, inhibition is not considered for the following reason: assume that an S-cell in the initial (feature-extracting) layer is designated to respond to a horizontal line. The presence of inhibition in this layer causes the S-cell to give a smaller output when a vertical line exists along with the horizontal line in its field of view (see, for instance, Fig. 8f, which is a part of the character 'T') than when the horizontal line alone is present. In the initial layers, the presence of one primitive should not affect the response of the S-cell designated to respond to another primitive.
Fig. 7  Patterns used to train the first layer of the proposed network

Fig. 8  Patterns used to train the second layer of the proposed network (pattern set 1)

Fig. 9  Patterns used to train the second layer of the proposed network (pattern set 2)

Fig. 10  Patterns used to train the second layer of the proposed network (pattern set 3)

Fig. 11  Patterns used to train the second layer of the proposed network (pattern set 4)

Fig. 12  Patterns used to train the third layer of the proposed network (pattern set 1)

The inhibition of a cell is feature-based. For example, consider the pattern of Fig. 13d. This is the pattern 'I', which contains 'T' (Fig. 12) as a sub-pattern, and which is close to the pattern '1' (Fig. 14a). When 'I' is fed as input to the network, the inhibition to 'T' should come from the fact that the primitives of Fig. 8e and Fig. 9b present in 'I' are absent in 'T', and the inhibition to '1' should result from the fact that the primitives of Fig. 9a and Fig. 10b (in 'I') and Fig. 4 (in '1') are different.
The training strategy used here, after modifying the neocognitron architecture, is distinct from that of Fukushima [5-8], as explained below.

The input characteristic of an S-cell during the training phase is modified as follows:

$S_L[i, j, k] = \Phi\!\left(\dfrac{\sum_{p}\sum_{l,m} A_L[i, p, l, m]\, C_{L-1}[p,\, j+l,\, k+m]}{\lVert A_L \rVert\; \lVert C_{L-1} \rVert}\right)$   (3)

where the norm of the matrix $A_L$ is obtained from summing the squares of the elements of $A_L$ over the indices p, l and m (i.e. excluding i, the first index), and the norm of $C_{L-1}$ is similarly obtained from summing the squares of the elements of $C_{L-1}$, but only for those values of the indices (p, l, m) for which $A_L[i, p, l, m]$ is non-zero. The characteristic of $\Phi$ is shown in Fig. 16. This kind of response of an S-cell enables it to extract a certain feature of a prescribed pattern, in spite of the presence of other features.
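A minimal sketch of this modified response is given below, assuming that eqn. 3 is the normalised correlation described above (the sum of products of A_L and C_{L-1}, divided by the two norms) and that Φ, whose characteristic is shown in Fig. 16, acts as a simple threshold with the layer threshold Th_L of Section 5; both assumptions are ours.

```python
import numpy as np

def modified_s_cell(C_window, A, threshold):
    """Modified S-cell response during training (eqn. 3, as reconstructed here).
    C_window : (planes, h, w) window of the preceding C-layer
    A        : (planes, h, w) weights of this S-plane (the plane index i is fixed)
    threshold: Th_L of the layer; the exact shape of Phi is an assumption (Fig. 16)."""
    mask = A != 0                                   # norm of C taken only over non-zero weights
    norm_A = np.sqrt(np.sum(A ** 2))
    norm_C = np.sqrt(np.sum((C_window * mask) ** 2))
    if norm_A == 0 or norm_C == 0:
        return 0.0
    x = float(np.sum(A * C_window)) / (norm_A * norm_C)   # normalised correlation
    return x if x >= threshold else 0.0                   # assumed thresholding Phi
```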
Fig. 13  Patterns used to train the third layer of the proposed network (pattern set 2)

Fig. 14  Patterns used to train the third layer of the proposed network (pattern set 3)

Fig. 15  Patterns used to train the third layer of the proposed network (pattern set 4)

Fig. 16  Characteristic of Φ in the equation for S_L[i, j, k] in Section 4
The input-output characteristic of a C-cell is prescribed as follows:
(i) find $\max = \max_{l,m} S(j + l, k + m)$, $-1 \le l, m \le 1$;
(ii) let the coordinates of this maximum be (u, v);
(iii) calculate $d(l, m) = \sqrt{(u - l)^2 + (v - m)^2}$;
(iv) the output of the C-cell is then given by

$C(j, k) = \sum_{l,m} S(j + l, k + m)\, \eta^{\,d(l, m)}$

where η = 0.1.
The response of the C-cell is similar to the one used in the neocognitron [5-8]; however, the weighting $\eta^{\,d(l, m)}$ has its peak at (u, v) (where S has its maximum) instead of at (0, 0). This kind of response gives the same effect as the one obtained by training with shifted primitives. Note that no shifted and scaled versions of the patterns are used for training.
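The four-step prescription above translates directly into the following sketch; the 3 x 3 neighbourhood follows the stated bounds −1 ≤ l, m ≤ 1, while the treatment of cells falling outside the plane is an assumption.

```python
import numpy as np

def modified_c_cell(S, j, k, eta=0.1):
    """Modified C-cell output at (j, k): the distance weighting peaks at the
    local maximum of S rather than at the window centre."""
    h, w = S.shape
    # 3x3 neighbourhood, -1 <= l, m <= 1 (offsets outside the plane are skipped)
    neigh = [(l, m) for l in (-1, 0, 1) for m in (-1, 0, 1)
             if 0 <= j + l < h and 0 <= k + m < w]
    u, v = max(neigh, key=lambda lm: S[j + lm[0], k + lm[1]])   # offsets of the maximum
    out = 0.0
    for l, m in neigh:
        d = np.hypot(u - l, v - m)            # distance d(l, m) from the maximum
        out += S[j + l, k + m] * eta ** d
    return out
```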
The training algorithm is as follows. The central S-cell of the S-plane which is to be trained is selected as the representative. The receptive area of this S-cell at the input layer is calculated by backtracking. The pattern is presented within this receptive area at the input, and the weights are fixed in the following manner:

$A_L[i, p, l, m] = C_{L-1}[p, l, m]$

where i is the index of the S-plane in module L, p is the index of the C-plane in the preceding module, and (l, m) are the coordinates within the receptive field of the representative cell.
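As a sketch, and assuming that the window of the preceding C-layer seen by the representative cell has already been extracted, this training step is a single copy of the C-layer activity into the weights:

```python
import numpy as np

def train_s_plane(C_window):
    """Fix the weights A_L[i, p, l, m] of a new S-plane from a single presentation
    of its training pattern: A_L[i, p, l, m] = C_{L-1}[p, l, m].
    C_window : (planes, h, w) response of the preceding C-layer within the
    receptive field of the representative (central) S-cell."""
    return np.array(C_window, dtype=float, copy=True)
```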
Inhibition, which is based on features, is introduced to enhance the performance of the network in classifying patterns which closely resemble one another, such as 'T' and 'I', and 'E' and 'F'. To this end, inhibition is included in the neurons of the second and the third S-layers during the testing phase. The modification, for instance for the second layer (L = 2), consists of adding to the argument of Φ an inhibitory term involving $C_1[p, j+l, k+m]$: the term involving ib2 (multiplied by a parameter of the order of 0.002) provides an inhibitory contribution whenever there is a mismatch between $C_1[p, j+l, k+m]$ and $A_L[i, p, l, m]$. The inhibition has a larger magnitude if the input pattern contains a primitive not represented by the weights $A_L[i, p, l, m]$, or if the input pattern does not contain the primitive represented by the weights $A_L[i, p, l, m]$. We should add here that these changes have to be made only in the testing phase of the network; the responses of the S-cells are given by eqn. 3 during the training phase.

For the third layer, we further calculate

$taef[z] = \max_{(j,k)} S[z, j, k]$

over each plane z of the second layer. For each pattern i of the third (final) layer, the weights represent the primitives z of the second layer: usum[i][z], in a way, indicates whether the training pattern i contains the primitive z of the second layer, whereas taef[z] represents the presence of primitive z in the input test pattern. The normalised dot product of the two then measures the similarity between the vector of second-layer primitives present in the training pattern i and that of the second-layer primitives present in the test pattern, with muz = max_i usum[i]; the associated inhibitory parameter ib3 is of a similarly small order.

Two types of network arrangement are considered, incorporating the above modifications: the input layer C0 is followed by S1 - C1 - R1 - S2 - C2 - S3 - C3 in the first type, and by S1 - C1 - S2 - C2 - R2 - S3 - C3 in the second. As explained before, the S layers extract the features, and the C layers do the blurring, making the network more tolerant. The R layers are introduced to scale down the size of the network. In the first type, the scaling down is effected just after the C1 layer, whereas in the second type it happens after the C2 layer. In both cases, the scaling (which is non-overlapping) is by a factor of 1/4.
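A minimal sketch of the R-layer is given below, assuming that scaling down 'by a factor of 1/4' means non-overlapping 2 x 2 blocks and that the cells of each block are averaged (the text specifies only that the scaling is non-overlapping). With a 21 x 21 C1 plane, such a reduction yields the 10 x 10 planes listed for layer 2 in Table 3.

```python
import numpy as np

def r_layer(plane):
    """Non-overlapping 2x2 pooling: reduces the number of cells by a factor of 1/4."""
    h, w = plane.shape
    h2, w2 = h // 2, w // 2                      # e.g. a 21x21 plane gives 10x10
    blocks = plane[:2 * h2, :2 * w2].reshape(h2, 2, w2, 2)
    return blocks.mean(axis=(1, 3))              # one output cell per 2x2 block
```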
5 Comparison with the neocognitron
On the basis of extensive simulation studies, it has been
found that the modifications proposed lead to a substantial improvement in the performance of the neocognitron for pattern recognition. Some advantages of
the modified neocognitron are now given.
* The response of the S-cell is different from the one in [5-8], and is greatly simplified.
* Since only one set of training patterns is used, and no shifted and scaled versions are considered, it takes less time to train the net.
* The number of constants to be fixed during the training of the network is now reduced to six. These are the threshold constants (Th1, Th2, Th3) and the parameters (a1, a2, a3) of the nonlinearity at the output of the S-cell in the three layers. The a values are generally in the range 0.1-0.15, and the performance of the network is robust with respect to small variations in their values. The threshold values during the training phase were fixed as Th1 = 0.7 and Th2 = 0.88 (Th3 does not figure during training), and during testing the values were Th1 = 0.7, Th2 = 0.82 and Th3 = 0.82. As in the case of the a values, small variations in the values of (Th1, Th2, Th3) do not affect the performance of the network significantly.
* Using a training pattern set which does not contain shifted and scaled versions of the patterns, the network is still tolerant, to some extent, to scale changes in the patterns. This tolerance to scale changes is due to the presence of the C-layers, and can be explained in a heuristic way. Consider the pattern 'V' (Fig. 12a), made up of the primitives shown in Fig. 7a, c and d (of the first layer) and the primitives shown in Fig. 8a, b and Fig. 11c (of the second layer). When the pattern 'V' is presented to the network to train the last layer, the outputs of the corresponding planes of the second S-layer have large values at the locations where the features occur, thus maintaining the spatial relationship between the features. This response is smeared by the following C-layer, and the resulting weights, corresponding to the primitives (Fig. 8a, b and Fig. 11c), are given in Table 2. It is this smearing caused by the C-layer that is responsible for the network being tolerant, to some extent, to changes in scale. Fig. 17 shows the locations of the significant values when the three matrices corresponding to the primitives (Fig. 8a, b and Fig. 11c) are superimposed. It can be seen that when a small 'v' or a big 'V' is presented, its features fall within the space covered by the weights (as shown in Fig. 8), and the recognition is effected. Thus, in a heuristic way, it can be demonstrated that the network has a reasonable scale tolerance. For some patterns, the range of scale for successful recognition was found to be as high as 1:4 in our simulations.
Table 2: Weights (corresponding to the primitives of Fig. 8a, Fig. 8b and Fig. 11c) of the neuron for extracting the feature of Fig. 12a

0.23 0.35 0.66 1.12 1.21 1.11 0.56 0.00 0.00
0.00 0.00 0.05 0.59 0.82 0.82 0.53 0.39 0.00
0.24 0.34 0.32 0.25 0.18 0.08 0.00 0.00 0.00
0.00 0.03 0.03 0.00 0.00 0.00 0.00 0.00 0.00
0.00 0.00 0.05 0.05 0.05 0.00 0.00 0.00 0.00
0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
0.00 0.00 0.00 0.00 0.00 0.01 0.01 0.00 0.00
0.00 0.19 0.00 0.00 0.00 0.00 0.00 0.00 0.28
0.30 0.19 0.00 0.00 0.00 0.00 0.00 0.00 0.00
0.00 0.00 0.00 0.00 0.00 0.23 0.33 0.31 0.25
0.18 0.09 0.00 0.00 0.00 0.00 0.00 0.00 0.00
0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
0.00 0.00 0.01 0.18 0.18 0.00 0.00 0.00 0.00
0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
0.00 0.00 0.00 0.29 0.31 0.20 0.00 0.00 0.00
0.00 0.11 0.38 0.42 0.42 0.20 0.00 0.00 0.05
0.59 0.83 0.83 0.54 0.40 0.00 0.24 0.36 0.67
1.12 1.21 1.11 0.57 0.00 0.00 0.00 0.00 0.00
0.00 0.05 0.15 0.21 0.20 0.00 0.00 0.00 0.00
0.00 0.00 0.14 0.21 0.35 0.37 0.00 0.00 0.00
0.00 0.00 0.00 0.19 0.82 1.14 0.00 0.00 0.00
0.00 0.00 0.00 0.00 0.89 1.24 0.00 0.00 0.00
0.00 0.00 0.00 0.20 0.82 1.14 0.00 0.00 0.00
0.00 0.00 0.15 0.22 0.35 0.37 0.00 0.00 0.00
0.00 0.06 0.16 0.22 0.21 0.00 0.00 0.00 0.00
0.00 0.06 0.06 0.06 0.00 0.00 0.00 0.00 0.00
0.00 0.00 0.11 0.37 0.40 0.41 0.00 0.00 0.00
Fig. 17  Schematic depicting the scale tolerance of the network. Encircled regions: locations of large values of the weights due to the three primitives of Fig. 8a, Fig. 8b and Fig. 11c
5.1 Discussions
However, there are limitations to the proposed modifications.
* As explained earlier, the parameters a1, a2, a3 and the threshold values Th1, Th2 and Th3 are to be fixed manually; this procedure is, however, believed to be quite simple.
* If the number of patterns for training (and hence for recognition) is to be increased, then the network size also has to be increased. This is owing to the fact that more planes are to be employed to detect more primitives in the S2 layer, and is an inherent requirement of neocognitron-like architectures. However, the number of planes to be included depends on the number of primitives needed to represent all the patterns under consideration. A careful choice of primitives can be made beforehand, to represent as many patterns as possible. It should therefore be noted that the number of planes needed to detect the primitives does not necessarily increase with the number of patterns, but saturates after some stage.

Another reason for the increase in the network size is that we may have to increase the field of view of the S-cells, to enable them to differentiate patterns closely resembling each other. The reason for an increase in the field of view is seen by considering, for example, the primitives shown in Fig. 9a and c. After blurring, these primitives have overlapping supports, and hence they appear to be similar. Therefore a field of view of 5 x 5 is insufficient to distinguish correctly between the two. In order to succeed in this case, we need to train the network with larger primitives, so that, even after blurring, the limbs of the primitives are of sufficient length to preserve their identity. This necessitates the choice of larger patterns and a greater field of view to accommodate the limbs. Note, however, that this increase is limited.

We give, in Table 3, the details of a typical simulated network. Other configurations (like C0 - S1 - C1 - S2 - C2 - R2 - S3 - C3) for recognising 16 patterns have also been tried out, but are omitted here in view of space constraints; see [11] for details.

Table 3: Network specifications for recognising 16 patterns for the configuration C0 - S1 - C1 - R1 - S2 - C2 - S3 - C3, where the network is scaled down in layer R1

                                      Layer 1             Layer 2              Layer 3
Input layer                           1 plane (21 x 21)
S-layer                               4 planes (21 x 21)  24 planes (10 x 10)  16 planes (10 x 10)
Receptive area for each S-cell
  from each preceding C-plane         (3 x 3)             (7 x 7)              (8 x 8)
C-layer                               4 planes (21 x 21)  24 planes (10 x 10)  16 planes (1 x 1)
Receptive area for each C-cell
  from the corresponding S(R)-plane   (3 x 3)             (3 x 3)              (10 x 10)
We have extensively tested the performance of the
modified neocognitron, using patterns of various sizes
and shifts shown in Figs. 18-33. The specifications for
designing the network to recognise sixteen patterns are
given in Table 3. The training patterns for the first
layer are given in Fig. 7, those for the second layer are
shown in Figs. 8-11, and those for the third layer in
Figs. 12-15. In Figs. 18-33, the outputs of the neurons
in the last layer corresponding to each of the patterns
are given. The character corresponding to the largest
output is taken as the recognition result. If some of the
patterns are close to the input test pattern, the corresponding outputs are seen to be significant. The results
shown in Figs. 18-33 indicate the superiority in the
performance, obtained as a result of the proposed modifications to the network.
Fig. 18  Patterns used to test the network configuration C0 - S1 - C1 - R1 - S2 - C2 - S3 - C3 and the corresponding results (V)

Fig. 19  Patterns used to test the network configuration C0 - S1 - C1 - R1 - S2 - C2 - S3 - C3 and the corresponding results (H)

Fig. 20  Patterns used to test the network configuration C0 - S1 - C1 - R1 - S2 - C2 - S3 - C3 and the corresponding results (Z)

Fig. 21  Patterns used to test the network configuration C0 - S1 - C1 - R1 - S2 - C2 - S3 - C3 and the corresponding results (T)

Fig. 22  Patterns used to test the network configuration C0 - S1 - C1 - R1 - S2 - C2 - S3 - C3 and the corresponding results (M)

Fig. 23  Patterns used to test the network configuration C0 - S1 - C1 - R1 - S2 - C2 - S3 - C3 and the corresponding results (N)

Fig. 24  Patterns used to test the network configuration C0 - S1 - C1 - R1 - S2 - C2 - S3 - C3 and the corresponding results (W)

Fig. 25  Patterns used to test the network configuration C0 - S1 - C1 - R1 - S2 - C2 - S3 - C3 and the corresponding results (I)

Fig. 26  Patterns used to test the network configuration C0 - S1 - C1 - R1 - S2 - C2 - S3 - C3 and the corresponding results (1)

Fig. 27  Patterns used to test the network configuration C0 - S1 - C1 - R1 - S2 - C2 - S3 - C3 and the corresponding results (4)

Fig. 28  Patterns used to test the network configuration C0 - S1 - C1 - R1 - S2 - C2 - S3 - C3 and the corresponding results (E)

Fig. 29  Patterns used to test the network configuration C0 - S1 - C1 - R1 - S2 - C2 - S3 - C3 and the corresponding results (F)

Fig. 30  Patterns used to test the network configuration C0 - S1 - C1 - R1 - S2 - C2 - S3 - C3 and the corresponding results (J)

Fig. 31  Patterns used to test the network configuration C0 - S1 - C1 - R1 - S2 - C2 - S3 - C3 and the corresponding results (X)

Fig. 32  Patterns used to test the network configuration C0 - S1 - C1 - R1 - S2 - C2 - S3 - C3 and the corresponding results (Y)

Fig. 33  Patterns used to test the network configuration C0 - S1 - C1 - R1 - S2 - C2 - S3 - C3 and the corresponding results (K)

6 Conclusions

In order to provide motivation for the proposed modifications of the neocognitron [5-8], we have presented some experimental results obtained from its simulation. It has been found that, for an improved recognition of two-dimensional patterns by the neocognitron, changes are needed. There should be no inhibition in the initial layer, thus enabling a more efficient extraction of the primitives the input pattern is made up of. Inhibition can be included in the later stages, and should be feature-based to enable efficient classification. However, it should also be noted that the proposed architecture is still not intrinsically meant for the recognition of patterns subjected to rotation and/or occlusion. In fact, a different approach based on feature extraction (based, for instance, on the location of corners and their connectivity) is needed. These results will be reported elsewhere [16, 17].
7 References
1 HUBEL, D., and WIESEL, T.: 'Shape and arrangement of columns in the cat's striate cortex', J. Physiol., 1963, 165, pp. 559-567
2 GOVINDAN, V.K., and SHIVAPRASAD, A.P.: 'Character recognition - a survey', Pattern Recognit., 1990, 23, (7), pp. 671-683
3 LIPPMANN, R.P.: 'An introduction to computing with neural networks', IEEE ASSP Mag., April 1987, pp. 4-22
4 LIPPMANN, R.P.: 'Pattern classification using neural networks', IEEE Commun. Mag., November 1989, pp. 47-64
5 FUKUSHIMA, K.: 'Cognitron: a self-organising multi-layered neural network model', Biol. Cybern., 1975, 20, pp. 121-136
6 FUKUSHIMA, K., and MIYAKE, S.: 'Neocognitron: a new algorithm for pattern recognition tolerant of deformations and shifts in position', Pattern Recognit., 1982, 15, (6), pp. 455-469
7 FUKUSHIMA, K.: 'Neocognitron: a self-organising neural network model for a mechanism of pattern recognition unaffected by shift in position', Biol. Cybern., 1980, 36, pp. 193-202
8 FUKUSHIMA, K.: 'Handwritten alphanumeric character recognition by the neocognitron', IEEE Trans. Neural Netw., May 1991, 2, pp. 355-365
9 FUKUSHIMA, K.: 'Analysis of the process of visual pattern recognition by the neocognitron', Neural Netw., 1989, 2, pp. 413-420
10 VENKATESH, Y.V., and GANESH MURTHY, C.N.S.: 'Experimental investigations on the performance of the neocognitron for 2-D pattern recognition', International conference on Information processing, Singapore, December 1991
11 VENKATESH, Y.V., and GANESH MURTHY, C.N.S.: 'Modified neocognitron for improved 2-D pattern recognition', Technical Report, March 1994 and March 1995 (revised), Department of Electrical Engineering, Indian Institute of Science, Bangalore, India
12 LI, C., and WU, C.H.: 'Introducing rotation invariance into the neocognitron model for pattern recognition', Pattern Recognit. Lett., December 1993, 14, pp. 985-995
13 CASASENT, D., and PSALTIS, D.: 'Position, rotation and scale invariant optical correlation', Appl. Opt., 1976, 15, pp. 1795-1799
14 MENON, M.M., and HEINEMANN, K.G.: 'Classification of patterns using a self-organising neural network', Neural Netw., 1988, 1, pp. 201-215
15 BARNARD, E., and CASASENT, D.: 'Shift invariance and the neocognitron', Neural Netw., 1990, 3, pp. 403-410
16 GANESH MURTHY, C.N.S., and VENKATESH, Y.V.: 'Character recognition using encoded patterns as inputs to neural networks', National conference on Neural networks and fuzzy systems (NCNNFS 1995), Anna University, Madras, India, 16-18 March 1995, pp. 104-114
17 GANESH MURTHY, C.N.S., and VENKATESH, Y.V.: 'A new method for encoding patterns for classification by neural networks', IEEE Trans. Neural Netw. (submitted)