Modified neocognitron for improved 2-D pattern recognition

C.N.S. Ganesh Murthy and Y.V. Venkatesh

Indexing terms: Neocognitron, 2-D pattern recognition, Training

Abstract: Some modifications to an existing neural network, the neocognitron, are proposed in order to overcome some of its limitations and to achieve improved recognition of patterns (for instance, characters). The motivation for the present work arose from the results of extensive simulation experiments on the neocognitron. Inhibition is dispensed with during training, and included only in the testing phase. Even during testing, inhibition is discarded entirely in the initial layer, because it otherwise leads to some undesirable results; however, feature-based inhibition is incorporated in the later stages. The number of network parameters which are to be set manually during training is reduced. Training is made simple, without involving multiple training patterns of the same nature. A new layer has been introduced after the C-layer (of the neocognitron) to scale down the network size. Finally, the response of the S-cell has been simplified, and the blurring operation between the S- and C-layers has been changed. The new architecture, which is robust with respect to small variations in the values of the network parameters, and the associated training are believed to be simpler and more efficient than those of the neocognitron.

1 Introduction

In visual pattern recognition, the objects to be recognised are subjected to various forms of transformation, involving shift, scale and rotation. The standard techniques found in the pattern recognition literature come nowhere close to the human ability to perform transformation-invariant recognition. Inspired by the remarkable power of the human visual system, attempts are being made to design an artificial neural vision machine that imitates some of its aspects. A basic problem in this context is to synthesise a network of artificial neurons and endow it with pattern-recognition capabilities. The human visual system seems to have a hierarchical structure (see Hubel and Wiesel [1]), in which simple features are first extracted from a stimulus pattern in the early layers, and are then integrated, in the higher layers, into more complicated versions. In this hierarchy, a cell in a higher stage is likely to receive signals from a wider area of the retina and, perhaps as a consequence, to be less sensitive to the position of the stimulus.

Many techniques have been proposed to solve the problem of character recognition; see, for instance, [2] for a review. It is now acknowledged that neural techniques offer advantages in terms of speed, fault tolerance and adaptation. Many models have been suggested in the literature on the application of neural networks. These models differ in complexity and capabilities, and can be classified as models with and without self-organisation, and as feedforward networks consisting of layers of neurons. In this paper, however, we consider only those which have been proposed for the recognition of 2-D patterns. The multi-layer perceptron (MLP) [3, 4] is a well known feedforward architecture. However, the MLP treats a 2-D pattern as a stacked 1-D vector and, hence, two similar patterns shifted by even one pixel are regarded as two different patterns. For the same reason, invariance to scale and distortion cannot be achieved in an MLP in a straightforward way: the patterns have to be preprocessed to extract invariant features before the MLP is used as the classifier. Hence, a stand-alone MLP without preprocessing cannot be used for such recognition.
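This shift sensitivity is easy to verify numerically. The following minimal sketch (ours, not from the paper; NumPy assumed) shows that a thin stroke and its one-pixel shift are orthogonal as flattened vectors, which is how an MLP without preprocessing sees them.

```python
import numpy as np

# A thin vertical stroke and its one-pixel horizontal shift: as flattened
# vectors (the MLP's view), the two patterns are orthogonal, i.e. "different".
pattern = np.zeros((9, 9))
pattern[2:7, 4] = 1.0                    # vertical stroke
shifted = np.roll(pattern, 1, axis=1)    # same stroke, shifted right by one pixel

v1, v2 = pattern.ravel(), shifted.ravel()
cos = v1 @ v2 / (np.linalg.norm(v1) * np.linalg.norm(v2))
print(cos)   # 0.0: no overlap at all between the flattened vectors
```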
2 Review of related work

Fukushima has proposed neural networks [5-9] for the recognition of 2-D patterns which may have been subjected to certain types of deformation. His architecture consists of a cascade connection of pairs of cell layers, called S (simple) and C (complex) layers. Each layer consists of many planes, and each plane is a 2-D array of cells. The output of an S-layer is the input to the succeeding C-layer. The input to the first S-layer comes from the photoreceptor, and the output of the last C-layer is the result of the recognition operation. In the last C-layer, each cell corresponds to a specific pattern (see Fig. 1). The neocognitron is a hierarchical multi-layered neural network capable, according to its creator, of deformation-invariant visual pattern recognition. Even if the input pattern is deformed in shape, only one cell, corresponding to the category of the input pattern, is activated at the highest stage of the network; other cells respond to other categories. Fukushima has reported that recognition of each of the ten numerals, with small shifts within a 16 by 16 image, has been successful.

Fig. 1 Schematic structure of the neocognitron (modules 1 and 2, each an S-layer followed by a C-layer); inhibitory neurons are not shown

The Cognitron [5], the earlier model of Fukushima, uses unsupervised learning, and does not have the ability to correctly recognise position-shifted or shape-distorted patterns. The neocognitron architecture [6] addresses deformations and shifts in position, but does not take care of scaling. Because of the unsupervised learning used in [6], the number of planes in the U_S1 and U_S2 layers can only be arrived at through experimentation and heuristics. Among the 24 planes in the U_S1 layer (see Fig. 12 in [6]), four respond to the vertical line (at different positions). Also, because of the type of blurring and the reduction of size from U_S1 to U_C1 (a 16x16 plane in U_S1 corresponds to a 10x10 plane in U_C1), patterns presented at some particular shifted positions are not recognised properly, as demonstrated experimentally in [10, 11]. To overcome these defects of [6], the recent paper of Fukushima [8] deals with an application of the neocognitron, and ostensibly attempts to make the network tolerant to deformation, shift and scaling. In [8], Fukushima uses shifted and scaled versions of the exemplars to train the neocognitron to achieve shift and scale invariance.

A recent paper by Li and Wu [12] discusses the problem of rotation-invariant pattern recognition using the neocognitron. Li and Wu make a reference to the Fourier-Mellin transform approach of Casasent [13] for invariant recognition of 2-D patterns. It is well known that Fourier transform techniques entail high-precision arithmetic, and are not well suited to computer implementation. While the multi-layered structure of Li and Wu is the same as that of the neocognitron, the input to their second layer is a set of rotated versions of the object, as though each rotated version were a separate object. The functions of the other layers are similar to those of the corresponding layers in the neocognitron.

Menon and Heinemann [14] have found that the neocognitron does not perform satisfactorily when it has to discriminate between three (somewhat larger) objects with larger shifts in a 128 by 128 image. A similar but independent conclusion, that the neocognitron does not possess a shift-invariance property, has also been reported in [10, 11]. It is found by Menon and Heinemann [14] that shift invariance can be obtained only by creating a model which simply responds to the total energy in the image; this, however, is not a satisfactory result as far as neural nets are concerned. It may be noted here that Barnard and Casasent [15] have also analysed why the neocognitron fails to be an intrinsically shift-invariant pattern recogniser.

We now examine some aspects of the processing carried out by the neocognitron, which is supposed to recognise patterns after an appropriate training phase, even in the presence of shifts, scale changes and distortions. In this network, the C-cells are meant to make the network robust against distortion. The input and the output of a C-cell are related as follows:

    C_L[p, j, k] = Φ( Σ_{r,s} D[r, s] S_L[p, j + r, k + s] )        (1)

where Φ(x) = x / (a_L + x) if x ≥ 0, and Φ(x) = 0 if x < 0. The parameter a_L is a positive constant which determines the degree of saturation of the output, and the weights D[r, s] are chosen so as to decrease monotonically with the distance from (r, s) = (0, 0). Further, in the neocognitron, feature extraction is carried out by the S-cells, whereas tolerance to positional shift is accomplished by the C-cells. The output of a typical S-cell is given by:

    S_L[i, j, k] = r_L φ( (1 + Σ_p Σ_{r,s} A_L[i, p, r, s] C_{L-1}[p, j + r, k + s]) / (1 + (r_L / (1 + r_L)) b_L[i] V_L[j, k]) - 1 )        (2)

where φ(x) = max(x, 0). The parameter r_L controls the selectivity of the S-cell to an input pattern. The value of the inhibitory input V_L to the S-cell is given by

    V_L[j, k] = sqrt( Σ_p Σ_{r,s} g[r, s] (C_{L-1}[p, j + r, k + s])^2 )

where the weights g[r, s] are chosen so as to decrease monotonically with the distance from (r, s) = (0, 0). The reinforcement strategy is as follows [6]:

    ΔA_L[i, p, r, s] = q_L g[r, s] C_{L-1}[p, j + r, k + s]

where (j, k) denotes the coordinates of the representative S-cell, and q_L is a positive constant which determines the amount of reinforcement.
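For concreteness, the following minimal sketch (ours, not from the paper; NumPy assumed, toy plane sizes, and random placeholder weights in place of trained ones) evaluates the S-cell response of eqn. 2, including its inhibitory input V_L, and the saturating C-cell blur of eqn. 1 for a single plane.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy sizes: one S-plane (index i fixed), P preceding C-planes, 3x3 receptive fields.
P, N = 2, 9
C_prev = rng.random((P, N, N))     # C_{L-1}[p, j, k]: outputs of the preceding C-layer
A = rng.random((P, 3, 3))          # excitatory weights A_L[i, p, r, s] (placeholders)
b = 1.0                            # inhibitory weight b_L[i]
g = np.array([[0.05, 0.10, 0.05],  # fixed weights g[r, s], decreasing monotonically
              [0.10, 0.40, 0.10],  # with distance from the centre (r, s) = (0, 0)
              [0.05, 0.10, 0.05]])
D = g                              # C-cell weights D[r, s], likewise decreasing
r_L = 1.7                          # selectivity parameter of the S-cell

def s_cell(j, k):
    """Eqn. 2: S-cell output at (j, k), with the inhibitory input V_L[j, k]."""
    patch = C_prev[:, j:j+3, k:k+3]          # C_{L-1}[p, j+r, k+s]
    e = 1.0 + np.sum(A * patch)              # excitatory part
    v = np.sqrt(np.sum(g * patch ** 2))      # inhibitory input V_L[j, k]
    h = 1.0 + (r_L / (1.0 + r_L)) * b * v
    return r_L * max(e / h - 1.0, 0.0)       # phi(x) = max(x, 0)

S = np.array([[s_cell(j, k) for k in range(N - 2)] for j in range(N - 2)])

def c_cell(j, k, a_L=0.5):
    """Eqn. 1: blurred, saturating C-cell output at (j, k)."""
    x = np.sum(D * S[j:j+3, k:k+3])
    return x / (a_L + x) if x >= 0 else 0.0  # Phi(x) = x / (a_L + x) for x >= 0

print(s_cell(0, 0), c_cell(0, 0))
```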
In order to provide the motivation for the present paper, the neocognitron (Fig. 1) has been simulated to verify its ability to recognise two-dimensional patterns [10, 11]. We present below an analysis of these results.

3 Analysis of experimental results obtained from the neocognitron

We recall that the scale invariance of Fukushima [5-8] actually requires training the network with a number of scaled versions of the same pattern, and then grouping the outputs. In order to show that the neocognitron fails if shifted and scaled versions of the patterns are not used to train the network, the training set is chosen to contain no shifted and scaled versions of the exemplars. This does not mean that using the shifted and scaled versions of the patterns to train the network will enhance the performance of the network. The specifications of the neural network actually simulated are given in Table 1.

Table 1: Specifications of the simulated network (Fukushima)

                                             Layer 1          Layer 2          Layer 3
  Input layer              1 plane (9 x 9)
  S-layer                                    4 planes (9x9)   8 planes (7x7)   4 planes (5x5)
  Inhibitory input to the S-layer            1 plane (9x9)    1 plane (7x7)    1 plane (5x5)
  Receptive area for each S-cell
  from each preceding C-plane                (3x3)            (3x3)            (3x3)
  C-layer                                    4 planes (7x7)   8 planes (5x5)   4 planes (1x1)
  Receptive area for each C-cell
  from the corresponding S-plane             (3x3)            (3x3)            (3x3)

The training patterns used for layers 1, 2 and 3 are shown in Fig. 2. For testing the pattern recognition capabilities of the simulated neocognitron, specific patterns were presented at the photoreceptor. These patterns and the corresponding recognition results are presented in Figs. 3-6.

Fig. 2 Patterns used to train the various layers of the network (Fukushima). Top row: patterns used to train the four planes of layer 1; second and third rows: patterns used to train the eight planes of layer 2; bottom row: patterns used to train the four planes of layer 3

Figs. 3-6 Patterns used to test the performance of the network (Fukushima) and the corresponding results, for V, H, Z and T respectively. In each figure, the top two patterns are correctly recognised, and the bottom two patterns are not

On the basis of these experimental results, the following conclusions can be drawn:
* There is no robust way of choosing the fixed C-layer weights.
* It appears to be impossible to obtain a theoretical upper bound on the shift and scaling for which the learnt patterns are correctly recognised.
* The weights D[r, s] in the C-layers decrease monotonically with the distance from (r, s) = (0, 0). This results in the blurring of features, which, in turn, leads to ambiguity in the recognition of patterns.
* The global features extracted in the intermediate layers are a combination of averaged versions of the primitives which are input to the first layer. As a result, 'credit assignment' cannot be done uniquely.

4 Modifications to the structure of the neocognitron

We now deal with modifications to be incorporated in the structure of the neocognitron [5-8], in order to achieve improved recognition capabilities. Inhibition is completely discarded during training, but incorporated during the testing phase in the second and third layers only. The patterns used for training the network are shown in Figs. 7-15. Note that, in the first layer, inhibition is not considered, for the following reason. Assume that an S-cell in the initial (feature-extracting) layer is designated to respond to a horizontal line. The presence of inhibition in this layer causes the S-cell to give a smaller output when a vertical line exists along with the horizontal line in its field of view (see, for instance, Fig. 8f, which is a part of character 'T') than when the horizontal line alone is present. In the initial layers, the presence of one primitive should not affect the response of the S-cell designated to respond to another primitive.
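This effect is easy to check numerically. In the following minimal sketch (ours, with illustrative weight values; NumPy assumed), an S-cell of the eqn. 2 type, tuned to a horizontal line, responds less strongly to 'horizontal plus vertical' than to the horizontal line alone while its inhibitory input is active, and equally strongly once the inhibition is removed.

```python
import numpy as np

# S-cell template tuned to a horizontal line; fixed inhibitory weights g.
A = np.zeros((3, 3)); A[1, :] = 1.0
g = np.full((3, 3), 1.0 / 9.0)
r_L, b = 1.7, 1.0

def s_response(patch, inhibit=True):
    e = 1.0 + np.sum(A * patch)                               # excitation
    v = np.sqrt(np.sum(g * patch ** 2)) if inhibit else 0.0   # inhibitory input
    return r_L * max(e / (1.0 + (r_L / (1 + r_L)) * b * v) - 1.0, 0.0)

h_only = np.zeros((3, 3)); h_only[1, :] = 1.0    # horizontal line alone
h_plus_v = h_only.copy(); h_plus_v[:, 1] = 1.0   # horizontal plus a vertical stroke

print(s_response(h_only), s_response(h_plus_v))   # about 3.29 vs 2.93: output drops
print(s_response(h_only, False), s_response(h_plus_v, False))  # 5.1 vs 5.1: no drop
```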
The inhibition of a cell is feature-based. For example, consider the pattern of Fig. 13d. This is the pattern 'I', which contains 'T' (Fig. 12d) as a sub-pattern, and which is close to the pattern '1' (Fig. 14a). When 'I' is fed as input to the network, the inhibition to 'T' should come from the fact that the primitives of Fig. 8e and Fig. 9b, present in 'I', are absent in 'T'; and the inhibition to '1' should result from the fact that the primitives of Fig. 9a and Fig. 10b (in 'I') differ from the corresponding primitives in '1'.

Fig. 7 Patterns used to train the first layer of the proposed network
Figs. 8-11 Patterns (a-f in each) used to train the second layer of the proposed network (pattern sets 1-4)
Figs. 12-15 Patterns used to train the third layer of the proposed network (pattern sets 1-4)

The training strategy used here, after modifying the neocognitron architecture, is distinct from that of Fukushima [5-8], as explained below. The input characteristic of an S-cell during the training phase is modified as follows:

    S_L[i, j, k] = Φ( (Σ_p Σ_{l,m} A_L[i, p, l, m] C_{L-1}[p, j + l, k + m]) / (||A_L[i]|| ||C_{L-1}||) )        (3)

where the norm of the matrix A_L[i] is obtained from summing the squares of the elements of A_L over the indices p, l and m (i.e. excluding i, the first index); and the norm of C_{L-1} is similarly obtained from summing the squares of the elements of C_{L-1}, but only for those values of the indices (p, l, m) for which A_L[i, p, l, m] is non-zero. The characteristic of Φ is shown in Fig. 16. This kind of response enables an S-cell to extract a certain feature of a prescribed pattern in spite of the presence of other features.

Fig. 16 Characteristic of Φ in the equation for S_L[i, j, k] in Section 4

The input-output characteristic of a C-cell is prescribed as follows. Find

    smax = max_{-1 ≤ l,m ≤ 1} S(j + l, k + m)

and let the coordinates of this maximum be (u, v). Calculate

    d(l, m) = sqrt( (u - l)^2 + (v - m)^2 )

The output of the C-cell is then given by

    C(j, k) = Σ_{-1 ≤ l,m ≤ 1} S(j + l, k + m) q^{d(l, m)}

where q = 0.1. The response of the C-cell is similar to the one used in the neocognitron [5-8]; however, the weighting q^{d(l, m)} has its peak at (u, v) (where S has its maximum) instead of at (0, 0). This kind of response gives the same effect as the one obtained by training with shifted primitives. Note that no shifted and scaled versions of the patterns are used for training.
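The following minimal sketch (ours; NumPy assumed) implements this modified C-cell; the 3x3 neighbourhood is indexed here from 0 rather than from -1, which is equivalent.

```python
import numpy as np

def modified_c_cell(S, j, k, q=0.1):
    """Modified C-cell of Section 4: blurring centred on the local maximum of S.

    S is a 2-D S-plane; (j, k) indexes the top-left corner of a 3x3 neighbourhood.
    """
    patch = S[j:j+3, k:k+3]                    # S(j+l, k+m) for 0 <= l, m <= 2
    u, v = np.unravel_index(np.argmax(patch), patch.shape)
    out = 0.0
    for l in range(3):
        for m in range(3):
            d = np.hypot(u - l, v - m)         # d(l, m): distance from the peak (u, v)
            out += patch[l, m] * q ** d        # q**0 = 1 at the peak itself
    return out
```

Since q < 1 and d(l, m) = 0 at the peak, the dominant contribution comes from wherever the feature actually occurred within the receptive field; this is what makes the response behave as though shifted versions of the primitives had been used in training.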
The training algorithm is as follows. The central S-cell of the S-plane which is to be trained is selected as the representative. The receptive area of this S-cell at the input layer is calculated by backtracking. The pattern is presented within this receptive area at the input, and the weights are fixed in the following manner:

    A_L[i, p, l, m] = C_{L-1}(p, l, m)

where i is the index of the S-plane in module L, p is the index of the preceding C-plane, and (l, m) are the coordinates within the receptive field.

The purpose of the inhibition, which is based on features, is to enhance the performance of the network in classifying patterns which closely resemble each other, such as 'T' and 'I', or 'E' and 'F'. To this end, inhibition is introduced in the neurons of the second and third S-layers during the testing phase. The modifications, for instance for the second layer (L = 2), are as follows: the right-hand side of the S-cell response acquires a term involving an inhibition constant ib2 (a parameter of the order of 0.002), which provides an inhibitory contribution whenever there is a mismatch between C_{L-1}[p, j + l, k + m] and A_L[i, p, l, m]. The inhibition has a larger magnitude if the input pattern contains a primitive not represented by the weights A_L[i, p, l, m], or if the input pattern does not contain the primitive represented by the weights A_L[i, p, l, m]. We should add here that these changes are made only in the testing phase of the network; during the training phase, the responses of the S-cells are given by eqn. 3.

For the third layer, we further calculate

    taef[z] = max_{j,k} S[z, j, k]

over the planes z of the second layer; taef[z], in a way, indicates whether the input test pattern contains the primitive z of the second layer. For each neuron i of the third (final) layer, the weights represent the second-layer primitives of pattern i. The normalised dot product usum[i], between the vector of second-layer primitives present in the training pattern i and the vector of second-layer primitives present in the input test pattern, then measures their similarity, and the maximum of usum[i] over i identifies the best match; the corresponding inhibition constant ib3 is of a similarly small order.

Two types of network arrangement are considered, incorporating the above modifications: the input layer C0 is followed by S1 - C1 - R1 - S2 - C2 - S3 - C3 in the first type, and by S1 - C1 - S2 - C2 - R2 - S3 - C3 in the second. As explained before, the S layers extract the features, and the C layers do the blurring, making the network more tolerant. The R layers are introduced to scale down the network size. In the first type, the scaling down takes place just after the C1 layer, whereas in the second type it happens after the C2 layer. In both cases, the scaling (which is non-overlapping) is by a factor of 1/4.
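A minimal sketch (ours) of these two mechanical steps follows: the weight-copying training rule and the R-layer reduction. The paper does not state how the four cells of a non-overlapping 2x2 block are combined in the R-layer, so the use of the block maximum here is an assumption.

```python
import numpy as np

def train_s_plane(C_prev_patch):
    """Training rule of Section 4: the representative S-cell's weights are a
    direct copy of the C-layer activity in its receptive area,
    A_L[i, p, l, m] = C_{L-1}(p, l, m)."""
    return C_prev_patch.copy()

def r_layer(C_plane):
    """R-layer: non-overlapping scaling down by a factor of 1/4 in size
    (each 2x2 block of cells maps to one cell).  Combining the four cells by
    their maximum is our assumption; the paper only states the scaling factor."""
    n = (C_plane.shape[0] // 2) * 2               # drop an odd boundary row/column
    blocks = C_plane[:n, :n].reshape(n // 2, 2, n // 2, 2)
    return blocks.max(axis=(1, 3))

C1 = np.random.default_rng(1).random((21, 21))
print(r_layer(C1).shape)   # (10, 10): a 21x21 C-plane maps to 10x10, as in Table 3
```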
5 Comparison with the neocognitron

On the basis of extensive simulation studies, it has been found that the proposed modifications lead to a substantial improvement in the performance of the neocognitron for pattern recognition. Some advantages of the modified neocognitron are now given.
* The response of the S-cell is different from the one in [5-8], and is greatly simplified.
* Since only one set of training patterns is used, and no shifted and scaled versions are considered, it takes less time to train the net.
* The number of constants to be fixed during the training of the network is now reduced to six. These are the threshold constants (Th1, Th2, Th3) and the slopes of the nonlinearity (a1, a2, a3) at the output of the S-cell in the three layers. The a_i are generally in the range 0.1-0.15, and the performance of the network is robust with respect to small variations in their values. The threshold values during the training phase were fixed as Th1 = 0.7 and Th2 = 0.88 (Th3 does not figure during training), and during testing the values were Th1 = 0.7, Th2 = 0.82 and Th3 = 0.82. As with the a_i, small variations in the values of (Th1, Th2, Th3) do not affect the performance of the network significantly.
* Even though the training pattern set does not contain shifted and scaled versions of the patterns, the network is still tolerant, to some extent, to scale changes in the patterns.

The tolerance to scale changes is due to the presence of the C-layers. This can be explained in a heuristic way. Consider the pattern 'V' (Fig. 12a), made up of the primitives shown in Figs. 7a, c and d (of the first layer) and the primitives shown in Figs. 8a, b and 11c (of the second layer). When the pattern 'V' is presented to the network to train the last layer, the outputs of the corresponding planes of the second S-layer peak at the positions where the features occur, and the spatial relationship between the features is then smeared by the following C-layer; the weights corresponding to the primitives (Figs. 8a, b and 11c) are given in Table 2. It is this smearing caused by the C-layer that makes the network tolerant, to some extent, to changes in scale. Fig. 17 shows the locations of the significant values when the three weight matrices corresponding to the primitives (Figs. 8a, b and 11c) are superimposed. It can be seen that when a small 'v' or a big 'V' is presented, its features fall within the space covered by the weights (as shown in Fig. 17), and recognition is effected. Thus, in a heuristic way, it can be demonstrated that the network has a reasonable scale tolerance. For some patterns, the range of scale for successful recognition was found to be as high as 1:4 in our simulations.

Table 2: Weights (corresponding to the primitives of Figs. 8a, 8b and 11c) of the neuron for extracting the feature of Fig. 12a

Fig. 17 Schematic depicting the scale tolerance of the network. Encircled regions: locations of large values of the weights due to the three primitives of Figs. 8a, 8b and 11c

5.1 Discussions

There are, however, limitations to the proposed modifications.
* As explained earlier, the parameters a_i and the threshold values Th1, Th2 and Th3 are to be fixed manually, although this procedure is believed to be quite simple.
* If the number of patterns for training (and hence for recognition) is to be increased, then the network size also has to be increased. This is owing to the fact that more planes are to be employed to detect more primitives in the S2 layer, and is an inherent requirement of neocognitron-like architectures. However, the number of planes to be included depends on the number of primitives needed to represent all the patterns under consideration. A careful choice of primitives can be made beforehand, to represent as many patterns as possible. It should therefore be noted that the number of planes needed to detect the primitives does not necessarily increase with the number of patterns, but saturates after some stage.
* Another reason for an increase in the network size is that we may have to increase the field of view of the S-cells, to enable them to differentiate patterns which closely resemble each other. The need for an increased field of view can be seen by considering, for example, the primitives shown in Figs. 9a and c. After blurring, these primitives have overlapping supports, and hence appear to be similar; a field of view of 5x5 is therefore insufficient to distinguish correctly between the two. To succeed in this case, we need to train the network with larger primitives, so that, even after blurring, the limbs of the primitives are of sufficient length to preserve their identity. This necessitates the choice of larger patterns and a greater field of view to accommodate the limbs. Note, however, that this increase is limited.

We give, in Table 3, the details of a typical simulated network. Other configurations (like C0 - S1 - C1 - S2 - C2 - R2 - S3 - C3) for recognising 16 patterns have also been tried out; these are omitted here in view of space constraints (see [11] for details).

Table 3: Network specifications for recognising 16 patterns for the configuration C0 - S1 - C1 - R1 - S2 - C2 - S3 - C3, where the network is scaled down in layer R1

                                             Layer 1           Layer 2            Layer 3
  Input layer              1 plane (21 x 21)
  S-layer                                    4 planes (21x21)  24 planes (10x10)  16 planes (10x10)
  Receptive area for each S-cell
  from each preceding C(R)-plane             (3x3)             (7x7)              (8x8)
  C-layer                                    4 planes (21x21)  24 planes (10x10)  16 planes (1x1)
  Receptive area for each C-cell
  from the corresponding S-plane             (3x3)             (3x3)              (10x10)

We have extensively tested the performance of the modified neocognitron, using patterns of various sizes and shifts, shown in Figs. 18-33. The specifications for designing the network to recognise sixteen patterns are given in Table 3. The training patterns for the first layer are given in Fig. 7, those for the second layer in Figs. 8-11, and those for the third layer in Figs. 12-15. In Figs. 18-33, the outputs of the neurons in the last layer, corresponding to each of the patterns, are given. The character corresponding to the largest output is taken as the recognition result. If some of the patterns are close to the input test pattern, the corresponding outputs are seen to be significant. The results shown in Figs. 18-33 indicate the superiority in performance obtained as a result of the proposed modifications to the network.

Figs. 18-33 Patterns used to test the network configuration C0 - S1 - C1 - R1 - S2 - C2 - S3 - C3, and the corresponding results, for V, H, Z, T, M, N, W, I, 1, 4, E, F, J, X, Y and K respectively. Each figure lists the outputs of the sixteen final-layer neurons for the test patterns shown
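As a cross-check of Table 3, the following sketch (ours) chases the plane sizes through the configuration C0 - S1 - C1 - R1 - S2 - C2 - S3 - C3, under the assumptions that the S- and C-layers preserve plane size (overlapping, stride-1 receptive fields), that R1 halves each side, and that the final C-layer condenses each 10x10 S3-plane into a single cell.

```python
# Plane-size walk for the configuration of Table 3.  The "same"/"half"/
# "condense" rules are our reading of the table, not statements from the paper.
size = 21                                   # C0: input layer, 1 plane (21x21)
stages = [("S1", "same", 4), ("C1", "same", 4), ("R1", "half", 4),
          ("S2", "same", 24), ("C2", "same", 24),
          ("S3", "same", 16), ("C3", "condense", 16)]

for name, rule, planes in stages:
    if rule == "half":
        size = size // 2                    # non-overlapping 2x2 blocks: 21 -> 10
    elif rule == "condense":
        size = 1                            # receptive area (10x10) = whole plane
    print(f"{name}: {planes} planes ({size}x{size})")
# ...ends with "C3: 16 planes (1x1)", matching the last column of Table 3.
```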
6 Conclusions

In order to provide motivation for the proposed modifications of the neocognitron [5-8], we have presented some experimental results obtained from its simulation. It has been found that, for an improved recognition of two-dimensional patterns by the neocognitron, changes are needed. There should be no inhibition in the initial layer, thus enabling a more efficient extraction of the primitives the input pattern is made up of. Inhibition can be included in the later stages, and should be feature-based to enable efficient classification. However, it should also be noted that the proposed architecture is still not intrinsically meant for the recognition of patterns subjected to rotation and/or occlusion. In fact, a different approach, based on feature extraction (using, for instance, the location of corners and their connectivity), is needed. These results will be reported elsewhere [16, 17].
7 References

1 HUBEL, D., and WIESEL, T.: 'Shape and arrangement of columns in the cat's striate cortex', J. Physiology, 1963, 165, pp. 559-567
2 GOVINDAN, V.K., and SHIVAPRASAD, A.P.: 'Character recognition - a survey', Pattern Recognit., 1990, 23, (7), pp. 671-683
3 LIPPMANN, R.P.: 'An introduction to computing with neural networks', IEEE ASSP Mag., April 1987, pp. 4-22
4 LIPPMANN, R.P.: 'Pattern classification using neural networks', IEEE Commun. Mag., November 1989, pp. 47-64
5 FUKUSHIMA, K.: 'Cognitron: a self-organising multi-layered neural network model', Biol. Cybern., 1975, 20, pp. 121-136
6 FUKUSHIMA, K., and MIYAKE, S.: 'Neocognitron: a new algorithm for pattern recognition tolerant of deformations and shifts in position', Pattern Recognit., 1982, 15, (6), pp. 455-469
7 FUKUSHIMA, K.: 'Neocognitron: a self-organising neural network model for a mechanism of pattern recognition unaffected by shift in position', Biol. Cybern., 1980, 36, pp. 193-202
8 FUKUSHIMA, K.: 'Handwritten alphanumeric character recognition by the neocognitron', IEEE Trans. Neural Netw., May 1991, 2, pp. 355-365
9 FUKUSHIMA, K.: 'Analysis of the process of visual pattern recognition by the neocognitron', Neural Netw., 1989, 2, pp. 413-420
10 VENKATESH, Y.V., and GANESH MURTHY, C.N.S.: 'Experimental investigations on the performance of the neocognitron for 2-D pattern recognition'. International conference on Information processing, Singapore, December 1991
11 VENKATESH, Y.V., and GANESH MURTHY, C.N.S.: 'Modified neocognitron for improved 2-D pattern recognition'. Technical Report, March 1994 and March 1995 (revised), Department of Electrical Engineering, Indian Institute of Science, Bangalore, India
12 LI, C., and WU, C.H.: 'Introducing rotation invariance into the neocognitron model for pattern recognition', Pattern Recognit. Lett., December 1993, 14, pp. 985-995
13 CASASENT, D., and PSALTIS, D.: 'Position, rotation and scale invariant optical correlation', Appl. Opt., 1976, 15, pp. 1795-1799
14 MENON, M.M., and HEINEMANN, K.G.: 'Classification of patterns using a self-organising neural network', Neural Netw., 1988, 1, pp. 201-215
15 BARNARD, E., and CASASENT, D.: 'Shift invariance and the neocognitron', Neural Netw., 1990, 3, pp. 403-410
16 GANESH MURTHY, C.N.S., and VENKATESH, Y.V.: 'Character recognition using encoded patterns as inputs to neural networks'. National conference on Neural networks and fuzzy systems (NCNNFS 1995), Anna University, Madras, India, 16-18 March 1995, pp. 104-114
17 GANESH MURTHY, C.N.S., and VENKATESH, Y.V.: 'A new method for encoding patterns for classification by neural networks', IEEE Trans. Neural Netw. (submitted)