ARTICLE Communicated by Bruno Olshausen

Self-Organization of Control Circuits for Invariant Fiber Projections

Tomas Fernandes
fernandes@fias.uni-frankfurt.de
Christoph von der Malsburg
malsburg@fias.uni-frankfurt.de
Frankfurt Institute for Advanced Studies, 60438 Frankfurt am Main, Germany

Assuming that patterns in memory are represented as two-dimensional arrays of local features, just as they are in primary visual cortices, pattern recognition can take the form of elastic graph matching (Lades et al., 1993). Neural implementation of this may be based on preorganized fiber projections that can be activated rapidly with the help of control units (Wolfrum, Wolff, Lücke, & von der Malsburg, 2008). Each control unit governs a set of projection fibers that form part of a coherent mapping. We describe a mathematical model for the ontogenesis of the underlying connectivity based on a principle of network self-organization as described by the Häussler system (Häussler & von der Malsburg, 1983), modified to be sensitive to pattern similarity and to support formation of multiple mappings, each under the command of a control unit. The process takes the form of a soft winner-take-all, where units compete for the representation of maps. We show simulations for invariant point-to-point and feature-to-feature mappings.

1 Introduction

Our work is based on the hypothesis that the recognition of visual patterns is based on a homeomorphism between object models in memory and segments within the visual input field. According to neurophysiological observations on a range of animal species, visual input is represented in primary visual cortex as a two-dimensional array of active local feature detector neurons. The receptive fields of these feature detectors may be idealized as Gabor wavelets (Daugman, 1980; Jones & Palmer, 1987).
Neural Computation 27, 1005–1032 (2015). doi:10.1162/NECO_a_00725. © 2015 Massachusetts Institute of Technology

We assume also that patterns in memory, presumably located in inferotemporal cortex (Rolls, 1991), are in the form of arrays of feature detectors structured by their connections as two-dimensional sheets. It may further be surmised that analysis of visual input proceeds in the form of attention flashes, each of which singles out a segment, a figure, from within the current visual representation in primary visual cortex and recognizes it by finding a homeomorphic model in memory, that is, a model whose feature detectors can be brought into one-to-one correspondence with those of the visual segment such that neighboring units in one field correspond to neighboring units in the other. Neighborhood-preserving mapping has been successfully exploited in a face recognition system (Lades et al., 1993).

We proceed on the assumption that these correspondence relations are implemented by arrays of neural fibers. As a given figure has to be identified with a single memory model in spite of its appearance in varying position, size, and orientation within the retinal coordinate system of primary visual cortex, a whole system of fiber projections must be available, one of which is to be activated during each recognition event. We speak of invariant fiber projections. Homeomorphism-based pattern recognition has been variously proposed in the literature (von der Malsburg, 1994; Hinton, 1981; Kree & Zippelius, 1988; Olshausen, Anderson, & Van Essen, 1993; Wiskott & von der Malsburg, 1996; Arathorn, 2002) as an alternative to the more widely accepted hierarchies of feature detectors (see Krizhevsky, Sutskever, & Hinton, 2012; Sermanet et al., 2014). Efficient management of the invariant fiber projections is possible with the help of control units, as proposed in Olshausen et al.
(1993) and further developed in Lücke (2005) and Wolfrum, Wolff, Lücke, and von der Malsburg (2008). The latter demonstrated on this basis a highly functional model of biological face recognition for the special case of position invariance. In that system, control units perform several functions: they test for similarity of signals on the presynaptic and postsynaptic sides of the synapses they control, they cooperate with other control units with which they are consistent in the sense of a neighborhood-preserving mapping, they compete with inconsistent units, and if they prevail over their competitors, they hold “their” connections open and thus establish a coherent homeomorphic mapping. This mode of control is reminiscent of the often-discussed sigma-pi units (Rumelhart & McClelland, 1987). The neurophysiological mechanism could be based on the nonlinear response of dendritic patches on which controlling and controlled synapses coterminate (Mel, 1994). As in previous work (Zhu, Bergmann, & von der Malsburg, 2010; Bergmann & von der Malsburg, 2011), our intention here is to present a biological model for the ontogenesis of the connections between control units and controlled synapses. As a novel aspect, we address the ontogenesis of two-dimensional mappings with invariance for rotation and scale and the additional complication that if features are not invariant to scale or orientation, the correspondences between feature types have to change under rotation or scaling of mappings. This was illustrated in a system (Sato, Jitsev, & von der Malsburg, 2009) capable of first letting a set of control units recognize scale and orientation transformation parameters between two jets (local bundles of feature values) in image and model domain and then letting them activate a set of fibers implementing that transformation. As in Zhu et al.
(2010) and Bergmann and von der Malsburg (2011), the principle we employ to ontogenetically structure the connections of control units is inspired by the ontogenetic mechanism for the establishment of retinotopic mappings, as reviewed in Goodhill (2007). Signals (of electrical or chemical nature) arise spontaneously in the retina, are smoothed by lateral connections, are transported to the target structure (the optic tectum), are smoothed there by lateral connections, and control the growth of synapses. Synapses that find strong correlation between presynaptic and postsynaptic signals grow at the expense of competitors, competition reigning on the one hand between synapses that converge on one target position and on the other hand between synapses that diverge from the same retinal position. Early models of this mechanism have been presented in Willshaw and von der Malsburg (1976), using electrical signals, and Willshaw and von der Malsburg (1979), using chemical signals. As in Zhu et al. (2010) we use a generalization of a formulation (Häussler & von der Malsburg, 1983) that renders the above mechanisms in compact form as a set of differential equations. In the following section, we recapitulate and slightly extend the Häussler model for the formation of fixed retinotopic maps before describing, in section 3, the formation of units for the control of multiple maps. Whereas in the first case all synapses not part of the final fixed mapping are eliminated, in the multiple map case, all synapses required by any of the maps must be physically present, but the synapses not part of the currently active map are temporarily switched off by lack of support from their control unit(s).

2 Fixed Map Formation

In order to form a mapping, neurons from a source region have to project with their fibers to a target region and establish synaptic contacts there.
Here we use the example case of retinotopic maps, but other topographic mappings in higher visual areas might also develop by the process described in this section. According to this mechanism, fibers that originate in neighboring points in the retina and project to neighboring points in the tectum (see Figure 1a) cooperate with each other. Due to short-range excitatory connections in the two sheets ($C_{\rho'\rho}$ and $C_{\tau\tau'}$ in the figure), such pairs of fibers are part of alternate pathways transporting a signal originating in a point in the retina to a point in the tectum. One such pathway is the direct connection $\rho \to \tau$ with strength $w_{\tau\rho}$; another pathway between the same points is the chain $C_{\rho'\rho} - w_{\tau'\rho'} - C_{\tau\tau'}$. These pathways, the combined strength of which is $\sum_{\tau'\rho'} C_{\tau\tau'}\, w_{\tau'\rho'}\, C_{\rho'\rho}$, conspire to induce signal correlations between points $\rho$ and $\tau$, correlations that in a Hebbian fashion act to strengthen the direct connection $w_{\tau\rho}$. Under some simplifying assumptions concerning spontaneous signal sources in the retina and linear transport of signals over connections, it is possible (see von der Malsburg, 1995) to eliminate signals from the dynamics and subsume their effect in a formulation of direct interactions between connection strengths, as used below, for example, in equation 2.2.

Figure 1: Cooperative (a) and competitive (b, c) processes between fibers connecting the retina and tectum. Fibers connecting neighboring points in the retina to neighboring points in the tectum cooperate (a), whereas competition reigns between fibers that connect one retinal point to very different tectal points (b) or very different retinal points with the same tectal point (c). These interactions lead to the development of topological mappings, which optimally exploit cooperation and avoid competition.
Here $C_{\rho'\rho}$ and $C_{\tau\tau'}$ are coupling functions within the two sheets, and $w_{\tau\rho}$ is the strength of the connection between point $\rho$ in the retina and point $\tau$ in the tectum. In order to balance the growth of connections, there is growth-diminishing competition of two kinds. If a point in the retina connects to several points in the tectum, these connections compete for strength (divergent competition; see Figure 1b). Likewise, connections from different retinal positions to the same tectal position compete (convergent competition; see Figure 1c). This process of map formation has been compactly formulated as a set of differential equations for the development of the strengths $w_{\tau\rho}$ of the connections between retinal point $\rho$ and tectal point $\tau$ (Häussler & von der Malsburg, 1983). This “Häussler system” is defined by the following equation (in which $W$ stands for the matrix $w_{\tau\rho}$),

$$\dot{w}_{\tau\rho} = f_{\tau\rho}(W) - w_{\tau\rho}\, B_{\tau\rho}(f(W)), \qquad (2.1)$$

which contains the cooperation term

$$f_{\tau\rho}(W) = \alpha + w_{\tau\rho} \sum_{\tau'\rho'} C(\tau, \tau', \rho, \rho')\, w_{\tau'\rho'}, \qquad (2.2)$$

and the competition term

$$B_{\tau\rho}(f(W)) = \frac{1}{2}\left( \frac{\sum_{\tau'} f_{\tau'\rho}(W)}{N_\tau} + \frac{\sum_{\rho'} f_{\tau\rho'}(W)}{N_\rho} \right). \qquad (2.3)$$

Here, $\alpha$ is an unspecific synaptic growth rate; $C(\tau, \tau', \rho, \rho')$ is a separable coupling function, modeled as a product of gaussians, $C(\tau, \tau', \rho, \rho') = C_\tau(\tau', \tau)\, C_\rho(\rho', \rho)$; and $N_\rho$, $N_\tau$ are the numbers of points in the retina and tectum, respectively. As we pointed out above, the coupling functions are the result of neural signal correlations, as derived in von der Malsburg (1995).

Figure 2: Matrix interpretation of the Häussler system in the case of one-dimensional chains as retina and tectum. Each matrix cell holds the weight of a connection $w_{\tau\rho}$ between the points with index $\rho$ and $\tau$, and the matrix has size $N_\tau \times N_\rho$. The growth of each connection is enhanced by its neighbors (the circle symbolizing the bell-shaped coupling function $C$) and is suppressed by convergent competition (within columns) and divergent competition (within rows).
The sum term in equation 2.2 models the cooperation of the direct connection $w_{\tau\rho}$ with its indirect neighboring connections (see Figure 1a), while $B_{\tau\rho}$ models competition, its two terms standing for divergent (see Figure 1b) and convergent (see Figure 1c) competition. If the source and target domains are one-dimensional chains instead of two-dimensional sheets, the connections $w_{\tau\rho}$ form a matrix in which all interactions between these connections can be visualized (see Figure 2). In Häussler and von der Malsburg (1983), the system was subjected to a stability analysis by linearization about the unstable stationary point $w_{\tau\rho} = 1 \ \forall \tau, \rho$ and extraction of linear modes. With a proper choice of $\alpha$, only two modes, corresponding to coarse crossed diagonals, would initially grow. These correspond to the two possible orientations of the map, and they grow at an equal rate due to the symmetry of the system. Due to nonlinear interactions, these compete with each other, spontaneously breaking the symmetry, and the winning diagonal then excites higher-frequency modes, which finally add up to form a narrow diagonal pattern, which corresponds to a topological mapping between the chains. Development of the system thus proceeds from coarse to fine. In simulations, the choice of the final configuration is determined by the initial state of the weight matrix, which is initialized with small random values, or by noise injected in the process, if any. Figure 3 shows several possible final states of a simulation for the two-dimensional case.

Figure 3: Häussler system in two dimensions. These patterns of connectivity are stable states of equation 2.1. The system is simulated iteratively starting with a connectivity matrix initialized with small random values and reaches the final configuration in about 3500 iterations. The final state is selected by the random initial conditions.
So far, the system is autonomous, the final state being determined by spontaneous symmetry breaking. In the biological case, however, the orientation of the retinotopic map is determined genetically. Apparently (for a review, see Goodhill, 2007) this is achieved with the help of weak chemical marker gradients that are preestablished in the retina and tectum to break the symmetry. To include such external influences in the system, we replace the unspecific growth term $\alpha$ in equation 2.2 by a structured growth term (somewhat inspired by Hebbian plasticity),

$$g_{\tau\rho} = \alpha_{\max}\, e^{-k_e (I_\rho - O_\tau)^2} + \eta, \qquad (2.4)$$

where $k_e$ is a fixed parameter and $I$ and $O$ are input and output activity patterns, which for the moment may be assumed to be externally imposed. The exponential term controls the growth of the weight of the fiber $(\rho, \tau)$ in such a way that similar activity values $I_\rho$ and $O_\tau$ lead to the maximal growth rate $\alpha_{\max}$, whereas low similarity brings the growth close to zero, disturbed only by a small noisy fluctuation $\eta$. External guidance of the map formation process, somewhat analogous to preestablished chemical marker gradients in the retino-tectal case, replaces spontaneous symmetry breaking with goal-oriented development.

3 Formation of Multiple Maps

In distinction to the ontogenetic establishment of fixed retinotopic mappings, homeomorphic pattern recognition requires a multiplicity of maps, one of which is to be activated for each relative transformation state between retinal and pattern-centered coordinate frames, such that retinal and memory patterns can be mapped to each other. Because a very large number of points in retinal coordinates are to be mapped at different times to the same point in invariant space, it is unrealistic to assume that these mappings are realized by direct fibers. It rather is to be assumed that there is a sequence of intervening sheets with relatively low fan-in and fan-out at each stage.
This was first proposed by Anderson and Van Essen (1987) and was shown in Wolfrum and von der Malsburg (2007) to be feasible with realistic numbers of sheets and fibers. In addition, there are reasons to assume that the first layers are responsible for compensating translation (and the static deformation due to foveal inhomogeneities) and that scale and orientation are compensated at later stages. We concentrate here on the latter. We thus consider an input domain I and an output domain O and direct fibers between them, with the goal of establishing a set of mappings that differ in relative orientation and relative scale between I and O. Once established, these mappings have to be activated on the fast, perceptual timescale, while the image of an object under scrutiny is moving over the retina. As proposed in Olshausen et al. (1993), further worked out in Lücke (2005), and implemented in a concrete object recognition system (Wolfrum et al., 2008), this is possible with the help of control units, a hypothetical kind of neurons, as outlined in section 1 (see Figure 4). According to this hypothesis, the fibers constituting a coherent mapping are under the management of a single control unit. When this unit is active, it helps the fibers to transmit signals, but when it is inactive, it essentially switches them off. It is the point of this article to describe the learning process by which the connections between control units and individual projection fibers between I and O are set up so that they command organized maps. These connections presumably have the form that neuritic extensions of control units approach the synapses of individual projection fibers in O and touch them directly or connect to dendritic patches near them. We denote the strengths of these connections as $W^u_{\tau\rho}$, index $u$ identifying the control unit and $\rho$ and $\tau$ the units in I and O, respectively, that are connected by the fibers under the control of $u$.
There is to be a whole set of control units that together cover the space of all transformations. (Although it is, for several reasons, more realistic to assume that a cooperative set of control units is needed to control a single mapping, we here consider for simplicity only a single control unit per mapping.) The process to be modeled has the following form. A visual pattern I appears in I. It is centered (as the result of mappings in previous stages), and it occupies part of I. Initially the activity O in O is random. Some or all of the control units are active, so that most of the projecting fibers from I to O are conducting. The combined effect of these signals modifies the pattern O.

Figure 4: Controlled mappings. In this schematic, three topographic mappings (indicated by different colors) connect the domains I and O, each governed by a control unit (U1 to U3, their colors corresponding to those of the mappings). The control unit $u$ has bidirectional connection strengths $W^u_{\tau\rho}$ to and from synapses connecting fibers between units $\rho$ in I to units $\tau$ in O. Control unit activity is computed in equation 3.1 from input $E(u)$, the weighted similarity, equation 3.2, of signal patterns in the domains I and O, and lateral input, equation 3.3, which the units receive from each other. During learning, control units fire with the probability given in equation 3.5 and update their weights according to equation 3.6. The latter is similar to equation 2.1, containing, however, a binary control variable $S(u)$ and, instead of $\alpha$, an activity-dependent growth term, equation 2.4. During pattern recognition, not modeled here, silent control units switch off and fail to keep open the mapping fibers they control through their contacts $W^u_{\tau\rho}$.
A cycle is then started in which the control units test for similarity between I and O under each unit’s current individual mapping, and one or several control units experiencing the highest similarities are permitted to modify their connections so as to bring their mapping nearer to the actual signal similarities, as described above. This iteration may go on for some time while the same pattern I is active, and the process is repeated with many different input patterns.

From the point of view of one of the control units, the changes undergone by its connections $W^u_{\tau\rho}$ are an intermittent form of retinotopic development. Each time the unit is switched on, it encounters an output pattern that is relatively similar to the input pattern (as that is the condition for its firing), and it is permitted to briefly change its connections according to the retinotopy equation, equation 3.6, which is a gated and similarity-guided version (see equation 2.4) of equation 2.1. Its connections thus develop into a retinotopic mapping, different control units specializing on different scales or orientations. After this process has converged, there will be a set of mappings, each governed by its own control unit and together covering a range of transformation parameters in scale and orientation. We now describe the functional components of the iterative algorithm in more detail.

3.1 Unit Excitation. We define $E_{tot}$, the total excitation of the control unit $u$, as a weighted sum, combining the input $E(u)$ coming through synaptic contacts with the mapping fibers, equation 3.2, with the lateral excitation/inhibition $L(u)$ between units, equation 3.3, parameter $b \in [0, 1]$ regulating the balance between the two terms:

$$E_{tot}(u) = b\,E(u) + (1 - b)\,L(u). \qquad (3.1)$$

3.2 Excitation Through the Mapping.
The control units’ processes (which properly should be called neurites, as they conduct signals in both directions) touch the synapses $(\rho, \tau)$ of the projection fibers with weight $W^u_{\rho\tau}$ and collect the values of the similarity of signals on the presynaptic and the postsynaptic side of the projecting synapses, weighted with the strength of the connection,

$$E(u) = \frac{1}{N_u} \sum_{\rho\tau} W^u_{\rho\tau}\, e^{-\frac{(I_\rho - O_\tau)^2}{2k_e^2}}, \qquad (3.2)$$

where $k_e$ is a constant that regulates the standard deviation of the similarity measure, $I_\rho$ and $O_\tau$ are the activity patterns on the pre- and postsynaptic sides of the mapping, $\rho$ and $\tau$ are indices of source and target points in I and O, respectively, and $N_u = \sum_{\rho\tau} W^u_{\rho\tau}$ is a normalization factor.

3.3 Lateral Interaction Between Control Units. The lateral interaction $L(u)$ of the control units in equation 3.1 is computed according to

$$L(u) = \sum_{u' \neq u}^{U} C(d_{u,u'})\, S(u'), \qquad (3.3)$$

where $U$ is the number of control units, $S(u') \in \{0, 1\}$ is the activity state of unit $u'$ (see below), $C(\cdot)$ is the mexican hat coupling function defined in equation 3.4, and $d_{u,u'}$ is the distance between the control units. Although control units naturally form a two-dimensional array with coordinates scale and orientation, we connected them up as a one-dimensional circular chain and made sure that our input image sequence also formed a one-dimensional chain by keeping the scale of the input image constant during a whole sequence of orientations, then changing the scale by one step, and so on. In this way, each image is very similar to the previous one (with one exception, when the scale jumps from the highest to the lowest value), and the neighborhood interaction between control units ensures that the next control unit on duty is already preexcited. The coupling function is defined as

$$C(d) = \frac{2}{\sqrt{3\sigma_u}\,\pi^{1/4}} \left(1 - \frac{d^2}{\sigma_u^2}\right) e^{-\frac{d^2}{2\sigma_u^2}}, \qquad (3.4)$$

where $d = d_{u,u'}$ is the distance between control units $u$ and $u'$ and $\sigma_u$ is a parameter scaling the distance over which units cooperate.
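As a concrete reading of equations 3.2 to 3.4, the unit excitation and lateral interaction can be sketched as follows. This is a simplified illustration; the array shapes and the distance matrix convention are our own, not the authors':

```python
import numpy as np

def map_excitation(W_u, I, O, k_e=10.0):
    """E(u), eq. 3.2: similarity of pre- and postsynaptic activity,
    weighted by the unit's control connections W_u of shape (N_tau, N_rho)."""
    sim = np.exp(-(I[None, :] - O[:, None]) ** 2 / (2 * k_e ** 2))
    return np.sum(W_u * sim) / np.sum(W_u)

def mexican_hat(d, sigma_u=2.5):
    """Coupling function C(d), eq. 3.4: positive for d < sigma_u
    (cooperation), negative beyond it (inhibition)."""
    a = 2.0 / (np.sqrt(3.0 * sigma_u) * np.pi ** 0.25)
    return a * (1.0 - d ** 2 / sigma_u ** 2) * np.exp(-d ** 2 / (2 * sigma_u ** 2))

def lateral_input(u, S, dist, sigma_u=2.5):
    """L(u), eq. 3.3: coupling-weighted sum over the other units' states S."""
    others = np.arange(len(S)) != u
    return np.sum(mexican_hat(dist[u, others], sigma_u) * S[others])
```

For the circular chain of units described above, `dist[u, u']` would be the circular distance between indices `u` and `u'`.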
The mexican hat shape of this function implements short-range cooperation and long-range inhibition among the units.

3.4 Unit Firing Probability and Firing Rate. In each iteration step, whether a unit fires (firing being designated as $S(u) = 1$) is determined stochastically with probability

$$p(u) = \frac{1}{1 + e^{-k_s (E_{tot}(u) + \beta)}}, \qquad (3.5)$$

where $k_s$ is a parameter and $E_{tot}$ is computed in equation 3.1. The effect of our stochastic firing mechanism is a soft winner-take-all (WTA). There are several reasons for using a stochastic mechanism for activating control units. First, even units with small $E_{tot}$ thus get a chance to fire and modify their weights occasionally, so that all of them are finally engaged, while, second, units with large $E_{tot}$ are kept from dominating the game, thereby giving other units a chance to take over transformation parameters although they still have lower $E_{tot}$. The parameter $\beta$ controls the bias of the unit by shifting the curve along the excitation axis, thereby giving even inhibited units a certain firing probability. Early in the learning process, the distribution of $E_{tot}(u)$ will be very broad, but toward the end of the learning period, when the mappings controlled by the units are highly structured, that distribution will be sharply peaked around one point in the transformation space (see Figures 8 or 10), so that the activation of control units will be almost deterministic.

Note that this stochastic function is of the same nature as the one used in Boltzmann machines (Ackley, Hinton, & Sejnowski, 1985). There are differences, though, with respect to how these units are used. First, the connection weights of the units are the result of similarity-guided self-organization of topographically restricted mappings. Second, units compete and cooperate via lateral connections (see equations 3.1 to 3.4).

3.5 Evolution Equation for the Connections.
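The stochastic soft-WTA of equation 3.5 is straightforward to sketch (again an illustration, not the authors' code):

```python
import numpy as np

def firing_step(E_tot, k_s=4.0, beta=0.0, rng=None):
    """Eq. 3.5: each control unit fires (S(u) = 1) with sigmoidal
    probability p(u) in its total excitation E_tot(u)."""
    rng = rng if rng is not None else np.random.default_rng()
    p = 1.0 / (1.0 + np.exp(-k_s * (E_tot + beta)))
    S = (rng.random(E_tot.shape) < p).astype(int)
    return S, p
```

Even a unit with low excitation keeps a nonzero firing probability (raised further by $\beta$), which is what lets all units eventually engage; as the excitation distribution sharpens late in learning, firing becomes almost deterministic.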
Putting these terms together, our evolution equation for the connectivity $W^u_{\tau\rho}$ is now

$$\dot{W}^u_{\tau\rho} = S(u)\left[f_{\tau\rho}(W^u) - W^u_{\tau\rho}\, B_{\tau\rho}(f(W^u))\right], \qquad (3.6)$$

with the cooperation term

$$f_{\tau\rho}(W^u) = g_{\tau\rho} + W^u_{\tau\rho} \sum_{\tau'\rho'} C(\tau, \tau', \rho, \rho')\, W^u_{\tau'\rho'}. \qquad (3.7)$$

The difference from equation 2.1, apart from the fact that the additional index $u$ has been introduced, lies in the gating factor $S(u)$, which permits modification of connections only for active control units (with $S(u) = 1$) and not for inactive units ($S(u) = 0$), whereas the difference of equation 3.7 from equation 2.2 is the replacement of the constant $\alpha$ by the pattern similarity term, equation 2.4. The competition term $B_{\tau\rho}$ is formed as in equation 2.3.

3.6 Pattern Formation in the Target Domain. In the beginning of the process, we assume that the activity pattern O in the target domain O (output pattern) is formed by random fluctuations and thus is unstructured. The output pattern is then updated according to

$$\dot{O}_\tau = \gamma (I^*_\tau - O_\tau), \qquad (3.8)$$

where $\gamma$ is an update rate and $I^*_\tau = I_{\rho'}$ is the activity of the source domain unit $\rho'$ with the strongest active connection $W^{u'}_{\tau\rho'}$ to $\tau$,

$$(\rho', u') = \operatorname*{argmax}_{\rho, u}\, S(u)\, W^u_{\tau\rho}. \qquad (3.9)$$

This strategy is similar to the one proposed in Wiskott and von der Malsburg (1996; see also Riesenhuber & Poggio, 1999; Krizhevsky et al., 2012; Sermanet et al., 2014). Note that conceptually, our update rule differs from the trace learning rule presented in Földiak (1991). Here, only one projecting fiber per output position is allowed to project its input activity. In addition, this fiber selection is indirect and depends on the state of the control units and the control connection weights $W^u_{\tau\rho}$.

3.7 Convergence and Entropy. The system converges when $\dot{W}^u_{\tau\rho} \to 0$ for all control units $u$ and all connections $\rho\tau$. To diagnose this convergence of the fiber connections, we compute the sum $\Delta$ of the moduli of the weight changes over all connections,

$$\Delta = \sum_{u\tau\rho} |\dot{W}^u_{\tau\rho}|. \qquad (3.10)$$
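Equations 3.6 to 3.9, together with the growth term of equation 2.4, can be combined into a sketch of one learning step for a single control unit. This is illustrative only; array shapes and parameter values are our own choices:

```python
import numpy as np

def growth_term(I, O, alpha_max=0.2, k_e=10.0, eta=1e-3, rng=None):
    """g, eq. 2.4: similarity-guided growth, shape (N_tau, N_rho)."""
    rng = rng if rng is not None else np.random.default_rng()
    g = alpha_max * np.exp(-k_e * (I[None, :] - O[:, None]) ** 2)
    return g + eta * rng.random(g.shape)

def controlled_step(W_u, S_u, g, C_tau, C_rho, dt=0.05):
    """Gated Haussler step, eqs. 3.6-3.7: only firing units (S_u = 1) learn."""
    if S_u == 0:
        return W_u
    f = g + W_u * (C_tau @ W_u @ C_rho.T)               # eq. 3.7
    N_tau, N_rho = W_u.shape
    B = 0.5 * (f.sum(axis=0, keepdims=True) / N_tau
               + f.sum(axis=1, keepdims=True) / N_rho)  # as in eq. 2.3
    return W_u + dt * (f - W_u * B)                     # eq. 3.6

def output_step(O, I, W, S, gamma=0.1):
    """Eqs. 3.8-3.9: each output unit tracks the input of the source unit
    with the strongest *active* connection onto it (one winner fiber only).
    W has shape (U, N_tau, N_rho); S is the binary vector of unit states."""
    active = W * S[:, None, None]                       # silent units gate off
    per_tau = active.transpose(1, 0, 2).reshape(O.size, -1)
    rho_star = np.argmax(per_tau, axis=1) % W.shape[2]  # winning rho' per tau
    return O + gamma * (I[rho_star] - O)                # Euler step of eq. 3.8
```

The winner-fiber selection in `output_step` is the point where a linear combination of inputs is deliberately avoided, as discussed in section 4.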
Furthermore, after convergence, the matrix $W^u_{\tau\rho}$ should have most of its values close to zero (the actual value determined by the positive growth term $g_{\tau\rho}$), except for those connections that form the mapping, which converge to the saturation value $N_{tot}$, the total number of units in the target domain. In order to assess the progress of this concentration of values, we calculate the entropy of this distribution at every iteration by normalizing the weights, $W^u_{norm} = W^u / \sum_{\rho\tau} W^u_{\rho\tau}$, computing their histogram $h(n)$ over $N$ bins (e.g., $N = 512$), and then forming the distribution $p_h(n) = h(n) / \sum_i h(i)$, for which we finally compute the entropy,

$$h = -\sum_{n=1}^{N} p_h(n) \log_2(p_h(n)). \qquad (3.11)$$

See Figure 5 for the evolution of $\Delta$ and $h$ for a typical run of the system.

3.8 Execution Flow. The main aspects of the execution flow are described in algorithm 1. The input patterns are organized in consecutive frame sequences containing object images that are slightly transformed from frame to frame through a smooth sequence of transformations $T_1, T_2, \ldots, T_U$, modeling the rigid motion of objects that are tracked or manipulated by an observer (we here limit ourselves to scaling or rotation in the image plane). The output pattern O is reset to a random pattern every time attention shifts to another object.

4 Simulations of Point Mapping Generation

We performed simulations of the algorithm using input and output field sizes of 32 × 32 units, initializing the weight matrix $W^u$ and the output pattern O with small random values and presenting as input patterns face images as in Figure 6. We found good convergence of the system in terms of both the parameters $\Delta$ and $h$ and the shape of the final mappings. Because the progress of map formation, which proceeds in small increments for stochastically changing control units, is difficult to illustrate, we performed a separate long run with a single input image.
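The convergence and entropy diagnostics of equations 3.10 and 3.11 amount to a few lines; a sketch, with the bin count following section 3.7:

```python
import numpy as np

def convergence_delta(W_dot):
    """Delta, eq. 3.10: summed modulus of all weight changes, over all units.
    W_dot stacks the matrices dW^u/dt, e.g. with shape (U, N_tau, N_rho)."""
    return np.sum(np.abs(W_dot))

def weight_entropy(W_u, n_bins=512):
    """h, eq. 3.11: entropy of the normalized weight distribution of one unit.
    Low entropy indicates weights concentrated at few values."""
    W_norm = W_u / np.sum(W_u)
    hist, _ = np.histogram(W_norm, bins=n_bins)
    p = hist / hist.sum()
    p = p[p > 0]                    # convention: 0 * log2(0) = 0
    return -np.sum(p * np.log2(p))
```

A matrix with all weights equal falls into a single histogram bin and has entropy zero; splitting the weights evenly between two values yields one bit.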
The convergence $\Delta$ and the entropy $h$ fell steadily (see Figure 5), and a mapping developed along with an activity pattern O that already closely resembles the input pattern (see Figure 7). Although the simulation runs for 3000 steps, with the convergence parameter $\Delta$ continuing to fall asymptotically, the entropy reaches a plateau at around iteration 1700, and no changes are noted thereafter in the mapping and the pattern O.

Figure 5: Typical evolution over 3000 iterations, taken from the simulation shown in Figure 7. Dotted curve, left scale: convergence $\Delta$ of the system. Solid curve, right scale: entropy $h$ of the central unit in the output field O.

The most important parameters for stabilizing the process and avoiding capture in local optima are the growth rate $\alpha_{\max}$ in equation 2.4 and the widths of the coupling functions $C_\tau$ and $C_\rho$ (see Häussler & von der Malsburg, 1983, for a comprehensive analysis). As in the single-map formation case, in order to ensure stable evolution, we start with large standard deviations of the coupling functions and gradually reduce them during the evolution of the system. A decisive turning point in the development of our system was the realization that whereas a linear combination of inputs to a unit in O washed out all structure from the output pattern and prevented structure formation, permitting only the strongest link to a unit in O to update it (see equation 3.8) solved the problem (an idea introduced in Wiskott and von der Malsburg, 1996, together with a justification in biological terms). While the other parameters are not so critical for stability, they influence the sensitivity of control units to activity states in the beginning of the process and lose importance after a few hundred iterations, when the structure of each control unit’s connectivity is essentially already determined.
For the simulations, we used the following parameter set: $\alpha_{\max} = 0.2$, $\eta = 0.001$, $k_e = 10$, $C_\tau$ and $C_\rho$ with standard deviation $\sigma$ from 2 to 0.75, $b = 0.5$, $\sigma_u = 2.5$, $k_s = 4$, and $\gamma = 0.1$.

Figure 6: (a, b) Typical patterns employed as input during development of point-to-point mappings. The image size is 32 × 32 pixels in gray scale. (c, d) Gabor jets with 15 orientations and 8 scales employed for feature-to-feature mappings (taken from the center pixel of input images a and b, respectively). The horizontal and vertical axes correspond to the orientations and scales of the filters, with white and black values representing high and low responses, normalized for clarity.

The run we have presented so far (Figures 5 and 7) involved a large number of iterations on a single input pattern, showing that a single mapping can develop in an uninterrupted sequence with a single input image. However, to achieve an even distribution of assignments of control units over the range of transformations appearing in the input, it is necessary to use an intermittent schedule. We now describe a production run using the interleaved schedule described in algorithm 1. It involves different input images, each one coming in a sequence of transformed versions. Each such run leaves behind a (slightly deformed) copy of the input pattern, so that the next, scaled or rotated, version gives a different control unit the chance to win and to organize its connections $W^u_{\tau\rho}$. There is no logical necessity to present continuous sequences of transformations of the same object at the input. However, convergence is accelerated decisively by it because, first, one input leaves behind a structured output pattern to which the next input in a sequence can be matched, and, second, due to the neighbor excitation in equation 3.3, the correct next control unit is already preexcited to win the race.
This preexcitation in turn promotes the continuity of response between neighboring control units shown in Figure 10. For our production run, the system had 300 control units, and we used a training set of 30 “objects” (frontal face images, similar to those in Figure 6), each presented at 60 rotations and 5 scales. As the transformations are shared by all objects, the number of objects is not critical as long as a large enough set of transformations is covered. Depending on the size of the training set, the system may iterate cyclically over the samples to reach convergence, because a control unit needs to experience the same transformation a few hundred times before specializing to it. In our production run, each of the 30 face images was cycled through 300 transformations. At the end of this sequence, all control units had structured their own mapping to collectively cover the space of scale and orientation transformations between I and O (see Figure 8).

Figure 7: A sample simulation. Codevelopment of point-to-point mappings and output pattern. (Top) Update of output pattern O. (Middle) Point-to-point transformation Wτuρ for one control unit. Only the strongest link into each output unit is shown for the mapping (if that strongest link is above a small threshold; if not, no link is shown). (Bottom) Iteration number i. Both mapping Wτuρ and output pattern O are initialized randomly. The matrix Wτuρ is updated according to equations 3.6, while the output pattern is updated using equation 3.8 with parameter γ = 0.1. Over the course of the iteration, the output pattern O evolves into a copy of the input pattern I (though slightly deformed and scaled by the mapping; compare Figure 6a).

In our case, the number of control units of the system happens to be equal to the number of transformations shown, but one should rather think in terms of a more or less dense sampling of a continuous space of transformation parameters by control units.
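The interleaved presentation schedule can be sketched as a generator. The nesting order (rotations varying fastest, so that successive inputs differ only incrementally) is our reading of algorithm 1; the counts 30 × 5 × 60 match the production run:

```python
def interleaved_schedule(n_objects=30, n_scales=5, n_rotations=60):
    """Yield (object, scale, rotation) presentation triples: for each object,
    cycle through all its transformations, with rotation changing fastest so
    that consecutive inputs are smoothly related."""
    for obj in range(n_objects):
        for s in range(n_scales):
            for r in range(n_rotations):
                yield obj, s, r
```

Each object thus contributes 5 × 60 = 300 transformed presentations, matching the 300 transformations per face image described above.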
Should very precise mappings between high-resolution patterns be asked for, the control space can be simplified by factorization into smaller subspaces, as modeled explicitly in Olshausen et al. (1993) and Wolfrum and von der Malsburg (2007) for the case of translation.

Figure 8: Evaluation of transformation parameters developed by 15 selected control units (out of the total of 300 control units) in a frozen state of the system. Each pixel in the squares corresponds to a stimulation by an image of the orientation and scale indicated on the axes. The peaks of the distributions show that each unit is selectively excited by a narrow range of transformations. The color bar represents the level of excitation from low (blue) to high (red). The response maxima lie at scales s of approximately 1.25, 1.00, and 0.75 for the three columns (left to right) and at orientations θ varying approximately from −60° to 60° in increments of 30° (top to bottom).

4.1 Evaluation of Control Unit Specificity. In order to evaluate the specificity of the control units after they have developed, we analyze the dynamics of the competition and the excitability of individual control units with respect to a set of known transformations of the input pattern. After the training phase, the system is expected to have converged to a state in which units respond to a small range of transformation parameters. To ascertain this, we freeze the output pattern Oτ (at a time when it has evolved to be similar to the input pattern used in the test) and the connection weights Wτuρ. We then take the input pattern through a number of rotation and scale parameters (s, θ) and obtain the excitation E(u) according to equation 3.2. Plotting its values for each control unit as a two-dimensional excitation surface over the orientation and scale parameters of the input pattern yields Figure 8.
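The probing procedure can be sketched as follows. Here `excite(s, theta)` stands in for evaluating equation 3.2 with the frozen weights and output pattern; it is a hypothetical interface, not the paper's code:

```python
import numpy as np

def excitation_surface(excite, scales, orientations):
    """Probe one frozen control unit with (scale, orientation)-transformed
    inputs and collect its excitation surface, as evaluated for Figure 8.
    Returns the surface and the (s, theta) pair at its peak."""
    E = np.array([[excite(s, th) for th in orientations] for s in scales])
    i, j = np.unravel_index(np.argmax(E), E.shape)
    return E, (scales[i], orientations[j])
```

Probing a toy unit tuned to s = 1, θ = 0 recovers that peak from the surface.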
As expected, it turns out that in most cases, control units have a pronounced excitation peak around a particular scale and orientation of the input pattern relative to the output pattern; that is, they have receptive fields that are selective for that specific transformation. In some cases (see, e.g., the unit in the last row, first column), a control unit responds to more than one region. This may be caused by partially consistent mappings, partial maps responding to different parameter sets, or the symmetries of the input pattern. In other cases, as in the last column, rows 2, 4, and 5, the region is identifiable but the excitation is relatively low compared to the maximum excitation level of other units. This may be due to peculiarities of our specific set of input patterns, especially limited resolution. The graph of Figure 9 depicts the evolution over the training phase of the firing probabilities of five neighboring control units, corresponding to the center column shown in Figure 8. Figure 10 illustrates the continuity of response between neighboring control units after completion of the organization, showing that neighboring control units in the chain respond to similar transformation parameters. Because the input pattern does not change much from one iteration to the next, lateral excitation (see equation 3.3) preexcites neighboring units, making them more likely to win the competition in the next round.

5 Feature-to-Feature Mapping

Once consistent point mappings between I and O have been organized so that they can be activated by single control units, our next task is to model the further structuring of the connections of those control units so that they also specify feature-to-feature maps that are consistent with the point maps; consistency here means that point maps and feature maps describe the same relative orientation and scale between the patterns in I and O and their local features.
So far, the “units” in systems I and O have been treated as structureless entities, as if they were single neurons. We now have to face the fact that each position in I and O is occupied by a whole set of neurons representing different texture features in I, or ready to represent texture features in O (those cells acquiring their feature selectivity only through their connections from I). Thus, what we called a unit might now be called a hypercolumn, or column. Correspondingly, the unit activity Iρ is to be replaced by the higher-dimensional entity Iρκ, and similarly Oτ by Oτι, where the indices κ and ι designate feature cells inside columns in the two domains. Correspondingly, the set of connections is to be expanded into a higher-dimensional entity, with point and feature indices in both domains.

Figure 9: Evolution of the probability p(u) (see equation 3.5) of five control units in response to input figures at scale s = 1 and orientation θ = 0, corresponding to the midpoint in the panels of Figure 8. Control unit 0 (corresponding to the center panel of Figure 8) develops a strong probability of firing in response to that transformation parameter set, while the others (corresponding to the other panels in the same column, in the sequence 5, 3, 1, 0, 2, 4, 6, top to bottom) maintain some probability of firing due to lateral interaction or residual pattern similarity.

We tried to work with this full space from the beginning but could not achieve convergence of the system, the reason being the high dimensionality of the search space. A simple means to reduce this search space decisively is to again apply a coarse-to-fine strategy. To implement this, we assume that the neurons inside columns are initially coupled by excitatory connections that are strong enough early in development to let the units of a column switch on and off simultaneously. The whole column can then be described by a single activity variable. Our simulations so far may be interpreted this way. At the end of this stage, the connectivity Wu has converged to a state in which, for a given control unit u, all the connections from a column in I go to a single column in O, forming all-to-all connections between the feature units inside the columns (or, rather, a random subset of all possible connections).

Figure 10: Neighboring unit responses showing the continuity of the distribution of activation of control units in the one-dimensional chain of the control space. This graph was obtained by rotating an input pattern and calculating E(u) using equation 3.2 for the 10 immediate neighbors (5 to each side, indexed by u ∈ [−5 . . . 5]) of the control unit u = 0 that developed the strong response to scale s = 1 and orientation θ = 0. The orientation of the input pattern is given on the horizontal axis. Due to the neighbor excitation given by equation 3.3, a winner unit can preexcite its neighbors, making them more likely to win the competition in the next iteration. Even beyond the range shown, we observed complete continuity of the mapping of transformation parameter values onto control unit numbers.

Once this system state is reached, the strength of the excitatory coupling inside columns in I and in O can be reduced, so that the activities of individual neurons become independent of each other and the control units are free to organize the feature-to-feature connections. To describe the reorganization of the feature map between a pair of columns connected under a control unit u, one in I and one in O, we formulate the analog of equation 3.6, valid for all pairs of columns connected under a control unit u,

Ẇικu = S(u)[ fικ(W) − Wικu Bικ(f(Wu))], (5.1)

with the cooperation term

fικ = gικ + Wικu Σι′κ′ C(ι, ι′, κ, κ′) Wι′κ′u. (5.2)

Again, the competition term is of the form of equation 2.3. The pattern similarity term gικ is analogous to equation 2.4, only this time computed on the basis of the activity patterns in the pair of columns to which the equations are applied. The indices ι and κ are both two-dimensional, the two dimensions referring to orientation and scale of the Gabor feature space. We use a log-Gabor filter bank (Field, 1987) with 15 orientations and 8 scales (see Figures 6c and 6d for two examples). The cooperation function C is assumed to have the same form as that in equation 3.7, that is, a product of gaussian functions for the two dimensions. Orientations have periodic boundary conditions; the scale dimension has open boundary conditions. We have simulated this system for a single pair of columns in I and O, the pair corresponding to the central link of the simulation shown in Figure 7, extracting the log-Gabor features centered on the point of origin of that link in the input pattern and initializing the activity pattern in the column in O with random numbers. The typical evolution of the system for a single control unit is shown in Figure 11. As mappings between I and O vary only in terms of scale and orientation (and not in terms of translation or other dimensions), selecting a given pair of points in the two domains automatically also selects one transformation parameter set (s, θ). An exception is the pair of center points in the two domains, which experiences all transformations in s and θ. To cover this case, we have followed the full schedule of algorithm 1, creating input feature sets extracted from s- and θ-transformed input images for the subsequences. Figure 12 shows the final states of a set of seven control units representing rotations of the input jet (which appear as horizontal translations in feature space), at scale s = 1.0.
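The layout of the log-Gabor feature bank used above can be sketched in the frequency domain (Field, 1987). The 15 × 8 orientation-by-scale layout and the 32-pixel image size come from the text; the bandwidth and center-frequency parameters are illustrative guesses, not values from the paper:

```python
import numpy as np

def log_gabor_bank(size=32, n_orient=15, n_scale=8,
                   f_max=0.45, scale_step=1.4, sigma_ratio=0.65, sigma_theta=0.35):
    """Build a (n_scale, n_orient, size, size) bank of log-Gabor transfer
    functions: a gaussian on the log frequency axis times a gaussian in
    orientation. Bandwidth parameters are assumptions."""
    fy, fx = np.meshgrid(np.fft.fftfreq(size), np.fft.fftfreq(size), indexing="ij")
    f = np.hypot(fx, fy)
    f[0, 0] = 1.0                      # avoid log(0); the DC bin is zeroed below
    theta = np.arctan2(fy, fx)
    bank = np.empty((n_scale, n_orient, size, size))
    for s in range(n_scale):
        f0 = f_max / scale_step ** s   # center frequency of this scale
        radial = np.exp(-np.log(f / f0) ** 2 / (2 * np.log(sigma_ratio) ** 2))
        radial[0, 0] = 0.0             # no DC response
        for o in range(n_orient):
            th0 = o * np.pi / n_orient
            # smallest angular difference, pi-periodic (orientation, not direction)
            d = np.angle(np.exp(1j * 2 * (theta - th0))) / 2.0
            bank[s, o] = radial * np.exp(-d ** 2 / (2 * sigma_theta ** 2))
    return bank

def jet(image, bank):
    """Log-Gabor jet at the image center: magnitudes of all filter responses,
    organized as a (scale, orientation) array as in Figures 6c and 6d."""
    F = np.fft.fft2(image)
    resp = np.fft.ifft2(bank * F)      # broadcasts over (scale, orientation)
    c = image.shape[0] // 2
    return np.abs(resp[..., c, c])
```

A vertical line in the image excites the orientation-0 filters far more than nearly orthogonal ones, and a rotation of the image shifts the jet along the (periodic) orientation axis, which is the wrap-around property exploited in Figure 12.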
Varying the scale of the input image unfortunately did not lead to regular maps shifting systematically in the vertical feature dimension. This may be due to the fact that Gabor responses for different scales at a given orientation tend to be similar to each other (Gabor magnitudes are constant across scale for edges with a step-function profile).

5.1 Consistency of Feature Maps. As each control unit stands for definite values of the transformation parameters, scale and orientation, feature mappings under its purview must be consistent with those parameters.

Figure 11: Feature mappings and pattern formation in Gabor feature space. Evolution of the activity pattern in a column in O (first row) and the map from the corresponding column in I (second row) for the center fiber of the spatial mapping in Figure 7 under the influence of a single control unit. The Gabor responses are organized as a two-dimensional array with 15 orientations and 8 scales on the horizontal and vertical axes, respectively. Again, both the feature mapping and the columnar activity in O are initialized randomly. After 3000 iterations, the output pattern has converged to the input pattern (not shown), and the mapping is clearly defined.

Figure 12: Final states of feature-to-feature control units. Each map represents one transformation of the feature space encoded by the connections of one control unit. We use wraparound on the orientation (horizontal) axis, so that rotations in image space correspond to translations in feature space. No changes in the scale of the pattern were considered (s = 1.0). The top row shows the Gabor jets of the input with seven rotations from 0° to 60° in steps of 12°. The bottom row shows the corresponding maps, developed through the process illustrated in Figure 11.

This consistency is ensured in the following way. We assume the point maps have already been organized but the feature maps have not.
When an object is inspected and a pattern I is presented in I, some control unit will get the upper hand after a short interval, presumably one that scales and rotates the input pattern (if the input pattern has a clear orientation) into the standard size-and-orientation format of the output domain. The mapping activated by this control unit will project I into O. The feature units in the columns of I are in a definite state, as activated by the input image. The initially still disorganized connections to the corresponding columns in O generate a random distribution of activity over their feature units. If the coupling within the columns of O is weak enough to allow independent activity of their feature units, a dynamic instability will ensue, as described by equations 3.8 and 5.1. We assume, however, that the modification of Wικu does not continue to completion (establishment of a one-to-one feature map) but results in only some small increment. If, next, a transformed version of the input pattern is presented as part of the inspection of the object, the lingering activity pattern in O is now confronted with a transformed pattern in I, transformed in terms of scale or orientation as well as feature activities. The control unit responsible for mapping the new version of the input pattern to the unchanged output pattern will now be activated, and a given output column will find itself connected by another active link with a new input column. That input column refers to the same point on the surface of the object but carries transformed feature values. During an inspection sequence (the same object seen under different transformations), differently transformed feature distributions will thus be exposed to the same (though slowly changing) activity distribution in the same target column.
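The small-increment accumulation over inspection sequences can be sketched as follows. This is an illustrative Hebbian stand-in, not the Häussler-type dynamics of equation 5.1 (the control-unit gating S(u) and the competition term B are replaced here by simple divisive normalization), and the learning rate is an assumption:

```python
import numpy as np

def inspect_step(W, x_in, y_out, eta=0.01):
    """One small increment of the feature map W during an inspection step.
    x_in: activity of the feature units in the currently linked input column;
    y_out: (slowly changing) activity in the target output column."""
    W = W + eta * np.outer(y_out, x_in)      # strengthen co-active feature pairs
    W = W / W.sum(axis=1, keepdims=True)     # divisive competition over inputs
    return W
```

Repeatedly pairing the same output feature with the same input feature drives the corresponding row of W toward a one-to-one assignment, while rows not driven by the output activity remain untouched.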
After a sufficient number of such inspection sequences, the system will settle into a state in which all mappings lead to an invariant representation in O, invariant in terms of two-dimensional layout and feature activity. This process is reminiscent of Földiak's slowness idea (Földiak, 1991), although it differs from it conceptually, as remarked above. It is a logical requirement that the same object be shown to the system in different transformation states in sequence, so that the control units responsible for these transformations can associate the same output feature distribution with the corresponding transformations of the same texture in the input domain. There is, though, no logical requirement for these transformations to follow each other in a continuous and incremental sequence. As we argued for the point-mapping case, however, such a sequence greatly helps convergence of the system, given the preexcitation of neighboring control units (see equation 3.3).

6 Discussion and Conclusion

The purpose of this project was to set up specific connectivity patterns of control units, enabling them to perform tests for similarity between input patterns and memory patterns and to activate mappings that transform input patterns to invariant memory patterns. This functionality, and the interaction with a memory containing many patterns, has been realized on the basis of manual prewiring in Wolfrum et al. (2008). Although the required connectivity patterns of control units are intricate in detail, the process we propose for setting them up by self-construction in early ontogenesis is very simple. The basic format of this process is network self-organization: an initial, partially structured, partially random network generates activity patterns under the influence of spontaneous or sensory input, and the activity patterns act back on the network through a mechanism of synaptic plasticity.
This loop iterates many times until the network structure stabilizes. In distinction to most other studies of map formation by network self-organization, it is not the synaptic weights of the map that are subject to plasticity but the third-party connections of control units with the synapses of the projecting fibers. We argued in Bergmann and von der Malsburg (2011) that the development of control circuits for position and size invariance may happen prenatally on the basis of spontaneously created activity blobs of varying size and position in the input domain. We believe, however, that setting up circuits for orientation invariance and for the transformation of features requires actual visual input involving rotation and scaling of images; one can hardly imagine neural mechanisms able to spontaneously create training input with consistent transformations of point and feature patterns. Although control units and their role may appear somewhat exotic, they can be implemented by relatively ordinary neurons. As has already been proposed by many authors (e.g., Mel, 1994), nonlinearity of dendritic patches of target neurons may induce interactions between neighboring synapses. If the nonlinearity amounts to a threshold, the effectiveness of the signals in individual synapses may depend critically on the simultaneous presence of signals in neighboring synapses, resulting in the type of control we are invoking here. During the development of our system, we encountered difficulties arising from the size of the connectivity search space. Input and output domains are two-dimensional arrays of local texture spaces (modeled here by receptive fields of Gabor type), which themselves have two dimensions (scale and orientation). Thus, the two domains are four-dimensional entities, and the fibers between them are samples of an eight-dimensional space. This seems to pose a quantitative and a dynamical problem.
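The quantitative side can be illustrated with back-of-the-envelope counts of trial fibers per control unit. The domain sizes (32 × 32 points, 15 × 8 feature units per column, matching the simulations) are taken from the text; the counting itself is ours:

```python
# Trial-fiber counts for one control unit: full space vs. coarse-grained stage.
points = 32 * 32        # positions per domain (32 x 32 grid)
features = 15 * 8       # feature units per column (orientations x scales)

# Full feature-to-feature space: every feature unit in I to every one in O.
full = (points * features) ** 2

# Coarse stage: columns act as single units, leaving only point-to-point links.
coarse = points ** 2

# The coarse-to-fine strategy shrinks the search space by features**2 = 14400.
print(full, coarse, full // coarse)
```

Treating whole columns as units until the point mapping has contracted thus removes the feature indices from the search entirely, which is the reduction exploited below.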
The quantitative problem is that of realizing, in the computer or in the biological system, the number of trial fibers implied by a full sampling of the mapping space Wτuιρκ (realizing that each index stands for two dimensions). The dynamical problem consists of the difficulty of avoiding local optima during network contraction. Both problems are addressed here by a proper coarse-to-fine strategy. If, early in the game, units in the input and output domains are gathered by short-range connections into coarse-grained units of many neurons whose signals are tightly correlated, the search space is reduced in size accordingly. Moreover, even if the connectivity space is sampled randomly and sparsely in appropriate fashion, the coarse-grained units may still have all-to-all connections. Once the mapping has contracted sufficiently, lateral connections may be reduced in strength to open more degrees of freedom. We have made use of this strategy in two ways: by using broad coupling functions C in equations 3.7 and 5.2 initially, and by assuming that point mappings are formed first while whole feature columns initially act as units, opening feature degrees of freedom only after convergence of the point mappings. We have not made any attempt to model the actual three-dimensional geometry of the system, which would be a major undertaking. We realize that this keeps us from addressing a number of biologically interesting questions. As to genetic control of the process, it is limited to setting the initial stage: two two-dimensional domains I and O with local connectivity and initial connectivity between them, presumably of some stochastic structure; signal dynamics of appropriate type, modeled implicitly in our formulation; synaptic plasticity mechanisms; and a schedule for changing parameters. We are acting, in our formulation, as if the control units stood globally for whole mappings. This is unrealistic for a number of reasons.
If each control unit governed only a rather local “maplet,” a whole mapping being established by cooperative interaction between many control units forming a field, the potential would arise for deformable mappings to be realized, as required, for instance, by the deformation of object surfaces during rotation in depth. If there is enough redundancy in the number of control units and if the connections between control units are themselves plastic, a memory for deformation patterns could be installed. Control units could also realize a hierarchy of spatial and feature resolution levels as a basis for coarse-to-fine pattern memory search strategies. Here, the interaction among units of the control space was simplified to a one-dimensional circular chain. With this simplification, we lose the interaction between units with similar orientation but slightly different scaling parameters. As we employed a controlled sequence of smoothly changing input patterns (first all rotations for an input pattern, then changing its scale and rotating again, repeating until completion of the parameter set), this does not disturb the process of fiber organization (see Figure 10). As already argued, in a more realistic scenario the control space may itself be high-dimensional, in which case it would be advantageous to factorize it into independent subspaces for each transformation parameter (e.g., into separate sets of units controlling resolution levels, translation, scale, and orientation). In this case, a cascade of several stages built on the system proposed here could be employed, although it is not yet clear how such a hierarchical control space would interact during the organization process. In the biological case, at least from the point of view of the necessary machinery, this could be implemented by chaining up mappings through a number of domains (e.g., V1, V2, V4, IT) or by controlling individual projection synapses by several control unit sets.

6.1 Experimental Predictions. Like Anderson and Van Essen (1987), we are positing the existence of control units, neurons (or astrocytes? Möller, Lücke, Zhu, Faustmann, & von der Malsburg, 2007) whose activity correlates with the size of the postsynaptic effect of projection fibers and with transformation parameters (position, size, or orientation) of at least local patches of attentively inspected figures. We further predict that processes of control units are closely colocated with projecting fiber synapses on dendritic patches of target neurons. We surmise that the control of synaptic efficacy is due to concave nonlinearities of these dendritic patches. Although it is imaginable that the collection of signal similarity information (equation 3.2) and the delivery of the synaptic control, exerted by the factor S(u) in equation 3.9, are transported by separate control unit processes (dendrites and axons), the number of degrees of freedom to be managed during ontogenesis would be much smaller if both types of graded signals were conducted in opposite directions along the same processes, in which case one would speak of neurites. Once the field of connectomics succeeds in reconstructing the precise wiring and geometry of a block of cortical tissue, its most promising fruit could well be the documentation of the arrangement of neural and control unit processes that we are predicting here.

Acknowledgments

This work was supported by the EU project FP7-216593 SECO.

References

Ackley, D. H., Hinton, G. E., & Sejnowski, T. J. (1985). A learning algorithm for Boltzmann machines. Cognitive Science, 9, 147–169.

Anderson, C. H., & Van Essen, D. C. (1987). Shifter circuits: A computational strategy for dynamic aspects of visual processing. Proceedings of the National Academy of Sciences, 84(17), 6297–6301.

Arathorn, D. W. (2002). Map-seeking circuits in visual cognition: A computational mechanism for biological and machine vision. Stanford, CA: Stanford University Press.

Bergmann, U., & von der Malsburg, C. (2011). Self-organization of topographic bilinear networks for invariant recognition. Neural Computation, 23, 2770–2797.

Daugman, J. G. (1980). Two-dimensional spectral analysis of cortical receptive field profiles. Vision Research, 20, 847–856.

Field, D. J. (1987). Relations between the statistics of natural images and the response properties of cortical cells. Journal of the Optical Society of America A, 4(12), 2379–2394.

Földiak, P. (1991). Learning invariance from transformation sequences. Neural Computation, 3, 194–200.

Goodhill, G. J. (2007). Contributions of theoretical modeling to the understanding of neural map development. Neuron, 56, 301–311.

Häussler, A. F., & von der Malsburg, C. (1983). Development of retinotopic projections: An analytic treatment. Journal of Theoretical Neurobiology, 2, 47–73.

Hinton, G. E. (1981). A parallel computation that assigns canonical object-based frames of reference. In Proceedings of the International Joint Conference on Artificial Intelligence (pp. 683–685). Menlo Park, CA: American Association for Artificial Intelligence.

Jones, J. P., & Palmer, L. A. (1987). An evaluation of the two-dimensional Gabor filter model of simple receptive fields in cat striate cortex. Journal of Neurophysiology, 58(6), 1233–1258.

Kree, R., & Zippelius, A. (1988). Recognition of topological features of graphs and images in neural networks. Journal of Physics A, 21, 813–818.

Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). ImageNet classification with deep convolutional neural networks. In F. Pereira, C. J. C. Burges, L. Bottou, & K. Q. Weinberger (Eds.), Advances in neural information processing systems, 25. Red Hook, NY: Curran.

Lades, M., Vorbrüggen, J. C., Buhmann, J., Lange, J., von der Malsburg, C., Würtz, R. P., & Konen, W. (1993). Distortion invariant object recognition in the dynamic link architecture. IEEE Transactions on Computers, 42, 300–311.

Lücke, J. (2005). Information processing and learning in networks of cortical columns. Doctoral dissertation, Ruhr University, Bochum.

Mel, B. W. (1994). Information processing in dendritic trees. Neural Computation, 6, 1031–1085.

Möller, C., Lücke, J., Zhu, J.-M., Faustmann, P. M., & von der Malsburg, C. (2007). Glial cells for information routing? Cognitive Systems Research, 8, 28–35.

Olshausen, B. A., Anderson, C. H., & Van Essen, D. C. (1993). A neurobiological model of visual attention and invariant pattern recognition based on dynamic routing of information. Journal of Neuroscience, 13(11), 4700–4719.

Riesenhuber, M., & Poggio, T. (1999). Hierarchical models of object recognition in cortex. Nature Neuroscience, 2, 1019–1025.

Rolls, E. T. (1991). Neural organization of higher visual functions. Current Opinion in Neurobiology, 1, 274–278.

Rumelhart, D., & McClelland, J. L. (1987). Parallel distributed processing, vol. 1. Cambridge, MA: MIT Press.

Sato, Y. D., Jitsev, J., & von der Malsburg, C. (2009). A visual object recognition system invariant to scale and rotation. Neural Network World, 19, 529–544.

Sermanet, P., Eigen, D., Zhang, X., Mathieu, M., Fergus, R., & LeCun, Y. (2014). OverFeat: Integrated recognition, localization and detection using convolutional networks. In Proceedings of the International Conference on Learning Representations.

von der Malsburg, C. (1994). The correlation theory of brain function. In E. Domany, J. L. van Hemmen, & K. Schulten (Eds.), Models of neural networks II. Berlin: Springer. (Original date 1981)

von der Malsburg, C. (1995). Network self-organization in the ontogenesis of the mammalian visual system. In S. F. Zornetzer, J. Davis, & C. Lau (Eds.), An introduction to neural and electronic networks (2nd ed., pp. 464–467). Orlando, FL: Academic Press.

Willshaw, D. J., & von der Malsburg, C. (1976). How patterned neural connections can be set up by self-organization. Proceedings of the Royal Society of London B, 194, 431–445.

Willshaw, D. J., & von der Malsburg, C. (1979). A marker induction mechanism for the establishment of ordered neural mappings: Its application to the retinotectal problem. Philosophical Transactions of the Royal Society of London, Series B, Biological Sciences, 287, 203–243.

Wiskott, L., & von der Malsburg, C. (1996). Face recognition by dynamic link matching. In J. Sirosh, R. Miikkulainen, & Y. Choe (Eds.), Lateral interactions in the cortex: Structure and function. Austin, TX: UTCS Neural Networks Research Group. www.cs.utexas.edu/users/nn/web-pubs/htmlbook96/wiskott

Wolfrum, P., Wolff, C., Lücke, J., & von der Malsburg, C. (2008). A recurrent dynamic model for correspondence-based face recognition. Journal of Vision, 8, 1–18. http://journalofvision.org/8/7/34

Wolfrum, P., & von der Malsburg, C. (2007). What is the optimal architecture for visual information routing? Neural Computation, 19, 3293–3309.

Zhu, J., Bergmann, U., & von der Malsburg, C. (2010). Self-organization of steerable topographic mappings as basis for translation invariance. In Artificial Neural Networks – ICANN 2010 (Lecture Notes in Computer Science 6353, pp. 414–419). Berlin: Springer.

Received February 28, 2014; accepted December 1, 2014.