November 2000, Volume 3, Number Supp, pp. 1192–1198

Computational approaches to sensorimotor transformations

Alexandre Pouget1 & Lawrence H. Snyder2

1. Department of Brain and Cognitive Sciences, University of Rochester, Rochester, New York 14627, USA
2. Department of Anatomy and Neurobiology, Washington University, 660 S. Euclid Avenue, St. Louis, Missouri 63110, USA

Correspondence should be addressed to A. Pouget (alex@bcs.rochester.edu).

Behaviors such as sensing an object and then moving your eyes or your hand toward it require that sensory information be used to help generate a motor command, a process known as a sensorimotor transformation. Here we review models of sensorimotor transformations that use a flexible intermediate representation that relies on basis functions. The use of basis functions as an intermediate is borrowed from the theory of nonlinear function approximation. We show that this approach provides a unifying insight into the neural basis of three crucial aspects of sensorimotor transformations, namely, computation, learning and short-term memory. This mathematical formalism is consistent with the responses of cortical neurons and provides a fresh perspective on the issue of frames of reference in spatial representations.

The term 'sensorimotor transformation' refers to the process by which sensory stimuli are converted into motor commands. This process is crucial to any biological organism or artificial system that possesses the ability to react to the environment. Accordingly, this topic has attracted considerable attention in neuroscience as well as engineering over the last 30 years. A typical example of such a transformation is reaching with the hand toward a visual stimulus. In this case, as for most sensorimotor transformations, two issues must be resolved. First, one must determine the configuration of the arm that will bring the hand to the spatial location of the visual stimulus (kinematics). The second problem is specifying and controlling the application of force to determine the movement trajectory (dynamics)1, 2. This review focuses almost exclusively on kinematics (see Wolpert and Ghahramani, this issue, for models of movement dynamics).

Our goal is to provide an overview of the basis function approach to sensorimotor transformations. In this approach, sensory information is recoded into a flexible intermediate representation to facilitate the transformation into a motor command. This has the advantage of explaining how the same neurons can be engaged in three seemingly distinct aspects of sensorimotor transformations, namely, computation, learning and short-term memory. We first review the theory behind representing and transforming spatial information using basis functions. Next we describe how these transformations can be learned using biologically plausible algorithms. Finally, we explain how to implement short-term spatial memory and updating of motor plans using these representations. In each case, we examine the extent to which models relying on basis functions are consistent with known neurobiology. Remarkably, all three tasks—computation, learning and short-term memory of spatial representations—can be efficiently handled using a neural architecture derived from the basis function approach. As we will see, a basis function representation is a form of population code, and we argue that the exceptional computational versatility of basis functions may explain why population codes are so ubiquitous in the brain.
Basis functions for sensorimotor transformations

Sensorimotor transformations are often formalized in terms of coordinate transformations. For instance, to reach for an object currently in view, the brain must compute the changes in joint angles of the arm that will bring the hand to the desired spatial location. This computation requires combining visual information—the retinal or eye-centered coordinates of the object—with signals related to the posture of body parts, such as the position of the eyes in the head (eye position), the position of the head with respect to the trunk (head position) and the starting position of the arm. We refer to such positional signals as 'posture signals'. In this way, we can recast a coordinate transformation as the computation of the value of a particular function. This function takes visual and postural signals as input and produces as output the set of changes in joint angles required to solve the task (for example, bring the hand to the target). Recasting the coordinate transformation as computing the value of a function makes it easier to generate and test biologically inspired models of this aspect of brain function.

We will adopt a vectorial notation for the signals being transformed. V is a vector that encodes an object's location in eye-centered space. It has three components, which correspond, respectively, to the image's azimuth and elevation on the retina and the object's distance from the retina. This is clearly not the form of the representation used by the brain, but the vector format is not important at this stage (see below). We use similar notations, P and J, for the posture signals and the change in joint coordinates. A coordinate transform can then be written as a function, f(), mapping V and P onto J: J = f(V, P).

It is useful to divide all functions into two classes, linear and nonlinear (Fig. 1). Sensorimotor transformations almost exclusively belong to the second class. The nonlinearity arises from the geometry of our joints. The change in spatial location of the hand that results from bending the elbow depends not only on the amplitude of the elbow movement, but also on the state of the shoulder joint. As a result, a neural network implementation of a sensorimotor transformation requires at least three layers. There must be at least one intermediate layer (the so-called 'hidden layer') to recode the sensory inputs before they can be transformed into motor commands (Box 1; Fig. 1d).

One of the challenges in computational neuroscience has been to identify intermediate representations that are both biologically plausible and computationally efficient for these nonlinear mappings. One solution involves using intermediate units that compute basis functions3-5, because most functions of interest can be approximated using a linear combination of basis functions. The best known basis set is that used by the Fourier transform: any function can be expressed as the linear sum of a series of cosine and sine functions of arbitrary amplitudes and frequencies. Many other functions can be used to form basis sets (Box 1; Fig. 1d).
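To make this concrete, the sketch below (illustrative Python, not part of the original article) fits the circular ridge of Fig. 1c with a grid of Gaussian basis functions, as in the network of Fig. 1d. The grid size, Gaussian width and number of samples are arbitrary choices; the point is only that once the basis activities are computed, finding the output weights is a purely linear problem.

```python
import numpy as np

# Illustrative sketch: approximate the nonlinear "circular ridge" of Fig. 1c,
# z = exp(-(x^2 + y^2 - 100)^2 / 1000), with a grid of Gaussian basis
# functions, as in the network of Fig. 1d. Grid size, Gaussian width and
# sample counts are arbitrary choices.

def target(x, y):
    return np.exp(-(x**2 + y**2 - 100.0)**2 / 1000.0)

# Basis function layer: Gaussians tiling the x-y plane
centers = np.array([(cx, cy) for cx in np.linspace(-15, 15, 21)
                             for cy in np.linspace(-15, 15, 21)])
sigma = 1.5

def basis(xy):
    # xy: (n_points, 2) -> (n_points, n_basis) basis activities
    d2 = ((xy[:, None, :] - centers[None, :, :])**2).sum(-1)
    return np.exp(-d2 / (2 * sigma**2))

# Sample the target function at random points
rng = np.random.default_rng(0)
xy = rng.uniform(-15, 15, size=(3000, 2))
z = target(xy[:, 0], xy[:, 1])

# Once the basis activities are computed, the output is a *linear* function
# of them, so the weights can be found by ordinary least squares.
B = basis(xy)
w, *_ = np.linalg.lstsq(B, z, rcond=None)

# Rough check of the approximation on held-out points
xy_test = rng.uniform(-15, 15, size=(500, 2))
err = np.abs(basis(xy_test) @ w - target(xy_test[:, 0], xy_test[:, 1]))
print("mean absolute error:", err.mean())
```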
When applied to sensorimotor transformations, and in particular to the example of reaching toward a visual target, the idea is that the reaching motor command J can be obtained by taking a weighted sum of N basis functions {B_i(V, P)}, i = 1, ..., N, of the visual and posture signals, V and P:

J = Σ_{i=1}^{N} w_i B_i(V, P)    (Eq. 1)

The set of weights, {w_i}, i = 1, ..., N, is specific to the reaching motor command being computed and, as we will see later, can be determined using simple learning rules. Many choices are available for the basis functions. For instance, one can use the set of all Gaussian functions of V and P, which are a subset of a larger family known as radial basis functions (RBF)3. The network in Fig. 1d is an example of a radial basis function network in which the variables considered are x and y instead of V and P. This type of representation is also sometimes called a population code, that is, a code in which the variables are encoded through the activity of a large population of neurons with overlapping bell-shaped tuning curves. Population codes may be ubiquitous in the nervous system because they provide basis sets.

An alternative choice for a basis set is the product of a Gaussian function of the eye-centered position of an object (V) and a sigmoid function of eye position (P). In an idealized neuron that performs this calculation (Fig. 2a), the receptive field is eye-centered, that is, it remains at the same location relative to the fovea regardless of eye position (Fig. 2b). However, the response gain (that is, its amplitude) changes with eye position.

From a biological point of view, one problem with Eq. 1 is the format of the input and output vectors. For instance, we used polar coordinates for vector V, yet no such vector has been explicitly identified in the cortex. Instead, the visual position of objects is encoded by the activity of a large number of binocular neurons forming the retinotopic maps in the early visual areas. This does not mean that we cannot use the basis function framework. We simply replace the vector V with a new vector, V_A, which has as many components as there are neurons in the retinotopic map; each component corresponds to the activity (for example, firing rate) of one neuron. Likewise, the vectors P and J can be replaced by the corresponding neuronal patterns of activities P_A and J_A (Fig. 2c). Many network models of sensorimotor transformations rely on such basis function representations in their intermediate layer6-9.

Biological plausibility of the basis function approach

The basis function approach requires that the tuning curves of neurons in intermediate stages of computation provide a basis function set. A set of functions constitutes a basis set if certain requirements are met. First, the functions must combine their inputs, for example, the visual input, V, and the posture signals, P, nonlinearly, so that they cannot be decomposed into separate functions of V and functions of P. This rules out linear functions and functions of the type B_i(V, P) = C_i(V) + D_i(P). Furthermore, the functions must be able to fully cover the range of possible input values5. In other words, there must be units with all possible combinations of selectivity for visual and posture signals. Neurons whose response can be described by a Gaussian function of retinal location multiplied by a sigmoidal function of eye position would qualify (Fig. 2a and b)5. Many such gain-modulated neurons are found in the parietal lobe, where there are neurons with all possible combinations of visual and eye position selectivities10.
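The sketch below (illustrative Python, not from the article) implements one such gain-modulated unit, a Gaussian of retinal location multiplied by a sigmoid of eye position as in Fig. 2a, and shows how Eq. 1 reads out a motor quantity as a weighted sum over a small population of such units. All tuning parameters and weights are hypothetical values chosen for the example.

```python
import numpy as np

# Illustrative sketch of a gain-modulated basis function unit as in Fig. 2a:
# a Gaussian of retinal location multiplied by a sigmoid of eye position.
# The preferred location, width, threshold and slope are hypothetical.

def basis_unit(retinal_pos, eye_pos,
               pref_retinal=0.0, sigma=10.0,    # Gaussian part (deg)
               eye_threshold=0.0, slope=0.1):   # sigmoid part
    visual = np.exp(-(retinal_pos - pref_retinal)**2 / (2 * sigma**2))
    gain = 1.0 / (1.0 + np.exp(-slope * (eye_pos - eye_threshold)))
    return visual * gain    # eye-centered receptive field, eye-position gain

# Eq. 1: a motor quantity is read out as a weighted sum of many such basis
# functions. Here a small population tiles retinal space and eye position,
# and the weights w_i are arbitrary for the sake of the example.
prefs_retinal = np.linspace(-40, 40, 9)
prefs_eye = np.linspace(-20, 20, 5)
weights = np.random.default_rng(1).normal(size=(9, 5))   # the w_i of Eq. 1

def motor_output(retinal_pos, eye_pos):
    acts = np.array([[basis_unit(retinal_pos, eye_pos, pr, 10.0, pe, 0.1)
                      for pe in prefs_eye] for pr in prefs_retinal])
    return np.sum(weights * acts)

print(motor_output(retinal_pos=15.0, eye_pos=-10.0))
```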
Gain modulations between sensory and posture signals are also observed in occipital11-15 and premotor cortices16, suggesting that basis function representations may be widely used. In many early papers, gain modulation by posture signals was reported to be linear, not sigmoidal. This is clearly incompatible with the basis function hypothesis, as basis functions require nonlinear tuning curves. These experiments, however, were designed to detect an effect, not to distinguish the precise form of the gain field. A linear model of gain fields was simple and lent itself most easily to statistical testing. However, recent experiments17 and new analyses5 reveal significant nonlinearities consistent with sigmoidal modulation. This conclusion is based on data from the parietal cortex, but given the similarities among gain fields throughout the cortex, it is reasonable to think that it applies to most gain fields.

Another line of evidence in support of the basis function approach comes from the study of hemineglect patients with right parietal cortex lesions. These patients tend to ignore sensory stimuli located on their left18. 'Left', however, can be defined with respect to multiple frames of reference; it could be the left side with respect to the eyes (that is, the left visual field), head or body. For example, consider a subject who turns his head to the left of a stimulus that lies directly in front of him, but then moves his eyes far to the right. The stimulus will now lie on the midline with respect to the body, to the right with respect to the head, and to the left with respect to the eyes. By assessing neglect using a variety of body postures and stimulus locations, one can attempt to determine the relevant frame of reference for a given patient's neglect. Interestingly, such experiments show that neglect often affects multiple frames of reference (for review, see ref. 19). This observation fits well with one property of basis function representations, namely, that they encode location in multiple frames of reference simultaneously. For instance, a basis function representation integrating a visual input with eye position signals (Fig. 2c) represents the location of objects in eye- and head-centered frames of reference simultaneously5. Indeed, to recover the position of an object in, say, head-centered coordinates, one must compute a function of the eye-centered position of the object as well as the current eye and head positions. As for any other function, this can be done with a simple linear transformation of the activity of the basis function units. As a result, a simulated lesion of a basis function representation can explain why hemineglect affects several frames of reference across a variety of tasks19.

The multiplicity of frames of reference is one of the most distinguishing properties of basis function representations. In more traditional approaches to spatial representations, object position is represented in maps using one particular frame of reference. Multiple frames of reference require multiple maps, and a neuron can only contribute to one frame of reference, specific to its map. By contrast, in a basis function map, each neuron contributes to multiple frames of reference. Thus, basis function neurons are ideally placed to coordinate different behaviors, such as moving the eyes and hand to the same object, even though these movements must be programmed in distinct coordinates.
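A simple way to see the 'multiple frames at once' property is to decode two different quantities from the same basis function population by two different linear readouts. The sketch below (illustrative Python, not from the article) does this in one dimension, where head-centered position is approximately retinal position plus eye position (R + E); the population sizes and tuning parameters are assumptions, and the readout weights are found here by least squares rather than by a biological learning rule.

```python
import numpy as np

# Illustrative sketch: two linear readouts from the *same* gain-field basis
# population recover positions in two frames of reference at once.
# In 1D, head-centered position is approximately R + E.

rng = np.random.default_rng(2)
prefs_r = np.linspace(-40, 40, 15)    # Gaussian preferred retinal locations (deg)
prefs_e = np.linspace(-20, 20, 7)     # sigmoid eye-position thresholds (deg)

def basis_layer(R, E, sigma=8.0, slope=0.15):
    """Activities of the full Gaussian-times-sigmoid basis set."""
    g = np.exp(-(R[:, None] - prefs_r[None, :])**2 / (2 * sigma**2))
    s = 1.0 / (1.0 + np.exp(-slope * (E[:, None] - prefs_e[None, :])))
    return (g[:, :, None] * s[:, None, :]).reshape(len(R), -1)

# Random retinal locations and eye positions
R = rng.uniform(-30, 30, 3000)
E = rng.uniform(-15, 15, 3000)
B = basis_layer(R, E)

# Two different linear readouts of the same basis activities
w_eye, *_ = np.linalg.lstsq(B, R, rcond=None)         # eye-centered readout
w_head, *_ = np.linalg.lstsq(B, R + E, rcond=None)    # head-centered readout

print("eye-centered readout error (deg):", np.abs(B @ w_eye - R).mean())
print("head-centered readout error (deg):", np.abs(B @ w_head - (R + E)).mean())
```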
Learning sensorimotor transformations

A few sensorimotor transformations (such as the eyeblink reflex) may already be wired at birth and require little training. In most cases, however, the mapping from sensory to motor coordinates must be learned and updated through life, as eyes, arms and other body parts change in size and weight. As before, we focus exclusively on the issue of coordinate transformations. How do we learn and maintain a mapping of sensory coordinates of objects into motor coordinates?

Piaget20 proposed that babies learn by associating spontaneous motor commands with the sensory consequences of those spontaneous actions. Consider how this would apply to a two-layer network used to control arm movements. We assume that the input layer encodes the visual location of the hand, whereas the output layer represents reaching motor commands in joint-centered coordinates. (Motor commands actually require changes in joint angles, but for simplicity we will consider absolute angles.) On each trial, the network generates a spontaneous pattern of activity in the motor layer. This pattern is fed to the arm, which moves accordingly, and the network receives visual feedback of the resulting hand position. At this point, the system can learn to associate the patterns in the sensory and output layers. In particular, the Hebb rule can be used to increase the weights between co-active sensory and motor units21.

A further refinement is to treat the position of the hand after the spontaneous movement as a target to be reached. The idea is to first compute the motor command that the network would have generated if it had aimed for that location from the start. We call this the 'predicted' motor command. (Note that this movement is not actually executed.) We can then compare this predicted command to the original spontaneous one. Because the spontaneous command is precisely the command that brings the hand to the current location, we should adjust the network weights to make the predicted motor command closer to the spontaneous one. To compute the predicted motor command, we use the visually determined hand location after the spontaneous movement as a network input, and use the current weights to compute the activity of the motor units. If the predicted and spontaneous commands are the same, no learning is required. If they differ, then the difference between the spontaneous and predicted motor commands can be used as an error signal to adjust the weights (Fig. 3a). For instance, one could use a learning rule known as the delta rule22, which takes the form

Δw_ij = ε a_i (a_j* − a_j)

where Δw_ij is the change in the weight between the presynaptic sensory unit i and the postsynaptic motor unit j, ε is a learning rate, a_i is the activity of the presynaptic unit, a_j* is the spontaneous postsynaptic motor activity, and a_j is the predicted postsynaptic motor activity.

This strategy works well if the sensorimotor transformation is linear—that is, if it can be implemented in a two-layer network (Box 1)—such as learning to make an eye movement to a visual target. Indeed, the retinal location of a target and the saccade vector required to acquire that target are identical. The transformation from a sensory map (for example, V1) to a motor map (superior colliculus) is therefore an identity mapping, which is a linear transformation21. Unfortunately, however, most sensorimotor transformations are nonlinear, and the networks that compute them require at least one intermediate layer.
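As an illustration of this scheme, the following sketch (hypothetical Python, not from the article) learns a linear sensory-to-motor mapping from spontaneous commands using the delta rule written above. An arbitrary 2 × 2 matrix stands in for the plant (the arm), and the dimensions and learning rate are assumptions made for the example.

```python
import numpy as np

# Illustrative sketch: learning a *linear* sensory-to-motor mapping from
# spontaneous movements with the delta rule. An arbitrary 2x2 matrix stands
# in for the plant (the arm); dimensions and learning rate are assumptions.

rng = np.random.default_rng(3)
plant = np.array([[1.0, 0.2],
                  [0.1, 0.9]])       # motor command -> sensory feedback
W = np.zeros((2, 2))                 # sensory -> motor weights (to be learned)
eps = 0.05                           # learning rate

for trial in range(5000):
    a_star = rng.normal(size=2)      # spontaneous motor command (a_j*)
    sensory = plant @ a_star         # observed sensory consequence (a_i)
    a_pred = W @ sensory             # predicted motor command (a_j)
    # Delta rule: weight change = rate * presynaptic * (spontaneous - predicted)
    W += eps * np.outer(a_star - a_pred, sensory)

# After training, the learned mapping approximately inverts the plant
print(np.round(W @ plant, 2))        # close to the identity matrix
```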
We showed above that a good choice for the intermediate representation is to use basis functions. This turns out to be a good choice for learning as well. Indeed, with basis functions, we can decompose learning into two independent stages: first, learning the basis functions and, second, learning the transformation from basis functions to motor commands (Fig. 3b). The basis functions can be learned via a purely unsupervised learning rule. In other words, they can be learned without regard to the motor commands being computed—before the baby even starts to move his arm. Indeed, because by definition any basis function set can be used to construct any motor command (Eq. 1), the choice of a basis set is independent of the motor commands to be learned. The choice is constrained instead by general considerations about the computational properties of the basis functions, such as their robustness to noise or their efficiency during learning3, as well as considerations about biological plausibility. Gaussian and sigmoid functions are often a good choice in both respects. It is also crucial that the basis functions tile the entire range of input values encountered, that is, they must form a map in the input space considered, like the one shown in Fig. 2c. This can be done by using variations of the Hebb and delta learning rules with an additional term to enforce global competition, to ensure that each neuron learns distinct basis functions23-25. The second problem—learning the transformation from basis functions to motor commands—is easy because motor commands are linear combinations of basis functions (Fig. 2c). We only need to learn the linear weights in Eq. 1, which can be done as outlined above with a simple delta rule7, 8 (Fig. 3b).

However, even a three-layer network trained with a combination of delta rules and unsupervised learning does not always suffice. If a nonlinear sensorimotor transformation is a one-to-many mapping, then a network trained with delta rules will converge on the average of all possible solutions, which is not necessarily itself a solution. Consider a planar two-joint arm whose segments are of equal length (Fig. 4a). There are exactly two configurations that will reach a target lying less than two segment lengths from the origin. A network trained using the delta rule would converge on the average of these two configurations, which unfortunately would overshoot the target. A similar problem would arise in the primate arm.

Jordan and Rumelhart26 proposed a solution to this problem. They use two networks: one to compute the required movement, and one to map that movement into a prediction of the resulting visual location of the hand (Fig. 4b). In engineering terms, these two networks constitute an inverse and a forward model, respectively. The forward model can be thought of as an internal model of the arm; it attempts to predict the perceptual consequence of one's action (see Wolpert and Ghahramani, this issue, for experimental evidence supporting forward models in the brain). One role of the forward model is to ensure that the system does not average across commands leading to the same sensory outcome26. Each model requires three or more layers, as both the forward and inverse models are typically nonlinear. In principle, one can use basis function networks and the learning strategy described above. Jordan and Rumelhart instead used the backpropagation algorithm27, which allows efficient propagation of the error signal through the network layers but is not believed to be biologically plausible.
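The sketch below (illustrative Python, not from the article) demonstrates the two-stage strategy described above on a toy nonlinear mapping, rather than the arm model of Fig. 4: Gaussian basis centers are first tiled over the input space with a simple competitive, unsupervised update, and the basis-to-output weights are then trained with the delta rule. The target function, unit counts and learning rates are all arbitrary choices.

```python
import numpy as np

# Illustrative sketch of two-stage learning on a toy nonlinear mapping:
# (1) tile the input space with Gaussian basis functions using a simple
# competitive, unsupervised update of their centers;
# (2) train the basis-to-output weights with the delta rule.

rng = np.random.default_rng(4)
n_basis, sigma = 30, 0.5
lr_centers, lr_out = 0.05, 0.1
centers = rng.uniform(-2, 2, size=(n_basis, 2))   # (V, P) preferences
w_out = np.zeros(n_basis)

def basis(x):
    return np.exp(-((x - centers)**2).sum(1) / (2 * sigma**2))

def target(x):                        # a stand-in nonlinear "motor command"
    return np.sin(x[0]) * np.cos(x[1])

# Stage 1: unsupervised tiling of the input space (ignores the motor command)
for _ in range(5000):
    x = rng.uniform(-2, 2, 2)
    winner = np.argmin(((x - centers)**2).sum(1))   # competition
    centers[winner] += lr_centers * (x - centers[winner])

# Stage 2: delta rule on the now-linear basis-to-output mapping
for _ in range(20000):
    x = rng.uniform(-2, 2, 2)
    b = basis(x)
    w_out += lr_out * (target(x) - w_out @ b) * b

x_test = np.array([0.5, -1.0])
print("network output:", w_out @ basis(x_test), " target:", target(x_test))
```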
Thanks to computational modeling, we now have a better idea of how Piaget's proposal for motor learning could be implemented in biologically plausible neural networks using simple learning rules. The basis function hypothesis, in particular, is well suited to the learning of complex nonlinear sensorimotor transformations. Recent work on the parietal cortex has suggested how these models can be tested28, 29.

Short-term memory and updating motor plans

As shown above, representations using basis functions are useful for computing and learning sensorimotor transformations because they provide a basis set. It turns out that these representations are also efficient for short-term or working memory, the ability to remember a location for a few hundred milliseconds to a few seconds. Short-term memory is closely related to the problem of updating motor plans.

Consider a retinotopic map, that is, a population of neurons with Gaussian tuning curves for the eye-centered location of an object. In such a map, the appearance of an object triggers a 'hill' of activity centered at the location of the stimulus (Fig. 5a). This hill can serve as a memory of the object if it can be maintained over time after the object disappears30. Much data supports the notion that this is indeed how short-term memory is manifested in the cortex30-34. (Note that the retinotopic map need not be arranged topographically on the cortical surface. For example, area LIP seems to contain such a map, inasmuch as the response fields of the neurons completely tile visual sensory space, even though neurons coding adjacent locations do not necessarily lie next to one another.)

To maintain a hill of activity in a network of units with Gaussian tuning curves, one needs to add lateral connections with specific weights. Because Gaussian tuning curves are basis functions, it is particularly easy to find the appropriate values analytically or to learn them with biologically plausible learning procedures such as the delta rule35. A typical solution is to use weights that generate local excitation and long-range inhibition. Such connections seem to exist in the cortex and are extensively used in computational modeling of cortical circuits (for example, ref. 36). The resulting networks are simply basis function maps to which lateral connections have been added. Computation of motor commands and maintenance of spatial short-term memory can therefore be done by the same basis function neurons. This could explain why many neurons showing gain modulation in the parietal cortex also show memory activity (L.H.S., unpublished data). The addition of lateral connections does not alter in any way the computational properties of the basis function map; indeed, lateral connections can contribute to computation of the basis functions themselves37 or serve as optimal nonlinear noise filters38.
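A minimal sketch of such a memory network is given below (illustrative Python, not from the article): a ring of units with Gaussian tuning, Gaussian lateral excitation and global divisive inhibition, one convenient way to realize local excitation and long-range inhibition in the spirit of refs. 35 and 38. A brief stimulus creates a hill that the recurrent connections then sustain on their own. All parameters are illustrative, not fitted to data.

```python
import numpy as np

# Illustrative sketch: a ring of units with Gaussian lateral excitation and
# global divisive inhibition maintains a self-sustaining hill of activity
# after a transient stimulus disappears.

n = 100
theta = np.linspace(-180, 180, n, endpoint=False)   # preferred locations (deg)
d = np.abs(theta[:, None] - theta[None, :])
d = np.minimum(d, 360 - d)                           # circular distance
W = np.exp(-d**2 / (2 * 20.0**2))                    # local excitatory weights

S, mu = 10.0, 0.01                                   # divisive inhibition parameters

def step(r):
    u = W @ r
    return u**2 / (S + mu * np.sum(u**2))            # squaring plus global normalization

# Transient stimulus: a hill of activity centered at +15 deg
r = np.exp(-np.minimum(np.abs(theta - 15.0), 360 - np.abs(theta - 15.0))**2 / (2 * 20.0**2))

for _ in range(100):        # the stimulus is gone; recurrence maintains the hill
    r = step(r)

print("remembered location:", theta[np.argmax(r)], "deg")   # stays near +15
```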
One requirement for a memory system in retinotopic coordinates is that activity be updated to take into account each eye or head movement. Consider a memory saccade task in which the target appears 15° to the right of the fixation point and is then extinguished. If the eyes are then moved 10° to the right before acquiring the target, the internal representation of the target location must be updated because the target itself is no longer visible. In this particular case, the invisible target would now lie 5° to the right of the fixation point. More generally, if R is the object's retinal location and the eyes move by E, the new retinal location is R − E. Neurophysiological evidence indicates that such updating does occur in the brain, not only for eye-centered memory maps in response to eye movements32, 39, 40 but also for head-centered maps in response to head movements41.

This updating mechanism is closely related to the problem of updating motor plans. Indeed, the eye movement required to foveate an object is simply equal to the retinal location of the object, from which it follows that remembering a retinal location is mathematically equivalent to remembering an eye movement. Therefore, the updating mechanism we have just considered can also be interpreted as updating a motor plan for an eye movement. Interestingly, updated memory activity is often observed in neurons that also show presaccadic activity40, 42.

If hills of activity are used for short-term memory, updating in this example would require moving the hill from the position corresponding to 15° right to the position corresponding to 5° right (Fig. 5b). As long as an eye-velocity or eye-displacement signal is available to the network, this updating mechanism is very easy to implement with the networks above. For instance, the lateral connections can be modified such that the network moves the hills with a velocity proportional to the negative of the eye velocity35. Other ideas have been explored as well, but available experimental data do not constrain which of these variations are used in the brain43-45. In particular, different schemes may be used in areas with and without topographic maps.
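The computation required by this updating, shifting the remembered pattern by minus the eye displacement, is sketched below (illustrative Python, not from the article). A full model would move the hill continuously through asymmetric lateral interactions driven by an eye velocity signal (ref. 35); here the population pattern is simply shifted directly, which captures only the end result.

```python
import numpy as np

# Illustrative sketch: update an eye-centered memory after an eye movement
# by shifting the stored pattern by minus the eye displacement (R -> R - E).

n = 72
theta = np.linspace(-180, 180, n, endpoint=False)    # preferred locations (deg)
spacing = 360.0 / n

def hill(center, sigma=20.0):
    d = np.minimum(np.abs(theta - center), 360 - np.abs(theta - center))
    return np.exp(-d**2 / (2 * sigma**2))

def decode(r):
    return theta[np.argmax(r)]

r = hill(15.0)                                # target remembered at 15 deg right
E = 10.0                                      # eyes move 10 deg to the right
r = np.roll(r, int(round(-E / spacing)))      # shift the memory pattern by -E

print("updated memory:", decode(r), "deg")    # 5 deg right (15 - 10)
```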
Discussion

Our understanding of the neural basis of sensorimotor transformations has made outstanding progress over the last 20 years, in part because of new experimental data but also thanks to the development of the theoretical ideas we have reviewed in this paper. This is a prime example in which theoretical and experimental approaches have been successfully integrated in neuroscience. Perhaps the most remarkable observation that has come out of this research is that seemingly distinct problems, such as computing, learning and remembering sensorimotor transformations, can be handled by the same neural architecture. As we have seen, networks of units with bell-shaped or sigmoidal tuning curves for sensory and posture signals are perfectly suited to all three tasks. The key property is that units with bell-shaped tuning curves provide basis functions, which, when combined linearly, make it easy to compute and learn nonlinear mappings. The simple addition of appropriate lateral connections then adds information storage (memory) plus the ability to update stored information after each change in posture.

Clearly, basis function networks are only the beginning of the story. Unresolved issues abound, starting with the problem that basis function representations are subject to combinatorial explosion, that is, the number of neurons required increases exponentially with the number of signals being integrated. Hence a basis function map using 10 neurons per signal and integrating 12 signals would require 10^12 neurons, more than the total number of neurons available in the cortex. One solution is to use multiple modules of basis functions. For instance, one could use two maps, connected in a hierarchical fashion, in which the first map integrates 6 signals and the second map the remaining 6 signals, for a total of 2 × 10^6 neurons, which is close to, if not less than, the total number of neurons available in a single cortical area. It is clear that the brain indeed uses multiple cortical modules for sensorimotor transformations, and it will be interesting to identify the computational principles underlying this modular architecture46, 47.

Another problem will be to understand how these circuits handle neuronal noise. Neurons are known to be noisy. This implies that we must not only worry about how they compute but also how they do so efficiently or reliably in the presence of noise38. These questions promise to provide interesting issues for computational neuroscience to address for years to come.

Received 30 May 2000; Accepted 4 October 2000.

REFERENCES

1. Tresch, M., Saltiel, P. & Bizzi, E. The construction of movement by the spinal cord. Nat. Neurosci. 2, 162–167 (1999).
2. Todorov, E. Direct cortical control of muscle activation in voluntary arm movements: a model. Nat. Neurosci. 3, 391–398 (2000).
3. Poggio, T. A theory of how the brain might work. Cold Spring Harbor Symp. Quant. Biol. 55, 899–910 (1990).
4. Pouget, A. & Sejnowski, T. A neural model of the cortical representation of egocentric distance. Cereb. Cortex 4, 314–329 (1994).
5. Pouget, A. & Sejnowski, T. Spatial transformations in the parietal cortex using basis functions. J. Cogn. Neurosci. 9, 222–237 (1997).
6. Groh, J. & Sparks, D. Two models for transforming auditory signals from head-centered to eye-centered coordinates. Biol. Cybern. 67, 291–302 (1992).
7. Burnod, Y. et al. Visuomotor transformations underlying arm movements toward visual targets: a neural network model of cerebral cortical operations. J. Neurosci. 12, 1435–1453 (1992).
8. Salinas, E. & Abbott, L. Transfer of coded information from sensory to motor networks. J. Neurosci. 15, 6461–6474 (1995).
9. Zipser, D. & Andersen, R. A back-propagation programmed network that simulates response properties of a subset of posterior parietal neurons. Nature 331, 679–684 (1988).
10. Andersen, R., Essick, G. & Siegel, R. Encoding of spatial location by posterior parietal neurons. Science 230, 456–458 (1985).
11. Trotter, Y., Celebrini, S., Stricanne, B., Thorpe, S. & Imbert, M. Neural processing of stereopsis as a function of viewing distance in primate visual area V1. J. Neurophysiol. 76, 2872–2885 (1997).
12. Trotter, Y. & Celebrini, S. Gaze direction controls response gain in primary visual-cortex neurons. Nature 398, 239–242 (1999).
13. Galletti, C. & Battaglini, P. Gaze-dependent visual neurons in area V3a of monkey prestriate cortex. J. Neurosci. 9, 1112–1125 (1989).
14. Bremmer, F., Ilg, U., Thiele, A., Distler, C. & Hoffman, K. Eye position effects in monkey cortex. I. Visual and pursuit-related activity in extrastriate areas MT and MST. J. Neurophysiol. 77, 944–961 (1997).
15. Cumming, B. & Parker, A. Binocular neurons in V1 of awake monkeys are selective for absolute, not relative, disparity. J. Neurosci. 19, 5602–5618 (1999).
16. Boussaoud, D., Barth, T. & Wise, S. Effects of gaze on apparent visual responses of frontal cortex neurons. Exp. Brain Res. 93, 423–434 (1993).
17. Squatrito, S. & Maioli, M. Gaze field properties of eye position neurones in areas MST and 7a of macaque monkey. Vis. Neurosci. 13, 385–398 (1996).
18. Vallar, G. Spatial hemineglect in humans. Trends Cogn. Sci. 2, 87–97 (1998).
19. Pouget, A., Deneve, S. & Sejnowski, T. Frames of reference in hemineglect: a computational approach. Prog. Brain Res. 121, 81–97 (1999).
20. Piaget, J. The Origins of Intelligence in Children (The Norton Library, New York, 1952).
21. Kuperstein, M. Neural model of adaptive hand-eye coordination for single postures. Science 239, 1308–1311 (1988).
22. Widrow, B. & Hoff, M. E. in Conference Proceedings of WESCON, 96–104 (1960).
23. Moody, J. & Darken, C. Fast learning in networks of locally-tuned processing units. Neural Comput. 1, 281–294 (1989).
24. Hinton, G. & Brown, A. in Neural Information Processing Systems Vol. 12, 122–128 (MIT Press, Cambridge, Massachusetts, 2000).
25. Olshausen, B. A. & Field, D. J. Sparse coding with an overcomplete basis set: a strategy employed by V1? Vision Res. 37, 3311–3325 (1997).
26. Jordan, M. & Rumelhart, D. Forward models: supervised learning with a distal teacher. Cognit. Sci. 16, 307–354 (1990).
27. Rumelhart, D., Hinton, G. & Williams, R. in Parallel Distributed Processing (eds. Rumelhart, D., McClelland, J. & the PDP Research Group) 318–362 (MIT Press, Cambridge, Massachusetts, 1986).
28. Desmurget, M. et al. Role of the posterior parietal cortex in updating reaching movements to a visual target. Nat. Neurosci. 2, 563–567 (1999).
29. Wolpert, D. M., Goodbody, S. J. & Husain, M. Maintaining internal representations: the role of the human superior parietal lobe. Nat. Neurosci. 1, 529–533 (1998).
30. Amit, D. The Hebbian paradigm reintegrated — local reverberations as internal representations. Behav. Brain Sci. 18, 617–626 (1995).
31. Fuster, J. Memory in the Cerebral Cortex: An Empirical Approach to Neural Networks in the Human and Nonhuman Primate (MIT Press, Cambridge, Massachusetts, 1995).
32. Goldberg, M. & Bruce, C. Primate frontal eye fields. III. Maintenance of a spatially accurate saccade signal. J. Neurophysiol. 64, 489–508 (1990).
33. Gnadt, J. & Mays, L. Neurons in monkey parietal area LIP are tuned for eye-movement parameters in three-dimensional space. J. Neurophysiol. 73, 280–297 (1995).
34. Funahashi, S., Bruce, C. & Goldman-Rakic, P. Dorsolateral prefrontal lesions and oculomotor delayed-response performance: evidence for mnemonic "scotomas". J. Neurosci. 13, 1479–1497 (1993).
35. Zhang, K. Representation of spatial orientation by the intrinsic dynamics of the head-direction cell ensemble: a theory. J. Neurosci. 16, 2112–2126 (1996).
36. Somers, D. C., Nelson, S. B. & Sur, M. An emergent model of orientation selectivity in cat visual cortical simple cells. J. Neurosci. 15, 5448–5465 (1995).
37. Salinas, E. & Abbott, L. F. A model of multiplicative neural responses in parietal cortex. Proc. Natl. Acad. Sci. USA 93, 11956–11961 (1996).
38. Deneve, S., Latham, P. & Pouget, A. Reading population codes: a neural implementation of ideal observers. Nat. Neurosci. 2, 740–745 (1999).
39. Walker, M., Fitzgibbon, E. & Goldberg, M. Neurons in the monkey superior colliculus predict the visual result of impending saccadic eye movements. J. Neurophysiol. 73, 1988–2003 (1995).
40. Mazzoni, P., Bracewell, R., Barash, S. & Andersen, R. Motor intention activity in the macaque's lateral intraparietal area. I. Dissociation of motor plan from sensory memory. J. Neurophysiol. 76, 1439–1456 (1996).
41. Graziano, M., Hu, X. & Gross, C. Coding the locations of objects in the dark. Science 277, 239–241 (1997).
42. Duhamel, J. R., Colby, C. L. & Goldberg, M. E. The updating of the representation of visual space in parietal cortex by intended eye movements. Science 255, 90–92 (1992).
43. Droulez, J. & Berthoz, A. A neural model of sensoritopic maps with predictive short-term memory properties. Proc. Natl. Acad. Sci. USA 88, 9653–9657 (1991).
44. Dominey, P. & Arbib, M. A cortico-subcortical model for the generation of spatially accurate sequential saccades. Cereb. Cortex 2, 153–175 (1992).
45. Seung, H. How the brain keeps the eyes still. Proc. Natl. Acad. Sci. USA 93, 13339–13344 (1996).
46. Snyder, L., Batista, A. & Andersen, R. Coding of intention in the posterior parietal cortex. Nature 386, 167–170 (1997).
47. Snyder, L., Grieve, K., Brotchie, P. & Andersen, R. Separate body- and world-referenced representations of visual space in parietal cortex. Nature 394, 887–891 (1998).

Figure 1: Linear and nonlinear functions. (a) The linear function z = 2x + 3y. In general, a function is linear if it can be written as a weighted sum of its input variables plus a constant. All other functions are nonlinear. Linear functions form lines (one variable), planes (two variables, as shown here) or hyperplanes (more than two variables). (b) Linear functions can be implemented in two-layer networks. The network shown corresponds to the linear function in (a). (c) The nonlinear function z = exp(−(x^2 + y^2 − 100)^2/1000). Nonlinear functions are not planar and can form surfaces that can be arbitrarily complex, such as a circular ridge. (d) A neural network implementation of the nonlinear function in (c) using Gaussian basis functions in the intermediate representation (Box 1). The basis function units are organized so as to form a map in the x-y plane. Right, two representative response functions of these basis function units. The activity of the output unit is obtained by taking a linear sum of the basis function units. In this example, the weights of the blue units onto the output unit are set to one, whereas all the other units have a weight of zero. As a result, the output unit mathematically sums a set of Gaussian functions arranged along a circle in the x-y plane. This leads to the response function (right), which is similar to the circular ridge in (c).
Figure 2: Basis function units. (a) The response function of a basis function unit computing the product of a Gaussian function of retinal location (eye-centered position) and a sigmoidal function of eye position. (b) A mapping of the retinotopic receptive field derived from a unit with the properties in (a) for three different eye positions. Bold lines in (a) correspond to the three curves shown here. The receptive field always peaks at the same retinal location, but the gain (or amplitude) of the response varies with eye position. Gain modulations similar to this are found in many cortical areas, from V1 to the premotor cortex. (c) A neural network model for nonlinear sensorimotor transformations using basis functions. The input layers encode the retinal location of an object and the current eye position, whereas the output layer encodes the change in joint angles of the arm. Other input signals are needed to compute the change in joint angles of the arm, such as head position, starting hand location and so on, but for simplicity we show only the visual and eye position inputs. The tuning curves of the eye-centered and eye-position units are assumed to follow Gaussian and sigmoid functions, respectively. This nonlinear sensorimotor transformation requires an intermediate layer. In the case illustrated here, the intermediate layer uses basis function units. Each unit computes the product of the activity of one input unit from the eye-centered map and one input unit from the eye position map. This leads to the response of the basis function units (a, b). The transformation from the basis function to the output units involves a simple linear transformation, namely, a weighted sum of the activity of the basis function units. This is the main advantage of this approach: once the basis functions are computed, nonlinear transformations become linear.

Figure 3: Learning sensorimotor transformations in neural networks. (a) Learning motor commands with spontaneous movements. The motor layer generates a spontaneous pattern of activity. This activity is fed into the arm, resulting in a new arm position. The position of the arm is observed in the input sensory layer. The sensory activity is passed through the weights to compute a predicted motor command. The weights can then be adjusted according to an error signal obtained by computing the difference between the spontaneous and predicted motor commands. By repeating this procedure many times, one can learn the appropriate sensorimotor mapping. Note that the same motor units are involved in all steps: generating and remembering the spontaneous motor command, computing the predicted motor command and computing the error signal. In most models, the neuronal mechanisms underlying these various steps are not specified; the motor units are simply assumed to be able to perform the required computations. (b) When the transformation is nonlinear, as is the case for arm movements, an intermediate layer of units is required. If basis functions are used in the intermediate layer, learning can be done in two stages. The weights to the basis functions can be learned with an unsupervised, or self-organizing, rule because the basis functions depend only on the input, not on the motor command computed in the output layer. The weights to the output layer are equivalent to the ones in the network in (a) and can be learned using the same error signal.

Figure 4: One-to-many sensorimotor transformations. (a) To reach for the location indicated by the star, a two-joint arm working in a two-dimensional plane can adopt exactly two configurations (top). The transformation in this case is from one sensory input into one of two motor commands. In the case illustrated here, the two arm configurations are mirror images of each other, that is, the joint angles have the same amplitude but reversed signs. As a result, the average of angles 1 and 2 over the two configurations is zero. The average configuration (bottom) would therefore overshoot the target (red arrow). Because averaging is precisely what the network in Fig. 3b does during training, this transformation cannot be learned properly with this kind of network. (b) One solution is to use two networks, one for the sensorimotor transformation (the inverse model) and one for predicting the sensory consequences of the motor command (the forward model). Each of these networks typically involves three or more layers, such as the one shown in Fig. 3b. Learning in this system is achieved by learning the forward model (for example, through the generation of spontaneous motor commands) and then using the forward model to train the inverse model.

Figure 5: Short-term memory networks for sensorimotor transformations. (a) A hill of activity (top) in a network organized in a two-dimensional retinotopic map (bottom). If the hill is a stable state for the network (that is, if it maintains itself over time), then the position of the peak of the hill can be used as a memory for retinal location. In the case illustrated here, the network is storing the value (15°, 0°), as indicated by the arrows. (b) To remain spatially accurate, a memory of the retinal location of an object must be updated after each eye movement. This updating can take place in such memory networks by displacing the hill by an amount equal to minus the eye displacement (−E). The response to a +10° horizontal eye movement is shown.