Efficient Encoding of Natural Time Varying Images Produces Oriented Space-Time Receptive Fields

Rajesh P. N. Rao and Dana H. Ballard
Department of Computer Science, University of Rochester, Rochester, NY 14627
{rao,dana}@cs.rochester.edu

Technical Report 97.4
National Resource Laboratory for the Study of Brain and Behavior
Department of Computer Science, University of Rochester
August 1997

Abstract

The receptive fields of neurons in the mammalian primary visual cortex are oriented not only in the domain of space, but in most cases, also in the domain of space-time. While the orientation of a receptive field in space determines the selectivity of the neuron to image structures at a particular orientation, a receptive field's orientation in space-time characterizes important additional properties such as velocity and direction selectivity. Previous studies have focused on explaining the spatial receptive field properties of visual neurons by relating them to the statistical structure of static natural images. In this report, we examine the possibility that the distinctive spatiotemporal properties of visual cortical neurons can be understood in terms of a statistically efficient strategy for encoding natural time varying images. We describe an artificial neural network that attempts to accurately reconstruct its spatiotemporal input data while simultaneously reducing the statistical dependencies between its outputs. The network utilizes spatiotemporally summating neurons and learns efficient sparse distributed representations of its spatiotemporal input stream by using recurrent lateral inhibition and a simple threshold nonlinearity for rectification of neural responses. When exposed to natural time varying images, neurons in a simulated network developed localized receptive fields oriented in both space and space-time, similar to the receptive fields of neurons in the primary visual cortex.

This research was supported by NIH/PHS research grant 1-P41-RR09283.

1 Introduction

Since the seminal experiments of Hubel and Wiesel over 30 years ago [Hubel and Wiesel, 1962; 1968], it has been known that neurons in the mammalian primary visual cortex respond selectively to stimuli such as edges or bars at particular orientations. In many cases, the neurons are directionally selective, i.e., they respond only to motion in a particular direction. An especially useful concept in characterizing the response properties of visual neurons has been the notion of a receptive field. The receptive field of a neuron is classically defined as the area of visual space within which stimuli such as bars or edges can elicit responses from the neuron [Hartline, 1940]. Although they are a function of both space and time, early depictions of visual receptive fields were confined to spatial coordinates. In recent years, new mapping techniques have allowed the characterization of receptive fields in both space and time [Emerson et al., 1987; McLean and Palmer, 1989; Shapley et al., 1992; DeAngelis et al., 1993a] (see [DeAngelis et al., 1995] for a review). The new mapping results indicate that in most cases, the receptive field of a visual neuron changes over time.
It has been noted that while the spatial structure of a receptive field indicates neuronal attributes such as preference for a particular orientation of bars or edges, the spatiotemporal structure of a neuron's receptive field governs important dynamical properties such as velocity and direction selectivity [Adelson and Bergen, 1985; Watson and Ahumada, 1985; Burr et al., 1986]. In particular, the orientation of a neuron's receptive field in space-time indicates the preferred direction of motion, while the slope of the oriented subregions gives an estimate of the preferred velocity [McLean and Palmer, 1989; Albrecht and Geisler, 1991; Reid et al., 1991; DeAngelis et al., 1993b; McLean et al., 1994].

An attractive approach to understanding the receptive field properties of visual neurons is to relate them to the statistical structure of natural images. Motivated by the property that natural images possess a power spectrum that falls approximately as 1/f^2 [Field, 1987], Atick and Redlich [Atick, 1992; Atick and Redlich, 1992] provided an explanation of the center-surround structure of retinal ganglion cell receptive fields in terms of whitening or decorrelation of outputs in response to natural images. Several Hebbian learning algorithms for decorrelation have also been proposed [Bienenstock et al., 1982; Williams, 1985; Barrow, 1987; Linsker, 1988; Oja, 1989; Sanger, 1989; Foldiak, 1990; Atick and Redlich, 1993], many of which perform Principal Component Analysis (PCA). Although the PCA of natural images produces lower order components that resemble oriented filters [Baddeley and Hancock, 1991; Hancock et al., 1992], the higher order components are unlike any known neural receptive field profiles. In addition, the receptive fields obtained are global rather than localized feature detectors. Recently, Olshausen and Field showed that a neural network that includes the additional constraint of maximizing the sparseness of the distribution of output activities develops, when trained on static natural images, synaptic weights with localized, oriented spatial receptive fields [Olshausen and Field, 1996] (see also [Harpur and Prager, 1996; Rao and Ballard, 1997a] and related work on projection pursuit [Huber, 1985] based learning methods [Intrator, 1992; Law and Cooper, 1994; Shouval, 1995]). Similar results have also been obtained using an algorithm that extracts the independent components of a set of static natural images [Bell and Sejnowski, 1997]. These algorithms are all based directly or indirectly on Barlow's principle of redundancy reduction [Barlow, 1961; 1972; 1989; 1994], where the goal is to learn "feature detectors" whose outputs are as statistically independent as possible. The underlying motivation is that sensory inputs such as images are generally composed of a set of independent objects or features whose components are highly correlated. By learning detectors for these independent features, the sensory system can develop accurate internal models of the sensory environment and can efficiently represent external events as sparse conjunctions of independent features.

In this paper, we explore the possibility that the distinctive spatiotemporal receptive field properties of visual cortical neurons can be understood in terms of a statistically efficient strategy for encoding natural time varying images [Eckert and Buchsbaum, 1993; Dong and Atick, 1995].
We describe an artificial neural network that attempts to accurately reconstruct its spatiotemporal input data while simultaneously reducing the statistical dependencies between its outputs, as advocated by the redundancy reduction principle. Our approach utilizes a spatiotemporal generative model that can be viewed as a simple extension of the spatial generative model used by Harpur and Prager [Harpur and Prager, 1996], Olshausen and Field [Olshausen and Field, 1996], Rao and Ballard [Rao and Ballard, 1997a], and others. The spatiotemporal generative model allows neurons in the network to perform not just a spatial summation of the current input, but a spatiotemporal summation of both current and past inputs over a finite spatiotemporal extent. The network learns efficient sparse distributed representations of its spatiotemporal input stream by utilizing lateral inhibition [Foldiak, 1990] and a simple threshold nonlinearity for rectification of neural responses [Lee and Seung, 1997; Hinton and Ghahramani, 1997]. When exposed to natural time varying images, neurons in a simulated network developed localized receptive fields oriented in both space and space-time, similar to the receptive fields of neurons in the primary visual cortex.

2 Spatial Generative Models

The idea of spatial generative models has received considerable attention in recent studies pertaining to neural coding [Hinton and Sejnowski, 1986; Jordan and Rumelhart, 1992; Zemel, 1994; Dayan et al., 1995; Hinton and Ghahramani, 1997], although the roots of the approach can be traced back to early ideas in control theory such as Wiener filtering [Wiener, 1949] and Kalman filtering [Kalman, 1960]. In this section, we first consider a class of spatial generative models that have previously been used in the neural modeling literature for explaining spatial receptive field properties [Harpur and Prager, 1996; Olshausen and Field, 1996; Rao and Ballard, 1997a]. This will serve to motivate the spatiotemporal models we will be concerned with later.

Assume that an image, denoted by a vector I of pixels, can be represented as a linear combination of a set of basis vectors U_j:

    I = \sum_j r_j U_j    (1)

The coefficients r_j can be regarded as an internal representation of the spatial characteristics of the image I, as interpreted using the internal model defined by the basis vectors U_j. In terms of a neuronal network, the coefficients r_j correspond to the activities or firing rates of neurons, while the basis vectors U_j correspond to the synaptic weights of neurons. It is convenient to rewrite the above equation in matrix form as:

    I = U r    (2)

where U is the matrix whose columns consist of the basis vectors U_j and r is the vector consisting of the coefficients r_j. The goal is to estimate the coefficients r for a given image and, on a longer time scale, learn appropriate basis vectors in U. A standard approach is to define a least-squared error criterion of the form:

    E_1 = \sum_i (I_i - \bar{U}_i r)^2    (3)
        = \| I - U r \|^2    (4)

where I_i denotes the i-th pixel of I and \bar{U}_i denotes the i-th row of U. Note that E_1 is simply the sum of squared pixel-wise errors between the input I and the image reconstruction U r. Estimates for U and r can be obtained by minimizing E_1 [Williams, 1985; Daugman, 1988; Pece, 1992; Harpur and Prager, 1996; Olshausen and Field, 1996; Rao and Ballard, 1997a].
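As a concrete illustration of Equations 1 through 4, the short sketch below (not taken from the original simulations; the array sizes and random basis are arbitrary) builds a basis matrix U, generates an image I = U r from a sparse non-negative coefficient vector r, and evaluates the reconstruction error E_1 for a candidate estimate of r:

    import numpy as np

    rng = np.random.default_rng(0)
    n_pixels, n_basis = 64, 100                     # e.g., an 8x8 patch and an overcomplete basis
    U = rng.standard_normal((n_pixels, n_basis))    # columns are basis vectors U_j
    U /= np.linalg.norm(U, axis=0)                  # normalize each basis vector to length one

    # A sparse, non-negative coefficient vector r (the "hidden causes")
    r_true = np.maximum(rng.standard_normal(n_basis) - 1.5, 0.0)

    # Equation 2: the image is a linear combination of basis vectors
    I = U @ r_true

    def E1(I, U, r):
        """Equations 3-4: sum of squared pixel-wise reconstruction errors."""
        residual = I - U @ r
        return float(residual @ residual)

    r_hat = np.zeros(n_basis)                       # a (poor) initial estimate of the coefficients
    print(E1(I, U, r_hat), E1(I, U, r_true))        # the error is zero for the true coefficients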
One can obtain a probabilistic generative model of the image generation process by utilizing a Gaussian noise process n to model the differences between I and U r. The resulting stochastic generative model becomes:

    I = U r + n    (5)

If a zero mean Gaussian noise process n with unit covariance is assumed, one can show that E_1 is the negative log likelihood of generating the input I (see, for example, [Bryson and Ho, 1975]). Thus, minimizing E_1 is equivalent to maximizing the likelihood of the observed data.

Unfortunately, in many cases, minimization of a least-squares optimization function such as E_1 without additional constraints generates solutions that are far from being adequate descriptors of the true input generation process. For example, a popular solution to the least-squares minimization criterion is principal component analysis (PCA), also sometimes referred to as the eigenvector decomposition, or singular value decomposition (SVD), of the input covariance matrix. PCA optimizes E_1 by finding a set of mutually orthogonal basis vectors that are aligned with the directions of maximum variance in the input data. This is perfectly adequate in the case where the input data clouds are Gaussian and capturing pairwise statistics suffices. However, statistical studies have shown that natural image distributions are highly non-Gaussian and cannot be adequately described using orthogonal bases [Field, 1994]. Thus, additional constraints are required in order to guide the optimization process towards solutions that more accurately reflect the input generation process.

One way of adding constraints is to take into account the prior distributions of the parameters r and U. Thus, one can minimize an optimization criterion of the form:

    E = E_1 + g(r) + h(U)    (6)

where g(r) and h(U) are terms related to the prior distributions of the parameters r and U. In particular, they denote the negative log of the prior probabilities of r and U respectively. When viewed in the context of information theory, these negative log probability terms in E can be interpreted as representing the cost of coding the parameters in bits (when the logarithm is taken to base 2). Thus, the function E can be given an interpretation in terms of the minimum description length (MDL) principle [Rissanen, 1989; Zemel, 1994], namely, that solutions are required not only to be accurate but also to be cheap in terms of coding length. This formalizes the well-known Occam's Razor principle that advocates simplicity over complexity among solutions to a problem. One may also note that minimizing E is equivalent to maximizing the posterior probability of the parameters given the input data (maximum a posteriori (MAP) estimation).

Specific choices of g and h determine the nature of the internal representations that will be learned. For example, Olshausen and Field [Olshausen and Field, 1996] proposed functions of the form g(r) = \sum_j S(r_j), with S a nonlinear function such as log(1 + x^2) or |x|, to encourage sparseness in r. Alternately, one can use a zero-mean multivariate Gaussian prior on r [Rao and Ballard, 1997a] to yield the negative log of the prior density:

    g(r) = \alpha r^T L r    (7)

where \alpha is a positive constant and L denotes a set of lateral weights. The matrix L represents the inverse covariance matrix of r. We show in the next section that this choice enforces lateral inhibition among the output neurons, thereby encouraging sparse distributed representations, and leads to an "anti-Hebbian" learning rule for the lateral weights equivalent to Foldiak's well-known adaptation rule [Foldiak, 1990].
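To preview how the Gaussian prior of Equation 7 gives rise to lateral inhibition, the sketch below (illustrative only; the sizes, the constant alpha, and the positive-definite matrix standing in for L are arbitrary, not trained quantities) writes out the gradient of E = E_1 + \alpha r^T L r with respect to r. The term involving L r is the inhibitory contribution referred to in the text:

    import numpy as np

    rng = np.random.default_rng(1)
    n_pixels, n_basis = 64, 100
    U = rng.standard_normal((n_pixels, n_basis))
    U /= np.linalg.norm(U, axis=0)
    I = rng.standard_normal(n_pixels)            # an arbitrary (whitened) input patch
    alpha = 0.1                                  # weight of the prior term (arbitrary)

    A = rng.standard_normal((n_basis, n_basis))
    L = A @ A.T / n_basis + np.eye(n_basis)      # an arbitrary symmetric positive-definite L

    def grad_E(r):
        """Gradient of E = ||I - U r||^2 + alpha r^T L r with respect to r."""
        feedforward = U.T @ (I - U @ r)          # error-correcting drive from the reconstruction term
        inhibition = alpha * (L @ r)             # lateral-inhibition term from the Gaussian prior
        return -2.0 * feedforward + 2.0 * inhibition

    r = np.zeros(n_basis)
    r = np.maximum(r - 0.01 * grad_E(r), 0.0)    # one rectified gradient-descent step on r
    print(np.count_nonzero(r))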
Figure 1: Generative Models. (A) Linear spatial generative model used in [Harpur and Prager, 1996; Olshausen and Field, 1996; Rao and Ballard, 1997a]. A given input image I is assumed to be generated by multiplying a basis vector matrix U with a set of "hidden" causes represented by the spatial response vector r. (B) Spatiotemporal generative model used in this paper. An input image sequence I(1), ..., I(k) is assumed to be generated by multiplying a set of basis matrices U(1), ..., U(k) with a spatiotemporal response vector r.

In the case of U, for the sake of simplicity, we will assume a zero-mean Gaussian distribution with a covariance proportional to the identity matrix, yielding the following value for h(U):

    h(U) = \beta \| U \|^2    (8)

where \beta is a positive constant and \| \cdot \|^2 denotes the sum of squares of the elements of the given matrix.

A final constraint is motivated both by biology and by information coding concerns. We will constrain the network outputs (the coefficients r_j) to be non-negative, acknowledging the fact that the firing rate of a neuron cannot be negative. The non-negativity constraint is especially attractive in information coding terms since it causes an infinite density at zero for the rectified responses and a consequent low coding cost at zero [Hinton and Ghahramani, 1997], which encourages sparseness among the outputs of the network. We are overlooking the possibility that a single neuron can signal both positive and negative quantities by raising or lowering its firing rate with respect to a fixed background firing rate corresponding to zero.

3 Spatiotemporal Generative Models

In the previous section, the input data consisted of static images I, and we used a single basis matrix U to capture the statistical structure of the space of input images. Furthermore, the spatial structure given by the pixels of the image I was internally represented by a single spatial response vector r. We now draw an analogy between time and space to define a spatiotemporal generative model. Suppose our training set consists of a number of different sequences of k images each, a given sequence being denoted by the vectors I(1), ..., I(k). We will use a set of k basis matrices U(1), ..., U(k). For j = 1, ..., k, the matrix U(j) will be used to capture the statistical structure of the images occurring at the j-th time step in each of the training sequences. As will become clear in the next section, the underlying motivation here is that the neurons in the network perform not just a spatial summation, but a space-time summation of inputs over a finite spatiotemporal extent (see Figure 2). Thus, since inputs are weighted differentially depending on their spatiotemporal history, we need to learn a set of synaptic weights, one for each time instant j. These spatiotemporal synaptic weights, as given by the U(j), in turn determine the spatiotemporal receptive fields of the neurons in the network. A single spatiotemporal response vector r will be used to characterize a given spatiotemporal image sequence I(1), ..., I(k), in much the same way as a single spatial response vector was previously used to characterize the spatial structure of the pixels of a given static image. We thus obtain the following space-time analog of Equation 2:

    I(j) = U(j) r,  for j = 1, ..., k    (9)

Note that in the special case where k = 1, we obtain Equation 2. From a probabilistic perspective, one can rewrite Equation 9 in the form of the following stochastic generative model:

    \begin{bmatrix} I(1) \\ I(2) \\ \vdots \\ I(k) \end{bmatrix} = \begin{bmatrix} U(1) \\ U(2) \\ \vdots \\ U(k) \end{bmatrix} r + n    (10)

where n is a stochastic noise process accounting for the differences between the stacked inputs and their reconstruction. Once again, it is easy to see that Equation 5 is a special case of the above generative model, where k = 1.
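The following sketch (illustrative only; the dimensions, the temporal extent k, and the random basis matrices are arbitrary rather than the values used in the experiments below) generates a short image sequence from the spatiotemporal model of Equations 9 and 10, i.e., every frame I(j) is produced from the same response vector r through its own basis matrix U(j):

    import numpy as np

    rng = np.random.default_rng(3)
    k, n_pixels, n_neurons = 8, 64, 100      # temporal extent, pixels per frame, model neurons

    # One basis matrix per time step: U(1), ..., U(k)
    U = [rng.standard_normal((n_pixels, n_neurons)) / np.sqrt(n_pixels) for _ in range(k)]

    # A single non-negative spatiotemporal response vector r for the whole sequence
    r = np.maximum(rng.standard_normal(n_neurons) - 1.5, 0.0)

    # Equation 9: frame j of the sequence is U(j) r, plus noise as in Equation 10
    sigma = 0.01
    I_seq = [U[j] @ r + sigma * rng.standard_normal(n_pixels) for j in range(k)]

    # Equivalent stacked form of Equation 10: the residual is just the noise n
    print(np.linalg.norm(np.concatenate(I_seq) - np.vstack(U) @ r))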
We can now define the following space-time analog of the optimization function E_1 (Equation 4):

    E_1 = \sum_{j=1}^{k} \| I(j) - U(j) r \|^2    (11)
        = \sum_{j=1}^{k} \sum_i ( I_i(j) - \bar{U}_i(j) r )^2    (12)

This is simply the sum of squared pixel-wise reconstruction errors across both space and time. As in the previous section, if n is assumed to be a zero mean Gaussian with unit covariance, one can show that E_1 is the negative log likelihood of generating the inputs I(1), ..., I(k). Thus, minimizing E_1 is equivalent to maximizing the likelihood of the spatiotemporal input sequence.

As in the previous section, by assuming Gaussian priors on the parameters r and U(j), we obtain the following MDL-based cost function:

    E = \sum_{j=1}^{k} \| I(j) - U(j) r \|^2 + \alpha r^T L r + \beta \sum_{j=1}^{k} \| U(j) \|^2    (13)

We will additionally constrain the elements of r to be non-negative as discussed in the previous section. Thus, we are now left with the task of finding optimal estimates of r, the U(j), and L. This can be achieved by minimizing E using gradient descent as discussed in the next section.

4 Network Dynamics and Learning Rules

In this section, we describe gradient-descent based estimation rules for obtaining optimal estimates of r, the U(j), and L. In the case where the input consists of batch data, as in Section 4.1, one may alternate between the optimization of r for fixed U(j) and L, and the optimization of U(j) and L for fixed r, thereby implementing a form of the expectation-maximization (EM) algorithm [Dempster et al., 1977]. In the case of on-line data, which is considered in Section 4.2, the optimization of r occurs simultaneously with that of the U(j) and L. However, the learning rates for the U(j) and L are set to values much smaller than the adaptation rate of r.

4.1 Estimation of r

An optimal estimate of r can be obtained by performing stochastic gradient descent on E with respect to r:

    dr/dt = -(\eta_1 / 2) \partial E / \partial r = \eta_1 \left[ \sum_{j=1}^{k} U(j)^T (I(j) - U(j) r) - \alpha L r \right]    (14)

where \eta_1 governs the rate of descent towards the minima. The above equation is relatively easy to interpret: in order to modify r towards the optimal estimate, we need to obtain the residual error (I(j) - U(j) r) between the input at time step j and its reconstruction U(j) r. The residual errors for the various time steps are then spatiotemporally weighted by their corresponding weights U(j)^T to obtain a spatiotemporal sum, which is modified by lateral inhibition due to the term -\alpha L r. In a neural implementation, the i-th rows of the matrices U(j)^T, j = 1, ..., k, would together comprise the synaptic weights of a single spatiotemporally summating neuron. Thus, the i-th row of U(j)^T would represent the effect of the synapses of the i-th neuron for time instant j. Figure 2A shows a network implementation of the above dynamics.

A possible problem with this implementation is the need for computing global residual errors at each iteration of the estimation process. This becomes especially problematic in the case where the data is being obtained on-line, since one would need to keep the past images in memory and, in addition, use separate sets of neurons representing the matrices U(j) for generating the signals U(j) r at each iteration. The dynamics can be implemented more locally by rewriting Equation 14 in the following form:

    dr/dt = \eta_1 \left[ \sum_{j=1}^{k} U(j)^T I(j) - W r - \alpha L r \right]    (15)

where W = \sum_{j=1}^{k} U(j)^T U(j). Note that this form of the dynamics does not require that residual errors be computed at each iteration. Rather, we simply perform a spatiotemporal filtering of the inputs I(j) using the synaptic weight matrices U(j)^T for j = 1, ..., k, and then subtract two lateral terms, one involving W and the other involving L.
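A quick numerical check of the rewriting that leads from Equation 14 to Equation 15 is given below (a sketch using arbitrary random matrices rather than trained weights): because W = \sum_j U(j)^T U(j), the residual-based update and the input-filtering update produce identical values of dr/dt.

    import numpy as np

    rng = np.random.default_rng(4)
    k, n_pixels, n_neurons = 8, 64, 100
    eta, alpha = 0.1, 0.05                                   # arbitrary constants
    U = [rng.standard_normal((n_pixels, n_neurons)) for _ in range(k)]
    I = [rng.standard_normal(n_pixels) for _ in range(k)]    # an arbitrary input sequence
    L = np.eye(n_neurons)                                    # placeholder lateral weights
    r = rng.random(n_neurons)

    # Equation 14: spatiotemporal weighting of residual errors, plus lateral inhibition
    dr_residual = eta * (sum(U[j].T @ (I[j] - U[j] @ r) for j in range(k)) - alpha * L @ r)

    # Equation 15: filter the inputs once, then subtract the two lateral terms W r and alpha L r
    W = sum(U[j].T @ U[j] for j in range(k))
    dr_local = eta * (sum(U[j].T @ I[j] for j in range(k)) - W @ r - alpha * L @ r)

    print(np.allclose(dr_residual, dr_local))                # True: the two forms are equivalent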
Figure 2: Alternate Network Implementations of the Response Vector Dynamics. (A) shows a globally recurrent architecture for implementing the dynamics of the response vector r (Equation 16). The architecture requires the calculation of residual error signals [Mumford, 1994; Barlow, 1994; Pece, 1992; Rao and Ballard, 1997a] between predicted images and actual images. These errors are filtered by a set of spatiotemporally summating neurons whose synapses represent the matrices U(j)^T. The result is used to correct r in conjunction with recurrent lateral inhibition and rectification. (B) shows a biologically more plausible locally recurrent architecture for implementing the dynamics of r. Rather than computing global residual errors, the inputs are directly filtered by a set of spatiotemporally summating neurons representing U(j)^T. The result is then further modified by recurrent excitation/inhibition due to the weights W (see text) as well as lateral inhibition due to the weights L.

While the term involving L is exclusively inhibitory, the components of the vector W r may be excitatory or inhibitory. Once again, the rows of the matrices U(j)^T correspond to the synapses of spatiotemporally summating neurons, with the i-th row of U(j)^T representing the synaptic weights of the i-th spatiotemporal neuron for time instant j. Expressing the above dynamics in discrete form and adding a final constraint of non-negativity results in the following equation for updating r until convergence is achieved for the given input sequence:

    r(m+1) = G\left[ r(m) + \eta_1 \left( \sum_{j=1}^{k} U(j)^T I(j) - W r(m) - \alpha L r(m) \right) \right]    (16)

where G is a threshold nonlinearity for rectification: G(x) = max(x, 0), applied to all components of the given vector. It is interesting to note the similarity between the dynamics given by the stochastic gradient descent rule above and those proposed by Lee and Seung for their "Conic" network [Lee and Seung, 1997]. In particular, the above equation can be regarded as a spatiotemporal extension of the dynamics used in the Conic network. A network implementation of this equation is shown in Figure 2B.

4.2 On-Line Estimation

The dynamics described above can be extended to the more realistic case where the inputs are being encountered on-line. Let t represent the current time instant. The on-line form of Equation 15 is then given by:

    dr/dt = \eta_1 \left[ \sum_{j=1}^{k} U(j)^T I(t-k+j) - W r - \alpha L r \right]    (17)

Expressing the above dynamics in discrete form and enforcing the constraint of non-negativity yields the following rule for updating r at each time instant:

    r(t) = G\left[ r(t-1) + \eta_1 \left( \sum_{j=1}^{k} U(j)^T I(t-k+j) - W r(t-1) - \alpha L r(t-1) \right) \right]    (18)

where the operator G again denotes rectification. In summary, the current spatiotemporal response r(t) is determined by three factors: the previous spatiotemporal response r(t-1), the past k inputs I(t-k+1), ..., I(t), and lateral inhibition due to W and L. These three factors are combined via summation, followed by rectification, to yield the responses at time instant t. The consideration of only the past k inputs rather than the entire input history in the equation above is consistent with the observation that cortical neurons process stimuli within restricted temporal epochs (see, for example, [DeAngelis et al., 1995]).
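The sketch below (a toy simulation with made-up sizes and random, untrained weights; the learning rate and constants are arbitrary) implements the on-line update of Equation 18: at each time instant the most recent k input frames are filtered by the U(j)^T, the recurrent terms W r and \alpha L r are subtracted, and the result is rectified.

    import numpy as np
    from collections import deque

    rng = np.random.default_rng(5)
    k, n_pixels, n_neurons = 8, 64, 100
    eta, alpha = 0.05, 0.05                                   # arbitrary constants
    U = [rng.standard_normal((n_pixels, n_neurons)) / np.sqrt(n_pixels) for _ in range(k)]
    L = 0.01 * np.ones((n_neurons, n_neurons))                # placeholder lateral weights
    W = sum(Uj.T @ Uj for Uj in U)                            # recurrent excitation/inhibition

    def rectify(x):                                           # the threshold nonlinearity G
        return np.maximum(x, 0.0)

    r = np.zeros(n_neurons)
    window = deque([np.zeros(n_pixels)] * k, maxlen=k)        # the past k inputs I(t-k+1), ..., I(t)

    for t in range(200):                                      # a stream of (random) input frames
        window.append(rng.standard_normal(n_pixels))
        feedforward = sum(U[j].T @ window[j] for j in range(k))
        r = rectify(r + eta * (feedforward - W @ r - alpha * L @ r))

    print(r.max(), np.mean(r > 0))                            # peak response and fraction of active neurons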
4.3 Learning Rules

A learning rule for determining the optimal estimate of each U(j) can be obtained by performing gradient descent on E with respect to U(j), for each j:

    dU(j)/dt = -(\eta_2 / 2) \partial E / \partial U(j) = \eta_2 \left[ (I(j) - U(j) r) r^T - \beta U(j) \right]    (19)

where \eta_2 is the learning rate parameter. Note that for the on-line case, I(j) is replaced by its on-line counterpart I(t-k+j) in the above equation. Similarly, a learning rule for the lateral weights L can be obtained by performing gradient descent on E with respect to L:

    dL/dt = \eta_3 \left[ r r^T - \lambda L \right]    (20)

where \eta_3 is the learning rate and \lambda determines the rate of decay. Note that this is a Hebbian learning rule with decay. In conjunction with the inhibition in the dynamics of r (Equation 18), the above learning rule can be seen to be equivalent to Foldiak's anti-Hebbian learning rule [Foldiak, 1990], if the diagonal terms of L, which implement self-inhibition, are set to zero.

5 Experimental Results

The algorithms derived in the previous section were tested on a set of five digitized natural images from the Ansel Adams Fiat Lux collection at the UCR/California Museum of Photography (Figure 3A, images reproduced with permission). Each grayscale image was preprocessed by filtering with a circularly symmetric zero-phase whitening/low-pass filter with the spatial frequency profile [Olshausen and Field, 1996; Atick and Redlich, 1992]:

    K(f) = f e^{-(f/f_0)^4}    (21)

where f_0 denotes the cut-off frequency (in cycles/image). As described in [Olshausen and Field, 1997], the whitening component f of the filter performs "sphering" of natural image data by attenuating the low frequencies and boosting the higher frequencies. The low-pass exponential component e^{-(f/f_0)^4} helps to reduce the effects of noise/aliasing at high frequencies and eliminates the artifacts of using a rectangular sampling lattice. Figure 3B shows the frequency profile of the whitening/low-pass filter in the 1-D case, while Figure 3C shows the 2-D case. The corresponding spatial profile, obtained via an inverse Fourier transform, is shown in Figure 3D. The spatial profile resembles the well-known center-surround receptive fields characteristic of retinal ganglion cells. Atick and Redlich [Atick and Redlich, 1992; Atick, 1992] have shown that the measured spatial frequency profiles of retinal ganglion cells are well approximated by filters resembling K(f). Figure 3E shows the results of filtering an image from the training set using K(f).

The training data comprised sequences of contiguous image patches extracted from the filtered natural images. The relative size of a patch is shown as a box labeled RF in Figure 3E. Starting from an image patch extracted at a randomly selected location in a given training image, the next training patch was extracted by moving in one of eight possible directions as shown in Figure 3F. This was achieved by incrementing or decrementing the x and/or y coordinate of the current patch location by one according to the current direction of motion. After a fixed number of image patches, the current direction was reset randomly to one of the possible directions. The direction was also randomly changed whenever an image boundary was encountered. A new training image was selected at regular intervals, thereby cycling through the five preprocessed training images.
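The patch-extraction procedure described above amounts to a random walk of a fixed-size window over each filtered image. The sketch below is an illustrative reimplementation (the patch size, walk length, and re-randomization interval are made-up values, since the specific settings are not given in this version of the report):

    import numpy as np

    rng = np.random.default_rng(6)

    # Eight possible directions: increment/decrement the x and/or y coordinate by one
    DIRECTIONS = [(dx, dy) for dx in (-1, 0, 1) for dy in (-1, 0, 1) if (dx, dy) != (0, 0)]

    def extract_patch_sequence(image, patch=8, n_patches=100, redirect_every=10):
        """Random-walk a patch x patch window over a (whitened) image, as in Figure 3F."""
        h_img, w_img = image.shape
        x = int(rng.integers(0, w_img - patch))
        y = int(rng.integers(0, h_img - patch))
        dx, dy = DIRECTIONS[rng.integers(len(DIRECTIONS))]
        patches = []
        for i in range(n_patches):
            patches.append(image[y:y + patch, x:x + patch].ravel())
            out_of_bounds = not (0 <= x + dx <= w_img - patch and 0 <= y + dy <= h_img - patch)
            if i % redirect_every == 0 or out_of_bounds:
                dx, dy = DIRECTIONS[rng.integers(len(DIRECTIONS))]   # re-randomize the direction
            if 0 <= x + dx <= w_img - patch and 0 <= y + dy <= h_img - patch:
                x, y = x + dx, y + dy                                # move the window by one pixel if possible
        return np.array(patches)

    demo_image = rng.standard_normal((128, 128))      # stand-in for a whitened natural image
    seq = extract_patch_sequence(demo_image)
    print(seq.shape)                                   # (n_patches, patch * patch)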
Figure 3: Training Paradigm. (A) The five natural images used in the experiments. The original negatives are from the Ansel Adams Fiat Lux collection at the UCR/California Museum of Photography (images reproduced with permission). (B) The 1-D spatial frequency response profile of the zero-phase whitening/low-pass filter used for preprocessing the natural images. (C) The full 2-D response profile of the same filter. (D) The profile in the space domain obtained by inverse Fourier transform (IFT), showing the familiar center-surround receptive field characteristic of retinal ganglion cells (see text). (E) A natural image after being filtered with the center-surround filter in (D). The relative size of a receptive field (the same size as the input image patches) is shown as a box labeled "RF" in the bottom right corner. (F) depicts the process of extracting image patches from the filtered natural images for training the neural network. Patches are extracted from the square window as it moves in one of eight directions within a given training image (see text for more details).

In the first experiment, a network of model neurons was trained on sequences of natural image patches extracted as described above. The temporal extent of processing was set to k = 8 time steps, resulting in eight sets of synaptic weight vectors. These vectors, which form the rows of the matrices U(j)^T, j = 1, ..., 8, were initialized to uniformly random values and normalized to length one. After the presentation of each training image patch, the response vector r was updated according to the on-line estimation rule given by equation 18, and the weights U(j) were adapted at each time step using equation 19.
Similarly, the lateral weights L were adapted using equation 20. The diagonal of L, representing self-inhibitory decay terms or the "leakiness" parameters in the leaky integrator equation 18, was held fixed, although qualitatively similar results were obtained if the diagonal was also adapted along with the lateral weights. The learning rate parameters for the weights were gradually decreased during training, and a stable solution was eventually reached.

Figure 4: Example of Receptive Field Dynamics after Learning. (A) shows the initial set of synaptic weights (or receptive fields) for a single model neuron. The weights shown comprised the first row of U(j)^T for j = 1, ..., 8. The components of these vectors were initialized to random values according to a uniform distribution and the resulting vectors were normalized to length one. (B) shows the same set of synaptic weights after training the network on natural images (see text for details). The synaptic profiles at each time step resemble oriented Gabor wavelets [Daugman, 1980; Marcelja, 1980]. (C) depicts these synaptic weights using a classical 2D receptive field representation, where the dark regions are inhibitory and the bright regions are excitatory. The entire sequence of synaptic weights suggests that this neuron is tuned towards dark bars moving diagonally from the bottom left corner of the receptive field to the top right corner.

Figure 4 shows a set of synaptic weight vectors (the first row of U(j)^T for j = 1, ..., 8) for a model neuron before and after learning (A and B respectively). Since these synaptic vectors form the feedforward weighting function of neurons in the network (see Figure 2B), they can be roughly interpreted as "receptive fields" or spatial impulse response functions for time steps t = 1, ..., 8. The receptive fields after learning resemble localized Gabor wavelets, which have previously been shown to approximate well the receptive field weighting profiles of simple cells in the mammalian primary visual cortex [Daugman, 1980; Marcelja, 1980; Olshausen and Field, 1996]. In addition, the model neuron can be seen to be tuned towards dark bars moving diagonally from the bottom left corner of the receptive field to the top right corner. A number of other examples of receptive fields developed by the network are shown in Figure 5. The set of spatiotemporal synaptic weights together forms an overcomplete set of basis functions for representing spatiotemporal input sequences.

Figure 5: Further Examples of Receptive Fields. (a) through (p) show examples of synaptic weights for sixteen other model neurons from the network described in Figure 4. Several varieties of orientation selectivity can be discerned, with each neuron being tuned towards a particular direction of motion. Other neurons, such as those depicted in (o) and (p), appear to be selective for oriented image structures that are approximately stationary or contain motion along the preferred orientation.

In a second set of experiments, we investigated the importance of rectification and lateral inhibition in learning visual cortex-like receptive fields. Three networks using the same set of parameters, initialization conditions, and training patches as the one used in Figures 4 and 5 were subjected to three different conditions. In the first network, rectification was removed but the lateral weights were learned and used as before, while in the second network, the lateral weights (including the diagonal terms) were disabled but rectification was retained. The third network was trained without either lateral inhibition or rectification of responses. Figure 6 compares the results obtained after training for the same model neuron as in Figure 4. This specific example, as well as the results obtained for other model neurons in the networks, suggests that both lateral inhibition and rectification are necessary for obtaining synaptic profiles resembling visual cortical receptive fields.

In a third set of experiments, the results described above were verified for a larger receptive field size. The training paradigm involving the natural image patches was identical to the first case. A network of model neurons was trained on sequences of larger natural image patches. The temporal extent of processing was set to k = 12 time steps.
Figure 6: The Need for Lateral Inhibition and Rectification during Learning. The figure shows the synaptic profiles obtained after training for the same neuron (from Figure 4) with identical initial conditions and identical training regimes under four conditions: with both rectification and lateral inhibition, with lateral inhibition only, with rectification only, and without either rectification or lateral inhibition. As can be seen from this specific example, which is typical of the results obtained for other neurons in the networks, both lateral inhibition and rectification appear to be necessary for obtaining synaptic profiles resembling visual cortical receptive fields. Neither rectification nor lateral inhibition alone seemed to suffice to produce the desired neural receptive fields.

The parameters were set in a similar manner to the first experiment: the diagonal of L was held fixed, and the learning rate parameters were gradually decreased during training until a stable solution was reached. Figure 7 shows a set of synaptic weight vectors (or receptive fields) for a model neuron before and after learning. The model neuron can be seen to be tuned towards dark horizontal bars moving downwards. Several other examples of receptive fields developed by the network are shown in Figure 8. A majority of these neurons appear to be tuned towards oriented image structures moving in a particular direction. Some neurons, such as the one shown in (k), exhibit more complex dynamics, involving two or more inhibitory subregions coalescing into one, while other neurons, such as the one shown in (l), appear to be tuned towards oriented image structures that are either approximately stationary or contain some form of motion along the preferred orientation.

The space-time receptive fields for the model neurons in Figures 7 and 8 are shown in Figure 9. These x-t plots were obtained by integrating the 3-D spatiotemporal receptive field data along the neuron's preferred orientation, as illustrated in the top two rows of the figure. The inverse of the slope of the oriented subregions in the space-time receptive field provides an estimate of the neuron's preferred velocity (see, for example, [Adelson and Bergen, 1985]). Thus, in the case of the top two rows in the figure, the slope is approximately one, indicating a preferred speed of approximately one pixel per time step for these two neurons in their respective directions. The space-time receptive fields in Figure 9 (b) through (l) are those for the model neurons in Figure 8 (b) through (l). Note that even though the training image window moved at one pixel per time step, in some cases, such as (d) and (h), the preferred speed is less than one pixel per time step due to the well-known "aperture effect" [Adelson and Movshon, 1982]. In the extreme case of an approximately stationary receptive field as in (l), the space-time receptive field indicates a preferred speed of zero.
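The construction of the x-t plots and the velocity estimate described above can be summarized in a few lines of code. The sketch below is illustrative only: it uses a synthetic moving-bar receptive field rather than learned weights, and it assumes the preferred orientation is vertical so that the integration is along the y axis.

    import numpy as np

    # A synthetic spatiotemporal receptive field: a vertical bright bar drifting rightward
    # at one pixel per time step, stored as an array of shape (k, height, width).
    k, size = 8, 8
    rf = np.zeros((k, size, size))
    for t in range(k):
        rf[t, :, t % size] = 1.0

    # Collapse along the preferred orientation (here the y axis) to obtain the x-t plot.
    xt = rf.sum(axis=1)                              # shape (k, width): rows are time, columns are space

    # Estimate the preferred speed as the displacement of the profile's peak per time step;
    # this is the inverse of the slope of the oriented subregion in the x-t plot.
    peaks = xt.argmax(axis=1)
    speed = np.polyfit(np.arange(k), peaks, 1)[0]
    print(round(speed, 2), "pixels per time step")   # approximately 1.0 for this synthetic example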
Figure 7: Example of Receptive Field Dynamics after Learning. (A) shows the initial set of random and normalized synaptic weights for a single model neuron in the network used in the third set of experiments. (B) shows the same set of synaptic weights after training the network on the filtered natural images (see text for details). (C) depicts these synaptic weights using a classical 2D receptive field representation (dark regions are inhibitory, bright regions are excitatory). The sequence of synaptic weights suggests that this neuron is tuned towards dark horizontal bars moving downwards.

Figure 8: Further Examples of Receptive Fields. (a) through (l) show examples of synaptic weights for twelve other model neurons from the network used in Figure 7. A majority of these neurons are tuned towards oriented image structures moving in a particular direction. Some neurons, such as the one shown in (k), exhibit more complex dynamics, for example, involving two or more inhibitory subregions coalescing into one. Other neurons, such as the one shown in (l), appear to be tuned towards oriented image structures that are either approximately stationary or contain some motion along the preferred orientation.

Figure 9: Space-Time Receptive Fields. The top two rows depict the process of constructing space-time receptive field profiles from temporal sequences of 2-D spatial receptive fields. The top row is from Figure 7 while the second row is from Figure 8 (a). The x-t plots are obtained by integrating the 3-D spatiotemporal data along the neuron's preferred orientation (in these two cases, along the horizontal direction). Note that orientation in x-t space indicates the neuron's preferred direction of motion (in this case, upwards or downwards). In addition, the slope of approximately one in both cases indicates a preferred speed of approximately one pixel per time step for these neurons (since the slope is inversely related to the preferred velocity [Adelson and Bergen, 1985]). (b) through (l) show the space-time receptive fields for the model neurons in Figure 8 (b) through (l). In some cases, such as (d) and (h), the preferred speed is less than one pixel per time step due to the aperture effect [Adelson and Movshon, 1982]. In the case of (l), the preferred speed is zero.
In order to evaluate the nature of the internal representations used by the network, we exposed the trained network and a companion network without the lateral inhibitory weights to a sequence of images depicting a bright vertical bar moving to the right on a dark background (Figure 10). In both cases, the responses of the model neurons in each network were plotted as histograms at each time step, with the first vertical bar corresponding to the response of the first neuron, the second to that of the second neuron, and so on. As seen in Figure 10, the trained network generates sparse distributed representations of the spatiotemporal input sequence (only a few neurons are active at each time step). On the other hand, disabling the lateral inhibitory connections results in a much larger number of neurons being active at each time step, suggesting that the lateral connections play a crucial role in generating sparse distributed representations of input stimuli.

In the final set of experiments, we analyzed the direction selectivity of model neurons in the trained network. Figure 11 illustrates the experimental methodology used. Each neuron in the network was exposed to oriented dark/bright bars on a bright/dark background moving in a direction perpendicular to the bar's orientation (bright or dark bars were chosen depending on which case elicited the largest response from the neuron). The direction of motion was varied over the full range of directions in fixed angular steps. Figure 11B shows the cases where the direction of motion is downwards and upwards, respectively. Figure 11C shows the response of the model neuron in these two cases as a function of the time from stimulus onset. As expected from the structure of its space-time receptive field, the neuron exhibits a significant response for a bar moving downwards (preferred direction), while a bar moving upwards (null direction) elicits little or no response. A direction selectivity index (DSI) was defined for each model neuron as:

    DSI = 1 - (Peak Response in Null Direction) / (Peak Response in Preferred Direction)    (22)

The preferred direction was taken to be the direction of motion eliciting the largest response from the model neuron, and the null direction was set to the opposite direction. Thus, a neuron with a peak response of zero in the null direction and a nonzero response in the preferred direction has a DSI of one (or 100%), while a neuron with equal responses in both directions has a DSI of zero. Figure 12A shows polar response plots of four model neurons with increasing DSI from left to right. The rightmost plot is for the neuron in Figure 11, and its DSI of 92.25% confirms its relatively high direction selectivity. Figure 12B shows the population distribution of direction selectivity in the trained network. A relatively large proportion of the neurons had a high direction selectivity index.
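The direction selectivity analysis reduces to a few lines of code. The sketch below is illustrative; the peak response values are invented for the example rather than measured from the trained network, and only the DSI computation of Equation 22 is taken from the text.

    # Hypothetical peak responses of one model neuron to bars moving in each tested
    # direction (degrees -> peak response); the values here are made up for illustration.
    peak_response = {0: 0.02, 45: 0.05, 90: 0.18, 135: 0.04,
                     180: 0.01, 225: 0.02, 270: 0.03, 315: 0.02}

    def direction_selectivity_index(peak_response):
        """Equation 22: DSI = 1 - (peak response in null dir.) / (peak response in preferred dir.)."""
        preferred = max(peak_response, key=peak_response.get)
        null = (preferred + 180) % 360                     # opposite direction of motion
        dsi = 1.0 - peak_response[null] / peak_response[preferred]
        return preferred, dsi

    preferred, dsi = direction_selectivity_index(peak_response)
    print(f"preferred direction: {preferred} deg, DSI = {100 * dsi:.2f}%")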
6 Discussion

The results suggest that many of the important spatiotemporal properties of visual cortical neurons, such as velocity and direction selectivity, can be explained in terms of learning efficient spatiotemporal generative models for natural time varying images. Efficiency is defined not only in terms of reducing image reconstruction errors but also in terms of reducing statistical dependencies between network outputs. This redundancy reduction is implemented via recurrent lateral inhibition as derived from the MDL principle. The network also utilizes other lateral connections which are derived from the generative model. These lateral excitatory/inhibitory connections play a role similar to those used in Bayesian belief networks for facilitating the phenomenon of "explaining away" [Pearl, 1988]. The network additionally includes rectification of outputs to encourage sparseness among the network activities. Although rectification helps in the development of sparse distributed representations, rectification alone was found to be insufficient for producing satisfactory space-time receptive fields. Likewise, lateral inhibition alone was also found to be insufficient.

Figure 10: Lateral Inhibition and Sparse Distributed Representations of Input Stimuli. To test the role of the lateral inhibitory weights (see Figure 2) in generating sparse distributed representations, these weights were removed (after learning) and the network was exposed to a bright vertical bar moving to the right. The responses of the neurons in the network at the various time steps are shown as histograms (the first vertical bar is the response of the first neuron, the second is that of the second neuron, and so on). The histograms have been normalized such that the maximum response in each graph is one. The responses with lateral inhibition intact are shown to the right at each time step. As is evident from the relatively sparse number of neurons active at each time step in the "with lateral inhibition" case, the existence of lateral inhibitory connections in the network appears to be crucial for generating sparse distributed representations of the spatiotemporal input stimuli.

Figure 11: Example of Direction Selectivity. The model neuron from Figure 7 was tested for direction selectivity. (A) shows the space-time receptive field of this neuron. (B) depicts the two input stimuli used for testing: a dark horizontal bar on a white background, moving either downwards or upwards. (C) shows the response of the neuron as a function of the time from stimulus onset. As expected from the structure of its space-time receptive field, the neuron exhibits a significant response for a bar moving downwards (preferred direction), reaching its peak response at 11 time steps from stimulus onset. On the other hand, a bar moving upwards (null direction) elicits little or no response.
Figure 12: Analysis of Direction Selectivity. (A) shows polar response plots of four model neurons when exposed to bars oriented at a particular angle and moving in a direction perpendicular to the orientation. The radius of the plot indicates the strength of the response while the angle indicates the direction of motion of the bar. The sequence of plots is arranged from left to right in order of increasing direction selectivity, as given by the direction selectivity index DSI (the four neurons shown have DSIs of 1.77%, 22.00%, 63.46%, and 92.25%; see text for the definition). The polar plot on the extreme right is for the neuron in Figure 11, confirming its relatively high direction selectivity. (B) shows the distribution of direction selectivity in the population of neurons in the trained network. A relatively large proportion of the neurons had a high direction selectivity index.

Several previous models of direction selectivity have utilized recurrent lateral interactions and rectification [Suarez et al., 1995; Maex and Orban, 1996; Mineiro and Zipser, 1997], although without the statistical perspective pursued herein. In addition, the receptive fields in many of these approaches were hard-wired by hand rather than learned. An interesting issue is how the spatiotemporally varying synaptic weights of neurons in the present model can be implemented biologically by intrinsic synaptic mechanisms and axonal-dendritic interactions over time. In this regard, it is possible that space-time receptive fields similar to those obtained by the present method may also arise in other neural circuits that do not explicitly use spatiotemporal synaptic weights.

Most of the synaptic weights in the trained networks were found to be space-time inseparable. The lack of larger numbers of separable receptive fields in the trained networks suggests an alternative mechanism for the generation of such receptive fields. Most cortical receptive fields that are space-time separable have a temporal weighting profile that approximates a derivative in time. We have previously shown [Rao and Ballard, 1997b] that a generative model based on a first-order Taylor series approximation of an image produces localized oriented filters that compute spatial derivatives for estimating translations in the image plane. A possibility that we are currently investigating is to use a Taylor series expansion of an image in both space and time, and to ascertain whether such a strategy produces separable filters that compute derivatives in both space and time. Other issues being pursued include explaining contrast normalization effects [Albrecht and Geisler, 1991; Heeger, 1991] and recasting the hierarchical framework proposed in [Rao and Ballard, 1997a] to accommodate the spatiotemporal generative model proposed herein.

References

[Adelson and Bergen, 1985] E.H. Adelson and J. Bergen. Spatiotemporal energy models for the perception of motion. J. Opt. Soc. Am. A, 2(2):284–299, 1985.

[Adelson and Movshon, 1982] E.H. Adelson and J.A. Movshon. Phenomenal coherence of moving visual patterns. Nature, 300:523–525, 1982.
[Albrecht and Geisler, 1991] D.G. Albrecht and W.S. Geisler. Motion sensitivity and the contrast-response function of simple cells in the visual cortex. Visual Neurosci., 7:531–546, 1991.

[Atick and Redlich, 1992] J.J. Atick and A.N. Redlich. What does the retina know about natural scenes? Neural Computation, 4(2):196–210, 1992.

[Atick and Redlich, 1993] J.J. Atick and A.N. Redlich. Convergent algorithm for sensory receptive field development. Neural Computation, 5:45–60, 1993.

[Atick, 1992] J.J. Atick. Could information theory provide an ecological theory of sensory processing. Network, 3:213–251, 1992.

[Baddeley and Hancock, 1991] R.J. Baddeley and P.J.B. Hancock. A statistical analysis of natural images matches psychophysically derived orientation tuning curves. Proc. R. Soc. Lond. Ser. B, 246:219–223, 1991.

[Barlow, 1961] H.B. Barlow. Possible principles underlying the transformation of sensory messages. In W.A. Rosenblith, editor, Sensory Communication, pages 217–234. Cambridge, MA: MIT Press, 1961.

[Barlow, 1972] H.B. Barlow. Single units and cognition: A neurone doctrine for perceptual psychology. Perception, 1:371–394, 1972.

[Barlow, 1989] H.B. Barlow. Unsupervised learning. Neural Computation, 1:295–311, 1989.

[Barlow, 1994] H.B. Barlow. What is the computational goal of the neocortex? In C. Koch and J.L. Davis, editors, Large-Scale Neuronal Theories of the Brain, pages 1–22. Cambridge, MA: MIT Press, 1994.

[Barrow, 1987] H.G. Barrow. Learning receptive fields. In Proceedings of the IEEE Int. Conf. on Neural Networks, pages 115–121, 1987.

[Bell and Sejnowski, 1997] A.J. Bell and T.J. Sejnowski. The 'independent components' of natural scenes are edge filters. Vision Research (in press), 1997.

[Bienenstock et al., 1982] E. L. Bienenstock, L. N. Cooper, and P. W. Munro. Theory for the development of neuron selectivity: orientation specificity and binocular interaction in visual cortex. J. Neurosci., 2:32–48, 1982.

[Bryson and Ho, 1975] A.E. Bryson and Y.-C. Ho. Applied Optimal Control. New York: John Wiley and Sons, 1975.

[Burr et al., 1986] D.C. Burr, J. Ross, and M.C. Morrone. Seeing objects in motion. Proc. R. Soc. Lond. Ser. B, 227:249–265, 1986.

[Daugman, 1980] J.G. Daugman. Two-dimensional spectral analysis of cortical receptive field profiles. Vision Research, 20:847–856, 1980.

[Daugman, 1988] J.G. Daugman. Complete discrete 2-D Gabor transforms by neural networks for image analysis and compression. IEEE Trans. Acoustics, Speech, and Signal Proc., 36(7):1169–1179, 1988.

[Dayan et al., 1995] P. Dayan, G.E. Hinton, R.M. Neal, and R.S. Zemel. The Helmholtz machine. Neural Computation, 7:889–904, 1995.

[DeAngelis et al., 1993a] G.C. DeAngelis, I. Ohzawa, and R.D. Freeman. Spatiotemporal organization of simple-cell receptive fields in the cat's striate cortex. I. General characteristics and postnatal development. J. Neurophysiol., 69(4):1091–1117, 1993.

[DeAngelis et al., 1993b] G.C. DeAngelis, I. Ohzawa, and R.D. Freeman. Spatiotemporal organization of simple-cell receptive fields in the cat's striate cortex. II. Linearity of temporal and spatial summation. J. Neurophysiol., 69(4):1091–1117, 1993.

[DeAngelis et al., 1995] G.C. DeAngelis, I. Ohzawa, and R.D. Freeman. Receptive-field dynamics in the central visual pathways. Trends in Neuroscience, 18:451–458, 1995.

[Dempster et al., 1977] A.P. Dempster, N.M. Laird, and D.B. Rubin. Maximum likelihood from incomplete data via the EM algorithm. J. Royal Statistical Society Series B, 39:1–38, 1977.
[Dong and Atick, 1995] D.W. Dong and J.J. Atick. Statistics of natural time-varying images. Network: Computation in Neural Systems, 6(3):345–358, 1995.

[Eckert and Buchsbaum, 1993] M.P. Eckert and G. Buchsbaum. Efficient encoding of natural time varying images in the early visual system. Phil. Trans. R. Soc. Lond. B, 339:385–395, 1993.

[Emerson et al., 1987] R.C. Emerson, M.C. Citron, W.J. Vaughn, and S.A. Klein. Nonlinear directionally selective subunits in complex cells of cat striate cortex. J. Neurophysiol., 58:33–65, 1987.

[Field, 1987] D.J. Field. Relations between the statistics of natural images and the response properties of cortical cells. J. Opt. Soc. Am. A, 4:2379–2394, 1987.

[Field, 1994] D.J. Field. What is the goal of sensory coding? Neural Computation, 6:559–601, 1994.

[Foldiak, 1990] P. Foldiak. Forming sparse representations by local anti-Hebbian learning. Biol. Cybern., 64:165–170, 1990.

[Hancock et al., 1992] P.J.B. Hancock, R.J. Baddeley, and L.S. Smith. The principal components of natural images. Network, 3:61–70, 1992.

[Harpur and Prager, 1996] G.F. Harpur and R.W. Prager. Development of low-entropy coding in a recurrent network. Network, 7:277–284, 1996.

[Hartline, 1940] H.K. Hartline. The receptive fields of optic nerve fibers. Am. J. Physiol., 130:690–699, 1940.

[Heeger, 1991] D.J. Heeger. Non-linear model of neural responses in cat visual cortex. In Computational models of visual processing, pages 119–133. Cambridge, MA: MIT Press, 1991.

[Hinton and Ghahramani, 1997] G.E. Hinton and Z. Ghahramani. Generative models for discovering sparse distributed representations. Phil. Trans. Roy. Soc. Lond. B, 1997. To appear.

[Hinton and Sejnowski, 1986] G.E. Hinton and T.J. Sejnowski. Learning and relearning in Boltzmann machines. In D.E. Rumelhart and J.L. McClelland, editors, Parallel Distributed Processing, volume 1, chapter 7, pages 282–317. MIT Press, Cambridge, 1986.

[Hubel and Wiesel, 1962] D.H. Hubel and T.N. Wiesel. Receptive fields, binocular interaction, and functional architecture in the cat's visual cortex. Journal of Physiology (London), 160:106–154, 1962.

[Hubel and Wiesel, 1968] D.H. Hubel and T.N. Wiesel. Receptive fields and functional architecture of monkey striate cortex. Journal of Physiology (London), 195:215–243, 1968.

[Huber, 1985] P.J. Huber. Projection pursuit (with discussion). Annals of Statistics, 13:435–525, 1985.

[Intrator, 1992] N. Intrator. Feature extraction using an unsupervised neural network. Neural Computation, 4(1):98–107, 1992.

[Jordan and Rumelhart, 1992] M.I. Jordan and D.E. Rumelhart. Forward models: Supervised learning with a distal teacher. Cognitive Science, 16:307–354, 1992.

[Kalman, 1960] R.E. Kalman. A new approach to linear filtering and prediction theory. Trans. ASME J. Basic Eng., 82:35–45, 1960.

[Law and Cooper, 1994] C.C. Law and L.N. Cooper. Formation of receptive fields in realistic visual environments according to the Bienenstock, Cooper, and Munro (BCM) theory. Proc. Natl. Acad. Sci. USA, 91:7797–7801, 1994.

[Lee and Seung, 1997] D.D. Lee and H.S. Seung. Unsupervised learning by convex and conic coding. In M. Mozer, M. Jordan, and T. Petsche, editors, Advances in Neural Information Processing Systems 9. Cambridge, MA: MIT Press, 1997. To appear.

[Linsker, 1988] R. Linsker. Self-organization in a perceptual network. Computer, 21(3):105–117, 1988.

[Maex and Orban, 1996] R. Maex and G.A. Orban. Model circuit of spiking neurons generating directional selectivity in simple cells. J. Neurophysiol., 75(4):1515–1545, 1996.
[Marcelja, 1980] S. Marcelja. Mathematical description of the responses of simple cortical cells. Journal of the Optical Society of America, 70:1297–1300, 1980.

[McLean and Palmer, 1989] J. McLean and L.A. Palmer. Contribution of linear spatiotemporal receptive field structure to velocity selectivity of simple cells in the cat's striate cortex. Vision Research, 29:675–679, 1989.

[McLean et al., 1994] J. McLean, S. Raab, and L.A. Palmer. Contribution of linear mechanisms to the specification of local motion by simple cells in areas 17 and 18 of the cat. Visual Neurosci., 11:271–294, 1994.

[Mineiro and Zipser, 1997] P. Mineiro and D. Zipser. Analysis of direction selectivity arising from recurrent cortical interactions. Technical Report 97.03, Dept. of Cog. Science, UCSD, 1997.

[Mumford, 1994] D. Mumford. Neuronal architectures for pattern-theoretic problems. In C. Koch and J.L. Davis, editors, Large-Scale Neuronal Theories of the Brain, pages 125–152. Cambridge, MA: MIT Press, 1994.

[Oja, 1989] E. Oja. Neural networks, principal components, and subspaces. International Journal of Neural Systems, 1:61–68, 1989.

[Olshausen and Field, 1996] B.A. Olshausen and D.J. Field. Emergence of simple-cell receptive field properties by learning a sparse code for natural images. Nature, 381:607–609, 1996.

[Olshausen and Field, 1997] B.A. Olshausen and D.J. Field. Sparse coding with an overcomplete basis set: A strategy employed by V1? Vision Research, 1997. To appear.

[Pearl, 1988] J. Pearl. Probabilistic Reasoning in Intelligent Systems. Morgan Kaufmann, San Mateo, CA, 1988.

[Pece, 1992] A.E.C. Pece. Redundancy reduction of a Gabor representation: a possible computational role for feedback from primary visual cortex to lateral geniculate nucleus. In I. Aleksander and J. Taylor, editors, Artificial Neural Networks 2, pages 865–868. Amsterdam: Elsevier Science, 1992.

[Rao and Ballard, 1997a] R.P.N. Rao and D.H. Ballard. Dynamic model of visual recognition predicts neural response properties in the visual cortex. Neural Computation, 9(4):721–763, 1997.

[Rao and Ballard, 1997b] R.P.N. Rao and D.H. Ballard. Localized receptive fields may mediate transformation-invariant recognition in the visual cortex. Technical Report 97.2, National Resource Laboratory for the Study of Brain and Behavior, Department of Computer Science, University of Rochester, May 1997.

[Reid et al., 1991] R.C. Reid, R.E. Soodak, and R.M. Shapley. Directional selectivity and spatiotemporal structure of receptive fields of simple cells in cat striate cortex. J. Neurophysiol., 66:505–529, 1991.

[Rissanen, 1989] J. Rissanen. Stochastic Complexity in Statistical Inquiry. Singapore: World Scientific, 1989.

[Sanger, 1989] T.D. Sanger. Optimal unsupervised learning in a single-layer linear feedforward neural network. Neural Networks, 2:459–473, 1989.

[Shapley et al., 1992] R.M. Shapley, R.C. Reid, and R. Soodak. Spatiotemporal receptive fields and direction selectivity. In M.S. Landy and J.A. Movshon, editors, Computational Models of Visual Processing, pages 109–118. Cambridge, MA: MIT Press, 1992.

[Shouval, 1995] H. Shouval. Formation and organisation of receptive fields, with an input environment composed of natural scenes. PhD thesis, Dept. of Physics, Brown University, 1995.

[Suarez et al., 1995] H. Suarez, C. Koch, and R. Douglas. Modeling direction selectivity of simple cells in striate visual cortex with the framework of the canonical microcircuit. J. Neurosci., 15(10):6700–6719, 1995.
[Watson and Ahumada, 1985] A.B. Watson and A.J. Ahumada. Model of human visual-motion sensing. J. Opt. Soc. Am. A, 2(2):322–341, 1985.

[Wiener, 1949] N. Wiener. The Extrapolation, Interpolation, and Smoothing of Stationary Time Series with Engineering Applications. New York: Wiley, 1949.

[Williams, 1985] R.J. Williams. Feature discovery through error-correction learning. Technical Report 8501, Institute for Cognitive Science, University of California at San Diego, 1985.

[Zemel, 1994] R.S. Zemel. A Minimum Description Length Framework for Unsupervised Learning. PhD thesis, Department of Computer Science, University of Toronto, 1994.