Efficient Encoding of Natural Time Varying Images
Produces Oriented Space-Time Receptive Fields
Rajesh P. N. Rao and Dana H. Ballard
Department of Computer Science
University of Rochester
Rochester, NY 14627
{rao,dana}@cs.rochester.edu
Technical Report 97.4
National Resource Laboratory for the Study of Brain and Behavior
Department of Computer Science, University of Rochester
August 1997

This research was supported by NIH/PHS research grant 1-P41-RR09283.
Abstract
The receptive fields of neurons in the mammalian primary visual cortex are oriented not only in
the domain of space, but in most cases, also in the domain of space-time. While the orientation of a
receptive field in space determines the selectivity of the neuron to image structures at a particular orientation, a receptive field’s orientation in space-time characterizes important additional properties such as
velocity and direction selectivity. Previous studies have focused on explaining the spatial receptive field
properties of visual neurons by relating them to the statistical structure of static natural images. In this
report, we examine the possibility that the distinctive spatiotemporal properties of visual cortical neurons
can be understood in terms of a statistically efficient strategy for encoding natural time varying images.
We describe an artificial neural network that attempts to accurately reconstruct its spatiotemporal input data while simultaneously reducing the statistical dependencies between its outputs. The network
utilizes spatiotemporally summating neurons and learns efficient sparse distributed representations of
its spatiotemporal input stream by using recurrent lateral inhibition and a simple threshold nonlinearity
for rectification of neural responses. When exposed to natural time varying images, neurons in a simulated network developed localized receptive fields oriented in both space and space-time, similar to the
receptive fields of neurons in the primary visual cortex.
1 Introduction
Since the seminal experiments of Hubel and Wiesel over 30 years ago [Hubel and Wiesel, 1962; 1968], it
has been known that neurons in the mammalian primary visual cortex respond selectively to stimuli such
as edges or bars at particular orientations. In many cases, the neurons are directionally selective, i.e., they
respond only to motion in a particular direction. An especially useful concept in characterizing the response
properties of visual neurons has been the notion of a receptive field. The receptive field of a neuron is
classically defined as the area of visual space within which stimuli such as bars or edges can elicit responses
from the neuron [Hartline, 1940]. Although they are a function of both space and time, early depictions of
visual receptive fields were confined to spatial coordinates. In recent years, new mapping techniques have
allowed the characterization of receptive fields in both space and time [Emerson et al., 1987; McLean and
Palmer, 1989; Shapley et al., 1992; DeAngelis et al., 1993a] (see [DeAngelis et al., 1995] for a review). The
new mapping results indicate that in most cases, the receptive field of a visual neuron changes over time.
It has been noted that while the spatial structure of a receptive field indicates neuronal attributes such as
preference for a particular orientation of bars or edges, the spatiotemporal structure of a neuron’s receptive
field governs important dynamical properties such as velocity and direction selectivity [Adelson and Bergen,
1985; Watson and Ahumada, 1985; Burr et al., 1986]. In particular, orientation of a neuron’s receptive field
in space-time indicates the preferred direction of motion while the slope of the oriented subregions gives an
estimate of the preferred velocity [McLean and Palmer, 1989; Albrecht and Geisler, 1991; Reid et al., 1991;
DeAngelis et al., 1993b; McLean et al., 1994].
An attractive approach to understanding the receptive field properties of visual neurons is to relate
them to the statistical structure of natural images. Motivated by the property that natural images possess an
approximately 1/f^2 power spectrum [Field, 1987], Atick and Redlich [Atick, 1992; Atick and Redlich, 1992] provided
an explanation of the center-surround structure of retinal ganglion receptive fields in terms of whitening
or decorrelation of outputs in response to natural images. Several Hebbian learning algorithms for decorrelation have also been proposed [Bienenstock et al., 1982; Williams, 1985; Barrow, 1987; Linsker, 1988;
Oja, 1989; Sanger, 1989; Foldiak, 1990; Atick and Redlich, 1993], many of which perform Principal Component Analysis (PCA). Although the PCA of natural images produces lower order components that resemble oriented filters [Baddeley and Hancock, 1991; Hancock et al., 1992], the higher order components
are unlike any known neural receptive field profiles. In addition, the receptive fields obtained are global
rather than localized feature detectors. Recently, Olshausen and Field showed that a neural network that
includes the additional constraint of maximizing the sparseness of the distribution of output activities develops, when trained on static natural images, synaptic weights with localized, oriented spatial receptive
fields [Olshausen and Field, 1996] (see also [Harpur and Prager, 1996; Rao and Ballard, 1997a] and related
work on projection pursuit [Huber, 1985] based learning methods [Intrator, 1992; Law and Cooper, 1994;
Shouval, 1995]). Similar results have also been obtained using an algorithm that extracts the independent components of a set of static natural images [Bell and Sejnowski, 1997]. These algorithms are all
based directly or indirectly on Barlow’s principle of redundancy reduction [Barlow, 1961; 1972; 1989;
1994], where the goal is to learn “feature detectors” whose outputs are as statistically independent as possible. The underlying motivation is that sensory inputs such as images are generally comprised of a set of
independent objects or features whose components are highly correlated. By learning detectors for these
independent features, the sensory system can develop accurate internal models of the sensory environment
and can efficiently represent external events as sparse conjunctions of independent features.
In this paper, we explore the possibility that the distinctive spatiotemporal receptive field properties of
visual cortical neurons can be understood in terms of a statistically efficient strategy for encoding natural
time varying images [Eckert and Buchsbaum, 1993; Dong and Atick, 1995]. We describe an artificial neural
network that attempts to accurately reconstruct its spatiotemporal input data while simultaneously reducing
the statistical dependencies between its outputs, as advocated by the redundancy reduction principle. Our
approach utilizes a spatiotemporal generative model that can be viewed as a simple extension of the spatial
generative model used by Harpur and Prager [Harpur and Prager, 1996], Olshausen and Field [Olshausen and
Field, 1996], Rao and Ballard [Rao and Ballard, 1997a], and others. The spatiotemporal generative model
allows neurons in the network to perform not just a spatial summation of the current input, but a spatiotemporal
summation of both current and past inputs over a finite spatiotemporal extent. The network learns efficient
sparse distributed representations of its spatiotemporal input stream by utilizing lateral inhibition [Foldiak,
1990] and a simple threshold nonlinearity for rectification of neural responses [Lee and Seung, 1997;
Hinton and Ghahramani, 1997]. When exposed to natural time varying images, neurons in a simulated
network developed localized receptive fields oriented in both space and space-time, similar to the receptive
fields of neurons in the primary visual cortex.
2 Spatial Generative Models
The idea of spatial generative models has received considerable attention in recent studies pertaining to
neural coding [Hinton and Sejnowski, 1986; Jordan and Rumelhart, 1992; Zemel, 1994; Dayan et al., 1995;
Hinton and Ghahramani, 1997], although the roots of the approach can be traced back to early ideas in
control theory such as Wiener filtering [Wiener, 1949] and Kalman filtering [Kalman, 1960]. In this section,
we first consider a class of spatial generative models that have previously been used in the neural modeling
literature for explaining spatial receptive field properties [Harpur and Prager, 1996; Olshausen and Field,
1996; Rao and Ballard, 1997a]. This will serve to motivate the spatiotemporal models we will be concerned
with later.
Assume that an image, denoted by a vector I of pixels, can be represented as a linear combination of
a set of basis vectors U_j:

    I = \sum_j r_j U_j    (1)

The coefficients r_j can be regarded as an internal representation of the spatial characteristics of the image I, as
interpreted using the internal model defined by the basis vectors U_j. In terms of a neuronal network, the
coefficients correspond to the activities or firing rates of neurons while the vectors in the basis matrix correspond
to the synaptic weights of neurons. It is convenient to rewrite the above equation in matrix form as:

    I = U r    (2)

where U is the matrix whose columns consist of the basis vectors U_j and r is the vector consisting of the
coefficients r_j.
The goal is to estimate the coefficients r for a given image and, on a longer time scale, learn appropriate
basis vectors in U. A standard approach is to define a least-squared error criterion of the form:

    E = \sum_i ( I_i - U_i r )^2    (3)
      = ( I - U r )^T ( I - U r )    (4)

where I_i denotes the i th pixel of I and U_i denotes the i th row of U. Note that E is simply the sum of
squared pixel-wise errors between the input I and the image reconstruction U r. Estimates for U and r can
be obtained by minimizing E [Williams, 1985; Daugman, 1988; Pece, 1992; Harpur and Prager, 1996;
Olshausen and Field, 1996; Rao and Ballard, 1997a].
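For concreteness, the following Python sketch illustrates the generative model of Equation 2 and the error criterion of Equations 3 and 4; the dimensions, the random basis, and the noise level are illustrative assumptions rather than values used in the experiments reported here.

```python
# Illustrative sketch of the spatial generative model I ~ U r (Eq. 2) and the
# sum of squared pixel-wise reconstruction errors (Eqs. 3-4). All dimensions
# and data are assumed for illustration only.
import numpy as np

rng = np.random.default_rng(0)
n_pixels, n_neurons = 64, 32

U = rng.standard_normal((n_pixels, n_neurons))    # basis matrix (synaptic weights)
r = rng.standard_normal(n_neurons)                # response (coefficient) vector
I = U @ r + 0.1 * rng.standard_normal(n_pixels)   # image = reconstruction + noise

def reconstruction_error(I, U, r):
    """E = sum of squared pixel-wise errors between I and its reconstruction U r."""
    residual = I - U @ r
    return residual @ residual

print(reconstruction_error(I, U, r))
```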
One can obtain a probabilistic generative model of the image generation process by utilizing a Gaussian
noise process n to model the differences between I and U r. The resulting stochastic generative model
becomes:

    I = U r + n    (5)

If a zero-mean Gaussian noise process n with unit covariance is assumed, one can show that E is the negative
log likelihood of generating the input I (see, for example, [Bryson and Ho, 1975]). Thus, minimizing E is
equivalent to maximizing the likelihood of the observed data.
Unfortunately, in many cases, minimization of a least-squares optimization function such as E without
additional constraints generates solutions that are far from being adequate descriptors of the true input
generation process. For example, a popular solution to the least-squares minimization criterion is principal
component analysis (PCA), also sometimes referred to as eigenvector or singular value decomposition
(SVD) of the input covariance matrix. PCA optimizes E by finding a set of mutually orthogonal basis vectors
that are aligned with the directions of maximum variance in the input data. This is perfectly adequate in
the case where the input data clouds are Gaussian and capturing pairwise statistics suffices. However, statistical studies have shown that natural image distributions are highly non-Gaussian and cannot be adequately
described using orthogonal bases [Field, 1994]. Thus, additional constraints are required in order to guide
the optimization process towards solutions that more accurately reflect the input generation process.
One way of adding constraints is to take into account the prior distributions of the parameters r and U.
Thus, one can minimize an optimization criterion of the form:

    E_1 = E + g(r) + h(U)    (6)

where g(r) and h(U) are terms related to the prior distributions of the parameters r and U. In particular,
they denote the negative log of the prior probabilities of r and U respectively. When viewed in the context
of information theory, these negative log probability terms in E_1 can be interpreted as representing the cost
of coding the parameters in bits (in base 2). Thus, the function E_1 can be given an interpretation in terms
of the minimum description length (MDL) principle [Rissanen, 1989; Zemel, 1994], namely, that solutions
are required not only to be accurate but also to be cheap in terms of coding length. This formalizes the
well-known Occam's Razor principle that advocates simplicity over complexity among solutions to a problem.
One may also note that minimizing E_1 is equivalent to maximizing the posterior probability of the parameters
given the input data (maximum a posteriori (MAP) estimation).
Specific choices of g and h determine the nature of the internal representations that will be learned.
For example, Olshausen and Field [Olshausen and Field, 1996] proposed costs of the form
g(r) = \alpha \sum_i S(r_i), where S is a nonlinear function that penalizes large activities, to encourage
sparseness in r. Alternately, one can use a zero-mean multivariate Gaussian prior on r [Rao and Ballard,
1997a] to yield the negative log of the prior density:

    g(r) = \alpha r^T L r    (7)

where \alpha is a positive constant and L denotes a set of lateral weights. The matrix L represents the inverse
covariance matrix of r. We show in the next section that this choice enforces lateral inhibition among
the output neurons, thereby encouraging sparse distributed representations, and leads to an "anti-Hebbian"
learning rule for the lateral weights equivalent to Foldiak's well-known adaptation rule [Foldiak, 1990].
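As an illustration, the cost function of Equation 6 with the priors of Equations 7 and 8 can be sketched compactly as follows; the constants alpha and beta and the input dimensions are assumed values, not those used in the simulations.

```python
# Illustrative sketch of the MDL-style cost E1 = E + g(r) + h(U) (Eq. 6) with
# g(r) = alpha * r^T L r (Eq. 7) and h(U) = beta * |U|^2 (Eq. 8).
# The constants alpha and beta are assumed values.
import numpy as np

def mdl_cost(I, U, r, L, alpha=0.5, beta=0.05):
    residual = I - U @ r
    reconstruction = residual @ residual      # data-fit term E (Eqs. 3-4)
    sparseness = alpha * r @ (L @ r)          # negative log prior on r (Eq. 7)
    weight_cost = beta * np.sum(U ** 2)       # negative log prior on U (Eq. 8)
    return reconstruction + sparseness + weight_cost
```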
Figure 1: Generative Models. (A) Linear spatial generative model used in [Harpur and Prager, 1996; Olshausen and Field, 1996;
Rao and Ballard, 1997a]. A given input image I is assumed to be generated by multiplying a basis vector matrix U with a set of
"hidden" causes represented by the spatial response vector r. (B) Spatiotemporal generative model used in this paper. An input
image sequence I(1), ..., I(k) is assumed to be generated by multiplying a set of basis matrices U(1), ..., U(k) with a
spatiotemporal response vector r.
In the case of h(U), for the sake of simplicity, we will assume a zero-mean Gaussian prior on the elements of
U, yielding the following value for h(U):

    h(U) = \beta \| U \|^2    (8)

where \beta is a positive constant and \| \cdot \|^2 denotes the sum of squares of the elements of the given matrix.
A final constraint is motivated both by biology and by information coding concerns. We will constrain
the network outputs (the coefficients r_i) to be non-negative, acknowledging the fact that the firing rate of a
neuron cannot be negative. The non-negativity constraint is especially attractive in information coding terms
since it causes an infinite density at zero for the rectified prior and a consequent low coding cost at zero
[Hinton and Ghahramani, 1997], which encourages sparseness among the outputs of the network.
(We are overlooking here the possibility that a single neuron can signal both positive and negative quantities by raising or lowering
its firing rate with respect to a fixed background firing rate corresponding to zero.)
3 Spatiotemporal Generative Models
In the previous section, the input data consisted of static images I, and we used a single basis matrix U to
capture the statistical structure of the space of input images. Furthermore, the spatial structure given by the
pixels of the image I was internally represented by a single spatial response vector r. We now draw an
analogy between time and space to define a spatiotemporal generative model.
Suppose our training set consists of different sequences of k images each, a given sequence being
denoted by the vectors I(1), ..., I(k). We will use a set of k basis matrices U(1), ..., U(k). For each t,
U(t) will be used to capture the statistical structure of the images occurring at the t th time step in the
training sequences. As will become clear in the next section, the underlying motivation here is that the
neurons in the network perform not just a spatial summation, but a space-time summation of inputs over
a finite spatiotemporal extent (see Figure 2). Thus, since inputs are weighted differentially depending on
their spatiotemporal history, we need to learn a set of synaptic weights, one for each time instant t. These
spatiotemporal synaptic weights, as given by the U(t), in turn determine the spatiotemporal receptive fields
of the neurons in the network.
A single spatiotemporal response vector r will be used to characterize a given spatiotemporal image
sequence I(1), ..., I(k), in much the same way as a single spatial response vector was previously used to
characterize the spatial structure of a given static image. We thus obtain the following space-time analog
of Equation 2:

    I(t) = U(t) r,   for t = 1, ..., k    (9)

Note that in the special case where k = 1, we obtain Equation 2.
From a probabilistic perspective, one can rewrite Equation 9 in the form of the following stochastic
generative model:

    \begin{bmatrix} I(1) \\ \vdots \\ I(k) \end{bmatrix} = \begin{bmatrix} U(1) \\ \vdots \\ U(k) \end{bmatrix} r + n    (10)

where n is a stochastic noise process accounting for the differences between the images I(t) and their
reconstructions U(t) r. Once again, it is easy to see that Equation 5 is a special case of the above generative
model, where k = 1. We can now define the following space-time analog of the optimization function in
Equation 4:

    E = \sum_t \sum_i ( I_i(t) - U_i(t) r )^2    (11)
      = \sum_t ( I(t) - U(t) r )^T ( I(t) - U(t) r )    (12)

This is simply the sum of squared pixel-wise reconstruction errors across both space and time. As in the
previous section, if n is assumed to be a zero-mean Gaussian with unit covariance, one can show that E is
the negative log likelihood of generating the inputs I(1), ..., I(k). Thus, minimizing E is equivalent to
maximizing the likelihood of the spatiotemporal input sequence.
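The following sketch illustrates this spatiotemporal model and the error of Equations 11 and 12; the sequence length, dimensions, and data are assumptions chosen only for illustration.

```python
# Illustrative sketch of the spatiotemporal generative model (Eqs. 9-10): a
# sequence I(1),...,I(k) is modelled as I(t) ~ U(t) r with a single response
# vector r shared across the k time steps. Dimensions are assumptions.
import numpy as np

rng = np.random.default_rng(1)
k, n_pixels, n_neurons = 4, 64, 32

U_seq = rng.standard_normal((k, n_pixels, n_neurons))   # one basis matrix per time step
r = rng.standard_normal(n_neurons)
I_seq = np.einsum('tpn,n->tp', U_seq, r)                # noiseless sequence from the model

def spatiotemporal_error(I_seq, U_seq, r):
    """Sum of squared pixel-wise reconstruction errors over space and time (Eqs. 11-12)."""
    residuals = I_seq - np.einsum('tpn,n->tp', U_seq, r)
    return np.sum(residuals ** 2)

print(spatiotemporal_error(I_seq, U_seq, r))   # zero for data generated exactly by the model
```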
As in the previous section, by assuming Gaussian priors on the parameters r and the U(t), we obtain the
following MDL-based cost function:

    E_1 = \sum_t ( I(t) - U(t) r )^T ( I(t) - U(t) r ) + \alpha r^T L r + \beta \sum_t \| U(t) \|^2    (13)

We will additionally constrain the elements of r to be non-negative as discussed in the previous section.
Thus, we are now left with the task of finding optimal estimates of r, the U(t), and L. This can be achieved
by minimizing E_1 using gradient descent as discussed in the next section.
4 Network Dynamics and Learning Rules
In this section, we describe gradient-descent based estimation rules for obtaining optimal estimates of r,
the U(t), and L. In the case where the input consists of batch data, as in Section 4.1, one may alternate
between the optimization of r for fixed U(t) and L, and the optimization of U(t) and L for fixed r, thereby
implementing a form of the expectation-maximization (EM) algorithm [Dempster et al., 1977]. In the case
of on-line data, which is considered in Section 4.2, the optimization of r occurs simultaneously with that of
U(t) and L. However, the learning rates for U(t) and L are set to values much smaller than the adaptation
rate of r.
4.1 Estimation of r
An optimal estimate of r can be obtained by performing stochastic gradient descent on E_1 with respect to r:

    \dot{r} = k_1 \left[ \sum_t U(t)^T ( I(t) - U(t) r ) - \alpha L r \right]    (14)

where k_1 governs the rate of descent towards the minima. The above equation is relatively easy to interpret:
in order to modify r towards the optimal estimate, we need to obtain the residual error ( I(t) - U(t) r )
between the input at time t and its reconstruction U(t) r. The residual errors for the various time steps are
then spatiotemporally weighted by their corresponding weights U(t)^T to obtain a spatiotemporal sum, which
is modified by lateral inhibition due to the term -\alpha L r. In a neural implementation, the individual rows of
the matrices U(t)^T would comprise the synaptic weights of a single spatiotemporally summating neuron.
Thus, the i th row of U(t)^T would represent the effect of the synapses of the i th neuron for time instant t.
Figure 2A shows a network implementation of the above dynamics. A possible problem with this
implementation is the need for computing global residual errors at each iteration of the estimation process.
This becomes especially problematic in the case where the data is being obtained on-line, since one would
need to keep the past images in memory and, in addition, use separate sets of neurons representing the
matrices U(t) for generating the signals U(t) r at each iteration.
The dynamics can be implemented more locally by rewriting Equation 14 in the following form:

    \dot{r} = k_1 \left[ \sum_t U(t)^T I(t) - W r - \alpha L r \right]    (15)

where W = \sum_t U(t)^T U(t). Note that this form of the dynamics does not require that residual errors be
computed at each iteration. Rather, we simply perform a spatiotemporal filtering of the inputs I(t) using
the synaptic weight matrices U(t)^T for t = 1, ..., k and then subtract two lateral terms, one involving W
and the other involving L.
Figure 2: Alternate Network Implementations of the Response Vector Dynamics. (A) shows a globally recurrent architecture
for implementing the dynamics of the response vector r (Equation 16). The architecture requires the calculation of residual error
signals [Mumford, 1994; Barlow, 1994; Pece, 1992; Rao and Ballard, 1997a] between predicted images and actual images. These
errors are filtered by a set of spatiotemporally summating neurons whose synapses represent the matrices U(t)^T. The result is used to
correct r in conjunction with recurrent lateral inhibition and rectification. (B) shows a biologically more plausible locally recurrent
architecture for implementing the dynamics of r. Rather than computing global residual errors, the inputs are directly filtered by a
set of spatiotemporally summating neurons representing U(t)^T. The result is then further modified by recurrent excitation/inhibition
due to the weights W (see text) as well as lateral inhibition due to the weights L.
While the term involving L is exclusively inhibitory, the components of the vector W r may be excitatory
or inhibitory. Once again, the rows of the matrices U(t)^T correspond to the synapses of spatiotemporally
summating neurons, with the i th row of U(t)^T representing the synaptic weights of the i th spatiotemporal
neuron for time instant t.
Expressing the above dynamics in its discrete form and adding a final constraint of non-negativity results
in the following equation for updating r until convergence is achieved for the given input sequence:

    r(m+1) = G\!\left[ r(m) + k_1 \left( \sum_t U(t)^T I(t) - W r(m) - \alpha L r(m) \right) \right]    (16)

where m indexes the iterations and G is a threshold nonlinearity for rectification (components of its argument
below a threshold are set to zero), applied to all components of the given vector. It is interesting to note the
similarity between the dynamics as given by the stochastic gradient descent rule above and those proposed
by Lee and Seung for their "Conic" network [Lee and Seung, 1997]. In particular, the above equation can
be regarded as a spatiotemporal extension of the dynamics used in the Conic network. A network
implementation of this equation is shown in Figure 2B.
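A minimal sketch of this batch update is given below; the rectification threshold (taken to be zero here), the step size, and the number of iterations are assumptions made for illustration.

```python
# Illustrative sketch of the batch estimation of r in the spirit of Eq. 16:
# spatiotemporal filtering of the inputs, subtraction of the recurrent terms
# W r and alpha * L r, and rectification, iterated until (approximate)
# convergence. Step size, iteration count, and threshold are assumed values.
import numpy as np

def rectify(x, threshold=0.0):
    """Threshold nonlinearity: components at or below the threshold are set to zero."""
    return np.where(x > threshold, x, 0.0)

def estimate_r(I_seq, U_seq, L, alpha=0.5, k1=0.01, n_iters=200):
    k, _, n_neurons = U_seq.shape
    W = sum(U_seq[t].T @ U_seq[t] for t in range(k))       # W = sum_t U(t)^T U(t)
    drive = sum(U_seq[t].T @ I_seq[t] for t in range(k))   # spatiotemporal filtering of inputs
    r = np.zeros(n_neurons)
    for _ in range(n_iters):
        r = rectify(r + k1 * (drive - W @ r - alpha * (L @ r)))
    return r
```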
4.2 On-Line Estimation
The dynamics described above can be extended to the more realistic case where the inputs are being
encountered on-line. Let t denote the current time instant. The on-line form of Equation 15 is then given by:

    \dot{r} = k_1 \left[ \sum_{\tau=1}^{k} U(\tau)^T I(t-\tau) - W r - \alpha L r \right]    (17)

Expressing the above dynamics in its discrete form and enforcing the constraint of non-negativity yields the
following rule for updating r at each time instant:

    r(t) = G\!\left[ r(t-1) + k_1 \left( \sum_{\tau=1}^{k} U(\tau)^T I(t-\tau) - W r(t-1) - \alpha L r(t-1) \right) \right]    (18)

where the operator G again denotes rectification. In summary, the current spatiotemporal response is
determined by three factors: the previous spatiotemporal response r(t-1), the past k inputs I(t-1), ..., I(t-k),
and lateral inhibition due to W and L. These three factors are combined via summation, followed by
rectification, to yield the response at time instant t. The consideration of only the past k inputs rather than
the entire input history in the equation above is consistent with the observation that cortical neurons process
stimuli within restricted temporal epochs (see, for example, [DeAngelis et al., 1995]).
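A corresponding on-line update can be sketched as follows; the indexing of the k most recent inputs and the constants are assumptions made only for illustration.

```python
# Illustrative sketch of an on-line update in the spirit of Eq. 18: the new
# response is obtained from the previous response, a spatiotemporal sum over
# the k most recent inputs, and the recurrent terms W r and alpha * L r,
# followed by rectification. Constants and indexing are assumed.
import numpy as np

def online_step(r_prev, recent_inputs, U_seq, W, L, alpha=0.5, k1=0.01):
    """recent_inputs: list of the k most recent images, most recent first."""
    drive = sum(U_seq[tau].T @ I for tau, I in enumerate(recent_inputs))
    r_new = r_prev + k1 * (drive - W @ r_prev - alpha * (L @ r_prev))
    return np.maximum(r_new, 0.0)   # rectification (threshold taken to be zero here)
```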
4.3 Learning Rules
A learning rule for determining the optimal estimate for each U(t) can be obtained by performing gradient
descent on E_1 with respect to U(t), for each t:

    \Delta U(t) = k_2 \left[ ( I(t) - U(t) r ) r^T - \beta U(t) \right]    (19)

where k_2 is the learning rate parameter. Note that for the on-line case, the inputs I(1), ..., I(k) are replaced
by their on-line counterparts, the k most recent inputs (cf. Equation 18).
Similarly, a learning rule for the lateral weights L can be obtained by performing gradient descent with
respect to L:

    \Delta L = k_3 \left( r r^T - \lambda L \right)    (20)

where k_3 is the learning rate and \lambda governs the weight decay. Note that this is a Hebbian learning rule
with decay. In conjunction with the inhibition in the dynamics of r (Equation 18), the above learning rule
can be seen to be equivalent to Foldiak's anti-Hebbian learning rule [Foldiak, 1990], if the diagonal terms
of L, which implement self-inhibition, are set to zero.
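These two rules can be sketched together as follows; the learning rates and decay constants are assumed values, and the zeroing of the diagonal of L is included only to illustrate the correspondence with Foldiak's rule noted above.

```python
# Illustrative sketch of the learning rules in the spirit of Eqs. 19 and 20.
# U(t) is updated from the residual error; L follows a Hebbian rule with decay.
# Learning rates and decay constants are assumed values.
import numpy as np

def update_weights(I_seq, U_seq, r, L, k2=0.005, k3=0.005, beta=0.05, lam=0.05):
    for t in range(U_seq.shape[0]):
        residual = I_seq[t] - U_seq[t] @ r
        U_seq[t] += k2 * (np.outer(residual, r) - beta * U_seq[t])   # Eq. 19 analog
    L += k3 * (np.outer(r, r) - lam * L)                             # Eq. 20 analog
    np.fill_diagonal(L, 0.0)   # zero self-inhibition (Foldiak correspondence); see text
    return U_seq, L
```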
5 Experimental Results
The algorithms derived in the previous section were tested on a set of five digitized natural images from the
Ansel Adams Fiat Lux collection at the UCR/California Museum of Photography (Figure 3A, images
reproduced with permission). Each grayscale image was preprocessed by filtering with a circularly symmetric
zero-phase whitening/low-pass filter with the spatial frequency profile [Olshausen and Field, 1996; Atick and
Redlich, 1992]:

    K(f) = f e^{-(f/f_0)^4}    (21)

where f_0 denotes the cut-off frequency (in cycles/image). As described in [Olshausen and Field, 1997], the
whitening component f of the filter performs "sphering" of the natural image data by attenuating the low
frequencies and boosting the higher frequencies. The low-pass exponential component e^{-(f/f_0)^4} helps
to reduce the effects of noise/aliasing at high frequencies and eliminates the artifacts of using a rectangular
sampling lattice.
Figure 3B shows the frequency profile of the whitening/low-pass filter in the 1-D case while Figure 3C
shows the 2-D case. The corresponding spatial profile obtained via inverse Fourier transform is shown in
Figure 3D. The spatial profile resembles the well-known center-surround receptive fields characteristic of
retinal ganglion cells. Atick and Redlich [Atick and Redlich, 1992; Atick, 1992] have shown that the measured
spatial frequency profiles of retinal ganglion cells are well approximated by filters resembling K(f).
Figure 3E shows the results of filtering an image from the training set using K(f).
The training data comprised sequences of contiguous image patches extracted from the filtered natural
images. The relative size of a patch is shown as a box labeled RF in Figure 3E. Starting from an image
patch extracted at a randomly selected location in a given training image, the next training patch was
extracted by moving in one of the eight directions shown in Figure 3F. This was achieved by incrementing
or decrementing the x and/or y coordinate of the current patch location by one according to the current
direction of motion. After a fixed number of image patches, the current direction was set randomly to one
of the eight possible directions. The direction was also randomly changed in case an image boundary was
encountered. A new training image was selected after a fixed number of patches, thereby cycling through
the five preprocessed training images.
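The patch-extraction scheme can be sketched as follows; the patch size, sequence length, and direction-change interval are assumed values rather than the ones used in the experiments.

```python
# Illustrative sketch of the training-patch extraction scheme: a window walks
# across a filtered image, moving one pixel per time step in its current
# direction, with the direction re-drawn at random at fixed intervals or when
# an image boundary is reached. Patch size and interval are assumed values.
import numpy as np

rng = np.random.default_rng(2)
DIRECTIONS = [(dx, dy) for dx in (-1, 0, 1) for dy in (-1, 0, 1) if (dx, dy) != (0, 0)]

def extract_sequence(image, patch=8, length=8, change_every=4):
    h, w = image.shape
    x = int(rng.integers(0, w - patch))
    y = int(rng.integers(0, h - patch))
    dx, dy = DIRECTIONS[rng.integers(len(DIRECTIONS))]
    seq = []
    for step in range(length):
        seq.append(image[y:y + patch, x:x + patch].ravel())
        if (step + 1) % change_every == 0:
            dx, dy = DIRECTIONS[rng.integers(len(DIRECTIONS))]
        # re-draw the direction if the next move would leave the image
        while not (0 <= x + dx <= w - patch and 0 <= y + dy <= h - patch):
            dx, dy = DIRECTIONS[rng.integers(len(DIRECTIONS))]
        x, y = x + dx, y + dy
    return np.array(seq)

sequence = extract_sequence(np.random.rand(128, 128))   # shape (8, 64)
```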
In the first experiment, a network of model neurons was trained on sequences of natural image patches.
The temporal extent of processing was set to k = 8, resulting in eight sets of synaptic weight vectors per
neuron. These vectors, which form the rows of U(t)^T, t = 1, ..., 8, were initialized to uniformly random
values and normalized to length one.
Figure 3: Training Paradigm. (A) The five natural images used in the experiments. The original negatives are from the Ansel
Adams Fiat Lux collection at the UCR/California Museum of Photography (images reproduced with permission). (B) 1-D spatial
frequency response profile of zero-phase whitening/low-pass filter used for preprocessing the natural images. (C) The full 2-D
response profile of the same filter. (D) The profile in the space domain obtained by inverse Fourier transform (IFT), showing the
familiar center-surround receptive field characteristic of retinal ganglion cells (see text). (E) A natural image after being filtered
with the center-surround filter in (D). The relative size of a receptive field (same size as the input image patches) is shown
as a box labeled "RF" in the bottom right corner. (F) depicts the process of extracting image patches from the filtered natural
images for training the neural network. Patches are extracted from a square window as it moves in one of eight directions within a
given training image (see text for more details).
Figure 4: Example of Receptive Field Dynamics after Learning. (A) shows the initial set of synaptic weights (or receptive fields)
for a single model neuron. The weights shown comprise the first row of U(t)^T for t = 1, ..., 8. The components of these vectors
were initialized to random values according to the uniform distribution and the resulting vectors were normalized to length one.
(B) shows the same set of synaptic weights after training the network on natural images (see text for details). The synaptic profiles
at each time step resemble oriented Gabor wavelets [Daugman, 1980; Marcelja, 1980]. (C) depicts these synaptic weights using a
classical 2-D receptive field representation, where the dark regions are inhibitory and the bright regions are excitatory. The entire
sequence of synaptic weights suggests that this neuron is tuned to dark bars moving diagonally from the bottom left corner of the
receptive field to the top right corner.
After the presentation of each training image patch, the response vector r was updated according to the
on-line estimation rule given by Equation 18. The weights U(t) were adapted at each time step using
Equation 19, and the lateral weights L were adapted using Equation 20, with the learning rates for the
weights set to values much smaller than the adaptation rate of r. The diagonal of L, representing the
self-inhibitory decay terms or "leakiness" parameters in the leaky-integrator Equation 18, was held fixed,
although qualitatively similar results were obtained if the diagonal was also adapted along with the lateral
weights. The learning rate parameters k_2 and k_3 were gradually decreased during training, and a stable
solution was arrived at after a large number of image presentations.
Figure 4 shows a set of synaptic weight vectors (the first row of U(t)^T for t = 1, ..., 8) for a model neuron
before and after learning (A and B respectively). Since these synaptic vectors form the feedforward weighting
function of neurons in the network (see Figure 2B), they can be roughly interpreted as "receptive fields" or
spatial impulse response functions for time steps t = 1, ..., 8. The receptive fields after learning resemble
localized Gabor wavelets, which have previously been shown to approximate well the receptive field weighting
profiles of simple cells in the mammalian primary visual cortex [Daugman, 1980; Marcelja, 1980; Olshausen
and Field, 1996]. In addition, the model neuron can be seen to be tuned to dark bars moving diagonally from
the bottom left corner of the receptive field to the top right corner.
Figure 5: Further Examples of Receptive Fields. (a) through (p) show examples of synaptic weights for
sixteen other model neurons from the network described in Figure 4. Several varieties of orientation selectivity can be discerned,
with each neuron being tuned to a particular direction of motion. Other neurons, such as those depicted in (o) and (p), appear
to be selective for oriented image structures that are approximately stationary or contain motion along the preferred orientation.
A number of other examples of receptive fields developed by the network are shown in Figure 5. The set of
spatiotemporal synaptic weights together forms an overcomplete set of basis functions for representing
spatiotemporal input sequences.
In a second set of experiments, we investigated the importance of rectification and lateral inhibition in
learning visual cortex-like receptive fields. Three networks using the same set of parameters, initialization
conditions, and training patches as the one used in Figures 4 and 5 were subjected to three different conditions. In the first network, rectification was removed but the lateral weights were learned and used as
before while in the second network, the lateral weights (including the diagonal terms) were disabled but
rectification was retained. The third network was trained without either lateral inhibition or rectification of
responses. Figure 6 compares the results obtained after training for the same model
neuron as in Figure 4. This specific example as well as the results obtained for other model neurons in the
networks suggest that both lateral inhibition and rectification are necessary for obtaining synaptic profiles
resembling visual cortical receptive fields.
In a third set of experiments, the results obtained above were verified using a second network with a
different receptive field size. The training paradigm involving the natural image patches was identical to
the first case. The network was trained on sequences of natural image patches, with the temporal extent of
processing set to k = 12.
Figure 6: The Need for Lateral Inhibition and Rectification during Learning. The figure shows the synaptic profiles obtained
after training for the same neuron (from Figure 4) with identical initial conditions and identical training regimes but with lateral
inhibition and/or rectification removed. As can be seen from this specific example, which is typical of the results obtained for other
neurons in the networks, both lateral inhibition and rectification appear to be necessary for obtaining synaptic profiles resembling
visual cortical receptive fields. Neither rectification nor lateral inhibition alone seemed to suffice to produce the desired neural
receptive fields.
The diagonal of L was again held fixed, the learning rate parameters k_2 and k_3 were gradually decreased
during training, and a stable solution was arrived at after a large number of image presentations.
Figure 7 shows a set of synaptic weight vectors (or receptive fields) for a model neuron before and
after learning. The model neuron can be seen to be tuned towards dark horizontal bars moving downwards.
Several other examples of receptive fields developed by the network are shown in Figure 8. A majority of
these neurons appear to be tuned towards oriented image structures moving in a particular direction. Some
neurons, such as the one shown in (k), exhibit more complex dynamics, involving two or more inhibitory
subregions coalescing into one while other neurons, such as the one shown in (l), appear to be tuned towards
oriented image structures that are either approximately stationary or contain some form of motion along the
preferred orientation.
The space-time receptive fields for the model neurons in Figures 7 and 8 are shown in Figure 9. These
x-t plots were obtained by integrating the 3-D spatiotemporal receptive field data along the neuron's preferred
orientation, as illustrated in the top two rows of the figure. The inverse of the slope of the oriented
subregions in the space-time receptive field provides an estimate of the neuron's preferred velocity (see,
for example, [Adelson and Bergen, 1985]). Thus, in the case of the top two rows in the figure, the slope
is approximately one, indicating a preferred speed of approximately one pixel/time step for these two neurons
in their respective directions. The space-time receptive fields in Figure 9 (b) through (l) are those for the
model neurons in Figure 8 (b) through (l). Note that even though the training image window moved at one
pixel/time step, in some cases, such as (d) and (h), the preferred speed is less than one pixel/time step due to
the well-known "aperture effect" [Adelson and Movshon, 1982]. In the extreme case of an approximately
stationary receptive field as in (l), the space-time receptive field indicates a preferred speed of zero.
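The construction of such an x-t profile can be sketched as follows; the array shapes are assumptions, and the preferred orientation is assumed here to lie along one image axis for simplicity.

```python
# Illustrative sketch of collapsing a sequence of 2-D spatial weight maps into
# a space-time (x-t) profile by integrating along the preferred orientation
# (assumed here to be aligned with the y axis). Shapes are assumed values.
import numpy as np

def space_time_profile(weights_tyx):
    """weights_tyx: array of shape (k, height, width); returns a (k, width) x-t map."""
    return weights_tyx.sum(axis=1)   # integrate along the (assumed) preferred orientation

profile = space_time_profile(np.random.rand(12, 16, 16))   # shape (12, 16)
```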
Figure 7: Example of Receptive Field Dynamics after Learning. (A) shows the initial set of random,
normalized synaptic weights for a single model neuron in the network. (B) shows the same set of synaptic
weights after training the network on the filtered natural images (see text for details). (C) depicts these synaptic weights using a
classical 2-D receptive field representation (dark regions are inhibitory, bright regions are excitatory). The sequence of synaptic
weights suggests that this neuron is tuned to dark horizontal bars moving downwards.
Figure 8: Further Examples of Receptive Fields. (a) through (l) show examples of synaptic weights for
twelve other model neurons from the network used in Figure 7. A majority of these neurons are tuned towards oriented image
structures moving in a particular direction. Some neurons, such as the one shown in (k), exhibit more complex dynamics, for
example, involving two or more inhibitory subregions coalescing into one. Other neurons, such as the one shown in (l), appear
to be tuned towards oriented image structures that are either approximately stationary or contain some motion along the preferred
orientation.
Figure 9: Space-Time Receptive Fields. The top two rows depict the process of constructing space-time
receptive field profiles from temporal sequences of 2-D spatial receptive fields. The top row is from Figure 7 while the second row
is from Figure 8 (a). The x-t plots are obtained by integrating the 3-D spatiotemporal data along the neuron's preferred orientation
(in these two cases, along the horizontal direction). Note that orientation in x-t space indicates the neuron's preferred direction of
motion (in this case, upwards or downwards). In addition, the slope of approximately one in both cases indicates a preferred speed
of approximately one pixel/time step for these neurons (since the slope is inversely related to the preferred velocity [Adelson and
Bergen, 1985]). (b) through (l) show the space-time receptive fields for the model neurons in Figure 8 (b) through (l). In some
cases, such as (d) and (h), the preferred speed is less than one pixel/time step due to the aperture effect [Adelson and Movshon,
1982]. In the case of (l), the preferred speed is zero.
In order to evaluate the nature of internal representations used by the network, we exposed the trained
network and a companion network without the lateral inhibitory weights to a sequence of images depicting
a bright vertical bar moving to the right on a dark background (Figure 10). In both cases, the responses of the
model neurons in each network were plotted as histograms at each time step, with the first vertical bar
corresponding to the response of the first neuron, the second to that of the second neuron, and so on. As seen
in Figure 10, the trained network generates sparse distributed representations of the spatiotemporal input
sequence (only a few neurons are active at each time step). On the other hand, disabling the lateral inhibitory
connections results in a much larger number of neurons being active at each time step, suggesting that the
lateral connections play a crucial role in generating sparse distributed representations of input stimuli.
In the final set of experiments, we analyzed the direction selectivity of model neurons in the trained
network. Figure 11 illustrates the experimental methodology used. Each neuron in the network
was exposed to oriented dark/bright bars on a bright/dark background moving in a direction perpendicular to
the bar's orientation (bright or dark bars were chosen depending on which case elicited the largest response
from the neuron). The direction of motion was varied over the full range of directions in fixed angular steps.
Figure 11B shows the cases where the direction of motion is downwards and upwards respectively. Figure 11C
shows the response of the model neuron in these two cases as a function of the time from stimulus onset. As
expected from the structure of its space-time receptive field, the neuron exhibits a significant response for a
bar moving downwards (preferred direction) while a bar moving upwards (null direction) elicits little or no
response.
A direction selectivity index (DSI) was defined for each model neuron as:

    DSI = 1 - (Peak Response in Null Direction) / (Peak Response in Preferred Direction)    (22)

The preferred direction was taken to be the direction of motion eliciting the largest response from the model
neuron and the null direction was set to the opposite direction. Thus, a neuron with a peak response of zero
in the null direction and a nonzero response in the preferred direction has a DSI of 1 (or 100%). A neuron
with equal responses in both directions has a DSI of zero. Figure 12A shows polar response plots of four
model neurons with increasing DSI from left to right. The rightmost plot is for the neuron in Figure 11, and
its DSI of 92.25% confirms its relatively high direction selectivity. Figure 12B shows the population
distribution of direction selectivity in the trained network. A relatively large proportion of the neurons had
high direction selectivity indices.
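For reference, the index of Equation 22 can be computed as follows; the peak-response values shown are hypothetical.

```python
# Illustrative sketch of the direction selectivity index of Eq. 22, applied to
# hypothetical peak responses in the preferred and null directions.
def direction_selectivity_index(peak_preferred, peak_null):
    """DSI = 1 - (peak response in null direction) / (peak response in preferred direction)."""
    return 1.0 - peak_null / peak_preferred

print(direction_selectivity_index(0.2, 0.015))   # ~0.92, i.e. a DSI of about 92%
```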
6 Discussion
The results suggest that many of the important spatiotemporal properties of visual cortical neurons such as
velocity and direction selectivity can be explained in terms of learning efficient spatiotemporal generative
models for natural time varying images. Efficiency is defined not only in terms of reducing image reconstruction errors but also in terms of reducing statistical dependencies between network outputs. This redundancy
reduction is implemented via recurrent lateral inhibition as derived from the MDL principle. The network
also utilizes other lateral connections which are derived from the generative model. These lateral excitatory/inhibitory connections play a role similar to those used in Bayesian belief networks for facilitating the
phenomenon of “explaining away” [Pearl, 1988]. The network additionally includes rectification of outputs
to encourage sparseness among the network activities. Although rectification helps in the development of
sparse distributed representations, rectification alone was found to be insufficient in producing satisfactory
space-time receptive fields. Likewise, lateral inhibition alone was also found to be insufficient.
Figure 10: Lateral Inhibition and Sparse Distributed Representations of Input Stimuli. To test the role of the lateral
inhibitory weights (see Figure 2) in generating sparse distributed representations, these weights were removed (after learning)
and the network was exposed to a bright vertical bar moving to the right. The responses of the neurons in the network at the
various time steps are shown as histograms (the first vertical bar is the response of the first neuron, the second is that of the second
neuron, and so on). The histograms have been normalized such that the maximum response in each graph is one. The responses
with lateral inhibition intact are shown to the right at each time step. As is evident from the relatively small number of neurons
active at each time step in the "with lateral inhibition" case, the existence of lateral inhibitory connections in the network appears
to be crucial for generating sparse distributed representations of the spatiotemporal input stimuli.
Figure 11: Example of Direction Selectivity. The model neuron from Figure 7 was tested for direction selectivity. (A) shows
the space-time receptive field of this neuron. (B) depicts the two input stimuli used for testing: a dark horizontal bar in a white
background, moving either downwards or upwards. (C) shows the response of the neuron as a function of the time from stimulus
onset. As expected from the structure of its space-time receptive field, the neuron exhibits a significant response for a bar moving
downwards (preferred direction), reaching its peak response at 11 time steps from stimulus onset. On the other hand, a
bar moving upwards (null direction) elicits little or no response.
Figure 12: Analysis of Direction Selectivity. (A) shows polar response plots of four model neurons when exposed to bars
oriented at a particular angle and moving in a direction perpendicular to the orientation. The radius of the plot indicates the strength
of the response while the angle indicates the direction of motion of the bar. The sequence of plots is arranged from left to right in
order of increasing direction selectivity, as given by the direction selectivity index DSI (see text for definition): 1.77%, 22.00%,
63.46%, and 92.25%. The polar plot on the extreme right is for the neuron in Figure 11, confirming its relatively high direction
selectivity. (B) shows the distribution of direction selectivity in the population of neurons in the trained network. A relatively
large proportion of the neurons had a high direction selectivity index.
Several previous models of direction selectivity have utilized recurrent lateral interactions and rectification [Suarez et al., 1995; Maex and Orban, 1996; Mineiro and Zipser, 1997], although without the
statistical perspective pursued herein. In addition, the receptive fields in many of these approaches were
hard-wired by hand rather than learned. An interesting issue is how the spatiotemporally varying synaptic weights of neurons in the present model can be implemented biologically by intrinsic synaptic mechanisms and axonal-dendritic interactions over time. In this regard, it is possible that space-time receptive fields similar to those obtained by the present method may also arise in other neural circuits that
do not explicitly use spatiotemporal synaptic weights. Most of the synaptic weights in the trained networks were found to be space-time inseparable. The lack of larger numbers of separable receptive fields
in the trained networks suggests an alternative mechanism for the generation of such receptive fields.
Most cortical receptive fields that are space-time separable have a temporal weighting profile that approximates a derivative in time. We have previously shown [Rao and Ballard, 1997b] that a generative
model based on a first-order Taylor series approximation of an image produces localized oriented filters
that compute spatial derivatives for estimating translations in the image plane. A possibility that we are
currently investigating is to use a Taylor series expansion of an image in both space and time, and to ascertain whether such a strategy produces separable filters that compute derivatives in both space and time.
Other issues being pursued include explaining contrast normalization effects [Albrecht and Geisler, 1991;
Heeger, 1991] and recasting the hierarchical framework proposed in [Rao and Ballard, 1997a] to accommodate the spatiotemporal generative model proposed herein.
References
[Adelson and Bergen, 1985] E.H. Adelson and J. Bergen. Spatiotemporal energy models for the perception
of motion. J. Opt. Soc. Am. A, 2(2):284–299, 1985.
[Adelson and Movshon, 1982] E.H. Adelson and J.A. Movshon. Phenomenal coherence of moving visual
patterns. Nature, 300:523–525, 1982.
[Albrecht and Geisler, 1991] D.G. Albrecht and W.S. Geisler. Motion sensitivity and the contrast-response
function of simple cells in the visual cortex. Visual Neurosci., 7:531–546, 1991.
[Atick and Redlich, 1992] J.J. Atick and A.N. Redlich. What does the retina know about natural scenes?
Neural Computation, 4(2):196–210, 1992.
[Atick and Redlich, 1993] J.J. Atick and A.N. Redlich. Convergent algorithm for sensory receptive field
development. Neural Computation, 5:45–60, 1993.
[Atick, 1992] J.J. Atick. Could information theory provide an ecological theory of sensory processing?
Network, 3:213–251, 1992.
[Baddeley and Hancock, 1991] R.J. Baddeley and P.J.B. Hancock. A statistical analysis of natural images
matches psychophysically derived orientation tuning curves. Proc. R. Soc. Lond. Ser. B, 246:219–223,
1991.
[Barlow, 1961] H.B. Barlow. Possible principles underlying the transformation of sensory messages. In
W.A. Rosenblith, editor, Sensory Communication, pages 217–234. Cambridge, MA: MIT Press, 1961.
22
[Barlow, 1972] H.B. Barlow. Single units and sensation: A neurone doctrine for perceptual psychology?
Perception, 1:371–394, 1972.
[Barlow, 1989] H.B. Barlow. Unsupervised learning. Neural Computation, 1:295–311, 1989.
[Barlow, 1994] H.B. Barlow. What is the computational goal of the neocortex? In C. Koch and J.L. Davis,
editors, Large-Scale Neuronal Theories of the Brain, pages 1–22. Cambridge, MA: MIT Press, 1994.
[Barrow, 1987] H.G. Barrow. Learning receptive fields. In Proceedings of the IEEE Int. Conf. on Neural
Networks, pages 115–121, 1987.
[Bell and Sejnowski, 1997] A.J. Bell and T.J. Sejnowski. The ‘independent components’ of natural scenes
are edge filters. Vision Research (in press), 1997.
[Bienenstock et al., 1982] E. L. Bienenstock, L. N. Cooper, and P. W. Munro. Theory for the development
of neuron selectivity: orientation specificity and binocular interaction in visual cortex. J. Neurosci.,
2:32–48, 1982.
[Bryson and Ho, 1975] A.E. Bryson and Y.-C. Ho. Applied Optimal Control. New York: John Wiley and
Sons, 1975.
[Burr et al., 1986] D.C. Burr, J. Ross, and M.C. Morrone. Seeing objects in motion. Proc. R. Soc. Lond.
Ser. B, 227:249–265, 1986.
[Daugman, 1980] J.G. Daugman. Two-dimensional spectral analysis of cortical receptive field profiles.
Vision Research, 20:847–856, 1980.
[Daugman, 1988] J.G. Daugman. Complete discrete 2-D Gabor transforms by neural networks for image
analysis and compression. IEEE Trans. Acoustics, Speech, and Signal Proc., 36(7):1169–1179, 1988.
[Dayan et al., 1995] P. Dayan, G.E. Hinton, R.M. Neal, and R.S. Zemel. The Helmholtz machine. Neural
Computation, 7:889–904, 1995.
[DeAngelis et al., 1993a] G.C. DeAngelis, I. Ohzawa, and R.D. Freeman. Spatiotemporal organization of
simple-cell receptive fields in the cat’s striate cortex. I. General characteristics and postnatal development.
J. Neurophysiol., 69(4):1091–1117, 1993.
[DeAngelis et al., 1993b] G.C. DeAngelis, I. Ohzawa, and R.D. Freeman. Spatiotemporal organization of
simple-cell receptive fields in the cat’s striate cortex. II. Linearity of temporal and spatial summation. J.
Neurophysiol., 69(4):1091–1117, 1993.
[DeAngelis et al., 1995] G.C. DeAngelis, I. Ohzawa, and R.D. Freeman. Receptive-field dynamics in the
central visual pathways. Trends in Neuroscience, 18:451–458, 1995.
[Dempster et al., 1977] A.P. Dempster, N.M. Laird, and D.B. Rubin. Maximum likelihood from incomplete
data via the EM algorithm. J. Royal Statistical Society Series B, 39:1–38, 1977.
23
[Dong and Atick, 1995] D.W. Dong and J.J. Atick. Statistics of natural time-varying images. Network:
Computation in Neural Systems, 6(3):345–358, 1995.
[Eckert and Buchsbaum, 1993] M.P. Eckert and G. Buchsbaum. Efficient encoding of natural time varying
images in the early visual system. Phil. Trans. R. Soc. Lond. B, 339:385–395, 1993.
[Emerson et al., 1987] R.C. Emerson, M.C. Citron, W.J. Vaughn, and S.A. Klein. Nonlinear directionally
selective subunits in complex cells of cat striate cortex. J. Neurophysiol., 58:33–65, 1987.
[Field, 1987] D.J. Field. Relations between the statistics of natural images and the response properties of
cortical cells. J. Opt. Soc. Am. A, 4:2379–2394, 1987.
[Field, 1994] D.J. Field. What is the goal of sensory coding? Neural Computation, 6:559–601, 1994.
[Foldiak, 1990] P. Foldiak. Forming sparse representations by local anti-Hebbian learning. Biol. Cybern.,
64:165–170, 1990.
[Hancock et al., 1992] P.J.B. Hancock, R.J. Baddeley, and L.S. Smith. The principal components of natural
images. Network, 3:61–70, 1992.
[Harpur and Prager, 1996] G.F. Harpur and R.W. Prager. Development of low-entropy coding in a recurrent
network. Network, 7:277–284, 1996.
[Hartline, 1940] H.K. Hartline. The receptive fields of optic nerve fibers. Am. J. Physiol., 130:690–699,
1940.
[Heeger, 1991] D.J. Heeger. Non-linear model of neural responses in cat visual cortex. In Computational
models of visual processing, pages 119–133. Cambridge, MA: MIT Press, 1991.
[Hinton and Ghahramani, 1997] G.E. Hinton and Z. Ghahramani. Generative models for discovering sparse
distributed representations. Phil. Trans. Roy. Soc. Lond. B, 1997. To appear.
[Hinton and Sejnowski, 1986] G.E. Hinton and T.J. Sejnowski. Learning and relearning in Boltzmann machines. In D.E. Rumelhart and J.L. McClelland, editors, Parallel Distributed Processing, volume 1,
chapter 7, pages 282–317. MIT Press, Cambridge, 1986.
[Hubel and Wiesel, 1962] D.H. Hubel and T.N. Wiesel. Receptive fields, binocular interaction, and functional architecture in the cat’s visual cortex. Journal of Physiology (London), 160:106–154, 1962.
[Hubel and Wiesel, 1968] D.H. Hubel and T.N. Wiesel. Receptive fields and functional architecture of
monkey striate cortex. Journal of Physiology (London), 195:215–243, 1968.
[Huber, 1985] P.J. Huber. Projection pursuit (with discussion). Annals of Statistics, 13:435–525, 1985.
[Intrator, 1992] N. Intrator. Feature extraction using an unsupervised neural network. Neural Computation,
4(1):98–107, 1992.
[Jordan and Rumelhart, 1992] M.I. Jordan and D.E. Rumelhart. Forward models: Supervised learning with
a distal teacher. Cognitive Science, 16:307–354, 1992.
24
[Kalman, 1960] R.E. Kalman. A new approach to linear filtering and prediction theory. Trans. ASME J.
Basic Eng., 82:35–45, 1960.
[Law and Cooper, 1994] C.C. Law and L.N. Cooper. Formation of receptive fields in realistic visual environments according to the Bienenstock, Cooper, and Munro (BCM) theory. Proc. Natl. Acad. Sci. USA,
91:7797–7801, 1994.
[Lee and Seung, 1997] D.D. Lee and H.S. Seung. Unsupervised learning by convex and conic coding. In
M. Mozer, M. Jordan, and T. Petsche, editors, Advances in Neural Information Processing Systems 9.
Cambridge, MA: MIT Press, 1997. To appear.
[Linsker, 1988] R. Linsker. Self-organization in a perceptual network. Computer, 21(3):105–117, 1988.
[Maex and Orban, 1996] R. Maex and G.A. Orban. Model circuit of spiking neurons generating directional
selectivity in simple cells. J. Neurophysiol., 75(4):1515–1545, 1996.
[Marcelja, 1980] S. Marcelja. Mathematical description of the responses of simple cortical cells. Journal
of the Optical Society of America, 70:1297–1300, 1980.
[McLean and Palmer, 1989] J. McLean and L.A. Palmer. Contribution of linear spatiotemporal receptive
field structure to velocity selectivity of simple cells in the cat’s striate cortex. Vision Research, 29:675–
679, 1989.
[McLean et al., 1994] J. McLean, S. Raab, and L.A. Palmer. Contribution of linear mechanisms to the
specification of local motion by simple cells in areas 17 and 18 of the cat. Visual Neurosci., 11:271–294,
1994.
[Mineiro and Zipser, 1997] P. Mineiro and D. Zipser. Analysis of direction selectivity arising from recurrent
cortical interactions. Technical Report 97.03, Dept. of Cog. Science, UCSD, 1997.
[Mumford, 1994] D. Mumford. Neuronal architectures for pattern-theoretic problems. In C. Koch and J.L.
Davis, editors, Large-Scale Neuronal Theories of the Brain, pages 125–152. Cambridge, MA: MIT Press,
1994.
[Oja, 1989] E. Oja. Neural networks, principal components, and subspaces. International Journal of Neural
Systems, 1:61–68, 1989.
[Olshausen and Field, 1996] B.A. Olshausen and D.J. Field. Emergence of simple-cell receptive field properties by learning a sparse code for natural images. Nature, 381:607–609, 1996.
[Olshausen and Field, 1997] B.A. Olshausen and D.J. Field. Sparse coding with an overcomplete basis set:
A strategy employed by V1? Vision Research, 1997. To appear.
[Pearl, 1988] J. Pearl. Probabilistic Reasoning in Intelligent Systems. Morgan Kaufmann, San Mateo, CA,
1988.
25
[Pece, 1992] A.E.C. Pece. Redundancy reduction of a Gabor representation: a possible computational role
for feedback from primary visual cortex to lateral geniculate nucleus. In I. Aleksander and J. Taylor,
editors, Artificial Neural Networks 2, pages 865–868. Amsterdam: Elsevier Science, 1992.
[Rao and Ballard, 1997a] R.P.N. Rao and D.H. Ballard. Dynamic model of visual recognition predicts
neural response properties in the visual cortex. Neural Computation, 9(4):721–763, 1997.
[Rao and Ballard, 1997b] R.P.N. Rao and D.H. Ballard.
Localized receptive fields may mediate
transformation-invariant recognition in the visual cortex. Technical Report 97.2, National Resource Laboratory for the Study of Brain and Behavior, Department of Computer Science, University of Rochester,
May 1997.
[Reid et al., 1991] R.C. Reid, R.E. Soodak, and R.M. Shapley. Directional selectivity and spatiotemporal
structure of receptive fields of simple cells in cat striate cortex. J. Neurophysiol., 66:505–529, 1991.
[Rissanen, 1989] J. Rissanen. Stochastic Complexity in Statistical Inquiry. Singapore: World Scientific,
1989.
[Sanger, 1989] T.D. Sanger. Optimal unsupervised learning in a single-layer linear feedforward neural
network. Neural Networks, 2:459–473, 1989.
[Shapley et al., 1992] R.M. Shapley, R.C. Reid, and R. Soodak. Spatiotemporal receptive fields and direction selectivity. In M.S. Landy and J.A. Movshon, editors, Computational Models of Visual Processing,
pages 109–118. Cambridge, MA: MIT Press, 1992.
[Shouval, 1995] H. Shouval. Formation and organisation of receptive fields, with an input environment
composed of natural scenes. PhD thesis, Dept. of Physics, Brown University, 1995.
[Suarez et al., 1995] H. Suarez, C. Koch, and R. Douglas. Modeling direction selectivity of simple cells in
striate visual cortex with the framework of the canonical microcircuit. J. Neurosci., 15(10):6700–6719,
1995.
[Watson and Ahumada, 1985] A.B. Watson and A.J. Ahumada. Model of human visual-motion sensing. J.
Opt. Soc. Am. A, 2(2):322–341, 1985.
[Wiener, 1949] N. Wiener. The Extrapolation, Interpolation, and Smoothing of Stationary Time Series with
Engineering Applications. New York: Wiley, 1949.
[Williams, 1985] R.J. Williams. Feature discovery through error-correction learning. Technical Report
8501, Institute for Cognitive Science, University of California at San Diego, 1985.
[Zemel, 1994] R.S. Zemel. A Minimum Description Length Framework for Unsupervised Learning. PhD
thesis, Department of Computer Science, University of Toronto, 1994.