NEURAL SCALE-INVARIANT TIME-FREQUENCY DECOMPOSITION FOR DETECTION OF SPATIOTEMPORAL FEATURES

Zoran Tiganj, Karthik H. Shankar, Marc W. Howard
Center for Memory and Brain, Boston University

ABSTRACT

A growing body of evidence from neuroscience and cognitive science suggests that the brain maintains a scale-invariant timeline in which events are represented on a logarithmically-spaced Weber-Fechner scale. This Weber-Fechner representation is scale-invariant (the array gives the same relative error for signals at any scale) and resource-conserving, with the amount of resources necessary to represent a particular timescale growing logarithmically with that timescale. In addition to this time-domain representation, it is in many cases desirable to have a frequency-domain representation with these properties. We show that, starting from a logarithmically-spaced timeline, a scale-invariant band-pass decomposition with log-spaced central frequencies can be readily constructed. The method is neurally plausible and can be computed on-line. This approach can be used in neuromorphic engineering to extract spatio-temporal features of interest, for instance slowly varying input components.

Index Terms— Band-pass decomposition, Scale-invariance, Log-spacing, Time-frequency representation

1. INTRODUCTION

Numerous cognitive experiments, as well as our everyday experience, demonstrate that humans can remember the time at which events were experienced with gradually decreasing accuracy over a wide temporal range, up to tens or even hundreds of seconds [1, 2, 3, 4]. The gradual decay, without abrupt changes in accuracy, suggests that the same mechanism might be used to store memory over this entire temporal range. It is widely believed that working memory is represented through the maintenance of action potential firing (see for example [5]).

Natural signals, including auditory, visual and natural language signals, show correlations over a wide range of time scales [6, 7, 8, 9]. Perhaps as a response to this rich structure, different regions of the human brain contain temporal receptive fields that respond to features across a range of distinct scales [10, 11]. These considerations have led to the hypothesis that at each moment the brain maintains working memory as an estimate of the time at which events took place along a Weber-Fechner scale; this representation can then be used to support a variety of behaviors including representation of time, space and sequences, as well as episodic memory [12, 13]. The defining property of the Weber-Fechner representation is that the error in the time dimension grows linearly with the time in the past, consistent with a logarithmically-spaced scale. Intuitively, on such a scale the difference between 10 and 11 is equivalent to the difference between 100 and 110. A scale-invariant representation appears to be an adaptive response to a world that contains information at many scales.

Let us consider the problem of spectral decomposition of a continuous time-varying signal f(t) from the perspective of an organism facing a natural environment. Assume that, like a brain, we do not have access to the entire signal and that the resources necessary to maintain memory for the signal at any moment are costly. If we are to estimate the instantaneous power of the signal in a particular frequency band omega_o using standard methods, we would need to maintain on the order of omega_o^{-1} time points of the signal. That is, the resources necessary to estimate a particular frequency band grow linearly with the period of that frequency band. In addition, for a continuously-varying natural signal we do not know a priori the frequency (or frequencies) that will be important for our survival. A neurally-plausible, time-local method for scale-invariant frequency decomposition that requires only logarithmic resources would enable an organism to represent a very wide range of frequencies with minimal resources.
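To make the resource argument concrete, here is a small back-of-the-envelope calculation (not from the paper; the sampling rate, timescale range and ratio C = 1.1 are illustrative assumptions): a uniform shift-register buffer covering the slowest timescale needs a number of samples proportional to that timescale, whereas a bank of leaky integrators with geometrically spaced rate constants needs only logarithmically many units.

import numpy as np

f_sample = 1000.0        # assumed sampling rate, Hz (illustrative)
longest_period = 100.0   # slowest timescale of interest, s
shortest_period = 0.01   # fastest timescale of interest, s

# Uniform buffer: must span the longest period at full resolution.
buffer_size = int(longest_period * f_sample)                      # 100000 samples

# Geometrically spaced leaky integrators with ratio C between adjacent rates.
C = 1.1
s_max, s_min = 1.0 / shortest_period, 1.0 / longest_period
n_integrators = int(np.ceil(np.log(s_max / s_min) / np.log(C)))   # 97 units

print(buffer_size, n_integrators)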
The solution we propose here builds on a neurally-plausible method for constructing a Weber-Fechner timeline [14]. In this method, a set of leaky integrators indexed by their rate constant s maintains at each moment the Laplace transform of the input signal leading up to that moment. This operation is time-local, meaning that the only storage of the signal is through the set of values maintained across the rate constants s. The intrinsic properties of neurons can give rise to a broad range of time scales [15, 16]; these intrinsic properties can be exploited to give rise to leaky integrators with a range of rate constants in a biophysically realistic simulation [17].

The biologically plausible computational method for spectral decomposition presented here operates on the outputs of the neural leaky integrators (effectively low-pass filters that constitute working memory) and constructs a bank of band-pass filters. The decomposition is scale-invariant and, when the ratio of adjacent rate constants is constant, it provides log-spacing of the central frequencies of the band-pass filters. The method for constructing such a representation is detailed in the next section. We also reflect on the construction of a Weber-Fechner timeline and show that this band-pass decomposition yields a Weber-Fechner frequency-line with properties analogous to those of the timeline, only in the frequency domain. Additionally, we show that the frequency-line can also be constructed from the timeline. In the third section we demonstrate the band-pass decomposition in the presence of multiplicative noise and contrast its performance with slow feature analysis (SFA) [18]. In the last section we discuss the overall properties and potential applications of this method and reflect on its possible utility in the context of neuromorphic engineering.

[Figure 1 (block diagrams): a f(t) -> F_s(t); F_s(t) -> L_k^{-1} -> f~_{t*}(t) and F_s(t) -> sL_{k+1}^{-1} -> f~_{omega*}(omega). b f(t) -> F_s(t) -> L_k^{-1} -> f~_{t*}(t) -> sL_1^{-1} -> f~_{omega*}(omega).]

Fig. 1. Construction of neural support along time and frequency dimensions. a The signal from the time domain, f(t), is transformed into the Laplace domain using a set of leaky integrators F_s(t) with time constants s. The leaky integrators are chosen such that the transform maintains a scale-invariant memory representation of the input signal. A linear combination of this memory representation (implemented through the operators L_k^{-1} and sL_1^{-1}) is sufficient to create a neural timeline f~_{t*}(t) and a neural frequency-line f~_{omega*}(omega), both scale-invariant with respect to time and frequency. b The neural frequency-line f~_{omega*}(omega) can also be constructed from the neural timeline f~_{t*}(t).

2. DERIVING FREQUENCY DECOMPOSITION FROM NEURAL REPRESENTATION OF THE RECENT PAST

We will first describe a neural mechanism for maintaining a scale-invariant working memory and then show how this mechanism can serve as a basis for constructing a neural representation of time and frequency. The block diagram illustrating the basic idea is shown in Figure 1. The mechanism for representing time is analogous to the one described in [19]; see also [14, 20, 12]. The main contribution of this paper is in constructing a scale-invariant neural frequency decomposition with log-spaced central frequencies, which can be used to identify spatiotemporal signals of interest.
2.1. A neural mechanism for maintaining working memory

We use a set of leaky integrators with different time constants to maintain information about the history of a continuously changing input across multiple seconds. The crucial property of this representation is that temporal resolution gradually decays from the more recent to the more distant past. The time-varying input signal f(t) is sent to N nodes constituting the input layer. Each node is a leaky integrator with a different time constant 1/s. The activity of these nodes, F_s(t), is described by the following differential equation:

    dF_s(t)/dt = -s F_s(t) + f(t).    (1)

The above expression effectively constitutes a set of first-order low-pass filters with cutoff frequencies s/2*pi and gain 1/s. This can be read from the transfer function in the Fourier domain with variable omega:

    F_s(omega)/f(omega) = (1/s) * 1/(1 + i*omega/s).    (2)

If the number of leaky integrators were infinite, containing all time constants from zero to infinity, the output of this set would constitute a Laplace transform (restricted to purely real, positive values of the Laplace variable) of the input with respect to s: F_s(t) = int_{-inf}^{t} f(t') e^{-s(t - t')} dt'. This would be a perfect memory representation of the input signal. Since in practice the number of leaky integrators is limited, one can obtain a discretized approximation of the Laplace transform by choosing a finite number of neurons (restricting s to a discrete set of values). Here we chose s_j / s_{j+1} = C with C > 1 (exponentially decaying s). Log-spacing of s is optimal for long-range correlated signals ([14], see also [13]). We will see that it also results in a scale-invariant frequency-line.
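As a minimal computational sketch of equation (1), the bank of leaky integrators can be simulated with a forward-Euler update over a geometrically spaced set of rate constants. The function name, the value of C, and the particular grid of rate constants below are illustrative choices, not taken from the paper.

import numpy as np

def leaky_integrator_bank(f, dt, s):
    """Forward-Euler simulation of dF_s/dt = -s F_s + f(t) (equation 1).
    f: input samples, dt: time step, s: vector of rate constants.
    Returns F with shape (len(f), len(s)), one column per integrator."""
    F = np.zeros((len(f), len(s)))
    for n in range(1, len(f)):
        F[n] = F[n - 1] + dt * (-s * F[n - 1] + f[n - 1])
    return F

# Rate constants with a fixed ratio between neighbors: s_j / s_{j+1} = C, C > 1.
C, N = 1.5, 20
s = 200.0 * C ** (-np.arange(N))          # from 200 Hz down to ~0.09 Hz

dt = 1e-3
t = np.arange(0.0, 5.0, dt)
f = np.where(t < dt, 1.0 / dt, 0.0)       # approximate delta pulse at t = 0
F = leaky_integrator_bank(f, dt, s)       # column j decays roughly as exp(-s_j t)

With a delta-pulse input, each column is (up to discretization error) the impulse response e^{-s_j t} used in the next subsection.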
2.2. Creating a neural timeline

A simple linear transformation of the instantaneous values of the outputs from the leaky integrators is sufficient to construct a neural timeline, a neural memory representation of what happened when. This effectively means that each different stimulus will have a set of neurons that fire sequentially for a circumscribed period of time following each presentation of that stimulus. This representation has two important properties: 1) it is scale-invariant (the standard deviation of the circumscribed period of elevated firing rate is proportional to its peak time); 2) the peak firing times of the sequentially activated cells are log-spaced. These two properties imply that the resolution of the estimate of when a stimulus happened decays with the time elapsed since the stimulus presentation. This is consistent with the memory decay observed in everyday life and quantified in behavioral experiments. Additionally, this is a resource-saving property: we need only the logarithm of the resources that would be required for uniformly spaced peak firing times (assuming the same maximal time resolution).

To construct a neural timeline, since F_s(t) is an approximation of the Laplace transform of the input signal, we need to invert it and express F_s(t) in terms of a new variable, t*, that corresponds to an internal representation of time. This is achieved by setting the weights between the first and the second layer such that the output of the second layer, f~_{t*}(t), is approximately a k-th order derivative of F_s(t) with respect to s (see [19] for details):

    f~_{t*}(t) = ((-1)^k / k!) s^{k+1} F_s^{(k)}(t),    (3)

where s = -k/t*. We define this linear operator as L_k^{-1}, so that f~_{t*} == L_k^{-1}[F_s]. This inversion provides a dynamically updated timeline maintained by a population of neurons, such that the firing rate of a particular neuron in the second layer represents the input activity t* time ago (Figure 2a and 2b). Neurons with these properties have been reported in multiple areas of the mammalian brain (these are called time cells; see [21] for a recent review).

To explore the frequency properties of this representation we again compute the transfer function using a Fourier transform. Notice first that the impulse response of F_s(t) for an impulse f(t) = delta(t) is F_s(t) = e^{-st}, and in that case the k-th order derivative of F_s(t) reads F_s^{(k)}(t) = (-t)^k e^{-st}. We can now construct a transfer function in the Fourier space by inserting the above expression into equation (3):

    f~_{t*}(omega)/f(omega) = 1/(1 + i*omega/s)^{k+1}.    (4)

The above transfer function is equivalent to a low-pass filter of order k + 1 with unit gain. This property will be useful later to demonstrate that we can construct a frequency-line from this representation as well.
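The operator L_k^{-1} in equation (3) can be approximated numerically by taking k successive finite differences of F_s across the discrete s grid and rescaling. The sketch below is one such approximation, our own illustration with helper names of our choosing, not the paper's implementation.

import numpy as np
from math import factorial

def timeline_from_F(F, s, k=4):
    """Approximate equation (3): f_tstar = ((-1)^k / k!) * s^(k+1) * d^k F / d s^k,
    with the k-th derivative in s replaced by k finite differences on the s grid.
    F: array (T, N) from the integrator bank, s: rate constants (length N)."""
    D, s_mid = F.astype(float), s.astype(float)
    for _ in range(k):
        D = np.diff(D, axis=-1) / np.diff(s_mid)    # one derivative w.r.t. s
        s_mid = 0.5 * (s_mid[:-1] + s_mid[1:])      # midpoints of the shrinking grid
    f_tstar = ((-1) ** k / factorial(k)) * s_mid ** (k + 1) * D
    tstar = k / s_mid   # time in the past represented by each cell (paper: s = -k/t*)
    return f_tstar, tstar

For the delta-pulse input above, column j of f_tstar peaks approximately at t = k/s_j, reproducing the sequentially activated time cells of Figure 2a and 2b.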
[Figure 2, four panels: a s*f~_{t*}(t) with time on a linear scale; b s*f~_{t*}(t) with time on a log scale; c f~_{omega*}(omega) with frequency (Hz) on a linear scale; d f~_{omega*}(omega) with frequency (Hz) on a log scale.]

Fig. 2. Scale-invariant, log-spaced neural support across the time and frequency dimensions (timeline and frequency-line). The y-axis in all four plots represents the firing rate. Both representations are constructed using only the instantaneous firing rate of a set of exponentially decaying cells that received an input at time 0. a (time on a linear scale) and b (time on a log scale): time cells fire sequentially for a circumscribed period of time following the input at time 0. Each curve represents a single cell with scaled amplitude. The firing fields (intervals of elevated firing rate) spread in a scale-invariant fashion and the peak firing times are log-spaced. This set of cells maintains a memory of the previous values of the stimulus such that what happened in the more distant past is represented less accurately than what happened in the more recent past. c (frequency on a linear scale) and d (frequency on a log scale): similarly to the time cells in the time domain, the frequency responses of the neurons that perform the frequency decomposition span the frequency axis in a scale-invariant fashion with log-spaced central frequencies: slower components are represented more precisely than faster components. Each curve again represents a single cell, but in this case what is plotted is the frequency response of the cell to a delta pulse. This set of cells decomposes the input into ordered frequency bands, allowing an estimate of the speed of the input signal.

2.3. Creating a neural frequency-line

Similarly to the construction of a neural timeline, a simple linear transformation of the outputs from the leaky integrators is sufficient to construct a neural frequency-line: a scale-invariant neural decomposition of an input signal into frequency bands with log-spaced central frequencies through a set of temporal receptive windows.

In general, it is of wide interest (for instance in identifying slow components) to estimate the speed of the input signal. This can easily be done by taking a time derivative of the signal. However, this approach is prone to errors when the scale of the target signal is not known a priori, as is often the case. Therefore, we use our dynamical memory representation to construct derivatives on multiple (log-spaced) scales. It turns out that, given s_j / s_{j+1} = C, subtracting the activity of neighboring cells in F_s (scaled by s to have unit gain, Figure 1a) or in f~_{t*}(t) (Figure 1b) provides exactly the desired log-spaced support along the frequency axis. Subtracting neighboring cells is equivalent to taking a first-order derivative with respect to the cell index, which amounts to applying the operator sL_{k+1}^{-1}:

    f~_{omega*}(omega)/f(omega) = 1/(1 + i*omega/s_j)^{k+1} - 1/(1 + i*omega/(C s_j))^{k+1},    (5)

where for integer k = 0 we use F_s(t) scaled by s, while for k > 0 we use f~_{t*}(t). The integer j goes from 1 to N - 1. omega* is given as a function of s and represents an internal frequency-line, in the same way that t* represents an internal timeline. The relationship between omega* and s is derived explicitly below. Notice that s and omega always appear together as the ratio omega/s. This indicates scale invariance: if omega changes, scaling s by the same amount results in the same transfer function.

Because f~_{omega*}(omega) results from the subtraction of the activity of two low-pass filters, its frequency characteristic is band-pass. The central frequencies of the band-pass filters, omega*, are the peaks of |f~_{omega*}(omega)|. They can be expressed analytically in a simple form only for k = 0:

    d|f~_{omega*}(omega)|/d omega = 0  =>  omega* = s sqrt(C).    (6)

Inserting omega* back into equation (5) gives the amplification (gain) of the band-pass filters: (C - 1)/(C + 1). Notice that the amplification does not depend on s, so it is the same for all band-pass filters. To identify the distribution of the central frequencies along the frequency axis we observe the ratio of neighboring central frequencies:

    omega*_{j+1} / omega*_j = 1/C.    (7)

The constant ratio smaller than 1 (since C > 1) indicates that the central frequencies change exponentially from band to band, providing log-spaced temporal receptive fields (Figure 2c and 2d). These ordered temporal receptive windows allow us not only to sort the signal components by their speed, but also to estimate that speed. We discussed above the case k = 0; notice that for any integer k >= 0 the ratio of neighboring central frequencies is constant. Increasing k increases the order of the low-pass filters and consequently makes the resulting band-pass filters steeper.

3. DEMONSTRATION

In this section we demonstrate how the proposed method for frequency decomposition, combined with PCA, can be used for identification of spatiotemporal features of interest. Given a multidimensional input, the frequency decomposition described above should be done independently for each dimension. One can then apply PCA across all the input dimensions separately for each band (N - 1 PCAs, where N is the number of time constants s) to quantify the energy of the dominant components. Projecting the input signal onto the eigenvectors that correspond to the largest eigenvalues (across bands) gives the spatiotemporal components of interest. Additionally, the directions of the eigenvectors indicate which bands represent the same components. Notice that, since PCA is applied in each band separately, we are able to quantify the speed of the identified components.
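A sketch of the pipeline just described, for the k = 0 case: each input dimension is decomposed by differencing the unit-gain (s-scaled) outputs of adjacent leaky integrators, and PCA is then applied across input dimensions separately in each band. The function names and the covariance/eigendecomposition details are our own illustrative choices.

import numpy as np

def bandpass_decomposition(x, dt, s):
    """k = 0 band-pass decomposition (equation 5): for each input dimension,
    take the difference of the unit-gain outputs s_j * F_{s_j} of adjacent
    integrators. x: input array (T, D); s: rate constants (length N).
    Returns bands with shape (N - 1, T, D); the overall sign of each band
    depends only on which neighbor is subtracted."""
    T, D = x.shape
    bands = np.zeros((len(s) - 1, T, D))
    for d in range(D):
        F = np.zeros(len(s))
        for n in range(1, T):
            F = F + dt * (-s * F + x[n - 1, d])          # forward Euler, eq. (1)
            bands[:, n, d] = s[:-1] * F[:-1] - s[1:] * F[1:]
    return bands

def pca_per_band(bands):
    """PCA across the D input dimensions, separately in each band.
    Returns (largest eigenvalue, corresponding eigenvector) per band."""
    out = []
    for B in bands:                                      # B has shape (T, D)
        w, v = np.linalg.eigh(np.cov(B.T))
        out.append((w[-1], v[:, -1]))                    # eigh sorts eigenvalues ascending
    return out

Selecting the bands whose top eigenvalues peak, and projecting each such band onto its top eigenvector, corresponds to the selection step used in the demonstration below.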
We constructed a two-dimensional signal such that each dimension was composed of a weighted sum of a slow 7 Hz sine wave (x_s) and a fast 71 Hz cosine wave (x_f) (Figure 3a). To each input we added binary multiplicative noise designed as a square wave with a duty cycle varying randomly between 1 and 10 ms. This was introduced to mimic the occlusions that are generally present in visual and other types of input. The total duration of the signal was 5 s. Our goal was to isolate the components x_s and x_f assuming no prior knowledge about the input signal.

The presence of multiplicative noise makes the problem poorly suited for SFA. SFA relies on taking a time derivative of the input data to obtain an estimate of the velocity. A numerical derivative of the given input signal results in a rather noisy output that obscures the true components. The output of classical SFA with a 0.1 ms lagged time derivative is shown in the first column of Figure 3b. If we know the scale of the target signal we can reduce the noise by analyzing the signal at that scale. To illustrate this we filtered the input signal with a first-order low-pass filter with a cutoff frequency of 20 Hz (different choices of the cutoff frequency led to similar or worse results). The result of applying SFA to this signal is shown in the second column of Figure 3b. The slow, 7 Hz component was estimated fairly well in this case, while the fast, 71 Hz component was still not accessible. Since in general we do not know the target frequencies a priori, it is not possible to do the filtering at the optimal scale.

This is where the proposed method becomes useful: we can exploit the fact that we have a memory of the recent past and decompose the signal into multiple bands spanning all the frequencies of interest (from some minimum to some maximum frequency). We constructed a scale-invariant memory of each of the two input dimensions with N = 20 and k = 0. From this memory representation we built a frequency decomposition into 19 frequency bands (Figure 3c). The cutoff frequencies decayed exponentially from 142 Hz to 2.4 Hz. In each band we applied PCA and found the largest eigenvalues and the corresponding eigenvectors. As shown in the first column of Figure 3c, the direction of the eigenvectors indicates whether a particular band dominantly represents x_s or x_f, while the eigenvalues indicate the energy in each band. Projection of the band-pass filtered input signal onto the eigenvectors that corresponded to the peak eigenvalues led to signals that resemble the original x_f and x_s.

[Figure 3, three panels: a Original signals (x_s = cos(14*pi*t), x_f = cos(142*pi*t), noise x_m) and Input mixture (x_m(x_s + 2x_f), x_m(2x_s + x_f)); b SFA with large dt and SFA with small dt, each showing estimates x~_s and x~_f; c Bandpass decomposition, showing x~_s and x~_f, a frequency axis f [Hz] from 1 to 100, and eigenvalue-scaled eigenvectors lambda_{i,1} v_{i,1} and lambda_{i,1} v_{i,2}.]

Fig. 3. Comparison of the band-pass decomposition and SFA in identifying significant frequency components in a noisy input. a The two-dimensional input signal consists of a mixture of a sine and a cosine wave (7 Hz and 71 Hz, respectively) and multiplicative noise. The x-axis is time and spans 1 s; the y-axis is unitless signal amplitude. b Outputs of SFA for derivatives taken at two different temporal scales of the signal. c The input signal is decomposed into 19 frequency bands (N = 20). The left column shows the eigenvectors (v_i) scaled by the largest eigenvalue (lambda_i) for each frequency band. The right column shows projections of the input signal onto the two eigenvectors that correspond to the peak eigenvalues.
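For completeness, the two-dimensional test input described above can be reconstructed along the following lines. The mixing weights follow the labels in Figure 3a; the exact realization of the occlusion noise and the random seed are assumptions on our part.

import numpy as np

rng = np.random.default_rng(0)
dt = 1e-4
t = np.arange(0.0, 5.0, dt)

x_s = np.cos(14 * np.pi * t)      # slow 7 Hz component
x_f = np.cos(142 * np.pi * t)     # fast 71 Hz component

# Binary multiplicative "occlusion" noise: a square wave whose on/off segment
# lengths vary randomly between 1 and 10 ms.
x_m = np.ones_like(t)
i, level = 0, 1.0
while i < len(t):
    seg = int(rng.uniform(1e-3, 1e-2) / dt)
    x_m[i:i + seg] = level
    level = 1.0 - level
    i += seg

x = np.stack([x_m * (x_s + 2.0 * x_f),    # two-dimensional mixture, as in Figure 3a
              x_m * (2.0 * x_s + x_f)], axis=1)
# x can be fed directly to bandpass_decomposition(x, dt, s) above.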
Notice that the band-pass decomposition not only identified the slow and fast components, but also provided an estimate of their frequencies (on a log-spaced scale). SFA, on the other hand, in general only provides the rank of the identified components with respect to their speed.

4. DISCUSSION

We proposed a biologically plausible neural method for constructing a scale-invariant frequency decomposition through a set of band-pass filters with log-spaced central frequencies. We demonstrated that this frequency decomposition has properties analogous to those of an internal timeline that can be constructed from the same memory representation. We characterized these properties in terms of the central frequency and gain and described how the frequency decomposition can be constructed from neurally plausible leaky integrators.

The crucial properties of the proposed method are scale-invariance and log-spacing. When designing neural systems for processing natural signals it is essential to account for processing on multiple scales spanning from milliseconds to hundreds of seconds, since natural signals tend to be autocorrelated on a variety of temporal scales. Maintaining the same frequency resolution for all frequency bands would require the number of bands to grow linearly with frequency. This would be costly in terms of resources and computationally inefficient, since we generally do not need high resolution at high frequencies. The proposed method therefore provides a good trade-off between resolution and resources, and in fact an optimal trade-off when the input signals are scale-invariant. Notice that memory systems based on register-style buffers or shift registers could not account for the desired gradually decaying temporal resolution. The parallel design of our approach (leaky integrators operating in parallel, independently of each other) provides a simple solution for the gradual decay in temporal accuracy, which in turn allows us to construct the frequency-line with the desired properties.

The ability to decompose the input signal into multiple frequency bands can be particularly useful for extracting features that change slowly relative to the sensory representation. For instance, the activity of visual receptors changes rapidly compared to the identity or position of the object in front of the eye. Existing unsupervised approaches such as SFA require some prior knowledge about the temporal scale of the signal. Additionally, the band-pass decomposition allows not only identification of the components of interest, but also an estimate of their speed on a log scale.

The proposed method does not require convolution, unlike many other frequency decomposition methods. This makes it particularly suitable for neural implementation. Since our approach is based on biologically plausible neurons, neuromorphic platforms (see [22] for a recent example) could benefit from this approach. Designing neurons that can have long time constants would be a step towards making these computations simple to implement in neuromorphic hardware. Even though long time constants present a large challenge in analog electronics, the simplicity of their neural implementation [17] could provide an inspiration. In biological systems, at the single-neuron level, diffusion time constants are on the order of 1 s, but when combined with renewal mechanisms they can be extended indefinitely (as demonstrated by frequently observed persistent firing).
Given such long time constants, the proposed method additionally requires only lateral inhibition mechanisms for taking the spatial derivatives.

5. REFERENCES

[1] M. J. Hacker, "Speed and accuracy of recency judgments for events in short-term memory," Journal of Experimental Psychology: Human Learning and Memory, vol. 15, pp. 846–858, 1980.

[2] P. A. Lewis and R. C. Miall, "The precision of temporal judgement: milliseconds, many minutes, and beyond," Philosophical Transactions of the Royal Society B: Biological Sciences, vol. 364, no. 1525, pp. 1897–905, 2009.

[3] M. W. Howard, T. E. Youker, and V. Venkatadass, "The persistence of memory: Contiguity effects across several minutes," Psychonomic Bulletin & Review, vol. 15, pp. 58–63, 2008.

[4] B. C. Rakitin, J. Gibbon, T. B. Penny, C. Malapani, S. C. Hinton, and W. H. Meck, "Scalar expectancy theory and peak-interval timing in humans," Journal of Experimental Psychology: Animal Behavior Processes, vol. 24, pp. 15–33, 1998.

[5] P. Goldman-Rakic, "Cellular basis of working memory," Neuron, vol. 14, pp. 477–85, 1995.

[6] R. F. Voss and J. Clarke, "1/f noise in music and speech," Nature, vol. 258, pp. 317–318, 1975.

[7] D. W. Dong and J. J. Atick, "Statistics of natural time-varying images," Network: Computation in Neural Systems, vol. 6, no. 3, pp. 345–358, 1995.

[8] E. Alvarez-Lacalle, B. Dorow, J. P. Eckmann, and E. Moses, "Hierarchical structures induce long-range dynamical correlations in written texts," Proceedings of the National Academy of Sciences, USA, vol. 103, no. 21, pp. 7956–61, 2006.

[9] V. Sreekumar, S. Dennis, I. Doxas, Y. Zhuang, and M. Belkin, "The geometry and dynamics of lifelogs: discovering the organizational principles of human experience," PLoS One, vol. 9, no. 5, pp. e97166, 2014.
[10] C. J. Honey, T. Thesen, T. H. Donner, L. J. Silbert, C. E. Carlson, O. Devinsky, W. K. Doyle, N. Rubin, D. J. Heeger, and U. Hasson, "Slow cortical dynamics and the accumulation of information over long timescales," Neuron, vol. 76, no. 2, pp. 423–34, 2012.

[11] Y. Lerner, C. J. Honey, L. J. Silbert, and U. Hasson, "Topographic mapping of a hierarchy of temporal receptive windows using a narrated story," Journal of Neuroscience, vol. 31, no. 8, pp. 2906–15, 2011.

[12] M. W. Howard, C. J. MacDonald, Z. Tiganj, K. H. Shankar, Q. Du, M. E. Hasselmo, and H. Eichenbaum, "A unified mathematical framework for coding time, space, and sequences in the hippocampal region," Journal of Neuroscience, vol. 34, no. 13, pp. 4692–707, 2014.

[13] M. W. Howard, K. H. Shankar, W. Aue, and A. H. Criss, "A distributed representation of internal time," Psychological Review, vol. 122, no. 1, pp. 24–53, 2015.

[14] K. H. Shankar and M. W. Howard, "Optimally fuzzy scale-free memory," Journal of Machine Learning Research, vol. 14, pp. 3753–3780, 2013.

[15] A. V. Egorov, B. N. Hamam, E. Fransén, M. E. Hasselmo, and A. A. Alonso, "Graded persistent activity in entorhinal cortex neurons," Nature, vol. 420, no. 6912, pp. 173–8, 2002.

[16] M. E. Sheffield, T. K. Best, B. D. Mensh, W. L. Kath, and N. Spruston, "Slow integration leads to persistent action potential firing in distal axons of coupled interneurons," Nature Neuroscience, vol. 14, no. 2, pp. 200–7, 2011.

[17] Z. Tiganj, M. E. Hasselmo, and M. W. Howard, "A simple biophysically plausible model for long time constants in single neurons," Hippocampus, vol. 25, no. 1, pp. 27–37, 2015.

[18] L. Wiskott and T. J. Sejnowski, "Slow feature analysis: unsupervised learning of invariances," Neural Computation, vol. 14, no. 4, pp. 715–70, 2002.

[19] K. H. Shankar and M. W. Howard, "A scale-invariant representation of time," Neural Computation, vol. 24, no. 1, pp. 134–193, 2012.

[20] K. H. Shankar, "Generic construction of scale-invariantly coarse grained memory," arXiv:1406.3185 [q-bio.NC], 2014.

[21] H. Eichenbaum, "Time cells in the hippocampus: A new dimension for mapping memories," Nature Reviews Neuroscience, in press.

[22] P. A. Merolla, J. V. Arthur, R. Alvarez-Icaza, A. S. Cassidy, J. Sawada, F. Akopyan, B. L. Jackson, N. Imam, C. Guo, Y. Nakamura, et al., "A million spiking-neuron integrated circuit with a scalable communication network and interface," Science, vol. 345, no. 6197, pp. 668–673, 2014.