A rank encoder: Adaptive analog to digital conversion exploiting time domain spike signal processing∗

Philipp Häfliger and Elin Jørgensen Aasebø
Department of Informatics, University of Oslo

Abstract. An electronic circuit is presented that encodes an array of analog input signals into a digital number. The digital output is a 'rank order code' that reflects the relative strength of the inputs but is independent of the absolute input intensities. In that sense, the circuit performs an adaptive analog to digital conversion, adapting to the average intensity of the inputs (i.e. effectively normalizing) and adapting the quantization levels to the spread of the inputs. Thus, it can convey essential information with a minimal number of output bits over a huge range of input signals. As a first processing step the analog inputs are projected into the time domain, i.e. into voltage spikes, whose latency encodes the strength of the input. This conversion enables the circuit to carry out the subsequent processing steps on analog signals with asynchronous logic. The circuit was implemented as a prototype on a VLSI chip, fabricated in the AMS 0.6µm process. The implementation takes 31 analog inputs that are delivered as frequency encoded spike trains by a 5-bit AER (address event representation) bus. The output is 128 bits wide and is read serially in 16-bit packages. Based on tests with one particular set of inputs with an entropy of 112.66 bits, it is estimated that the chip is able to convey 40.81 bits of information about that input set. Possible applications can be found in speech recognition and image processing.

Keywords: analog to digital conversion (ADC), normalization, rank order code, time domain signal representation, temporal code

∗ This work was supported by the ASICs for MEMS project of the Norsk Forskningsrådet.

© 2003 Kluwer Academic Publishers. Printed in the Netherlands.

1. Introduction

1.1. Rank order encoding

The principle of rank order encoding (Thorpe et al., 1996) is inspired by observations in neurophysiology. Neurons (brain cells) transmit signals in the form of short voltage pulses, so-called action potentials or spikes. How the nervous system codes information with these spike signals is an open research area. Pioneers in neurophysiology found correlations between stimuli and the firing frequency of neurons, e.g. between optical stimuli and neurons in the visual cortex (Hubel and Wiesel, 1962). Thus, early neural models considered the information transmitted between neurons to be encoded in the average spike rate. More recent observations, however, indicate that these rate models are not sufficient. A strong argument for this view comes from experiments described in (Thorpe et al., 1996). In these experiments, monkeys and human subjects were presented with pictures. They had to decide whether they saw an animal in the picture or not and press one of two buttons accordingly. The subjects needed about 200-500ms to solve this task. The experimenters argued that along the neural processing path from the eye to the motor reaction, each neuron involved only had time to fire once or twice within this time limit. Therefore the necessary information could not have been encoded in firing frequency but must somehow have been conveyed with only one or two spikes per processing stage. There are several models of how this can be achieved (Thorpe et al., 1996; Maass and Natschläger, 1997; Maass, 1999). Some of them consider the relative timing of individual spikes among a group of neurons to be important. The principle of rank order encoding (Thorpe et al., 1996) is one of them. In this model the relevant information within a group of neurons is the order in which they fire their first spike after a stimulus presentation. A rank order code is then a permutation of the neurons of this group. This order of firing corresponds to the order of the strengths of the stimulation that the individual neurons are exposed to: neurons that are stimulated more strongly tend to produce a spike earlier. If one thinks of these neurons as being stimulated by pixels in a picture, then the neurons associated with the brightest pixels will fire first. Much of the essential information in the picture is preserved in the rank order code. Some information, however, is discarded. For example, the absolute intensities of the inputs cannot be reconstructed from the code: a rank order code will be independent of the illumination level. This is an advantage for many image processing tasks. Object recognition, for instance, should not depend on whether the picture of an object has been taken in a dim room or in bright daylight.
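The essence of the code is easy to state computationally. The following minimal sketch (our illustration, not part of the original model) derives the rank order code of an intensity array as the permutation that sorts it, and shows that the code is invariant under a common scaling of the inputs:

```python
import numpy as np

def rank_order_code(intensities):
    """The rank order code of a stimulus: the permutation that sorts the
    channels by descending intensity (strongest channel fires first)."""
    return np.argsort(-np.asarray(intensities, dtype=float))

pixels = np.array([0.7, 0.2, 0.9, 0.4])
print(rank_order_code(pixels))        # [2 0 3 1]
print(rank_order_code(0.1 * pixels))  # identical code at a tenth of the intensity
```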
1.2. Address event representation

Address event representation (AER) is a communication protocol that provides virtual point-to-point connections for constant-length pulse signals. It was originally conceived as a means to connect spiking artificial neurons (Mahowald, 1994; Mortara and Vittoz, 1994). In our system we use it to convey an array of analog values that are encoded in pulse frequency.

In AER, a fast asynchronous digital n-bit bus (a 5-bit bus in the case of the input to our system, as depicted in figure 1) provides 2^n virtual pulse signal connections between multiple computational elements on one or on several chips. A sender of such pulse streams could be a sensor array chip of some sort. Whenever an element of that array transmits a pulse, that pulse appears as a digital number on the bus. That number is the sending element's 'address', i.e. its chip coordinate. On the receiver side, this address is demultiplexed and a fixed-length pulse is reconstructed and delivered to the receiving element.

Figure 1. The block diagram of the system including the input and output circuitry (AER receiver, integrator array, rank encoder, and output arbiter).

2. Methods

The system described in the block diagram in figure 1 has been implemented as a prototype on a 3 x 3 mm VLSI chip fabricated in the 0.6µm AMS process. The prototype receives 31 analog inputs and issues a 128-bit output code. The core circuit of the system ('rank encoder' in figure 1) translates an array of 31 analog current inputs first into spike/time-domain signals and then into a rank order code. The reset signal serves as a trigger that starts a new computation as it goes low. The array of time-varying analog inputs to the rank encoder could, for example, be pixel values from an image sensor or an acoustic signal processed by a filter array. This encoder could be the output stage of such a sensor chip. The rank encoder is described in more detail in subsection 2.2.

For this test chip we simulated the sensor inputs. Due to pin limitations, we could not provide all 31 analog inputs directly from off-chip. We added an input stage which receives AER signals via a 5-bit bus¹. The AER sequences were computed on a PC to represent 31 Poisson distributed pulse streams. They were then downloaded to a digital pattern generator and sent onto the AER bus.

¹ One of the 32 input addresses is not used, since the rank encoder was designed to receive only 31 inputs; this results in a more convenient number of output bits.
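For readers who want to reproduce such a stimulus, the following sketch shows one plausible way (our own construction, not the actual pattern-generator tool chain) to build a Poisson distributed AER event list on a PC: each channel contributes spikes with exponential inter-spike intervals, and all spikes are time-multiplexed onto one bus as (time, address) pairs.

```python
import numpy as np

rng = np.random.default_rng(0)

def poisson_aer_stream(rates_hz, duration_s):
    """Merge independent Poisson spike trains into one (time, address) list,
    modelling the AER stimulus: each channel's intensity is carried by the
    average frequency of its pulse stream, and every pulse appears on the
    shared bus as its sender's address."""
    events = []
    for address, rate in enumerate(rates_hz):
        t = rng.exponential(1.0 / rate)            # Poisson process:
        while t < duration_s:                      # exponential intervals
            events.append((t, address))
            t += rng.exponential(1.0 / rate)
    events.sort()                                  # time-multiplex onto the bus
    return events

rates = np.linspace(100.0, 3100.0, 31)             # intensity range used in the tests
stream = poisson_aer_stream(rates, duration_s=0.05)
print(len(stream), stream[:3])
```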
An 'AER receiver' (figure 1) reconstructs the 31 pulse streams, which encode the momentary input intensity in pulse frequency. Those pulse frequencies are transformed into analog input currents for the rank encoder by the 'integrator array' (figure 1). The whole input block is further described in subsection 2.1. The 128-bit output code of the rank encoder is read out serially in packages of 16 bits, starting with the most significant bits (MSBs). Further explanations of the output block are given in subsection 2.3.

In the following descriptions of the individual circuit blocks, some transistor width-to-length ratios are labelled in the circuit diagrams next to the transistors. All other transistors are of minimal dimensions (W/L = 1.4/0.6). The biases used in the experiments are written in brackets under the bias names in the schematics.

2.1. The input block

The input block consists of an AER receiver and an array of leaky integrators. The AER receiver block consists of a demultiplexer and some control circuitry (not shown). The latter handles the handshake protocol of the bus communication and ensures that the decoded pulses are of fixed length, independent of the timing of the handshake.

The analog input currents are reconstructed by an array of leaky integrator circuits (figure 2). A fixed-width voltage pulse coming from the AER receiver (inpulse) switches on $I_1$ and thus decrements $V_C$ by an amount set by the bias timeconst. The charge that has been withdrawn is replaced more slowly by the current $I_2$ through the diode-connected pfet connected to the supply voltage $V_{gain-}$. Since $I_2$ replaces exactly the charge that $I_1$ withdraws, the averages $\hat{I}_1$ and $\hat{I}_2$ of the two currents are equal, and this average is the input pulse frequency $\nu$ multiplied by the current $I_{M1}$ that flows through transistor M1 when it is in saturation (i.e. during a pulse) and by the pulse length $T$:

$$\hat{I}_1 = \hat{I}_2 = I_{M1} T \nu \qquad (1)$$

Figure 2. The 'integrator' circuit that computes an analog current from the frequency of the incoming voltage pulses (biases used: $V_{gain-}$ = 4.28V, $V_{gain+}$ = 5.00V, timeconst = 0.40V; integration capacitor 500fF).

$I_2$ is a sort of low-pass filtered or averaged version of $I_1$. If $I_2$ flowed through a resistor instead of a diode, this averaging would occur with a fixed time constant. In this circuit, however, the rate of decay of $I_2$ decreases with the voltage across the diode ($V_{gain-} - V_C$). Thus, smaller decrements of $V_C$ per inpulse increase the time constant of decay of $I_2$. $I_2$ is mirrored into the output current $I_{out}$. By adjusting $V_{gain-}$ and $V_{gain+}$, positive or negative gain can be added to this stage. The following formula for the gain $G$ is valid for subthreshold currents, under the assumption that the slope factor is only marginally different for the two source voltages; $U_T$ is the thermal voltage.

$$\frac{I_{out}}{I_2} \approx \frac{e^{\frac{1}{U_T} V_{gain+}}}{e^{\frac{1}{U_T} V_{gain-}}} =: G \qquad (2)$$
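To illustrate the behaviour examined later in figure 11, here is a purely behavioural sketch of such a leaky integrator (a first-order charge/leak model with assumed values for the time constant and pulse charge, not a transistor-level model): each pulse deposits a fixed charge, the charge leaks away continuously, and the ripple remaining on the output is relatively larger at low pulse rates.

```python
import numpy as np

rng = np.random.default_rng(1)

def integrator_output(rate_hz, duration_s=2.0, dt=1e-5, tau=5e-3, q_pulse=1e-12):
    """Behavioural stand-in for the integrator stage: every input pulse
    deposits a fixed charge q_pulse, the stored charge leaks with time
    constant tau, and the output current is the leak current q / tau
    (mean value q_pulse * rate, as in eq. (1) up to the gain G)."""
    n = int(duration_s / dt)
    spikes = rng.random(n) < rate_hz * dt          # Bernoulli approximation
    q, out = 0.0, np.empty(n)
    for k in range(n):
        q += q_pulse * spikes[k]
        q -= q * dt / tau                          # leak through the diode
        out[k] = q / tau
    return out[n // 2:]                            # discard the start-up transient

for rate in (100.0, 194.0, 3100.0):                # the three traces of figure 11
    i = integrator_output(rate)
    print(f"{rate:6.0f} Hz: relative ripple {i.std() / i.mean():.2f}")
```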
2.2. The rank encoder

The main block in figure 1, the rank encoder, computes the permutation of the inputs according to the order of strength of the analog input currents. The circuit for 7 analog current inputs is illustrated in figure 3. The function and signal flow are illustrated for a 4-input example in figure 4.

Figure 3. The schematics of a 7 input rank encoder: I&F neurons on the left, a cascade of shrinking timing WTA columns with encoder outputs out1..16 of widths 3, 3, 3, 3, 2, 2 bits (pullup_bias = 3.89V).

Figure 4. An example of the signal flow through a 4 input rank encoder.

The input currents ($I_{in_{1..7}}$) are first latency encoded by so-called integrate and fire (I&F) neurons on the left of the block diagram in figure 3: the neuron that receives the biggest input current fires first, followed by the one that receives the second strongest input, and so on (see the example in figure 4). The stronger the input, the smaller the latency. This is a general property of I&F neurons: they integrate an input current until a threshold is reached and then emit an output pulse. Thus, the latency $l_i$ of that pulse $spike_i$ is proportional to the inverse of the input current $I_{in_i}$, with proportionality constant $C$, as given in equation (3) and illustrated in figure 5.

$$l_i = C \, \frac{1}{I_{in_i}} \qquad (3)$$

Figure 5. The principle of latency encoding.

The implementation of our I&F neuron (figure 6) is based on C. Mead's 'self-resetting neuron' (Mead, 1989). Our adaptation is no longer self-resetting. The output goes high as the voltage on the capacitor that integrates the input current reaches the switching threshold. Positive feedback then pulls the voltage on the capacitor up to Vdd as well, stabilizing that state. The circuit needs to be reset explicitly to restart the whole process. This is done by the global reset signal.

Figure 6. The integrate and fire neuron that projects a current input into the time domain (integration capacitor 150fF).

The actual rank order encoding is then performed by a cascade of 'timing winner-take-all' circuits (the columns in figure 3). The latency encoded signals are passed on from the I&F neurons to the first timing winner-take-all (WTA) column, which determines a winner (the earliest spike, i.e. the strongest input) and re-routes the spike signals to the next column, removing that winner from the further competition (circles in figure 4). Thus, each subsequent column consists of one less WTA cell. The column results are transmitted by encoders (out1..16). By default the encoders issue a zero as long as none of their inputs are active. Thus, any non-zero encoder output indicates that a spike has been registered and that the timing WTA column has reached a decision. The global reset signal resets all WTA cells.
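The combined behaviour of the I&F neurons and the WTA cascade can be summarized in a short functional model. The sketch below (our abstraction; the chip itself does this with asynchronous logic) latency-encodes the inputs via equation (3) and then repeatedly extracts and removes the earliest remaining spike, emitting, per column, the winner's position among the channels still competing:

```python
import numpy as np

def rank_encode(input_currents, C=1.0):
    """Functional model of the rank encoder core (figure 3).

    Step 1: latency encoding by I&F neurons, l_i = C / I_in_i (eq. 3),
    so the strongest input spikes first. Step 2: a cascade of timing WTA
    columns; each column declares the earliest remaining spike its winner,
    outputs that winner's (1-based) position among the channels still
    competing, and removes it from the further competition.
    """
    latencies = C / np.asarray(input_currents, dtype=float)
    remaining = list(range(len(latencies)))      # channels still competing
    column_outputs = []
    while remaining:
        # earliest spike among the remaining channels wins this column;
        # min() takes the first position on exact ties, mimicking the
        # topmost-cell-wins conflict resolution of the circuit
        pos = min(range(len(remaining)), key=lambda p: latencies[remaining[p]])
        column_outputs.append(pos + 1)           # encoders reserve 0 for
        remaining.pop(pos)                       # 'no decision yet'
    return column_outputs

print(rank_encode([2.0, 5.0, 1.0, 4.0]))         # a 4-input example as in figure 4
```

Note how later columns index into a shrinking set of competitors, which is what allows their encoders to be narrower.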
A single timing WTA cell is shown in figure 7. Its task, to determine whether its input is the winner among the latency encoded analog signals, has become a rather simple problem that can be solved by asynchronous logic. The first of the identical WTA cells within a column that receives an input spike ($inspike_i$) declares itself the winner (the flipflop is set) and disables its neighbours: the active flipflop output pulls ENS low, which is global within a column and disables the flipflop inputs. The active flipflop also forces the signal $choice_i$ low, and this in turn forces all lower $choice_j$ ($j \le i$) signals low.

Figure 7. One timing winner-take-all (WTA) cell.

The signal $choice_{i+1}$ decides whether the direct input ($inspike_i$) to a WTA cell or the input to the cell above ($inspike_{i+1}$) is routed to the neighbouring cell in the next column, using a one-bit bus multiplexer. All cells below the winning cell pass on their direct input ($choice_{i+1}$ is low). All other cells pass on the input of the neighbour above ($choice_{i+1}$ is high). Thus, the winning signal itself is not passed on. Note that this implementation also conveniently resolves conflicts: if two spike signals are close enough together in time to cause two flipflops to be set within one column, the topmost will be declared the winner. Only this cell will have a high $choice_{i+1}$ signal and a low $choice_i$ signal, and thus only for this cell will the signal $encode_i$ be high. $encode_i$ is the input to the encoder that encodes the winner of that column. The spike signal ($outspike_i$) that is routed to the next column is gated by the signal en_out to avoid glitches. en_out is simply the inverted $choice_i$ of the bottommost WTA cell in the column.

The removal of the column winners reduces the number of output bits: fewer bits are needed to encode the winner in smaller columns (see figure 3). For instance, our prototype chip with 31 analog inputs would require $\log_2 31! = 112.66$ output bits for an optimal encoding of all possible 31! rank permutations. The encoding proposed here requires 128 bits, which means that merely 12% of them are redundant. By comparison, a solution without reducing the column height would result in a 155-bit output, which would mean 27% redundancy. For an arbitrary number $s$ of analog input channels, the number of output bits for different encoding strategies is given by equation (4).

$$
\begin{aligned}
N &= \lceil \log_2 (s+1) \rceil \\
n_{obv} &= N s \\
n_{red} &= N(s+1) - (2^N - 1) - 1 \\
n_{opt} &= \lceil \log_2 (s!) \rceil
\end{aligned} \qquad (4)
$$

$n_{obv}$ is the number of output bits needed for the 'obvious' solution without reducing the size of the columns. $n_{red}$ is the output size for the solution used by us, which reduces the size of the WTA columns; it requires 19% fewer bits on average for 2 to 1023 analog inputs. $n_{opt}$ is the optimal encoding of permutations of $s$ elements; $n_{red}$ is on average 7% bigger than this optimal encoding for 2 to 1023 analog inputs.
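Equation (4) is easy to check numerically; the following snippet recomputes the output sizes for the 31-input prototype and the redundancy figures quoted above:

```python
import math

def output_bits(s):
    """Output sizes of eq. (4) for s analog input channels."""
    N = math.ceil(math.log2(s + 1))                  # bits per full-width column code
    n_obv = N * s                                    # constant-height columns
    n_red = N * (s + 1) - (2 ** N - 1) - 1           # shrinking columns (this design)
    n_opt = math.ceil(math.log2(math.factorial(s)))  # ideal permutation code
    return n_obv, n_red, n_opt

n_obv, n_red, n_opt = output_bits(31)
print(n_obv, n_red, n_opt)                           # 155 128 113
info = math.log2(math.factorial(31))                 # 112.66 bits of rank information
print(f"redundancy: {1 - info / n_red:.0%} vs {1 - info / n_obv:.0%}")  # 12% vs 27%
```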
Instead of using time domain encoding, a design that operates completely in the voltage or current domain could have been developed. Our original motivation for employing time domain processing was to explore the possibilities of a kind of encoding suggested by observations of the nervous system. One advantage that we discovered in the design process was the easy elimination of the winners of previous columns by simple digital bus multiplexers. In contrast, the multiplexing of two analog signals is neither as trivial nor as noise-free. More generally, by using spikes in the time domain the design turns from analog into asynchronous digital, which is in many respects easier to handle.

Time domain signals also offer an interesting method to increase the signal-to-noise ratio. A standard way of achieving this is to increase the range over which a signal is represented. If a proportional increase of the noise can be avoided, the signal-to-noise ratio improves. In an analog circuit one can increase the supply voltage and represent voltage signals, for example, between 0V and 5V instead of between 0V and 3.5V. The thermal noise level remains approximately constant and thus the signal quality improves. Likewise, in a current-mode analog circuit, signals can be represented by bigger currents to achieve that improvement. For example, the output of a current-mode WTA array as described in (Mead, 1989) will be less affected by thermal noise if the input currents and the bias current are increased proportionally. The price to pay is an increase in power consumption. In a time domain circuit, representing signals not in a bigger voltage or current range but over a bigger time interval has the same effect. Thus, if the input pulses to the timing WTA are spread over a bigger time interval, the output is less likely to be affected by jitter in the delay of these pulses. The cost of this improvement is not power but the speed of the computation.

It is remarkable, though, that in a time domain circuit one also reduces the effect of device mismatch in this way. Since the computational elements in a time domain circuit are asynchronous logic gates, their mismatch is best expressed as a difference in gate delay. These mismatches remain constant when the signal range is increased. For instance, a WTA cell that receives an input pulse at 60ns delay might win unjustly over another that receives a 55ns delay input, because it switches 10ns faster. That unfair advantage no longer matters if those two inputs are represented with four times bigger delays: 240ns and 220ns. The same trick does not work for the current-mode WTA: if a 100nA input wins unfairly over a 110nA input due to mismatch of the transistor properties in the central differential-pair-like structure, it will also win unfairly if the input currents are increased to 400nA and 440nA respectively, because the mismatches of transistor drain currents are proportional to those currents.

2.3. The output block

The output block in figure 1 (circuit schematics not shown) is an asynchronous logic block that has the task of sending the resulting 128-bit code out in 16-bit packages by a handshake protocol. An arbiter circuit mediates between the external receiver and the rank encoder. The 16 bits encoding the higher ranks are sent first. With 31 analog inputs, the first 16 WTA columns have 5-bit outputs. Thus, the first 16 bits consist of the 3 x 5 bits from the first three WTA columns plus the most significant bit from the fourth column. As soon as these first 16 bits are ready, as indicated by the signal en_out from the fourth WTA column, the arbiter sends them out. As soon as the first 16-bit package has been transmitted and the next 16 bits are ready, they are sent out, and so on. The arbiter makes sure that the packages are sent in the correct order.
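The packaging can be pictured with a short sketch. It assumes (consistent with equation (4) and figure 3, though not spelled out in the text) that the encoder of a column with c cells is ceil(log2(c+1)) bits wide and that the final one-cell column has no encoder; the column codes are then concatenated MSB-first and cut into 16-bit packages:

```python
import math

def column_widths(s=31):
    """Encoder width of each WTA column: a column with c cells needs
    ceil(log2(c + 1)) bits (cell index plus the reserved all-zero
    'no decision yet' code); the final one-cell column has no encoder."""
    return [math.ceil(math.log2(c + 1)) for c in range(s, 1, -1)]

def pack_output(column_codes, word=16):
    """Concatenate the column codes MSB-first and cut into 16-bit packages."""
    bits = "".join(format(code, f"0{w}b")
                   for w, code in zip(column_widths(), column_codes))
    return [bits[i:i + word] for i in range(0, len(bits), word)]

widths = column_widths()
print(sum(widths), widths[:4])       # 128 bits in total; columns 1-4 are 5 bits wide
print(pack_output([1] * 30)[0])      # first package: 3 full codes + column 4's MSB
```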
3. Results

3.1. Test setup

In a final test run the circuit was tested with three different input distributions that were permutations of the same set of input currents. The 31 input currents were frequency encoded and conveyed as Poisson distributed spike trains by the AER input bus. The 31 average frequencies $\nu_i$ ($i \in [1, 31]$) of those spike trains ranged from 100Hz to 3100Hz. The three distributions were defined as follows:

$$
\begin{aligned}
\text{input distribution 1:} \quad \frac{1}{\nu_i} &= \frac{1}{100} - \frac{31-i}{30}\left(\frac{1}{100} - \frac{1}{3100}\right) \\
\text{input distribution 2:} \quad \frac{1}{\nu_i} &= \frac{1}{100} - \frac{i-1}{30}\left(\frac{1}{100} - \frac{1}{3100}\right) \\
\text{input distribution 3:} \quad \frac{1}{\nu_i} &=
\begin{cases}
\frac{1}{100} - \frac{32-2i}{30}\left(\frac{1}{100} - \frac{1}{3100}\right) & i \in [1, 16] \\
\frac{1}{100} - \frac{2i-33}{30}\left(\frac{1}{100} - \frac{1}{3100}\right) & i \in [17, 31]
\end{cases}
\end{aligned} \qquad (5)
$$

The rank orders of these input distributions are represented by the solid lines in figures 8, 9, and 10. The difference from one input intensity to the next strongest in these three distributions theoretically results in evenly spaced firing times of the integrate and fire neurons: according to equations (1) and (2) the input currents $I_{in_i}$ to the rank encoder are proportional to the input frequencies $\nu_i$, and the latency $l_i$ is proportional to the inverse of that input current according to (3). Thus, the following equation holds for the theoretically constant interval $\Delta l$ from the firing of any one neuron to the next firing among the other neurons:

$$\Delta l = \frac{C}{I_{M1} T G} \cdot \frac{1}{30}\left(\frac{1}{100} - \frac{1}{3100}\right) \qquad (6)$$

That should make the distinction between two subsequent input intensities equally difficult for the rank encoder at low and at high intensities.

The maximal input intensity of 3100 spike events per second was chosen as high as possible without coming too close to overloading the input AER bus: the digital pattern generator was set to produce address events at maximally 200 kHz, and stimulating all 31 inputs with 3100 spikes per second would require an average event rate of 96.1 kHz, i.e. about half of that maximal capacity.

In the tests the chip was reset with a 20Hz clock. One of the factors that determined this frequency is that it sets a lower limit for the input intensities in the time domain: if the reset frequency had been somewhat higher, the weakest input intensity of 100 spikes per second would not have caused a spike at all between two reset pulses. Thus, the output code would always have been incomplete, without the lower ranks assigned to an input channel. (Instead of using a clock for reset, the circuit could have been set to reset itself after the completion of the output rank order code. The input intensity would in that case determine the number of outputs per time unit.)
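A few lines of code verify the construction of equation (5): the three patterns use the same 31 frequencies in different orders, and the inverse frequencies (hence, by equation (6), the ideal firing times) are evenly spaced:

```python
import numpy as np

f_lo, f_hi = 100.0, 3100.0
i = np.arange(1, 32)
step = (1/f_lo - 1/f_hi) / 30                      # spacing of inverse frequencies

inv_nu_1 = 1/f_lo - (31 - i) * step                # eq. (5), distribution 1
inv_nu_2 = 1/f_lo - (i - 1) * step                 # eq. (5), distribution 2
inv_nu_3 = 1/f_lo - np.where(i <= 16, 32 - 2*i, 2*i - 33) * step   # distribution 3

nu = 1.0 / inv_nu_1
print(round(nu.min(), 1), round(nu.max(), 1))      # 100.0 3100.0
print(np.allclose(np.sort(inv_nu_2), np.sort(inv_nu_1)),
      np.allclose(np.sort(inv_nu_3), np.sort(inv_nu_1)))   # same set, permuted
# eq. (6): latency is proportional to 1/nu, so the ideal firing times are
# evenly spaced, in units of C / (I_M1 * T * G):
print(np.allclose(np.diff(np.sort(inv_nu_1)), step))
```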
3.2. Measurements

The output rank order is shown as gray-scale encoded histograms (figures 8, 9, and 10) over 455 computations. A perfect output would follow the solid lines. Error bars indicate the standard errors in ranking the individual inputs. The measured output is a good approximation of the perfect result: the standard error computed over all input channels and test runs measures 4.4 ranks. Two different kinds of deviations from the correct rank can be observed: random deviations and systematic deviations.

Figure 8. The gray-scale histogram (rank vs. input number) of the chip test run with the first input frequency pattern.

Figure 9. The gray-scale histogram of the chip test run with the second input frequency pattern.

Figure 10. The gray-scale histogram of the chip test run with the third input frequency pattern.

The random deviations can be blamed almost entirely on the way the input was conveyed to the rank encoder. We could therefore expect those deviations to be vastly reduced if the rank encoder were integrated on the same chip as the sensors, thereby eliminating the input block of our prototype. The random deviations in this prototype system occur mainly due to imperfect integration of the Poisson distributed AER input pulses: some of the randomness of those Poisson signals is not averaged out. There are two steps of integration: the integrator array (figure 2) and the I&F neurons (figure 6).

Let us look at the integrator circuit first. Figure 11 shows the output current (according to a circuit level simulation) of an integrator circuit that receives 3100Hz (dashed line, i = 1 in distribution 1 in equation (5)), 194Hz (solid line, i = 16), and 100Hz (dotted line, i = 31) Poisson distributed spike input. The time constant of the integration is not sufficiently long to make these currents smooth, and figure 11 shows that the situation is worse for the lower input intensities. As explained in the integrator circuit description in subsection 2.1, the time constant of the integration is adjustable within limits by the bias voltage timeconst of the integrator circuit. We sought to increase the time constant by choosing that bias low, and the gain of the integrator was increased by lowering $V_{gain-}$ to make up for the small currents. Unfortunately, we could not set the timeconst bias below a certain limit, where thermal noise started to dominate the actual input. To increase the time constant further, a bigger capacitance would have been necessary. The problem is worse for the lower intensity inputs, since the average intervals of those Poisson inputs are the longest and the most prone to exceed the time constant of the integration.

Figure 11. Output currents from 3 elements of the integrator array according to circuit simulation (roughly 2.0-2.4µA over 2s).

A second stage of integration is located in the I&F neurons. By having them integrate the currents from the integrator circuits for longer, the result improves. Note that the time of integration in this stage is adjustable within a wide range by adjusting the current gain G in the integrator circuit: a lower gain leads to longer integration times before the firing threshold of the I&F neuron is reached. But this is a trade-off, since it increases the time necessary to complete one measurement. We chose to reset our prototype chip by a 20 Hz clock and set the gain as low as possible, such that the last neuron fired just in time to complete the rank order code. From integrating the simulated currents from the integrator array, we would expect a standard error of the ranking of 4.8², which is in good accordance with the total standard error of 4.4 we measured from the chip. We conclude that these effects from the input block are the main source of random deviations; almost none of them are caused in the rank encoder itself. In fact, the measurements from the chip are even slightly better than expected, but that is because we optimized the biases for the chip: the simulation device parameters are somewhat different from the real ones, so the same biases cause slightly different behaviour.

² We used sequences of 2 seconds of simulated integrator array current output for all 31 pulse frequencies of the Poisson inputs (examples in figure 11). We integrated them in Matlab in an assumed I&F neuron where the product of the firing threshold and the capacitance, i.e. the charge necessary to fire the neuron, was chosen to be 8·10⁻⁸ C. Thus the weakest input current would elicit a spike in just under 50ms, which allowed us to reset the capacitance at 20Hz, starting a new computation every 50ms. Doubling the threshold charge to 1.6·10⁻⁷ C (equivalent to halving the current gain on the chip) and reducing the reset frequency to 10Hz leads to a standard error of 4.1. Furthermore, the following threshold-charge / reset-frequency / standard-error triplets were computed: 3.2·10⁻⁷ C / 5Hz / 3.3, 6.4·10⁻⁷ C / 2.5Hz / 2.6, 1.28·10⁻⁶ C / 1.25Hz / 2.2, 2·10⁻⁶ C / 1Hz / 1.6.
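The trade-off reported in this footnote can be reproduced qualitatively with a much simpler model. The sketch below (our simplification, with assumed parameters) collapses the two integration stages into one and lets each I&F neuron fire after a fixed number of Poisson input pulses, a stand-in for the threshold charge, so that longer integration averages out more of the input randomness:

```python
import numpy as np

rng = np.random.default_rng(2)

nu = 3100.0 / np.arange(31, 0, -1)        # test frequencies, 100 Hz .. 3100 Hz
true_rank = np.argsort(np.argsort(-nu))   # rank 0 = strongest input

def rank_std_error(pulses_to_fire, trials=455):
    """One-stage stand-in for the two integration steps: each neuron fires
    after a fixed number of Poisson input pulses, so its firing time is a
    sum of that many exponential intervals (a gamma variate). More pulses
    per decision, i.e. a longer integration time, averages out more of
    the Poisson randomness."""
    errors = []
    for _ in range(trials):
        t_fire = rng.gamma(pulses_to_fire, 1.0 / nu)   # time of the n-th pulse
        errors.append(np.argsort(np.argsort(t_fire)) - true_rank)
    return np.std(errors)

for n_pulses in (5, 20, 80):
    print(f"{n_pulses:3d} pulses per decision: "
          f"std error {rank_std_error(n_pulses):.1f} ranks")
```

The absolute numbers depend on the assumed parameters; the point is the trend, namely fewer ranking errors for longer integration.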
All of the mentioned causes of random deviations affect weaker inputs more severely. Thus, the theoretical independence of rank order encoding from absolute input intensity is somewhat compromised: in practice the code's precision degrades as the average signal intensity gets lower. This is one reason why the errors are bigger for the lower ranks in the measurements in figures 8, 9, and 10: the standard error over those plots is 5.0 ranks measured over ranks 16 to 31 and only 3.7 over ranks 1 to 16.

Systematic deviations occur due to mismatches of the electronic elements on the chip. In the analog part of the circuit (the spike integrator and the integrate and fire neuron), the relevant mismatches are in parameters that affect the absolute magnitude of the analog voltages and currents, namely capacitance, threshold voltage, etc., whereas in the time domain part of the circuit (the timing WTA columns), it is mismatches in parameters that affect the timing of the signals, i.e. gate delays. Due to these systematic deviations some input channels show a clear tendency to be ranked too high (e.g. input numbers 15 and 6), whereas others are consistently ranked too low (e.g. input number 8).

To estimate the total noise in the system as a whole, we applied information theory. The noise in a system can be quantified by estimating the information about the input contained in the output. Thus, we estimated the 'mutual information' between the system's input and its output. A necessary requirement for this computation is knowledge of the input statistics. In our test setup we had full control over the inputs and chose a particular set of 31 input intensities. Considering all permutations of this set, there is a total of 31! possible input patterns. (For the measurements we only used three of those permutations.) To compute the 'entropy' of these patterns (the total information gained by knowing which pattern actually occurs), we assumed that any one of them may occur with uniform probability. If the input can be clearly identified by looking at the output, then the output contains complete information about the input, i.e. an equivalent of log2 31! = 112.66 bits. Due to the uncertainty of the deviations in the encoder, this number is reduced.
Based on our measurements, we estimated that the mutual information between the system's input and its output was 40.81 bits for this particular experimental setup. The complete derivation of this result is given in appendix A.

4. Conclusion

An electronic circuit is presented that, as a first processing step, transforms an array of analog input signals into time domain spike signals. These analog signals can thus be processed further by asynchronous logic rather than analog devices. This type of temporal signal encoding offers many possibilities, some of which are discussed in this paper and others that are yet to be explored.

The particular circuit presented in this paper computes a digital encoding of an array of analog inputs. The encoding is a so-called rank order code (Thorpe et al., 1996). This particular analog to digital conversion (ADC) is in theory independent of the absolute input level (though it is expected, and observed, that in practice the reliability of the output code degrades slightly for lower input levels). It can be thought of as an ADC that adapts to the average input level and to the spread of the inputs. Thus, it can encode essential information with a minimal number of bits over a huge range and variety of inputs.

Our prototype chip, for example, takes 31 analog inputs and encodes them in 128 bits. If one were to use classical non-adaptive ADCs limited to a total output size of 128 bits, this would allow for one 4-bit ADC per input channel (plus 4 unused extra bits). The observation range of these ADCs would be fixed; they could, for instance, be set to encode input voltages from 0V to 5V, dividing that interval into sixteen chunks of 0.3125V encoded as numbers between 0 and 15. If for some reason the input signals only ever lay between 2.5V and 3.75V, the ADCs would use merely four of their sixteen codes, and most of their resolution would be wasted on input levels that never occur. Rank encoding, on the other hand, adapts its resolution to the limited spread: a rank encoder's output contains all the information concerning the relative order of strength of the inputs, regardless of whether they lie between 0V and 5V or between 2.5V and 3.75V. Even if half of them are between 0V and 1V and the other half between 4V and 5V, rank encoding still conveys the differences within both clusters.
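The contrast can be made concrete with a toy comparison (our illustration, with made-up voltages): a fixed-range 4-bit converter collapses a tight cluster of inputs into a single code, while the rank order code of the cluster is identical to that of well-spread inputs in the same order:

```python
import numpy as np

def adc_4bit(v, full_scale=5.0):
    """A fixed-range 4-bit ADC: 0..5 V cut into sixteen 0.3125 V chunks."""
    return np.minimum((np.asarray(v) / full_scale * 16).astype(int), 15)

def rank_code(v):
    """Rank order code: the permutation sorting the inputs by strength."""
    return np.argsort(-np.asarray(v))

wide  = np.array([0.40, 4.60, 2.00, 3.10])   # inputs spread over the full range
tight = np.array([3.01, 3.07, 3.03, 3.05])   # same ordering, tightly clustered

print(adc_4bit(wide), adc_4bit(tight))       # [1 14 6 9] vs [9 9 9 9]: collapsed
print(rank_code(wide), rank_code(tight))     # [1 3 2 0] twice: still resolved
```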
Admittedly, the absolute input strengths cannot be reconstructed accurately. But this is not a large price to pay in the many applications that in any case apply a normalization step after the ADC in order to remove the 'spatial DC component', i.e. the average intensity level. More severe, perhaps, is the loss of information about the magnitude of the relative differences. Thus, in the example with two distinct clusters of input strengths, one between 0V and 1V and one between 4V and 5V, the information about the presence of these distinct clusters is lost as well. But this limitation, too, is not crucial for many applications (see subsection 4.1 for possible examples).

The circuit has been tested in a VLSI chip implementation, fabricated in the 0.6µm AMS process. As mentioned previously, the prototype has 31 analog inputs (received as frequency encoded pulse streams on an AER bus) and 128 output bits. Due to the output encoding with shrinking WTA columns, only 12% of these bits are redundant for encoding all of the possible 31! outputs; the more obvious constant-column solution considered earlier would have resulted in 27% redundancy.

In testing the chip, 3 input patterns out of this particular set of 31! were used. Each input pattern in this set should evoke one particular output code. Assuming that all inputs appear with the same probability, the information theoretical entropy of the input (the complete information about the input) is 112.66 bits. In a noiseless system the output code would contain all of that information, and the input could be completely reconstructed from it. On the physical device, however, it has been estimated from the tests that a total of 40.81 bits of mutual information are contained in the output. Simulation data suggest that most of the information loss is caused by the analog input stage that converts the pulse streams into analog currents. With direct current input to the rank encoder (if enough pins could be spared, or if the sensors were located on the same chip as the rank encoder), this information loss would be considerably smaller. Even so, the amount of information actually conveyed is still huge: the number suggests that the system could reliably transmit an alphabet of 2^40.81 ≈ 10^12.29 letters encoded as a subset of those 31! permutations of Poisson spike trains. Depending on the application, this information may well be sufficient for further data reduction steps to work on. Promising candidate applications could be speech or image processing.

4.1. Possible applications

While constructing this system, we envisioned its use in the early processing stages of a phoneme recognition system. With an analog bandpass filter array (such as a silicon cochlea (Lazzaro and Mead, 1989; Sarpeshkar et al., 1998)) providing its input, it could replace some of the computationally costly early processing steps of a more conventional speech recognition system: analog to digital conversion, Fourier transformation, intensity normalization and an initial vector quantization. Digital signal processing would then take over an already comfortably reduced data set. The open question is whether this reduced data still contains all the information necessary to recognize phonemes. In spoken English there are about 44 different phonemes. We have estimated that the rank encoder in its present implementation can already reliably convey an alphabet that is about eleven orders of magnitude larger. This does, however, depend on how this alphabet is encoded in the input to the rank encoder. Thus, in the suggested input stage of a phoneme recognition system, the noisier and less advantageous encoding of the spoken phonemes has to be considered, as well as the noise in the filter array. But because of the huge capacity achieved with our tailored encoding, we think it very likely that phoneme power spectra differ enough in terms of their rank order code for those phonemes to be extracted from the rank encoder output. However, this still needs to be verified experimentally. We hope to obtain a functioning silicon cochlea with AER output soon, to put this assumption to the test.

Other kinds of processing of arrays of sensory information could also make use of a rank encoder, especially if they apply intensity normalization as one of their first steps, which is rendered superfluous by rank order encoding. Examples can be found in visual processing.
The authors who originally introduced rank order encoding (Thorpe et al., 1996) used it to encode pictures as input to a face detection system. Other object recognition tasks are also good candidates. Hardware implementations of rank order encoders, such as the one described in this paper, open up possibilities to use this encoding in real time applications.

Appendix A. Derivation of the output's information content

To estimate how much the maximal information content of the output of 112.66 bits is reduced by noise, we estimate here the information theoretical 'mutual information' between the system's input and its output. We describe the input as the random variable $X$ that assumes, with equal probability, 31! different states, i.e. the permutations of the 31 inputs. The output is the random variable $Y$, which is the combination of the 31 timing WTA column outputs, $Y = (Y_{31}, \ldots, Y_1)$.

In information theory, the entropy $H(X)$ of a discrete random variable $X$ is a measure of the total information content of that variable, i.e. the uncertainty about the value that this variable may adopt. For discrete random variables with $P(X = x_i) = p_i$ it is defined by (7):

$$H(X) = \sum_i -p_i \log_2 p_i \qquad (7)$$

The mutual information of two random variables can be expressed with conditional entropy as in (8):

$$I(X, Y) = H(Y) - H(Y|X) \qquad (8)$$

Or it can be expressed with the entropy of the combination of the two variables as in (9):

$$I(X, Y) = H(X) + H(Y) - H(X, Y) \qquad (9)$$

If we solve (9) for $H(X, Y)$ and substitute $X$ with $Y_{31}|X$ and $Y$ with $(Y_{30}, \ldots, Y_1)|X$, we get an expression for $H(Y_{31}, (Y_{30}, \ldots, Y_1)|X) = H(Y|X)$. If we substitute $H(Y|X)$ in (8) with this expression, we get (10):

$$I(X, Y) = H(Y) - H(Y_{31}|X) - H(Y_{30}, \ldots, Y_1|X) + I(Y_{31}, (Y_{30}, \ldots, Y_1)|X) \qquad (10)$$

Applying (8) to $I(Y_{31}, (Y_{30}, \ldots, Y_1)|X)$ we get (11):

$$I(X, Y) = H(Y) - H(Y_{31}|X) - H(Y_{30}, \ldots, Y_1|X, Y_{31}) \qquad (11)$$

Repeating this same procedure on the rightmost summand 29 more times, we finally get (12):

$$I(X, Y) = H(Y) - H(Y_{31}|X) - \ldots - H(Y_1|X, Y_{31}, \ldots, Y_2) \qquad (12)$$

The conditional entropies in (12) can be approximated (in fact, over-estimated) by conditional entropies that depend only on the input:

$$H(Y_i|X, Y_{31}, \ldots, Y_{i+1}) = H(Y_i|X) - I(Y_i, (Y_{31}, \ldots, Y_{i+1})|X) \leq H(Y_i|X) \qquad (13)$$

These can be estimated from the experimental results, since we can estimate the probability distributions of the associated conditional random variables. These conditional random variables are exactly the outputs of the timing WTA columns of the circuit. Each column output can be viewed as the result of a random experiment whose probability distribution depends on the input and on the results of the preceding columns. We compute the difference $\delta_i$ between each column's (index $i$) output (i.e. the winning input) and the real rank of this input among the inputs that have not yet been removed by a previous column. We model the probability distributions $p_{i,j} = P(\delta_i = j)$ of these differences from the histogram $h_{i,j}$ (see figure 12) of the differences observed in the experiments:

$$p_{i,j} \approx \frac{h_{i,j}}{\sum_j h_{i,j}} \qquad (14)$$

Figure 12. Gray-scale histogram showing the deviation of the winner of each WTA column from its real rank among the inputs still remaining in the competition. It visualizes the values $h_{i,j}$ in equation (14), where the y-axis corresponds to $j+1$ and the x-axis to $i$.
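The estimate of equations (12)-(15) reduces to a few lines of code once the histogram is given. The sketch below runs the computation on a toy histogram with made-up counts (the real $h_{i,j}$ is the measured one of figure 12, which is why the printed value differs from the paper's 40.81 bits):

```python
import math
import numpy as np

def mutual_information(h, s=31):
    """I(X, Y) along the lines of eqs. (12)-(15): H(Y) is bounded by
    log2(s!) (eq. 15); each WTA column's conditional entropy is the
    entropy (eq. 7) of its normalized histogram row (eqs. 13 and 14)."""
    H_Y = math.log2(math.factorial(s))
    H_cond = 0.0
    for row in h:
        p = row / row.sum()                        # eq. (14)
        p = p[p > 0]
        H_cond += float(-(p * np.log2(p)).sum())   # eq. (7)
    return H_Y - H_cond

# toy histogram h[i, j]: column i names the true winner 60% of the time and
# an input one or two ranks off 20% of the time each (made-up counts)
h = np.zeros((31, 31))
for i in range(31):
    m = 31 - i                                     # inputs still competing
    h[i, :min(m, 3)] = [0.6, 0.2, 0.2][:m]
print(f"{mutual_information(h):.2f} bits")
```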
Note that since we are computing a conditional entropy dependent on the input $X$, we would actually need to compute the entropies of each column given every possible input pattern and then average the results over all input patterns. We do, however, assume (somewhat unjustly) that the input channel properties are uniform. Under this assumption, the probability distribution of the WTA column outputs for any particular input pattern is only a permutation of the distribution for another permutation of the input, and therefore has the same entropy. By expressing a column's output as the winner's real rank among the remaining competitors, the probability distribution does not even have to be permuted, but is exactly the same for every permutation of the input. That is why we can add the histograms of the three different input permutations used in the experiments to derive the probability distribution $p_{i,j}$.

The assumption that all input channels have uniform properties also lets us determine $H(Y)$: for any permutation of a particular input rank pattern, the output probability distribution of the rank order code is just the same permutation applied to the output probability distribution of that particular input. Thus, if these probability distributions are averaged over all inputs, and all inputs occur with equal probability, then all possible outputs also occur with uniform probability. We therefore approximate $H(Y)$ by (15):

$$H(Y) \lesssim \log_2 31! = 112.66 \qquad (15)$$

Finally, we compute the mutual information $I(X, Y)$ between the system's input and its output by combining the experimental histogram $h_{i,j}$ and equations (14), (7), (15), and (13) in (12), arriving at the final result (16):

$$I(X, Y) \approx 40.81 \text{ bits} \qquad (16)$$

Let us briefly look at the effects of the assumptions that were made on the way to this result:

1. Assuming uniform input channel properties will, by (15), tend to make the estimated mutual information too big. The same assumption, however, also leads to an overestimation of the WTA column outputs' entropy through equations (14) and (7), and therefore to an underestimation of $I(X, Y)$.

2. Substituting $H(Y_i|X, Y_{31}, \ldots, Y_{i+1})$ with $H(Y_i|X)$ according to (13) contributes to underestimating $I(X, Y)$.

3. Approximating the WTA columns' probability distributions by (14) underestimates the real distributions' entropy, because the real distributions are more even and would be better approximated by including many more experiments. This contributes to overestimating $I(X, Y)$.

References

Hubel, D. and T. Wiesel: 1962, 'Receptive fields, binocular interaction and functional architecture in the cat's visual cortex'. Journal of Physiology 160, 106-154.
Lazzaro, J. and C. A. Mead: 1989, 'Circuit models of sensory transduction in the cochlea'. In: C. A. Mead and M. Ismail (eds.): Analog VLSI Implementations of Neural Networks. Kluwer Academic Press, pp. 85-101.
Maass, W.: 1999, 'Computing with spiking neurons'. In: Pulsed Neural Networks. The MIT Press, pp. 55-85.
Maass, W. and T. Natschläger: 1997, 'Networks of spiking neurons can emulate arbitrary Hopfield nets in temporal coding'. Network: Computation in Neural Systems 8, 355-372.
Mahowald, M.: 1994, An Analog VLSI System for Stereoscopic Vision. Kluwer.
Mead, C.: 1989, Analog VLSI and Neural Systems. Addison Wesley.
Mortara, A. and E. A. Vittoz: 1994, 'A communication architecture tailored for analog VLSI artificial neural networks: intrinsic performance and limitations'. IEEE Transactions on Neural Networks 5, 459-466.
Sarpeshkar, R., R. F. Lyon, and C. Mead: 1998, 'A low-power wide-dynamic-range analog VLSI cochlea'. In: T. S. Lande (ed.): Neuromorphic Systems Engineering. Boston: Kluwer Academic Publishers, Chapt. 3, pp. 49-103.
Thorpe, S., D. Fize, and C. Marlot: 1996, 'Speed of processing in the human visual system'. Nature 381, 520-522.

Authors' Vitae

Philipp Häfliger

Philipp Häfliger was born in Zürich, Switzerland, in 1970. He studied Computer Science at the Federal Institute of Technology (ETH) in Zürich, with Astronomy as a second subject, from 1990 to 1995. A semester project in high energy astrophysics resulted in his first scientific publication as a co-author. He obtained his PhD in 2000 from the Institute of Neuroinformatics at ETH. At present he works as a researcher in the Microelectronics Group at the Department of Informatics at the University of Oslo in Norway. He teaches a lecture on neuromorphic electronics and is the local coordinator of the European 5th Framework IST project CAVIAR, with partners in Spain and Switzerland. He has lately enrolled as a member of several IEEE CAS society technical committees. When time permits, he loves to explore the magnificent nature in Norway and elsewhere.

Elin Jørgensen Aasebø

Elin Jørgensen Aasebø was born in Oslo, Norway, in 1978. She enrolled at the University of Oslo straight out of school. After courses in both psychology and philosophy, she settled on informatics as her main area of study. In October 2002 she obtained a Master of Science degree. The Master's project was in the field of neuromorphic engineering and organized under the Microelectronics Group at the Department of Informatics. She now works with FPGA design at Kongsberg Defence Communications in Billingstad, Norway.