Hearing Research, 18 (1985) 41-55
Elsevier
HRR00594

Some aspects of temporal coding for single-channel electrical stimulation of the cochlea

Robert A. Dobie * and Norbert Dillier
ENT-Clinic, University Hospital, CH-8091 Zurich, Switzerland
(Received 3 October 1984; accepted 27 March 1985)

Estimates of the useful frequency range for single-channel electrical stimulation of the cochlea range from 400 to 4000 Hz. Psychophysical studies in single-channel implant patients are relevant not only to the practical problem of designing stimulation strategies, but also to questions of temporal processing of pitch in the normal auditory nervous system. Patients with single-channel extracochlear devices participated in several experiments involving stimuli differing in fine temporal structure. Stochastic pulse trains, in which the probability of pulse delivery (P) on a given cycle was less than 1.0, were readily discriminated from ordinary pulse trains. Frequency discrimination using stochastic pulse trains differing only in fine temporal structure (with identical average pulse rates) was as good as with ordinary pulse trains or sinusoids for P ≥ 0.5, but deteriorated rapidly for P < 0.5. Discrimination of triangular and trapezoidal waveforms from square waves was surprisingly good: rise times (from 0 to maximum current) as short as 0.08 ms could be discriminated. In contrast, detection of jitter in pulse trains was almost an order of magnitude worse. The results show that frequency discrimination for single-channel electrical stimulation of the cochlea is based on discrimination of interpulse periods, and that pulse rates that would be unnatural for acoustically evoked VIIIth nerve activity (up to 750 Hz) are more useful for coding mid-range frequencies than low-rate stochastic simulations of normal VIIIth nerve firing patterns.
The waveform discriminations reported would be obscured by low-pass filtering even at 2000 Hz, and probably depend on changes in relative synchrony among an array of VIIIth nerve units with different thresholds. In general, these results support the use of analog coding schemes with relatively large bandwidth.

Key words: cochlear implant; temporal coding; electrical stimulation

* Present address: Department of Otolaryngology, RL-30, University of Washington, Seattle, WA 98195, U.S.A.

0378-5955/85/$03.30 © 1985 Elsevier Science Publishers B.V. (Biomedical Division)

Introduction

It has been asserted [11,15] that frequencies up to 3-4 kHz can be usefully presented to deaf patients by electrical stimulation of the cochlea through a single channel. If this is so, it must be attributable to some type of temporal coding rather than to a place code, since a single-channel system cannot exploit the normal spatial array of cochlear neurons along the high-to-low frequency (base-to-apex) dimension. The evidence presented for such a phenomenon is of three general types:
1. Measures of patients' ability to discriminate sinusoidal stimuli of different frequencies have in some cases shown significant discrimination up to 1-2 kHz [12,1].
2. At repetition frequencies up to 400 Hz, patients can distinguish among square, triangle, and sine waves, indicating an ability to use waveform information [14].
3. Speech recognition scores are in some cases reduced when frequencies above 900 Hz are excluded by filtering [11].

The ability of individual auditory neurons to phase-lock, and thus to preserve information about stimulus frequency in their interval histograms, at least up to 2000 Hz and probably higher, is well known [24,16]. As Evans [4] has pointed out, frequency cues corresponding to the 'place' of activity in the peripheral auditory system and to the fine temporal structure of the discharge patterns are both available to the central auditory nervous system (CANS). In the normal auditory system, voice pitch can be extracted from the combined temporal and place information of an auditory-nerve fiber population over a wide range of intensities and additive background noise conditions [22].

The limits of frequency discrimination based on temporal information alone have been difficult to study psychoacoustically, because it is nearly impossible to exclude spectral information when normal subjects are stimulated acoustically [7]. In addition, although the temporal firing pattern of a single neuron contains enough information to determine rather high stimulus frequencies, it is not clear whether the CANS can extract this information from a single neuron or must compare the temporal firing patterns of an array of neurons.

Theoretically, single-channel cochlear implant patients should be nearly ideal subjects in which to study these questions. Spectral information can, in a sense, be ignored: at least, it can be assumed that the spatial distribution of cochlear nerve excitation no longer depends on the stimulus spectrum. However, many (perhaps most) of these patients have severe losses of both sensory and neural structures, and there are typically practical difficulties in getting them to perform arduous, boring psychophysical tasks. Thus, frequency discrimination results from single-channel cochlear implant patients must be viewed as minimal estimates of the ability of the normal CANS to use temporal information.

Under acoustic stimulation, even at low frequencies (e.g., 100 Hz), a given unit usually does not fire on every cycle of the stimulus but in a stochastic fashion; as a first-order approximation, there is a fixed probability of firing (much less than 1.0) on each cycle.
Thus, the interval histogram will contain peaks at values of nT (n = small integer, T = period of the stimulus), with the size of the peaks decreasing exponentially with n. For low frequencies, whose periods are longer than the neuron's refractory period, at least some of the interspike intervals will be equal to T. For higher frequencies (above about 1 kHz), only integer multiples of the period (2T, 3T, ...) will be represented in the unit's interval histogram. If the CANS examines the temporal firing patterns of an array of units, the composite interval histogram will contain values as small as T, even for frequencies above 1 kHz, where no single unit could fire at the stimulus frequency. This is, of course, the 'volley' theory of Wever [31].

There is very little information available regarding single-unit firing patterns for electrical stimulation, but it is known that phase-locking is more precise (less jitter) than for acoustic stimuli, and that maximum discharge rates are extremely high, up to 900/s ([17,8]; Van den Honert, 1983, personal communication). The unnatural character of auditory nerve firing patterns in response to electrical stimulation has also been noted by Sachs et al. [25], who have suggested coding strategies incorporating lower pulse rates and stochastic properties. The CANS may well find it difficult to make use of the abnormally high firing rates elicited by even low frequencies (e.g., 500 Hz). Interspike interval histograms of single units excited with low-probability electrical pulse trains (intervals T, 2T, 3T, ...) are assumed to be similar to interval histograms of single units stimulated with periodic acoustic signals of frequency 1/T.
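The first-order model just described (a fixed, independent firing probability on each cycle) and the volley argument can be illustrated with a short simulation. This is a sketch under our own assumptions, not a model from the cited work: the function names are hypothetical, and refractoriness is crudely represented by forbidding a spike within `refractory_cycles` cycles of the previous one. Without refractoriness the share of intervals equal to nT follows p(1 - p)^(n - 1), so the peaks shrink geometrically; with one cycle of refractoriness no single unit shows intervals of T, but the pooled spikes of several units do.

```python
import random
from collections import Counter
from typing import List


def unit_spike_cycles(p_fire: float, n_cycles: int, refractory_cycles: int,
                      rng: random.Random) -> List[int]:
    """Cycle indices on which one model unit fires.

    On every stimulus cycle the unit fires with the same probability p_fire,
    independently of other cycles, unless it fired within the previous
    `refractory_cycles` cycles (a crude stand-in for refractoriness).
    """
    spikes: List[int] = []
    last = -(10 ** 9)  # "long ago": no refractoriness at the start
    for k in range(n_cycles):
        if k - last > refractory_cycles and rng.random() < p_fire:
            spikes.append(k)
            last = k
    return spikes


def interval_histogram(spike_cycles: List[int]) -> Counter:
    """Counts of interspike intervals, measured in stimulus periods (1 = T)."""
    return Counter(b - a for a, b in zip(spike_cycles, spike_cycles[1:]))


def geometric_share(p_fire: float, n: int) -> float:
    """Expected share of intervals equal to nT when there is no refractoriness."""
    return p_fire * (1.0 - p_fire) ** (n - 1)


rng = random.Random(0)

# One unit, no refractoriness: peaks at T, 2T, 3T shrink geometrically.
hist = interval_histogram(unit_spike_cycles(0.3, 200_000, 0, rng))
total = sum(hist.values())
for n in (1, 2, 3):
    print(n, round(hist[n] / total, 3), round(geometric_share(0.3, n), 3))

# Volley: each unit is refractory for one extra cycle, so none shows an
# interval of T, yet the pooled spikes of ten such units do.
units = [unit_spike_cycles(0.3, 2_000, 1, rng) for _ in range(10)]
pooled = sorted(set().union(*units))
print(min(interval_histogram(units[0])), min(interval_histogram(pooled)))  # 2 1
```

Pooling by set union merges spikes that fall on the same cycle, which matches the idea of a composite histogram taken over the whole array.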
The difference for the CANS, however, is that for acoustically stimulated fiber groups originating from the same cochlear region the temporal and spatial averages are equivalent (ergodicity principle), whereas for electrically stimulated fiber groups they are not, since every delivered pulse excites all responsive fibers at the same instant. If an instantaneous representation in a spatially distributed fiber group is a prerequisite for adequate central pitch processing, then single-channel stochastic stimulation probably could not be used efficiently to convey higher-frequency information. If, on the other hand, some temporally averaging central analysis of spike intervals is available, then stochastic stimuli could be used to provide the CANS with a more natural input.

Frequency (pulse rate) discrimination between appropriately chosen stochastic pulse stimuli can be assumed to be based exclusively on differences in fine temporal structure, since frequency and probability of pulse delivery can be co-varied so that the resulting stimuli have identical mean interpulse intervals (or 'pulse densities'). Thus, stochastic pulse stimuli offer a unique test of the ability of the CANS to use the kind of phase-locked temporal information that has long been known to be present in single-unit firing patterns [24].

It is also useful to consider how a patient with a single-channel cochlear or extracochlear implant could distinguish among different waveforms (square vs. triangle vs. sine). Clearly, spectrum per se plays no role. If one waveform were more effective in stimulating a larger array of neurons, this could be the basis for discrimination; but adjusting the stimuli for equal loudness beforehand would be expected to eliminate cues based only on the size of the neural array stimulated. The most likely basis for such a discrimination is the differing temporal distribution of spikes for the different stimuli.
A square wave will presumably fire, nearly simultaneously, all units whose thresholds are exceeded, and phase-locking should be nearly perfect, i.e., a given neuron will fire at intervals that are precise multiples of the stimulus period. The triangle wave, at the other extreme, will fire low-threshold units slightly earlier in every cycle than high-threshold units; in other words, the degree of simultaneity across the array is reduced. Similarly, each individual unit's firing pattern, over a large number of cycles, might show less precise phase-locking, i.e., increased 'jitter'. This would be apparent in an interval histogram, and could also be the basis for discriminating the different waveforms. Note that the distinction could be based on a single unit's firing pattern only if enough cycles are presented to permit some kind of analysis of interspike intervals, whereas the degree of synchrony among an array of units could in principle be assessed after even a single cycle of the stimulating waveform.

Thus, studies of discrimination of the fine temporal structure of electrical stimuli are of interest for two major reasons. First, they may assist in developing coding strategies for patients with cochlear (or extracochlear) implants. Second, they offer a unique insight into the ability of the CANS to use temporal cues in the complete absence of spectral-place cues.

Materials and Methods

Each of the patients who participated in these studies had received a single-channel extracochlear implant, with a single-ball electrode at the round window membrane. Technical specifications and basic psychophysical performance have been described previously [28,2].
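The synchrony argument can be put in numbers with a linear-ramp sketch. The assumptions are ours: each unit fires when the stimulus current first crosses a fixed threshold, the thresholds are hypothetical fractions of the peak current, and charge integration and membrane dynamics are ignored. For a square wave every unit crosses at onset; for a ramp with rise time r·T, a unit with threshold θ crosses at r·T·θ/peak, so firing spreads across the array, and a threshold fluctuation with S.D. σ becomes a timing jitter of roughly σ divided by the ramp slope.

```python
from typing import Sequence


def crossing_time_ms(threshold: float, peak: float, rise_ms: float) -> float:
    """Time after cycle onset at which a linear current ramp from 0 to `peak`
    over `rise_ms` first reaches `threshold` (0 for an instantaneous step)."""
    if not 0.0 < threshold <= peak:
        raise ValueError("threshold must be positive and not above the peak")
    return rise_ms * threshold / peak  # rise_ms == 0 models a square-wave step


def array_spread_ms(thresholds: Sequence[float], peak: float, rise_ms: float) -> float:
    """Spread of first firing times across units with different thresholds."""
    times = [crossing_time_ms(th, peak, rise_ms) for th in thresholds]
    return max(times) - min(times)


def timing_jitter_ms(threshold_sd: float, peak: float, rise_ms: float) -> float:
    """Approximate S.D. of one unit's firing time when its threshold fluctuates
    with S.D. `threshold_sd`: the fluctuation divided by the ramp slope."""
    if rise_ms == 0.0:
        return 0.0  # an ideal step crosses every threshold at the same instant
    slope = peak / rise_ms  # current units per ms
    return threshold_sd / slope


period_ms = 2.0                          # 500 Hz
triangle_rise = 0.25 * period_ms         # rise time/period = 0.25
thresholds = [0.5, 0.6, 0.7, 0.8, 0.9]   # hypothetical, as fractions of peak
print(array_spread_ms(thresholds, 1.0, 0.0))            # square wave: 0.0
print(array_spread_ms(thresholds, 1.0, triangle_rise))  # triangle: about 0.2 ms
print(timing_jitter_ms(0.05, 1.0, triangle_rise))       # about 0.025 ms
```

Both quantities scale with the rise time, which is the sense in which a triangle wave should degrade array synchrony and single-unit phase-locking together.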
Surgery had in each case been performed more than one year before these studies; during this time, thresholds for electrical stimulation had been stable, and the patients were using wearable encoding and stimulation units on a regular basis (see Table I for a summary of technical specifications).

Two patients provided most of the data to be reported. One (U.T.) had suffered bilateral profound deafness at age 39 from a sudden hearing loss superimposed on a congenital sensorineural hearing loss. The other (E.P.) was prelingually deaf secondary to meningitis in the first year of life. Neither had any response to audiometric tones up to 110 dB, and neither is a 'star patient': no open-speech understanding has been achieved without lip-reading. Each participated in 5-10 sessions of about 4 h each. Some of the experiments were repeated on different days; in such cases, the best performance is reported and noted as such. Although neither test-retest variability nor learning effects were formally investigated, both appeared to be important in these difficult and tedious psychophysical tasks. However, the results can be taken as at least minimal estimates of performance, i.e., some implant patients can perform at least as well as reported here on the tasks described.

Each of the tasks was also performed by 3-6 normal-hearing subjects in a sound-field setting. The stimuli were transduced by electrostatic headphones with a flat frequency response from 20 to 20000 Hz. Pulses used in both the acoustic and the electrical stimulation experiments were 0.1 ms in duration.

TABLE I
TECHNICAL SPECIFICATIONS
Electrodes: 90% Pt, 10% Ir, 1 mm ball diameter; round window niche vs. M. temporalis
Stimuli: capacitively coupled pulses, 0.1-0.2 ms duration; pulse amplitude and timing determined from amplified and compressed microphone signal
Signal transmission: amplitude modulation, 12 MHz
Amplitude levels for pulses were set for each experiment by determining the most comfortable level (average of three trials) with a slide potentiometer. All stimuli were generated through a digital-to-analog converter (12-bit resolution, AA11-K, DEC) using a dual programmable 1 MHz clock (KW11K, DEC). The algorithms were implemented in Fortran and Macro-Assembler on a PDP-11/34A (DEC). For experiments with normal-hearing subjects, a low-pass filter (5 kHz, 24 dB/oct, Krohn-Hite) was inserted between the D/A converter and the output attenuator.

With a 1 MHz clock, the attainable pulse-rate resolution depends on the absolute rate, because every pulse interval must be a whole number of clock cycles. 1000 clock cycles per pulse interval produce exactly 1000 pulses per second (pps), 999 cycles 1001.001 pps, and 1001 cycles 999.001 pps. This 1 pps resolution at 1000 pps is not sufficient for psychoacoustic experiments with normal listeners, whose frequency discrimination lies in the same range; for implant patients, however, it is certainly sufficient. At lower pulse rates the resolution becomes progressively finer: for example, 100 pps is generated with 10000 cycles and 100.01 pps with 9999 cycles.

Four types of temporal discrimination were studied. In each case, variation of some stimulus parameter gave rise to a clear variation along a particular perceptual dimension (high-low, sharp-dull, or smooth-rough), so that stimuli could be labelled and discriminated. Difference limens (DLs) were determined with an adaptive two-alternative forced-choice paradigm, in which a transformed up-down method [19] with seven reversals estimated the point at which the probability of a correct response was 0.707. In a frequency discrimination task, for example, each stimulus pair contained a reference stimulus and a higher-frequency test stimulus in random order; the subject's task was to identify the pair as either high-low or low-high.
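A minimal sketch of a 2-down/1-up staircase, one common form of the transformed up-down method, which converges on the 70.7%-correct point. The fixed step size, the use of the mean of the reversal levels as the DL estimate, and the deterministic toy "listener" are assumptions made for illustration; the text does not specify these details.

```python
import math
from typing import Callable, List, Tuple

# With a 2-down/1-up rule the track moves down only after two correct
# responses in a row, so it settles where p_correct ** 2 == 0.5.
TARGET_P_CORRECT = math.sqrt(0.5)  # about 0.707


def two_down_one_up(respond: Callable[[float], bool], start: float, step: float,
                    n_reversals: int = 7,
                    max_trials: int = 1000) -> Tuple[float, List[float]]:
    """Track a stimulus difference ("level") adaptively.

    respond(level) returns True for a correct 2AFC answer at that level.
    Two consecutive correct answers make the task harder (level - step);
    one wrong answer makes it easier (level + step). The run stops after
    n_reversals changes of direction; the DL estimate is the mean level at
    those turning points.
    """
    level = start
    correct_in_row = 0
    last_direction = 0  # -1 = last move was down, +1 = up, 0 = no move yet
    reversals: List[float] = []
    for _ in range(max_trials):
        if len(reversals) >= n_reversals:
            break
        if respond(level):
            correct_in_row += 1
            if correct_in_row < 2:
                continue  # need a second correct answer before moving down
            correct_in_row = 0
            direction = -1
        else:
            correct_in_row = 0
            direction = +1
        if last_direction != 0 and direction != last_direction:
            reversals.append(level)  # turning point of the track
        last_direction = direction
        level += direction * step
    estimate = sum(reversals) / len(reversals) if reversals else math.nan
    return estimate, reversals


# Toy deterministic "listener" that is correct whenever the difference is >= 2.
est, revs = two_down_one_up(lambda level: level >= 2.0, start=5.0, step=1.0)
print(revs)            # [1.0, 2.0, 1.0, 2.0, 1.0, 2.0, 1.0]
print(round(est, 3))   # 1.429
```

A real listener answers probabilistically, so the track wanders around the level at which p(correct) is about 0.707; the deterministic rule here only makes the turning points easy to follow.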
All stimuli were 800 ms pulse trains or waveforms with 25 ms rise-fall times, used to modulate the 12 MHz carrier for transcutaneous transmission. The paired stimuli were separated by 400 ms; after each stimulus pair, a delay of 3-5 s (depending on the subject's response time) was imposed. If no response was made within 5 s, the same stimulus pair was repeated.

Stochastic pulse stimuli (probability and frequency discrimination)

An ordinary pulse train at f = 500 Hz may be considered to have a probability of 1, i.e., a pulse is generated every 2 ms. If the probability is reduced to 0.8, there is an 80% chance of a pulse being delivered at each 2 ms interval. In one second, 400 pulses would be delivered on average (500 × 0.8), and an interpulse interval histogram would reveal intervals of 2, 4, 6, ... ms. Such a stimulus (500 Hz at P = 0.8) was always heard as rough compared with an ordinary pulse train (P = 1.0). Individual 800 ms stimuli were generated immediately before delivery of each stimulus pair, so that, for example, each presentation of 500 Hz at P = 0.8 was unique.

In one set of experiments, a 'probability DL' was determined. The reference stimulus was a pulse train at P = 1.0 (smooth), and the test stimuli were stochastic pulse trains with the same base frequency but stepwise decreasing probabilities of pulse generation (rough).

Although stochastic pulse trains were heard as rough, this did not prevent appropriate labelling along a high-low continuum. Frequency discrimination experiments were performed in which a reference stimulus (e.g., 500 Hz at P = 0.8) was compared with other stochastic pulse stimuli with the same mean pulse rate and mean interpulse interval (in this case, 400/s and 2.5 ms). The test stimuli had higher base frequencies but lower probabilities of pulse delivery (e.g., 525 Hz at P = 0.762; 550 Hz at P = 0.727), so that the product f × P remained 400 pulses/s. Typical stimuli are shown in Fig. 1.
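The matched stochastic pairs can be generated as sketched below. The only constraint taken from the text is that f × P stays constant (here 500 × 0.8 = 400 pulses/s), which reproduces P = 0.762 at 525 Hz and P = 0.727 at 550 Hz. Placing pulses on whole cycles of a 1 MHz clock follows the hardware description in the Methods; independent draws per cycle, the seeds, and all function names are our assumptions. On this grid, 525 Hz becomes 1905 ticks per period, i.e., about 524.93 pps.

```python
import random
from collections import Counter
from typing import List

CLOCK_HZ = 1_000_000  # 1 MHz timing clock, as described in the Methods


def realized_rate_pps(period_ticks: int) -> float:
    """Pulse rate actually produced when each interval is period_ticks cycles."""
    return CLOCK_HZ / period_ticks


def matched_probability(f_ref: float, p_ref: float, f_test: float) -> float:
    """Pulse probability that keeps the mean rate f * P equal to the reference."""
    return f_ref * p_ref / f_test


def stochastic_pulse_ticks(f_base: float, p: float, duration_s: float,
                           rng: random.Random) -> List[int]:
    """Pulse onsets (in clock ticks): one candidate pulse per base period,
    each delivered independently with probability p."""
    period_ticks = round(CLOCK_HZ / f_base)  # intervals must be whole clock cycles
    n_cycles = round(duration_s * f_base)
    return [k * period_ticks for k in range(n_cycles) if rng.random() < p]


def interval_counts_ms(ticks: List[int]) -> Counter:
    """Interpulse interval histogram, intervals in ms."""
    return Counter((b - a) * 1000 / CLOCK_HZ for a, b in zip(ticks, ticks[1:]))


print(realized_rate_pps(999), realized_rate_pps(1001))  # about 1001.001 and 999.001

rng = random.Random(0)  # a fresh train for every presentation, as in the experiments
for f in (500, 525, 550):
    p = matched_probability(500, 0.8, f)
    ticks = stochastic_pulse_ticks(f, p, 0.8, rng)
    shortest = sorted(interval_counts_ms(ticks))[:3]
    print(f"{f} Hz, P = {p:.3f}: {len(ticks)} pulses, shortest intervals {shortest} ms")
```

Every interval in a given train is a multiple of that train's base period, while the expected pulse count is the same for all three base frequencies; only the fine temporal structure differs.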
Gap detection

When subjects discriminated ordinary from stochastic pulse trains, they were, in one sense, detecting gaps in an otherwise continuous pulse train. Thus, for comparison, a simple gap detection task was included. The reference stimulus was an ordinary pulse train 800 ms long; the test stimuli had gaps inserted at their midpoint. One subject (E.P.) also performed a gap detection task in which narrow-band noise at 250, 500 or 1000 Hz (316 Hz bandwidth) was presented with or without a variable gap.

Fig. 1. Ordinary pulse trains at 500 and 667 Hz differ both in interpulse interval (2 vs. 1.5 ms) and in pulse density (500 vs. 667 pulses/s). (A) Reduced probability of pulse generation (P = 0.8 for '500 Hz' vs. P = 0.6 for '667 Hz') yields a stimulus pair differing only in fine temporal structure; all interpulse intervals are multiples of 2 ms for the '500 Hz' stimulus and multiples of 1.5 ms for the '667 Hz' stimulus. The pulse density is the same for both stimuli: 400 pulses/s. (B) The interpulse interval histograms for the stimulus pair shown in A. Numbers above the histogram bars denote the number of corresponding intervals within a 1 s stimulus. The total count is 399 for the '500 Hz' stimulus and 404 for the '667 Hz' stimulus.

Waveform discrimination

Discrimination among sine, square, and triangular waves in single-channel electrical stimulation must depend on temporal differences in the waveform rather than on spectral differences per se. We therefore considered the square and triangular waves to be extremes of abrupt and gradual current change, respectively, and constructed intermediate trapezoidal waveforms to test the limits of discrimination of rate of change of current (see Fig. 2).
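The trapezoidal family can be sketched as a function of rise time/period r, where r = 0 gives a square wave and r = 0.25 a triangle. The assumptions are ours: a symmetric bipolar waveform whose positive half-cycle rises from 0 to the peak in r·T, holds, and falls back in r·T; charge is taken as the plain integral of current, with leakage and the coupling capacitor ignored. The half-cycle charge is then peak · T · (0.5 - r), so at equal peak current a square wave carries twice the charge of a triangle, and an equal-charge comparison needs the peak scaled by 0.5/(0.5 - r).

```python
def trapezoid(phase: float, rise_frac: float, peak: float = 1.0) -> float:
    """Bipolar trapezoidal current at `phase` (in periods).

    rise_frac is rise time/period: 0 gives a square wave, 0.25 a triangle.
    The positive half-cycle occupies [0, 0.5); the negative half mirrors it.
    """
    if not 0.0 <= rise_frac <= 0.25:
        raise ValueError("rise_frac must lie in [0, 0.25]")
    x = phase % 1.0
    sign = 1.0
    if x >= 0.5:
        sign, x = -1.0, x - 0.5
    if rise_frac == 0.0:
        return sign * peak
    # ramp up over rise_frac, hold at peak, ramp down over the last rise_frac
    level = min(1.0, x / rise_frac, (0.5 - x) / rise_frac)
    return sign * peak * level


def half_cycle_charge(rise_frac: float, peak: float = 1.0,
                      period_ms: float = 1.0, n: int = 10_000) -> float:
    """Charge in one positive half-cycle (current x ms) by the midpoint rule;
    leakage and the coupling capacitor are ignored."""
    dt = 0.5 * period_ms / n
    return dt * sum(trapezoid((i + 0.5) * dt / period_ms, rise_frac, peak)
                    for i in range(n))


def equal_charge_peak(rise_frac: float) -> float:
    """Peak needed to match the charge of a unit-peak square wave.

    The half-cycle charge is peak * period * (0.5 - rise_frac), so the
    scale factor is 0.5 / (0.5 - rise_frac): 1 for a square, 2 for a triangle.
    """
    return 0.5 / (0.5 - rise_frac)


for r in (0.0, 0.1, 0.25):
    print(r, round(half_cycle_charge(r), 4), round(equal_charge_peak(r), 3))
```

The numerical integral is only a cross-check of the closed form; the equal-charge scale factor is what an equal-area amplitude scaling of the stimulus set would use.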
Since the threshold for electrical stimulation may in some cases be more closely related to the charge delivered than to current, it is useful to consider the functions relating total charge to time for different stimulus waveforms. Ignoring charge leakage for the sake of illustration, the instantaneous charge is proportional to the integral of the current waveform. Thus, a square wave (current) yields a triangular wave (charge), and a triangular wave (current) yields a parabolic wave (charge). Fig. 3 illustrates this point: the temporal differences between the charge waveforms are much smaller than those between the corresponding current waveforms.

The exact current through the electrode and voltage across it are not known, because the electrode impedance cannot be measured directly. However, in vitro measurements of the electrode characteristics indicate mostly resistive behaviour in the frequency range of interest, and therefore nearly equivalent waveforms for current and voltage. Note also that the stimuli are AC-coupled via a 180 nF capacitance, which acts as a high-pass filter emphasizing rapid waveform changes. If the effective stimulus is the integrated current per period rather than the instantaneous amplitude, this integration would counterbalance the differentiating effect of the coupling capacitor.

Loudness must be controlled in an experiment of this type: at equal peak current levels, a square wave delivers twice as much charge per half-cycle as a triangular wave. For each experiment, the most comfortable level (MCL) was obtained for each of the 10 stimuli to be used (ranging from square to triangular in equal steps), and each stimulus was then presented at its MCL. The current levels required to equalize loudness were surprisingly close to those predicted by an equal-charge model (Fig. 4). In each waveform experiment, the reference stimulus was a square wave and the test stimuli were trapezoidal, with gradually increasing rise time/period, up to the extreme (rise time/period = 0.25) at which a triangular wave was produced.

Fig. 2. Variation of rise time/period from 0.0 (rectangular waveform) to 0.25 (triangular waveform). The curves are scaled in amplitude to yield equal areas. Thick lines denote stimulus (current) waveforms; thin lines are the integrated curves corresponding to charge over time.

Fig. 3. Charge waveforms for rectangular and triangular current waveforms.

Jitter discrimination

Discrimination of waveforms (e.g., square vs. triangular) could be based on changes in the degree of synchrony in a single neuron's interval histogram (temporal averaging) and/or on changes in the synchrony of an array of neurons (spatial averaging). Assume that a given neuron's threshold of response is not absolute but has some variability. A square wave would pass through the region of that neuron's threshold more rapidly than a triangular wave, and would thus be expected to elicit a spike with more precision on each cycle, i.e., better phase-locking. This condition can be simulated with a pulse train with imposed jitter (Fig. 5). The reference stimulus for these experiments was an ordinary pulse train (e.g., 500 Hz) in which all interpulse intervals were exactly 1/frequency. The test stimuli were pulse trains with Gaussian-distributed interpulse intervals with the same mean as the reference stimulus and increasing standard deviation (S.D.) (e.g., mean 2 ms, S.D. = i × 0.05 ms, i = 0, ..., 9).

Results

Stochastic pulse stimuli

Temporal discrimination results are summarized in Table II. Normal listeners very readily detect slight decreases in the probability of pulse delivery. With no training, DLs of 0.05 are typical; in other words, P = 0.95 is distinguishable from P = 1.0 at frequencies from 80 to 1000 Hz.
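The jittered stimuli described under Methods can be sketched as Gaussian interpulse intervals with mean 1/f and a chosen S.D., where S.D. = 0 gives the ordinary reference train. Clipping intervals shorter than the 0.1 ms pulse width, the function names, and the seed are our assumptions; the clipping is rare at small S.D. values but affects a few percent of intervals at 500 Hz with S.D. = 1.0 ms (the largest value in Fig. 5), where it slightly raises the mean interval.

```python
import random
from typing import List, Optional


def jittered_pulse_times_ms(freq_hz: float, sd_ms: float, duration_ms: float = 800.0,
                            rng: Optional[random.Random] = None,
                            min_interval_ms: float = 0.1) -> List[float]:
    """Pulse onset times (ms) with Gaussian interpulse intervals.

    Intervals are drawn from N(1000/freq_hz, sd_ms); sd_ms = 0 gives the
    ordinary (reference) pulse train. Draws shorter than min_interval_ms are
    clipped so that 0.1 ms pulses cannot overlap (an assumption).
    """
    rng = rng or random.Random()
    mean_ms = 1000.0 / freq_hz
    times: List[float] = []
    t = 0.0
    while t < duration_ms:
        times.append(t)
        t += max(rng.gauss(mean_ms, sd_ms), min_interval_ms)
    return times


def jitter_ratio_percent(sd_ms: float, freq_hz: float) -> float:
    """Jitter ratio: interval S.D. / mean interpulse interval, in percent."""
    return 100.0 * sd_ms * freq_hz / 1000.0


rng = random.Random(0)
# Test steps of the kind given in the Methods for 500 Hz: S.D. = i * 0.05 ms.
for i in (0, 3, 6, 9):
    sd = i * 0.05
    times = jittered_pulse_times_ms(500, sd, rng=rng)
    print(f"S.D. {sd:.2f} ms ({jitter_ratio_percent(sd, 500):.1f}%): {len(times)} pulses")
```

Because each interval is drawn independently, timing errors accumulate along the train; the reference and test trains share only the mean rate, not the positions of individual pulses.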
With electrical stimulation, U.T. obtained DLs from 47 EOURL-LOUONESS LEVELS FOR DIFFERENT WAVEFORMS tU.T.I 500 Hz PULSE TRRIHS WITH JITTER sd (es«) 11111111111111111111111111111111111111111111111111 1111111111111111111111111 1111111111111111111111111I 1111111111111111111111111 111111111111111 111 11111 111 1 11111111111111 1111111111 II 11111111111 111 11111 111 II 11111111111111111III II II I 1111111111 111 11 I II 111 11 0.0 Time (msecl 0. 0 0.25 0.5 0.75 1.0 100. Fig. 5. Pulse trains (500 Hz) with varying amount of jitter. The pulse intervals are gaussian distributed around a mean value of 2 ms with standard deviations (S.D.) of 0 (regular pulse train), 0.25, 0.5, 0.75 and 1.0 ms (from top to bottom). Only 100 ms of the 800 ms trains are displayed. O Mo 13.1-080. 4-1.250. ›FH0(500. (F* 1000. 0.05 0.10 0.15 BISE TIME/PERIOD 0.20 0.03 to 0.11; E.P. from 0.12 to 0.30. U.T.'s DLs were all less than 0.05 from 125 to 500 Hz. Both showed slightly worse performance at higher frequencies. At 750 Hz, their DLs were 0.11 (U.T.) STOCMASTIC PULSE RATE DIFFERENCE LIMENSI 3 NORMALS 0.25 8 P. EQUAL-LOUDNESS LEVELS FOR DIFFERENT WAVEFORMS (E.P.I a-0.2 X--S 0. 8 +- 0.8 2-Z0.8 X-1(1.0 2 B O w2 cc 70 0.0 8 •b.o 250.0 550.0 FREGUENCY (HZ) 750.0 1000.0 FRED (HZ) Ch4D80. A-4125. 01-0125. £-+250. +-+500. »4(750. 411-• t 000. 8 >H0(500. 40-6750. 4-+1000. A (c ue 2.. -I 8 111.05 CAD 0.15 RISS TIME/PERU:1D 8 L0 Fig. 4. Normalized peak amplitude levels for equal Ioudness sensation versus rise time/period at different frequencies. (A) Data for U.T.; (B) E.P. FIRING PR08R8ILITY tlo Fig. 6. Relative DLs (df/f, in percent) for stochastic pulses: mean values for 3 normal subjects. 48 TABLE II DIFFERENCE LIMENS - TEMPORAL DISCRIMINATION Freq. (Hz) Stochastic pulses Gap DL Pulse prob. 
Jitter DL (ms) (S.D./T, %) 1.0 0.8 0.6 0.4 0.2 0.6 0.6 0.3 0.4 0.5 0.3 2.2 1.5 0.4 0.5 0.5 0.1 1.0 0.7 0.5 0.5 0.5 0.5 0.01 0.01 0.01 0.02 0.03 0.02 0.18 0.08 0.03 0.02 0.01 0.01 4.5 3.3 1.5 1.0 2.7 2.0 0.07 0.12 0.02 0.03 0.14 0.17 2.20 (27.5) 1.25 (15.6) 0.77 (19.2) 0.27 (13.5) 0.40 (30.0) 0.05 (5.0) 0.11 0.14 0.17 0.01 0.15 1.53 (19.2) 1.30 (32.5) 0.37 (18.3) Normal subjects (mean values) 80 125 250 500 750 1 000 0.01 0.01 0.01 0.01 0.01 0.01 0.4 0.3 0.4 0.2 0.3 0.2 0.3 0.3 0.2 0.1 0.2 0.1 0.4 0.3 0.4 0.2 0.2 0.2 U.T. 80 125 250 500 750 1 000 0.11 0.04 0.06 0.03 0.11 0.07 2.7 7.5 3.8 7.0 7.0 25.0 32.0 5.0 5.2 8.0 24.0 28.0 11.3 6.3 5.0 20.0 E.P. 80 125 250 500 750 1 000 0.12 0.18 0.22 0.30 0.30 10.8 17.5 23.3 22.5 31.7 61.7 25.0 29.2 23.3 30.8 33.3 and 0.30 (E.P.). Both patients reported the stochastic pulse trains to be rough, and could clearly distinguish the rough-smooth dimension elicited from either loudness or pitch. Pulse rate discrimination for stochastic pulse stimuli is, for normal listeners, essentially as good as for ordinary pulse trains, even for probability as low as 0.4. With about 1 h training, normal subjects achieved relative DLs (expressed as df/f, in percent) of less ihan 1% for all frequencies and all probabilities > 0.2 (see Fig. 6, for example). This is better than would be expected for sinusoids (especially at low frequencies, where discrimination of, say, 125 from 126 Hz would be exceptional) and is probably attributable to the rich harmonic structure of the stimuli. A dick train at 125 Hz contains harmonics at 2 and 4 kHz which can be discriminated from harmonics at 2.015 and 4.032 kHz in a 126 Hz dick train. Even at very low probabilities of pulse delivery, this harmonic structure persists, so that this task can- 25.3 10.0 30.0 23.3 Noise (ms) Waveform (RT/T) Pulses (ms) Rate discr. 
(df/f, %) 16.7 5.0 4.0 60.0 1.8 55.0 (1.47) (1.04) (0.83) (0.83) (1.0) (1.0) 0.14 (14.2) not be considered a pure test of temporal discrimination in normal subjects. As expected, the electrically stimulated patients did muck worse. For ordinary pulse trains, U.T. typically had relative DLs (df/f) of 3-7% up to 750 Hz, about the same as had been obtained with sinusoidal stimuli. With stochastic pulse trains (P = 0.8 or 0.6), she continued to hear the test stimuli as higher in pitch than the reference stimulus, and performed nearly as well as she had with ordinary pulse trains. When probability was reduced further, performance was severely degraded (Fig. 7). E.P.'s performance was worse, but showed similar pattems (Fig. 8): up to 500 Hz, he performed nearly as well for P = 0.8 as he had for ordinary pulse trains (or for sinusoidal stimuli). At 80 and 125 Hz he did as well with P = 0.4 and even P = 0.2. Gap detection U.T. was able to detect gaps as small as 1.0 ms 49 STOCHASTIC PULSE RATE DIFFERENCE LIMENS, U.T. STOCHASTIC PULSE RATE DIFFERENCE LIMENS: E.P. P. P- + 0.2 *-x0.4 2-Z0.8 )1(-* 1 . 0 0.0 `b.o 500.0 FREOUENCT (HZ) 250.0 5.00.0 FREOUENC (ND 750.0 10e0.0 8 FREI] (HZ) FREI] (HZ) 0-1080. 0-4125. 0-4080 . A-4 I 25. ›F-X500. 4-0750. >H<500. 0.-0750. 4.-1'1000. +-* 1000. os = 8 0.2 0.0 0.8 FIRING PROBRBILITT 0.0 1.0 0L0 0.2 0.0 dAN FIRING PROEIRRIL1TT de 1.0 Fig. 7. U.T.'s relative DLs (df/f, in percent) for stochastic pulses. Fig. 8. E.P.'s relative DLs (df/f, in percent) for stochastic pulses. (in a 500 Hz pulse train); her DLs for 250 Hz and 1000 Hz were 1.5 and 2.0 ms, respectively. E.P. did less well, with gap DLs of 4.0 ms (500 Hz), 2.7 ms (750 Hz), and 1.8 ms (1000 Hz). Performance for detection of gaps in narrow-band noise (E.P. only) was much worse, even after some practice: 60 ms at 500 Hz, 55 ms at 1000 Hz. and vice versa. 
Interestingly, on a second day of testing, she reversed this labelling: square waves were reported as sharp and triangular waves as dull. Her ability to discriminate stimuli consistently along this dimension was unimpaired. Perhaps sharp-dull was not an apt choice to describe the perceptual continuum heard by the patient, but it was (on each occasion) her choice. DLs for waveform variation ranged from 0.02 (rise-time/period) for 250 Hz to 0.17 at 1000 Hz. Her equal-loudness levels approximated the expected equal area function except for 1000 Hz (Fig. 4A). E.P. performed nearly as well as Ü.T. although he was not able to label the differences in perception other than "sounds different". He chose to perform the test at a level above his usually set Waveform discrimination U.T. was able to distinguish square wave from triangular wave stimuli, even up to 1000 Hz. Prior to formal DL measurements with loudness matching, she was able to label these as sharp (triangular) and dull (square). This distinction was subjectively different from loudness, and persisted even when one stimulus was presented near threshold and the other near uncomfortable loudness level, 50 most comfortable level. Again the loudness equalization was performed for every new frequency and test condition. The equalized amplitude deviated systematically from the expected area-function (Fig. 4B). There was one exceptionally good performance at 750 Hz with 100% correct responses for all waveform differences greater than the minimum (rise-time/period = 0.025). At all other tested frequencies the DLs were between 0.11 and 0.17 (see Table II). Jitter discrimination Both U.T. and E.P. heard stimuli with imposed jitter as rough compared to the smooth sound of ordinary pulse trains. This was different from the sharp-dull dimension elicited by waveform variation. U.T. 
performed slightly better, with jitter DLs (values in parentheses denote the jitter ratio in percent: pulse interval S.D./mean interpulse interval) of 2.2 ms (27.5%) at 80 Hz, 1.25 ms (15.6%) at 125 Hz, 0.77 ms (19.2%) at 250 Hz, 0.27 ms (13.5%) at 500 Hz, 0.40 ms (30.0%) at 750 Hz and 0.05 ms (5.0%) at 1000 Hz. The outcome at 1500 Hz was 0.04 ms (6.0%) (not shown an Table II). E.P. commented the jittered stimuli as "the computer is crazy". His DLs were 1.53 ms (19.2%) at 125 Hz, 1.3 ms (32.5%) at 250 Hz, 0.37 ms (18.3%) at 500 Hz and 0.14 ms (14.2%) at 1000 Hz. There were no measurements at 80 and 750 Hz. Discussion The ability of implanted patients to distinguish stochastic pulse trains from ordinary pulse trains could be considered a form of gap detection. When P = 0.95 at 500 Hz, for example, most of the interpulse intervals are 2 ms, but in a pulse train 800 ms long, the probability of at least one interval 6 ms (3 times the period) is about 0.61. The probability of an interval of 8 ms or longer in such a stimulus is only 0.05. Thus, discrimination of such stimuli from an ordinary pulse train, as was achieved by one of our patients (U.T.) would be equivalent to detecting a `gap' of about 4 ms (since the normal period of the pulse train is 2 ms, an interpulse interval of 6 ms is equivalent to an additional gap of 4 ms). Even E.P., who in this as in most other tests performed worse than U.T., was able to discriminate stochastic pulse trains of 750 and 1000 Hz, with P= 0.70, from ordinary pulse trains. The maximum intervals present in such stochastic stimuli would be 7.-8 ms in most cases, yielding maximum gaps of 6-7 ms. This is much better than the 22-157 ms (mean = 52 ms) reported by Hochmair-Desoyer et al. [13] for gap detection using broadband (250-1000 Hz) noise bursts in 12 implant patients. However, our patients' performance in gap detection for pulse trains was also much better than this. One patient (E.P.) 
also performed gap detection for narrow-band noise, and his performance was comparable to that reported by Hochmair-Desoyer et al. [13]. It seems, therefore, that gaps in very regular stimuli (pulse trains) are much more readily detected than gaps in irregular stimuli (noise bands).
Figs. 7 and 8 show the pulse-rate discrimination performance of U.T. and E.P. for ordinary and stochastic pulse trains. For ordinary (P = 1.0) pulse trains, their performance is the same as for sinusoidal stimuli (see also [29]). For slight reductions in the probability of pulse delivery (P = 0.8 or 0.6), performance was unaffected. Recall that in this task, subjects had to identify correctly as higher or lower stimuli with identical average pulse rates, differing only in the distribution of interpulse intervals (e.g. 2, 4, 6, ... ms for base frequency = 500 Hz; 1.82, 3.63, 5.45, ... ms for base frequency = 550 Hz). Not only do these data unequivocally prove that the CANS is able to use such temporal patterns for frequency discrimination; they also remove any doubt about confounding of pitch and loudness, since the test stimuli all contained equal numbers of identical pulses. Performance deteriorated rapidly for P smaller than 0.5. This suggests that these patients relied primarily on the periods of the base frequencies themselves in making frequency discriminations, rather than on the higher multiples of these periods, which were increasingly represented in low-probability stimuli. Thus, a relative DL (df/f) of 5% at 500 Hz represents a discrimination of 2 ms vs. 1.9 ms, an absolute DL of about 0.1 ms.
The most 'realistic' of our stochastic pulse stimuli, in terms of imitating VIIIth nerve spike trains, were probably those with a low probability of pulse delivery, yielding average rates of about 200/s or less, e.g., P = 0.4 at 500 Hz or P = 0.2 at 1000 Hz. These were poorly discriminated, and sounded very unpleasant to our patients.
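The interval arithmetic in the preceding paragraphs can be checked with a short sketch. It assumes, as a simplification, that on each cycle a pulse is delivered independently with probability P. Under that assumption an interpulse interval of (k + 1) periods requires k consecutive omitted pulses. Runs of omissions are counted anywhere in a train of round(duration × rate) cycles, and edge effects at the start and end of the train are ignored. The function names are hypothetical illustrations, not the stimulus software used in this study.

```python
import random


def stochastic_pulse_train(rate_hz, duration_ms, p_deliver, seed=0):
    """Pulse times (ms): each cycle of the base rate independently carries a
    pulse with probability p_deliver (assumed Bernoulli model)."""
    rng = random.Random(seed)
    period_ms = 1000.0 / rate_hz
    n_cycles = round(duration_ms / period_ms)
    return [k * period_ms for k in range(n_cycles) if rng.random() < p_deliver]


def prob_long_interval(rate_hz, duration_ms, p_deliver, k_missing):
    """Probability of at least one run of k_missing consecutive omitted pulses,
    i.e. of an interval lasting at least (k_missing + 1) periods."""
    assert k_missing >= 1
    n_cycles = round(duration_ms * rate_hz / 1000.0)
    q = 1.0 - p_deliver
    # state[j]: probability that no qualifying run has occurred yet and the
    # current run of omitted pulses has length j (0 <= j < k_missing).
    # A run reaching k_missing is not carried forward (the event occurred).
    state = [1.0] + [0.0] * (k_missing - 1)
    for _ in range(n_cycles):
        new = [0.0] * k_missing
        new[0] = p_deliver * sum(state)  # a delivered pulse resets the run
        for j in range(1, k_missing):
            new[j] = q * state[j - 1]  # one more omitted pulse
        state = new
    return 1.0 - sum(state)


def period_dl_ms(base_hz, relative_dl):
    """Absolute period difference (ms) corresponding to a relative DL df/f."""
    return 1000.0 / base_hz - 1000.0 / (base_hz * (1.0 + relative_dl))


if __name__ == "__main__":
    # P = 0.95 at 500 Hz, 800 ms: an interval >= 6 ms needs 2 omissions in a row
    print(round(prob_long_interval(500, 800, 0.95, 2), 2))  # prints 0.61
    # an interval >= 8 ms needs 3 omissions in a row
    print(round(prob_long_interval(500, 800, 0.95, 3), 2))  # prints 0.05
    # 5% relative DL at 500 Hz: 2 ms vs. about 1.905 ms
    print(round(period_dl_ms(500, 0.05), 3))  # prints 0.095
```

Under these assumptions the exact run-length calculation agrees with the probabilities of 0.61 and 0.05 quoted above. In the generated trains, every interpulse interval is an integer multiple of the base period, as in the 2, 4, 6, ... ms example.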
We have to conclude from the stochastic-pulse results that single-channel stochastic pulse stimulation (imitating the temporal firing distribution of single auditory nerve fibers) does not provide a truly 'natural' pulse code. To mimic the spatial distribution of discharge patterns as well, independent and different versions of stochastic pulse trains would probably have to be supplied to different neurons. This would, however, require multiple stimulation channels for each frequency band to be transmitted, and is probably unrealistic.
From the single-unit data available, it seems likely that electric pulse trains up to at least 600-900 Hz elicit cycle-for-cycle firing [9,30]. While this is indeed unnatural vis-à-vis acoustic stimulation, the CANS is able to use this information to make frequency discriminations based on fine temporal structure. Neural models capable of such an analysis of incoming VIIIth nerve data have been described. They postulate an array of units tuned to 'characteristic periods'. These units could receive separate straight-through and delayed versions of the VIIIth nerve spike train (Licklider's neural autocorrelator [20]), or could be oscillators with positive feedback loops of varying temporal path length [18]. In either case, such neurons would respond maximally when the interval between incoming spikes (T) equals the path length of the feedback loop or the delay line. Even if no single VIIIth nerve neuron fires at a rate of 1/T, a group of neurons from the same cochlear locus could converge upon a single second-order neuron, so that the aggregate input to the latter cell would be a spike train at f = 1/T. Godfrey et al. [5] have described cochlear nucleus units that fire cycle-for-cycle with click trains up to 800/s. Such units presumably receive convergent input from several auditory neurons and could provide regular spike-train input to higher-order neurons actually performing period analysis.
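A delay-and-coincidence array in the spirit of Licklider's autocorrelator [20] can be sketched in a few lines. This is a simplified illustration, not a model from this paper. Each hypothetical 'characteristic period' unit counts how many incoming spikes are followed, within a small tolerance, by another spike exactly one delay later. The array's best period is the delay of the most active unit. The candidate delays, the 0.005 ms tolerance and all function names are assumptions chosen for illustration.

```python
import bisect


def coincidence_counts(spike_times_ms, delays_ms, tol_ms=0.005):
    """For each delay, count spikes that have a partner spike one delay later."""
    times = sorted(spike_times_ms)
    counts = {}
    for d in delays_ms:
        n = 0
        for t in times:
            # first spike at or after t + d - tol; is it within t + d + tol?
            i = bisect.bisect_left(times, t + d - tol_ms)
            if i < len(times) and times[i] <= t + d + tol_ms:
                n += 1
        counts[d] = n
    return counts


def best_period_ms(spike_times_ms, delays_ms, tol_ms=0.005):
    """Delay of the most active unit (the first listed wins on ties)."""
    counts = coincidence_counts(spike_times_ms, delays_ms, tol_ms)
    return max(delays_ms, key=lambda d: counts[d])


def regular_train(rate_hz, duration_ms):
    """Cycle-for-cycle spike times (ms) at the stimulus rate."""
    period = 1000.0 / rate_hz
    return [k * period for k in range(round(duration_ms / period))]


# Hypothetical bank of units tuned to 1.00 ... 4.00 ms in 0.01 ms steps
DELAYS = [round(1.0 + 0.01 * i, 2) for i in range(301)]

if __name__ == "__main__":
    print(best_period_ms(regular_train(500, 800), DELAYS))  # prints 2.0
    print(best_period_ms(regular_train(550, 800), DELAYS))  # prints 1.82
```

A regular train also produces coincidences at delays of 2T, 3T, ..., but with one fewer spike pair at each step. The unit tuned to T therefore responds most strongly, and the 500 Hz and 550 Hz trains (2 ms vs. about 1.82 ms) map onto different units.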
To our knowledge, period-analyzing units like this have not been described in the brainstem, but they could be sought with electrical stimuli like those used in this study. Similar units, binaurally innervated with "characteristic interaural delays", have been described in the superior olivary complex [6]. Models like this would detect only a particular period T. They would not be capable of analyzing a single stochastic spike train with interspike intervals of nT unless one postulated multiple delay lines (or feedback loops) for each unit, with delays of T, 2T, 3T, etc. This seems unnecessary and certainly less parsimonious than the models described above. In addition, our patients' poor performance for stochastic pulse trains with P < 0.4 suggests that they relied mainly on the primary intervals present in the stimuli and were not able to use the period multiples (2T, 3T, 4T, ...) very well.
It is often argued that frequency discrimination based on temporal features of electrical stimulation ('rate pitch') is limited to the region below 400-500 Hz [27,26]. Indeed, pitch scaling is clearly random above this level in many cases [10]. However, frequency discrimination as good as 10% (df/f) has been reported for frequencies as high as 1000 Hz [28,12] or 2000 Hz [1]. Muller [23] argues that loss of pitch-labelling or scaling ability (above 500 Hz) does not necessarily mean that information from higher spectral regions is wasted in electrical stimulation of the cochlea. Cases where speech discrimination is impaired by low-pass filtering at 900 Hz [11] support this point of view. Another argument for including relatively high frequencies in coding schemes is that patients can discriminate waveforms which differ (in the frequency domain) only at harmonics of the frequency used. One of our patients (U.T.) could clearly distinguish square from triangular waves, independently of intensity manipulations, up to 1000 Hz.
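The point that square and trapezoidal (or triangular) waves differ mainly at higher harmonics can be made concrete with Fourier series. The sketch below assumes an idealized, zero-mean bipolar square wave. The trapezoid is obtained by replacing each transition with a linear ramp lasting the rise time, which is equivalent to convolving the square wave with a rectangular window of that width. The stimulus waveforms actually used in the study may differ from this model. Under these assumptions, the n-th odd harmonic of the trapezoid equals the square-wave amplitude 4/(nπ) multiplied by sinc(n × rise/period). The closed-form total energy of the difference (4 × rise/period / 3) follows from the same model. Function names are hypothetical.

```python
import math


def harmonic_amplitudes(n_max, rise_frac):
    """(n, square, trapezoid) odd-harmonic amplitudes for unit peak, where the
    trapezoid's ramps last rise_frac * period (assumed idealized model)."""
    out = []
    for n in range(1, n_max + 1, 2):
        square = 4.0 / (n * math.pi)
        x = math.pi * n * rise_frac
        sinc = math.sin(x) / x if x else 1.0
        out.append((n, square, square * sinc))
    return out


def difference_energy_below(f0_hz, rise_frac, cutoff_hz):
    """Fraction of the square-minus-trapezoid difference energy carried by
    harmonics at or below cutoff_hz."""
    # Closed form (Parseval) for the sum of squared difference amplitudes
    total = 4.0 * rise_frac / 3.0
    n_max = int(cutoff_hz // f0_hz)
    below = sum((sq - tr) ** 2 for _, sq, tr in harmonic_amplitudes(n_max, rise_frac))
    return below / total


if __name__ == "__main__":
    # 250 Hz, rise time/period = 0.02 (0.08 ms), U.T.'s waveform DL
    frac = difference_energy_below(250.0, 0.02, 2000.0)
    print(f"{100 * frac:.2f}% of the difference energy lies at or below 2 kHz")
```

With these assumptions, only about 0.2% of the energy that distinguishes the two 250 Hz waveforms falls at or below 2 kHz. This is consistent with the view that low-pass filtering at 2000 Hz would remove essentially all of the cue.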
Two possibilities for the neural encoding of such waveform differences were briefly mentioned in the Introduction. The more likely is that the degree of synchrony among a population of VIIIth nerve units (greater for square waves, less for triangular waves) provides the cue for discrimination. Fig. 9 illustrates this idea. If one assumes that units A, B and C have different thresholds, they will discharge together for the square wave, but at different phases of the triangular-wave stimulus. A neural substrate for this phenomenon is provided by the finding of Van den Honert and Stypulkowski [30] that individual VIIIth nerve unit thresholds, independent of cochlear place, span a range of about 12 dB. Loeb et al. [21] have similarly interpreted perceptual differences between equal-charge electrical pulses of different durations. Their account invokes a spatial cross-correlation mechanism involving units with different firing thresholds.
An alternative possibility would be that a single neuron does not have a precisely determined threshold, but instead a range of possible thresholds (mean ± some standard deviation). The CANS could then distinguish between various degrees of synchrony by examining the interval histograms of such units. Fig. 9 can again serve for illustration if A, B and C are now thought of as discharges (at different times) of the same unit, whose range of possible thresholds is shown. This latter hypothesis seems less likely, since no one has reported individual unit threshold variability for electrical stimulation. In fact, the extremely steep rate-level curves reported by Kiang and Moxon [17], in which the dynamic range (from spontaneous rate to maximum) was only about 4 dB, argue against it.
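The synchrony argument illustrated in Fig. 9 can be quantified under a deliberately crude assumption. Each unit is taken to fire at the instant the stimulus current first reaches its threshold, with no latency, refractoriness or noise. For an ideal square-wave edge, every unit whose threshold is below the peak then fires at the same time. For a linear ramp, firing times spread in proportion to the rise time and to the range of thresholds. The 12 dB threshold range is taken from the text [30]. The following choices are illustrative assumptions: thresholds spaced evenly in dB, the highest placed at the stimulus peak, and the function names themselves.

```python
def ramp_crossing_times(thresholds, peak, rise_ms):
    """Time (ms) at which a linear ramp from 0 to `peak` over `rise_ms` reaches
    each threshold; rise_ms = 0 models an ideal square-wave edge."""
    times = []
    for thr in thresholds:
        if thr > peak:
            continue  # this unit never reaches threshold on this cycle
        times.append(0.0 if rise_ms == 0 else rise_ms * thr / peak)
    return times


def synchrony_spread_ms(rise_ms, threshold_range_db=12.0, n_units=5):
    """Spread of firing times across hypothetical units whose thresholds are
    spaced evenly in dB over threshold_range_db, the highest equal to the peak."""
    peak = 1.0
    thresholds = [peak * 10 ** (-threshold_range_db / 20.0 * (1 - i / (n_units - 1)))
                  for i in range(n_units)]
    times = ramp_crossing_times(thresholds, peak, rise_ms)
    return max(times) - min(times)


if __name__ == "__main__":
    for rise in (0.0, 0.06, 0.08):  # square wave and the two trapezoid DLs
        print(f"rise {rise:.2f} ms -> spread {synchrony_spread_ms(rise):.3f} ms")
```

A 12 dB range corresponds to a factor of about 4 in current. For U.T.'s 250 Hz waveform DL (0.08 ms rise time), the model therefore spreads the firing times of the array over about 0.06 ms, while the square wave leaves them fully synchronous.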
Fig. 9. Hypothetical firing patterns for three units with different thresholds for rectangular (left) and triangular (right) stimuli (500 Hz). (A) Low-threshold unit. (B) Medium-threshold unit. (C) High-threshold unit. [Panel title: pulse generation with different waveforms (500 Hz); abscissa: time (ms).]
Fig. 10. Comparison of waveform and jitter DLs for U.T. (250 Hz). The upper plot shows the Gaussian pulse-interval distribution function (standard deviation = 0.77 ms) and the interval histogram of a stimulus pulse train generated according to this function. The lower plot shows a trapezoidal waveform (rise time/period = 0.02, thick line) and a rectangular waveform (rise time/period = 0.0, thin line) of a 250 Hz stimulus. [Panel title: waveform and jitter DL, U.T., 250 Hz; abscissa: time (ms).]
An indirect test of this hypothesis was provided by the jitter-DL experiments. It was assumed that pulses at, say, 250 Hz with imposed jitter would simulate the response of neurons with variable thresholds to a gradual (triangular) waveform. In fact, performance on the jitter DL was much worse than would be needed to make the corresponding waveform discrimination. U.T. could distinguish an ordinary 250 Hz pulse train from one with a jitter ratio of 19.2%, i.e. one in which 95% of interpulse intervals were between 2.46 and 5.54 ms (mean ± 2 S.D.). Waveform discrimination at 250 Hz, on the other hand, was possible with a DL (rise time/period) of only 0.02; a trapezoid with a rise time of 0.08 ms could be distinguished from a square wave (see Fig. 10). At 500 Hz, the results were similar: the waveform DL was 0.03, and a trapezoid with a rise time of 0.06 ms was discriminated. The jitter DL was 0.27 ms. In other words, a stimulus with 95% of interpulse intervals between 1.47 and 2.53 ms (mean ± 2 S.D.) could be distinguished from an ordinary pulse train.
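The jittered stimuli can be sketched by drawing each interpulse interval from a Gaussian distribution, as in the upper panel of Fig. 10. This is a minimal sketch that assumes independent Gaussian intervals whose mean equals the nominal period. The clipping of intervals at a small positive minimum is an added safeguard, and the function names are hypothetical. The sketch reproduces the jitter ratio and the mean ± 2 S.D. range quoted for U.T. at 250 Hz, and compares the interval S.D. with the 0.08 ms rise time that sufficed for waveform discrimination.

```python
import random
import statistics


def jittered_intervals(rate_hz, jitter_sd_ms, n, seed=0, min_ms=0.05):
    """Interpulse intervals (ms) drawn independently from a Gaussian with mean
    1000/rate_hz and S.D. jitter_sd_ms, clipped at min_ms (assumption)."""
    rng = random.Random(seed)
    mean_ms = 1000.0 / rate_hz
    return [max(min_ms, rng.gauss(mean_ms, jitter_sd_ms)) for _ in range(n)]


def jitter_ratio(rate_hz, jitter_sd_ms):
    """Interval S.D. divided by the mean interpulse interval."""
    return jitter_sd_ms / (1000.0 / rate_hz)


if __name__ == "__main__":
    mean_ms, sd_ms, rise_ms = 4.0, 0.77, 0.08  # U.T., 250 Hz
    print(f"jitter ratio: {100 * jitter_ratio(250, sd_ms):.2f}%")
    print(f"mean +/- 2 S.D.: {mean_ms - 2 * sd_ms:.2f} to {mean_ms + 2 * sd_ms:.2f} ms")
    iv = jittered_intervals(250, sd_ms, 20000, seed=1)
    print(f"sample mean {statistics.fmean(iv):.3f} ms, S.D. {statistics.stdev(iv):.3f} ms")
    inside = sum(abs(x - mean_ms) <= 2 * sd_ms for x in iv) / len(iv)
    print(f"fraction of intervals within mean +/- 2 S.D.: {inside:.3f}")
    print(f"jitter S.D. / trapezoid rise time: {sd_ms / rise_ms:.1f}")
```

With the values reported for U.T. at 250 Hz, the just-detectable jitter S.D. (0.77 ms) is roughly ten times the just-detectable rise time (0.08 ms). This is the order-of-magnitude gap between the two kinds of discrimination.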
Thus, waveform differences of less than 0.1 ms could be discriminated, while pulse trains needed about 1 ms of jitter before they could be discriminated from ordinary pulse trains. The temporal differences among discriminable waveforms would be even smaller if we considered their charge (integrated current) waveforms (see Fig. 3). It should be noted again that pulse trains with jitter were heard as rough, whereas the difference between square and triangular waves was heard along a sharp-dull dimension. Pulsatile stimuli (of short duration, constant amplitude and with interpulse intervals greater than the refractory period) evoke synchronous activity in the whole nerve-fiber population for both regular and irregular pulse trains. Analog waveforms, by contrast, produce temporally more dispersed discharge patterns. Our results seem to indicate that the CANS interprets synchronous interneural timing differences between successive stimulus periods differently from periodically repeated interneural timing differences within the same cycle.
Several of the questions posed in the Introduction have been answered, at least in part:
1. How well can the CANS use purely temporal information for frequency discrimination? Stochastic pulse trains permitted us to present stimuli differing only in fine time structure. For probabilities > 0.5, performance was as good as for ordinary pulse trains. Period differences as small as 0.1 ms could be discriminated.
2. Would a stochastic, low-mean-rate pulse code be useful for electrical stimulation of the cochlea? Frequency discrimination was best for high-probability pulse trains, which were least like natural auditory-nerve firing patterns, so the answer seems to be 'no'. Apparently the CANS does not ordinarily infer frequency by analyzing single-unit interval histograms. Instead, at least in the frequency range studied (< 1000 Hz), it analyzes the aggregate response of an array of units. However, under conditions of pulsatile electrical stimulation, discrimination of interpulse periods does not depend on interneural timing differences.
3. What are the limits of waveform discrimination for analog coding? Rise-time differences (square wave vs. trapezoid) of less than 0.1 ms could be discriminated; low-pass filtering, even at 2000 Hz, would eliminate these cues.
4. What are the mechanisms for waveform discrimination? If the CANS were able to distinguish waveforms on the basis of the degree of synchrony in single-unit period histograms, subjects should perform as well in detecting jitter in pulse trains as they do in waveform discrimination. Since jitter detection was an order of magnitude worse than waveform discrimination, we conclude that the degree of synchrony in a neural array is the most important cue.
To interpret the results of both the stochastic pulse and waveform discrimination experiments, the notion of a neural array is necessary. Even a single electrode can be considered to control a 'multi-channel' neural receiver. This has implications for implant coding strategies. Pulse coding in which only the timing and amplitude of pulses are varied probably could not exploit this perceptual ability, whereas analog codes (or pulse codes in which pulse duration varies) could. The choice of a coding scheme, however, depends on many factors. Our group has previously recognized that simpler analog codes (e.g., the one used by the Vienna group) may have advantages over pulse codes. The information processing required for a pulse code inevitably results in a loss of information, and we are very far from knowing the 'optimal' pulse code [3]. However, analog schemes are themselves sometimes difficult to use in practice, requiring intricate frequency-equalization adjustments. The choice of a pulse code by our group was also based on safety considerations.
The experiments reported here make some sense in terms of what is known about single-unit responses to electrical stimulation. They also raise many new questions, which should be answerable by single-unit neurophysiological methods. Do individual VIIIth nerve units possess enough threshold variability to signal the difference between a square and a triangular wave by the degree of synchrony in their interval histograms? (We doubt it.) What are the interval histograms of such units like at high frequencies (1-4 kHz)? Are they still unimodal? At the cochlear nucleus level, are units responding preferentially to specific pulse intervals to be found? (We expect they will be.) Psychophysical work also needs to be done. Simple frequency discrimination above 500 Hz for electrical stimulation is admittedly poor. Nevertheless, the results reported here give some intriguing hints that the CANS is capable of rather fine temporal discrimination, on the order of less than 0.1 ms. With regard to consonant discrimination, it would be particularly interesting to know how well electrically stimulated patients can perform on tasks involving rapid frequency changes.
We wish to emphasize that the results reported are only a minimum estimate of what at least some patients can do. There is, of course, tremendous variability in the abilities of implant patients, depending (presumably) on nerve survival, intellectual ability, etc. Many authors have considered that the differences among patients would assist in choosing between single-channel and multi-channel devices.
These differences may also be relevant to a choice among coding strategies for particular patients receiving single-channel devices. A patient with very limited neural survival might best be served by a very limited pulse code that minimizes problems of dynamic range. Another patient, with a more generous array of neurons, might profit from an analog coding scheme that lets him exploit his 'multi-channel' auditory-nerve receiver.
Acknowledgements
This work was supported by Swiss National Research Foundation Grant No. 3.848.0.79 and by a Teacher-Investigator Development Award from the National Institute of Neurological and Communicative Disorders and Stroke (to R.A.D.). The authors would like to acknowledge the critical comments of Drs. C. Van den Honert, P. Stypulkowski, I. and E. Hochmair, and R. Hartmann.
References
1 Bilger, R.C. (1977): Psychoacoustic evaluation of present prostheses. Ann. Otol. Rhinol. Laryngol. 86 (Suppl. 38), 92-140.
2 Dillier, N., Spillmann, T. and Guentensperger, J. (1983): Computerized testing of signal encoding strategies with round window implants. In: Cochlear Prostheses: An International Symposium. Editors: C.W. Parkins and J.W. Anderson. Ann. N.Y. Acad. Sci. 405, 360-369.
3 Dillier, N. and Spillmann, T. (1984): Signal coding for single channel stimulation. In: Cochlear Implants, pp. 233-242. Editors: R.A. Schindler and M.M. Merzenich. Raven Press, New York.
4 Evans, E.F. (1978): Place and time coding of frequency in the peripheral auditory system: Some physiological pros and cons. Audiology 17, 369-420.
5 Godfrey, D.A., Kiang, N.Y.-S. and Norris, B.E. (1975): Single unit activity in the posteroventral cochlear nucleus of the cat. J. Comp. Neurol. 162, 247-268.
6 Goldberg, J.M. and Brown, P.B. (1969): Responses of binaural neurons of dog superior olivary complex to dichotic tonal stimuli: some physiological mechanisms of sound localization. J. Neurophysiol. 32, 613-636.
7 Goldstein, J.L. (1978): Mechanisms of signal analysis and pattern perception in periodicity pitch. Audiology 17, 421-445.
8 Hartmann, R., Topp, G. and Klinke, R. (1984): Electrical stimulation of the cat cochlea: discharge pattern of single auditory fibers. Adv. Audiol. 1, 18-29.
9 Hartmann, R., Topp, G. and Klinke, R. (1984): Discharge patterns of cat primary auditory fibers with electrical stimulation of the cochlea. Hearing Res. 13, 47-62.
10 Hochmair-Desoyer, I.J., Hochmair, E.S., Burian, K. and Fischer, R.E. (1981): Four years of experience with cochlear prostheses. Med. Progr. Technol. 8, 107-119.
11 Hochmair, E.S. and Hochmair-Desoyer, I.J. (1983): Percepts elicited by different speech-coding strategies. In: Cochlear Prostheses: An International Symposium. Editors: C.W. Parkins and J.W. Anderson. Ann. N.Y. Acad. Sci. 405, 268-279.
12 Hochmair-Desoyer, I.J., Hochmair, E.S., Burian, K. and Stiglbrunner, H.K. (1983): Percepts from the Vienna cochlear prosthesis. In: Cochlear Prostheses: An International Symposium. Editors: C.W. Parkins and J.W. Anderson. Ann. N.Y. Acad. Sci. 405, 295-306.
13 Hochmair-Desoyer, I.J., Hochmair, E.S. and Stiglbrunner, H.K. (1984): Psychoacoustic temporal processing and speech understanding in cochlear implant patients. In: Cochlear Implants, pp. 291-394. Editors: R.A. Schindler and M.M. Merzenich. Raven Press, New York.
14 Hochmair, E.S. and Hochmair-Desoyer, I.J.: Aspects of sound signal processing using the Vienna intra- and extracochlear implants. In: Cochlear Implants, pp. 101-110. Raven Press, New York.
15 House, W.F. and Berliner, K.I. (1982): Cochlear implants: Progress and perspectives. Ann. Otol. 91, Suppl. 91.
16 Johnson, D.H. (1980): The relationship between spike rate and synchrony in responses of auditory nerve fibers to single tones. J. Acoust. Soc. Am. 68, 1115-1122.
17 Kiang, N.Y.-S. and Moxon, E.C. (1972): Physiological considerations in artificial stimulation of the inner ear. Ann. Otol. Rhinol. Laryngol. 81, 714-730.
18 Langner, G. (1981): Neuronal mechanisms for pitch analysis in the time domain. Exp. Brain Res. 44, 450-454.
19 Levitt, H. (1980): Adaptive testing in audiology. Scand. Audiol. Suppl. 6, 241-291.
20 Licklider, J.C.R. (1959): Three auditory theories. In: Psychology: A Study of a Science, Vol. 1, pp. 41-144. Editor: S. Koch. McGraw-Hill, New York.
21 Loeb, G.E., White, M.W. and Merzenich, M.M. (1983): Spatial cross-correlation: a proposed mechanism for acoustic pitch perception. Biol. Cybern. 47, 149-163.
22 Miller, M.I. and Sachs, M.B. (1984): Representation of voice pitch in discharge patterns of auditory nerve fibers. Hearing Res. 14, 257-279.
23 Muller, C.G. (1983): Comparison of percepts found with cochlear implant devices. In: Cochlear Prostheses: An International Symposium. Editors: C.W. Parkins and J.W. Anderson. Ann. N.Y. Acad. Sci. 405, 412-420.
24 Rose, J.E., Brugge, J.F., Anderson, D.J. and Hind, J.E. (1967): Phase-locked response to low-frequency tones in single auditory nerve fibers of the squirrel monkey. J. Neurophysiol. 30, 769-793.
25 Sachs, M.B., Young, E.D. and Miller, M.I. (1983): Speech encoding in the auditory nerve: Implications for cochlear implants. In: Cochlear Prostheses: An International Symposium. Editors: C.W. Parkins and J.W. Anderson. Ann. N.Y. Acad. Sci. 405, 94-113.
26 Sachs, M.B. (1983): Round table discussion. In: Cochlear Prostheses: An International Symposium. Editors: C.W. Parkins and J.W. Anderson. Ann. N.Y. Acad. Sci. 405, 510-511.
27 Simmons, F.B. (1983): Perceptions from modiolar (eighth nerve) stimulation. In: Cochlear Prostheses: An International Symposium. Editors: C.W. Parkins and J.W. Anderson. Ann. N.Y. Acad. Sci. 405, 259-263.
28 Spillmann, T., Dillier, N. and Guentensperger, J. (1982): Electrical stimulation of hearing by implanted cochlear electrodes in humans. Appl. Neurophysiol. 45, 32-37.
29 Spillmann, T. and Dillier, N. (1984): Round window cochlear implant. In: Cochlear Implants, pp. 157-165. Editors: R.A. Schindler and M.M. Merzenich. Raven Press, New York.
30 Van den Honert, C. and Stypulkowski, P.H. (1984): Physiological properties of the electrically stimulated auditory nerve. II. Single fiber recordings. Hearing Res. 14, 225-243.
31 Wever, E.G. (1949): Theory of Hearing. Princeton University Press, Princeton, NJ.