Estimation of the level and phase of the simple distortion tone
in the modulation domain
Aleksander Sek
Institute of Acoustics, Adam Mickiewicz University, 85 Umultowska, 61-614 Poznan, Poland
Brian C. J. Moorea)
Department of Experimental Psychology, University of Cambridge, Downing Street,
Cambridge CB2 3EB, England
(Received 15 October 2003; revised 26 July 2004; accepted 27 July 2004)
These experiments were designed to test the idea that nonlinearities in the auditory system can
introduce a distortion component into the internal representation of the envelope of a sound, and to
estimate the phase of the hypothetical distortion component. In experiment 1, a two-alternative
forced-choice (2AFC) task with feedback was used to measure psychometric functions for detecting
5-Hz probe modulation of a 4-kHz sinusoidal carrier in the presence of a masker modulator with
components at 50 and 55 Hz (m = 0.3 for each component). Performance was measured as a
function of the relative phase, Δφ, of the probe relative to the "venelope" (envelope of the
envelope) of the masker. Performance was poorest for Δφ = 135°. In experiment 2, Δφ was fixed at
135°, m was set to 0.48 for each masker component, and psychometric functions for detecting probe
modulation were measured using a 2AFC task without feedback. For small probe modulation depths
(m ≈ 0.03), the detectability index, d′, was consistently negative, consistent with the existence of
a weak distortion product which can "cancel" the probe modulation. The distortion component for
the conditions of the experiment was estimated to have a phase of about −25° relative to the
venelope. © 2004 Acoustical Society of America. [DOI: 10.1121/1.1795331]
PACS numbers: 43.66.Dc, 43.66.Mk, 43.66.Nm, 43.66.Ba [NFV]
I. INTRODUCTION
Several recent models for the perception of amplitude
modulation (AM) in sounds are based on the idea that the
envelopes of the outputs of the (peripheral) auditory filters
are fed to a second array of overlapping bandpass filters
tuned to different envelope modulation rates (Kay, 1982;
Dau et al., 1997a; 1997b; Ewert and Dau, 2000; Ewert et al.,
2002; Verhey et al., 2003). This set of filters is usually called
a "modulation filter bank" (MFB). Psychoacoustical evidence consistent with the concept of an MFB has come from
experiments involving detection of "probe" modulation in
the presence of masker modulation; these experiments appear to show frequency selectivity in the modulation domain
(Bacon and Grantham, 1989; Houtgast, 1989).
Dau et al. (1997a) conducted an experiment to assess
whether modulation masking could be explained in terms of
the temporal similarity of the envelopes of the signal and
masker, rather than in terms of the MFB. They amplitude
modulated a 5-kHz sinusoidal carrier with a masker that consisted of the third to seventh harmonics of a 30-Hz fundamental frequency; the phases of the components were random. The task was to detect sinusoidal probe modulation in
the range 20 to 120 Hz. The amount of modulation masking
increased progressively as the probe frequency was increased
from 20 to about 100 Hz. There was no maximum in the
masking function at 30 Hz, even though the temporal envelope pattern of the masker and signal was similar at this
frequency. The results were consistent with the idea that the
auditory system performs a spectral analysis of the envelope.
However, Verhey et al. (2003) later suggested that the failure
of Dau et al. (1997a) to find a peak in the modulation masking pattern at the frequency corresponding to the "missing
fundamental" resulted from the modulation masker impairing detection of the signal modulation on some trials and
enhancing it on others, depending on the specific choice of
(random) masker component phases.
a) Electronic mail: bcjm@cam.ac.uk
J. Acoust. Soc. Am. 116 (5), November 2004
Pages: 3031–3037
Moore et al. (1999) examined modulation masking for
cases where the probe modulation was at a frequency remote
from any spectral frequency in the masker modulation, but
there was nevertheless a similarity between the temporal pattern of the masker modulation and the probe modulation.
This was achieved by using a two-component modulator.
The "beats" between these two components had a rate that
was equal to or close to the probe frequency. A similar
method had been used earlier by Sheft and Yost (1997) to
examine modulation detection interference (MDI). Moore
et al. found that the threshold for detecting 5-Hz probe
modulation was affected by the presence of a pair of masker
modulators beating at a 5-Hz rate (40 and 45 Hz, 50 and 55
Hz, or 60 and 65 Hz). The threshold was dependent on the
phase of the probe modulation relative to the beat cycle of
the masker modulators; the threshold elevation was greatest
when the peak amplitude of the probe modulation coincided
with a peak in the beat cycle. The maximum threshold elevation of the 5-Hz probe produced by the beating masker
modulators was 7–12 dB greater than that produced by the
individual components of the masker modulators. These results cannot be explained in terms of the spectra of the envelopes of the stimuli, as the beating masker modulators did
not produce a 5-Hz component in the spectra of the envelopes. Moore et al. (1999) proposed an explanation for their
results based on the idea that nonlinearities within the auditory system, such as basilar-membrane compression, introduce distortion in the internal representation of the envelopes
of the stimuli. This notion was initially suggested by Shofner
et al. (1996) on the basis of an electrophysiological study
using two-component modulators. In the case of two-component beating modulators, a weak component, corresponding to the simple difference component, would be introduced at the beat rate.
Verhey et al. (2003) conducted experiments similar to
those of Moore et al. (1999), but included conditions using
both two-component and three-component masker modulators. Following Ewert et al. (2002), they used the term
"venelope" to refer to the (ac-coupled) envelope of the envelope. Like Moore et al., Verhey et al. found that, for a
probe modulation frequency equal to the masker venelope
periodicity, the probe modulation depth at threshold varied
with the phase of the probe relative to the venelope. However, unlike Moore et al., Verhey et al. found that thresholds
were lower for the in-phase condition, where maxima in the
probe coincided with maxima in the venelope, than for the
antiphase condition. In comparable experiments, described
later, Füllgrabe et al. (2004) found large individual differences in the relative probe phase leading to the poorest detectability of the probe modulation.
Verhey et al. argued that basilar-membrane compression
could not explain their results, as it leads to the prediction of
a phase effect opposite to that found by them. They proposed
that the auditory system extracts the venelope prior to the
MFB, and in a separate pathway, as suggested earlier by
Ewert et al. (2002). The concept of venelope extraction may
be regarded as a functional way of creating an internal representation that contains both the envelope and the venelope.
Several researchers have also noted that venelope or "beat"
cues may be present at the outputs of modulation filters
tuned to the first-order or "carrier" rates (Ewert et al., 2002;
Füllgrabe and Lorenzi, 2003; Millman et al., 2003; Verhey
et al., 2003); such cues might be used for the detection of the
envelope beats (Millman et al., 2003).
The present paper is particularly concerned with two
issues. First, we wished to establish more clearly how the
detectability of probe modulation depends on the relative
phase of the probe and the venelope of a two-component
masker modulator beating at the same rate as the probe. Second, we wished to establish whether the effect produced by
the two-component masker modulator could be explained in
terms of the introduction by the auditory system of a distortion component at the venelope rate. In experiment 1 we
measured psychometric functions for the detection of probe
modulation for eight different relative phases of the probe
and venelope. We measured psychometric functions rather
than estimating thresholds using an adaptive procedure, since
it was not obvious that the psychometric functions would
always be monotonic (or that d′ would always be positive),
as explained below. Also, we wished to assess whether the
form of the psychometric functions could be explained using
the assumption of a distortion component. In experiment 2
we measured psychometric functions for the detection of
probe modulation in the presence of a two-component
masker modulator using the relative phase that had been
found in experiment 1 to lead to the poorest performance. A
two-alternative forced-choice task without feedback was
used. We anticipated that if a distortion component at the
venelope rate was present, and was out of phase with the
probe modulation, it might lead to negative d′ values for
some probe modulation depths, i.e., the probe would be consistently identified in the wrong interval. The results showed
that this was indeed the case.
II. EXPERIMENT 1: PSYCHOMETRIC FUNCTIONS FOR
DIFFERENT PROBE AND VENELOPE PHASES
A. Stimuli
The carrier was a 4-kHz sinusoid with a level of 70 dB
SPL. This relatively high carrier frequency was chosen so
that the spectral sidebands produced by the modulation
would not be resolved. The probe modulation frequency was
5 Hz. The masker modulator was composed of two sinusoids
with frequencies of 50 and 55 Hz. The modulation index, m,
for each masker modulator component was 0.3. Each modulator started in sine phase. The equation describing the
masker envelope, E(t), is
$$E(t) = 1 + 0.3 \sin(2\pi \cdot 55 t) + 0.3 \sin(2\pi \cdot 50 t), \tag{1}$$
where t is time. Although the individual modulator components had zero amplitude at time zero, the venelope had its
maximum value (0.6) at time zero. The venelope of the
masker modulator repeated at a 5-Hz rate, but there was no
5-Hz component in the modulation spectrum of the masker.
The phase of the probe modulation relative to the venelope is
defined in terms of Δφ, where Δφ is zero when the peak in
the amplitude of the probe modulation coincides with the
peak in the venelope. This meant that, for Δφ = 0, the signal
starting phase was advanced by 90° (π/2 radians) relative to
sine phase. Values of Δφ were 0, 45, 90, 135, 180, 225, 270,
and 315°.
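The properties of the masker envelope stated above can be checked numerically. The following is a minimal Python sketch, not the authors' stimulus-generation code: it evaluates Eq. (1) over one 1-s burst, uses a direct single-bin Fourier sum (the helper `fourier_amplitude` is a hypothetical name introduced here) to confirm that the envelope contains no 5-Hz component, and uses the identity sin a + sin b = 2 sin((a+b)/2) cos((a−b)/2) to express the venelope as 2m|cos(2π·2.5t)|.

```python
import math

FS = 50_000   # sampling rate in Hz (the 50-kHz converter rate quoted in the text)
DUR = 1.0     # burst duration in seconds
M = 0.3       # modulation index of each masker component

def masker_envelope(t, m=M):
    """Eq. (1): two sine-phase modulators at 55 and 50 Hz around a dc value of 1."""
    return 1.0 + m * math.sin(2 * math.pi * 55 * t) + m * math.sin(2 * math.pi * 50 * t)

def venelope(t, m=M):
    """Envelope of the two-component modulator, 2m|cos(2*pi*2.5*t)|."""
    return 2 * m * abs(math.cos(2 * math.pi * 2.5 * t))

def fourier_amplitude(signal, freq, fs=FS):
    """Amplitude of the sinusoidal component at `freq` (direct single-bin DFT)."""
    n = len(signal)
    re = sum(x * math.cos(2 * math.pi * freq * i / fs) for i, x in enumerate(signal))
    im = sum(x * math.sin(2 * math.pi * freq * i / fs) for i, x in enumerate(signal))
    return 2 * math.hypot(re, im) / n

times = [i / FS for i in range(int(FS * DUR))]
env = [masker_envelope(t) for t in times]

amp_5hz = fourier_amplitude(env, 5.0)    # expected to be numerically zero
amp_50hz = fourier_amplitude(env, 50.0)  # expected to equal m
ven_at_zero = venelope(0.0)              # venelope maximum, 2m
```

The sketch illustrates why any 5-Hz component in the internal representation must arise from a nonlinearity: the physical envelope has energy only at 50 and 55 Hz, although its venelope repeats every 200 ms.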
On each trial, the carrier was presented in two bursts
separated by a silent interval of 300 ms. Each burst had
20-ms raised-cosine rise and fall ramps, and an overall duration (including rise/fall times) of 1000 ms. The modulation
was applied during the whole of the carrier, and the starting
phase of the modulation is defined relative to the start of the
carrier.
Stimuli were generated using a Tucker-Davis Technologies array processor (TDT-AP2) in a host PC, and a 16-bit
digital-to-analog converter (TDT-DD1) operating at a 50-kHz
sampling rate. The stimuli were attenuated (TDT-PA4) and
sent through an output amplifier (TDT-HB6) to a Sennheiser
HD580 earphone. Only one ear was tested for each subject.
Subjects were seated in a double-walled sound-attenuating
chamber.
B. Procedure
Psychometric functions were measured using a two-interval forced-choice procedure. The masker modulation
A. Sek and B. C. J. Moore: Modulation distortion
was present in both intervals of a trial, and the probe modulation was presented in either the first or the second interval,
selected at random. The task of the subject was to indicate,
by pressing one of two buttons, the interval containing the
probe modulation. Feedback was provided by lights following each response. For each subject and each value of Δφ,
five different values were used for the modulation depth of
the probe, m_p. The values were chosen individually for each
subject and each value of Δφ on the basis of pilot trials, so as
to give values for the detectability index, d′, ranging from
just above zero to about 2–3. A run started with five trials
using the largest value of m_p. Then, in successive trials,
stimuli with each value of m_p were presented once, in descending order. This sequence was repeated ten times to give
a total of 55 trials per run. With this procedure, subjects
receive an easily detected stimulus once every five trials,
which helps them to "remember" what aspect of the stimulus they should be listening to. Without such a reminder,
subjects may "lose" the most effective detection cue, leading
to unduly poor performance for low probe modulation depths
(Taylor et al., 1983; Moore and Sek, 1992). Results from the
first five trials of each run were discarded. Each run was
repeated at least 20 times, so that at least 200 judgments
were obtained for each value of m_p.
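The trial bookkeeping described above can be sketched as follows. This is only an illustration of the run structure; the function name `build_run` and the example depths are hypothetical, since the actual depths were chosen individually for each subject and phase.

```python
def build_run(depths, warmup=5, cycles=10):
    """Return the sequence of probe modulation depths for one run.

    The run starts with `warmup` trials at the largest depth, followed by
    `cycles` repetitions of all depths in descending order.
    """
    ordered = sorted(depths, reverse=True)
    run = [ordered[0]] * warmup          # practice trials, discarded later
    for _ in range(cycles):
        run.extend(ordered)              # one descending sweep per cycle
    return run

# Hypothetical depths, for illustration only.
example_depths = [0.02, 0.04, 0.06, 0.08, 0.10]
run = build_run(example_depths)

scored = run[5:]                                        # drop the first five trials
per_depth = {d: scored.count(d) for d in example_depths}
judgments_after_20_runs = {d: 20 * n for d, n in per_depth.items()}
```

Each run contributes ten scored judgments per depth, so 20 runs give the minimum of 200 judgments per value of m_p quoted in the text.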
C. Subjects
Three subjects were tested. One was author AS. The
other two subjects were paid for their services. All subjects
had absolute thresholds less than 20 dB HL at all audiometric
frequencies and had no history of hearing disorders. All had
previous experience in psychoacoustic tasks, including tasks
similar to the one used here. They received extensive practice during the pilot trials used to determine appropriate values of m_p to be used in the main experiment.
D. Results
The percent-correct scores were converted to d′ values
using standard tables (Hacker and Ratcliff, 1979). The pattern of results was similar across subjects. Psychometric
functions for a representative subject (AW) are shown by the
open squares in Fig. 1; d′ is plotted as a function of
20 log(m_p) (the solid squares connected by dashed lines show
predictions which are explained later). Performance varied
markedly with Δφ. Poorest performance was found for Δφ
= 135° and 180°. Best performance was found for Δφ = 0°
and 315°. This pattern of results was found for all three
subjects and is similar to that found by Verhey et al. (2003),
but differs from that found by Moore et al. (1999). There
were some cases of negative d′ values in the results. However, none of the d′ values was significantly below 0, based
on confidence intervals for d′ calculated as described by
Miller (1996).
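For readers who wish to reproduce the conversion without tables, the standard equal-variance Gaussian relation for a two-interval forced-choice task, d′ = √2·z(P_c), can be used. The sketch below assumes this relation rather than the tabulated values of Hacker and Ratcliff (1979); the function name `dprime_2afc` is introduced here for illustration.

```python
import math
from statistics import NormalDist

def dprime_2afc(p_correct):
    """d' for a two-interval forced-choice task: sqrt(2) * z(proportion correct).

    p_correct must lie strictly between 0 and 1. Scores below 0.5 give
    negative d' values, i.e., the wrong interval is chosen more often than not.
    """
    return math.sqrt(2) * NormalDist().inv_cdf(p_correct)

chance = dprime_2afc(0.50)       # 0 at chance performance
near_threshold = dprime_2afc(0.76)   # close to d' = 1
below_chance = dprime_2afc(0.40)     # negative d'
```

The last case is the one of interest in this paper: a percent-correct score reliably below 50% maps to a negative d′.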
To estimate threshold values of m_p giving a d′ value of
1, the data were fitted with functions of the form
$$\log_{10}(d') = a + b \log_{10}(m_p), \tag{2}$$
where a and b are fitting constants. Since the d′ values were
sometimes negative for the two smallest values of m_p, the
fitting was done using only the data for the three largest
values of m_p. The resulting threshold estimates are shown in
Table I. The pattern of results is similar across subjects, all
three showing the highest thresholds for Δφ = 135°. A within-subjects analysis of variance on the threshold values with
factor phase showed a highly significant effect of phase:
F(7,14) = 15.598, p < 0.001. The threshold values are somewhat lower than those estimated by Verhey et al. (2003) in
their most similar condition (5-kHz carrier, masker modulators at 40 and 45 Hz), perhaps because the signal duration
used here was longer (1000 ms versus 600 ms).

FIG. 1. Open squares show results of experiment 1 for subject AW. The
detectability index, d′, for detecting 5-Hz probe modulation is plotted as a
function of 20 log(m_p), where m_p is the modulation depth of the probe. Each
panel shows results for one relative phase of the probe modulation relative
to the beat cycle (the venelope) of the masker modulator; the relative phase
is denoted Δφ. Filled squares connected by dashed lines show predictions
derived as described in the text.

TABLE I. Thresholds (d′ = 1), expressed as 20 log(m_p), estimated from the functions relating log(d′)
to log(m_p).

Phase, degrees    JL       AW       AS       Mean
0                 −28.4    −26.8    −26.0    −27.1
45                −26.6    −26.6    −23.0    −25.4
90                −23.7    −22.1    −19.5    −21.8
135               −18.3    −18.6    −18.4    −18.4
180               −19.0    −19.2    −21.6    −19.9
225               −20.6    −22.4    −22.4    −21.8
270               −24.1    −26.1    −26.5    −25.6
315               −26.4    −29.0    −27.3    −27.6

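The threshold estimation based on Eq. (2) can be sketched as an ordinary least-squares line fit in log-log coordinates, followed by solving for the depth at which log10(d′) = 0. The code below is a minimal illustration under that assumption; the helper names and the example data points are hypothetical and are not taken from the experiment.

```python
import math

def fit_loglog(m_p, d_prime):
    """Least-squares fit of log10(d') = a + b*log10(m_p); returns (a, b).

    All d' values must be positive, which is why only the largest probe
    depths can be used when d' is negative at small depths.
    """
    xs = [math.log10(m) for m in m_p]
    ys = [math.log10(d) for d in d_prime]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    b = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
    a = mean_y - b * mean_x
    return a, b

def threshold_db(a, b):
    """Threshold as 20*log10(m_p) at d' = 1, i.e., where a + b*log10(m_p) = 0."""
    return 20.0 * (-a / b)

# Hypothetical data for the three largest probe depths (illustration only).
example_m_p = [0.05, 0.08, 0.12]
example_d = [0.6, 1.4, 2.9]
a, b = fit_loglog(example_m_p, example_d)
example_threshold = threshold_db(a, b)
```

Expressing the result as 20 log(m_p) puts it on the same scale as the entries of Table I.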
Two methods were used to estimate the value of Δφ
giving poorest performance. Both methods are based on the
assumption that the function relating the threshold to the
value of Δφ is symmetrical about the value of Δφ giving the
highest threshold. In the first method, the mean thresholds
were fitted with a function of the following form:
$$\mathrm{slope} = \mathrm{max} - A(\Delta\varphi - \varphi_{\mathrm{offset}})^2, \tag{3}$$
where max is the maximum value of the function, A is a
constant, and φ_offset is the value of Δφ at the maximum of the
function. The best-fitting value of φ_offset was 155°. In the
second method, the data were fitted with a single cycle of a
sine function, where the amplitude, the dc-offset, and the
phase were free parameters. The best-fitting function had a
maximum for Δφ = 155°. The two methods are consistent in
indicating that the poorest performance was obtained for Δφ
= 155°.
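The second method can be illustrated with the mean thresholds of Table I. Because the eight phases are equally spaced over a full cycle, the least-squares fit of a dc offset plus a single sinusoidal cycle reduces to the first Fourier coefficient of the data. The sketch below relies on that property; it is not necessarily the fitting routine used by the authors, and the function name is introduced here for illustration.

```python
import math

phases_deg = [0, 45, 90, 135, 180, 225, 270, 315]
# Mean thresholds, 20 log(m_p), from Table I.
mean_thresholds = [-27.1, -25.4, -21.8, -18.4, -19.9, -21.8, -25.6, -27.6]

def fit_single_cycle(phases, values):
    """Fit values ~ dc + amp*cos(phase - peak) for equally spaced phases.

    Returns (dc, amp, peak_deg), where peak_deg is the phase at which the
    fitted function is maximal.
    """
    n = len(values)
    dc = sum(values) / n
    c = 2.0 / n * sum(v * math.cos(math.radians(p)) for p, v in zip(phases, values))
    s = 2.0 / n * sum(v * math.sin(math.radians(p)) for p, v in zip(phases, values))
    amp = math.hypot(c, s)
    peak_deg = math.degrees(math.atan2(s, c)) % 360.0
    return dc, amp, peak_deg

dc, amp, peak_deg = fit_single_cycle(phases_deg, mean_thresholds)
```

With these data the fitted maximum falls close to the 155° reported in the text.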
The results are consistent with the idea that a nonlinearity in the auditory system introduced a weak distortion component into the internal representation of the envelope with a
phase of −25° relative to the venelope. We denote the effective modulation depth of the hypothetical distortion component by m_d. The pattern of results can be understood in the
following way. The probe modulation was probably detected
as a change in the depth of 5-Hz modulation; for comparable
effects of phase using noise carriers, see Bacon and
Grantham (1989) and Strickland and Viemeister (1996). Regardless of the value of Δφ, subjects had to distinguish the
5-Hz modulation of depth m_d in the nonsignal interval from
the 5-Hz modulation in the signal interval resulting from the
vector sum of the distortion component and the probe modulation, which is denoted m_sum. For some values of Δφ (135°
and 180°), the distortion component and probe modulation
tend to cancel, leading to a small value of m_sum and to poor
performance. For other values of Δφ (0° and 315°), the distortion component and probe modulation are almost in phase,
leading to a large value of m_sum and to good performance.
However, the value of d′ should be monotonically related to
m_sum − m_d (the difference in modulation depth in the two
intervals), and the relationship should be the same for all
values of Δφ.
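The vector-sum argument can be made concrete by treating the probe and the distortion component as phasors at 5 Hz. The sketch below assumes a distortion phase of −25° relative to the venelope and, purely for illustration, a distortion depth of about −29 dB; the function names are hypothetical.

```python
import cmath
import math

PHI_D_DEG = -25.0   # assumed phase of the distortion component re the venelope

def m_sum(m_p, delta_phi_deg, m_d, phi_d_deg=PHI_D_DEG):
    """Magnitude of the vector sum of probe and distortion modulation."""
    probe = cmath.rect(m_p, math.radians(delta_phi_deg))
    distortion = cmath.rect(m_d, math.radians(phi_d_deg))
    return abs(probe + distortion)

def depth_difference(m_p, delta_phi_deg, m_d):
    """m_sum - m_d: signal-interval depth minus nonsignal-interval depth."""
    return m_sum(m_p, delta_phi_deg, m_d) - m_d

m_d = 10 ** (-29.0 / 20)   # illustrative value, roughly 0.035

in_phase = depth_difference(0.01, 0.0, m_d)       # positive: probe adds to distortion
near_cancel = depth_difference(0.01, 135.0, m_d)  # negative: probe reduces the depth
full_cancel = depth_difference(m_d, 155.0, m_d)   # probe exactly opposes distortion
```

A negative value of m_sum − m_d means that the 5-Hz modulation is shallower in the signal interval than in the nonsignal interval, which is the condition under which a listener choosing the interval with the greater modulation depth would pick the wrong one.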
To test this prediction, for each subject a starting value
was assumed for m_d. Assuming that the distortion component had a phase relative to the venelope of −25°, the value
of m_sum was calculated for each value of m_p. The correlation
of the d′ values with the values of m_sum − m_d was then determined, and the value of m_d was systematically varied to
determine the value giving the highest correlation. The resulting values of m_d, expressed as 20 log(m_d), were −29.1,
−28.3, and −30.2 for JL, AW, and AS, respectively, and the
corresponding correlations were 0.97, 0.97, and 0.94. These
values for 20 log(m_d) suggest that the effective magnitude of
the hypothetical distortion component is very low, corresponding to a barely detectable amount of modulation for a
sinusoidal carrier (Zwicker, 1952; Sek and Moore, 1994;
Dau et al., 1997a; Kohlrausch et al., 2000; Moore and Glasberg, 2001). Increasing the assumed value of m_d by, for example, 6 dB resulted in substantial decreases in the correlation of the d′ values with the values of m_sum − m_d.
Averaging across subjects, the correlation decreased from
0.96 to 0.75. Decreasing the assumed value of m_d by 6 dB
resulted in somewhat smaller decreases in the correlation, to
a mean value of 0.91. Thus, the magnitude of the distortion
component is unlikely to be much bigger than estimated
above, but it could be somewhat smaller.

FIG. 2. Scatter plots of the values of d′ against the values of m_sum − m_d (see
the text), denoted here "difference in effective modulation depth." Each
panel shows results for one subject. Each symbol shows results for one
value of Δφ, as indicated in the key. Linear regression lines are also shown.

Figure 2 shows scatter plots of the values of d′ against
the values of m_sum − m_d. It is clear that the data for the
different values of Δφ all lie along the same function for
each subject and that d′ is almost linearly related to m_sum
− m_d. The scatter plots in Fig. 2 were fitted with linear
regression lines, which are shown in the figure, and these
lines were used to generate predicted values of d′ for each
value of Δφ and m_p. The predictions for AW are shown as
filled squares and dashed lines in Fig. 1. There is no evidence
for any systematic discrepancy between the predicted and
obtained values. This was also true for the results of the other
subjects.

FIG. 3. The squares and circles show results of experiment 2, in which no
feedback was given and each component of the masker had a modulation
depth of 0.48. The triangles reproduce results from experiment 1 in which
feedback was given and each component of the masker had a modulation
depth of 0.3. The value of Δφ was fixed at 135°. Each panel shows results
for one subject.

In summary, the results are consistent with the idea that
the masking of the 5-Hz probe modulation by the two-component masker modulator was caused by a low-level
5-Hz distortion component in the internal representation of
the masker envelope. This distortion component appears to
have a phase of about −25° relative to the venelope of the
masker.
III. EXPERIMENT 2: PSYCHOMETRIC FUNCTIONS
DETERMINED WITHOUT FEEDBACK USING
VERY LOW PROBE MODULATION DEPTHS
A. Rationale
In experiment 2, we sought further evidence for the hypothetical envelope distortion product.
The experiment was similar to experiment 1, but was
modified in the following ways.
(1) Only a single value of Δφ was used, namely 135°. This
was the value that led to the poorest performance in experiment 1. For this value of Δφ, the envelope distortion
product should have been almost opposite in phase to the
probe modulation.
(2) No feedback was given. This meant that subjects could
not use the feedback to modify their strategy, and made
it more likely that they would always pick the interval in
which the modulation depth sounded greater.
(3) The modulation depth of the two-component masker
modulator was increased. The value of m for each component of the masker was set to 0.48. This was done
since it seemed likely that the magnitude of the hypothetical distortion product would increase with increasing modulation depth of the masker (Moore and Sek,
2000; Sek and Moore, 2003). One possible problem here
is that the phase of the hypothetical distortion product
might change with the modulation depth of the masker.
Thus, the value of Δφ of 135° might not be optimal for
producing cancellation of the probe modulation. The
maximum amplitude of the masker modulator, given that
the two modulator components started in sine phase, was
0.9573. To avoid overmodulation, the modulation depth
of the probe was not allowed to exceed 0.0447. Given
that Δφ was 135°, the maximum amplitude of the
masker and probe modulators combined was 0.9839.
(4) The probe modulation depths were chosen to be small,
so that they were likely to be comparable with the modulation depth of the hypothetical distortion product, as
estimated in experiment 1. This was done to increase the
likelihood of finding cancellation effects.
(5) Several closely spaced values of m_p were used, to avoid
the possibility of missing the range of values of m_p over
which d′ was negative.
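The peak amplitude of the two-component masker modulator quoted in item (3) can be checked numerically. The sketch below searches one 200-ms venelope period on a fine time grid; it covers only the masker modulator, since the exact probe phase convention needed to reproduce the combined masker-plus-probe figure is not spelled out here. The function names and the grid density are choices made for this illustration.

```python
import math

def masker_modulator(t, m):
    """Sum of the two sine-phase masker components (ac part of the envelope)."""
    return m * (math.sin(2 * math.pi * 55 * t) + math.sin(2 * math.pi * 50 * t))

def peak_amplitude(m, fs=200_000, period=0.2):
    """Largest absolute value of the masker modulator over one venelope period."""
    n = int(fs * period)
    return max(abs(masker_modulator(i / fs, m)) for i in range(n + 1))

peak_exp2 = peak_amplitude(0.48)   # experiment 2 masker depth
peak_exp1 = peak_amplitude(0.30)   # experiment 1 masker depth
```

The peak is slightly below 2m because the 52.5-Hz fine structure never reaches its own maximum exactly at a maximum of the venelope.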
B. Method
The subjects were the same as for experiment 1. The
stimuli and method were also almost the same as for experiment 1, except that no feedback was provided. Two sets of
runs were conducted, covering different ranges of the probe
modulation depth, m_p. In one set, the values were 0.005,
0.007, 0.01, 0.014, and 0.02. In a second set, the values were
0.02, 0.028, 0.035, 0.04, and 0.045. Each run was repeated at
least 40 times, so that at least 400 judgments were obtained
for each value of m_p.
C. Results
The results for each subject are shown in Fig. 3. The
results for the two sets of values of m_p used in experiment 2
are shown by squares and circles. The triangles show results
from experiment 1 for Δφ = 135°; note, however, that the
masker modulation depth was greater in experiment 2 than in
experiment 1. The results for small values of m_p show that
there is a range over which d′ is consistently negative. The
minimum value of d′ occurs for 20 log(m_p) ≈ −30, although
the value varies across subjects from −34 to −27. At the
minimum, the value of d′ is about −0.43. Confidence intervals for d′ were calculated as described by Miller (1996).
For 400 forced-choice trials and for d′ ≈ 0.4, the 95% confidence interval is about ±0.2. Thus, d′ values in the
vicinity of the minimum differ significantly from zero (p
< 0.05).
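The order of magnitude of the quoted confidence interval can be checked with a simple delta-method approximation, which propagates the binomial standard error of the proportion correct through the relation d′ = √2·z(P_c). This is an assumption made for illustration and is not claimed to be the procedure of Miller (1996); the function name is hypothetical.

```python
import math
from statistics import NormalDist

def dprime_ci_halfwidth(d_prime, n_trials, confidence=0.95):
    """Approximate half-width of the confidence interval for a 2AFC d'.

    Uses d' = sqrt(2)*z(p), the binomial standard error of p, and a
    first-order (delta-method) propagation through the inverse normal.
    """
    nd = NormalDist()
    z = d_prime / math.sqrt(2)
    p = nd.cdf(z)                                  # proportion correct implied by d'
    se_p = math.sqrt(p * (1 - p) / n_trials)       # binomial standard error
    se_d = math.sqrt(2) * se_p / nd.pdf(z)         # dz/dp = 1 / pdf(z)
    return nd.inv_cdf(0.5 + confidence / 2) * se_d

halfwidth = dprime_ci_halfwidth(0.4, 400)
```

Under these assumptions the half-width for 400 trials comes out a little below 0.2, consistent with the value given in the text, so a d′ of about −0.43 lies outside an interval centered on zero.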
The results support the idea that the auditory system
generates a weak envelope distortion component at the venelope rate (5 Hz). When the probe modulation depth is comparable to the effective modulation depth of the distortion
component, and when Δφ = 135°, the probe and distortion
component modulation nearly cancel, leading subjects to select the "wrong" interval as containing the probe modulation. It seems reasonable to assume that the probe and distortion component are nearly equal in effective modulation
depth when d′ is at its most negative value. As noted above,
the minimum value of d′ occurred for 20 log(m_p) ≈ −30.
Thus, the results suggest that the distortion component in the
internal representation of the envelope has a magnitude approximately equal to that produced by an input modulation
depth of 0.032.
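The conversion between a level expressed as 20 log(m) and the modulation depth itself is used repeatedly in this paper; a minimal pair of helper functions (hypothetical names) is sketched below.

```python
import math

def depth_from_db(level_db):
    """Modulation depth m corresponding to a level of 20*log10(m) = level_db."""
    return 10 ** (level_db / 20.0)

def db_from_depth(m):
    """Level 20*log10(m) corresponding to a modulation depth m."""
    return 20.0 * math.log10(m)

depth_at_minimum = depth_from_db(-30.0)    # depth at the most negative d'
max_probe_level = db_from_depth(0.045)     # largest probe depth used in experiment 2
```

The first value rounds to the modulation depth of 0.032 quoted above.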
The estimated values of m_d for individual subjects are
similar to those estimated from the data of experiment 1, and
are in the same rank order; the value is highest for AW and
lowest for AS. It is curious that the estimated values were not
higher for experiment 2 than for experiment 1, as the masker
modulation depth was greater in experiment 2. Possibly, the
relative phase of the distortion component varies with the
masker modulation depth, and the value of Δφ chosen for
experiment 2 was not optimal for producing cancellation of
the distortion component and probe modulation.
The effective level of the distortion component estimated from experiments 1 and 2 is comparable to that estimated by Moore et al. (1999). For a two-component modulator with m = 0.3 for each component, the distortion product
was estimated to have an effective modulation index of 0.027
(20 log m = −33.7). In their model, Ewert et al. (2002) and
Verhey et al. (2003) assumed that the effective magnitude of
the venelope component was scaled by a factor of 0.3 relative to the envelope. For example, for a two-component
modulator with m = 0.3 for each component, the venelope
amplitude fluctuates between 0 and 0.6, so the scaled venelope would have a peak-to-valley ratio of 0.2. Because
the two-component modulator does not produce a sinusoidal venelope, the venelope component at the difference
frequency would have a value of m of about 0.076
(20 log m = −22.4). This is somewhat larger than estimated
from experiments 1 and 2. The difference across studies may
simply reflect individual differences.
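The value of about 0.076 can be reproduced by taking the 5-Hz Fourier component of the scaled venelope. The sketch below assumes that the venelope of the two-component modulator is 2m|cos(2π·2.5t)| and that the scaling factor of 0.3 is applied directly to it; the function name and the number of integration points are choices made for this illustration. Analytically, the first harmonic of |cos x| has amplitude 4/(3π), giving 0.3 × 0.6 × 4/(3π).

```python
import math

def scaled_venelope_component(m=0.3, scale=0.3, n=100_000):
    """Amplitude of the 5-Hz cosine component of the scaled venelope.

    The venelope 2m|cos(2*pi*2.5*t)| is integrated against cos(2*pi*5*t)
    over one 200-ms period using a simple Riemann sum.
    """
    period = 0.2
    acc = 0.0
    for i in range(n):
        t = period * i / n
        ven = 2 * m * abs(math.cos(2 * math.pi * 2.5 * t))
        acc += ven * math.cos(2 * math.pi * 5 * t)
    return scale * 2 * acc / n

component = scaled_venelope_component()
analytic = 0.3 * 0.6 * 4 / (3 * math.pi)
component_db = 20 * math.log10(component)
```

The positive sign of the component indicates that, in this scheme, the 5-Hz term is in phase with the venelope.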
IV. DISCUSSION
There are various ways in which nonlinearities in the
auditory system might introduce a distortion component at
the venelope rate into the internal representation of the envelope. Basilar-membrane compression is probably not involved, since that nonlinearity would introduce a distortion
component that was 180° out of phase with the venelope. In
any case, the detection of AM of a sinusoidal carrier probably depends strongly on the use of information from the
high-frequency side of the excitation pattern evoked by the
carrier (Zwicker, 1956; Moore and Sek, 1994; Kohlrausch
et al., 2000). This part of the excitation pattern appears to be
processed almost linearly on the basilar membrane, at least
for medium to high frequencies (Rhode and Robles, 1974;
Sellick et al., 1982), so a distortion component at the venelope rate would not be introduced. Basilar-membrane nonlinearity might play a greater role if subjects were forced to
attend to the outputs of auditory filters tuned close to the
carrier frequency (which is not the case for most previous
studies, or the current one).
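The statement that a compressive nonlinearity yields a distortion component 180° out of phase with the venelope can be illustrated with an instantaneous power-law compression of the envelope. The exponent of 0.3 used below is an arbitrary illustrative value, not a parameter taken from this study, and the function name is hypothetical. Expanding (1 + x)^p to second order gives a 5-Hz term proportional to p(p − 1)/2, which is negative for p < 1.

```python
import cmath
import math

def five_hz_component(exponent, m=0.3, n=20_000):
    """Complex amplitude A*exp(j*phi) of the 5-Hz term A*cos(2*pi*5*t + phi).

    The envelope 1 + m*sin(2*pi*55*t) + m*sin(2*pi*50*t) is raised to
    `exponent` and analyzed over one 200-ms period. A phase of 0 means the
    component peaks where the venelope peaks (t = 0).
    """
    period = 0.2
    acc = 0j
    for i in range(n):
        t = period * i / n
        env = 1 + m * math.sin(2 * math.pi * 55 * t) + m * math.sin(2 * math.pi * 50 * t)
        acc += (env ** exponent) * cmath.exp(-2j * math.pi * 5 * t)
    return 2 * acc / n

compressed = five_hz_component(0.3)   # compressive power law
linear = five_hz_component(1.0)       # no nonlinearity, no 5-Hz component
phase_deg = abs(math.degrees(cmath.phase(compressed)))
```

For the compressive case the component is a small negative cosine, i.e., it is in antiphase with the venelope, which is the prediction that the observed phase of about −25° contradicts.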
In experiments similar to the present ones, Füllgrabe
et al. (2004) measured the detectability of 5-Hz probe modulation of a 5-kHz carrier in the presence of a "second-order"
modulator (Lorenzi et al., 2001a; 2001b), as a function of the
relative phase, Δφ, between the probe modulation and the
venelope of the second-order modulation. They included
conditions with a notched noise centered at 5 kHz, which
was intended to restrict off-frequency listening. In the absence of the notched noise, the value of Δφ leading to poorest detectability of 5-Hz second-order modulation varied
with the first-order modulation rate, suggesting that at least
one component of the nonlinearity that generates the envelope distortion product is time varying; an instantaneous
nonlinearity would lead to an envelope distortion component
with a relative phase that was independent of the first-order
rate. In the presence of the notched noise, the value of Δφ
giving poorest detectability hardly varied with first-order
modulation rate, but it did vary across subjects, from about
45° to 135°. These results suggest that more than one mechanism may contribute to the nonlinearity.
Possible nonlinearities contributing to distortion in the
internal representation of the envelope occur in peripheral
transduction processes (Yates, 1990), and peripheral adaptation effects (Smith, 1977). The model of Dau and co-workers
(Dau et al., 1997a; 1997b) incorporates "adaptation loops"
to simulate adaptation processes, which introduce strong
nonlinearity, but only for very low modulation rates (below
about 2 Hz). Verhey et al. (2003) considered several models
for generating a distortion component at the venelope rate.
These included a "threshold" model, which effectively produced half-wave rectification of the ac-coupled envelope,
and a model in which the venelope was explicitly extracted.
They concluded that the venelope model gave the best fit to
their data. However, for this model, the best performance is
predicted to occur when Δφ is exactly equal to 0° and the
worst performance is predicted when Δφ is exactly 180°. In
our experiment 1, performance was poorest for Δφ = 135° and
180°, and the form of the data suggested a threshold maximum centered at about 155°. In the study of Füllgrabe et al.
(2004), described above, the value of Δφ leading to poorest
performance was often below 180° when no notched noise
was used, and was consistently below 180° when a notched noise was used. Thus, it seems clear that the envelope distortion component is not always exactly in phase with the
venelope, and that the distortion component phase may vary
from one subject to another. This casts some doubt upon the
idea that the envelope distortion component results from explicit extraction of the venelope at some stage in the auditory
system.
V. CONCLUSIONS
The following conclusions can be drawn from this study.
(1) Experiment 1 showed that, in the presence of a pair of
masker modulators beating at a 5-Hz rate (50 and 55
Hz), the detectability of 5-Hz probe modulation was dependent on the phase of the probe modulation relative to
the beat cycle of the masker modulators. The relative
phase, Δφ, is defined as zero when the peak amplitude of
the probe modulation coincides with a peak in the beat
cycle (the peak in the venelope of the masker). The best
performance occurred when Δφ was 0° or 315°. The
poorest performance occurred when Δφ was 135° or
180°.
(2) The pattern of the results for experiment 1 could be fitted
well based on the assumption that the auditory system
introduced a weak distortion component in the modulation spectrum at a 5-Hz rate. Performance appears to be
based on the difference between the modulation depth of
the distortion component (in the nonsignal interval) and
of the vector sum of the distortion component and the
probe modulation (in the signal interval). The value of
d′ is linearly related to this difference.
(3) Experiment 2 used a fixed value of Δφ of 135°; this was
the value that led to the poorest performance in experiment 1. In contrast to experiment 1, no feedback was
given. The results showed that d′ values were consistently negative over a range of probe modulation depths,
m_p; in other words, over this range of m_p subjects consistently identified the probe modulation as being in the
wrong interval of the two-alternative forced-choice task.
The value of m_p leading to the most negative value of d′
was about 0.032 [20 log(m_p) = −30].
(4) The results are consistent with the idea that nonlinearities within the auditory system can introduce a weak
distortion component in the internal representation of the
envelopes of the stimuli, although a compressive nonlinearity does not account for the results. In the case of
two-component beating modulators, a weak component
is introduced at the beat rate. Even for large modulation
depths of the two-component modulator, the effective
modulation depth of the distortion component appears to
be only about 0.03.
(5) For the conditions of our experiment, the envelope distortion component appears to have a phase between 0°
and −45° relative to the venelope; the best estimate of
the relative phase was −25°. However, the relative phase
may vary across conditions (e.g., with the frequencies of
the masker modulator components) and across subjects.
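The second and third conclusions can be tied together with a short worked calculation. The sketch below assumes the vector-sum decision variable described above, a distortion depth of 0.03, and a distortion phase of −25°; all function names are hypothetical. The closed-form expression for the probe depth giving the most negative difference follows from minimizing the resultant magnitude and is offered as an illustration, not as the procedure used to analyze the data.

```python
import math


def depth_from_db(level_db):
    """Convert a modulation depth expressed as 20*log10(m) back to m."""
    return 10.0 ** (level_db / 20.0)


def resultant_minus_distortion(m_p, m_d, delta_phi_deg, theta_deg):
    """|distortion + probe| - |distortion| for depths m_d, m_p and phases theta, delta_phi."""
    c = math.cos(math.radians(delta_phi_deg - theta_deg))
    return math.sqrt(m_d ** 2 + m_p ** 2 + 2.0 * m_d * m_p * c) - m_d


def most_negative_probe_depth(m_d, delta_phi_deg, theta_deg):
    """Probe depth that minimizes the difference (zero if no cancellation is possible)."""
    c = math.cos(math.radians(delta_phi_deg - theta_deg))
    return max(0.0, -m_d * c)


if __name__ == "__main__":
    m_d, delta_phi, theta = 0.03, 135.0, -25.0   # assumed values taken from the text
    m_star = most_negative_probe_depth(m_d, delta_phi, theta)
    print(depth_from_db(-30.0))                  # depth corresponding to 20 log(m_p) = -30
    print(m_star, resultant_minus_distortion(m_star, m_d, delta_phi, theta))
```

With these assumed values the difference is negative for probe depths up to twice the minimizing depth, which is the qualitative pattern of negative d′ described in the third conclusion. The minimizing depth from this sketch (about 0.028) is close to, but not identical with, the value of about 0.032 observed in experiment 2.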
ACKNOWLEDGMENTS
This work was supported by the Wellcome Trust and the
Medical Research Council (UK). We thank Neal Viemeister,
Torsten Dau, Christian Lorenzi, and one anonymous reviewer for helpful comments on an earlier version of this
paper.
Bacon, S. P., and Grantham, D. W. (1989). "Modulation masking: Effects of modulation frequency, depth, and phase," J. Acoust. Soc. Am. 85, 2575–2580.
Dau, T., Kollmeier, B., and Kohlrausch, A. (1997a). "Modeling auditory processing of amplitude modulation. I. Detection and masking with narrow-band carriers," J. Acoust. Soc. Am. 102, 2892–2905.
Dau, T., Kollmeier, B., and Kohlrausch, A. (1997b). "Modeling auditory processing of amplitude modulation. II. Spectral and temporal integration," J. Acoust. Soc. Am. 102, 2906–2919.
Ewert, S. D., and Dau, T. (2000). "Characterizing frequency selectivity for envelope fluctuations," J. Acoust. Soc. Am. 108, 1181–1196.
Ewert, S. D., Verhey, J. L., and Dau, T. (2002). "Spectro-temporal processing in the envelope-frequency domain," J. Acoust. Soc. Am. 112, 2921–2931.
Füllgrabe, C., and Lorenzi, C. (2003). "The role of envelope beat cues in the detection and discrimination of second-order amplitude modulation," J. Acoust. Soc. Am. 113, 49–52.
J. Acoust. Soc. Am., Vol. 116, No. 5, November 2004
Füllgrabe, C., Moore, B. C. J., Demany, L., Ewert, S., Sheft, S., and Lorenzi, C. (2004). "Modulation masking produced by 2nd-order modulators," J. Acoust. Soc. Am. (submitted).
Hacker, M. J., and Ratcliff, R. (1979). "A revised table of d′ for M-alternative forced choice," Percept. Psychophys. 26, 168–170.
Houtgast, T. (1989). "Frequency selectivity in amplitude-modulation detection," J. Acoust. Soc. Am. 85, 1676–1680.
Kay, R. H. (1982). "Hearing of modulation in sounds," Physiol. Rev. 62, 894–975.
Kohlrausch, A., Fassel, R., and Dau, T. (2000). "The influence of carrier level and frequency on modulation and beat-detection thresholds for sinusoidal carriers," J. Acoust. Soc. Am. 108, 723–734.
Lorenzi, C., Soares, C., and Vonner, T. (2001a). "Second-order temporal modulation transfer functions," J. Acoust. Soc. Am. 110, 1030–1038.
Lorenzi, C., Simpson, M. I., Millman, R. E., Griffiths, T. D., Woods, W. P., Rees, A. et al. (2001b). "Second-order modulation detection thresholds for pure-tone and narrow-band noise carriers," J. Acoust. Soc. Am. 110, 2470–2478.
Miller, J. (1996). "The sampling distribution of d′," Percept. Psychophys. 58, 65–72.
Millman, R. E., Green, G. G., Lorenzi, C., and Rees, A. (2003). "Effect of a noise modulation masker on the detection of second-order amplitude modulation," Hear. Res. 178, 1–11.
Moore, B. C. J., and Glasberg, B. R. (2001). "Temporal modulation transfer functions obtained using sinusoidal carriers with normally hearing and hearing-impaired listeners," J. Acoust. Soc. Am. 110, 1067–1073.
Moore, B. C. J., and Sek, A. (1992). "Detection of combined frequency and amplitude modulation," J. Acoust. Soc. Am. 92, 3119–3131.
Moore, B. C. J., and Sek, A. (1994). "Effects of carrier frequency and background noise on the detection of mixed modulation," J. Acoust. Soc. Am. 96, 741–751.
Moore, B. C. J., and Sek, A. (2000). "Effects of relative phase and frequency spacing on the detection of three-component amplitude modulation," J. Acoust. Soc. Am. 108, 2337–2344.
Moore, B. C. J., Sek, A., and Glasberg, B. R. (1999). "Modulation masking produced by beating modulators," J. Acoust. Soc. Am. 106, 908–918.
Rhode, W. S., and Robles, L. (1974). "Evidence from Mössbauer experiments for nonlinear vibration in the cochlea," J. Acoust. Soc. Am. 55, 588–596.
Sek, A., and Moore, B. C. J. (1994). "The critical modulation frequency and its relationship to auditory filtering at low frequencies," J. Acoust. Soc. Am. 95, 2606–2615.
Sek, A., and Moore, B. C. J. (2003). "Testing the concept of a modulation filter bank: The audibility of component modulation and detection of phase change in three-component modulators," J. Acoust. Soc. Am. 113, 2801–2811.
Sellick, P. M., Patuzzi, R., and Johnstone, B. M. (1982). "Measurement of basilar membrane motion in the guinea pig using the Mössbauer technique," J. Acoust. Soc. Am. 72, 131–141.
Sheft, S., and Yost, W. A. (1997). "Modulation detection interference with two-component masker modulators," J. Acoust. Soc. Am. 102, 1106–1112.
Shofner, S., Sheft, S., and Guzman, S. J. (1996). "Responses of ventral cochlear nucleus units in the chinchilla to amplitude modulation by low-frequency, two-tone complexes," J. Acoust. Soc. Am. 99, 3592–3605.
Smith, R. L. (1977). "Short-term adaptation in single auditory-nerve fibers: Some poststimulatory effects," J. Neurophysiol. 49, 1098–1112.
Strickland, E. A., and Viemeister, N. F. (1996). "Cues for discrimination of envelopes," J. Acoust. Soc. Am. 99, 3638–3646.
Taylor, M. M., Forbes, S. M., and Creelman, C. D. (1983). "PEST reduces bias in forced-choice psychophysics," J. Acoust. Soc. Am. 74, 1367–1374.
Verhey, J. L., Ewert, S., and Dau, T. (2003). "Modulation masking produced by complex tone modulators," J. Acoust. Soc. Am. 114, 2135–2146.
Yates, G. K. (1990). "Basilar membrane nonlinearity and its influence on auditory nerve rate-intensity functions," Hear. Res. 50, 145–162.
Zwicker, E. (1952). "Die Grenzen der Hörbarkeit der Amplitudenmodulation und der Frequenzmodulation eines Tones (The limits of audibility of amplitude modulation and frequency modulation of a pure tone)," Acustica 2, 125–133.
Zwicker, E. (1956). "Die elementaren Grundlagen zur Bestimmung der Informationskapazität des Gehörs (The foundations for determining the information capacity of the auditory system)," Acustica 6, 356–381.
A. Sek and B. C. J. Moore: Modulation distortion