
Adaptation to Auditory Localization Cues
from an Enlarged Head
by
Salim Kassem
B.S., Electrical Engineering (1996)
Pontificia Universidad Javeriana
Submitted to the Department of Electrical Engineering and Computer Science
in Partial Fulfillment of the Requirements for the Degree of
Master of Science in Electrical Engineering and Computer Science
at the
Massachusetts Institute of Technology
June 1998
© 1998 Massachusetts Institute of Technology
All rights reserved
Signature of Author
Department of Electrical Engineering and Computer Science
May 20, 1998
Certified by
Nathaniel I. Durlach
Senior Research Scientist of Electrical Engineering and Computer Science
Thesis Supervisor
Accepted by
Arthur C. Smith
Chairman, Department Committee on Graduate Students
Adaptation to Auditory Localization Cues
from an Enlarged Head
by
Salim Kassem
Submitted to the Department of Electrical Engineering and Computer Science
on May 20, 1998 in Partial Fulfillment of the Requirements for the Degree of
Master of Science in Electrical Engineering and Computer Science
ABSTRACT
Auditory localization cues for a double-size head were simulated using an
auditory virtual environment where the acoustic cues were presented to subjects through
headphones. The goals of the study were to determine whether better-than-normal resolution could be
achieved and to analyze how subjects adapt to this type of transformation of spatial acoustic
cues. This work follows that done by Shinn-Cunningham (1994, 1998) and Shinn-Cunningham, Durlach and Held (1998a, 1998b), where a nonlinear remapping of the
normal space filters was implemented. The double-size head's acoustic cues were
simulated by frequency-scaling normal Head Related Transfer Functions. As a result, the
Interaural Time Differences (ITDs) presented for every position were doubled. Therefore,
even though the relationship between the location a naive listener associates with a
stimulus and its correct location is not linear, it is a linear transformation in ITD space.
Since ITDs were doubled, some ITDs presented to the listener were larger than the largest
naturally-occurring ITDs, which proved to be a problem. Bias and resolution were the two
quantitative measures used to study performance as well as to examine changes in
performance over time. Also, the Minimum Audible Angles for normal and altered cues
were determined and used to obtain estimates of subjects' sensitivity.
In the experiments, mean response and bias changed over time as expected,
clearly showing the adaptation process. Resolution results were less consistent, giving
better-than-normal resolution around the middle positions with altered cues.
Nevertheless, normal cues provided better overall performance. When correct-answer
feedback was used, resolution behaved as expected, but when feedback was not
presented, results were consistent with subjects attending to the whole range of possible
cues throughout the experiment (i.e., the internal noise was large and constant). Previous
work suggested that mean response, bias and resolution are dependent on each other and
that all have the same adaptation rate. However, the no-feedback condition proved that
resolution can be independent of the other quantities.
Finally, estimates of sensitivity indicated that resolution is strongly related to the
type of cues used and that changes in resolution depend directly on the total internal
noise.
Thesis Supervisor: Nathaniel I. Durlach
Title: Senior Research Scientist of Electrical Engineering and Computer Science
ACKNOWLEDGMENTS
This work is dedicated to my wife, who gave me all the
support, all the friendship and all the love I needed. Thank
you for believing in me.
To my parents and siblings, for helping me be who I am. To
my father; without his effort I would not be here.
To my truly good friends.
I want to thank Nathaniel Durlach for his support and for
giving me the opportunity to learn wonderful things.
Special thanks to Barbara Shinn-Cunningham, for all her
unconditional help. Without her guidance, this work could
never have been finished.
I also want to thank Lorraine Delhorne, Jay Desloge and
Andy Brughera for all their kind collaboration.
TABLE OF CONTENTS
ABSTRACT
ACKNOWLEDGMENTS
1. INTRODUCTION
2. BACKGROUND
2.1. NORMAL AUDITORY LOCALIZATION
2.2. AUDITORY VIRTUAL ENVIRONMENTS
2.3. SENSORY IMPROVEMENT
3. ADAPTATION TO SUPERNORMAL CUES
3.1. MOTIVATION
3.2. SUPERNORMAL AUDITORY LOCALIZATION: DOUBLE-SIZE HEAD
3.3. EQUIPMENT AND EXPERIMENTAL SETUP
3.4. ADAPTATION EXPERIMENT WITH FEEDBACK
3.4.1. Experiment Description
3.4.2. Analysis
3.4.3. Expected Results
3.4.4. Results
3.4.5. Error in Measured HRTFs
3.5. ADAPTATION EXPERIMENT WITHOUT FEEDBACK
3.5.1. Experiment Description
3.5.2. Expected Results
3.5.3. Results
4. JUST NOTICEABLE DIFFERENCE
4.1. MOTIVATION
4.2. BACKGROUND
4.3. NEW HRTFs
4.4. EQUIPMENT AND EXPERIMENTAL SETUP
4.5. EXPERIMENT DESCRIPTION
4.6. EXPECTED RESULTS
4.7. RESULTS
5. MODEL OF ADAPTATION
5.1. REMAPPING FUNCTION
5.2. AVERAGE PERCEIVED POSITION
6. RELATING JND AND RESOLUTION
6.1. BACKGROUND
6.2. RESULTS
7. CONCLUSION
7.1. SUMMARY
7.2. DISCUSSION
7.3. FUTURE WORK
REFERENCES
1. INTRODUCTION
In recent years, computing technology has provided us with more sophisticated
ways of gathering data, increasing the amount and complexity of the information
presented to users. As a result, the systems that work with this information are more
complex and more difficult to operate and understand. Today's graphic computer
interfaces are a first approach to easing the resulting burden of displaying information.
Lately, attention has been given to a more sophisticated interface, referred to as virtual
reality, whose objective is to provide a more efficient and natural way of presenting and
manipulating information by incorporating three-dimensional spatial cues in the display
(Wenzel, 1992).
In a teleoperator system, a human operator can interact with a real environment via a
human-machine interface and a telerobot as if he were the one standing in the remote
working area. Ideally, the operator should see, hear, and feel what the telerobot sees,
hears, and feels. Moreover, the telerobot can provide additional information that can be
useful to the operator (e.g., temperature, speed, etc.). Normally, the teleoperator system is
used to interact with a remote, inaccessible or hazardous environment, protecting the
physical integrity of the operator while permitting him to control and achieve a specific
task. The signals in the telerobot's environment are sensed, sent back, and displayed to
the human operator. In the same way, the actions taken by the operator in response to the
signals are transmitted to the telerobot and used to control its actions (Durlach, 1991).
In a virtual-environment system, the same kind of human-machine interface is
used, but a computer simulation replaces the telerobot and the environment. The purpose
of a teleoperator system is to extend the operator's sensory-motor system in order to
facilitate the manipulation of the physical environment, while in a virtual-environment system
the objective is to study or alter the human operator. General information on teleoperators
and virtual environments can be found in Vertut and Coiffet (1986), Sheridan (1987),
Bolt (1984), Foley (1987), and Durlach and Mavor (1995).
In the past, use of the visual modality was the primary method for presenting
spatial information to a human operator. However, more recently, the auditory system has
become recognized as an alternative for delivering such information. Acoustic signals are
very useful because they can be heard from any source direction, they tend to produce an
alerting or orienting response, and they can be detected faster than visual signals (Wenzel,
1992). In this project, attention is given only to the auditory localization features of the
human-machine interface, and particular consideration is given to how to provide the
operator with better-than-normal localization ability, so-called supernormal auditory
localization (Durlach, Shinn-Cunningham, and Held, 1993). Such an approach attempts
to provide acoustic cues that yield better effective spatial resolution than do normal cues.
This is achieved by increasing the change in the physical acoustic cues that results from a
given change in source position. Improving the effective resolution is desirable because the
normal human auditory localization system has extremely poor resolution in azimuth for
sources off to the side, in elevation, and in distance; it has even moderate resolution only
in azimuth for sources in front. In other words, we have relatively poor spatial
resolution for acoustic sources, especially when compared with visual spatial resolution.
Durlach, Shinn-Cunningham, and Held (1993) proposed several ways to increase
directional resolution by using localization cues that would improve (i.e., reduce) the just-noticeable difference (JND), the minimum separation for which a listener can resolve
two adjacent spatial positions. Some of the suggested methods for achieving
supernormal cues include simulating the localization cues from an enlarged head,
remapping the normal localization cues to increase resolution in some regions of the
azimuth plane while decreasing it in others, and exponentiating the complex interaural
ratio at all frequencies (Durlach and Pang, 1986). As Shinn-Cunningham (1998a) noted,
these approaches should improve the subject's ability to resolve sources in JND-type
experiments, but the effects on identification tasks using a larger range of physical stimuli
are not clear. In addition, the use of supernormal localization cues will displace the
apparent location of the source for a naive listener when he is first exposed to these
remapped cues. Adaptation to the new cues is said to have taken place to the extent that
the mean localization error diminishes over time with training.
Given the results obtained in a previous work by Shinn-Cunningham (1994) and
Shinn-Cunningham, Durlach and Held (1998a, 1998b), a study of supernormal auditory
localization cues will be undertaken, using the suggested enlarged-head approach
(Durlach, Shinn-Cunningham, and Held, 1993). Auditory localization cues for a double-size head will be simulated and presented to the subjects during the experiments. The
main goals of this project are to analyze how subjects adapt to a transformation of
spatial acoustic cues that is approximately linear, to extend the quantitative model of
adaptation developed from the nonlinear adaptation results (Shinn-Cunningham, 1998),
and to determine whether better-than-normal resolution is achieved with the double-size head cues.
Also, the results of this experiment will be compared with those of Shinn-Cunningham
(1994 and 1998) and Shinn-Cunningham, Durlach and Held (1998a, 1998b) to explore
how different types of remappings affect adaptation to remapped auditory spatial cues.
Following their work, bias (a measure of response error in units of standard deviation)
and resolution (the ability to reliably differentiate between nearby stimulus locations) are
the two quantitative measures that will be used to analyze the performance of subjects
over the course of the experiments.
2. BACKGROUND
2.1. Normal Auditory Localization
The classic duplex theory (Lord Rayleigh, 1907) states that interaural
differences in time of arrival and interaural differences in intensity are the two primary
cues used for auditory localization (Figure 1). Interaural time differences (ITDs) arise
when a sound source is to one side of the head, since the sound reaches the nearer ear
first¹. If a sound source is far enough from the head, the sound's wavefront is
approximately planar when it reaches the head. The distance the sound must travel to
reach the two ears differs, depending on source location. Assuming a spherical model of
the head with radius r, the difference in the travel distance for a source in the horizontal
plane at an angle θ (in radians, measured from directly ahead) is given by (Figure 2):

$$\Delta d = r\,(\theta + \sin\theta). \qquad (1)$$

Assigning a radius of 8.75 cm to the spherical head, and taking the speed of
sound c to be 343 m/s, the interaural time difference (ITD) can be expressed as:

$$\mathrm{ITD} = \frac{\Delta d}{c} = 255 \times 10^{-6}\,(\theta + \sin\theta) \quad [\mathrm{s}]. \qquad (2)$$
Figure 3 shows predictions of ITD based on equation 2 and measurements of ITD for adult
males (Mills, 1972).
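Equations 1 and 2 are easy to check numerically. The short Python sketch below evaluates the spherical-head ITD with the radius and speed of sound given above. The doubled radius is included only to illustrate that, in this model, ITD is proportional to head radius, so a head of twice the size doubles every ITD. The function and constant names are illustrative.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, value used in equation 2
HEAD_RADIUS = 0.0875    # m, radius of the spherical head model


def itd_spherical(theta_rad, radius=HEAD_RADIUS, c=SPEED_OF_SOUND):
    """ITD in seconds from equations 1 and 2: (r / c) * (theta + sin(theta)).

    Assumes a distant source in the horizontal plane, 0 <= theta <= pi/2,
    with theta measured from directly ahead.
    """
    return (radius / c) * (theta_rad + math.sin(theta_rad))


# Largest ITD (source at 90 degrees) for a normal head and a head of twice the radius.
itd_max_normal = itd_spherical(math.pi / 2)
itd_max_double = itd_spherical(math.pi / 2, radius=2 * HEAD_RADIUS)
print(f"normal head: {itd_max_normal * 1e6:.0f} us")       # about 656 us
print(f"double-size head: {itd_max_double * 1e6:.0f} us")  # about 1312 us
```

The doubling follows directly from the factor r in equation 1. Note that, under this model, the largest ITDs of the enlarged head lie outside the range a normal head can produce.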
The duplex theory states that the relative left-right position of a sound source is
determined by ITDs for low-frequency sounds and by interaural intensity differences (IIDs) for high-frequency sounds. As the
duplex theory explains, ITDs give good perceptual cues for sound location only at low
frequencies; at frequencies higher than about 1500 Hz, phase ambiguities occur. The phase
information becomes ambiguous at high frequencies because the wavelengths are smaller
than the distance between the ears.

¹ The sound reaches the farther ear about 29 µs later for each additional centimeter it must travel (Mills,
1972).
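The 1500 Hz figure can be related to the spherical-head model. The rough Python sketch below applies the criterion stated above: ambiguity sets in once one wavelength is shorter than the largest interaural path difference, which equation 1 gives at θ = 90°. With the 8.75 cm head, the crossover comes out near 1.5 kHz. This is a heuristic. A stricter criterion based on half a cycle of interaural phase would give half this value.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, as in equation 2
HEAD_RADIUS = 0.0875    # m, spherical-head radius used above


def ambiguity_frequency(radius=HEAD_RADIUS, c=SPEED_OF_SOUND):
    """Frequency (Hz) at which one wavelength equals the largest path difference.

    The largest path difference comes from equation 1 at theta = pi/2.
    Above this frequency, a lateral source produces more than one full
    cycle of interaural phase, so phase alone no longer pins down the ITD.
    """
    max_path_difference = radius * (math.pi / 2 + 1.0)
    return c / max_path_difference


print(f"normal head: {ambiguity_frequency():.0f} Hz")  # about 1525 Hz
print(f"double-size head: {ambiguity_frequency(2 * HEAD_RADIUS):.0f} Hz")
```

Because the crossover frequency is inversely proportional to head radius, the same heuristic places it about an octave lower for a head of twice the size.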
[Figure 1 panels: Interaural Time Differences (ITDs): sources off to one side arrive sooner at the closer ear. Interaural Intensity Differences (IIDs): sources off to one side are louder at the closer ear due to head-shadowing.]
Figure 1. The duplex theory postulates that interaural intensity differences (IIDs) and interaural time
differences (ITDs) are the two primary cues for auditory localization (from Wenzel, 1992).
[Figure 2: path-length difference r(θ + sin θ) between the two ears for a distant source.]
Figure 2. Differences between the distances of the ears from a sound source that is far away and that can be
represented as a plane wavefront (from Mills, 1972).
On the other hand, sources off to one side of the head are louder at the closer ear
due to head-shadowing; the head acts as a low-pass filter for the far ear, making IIDs
important localization cues at high frequencies. This acoustic effect occurs because
at high frequencies the wavelengths are small relative to the size of the head.
It has been found that ITD is the major cue for determining the location of sources
along the horizontal plane, and that the spectral peaks and the notches produced by the
filtering effect of the pinnae (mainly above 5kHz) are important for determining source
elevation.
Even though the duplex theory provides a clean and simple explanation for
determining the lateral position of a sound, this approach presents several limitations. For
example, listeners use the interaural delay of the envelope of high-frequency sounds for localization
even though they do not use ITDs in the fine structure at these frequencies. The direction-dependent filtering
that occurs when sound waves impinge on the outer ears and pinnae also provides very
important localization cues. It has been shown that the spectral shaping by the pinnae is
highly direction-dependent (Shaw, 1974 and 1975), and that the pinnae are responsible
for the externalization of sounds (Plenge, 1974).
[Figure 3 horizontal axis: angle from directly ahead.]
Figure 3. Interaural time difference (ITD) as a function of the position of a source of clicks. X: measured
values from five subjects. O: values computed from the mathematical approximation (from Mills, 1972).
Therefore, the auditory system's method for determining source position depends
on a direction-dependent filtering that occurs when the incoming sound wave interacts
with the head, ears, and torso of the listener. Let X(ω) be the complex spectrum of the
sound source, and let Y_L(ω,θ,φ) and Y_R(ω,θ,φ) be the complex spectra of the signals
received at the left and right ears, respectively. Then, for sources that are sufficiently far
from the listener (so that distance only affects the overall level of the received signals),
and for anechoic listening conditions, one can write:

$$Y_L(\omega,\theta,\phi) = r^{-1}\, H_L(\omega,\theta,\phi)\, X(\omega) \qquad (3a)$$

$$Y_R(\omega,\theta,\phi) = r^{-1}\, H_R(\omega,\theta,\phi)\, X(\omega) \qquad (3b)$$

where r is the distance from the head to the source, and H_L(ω,θ,φ) and H_R(ω,θ,φ) are the
space filters, or Head Related Transfer Functions (HRTFs), for each ear, describing the
direction-dependent effects of the head and body. The HRTFs depend on the frequency,
ω; the azimuth of the sound source relative to the head, θ; and the elevation of the source
relative to the head, φ.
The auditory system compares the signals received at the two ears in a manner
that can be usefully represented mathematically by forming the ratio:

$$\frac{Y_L(\omega,\theta,\phi)}{Y_R(\omega,\theta,\phi)} = \frac{H_L(\omega,\theta,\phi)}{H_R(\omega,\theta,\phi)}. \qquad (4)$$

In this ratio, the effects of r and of X(ω) cancel, and the ratio depends only
on ω, θ, and φ. The auditory system can therefore determine the location of the sound source from
the ratio, independent of source characteristics. The magnitude and the phase of the ratio
of the signals at the two ears for a source at direction (θ,φ) correspond to the interaural
intensity difference (IID) and the interaural time difference (ITD), respectively (for a pure
interaural delay, the phase at frequency ω equals ω times the ITD).
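Equation 4 can be sketched directly on sampled impulse responses. The Python example below is an illustration, not a measurement procedure. It uses a hypothetical pair of left/right impulse responses, with a pure delay and attenuation standing in for a real HRTF pair. It forms the ratio H_L/H_R with an FFT, reads the IID from the ratio's magnitude in dB, and estimates the ITD from the slope of its unwrapped phase below 1500 Hz. The sample rate, FFT length, and band limit are assumptions. The ratio here is left over right, as in equation 4; a right-over-left ratio flips the sign of both quantities.

```python
import numpy as np


def interaural_cues(h_left, h_right, fs, n_fft=256, f_max=1500.0):
    """Estimate IID (dB) and ITD (s) from a left/right impulse-response pair.

    Forms the interaural ratio H_L/H_R of equation 4, in which the source
    spectrum and the 1/r factor cancel. IID is the mean magnitude of the
    ratio in dB over 0 < f <= f_max; ITD is the slope of the unwrapped
    interaural phase versus angular frequency over the same band
    (positive values mean the left ear leads). Assumes H_R has no zeros
    in that band.
    """
    H_L = np.fft.rfft(h_left, n_fft)
    H_R = np.fft.rfft(h_right, n_fft)
    ratio = H_L / H_R
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / fs)
    band = (freqs > 0) & (freqs <= f_max)
    iid_db = 20.0 * np.log10(np.abs(ratio[band]))
    phase = np.unwrap(np.angle(ratio))[band]
    omega = 2.0 * np.pi * freqs[band]
    slope = np.polyfit(omega, phase, 1)[0]
    return float(np.mean(iid_db)), float(slope)


# Hypothetical toy pair for a source on the left: the right-ear response is
# the left-ear response delayed by 10 samples and halved in amplitude.
fs = 44100
h_l = np.zeros(127)
h_l[0] = 1.0
h_r = np.zeros(127)
h_r[10] = 0.5
iid, itd = interaural_cues(h_l, h_r, fs)
print(f"IID = {iid:.2f} dB, ITD = {itd * 1e6:.1f} us")  # about 6.02 dB, 226.8 us
```

For measured HRTFs the interaural phase is not exactly linear, so the fitted slope is only a low-frequency summary of the ITD.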
Even though interaural processing (i.e., computation of IID and ITD) offers useful
localization information, ambiguities can occur: (i) distance is not conveyed, because for
distant sources its only effect (the factor 1/r) cancels in the interaural ratio, and (ii) front-back
confusions appear because of the so-called cone of confusion² (Mills, 1972). Head movements and monaural
processing help to resolve front-back ambiguities. Head movements cause changes in IID
and ITD that differ depending on whether the source is in front of or behind the listener. Also, a priori knowledge
of the transmitted signal X(ω) can allow monaural spectral cues to be
used to estimate the space filters H_L(ω,θ,φ) and H_R(ω,θ,φ) from the signals Y_L(ω,θ,φ) and
Y_R(ω,θ,φ) received at the two ears.
Wightman and Kistler (1992) found that low-frequency ITDs are the dominant
cues for localization of broadband sound sources. Although ITD cues are dominant, when
the low-frequency components of a stimulus are removed, direction is determined by IID
and spectral shape cues. In other words, when low-frequency interaural time cues are
present, they override the IID and the spectral shape cues that are present in other
frequency ranges. It follows that in every condition in which there is a conflict between
low-frequency ITD and any other cue, sound localization is determined mainly by ITD.
The ITD is used primarily to establish the locus of possible source locations (i.e., to
determine on which cone of confusion the sound source lies), while IID and spectral
filtering help to resolve any ambiguity in ITD information. Integration of all available
cues leads to accurate localization (Wightman and Kistler, 1992).
More information about normal auditory localization can be found in Blauert
(1983), Mills (1972), Wightman, Kistler, and Perkins (1987), Wenzel (1992) and
Durlach, Shinn-Cunningham and Held (1993).
2.2. Auditory Virtual Environments
In order to better understand the importance of auditory cues such as ITD, IID and
pinnae effects, and to enhance their capabilities, researchers have begun to use auditory
virtual environments to simulate acoustic sources around the listeners. This approach
gives the experimenter good control of the stimulus while creating rich and realistic
localization cues.

² Cone-of-confusion errors arise because a given ITD or IID produced from one source position is
roughly equal to that produced by sound sources located anywhere on a hyperbolic
surface (roughly cone-shaped) whose axis is the interaural axis.
One class of simulation techniques derives from the measurement of Head Related
Transfer Functions (HRTFs). Using a normative mannequin, such as the KEMAR
(Knowles Electronics, Inc.), it is possible to obtain good estimates of the acoustic effects
of the head and the pinnae on sounds reaching the listener's eardrum as a function of
source position. Using these HRTFs, implemented as finite impulse response (FIR) filters, it is possible to filter an
arbitrary sound to give it spatial characteristics (i.e., to simulate a sound coming from a
predetermined direction). Even though the HRTFs provide good acoustic cues, the
localizability of the sound also depends on other factors, such as its original spectral
content (e.g., narrowband sounds like pure tones are harder to localize than broadband
sounds). Individual differences in the pinnae appear to be very important for some aspects
of localization, most notably resolving cone-of-confusion errors. Several studies show that
most listeners can obtain useful directional information from a typical HRTF, suggesting
that the basic properties of the HRTFs carry much of the important localization
information (Wenzel, 1992).
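The filtering step described above can be sketched minimally. Assuming an HRTF pair is available as two equal-length FIR impulse responses, spatializing a mono sound amounts to one convolution per ear. The toy impulse responses below are hypothetical stand-ins, not measured HRTFs.

```python
import numpy as np


def spatialize(mono, hrir_left, hrir_right):
    """Filter a mono signal with a left/right pair of FIR impulse responses.

    Returns an array of shape (2, len(mono) + len(hrir) - 1): row 0 is the
    left-ear signal and row 1 the right-ear signal (full time-domain convolution).
    """
    if len(hrir_left) != len(hrir_right):
        raise ValueError("left and right impulse responses must have equal length")
    left = np.convolve(mono, hrir_left)
    right = np.convolve(mono, hrir_right)
    return np.stack([left, right])


# Hypothetical 127-tap stand-ins (not measured HRTFs): the right ear receives
# the sound 10 samples later and at half the amplitude.
h_l = np.zeros(127)
h_l[0] = 1.0
h_r = np.zeros(127)
h_r[10] = 0.5
burst = np.random.default_rng(0).standard_normal(1000)  # broadband noise burst
binaural = spatialize(burst, h_l, h_r)
print(binaural.shape)  # (2, 1126)
```

A broadband burst is used here because, as noted above, broadband sounds carry more usable spectral cues than pure tones.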
Using digital signal processing (DSP) systems, real-time simulation of acoustic
cues can be used to generate spatial auditory cues over headphones. These systems use
time-domain convolution to achieve the desired real-time performance, reproducing a
free-field experience. Using a head-tracking device attached to the headphones, the system
can determine the head's actual yaw, pitch, and roll and decide which set of HRTFs is
needed for presenting a source from a particular position. The DSP system then filters
the input signal with the proper HRTF. Even if the subject's head is moving freely, the
head tracker allows a fixed source location to be presented by calculating the azimuth and
elevation of the source relative to the head. Of course, the term real time is relative, since
the appropriate HRTF cannot be selected and applied instantaneously; some
processing time is needed for all the computations. Due to constraints on memory and
computation time, DSP systems must make several approximations and simplifications,
at some cost in accuracy.
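The head-tracker step can be illustrated for the horizontal plane only. The sketch below makes three assumptions. Yaw is the only head rotation. Positive angles point to the listener's right, consistent with the convention, used later in this chapter, that -40° is to the listener's left. HRTFs were measured on a hypothetical 10° grid. Pitch, roll, and elevation are ignored.

```python
import math


def relative_azimuth(source_az_deg, head_yaw_deg):
    """Azimuth of a world-fixed source relative to the head, wrapped to [-180, 180).

    Assumes positive angles are to the listener's right and that yaw is the
    only head rotation (pitch and roll are ignored in this sketch).
    """
    return (source_az_deg - head_yaw_deg + 180.0) % 360.0 - 180.0


def nearest_grid_azimuth(az_deg, step_deg=10.0):
    """Snap an azimuth to a uniform measurement grid (hypothetical 10-degree spacing)."""
    snapped = math.floor(az_deg / step_deg + 0.5) * step_deg
    return relative_azimuth(snapped, 0.0)  # re-wrap so that +180 maps to -180


# A source fixed at 30 degrees while the head turns 47 degrees to the right:
rel = relative_azimuth(30.0, 47.0)     # -17 degrees, i.e., now to the left
print(rel, nearest_grid_azimuth(rel))  # filter set chosen for -20 degrees
```

A real system would also interpolate between measured positions or use a finer grid. Nearest-neighbor selection is simply the easiest approximation to show.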
A typical HRTF record consists of a pair of impulse responses (i.e., one for the
right and one for the left ear), measured from several equidistant locations around the
subject. The HRTFs are then estimated by canceling the effects of the loudspeakers, the
stimulus, and the microphone responses from the recorded signal (Wightman and Kistler,
1989). For example, the HRTFs measured by Wightman and Kistler (1989) from their
subject SOS consisted of 36 azimuth positions (with 10° resolution) ranging from 180°
to -170°, and 14 elevation positions (with 10° resolution) ranging from 80° to -50°.
Hence, the HRTFs represented a total of 504 positions (36 in azimuth times 14 in
elevation). The HRTF for a specific position is stored as two 127-tap FIR filters, each
containing the impulse response for one of the ears.
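The bookkeeping for the SOS set described above can be sketched as a table indexed by azimuth and elevation. The array below is a zero-filled placeholder with the stated dimensions: 36 azimuths × 14 elevations, with two 127-tap filters per position. It contains no measured data, and the layout is an assumption.

```python
import numpy as np

# Measurement grid described for subject SOS (10-degree steps).
AZIMUTHS = np.arange(180, -180, -10)   # 180, 170, ..., -170 degrees: 36 values
ELEVATIONS = np.arange(80, -60, -10)   # 80, 70, ..., -50 degrees: 14 values
N_TAPS = 127                           # taps per ear

# Zero-filled placeholder: one (left, right) pair of FIR filters per position.
hrtf_table = np.zeros((len(AZIMUTHS), len(ELEVATIONS), 2, N_TAPS))


def lookup(az_deg, el_deg):
    """Return the (2, N_TAPS) left/right impulse responses for a measured position."""
    i = np.flatnonzero(AZIMUTHS == az_deg)
    j = np.flatnonzero(ELEVATIONS == el_deg)
    if i.size == 0 or j.size == 0:
        raise ValueError(f"({az_deg}, {el_deg}) is not a measured position")
    return hrtf_table[i[0], j[0]]


print(len(AZIMUTHS) * len(ELEVATIONS))  # 504 positions
print(lookup(-40, 0).shape)             # (2, 127)
```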
Figure 4 shows typical HRTF frequency responses for two different azimuths at
0° elevation, and demonstrates how ITD and IID vary as a function of the direction of the
sound source. For a source at 0° azimuth (i.e., directly in front of the listener), there is
very little difference in either the magnitude (IID) or the phase (ITD) responses of the two
ears (top right plots); this is highlighted by taking the ratio between the responses of the two
ears (bottom right plots). Because sound arrives at almost the same time and with almost the
same magnitude at both ears, the magnitude (in dB) and the phase of the ratio are both close to zero. For a
source at -40° azimuth (i.e., to the left of the listener), the magnitude (IID) at the left
ear is greater than at the right ear, while the phase lag (ITD) at the right ear is larger (top
left plots). As expected, the ratio between the right- and left-ear responses (bottom left
plots) shows a negative magnitude in dB (i.e., the sound at the right ear has less energy than the
sound at the left ear) and a negative overall phase (i.e., the sound arrives at the right ear
later than at the left ear).
2.3. Sensory Improvement
It is now possible to think not only of better ways to simulate normal localization
cues, but also of methods for transforming the natural acoustic cues for the purpose of
achieving better spatial resolution (e.g., superlocalization, Durlach, 1991).
[Figure 4 panels: frequency responses of the left (solid) and right (dashed) ears at 0° and at -40° azimuth; frequency response of the right/left ear ratio at 0° and at -40° azimuth. Horizontal axes: frequency (Hz), 0 to 12000 Hz.]
Figure 4. Frequency responses for -40° and 0° in azimuth and 0° in elevation of the HRTFs measured by
Wightman and Kistler (1989) from their subject SOS. The figure illustrates how the HRTFs contain the IID,
ITD, and pinna-effect cues.
Some studies have examined how subjects adapt to unnatural auditory
localization cues. One set of such studies (Warren and Strelow, 1984; Strelow and
Warren, 1985) investigated the use of the Binaural Sensory Aid, a device that used
auditory localization cues to represent the position of objects sensed with
sonar. Here, the ITDs conveyed the distance to the object, and the
IIDs gave its direction. The results of these studies showed that blindfolded subjects were
able to adapt to and use these unnatural cues accurately after being trained with a correct-answer feedback paradigm.
In an attempt to improve spatial resolution (i.e., to reduce the JND in direction),
a study on supernormal auditory localization was undertaken (Durlach, Shinn-Cunningham, and Held, 1993). Its main goal was to determine whether adaptation to rearranged
acoustic spatial cues was possible and whether resolution could be improved.
In this study, supernormal localization cues were created by remapping the
relationship between source position and the normal HRTFs (Durlach, Shinn-Cunningham, and Held, 1993). The transformation was supernormal only for some
positions. At other positions the rearrangement actually reduced the change in acoustic
cues with changes in source location. To simulate a sound at position θ, the study used
HRTFs that were chosen from the normal HRTF set, but which normally correspond to a
different azimuth. The new HRTFs are given by:

$$H'(\omega,\theta,\phi) = H(\omega, f_n(\theta), \phi). \qquad (5)$$

With this transformation no new HRTFs were created. Instead, the existing HRTFs were
reassigned to different angles.
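Equation 5 only re-indexes existing filters. The sketch below assumes HRTFs are stored per measured azimuth in the horizontal plane. It also assumes that a remapped angle falling between measured azimuths is snapped to the nearest one; how the original study handled such angles is not stated here. `mapping` stands for f_n, and the toy table of labels replaces real filters.

```python
import numpy as np


def remapped_hrtf(theta_deg, mapping, hrtfs_by_azimuth):
    """Equation 5 sketch: the filter for azimuth theta is H at mapping(theta).

    `mapping` plays the role of f_n (degrees -> degrees). The remapped angle is
    snapped to the nearest measured azimuth (an assumption), so the result is
    always one of the stored filters and no new filter is created.
    """
    target = mapping(theta_deg)
    measured = np.array(sorted(hrtfs_by_azimuth))
    nearest = int(measured[np.argmin(np.abs(measured - target))])
    return hrtfs_by_azimuth[nearest]


# Toy table: labels stand in for stored filter pairs at 10-degree spacing.
table = {az: f"H({az})" for az in range(-90, 91, 10)}
print(remapped_hrtf(20, lambda t: 2 * t, table))  # H(40): a doubling map, for illustration only
```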
The family of mapping functions f_n(θ) used to transform the horizontal plane was
given by:

$$f_n(\theta) = \frac{1}{2}\tan^{-1}\left[\frac{2n\sin(2\theta)}{1 - n^2 + (1 + n^2)\cos(2\theta)}\right], \qquad (6)$$
where the parameter n gives the slope of the transformation at θ = 0. Figure 5 shows this
transformation for several values of n. When n = 1, cues are not rearranged. With n > 1, the
transformation increases the cue differences (and therefore the resolution) around
θ = 0°, while it decreases them in the neighborhood of θ = ±90°. For n < 1 the opposite
occurs. As a result, subjects were expected to show better-than-normal resolution in the
front, and lower resolution towards the sides, when n > 1.
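Equation 6 can be evaluated numerically, and one implementation detail matters. A plain arctangent returns angles in (-90°, 90°), so half of it covers only ±45° and would fold larger azimuths. The two-argument arctangent avoids this. Algebraically, equation 6 is equivalent to f_n(θ) = tan⁻¹(n tan θ) for |θ| < 90°, which makes the slope n at θ = 0 easy to see. The function name is illustrative.

```python
import math


def remap_azimuth(theta_rad, n):
    """Equation 6: f_n(theta) = 0.5 * atan[2n sin(2 theta) / (1 - n^2 + (1 + n^2) cos(2 theta))].

    math.atan2 selects the correct quadrant, so the mapping covers the full
    range -pi/2..pi/2; for |theta| < pi/2 the result equals atan(n * tan(theta)).
    """
    numerator = 2.0 * n * math.sin(2.0 * theta_rad)
    denominator = 1.0 - n**2 + (1.0 + n**2) * math.cos(2.0 * theta_rad)
    return 0.5 * math.atan2(numerator, denominator)


for n in (1, 3):
    mapped = [round(math.degrees(remap_azimuth(math.radians(t), n)), 1)
              for t in (-60, -30, 0, 30, 60)]
    print(f"n={n}: {mapped}")
```

With n = 3, a source at 30° is presented with the cues normally produced at 60°. The endpoints ±90° stay fixed, which is why the expansion near the front must be paid for by compression towards the sides.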
In the study, subjects were first tested with normal localization cues (to determine
baseline performance), and then with altered (supernormal) cues to examine how
performance changed with training. Finally, normal cues were presented again to see if
there was any after-effect as a result of training.
Bias, a measure of the error in the subject's responses, and resolution, a measure of
the ability to resolve adjacent stimulus positions, were the two quantities used to analyze
the adaptation process throughout the experiments. Figures 6 and 7 illustrate bias and
resolution results for one of the experiments in this study, in which correct-answer
feedback was used to train the subjects.
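To make the two measures concrete, here is a hedged sketch of one common way to compute them from identification data. The exact estimators used in these studies may differ. The sketch assumes responses are given in the same units as the stimulus positions, and that a single standard deviation pooled over all positions is used. Bias is taken as (mean response minus correct position) divided by that standard deviation. Resolution between adjacent positions is taken as the sensitivity d', the difference of their mean responses divided by the same standard deviation. The data are hypothetical.

```python
import numpy as np


def bias_and_resolution(responses):
    """Sketch of bias per position and resolution (d') between adjacent positions.

    `responses` maps each correct position (in response units, e.g. 1..N)
    to the list of responses given on trials at that position. Assumptions:
    a single standard deviation pooled over all positions (root-mean-square
    of the per-position standard deviations), bias = (mean response -
    correct position) / sigma, and d' = difference of adjacent mean
    responses / sigma.
    """
    positions = sorted(responses)
    means = np.array([np.mean(responses[p]) for p in positions])
    variances = np.array([np.var(responses[p]) for p in positions])
    sigma = np.sqrt(np.mean(variances))
    bias = (means - np.asarray(positions, dtype=float)) / sigma
    d_prime = np.diff(means) / sigma
    return bias, d_prime


# Hypothetical data for three positions (not experimental results).
data = {1: [0, 2], 2: [2, 4], 3: [3, 5]}
bias, d_prime = bias_and_resolution(data)
print(bias, d_prime)
```

Under these assumptions, a remapping that spreads mean responses apart around the center raises d' there, while a uniform shift of all responses changes bias but not d'.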
[Figure 5 horizontal axis: source azimuth θ (degrees).]
Figure 5. A plot of the azimuth remapping transformation specified by equation 6 (from Shinn-Cunningham, Durlach, and Held, 1998a).
The first normal-cue run was expected to show small bias (error in units of standard deviation), since the cues were roughly consistent with normal localization cues. The first run using altered cues resulted in a very large bias, reflecting the sudden introduction of the unnatural cues. The last run using altered cues showed a decrease in bias compared to the start of training, demonstrating that correct-answer feedback caused subjects to adapt to the new cues (although adaptation was not complete). Finally, the first normal-cue test following training with altered cues produced a negative after-effect, indicating that performance was not controlled exclusively by conscious correction (Shinn-Cunningham, Durlach, and Held, 1998a).
Resolution of adjacent source locations is shown in Figure 7. In the first normal
cue run, resolution provides a standard against which other results are compared. As
expected, when altered cues were presented for the first time, resolution increased around
the center positions and decreased at the edges of the range. In the last run with altered
cues, resolution remained enhanced (with respect to the baseline), but showed a decrease
compared to the first altered cue run. As before, an after-effect was seen after normal cues
were introduced again (Shinn-Cunningham, Durlach, and Held, 1998a).
[Figure 6 plot: bias vs. source position (degrees)]
Figure 6. Bias results for one of the experiments carried out. 0: First run in the experiment using normal
cues. ': First run with altered cues. *: Last run with altered cues. 0: First normal cue run following altered
cue exposure. Here, the altered cues have a transformation strength of n=3 (from Shinn-Cunningham,
Durlach, and Held, 1998a).
This study showed that subjects could not adapt completely to a nonlinear
remapping of the auditory localization cues. In general, subjects were able to reduce their
response bias with training, but they could never completely overcome their errors. In
addition, although the transformation initially increased resolution as expected, resolution
decreased as subjects adapted to the remapping. Shinn-Cunningham, Durlach and Held
(1998a) concluded that resolution depended not only on the range of physical cues
presented during an experiment, a result previously described for perception of sound
intensity (e.g., Durlach and Braida, 1969; Braida and Durlach, 1972), but also upon the
past history of exposure or training of the subject. The researchers also found that
subjects adapted to the best-fit linear approximation of the nonlinear transformation,
implying that subjects may only be capable of adapting to linear transformations of the
localization cues (Shinn-Cunningham, Durlach and Held, 1998b).
[Figure 7 plot: resolution (d') vs. source position (degrees)]
Figure 7. Resolution results for one of the experiments carried out. 0: First run in the experiment using normal cues. +: First run with altered cues. *: Last run with altered cues. 0: First normal cue run following altered cue exposure. Here, the altered cues have a transformation strength of n=3 (from Shinn-Cunningham, Durlach, and Held, 1998a).
3. ADAPTATION TO SUPERNORMAL CUES
3.1. Motivation
The main goal of this project is to examine further whether humans can adapt to unnatural (altered) auditory localization cues that provide listeners with better-than-normal localization ability, so-called supernormal auditory localization (Durlach, Shinn-Cunningham, and Held, 1993).
In contrast with the previous studies described above (e.g., Shinn-Cunningham et al., 1994 and 1998), in which a nonlinear remapping of the normal space filters was implemented, the present study takes an approximately linear approach that expands all positions to create supernormal HRTFs. The earlier experiments showed that subjects adapted to the best-fit linear approximation of a nonlinear transformation, which could mean that subjects are only able to adapt to linear transformations. This study is designed to give further insight into the adaptation process and to determine whether this linear constraint holds for other cue transformations. In addition, the new transformation may provide listeners with higher spatial sensitivity and, ideally, low overall localization error.
3.2. Supernormal Auditory Localization: Double-Size Head
To improve resolution, the localization cues must increase the discriminability between separated sources. This may be achieved by creating a larger-than-normal difference between the physical cues corresponding to two different positions. One way of doing so is to simulate a larger-than-normal head, thereby increasing the ITDs and ILDs associated with every position in space. For a listener who has not adapted to such a change in cues, this transformation will make sound sources appear farther apart than they actually are.
The double-size head was simulated by frequency scaling normal HRTFs
(Rabinowitz, Maxwell, Shao, and Wei, 1993). The new pair of HRTF filters are defined
as follows:
$$H'_L(\omega,\theta,\phi) = H_L(K\omega,\theta,\phi), \qquad H'_R(\omega,\theta,\phi) = H_R(K\omega,\theta,\phi) \qquad (7)$$
where K is a constant. This transformation approximates the acoustic effect of increasing the size of the human body, including the head and pinnae, by a factor of K. As a result, the ILD and ITD are also affected, and are determined by the new ratio:
$$\frac{H'_R(\omega,\theta,\phi)}{H'_L(\omega,\theta,\phi)} = \frac{H_R(K\omega,\theta,\phi)}{H_L(K\omega,\theta,\phi)} \qquad (8)$$
Here, both the interaural differences and the monaural spectral cues are magnified by the factor K (Durlach, Shinn-Cunningham, and Held, 1993); for this reason the transformation is described as linear. For the current study, the frequency axis was scaled by a factor of two (K = 2), simulating a head twice the normal size. As Rabinowitz, Maxwell, Shao, and Wei (1993) showed, scaling the HRTFs in frequency corresponds to uniformly scaling up all physical dimensions, and thus simulates the main acoustic effects of a magnified head.
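Assuming, for illustration only, that the interaural transfer function at a given position can be written as a magnitude A(ω) times a pure delay τ, equation (8) shows directly why frequency scaling multiplies the ITD by K:

$$\frac{H_R(\omega,\theta,\phi)}{H_L(\omega,\theta,\phi)} = A(\omega)\,e^{-j\omega\tau} \quad\Longrightarrow\quad \frac{H_R(K\omega,\theta,\phi)}{H_L(K\omega,\theta,\phi)} = A(K\omega)\,e^{-j\omega(K\tau)}$$

The slope of the interaural phase changes from -τ to -Kτ, so the ITD becomes Kτ, while the interaural level difference at frequency ω becomes the one originally found at Kω. With K = 2, a 0.65-ms ITD becomes 1.3 ms.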
Implementing the transformation raises several practical issues. Scaling the frequency axis of the normal HRTFs can be achieved by inserting an additional zero-valued sample after each sample of the original HRTF impulse response. This doubles the length of the time signal (i.e., the impulse response), corresponding to K = 2, and compresses the spectrum by a factor of two in frequency. The new HRTFs must therefore be low-pass filtered to remove energy above the original Nyquist frequency. For example, if the normal HRTFs are defined up to 20 kHz, the new HRTFs are only defined up to 10 kHz. Without low-pass filtering the upsampled waveforms, this procedure would distort the spectrum above 10 kHz because of spectral imaging (the mirror-image copy of the spectrum created by zero insertion).
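The zero-insertion step can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the procedure actually used in the study: the helper names are hypothetical, the 63-tap Hamming-windowed half-band low-pass filter is an arbitrary stand-in for whatever anti-imaging filter was used, and the resulting filter is assumed to run at the original 50-kHz rate, where a zero-inserted response h' satisfies H'(ω) = H(2ω). The toy example checks that a 4-sample interaural delay becomes an 8-sample delay.

```python
import cmath
import math

def zero_insert(h):
    """Upsample by 2 without interpolation: a zero after every sample of h."""
    out = []
    for x in h:
        out.extend([x, 0.0])
    return out

def dtft(h, w):
    """DTFT of the finite sequence h at normalized frequency w (rad/sample)."""
    return sum(x * cmath.exp(-1j * w * n) for n, x in enumerate(h))

def halfband_lowpass(num_taps=63):
    """Hamming-windowed sinc FIR with cutoff pi/2 rad/sample.

    At the original sampling rate this is a quarter of the sampling rate
    (12.5 kHz for 50-kHz filters), which removes the image created by zero
    insertion. num_taps must be odd so the filter has a centre tap.
    """
    m = (num_taps - 1) // 2
    taps = []
    for n in range(num_taps):
        t = n - m
        ideal = 0.5 if t == 0 else math.sin(math.pi * t / 2) / (math.pi * t)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(ideal * window)
    return taps

def convolve(a, b):
    """Direct-form linear convolution of two sequences."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out

def double_head_filter(h, num_taps=63):
    """Frequency-scale an impulse response by K = 2: zero insertion, then low-pass.

    The low-pass adds the same (num_taps - 1) // 2 samples of delay to every
    filter, so interaural delays are unaffected by it.
    """
    return convolve(zero_insert(h), halfband_lowpass(num_taps))

if __name__ == "__main__":
    left = [1.0] + [0.0] * 9               # toy left-ear response: impulse at n = 0
    right = [0.0] * 4 + [1.0] + [0.0] * 5  # right ear lags by 4 samples
    peak = lambda s: max(range(len(s)), key=lambda i: abs(s[i]))
    print(peak(double_head_filter(right)) - peak(double_head_filter(left)))  # 8
```

Because the same low-pass filter is applied to both ears, its delay cancels in the interaural comparison. The convolution lengthens each filter by num_taps - 1 samples; the sketch does not try to reproduce the 254-tap length reported for the study.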
In addition, because the size of the head is doubled, the listener will be presented with ITDs larger than the largest naturally occurring ITDs. For example, a source at 90° (or -90°) produces the maximum normal ITD of around 0.65 ms (Figure 3); with the transformed cues, the corresponding ITD is 1.3 ms. It is not clear how subjects will perceive these unnatural cues. Consequently, subjects must adapt to the expanded interaural axis not only by relabeling it, but also by interpreting larger-than-normal ITDs (Durlach, 1991).
Normal HRTFs from subject SOS (Wightman and Kistler, 1989) were used to create the double-head HRTFs. Each position described by the HRTFs contains two 127-tap FIR filters (one containing the filter coefficients for the right ear and one for the left ear) sampled at 50 kHz. To create the double-head HRTFs, each FIR filter was upsampled by a factor of two and then low-pass filtered at 25 kHz (Figure 8). As a result, the altered HRTFs were twice as long (i.e., each FIR filter is now 254 taps long) and sampled at 100 kHz.
[Figure 8 panels: Frequency response, normal HRTF (-40 degrees) | Frequency response, altered HRTF (-40 degrees); x-axis: Frequency [Hz]. Note the different frequency scales.]
Figure 8. Comparison between normal and altered HRTFs for a source at -40° in azimuth. The left panel shows the normal HRTF, while the right one shows the altered HRTF on a different frequency scale. The shapes of both frequency responses are the same, except that the altered HRTF has been scaled in frequency, indicating that the upsampling was successful. Note that the ITD (given by the slope of the phase as a function of frequency) for the altered HRTF is now doubled.
Figure 8 compares the frequency responses of the HRTF at -40°, showing that the upsampling doubles the ITD (the ITD is given by the slope of the phase response as a function of frequency). As mentioned above, the altered HRTFs are compressed in frequency by a factor of two, and, to prevent spurious components at high frequencies, the new HRTFs were low-pass filtered. As a result, the magnitude response is effectively zero above 12.5 kHz.
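The phase-slope definition of the ITD can be turned into a small estimator, sketched below. The helper names are hypothetical and this is not the analysis used in the study. It fits a least-squares line to the unwrapped interaural phase (taken from H_R times the complex conjugate of H_L, which has the same phase as H_R/H_L) against angular frequency; fitting a single line assumes the interaural phase is roughly linear over the chosen band, which holds exactly for the pure-delay toy filters used here but only approximately for measured HRTFs.

```python
import cmath
import math

def dtft(h, w):
    """DTFT of the finite sequence h at normalized frequency w (rad/sample)."""
    return sum(x * cmath.exp(-1j * w * n) for n, x in enumerate(h))

def unwrap(phases):
    """Remove 2*pi jumps between consecutive phase samples (radians)."""
    out = [phases[0]]
    for p in phases[1:]:
        step = (p - out[-1] + math.pi) % (2 * math.pi) - math.pi  # into [-pi, pi)
        out.append(out[-1] + step)
    return out

def itd_from_phase_slope(h_left, h_right, fs, f_max, num_freqs=64):
    """ITD in seconds, positive when the right ear lags the left ear.

    A right ear delayed by tau seconds gives interaural phase -omega * tau,
    so the ITD is minus the fitted slope of phase versus omega (rad/s).
    """
    omegas, phases = [], []
    for k in range(1, num_freqs + 1):
        f = f_max * k / num_freqs          # analysis frequency in Hz
        w = 2 * math.pi * f / fs           # same frequency in rad/sample
        cross = dtft(h_right, w) * dtft(h_left, w).conjugate()
        omegas.append(2 * math.pi * f)     # rad/s, so the slope is in seconds
        phases.append(cmath.phase(cross))
    phases = unwrap(phases)
    mx = sum(omegas) / len(omegas)
    my = sum(phases) / len(phases)
    slope = (sum((x - mx) * (y - my) for x, y in zip(omegas, phases))
             / sum((x - mx) ** 2 for x in omegas))
    return -slope

if __name__ == "__main__":
    fs = 50_000
    left = [1.0] + [0.0] * 20
    right = [0.0] * 5 + [1.0] + [0.0] * 15           # 5-sample lag (0.1 ms)
    right_doubled = [0.0] * 10 + [1.0] + [0.0] * 10  # the same lag after zero insertion
    print(itd_from_phase_slope(left, right, fs, 10_000))
    print(itd_from_phase_slope(left, right_doubled, fs, 10_000))
```

The analysis band (here up to 10 kHz) should stay below the 12.5-kHz cutoff of the altered filters, and the frequency spacing must be fine enough that the phase changes by less than π between neighbouring analysis frequencies.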
3.3. Equipment and Experimental Setup
Adaptation to the double-size head auditory localization cues was investigated by presenting simulated acoustic cues and real visual cues. The acoustic cues were generated by an auditory virtual environment. Visual cues were provided by a light display located in front of the subjects and were used to give the subjects spatial feedback about the simulated sounds.
Subjects were seated in front of a five-foot-diameter arc of lights consisting of thirteen 2-inch light bulbs. The lights were labeled from 1 to 13, and all lights were visible to the subjects during the experiment. The lights were positioned from -60° to +60° in azimuth with respect to the head position, with a 10° separation between adjacent lights: -60° azimuth was represented by light 1, 0° by light 7, and +60° by light 13. The light array was connected to a digital-to-analog device, the light driver, which received a digital input from a personal computer (PC) and converted it to an analog output that drove the current to each light bulb. The PC used Data Translation's DT2817 Digital I/O Board to transmit signals to the light driver, enabling it to turn the light on or off at any of the 13 positions (Figure 9). This light array provided visual feedback to the subjects.
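The numbering of the light array maps to azimuth with simple arithmetic, sketched below. The helper names are hypothetical (they are not from the study's software); `azimuth_to_light` rounds to the nearest light, with halves rounded up, and clamps to the array's span, which is one reasonable choice for converting an azimuth back to a response number.

```python
import math

def light_to_azimuth(light):
    """Azimuth (degrees) of light 1..13: light 1 = -60, light 7 = 0, light 13 = +60."""
    if not 1 <= light <= 13:
        raise ValueError("light number must be between 1 and 13")
    return -60 + 10 * (light - 1)

def azimuth_to_light(azimuth_deg):
    """Nearest light number (halves round up), clamped to lights 1..13."""
    nearest = math.floor((azimuth_deg + 60) / 10 + 0.5) + 1
    return min(13, max(1, nearest))

if __name__ == "__main__":
    print([light_to_azimuth(k) for k in range(1, 14)])
```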
The acoustic cues were simulated by an auditory virtual environment system consisting of a PC, a signal-processing device, a head tracker, headphones, and a function generator. The head tracker transmitted to the PC the instantaneous head orientation of the subject with respect to 0° azimuth (i.e., the 0° position in the light array, calibrated during start-up procedures). The PC calculated the direction of the desired source position relative to the head. This information was then transmitted to the signal-processing hardware, which filtered the waveform provided by the function generator with the appropriate HRTFs to produce the left and right ear signals. Finally, the binaural signal generated by the signal-processing hardware was played to the subject over headphones.
[Figure 9 diagram: light array with positions from -60° to +60° in 10° steps]
Figure 9. Diagram of the light array which gave subjects spatial visual feedback. Thirteen light bulbs represent 13 positions ranging from -60° to +60° in azimuth with respect to the subject's head position. Lights were placed at 10° intervals.
The Escort EFG-2210 function generator provided the system with a 5-Hz periodic train of clicks (i.e., a square wave) as the sound source. As described later, the subjects heard roughly 5 clicks per trial, because the signal-processing hardware switched the input signal on and off asynchronously.
A Polhemus 3Space Isotrack provided head position information. The Isotrack
uses electromagnetic signals to measure the relative position (azimuth, elevation and roll)
between a stationary transmitter and a receiver worn on the subject's head.
The PC, a Pentium-S based machine running at 100 MHz, controlled the signal-processing hardware and the light array and ran the experiment's software control program.
To present a source, the program randomly selected one source position from the
13 possibilities. The relative position between the selected source and the subject's head
was calculated after reading the position of the listener's head from the head tracker. The
PC instructed the signal-processing hardware to generate the appropriate binaural cues
and present them to the subject, based on these computations.
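The relative-direction computation described above can be sketched as follows. This is a hypothetical illustration rather than the study's control program: the sign convention (positive yaw means the head is turned toward positive azimuths), the wrap to [-180°, 180°), and the nearest-neighbour HRTF lookup are assumptions, and the azimuth grid of the measured HRTFs is passed in as a parameter because it is not specified here.

```python
def relative_azimuth(source_deg, head_yaw_deg):
    """Source azimuth relative to the head, wrapped to [-180, 180) degrees.

    Assumes positive yaw means the head is turned toward positive azimuths.
    """
    return (source_deg - head_yaw_deg + 180.0) % 360.0 - 180.0

def nearest_hrtf_azimuth(rel_deg, measured_deg):
    """Measured HRTF azimuth closest (on the circle) to rel_deg."""
    return min(measured_deg, key=lambda a: abs(relative_azimuth(a, rel_deg)))

def cue_for_source(source_deg, head_yaw_deg, measured_deg):
    """Return (relative azimuth, HRTF azimuth to load) for one presentation."""
    rel = relative_azimuth(source_deg, head_yaw_deg)
    return rel, nearest_hrtf_azimuth(rel, measured_deg)

if __name__ == "__main__":
    grid = list(range(-180, 180, 10))  # hypothetical 10-degree measurement grid
    print(cue_for_source(30, 4.0, grid))  # (26.0, 30)
```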
The signal-processing hardware was the System II, a signal-processing platform from Tucker-Davis Technologies. The System II consists of analog and digital interface modules permitting the synthesis of high-quality analog waveforms, including PA4 Programmable Attenuators, an HTI Head Tracker Interface, and the PD1 Power SDAC (a real-time digital filtering system). An analog-to-digital converter (ADC) digitized the input waveform, which was then filtered with the selected HRTFs. The binaural signal was then passed through the output digital-to-analog converter (DAC), which was connected to the PA4s. The programmable attenuators controlled the duration of the stimuli that the subjects received: while the attenuators were in the mute state, no sound was heard. The PA4s were switched out of the mute state for one second per trial, allowing roughly 5 clicks to be heard. The HTI permitted the computer to read the coordinates provided by the head tracker.
Figure 10 shows a block diagram of the virtual auditory environment used to
simulate the acoustic cues.
[Figure 10 diagram: head-tracker transmitter and receiver; function generator (sound source)]
Figure 10. Block diagram of the virtual auditory environment that simulated acoustic localization cues.
3.4. Adaptation Experiment with Feedback
3.4.1. Experiment Description
In each testing run, subjects had to face front (0° azimuth) while a continuous sound (click train) was presented from a random location. When the sound was turned off, they were asked to identify the location of the sound by reporting the position number (i.e., a number between 1 and 13) to the operator, who entered it on the keyboard. As soon as the answer was typed into the computer, the light at the correct source position was turned on to provide correct-answer feedback. One second after the subject's response, the next random sound was presented. All locations were presented exactly twice in each run (i.e., the locations were chosen at random without replacement). Thus, with 13 positions, 26 trials were presented in each run. Each run lasted around 3 minutes.
Finally, each run could present either normal or altered cues, determined by
selecting the appropriate set of HRTFs.
The basic experimental paradigm was similar to that used by Shinn-Cunningham (1994). Each subject performed 8 identical sessions of 40 testing runs each. In each session, the first 2 runs and the last 8 runs used normal cues, while the others used altered HRTFs. Eight sessions were necessary to obtain a sufficient number of trials to average across. All trials were assumed to be stochastically independent, even though the positions presented were chosen at random without replacement.
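The run and session structure above can be sketched as follows. The function names are hypothetical; `make_run` draws each of the 13 positions exactly twice in random order, and `session_cue_schedule` lists the cue type for the 40 runs of a session (runs 1-2 and 33-40 normal, runs 3-32 altered).

```python
import random

def make_run(num_positions=13, repeats=2, rng=None):
    """Trial order for one run: each position 1..num_positions appears
    exactly `repeats` times, in random order (sampling without replacement)."""
    if rng is None:
        rng = random.Random()
    trials = [p for p in range(1, num_positions + 1) for _ in range(repeats)]
    rng.shuffle(trials)
    return trials

def session_cue_schedule(num_runs=40, normal_before=2, normal_after=8):
    """Cue type for each run of a session (list index 0 is run 1)."""
    altered = num_runs - normal_before - normal_after
    return ["normal"] * normal_before + ["altered"] * altered + ["normal"] * normal_after

if __name__ == "__main__":
    print(make_run(rng=random.Random(0)))
    schedule = session_cue_schedule()
    print(schedule[1], schedule[2], schedule[31], schedule[32])  # normal altered altered normal
```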
Before the beginning of the experiment, subjects were informed that both normal
and altered cues would be used at different times and that the apparent location of
simulated sources using altered cues may not be their correct location. Also, the subjects
were notified every time that a change of cues was about to occur (from normal to altered
or from altered to normal), so that they would answer as accurately as possible for the
current cues.
Data from five subjects were gathered. All subjects were naive (without prior
experience in auditory localization experiments), reported normal hearing, and had no
difficulty performing the test.
3.4.2. Analysis
Bias and resolution were the two quantitative measures used to study the
performance and adaptation of each subject under these experimental conditions. Bias
measures the error in the subject's response (in units of standard deviation), describing how
well the subjects adapted to the altered cues. Resolution measures the ability to resolve
adjacent stimulus positions.
As described by Shinn-Cunningham (1994), there are three basic processing
schemes that can be used for finding estimates of the average signed error (bias) and the
response sensitivity (resolution). All schemes assume that each presentation of a physical
stimulus results in a random variable with a Gaussian distribution along some internal
decision axis. The mean of the Gaussian distribution is assumed to depend monotonically
on the source position, while its standard deviation has the same value for all positions.
Under this model, the ability to resolve two sources depends on the distance between the means of their distributions relative to the common standard deviation.
The first estimation method uses a Maximum Likelihood Estimate (MLE)
technique to find the means of the internal distributions and the placement of decision
criteria, given the confusion matrix observed.
The second method, known as the raw processing method, computes raw
estimates of bias and resolution from the means and the standard deviations of the
responses. Bias is estimated as the difference between the mean and the correct response
divided by the standard deviation. Resolution between two adjacent positions is computed
as the difference of the mean responses divided by the average of the standard deviations.
Finally, the third method, also a raw processing method, assumes that differences in response standard deviation across positions are unimportant. As a result, the standard deviations for all positions are averaged and the average is used as a constant value. Bias and resolution are then computed as in the second method.
As Shinn-Cunningham (1994) noted, the results of these three methods are very
similar, even though MLE processing is much more computationally intensive and takes
into account many factors ignored by the other methods.
Thus, method two was assumed to be adequate for analyzing the data in this
study. Accordingly, bias and resolution are given by:
$$\text{bias}(p) = \frac{m(p) - p}{\sigma(p)} \qquad (9a)$$
$$\text{resolution} = d'(p) = \frac{m(p+1) - m(p)}{\sqrt{\sigma(p+1)\,\sigma(p)}} \qquad (9b)$$
where p is the target position, and m(p) and σ(p) are the mean and the standard deviation of the responses for target position p, respectively.
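Equations (9a) and (9b) can be computed from raw identification data as sketched below. This is a hypothetical helper, not the analysis code used in the study. It assumes responses and targets are on the same numeric scale (e.g., light numbers), uses the sample standard deviation (so each target needs at least two responses, and a zero standard deviation would cause a division by zero), and uses the geometric mean √(σ(p)σ(p+1)) in the denominator of d' as one reading of the "average of the standard deviations" in the second method; substitute the arithmetic mean if that is what is intended. Setting `pooled_sd=True` gives the third method, in which one averaged standard deviation is used for every position.

```python
import math
from statistics import mean, stdev

def raw_bias_resolution(trials, pooled_sd=False):
    """Raw estimates of bias (eq. 9a) and resolution d' (eq. 9b).

    trials: iterable of (target, response) pairs on the same numeric scale.
    Returns (bias, dprime): bias[p] for each target and dprime[p] for the
    pair (p, next target).
    """
    responses = {}
    for target, response in trials:
        responses.setdefault(target, []).append(response)
    positions = sorted(responses)
    m = {p: mean(responses[p]) for p in positions}
    s = {p: stdev(responses[p]) for p in positions}  # sample standard deviation
    if pooled_sd:
        # Third method: one averaged standard deviation for every position.
        common = mean(list(s.values()))
        s = {p: common for p in positions}
    # Eq. (9a): signed error of the mean response in standard-deviation units.
    bias = {p: (m[p] - p) / s[p] for p in positions}
    # Eq. (9b): separation of adjacent means over the geometric-mean SD.
    dprime = {p: (m[q] - m[p]) / math.sqrt(s[p] * s[q])
              for p, q in zip(positions, positions[1:])}
    return bias, dprime

if __name__ == "__main__":
    toy = [(1, 1), (1, 2), (2, 2), (2, 3), (3, 3), (3, 5)]  # synthetic data
    print(raw_bias_resolution(toy))
```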
3.4.3. Expected Results
Given the results obtained by Shinn-Cunningham (1994) and the linearity of the altered cues used in this project³, the following results were expected (Figure 11). For the first run using normal cues, subjects were expected to show almost zero bias and better resolution for the center positions than for the edges. When the altered cues were first presented, resolution was expected to increase in almost all directions (with the greatest increase around 0°), because the ITDs were larger (doubled) at all positions. Because of the increase in ITDs, the slope relating mean response to correct location was expected to approximately double. This is consistent with subjects hearing sources farther to the side than their correct position. Similarly, bias was expected to be small for positions near 0° azimuth, larger for intermediate positions, and small again at the extreme edges (since subjects could not respond beyond the range of locations presented).
³ The supernormal cues used are called linear because the ITDs are approximately doubled for all positions.
[Figure 11 panels: Expected Mean Response | Expected Bias Characteristic | Expected Resolution Characteristic; x-axis: Target Position (degrees)]
Figure 11. Cartoon exaggerating the effects of adaptation for mean, bias and resolution.
- : Normal cues. - -: Altered cues. O: First presentation of normal cues. *: First presentation of altered
cues. X: Last presentation of altered cues. +: First presentation of normal cues following the last run of
altered cues.
After the 30 altered runs, adaptation was assumed to have taken place, in the sense that mean errors were expected to decrease with time. This decrease would be evident as a change in the slope relating mean response to correct location towards one, and a decrease in bias towards 0. Since the acoustic range was larger with the altered cues, the internal decision noise was assumed to grow with adaptation (Durlach and Braida, 1969; Braida and Durlach, 1972; Shinn-Cunningham, 1998). As a result, resolution was expected to decrease with time. The change in internal noise would also cause bias to decrease even further than if there had been no change in stimulus range.
Finally, results from the first normal-cue run after exposure to the supernormal cues would indicate whether subjects had really adapted to the supernormal cues or were just consciously correcting their responses based on whether they were hearing normal or altered cues. If subjects had adapted, they could not immediately turn off their remapping of localization cues (even when told that they were hearing normal cues), and mean responses were expected to show an after-effect (i.e., the slope relating mean response to location was expected to be less than one). The after-effect should also make identification performance worse after training than before, with bias nonzero and in the opposite direction from the error originally introduced by the remapping. If subjects could consciously change their responses, mean, bias, and resolution should have resembled those from the first normal-cue run.
As shown in Figure 11, the expected results are all symmetric around 0° azimuth (the mean response has odd symmetry), since there was no reason to expect any left-right asymmetry in the results. For this reason, all the results presented here are collapsed around 0° (i.e., the left and right sides were averaged).
3.4.4. Results
The data showed small differences across sessions, compared to the differences
across test runs within a particular session. Therefore, the data reported in this study were
collapsed across the eight sessions performed by each subject. The individual subject
responses were analyzed to find mean response, bias, and resolution as a function of
position for each run in the session. These statistics were then averaged across subjects,
and then further collapsed by assuming left-right symmetry, to yield the results shown.
Results from runs 2, 3, 32 and 33 were examined in detail to investigate how performance changed over the course of one session. Run 2 was the last run that used normal cues prior to the exposure to altered cues. By run 2, the subject knew what the experiment was about and should have been comfortable with the procedure. The results of this run served as a baseline or reference point for other runs because they reflected normal localization performance. Run 3 was the first run that used supernormal cues and provided a measure of the immediate effects of the transformation. After 30 runs, subjects should have adapted to the unnatural cues, and run 32, the last run with altered cues, should illustrate the final state of adaptation. Finally, run 33, the first run using normal cues after the altered-cue runs, should reveal the after-effect.
[Figure 12 plot: Mean response for group with feedback; x-axis: Target Position (degrees)]
Figure 12. Mean response and slope characteristic for the group with feedback as a function of target
position. 0: First presentation of normal cues. *: First presentation of altered cues. --: Normal slope
(diagonal). - -: Double slope.
Figure 12 shows the mean response as a function of position. The mean response
was very close to the diagonal for the first normal cues presented to the subjects, as
expected. When the altered cues were introduced, the outcome was consistent with the
transformation employed: subjects heard sources farther to the side and the slope of the
response curve was almost doubled for the center positions (the edge effect makes the
slope decrease at the borders). Repeated runs with supernormal cues led subjects to adapt, decreasing the slope of the mean response and reducing the localization error (Figure 13). However, some localization error remained because subjects could not adapt completely to the unnatural cues. Finally, when normal cues were presented once again, small localization errors were made (particularly to the sides) in the opposite direction, revealing an after-effect from the supernormal cues.
[Figure 13 plot: Mean response for group with feedback; x-axis: Target Position (degrees), 0 to 60]
Figure 13. Mean response for the group with feedback as a function of target position.
-- : Normal cues. - -: Altered cues. 0: First presentation of normal cues. *: First presentation of altered
cues. X: Last presentation of altered cues. +: First presentation of normal cues following the last run of
altered cues.
As seen in the figures, the mean response at 0° was slightly negative. This negative value was caused by a small (unintended) energy difference between the left and right ears' HRTFs, which is discussed in the next section.
Figure 14 illustrates the bias estimates for this group of subjects. As before, the
first presentation using normal cues gave reference values. The first altered cue
presentation had a large positive bias because subjects heard sources farther to the side
than with normal cues. Bias at the edges was negative because the only error that could be
made at the extreme locations was towards the center. After adaptation took place, bias
was reduced for most locations, but it was still greater than the normal bias (adaptation was not complete). Finally, when normal cues were presented again, a negative bias was
present for the lateral positions (but not for central positions).
[Figure 14 plot: Bias for group with feedback; x-axis: Target Position (degrees), 0 to 60]
Figure 14. Bias estimates for the group with feedback as a function of target position.
- : Normal cues. - -: Altered cues. 0: First presentation of normal cues. *: First presentation of altered
cues. X: Last presentation of altered cues. +: First presentation of normal cues following the last run of
altered cues.
As described before, resolution was expected to increase at almost all azimuth positions for all altered-cue runs (somewhat more at the center than at the edges of the range). Figure 15 shows that resolution increased only for the center positions and only in the first altered run (where bias had its highest value). Furthermore, the gain in resolution was smaller than the loss of resolution at the side positions. Finally, any changes in resolution between the last altered cues and the first
normal presentation after the supernormal runs are small and inconclusive. For both
normal and altered cues, there was a substantial decrease in resolution at the end of the
session (after training), compared to the beginning of the session (prior to training). This
may indicate an overall increase in variability with time, perhaps due to subject boredom.
An alternative explanation is that training with the transformation caused decreases in
resolution for both normal and altered cues.
In conclusion, the expected results were obtained in this experiment, but the magnitude of the observed changes in performance was small, perhaps because subjects adapted very quickly to the change in cues. The mean response and the bias showed all the characteristics of adaptation; however, there was only a small after-effect, perhaps because subjects learned to switch consciously between the two sets of cues. Finally, resolution decreased with training and time.
[Figure 15 plot: Resolution for group with feedback; x-axis: Target Position (degrees), 0 to 60]
Figure 15. Resolution estimates for the group with feedback as a function of target position.
- : Normal cues. - -: Altered cues. 0: First presentation of normal cues. *: First presentation of altered
cues. X: Last presentation of altered cues. +: First presentation of normal cues following the last run of
altered cues.
3.4.5. Error in Measured HRTFs
The normal HRTFs used in this experiment had a 6-dB high-frequency level mismatch between the left and right ears at all azimuths, causing the subjects to shift their answers towards the left side. While the standard deviation should not have been affected by the error, the mean response was slightly shifted to the left. Consequently, bias results also show a small negative shift. Even though mean response and bias were collapsed around 0°, this effect was not canceled out completely (especially around 0°).
[Figure 16 panels: Impulse Response Left Ear (0 degrees) | Impulse Response Right Ear (0 degrees); amplitude range ±400 mV, time axis 0 to 31.219 ms]
Figure 16. Impulse response for the left ear (top panel) and right ear (bottom panel) filters at 0° azimuth. A 6dB difference in amplitude is evident for high frequency components.
[Figure 17 panels: Frequency Response for Right/Left Ears (0 degrees); top: log magnitude, 5 dB/div; bottom: phase, 45 deg/div; frequency axis 0 Hz to 12.8 kHz]
Figure 17. Frequency response for the ratio between right and left ear filters at 0° azimuth. A 6dB difference in magnitude is evident for high frequency components (top panel).
Figure 16, the impulse response of the left and right HRTFs at 0° azimuth, shows a difference in amplitude of 50 mV, or 6 dB (i.e., 20 log10(100 mV / 50 mV)), at the high
frequency portions of the signals. This can be verified by examining Figure 17, where the
frequency response magnitude of the ratio between the right and the left ear is depicted. A
6dB difference is present for the high frequency components (roughly above 7kHz). Both
figures represent the actual outputs of the virtual auditory system (i.e., the signals that go
directly to the headphones).
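The decibel figures above follow from 20 log10 of an amplitude ratio. The sketch below (hypothetical helper names) computes that ratio and, for a pair of FIR filters, the average right-minus-left magnitude difference over a frequency band such as 7 to 12.5 kHz. The uniform frequency grid and the plain average of the dB values are assumptions made for illustration.

```python
import cmath
import math

def dtft(h, w):
    """DTFT of the finite sequence h at normalized frequency w (rad/sample)."""
    return sum(x * cmath.exp(-1j * w * n) for n, x in enumerate(h))

def amplitude_ratio_db(a, b):
    """Level of amplitude a relative to amplitude b, in dB."""
    return 20.0 * math.log10(a / b)

def band_level_difference_db(h_right, h_left, fs, f_lo, f_hi, num_freqs=32):
    """Average right-minus-left magnitude difference (dB) over f_lo..f_hi Hz.

    Frequencies are spaced uniformly (num_freqs >= 2) and the dB values are
    averaged directly.
    """
    total = 0.0
    for k in range(num_freqs):
        f = f_lo + (f_hi - f_lo) * k / (num_freqs - 1)
        w = 2 * math.pi * f / fs
        total += amplitude_ratio_db(abs(dtft(h_right, w)), abs(dtft(h_left, w)))
    return total / num_freqs

if __name__ == "__main__":
    print(round(amplitude_ratio_db(100, 50), 2))  # 6.02
```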
3.5. Adaptation Experiment without Feedback
3.5.1. Experiment Description
This experiment was similar in all respects to the previous one, except that no
feedback was given to the subjects, and that the trial structure was more tightly
controlled. In this experiment each run of 26 trials was broken into subruns of 13 trials
each to allow detailed analysis of the speed with which adaptation occurred.
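The text does not state exactly how trials were assigned to the 13-trial subruns. One reading of "more tightly controlled" is that each subrun presents every position once; the sketch below (hypothetical function name) builds a 26-trial run under that assumption.

```python
import random

def make_run_with_subruns(num_positions=13, num_subruns=2, rng=None):
    """A run made of subruns, each a fresh random permutation of all positions."""
    if rng is None:
        rng = random.Random()
    run = []
    for _ in range(num_subruns):
        subrun = list(range(1, num_positions + 1))
        rng.shuffle(subrun)
        run.append(subrun)
    return run

if __name__ == "__main__":
    for subrun in make_run_with_subruns(rng=random.Random(1)):
        print(subrun)
```

Compared with shuffling all 26 trials at once, this ordering guarantees that every position has been presented once after each 13 trials, so performance can be summarized per subrun.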
As in the previous test, subjects were informed before the beginning of the
experiment that normal and altered cues would be used and that the apparent location of
sources using the altered cues might not be the correct location. As before, the subjects
were reminded every time before a change of cues would occur (from normal to altered or
from altered to normal), so that they would answer as accurately as possible for the
current cues.
The purpose of this experiment was to see if removing explicit feedback would
slow down the subjects' adaptation process, allowing our measurements to better capture
the changes in performance over time. Adaptation was expected to occur even without
feedback as subjects adjusted to the larger-than-normal cue range and learned to map the
cues to the range of available responses. For example, if, when using altered HRTFs, they heard a source that appeared outside the possible range of response locations (i.e., outside -60° to +60°), the subjects would learn to adjust their responses to map the cue range to the response range.
Data from five subjects were collected, none of whom participated in the first experiment. All subjects were naive, reported normal hearing, and had no difficulty
performing the test.
3.5.2. Expected Results
The overall pattern of results expected here was the same as for the previous
experiment, except that changes were expected to occur more slowly. Since adaptation
took place very quickly in the previous experiment, only 10 runs were used here: 2 normal, 7
altered, and 1 normal. The shorter session length also prevented subjects from getting bored
or distracted toward the end of the experiment.
3.5.3. Results
Runs 2, 3, 9 and 10 were analyzed to investigate how performance changed over
the course of one session. As before, run 2 was the last run that used normal cues prior to
the exposure to altered cues and provided a baseline against which other runs could be
compared. Run 3 provided a measure of the immediate effects of the supernormal cues.
Run 9 showed how subjects adapted to the transformed cues after exposure. Finally, run
10 showed any after-effect caused by the exposure to altered cues.
Figure 18 shows the mean response as a function of position. As in the previous
experiment, the mean response was very close to the diagonal for the first normal cues
presentation. Results after the first presentation of altered cues indicate that subjects
heard sources farther to the side with the altered cues, with the slope of the response curve
roughly doubled for the center positions. After seven runs with supernormal cues, the
error in mean response decreased, but subjects did not adapt completely (Figure 19). An
after-effect showing localization errors in the opposite direction appeared when normal
cues were presented in run 10.
[Plot: Mean response for group with no feedback; x-axis: Target Position (degrees)]
Figure 18. Mean response and slope characteristic for the group with no feedback as a function of target
position. O: First presentation of normal cues. *: First presentation of altered cues. Dotted line: normal slope
(diagonal). - -: double slope.
[Plot: Mean response for group with no feedback; x-axis: Target Position (degrees)]
Figure 19. Mean response for the group with no feedback as a function of target position.
-: Normal cues. - -: Altered cues. O: First presentation of normal cues. *: First presentation of altered
cues. X: Last presentation of altered cues. +: First presentation of normal cues following the last run of
altered cues.
Figure 20 illustrates the bias estimates for this experiment. The first presentation
of altered cues resulted in significant errors in the expected direction (i.e., positive bias,
indicating that subjects heard sources farther to the side). Some adaptation took place as
indicated by the decrease of bias at the end of the altered runs. Again, it was clear that
subjects did not entirely overcome the errors introduced by the transformation. Finally,
when normal cues were presented again, a distinct negative bias was present at all
positions.
[Plot: Bias for group with no feedback; x-axis: Target Position (degrees), 0 to 60]
Figure 20. Bias estimates for the group with no feedback as a function of target position.
-: Normal cues. - -: Altered cues. O: First presentation of normal cues. *: First presentation of altered
cues. X: Last presentation of altered cues. +: First presentation of normal cues following the last run of
altered cues.
In this experiment, as in the previous one, resolution was enhanced in the central
positions after the supernormal cues were introduced (Figure 21). However, resolution for
the outside positions was very poor, presumably due to the fact that at the edges the cues
presented are beyond the normal range. In other words, because cues change with position
at two times the normal rate, only the center positions will give rise to natural ITDs. In
general, relatively little change in resolution is seen for either normal or altered cues. It
appears that the effects of training are very small, comparable to the random variation in
the estimates due to stimulus uncertainty.
[Plot: Resolution for group with no feedback; x-axis: Target Position (degrees), 0 to 60]
Figure 21. Resolution estimates for the group with no feedback as a function of target position.
-: Normal cues. - -: Altered cues. O: First presentation of normal cues. *: First presentation of altered
cues. X: Last presentation of altered cues. +: First presentation of normal cues following the last run of
altered cues.
In order to better understand the rate at which changes occur, the subruns
were compared to see whether adaptation was rapid enough for differences
between the first and second halves of a run to be observed. Figure 22 illustrates
this adaptation rate by comparing the first 13 trials and the last 13 trials of runs 3 and 5,
where altered cues were used. The figure shows little difference
between the two halves of a run, especially for mean response and bias,
implying that the change within one run was small. Subjects probably made
unconscious use of the array of response positions in front of them (Figure 9). For
example, the edges of the array could help them determine that the range of stimuli for altered
cues was larger than normal, but that this stimulus range still had to be mapped to
localization responses between -60° and +60°. It is not clear why resolution
shows some apparent differences between the two halves, while mean response and bias do
not.
[Plots: Mean response, Bias, and Resolution for group with no feedback, each for run 3 (left) and run 5 (right); x-axis: Target Position (degrees), 0 to 60]
Figure 22. Mean response, bias, and resolution estimates for the group with no feedback as a function of
target position. Left panels show the estimates for the first run using altered cues (run 3), while the right
panels show the estimates for the third run using altered cues (run 5). O: First presentation of normal
cues. *: First 13 trials of presentation with altered cues. X: Last 13 trials of presentation with altered cues.
This experiment demonstrated that feedback accelerates the adaptation process for
supernormal cues, but that explicit feedback is not required for adaptation to occur. With
exposure to a different set of stimuli, subjects learn to map the new stimulus set to the
available responses. Resolution with the new double-head stimuli is enhanced at the
center positions compared to normal cues, but is worse at the edges, implying that
subjects cannot use cues outside the normally-experienced range very effectively. Finally,
as subjects adapt, there is little change in resolution with this transformation, either for
normal or altered cues. This implies that boredom, not a change in internal noise, may be
the cause of the decrease in resolution seen in the first experiment.
4. JUST NOTICEABLE DIFFERENCE
4.1. Motivation
The use of supernormal acoustic cues increased resolution only in the
neighborhood of 0° in azimuth, while making it very poor toward the sides. This result
was expected for the nonlinear transformation (Durlach, Shinn-Cunningham, and Held,
1993), but not for the double-size head simulation, in which the ITDs were doubled. The
following experiment was performed to help understand these discrepancies and
to develop an improved model of adaptation to double-size head cues.
4.2. Background
The resolution of the auditory localization system is measured in terms of the just
noticeable difference (JND) in azimuth, also known as the minimum audible angle
(MAA). The minimum audible angle is defined as the smallest detectable difference
between the azimuths of two identical sources of sound (Mills, 1958). The minimum
audible angle around 0° azimuth is small (about 1°) for low-frequency sinusoids, large for
intermediate-frequency sinusoids between 1500 and 2000 Hz, and small again for high-frequency
sinusoids. Resolution is poorer for sources off to the side, with the MAA
increasing with the magnitude of the reference azimuth. For tones between 1500 and
2000 Hz and azimuths of more than 45°, the JND is indeterminately large (Mills, 1972).
Figure 23 shows some typical curves of the MAA as a function of frequency for several
azimuths. More information on the minimum audible angle can be found in Mills (1958
and 1963).
[Plot: MAA versus FREQUENCY (Hz), 200 to 10,000 Hz, for several source azimuths]
Figure 23. Just Noticeable Difference (JND) or Minimum Audible Angle (MAA) between successive
pulses of tone as a function of the frequency of the tone and the direction of the source.
Symbols denote source azimuths of 0°, 30°, 60°, and 75° (from Mills, 1963).
4.3. New HRTFs
In the JND paradigm used here, subjects must indicate whether two successive
sounds come from the same or different directions. The HRTFs measured by Wightman
and Kistler (1989) and used in previous experiments could not be used directly here because they
are measured with a spatial resolution of 10°. A normal JND value for the center positions
is roughly 1°, indicating a need for HRTFs with at least 1° resolution.
To overcome the coarse spatial resolution of the measured HRTFs, it was necessary to
spatially interpolate them to approximate HRTFs at 1° spacing.
Even though simple linear interpolation seemed a reasonable approach, it produces spectral
discrepancies because of the differences in phase between the filters
(Wenzel and Foster, 1993).
A method for minimizing such effects was suggested by Kistler and Wightman
(1991), in which a minimum-phase approximation of the measured HRTFs is used. The
method assumes that HRTFs are minimum-phase functions (Oppenheim and Schafer,
1975) and that the ITD for each source position has a constant value (i.e., the frequency
dependence of the ITD is perceptually unimportant), an assumption confirmed by Kulkarni (1995).
The first step in computing the approximation is to obtain the minimum-phase filters
from the original HRTFs. By definition, a minimum-phase filter has the same magnitude
as the original filter, and its phase is given by the Hilbert
transform of the log-magnitude of the original filter. To obtain the magnitude of the new
filters at a finer resolution, the magnitudes of the original filters are interpolated. The
appropriate interaural delay (ITD) is obtained by finding the lag at which the
cross-correlation function of the impulse responses of the HRTF pair reaches its maximum. After interpolating
the delay, half of it is added to the left filter and half is subtracted from the right
one. Equations 10a through 10e describe the minimum-phase
interpolation for a 5° HRTF.
In this example, H_L(ω) and H_R(ω) are the frequency
responses and h_L[n] and h_R[n] are the impulse responses of the pair of original HRTFs
(subscripts 0° and 10° denote the measured positions, and mp denotes a minimum-phase filter), ℋ
is the Hilbert transform, and max[xcorr(a, b)] denotes the lag at which the cross-correlation
function of a and b is maximum.
ITD resolution is bounded by the sampling rate of the HRTFs. In this study, the
normal HRTFs were sampled every 20 μs; therefore, the ITD resolution was 20 μs, which
could be too coarse for HRTFs with 1° resolution. For this reason, the normal HRTFs were
upsampled to achieve better ITD resolution.
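As a rough check of why a 20 μs sampling period can be too coarse, the sketch below assumes the spherical-head ITD approximation that appears in Equation 13, ITD(θ) = 255 μs · (θ + sin θ) with θ in radians, and compares the ITD change produced by a 1° step with the sampling period. The upsampling factor it reports is only the smallest integer that brings the sample period down to that ITD step. It is an illustration, not the factor used in this study, and the helper names are hypothetical.

```python
import math

ITD_SCALE_US = 255.0     # constant of the spherical-head ITD formula (assumed, see Eq. 13)
SAMPLE_PERIOD_US = 20.0  # HRTF sampling period quoted in the text

def itd_us(azimuth_deg):
    """Spherical-head ITD approximation, in microseconds."""
    theta = math.radians(azimuth_deg)
    return ITD_SCALE_US * (theta + math.sin(theta))

def itd_step_us(azimuth_deg, step_deg=1.0):
    """ITD change produced by moving the source by step_deg from azimuth_deg."""
    return itd_us(azimuth_deg + step_deg) - itd_us(azimuth_deg)

def min_upsampling_factor(step_us, period_us=SAMPLE_PERIOD_US):
    """Smallest integer factor that makes the sample period <= step_us."""
    return math.ceil(period_us / step_us)

step = itd_step_us(0.0)  # ITD change for a 1° step at the midline
print(round(step, 1), min_upsampling_factor(step))
```

Near the midline a 1° step changes the ITD by only about 9 μs, less than half of one 20 μs sample. With the altered (double-size) cues the step is twice as large.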
New normal and altered HRTFs were created and gave acceptable performance
at 1° resolution.
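$$H_{mpL,0^\circ}(\omega) = \left|H_{L,0^\circ}(\omega)\right| e^{\,j\,\mathcal{H}\{\log|H_{L,0^\circ}(\omega)|\}}, \qquad H_{mpR,0^\circ}(\omega) = \left|H_{R,0^\circ}(\omega)\right| e^{\,j\,\mathcal{H}\{\log|H_{R,0^\circ}(\omega)|\}} \tag{10a}$$

$$H_{mpL,10^\circ}(\omega) = \left|H_{L,10^\circ}(\omega)\right| e^{\,j\,\mathcal{H}\{\log|H_{L,10^\circ}(\omega)|\}}, \qquad H_{mpR,10^\circ}(\omega) = \left|H_{R,10^\circ}(\omega)\right| e^{\,j\,\mathcal{H}\{\log|H_{R,10^\circ}(\omega)|\}} \tag{10b}$$

$$H_{mpL,5^\circ}(\omega) = \frac{H_{mpL,0^\circ}(\omega) + H_{mpL,10^\circ}(\omega)}{2}, \qquad H_{mpR,5^\circ}(\omega) = \frac{H_{mpR,0^\circ}(\omega) + H_{mpR,10^\circ}(\omega)}{2} \tag{10c}$$

$$\text{delay} = \frac{\max[\mathrm{xcorr}(h_{L,0^\circ}[n],\, h_{R,0^\circ}[n])] + \max[\mathrm{xcorr}(h_{L,10^\circ}[n],\, h_{R,10^\circ}[n])]}{2} \tag{10d}$$

$$H_{L,5^\circ}(\omega) = H_{mpL,5^\circ}(\omega)\, e^{-j\omega\,\text{delay}/2}, \qquad H_{R,5^\circ}(\omega) = H_{mpR,5^\circ}(\omega)\, e^{+j\omega\,\text{delay}/2} \tag{10e}$$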
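The interpolation steps of Equations 10a to 10e can be sketched with NumPy as follows. Several choices here are assumptions:

- The minimum-phase filter is built with the standard real-cepstrum (homomorphic) method, which is equivalent to taking the phase from the Hilbert transform of the log-magnitude.
- The magnitudes of the two neighbouring HRTFs are averaged before the minimum-phase step. This follows the prose description rather than the complex average written in Equation 10c.
- The ITD is taken as the integer lag of the cross-correlation peak.
- A common bulk delay `bulk` is added so that the advanced channel stays causal. This is a hypothetical parameter, not part of Equation 10e.

All function names are illustrative.

```python
import numpy as np

def min_phase_spectrum(mag, n_fft):
    """Minimum-phase spectrum on rfft bins with magnitude `mag` (real-cepstrum method)."""
    log_mag = np.log(np.maximum(mag, 1e-12))   # guard against log(0)
    cep = np.fft.irfft(log_mag, n_fft)         # real cepstrum (real and even)
    fold = np.zeros(n_fft)
    fold[0] = 1.0
    fold[1:n_fft // 2] = 2.0                   # fold anti-causal part onto causal part
    fold[n_fft // 2] = 1.0                     # n_fft is assumed to be even
    return np.exp(np.fft.rfft(cep * fold))

def itd_lag(h_left, h_right):
    """Lag in samples of the cross-correlation peak; > 0 means the left ear lags."""
    xc = np.correlate(h_left, h_right, mode="full")
    return int(np.argmax(xc)) - (len(h_right) - 1)

def interpolate_midpoint(hl_a, hr_a, hl_b, hr_b, n_fft=256, bulk=0.0):
    """HRTF pair halfway between measured pairs a and b (after Eqs. 10a to 10e)."""
    def mag(h):
        return np.abs(np.fft.rfft(h, n_fft))
    # Average the magnitudes, then rebuild minimum-phase filters (assumption, see text).
    hmp_l = min_phase_spectrum(0.5 * (mag(hl_a) + mag(hl_b)), n_fft)
    hmp_r = min_phase_spectrum(0.5 * (mag(hr_a) + mag(hr_b)), n_fft)
    delay = 0.5 * (itd_lag(hl_a, hr_a) + itd_lag(hl_b, hr_b))   # Eq. 10d, in samples
    w = 2.0 * np.pi * np.arange(n_fft // 2 + 1) / n_fft          # rad/sample
    # Eq. 10e: half the delay on each ear, plus a common bulk delay for causality.
    h_l = np.fft.irfft(hmp_l * np.exp(-1j * w * (bulk + delay / 2)), n_fft)
    h_r = np.fft.irfft(hmp_r * np.exp(-1j * w * (bulk - delay / 2)), n_fft)
    return h_l, h_r
```

In this sketch a fractional delay is applied as a linear phase on the FFT grid, which is only approximate at the Nyquist bin. Integer delays are reproduced exactly.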
4.4. Equipment and Experimental Setup
The JND experiment used an auditory virtual environment, identical to that used in
previous experiments, to present acoustic cues to subjects, except that the head
tracker was not used. Figure 24 shows a block diagram of the system; the PC and
the signal processing device were the same ones used before.
[Block diagram: PC (main control); Wave Generator (Gaussian Noise)]
Figure 24. Block diagram of the virtual auditory environment that simulated acoustic localization cues for
the JND experiment.
A description of the PC and signal processing device can be found in section 3.3.
The PC controlled the program and decided which HRTF was appropriate to use.
This information was transmitted to the signal processing device, which filtered the input
waveform with the specified HRTF. The filtered signal was then presented to the subject
through headphones. For this experiment, the stimuli were rectangularly gated Gaussian
noise pulses of 0.2-s duration, produced by a WG 1 Waveform Generator.
4.5. Experiment Description
In each trial of the JND test, three consecutive noise pulses were presented in one
of the following orders, selected randomly:
(i) RP, LEFT, RP,
(ii) RP, RP, LEFT,
where RP stands for a pulse simulated at the reference position and LEFT for a
pulse simulated at a position to the left of the reference position by a small increment.
Subjects had to determine which interval (second or third) contained the sound located to the left
of the first sound.
In order to disguise any overall spectral level difference between positions that
could serve as an acoustic cue to subjects, the intensity of each stimulus was randomized
over a range of 15dB.
The experiment used the transformed up-down method for estimating JND values
described by Levitt (1971). This method employs an adaptive procedure in which the
position increment in the LEFT trial is determined by the prior stimuli and responses. The
objective of this forced-choice experiment was to determine the MAA (defined as the
change in location that yields 70.7% correct performance) for simulated sources using
both normal and altered HRTFs. Accordingly, to make the data converge to the 70.7%
level, the following strategy was used:
(i) If the answer is incorrect, increase the angle between RP and LEFT by 1°.
(ii) If there are two correct answers in a row, decrease the angle between RP and
LEFT by 1°.
When the increment presented in a run is tracked, peaks (reached after several correct
answers, when the angle between RP and LEFT is smallest) and valleys
(reached after incorrect answers, when the angle between RP and LEFT is greater) are observed.
The average of the peak and valley angles yields an estimate of the JND (i.e., the
70.7% level). In the experiment, the first three reversals (i.e., changes from correct to
incorrect answers or from incorrect to correct) are discarded, and the MAA is estimated
from the data of the next seven reversals.
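The tracking rule and reversal averaging described above can be sketched as follows. Some details are assumptions not stated in the text:

- the starting increment (`start`);
- the floor that keeps the increment at one step or more;
- the convention of recording the increment on the trial where the track changes direction.

`respond` stands for the listener's answer on a trial and is hypothetical.

```python
def track_jnd(respond, start=10.0, step=1.0, n_discard=3, n_keep=7, max_trials=1000):
    """Two-down/one-up track (Levitt, 1971) that converges near 70.7% correct.

    respond(increment) -> True if the listener answered correctly.
    Returns the mean increment over the kept reversals.
    """
    inc = start
    n_correct = 0
    direction = 0            # -1 while the increment is falling, +1 while rising
    reversals = []
    for _ in range(max_trials):
        if respond(inc):
            n_correct += 1
            if n_correct < 2:
                continue     # need two correct answers in a row before stepping down
            n_correct = 0
            if direction == +1:
                reversals.append(inc)        # track turned from rising to falling
            direction = -1
            inc = max(inc - step, step)      # hypothetical floor: never below one step
        else:
            n_correct = 0
            if direction == -1:
                reversals.append(inc)        # track turned from falling to rising
            direction = +1
            inc += step
        if len(reversals) >= n_discard + n_keep:
            break
    kept = reversals[n_discard:n_discard + n_keep]
    if not kept:
        raise RuntimeError("not enough reversals")
    return sum(kept) / len(kept)
```

Take a deterministic listener who is correct whenever the increment is at least 3.5°. The track then oscillates between 3° and 4°, and the seven kept reversals average 25/7 ≈ 3.57°.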
An experimental session consisted of four tests using different reference positions
(0°, 10°, 30°, and 60°). The order of the four tests was chosen randomly. Each test was
done twice in each session, once with normal HRTFs and once with altered HRTFs. Each
session lasted approximately 40 minutes.
Each subject performed three sessions. Consequently, for each subject there were
three JND values for each of the four reference positions using normal cues, and three
JND values for each reference position using altered cues.
Prior to the beginning of the experiment, subjects were informed that normal and
altered cues would be used. They were notified every time before a change of cues
occurred (from normal to altered or from altered to normal), so that they would answer as
accurately as possible for the current cues. Additionally, they were told that the intensity
of the sounds was random, and therefore that it should not be used as a localization cue;
however, they were free to use whatever cue they found useful to do the task based on the
feedback provided. Finally, they were informed that some trials might be harder than
others and that feedback would indicate the correct answer. At the beginning of the
first session, subjects completed a training session to become familiar with the test
procedure.
Data from six subjects were gathered (none of whom performed the adaptation
experiments). All subjects were naive, reported normal hearing, and had no difficulty
performing the test.
4.6. Expected Results
The JND results for the normal cues should be roughly similar to the ones found
by Mills (1963), shown in Figure 23. However, they might be somewhat larger due to the
HRTF approximations used, limitations of the virtual environment, differences in the
stimuli, or other procedural changes.
Given the results of the previous experiments with altered cues, the JND was expected
to improve at the center positions and to get worse as the angle
increases. This could be explained by the unnatural ITDs that were presented to the
subjects, who were expected to be unable to distinguish sources whose ITDs were larger than
those normally received. Figure 25 sketches the expected results for JND.
[Plot: Expected JND versus Reference Position (degrees)]
Figure 25. Expected JND. O: Using normal cues. *: Using altered cues.
4.7. Results
The data gathered were averaged across sessions and across subjects to find the
mean value for each reference position (0°, 10°, 30°, and 60°). The least-square-error
quadratic polynomials that fit the mean values are:

$$\begin{aligned}
\text{Normal JND} &= 0.0011\,|\theta|^2 + 0.1417\,|\theta| + 4.7579\\
\text{Altered JND} &= 0.0033\,|\theta|^2 + 0.1453\,|\theta| + 3.5532
\end{aligned} \qquad \text{for } |\theta| \le 60^\circ \tag{11}$$
where θ represents the reference position in degrees. Figure 26 shows that these polynomial
representations have the same characteristics as the expected results (Figure 25). A
smaller JND was obtained around the center positions when altered cues were used.
This implies that better resolution should be achieved at these positions, since resolution
is inversely proportional to the JND. Additionally, the JND for
altered cues increases at the edges, indicating poorer resolution. Figure 26 shows that
even though altered cues are easier to discriminate at the center positions, normal cues are
easier to discriminate over a larger range of positions (from roughly 20° outward).
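The claim that normal cues become easier to discriminate beyond roughly 20° can be checked directly from the coefficients of Equation 11 by finding where the two quadratics cross. The sketch below does only that arithmetic. The helper names are illustrative, and the fits are meaningful only for |θ| ≤ 60°.

```python
import math

# Coefficients (a, b, c) of JND = a*|θ|^2 + b*|θ| + c from Equation 11 (θ in degrees).
NORMAL = (0.0011, 0.1417, 4.7579)
ALTERED = (0.0033, 0.1453, 3.5532)

def jnd(coeffs, theta_deg):
    a, b, c = coeffs
    t = abs(theta_deg)
    return a * t * t + b * t + c

def crossover(c1, c2):
    """|θ| > 0 at which the two quadratic fits give the same JND."""
    a, b, c = (y - x for x, y in zip(c1, c2))   # coefficients of jnd(c2) - jnd(c1)
    return (-b + math.sqrt(b * b - 4 * a * c)) / (2 * a)

for theta in (0, 10, 30, 60):
    print(theta, round(jnd(NORMAL, theta), 2), round(jnd(ALTERED, theta), 2))
print(round(crossover(NORMAL, ALTERED), 1))
```

With these coefficients the two curves cross at about 22.6°, consistent with the "roughly 20°" stated above.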
[Plot: Mean JND versus Reference position (degrees)]
Figure 26. Mean JND. - -: Using normal cues, where O denotes the actual data points.
- -: Using altered cues, where * denotes the actual data points.
It is important to notice that at 0° both data sets (normal and altered) have a larger JND
than at 10°. This may be a consequence of the HRTF error explained in section
3.4.5, which causes positions to shift slightly to the left.
5. MODEL OF ADAPTATION
A model of adaptation to the larger-than-normal auditory localization cues is
developed to make quantitative predictions of the rate at which subjects adapt to the
supernormal cues. The model used is based on the one introduced by Shinn-Cunningham
et al. (1998b).
5.1. Remapping Function
Doubling the size of the head obviously doubles the ITD that arises for a source
at a fixed position. However, in order to relate the previous adaptation model to the
double-head conditions, we must develop a quantitative description of where the double-head
source normally would be perceived by a naive subject (i.e., what source position
normally gives rise to the ITDs resulting from a source at θ reaching the enlarged head).
Let θ denote the azimuth of a sound source reaching the enlarged head, and let θ' denote
the location that gives rise to the same physical cues for a normal-size head. Before any
adaptation has taken place, a source reaching the enlarged head from position θ will be
perceived at position θ'. θ and θ' are assumed to be related by some
function f such that:

$$f(\theta) = \theta' \tag{12}$$

Because of the discrepancy between θ and θ' described by the function f, the supernormal
cues will introduce an initial bias.
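The relationship between θ and θ' can be derived from the mathematical
approximation for ITD (Equation 2). As explained before, the effects of IID and the
spectral shaping of the pinnae are ignored, since ITD is the dominant cue when a
broadband stimulus is presented. In Equation 13 the angles are in radians:

$$\begin{aligned}
2\cdot\text{Normal ITD} &= \text{Altered ITD}\\
2\cdot 255\,\mu\text{s}\cdot(\theta + \sin\theta) &= 255\,\mu\text{s}\cdot(\theta' + \sin\theta')\\
2\theta + 2\sin\theta &= \theta' + \sin\theta'
\end{aligned} \tag{13}$$

Solving Equation 13 numerically and fitting the solution with a cubic polynomial (angles in degrees) gives:

$$f(\theta) = 0.0008\,|\theta|^3 - 0.0111\,|\theta|^2 + 2.1115\,|\theta| - 0.2106 \qquad \text{for } |\theta| < 40^\circ \tag{14}$$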
This function is shown in Figure 27.
[Plot: f(θ) versus θ (degrees), 0 to 60]
Figure 27. Remapping function. - -: Normal cues (i.e., f(θ) = θ). -: Altered cues (i.e., f(θ) = θ').
The remapping function is defined between -40° and 40° because outside this
range altered ITDs are larger than those normally heard. As expected, for small angles
the remapping function is approximately f(θ) = 2θ, since sin θ ≈ θ. For larger angles,
sin θ deviates from a straight line and f(θ) becomes greater than 2θ. Finally, θ = 40° is the upper limit
for this function (it maps to θ' = 90°), since it is unclear how to predict what location will
be perceived for sources whose ITDs are larger than those that occur in normal listening conditions.
One possible way of handling these extreme values is to assume that they are all heard at
90°.
Figure 27 shows how altered HRTFs should affect the JND at azimuths smaller than
40°. The remapping creates a larger-than-normal difference in physical cues between two
sources at different positions, as seen by comparing the normal remapping function
(f(θ) = θ) with the double-head function at all positions.
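A direct numerical solution of Equation 13 can be sketched with bisection. Bisection works here because θ' + sin θ' increases monotonically between 0° and 90°. This sketch does not reproduce the cubic fit of Equation 14; it solves the idealized equation itself. It also follows the option mentioned above of assigning 90° to sources whose doubled ITD exceeds the normal range. Under this idealized formula, the 90° limit is reached near 38°, slightly inside the 40° bound quoted for Equation 14. The function name `remap` is illustrative.

```python
import math

def remap(theta_deg, clip=True):
    """Solve 2(θ + sin θ) = θ' + sin θ' (Eq. 13) for θ' in degrees by bisection.

    Sources whose doubled ITD reaches the largest normal ITD (at 90°) are
    assigned ±90° when clip=True (one option mentioned in the text), else None.
    """
    th = math.radians(abs(theta_deg))
    target = 2.0 * (th + math.sin(th))

    def g(x):
        return x + math.sin(x)            # increasing on [0, π/2]

    if target >= g(math.pi / 2):
        return math.copysign(90.0, theta_deg) if clip else None
    lo, hi = 0.0, math.pi / 2
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if g(mid) < target:
            lo = mid
        else:
            hi = mid
    return math.copysign(math.degrees(0.5 * (lo + hi)), theta_deg)

for theta in (10, 20, 30, 38):
    print(theta, round(remap(theta), 1))
```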
5.2. Average Perceived Position
Even though f(θ) describes the remapping produced by the altered HRTFs, it does not account
for how subjects adapt over runs. The average perceived
position p (or mean response; results are plotted in Figures 13 and 19) reflects changes in
response due to adaptation during the experiments. Therefore, the average perceived
position (in degrees) is modeled as a function of the run r and of the remapping f(θ):

$$p[f(\theta), r] = k(r)\cdot f(\theta) \tag{15}$$

Here k(r) represents the slope of the line relating average response to the normal position f(θ).
It describes the adaptive state of the subject during run r.
Estimates of the slope k(r) were calculated by finding the least-square-error
solution to Equation 15 and then averaging across subjects. However, because edge
effects would cause errors in the slope estimates and because the remapping
function is defined only up to 40°, only the middle 7 positions were used (i.e., between -30°
and 30°).
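Because Equation 15 has no intercept, the least-square-error slope for one subject is k = Σ f(θ)·p / Σ f(θ)². The group value is then the mean of the per-subject slopes. The sketch below assumes the middle seven positions are spaced 10° apart. It uses the small-angle approximation f(θ) ≈ 2θ to build a hypothetical, fully adapted response set. It is an illustration, not the analysis code used in the study.

```python
def fit_slope(f_vals, responses):
    """Least-squares k for responses ≈ k * f(θ) (line through the origin, Eq. 15)."""
    num = sum(f * p for f, p in zip(f_vals, responses))
    den = sum(f * f for f in f_vals)
    return num / den

def mean_slope(per_subject):
    """Average of per-subject slopes; per_subject is a list of (f_vals, responses)."""
    slopes = [fit_slope(f, p) for f, p in per_subject]
    return sum(slopes) / len(slopes)

positions = [-30, -20, -10, 0, 10, 20, 30]   # middle 7 positions (10° spacing assumed)
f_altered = [2 * t for t in positions]       # small-angle approximation f(θ) ≈ 2θ
fully_adapted = list(positions)              # hypothetical responses equal to θ
print(fit_slope(f_altered, fully_adapted))
```

With fully adapted responses (p = θ) and f(θ) = 2θ the fit returns 0.5, the value predicted by Equation 17.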
Figure 28 (group with feedback) and Figure 29 (group with no feedback) show the
mean responses and the best-fit slope as a function of the transformed source position. In
both cases, the slope estimate is near 1.0 when normal cues are presented, and around 1/2
with altered cues. Also, the slope decreases between the first run with altered cues and the
last one, reflecting the adaptation process. As mentioned before, several positions fell
outside the range of possible responses. Even though they were not used in the
estimation of k(r), they are still shown in the figures, plotted assuming that they
are heard at 90°.
Figures 28 and 29 also demonstrate that the linearity assumed in Equation 15 is
valid, since the R² values are very close to 1.0 for every run.
For the initial normal-cue runs, the slope estimate should be equal to 1.0, since the
average perceived position is expected to equal the presented position for a naive
subject. For normal cues f(θ) = θ, as expressed in Equation 16:

$$\begin{aligned}
p[f(\theta), r] &= k(r)\cdot f(\theta) = \theta\\
p[\theta, 1] &= k(1)\cdot\theta = \theta\\
\Rightarrow\ k(1) &= 1
\end{aligned} \tag{16}$$

When altered cues are presented, the slope should decrease to 1/2 as subjects adapt.
Since perfect adaptation is given by p[f(θ), r] = θ, one has:

$$\begin{aligned}
p[f(\theta), r] &= k(r)\cdot f(\theta) = \theta\\
\text{assuming } f(\theta) \approx 2\theta:\quad k(r)\cdot 2\theta &= \theta\\
\Rightarrow\ k(r) &= \tfrac{1}{2}
\end{aligned} \tag{17}$$
[Plots: Group with feedback, one panel per run; recoverable values: run 2, R² = 0.994, k(2) = 0.961; run 3, R² = 0.989, k(3) = 0.591; x-axis: Transformed position θ' (degrees)]
Figure 28. Mean response (o) and best-fit slope (-) for the group with feedback. Runs 2 and 33 use
normal cues (θ' = θ), while runs 3 and 32 use altered cues. R² is the square of the regression coefficient.
[Plots: Group with no feedback, one panel per run; recoverable values: run 9, R² = 0.995, k(9) = 0.561; run 10, R² = 0.999, k(10) = 0.859; x-axis: Transformed position θ' (degrees)]
Figure 29. Mean response (o) and best-fit slope (-) for the group with no feedback. Runs 2 and 10 use
normal cues (θ' = θ), while runs 3 and 9 use altered cues. R² is the square of the regression coefficient.
[Plot: Slope estimate versus Run r, 0 to 40]
Figure 30. Average least-square-error slope as a function of run. O: Group with feedback. *: Group with
no feedback.
Figure 30 shows the best-fit slope for all the runs in both experimental
groups. It illustrates that the slope decreases dramatically when altered cues are first
introduced (run 3) and then decreases slowly as subjects adapt (runs 3 through 32 for the
group with feedback, and runs 3 through 9 for the group with no feedback). It also shows
similar characteristics across both groups. The difference in slope between runs 1 and 2
shows that the group without feedback adapted to the normal cues, while the other group
adapted immediately with the aid of the correct-answer feedback. Figure 30 also shows
that the group with feedback achieved stable performance with the altered cues, while the
second group, with fewer runs, had not yet reached an asymptotic level of performance. In
both experiments, the slope begins near 1.0, as expected for unadapted subjects using
normal cues. When altered cues are imposed, the average slope changes rapidly to a value
around 0.6. The slope then decreases to a value around 0.5 as subjects adapt, with the
largest changes taking place during the first runs with altered cues. This rate of change in
slope can be modeled by an exponential equation given by:
$$k(r) = T - (T - K_0)\, e^{-b r} \tag{18}$$

where T is the estimate of the slope asymptote, K0 is the estimate of the initial slope of the
subject's responses, r is the run number (normalized so that the first run with altered cues
has value 1), and b is a parameter that controls the rate of adaptation.
The asymptote T was found as the mean of the last slope estimates using
altered cues (the last 3 for the group with feedback and the last 2 for the other group). The
initial slope K0 was set to the average of the first two normal runs. Finally,
the adaptation rate for each group was estimated by finding the least-square-error solution
to Equation 18. The results are summarized by Equation 19 and Figures 31 and 32.
$$\begin{aligned}
k(r) &= 0.53 - (0.53 - 0.98)\, e^{-0.86 r} && \text{for the group with feedback}\\
k(r) &= 0.55 - (0.55 - 1.05)\, e^{-0.87 r} && \text{for the group with no feedback}
\end{aligned} \tag{19}$$
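With T and K0 held fixed, estimating b is a one-dimensional least-squares problem. The sketch below uses a golden-section search; that choice is an assumption, since the text does not say which optimizer was used. As a self-check, it fits noise-free values generated from Equation 19 for the group with no feedback, treating runs 3 to 9 as r = 1 to 7, and should recover b ≈ 0.87. The helper names are illustrative.

```python
import math

def k_model(r, T, K0, b):
    """Equation 18: slope in run r (the first altered run has r = 1)."""
    return T - (T - K0) * math.exp(-b * r)

def fit_rate(runs, slopes, T, K0, lo=1e-3, hi=10.0, iters=100):
    """Least-squares b with T and K0 fixed, by golden-section search on [lo, hi]."""
    def sse(b):
        return sum((k - k_model(r, T, K0, b)) ** 2 for r, k in zip(runs, slopes))
    g = (math.sqrt(5.0) - 1.0) / 2.0
    a, c = lo, hi
    for _ in range(iters):
        x1 = c - g * (c - a)
        x2 = a + g * (c - a)
        if sse(x1) < sse(x2):
            c = x2       # minimum lies in [a, x2]
        else:
            a = x1       # minimum lies in [x1, c]
    return 0.5 * (a + c)

runs = list(range(1, 8))
curve = [k_model(r, 0.55, 1.05, 0.87) for r in runs]   # Eq. 19, no-feedback group
print(round(fit_rate(runs, curve, T=0.55, K0=1.05), 3))
```

The search assumes the squared error has a single minimum on the interval. This holds for the noise-free check above, but with real slope estimates it should be verified, for example by plotting the error against b.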
For both groups, the parameter estimates are nearly equal, indicating that there is no
particular advantage for the feedback group. In particular, neither T, the asymptote, nor b,
the rate of adaptation, differs much across groups. The similarity in these estimates is
due in part to the way in which the asymptote was estimated for the no-feedback group.
By inspection, this group may still have been adapting in run 9. If its true asymptote
is actually smaller than 0.55 (e.g., closer to 0.53, the asymptote of the feedback group),
the rate b for this group would be smaller. Again, by inspection, this may be a
more accurate description of their behavior.
In general, the main difference between the groups is that the slope is larger for
the group with no feedback than for the feedback group in a given run. An explanation of
this phenomenon is that the effect of feedback is to speed the rate of adaptation.
[Plot: Group with feedback; slope estimate versus Run r, 0 to 40]
Figure 31. Estimates for T, K0, b and k(r). O: Slope estimates. -: k(r) approximation described by
Equation 19 for the group with feedback.
[Plot: Group with no feedback; slope estimate versus Run r, 0 to 40; annotations K0 = 1.05, b = 0.87, T = 0.55]
Figure 32. Estimates for T, K0, b and k(r). *: Slope estimates. - -: k(r) approximation described by
Equation 19 for the group with no feedback.
6. RELATING JND AND RESOLUTION
Results from the Just Noticeable Difference (JND) experiment may predict the
relative sensitivity to localization cues in the adaptation experiments. The JND gives a
measure of the resolution between two positions, while the sensitivity measured in the
adaptation experiments gives insight into how sensitive subjects are to localization cues
in experiments with a larger range of stimuli. One model (the preliminary model of
intensity perception by Durlach and Braida, 1969) predicts that JND is inversely
proportional to sensitivity in identification experiments. The scale factor equating these
measures, however, depends upon internal noise.
6.1. Background
Durlach and Braida's preliminary intensity perception model provides a
quantitative theory for predicting resolution for various types of experiments measuring
intensity perception (the model is described fully in Durlach and Braida, 1969). The
model has previously been applied to localization experiments involving adaptation
(Shinn-Cunningham, 1998).
In the model, every stimulus I maps to an internal sensation Y, a Gaussian random
variable with mean α(I) and variance β². The internal sensation is further transformed by
the addition of a second source of noise, a zero-mean Gaussian random variable with
variance γ². As a result, the decision variable Q is a Gaussian random variable with mean
α(I) and variance β² + γ². This variance arises from two independent sources. The
first noise source, β², is called Sensation Noise and depends solely on the stimulus
presented. This noise limits the best performance that can be achieved in any experiment.
Memory Noise (γ²), the second noise source, affects the transformation from the sensation
Y to the random variable Q and depends on the type of experiment. For the type of
experiments performed in this study, the standard deviation of the Memory Noise is assumed
to be proportional to the total range of stimuli presented. This type of noise is termed
context-coding noise.
Therefore, if I_max and I_min are the extreme stimulus values used in the experiment, the
Memory Noise can be written as:

$$\gamma^2 = G^2\,[\alpha(I_{max}) - \alpha(I_{min})]^2 \tag{20}$$

where G is a constant and α(·) is a function that transforms the physical location of a stimulus
into the mean of the internal decision variable. The addition of context-coding noise allows the
model to explain why subjects may confuse two stimuli in a large-range identification
task, while they can always identify them correctly in a JND-type task (where the range is
reduced).
The model assumes that subject responses are based on the value of the decision
variable Q. The Q axis is divided into n contiguous regions by n+1 criteria. Each region
corresponds to one of the n possible stimulus locations presented in the experiment. For a
one-interval experiment, the discriminability between two stimuli I_i and I_j can be written
as:

$$d'(I_i, I_j) = \frac{\alpha(I_i) - \alpha(I_j)}{\sqrt{\beta^2 + G^2\,[\alpha(I_{max}) - \alpha(I_{min})]^2}} \tag{21}$$
Therefore, d' increases as the distance between the mean values of the two stimuli I_i and
I_j increases. Also, for two fixed stimuli, sensitivity decreases as the range increases,
since the internal noise grows. In other words, the Gaussian distributions of the two stimuli
overlap more in the large-range case than in the small-range case.
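Equation 21 can be evaluated directly to see the range effect described above. In this sketch the decision-axis values, β, and G are hypothetical numbers chosen only for illustration. The same pair of stimuli, one unit apart on the decision axis, is scored once within a small stimulus range and once within a range ten times larger.

```python
import math

def d_prime(alpha_i, alpha_j, beta, G, alpha_max, alpha_min):
    """Equation 21: one-interval d' between stimuli with decision-axis means alpha_i, alpha_j."""
    memory_var = (G * (alpha_max - alpha_min)) ** 2   # context-coding noise, Eq. 20
    return (alpha_i - alpha_j) / math.sqrt(beta ** 2 + memory_var)

# Same pair of stimuli judged within a small range and within a range ten times larger.
small = d_prime(1.0, 0.0, beta=1.0, G=0.1, alpha_max=2.0, alpha_min=0.0)
large = d_prime(1.0, 0.0, beta=1.0, G=0.1, alpha_max=20.0, alpha_min=0.0)
print(round(small, 3), round(large, 3))
```

With G = 0 (no context-coding noise) d' reduces to the mean separation divided by β. Any positive G lowers d' as the range grows.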
In summary, the function α(·) is a transformation taking physical stimulus values
to values along an internal decision axis. The mean of the decision variable Q is
monotonically related to the location of the physical stimulus, and its standard deviation is
independent of the location. The internal noise comes from a fixed source with variance
β² and a second source that depends on the total range of the stimuli (with variance γ²).
The JND (the stimulus difference at which 70.7% correct responses occur) is the increment for which the reference stimulus and the reference stimulus plus the increment, once transformed by α(·), are separated enough that subjects can reliably tell them apart. The model assumes that the two stimulus distributions overlap by an amount that yields 70.7% correct responses when the decision criterion is positioned optimally, halfway between the means of the distributions. Thus, when the stimuli are one JND apart, the area on the wrong side of the criterion is 0.293 (i.e., 29.3% incorrect responses). Discriminability in a JND experiment is dominated by Sensation Noise; Memory Noise is negligible in these tasks because the range is small. Additionally, in the underlying decision space, the JND increment (in standard-deviation units) is independent of the reference stimulus. Thus, this increment (in stimulus units) is roughly inversely proportional to the derivative of α(·) evaluated at the reference position θ:
$$\mathrm{JND}(\theta) \propto \left[\frac{d\alpha(\theta)}{d\theta}\right]^{-1} \qquad (22)$$
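The two steps above (the 70.7%-correct criterion and Equation 22) can be made concrete. The sketch below assumes equal-variance Gaussian distributions, a criterion halfway between the means, and an α(·) expressed in units of the noise standard deviation. `example_alpha` is a hypothetical linear function used only for illustration, not the α(·) of the study.

```python
from statistics import NormalDist

P_CORRECT_AT_JND = 0.707  # proportion correct that defines one JND


def separation_at_jnd(p_correct: float = P_CORRECT_AT_JND) -> float:
    """Distance between the two distribution means, in noise-SD units, that
    gives p_correct when the criterion sits halfway between the means.

    Each distribution then has (1 - p_correct) of its area on the wrong side
    of the criterion, so half the separation is the z-score of p_correct.
    """
    return 2.0 * NormalDist().inv_cdf(p_correct)


def jnd_from_alpha(alpha, theta: float, h: float = 1e-3) -> float:
    """JND in stimulus units at reference theta (Equation 22): the fixed
    decision-axis separation divided by the local slope of alpha."""
    slope = (alpha(theta + h) - alpha(theta - h)) / (2.0 * h)  # central difference
    return separation_at_jnd() / slope


def example_alpha(theta: float) -> float:
    """Hypothetical linear alpha(theta): 0.1 SD per degree (illustration only)."""
    return 0.1 * theta


delta = separation_at_jnd()
print(f"separation at 70.7% correct: {delta:.3f} SD")
print(f"JND at 30 deg for example_alpha: {jnd_from_alpha(example_alpha, 30.0):.2f} deg")
```

The means end up about 1.09 SD apart, and each distribution leaves 1 - 0.707 = 0.293 of its area on the wrong side of the criterion. A steeper α(·) at the reference position gives a proportionally smaller JND.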
In identification tasks, by contrast, resolution is directly proportional to the distance between the means of the underlying distributions (in standard-deviation units). However, the large-range task involves a significant amount of Memory Noise. As a result, although the two types of experiments should show similar relative sensitivity, resolution in the large-range experiment should be proportional to the inverse of the JND:
$$d' = A\,\frac{d\alpha(\theta)}{d\theta} \propto \mathrm{JND}^{-1}(\theta) \qquad (23)$$
where A is a constant. This result indicates that the inverse of the JND function predicts the general shape of resolution, and that relative sensitivity depends only on the function α(·).
6.2. Results
The mean JND functions found for normal and altered cues (Equation 11) were used to estimate the shape of d'. Figure 33 shows the result of inverting the JND functions. Sensitivity for the altered cues is greater around the center positions because the JND there is smaller than for normal cues. For angles larger than 25°, d' is larger for normal cues (as expected). Because the units of the estimated sensitivity are arbitrary, the data in Figure 33 were scaled, without loss of generality, so that the value of the normal data at 0° was one.
[Figure 33: inverse JND vs. Target Position (degrees), 0 to 60.]
Figure 33. Estimated shape of d'. Solid line: normal cues. Dashed line: altered cues.
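The inversion and normalization used for Figure 33 can be sketched as follows. The JND values below are made-up placeholders, not the fitted functions of Equation 11, and the helper names are illustrative. Both curves are divided by the same reference value so that their relative shape is preserved.

```python
def normalized_inverse_jnd(jnd_normal, jnd_altered):
    """Invert two JND curves and scale both by the same factor so that the
    normal-cue value at the first position (0 degrees) equals one.

    One common factor is used because d' has arbitrary units; the relative
    shape of the two curves is unchanged.
    """
    inv_normal = [1.0 / j for j in jnd_normal]
    inv_altered = [1.0 / j for j in jnd_altered]
    ref = inv_normal[0]
    return [v / ref for v in inv_normal], [v / ref for v in inv_altered]


def first_angle_normal_better(angles, inv_normal, inv_altered):
    """Smallest listed angle at which the normal-cue curve exceeds the
    altered-cue curve, or None if it never does."""
    for angle, n, a in zip(angles, inv_normal, inv_altered):
        if n > a:
            return angle
    return None


# Placeholder JNDs in degrees at 0, 10, ..., 60 (made up, NOT Equation 11).
angles = [0, 10, 20, 30, 40, 50, 60]
jnd_norm = [2.0, 2.5, 3.0, 4.0, 5.0, 6.5, 8.0]
jnd_alt = [1.0, 1.5, 2.5, 5.0, 9.0, 15.0, 20.0]
d_norm, d_alt = normalized_inverse_jnd(jnd_norm, jnd_alt)
print(first_angle_normal_better(angles, d_norm, d_alt))
```

With the measured JND functions in place of the placeholders, the same crossover search would locate the angle beyond which normal cues give the larger d'.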
The scale factors that made the resolution results best fit the 1/JND data were found using a least-squares method. Resolution data obtained with normal cues were fitted to the normal 1/JND curve, while resolution data from altered-cue runs were scaled to match the 1/JND curve for altered cues. The scaled resolution data are plotted and compared with the 1/JND curves in Figure 34. By inspection, resolution and 1/JND show very similar shapes in all cases (i.e., for normal and altered cues, and before and after adaptation). Note that in each graph the results before and after adaptation appear very similar; however, the scale factors used to fit them to the 1/JND curve differ. These scale factors are proportional to the internal noise in the model, which may vary with training (Shinn-Cunningham, 1998). Because the prediction that relative sensitivity depends only on the stimuli used appears to hold, the scale factors are analyzed further below.
[Figure 34: four panels of scaled resolution vs. Target Position (degrees), 0 to 60; columns: group with feedback, group with no feedback; rows: normal cues, altered cues.]
Figure 34. Resolution results scaled to match 1/JND curves. Results for the group with feedback are plotted on the left; results for the group without feedback are plotted on the right. Results for normal cues are shown in the top panels; results for altered cues are shown in the bottom panels. Solid lines: resolution data from experiments. Dashed lines: 1/JND curves. O: first presentation of normal cues (before adaptation). *: first presentation of altered cues (before adaptation). X: last presentation of altered cues (after adaptation). +: first presentation of normal cues following the last altered cues (after adaptation).
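The least-squares scale factor described for Figure 34 has a closed form. This sketch assumes, following the wording of the text, that the scale factor multiplies the resolution data to match the 1/JND curve. The input values are hypothetical.

```python
def ls_scale_factor(resolution, inv_jnd):
    """Least-squares scale factor s minimizing sum_i (s*resolution[i] - inv_jnd[i])**2.

    Setting the derivative of the squared error with respect to s to zero gives
        s = sum(resolution[i] * inv_jnd[i]) / sum(resolution[i] ** 2).
    """
    if len(resolution) != len(inv_jnd):
        raise ValueError("inputs must have the same length")
    num = sum(r * g for r, g in zip(resolution, inv_jnd))
    den = sum(r * r for r in resolution)
    if den == 0.0:
        raise ValueError("resolution data must not all be zero")
    return num / den


# Hypothetical d' values and a normalized 1/JND curve at the same positions.
resolution = [2.0, 1.8, 1.5, 1.0]
inv_jnd = [1.0, 0.9, 0.75, 0.5]
print(ls_scale_factor(resolution, inv_jnd))  # proportional data, so approximately 0.5
```

Under Equation 23, resolution is roughly A/JND, with A shrinking as internal noise grows. The fitted factor therefore behaves like 1/A, so larger scale factors indicate more internal noise, as assumed in the analysis that follows.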
The scale factor analysis directly reflects any change in internal noise during an experiment, and several hypotheses can be formulated from the ideas of the preliminary intensity perception model.
First define:
- n1 as the internal noise for normal cues before adaptation,
- n2 as the internal noise for altered cues before adaptation,
- n3 as the internal noise for altered cues after adaptation,
- n4 as the internal noise for normal cues after adaptation,
- sf1 as the scale factor for normal cues before adaptation,
- sf2 as the scale factor for altered cues before adaptation,
- sf3 as the scale factor for altered cues after adaptation,
- sf4 as the scale factor for normal cues after adaptation.
It is reasonable to expect that internal noise does not change abruptly when the acoustic cues change (from normal to altered or from altered to normal) but instead changes slowly. Thus, n1 should be roughly equal to n2, and n3 roughly equal to n4. This assumption implies that sf1 should equal sf2 and that sf3 should equal sf4. Even though sf1 and sf2 (or sf3 and sf4) refer to different 1/JND curves, their values should be the same if the internal noise is the same (i.e., both sets of resolution data are affected by the same amount of internal noise and are therefore scaled by the same factor relative to the underlying 1/JND curve). Internal noise is also expected to increase after adaptation (Shinn-Cunningham, 1998): n4 should be greater than n1, and n3 greater than n2. This implies that sf4 should be greater than sf1 and sf3 greater than sf2. Finally, if feedback did not matter in these experiments, internal noise values should be comparable across the two groups (feedback and no feedback). Equation 24 summarizes these hypotheses.
$$n_1 = n_2 \text{ and } n_3 = n_4 \;\Rightarrow\; sf_1 = sf_2 \text{ and } sf_3 = sf_4$$
$$n_1 < n_4 \text{ and } n_2 < n_3 \;\Rightarrow\; sf_1 < sf_4 \text{ and } sf_2 < sf_3 \qquad (24)$$
Figure 35 compares the scale factors obtained in both experiments. The hypothesis that noise changes little between n1 and n2, or between n3 and n4, does not hold in general. With or without feedback, n1 has the smallest value and n2 is slightly larger. In the feedback experiment, n4 is clearly smaller than n3, but this relation does not hold in the no-feedback experiment. Additionally, the data support the idea that n4 > n1 in both experiments, but n3 > n2 only in the feedback experiment. Finally, looking across experiments, there is a large difference between the scale factors. In particular, performance is generally worse in the no-feedback case (scale factors are larger) than in the feedback case, indicating that feedback helps reduce overall internal noise. Interestingly, the maximum noise reached is roughly the same in both experiments (n3 is about 0.53 in both).
[Figure 35: Scale Factor Analysis. Scale factor for each condition, group with feedback and group with no feedback.]
Figure 35. Scale factors for the group with feedback (left) and for the group with no feedback (right), for normal cues before adaptation (sf1), altered cues before adaptation (sf2), altered cues after adaptation (sf3), and normal cues after adaptation (sf4).
A possible explanation for these results is that in the feedback condition, noise changes rapidly between runs, producing observable changes from n1 to n2 to n3 to n4. In general, n1 is smallest, consistent with the idea that the effective range of stimuli is smaller (the normal range, -60° to +60°). After altered cues are introduced, internal noise increases (n2) because the range is larger and subjects are starting to adapt. n3 has the largest value, since it reflects the noise when subjects attend to the whole range from -90° to +90° and are adapted to the altered cues. When normal cues are reintroduced, n4 reflects a change in range back toward the original value (i.e., n4 < n3 but still n4 > n1).
Conversely, in the no-feedback condition subjects appear to listen to almost the whole range of cues throughout the experiment. Initially, in the n1 run, they attend to a slightly smaller range (n1 < n2, n3, and n4), but as soon as they hear the auditory signal change to the altered-cue condition, they attend to the whole range of possible locations (-90° to +90°). Thus, there is no substantial change from n2 to n3, since they are already attending to the whole range. When normal cues are presented again, there may be a slight decrease from n3 to n4, indicating that subjects attend to a slightly smaller range in the n4 run; however, this change is small and probably not significant. In general, then, subjects appear to attend to the whole range of cues when feedback is not provided, because they are not sure which acoustic cues will be presented.
7. CONCLUSION
7.1. Summary
Adaptation to double-size head auditory localization cues was investigated by
presenting simulated acoustic cues with the aid of an auditory virtual environment. The
goal of this study was to determine whether better-than-normal performance could be
achieved with these supernormal localization cues. Bias and resolution were the two
aspects of performance analyzed.
This study follows previous work by Shinn-Cunningham et al. (1994 and 1998), in which a nonlinear remapping of the normal HRTFs was implemented (Equation 6). That work concluded that subjects did not adapt to the nonlinear transformation employed but rather to a linear approximation of it. It also showed that the slope relating mean response to the physical cues presented changed exponentially over time. These results indicated that the largest changes in performance occurred at the beginning of the exposure to altered cues; by the end of the exposure, the mean slope asymptoted to a stable value. Finally, the rate at which subjects adapted was found to be b = 0.84 run⁻¹ (Equation 18).
In this study, even though the transformation function is not exactly linear
(Equation 14), the ITDs for every position are doubled when altered cues are used. This
means that the mapping is roughly linear in ITD space.
Two similar experiments investigated how subject performance changed over time. Both used a forced-choice identification task with 13 positions. In the first experiment, the first 2 and the last 8 runs used normal cues, while the middle 30 presented altered cues; correct-answer feedback was provided after each response. In the second experiment, the first 2 and the last runs used normal cues, while the middle 7 presented altered cues; feedback was not provided. The experiments showed that feedback accelerates adaptation to supernormal cues but is not necessary for adaptation to occur. For both
experiments, mean response and bias showed all the characteristics of adaptation, while
resolution results were less consistent. In particular, changes in resolution in the feedback
experiment were similar to changes seen in previous experiments (Shinn-Cunningham et
al., 1994 and 1998). However, without feedback, internal noise was large throughout the
experiment, as if subjects attended to the whole range of possible cues, independent of
their adaptation rate. In general, resolution was better at the center positions when altered
cues were introduced, but normal cues provided better overall resolution.
A just noticeable difference (JND) experiment was run to obtain further insight into the normal and double-size head resolution results (Equation 11). The JND curves were
used to predict sensitivity as a function of azimuth and to compare these measures of
sensitivity to the resolution results. This analysis indicated that relative sensitivity
depends only on the cues used (not on the adaptive state of the subject). Additionally, it
showed that feedback reduces the total internal noise, improving resolution.
Finally, with the double-size head localization cues, subjects were also found to adapt to a linear transformation. In both adaptation experiments, the mean slope (relating acoustic cues to position) changed exponentially over time. The rate at which subjects adapted was found to be b = 0.86 run⁻¹ for the group with feedback and b = 0.87 run⁻¹ for the group with no feedback (Equation 19).
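To see what these rates imply, the sketch below assumes that Equation 19 has the common exponential form slope(t) = s_inf + (s0 - s_inf)·exp(-b·t), with t counted in runs. The exact parameterization in Equation 19 may differ, and the starting and asymptotic slopes s0 and s_inf are hypothetical.

```python
import math


def slope_at(run: float, s0: float, s_inf: float, b: float) -> float:
    """Mean slope after `run` runs of exposure, assuming an exponential
    approach from s0 to the asymptote s_inf at rate b (per run)."""
    return s_inf + (s0 - s_inf) * math.exp(-b * run)


def fraction_remaining(run: float, b: float) -> float:
    """Fraction of the total slope change still to occur after `run` runs."""
    return math.exp(-b * run)


for label, b in [("feedback", 0.86), ("no feedback", 0.87)]:
    left = [round(fraction_remaining(n, b), 3) for n in (1, 2, 5)]
    print(f"{label}: fraction of change remaining after 1, 2, 5 runs = {left}")
```

Under this assumed form, with b ≈ 0.86 run⁻¹ less than half of the total change (exp(-0.86) ≈ 0.42) remains after the first altered-cue run. This is consistent with the largest changes occurring early in the exposure.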
7.2. Discussion
The most important characteristic of the simulated double-size head cues was that the normal ITDs were doubled for every source position. As discussed previously, for angles greater than 40° (and, since symmetry was assumed, for angles less than -40°) the altered ITDs presented were unnatural (i.e., larger than the largest normally occurring ITD). Subjects therefore heard sources whose cues they had never heard before. Although in one sense this transformation of ITDs is linear, in another sense it is not. In particular, the transformation function f(θ) (Equation 14) between the position that an unadapted, naive listener perceives and the actual position is not linear. Additionally, f(θ) is not defined for azimuths above 40°; these positions are assumed to always map to 90° (Figure 27). This means that subjects heard normal positions between 40° and 60° as coming from the same position (90°). If this were exactly what occurred, subjects should confuse those positions; in other words, their resolution should be very poor for sources more than 40° to the side. This effect can be seen in the resolution results for both adaptation experiments (Figures 15 and 21). Furthermore, the JND function (Equation 11 and Figure 26) also shows poor resolution at the edges of the range, since its value there is greater for altered cues than for normal cues. For example, the JND at 60° is approximately 23° (subjects could only distinguish a source at 60° from one at 37°). As explained before, previous models predict that the inverse of the JND function gives a good estimate of the general shape of the sensitivity curve in any experiment, and the inverse JND curve also predicts very poor resolution at the edges (Figure 33). In short, the altered cues are beneficial only for sources between -40° and +40°. In that case, the range of cues presented corresponds exactly to the normal-cue range of -90° to +90° (i.e., all ITDs presented are natural).
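One way to see why f(θ) is nonlinear and saturates is a rigid-sphere (Woodworth) ITD approximation. This model is swapped in for illustration only: the study itself frequency-scaled measured HRTFs, and the head radius and speed of sound below are assumed values. The sketch also assumes that a naive listener matches the ITD alone. Under these assumptions the doubled ITD first exceeds the largest natural ITD near 38°, close to the 40° stated above, and sources beyond that angle are heard at 90°.

```python
import math

# Assumed rigid-sphere parameters; the crossover angle does not depend on them.
HEAD_RADIUS_M = 0.0875
SPEED_OF_SOUND_M_S = 343.0


def itd_normal(theta_deg: float) -> float:
    """Woodworth rigid-sphere ITD in seconds for azimuth 0 <= theta_deg <= 90."""
    th = math.radians(theta_deg)
    return HEAD_RADIUS_M / SPEED_OF_SOUND_M_S * (th + math.sin(th))


def itd_altered(theta_deg: float) -> float:
    """Double-size head: doubling the radius doubles the ITD at every azimuth."""
    return 2.0 * itd_normal(theta_deg)


MAX_NORMAL_ITD = itd_normal(90.0)


def _bisect(pred, lo: float = 0.0, hi: float = 90.0, tol: float = 1e-9) -> float:
    """Angle where a monotone predicate switches from False to True on [lo, hi]."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if pred(mid):
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


def unnatural_onset_deg() -> float:
    """Smallest azimuth at which the doubled ITD exceeds every natural ITD."""
    return _bisect(lambda th: itd_altered(th) >= MAX_NORMAL_ITD)


def perceived_azimuth_deg(theta_deg: float) -> float:
    """Hypothetical naive-listener f(theta): the normal-head azimuth whose ITD
    matches the altered ITD; unnatural ITDs are mapped to 90 degrees."""
    target = itd_altered(theta_deg)
    if target >= MAX_NORMAL_ITD:
        return 90.0
    return _bisect(lambda th: itd_normal(th) >= target)


print(f"doubled ITDs become unnatural beyond ~{unnatural_onset_deg():.1f} deg")
for th in (0, 10, 20, 30, 40, 50, 60):
    print(th, round(perceived_azimuth_deg(th), 1))
```

In this model every source past the onset angle maps to the same 90° percept, which is the saturation that predicts poor resolution at the edges of the range.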
The two adaptation experiments (feedback and no-feedback) showed some
similarities and some very interesting differences. In both groups, mean response and bias
results (Figures 13, 14, 19 and 20) showed the adaptation process that was expected.
While the resolution results for the group with feedback showed changes consistent with
changes in resolution in previous adaptation experiments, in the group with no feedback
resolution did not change with adaptation (Figures 15 and 21). It seems that resolution for
the no-feedback group depended only on the acoustic cues presented (i.e., the results for
normal cues before and after adaptation are very similar, as are the results for altered cues
before and after adaptation). Indeed, these results resemble the estimates of d' as a function of azimuth (Figure 33), further supporting the hypothesis that resolution tracks the type of cues presented.
The difference in resolution results between the two adaptation experiments can also be explained by the amount of internal noise in each stage of the adaptation process, since changes in resolution with adaptation arise from changes in internal noise. In the group with feedback, subjects first attend to a ±60° range. When altered cues are introduced, subjects attend to a larger range, but the internal noise does not change immediately. After adaptation has taken place, the internal noise has increased to the value appropriate for the larger range. After normal cues are reintroduced, the physical-cue range is reduced, but again the internal noise does not decrease abruptly. The group with no feedback, on the other hand, always attends to a large range, since its subjects never know with certainty which range is being used. For this reason, their internal noise is constant throughout the experiment. The scale factor analysis supports this result (Figure 35): the values for the group with no feedback are very similar for all runs, in contrast with the feedback group, whose values change with adaptation.
In previous work, Shinn-Cunningham et al. (1994 and 1998) found that mean response, bias, and resolution were interdependent and that all changes were related to a single underlying process. It is interesting that both studies found very similar rates of adaptation using different transformations. However, the no-feedback condition seems to make resolution independent of the other quantities.
Finally, the explanation that boredom in the first (longer) experiment caused the difference in the resolution results is not acceptable, since overall performance (resolution) was worse without feedback, when a shorter experiment was performed.
7.3. Future Work
The results of this study could be better understood if the exact shape of the transformation function f(θ) were known and defined for all possible source locations, which requires studying the effects of the unnatural ITD values. Also, to obtain a better model of the sensitivity d', the exact shape of α(θ), the transformation between physical stimulus values and values along an internal decision axis, should be determined. This function could also help build a model of adaptation that predicts bias and resolution with the double-size head cues. Since the feedback and no-feedback conditions gave different results, the model presented by Shinn-Cunningham et al. (1994 and 1998) should be modified to allow bias and resolution to be driven by different processes.
REFERENCES
Braida, L. D. and Durlach, N. I. (1972). Intensity perception. II. Resolution in one-interval paradigms. Journal of the Acoustical Society of America, 51, 483-502.
Blauert, J. (1983). Spatial Hearing. Cambridge, MA: MIT Press.
Bolt, R. A. (1984). The human interface: Where people and computers meet. London:
Lifetime Learning Publishers.
Durlach, N. I. (1991). Auditory localization in teleoperator and virtual environment
systems: ideas, issues, and problems. Perception, 20, 543-554.
Durlach, N. I. and Braida, L. D. (1969). Intensity perception. I. Preliminary theory of
intensity resolution. Journal of the Acoustical Society of America, 46, 372-383.
Durlach, N. I. and Mavor, A. S. (Eds.) (1995). Virtual Reality: Scientific and Technological
Challenges. Washington, D.C.: National Academy Press.
Durlach, N. I. and Pang, X. D. (1986). Interaural magnification. Journal of the Acoustical
Society of America, 80, 1849-1850.
Durlach, N. I., Shinn-Cunningham, B. G., and Held, R. (1993). Supernormal auditory
localization. I. General background. Presence, 2, 89-103.
Foley, J. D. (1987). Interfaces for advanced computing. Scientific American, October,
127-135.
Kistler, D. J. and Wightman, F. L. (1991). A model of head-related transfer functions
based on principal components analysis and minimum-phase reconstruction.
Journal of the Acoustical Society of America, 91, 1637-1647.
Kulkarni, A. (1995). Auditory Imaging in a virtual environment. Unpublished master's
thesis. Department of Biomedical Engineering, Boston University, Boston,
Massachusetts.
Levitt, H. (1971). Transformed up-down methods in psychoacoustics. Journal of the
Acoustical Society of America, 49, 467-477.
Lippmann, R. P., Braida, L. D. and Durlach, N. I. (1976). Intensity perception. V. Effect
of payoff matrix on absolute identification. Journal of the Acoustical Society of
America, 59, 129-134.
Mills, A. W. (1963). Auditory perceptions of spatial relations. Proceedings of the
International Congress of Technology and Blindness, Vol. 2 (pp. 111-139).
New York: American Foundation for the Blind.
Mills, A. W. (1958). On the minimum audible angle. Journal of the Acoustical Society of
America, 30, 237-246.
Mills, A. W. (1972). Auditory localization. In J. V. Tobias (Ed.), Foundations of Modern
Auditory Theory (pp. 303-348). New York: Academic Press.
Oppenheim, A. V. and Schafer, R. W. (1975). Digital Signal Processing (pp. 337-367).
Englewood Cliffs, NJ: Prentice Hall.
Plenge, G. (1974). On the difference between localization and lateralization. Journal of
the Acoustical Society of America, 56, 944-951.
Rabinowitz, W. R., Maxwell, J., Shao, Y., and Wei, M. (1993). Sound localization cues
for a magnified head: Implications from sound diffraction about a rigid sphere.
Presence, 2, 125-129.
Lord Rayleigh [Strutt, J. W.] (1907). On our perception of sound direction. Philosophical
Magazine, 13, 214-232.
Shaw, E. A. (1974). The external ear. In W. D. Keidel & W. D. Neff (Eds.), Handbook of
sensory physiology, Vol. 1, Auditory system (pp. 455-490). New York: Springer-Verlag.
Shaw, E. A. (1975). The external ear: New knowledge. In S. C. Dalsgaard (Ed.),
Earmolds and Associated Problems. Proceedings of the 7th Danavox Symposium,
Scandinavian Audiology, Suppl. 5, 24-50.
Sheridan, T. (1987). Telerobotics. Proceedings of the International Federation of
Automatic Control, 10th IFAC World Congress, July 27-31, Munich, FRG.
Shinn-Cunningham, B. G. (1994). Adaptation to supernormal auditory localization cues
in an auditory virtual environment. Unpublished Ph.D. thesis in the Department of
Electrical Engineering and Computer Science, Massachusetts Institute of
Technology.
Shinn-Cunningham, B. G. (1998). Adapting to remapped auditory localization cues: A
decision-theory model. (Draft).
Shinn-Cunningham, B. G., Durlach, N. I., and Held, R. M. (1998a). Adapting to
supernormal auditory localization cues I: Bias and resolution. Journal of the
Acoustical Society of America (submitted).
Shinn-Cunningham, B. G., Durlach, N. I., and Held, R. M. (1998b). Adapting to
supernormal auditory localization cues II: Constraints on adaptation of mean
response. Journal of the Acoustical Society of America (submitted).
Strelow, E. R. and Warren, D. H. (1985). Sensory substitution in blind children and
neonates. In D. H. Warren and E. R. Strelow (Eds.), Electronic Spatial Sensing for
the Blind (pp. 273-298). Dordrecht, NL: Martinus Nijhoff.
Vertut, J. and Coiffet, P. (1986). Robot technology. Teleoperation and Robotics: Evolution
and Development (volume 3A) and Applications and Technology (volume 3B).
Englewood Cliffs, NJ: Prentice Hall.
Warren, D. H. and Strelow, E. R. (1984). Learning spatial dimensions with a visual
sensory aid: Molyneaux revisited. Perception, 13, 331-350.
Wenzel, E. M. (1992). Localization in virtual acoustic displays. Presence, 1 (1), 80-107.
Wenzel, E. M. and Foster, S. H. (1993). Perceptual consequences of interpolating head-related transfer functions during spatial synthesis. Proceedings of the IEEE ASSP
Workshop on Applications of Signal Processing to Audio and Acoustics, October
1993, New Paltz, New York.
Wightman, F. L. and Kistler, D. J. (1989). Headphone simulation of free-field listening.
I. Stimulus synthesis. Journal of the Acoustical Society of America, 85, 858-867.
Wightman, F. L., Kistler, D. J., and Perkins, M. E. (1987). A new approach to the
study of human sound localization. In W. A. Yost and G. Gourevitch (Eds.),
Directional Hearing (pp. 26-48). New York: Springer-Verlag.
Wightman, F. L., and Kistler, D. J. (1992). The dominant role of low-frequency interaural
time differences in sound localization. Journal of the Acoustical Society of
America, 91, 1648-1661.