Adaptation to a Varying Auditory Environment

by
Gregory Galen Lin
Submitted to the Department of Electrical Engineering and
Computer Science
in partial fulfillment of the requirements for the degree of
Bachelor of Science in Electrical Science and Engineering
and Master of Engineering in Electrical Engineering and Computer
Science
at the
MASSACHUSETTS INSTITUTE OF TECHNOLOGY
May 1996
@ Gregory Galen Lin, MCMXCVI. All rights reserved.
The author hereby grants to MIT permission to reproduce and
distribute publicly paper and electronic copies of this thesis
document in whole or in part, and to grant others the right to do so.
Author
Department of Electrical Engineering and Computer Science
May 28, 1996
Certified by
Nathaniel I Durlach
Research Scientist
Thesis Supervisor
Accepted by
Frederic R. Morgenthaler
Chairman, Department Committee on Graduate Students
Adaptation to a Varying Auditory Environment
by
Gregory Galen Lin
Submitted to the Department of Electrical Engineering and Computer Science
on May 28, 1996, in partial fulfillment of the
requirements for the degree of
Bachelor of Science in Electrical Science and Engineering
and Master of Engineering in Electrical Engineering and Computer Science
Abstract
This project investigated sensorimotor adaptation to rearranged auditory cues. Data
was collected by presenting subjects with an acoustic cue (a gated pulse-train generating a clicking sound) simulated to come from one of 13 locations (confined to a
horizontal azimuthal plane) and recording the subject's estimate of the stimulus location. After each response, the subject was informed of the correct response, providing
constant training. Subjects were presented, in order, with unaltered cues, strongly
altered cues, weakly altered cues, and unaltered cues. Results show that, in addition
to partial adaptation to the changing environment, subjects can partially adapt from
strongly altered cues to weakly altered cues.
Thesis Supervisor: Nathaniel I Durlach
Title: Senior Research Scientist
Contents
1 Project
2 Background
2.1 Localization Cues
2.2 Previous Work
3 Data Collection
3.1 Task
3.2 Setup
4 Experimental Problems
5 Data Analysis
5.1 Mean Response
5.2 Error
5.3 Resolution
5.4 Bias
5.5 Estimating Adaptation
5.6 Imperfection in auditory cues
5.7 Impact of edges
6 Summary
A Warp and Line Fit Results
List of Figures
2-1 Transformation performed by $f_n(\theta)$
3-1 Altered Locations: (a) normal cues (n = 1); (b) second set of altered cues (n = 2); (c) first set of altered cues (n = 4)
5-1 Runs 2 and 3: Changing from n = 1 to n = 4
5-2 Runs 3 and 17: Start and finish of n = 4
5-3 Runs 17 and 18: Changing from n = 4 to n = 2
5-4 Runs 18 and 32: Start and finish of n = 2
5-5 Runs 32 and 33: Changing from n = 2 to n = 1
5-6 Runs 33 and 40: Start and finish of n = 1
5-7 Observation of linearity
5-8 Individual Adaptation Results
5-9 Adaptation over runs
List of Tables
3.1 Table of Warp Transformations
5.1 Subject Exponential Fit Results
A.1 Line-Fit values
A.2 Warp-Fit Values
Chapter 1
Project
This project investigated subject adaptation to supernormal auditory localization
cues. Supernormal auditory localization aims to improve a subject's ability to discriminate the locations of nearby sounds. The proposed experiments will contribute
to the understanding of adaptation to supernormal auditory localization cues.
Chapter 2
Background
2.1 Localization Cues
Sound localization involves processing of three main indicators: interaural intensity
difference (IID), interaural time difference (ITD), and spectral cues. IIDs are differences in sound intensity between the subject's ears, where, for example, a more
intense sound at the left ear is more likely to correspond to a source on a person's
left. ITDs are any differences in sound arrival times between the ears; the closer an
ear is to a sound source, the earlier the ear will receive the sound. As in the case with
IIDs, ITDs between the two ears help indicate the location of the sound source. The
final main indicator used in auditory localization is monaural spectral cue shaping.
The outer ear alters a sound according to the sound's frequency and the angle with
which it impacts the ear. Unlike IIDs and ITDs, monaural frequency cues depend on
the prior knowledge and experience of the subject with these frequency-to-location
translations [2].
Localization cues are generated when a sound interacts with a person's head, and
the total interaction can be summarized by a head-related transfer function (HRTF).
By measuring the intensity, time, and frequency changes of a known source as it
enters the ear canal from different locations, a set of coefficients can be determined
such that convolution of these coefficients with an audio stream will produce correct
spatial signals for the left and right ear.
[Figure: "Effects of Transformation". Normal location (degrees) plotted against correct location (degrees), both from -80 to 80, with one curve each for warp = 4, warp = 2, and warp = 1.]
Figure 2-1: Transformation performed by $f_n(\theta)$
2.2 Previous Work
In this project, subjects were exposed to an auditory spatial distortion constrained
along a constant azimuthal plane described by the expression:
$$\theta' = f_n(\theta) = \frac{1}{2} \tan^{-1}\left[ \frac{2n \sin(2\theta)}{1 - n^2 + (1 + n^2)\cos(2\theta)} \right]$$
where the angle $\theta$ represents the correct location, $\theta'$ is the angle that normally
corresponds to the localization cues presented to the subject, and n represents the
extent of the audio warping.
The term correct will always refer to the location from which the subject is told
the source is coming, and the term normal will refer to the location that normally
corresponds to the physical cues presented. Thus, subjects are told that the source
is at $\theta$, even though the normally-heard position of the source is $\theta'$. The degree of
distortion produced by n (or warp) is reflected in figure 2-1 where the x-axis reflects
the correct location and the y-axis denotes the normal location. As shown in figure 2-1, a value of n = 1 represents no altering, so that the correct cue locations and normal
cue locations are the same. Larger values of n represent more drastic deviations from
normal.
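The transformation can be checked numerically with a short sketch. The function name `warp` and the use of the two-argument arctangent (so that the result stays between -90 and +90 degrees when the denominator becomes negative) are choices made for this illustration and are not taken from the experiment software.

```python
import math

def warp(theta_deg, n):
    """Evaluate f_n(theta) in degrees (illustrative helper, not the original software)."""
    two_theta = math.radians(2.0 * theta_deg)
    numerator = 2.0 * n * math.sin(two_theta)
    denominator = 1.0 - n ** 2 + (1.0 + n ** 2) * math.cos(two_theta)
    # atan2 keeps 2*theta' on the correct branch when the denominator is negative.
    return 0.5 * math.degrees(math.atan2(numerator, denominator))

if __name__ == "__main__":
    for n in (1, 2, 4):
        print(n, [round(warp(t, n), 2) for t in (-60, -30, -10, 0, 10, 30, 60)])
```

With n = 1 the function returns its input, and larger n pushes off-center angles farther from straight ahead. The expression is algebraically equivalent to tan(θ') = n tan(θ), which may be a more convenient form for hand calculation.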
When the transformed cues are first introduced, subjects will make systematic
errors in localization. For instance, with n > 1, subjects will tend to hear sounds
farther off-center than normal. A subject's adaptation to the transformed audio cues
is observed through analysis of their localization performance, summarized by resolution and bias measures. Adaptation is evidenced if subjects overcome the systematic
error (bias) in localization judgements over time.
Previous work [1] has shown that subjects can partially adapt within a two-hour
period (i.e., over time, bias is reduced) when they are exposed to a single cue transformation of the form shown in figure 2-1. Subjects also adapted to a relatively weak
transformation (n = 2) followed by a stronger transformation (n = 4) in a single
2-hour session. A single model was able to explain both of these results. However,
a pilot study with only 2 subjects indicated that subjects given a relatively strong
transformation (n = 4) followed by a relatively weak transformation (n = 2) did not
adapt in a way predicted by the model. The work described here investigates the
surprising result in more detail.
Chapter 3
Data Collection
3.1 Task
Data was collected through a series of trials with each subject. Each trial consisted
of a burst of clicks, after which the subject responded with the apparent location of
the sound source. The response was immediately followed by visual feedback from
spatially-positioned light bulbs (fig. 3-1) giving the correct sound source position.
Testing and training were thus simultaneous, with each trial adding to the subject's
experience with the new auditory space.
Twenty-six trials were grouped to form a run, with a stretch of 40 runs making up
a session (typically spanning two hours). In each session, subjects were exposed to,
in order, 2 runs of normal cues (warp parameter n = 1), 15 runs of strongly warped
cues (n = 4), 15 runs of mildly warped cues (n = 2), and 8 final runs of normal cues
(n = 1), with a 5-minute break after the 10th and 32nd runs. Subjects were notified
each time the degree of warping was changed.
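Because later sections refer to runs by number (3, 17, 18, 32, 33, 40), it can help to have the session schedule in executable form. The following is a minimal sketch; the names `SCHEDULE` and `warp_for_run` are hypothetical and exist only for this illustration.

```python
# (warp strength n, number of runs), in presentation order, as described above.
SCHEDULE = [(1, 2), (4, 15), (2, 15), (1, 8)]
TRIALS_PER_RUN = 26

def warp_for_run(run):
    """Return the warp strength n used on a 1-indexed run number."""
    first = 1
    for n, count in SCHEDULE:
        if first <= run < first + count:
            return n
        first += count
    raise ValueError("run must lie within the session")

if __name__ == "__main__":
    total_runs = sum(count for _, count in SCHEDULE)
    print(total_runs, "runs,", total_runs * TRIALS_PER_RUN, "trials per session")
    print([warp_for_run(r) for r in (2, 3, 17, 18, 32, 33, 40)])
```

The run numbers at which the warp changes (3, 18, and 33) fall directly out of the cumulative run counts.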
3.2 Setup
Subjects were seated facing 13 numbered lights labeled 1 to 13 from left to right. The
lights were arranged on a semi-circular path at 10 degree intervals, 5 feet from the
subject. Light 7 was visually straight ahead and referenced as 0 degrees, light 1 was
located at -60 degrees, and light 13 was located at +60 degrees.
With the normal set of cues (fig. 3-1a) each light corresponded to its physical
location. Under strongly warped cues (fig. 3-1c), the "normal" sound location corresponding to each lamp was shifted farther off center than the actual lamp location.
For example, under strong warping the sound cues for location number 8 (correct
location +10 degrees) were closer to the normal cues for a source at +30 degrees than
to the normal cues for a source at +10 degrees. The lightly warped cues (fig. 3-1b) gave the same type
of distortion as the strongly warped cues (fig. 3-1c), but to a lesser extent (table 3.1).
light    f_n(θ), n = 1    f_n(θ), n = 4    f_n(θ), n = 2
none     -90.00           -90.00           -90.00
none     -80              -87.48           -84.96
none     -70              -84.8            -79.69
1        -60              -81.79           -73.9
2        -50              -78.15           -67.24
3        -40              -73.41           -59.21
4        -30              -66.59           -49.11
5        -20              -55.52           -36.05
6        -10              -35.2            -19.43
7        0                0                0
8        10               35.2             19.43
9        20               55.52            36.05
10       30               66.59            49.11
11       40               73.41            59.21
12       50               78.15            67.24
13       60               81.79            73.9
none     70               84.8             79.69
none     80               87.48            84.96
none     90               90               90
Table 3.1: Table of Warp Transformations (all angles in degrees)
The head position of the subject was monitored using a Bird headtracker (a commercial device using electro-magnetic pulses to allow the position of the head to be
tracked) mounted on a set of Sennheiser HD-545 headphones. The acoustic stimulus was five 1 millisecond pulses spaced at 100 millisecond intervals sent through a
low-pass filter (to prevent aliasing of high-frequency components) and into a Convolvotron.
The Convolvotron was special-purpose signal-processing hardware installed in an
Intel x86-based PC responsible for mapping an input source to the appropriate location in auditory space. The input signal was first sampled and digitized, then the
mapping was accomplished by convolving the input with a pair of transfer functions,
one for the right ear and one for the left ear, which contain the direction-dependent
effects on sound caused by a head and a pair of ears. This pair of transfer functions
was simply the empirically-determined HRTF for a source from the specified direction. Thus, any auditory signal was transformed into a pair of signals (left and right)
that contain spatial information.
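The spatialization step can be sketched in a few lines. This is not the Convolvotron's implementation: the impulse responses below are hypothetical toy values, the sample rate is an assumption (the text does not give one), and the 100 millisecond spacing is interpreted here as the onset-to-onset interval.

```python
def convolve(signal, impulse_response):
    """Direct-form discrete convolution of two sample lists."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, s in enumerate(signal):
        if s == 0.0:
            continue  # skip silent samples
        for j, h in enumerate(impulse_response):
            out[i + j] += s * h
    return out

def pulse_train(sample_rate, n_pulses=5, pulse_ms=1.0, interval_ms=100.0):
    """Rectangular pulses of pulse_ms duration whose onsets are interval_ms apart."""
    pulse_len = int(round(sample_rate * pulse_ms / 1000.0))
    step = int(round(sample_rate * interval_ms / 1000.0))
    samples = [0.0] * (step * (n_pulses - 1) + pulse_len)
    for k in range(n_pulses):
        for i in range(pulse_len):
            samples[k * step + i] = 1.0
    return samples

def spatialize(signal, hrir_left, hrir_right):
    """Convolve one input with a left/right pair of head-related impulse responses."""
    return convolve(signal, hrir_left), convolve(signal, hrir_right)

if __name__ == "__main__":
    # Hypothetical impulse responses: the right ear receives a delayed, weaker copy,
    # as would be expected for a source on the listener's left.
    HRIR_LEFT = [1.0, 0.3]
    HRIR_RIGHT = [0.0, 0.0, 0.0, 0.6, 0.2]
    stimulus = pulse_train(sample_rate=8000)
    left, right = spatialize(stimulus, HRIR_LEFT, HRIR_RIGHT)
    print(len(stimulus), len(left), len(right))
```

In the toy responses, the delay in the right channel plays the role of an ITD and the reduced amplitude plays the role of an IID; a measured HRTF pair would additionally carry the direction-dependent spectral shaping.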
From the Convolvotron, the newly spatialized signal was sent to the headphones.
After each presentation, the subject entered a response (between 1 and 13, corresponding to the numbered sources) on a keyboard which sat on their lap. From the
keyboard, the PC collected the response, and after each response, activated the lamp
corresponding to the correct sound source position. Through this feedback, the subject was trained to adapt to changes in the mapping between audio cues and the
corresponding correct location. Data files with subject responses (recorded by the
PC) were updated after every run.
[Figure: three semicircular layouts of the source locations, labeled in degrees from -90 to 90, one per panel.]
Figure 3-1: Altered Locations: (a) normal cues (n = 1); (b) second set of altered cues
(n = 2); (c) first set of altered cues (n = 4)
Chapter 4
Experimental Problems
The setup had a few shortcomings that may affect the experimental results. Experiments prior to January 8th, 1996 were conducted in an office room that is not
sound-proof. While the headphones provided some isolation they could not completely
eliminate the noises caused by the environment. In addition to the computer's continual mechanical hum, the disk-writing operation that occurred between runs was
audible to the subject. Experimentation after January 8th was conducted in a soundproof room with the PC located outside of the booth. With this setup, the primary
disturbance was a noticeable hum produced by the Bird head-tracking system.
Additionally, the HRTFs used in the described experiments were empirically determined from a single "petite female" subject [3]. The localization cues produced
by the Convolvotron may be slightly different from the cues that the subject would
typically expect (see section 5.6, Imperfection in auditory cues).
Chapter 5
Data Analysis
Data was averaged across all 8 sessions for each subject to find the statistics below.
The resulting values were then averaged across all 5 test subjects to yield the data
plotted in figures 2 through 9. Graphs were made for run-pairs corresponding to
changes in warp strength (figs 5-1, 5-3, 5-5) and to the beginning and end of a warp
(figs 5-2, 5-4, 5-6).
5.1 Mean Response
The mean response graphs (figs. 5-1, 5-2, 5-3, 5-4, 5-5, 5-6; panel a) plot correct versus
subject response, where correct cue refers to the location to which the experiment
trains the subject, and subject response is the (average) response given by subjects
when presented with the associated correct cue. If all of a subject's responses are
correct, the mean response line will fall exactly on the "correct answer" base line.
On run 3 (n = 1 to n = 4; fig 5-1a) subject overestimation produces a sigmoidal
response curve as a function of cue location. Over time (run 3 to run 17; fig 5-2a),
subjects are able to partially adapt, indicated by a response curve closer to the base
line response.
Comparing runs 17 and 18 (n = 4 to n = 2; fig 5-3a) we see that subjects adjust
quickly to the weaker transformation. The mean curve for run 18 is very close to the
"correct answer" base line.
Continued training on the n = 2 cues (runs 18 to 32; fig 5-4a) produces slight
improvement across all cues.
On the final change of cues (between runs 32 and 33, n = 2 to n = 1; fig 5-5a)
subject responses show underestimation similar to the change introduced between
run 17 and 18. Consistent with previous runs, continued exposure improves subject
performance (runs 33 to 40; fig 5-6a).
5.2 Error
Error graphs (figs. 5-1 to 5-6; panel b) show the difference between subject response
and the correct response (noted as subject error). Error is closely related to bias:
bias is equal to the error multiplied by -1 and divided by the standard deviation
of the subject responses. Thus, patterns in error can
be understood by reading the discussion of bias results.
5.3 Resolution
The resolution (d') between location i and i + 1 is defined as
$$d'_{i,i+1} = \frac{m_{i+1} - m_i}{\sigma_i}$$
where $m_i$ is the mean subject response for cue location i and $\sigma_i$ is the standard
deviation of the subject response to location i. Resolution measures a subject's perceived distance between adjacent cue locations normalized by the standard deviation
in subject responses, and thus, measures the ability to discriminate between different sound sources. The perceptually closer the sources are to each other, the more
difficult it becomes to discern them as separate locations, leading to lower values of
resolution.
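A minimal sketch of the resolution calculation follows. It assumes that trial data are available as (cue location, response) pairs, that the difference in means is normalized by the standard deviation at location i, and that the sample standard deviation is the appropriate estimate; none of these details is confirmed by the text, and the function name is hypothetical.

```python
import statistics
from collections import defaultdict

def resolution(trials):
    """Return {i: d'} for each adjacent pair of cue locations (i, i + 1) that has data.

    trials is an iterable of (cue_location, response) pairs with integer locations.
    """
    by_cue = defaultdict(list)
    for cue, response in trials:
        by_cue[cue].append(response)
    mean = {c: statistics.mean(r) for c, r in by_cue.items()}
    # Sample standard deviation (an assumption); needs at least two responses per cue.
    sd = {c: statistics.stdev(r) for c, r in by_cue.items() if len(r) > 1}
    d_prime = {}
    for i in sorted(mean):
        if i + 1 in mean and sd.get(i, 0.0) > 0.0:
            d_prime[i] = (mean[i + 1] - mean[i]) / sd[i]
    return d_prime

if __name__ == "__main__":
    example = [(6, 5), (6, 6), (6, 7), (7, 7), (7, 8), (7, 9)]
    print(resolution(example))
```

In the example, the mean responses to cues 6 and 7 differ by two lights while the responses to cue 6 spread by one light, so the two locations are separated by two standard deviations.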
The first change in cues takes place on run 3, where the warp strength increases
from n = 1 (run 2) to n = 4 (run 3). Under n = 4, the average distance between
the normal cues just ahead of the subject (cue locations 5 through 9) increases,
producing the expected improvement in resolution. With greater separation between
the forward-located cues (depicted in fig 3-1a: n = 1, and 3-1c: n = 4), they become
easier to resolve. Conversely, because the cues at the edges of the test range become
more closely located, resolution begins to suffer.
Resolution decreases somewhat as exposure to the warped cues continues between
runs 3 and 17 (fig 5-2c).
On the change from n = 4 (run 17) to n = 2 (run 18), center resolution degrades.
Center cue locations for n = 2 are spaced more closely than the cue locations for
n = 4 (compare figure 3-1c with 3-1b), producing the expected degradation in resolution. Larger spacing for locations at the edges of the range generates small
improvements in resolution beyond source locations 5 through 9. Continued exposure to n = 2 cues (runs 18 through 32; fig. 5-4) degrades resolution performance, if
anything.
Upon returning to normal cues (runs 32 to 33; fig. 5-5) little change is seen in
resolution. With continued exposure to the normal cues (runs 33 through 40; fig.
5-6), resolution remains relatively constant.
5.4 Bias
The bias $\beta_i$ associated with cue i is
$$\beta_i = \frac{i - m_i}{\sigma_i}$$
Bias is a noise-adjusted measure of the error in subject response for a given source
position, thus reflecting a subject's error in location as measured in units of response
standard deviation.
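The relation between bias and error can be made concrete with a small sketch. The function names are hypothetical, and the use of the sample standard deviation is an assumption.

```python
import statistics

def error(cue, responses):
    """Mean response minus the correct cue location."""
    return statistics.mean(responses) - cue

def bias(cue, responses):
    """(cue - mean response) / standard deviation of the responses."""
    sd = statistics.stdev(responses)  # sample standard deviation (an assumption)
    return (cue - statistics.mean(responses)) / sd

if __name__ == "__main__":
    # A cue right of center (light 9) heard too far to the right ...
    print(error(9, [10, 11, 12]), bias(9, [10, 11, 12]))
    # ... and a cue left of center (light 5) heard too far to the left.
    print(error(5, [3, 4, 5]), bias(5, [3, 4, 5]))
```

Overestimating the distance from center therefore gives negative bias on the right and positive bias on the left, which is the sign convention used in the discussion that follows.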
For example, when subjects are initially exposed to more-strongly-warped cues
(run 2, n = 1 to run 3, n = 4) the bias should be positive for cues left of center
(except at the edges; see section 5.7, Impact of edges). A simple estimate of bias for sudden
changes in warping (i.e., from run 2 [n = 1] to run 3 [n = 4] or run 17 [n = 4] to run
18 [n = 2]) can be found by subtracting the corresponding normal positions from the
correct position (i.e., subtract fig 3-1a from fig 3-1c to generate crude bias values for
n = 1 to n = 4).
For cues with a weak to strong change (increasing warp n), an after-effect is caused
by the subject's overestimation of cue locations. On run 3, the subject first experiences
warp n = 4. Assuming that he has adapted to n = 1 (which are normal cues
and do not require adaptation; see section 5.6, Imperfection in auditory cues), his
first exposure to n = 4 will produce responses in which he interprets the physical
stimuli as if there were no transformation (n = 1). Looking at table 3.1, cue $8|_{n=4}$ maps
approximately halfway between cue $10|_{n=1}$ and cue $11|_{n=1}$ (say $10.5|_{n=1}$) and cue $9|_{n=4}$
maps to cue $12.5|_{n=1}$. The new mapping (n = 4) produces an overestimation which is
consistent with the data. Additionally, larger shifts in cue remapping lead to greater
overestimation, which is also consistent with the data in the panel.
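The cue-to-cue mappings read off table 3.1 can also be computed directly. The sketch below assumes a listener who is fully adapted to a previous warp strength and who interprets each new cue through that previous mapping; the helper names are hypothetical, and the inverse of the warp is obtained by applying the same formula with strength 1/n (an identity of the formula, not a statement from the source).

```python
import math

def warp(theta_deg, n):
    """Evaluate f_n(theta) in degrees using the transformation of section 2.2."""
    two_theta = math.radians(2.0 * theta_deg)
    numerator = 2.0 * n * math.sin(two_theta)
    denominator = 1.0 - n ** 2 + (1.0 + n ** 2) * math.cos(two_theta)
    return 0.5 * math.degrees(math.atan2(numerator, denominator))

def expected_response(light, new_n, old_n=1.0):
    """Light number reported for a cue presented under new_n by a listener
    fully adapted to old_n (no clipping to the 1..13 response range)."""
    heard = warp((light - 7) * 10.0, new_n)   # normal location of the cue, degrees
    believed = warp(heard, 1.0 / old_n)       # invert the mapping the listener learned
    return 7 + believed / 10.0

if __name__ == "__main__":
    print(round(expected_response(8, 4), 2))          # run 3: n = 1 to n = 4
    print(round(expected_response(9, 4), 2))
    print(round(expected_response(9, 2, old_n=4), 2))  # run 18: n = 4 to n = 2
    print(round(expected_response(13, 1, old_n=2), 2)) # run 33: n = 2 to n = 1
```

For the change from n = 1 to n = 4 this gives values near 10.5 for light 8 and 12.5 for light 9, in line with the estimates quoted from table 3.1. The same function covers the later changes by passing the earlier warp strength as `old_n`.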
Figure 5-2d depicts the results for the 3rd to the 17th runs corresponding to the
1st and 15th runs with n = 4. Over time there is a decrease in average bias as subjects
adapt to the cue transformation.
Conversely, for cues which change from strong to weak (decreasing warp n), subjects generally underestimate the cue locations. On run 18, subjects are exposed to a
warp n = 2 that is weaker than the most recent warp (n = 4). In this case, cue $9|_{n=2}$
maps to cue $8|_{n=4}$ and cue $13|_{n=2}$ maps to cue $11|_{n=4}$. Figure 5-3d results show the
expected underestimation caused by decreasing warp strength.
Figure 5-4d shows the 1st and 15th exposure to warp n = 2; again bias decreases
over time.
On run 33, underestimation results when the subject is reintroduced to normal
cues n = 1 (down from n = 2) where, from table 3.1, cue $13|_{n=1}$ maps to cue $11|_{n=2}$
and cue $9|_{n=1}$ maps to cue $8|_{n=2}$ (fig. 5-5d). Because the magnitude of the location
shifts is not as drastic as in the initial change of n = 1 to n = 4, the magnitude of the
error is not as great.
Figure 5-6 shows the 1st and 8th runs following the return to normal cues.
In each case where the cues change (e.g., figures 5-1, 5-3, and 5-5), the corresponding change in bias is not as large as the differences reflected in table 3.1. Subject
training is a continuous process throughout each run, and thus errors made early in
the run may be larger than the errors later in the run (which may be reduced by
adjustments made later in the run). Additionally, subjects are notified each time a
cue is changed, and across the multiple sessions a subject participates in, he may be
able to anticipate the new cues as soon as they are presented. Finally, subjects may
not be completely adapted to the previous transformation when the cues are changed,
resulting in a smaller than predicted change in bias. Even with these circumstances,
the data still strongly reflect the systematic over- and under-estimation consistent with
adaptation (though imperfect) to each new cue transformation.
[Figure: four panels, (a) Mean response, (b) Difference plot, (c) Resolution, (d) Bias, each plotted against cue location, with curves for Run 2, Run 3, and Base.]
Figure 5-1: Runs 2 and 3: Changing from n = 1 to n = 4
[Figure: four panels, (a) Mean response, (b) Difference plot, (c) Resolution, (d) Bias, each plotted against cue location, with curves for Run 3, Run 17, and Base.]
Figure 5-2: Runs 3 and 17: Start and finish of n = 4
[Figure: four panels, (a) Mean response, (b) Difference plot, (c) Resolution, (d) Bias, each plotted against cue location, with curves for Run 17, Run 18, and Base.]
Figure 5-3: Runs 17 and 18: Changing from n = 4 to n = 2
[Figure: four panels, (a) Mean response, (b) Difference plot, (c) Resolution, (d) Bias, each plotted against cue location, with curves for Run 18, Run 32, and Base.]
Figure 5-4: Runs 18 and 32: Start and finish of n = 2
[Figure: four panels, (a) Mean response, (b) Difference plot, (c) Resolution, (d) Bias, each plotted against cue location, with curves for Run 32, Run 33, and Base.]
Figure 5-5: Runs 32 and 33: Changing from n = 2 to n = 1
[Figure: four panels, (a) Mean response, (b) Difference plot, (c) Resolution, (d) Bias, each plotted against cue location, with curves for Run 33, Run 40, and Base.]
Figure 5-6: Runs 33 and 40: Start and finish of n = 1
5.5 Estimating Adaptation
The degree of adaptation can be measured by the slope of the line that best fits
mean response as a function of $\theta'$, the normal position of the stimuli. Observation
of subject response versus normal cue location (figure 5-7) shows that response has a
roughly linear shape as a function of $\theta'$. From start to finish of n = 4 exposure (runs
3 and 17, respectively; figs. 5-7a and 5-7b) and from start to finish of n = 2 (runs
18 and 32, respectively; figs. 5-7c and 5-7d) the subject response as a function of
normal cue appears linear. However, the slope of the line relating mean response to
$\theta'$ changes over time.
The best-fit was generated by finding the line that minimizes the mean-square
error between predicted and measured subject response. Because the correct cue for
straight-ahead (light 7) remains the same as the normal cue location for straight-ahead, each line-fit was forced to contain the point where normal cue straight-ahead
is the same as subject response straight-ahead (i.e., only the slope of the line changed;
the intercept was assumed fixed).
Because some warp levels generate cues that fall outside of the normal response
range, only normal cues that fall between +60 and -60 degrees are considered. For
example, when the warp level changes from n = 1 to n = 4, cue $2|_{n=4}$ is presented
from -78 degrees and, due to his familiarity with the n = 1 space, the best the subject
can respond with is location 1. Rather than make assumptions about the adaptation
patterns, cues whose normal locations are outside of the normal response range (n = 1;
+60 to -60) are left out of adaptation calculations (see section 5.7, Impact of edges).
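The constrained line fit described above has a closed-form solution: for a line through the origin, the slope that minimizes the squared error is the sum of products divided by the sum of squared inputs. The sketch below assumes both quantities are expressed in degrees relative to straight ahead; the function name and argument layout are hypothetical.

```python
def fit_slope(normal_deg, response_deg, limit_deg=60.0):
    """Least-squares slope of response versus normal location for a line forced
    through straight ahead (0, 0). Cues whose normal location lies outside
    +/- limit_deg are left out. Returns (slope, mean-square error)."""
    pairs = [(x, y) for x, y in zip(normal_deg, response_deg) if abs(x) <= limit_deg]
    sxx = sum(x * x for x, _ in pairs)
    if sxx == 0.0:
        raise ValueError("need at least one off-center cue inside the range")
    slope = sum(x * y for x, y in pairs) / sxx
    mse = sum((y - slope * x) ** 2 for x, y in pairs) / len(pairs)
    return slope, mse

if __name__ == "__main__":
    # Synthetic example: responses follow a slope of 0.6 inside the range and are
    # pinned at the edge lights (+/- 60 degrees) for cues heard beyond the range.
    normal = [-81.79, -55.52, -35.2, 0.0, 35.2, 55.52, 81.79]
    response = [-60.0] + [0.6 * x for x in normal[1:-1]] + [60.0]
    print(fit_slope(normal, response))
```

Because the two out-of-range cues are excluded, the pinned edge responses do not pull the fitted slope away from 0.6 in this synthetic example.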
These line-fit results were compared to a transform-fit approach. Rather than
finding the best-fit slope of a line, the subject responses were fitted by varying the
warp strength, n, in the transform formula (given in section 2.2). Tabulation of the
mean-square error on a run-by-run basis (tables A.1 and A.2) showed that the line-fit
is generally better than the warp-fit. In runs where the warp-fit produced better error
results, the difference is very small (e.g., runs 33 to 40).
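One way to carry out a transform-fit is a grid search over the warp strength. The sketch below is an assumption-laden illustration: the text does not say which variable was passed through the transform or how n was searched. The step of 0.005 is only suggested by the fact that the fit values in table A.2 are all multiples of 0.005, and the search range is arbitrary.

```python
import math

def warp(theta_deg, n):
    """Evaluate f_n(theta) in degrees using the transformation of section 2.2."""
    two_theta = math.radians(2.0 * theta_deg)
    numerator = 2.0 * n * math.sin(two_theta)
    denominator = 1.0 - n ** 2 + (1.0 + n ** 2) * math.cos(two_theta)
    return 0.5 * math.degrees(math.atan2(numerator, denominator))

def fit_warp(x_deg, y_deg, n_min=0.5, n_max=5.0, step=0.005):
    """Grid search for the n that minimizes the mean-square error between
    warp(x, n) and y. Returns (best n, mean-square error)."""
    best_n, best_mse = None, float("inf")
    steps = int(round((n_max - n_min) / step))
    for k in range(steps + 1):
        n = n_min + k * step
        mse = sum((warp(x, n) - y) ** 2 for x, y in zip(x_deg, y_deg)) / len(x_deg)
        if mse < best_mse:
            best_n, best_mse = n, mse
    return best_n, best_mse

if __name__ == "__main__":
    locations = [10.0 * k for k in range(-6, 7)]
    responses = [warp(t, 1.5) for t in locations]  # synthetic data with known n
    print(fit_warp(locations, responses))
```

On the synthetic data the search recovers the warp strength used to generate it; with measured responses the returned mean-square error can be compared against that of the line fit.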
[Figure: four panels, (a) Run 3, (b) Run 17, (c) Run 18, (d) Run 32, each plotting subject response against normal location.]
Figure 5-7: Observation of linearity
Individual results are presented in figure 5-8. Rates and asymptote values vary
across subjects and are summarized in table 5.1. Rate is the time constant of the fitted exponential, expressed in runs. Subject responses that could not
be fit successfully with an exponential are listed as N/A.
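The exact form of the exponential and the fitting procedure are not given here, so the following is only a sketch under stated assumptions: the best-fit slope on the t-th run after a cue change is modeled as a + b·exp(-t/τ), with a the asymptote and τ the time constant in runs. For each candidate τ on a grid, a and b follow from ordinary linear least squares. The function name and the grid are hypothetical.

```python
import math

def fit_exponential(values, tau_grid=None):
    """Fit values[t] ~ a + b * exp(-t / tau) for t = 0, 1, 2, ...

    Returns (a, b, tau, mean-square error), where a is the asymptote and
    tau is the time constant in runs."""
    if tau_grid is None:
        tau_grid = [0.05 * k for k in range(2, 401)]  # 0.1 to 20 runs
    n = len(values)
    y_mean = sum(values) / n
    best = None
    for tau in tau_grid:
        e = [math.exp(-t / tau) for t in range(n)]
        e_mean = sum(e) / n
        var_e = sum((ei - e_mean) ** 2 for ei in e)
        if var_e == 0.0:
            continue
        # Linear least squares for b and a with the exponential as the regressor.
        b = sum((ei - e_mean) * (yi - y_mean) for ei, yi in zip(e, values)) / var_e
        a = y_mean - b * e_mean
        mse = sum((a + b * ei - yi) ** 2 for ei, yi in zip(e, values)) / n
        if best is None or mse < best[3]:
            best = (a, b, tau, mse)
    return best

if __name__ == "__main__":
    synthetic = [0.6 + 0.3 * math.exp(-t / 1.2) for t in range(15)]
    print(fit_exponential(synthetic))
```

A flat sequence of slopes gives an amplitude b near zero and an essentially arbitrary τ, which is one plausible reason a subject's rate might be reported as N/A.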
Comparing subjects, we see that all five subjects appear to adapt to the n =
4 transformation at roughly the same rate. However, it is clear that the rate of
adaptation can vary greatly between subjects when changing from strong (n = 4)
to weak (n = 2) transformations. For instance, subject LCW adapts slowly to the
n = 2 transformation when compared to subject JJP. In contrast, two subjects (MSS
and SC) appear to show no change in slope during exposure to n = 2 cues (note the
flat line fit to their data in runs 18 through 32); instead, their performance is stable
throughout this exposure period.
           runs 3-17            runs 18-32           runs 33-40
subject    asymptote   rate     asymptote   rate     asymptote   rate
JJP        0.55        0.71     0.64        0.99     0.87        1.44
JIR        0.62        0.89     0.70        3.77     0.85        3.10
LCW        0.60        1.20     0.68        6.17     0.84        1.68
MSS        0.61        1.05     0.67        N/A      0.83        2.34
SC         0.66        0.69     0.72        N/A      0.89        N/A
Table 5.1: Subject Exponential Fit Results
[Figure: one panel per subject (LCW, MSS, JIR, JJP, SC), each spanning runs 0 to 40 with a vertical range of roughly 0.5 to 0.9.]
Figure 5-8: Individual Adaptation Results
Figure 5-9 plots the best-fit line slope averaged across the five subjects as a function of run. It appears that the best-fit slope changes gradually when cue transformation changes. Consistent with [1], the average slope appears to exponentially
approach an asymptotic value as the subjects adapt to each transformation. Given
the inter-subject differences in adaptation rate, little can be said about the relative
rate of adaptation from n = 1 to n = 4 compared to adapting from n = 4 to n = 2.
But, the rate of adaptation is roughly consistent with the average rate of adaptation
in previous experiments [1].
The average asymptote of adaptation across subjects when n = 4 is 0.61 (with
a standard deviation of 0.04) and roughly 0.68 (with a standard deviation of 0.03)
when n = 2. These values are comparable to the average values for asymptotes of
previous experiments where n = 4 (asymptote of 0.59 with a standard deviation of
0.07) and n = 2 (asymptote of 0.73 with a standard deviation of 0.04) [1] especially
when inter-subject variability is considered.
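The averages quoted above can be reproduced from the per-subject asymptotes in table 5.1. The sketch below reads that table as asymptote and rate pairs in subject order JJP, JIR, LCW, MSS, SC, and uses the sample standard deviation; both choices are assumptions.

```python
import statistics

# Asymptotes from table 5.1, in subject order JJP, JIR, LCW, MSS, SC.
ASYMPTOTES = {
    4: [0.55, 0.62, 0.60, 0.61, 0.66],  # runs 3-17
    2: [0.64, 0.70, 0.68, 0.67, 0.72],  # runs 18-32
}

def summarize(values):
    """Return (mean, sample standard deviation)."""
    return statistics.mean(values), statistics.stdev(values)

if __name__ == "__main__":
    for n, values in ASYMPTOTES.items():
        mean, sd = summarize(values)
        print(f"n = {n}: mean {mean:.2f}, standard deviation {sd:.2f}")
```

Rounded to two decimals, these reproduce the 0.61 and 0.68 averages and the 0.04 and 0.03 standard deviations given in the text.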
[Figure: "Adaptation". Average best-fit slope (roughly 0.55 to 0.95) plotted against runs (0 to 40).]
Figure 5-9: Adaptation over runs
5.6 Imperfection in auditory cues
The unwarped HRTFs used in the experiment are based on measurements taken by
Wightman [3] from the subject SDO, a petite female. Because of the original subject's
smaller head, subject interpretation of the audio cues is slightly skewed. The error
introduced is predictable and can be accounted for by considering the effects of only
the ITD associated with the HRTF.
For some angle $\theta$ there is an associated $ITD(\theta)$ for each subject. Assuming that
Wightman's subject SDO has a head smaller than any subject I use, interaural delays
presented to my subjects will be smaller than normal for a source at a particular
position. That is, angle $\theta_x$ normally gives rise to $ITD_{SDO}(\theta_x)$ and $ITD_{test\text{-}subject}(\theta_x)$
where, generally
$$|ITD_{SDO}(\theta_x)| < |ITD_{test\text{-}subject}(\theta_x)|$$
because of SDO's smaller head. When a source from $\theta_x$ is presented, even for normal
cues (n = 1), the subject will perceive the source to be at some position $\alpha$ with $|\alpha| < |\theta_x|$.
While this analysis explains systematic errors in localization (whereby the magnitude of the source angle is underestimated) for normal cues, these errors are very
small compared to the errors introduced when the auditory cues are transformed (fig.
2-1).
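The size of this effect can be illustrated with a simple head model. The sketch below swaps in the Woodworth spherical-head approximation, ITD = (r/c)(θ + sin θ), which is not used in the source, and the head radii are hypothetical values chosen only to show the direction of the error.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, approximate value for air at room temperature

def itd(theta_deg, head_radius_m):
    """Woodworth spherical-head ITD in seconds for |theta| <= 90 degrees."""
    theta = math.radians(abs(theta_deg))
    value = head_radius_m / SPEED_OF_SOUND * (theta + math.sin(theta))
    return math.copysign(value, theta_deg)

def perceived_angle(theta_deg, source_radius_m, listener_radius_m):
    """Angle whose ITD on the listener's head matches the ITD measured on the
    head used for the HRTFs (found by bisection, since the ITD grows with angle)."""
    target = abs(itd(theta_deg, source_radius_m))
    if target >= itd(90.0, listener_radius_m):
        return math.copysign(90.0, theta_deg)
    lo, hi = 0.0, 90.0
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if itd(mid, listener_radius_m) < target:
            lo = mid
        else:
            hi = mid
    return math.copysign(0.5 * (lo + hi), theta_deg)

if __name__ == "__main__":
    # Hypothetical radii: 8.0 cm for the measured head, 8.75 cm for the listener.
    for angle in (10, 30, 60):
        print(angle, round(perceived_angle(angle, 0.080, 0.0875), 1))
```

Under these assumptions every off-center source is heard slightly closer to straight ahead than intended, which is the underestimation described above; intensity and spectral cues are ignored, as in the ITD-only argument of this section.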
5.7 Impact of edges
Data at the extremes of the testing range must be handled differently. For example,
between the second and third runs where the cues change from n = 1 to n = 4, the
auditory range changes from +60 to -60 when n = 1 to +82 to -82 when n = 4.
Because of this change, the range of auditory cues exceeds the range of possible
response positions whenever n > 1.
Because subjects are not instantly familiar with the transformed auditory space,
they are forced to interpret the cues in the context of the old auditory space. When
n = 4 is first introduced, subjects are accustomed to normal cues (n = 1). For
instance, with n = 4 the normal cues for auditory sources 1 through 4 and 10 through
13 fall outside the range of responses (+60 to -60 degrees). Under the expanded range,
it is likely that when the subject initially hears any cue less than 5 or greater than 9,
they will answer 1 or 13, respectively. The difference plot in figure 5-1b, for example,
reflects this effect by the sudden decrease in error occurring before cue 4 and after
cue 10. The small error at the extremes results from the fact that the response range
available to the subjects limits the errors possible at the edge of the range.
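The edge effect can be sketched by combining the transformation with the limited response range. The example assumes a listener who still interprets every cue as if n = 1 and whose answer is simply limited to lights 1 through 13; real responses are discrete and noisy, so this is only a caricature of the pattern in the difference plot. The function names are hypothetical.

```python
import math

def warp(theta_deg, n):
    """Evaluate f_n(theta) in degrees using the transformation of section 2.2."""
    two_theta = math.radians(2.0 * theta_deg)
    numerator = 2.0 * n * math.sin(two_theta)
    denominator = 1.0 - n ** 2 + (1.0 + n ** 2) * math.cos(two_theta)
    return 0.5 * math.degrees(math.atan2(numerator, denominator))

def unadapted_error(light, n, n_lights=13, spacing_deg=10.0):
    """Predicted response error, in light numbers, for a listener who hears the
    warped cue as a normal (n = 1) cue and can only answer 1..n_lights."""
    center = (n_lights + 1) / 2
    heard_deg = warp((light - center) * spacing_deg, n)
    response = center + heard_deg / spacing_deg
    clipped = min(max(response, 1), n_lights)
    return clipped - light

if __name__ == "__main__":
    for light in range(1, 14):
        print(light, round(unadapted_error(light, 4), 2))
```

With n = 4 the predicted error is largest in magnitude around lights 5 and 9 and shrinks steadily toward lights 1 and 13, where the limited response range leaves no room for error.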
To minimize error introduced by these edges, the edge data is treated differently
in the calculation of adaptation.
Chapter 6
Summary
Over the two-hour test period, subjects are able to adapt to the various changes introduced into their auditory environment. Error and bias plots show systematic error
and adaptation. Error and bias values always decrease as exposure to a particular
warp-strength continues. The mean graphs also demonstrate adaptation as subject
response consistently shifts towards the base line.
Other indications of adaptation are demonstrated by systematic over- and underestimation at instances where warp strength changes. A weak to strong cue change
(run 2 to run 3) produces an overestimation of cue distance from the center while strong
to weak cue changes (run 17 to run 18 and run 32 to run 33) lead to underestimation
of cue locations with respect to the center.
Adaptation can be summarized by the slopes of the line generated by normal cue
versus subject response. In this experiment, adaptation happens at a rate comparable
to adaptation seen in previous experiments when changing from a weak to a strong
warp (n = 1 to n = 4), but is inconsistent across subjects when changing from strong
to weak transforms (n = 4 to n = 2 and n = 2 to n = 1). This difference may be the
result of the magnitude of the change or the direction of the change.
A previous model of adaptation [1] predicts that the exponential rate of adaptation
is independent of the order of runs. Current results are consistent with this prediction
for the initial change in transformation, but show that subject differences can occur
with subsequent cue changes. The same model predicts that the asymptote to which
subjects adapt depends only on the transform strength. The asymptote values in
current experiments are quantitatively consistent with this model.
Appendix A
Warp and Line Fit Results
run    fit-value    MSE
1      0.915000     0.062621
2      0.876000     0.041815
3      0.688000     0.139680
4      0.641000     0.137652
5      0.621000     0.143011
6      0.617000     0.163654
7      0.609000     0.162175
8      0.612000     0.169945
9      0.604000     0.221647
10     0.609000     0.256640
11     0.632000     0.198166
12     0.608000     0.315373
13     0.594000     0.341567
14     0.606000     0.299267
15     0.602000     0.300367
16     0.591000     0.467556
17     0.592000     0.186458
18     0.651000     0.216157
19     0.657000     0.147900
20     0.654000     0.143736
21     0.671000     0.188446
22     0.665000     0.205563
23     0.673000     0.138698
24     0.661000     0.166358
25     0.673000     0.166455
26     0.678000     0.158415
27     0.679000     0.132656
28     0.683000     0.176875
29     0.680000     0.177086
30     0.674000     0.133242
31     0.701000     0.186548
32     0.691000     0.158317
33     0.777000     0.155936
34     0.805000     0.114007
35     0.820000     0.072180
36     0.834000     0.070147
37     0.848000     0.055556
38     0.852000     0.055053
39     0.866000     0.065929
40     0.853000     0.058607
Table A.1: Line-Fit values
run    fit-value    MSE
1      0.875000     0.076250
2      0.810000     0.034485
3      1.555000     1.976269
4      1.310000     1.627068
5      1.215000     1.313647
6      1.210000     1.499365
7      1.175000     1.359899
8      1.185000     1.357868
9      1.160000     1.529519
10     1.175000     1.303260
11     1.275000     1.545372
12     1.180000     1.494498
13     1.120000     1.215115
14     1.160000     1.232352
15     1.150000     1.255091
16     1.110000     1.184178
17     1.110000     1.248420
18     0.855000     0.174047
19     0.890000     0.181483
20     0.880000     0.154377
21     0.920000     0.166857
22     0.905000     0.127646
23     0.930000     0.228298
24     0.900000     0.190203
25     0.925000     0.144188
26     0.940000     0.219329
27     0.935000     0.098412
28     0.945000     0.134091
29     0.945000     0.214962
30     0.930000     0.155193
31     0.995000     0.204137
32     0.965000     0.175897
33     0.755000     0.150068
34     0.755000     0.079870
35     0.755000     0.044412
36     0.755000     0.052217
37     0.770000     0.035308
38     0.775000     0.037839
39     0.795000     0.057732
40     0.780000     0.047575
Table A.2: Warp-Fit Values
Bibliography
[1] Barbara G. Shinn-Cunningham. Supernormal Auditory Localization Cues in an
Auditory Virtual Environment. PhD thesis, Massachusetts Institute of Technology,
1994.
[2] Elizabeth M. Wenzel. Localization in virtual acoustic displays. Presence, 1(1):80-107, 1992.
[3] F.L. Wightman and D.J. Kistler. Headphone simulation of free-field listening.
Journal of the Acoustical Society of America, 85:858-867, 1989.