Adaptation to Auditory Localization Cues from an Enlarged Head

by

Salim Kassem

B.S., Electrical Engineering (1996)
Pontificia Universidad Javeriana

Submitted to the Department of Electrical Engineering and Computer Science in Partial Fulfillment of the Requirements for the Degree of Master of Science in Electrical Engineering and Computer Science at the Massachusetts Institute of Technology

June 1998

© 1998 Massachusetts Institute of Technology. All rights reserved.

Signature of Author: Department of Electrical Engineering and Computer Science, May 20, 1998

Certified by: Nathaniel I. Durlach, Senior Research Scientist of Electrical Engineering and Computer Science, Thesis Supervisor

Accepted by: Arthur C. Smith, Chairman, Department Committee on Graduate Students

ABSTRACT

Auditory localization cues for a double-size head were simulated using an auditory virtual environment in which the acoustic cues were presented to subjects through headphones. The goals of the study were to determine whether better-than-normal resolution could be achieved and to analyze how subjects adapt to this type of transformation of spatial acoustic cues. This work follows that of Shinn-Cunningham (1994, 1998) and Shinn-Cunningham, Durlach and Held (1998a, 1998b), in which a nonlinear remapping of the normal space filters was implemented. The double-size head's acoustic cues were simulated by frequency-scaling normal Head Related Transfer Functions (HRTFs). As a result, the Interaural Time Differences (ITDs) presented for every position were doubled.
Therefore, even though the relationship between the location a naive listener associates with a stimulus and its correct location is not linear, the transformation is linear in ITD space. Because ITDs were doubled, some of the ITDs presented to the listener were larger than the largest naturally occurring ITDs, which proved to be a problem. Bias and resolution were the two quantitative measures used to study performance and to examine changes in performance over time. In addition, the Minimum Audible Angles for normal and altered cues were measured and used to estimate subjects' sensitivity. In the experiments, mean response and bias changed over time as expected, clearly showing the adaptation process. Resolution results were less consistent: altered cues gave better-than-normal resolution around the middle positions, but normal cues provided better overall performance. When correct-answer feedback was used, resolution behaved as expected; when feedback was not presented, the results were consistent with subjects attending to the whole range of possible cues throughout the experiment (i.e., the internal noise was large and constant). Previous work suggested that mean response, bias and resolution depend on each other and share the same adaptation rate. However, the no-feedback condition showed that resolution can be independent of the other quantities. Finally, estimates of sensitivity indicated that resolution is strongly related to the type of cues used and that changes in resolution depend directly on the total internal noise.

Thesis Supervisor: Nathaniel I. Durlach
Title: Senior Research Scientist of Electrical Engineering and Computer Science

ACKNOWLEDGMENTS

This work is dedicated to my wife, who gave me all the support, all the friendship and all the love I needed. Thank you for believing in me. To my parents and siblings, for helping me be who I am. To my father; without his effort I would not be here. To my true friends.

I want to thank Nathaniel Durlach for his support and for giving me the opportunity to learn wonderful things. Special thanks to Barbara Shinn-Cunningham for all her unconditional help; without her guidance, this work could never have been finished. I also want to thank Lorraine Delhorne, Jay Desloge and Andy Brughera for their kind collaboration.

TABLE OF CONTENTS

ABSTRACT
ACKNOWLEDGMENTS
1. INTRODUCTION
2. BACKGROUND
2.1. NORMAL AUDITORY LOCALIZATION
2.2. AUDITORY VIRTUAL ENVIRONMENTS
2.3. SENSORY IMPROVEMENT
3. ADAPTATION TO SUPERNORMAL CUES
3.1. MOTIVATION
3.2. SUPERNORMAL AUDITORY LOCALIZATION: DOUBLE-SIZE HEAD
3.3. EQUIPMENT AND EXPERIMENTAL SETUP
3.4. ADAPTATION EXPERIMENT WITH FEEDBACK
3.4.1. Experiment Description
3.4.2. Analysis
3.4.3. Expected Results
3.4.4. Results
3.4.5. Error in Measured HRTFs
3.5. ADAPTATION EXPERIMENT WITHOUT FEEDBACK
3.5.1. Experiment Description
3.5.2. Expected Results
3.5.3. Results
4. JUST NOTICEABLE DIFFERENCE
4.1. MOTIVATION
4.2. BACKGROUND
4.3. NEW HRTFs
4.4. EQUIPMENT AND EXPERIMENTAL SETUP
4.5. EXPERIMENT DESCRIPTION
4.6. EXPECTED RESULTS
4.7. RESULTS
5. MODEL OF ADAPTATION
5.1. REMAPPING FUNCTION
5.2. AVERAGE PERCEIVED POSITION
6. RELATING JND AND RESOLUTION
6.1. BACKGROUND
6.2. RESULTS
7. CONCLUSION
7.1. SUMMARY
7.2. DISCUSSION
7.3. FUTURE WORK
REFERENCES

1. INTRODUCTION

In recent years, computing technology has provided us with more sophisticated ways of gathering data, increasing the amount and complexity of the information presented to users. As a result, the systems that work with this information are more complex and more difficult to operate and understand. Today's graphical computer interfaces are a first step toward easing the resulting burden of displaying information.
Lately, attention has been given to a more sophisticated interface, referred to as virtual reality, whose objective is to provide a more efficient and natural way of presenting and manipulating information by incorporating three-dimensional spatial cues in the display (Wenzel, 1992). A closely related technology is teleoperation, in which a human operator interacts with a real environment via a human-machine interface and a telerobot as if he were standing in the remote working area. Ideally, the operator should see, hear, and feel what the telerobot sees, hears, and feels. Moreover, the telerobot can provide additional information that is useful to the operator (e.g., temperature, speed). A teleoperator system is normally used to interact with a remote, inaccessible or hazardous environment, protecting the physical integrity of the operator while permitting him to control and accomplish a specific task. The signals in the telerobot's environment are sensed, sent back, and displayed to the human operator; in the same way, the actions taken by the operator in response to those signals are transmitted to the telerobot and used to control its actions (Durlach, 1991). In a virtual-environment system, the same kind of human-machine interface is used, but a computer simulation replaces the telerobot and its environment. The purpose of a teleoperator system is to extend the operator's sensory-motor system in order to facilitate manipulation of the physical environment, whereas the objective of a virtual-environment system is to study or alter the human operator. General information on teleoperators and virtual environments can be found in Vertut and Coiffet (1986), Sheridan (1987), Bolt (1984), Foley (1987), and Durlach and Mavor (1995). In the past, the visual modality was the primary channel for presenting spatial information to a human operator. More recently, however, the auditory system has been recognized as an alternative channel for delivering such information.
Acoustic signals are very useful because they can be heard from any source direction, they tend to produce an alerting or orienting response, and they can be detected faster than visual signals (Wenzel, 1992). In this project, attention is given only to the auditory localization features of the human-machine interface, with particular consideration of how to provide the operator with better-than-normal localization ability, so-called supernormal auditory localization (Durlach, Shinn-Cunningham, and Held, 1993). This approach attempts to provide acoustic cues that yield better effective spatial resolution than normal cues do, by increasing the change in the physical acoustic cues that results when the source position changes. Improving the effective resolution is desirable because the normal human auditory localization system has extremely poor resolution in azimuth for sources off to the side, in elevation, and in distance; its resolution is at least moderate only in azimuth for sources in front. In other words, our spatial resolution for acoustic sources is relatively poor, especially compared with visual spatial resolution. Durlach, Shinn-Cunningham, and Held (1993) proposed several ways to increase directional resolution by using localization cues that would improve the just-noticeable difference (JND), i.e., the minimum separation at which a listener can resolve two adjacent spatial positions. Suggested methods for producing supernormal cues include simulating the localization cues of an enlarged head, remapping the normal localization cues to increase resolution in some regions of the azimuth plane while decreasing it in others, and exponentiating the complex interaural ratio at all frequencies (Durlach and Pang, 1986).
As Shinn-Cunningham (1998a) noted, these approaches should improve the subject's ability to resolve sources in JND-type experiments, but their effects on identification tasks using a larger range of physical stimuli are not clear. In addition, supernormal localization cues will displace the apparent location of the source for a naive listener when he is first exposed to the remapped cues. Adaptation to the new cues is said to have taken place to the extent that the mean localization error diminishes over time with training. Building on the results of Shinn-Cunningham (1994) and Shinn-Cunningham, Durlach and Held (1998a, 1998b), this project studies supernormal auditory localization cues using the enlarged-head approach (Durlach, Shinn-Cunningham, and Held, 1993). Auditory localization cues for a double-size head will be simulated and presented to the subjects during the experiments. The main goals of this project are to analyze how subjects adapt to a transformation of spatial acoustic cues that is approximately linear, to extend the quantitative model of adaptation developed from the nonlinear adaptation results (Shinn-Cunningham, 1998), and to determine whether better-than-normal resolution is achieved with the double-size head cues. The results of this experiment will also be compared with those of Shinn-Cunningham (1994 and 1998) and Shinn-Cunningham, Durlach and Held (1998a, 1998b) to explore how different types of remapping affect adaptation to remapped auditory spatial cues. Following their work, bias (a measure of response error in units of standard deviation) and resolution (the ability to reliably differentiate between nearby stimulus locations) are the two quantitative measures used to analyze subjects' performance over the course of the experiments.

2. BACKGROUND

2.1. Normal Auditory Localization

The classic duplex theory (Lord Rayleigh, 1907) states that interaural differences in time of arrival and interaural differences in intensity are the two primary cues used for auditory localization (Figure 1). Interaural time differences (ITDs) arise when a sound source is to one side of the head, since the sound reaches the nearer ear first¹. If a sound source is far enough from the head, the sound's wavefront is approximately planar when it reaches the head, and the distance the sound must travel to reach each of the two ears differs depending on source location. Assuming a spherical model of the head with radius $r$, the difference in travel distance for a source in the horizontal plane at an angle $\theta$ (in radians) is given by (Figure 2):

$$\Delta d = r\,(\theta + \sin\theta) \tag{1}$$

Assigning a radius of 8.75 cm to the spherical head, and taking the speed of sound $c$ to be 343 m/sec, the interaural time difference can be expressed as:

$$\mathrm{ITD} = \frac{\Delta d}{c} = 255\times 10^{-6}\,(\theta + \sin\theta)\ \text{[sec]} \tag{2}$$

Figure 3 shows predictions of ITD based on Equation 2 together with measurements of ITD for adult males (Mills, 1972). The duplex theory states that the relative left-right position of a sound source is determined by ITDs for low-frequency sounds and by IIDs for high-frequency sounds. As the duplex theory explains, ITDs give good perceptual cues for sound location only at low frequencies; at frequencies higher than about 1500 Hz, phase ambiguities occur. The phase information becomes ambiguous at high frequencies because the wavelengths are smaller than the distance between the ears.

¹ The sound will reach the farther ear 29 µsec later for each additional centimeter it must travel (Mills, 1972).

[Figure 1 diagram: "Interaural Time Differences (ITDs): sources off to one side arrive sooner at the closer ear"; "Interaural Intensity Differences (IIDs): sources off to one side are louder at the closer ear due to head-shadowing."]
Figure 1.
The duplex theory postulates that interaural intensity differences (IIDs) and interaural time differences (ITDs) are the two primary cues for auditory localization (from Wenzel, 1992).

[Figure 2 diagram: path-length difference $r(\theta + \sin\theta)$ between the left and right ears for a distant source.]
Figure 2. Differences between the distances of the ears from a sound source that is far away and can be represented as a plane wavefront (from Mills, 1972).

On the other hand, sources off to one side of the head are louder at the closer ear due to head-shadowing: the head acts as a low-pass filter for the far ear, which makes IIDs important localization cues at high frequencies. This acoustic effect occurs because at high frequencies the wavelengths are small relative to the size of the head, so the head casts an acoustic shadow. It has been found that ITD is the major cue for determining the location of sources along the horizontal plane, and that the spectral peaks and notches produced by the filtering effect of the pinnae (mainly above 5 kHz) are important for determining source elevation. Even though the duplex theory provides a clean and simple explanation of how the lateral position of a sound is determined, it has several limitations. For example, listeners use the interaural time delay of the envelope of high-frequency sounds for localization, even though they do not use the ITD of the fine structure at these frequencies. The direction-dependent filtering that occurs when sound waves impinge on the outer ears and pinnae also provides very important localization cues. It has been shown that the spectral shaping by the pinnae is highly direction-dependent (Shaw, 1974 and 1975), and that the pinnae are responsible for the externalization of sounds (Plenge, 1974).

[Figure 3 plot: ITD versus angle from directly ahead.]
Figure 3. Interaural time difference (ITD) as a function of the position of a source of clicks. X: measured values from five subjects. O: values computed from the mathematical approximation (from Mills, 1972).
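Equations 1 and 2 are easy to evaluate numerically. The Python sketch below is illustrative only: it assumes the 8.75 cm radius and 343 m/sec speed of sound used above, the spherical-head model is taken to apply between 0° and 90°, and the function name `itd_seconds` is hypothetical. It also puts numbers on the two frequency arguments of this section. The frequency at which one period equals the largest ITD lands close to the 1500 Hz limit quoted for ITD cues. At 5 kHz, the wavelength is well below the head diameter, which is the regime where head shadowing matters.

```python
import math

HEAD_RADIUS_M = 0.0875    # spherical-head radius r (m), as assumed in the text
SPEED_OF_SOUND = 343.0    # speed of sound c (m/s)

def itd_seconds(theta_rad, r=HEAD_RADIUS_M, c=SPEED_OF_SOUND):
    """Equations 1-2: ITD = r*(theta + sin(theta)) / c, for 0 <= theta <= pi/2."""
    delta_d = r * (theta_rad + math.sin(theta_rad))   # Equation 1: path difference (m)
    return delta_d / c                                # Equation 2: time difference (s)

for deg in (0, 15, 30, 45, 60, 75, 90):
    print(f"{deg:2d} deg -> ITD = {itd_seconds(math.radians(deg)) * 1e6:6.1f} us")

max_itd = itd_seconds(math.pi / 2)        # largest ITD: source directly to the side
us_per_cm = 0.01 / SPEED_OF_SOUND * 1e6   # extra delay per centimeter of extra path

# Frequency whose period equals the largest ITD; above roughly this frequency a
# given interaural phase no longer corresponds to a single delay.
f_period_equals_max_itd = 1.0 / max_itd

# Wavelength at 5 kHz compared with the head diameter (head-shadow regime).
wavelength_5k = SPEED_OF_SOUND / 5000.0
head_diameter = 2 * HEAD_RADIUS_M

print(f"max ITD {max_itd * 1e6:.0f} us, {us_per_cm:.1f} us per cm, "
      f"period equals max ITD at {f_period_equals_max_itd:.0f} Hz, "
      f"wavelength at 5 kHz {wavelength_5k * 100:.1f} cm "
      f"vs head diameter {head_diameter * 100:.1f} cm")
```

With these constants, the largest ITD is about 656 µsec. The extra delay is about 29 µsec per centimeter, matching the figure cited from Mills (1972). The period equals the largest ITD near 1525 Hz.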
Therefore, the auditory system's method for determining source position depends on direction-dependent filtering that occurs when the received sound wave interacts with the head, ears, and torso of the listener. Let $X(\omega)$ be the complex spectrum of the sound source, and let $Y_L(\omega,\theta,\phi)$ and $Y_R(\omega,\theta,\phi)$ be the complex spectra of the signals received at the left and right ears, respectively. Then, for sources that are sufficiently far from the listener (so that distance affects only the overall level of the received signals), and for anechoic listening conditions, one can write:

$$Y_L(\omega,\theta,\phi) = r^{-1}\, H_L(\omega,\theta,\phi)\, X(\omega) \tag{3a}$$

$$Y_R(\omega,\theta,\phi) = r^{-1}\, H_R(\omega,\theta,\phi)\, X(\omega) \tag{3b}$$

where $r$ is the distance from the head to the source, and $H_L(\omega,\theta,\phi)$ and $H_R(\omega,\theta,\phi)$ are the space filters, or Head Related Transfer Functions (HRTFs), for each ear, describing the direction-dependent effects of the head and body. The HRTFs depend on the frequency $\omega$, the azimuth $\theta$ of the sound source relative to the head, and the elevation $\phi$ of the source relative to the head. The auditory system compares the signals received at the two ears in a manner that can usefully be represented mathematically by forming the ratio:

$$\frac{Y_L(\omega,\theta,\phi)}{Y_R(\omega,\theta,\phi)} = \frac{H_L(\omega,\theta,\phi)}{H_R(\omega,\theta,\phi)} \tag{4}$$

In this ratio, the effects of $r$ and of $X(\omega)$ cancel, and the ratio depends only on $\omega$, $\theta$, and $\phi$. The auditory system can therefore determine the location of the sound source from the ratio, independent of the source characteristics. The magnitude and the phase of the ratio of the signals at the two ears for a source at direction $(\theta,\phi)$ correspond to the interaural intensity difference (IID) and the interaural time difference (ITD), respectively.
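The cancellation in Equation 4 can be checked numerically. The Python sketch below uses made-up impulse responses rather than measured HRTFs. The hypothetical left-ear response is a unit impulse, and the right-ear response is delayed by 15 samples and halved in amplitude, as for a source on the left. The sketch convolves an arbitrary source waveform with both responses, scales the results by $1/r$, and evaluates the discrete-time Fourier transform at a single low frequency where the phase has not yet wrapped. The sampling rate, the analysis frequency, and the sign convention (positive ITD means the right ear lags) are assumptions of this sketch.

```python
import cmath
import math

def dtft(x, f, fs):
    """Discrete-time Fourier transform of the finite sequence x at frequency f (Hz)."""
    return sum(v * cmath.exp(-2j * math.pi * f * n / fs) for n, v in enumerate(x))

def convolve(x, h):
    """Direct-form linear convolution of two finite sequences."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xv in enumerate(x):
        for j, hv in enumerate(h):
            y[i + j] += xv * hv
    return y

fs = 50_000.0                      # hypothetical sampling rate (Hz)
delay = 15                         # right-ear lag in samples (15 / fs = 300 us)
h_left = [1.0]                     # toy left-ear impulse response
h_right = [0.0] * delay + [0.5]    # toy right-ear response: later and 6 dB weaker

x = [1.0, -0.3, 0.2]               # arbitrary source waveform, standing in for X
r = 3.0                            # source distance: scales both ears equally
y_left = [v / r for v in convolve(x, h_left)]
y_right = [v / r for v in convolve(x, h_right)]

f = 500.0                          # analysis frequency, low enough that phase does not wrap
ratio_signals = dtft(y_left, f, fs) / dtft(y_right, f, fs)   # left side of Equation 4
ratio_hrtfs = dtft(h_left, f, fs) / dtft(h_right, f, fs)     # right side of Equation 4

iid_db = 20 * math.log10(abs(ratio_signals))              # magnitude of ratio -> IID
itd_s = cmath.phase(ratio_signals) / (2 * math.pi * f)    # phase of ratio -> ITD

print(f"IID = {iid_db:.2f} dB, ITD = {itd_s * 1e6:.1f} us")
```

The source waveform `x` and the distance `r` can be changed freely, and the ratio stays the same, as Equation 4 predicts. At frequencies where the interaural phase exceeds π, recovering ITD from phase would need unwrapping. That is the phase ambiguity discussed earlier.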
Even though interaural processing (i.e., computation of IID and ITD) offers useful localization information, directional ambiguities can occur: (i) distance is not conveyed, because for distant sources its only effect is an overall level change, which cancels in the interaural ratio; and (ii) front-back confusions appear because of the so-called cone of confusion² (Mills, 1972). Head movements and monaural processing help to resolve front-back ambiguities. Head movements cause changes in IID and ITD that differ for a source in front of or behind the listener. Also, a priori knowledge about the transmitted signal $X(\omega)$ allows monaural spectral cues to be used to estimate the space filters $H_L(\omega,\theta,\phi)$ and $H_R(\omega,\theta,\phi)$ from the signals $Y_L(\omega,\theta,\phi)$ and $Y_R(\omega,\theta,\phi)$ received at the two ears.

² Cone-of-confusion errors arise because the ITD or IID produced by one source position is roughly equal to that produced by a source located anywhere on a hyperbolic surface (roughly a cone) whose axis is the interaural axis.

Wightman and Kistler (1992) found that low-frequency ITDs are the dominant cues for localization of broadband sound sources. Although ITD cues are dominant, when the low-frequency components of a stimulus are removed, direction is determined by IID and spectral-shape cues. In other words, when low-frequency interaural time cues are present, they override the IID and spectral-shape cues present in other frequency ranges. It follows that whenever low-frequency ITD conflicts with any other cue, sound localization is determined mainly by ITD. The ITD is used primarily to establish the locus of possible source locations (i.e., to determine on which cone of confusion the sound source lies), while IID and spectral filtering help to resolve any ambiguity in the ITD information. Integration of all available cues leads to accurate localization (Wightman and Kistler, 1992). More information about normal auditory localization can be found in Blauert (1983), Mills (1972), Wightman, Kistler, and Perkins (1987), Wenzel (1992) and Durlach, Shinn-Cunningham and Held (1993).

2.2. Auditory Virtual Environments

In order to better understand the importance of auditory cues such as ITD, IID and pinnae effects, and to enhance their capabilities, researchers have begun to use auditory virtual environments to simulate acoustic sources around the listener. This approach gives the experimenter good control of the stimulus while creating rich and realistic localization cues. One class of simulation technique derives from the measurement of Head Related Transfer Functions (HRTFs). Using a normative mannequin such as KEMAR (Knowles Electronics, Inc.), it is possible to obtain good estimates of the acoustic effects of the head and pinnae on sounds reaching the listener's eardrum as a function of source position. Implemented as finite impulse response (FIR) filters, these HRTFs can be used to filter an arbitrary sound to give it spatial characteristics (i.e., to simulate a sound coming from a predetermined direction). Even though HRTFs provide good acoustic cues, the localizability of a sound also depends on other factors, such as its original spectral content (e.g., narrowband sounds such as pure tones are harder to localize than broadband sounds). Individual differences in the pinnae appear to be very important for some aspects of localization, most notably resolving cone-of-confusion errors. Nevertheless, several studies show that most listeners can obtain useful directional information from a typical HRTF, suggesting that the basic properties of the HRTFs carry much of the important localization information (Wenzel, 1992). Using digital signal processing (DSP) systems, acoustic cues can be simulated in real time to generate spatial auditory cues over headphones.
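A real-time DSP system of this kind must do two things. It must pick the measured HRTF closest to the source direction relative to the listener's head, and it must convolve the incoming audio with that HRTF pair block by block. The Python sketch below illustrates both steps under several stated assumptions:

- the 10° measurement grid is the one described for subject SOS below;
- head yaw is measured in the same direction as azimuth;
- yaw and pitch are simply subtracted, ignoring roll and the coupling between angles;
- nearest-neighbor selection stands in for whatever interpolation a real system uses.

The 4-tap impulse responses are made-up stand-ins for measured 127-tap HRTFs, and all names are hypothetical. Overlap-add is one common way to organize time-domain block convolution, not necessarily the method of any particular system.

```python
import math

# Measurement grid assumed from the SOS description (10-degree steps).
AZIMUTHS = [180 - 10 * k for k in range(36)]    # 180, 170, ..., -170 degrees
ELEVATIONS = [80 - 10 * k for k in range(14)]   # 80, 70, ..., -50 degrees

def wrap_degrees(angle):
    """Wrap an angle in degrees to the interval (-180, 180]."""
    a = (angle + 180.0) % 360.0 - 180.0          # now in [-180, 180)
    return 180.0 if a == -180.0 else a

def select_hrtf_position(source_az, source_el, head_yaw, head_pitch):
    """Nearest measured (azimuth, elevation) for a source fixed in the room.

    Simplification: yaw and pitch are subtracted independently; roll is ignored.
    """
    rel_az = wrap_degrees(source_az - head_yaw)
    rel_el = source_el - head_pitch
    az = wrap_degrees(10 * round(rel_az / 10))   # snap to the 10-degree azimuth grid
    el = min(max(10 * round(rel_el / 10), ELEVATIONS[-1]), ELEVATIONS[0])  # clamp
    return az, el

class StreamingFIR:
    """Block-by-block FIR filtering (overlap-add) with one ear's impulse response."""

    def __init__(self, taps):
        self.taps = list(taps)
        self.tail = [0.0] * (len(self.taps) - 1)

    def process(self, block):
        n, m = len(block), len(self.taps)
        out = [0.0] * (n + m - 1)
        for i, x in enumerate(block):
            for j, h in enumerate(self.taps):
                out[i + j] += x * h
        for k, t in enumerate(self.tail):        # add overhang from earlier blocks
            out[k] += t
        self.tail = out[n:]                       # save this block's overhang
        return out[:n]

def convolve(x, h):
    """Direct full convolution, used only to check the streaming result."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, a in enumerate(x):
        for j, b in enumerate(h):
            y[i + j] += a * b
    return y

# Head turned 10 degrees; source fixed at -30 degrees azimuth, 0 degrees elevation.
az, el = select_hrtf_position(source_az=-30, source_el=0, head_yaw=10, head_pitch=0)
print("use HRTF pair measured at azimuth", az, "elevation", el)

# Hypothetical 4-tap responses standing in for a measured 127-tap HRTF pair.
hrir_left = [0.9, 0.3, -0.1, 0.05]
hrir_right = [0.0, 0.0, 0.4, 0.2]   # later and weaker: source on the left

signal = [math.sin(0.3 * t) for t in range(20)]
left, right = StreamingFIR(hrir_left), StreamingFIR(hrir_right)
out_left, out_right = [], []
for start in range(0, len(signal), 8):            # 8-sample blocks; the last is shorter
    block = signal[start:start + 8]
    out_left += left.process(block)
    out_right += right.process(block)

err_left = max(abs(a - b) for a, b in zip(out_left, convolve(signal, hrir_left)))
err_right = max(abs(a - b) for a, b in zip(out_right, convolve(signal, hrir_right)))
print(f"max deviation from direct convolution: {max(err_left, err_right):.1e}")
```

The final check confirms that block-by-block processing reproduces direct convolution of the whole signal, up to rounding error. The overhang left in each filter's `tail` is what a streaming system would carry into the next block.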
These systems use time-domain convolution to achieve the desired real-time performance, reproducing a free-field experience. Using a head-tracking device attached to the headphones, the system can determine the current yaw, pitch and roll of the head and decide which set of HRTFs is needed to present a source at a particular position. The DSP system then filters the input signal with the appropriate HRTF. Even if the subject's head is moving freely, the head tracker allows a source to be presented at a fixed location by calculating the azimuth and elevation of the source relative to the head. Of course, the term real time is a relative one, since the appropriate HRTF cannot be selected instantaneously: some processing time is needed for all the computations. Because of memory and computation-time constraints, DSP systems must make several approximations and simplifications, sacrificing some fidelity. A typical HRTF record consists of a pair of impulse responses (one for the right ear and one for the left ear) measured from several equidistant locations around the subject. The HRTFs are then estimated by canceling the effects of the loudspeakers, the stimulus, and the microphone responses from the recorded signal (Wightman and Kistler, 1989). For example, the HRTFs measured by Wightman and Kistler (1989) from their subject SOS consisted of 36 azimuth positions (with 10° resolution) ranging from 180° to -170°, and 14 elevation positions (with 10° resolution) ranging from 80° to -50°. Hence, the HRTFs represented a total of 504 positions (36 in azimuth times 14 in elevation). The HRTF for a specific position is stored as two 127-tap FIR filters, each containing the impulse response for one ear. Figure 4 shows typical HRTF frequency responses for two different azimuths at 0° elevation, and demonstrates how ITD and IID vary as a function of the direction of the sound source.
For a source at 0° azimuth (i.e., directly in front of the listener), there is very little difference between the two ears in either the magnitude (IID) or the phase (ITD) response (top right plots); this is highlighted by taking the ratio of the responses of the two ears (bottom right plots). Because the sound arrives at both ears at almost the same time and with almost the same magnitude, the log-magnitude and the phase of the ratio are both close to zero. For a source at -40° azimuth (i.e., to the left of the listener), the magnitude at the left ear is greater than at the right ear (IID), while the phase lag at the right ear is larger (ITD) (top left plots). As expected, the ratio of the right-ear to the left-ear response (bottom left plots) shows a negative magnitude in dB (i.e., the sound at the right ear has less energy than the sound at the left ear) and a negative overall phase (i.e., the sound arrives at the right ear later than at the left ear).

2.3. Sensory Improvement

It is now possible to think not only of better ways to simulate normal localization cues, but also of methods for transforming the natural acoustic cues in order to achieve better spatial resolution (e.g., superlocalization; Durlach, 1991).

[Figure 4 panels: frequency responses of the left (solid) and right (dashed) ears for sources at 0° and -40° azimuth, and the right/left-ear ratio for each source; horizontal axes: Frequency (Hz), 0 to 12000.]
Figure 4.
Frequency responses for -40° and 0° azimuth and 0° elevation of the HRTFs measured by Wightman and Kistler (1989) from their subject SOS. The figure illustrates how the HRTFs contain the IID, ITD, and pinnae-effect cues.

Some studies have examined how subjects adapt to unnatural auditory localization cues. One set of such studies (Warren and Strelow, 1984; Strelow and Warren, 1985) investigated the use of the Binaural Sensory Aid, a device that used auditory localization cues to represent the position of objects sensed with sonar. Here, the ITDs conveyed the distance of the object, and the IIDs conveyed its direction. The results showed that blindfolded subjects were able to adapt to these unnatural cues and use them accurately after being trained with a correct-answer feedback paradigm. In an attempt to improve spatial resolution (i.e., to improve the JND in direction), a study of supernormal auditory localization was undertaken (Durlach, Shinn-Cunningham, and Held, 1993). Its main goals were to determine whether adaptation to rearranged acoustic spatial cues was possible and whether resolution could be improved. In this study, supernormal localization cues were created by remapping the relationship between source position and the normal HRTFs (Durlach, Shinn-Cunningham, and Held, 1993). The transformation was supernormal only for some positions; at other positions the rearrangement actually reduced the change in acoustic cues with changes in source location. To simulate a sound at position $\theta$, the study used HRTFs chosen from the normal HRTF set that normally correspond to a different azimuth. The new HRTFs are given by:

$$H'(\omega,\theta,\phi) = H(\omega, f_n(\theta), \phi) \tag{5}$$

With this transformation no new HRTFs were created; instead, the existing HRTFs were reassigned to different angles.
The family of mapping functions $f_n(\theta)$ used to transform the horizontal plane was given by:

$$f_n(\theta) = \frac{1}{2}\tan^{-1}\!\left(\frac{2n\sin(2\theta)}{1 - n^2 + (1 + n^2)\cos(2\theta)}\right) \tag{6}$$

where the parameter $n$ gives the slope of the transformation at $\theta = 0$. Figure 5 shows this transformation for several values of $n$. When $n = 1$, cues are not rearranged. With $n > 1$, the transformation increased the cue differences (and therefore the resolution) around $\theta = 0^\circ$, while it decreased them in the neighborhood of $\theta = \pm 90^\circ$. For $n < 1$ the opposite occurred. As a result, when $n > 1$ subjects were expected to show better-than-normal resolution in front and lower resolution toward the sides. In the study, subjects were first tested with normal localization cues (to determine baseline performance), and then with altered (supernormal) cues to examine how performance changed with training. Finally, normal cues were presented again to see whether training produced any after-effect. Bias, a measure of the error in the subjects' responses, and resolution, a measure of the ability to resolve adjacent stimulus positions, were the two quantities used to analyze the adaptation process throughout the experiments. Figures 6 and 7 illustrate bias and resolution results for one of the experiments in this study, in which correct-answer feedback was used to train the subjects.

[Figure 5 plot: remapped azimuth (degrees) versus source azimuth $\theta$ (degrees), -50° to 50°, for several values of $n$.]
Figure 5. A plot of the azimuth remapping transformation specified by Equation 6 (from Shinn-Cunningham, Durlach, and Held, 1998a).

As expected, the first run with normal cues showed a small bias (error in units of standard deviation), since these cues are roughly consistent with normal localization cues. The first run with altered cues resulted in a very large bias, reflecting the sudden introduction of the unnatural cues.
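Equation 6 can be evaluated directly. The Python sketch below uses hypothetical function names. It computes the arctangent with the two-argument form `math.atan2`, an implementation choice that keeps $f_n(\theta)$ continuous over the whole range from -90° to 90°. A plain arctangent of the quotient would jump by 90° wherever the denominator changes sign. As a consistency check, the sketch also uses an algebraically equivalent form, $\tan f_n(\theta) = n \tan\theta$, which follows from the double-angle identities. This form makes two properties easy to see: the slope is $n$ at $\theta = 0$, and the fixed points are at 0° and ±90°.

```python
import math

def remap(theta, n):
    """Equation 6: azimuth remapping f_n(theta), angles in radians.

    math.atan2 keeps the result on the correct branch over (-pi/2, pi/2].
    """
    num = 2.0 * n * math.sin(2.0 * theta)
    den = 1.0 - n * n + (1.0 + n * n) * math.cos(2.0 * theta)
    return 0.5 * math.atan2(num, den)

def remap_via_tangent(theta, n):
    """Algebraically equivalent form for |theta| < pi/2: tan(f_n) = n * tan(theta)."""
    return math.atan(n * math.tan(theta))

for n in (1, 3):
    for deg in (-60, -30, -10, 0, 10, 30, 60):
        out = math.degrees(remap(math.radians(deg), n))
        print(f"n={n}: {deg:4d} deg -> {out:6.1f} deg")
```

For $n = 3$, a source at 30° is assigned the HRTF normally associated with 60°. This illustrates how the transformation expands cue differences near the front and compresses them toward the sides.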
The last run using altered cues showed a decrease in bias compared with the first altered-cue run, demonstrating that correct-answer feedback caused subjects to adapt to the new cues (although adaptation was not complete). Finally, the first normal-cue test following training with altered cues produced a negative after-effect, indicating that performance was not controlled exclusively by conscious correction (Shinn-Cunningham, Durlach, and Held, 1998a).

Resolution of adjacent source locations is shown in Figure 7. Resolution in the first normal-cue run provides a standard against which other results are compared. As expected, when altered cues were presented for the first time, resolution increased around the center positions and decreased at the edges of the range. In the last run with altered cues, resolution remained enhanced (with respect to the baseline), but was lower than in the first altered-cue run. As before, an after-effect was seen when normal cues were introduced again (Shinn-Cunningham, Durlach, and Held, 1998a).

[Figure 6 plot: bias (-2.4 to 2.4) versus source position (-60° to 60°); legend n=1, n=3.]
Figure 6. Bias results for one of the experiments carried out. ○: First run in the experiment using normal cues. +: First run with altered cues. *: Last run with altered cues. ○: First normal-cue run following altered-cue exposure. Here, the altered cues have a transformation strength of n=3 (from Shinn-Cunningham, Durlach, and Held, 1998a).

This study showed that subjects could not adapt completely to a nonlinear remapping of the auditory localization cues. In general, subjects were able to reduce their response bias with training, but they never completely overcame their errors. In addition, although the transformation initially increased resolution as expected, resolution decreased as subjects adapted to the remapping.
Shinn-Cunningham, Durlach and Held (1998a) concluded that resolution depended not only on the range of physical cues presented during an experiment, a result previously described for the perception of sound intensity (e.g., Durlach and Braida, 1969; Braida and Durlach, 1972), but also on the subject's past history of exposure or training. The researchers also found that subjects adapted to the best-fit linear approximation of the nonlinear transformation, implying that subjects may only be capable of adapting to linear transformations of the localization cues (Shinn-Cunningham, Durlach and Held, 1998b).

[Figure 7 plot: resolution d' (0 to 4) versus source position (-60° to 60°).]
Figure 7. Resolution results for one of the experiments carried out. ○: First run in the experiment using normal cues. +: First run with altered cues. *: Last run with altered cues. ○: First normal-cue run following altered-cue exposure. Here, the altered cues have a transformation strength of n=3 (from Shinn-Cunningham, Durlach, and Held, 1998a).

3. ADAPTATION TO SUPERNORMAL CUES

3.1. Motivation

The main goal of this project is to examine further whether humans can adapt to unnatural (altered) auditory localization cues that provide listeners with better-than-normal localization ability, so-called supernormal auditory localization (Durlach, Shinn-Cunningham, and Held, 1993). In contrast with the previous studies described above (e.g., Shinn-Cunningham et al., 1994 and 1998), in which a nonlinear remapping of the normal space filters was implemented, a more nearly linear approach that expands all positions is now taken to create supernormal HRTFs. The earlier experiments showed that subjects adapted to the best-fit linear approximation of a nonlinear transformation, which could mean that subjects are only able to adapt to linear transformations. This study is designed to give further insight into the adaptation process and to determine whether this linear constraint holds for other cue transformations.
In addition, the new transformation may provide listeners with higher spatial sensitivity and, ideally, a low overall localization error.

3.2. Supernormal Auditory Localization: Double-Size Head

To improve resolution, the localization cues must increase the discriminability between separated sources. This may be achieved by having a larger-than-normal difference in the physical cues corresponding to two different positions. One way of achieving this is to simulate a larger-than-normal head, thereby increasing the ITDs and IIDs associated with every position in space. For a listener who has not adapted to such a change in cues, this transformation will make sound sources seem farther apart than they actually are.

The double-size head was simulated by frequency-scaling normal HRTFs (Rabinowitz, Maxwell, Shao, and Wei, 1993). The new pair of HRTF filters is defined as follows:

$$H'_L(\omega, \theta, \phi) = H_L(K\omega, \theta, \phi), \qquad H'_R(\omega, \theta, \phi) = H_R(K\omega, \theta, \phi) \qquad (7)$$

where K is a constant. This transformation approximates the acoustic effect of increasing the size of the human body, including the head and pinnae, by a factor of K. As a result, the IID and ITD are also affected, and are determined by the new ratio:

$$\frac{H'_R(\omega, \theta, \phi)}{H'_L(\omega, \theta, \phi)} = \frac{H_R(K\omega, \theta, \phi)}{H_L(K\omega, \theta, \phi)} \qquad (8)$$

Here, both the interaural differences and the monaural spectral cues are magnified by the factor K (Durlach, Shinn-Cunningham, and Held, 1993), and the transformation is therefore said to be linear. For the current study, K was set to 2, simulating a head twice the normal size. As Rabinowitz, Maxwell, Shao, and Wei (1993) showed, scaling the HRTFs in frequency corresponds to uniformly scaling up all physical dimensions, and thus simulates the main acoustic effects of a magnified head.
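The doubling of the ITD follows directly from equation (8). As a short worked derivation, assume (a simplification introduced here, not taken from the cited work) that the interaural transfer function at a given position can be written as a magnitude term A(ω, θ, φ) times a pure delay τ(θ):

```latex
\begin{aligned}
\frac{H_R(\omega,\theta,\phi)}{H_L(\omega,\theta,\phi)} &= A(\omega,\theta,\phi)\,e^{-j\omega\tau(\theta)} \\
\frac{H'_R(\omega,\theta,\phi)}{H'_L(\omega,\theta,\phi)} &= A(K\omega,\theta,\phi)\,e^{-jK\omega\tau(\theta)} \\
\tau'(\theta) &= -\frac{d}{d\omega}\bigl[-K\omega\,\tau(\theta)\bigr] = K\,\tau(\theta) \\
\mathrm{IID}'(\omega,\theta) &= 20\log_{10}\lvert A(K\omega,\theta,\phi)\rvert = \mathrm{IID}(K\omega,\theta)
\end{aligned}
```

With K = 2, a maximum normal ITD of about 0.65 ms therefore becomes about 1.3 ms, and the interaural level difference presented at a frequency ω is the one that normally occurs at 2ω.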
Transforming the HRTFs in this way presents several problems. First, scaling the frequency of the normal HRTFs can be achieved by inserting an additional zero-valued sample after each sample of the original HRTF impulse response. This doubles the length of the impulse response (i.e., K=2) and, in the frequency domain, compresses the spectrum by a factor of two. The new HRTFs must therefore be low-pass filtered to remove energy above the compressed band. For example, if the normal HRTFs are defined up to 20 kHz, the new HRTFs are only defined up to 10 kHz. Without low-pass filtering, the upsampled waveforms would contain spurious energy above 10 kHz due to spectral imaging (the mirrored copies of the compressed spectrum that zero insertion produces).

In addition, because the size of the head is doubled, the ITDs presented to the listener include values larger than the largest naturally occurring ITDs. For example, a source at 90° (or -90°) produces the maximum normal ITD of around 0.65 ms (Figure 3); with the transformed cues, the corresponding ITD is 1.3 ms. It is not clear how subjects will perceive these unnatural cues. As a consequence, subjects must adapt to the expanded interaural axis not only by relabeling it, but also by interpreting larger-than-normal ITDs (Durlach, 1991).

Normal HRTFs from subject SOS (Wightman and Kistler, 1989) were used to create the double-head HRTFs. Each position is described by two 127-tap FIR filters (one containing the filter coefficients for the right ear and one for the left ear) sampled at 50 kHz. To create the double-head HRTFs, each FIR filter was upsampled by a factor of two and then low-pass filtered at 25 kHz (Figure 8). As a result, the new altered HRTFs were twice as long (i.e., each FIR filter is now 254 taps long) and sampled at 100 kHz.
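The zero-insertion and low-pass steps, and their effect on the ITD, can be checked on toy filters. The Python/NumPy sketch below is an illustration under stated assumptions, not the processing code used in the study: the 63-tap Hamming-windowed-sinc low-pass is a hypothetical stand-in for the actual anti-imaging filter, single-impulse "HRTFs" replace the measured SOS filters, and the ITD is estimated from the slope of the unwrapped interaural phase at low frequencies (the same phase-slope reading used for Figure 8).

```python
import numpy as np

def zero_insert(h, k=2):
    """Insert k-1 zeros after every sample; the impulse response gets k times longer."""
    up = np.zeros(len(h) * k)
    up[::k] = h
    return up

def lowpass_fir(num_taps, cutoff):
    """Hamming-windowed sinc low-pass; cutoff is a fraction of the Nyquist frequency."""
    n = np.arange(num_taps) - (num_taps - 1) / 2
    taps = cutoff * np.sinc(cutoff * n) * np.hamming(num_taps)
    return taps / taps.sum()  # unity gain at DC

def scale_hrtf(h, k=2, num_taps=63):
    """Zero-insert by k, then low-pass at 1/k of Nyquist to remove the spectral images."""
    return np.convolve(zero_insert(h, k), lowpass_fir(num_taps, 1.0 / k), mode="same")

def itd_samples(h_left, h_right, f_max=0.1, n_fft=1024):
    """ITD in samples from the slope of the unwrapped interaural phase below
    f_max (cycles/sample); positive when the right ear lags."""
    f = np.fft.rfftfreq(n_fft)
    ratio = np.fft.rfft(h_right, n_fft) / np.fft.rfft(h_left, n_fft)
    phase = np.unwrap(np.angle(ratio))
    band = (f > 0) & (f <= f_max)
    return -np.polyfit(f[band], phase[band], 1)[0] / (2 * np.pi)

left = np.zeros(127); left[20] = 1.0    # toy left-ear filter
right = np.zeros(127); right[27] = 1.0  # right ear lags by 7 samples
print(len(scale_hrtf(left)))            # 254 taps
print(itd_samples(left, right), itd_samples(scale_hrtf(left), scale_hrtf(right)))
```

The interaural delay goes from 7 to 14 samples. If the 254-tap filters are run at the original 50-kHz rate, that is 0.28 ms instead of 0.14 ms, the doubling that equations (7) and (8) predict.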
[Figure 8 panels: "Frequency response normal HRTF (-40 degrees)" and "Frequency response altered HRTF (-40 degrees)", each plotted against Frequency [Hz]; notice the different frequency scales.]
Figure 8. Comparison between normal and altered HRTFs for a source at -40° in azimuth. The left panel shows the normal HRTF and the right panel shows the altered HRTF, on different frequency scales. The shapes of both frequency responses are the same, except that the altered HRTF has been scaled in frequency, indicating that the upsampling was successful. Note that the ITD (given by the slope of the phase as a function of frequency) for the altered HRTF is now doubled.

Figure 8 compares the frequency responses of the HRTF at -40°, showing that the upsampling doubles the ITD (the ITD is given by the slope of the phase response as a function of frequency). As mentioned above, the altered HRTFs are compressed in frequency by a factor of two, and to prevent unpredictable results at high frequencies, the new HRTFs were low-pass filtered. As a result, the magnitude response is effectively zero above 12.5 kHz (the 25-kHz cutoff applied to the filters upsampled to 100 kHz corresponds to 12.5 kHz after the factor-of-two frequency compression).

3.3. Equipment and Experimental Setup

Adaptation to the double-size head localization cues was investigated by presenting simulated acoustic cues and real visual cues. The acoustic cues were generated by an auditory virtual environment. Visual cues were provided by a light display located in front of the subjects and were used to give the subjects spatial feedback about the simulated sounds. Subjects were seated in front of a five-foot-diameter arc of lights consisting of thirteen 2-inch light bulbs. The lights were labeled from 1 to 13 (all lights were visible to the subjects during the experiment) and were positioned from -60° to +60° in azimuth with respect to the head position, with a 10° separation between adjacent lights.
Position -60° azimuth was represented by light 1, 0° by light 7, 60° by light 13, and so on. The light array was connected to a digital-to-analog device, the light driver, which received a digital input from a personal computer (PC) and converted it to an analog output that drove the current to each light bulb. The PC used Data Translation's DT2817 Digital I/O Board to transmit signals to the light driver, enabling it to turn the light on or off at any of the 13 positions (Figure 9). This light array provided visual feedback to the subjects.

The acoustic cues were simulated by an auditory virtual environment system consisting of a PC, a signal-processing device, a head tracker, headphones, and a function generator. The head tracker transmitted to the PC the instantaneous head orientation of the subject with respect to 0° azimuth (i.e., the 0° position in the light array, calibrated during start-up procedures). The PC calculated the direction of the desired source position relative to the head. This information was then transmitted to the signal-processing hardware, which filtered the waveform provided by the function generator with the appropriate HRTFs to produce the left- and right-ear signals. Finally, the binaural signal generated by the signal-processing hardware was played to the subject over headphones.

Figure 9. Diagram of the light array that gave subjects spatial visual feedback. Thirteen light bulbs represent 13 positions ranging from -60° to 60° in azimuth with respect to the subject's head position. Lights were placed at 10° intervals.

The Escort EFG-2210 function generator provided the system with a 5-Hz periodic train of clicks (i.e., a square wave) as the sound source. As described later, the subjects heard roughly 5 clicks per trial, as the signal-processing hardware switched the input signal on and off asynchronously. A Polhemus 3Space Isotrack provided head position information.
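The bookkeeping described above (light numbers to azimuths, and the source direction relative to the tracked head) can be sketched in a few lines of Python. Both functions are hypothetical illustrations, not the experiment's control program: `relative_azimuth` considers azimuth only, ignores the elevation and roll reported by the tracker, and assumes the same sign convention (positive to the right) for source and head angles.

```python
def light_to_azimuth(light):
    """Azimuth in degrees of light 1..13 (light 1 -> -60, light 7 -> 0, light 13 -> +60)."""
    if not 1 <= light <= 13:
        raise ValueError("light number must be between 1 and 13")
    return -60 + 10 * (light - 1)

def relative_azimuth(source_deg, head_deg):
    """Source azimuth relative to the head's facing direction, wrapped to [-180, 180)."""
    return (source_deg - head_deg + 180.0) % 360.0 - 180.0

# A source at light 10 (+30 degrees) with the head turned 5 degrees to the right:
print(relative_azimuth(light_to_azimuth(10), 5.0))  # 25.0
```

The wrap to [-180, 180) keeps the relative angle well defined even for large head turns, so the HRTF lookup never sees an angle outside one full circle.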
The Isotrack uses electromagnetic signals to measure the relative orientation (azimuth, elevation, and roll) between a stationary transmitter and a receiver worn on the subject's head. The PC, a Pentium-S-based machine running at 100 MHz, controlled the signal-processing hardware and the light array and ran the experiment's control software. To present a source, the program randomly selected one source position from the 13 possibilities. After reading the orientation of the listener's head from the head tracker, it calculated the position of the selected source relative to the head. Based on these computations, the PC instructed the signal-processing hardware to generate the appropriate binaural cues and present them to the subject.

The signal-processing hardware was the System II, a signal-processing platform from Tucker-Davis Technologies. The System II consists of analog and digital interface modules permitting the synthesis of high-quality analog waveforms, including PA4 Programmable Attenuators, an HTI Head Tracker Interface, and the PD1 Power SDAC (a real-time digital filtering system). An analog-to-digital converter (ADC) digitized the input waveform, which was then filtered with the selected HRTFs. The binaural signal was then passed through the output digital-to-analog converter (DAC), which was connected to the PA4s. The programmable attenuators controlled the duration of the stimuli that the subjects received: while the attenuators were in the mute state, no sound was heard. The PA4s were switched out of the mute state for one second per trial, allowing roughly 5 clicks to be heard. The HTI permitted the computer to read the coordinates provided by the head tracker. Figure 10 shows a block diagram of the virtual auditory environment used to simulate the acoustic cues.

[Figure 10 labels: Transmitter, Receiver, Function Generator (sound source).]
Figure 10. Block diagram of the virtual auditory environment that simulated acoustic localization cues.

3.4. Adaptation Experiment with Feedback

3.4.1. Experiment Description

In each testing run, subjects had to face front (0° azimuth) while a click train was presented from a random location. When the sound was turned off, they were asked to identify the location of the sound by reporting the position number (i.e., a number between 1 and 13) to the operator, who entered it on the keyboard. As soon as the answer was typed into the computer, the light at the correct position was turned on as correct-answer feedback. One second after the subject's response, the next random sound was presented. All locations were presented to the subject exactly twice in each run (i.e., the locations were chosen at random without replacement). Thus, with 13 positions, 26 trials were presented in each run. Each run lasted around 3 minutes. Each run presented either normal or altered cues, determined by selecting the appropriate set of HRTFs.

The basic experimental paradigm was similar to that used by Shinn-Cunningham (1994). Each subject performed 8 identical sessions of 40 testing runs each. In each session, the first 2 runs and the last 8 runs used normal cues, while the others used altered HRTFs. Eight sessions were necessary to obtain a sufficient number of trials to average across. All trials were assumed to be stochastically independent, even though the positions presented were chosen at random without replacement.

Before the beginning of the experiment, subjects were informed that both normal and altered cues would be used at different times and that the apparent location of simulated sources using altered cues might not be their correct location. Also, the subjects were notified every time a change of cues was about to occur (from normal to altered or from altered to normal), so that they would answer as accurately as possible for the current cues. Data from five subjects were gathered.
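The run and session structure described above can be sketched as follows. The function names are hypothetical; the sketch uses Python's standard `random` module and simply encodes the stated design: each of the 13 positions twice per run in random order, and 40 runs per session with 2 normal-cue runs, 30 altered-cue runs, and 8 final normal-cue runs.

```python
import random

def make_run(n_positions=13, repeats=2, rng=None):
    """One run: every position 1..n_positions exactly `repeats` times, randomly ordered
    (i.e., positions drawn at random without replacement)."""
    if rng is None:
        rng = random.Random()
    trials = [p for p in range(1, n_positions + 1) for _ in range(repeats)]
    rng.shuffle(trials)
    return trials

def session_cues(n_runs=40, n_normal_start=2, n_normal_end=8):
    """Cue type for runs 1..n_runs: normal at the start and end, altered in between."""
    n_altered = n_runs - n_normal_start - n_normal_end
    return ["normal"] * n_normal_start + ["altered"] * n_altered + ["normal"] * n_normal_end

cues = session_cues()
# Runs analyzed in Section 3.4.4 (1-based run numbers):
for run in (2, 3, 32, 33):
    print(run, cues[run - 1])
print(len(make_run(rng=random.Random(0))))  # 26 trials per run
```

Passing an explicit `random.Random` instance makes a run order reproducible, which is convenient when checking an analysis against a known trial sequence.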
All subjects were naive (without prior experience in auditory localization experiments), reported normal hearing, and had no difficulty performing the test.

3.4.2. Analysis

Bias and resolution were the two quantitative measures used to study the performance and adaptation of each subject under these experimental conditions. Bias measures the error in the subject's response (in units of standard deviation), describing how well the subjects adapted to the altered cues. Resolution measures the ability to resolve adjacent stimulus positions.

As described by Shinn-Cunningham (1994), there are three basic processing schemes that can be used to estimate the average signed error (bias) and the response sensitivity (resolution). All schemes assume that each presentation of a physical stimulus results in a random variable with a Gaussian distribution along some internal decision axis. The mean of the Gaussian distribution is assumed to depend monotonically on the source position, while its standard deviation has the same value for all positions. Thus, the ability to resolve sources comes from the relative distances between their means.

The first estimation method uses a Maximum Likelihood Estimate (MLE) technique to find the means of the internal distributions and the placement of decision criteria, given the observed confusion matrix. The second method, known as the raw processing method, computes raw estimates of bias and resolution from the means and the standard deviations of the responses. Bias is estimated as the difference between the mean and the correct response divided by the standard deviation. Resolution between two adjacent positions is computed as the difference of the mean responses divided by the average of the standard deviations. Finally, the third method, also a raw processing method, assumes that differences in response standard deviation across positions are unimportant.
As a result, the standard deviations for all positions are averaged and used as a constant value. Bias and resolution are then computed as in the second method. As Shinn-Cunningham (1994) noted, the results of these three methods are very similar, even though MLE processing is much more computationally intensive and takes into account many factors ignored by the other methods. Thus, method two was assumed to be adequate for analyzing the data in this study. Accordingly, bias and resolution are given by:

$$\text{bias}(p) = \frac{m(p) - p}{\sigma(p)} \qquad (9a)$$

$$\text{resolution} = d'(p) = \frac{m(p+1) - m(p)}{\sqrt{\sigma(p+1)\,\sigma(p)}} \qquad (9b)$$

where p is the target position, and m(p) and σ(p) are the mean and the standard deviation of the responses for target position p, respectively.

3.4.3. Expected Results³

Given the results obtained by Shinn-Cunningham (1994) and the linearity of the altered cues used in this project, the following results were expected (Figure 11). For the first run using normal cues, subjects were expected to show almost zero bias and better resolution for the center positions than for the edges. When the altered cues were first presented, resolution was expected to increase in almost all directions (with greater values around 0°), because the ITDs were larger (doubled) at all positions. Because of the increase in ITDs, the mean response should show a change in slope (with the slope of mean response versus correct location approximately doubled). This is consistent with subjects hearing sources farther to the side than their correct position. Similarly, bias was expected to be small for positions near 0° azimuth, larger for intermediate positions, and small again at the extreme edges (since subjects could not respond beyond the range of locations presented).

³ The supernormal cues used are called linear because the ITDs are approximately doubled for all positions.
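The raw-processing estimates of bias and resolution translate directly into a short routine. The sketch below is a hypothetical Python/NumPy illustration, not the analysis code used in this study; it assumes targets and responses are coded as position numbers 1 to 13, uses the sample standard deviation (ddof=1), and uses the geometric mean of adjacent standard deviations in the resolution denominator. Setting `pooled_sd=True` gives the third (constant standard deviation) method.

```python
import numpy as np

def bias_and_resolution(targets, responses, pooled_sd=False):
    """Raw estimates of bias (eq. 9a) and adjacent-position resolution d' (eq. 9b).

    targets, responses: position numbers (1..13), one entry per trial.
    pooled_sd=False: per-position standard deviations (method two).
    pooled_sd=True:  one standard deviation averaged over positions (method three).
    A position whose responses never vary has zero SD, giving inf/nan here.
    """
    targets = np.asarray(targets)
    responses = np.asarray(responses, dtype=float)
    positions = np.unique(targets)
    m = np.array([responses[targets == p].mean() for p in positions])
    s = np.array([responses[targets == p].std(ddof=1) for p in positions])
    if pooled_sd:
        s = np.full_like(s, s.mean())
    bias = (m - positions) / s
    d_prime = (m[1:] - m[:-1]) / np.sqrt(s[1:] * s[:-1])
    return positions, bias, d_prime

# Toy data: responses overshoot each target by +0.5 on average
pos, b, d = bias_and_resolution([1, 1, 2, 2, 3, 3], [1, 2, 2, 3, 3, 4])
print(b, d)
```

Edge positions, where every response may be identical, are the practical weak point of raw processing; the division by a zero standard deviation is left visible here rather than silently patched.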
[Figure 11 panels: Expected Mean Response, Expected Bias Characteristic, and Expected Resolution Characteristic, each plotted against Target Position (degrees), -60° to 60°.]
Figure 11. Cartoon exaggerating the effects of adaptation on mean response, bias, and resolution. Solid line: normal cues. Dashed line: altered cues. ○: First presentation of normal cues. *: First presentation of altered cues. X: Last presentation of altered cues. +: First presentation of normal cues following the last run of altered cues.

By the 30th altered run, adaptation was assumed to have taken place, in that mean errors were expected to decrease with time. This decrease would be evident as a change towards one in the slope relating mean response to correct location, and a decrease in bias towards 0. Since the acoustic range was larger with the altered cues, the internal decision noise was expected to grow with adaptation (Durlach and Braida, 1969; Braida and Durlach, 1972; Shinn-Cunningham, 1998). As a result, resolution was expected to decrease with time. The change in internal noise would also cause bias to decrease even further than if there were no change in stimulus range.

Finally, results from the first normal cues after exposure to the supernormal cues would show whether subjects really adapted to the supernormal cues or were just consciously correcting their responses based on whether they were hearing normal or altered cues. In the first case, subjects could not immediately turn off their remapping of localization cues (even when told that they were hearing normal cues), and mean responses were expected to show an after-effect (i.e., the slope relating mean response to location was expected to be less than one).
The after-effect should also make identification performance worse after training than before: bias should be nonzero and in the opposite direction from the error originally introduced by the remapping. If subjects could consciously change their responses, mean response, bias, and resolution should resemble those from the first normal-cue run.

As shown in Figure 11, the expected results are all symmetric around 0° azimuth (the mean response has odd symmetry), since there was no reason to expect any left-right asymmetry in the results. For this reason, all the results presented here are collapsed around 0° (i.e., the left and right sides were averaged).

3.4.4. Results

The data showed small differences across sessions compared with the differences across test runs within a particular session. Therefore, the data reported in this study were collapsed across the eight sessions performed by each subject. The individual subject responses were analyzed to find mean response, bias, and resolution as a function of position for each run in the session. These statistics were then averaged across subjects and further collapsed by assuming left-right symmetry, to yield the results shown.

Results from runs 2, 3, 32, and 33 were examined in detail to investigate how performance changed over the course of one session. Run 2 was the last run that used normal cues prior to exposure to altered cues. By run 2, the subject knew what the experiment was about and should have been comfortable with the procedure. The results of this run served as a baseline or reference point for other runs because they reflected normal localization performance. Run 3 was the first run that used supernormal cues and provided a measure of the immediate effects of the transformation.
After 30 runs, subjects should have adapted to the unnatural cues, and run 32, the last altered-cue run, should illustrate the final state of adaptation. Finally, run 33, the first run using normal cues after the altered-cue runs, should reveal the after-effect.

[Figure 12 plot: "Mean response for group with feedback" versus Target Position (degrees).]
Figure 12. Mean response and slope characteristic for the group with feedback as a function of target position. ○: First presentation of normal cues. *: First presentation of altered cues. Solid line: normal slope (diagonal). Dashed line: double slope.

Figure 12 shows the mean response as a function of position. As expected, the mean response was very close to the diagonal for the first normal cues presented to the subjects. When the altered cues were introduced, the outcome was consistent with the transformation employed: subjects heard sources farther to the side, and the slope of the response curve was almost doubled for the center positions (the edge effect makes the slope decrease at the borders). Several runs with supernormal cues caused subjects to adapt, decreasing the slope of the mean response and reducing the localization error (Figure 13). However, some localization error remained because subjects could not adapt completely to the unnatural cues. Finally, when normal cues were presented once again, small localization errors were made in the opposite direction (particularly to the sides), revealing an after-effect from the supernormal cues.

[Figure 13 plot: "Mean response for group with feedback" versus Target Position (degrees), 0° to 60°.]
Figure 13. Mean response for the group with feedback as a function of target position. Solid line: normal cues. Dashed line: altered cues. ○: First presentation of normal cues. *: First presentation of altered cues. X: Last presentation of altered cues. +: First presentation of normal cues following the last run of altered cues.
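The "almost doubled" slope read from Figure 12 can be quantified with a least-squares fit restricted to central positions. The Python/NumPy sketch below is a hypothetical helper, not part of the original analysis; the ±20° window is an assumed choice meant to exclude the edge positions, where responses are squeezed against the end of the response range.

```python
import numpy as np

def response_slope(targets_deg, mean_resp_deg, max_abs=20.0):
    """Least-squares slope of mean response vs. target over |target| <= max_abs degrees."""
    t = np.asarray(targets_deg, dtype=float)
    m = np.asarray(mean_resp_deg, dtype=float)
    keep = np.abs(t) <= max_abs
    return np.polyfit(t[keep], m[keep], 1)[0]

targets = np.arange(-60, 61, 10)
# Idealized unadapted listener: cue doubling, with responses clipped to +/-60 degrees
doubled = np.clip(2 * targets, -60, 60)
print(response_slope(targets, targets), response_slope(targets, doubled))
```

Fitting over the full range instead (`max_abs=60`) pulls the slope of the idealized doubled curve well below 2, which is the edge effect described in the text.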
As seen in the figures, the mean response at 0° was slightly negative. This negative value was caused by a small (unintended) energy difference between the left- and right-ear HRTFs, which is discussed in the next section.

Figure 14 illustrates the bias estimates for this group of subjects. As before, the first presentation using normal cues gave reference values. The first altered-cue presentation had a large positive bias because subjects heard sources farther to the side than with normal cues. Bias at the edges was negative because the only error that could be made at the extreme locations was towards the center. After adaptation took place, bias was reduced for most locations, but it was still greater than the normal bias (adaptation was not complete). Finally, when normal cues were presented again, a negative bias was present for the lateral positions (but not for central positions).

[Figure 14 plot: "Bias for group with feedback" versus Target Position (degrees), 0° to 60°.]
Figure 14. Bias estimates for the group with feedback as a function of target position. Solid line: normal cues. Dashed line: altered cues. ○: First presentation of normal cues. *: First presentation of altered cues. X: Last presentation of altered cues. +: First presentation of normal cues following the last run of altered cues.

As described before, resolution was expected to increase at almost all azimuth positions in all runs using altered cues (somewhat more at the center than at the edges of the range). Figure 15 shows that resolution increased only for the center positions and only in the first altered run (where bias had its highest value). Furthermore, the gain in resolution at the center was smaller than the loss of resolution at the side positions. Finally, any changes in resolution between the last altered cues and the first normal presentation after the supernormal runs are small and inconclusive.
For both normal and altered cues, there was a substantial decrease in resolution at the end of the session (after training) compared with the beginning of the session (prior to training). This may indicate an overall increase in variability with time, perhaps due to subject boredom. An alternative explanation is that training with the transformation caused decreases in resolution for both normal and altered cues.

In conclusion, the expected results were obtained in this experiment, but the magnitude of the observed changes in performance was small, perhaps because subjects adapted very quickly to the change in cues. The mean response and the bias showed all the characteristics of adaptation; however, there was only a small after-effect, perhaps because subjects learned to switch consciously from one set of cues to the other. Finally, resolution decreased with training and time.

[Figure 15 plot: "Resolution for group with feedback", d' (0 to 2.5) versus Target Position (degrees), 0° to 60°.]
Figure 15. Resolution estimates for the group with feedback as a function of target position. Solid line: normal cues. Dashed line: altered cues. ○: First presentation of normal cues. *: First presentation of altered cues. X: Last presentation of altered cues. +: First presentation of normal cues following the last run of altered cues.

3.4.5. Error in Measured HRTFs

The normal HRTFs used in this experiment had a 6-dB high-frequency level mismatch between the left and right ears at all azimuths, causing the subjects to shift their answers towards the left side. While the standard deviation should not have been affected by the error, the mean response was slightly shifted to the left side. Consequently, bias results also show a small negative shift. Even though mean response and bias were collapsed around 0°, this effect was not canceled out completely (especially around 0°).

[Figure 16 panels: "Impulse Response Left Ear (0 degrees)" and "Impulse Response Right Ear (0 degrees)"; vertical range -400 mV to 400 mV, time axis 0 to 31.219 ms.]
Figure 16. Impulse response for the left-ear (top panel) and right-ear (bottom panel) filters at 0° azimuth. A 6-dB difference in amplitude is evident for high-frequency components.

[Figure 17 panels: "Frequency Response for Right/Left Ears (0 degrees)"; log magnitude (5 dB/div, -27.5 to 12.5 dB) and phase (45 deg/div, -180 to 180 deg), 0 Hz to 12.8 kHz.]
Figure 17. Frequency response for the ratio between right- and left-ear filters at 0° azimuth. A 6-dB difference in magnitude is evident for high-frequency components (top panel).

Figure 16, the impulse response of the left and right HRTFs at 0° azimuth, shows a difference in amplitude of 50 mV, or 6 dB (i.e., 20 log10[100 mV/50 mV]), in the high-frequency portions of the signals. This can be verified in Figure 17, which depicts the magnitude of the frequency response of the ratio between the right and left ears. A 6-dB difference is present for the high-frequency components (roughly above 7 kHz). Both figures represent the actual outputs of the virtual auditory system (i.e., the signals that go directly to the headphones).

3.5. Adaptation Experiment without Feedback

3.5.1. Experiment Description

This experiment was similar in all respects to the previous one, except that no feedback was given to the subjects and the trial structure was more tightly controlled. In this experiment, each run of 26 trials was broken into subruns of 13 trials each to allow detailed analysis of the speed with which adaptation occurred. As in the previous test, subjects were informed before the beginning of the experiment that normal and altered cues would be used and that the apparent location of sources using the altered cues might not be the correct location.
As before, the subjects were reminded every time a change of cues was about to occur (from normal to altered or from altered to normal), so that they would answer as accurately as possible for the current cues. The purpose of this experiment was to see whether removing explicit feedback would slow down the subjects' adaptation, allowing the measurements to better capture changes in performance over time. Adaptation was expected to occur even without feedback, as subjects adjusted to the larger-than-normal cue range and learned to map the cues onto the range of available responses. For example, if subjects using altered HRTFs heard a source that appeared to lie outside the range of possible response locations (-60° to +60°), they would learn to adjust their responses to map the cue range onto the response range. Data were collected from five subjects, none of whom participated in the first experiment. All subjects were naive, reported normal hearing, and had no difficulty performing the test.

3.5.2. Expected Results

The overall pattern of results expected here was the same as for the previous experiment, except that changes were expected to occur more slowly. Since adaptation took place very quickly in the previous experiment, only 10 runs were used here: 2 normal, 7 altered, and 1 normal. The shorter session length also prevented subjects from getting bored or distracted towards the end of the experiment.

3.5.3. Results

Runs 2, 3, 9, and 10 were analyzed to investigate how performance changed over the course of one session. As before, run 2 was the last run that used normal cues prior to exposure to altered cues and provided a baseline against which other runs could be compared. Run 3 provided a measure of the immediate effects of the supernormal cues. Run 9 showed how subjects had adapted to the transformed cues after exposure. Finally, run 10 showed any after-effect caused by the exposure to altered cues.
Figure 18 shows the mean response as a function of position. As in the previous experiment, the mean response was very close to the diagonal for the first presentation of normal cues. Results after the first presentation of altered cues indicate that subjects heard sources farther to the side, with the slope of the response curve roughly doubled for the center positions. After seven runs with supernormal cues, the error in mean response decreased, but subjects never adapted completely (Figure 19). An after-effect, showing localization errors in the opposite direction, appeared when normal cues were presented in run 10.

[Figure 18 plot: mean response for group with no feedback vs. target position (degrees).]
Figure 18. Mean response and slope characteristic for the group with no feedback as a function of target position. o: First presentation of normal cues. *: First presentation of altered cues. Dotted line: Normal slope (diagonal). Dashed line: Double slope.

[Figure 19 plot: mean response for group with no feedback vs. target position (degrees).]
Figure 19. Mean response for the group with no feedback as a function of target position. Solid line: Normal cues. Dashed line: Altered cues. o: First presentation of normal cues. *: First presentation of altered cues. X: Last presentation of altered cues. +: First presentation of normal cues following the last run of altered cues.

Figure 20 illustrates the bias estimates for this experiment. The first presentation of altered cues resulted in significant errors in the expected direction (i.e., positive bias, indicating that subjects heard sources farther to the side). Some adaptation took place, as indicated by the decrease in bias at the end of the altered runs. Again, it was clear that subjects did not entirely overcome the errors introduced by the transformation. Finally, when normal cues were presented again, a distinct negative bias was present at all positions.

[Figure 20 plot: bias for group with no feedback vs. target position (degrees), 0 to 60.]
Figure 20. Bias estimates for the group with no feedback as a function of target position. Solid line: Normal cues. Dashed line: Altered cues. o: First presentation of normal cues. *: First presentation of altered cues. X: Last presentation of altered cues. +: First presentation of normal cues following the last run of altered cues.

In this experiment, as in the previous one, resolution was enhanced in the central positions after the supernormal cues were introduced (Figure 21). However, resolution for the outer positions was very poor, presumably because the cues presented at the edges are beyond the normal range. In other words, because the cues change with position at twice the normal rate, only the center positions give rise to natural ITDs. In general, relatively little change in resolution is seen for either normal or altered cues. The effects of training appear to be very small, comparable to the random variation in the estimates due to stimulus uncertainty.

[Figure 21 plot: resolution for group with no feedback vs. target position (degrees), 0 to 60.]
Figure 21. Resolution estimates for the group with no feedback as a function of target position. Solid line: Normal cues. Dashed line: Altered cues. o: First presentation of normal cues. *: First presentation of altered cues. X: Last presentation of altered cues. +: First presentation of normal cues following the last run of altered cues.

To better understand the rate at which changes occur, the subruns were compared to see whether adaptation was rapid enough that differences between the first and second subruns of a run could be observed. Figure 22 illustrates this adaptation rate by comparing the first 13 trials and the last 13 trials of runs 3 and 5, in which altered cues were used.
The figure shows that there was not much difference between the two groups of trials within a run, especially for mean response and bias, implying that the change within one run was small. Subjects may have made unconscious use of the array of response positions in front of them (Figure 9). For example, the edges of the array could help them determine that the range of stimuli for altered cues was larger than normal, but that this stimulus range still had to be mapped onto localization responses between −60° and +60°. It is not clear why resolution shows some apparent differences between the two halves of a run while mean response and bias do not.

[Figure 22 panels: mean response, bias, and resolution for group with no feedback vs. target position (degrees); run 3 on the left, run 5 on the right.]
Figure 22. Mean response, bias, and resolution estimates for the group with no feedback as a function of target position. Left-side panels show the estimates for the first run using altered cues (run 3), while right-side panels show the estimates for the third run using altered cues (run 5). o: First presentation of normal cues. *: First 13 trials of presentation with altered cues. X: Last 13 trials of presentation with altered cues.

This experiment demonstrated that feedback accelerates the adaptation process for supernormal cues, but that explicit feedback is not required for adaptation to occur. With exposure to a different set of stimuli, subjects learn to map the new stimulus set to the available responses.
Resolution with the new double-head stimuli is enhanced at the center positions compared to normal cues, but is worse at the edges, implying that subjects cannot use cues outside the normally experienced range very effectively. Finally, as subjects adapt, there is little change in resolution with this transformation, for either normal or altered cues. This implies that boredom, rather than a change in internal noise, may be the cause of the decrease in resolution seen in the first experiment.

4. JUST NOTICEABLE DIFFERENCE

4.1. Motivation

The use of supernormal acoustic cues increased resolution only in the neighborhood of 0° azimuth, while making it very poor toward the sides. This result was expected for the nonlinear transformation (Durlach, Shinn-Cunningham, and Held, 1993), but not for the double-size head simulation, in which the ITDs were doubled. The following experiment was performed to help understand these discrepancies and to develop an improved model of adaptation to double-size head cues.

4.2. Background

The resolution of the auditory localization system is measured in terms of the just noticeable difference (JND) in azimuth, also known as the minimum audible angle (MAA). The minimum audible angle is defined as the smallest detectable difference between the azimuths of two identical sound sources (Mills, 1958). The MAA around 0° azimuth is small (about 1°) for low-frequency sinusoids, large for intermediate-frequency sinusoids between 1500 and 2000 Hz, and small again for high-frequency sinusoids. Resolution is poorer for sources off to the side, with the MAA increasing with the magnitude of the reference azimuth. For tones between 1500 and 2000 Hz and azimuths of more than 45°, the JND is indeterminately large (Mills, 1972). Figure 23 shows some typical curves of the MAA as a function of frequency for several azimuths.
More information on the minimum audible angle can be found in Mills (1958 and 1963).

[Figure 23 plot: JND (degrees) vs. frequency (Hz), 200 to 10,000 Hz.]
Figure 23. Just noticeable difference (JND), or minimum audible angle (MAA), between successive pulses of tone as a function of the frequency of the tone and the direction of the source. Symbols denote source azimuths of 0°, 30°, 60°, and 75° (from Mills, 1963).

4.3. New HRTFs

In the JND paradigm used here, subjects must indicate whether two successive sounds come from the same or different directions. The HRTFs measured by Wightman and Kistler (1989) and used in the previous experiments were not used here because they were measured with a spatial resolution of 10°. A normal JND for the center positions is roughly 1°, indicating a need for HRTFs with a resolution of at least 1°. To overcome the coarse spatial resolution of the measured HRTFs, it was necessary to interpolate them spatially to approximate HRTFs at 1° spacing. Even though simple linear interpolation seems a reasonable approach, it produces spectral discrepancies due to the phase differences between the filters (Wenzel and Foster, 1993). Kistler and Wightman (1991) suggested a method for minimizing such effects, in which a minimum-phase approximation of the measured HRTFs is used. The method assumes that HRTFs are minimum-phase functions (Oppenheim and Schafer, 1975) and that the ITD for each source position has a constant value (i.e., the frequency dependence of the ITD is perceptually unimportant). This assumption was confirmed by Kulkarni (1995). The first step in computing the approximation is to obtain the minimum-phase filters from the original HRTFs. By definition, a minimum-phase filter has the same magnitude as the original filter, and its phase is the Hilbert transform of the log-magnitude of the original filter.
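The minimum-phase step described above, and the cross-correlation ITD estimate described in the next paragraph, can be sketched in Python with NumPy as follows. This is an illustrative sketch under stated assumptions, not the procedure actually used in this study: it obtains the minimum-phase filter through the folded real cepstrum (a standard discrete way of assigning a phase derived from the Hilbert transform of the log-magnitude), assumes an even FFT length much longer than the filter, and the function names are hypothetical.

```python
import numpy as np


def minimum_phase(h, n_fft=4096):
    """Minimum-phase filter with the same magnitude response as h.

    Real-cepstrum (homomorphic) method: folding the cepstrum of log|H|
    onto non-negative quefrencies is the discrete counterpart of taking
    the phase from the Hilbert transform of the log-magnitude.
    Assumes n_fft is even; returns n_fft samples (truncate as needed).
    """
    H = np.fft.fft(h, n_fft)
    log_mag = np.log(np.maximum(np.abs(H), 1e-12))  # guard against log(0)
    cep = np.fft.ifft(log_mag).real                 # real (even) cepstrum
    fold = np.zeros(n_fft)
    fold[0] = cep[0]
    fold[1:n_fft // 2] = 2.0 * cep[1:n_fft // 2]    # double positive quefrencies
    fold[n_fft // 2] = cep[n_fft // 2]              # Nyquist term kept once
    return np.fft.ifft(np.exp(np.fft.fft(fold))).real


def itd_samples(h_left, h_right):
    """Lag (in samples) at which the cross-correlation of an HRTF pair peaks.

    Positive values mean the right-ear response lags the left-ear one.
    """
    xc = np.correlate(h_right, h_left, mode="full")
    return int(np.argmax(xc)) - (len(h_left) - 1)
```

In this sketch, Equation 10c corresponds to averaging the spectra of two such filters (e.g., `0.5 * (np.fft.fft(minimum_phase(hl_0)) + np.fft.fft(minimum_phase(hl_10)))`, where `hl_0` and `hl_10` are placeholder names for the measured left-ear responses at 0° and 10°), and Equation 10d to averaging two `itd_samples` values. Because `itd_samples` returns whole samples, its resolution is limited by the sampling period, which is why upsampling matters.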
To obtain the magnitudes of the new filters at a finer resolution, the magnitudes of the original filters are interpolated. The appropriate phase difference (ITD) is obtained by finding the lag at which the cross-correlation function of the impulse responses of the HRTF pair reaches its maximum. After the delay is interpolated, half of it is added to the left filter and half is subtracted from the right one. Equations 10a through 10e describe the method for finding the minimum-phase interpolation of the HRTF at 5° from the measured HRTFs at 0° and 10°. In this example, H_L(ω) and H_R(ω) are the frequency responses and h_L[n] and h_R[n] are the impulse responses of the pair of original HRTFs, with subscripts giving the azimuth in degrees. ℋ{·} is the Hilbert transform, and max[xcorr(a, b)] denotes the lag at which the cross-correlation function of a and b is maximum.

$$H_{\mathrm{mp}L0}(\omega) = |H_{L0}(\omega)|\,e^{\,j\mathcal{H}\{\log|H_{L0}(\omega)|\}}, \qquad H_{\mathrm{mp}R0}(\omega) = |H_{R0}(\omega)|\,e^{\,j\mathcal{H}\{\log|H_{R0}(\omega)|\}} \tag{10a}$$
$$H_{\mathrm{mp}L10}(\omega) = |H_{L10}(\omega)|\,e^{\,j\mathcal{H}\{\log|H_{L10}(\omega)|\}}, \qquad H_{\mathrm{mp}R10}(\omega) = |H_{R10}(\omega)|\,e^{\,j\mathcal{H}\{\log|H_{R10}(\omega)|\}} \tag{10b}$$
$$H_{\mathrm{mp}L5}(\omega) = \frac{H_{\mathrm{mp}L0}(\omega) + H_{\mathrm{mp}L10}(\omega)}{2}, \qquad H_{\mathrm{mp}R5}(\omega) = \frac{H_{\mathrm{mp}R0}(\omega) + H_{\mathrm{mp}R10}(\omega)}{2} \tag{10c}$$
$$\mathrm{delay} = \frac{\max[\mathrm{xcorr}(h_{L0}[n], h_{R0}[n])] + \max[\mathrm{xcorr}(h_{L10}[n], h_{R10}[n])]}{2} \tag{10d}$$
$$H_{L5}(\omega) = H_{\mathrm{mp}L5}(\omega)\,e^{-j\omega\,\mathrm{delay}/2}, \qquad H_{R5}(\omega) = H_{\mathrm{mp}R5}(\omega)\,e^{\,j\omega\,\mathrm{delay}/2} \tag{10e}$$

ITD resolution is bounded by the sampling period of the HRTFs. The normal HRTFs used in this study were sampled every 20 μs, so the ITD resolution was 20 μs, which may be too coarse for HRTFs with 1° resolution (near 0°, a 1° change in azimuth changes the ITD by only roughly 9 μs for a normal head). The normal HRTFs were therefore upsampled to achieve better ITD resolution. New normal and altered HRTFs were created in this way and gave acceptable performance at 1° resolution.

4.4. Equipment and Experimental Setup

The JND experiment used an auditory virtual environment to present acoustic localization cues to subjects; it was identical to the one used in the previous experiments, except that the tracker was not used. Figure 24 shows a block diagram of the system; the PC and the signal processing device were the same ones used before.

[Figure 24 diagram; recoverable labels: PC (main control), Wave Generator (Gaussian Noise).]
Figure 24. Block diagram of the virtual auditory environment used in the JND experiment. Descriptions of the PC and the signal processing device can be found in section 3.3.

The PC controlled the program and decided which HRTF was appropriate to use. This information was transmitted to the signal processing device, which filtered the input waveform with the specified HRTF. The filtered signal was then presented to the subject through headphones. For this experiment, the stimuli were rectangularly gated Gaussian noise pulses of 0.2-s duration, produced by a WG 1 Waveform Generator.

4.5. Experiment Description

In each trial of the JND test, three consecutive noise pulses were presented in one of the following orders, selected randomly: (i) RP, LEFT, RP or (ii) RP, RP, LEFT, where RP stands for a pulse simulated at the reference position and LEFT for a pulse simulated at a position to the left of the reference position by a small increment. Subjects had to determine which interval (second or third) was located to the left of the first sound. To disguise any overall spectral level difference between positions that could serve as an acoustic cue, the intensity of each stimulus was randomized over a range of 15 dB. The experiment used the transformed up-down method for estimating JND values described by Levitt (1971). This method employs an adaptive procedure in which the position increment in the LEFT interval is determined by the prior stimuli and responses. The objective of this forced-choice experiment was to determine the MAA (defined as the change in location that yields 70.7% correct performance) for simulated sources using both normal and altered HRTFs.
Accordingly, to make the track converge to the 70.7% level, the following rules were used: (i) if the answer is incorrect, increase the angle between RP and LEFT by 1°; (ii) if there are two correct answers in a row, decrease the angle between RP and LEFT by 1°. When the increment presented in a run is tracked, peaks (where several correct answers were achieved and the angle between RP and LEFT is smallest) and valleys (after incorrect answers, where the angle between RP and LEFT is larger) are observed. The average of the peak and valley angles yields an estimate of the JND (i.e., the 70.7% level). In the experiment, the first three reversals (i.e., changes from correct to incorrect answers or from incorrect to correct) were discarded, and the MAA was estimated from the next seven reversals.

An experimental session consisted of four tests using different reference positions (0°, 10°, 30°, and 60°). The order of the four tests was chosen randomly. Each test was done twice in each session, once with normal HRTFs and once with altered HRTFs. Each session lasted approximately 40 minutes, and each subject performed three sessions. Consequently, for each subject there were three JND values for each of the four reference positions using normal cues and three JND values for each reference position using altered cues.

Prior to the beginning of the experiment, subjects were informed that normal and altered cues would be used. They were notified before every change of cues (from normal to altered or from altered to normal), so that they would answer as accurately as possible for the current cues. Additionally, they were told that the intensity of the sounds was random and therefore should not be used as a localization cue; however, they were free to use whatever cue they found useful, based on the feedback provided.
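The tracking rules above (a two-down, one-up rule in Levitt's terminology) can be sketched as follows. Because the track moves down with probability p² and up with probability 1 − p², it is stationary where p² = 0.5, i.e., p ≈ 0.707. This is a minimal illustration, not the software used in the experiment: the `respond` callback stands in for the listener, a reversal is taken to be a change in the direction of the track, and the function name and its arguments are hypothetical.

```python
def staircase_jnd(respond, start=10.0, step=1.0, discard=3, keep=7,
                  max_trials=1000):
    """Two-down, one-up transformed up-down track (after Levitt, 1971).

    respond(angle) -> True if the listener answers correctly when the
    LEFT pulse is `angle` degrees from the reference position.
    Returns the mean angle over reversals discard+1 .. discard+keep.
    """
    angle = start
    correct_in_row = 0
    last_direction = 0          # -1: last step was down, +1: up, 0: none yet
    reversals = []
    for _ in range(max_trials):
        if respond(angle):
            correct_in_row += 1
            if correct_in_row < 2:
                continue        # need two correct in a row before stepping down
            correct_in_row = 0
            direction = -1
        else:
            correct_in_row = 0
            direction = +1
        if last_direction != 0 and direction != last_direction:
            reversals.append(angle)     # the track turned at this angle
        last_direction = direction
        angle = max(angle + direction * step, step)  # keep increment positive
        if len(reversals) >= discard + keep:
            break
    used = reversals[discard:discard + keep]
    if len(used) < keep:
        raise RuntimeError("track did not produce enough reversals")
    return sum(used) / len(used)


# A deterministic toy listener who is correct whenever the increment is at
# least 4.5 degrees: the track oscillates between 4 and 5 degrees.
print(staircase_jnd(lambda a: a >= 4.5))
```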
Finally, subjects were informed that some trials might be harder than others and that feedback would indicate the correct answer. At the beginning of the first session, subjects completed a training session to become familiar with the test procedure. Data from six subjects were gathered (none of whom performed the adaptation experiments). All subjects were naive, reported normal hearing, and had no difficulty performing the test.

4.6. Expected Results

The JND results for the normal cues should be roughly similar to those found by Mills (1963), shown in Figure 23. However, they might be somewhat larger due to the HRTF approximations used, limitations of the virtual environment, differences in the stimuli, or other procedural changes. Given the results of the previous experiments with altered cues, the JND was expected to improve for the center positions and to get worse as the angle increases. This could be explained by the unnatural ITDs presented to the subjects: they would not be able to distinguish sources when the ITDs were larger than those normally received. Figure 25 sketches the expected results for the JND.

[Figure 25 plot: expected JND vs. reference position (degrees).]
Figure 25. Expected JND. o: Using normal cues. *: Using altered cues.

4.7. Results

The data were averaged across sessions and across subjects to find the mean value for each reference position (0°, 10°, 30°, and 60°). The least-square-error quadratic polynomials that fit the mean values are

$$\mathrm{JND}_{\text{normal}}(\theta) = 0.0011\,\theta^2 + 0.1417\,|\theta| + 4.7579 \quad \text{for } |\theta| < 60^\circ$$
$$\mathrm{JND}_{\text{altered}}(\theta) = 0.0033\,\theta^2 + 0.1453\,|\theta| + 3.5532 \quad \text{for } |\theta| < 60^\circ \tag{11}$$

where θ represents the reference position in degrees. Figure 26 shows that these polynomial representations have the same characteristics as the expected results (Figure 25). A smaller JND value was obtained around the center positions when altered cues were used.
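Equation 11 can be evaluated directly to locate the angle beyond which normal cues give the smaller fitted JND. The sketch below is an illustration, not part of the original analysis; the function names are hypothetical. Because the difference between the two fits is itself a quadratic in |θ|, the crossover is its positive root.

```python
import math


def jnd_normal(theta_deg):
    """Equation 11, normal cues: fitted JND (degrees) at reference theta."""
    a = abs(theta_deg)
    return 0.0011 * a ** 2 + 0.1417 * a + 4.7579


def jnd_altered(theta_deg):
    """Equation 11, altered (double-head) cues."""
    a = abs(theta_deg)
    return 0.0033 * a ** 2 + 0.1453 * a + 3.5532


def crossover_deg():
    """Positive |theta| where the two fitted curves are equal.

    jnd_altered - jnd_normal = qa*a^2 + qb*a + qc; below the positive
    root, altered cues give the smaller JND.
    """
    qa = 0.0033 - 0.0011
    qb = 0.1453 - 0.1417
    qc = 3.5532 - 4.7579
    return (-qb + math.sqrt(qb * qb - 4 * qa * qc)) / (2 * qa)


for theta in (0, 10, 30, 60):
    print(theta, round(jnd_normal(theta), 2), round(jnd_altered(theta), 2))
print("crossover:", round(crossover_deg(), 1))  # 22.6
```

The crossover of about 22.6° is consistent with the observation that normal cues are easier to discriminate from roughly 20° outward.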
The smaller JND at the center implies that better resolution should be achieved at these positions, since resolution is inversely proportional to the JND. Additionally, the JND for altered cues increases at the edges, indicating poorer resolution. Figure 26 shows that even though altered cues are easier to discriminate at the center positions, normal cues are easier to discriminate over a larger range of positions (from roughly 20° and beyond).

[Figure 26 plot: mean JND vs. reference position (degrees).]
Figure 26. Mean JND. Solid line: Using normal cues, where o denotes the actual data points. Dashed line: Using altered cues, where * denotes the actual data points.

Note that at 0°, both data sets (normal and altered) have a larger JND than at 10°. This may be a consequence of the HRTF error explained in section 3.4.5, which causes positions to shift slightly to the left.

5. MODEL OF ADAPTATION

A model of adaptation to the larger-than-normal auditory localization cues is developed to make quantitative predictions of the rate at which subjects adapt to the supernormal cues. The model is based on the one introduced by Shinn-Cunningham et al. (1998b).

5.1. Remapping Function

Doubling the size of the head doubles the ITD that arises for a source at a fixed position. However, to relate the previous adaptation model to the double-head conditions, we must develop a quantitative description of where a double-head source would be perceived by a naive subject (i.e., what source position normally gives rise to the ITDs produced by a source at θ reaching the enlarged head). Let θ denote the azimuth of a sound source reaching the enlarged head, and let θ' denote the location that gives rise to the same physical cues for a normal-size head. Before any adaptation has taken place, a source reaching the enlarged head from position θ will be perceived at position θ'.
The relation between θ and θ' is assumed to be described by some function f such that

$$f(\theta) = \theta'. \tag{12}$$

Because of the discrepancy between θ and θ' (described by the function f), the supernormal cues will introduce an initial bias. The relationship between θ and θ' can be derived from the mathematical approximation for ITD (Equation 2). As explained before, the effects of IID and the spectral shaping of the pinnae are ignored, since ITD is the dominant cue when a broadband stimulus is presented. The altered ITD for a source at θ is twice the normal ITD, and it must equal the normal ITD for a source at θ' (angles in radians):

$$2 \cdot \mathrm{ITD}_{\text{normal}}(\theta) = \mathrm{ITD}_{\text{normal}}(\theta')$$
$$2 \cdot 255\,\mu\mathrm{s} \cdot (\theta + \sin\theta) = 255\,\mu\mathrm{s} \cdot (\theta' + \sin\theta')$$
$$2\theta + 2\sin\theta = \theta' + \sin\theta'. \tag{13}$$

Solving Equation 13 numerically and fitting the solution with a cubic polynomial (θ in degrees) gives

$$f(\theta) = 0.0008\,|\theta|^3 - 0.0111\,\theta^2 + 2.1115\,|\theta| - 0.2106 \quad \text{for } |\theta| < 40^\circ. \tag{14}$$

This function is shown in Figure 27.

[Figure 27 plot: f(θ) (degrees) vs. θ (degrees), 0 to 60.]
Figure 27. Remapping function. Dashed line: Normal cues (i.e., f(θ) = θ). Solid line: Altered cues (i.e., f(θ) = θ').

The remapping function is defined between −40° and 40° because outside this range the altered ITDs are larger than those normally heard. As expected, for small angles the remapping function is approximately f(θ) = 2θ, since sin θ ≈ θ. For larger angles, f(θ) deviates from a straight line and is greater than 2θ. Finally, θ = 40° is the upper limit of this function (it maps to θ' = 90°), since it is unclear how to predict what location will be perceived for sources whose ITDs are larger than those occurring in normal listening conditions. One possible way of handling these extreme values is to assume that they are all heard at 90°. Figure 27 shows how altered HRTFs should affect the JND at locations within 40°. The remapping creates a larger-than-normal difference in physical cues between two sources at different positions, as seen by comparing the normal remapping function (f(θ) = θ) to the double-head function at all positions.

5.2. Average Perceived Position

Even though f(θ) describes the remapping produced by the altered HRTFs, it does not account for the process governing how subjects adapt over runs. The average perceived position p (or mean response; results are plotted in Figures 13 and 19) reflects changes in response due to adaptation during the experiments. Therefore, the average perceived position (in degrees) is a function of the run r and of the remapping f(θ):

$$p[f(\theta), r] = k(r) \cdot f(\theta). \tag{15}$$

Here k(r) represents the slope of the line relating the average response to the normally equivalent position f(θ). It describes the adaptive state of the subject during run r. Estimates of the slope k(r) were calculated by finding the least-square-error solution to Equation 15 and then averaging across subjects. However, because edge effects would cause errors in the slope estimates and because the remapping function is defined only up to 40°, only the middle 7 positions were used (i.e., between −30° and 30°). Figure 28 (group with feedback) and Figure 29 (group with no feedback) show the mean responses and the best-fit slope as a function of the transformed source position. In both cases, the slope estimate is near 1.0 when normal cues are presented and around 1/2 with altered cues. Also, the slope decreases between the first and the last run with altered cues, reflecting the adaptation process. As mentioned before, several positions fell outside the range of possible responses; even though they were not used in the estimation of k(r), they are still shown in the figures, plotted as if heard at 90°. Figures 28 and 29 also demonstrate that the linearity assumed in Equation 15 is valid, since the squared regression coefficients (R²) are very close to 1.0 for every run.
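Equations 13 and 15 can be illustrated together: solve Equation 13 directly for θ' (rather than through the cubic fit of Equation 14), then estimate k(r) as a least-squares line through the origin. This is a minimal sketch, assuming bisection as the solver and using hypothetical function names; the thesis does not state which numerical method it used. With this direct solution, the largest θ for which a θ' ≤ 90° exists is about 38°, close to the 40° limit quoted above.

```python
import math


def remap_deg(theta_deg, tol=1e-12):
    """Solve Equation 13, 2(t + sin t) = t' + sin t', for t' in degrees.

    Bisection works because t' + sin t' increases monotonically on
    [0, pi/2]. Returns None when no t' <= 90 degrees exists, i.e. when
    the doubled ITD exceeds the largest naturally occurring ITD.
    """
    sign = -1.0 if theta_deg < 0 else 1.0
    t = math.radians(abs(theta_deg))
    target = 2.0 * (t + math.sin(t))
    lo, hi = 0.0, math.pi / 2
    if target > hi + math.sin(hi):
        return None
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mid + math.sin(mid) < target:
            lo = mid
        else:
            hi = mid
    return sign * math.degrees(0.5 * (lo + hi))


def fit_slope(f_values, mean_responses):
    """Least-squares k in p = k * f(theta) (Equation 15), no intercept."""
    num = sum(x * y for x, y in zip(f_values, mean_responses))
    den = sum(x * x for x in f_values)
    return num / den


positions = [-30, -20, -10, 0, 10, 20, 30]      # middle 7 positions
f_vals = [remap_deg(p) for p in positions]
synthetic = [0.5 * f for f in f_vals]            # hypothetical fully adapted listener
print(round(fit_slope(f_vals, synthetic), 3))    # 0.5
```

With real data, `synthetic` would be replaced by a run's measured mean responses at those positions.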
For the initial normal-cue runs, the slope estimate should be 1.0, since the average perceived position of a naive subject is expected to equal the presented position (with normal cues, f(θ) = θ):

$$p[f(\theta), 1] = k(1) \cdot f(\theta) = k(1) \cdot \theta = \theta \quad\Rightarrow\quad k(1) = 1. \tag{16}$$

When altered cues are presented, the slope should decrease to 1/2 as subjects adapt. Since perfect adaptation is given by p[f(θ), r] = θ, and assuming f(θ) ≈ 2θ, one has:

$$p[f(\theta), r] = k(r) \cdot f(\theta) \approx k(r) \cdot 2\theta = \theta \quad\Rightarrow\quad k(r) = \tfrac{1}{2}. \tag{17}$$

[Figure 28 panels: mean response vs. transformed position θ' (degrees) for the group with feedback; recoverable panel labels: run 2, R² = 0.994, k(2) = 0.961; run 3, R² = 0.989, k(3) = 0.591.]
Figure 28. Mean response (o) and best-fit slope (solid line) for the group with feedback. Runs 2 and 33 use normal cues (θ' = θ), while runs 3 and 32 use altered cues. R² is the square of the regression coefficient.

[Figure 29 panels: mean response vs. transformed position θ' (degrees) for the group with no feedback; recoverable panel labels: run 9, R² = 0.995, k(9) = 0.561; run 10, R² = 0.999, k(10) = 0.859.]
Figure 29. Mean response (o) and best-fit slope (solid line) for the group with no feedback. Runs 2 and 10 use normal cues (θ' = θ), while runs 3 and 9 use altered cues. R² is the square of the regression coefficient.

[Figure 30 plot: slope estimate vs. run r, 0 to 40.]
Figure 30. Average least-square-error slope as a function of run. o: Group with feedback. *: Group with no feedback.

Figure 30 shows the best-fit slope for all runs in both experimental groups. It illustrates that the slope decreases dramatically when altered cues are first introduced (run 3) and then decreases slowly as subjects adapt (runs 3 through 32 for the group with feedback, and runs 3 through 9 for the group with no feedback). The two groups show similar characteristics.
The difference in slope between runs 1 and 2 shows that the group without feedback adapted to the normal cues, while the other group adapted immediately with the aid of correct-answer feedback. Figure 30 also shows that the group with feedback achieved stable performance with the altered cues, while the second group, with fewer runs, had not yet reached an asymptotic level of performance. In both experiments, the slope begins near 1.0, as expected for unadapted subjects using normal cues. When altered cues are imposed, the average slope changes rapidly to a value around 0.6. The slope then decreases toward a value around 0.5 as subjects adapt, with the largest changes taking place during the first runs with altered cues. This rate of change in slope can be modeled by an exponential equation:

$$k(r) = T - (T - K_0)\,e^{-br}, \tag{18}$$

where T is the estimate of the slope asymptote, K0 is the estimate of the initial slope of the subject's responses, r is the run number (normalized so that the first run with altered cues has value 1), and b is a parameter that controls the rate of adaptation. The asymptote T was found as the mean of the last slope estimates using altered cues (the last 3 for the group with feedback and the last 2 for the other group). The initial slope K0 was set to the average of the first two normal runs. Finally, the adaptation rate for each group was estimated by finding the least-square-error solution to Equation 18. The results are summarized by Equation 19 and Figures 31 and 32:

$$k(r) = 0.53 - (0.53 - 0.98)\,e^{-0.86r} \quad \text{for the group with feedback}$$
$$k(r) = 0.55 - (0.55 - 1.05)\,e^{-0.87r} \quad \text{for the group with no feedback.} \tag{19}$$

For both groups, the fitted parameters are nearly equal, indicating that there is no particular advantage for the feedback group. In particular, neither T, the asymptote, nor b, the rate of adaptation, differs much across groups.
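With T and K0 held fixed as described above, fitting Equation 18 reduces to a one-dimensional least-squares problem in b. The sketch below uses a golden-section search on the sum of squared errors; that search method, the search interval, and the function names are assumptions for illustration, since the thesis does not state how the minimization was carried out. It recovers b from noise-free values generated by the feedback-group fit of Equation 19.

```python
import math


def k_model(r, T, K0, b):
    """Equation 18: slope in run r (r = 1 is the first altered-cue run)."""
    return T - (T - K0) * math.exp(-b * r)


def fit_rate(runs, slopes, T, K0, lo=1e-3, hi=10.0, iters=100):
    """Least-squares b with T and K0 fixed, by golden-section search.

    Assumes the squared-error curve is unimodal in b on [lo, hi].
    """
    def sse(b):
        return sum((k_model(r, T, K0, b) - k) ** 2 for r, k in zip(runs, slopes))

    g = (math.sqrt(5.0) - 1.0) / 2.0
    a, c = lo, hi
    x1, x2 = c - g * (c - a), a + g * (c - a)
    f1, f2 = sse(x1), sse(x2)
    for _ in range(iters):
        if f1 < f2:                       # minimum lies in [a, x2]
            c, x2, f2 = x2, x1, f1
            x1 = c - g * (c - a)
            f1 = sse(x1)
        else:                             # minimum lies in [x1, c]
            a, x1, f1 = x1, x2, f2
            x2 = a + g * (c - a)
            f2 = sse(x2)
    return 0.5 * (a + c)


runs = list(range(1, 31))
synthetic = [k_model(r, 0.53, 0.98, 0.86) for r in runs]   # noise-free check
print(round(fit_rate(runs, synthetic, T=0.53, K0=0.98), 3))  # 0.86
```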
The similarity in the fitted parameters across groups is due in part to the way in which the asymptote was estimated for the no-feedback group. By inspection, these subjects appear to be still adapting in run 9. If their true asymptote is actually smaller than 0.55 (e.g., closer to 0.53, the asymptote of the feedback group), the rate b for this group would be smaller. Again by inspection, this may be a more accurate description of their behavior. In general, the main difference between the groups is that, in a given run, the slope is larger for the group with no feedback than for the feedback group. One explanation of this phenomenon is that feedback speeds the rate of adaptation.

[Figure 31 plot: slope estimate vs. run r for the group with feedback.]
Figure 31. Estimates for T, K0, b, and k(r). o: Slope estimates. Solid line: k(r) approximation described by Equation 19 for the group with feedback.

[Figure 32 plot: slope estimate vs. run r for the group with no feedback; recoverable labels: K0 = 1.05, b = 0.87, T = 0.55.]
Figure 32. Estimates for T, K0, b, and k(r). *: Slope estimates. Solid line: k(r) approximation described by Equation 19 for the group with no feedback.

6. RELATING JND AND RESOLUTION

Results from the just noticeable difference (JND) experiment may predict the relative sensitivity to localization cues in the adaptation experiments. The JND gives a measure of the resolution between two positions, while the sensitivity measured in the adaptation experiments gives insight into how sensitive subjects are to localization cues in experiments with a larger range of stimuli. One model (the preliminary model of intensity perception by Durlach and Braida, 1969) predicts that the JND is inversely proportional to sensitivity in identification experiments. The scale factor relating these measures, however, depends on internal noise.

6.1. Background

Durlach and Braida's preliminary intensity perception model provides a quantitative theory for predicting resolution in various types of intensity-perception experiments (the model is described fully in Durlach and Braida, 1969). The model has previously been applied to localization experiments involving adaptation (Shinn-Cunningham, 1998). In the model, every stimulus I maps to an internal sensation Y, a Gaussian random variable with mean α(I) and variance β². The internal sensation is further transformed by the addition of a second source of noise, a zero-mean Gaussian random variable with variance γ². As a result, the internal decision variable is a Gaussian random variable Q with mean α(I) and variance β² + γ². The variance β² + γ² arises from two independent sources. The first noise source, β², is called sensation noise and depends solely on the stimulus presented. This noise limits the best performance that can be achieved in any experiment. Memory noise (γ²), the second noise source, affects the transformation from the sensation Y to the random variable Q and depends on the type of experiment. For the types of experiments performed in this study, the standard deviation of the memory noise is assumed to be proportional to the total range of stimuli presented; this type of noise is termed context-coding noise. Therefore, if I_max and I_min are the extreme stimulus values used in an experiment, the memory noise can be written as

$$\gamma^2 = G^2\,[\alpha(I_{\max}) - \alpha(I_{\min})]^2, \tag{20}$$

where G is a constant and α(·) is the function that maps the physical location of a stimulus onto the internal decision axis. The addition of context-coding noise allows the model to explain why subjects may confuse two stimuli in a large-range identification task while identifying them correctly in a JND-type task (where the range is small). The model assumes that subject responses are based on the value of the decision variable Q. The Q axis is divided into n contiguous regions by n+1 criteria.
Each region corresponds to one of the n possible stimulus locations presented in the experiment. For a one-interval experiment, the discriminability of two stimuli I_i and I_j can be written as

$$d'(I_i, I_j) = \frac{\alpha(I_i) - \alpha(I_j)}{\sqrt{\beta^2 + G^2\,[\alpha(I_{\max}) - \alpha(I_{\min})]^2}}. \tag{21}$$

Therefore, d' increases as the distance between the mean values of the two stimuli I_i and I_j increases. Also, for two fixed stimuli, the sensitivity decreases as the range increases, since the internal noise grows. In other words, the Gaussian distributions of the two stimuli overlap more in the large-range case than in the small-range case.

In summary, the function α(·) transforms physical stimulus values into values along an internal decision axis. The mean of the decision variable Q is monotonically related to the location of the physical stimulus, and its standard deviation is independent of location. The internal noise comes from a fixed source with variance β² and a second source that depends on the total range of the stimuli (with variance γ²). The JND (the stimulus difference at which 70.7% correct responses occur) is the increment for which the reference stimulus and the reference stimulus plus the increment, transformed by α(·), are separated enough that subjects can reliably tell the two apart. The model assumes that the two stimulus distributions overlap by an amount that leads to 70.7% correct responses when the decision criterion is positioned optimally, halfway between the means of the distributions. Thus, when the stimuli are one JND apart, the area on the wrong side of the criterion is 0.293 (i.e., 29.3% incorrect responses). Discriminability in a JND experiment is dominated by sensation noise; memory noise in these tasks is negligible because the range is small. Additionally, in the underlying decision space, the JND increment (in standard-deviation units) is independent of the reference stimulus.
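Equation 21 and the 70.7% criterion can be made concrete in a few lines. In this sketch (function name and argument layout are hypothetical), the optimal criterion halfway between two unit-variance Gaussians leaves Φ(−d'/2) of each distribution on the wrong side, so 70.7% correct corresponds to d' = 2Φ⁻¹(0.707), about 1.09. The example also shows the predicted drop in d' when the stimulus range, and hence the context-coding noise, grows.

```python
from statistics import NormalDist


def d_prime(alpha_i, alpha_j, beta, G, alpha_range):
    """Equation 21 for a one-interval task.

    alpha_i, alpha_j : internal means alpha(I_i), alpha(I_j)
    beta             : sensation-noise standard deviation
    G * alpha_range  : context-coding (memory) noise standard deviation,
                       with alpha_range = alpha(I_max) - alpha(I_min)
    """
    sigma = (beta ** 2 + (G * alpha_range) ** 2) ** 0.5
    return (alpha_i - alpha_j) / sigma


# Criterion halfway between the means: P(correct) = Phi(d'/2) for each
# stimulus, so 70.7% correct requires d'/2 = Phi^{-1}(0.707).
d_at_jnd = 2 * NormalDist().inv_cdf(0.707)
print(round(d_at_jnd, 2))  # 1.09

# Same pair of stimuli, small vs. large stimulus range.
print(d_prime(1.0, 0.0, beta=1.0, G=0.1, alpha_range=1.0))
print(d_prime(1.0, 0.0, beta=1.0, G=0.1, alpha_range=10.0))
```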
Because the JND increment is fixed in standard-deviation units, the increment in stimulus units is roughly inversely proportional to the derivative of the $\alpha(\cdot)$ function evaluated at the reference position:

$$\mathrm{JND}(\theta) \propto \left[\frac{d\alpha(\theta)}{d\theta}\right]^{-1} \qquad (22)$$

Resolution in identification tasks is likewise proportional to the distance between the means of the underlying distributions (in standard-deviation units). However, in the large-range task there is a significant amount of Memory Noise. As a result, the two types of experiments should show similar relative sensitivity, and resolution in the large-range experiment should be proportional to the inverse of the JND:

$$d' = A \cdot d^{*} = A \cdot \mathrm{JND}^{-1} \qquad (23)$$

where $A$ is a constant. This result indicates that the inverse of the JND function predicts the general shape of resolution and that relative sensitivity depends only on the $\alpha(\cdot)$ function.

6.2. Results

The mean JND functions found for normal and altered cues (Equation 11) were used to estimate the shape of $d'$. Figure 33 shows the result of inverting the JND functions. Sensitivity for the altered cues around the center positions is greater because the JND there is smaller than for normal cues. For angles larger than 25°, $d'$ is larger for normal cues (as expected). Since the units of the estimated sensitivity are arbitrary, the data in Figure 33 were scaled, without loss of generality, so that the value of the normal data at 0° was one.

[Figure 33 plot: Inverse JND versus target position (degrees).]

Figure 33. Estimated shape of $d'$. Solid line: normal cues. Dashed line: altered cues.

The scale factors that made the resolution results from the experiments best fit the 1/JND data were found using a least-squares error method. Resolution data obtained with normal cues were fitted to the normal 1/JND curve, while resolution data from altered runs were scaled to match the 1/JND curve for altered cues. The scaled versions of the resolution data are plotted and compared to the 1/JND curves in Figure 34.
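The normalization used for Figure 33 and the least-squares scale factor used for Figure 34 can be sketched as follows. Let $r_k$ be a resolution curve and $g_k$ a target 1/JND curve, sampled at the same positions. Minimizing $\sum_k (s\,r_k - g_k)^2$ over $s$ gives the closed form $s = \sum_k r_k g_k / \sum_k r_k^2$. The function names and the JND and resolution arrays in the example are hypothetical placeholders, not the thesis data.

```python
def normalized_inverse_jnd(normal_jnd, altered_jnd):
    """Invert both JND curves and scale them by the same factor so that the
    normal-cue curve equals 1 at the first position (0 degrees)."""
    ref = normal_jnd[0]  # multiplying 1/JND by this makes the normal curve 1 at 0 degrees
    return [ref / j for j in normal_jnd], [ref / j for j in altered_jnd]

def fit_scale_factor(resolution, target):
    """Least-squares s minimizing sum((s * r - g)^2) over paired samples r, g."""
    num = sum(r * g for r, g in zip(resolution, target))
    den = sum(r * r for r in resolution)
    return num / den

# Hypothetical JND values (degrees) at three positions, normal vs. altered cues.
normal_inv, altered_inv = normalized_inverse_jnd([2.0, 4.0, 8.0], [1.0, 4.0, 16.0])
print(normal_inv, altered_inv)  # [1.0, 0.5, 0.25] [2.0, 0.5, 0.125]
# Hypothetical resolution data that are exactly twice the normal 1/JND curve.
print(fit_scale_factor([2.0, 1.0, 0.5], normal_inv))  # 0.5
```

In this formulation the fitted $s$ multiplies the measured resolution. A larger $s$ therefore means that measured resolution fell further below the 1/JND shape, which the model attributes to more internal noise.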
By inspection, resolution and 1/JND show very similar shapes in all cases (i.e., for normal and altered cues, and before and after adaptation). Note that within each graph, the results before and after adaptation appear very similar. However, the scale factors needed to fit these two curves to the 1/JND curve differ. These scale factors are proportional to the internal noise in the model, which may vary with training (Shinn-Cunningham, 1998). The prediction that relative sensitivity depends only on the stimuli used appears to hold, so the remaining differences are captured by the scale factors, which are analyzed further below.

[Figure 34 plot: four panels (group with feedback, group with no feedback; normal and altered cues), scaled resolution versus target position (degrees).]

Figure 34. Resolution results scaled to match 1/JND curves. Results for the group with feedback are plotted on the left; results for the group without feedback are plotted on the right. Results for normal cues are shown in the top panels; results for altered cues are shown in the bottom panels. Solid lines: resolution data from experiments. Dashed lines: 1/JND curves. o: first presentation of normal cues (before adaptation). *: first presentation of altered cues (before adaptation). X: last presentation of altered cues (after adaptation). +: first presentation of normal cues following the last altered cues (after adaptation).

The scale factor analysis directly reflects any changes in internal noise during the course of an experiment, and several hypotheses can be made using the ideas of the preliminary intensity perception model.
First define:
- $n_1$ as the internal noise for normal cues before adaptation,
- $n_2$ as the internal noise for altered cues before adaptation,
- $n_3$ as the internal noise for altered cues after adaptation,
- $n_4$ as the internal noise for normal cues after adaptation,
- $sf_1$ as the scale factor for normal cues before adaptation,
- $sf_2$ as the scale factor for altered cues before adaptation,
- $sf_3$ as the scale factor for altered cues after adaptation,
- $sf_4$ as the scale factor for normal cues after adaptation.

It is reasonable to think that the internal noise does not change abruptly when the acoustic cues change (from normal to altered or from altered to normal) but instead changes slowly. Thus, $n_1$ should be roughly equal to $n_2$, and $n_3$ should be roughly equal to $n_4$. This assumption implies that $sf_1$ should equal $sf_2$ and that $sf_3$ should equal $sf_4$. Even though $sf_1$ and $sf_2$ (or $sf_3$ and $sf_4$) refer to different 1/JND curves, their values should be the same if the internal noise is the same. In that case, both sets of resolution data are affected by the same amount of internal noise and are therefore scaled by the same factor relative to the underlying 1/JND curve. It is also expected that internal noise will increase after adaptation (Shinn-Cunningham, 1998). Here, $n_4$ should be greater than $n_1$, and $n_3$ should be greater than $n_2$. This implies that $sf_4$ should be greater than $sf_1$ and $sf_3$ greater than $sf_2$. Finally, if feedback were unimportant in these experiments, internal noise values should be comparable across the two groups (feedback and no feedback). Equation 24 summarizes these hypotheses:

$$\begin{aligned} n_1 = n_2 \text{ and } n_3 = n_4 &\;\Rightarrow\; sf_1 = sf_2 \text{ and } sf_3 = sf_4 \\ n_1 < n_4 \text{ and } n_2 < n_3 &\;\Rightarrow\; sf_1 < sf_4 \text{ and } sf_2 < sf_3 \end{aligned} \qquad (24)$$

Figure 35 compares the scale factors obtained in both experiments. The hypothesis that noise does not change much between $n_1$ and $n_2$ or between $n_3$ and $n_4$ is not generally true.
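The hypotheses in Equation 24 can be expressed as simple checks on the four scale factors. In this sketch, "equal" means agreement within a relative tolerance. The function name, the 5% tolerance, and the example scale factors are hypothetical choices, not values from the experiments.

```python
import math

def check_hypotheses(sf1, sf2, sf3, sf4, rel_tol=0.05):
    """Evaluate the Equation 24 predictions on scale factors sf1..sf4
    (normal before, altered before, altered after, normal after adaptation).
    rel_tol is an arbitrary, hypothetical tolerance for 'roughly equal'."""
    return {
        "sf1 ~ sf2 (no abrupt change before adaptation)": math.isclose(sf1, sf2, rel_tol=rel_tol),
        "sf3 ~ sf4 (no abrupt change after adaptation)": math.isclose(sf3, sf4, rel_tol=rel_tol),
        "sf1 < sf4 (noise grows with adaptation, normal cues)": sf1 < sf4,
        "sf2 < sf3 (noise grows with adaptation, altered cues)": sf2 < sf3,
    }

# Hypothetical scale factors, for illustration only.
for name, holds in check_hypotheses(1.0, 1.02, 1.3, 1.1).items():
    print(f"{name}: {holds}")
```

With these illustrative numbers, every prediction holds except $sf_3 \approx sf_4$. That is the kind of partial agreement the Figure 35 discussion evaluates.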
With or without feedback, $n_1$ appears to have the smallest value, and $n_2$ is slightly larger. In the feedback experiment, $n_4$ is clearly smaller than $n_3$, but this relation does not hold in the no-feedback experiment. Additionally, the data support the idea that $n_4 > n_1$ in both experiments, but $n_3 > n_2$ only in the feedback experiment. Finally, looking across experiments, there is a large difference between the scale factors. In particular, performance is generally worse in the no-feedback case (scale factors are larger) than in the feedback case, indicating that feedback helps reduce the overall internal noise. Interestingly, the maximum noise reached is roughly the same in both experiments ($n_3$ is about 0.53 in both).

[Figure 35 plot: Scale Factor Analysis; scale factors for the group with feedback and the group with no feedback.]

Figure 35. Scale factors for the group with feedback are on the left; scale factors for the group with no feedback are on the right. Markers distinguish normal cues before adaptation, altered cues before adaptation, altered cues after adaptation, and normal cues after adaptation.

A possible explanation for these results is that in the feedback condition, noise changes rapidly between runs, producing observable changes from $n_1$ to $n_2$ to $n_3$ to $n_4$. In general, $n_1$ is smallest, consistent with the idea that the effective range of stimuli is smaller (the normal range of -60° to +60°). After altered cues are introduced, internal noise increases ($n_2$) because the range is larger and subjects are starting to adapt. $n_3$ has the largest value, since it reflects the noise when subjects attend to the whole range from -90° to +90° and are adapted to the altered cues. When normal cues are reintroduced, $n_4$ reflects a change in range back towards the original value (i.e., $n_4 < n_3$ but still $n_4 > n_1$).
Conversely, it appears that in the no-feedback condition, subjects tend to listen to almost the whole range of cues throughout the experiment. Initially, in the $n_1$ run, they attend to a slightly smaller range ($n_1 < n_2$, $n_3$, and $n_4$), but as soon as they hear the auditory signal change to the altered-cue condition, they attend to the whole range of possible locations (-90° to +90°). Thus, there is no substantial change from $n_2$ to $n_3$, since they are already attending to the whole range. When normal cues are presented again, there may be a slight decrease from $n_3$ to $n_4$, indicating that subjects attend to a slightly smaller range in the $n_4$ run. However, this change is small and probably not significant. In general, then, it appears that subjects attend to the whole range of cues when feedback is not provided because they are not sure which acoustic cues will be presented.

7. CONCLUSION

7.1. Summary

Adaptation to double-size head auditory localization cues was investigated by presenting simulated acoustic cues with an auditory virtual environment. The goal of this study was to determine whether better-than-normal performance could be achieved with these supernormal localization cues. Bias and resolution were the two aspects of performance analyzed.

This study follows previous work by Shinn-Cunningham et al. (1994 and 1998) in which a nonlinear remapping of the normal HRTFs was implemented (Equation 6). That study concluded that subjects did not adapt to the nonlinear transformation employed but rather to a linear approximation of it. It also showed that the slope relating mean response to the physical cues presented changed exponentially over time. The largest changes in performance occurred at the beginning of the exposure to altered cues, and by the end of the exposure the mean slope asymptoted to a stable value. Finally, the rate at which subjects adapted was found to be $b = 0.84\ \text{run}^{-1}$ (Equation 18).
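The exponential adaptation described above can be illustrated with one common parametrization, $s(t) = s_\infty + (s_0 - s_\infty)\,e^{-bt}$, where $t$ counts runs of exposure to altered cues. This form is an assumption made for illustration, since Equation 18 is not reproduced here. The starting and asymptotic slopes are hypothetical, and only the rate $b = 0.84\ \text{run}^{-1}$ comes from the text. Under this form, the fraction of the total change still remaining after $t$ runs is $e^{-bt}$, which shows why most of the change happens early.

```python
import math

def slope_at(t, s0, s_inf, b):
    """Assumed exponential approach of the mean-response slope from s0 toward s_inf."""
    return s_inf + (s0 - s_inf) * math.exp(-b * t)

def fraction_remaining(t, b):
    """Fraction of the total slope change not yet completed after t runs."""
    return math.exp(-b * t)

B = 0.84  # adaptation rate in run^-1, as reported for the earlier study
for t in range(6):
    print(t, round(fraction_remaining(t, B), 3))
```

With $b = 0.84\ \text{run}^{-1}$, about 57% of the change is complete after one run and roughly 98.5% after five runs. This is consistent with the largest changes occurring at the beginning of the exposure.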
In this study, even though the transformation function is not exactly linear (Equation 14), the ITDs for every position are doubled when altered cues are used. This means that the mapping is roughly linear in ITD space. Two similar experiments investigated how subject performance changed over time. Both used a forced-choice identification task with 13 different positions. In the first experiment, the first 2 and the last 8 runs used normal cues, while the middle 30 presented altered cues, and correct-answer feedback was provided after each response. In the second experiment, the first 2 and the last runs used normal cues, while the middle 7 presented altered cues, and feedback was not provided. The experiments showed that feedback accelerates the adaptation process to supernormal cues but is not necessary for adaptation to occur.

For both experiments, mean response and bias showed all the characteristics of adaptation, while resolution results were less consistent. In particular, changes in resolution in the feedback experiment were similar to changes seen in previous experiments (Shinn-Cunningham et al., 1994 and 1998). Without feedback, however, internal noise was large throughout the experiment, as if subjects attended to the whole range of possible cues independent of their adaptation rate. In general, resolution was better at the center positions when altered cues were introduced, but normal cues provided better overall resolution.

A just noticeable difference (JND) experiment was run to obtain further insight into the resolution results for normal and double-size head cues (Equation 11). The JND curves were used to predict sensitivity as a function of azimuth and to compare these measures of sensitivity to the resolution results. This analysis indicated that relative sensitivity depends only on the cues used (not on the adaptive state of the subject).
The JND analysis also showed that feedback reduces the total internal noise, improving resolution. Finally, it was found that with the double-size head localization cues, subjects also adapt to a linear transformation. In both adaptation experiments, the mean slope (relating acoustic cues to position) changed exponentially over time. The rate at which subjects adapted was found to be $b = 0.86\ \text{run}^{-1}$ for the group with feedback and $b = 0.87\ \text{run}^{-1}$ for the group with no feedback (Equation 19).

7.2. Discussion

The most important characteristic of simulating double-size head cues was that the normal ITDs were doubled for every source position. As discussed previously, for angles greater than 40° (and, since symmetry was assumed, for angles less than -40°), the altered ITDs presented were unnatural, i.e., larger than the largest normally occurring ITD. This means that subjects heard sources with cues they had never heard before. Although in one sense this transformation of ITDs is linear, in another sense it is not. In particular, the transformation function $f(\theta)$ (Equation 14) between the position that an unadapted naive listener perceives and the actual position is not linear. Additionally, $f(\theta)$ is not defined for azimuths above 40°, and it is assumed that these positions were always mapped to 90° (Figure 27). This means that subjects heard normal positions between 40° and 60° as coming from the same (90°) position. If this were exactly what occurred, subjects should confuse those positions. In other words, their resolution should be very poor for sources more than 40° to the side. This effect can be seen in the resolution results for both adaptation experiments (Figures 15 and 21). Furthermore, the JND function (Equation 11 and Figure 26) also shows poor resolution at the edges of the range, since its value at the edges is greater for altered cues than for normal cues.
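The 40° limit can be roughly cross-checked with a spherical-head model. The sketch below uses Woodworth's far-field formula, $\mathrm{ITD}(\theta) = (a/c)(\theta + \sin\theta)$, as an assumption. The cues in this study were produced by frequency-scaling normal HRTFs, not from this formula. The code doubles the ITD and asks which normal azimuth has the same ITD, giving a model-based stand-in for $f(\theta)$. It also finds the azimuth at which the doubled ITD reaches the largest natural ITD (at 90°). The head radius $a$ and speed of sound $c$ cancel out of these angle results, so their assumed values do not matter, and all function names are hypothetical.

```python
import math

HEAD_RADIUS_M = 0.0875   # assumed head radius (m); cancels out of the angle results
SPEED_OF_SOUND = 343.0   # assumed speed of sound (m/s)

def woodworth_itd(theta_deg):
    """Spherical-head (Woodworth) ITD in seconds for azimuth theta in [0, 90] degrees."""
    th = math.radians(theta_deg)
    return HEAD_RADIUS_M / SPEED_OF_SOUND * (th + math.sin(th))

def _solve_azimuth(target_itd, iters=60):
    """Bisection on [0, 90] degrees for the azimuth whose normal ITD equals
    target_itd (the model ITD increases monotonically on this interval)."""
    lo, hi = 0.0, 90.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if woodworth_itd(mid) < target_itd:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def naive_perceived_azimuth(theta_deg, scale=2.0):
    """Model stand-in for f(theta): the normal azimuth matching the scaled ITD,
    or None when the scaled ITD exceeds the largest natural ITD (unnatural cue)."""
    target = scale * woodworth_itd(theta_deg)
    if target > woodworth_itd(90.0):
        return None
    return _solve_azimuth(target)

def unnatural_onset(scale=2.0):
    """Azimuth beyond which the scaled ITD exceeds the largest natural ITD."""
    return _solve_azimuth(woodworth_itd(90.0) / scale)

print(round(unnatural_onset(), 1))  # roughly 38 degrees under this model
for theta in (10, 20, 30, 40, 50):
    f = naive_perceived_azimuth(theta)
    print(theta, "unnatural (no normal match)" if f is None else round(f, 1))
```

Under this idealized model, the doubled ITD becomes unnatural at about 38°, close to the 40° reported above. Real HRTF-derived ITDs need not follow the spherical-head formula exactly. The printed mapping also shows that the model's $f(\theta)$ is compressive rather than a simple doubling of azimuth.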
The JND data make this edge effect concrete: the JND at 60° is approximately 23° (subjects could only distinguish a source at 60° from one at 37°). As explained before, previous models predict that the inverse of the JND function gives a good estimate of the general shape of the sensitivity curve in any experiment. The inverse JND curve also predicts that resolution is very poor at the edges (Figure 33). In short, the altered cues are beneficial only for sources between -40° and +40°. In this case, the range of cues presented corresponds to the normal-cue range of exactly -90° to +90° (i.e., all ITDs presented are natural).

The two adaptation experiments (feedback and no-feedback) showed some similarities and some very interesting differences. In both groups, mean response and bias results (Figures 13, 14, 19 and 20) showed the expected adaptation process. The resolution results for the group with feedback showed changes consistent with those in previous adaptation experiments, whereas resolution in the group with no feedback did not change with adaptation (Figures 15 and 21). It seems that resolution for the no-feedback group depended only on the acoustic cues presented (i.e., the results for normal cues before and after adaptation are very similar, as are the results for altered cues before and after adaptation). Indeed, these results resemble the estimates of $d'$ as a function of azimuth (Figure 33), further supporting the hypothesis that resolution reflects and follows the changes in the type of cues.

The difference in the resolution results between the two adaptation experiments can also be explained by the amount of internal noise occurring in each stage of the adaptation process. Changes in resolution with adaptation occur because of changes in internal noise. In the group with feedback, subjects first attend to a ±60° range. When altered cues are introduced, subjects attend to a larger range, but the internal noise does not change immediately.
After adaptation has taken place, the internal noise has increased to the value appropriate for the larger range. After normal cues are reintroduced, the actual physical-cue range is reduced, but again the internal noise does not decrease abruptly. The group with no feedback, on the other hand, always attends to a large range, since they never know with certainty the range being used. For this reason, the internal noise is constant during the experiment. The scale factor analysis supports this result (Figure 35): the values for the group with no feedback are very similar for all runs, in contrast with the feedback group, for whom the values change with adaptation.

In previous work, Shinn-Cunningham et al. (1994 and 1998) found that mean response, bias and resolution were interdependent and that all changes were related to a single underlying process. It is interesting that both studies found very similar rates of adaptation using different transformations. However, it seems that the no-feedback condition causes resolution to be independent of the other quantities. Finally, boredom in the first (longer) experiment cannot explain the difference in the resolution results, since overall performance (resolution) was worse in the no-feedback experiment, which was shorter.

7.3. Future Work

The results of this study could be better understood if the exact shape of the transformation function $f(\theta)$ were known and defined for all possible source locations. This implies that the effects of the unnatural ITD values should be studied. Also, to achieve a better model for the sensitivity $d'$, the exact shape of $\alpha(\theta)$, the transformation function between physical stimulus values and variables along an internal decision axis, should be determined. This function could also help create a model of adaptation that predicts bias and resolution using the double-size head cues.
Since the feedback and no-feedback conditions gave different results, the model presented by Shinn-Cunningham et al. (1994 and 1998) should be modified to allow bias and resolution to be driven by different processes.

REFERENCES

Blauert, J. (1983). Spatial Hearing. Cambridge, MA: MIT Press.
Bolt, R. A. (1984). The human interface: Where people and computers meet. London: Lifetime Learning Publishers.
Braida, L. D. and Durlach, N. I. (1972). Intensity perception. II. Resolution in one-interval paradigms. Journal of the Acoustical Society of America, 51, 483-502.
Durlach, N. I. (1991). Auditory localization in teleoperator and virtual environment systems: Ideas, issues, and problems. Perception, 20, 543-554.
Durlach, N. I. and Braida, L. D. (1969). Intensity perception. I. Preliminary theory of intensity resolution. Journal of the Acoustical Society of America, 46, 372-383.
Durlach, N. I. and Mavor, A. S. (Eds.) (1995). Virtual Reality: Scientific and Technological Challenges. Washington, D.C.: National Academy Press.
Durlach, N. I. and Pang, X. D. (1986). Interaural magnification. Journal of the Acoustical Society of America, 80, 1849-1850.
Durlach, N. I., Shinn-Cunningham, B. G., and Held, R. (1993). Supernormal auditory localization. I. General background. Presence, 2, 89-103.
Foley, J. D. (1987). Interfaces for advanced computing. Scientific American, October, 127-135.
Kistler, D. J. and Wightman, F. L. (1991). A model of head-related transfer functions based on principal components analysis and minimum-phase reconstruction. Journal of the Acoustical Society of America, 91, 1637-1647.
Kulkarni, A. (1995). Auditory imaging in a virtual environment. Unpublished master's thesis, Department of Biomedical Engineering, Boston University, Boston, Massachusetts.
Levitt, H. (1971). Transformed up-down methods in psychoacoustics. Journal of the Acoustical Society of America, 49, 467-477.
Lippmann, R. P., Braida, L. D. and Durlach, N. I. (1976). Intensity perception. V. Effect of payoff matrix on absolute identification. Journal of the Acoustical Society of America, 59, 129-134.
Mills, A. W. (1958). On the minimum audible angle. Journal of the Acoustical Society of America, 30, 237-246.
Mills, A. W. (1963). Auditory perceptions of spatial relations. Proceedings of the International Congress of Technology and Blindness, Vol. 2 (pp. 111-139). New York: American Foundation for the Blind.
Mills, A. W. (1972). Auditory localization. In J. V. Tobias (Ed.), Foundations of Modern Auditory Theory (pp. 303-348). New York: Academic Press.
Oppenheim, A. V. and Schafer, R. W. (1975). Digital Signal Processing (pp. 337-367). Englewood Cliffs, NJ: Prentice Hall.
Plenge, G. (1974). On the difference between localization and lateralization. Journal of the Acoustical Society of America, 56, 944-951.
Rabinowitz, W. R., Maxwell, J., Shao, Y., and Wei, M. (1993). Sound localization cues for a magnified head: Implications from sound diffraction about a rigid sphere. Presence, 2, 125-129.
Lord Rayleigh [Strutt, J. W.] (1907). On our perception of sound direction. Philosophical Magazine, 13, 214-232.
Shaw, E. A. (1974). The external ear. In W. D. Keidel and W. D. Neff (Eds.), Handbook of Sensory Physiology, Vol. 1, Auditory System (pp. 455-490). New York: Springer-Verlag.
Shaw, E. A. (1975). The external ear: New knowledge. In S. C. Dalsgaard (Ed.), Earmolds and Associated Problems. Proceedings of the 7th Danavox Symposium. Scandinavian Audiology, Suppl. 5, 24-50.
Sheridan, T. (1987). Telerobotics. Proceedings of the International Federation of Automatic Control, 10th IFAC World Congress, July 27-31, Munich, FRG.
Shinn-Cunningham, B. G. (1994). Adaptation to supernormal auditory localization cues in an auditory virtual environment. Unpublished Ph.D. thesis, Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology.
Shinn-Cunningham, B. G. (1998). Adapting to remapped auditory localization cues: A decision-theory model. (Draft).
Shinn-Cunningham, B. G., Durlach, N. I., and Held, R. M. (1998a). Adapting to supernormal auditory localization cues I: Bias and resolution. Journal of the Acoustical Society of America (submitted).
Shinn-Cunningham, B. G., Durlach, N. I., and Held, R. M. (1998b). Adapting to supernormal auditory localization cues II: Constraints on adaptation of mean response. Journal of the Acoustical Society of America (submitted).
Strelow, E. R. and Warren, D. H. (1985). Sensory substitution in blind children and neonates. In D. H. Warren and E. R. Strelow (Eds.), Electronic Spatial Sensing for the Blind (pp. 273-298). Dordrecht, NL: Martinus Nijhoff.
Vertut, J. and Coiffet, P. (1986). Robot technology. Teleoperation and Robotics: Evolution and Development (Vol. 3A) and Applications and Technology (Vol. 3B). Englewood Cliffs, NJ: Prentice Hall.
Warren, D. H. and Strelow, E. R. (1984). Learning spatial dimensions with a visual sensory aid: Molyneaux revisited. Perception, 13, 331-350.
Wenzel, E. M. (1992). Localization in virtual acoustic displays. Presence, 1(1), 80-107.
Wenzel, E. M. and Foster, S. H. (1993). Perceptual consequences of interpolating head-related transfer functions during spatial synthesis. Proceedings of the IEEE ASSP Workshop on Applications of Signal Processing to Audio and Acoustics, October 1993, New Paltz, New York.
Wightman, F. L. and Kistler, D. J. (1989). Headphone simulation of free-field listening. I. Stimulus synthesis. Journal of the Acoustical Society of America, 85, 858-867.
Wightman, F. L. and Kistler, D. J. (1992). The dominant role of low-frequency interaural time differences in sound localization. Journal of the Acoustical Society of America, 91, 1648-1661.
Wightman, F. L., Kistler, D. J. and Perkins, M. E. (1987). A new approach to the study of human sound localization. In W. A. Yost and G. Gourevitch (Eds.), Directional Hearing (pp. 26-48). New York: Springer-Verlag.