Design and Evaluation of Computer-Simulated Spatial Sound
S.H. Kurniawan¹, A.J. Sporka, V. Nemec and P. Slavik

¹ Department of Computation, UMIST, PO Box 88, Manchester, M60 1QD, UK
1 Introduction
Virtual Reality (VR) has been an important and exciting field of HCI for many
years, and its potential for people with disabilities has slowly been recognised.
VR systems have been applied in the areas of education, training, rehabilitation,
communication and access to information technology for people with disabilities
(Colwell, Petrie, Kornbrot, Hardwick and Furner, 1998).
There is a wide range of applications of VR for blind and visually impaired
users. These applications share a common feature: the substitution or enhancement
of visual information with information in other modalities, i.e., audio or haptic/
tactile/kinaesthetic. One important application of VR is to train blind users to
navigate and move around in real environments, also known as orientation and
mobility (O&M) training (Inman and Loge, 1999). O&M training is important
because it helps blind people develop the skills and techniques to overcome travel
difficulties created by blindness and to maximise their ability to move around in
different environments, familiar or unfamiliar, independently, safely and confidently
(The Royal Blind School, 2003).
Conventional O&M training involves instructing a blind trainee to approach a
wall or bringing a small obstacle near the trainee's face to demonstrate the sound
variation caused by the presence of an object (known as obstacle perception
training), or bringing trainees to various environments to train them to detect the
sound variation caused by various factors, e.g., the floor texture, the room size, the
location of the closest obstacle, etc. (Seki and Ito, 2003). This method is very time consuming and may
pose some danger to the trainees (e.g., when training them to cross a busy road).
This is an area where VR and virtual sound may be beneficial. Rather than exposing
a blind trainee to a real environment, the trainee can stay in a virtual environment
and learn to orientate and move around based on the virtual sounds he/she hears.
However, this also means that the acoustic system used for the training must be able
to produce sounds that are natural to the trainees’ ears.
This paper reports on the design and evaluation of one component of the O&M
training system for blind and visually impaired people: a spatial audio system that is
capable of modelling the acoustic response of a closed environment with varying
sizes and textures (e.g., a small carpeted room vs. a large plastered hallway).
Previous work on room perception includes Suzuki and Martens' (2001) study of
subjects' ability to determine the presence of walls made of different materials in a
virtual environment.
2 Spatial Sounds in Real and Virtual Environments
2.1 Spatial Sound
Sound is essentially the vibration of particles of a medium (gas, liquid or solid)
around their equilibrium positions. If this vibration falls within the range of 10 Hz
to 20,000 Hz, the sound is called audible sound. The vibration of the particles causes small local
adiabatic variations of pressure in the medium, referred to as the acoustic pressure.
These pressure variations are propagated through the environment by means of
waves of acoustic pressure.
In the real world, with obstacles between the source and the receiver, only some
part of the sound wave travels straight from the source to the receiver (and is hence
called the direct sound). Signal 1 in Figure 1 is an example of a direct sound. The shape
of the signal of a direct sound is unchanged, except for its intensity – due to the
energy conservation law – and its temporal displacement or delay – due to the finite
phase velocity of the sound waves. Other parts of the sound will be reflected or
diffracted by some obstacles before reaching the receiver. In this case, what the
receiver receives is the acoustic response of the environment to the original sound
emitted by the source. The combination of the direct and indirect sounds is called
the spatial or spatialised sound, i.e., sound that carries the reverberation of the
environment. Figure 1 illustrates the propagation of sound in a closed environment.
Figure 1. Propagation of sound in a closed environment. S = sound source, R = sound
receiver. The diagram on the right hand side is the corresponding IR diagram.
For any configuration of sound source, sound receiver and obstacles of the
environment, it is possible to represent the acoustic response using an acoustic
impulse response (IR) diagram, as shown on the right-hand side of Figure 1. Briefly,
the IR describes the intensity and the time of arrival of all echoes of the emitted
sound received by the sound receiver.
The phase velocity of the sound waves (i.e., the speed of sound) is different for
different media (Kuttruff, 1979). For air, its magnitude is approximately:

c = 331.4 + 0.6θ, where θ is the temperature in degrees Celsius    (1)
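As a minimal illustration, Equation (1) maps directly to code; the function below is a sketch, and its name is ours, not the paper's.

```python
# A minimal sketch of Equation (1): speed of sound in air as a function
# of temperature in degrees Celsius. The function name is illustrative.
def speed_of_sound(temperature_c: float) -> float:
    """Approximate phase velocity of sound in air, in m/s."""
    return 331.4 + 0.6 * temperature_c

# Example: at 20 degrees Celsius the speed of sound is roughly 343.4 m/s.
print(speed_of_sound(20.0))  # 343.4
```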
As a consequence of the energy conservation law, the amplitude of the sound
pressure decreases with the distance from the sound source. Besides the well-known
“1/r² rule” (the intensity of sound decays with the square of the distance from the
sound source), some of the sound energy is also dissipated as heat in the medium
during the propagation itself.
Sound reflection occurs when a wave hits a surface of an obstacle, as depicted in
Figure 2.a. In this case, a reflected wave originates from the place of impact. This
reflected wave carries only a part of the energy of the original wave as the energy is
lost during interaction with the obstacle. The amount of energy lost is determined
by the absorption coefficient α, which is dependent on the material of the surface,
the frequency of the sound, and the angle of impact β.
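As a small illustration of the role of the absorption coefficient, a reflected wave retains a (1 − α) fraction of the incident energy. The sketch below (ours, not the paper's code) assumes α is already known for the surface; in reality it would be looked up per material, frequency and angle of impact.

```python
# Illustrative sketch: energy carried by a wave after one reflection,
# given an absorption coefficient alpha in [0, 1]. In reality alpha
# depends on the surface material, the sound frequency, and the angle
# of impact beta; here it is simply passed in.
def reflected_energy(incident_energy: float, alpha: float) -> float:
    """Energy remaining in the reflected wave; the rest is absorbed."""
    return incident_energy * (1.0 - alpha)

# Example: a surface with alpha = 0.3 reflects 70% of the energy.
print(reflected_energy(1.0, 0.3))  # 0.7
```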
Sound diffraction occurs if the wavelength of the sound is comparable to the
dimensions of the obstacles in the environment. In this phenomenon, the sound is
deviated around an obstacle, as Figure 2.b shows.
Figure 2. Indirect sounds: a. sound reflection; b. sound diffraction.
2.2 The Human Auditory System
The human auditory system is capable of detecting the reverberation in the sound
received and analysing the spatial information about the surrounding environment
contained in it. This characteristic necessitates the incorporation of the acoustic
response of the environment when creating virtual sound in VR systems.
As a consequence of the sound interactions with different obstacles in an
environment, the sound arriving at a listener contains multiple echoes of the original
sound (a combination of sounds with varying delays and magnitudes of attenuation)
and information about the directions of arrival.
These echoes can be divided into three major parts:

• the first audible echo received is interpreted by the human auditory system as
the direct sound. Its direction of arrival provides the most important information
about where the sound source is located; its intensity provides information about
the distance of the sound source;

• the early echoes (arriving within approximately 100 ms) are processed
separately by the human auditory system. Analysing their incoming direction and
intensity allows the position of the nearest obstacles in the environment to be
determined (Funkhouser, Jot and Tsingos, 2002);

• the late reverberation gives the overall information about the environment
(the size of the environment, the textures of the floor/wall, etc.).
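This three-part structure can be made concrete with a small sketch. The (time, intensity) representation of an IR and the handling of the 100 ms boundary are our assumptions; only the boundary value itself comes from the text above.

```python
# A hedged sketch of the three-part echo structure described above.
# An IR is represented as a list of (time_s, intensity) echoes; the
# 100 ms boundary between early echoes and late reverberation follows
# the text, while the data layout is our assumption.
def split_impulse_response(echoes):
    echoes = sorted(echoes)                        # order by arrival time
    direct = echoes[0]                             # first audible echo
    early = [e for e in echoes[1:] if e[0] < 0.100]
    late = [e for e in echoes[1:] if e[0] >= 0.100]
    return direct, early, late
```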
2.3 Spatial Sound in Virtual Environments
Modelling sound propagation in an environment can be done through an
appropriate wave equation (Kuttruff, 1979). However, the solution of this equation is
very complex even for simple scenes and virtually impossible for more complicated
scenes where many obstacles are involved. Therefore, alternative ways to describe
sound waves are needed. Generally, there are three approaches to solve the
equation: numerical, geometrical and statistical.
2.3.1 The Numerical Approaches
These approaches give the solution for the wave equation by reducing the problem
to estimating energy transfers among finite elements specified within the modelled
scene. There are two methods within the numerical approaches:

• The finite and boundary element methods give the solution of the wave
equation by spatial subdivision of the scene into distinct elements, for which
the wave equation is expressed as a discrete set of linear equations. The
underlying computation of these methods is very complex and consequently,
when the calculation is performed using a computer, it requires a large
memory capacity. The computational complexity also increases with the
frequency of the modelled sound. Therefore, when precise estimation is
required, these methods are suitable only for low-frequency energy transfers
within simple scenes.

• The waveguide mesh is a regular array of elements in which neighbouring
elements are connected by unit delays. Each element describes the sound
energy of a finite part of the modelled environment. Each sound source and
receiver is represented by one element of the mesh. The simulation itself is
iterative: in each iteration, each element updates its energy status based on
the previous energy status of all its neighbours, following the energy
conservation law. The IR is then described by the development of sound
energy in the receiver (Lokki, Savioja, Vaananen, Huopaniemi and Takala,
2002). A toy sketch of one mesh iteration is given below.
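The following toy sketch illustrates one iteration of a 2-D rectilinear waveguide mesh. The boundary handling, excitation, mesh size and receiver position are our simplifications, not the system described by Lokki et al.

```python
import numpy as np

# A toy 2-D rectilinear waveguide mesh (a simplification of the method
# described above). p holds the current sound pressure at each mesh
# element, p_prev the previous iteration; each interior element is
# updated from its four neighbours.
def waveguide_step(p: np.ndarray, p_prev: np.ndarray) -> np.ndarray:
    p_next = np.zeros_like(p)
    # Standard junction update: average of neighbours minus previous state.
    p_next[1:-1, 1:-1] = 0.5 * (p[2:, 1:-1] + p[:-2, 1:-1] +
                                p[1:-1, 2:] + p[1:-1, :-2]) - p_prev[1:-1, 1:-1]
    return p_next

# Example: excite the source element and record the IR at the receiver.
p_prev, p = np.zeros((32, 32)), np.zeros((32, 32))
p[16, 16] = 1.0                                # sound source impulse
impulse_response = []
for _ in range(100):                           # iterative simulation
    p, p_prev = waveguide_step(p, p_prev), p
    impulse_response.append(p[8, 8])           # receiver element
```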
2.3.2 The Geometrical Approaches
These approaches assume that the sound wavelengths are smaller than the size of the
obstacles and are therefore valid only for sounds of high frequencies. However,
their lower computational costs render their use feasible in VR systems.
The key idea common to all geometrical approaches is the simulation of the sound
wave propagation through an investigation of the behaviour of its infinitesimal parts,
the sound rays. The audibility of a sound source at the position of a listener is
investigated by searching for rays that represent audible echoes of the emitted sound.
There are two methods within the geometrical approaches:
Figure 3. Ray tracing.
• Ray tracing, similar to the well-known and widely used method of the
same name in 3D computer graphics, is based on the concept of tracing
sound rays. Each ray of the initial set of rays emanating from S is traced
and compared with the positions of sound receivers R1 and R2. Figure 3
shows the ray tracing process. The tracing is stopped when a limit is
reached (e.g., the maximum order of reflection or the minimum level of
energy has been exceeded, or the ray hits the receiver). This method is easy
to implement, but carries the risk of undersampling the space. As shown in
Figure 3, due to an insufficient initial set of rays, R2 was incorrectly
considered incapable of sound reception. (A minimal 2-D sketch follows
this list.)
Figure 4. a. A single beam; b. Beam tracing. S – sound source, R1, R2 – sound receivers.
• Beam tracing is based on the concept of tracing sound beams. A beam is a
cone defined by its apex (the sound source) and its base (a closed
environment), as illustrated in Figure 4.a. A beam consists of all of the rays
that originate in the beam's apex and intersect the beam's base. Using this
method, larger areas of the space are searched at once, as illustrated in
Figure 4.b. The calculation of the reflection of a beam is more
computationally expensive than that of the ray tracing method, but fewer
beams are required to reach the same precision of calculation. Nonetheless,
since the number of beams increases exponentially with the order of
reflection, this method is usable only for the early echoes. Consequently, it
is impossible to use the geometrical approaches alone to simulate, in
reasonable time, long reverberations for which reflections of high orders
(over 30) need to be taken into account.
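To make the stopping criteria and the undersampling risk concrete, here is a minimal 2-D sketch of the first method above, ray tracing, under our own simplifications (not the paper's implementation): a rectangular room with axis-aligned walls, purely specular reflections, a circular receiver, and a single absorption coefficient. Recorded distances can be converted to delays via the speed of sound.

```python
import math

# Room width/height, absorption coefficient, and receiver radius (all
# illustrative values; the geometry is deliberately simplistic).
W, H, ALPHA, R_RADIUS = 8.0, 3.0, 0.3, 0.3

def trace_ray(x, y, dx, dy, rx, ry, max_order=10, min_energy=1e-3):
    """Follow one ray; return (distance, energy) echoes heard at (rx, ry)."""
    echoes, energy, travelled = [], 1.0, 0.0
    for _ in range(max_order):
        # Distance t to the nearest wall along the unit direction (dx, dy).
        tx = (W - x) / dx if dx > 0 else -x / dx if dx < 0 else math.inf
        ty = (H - y) / dy if dy > 0 else -y / dy if dy < 0 else math.inf
        t = min(tx, ty)
        # Closest approach of this segment to the receiver.
        s = (rx - x) * dx + (ry - y) * dy
        if 0.0 <= s <= t and math.hypot(x + s*dx - rx, y + s*dy - ry) <= R_RADIUS:
            echoes.append((travelled + s, energy))
        # Move to the wall, reflect specularly, attenuate.
        x, y, travelled = x + t*dx, y + t*dy, travelled + t
        dx, dy = (-dx, dy) if tx < ty else (dx, -dy)
        energy *= (1.0 - ALPHA)
        if energy < min_energy:       # stop criterion: energy limit reached
            break
    return echoes

# Example: a fan of rays from the source at (1, 1) towards a receiver at
# (6, 2); too few initial rays would miss the receiver entirely.
ir = []
for k in range(720):
    a = 2 * math.pi * k / 720
    ir += trace_ray(1.0, 1.0, math.cos(a), math.sin(a), 6.0, 2.0)
```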
2.3.3 The Statistical Approaches
The human auditory system can only distinguish the early echoes. The late
reverberation phase only provides information about the size of the environment.
Therefore, it is possible to model the late reverberation phase using a statistical
model where the echoes contributing to the simulated IR are randomly generated.
The requirements for these echoes are (Kuttruff, 1979):

• the temporal density of the reflections increases with the square of time;
• the intensity drops exponentially with time.
The statistical approaches are employed in most current spatial sound systems
intended for use in VR (Funkhouser, Min and Carlbom, 1999).
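The sketch below is one way to satisfy both requirements above by inverse-CDF sampling of arrival times with density proportional to t² and exponentially decaying intensities. The parameter values and names are illustrative only, not the paper's model.

```python
import math, random

# A hedged sketch of a statistical late-reverberation model meeting the
# two requirements above; all parameter values are illustrative.
def late_reverberation(n_echoes=2000, t_start=0.1, t_end=3.5, decay=0.5):
    """Random echoes whose temporal density grows with t**2 and whose
    intensity decays exponentially with time."""
    echoes = []
    for _ in range(n_echoes):
        # Inverse-CDF sampling of a density proportional to t**2.
        u = random.random()
        t = (t_start**3 + u * (t_end**3 - t_start**3)) ** (1.0 / 3.0)
        intensity = math.exp(-t / decay)
        echoes.append((t, intensity))
    return sorted(echoes)
```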
2.3.4 The Convolution
The process of applying the IR to the sound signal for spatialisation is usually
modelled as the convolution of the sound signal and the IR. The convolution of two
discrete signals in digital signal processing is usually defined as:
(f1 ∗ f2)[t] = Σ_{u=0}^{t} f1[u] · f2[t − u], where f1 and f2 are the input signals    (2)
As the acoustic IR is a list of echoes of the emitted sound, the process of
convolving the emitted sound signal with the acoustic IR can be thought of as the
superposition of the delayed, attenuated, and accelerated copies of the emitted sound
signal.
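Equation (2) can be implemented directly. The double loop below mirrors the definition for clarity; in practice numpy.convolve computes the same result far more efficiently.

```python
import numpy as np

# Equation (2) as code; a direct, readable implementation.
def convolve(f1: np.ndarray, f2: np.ndarray) -> np.ndarray:
    out = np.zeros(len(f1) + len(f2) - 1)
    for t in range(len(out)):
        # Only u with 0 <= u < len(f1) and 0 <= t - u < len(f2) contribute.
        for u in range(max(0, t - len(f2) + 1), min(t + 1, len(f1))):
            out[t] += f1[u] * f2[t - u]
    return out

# In practice: spatialised = np.convolve(dry_signal, impulse_response)
```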
3 The Spatial Sound System
The designed spatial audio system’s main function is to perform off-line (non-real
time) simulations of the sound propagation between a source and a receiver, taking
into account the acoustic response of the environment. This system employs a
hybrid sound propagation model consisting of a beam tracing algorithm (for the
phase of the early echoes) and a statistical model (for the late reverberation).
The process of modelling the acoustic response of the environment to the emitted
sound consists of two fundamental steps, as illustrated in Figure 5 (a sketch of the
pipeline follows the figure):

• The IR is computed to simulate the propagation of the sound from the
source to the receiver in the environment. This step may also be considered
as an enumeration of the sound paths from the source to the receiver along
which the echoes of the original sound are transmitted.

• The sound signal representing the acoustic activity of the sound source is
convolved with the IR generated by the previous step. The result is the
spatialised audio signal.
Figure 5. The process of simulating a spatialised audio signal
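A compact sketch of this two-step pipeline, under our assumption that step 1 delivers the IR as a list of (delay in seconds, intensity) echoes, e.g., early echoes from a beam tracer joined with a statistical tail:

```python
import numpy as np

# A sketch of the two-step process in Figure 5. The echo-list input
# format and the sample rate are our assumptions, not the paper's API.
def spatialise(dry_signal: np.ndarray, echoes, sample_rate: int = 44100):
    # Step 1 output: turn the echo list into an IR sample array.
    n = int(max(t for t, _ in echoes) * sample_rate) + 1
    ir = np.zeros(n)
    for t, intensity in echoes:
        ir[int(t * sample_rate)] += intensity
    # Step 2: convolve the dry signal with the IR.
    return np.convolve(dry_signal, ir)
```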
4 User Evaluation
To test the fit of the algorithms used in the spatialisation process, the system was
evaluated by a representative group of its prospective users.
4.1 The Stimuli Development
The sounds were recorded using a stereophonic microphone (PHILIPS SBC 3050)
and a SoundBlaster 16 compatible sound card. Seven distinct sounds (guitar, flute,
mobile phone ringing, human voice, cane tapping, glass tinkling and handclapping)
were recorded in three different room conditions, coded small (S), medium (M) and
large (L). The characteristics of these rooms are listed in Table 1. These recorded
stimuli were simply called the recorded scenes.
The other stimuli are called the simulated scene stimuli. To create these stimuli,
dry sounds (pure sounds, without the effects of the environment) were recorded in a
music studio with a very short reverberation (less than 0.05 s) using the AKG C1000S
microphone and the Midiman Delta 1010 sound card. The sounds were stored
separately into a set of 44.1 kHz PCM files. Then, the effects of the environments
were added using the designed spatial audio system. The addition process was
performed in two steps. Firstly, a model of the real rooms was created in the ASE
format (a 3D graphics file format). Secondly, a batch of separate task description
files for each dry sound was combined with each model of the rooms. This batch
was finally processed by the system to produce the simulated scene stimuli.
Table 1. The approximate characteristics of the real scenes

Environment     Dimensions        Surfaces                  Reverberation length
Bedroom (S)     4 × 4 × 2.5 m     Plaster, carpet, wood     0.2 s
Hallway (M)     8 × 3 × 5 m       Plaster, marble           1 s
Stairway (L)    12 × 12 × 10 m    Plaster, marble, tiles    3.5 s
4.2 The Evaluation Method
Nine registered blind participants (8 M, 1 F; mean age 29.3 years, S.D. 6.76)
listened to 42 sound files (7 types of sound × 3 environments for each of the
simulated and recorded scene groups) through headphones. The sequence of the
sounds played was controlled so that no adjacent sounds shared any similarity (e.g.,
if the first sound was a simulated flute in a small room, then the next sound played
had to be from a recorded scene and had to be neither a flute sound nor in a small
room).
Each participant performed the evaluation with no other participant around. Each
listened to one of two sets of sounds; the order of the second set was the reverse of
the first. After listening to each sound, the participants
answered in writing three questions:
1. What sound was it?
2. Was that sound more likely to be from a small (S), medium (M) or large (L)
room?
3. Was that room more likely to be a real room (R) or simulated using a
computer (C)?
4.3 Results and Analysis
The first question was intended to encourage the participants to listen carefully.
Therefore, in this paper the answers were not analysed.
The answers to the second question were scored 0, 0.5 or 1. When the
participants answered correctly, they were scored 1. A score of 0.5 was given when
the difference between the correct and the wrong answers was one room size (e.g., a
participant answered S for a sound in an M room). When the difference was two
sizes (a participant answered S for L or vice versa), then a score of 0 was given. The
answers to the third question were scored 0 (wrong) or 1 (correct).
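These scoring rules translate directly into a small function; the sketch below is ours, with illustrative names.

```python
# The scoring rules above as code (a sketch; names are ours).
SIZES = {"S": 0, "M": 1, "L": 2}

def score_room_size(answer: str, correct: str) -> float:
    """1 for a correct size, 0.5 for one size off, 0 for two sizes off."""
    diff = abs(SIZES[answer] - SIZES[correct])
    return {0: 1.0, 1: 0.5, 2: 0.0}[diff]

def score_nature(answer: str, correct: str) -> int:
    """Real (R) vs computer-simulated (C): right or wrong."""
    return 1 if answer == correct else 0
```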
The one-way Analysis of Variance (ANOVA) reveals that, across all participants,
the sums of scores for the room size question were not significantly different between
the recorded and simulated scene groups, with F(1,376) = 0.03, p = 0.862. This
result suggests that the designed system was able, to a certain degree, to simulate
various room sizes. Further analysis, displayed in a
graphical form in Figure 6.a, shows that the difference was not significant for any
room size.
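A comparison of this kind can be reproduced with a one-way ANOVA in SciPy. The score arrays below are placeholders only, not the study's data.

```python
from scipy.stats import f_oneway

# simulated_scores / recorded_scores: per-answer scores (0, 0.5 or 1)
# for each scene group. Placeholder values, not the study's data.
simulated_scores = [1.0, 0.5, 1.0, 0.0]
recorded_scores = [1.0, 1.0, 0.5, 0.5]
F, p = f_oneway(simulated_scores, recorded_scores)
# The paper reports F(1, 376) = 0.03, p = 0.862: no significant difference.
```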
Figure 6.b shows that it was easier to recognise the recorded scenes in the small
room conditions. The participants were correct in 83% of occasions. In the medium
room conditions, the participants seemed unsure whether the scenes were real or
simulated, indicated by scores that are only slightly above 50% (assuming that
random guessing carries a 50% probability of correct answers) in both scene groups.
Finally, in the large room conditions, the participants were quite successful in
recognising the simulated scenes (they were correct in 71% of the occasions) but
were less able to recognise the real scenes. Focusing on the simulated scenes, from
these results, it can be inferred that the designed system was unable to simulate the
large room conditions perfectly (hence, the simulated and recorded scenes could be
easily distinguished). Based on the same argument, it might mean that the designed
system was able, to a certain degree, to simulate the small and medium room
conditions quite well (hence the participants were unsure whether the scenes were
recorded or simulated).
Figure 6. a. The % of correct answers for the room size question. b. The % of correct answers
for the nature of the scenes question.
5 Conclusions and Further Work
The results of the user studies indicated that the algorithms behind the designed
spatial audio system were able to simulate the environments to a certain extent. The
system was able, to a certain degree, to simulate the sound variation in different
room sizes, as indicated by the lack of significant differences between the sums of
scores in the simulated and recorded scene groups. However, it seems that when the
system simulated the large room condition, the difference between the reverberation
of the simulated and recorded scenes was noticeable. Based on these results, we can
speculate that the designed audio system is potentially useful as a part of the O&M
training suite for blind and visually impaired people, preferably to simulate sounds
in small or medium room conditions.
Integrating this system into the training suite and testing the suite with its
prospective users is the immediate follow-up work. Further studies are also needed
to investigate how users can distinguish between various room sizes and between
simulated and recorded scenes.
6 Acknowledgement
We would like to thank Dominik Pecka for his willingness to lend us the music
studio Fjördström, Prague, and to operate its equipment during the recording of the
dry sounds.
7 References
Colwell C, Petrie H, Kornbrot D, Hardwick A, Furner S (1998) Haptic virtual reality for blind
computer users. In: Proceedings of the 3rd International ACM Conference on Assistive
Technologies (ASSETS '98). ACM Press, Marina del Rey, USA, pp 92-93
Funkhouser T, Jot JM, Tsingos N (2002) Sounds Good to Me! Computational Sound for
Graphics, Virtual Reality, and Interactive Systems. SIGGRAPH 2002 Course Notes
[online]. Available at: http://www.cs.princeton.edu/gfx/papers/funk02course.pdf
Funkhouser T, Min P, Carlbom I (1999) Real-time acoustic modelling for distributed virtual
environments. In: Computer Graphics Proceedings, Annual Conference Series, SIGGRAPH
99, Los Angeles, CA, pp 365–374
Inman DP, Loge K (1999) Teaching orientation and mobility skills to blind children using
simulated acoustical environments. HCI 2: 1090-1094
Kuttruff H (1979) Room Acoustics, 2nd ed. Applied Science Publishers Ltd., London, U.K.
Lokki T, Savioja L, Vaananen R, Huopaniemi J, Takala T (2002) Creating Interactive Virtual
Auditory Environments. IEEE Computer Graphics & Applications 22: 49-57
Seki Y, Ito K (2003) Study on acoustical training system of obstacle perception for the blind.
In: Craddock, McCormack, Rielly & Knops (Eds.), Assistive Technology - Shaping the
Future (Proceedings of 7th European Conference for the Advancement of Assistive
Technology (AAATE), Dublin, Ireland, 31 Aug – 3 Sept 2003) pp 461-465
Suzuki K, Martens WL (2001) Subjective evaluation of room geometry in multichannel
spatial sound reproduction: Hearing missing walls in simulated reverberation. In:
Proceedings of the 12th International Conference on Artificial Reality and Telexistence
(ICAT’01) [online]. Available at: http://vrsj.t.u-tokyo.ac.jp/ic-at/papers/01090.pdf.
The Royal Blind School (2003) Orientation and mobility [online]. Available at:
http://www.royalblindschool.org.uk/Departments/Mobility.htm.