Article
Room Volume Estimation Based on Ambiguity of
Short-Term Interaural Phase Differences Using
Humanoid Robot Head †
Ryuichi Shimoyama 1, * and Reo Fukuda 2
1 College of Industrial Technology, Nihon University, Tokyo 275-8575, Japan
2 System Research Division Image Information Systems Laboratory, Canon Electronics Inc., Tokyo 105-0011, Japan; fukuda.reo@canon-elec.co.jp
* Correspondence: shimoyama.ryuichi@nihon-u.ac.jp; Tel.: +81-47-474-2418
† This paper is an extended version of our paper published in Proceedings of the 23rd International Conference on Robotics in Alpe-Adria-Danube Region (RAAD), Smolenice, Slovakia, 3–5 September 2014.
Academic Editor: Huosheng Hu
Received: 16 June 2016; Accepted: 14 July 2016; Published: 21 July 2016
Abstract: Humans can recognize approximate room size using only binaural audition.
However, sound reverberation is not negligible in most environments. The reverberation causes
temporal fluctuations in the short-term interaural phase differences (IPDs) of sound pressure.
This study proposes a novel method for a binaural humanoid robot head to estimate room
volume. The method is based on the statistical properties of the short-term IPDs of sound pressure.
The humanoid robot turns its head toward a sound source, recognizes the sound source, and then
estimates the ego-centric distance to it by stereovision. By interpolating the relations between room
volume, average standard deviation, and ego-centric distance, which were experimentally obtained for
various rooms and stored in a prepared database, the robot estimates the room volume by binaural
audition from the average standard deviation of the short-term IPDs at the estimated distance.
Keywords: room volume; estimation; short-term IPD; biologically inspired; binaural audition; humanoid robot
PACS: J0101
1. Introduction
An autonomous robot needs to know its actual location, orientation, and surrounding
environment in order to turn around or otherwise move safely in unregulated environments. The robot
can localize its position visually in a well-regulated environment, in which every object is recognized
by its position. A monitoring system may send the robot much information on the surrounding
environment either manually or automatically by networked sensors. In unregulated environments,
such as those destroyed by disaster, or when sensors are malfunctioning, visual information is not
enough. However, an additional acoustic sensing system could provide the robot with a redundant
means of recognizing its location and changes in the environment.
It is well known that a humanoid robot can adapt easily to human living environments due to
its physical shape, size, and appearance. The audition of humanoid robots has conventionally been
an array system and not a binaural system. As an example of an array system, multiple lightweight
microphones are mounted on the head of humanoid robots ASIMO (Advanced Steps in Innovative
Mobility) and HRP-2 (Humanoid Robotics Project-2) [1,2]. Multiple microphones are necessary for
realizing multiple acoustical functions. In conventional design, the number of microphones and the
algorithm are selected before developing the hardware. Since humans can recognize approximate room
size by only binaural hearing, this work examines the possibility of binaural audition for room volume
estimation by a humanoid robot, in a reverse approach from hardware design. Binaural audition is a
kind of biologically inspired approach.
To estimate the volume of a room, the room impulse response has conventionally been used.
Reverberation time, which is related to room volume, is calculated from the damping characteristic
of the impulse response measured by one microphone in a room [3,4]. Binaural room impulse
response can also be used [5–9]. The conventional room impulse response is measured actively, so the
measurement is noisy and intrusive in our living environments. Binaural audition is passive and does
not interfere with our environments.
Studies on sound source localization and distance or trajectory estimation from binaural signals
have been reported [10–12]. Sound reverberation is not negligible in reverberation environments
such as an enclosed space (except for an anechoic room), since reverberation comprises significant
components of the received signal. The perceived distance of a sound source depends on the ratio of
the energies of the direct and reflected sound [13–17]. In psychophysical experiments, the effect of
reflected sound on the received sound depended on the positions of the sound source and the receiver in a room.
The absolute position or trajectory of the sound source in a room has usually been the focus of
the estimation.
The reverberation causes temporal fluctuations in short-term interaural phase differences (IPDs)
and interaural level differences (ILDs) [8,11]. The statistical properties of the received sound signal
have been studied for sound source location classification and distance estimation [11,18,19]. Hu et al.
discussed the influence of amplitude variation of non-stationary sound sources on the distribution
pattern of IPDs and ILDs when short-term Fourier transform is used [18]. Their method is invalid in an
unknown room or a changing environment. Georganti et al. proposed a method for binaural distance
estimation of a sound source from a voice sound [11]. The accuracy of the estimation decreased in a
shorter time frame. Reverberation degrades the repeatability of measurement due to the temporal
fluctuations in short-term IPDs and ILDs. No algorithm based on unrepeatable data has yet produced
satisfactory results. Though a shorter time frame is desirable for tracking temporal changes,
unrepeatable data are inadequate for accurate measurement.
In this study, a novel method for estimating the room volume is proposed. The proposed method
is based on the statistical properties of short-term IPDs of sound pressure and uses a binaural humanoid
robot. The time frame adopted is less than 1 s. In this short time frame, the accuracy of estimation
decreases in the conventional method. The temporal fluctuation in short-term IPDs in rooms is
evaluated as the average standard deviation. The humanoid robot turns its head toward a sound
source, recognizes the object of the sound source, and then estimates the ego-centric distance to the
object. Ego-centric means that the robot is always at the origin of the local coordinates. The relative
distance between the robot at the origin of the local coordinates and the sound source in a room
is utilized for estimating room volume. The absolute locations of the robot and sound source in
a room are not the focus of this study. The room volume is estimated from the average standard
deviation of short-term IPDs at the estimated distance by interpolating the data in a prepared database.
The database contains the relations between room volume, average standard deviation, and ego-centric
distance experimentally obtained for various rooms. When the acoustical properties that relate only to
the room volume can be detected, the robot can identify various environments. This method may be a
solution for recognizing unknown or changed environments where no information on the environment
is available.
In Section 2, the methods and procedures are presented. The ego-centric distance by robot
vision and the statistical properties of sound by robot audition are described. A procedure for
evaluating the statistical properties of IPDs is described. Other procedures include localizing
the sound source, rotating the robot head toward the sound source, and preparing the database.
The experimental measuring system is also described, along with the dimensions and volumes of the
eight rooms used. The statistical properties of IPDs in a short-term frequency analysis are shown in
Section 3. The prepared database and the results of room volume estimations in unknown rooms are
shown. The effect of surrounding obstacles on the room volume estimation is described. In Section 4,
experimental results are discussed in comparison with the results of conventional methods. Finally, in
Section 5, a summary is given.
2. Methods and Procedures
2.1. Ego-Centric Distance Estimation by Robot Vision
Figure 1 shows the geometry of a humanoid robot head and a loudspeaker in a reverberation room.
The ego-centric distance D to the loudspeaker is estimated by robot vision provided by two cameras.
Figure 1. Geometry of humanoid robot head and loudspeaker in reverberative rooms (an overhead view).
The robot turns its head toward the loudspeaker by binaural audition, recognizes the object, and then
estimates the ego-centric distance D to the loudspeaker by stereovision. Short-term IPDs of binaural
sound signals are utilized for identifying the room volume.
The geometry of the two cameras mounted in the robot head and the object is shown in Figure 2.
The distance from the center point between the two cameras to the object is defined as the ego-centric
distance. It should be emphasized that Figure 2 shows no absolute location in the room. The object,
which is the sound source, is detected by each camera.
Figure 2. Geometry of the stereo cameras mounted in the robot's eyes and the object as a sound source.
Ego-centric distance D to the object is estimated from two visual angles.
In Figure 2, the two visual angles for the object are θ1 and θ2, and distance D is given by

D = S / (tan θ1 + tan θ2)   (1)
where S is the distance between the two cameras. Each visual angle is calculated as the difference
between the center of the graphical image and the center position of the object recognized in the same
image. The accuracy of ego-centric distance D decreases with decreasing distance between the two
cameras. The absolute difference ε between θ1 and θ2 is expressed as

ε = |θ1 − θ2|   (2)
If θ1 = θ2, the object is positioned at the same distance from each camera. The robot can rotate its
head horizontally to minimize Equation (2). The robot can also adjust its head vertically to match the
center of the object to the vertical center of the image. To find the exact center position of the object,
the object pattern is calculated by a geometry matching algorithm [20]. The center position of the object
is defined as the center of the matched template. The error of the object position in the graphical image
may significantly affect the ego-centric distance estimation. Several template patterns photographed
at various distances are used, and the best matching template is selected automatically for calculating
the precise object position in the graphical image. The estimated distance D is corrected at several
measuring positions by comparing with true distances, so that the error of the estimated ego-centric
distance is below 0.02 m over the range from 0.5 m to 1.5 m for ε < 0.5 degrees.
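To make Equations (1) and (2) concrete, the short Python sketch below converts the horizontal pixel offsets of the detected object in the two images into visual angles and then into the ego-centric distance. The focal length, baseline value, and pixel offsets are hypothetical example values, not the calibration of the cameras used on the robot.

```python
import math

# Hypothetical camera parameters (for illustration only).
FOCAL_LENGTH_PX = 800.0   # focal length expressed in pixels
BASELINE_S = 0.05         # distance S between the two cameras [m]

def visual_angle(offset_px: float) -> float:
    """Visual angle [rad] for a horizontal offset of the object's center
    from the image center, measured toward the midline between the cameras."""
    return math.atan(offset_px / FOCAL_LENGTH_PX)

def ego_centric_distance(theta1: float, theta2: float) -> float:
    """Equation (1): D = S / (tan(theta1) + tan(theta2))."""
    return BASELINE_S / (math.tan(theta1) + math.tan(theta2))

def angle_mismatch(theta1: float, theta2: float) -> float:
    """Equation (2): the head is rotated until |theta1 - theta2| < 0.5 deg."""
    return abs(theta1 - theta2)

# Example: object found 40 px toward the midline in the left image
# and 38 px toward the midline in the right image.
theta1 = visual_angle(40.0)
theta2 = visual_angle(38.0)
print(f"D = {ego_centric_distance(theta1, theta2):.3f} m, "
      f"eps = {math.degrees(angle_mismatch(theta1, theta2)):.2f} deg")
```

With these example offsets the estimate is roughly 0.5 m and the angle mismatch is well below 0.5 degrees, i.e., within the corrected working range quoted above.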
2.2. Estimation of Statistical Properties of Sound by Binaural Audition
Two acoustical signals, x(t) and y(t), are detected with the two microphones. The cross-power
spectrum G12 is expressed as

G12 = X(f) Y(f)*   (3)

where X(f) and Y(f) are the respective spectra after discrete Fourier transform (DFT) processing and *
indicates the complex conjugate [20]. The phase of G12 (IPD) is expressed as

∠G12 = ∠(X(f) Y*(f))   (4)
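As a minimal illustration of Equations (3) and (4), the following Python sketch computes the short-term IPD spectrum from one binaural frame. The 24 kHz sampling rate and 0.2 s frame length are taken from Section 2.4; the function and variable names are ours, not part of the system described here.

```python
import numpy as np

FS = 24000      # sampling frequency [Hz] (Section 2.4)
FRAME = 4800    # one 0.2 s frame = 4800 samples per channel

def ipd_spectrum(x, y):
    """Short-term interaural phase differences of one binaural frame.

    Implements Equations (3) and (4): G12 = X(f) * conj(Y(f)) and the IPD
    as the phase of G12. Returns the frequency axis [Hz] and the IPD [deg]."""
    X = np.fft.rfft(x[:FRAME])
    Y = np.fft.rfft(y[:FRAME])
    g12 = X * np.conj(Y)                       # cross-power spectrum, Eq. (3)
    freqs = np.fft.rfftfreq(FRAME, d=1.0 / FS)
    return freqs, np.degrees(np.angle(g12))    # IPD, Eq. (4)
```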
Figure 3a shows a sample of the IPDs for the measured sound pressure in a room. The discontinuity
of the IPDs is quantified by the standard deviation.
Figure 3. Method for evaluating statistical properties of binaural signals: (a) method for calculating the
frequency spectrum of standard deviation from short-term IPDs of binaural signals; and (b) frequency
spectrum of the standard deviation and its average value from fmin to fmax.
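The procedure sketched in Figure 3 can also be written compactly in code. The following sketch computes the standard-deviation spectrum and its band average from an IPD spectrum (e.g., as returned by the sketch in Section 2.2); the band limits of 4 kHz and 11 kHz are those given in Section 2.4, while the window half-width m is an assumed value. The formal definitions follow in Equations (5) and (6) below.

```python
import numpy as np

F_MIN, F_MAX = 4000.0, 11000.0   # averaging band [Hz] (Section 2.4, Figure 3b)

def sd_spectrum_and_average(freqs, ipd_deg, m=8):
    """Sliding standard deviation of the IPDs (Eq. (5)) and its average (Eq. (6)).

    freqs, ipd_deg: frequency axis [Hz] and IPD spectrum [deg] of one frame.
    m: half-width of the 2m + 1 bin window (an assumed value, not from the paper)."""
    sigma = np.array([np.std(ipd_deg[i - m:i + m + 1])
                      for i in range(m, len(ipd_deg) - m)])
    centers = np.asarray(freqs)[m:len(freqs) - m]
    band = (centers >= F_MIN) & (centers <= F_MAX)
    sd_ave = float(np.mean(sigma[band]))           # S.D.AVE over [fmin, fmax]
    return centers, sigma, sd_ave
```

Because the phase is wrapped to ±180 degrees, discontinuities inflate the local standard deviation, which is precisely the ambiguity the method exploits.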
If 2m + 1 data of the discrete phase difference are included within the frequency width ∆f, where
m is an integer, the standard deviation σi at center frequency fi is given by the following equation:

σi = √[ (1 / (2m + 1)) Σ_{j=i−m}^{i+m} (φj − φ̄i)² ]   (5)

where φj is the phase difference value at frequency fj, and φ̄i is the average of the phase differences
around frequency fi. The frequency spectrum of the standard deviation is obtained as the center
frequency is shifted over the measured frequency range (Figure 3b). The average value S.D.AVE over
the frequency range from fmin to fmax is given by

S.D.AVE = (1 / n) Σ_{i=1}^{n} σi   (6)

where n data of the discrete standard deviation are included within the frequency range from fmin to
fmax. S.D.AVE is a steady indicator that corresponds to the reverberation condition in the room.
The frequency spectrum of the standard deviation is not always necessary, since the average standard
deviation value can be directly calculated from the short-term IPDs of binaural signals, as shown in
Figure 3a. The frequency spectrum of the standard deviation is utilized for selecting the adequate
frequency range that is most sensitive to the change of room volume.

2.3. Procedures

In an actual environment, sound is not always generated in front of the robot. In addition, the
sound source cannot be distinguished between several objects by using only the visual image captured
by the two cameras. Therefore, after localizing the sound source, the robot head is rotated toward the
sound source, and the object nearest to the center of the cameras is assumed to be the sound source.
Additional procedures are necessary for the robot head to rotate toward the sound source and to
calculate the ego-centric distance and room volume. The proposed algorithm is shown in Figure 4.
Figure 4. Proposed algorithm.
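As a reading aid, the flow of Figure 4 can be sketched in Python. Every argument below is a placeholder callable standing in for a step described in the text (binaural localization, head rotation, stereovision, Equations (1)–(6), and the database interpolation); none of these names are part of the actual implementation.

```python
def estimate_room_volume(localize, rotate_head, stereo_distance,
                         record_frame, sd_ave, interpolate_db):
    """Sketch of the flow in Figure 4; all arguments are hypothetical callables."""
    azimuth = localize()                  # binaural localization (CAVSPAC [21])
    rotate_head(azimuth)                  # turn the head; repeat to resolve front/back
    distance = stereo_distance()          # ego-centric distance D, Equations (1), (2)
    left, right = record_frame(0.2)       # one 0.2 s binaural frame
    value = sd_ave(left, right)           # average standard deviation, Equations (3)-(6)
    return interpolate_db(distance, value)  # interpolate the prepared database
```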
Four procedures are additionally needed as pre-processes:

• Sound is generated in the room.
• The sound source is localized with binaural signals by using the localization algorithm CAVSPAC [21]. The robot head is rotated toward the sound source. Though there is ambiguity in the sound source's position in terms of front or back, the robot can distinguish front from back by rotating its head several times.
• The object is recognized by robot vision. The robot adjusts its head angle horizontally and vertically to satisfy ε < 0.5 degrees in Equation (2). This threshold value was based on the results of preliminary measurements.
• The ego-centric distance is calculated by Equation (1). The error of the estimated ego-centric distance is below 0.02 m over the range from 0.5 m to 1.5 m after correction.

The object that may be the sound source is recognized and the ego-centric distance is estimated by the above procedures.

• Binaural sound signals are measured during the measuring time frame. The cross-power spectral phase is calculated by Equations (3) and (4).
• The frequency spectrum of the standard deviation is obtained by Equation (5).
• S.D.AVE is calculated for evaluating the reverberation by Equation (6).
• The room volume is estimated by referring to the database, which contains the relations between the average standard deviation, the ego-centric distance, and the room volume, experimentally defined in advance.
2.4. Experimental Measuring System

Figure 5 shows the appearance of the humanoid robot head (M3-Neony, Vstone). The robot head
is approximated as a rotated ellipsoidal shell of 0.11 m (length) × 0.05 m (radius). An auto-focus
camera (Q2F-00008, Microsoft) was mounted in each robot's eye. The interval between the two cameras
is approximately 0.05 m. Earphone-type microphones (Type 4101, Brüel & Kjær) were added for sound
detection. The interval between the two microphones is approximately 0.10 m. The robot head was
fixed on the pan tilt unit (PTU-46-17P70T, FLIR), in which motion is controlled by a workstation.
The angular resolutions of rotating the pan tilt unit are 0.013 degrees in pan and 0.003 degrees in tilt.

Figure 5. Appearance of humanoid robot head. An auto-focus camera was mounted in each robot's
eye. Earphone-type microphones are added for sound detection. The robot head was fixed on the pan
tilt unit, in which motion is controlled by a workstation.
The configuration of the experimental measuring system is shown in Figure 6.
Figure 6. Measuring system configuration.
The measurement and data processing of the acoustical signals were conducted on a workstation
(T3500, Dell) equipped with a 24-bit resolution A/D converter (PCI-4474, N.I.). The sampling frequency
was 24 kHz. One frame period was 0.2 s, which included 4800 data for each channel. This frame period
is remarkably shorter than that adopted in the conventional methods. Broadband noise was radiated
via the loudspeaker (0.19 m (W) × 0.13 m (H) × 0.11 m (D)), which was positioned at a height of 0.83 m.
The loudspeaker on the speaker stand was fixed on the computer-controlled X-stage and the position
was controlled by a sequencer (EH-150, Hitachi). The values of fmin to fmax shown in Figure 3b were
set at 4 kHz and 11 kHz, respectively.
Table 1 shows the room measurements and volumes used for preparing the database. Each room
volume was approximated using architectural drawings. Lecture rooms B and C were similar to each
other in volume, and D and E were also similar to each other in volume.
Table 1. Room measurements and volumes.

Room              Room Measurement L[m] × W[m] × H[m]    Room Volume [m³]
Gymnasium         26.0 × 39.0 × 11.7                     119 × 10²
Lecture room A    20.1 × 14.7 × 3.3                      971
Lecture room B    14.3 × 17.6 × 3.0                      755
Lecture room C    14.1 × 17.1 × 3.0                      726
Lecture room D    9.0 × 13.6 × 3.0                       367
Lecture room E    8.5 × 13.6 × 3.0                       347
Lecture room F    5.7 × 7.6 × 2.6                        113
Elevator hall     7.8 × 3.6 × 3.0                        70
3. Experimental Results
3.1. Statistical Properties of IPDs in Short-Term Frequency Analysis

The statistical properties of the IPDs of sound pressure in a short-term frequency analysis are
next discussed. The short-term IPDs measured in the reverberation rooms fluctuate temporally [11,18].
One room and one distance were tested first. One loudspeaker was located at a distance of 1 m from
the robot head. Figure 7a,b show the two IPDs measured under the same conditions. Though they
have almost the same properties, they are not identical. As shown in Figure 7c, which is a plot of
the differences between the two IPDs, the differences fluctuated randomly over the full measuring
frequency range. Three rooms and several distances were tested next. The standard deviation was
introduced for quantifying the fluctuations. The relation between the standard deviation and the
ego-centric distance according to the three room sizes is shown in Figure 7d. The standard deviation
increased with decreasing room size and increasing distance of the source. This indicates that
short-term IPDs are ambiguous, unrepeatable, and unreliable for a distant sound source.
Figure 7. These figures show that the repeatability of short-term IPDs depends on the room volume.
Standard deviation was proportional to ego-centric distance to loudspeaker: (a) first; (b) second;
(c) difference between (a) and (b); and (d) standard deviation to distance. (a,b) The same room and
recording conditions recorded during the first and second recordings, respectively.

Next, the repeatability of S.D.AVE calculated using Equation (6) is discussed. Figure 8 shows the
relation between S.D.AVE and the ego-centric distance. The standard deviation of S.D.AVE, which was
measured five times at each distance, was a maximum of 0.5 degrees at various distances from 0.5 m to
1.5 m. This value was markedly smaller than the fluctuation of the IPDs shown in Figure 7d. This implies
that S.D.AVE is more repeatable and reliable than the IPDs.

Figure 8. Relation between S.D.AVE and the ego-centric distance to loudspeaker. S.D.AVE is repeatable
and steady for measurement repetitions under the same conditions.

Figure 9 shows that the values of S.D.AVE were proportional to the ego-centric distance in the
eight rooms of different volumes listed in Table 1. The slopes of S.D.AVE with respect to the ego-centric
distance depended on the room volume. S.D.AVE values of the short-term IPDs increased with
decreasing room size when the distance was held constant. The values of S.D.AVE were compared in
the eight rooms at distances of 0.5 m, 1.0 m, and 1.5 m, when the IPDs were measured repeatedly under
the same conditions. Taking account of the repeated measurement fluctuations, these rooms can be
categorized into four room sizes for the relatively long ego-centric distance (1.5 m) from the loudspeaker:
small (elevator hall), middle (room F), intermediate (rooms D, E), and large (gymnasium, and rooms A, B,
and C). The corresponding room volumes were 70 m³, 113 m³, 347–367 m³, and 726 m³ and above
according to the values of S.D.AVE. The room volumes are clearly distinguished in the smaller rooms
(rooms D, E, F, and the elevator hall) with a distant sound source due to sound reflection. Large rooms
above 726 m³ in volume (gymnasium, and rooms A, B, and C) were not as precisely distinguished by the
present method.

Figure 9. Relation between S.D.AVE and the ego-centric distance to loudspeaker in the eight rooms of
different volumes listed in Table 1. (Legend: ○ Gymnasium, ● Room A, △ Room B, ▲ Room C,
□ Room D, ■ Room E, ◇ Room F, ◆ Elevator hall.)
3.2. Estimation of Test Room Volume

Estimation of the room volume of test room G, which is not included in Table 1, is next discussed.
Figure 10 shows a 3-D surface for interpolating the relations between the room volumes, the average
standard deviations, and ego-centric distances that were experimentally obtained in the eight rooms
shown in Table 1. The measurements were conducted in a relatively wide space at the center of each
room, in which the desks and the chairs were moved to one wall. The height of rooms A to F and
the elevator hall was approximately 3 m. The height of the binaural microphones of the robot head
was 0.8 m. The distances to the side walls were longer than the distances to the floor or the ceiling in
the measured rooms. The room volume in test room G was estimated by using the relations shown
in Figure 10. The volume of test room G was calculated as 1108 m³, 21 m × 11 m × 4.8 m, from the
design layout. The estimated room volume was 1113 m³, and the value of S.D.AVE was 10.3 degrees at
a distance of 1 m to the loudspeaker. The estimated room volume was in good agreement with the
volume calculated from the room design drawing. The maximum error rate of room volume estimation
was 22% in the largest room beside the gymnasium, lecture room A, when the IPDs were measured
repeatedly in the same room and under the same recording conditions. Since the magnitude of the
slope of room volume versus S.D.AVE increases with increasing room volume, the error of the room
volume estimation will decrease with decreasing room volume in the same way as shown for S.D.AVE
in Figure 9. The estimated room volumes approximately agreed with each other in several different
rooms with the same actual volumes. Though the room volumes shown in Table 1 are different from
each other, large rooms beyond 726 m³ were not as precisely distinguished, as mentioned in Section 3.1.
The robot could acoustically categorize test room G as relatively large in terms of room volume.

Figure 10. Relations between the room volumes, the average standard deviations, and ego-centric
distances that were experimentally obtained in the eight rooms shown in Table 1. Room G, which was
not included in Table 1, was a test room.
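To make the interpolation step concrete, the following sketch builds a small database of scattered (ego-centric distance, S.D.AVE, room volume) triples and interpolates a query point. The S.D.AVE numbers below are arbitrary placeholders of plausible magnitude, not the measured database of this paper; only the room volumes follow Table 1. SciPy's griddata is used as one possible interpolator among others.

```python
import numpy as np
from scipy.interpolate import griddata

# Illustrative database entries: (distance [m], S.D.AVE [deg], volume [m^3]).
# The S.D.AVE values are placeholders, not measured data.
records = np.array([
    [0.5,  5.0, 11900], [1.0,  8.0, 11900], [1.5, 11.0, 11900],   # gymnasium
    [0.5,  7.0,   755], [1.0, 12.0,   755], [1.5, 17.0,   755],   # lecture room B
    [0.5, 10.0,   367], [1.0, 18.0,   367], [1.5, 26.0,   367],   # lecture room D
    [0.5, 15.0,   113], [1.0, 27.0,   113], [1.5, 40.0,   113],   # lecture room F
    [0.5, 20.0,    70], [1.0, 35.0,    70], [1.5, 50.0,    70],   # elevator hall
])

def estimate_volume(distance_m: float, sd_ave_deg: float) -> float:
    """Interpolate room volume from the (distance, S.D.AVE) surface.

    Interpolation is done on log10(volume) because the volumes span more
    than two orders of magnitude (Figure 10 uses a logarithmic volume axis)."""
    points = records[:, :2]
    log_vol = np.log10(records[:, 2])
    v = griddata(points, log_vol, (distance_m, sd_ave_deg), method="linear")
    return float(10 ** v)

print(estimate_volume(1.0, 10.3))   # query analogous to the test-room example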
3.3. Effect of Surrounding Obstacles on the Room Volume Estimation
Several obstacles, such as furniture, desks, and chairs, are found in a real room. Sound reflects
from their surfaces and diffracts around them. Curtains on windows may also absorb sound energy.
The effect of obstacles on the present room volume estimation is discussed in this section. The kind,
size, and location of furniture in a room vary, so the following two simple cases are discussed for the
same room.

First is the case that no surrounding obstacles are set around the robot in a relatively wide space.
The same loudspeaker was located at four positions from A to D at the center of the room, as shown
in Figure 11. The absolute positions of the loudspeaker in the room are different, and each angle and
ego-centric distance for the loudspeaker is different. The estimated room volumes are shown in
Figure 12. The estimated room volume values were almost constant. The statistical properties of the
short-term IPDs of sound pressure were not remarkably influenced by the sound reflected from the
surrounding obstacles at the center of the room. This result supports that relative positions, not
absolute positions, between the loudspeaker and the robot are appropriate for estimating the room
volume in the present method.
Figure 11. One speaker located at four positions from A to D in the center of the room. The absolute
positions of the loudspeaker in the room differed and the angles and ego-centric distances for the
speaker also differed.

Figure 12. Estimated room volumes. The estimated room volume values were almost constant. That is,
the present room volume estimation may not depend on the absolute position of the loudspeaker.

Second, the effect of surrounding obstacles, here a side wall, partitions, and a curtain, on the
room volume estimation is discussed. The sound from the loudspeaker located near the wall, the
partition, or the curtain was measured. The geometry of the robot, loudspeaker, and obstacles is
shown in Figure 13. The loudspeaker was located at a distance of 1 m from the robot in parallel with
one side wall. The room volume was estimated when the robot and the loudspeaker were located at
various distances from the wall while keeping them parallel with the wall. The sounds were additionally
measured in the two cases where four partitions (1.2 m (L) × 1.95 m (H)) were placed in front of the
wall or all of the partitions were covered with a curtain.
Figure 13. Geometry of the robot, loudspeaker, and obstacles: a wall, partitions, and partitions covered
with a thin curtain.
Figure 14 shows: (a) the relations between the distance to the obstacle and the average standard
deviation S.D.AVE ; and (b) the relations between the distance to the obstacle and the estimated
room volume.
As shown in Figure 14a, S.D.AVE increased with decreasing distance, when the distance to the side
wall or the partitions was less than approximately 2 m. This result means that sound reflected from the
surface of obstacles may affect S.D.AVE near the obstacles. S.D.AVE values versus the distances to the
wall or the partitions have similar characteristics. In the case of the curtain, S.D.AVE increases with
decreasing distance when the distance to the curtain was less than approximately 1 m, due to sound
absorption by the curtain. When the partitions were covered with a thin curtain, S.D.AVE values were
lower than those in the uncovered case. In the same manner, as shown in Figure 14b, the room volume
was more underestimated when the loudspeaker and robot were located closer than approximately
2 m to the side wall or the partitions. The room volume was underestimated only when the distance
to the curtain was less than approximately 1 m. The room volume value was the same as that in the
center of the room when the obstacles were farther than approximately 2 m from the robot and the
sound source. Especially, the effect of the curtain was negligible when the curtain was located farther
than approximately 1 m due to its sound absorption.
Figure 14. Effect of three types of obstacle: a wall, partitions, and the partitions covered with a thin
curtain. (a,b) were measured in the same room: (a) the relation between S.D.AVE and distance to the
obstacle; and (b) the estimated room volume to distance to the obstacle. Red lines indicate the cases
that the loudspeaker and robot were located far from the side wall in a relatively wide space. The effect
of the obstacle increases with decreasing distance when the distance to the obstacle is less than
approximately 2 m.
These results imply that the effect of sound reflected from the surface of an obstacle is not
negligible in the present room volume estimation. The room volume is underestimated when an
obstacle is located close to the loudspeaker and robot.
4. Discussion
In this section, the results of the present paper are discussed in comparison with the results in the
references. To the best of our knowledge, room volume has never been estimated passively by using
a binaural robot head in conventional methods. We adopted the binaural cue IPDs, which are ITDs
transformed from the time domain to the frequency domain. The IPDs are intrinsically the same as
the ITDs.

The binaural cues, such as ILDs, ITDs, and the interaural coherence (IC), have been investigated
for auditory detection of spatial characteristics (distance, reverberation, and angle of sound source).
These binaural cues are affected by the reverberant energy as a function of the listener's location in the
room and the source location relative to the listener [11,22]. Thus, the relations between these binaural
cues and the absolute location of the microphones and the source have been a point of contention.
Hartmann et al. measured the ICs of sound pressures in various rooms using a binaural head and torso
in four rooms smaller than 210 m3 and one room of 3800 m3 [22]. They reported that the ICs markedly
depended on the distance between the loudspeaker and microphones and on their absolute locations.
A specific feature of the present method is the short time frame for calculating the DFT. The time
frame was set at 0.2 s. In conventional methods, the time frame is set longer than 1 s, since a shorter
time frame leads to lower performance [10,11]. Though short-term IPDs are ambiguous, unrepeatable,
and unreliable for long-distance sound sources, we found that the average standard deviation value
of short-term IPDs was more repeatable and reliable than the IPDs even in reverberation rooms, as
mentioned in Section 3. Another feature is the adjustable frequency range, as shown in Figure 3b
in Section 2. The average value of the spectral standard deviation depends on this frequency range.
An adequate frequency range that is most sensitive to the change of room volume can be selected.
When the frequency range was adjusted adequately, average standard deviation is proportional to
Robotics 2016, 5, 16
13 of 15
each ego-centric distance to the source, as shown in Figures 8 and 9 in Section 3. Conventional research
using spectral standard deviation provides no solution for setting the frequency range. Georganti et al.
reported that the standard deviation values of binaural room transfer functions (BRTFs) followed a
distance-dependent behavior [13]. As shown in Figure 10 in Section 3, we found that the average
standard deviation of short-term IPDs depended on not only the ego-centric distance to the source
but also the room volume. Consequently, the adjusted average standard deviation of short-term IPDs
depended only on the relative locations of the source and microphones, not on their absolute locations,
as shown in Figure 12 in Section 3. In the present paper, a binaural cue that depends only on the
relative locations of the sound source and the microphones and not on their absolute locations is
utilized by using short-term IPDs.
The effect of surrounding obstacles such as a side wall, partitions, or a curtain on the average
standard deviation of short-term IPDs was discussed in Section 3. The effect of the sound reflected from
the surface of the obstacle was not negligible in the present room volume estimation. The room volume
was underestimated when the obstacle was located close to the loudspeaker and robot. Both the
microphones mounted on the robot head and the loudspeaker were set at a height of 0.8 m from the
floor. The distance from the microphones to the ceiling was approximately 2.2 m in lecture rooms B,
C, D, and E. The average standard deviations may be affected by the shorter distance to the floor in
these rooms, and not by the greater distance to the ceiling. On the other hand, the room volumes in
lecture rooms B and C were twice those in lecture rooms D and E. Corresponding to the room volume,
the average standard deviations in rooms D and E were almost twice those in rooms B and C at a
distance 1.5 m, as shown in Figure 9 in Section 3. The heights of the ceiling were 11.7 m, 3.3 m, and
2.6 m in the gymnasium and lecture rooms A and F, respectively. It seems that the height of the ceiling
may not greatly affect the average standard deviations in Figure 9, since the distance between the
ceiling and the microphones was sufficient. These results indicate that the average standard deviation
of short-term IPDs depends on the volume of the room. It should be emphasized that the effect of
reflected sound from not only the floor and the ceiling but also from the far side wall may not be
negligible for the average standard deviation.
Another practical problem is background noise. Rooms are rarely free of people talking or of other noise, which means that there are usually multiple sound sources present. If the sounds are sparse over time, as human voices are, the robot can continue to measure S.D.AVE and wait for quiet intervals, as in the sketch below. The minimum value of S.D.AVE may then indicate the room volume, since a distant sound source gives a high S.D.AVE. In the case of continuous noise, the performance of this method will be lower.
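As a rough illustration of this waiting strategy, the sketch below measures S.D.AVE in successive windows, reusing average_ipd_std from the earlier sketch, and keeps the minimum. The window length and the is_quiet voice-activity callback are assumptions for illustration only.

```python
def track_min_sdave(left, right, fs, window_s=2.0, is_quiet=None):
    """Measure S.D.AVE in successive windows and keep the minimum.

    `is_quiet` is an assumed callback (e.g. a simple energy-based voice
    activity detector) that flags windows free of competing talkers."""
    n = int(window_s * fs)
    best = None
    for start in range(0, min(len(left), len(right)) - n + 1, n):
        l, r = left[start:start + n], right[start:start + n]
        if is_quiet is not None and not is_quiet(l, r):
            continue                                  # skip windows with extra sources
        sdave = average_ipd_std(l, r, fs)             # from the sketch above
        best = sdave if best is None else min(best, sdave)
    return best                                       # minimum S.D.AVE observed so far
```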
Why does the average standard deviation of short-term IPDs depend on the room volume? One of
the reasons may be that the temporal fluctuation of the wave field at a point in a room reflects the shape and size of the room. Such a field is formed by the superposition of the direct wave and the reflected waves. An acoustical field may be likened to the water waves in a pool: the water surface around a floating object fluctuates in height over time, and the amplitude of this fluctuation may be larger in a small pool than in a large one. Three-dimensional room-acoustic simulations may provide a more exact answer in the near future.
5. Conclusions
The room volumes of reverberation rooms were estimated from short-term IPDs by a binaural
humanoid robot. The robot turned its head toward a sound source using binaural audition and recognized the object that was the sound source using robot vision. Then, the ego-centric distance to the object was estimated with stereovision. The average standard deviation was calculated from the short-term IPDs of the binaural signals. The relations between the room volumes, the average standard deviations, and the ego-centric distances were experimentally obtained in eight rooms. The volume of a test room was then estimated by interpolating these relations (a sketch of this interpolation step is given after the list below). The results are summarized as follows:
(1) Short-term IPDs are ambiguous, unrepeatable, and unreliable for distant sound sources.
(2) The average standard deviation of short-term IPDs (S.D.AVE) is more repeatable and reliable than the IPDs themselves, even in reverberation rooms.
(3) The proposed average standard deviation is proportional to the ego-centric distance to the sound source. The slope of S.D.AVE with respect to the ego-centric distance depends on the room volume.
(4) The average standard deviation of short-term IPDs depends on the volume of the room. The effect of the sound reflected not only from the floor and the ceiling but also from the far wall may not be negligible in the average standard deviation.
(5) The average standard deviation of short-term IPDs increases with decreasing distance to surrounding obstacles, such as a side wall, partitions, or a curtain. Thus, the room volume is underestimated near such obstacles.
(6) For eight rooms of different volumes, the robot could categorize them into four sizes, namely, small, middle, intermediate, and large, using the average standard deviation values of short-term IPDs.
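The interpolation step can be pictured with the following sketch, which assumes the proportionality summarized in the list above: each calibrated room contributes the slope of its S.D.AVE-versus-distance line, the slopes predict S.D.AVE at the measured ego-centric distance, and the room volume is read off by one-dimensional interpolation. The function name, the database layout, and all numerical values are hypothetical and are not the measurements reported in this paper.

```python
import numpy as np

def estimate_room_volume(sdave, distance_m, database):
    """Interpolate room volume from a measured S.D.AVE at a known distance.

    `database` holds (volume_m3, slope) pairs, where slope is the rate of
    increase of S.D.AVE with ego-centric distance measured beforehand in a
    known room.  All values here are hypothetical."""
    volumes = np.array([v for v, _ in database], dtype=float)
    predicted = np.array([s * distance_m for _, s in database])   # expected S.D.AVE per room
    order = np.argsort(predicted)                                 # np.interp needs ascending x
    return float(np.interp(sdave, predicted[order], volumes[order]))

# Made-up calibration data (smaller rooms give steeper slopes in this example):
db = [(60.0, 0.40), (120.0, 0.25), (250.0, 0.15), (3000.0, 0.05)]
print(estimate_room_volume(sdave=0.30, distance_m=1.5, database=db))
```

In the actual procedure, the experimentally obtained relations for the eight rooms are interpolated directly; the slope-based form here only illustrates the idea.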
Several assumptions were necessary for estimating the ego-centric distance to a sound source by stereovision, even when the object to be recognized was limited to one loudspeaker. The robot could not acoustically identify each of the eight rooms of different volumes. As is true for humans, the robot could only categorize the place where it was as a room of one of four sizes, namely, small, middle, intermediate, and large. The room volume was underestimated when an obstacle was located close to the loudspeaker and robot. This implies that a robot could acoustically detect the presence of an obstacle from the change in the estimated room volume as it moves toward the obstacle. Such an ability may be limited in practice. Nevertheless, it would be an additional way
for a robot to recognize its surrounding environment, since the robot cannot acoustically localize the
position of an obstacle that radiates no sound.
Computer simulation of room acoustics will help to explain this phenomenon in more detail.
Room volume estimation in the presence of multiple sound sources remains future work.
Author Contributions: R.S. and R.F. conceived and designed the experiments; R.F. performed the experiments;
R.S. and R.F. analyzed the data; R.F. contributed the programming, the robot head construction, and the analysis tools; and R.S. wrote
the paper.
Conflicts of Interest: The authors declare no conflict of interest.
References
1. Nakamura, K.; Nakadai, K.; Asano, F.; Hasegawa, Y.; Tsujino, H. Intelligent sound source localization for dynamic environments. In Proceedings of the 2009 IEEE/RSJ International Conference on Intelligent Robots and Systems, St. Louis, MO, USA, 10–15 October 2009; pp. 664–669.
2. Asano, F.; Asoh, H.; Nakadai, K. Sound source localization using joint Bayesian estimation with a hierarchical noise model. IEEE Trans. Audio Speech Lang. Proc. 2013, 21, 1953–1965. [CrossRef]
3. Shabtai, N.R.; Zigel, Y.; Rafaely, B. Estimating the room volume from room impulse response via hypothesis verification approach. In Proceedings of the 2009 IEEE/SP 15th Workshop on Statistical Signal Processing, Cardiff, UK, 31 August–3 September 2009; pp. 717–720.
4. Kuster, M. Reliability of estimating the room volume from a single room impulse response. J. Acoust. Soc. Am. 2008, 124, 982–993. [CrossRef] [PubMed]
5. Kearney, G.; Masterson, C.; Adams, S.; Boland, F. Approximation of binaural room impulse responses. Proc. ISSC 2009, 2009, 1–6.
6. Jeub, M.; Schafer, M.; Vary, P. A binaural room impulse response database for the evaluation of dereverberation algorithms. In Proceedings of the 2009 16th International Conference on Digital Signal Processing, Santorini, Greece, 5–7 July 2009; pp. 1–5.
7. Larsen, E.; Schmitz, C.D. Acoustic scene analysis using estimated impulse responses. Proc. Signals Syst. Comp. 2004, 1, 725–729.
8. Shinn-Cunningham, B.G.; Kopco, N.; Martin, T.J. Localizing nearby sound sources in a classroom: Binaural room impulse responses. J. Acoust. Soc. Am. 2005, 117, 3100–3115. [CrossRef] [PubMed]
9. Shabtai, N.R.; Zigel, Y.; Rafaely, B. Feature selection for room volume identification from room impulse response. In Proceedings of the 2009 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, New Paltz, NY, USA, 18–21 October 2009.
10. Vesa, S. Binaural sound source distance learning in rooms. IEEE Trans. Audio Speech Lang. Proc. 2009, 17, 1498–1507. [CrossRef]
11. Georganti, E.; May, T.; Par, S.V.D.; Mourjopoulos, J. Sound source distance estimation in rooms based on statistical properties of binaural signals. IEEE Trans. Audio Speech Lang. Proc. 2013, 21, 1727–1741. [CrossRef]
12. Jetzt, J.J. Critical distance measurement of rooms from the sound energy spectral response. J. Acoust. Soc. Am. 1979, 65, 1204–1211. [CrossRef]
13. Bronkhorst, A.W.; Houtgast, T. Auditory distance perception in rooms. Nature 1999, 397, 517–520. [CrossRef] [PubMed]
14. Kuster, M. Estimating the direct-to-reverberant energy ratio from the coherence between coincident pressure and particle velocity. J. Acoust. Soc. Am. 2011, 130, 3781–3787. [CrossRef] [PubMed]
15. Georganti, E.; Mourjopoulos, J.; Par, S.V.D. Room statistics and direct-to-reverberant ratio estimation from dual-channel signals. In Proceedings of the 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Florence, Italy, 4–9 May 2014; pp. 42–47.
16. Eaton, J.; Moore, A.H.; Naylor, P.A.; Skoglund, J. Direct-to-reverberant ratio estimation using a null-steered beamformer. In Proceedings of the 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), South Brisbane, Australia, 19–24 April 2015; pp. 46–50.
17. Hioka, Y.; Niwa, K. Estimating direct-to-reverberant ratio mapped from power spectral density using deep neural network. In Proceedings of the 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Shanghai, China, 20–25 March 2016; pp. 149–152.
18. Hu, J.S.; Liu, W.H. Location classification of nonstationary sound sources using binaural room distribution patterns. IEEE Trans. Audio Speech Lang. Proc. 2009, 17, 682–692. [CrossRef]
19. Nix, J.; Hohmann, V. Sound source localization in real sound fields based on empirical statistics of interaural parameters. J. Acoust. Soc. Am. 2006, 119, 463–479. [CrossRef] [PubMed]
20. Kido, K.; Suzuki, H.; Ono, T. Deformation of impulse response estimates by time window in cross spectral technique. J. Acoust. Soc. Jpn. (E) 1998, 19, 249–361. [CrossRef]
21. Shimoyama, R.; Yamazaki, K. Computational acoustic vision by solving phase ambiguity confusion. Acoust. Sci. Tech. 2009, 30, 199–208. [CrossRef]
22. Hartmann, W.M.; Rakerd, B.; Koller, A. Binaural coherence in rooms. Acta Acust. United Acust. 2005, 91, 451–462.
23. Shimoyama, R.; Fukuda, R. Room volume estimation based on statistical properties of binaural signals using humanoid robot. In Proceedings of the 23rd International Conference on Robotics in Alpe-Adria-Danube Region (RAAD), Smolenice, Slovakia, 3–5 September 2014; pp. 1–6.
© 2016 by the authors; licensee MDPI, Basel, Switzerland. This article is an open access
article distributed under the terms and conditions of the Creative Commons Attribution
(CC-BY) license (http://creativecommons.org/licenses/by/4.0/).