2011 International Conference on Intelligent Building and Management
Proc. of CSIT vol. 5 (2011) © (2011) IACSIT Press, Singapore
Performance Comparison of Compact Disk and Widely
Distributed Microphone Arrays for Intelligent Lecture Halls
YingYing Tiong 1, Yue Li 2
CSIRO ICT Centre
Abstract. SquareHead AudioScope is a hands-free speech acquisition tool that allows users to select and
‘zoom in’ on sound recorded with a compact disk microphone array. The Massachusetts Institute of Technology
(MIT) has also developed a hands-free speech acquisition tool, based on a widely distributed microphone array. In this
paper, we compare the performance of the disk and the distributed arrays inside a large lecture hall by
studying their beam patterns and signal-to-interference ratios via simulation.
Keywords: speech acquisition tool, array, signal processing, lecture hall, conference room
1. Introduction
Microphone arrays might replace close-talking microphones as speech acquisition tools in a lecture hall.
An array allows users to isolate and amplify the particular sound they want to hear. SquareHead
AudioScope is an example of a commercialised speech acquisition tool; it has been installed in one of the auditoriums at
the Max Planck Institute. The audio collector of the AudioScope has 300 built-in microphones, concealed in
an elegant carbon dish to form a compact disk array. Another option for speech acquisition in a
lecture hall is a widely distributed microphone array; research in this area is reported in [3]-[6]. An
example of a distributed array is shown in Fig. 2. That system was developed at MIT using 32 omnidirectional microphones attached uniformly to the ceiling.
Fig. 1: SquareHead AudioScope Conference System being installed in one of the auditoriums at the Max Planck Institute [1]. The highlighted circle marks the AudioScope dish.
Fig. 2: Experimenting with the MIT distributed microphone array. The microphones are highlighted by red circles and arrows [4].
The major physical difference between the two arrays is that the distributed array has a larger aperture, so
the sound quality delivered by the two arrays may differ. A preliminary step in choosing an appropriate
speech acquisition tool for a large and crowded lecture hall is to simulate and compare the performance of
the two arrays. To do this, we first compare the beam patterns of the arrays under various conditions and then
compare the noise suppression capability of the systems by calculating the Signal to Interference
Ratio (SIR).
1 Tel.: +61430904702; E-mail address: ying.tiong@csiro.au.
2 Tel.: +61425358929; E-mail address: yue.li@csiro.au.
2. Simulation Procedures
Assume that 1024 (32 × 32) audience seats are distributed uniformly inside the lecture hall and form a 25 m
× 25 m audience plane. The distance between two neighbouring seats is 0.78 m. The centre of the audience
plane is set as the origin. The disk and the distributed arrays are installed 5 m above the audience plane,
with the centre of each array aligned to the origin. Fig. 3 shows the testing environment. The audience plane
is the XY plane at z = 0 m. The disk array is represented by the circular plane, and the microphones of the
distributed array are represented by dots.
The microphone positions in the disk array are shown in Fig. 4. The arrangement is similar to that
of the SquareHead disk array [1]. The microphones lie on thirty radial lines, each 1.0 m long. The angle between adjacent radial
lines is 12°. Each radial line carries 10 microphones. The distance d between the ith and (i+1)th microphones is
.
(1)
Fig. 3: 3D view of the disk and distributed arrays inside a 25 m × 25 m lecture hall. Legend: ● the disk array; ○ microphone of the distributed array; × (0, 0, 0) origin point; × (xa, ya, za) focal point; × (xj, yj, zj) arbitrary non-focal point.
Fig. 4: Microphone distribution of the disk array.
The distributed array is a 17 × 17 square array whose centre element is directly above the origin. The
pitch of the array in both the x and y directions is 1.47 m, giving an aperture of 23.52 m × 23.52 m.
Multiple reflections are ignored and the propagation medium is assumed to be homogeneous
during the simulation.
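The geometry described in this section can be sketched in a few lines of Python. This is a minimal illustration, not the authors' simulation code: the helper name `grid_positions` is hypothetical, and the seats are assumed to sit on a grid centred on the origin with a spacing of 25/32 m. The disk array is left out because its radial microphone spacing depends on (1).

```python
def grid_positions(n_per_side, pitch, z):
    """Centred square grid of n_per_side x n_per_side points at height z."""
    offset = (n_per_side - 1) / 2.0
    return [((ix - offset) * pitch, (iy - offset) * pitch, z)
            for ix in range(n_per_side)
            for iy in range(n_per_side)]

# Audience plane: 32 x 32 seats over 25 m x 25 m at z = 0
# (assumed spacing 25/32 m, which rounds to the quoted 0.78 m).
SEAT_PITCH = 25.0 / 32
seats = grid_positions(32, SEAT_PITCH, 0.0)

# Distributed array: 17 x 17 microphones, pitch 1.47 m, 5 m above the seats.
mics = grid_positions(17, 1.47, 5.0)

# Aperture along x: distance between the outermost microphones.
aperture = max(x for x, _, _ in mics) - min(x for x, _, _ in mics)
print(len(seats), len(mics), round(SEAT_PITCH, 2), round(aperture, 2))
```

With an odd number of microphones per side, the centre element lands exactly above the origin, which matches the description of the distributed array.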
2.1. Beam Pattern
Suppose either the disk or the distributed array is operating (Fig. 3). Consider a signal $s_g(t)$ generated
at an arbitrary location $\mathbf{r}_j = (x_j, y_j, z_j)$. The signal received by the ith microphone is
$$ s_i(t) = \frac{1}{d_{j,i}}\, s_g\!\left(t - \frac{d_{j,i}}{c}\right), \qquad (2) $$
where $d_{j,i}$ is the distance from the source location to the ith microphone and c is the speed of sound (c is
assumed to be 340 m/s at 20 ºC). Note that the received signal is time shifted, and that its amplitude
decays during propagation because of spherical spreading.
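As an illustration of the propagation model just described, the sketch below delays a source signal by d/c and scales it by 1/d. The 1/d amplitude law is an assumption (the usual form of spherical spreading), and the function name `received` and the test tone are hypothetical.

```python
import math

C = 340.0  # speed of sound in m/s, as assumed in the text

def received(s_g, source, mic, t):
    """Signal at one microphone: delayed by d/c and scaled by 1/d (assumed law)."""
    d = math.dist(source, mic)
    return s_g(t - d / C) / d

def tone(t):
    """Hypothetical 500 Hz unit-amplitude test tone."""
    return math.sin(2 * math.pi * 500.0 * t)

mic = (0.0, 0.0, 5.0)   # microphone 5 m directly above the source
delay = 5.0 / C          # propagation delay in seconds
# Sample the microphone a quarter period after the wavefront arrives.
print(delay, received(tone, (0.0, 0.0, 0.0), mic, delay + 0.0005))
```

A quarter period after arrival the tone is at its crest, so the received sample is the source crest divided by the 5 m path length.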
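The beam formed using the delay-and-sum method is
$$ b(\mathbf{r}_a, \mathbf{r}_j, t) = \sum_{i=1}^{M} w_i(\mathbf{r}_a)\, s_i\!\left(t + \frac{d_{a,i}}{c}\right) = \sum_{i=1}^{M} \frac{w_i(\mathbf{r}_a)}{d_{j,i}}\, s_g\!\left(t + \frac{d_{a,i} - d_{j,i}}{c}\right). \qquad (3) $$
In (3), $\mathbf{r}_a = (x_a, y_a, z_a)$ refers to the focal point, $d_{a,i}$ is the distance from the focal point to the ith microphone, $d_{j,i}$ is the distance from the source at $\mathbf{r}_j$ to the ith microphone, M refers to the number of microphones in the array and
$w_i(\mathbf{r}_a)$ refers to a weighting function applied to the array. In this paper, the following three weighting
functions are tested:
$$ w_i(\mathbf{r}_a) = d_{a,i} \quad (4), \qquad w_i(\mathbf{r}_a) = 1 \quad (5), \qquad \text{and} \qquad w_i(\mathbf{r}_a) = \frac{1}{d_{a,i}}. \quad (6) $$
With (4), the beamformer compensates for the decay of the signal from the focal point due
to spherical spreading. With (5), the beamformer only compensates for the propagation delay without
considering spherical spreading. (6) is used so that signals received by microphones near the focal point
contribute more to the output of the beamformer.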
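A time-domain delay-and-sum beamformer with the three weighting options can be sketched as follows. This is a minimal illustration under assumptions: the weights are taken to be d, 1 and 1/d, where d is the microphone-to-focal-point distance, and the names `weight`, `delay_and_sum` and the five-microphone line array are hypothetical.

```python
import math

C = 340.0  # speed of sound (m/s)

def weight(kind, d_focal):
    """Weighting options, as functions of the mic-to-focal-point distance."""
    if kind == "compensate":   # undo spherical decay from the focal point
        return d_focal
    if kind == "uniform":      # delay compensation only
        return 1.0
    if kind == "near":         # favour microphones close to the focal point
        return 1.0 / d_focal
    raise ValueError(kind)

def delay_and_sum(s_g, source, focal, mics, t, kind="compensate"):
    """Beamformer output at time t for a point source emitting s_g."""
    out = 0.0
    for m in mics:
        d_src = math.dist(source, m)
        d_foc = math.dist(focal, m)
        # The mic signal is s_g(t - d_src/C)/d_src; advancing it by d_foc/C
        # aligns contributions that originate at the focal point.
        out += weight(kind, d_foc) * s_g(t + (d_foc - d_src) / C) / d_src
    return out

def tone(t):
    """Hypothetical 500 Hz unit-amplitude test tone."""
    return math.sin(2 * math.pi * 500.0 * t)

mics = [(x, 0.0, 5.0) for x in (-2.0, -1.0, 0.0, 1.0, 2.0)]  # toy line array
focal = (0.0, 0.0, 0.0)
on_focus = delay_and_sum(tone, focal, focal, mics, 0.0005)
off_focus = delay_and_sum(tone, (3.0, 0.0, 0.0), focal, mics, 0.0005)
print(on_focus, off_focus)
```

When the source sits at the focal point and the decay-compensating weight is used, every microphone contributes the same aligned, unit-gain copy of the source, so the output is M times the source signal. A source elsewhere is summed with mismatched delays and gains.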
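The maximum signal power over time of the output from the beamformer is
$$ B_f(\mathbf{r}_a, \mathbf{r}_j) = \max_t \left| b(\mathbf{r}_a, \mathbf{r}_j, t) \right|^2. \qquad (7) $$
$B_f(\mathbf{r}_a, \mathbf{r}_j)$ is defined as the beam value for a sound source at $\mathbf{r}_j$ when the focal point $\mathbf{r}_a$
is fixed.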
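For single-frequency signals, the output signal of the beamformer can be expressed by the Fourier
transform of (3),
$$ B(\mathbf{r}_a, \mathbf{r}_j, f) = \sum_{i=1}^{M} \frac{w_i(\mathbf{r}_a)}{d_{j,i}}\, S_g(f)\, \exp\!\left(\mathrm{j}\, 2\pi f\, \frac{d_{a,i} - d_{j,i}}{c}\right), \qquad (8) $$
where $S_g(f)$ is the Fourier transform of $s_g(t)$, $\mathrm{j} = \sqrt{-1}$, and $d_{a,i}$ and $d_{j,i}$ are the distances from the ith microphone to the focal point $\mathbf{r}_a$ and to the source at $\mathbf{r}_j$, respectively.
The beam value of single-frequency signals is
$$ B_f(\mathbf{r}_a, \mathbf{r}_j) = \left| B(\mathbf{r}_a, \mathbf{r}_j, f) \right|^2. \qquad (9) $$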
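For a single-frequency source, the beam value reduces to the squared magnitude of a sum of weighted phasors, one per microphone. The sketch below evaluates it for the 17 × 17 distributed array of Section 2. It assumes a unit-amplitude tone, a 1/d amplitude law and the decay-compensating weight by default; the function name `beam_value` is hypothetical.

```python
import cmath
import math

C = 340.0  # speed of sound (m/s)

def beam_value(focal, source, mics, freq, weight=lambda d: d):
    """Squared magnitude of the summed phasors for a unit tone at `source`."""
    total = 0j
    for m in mics:
        d_src = math.dist(source, m)
        d_foc = math.dist(focal, m)
        # Residual phase after steering: path difference in wavelengths times 2*pi.
        phase = 2 * math.pi * freq * (d_foc - d_src) / C
        total += weight(d_foc) / d_src * cmath.exp(1j * phase)
    return abs(total) ** 2

# 17 x 17 distributed array, pitch 1.47 m, 5 m above the audience plane.
mics = [((ix - 8) * 1.47, (iy - 8) * 1.47, 5.0)
        for ix in range(17) for iy in range(17)]
focal = (0.0, 0.0, 0.0)
peak = beam_value(focal, focal, mics, 500.0)
for x in (0.0, 0.14, 0.78):
    # Beam value relative to the on-focus value, moving the source along x.
    print(x, beam_value(focal, (x, 0.0, 0.0), mics, 500.0) / peak)
```

At the focal point all phasors are aligned with unit magnitude, so the peak equals the square of the number of microphones. Sweeping the source position over a grid of points would produce a beam pattern of the kind shown in Section 3.1.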
2.2. SIR Calculations
In this paper, the SIR defined in (10) is used to quantify the noise suppression ability of the two arrays:
$$ \mathrm{SIR} = \frac{\text{signal power}}{\text{interference noise power}}, \qquad (10) $$
where the signal power refers to the power of the output signal due to the sound source at the focal point and
the interference noise power refers to the sum of the powers of the output signals due to the sound sources at non-focal
points.
In this part, we assume that there is a person at every seat. The sound level varies from person to
person, and the sounds from different people are uncorrelated. There are no other significant noises besides
the sounds made by people. When the array is steered to focus on a seat (the focal point $\mathbf{r}_a$), the SIR of the
beamformer is
$$ \mathrm{SIR}(\mathbf{r}_a) = \frac{B_f(\mathbf{r}_a, \mathbf{r}_a)}{\sum_{j=1}^{N} B_f(\mathbf{r}_a, \mathbf{r}_j)} \cdot \frac{P_a}{P_0}, \qquad (11) $$
where $B_f(\mathbf{r}_a, \mathbf{r}_a)$ is the beam value at the focal point, $B_f(\mathbf{r}_a, \mathbf{r}_j)$ is the beam value at the jth seat, N is the number of
people in the lecture hall excluding the person at the focal point, $P_a$ is the average power over time of the sound
made by the person at the focal point, and $P_0$ is the average power over time and over the people at non-focal points.
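Because the talkers are assumed uncorrelated, their output powers add, so the SIR is the focal beam value times Pa divided by the summed non-focal beam values times P0. The sketch below is one reading of that expression; the function name `sir` and the beam values passed to it are hypothetical numbers chosen for illustration only.

```python
def sir(beam_focal, beam_others, pa_over_p0):
    """SIR = B(ra, ra) / sum_j B(ra, rj) * (Pa / P0) for uncorrelated talkers."""
    return beam_focal / sum(beam_others) * pa_over_p0

# Hypothetical beam values: focal seat normalised to 1, three interfering seats.
print(sir(1.0, [0.05, 0.02, 0.03], pa_over_p0=100.0))
```

In a full simulation, `beam_others` would hold the beam values of all N non-focal seats for the chosen focal point, and the calculation would be repeated with each seat in turn as the focal point to build an SIR pattern.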
3. Simulation Results
The performance of the microphone arrays depends on the type of sound they receive (e.g. speech or
music). Assuming that the performance of the arrays is similar for single- and multi-frequency signals, 500 Hz and
3 kHz signals are used to measure the performance of the two arrays when detecting speech and music,
respectively.
3.1. Beam Pattern
This section shows the beam patterns for the 500 Hz signal only; the beam patterns for the 3 kHz signal are
assumed to be similar. The section is divided into two parts: the first shows the beam
patterns when the arrays are focused on the origin, and the second when they are focused on (12.5 m, 0 m, 0 m).
Fig. 5: Beam pattern of the disk array focused on the origin for a 500 Hz signal. (a) XY plane (b) YZ plane.
Fig. 6: Beam pattern of the distributed array focused on the origin for a 500 Hz signal. (a) XY plane (b) YZ plane.
Figs. 5 and 6 show the beam patterns when the disk and distributed arrays are focused on the origin. The beam
patterns are calculated using the single-frequency beam value of Section 2.1 with weighting function (4). Note that the scales of the figures are
different.
From Fig. 5(a), the 3-dB beamwidth of the disk array is approximately 3 m. The distance between two
neighbouring seats is about 0.78 m. Thus, the beam of the disk array covers more than ten seats. On the other hand,
Fig. 6(a) shows that the 3-dB beamwidth of the distributed array is about 0.28 m, so the distributed array
focuses on a single seat. However, the beam of the distributed array has higher sidelobes than that of the
disk array.
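The seat count for the disk array can be checked with a rough estimate. It assumes, which the paper does not state, that the 3-dB footprint on the audience plane is a circle of diameter 3 m and that each seat occupies a 0.78 m × 0.78 m cell:

```latex
\[
\frac{\pi\,(1.5\ \mathrm{m})^2}{(0.78\ \mathrm{m})^2}
\approx \frac{7.07\ \mathrm{m}^2}{0.61\ \mathrm{m}^2}
\approx 11.6\ \text{seats}
\]
```

By the same measure, the 0.28 m beamwidth of the distributed array is narrower than one seat pitch.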
From Fig. 5(b), the beam of the disk array has large values along the beam direction in the YZ plane; the
array behaves like a spotlight. Also, the highest beam value of the disk array does not occur at the
focal point but at a point close to the array aperture. This shows that the disk array does not provide
resolution in the z-direction. Fig. 6(b) shows that the highest beam value of the distributed array occurs at the focal point, and that the 3-dB beamwidth in
the z-direction is about 1.25 m. Therefore, the distributed array is able to generate a 3D resolution cell.
However, since the seats inside the lecture hall are assumed to form a plane, the disk array still performs
acceptably, although it provides only 2D resolution.
Figs. 7 and 8 show the beam patterns of the disk and the distributed arrays when focusing on (12.5 m, 0 m, 0
m). Fig. 7(a) shows that there are locations where the beam value is higher than the beam value at the focal point.
The output of the disk array might therefore contain stronger signals from those non-focal points, and the quality of the
sound might deteriorate. Fig. 8(a) shows that the highest beam value of the distributed array still occurs at the
focal point. However, the beamwidth of the distributed array increases slightly when the focal point is
moved from the origin to (12.5 m, 0 m, 0 m).
In the YZ plane, the disk array still behaves like a spotlight, as shown in Fig. 7(b). Meanwhile, as shown in
Fig. 8(b), the highest beam value of the distributed array still occurs at the focal point. Nonetheless, the
beamwidth of the distributed array in the YZ plane also increases when the focal point is moved from the
origin to (12.5 m, 0 m, 0 m).
Fig. 7: Beam pattern of the disk array focused on (12.5 m, 0 m, 0 m) for a 500 Hz signal. (a) XY plane (b) YZ plane.
Fig. 8: Beam pattern of the distributed array focused on (12.5 m, 0 m, 0 m) for a 500 Hz signal. (a) XY plane (b) YZ plane.
3.2. SIR Pattern
In this section, the SIR pattern is formed by calculating the SIR of the array output as the focal point is
moved over the audience seats.
Fig. 9: SIR pattern of the disk array for a 500 Hz signal.
Fig. 10: SIR pattern of the distributed array for a 500 Hz signal.
Fig. 11: Comparison of the SIR patterns of the disk and the distributed arrays for a 3 kHz single-frequency signal in the x-direction.
Fig. 12: SIR patterns for a 500 Hz signal in the x-direction. Curves: disk array and distributed ("Sparse") array with weighting functions (4), (5) and (6).
Figs. 9 and 10 show the SIR patterns for a 500 Hz signal of the disk and distributed arrays, respectively. The
SIR values were calculated using the SIR expression of Section 2.2, assuming $P_a/P_0 = 100$. Both arrays have their highest SIR value at the
origin. The SIR values of the arrays decrease as the distance between the focal point and the origin
increases. Thus, the noise suppression ability of the two arrays deteriorates when focusing on a seat
further from the origin.
Comparing the peak SIR values, the distributed array achieves a much higher SIR than the disk array
and therefore has better noise suppression ability.
Fig. 11 compares the SIR patterns of the disk and distributed arrays for a 3 kHz signal in the
x-direction. The disk array has the higher SIR when the focal point is close to the origin, whereas the
distributed array has the higher SIR when the focal point is far from the origin. The transition point is at x ≈ 5 m.
Fig. 12 shows the SIR patterns of the disk and distributed arrays with the three weighting functions
(4)-(6) for a 500 Hz signal in the x-direction. The SIR pattern of the disk array does not change significantly
with the weighting function, but the SIR pattern of the distributed array
does. From Fig. 12, the SIR of the distributed array
increases significantly when the spherical decay compensation is removed (weighting function (4) replaced by
(5)). When weighting function (5) is replaced by (6), the SIR for the seats near the border of the
audience plane increases; that is, the noise suppression ability of the distributed array improves when focusing on seats near the
border.
4. Conclusion and Future Work
The distributed array has better spatial resolution than the disk array. The highest beam value of the
distributed array always occurs at its focal point, whereas the highest beam value of the disk array may occur at
a non-focal point, especially when the focal point is far from the origin. In addition, the disk array provides only
2D resolution, whereas the distributed array provides 3D resolution.
The SIR patterns show that the distributed array has better noise suppression ability than the disk
array for low-frequency signals (e.g. normal human speech). For high-frequency signals (e.g. music
signals), the relative noise suppression ability of the disk array depends on the seat on which the system is focused: when
focusing on seats near the origin, the disk array has better noise suppression ability, but when
the focal point is far from the origin, the distributed array performs better.
Overall, the simulation results indicate that the distributed array has more advantages than the disk array.
However, the microphone positions of the distributed array are difficult to determine accurately, and errors
in position calibration will degrade the quality of the formed beam. Research on microphone
calibration methods can be found in [7]-[9].
5. References
[1] M. Grotticelli, “New Microphone Can Hear Any Sound at Outdoor Sports Venues,” Squarehead Technology. [Online] Available: http://broadcastengineering.com/audio/new-microphone-audioscope-1022/ [Accessed: Nov. 22, 2010]
[2] M. Kjolerbakken, V. Jahr and I. Hafizovic, “Directional Audio Capturing,” U.S. Patent 20080247567, Oct. 9, 2008.
[3] K. Wilson, V. Rangarajan, N. Checka, and T. Darrell, “Audiovisual Arrays for Untethered Spoken Interfaces,” 4th Int. Conf. Multimodal Interfaces (ICMI), 2002.
[4] J. M. Sachar, H. F. Silverman, and W. R. Patterson, III, “Large vs Small Aperture Microphone Arrays: Performance Over a Large Focal Area,” Proc. ICASSP, pp. SAM P7.8 - SAM P7.8, 2001.
[5] D. Sun, “Microphone Arrays for Hands-free Speech Interfaces,” David Sun Projects. [Online] Available: http://www.eecs.berkeley.edu/~davidsun/micarray.html. [Accessed: Jan. 20, 2011]
[6] R. Stiefelhagen, et al., “Audiovisual Perception of a Lecturer in a Smart Seminar Room,” Signal Processing, In Press, 2006.
[7] J. M. Sachar, H. F. Silverman and W. R. Patterson, “Microphone Position and Gain Calibration for a Large-Aperture Microphone Array,” IEEE Trans. on Speech and Audio Processing, vol. 13, no. 1, pp. 42-52, Jan. 2005.
[8] Y. Li, “Position and time-delay calibration of transducer elements in a sparse array for underwater ultrasound imaging,” IEEE Trans. Ultrason., Ferroelect., Freq. Cont., vol. 53, no. 8, pp. 1458-1466, Aug. 2006.
[9] Y. Li, I. Sharp, M. Hedley, P. Ho, and Y. J. Guo, “Single- and Double-Difference Algorithms for Position and Time-Delay Calibration of Transducer-Elements in a Sparse Array,” IEEE Trans. Ultrason., Ferroelect., Freq. Cont., vol. 54, no. 6, pp. 1188-1198, June 2007.
[10] A. Cigada, F. Ripamonti, and M. Vanali, “The Delay & Sum Algorithm Applied to Microphone Array Measurements: Numerical Analysis and Experimental Validation,” ScienceDirect, Mechanical Signal Processing, vol. 21, issue 6, pp. 2645-2664, Aug. 2007. [Online] Available: http://www.sciencedirect.com [Accessed: Nov. 29, 2010]