2011 International Conference on Intelligent Building and Management
Proc. of CSIT vol. 5 (2011) © (2011) IACSIT Press, Singapore

Performance Comparison of Compact Disk and Widely Distributed Microphone Arrays for Intelligent Lecture Halls

YingYing Tiong 1, Yue Li 2
CSIRO ICT Centre

Abstract. SquareHead AudioScope is a hands-free speech acquisition tool that allows users to select and ‘zoom in’ on sound recorded with a compact disk microphone array. The Massachusetts Institute of Technology (MIT) has also developed a hands-free speech acquisition tool, based on a widely distributed microphone array. In this paper, we compare by simulation the performance of the disk and distributed arrays inside a large lecture hall, studying their beam patterns and signal to interference ratios.

Keywords: speech acquisition tool, array, signal processing, lecture hall, conference room

1. Introduction

Microphone arrays might replace close-talking microphones as speech acquisition tools in a lecture hall. Such a system allows users to isolate and amplify the particular sound they want to hear. SquareHead AudioScope is an example of a commercialised speech acquisition tool; it has been installed in one of the auditoriums at the Max Planck Institute (Fig. 1). The audio collector of the AudioScope has 300 built-in microphones, concealed in an elegant carbon dish to form a compact disk array.

Another option for a lecture hall is a widely distributed microphone array, an area studied in [3]-[6]. An example is shown in Fig. 2: a system developed at MIT using 32 omnidirectional microphones attached uniformly to the ceiling.

Fig. 1: SquareHead AudioScope Conference System being installed in one of the auditoriums at the Max Planck Institute [1]. The highlighted circle is the AudioScope dish.

Fig. 2: Experimenting with the MIT distributed microphone array. The microphones are highlighted with red circles and arrows [4].
The major physical difference between the two arrays is that the distributed array has a larger aperture, so the sound quality of the two arrays might differ. A preliminary step in choosing an appropriate speech acquisition tool for a large and crowded lecture hall is to simulate and compare the performance of the two arrays. To do this, we first compare the beam patterns of the arrays under various conditions, and then compare the noise suppression capability of the systems by calculating the Signal to Interference Ratio (SIR).

1 Tel.: +61430904702; e-mail address: ying.tiong@csiro.au.
2 Tel.: +61425358929; e-mail address: yue.li@csiro.au.

2. Simulation Procedures

Assume 1024 (32 × 32) audience seats are distributed uniformly inside the lecture hall and form a 25 m × 25 m audience plane. The distance between two neighbouring seats is 0.78 m. The centre of the audience plane is set as the origin. The disk and distributed arrays are installed 5 m above the audience plane, with the centre of each array aligned with the origin.

Fig. 3 shows the testing environment. The audience plane is the XY plane at z = 0 m. The disk array is represented by the circular plane, and the microphones of the distributed array are represented by dots.

The microphone positions of the disk array are shown in Fig. 4. The arrangement is similar to that of the Squarehead disk array [1]. The microphones lie on thirty 1.0 m radial lines separated by 12°, with 10 microphones on each radial line. The distance d between the ith and (i+1)th microphones on a line is given by (1).

[Equation (1) not recoverable from the source.]

Fig. 3: 3D view of the disk and distributed arrays inside a 25 m × 25 m lecture hall. Legend: ● the disk array; ○ microphone of the distributed array; × (0, 0, 0) origin; × (x_a, y_a, z_a) focal point; × (x_j, y_j, z_j) arbitrary non-focal point.

Fig. 4: Microphone distribution of the disk array.

The distributed array is a 17 × 17 square array. The centre element of the array is above the origin.
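The seat and microphone layout of the simulation can be sketched in a few lines of Python. This is only an illustration under stated assumptions: seats are placed at the centres of a 32 × 32 grid of cells covering the 25 m × 25 m plane (which reproduces the 0.78 m seat spacing), and the distributed array uses the 1.47 m pitch quoted for it in this section. The function names are hypothetical, and the disk array is omitted because its radial spacing rule (1) is not available.

```python
def centred_grid(n, pitch):
    """Return n equally spaced coordinates centred on zero."""
    half = (n - 1) / 2.0
    return [(k - half) * pitch for k in range(n)]


def seat_positions(n=32, side=25.0):
    """Seats on the audience plane z = 0 (assumed: one seat per grid cell centre)."""
    pitch = side / n  # 25 / 32 = 0.78125 m
    axis = centred_grid(n, pitch)
    return [(x, y, 0.0) for y in axis for x in axis]


def distributed_array(n=17, pitch=1.47, height=5.0):
    """Ceiling microphones of the n x n distributed array, centred above the origin."""
    axis = centred_grid(n, pitch)
    return [(x, y, height) for y in axis for x in axis]


seats = seat_positions()
mics = distributed_array()
```

With these defaults there are 1024 seats and 289 microphones, the seat pitch is 25/32 ≈ 0.78 m, and the outermost microphones in a row are 16 × 1.47 = 23.52 m apart.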
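The pitch of the distributed array in both the x and y directions is 1.47 m, so the aperture of the array is 23.52 m × 23.52 m. Multiple reflections are ignored and the propagation medium is assumed to be homogeneous during the simulation.

2.1. Beam Pattern

Suppose either the disk or the distributed array is operating (Fig. 3). Consider a signal $s_g(t)$ generated at an arbitrary location $\mathbf{r}_j = (x_j, y_j, z_j)$. The signal received by the $i$th microphone is

$$s_i(\mathbf{r}_j, t) = \frac{1}{R_{i,j}}\, s_g\!\left(t - \frac{R_{i,j}}{c}\right), \qquad (2)$$

where $R_{i,j}$ is the distance from the source location to the $i$th microphone and $c$ is the speed of sound ($c$ is assumed to be 340 m/s at 20 °C). Note that the received signal is time shifted and that its amplitude decays because of spherical spreading during propagation. The beam formed using the delay and sum method is

$$b(\mathbf{r}_a, \mathbf{r}_j, t) = \sum_{i=1}^{M} w_i(\mathbf{r}_a)\, s_i\!\left(\mathbf{r}_j,\, t + \frac{R_{i,a}}{c}\right). \qquad (3)$$

In (3), $\mathbf{r}_a = (x_a, y_a, z_a)$ is the focal point, $R_{i,a}$ is the distance from the focal point to the $i$th microphone, $M$ is the number of microphones in the array, and $w_i(\mathbf{r}_a)$ is a weighting function applied to the array. In this paper, the following three weighting functions are tested:

$$w_i(\mathbf{r}_a) = R_{i,a}, \qquad (4)$$

$$w_i(\mathbf{r}_a) = 1, \qquad (5)$$

$$w_i(\mathbf{r}_a) = \frac{1}{R_{i,a}}. \qquad (6)$$

With (4), the beamformer tries to compensate for the decay of the signal from the focal point due to spherical spreading. With (5), the beamformer compensates only for the phase delay, without considering spherical spreading. Weighting (6) is used so that signals received by microphones near the focal point contribute more to the output of the beamformer.

The maximum signal power over time of the output from the beamformer is

$$B_f(\mathbf{r}_a, \mathbf{r}_j) = \max_t \left| b(\mathbf{r}_a, \mathbf{r}_j, t) \right|^2. \qquad (7)$$

$B_f(\mathbf{r}_a, \mathbf{r}_j)$ is defined as the beam value for a sound source at $\mathbf{r}_j$ when the focal point $\mathbf{r}_a$ is fixed. For single frequency signals, the output signal of the beamformer can be expressed by the Fourier Transform of (3),

$$B(\mathbf{r}_a, \mathbf{r}_j, f) = S_g(f) \sum_{i=1}^{M} \frac{w_i(\mathbf{r}_a)}{R_{i,j}} \exp\!\left[-\mathrm{j}\, 2\pi f\, \frac{R_{i,j} - R_{i,a}}{c}\right], \qquad (8)$$

where $S_g(f)$ is the Fourier Transform of $s_g(t)$ and the upright $\mathrm{j}$ is the imaginary unit (not the source index $j$). The beam value of single frequency signals is

$$B_f(\mathbf{r}_a, \mathbf{r}_j) = \left| B(\mathbf{r}_a, \mathbf{r}_j, f) \right|^2. \qquad (9)$$

2.2. SIR Calculations

In this paper, the SIR defined in (10) is used to quantify the noise suppression ability of the two arrays.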
$$\mathrm{SIR} = \frac{\text{signal power}}{\text{interference noise power}}, \qquad (10)$$

where the signal power is the power of the output signal due to the sound source at the focal point, and the interference noise power is the sum of the powers of the output signals due to the sound sources at non-focal points.

We assume that there is a person at every seat and that every person is making sound at various levels. The sounds from different people are uncorrelated, and there are no other significant noises besides the sounds made by people. When the array is steered to focus on a seat (the focal point $\mathbf{r}_a$), the SIR of the beamformer is

$$\mathrm{SIR}(\mathbf{r}_a) = \frac{B_f(\mathbf{r}_a, \mathbf{r}_a)}{\sum_{j=1}^{N} B_f(\mathbf{r}_a, \mathbf{r}_j)} \cdot \frac{P_a}{P_0}, \qquad (11)$$

where $B_f(\mathbf{r}_a, \mathbf{r}_a)$ is the beam value at the focal point, $B_f(\mathbf{r}_a, \mathbf{r}_j)$ is the beam value at the $j$th seat, $N$ is the number of people in the lecture hall excluding the person at the focal point, $P_a$ is the average power over time of the sound made by the person at the focal point, and $P_0$ is the average power over time and over people at the non-focal points.

3. Simulation Results

The performance of the microphone arrays depends on the type of sound they receive (i.e. speech or music). Assuming that the performance of the arrays is similar for single- and multi-frequency signals, 500 Hz and 3 kHz single-frequency signals are used to measure the performance of the two arrays when detecting speech and music respectively.

3.1. Beam Pattern

This section shows the beam patterns for the 500 Hz signal only; the beam patterns for the 3 kHz signal are assumed to be similar. The section is divided into two parts: the first shows the beam patterns when the arrays are focused on the origin, and the second shows the beam patterns when the arrays are focused on (12.5 m, 0 m, 0 m).

Fig. 5: Beam pattern of the disk array focused on the origin for a 500 Hz signal. (a) XY plane; (b) YZ plane.

Fig. 6: Beam pattern of the distributed array focused on the origin for a 500 Hz signal. (a) XY plane; (b) YZ plane.

Figs. 5 and 6 show the beam patterns when the disk and distributed arrays are focused on the origin. The beam patterns are calculated using the single-frequency beam value (9) with the first weighting function (4). Note that the scales of the figures are different. From Fig. 5(a), the 3-dB beamwidth of the disk array is approximately 3 m. Since the distance between two neighbouring seats is about 0.78 m, the disk array is focusing on more than ten seats. In contrast, Fig. 6(a) shows that the 3-dB beamwidth of the distributed array is about 0.28 m, so the distributed array is focusing on one seat. However, the beam of the distributed array has higher sidelobes than that of the disk array.

From Fig. 5(b), the beam of the disk array has large values along the beam direction in the YZ plane; the array behaves like a spotlight. Moreover, the highest beam value of the disk array does not occur at the focal point but at a point close to the array aperture. This shows that the disk array does not provide resolution in the z-direction. Fig. 6(b) shows that the highest beam value of the distributed array occurs at the focal point, and the 3-dB beamwidth in the z-direction is about 1.25 m. The distributed array is therefore able to generate a 3D resolution cell. However, since the seats inside the lecture hall are assumed to form a plane, the disk array still performs acceptably, although it provides only 2D resolution.

Figs. 7 and 8 show the beam patterns of the disk and distributed arrays when focused on (12.5 m, 0 m, 0 m). Fig. 7(a) shows that there are locations where the beam value is higher than at the focal point. The output of the disk array might therefore contain stronger signals from those non-focal points, which might degrade the sound quality. Fig. 8(a) shows that the highest beam value of the distributed array still occurs at the focal point. However, the beamwidth of the distributed array increases slightly when the focal point is moved from the origin to (12.5 m, 0 m, 0 m).
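To make the beam values discussed in this section concrete, the following Python sketch evaluates a single-frequency delay-and-sum beam value for a unit-amplitude tone. It assumes the 1/R spherical spreading and the three weightings described in Section 2.1 (w = R for (4), w = 1 for (5), w = 1/R for (6), where R is the microphone-to-focal-point distance). The function and parameter names are illustrative and are not taken from the paper; this is not the authors' simulation code.

```python
import cmath
import math

SPEED_OF_SOUND = 340.0  # m/s, the value assumed in Section 2.1


def distributed_array(n=17, pitch=1.47, height=5.0):
    """Microphones of the 17 x 17 ceiling array, centred above the origin."""
    half = (n - 1) / 2.0
    axis = [(k - half) * pitch for k in range(n)]
    return [(x, y, height) for y in axis for x in axis]


def beam_value(mics, focus, source, freq, weighting=4):
    """Squared magnitude of the delay-and-sum output for a unit tone at `source`."""
    total = 0j
    for mic in mics:
        r_focus = math.dist(mic, focus)    # microphone to focal point
        r_source = math.dist(mic, source)  # microphone to actual source
        if weighting == 4:      # compensate spherical decay from the focal point
            w = r_focus
        elif weighting == 5:    # phase (delay) compensation only
            w = 1.0
        elif weighting == 6:    # favour microphones near the focal point
            w = 1.0 / r_focus
        else:
            raise ValueError("weighting must be 4, 5 or 6")
        # Residual delay after steering, converted to a phase at `freq`.
        phase = -2.0 * math.pi * freq * (r_source - r_focus) / SPEED_OF_SOUND
        total += (w / r_source) * cmath.exp(1j * phase)
    return abs(total) ** 2


mics = distributed_array()
origin = (0.0, 0.0, 0.0)
on_focus = beam_value(mics, origin, origin, 500.0)
off_focus = beam_value(mics, origin, (1.0, 0.0, 0.0), 500.0)
```

With weighting (4) every microphone contributes exactly 1 when the source sits at the focal point, so `on_focus` equals the squared microphone count, 289². `off_focus` is the corresponding value for a source 1 m away along x; sweeping the source over a grid of points would give a beam pattern in the sense used above.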
In the YZ plane, the disk array still behaves like a spotlight, as shown in Fig. 7(b). Meanwhile, Fig. 8(b) shows that the highest beam value of the distributed array still occurs at the focal point. Nonetheless, the beamwidth of the distributed array in the YZ plane also increases when the focal point is moved from the origin to (12.5 m, 0 m, 0 m).

Fig. 7: Beam pattern of the disk array focused on (12.5 m, 0 m, 0 m) for a 500 Hz signal. Panels (a) and (b).

Fig. 8: Beam pattern of the distributed array focused on (12.5 m, 0 m, 0 m) for a 500 Hz signal. Panels (a) and (b).

3.2. SIR Pattern

In this section, the SIR pattern is formed by calculating the SIR values of the output of the arrays when focusing on each of the audience seats.

Fig. 9: SIR pattern of the disk array for a 500 Hz signal.

Fig. 10: SIR pattern of the distributed array for a 500 Hz signal.

Fig. 11: Comparison of the SIR patterns of the disk and distributed arrays for a 3 kHz single-frequency signal in the x-direction.

Fig. 12: SIR patterns for a 500 Hz signal in the x-direction. The legend distinguishes the disk array and the distributed ("Sparse") array under the different weighting functions.

Figs. 9 and 10 show the SIR patterns of the disk and distributed arrays, respectively, for a 500 Hz signal. The SIR values were calculated using the beamformer SIR expression of Section 2.2, assuming $P_a/P_0 = 100$. Both arrays have their highest SIR value at the origin, and the SIR values decrease as the distance between the focal point and the origin increases. Thus, the noise suppression ability of the two arrays deteriorates when focusing on a seat further from the origin. Comparing the peak SIR values, the distributed array has much higher SIR values than the disk array.
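An SIR pattern of this kind can be computed seat by seat once a beam-value function is available. The sketch below is a hedged illustration of the beamformer SIR of Section 2.2: the beam value at the focal seat divided by the sum of the beam values at all other seats, scaled by a power ratio. The `beam` callable, the function name, and the default ratio of 100 (read here as P_a/P_0, which is an assumption) are all illustrative.

```python
def sir(beam, focus, seats, power_ratio=100.0):
    """SIR when steering to `focus`; `beam(focus, source)` returns a beam value."""
    signal = beam(focus, focus)
    # Uncorrelated talkers: interference powers add across the non-focal seats.
    interference = sum(beam(focus, seat) for seat in seats if seat != focus)
    return power_ratio * signal / interference


# Toy check with a made-up beam pattern: 1 at the focus, 0.25 everywhere else.
toy_seats = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
toy_beam = lambda focus, source: 1.0 if source == focus else 0.25
toy_sir = sir(toy_beam, toy_seats[0], toy_seats)  # 100 * 1 / (0.25 + 0.25) = 200.0
```

Calling `sir` once per seat, with that seat as the focal point and a real beam-value function in place of `toy_beam`, would yield one SIR value per seat, which is how the patterns in Figs. 9 and 10 are described.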
For the 500 Hz signal, the distributed array has better noise suppression ability.

Fig. 11 compares the SIR patterns of the disk and distributed arrays for a 3 kHz signal in the x-direction. The disk array has the higher SIR value when the focal point is close to the origin, whereas the distributed array has the higher SIR values when the focal point is away from the origin. The transition point is at x ≈ 5 m.

Fig. 12 shows the SIR patterns of the disk and distributed arrays with the three weighting functions (4)-(6), for a 500 Hz signal in the x-direction. The SIR pattern of the disk array does not change significantly with the weighting function. However, the SIR pattern of the distributed array does change. From Fig. 12, the SIR values of the distributed array increase significantly when the spherical decay compensation is removed (weighting function (4) replaced by (5)). When weighting function (5) is replaced by (6), the SIR values for the seats near the border of the audience plane increase; that is, the noise suppression ability of the distributed array improves when focusing on seats near the border.

4. Conclusion and Future Work

The distributed array has better spatial resolution than the disk array. The highest beam value of the distributed array always occurs at its focal point, whereas the highest beam value of the disk array may occur at a non-focal point, especially when the focal point is far from the origin. In addition, the disk array provides only 2D resolution whereas the distributed array provides 3D resolution.

The SIR patterns show that the distributed array has better noise suppression ability than the disk array for low frequency signals (e.g. normal human voice). For high frequency signals (e.g. music), the relative noise suppression ability of the disk array depends on the seat the system is focusing on. When focusing on seats near the origin, the disk array has better noise suppression ability.
However, when the focal point is far from the origin, the distributed array has better noise suppression ability.

Overall, the simulation results indicate that the distributed array has more advantages than the disk array. However, the microphone positions of the distributed array are difficult to determine accurately, and errors in the location calibration will influence the quality of the formed beam. Research on microphone calibration methods can be found in [7]-[9].

5. References

[1] M. Grotticelli, “New Microphone Can Hear Any Sound at Outdoor Sports Venues,” Squarehead Technology. [Online]. Available: http://broadcastengineering.com/audio/new-microphone-audioscope-1022/ [Accessed: Nov. 22, 2010].
[2] M. Kjolerbakken, V. Jahr and I. Hafizovic, “Directional Audio Capturing,” U.S. Patent 20080247567, Oct. 9, 2008.
[3] K. Wilson, V. Rangarajan, N. Checka, and T. Darrell, “Audiovisual Arrays for Untethered Spoken Interfaces,” 4th Int. Conf. Multimodal Interfaces (ICMI), 2002.
[4] J. M. Sachar, H. F. Silverman, and W. R. Patterson, III, “Large vs Small Aperture Microphone Arrays: Performance Over a Large Focal Area,” Proc. ICASSP, pp. SAM P7.8 - SAM P7.8, 2001.
[5] D. Sun, “Microphone Arrays for Hands-free Speech Interfaces,” David Sun Projects. [Online]. Available: http://www.eecs.berkeley.edu/~davidsun/micarray.html [Accessed: Jan. 20, 2011].
[6] R. Stiefelhagen et al., “Audiovisual Perception of a Lecturer in a Smart Seminar Room,” Signal Processing, in press, 2006.
[7] J. M. Sachar, H. F. Silverman and W. R. Patterson, “Microphone Position and Gain Calibration for a Large-Aperture Microphone Array,” IEEE Trans. on Speech and Audio Processing, vol. 13, no. 1, pp. 42-52, Jan. 2005.
[8] Y. Li, “Position and time-delay calibration of transducer elements in a sparse array for underwater ultrasound imaging,” IEEE Trans. Ultrason., Ferroelect., Freq. Cont., vol. 53, no. 8, pp. 1458-1466, Aug. 2006.
[9] Y. Li, I. Sharp, M. Hedley, P. Ho, and Y. J. Guo, “Single- and Double-Difference Algorithms for Position and Time-Delay Calibration of Transducer-Elements in a Sparse Array,” IEEE Trans. Ultrason., Ferroelect., Freq. Cont., vol. 54, no. 6, pp. 1188-1198, June 2007.
[10] A. Cigada, F. Ripamonti, and M. Vanali, “The Delay & Sum Algorithm Applied to Microphone Array Measurements: Numerical Analysis and Experimental Validation,” ScienceDirect, Mechanical Signal Processing, vol. 21, no. 6, pp. 2645-2664, Aug. 2007. [Online]. Available: http://www.sciencedirect.com [Accessed: Nov. 29, 2010].
