Position Calibration of Audio Sensors and Actuators in a Distributed Computing Platform

Vikas C. Raykar | Igor Kozintsev | Rainer Lienhart
University of Maryland, College Park | Intel Labs, Intel Corporation

Motivation
• Many emerging multimedia applications use multiple audio/video sensors and actuators.
(Diagram labels: Speakers, Distributed Rendering, Cameras, Distributed Capture, Current Work, Microphones, Number Crunching, Displays, Other Applications)

What can you do with multiple microphones…
• Speaker localization and tracking.
• Beamforming or spatial filtering.

Some Applications…
• Speech recognition
• Hands-free voice communication
• Novel interactive audio-visual interfaces
• Multichannel speech enhancement
• Smart conference rooms
• Audio/image-based rendering
• Audio/video surveillance
• Speaker localization and tracking
• Source separation and dereverberation
• Meeting recording
• Multichannel echo cancellation

More Motivation…
• Current work has focused on setting up all the sensors and actuators on a single dedicated computing platform.
• This requires dedicated infrastructure: the sensors, multichannel interface cards, and computing power.
On the other hand
• Computing devices such as laptops, PDAs, tablets, cellular phones, and camcorders have become pervasive.
• Audio/video sensors on different laptops can be used to form a distributed network of sensors.
(Figure label: internal microphone)

Common TIME and SPACE
• Put all the distributed audio/visual input/output capabilities of all the laptops into a common TIME and SPACE.
• For the common TIME, see our poster: "Universal Synchronization Scheme for Distributed Audio-Video Capture on Heterogeneous Computing Platforms", R. Lienhart, I. Kozintsev and S. Wehr.
• In this paper we deal with the common SPACE, i.e. we estimate the 3D positions of the sensors and actuators.

Why common SPACE
• Most array processing algorithms require that the precise positions of the microphones be known.
• Manual measurement is painful and imprecise.

This paper is about..
(Figure: 3D coordinate axes X, Y, Z)

If we know the positions of speakers….
• If the distances are not exact, or if we have more speakers, solve in the least-squares sense.
(Figure: 2D axes X, Y, with a "?" marker)

If positions of speakers unknown…
• Consider M microphones and S speakers.
• What can we measure? Using a calibration signal, the distance between each speaker and all microphones, or equivalently the Time Of Flight (TOF).
• This gives an M×S TOF matrix.
• Assume the TOF is corrupted by Gaussian noise. We can then derive the ML estimate.

Nonlinear Least Squares..
• More formally, we can derive the ML estimate using a Gaussian noise model.
• Find the coordinates which minimize the sum of squared differences between the measured and the modeled TOFs.

Maximum Likelihood (ML) Estimate..
• We can define a noise model and derive the ML estimate, i.e. maximize the likelihood.
• If the noise is Gaussian and independent, ML is the same as least squares.
(Figure label: Gaussian noise)

Reference Coordinate system | Multiple Global minima
• The TOFs depend only on relative distances, so any translation, rotation, or reflection of a solution is also a global minimum. Fixing a reference coordinate system removes this ambiguity.

Reference Coordinate System
(Figure labels: origin, X axis, positive Y axis)
Similarly in 3D:
1. Fix the origin (0,0,0)
2. Fix the X axis (x1,0,0)
3. Fix the Y axis (x2,y2,0)
4. Fix the positive Z axis
x1, x2, y2 > 0
Which to choose? Later…

On a synchronized platform all is well..
However, on a distributed system..

The journey of an audio sample..
• This laptop wants to play a calibration signal on the other laptop.
• Play command in software.
• When will the sound actually be played out from the loudspeaker?
(Diagram labels: Network, Multimedia/multistream applications, Operating system, I/O bus, Audio/video I/O devices)

On a Distributed system..
(Figure: timeline starting at the time origin. Playback starts and the signal is emitted by source j at time ts_j. Capture starts at time tm_i and the signal is received by microphone i. The figure contrasts the actual TOF_ij with the estimated TOF_ij, which includes the unknown start times.)

Joint Estimation..
• MS TOF measurements give MS observations.
• Microphone and speaker coordinates: 3(M+S)-6 parameters.
• Speaker emission start times: S parameters.
• Microphone capture start times: M-1 parameters (assume tm_1 = 0).
• In total, 4M+4S-7 parameters to estimate.
• The number of parameters can be reduced.

Nonlinear least squares..
• Levenberg-Marquardt method.
• The objective is a function of a large number of parameters.
• Unless we have a good initial guess, it may not converge to the global minimum.
• An approximate initial guess is required.

Closed form Solution..
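As a minimal sketch of the closed-form idea, classical metric MDS recovers point coordinates, up to rotation, reflection and translation, from a complete and exact matrix of pairwise distances. The code below is an illustration under stated assumptions, not the authors' implementation: the function names and the five device positions are hypothetical, the dot products are taken about the centroid (the slides do not say which origin is used), and a symmetric eigendecomposition stands in for the SVD, which is equivalent for a symmetric positive semidefinite matrix.

```python
import numpy as np

def classical_mds(dist, dim=3):
    """Recover coordinates (up to rotation, reflection, translation) from pairwise distances."""
    d = np.asarray(dist, dtype=float)
    n = d.shape[0]
    # Double centering turns squared distances into dot products about the centroid.
    centering = np.eye(n) - np.ones((n, n)) / n
    b = -0.5 * centering @ (d ** 2) @ centering
    # B is symmetric, so a symmetric eigendecomposition plays the role of the SVD.
    eigvals, eigvecs = np.linalg.eigh(b)
    top = np.argsort(eigvals)[::-1][:dim]
    # Clip tiny negative eigenvalues caused by rounding before taking square roots.
    scale = np.sqrt(np.clip(eigvals[top], 0.0, None))
    return eigvecs[:, top] * scale

def pairwise_distances(points):
    """Euclidean distance between every pair of rows."""
    p = np.asarray(points, dtype=float)
    return np.linalg.norm(p[:, None, :] - p[None, :, :], axis=-1)

# Hypothetical positions (metres) of five devices in 3D.
true_points = np.array([
    [0.0, 0.0, 0.0],
    [4.0, 0.0, 0.0],
    [4.0, 2.5, 0.0],
    [0.0, 2.5, 1.0],
    [2.0, 1.0, 2.0],
])
dist = pairwise_distances(true_points)
estimate = classical_mds(dist, dim=3)

# The recovered layout reproduces the input distances even though the
# coordinates themselves may be rotated, reflected, or shifted.
max_error = np.abs(pairwise_distances(estimate) - dist).max()
print("largest distance mismatch:", max_error)
```

The check compares distances rather than coordinates because MDS fixes the geometry only up to a rigid transformation, which is why a reference coordinate system has to be chosen separately.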
• Given all pairwise distances between N points, can we recover the coordinates?

      1   2   3   4
  1   X   X   X   X
  2   X   X   X   X
  3   X   X   X   X
  4   X   X   X   X

Classical Metric Multi Dimensional Scaling
• The dot product matrix B is symmetric, positive semidefinite, and of rank 3.
• Given B, can you get the coordinates X? Yes, using the Singular Value Decomposition.
• This is the same as Principal Component Analysis.
• But we can measure only the pairwise distance matrix.

How to get dot product from the pairwise distance matrix…
(Figure: triangle with vertices i, j, k and side lengths d_ki, d_ij, d_kj)

Example of MDS…

Can we use MDS..Two problems

       s1  s2  s3  s4  m1  m2  m3  m4
  s1   ?   ?   ?   ?   X   X   X   X
  s2   ?   ?   ?   ?   X   X   X   X
  s3   ?   ?   ?   ?   X   X   X   X
  s4   ?   ?   ?   ?   X   X   X   X
  m1   X   X   X   X   ?   ?   ?   ?
  m2   X   X   X   X   ?   ?   ?   ?
  m3   X   X   X   X   ?   ?   ?   ?
  m4   X   X   X   X   ?   ?   ?   ?

1. We do not have the complete pairwise distances.
2. The measured distances include the effect of the lack of synchronization.

Clustering approximation…
(Figure labels: ii, ij, ji, jj)

Finally the complete algorithm…
(Flowchart: TOF matrix → clustering approximation → approximate distance matrix between GPCs → dot product matrix → MDS, given the dimension and coordinate system, to get approximate GPC locations → perturb → approximate microphone and speaker locations. These, together with the approximate ts and approximate tm, feed the TDOA-based nonlinear minimization, which outputs the microphone and speaker locations and tm.)

Sample result in 2D…

Algorithm Performance…
• The performance of our algorithm depends on the noise variance in the estimated distances, the number of microphones and speakers, and the microphone and speaker geometry.
• One way to study the dependence is to run many Monte Carlo simulations.
• Alternatively, given a noise model, we can derive a bound on how well any estimator can perform: the Cramér-Rao bound.
• It gives the lower bound on the variance of any unbiased estimator.
• It does not depend on the estimator, only on the data and the noise model.
• It tells us to what extent the noise limits our performance, i.e. you cannot get a variance lower than the CR bound.
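Both ways of studying performance mentioned above can be illustrated on a deliberately simplified case: one microphone in 2D, four speakers at known positions, a synchronized setup, and independent Gaussian noise on the measured distances. Everything specific here is an assumption made for illustration (the room layout, the 1 cm noise level, the function names), and plain Gauss-Newton is swapped in for Levenberg-Marquardt to keep the sketch short. Because the speaker positions are known, the Fisher information matrix is full rank and can be inverted directly.

```python
import math
import random

def gauss_newton_locate(speakers, dists, start, iters=20):
    """Least-squares 2D microphone position from distances to known speakers."""
    x, y = start
    for _ in range(iters):
        a11 = a12 = a22 = g1 = g2 = 0.0
        for (sx, sy), d in zip(speakers, dists):
            r = math.hypot(x - sx, y - sy)
            ux, uy = (x - sx) / r, (y - sy) / r  # Jacobian row: unit vector speaker -> mic
            e = d - r                            # residual: measured minus modeled distance
            a11 += ux * ux
            a12 += ux * uy
            a22 += uy * uy
            g1 += ux * e
            g2 += uy * e
        det = a11 * a22 - a12 * a12
        # Solve the 2x2 normal equations (J^T J) delta = J^T e.
        x += (a22 * g1 - a12 * g2) / det
        y += (a11 * g2 - a12 * g1) / det
    return x, y

def crb_covariance(speakers, mic, sigma):
    """Cramer-Rao bound: inverse of the Fisher information J^T J / sigma^2."""
    a11 = a12 = a22 = 0.0
    for sx, sy in speakers:
        r = math.hypot(mic[0] - sx, mic[1] - sy)
        ux, uy = (mic[0] - sx) / r, (mic[1] - sy) / r
        a11 += ux * ux
        a12 += ux * uy
        a22 += uy * uy
    det = a11 * a22 - a12 * a12
    s2 = sigma ** 2
    return [[s2 * a22 / det, -s2 * a12 / det],
            [-s2 * a12 / det, s2 * a11 / det]]

# Hypothetical layout (metres): speakers at the corners of a 4.0 x 2.5 room.
speakers = [(0.0, 0.0), (4.0, 0.0), (4.0, 2.5), (0.0, 2.5)]
mic = (1.0, 0.8)
sigma = 0.01                      # assumed 1 cm distance noise
start = (2.0, 1.25)               # initial guess: centre of the room
true_d = [math.hypot(mic[0] - sx, mic[1] - sy) for sx, sy in speakers]

cov = crb_covariance(speakers, mic, sigma)
bound = cov[0][0] + cov[1][1]     # lower bound on total position variance

# Monte Carlo: repeat the estimation with fresh noise and average the squared error.
random.seed(0)
trials = 2000
sq_err = 0.0
for _ in range(trials):
    noisy = [d + random.gauss(0.0, sigma) for d in true_d]
    ex, ey = gauss_newton_locate(speakers, noisy, start)
    sq_err += (ex - mic[0]) ** 2 + (ey - mic[1]) ** 2
mse = sq_err / trials

print("CR bound on total variance (m^2):", bound)
print("Monte Carlo mean squared error (m^2):", mse)
```

At this noise level the simulated error should sit close to the bound. Changing the number of speakers or moving them closer together changes `bound` without rerunning any simulation, which is the practical appeal of the Cramér-Rao analysis. In the full problem, where the speaker positions and start times are also unknown, the same recipe applies to a much larger Jacobian.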
Rank deficient.. remove the known parameters
(Figure label: Jacobian)

Number of sensors matter…

Geometry also matters…

Synchronized setup | bias 0.08 cm, sigma 3.8 cm
(Figure: room layout showing Mic 1 to Mic 4 and Speaker 1 to Speaker 4, with the Z axis marked. Room length = 4.22 m, room width = 2.55 m, room height = 2.03 m.)

Experimental results using real data

Summary
• General-purpose computers can be used for distributed array processing.
• It is possible to define a common time and space for a network of distributed sensors and actuators.
• For more information, please see our two papers or contact igor.v.kozintsev@intel.com or rainer.lienhart@intel.com.
• Let us know if you are interested in testing/using our time and space synchronization software for developing distributed algorithms on GPCs (available in January 2004).

Thank You ! | Questions ?