Self Localizing Sensors and Actuators on Distributed Computing Platforms
Vikas Raykar, Igor Kozintsev, Rainer Lienhart
Intel Labs

Motivation
- Many multimedia applications are emerging that use multiple audio/video sensors and actuators.
[Figure: distributed capture (cameras, microphones), number crunching, distributed rendering (speakers, displays), other applications]

Applications
- Smart conference rooms
- Speech recognition
- Source separation and dereverberation
- Meeting recording
- Audio/image based rendering
- Hands-free voice communication
- Multichannel speech enhancement
- Multichannel echo cancellation
- Audio/video surveillance
- Object localization and tracking
- Distributed audio/video capture
- Interactive audio-visual interfaces

Additional Motivation
- Current work has focused on setting up all the sensors and actuators on a single dedicated computing platform.
- This requires dedicated infrastructure: sensors, multi-channel interface cards, and computing power.
On the other hand…
- Computing devices such as laptops, PDAs, tablets, cellular phones, and camcorders have become pervasive.
- Audio/video sensors on different laptops can be used to form a distributed network of sensors.

Problem formulation
- Put all the distributed audio-visual I/O capabilities into a common time and space.
- In this paper:
  - Focus on providing a common space by actively estimating the 3D positions of the sensors (microphones) and actuators (speakers).
  - Account for the errors due to the lack of temporal synchronization among the various sensors and actuators (A/Ds and D/As) on distributed general purpose computing platforms.

Our View of Distributed Sensor Network
[Figure: sensor network in a 3D coordinate system with X, Y, and Z axes]

Localization with known positions of speakers
- Distances are not exact
- There are more speakers

If positions of speakers are unknown…
- Consider M microphones and S speakers.
- What can we measure? Using a calibration signal, the distance between each speaker and all microphones (Time Of Flight, TOF).
- This gives an M x S TOF matrix.
- Assume the TOF is corrupted by additive white Gaussian noise (AWGN): the maximum likelihood (ML) estimate can then be derived.
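With equal-variance Gaussian noise on each TOF, the ML estimate is the set of coordinates that minimizes the sum of squared differences between measured and predicted TOFs. The following is a minimal sketch of that cost, not the authors' implementation. The function names, the example layout, and the speed of sound (343 m/s) are assumptions made for illustration.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s; assumed value, not given in the slides


def tof_matrix(mics, speakers, c=SPEED_OF_SOUND):
    """Predicted M x S time-of-flight matrix (seconds) for given coordinates."""
    return [[math.dist(m, s) / c for s in speakers] for m in mics]


def tof_cost(mics, speakers, tof_measured, c=SPEED_OF_SOUND):
    """Sum of squared differences between measured and predicted TOFs."""
    predicted = tof_matrix(mics, speakers, c)
    return sum(
        (tof_measured[i][j] - predicted[i][j]) ** 2
        for i in range(len(mics))
        for j in range(len(speakers))
    )


# Hypothetical layout (metres): 2 microphones and 3 speakers.
mics = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
speakers = [(0.0, 1.0, 0.0), (1.0, 1.0, 0.5), (2.0, 0.0, 1.0)]

tof = tof_matrix(mics, speakers)                   # noise-free "measurements"
noisy = [[t + 1e-4 for t in row] for row in tof]   # crude stand-in for noise

print(tof_cost(mics, speakers, tof))    # zero at the true coordinates
print(tof_cost(mics, speakers, noisy))  # positive once the TOFs are perturbed
```

A nonlinear optimizer would search over the unknown coordinates to drive `tof_cost` toward its minimum. The cost here ignores the unknown capture and playback start times that arise on a distributed system.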
Nonlinear Least Squares
- Find the coordinates that minimize the squared error between the measured TOFs and the TOFs predicted from the coordinates.

Reference Coordinate System
[Figure: 2D reference frame showing the origin, the X axis, and the positive Y axis]
Similarly in 3D:
1. Fix origin (0,0,0)
2. Fix X axis (x1,0,0)
3. Fix Y axis (x2,y2,0)
4. Fix positive Z axis
x1, x2, y2 > 0
Which to choose? Later…

On a synchronized platform all is well..

However on a distributed system..

PC platform overview
[Figure: PC platform block diagram. Multimedia/multistream applications and the operating system sit above the hardware: CPU, MCH, FSB, and memory; AGP; ICH, hub, PCI, LAN, etc.; ATA, AC97, LAN, USB, and PCI slots on the I/O bus; audio/video I/O devices]

Timing on distributed system
[Figure: timeline from a common time origin. Playback starts at ts_j, when the signal is emitted by source j; capture starts at tm_i; the signal is received by microphone i. The actual TOF_ij differs from the estimated TOF_ij.]

Joint Estimation
- Measurements: M x S TOF measurements
- Unknown parameters:
  - Microphone and speaker coordinates: DM + DS - D(D+1)/2
  - Speaker emission start times: S
  - Microphone capture start times: M - 1 (assume tm_1 = 0)

Time Difference of Arrival (TDOA)
- Same formulation as above, but with fewer parameters.

Nonlinear least squares
- Levenberg-Marquardt method.
- The cost is a multidimensional function: without a good initial guess, the minimization may not converge to the global minimum.
- An approximate initial guess is required.

Multi Dimensional Scaling
- Dot product matrix B = X X^T: symmetric, positive semidefinite, rank 3.
- Given B, can you get X? Yes, by Singular Value Decomposition.

Clustering approximation
[Figure: distances indexed ii, ij, ji, and jj between GPCs (general purpose computers) i and j]

How to get the dot product matrix from the pairwise distance matrix
[Figure: triangle with vertices i, j, k and sides d_ki, d_ij, d_kj]
- Take the centroid as the origin.
- Later shift it to our original reference.
- Slightly perturb the location of each GPC into two to get the initial guess for the microphone and speaker coordinates.

Sample result in 2D

Algorithm
[Flowchart: TOF matrix -> clustering -> approx. distance matrix between GPCs, approx. ts, approx. tm -> dot product matrix -> MDS to get approx. GPC locations -> perturb -> approx. microphone and speaker locations -> TDOA based nonlinear minimization (given dimension and coordinate system) -> microphone and speaker locations, tm]

Cramer-Rao bound
- Gives the lower bound on the variance of any unbiased estimator.
- Does not depend on the estimator, only on the data and the noise model.
- Tells us to what extent the noise limits our performance, i.e. you cannot get a variance lower than the CR bound.
- The Jacobian is rank deficient: remove the known parameters.

Performance comparison

Dependence on number of nodes

Geometry matters

Experimental setup
[Figure: room of length 4.22 m, width 2.55 m, and height 2.03 m containing Mic 1 to Mic 4 and Speaker 1 to Speaker 4]
- Bias 0.08 cm, sigma 3.8 cm

Summary
- General purpose computers can be used for distributed array processing.
- It is possible to define a common time and space for a network of distributed sensors and actuators.
- For more information please see our two papers in ACM MM in November or contact igor.v.kozintsev@intel.com rainer.lienhart@intel.com
- Let us know if you are interested in testing/using our time and space synchronization software for developing distributed algorithms on GPC (available in November).

Backup

Calibration signal

Results (contd.)
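The multidimensional scaling step used for the initial guess can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code. It uses the standard classical MDS double-centering formula to build the dot product matrix with the centroid as origin, then factors it by SVD. The function name `classical_mds` and the example device positions are hypothetical, and the distances are noise free.

```python
import numpy as np


def classical_mds(dist, dim=3):
    """Recover coordinates (up to rotation/reflection) from pairwise distances."""
    dist = np.asarray(dist, dtype=float)
    n = dist.shape[0]
    # Double centering places the origin at the centroid of the points.
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ (dist ** 2) @ centering  # dot product matrix B
    # For a symmetric positive semidefinite B, the SVD gives B = U diag(s) U^T.
    u, s, _ = np.linalg.svd(gram)
    return u[:, :dim] * np.sqrt(s[:dim])


# Hypothetical device positions (metres), used only to build a distance matrix.
points = np.array([
    [0.0, 0.0, 0.0],
    [2.0, 0.0, 0.0],
    [1.0, 1.5, 0.0],
    [0.5, 0.5, 1.0],
])
dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=2)

coords = classical_mds(dist, dim=3)
# The recovered layout reproduces the pairwise distances, not the original axes.
recovered = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
print(np.allclose(recovered, dist))
```

The output is centred on the centroid and has an arbitrary orientation, so it still has to be shifted and rotated into the chosen reference coordinate system. With noisy distances the dot product matrix is only approximately rank 3, and the result serves as a starting point for the nonlinear minimization.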