Position Calibration of Audio Sensors and Actuators
in a Distributed Computing Platform
Vikas C. Raykar | Igor Kozintsev | Rainer Lienhart
University of Maryland, College Park | Intel Labs, Intel Corporation
Motivation
• Many emerging multimedia applications use multiple audio/video sensors and actuators.
[Figure: speakers (distributed rendering), cameras (distributed capture), microphones (number crunching) and displays; further labels "Current Work" and "Other Applications".]
What can you do with multiple microphones…
• Speaker localization and tracking.
• Beamforming or spatial filtering.
Some Applications…
• Speech recognition
• Hands-free voice communication
• Novel interactive audio-visual interfaces
• Multichannel speech enhancement
• Smart conference rooms
• Audio/image-based rendering
• Audio/video surveillance
• Speaker localization and tracking
• Source separation and dereverberation
• Meeting recording
• Multichannel echo cancellation
More Motivation…
• Current work has focused on setting up all the sensors and actuators on a single dedicated computing platform.
• This requires dedicated infrastructure in terms of sensors, multichannel interface cards and computing power.
On the other hand:
• Computing devices such as laptops, PDAs, tablets, cellular phones and camcorders have become pervasive.
Audio/video sensors on different laptops can be used to
form a distributed network of sensors.
[Figure: laptop with an internal microphone.]
Common TIME and SPACE
Put all the distributed audio/visual input/output capabilities of all
the laptops into a common TIME and SPACE.
For the common TIME, see our poster:
R. Lienhart, I. Kozintsev and S. Wehr, "Universal Synchronization Scheme for Distributed Audio-Video Capture on Heterogeneous Computing Platforms."
In this paper we deal with the common SPACE, i.e., we estimate the 3D positions of the sensors and actuators.
Why a common SPACE?
Most array processing algorithms require that the precise positions of the microphones be known.
Measuring these positions manually is painful and imprecise.
This paper is about…
[Figure: 3D coordinate system with axes X, Y and Z.]
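If we know the positions of the speakers…
[Figure: known speaker positions and an unknown microphone position ("?") in the X-Y plane.]
If the distances are not exact, or if we have more speakers, solve in the least-squares sense.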
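A minimal sketch of this case in Python with NumPy, assuming the speaker positions are known exactly and only the distances are noisy. It uses the common linearization that subtracts the squared-range equation of the first speaker from the others and solves the resulting linear system in the least-squares sense. This is a swapped-in closed-form variant, not the nonlinear ML fit that follows, and the function name `locate_microphone` is hypothetical.

```python
import numpy as np


def locate_microphone(speakers, distances):
    """Estimate one microphone position from known speaker positions.

    speakers:  (S, D) array of known speaker coordinates, S >= D + 1.
    distances: (S,) array of measured speaker-to-microphone distances.
    """
    speakers = np.asarray(speakers, dtype=float)
    distances = np.asarray(distances, dtype=float)
    s0, d0 = speakers[0], distances[0]
    # ||m - s_j||^2 = d_j^2 minus the same equation for speaker 0 is linear in m:
    # 2 (s_j - s_0) . m = ||s_j||^2 - ||s_0||^2 - d_j^2 + d_0^2
    A = 2.0 * (speakers[1:] - s0)
    b = (np.sum(speakers[1:] ** 2, axis=1) - np.sum(s0 ** 2)
         - distances[1:] ** 2 + d0 ** 2)
    # Least-squares solution of A m = b; exact when the distances are exact.
    m, *_ = np.linalg.lstsq(A, b, rcond=None)
    return m


if __name__ == "__main__":
    speakers = np.array([[0.0, 0.0], [4.0, 0.0], [0.0, 3.0], [4.0, 3.0]])
    true_mic = np.array([1.5, 1.0])
    rng = np.random.default_rng(0)
    noisy = np.linalg.norm(speakers - true_mic, axis=1) + rng.normal(0.0, 0.01, 4)
    print(locate_microphone(speakers, noisy))  # an estimate near the true position
```

The linearized form needs no initial guess, but it weights the errors differently from the ML cost, so it is best seen as a starting point rather than the final estimate.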
If the positions of the speakers are unknown…
Consider M microphones and S speakers.
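What can we measure?
[Figure: a speaker plays a calibration signal that is received by the microphones.]
The distance between each speaker and all microphones, or equivalently the time of flight (TOF): an M × S TOF matrix.
Assume that the TOF is corrupted by Gaussian noise; we can then derive the maximum likelihood (ML) estimate.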
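Nonlinear Least Squares…
More formally, we can derive the ML estimate using a Gaussian noise model.
Find the coordinates that minimize the sum of squared differences between the measured and the predicted speaker-to-microphone distances.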
Maximum Likelihood (ML) Estimate…
We can define a noise model and derive the ML estimate, i.e., maximize the likelihood of the measurements.
If the noise is Gaussian and independent, the ML estimate is the same as the least-squares estimate.
[Figure: Gaussian noise model.]
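Written out, assuming independent measured distances $\hat d_{ij} \sim \mathcal{N}(\|\mathbf m_i - \mathbf s_j\|, \sigma_{ij}^2)$ for microphone $i$ and speaker $j$, with $\hat D$ the matrix of measured distances and $\Theta$ all coordinates (the symbols are notation chosen here, not taken from the slides), the negative log-likelihood reduces to a weighted sum of squares; with equal variances this is ordinary nonlinear least squares:

```latex
p(\hat{D} \mid \Theta) = \prod_{i=1}^{M}\prod_{j=1}^{S}
  \frac{1}{\sqrt{2\pi}\,\sigma_{ij}}
  \exp\!\left(-\frac{\left(\hat{d}_{ij} - \|\mathbf{m}_i - \mathbf{s}_j\|\right)^2}{2\sigma_{ij}^2}\right)

\hat{\Theta}_{\mathrm{ML}} = \arg\max_{\Theta} \ln p(\hat{D} \mid \Theta)
  = \arg\min_{\Theta} \sum_{i=1}^{M}\sum_{j=1}^{S}
  \frac{\left(\hat{d}_{ij} - \|\mathbf{m}_i - \mathbf{s}_j\|\right)^2}{\sigma_{ij}^2}
```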
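Reference Coordinate System | Multiple Global Minima
Distances do not change under translation, rotation or reflection, so the cost has multiple global minima. We therefore fix a reference coordinate system.
[Figure: 2D frame with the origin, the X axis and the positive Y axis; similarly in 3D.]
1. Fix the origin: (0,0,0).
2. Fix the X axis: (x1,0,0).
3. Fix the Y axis: (x2,y2,0).
4. Fix the positive Z axis.
Here x1, x2, y2 > 0.
Which sensors to choose? Later…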
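The frame-fixing steps can be sketched in Python with NumPy (the function `to_reference_frame` and its `ref` argument are hypothetical): translate, build an orthonormal basis from the chosen points by Gram-Schmidt, and reflect if needed. This sketch enforces x1 > 0, y2 > 0 and a positive Z coordinate for the fourth point; the sign of x2 depends on which sensors are chosen as references.

```python
import numpy as np


def to_reference_frame(points, ref=(0, 1, 2, 3)):
    """Express 3D points in the frame fixed by four chosen reference points.

    ref = (i0, i1, i2, i3): point i0 -> origin, i1 -> positive X axis,
    i2 -> XY plane with positive Y, i3 -> positive Z. Distances are preserved.
    """
    P = np.asarray(points, dtype=float)
    i0, i1, i2, i3 = ref
    Q = P - P[i0]                                # translate: point i0 at the origin
    e1 = Q[i1] / np.linalg.norm(Q[i1])           # X axis through point i1
    v = Q[i2] - (Q[i2] @ e1) * e1                # part of point i2 orthogonal to X
    e2 = v / np.linalg.norm(v)                   # Y axis, so that y of point i2 > 0
    e3 = np.cross(e1, e2)                        # Z axis completes a right-handed frame
    out = Q @ np.vstack([e1, e2, e3]).T          # coordinates in the new basis
    if out[i3, 2] < 0:                           # reflect so point i3 has z > 0
        out[:, 2] = -out[:, 2]
    return out
```

Fixing the frame this way removes exactly the 6 degrees of freedom (3 translations, 3 rotations) plus the reflection that leave all distances unchanged.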
On a synchronized platform, all is well…
However, on a distributed system…
The journey of an audio sample…
This laptop wants to play a calibration signal on the other laptop.
The play command is issued in software and travels over the network.
When will the sound actually be played out from the loudspeaker?
[Figure: the path through the multimedia/multistream applications, the operating system, the I/O bus and the audio/video I/O devices.]
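On a distributed system…
[Figure: timeline from the time origin. Playback starts and source j emits the signal at ts_j; capture starts at tm_i; the signal is received by microphone i after the true time of flight TOF_ij, while the measured time of flight is $\widehat{\mathrm{TOF}}_{ij}$.]
Hence $\widehat{\mathrm{TOF}}_{ij} = \mathrm{TOF}_{ij} + ts_j - tm_i$.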
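The timing model, in which the measured TOF equals the true TOF plus the emission start time ts_j minus the capture start time tm_i, can be simulated directly. A small sketch in Python with NumPy; the speed of sound (343 m/s) and the function name `measured_tof` are assumptions for illustration:

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, assumed (dry air at about 20 degrees C)


def measured_tof(mics, speakers, ts, tm, c=SPEED_OF_SOUND):
    """Simulate TOF_hat[i, j] = ||m_i - s_j|| / c + ts[j] - tm[i].

    mics: (M, D), speakers: (S, D), ts: (S,) playback start times,
    tm: (M,) capture start times, all in a common (unobservable) clock.
    """
    mics = np.asarray(mics, dtype=float)
    speakers = np.asarray(speakers, dtype=float)
    # True time of flight for every (microphone, speaker) pair.
    tof = np.linalg.norm(mics[:, None, :] - speakers[None, :, :], axis=-1) / c
    # Unknown start times shift every column (ts) and every row (tm).
    return tof + np.asarray(ts, dtype=float)[None, :] - np.asarray(tm, dtype=float)[:, None]
```

Because ts_j shifts a whole column and tm_i a whole row, the offsets cannot be separated from the geometry without estimating them jointly.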
Joint Estimation…
Observations: MS TOF measurements.
Parameters to estimate:
• Microphone and speaker coordinates: 3(M+S) - 6
• Speaker emission start times: S
• Microphone capture start times: M - 1 (assume tm_1 = 0)
In total, 4M + 4S - 7 parameters to estimate from MS observations.
We can reduce the number of parameters.
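A quick count in Python of unknowns versus observations, following the breakdown above (the helper names are hypothetical). It is only a counting argument, a necessary rather than sufficient condition for the problem to be well posed; with M = S, the observations first catch up with the parameters at seven microphones and seven speakers.

```python
def parameter_count(M, S):
    """Unknowns in the joint 3D estimation, following the slide's breakdown."""
    coordinates = 3 * (M + S) - 6   # positions, minus 6 fixed by the reference frame
    emission_starts = S             # ts_j, one per speaker
    capture_starts = M - 1          # tm_i, with tm_1 = 0 as the time reference
    return coordinates + emission_starts + capture_starts


def observation_count(M, S):
    return M * S                    # one TOF per (microphone, speaker) pair


def smallest_balanced_setup(max_n=100):
    """Smallest n with M = S = n such that observations >= parameters.

    Starts at n = 2 because M + S >= 4 points are needed to fix a 3D frame.
    """
    for n in range(2, max_n + 1):
        if observation_count(n, n) >= parameter_count(n, n):
            return n
    return None


if __name__ == "__main__":
    print(parameter_count(4, 4), observation_count(4, 4))
    print(smallest_balanced_setup())
```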
Nonlinear least squares…
Levenberg-Marquardt method:
• The cost is a function of a large number of parameters.
• Unless we have a good initial guess, the minimization may not converge to the global minimum.
• An approximate initial guess is required.
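A compact Levenberg-Marquardt loop, sketched in Python with NumPy for the simpler synchronized case where the residuals are predicted minus measured microphone-to-speaker distances (no start-time unknowns). It uses Levenberg's λI damping rather than Marquardt's diagonal scaling, does not fix the reference frame (the damping keeps the rank-deficient system solvable), and starts from a perturbed initial guess, which is why a good starting point matters. All names are hypothetical.

```python
import numpy as np


def distance_residuals(x, d_hat, M, S, dim):
    """Residuals ||m_i - s_j|| - d_hat[i, j] and their Jacobian."""
    P = x.reshape(M + S, dim)
    diff = P[:M, None, :] - P[None, M:, :]           # (M, S, dim): m_i - s_j
    dist = np.linalg.norm(diff, axis=-1)             # (M, S)
    r = (dist - d_hat).ravel()
    u = diff / dist[..., None]                       # unit vectors s_j -> m_i
    J = np.zeros((M * S, (M + S) * dim))
    for i in range(M):
        for j in range(S):
            row = i * S + j                          # matches ravel() order
            J[row, i * dim:(i + 1) * dim] = u[i, j]             # d r / d m_i
            J[row, (M + j) * dim:(M + j + 1) * dim] = -u[i, j]  # d r / d s_j
    return r, J


def levenberg_marquardt(x0, d_hat, M, S, dim=2, lam=1e-3, iters=200, tol=1e-12):
    """Minimize the sum of squared residuals with damped Gauss-Newton steps."""
    x = np.asarray(x0, dtype=float).ravel().copy()
    r, J = distance_residuals(x, d_hat, M, S, dim)
    cost = r @ r
    for _ in range(iters):
        A, g = J.T @ J, J.T @ r
        # lam * I keeps the system solvable although J^T J is rank deficient
        # (translations and rotations are not fixed in this sketch).
        step = np.linalg.solve(A + lam * np.eye(A.shape[0]), -g)
        r_new, J_new = distance_residuals(x + step, d_hat, M, S, dim)
        cost_new = r_new @ r_new
        if cost_new < cost:        # accept: behave more like Gauss-Newton
            x, r, J, cost = x + step, r_new, J_new, cost_new
            lam = max(lam * 0.3, 1e-9)
            if cost < tol:
                break
        else:                      # reject: behave more like gradient descent
            lam *= 10.0
    return x.reshape(M + S, dim), cost
```

Started far from the solution (for example, from a mirrored or random layout), the same loop can settle in a local minimum, which is the motivation for the closed-form initialization below.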
Closed-Form Solution…
Suppose we are given all pairwise distances between N points. Can we get the coordinates?
Pairwise distance matrix (X = known):
     1  2  3  4
1    X  X  X  X
2    X  X  X  X
3    X  X  X  X
4    X  X  X  X
Classical Metric Multidimensional Scaling
Dot product matrix B = XX^T, where the rows of X are the N point coordinates:
• symmetric positive semidefinite,
• rank 3.
Given B, can we get X? Yes, by singular value decomposition.
Same as principal component analysis.
But we can measure only the pairwise distance matrix.
How do we get the dot product matrix from the pairwise distance matrix?
[Figure: triangle with vertices i, j, k and side lengths d_ki, d_kj, d_ij.]
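For the triangle, the law of cosines gives the dot product with point k as origin; the double-centered form on the second line is the standard classical MDS construction, which uses the centroid as the origin instead ($H$ is the centering matrix and $D^{(2)}$ the matrix of squared distances, notation chosen for this note):

```latex
(\mathbf{x}_i - \mathbf{x}_k)^{\top}(\mathbf{x}_j - \mathbf{x}_k)
  = \tfrac{1}{2}\left(d_{ki}^2 + d_{kj}^2 - d_{ij}^2\right)

B = -\tfrac{1}{2}\, H D^{(2)} H, \qquad
H = I - \tfrac{1}{N}\mathbf{1}\mathbf{1}^{\top}, \qquad
D^{(2)}_{ij} = d_{ij}^2
```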
Example of MDS…
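A minimal classical MDS sketch in Python with NumPy (the function name `classical_mds` is hypothetical). It double-centers the squared distances to get the dot product matrix and then uses a symmetric eigendecomposition, which for a positive semidefinite B plays the role of the SVD. The coordinates come back only up to rotation, translation and reflection, so the check compares distances, not coordinates.

```python
import numpy as np


def classical_mds(D, dim=3):
    """Coordinates (up to rotation, translation, reflection) from distances."""
    D = np.asarray(D, dtype=float)
    N = D.shape[0]
    H = np.eye(N) - np.ones((N, N)) / N        # centering matrix
    B = -0.5 * H @ (D ** 2) @ H                # dot product matrix (centroid origin)
    w, V = np.linalg.eigh(B)                   # eigenvalues in ascending order
    idx = np.argsort(w)[::-1][:dim]            # keep the `dim` largest
    w_top = np.clip(w[idx], 0.0, None)         # guard against small negatives from noise
    return V[:, idx] * np.sqrt(w_top)          # X with B ~= X X^T
```

With noisy distances B is no longer exactly rank 3; keeping the three largest eigenvalues gives the best rank-3 approximation, the same truncation PCA performs.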
Can we use MDS? Two problems:
Distance matrix (X = measured, ? = not measured):
      s1  s2  s3  s4  m1  m2  m3  m4
s1    ?   ?   ?   ?   X   X   X   X
s2    ?   ?   ?   ?   X   X   X   X
s3    ?   ?   ?   ?   X   X   X   X
s4    ?   ?   ?   ?   X   X   X   X
m1    X   X   X   X   ?   ?   ?   ?
m2    X   X   X   X   ?   ?   ?   ?
m3    X   X   X   X   ?   ?   ?   ?
m4    X   X   X   X   ?   ?   ?   ?
1. We do not have the complete set of pairwise distances: speaker-to-speaker and microphone-to-microphone distances are not measured.
2. The measured distances include the effect of the lack of synchronization.
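Clustering approximation…
Assume that the microphone and the speaker on the same laptop (GPC) are co-located, so that $\mathrm{TOF}_{ii} \approx 0$ and $\mathrm{TOF}_{ij} \approx \mathrm{TOF}_{ji}$. Then the synchronization offsets cancel:
$$d_{ij} \approx \frac{c}{2}\left(\widehat{\mathrm{TOF}}_{ij} + \widehat{\mathrm{TOF}}_{ji} - \widehat{\mathrm{TOF}}_{ii} - \widehat{\mathrm{TOF}}_{jj}\right)$$
where c is the speed of sound.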
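The clustering approximation is a few array operations. A sketch in Python with NumPy, assuming a square TOF matrix where microphone i and speaker i sit on the same GPC, a speed of sound of 343 m/s, and $d_{ij} \approx \frac{c}{2}(\widehat{\mathrm{TOF}}_{ij} + \widehat{\mathrm{TOF}}_{ji} - \widehat{\mathrm{TOF}}_{ii} - \widehat{\mathrm{TOF}}_{jj})$; the diagonal also gives the per-GPC offset $ts_i - tm_i$. The name `gpc_distances` is hypothetical.

```python
import numpy as np


def gpc_distances(tof_hat, c=343.0):
    """Clustering approximation: mic i and speaker i assumed co-located on GPC i.

    tof_hat[i, j]: measured TOF from speaker j to microphone i (seconds),
    modeled as TOF_ij + ts_j - tm_i.
    Returns the approximate GPC-to-GPC distance matrix and ts_i - tm_i per GPC.
    """
    T = np.asarray(tof_hat, dtype=float)
    diag = np.diag(T)                        # TOF_hat_ii ~ ts_i - tm_i (TOF_ii ~ 0)
    # T_ij + T_ji - T_ii - T_jj ~ 2 TOF_ij: the start-time offsets cancel.
    d = 0.5 * c * (T + T.T - diag[:, None] - diag[None, :])
    np.fill_diagonal(d, 0.0)
    return d, diag
```

The resulting distance matrix is complete, so it can be passed straight to classical MDS to get approximate GPC locations.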
Finally, the complete algorithm…
TOF matrix
→ Clustering approximation: approx. ts, approx. tm, and the distance matrix between GPCs (general-purpose computers)
→ Dot product matrix
→ MDS to get approx. GPC locations
→ Perturb to get approx. microphone and speaker locations
→ Dimension and coordinate system
→ TDOA-based nonlinear minimization (with approx. tm)
→ Microphone and speaker locations
Sample result in 2D…
Algorithm Performance…
The performance of our algorithm depends on:
• the noise variance in the estimated distances,
• the number of microphones and speakers,
• the microphone and speaker geometry.
• One way to study this dependence is to run many Monte Carlo simulations.
• Alternatively, given a noise model, we can derive a bound on how well our algorithm can possibly perform.
The Cramér-Rao bound
• It gives a lower bound on the variance of any unbiased estimator.
• It does not depend on the estimator, only on the data and the noise model.
• It tells us to what extent the noise limits our performance, i.e., no unbiased estimator can achieve a variance lower than the CR bound.
Rank deficiency: remove the known parameters (those fixed by the reference coordinate system) from the Jacobian before computing the bound.
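A simplified Cramér-Rao computation in Python with NumPy, assuming i.i.d. Gaussian noise of standard deviation sigma on the distances and, unlike the joint problem on these slides, known speaker positions, so that only one microphone's coordinates are unknown and no rank deficiency arises. The bound is the inverse Fisher information $\sigma^2 (J^\top J)^{-1}$, where each row of the Jacobian $J$ is the unit vector from a speaker to the microphone. The name `crb_microphone` is hypothetical.

```python
import numpy as np


def crb_microphone(mic, speakers, sigma):
    """Cramer-Rao bound (covariance) on one microphone's position.

    Assumed model: d_hat_j = ||m - s_j|| + n_j, n_j ~ N(0, sigma^2) i.i.d.,
    with the speaker positions s_j known.
    """
    diff = np.asarray(mic, dtype=float) - np.asarray(speakers, dtype=float)  # (S, D)
    J = diff / np.linalg.norm(diff, axis=1, keepdims=True)  # d ||m - s_j|| / d m
    fisher = J.T @ J / sigma ** 2                            # Fisher information
    return np.linalg.inv(fisher)                             # lower bound on covariance


if __name__ == "__main__":
    speakers = np.array([[0.0, 0.0], [4.0, 0.0], [0.0, 3.0], [4.0, 3.0]])
    bound = crb_microphone([1.0, 1.0], speakers, sigma=0.01)
    print("RMS position error >=", np.sqrt(np.trace(bound)), "m")
```

Adding speakers adds positive semidefinite terms to the Fisher information, so the bound can only shrink, which is one way to see why the number of sensors matters; moving the speakers changes the rows of J, which is how geometry enters.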
The number of sensors matters…
Geometry also matters…
Synchronized setup: bias 0.08 cm, sigma 3.8 cm
[Figure: experimental setup with speakers 1 to 4 and microphones 1 to 4; room width 2.55 m, room length 4.22 m, room height 2.03 m.]
Experimental results using real data
Summary
• General-purpose computers can be used for distributed array processing.
• It is possible to define a common time and space for a network of distributed sensors and actuators.
• For more information, please see our two papers or contact
igor.v.kozintsev@intel.com
rainer.lienhart@intel.com
• Let us know if you are interested in testing or using our time and space synchronization software for developing distributed algorithms on GPCs (available in January 2004).
Thank you! | Questions?