Position Calibration of Acoustic Sensors and Actuators
on Distributed General Purpose Computing Platforms
Vikas Chandrakant Raykar | University of Maryland, College Park
Motivation
 Many emerging multimedia applications use multiple audio/video sensors and actuators.
[Figure: general purpose computers with audio/video sensors and actuators (speakers, cameras, microphones, displays) supporting distributed rendering, distributed capture, and number crunching; the scope of the current thesis and other applications is indicated.]
What can you do with multiple microphones…
 Speaker localization and tracking.
 Beamforming or Spatial filtering.
Some Applications…
 Speech recognition
 Hands-free voice communication
 Novel interactive audio-visual interfaces
 Multichannel speech enhancement
 Smart conference rooms
 Audio/image based rendering
 Audio/video surveillance
 Speaker localization and tracking
 Source separation and dereverberation
 Meeting recording
 Multichannel echo cancellation
More Motivation…
 Current work has focused on setting up all the sensors and actuators on a single dedicated computing platform.
 Dedicated infrastructure is required in terms of sensors, multi-channel interface cards, and computing power.
On the other hand…
 Computing devices such as laptops, PDAs, tablets, cellular phones, and camcorders have become pervasive.
 Audio/video sensors on different laptops can be used to form a distributed network of sensors.
Common TIME and SPACE
Put all the distributed audio/visual input/output capabilities of all the laptops into a common TIME and SPACE.
This thesis deals with common SPACE, i.e. estimating the 3D positions of the sensors and actuators.
Why common SPACE?
Most array processing algorithms require that the precise positions of the microphones be known.
Manual measurement is painful, tedious, and imprecise.
This thesis is about..
[Figure: locating a microphone from its distances to speakers, shown on X, Y, Z axes.]
If we know the positions of the speakers, a microphone can be located from its distances to them. If the distances are not exact and we have more speakers than needed, solve in the least-squares sense.
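As an illustration of this least-squares step, here is a minimal sketch (Python with numpy/scipy; the speaker layout, noise level, and true microphone position are made up for the example) of locating one microphone from noisy distances to speakers at known positions:

```python
import numpy as np
from scipy.optimize import least_squares

def locate_microphone(speaker_pos, distances, x0=None):
    """Estimate one microphone position from measured distances to
    speakers at known positions, in the least-squares sense."""
    speaker_pos = np.asarray(speaker_pos, float)   # (S, 3) known speaker coordinates
    distances = np.asarray(distances, float)       # (S,) measured distances
    if x0 is None:
        x0 = speaker_pos.mean(axis=0)              # start at the speaker centroid

    def residuals(x):
        return np.linalg.norm(speaker_pos - x, axis=1) - distances

    return least_squares(residuals, x0).x

# Toy usage: four made-up speakers and a true microphone at (1, 2, 0.5).
speakers = np.array([[0, 0, 0], [3, 0, 0], [0, 3, 0], [0, 0, 3]], float)
mic_true = np.array([1.0, 2.0, 0.5])
d = np.linalg.norm(speakers - mic_true, axis=1) + 0.01 * np.random.randn(4)
print(locate_microphone(speakers, d))              # close to mic_true
```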
If positions of speakers unknown…
 Consider M microphones and S speakers.
 What can we measure? Using a calibration signal, the distance between each speaker and all the microphones, or equivalently the Time Of Flight (TOF).
 This gives an M x S TOF matrix.
 Assume the TOF is corrupted by Gaussian noise; we can then derive the ML estimate.
Nonlinear Least Squares..
More formally, we can derive the ML estimate using a Gaussian noise model.
Find the microphone and speaker coordinates which minimize
\sum_{i=1}^{M} \sum_{j=1}^{S} \left( \widehat{TOF}_{ij} - \frac{\| m_i - s_j \|}{c} \right)^2
Maximum Likelihood (ML) Estimate..
We can define a noise model and derive the ML estimate, i.e. maximize the likelihood.
If the noise is Gaussian and independent, the ML estimate is the same as the least-squares estimate.
Reference Coordinate system | Multiple Global minima
Reference Coordinate System (similarly in 3D):
1. Fix the origin: (0,0,0)
2. Fix the X axis: (x1,0,0)
3. Fix the Y axis: (x2,y2,0)
4. Fix the positive Z axis
with x1, x2, y2 > 0.
Which to choose? Later…
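Since any solution is only determined up to a rigid motion and a reflection, here is a small sketch of fixing the reference frame as above (numpy; the choice of using the first four estimated points to pin the axes is my assumption, since the slide leaves "which to choose" for later):

```python
import numpy as np

def fix_reference_frame(X):
    """Map estimated points X (N x 3, N >= 4, non-degenerate) into the frame above:
    point 0 at the origin, point 1 on the positive X axis,
    point 2 in the z = 0 plane with y > 0, point 3 with z > 0."""
    Y = X - X[0]                          # 1. fix the origin
    e1 = Y[1] / np.linalg.norm(Y[1])      # 2. X axis through point 1
    v2 = Y[2] - (Y[2] @ e1) * e1
    e2 = v2 / np.linalg.norm(v2)          # 3. Y axis so point 2 has y > 0, z = 0
    e3 = np.cross(e1, e2)                 # right-handed Z axis
    Y = Y @ np.vstack([e1, e2, e3]).T     # rotate into the new axes
    if Y[3, 2] < 0:                       # 4. reflect so point 3 has positive z
        Y[:, 2] *= -1
    return Y
```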
On a synchronized platform all is well..
However, on a distributed system..
The journey of an audio sample..
One laptop wants to play a calibration signal on another laptop. The play command is issued in software and travels over the network; when will the sound actually be played out from the loudspeaker? On the way it passes through the multimedia/multistream application, the operating system, the I/O bus, and finally the audio/video I/O devices.
On a Distributed system..
[Figure: timeline with a common time origin; playback on source j actually starts at ts_j, capture on microphone i actually starts at tm_i, and the signal emitted by source j reaches microphone i after TOF_ij.]
The measured time of flight therefore includes the unknown start times:
\widehat{TOF}_{ij} = TOF_{ij} + ts_j - tm_i
Joint Estimation..
MS TOF measurements (observations).
Parameters to estimate:
 Microphone and speaker coordinates: 3(M+S) - 6
 Speaker emission start times: S
 Microphone capture start times: M - 1 (assume tm_1 = 0)
In total 4M + 4S - 7 parameters from MS observations.
We can reduce the number of parameters.
Use Time Difference of Arrival (TDOA)..
The formulation is the same as above, but with fewer parameters.
Assuming M = S = K, the minimum K required..
Nonlinear least squares..
Minimized using the Levenberg-Marquardt method.
The objective is a function of a large number of parameters; unless we have a good initial guess, it may not converge to the global minimum.
An approximate initial guess is required.
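A hedged sketch of this joint minimization, assuming the TOF model with emission/capture start times from the earlier slides and an assumed speed of sound; it uses scipy.optimize.least_squares (trust-region by default; method='lm' selects the Levenberg-Marquardt variant named above), and the reference frame still has to be fixed separately:

```python
import numpy as np
from scipy.optimize import least_squares

C = 343.0  # assumed speed of sound (m/s)

def refine_positions(tof_hat, mics0, spks0, ts0, tm0):
    """Jointly refine microphone/speaker coordinates and the speaker
    emission / microphone capture start times from the measured M x S
    TOF matrix, using tof_hat[i, j] = ||m_i - s_j||/C + ts_j - tm_i
    with tm_0 fixed to 0."""
    M, S = tof_hat.shape

    def pack(m, s, ts, tm):
        return np.concatenate([m.ravel(), s.ravel(), ts, tm[1:]])

    def unpack(p):
        m = p[:3 * M].reshape(M, 3)
        s = p[3 * M:3 * (M + S)].reshape(S, 3)
        ts = p[3 * (M + S):3 * (M + S) + S]
        tm = np.concatenate([[0.0], p[3 * (M + S) + S:]])
        return m, s, ts, tm

    def residuals(p):
        m, s, ts, tm = unpack(p)
        dist = np.linalg.norm(m[:, None, :] - s[None, :, :], axis=2)  # M x S distances
        pred = dist / C + ts[None, :] - tm[:, None]
        return (pred - tof_hat).ravel()

    return unpack(least_squares(residuals, pack(mics0, spks0, ts0, tm0)).x)
```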
Closed form Solution..
Say we are given all pairwise distances between N points: can we get the coordinates?
[Figure: fully populated 4 x 4 matrix of pairwise distances between points 1-4.]
Classical Metric Multi Dimensional Scaling
The dot product matrix B = X X^T is symmetric, positive semidefinite, and of rank 3.
Given B, can you get X? Singular Value Decomposition:
B = U \Lambda U^T, \qquad X = U_3 \Lambda_3^{1/2}
This is the same as Principal Component Analysis.
But we can measure only the pairwise distance matrix.
How to get the dot product matrix from the pairwise distance matrix…
[Figure: triangle with vertices i, j, k and side lengths d_ki, d_kj, d_ij.]
Taking point k as the origin, the law of cosines gives the dot products:
b_{ij} = \frac{1}{2} \left( d_{ki}^2 + d_{kj}^2 - d_{ij}^2 \right)
Centroid as the origin…
Taking the centroid as the origin (we later shift back to our original reference frame):
b_{ij} = -\frac{1}{2} \left( d_{ij}^2 - \frac{1}{N}\sum_{k} d_{ik}^2 - \frac{1}{N}\sum_{k} d_{kj}^2 + \frac{1}{N^2}\sum_{k,l} d_{kl}^2 \right)
Slightly perturb each GPC location into two points to get the initial guess for the microphone and speaker coordinates.
Example of MDS…
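In place of the example figure, a self-contained sketch of classical metric MDS on toy data (numpy; the random points are made up):

```python
import numpy as np

def classical_mds(D, dim=3):
    """Classical metric MDS: recover coordinates (up to a rigid motion and
    reflection) from an N x N matrix of pairwise Euclidean distances D."""
    D = np.asarray(D, float)
    N = D.shape[0]
    J = np.eye(N) - np.ones((N, N)) / N        # centering matrix (centroid as origin)
    B = -0.5 * J @ (D ** 2) @ J                # dot product (Gram) matrix
    w, V = np.linalg.eigh(B)                   # eigen-decomposition (B is symmetric)
    idx = np.argsort(w)[::-1][:dim]            # keep the 'dim' largest eigenvalues
    return V[:, idx] * np.sqrt(np.maximum(w[idx], 0.0))

# Toy usage: the recovered points reproduce the pairwise distances.
X = np.random.rand(5, 3)
D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
X_hat = classical_mds(D)
print(np.allclose(D, np.linalg.norm(X_hat[:, None, :] - X_hat[None, :, :], axis=2)))
```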
MDS is more general..
• Instead of pairwise distances we can use pairwise "dissimilarities".
• When the distances are Euclidean, MDS is equivalent to PCA.
• E.g. face recognition, wine tasting.
• Can get the significant cognitive dimensions.
Can we use MDS..Two problems
[Table: full pairwise distance matrix over speakers s1-s4 and microphones m1-m4; only the speaker-to-microphone entries are measured, while the speaker-to-speaker and microphone-to-microphone entries are missing.]
1. We do not have the complete pairwise distance matrix.
2. The measured distances include the effect of the lack of synchronization.
Clustering approximation…
The microphone and the speaker on the same GPC are approximately at the same location. Summing the forward and reverse TOFs between two GPCs and subtracting the same-GPC TOFs cancels the unknown start times:
d_{ij} \approx \frac{c}{2} \left( \widehat{TOF}_{ij} + \widehat{TOF}_{ji} - \widehat{TOF}_{ii} - \widehat{TOF}_{jj} \right)
giving an approximate distance matrix between the GPCs.
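A minimal sketch of this step (numpy; assumes one microphone and one speaker per GPC, so the TOF matrix is square with matching indices, and an assumed speed of sound):

```python
import numpy as np

C = 343.0  # assumed speed of sound (m/s)

def gpc_distance_matrix(tof_hat):
    """Approximate inter-GPC distances from the square TOF matrix using the
    clustering approximation: the unknown start times cancel in
    tof_hat[i, j] + tof_hat[j, i] - tof_hat[i, i] - tof_hat[j, j]."""
    T = np.asarray(tof_hat, float)
    d = T + T.T - np.diag(T)[:, None] - np.diag(T)[None, :]
    return 0.5 * C * d
```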
Finally the complete algorithm…
1. From the TOF matrix, the clustering approximation gives approximate ts, approximate tm, and an approximate distance matrix between the GPCs.
2. Form the dot product matrix and apply MDS to get approximate GPC locations.
3. Perturb the GPC locations to get approximate microphone and speaker locations.
4. Fix the dimension and the coordinate system.
5. Run the TDOA-based nonlinear minimization to obtain the final microphone and speaker locations.
Sample result in 2D…
Algorithm Performance…
•The performance of our algorithm depends on
 •Noise variance in the estimated distances.
 •Number of microphones and speakers.
 •Microphone and speaker geometry.
•One way to study the dependence is to run a large number of Monte Carlo simulations.
•Alternatively, we can derive the covariance matrix and bias of the estimator.
 •The ML estimate is implicitly defined as the minimum of a certain error function.
 •We cannot get an exact analytical expression for the mean and variance.
•Or, given a noise model, we can derive bounds on how badly our algorithm can perform.
 •The Cramer-Rao bound.
Estimator Variance…
 We can use the implicit function theorem and a Taylor series expansion to get approximate expressions for the bias and variance.
•J. A. Fessler. Mean and variance of implicitly defined biased estimators (such as penalized maximum likelihood): Applications to tomography. IEEE Trans. Image Processing, 5(3):493-506, 1996.
•A. Roy Chowdhury and R. Chellappa. Statistical bias and the accuracy of 3D reconstruction from video. Submitted to International Journal of Computer Vision.
Using a first-order Taylor series expansion, the covariance is expressed in terms of the Jacobian; the Jacobian is rank deficient, so the known parameters are removed.
Cramer-Rao bound:
 Gives the lower bound on the variance of any unbiased estimator.
 Does not depend on the estimator, just the data and the noise model.
 Basically tells us to what extent the noise limits our performance, i.e. you cannot get a variance lower than the CR bound.
It is also computed from the Jacobian, which is again rank deficient, so the known parameters are removed.
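A hedged sketch of evaluating this bound numerically, assuming i.i.d. Gaussian TOF noise with standard deviation sigma and a Jacobian from which the known (fixed) parameters have already been removed, as noted above:

```python
import numpy as np

def crb_covariance(J, sigma):
    """Cramer-Rao covariance for the unknown parameters under i.i.d.
    Gaussian noise: the Fisher information is J^T J / sigma^2, so the
    bound is its inverse. J must have full column rank (known parameters
    removed). The diagonal gives the per-parameter variance bounds."""
    return sigma ** 2 * np.linalg.inv(J.T @ J)
```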
Different Estimators..
Number of sensors matter…
Geometry also matters…
Calibration Signal…
Time Delay Estimation…
• Compute the cross-correlation between the signals received at the two microphones.
• The location of the peak in the cross-correlation gives an estimate of the delay.
The task is complicated for two reasons:
1. Background noise.
2. Channel multipath due to room reverberations.
Use the Generalized Cross Correlation (GCC):
R_{12}(\tau) = \int W(\omega) X_1(\omega) X_2^*(\omega) e^{j \omega \tau} \, d\omega
where W(\omega) is the weighting function.
PHAT (Phase Transform) weighting: W(\omega) = \frac{1}{| X_1(\omega) X_2^*(\omega) |}
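A minimal GCC-PHAT sketch (numpy; the sign convention, zero-padding, and regularization constant are my choices):

```python
import numpy as np

def gcc_phat(x1, x2, fs):
    """GCC-PHAT delay estimate: returns tau (seconds) such that x2(t) is
    approximately x1(t - tau), i.e. x2 lags x1 by tau."""
    n = len(x1) + len(x2)                       # zero-pad to avoid circular wrap-around
    X1 = np.fft.rfft(x1, n)
    X2 = np.fft.rfft(x2, n)
    cross = X2 * np.conj(X1)
    cross /= np.abs(cross) + 1e-12              # PHAT weighting: keep only the phase
    r = np.fft.irfft(cross, n)
    max_shift = n // 2
    r = np.concatenate((r[-max_shift:], r[:max_shift]))   # lags -max_shift .. max_shift-1
    return (np.argmax(np.abs(r)) - max_shift) / fs
```

The same peak-picking idea applies when correlating the known calibration signal with a captured signal to obtain a time of arrival.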
Synchronized setup | bias 0.08 cm sigma 3.8 cm
[Figure: experimental setup with microphones 1-4 and speakers 1-4 in a room of length 4.22 m, width 2.55 m, and height 2.03 m.]
Distributed Setup…
[Figure: a master GPC coordinating GPC 1, GPC 2, …, GPC M over the network.]
Initialization phase: scan the network and find the number of GPCs and the UPnP services available.
Each GPC plays the calibration signal (an ML sequence); the TOAs are computed to form the TOA matrix, which is used for position estimation.
Experimental results using real data
Related Previous work…
J. M. Sachar, H. F. Silverman, and W. R. Patterson III. Position calibration of large-aperture microphone arrays. ICASSP 2002.
Y. Rockah and P. M. Schultheiss. Array shape calibration using sources in unknown locations Part II: Near-field sources and estimator implementation. IEEE Trans. Acoust., Speech, Signal Processing, ASSP-35(6):724-735, June 1987.
J. Weiss and B. Friedlander. Array shape calibration using sources in unknown locations: a maximum-likelihood approach. IEEE Trans. Acoust., Speech, Signal Processing, 37(12):1958-1966, December 1989.
R. Moses, D. Krishnamurthy, and R. Patterson. A self-localization method for wireless sensor networks. EURASIP Journal on Applied Signal Processing, Special Issue on Sensor Networks, 2003(4):348-358, March 2003.
Our Contributions…
•Novel setup for array processing.
•Position calibration in a distributed scenario.
•Closed-form solution to initialize the non-linear minimization routine.
•Expressions for the mean and variance of the estimators.
•Study of the effect of sensor geometry.
Acknowledgements…
• Dr. Ramani Duraiswami and Prof. Rama Chellappa
• Prof. Yegnanarayana
• Dr. Igor Kozintsev and Dr. Rainer Lienhart, Intel Research
• Prof. Min Wu and Prof. Shihab Shamma
• Prof. Larry Davis
Thank You ! | Questions ?