IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, VOL. 6, NO. 2, MARCH 1998
setting a noise detection threshold. From this threshold, a multistage
uniform quantizer is used to shape quantization noise to fit the
masking curve. The FSM model used to encode the output of the
quantizer exploits statistical redundancies in the time, frequency, and
stage dimensions, resulting in effective compression of the audio
signal.
Informal test results show that the proposed audio coder is roughly
equivalent in quality to MPEG layer II and performs better than
MPEG layer I at the tested bit rates. Its computation demands are
reasonable, and it is amenable to hardware and software implementations. However, it should be recognized that MPEG audio coders, unlike this coder, also provide certain practical functionalities beyond coding efficiency. Hence, the proposed approach as presented is not a substitute
for the standard. Rather, we feel the approach points to some novel
alternative quantization and entropy coding components that may be
useful in future audio compression systems and standards.
ACKNOWLEDGMENT
The authors thank the anonymous reviewers for their constructive
comments and suggestions.
Fast Deconvolution of Multichannel
Systems Using Regularization
Ole Kirkeby, Philip A. Nelson, Hareo Hamada,
and Felipe Orduna-Bustamante
Abstract— A very fast deconvolution method, which is based on the
fast Fourier transform (FFT), can be used to control the outputs from a
multichannel plant comprising any number of control sources and error
sensors. The result is a matrix of causal finite impulse response filters
whose performance is optimized at a large number of discrete frequencies.
I. INTRODUCTION
DECONVOLUTION is useful for many practical applications,
and there is a vast amount of literature covering the different
aspects of the problem (see, e.g., [1, chap. 10] or [2]). We are
interested in deconvolution techniques for the purpose of designing
digital filters for multichannel sound reproduction. More specifically,
given a set of S loudspeakers, the objective is to reproduce a
desired sound field at R points in space as accurately as possible.
This principle is applied by the so-called cross-talk cancellation
systems that are used for reproducing binaural recordings over two
loudspeakers [3]–[5]. In this case, a 2 × 2 matrix of digital filters
is used to compensate for both the room response and the response
of the loudspeakers, and also to cancel the cross-talk from the left
loudspeaker to the right ear and vice versa [6]–[9]. A related problem
is that of achieving perfect “dereverberation” of a room response at
one microphone position by using two digital filters to calculate the
input to two loudspeakers [10].
In this correspondence, we present a very fast method for calculating a matrix of digital filters that can be used to control the
outputs from a multichannel plant. This method is typically several
orders of magnitude faster than the time domain methods that have
previously been investigated [6], [10], [11]. It combines the well-known principles of least squares inversion in the frequency domain
[12], [13], and the zeroth-order regularization method [14, chap. 18]
which is traditionally used when one is faced with an ill-conditioned
inversion problem [15, chap. 2].
II. SYSTEM DESCRIPTION
The inversion problem is shown in block diagram form in Fig. 1.
We assume that the system is working in discrete time, and so the
conventional z-transform notation is used [16, chap. 7]. The variables are defined as follows. u(z) is a vector of T observed signals, v(z) is a vector of S source input signals, w(z) is a vector of R reproduced signals, d(z) is a vector of R desired signals, and e(z) is a vector of
Manuscript received February 3, 1996; revised April 10, 1997. This
work was first published in an ISVR Technical Report, April 1996, which
was subsequently reproduced in the IEICE Transactions on Fundamentals
of Electronics, Communications, and Computer Sciences, vol. E80–A, pp.
809–820, May 1997. The associate editor coordinating the review of this
manuscript and approving it for publication was Dr. Dennis R. Morgan.
O. Kirkeby and P. A. Nelson are with the Institute of Sound and Vibration
Research, Southampton University, Highfield, Southampton SO17 1BJ, U.K.
(e-mail: pan@isvr.soton.ac.uk).
H. Hamada is with the Department of Electrical and Communications
Engineering, Tokyo Denki University, Tokyo 101, Japan.
F. Orduna-Bustamante is with the Seccion de Acoustica, Centro de Instrumentos, UNAM, Circuito Exterior CU, Mexico DF.
Publisher Item Identifier S 1063-6676(98)01688-5.
1063–6676/98$10.00 © 1998 IEEE
R performance error signals. All vectors are column vectors written as

u(z) = [U1(z) ··· UT(z)]^T        (1a)
v(z) = [V1(z) ··· VS(z)]^T        (1b)
w(z) = [W1(z) ··· WR(z)]^T        (1c)
d(z) = [D1(z) ··· DR(z)]^T        (1d)

and

e(z) = [E1(z) ··· ER(z)]^T.       (1e)

The matrices A(z), C(z), and Hm,A(z) represent multichannel filters. A(z) is an R × T target matrix, C(z) is an R × S plant matrix, and Hm,A(z) is an S × T matrix of optimal filters. The component z^(-m) delays all the elements of A(z) by an integer number of m samples. This delay is usually referred to as a modeling delay. It is crucial to include a modeling delay in order to ensure that it is possible to achieve a good performance from the optimal filters under the constraint that they be causal. This paper describes a method for determining a matrix of causal optimal filters Hm,A(z) given A(z), C(z), and m. The matrices have the following structures:

A(z) = [A11(z) ··· A1T(z); ... ; AR1(z) ··· ART(z)]        (2a)
C(z) = [C11(z) ··· C1S(z); ... ; CR1(z) ··· CRS(z)]        (2b)
Hm,A(z) = [H11(z) ··· H1T(z); ... ; HS1(z) ··· HST(z)]     (2c)

where the subscripts m, A have been omitted from the scalar elements of Hm,A(z) for convenience. The elements Crs(z) of C(z) are the z-transforms of the impulse responses crs(n) of the multichannel plant. Each element Crs(z) is assumed to be a causal sequence of finite length. Thus, the z-transform of crs(n) is of the form

Crs(z) = Σ_{n=0}^{Nc−1} crs(n) z^(-n) = crs(0) + crs(1) z^(-1) + ··· + crs(Nc − 1) z^(-(Nc−1)).        (3)

Fig. 1. The discrete-time multichannel deconvolution problem in block diagram form.

Fig. 2. The geometry of a 4 × 4 system. Four microphone positions are obtained by rotating a KEMAR dummy head by +5° and by −5° relative to facing straight ahead.

The elements Art(z) of A(z) are also assumed to represent causal finite length sequences. The maximum numbers of coefficients in the elements of A(z) and C(z) are denoted by Na and Nc, respectively. The elements of Hm,A(z) will later (in Section IV) be constrained to be causal finite length sequences whose maximum number of coefficients is Nh, but the analysis that follows (in Section III) imposes only the constraint of stability on the optimal filters. From the block diagram shown in Fig. 1, it is straightforward to derive the following relationships:

v(z) = Hm,A(z) u(z)        (4)
w(z) = C(z) v(z)           (5)
d(z) = z^(-m) A(z) u(z)    (6)

and

e(z) = d(z) − w(z).        (7)

From (6) it is seen that the function of the target matrix A(z) is to define the desired signals d(z) in terms of the observed signals u(z).

III. EXACT LEAST SQUARES DECONVOLUTION
This section outlines the theory upon which the fast deconvolution
algorithm is based. We show how to calculate a matrix of optimal
filters that are ideal in the sense that they are constrained to be
stable, but not constrained to be either causal or of finite duration.
Consequently, they are generally not realizable in practice. However,
when the modeling delay and the regularization parameter are set
appropriately, the fast deconvolution algorithm will essentially return
a range of coefficients that are a close approximation to the ideal
filters, and so the properties of these filters are crucial.
A. The Exact Least Squares Solution
We first consider the case when m = 0 (no modeling delay). For the purpose of defining H0,A(z) uniquely, the complex variable z is constrained to be on the unit circle, |z| = 1, by substituting e^(jωΔ) for z, where Δ is the sampling interval and ω is the angular frequency [16]. This is equivalent to constraining the impulse response of a filter with a given z-transform to be stable, but it does not guarantee that the impulse response is causal [17]. A cost function J is defined as the sum of two terms: a "performance error" term e^H e, which is a measure of how well the desired signals are reproduced at the transducers, and an "effort penalty" term β v^H v, which is proportional to the total input power to all the sources. The superscript H denotes the Hermitian operator, which transposes and conjugates its argument [18, p. 343]. For a system working in discrete time, the total cost J as a function of frequency is given by

J(e^(jωΔ)) = e^H(e^(jωΔ)) e(e^(jωΔ)) + β v^H(e^(jωΔ)) v(e^(jωΔ)).        (8)

The positive real number β is a regularization parameter that determines how much weight to assign to the effort term [19], [20]. By varying β from zero to infinity, the solution changes gradually from minimizing only the performance error to minimizing only the effort cost [14, chap. 18]. When β > 0, J is minimized in the least squares sense by a vector vopt of source inputs that are given by

vopt(e^(jωΔ)) = [C^H(e^(jωΔ)) C(e^(jωΔ)) + βI]^(-1) C^H(e^(jωΔ)) A(e^(jωΔ)) u(e^(jωΔ)).        (9)

This solution is unique regardless of the dimensions and the rank of C. Consequently, if v is to take its optimal value vopt for any choice of u, then according to (4) H0,A must be given by

H0,A(e^(jωΔ)) = [C^H(e^(jωΔ)) C(e^(jωΔ)) + βI]^(-1) C^H(e^(jωΔ)) A(e^(jωΔ))        (10)

assuming that no modeling delay is used (m = 0). Since e^(jωΔ) conjugated is equal to e^(−jωΔ), the z-transform of H0,A(z) then becomes

H0,A(z) = [C^T(z^(-1)) C(z) + βI]^(-1) C^T(z^(-1)) A(z).        (11)

Fig. 3. The impulse responses crs(n) of the 16 HRTF's derived from the geometry shown in Fig. 2. The "matrix form" is used to illustrate the position of each of the HRTF's in C(n).

B. The Generalized Cross-Talk Cancellation Matrix

In the special case where the desired signals d(z) are identical to the observed signals u(z), the matrix A(z) is an identity matrix of order R = T, and so the optimal filters are given by

H0,I(z) = [C^T(z^(-1)) C(z) + βI]^(-1) C^T(z^(-1)).        (12)

The matrix H0,I(z) is referred to as the generalized cross-talk cancellation matrix [9] (this term is also used when m is not zero [21], [22]). This matrix of filters achieves the best (in the frequency domain least squares sense) reproduction of each of the desired signals dr(z) at transducer number r. Thus, d1(z) is ideally reproduced perfectly at transducer number one and not observed at any other transducers, d2(z) is ideally reproduced perfectly at transducer number two and not observed at any other transducers, and so on. Note that C(z) does not have to be a square matrix; the number of control sources and error sensors need not be the same. One reason why the cross-talk cancellation problem deserves special attention is that once H0,I(z) is known, it is a trivial task to calculate H0,A(z) since

H0,A(z) = H0,I(z) A(z)        (13)

as seen from (11) and (12). This means that the cross-talk cancellation problem is in a sense a "worst-case" problem. If it is possible to solve the cross-talk cancellation problem, it is possible to solve the deconvolution problem for any target matrix. Another reason for considering H0,I(z) specifically in the context of sound reproduction is that this matrix is necessary for reproducing binaural recordings over loudspeakers [9].

C. The Effect of Regularization

Since a large value of β means that the optimal solution will favor a low power output from the inverse filters at the expense of a high performance error, it is evident that β can be used to control the power output from the optimal filters. It is important to note that β can also be used to control the "duration" of the inverse filters, and thereby provide a way to avoid the undesirable "wrap-around" effect usually associated with filter design methods based on sampling in the frequency domain. It turns out that regularization essentially controls the longest time constant of the optimal filters [22], and in order to ensure that the value of this time constant is neither too long nor too short, the regularization parameter β must be set appropriately. If β is too small, there will be sharp peaks in the frequency responses
of the optimal filters, and if β is too large, the deconvolution will not be very accurate. Fortunately, though, the exact value of β is usually not critical. For example, if the optimal value of β in a given situation is β0, then values in the range between 0.8β0 and 1.2β0 will usually work just as well, and values in the range between 0.2β0 and 5β0 are likely to produce acceptable results. Ultimately, a subjective judgement is necessary in order to determine whether the value of β is acceptable.

Fig. 4. The impulse responses of the 16 filters hst(n) that deconvolve the matrix C(n) shown in Fig. 3. The "matrix form" illustrates the position of each of the optimal filters hst(n) in Hm,I(n). Note how the combined effect of the modeling delay and the regularization has limited the main part of the energy of each hst(n) to the center of the filter.
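The single-frequency inversion of (9) and (12), and the effect of the size of β on the resulting filter gains, can be illustrated with a minimal numerical sketch. The plant matrix, target matrix, and β values below are our own illustrative choices, not taken from the paper:

```python
import numpy as np

# Regularized least squares inversion at a single frequency, following the
# form of (9) and (12): H = [C^H C + beta*I]^-1 C^H A.
def regularized_inverse(C, A, beta):
    S = C.shape[1]
    CH = C.conj().T
    return np.linalg.solve(CH @ C + beta * np.eye(S), CH @ A)

# A nearly singular 2 x 2 plant: the unregularized inverse has large gains.
C = np.array([[1.0, 0.99],
              [0.99, 1.0]])
A = np.eye(2)  # cross-talk cancellation target, A(z) = I

H_small = regularized_inverse(C, A, beta=1e-8)   # almost exact inverse
H_large = regularized_inverse(C, A, beta=1e-1)   # low effort, less accurate
print(np.abs(H_small).max())   # large filter gains
print(np.abs(H_large).max())   # much smaller gains
```

Increasing β lowers the filter gains (the "effort"), which, per Section III-C, is what shortens the longest time constants of the corresponding filters.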
IV. FAST DECONVOLUTION USING REGULARIZATION
As demonstrated in the previous section, it is not difficult to derive
an expression for the inverse z -transforms of the optimal filters under
the constraint that they be stable. In practice, the filters also have to be
causal and, in addition, our method also requires them to have finite
duration. In this section, we show how to calculate a matrix of optimal
causal finite impulse response (FIR) filters, each containing Nh
coefficients. Since this method uses fast Fourier transforms (FFT’s),
Nh must be equal to a power of two.
A. The Principle of the Method
It is a well-known fact that deconvolution based on matching
the frequency response only at a number of discrete frequencies
usually leads to an undesirable circular convolution effect, sometimes
referred to as wrap-around effect, in the time domain [17, chap. 3].
When convolving two sequences by multiplying their FFT’s, circular
convolution effects can be avoided by using zero-padding. However,
when one attempts to deconvolve one sequence out of another
by dividing their FFT’s, zero-padding does not prevent circular
convolution effects from affecting the outcome. The problem is that
the optimal frequency response is inevitably that of a filter whose impulse response is of infinite duration, and so
the zero-padded sequence would have to be infinitely long to avoid
circular convolution completely.
The basic idea behind our method is to use regularization to reduce
the effective duration of the optimal filter to approximately Nh/2.
Since the response of the exact least squares inverse is matched
only at Nh frequencies, Nh needs to be large enough to ensure that
important detail is not missed out by the sampling in the frequency
domain. As an initial estimate of Nh , one can try a value of 4SNc .
B. The Fast Deconvolution Algorithm
The implementation of the inversion method is straightforward in
practice. FFT’s are used to get in and out of the frequency domain,
and the system is inverted for each frequency in turn. A “cyclic shift”
of the inverse FFT’s of the optimal frequency responses is used to
implement a modeling delay (this has been demonstrated previously
by Hamada [23]).
Equation (11) gives an expression for the response of H0,A as a continuous function of frequency. If an FFT is used to sample the frequency response of H0,A at Nh points, then the value of H0,A(k) at those frequencies is given by

H0,A(k) = [C^H(k) C(k) + βI]^(-1) C^H(k) A(k)        (14)

where k denotes the kth frequency index; that is, the frequency corresponding to the complex number e^(j2πk/Nh). In order to calculate the impulse responses of a matrix of causal filters hm,A(n) for a given value of m, the following steps are necessary.
Fig. 5. The impulse responses wrt(n) of the 16 filters that are the result of the "matrix convolution" of C(n) with Hm,I(n). The cross-talk is represented by the off-diagonal elements that are ideally zero. Note that the peaks along the diagonals occur at sample number 512, the value of the modeling delay m.
1) Calculate C(k) by taking R × S Nh-point FFT's of the plant impulse responses crs(n).
2) For each of the Nh values of k, calculate the S × T matrix H0,A(k) from (14).
3) Calculate h0,A(n) by taking S × T Nh-point inverse FFT's of the elements of H0,A(k).
4) Implement the modeling delay by a cyclic shift of m samples of each element of h0,A(n). For example, if the inverse FFT of H11(k) is {3, 2, 1, 0, 0, 0, 0, 1}, then after a cyclic shift of three to the right h11(n) is {0, 0, 1, 3, 2, 1, 0, 0}.

The exact value of m is not critical; a value of Nh/2 is likely to work well in all but a few cases.
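The four steps above can be sketched directly in code. The following is a minimal implementation for the cross-talk cancellation case A(z) = I, applied to a small synthetic plant; the function and variable names are ours, not the paper's:

```python
import numpy as np

def fast_deconvolution(c, Nh, m, beta):
    """Fast deconvolution by regularized frequency-domain inversion.
    c: plant impulse responses, shape (R, S, Nc).
    Returns causal FIR filters h, shape (S, R, Nh)."""
    R, S, Nc = c.shape
    # Step 1: R x S Nh-point FFTs of the plant impulse responses.
    C = np.fft.fft(c, n=Nh, axis=2)                  # shape (R, S, Nh)
    H = np.zeros((S, R, Nh), dtype=complex)
    # Step 2: regularized inversion (14) at each of the Nh frequencies.
    for k in range(Nh):
        Ck = C[:, :, k]
        CH = Ck.conj().T
        H[:, :, k] = np.linalg.solve(CH @ Ck + beta * np.eye(S), CH)
    # Step 3: S x R Nh-point inverse FFTs.
    h = np.fft.ifft(H, axis=2).real
    # Step 4: modeling delay as a cyclic shift of m samples to the right.
    return np.roll(h, m, axis=2)

# Toy 2 x 2 plant with mild delayed cross-talk (our own example).
c = np.zeros((2, 2, 4))
c[0, 0, 0] = c[1, 1, 0] = 1.0      # direct paths
c[0, 1, 1] = c[1, 0, 1] = 0.3      # cross-talk paths, one sample later
h = fast_deconvolution(c, Nh=64, m=32, beta=1e-4)

# Verify: convolving the plant with the filters should give (close to)
# an identity matrix delayed by m samples.
w = np.zeros((2, 2, 4 + 64 - 1))
for r in range(2):
    for t in range(2):
        for s in range(2):
            w[r, t] += np.convolve(c[r, s], h[s, t])
print(w[0, 0, 32], w[0, 1, 32])   # near 1 and near 0
```

Because this toy plant is minimum phase and well conditioned, a tiny β suffices; for measured plants such as the HRTF matrix of Section IV-C, β must be chosen as discussed in Section III-C.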
C. A Multichannel Example

This section shows the result of a deconvolution of a 4 × 4 matrix of head related transfer functions (HRTF's) based on the geometry shown in Fig. 2. The four loudspeakers are positioned at equal distances from the origin of the coordinate system at the angles 30°, −30°, 110°, and −110° (the coordinate system chosen is consistent with that employed by Blauert [24]). The four microphones are "positioned" in the ears of a KEMAR dummy head which is rotated by plus 5° and by minus 5° relative to facing straight ahead. This setup was used for implementing a virtual source imaging system [21]. The HRTF's are taken from the MIT Media Laboratory's data base, which has been made available for researchers over the Internet (World Wide Web address: http://sound.media.mit.edu/kdm/hrtf.html). Each HRTF is the result of a measurement in an anechoic chamber at a sampling frequency of 44.1 kHz. We use the "compact" version of the data base; each HRTF has been equalized for the loudspeaker response before being truncated to retain only 128 coefficients. In addition, each HRTF has also been scaled so that its values lie within the range from −1 to +1. Fig. 3 shows, in "matrix form," the impulse responses of the 16 HRTF's that are the elements of C(n) as derived from the geometry shown in Fig. 2. This system can be efficiently inverted by a generalized cross-talk cancellation matrix Hm,I(n) whose 16 elements each contain 1024 coefficients. The impulse responses of the 16 elements of Hm,I(n) are shown in Fig. 4. These filters were calculated in less than 15 s on a 486-PC using a modeling delay of 512 samples and a regularization parameter of 0.0001 (without regularization, the deconvolution does not work at all). The 16 impulse responses wrt(n) resulting from "convolving" C(n) with Hm,I(n) (or, more precisely, multiplying C(z) with Hm,I(z)) are shown in Fig. 5, and their frequency responses Wrt(f) are shown in Fig. 6. Clearly, the deconvolution is very accurate. The diagonal elements of W(f) have almost perfectly flat frequency responses, and the "cross-talk," as represented by the off-diagonal elements of W(f), is attenuated by more than 30 dB over almost the entire audio frequency range.
V. CONCLUSION
An FFT-based deconvolution method can be used to deconvolve
both single-channel and multichannel systems with a matrix of causal
FIR filters. The method is extremely fast, and easy to implement.
However, the method works well only when it is possible to use
relatively long optimal filters, and so it should be used only when
hardware restrictions are not too severe. It is suitable for both
hardware and computer implementation since it uses only numerically
fast operations such as FFT’s, convolutions, and inversion and
multiplication of well-conditioned matrices of low order.
Fig. 6. The frequency responses of the 16 filters shown in Fig. 5. The unit of the x-axis is kHz, and the unit of the y-axis is dB. The diagonal elements are ideally 0 dB; the off-diagonal elements are ideally −∞ dB.

The method is based on the analysis of a matrix of exact least squares optimal filters. Even though these filters are generally not realizable in practice, their properties indicate that it is possible to design a matrix of causal FIR optimal filters whose performance is optimized in the frequency domain at a large number of discrete frequencies. The well-known, and in this case undesirable, circular convolution effect in the time domain, which is associated with filter design based on frequency sampling techniques, is controlled by using regularization. In practice, the regularization works by ensuring that the optimal filters decay away quickly enough that the circular convolution effect is insignificant. In order to achieve an accurate inversion, the regularization parameter β must be set to an appropriate value, but fortunately the exact value of β is usually not critical.
REFERENCES

[1] B. Widrow and S. D. Stearns, Adaptive Signal Processing. Englewood Cliffs, NJ: Prentice-Hall, 1985.
[2] J. H. Justice, N. L. Owsley, J. L. Yen, and A. C. Kak, Array Signal Processing, S. Haykin, Ed. Englewood Cliffs, NJ: Prentice-Hall, 1985.
[3] B. S. Atal and M. R. Schroeder, "Apparent sound source translator," U.S. Patent 3 236 949, Feb. 22, 1966.
[4] P. Damaske, "Head-related two-channel stereophony with loudspeaker reproduction," J. Acoust. Soc. Amer., vol. 50, pp. 1109–1115, 1971.
[5] M. R. Schroeder, "Models of hearing," Proc. IEEE, vol. 63, pp. 1332–1352, 1975.
[6] P. A. Nelson, H. Hamada, and S. J. Elliott, "Adaptive inverse filters for stereophonic sound reproduction," IEEE Trans. Signal Processing, vol. 40, pp. 1621–1632, 1992.
[7] D. Griesinger, "Equalization and spatial equalization of dummy-head recordings for loudspeaker reproduction," J. Audio Eng. Soc., vol. 37, pp. 20–29, 1989.
[8] D. H. Cooper and J. L. Bauck, "Prospects for transaural recording," J. Audio Eng. Soc., vol. 37, pp. 3–19, 1989.
[9] J. Bauck and D. H. Cooper, "Generalized transaural stereo and applications," J. Audio Eng. Soc., vol. 44, pp. 683–705, 1996.
[10] M. Miyoshi and Y. Kaneda, "Inverse filtering of room acoustics," IEEE Trans. Acoust., Speech, Signal Processing, vol. 36, pp. 145–152, 1988.
[11] O. Kirkeby, P. A. Nelson, F. Orduna-Bustamante, and H. Hamada, "Local sound field reproduction using digital signal processing," J. Acoust. Soc. Amer., vol. 100, pp. 1584–1593, 1996.
[12] A. L. Van Buren, "Theoretical design of nearfield calibration arrays," J. Acoust. Soc. Amer., vol. 50, pp. 192–199, 1973.
[13] O. Kirkeby and P. A. Nelson, "Reproduction of plane wave sound fields," J. Acoust. Soc. Amer., vol. 94, pp. 2992–3000, 1993.
[14] W. H. Press, S. A. Teukolsky, W. T. Vetterling, and B. P. Flannery, Numerical Recipes in C, 2nd ed. Cambridge, U.K.: Cambridge Univ. Press, 1992.
[15] J. H. Wilkinson, The Algebraic Eigenvalue Problem. Oxford, U.K.: Oxford Univ. Press, 1965.
[16] P. Kraniauskas, Transforms in Signals and Systems. Reading, MA: Addison-Wesley, 1992.
[17] A. V. Oppenheim and R. W. Schafer, Digital Signal Processing. Englewood Cliffs, NJ: Prentice-Hall, 1975.
[18] E. Kreyszig, Advanced Engineering Mathematics. New York: Wiley, 1983.
[19] S. J. Elliott, C. C. Boucher, and P. A. Nelson, "The behavior of a multiple channel active control system," IEEE Trans. Signal Processing, vol. 40, pp. 1041–1052, 1992.
[20] P. A. Nelson, "Active control of acoustic fields and the reproduction of sound," J. Sound Vib., vol. 177, pp. 447–477, 1994.
[21] F. Orduna-Bustamante, "Digital signal processing for multi-channel sound reproduction," Ph.D. dissertation, Southampton Univ., Southampton, U.K., 1995.
[22] O. Kirkeby, P. A. Nelson, H. Hamada, and F. Orduna-Bustamante, "Fast deconvolution of multi-channel systems using regularization," ISVR Tech. Rep. 255, Univ. Southampton, U.K., 1996.
[23] H. Hamada, "Construction of orthostereophonic system for the purposes of quasi-in-situ recording and reproduction," J. Acoust. Soc. Jpn., vol. 39, pp. 337–348, 1983.
[24] J. Blauert, Spatial Hearing, Amer. ed., trans. J. S. Allen. Cambridge, MA: MIT Press, 1983.
Correction to “A Frequency-Warping
Approach to Speaker Normalization”
Due to an editorial error in the above paper,1 the biography of
Richard C. Rose was incorrect. Dr. Rose’s picture and his name
were printed in the paper with the biography of Dr. Kenneth Rose.
The correct photograph and biography for Richard C. Rose follow.
We would like to apologize to Richard Rose for this serious mistake.
We would also like to apologize to both authors of the above paper
for having detracted from the quality of their paper. Finally, we
would like to apologize to Dr. Kenneth Rose for having mistakenly
associated his biography with another individual.
Richard C. Rose received the B.S. and M.S. degrees in electrical engineering from the University
of Illinois, Urbana, in 1979 and 1981, respectively.
He received the Ph.D. degree in electrical engineering from the Georgia Institute of Technology,
Atlanta, in 1988, completing his dissertation work
in speech coding and analysis.
From 1980 to 1984, he was with Bell Laboratories, where he worked on signal processing and
digital switching systems. From 1988 to 1992, he
was a member of the Speech Systems and Technology Group, MIT Lincoln Laboratory, Lexington, MA, working on speech
recognition and speaker recognition. He is currently a Principal Member of
Technical Staff, Speech and Image Processing Services Laboratory, AT&T
Laboratories–Research, Florham Park, NJ.
Dr. Rose served as a member of the IEEE SP Technical Committee on
Digital Signal Processing from 1990 to 1995, and has served as an adjunct
faculty member with the Georgia Institute of Technology. He has been elected
as an at-large member of the Board of Governors for the Signal Processing
Society, serves as an Associate Editor for the IEEE TRANSACTIONS ON SPEECH
AND AUDIO PROCESSING, and serves as a member of the IEEE SP Technical
Committee on Speech. He is also a member of Tau Beta Pi, Eta Kappa Nu,
and Phi Kappa Phi.
Manuscript received January 12, 1997.
Publisher Item Identifier S 1063-6676(98)02406-7.
1 L. Lee and R. C. Rose, IEEE Trans. Speech Audio Processing, vol. 6, pp.
49–60, January 1998.