ACCESS Seed project in Signals and Systems: Robust spectral estimation (RoSE)
Project leader: Per Enqvist
Participating ACCESS founding faculty: Bastiaan Kleijn, Anders Lindquist, Bo Wahlberg
Other participating researchers: Chris Byrnes, Guest Professor, Opt&Syst.
Enrico Avventi, doctoral student, Opt&Syst.
Number of deliverables needed: 4
Proposed deliverables:
B. Kleijn, P. Enqvist and E. Avventi, “Robust spectral envelope estimation in speech”, IEEE
Transactions on Speech and Audio Processing.
P. Enqvist, E. Avventi and B. Kleijn, “Spectral estimation methods for short data sequences”, Signal
Processing.
A. Lindquist, C. Byrnes, P. Enqvist and B. Kleijn, “Estimating spectral zeros using approximate
covariance interpolation”, IEEE Transactions on Signal Processing.
B. Wahlberg, A. Lindquist, E. Avventi and P. Enqvist, “On stochastic realization theory on graphs”,
IEEE Transactions on Signal Processing.
Background
In this project we aim to improve spectral estimation, with the ultimate purpose of improving
network services. In particular, we will consider spectral estimation using ARMA models in the
context of the classification, manipulation, coding, and enhancement of audio signals.
Speech and audio processing is an integral part of modern telecommunication networks.
State-of-the-art processing is largely application specific and lacks flexibility. For example, speech
recognition, noise suppression, speech coding, and audio coding generally use different
approaches for spectral analysis. Increased flexibility would be obtained if the signals could be
characterized by probability densities that are specified in terms of physically and/or linguistically
meaningful parameters. The parameters could then be used to classify and modify signals. This
long-term goal requires spectral analysis methods that are flexible (can be adapted to the task at
hand) and provide meaningful descriptions of the signal source.
Speech and speaker recognition are becoming more widely used and are finding new applications
in areas such as teleconferencing, surveillance, security and other network structures. The spectral
estimation in these applications commonly relies on linear prediction. However, it is well known
that linear prediction can give spectral estimates with very sharp peaks, and these peaky estimates
can cause problems in, e.g., speech modification, speaker recognition and audio coding applications
[1]. In fact, most methods for spectral estimation are designed to be consistent, that is, to tend
asymptotically to the generating spectrum (if one exists) as the number of data points goes to
infinity. In most applications, however, only a relatively short data sequence is available, and
applying a method designed with asymptotic behavior in mind without care can then cause
problems.
Different regularization methods have been proposed to deal with this problem.
Interestingly, two different approaches have been developed by the participants in this
ACCESS seed project proposal [1-3]. Since one method was developed at the sound and imaging
lab [1,2] and the other at the optimization and systems theory division [3], they were naturally
designed with different aims and tools, but a joint approach should offer considerable synergy.
Overall Goals
As a first step, it would be interesting to compare the two approaches. Both are based on linear
prediction, which is the conventional method used in most practical applications. Linear
prediction itself can be motivated by entropy maximization principles, and to increase robustness
the two approaches use two different regularizations: a penalty on the derivative of the envelope
and a least-squares slack on the covariance errors.
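As a rough illustration of how a regularization term enters linear prediction, the following Python sketch solves the normal equations of the autocorrelation method with an added quadratic penalty. It is not the formulation of [1] or [3]. As a simplifying assumption, it penalizes the derivative of the inverse filter A(e^{jw}) = 1 + sum_k a_k e^{-jwk} rather than of the envelope itself, using the fact that, by Parseval's relation, the mean of |dA/dw|^2 over frequency equals sum_k k^2 a_k^2. The function names, the synthetic AR(2) test signal, the model order and the value of lam are all hypothetical choices made for the example.

```python
import numpy as np


def autocorrelation(x, order):
    """Biased sample autocorrelation r[0], ..., r[order] of a real frame."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    return np.array([np.dot(x[: n - k], x[k:]) / n for k in range(order + 1)])


def linear_prediction(x, order, lam=0.0):
    """Return ([1, a_1, ..., a_p], mean squared prediction error).

    lam = 0 is the ordinary autocorrelation method. lam > 0 adds the
    illustrative penalty lam * r[0] * sum_k k^2 a_k^2 (an assumption of
    this sketch, not the regularizer of [1] or [3]).
    """
    r = autocorrelation(x, order)
    lags = np.arange(order)
    R = r[np.abs(lags[:, None] - lags[None, :])]  # Toeplitz covariance matrix
    D = np.diag(np.arange(1, order + 1, dtype=float) ** 2)
    # Minimize r0 + 2 a.r + a'Ra + lam*r0*a'Da  =>  (R + lam*r0*D) a = -r
    a = np.linalg.solve(R + lam * r[0] * D, -r[1:])
    err = r[0] + 2.0 * a @ r[1:] + a @ R @ a
    return np.concatenate(([1.0], a)), err


def envelope_db(a_poly, err, n_freq=256):
    """AR spectral envelope err / |A(e^{jw})|^2 in dB on [0, pi]."""
    A = np.fft.rfft(a_poly, 2 * n_freq)  # zero-padded evaluation of A
    return 10.0 * np.log10(err / np.abs(A) ** 2)


# Hypothetical short frame: 64 samples of a resonant AR(2) process.
rng = np.random.default_rng(0)
noise = rng.standard_normal(264)
x = np.zeros(264)
c1, c2 = 2 * 0.98 * np.cos(0.3 * np.pi), -(0.98 ** 2)
for n in range(2, 264):
    x[n] = c1 * x[n - 1] + c2 * x[n - 2] + noise[n]
frame = x[-64:]  # discard the start-up transient

a_lp, err_lp = linear_prediction(frame, order=10)
a_reg, err_reg = linear_prediction(frame, order=10, lam=0.01)
env_lp = envelope_db(a_lp, err_lp)
env_reg = envelope_db(a_reg, err_reg)
print("envelope dynamic range (dB), plain vs. penalized:",
      env_lp.max() - env_lp.min(), env_reg.max() - env_reg.min())
```

Setting lam to zero recovers ordinary linear prediction. For lam > 0 the penalty term sum_k k^2 a_k^2 can only decrease and the prediction error can only increase relative to the unpenalized solution, which is the trade-off any such regularization has to balance.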
It has been shown that it is possible to find a method that performs better than linear prediction
at a small extra computational cost [1], but it is not clear how much better it could perform if a
somewhat higher cost were allowed.
The aim would be to develop a method with low computational complexity, guaranteed stability,
and good experimental and statistical properties for short data sequences. To get as close as
possible to this goal we need the combined expertise of both the more theoretical and the more
applied environments.
I administered an ACCESS graduate school course given by Prof. Christopher Byrnes on
“Moment problems in signals, systems and control”, and some procedures used in this course can
be generalized to form specialized spectral estimators adapted to the task at hand. This approach
will be used to consider two other related spectral estimation problems.
A difficult part of ARMA modeling is the estimation of spectral zeros. Given the zeros, we
have a good estimator, but in practice the zeros have to be estimated from data. A number of
different approaches will be evaluated on speech data and developed further.
When working on graphs, it is known a priori that certain conditional independence constraints
should hold. This is described in more detail in [5], where the fitting of AR models under these
constraints has been studied. A generalization to the fitting of ARMA models will be considered.
References
[1] L.A. Ekman, W.B. Kleijn and M.N. Murthi, “Regularized Linear Prediction of Speech”, IEEE
Trans. Audio, Speech and Language Processing, vol. 16, no. 1, pp. 65-73, Jan. 2008.
[2] L.A. Ekman, W.B. Kleijn and M.N. Murthi, “Spectral Envelope Estimation and
Regularization”, Proc. IEEE Int. Conf. on Acoustics, Speech and Signal Processing, vol. 1,
pp. 245-248, May 2006.
[3] P. Enqvist and E. Avventi, “Approximative Covariance Interpolation with a quadratic penalty”,
Proc. 46th IEEE Conf. on Decision and Control, CDC 2007.
[4] C.I. Byrnes, P. Enqvist and A. Lindquist, “Cepstral coefficients, covariance lags, and pole-zero
models for finite data strings”, IEEE Trans. on Signal Processing, vol. 49, no. 4, 2001.
[5] J. Songsiri, J. Dahl and L. Vandenberghe, “Graphical models of autoregressive processes”, in
Y. Eldar and D. Palomar, editors, Convex Optimization in Signal Processing and Communications,
Cambridge University Press, 2009.