ACCESS Seed project in Signals and Systems: Robust spectral estimation (RoSE)
Project leader: Per Enqvist
Participating ACCESS founding faculty: Bastiaan Kleijn, Anders Lindquist, Bo Wahlberg
Other participating researchers: Chris Byrnes, Guest Professor, Opt&Syst; Enrico Avventi, doctoral student, Opt&Syst.
Number of deliverables needed: 4
Proposed deliverables:
B. Kleijn, P. Enqvist and E. Avventi, "Robust spectral envelope estimation in speech", IEEE Transactions on Speech and Audio Processing.
P. Enqvist, E. Avventi and B. Kleijn, "Spectral estimation methods for short data sequences", Signal Processing.
A. Lindquist, C. Byrnes, P. Enqvist and B. Kleijn, "Estimating spectral zeros using approximate covariance interpolation", IEEE Transactions on Signal Processing.
B. Wahlberg, A. Lindquist, E. Avventi and P. Enqvist, "On stochastic realization theory on graphs", IEEE Transactions on Signal Processing.
Background
In this project we will develop improved spectral estimation methods for use in network services. In particular, we will consider spectral estimation using ARMA models for the classification, manipulation, coding, and enhancement of audio signals. Speech and audio processing is an integral part of modern telecommunication networks. State-of-the-art processing is largely application-specific and lacks flexibility. For example, speech recognition, noise suppression, speech coding, and audio coding generally use different approaches to spectral analysis. Greater flexibility would be obtained if the signals could be characterized by probability densities specified in terms of physically and/or linguistically meaningful parameters. These parameters could then be used to classify and modify signals. This long-term goal requires spectral analysis methods that are flexible (can be adapted to the task at hand) and provide meaningful descriptions of the signal source.
Speech and speaker recognition are increasingly common and are finding new applications in areas such as teleconferencing, surveillance, security, and other network-based services. Spectral estimation in these applications commonly relies on linear prediction. However, it is well known that linear prediction can give spectral estimates with very sharp peaks, and such peaky estimates can cause problems in, e.g., speech modification, speaker recognition, and audio coding [1]. In fact, most spectral estimation methods are designed to be consistent, i.e., to converge to the generating spectrum (if one exists) as the number of data points goes to infinity. In most applications, however, only a relatively short data sequence is available, and a method designed with asymptotic behavior in mind can then perform poorly if it is applied without care. Different regularization methods have been proposed to deal with this problem. Interestingly, two different approaches have been developed by participants in this ACCESS seed project proposal [1-3]. Since one method was developed at the sound and imaging lab [1,2] and the other at the optimization and systems theory division [3], they were naturally designed with different aims and tools, but there should be great synergy in a joint approach.
Overall Goals
As a first step, it would be interesting to compare the two approaches. Both are based on linear prediction, the conventional method in most practical applications. Linear prediction itself can be motivated by the maximum entropy principle, and to increase robustness the two approaches use two different regularizations: a penalty on the derivative of the spectral envelope [1,2] and a least-squares penalty on a slack variable in the covariance-matching error [3]. It has been shown that a method outperforming linear prediction can be obtained at a small extra computational cost [1], but it is not clear how much better a method could perform if a somewhat higher cost is allowed.
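The baseline discussed above can be illustrated with a minimal sketch, assuming Python with NumPy (the proposal prescribes no implementation). It computes the standard autocorrelation-method linear predictor via the Levinson-Durbin recursion; for a positive definite covariance sequence this coincides with the maximum-entropy spectrum matching the first p+1 covariance lags. As a stand-in regularizer it uses a generic white-noise correction (scaling the zero-lag covariance by 1 + noise_floor), a common heuristic in speech coders; it is neither the envelope-derivative penalty of [1,2] nor the covariance slack of [3]. The function names, the parameter noise_floor, and the synthetic AR(2) test signal are all hypothetical.

```python
import numpy as np


def autocorrelation(x, order):
    """Biased sample autocorrelation r[0..order] (the 'autocorrelation method')."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    return np.array([np.dot(x[: n - k], x[k:]) / n for k in range(order + 1)])


def levinson_durbin(r, order):
    """Solve the Toeplitz normal equations of linear prediction.

    Returns a = [1, a_1, ..., a_p] for the inverse filter
    A(z) = sum_k a_k z^{-k}, the final prediction-error power,
    and the reflection coefficients k_1, ..., k_p.
    """
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = float(r[0])
    refl = np.zeros(order)
    for m in range(1, order + 1):
        # Reflection coefficient of stage m: -(sum_i a_i r_{m-i}) / err.
        k = -np.dot(a[:m], r[m:0:-1]) / err
        refl[m - 1] = k
        # Order update a_i <- a_i + k * a_{m-i}, i = 0..m (a_m was zero).
        a[: m + 1] = a[: m + 1] + k * a[m::-1]
        err *= 1.0 - k * k
    return a, err, refl


def lp_envelope(x, order, noise_floor=0.0, n_freq=512):
    """LP (maximum-entropy) spectral envelope on n_freq points in [0, pi].

    noise_floor is a hypothetical regularization knob: r[0] is scaled by
    (1 + noise_floor), i.e. white noise is added to the covariance estimate.
    """
    r = autocorrelation(x, order)  # fresh array, safe to modify in place
    r[0] *= 1.0 + noise_floor
    a, err, refl = levinson_durbin(r, order)
    w = np.linspace(0.0, np.pi, n_freq)
    # A(e^{jw}) = sum_k a_k e^{-jkw}, evaluated on the frequency grid.
    A = np.exp(-1j * np.outer(w, np.arange(order + 1))) @ a
    return w, err / np.abs(A) ** 2, refl


def dynamic_range_db(env):
    """Peak-to-valley ratio of an envelope in dB (a crude 'peakiness' measure)."""
    return 10.0 * np.log10(env.max() / env.min())


def ar2_resonance(n, radius, theta, rng):
    """Hypothetical test signal: AR(2) process with poles at radius * exp(+-j*theta)."""
    c1, c2 = 2.0 * radius * np.cos(theta), -radius ** 2
    e = rng.standard_normal(n)
    x = np.zeros(n)
    for t in range(n):
        x[t] = e[t]
        if t >= 1:
            x[t] += c1 * x[t - 1]
        if t >= 2:
            x[t] += c2 * x[t - 2]
    return x


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Short frame: 160 samples, e.g. 20 ms at 8 kHz.
    x = ar2_resonance(160, 0.98, 0.3 * np.pi, rng)
    for lam in (0.0, 1e-3, 1e-1):
        _, env, refl = lp_envelope(x, order=10, noise_floor=lam)
        print(f"noise_floor={lam:g}: dynamic range {dynamic_range_db(env):.1f} dB, "
              f"max |k| = {np.max(np.abs(refl)):.3f}")
```

Because the correction adds a positive multiple of the identity to a positive definite Toeplitz matrix, the regularized predictor remains stable (all |k| < 1); for large noise_floor the reflection coefficients approach zero and the envelope approaches a flat spectrum, which trades peakiness for bias. Comparing such a simple baseline with the two regularizations named above is one way the first step could be set up.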
The aim is to develop a method with low computational complexity, guaranteed stability, and good empirical and statistical properties for short data sequences. To get as close as possible to this goal, we need the combined expertise of both the more theoretical and the more applied environments. I administered an ACCESS graduate school course given by Prof. Christopher Byrnes on "Moment problems in signals, systems and control", and some of the procedures used in that course can be generalized to form specialized spectral estimators adapted to a given application. This approach will also be used to address two related spectral estimation problems.
A difficult part of ARMA modeling is the estimation of the spectral zeros. Given the zeros, a good estimator is available, but in practice the zeros have to be estimated from data; a number of different approaches will be evaluated on speech data and developed further.
For processes on graphs, it is known a priori that certain conditional independence constraints should hold. This is described in more detail in [5], where the fitting of AR models under such constraints is studied. A generalization to the fitting of ARMA models will be considered.
References
[1] L. A. Ekman, W. B. Kleijn, and M. N. Murthi, "Regularized linear prediction of speech", IEEE Trans. Audio, Speech and Language Processing, vol. 16, no. 1, pp. 65-73, Jan. 2008.
[2] L. A. Ekman, W. B. Kleijn, and M. N. Murthi, "Spectral envelope estimation and regularization", Proc. IEEE Int. Conf. on Acoustics, Speech and Signal Processing, vol. 1, pp. 245-248, May 2006.
[3] P. Enqvist and E. Avventi, "Approximative covariance interpolation with a quadratic penalty", Proc. 46th IEEE Conf. on Decision and Control (CDC), 2007.
[4] C. I. Byrnes, P. Enqvist, and A. Lindquist, "Cepstral coefficients, covariance lags, and pole-zero models for finite data strings", IEEE Trans. on Signal Processing, vol. 49, no. 4, 2001.
[5] J. Songsiri, J. Dahl, and L. Vandenberghe, "Graphical models of autoregressive processes", in Y. Eldar and D. Palomar, editors, Convex Optimization in Signal Processing and Communications, Cambridge University Press, 2009.