A QUASI-NEWTON ADAPTIVE ALGORITHM FOR ESTIMATING GENERALIZED EIGENVECTORS

G. Mathew¹, V.U. Reddy¹ and A. Paulraj²

¹Dept. of Electrical Communication Engineering, Indian Institute of Science, Bangalore-560012, India. e-mail: vur@ece.iisc.ernet.in
²Information Systems Laboratory, Dept. of Electrical Engineering, Stanford University, Stanford, CA 94305. Tel: 415-725-8307, Fax: 415-723-8473

ABSTRACT

We first introduce a constrained minimization formulation for the generalized symmetric eigenvalue problem and then recast it into an unconstrained minimization problem by constructing an appropriate cost function. The minimizer of this cost function corresponds to the eigenvector corresponding to the minimum eigenvalue of the given symmetric matrix pencil, and all minimizers are global minimizers. We also present an inflation technique for obtaining multiple generalized eigenvectors of this pencil. Based on this asymptotic formulation, we derive a quasi-Newton adaptive algorithm for estimating these eigenvectors in the data case. This algorithm is highly modular and parallel, with a computational complexity of O(N²) multiplications, N being the problem size. Simulation results show fast convergence and good quality of the estimated eigenvectors.

1. INTRODUCTION

Consider the matrix pencil (R_y, R_w) where R_y and R_w are N x N symmetric positive definite matrices. Then, the task of computing an N x 1 vector x and a scalar λ such that

    R_y x = λ R_w x    (1)

is called the generalized symmetric eigenvalue problem. The solution vector x and scalar λ are called the generalized eigenvector and eigenvalue, respectively, of the pencil (R_y, R_w). This pencil has N positive eigenvalues λ_1 ≤ λ_2 ≤ ... ≤ λ_N and corresponding real R_w-orthonormal eigenvectors q_i, i = 1, ..., N [9,1]:

    R_y q_i = λ_i R_w q_i,  with  q_i^T R_w q_j = δ_ij,  i, j ∈ {1, ..., N}    (2)

where δ_ij is the Kronecker delta function.

The generalized symmetric eigenvalue problem has extensive applications in the harmonic retrieval / direction-of-arrival estimation problem when the observed data y(n) is the sum of a desired signal z(n) and a coloured noise of the form σw(n), where σ is a scalar. For example, consider z(n) to be the sum of P real sinusoids and assume w(n) to be uncorrelated with z(n). Then, if R_y and R_w represent the asymptotic covariance matrices (of size N x N with N ≥ 2P+1) of y(n) and w(n), respectively, with R_w assumed to be non-singular, the sinusoidal frequencies can be obtained from the eigenvector corresponding to the minimum eigenvalue of (R_y, R_w).

Many researchers have addressed problem (1) for given R_y and R_w and proposed methods for solving it. Moler and Stewart [2] proposed a QZ-algorithm and Kaufman [3] proposed an LZ-algorithm for iteratively solving (1). But their methods do not exploit the structure in R_y and R_w. On the other hand, the solution of (1) in the special case of symmetric R_y and symmetric positive definite R_w has been a subject of special interest, and several efficient approaches have been reported. By using the Cholesky factorization of R_w, this problem can be reduced to the standard eigenvalue problem, as reported by Martin and Wilkinson [4]. Bunse-Gerstner [5] proposed an approach using congruence transformations for the simultaneous diagonalization of R_y and R_w. Shougen and Shuqin [6] reported an algorithm which makes use of the Cholesky, QR and singular value decompositions when R_y is also positive definite. This algorithm is stable, faster than the QZ-algorithm, and much superior to that of [4]. Auchmuty [1] proposed and analyzed certain cost functions which are minimized at the eigenvectors corresponding to some specific eigenvalues.
In adaptive signal processing applications, however, R_y and R_w correspond to asymptotic covariance matrices and they need to be estimated. In this paper, the data covariance matrix of y(n), i.e., R_y(n), is taken as the estimate of R_y, and an estimate of R_w, say R̂_w, is assumed to be available from some a priori measurements. The sample covariance matrix at the nth data instant is calculated as

    R_y(n) = (1/n) Σ_{i=1}^{n} y(i) y^T(i)

where y(n) is the data vector containing the most recent N samples of the time series y(n). Thus, the problem we address in this paper is as follows. Given the time series y(n), develop an adaptive algorithm for estimating the first D (D ≤ N) generalized R_w-orthogonal eigenvectors of the pencil (R_y, R_w), using (R_y(n), R̂_w) as an estimate of (R_y, R_w).

The paper is organized as follows. The generalized eigenvalue problem is translated into an unconstrained minimization problem in Section 2. A quasi-Newton adaptive algorithm is derived in Section 3. Computer simulations are presented in Section 4 and, finally, Section 5 concludes the paper.

2. A MINIMIZATION FORMULATION FOR THE GENERALIZED SYMMETRIC EIGENVALUE PROBLEM

In this section, we recast the generalized symmetric eigenvalue problem into an unconstrained minimization framework. We first consider the case of the eigenvector corresponding to the minimum eigenvalue (henceforth referred to as the minimum eigenvector) and then extend it to the case of more than one eigenvector using an inflation approach. The principle used here is a generalization of the approach followed in [8].

2.1. Single Eigenvector Case

We first introduce a constrained minimization formulation for seeking the minimum eigenvector of (R_y, R_w). Consider the following problem:

    min_a a^T R_y a  subject to  a^T R_w a = 1.    (3)

Let R_w = L_w L_w^T be the Cholesky factorization of R_w. Then, (3) can be re-written as

    min_b b^T R̃_y b  subject to  b^T b = 1    (4)

where R̃_y = L_w^{-1} R_y L_w^{-T} and b = L_w^T a. Since the eigenvalues of (R_y, R_w) and R̃_y are identical and the eigenvectors of (R_y, R_w) are the eigenvectors of R̃_y premultiplied by L_w^{-T}, it follows from (4) that the solution of (3) is the minimum eigenvector of (R_y, R_w). Using the penalty function method [7], the constrained problem (3) can be translated into the unconstrained minimization of the cost function

    J(a, ρ) = (1/2) a^T R_y a + (ρ/4)(a^T R_w a - 1)²    (5)

where ρ is a positive scalar. Below, we give results establishing the correspondence between the minimizers of J(a, ρ) and the minimum eigenvectors of (R_y, R_w). The gradient vector g and Hessian matrix H of J(a, ρ) with respect to a are

    g(a, ρ) = R_y a + ρ(a^T R_w a - 1) R_w a,    (6)
    H(a, ρ) = R_y + 2ρ(R_w a)(R_w a)^T + ρ(a^T R_w a - 1) R_w.    (7)

Using (6) and (7), we get the following results.

Theorem 1: a* is a stationary point of J(a, ρ) if and only if a* is an eigenvector of (R_y, R_w) corresponding to the eigenvalue λ such that λ = ρ(1 - a*^T R_w a*). This result immediately follows from (6).

Theorem 2: a* is a global minimizer of J(a, ρ) if and only if a* is a minimum eigenvector of (R_y, R_w) with eigenvalue λ_min = ρ(1 - a*^T R_w a*), where λ_min = λ_1. Further, all minimizers are global minimizers.

If part: Let a* = βq_m with λ_m = λ_min and β² = 1 - λ_min/ρ. Since R_y a* = λ_min R_w a* with λ_min = ρ(1 - a*^T R_w a*), we get from Theorem 1 that a* is a stationary point of J(a, ρ). Let y = a* + p, p ∈ R^N. Then, it can be shown that

    J(y, ρ) = J(a*, ρ) + (1/2) p^T (R_y - λ_min R_w) p + (ρ/4)[2 p^T R_w a* + p^T R_w p]².    (8)

Since λ_min is the minimum eigenvalue of the pencil, p^T (R_y - λ_min R_w) p ≥ 0, and hence J(y, ρ) ≥ J(a*, ρ) for all p ∈ R^N. Thus, a* is a global minimizer of J(a, ρ).

Only if part: Since a* is a global minimizer, we have i) a* is a stationary point of J(a, ρ) and ii) H(a*, ρ) is positive semi-definite. Hence, from Theorem 1, we get R_y a* = λ_m R_w a* with λ_m = ρ(1 - a*^T R_w a*), a* = βq_m and β² = 1 - λ_m/ρ for some m ∈ {1, ..., N}. Then, with p = Σ_{i=1}^{N} ã_i q_i, it can be shown that

    p^T H(a*, ρ) p = Σ_{i=1}^{N} (λ_i - λ_m + 2ρβ² δ_mi) ã_i².    (9)

Since H(a*, ρ) is positive semi-definite, (9) implies that λ_i ≥ λ_m for all i = 1, ..., N. That is, a* is the minimum eigenvector of (R_y, R_w). Further, since H(a*, ρ) is positive semi-definite even if a* is only a local minimizer, it follows that all minimizers are global minimizers.

Corollary 1: The value of ρ should be greater than λ_min.

Corollary 2: The eigenvectors of (R_y, R_w) associated with the non-minimum eigenvalues correspond to saddle points of J(a, ρ).

Clearly, the minimizer of J(a, ρ) will be unique (except for the sign) only if the minimum eigenvalue of (R_y, R_w) is simple. The main result of the above analysis is that all minimizers of J(a, ρ) are aligned with the minimum eigenvectors of (R_y, R_w). Since all minimizers are global minimizers, any simple search technique will suffice to reach the correct solution. Further, the bound on ρ in Corollary 1 can be satisfied without knowing λ_min, since the Rayleigh quotient a^T R_y a / a^T R_w a at any non-zero vector a is an upper bound on λ_min.
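The correspondence established above is easy to check numerically. The following sketch minimizes J(a, ρ) of (5) by plain gradient descent using the gradient (6) and compares the result with a direct generalized eigensolver; the random test matrices, the step size and the iteration budget are illustrative choices, not prescriptions from the analysis.

    import numpy as np
    from scipy.linalg import eigh

    rng = np.random.default_rng(0)

    def random_spd(n):
        """Random symmetric positive definite matrix."""
        A = rng.standard_normal((n, n))
        return A @ A.T + n * np.eye(n)

    N = 6
    Ry, Rw = random_spd(N), random_spd(N)

    # Reference: smallest generalized eigenpair of (Ry, Rw); eigh returns
    # ascending eigenvalues and Rw-orthonormal eigenvectors.
    evals, Q = eigh(Ry, Rw)
    lam_min, q_min = evals[0], Q[:, 0]

    # Start on the constraint set a^T Rw a = 1; the Rayleigh quotient
    # a^T Ry a then upper-bounds lam_min, so rho satisfies Corollary 1.
    a = rng.standard_normal(N)
    a /= np.sqrt(a @ Rw @ a)
    rho = a @ Ry @ a + 1.0

    # Plain gradient descent on J(a, rho) of (5) using the gradient (6).
    mu = 0.5 / np.linalg.norm(Ry + 3.0 * rho * Rw, 2)   # heuristic step size
    for _ in range(20000):
        g = Ry @ a + rho * (a @ Rw @ a - 1.0) * (Rw @ a)
        a -= mu * g

    lam_hat = rho * (1.0 - a @ Rw @ a)   # eigenvalue read off via Theorem 1
    a_n = a / np.sqrt(a @ Rw @ a)        # Rw-normalize for comparison
    print("lambda_min via eigh      :", lam_min)
    print("lambda_min via minimizing:", lam_hat)
    print("|<a, q_min>_Rw| (should be ~1):", abs(a_n @ Rw @ q_min))

Since all minimizers are global (Theorem 2) and the non-minimum eigenvectors are saddle points (Corollary 2), the plain descent recovers the minimum eigenpair from a generic starting point, with the eigenvalue obtained through the relation of Theorem 1.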
2.2. Inflation Technique for Seeking Multiple Eigenvectors

Now, we present an inflation technique which, combined with the result of the previous section, is used to seek the first D (D ≤ N) R_w-orthogonal eigenvectors of (R_y, R_w). The basic approach is to divide the problem into D subproblems such that the kth subproblem solves for the kth eigenvector of (R_y, R_w).

Let a_i*, i = 1, ..., k-1 (with 2 ≤ k ≤ D), be the R_w-orthogonal eigenvectors of (R_y, R_w) corresponding to the eigenvalues λ_i = ρ(1 - a_i*^T R_w a_i*), i = 1, ..., k-1. To obtain the next R_w-orthogonal eigenvector a_k*, consider the cost function

    J_k(a_k, ρ, α) = (1/2) a_k^T R_{y_k} a_k + (ρ/4)(a_k^T R_w a_k - 1)²    (10)

where

    R_{y_k} = R_y + α Σ_{i=1}^{k-1} (R_w a_i*)(R_w a_i*)^T,  with R_{y_1} = R_y.    (11)

Eqn. (11) represents the inflation step, and its implication is discussed below. Let a_i* = β_i q_i, β_i > 0, i = 1, ..., k-1. Post-multiplying (11) by q_j, we get

    R_{y_k} q_j = (λ_j + αβ_j²) R_w q_j,  j = 1, ..., k-1,
    R_{y_k} q_j = λ_j R_w q_j,  j = k, ..., N.    (12)

That is, (R_y, R_w) and (R_{y_k}, R_w) have the same set of eigenvectors but k-1 different eigenvalues. Now, choose α such that

    λ_k < λ_j + αβ_j²,  j = 1, ..., k-1.    (13)

Then, clearly, λ_k is the minimum eigenvalue of (R_{y_k}, R_w) with q_k as the corresponding eigenvector. Hence, if a_k* is a minimizer of J_k(a_k, ρ, α), then it follows from Theorem 2 that a_k* is a minimum eigenvector of (R_{y_k}, R_w) with eigenvalue λ_k = ρ(1 - a_k*^T R_w a_k*). By construction, a_k* is nothing but the kth R_w-orthogonal eigenvector of (R_y, R_w). A practical lower bound for α is ρ.

Thus, by constructing D cost functions J_k(a_k, ρ, α), k = 1, ..., D, as in (10) and finding their minimizers, we get the first D R_w-orthogonal eigenvectors of (R_y, R_w).
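The effect of the inflation step can also be verified numerically. In the sketch below, R_y is inflated with exact eigenvectors obtained from a direct solver (so that β_i = 1); the problem size and the value of α are arbitrary choices that satisfy (13).

    import numpy as np
    from scipy.linalg import eigh

    rng = np.random.default_rng(1)

    def random_spd(n):
        A = rng.standard_normal((n, n))
        return A @ A.T + n * np.eye(n)

    N, k, alpha = 6, 3, 50.0              # seek the k-th eigenvector
    Ry, Rw = random_spd(N), random_spd(N)

    evals, Q = eigh(Ry, Rw)               # Rw-orthonormal columns, beta_i = 1

    # Inflation step (11) using the exact eigenvectors q_1, ..., q_{k-1}.
    Ryk = Ry.copy()
    for i in range(k - 1):
        c = Rw @ Q[:, i]                  # R_w q_i
        Ryk += alpha * np.outer(c, c)

    evals_k, Qk = eigh(Ryk, Rw)
    print("pencil eigenvalues  :", np.round(evals, 4))
    print("inflated eigenvalues:", np.round(evals_k, 4))
    # Per (12): lambda_i -> lambda_i + alpha for i < k, the rest unchanged,
    # so with alpha as in (13) the k-th eigenvector becomes the minimum one:
    print("|<q_min(inflated), q_k>_Rw|:", abs(Qk[:, 0] @ Rw @ Q[:, k - 1]))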
3. A QUASI-NEWTON-CUM-INFLATION BASED ADAPTIVE ALGORITHM

In this section, we combine the inflation technique of the previous section with a quasi-Newton method and derive an adaptive algorithm for estimating the first D R_w-orthogonal eigenvectors of (R_y, R_w) in the data case. Let a_k(n), k = 1, ..., D, be the estimates of these vectors at the nth adaptation instant. The Newton algorithm for updating a_k(n-1) to a_k(n) is of the form

    a_k(n) = a_k(n-1) - H_k^{-1}(n-1) g_k(n-1)    (14)

where H_k(n-1) and g_k(n-1) are the Hessian and gradient, respectively, of J_k(a_k, ρ, α) evaluated at a_k = a_k(n-1). Since R_y, R_w and a_i*, i = 1, ..., k-1, are not available, estimates of g_k(n-1) and H_k(n-1) can be obtained by replacing R_y, R_w and a_i*, i = 1, ..., k-1, with R_y(n), R̂_w and a_i(n), i = 1, ..., k-1, respectively. Thus, we obtain

    g_k(n-1) = R̂_{y_k}(n) a_k(n-1) + ρ(a_k^T(n-1) R̂_w a_k(n-1) - 1) R̂_w a_k(n-1),    (15)
    H_k(n-1) = R̂_{y_k}(n) + 2ρ(R̂_w a_k(n-1))(R̂_w a_k(n-1))^T + ρ(a_k^T(n-1) R̂_w a_k(n-1) - 1) R̂_w    (16)

where

    R̂_{y_k}(n) = R_y(n) + α Σ_{i=1}^{k-1} (R̂_w a_i(n))(R̂_w a_i(n))^T,  with R̂_{y_1}(n) = R_y(n).    (17)

Further, we approximate the Hessian by dropping the last term in (16), so that the approximant is positive definite and a recursion can be obtained directly in terms of its inverse. Substituting this approximated Hessian along with (15) into (14), we obtain (after some manipulations) the quasi-Newton adaptive algorithm for estimating the eigenvectors of (R_y, R_w) as

    a_k(n) = γ_k(n-1) R̂_{y_k}^{-1}(n) c_k(n-1),  k = 1, ..., D    (18)

    γ_k(n-1) = ρ [1 + a_k^T(n-1) c_k(n-1)] / [1 + 2ρ c_k^T(n-1) R̂_{y_k}^{-1}(n) c_k(n-1)],  k = 1, ..., D    (19)

    R_y^{-1}(n) = (n/(n-1)) [ R_y^{-1}(n-1) - (R_y^{-1}(n-1) y(n) y^T(n) R_y^{-1}(n-1)) / ((n-1) + y^T(n) R_y^{-1}(n-1) y(n)) ],  n ≥ 2    (20)

    R̂_{y_k}^{-1}(n) = R̂_{y_{k-1}}^{-1}(n) - (R̂_{y_{k-1}}^{-1}(n) c_{k-1}(n) c_{k-1}^T(n) R̂_{y_{k-1}}^{-1}(n)) / (1/α + c_{k-1}^T(n) R̂_{y_{k-1}}^{-1}(n) c_{k-1}(n)),  k ≥ 2    (21)

where c_k(n) = R̂_w a_k(n) and R̂_{y_1}^{-1}(n) = R_y^{-1}(n).

This algorithm can be implemented using a pipelined architecture, as illustrated in Fig. 1. Here, the kth eigenvector is estimated by the kth unit, which goes through the following steps during the nth sample interval (time indices are suppressed in the figure): i) pass on R̂_{y_k}^{-1}(n-k) and a_k(n-k) to the (k+1)th unit, ii) accept R̂_{y_{k-1}}^{-1}(n-k+1) and a_{k-1}(n-k+1) from the (k-1)th unit, and iii) update R̂_{y_k}^{-1}(n-k) and a_k(n-k) to R̂_{y_k}^{-1}(n-k+1) and a_k(n-k+1). Observe from (18)-(21) that the computations required for updating the eigenvector estimates are identical for all the units, thus making the algorithm both modular and parallel. Consequently, the effective computational requirement is equal to that required for updating only one eigenvector estimate, which is about 5.5N² multiplications per iteration.

We have the following remarks regarding the convergence of this algorithm. Following the steps as in the convergence analysis of a similar-looking algorithm given in [8], we can show that i) the algorithm is locally convergent (asymptotically) and ii) the undesired stationary points (i.e., the undesired eigenvectors) are unstable. However, when R_y(n) = R_y for all n, so that no data fluctuations perturb the iterations, the algorithm can get stuck at an undesired eigenvector. Thus, in the data case, the proposed algorithm is globally convergent with probability one.
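For concreteness, the recursions (18)-(21) can be collected into a short routine. The sketch below is one possible realization, not the pipelined implementation of Fig. 1: it updates the D units sequentially within each sample interval and adopts the initialization a_k(0) = i_k, R_y^{-1}(1) = 100 I used in Section 4.

    import numpy as np

    def qn_generalized_eig(ys, Rw_hat, D, rho, alpha):
        """Sequential sketch of the quasi-Newton recursions (18)-(21).

        ys     : list of N-dimensional data vectors y(1), y(2), ...
        Rw_hat : a priori estimate of the noise covariance R_w
        Returns the D eigenvector estimates after the last sample.
        """
        N = len(ys[0])
        A = np.eye(N)[:, :D].copy()      # a_k(0) = i_k, k-th column of I
        Ry_inv = 100.0 * np.eye(N)       # R_y^{-1}(1) = 100 I
        for n, y in enumerate(ys[1:], start=2):
            # (20): recursive update of the inverse sample covariance.
            v = Ry_inv @ y
            Ry_inv = (n / (n - 1.0)) * (
                Ry_inv - np.outer(v, v) / ((n - 1.0) + y @ v))
            Rk_inv = Ry_inv              # R_{y_1}^{-1}(n) = R_y^{-1}(n)
            for k in range(D):
                a = A[:, k].copy()
                c = Rw_hat @ a           # c_k(n-1) = Rw_hat a_k(n-1)
                d = Rk_inv @ c
                # (19): scalar gain, then (18): eigenvector update.
                gamma = rho * (1.0 + a @ c) / (1.0 + 2.0 * rho * (c @ d))
                A[:, k] = gamma * d
                # (21): inflate the inverse for unit k+1 (Sherman-Morrison),
                # using c_k(n) built from the updated estimate a_k(n).
                ck = Rw_hat @ A[:, k]
                u = Rk_inv @ ck
                Rk_inv = Rk_inv - np.outer(u, u) / (1.0 / alpha + ck @ u)
        return A

Because each unit performs the same O(N²) operations, mapping the inner loop onto D pipelined units recovers the modular architecture of Fig. 1.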
4. SIMULATION RESULTS

In this section, we present some computer simulation results to demonstrate the performance of the proposed adaptive algorithm. The performance measures used for evaluating the quality of the estimated eigenvectors are as follows. To see how close the estimated eigenvectors are to the true subspace, we use the projection error measure E(n), defined as

    E(n) = || [I - S(S^T S)^{-1} S^T] Z(n) ||²    (22)

where Z(n) = [ā_1(n), ..., ā_D(n)] with ā_k(n) = a_k(n)/||a_k(n)||, and S = [q_1, q_2, ..., q_D]. In order to know the extent of R_w-orthogonality among the estimated eigenvectors, we define an orthogonality measure Orth_max(n) as the largest cross term in magnitude among the estimates:

    Orth_max(n) = max_{i≠j} |a_i^T(n) R̂_w a_j(n)|.    (23)

Thus, the smaller the values of E(n) and Orth_max(n), the better is the quality of the estimated eigenvectors, and vice versa.

In the simulations, the signal y(n) was generated as y(n) = z(n) + σw(n) where

    z(n) = ψ sin(2π(0.2)n + θ_1) + ψ sin(2π(0.24)n + θ_2),
    w(n) = sin(2π(0.21)n + θ_3) + sin(2π(0.23)n + θ_4) + sin(2π(0.25)n + θ_5) + u(n).

Here, u(n) is a zero-mean, unit-variance white noise and the θ_i are the initial phases (assumed to be uniform in [-π, π)). The values of N and D were fixed at 10 and 6, respectively, and σ at 0.6325. The algorithm was initialized as a_k(0) = i_k, k = 1, ..., D, and R_y^{-1}(1) = 100 I, where i_k is the kth column of I. The matrix R̂_w was taken to be the asymptotic covariance matrix of w(n). 100 Monte Carlo simulations were performed.

Values of E(n), averaged over the 100 trials, are plotted in Fig. 2 for ψ = 5.00 (high signal-to-noise ratio (SNR)) and ψ = 1.58 (low SNR). Observe that the algorithm converges quite fast, especially in the high-SNR case. Values of Orth_max(n) (not shown here) are very small, orders of magnitude below unity, for both ψ = 5.00 and ψ = 1.58, implying that the implicit orthogonalization built into the algorithm through the inflation technique is very effective.

In order to see how this subspace quality reflects in frequency estimation, we used the spectral estimator

    S(f) = 1 / (s^H(f) Z(n_f) Z^T(n_f) s(f)),  0 ≤ f ≤ 0.5    (24)

where n_f is the data length, s(f) = [1, exp(j2πf), exp(j4πf), ..., exp(j(N-1)2πf)]^T and H denotes the Hermitian transpose. The peaks of S(f) are taken as the estimates of the sinusoidal frequencies in the desired signal z(n). The averaged results are shown in Fig. 3. Observe that the desired frequencies are well estimated in spite of the fact that the undesired signal frequencies are closely interlaced with the desired signal frequencies.
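To make the frequency-estimation step concrete, the sketch below evaluates a pseudo-spectrum of the form (24) on a grid. For simplicity it takes R_w = I (white noise), so the pencil reduces to the ordinary eigenproblem and the columns of Z can be formed from exact noise-subspace eigenvectors; the grid density and the peak-picking rule are illustrative choices.

    import numpy as np

    def pseudo_spectrum(Z, n_grid=2048):
        """S(f) = 1 / (s(f)^H Z Z^T s(f)) on a grid of f in [0, 0.5]."""
        N = Z.shape[0]
        freqs = np.linspace(0.0, 0.5, n_grid)
        S = np.empty(n_grid)
        for i, f in enumerate(freqs):
            s = np.exp(2j * np.pi * f * np.arange(N))   # [1, e^{j2pi f}, ...]
            w = Z.T @ s                                  # Z is real here
            S[i] = 1.0 / np.real(np.vdot(w, w))          # 1 / ||Z^T s(f)||^2
        return freqs, S

    # Two unit-amplitude sinusoids at f = 0.2 and 0.24 in unit white noise:
    # Ry = I + sum of 0.5 cos(2 pi f (i - j)); with Rw = I the first D = 6
    # generalized eigenvectors are the 6 smallest eigenvectors of Ry.
    N, idx = 10, np.arange(10)
    Ry = np.eye(N)
    for f in (0.2, 0.24):
        Ry += 0.5 * np.cos(2 * np.pi * f * (idx[:, None] - idx[None, :]))
    evals, Q = np.linalg.eigh(Ry)            # ascending eigenvalue order
    freqs, S = pseudo_spectrum(Q[:, :6])

    # Report grid points that are strong local maxima of S(f).
    thresh = 100.0 * np.median(S)
    for i in range(1, len(S) - 1):
        if S[i] > S[i - 1] and S[i] > S[i + 1] and S[i] > thresh:
            print("estimated frequency:", round(freqs[i], 3))

Run on the exact covariance, this prints peaks near 0.2 and 0.24; in the data case the columns of Z would instead be the normalized estimates ā_k(n_f) produced by the adaptive algorithm.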
5. CONCLUSIONS

The problem of seeking the generalized eigenvector corresponding to the minimum eigenvalue of a symmetric positive definite matrix pencil (R_y, R_w) has been translated into an unconstrained minimization problem. This was then extended to the case of more than one eigenvector using an inflation technique. Based on this asymptotic formulation, a quasi-Newton adaptive algorithm was derived for estimating these eigenvectors in the data case. Note that the algorithm requires knowledge of the noise covariance matrix to within a scalar multiple.

REFERENCES

[1] G. Auchmuty, "Globally and Rapidly Convergent Algorithms for Symmetric Eigenproblems," SIAM J. Matrix Analysis and Applications, vol. 12, no. 4, pp. 690-706, Oct. 1991.
[2] C.B. Moler and G.W. Stewart, "An Algorithm for Generalized Matrix Eigenvalue Problems," SIAM J. Numerical Analysis, vol. 10, pp. 241-256, 1973.
[3] L. Kaufman, "The LZ-Algorithm to Solve the Generalized Eigenvalue Problem," SIAM J. Numerical Analysis, vol. 11, pp. 997-1024, 1974.
[4] R.S. Martin and J.H. Wilkinson, "Reduction of the Symmetric Eigenproblem Ax = λBx and Related Problems to Standard Form," Numerische Mathematik, vol. 11, pp. 99-110, 1968.
[5] A. Bunse-Gerstner, "An Algorithm for the Symmetric Generalized Eigenvalue Problem," Linear Algebra and its Applications, vol. 58, pp. 43-68, 1984.
[6] W. Shougen and Z. Shuqin, "An Algorithm for Ax = λBx with Symmetric and Positive-Definite A and B," SIAM J. Matrix Analysis and Applications, vol. 12, no. 4, pp. 654-660, Oct. 1991.
[7] D.G. Luenberger, Linear and Nonlinear Programming, pp. 366-369, Addison-Wesley, 1978.
[8] G. Mathew, V.U. Reddy and S. Dasgupta, "Adaptive Estimation of Eigensubspace," to appear in IEEE Trans. Signal Processing, Feb. 1995.
[9] B.N. Parlett, The Symmetric Eigenvalue Problem, Prentice-Hall, Englewood Cliffs, NJ, 1980.

Figure 1. Modular implementation of the proposed method.
Figure 2. Convergence performance of the proposed method.
Figure 3. Spectrum estimate using the estimated eigenvectors.