IEEE TRANSACTIONS ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL. 36, NO. 4, APRIL 1988

Parameter Estimation of Superimposed Signals Using the EM Algorithm

MEIR FEDER AND EHUD WEINSTEIN

Abstract-We develop a computationally efficient algorithm for parameter estimation of superimposed signals based on the EM algorithm. The idea is to decompose the observed data into their signal components and then to estimate the parameters of each signal component separately. The algorithm iterates back and forth, using the current parameter estimates to decompose the observed data better and thus increase the likelihood of the next parameter estimates. The application of the algorithm to the multipath time-delay and to the multiple source location estimation problems is considered.

I. INTRODUCTION

The general problem of interest here may be characterized using the following model:

  y(t) = \sum_{k=1}^{K} s_k(t; \theta_k) + n(t)

where \theta_k is the vector of unknown parameters associated with the kth signal component and n(t) stands for the additive noise. This model covers a wide range of problems involving superimposed signals. The specific problem we have in mind is multiple source location estimation; in that case, the s_k(t; \theta_k) are the array signals observed from the kth source, and the \theta_k are the unknown source location parameters.

In this paper, we develop a computationally efficient scheme for the joint estimation of \theta_1, ..., \theta_K based on the Estimate-Maximize (EM) algorithm. The idea is to decompose the observed data y(t) into its signal components and then estimate the parameters of each signal component separately. The algorithm iterates back and forth, using the current parameter estimates to decompose the observed data better and thus improve the next parameter estimates. Under the stated regularity conditions, the algorithm converges to a stationary point of the likelihood function, where each iteration cycle increases the likelihood of the estimated parameters.

[Manuscript received February 20, 1986; revised September 28, 1987. This work was supported in part by the MIT-WHOI Joint Program, in part by the Advanced Research Projects Agency monitored by ONR under Contract N00014-81-K-0742 and the National Science Foundation under Grant ECS-8407285 at MIT, and in part by ONR under Contract N00014-85-K-0272 at WHOI. It is WHOI Contribution 6584. M. Feder is with the Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, MA 02139, and the Department of Ocean Engineering, Woods Hole Oceanographic Institution, Woods Hole, MA 02543. E. Weinstein is with the Department of Electronic Systems, Faculty of Engineering, Tel-Aviv University, Tel-Aviv 69978, Israel, and the Department of Ocean Engineering, Woods Hole Oceanographic Institution, Woods Hole, MA 02543. IEEE Log Number 8719013.]

The results developed in this paper can be viewed as a generalization of the results presented by the authors in [1] and [2]. We note that the idea of iteratively decomposing the observed signal and estimating the parameters of each signal component separately has been proposed previously in several applications (e.g., [3]-[5]). However, in most cases the approach is ad hoc, and there is no proof of convergence of the algorithm. As will be shown, the EM method suggests a very specific way of decomposing the observed signals, leading to a numerically as well as statistically stable algorithm.

The paper is organized as follows.
In Section II the EM method is presented, following the derivation in [6]. In Section III the EM method is applied to the parameter estimation of superimposed signals, and the basic algorithm is developed for deterministic (known) signals and for stationary Gaussian signals. In Section IV the algorithm is applied to multipath time-delay estimation, in Section V it is applied to multiple source angle-of-arrival estimation, and in Section VI we summarize the results.

II. MAXIMUM LIKELIHOOD ESTIMATION VIA THE EM ALGORITHM

The EM algorithm, developed in [6], is a general method for solving maximum likelihood (ML) estimation problems given incomplete data. The considerations leading to the EM algorithm are given below.

Let Y denote the observed (incomplete) data, possessing the probability density f_Y(y; \theta) indexed by the parameter vector \theta \in \Theta \subset R^K, and let X denote the "complete" data, related to Y by

  H(X) = Y    (1)

where H(\cdot) is a noninvertible (many-to-one) transformation. Expressing the densities,

  f_X(x; \theta) = f_{X|Y=y}(x; \theta) \cdot f_Y(y; \theta), for all x such that H(x) = y    (2)

where f_X(x; \theta) is the probability density of X and f_{X|Y=y}(x; \theta) is the conditional probability density of X given Y = y. Taking the logarithm on both sides of (2),

  \log f_Y(y; \theta) = \log f_X(x; \theta) - \log f_{X|Y=y}(x; \theta).    (3)

Taking the conditional expectation given Y = y at a parameter value \theta',

  \log f_Y(y; \theta) = E\{\log f_X(X; \theta) \mid Y = y; \theta'\} - E\{\log f_{X|Y=y}(X; \theta) \mid Y = y; \theta'\}.    (4)

Define, for convenience,

  L(\theta) = \log f_Y(y; \theta)    (5)
  U(\theta, \theta') = E\{\log f_X(X; \theta) \mid Y = y; \theta'\}    (6)
  V(\theta, \theta') = E\{\log f_{X|Y=y}(X; \theta) \mid Y = y; \theta'\}.    (7)

With these definitions, (4) reads

  L(\theta) = U(\theta, \theta') - V(\theta, \theta').    (8)

We identify L(\theta), the log-likelihood of the observed data, as the function we want to maximize. Invoking Jensen's inequality,

  V(\theta, \theta') \le V(\theta', \theta').    (9)

Hence, if

  U(\theta, \theta') > U(\theta', \theta')    (10)

then L(\theta) > L(\theta'). The relation in (10) forms the basis for the EM algorithm. The algorithm starts with an arbitrary initial guess \hat{\theta}^{(0)}; denote by \hat{\theta}^{(n)} the current estimate of \theta after n iterations. Then the next iteration cycle can be described in two steps, as follows:

  E step: Compute U(\theta, \hat{\theta}^{(n)}).    (11)

  M step: \max_{\theta} U(\theta, \hat{\theta}^{(n)}) \rightarrow \hat{\theta}^{(n+1)}.    (12)
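To make the two-step structure of (11) and (12) concrete, the following is a minimal Python sketch of the iteration. The callables e_step and m_step are placeholders standing in for the problem-specific computation of U(\theta, \hat{\theta}^{(n)}) (or its sufficient statistics) and of its maximizer; the convergence test on the parameter change is one common choice among several, not part of the algorithm itself.

  import numpy as np

  def em(y, theta0, e_step, m_step, max_iter=100, tol=1e-6):
      # Generic EM iteration: e_step(y, theta) returns whatever
      # statistics define U(., theta^(n)); m_step(stats) returns
      # argmax over theta of U(theta, theta^(n)).
      theta = np.asarray(theta0, dtype=float)
      for _ in range(max_iter):
          stats = e_step(y, theta)      # E step (11)
          theta_new = m_step(stats)     # M step (12)
          if np.max(np.abs(theta_new - theta)) < tol:
              return theta_new
          theta = theta_new
      return theta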
If U(\theta, \theta') is continuous in both \theta and \theta', the algorithm converges to a stationary point of the log-likelihood function [7], where the maximization in (12) ensures that each iteration cycle increases the likelihood. Of course, as with all "hill climbing" algorithms, the convergence point may not be the global maximum of the likelihood function, and thus several starting points may be needed. The rate of convergence of the algorithm is exponential, depending on the fraction of the covariance of the "complete" data that can be predicted using the observed (incomplete) data [6], [8]. If that fraction is small, the rate of convergence tends to be slow, in which case one could use standard numerical methods to accelerate the algorithm.

In the Appendix we derive a closed-form analytical expression for U(\theta, \theta') for the case where X and Y are jointly Gaussian and related by a linear transformation. We note that, in general, a closed-form analytical expression cannot be found, and thus the computation of U(\theta, \theta') at each iteration cycle generally requires multiple integration. The Appendix therefore has importance of its own, since it covers a wide range of problems.

We note that the EM algorithm is not uniquely specified. The transformation H(\cdot) relating X to Y can be any noninvertible transformation, so there are many possible "complete" data specifications X that will generate the observed data Y. However, the choice of "complete" data may critically affect the complexity and the rate of convergence of the algorithm, and an unfortunate choice may yield a completely useless algorithm. In the next section, it will be shown that for the class of problems involving superimposed signals, there is a natural choice of "complete" data, leading to a surprisingly simple algorithm for extracting the ML parameter estimates.

III. PARAMETER ESTIMATION OF SUPERIMPOSED SIGNALS IN NOISE

The general problem of interest here may be characterized using the following model:

  y(t) = \sum_{k=1}^{K} s_k(t; \theta_k) + n(t)    (13)

where \theta_k is the vector of unknown parameters associated with the kth signal component and n(t) denotes the additive noise. This model covers a wide range of problems arising in array and signal processing, including multitone frequency estimation, multipath time-delay estimation, and multiple source location estimation. Given observations of y(t), we want to find the ML estimate of \theta_1, \theta_2, ..., \theta_K jointly. We shall now consider the cases of deterministic (known) signals and stochastic Gaussian signals separately.

A. Deterministic Signals

Consider the model of (13) under the following assumptions.

1) The s_k(t; \theta_k), k = 1, 2, ..., K, are known up to the vector parameters \theta_k.

2) n(t) is a vector zero-mean white Gaussian process whose covariance matrix is E\{n(t) n^*(\sigma)\} = Q \delta(t - \sigma).

Under these assumptions, the log-likelihood function is given by (e.g., see [9])

  L(\theta) = c - \frac{\lambda}{2} \int_T [y(t) - \sum_{k=1}^{K} s_k(t; \theta_k)]^* Q^{-1} [y(t) - \sum_{k=1}^{K} s_k(t; \theta_k)] \, dt    (14)

where \lambda = 1 if n(t) is real-valued, \lambda = 2 if n(t) is complex-valued, and c is a normalizing constant. In the case of discrete-time observations, L(\theta) is still given by (14), with the integral replaced by the sum over all discrete points t \in T. Thus, to obtain the ML estimates of the various \theta_k's, one must solve

  \min_{\theta_1, \theta_2, ..., \theta_K} \int_T [y(t) - \sum_{k=1}^{K} s_k(t; \theta_k)]^* Q^{-1} [y(t) - \sum_{k=1}^{K} s_k(t; \theta_k)] \, dt.    (15)

This is a complicated multiparameter optimization problem. Of course, brute force can always be used to solve the problem: evaluate the objective function on a coarse grid to roughly locate the global minimum, and then apply the Gauss method, Newton-Raphson, or some other gradient-search iterative algorithm. However, when applied to the problem at hand, these methods tend to be computationally complex and time consuming.

Having the EM algorithm in mind, we want to simplify the computations involved. To apply the algorithm to the problem at hand, the first step is to specify the "complete" data. A natural choice for the "complete" data x(t) is obtained by decomposing the observed data y(t) into its signal components; that is,

  x(t) = [x_1^T(t) \; x_2^T(t) \; \cdots \; x_K^T(t)]^T    (16)

where
  x_k(t) = s_k(t; \theta_k) + n_k(t)    (17)

and the n_k(t) are obtained by arbitrarily decomposing the total noise n(t) into K components, so that

  n(t) = \sum_{k=1}^{K} n_k(t).    (18)

From (13), (17), and (18), the relation between the "complete" data x(t) and the incomplete data y(t) is given by

  y(t) = \sum_{k=1}^{K} x_k(t) = H \cdot x(t)    (19)

where

  H = [I \; I \; \cdots \; I] \quad (K \text{ terms}).    (20)

We shall find it most convenient to let the n_k(t) be statistically independent, zero-mean, and Gaussian with covariance matrices E\{n_k(t) n_k^*(\sigma)\} = Q_k \delta(t - \sigma), where Q_k = \beta_k Q and the \beta_k's are arbitrary real-valued scalars satisfying

  \sum_{k=1}^{K} \beta_k = 1, \quad \beta_k \ge 0.    (21)

In that case, the log-likelihood of the complete data x(t) is

  \log f_X(x; \theta) = d - \frac{\lambda}{2} \int_T [x(t) - s(t; \theta)]^* \Lambda^{-1} [x(t) - s(t; \theta)] \, dt
                      = d - \frac{\lambda}{2} \Big[ \int_T x^*(t) \Lambda^{-1} x(t) \, dt - \int_T s^*(t; \theta) \Lambda^{-1} x(t) \, dt - \int_T x^*(t) \Lambda^{-1} s(t; \theta) \, dt + \int_T s^*(t; \theta) \Lambda^{-1} s(t; \theta) \, dt \Big]    (22)

where d contains all the terms that are independent of \theta, and s(t; \theta) and \Lambda are the mean and the covariance matrix of x(t), given respectively by

  s(t; \theta) = [s_1^T(t; \theta_1) \; s_2^T(t; \theta_2) \; \cdots \; s_K^T(t; \theta_K)]^T    (23)

  \Lambda = \operatorname{diag}(Q_1, Q_2, ..., Q_K)    (24)

where the notation in (24) indicates that \Lambda is a block-diagonal matrix. Taking the conditional expectation of (22) given y(t) at a parameter value \theta',

  U(\theta, \theta') = e - \frac{\lambda}{2} \int_T [\hat{x}(t) - s(t; \theta)]^* \Lambda^{-1} [\hat{x}(t) - s(t; \theta)] \, dt    (25)

where

  \hat{x}(t) = E\{x(t) \mid y(t); \theta'\}    (26)

and e is a constant independent of \theta; only the terms of (22) that are linear in x(t) depend on \theta, so replacing x(t) by \hat{x}(t) and absorbing the remaining quadratic terms into e leaves the \theta-dependence unchanged. Since x(t) and y(t) are jointly Gaussian and related by the linear transformation y(t) = H x(t), then using (A7) of the Appendix,

  \hat{x}(t) = s(t; \theta') + \Lambda H^* (H \Lambda H^*)^{-1} [y(t) - H s(t; \theta')].    (27)

Substituting (23) and (24) into (25) and performing straightforward matrix manipulations, we obtain

  U(\theta, \theta') = e - \frac{\lambda}{2} \sum_{k=1}^{K} \int_T [\hat{x}_k(t) - s_k(t; \theta_k)]^* Q_k^{-1} [\hat{x}_k(t) - s_k(t; \theta_k)] \, dt    (28)

where the \hat{x}_k(t) are the components of \hat{x}(t), given by

  \hat{x}_k(t) = s_k(t; \theta_k') + \beta_k [y(t) - \sum_{l=1}^{K} s_l(t; \theta_l')].    (29)

Observing that the maximization of U(\theta, \theta') with respect to \theta is equivalent to the minimization of each of the terms in the k-sum of (28) separately, the EM algorithm assumes the following form.

  E step: For k = 1, 2, ..., K, compute

    \hat{x}_k^{(n)}(t) = s_k(t; \hat{\theta}_k^{(n)}) + \beta_k [y(t) - \sum_{l=1}^{K} s_l(t; \hat{\theta}_l^{(n)})].    (30)

  M step: For k = 1, 2, ..., K,

    \min_{\theta_k} \int_T [\hat{x}_k^{(n)}(t) - s_k(t; \theta_k)]^* Q^{-1} [\hat{x}_k^{(n)}(t) - s_k(t; \theta_k)] \, dt \rightarrow \hat{\theta}_k^{(n+1)}.    (31)

We observe that \hat{\theta}_k^{(n+1)} is, in fact, the ML estimate of \theta_k based on \hat{x}_k^{(n)}(t). The algorithm is illustrated in Fig. 1. [Fig. 1. The EM algorithm for deterministic (known) signals.] We note that in the case of discrete observations, the integral in (31) is replaced by the sum over all points t \in T.

The most striking feature of the algorithm is that it decouples the complicated multiparameter optimization into K separate ML optimizations. Hence, the complexity of the algorithm is essentially unaffected by the assumed number of signal components. As K increases, we have to increase the number of ML processors operating in parallel; however, each ML processor is maximized separately. Since the algorithm is based on the EM method, each iteration cycle increases the likelihood until convergence is accomplished. We note that the \beta_k's must satisfy the constraint stated in (21), but otherwise they are arbitrary free variables in the algorithm. The \beta_k's can be used to control the rate of convergence of the algorithm and possibly to avoid convergence to an unwanted stationary point. These considerations are currently under investigation.
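As an illustration of (30) and (31), the following is a minimal Python sketch for scalar parameters \theta_k and a single-channel observation, with white noise so that Q drops out of the per-component fit. The signal-model callable sig, the candidate grid used for the per-component M step, and the equal-\beta_k choice are all illustrative assumptions; any per-component ML fit could replace the grid search.

  import numpy as np

  def em_superimposed(y, t, sig, theta, grid, beta=None, n_iter=20):
      # EM for y(t) = sum_k sig(t, theta_k) + white noise, following
      # the E step (30) and M step (31).  sig(t, th) returns the k-th
      # signal component for scalar parameter th.
      K = len(theta)
      beta = np.full(K, 1.0 / K) if beta is None else np.asarray(beta)
      theta = np.array(theta, dtype=float)
      for _ in range(n_iter):
          # residual uses all components at the current iterate n
          total = sum(sig(t, theta[k]) for k in range(K))
          for k in range(K):
              # E step (30): decompose y into the k-th signal estimate
              x_k = sig(t, theta[k]) + beta[k] * (y - total)
              # M step (31): per-component least-squares fit on the grid
              errs = [np.sum(np.abs(x_k - sig(t, g)) ** 2) for g in grid]
              theta[k] = grid[int(np.argmin(errs))]
      return theta

For instance, with sig = lambda t, th: np.sin(th * t) (a hypothetical choice), the routine estimates the frequencies of superimposed tones; each pass touches every component once but never searches the joint K-dimensional space.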
B. Gaussian Signals

Consider the model of (13) under the following assumptions.

1) n(t) and the s_k(t; \theta_k), k = 1, 2, ..., K, are mutually uncorrelated, wide-sense stationary (WSS), zero-mean Gaussian vector stochastic processes whose spectral density matrices are N(\omega) and S_k(\omega; \theta_k), k = 1, 2, ..., K, respectively.

2) The observation interval T is long compared to the correlation time (inverse bandwidth) of the signals and the noise, i.e., WT/2\pi >> 1.

Fourier analyzing y(t) over the interval T, we obtain the Fourier coefficients Y(\omega_l). Under the above assumptions, y(t) is WSS, zero-mean, and Gaussian. Thus, for WT/2\pi >> 1, the Y(\omega_l)'s are asymptotically uncorrelated, zero-mean, and Gaussian with covariance matrices P(\omega_l; \theta), where P(\omega; \theta) is the spectral density matrix of y(t), given by

  P(\omega; \theta) = \sum_{k=1}^{K} S_k(\omega; \theta_k) + N(\omega).    (33)

The log-likelihood function given the Y(\omega_l)'s is therefore

  L(\theta) = - \sum_l [\log \det \pi P(\omega_l; \theta) + Y^*(\omega_l) P^{-1}(\omega_l; \theta) Y(\omega_l)]    (34)

where the summation in (34) is carried over all \omega_l in the signal frequency band. In the case of discrete observations y_i = y(i \Delta t), the log-likelihood is still given by (34), where the Y(\omega_l)'s are the discrete Fourier transform (DFT) of the y_i's. To obtain the ML estimates of the various \theta_k's, we must therefore solve

  \min_{\theta_1, \theta_2, ..., \theta_K} \sum_l [\log \det P(\omega_l; \theta) + Y^*(\omega_l) P^{-1}(\omega_l; \theta) Y(\omega_l)].    (35)

We want to use the EM method to bypass this complicated multiparameter optimization. Following the same considerations as in the deterministic signal case, let the "complete" data x(t) be given by

  x(t) = [x_1^T(t) \; x_2^T(t) \; \cdots \; x_K^T(t)]^T    (36)

where

  x_k(t) = s_k(t; \theta_k) + n_k(t)    (37)

and the n_k(t) are chosen to be mutually uncorrelated, zero-mean, and Gaussian with respective spectral density matrices N_k(\omega) = \beta_k N(\omega), where the \beta_k's are arbitrary real-valued constants subject to (21). Following the same considerations leading to (34), the log-likelihood of the "complete" data is

  \log f_X(x; \theta) = - \sum_l [\log \det \pi \Lambda(\omega_l; \theta) + X^*(\omega_l) \Lambda^{-1}(\omega_l; \theta) X(\omega_l)]
                      = - \sum_l [\log \det \pi \Lambda(\omega_l; \theta) + \operatorname{tr}(\Lambda^{-1}(\omega_l; \theta) X(\omega_l) X^*(\omega_l))]    (38)

where

  X(\omega) = [X_1^T(\omega) \; X_2^T(\omega) \; \cdots \; X_K^T(\omega)]^T    (39)

the X_k(\omega) being the Fourier coefficients of the x_k(t),

  \Lambda(\omega; \theta) = \operatorname{diag}(\Lambda_1(\omega; \theta_1), ..., \Lambda_K(\omega; \theta_K))    (40)-(41)

where the notation indicates a block-diagonal matrix whose kth block is

  \Lambda_k(\omega; \theta_k) = S_k(\omega; \theta_k) + \beta_k N(\omega).    (42)

Taking the conditional expectation of (38) given the Y(\omega_l)'s at a parameter value \theta',

  U(\theta, \theta') = - \sum_l [\log \det \pi \Lambda(\omega_l; \theta) + \operatorname{tr}(\Lambda^{-1}(\omega_l; \theta) \widehat{X X^*}(\omega_l))]    (43)

where \widehat{X X^*}(\omega_l) = E\{X(\omega_l) X^*(\omega_l) \mid Y(\omega_l); \theta'\} can be computed using (A8) of the Appendix, and P(\omega_l; \theta), as defined in (33), is the covariance of Y(\omega_l). Exploiting the block-diagonal form of \Lambda(\omega; \theta), (43) becomes

  U(\theta, \theta') = - \sum_{k=1}^{K} \sum_l [\log \det \pi \Lambda_k(\omega_l; \theta_k) + \operatorname{tr}(\Lambda_k^{-1}(\omega_l; \theta_k) \widehat{X_k X_k^*}(\omega_l))].    (46)

Observing that the maximization of U(\theta, \theta') with respect to \theta is equivalent to the minimization of each of the terms in the k-sum of (46) separately, the EM algorithm assumes the following form.

  E step: For k = 1, 2, ..., K, compute

    \widehat{X_k X_k^*}^{(n)}(\omega_l) = \Lambda_k(\omega_l; \hat{\theta}_k^{(n)}) - \Lambda_k(\omega_l; \hat{\theta}_k^{(n)}) P^{-1}(\omega_l; \hat{\theta}^{(n)}) \Lambda_k(\omega_l; \hat{\theta}_k^{(n)}) + \Lambda_k(\omega_l; \hat{\theta}_k^{(n)}) P^{-1}(\omega_l; \hat{\theta}^{(n)}) Y(\omega_l) Y^*(\omega_l) P^{-1}(\omega_l; \hat{\theta}^{(n)}) \Lambda_k(\omega_l; \hat{\theta}_k^{(n)}).    (48)

  M step: For k = 1, 2, ..., K,

    \min_{\theta_k} \sum_l [\log \det \Lambda_k(\omega_l; \theta_k) + \operatorname{tr}(\Lambda_k^{-1}(\omega_l; \theta_k) \widehat{X_k X_k^*}^{(n)}(\omega_l))] \rightarrow \hat{\theta}_k^{(n+1)}.    (49)

We observe that \hat{\theta}_k^{(n+1)} is the ML estimate of \theta_k, where the sufficient statistic X_k(\omega_l) X_k^*(\omega_l) is substituted by its current estimate \widehat{X_k X_k^*}^{(n)}(\omega_l). The algorithm is illustrated in Fig. 2. [Fig. 2. The EM algorithm for stochastic Gaussian signals.]

The most attractive feature of the algorithm is that it decouples the full multidimensional optimization (35) into optimizations in smaller-dimensional parameter subspaces. Thus, as in the deterministic signal case, the complexity of the algorithm is essentially unaffected by the assumed number of signal components. As K increases, we have to increase the number of ML processors operating in parallel; however, each ML processor is maximized separately. Since the algorithm is based on the EM method, each iteration cycle increases the likelihood until convergence is accomplished.
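A sketch of the per-frequency E step (48) in Python follows. It exploits the fact that, by (33) and (42) together with (21), the component spectral matrices sum to P, i.e., \sum_k \Lambda_k = P. The M step (49) is omitted here, since it depends entirely on how S_k(\omega; \theta_k) is parameterized; the list-of-matrices interface is an assumption of this sketch.

  import numpy as np

  def e_step_bin(Y, Lams):
      # E step (48) at one frequency bin.  Y: length-M vector of
      # Fourier coefficients; Lams: list of the K component spectral
      # matrices Lambda_k = S_k + beta_k * N at the current estimates.
      # Note sum_k Lambda_k = P, the spectral matrix of y.
      P = sum(Lams)
      Pinv = np.linalg.inv(P)
      u = Pinv @ Y
      stats = []
      for L in Lams:
          xk = L @ u  # conditional mean of X_k given Y
          # Lambda_k - Lambda_k P^-1 Lambda_k + (L P^-1 Y)(L P^-1 Y)^H
          XkXk = L - L @ Pinv @ L + xk[:, None] @ xk[None, :].conj()
          stats.append(XkXk)
      return stats  # feed each entry into the per-component M step (49)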
IV. APPLICATION TO MULTIPATH TIME-DELAY ESTIMATION

Let the observed signal y(t) be modeled by

  y(t) = \sum_{k=1}^{K} \alpha_k s(t - \tau_k) + n(t).    (50)

This model characterizes multipath effects, where the transmitted signal s(t) is observed at the receiver through more than one path. We shall concentrate here on the case where s(t) is a deterministic known signal. The extension to the case where s(t) is deterministic but unknown, and to the case where s(t) is a sample function from a Gaussian random process, can be found in [11] and [1], respectively. We suppose that the additive noise n(t) is spectrally flat over the receiver frequency band. The problem is to estimate the pairs (\alpha_k, \tau_k), k = 1, 2, ..., K.

The direct ML approach requires the solution to [see (15)]

  \min_{\alpha_1, \tau_1, ..., \alpha_K, \tau_K} \int_T [y(t) - \sum_{k=1}^{K} \alpha_k s(t - \tau_k)]^2 \, dt.    (51)

This optimization problem is addressed in [10], where it is shown that, at the optimum, the \alpha_k's can be expressed in terms of the \tau_k's. Thus, the 2K-dimensional search can be reduced to a K-dimensional search. However, as pointed out in [10], for K \ge 3 the required computations become too involved. Consequently, ad hoc approaches and suboptimal solutions have been proposed. The most common solution consists of correlating y(t) with a replica of s(t) and searching for the K highest peaks of the correlation function. If the various paths are resolvable, i.e., if the difference between \tau_k and \tau_l is long compared to the temporal correlation of the signal for all combinations of k and l, this approach yields near-optimal estimates. However, in situations where the signal paths are unresolvable, this approach is distinctly suboptimal.

We identify the model in (50) as a special case of (13). Therefore, in correspondence with (30) and (31), we obtain the following algorithm.

  E step: For k = 1, 2, ..., K, compute

    \hat{x}_k^{(n)}(t) = \hat{\alpha}_k^{(n)} s(t - \hat{\tau}_k^{(n)}) + \beta_k [y(t) - \sum_{l=1}^{K} \hat{\alpha}_l^{(n)} s(t - \hat{\tau}_l^{(n)})].    (52)

  M step: For k = 1, 2, ..., K,

    \min_{\alpha_k, \tau_k} \int_T [\hat{x}_k^{(n)}(t) - \alpha_k s(t - \tau_k)]^2 \, dt \rightarrow \hat{\alpha}_k^{(n+1)}, \hat{\tau}_k^{(n+1)}.    (53)

Assuming that the observation interval T is long compared to the duration of the signal and the maximum expected delay, the two-parameter maximization required in (53) can be carried out in two steps, as follows:

  \hat{\tau}_k^{(n+1)} = \arg \max_{\tau} |\hat{g}_k^{(n)}(\tau)|    (54)

  \hat{\alpha}_k^{(n+1)} = \frac{1}{E} \hat{g}_k^{(n)}(\hat{\tau}_k^{(n+1)})    (55)

where E = \int_T |s(t)|^2 \, dt is the signal energy and

  \hat{g}_k^{(n)}(\tau) = \int_T \hat{x}_k^{(n)}(t) s^*(t - \tau) \, dt.    (56)

We note that \hat{g}_k^{(n)}(\tau) can be generated by passing \hat{x}_k^{(n)}(t) through a filter matched to s(t). The algorithm is illustrated in Fig. 3. [Fig. 3. The EM algorithm for multipath time-delay estimation.]

This computationally attractive algorithm decreases the objective function in (51) iteratively, without ever going through the indicated multiparameter optimization. The complexity of the algorithm is essentially unaffected by the assumed number of signal paths. As K increases, we have to increase the number of matched filters operating in parallel; however, each matched filter output is maximized separately. If the number K of signal paths is unknown, several criteria for its determination are developed in [12]. These criteria are based on the ML parameter estimates and can therefore be easily incorporated into the algorithm.
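The following is a minimal Python sketch of the iteration (52)-(56) for sampled data, with delays kept on the integer sample grid for simplicity (the paper's delays are continuous). The equal-\beta_k choice, the circular-shift helper, and the peak rule |g| in (54) (which also covers negative path amplitudes) are assumptions of this sketch, not the paper's exact implementation.

  import numpy as np

  def em_multipath(y, s, tau, alpha, n_iter=15, beta=None):
      # EM for y[t] = sum_k alpha_k s[t - tau_k] + noise, per (52)-(56).
      # y: length-N data; s: known pulse (len(s) <= N); tau, alpha:
      # initial integer delays and real amplitudes.
      y = np.asarray(y, dtype=float)
      s = np.asarray(s, dtype=float)
      N, K = len(y), len(tau)
      E = np.sum(s ** 2)                       # pulse energy, eq. (56)
      beta = np.full(K, 1.0 / K) if beta is None else np.asarray(beta)
      shift = lambda d: np.roll(np.pad(s, (0, N - len(s))), d)
      tau = np.array(tau, dtype=int)
      alpha = np.array(alpha, dtype=float)
      for _ in range(n_iter):
          total = sum(alpha[k] * shift(tau[k]) for k in range(K))
          for k in range(K):
              # E step (52): current estimate of the k-th path signal
              x_k = alpha[k] * shift(tau[k]) + beta[k] * (y - total)
              # Matched filter (56): correlate x_k against delayed pulses
              g = np.array([x_k @ shift(d) for d in range(N - len(s) + 1)])
              # M step (54), (55): peak location, then amplitude
              tau[k] = int(np.argmax(np.abs(g)))
              alpha[k] = g[tau[k]] / E
      return tau, alpha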
To demonstrate the performance of the algorithm, we have considered the following example. The observed signal y(t) consists of three signal paths in additive noise,

  y(t) = \sum_{k=1}^{3} \alpha_k s(t - \tau_k) + n(t)

where s(t) is a trapezoidal pulse that rises linearly for 0 \le t < 5, is constant for 5 \le t \le 15, and decays linearly to zero for 15 < t \le 20. The actual delays are \tau_1 = 0, \tau_2 = 5, \tau_3 = 10, and the amplitude scales are \alpha_k = 1, k = 1, 2, 3. The additive noise is spectrally flat with a spectral level of N = 0.025 (so that the postintegration signal-to-noise ratio (SNR) per pulse is approximately 16 dB). The observed data consist of 100 time samples, as illustrated in Fig. 4. [Fig. 4. The observed data.] In Fig. 5 we have plotted the matched filter output as a function of delay. [Fig. 5. The conventional matched filter response.] As we can see, the conventional method cannot resolve the various signal paths and estimate their parameters.

First, we have computed the ML estimates by direct minimization of the objective function in (51) using exhaustive search. We obtain

  \hat{\tau}_1 = 0.0117, \quad \hat{\tau}_2 = 5.0031, \quad \hat{\tau}_3 = 9.9884
  \hat{\alpha}_1 = 1.1511, \quad \hat{\alpha}_2 = 0.7799, \quad \hat{\alpha}_3 = 0.9471

and the value of the objective function at the minimum (corresponding to the value of the log-likelihood function at the maximum) is J = 0.45879. We have also computed lower bounds on the root-mean-square (rms) error of each parameter estimate using the Cramer-Rao inequality. We obtain

  \sigma(\hat{\tau}_1) = 0.028, \quad \sigma(\hat{\tau}_2) = 0.030, \quad \sigma(\hat{\tau}_3) = 0.028
  \sigma(\hat{\alpha}_1) = 0.076, \quad \sigma(\hat{\alpha}_2) = 0.079, \quad \sigma(\hat{\alpha}_3) = 0.076

where \sigma(\hat{\tau}_k) denotes the minimum attainable rms error in the \tau_k estimate, and \sigma(\hat{\alpha}_k) denotes the minimum attainable rms error in the \alpha_k estimate.

We now apply our algorithm. In Fig. 6 we have plotted the matched filter response to the various signal paths as they evolve with the iterations. [Fig. 6. The matched filter response to each signal path.] In Fig. 7 we have tabulated the results using several arbitrarily selected starting points, indicated by the first line of each table. [Fig. 7. Tables of results for the multipath time-delay estimation; the tabulated values are not legible in this copy.] We see that after 10-15 iterations, the algorithm converges, within the Cramer-Rao lower bound, to the ML estimates of all the unknown parameters, regardless of the initial guess. We note that the small differences in the final estimates result from the finite grid used for the optimization. Using the asymptotic efficiency and lack of bias of the ML estimates, we can claim with some confidence that the rms error performance of the algorithm is the minimum attainable, as characterized by the Cramer-Rao lower bound.
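For readers who wish to reproduce the flavor of this example, the following Python sketch generates a comparable data set and the conventional matched filter output. The exact ramp slopes of the printed pulse definition are not fully legible, so the unit-plateau trapezoid below is an assumption consistent with the stated breakpoints; likewise, equating the per-sample noise variance with the printed spectral level 0.025 is a simplification.

  import numpy as np

  rng = np.random.default_rng(0)

  # Trapezoidal pulse on a unit-sample grid: linear rise over [0, 5),
  # unit plateau on [5, 15], linear fall over (15, 20].
  t = np.arange(21, dtype=float)
  s = np.clip(np.minimum(t / 5.0, (20.0 - t) / 5.0), 0.0, 1.0)

  N, taus, alphas = 100, [0, 5, 10], [1.0, 1.0, 1.0]
  y = np.zeros(N)
  for a, d in zip(alphas, taus):
      y[d:d + len(s)] += a * s
  y += rng.normal(scale=np.sqrt(0.025), size=N)   # flat noise

  # Conventional matched filter: the three paths merge into one broad
  # peak, since the 5-sample path separation is short compared to the
  # 20-sample pulse width; this is the failure the EM iteration fixes.
  mf = np.array([y @ np.roll(np.pad(s, (0, N - len(s))), d)
                 for d in range(N - len(s) + 1)])
  print(int(np.argmax(mf)))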
V. PASSIVE MULTIPLE SOURCE LOCATION ESTIMATION

The basic system of interest here consists of K spatially distributed sources radiating noise-like signals toward an array of M spatially distributed sensors. Assuming perfect propagation conditions in the medium, and ignoring amplitude attenuations of the signal wavefronts across the array, the actual waveform observed at the mth sensor output is

  y_m(t) = \sum_{k=1}^{K} s_k(t - \tau_{km}) + n_m(t), \quad m = 1, 2, ..., M    (57)

where s_k(t) is the kth source signal, n_m(t) is the additive noise at the mth sensor output, and \tau_{km} is the travel time of the signal wavefront from the kth source to the mth sensor. Information concerning the various source location parameters can be extracted by measuring the various \tau_{km}. In the passive case, one can only measure travel time differences, obtainable by selecting one sensor as a reference and comparing its output to that of every other sensor. If we let sensor M be the reference and set \tau_{kM} = 0, then \tau_{km} measures the travel time difference of the kth signal wavefront to the (m, M) sensor pair.

To simplify the exposition, suppose that the various signal sources are sufficiently far-field that the observed signal wavefronts are essentially planar across the array. If we further suppose that the array sensors are collinear, then

  \tau_{km} = \frac{d_m}{c} \sin \theta_k = \gamma_m \sin \theta_k    (58)

where d_m is the spacing between sensors m and M, c is the velocity of propagation in the medium, \gamma_m = d_m / c, and \theta_k is the angle of arrival of the kth signal wavefront relative to the boresight. Substituting (58) into (57) and concatenating the various equations, we obtain

  y(t) = \sum_{k=1}^{K} s_k(t; \theta_k) + n(t)    (59)

where y(t) = [y_1(t) \; \cdots \; y_M(t)]^T, n(t) = [n_1(t) \; \cdots \; n_M(t)]^T, and

  s_k(t; \theta_k) = [s_k(t - \gamma_1 \sin \theta_k) \; \cdots \; s_k(t - \gamma_M \sin \theta_k)]^T.    (60)

Suppose that the various s_k(t) and the various n_m(t) are mutually independent, WSS, zero-mean Gaussian random processes with spectral densities S_k(\omega) and N_m(\omega), respectively (the cases where the s_k(t) are deterministic known/unknown signals are presented in [2] and [11], respectively). We want to find the ML estimates of \theta_1, \theta_2, ..., \theta_K given observations of y(t). Assuming that the observation interval T is long compared to the correlation time (inverse bandwidth) of the signals and the noises, the direct ML approach requires the solution to [see (35)]

  \min_{\theta_1, \theta_2, ..., \theta_K} \sum_l [\log \det P(\omega_l; \theta) + Y^*(\omega_l) P^{-1}(\omega_l; \theta) Y(\omega_l)]    (61)

where the Y(\omega_l) are the Fourier transform coefficients [or the DFT, in the discrete case] of y(t), and

  P(\omega; \theta) = \sum_{k=1}^{K} S_k(\omega) V(\omega; \theta_k) V^*(\omega; \theta_k) + N(\omega)    (62)

where

  V(\omega; \theta_k) = [e^{j\omega\gamma_1 \sin \theta_k} \; e^{j\omega\gamma_2 \sin \theta_k} \; \cdots \; e^{j\omega\gamma_M \sin \theta_k}]^T    (63)

and, since \gamma_M = 0, the Mth (reference) component of V(\omega; \theta_k) is unity; N(\omega) = \operatorname{diag}(N_1(\omega), ..., N_M(\omega)) is the noise spectral matrix.    (64)

This is a complicated multiparameter optimization problem in several unknowns. Consequently, numerous ad hoc solutions and suboptimal approaches have been proposed in the recent literature (e.g., [13]-[18]).
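As a small aid to the notation of (62) and (63), here is a Python sketch of the steering vector and spectral matrix; the function names, the callable interfaces for S_k(\omega) and N(\omega), and the sign convention in the phase exponent are assumptions of this sketch.

  import numpy as np

  def steering(w, gammas, theta):
      # V(w; theta) of eq. (63); gammas[m] = d_m / c with the reference
      # sensor last (gamma_M = 0, so the last entry is 1).
      return np.exp(1j * w * np.asarray(gammas) * np.sin(theta))

  def spectral_matrix(w, gammas, thetas, S, Nmat):
      # P(w; theta) of eq. (62) for source spectra S[k] (callables)
      # and noise spectral matrix Nmat(w).
      P = Nmat(w).astype(complex)
      for k, th in enumerate(thetas):
          v = steering(w, gammas, th)
          P += S[k](w) * np.outer(v, v.conj())
      return P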
Still, the most common approach consists of beamforming and searching for the K highest peaks. If the various signal sources are widely separated, this approach is nearly optimal. However, if the source signals are closely spaced, we are likely to obtain poor estimates of the various \theta_k's.

Identifying the model in (59) as a special case of (13), the algorithm specified by (48) and (49) is directly applicable, with

  \Lambda_k(\omega; \theta_k) = S_k(\omega) V(\omega; \theta_k) V^*(\omega; \theta_k) + \beta_k N(\omega).    (65)

Now,

  \det \Lambda_k(\omega; \theta_k) = \Big[1 + \frac{1}{\beta_k} S_k(\omega) V^*(\omega; \theta_k) N^{-1}(\omega) V(\omega; \theta_k)\Big] \det[\beta_k N(\omega)]    (66)

and \Lambda_k^{-1}(\omega; \theta_k) can be written in closed form using the matrix inversion lemma.    (67)

Substituting (66) and (67) into (49) and ignoring the terms that are independent of \theta_k, the M step can be further simplified. The resulting algorithm is as follows.

  E step: For k = 1, 2, ..., K, compute

    \widehat{X_k X_k^*}^{(n)}(\omega_l) = \Lambda_k(\omega_l; \hat{\theta}_k^{(n)}) - \Lambda_k(\omega_l; \hat{\theta}_k^{(n)}) P^{-1}(\omega_l; \hat{\theta}^{(n)}) \Lambda_k(\omega_l; \hat{\theta}_k^{(n)}) + \Lambda_k(\omega_l; \hat{\theta}_k^{(n)}) P^{-1}(\omega_l; \hat{\theta}^{(n)}) Y(\omega_l) Y^*(\omega_l) P^{-1}(\omega_l; \hat{\theta}^{(n)}) \Lambda_k(\omega_l; \hat{\theta}_k^{(n)}).    (68)

  M step: For k = 1, 2, ..., K,

    \max_{\theta_k} \sum_l c_k(\omega_l) V^*(\omega_l; \theta_k) N^{-1}(\omega_l) \widehat{X_k X_k^*}^{(n)}(\omega_l) N^{-1}(\omega_l) V(\omega_l; \theta_k) \rightarrow \hat{\theta}_k^{(n+1)}    (69)

where c_k(\omega_l) is a positive frequency weighting determined by S_k(\omega_l), N(\omega_l), and \beta_k (its exact form is not legible in this copy). We note that the objective function in (69) is the array beamformer, where the product X_k(\omega_l) X_k^*(\omega_l) is substituted by its current estimate \widehat{X_k X_k^*}^{(n)}(\omega_l). The algorithm is illustrated in Fig. 8.

This computationally attractive algorithm decreases the objective function in (61) iteratively, without ever going through the indicated multiparameter optimization. The complexity of the algorithm is essentially unaffected by the assumed number of signal sources. As K increases, we have to increase the number of beamformers operating in parallel; however, each beamformer output is maximized separately.

To demonstrate the performance of the algorithm, we have considered the following example. The array consists of five collinear and evenly spaced sensors. There are two far-field signal sources at bearings

  \theta_1 = 0^\circ, \quad \theta_2 = 10^\circ

relative to the boresight. The array-source geometry is shown in Fig. 9. [Fig. 9. Array-source geometry.] The signals and the noises are spectrally flat with S_k(\omega) = S and N_m(\omega) = N over the frequency band [-W/2, W/2]. We assume that S/N = 1 and that WT/2\pi = 20 (so that the postintegration SNR per channel is approximately 23 dB). The array length is taken to be L = 6\lambda, where \lambda is the wavelength associated with the highest frequency in the signal band. In Fig. 10 we have plotted the array beamformer response as a function of the bearing angle. [Fig. 10. The conventional beamformer.] As we can see, the conventional beamformer cannot resolve the signal sources and estimate their bearings.

The ML estimates, obtained by direct minimization of the objective function in (61), are

  \hat{\theta}_1 = -0.0563, \quad \hat{\theta}_2 = 10.4556

and the value of the objective function at the minimum (corresponding to the value of the log-likelihood function at the maximum) is J = 159.0137. We have also computed the Cramer-Rao lower bound on the rms error of each parameter estimate. We obtain

  \sigma(\hat{\theta}_1) = 0.2680, \quad \sigma(\hat{\theta}_2) = 0.2722.

We now apply our algorithm. In Fig. 11 we have plotted the beamformer response to the various signal sources as they evolve with the iterations. [Fig. 11. The beamformer response to each signal source.] In Fig. 12 we have tabulated the results using several arbitrarily selected initial guesses. [Fig. 12. Tables of results for the multiple source angle-of-arrival estimation; the per-iteration bearing estimates and likelihood values are largely illegible in this copy.] We see that in all cases, after 5-10 iterations, the algorithm essentially converges, within the Cramer-Rao lower bound, to the ML estimates of all the unknown bearing parameters simultaneously, and the various signal sources are correctly resolved.
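A possible realization of the iteration (68)-(69) is sketched below in Python, reusing the hypothetical steering and spectral-model interfaces introduced earlier. The M step scans a bearing grid with the beamformer objective V^* N^{-1} \widehat{X_k X_k^*} N^{-1} V summed uniformly over frequency; the uniform frequency weighting is a simplification of the weighting c_k(\omega_l) in (69), so this is a sketch of the structure, not the paper's exact processor.

  import numpy as np

  def em_bearings(Yf, ws, gammas, thetas, S, Nmat, betas, grid, n_iter=10):
      # Yf[l]: array snapshot (length M) at frequency ws[l].
      # thetas: initial bearings; grid: candidate bearings for the M step.
      K = len(thetas)
      thetas = np.array(thetas, dtype=float)
      for _ in range(n_iter):
          scores = np.zeros((K, len(grid)))
          for l, w in enumerate(ws):
              Ninv = np.linalg.inv(Nmat(w))
              V = lambda th: np.exp(1j * w * np.asarray(gammas) * np.sin(th))
              # Lambda_k of (65) at the current bearing estimates
              Lams = [S[k](w) * np.outer(V(thetas[k]), V(thetas[k]).conj())
                      + betas[k] * Nmat(w) for k in range(K)]
              Pinv = np.linalg.inv(sum(Lams))   # sum_k Lambda_k = P
              u = Pinv @ Yf[l]
              for k in range(K):
                  L = Lams[k]
                  xk = L @ u
                  # E step (68)
                  Xk = L - L @ Pinv @ L + xk[:, None] @ xk[None, :].conj()
                  # M step (69), simplified uniform weighting: steer the
                  # beamformer over the grid against the k-th estimate
                  for i, th in enumerate(grid):
                      v = Ninv @ V(th)
                      scores[k, i] += (v.conj() @ Xk @ v).real
          thetas = np.array([grid[int(np.argmax(scores[k]))]
                             for k in range(K)])
      return thetas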
VI. DISCUSSION

We have presented a computationally efficient algorithm for ML parameter estimation of superimposed signals based on the EM method. The algorithm is developed for the case of deterministic (known) signals and for the case of stationary Gaussian signals. The most striking feature of the algorithm is that it decouples the full multidimensional search associated with the direct ML approach into searches in smaller-dimensional parameter subspaces, leading to a considerable simplification of the computations involved. The algorithm is applied to the multipath time-delay and multiple source location estimation problems; in both cases, we demonstrate the performance of the algorithm in a given example and show that the algorithm converges iteratively to the exact ML estimates of all the unknown parameters simultaneously, where each iteration increases the likelihood. We finally note that the derivation of the EM algorithm for the linear-Gaussian case (the Appendix) has importance of its own, since it covers a wide range of applications.

APPENDIX
DEVELOPMENT OF THE EM ALGORITHM FOR THE LINEAR-GAUSSIAN CASE

Suppose the "complete" data X and the observed (incomplete) data Y are related by the linear transformation

  Y = H X    (A1)

where H is a noninvertible matrix and X possesses a multivariate Gaussian probability density with mean m(\theta) and covariance matrix \Lambda(\theta),

  f_X(x; \theta) \propto [\det \Lambda(\theta)]^{-\lambda/2} \exp\{-\frac{\lambda}{2} [x - m(\theta)]^* \Lambda^{-1}(\theta) [x - m(\theta)]\}    (A2)

where \lambda = 1 if X is real-valued and \lambda = 2 if X is complex-valued. Taking the logarithm of (A2),

  \log f_X(x; \theta) = c - \frac{\lambda}{2} [\log \det \Lambda(\theta) + m^*(\theta) \Lambda^{-1}(\theta) m(\theta) - x^* \Lambda^{-1}(\theta) m(\theta) - m^*(\theta) \Lambda^{-1}(\theta) x + \operatorname{tr}(\Lambda^{-1}(\theta) x x^*)]    (A3)

where c is a constant independent of \theta. Taking the conditional expectation of (A3) given Y = y at a parameter value \theta',

  U(\theta, \theta') = c - \frac{\lambda}{2} [\log \det \Lambda(\theta) + m^*(\theta) \Lambda^{-1}(\theta) m(\theta) - \hat{x}^* \Lambda^{-1}(\theta) m(\theta) - m^*(\theta) \Lambda^{-1}(\theta) \hat{x} + \operatorname{tr}(\Lambda^{-1}(\theta) \widehat{x x^*})]    (A4)

where

  \hat{x} = E\{X \mid Y = y; \theta'\}    (A5)

and

  \widehat{x x^*} = E\{X X^* \mid Y = y; \theta'\}.    (A6)

Since X and Y are related by a linear transformation, they are jointly Gaussian, and the conditional expectations required in (A5) and (A6) can be computed by a straightforward modification of existing results.
We obtain

  \hat{x} = m(\theta') + \Lambda(\theta') H^* [H \Lambda(\theta') H^*]^{-1} [y - H m(\theta')]    (A7)

  \widehat{x x^*} = \Lambda(\theta') - \Lambda(\theta') H^* [H \Lambda(\theta') H^*]^{-1} H \Lambda(\theta') + \hat{x} \hat{x}^*.    (A8)

Note that if we set \theta' = \theta, (A7) and (A8) are the well-known formulas for the conditional expectations in the Gaussian case (e.g., [19]).

Substituting (A7) and (A8) into (A4), the function U(\theta, \theta') required for the EM algorithm is given in closed form. We observe that U(\theta, \theta') and \log f_X(x; \theta) have the same dependence on \theta. Maximizing U(\theta, \theta') with respect to \theta is therefore the same as maximizing \log f_X(x; \theta) with respect to \theta. Hence, the EM algorithm essentially requires the ML solution in the X model, which might be significantly simpler than the direct ML solution in the Y model. In correspondence with (11) and (12), the EM algorithm for the linear-Gaussian case is given by the following.

  E step:

    \hat{x}^{(n)} = m(\hat{\theta}^{(n)}) + \Lambda(\hat{\theta}^{(n)}) H^* [H \Lambda(\hat{\theta}^{(n)}) H^*]^{-1} [y - H m(\hat{\theta}^{(n)})]    (A9)

    \widehat{x x^*}^{(n)} = \Lambda(\hat{\theta}^{(n)}) - \Lambda(\hat{\theta}^{(n)}) H^* [H \Lambda(\hat{\theta}^{(n)}) H^*]^{-1} H \Lambda(\hat{\theta}^{(n)}) + \hat{x}^{(n)} \hat{x}^{(n)*}.    (A10)

  M step:

    \min_{\theta} \{\log \det \Lambda(\theta) + m^*(\theta) \Lambda^{-1}(\theta) m(\theta) - \hat{x}^{(n)*} \Lambda^{-1}(\theta) m(\theta) - m^*(\theta) \Lambda^{-1}(\theta) \hat{x}^{(n)} + \operatorname{tr}(\Lambda^{-1}(\theta) \widehat{x x^*}^{(n)})\} \rightarrow \hat{\theta}^{(n+1)}.    (A11)
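For reference, a direct transcription of the E step (A7), (A8) into Python follows; the function name and argument interface are assumptions, and for long data records one would exploit the structure of H and \Lambda rather than invert H \Lambda H^* explicitly as done here.

  import numpy as np

  def e_step_linear_gaussian(y, H, m, Lam):
      # Conditional expectations (A7) and (A8) for Y = H X with
      # X ~ N(m, Lam), evaluated at the current parameter value:
      # xhat = E{X | Y = y} and XX_hat = E{X X* | Y = y}.
      G = Lam @ H.conj().T @ np.linalg.inv(H @ Lam @ H.conj().T)
      xhat = m + G @ (y - H @ m)
      XX_hat = Lam - G @ H @ Lam + np.outer(xhat, xhat.conj())
      return xhat, XX_hat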
ACKNOWLEDGMENT

The authors wish to thank D. Kelter and Y. Eyal for their help in the simulations, and the referees for many helpful comments.

REFERENCES

[1] M. Feder and E. Weinstein, "Optimal multiple source location estimation via the EM algorithm," in Proc. 1985 Int. Conf. Acoust., Speech, Signal Processing (ICASSP '85), Tampa, FL, Mar. 1985, pp. 1762-1765.
[2] M. Feder and E. Weinstein, "Multipath and multiple source array processing via the EM algorithm," in Proc. 1986 Int. Conf. Acoust., Speech, Signal Processing (ICASSP '86), Tokyo, Japan, Apr. 1986.
[3] J. R. Grindon, "Estimation theory for a class of nonlinear random vector channels," Ph.D. dissertation, Washington Univ., St. Louis, MO, 1970.
[4] J. S. Lim and A. V. Oppenheim, "All-pole modeling of degraded speech," IEEE Trans. Acoust., Speech, Signal Processing, vol. ASSP-26, pp. 197-210, 1978.
[5] M. Segal, "Resolution and estimation techniques with application to vectorial miss-distance indicator," M.Sc. thesis, Technion-Israel Inst. Technol., Haifa, Israel, Aug. 1981.
[6] A. P. Dempster, N. M. Laird, and D. B. Rubin, "Maximum likelihood from incomplete data via the EM algorithm," J. Roy. Stat. Soc., Ser. B, vol. 39, pp. 1-38, Dec. 1977.
[7] C. F. J. Wu, "On the convergence properties of the EM algorithm," Ann. Stat., vol. 11, pp. 95-103, 1983.
[8] I. Meilijson, "A fast improvement to the EM algorithm on its own terms," submitted for publication.
[9] H. L. Van Trees, Detection, Estimation, and Modulation Theory, Part I. New York: Wiley, 1968.
[10] J. E. Ehrenberg, T. E. Ewart, and R. D. Morris, "Signal processing techniques for resolving individual pulses in a multipath signal," J. Acoust. Soc. Amer., vol. 63, no. 6, pp. 1861-1865, 1978.
[11] M. Feder and E. Weinstein, "Optimal multiple source location estimation via the EM algorithm," Woods Hole Oceanographic Institution, Woods Hole, MA, Tech. Rep. WHOI-85-25, July 1985.
[12] M. Wax and T. Kailath, "Detection of signals by information theoretic criteria," IEEE Trans. Acoust., Speech, Signal Processing, vol. ASSP-33, pp. 387-392, 1985.
[13] S. S. Reddi, "Multiple source location - A digital approach," IEEE Trans. Aerosp. Electron. Syst., vol. AES-15, pp. 95-105, 1979.
[14] G. V. Borgiotti and L. J. Kaplan, "Superresolution of uncorrelated interference sources by using adaptive array techniques," IEEE Trans. Antennas Propagat., vol. AP-27, pp. 842-845, 1979.
[15] R. O. Schmidt, "A signal subspace approach to multiple emitter location and spectral estimation," Ph.D. dissertation, Stanford Univ., Stanford, CA, 1981.
[16] G. Su and M. Morf, "The signal subspace approach for multiple wide-band emitter location," IEEE Trans. Acoust., Speech, Signal Processing, vol. ASSP-31, pp. 1503-1522, 1983.
[17] B. Porat and B. Friedlander, "Estimation of spatial and spectral parameters of multiple sources," IEEE Trans. Inform. Theory, vol. IT-29, pp. 412-425, 1983.
[18] A. Nehorai, G. Su, and M. Morf, "Estimation of time difference of arrival for multiple ARMA sources by pole decomposition," IEEE Trans. Acoust., Speech, Signal Processing, vol. ASSP-31, pp. 1478-1491, 1983.
[19] B. D. O. Anderson and J. B. Moore, Optimal Filtering. Englewood Cliffs, NJ: Prentice-Hall, 1979.

Meir Feder received the B.Sc. and M.Sc. degrees (summa cum laude) from Tel-Aviv University, Tel-Aviv, Israel, and the Sc.D. degree from the Massachusetts Institute of Technology, Cambridge, and the Woods Hole Oceanographic Institution, Woods Hole, MA, all in electrical engineering, in 1980, 1984, and 1987, respectively. He is currently a Research Associate in the Department of Electrical Engineering and Computer Science at M.I.T. His research interests lie in the broad area of signal processing, stochastic estimation, and information theory. He is particularly interested in investigating new statistical estimation techniques and using them in signal processing applications, e.g., speech, image, and array processing problems.

Ehud Weinstein received the B.Sc. degree from the Technion-Israel Institute of Technology, Haifa, Israel, and the Ph.D. degree from Yale University, New Haven, CT, both in electrical engineering, in 1975 and 1978, respectively. He is currently an Associate Professor in the Department of Electronic Systems, Faculty of Engineering, Tel-Aviv University, Tel-Aviv, Israel, and an Adjunct Scientist with the Department of Ocean Engineering, Woods Hole Oceanographic Institution, Woods Hole, MA. His research interests are in the general area of statistical estimation theory and its application to signal and array processing. Dr. Weinstein is the co-recipient of the IEEE Acoustics, Speech, and Signal Processing Society's 1983 Senior Award.