IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 53, NO. 4, APRIL 2005

Convergence and Performance Analysis of Godard Family and Multimodulus Algorithms for Blind Equalization

Vinod Sharma, Senior Member, IEEE, and V. Naveen Raj, Student Member, IEEE

Abstract—We obtain the convergence of the Godard family [including the Sato and constant modulus (CM) algorithms] and the multimodulus algorithms (MMA) in a unified way. Our analysis also covers the CMA fractionally spaced equalizer (FSE). Our assumptions are quite realistic: the channel input can be asymptotically stationary and ergodic, the channel impulse response is finite and can be stationary and ergodic (this models fading channels), and the equalizer length is finite. The noise is independent and identically distributed (i.i.d.). The channel input can be discrete or continuous. Our approach allows us to approximate the whole trajectory of the equalizer coefficients. This provides estimates of the rate of convergence, and the system performance (symbol error rate) can be evaluated in both transience and steady state.

Index Terms—Blind equalization, CMA, convergence analysis, MMA, performance analysis, Sato algorithm.

Manuscript received May 26, 2003; revised April 16, 2004. Parts of this paper were presented at the IEEE International Control Conference, 2003. The associate editor coordinating the review of this manuscript and approving it for publication was Dr. Behrouz Farhang-Boroujeny. V. Sharma is with the Department of Electrical Communication Engineering, Indian Institute of Science, Bangalore, India (e-mail: vinod@ece.iisc.ernet.in). V. N. Raj is with the Department of Electrical Communication Engineering, Indian Institute of Science, Bangalore, India, and also with Philips Consumer Electronics, Bangalore, India (e-mail: naveen@pal.ece.iisc.ernet.in). Digital Object Identifier 10.1109/TSP.2005.843725

I. INTRODUCTION

ADAPTIVE equalizers have become an integral part of today's communication systems. They are used to cancel the effect of intersymbol interference (ISI) in the channel. Traditionally, training sequences have been used to estimate the tap weights of equalizers. However, this consumes some channel bandwidth. If the channel is time varying, as in cellular mobile systems, the training sequence must be sent frequently, and the resulting bandwidth loss can be significant. In addition, in certain situations, e.g., point-to-multipoint transmission, a training sequence, if transmitted to revive a disrupted link, may interfere with the broadcast. Therefore, blind equalizers have been suggested more recently: instead of using a training sequence, only general statistics of the channel input are used to adapt the equalizer. Some important current applications are in broadband access over copper in fiber-to-the-curb and very-high-rate digital subscriber line (VDSL) networks. The earliest blind equalization algorithm was proposed by Sato [21]. Subsequently, this algorithm was generalized by Benveniste et al. [3] and Godard [13], who defined the classes of algorithms called BGR (named after the three authors Benveniste, Goursat, and Ruget) and Godard algorithms, respectively. The constant modulus algorithm (CMA), which is one member of the Godard family, has been particularly widely implemented because of its simplicity and effectiveness.
Since then, other algorithms have also been proposed: the maximum likelihood receiver [10], Shalvi and Weinstein [23], Bussgang algorithms [2], multimodulus algorithms (MMA) [27], [28], subspace methods [24], [25], and linear prediction methods [1]. Good recent surveys of the topic can be found in [9], [12], and [24]. However, CMA continues to be a popular algorithm. Despite extensive research on the analysis of these algorithms, because of the nonlinearities and discontinuities in the cost functions, further work is required. In this paper, we concentrate on the Sato algorithm (because of its historical importance and because it continues to be a challenge due to discontinuities) and the CMA. Since the MMA, which is an improvement over the CMA, and the CMA fractionally spaced equalizer (FSE) can be handled in the same way, we have included them in this paper as well. In the following, we survey the analytical studies available on these algorithms and then explain our contribution.

An early seminal paper on the analysis of the BGR algorithms is Benveniste et al. [3]. They obtained convergence of the algorithms under the assumptions that the equalizer has a doubly infinite parametrization and that the channel input is continuous and super/sub-Gaussian. These assumptions are relaxed in Ding et al. [8], who provide local convergence of the algorithm for an independent identically distributed (i.i.d.) input to the channel. They also show that under their weaker conditions these algorithms can ill-converge, i.e., converge to an undesired limit point in the case of a finite-length equalizer. The Sato algorithm has also been studied by Weerackody et al. [26]. Instead of studying the convergence of the equalizer weights, they study the mean square error of the equalizer output. They also assume that the channel input is an i.i.d. sequence, that the channel output is conditionally Gaussian given the input, and that the equalizer weight vectors are independent of the equalizer input. This approach has been extended by Cusani and Laurenti [6] to the CMA and by Garth [11] to the MMA. Macchi and Eweda [14] also analyzed the Sato algorithm when the channel has small ISI. Ding et al. [7] show that even the Godard algorithm can converge to an undesirable equilibrium point. Rupp and Sayed [20] prove convergence of stop-and-go variants of the Sato and CM algorithms under the assumption of no noise and bounded channel outputs. Yang et al. [27] is an extensive study of the MMA; we supplement their work by providing the convergence of MMA. See the above-mentioned surveys for up-to-date results on this problem.

In this paper, we prove convergence of the Sato and the other algorithms of the Godard family. Then, we show that our proofs extend to the CMA fractionally spaced equalizers (FSEs) and the MMA as well. We show that the trajectory of the equalizer weight coefficients converges to the solution of an ordinary differential equation (ODE) over any finite time. Due to the constant step size and nonzero noise, even though the ODE may converge to an attractor, the equalizer will eventually come out of its neighborhood and may go to another attractor.
Our results hold under more general assumptions than the previous studies: the input sequence to the channel can be (asymptotically) stationary and ergodic and can have a continuous or discrete distribution. This is more realistic than the i.i.d. assumption that is usually made. Often, the input to a channel is coded (e.g., convolutionally coded), and then it is (asymptotically) stationary and ergodic but not i.i.d. In addition, we work directly on the trajectory of the equalizer coefficients rather than on the convolution of the channel and equalizer coefficients, as is often done in the literature. This has some well-known advantages [9]. We do not assume that the equalizer weight vectors are independent of the channel output, as assumed in [6], [11], and [26]. In addition, our convergence proof is explicit and quite general. We can allow the channel impulse response to be random and time varying. This permits modeling of a fading channel, which one encounters in wireless networks. The convergence of the Sato algorithm, due to the discontinuity, has not been rigorously proved so far. Most convergence results on the other algorithms are also local convergence results, i.e., they apply when the initial guess of the equalizer weights is close to the equilibrium point. We do not need such an assumption. Our method also provides approximate knowledge of the trajectory of the equalizer coefficients. This tells us about the rate of convergence of the equalizer weights and, as we will show later, allows us to compute the performance [symbol error rate (SER)] of the corresponding system in transience and steady state. Performance of the CMA in steady state has also been studied recently in [15], [18], [22], and [29]. These works obtain the approximate location of the equalizer in steady state and bound its mean square error. It may be possible to obtain bounds on the SER from these results. However, in our study, we directly obtain the SER not only in steady state but also in transience.

The paper is organized as follows. In Section II, we specify our system for the Godard family and also describe our approach. In Section III, we prove the convergence of the Sato algorithm. Section IV proves convergence for the other members of the Godard family. In Section V, we discuss the performance of these algorithms in the transient and steady states. Section VI describes the setups used for the CMA FSE [9] and the MMA [27]; we then show how the convergence analysis of the CMA provided in Section IV extends to these systems. We also show that the CMA FSE may not have ill convergence, even when there is some channel noise. Section VII illustrates our results via simulations. In Sections V and VII, we limit ourselves to the Sato and CM algorithms. These are the main algorithms of the Godard family. Of course, the details can be carried out in the same way for the other algorithms as well.

Fig. 1. Communication system model with blind equalizer.

II. SYSTEM DESCRIPTION AND OUR APPROACH

In Section II-A, we describe our system and the equalizer algorithms we will analyze for the Godard family. Section VI will describe a modification of this system for the CMA FSE and the MMA. In Section II-B, we present our approach to prove the convergence of the algorithms.

A. System

We study the following communication system. The input symbols are passed through a pulse-shaping filter to produce the transmitted signal.
Let the channel impulse response be as shown in Fig. 1; then, the received signal can be written as (1), where the overall impulse response is that of the cascade of the pulse-shaping filter and the channel response. If we sample the received signal at a rate greater than the Nyquist rate, we obtain sufficient statistics; in this paper, we assume that this criterion is met. We then have a discrete-time model in which the channel input is a real-valued sequence that we want to transmit through the communication channel. The channel is assumed to be a linear time-invariant system (later on, we generalize it to time-varying fading channels) with a finite impulse response of a given order. An i.i.d. noise sequence is added to the channel output at the receiver. The resulting system is shown in Fig. 1.

Due to bandwidth limitations, the channel may exhibit ISI. An equalizer is used at the receiver to remove the distortion caused by the ISI. We consider only linear equalizers. To exactly remove the ISI, the transfer function of the equalizer should be the inverse of the channel transfer function. However, in blind equalization, we know neither the channel nor the input (no training sequence). Thus, an adaptive scheme is used to learn the proper equalizer coefficients as the input sequence is transmitted. The equalizer weight vector at time k is of finite dimension, and the equalizer output at time k is the inner product of the weight vector and the regressor formed from the current and past channel outputs. Often, we will consider a particular realization of these processes and denote it accordingly. We will assume the channel impulse response and the number of equalizer weights to be finite. Based on the received input, the equalizer is updated according to (2), where the update function in (2) is an appropriate function depending on the blind equalization algorithm used, and the step size is a constant. In general, the step size affects the rate of convergence of the weights to their steady-state values as well as the steady-state performance.

We consider the Godard family of algorithms. For this family, the update function in (2) is given by (3), where the order of the algorithm is a positive integer. Order one gives the Sato algorithm, and order two gives the CM algorithm. The update function for the Sato algorithm has a discontinuity, and hence, it is treated separately in Section III. The CMA and the other members of the family are considered in Section IV.
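To make the stochastic update in (2) and (3) concrete, the following sketch implements the standard Godard-family update rule on a toy channel. The channel, constellation, variable names, and dispersion-constant choices here are illustrative assumptions rather than quantities taken from the displayed equations; the usual conventions are followed, with order one giving the Sato update and order two the CMA update.

```python
import numpy as np

def godard_update(theta, x, mu, p, Rp):
    """One stochastic update of a Godard-family blind equalizer (sketch).

    theta : current equalizer weight vector (length N)
    x     : regressor of the N most recent noisy channel outputs
    mu    : step size
    p     : Godard order (p = 1 ~ Sato, p = 2 ~ CMA)
    Rp    : dispersion constant of the algorithm (assumed standard choice)
    """
    y = float(np.dot(theta, x))                 # equalizer output
    if p == 1:
        # Sato: error term uses the sign of the output (discontinuous at y = 0)
        err = Rp * np.sign(y) - y
    else:
        # General Godard / CMA-type error term
        err = y * abs(y) ** (p - 2) * (Rp - abs(y) ** p)
    return theta + mu * err * x

# Example: CMA (p = 2) on an assumed 2-tap channel with 4-PAM input.
rng = np.random.default_rng(0)
h = np.array([1.0, 0.5])                        # assumed channel impulse response
a = rng.choice([-3.0, -1.0, 1.0, 3.0], size=5000)
r = np.convolve(a, h)[:a.size] + 0.05 * rng.standard_normal(a.size)
R2 = np.mean(a ** 4) / np.mean(a ** 2)          # standard CMA dispersion constant
theta = np.array([1.0, 0.0])
for n in range(2, a.size):
    x = r[n - 2:n][::-1]                        # regressor, most recent sample first
    theta = godard_update(theta, x, mu=1e-4, p=2, Rp=R2)
print("final weights:", theta)
```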
B. Our Approach and Assumptions

For the convergence of the weight sequence, we use the results from Bucklew et al. [4]. For easy reference, we restate the results; the presentation is slightly modified to suit our problem setup. The following conditions are used in [4].

C1) The equalizer input process is asymptotically stationary and ergodic, and the noise is an i.i.d. sequence statistically independent of it. In addition, the update function is integrable with respect to the distribution of the input for each value of the weights, i.e., (4) holds.

C2) The mean update function, i.e., the expectation of the update function under the stationary input distribution, is continuous in the weights, (5) holds for a finite horizon, and the solution of the ODE (6) is unique for each initial condition.

Under the above conditions, the following result is proved in [4]. We use the notation in which the weight process is interpolated by mapping continuous time to the discrete index through the integer part of the ratio of time to the step size, and the comparison is made over a finite interval determined by a suitably large constant. Then, we have the following theorem.

Theorem 1: If the solution of (6) is unique for each initial condition, then, under C1 and C2, for any positive tolerance, the supremum over the finite interval of the distance between the interpolated weight process and the solution of (6) converges to zero in probability as the step size tends to zero.

We will show that the solutions of the concerned ODEs for our algorithms do not blow up in finite time. Then, in fact, in the statement of the above theorem, the supremum can be taken over correspondingly longer intervals. We will verify the assumptions of the above theorem for the Godard class of algorithms under the following conditions.

A1) The channel input sequence and the noise sequence are statistically independent.
A2) The channel input sequence is asymptotically stationary and ergodic.
A3) The noise is a sequence of i.i.d. zero-mean random variables whose distribution has a bounded probability density.
A4) The channel input has a finite moment of an appropriate order.
A5) The noise has a finite moment of an appropriate order.

The moment orders required in A4 and A5 will be stated whenever needed. Our assumptions are weaker than those used in any previous analysis of which we are aware. More importantly, they will be satisfied in most practical systems. Actually, the proof of convergence goes through under the more general conditions in which the channel impulse response is random instead of deterministic. In that case, we assume that the channel impulse response is stationary ergodic and independent of the input; this implies that the channel output, the convolution of the input with the channel, is also asymptotically stationary and ergodic. In addition, in that case, we need the corresponding moment condition on the channel response. We will use the above result on the Sato algorithm in Section III and on the rest of the Godard class in Section IV.

III. CONVERGENCE OF SATO ALGORITHM

The Sato algorithm can be described by (7), where the sgn function equals one for nonnegative arguments and minus one otherwise. The sgn function makes the update a discontinuous function, and hence, this algorithm has been considered harder to analyze than some other blind equalization algorithms. However, we will see that our convergence results go through despite this discontinuity. This happens because in assumption C2, we require the mean update function to be continuous only along the trajectory of the ODE (6). Define the update and mean update functions corresponding to (7), as at the beginning of Section II-B.

Now, we verify conditions C1 and C2 of [4] mentioned in Section II. Since the channel impulse response is assumed to be of finite duration, if the input is asymptotically stationary and ergodic, then the channel output, being a convolution of the input with the channel, is also asymptotically stationary and ergodic. In addition, from (7), the update function is integrable for any given value of the weights. Therefore, condition C1 is verified. Next, consider condition C2. If the input has a finite moment of the appropriate order, then (4) is satisfied. Similarly, (5) is satisfied if, in addition, the corresponding moment of the noise is finite. Now, consider the continuity of the mean update function. It is provided by the following lemma.

Lemma 1: Under assumptions A1–A5 with appropriate moment orders, the mean update function is continuous in the weights with probability one.

Proof: From (7), we obtain (8). Thus, we need to show the continuity, with respect to the weights, of the expectations of the two terms in (8) that involve the sgn of the equalizer output. First, consider the term involving the regressor. The sgn function is continuous except at zero. Therefore, for nonzero weights, the sgn of the equalizer output is continuous in the weights as long as the output is not zero, and if the distribution of the noise has an absolutely continuous component, the output is zero with probability zero. Therefore, for nonzero weights, if a sequence of weight vectors converges to the given weights, the corresponding sgn terms converge almost surely. Since the required moments are finite, this implies, by the dominated convergence theorem, convergence of the corresponding expectations. One can show that the expectation of the sgn term alone is, in general, not continuous at the zero weight vector under our assumptions, but when the noise has an absolutely continuous component, we observe from (7) that the equalizer output is zero with probability zero for nonzero weights. Now, consider the continuity of the expectation of the second sgn term. Since this also involves the sgn of the equalizer output, the argument of the last paragraph applies to it as well. This gives the continuity of the mean update function on a set of weights of probability one.
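The quantity whose continuity Lemma 1 establishes is the mean update function, i.e., the expectation of the update under the stationary channel-output distribution. A simple way to get at it numerically, under an assumed i.i.d. PAM input, a fixed short channel, and Gaussian noise, is a Monte Carlo average of the Sato update term over a long stationary run, as in the sketch below. The names, the channel, and the Sato dispersion constant (taken here as the standard second-moment-to-first-absolute-moment ratio) are assumptions.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def mean_drift_sato(theta, h_chan, R1, symbols, sigma, n_samples=100_000, seed=1):
    """Monte Carlo estimate of the mean Sato drift h(theta) = E[H(theta, X_n)]
    under an assumed i.i.d. PAM input and i.i.d. Gaussian noise."""
    rng = np.random.default_rng(seed)
    N = theta.size
    a = rng.choice(symbols, size=n_samples)
    r = np.convolve(a, h_chan)[:n_samples] + sigma * rng.standard_normal(n_samples)
    X = sliding_window_view(r, N)[:, ::-1]     # regressors, most recent sample first
    y = X @ theta                               # equalizer outputs
    H = (R1 * np.sign(y) - y)[:, None] * X      # Sato update terms H(theta, X_n)
    return H.mean(axis=0)

symbols = np.array([-3.0, -1.0, 1.0, 3.0])
R1 = np.mean(symbols ** 2) / np.mean(np.abs(symbols))   # assumed Sato constant
print(mean_drift_sato(np.array([1.0, 0.0]), np.array([1.0, 0.5]), R1, symbols, sigma=0.1))
```

Away from the discontinuity set of the sgn function, small perturbations of the weights change the estimate only slightly, which is the behavior the lemma asserts with probability one.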
Next, we consider the ODE (9) needed in Theorem 1. We would like to show that (9) has a unique solution for each initial condition and that the solution does not blow up in finite time. This would hold if we could show that the mean update function is Lipschitz. However, it is easy to show that it is not. Therefore, we split the proof of our result in two parts. In the next lemma, we show that the mean update function is locally Lipschitz. This shows that the ODE has a unique local solution for any nonzero initial condition. Then, we argue that the zero initial condition does not matter. Finally, in Lemma 3, we prove our assertion of a unique solution for all time when the initial condition is nonzero.

Lemma 2: If A1–A5 are satisfied with appropriate moment orders, the mean update function is locally Lipschitz away from the zero weight vector.
Proof: See the Appendix.

Since the zero weight vector is certainly not a desirable equalizer for any channel, for any reasonable equalizer algorithm, zero should not be an attractor. This is also true for the Godard family; at least for the Sato and CM algorithms, one can observe this from the cost functions that they minimize [see (21) and (22) below]. An equilibrium point is an attractor for the ODE if and only if it is a local minimum of the corresponding cost function, and zero is not a local minimum for these cost functions. The example in Section VI also shows it. When the noise has a density (assumption A3), the weights are nonzero with probability one, and if zero is also not an attractor, it can be ignored (because even if the trajectory reaches zero, with probability one it leaves again), and we can start the ODE with a nonzero initial condition. Thus, the restriction to nonzero initial conditions in the following lemma is harmless.

Lemma 3: Under the assumptions of Lemma 2, for a nonzero initial condition, the ODE has a unique solution, and the solution does not blow up in finite time.
Proof: To prove our assertion, we use a result from [17, pp. 171]. It shows that if the right side of the ODE is continuously differentiable and its norm can be upper bounded by a Lipschitz function of the weights, then the solution of our ODE will not blow up in a finite time. Such a bound follows from (7): defining the function in (10), we obtain the required bound.

Using the above results, we have proved the following.

Theorem 2: For the Sato algorithm, under assumptions A1–A5 with appropriate moment orders, the interpolated weight process satisfies (11); that is, for any positive tolerance, it stays within that tolerance of the solution of the ODE (12) over any finite interval with probability approaching one as the step size tends to zero, where the ODE solution has the same initial condition as the algorithm.

From the above theorem, we can derive the following useful result. Let a point be an attractor of the ODE (12), and let the initial condition lie in its region of attraction. Then, from (11), we obtain that, for small enough step sizes, the trajectory of the weights is concentrated around the attractor with a large probability for a long time asymptotically (in time). For all practical purposes, for small enough step sizes, the attractor can be taken as the steady-state value of the weights, although the weights will eventually come out of its neighborhood. A similar conclusion holds if, instead of an attractor point, we have an invariant set of the ODE (12) with a region of attraction. If an equilibrium point is a repeller, then neither the ODE solution nor the weight process will converge to it unless started exactly there (the ODE solution then stays there, but the weight process will move out of it and will not return). If an equilibrium point has a region of attraction that does not contain a ball centered at the point (i.e., it is a saddle point), then the ODE solution may converge to it, but the weight process will not, at least when there is noise. Similar conclusions hold for the convergence results in Sections IV and VI.

From (11), we also obtain an approximation of the trajectory of the weights for small step sizes. We will use it to obtain the performance of the system at any time in Section V. To be able to obtain the above useful information, we need to compute the mean update function. One can easily show that it takes the form (13), which involves the distribution function of the noise and the probability density of the channel output. We will use these simplified expressions on an example in Section V.
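The content of Theorem 2 can also be visualized numerically: run the Sato recursion with a small step size and compare its interpolated trajectory with an Euler solution of the ODE whose right side is a Monte Carlo estimate of the drift. The following sketch does this for an assumed toy channel and PAM input; it only illustrates the approximation and is not part of the proof.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def sato_drift(theta, h_chan, R1, symbols, sigma, n_samples=50_000, seed=1):
    # Monte Carlo estimate of the mean Sato drift under the assumed setup.
    rng = np.random.default_rng(seed)
    a = rng.choice(symbols, size=n_samples)
    r = np.convolve(a, h_chan)[:n_samples] + sigma * rng.standard_normal(n_samples)
    X = sliding_window_view(r, theta.size)[:, ::-1]
    y = X @ theta
    return ((R1 * np.sign(y) - y)[:, None] * X).mean(axis=0)

def compare_trajectories(theta0, h_chan, symbols, sigma, mu=1e-3, T=5.0, seed=2):
    """Endpoint of the stochastic Sato iterates vs. Euler solution of the ODE."""
    rng = np.random.default_rng(seed)
    R1 = np.mean(symbols ** 2) / np.mean(np.abs(symbols))
    N, n_steps = theta0.size, int(T / mu)
    a = rng.choice(symbols, size=n_steps + N)
    r = np.convolve(a, h_chan)[:a.size] + sigma * rng.standard_normal(a.size)
    th = theta0.copy()
    for n in range(N, a.size):                  # stochastic recursion, time n * mu
        x = r[n - N:n][::-1]
        y = np.dot(th, x)
        th = th + mu * (R1 * np.sign(y) - y) * x
    th_ode, dt = theta0.copy(), 0.05
    for _ in range(int(T / dt)):                # Euler integration of d(theta)/dt = h(theta)
        th_ode = th_ode + dt * sato_drift(th_ode, h_chan, R1, symbols, sigma)
    return th, th_ode

sde_end, ode_end = compare_trajectories(np.array([1.0, 0.0]), np.array([1.0, 0.5]),
                                        np.array([-3.0, -1.0, 1.0, 3.0]), sigma=0.1)
print("stochastic endpoint:", sde_end, " ODE endpoint:", ode_end)
```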
IV. CONVERGENCE OF GODARD FAMILY OF ALGORITHMS

In this section, we prove the convergence of the Godard family of algorithms defined by (3) in Section II. Define the update and mean update functions accordingly. We assume A1–A5 of Section II, with the moment orders to be specified. In the following, we prove C1 and C2 of Section II and also show that the mean update function is locally Lipschitz and that the solution of the ODE does not blow up in finite time.

We already know that the channel output process is asymptotically stationary and ergodic. One can also easily check that the update function is integrable for any value of the weights whenever the required moments are finite. This verifies C1. The following lemma verifies C2.

Lemma 4: Under assumptions A1–A5 with appropriate moment orders, C2 is satisfied.

Proof: We first show the continuity of the mean update function with respect to the weights. Let us write the update in (3) as a product of three terms, corresponding to the three square brackets in (3), each a function of the weights and the regressor. Let a sequence of weight vectors converge component-wise to a given weight vector. For all large enough indices, each term can be bounded by an appropriate finite quantity, and therefore, for large enough indices, the product is dominated as in (14). The expectation of the right side of (14) is finite under the assumed moment conditions. Then, by continuity of the product (and hence of the update function) as a function of the weights, and by the dominated convergence theorem, the mean update function is continuous, and it is finite for all weights. Under the same conditions, we also obtain the integrability required in (4); for (5), we need one additional moment condition. Thus, condition C2 is verified.

Next, we consider the ODE (15). To ensure the existence and uniqueness of a local solution to (15), we need to verify the local Lipschitz continuity of the mean update function. We do this in the following lemma.

Lemma 5: The mean update function is locally Lipschitz under the assumptions of Lemma 4.

Proof: We first prove the result for the CM algorithm; the proof for the other members is obtained in the same way. By arguing as in Section III, it is enough to verify that the mean update function is continuously differentiable with respect to the weights. For this purpose, we express the update as a product of three terms, as in the proof of Lemma 4. The first term is obviously differentiable. For the differentiability of the remaining terms with respect to the weights, we only need to verify the differentiability of a positive integer power of the absolute value of the equalizer output; this, in fact, will be verified if we verify it for the absolute value itself. One can easily check its differentiability with respect to the weights whenever the equalizer output is nonzero, and under our assumption A3, a zero output occurs only with probability zero. For convenience, we write the derivative as in (16); when the output is zero, the middle term in that expression is absent. Under the assumed moment conditions, the derivative of the update with respect to the weights can be upper bounded by a function that is absolutely integrable. Therefore, by the dominated convergence theorem, we can take the gradient operator inside the expectation, and hence, we obtain the differentiability of the mean update function. In addition, the derivative in (16) is continuous with probability one, and again by the dominated convergence theorem, we obtain the continuity of the derivative of the mean update function for nonzero weights. Finally, consider a ball of given center and radius: since the derivative of the mean update function is continuous, it is bounded on the ball, and hence, the mean update function is locally Lipschitz there. This proves the lemma.
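Lemma 5 asserts local Lipschitz continuity of the mean update function, which, for the CM member, follows from continuous differentiability away from zero. A crude numerical counterpart is to estimate the drift on a fixed sample and compute its central-difference Jacobian near a point; finite, smoothly varying entries are consistent with the lemma. The setup below is an assumed toy example, not a verification of the proof.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def cma_drift(theta, X, R2):
    # Sample-average estimate of the CMA mean drift h(theta) on fixed data X.
    y = X @ theta
    return ((y * (R2 - y ** 2))[:, None] * X).mean(axis=0)

def numerical_jacobian(f, theta, eps=1e-3):
    # Central-difference Jacobian of the drift; finite values near a point are
    # consistent with the local Lipschitz property of Lemma 5 (numerical check only).
    J = np.zeros((theta.size, theta.size))
    for i in range(theta.size):
        e = np.zeros(theta.size)
        e[i] = eps
        J[:, i] = (f(theta + e) - f(theta - e)) / (2 * eps)
    return J

rng = np.random.default_rng(3)
symbols = np.array([-3.0, -1.0, 1.0, 3.0])
a = rng.choice(symbols, size=100_000)
r = np.convolve(a, [1.0, 0.5])[:a.size] + 0.1 * rng.standard_normal(a.size)
X = sliding_window_view(r, 2)[:, ::-1]          # regressors for a 2-tap equalizer
R2 = np.mean(symbols ** 4) / np.mean(symbols ** 2)
print(numerical_jacobian(lambda th: cma_drift(th, X, R2), np.array([1.0, 0.0])))
```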
As in the case of Sato, one can observe from (16) that the derivative of the mean update function is not bounded. Therefore, we have only shown that it is locally Lipschitz; the global Lipschitz property is not guaranteed. This is not sufficient to guarantee that the solution of the ODE (15) will not blow up in a finite time. However, we now use another technique to guarantee this.

Lemma 6: Under assumptions A1–A5 with appropriate moment orders, the ODE (15) has a unique solution for any given initial condition, and the solution does not blow up in a finite time.

Proof: To prove this result, we use the following fact. If the ODE has a unique local solution and there is a continuous function of the weights that tends to infinity as the norm of the weights tends to infinity and whose derivative along the ODE is nonpositive whenever the norm of the weights is large enough, then the solution of the ODE does not blow up in a finite time. Below, we exhibit such a function. Define the function in (17) and the infimum in (18), taken over the unit circle. By the dominated convergence theorem, the map being minimized is continuous on the unit circle, and hence, the infimum in (18) is finite. In addition, under the assumption that the noise has a density, the quantity being minimized is positive for every direction on the unit circle. Therefore, since the infimum of a continuous function is attained on a compact set, the infimum is positive. The term in the bracket in (17) can be written as (19), and (19) is positive whenever the norm of the weights is large enough. Thus, the derivative of the chosen function along the ODE is nonpositive for large weights, and the solution does not blow up in finite time. This proves the lemma.

Finally, we obtain the following.

Theorem 3: For the Godard family of algorithms characterized by the function in (3), if A1–A5 are satisfied with appropriate moment orders, then, for any positive tolerance, the interpolated weight process stays within that tolerance of the solution of the ODE (15) over any finite interval, with probability approaching one as the step size tends to zero, where the ODE has the same initial condition as the algorithm.

As in Theorem 2, here also we have proved that the trajectory of the weights spends a long time around an attractor with a large probability as the step size becomes small. Next, we give a simplified form of the mean update function (see the details in [16]) when the noise is i.i.d. zero-mean Gaussian with a given variance. We will use these simplifications in Section VII to plot the solution of the ODE. With the corresponding notation, the mean update function can be written as (20).
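The no-blow-up argument of Lemma 6 relies on a function of the weights whose derivative along the ODE is nonpositive once the weights are large. A numerical proxy, for the CM drift under an assumed toy setup, is to check that the radial component of the drift becomes negative as the norm of the weights grows, as in the following sketch; this is only an illustration under the stated assumptions, not a proof.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def cma_drift_mc(theta, h_chan, R2, symbols, sigma, n_samples=100_000, seed=3):
    # Monte Carlo estimate of the CMA mean drift h(theta) under the assumed setup.
    rng = np.random.default_rng(seed)
    a = rng.choice(symbols, size=n_samples)
    r = np.convolve(a, h_chan)[:n_samples] + sigma * rng.standard_normal(n_samples)
    X = sliding_window_view(r, theta.size)[:, ::-1]
    y = X @ theta
    return ((y * (R2 - y ** 2))[:, None] * X).mean(axis=0)

# For large ||theta|| the cubic term in the CMA error dominates, so the radial
# component <theta, h(theta)> of the drift should become negative, pushing the
# ODE solution back toward a bounded set.
symbols = np.array([-3.0, -1.0, 1.0, 3.0])
R2 = np.mean(symbols ** 4) / np.mean(symbols ** 2)
h_chan = np.array([1.0, 0.5])
u = np.array([1.0, 1.0]) / np.sqrt(2.0)
for rad in (0.5, 1.0, 2.0, 5.0, 10.0):
    theta = rad * u
    print(rad, np.dot(theta, cma_drift_mc(theta, h_chan, R2, symbols, sigma=0.1)))
```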
V. PERFORMANCE ANALYSIS OF SATO AND CM ALGORITHMS

The performance of the communication system described in Section II at any time depends on the values of the equalizer weights at that time. Starting from any point, the trajectory of the equalizer coefficients is given by the weight process, which, by Theorems 2 and 3, is approximated by the solution of the corresponding ODE. If the ODE converges to an attractor, the steady-state performance of the system can be approximately obtained by taking the equalizer weights at the attractor. Thus, in this section, we first study the attractors of the ODE and then look at the overall trajectory to obtain the system performance under transience also. In Section V-A, we study the attractors of the algorithms. In Section V-B, for a particular system, we compute the symbol error rate for a given value of the equalizer. This allows us to compute the performance at any time. We will illustrate it by an example in Section VII.

A. Attractors of the ODEs

The cost functions for the Sato and CM algorithms are given by (21) and (22), respectively. The (negative) derivatives of these functions form the right sides of the ODEs to which these algorithms converge (as the step size tends to zero). Thus, the equilibrium points of the ODE are the points where the derivative vanishes, i.e., where the mean update function is zero. We have established earlier that the algorithms can converge to equilibrium points that are attractors. Attractors are the equilibrium points that are local minima of the cost functions. When the cost surface has multiple minima, we may have ill convergence, i.e., convergence to local minima that are not global minima. Since ill convergence can degrade the performance of the system substantially, it has been studied extensively (see the surveys [9], [12], and [24] for the noiseless case and [18] and [29] for the noisy case). In particular, for finite-length equalizers, there can be ill convergence. However, all these results are for i.i.d. channel input. Under our general conditions, the situation can be more complicated. The attractors and their regions of attraction depend on the input and noise distributions, but the possibility of ill convergence persists. For fractionally spaced CMA, which will be discussed in Section VI, ill convergence can be avoided when the noise variance is small.

Let us first consider the Sato algorithm. In general, the mean update function will have multiple zeros. If the Hessian of the cost function is positive definite at a zero of the mean update function, then that zero is a local minimum of the cost function and an attractor for the ODE. We can show that, when the noise is i.i.d. Gaussian, the mean update function for the Sato algorithm takes the form (23). Similarly, for the CM algorithm we obtain the expressions (24) and (25), where diag denotes a diagonal matrix whose entries are as indicated there. We will use (23) and (25) to obtain the equilibrium points (the zeros of the mean update function) for the Sato and CM algorithms for a specific communication system described below. From the equilibrium points, we will find the attractors. If one also knows the regions of attraction of these attractors, one can predict to which attractor the ODE will converge for a given initial condition. The weight process will initially tend to that attractor, stay around it for some time, and then again wander away to another attractor (since the step size is nonzero) or even diverge. If the step size is very small, the time spent around an attractor can be very large. The following lemma is used in subsequent sections.

Lemma 7: When the noise is i.i.d. Gaussian, the attractors of CMA depend continuously on the noise variance, and there is no bifurcation in the branch of attractors with respect to the noise variance.

Proof: From (24), we observe that the mean update function is a continuous function of the weights and the noise variance. In addition, from (25), the Hessian of the cost function is a continuous function of the weights and the noise variance in the open set of nonzero weights. Therefore, the mean update function is continuously differentiable there. The total derivative of the mean update function restricted to the affine subspace in which the noise variance is held fixed is the same as the Hessian of the cost function. It is positive definite at an attractor of CMA and, hence, invertible. Let a point be an attractor corresponding to a given noise variance. Then, by the implicit function theorem [19], there exist open neighborhoods of the attractor and of the noise variance and a mapping from the latter to the former that gives a zero of the mean update function for each noise variance in the neighborhood. In addition, by continuity of the Hessian as a function of the weights and the noise variance, the Hessian at these solutions stays positive definite for small changes in the noise variance; hence, these solutions are attractors. Further, due to the implicit function theorem, there is no bifurcation at the attractor [5, p. 129].
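Equilibrium points and their classification, as used above, can also be located numerically: minimize a sample version of the CM cost from several starting points and inspect the eigenvalues of a finite-difference Hessian at the resulting points (positive definite indicates an attractor). The channel, constellation, and descent parameters below are assumptions for illustration only.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def cma_cost(theta, X, R2):
    """Sample average of the CM cost (1/4) E[(y^2 - R2)^2]."""
    y = X @ theta
    return 0.25 * np.mean((y ** 2 - R2) ** 2)

def hessian(f, theta, eps=1e-3):
    # Finite-difference Hessian of the sample cost (numerical classification only).
    N = theta.size
    H = np.zeros((N, N))
    for i in range(N):
        for j in range(N):
            ei = np.zeros(N); ei[i] = eps
            ej = np.zeros(N); ej[j] = eps
            H[i, j] = (f(theta + ei + ej) - f(theta + ei - ej)
                       - f(theta - ei + ej) + f(theta - ei - ej)) / (4 * eps ** 2)
    return H

rng = np.random.default_rng(4)
symbols = np.array([-3.0, -1.0, 1.0, 3.0])
a = rng.choice(symbols, size=50_000)
r = np.convolve(a, [1.0, 0.5])[:a.size] + 0.05 * rng.standard_normal(a.size)
X = sliding_window_view(r, 2)[:, ::-1]
R2 = np.mean(symbols ** 4) / np.mean(symbols ** 2)

for theta in [np.array([1.0, 0.0]), np.array([0.0, 1.0]), np.array([0.5, -0.5])]:
    for _ in range(5000):                        # plain (ad hoc) gradient descent on the cost
        y = X @ theta
        grad = np.mean(((y ** 2 - R2) * y)[:, None] * X, axis=0)
        theta = theta - 1e-3 * grad
    eig = np.linalg.eigvalsh(hessian(lambda t: cma_cost(t, X, R2), theta))
    kind = "attractor" if np.all(eig > 0) else "saddle/repeller"
    print(np.round(theta, 3), np.round(eig, 3), kind)
```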
B. Performance Analysis

Given the equalizer coefficients, one can study the performance of a communication system. As an example, we consider a particular pulse amplitude modulation (PAM) system and compute its symbol error rate (SER) at particular equalizer coefficients. Using these, we will obtain the system performance under transience and steady state. We take the noise sequence to be i.i.d. Gaussian, and we define the combined channel-equalizer response accordingly. The equalizer output, for a given input symbol, is then given by (26), where the second term is the random variable representing the distortion due to the ISI in the corresponding output of the equalizer. The PAM constellation used is shown in Fig. 2; the spacing parameter controls the distance between the signal points and, hence, the average signal power of the constellation, for a given energy in the pulse used to transmit the symbols.

Fig. 2. PAM constellation.

To compute the probability of a symbol error, we need to find the probability that the sum of the noise term and the ISI term exceeds in magnitude one half of the distance between two levels. After the transmission of the PAM signal through the channel and the equalizer, the minimum intersymbol distance is reduced by a factor determined by the combined channel-equalizer response. Thus, the minimum intersymbol distance in the new constellation is given by (27) (we are assuming that this quantity is somehow available at the receiver). Then, the probability of error at the receiver, conditioned on the transmitted symbol, follows, and for an i.i.d. input, it can be simplified to (28), where the expectation is over the ISI term. Similarly, one can obtain expressions for other constellations. We can use the above expressions for the SER and the ODE approximation of the equalizer trajectory for the Sato and CM algorithms to compute the trajectory of the SER for the two algorithms. We will compare the trajectory so obtained with the simulation results in Section VII.
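In the spirit of (26)-(28), the sketch below computes an approximate M-PAM SER at given equalizer weights by splitting the combined channel-equalizer response into a main tap and residual ISI taps and averaging the standard nearest-neighbour error probability over the equally likely ISI patterns. The constellation, spacing, and noise treatment are assumptions, and the exact expression in (28) may differ in its constants.

```python
import numpy as np
from itertools import product
from math import erfc, sqrt

def q_func(x):
    return 0.5 * erfc(x / sqrt(2.0))            # Gaussian tail probability Q(x)

def pam_ser(theta, h_chan, levels, d, sigma):
    """Approximate SER of M-PAM at equalizer weights theta (sketch).

    The combined channel-equalizer response is split into a main tap and
    residual ISI taps; the error probability is the standard nearest-neighbour
    M-PAM expression with the half-distance offset by each equally likely ISI
    pattern, averaged over the patterns.
    """
    g = np.convolve(h_chan, theta)              # combined response
    k = int(np.argmax(np.abs(g)))
    main, isi = abs(g[k]), np.delete(g, k)
    sigma_out = sigma * np.linalg.norm(theta)   # white noise through the equalizer
    M = len(levels)
    ser, n_pat = 0.0, 0
    for pattern in product(levels, repeat=isi.size):
        x = float(np.dot(pattern, isi))
        ser += (2.0 * (M - 1) / M) * 0.5 * (q_func((main * d / 2 - x) / sigma_out)
                                            + q_func((main * d / 2 + x) / sigma_out))
        n_pat += 1
    return ser / n_pat

levels = np.array([-3.0, -1.0, 1.0, 3.0])       # assumed 4-PAM, spacing d = 2
print(pam_ser(np.array([1.0, -0.45]), np.array([1.0, 0.5]), levels, d=2.0, sigma=0.2))
```

Evaluating this function along the ODE approximation of the weight trajectory gives an SER-versus-time curve of the kind compared with simulations in Section VII.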
VI. FURTHER EXTENSIONS AND GENERALIZATIONS

In this section, we extend the results provided so far to other systems. First, in Section VI-A, we consider the CMA FSE or, equivalently, the CMA single-input multiple-output (SIMO) equalizer. In Section VI-B, we consider the recently proposed MMA. The convergence proofs for these equalizers require only minor modifications to the proofs in Section IV. Therefore, after explaining these systems, we will only make the necessary comments.

A. CMA FSE

Let the continuous-time channel output of the system in Fig. 1 be sampled at an integer multiple of the baud rate. If the multiple is one, we obtain the system studied so far; when it is greater than one, we have an oversampled system. This can lead to improvements in performance. For example, if a length-and-zero condition [9] is satisfied for a certain upsampling factor, then sampling at that multiple of the baud rate ensures that the FSE CMA no longer has ill convergence under noiseless conditions. In addition, the required equalizer length can be small. In the case of oversampling, the sampled channel output can be subdivided into subsequences, each of which is the output of a discrete linear time-invariant channel driven by the same input. Each of these subchannels can be equalized by a separate CMA equalizer. Such a system (see Fig. 3) is called an FSE CMA. Fig. 3 can also represent a SIMO channel in which there is a single input sequence and the receiver gets the output from separate channels (e.g., multipaths in a fading wireless channel or a receiver with multiple antennas). Thus, this system is also covered by our analysis.

Fig. 3. Block diagram of an FSE.

The following notation is used in this section: each sub-equalizer is indexed by its subchannel and has its own components at each time, and the noise in each subchannel at each time is denoted similarly. The outputs of the subchannels (with and without noise) and the components of the subchannel impulse responses are represented by similar notation. The overall equalizer output is formed from the sub-equalizer outputs, and each subchannel output can be expressed in terms of the input sequence and the corresponding subchannel coefficients. The CMA FSE equalizer is updated in the same way as the CMA; the only difference between the CMA and the CMA FSE is in the way the regressor is related to the input. It is easy to see that if the input is asymptotically stationary and ergodic, then so is the regressor process. With the above-mentioned notational changes, the proof of a theorem corresponding to Theorem 3 for convergence of the CMA FSE is the same as for the CMA.

Next, we consider the attractors. It is shown in [9, ch. 7] that under i.i.d. input conditions and zero noise, if the length-and-zero condition (the equalizer is long enough and no two subchannels have common zeros) is satisfied by the channel, CMA FSE has only one minimum in each of certain hyper cones, and each of these minima is a global minimum. One can show that Lemma 7 holds for CMA FSE. Thus, there is no bifurcation in any branch of attractors with respect to the noise variance. This implies that, for any noise variance, the number of attractors remains the same. Hence, by continuity of the attractors as a function of the noise variance, for very small noise variances, each cone still contains a unique attractor. These attractors will be close in their cones to the global optima corresponding to zero noise. The performance of the system (SER) depends continuously on the equalizer coefficients, as is evident from (28). Therefore, for small noise variances, the SER at the limit points will stay close to that of the globally optimum equalizer.
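Structurally, the FSE keeps one sub-equalizer per polyphase subchannel of the oversampled output, forms the baud-rate output from the sub-equalizer outputs, and drives all sub-equalizers with a common CMA error. The following sketch illustrates this structure on an assumed two-fold oversampled toy channel; the subchannel alignment, channel, and parameters are assumptions.

```python
import numpy as np

def cma_fse_step(thetas, x_subs, mu, R2):
    """One baud-rate CMA update of a fractionally spaced equalizer (sketch).

    thetas : list of L sub-equalizer weight vectors, one per polyphase subchannel
    x_subs : list of L regressors (recent samples of each subchannel output)
    """
    y = sum(float(np.dot(th, x)) for th, x in zip(thetas, x_subs))  # combined output
    err = y * (R2 - y ** 2)                                         # common CMA error
    return [th + mu * err * x for th, x in zip(thetas, x_subs)], y

# Toy T/2 example: split an oversampled received stream into even/odd subchannels.
rng = np.random.default_rng(5)
L, N = 2, 3
a = rng.choice([-1.0, 1.0], size=20_000)
c = np.array([0.9, 0.4, 0.3, -0.1])            # assumed oversampled channel response
up = np.zeros(L * a.size)
up[::L] = a                                    # zero-stuffed (upsampled) input
rx = np.convolve(up, c)[:up.size] + 0.02 * rng.standard_normal(up.size)
subs = [rx[i::L] for i in range(L)]            # polyphase subchannel outputs
thetas = [np.zeros(N) for _ in range(L)]
thetas[0][0] = 1.0                             # center-spike initialization
R2 = 1.0                                       # E[a^4]/E[a^2] for a +/-1 input
for n in range(N, a.size):
    x_subs = [s[n - N:n][::-1] for s in subs]
    thetas, y = cma_fse_step(thetas, x_subs, mu=5e-4, R2=R2)
print([np.round(t, 3) for t in thetas])
```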
B. MMA

In this section, we study the MMA, as described in [27]. The communication system considered for this algorithm in [27] is the carrierless amplitude and phase (CAP) modulation transceiver. In the following, we first briefly describe this system and then explain the MMA for it. This algorithm can perform better than the CMA for nonsquare and dense constellations. However, we will see that convergence of this algorithm is an easy extension of the convergence proofs in Section IV.

Fig. 4. Communication system model (using CAP architecture) with blind equalizer.

We study the communication system in Fig. 4. Now, the channel input is complex valued. In the CAP system, the role of the carrier in preserving the phase information is taken over by two digital filters whose impulse responses modulate the real and imaginary parts separately. These filters operate at a rate greater than the symbol rate. The two impulse responses are orthogonal to each other and have unit energy; therefore, we choose a Hilbert pair for this purpose. The outputs of the upsamplers feed these filters. The upsampling is done by adding the necessary number of zeros rather than by interpolation. The higher sampling rate is used only to ensure orthogonality at the receiver and, hence, to demodulate the real and imaginary channel outputs independently. This requires the equalizers to be initialized with the corresponding sequences of a Hilbert pair, as will be explained later. The channel impulse response, output, and noise are as in Section II.

To exactly remove the ISI, the transfer function of the equalizer should be the inverse of the channel transfer function convolved with the corresponding waveform of the Hilbert pair. An adaptive scheme is used to learn the proper equalizer coefficients as the input sequence is transmitted. The two equalizers at time k, one for the in-phase branch and one for the quadrature branch, are vectors of finite dimension, and their outputs at time k are formed as inner products with the common regressor. Based on the received input, the equalizers are updated as in (29), where the update function for MMA is given by (30). The equalizer outputs are downsampled to the symbol rate, and the resulting sequence is fed to a nearest-neighbor decision device to obtain the estimates of the real and imaginary parts of the input symbols.

These equations, taken individually, are similar to the equations for the CMA analyzed in Section IV, except for the upsampling at the transmitter. Thus, making the notational modifications as in Section VI-A, we can obtain convergence of their trajectories to the corresponding ODEs. Equations (29) and (30) can be studied independently, and their attractors can be obtained as for the CMA in Section V. In Section VII, we will demonstrate the convergence of the trajectories to the corresponding ODEs via simulations. Once we have the equalizer coefficients at any given time, we can study the system performance at that time. One can also use a CMA equalizer for this system (see the details in [27]); then, of course, the convergence of that algorithm holds as in Section IV.
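The per-branch structure of (29) and (30) can be illustrated as follows: each branch output is compared against its own dispersion constant, and the resulting branch error updates the corresponding real equalizer. The sketch below shows one such update; the 8-QAM alphabet, the regressor, and all names are illustrative assumptions.

```python
import numpy as np

def mma_step(theta_r, theta_i, x, mu, Rr, Ri):
    """One MMA update of the two CAP-branch equalizers (sketch).

    theta_r, theta_i : real equalizers for the in-phase / quadrature branches
    x                : common regressor of recent received samples
    Rr, Ri           : per-branch dispersion constants, e.g. E[a_r^4]/E[a_r^2]
    """
    zr = float(np.dot(theta_r, x))          # in-phase branch output
    zi = float(np.dot(theta_i, x))          # quadrature branch output
    er = zr * (Rr - zr ** 2)                # MMA error, real branch
    ei = zi * (Ri - zi ** 2)                # MMA error, imaginary branch
    return theta_r + mu * er * x, theta_i + mu * ei * x, zr + 1j * zi

# Per-branch dispersion constants for an assumed 8-QAM alphabet.
alphabet = np.array([1+1j, 1-1j, -1+1j, -1-1j, 3+1j, 3-1j, -3+1j, -3-1j])
Rr = np.mean(alphabet.real ** 4) / np.mean(alphabet.real ** 2)
Ri = np.mean(alphabet.imag ** 4) / np.mean(alphabet.imag ** 2)
theta_r, theta_i = np.array([1.0, 0.0, 0.0]), np.array([0.0, 1.0, 0.0])
theta_r, theta_i, z = mma_step(theta_r, theta_i, np.array([0.9, -0.2, 0.1]),
                               mu=1e-3, Rr=Rr, Ri=Ri)
print(z, theta_r, theta_i)
```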
VII. SIMULATION RESULTS

In this section, we apply the theoretical results obtained above and verify them via simulations for a particular PAM communication system for the Sato and CM algorithms. For the MMA, we consider a CAP system at the end of the section. Let us consider a fixed channel impulse response (channel 1). Let the input symbols be i.i.d., taking four values, and let the equalizer have two taps. The noise is i.i.d. Gaussian.

One can show that for the above channel, the Sato cost function with no noise has local minima at (1, -0.3), (-1, 0.3), (0, 0.89), and (0, -0.89). Since the cost function of Sato (and also of CMA) is even, we need to study only two of these minima. The Hessians at (1, -0.3) and (0, 0.89) have eigenvalues {0.7791, 1.4382} and {0.7792, 1.4386}, respectively. This shows that all the equilibrium points are local attractors. Similarly, we can show that for the no-noise case, the cost function of the CMA has zero-gradient locations (equilibrium points) at (1, -0.3), (0, 0.888), (-1, 0.3), (0, -0.888), and (0, 0). The eigenvalues of the corresponding Hessians are {3.5139, 1.3949}, {3.9435, 0.8576}, and a corresponding pair at the third distinct point, respectively. This shows that all the equilibrium points except (0, 0) are local attractors.

The simulations are done for different values of the step size, different initial settings of the equalizer taps, different values of SNR, and different source distributions. Thereby, our analysis explains the behavior of the Sato and CM algorithms in transience and steady state for a wide range of possible variations in the system settings.

Fig. 5. Sato-scaled trajectory and Sato ODE with step size 0.001 for channel 1 with different initial conditions.

In Fig. 5, we plot the scaled trajectory and the ODE solution for the Sato algorithm for different initial conditions. The step size is 0.001, the noise variance is fixed, and the source distribution is uniform. The scaled trajectory and the ODE solution overlap almost perfectly. The plots correspond to two initial conditions, one with first coordinate 0.5 and the other (1, 0). For the given channel, the optimum equalizer coefficients are (1, -0.3), but the trajectory corresponding to the first initial condition converges to a limit point with first coordinate zero, which is not an optimum point. This illustrates ill convergence. However, when started from (1, 0), the algorithm converges to the optimum equalizer coefficients. This establishes the fact that there exist multiple equilibria. The trajectories differ in their limit values, depending on the initial conditions.

Fig. 6. CMA-scaled trajectory and CMA ODE with step size 0.001 for channel 1 with different initial conditions.

Fig. 6 plots the curves for the same channel and parameters for the CMA with initial conditions (1, 0), (2, 2), and a third with first coordinate 0.5. All the above observations continue to hold, except that the rate of convergence is much faster and the oscillations are larger for the CMA.

We have also simulated the trajectories for Sato and CMA for two step sizes (0.001 and 0.005), with the SNR fixed and the input distribution uniform. The results are provided in [16]; due to lack of space, we do not report them here. From these, we can infer that even with noise, the ODE tracks the scaled trajectory. As the step size increases, the oscillations increase, and the rate of convergence also increases.

Fig. 7. Sato-scaled trajectory and Sato ODE for channel 1 with different values of SNR = 30, 20, and 10 dB.

Fig. 8. CMA-scaled trajectory and CMA ODE for channel 1 with different values of SNR = 30, 20, and 10 dB.

In Fig. 7, we show the behavior of the algorithm and the solution of the ODE for different values of SNR (which is also the noise power in this case, since the signal power is 0 dB). From this, we can infer that the noise level can change the limit points. The oscillations in the trajectory increase with noise, and the theoretical ODE does not track the trajectory very well as the noise level increases (at 10 dB). Fig. 8 shows the corresponding results for CMA, with the same qualitative conclusions.

Fig. 9. Sato-scaled trajectory and Sato ODE for channel 1 with different source distributions with 20-dB SNR.

Fig. 10. CMA-scaled trajectory and CMA ODE for channel 1 with different source distributions with 20-dB SNR.

In Fig. 9, we plot the trajectories for different source distributions. The trajectories in the figure correspond to the uniform distribution and to the distribution (0.4, 0.4, 0.1, 0.1) on the input symbols. From this, we infer that a change in the input symbol distribution can change the attractors and the ODE trajectory. One draws the same inferences for CMA from Fig. 10. We also observe from these figures that the limit points are different for the two algorithms, in spite of all the parameters being the same. It is also seen that the Sato algorithm has a higher rate of convergence than the CM algorithm when the source distribution is (0.4, 0.4, 0.1, 0.1). This rate-of-convergence observation contradicts the one made above for the uniform input distribution. Thus, one cannot conclude that one algorithm converges faster than the other under all conditions.
Fig. 11. SER evolution for the Sato equalizer with 15-dB SNR and nonuniform source distribution (0.4, 0.4, 0.1, 0.1).

Fig. 12. SER evolution for the CM equalizer with 15-dB SNR and nonuniform source distribution (0.4, 0.4, 0.1, 0.1).

The SER trajectory for this example is plotted for the two algorithms in Figs. 11 and 12. The simulated SER is obtained by running the simulation for a long time at the given equalizer values. We provide the plots for 20-dB SNR with the source probability mass function (0.4, 0.4, 0.1, 0.1). The figures provide the theoretical curves obtained above as well as the results obtained via simulations (for the simulated trajectory). One sees a close match between theory and simulations in all cases.

We have also done simulations for a longer channel (channel 2) whose impulse response includes, among others, the taps 1.0000, 0.2572, 0.1221, 0.1167, and 0.0356. The optimum three-tap equalizer has the values (1, 0.815, 0.407). We found similar conclusions for Sato as well as CMA. The results are provided in [16].

Finally, we provide simulations for an MMA system. In the simulations, we use an 8-QAM constellation. The channel has a short impulse response with leading taps 1 and 0.3. We upsample the encoder output by a factor of 4; the new sample values are zeros embedded at the intermediate sample instants. We choose a Hilbert pair of short sequences (with entries of magnitude 0.5) in our simulations. Gaussian noise is used to corrupt the output of the channel. The initial impulse responses of the equalizers are the same as their corresponding Hilbert sequences. The MMA is used to update the equalizer coefficients. The ODEs and the simulated values are shown in Fig. 13 for step size 0.001 and SNR = 20 dB.

Fig. 13. MMA trajectory and MMA ODE with step size 0.001 for channel 1 with 20-dB SNR.

APPENDIX
PROOF OF LEMMA 2

It is sufficient to show that the mean update function is continuously differentiable for nonzero weights. We show this for one component of the weight vector; the other components are handled similarly. If we show that the partial derivatives of the component functions of the update exist for a fixed weight vector and all regressor values and are dominated by integrable functions, then, by the dominated convergence theorem, the mean update function is differentiable with respect to the weights, and its derivative is the expectation of the derivative of the update. Furthermore, again by the dominated convergence theorem, this derivative is continuous if the derivative of the update is continuous with probability one. Thus, we prove these facts. From (7), we observe that we only need to consider the two terms involving the sgn of the equalizer output.

First, consider the term formed by the equalizer output multiplied by its sgn. For a given weight vector, (31) holds, and thus, we only need to show that the derivative of the remaining expectation with respect to the weight component exists and is continuous for nonzero weights. For a fixed weight vector, since the noise has a density, the equalizer output also has a density and is therefore nonzero with probability one. We then have (32), shown at the bottom of the page, where the density appearing there is that of the equalizer output. We consider (32) separately in the two cases distinguished there. In the first case, because the noise has a density, the relevant conditional distribution also has a density, and (32) becomes (33), shown at the bottom of the page. For a fixed conditioning value, we show below that this density is bounded, and hence, the integrand in (33) is bounded by an integrable function; since the required moment has been assumed finite, by the dominated convergence theorem, the limit can be taken inside the integral, and hence, (33) equals (34). In the second case, (32) reduces directly to the corresponding expression.

Now, we show that the density in question is continuous and bounded. It is a convolution that includes the bounded noise density, and a convolution of densities is upper bounded by the bound on any one of them; an explicit bound is given in (35), where the bound is an upper bound on the density of the noise. Using this, we obtain from (34) the bound (36). The density itself is given by (37), and using (36) and (37), the right side of the resulting inequality is integrable under the assumed moment conditions. Next, we show the continuity of the derivative. From (35), we observe that, for nonzero weights, the derivative is a continuous function of the weights, and we have also shown that the relevant density is bounded. Thus, by the dominated convergence theorem, the derivative is continuous in the weights for nonzero weights. Again, using (34), by the dominated convergence theorem and from (37), we obtain the continuity of the derivative of the expectation with respect to the weights.

Next, consider the differentiability of the expectation of the term involving the regressor multiplied by the sgn of the equalizer output. Consider one component of the right side of the equality (38), in which the indicator function appears. Since, by assumption A3, the components of the noise are i.i.d., we obtain (39). We consider the derivative of this expression with respect to a component of the weights. Using arguments similar to those used to show the differentiability of the first term, together with the dominated convergence theorem, we obtain the derivative, which equals (40), shown at the top of the next page (assuming nonzero weights), where the random variable appearing there is defined accordingly.
The finiteness of the expression in the first case of (40) is obvious from an argument given above together with the assumed moment condition. For the expression in the second case of (40), we can show, in the same way, that the relevant density is bounded, and hence, under the assumed moment condition, the integral is finite. Since the densities involved are bounded, the two integrals in (40) can be bounded by integrable functions. Using arguments similar to those used to show that the derivative of the first sgn term is continuous with respect to the weights, we can show that the derivative of the second sgn term is also continuous with respect to the weights.

ACKNOWLEDGMENT

Various discussions with V. Borkar and V. Kavitha have been very useful.

REFERENCES

[1] A. K. Abed-Meraim, P. Duhamel, D. Gesbert, L. Loubaton, S. Mayrargue, E. Moulines, and D. Slock, "Prediction error methods for time-domain blind identification of multichannel FIR filters," in Proc. Asilomar Conf. Signals, Syst., Comput., 1994, pp. 1154-1158.
[2] S. Bellini, "Bussgang techniques for blind equalization," in Proc. Global Telecommun. Conf., Houston, TX, Dec. 1986, pp. 1634-1640.
[3] A. Benveniste, M. Goursat, and G. Ruget, "Robust identification of a nonminimum phase system: Blind adjustment of a linear equalizer in data communications," IEEE Trans. Autom. Control, vol. AC-25, no. 3, pp. 385-399, Jun. 1980.
[4] J. Bucklew, T. G. Kurtz, and W. A. Sethares, "Weak convergence and local stability properties of fixed step size recursive algorithms," IEEE Trans. Inf. Theory, vol. 30, no. 3, pp. 966-978, May 1993.
[5] B. Buffoni and J. F. Toland, Analytic Theory of Global Bifurcation. Princeton, NJ: Princeton Univ. Press, 2002.
[6] R. Cusani and A. Laurenti, "Convergence analysis of the constant modulus algorithm blind equalizer," IEEE Trans. Commun., vol. 43, no. 2/3/4, pp. 1304-1307, Feb./Mar./Apr. 1993.
[7] Z. Ding, R. A. Kennedy, B. D. O. Anderson, and C. R. Johnson, Jr., "Ill-convergence of Godard blind equalizers in data communication systems," IEEE Trans. Commun., vol. 39, no. 9, pp. 1313-1327, Sep. 1991.
[8] Z. Ding, R. A. Kennedy, B. D. O. Anderson, and C. R. Johnson, Jr., "Local convergence of the Sato blind equalizer and generalizations under practical constraints," IEEE Trans. Inf. Theory, vol. 39, no. 1, pp. 129-144, Jan. 1993.
[9] Z. Ding and Y. Li, Blind Equalization and Identification. New York: Marcel Dekker, 2001.
[10] G. D. Forney, Jr., "Maximum likelihood sequence estimation of digital sequences in the presence of intersymbol interference," IEEE Trans. Inf. Theory, vol. IT-18, pp. 363-378, May 1972.
[11] L. Garth, "A dynamic convergence analysis of blind equalization algorithms," IEEE Trans. Commun., vol. 49, no. 4, pp. 624-634, Apr. 2001.
[12] G. B. Giannakis, Y. Hua, P. Stoica, and L. Tong, Signal Processing Advances in Wireless and Mobile Communications. Englewood Cliffs, NJ: Prentice-Hall, 2001, vol. 1.
[13] D. N. Godard, "Self-recovering equalization and carrier tracking in two-dimensional data communication systems," IEEE Trans. Commun., vol. COM-28, pp. 1867-1875, Nov. 1980.
[14] O. Macchi and E. Eweda, "Convergence analysis of self-adaptive equalizers," IEEE Trans. Inf. Theory, vol. IT-30, pp. 162-176, Mar. 1984.
[15] J. Mai and A. H. Sayed, "A feedback approach to the steady-state performance of fractionally spaced blind adaptive equalizers," IEEE Trans. Signal Process., vol. 48, no. 1, pp. 80-91, Jan. 2000.
[16] V. N. Raj, "Convergence and Performance Analysis of Godard Family of Algorithms," M.Sc. (Engg.) thesis, Indian Inst. Science, Bangalore, India, 2002.
[17] L. C. Piccinini, G. Stampacchia, and G. Vidossich, Ordinary Differential Equations in R^n. New York: Springer, 1984.
[18] P. A. Regalia and M. Mboup, "Properties of some blind equalization criteria in noisy multiuser environments," IEEE Trans. Signal Process., vol. 49, no. 12, pp. 3112-3122, Dec. 2001.
[19] W. Rudin, Principles of Mathematical Analysis, Third ed. Singapore: McGraw-Hill, 1973.
[20] M. Rupp and A. H. Sayed, "On the convergence of blind adaptive equalizers for constant modulus signals," IEEE Trans. Commun., vol. 48, no. 5, pp. 795-803, May 2000.
[21] Y. Sato, "A method of self-recovering equalization for multi-level amplitude modulation," IEEE Trans. Commun., vol. COM-23, pp. 679-682, Jun. 1975.
[22] P. Schniter and C. R. Johnson, "Bounds for the MSE performance of constant modulus estimators," IEEE Trans. Inf. Theory, vol. 46, no. 7, pp. 2544-2560, Nov. 2000.
[23] O. Shalvi and E. Weinstein, "New criteria for blind deconvolution of nonminimum phase systems (channels)," IEEE Trans. Inf. Theory, vol. 36, no. 2, pp. 312-321, Mar. 1990.
[24] L. Tong and S. Perreau, "Multichannel blind identification: From subspace to maximum likelihood methods," Proc. IEEE, vol. 86, no. 10, pp. 1951-1968, Oct. 1998.
[25] L. Tong, G. Xu, and T. Kailath, "Blind identification and equalization based on second-order statistics: A time domain approach," IEEE Trans. Inf. Theory, vol. 40, no. 2, pp. 340-349, Mar. 1994.
[26] V. Weerackody, A. Kassam, and K. R. Larker, "Convergence analysis of an algorithm for blind equalization," IEEE Trans. Commun., vol. 39, no. 6, pp. 856-865, Jun. 1991.
[27] J. Yang, J. J. Werner, and G. Dumont, "The multimodulus blind equalization and its generalized algorithms," IEEE J. Select. Areas Commun., vol. 20, no. 5, pp. 997-1015, Jun. 2002.
[28] J. Yang, J. J. Werner, and G. A. Dumont, "The multimodulus blind equalization algorithm," in Proc. Thirteenth Int. Conf. Digital Signal Process., Santorini, Greece, Jul. 1997.
[29] H. H. Zheng, L. Tong, and C. R. Johnson, "Relationship between the constant modulus and Wiener receivers," IEEE Trans. Inf. Theory, vol. 44, no. 4, pp. 1523-1538, Jul. 1998.

Vinod Sharma (SM'00) received the B.Tech. degree in electrical engineering from the Indian Institute of Technology, Delhi, in 1978 and the Ph.D. degree in electrical and computer engineering from Carnegie Mellon University, Pittsburgh, PA, in 1984. He was an Assistant Professor with Northeastern University, Boston, MA, from 1984 to 1985 and a visiting faculty member with the University of California, Los Angeles, from 1985 to 1987. He has been a faculty member with the Indian Institute of Science, Bangalore, since 1988, where he is currently a Professor with the Electrical Communication Engineering Department. His research interests are in modeling, analysis, and control of wireline and wireless networks. Recently, he has also been interested in information theory and signal processing aspects of wireless channels.

V. Naveen Raj (S'02) received the B.E. degree in electronics and communication from Anna University, Chennai, India, in 1998 and the M.Sc. (Engg.) degree in communication engineering from the Indian Institute of Science, Bangalore, India, in 2002. He is currently with Philips Consumer Electronics, Bangalore, where he is involved in the development of audio codecs. His research interests are in the areas of statistical signal processing algorithms and information theory applied to communications and audio compression.