FPGA Implementation for Channel Estimations Based on Wiener LMS for DS-CDMA

Mohamed ELNAMAKY, Messaoud AHMED-OUAMEUR and Daniel MASSICOTTE
Laboratory of Signal and System Integration
Department of Electrical and Computer Engineering
Université du Québec à Trois-Rivières, Trois-Rivières, Québec, Canada
{mohamed.elnamaky, ahmed.ouameur, daniel.massicotte}@uqtr.ca

Abstract: Estimating the channel delays and the corresponding complex channel coefficients of the different users is the first stage of the detection process at the receiving base station in a DS-CDMA communication system. A multiuser steepest Wiener LMS-like (MS-WLMS) algorithm, combined with smoothing/prediction filters to improve tracking quality, is suggested. This paper presents a customized, fixed-point, parallel hardware implementation of the proposed algorithm for WCDMA uplink transmission in third generation (3G) wireless systems. A speedup in execution time is achieved over the well-known Maximum Likelihood channel estimation for DS-CDMA. It is also shown that our solution could achieve the real-time requirements of the 3GPP standards applied in WCDMA systems.

I. INTRODUCTION

In a Code-Division Multiple-Access (CDMA) communication system, the input to the receiver is a linear superposition of the signals transmitted by all the users, attenuated by arbitrary factors and delayed by arbitrary amounts. The goal of channel parameter estimation is to determine these unknown and time-varying attenuation factors and delays [1]. Channel estimation is one of the major problems in radio communications, particularly when the mobile system is subject to multipath fading. Since CDMA systems are inherently interference limited, receivers can combat multiple access interference (MAI) by using multiuser channel estimation (e.g. [2]-[5]). Recently, some channel estimation algorithms have been proposed for long code systems [6]-[9].
Designing efficient multiuser channel estimation and tracking for fast time-varying multipath channels is an important concern [10]. Current channel estimation algorithms rely mainly on some kind of averaging, assuming that the channel coefficients are constant at least during the period of interest (training period, defined window). Moreover, the channel estimation adopted by the industry is based on the Correlator channel estimation filter (Correlator-CEF) [11]. The well-known Maximum Likelihood (ML) [1] channel estimation operates on a decision statistic averaged over successive (windowed) outputs of the matched filters of all the users. Even though it provides satisfactory performance under slow channel variation, it is not efficient for fast time-varying channels.

0-7803-9333-3/05/$20.00 ©2005 IEEE

Toward this end, a Multiuser Steepest Wiener LMS-like structure (MS-WLMS), combined with smoothing and prediction filters to improve tracking quality, was suggested in [14] (briefly described in Section III). This adaptation family is chosen for its low computational complexity and its regular structure, which favors an efficient VLSI implementation where parallelism and fine pipelining are easily applied. Such algorithms are computationally effective because the computation load is evenly distributed. As in Kalman-based channel estimation methods, an autoregressive stochastic model of correlated Rayleigh fading processes is used, but it is embedded only indirectly into the design. The attractive property of the proposed structure is that a single set of filter settings is used for a large range of mobile speeds and for all users accessing the system. In this paper, we propose an FPGA implementation of the Correlator-CEF and the MS-WLMS channel estimation algorithms for DS-CDMA.
The computational complexity of the proposed MS-WLMS algorithm is shown to be close to that of the Correlator, with a performance gain comparable to that of the ML algorithm. It is also shown that our solution could achieve the real-time requirements of the 3GPP standards applied in WCDMA systems. In Section II, we describe the wireless environment and the multipath fading channel model. The MS-WLMS algorithm is briefly described in Section III. Section IV is dedicated to the architecture design for both the Correlator-CEF and the MS-WLMS algorithm. Simulation and implementation results are shown in Section V. Finally, a conclusion is drawn.

II. CHANNEL MODEL AND SINGLE-USER ESTIMATION

In this paper, the channel is modeled as consisting of several multipath components for each user. Consider a system model consisting of K users in asynchronous DS-CDMA operating with long spreading codes. The transmitted signal of the kth user corresponding to an information sequence of length L is given in baseband by

$$ s_k(t) = \sqrt{E_k} \sum_{i=1}^{L} b_{k,i}\, c_{k,i}(t - iT) \qquad (1) $$

where T is the bit duration, $c_{k,i}$ is the spreading chip sequence, $b_{k,i}$ is the ith bit of the kth user, and $E_k$ is the transmitted power of that user.

SIPS 2005

Consider a multipath channel with $P_k$ paths for the kth user, and let the complex attenuation and the delay of the pth path of the kth user, with respect to some timing reference at the receiver, be denoted by $w_{k,p}$ and $\tau_{k,p}$ respectively. The received signal can then be represented as

$$ r(t) = \sum_{k=1}^{K} \sum_{p=1}^{P_k} w_{k,p}\, s_k(t - \tau_{k,p}) + n(t) \qquad (2) $$

where n(t) is the additive white Gaussian noise.
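The signal model of (1) and (2) can be illustrated with a small chip-rate sketch. It assumes integer chip delays, no noise, and an amplitude of $\sqrt{E_k}$; the function names `spread` and `receive` and all numerical values are hypothetical and only serve the illustration.

```python
import math

def spread(bits, codes, energy):
    """Chip-rate samples of s_k: sqrt(E_k) * b_{k,i} * c_{k,i}[n], symbol after symbol."""
    amp = math.sqrt(energy)
    return [amp * b * c for b, code in zip(bits, codes) for c in code]

def receive(signals, paths, length):
    """Noise-free sum of delayed, scaled copies of each user's signal.

    paths[k] lists (w, q) pairs: complex gain w and integer chip delay q
    (integer delays are an assumption of this sketch).
    """
    r = [0j] * length
    for s, user_paths in zip(signals, paths):
        for w, q in user_paths:
            for n, sample in enumerate(s):
                if n + q < length:
                    r[n + q] += w * sample
    return r

# Toy case: K = 2 users, N = 4 chips per symbol, 2 symbols per user.
codes1 = [[1, -1, 1, 1], [1, 1, -1, 1]]
codes2 = [[-1, -1, 1, -1], [1, -1, -1, -1]]
s1 = spread([1, -1], codes1, energy=1.0)
s2 = spread([-1, -1], codes2, energy=4.0)
# User 1 has one path; user 2 has two paths at delays of 1 and 3 chips.
r = receive([s1, s2], [[(0.5 + 0.5j, 0)], [(0.25j, 1), (0.1, 3)]], length=8)
```

Each received sample mixes chips from both users and from several paths, which is why the estimator has to separate the per-user, per-delay contributions.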
If we assume that all paths of all the users are within one symbol period (N chips) of an arbitrary timing reference, only two symbols of each user appear in each observation window, and we can use the following representation developed in [4]:

$$ \mathbf{r}_i = \mathbf{U}_i \mathbf{Z} \mathbf{b}_i + \mathbf{n}_i \qquad (3) $$

where $\mathbf{r}_i$ is the ith N×1 observation vector, $\mathbf{Z}$ is a 2K(N+1)×2K channel response matrix, $\mathbf{b}_i$ is a 2K×1 symbol vector, $\mathbf{n}_i$ is an N×1 complex Gaussian zero-mean random vector with variance $\sigma^2$ for each of its independent elements, and $\mathbf{U}_i$ is an N×2K(N+1) spreading matrix defined as

$$ \mathbf{U}_i = \left[ \mathbf{U}^R_{1,i} \;\; \mathbf{U}^L_{1,i+1} \;\; \mathbf{U}^R_{2,i} \;\; \mathbf{U}^L_{2,i+1} \;\; \cdots \;\; \mathbf{U}^R_{K,i} \;\; \mathbf{U}^L_{K,i+1} \right] \qquad (4) $$

with

$$ \mathbf{U}^R_{k,i} = \begin{bmatrix} c_{k,i}[1] & c_{k,i}[2] & \cdots & c_{k,i}[N] & 0 \\ c_{k,i}[2] & c_{k,i}[3] & \cdots & 0 & 0 \\ \vdots & \vdots & & \vdots & \vdots \\ c_{k,i}[N-1] & c_{k,i}[N] & \cdots & 0 & 0 \\ c_{k,i}[N] & 0 & \cdots & 0 & 0 \end{bmatrix}, \qquad \mathbf{U}^L_{k,i+1} = \begin{bmatrix} 0 & 0 & \cdots & 0 & c_{k,i+1}[1] \\ 0 & 0 & \cdots & c_{k,i+1}[1] & c_{k,i+1}[2] \\ \vdots & \vdots & & \vdots & \vdots \\ 0 & 0 & \cdots & c_{k,i+1}[N-2] & c_{k,i+1}[N-1] \\ 0 & c_{k,i+1}[1] & \cdots & c_{k,i+1}[N-1] & c_{k,i+1}[N] \end{bmatrix}. $$

The spreading matrix $\mathbf{U}_i$ is constructed from shifted versions of the spreading codes corresponding to the ith and (i+1)th symbols of each user in the observation window, so that all possible delays are considered in the model. Clearly, the number of paths should not exceed the spreading factor N (in the WCDMA long code system the pilot defines N=256). By rearranging the model, we rewrite the received vector for channel estimation as

$$ \mathbf{r}_i = \mathbf{U}_i \mathbf{B}_i \mathbf{z}_i + \mathbf{n}_i \qquad (5) $$

where $\mathbf{B}_i$ is a 2K(N+1)×K(N+1) matrix defined as

$$ \mathbf{B}_i = \begin{bmatrix} b_{1,i} & 0 & \cdots & 0 \\ b_{1,i+1} & 0 & \cdots & 0 \\ 0 & b_{2,i} & \cdots & 0 \\ 0 & b_{2,i+1} & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & b_{K,i} \\ 0 & 0 & \cdots & b_{K,i+1} \end{bmatrix} \otimes \mathbf{I}_{N+1} $$

where $\otimes$ denotes the Kronecker product and $\mathbf{I}_{N+1}$ is the identity matrix of rank N+1; thus we estimate N+1 parameters for each user. $\mathbf{z}_i = [\mathbf{z}_1^T \; \mathbf{z}_2^T \; \cdots \; \mathbf{z}_K^T]^T$ is a K(N+1)×1 channel response vector, where $\mathbf{z}_k$ is the (N+1)×1 channel response vector of the kth user. Assuming that all paths are located within one symbol duration of some time reference, the $q_{k,p}$th and $(q_{k,p}+1)$th elements of $\mathbf{z}_k$ receive the contributions $(1-\gamma_{k,p})\,w_{k,p}$ and $\gamma_{k,p}\,w_{k,p}$ from the pth path of the kth user, where $\tau_{k,p} = (q_{k,p} + \gamma_{k,p})\,T_c$. If user k has a single path at delay $\tau_{k,1}$, the channel impulse response vector of that user can be written as

$$ \mathbf{z}_k = \left[ 0 \,\cdots\, 0 \;\; (1-\gamma_{k,1})\,w_{k,1} \;\; \gamma_{k,1}\,w_{k,1} \;\; 0 \,\cdots\, 0 \right]^T $$

which can be calculated with the simplified single-user channel estimation, namely the Correlator-CEF, given by

$$ \hat{\mathbf{z}}_{SU}(L) = \frac{1}{NL} \sum_{i=1}^{L} (\mathbf{U}_i \mathbf{B}_i)^H \mathbf{r}_i. \qquad (6) $$

III. MS-WLMS ALGORITHM

In this section, the MS-WLMS algorithm proposed in [14] is briefly described. The MS-WLMS algorithm is based on the baseband model (5) and the method proposed in [10]. It can be summarized as follows: for $i = 1, 2, \ldots, M$, as soon as the N×1 observation vector $\mathbf{r}_i$ is received,

• compute the error $\boldsymbol{\varepsilon}_i$ as

$$ \boldsymbol{\varepsilon}_i = \mathbf{r}_i - \mathbf{X}_i^H \hat{\mathbf{z}}_{i|i-1} \qquad (7) $$

where $\mathbf{X}_i^H = \mathbf{U}_i \mathbf{B}_i$;

• compute the smoothed version $\hat{\mathbf{z}}_i$ as

$$ \hat{\mathbf{z}}_i = \hat{\mathbf{z}}_{i|i-1} + \mu\, \mathbf{X}_i \boldsymbol{\varepsilon}_i \qquad (8) $$

where the term $\mathbf{X}_i \boldsymbol{\varepsilon}_i$ can be viewed as performing the correlation over the prediction error;

• compute the one-step prediction

$$ \hat{\mathbf{z}}_{i+1|i} = \hat{\mathbf{z}}^S_i + \hat{\mathbf{z}}^P_i \qquad (9) $$

where

$$ \hat{\mathbf{z}}^S_i = \sum_{n=1}^{N_S-1} \xi_n\, \hat{\mathbf{z}}_{i-n-1} \qquad (10) $$

and

$$ \hat{\mathbf{z}}^P_i = \sum_{n=1}^{N_P-1} \zeta_n\, \hat{\mathbf{z}}_{i-n+1|i-n}. \qquad (11) $$

Using the procedure outlined in [10], after a proper choice of the AR model order p and the step size $\mu$, the coefficients $\xi_n$ and $\zeta_n$ are computed. The algorithm as outlined above needs $\mu$ to be set to its optimal value. One way to search dynamically for $\mu$ is to compute it at each iteration $i = 1, 2, \ldots, M$, as in steepest descent based algorithms, using

$$ \mu_i = \lambda\, \frac{\boldsymbol{\varepsilon}_i^H \boldsymbol{\varepsilon}_i}{2KN}. \qquad (12) $$

At an additional computational cost, the MS-WLMS thus provides a stable and suitable solution for setting $\mu$.
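The iteration (7)-(12) can be sketched in a few lines of plain Python. This is only an illustrative floating-point sketch, not the fixed-point architecture of Section IV. The helper names, the way the filter histories are indexed in `predict`, the value of $\lambda$, and the placeholder coefficients `xi=[1.0]`, `zeta=[]` are all assumptions; the actual $\xi_n$ and $\zeta_n$ follow from the procedure of [10] and are not reproduced here.

```python
def matvec(a, x):
    """Product A x for a matrix stored as a list of rows."""
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in a]

def herm_matvec(a, y):
    """Product A^H y (conjugate transpose of A applied to y)."""
    return [sum(complex(a[r][c]).conjugate() * y[r] for r in range(len(a)))
            for c in range(len(a[0]))]

def ms_wlms_step(xh, r, z_pred, lam, k_users):
    """One update; xh stands for X_i^H = U_i B_i and has N rows."""
    n_chips = len(r)
    eps = [ri - yi for ri, yi in zip(r, matvec(xh, z_pred))]            # (7)
    mu = lam * sum(abs(e) ** 2 for e in eps) / (2 * k_users * n_chips)  # (12)
    corr = herm_matvec(xh, eps)                                         # X_i eps_i
    z_hat = [z + mu * c for z, c in zip(z_pred, corr)]                  # (8)
    return eps, mu, z_hat

def predict(z_hist, zp_hist, xi, zeta):
    """One-step prediction in the spirit of (9)-(11).

    z_hist[0] is the newest smoothed estimate and zp_hist[0] the newest
    prediction; this indexing is an assumption of the sketch.
    """
    out = [0j] * len(z_hist[0])
    for coef, vec in zip(xi, z_hist):       # smoothing branch
        out = [o + coef * v for o, v in zip(out, vec)]
    for coef, vec in zip(zeta, zp_hist):    # prediction branch
        out = [o + coef * v for o, v in zip(out, vec)]
    return out

# Toy check: X X^H = 2 I here, so a step of 1/2 recovers a static channel at once.
xh = [[1, -1], [1, 1]]
z_true = [1 + 0j, 0.5j]
r = matvec(xh, z_true)
eps, mu, z_hat = ms_wlms_step(xh, r, [0j, 0j], lam=0.8, k_users=1)
z_next = predict([z_hat], [], xi=[1.0], zeta=[])
```

With the placeholder coefficients the prediction simply carries the smoothed estimate forward, which reduces the scheme to an LMS update with the adaptive step size of (12).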
IV. ARCHITECTURE DESIGN

Based on a processor array implementation, a common architecture was developed for both the MS-WLMS and the Correlator-CEF channel estimation algorithms. For K users, we reuse the same time-pipelined general processing unit designed for a single user. This approach does not prevent the architecture from being modified to use a single general processing unit, or a pre-defined number of them, in order to save resources on an FPGA chip. Because the spreading factor is fixed at N=256, the general processing unit may need further pipelining, depending on the length of the processing cycle and its suitability for the targeted channel fading rate. Fig. 1 shows the general block diagram for both channel estimation techniques.

Fig. 1 General block diagram of channel estimation processor.

The processing code-bit (PCB) array takes the pilot sequence and the spreading codes as inputs and generates $\mathbf{X}_i^H$. The processing estimated channel (PEC) vector array, together with the "RAM-Based Shift Register" module, implements (7)-(12) in the case of MS-WLMS and/or (6) in the case of the single-user channel estimation.

Fig. 2 Structure of processor arrays for Processing Code array (a) and processing estimated channel vector array (b).
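The code-bit products and the per-element operation of the PEC array can be sketched without any multiplier when the pilot bits are ±1 and the chips have ±1 real and imaginary parts. Both conditions are assumptions of this sketch (the second is inferred from the real/imaginary code paths labeled in the figures), and the function names are hypothetical.

```python
def neg_if(x, sign):
    """Return x for sign = +1 and -x for sign = -1 (a two's-complement negate in hardware)."""
    return x if sign > 0 else -x

def code_bit(c_re, c_im, b):
    """Product of a chip c_re + j*c_im (components in {+1, -1}) and a pilot bit b in {+1, -1}."""
    return neg_if(c_re, b), neg_if(c_im, b)

def mul_pm1(x_re, x_im, m, n):
    """(x_re + j*x_im) * (m + j*n) for m, n in {+1, -1}, using only negation and addition."""
    out_re = neg_if(x_re, m) + neg_if(x_im, -n)   # m*x_re - n*x_im
    out_im = neg_if(x_re, n) + neg_if(x_im, m)    # n*x_re + m*x_im
    return out_re, out_im

# Example: chip 1 - j with pilot bit -1, applied to the integer sample 3 - 2j.
m, n = code_bit(1, -1, -1)
y_re, y_im = mul_pm1(3, -2, m, n)
```

Since every product reduces to a conditional sign change followed by an addition, each array element needs only adders, negators, and registers, which is the property a multiplier-free processing element relies on.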
The structure of both arrays is shown in Figs. 2 and 3. Since we assume Binary Phase Shift Keying (BPSK) modulation (1 bit/symbol), the matrix multiplication involves only multiplications by ±1, which makes it possible to design a multiplier-free PCB array element PE1 and PEC array element PE2, shown in Fig. 3.

Fig. 3 Processor Elements a) PE1 Correlator-CEF b) PE1 MS-WLMS c) PE2 for both architectures.

Fig. 4 illustrates the general structure of the Correlator-CEF architecture based on the mathematical model given in (6). It assumes an averaging window length of L=4 and a multiplier-free design [15]. The timing diagram is shown in Fig. 5. For a fixed number of users K, this diagram shows how often an observation vector $\mathbf{r}_i$ is processed to generate the channel estimates of each user after an averaging window of length L. A processing cycle of 2(N+1) clock cycles is required in order to receive and process the data of a single observation vector. This implies a latency of L(N+1)/fclk with a throughput of 2(N+1)fclk, where fclk is the clock frequency.

Fig. 4. Processing unit general structure for user k, Correlator-CEF.

Fig. 5. Timing diagram of Correlator-CEF processing cycles.

Fig. 6 shows the general structure of the MS-WLMS architecture based on the mathematical model of Section III. Compared with the general processing unit of the Correlator-CEF, an additional block has been added for the outputs of both filters and for the calculation of the step size $\mu$. We have chosen the filter orders as NS=5 and NP=4. The multipliers used by the FIR smoothing/prediction filters of (10) and (11), together with the single multiplier-accumulator (MAC) used to calculate the step size $\mu$, are the only multipliers in the whole architecture. The timing diagram is shown in Fig. 7. As the array of PE2 processor elements is the resource consumption bottleneck, this style of pipelining appears more attractive to implement. Accordingly, the latency and throughput are 2K(N+1)/fclk and 2(K+1)(N+1)fclk respectively.

Fig. 6. Processing unit general structure for user k, MS-WLMS.

V. RESULTS AND COMMENTS

The results of this work are functional hardware architectures targeted at the Xilinx Virtex II and Virtex II Pro components and satisfying the different algorithmic specifications. The FPGA implementations are validated against Matlab simulations based on a CDMA platform.

A. Simulation Results

Consider a WCDMA system with one receiving antenna for a 64 kb/s data rate in compliance with 3GPP. The users are assumed to access the system asynchronously. A Vehicular A channel with mobile speeds uniformly distributed between 0 km/h and 120 km/h is considered for simulation. Perfect power control (PC) is implemented (0% power control transfer error rate) with a PC step size of 1 dB, a PC dynamic range of 80 dB and a PC frequency of 1500 Hz.
The comparison was performed between the MS-WLMS and the Correlator-CEF with different detectors, namely the Rake receiver, the hard multistage parallel interference canceller (hard IC) and the soft MPIC (soft IC). From Figs. 8a and 8b, it is apparent that a performance improvement can be obtained using a superior channel estimator. At a 64 kb/s data rate, the simulation results demonstrate the importance of using multiuser detectors instead of the Rake receiver to reach a BER as low as 5%. The multiuser detectors show performance gains of up to 4 dB over the Rake receiver when using the Correlator-CEF; in addition, an extra 2 to 4 dB is attainable when our proposed channel estimator is used.

B. Implementation Analysis

The clock period in both architectures is limited by either the propagation path through the PEC processor element or one multiplication combined with a register, in addition to routing delays. The FPGA synthesis results for both propagation paths show a clock frequency of 110 MHz. Table I depicts the hardware resources needed to implement the PCB and the PEC arrays on Virtex II or Virtex II-Pro devices from Xilinx. The post-layout synthesis results included in Table I were obtained using the Leonardo and Foundation tools.

Fig. 7. Timing diagram of MS-WLMS processing cycles, K=2.

Considering that the PCB and the PEC arrays have the same structure in both architectures, $\hat{\mathbf{z}}_k$ is obtained every 2.33 µs and 4.67(K+1) µs for the Correlator and MS-WLMS techniques respectively. The time frame defined by 3GPP WCDMA is 10 ms with 150 pilot bits per frame. Considering the time and hardware constraints, we evaluated the maximum number of estimated channels for a given FPGA device.
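The timing figures above can be reproduced with simple arithmetic: at N=256 and fclk=110 MHz, 2.33 µs corresponds to (N+1)/fclk and 4.67(K+1) µs to 2(K+1)(N+1)/fclk. The sketch below also derives a user budget under an assumption that is not stated explicitly in the text, namely that a single processing unit must serve all users within one pilot-bit period (10 ms / 150); the constant and function names are hypothetical.

```python
N = 256            # spreading factor
F_CLK = 110e6      # clock frequency in Hz
FRAME = 10e-3      # 3GPP WCDMA frame duration in seconds
PILOT_BITS = 150   # pilot bits per frame

def correlator_period(n=N, f_clk=F_CLK):
    """Time between Correlator-CEF estimates: (N+1)/fclk."""
    return (n + 1) / f_clk

def ms_wlms_period(k_users, n=N, f_clk=F_CLK):
    """Time between MS-WLMS estimates for K users: 2(K+1)(N+1)/fclk."""
    return 2 * (k_users + 1) * (n + 1) / f_clk

pilot_period = FRAME / PILOT_BITS   # about 66.7 microseconds

# Assumed budget: everything must fit within one pilot-bit period.
max_users_correlator = int(pilot_period // correlator_period())

max_users_ms_wlms = 0
while ms_wlms_period(max_users_ms_wlms + 1) <= pilot_period:
    max_users_ms_wlms += 1

print(f"Correlator period: {correlator_period() * 1e6:.2f} us")
print(f"MS-WLMS period per (K+1): {ms_wlms_period(0) * 1e6:.2f} us")
print(max_users_correlator, max_users_ms_wlms)   # 28 13
```

Under this assumption a single unit supports 28 users for the Correlator-CEF and 13 for the MS-WLMS; larger counts would require more than one processing unit or further pipelining.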
Table II shows the number of users that can be implemented on selected devices of the FPGA families. Our target is to increase the number of estimated channels implemented on each device, especially for the MS-WLMS algorithm, given its low BRAM utilization percentage compared with the Correlator-CEF and the available on-chip memory. The results in Table II assume that each single observation vector is processed in real time as data is received in a WCDMA system. Generally, the MS-WLMS shows a gain close to 2 dB (4 dB) at a complexity increase of 2 to 3 times that of the Correlator-CEF.

TABLE I
NUMBER OF SLICES TO IMPLEMENT PCB AND PEC ON TWO VIRTEX FPGA FAMILIES FOR N=256
Number of CLBs (1 CLB = 4 slices)

FPGA Family               PCB-Array                  PEC-Array
                          Correlator    MS-WLMS
Virtex-II, Virtex-II Pro  3             1            6648

TABLE II
NUMBER OF ESTIMATED CHANNELS THAT CAN BE IMPLEMENTED ON DIFFERENT DEVICES FOR CORRELATOR-CEF AND MS-WLMS ON TWO VIRTEX FPGA FAMILIES (N=256, fclk=110 MHz)

FPGA Family    Device     Correlator-CEF             MS-WLMS                    Slices %
                          No. users   BRAM           No. users   BRAM
Virtex-II      XC2V6000   28          50% 72/144     13          37% 53/144     79% 6676/8448
Virtex-II      XC2V8000   41          62% 104/168    19          47% 79/168     81% 9539/11648
Virtex-II Pro  XC2VP70    28          22% 72/328     13          16% 53/328     80% 6676/8272
Virtex-II Pro  XC2VP100   43          24% 108/444    19          18% 79/444     90% 10016/11024

Fig. 8. Raw BER performance (raw BER versus Eb/No [dB]; curves for Rake, Hard IC and Soft IC, plus Rake with perfect channel estimation in panel b) in Vehicular A channel at [0 120] km/h, 10 users for 64 kb/s in WCDMA conditions, a) Correlator-CEF and b) MS-WLMS.

VI. CONCLUSION

In this paper we proposed an FPGA implementation of two channel estimation algorithms for DS-CDMA, namely the Correlator-CEF and the MS-WLMS. We developed a fixed-point architecture for both techniques based on a processing element array design. Unlike the well-known Maximum Likelihood algorithm, which has a high computational complexity, the MS-WLMS algorithm is shown to be implementable with a complexity close to that of the Correlator and a performance gain comparable to that of the ML algorithm. An additional speedup in execution time can be achieved. It is also shown that our solution could achieve the real-time requirements of the 3GPP standards applied in WCDMA systems. Our future work will focus on increasing the number of users (channels) that can be implemented on one device. This number can be increased considerably by applying a time-multiplexed design and pipelining the processing elements.

ACKNOWLEDGMENT

The authors are grateful for the financial support of the Natural Sciences and Engineering Research Council of Canada (NSERC). We also wish to thank Axiocom Inc. for its technical and financial assistance.

REFERENCES

[1] S. Bhashyam and B. Aazhang, "Multiuser channel estimation and tracking for long code CDMA systems", IEEE Trans. on Communications, vol. 50, no. 7, July 2002.
[2] S. E. Bensley and B. Aazhang, "Subspace-based channel estimation for code division multiple access communication systems", IEEE Trans. on Communications, vol. 44, no. 8, pp. 1009-1020, Aug. 1996.
[3] T. K. Moon, Z. Xie, C. K. Rushforth and R. T. Short, "Parameter estimation in a multiuser communication system", IEEE Trans. on Communications, vol. 42, no. 8, August 1994.
[4] E. G. Strom, S. Parkvall, S. L. Miller, and B. E. Ottersten, "Propagation delay estimation in asynchronous DS-CDMA systems", IEEE Trans. on Communications, vol. 44, no. 1, pp. 84-93, Jan. 1996.
[5] C. Sengupta, A. Hottinen, J. R. Cavallaro, and B. Aazhang, "Maximum likelihood channel parameter estimation in CDMA systems", in Proceedings CISS, Princeton, NJ, March 1998.
[6] R. Cameron and B. Woerner, "Synchronization of CDMA systems employing interference cancellation", in Proceedings VTC, Atlanta, GA, April 1996, pp. 178-182.
[7] J. Thomas and E. Geraniotis, "Iterative MMSE multiuser interference cancellation for trellis coded CDMA systems in multipath fading environments", in Proceedings CISS, Baltimore, MD, March 1999.
[8] A. J. Weiss and B. Friedlander, "Channel estimation for DS-CDMA downlink with aperiodic spreading codes", IEEE Trans. on Communications, vol. 47, no. 10, pp. 1561-1569, October 1999.
[9] Z. Xu and M. K. Tsatsanis, "Blind channel estimation for long code multiuser CDMA systems", IEEE Trans. on Signal Processing, vol. 48, no. 4, pp. 988-1001, April 2000.
[10] L. Lindbom, M. Sternad and A. Ahlén, "Tracking of time-varying mobile radio channels, Part I: The Wiener LMS algorithm", IEEE Trans. on Communications, vol. 49, no. 12, December 2001.
[11] J.-W. Choi and Y.-H. Lee, "Design of channel estimation filters for pilot channel based DS-CDMA systems", IEICE Trans. on Communications, vol. E87-B, no. 2, Feb. 2004.
[12] P. Y. Kam and C. H. Teh, "An adaptive receiver with memory for slowly fading channels", IEEE Trans. on Communications, vol. COM-32, pp. 654-659, June 1984.
[13] K. E. Baddour and N. C. Beaulieu, "Autoregressive models for fading channel simulation", IEEE Global Telecommunications Conference, no. 1, pp. 1187-1192, Nov. 2001.
[14] M. A. Ouameur and D. Massicotte, "SWLMS for channel estimation and tracking", Internal Report, Axiocom Inc., June 2003.
[15] M. Rupp and H. Lou, "On efficient multiplier-free implementation of channel estimation and equalization", IEEE Global Telecomm. Conf., San Francisco, vol. 1, 27 Nov.-1 Dec. 2000, pp. 6-10.