A Clock Recovery Circuit for Blind Equalization of Multi-Gbps Serial Data Links

Jiawen Hu
Villach Design Center, Micronas Semiconductor
Europastrasse 4, A-9524, Villach, Austria

Abstract— This paper presents a clock recovery circuit for multi-Gbps serial data links that extracts the timing information hidden in a pulse amplitude modulated signal by exploiting its intrinsic cyclostationarity. In contrast to conventional clock recovery circuits based on phase detectors of the Alexander or Hogge type, the proposed circuit is compatible with multi-level PAM data and with linearly distorted data whose eye diagram is closed, and it does not convert inter-symbol interference into timing jitter. Hence it is well suited for integration as an a priori clock generator in the blind equalizer of a multi-Gbps serial link receiver. The operation principle, the related theoretical background, and the circuit implementation are discussed in detail.

I. INTRODUCTION

Multi-Gbps serial data transmission over backplanes is a challenging task. In the multi-GHz range, a backplane channel exhibits dispersive behavior, attributed to impedance discontinuities at the multiple vias and connectors along the signal path, and dissipative behavior, resulting from the dielectric loss of the PCB base material and the skin effect of the copper transmission lines. Both channel impairments are frequency dependent and introduce inter-symbol interference (ISI) into the transmitted symbols. Furthermore, because components from different vendors may be connected, the frequency response of an actual backplane channel, and hence the resulting distortion, varies from link to link. To cancel the ISI and thereby achieve the required transmission rate and bit error rate (BER), a blind equalizer that automatically adapts itself to the varying channel distortion without using training sequences is highly desirable. The architecture of a multi-Gbps serial data link is illustrated in Fig. 1(a).
The transmitter can optionally incorporate a pre-emphasis equalizer, which partly compensates for the channel loss at high frequencies. The receiver consists of a clock recovery circuit and a discrete-time equalizer, where the latter can include feed-forward equalization (FFE), decision feedback equalization (DFE), or both. A conventional clock and data recovery (CDR) circuit of the Alexander or Hogge type does not function with unequalized data, whose eye diagram may be completely closed due to the ISI caused by channel distortion or improperly adjusted pre-emphasis; it is therefore usually connected to the equalizer output. This configuration, referred to as a posteriori, has a potential problem during initialization: if the equalizer is unable to provide the clock recovery circuit with a sufficiently open eye diagram, both components may become interlocked, since they are interdependent. Therefore, a robust receiver favors an a priori clock recovery configuration, where the clock is extracted directly from the unequalized data, independently of the equalizer, as shown in Fig. 1(b).

Fig. 1. (a) Architecture of a serial data link with equalizers both at the transmitter and at the receiver, where the receiver incorporates an a posteriori clock recovery circuit; (b) receiver that incorporates an a priori clock recovery circuit.

0-7803-9390-2/06/$20.00 ©2006 IEEE

Multi-level pulse amplitude modulation (PAM) is an efficient method to enhance the transmission efficiency and alleviate the bandwidth requirement, as advised in [1], [2]. The operation mechanism of the clock recovery circuit in those designs is based on an analogy to the CDR for PAM-2 signals. Hence its structure is rather cumbersome, because the data transitions of PAM-4 are much more complicated than those of PAM-2. Furthermore, extending it to PAM-8 and PAM-16 operation is almost impractical.
Aiming at the aforementioned two problems, this paper presents a clock recovery circuit based on the theory of cyclostationarity. This clock recovery circuit functions with all PAM types. It does not presume that the eye diagram of the received data is open; it only postulates that the channel transfer function is linear and nonzero at half the symbol rate. Nor does it convert ISI into timing jitter, as a conventional CDR does [3]. Since it evaluates only the spectrum in the direct vicinity of half the symbol rate for phase detection and ignores the rest, it is less susceptible to noise.

ISCAS 2006

II. OPERATION PRINCIPLE

A. Cyclostationarity in PAM Signal

Referring to the link architecture in Fig. 1, the impulse response of the pulse shaping and pre-emphasis filter in the transmitter is $g(t)$ and that of the transmission channel is $h(t)$. The Fourier transforms of $g(t)$ and $h(t)$ are $G(f)$ and $H(f)$, respectively. The output of the transmitter can be expressed as $s(t) = g(t) * x(t)$ [4], by defining

$$x(t) = \sum_{n=-\infty}^{\infty} c_n\,\delta(t - nT) \tag{1}$$

where $T$ is the symbol interval and $\{c_n\}$ is a real stochastic sequence that is independent and identically distributed. Consequently, $E\{c_n c_m\} = R_c(n - m) = R_c(k) = \sigma_c^2\,\delta(k)$. The power spectral density (PSD) of $x(t)$ is calculated as

$$S_x(f) = \frac{1}{T}\sum_{k=-\infty}^{\infty} R_c(k)\exp(-j2\pi f kT) = \frac{\sigma_c^2}{T}. \tag{2}$$

According to (2), $S_x(f)$ is periodic with $\frac{1}{T}$, i.e. $S_x(f) = S_x(f + \frac{m}{T})$, and exhibits even symmetry about the origin, i.e. $S_x(-f) = S_x(f)$, which leads to

$$S_x\!\left(\frac{m}{2T} + f\right) = S_x\!\left(\frac{m}{2T} - f\right). \tag{3}$$

Hence there exist even symmetries at each multiple of the half symbol rate $\frac{1}{2T}$ in the PSD of $x(t)$.

Fig. 2. Visualization of the PSD at different nodes in a serial data link: (a) PSD of x(t), (b) PSD of the transmitter output signal s(t), (c) transfer function of the channel, (d) PSD of the received data.

To assist understanding, the PSD at different nodes along the serial data link is visualized in Fig. 2. As shown in Fig. 2, $S_x(f)$ exhibits periodicity and consists of multiple replicas of the spectrum in $[-\frac{1}{2T}, \frac{1}{2T})$. Each of them carries the same information, and they are therefore correlated. This property is called spectral correlation or cyclostationarity. The PSD of the received signal $y(t)$ is calculated as $S_y(f) = S_x(f)\,|H(f)G(f)|^2$. Although the profile of $S_y(f)$ is shaped by $|H(f)G(f)|^2$, the spectral correlation originating from $x(t)$ is preserved in $y(t)$. The spectrum in the vicinity of $\pm\frac{1}{2T}$ resembles that of a double side-band suppressed carrier amplitude modulated (DSBSC-AM) signal with a carrier frequency of $\frac{1}{2T}$. As a result, the clock recovery of a PAM signal can be achieved in a similar way to the carrier recovery of a DSBSC-AM signal. In the sequel, $\frac{1}{2T}$ is denoted by $f_0$ for convenience.

B. Phase Detection

Fig. 3. Architecture of the clock recovery circuit

As illustrated in Fig. 3, the clock recovery circuit uses a PLL structure to synchronize its phase with the received data. First, the spectral component of interest is shifted to baseband by a down-converter consisting of two mixers and low-pass filters (LPF). It is then amplified to a proper amplitude level by two variable gain amplifiers (VGA), which is necessary because the attenuation of a backplane channel can be as much as 25 dB at 5 GHz, the half symbol rate of a 10 Gbps PAM-2 signal. Subsequently, the phase detection (PD) and frequency detection (FD) block extracts the phase and frequency information from the baseband signal and converts it into the control signals for the charge pump (CP). The loop filter integrates the current pulses from the CP and in turn controls the VCO, whose quadrature outputs are fed to the mixers. In this way, the loop is closed.

The in-phase and quadrature clocks generated by the VCO are $\cos(2\pi f_{vco} t + \varphi)$ and $\sin(2\pi f_{vco} t + \varphi)$, respectively, where $f_{vco}$ and $\varphi$ are the frequency and phase of the VCO output signals. By defining the complex variable $u(t)$ as $u(t) = a(t) + jb(t)$ and applying Euler's formula, the output of the down-converter can be written as

$$u(t) = \left[y(t)\exp\big(j(2\pi f_{vco} t + \varphi)\big)\right] * l(t), \tag{4}$$

where $l(t)$ is the impulse response of the LPF. When the PLL is locked, $f_{vco} = f_0$. During the acquisition phase of the PLL, $f_{vco} \neq f_0$. However, since in a practical transceiver the VCO frequency is usually calibrated against a quartz oscillator, $\Delta f = f_{vco} - f_0$ is at most a few hundred ppm of $f_0$, and $\exp[j(2\pi\Delta f\,t + \varphi)]$ varies slowly compared to the VCO frequency, so that it can be moved outside the convolution with $l(t)$. As a result,

$$\begin{aligned}
u(t) &= \left[y(t)\exp\big(j(2\pi f_0 t + 2\pi\Delta f\,t + \varphi)\big)\right] * l(t) \\
     &= \underbrace{\left[y(t)\exp(j2\pi f_0 t) * l(t)\right]}_{w(t)}\,\exp[j(2\pi\Delta f\,t + \varphi)].
\end{aligned} \tag{5}$$

By using the frequency shift property of the Fourier transform and assuming that the transfer function of the LPF, $L(f)$, has a steeper roll-off than $H(f - f_0)G(f - f_0)$, the PSD of $w(t)$ can be approximated as

$$\begin{aligned}
S_w(f) &= S_y(f - f_0)\,|L(f)|^2 \\
       &= S_x(f - f_0)\,|H(f - f_0)G(f - f_0)L(f)|^2 \\
       &\approx \frac{\sigma_c^2}{T}\,|H(f_0)G(f_0)L(f)|^2.
\end{aligned} \tag{6}$$

As a result, $w(t)$ is a wide-sense stationary (WSS) process. Although it is generally not real due to the phase shift caused by $H(f)G(f)$, it can be expressed as

$$w(t) = \tilde{w}(t)\exp[j\arg(H(f_0)G(f_0))], \tag{7}$$

where $\tilde{w}(t)$ is a real WSS process that has the same PSD as $w(t)$. By defining $\Delta\varphi = \varphi + \arg(H(f_0)G(f_0))$ and using (7), (5) can be simplified to

$$u(t) = \tilde{w}(t)\exp[j(2\pi\Delta f\,t + \Delta\varphi)]. \tag{8}$$

According to (8), if the PLL is not locked and $f_{vco}$ deviates from $f_0$ by a constant amount $\Delta f$, $u(t)$ is modulated by the term $\exp[j(2\pi\Delta f\,t + \Delta\varphi)]$, and the envelopes of $a(t)$ and $b(t)$ are sinusoidal. If $\Delta f$ is tuned to zero, the envelopes of $a(t)$ and $b(t)$ remain constant and their relation is determined by $\exp(j\Delta\varphi)$. Conversely, $\Delta\varphi$ can be calculated from $a(t)$ and $b(t)$, i.e.,

$$\Delta\varphi = \arctan\big(b(t)/a(t)\big). \tag{9}$$

III. CIRCUIT IMPLEMENTATION

Although the mathematical derivation of the phase detection in (9) is straightforward, it does not offer useful guidance for implementing the PD, since the division and the arctan() function would result in prohibitive hardware cost. This section presents a compact realization of the PD and FD, which has a bang-bang output characteristic and is capable of high-speed operation.

A. Working Mechanism of PD

Fig. 4. Working mechanism of the PD: detect whether the angle of u is (a) lagging or (b) leading π/2; detect whether the angle of v is (c) lagging or (d) leading π/2.

In contrast to its linear counterpart, which outputs a signal proportional to the detected phase error, a bang-bang PD merely indicates whether the phase is leading or lagging. As illustrated in Fig. 4(a)(b), the PD indicates that the phase is lagging $\pi/2$ if $u$ is in the first quadrant, and that the phase is leading $\pi/2$ if $u$ is in the second quadrant. The position of $u$ is determined simply by calculating $\mathrm{sgn}(a)\,\mathrm{sgn}(b)$. In this way, the ambiguity in $a$ and $b$ introduced by $\tilde{w}(t)$, which randomly takes positive and negative values, is eliminated, since both contain the same factor $\tilde{w}(t)$, see (5) and (8). As a side effect, the PD does not distinguish between the two odd quadrants or between the two even quadrants, which results in a phase ambiguity of $\pi$ in the generated clock signal. This phase ambiguity is not sensed by the equalizer, however, since both the rising and the falling edge of the clock are used to slice the data.

In the practical circuit implementation, it is desirable to keep the envelopes of $a$ and $b$ equal in the locked state. This can be achieved by introducing an auxiliary variable $v = u\exp(j\pi/4) = \frac{\sqrt{2}}{2}(a - b) + j\frac{\sqrt{2}}{2}(a + b)$ and locking $v$ at $\pi/2$, so that $u$ is locked at $\pi/4$, as depicted in Fig. 4(c)(d).

B. Working Mechanism of FD

Fig. 5. Working mechanism of the FD: detect a cross-over of sgn(a)sgn(b) from −1 to 1, (a) if v is in the first quadrant, ∆f is positive; (b) if v is in the second quadrant, ∆f is negative; detect a cross-over of sgn(a)sgn(b) from 1 to −1, (c) if v is in the first quadrant, ∆f is negative; (d) if v is in the second quadrant, ∆f is positive.

In order to extend the pull-in range of the PLL independently of its loop bandwidth, an FD is incorporated in the clock recovery circuit. The output of the FD is only active when the detected phase is continuously wandering in one direction, and it indicates the direction of the phase change. As shown in Fig. 5, an event is triggered each time $u$ changes quadrant. Referring to Fig. 5(a)(b), a change of $u$ from the fourth quadrant to the first quadrant is indistinguishable from one from the second quadrant to the first quadrant if only $\mathrm{sgn}(a)\,\mathrm{sgn}(b)$ is considered, because it changes from −1 to 1 in both cases. The auxiliary variable $v$ is used to resolve this ambiguity: in the former case $v$ is located in the first quadrant, as shown in Fig. 5(a), while in the latter case $v$ is located in the second quadrant, as shown in Fig. 5(b). The ambiguity at the cross-over of $\mathrm{sgn}(a)\,\mathrm{sgn}(b)$ from 1 to −1 is resolved similarly, as demonstrated in Fig. 5(c)(d).

C. Realization of PD and FD

Fig. 6. Block diagram of the phase detector and the frequency detector

The PD and FD block is realized with minimum circuit complexity; it contains six analog comparators and several digital blocks, as shown in Fig. 6. The first four comparators detect the signs of $a$ and $b$ and determine the quadrant of $u$. The remaining two comparators detect the signs of $a - b$ and $a + b$ and determine the quadrant of $v$. When $u$ lies in the vicinity of a quadrant boundary, either $a$ or $b$ assumes a small value, and $\mathrm{sgn}(a)\,\mathrm{sgn}(b)$ is prone to disturbance by noise, which can be falsely interpreted as a quadrant change of $u$. Therefore a threshold voltage reference $V_{REF}$ is built into the comparator bank, so that the comparison results of $a$ and $b$ are discarded when either of their absolute values is smaller than $V_{REF}$. The digital blocks analyze the results coming from the comparators, determine the quadrant of $v$ and the quadrant change of $u$, and generate the control signals for the CP accordingly.
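
The PD and FD decision rules described above can be illustrated with a short behavioral sketch in Python. It is only an illustration under stated assumptions, not the circuit of Fig. 6. The function names (`sgn`, `pd_decision`, `u_parity`, `fd_step`, `run_fd`), the sampled-data formulation, the test signal, and all numeric values are hypothetical. The sketch applies the PD to v through sgn(a − b)sgn(a + b), discards comparisons when |a| or |b| falls below a threshold standing in for VREF, and derives the sign of ∆f at each detected cross-over of sgn(a)sgn(b) following the four cases of Fig. 5.

```python
import cmath
import math
import random


def sgn(x):
    """Sign of x as +1 or -1 (zero is treated as positive)."""
    return 1 if x >= 0 else -1


def pd_decision(a, b):
    """Bang-bang PD acting on v = u*exp(j*pi/4).

    Re(v) is proportional to a - b and Im(v) to a + b, so the product of
    their signs is +1 when v lies in an odd quadrant (angle lagging pi/2)
    and -1 when v lies in an even quadrant (angle leading pi/2).
    """
    return sgn(a - b) * sgn(a + b)


def u_parity(a, b, vref):
    """sgn(a)*sgn(b), or None if |a| or |b| is below the assumed threshold."""
    if abs(a) < vref or abs(b) < vref:
        return None  # comparison discarded near a quadrant boundary
    return sgn(a) * sgn(b)


def fd_step(prev_parity, a, b, vref):
    """One FD evaluation; returns (fd, parity_to_remember).

    fd is +1 or -1 (sign of delta_f) when a cross-over of sgn(a)sgn(b)
    is detected, and 0 otherwise.
    """
    parity = u_parity(a, b, vref)
    if parity is None:
        return 0, prev_parity
    if prev_parity is None or parity == prev_parity:
        return 0, parity
    # Cross-over to +1: delta_f > 0 if v is in an odd quadrant.
    # Cross-over to -1: delta_f > 0 if v is in an even quadrant.
    return parity * pd_decision(a, b), parity


def run_fd(delta_f, n=2000, dt=1e-3, vref=0.05, seed=0):
    """Sum the FD outputs for u(t) = w(t)*exp(j(2*pi*delta_f*t + 0.3)),
    where w(t) is a real factor with random sign (assumed test signal)."""
    rng = random.Random(seed)
    prev, total = None, 0
    for k in range(n):
        w = rng.choice((-1.0, 1.0)) * rng.uniform(0.5, 1.0)
        u = w * cmath.exp(1j * (2 * math.pi * delta_f * k * dt + 0.3))
        fd, prev = fd_step(prev, u.real, u.imag, vref)
        total += fd
    return total
```

With these assumed values, `run_fd(5.0)` accumulates a positive sum and `run_fd(-5.0)` a negative one, while `run_fd(0.0)` produces no FD events because the phase stays in one quadrant. The random sign of the real factor does not affect any decision, since it cancels in both sign products.
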
The entire block can be operated at a lower frequency than the half symbol rate, which is an advantage with regard to power efficiency.

IV. SIMULATION RESULTS

The behavioral model of the clock recovery circuit has been simulated with different backplane channels. As shown in Fig. 7(a)(b), channel 1 has slight attenuation, and the eye diagram of the received PAM-4 signal is open. Fig. 7(d)(e) show that channel 2 has such severe attenuation that the eye diagram of the received PAM-2 signal is almost closed. In both cases, as shown in Fig. 7(c)(f), the clock of the received data is successfully recovered.

Fig. 7. Simulation plots: (a) frequency response of channel 1, (b) eye diagram of the received PAM-4 signal distorted by channel 1, (c) recovered clock from the distorted PAM-4 signal, (d) frequency response of channel 2, (e) eye diagram of the received PAM-2 signal distorted by channel 2, (f) recovered clock from the distorted PAM-2 signal.

V. CONCLUSION

This paper has discussed the operation principle and circuit implementation of a clock recovery circuit for multi-Gbps serial links. The frequency and phase information is detected by exploiting the cyclostationarity in the received data spectrum. Unlike a traditional CDR circuit, which works only with an open data eye diagram and PAM-2 modulation, the proposed clock recovery circuit is compatible with a closed data eye diagram and multi-level PAM signals.

ACKNOWLEDGMENT

The author would like to thank Stephan Mechnig, Christian Ebner, Christophe Holuigue, Heinz Werker, Gerhard Mitteregger and Dr. Heribert Geib of Xignal Technologies AG for helpful discussions and support.

REFERENCES

[1] J. L. Zerbe, C. W. Werner, V. Stojanović, F. Chen, J. Wei, G. Tsang, D. Kim, W. F. Stonecypher, A. Ho, T. P. Thrush, R. T. Kollipara, M. A. Horowitz, and K. S. Donnelly, “Equalization and clock recovery for a 2.5–10-Gb/s 2-PAM/4-PAM backplane transceiver cell,” IEEE J. Solid-State Circuits, vol. 38, no. 12, pp. 2121–2130, Dec. 2003.
[2] V. Stojanović, A. Ho, B. W. Garlepp, F. Chen, J. Wei, G. Tsang, E. Alon, R. T. Kollipara, C. W. Werner, J. L. Zerbe, and M. A. Horowitz, “Autonomous dual-mode (PAM2/4) serial link transceiver with adaptive equalization and data recovery,” IEEE J. Solid-State Circuits, vol. 40, no. 4, pp. 1012–1026, Apr. 2005.
[3] P. K. Hanumolu, B. Casper, R. Mooney, G.-Y. Wei, and U.-K. Moon, “Analysis of PLL clock jitter in high-speed serial links,” IEEE Trans. Circuits Syst. II, vol. 50, no. 11, pp. 879–886, Nov. 2003.
[4] N. M. Blachman, “Beneficial effects of spectral correlation on synchronization,” in Cyclostationarity in Communication and Signal Processing, W. A. Gardner, Ed. IEEE Press, 1994, pp. 362–390.