IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 44, NO. 7, JULY 2009 1927

A 5.75 to 44 Gb/s Quarter Rate CDR With Data Rate Selection in 90 nm Bulk CMOS

Lucio Rodoni, Student Member, IEEE, George von Büren, Student Member, IEEE, Alex Huber, Member, IEEE, Martin Schmatz, Member, IEEE, and Heinz Jäckel, Member, IEEE

Abstract: This paper presents a quarter-rate clock and data recovery (CDR) circuit for plesiochronous serial I/O links. The 2×-oversampling phase-tracking CDR, implemented in 90 nm bulk CMOS technology, covers the whole range of data rates from 5.75 to 44 Gb/s in a single IC by means of a novel data rate selection logic. Input data are sampled with eight parallel differential master-slave flip-flops, for which bandwidth enhancement techniques were necessary in 90 nm CMOS. Precise, low-jitter local clock phases are generated by an analog delay-locked loop. These clock phases are aligned to the incoming data by four parallel phase rotators. The phase-tracking loop of the CDR is realized as a digital delay-locked loop and is therefore immune to process tolerances. The CDR is able to track a maximum frequency deviation of 615 ppm between the incoming data and a local reference clock and fulfills the extended XAUI jitter tolerance mask. A bit error rate below 10⁻¹² was verified up to 38 Gb/s using a 2⁷−1 PRBS pattern. With a low power consumption per data rate of only 5.74 mW/(Gb/s), the CDR meets the specifications of the International Technology Roadmap for Semiconductors for 90 nm CMOS serial I/O links at the maximal data rate of 44 Gb/s. The CDR occupies a chip area of 0.2 mm².

Index Terms: Clock and data recovery (CDR), CMOS analog integrated circuits, current-mode logic (CML), delay-locked loop (DLL), high-speed serial link, jitter tolerance.

I. INTRODUCTION

The aggregate data communication bandwidth of key components in telecommunication equipment and computer servers has experienced a continuous increase.
This progress has been achieved by increasing the serial data rate and by integrating more power- and area-efficient transceivers on a single CMOS IC. Key trends in CMOS technology, power consumption, and aggregate data rate are summarized in Table I according to the forecast of the International Technology Roadmap for Semiconductors (ITRS) published in 2004 [1]. In Table I, the transceivers are categorized into high-integration-level serial transceivers (e.g., 200 × 8 Gb/s) and high-performance serial transceivers (e.g., 40 × 40 Gb/s). High-integration-level [2], [3], high-performance [4], [5], and electrical/optical [6], [7] chip-to-chip transceivers, representing the state of the art, are summarized in Table II.

TABLE I
SERIAL TRANSCEIVER ROADMAP OF ITRS [1]

Manuscript received November 07, 2008; revised March 06, 2009. Current version published June 24, 2009. This work was supported by the Swiss Federal Office for Professional Education and Technology under Contract/Grant number KTI 7995.1.
L. Rodoni, G. von Büren, and H. Jäckel are with the Swiss Federal Institute of Technology (ETH) Zurich, Electronics Laboratory, 8092 Zurich, Switzerland (e-mail: lucio@rodoni.ch; george.vonbueren@ife.ee.ethz.ch; jaeckel@ife.ee.ethz.ch).
A. Huber is with the Institute of Microelectronics, University of Applied Sciences Northwestern Switzerland, 5210 Windisch, Switzerland (e-mail: alex.huber@fhnw.ch).
M. Schmatz is with the Zurich Research Laboratory, IBM Research, 8803 Rüschlikon, Switzerland (e-mail: mrt@zurich.ibm.com).
Digital Object Identifier 10.1109/JSSC.2009.2021913

One of the critical and speed-limiting circuit blocks in a serial I/O link macro-cell is the clock and data recovery (CDR) circuit in the receiver. The first 40 Gb/s CMOS CDR was presented in 2003 and was realized in a 0.18 µm process [8]. This 40 Gb/s CDR employs a quarter-rate architecture with a multiphase LC oscillator and a passive loop filter.
In 2007, a quarter-rate 3×-oversampling 40–44 Gb/s CDR with 1:16 DEMUX implemented in 90 nm CMOS was presented [9]. This CDR fulfills the ITU-T G.8251 jitter tolerance mask, and its power consumption is less than one third of that of a comparable commercial SiGe CDR with 1:16 DEMUX.

TABLE II
PUBLISHED SERIAL TRANSCEIVERS AND CDRS

Fig. 1. Architecture of the phase tracking loop.

0018-9200/$25.00 © 2009 IEEE
Authorized licensed use limited to: Fachhochschule Nordwestschweiz. Downloaded on June 25, 2009 at 04:48 from IEEE Xplore. Restrictions apply.

In multi-channel applications, where every participant has nominally the same local reference frequency, the CDR of each receiver aligns the phase of its plesiochronous sampling clock to the incoming data by using phase interpolation techniques [10]. Since no VCO is needed in each CDR, coupling between channels is reduced. The control of the sampling position is realized by analog [11] or digital [10], [12] phase-tracking loops. Using an analog phase interpolator, a 10.8 Gb/s half-rate CDR implemented in 0.11 µm CMOS fulfills the SDH/SONET jitter tolerance at a BER of 10^(…), consuming a power of 220 mW and an area of 0.35 mm² [11]. A half-rate 25 Gb/s CDR implemented in 90 nm CMOS, achieving a BER of 10^(…), incorporates a digital first-order loop filter, consumes 98 mA from a 1.1 V supply, occupies a die area of only 0.064 mm², and is therefore suited for high-density integration [12]. It has been shown with a 65 nm SOI CMOS technology that the area and the power consumption per data rate of a quarter-rate 40 Gb/s CDR can be as low as 0.03 mm² and 1.8 mW/(Gb/s), respectively [13]. Performance figures (e.g., input data range, reference clock frequency range) of these CDRs are listed in Table II. None of them is able to handle the complete input data range from 10 to 40 Gb/s.
Either the input data rate range is limited by the VCO of the CDR [8], [9], [13], or the plesiochronous CDR requires too large a reference frequency range [11], [12]. Therefore, we propose, and have successfully implemented, a data rate selection logic that covers the whole range of data rates from 5.75 to 44 Gb/s while the reference frequency range is 5.75 to 11 GHz [14]. This feature makes the circuit especially suitable for multi-standard applications, enabling new link rates while supporting compatibility with legacy rates.

Section II gives an overview of the proposed CDR architecture. The building blocks of the CDR, such as the samplers, 8:32 demultiplexer, digital control loop with data rate selection, phase rotator, delay-locked loop (DLL), and clock buffer, are described in detail in Section III. Finally, measurement results are presented in Section IV and a summary is given in Section V.

II. CDR ARCHITECTURE

In high-density serial I/O links, the transmitter (TX) and receiver (RX) are clocked by two independent reference clocks having the same nominal frequency. The CDR of the receiver has to track a slowly drifting phase difference between the incoming data and the RX clock, caused by a bounded frequency difference in the range of 10 to 100 ppm between the quartz-based plesiochronous TX and RX clocks. Hence, a phase-tracking loop in the CDR is sufficient for this purpose. Since the sampler is the speed-limiting circuit block of the CDR, parallel architectures, e.g., half-rate [12], [15] and quarter-rate CDRs [8], [13], are employed to demultiplex the data at the input. A higher demultiplexing factor increases the number of samplers and the number of clock edges, but has the following two advantages: 1) the regeneration phase of the comparator at the input is enlarged and 2) the sampling clock frequency is lowered, simplifying the on-chip clock distribution. The block diagram of the 2×-oversampling quarter-rate CDR is shown in Fig. 1.
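The claim that a reference range of roughly 5.75 to 11 GHz suffices for 5.75 to 44 Gb/s can be checked with a small sketch. It assumes that the mode is chosen from the data rate alone, using the quarter-rate (QR), half-rate (HR), and full-rate (FR) ranges given in Section III-C, and that these modes receive four, two, and one data bit per reference clock period, respectively. The function name `select_mode`, the table `MODES`, and the tie-break at the range boundaries are illustrative choices, not part of the design.

```python
# Rate ranges in Gb/s as quoted in the text; bits_per_period is the assumed
# number of data bits received per reference clock period in that mode.
MODES = (
    ("QR", 23.0, 44.0, 4),   # quarter-rate
    ("HR", 11.5, 23.0, 2),   # half-rate
    ("FR", 5.75, 11.5, 1),   # full-rate
)


def select_mode(rate_gbps):
    """Return (mode, reference clock frequency in GHz) for a data rate in Gb/s."""
    for name, low, high, bits_per_period in MODES:
        # A rate on a shared boundary (11.5 or 23 Gb/s) falls into the faster
        # mode listed first; this tie-break is an arbitrary choice of the sketch.
        if rate_gbps >= low and high >= rate_gbps:
            return name, rate_gbps / bits_per_period
    raise ValueError("data rate outside the 5.75 to 44 Gb/s range")


if __name__ == "__main__":
    for rate in (5.75, 10.0, 20.0, 40.0, 44.0):
        mode, f_ref = select_mode(rate)
        print(f"{rate:5.2f} Gb/s: {mode} mode, reference clock {f_ref:.2f} GHz")
```

Under these assumptions every supported data rate maps to a reference clock between 5.75 and 11.5 GHz, which is why a band-limited clock generator can serve the full data rate range.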
We chose a dual-loop architecture [10]: 1) the phase tracking loop is realized as a digital DLL and 2) the reference clock phases of the phase rotators are generated by an analog DLL. Eight parallel samplers, clocked by the sampling clock phases, acquire the four data bits and the four edges needed to evaluate the sampling position [16]. Eight parallel 1:4 demultiplexers reduce the data rate from 10 to 2.5 Gb/s and align the sampled bits, which are separated by 1/8th of the period of the reference clock signal, to a single 2.5 GHz clock phase. At this point, the transition from differential signaling to full-swing CMOS signal levels is performed, since the data are aligned and the clock period is long enough to design the digital blocks with a standard design flow [17]. The aligned 16 data bits and 16 edge bits are compared in the edge detector, which solves the 16 Alexander equations [16] and outputs an early/late signal after majority voting. The digital loop filter then evaluates the sequence of early and late bits and, if needed, asserts an up/down signal for the phase rotator. The up/down counter translates the up/down signal to a thermometer-coded word controlling the four phase rotators. These four phase rotators shift the local reference phases so that the sampling phases are aligned to the incoming data. The local reference clock phases of the phase rotators are generated with a DLL from a single reference clock. All circuit blocks operating at a clock frequency above 2.5 GHz are implemented as current-mode logic (CML) to meet the speed requirements. In addition, CML circuits have a higher immunity to supply noise and generate less switching noise on the power supply.
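The edge detector's early/late decision with majority voting can be sketched as follows. This is a behavioural illustration of a standard Alexander (bang-bang) phase detector, not the synthesized logic of this design. The function names, the convention that "early" means the edge sample still equals the previous data bit, and the extra leading data bit carried over from the previous word are assumptions of the sketch.

```python
def alexander_vote(d_prev, edge, d_next):
    """Vote of one data/edge pair: +1 clock early, -1 clock late, 0 no information.

    The edge sample is taken between the two data samples. Without a data
    transition it carries no phase information.
    """
    if d_prev == d_next:
        return 0
    # Assumed polarity: an edge sample that still shows the old bit was taken
    # before the transition, i.e. the sampling clock is early.
    return 1 if edge == d_prev else -1


def early_late(data_bits, edge_bits):
    """Majority vote over all pairs; data_bits holds one bit more than edge_bits."""
    if len(data_bits) != len(edge_bits) + 1:
        raise ValueError("need exactly one more data bit than edge bits")
    total = sum(
        alexander_vote(data_bits[i], edge_bits[i], data_bits[i + 1])
        for i in range(len(edge_bits))
    )
    if total > 0:
        return "early"
    if 0 > total:
        return "late"
    return "none"


if __name__ == "__main__":
    print(early_late([0, 1, 1, 0, 1], [0, 1, 1, 0]))
    print(early_late([0, 1, 1, 0, 1], [1, 1, 0, 1]))
```

Majority voting compresses the per-pair votes of one parallel word into a single decision, so the loop filter behind it only has to process one early/late indication per word.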
The proposed CDR macro-cell requires only a single reference clock phase, reducing the complexity and power consumption of the reference clock distribution network. The four 10 Gb/s data bits are buffered and fed to output pins for testing and measurement purposes.

III. CDR BUILDING BLOCKS

A. Samplers

In this 2×-oversampling quarter-rate CDR, the front-end sampling latch that is present in each sampler is the most speed-critical building block. The front-end sampling latch has to be able to track the incoming 40 Gb/s signal, sample the data with a 10 GHz clock signal, and then decide whether the voltage at its input is below or above a threshold voltage within half of a 10 GHz clock period. The latch following the sampling latch has relaxed speed constraints because it operates at a reduced data rate of 10 Gb/s. Together with the sampling latch it forms a master–slave flip-flop (MS-FF) and provides a stable output that is valid during a full 10 GHz clock period. At the input of the eight samplers, the data signal should have rise and fall times that are shorter than one half of the bit time. Based on first-order RC-circuit analysis, the total input capacitance of the eight samplers, including wiring and pad capacitance, has to be kept under the bound given by (1), allowing an input capacitance of 10 fF per sampler, excluding wire and pad capacitance of 30 fF and 70 fF, respectively.

A sampling latch consisting of a track-and-hold stage, implemented as NMOS pass transistors, followed by a latch [13], [18] has not been chosen because of its rigorous requirements on the clock signal: a short clock fall time to achieve a high time resolution [19], a high common-mode voltage of the sampling clock, and a large clock swing to fully switch the pass gate on and off. Since all other circuits use CML signaling, differential CML latches are preferred. Samplers composed of CML latches implemented in 90 nm CMOS are able to regenerate a 40 Gb/s data signal [9], [20]. Fig.
2(a) shows the block diagram of our sampler, which consists of a front-end sampling latch [Fig. 2(b)], a slave latch [Fig. 2(c)], and a CML buffer. The CML latches and the CML buffer are fully differential circuits to achieve a higher immunity to power supply variations than pseudo-differential circuits [8], [21], [22]. The sample transistors of the front-end sampling latch are limited in size (W = […] µm and L = […] nm) since the input capacitance of each sampler has to be lower than 10 fF. In order to reduce rise and fall times at the output, the load resistors have to be decreased and the tail current increased. Therefore, the widest transistor that keeps the input capacitance below 10 fF has been chosen. The tail current source has to provide a current of 1.54 mA to bias the transistors at a current density of 0.11 mA/µm, which has been evaluated by simulation to achieve peak […]. In order to provide enough regenerative gain to fully switch the following differential pair and to guarantee enough noise margin, a voltage swing of 600 mV is required, resulting in a load resistance of […]. The latch transistors and the sample transistors have equal dimensions so that both have current densities of peak […] and present the same load to the clock transistors. The transistor pair of the differential clocking stage has to steer the current at a lower frequency than the sample and latch transistor pairs. To guarantee full current switching with a typical CML clock signal, the clock transistors are wider than the sample and latch transistors by a factor of 1.5 in order to reduce the signal swing required for proper switching of the clock differential pair. With a fan-out of 1, this sampling latch is able to regenerate the input data up to 32 Gb/s. This configuration is defined as case I. Because the second CML latch in the MS-FF configuration of Fig. 2(a) has to process a four times lower data rate, the dimensions of its devices have been scaled by a factor of 1/3 in order to reduce the capacitive load of the first latch.
In this configuration (case II) a maximal data rate of 37 Gb/s is achieved, because the tracking bandwidth of the sampling latch has been increased and its regeneration time [23, eq. (1)] has been reduced. To further increase the bandwidth of the sampling latch, shunt-peaking inductors [24] are introduced (case III) as shown in Fig. 2(b). With integrated inductors of […] nH, where one inductor occupies an area of 20 µm × 20 µm using the two topmost metal layers [25], [26], the tracking bandwidth of the first CML latch is extended by a factor of 1.2, enabling the sampling of input data up to 44 Gb/s. A CML buffer after the second latch is needed to drive not only the demultiplexer but also the 10 Gb/s output driver. Even though the first latch of the sampler incorporates two shunt-peaking inductors, the layout of one sampler is still very compact and occupies an area of only 50 µm × 35 µm, of which the two inductors occupy 46%.

Fig. 2. (a) Sampler. (b) Inductive shunt-peaked first CML latch. (c) Second CML latch.

Fig. 3. (a) Required voltage A(Δt), (b) sensitivity function h(t), and (c) normalized transfer function |H(ω)| for the three cases: (I) 1st CML latch with fanout 1, no L; (II) Fig. 2(a), no L; (III) Fig. 2(a).

To quantify and compare the sensitivity, timing resolution, and bandwidth of these three sampling front-ends (I, II, III), each of them has been characterized with the procedure described in [27]. The idea is that the latch can be separated into a linear portion, described by an integration window, and an ideal sampler plus decision-dependent feedback [27]. In this way, the bandwidth limitations of the sampling stage and the impact of the finite slew rate of the clock (0.4 V/32 ps in this case) can be included in the transfer function of the overall data path. The integration window is derived by measuring the sensitivity for a short voltage pulse as a function of the sampling time relative to the clock edge. Fig. 3(a) illustrates the amplitude A(Δt) that is just sufficient to flip the latch for a given time offset Δt with respect to the latch clock. The sensitivity function h(t), or more precisely one divided by the sensitivity per ps, is depicted for the three cases in Fig. 3(b). The sensitivity window of the CML latch with peaking inductors is slightly smaller than the others, indicating superior time-resolution capability. Moreover, the CML latch with peaking inductors (III) has the best DC input sensitivity voltage [27, eq. 29]. The transfer function H(ω) is derived by taking the Fourier transform of the sensitivity function h(t). The transfer function normalized for the target sensitivity of 5 mV is shown in Fig. 3(c). The sampler (III) shown in Fig. 2(a) has the highest equivalent 3 dB bandwidth.

B. Data Alignment and 8:32 Demultiplexer

Fig. 4. Block diagram of the digital control circuit.

The 8:32 demultiplexer block following the samplers consists of eight parallel paths, each built as a cascade of one 1:2 demultiplexer and one 2:4 demultiplexer. Eight 10 Gb/s input signals at the CML level, which are separated by 1/8th of the period of the reference clock signal, are converted to 32 output signals at full-swing CMOS levels at a data rate of 2.5 Gb/s. These output signals are aligned to a single 2.5 GHz CMOS clock signal. The 2.5 GHz clock is derived from the 10 GHz sampling clock and serves as clock for the digital logic.
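The step from a sensitivity window to an equivalent bandwidth, as used in the sampler characterization of Section III-A, can be illustrated numerically. The sketch below assumes a purely hypothetical Gaussian-shaped window with a standard deviation of 5 ps; it does not use measured or simulated data from this work. The names `transfer_magnitude` and `bandwidth_3db` are illustrative.

```python
import math


def transfer_magnitude(h, dt, f):
    """|H(f)| of a sampled sensitivity function h (sample spacing dt in seconds),
    normalized so that the DC value is 1."""
    re = sum(v * math.cos(2.0 * math.pi * f * k * dt) for k, v in enumerate(h))
    im = sum(v * math.sin(2.0 * math.pi * f * k * dt) for k, v in enumerate(h))
    return math.hypot(re, im) / abs(sum(h))


def bandwidth_3db(h, dt, f_step=0.1e9, f_stop=100e9):
    """First frequency on a grid of f_step at which |H| has dropped below 1/sqrt(2)."""
    for i in range(int(f_stop / f_step) + 1):
        f = i * f_step
        if 1.0 / math.sqrt(2.0) > transfer_magnitude(h, dt, f):
            return f
    return None


# Hypothetical example window: Gaussian, sigma = 5 ps, sampled every 0.25 ps.
SIGMA = 5e-12
DT = 0.25e-12
h_example = [math.exp(-0.5 * ((k * DT - 40e-12) / SIGMA) ** 2) for k in range(321)]

if __name__ == "__main__":
    bw = bandwidth_3db(h_example, DT)
    print(f"equivalent 3 dB bandwidth of the example window: {bw / 1e9:.1f} GHz")
```

For a Gaussian window the result can be compared with the closed form sqrt(ln 2)/(2πσ). The general point is that a narrower sensitivity window yields a higher equivalent bandwidth, in line with the ranking of the three sampler variants.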
The design goal for this alignment is to balance the loading of all sampling clock phases without inserting any dummy elements. Different capacitive loads connected to the clock signals would potentially lead to phase shifts, which result in inaccuracies of the sampling points. Simply resampling the input signals of the demultiplexer with one of the clock phases used for sampling is not possible, since that clock phase would be too heavily loaded and, furthermore, the timing margins in the latches of the first demultiplexer stage would be too small. In order to increase the timing margins, the first four samples are delayed by one half of a clock period. Although an additional 50 ps of timing margin is obtained, correct operation is still not guaranteed for all process corners when all eight signals are sampled with one clock phase. We therefore used two phases of the divided clock at 5 GHz, where each of them samples four input signals. This adds another 25 ps of timing margin. The frequency divider can be designed to present a small capacitive load to one of the sampling clock phases. This minimum load is the only loading imbalance of the clock phases. Moreover, using these two phases of the divided clock leads to symmetrical loads connected to the frequency divider, which is favorable in terms of speed. However, at the output of the first demultiplexer stage, the data signals are not aligned yet. The data alignment can easily be achieved by sampling the signals with a single 2.5 GHz clock signal at the input of the second demultiplexer stage. At a nominal data rate of 2.5 Gb/s, the timing margin is large enough (150 ps). The presented alignment procedure results in a minimally imbalanced loading of the sampling clock signals, and it is robust to process, voltage, and temperature (PVT) variations because of the large timing margins.

C. Digital Control Loop With Rate Selection

Fig.
4 illustrates the block diagram of the digital control logic, which offers the option to select between three different input data rates. All these circuit blocks, which run at 2.5 and 1.25 GHz, are synthesized circuits and are placed and routed with a digital design tool. The edge detector solves the Alexander equations (2) [16] for 16, 8, or 4 data/edge pairs depending on the selected data rate. The detector outputs a single early or late signal after majority voting. In order to relax the speed requirements for the digital CMOS loop filter, the output signals of the edge detector are demultiplexed by a factor of two. The loop filter, running at 1.25 GHz, is realized as a finite state machine (FSM), which accumulates the EARLY[1:0] and LATE[1:0] bits. The state machine can make zero, one, or two steps depending on the difference between the number of EARLY[1:0] and LATE[1:0] bits. The state machine consists of twelve states, arranged as two circles. After running through one circle of six states, an up or down signal, respectively, is generated by the FSM. These up and down signals increment or decrement the thermometer-coded up/down counter value, which controls the phase rotator. Since double steps are possible, the state machine needs at least three clock cycles between two consecutive up or down pulses. Hence, the maximum update rate of the phase rotator for the nominal data rate is $f_{\mathrm{update,max}} = 1.25\,\mathrm{GHz}/3 \approx 417\,\mathrm{MHz}$ (3). A higher update rate, which would increase the jitter tolerance, could be reached by reducing the number of states in the FSM.

Fig. 5. Principle of the rate selection: quarter-rate (QR), half-rate (HR), and full-rate (FR) mode, with sample points and discarded samples marked.
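The accumulate-and-pulse behaviour of the loop filter can be sketched as follows. The sketch does not reproduce the twelve-state, two-circle encoding; it assumes that a signed counter which emits a pulse after six net steps in one direction behaves similarly. The class name `LoopFilter`, the mapping of net early decisions to an up pulse, and the handling of the remainder after a pulse are assumptions of the sketch.

```python
class LoopFilter:
    """Behavioural stand-in for the FSM loop filter (illustrative only)."""

    STEPS_PER_PULSE = 6  # one circle of six states per up/down pulse

    def __init__(self):
        self.acc = 0  # net number of early-minus-late steps since the last pulse

    def clock(self, early, late):
        """Advance one filter clock cycle.

        early, late: two-bit tuples standing in for EARLY[1:0] and LATE[1:0].
        Returns +1 for an up pulse, -1 for a down pulse, 0 otherwise.
        """
        self.acc += sum(early) - sum(late)  # zero, one, or two steps per cycle
        if self.acc >= self.STEPS_PER_PULSE:
            self.acc -= self.STEPS_PER_PULSE
            return 1
        if -self.STEPS_PER_PULSE >= self.acc:
            self.acc += self.STEPS_PER_PULSE
            return -1
        return 0


if __name__ == "__main__":
    lf = LoopFilter()
    # Persistent double steps in one direction: a pulse on every third cycle.
    print([lf.clock((1, 1), (0, 0)) for _ in range(6)])
```

With persistent double steps the sketch emits a pulse every third cycle, which at a 1.25 GHz filter clock corresponds to the update rate of roughly 417 MHz discussed above. Balanced early and late inputs produce no pulses at all.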
However, the number of states cannot be reduced arbitrarily, because too few states would induce false phase steps due to the overall delay of the CDR loop. A special feature of our digital control logic is the capability to support different data rates. The logic responsible for the data rate selection is implemented in the edge detector as shown in Fig. 4 and operates as depicted in Fig. 5. Quarter-rate (QR) operation is used for an input data rate from 23 to 44 Gb/s. The early/late generation logic generates an early/late signal for each of the 16 data/edge bit pairs by solving the Alexander equations [16]. When the data rate is lower and the bit length larger, between 11.5 and 23 Gb/s, the CDR operates in half-rate (HR) mode. The edge samples used in the quarter-rate mode are omitted and only the data samples are evaluated. In this mode, the even data samples take the role of the edge bits and the odd data samples are still data bits. From these eight data/edge pairs, the early/late information is generated. For a still lower input data rate ranging from 5.75 to 11.5 Gb/s, the full-rate (FR) mode is possible. Here, the odd data samples are used alternately as data bits and edge bits. Hence, our receiver can cover the full range of data rates from 5.75 to 44 Gb/s, even though the multi-phase DLL, which generates the reference clock phases, is band limited. The DLL operates from 5.75 to 11.5 GHz and limits the lower data rate of the CDR.

D. Phase Rotator

In order to update the sampling position, we use four parallel phase rotators which are controlled by the thermometer-coded up/down counter. Using a full thermometer code, discontinuities in the phase rotator transfer characteristics can be avoided. The reference clock phases generated in the DLL are fed to the four phase rotators. One phase rotator, as shown in Fig. 6, consists of a phase selection stage followed by a phase interpolation stage [10].
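A short numeric sketch can make the behaviour of such a select-and-interpolate rotator concrete. It assumes eight reference phases spaced 45° apart, eight interpolation steps per octant, and an interpolator that adds the two selected phases with linear weights. The linear weighting and all function names are modelling assumptions, not a description of the circuit.

```python
import cmath
import math

N_PHASES = 8  # reference clock phases, assumed 45 degrees apart
N_STEPS = 8   # interpolation steps between two adjacent phases


def phase_step_deg():
    """Nominal size of one phase step in degrees."""
    return 360.0 / (N_PHASES * N_STEPS)


def timing_resolution_ps(f_clk_ghz):
    """Nominal timing resolution for a given clock frequency in GHz."""
    return 1000.0 / f_clk_ghz / (N_PHASES * N_STEPS)


def ideal_phase_deg(octant, n):
    """Phase of an ideal rotator with equal steps."""
    return octant * 45.0 + n * phase_step_deg()


def interpolated_phase_deg(octant, n):
    """Phase obtained by adding two adjacent unit phasors with linear weights."""
    a = cmath.exp(1j * math.radians(octant * 45.0))
    b = cmath.exp(1j * math.radians((octant + 1) * 45.0))
    w = n / N_STEPS
    return math.degrees(cmath.phase((1.0 - w) * a + w * b))


def max_interpolation_error_deg():
    """Largest deviation from the ideal phase within the first octant."""
    return max(
        abs(interpolated_phase_deg(0, n) - ideal_phase_deg(0, n))
        for n in range(N_STEPS + 1)
    )


if __name__ == "__main__":
    print(f"phase step: {phase_step_deg()} degrees")
    print(f"timing resolution at 10 GHz: {timing_resolution_ps(10.0)} ps")
    print(f"max interpolation error: {max_interpolation_error_deg():.2f} degrees")
```

Under these assumptions the deterministic deviation from equal steps stays below half a degree, an order of magnitude smaller than the 5.625° step. Adding two phasors is therefore a reasonable substitute for a true phase rotation.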
The first stage, consisting of two 4:1 multiplexers [Fig. 7(a)], selects two clock phases from two adjacent phase octants. The interpolation process with eight clock phases results in a better phase linearity compared to interpolation schemes using six or I/Q phases [12]. The phase interpolator [Fig. 7(b)] is a dual-input differential amplifier and blends the two selected phases according to the 8-bit thermometer-coded value. Retiming flip-flops between the up/down counter and the phase rotator guarantee that all control signals change their states at the same time, thus avoiding phase glitches. The common-mode outputs of the selector and the interpolator are regulated by a replica bias, as for all CML circuits of this CDR. An important practical requirement is that the amplitude and common-mode voltage of the sampling clock always have their correct levels, even after start-up, to assure the presence of the CDR system clock. This implies that the control signals are initialized correctly.

Fig. 6. Block diagram of one phase rotator.

Fig. 7. (a) Schematic of the 4:1 phase selector. (b) Schematic of the phase interpolator (type-I).

The eight interpolation steps together with the eight input clock phases result in a total of 64 phase steps. Hence, one phase step amounts to $360^\circ/64 = 5.625^\circ$ (4), and the ideal phasor of the output signal of a phase rotator having equal phase steps is given by (5), where the parameter A denotes the amplitude. The integer n ranges over the interval […], where N is the number of interpolation steps. Since the phase interpolation is achieved by adding two signals with different phases and not by a real rotation of the phase of a single signal, the interpolated output signal is not equal to the ideal output signal and is given by (6). The interpolated phase steps in (6) and the equal phase steps in (5) are calculated and displayed for one octant in Fig. 8(a). The maximum deterministic interpolation phase error is 0.5° and is smaller than one phase step by a factor of ten.

Fig. 8. (a) Equal versus interpolated phase steps of one octant of the 360° circle and phase error ε. (b) Simulated phase step (DNL) and (c) absolute phase error (INL) for type-I and type-II phase interpolators with a 10 GHz clock.

Fig. 9. Simulated transfer characteristic of the four phase rotators.

Furthermore, the phase steps vary because inputs and output are not fully isolated due to capacitive feedback. The simulated phase steps (DNL) and absolute phase error (INL) for a clock frequency of 10 GHz are shown for our implementation, a type-I phase interpolator, and for a type-II phase interpolator in Fig. 8(b) and (c), respectively. A type-I phase interpolator has a common-source stage as shown in Fig. 7(b). A type-II phase interpolator incorporates a cascode stage [10, Fig. 10]. The maximum phase step of a type-I phase interpolator occurs due to the parasitic effect of capacitive coupling between gate and drain when the interpolation boundary is reached and the output clock of the 4:1 multiplexer is switched to the next phase. Although the alternative design (type-II) has better isolation, it was not used since its unity-gain frequency is too low under worst-case process conditions. Furthermore, it has been reported that a type-II phase interpolator has a more nonlinear transfer characteristic at lower clock frequencies [10,
Fig. 11]. The simulated transfer function of the four phase rotators with a clock frequency of 10 GHz and an update rate of 1.25 GHz is depicted in Fig. 9 and reveals no deterministic phase offset between the four clock phases. At lower clock frequencies the transfer characteristic becomes slightly more nonlinear due to sharper clock edges and wider spacing between the interpolating edges. A total of 64 phase steps for one 100 ps reference clock period, or 16 steps for one data unit interval (UI) of 25 ps, are provided, resulting in a nominal timing resolution of 1.56 ps. When the phase rotator is updated with the maximum update rate evaluated in (3), the maximal possible frequency offset between TX and RX clocks that can be tracked correctly is given by (7), which can also be expressed in parts per million (ppm). Besides the frequency offset that can be tracked, the jitter tolerance is the second key parameter for CDRs employed in chip-to-chip communication. For a first-order phase tracking CDR, the maximum jitter amplitude that the CDR can tolerate is limited by (8) and is inversely proportional to the jitter frequency. Jitter tolerance can be increased by a higher update rate or a larger phase step, where the latter increases the dither amplitude of the loop.

E. Delay-Locked Loop

A DLL operating between 5.75 and 11.5 GHz generates the eight clock phases for the four phase rotators. Compared to a PLL implementation, a DLL solution is preferred due to its better immunity to on-chip noise, because a voltage-controlled delay line (VCDL) does not suffer from cycle-to-cycle accumulated jitter as a voltage-controlled oscillator does. Any accumulated jitter created by supply or substrate noise is corrected when a clean reference clock edge arrives at the input of the VCDL. Differential CML delay elements have been used in the VCDL to achieve short gate delays with low supply and substrate noise sensitivity. Four delay elements are sufficient to
generate the eight clock phases. If we assume a constant phase error for each delay element, the phase errors sum up after each delay element. The maximum phase error occurs between the phases […] and […], and amounts to […]. In order to sample the data bits correctly, the phase difference between two sampling clock phases has to be in the range […]. Fig. 10 illustrates how the phase tracking loop of the CDR aligns the edges of the sampling clocks to the incoming data stream for the two extreme cases. For a positive and a negative phase error, the absolute phase error has to be lower than one fourth of the clock phase margin (CPM) of a sampler. At 11 GHz the maximal tolerable phase error per delay element has to be lower than 4.2°.

In a DLL, a single loop integrator suffices to drive the steady-state phase error to zero. A typical DLL consists of a VCDL, phase detector, charge pump, and loop filter [10], [28]. In reality, charge pump DLLs have a static phase offset between the two clock signals at the phase detector input, mainly due to the mismatch between the charge pump's up and down currents. Thus, the steady-state phase error is not zero and depends on the non-idealities of the charge pump. The block diagram of the implemented DLL is shown in Fig. 11(a). Its structure is similar to the typical DLL, but the charge pump is replaced by a differential operational transconductance amplifier (OTA). The OTA is a voltage-to-current converter that pumps current in and out of its load capacitance and is therefore equivalent to a charge pump that steers a current in and out of the loop filter capacitance. Fig. 12(a) illustrates a possible linear time-invariant phase domain model of the DLL [29], where […] is the gain of the phase detector and the OTA [30]. Since the OTA has a finite output resistance, the steady-state error of the DLL is non-zero and is inversely proportional to the loop gain. In order to determine the steady-state phase error, the phase domain model shown in Fig.
12(a) cannot be applied since it neglects the output resistance of the OTA. The phase domain model illustrated in Fig. 12(b) includes all gain stages and all poles, i.e., the dominant pole and the higher order poles. The dominant pole is formed by the output impedance of the OTA and the input capacitance of the control element (CE). Hence, the OTA limits the bandwidth of the loop and determines the loop gain. Dummy delay elements are added in front of the delay line for signal conditioning and behind it to provide the same load capacitance for all delay elements in the delay line. The tunable delay elements are implemented using a phase interpolation technique [31]–[33] as shown in Fig. 11(b) and (c). The gate delay of the delay element can be tuned between the delay of the interpolator alone and the sum of the interpolator delay and the delay of the buffer inserted in the non-direct signal path. This results in a maximal tuning factor given by (9). The tuning factor has been reduced to 1.9 to prevent erroneous phase locking over all process corners while using a 90° XOR phase detector [34]. This tuning factor limits the range for exact equidistant clock phases to frequencies from 6 to 11.5 GHz. At 5.75 GHz a systematic phase error of 2.88° per delay element is introduced, but it does not compromise the operation of the CDR because at the lower data rates the CPM for sampling is larger.

Fig. 10. Maximum allowed phase error for the quarter-rate 2×-oversampling CDR.

The 90° XOR phase detector based on the Gilbert cell multiplier, shown in Fig. 13(a), is sufficient to perform a direct phase detection at 11 GHz. The circuit is simpler than edge-triggered phase detectors and consequently consumes less power as well.
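The relation between the tuning factor of the delay elements and the frequency range with exactly equidistant phases can be checked with a few lines. The sketch assumes that each of the four differential delay elements has to provide one eighth of the clock period and that the shortest adjustable delay corresponds to the upper frequency limit of 11.5 GHz. The function names are illustrative.

```python
def delay_per_element_ps(f_clk_ghz):
    """Delay one element must provide (1/8 of the clock period), in picoseconds."""
    return 1000.0 / (8.0 * f_clk_ghz)


def required_tuning_factor(f_min_ghz, f_max_ghz):
    """Ratio of longest to shortest element delay needed to cover a frequency range."""
    return delay_per_element_ps(f_min_ghz) / delay_per_element_ps(f_max_ghz)


def lowest_exact_frequency_ghz(f_max_ghz, tuning_factor):
    """Lowest frequency at which the elements can still reach 1/8 period each."""
    return f_max_ghz / tuning_factor


if __name__ == "__main__":
    print(f"tuning factor needed for 5.75 to 11.5 GHz: "
          f"{required_tuning_factor(5.75, 11.5):.2f}")
    print(f"lowest exact frequency with factor 1.9: "
          f"{lowest_exact_frequency_ghz(11.5, 1.9):.2f} GHz")
```

Covering the full 5.75 to 11.5 GHz range exactly would need a tuning factor of 2. With the reduced factor of 1.9 the exact range ends near 6 GHz, consistent with the small systematic phase error accepted at 5.75 GHz.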
The differential output signal PX is the result of the multiplication of the two differential input phase signals PH1 and PH2. The DC component of the signal PX is then proportional to the phase difference between the input signals PH1 and PH2. Different propagation times from the input ports PH1 and PH2 to the output node of the Gilbert cell multiplier cause a systematic offset in the transfer function of the phase detector as shown in Fig. 13(b) and (c). This offset generates an intolerable phase error of 9° (> 4.2°) per delay element. To compensate this error, we implemented a symmetrical phase detector PD formed by two Gilbert cell multipliers PD1 and PD2 as shown in Fig. 11(a). The input signals of PD2 are swapped with respect to PD1, generating a negative offset and thus compensating the offset of PD1 as shown in Fig. 13(c). This symmetrical phase detector is connected to the VCDL in such a way that the phase detector PD1 compares the phases P0 and P2, while phase detector PD2 compares the phases P1 and P3. This connection scheme leads to equally loaded delay elements and removes the systematic phase errors. The high-frequency components of the output signals PX1 and PX2 of the Gilbert cell multipliers are low-pass filtered before they are summed to generate the control signal PX. The filtering reduces the high-frequency amplitude of PX and prevents potential saturation of the input stage of the differential OTA. The OTA provides sufficient gain (37 dB) to the control loop to keep the steady-state phase error of one delay element below 0.4° (determined with the final value theorem [34]). Subsequently, the control element (CE) converts the differential output voltage of the differential OTA into the differential control voltage of the VCDL. Compared to a charge pump solution, a linear and differential high-gain amplifier in the control loop has been preferred in order to minimize the generated switching noise. Moreover, the whole control loop of the DLL has been implemented differentially to reduce the influence of common mode, substrate, and power supply noise in the DLL and to generate clean clock phases with low jitter.
Fig. 11. (a) DLL block diagram. (b) Simplified schematics of one delay element. (c) Schematics of the interpolator.
Fig. 12. Phase domain model: (a) with integrator (OTA) and loop filter C, (b) including finite output resistance of OTA and higher order poles.
Measurements performed on a separate test chip confirm the low noise and high PSRR of the implemented DLL. At 10 GHz, a peak-to-peak jitter below 2 ps has been measured on the single-ended output p2a as shown in Fig. 14(a). This value is comparable to the jitter of the input reference clock. A supply noise of 100 mV modulated at 5 MHz causes a peak-to-peak jitter of 5.6 ps (0.22 UI at 40 Gb/s) on the single-ended output p2a, as reported in Fig. 14(b). When the differential signal p2a–p2b is considered, however, the peak-to-peak jitter remains below 2 ps (0.08 UI at 40 Gb/s) as shown in Fig. 14(c). The phase error measured across the delay element DE1 is below 2°. Thus, the accumulated phase error amounts to 6°. Mismatches between the devices (polysilicon resistors and NMOS transistors) in the differential stages are the cause of DC offset at the output of the delay elements, as illustrated in Fig. 15(a). This DC offset propagates through the cascaded differential stages in the DLL, where it gets amplified, and causes a phase error at the output. A DC offset of up to 50 mV can occur after the last delay element of the VCDL, producing an intolerable phase error of 25° (0.28 UI). To solve this problem, clock buffers are introduced after the DLL to reduce the accumulated DC offset, thereby restoring the required phase precision.
F. Clock Buffer
Clock buffers are placed between the DLL and the phase rotator and between the phase rotator and the samplers in order to drive the relatively large capacitive loads. Shunt-peaking inductors of 1 nH are used to compensate the large load capacitance at the output, thus also reducing the power consumption of the clock buffer by 30%. With the nominal load capacitance of 100 fF, a gain of 4.5 dB at 10 GHz is achieved. The power consumption of the buffer is 5 mW.
Fig. 13. (a) Schematic of the Gilbert cell multiplier. (b) Simplified block diagram of the Gilbert cell multiplier. (c) Ideal transfer characteristic of the Gilbert cell multiplier.
Fig. 14. Measured peak-to-peak jitter of the DLL. (a) Single-ended signal p2a without power supply noise: 1.8 ps. (b) Single-ended signal p2a with 100 mV modulated power supply at 5 MHz: 5.6 ps. (c) Differential signal p2a–p2b with power supply noise: 2 ps.
The clock buffers were designed to reduce DC offsets generated in the DLL. These DC offsets cause duty cycle distortion on differential signals, compromising the phase precision and reducing the clock phase margin (CPM) of the system as shown in Fig. 15(b). Two samplers (sampler0 and sampler4) are clocked with the two complementary phases. The DC offset between oa and ob causes a phase error between these two phases, which reduces the CPM of the system. To reduce the phase error caused by DC offsets, a clock buffer with regulated output DC levels is implemented. Fig. 15(c) shows the schematic of the implemented clock buffer.
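To make the link between DC offset and phase error more concrete, the following Python sketch models the clock as an ideal sine wave with a superimposed DC offset and computes how far the zero crossing moves. This is a simplified illustration only: the sinusoidal waveform and the amplitude `AMPLITUDE_V` are assumptions that are not taken from the measurements, and the helper names are hypothetical. The conversion to unit intervals uses the fact that in a quarter-rate architecture one clock period spans four UI, so that 90° of clock phase corresponds to 1 UI.

```python
import math


def zero_crossing_shift_deg(offset_v, amplitude_v):
    """Shift of the zero crossing of A*sin(phase) + offset, in degrees.

    Assumes an ideal sinusoidal clock; real CML clock edges differ.
    """
    if abs(offset_v) >= amplitude_v:
        raise ValueError("offset must be smaller than the clock amplitude")
    return math.degrees(math.asin(offset_v / amplitude_v))


def clock_deg_to_ui(phase_deg, ui_per_clock_period=4):
    """Convert clock phase to unit intervals (quarter-rate: 4 UI per period)."""
    return phase_deg * ui_per_clock_period / 360.0


AMPLITUDE_V = 0.4  # hypothetical clock amplitude, not a value from the paper

if __name__ == "__main__":
    for offset_mv in (10, 25, 50):
        shift = zero_crossing_shift_deg(offset_mv / 1000.0, AMPLITUDE_V)
        print(f"{offset_mv:3d} mV offset -> {shift:5.2f} deg "
              f"= {clock_deg_to_ui(shift):.3f} UI")
```

With this conversion, a phase error of 25° corresponds to roughly 0.28 UI, in line with the figure quoted for the DLL output. The absolute phase shifts printed by the sketch depend entirely on the assumed amplitude and waveform.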
Capacitive degeneration is used to reduce the gain at low frequencies without sacrificing the gain at high frequencies. The DC levels of the outputs oa and ob are regulated to the same DC level, set in the bias circuit, reducing the influence of the input DC offset. For input DC offsets up to 200 mV the phase accuracy of the output is improved by a factor of 25 with respect to the input signal, thus reducing the maximal phase error caused by input DC offset to 0.02 UI. The offset introduced by mismatches between the devices in the clock buffer is 15 mV, corresponding to a maximum phase error of 2.15°. This phase error is much smaller than the error of the DLL.
IV. MEASUREMENTS
Our CDR circuit has been fabricated in a 90 nm bulk CMOS technology and occupies 570 × 350 µm² ≈ 0.2 mm². The layout and the die micrograph of the CDR circuit are shown in Fig. 16. All inputs and outputs are ESD protected except the differential 40 Gb/s data inputs. An ESD protection circuit similar to [35] cannot be placed at the 40 Gb/s input port since any additional capacitance at the input lowers the input pole frequency. The CDR is able to lock to a PRBS 2⁷ − 1 data stream up to 44 Gb/s if the input signal is applied to the chip using on-wafer probes. The 40 Gb/s input eye diagram with a 10 GHz sinusoidal clock signal is illustrated in Fig. 17(a). The recovered 10 Gb/s data measured on-wafer without ESD protection and with the packaged module including ESD protection are illustrated in Fig. 17(b) and (c), respectively. Since the recovered 10 Gb/s data signal is the buffered output signal of the front-end MS-FF (Figs. 1 and 2), the eye diagrams [Fig. 17(b), (c)] for full-, half-, and quarter-rate modes all look alike. The operating ranges for full-, half-, and quarter-rate modes cover the data ranges from 5.75 to 11.5 Gb/s, 11.5 to 23 Gb/s, and 23 to 44 Gb/s, respectively.
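The mapping from data rate to operating mode can be sketched in a few lines of Python. The ranges are the ones quoted above; everything else is an illustrative assumption: the function names are hypothetical, the tie-breaking rule at the range boundaries is arbitrary, and the clock divisors follow the usual meaning of full-, half-, and quarter-rate clocking rather than a description of the actual rate selection logic.

```python
# Operating ranges in Gb/s as quoted in the text.
MODE_RANGES_GBPS = (
    ("full-rate", 5.75, 11.5),
    ("half-rate", 11.5, 23.0),
    ("quarter-rate", 23.0, 44.0),
)

# Assumed ratio of data rate to sampling clock frequency for each mode.
CLOCK_DIVISOR = {"full-rate": 1, "half-rate": 2, "quarter-rate": 4}


def select_mode(data_rate_gbps):
    """Return the operating mode whose range contains the data rate.

    At a shared boundary (11.5 or 23 Gb/s) the first match wins; this
    tie-break is an arbitrary choice made for the sketch.
    """
    for name, low, high in MODE_RANGES_GBPS:
        if low <= data_rate_gbps <= high:
            return name
    raise ValueError(f"{data_rate_gbps} Gb/s is outside 5.75 to 44 Gb/s")


def sampling_clock_ghz(data_rate_gbps):
    """Clock frequency implied by the selected mode (assumed divisors)."""
    return data_rate_gbps / CLOCK_DIVISOR[select_mode(data_rate_gbps)]


if __name__ == "__main__":
    for rate in (5.75, 10.0, 20.0, 40.0, 44.0):
        print(f"{rate:5.2f} Gb/s -> {select_mode(rate):12s} "
              f"clock {sampling_clock_ghz(rate):5.2f} GHz")
```

Under these assumptions the clock stays between 5.75 and 11.5 GHz across the whole data rate range, which matches the 10 GHz clock used with the 40 Gb/s input.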
For all data rates, the circuit consumes 230 mA from a 1 V power supply voltage (analog part: 215 mA, digital section: 15 mA). This results in an overhead of power consumption of a factor of two and four for half-rate and quarter-rate mode, respectively, assuming that the power consumption ideally scales with the data rate. As a future feature, this overhead could be reduced to 30% and 50%, respectively, by turning off the unused circuit blocks, e.g., part of the samplers, clock buffers, and demultiplexers.
Fig. 15. (a) Generation of DC offsets in CML stages. (b) Clock buffer with regulated DC levels at the output. (c) Phase error generated by DC offset in differential signals and reduction of the system CPM.
Fig. 16. Chip photo and layout of the CDR.
In all operating modes, the maximum frequency offset that can be tracked is 615 ppm for a BER < 10⁻¹² up to 38 Gb/s. The limit was set by the measurement setup because the input pattern from the pattern generator was not error free above 38 Gb/s. The value of 615 ppm is sufficient to compensate for the difference between the clock frequencies of two chips clocked from different crystal oscillators. Besides the frequency offset that can be tracked, the jitter tolerance is the second key parameter for CDRs employed in chip-to-chip communication. Since our jitter tolerance measurement setup was limited to 27 Gb/s, the jitter tolerance measurements have been performed only with the IC mounted on a substrate (Fig. 18) at its maximum data rate of 24 Gb/s. This maximum is limited by losses and mismatches of the 1.6 cm input line on the substrate. To illustrate these effects, measured eye diagrams of the 24 Gb/s data stream at the input and output of this line are depicted in Fig. 18.
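The supply and frequency-offset figures quoted in this section lend themselves to a short back-of-the-envelope calculation. The Python sketch below is plain arithmetic on the numbers given in the text (230 mA at 1 V, 615 ppm, 38 Gb/s); the function names are hypothetical, and the sketch says nothing about how the phase-tracking loop actually follows the offset.

```python
def power_per_gbps_mw(supply_v, current_ma, data_rate_gbps):
    """Power consumption normalized to the data rate, in mW/(Gb/s)."""
    return supply_v * current_ma / data_rate_gbps


def bits_per_ui_slip(offset_ppm):
    """Bit periods after which an untracked offset accumulates to one UI."""
    return 1.0 / (offset_ppm * 1e-6)


def phase_drift_ui_per_second(data_rate_gbps, offset_ppm):
    """Rate at which the data phase drifts against the local clock, in UI/s."""
    return data_rate_gbps * 1e9 * offset_ppm * 1e-6


if __name__ == "__main__":
    for rate in (11.5, 23.0, 40.0, 44.0):
        print(f"{rate:5.1f} Gb/s -> "
              f"{power_per_gbps_mw(1.0, 230.0, rate):6.2f} mW/(Gb/s)")
    print(f"615 ppm: one UI of drift every {bits_per_ui_slip(615):.0f} bits")
    print(f"at 38 Gb/s: {phase_drift_ui_per_second(38.0, 615):.3e} UI/s")
```

Because the supply current is the same at all data rates, the normalized power rises as the data rate falls, which is the overhead discussed above. The drift figures indicate how quickly the phase rotators have to move to follow a 615 ppm offset.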
Two different measurement setups are needed for the jitter tolerance test. Setup I has been used for jitter frequencies between 10 kHz and 1 MHz, where the incoming data signal is directly modulated. At jitter frequencies above 1 MHz, it was no longer possible to modulate the data, and setup II had to be used. In setup II the system clock of the CDR has been modulated relative to the incoming data. The measured jitter tolerance plot at 24 Gb/s of the packaged CDR and the extended jitter tolerance mask for XAUI [36] are illustrated in Fig. 19. For all jitter frequencies and all jitter amplitudes, the XAUI mask can be fulfilled by our circuit. The dip around the jitter frequency of 20 MHz, where the maximum jitter amplitude that the CDR can tolerate is lower than the clock phase margin of the sampler, is due to the loop delay, which is mainly caused by pipelining stages in the digital part.
Fig. 17. (a) 40 Gb/s input data, 10 GHz sinusoidal clock signal. (b) Recovered 10 Gb/s data measured on-wafer without ESD protection. (c) Recovered 10 Gb/s data measured with the packaged module including ESD protection.
Fig. 18. Eye diagram of a 24 Gb/s data stream at the input of the package (left eye diagram) and at the pad of the circuit (right eye diagram).
Fig. 19. Jitter tolerance of the packaged CDR at 24 Gb/s achieving a BER < 10⁻¹².
TABLE III. 40 Gb/s CMOS CDRs
Finally, Table III shows a comparison with previously published 40 Gb/s CMOS CDRs with analog [8], [15] or digital loop filters [9], [13], [14]. Fully analog CDRs are area consuming; they dissipate less power but have a larger BER compared to [9] and [14]. Among the three CDRs with a digital loop filter, our CDR covers the largest range of data rates. Furthermore, it consumes less power (by 30%) and has a smaller chip area than the 3×-oversampling CDR with an integrated 1:16 DEMUX [9]. Only the circuit in [13] reaches superior performance with respect to power and area, but it uses a more advanced and expensive SOI CMOS technology that also allows the speed-critical circuit blocks to be implemented in CMOS logic instead of the more power- and area-consuming CML.
V. SUMMARY
A semi-digital clock and data recovery circuit implemented in 90 nm bulk CMOS for 40 Gb/s chip-to-chip communication is presented. Thanks to the novel rate selection feature in the fully digital loop filter, a very large data rate range from 5.75 to 44 Gb/s can be covered. From 5.75 to 38 Gb/s a BER < 10⁻¹² is achieved even for a frequency offset of 615 ppm and data jitter amplitudes above the XAUI mask. Measurement results of the DLL circuit showed that differential signaling in the clock path keeps jitter generation caused by power supply noise low. By inductive shunt-peaking in the speed-critical blocks, like the samplers and the clock buffers, the required high bandwidth is reached at a low power consumption. The power consumption per data rate of 5.3 mW/(Gb/s) of the proposed CDR is below the ITRS power budget requirement (Table I) for high-speed transceivers implemented in 90 nm CMOS technology.
ACKNOWLEDGMENT
The authors thank R. Brun and D. Holzer for the design of the digital logic; T. Toifl, H. Schmid, D. Müller, S. Schmid, P. Looser, D. Barras, C. Kromer, C. Menolfi, T. Morf, M. Kossel, and J. Weiss for fruitful discussions; and M. Lanz and M. Witzig for bonding.
REFERENCES
[1] The International Technology Roadmap for Semiconductors (2004 Update): Test and Test Equipment. ITRS, 2004 [Online]. Available: http://www.itrs.net/Links/2004Update/2004_02_Test.pdf
[2] K. L. J. Wong, H. Hatamkhani, M. Mansuri, and C. K. K. Yang, “A 27-mW 3.6-Gb/s I/O transceiver,” IEEE J. Solid-State Circuits, vol. 39, no. 4, pp. 602–612, Apr. 2004.
[3] J. Poulton, R. Palmer, A. M. Fuller, T. Greer, J. Eyles, W. J. Dally, and M. Horowitz, “A 14-mW 6.25-Gb/s transceiver in 90-nm CMOS,” IEEE J. Solid-State Circuits, vol. 42, no. 12, pp. 2745–2757, Dec. 2007.
[4] J. F. Bulzacchelli, M. Meghelli, S. V. Rylov, W. Rhee, A. V. Rylyakov, H. A. Ainspan, B. D. Parker, M. P. Beakes, A. Chung, T. J. Beukema, P. K. Pepeljugoski, L. Shan, Y. H. Kwark, S. Gowda, and D. J. Friedman, “A 10-Gb/s 5-tap DFE/4-tap FFE transceiver in 90-nm CMOS technology,” IEEE J. Solid-State Circuits, vol. 41, no. 12, pp. 2885–2900, Dec. 2006.
[5] K. Fukuda, H. Yamashita, F. Yuki, M. Yagyu, R. Nemoto, T. Takemoto, T. Saito, N. Chujo, K. Yamamoto, H. Kanai, and A. Hayashi, “An 8 Gb/s transceiver with 3x-oversampling 2-threshold eye-tracking CDR circuit for 36.8 dB-loss backplane,” in IEEE ISSCC Dig. Tech. Papers, 2008, pp. 98–598.
[6] C. Kromer, G. Sialm, C. Berger, T. Morf, M. L. Schmatz, F. Ellinger, D. Erni, G.-L. Bona, and H. Jäckel, “A 100 mW 4 × 10 Gb/s transceiver in 80-nm CMOS for high-density optical interconnects,” IEEE J. Solid-State Circuits, vol. 40, no. 12, pp. 2667–2679, Dec. 2005.
[7] S. Palermo, A. Emami-Neyestanak, and M. Horowitz, “A 90 nm CMOS 16 Gb/s transceiver for optical interconnects,” in IEEE ISSCC Dig. Tech. Papers, 2007, pp. 44, 586.
[8] J. Lee and B. Razavi, “A 40-Gb/s clock and data recovery circuit in 0.18-µm CMOS technology,” IEEE J. Solid-State Circuits, vol. 38, no. 12, pp. 2181–2190, Dec. 2003.
[9] N. Nedovic, N. Tzartzanis, H. Tamura, F. M. Rotella, M. Wiklund, Y. Mizutani, Y. Okaniwa, T. Kuroda, J. Ogawa, and W. W. Walker, “A 40–44 Gb/s 3x oversampling CMOS CDR/1:16 DEMUX,” IEEE J. Solid-State Circuits, vol. 42, no. 12, pp. 2726–2735, Dec. 2007.
[10] S. Sidiropoulos and M. A. Horowitz, “A semidigital dual delay-locked loop,” IEEE J. Solid-State Circuits, vol. 32, no. 11, pp. 1683–1692, Nov. 1997.
[11] R. Kreienkamp, U. Langmann, C. Zimmermann, T. Aoyama, and H. Siedhoff, “A 10-Gb/s CMOS clock and data recovery circuit with an analog phase interpolator,” IEEE J. Solid-State Circuits, vol. 40, no. 3, pp. 736–743, Mar. 2005.
[12] C. Kromer, G. Sialm, C. Menolfi, M. Schmatz, F. Ellinger, and H. Jackel, “A 25-Gb/s CDR in 90-nm CMOS for high-density interconnects,” IEEE J. Solid-State Circuits, vol. 41, no. 12, pp. 2921–2929, Dec. 2006.
[13] T. Toifl, C. Menolfi, P. Buchmann, C. Hagleitner, M. Kossel, T. Morf, J. Weiss, and M. Schmatz, “A 72 mW 0.03 mm² inductorless 40 Gb/s CDR in 65 nm SOI CMOS,” in IEEE ISSCC Dig. Tech. Papers, 2007, pp. 226–598.
[14] G. v. Büren, L. Rodoni, A. Huber, R. Brun, D. Holzer, M. Schmatz, and H. Jäckel, “5.75 to 44 Gb/s quarter rate CDR with data rate selection in 90 nm bulk CMOS,” in Proc. ESSCIRC, 2008, pp. 166–169.
[15] C. F. Liao and S. I. Liu, “40 Gb/s transimpedance-AGC amplifier and CDR circuit for broadband data receivers in 90 nm CMOS,” IEEE J. Solid-State Circuits, vol. 43, no. 3, pp. 642–655, Mar. 2008.
[16] J. D. H. Alexander, “Clock recovery from random binary data,” Electron. Lett., vol. 11, pp. 541–542, Oct. 1975.
[17] M. Horowitz, C.-K. K. Yang, and S. Sidiropoulos, “High-speed electrical signaling: Overview and limitations,” IEEE Micro, vol. 18, pp. 12–24, 1998.
[18] C.-K. K. Yang and M. A. Horowitz, “A 0.8-µm CMOS 2.5 Gb/s oversampling receiver and transmitter for serial links,” IEEE J. Solid-State Circuits, vol. 31, no. 12, pp. 2015–2023, Dec. 1996.
[19] H. O. Johansson and C. Svensson, “Time resolution of NMOS sampling switches used on low-swing signals,” IEEE J. Solid-State Circuits, vol. 33, no. 2, pp. 237–245, Feb. 1998.
[20] Y. Okaniwa, H. Tamura, M. Kibune, D. Yamazaki, C. Tsz-Shing, J. Ogawa, N. Tzartzanis, W. W. Walker, and T. Kuroda, “A 40-Gb/s CMOS clocked comparator with bandwidth modulation technique,” IEEE J. Solid-State Circuits, vol. 40, no. 8, pp. 1680–1687, Aug. 2005.
[21] K. Kanda, D. Yamazaki, T. Yamamoto, M. Horinaka, J. Ogawa, H. Tamura, and H. Onodera, “40 Gb/s 4:1 MUX/1:4 DEMUX in 90 nm standard CMOS,” in IEEE ISSCC Dig. Tech. Papers, 2005, pp. 152–590.
[22] T. Chalvatzis, K. H. K. Yau, P. Schvan, M. T. Yang, and S. P. Voinigescu, “A 40-Gb/s decision circuit in 90 nm CMOS,” in Proc. ESSCIRC, 2006, pp. 512–515.
[23] G. v. Buren, L. Rodoni, C. Kromer, H. Jackel, A. Huber, and T. Morf, “Low power sampling latch for up to 25 Gb/s 2x oversampling CDR in 90 nm CMOS,” in Proc. ESSCIRC, 2006, pp. 106–109.
[24] S. S. Mohan, M. d. M. Hershenson, S. P. Boyd, and T. H. Lee, “Bandwidth extension in CMOS with optimized on-chip inductors,” IEEE J. Solid-State Circuits, vol. 35, no. 3, pp. 346–355, Mar. 2000.
[25] F. Ellinger, M. Kossel, M. Huber, M. Schmatz, C. Kromer, G. Sialm, D. Barras, L. Rodoni, G. v. Buren, and H. Jackel, “High-Q inductors on digital VLSI CMOS substrate for analog RF applications,” in Proc. IEEE Int. Microwave and Optoelectronics Conf. (IMOC), 2003, vol. 2, pp. 869–872.
[26] C. Kromer, “10 Gb/s to 40 Gb/s receiver for high-density optical interconnects in 80 nm CMOS,” Ph.D. dissertation, Swiss Federal Inst. Technol. (ETH), Zurich, Switzerland, 2005, ETH No. 16347.
[27] T. Toifl, C. Menolfi, M. Ruegg, R. Reutemann, P. Buchmann, M. Kossel, T. Morf, J. Weiss, and M. L. Schmatz, “A 22-Gb/s PAM-4 receiver in 90-nm CMOS SOI technology,” IEEE J. Solid-State Circuits, vol. 41, no. 4, pp. 954–965, Apr. 2006.
[28] J. G. Maneatis, “Low-jitter process-independent DLL and PLL based on self-biased techniques,” IEEE J. Solid-State Circuits, vol. 31, no. 11, pp. 1723–1732, Nov. 1996.
[29] J. R. Burnham, G.-Y. Wei, C.-K. K. Yang, and H. Hindi, “A comprehensive phase-transfer model for delay-locked loops,” in Proc. IEEE Custom Integrated Circuits Conf. (CICC), 2007, pp. 627–630.
[30] T. Toifl, C. Menolfi, P. Buchmann, M. Kossel, T. Morf, R. Reutemann, M. Ruegg, M. L. Schmatz, and J. Weiss, “A 0.94-ps-RMS-jitter 0.016-mm² 2.5-GHz multiphase generator PLL with 360° digitally programmable phase shift for 10-Gb/s serial links,” IEEE J. Solid-State Circuits, vol. 40, no. 12, pp. 2700–2712, Dec. 2005.
[31] B. Lai and R. C. Walker, “A monolithic 622 Mb/s clock extraction data retiming circuit,” in IEEE ISSCC Dig. Tech. Papers, 1991, pp. 144–306.
[32] M. Soyuer, J. F. Ewen, and H. L. Chuang, “A fully monolithic 1.25 GHz CMOS frequency synthesizer,” in Symp. VLSI Circuits Dig., 1994, pp. 127–128.
[33] J. Savoj and B. Razavi, “A 10-Gb/s CMOS clock and data recovery circuit with a half-rate linear phase detector,” IEEE J. Solid-State Circuits, vol. 36, no. 5, pp. 761–768, May 2001.
[34] R. E. Best, Phase-Locked Loops: Design, Simulation and Applications, 4th ed. New York: McGraw-Hill, 2003.
[35] M. Kossel, C. Menolfi, J. Weiss, P. Buchmann, G. von Bueren, L. Rodoni, T. Morf, T. Toifl, and M. Schmatz, “A T-coil-enhanced 8.5 Gb/s high-swing SST transmitter in 65 nm bulk CMOS with < −16 dB return loss over 10 GHz bandwidth,” IEEE J. Solid-State Circuits, vol. 43, no. 12, pp. 2905–2920, Dec. 2008.
[36] IEEE Standard for Information Technology: Media Access Control (MAC) Parameters, Physical Layers, and Management Parameters for 10 Gb/s Operation, IEEE Std. 802.3ae-2002, 2002, pp. 0_1-516.
Lucio Carlo Rodoni (S’03) was born in Biasca, Switzerland, in 1971. He received the Dipl. Ing. (M.S.) degree in electrical engineering from the Swiss Federal Institute of Technology (ETH) Zürich, Switzerland, in 1998. From 1998 to 2000, he was with Mandozzi Electronics Inc., where he was involved in the development of digital audio mixers and 2 Mb/s transmission systems for audio and data.
From 2000 to 2002, he was a Research Engineer with TChip Inc. developing global positioning system (GPS) RF front-end chips. Since 2002, he has been a member of the RF Integrated Circuit (RFIC) Group, Electronics Laboratory, ETH Zürich, Switzerland. Between October 2006 and March 2007, he was with IBM Zurich Research Laboratory, involved in a series-source terminated transmitter project. His main interests are integrated circuits for high-speed interconnect applications.
George von Büren (S’03) was born in Zürich, Switzerland, in 1974. He received the Dipl. Ing. (M.S.) degree in electrical engineering from the Swiss Federal Institute of Technology (ETH) Zurich, Switzerland, in 1999. From 1999 to 2002, he was with u-blox Inc., where he was involved in the development of embedded computers and GPS receivers. In 2002, he joined the Electronics Laboratory, ETH Zurich, as a Research Assistant to pursue his Ph.D. degree in collaboration with the IBM Zurich Research Laboratory in Rüschlikon. From October 2006 to March 2007, he was with IBM Zurich Research Laboratory working on a series-source terminated transmitter in 65 nm CMOS. His research interests are in the field of analog and mixed-signal design, with current focus on PLLs and clock and data recovery circuits for serial I/O-links.
Alex Huber (S’93–M’00) was born in Zürich, Switzerland, in 1967. He received the Dipl. Ing. degree and the Ph.D. degree in electrical engineering from the Swiss Federal Institute of Technology (ETH), Zürich, in 1993 and 2000, respectively. From 1993 to 2000, he was with the Electronics Laboratory, ETH Zürich, as a Research Assistant, where he worked on RF circuit design and modeling of InP/InGaAs HBT devices.
Since October 1999, he has been with the Institute of Microelectronics of the University of Applied Sciences Northwestern Switzerland, Windisch, Switzerland. His main research interests include low-power and high-speed integrated circuits in CMOS technologies for sensor and communication applications.
Martin L. Schmatz (S’94–M’97) was born on May 8, 1967, in St. Gallen, Switzerland. He received the degree in electrical engineering in 1993 and the Ph.D. degree in 1998, both from the Swiss Federal Institute of Technology (ETH), Zürich, for his work on low-power wireless receiver designs and on noise-parameter measurement systems. He received the ETH medal for his diploma work and the ETH-SEU award for outstanding research activities in 1995. In 1999, he joined the IBM Zürich Research Laboratory, where he established and managed a research team working on high-speed and high-density CMOS serial-link systems. In mid-2008, he took over management responsibilities for the Systems Department at the IBM Zürich Research Laboratory, with research focused on a wide range of server system building blocks. He is a member of the IBM Academy of Technology and also manages the IBM-ETH Center for Advanced Silicon Electronics (CASE).
Heinz Jäckel (M’82) received the Ph.D. degree in electrical engineering at the ETH Zurich in 1979. In 1980, he joined the IBM Research Division, where he held scientific and management positions for 13 years in IBM Rüschlikon, Switzerland, and IBM Yorktown Heights, USA. During this time he carried out research in superconducting Josephson junction computers, GaAs-MESFET ICs, and optoelectronics. He has been a full Professor at the Electronics Laboratory of the Swiss Federal Institute of Technology, ETH Zurich, since 1993, heading the High Speed Electronics and Photonics group (http://www.ife.ee.ethz.ch/, http://www.photonics.ee.ethz.ch/).
In electronics, the research activities of his group concentrate on the development of III/V technology, the design and characterization of ultrafast InP-HBT transistors for 100 Gb/s electronics, and multi-10 GHz RF and digital 10–40 Gb/s CMOS IC design. In the area of ultra-dense and Tb/s lightwave communication research, topics are integrated InP-based mode-locked diode lasers, all-optical switches for all-optical signal processing at Tb/s data rates, and planar InP-based photonic crystals. Prof. Jäckel has authored or coauthored over 100 publications and holds around 20 patents.