IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 44, NO. 7, JULY 2009
1927
A 5.75 to 44 Gb/s Quarter Rate CDR With Data Rate
Selection in 90 nm Bulk CMOS
Lucio Rodoni, Student Member, IEEE, George von Büren, Student Member, IEEE, Alex Huber, Member, IEEE,
Martin Schmatz, Member, IEEE, and Heinz Jäckel, Member, IEEE
Abstract—This paper presents a quarter-rate clock and data
recovery (CDR) circuit for plesiochronous serial I/O-links. The
2×-oversampling phase-tracking CDR, implemented in 90 nm
bulk CMOS technology, covers the whole range of data rates from
5.75 to 44 Gb/s realized in a single IC by the novel feature of a data
rate selection logic. Input data are sampled with eight parallel
differential master-slave flip-flops, where bandwidth enhancement
techniques were necessary for 90 nm CMOS. Precise and low-jitter
local clock phases are generated by an analog delay-locked loop.
These clock phases are aligned to the incoming data by four
parallel phase rotators. The phase-tracking loop of the CDR is
realized as a digital delay-locked loop and is therefore immune
against process tolerances. The CDR is able to track a maximum
frequency deviation of 615 ppm between incoming data and a
local reference clock and fulfills the extended XAUI jitter tolerance
mask. A bit error rate below $10^{-12}$ was verified up to 38 Gb/s using
a $2^{7}-1$ PRBS pattern. With a low power consumption per data
rate of only 5.74 mW/(Gb/s) the CDR meets the specifications of
the International Technology Roadmap for Semiconductors for
90 nm CMOS serial I/O-links at the maximal data rate of 44 Gb/s.
The CDR occupies a chip area of 0.2 mm².
Index Terms—Clock and data recovery (CDR), CMOS analog
integrated circuits, current-mode logic (CML), delay-locked loop
(DLL), high-speed serial link, jitter tolerance.
I. INTRODUCTION
THE aggregate data communication bandwidth of key components in telecommunication equipment and computer servers has experienced a continuous increase. This progress has been achieved by increasing the serial data rate and by integrating more power- and area-efficient transceivers on a single CMOS IC. Key trends in CMOS technology, power consumption, and aggregate data rate are summarized in Table I according to the forecast of the International Technology Roadmap for Semiconductors (ITRS) published in 2004 [1]. In Table I, the transceivers are categorized into high-integration-level serial transceivers (e.g., 200 × 8 Gb/s) and high-performance serial transceivers (e.g., 40 × 40 Gb/s). High-integration-level [2], [3], high-performance [4], [5], and electrical/optical [6], [7] chip-to-chip transceivers, representing the state of the art, are summarized in Table II.
TABLE I
SERIAL TRANSCEIVER ROADMAP OF ITRS [1]
Manuscript received November 07, 2008; revised March 06, 2009. Current version published June 24, 2009. This work was supported by the Swiss Federal Office for Professional Education and Technology under Contract/Grant number KTI 7995.1.
L. Rodoni, G. von Büren, and H. Jäckel are with the Swiss Federal Institute of Technology (ETH) Zurich, Electronics Laboratory, 8092 Zurich, Switzerland (e-mail: lucio@rodoni.ch; george.vonbueren@ife.ee.ethz.ch; jaeckel@ife.ee.ethz.ch).
A. Huber is with the Institute of Microelectronics, University of Applied Sciences Northwestern Switzerland, 5210 Windisch, Switzerland (e-mail: alex.huber@fhnw.ch).
M. Schmatz is with the Zurich Research Laboratory, IBM Research, 8803 Rüschlikon, Switzerland (e-mail: mrt@zurich.ibm.com).
Digital Object Identifier 10.1109/JSSC.2009.2021913
One of the critical and speed-limiting circuit blocks in a serial I/O link macro-cell is the clock and data recovery (CDR)
circuit in the receiver. The first 40 Gb/s CMOS CDR was presented in 2003 and was realized in a 0.18 µm process [8]. This
40 Gb/s CDR employs a quarter-rate architecture with a multiphase LC oscillator and a passive loop filter. In 2007, a quarter-rate 3×-oversampling 40–44 Gb/s CDR with 1:16 DEMUX implemented in 90 nm CMOS was presented [9]. This CDR fulfills
the ITU-T G.8251 jitter tolerance mask and its power consumption is less than 1/3 of a comparable commercial SiGe CDR with
1:16 DEMUX.
In multi-channel applications, where every participant has nominally the same local reference frequency, the CDR of each receiver aligns the phase of its plesiochronous sampling clock to the incoming data by using phase interpolation techniques [10]. Since no VCO is needed in each CDR, coupling between channels is reduced. The control of the sampling position is realized by analog [11] or digital [10], [12] phase-tracking loops. Using an analog phase interpolator, a 10.8 Gb/s half-rate CDR implemented in 0.11 µm CMOS fulfills the SDH/SONET jitter tolerance at the BER specified in [11], consuming a power of 220 mW and an area of 0.35 mm² [11]. A half-rate 25 Gb/s CDR implemented in 90 nm CMOS, meeting the BER target reported in [12], incorporates a digital first-order loop filter, consumes 98 mA from a 1.1 V supply, occupies a die area of only 0.064 mm², and is therefore suited for high-density integration [12]. It has been shown with a 65 nm SOI CMOS technology that area and
power consumption per data rate of a quarter-rate 40 Gb/s CDR
can be as low as 0.03 mm² and 1.8 mW/(Gb/s), respectively [13]. Performance figures (e.g., input data range, reference clock frequency range) of these CDRs are listed in Table II. None of them is able to handle the complete input data range from 10 to 40 Gb/s. Either the input data rate range is limited by the VCO of the CDR [8], [9], [13], or the plesiochronous CDR requires too large a reference frequency range [11], [12]. Therefore, we propose and have successfully implemented a data rate selection logic that allows coverage of the whole range of data rates from 5.75 to 44 Gb/s while the reference frequency range is 5.75 to 11 GHz [14]. This feature makes the circuit especially suitable for multi-standard applications, enabling new link rates while supporting compatibility with legacy rates.
TABLE II
PUBLISHED SERIAL TRANSCEIVERS AND CDRS
Fig. 1. Architecture of the phase tracking loop.
Section II gives an overview of the proposed CDR architecture. The building blocks of the CDR, such as the samplers,
8:32 demultiplexer, digital control loop with data rate selection,
phase rotator, delay-locked loop (DLL), and clock buffer, are
described in detail in Section III. Finally, measurement results
are presented in Section IV and a summary is given in Section V.
II. CDR ARCHITECTURE
In high-density serial I/O links, the transmitter (TX) and
receiver (RX) are clocked by two independent reference clocks
having the same nominal frequency. The CDR of the receiver
has to track a slowly drifting phase difference between the
incoming data and the RX clock caused by a bounded frequency difference in the range of 10 to 100 ppm between
the quartz-based plesiochronous TX and RX clocks. Hence, a
phase-tracking loop in the CDR is sufficient for this purpose.
Since the sampler is the speed-limiting circuit block of
the CDR, parallel architectures, e.g., half-rate [12], [15] and
quarter-rate CDRs [8], [13], are employed to demultiplex the
data at the input. A higher demultiplexing factor increases the
number of samplers and the number of clock edges, but has
the following two advantages: 1) the regeneration phase of
the comparator at the input is enlarged and 2) the sampling
clock frequency is lowered, simplifying the on-chip clock
distribution.
The block diagram of the 2×-oversampling quarter-rate CDR is shown in Fig. 1. We chose a dual-loop architecture [10]: 1) the phase tracking loop is realized as a digital DLL and 2) the reference clock phases of the phase rotators are generated by an analog DLL. Eight parallel samplers, clocked by the sampling phases, acquire the four data bits and the four edges needed to evaluate the sampling position [16]. Eight parallel 1:4 demultiplexers reduce the data rate from 10 to 2.5 Gb/s and align the sampled bits, which are separated by 1/8th of the period of the reference clock signal, to a single 2.5 GHz clock phase. At this point, the transition from differential signaling to full-swing CMOS signal levels is performed, since the data are aligned and the clock period is long enough to design the digital blocks by a standard design flow [17]. The aligned 16 data bits and 16 edge bits are compared in the edge detector, which solves the 16 Alexander equations [16] and outputs an early/late signal after majority voting. The digital loop filter then evaluates the sequence of early and late bits and asserts, if needed, an up/down signal for the phase rotator. The up/down counter translates the up/down signal to a thermometer-coded word controlling the four phase rotators. These four phase rotators shift the local reference phases so that the sampling phases are aligned to the incoming data. The local reference clock phases of the phase rotators are generated with a DLL from a single reference clock.
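The early/late decision of the edge detector can be illustrated with a short behavioral sketch in Python. It is not the synthesized logic of the chip: the function names, the convention that edge sample `edges[i]` lies between `data[i]` and `data[i + 1]`, and the handling of ties are assumptions made for illustration only.

```python
def alexander_votes(data, edges):
    """Count early/late votes from interleaved data and edge samples.

    Assumed convention (not taken from the paper): edges[i] is sampled
    between data[i] and data[i + 1].
    """
    early = late = 0
    for i in range(min(len(edges), len(data) - 1)):
        before, edge, after = data[i], edges[i], data[i + 1]
        if before == after:
            continue  # no data transition, so no phase information
        if edge == before:
            early += 1  # edge sample still shows the old bit: clock is early
        else:
            late += 1  # edge sample already shows the new bit: clock is late
    return early, late


def majority_vote(early, late):
    """Collapse the vote counts into one decision; a tie gives 'none'."""
    if early > late:
        return "early"
    if late > early:
        return "late"
    return "none"
```

Calling `majority_vote(*alexander_votes(data, edges))` on one block of aligned samples yields the single early/late decision that the loop filter consumes.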
All circuit blocks operating at a clock frequency above
2.5 GHz are implemented as current-mode logic (CML) to
meet the speed requirements. In addition, CML circuits have a
higher immunity to supply noise and generate less switching
noise on the power supply. The proposed CDR macro-cell requires only a single reference clock phase, reducing the complexity and power consumption of the reference clock distribution network. The four 10 Gb/s data bits are buffered and fed to output pins for testing and measurement purposes.
III. CDR BUILDING BLOCKS
A. Samplers
In this 2×-oversampling quarter-rate CDR, the front-end
sampling latch that is present in each sampler is the most
speed-critical building block. The front-end sampling latch has
to be able to track the incoming 40 Gb/s signal, sample the data
with a 10 GHz clock signal, and then decide if the voltage at
its input is below or above a threshold voltage within a time
period of half of a 10 GHz clock period. The latch following the
sampling latch has relaxed speed constraints because it operates
at a reduced data rate of 10 Gb/s. Together with the sampling
latch it forms a master–slave flip-flop (MS-FF) and provides a
stable output that is valid during a full 10 GHz clock period.
At the input of the eight samplers the data signal should have rise and fall times that are shorter than one half of the bit time. Based on first-order RC-circuit analysis, the total input capacitance of the eight samplers including wiring capacitance and pad has to be kept under the limit
(1)
allowing an input capacitance of 10 fF per sampler excluding wire and pad capacitance of 30 fF and 70 fF, respectively.
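The capacitance budget can be checked with a few lines of Python. The sum of the stated contributions (eight samplers of 10 fF, 30 fF of wiring, 70 fF of pad) follows directly from the text. The RC bound is only a sketch: the 10%-90% rise-time factor of 2.2 and the 25 Ω source resistance (a 50 Ω driver in parallel with a 50 Ω termination) are assumptions, not values taken from (1).

```python
def total_input_capacitance_fF(n_samplers=8, c_sampler_fF=10.0,
                               c_wire_fF=30.0, c_pad_fF=70.0):
    """Sum of the capacitance contributions stated in the text, in fF."""
    return n_samplers * c_sampler_fF + c_wire_fF + c_pad_fF


def max_capacitance_fF(bit_rate_gbps, r_ohm, rise_factor=2.2):
    """Largest C for which the RC rise time stays below half a bit time.

    Assumes t_rise = rise_factor * R * C (first-order RC, 10%-90%).
    """
    t_bit_ps = 1e3 / bit_rate_gbps          # bit time in ps
    t_rise_max_ps = t_bit_ps / 2.0          # rise time limit in ps
    c_max_pF = t_rise_max_ps / (rise_factor * r_ohm)  # ps / ohm = pF
    return c_max_pF * 1e3                   # convert pF to fF
```

Under these assumptions, `total_input_capacitance_fF()` gives 180 fF, while `max_capacitance_fF(40.0, 25.0)` gives roughly 227 fF, so the stated budget would fit below the assumed bound.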
A sampling latch consisting of a track-and-hold stage, implemented as NMOS pass transistors, followed by a latch [13], [18] has not been chosen because of the stringent requirements on the clock signal. These requirements are a short clock fall time to achieve a high time resolution [19], a high common-mode voltage of the sampling clock, and a large clock swing to fully switch the pass gate on and off. Since all other circuits use CML signaling, differential CML latches are preferred. Samplers composed of CML latches implemented in 90 nm CMOS are able to regenerate a 40 Gb/s data signal [9], [20].
Fig. 2(a) shows the block diagram of our sampler, which consists of a front-end sampling latch [Fig. 2(b)], a slave latch [Fig. 2(c)], and a CML buffer. The CML latches and the CML buffer are fully differential circuits to achieve a higher immunity to power supply variations than pseudo-differential circuits [8], [21], [22]. The sample transistors of the front-end sampling latch are limited in size (W = … µm and L = … nm) since the input capacitance of each sampler has to be lower than 10 fF. In order to reduce rise and fall times at the output, the load resistors have to be decreased and the tail current increased. Therefore, the widest transistor that keeps the input capacitance below 10 fF has been chosen. The tail current source has to provide a current of 1.54 mA to bias the transistors (W = … µm) at a current density of 0.11 mA/µm, which has been evaluated by simulation to achieve peak $f_T$. In order to provide enough regenerative gain to fully switch the following differential pair and to guarantee enough noise margin, a voltage swing of 600 mV is required, resulting in a load resistance R = … . The latch transistors and the sample transistors have equal transistor dimensions so that both operate at the peak-$f_T$ current density and present the same load to the clock transistors. The transistor pair of the differential clocking stage has to steer the current at a lower frequency than the sample and latch transistor pairs. To guarantee full current switching with a typical CML clock signal, the clock transistors are wider than the sample and latch transistors by a factor of 1.5 in order to reduce the required signal swing for proper switching of the differential pair.
With a fan-out of 1, this sampling latch is able to regenerate the input data up to 32 Gb/s. This configuration is defined as case I. Because the second CML latch in the MS-FF configuration of Fig. 2(a) has to process a four times lower data rate, the dimensions of its devices have been scaled by a factor of 1/3 in order to reduce the capacitive load of the first latch. In this configuration (case II) a maximal data rate of 37 Gb/s is achieved, because the tracking bandwidth of the sampling latch has been increased and its regeneration time [23, eq. (1)] has been reduced. To further increase the bandwidth of the sampling latch, shunt peaking inductors [24] are introduced (case III) as shown in Fig. 2(b). With integrated inductors of L = … nH, where one inductor occupies an area of 20 µm × 20 µm using the two topmost metal layers [25], [26], the tracking bandwidth of the first CML latch is extended by a factor of 1.2, enabling the sampling of input data up to 44 Gb/s. A CML buffer after the second latch is needed to drive not only the demultiplexer but also the 10 Gb/s output driver. Even though the first latch of the sampler incorporates two shunt-peaking inductors, the layout of one sampler is still very compact and occupies an area of only 50 µm × 35 µm, where the two inductors occupy 46% of the area.
To quantify and compare the sensitivity, timing resolution, and bandwidth of these three sampling front-ends (I, II, III), each of them has been characterized with the procedure described in [27]. The idea is that the latch can be separated into a linear portion, described by an integration window, and an ideal sampler plus decision-dependent feedback [27]. In this way, the bandwidth limitations of the sampling stage and the impact of the finite slew rate of the clock (0.4 V/32 ps in this case) can be included in the transfer function of the overall data path. The integration window is derived by measuring the sensitivity for a short voltage pulse as a function of the sampling time relative to the clock edge. Fig. 3(a) illustrates the amplitude $A(\Delta t)$ that is just sufficient to flip the latch for a given time offset $\Delta t$ with respect to the latch clock. The sensitivity function $h$, or more precisely the function one divided by the sensitivity per ps, is depicted for the three cases in Fig. 3(b). The sensitivity window of the CML latch with peaking inductors is slightly smaller than the others, indicating superior time-resolution capability. Moreover, the CML latch with peaking inductors (III) has the best DC input sensitivity voltage [27, eq. 29]. The transfer function $H(\omega)$ is derived by taking the Fourier transform of the sensitivity function $h$. The normalized transfer function $|H(\omega)|$, normalized for the target sensitivity of 5 mV, is shown in Fig. 3(c). The sampler (III) shown in Fig. 2(a) has the highest equivalent 3 dB bandwidth.
Fig. 2. (a) Sampler. (b) Inductive shunt-peaked first CML latch. (c) Second CML latch.
Fig. 3. (a) Required voltage $A(\Delta t)$, (b) sensitivity function $h$, and (c) normalized transfer function $|H(\omega)|$ for the three cases: (I) 1st CML latch with fanout 1, no L; (II) Fig. 2(a), no L; (III) Fig. 2(a).
B. Data Alignment and 8:32 Demultiplexer
The 8:32 demultiplexer block following the samplers consists
of eight parallel paths where each is built by a cascade of one 1:2
demultiplexer and one 2:4 demultiplexer. Eight 10 Gb/s input
signals at the CML level, which are separated by 1/8th of the period of the reference clock signal, are converted to 32 output signals at full-swing CMOS levels at a data rate of 2.5 Gb/s. These output signals are aligned to a single 2.5 GHz CMOS clock signal. The 2.5 GHz clock is derived from the 10 GHz sampling clock and serves as the clock for the digital logic. The design goal for this alignment is to balance the loading of all sampling clock phases without inserting any dummy elements. Different capacitive loads connected to the clock signals would potentially lead to phase shifts, which result in inaccuracies of the sampling points. Simply resampling the input signals of the demultiplexer by one of the clock phases used for sampling is not possible since that clock phase would be too heavily loaded, and furthermore, the timing margins in the latches of the first demultiplexer stage would be too small. In order to increase the timing margins, the first four samples are delayed by one half of a clock period. Although an additional 50 ps of timing margin is obtained, correct operation is still not guaranteed for all process corners when all eight signals are sampled with one clock phase. We therefore used the I and Q phases of the divided clock at 5 GHz, where each of them samples four input signals. This adds another 25 ps of timing margin. The frequency divider can be designed to present a small capacitive load to one of the sampling clock phases. This minimum load is the only loading imbalance of the clock phases. Moreover, using the I and Q phases further leads to symmetrical loads connected to the frequency divider, which is favorable in terms of speed. However, at the output of the first demultiplexer stage, the data signals are not aligned yet. The data alignment can easily be achieved by sampling the signals with a single 2.5 GHz clock signal at the input of the second demultiplexer stage. At a nominal data rate of 2.5 Gb/s, the timing margin is large enough (150 ps). The presented alignment procedure results in a minimum imbalanced loading of the sampling clock signals, and it is robust to process, voltage, and temperature (PVT) variations because of the large timing margins.
Fig. 4. Block diagram of the digital control circuit.
C. Digital Control Loop With Rate Selection
Fig. 4 illustrates the block diagram of the digital control
logic, which offers the option to select between three different
input data rates. All these circuit blocks, which run at 2.5 and
1.25 GHz, are synthesized circuits and are placed and routed
with a digital design tool.
The edge detector solves the Alexander equations [16]
(2)
for 16, 8, or 4 data/edge pairs depending on the selected data rate. The detector outputs a single early or late signal after majority voting. In order to relax the speed requirements for the digital CMOS loop filter, the output signals of the edge detector are demultiplexed by a factor of two. The loop filter, running at 1.25 GHz, is realized as a finite state machine (FSM), which accumulates the EARLY[1:0] and LATE[1:0] bits. The state machine can make zero, one, or two steps depending on the difference between the number of EARLY[1:0] and LATE[1:0] bits. The state machine consists of twelve states, arranged as two circles. After running through one circle of six states, an up or down signal, respectively, is generated by the FSM. These up and down signals increment or decrement the thermometer-coded up/down counter value, which controls the phase rotator. Since double steps are possible, the state machine needs at least three clock cycles between two consecutive up or down impulses. Hence, the maximum update rate of the phase rotator for the nominal data rate is
$$f_{\mathrm{update,max}} = \frac{1.25\ \mathrm{GHz}}{3} \approx 417\ \mathrm{MHz} \qquad (3)$$
A higher update rate, which would increase the jitter tolerance, could be reached by reducing the number of states in the FSM. However, a minimum number of states is required in order not to induce false phase steps due to the overall delay of the CDR loop.
Fig. 5. Principle of the rate selection: quarter-rate (QR), half-rate (HR), and full-rate (FR) mode, showing the sample points and the discarded samples.
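The behavior of the loop filter can be sketched as a small accumulate-and-dump model in Python. This is a simplification, not the twelve-state FSM of the chip: the class name `LoopFilter`, the reset to zero after each pulse, and the choice that early votes count toward "up" are assumptions made for illustration. The sketch only reproduces the two properties stated in the text, namely steps of zero, one, or two per clock cycle and six steps per up/down pulse.

```python
class LoopFilter:
    """Simplified model of the early/late accumulating loop filter."""

    def __init__(self, states_per_circle=6, max_step=2):
        self.states_per_circle = states_per_circle
        self.max_step = max_step
        self.acc = 0  # signed position within the current circle

    def clock(self, early_bits, late_bits):
        """Advance one 1.25 GHz cycle.

        early_bits and late_bits are pairs of 0/1 values standing in for
        EARLY[1:0] and LATE[1:0]. Returns +1 (up), -1 (down), or 0.
        """
        step = sum(early_bits) - sum(late_bits)
        step = max(-self.max_step, min(self.max_step, step))
        self.acc += step
        if self.acc >= self.states_per_circle:
            self.acc = 0
            return 1
        if self.acc <= -self.states_per_circle:
            self.acc = 0
            return -1
        return 0
```

With two early votes in every cycle the model emits one pulse every third cycle, which corresponds to the minimum spacing of three clock cycles between consecutive up or down impulses.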
A special feature of our digital control logic is the capability
to support different data rates. The logic responsible for the data
rate selection is implemented in the edge detector as shown in
Fig. 4 and operates as depicted in Fig. 5. Quarter-rate (QR) operation is used for an input data rate from 23 to 44 Gb/s. The
early/late generation logic generates for each of the 16 data/edge
bit pairs an early/late signal by solving the Alexander equations
[16]. When the data rate is lower and the bit length larger, between 11.5 and 23 Gb/s, the CDR operates in half-rate (HR) mode. The edge samples used in the quarter-rate mode are omitted and only the data samples are evaluated. In this mode, the even data samples take the role of the edge bits and the odd data samples are still data bits. From these eight data/edge pairs, the early/late information is generated. For a still lower input data rate ranging from 5.75 to 11.5 Gb/s, the full-rate (FR) mode is possible. Here, the odd data samples are used alternately as a data bit and an edge bit. Hence, our receiver can cover the full range of data rates from 5.75 to 44 Gb/s, even though the multi-phase DLL, which generates the reference clock phases, is band limited. The DLL operates from 5.75 to 11.5 GHz and limits the lower data rate of the CDR.
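The mapping from input data rate to operating mode can be written down compactly. The following Python sketch uses the rate ranges and the 16, 8, or 4 data/edge pairs given in the text; the function name `select_mode`, the returned tuple layout, and the choice of QR at the shared 23 Gb/s boundary are illustrative assumptions.

```python
def select_mode(data_rate_gbps):
    """Return (mode, reference clock in GHz, data/edge pairs evaluated).

    The reference clock follows from the number of bits received per
    reference clock period: four in QR, two in HR, one in FR mode.
    """
    if 23.0 <= data_rate_gbps <= 44.0:
        return ("QR", data_rate_gbps / 4, 16)
    if 11.5 <= data_rate_gbps < 23.0:
        return ("HR", data_rate_gbps / 2, 8)
    if 5.75 <= data_rate_gbps < 11.5:
        return ("FR", data_rate_gbps / 1, 4)
    raise ValueError("data rate outside the 5.75 to 44 Gb/s range")
```

For example, 40 Gb/s in QR mode and 20 Gb/s in HR mode both map to a 10 GHz reference clock, which is how the band-limited DLL can serve the whole data rate range.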
D. Phase Rotator
In order to update the sampling position, we use four parallel
phase rotators which are controlled by the thermometer-coded
up/down counter. Using a full thermometer code, discontinuities in the phase rotator transfer characteristics can be avoided. The reference clock phases generated in the DLL are fed to the four phase rotators. One phase rotator as shown in Fig. 6 consists of a phase selection stage followed by a phase interpolation stage [10]. The first stage, consisting of two 4:1 multiplexers [Fig. 7(a)], selects two clock phases from two adjacent phase octants. The interpolation process with eight clock phases results in a better phase linearity compared to interpolation schemes using six or I/Q phases [12]. The phase interpolator [Fig. 7(b)] is a dual input differential amplifier and blends the two selected phases according to the 8-bit thermometer-coded value. Retiming flip-flops between the up/down counter and the phase rotator guarantee that all control signals change their states at the same time, thus avoiding phase glitches. The common-mode outputs of the selector and the interpolator are regulated by a replica bias, as in all CML circuits of this CDR. An important practical requirement is that the amplitude and common-mode voltage of the sampling clock are always correct, even after start-up, to assure the presence of the CDR system clock. This implies that the control signals are initialized correctly.
Fig. 6. Block diagram of one phase rotator.
The eight interpolation steps together with the eight input clock phases result in a total of 64 phase steps. Hence, one phase step amounts to
$$\frac{360^\circ}{64} = 5.625^\circ \qquad (4)$$
and the ideal phasor of the output signal of the phase rotator having equal phase steps is
(5)
where one parameter denotes the amplitude and the integer step index ranges over an interval set by the number of interpolation steps. Since the phase interpolation is achieved by adding two signals with different phases and not by a real rotation of the phase of a single signal, the interpolated output signal is not equal to the ideal output signal and is
(6)
Fig. 7. (a) Schematic of the 4:1 phase selector. (b) Schematic of the phase interpolator (type-I).
Fig. 8. (a) Equal versus interpolated phase steps of one octant of the 360° circle and phase error. (b) Simulated phase step (DNL) and (c) absolute phase error (INL) for type-I and type-II phase interpolators with a 10 GHz clock.
The interpolated phase steps in (6) and equal phase steps in (5) are calculated and displayed for one octant in Fig. 8(a). The maximum deterministic interpolation phase error is 0.5° and is smaller than one phase step by a factor of ten. Furthermore, the phase steps vary because inputs and output are not fully isolated due to capacitive feedback. The simulated phase steps (DNL) and absolute phase error (INL) for a clock frequency of 10 GHz are shown for our implementation, a type-I phase interpolator, and a type-II phase interpolator in Fig. 8(b) and (c), respectively. A type-I phase interpolator has a common-source stage as shown in Fig. 7(b). A type-II phase interpolator incorporates a cascode stage [10, Fig. 10]. The maximum phase step of a type-I phase interpolator occurs due to the parasitic effect of capacitive coupling between gate and drain when the interpolation boundary is reached and the output clock of the 4:1 multiplexer is switched from one clock phase to the next. Although the alternative design (type-II) has a better isolation property, it was not used since its unity-gain frequency is too low under worst-case process conditions. Furthermore, it has been reported that a type-II phase interpolator has a more nonlinear transfer characteristic at lower clock frequencies [10, Fig. 11]. The simulated transfer function of the four phase rotators with a clock frequency of 10 GHz and an update rate of 1.25 GHz is depicted in Fig. 9 and reveals no deterministic phase offset between the four clock phases. At lower clock frequencies the transfer characteristic becomes slightly more nonlinear due to sharper clock edges and wider spacing between the interpolating edges.
Fig. 9. Simulated transfer characteristic of the four phase rotators.
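The deterministic interpolation error within one octant can be reproduced numerically. The Python sketch below assumes ideal unit-amplitude sinusoidal phases that are 45° apart and linear weights n/8 and (8 − n)/8; these are modeling assumptions, since the exact weighting used in (6) is not reproduced here, and the function names are hypothetical.

```python
import cmath
import math


def octant_phases(n_steps=8, octant_deg=45.0):
    """Return (ideal, interpolated) phase lists in degrees for one octant."""
    first = 1 + 0j                                   # phasor at 0 degrees
    second = cmath.exp(1j * math.radians(octant_deg))  # phasor at 45 degrees
    ideal, interpolated = [], []
    for n in range(n_steps + 1):
        weight = n / n_steps
        ideal.append(n * octant_deg / n_steps)
        blended = (1 - weight) * first + weight * second
        interpolated.append(math.degrees(cmath.phase(blended)))
    return ideal, interpolated


def max_phase_error_deg(n_steps=8, octant_deg=45.0):
    """Largest absolute deviation between interpolated and ideal phase."""
    ideal, interpolated = octant_phases(n_steps, octant_deg)
    return max(abs(a - b) for a, b in zip(ideal, interpolated))
```

Under these assumptions the largest deviation is about 0.45°, which is in line with the stated maximum of 0.5° and roughly one tenth of the 5.625° phase step.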
A total of 64 phase steps for one 100 ps reference clock period or 16 steps for one data unit interval (UI) of 25 ps are provided, resulting in a nominal timing resolution of 1.56 ps. When the phase rotator is updated with the maximum update rate evaluated in (3), the maximal possible frequency offset between TX and RX clocks that can be tracked correctly is
(7)
which can equally be expressed in parts per million (ppm).
Besides the frequency offset, which can be tracked, the jitter tolerance is the second key parameter for CDRs employed in chip-to-chip communication. For a first-order phase tracking CDR, the maximum jitter amplitude that the CDR can tolerate is limited by
(8)
and is inversely proportional to the jitter frequency. Jitter tolerance can be increased by a higher update rate or a larger phase step, where the latter increases the dither amplitude of the loop.
E. Delay-Locked Loop
A DLL operating between 5.75 and 11.5 GHz generates the eight clock phases for the four phase rotators. Compared to a PLL implementation, a DLL solution is preferred due to better immunity to on-chip noise because a voltage-controlled delay line (VCDL) does not suffer from cycle-to-cycle accumulated jitter as a voltage-controlled oscillator does. Any accumulated jitter created by supply or substrate noise is corrected when a clean reference clock edge arrives at the input of the VCDL.
Differential CML delay elements have been used in the VCDL to achieve short gate delays with low supply and substrate noise sensitivity. Four delay elements are sufficient to generate the eight clock phases. If we assume a constant phase error for each delay element, the phase errors sum up after each delay element. The maximum phase error therefore occurs between the clock phases that are separated by the largest number of delay elements. In order to sample the data bits correctly, the phase difference between two sampling clock phases has to stay within a bounded range. Fig. 10 illustrates how the phase tracking loop of the CDR aligns the edges of the sampling clocks to the incoming data stream for the two extreme cases. For a positive and a negative phase error, the absolute phase error has to be lower than one fourth of the clock phase margin (CPM) of a sampler. At 11 GHz the maximal tolerable phase error per delay element has to be lower than 4.2°.
In a DLL, a single loop integrator suffices to drive the steady-state phase error to zero. A typical DLL consists of a VCDL, phase detector, charge pump, and loop filter [10], [28]. In reality, charge pump DLLs have a static phase offset between the two clock signals at the phase detector input mainly due to the mismatch between the charge pump’s up and down currents. Thus, the steady-state phase error is not zero and depends on the non-idealities of the charge pump.
The block diagram of the implemented DLL is shown in Fig. 11(a). Its structure is similar to the typical DLL but the charge pump is replaced by a differential operational transconductance amplifier (OTA). The OTA is a voltage-to-current converter that pumps current in and out of its load capacitance and is therefore equivalent to a charge pump that steers a current in and out of the loop filter capacitance. Fig. 12(a) illustrates a possible linear time-invariant phase domain model of the DLL [29], which contains the combined gain of the phase detector and the OTA [30]. Since the OTA has a finite output resistance, the steady-state error of the DLL is non-zero and is inversely proportional to the loop gain. In order to determine the steady-state phase error, the phase domain model shown in Fig. 12(a) cannot be applied since it neglects the output resistance of the OTA. The phase domain model illustrated in Fig. 12(b) includes all gain stages and all poles, that is, the dominant pole and the higher order poles. The dominant pole is formed by the output impedance of the OTA and the input capacitance of the control element (CE). Hence, the OTA limits the bandwidth of the loop and determines the loop gain.
Fig. 10. Maximum allowed phase error for the quarter-rate 2×-oversampling CDR.
Dummy delay elements are added in front of the first tunable delay
element for signal conditioning and behind the last one to provide the
same load capacitance for all delay elements in the delay line. The
tunable delay elements are implemented using a phase interpolation
technique [31]–[33] as shown in Fig. 11(b) and (c). The gate delay of
the delay element can be tuned proportionally to the control signal
between the delay of the interpolator alone and the sum of the
interpolator delay and the delay of the buffer inserted in the
non-direct signal path. This results in a maximal tuning factor, the
ratio of this sum to the delay of the interpolator alone, as given
in (9). The tuning factor has been reduced to 1.9 to prevent
erroneous phase locking over all process corners while using
a 90° XOR phase detector [34]. This tuning factor limits the
range for exact equidistant clock phases to frequencies from 6
to 11.5 GHz. At 5.75 GHz a systematic phase error of 2.88°
per delay element is introduced but does not compromise the
operation of the CDR because at the lower data rates, the CPM
for sampling is larger.
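The relation between the tuning factor and the usable clock frequency range can be checked with a short Python sketch. It assumes that every delay element must provide a fixed fraction of the clock period and that its delay can be tuned by at most the tuning factor, so the clock period (and hence the frequency) can vary by the same factor. The function name is hypothetical.

```python
def equidistant_phase_range_ghz(f_max_ghz: float, tuning_factor: float):
    """Return (f_min, f_max) in GHz for exact equidistant clock phases.

    Assumes the per-element delay scales with the clock period and can be
    stretched by at most `tuning_factor` relative to its minimum value.
    """
    if tuning_factor < 1.0:
        raise ValueError("tuning factor must be at least 1")
    return f_max_ghz / tuning_factor, f_max_ghz


f_min, f_max = equidistant_phase_range_ghz(11.5, 1.9)
print(f"equidistant phases from {f_min:.2f} GHz to {f_max:.2f} GHz")
```

With a tuning factor of 1.9 the lower limit lands slightly above 6 GHz, so 5.75 GHz falls just outside the exact range, in line with the residual phase error reported at that frequency. A tuning factor of 2 would be needed to reach 5.75 GHz exactly.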
The 90° XOR phase detector based on the Gilbert cell multiplier, shown
in Fig. 13(a), is sufficient to perform a direct phase detection at
11 GHz. The circuit is simpler than edge-triggered phase detectors and
consequently consumes less power as well. The differential output
signal PX is the result of the multiplication of the two differential
input phase signals PH1 and PH2. The DC component of the signal PX is
then proportional to the phase difference between the input signals
PH1 and PH2. Different propagation times from the input ports PH1 and
PH2 to the output node of the Gilbert cell multiplier cause a
systematic offset in the transfer function of the phase detector as
shown in Fig. 13(b) and (c).
4.2
This offset generates an intolerable phase error of 9
per delay element. To compensate this error, we implemented a
symmetrical phase detector PD formed by two Gilbert cell multipliers PD1 and PD2 as shown in Fig. 11(a). The input signals
of PD2 are swapped with respect to PD1, generating a negative offset and thus compensating the offset of PD1 as shown
in Fig. 13(c). This symmetrical phase detector is connected to
the VCDL in such a way that the phase detector PD1 compares
the phases P0 and P2, while phase detector PD2 compares the
phases P1 and P3. This connection scheme leads to equally loaded delay
elements and removes the systematic phase errors. The high-frequency
components of the output signals PX1 and PX2 of the Gilbert cell
multipliers are low-pass
filtered before they are summed to generate the control signal
PX. The filtering reduces the high-frequency amplitude of PX
and prevents potential saturation of the input stage of the
differential OTA. The OTA provides sufficient gain (37 dB) to the
control loop to keep the steady-state phase error of one delay
element below 0.4° (determined with the final value theorem [34]).
Subsequently, the control element (CE) converts the differential
output voltage of the differential OTA into the differential control
voltage of the VCDL. Compared to a charge pump solution, a linear and
differential high-gain amplifier in the control loop has been
preferred in order to minimize the generated switching noise.
Moreover, the whole control loop of the DLL has been implemented
differentially to reduce the influence of common mode, substrate, and
power supply noise in the DLL and generate clean clock phases with
low jitter.
Fig. 11. (a) DLL block diagram. (b) Simplified schematics of one delay element. (c) Schematics of the interpolator.
Fig. 12. Phase domain model: (a) with integrator (OTA) and loop filter C, (b) including finite output resistance of OTA and higher order poles.
Measurements performed on a separate test chip confirm the low noise
and high PSRR of the implemented DLL. At 10 GHz, a peak-to-peak jitter
below 2 ps has been measured on the single-ended output p2a as shown
in Fig. 14(a). This value is comparable to the jitter of the input
reference clock. A supply noise of 100 mV modulated at 5 MHz causes a
peak-to-peak jitter of 5.6 ps (≈0.22 UI at 40 Gb/s) on the
single-ended output p2a as reported in Fig. 14(b). But when
considering the differential signal p2a–p2b, the peak-to-peak jitter
remains below 2 ps (≈0.08 UI at 40 Gb/s) as shown in Fig. 14(c).
The phase error measured across the delay element DE1 is below 2°.
Thus, the accumulated phase error amounts to 6°. Mismatches between
the devices (polysilicon resistors and NMOS transistors) in the
differential stages are the cause of DC offset at the output of the
delay elements as illustrated in Fig. 15(a). This DC offset propagates
through the cascaded differential stages in the DLL, where it gets
amplified, and causes a phase error at the output. A maximum DC offset
of up to 50 mV can occur after the last delay element of the VCDL,
producing an intolerable phase error of 25° (≈0.28 UI). To solve this
problem, clock buffers are introduced after the DLL to reduce the
accumulated DC offset, thereby restoring the required phase precision.
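The offset cancellation of the symmetrical phase detector described earlier in this subsection can be illustrated with an idealized Python sketch. It is a behavioral model under stated assumptions, not the implemented circuit: the multiplier is treated as an ideal XOR detector driven by square waves, the unequal propagation times of the two input ports are lumped into one hypothetical skew angle (10° here), and both detectors are applied to the same pair of phases, whereas the implemented DLL connects PD1 and PD2 to different phase pairs.

```python
import math


def xor_pd_dc(delta_phi: float) -> float:
    """Normalized DC output of an ideal XOR (multiplier) phase detector.

    delta_phi is the phase difference in radians between two square waves.
    The output is +1 at 0, 0 at +/-90 degrees and -1 at 180 degrees.
    """
    wrapped = (delta_phi + math.pi) % (2.0 * math.pi) - math.pi
    return 1.0 - 2.0 * abs(wrapped) / math.pi


def single_pd(delta_phi: float, skew: float) -> float:
    """One multiplier: the port skew adds directly to the phase difference."""
    return xor_pd_dc(delta_phi + skew)


def symmetric_pd(delta_phi: float, skew: float) -> float:
    """Two multipliers; the second one has its inputs swapped.

    Swapping the inputs flips the sign of the phase difference, while the
    skew, a property of the ports, keeps its sign.
    """
    return xor_pd_dc(delta_phi + skew) + xor_pd_dc(-delta_phi + skew)


def lock_point_deg(detector, skew: float) -> float:
    """Bisection for the zero crossing of the detector output (30..150 deg)."""
    lo, hi = math.radians(30.0), math.radians(150.0)
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if detector(mid, skew) > 0.0:
            lo = mid
        else:
            hi = mid
    return math.degrees(0.5 * (lo + hi))


SKEW = math.radians(10.0)  # hypothetical port skew, for illustration only
print(round(lock_point_deg(single_pd, SKEW), 3),
      round(lock_point_deg(symmetric_pd, SKEW), 3))
```

In this model the single detector locks away from 90° by the amount of the skew, while the sum of the two detectors crosses zero at exactly 90°, because the skew terms of the two multipliers cancel.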
F. Clock Buffer
Fig. 13. (a) Schematic of the Gilbert cell multiplier. (b) Simplified block diagram of the Gilbert cell multiplier. (c) Ideal transfer characteristic of the Gilbert cell multiplier.
Fig. 14. Measured peak-to-peak jitter of the DLL. (a) Single-ended signal p2a without power supply noise: 1.8 ps. (b) Single-ended signal p2a with 100 mV modulated power supply at 5 MHz: 5.6 ps. (c) Differential signal p2a–p2b with power supply noise: 2 ps.
Clock buffers are placed between the DLL and the phase rotator and
between the phase rotator and the samplers in order to drive the
relatively large capacitive loads. Shunt-peaking inductors of 1 nH
are used to compensate the large load capacitance at the output, thus
also reducing the power consumption of the clock buffer by 30%. With
the nominal load capacitance of 100 fF, a gain of 4.5 dB at 10 GHz is
achieved. The power consumption of the buffer is 5 mW.
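The bandwidth extension obtained by shunt peaking can be sketched in Python with the textbook load model of a resistor in series with the inductor, shunted by the load capacitance. The 1 nH and 100 fF values are taken from the text above; the 150 Ω load resistance is a hypothetical value chosen only to make the example concrete, and the model ignores transistor output resistance and inductor losses.

```python
import math


def load_gain(f_hz: float, r_ohm: float, l_h: float, c_f: float) -> float:
    """|Z(f)| / R for (R in series with L) in parallel with C."""
    w = 2.0 * math.pi * f_hz
    z = complex(r_ohm, w * l_h) / complex(1.0 - w * w * l_h * c_f, w * r_ohm * c_f)
    return abs(z) / r_ohm


def bandwidth_3db_hz(r_ohm: float, l_h: float, c_f: float,
                     step_hz: float = 10e6, f_stop_hz: float = 200e9) -> float:
    """First frequency at which the normalized gain drops below 1/sqrt(2)."""
    f = step_hz
    while f < f_stop_hz:
        if load_gain(f, r_ohm, l_h, c_f) < 1.0 / math.sqrt(2.0):
            return f
        f += step_hz
    raise ValueError("no -3 dB point found below f_stop_hz")


R_OHM = 150.0  # hypothetical load resistance (not given in the text)
L_H = 1e-9     # shunt-peaking inductor from the text
C_F = 100e-15  # nominal load capacitance from the text

bw_plain = bandwidth_3db_hz(R_OHM, 0.0, C_F)
bw_peaked = bandwidth_3db_hz(R_OHM, L_H, C_F)
f_res = 1.0 / (2.0 * math.pi * math.sqrt(L_H * C_F))
print(f"{bw_plain / 1e9:.1f} GHz -> {bw_peaked / 1e9:.1f} GHz, "
      f"LC resonance at {f_res / 1e9:.1f} GHz")
```

Under these assumptions the inductor extends the −3 dB bandwidth of the load by roughly a factor of 1.75. Read the other way around, a given bandwidth can be met with a larger load resistance and therefore less bias current, which is the mechanism behind the power saving mentioned above.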
The clock buffers were designed to reduce DC offsets generated in the
DLL. These DC offsets cause duty cycle distortion on differential
signals, compromising the phase precision and reducing the clock phase
margin (CPM) of the system as shown in Fig. 15(b). Two samplers
(sampler0 and sampler4) are clocked with the two complementary phases.
The DC offset between oa and ob causes a phase error between these
two phases, which reduces the CPM of the system.
To reduce the phase error caused by DC offsets, a clock buffer
with regulated output DC levels is implemented. Fig. 15(c)
shows the schematic of the implemented clock buffer. Capacitive degeneration is used to reduce the gain at low frequencies
without sacrificing the gain at high frequencies. The DC levels
of the outputs oa and ob are regulated to the same DC level, set in
the bias circuit, reducing the influence of the input DC offset. For
input DC offsets up to 200 mV the phase accuracy of the output is
improved by a factor of 25 with respect to the input signal, thus
reducing the maximal phase error caused by input DC offset to 0.02 UI.
The offset introduced by mismatches between the devices in the clock
buffer is 15 mV, corresponding to a maximal phase error of 2.15°. This
phase error is much smaller than the error of the DLL.
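How a DC offset turns into a phase error, and how a phase error in degrees relates to unit intervals, can be sketched in Python. The sketch assumes an idealized sinusoidal clock whose zero crossing is shifted by the offset; the clock amplitude is not stated in the text, so the 120 mV used below is hypothetical. The conversion to UI assumes a 10 GHz clock and a 40 Gb/s data stream, which reproduces the 25° ≈ 0.28 UI correspondence quoted for the DLL.

```python
import math


def offset_phase_error_deg(v_offset_mv: float, amplitude_mv: float) -> float:
    """Magnitude of the zero-crossing shift of A*sin(phi) + V_os, in degrees."""
    return math.degrees(math.asin(v_offset_mv / amplitude_mv))


def phase_deg_to_ui(phase_deg: float, clock_ghz: float, data_rate_gbps: float) -> float:
    """Express a clock phase error as a fraction of the data unit interval."""
    return phase_deg / 360.0 * data_rate_gbps / clock_ghz


AMPLITUDE_MV = 120.0  # hypothetical clock amplitude (not given in the text)
for v_os_mv in (15.0, 50.0):
    err_deg = offset_phase_error_deg(v_os_mv, AMPLITUDE_MV)
    err_ui = phase_deg_to_ui(err_deg, 10.0, 40.0)
    print(f"{v_os_mv:.0f} mV -> {err_deg:.1f} deg = {err_ui:.3f} UI")

print(f"25 deg = {phase_deg_to_ui(25.0, 10.0, 40.0):.2f} UI at 40 Gb/s")
```

Because the data rate is four times the clock frequency in quarter-rate operation, every degree of clock phase error costs four times as much margin in UI as it would in a full-rate design, which is why small DC offsets matter here.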
IV. MEASUREMENTS
Fig. 15. (a) Generation of DC offsets in CML stages. (b) Clock buffer with regulated DC levels at the output. (c) Phase error generated by DC offset in differential signals and reduction of the system CPM.
Fig. 16. Chip photo and layout of the CDR.
Our CDR circuit has been fabricated in a 90 nm bulk CMOS technology
and occupies 570 × 350 μm² ≈ 0.2 mm². The layout and the die
micrograph of the CDR circuit are shown in Fig. 16. All inputs and
outputs are ESD protected except the differential 40 Gb/s data inputs.
An ESD protection circuit similar to [35] cannot be placed at the
40 Gb/s input port since any additional capacitance at the input
lowers the input pole frequency. The CDR is able to lock to a PRBS
2⁷−1 data stream up to 44 Gb/s if the input signal is applied to the
chip using on-wafer probes. The 40 Gb/s input eye diagram with a
10 GHz sinusoidal clock signal is illustrated in Fig. 17(a). The
recovered 10 Gb/s data measured on-wafer without ESD protection and
with the packaged module including ESD protection are illustrated in
Fig. 17(b) and (c), respectively. Since the recovered 10 Gb/s data
signal is the buffered output signal of the front-end MS-FF (Figs. 1
and 2), the eye diagrams [Fig. 17(b), (c)] for full-, half-, and
quarter-rate modes all look alike. The operating ranges for full-,
half-, and quarter-rate modes cover the data ranges from 5.75 to
11.5 Gb/s, 11.5 to 23 Gb/s, and 23 to 44 Gb/s, respectively. For all
data rates, the circuit consumes 230 mA from a 1 V power supply
voltage (analog part: 215 mA, digital section: 15 mA). This results in
an overhead of power consumption of a factor of two and four for half
rate and quarter rate mode, respectively, assuming that the power
consumption ideally scales with the data rate. As a future feature,
this overhead could be reduced to 30% and 50%, respectively, by
turning off the unused circuit blocks, e.g., part of samplers, clock
buffers, and demultiplexers.
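The overhead factors follow from simple arithmetic on the supply figures above, as the following Python sketch shows. It assumes the power is constant at 230 mA × 1 V for all data rates, as stated, and evaluates the power per data rate at the upper end of each operating range (11.5, 23, and 44 Gb/s). The function names are hypothetical.

```python
SUPPLY_V = 1.0      # supply voltage from the text
SUPPLY_MA = 230.0   # total supply current from the text


def power_mw() -> float:
    """Total power consumption, constant over all data rates."""
    return SUPPLY_V * SUPPLY_MA


def mw_per_gbps(data_rate_gbps: float) -> float:
    """Power normalized to the data rate."""
    return power_mw() / data_rate_gbps


reference = mw_per_gbps(44.0)
for rate in (44.0, 23.0, 11.5):
    overhead = mw_per_gbps(rate) / reference
    print(f"{rate:5.1f} Gb/s: {mw_per_gbps(rate):5.2f} mW/(Gb/s), "
          f"overhead x{overhead:.2f}")
```

Relative to operation at 44 Gb/s, the normalized power is about 1.9 times higher at 23 Gb/s and about 3.8 times higher at 11.5 Gb/s, which corresponds to the factors of two and four mentioned above.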
In all operating modes, the maximum frequency offset that can be
tracked is 615 ppm for a BER below 10⁻¹² up to 38 Gb/s. The limit was
set by the measurement setup because the input pattern from the
pattern generator was not error free above 38 Gb/s. The value of
615 ppm is sufficient to compensate for differences between the clock
frequencies of two chips clocked from different crystal oscillators.
Besides the frequency offset that can be tracked, the jitter tolerance
is the second key parameter for CDRs employed in chip-to-chip
communication. Since our jitter tolerance measurement setup was
limited to 27 Gb/s, the jitter tolerance measurements have been
performed only with the IC mounted on a substrate (Fig. 18) at its
maximum data rate of 24 Gb/s. This maximum is limited by losses and
mismatches of the 1.6 cm input line on the substrate. To illustrate
these effects, measured eye diagrams of the 24 Gb/s data stream at the
input and output of this line are depicted in Fig. 18. Two different
measurement setups are needed for the jitter tolerance test. Setup I
has been used for jitter frequencies between 10 kHz and 1 MHz, where
the incoming data signal is directly modulated. At jitter frequencies
above 1 MHz, it was no longer possible to modulate the data, and
setup II had to be used. In setup II the system clock of the CDR has
been modulated relative to the incoming data. The measured jitter
tolerance plot at 24 Gb/s of the packaged CDR and the extended jitter
tolerance mask for XAUI [36] are illustrated in Fig. 19. For all
jitter frequencies and all jitter amplitudes, the XAUI mask is
fulfilled by our circuit. The dip around the jitter frequency of
20 MHz, where the maximum jitter amplitude that the CDR can tolerate
is lower than the clock phase margin of the sampler, is due to the
loop delay mainly caused by pipelining stages in the digital part.
Fig. 17. (a) 40 Gb/s input data, 10 GHz sinusoidal clock signal. (b) Recovered 10 Gb/s data measured on-wafer without ESD protection. (c) Recovered 10 Gb/s data measured with the packaged module including ESD protection.
TABLE III
40 Gb/s CMOS CDRs
Fig. 18. Eye diagram of a 24 Gb/s data stream at the input of the package (left eye diagram) and at the pad of the circuit (right eye diagram).
Fig. 19. Jitter tolerance of the packaged CDR at 24 Gb/s achieving a BER below 10⁻¹².
Finally, Table III shows a comparison with previously published
40 Gb/s CMOS CDRs with analog [8], [15] or digital loop filters [9],
[13], [14]. Fully analog CDRs consume more area and dissipate less
power, but have a larger BER compared to [9] and [14]. Among the three
CDRs with a digital loop filter, our CDR covers the largest range of
data rates. Furthermore, it consumes less power (30%) and has a
smaller chip area than the 3×-oversampling CDR with an integrated 1:16
DEMUX [9]. Only the circuit in [13] reaches superior performance with
respect to power and area, but uses a more advanced and expensive SOI
CMOS technology that allows the speed-critical circuit blocks to be
implemented in CMOS logic as well, instead of the more power- and
area-consuming CML.
V. SUMMARY
A semi-digital clock-data-recovery circuit implemented in
90 nm bulk CMOS for 40 Gb/s chip-to-chip communication
is presented. Thanks to the novel rate selection feature in the
fully digital loop filter, a very large data rate range from 5.75 to
44 Gb/s can be covered. From 5.75 to 38 Gb/s a BER below 10⁻¹²
is achieved even for a frequency offset of 615 ppm and data
jitter amplitudes above the XAUI mask. Measurement results of
the DLL circuit showed that differential signaling in the clock
path keeps jitter generation caused by power supply noise low.
By inductive shunt-peaking in the speed-critical blocks, like the
samplers and the clock buffers, the required high bandwidth is
reached at a low power consumption. The power consumption
per data rate of 5.3 mW/(Gb/s) of the proposed CDR is below
the ITRS power budget requirement (Table I) for high-speed
transceivers implemented in 90 nm CMOS technology.
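To put the 615 ppm frequency offset into perspective, the following Python sketch converts it into the phase drift that the tracking loop has to follow. It assumes that the offset is expressed relative to the data rate, so that the sampling phase drifts by a fixed fraction of a unit interval per bit. The function names are hypothetical.

```python
def ui_drift_per_bit(offset_ppm: float) -> float:
    """Phase drift in unit intervals accumulated per received bit."""
    return offset_ppm * 1e-6


def bits_per_ui_slip(offset_ppm: float) -> float:
    """Number of bits after which an untracked offset slips one full UI."""
    return 1.0 / ui_drift_per_bit(offset_ppm)


def slip_rate_mhz(offset_ppm: float, data_rate_gbps: float) -> float:
    """Rate at which the phase must wrap around by one UI, in MHz."""
    return ui_drift_per_bit(offset_ppm) * data_rate_gbps * 1e3


print(f"one UI every {bits_per_ui_slip(615.0):.0f} bits, "
      f"{slip_rate_mhz(615.0, 38.0):.2f} MHz at 38 Gb/s")
```

At 615 ppm the phase rotators have to advance by one full unit interval roughly every 1600 bits, which is the continuous phase rotation that a plesiochronous link demands from the digital phase-tracking loop.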
ACKNOWLEDGMENT
The authors thank R. Brun and D. Holzer for the design of
the digital logic, T. Toifl, H. Schmid, D. Müller, S. Schmid,
P. Looser, D. Barras, C. Kromer, C. Menolfi, T. Morf, M. Kossel,
and J. Weiss for fruitful discussions, and M. Lanz and M. Witzig
for bonding.
REFERENCES
[1] The International Technology Roadmap for Semiconductors (2004 Update): Test and Test Equipment. ITRS, 2004 [Online]. Available: http://www.itrs.net/Links/2004Update/2004_02_Test.pdf
[2] K. L. J. Wong, H. Hatamkhani, M. Mansuri, and C. K. K. Yang, “A
27-mW 3.6-Gb/s I/O transceiver,” IEEE J. Solid-State Circuits, vol. 39,
no. 4, pp. 602–612, Apr. 2004.
[3] J. Poulton, R. Palmer, A. M. Fuller, T. Greer, J. Eyles, W. J. Dally,
and M. Horowitz, “A 14-mW 6.25-Gb/s transceiver in 90-nm CMOS,”
IEEE J. Solid-State Circuits, vol. 42, no. 12, pp. 2745–2757, Dec. 2007.
[4] J. F. Bulzacchelli, M. Meghelli, S. V. Rylov, W. Rhee, A. V. Rylyakov,
H. A. Ainspan, B. D. Parker, M. P. Beakes, A. Chung, T. J. Beukema, P.
K. Pepeljugoski, L. Shan, Y. H. Kwark, S. Gowda, and D. J. Friedman,
“A 10-Gb/s 5-tap DFE/4-tap FFE transceiver in 90-nm CMOS technology,” IEEE J. Solid-State Circuits, vol. 41, no. 12, pp. 2885–2900,
Dec. 2006.
[5] K. Fukuda, H. Yamashita, F. Yuki, M. Yagyu, R. Nemoto, T. Takemoto, T. Saito, N. Chujo, K. Yamamoto, H. Kanai, and A. Hayashi,
“An 8 Gb/s transceiver with 3x-oversampling 2-threshold eye-tracking
CDR circuit for 36.8 dB-loss backplane,” in IEEE ISSCC Dig. Tech.
Papers, 2008, pp. 98–598.
[6] C. Kromer, G. Sialm, C. Berger, T. Morf, M. L. Schmatz, F. Ellinger, D.
Erni, G.-L. Bona, and H. Jäckel, “A 100 mW 4×10 Gb/s transceiver
in 80-nm CMOS for high-density optical interconnects,” IEEE J. Solid-State Circuits, vol. 40, no. 12, pp. 2667–2679, Dec. 2005.
[7] S. Palermo, A. Emami-Neyestanak, and M. Horowitz, “A 90 nm CMOS
16 Gb/s transceiver for optical interconnects,” in IEEE ISSCC Dig.
Tech. Papers, 2007, pp. 44, 586.
[8] J. Lee and B. Razavi, “A 40-Gb/s clock and data recovery circuit in
0.18-μm CMOS technology,” IEEE J. Solid-State Circuits, vol. 38, no.
12, pp. 2181–2190, Dec. 2003.
[9] N. Nedovic, N. Tzartzanis, H. Tamura, F. M. Rotella, M. Wiklund,
Y. Mizutani, Y. Okaniwa, T. Kuroda, J. Ogawa, and W. W. Walker,
“A 40–44 Gb/s 3x oversampling CMOS CDR/1:16 DEMUX,” IEEE J.
Solid-State Circuits, vol. 42, no. 12, pp. 2726–2735, Dec. 2007.
[10] S. Sidiropoulos and M. A. Horowitz, “A semidigital dual delay-locked
loop,” IEEE J. Solid-State Circuits, vol. 32, no. 11, pp. 1683–1692,
Nov. 1997.
[11] R. Kreienkamp, U. Langmann, C. Zimmermann, T. Aoyama, and H.
Siedhoff, “A 10-Gb/s CMOS clock and data recovery circuit with an
analog phase interpolator,” IEEE J. Solid-State Circuits, vol. 40, no. 3,
pp. 736–743, Mar. 2005.
[12] C. Kromer, G. Sialm, C. Menolfi, M. Schmatz, F. Ellinger, and H.
Jackel, “A 25-Gb/s CDR in 90-nm CMOS for high-density interconnects,” IEEE J. Solid-State Circuits, vol. 41, no. 12, pp. 2921–2929,
Dec. 2006.
[13] T. Toifl, C. Menolfi, P. Buchmann, C. Hagleitner, M. Kossel, T. Morf,
J. Weiss, and M. Schmatz, “A 72 mW 0.03 mm² inductorless 40 Gb/s
CDR in 65 nm SOI CMOS,” in IEEE ISSCC Dig. Tech. Papers, 2007,
pp. 226–598.
[14] G. v. Büren, L. Rodoni, A. Huber, R. Brun, D. Holzer, M. Schmatz, and
H. Jäckel, “5.75 to 44 Gb/s quarter rate CDR with data rate selection
in 90 nm bulk CMOS,” in Proc. ESSCIRC, 2008, pp. 166–169.
[15] C. F. Liao and S. I. Liu, “40 Gb/s transimpedance-AGC amplifier and
CDR circuit for broadband data receivers in 90 nm CMOS,” IEEE J.
Solid-State Circuits, vol. 43, no. 3, pp. 642–655, Mar. 2008.
[16] J. D. H. Alexander, “Clock recovery from random binary data,” Electron. Lett., vol. 11, pp. 541–542, Oct. 1975.
[17] M. Horowitz, C.-K. K. Yang, and S. Sidiropoulos, “High-speed electrical signaling: Overview and limitations,” IEEE Micro, vol. 18, pp.
12–24, 1998.
[18] C.-K. K. Yang and M. A. Horowitz, “A 0.8 μm CMOS 2.5 Gb/s oversampling receiver and transmitter for serial links,” IEEE J. Solid-State
Circuits, vol. 31, no. 12, pp. 2015–2023, Dec. 1996.
[19] H. O. Johansson and C. Svensson, “Time resolution of NMOS sampling switches used on low-swing signals,” IEEE J. Solid-State Circuits, vol. 33, no. 2, pp. 237–245, Feb. 1998.
[20] Y. Okaniwa, H. Tamura, M. Kibune, D. Yamazaki, C. Tsz-Shing, J.
Ogawa, N. Tzartzanis, W. W. Walker, and T. Kuroda, “A 40-Gb/s
CMOS clocked comparator with bandwidth modulation technique,”
IEEE J. Solid-State Circuits, vol. 40, no. 8, pp. 1680–1687, Aug. 2005.
[21] K. Kanda, D. Yamazaki, T. Yamamoto, M. Horinaka, J. Ogawa, H.
Tamura, and H. Onodera, “40 Gb/s 4:1 MUX/1:4 DEMUX in 90
nm standard CMOS,” in IEEE ISSCC Dig. Tech. Papers, 2005, pp.
152–590.
[22] T. Chalvatzis, K. H. K. Yau, P. Schvan, M. T. Yang, and S. P.
Voinigescu, “A 40-Gb/s decision circuit in 90 nm CMOS,” in Proc.
ESSCIRC, 2006, pp. 512–515.
[23] G. v. Buren, L. Rodoni, C. Kromer, H. Jackel, A. Huber, and T. Morf,
“Low power sampling latch for up to 25 Gb/s 2x oversampling CDR in
90 nm CMOS,” in Proc. ESSCIRC, 2006, pp. 106–109.
[24] S. S. Mohan, M. d. M. Hershenson, S. P. Boyd, and T. H. Lee, “Bandwidth extension in CMOS with optimized on-chip inductors,” IEEE J.
Solid-State Circuits, vol. 35, no. 3, pp. 346–355, Mar. 2000.
[25] F. Ellinger, M. Kossel, M. Huber, M. Schmatz, C. Kromer, G. Sialm,
D. Barras, L. Rodoni, G. v. Buren, and H. Jackel, “High-Q inductors
on digital VLSI CMOS substrate for analog RF applications,” in Proc.
IEEE Int. Microwave and Optoelectronics Conf. (IMOC), 2003, vol. 2,
pp. 869–872.
[26] C. Kromer, “10 Gb/s to 40 Gb/s receiver for high-density optical interconnects in 80 nm CMOS,” Ph.D. dissertation, Swiss Federal Inst.
Technol. (ETH), Zurich, Switzerland, 2005, ETH No. 16347.
[27] T. Toifl, C. Menolfi, M. Ruegg, R. Reutemann, P. Buchmann, M.
Kossel, T. Morf, J. Weiss, and M. L. Schmatz, “A 22-Gb/s PAM-4 receiver in 90-nm CMOS SOI technology,” IEEE J. Solid-State Circuits,
vol. 41, no. 4, pp. 954–965, Apr. 2006.
[28] J. G. Maneatis, “Low-jitter process-independent DLL and PLL based
on self-biased techniques,” IEEE J. Solid-State Circuits, vol. 31, no.
11, pp. 1723–1732, Nov. 1996.
[29] J. R. Burnham, G.-Y. Wei, C.-K. K. Yang, and H. Hindi, “A comprehensive phase-transfer model for delay-locked loops,” in Proc. IEEE
Custom Integrated Circuits Conf. (CICC), 2007, pp. 627–630.
[30] T. Toifl, C. Menolfi, P. Buchmann, M. Kossel, T. Morf, R. Reutemann, M. Ruegg, M. L. Schmatz, and J. Weiss, “A 0.94-ps-RMS-jitter
0.016-mm² 2.5-GHz multiphase generator PLL with 360° digitally
programmable phase shift for 10-Gb/s serial links,” IEEE J. Solid-State
Circuits, vol. 40, no. 12, pp. 2700–2712, Dec. 2005.
[31] B. Lai and R. C. Walker, “A monolithic 622 Mb/s clock extraction data
retiming circuit,” in IEEE ISSCC Dig. Tech. Papers, 1991, pp. 144–306.
[32] M. Soyuer, J. F. Ewen, and H. L. Chuang, “A fully monolithic 1.25 GHz
CMOS frequency synthesizer,” in Symp. VLSI Circuits Dig., 1994, pp.
127–128.
[33] J. Savoj and B. Razavi, “A 10-Gb/s CMOS clock and data recovery
circuit with a half-rate linear phase detector,” IEEE J. Solid-State Circuits, vol. 36, no. 5, pp. 761–768, May 2001.
[34] R. E. Best, Phase-Locked Loops: Design, Simulation and Applications,
4th ed. New York: McGraw-Hill, 2003.
[35] M. Kossel, C. Menolfi, J. Weiss, P. Buchmann, G. von Bueren, L.
Rodoni, T. Morf, T. Toifl, and M. Schmatz, “A T-coil-enhanced 8.5
Gb/s high-swing SST transmitter in 65 nm bulk CMOS with < −16 dB
return loss over 10 GHz bandwidth,” IEEE J. Solid-State Circuits, vol.
43, no. 12, pp. 2905–2920, Dec. 2008.
[36] IEEE Standard for Information Technology: Media Access Control
(MAC) Parameters, Physical Layers, and Management Parameters for
10 Gb/s Operation, IEEE Std. 802.3ae-2002, 2002, pp. 0_1-516.
Lucio Carlo Rodoni (S’03) was born in Biasca,
Switzerland, in 1971. He received the Dipl. Ing.
(M.S.) degree in electrical engineering from the
Swiss Federal Institute of Technology (ETH) Zürich,
Switzerland, in 1998.
From 1998 to 2000, he was with Mandozzi Electronics Inc., where he was involved in the development of digital audio mixers and 2 Mb/s transmission
systems for audio and data. From 2000 to 2002, he
was a Research Engineer with TChip Inc. developing
global positioning system (GPS) RF front-end chips.
Since 2002, he has been a member of the RF Integrated Circuit (RFIC) Group,
Electronics Laboratory, ETH Zürich, Switzerland. Between October 2006 and
March 2007, he was with IBM Zurich Research Laboratory, involved in a series-source terminated transmitter project. His main interests are integrated circuits for high-speed interconnect applications.
George von Büren (S’03) was born in Zürich,
Switzerland, in 1974. He received the Dipl. Ing.
(M.S.) degree in electrical engineering from the
Swiss Federal Institute of Technology (ETH) Zurich,
Switzerland, in 1999.
From 1999 to 2002, he was with u-blox Inc.,
where he was involved in the development of
embedded computers and GPS receivers. In 2002,
he joined the Electronics Laboratory, ETH Zurich,
as a Research Assistant to pursue his Ph.D. degree
in collaboration with the IBM Zurich Research
Laboratory in Rüschlikon. From October 2006 to March 2007, he was with
IBM Zurich Research Laboratory developing a series-source terminated
transmitter in 65 nm CMOS. His research interests are in the field of analog and
mixed-signal design, with current focus on PLLs and clock and data recovery
circuits for serial I/O-links.
Alex Huber (S’93–M’00) was born in Zürich,
Switzerland, in 1967. He received the Dipl. Ing.
degree and the Ph.D. degree in electrical engineering
from the Swiss Federal Institute of Technology
(ETH), Zürich, in 1993 and 2000, respectively.
From 1993 to 2000, he was with the Electronics
Laboratory, ETH Zürich, as a Research Assistant,
where he worked on RF circuit design and modeling
of InP/InGaAs HBT devices. Since October 1999,
he has been with the Institute of Microelectronics
of the University of Applied Sciences Northwestern
Switzerland, Windisch, Switzerland. His main research interests include
low-power and high-speed integrated circuits in CMOS technologies for sensor
and communication applications.
Martin L. Schmatz (S’94–M’97) was born on May
8th, 1967, in St. Gallen, Switzerland. He received the
degree in electrical engineering in 1993 and the Ph.D.
degree in 1998, both from the Swiss Federal Institute of Technology (ETH), Zürich, for his work on
low-power wireless receiver designs and on noise-parameter measurement systems. He received the ETH
medal for his diploma work and the ETH-SEU award
for outstanding research activities in 1995.
In 1999, he joined the IBM Zürich Research Laboratory, where he established and managed a research
team working on high-speed and high-density CMOS serial-link systems. In
mid-2008, he took over management responsibilities for the Systems Department at the IBM Zürich Research Laboratory with focused research on a wide
range of server systems building blocks. He is a member of the IBM Academy
of Technology and also manages the IBM-ETH Center for Advanced Silicon
Electronics (CASE).
Heinz Jäckel (M’82) received the Ph.D. degree in
electrical engineering at the ETH Zurich in 1979.
In 1980, he joined the IBM Research Division
where he held scientific and management positions
for 13 years in IBM Rüschlikon, Switzerland, and
IBM Yorktown Heights, USA. During this time he
carried out research in superconducting Josephson
junction computers, GaAs-MESFET ICs, and optoelectronics. He has been a full Professor at the
Electronics Laboratory of the Swiss Federal Institute
of Technology, ETH Zurich, since 1993, heading
the High Speed Electronics and Photonics group (http://www.ife.ee.ethz.ch/,
http://www.photonics.ee.ethz.ch/). In electronics the research activities of his
group concentrate on development of III/V technology, design and characterization of ultrafast InP-HBT transistors for 100 Gb/s electronics, and
multi-10 GHz RF and digital 10–40 Gb/s CMOS IC design. In the area of
ultra-dense and Tb/s lightwave communication research, topics are integrated
InP-based mode-locked diode lasers, all-optical switches for all optical signal
processing at Tb/s data rates, and planar InP-based photonic crystals. Prof.
Jäckel has authored or coauthored over 100 publications, and holds around 20
patents.