5.75 to 44Gb/s Quarter Rate CDR with Data Rate Selection in 90nm Bulk CMOS
George von Bueren, Lucio Rodoni, Heinz Jaeckel
Alex Huber
Electronics Laboratory
ETH Zürich
CH-8092 Zürich, Switzerland
Institute of Microelectronics
University of Applied Sciences Northwestern Switzerland
CH-5210 Windisch, Switzerland
Roland Brun, Daniel Holzer
Martin Schmatz
Bern University of Applied Sciences
CH-3400 Burgdorf, Switzerland
IBM Zurich Research Laboratory
CH-8803 Rüschlikon, Switzerland
Abstract—This paper presents a quarter rate clock/data
recovery (CDR) circuit for plesiochronous serial I/O-links. This
2x-oversampled phase-tracking CDR, implemented in 90nm
bulk CMOS technology, covers the whole range of data rates
from 5.75 to 44Gb/s thanks to a data rate selection logic. A bit
error rate < 10⁻¹² was verified up to 38Gb/s using a 2⁷-1 PRBS
pattern. The CDR is able to track a maximum frequency
deviation of ±615ppm between incoming data and reference
clock.
Keywords: clock data recovery, quarter rate, CMOS.
I. INTRODUCTION
The aggregate data communication bandwidth of key
components in telecommunication equipment and computer
servers has shown a continuous increase in the past. This
progress has been reached by increasing the serial data rate
and by integrating more links on a single chip. In order to
achieve multi-channel integration into a CMOS logic process,
these transceivers should be low power and area efficient. One
of the most crucial and speed-limiting circuit blocks in these
link macrocells is the clock and data recovery (CDR) circuit in
the receiver. The first 40Gb/s CMOS CDR was presented
in 2003 [1]. This 40Gb/s CDR was realized in 0.18µm
CMOS, employs a quarter-rate architecture with a multiphase
VCO and passive loop filter, achieves a bit-error rate
(BER) of 10⁻⁶ and draws 144mA from a 2V
supply. In plesiochronous systems, where every
participant receives nearly the same frequency, the CDR tracking
loop with the area-consuming passive loop filter can be
replaced with a digital phase tracking loop [2]. A half-rate
25Gb/s CDR implemented in 90nm CMOS achieves a
BER < 10⁻¹², incorporates a digital first-order loop filter,
consumes 98mA from a 1.1V supply and occupies only
0.064mm², and is therefore suited for high-density
integration [3]. It has been shown with a quarter rate CDR [4]
that area and power consumption can be further reduced
thanks to two measures. First, a
phase-programmable PLL [5] allows realizing a dual loop
CDR [2] without phase rotators. Second, a static-CMOS
design style is used in most analog circuits instead of current
mode logic (CML). That 40Gb/s CDR is implemented in
65nm SOI CMOS and its area and power consumption are
0.03mm² and 72mW, respectively. The static-CMOS
design style is only possible with regulated supply voltages
[5], [6]. Compared to the static-CMOS design style, CML circuits
have better immunity to supply variations and generate less
switching noise. The 40Gb/s CDR presented in this paper
employs fully differential CML in all analog high-speed
circuits. In a 90nm CMOS technology, CML circuits are
mandatory to process a 40Gb/s data stream. Only the digital
loop filter consists of CMOS gates. We propose a data rate
selection logic that allows covering the whole range of data
rates from 5.75 to 44Gb/s. This feature makes the circuit
especially suitable for multi-standard applications, enabling new
link rates while supporting compatibility with legacy rates.
This work was supported by the Swiss Federal Office for Professional
Education and Technology, contract/grant number KTI 7995.1
II. CDR TOPOLOGY
In high-density serial I/O links, the transmitter (TX) and
receiver (RX) are clocked by two independent reference
clocks having the same nominal frequency. These reference
clocks are multiplied from a quartz crystal oscillator with a
frequency tolerance ranging from ±10 to ±100ppm. In these
plesiochronous systems the CDR has to track a slowly drifting
phase difference between the incoming data and the RX clock
caused by the small frequency offset between the TX and RX
clocks. Hence, a phase-tracking loop in the CDR is sufficient.
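To make the plesiochronous drift concrete, the following minimal sketch estimates the worst-case frequency offset between two independently clocked chips and how many bits pass before the phase has slipped by one unit interval. The function names are illustrative only, and the offset is a first-order approximation that assumes the two crystals deviate in opposite directions.

```python
def worst_case_offset_ppm(tx_tol_ppm: float, rx_tol_ppm: float) -> float:
    """First-order worst-case relative offset: both crystals off in opposite directions."""
    return tx_tol_ppm + rx_tol_ppm


def bits_per_ui_slip(offset_ppm: float) -> float:
    """Number of received bits after which the accumulated drift equals one UI."""
    return 1e6 / offset_ppm


# Example: two crystals at the loose end of the quoted tolerance range (+/-100 ppm each).
offset = worst_case_offset_ppm(100.0, 100.0)
print("worst-case offset [ppm]:", offset)
print("bits per UI of drift:   ", bits_per_ui_slip(offset))
```

With these numbers the phase moves by one UI only every few thousand bits, which is why a slow phase-tracking loop is adequate.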
Fig. 1. Architecture of the phase tracking loop. [Blocks: 40Gb/s data input, 8 samplers (D0-3, E0-3), 8:32 demux with data alignment, edge detection at 2.5GHz, digital loop filter and up/dn counter at 1.25GHz, 4 phase rotators (φ0..φ7), DLL (Ψ0..Ψ7) driven by the 10GHz reference clock; data out 4x10Gb/s.]
The architecture of our phase tracking loop is shown in
Fig. 1. It is a 2x-oversampled quarter rate CDR with the
advantage that only the first latch of the sampling flip-flop
must be able to track the data at full speed. Eight parallel
samplers acquire the four data bits (D0..3) and four edges (E0..3)
needed to evaluate the sampling position [7]. Eight parallel 1:4
demultiplexers reduce the data rate from 10 to 2.5Gb/s and
align the sampled bits, which are separated by one eighth of
the period of the reference clock signal, to one single clock
phase, generating 16 data (d0..15) and 16 edge (e0..15) bits. The
transition from differential signaling to full swing CMOS
signal levels is performed in the demultiplexers. The phase
tracking loop is implemented by a digital delay locked loop.
The digital control logic consists of an edge detection logic, a
digital loop filter and an up/down counter, which controls the
output phases (φi) of the four phase rotators. The reference
clock phases (Ψi) are generated in an analog delay locked loop
(DLL). The four 10Gb/s data bits D0..3 are buffered and fed to
output pins for testing and measurement purposes.
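The sample timing described above can be sketched as follows. This is an illustration only: it assumes that the eight clock phases are evenly spaced over one reference clock period and that even phases clock the data samplers while odd phases clock the edge samplers; the function name and the labeling are not taken from the paper.

```python
def sampling_instants_ps(ref_clock_ghz: float = 10.0, n_phases: int = 8):
    """Return (label, time in ps) for each sampler within one reference clock period."""
    period_ps = 1e3 / ref_clock_ghz        # 100 ps at 10 GHz
    spacing_ps = period_ps / n_phases      # one eighth of the period
    instants = []
    for k in range(n_phases):
        kind = "D" if k % 2 == 0 else "E"  # assumption: even phases sample data
        instants.append((f"{kind}{k // 2}", k * spacing_ps))
    return instants


for label, t_ps in sampling_instants_ps():
    print(f"{label}: {t_ps} ps")
```

At a 10GHz reference clock the samples are 12.5ps apart, so each 25ps bit of a 40Gb/s stream receives one data sample and one edge sample, which is the 2x oversampling named in the text.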
III. CIRCUIT DESIGN
A. Sampler
The first stage of the master-slave flip-flop is a shunt
inductive peaked CML latch. The bandwidth enhancement is
necessary since this latch has to track the 40Gb/s input data.
With a 0.7nH on-chip inductor a maximal bandwidth
enhancement by a factor of 1.8 [8] has been achieved. The
area of one multi-layer spiral inductor amounts to 20×20µm².
In the second latch of the master-slave flip-flop no inductive
peaking is required because this latch operates with a 10Gb/s
data stream only.
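The bandwidth gain of shunt peaking can be illustrated with the textbook model of a resistive load R in series with an inductor L, driving a capacitance C. The sketch below is not the authors' design procedure: the paper gives neither R nor C, so the result is expressed through the normalized ratio m = R²C/L, a symbol introduced here for illustration.

```python
import math


def bandwidth_extension(m: float) -> float:
    """-3 dB bandwidth of a shunt-peaked load relative to the plain RC load.

    Model (assumption): Z(s) = (R + sL) / (1 + sRC + s^2 LC), with m = R^2 C / L.
    With x = w*R*C, |Z/R|^2 = (1 + x^2/m^2) / ((1 - x^2/m)^2 + x^2).
    Setting this to 1/2 gives y^2 + (m^2 - 2m - 2) y - m^2 = 0 for y = x^2.
    """
    b = m * m - 2.0 * m - 2.0
    y = (-b + math.sqrt(b * b + 4.0 * m * m)) / 2.0  # positive root
    return math.sqrt(y)


for m in (math.sqrt(2.0), 2.0, 1e6):
    print(f"m = {m:10.3f} -> bandwidth factor {bandwidth_extension(m):.3f}")
```

In this model m = 2 gives a factor of about 1.8, and a very large m (negligible inductance) falls back to a factor of 1, the unpeaked case.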
B. Digital Control Loop with Rate Selection
Fig. 2 illustrates the block diagram of the digital control
loop. All circuit blocks are synthesized circuits and are placed
and routed with a digital design tool.
Fig. 2. Block diagram of the digital loop filter. [Blocks: 2.5GHz inputs d0-15, e0-15 and d-1 (d15 of the previous word), Alexander phase detector, majority voting, demux, FSM at 1.25GHz with up/dn outputs, counter with outputs Sx and W0-7.]
Fig. 3. Principle of the rate selection: quarter-rate (QR), half-rate (HR), and full-rate (FR) mode; • sample points, ◦ discarded samples.
The edge detector solves the Alexander equations [7] and
outputs a single early or late signal after majority voting. In
order to relax the speed requirement for the digital loop filter,
the early and late output signals of the edge detector are
demultiplexed by a factor of 2. The loop filter is realized as a
finite state machine and accumulates the incoming early and
late bits. A phase step (up or down) is triggered when the
surplus of early over late signals, or vice versa, is greater than
three. This edge detection logic can work in three operation
modes as depicted in Fig. 3. Quarter rate (QR) operation is used
for an input data rate from 23 to 44Gb/s. The early/late generation
logic generates an early/late signal for each of the 16 data/edge
bit pairs by solving the Alexander equations [7]. When
the data rate is lower and the bit period longer, between 11.5
and 23Gb/s, the CDR operates in half rate (HR) mode. The
edge samples used in the quarter rate mode are omitted and
only the data samples are evaluated. In this mode, the even
data samples take the role of the edge bits and the odd data
samples remain data bits. The early/late information is
generated from the resulting eight data/edge pairs. For a still
lower input data rate from 5.75 to 11.5Gb/s, the full rate (FR)
mode is appropriate. Here, the odd data samples
are alternately used as data and edge bits. In this
case, the early/late logic generates 4 early/late signals. Hence,
our receiver can cover the full range of data rates from 5.75 to
44Gb/s, even though the multi-phase delay locked loop (DLL),
which generates the reference clock phases Ψi, is band limited.
The DLL operates from 5.75 to 11.5GHz and limits the lower
data rate of the CDR.
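The control loop described in this subsection can be sketched in software as follows. This is a behavioral illustration, not the synthesized logic: the sign convention of the early/late decision, the reset of the accumulator after a step, the mapping of an early surplus to "up", and the choice of which odd sample comes first in FR mode are all assumptions, and every name is hypothetical.

```python
def alexander(before: int, edge: int, after: int) -> int:
    """Early/late decision for one edge sample lying between two data samples.

    Returns +1 (clock early), -1 (clock late) or 0 (no transition, no information).
    """
    if before == after:
        return 0
    return 1 if edge == before else -1


def majority_vote(decisions) -> int:
    """Collapse all decisions of one word into a single +1 / -1 / 0."""
    total = sum(decisions)
    return (total > 0) - (total < 0)


def sample_roles(mode: str):
    """Which of the 32 samples act as data and edge bits in each rate mode."""
    d = [f"d{i}" for i in range(16)]
    e = [f"e{i}" for i in range(16)]
    if mode == "QR":                      # 23 to 44 Gb/s: 16 data/edge pairs
        return d, e
    if mode == "HR":                      # 11.5 to 23 Gb/s: odd = data, even = edge
        return d[1::2], d[0::2]
    if mode == "FR":                      # 5.75 to 11.5 Gb/s: odd samples alternate
        odd = d[1::2]
        return odd[0::2], odd[1::2]       # assumption: first odd sample is data
    raise ValueError(f"unknown mode: {mode}")


class LoopFilter:
    """Accumulates votes and emits a phase step when the surplus exceeds three."""

    def __init__(self, threshold: int = 3):
        self.threshold = threshold
        self.acc = 0

    def update(self, vote: int):
        self.acc += vote
        if self.acc > self.threshold:
            self.acc = 0                  # assumption: accumulator resets after a step
            return "up"
        if self.acc < -self.threshold:
            self.acc = 0
            return "dn"
        return None


for mode in ("QR", "HR", "FR"):
    data, edge = sample_roles(mode)
    print(mode, len(data), "pairs, data:", data[:3], "edge:", edge[:3])
```

The three modes yield 16, 8 and 4 data/edge pairs, matching the counts given in the text.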
C. DLL, Phase Rotator and Clock Buffer
In order to update the sample position, we use four parallel
phase rotators, which are controlled by a thermometer coded
up/down counter. Using a full thermometer code, glitches or
discontinuities in the phase rotator characteristics can be
avoided. The four differential reference clock phases (Ψi),
which are generated by the DLL, are fed to the four phase
rotators. One phase rotator, shown in Fig. 4(a), consists of a
phase selection stage followed by a phase interpolation stage
[2]. The first stage selects two clock phases from two adjacent
phase octants. Using eight clock phases provides a better
phase linearity compared to using six phases or I/Q
interpolation schemes. The phase interpolator that blends the
two selected phases is controlled by the 8-bit thermometer
coded value W7..0. The schematics of the 4:1 multiplexer
and the interpolator are depicted in Fig. 4(b) and Fig. 4(c),
respectively. Retiming flip-flops between the up/down counter
and the phase rotator guarantee that all control signals Si,
W7..0, W7B..0B change at the same time. The common mode
outputs of the selector and the interpolator are regulated by a
replica bias, as in all CML circuits of this CDR. An important
practical requirement is that the amplitude and common mode
voltage of the sampling clock are always valid, even after start-up,
to assure the presence of the CDR system clock. This implies
that the control signals Si, W7..0, W7B..0B are initialized
correctly. As can be seen in Fig. 4(b) and Fig. 4(c), the regulated
output common mode voltages of the proposed selector and
interpolator circuits are always valid because their output
common mode voltages are independent of the digital control
signals.
Fig. 4. (a) Phase rotator, (b) phase selector, (c) phase interpolator.
Fig. 5. Chip photo and layout of the CDR.
A total number of 64 phase steps for one 100ps reference
clock period or 16 steps for one data unit interval (UI) of 25ps
are provided, resulting in a nominal timing resolution of
1.56ps. As a consequence, the maximal possible frequency
offset between TX and RX clocks that can be tracked correctly
amounts to 10⁶/(64·8·3)ppm = 615ppm. The reference clock
signals (Ψi) as well as the sample clock signals (φi) are driven
by clock buffers using inductive and capacitive peaking to
have enough driving capability and to remove any DC-offset
in the differential clock signal. The inductive shunt peaking is
used to expand the bandwidth of the buffer. With capacitive
peaking, the gain at lower frequencies (<5GHz) is decreased.
In addition, the output DC levels are regulated actively to
reduce DC-offset and duty cycle distortion of the clock signal.
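The step arithmetic of the phase rotators can be checked with a few lines. The sketch below only evaluates the quantities named in the text. The parameter names `cycles_per_update` and `updates_per_step` are one assumed reading of the factors 8 and 3 (loop filter clocked at one eighth of the reference clock, three updates per phase step); the paper does not spell this out.

```python
def timing_resolution_ps(ref_period_ps: float = 100.0, steps_per_period: int = 64) -> float:
    """Nominal size of one phase step."""
    return ref_period_ps / steps_per_period


def steps_per_ui(ui_ps: float, ref_period_ps: float = 100.0, steps_per_period: int = 64) -> float:
    """Number of phase steps that span one unit interval."""
    return ui_ps / timing_resolution_ps(ref_period_ps, steps_per_period)


def tracking_expression_ppm(steps_per_period: int = 64,
                            cycles_per_update: int = 8,   # assumed meaning of the factor 8
                            updates_per_step: int = 3) -> float:  # assumed meaning of the factor 3
    """Literal evaluation of 10^6 / (64 * 8 * 3)."""
    return 1e6 / (steps_per_period * cycles_per_update * updates_per_step)


print("step size [ps]:   ", timing_resolution_ps())
print("steps per 25ps UI:", steps_per_ui(25.0))
print("expression [ppm]: ", round(tracking_expression_ppm(), 1))
```

The step size comes out as 1.5625ps, i.e. the 1.56ps quoted above, with 16 steps per 25ps UI. Evaluated literally, the expression 10⁶/(64·8·3) gives roughly 651ppm, whereas the figure quoted throughout the paper is 615ppm; the sketch reports the evaluated expression and leaves the quoted figure as stated.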
IV. MEASUREMENT RESULTS
Our CDR circuit is fabricated in a 90nm bulk CMOS
technology and consumes 230mA from a 1V power supply
(analog supply 215mA, digital supply 15mA). All inputs and
outputs are ESD protected except the differential 40Gb/s data
inputs. The layout of the core circuit, which occupies
570×350µm² (=0.2mm²), and the die micrograph of the CDR
circuit are shown in Fig. 5. The CDR is able to lock to a PRBS
data stream at up to 44Gb/s when the input signal is applied to
the chip using on-wafer probes. The 40Gb/s input eye diagram
with a 10GHz sinusoidal clock signal is illustrated in Fig. 6(a).
The recovered data at 10Gb/s is shown in Fig. 6(b). The
operating ranges for full-, half- and quarter-rate modes are
5.75 to 11.5Gb/s, 11.5 to 23Gb/s and 23 to 44Gb/s,
respectively. In all operating ranges, the maximum frequency
offset that can be tracked is ±615ppm for a BER of <10⁻¹² up
to 38Gb/s. The limit was set by the measurement setup
because the input pattern was not error free above 38Gb/s. The
value of ±615ppm is sufficient to compensate for the mismatch
between the clock frequencies of two chips clocked from different
crystal oscillators. Besides the frequency offset that can be
tracked, the jitter tolerance is the second key parameter for
CDRs employed in chip-to-chip communication. The jitter
tolerance measurements have been performed in a packaged
module (Fig. 7). Fig. 7 also shows the eye diagram of the
24Gb/s input data before (left eye diagram) and after (right
eye diagram) a trace of 1.6cm length on the substrate. The
jitter tolerance plot at 24Gb/s of the packaged CDR and the
extended jitter tolerance mask for XAUI [9] are illustrated in
Fig. 8. For all jitter frequencies and all jitter amplitudes, the
XAUI mask can be fulfilled by our circuit.
Fig. 6. (a) 40Gb/s input data, 10GHz sinusoidal clock signal (time scale: 10ps/div, amplitude scale: 50mV/div). (b) Recovered 10Gb/s data (time scale: 20ps/div, amplitude scale: 50mV/div).
Fig. 7. Eye diagram of a 24Gb/s data stream at the input of the package (left eye diagram) and at the pad of the circuit (right eye diagram).
TABLE I shows a comparison with previously published
40Gb/s CMOS CDRs with analog [1], [10] or digital loop
filters [4], [11]. Fully analog CDRs are area consuming and
dissipate less power but have a larger BER (>10⁻¹²) compared
to [4], [11]. Among the three CDRs with a digital loop filter
our CDR covers the largest range of data rates. Furthermore, it
consumes less power and has a smaller chip area than the
3x-oversampling CDR [11]. Only the circuit in [4] reaches
superior performance with respect to power and area, but it uses
a more advanced transistor technology that allows implementing
the speed-critical circuit blocks in CMOS logic
instead of the more power- and area-consuming CML.
TABLE I. 40Gb/s CMOS CDRs
                   [1]       [4]       [10]      [11]      This
Data rate [Gb/s]   40        27-40     40        40-44     5.75-44
Tbit/Tclock        1/4       1/4       1/2       1/4       1/4
Demux data         1:4       1:8       1:2       1:16      1:16
Loop filter        passive   digital   passive   digital   digital
Supply [V]         2         1         1.2       1.4       1
Power [mW]         144       72        48        900       230
Area [mm²]         0.64(a)   0.03      0.42      1.44      0.2
Gb/s/mW            0.28      0.56      0.83      0.048     0.174
Tb/s/mm²           0.06      1.33      0.09      0.03      0.2
BER                10⁻⁶      <10⁻¹²    10⁻⁹      <10⁻¹²    <10⁻¹²
CMOS               0.18µm    65nm(b)   90nm      90nm      90nm
(a) Estimated. (b) Silicon on insulator technology (SOI).
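The two efficiency rows of Table I are simple ratios and can be recomputed from the rate, power and area rows. The sketch below is illustrative: it assumes 40Gb/s as the data rate for both rows shown, and the helper names are not from the paper.

```python
def gbps_per_mw(rate_gbps: float, power_mw: float) -> float:
    """Power efficiency: data rate divided by power."""
    return rate_gbps / power_mw


def tbps_per_mm2(rate_gbps: float, area_mm2: float) -> float:
    """Area efficiency: data rate in Tb/s divided by area."""
    return rate_gbps / 1000.0 / area_mm2


# (data rate [Gb/s], power [mW], area [mm^2]); the 40 Gb/s rate is an assumption.
rows = {
    "[4]": (40.0, 72.0, 0.03),
    "This": (40.0, 230.0, 0.2),
}
for name, (rate, power, area) in rows.items():
    print(name,
          round(gbps_per_mw(rate, power), 3), "Gb/s/mW,",
          round(tbps_per_mm2(rate, area), 2), "Tb/s/mm^2")
```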
Fig. 8. Jitter tolerance of the packaged CDR at 24Gb/s achieving a BER<10⁻¹².
V. CONCLUSION
A clock-data-recovery circuit implemented in 90nm bulk
CMOS for 40Gb/s chip-to-chip communication is presented.
Thanks to the novel rate selection feature in the fully digital
loop filter, a very large data rate range from 5.75 to 44Gb/s can
be covered. From 5.75 to 38Gb/s a BER <10⁻¹² is achieved
even for a frequency offset of ±615ppm and data jitter
amplitudes above the XAUI mask.
ACKNOWLEDGMENT
The authors thank T. Toifl, C. Menolfi, T. Morf, C.
Kromer, M. Kossel, J. Weiss for fruitful discussions, M. Lanz
and M. Witzig for bonding and the IBM foundry team for
manufacturing the CMOS chips.
REFERENCES
[1] J. Lee and B. Razavi, “A 40-Gb/s Clock and Data Recovery Circuit in
0.18-µm CMOS Technology,” IEEE J. Solid-State Circuits,
vol. 38, pp. 2181–2190, Dec. 2003.
[2] S. Sidiropoulos and M. Horowitz, “A Semi-Digital Dual Delay-Locked
Loop,” IEEE J. Solid-State Circuits, vol. 32, no. 11, pp. 1683–1692, Nov. 1997.
[3] C. Kromer, G. Sialm, C. Menolfi, M. Schmatz, F. Ellinger, and H. Jäckel,
“A 25-Gb/s CDR in 90-nm CMOS for High-Density Interconnects,”
IEEE J. Solid-State Circuits, vol. 41, no. 12, pp. 2921–2929, Dec. 2006.
[4] T. Toifl, C. Menolfi, P. Buchmann, C. Hagleitner, M. Kossel, T. Morf,
J. Weiss, and M. Schmatz, “A 72mW 0.03mm² Inductorless 40 Gb/s
CDR in 65 nm SOI CMOS,” ISSCC Dig. Technical Papers, pp. 226–
227, 11–15 Feb. 2007.
[5] T. Toifl, C. Menolfi, P. Buchmann, et al., “0.94ps-rms-Jitter 0.016mm²
2.5GHz Multi-Phase Generator PLL with 360° Digitally Programmable
Phase Shift for 10Gb/s Serial Links,” IEEE J. Solid-State Circuits, vol.
40, no. 12, pp. 2700–2712, Dec. 2005.
[6] E. Alon, J. Kim, S. Pamarti, K. Chang, and M. Horowitz, “Replica
compensated linear regulators for supply-regulated phase-locked
loops,” IEEE J. Solid-State Circuits, vol. 41, pp. 413–424, Feb. 2006.
[7] J. D. H. Alexander, “Clock Recovery from Random Binary Data,”
Electronics Letters, vol. 11, pp. 541–542, 1975.
[8] S. S. Mohan, M. Hershenson, S. P. Boyd, and T. H. Lee, “Bandwidth
Extension in CMOS with Optimized On-Chip Inductors,” IEEE J.
Solid-State Circuits, vol. 35, no. 3, pp. 346–355, March 2000.
[9] IEEE Std. 802.3ae-2002, Media Access Control (MAC) Parameters,
Physical Layers, and Management Parameters for 10 Gbps Operation.
[10] C. F. Liao and S. I. Liu, “40 Gb/s Transimpedance-AGC Amplifier and
CDR Circuit for Broadband Data Receivers in 90 nm CMOS,” IEEE
J. Solid-State Circuits, vol. 43, no. 3, pp. 642–655, March 2008.
[11] N. Nedovic, N. Tzartzanis, H. Tamura, H. Rotella, M. Wiklund, Y.
Mizutani, Y. Okaniwa, T. Kuroda, J. Ogawa, and W. Walker, “40-to-44
Gb/s 3× Oversampling CMOS CDR, 1:16 DEMUX,” in IEEE ISSCC
Dig. Technical Papers, pp. 224–225, 11–15 Feb. 2007.