IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 40, NO. 9, SEPTEMBER 2005
A 4.8–6.4-Gb/s Serial Link for Backplane
Applications Using Decision
Feedback Equalization
Vishnu Balan, Member, IEEE, Joe Caroselli, Member, IEEE, Jenn-Gang Chern, Member, IEEE,
Catherine Chow, Member, IEEE, Ratnakar Dadi, Member, IEEE, Chintan Desai, Member, IEEE,
Leo Fang, Member, IEEE, David Hsu, Member, IEEE, Pankaj Joshi, Member, IEEE, Hiroshi Kimura, Member, IEEE,
Cathy Ye Liu, Member, IEEE, Tzu-Wang Pan, Member, IEEE, Ryan Park, Member, IEEE, Cindy You, Member, IEEE,
Yi Zeng, Member, IEEE, Eric Zhang, Member, IEEE, and Freeman Zhong, Member, IEEE
Abstract—In this paper, a serial link design that is capable of
4.8–6.4-Gb/s binary NRZ signaling across 40 in of FR4 copper backplane traces and two connectors is described. The transmitter features a programmable two-tap feedforward equalizer, and the receiver uses an adaptive four-tap decision feedback equalizer to
compensate for the losses in the channel at 6.4 Gb/s. The transceiver core is built in LSI’s 0.13-µm standard CMOS technology
to be integrated into ASIC designs that require serial links. The
transceiver consumes 310 mW per duplex channel at 1.2 V and
6.4 Gb/s under nominal conditions.
Index Terms—Adaptive equalization, backplane transceiver, decision feedback equalization (DFE), SerDes, serial link.
I. INTRODUCTION
THE speed of serial links across copper backplanes has seen
a steady rise over the past few years. Backplane serial
links need to be able to handle increased channel losses at these
higher speeds while still being capable of supporting legacy
backplanes that were originally designed for 1–3-Gb/s operation. Advanced equalization techniques are required to remove
intersymbol interference (ISI) due to loss mechanisms in copper
traces drawn on PCBs. The loss mechanisms include those due
to skin effect, dielectric loss, and reflections from impedance
discontinuities. Equalization techniques that provide high frequency boost to compensate for channel losses also boost noise
or crosstalk, which degrades overall performance. Traditional
decision feedback equalization (DFE) architectures have both
feedback and feedforward equalizers in the RX. In order to ease
design and maintain backward compatibility, the feedforward
equalizer is moved to the TX. The transceiver described here has
both transmit (TX) equalization in the form of programmable
de-emphasis filter (FF) and receive (RX) equalization in the
form of DFE to compensate for channel losses. DFE uses clean
decisions of previously received symbols to remove ISI in the
current symbol. Since it does not boost high-frequency noise
such as crosstalk or wideband noise to equalize the channel, this
technique can be suitable for backplane environments with high
channel count. DFE is vulnerable to error propagation because
an error made during a decision will influence future decisions
through the feedback equalizer. However, the target bit error rate
(BER) in backplane applications is already very low,
and the degradation due to error propagation is acceptable in
most cases.
Manuscript received December 7, 2004; revised February 18, 2005.
V. Balan, J. Caroselli, C. Chow, C. Desai, L. Fang, D. Hsu, P. Joshi, C. Y. Liu,
T.-W. Pan, R. Park, Y. Zeng, E. Zhang, and F. Zhong are with the Communications & ASIC Technology Department, LSI Logic Corporation, Milpitas, CA
95035 USA (e-mail: vbalan@teranetics.com).
J.-G. Chern, R. Dadi, H. Kimura, and C. You are with Link-A-Media Inc.,
Santa Clara, CA 95051-0951 USA.
Digital Object Identifier 10.1109/JSSC.2005.848180
The transceiver consists of three building blocks, namely the
TX, RX, and PLL. The purpose of the PLL block is mainly to
serve as a clock multiplier unit (CMU) and to generate multiphase clocks that are nominally at the 4T clock rate, where T is the bit time.
The PLL provides a fixed phase to the RX and TX. The TX serializes the data using this clock, while the RX uses the PLL clock
as an initial guess for the incoming data phase and frequency.
The exact phase and frequency at the RX is recovered from the
data by a digital clock and data recovery loop architecture.
Each PLL block is capable of driving up to four full duplex channels. The multiphase clock is distributed through a
low skew clock tree to each channel. Since the PLL block is
a common block for both TX and RX, it is also an ideal place
for placing shared bias circuits, common circuitry for TX phase
calibration, and at-speed loop back and BiST circuits.
II. ARCHITECTURE
Fig. 1 shows the block diagram of the architecture of the serial data link. It consists of a TX data serializer that uses the
multiphase clocks from the CMU PLL to serialize the data. The
clock phases are calibrated to adjust for phase mismatches before being used by the serializer. The serialized data is then
passed through the programmable TX filter before being driven
to the output pads. The target jitter at the output of the TX
is 0.3 UI, including random and deterministic components, measured
at a specified BER level. After traversing the channel, the
eye at the input of the receiver can be completely closed due
to ISI. At the receiver, the ISI in the data is first cancelled by
the DFE filter and then sliced by the comparators of a 2-b ADC.
The outputs of the ADC (Dk, Ek) are used to drive both the CDR
loop and the digital DFE coefficient adaptation loop as shown
in Fig. 1. The CDR is a dual-loop architecture consisting of the
CMU PLL (loop1), which generates multiphase clocks that are
used at each RX by the phase interpolators to produce the recovered clock (loop2). The interpolator is driven by a digital CDR
loop that will be described later.
Fig. 1. Block diagram of a serial link with programmable TX de-emphasis and adaptive RX equalizer.
Fig. 2. Floor plan of a four-channel full duplex system, showing the PLL,
PLL2, 4TX, 4RX, mini-RX, mini-TX, and the clock tree.
A modular approach is adopted to facilitate scalable integration of these blocks into ASIC designs. Fig. 2 shows the floor
plan of a full duplex four-channel subsystem. The design consists of a TX core, RX core, and PLL core, which can then be
used as single channels or assembled into subsystems (such as a
four- or eight-channel duplex system driven by one PLL block)
based on application requirements. The PLL generates four differential phases of 4T clocks, with one T separation between the
clocks (where T is the bit-time at full-rate). Both the TX and RX
use these 4T, four-phase clocks to serialize and deserialize the data.
The number of FF taps to be used is selected based on the
number of significant precursor ISI samples in the pulse response. The measured pulse response for one of the target backplanes is shown in Fig. 4(a). It shows only one significant sample
of precursor ISI, implying that a two-tap FIR should be sufficient for the feedforward equalizer. Fig. 4(b) shows a plot of
SNR at the slicer input for the target backplane versus the
number of taps employed. The number of feedback taps is shown on the x-axis, and each line represents a different number of
taps in the FF equalizer at the TX. As was expected, a significant
improvement occurs when going from one tap to two in the FF
equalizer, but there is virtually no benefit beyond that. Similarly,
while there is significant improvement in SNR as the number of
feedback taps increases, there is diminishing return beyond four
taps. Thus, a compromise between performance and complexity
is made by choosing a two-tap FF de-emphasis filter at the TX
and a four-tap feedback filter at the RX.
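The diminishing return from additional feedback taps can be illustrated with a small sketch. The pulse-response samples below are made-up values, not the measured response of Fig. 4(a), and the worst-case eye opening used here is a simpler figure of merit than the SNR plotted in Fig. 4(b). The helper name `eye_opening` is hypothetical, the DFE is assumed ideal, and the precursor is deliberately left uncancelled (in the design it is handled by the TX filter).

```python
def eye_opening(pulse, cursor, n_dfe_taps):
    """Worst-case vertical eye opening after an ideal DFE.

    pulse      : baud-spaced pulse-response samples (hypothetical values)
    cursor     : index of the main cursor sample in `pulse`
    n_dfe_taps : number of postcursor samples the ideal DFE cancels
    """
    precursor = pulse[:cursor]
    postcursor = pulse[cursor + 1:]
    # Everything the DFE does not cancel remains as residual ISI.
    residual = precursor + postcursor[n_dfe_taps:]
    return pulse[cursor] - sum(abs(s) for s in residual)


# Hypothetical pulse response: one precursor, main cursor, six postcursors.
PULSE = [0.05, 0.40, 0.25, 0.15, 0.09, 0.05, 0.02, 0.01]
CURSOR = 1

# Eye opening for 0..6 feedback taps, and the gain from each added tap.
openings = [eye_opening(PULSE, CURSOR, n) for n in range(7)]
gains = [b - a for a, b in zip(openings, openings[1:])]

for n, opening in enumerate(openings):
    print(f"{n} feedback taps: worst-case eye opening {opening:+.2f}")
```

With these assumed samples the eye is closed without feedback taps, opens after the first few taps, and gains progressively less from each further tap, which is the qualitative behavior the tap-count compromise relies on.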
Since the TX is far away from the RX and a back channel is
not guaranteed to be available, the TX filter coefficients cannot
be adapted and are only programmable through register settings.
The filter can be register programmed to act as either a precursor
type (Type I) or a postcursor type (Type II) of de-emphasis. Each type
is a two-term combination of the present data bit (denoted “1”) and the
delayed, or previous, data bit in the serialized bit stream.
Type-II de-emphasis is the traditional method where the signal
amplitude overshoots just after a transition, and then reduces for
longer run lengths. Type-I de-emphasis differs from Type-II in
that the signal undershoots just before the data transition and
then reduces for longer run lengths. Type-I de-emphasis effectively removes precursor ISI, while Type-II removes postcursor
ISI. Since DFE at the RX can only cancel postcursor ISI, a
Type-I FF filter would be more suitable to cancel precursor ISI
at the TX [as an example, Fig. 4(a) shows a pulse response before and after this type of equalization is performed]. For legacy
receivers that operate on open eyes, or those that do not have RX
equalization, a Type-II filter at the TX would be more suitable to
cancel postcursor ISI, as that dominates the loss mechanism in
typical systems. The RX has a four-tap DFE filter with the filter
coefficients being set adaptively by a sign–sign LMS algorithm
[1]. The RX also features a second-order digital CDR loop that
acquires the phase and frequency (up to ±200 ppm offset) of
the incoming data.
Fig. 3. Block diagram showing the TX core along with the mini-RX block that
is used during loopback BiST and TX calibration. The mini-RX is clocked by
the PLL or PLL2 during different modes.
The TX uses multiphase clocks to serialize data, and, as a result, any mismatch between the clock phase spacing will translate into periodic deterministic jitter at the TX output. In order
to minimize the impact of this, the TX has phase calibration
logic, which can detect and correct the phase mismatch of the
clocks. The clock phase calibration at the TX is performed once
during power-up and the phase mismatches correction values
are stored digitally. During calibration, the TX serializes fixed
4T patterns, namely 0011, 0110, 0101 (and their complements)
in a specific sequence. Fig. 3 shows a block diagram of the
shared data path between loopback and calibration modes. The
output of TX is looped back to the PLL block, which has a simplified RX called mini-RX to detect the data. This uses the same
path as the TX loopback BiST that is part of the test strategy for
the TX core. During calibration, the mini-RX is clocked with
a scan clock that has a small fixed offset frequency (approximately 2000 ppm) from the TX serializer. The scan clock is generated by another PLL (called PLL2) locked to the same crystal
as the CMU. The small offset frequency causes the scan clock
to “walk” across the serialized data stream from the TX. The
output from the mini-RX is digitally averaged to detect the duty
cycle of the pattern. If all serializing phases are ideal, the duty
cycle for every pattern will be exactly 50%. Depending on the
actual duty cycle of each pattern sent from the TX, the phase
mismatch can be detected. Each of the multiphase TX clocks
passes through a 5-b phase interpolator such that each clock can
be independently adjusted to an accuracy of T/64. The phase interpolators are driven by the digital calibration logic according
to the algorithm described below.
Fig. 4. (a) Pulse response measured at the far end near the RX, for a target
backplane for this design. (b) SNR at the slicer input for different numbers of
taps in the feed-forward and feedback equalizers.
In a four-phase system, we use one of the phases as reference and make three independent adjustments to get perfectly
aligned phases. The duty cycle is sensitive to different clock
phases’ mismatches depending on the data being serialized. For
example, assume that the first clock phase is used to serialize the first
data bit of the 4T pattern, followed by the second, third, and fourth phases,
in that order, for each of the remaining 3 b. While the pattern
“0011” is serialized at the TX, the duty cycle is only a function of the clock phase mismatch between the first and third phases.
Similarly, the “0110” pattern has sensitivity to mismatch between
the second and fourth phases, while the “0101” pattern has
sensitivity to all phases. The calibration is done in three sequential steps.
The two phases of one such pair are first adjusted to be 180° apart,
and then the two phases of the other pair are adjusted to be 180° apart. Assuming
that the previous steps ensured 180° within each pair,
the remaining error during the “0101” pattern is
used to correct for mismatch between the two pairs.
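The three-step phase calibration can be sketched with a simplified timing model. The model below is an assumption made for illustration, not the circuit's actual behavior: each bit boundary is taken to be set by exactly one clock phase, phases are numbered 0 to 3, and the corrections are applied in closed form rather than in interpolator steps. The names `duty_cycle` and `calibrate` are hypothetical.

```python
T = 1.0  # bit time, arbitrary units


def duty_cycle(pattern, phase_err):
    """Duty cycle of a repeating 4-bit pattern such as "0011".

    Simplified model: bit k starts at k*T + phase_err[k], so each bit
    boundary is set by exactly one of the four clock phases.
    """
    edges = [k * T + phase_err[k] for k in range(4)]
    edges.append(4 * T + phase_err[0])  # wrap-around into the next word
    high = sum(edges[k + 1] - edges[k] for k in range(4) if pattern[k] == "1")
    return high / (4 * T)


def calibrate(phase_err):
    """Three sequential corrections with phase 0 as the fixed reference."""
    e = list(phase_err)
    # Step 1: "0011" depends only on phases 0 and 2, so move phase 2.
    e[2] += (duty_cycle("0011", e) - 0.5) * 4 * T
    # Step 2: "0110" depends only on phases 1 and 3, so move phase 3.
    e[3] -= (duty_cycle("0110", e) - 0.5) * 4 * T
    # Step 3: "0101" now measures the offset between the two pairs,
    # so move phases 1 and 3 together.
    shift = (duty_cycle("0101", e) - 0.5) * 2 * T
    e[1] += shift
    e[3] += shift
    return e


raw_errors = [0.0, 0.03, -0.02, 0.05]   # hypothetical phase errors, in units of T
print(duty_cycle("0011", raw_errors))   # deviates from 0.5 before calibration
print(calibrate(raw_errors))            # all phases aligned to phase 0
```

In this model each pattern's duty-cycle deviation is a linear function of the phase errors it is sensitive to, which is why three measurements are enough to align three phases to the reference.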
Fig. 5. Block diagrams of the 4:1 serializer, predriver, and pad driver in the TX. The inset also shows the timing diagram for the 4:1 serializer.
In order to maximize compatibility with most designs, the
RX is ac-coupled on chip after the 50-Ω termination resistors.
This gives flexibility to tolerate any common-mode voltage
at the input of the RX within process limits. It also gives the
freedom to set an optimal common mode at the RX input stage
for best performance independent of line conditions. The data
latches at the RX have offset calibration to remove the input-referred offset of the latch and thereby improve the sensitivity
of the latch. The offset calibration at the RX is done once during
power-up and the offset correction values are stored digitally.
The input of the data latches consists of a double differential
pair, each being driven by data and an offset DAC, respectively.
The signed 3-b offset DAC generates offset voltages in the range
of ±30 mV, which can be used to cancel the input-referred
offset of the latch. During calibration, the RX inputs just after
the ac coupling capacitors are shorted to the common-mode
voltage so that external conditions at the RX input pins do not
affect the calibration process. Then the DAC codes are swept
from −7 to +7 while the data are observed at the digital portion
of the chip. When a data transition from −1 to +1 occurs, the
DAC code represents the input offset of the latch. The values
are digitally stored and used during normal operation.
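The offset-calibration sweep can be sketched as follows. This is a minimal behavioral model under stated assumptions: the latch is an ideal comparator, the ±30-mV DAC range is assumed to be spread evenly over the codes −7 to +7, and the function names and `LSB_MV` are hypothetical.

```python
LSB_MV = 30.0 / 7  # assumed: +/-30 mV spread evenly over codes -7..+7


def latch_decision(input_mv, latch_offset_mv, dac_code):
    """Idealized latch: +1 if the net differential input is positive, else -1."""
    net = input_mv + latch_offset_mv + dac_code * LSB_MV
    return 1 if net > 0 else -1


def calibrate_offset(latch_offset_mv):
    """Sweep the DAC code from -7 to +7 with the inputs shorted (0 mV).

    Returns the first code at which the decision flips from -1 to +1,
    or None if no such transition is seen (offset outside the DAC range).
    """
    previous = None
    for code in range(-7, 8):
        decision = latch_decision(0.0, latch_offset_mv, code)
        if previous == -1 and decision == 1:
            return code
        previous = decision
    return None


code = calibrate_offset(-10.0)            # hypothetical -10 mV latch offset
print(code, -10.0 + code * LSB_MV)        # stored code and residual offset in mV
```

In this model the residual offset after calibration is always smaller than one DAC step, which is the resolution limit of the sweep.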
III. TRANSMITTER DESIGN
It is desired to perform as much of the equalization as possible
at the RX, since the feedback equalizer is adaptive and does not
enhance high-frequency crosstalk. The feedback equalizer is
capable of removing the postcursor ISI, but not the precursor
ISI. Consequently, the FF equalizer in the TX will be used solely
to mitigate the precursor ISI.
Fig. 5 shows the circuit detail of the TX. It consists of a data
rotator which aligns data for optimal timing at the serializer, two
4:1 serializers (one each for data and 1T delayed data), predriver,
and a CML 50-Ω driver. The on-chip 50-Ω termination is capable of being automatically trimmed at power-up to track an
external 3-kΩ resistor and thereby remove any process variation of the polysilicon resistors. The 4-b parallel data from the
digital side comes in at 4T clock rate. The data rotator takes the
4-b parallel data and shifts each data bit by integer multiples
of T, to be aligned with the serializing clock edges as shown
in the timing diagram of Fig. 5. The data rotator uses the four-phase,
1T-spaced clocks to achieve this. As shown in the timing diagram of Fig. 5, the 4:1 mux forms a 1T-wide pulse by combining
the rising edge and falling edge of two adjacent phases. The data
rotator realigns the 4T parallel data such that the 1T-wide sampling pulse for each bit has a 2T setup time and a 1T hold time.
The serialization circuit consists of four pulsing circuits, with
each one active for a period of 1T, that sum currents at a resistor (Fig. 5). Each leg ANDs the data with two adjacent clock
phases to produce a 1T data pulse that in turn drives the gate
of the NMOS current source. A data of “1” will pull a current
through the resistor while a “0” produces no current. This, when
combined with the other complementary data side, produces a
pseudodifferential CML signal to drive the predriver stage. Two
4:1 serializers are used to create two data streams that are delayed by 1T with respect to each other. The two data streams
drive a segmented pre-driver and driver stages to produce the TX
output at the pads. Depending on the de-emphasis setting desired, each predriver segment is switched to choose either the data or the delayed data. For example, Fig. 5 shows four segments of the predriver
and driver, each with its own weight.
If segments 2 and 3 are chosen to select delayed data
while segments 1 and 4 are chosen to select the data, then the output
of the TX is a de-emphasis waveform in which the present data is weighted
by segments 1 and 4 and the delayed data by segments 2 and 3.
Type-II or Type-I de-emphasis can be achieved by simply swapping the data streams at
the output of the rotator; in this example, the roles of the present and delayed
data in the waveform would then be exchanged. By digitally choosing various combinations of the segment weights,
as above, eight different de-emphasis settings in the range of
0%–36% are made possible.
Fig. 6. RX uses a four-phase architecture with each slice operating at 4T clock rate. The blow-up of each slice shows the precalculate operation in the RX.
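The difference between the two TX de-emphasis types can be sketched with a two-tap filter on a ±1 NRZ stream. The tap weighting below is an assumption for illustration: `alpha` is a hypothetical parameter and is not the de-emphasis percentage quoted in the text, whose exact definition is not restated here. Type II is modeled with the main tap on the present data, and Type I with the two data streams swapped, as described for the rotator output.

```python
def de_emphasis(bits, alpha, kind):
    """Two-tap de-emphasis of an NRZ stream of +1/-1 symbols.

    alpha : fraction of the drive assigned to the secondary tap (assumed)
    kind  : "II" -> main tap on present data, secondary tap on delayed data
            "I"  -> the two data streams swapped
    """
    out = []
    prev = bits[0]  # assume the line idled at the first symbol's value
    for cur in bits:
        main, secondary = (cur, prev) if kind == "II" else (prev, cur)
        out.append((1 - alpha) * main - alpha * secondary)
        prev = cur
    return out


bits = [1, 1, 1, 1, -1, -1, -1, -1]
print(de_emphasis(bits, 0.125, "II"))  # full swing on the first bit after the transition
print(de_emphasis(bits, 0.125, "I"))   # full swing on the last bit before the transition
```

Note that the Type-I output is delayed by one bit relative to the input, so its full-swing sample corresponds to the last bit before the transition. In both cases long runs settle to the reduced amplitude.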
IV. RECEIVER DESIGN
As the data slicer continues to make decisions every bit period, T, the DFE filter processes the information to cancel ISI
from the present incoming symbol. In order to be able to cancel
the ISI of the most recent bit from the present bit, the following
timing constraint has to be met (Fig. 1):
$$t_{cq} + t_{dfe} + t_{su} < T \qquad (1)$$
where $t_{cq}$ is the clock-to-data delay of the slicer, $t_{dfe}$ is the delay of the DFE filter, and $t_{su}$ is the
setup time of the slicer. Meeting this constraint for operation
at 6.4 Gb/s ($T \approx 156$ ps) in 0.13-µm CMOS technology
may not be feasible or may require high power dissipation. The
biggest bottleneck in meeting the timing constraint without a
precalculate feature is the large $t_{cq}$. A major portion of the
budget in (1) is used by $t_{cq}$, which includes the time to
amplify a small signal to CMOS levels. In order to ease the
constraint, a precalculate architecture [6] is adopted as shown
in Fig. 6. The ISI due to every possible value of the
previous bit (±1 in the case of binary NRZ) is first calculated
in parallel and the correct choice is made once the previous
data are known. This solves the critical path involving the first
tap, and the new critical path is pushed to the second tap of
the DFE. The new critical path now consists of the data signal
passing through the first slice and reaching the latch input of the
third slice within 2T. The speed of performing other operations
such as multiplexing or retiming large signals is much faster
than the $t_{cq}$ delay of the slicer. Even though more steps are
taking place in the new architecture within a time of 2T, the
additional steps are fast, leaving sufficient time for the slowest
step to complete with margin.
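The precalculate idea can be sketched behaviorally: both candidate decisions for the most recent bit are sliced in parallel, and a multiplexer selects one once that bit is known. The sketch below is a software model only (it says nothing about circuit timing), the function names are hypothetical, and the initial decision history is an assumption. It shows that the speculative form produces the same decisions as a direct DFE.

```python
def dfe_direct(samples, taps):
    """Reference DFE: subtract all feedback terms, then slice."""
    history = [1] * len(taps)            # assumed initial decisions
    decisions = []
    for y in samples:
        z = y - sum(c * d for c, d in zip(taps, history))
        d = 1 if z >= 0 else -1
        decisions.append(d)
        history = [d] + history[:-1]
    return decisions


def dfe_precalculate(samples, taps):
    """Same decisions, with the first tap resolved speculatively."""
    history = [1] * len(taps)            # assumed initial decisions
    decisions = []
    for y in samples:
        # Taps 2..N use older decisions, which are available early.
        partial = y - sum(c * d for c, d in zip(taps[1:], history[1:]))
        # Slice both candidates for the most recent bit in parallel.
        if_prev_plus = 1 if partial - taps[0] >= 0 else -1
        if_prev_minus = 1 if partial + taps[0] >= 0 else -1
        # A multiplexer picks the right candidate once that bit is known.
        d = if_prev_plus if history[0] == 1 else if_prev_minus
        decisions.append(d)
        history = [d] + history[:-1]
    return decisions
```

A usage example would feed both functions the same received samples and compare the outputs; the selection step replaces the slow slice-then-subtract feedback path for the first tap with a multiplexer.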
The RX uses a four-phase design where each slice operates
at a 4T rate. Unlike previous CDR architectures [3], [4] that
use 2X or 3X oversampling, a baud-rate sampling scheme is
used to capture the data (D) as well as generate an error signal
(E) from the center of the data eye for driving the timing, gain,
and coefficient adaptation loops [1]. Phase update in traditional
high-speed serial link design is performed by a “bang–bang”
timing loop [3]. This requires oversampling the received signal
by a factor of two relative to the transmit frequency. Typically,
two samples are taken, one at the center of the data eye and
one at the edge of the eye. In a DFE system, since a feedback
filter is equalizing the received signal between data samples, the
signal in front of the edge sampler may not be settled within
the T/2 time. This may cause a data-dependent timing offset
and result in suboptimal performance. In this design, in order
to avoid the cost of oversampling, phase update is driven by
baud-rate spaced samples. The phase update equation
(2)
has been shown to update to the center of a symmetric pulse
response [5]. This implementation uses a sign–sign version of
(2) and performs a majority vote every 4T on four consecutive
outputs of the update equation so that
(3)
A second-order timing loop was implemented to accommodate frequency offsets of up to ±200 ppm without severely
impacting jitter tolerance performance [7], [8]. The frequency
update equation is given by
(4)
The sampling position is updated according to the following
equation each 4T:
(5)
where one loop gain controls the bandwidth of the phase update and a second loop gain
determines the bandwidth of the frequency-offset update. These
are implemented in digital logic, which then drives a phase interpolator that has a T/64 resolution to obtain the recovered clock.
The interpolator uses the multiphase clocks from the CMU to
perform phase interpolation.
Fig. 7. RX front-end circuit with ISI cancellation. The source-follower circuit follows the input signal and each leg of the current cancels ISI based on previously
received data.
In a typical DFE system, an adaptive FF filter will control
the gain of the incoming signal such that the amplitude seen by
the ADC is constant (AGC action). Since the FF filter is now
placed at the TX and is not adapted, a “gain loop” adjusts the
size of the LSB in the 2-b data slicer. As the incoming amplitude
varies from channel to channel or due to TX amplitude settings,
the amplitude target internal to the RX changes until it reaches steady
state. As shown in Fig. 6, this target is the comparator threshold for the
2-b ADC. A large incoming amplitude will cause the target to be adapted
to a higher value, and vice versa. This can be represented by the
following equation:
(6)
whose terms are the kth sample of the incoming signal, the four DFE filter coefficients, the detected
data, and the amplitude target (same as the slicer threshold
of the 2-b ADC shown in Fig. 6). The coefficient adaptation loop
(which adjusts the DFE coefficients) and the gain loop (which adjusts the amplitude target) work together
to satisfy (6) at every sample. Fig. 7 shows the circuit detail
of how the ISI from four previous bits is subtracted from the
incoming data signal to determine the present bit. It consists of
a source follower with several legs of current sources controlled
by previously recovered data and the gate of the source follower
being driven by the input signal. The bias current in each leg is
controlled by the adaptation loops and determines the magnitude of
the corresponding coefficient: the magnitude for each tap is set by
the current in that leg being drawn from the output impedance
of the source follower. The source follower is operated in
the “small signal” regime by ensuring that even when maximum
ISI is cancelled, the follower circuit is still in the “linear” range.
Further, the amplitude of the incoming data signal at the RX
is restricted in range to keep the source follower in the linear
region. In the short channel case, this is done by reducing the
launch amplitude at the TX, while in the long channel case the
channel losses are sufficient to attenuate the signal. The target
voltage is nominally set at 200 mV, with the exact value
being set adaptively in the range of 100–300 mV depending
on the actual incoming voltage amplitude. If the input amplitude
is such that the target needs to be set outside of this range, then the
performance will degrade due to suboptimal operation of the
loops. In the extreme case, when the amplitude is too large or
too small, the receiver will fail.
Fig. 8. PLL schematic showing VCO delay cell and self-bias loop.
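The joint adaptation of the feedback coefficients and the amplitude target described in this section can be sketched with a generic textbook sign-sign LMS update. This is not a reconstruction of the paper's equations: the channel values, step size, starting target, and function names are all hypothetical, and the transmitted data are assumed known so that the decisions fed back are error-free.

```python
import random


def sign(x):
    return 1 if x >= 0 else -1


def adapt_sign_sign(channel, n_symbols=20000, mu=0.0005, seed=1):
    """Sign-sign LMS adaptation of the DFE taps and the amplitude target.

    channel : [cursor, post1, post2, ...] pulse-response samples (hypothetical)
    Training data are assumed known, so fed-back decisions are error-free.
    """
    rng = random.Random(seed)
    n_taps = len(channel) - 1
    taps = [0.0] * n_taps
    target = 0.1                       # arbitrary starting point
    past = [1] * n_taps                # d[k-1], d[k-2], ...
    for _ in range(n_symbols):
        d = rng.choice((-1, 1))
        # Received sample: cursor plus postcursor ISI from earlier bits.
        y = channel[0] * d + sum(h * p for h, p in zip(channel[1:], past))
        # Subtract the estimated ISI of the previous bits.
        z = y - sum(c * p for c, p in zip(taps, past))
        # Error bit: is the equalized sample above or below the target level?
        e = sign(z - target * d)
        target += mu * e * d
        taps = [c + mu * e * p for c, p in zip(taps, past)]
        past = [d] + past[:-1]
    return target, taps


target, taps = adapt_sign_sign([0.20, 0.08, 0.04, 0.02, 0.01])
print(round(target, 3), [round(c, 3) for c in taps])
```

Because only signs are used, each coefficient moves by a fixed step per symbol and ends up dithering around its converged value, which is why the step size sets both the adaptation speed and the residual noise on the coefficients.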
The feedback taps and target level must be properly set before
reliable reception of data can be achieved. This is accomplished
during an initialization period where the feedback and target
adaptation loops along with the timing loop are allowed to adapt
and settle. It is necessary, therefore, for the TX to transmit a
data pattern which the RX can use for adaptation. However, this
pattern is not constrained other than it be spectrally rich. Any
PRBS pattern such as a PRBS7 (or higher order) is sufficient
for this purpose. The initialization is accomplished in two steps.
First, the timing and target adaptation loops are activated while
the feedback adaptation loop remains off. This second-order
timing loop will track out the frequency offset and the phase
will lock. However, the phase will not lock to the ideal sampling point if the pulse response is asymmetric due to the presence of postcursor ISI. In the next step, the feedback tap loop
is switched on as well. As the feedback taps adapt, postcursor
ISI is removed and the equalized pulse response becomes more
symmetric. As this happens, the selected phase moves toward
the desired position. In order to minimize interaction between
the loops, it is important that the bandwidth of the coefficient
loop be significantly lower than that of the timing loop.
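How a second-order timing loop tracks out a frequency offset can be sketched with a generic bang-bang model. This is an illustration under stated assumptions, not the receiver's actual update equations: the baud-rate phase detector and majority vote are replaced by an idealized detector that returns the sign of the true phase error, and the gains `kp` and `kf` are hypothetical values.

```python
def track_phase(freq_offset_ppm=200.0, n_updates=5000,
                kp=1.0 / 256, kf=1.0 / 65536, initial_error_ui=0.3):
    """Track a frequency offset with a second-order bang-bang loop.

    One loop iteration corresponds to one 4T update; all phases are in UI.
    The phase detector is idealized as the sign of the true phase error.
    Returns the phase error (UI) after every update.
    """
    drift = 4 * freq_offset_ppm * 1e-6   # phase drift per 4T update, in UI
    error = initial_error_ui             # incoming phase minus sampling phase
    freq = 0.0                           # frequency-offset estimate, UI per update
    history = []
    for _ in range(n_updates):
        vote = 1 if error >= 0 else -1
        freq += kf * vote                # frequency (integral) path
        step = kp * vote + freq          # phase (proportional) path plus frequency
        error += drift - step
        history.append(error)
    return history


errors = track_phase()
print(max(abs(e) for e in errors[-1000:]))  # residual dither after lock, in UI
```

The integral path absorbs the steady drift caused by the frequency offset, so the proportional path only has to dither around the lock point; without it, a first-order loop would have to spend part of every update chasing the drift.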
V. PLL DESIGN
The PLL generates multiphase clocks that are distributed
through a scalable clock tree to the RX and TX channels. Each
PLL is capable of driving up to eight channels of TX or RX. Fig. 8
shows a block diagram of the PLL with details of the VCO delay
stage and loop filter. The delay stage consists of two inverters
that are cross-coupled at the output with two weak inverters
(inset in Fig. 8). The cross-coupling helps to keep the two outputs
complementary. The supply voltage of the inverters is used as the
control voltage for the VCO; that is, the four-stage ring VCO runs on
a regulated supply that is also its control voltage.
A self-biased regulator is used to provide current to the VCO
while also serving to improve PSRR [2]. The regulator has a
low-pass characteristic [2] that acts as the third pole in the PLL
loop to help remove high-frequency noise on the control voltage.
The PLL uses self-bias techniques to track the bandwidth and
damping over PVT variations. This is simply achieved by letting
the damping resistor in the loop filter track the VCO delay and
the charge pump current track the VCO regulator current. The
damping resistor is formed by a pMOS transistor in the triode
region that is a scaled version of the pMOS in the delay cell. The
charge pump is biased using a copy of the current in the regulator.
A start-up circuit ensures that a minimum charge pump current
exists during power up until the self bias loop takes over and
drives it toward the optimum value. This lets the PLL natural
frequency be a fixed fraction of the reference frequency, while the damping
factor is a fixed constant proportional to the ratio of capacitors
and of transistor parameters. A simplified analysis of the self-bias
loop is given below. Here it is assumed that the inverter delay
cell switches completely between the control voltage and
ground. Also, for simplicity, the threshold voltage of the
devices is assumed to be 0 V. The delay of one delay cell is
approximately given by
(7)
in terms of the load capacitance at each VCO node and
the equivalent resistance of the transistor in the triode region. Substituting the expression for this resistance
(8)
we can get the expression for the VCO frequency as
(9)
which also depends on the number of delay stages. The current through the
regulator can be given as
(10)
The charge pump current is a fraction of the VCO regulator
current, while the triode region damping transistor (shown in
Fig. 8) is a fraction of the delay cell transistor. Substituting
into the usual expressions for the natural frequency and damping factor of a second-order PLL, it
can be shown that
(11)
(12)
where the remaining quantities are the feedback divider count and the loop capacitor. From the above expressions, it can be seen that to first order
the damping and natural frequency of the loop are independent
of PVT. In reality, there are still variations due to second-order
effects and due to the fact that some of our assumptions (such
as 0-V threshold) are not strictly valid. However, the variations
are still within 10%–15% over various conditions.
Fig. 9. Jitter tolerance plot of the RX with sinusoidal jitter added at the TX side. The test setup consists of TX transmitting across 30 in of FR4 copper trace, two
backplane HmZd connectors, and 2 of cable.
Fig. 10. RX input return loss plotted versus frequency. The TX return loss is
similar to the RX.
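The first-order PVT independence claimed for the self-biased loop can be checked numerically with a simplified model. Everything in the sketch is an assumption made for illustration and should not be read as the paper's equations (7)–(12): the triode resistance is taken as 1/(beta × control voltage) with a 0-V threshold, the cell delay as resistance times load capacitance, and the regulator current as the dynamic switching current of the ring. All parameter names and default values are hypothetical. The natural frequency and damping factor use the standard second-order charge-pump PLL expressions.

```python
import math


def loop_dynamics(beta, v_ctrl, c_load=20e-15, c_loop=50e-12,
                  n_stages=4, divider=16, cp_ratio=0.5, res_ratio=1.0):
    """Return (natural frequency / reference frequency, damping factor).

    Simplified first-order model (assumptions, not the paper's equations):
      - delay-cell resistance R = 1 / (beta * v_ctrl), threshold = 0 V
      - cell delay = R * c_load, f_vco = 1 / (2 * n_stages * delay)
      - regulator current = n_stages * c_load * v_ctrl * f_vco
      - charge-pump current = cp_ratio * regulator current
      - damping resistor = 1 / (res_ratio * beta * v_ctrl)
    """
    delay = c_load / (beta * v_ctrl)
    f_vco = 1.0 / (2 * n_stages * delay)
    k_vco = 2 * math.pi * f_vco / v_ctrl      # rad/s/V; f_vco is linear in v_ctrl
    i_cp = cp_ratio * n_stages * c_load * v_ctrl * f_vco
    r_damp = 1.0 / (res_ratio * beta * v_ctrl)
    # Standard second-order charge-pump PLL expressions.
    w_n = math.sqrt(i_cp * k_vco / (2 * math.pi * divider * c_loop))
    zeta = 0.5 * r_damp * c_loop * w_n
    w_ref = 2 * math.pi * f_vco / divider
    return w_n / w_ref, zeta


# Vary the transistor strength and control voltage, as PVT would.
for beta, v_ctrl in [(1.0e-3, 0.8), (0.5e-3, 1.0), (2.0e-3, 0.6)]:
    print(loop_dynamics(beta, v_ctrl))
```

In this model both returned quantities depend only on capacitor ratios, device ratios, the number of stages, and the divider count, so they stay the same for every combination of transistor strength and control voltage.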
VI. RESULTS
The transceiver was fabricated as part of a test-chip in LSIs
0.13- m CMOS technology and packaged in a generic fourlayer flip-chip PBGA package. The test-chip consists of several
RX and TX channels sharing clocks driven by a PLL core. The
TX/RX channels are surrounded by random switching logic,
which emulates digital switching noise present in a real ASIC
environment. The experimental setup consists of 30 of FR4
copper trace, two backplane HmZd connectors, and 2 of cable.
The eye diagrams shown as inset in Fig. 9 are measured at
6.4-Gb/s operation close to TX and close to the RX, respectively.
The output of the TX shows a jitter of 41 ps (peak-to-peak) of
which the PLL jitter is about 26 ps (peak to peak) measured up
to
accuracy. The remaining jitter includes phase mismatch
and other deterministic jitter components due to ISI from 5 of
FR4 trace on the evaluation board. It then traverses the backplane and the jitter adds with the ISI due to backplane losses to
close the eye at the RX by approximately 0.73UI (see the inset in
Fig. 9). The receiver blind-adapts the DFE taps to the channel
response based on a PRBS pattern, acquires timing phase and
frequency and operates error-free for over 24 h. At 6.4 Gb/s,
this translates to a BER of about
. The RX jitter tolerance plot for the setup is also shown in Fig. 9. The plot shows
the frequency of the sinusoidal jitter added to the TX clock on
the -axis versus the extra jitter amplitude added at the jitter frequency on the -axis. The sinusoidal jitter is added at the output
of TX PLL clock that is used to serialize the TX data. (The sinusoidal jitter is in addition to the already existing jitter components that are present in the normal PLL output.) The plot
shows a high-frequency jitter tolerance of about 0.12UI and a
corner frequency of about 3 MHz. Fig. 10 shows the measured
RX return loss as a function of the frequency. The RX return
loss is better than 10 dB for frequencies up to 3.5 GHz. The
TX output return loss is similar to the RX measurement. The
TX calibration routine measures the phase offsets between multiphase clocks used at the TX and corrects them as described
earlier. The standard deviation of the TX eye widths before and
BALAN et al.: 4.8–6.4-GB/S SERIAL LINK FOR BACKPLANE APPLICATIONS USING DECISION FEEDBACK EQUALIZATION
1965
[6] S. Kasturia et al., “Techniques for high-speed implementation of nonlinear cancellation,” IEEE J. Sel. Areas Commun., vol. 9, no. 5, pp.
711–717, Jun. 1991.
[7] V. Stojanovic et al., “Adaptive equalization and data recovery in a dualmode (PAM2/4) serial link transceiver,” in Proc. VLSI Circuit Symp.,
2004.
[8] H. Ng et al., “A second-order semi-digital clock recovery circuit based
on injection locking,” IEEE J. Solid-State Circuits, vol. 38, no. 12, p.
2101, Dec. 2003.
Fig. 11.
Microphotograph of test chip showing RX, TX, and PLL cores.
after calibration are 6 and 2 ps, respectively, as measured over
15 parts. Hence, the TX phase calibration helps to keep the duty
cycle distortion low at the output of the TX.
A microphotograph (taken with backside IR-OBIRCH) of the
test chip is shown in Fig. 11. The power dissipation under nominal conditions (1.2 V, 25 °C ambient) for a duplex channel
is approximately 310 mW. The TX
consumes about 100 mW, the RX about 200 mW, and the PLL,
which can be shared by up to four full-duplex channels,
consumes about 40 mW; the per-channel figure is thus consistent with the TX and RX power plus one quarter of the PLL power. The PLL (together with BiST, PLL2,
and calibration circuits) occupies 0.78 mm² (600 µm × 1300
µm), the TX occupies 0.24 mm² (200 µm × 1200 µm), and the RX
about 0.32 mm² (200 µm × 1600 µm). The modular design and layout of each block allow subsystems to be built
easily at the chip level.
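The power and area figures above can be cross-checked with a few lines of arithmetic. The sketch below assumes that the per-channel power counts an equal one-quarter share of the PLL, which is an inference from the quoted numbers rather than a statement from the measurements; the energy-per-bit value is derived here and is not quoted in the text.

```python
# Figures quoted in the text (nominal conditions: 1.2 V, 25 degC ambient).
TX_MW, RX_MW, PLL_MW = 100.0, 200.0, 40.0
CHANNELS_PER_PLL = 4  # the PLL can be shared by up to four duplex channels

# Assumption: each duplex channel is charged an equal share of the PLL power.
per_channel_mw = TX_MW + RX_MW + PLL_MW / CHANNELS_PER_PLL

# Derived metric (not from the paper): mW / (Gb/s) equals pJ per bit.
DATA_RATE_GBPS = 6.4
energy_pj_per_bit = per_channel_mw / DATA_RATE_GBPS


def area_mm2(width_um, height_um):
    """Area of a rectangular block in mm^2 from its dimensions in micrometers."""
    return width_um * height_um * 1e-6


areas = {
    "PLL": area_mm2(600, 1300),
    "TX": area_mm2(200, 1200),
    "RX": area_mm2(200, 1600),
}

print(f"per-channel power: {per_channel_mw:.0f} mW")
print(f"energy per bit:    {energy_pj_per_bit:.1f} pJ/bit")
for block, area in areas.items():
    print(f"{block} area: {area:.2f} mm^2")
```

Under this assumption the block-level numbers reproduce the 310-mW per-channel figure and the three quoted areas.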
Vishnu Balan (S’95–M’96) received the B.S. degree
in electronics and communications engineering from
the Indian Institute of Technology, Madras, in 1995
and the M.S. degree in electrical engineering from
Duke University, Durham, NC, in 1996.
In 1997, he joined DataPath Systems, Inc., Santa
Clara, CA, where he worked on analog front-end circuits for several generations of hard disk drive read
channel design. In 2000, he joined LSI Logic Corporation, Milpitas, CA, where he worked on SerDes
design for backplane applications. While at LSI, he
led the analog design effort for several serial links at speeds ranging from 1.6
Gb/s to 12.8 Gb/s. In 2005, he joined Teranetics, Inc., Santa Clara, CA, where he
is currently working on analog front-end circuits for 10GBASE-T transceivers.
Joe Caroselli (S’97–A’98–M’03) received the B.S.
degree in electrical engineering and economics from
the California Institute of Technology, Pasadena, in
1992 and the Ph.D. degree in electrical engineering
from the University of California, San Diego, in
1998.
He has worked as a Systems Architect for read
channels for disk and magnetic tape drives for
Quantum Corporation, Milpitas, CA, and DataPath
Systems, Santa Clara, CA. He joined LSI Logic
Corporation, Milpitas, CA, in 2000 as part of the
DataPath acquisition and has been working on high-speed serdes as part of the
HyperPhy team since 2001 where he leads the system architecture team.
VII. CONCLUSION
A high-speed serial link design using a programmable feed-forward (FF)
filter at the TX and an adaptive four-tap DFE at the RX has been
demonstrated. The RX implements baud-rate sampling using a
2-b ADC to recover the data, to generate the error signal for the
adaptation of the DFE coefficients, and to acquire the phase of
the incoming data stream. A test chip implemented in 0.13-µm
standard CMOS technology is shown to be fully functional and
to meet all of the target specifications. The transceivers are also capable of half- and quarter-rate operation to support backward compatibility.
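The adaptation principle summarized above can be illustrated with a minimal behavioral sketch. It assumes a sign-sign LMS update, a noiseless channel with hypothetical post-cursor values, an ideal sampling phase, and a 2-b ADC modeled as comparisons against 0 and ±target. These are illustrative assumptions and not a description of the test chip; timing recovery is not modeled.

```python
import random


def sign(x):
    return 1 if x >= 0 else -1


def adapt_dfe(post_cursors, n_symbols=20000, mu=0.002, target=1.0, seed=1):
    """Adapt DFE taps with sign-sign LMS on random NRZ data.

    post_cursors: hypothetical channel post-cursor ISI (main cursor = target).
    Returns (adapted tap weights, number of decision errors).
    """
    rng = random.Random(seed)
    n_taps = len(post_cursors)
    weights = [0.0] * n_taps
    tx_hist = [1] * n_taps    # past transmitted symbols (channel model)
    dec_hist = [1] * n_taps   # past decisions (fed back by the DFE)
    errors = 0
    for _ in range(n_symbols):
        d = rng.choice((-1, 1))
        # Received sample: main cursor plus post-cursor ISI.
        rx = target * d + sum(h * s for h, s in zip(post_cursors, tx_hist))
        # Subtract the ISI estimate formed from previous decisions.
        y = rx - sum(w * s for w, s in zip(weights, dec_hist))
        # Assumed 2-b ADC: the threshold at 0 gives the data bit, and the
        # thresholds at +/-target give the sign of the error.
        decision = sign(y)
        err_sign = sign(y - decision * target)
        # Sign-sign LMS: move each tap toward the residual ISI it sees.
        for k in range(n_taps):
            weights[k] += mu * err_sign * dec_hist[k]
        errors += int(decision != d)
        tx_hist = [d] + tx_hist[:-1]
        dec_hist = [decision] + dec_hist[:-1]
    return weights, errors


channel = [0.35, 0.20, 0.10, 0.05]  # hypothetical post-cursor values
weights, errors = adapt_dfe(channel)
print("adapted taps:", [round(w, 2) for w in weights], "decision errors:", errors)
```

In this idealized setting each tap weight settles near the corresponding post-cursor value, so the fed-back decisions cancel most of the ISI before the next sample is sliced.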
REFERENCES
[1] V. Balan et al., “A 4.8–6.4 Gbps serial link for back-plane applications
using decision feedback equalization,” in Proc. IEEE CICC, Oct. 2004,
p. 3-3-1.
[2] V. Balan, “A low-voltage regulator circuit with self-bias to improve accuracy,” IEEE J. Solid State Circuits, vol. 38, no. 2, p. 365, Feb. 2003.
[3] A. Fiedler et al., “A 1.0625 Gbps transceiver with 2 -oversampling and
transmit signal pre-emphasis,” in ISSCC Dig. Tech. Papers, vol. XL, Feb.
1997, p. 238.
[4] S.-H. Lee and M.-S. Hwang et al., “A 5 Gb/s 0.25 CMOS jittertolerant variable-interval over sampling clock/data recovery circuit,” in
Proc. ISSCC, vol. XLV, Feb. 2002, p. 256.
[5] K. H. Mueller and M. Muller, “Timing recovery in digital synchronous
data receivers,” IEEE Trans. Commun., vol. COM-24, no. 5, p. 516, May
1976.
Jenn-Gang Chern (S’87–M’88) received the
M.S.E.E. degree from the University of California,
Los Angeles, in 1988.
From 1988 to 1994, he was with Silicon Systems
as a Design Engineer on the peak detection channels
for hard disk drives. From 1994 to 2000, he was with
DataPath Systems, Santa Clara, CA, where he was
involved with PRML read channel development. In
2000, he joined LSI Logic Corporation, Milpitas, CA,
to work on high-speed SERDES and DVD front-end
developments. In 2004, he joined Link-A-Media Corporation, Santa Clara, CA, to lead HDD read channel SOC analog front-end
development.
Catherine Chow (M’78) received the B.S. degree in
engineering from the University of Michigan, Ann
Arbor, in 1974 and the M.S. and Ph.D. degrees in
computer science from the University of Illinois, Urbana, in 1977 and 1981, respectively.
Currently, she is a Design Manager with the LSI
SerDes group. She joined LSI Logic Corporation,
Milpitas, CA, in 2001. Her prior experiences included ASIC methodology development and storage
controller design in IBM’s Storage Division, San
Jose, CA.
Ratnakar Dadi (S’98–A’00–M’03) was born in W.
Godavari District, AP, India. He received the B.Tech
degree in electrical engineering from the Indian Institute of Technology, Bombay, in 1998, and the M.S.
degree in electrical engineering from University of
Hawaii, Manoa, in 2000.
He joined LSI Logic Corporation, Milpitas, CA,
in 2000, where he worked on analog circuits for several generations of high-frequency serial-link communication over backplanes. Since 2004, he has been
with Link-A-Media Corporation, Santa Clara, CA,
working on mixed-signal circuits for hard-disk drive controllers. His current
research interests include design of high-performance analog circuits in CMOS
and BiCMOS technologies, analog and mixed-signal circuit simulation techniques, and mixed-signal design flow and verification methodologies.
Hiroshi Kimura, photograph and biography not available at the time of publication.
Cathy Ye Liu (S’96–M’99) received the B.S. degree
in electronic engineering from Tsinghua University,
Beijing, China, in 1995 and the M.S. and Ph.D.
degrees in electrical engineering from University of
Hawaii, Manoa, in 1997 and 1999, respectively.
She joined DataPath Systems, Inc., Santa Clara,
CA, in 2000 and is currently with LSI Logic Corporation, Milpitas, CA, where she is the System Architect for high-speed SerDes. Her current interests are
high-speed clock and data recovery, adaptive decision
feedback equalizer, signal processing, and error correction coding.
Chintan Desai (M’94) received the B.S.E.E. degree
from the Regional Engineering College, Surat, India,
in 1988 and the M.S.E.E. degree from Oklahoma
State University, Stillwater, in 1992.
He has over ten years of SerDes development experience at LSI Logic Corporation, Milpitas, CA, where
he is currently the Director of SerDes Development
for Telecommunications Applications.
Leo Fang (S’93–M’95) received the B.S. degree in electrical engineering/computer science and material science engineering from the University of California, Berkeley, and the M.S. degree in electrical and computer engineering
from Carnegie Mellon University, Pittsburgh, PA.
Presently, he is the Chief Operating Officer with PyX Technologies, San
Ramon, CA. Prior to joining PyX Technologies, he was Director of SERDES
and USB Development at LSI Logic Corporation, Milpitas, CA. Before
LSI Logic, he held a variety of senior design engineering positions within
storage-centric companies such as Quantum Corporation, Milpitas, CA, and
DataPath Systems, Inc., Santa Clara, CA.
David Hsu (M’95) received the B.S. degree in engineering physics from The Ohio State University,
Columbus, in 1993, and the M.S. degree in electrical
engineering from Purdue University, West Lafayette,
IN, in 1994.
He has been with the SerDes group in LSI Logic,
Milpitas, CA, since 2001 and is currently a Design
Manager with the group. Prior to joining the SerDes
group, he was with DataPath Systems, Inc., San Jose,
CA, working in DSL and read channel projects, and
in telecommunication projects in the Siemens Semiconductor division.
Pankaj Joshi, photograph and biography not available at the time of publication.
Tzu-Wang Pan, photograph and biography not available at the time of publication.
Ryan Park (M’00) was born in Seoul, Korea, in
1976. He received the degree with emphasis on digital VLSI design from the University of California,
Berkeley, in 2000.
In 2000, he joined DataPath Systems, Inc., Santa
Clara, CA, later acquired by LSI Logic Corporation,
Milpitas, CA, where he has been involved in the area
of high-speed digital design. He is currently with LSI
Logic as a Digital VLSI Designer working in SerDes
design.
Cindy You (M’00) received the B.S. degree in electrical engineering from National Tsing Hua University, Taipei, Taiwan, in 1996, and the M.S. degree in
electrical and computer engineering from the University of Texas, Austin, in 1998.
In 1999, she joined DataPath Systems, Inc., Santa
Clara, CA, where she worked on continuous-time
filters for hard disk drives. From 2000 to 2004,
she was with LSI Logic Corporation, Milpitas,
CA, working on mixed-signal circuits including
CDR, transmitters, and DFE analog front-ends for
multi-Gb/s SerDes development. She is now with Link-A-Media Devices
Corporation, Santa Clara, CA, where she has been engaged in the development
of analog front-ends of disk drive read channels.
Yi Zeng (M’03) received the B.S. degree from Tsinghua University, Beijing, China, in 1997 and the
M.S. degree from the University of Hawaii, Manoa,
in 2000.
In 2001, he joined Ample Communications, Inc.,
Fremont, CA, where he worked on SONET/SDH
Framer analog I/O design. Since December 2002, he
has been with LSI Logic Corporation, Milpitas, CA,
where he has been working on analog circuits for
serial links at speeds ranging from 3.2 to 12.8 Gb/s.
Eric Zhang (M’00) received the M.S. and Ph.D.
degrees in electrical engineering from the University
of Maryland, College Park, in 1992 and 1996,
respectively.
From 1996 to 2000, he was with Integrated Device
Technology, Santa Clara, CA, working on advanced
silicon IC device development and modeling, and
later on analog and mixed-signal design of SERDES
for SONET applications. In 2000 he joined LSI Logic,
where he has worked on design of Ethernet PHYs and
high-speed transceivers for backplane applications.
Freeman Zhong (M’00) received the B.S. degree
(with high honor) in physics from Guangzhou
University, Guangzhou, China, in 1983, the M.S.
degree in solid-state physics from Jinan University,
Guangzhou, China, in 1986, and the M.S. degree in
electrical engineering from San Jose State University, San Jose, CA, in 1995.
From 1995 to 1997, he was with National Semiconductor Corporation, Santa Clara, CA, working on
analog and mixed-signal circuit designs. From 1997
to 2000, he was with NEC Corporation, Santa Clara,
working on mixed-signal circuit and SOC designs for hard-disk controllers.
Since 2000, he has been with LSI Logic Corporation, Milpitas, CA, as a Senior Design Manager leading developments of Ethernet PHY and high-speed
SerDes.