www.ietdl.org
Published in IET Circuits, Devices & Systems
Received on 21st January 2013
Revised on 31st October 2013
Accepted on 1st November 2013
doi: 10.1049/iet-cds.2013.0159
ISSN 1751-858X
Design techniques for decision feedback equalisation
of multi-giga-bit-per-second serial data links:
a state-of-the-art review
Fei Yuan1, Alaa R. AL-Taee1, Andy Ye1, Saman Sadr2
1 Department of Electrical and Computer Engineering, Ryerson University, Toronto, ON, Canada M5B 2K3
2 Semtech Corp., Toronto, ON, Canada
E-mail: fyuan@ryerson.ca
Abstract: This study provides a comprehensive review of decision feedback equalisation (DFE) for multi-giga-bit-per-second
(Gbps) data links. State-of-the-art DFE designs for multi-Gbps serial links reported in the past decade are compiled and
presented. The imperfections of wire channels, in particular finite bandwidth, reflection and cross-talk, and their impact on data
transmission are investigated. The fundamentals of both near-end and far-end channel equalisation to combat the effect of
these imperfections at high frequencies are explored. A detailed examination of the principle, configuration,
operation and limitation of DFE follows. Design challenges encountered in the design of DFE for multi-Gbps data links,
including timing constraints, sampling, error propagation, arithmetic operation, highly dispersive channels and power
consumption, are studied together with the techniques and circuit implementations that address them. The need for adaptive DFE
and the principles of adaptive DFE are investigated. Finally, the performance of various adaptive DFEs is examined and their
pros and cons are compared.
1 Introduction
The explosive growth of data processed by integrated circuits
(ICs) demands that data be transmitted over wire channels
(interconnects, vias, connectors, package pins, printed circuit
boards and coaxial cables) at multiple giga-bit-per-second
(Gbps). Although increasing the number of wire channels
directly improves the total data bandwidth, a large number
of parallel channels not only increases the cost of routing but
also degrades the overall data rate through the clock and data skews
caused by the mismatch of the channels [1]. As a result,
parallel links are only attractive for short-range data
communications such as multi-processor systems,
processor-to-memory interfaces and network switches.
Unlike parallel links, serial links transmit data and clock
using a single wire channel, typically a differential pair to
minimise electromagnetic interference with neighbouring
devices. The elimination of a dedicated channel for clock
transmission removes the difficulties associated with clock
skew. The use of only a single wire channel also eliminates
the bottleneck associated with data skew. Moreover, it
greatly reduces the cost associated with routing. As a result,
serial links are very attractive in applications such as
block-to-block (on-chip), chip-to-chip, chassis-to-chassis and
computer-to-computer links where the distance over which
data are transmitted is large and the number of channels
available is small.
© The Institution of Engineering and Technology 2014
Although the maximum transit frequency of metal-oxide
semiconductor (MOS) transistors has well exceeded 100 GHz,
the data rate of serial links is much lower, as evident in
Table 1, despite the nearly linear improvement of the
maximum data rate with technology scaling, as shown in
Fig. 1. The low data rate is mainly caused by inter-symbol
interference (ISI) arising from channel imperfections, of which
limited bandwidth, reflection and cross-talk are the most
critical. The limited bandwidth of channels caused by the
rising resistive and dielectric loss of the channels at high
frequencies gives rise to a long channel impulse response or
equivalently frequency-dependent attenuation, as shown in
Fig. 2a. Reflection caused by the impedance mismatch of
channels, largely because of the inclusion of vias,
connectors and branches in the channels, results in crests
and troughs that are non-uniformly distributed over a large
number of symbol intervals in channel impulse response or
equivalently sharp troughs in frequency domain response,
as shown in Fig. 2b [2–4]. Note that the troughs are caused by
capacitive impedance mismatches. For channels with severe
reflection, deep troughs exist, as shown in Figs. 2c and d.
Crosstalk is primarily caused by capacitive and inductive
coupling with neighbouring devices and manifests itself as
crests and troughs in the channel impulse response. As a
result, data symbols received at the far end of the channel
consist of pre-cursors, main cursor and post-cursors with
the number of post-cursors significantly larger than that of
the pre-cursors, as shown in Fig. 2d. The main cursor is
used for data recovery while pre-cursors and post-cursors
need to be removed. They can be removed using near-end
and far-end channel equalisation.
IET Circuits Devices Syst., 2014, Vol. 8, Iss. 2, pp. 118–130
doi: 10.1049/iet-cds.2013.0159
Table 1 Data rate of serial links utilising decision feedback equalisation
Ref.                      Tech.       Channel loss              Data rate (Gbps)  Tx equaliser  Rx equaliser      BER
Shahramian et al. [5]     130 nm      −8 dB                     3.7               –             3 IIR             10^−12
Payne et al. [6]          130 nm      −21 dB (36″ FR4)          6.25              –             5-tap             10^−15
Krishna et al. [7]        130 nm      −18 dB (33″ FR4)          9.6               2-tap         CTLE/1-tap        10^−15
Beukema et al. [8]        130 nm      −18 dB (30″ FR4)          6.4               4-tap         5-tap             10^−15
Stojanovic et al. [9]     130 nm      −12 dB (26″ FR4)          5                 –             1-tap             –
Balan et al. [10]         130 nm      – (40″ FR4)               6.4               2-tap         4-tap             tL
Wong et al. [11]          90 nm       −6.2 dB (10″ SMA)         6.0               –             2-tap             10^−12
Park et al. [12]          90 nm       −12 dB (16″ Tyco)         7.0               –             2-tap             10^−13
Leibowitz et al. [13]     90 nm       – (18″ BP)                7.5               –             10-tap            10^−12
Bulzacchelli et al. [14]  90 nm       −33 dB (16″ Legacy BP)    10                4-tap         5-tap             10^−12
Hidaka et al. [15]        90 nm       −25.4 dB (29″ FR4)        10.3              2/3-tap       CTLE/1-tap        10^−13
Seong et al. [16]         90 nm       −25.4 dB (15″ FR4)        10                –             2-tap             10^−12
Huang et al. [17]         90 nm       −32.7 dB (5.5″ FR4)       6                 –             1-tap & IIR       10^−12
Turker et al. [18]        90 nm       −14 dB (20″ Nelco)        15                –             1-tap             10^−13
Bulzacchelli et al. [19]  65 nm       −16 dB (30″ PCB)          11                3-tap         5-tap             10^−15
Kim et al. [20]           65 nm       −21 dB (50″ Nelco)        10                –             1-tap & IIR       10^−12
Pozzoni et al. [21]       65 nm       −24 dB (28″ FR4)          8.5               –             3-tap             –
Harwood et al. [22]       65 nm       −24 dB (12″ PCB)          12.5              4-tap         2-tap FFE/5-tap   –
Abiri et al. [23]         65 nm       −13.3 dB (34″ FR4)        5                 –             1-tap             10^−12
Sarvari et al. [24]       65 nm       −13.3 dB (34″ FR4)        5                 –             1-tap             10^−12
Wang et al. [25]          65 nm       −13.3 dB (16″ FR4)        21                3-tap         1-tap             10^−12
Y. Ying et al. [26]       65 nm       −22.3 dB (14″ FR4)        20                –             CTLE/1-tap        10^−12
Dickson et al. [27]       45 nm SOI   −21 dB (50″ Nelco)        10                –             DFE-IIR           10^−12
Nazari et al. [28]        45 nm SOI   −25 dB (18″ FR4)          15                –             2-tap             10^−12
Gangasani et al. [29]     45 nm SOI   −32 dB (40″ Nelco)        16                3-tap         12-tap            10^−15
Agrawal et al. [30]       45 nm SOI   −25 dB (20″ PCB)          19                –             4-tap FFE/5-tap   10^−13
Amirkhany et al. [31]     40 nm       −10 dB (3″ FR4)           16                –             1-tap             10^−12
Kaviani et al. [32]       40 nm       −15 dB (3″ FR4)           20                1-tap         CTLE/1-tap        10^−12
Quan et al. [3]           40 nm       −26 dB                    14                4-tap         10-tap            –
Joy et al. [33]           40 nm       −34 dB (24″ FR4)          16                3-tap         14-tap            10^−12
Cui et al. [34]           40 nm       −20 dB (8″ FR4)           23                2-tap         CTLE/1-tap        10^−12
Spagna et al. [35]        32 nm       −25 dB (14″ PCB)          11.8              3-tap         4-tap             10^−12
Toifl et al. [36]         32 nm       −27 dB (39″ PCB)          12.5              –             CTLE/8-tap        10^−12
Toifl et al. [37]         32 nm       −36 dB                    30                4-tap         CTLE/15-tap       10^−12
Bulzacchelli et al. [38]  32 nm SOI   −35 dB (15″ PCB)          28                4-tap         CTLE/15-tap       10^−13
J. Savoj et al. [39]      28 nm       −33 dB BP                 12.5              5-tap         CTLE/3-tap        10^−15
Channel loss is measured at half baud-rate frequency.
This paper provides a comprehensive review of decision
feedback equalisation (DFE) for multi-Gbps data links. The
remainder of the paper is organised as follows:
Section 2 investigates the imperfections of wire channels, in
particular finite bandwidth, reflection and cross-talk, and
their impact on data transmission. The
fundamentals of both near-end and far-end channel
equalisation to combat the effect of these imperfections
at high frequencies are also explored. Section 3
provides a detailed examination of the principle,
configuration, operation and limitation of DFE. Design
challenges encountered in the design of DFE for multi-Gbps
data links, including timing constraints, sampling, error
propagation, arithmetic operation, highly dispersive
channels and power consumption, are studied together with the
techniques and circuit implementations that address them.
Section 4 investigates the need for adaptive DFE and the
principles of adaptive DFE. The performance of various
adaptive DFEs is examined and their pros and cons are
compared. Section 5 explores the direction of future
research and development of DFE. The paper is concluded
in Section 6.
Fig. 1 Dependence of data rate on the minimum channel length of MOS transistors
2 Channel equalisation
2.1 Near-end channel equalisation
Fig. 2 Limited bandwidth of channels caused by the rising resistive and dielectric loss of the channels at high frequencies and reflection due to
the impedance mismatch
a Frequency response of a 30″ trace on a Nelco4000-13SI board [40]
b Frequency response of a 16″ Tyco legacy backplane with two daughter cards [40]
c and d Frequency response and impulse response of a highly reflective backplane [4]
Pre-cursors and post-cursors can be removed by boosting the
high-frequency components [41, 42] or attenuating the
low-frequency components of data symbols [43, 44] prior
to their transmission. The former increases cross-talk as
crosstalk intensifies at high frequencies. The latter reduces
the power of the transmitted symbols as the power of
non-return-to-zero data is largely concentrated at half
baud-rate frequency. Since it increases the relative strength
of the high-frequency components of the transmitted
signals, crosstalk is also reduced [8]. Near-end channel
equalisation is often implemented using finite impulse
response (FIR) filters that introduce zeros to offset the effect
of the poles of the channels [45, 46]. For example, the
first-order pre-emphasis FIR filter shown in Fig. 3 and
given by y(n) = x(n) − a1x(n − 1), where x(n) and y(n) are
the input and output of the FIR filter, respectively, has the
transfer function HFIR(z) = 1 − a1z^−1. It introduces
a zero at z = a1 that can be used to offset the poles of the channel.
To demonstrate this, since ωTs ≪ 1, where Ts is the symbol
time, we have z = e^(sTs) ≃ 1 + sTs. As a result
$$H_{FIR}(s) \simeq (1 - a_1)\left(\frac{s}{\omega_z} + 1\right) \qquad (1)$$
where ωz = (1 − a1)/(a1Ts). If we model the channel as a
first-order low-pass, that is, Hch(s) = 1/((s/ωch) + 1) where
ωch is the channel bandwidth, the transfer function of the
equalised channel is given by
$$H_{eq}(s) = (1 - a_1)\,\frac{s/\omega_z + 1}{s/\omega_{ch} + 1} \qquad (2)$$
It becomes apparent that if we choose ωz = ωch, the pole of the
channel will be cancelled by the zero of the pre-emphasis
FIR, resulting in a desirable all-pass. Also observed is that
since 1 − a1 < 1, there is a loss of signal energy in
pre-emphasis.
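As a rough numerical illustration of the pole-zero cancellation above, the sketch below solves ωz = (1 − a1)/(a1Ts) = ωch for the tap weight, which gives a1 = 1/(1 + ωchTs), and reports the resulting low-frequency loss 1 − a1. The 10 Gbps symbol time and the 200 MHz first-order channel bandwidth are hypothetical values chosen only so that ωTs ≪ 1 roughly holds; they are not taken from the paper.

```python
import math

def preemphasis_tap(f_ch_hz, symbol_time_s):
    """Tap weight a1 that places the FIR zero on a first-order channel pole.

    Solves (1 - a1) / (a1 * Ts) = w_ch for a1 (small-angle model, w*Ts << 1).
    """
    w_ch = 2.0 * math.pi * f_ch_hz
    return 1.0 / (1.0 + w_ch * symbol_time_s)

def zero_frequency(a1, symbol_time_s):
    """Zero frequency w_z = (1 - a1) / (a1 * Ts) in rad/s."""
    return (1.0 - a1) / (a1 * symbol_time_s)

# Hypothetical operating point (assumed, not from the paper).
SYMBOL_TIME = 100e-12      # 10 Gbps
CHANNEL_BW_HZ = 200e6      # first-order channel model

a1 = preemphasis_tap(CHANNEL_BW_HZ, SYMBOL_TIME)
w_z = zero_frequency(a1, SYMBOL_TIME)
dc_loss_db = 20.0 * math.log10(1.0 - a1)  # signal loss caused by pre-emphasis

print(f"a1 = {a1:.3f}, w_z = {w_z:.3e} rad/s, low-frequency gain = {dc_loss_db:.1f} dB")
```

The printed low-frequency gain is negative, which is the loss of signal energy noted above: the closer the channel pole is to DC, the larger a1 becomes and the more amplitude is given up.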
Since the loss of wire channels is typically much deeper
than −20 dB/dec, a high-order pre-emphasis FIR filter is
needed. The order of pre-emphasis FIR filters, however, is
usually limited to four (Table 1) as further increasing the
order improves the performance only marginally [44, 47–
49]. This observation also reveals that wire channels can be
modelled using a fourth-order low-pass adequately.
Fig. 3 Near-end channel equalisation with first-order pre-emphasis
The added pre-emphasis tap shortens the duration of the received symbol thereby improving data rate
Since the characteristics of the channel are not known prior
to data transmission, the optimal tap coefficients of
pre-emphasis FIR filters can only be obtained if a back
channel exists. This constraint undermines the robustness of
pre-emphasis channel equalisation. Another limitation of
pre-emphasis channel equalisation is its inability to remove
ISI caused by reflection and crosstalk as these ISI manifest
themselves as crests and troughs rather than uniformly
sloped attenuation, as shown earlier in Fig. 2. ISI caused by
reflection and crosstalk is typically significant when data
rate is high and channels contain multiple vias, connectors
and branches (highly reflective channels).
2.2 Far-end channel equalisation
Far-end channel equalisation also known as post-equalisation
combats ISI by either amplifying the high-frequency
components of received data symbols in the analog domain
or removing post-cursors in the digital domain prior to
clock and data recovery (CDR). As compared with near-end
equalisation, post-equalisation offers the ability to combat
ISI caused by reflection and crosstalk.
(1) Linear post-equalisation: Linear post-equalisation boosts
the high-frequency components of received symbols with a
continuous-time linear equaliser (CTLE). CTLE provides
zeros to cancel out the poles of the channels so that the
equalised channel exhibits an all-pass transfer characteristic.
To demonstrate this, consider the CTLE in Fig. 4 and
neglect the capacitance of MOSFETs. We examine three
cases
† If only Cx is considered (neglect Rx, L and CL), the transfer
function is given by
$$\frac{V_o(s)}{V_{in}(s)} = -\frac{sRC_x}{sC_x/g_m + 1} \qquad (3)$$
where gm is the transconductance of the MOSFETs. The
feedback provided by Cx adds a zero at frequency ωz = 0.
The pole provided by Cx is at frequency ωp = gm/Cx. ωp
must be sufficiently higher than the half baud-rate
frequency so that its impact is negligible. The domain in
Fig. 4 Continuous-time linear equalisers with inductor series
peaking, source degeneration and negative capacitors
IET Circuits Devices Syst., 2014, Vol. 8, Iss. 2, pp. 118–130
doi: 10.1049/iet-cds.2013.0159
which the added zero is effective in compensating the effect
of the poles of the channel is given by ωz ≤ ω ≤ ωp.
† If we consider both Cx and Rx (Neglect L and CL), the
transfer function becomes
$$\frac{V_o(s)}{V_{in}(s)} = -\frac{R g_m}{R_x g_m + 1}\,\frac{sR_xC_x + 1}{sR_xC_x/(R_x g_m + 1) + 1} \qquad (4)$$
The zero is now located at ωz = 1/(RxCx) and the pole is at
ωp = (Rxgm + 1)/(RxCx) ≃ gm/Cx
provided Rx gm ≫ 1. It is evident that ωz is now tunable by
varying Cx and Rx.
† If L, Rx, Cx and CL are all considered, the transfer function
becomes
$$\frac{V_o(s)}{V_{in}(s)} = -\frac{R g_m}{(R_x g_m + 1)\,LC_L} \times \frac{sR_xC_x + 1}{sR_xC_x/(R_x g_m + 1) + 1} \times \frac{s(L/R) + 1}{s^2 + s(R/L) + 1/(LC_L)} \qquad (5)$$
It is seen from (5) that the addition of the inductor peaking
introduces another zero at ωz2 = R/L. This is in addition to
the zero introduced by Cx and Rx at ωz1 = 1/(RxCx). It also
introduces complex conjugate poles with natural resonant
frequency ωn = 1/√(LCL). It is well understood that
complex conjugate poles improve bandwidth [50]. The
zeros are used to cancel the effect of the poles of the
channel so as to increase the bandwidth while the complex
conjugate poles improve the bandwidth through resonance.
The higher the quality factor, the larger the bandwidth
improvement. The addition of the negative capacitors
reduces CL, which in turn boosts the natural resonant
frequency ωn and subsequently the bandwidth.
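To make the second case (Cx and Rx only) more concrete, the following sketch evaluates the source-degenerated CTLE response −[Rgm/(Rxgm + 1)]·(sRxCx + 1)/(sRxCx/(Rxgm + 1) + 1) numerically. The component values are hypothetical and chosen only for illustration; the point is that the zero at 1/(RxCx) and the pole at (Rxgm + 1)/(RxCx) are separated by the factor Rxgm + 1, which is also the ratio of high-frequency to DC gain.

```python
import math

def ctle_gain(f_hz, gm, r_load, rx, cx):
    """Complex gain of a source-degenerated CTLE (Rx and Cx only, L and CL neglected)."""
    s = 1j * 2.0 * math.pi * f_hz
    k = rx * gm + 1.0                      # degeneration factor Rx*gm + 1
    return -(r_load * gm / k) * (s * rx * cx + 1.0) / (s * rx * cx / k + 1.0)

def zero_pole_hz(gm, rx, cx):
    """Zero and pole frequencies in Hz: wz = 1/(Rx*Cx), wp = (Rx*gm + 1)/(Rx*Cx)."""
    f_zero = 1.0 / (2.0 * math.pi * rx * cx)
    f_pole = (rx * gm + 1.0) * f_zero
    return f_zero, f_pole

# Hypothetical component values (assumed for illustration only).
GM = 10e-3       # 10 mS
R_LOAD = 200.0   # ohms
RX = 400.0       # ohms
CX = 200e-15     # 200 fF

f_z, f_p = zero_pole_hz(GM, RX, CX)
dc_gain = abs(ctle_gain(0.0, GM, R_LOAD, RX, CX))
hf_gain = abs(ctle_gain(1e15, GM, R_LOAD, RX, CX))   # far above the pole
peaking_db = 20.0 * math.log10(hf_gain / dc_gain)

print(f"zero = {f_z/1e9:.2f} GHz, pole = {f_p/1e9:.2f} GHz, peaking = {peaking_db:.1f} dB")
```

With these assumed values the equaliser attenuates low frequencies and restores the full gain gmR above the pole, so the boost is obtained by giving up DC gain rather than by adding gain at high frequencies.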
The use of zeros to offset the effect of the poles of wire
channels bears a strong resemblance to the use of filtering
mechanisms to compensate for the loss of wireless channels
so as to shorten channel impulse response length or
equivalently improve the channel bandwidth, for example,
the time truncation of channel impulse response by filtering
proposed in [51]. The computational cost of these
mechanisms, however, makes them difficult to meet the
ever stringent timing constraints of multi-Gbps serial links.
As the received symbol is severely attenuated by the
channel upon arriving at the CTLE, input offset voltage
compensation is also required in CTLE [15, 20]. The order
of CTLE is determined by the attenuation of the channel
and the sensitivity of the slicer. High-order CTLE can be
obtained by cascading low-order CTLEs at the cost of more
power consumption [21]. CTLE is often used in
conjunction with non-linear post-equalisation with the
former providing secondary channel equalisation. As a
result, low-order DFE can be used without sacrificing
performance [26, 49]. CTLE has also been used as a solo
post-equaliser for channels with negligible reflection and
cross-talk. The absence of feedback in this case allows
CTLE to support higher data rates. For example, CTLE in
130 nm CMOS enables 10 Gbps transmission over 30″ FR4
channel of −21 dB loss at half baud-rate frequency and
achieves a bit-error-rate (BER) of 10^−13 [52]. Similarly, CTLE
implemented in 130 nm CMOS supports 10 Gbps
transmission over 34″ FR4 channel with −14 dB loss and
consumes 6 mW [53]. It should be emphasised that CTLE
is only effective in removing channel loss-induced ISI and
ineffective in eliminating crosstalk/reflection-induced ISI
[52].
(2) Non-linear post-equalisation: Unlike CTLE, non-linear
post-equalisation mitigates the effect of channel loss,
reflection and crosstalk by removing the post-cursors of the
received symbol in the digital domain. The most widely
used non-linear equalisation is decision feedback
equalisation (DFE) introduced by Austin in 1967 [54].
Unlike other channel equalisation mechanisms, since DFE
does not amplify the received data symbol, it therefore does
not deteriorate crosstalk with neighbouring devices.
Moreover, since the number, the weight and the order of
the taps of DFE can be adjusted in accordance with the
characteristics of the channel to be equalised, DFE is not
only most effective in eliminating ISI caused by the finite
bandwidth of the channel, it is also most effective in
eliminating ISI caused by reflection and crosstalk. Since
DFE only utilises the past decisions, it has no effect on
pre-cursors. As a result, the only effective means available
for us to combat pre-cursors is pre-emphasis. Fig. 5 shows
the basic configuration and operation of DFE. The slicer
clocked by the recovered clock senses the difference
between the partially equalised symbol vin − vf and
reference vref. The output of the slicer passes through N
delay stages of unit delay where N is the number of the
post-cursors to be removed. The output of each delay stage
is multiplied by weight factor ck such that Hk = ckDj−k,
where Hk is the kth post-cursor and Dj−k is the (j − k)th past
decision of the slicer. It is seen that
$$D_j = v_{in,j} - \sum_{n=1}^{N} c_n D_{j-n},$$
where the nth term of the sum, v_{f,n} = c_n D_{j−n}, is the feedback
contributed by tap n. The operation of DFE is
illustrated using four consecutive logic-1 symbols S1–S4.
The received symbols are overlapped because of ISI. Let
Hm,n denote nth post-cursor of symbol-m with Hm,0 the
main cursor. Observe that vin[4] contains four components:
the main cursor of symbol-4 H4,0, the first post-cursor of
symbol-3 H3,1, the second post-cursor of symbol-2 H2,2 and
the third post-cursor of symbol-1 H1,3, that is, vin[4] = H4,0
+ H3,1 + H2,2 + H1,3. If the feedback vf contains vf = H3,1 +
H2,2 + H1,3 = c3D3 + c2D2 + c1D1, then when it is subtracted
from vin[4], we have vs[4] = vin[4] − vf = H4,0. H4,0 can
therefore be safely digitised regardless of the presence of
ISI. It is evident that DFE has no impact on both noise
present in current symbol and pre-cursors.
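The post-cursor cancellation described above can be sketched behaviourally as follows. This is a minimal model under stated assumptions, not a circuit description: symbols are taken as ±1 rather than logic levels, the channel is a hypothetical pulse response with a main cursor of 1.0 and three post-cursors, the tap weights are set equal to those post-cursors, and noise and pre-cursors are ignored.

```python
def channel(bits, pulse):
    """Superpose the pulse response (main cursor, then post-cursors) of +/-1 symbols."""
    out = [0.0] * len(bits)
    for m, b in enumerate(bits):
        for n, h in enumerate(pulse):
            if m + n < len(bits):
                out[m + n] += b * h
    return out

def dfe_receive(samples, taps, vref=0.0):
    """Slice v_s = v_in - v_f, where v_f = sum_k c_k * D_{j-k} over past decisions."""
    decisions = []
    for v_in in samples:
        v_f = sum(c * decisions[-k]
                  for k, c in enumerate(taps, start=1)
                  if k <= len(decisions))
        v_s = v_in - v_f
        decisions.append(1 if v_s > vref else -1)
    return decisions

def plain_slicer(samples, vref=0.0):
    """Reference receiver without any feedback."""
    return [1 if v > vref else -1 for v in samples]

# Hypothetical pulse response: post-cursors sum to more than the main cursor.
PULSE = [1.0, 0.6, 0.3, 0.2]
BITS = [1, 1, 1, -1, 1, -1, -1, 1]

rx = channel(BITS, PULSE)
without_dfe = plain_slicer(rx)
with_dfe = dfe_receive(rx, PULSE[1:])   # taps c1..c3 equal the post-cursors

print("sent       :", BITS)
print("without DFE:", without_dfe)
print("with DFE   :", with_dfe)
```

In this example the fourth symbol is buried under the post-cursors of the three preceding symbols, so the plain slicer misreads it, while the DFE subtracts c1D3 + c2D2 + c3D1 first and recovers it.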
3 Design challenges in decision feedback equalisation
3.1 Bit error rate test
The performance of serial links is primarily quantified by
BER, obtained by transmitting a pseudo-random bit stream
(PRBS) through the channel and recording the number of
transmission errors; typically BER = 10^−12 is required.
Transmission errors are obtained using a PRBS checker that
compares the transmitted bits with the corresponding
received bits. Although PRBS7 (7-bit PRBS) has been used
[20], it is primarily for testing serial links with 8B/10B
encoded data. PRBS31 (31-bit PRBS), which provides a
sufficient transition density, is preferred, especially for links
using 64B/66B encoded data. PRBS can be generated
using linear feedback-shift registers, although parallel PRBS
generators are also available [55]. Since eye-opening is
typically maximised at the centre of the data eye, where
BER is minimised, and gradually shrinks towards the
edges of the data eye, where BER climbs, the horizontal
eye-opening at a given BER, for example 10^−8 [56], 10^−9
[14, 20, 40] or 10^−10 [36, 37, 57], is usually used as a
figure-of-merit to quantify the performance, as shown in
Fig. 6. The bathtub curves are obtained by varying the
sampling instant within one UI while evaluating BER for
each sampling instant [14]. It is seen that BER is minimised
at the centre of the data eye and gradually rises when the
sampling instant moves away from the centre towards the
edge of the data eye.
Fig. 5 Basic configuration and operation of decision feedback equalisation
Legends: UI – Delay cell with one unit delay; vs = vin − vf is the symbol before the slicer;
vref is the threshold of the slicer; Dj−1 is the delayed version of Dj; c1,…, cN are the weighting factors of feedback taps; Ts is the symbol time; Hj,k denotes the kth
post-cursor of symbol-j and Hj,0 the main cursor of symbol-j
Fig. 6 Dependence of horizontal eye-opening on BER
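The PRBS-based test described above can be sketched in software as follows. The generator uses the common PRBS7 polynomial x^7 + x^6 + 1 implemented as a linear feedback-shift register; the function names, the seed and the single injected error are illustrative assumptions and do not come from the paper or from any particular test instrument.

```python
def prbs7(n_bits, seed=0x7F):
    """Generate n_bits of PRBS7 (x^7 + x^6 + 1) with a 7-bit LFSR."""
    state = seed & 0x7F
    if state == 0:
        raise ValueError("seed must be non-zero, otherwise the LFSR locks up")
    bits = []
    for _ in range(n_bits):
        new_bit = ((state >> 6) ^ (state >> 5)) & 1   # XOR of stages 7 and 6
        bits.append(new_bit)
        state = ((state << 1) | new_bit) & 0x7F
    return bits

def count_errors(sent, received):
    """PRBS checker: count positions where the received bit differs."""
    return sum(1 for s, r in zip(sent, received) if s != r)

def bit_error_rate(sent, received):
    return count_errors(sent, received) / len(sent)

sequence = prbs7(254)              # two full periods of 2^7 - 1 = 127 bits
corrupted = list(sequence)
corrupted[10] ^= 1                 # inject a single transmission error

print("period repeats:", sequence[:127] == sequence[127:])
print("ones per period:", sum(sequence[:127]))
print("BER with one injected error:", bit_error_rate(sequence, corrupted))
```

Verifying BER = 10^−12 this way requires observing on the order of 10^12 bits or more, which is why such tests run in hardware at line rate rather than in simulation.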
3.2 Timing constraints
DFE operations, including data slicing, multiplication and
subtraction, must be completed within one UI. Since there is
only one UI between current symbol and tap-1 feedback,
tap-1 loop has the most stringent constraint. An effective
way to overcome this difficulty is to feed the error signal
for all possible feedback signals ( +1 or −1) to two
identical slicers, as shown in Fig. 7. The decisions of the
slicers are 2-to-1 multiplexed with its output selected by the
previous decision [7, 8, 13, 14]. This approach was
originally proposed by Kasturia et al. [58] and is known as
loop unrolling, speculation, look-ahead, or partial-response
[59]. The use of loop unrolling is often limited to tap-1 as
the number of slicers and summers increases exponentially
with the number of taps. In recent DFE implementations,
loop unrolling has also been used for first two taps to cope
with 28 Gbps data rate [38] and first 3 taps to cope with 30
Gbps data rate [36, 37, 57]. As pointed out in [6, 60], the
delay of the slicer could be long if its input is small. To
overcome this, the insertion of an auxiliary amplifier
between the CTLE and the slicer has been shown to be
effective. This approach, however, becomes less effective
once data rate is high. The remaining DFE taps are
generated using clocked delay stages and their outputs are
weighted and subtracted from the output of the CTLE. This
approach is known as direct feedback [8] or dynamic
feedback [46]. To relax timing constraint and lower power
consumption, half-rate approach where the remaining taps
are operated at half the data rate, as depicted graphically in
Fig. 7, is widely favoured [8]. Quarter-rate approach has
also been used to further relax timing constraint in recent
designs [36, 37, 46, 57].
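The loop-unrolling idea can be illustrated with a behavioural one-tap model. In the sketch below, the direct-feedback version must wait for the previous decision before it can slice, whereas the unrolled version slices the input against both possible feedback values at once and lets the previous decision pick one result through a 2-to-1 multiplexer. The ±1 decision levels, the initial decision of +1 and the sample values are assumptions made for illustration.

```python
def dfe_direct(samples, c1, d_init=1):
    """One-tap DFE with direct feedback: subtract c1 * D_{j-1}, then slice."""
    decisions, d_prev = [], d_init
    for v in samples:
        d = 1 if v - c1 * d_prev > 0 else -1
        decisions.append(d)
        d_prev = d
    return decisions

def dfe_unrolled(samples, c1, d_init=1):
    """One-tap DFE with loop unrolling (speculation)."""
    decisions, d_prev = [], d_init
    for v in samples:
        # Two slicers work in parallel, one per possible previous decision.
        d_if_prev_plus = 1 if v - c1 > 0 else -1    # assumes D_{j-1} = +1
        d_if_prev_minus = 1 if v + c1 > 0 else -1   # assumes D_{j-1} = -1
        # The multiplexer only selects; no arithmetic remains in the loop.
        d = d_if_prev_plus if d_prev == 1 else d_if_prev_minus
        decisions.append(d)
        d_prev = d
    return decisions

# Hypothetical received samples and tap weight.
SAMPLES = [0.9, 0.2, -0.4, -1.3, 0.1, 1.4, -0.2, 0.45, -0.55]
C1 = 0.5

direct = dfe_direct(SAMPLES, C1)
unrolled = dfe_unrolled(SAMPLES, C1)
print("direct  :", direct)
print("unrolled:", unrolled)
```

Both versions produce the same decisions; the benefit of unrolling is purely in timing, since only the multiplexer select remains in the critical tap-1 loop. Extending the same scheme to N taps needs 2^N speculative slicers, which is the exponential cost noted above.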
3.3 Sampling
The slicer is typically implemented using a re-generative
sense amplifier for speed, as shown in Fig. 8 [13, 15, 61,
62]. When φ = 0, the re-generative mechanism is disabled
and the input and output of the cross-coupled inverters are
set to be equal, forcing their operating point to the
transition region where a maximum transconductance exists.
The input and reference voltage, in the meantime, are
sampled by the input capacitors of the slicer. M6 and M11 are
turned on, equalising the voltages of the two branches.
In the following φ = 1 phase, the re-generative mechanism
is activated, forcing the output of the slicer to respond to the
voltage sampled by M2 and M5 in the previous phase.
Fig. 7 Half-rate decision feedback equalisation
Tap-1 is implemented using loop-unrolling while the remaining taps are implemented using direct feedback with a complementary clock whose frequency is only
half that required for the data rate
Fig. 8 Clocked re-generative sense amplifier as slicer [13, 15, 61, 62]
Since the signal at the far end of the channel is attenuated,
the input offset voltage of the slicer must be sufficiently small
in order to minimise slicer error. It was shown in [63] that the
BER of a slicer with an input offset voltage Vos is given by
$$\mathrm{BER} \simeq \frac{1}{2}\,\mathrm{Err}\!\left(\frac{V_m - V_{os}}{\sqrt{V_n^2}}\right) \qquad (6)$$
where V_n^2 is the input-referred noise power, Vm is the voltage
of the input and
$$\mathrm{Err}(x) = \frac{1}{\sqrt{2\pi}} \int_x^{\infty} e^{-u^2/2}\,du$$
is the error function. Clearly the input offset voltage directly
affects the BER. Large input devices are preferred from a low
input offset voltage point of view [64]. Small input devices,
on the other hand, have the advantage of a low input
capacitance and therefore high speed, but at the cost of
deteriorating input offset voltage.
The calibration of the slicer prior to any data-slicing
operation is mandatory. In [7], a background calibration
method was used to calibrate slicers. Specifically, two
identical slicers operated in an interleaved manner are
employed. One slicer is connected to the input for sensing
(on-line slicer) while the other slicer is connected to a reference
voltage for calibration (off-line slicer) [9]. The calibration
of the off-line slicer can be accomplished using
conventional auto-zero techniques or dynamic offset control
techniques [65] to avoid the power penalty of current
array-based offset compensation [47] and the speed penalty
of capacitor array-based offset compensation [48]. The
digital offset compensation technique proposed in [66]
detects the effect of offset voltage on the duty-cycle of the
output and utilises the detected duty-cycle imbalance to
adjust the biasing currents so as to eliminate the duty-cycle
imbalance and subsequently the offset effect. The method
avoids the deployment of compensation capacitors at the
input of the slicer, which has a detrimental impact on speed.
3.4 Error propagation
If the output of the slicer does not correspond to the received
data symbol, a slice error occurs. The slicer error will not only
affect the current decision, it will also affect the next decision.
Moreover, the error will propagate through the entire delay
chain. This error propagation characteristic is a fundamental
drawback of DFE. Clearly, to minimise the possibility of
slicer error, the input of the slicer prior to slicing must be
sufficiently large and disturbance-free.
There are a number of factors that contribute to slice error.
The first is the input offset voltage of the slicer. As detailed in
Section 3.3, the input offset voltage of the slicer must be
compensated in order to improve the sensitivity of the slicer
unless the signal from the preceding CTLE is large enough,
revealing the importance to have a CTLE preceding the
slicer. The feedback signal from the summer must also be
stable prior to slicing operation. To achieve this, a stringent
timing constraint is imposed on the feedback path where a
number of arithmetic blocks exist, as detailed in Section
3.5. Since the slicer is differentially configured, a proper
common-mode voltage must be in place so that a small
differential input from the preceding CTLE can be sensed
and latched up correctly. Current-integrating where the
incoming signal is integrated over a capacitor and
the resultant capacitor voltage is sampled at the end of the
integration phase is an effective means to minimise the
effect of transient disturbances so as to minimise slicer error
[67, 68]. This approach is effective if the duration of the
disturbance is much smaller as compared with the symbol
time. Difficulty also arises when the data rate is high unless
the integrating capacitor is sufficiently small. A small
integrating capacitor, however, magnifies the effect of the
parasitic capacitances of devices. The error of the slicer is
minimised if the incoming signal is sufficiently large. This
can be achieved using CTLE prior to slicing, as shown in
Fig. 4 earlier [26]. Note that CTLE also provides partial
channel equalisation. Some implementations also employ a
pre-amplifier preceding CTLE to boost the received signal
prior to equalisation so as to minimise the possibility of
slice error [3, 4]. The effect of the kick-back of the slicer
should also be taken into consideration [69, 70].
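The sensitivity of slicer error to input offset can be illustrated numerically with the expression from Section 3.3, read here as BER ≃ ½·Err((Vm − Vos)/√(Vn²)) with Err(x) = (1/√(2π))∫ from x to ∞ of e^(−u²/2) du. That reading of Eq. (6), the use of the identity Err(x) = ½·erfc(x/√2), and the signal, offset and noise values below are assumptions made for this sketch.

```python
import math

def err(x):
    """Gaussian tail integral (1/sqrt(2*pi)) * integral_x^inf exp(-u^2/2) du."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def slicer_ber(v_m, v_os, v_n_rms):
    """BER ~ 0.5 * Err((Vm - Vos) / sqrt(Vn^2)), with v_n_rms = sqrt(Vn^2)."""
    return 0.5 * err((v_m - v_os) / v_n_rms)

# Hypothetical values: 50 mV signal at the slicer input, 5 mV rms noise.
V_M = 50e-3
V_N_RMS = 5e-3

for v_os in (0.0, 10e-3, 20e-3):
    print(f"Vos = {v_os*1e3:4.0f} mV -> BER ~ {slicer_ber(V_M, v_os, V_N_RMS):.2e}")
```

Because the Gaussian tail falls off so steeply, an offset that consumes even a modest fraction of the input amplitude raises the estimated BER by many orders of magnitude, which is why offset compensation or a larger signal from the preceding CTLE matters.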
3.5 Arithmetic operation
Both multiplication and summation operations are needed in
DFE. Not only these operations must be completed within
one UI, the result must also be stable prior to any slicing
operation. Multiplication of a past Boolean decision by a
weighting factor is most efficiently implemented using
current-steering configurations, as shown in Fig. 9. The
delay of the summer usually dominates the speed of
arithmetic operation because of the large number of the taps
[8]. Current summation is the most widely favoured over its
voltage counterpart because of its ease of implementation
and high-speed operation. The speed of the current-mode
resistor-load summer shown in Fig. 9 (without inductors) is
set by the time constant of the current summation node.
Since vin is attenuated, the transistors driven by vin are
operated in saturation whereas those in the tap stages are
operated in an ON/OFF mode. Lowering the load resistance
R reduces the time constant, however, at the cost of reduced
output voltage swing. As the output of the summer directly
IET Circuits Devices Syst., 2014, Vol. 8, Iss. 2, pp. 118–130
doi: 10.1049/iet-cds.2013.0159
www.ietdl.org
Fig. 9 Current-mode summer with resistor loads
When the dotted portion replaces the upper portion of the current-mode summer with resistor loads, it becomes a current-integrating summer
The weight of tap-1, … , tap-N is tuned by varying the tail currents c1, … , cN, respectively
When the dotted portion replaces the lower portion of the current-mode summer with resistor loads, it becomes a capacitive charge feedback summer
feeds the slicer, a large output voltage of the summer is
essential to minimise slicer error. Increasing the dimension
of the input transistors improves output voltage swing, it,
however, also reduces the speed. One effective way to
speed up the summer without reducing the load resistance
is shunt inductor peaking, as shown in Fig. 9 [26]. In [12,
40, 71], a current-integrating summer shown in Fig. 9 was
proposed. To improve speed, source degeneration is widely
adopted. The load resistors are replaced with pMOS
transistors operated in an ON/OFF mode. In the reset phase,
pMOS transistors are switched on and the load capacitors
are charged to the supply voltage. During the following
integration phase, pMOS transistors are switched off and
the capacitors are discharged by the tail current sources
representing the feedback taps. To eliminate the effect of vin
during discharge, vin is disconnected from the gate of the
input transistors [40]. The current-integrating summer offers
the key advantage of reduced power consumption because
there is no static current flowing from VDD to ground in
both the reset and integrating phases [19, 56, 72]. The
speed of the current-integrating summer can be further
increased by replacing current feedback taps with capacitive
charge feedback, as shown in Fig. 9, with capacitance
proportional to DFE tap coefficients [36, 37, 57]. To further
reduce power consumption, switched-capacitor summers
were proposed [72]. Since complex clock schemes are
needed for their proper operation, switched-capacitor
summers are typically used to perform the summation of
the first-tap with the rest of the taps implemented using the
current-integrating approach depicted earlier.
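As a rough numerical illustration of the current-integrating summer described above, the following sketch computes the differential output voltage at the end of the integration phase. It is an idealised model only: ideal switches, constant currents and a linear load capacitor are assumed, and the function name, the sign convention of the feedback taps and all component values are hypothetical rather than taken from [12, 40, 71].

```python
def integrating_summer_output(i_in, tap_currents, decisions, c_load, t_int):
    """Idealised differential output of a current-integrating summer.

    i_in         -- differential signal current of the input pair (A)
    tap_currents -- tail currents c1..cN that set the tap weights (A)
    decisions    -- previous decisions, each +1 or -1
    c_load       -- load capacitance per output node (F)
    t_int        -- duration of the integration phase (s)
    """
    # Each feedback tap steers its tail current according to a past decision;
    # here its contribution is assumed to be subtracted from the signal current.
    i_fb = sum(c * d for c, d in zip(tap_currents, decisions))
    # Both output nodes start at VDD after reset, so the differential voltage
    # developed during integration is (net differential current) * t / C.
    return (i_in - i_fb) * t_int / c_load


# Hypothetical values: 100 uA signal, two taps, 50 fF load, 50 ps integration.
v = integrating_summer_output(
    i_in=100e-6, tap_currents=[40e-6, 10e-6], decisions=[+1, -1],
    c_load=50e-15, t_int=50e-12)
print(f"differential output = {v * 1e3:.1f} mV")
```

In this model the output scales with the integration time and inversely with the load capacitance, while the tail currents set the tap weights.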
The characteristics of channels with severe reflection,
however, differ from those of channels with high loss but
insignificant reflection. The impulse responses of these
channels typically have post-cursors that reside far away
from the main cursors. The post-cursors between the
dominating post-cursors typically immediately following
the main cursor and reflection-induced post-cursors are
often insignificant, leading to sparsity in post-cursor
distribution. Although there are many effective means to
equalise sparse wireless channels [73, 74], these
approaches cannot be adopted for wire channels because of
the need for excessive computation and, subsequently, long
latency. Equalisation of these channels with a conventional
DFE requires a long fixed-tap DFE even though many of the
taps, namely those corresponding to the post-cursors between
the main cursor and the remotely placed post-cursors, are
insignificant, resulting in excessive power and silicon consumption.
Floating-tap DFE proposed by Zhong et al. is an elegant
technique effective in combating reflection-induced
post-cursors located far away from the main cursor [2–4].
In this approach, a number of fixed-taps are used to
remove dominant post-cursors located close to the main
cursor. In addition, a number of floating-taps are used to
remove reflection-induced post-cursors; their locations are
not fixed but rather determined by an optimisation algorithm
that yields the largest tap coefficients and, subsequently,
the best performance. Although extra
computation is needed, this additional cost is well justified
by the elimination of the remote post-cursors.
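The placement of fixed and floating taps can be sketched as follows. This is a minimal illustration under stated assumptions: the post-cursor values are hypothetical, and the floating taps are simply placed on the largest-magnitude post-cursors beyond the fixed taps, which is a stand-in for, not a reproduction of, the optimisation algorithm of [2–4].

```python
def place_taps(post_cursors, n_fixed, n_floating):
    """Return (fixed_positions, floating_positions), 1-indexed in symbol intervals.

    post_cursors[k-1] is the post-cursor k symbol intervals after the main cursor.
    """
    # Fixed taps cover the dominant post-cursors right after the main cursor.
    fixed = list(range(1, n_fixed + 1))
    # Candidate positions for floating taps lie beyond the fixed taps.
    remote = [(abs(h), k) for k, h in enumerate(post_cursors, start=1)
              if k > n_fixed]
    # Assumed selection rule: take the largest-magnitude remaining post-cursors.
    remote.sort(reverse=True)
    floating = sorted(k for _, k in remote[:n_floating])
    return fixed, floating


# Hypothetical sparse response: strong near post-cursors, reflection at 8 and 9.
post = [0.30, 0.15, 0.02, 0.01, 0.00, 0.01, 0.00, 0.08, -0.06, 0.01]
fixed, floating = place_taps(post, n_fixed=2, n_floating=2)
print(fixed, floating)
```

With four taps in total, the sketch covers the same dominant post-cursors that a nine-tap fixed DFE would need to reach.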
3.6
Channels with severe dispersity
The impulse response of severely dispersive channels
stretches over a large number of symbol intervals, as seen
in Fig. 2 [4]. To equalise these channels, a large number of
taps is needed, resulting in excessive power and silicon
consumption. Efficient DFE with a small number of taps
without sacrificing performance is highly desirable.
DFE with an analog infinite-impulse-response (IIR) filter
uses an analog IIR filter to mimic the response of the
channel such that when subtracted from the response of the
channel, the tail is removed without using a large number
of DFE taps [20]. This approach works well for highly
dispersive channels.
3.7
Power consumption
The power consumption of a decision feedback equaliser
consists of the power consumption of the slicer, the delay
units and the summer. The power consumption of the
summers is significant because of their current-mode
configuration. When loop-unrolling is employed, additional
power consumption exists. The DFE-IIR examined earlier
offers an attractive means to reduce power consumption,
especially for highly dispersive channels. In [46], a
soft-decision DFE was proposed to replace loop-unrolling
and dynamic feedbacks without sacrificing speed. Instead of
employing two slicers and other logic circuits, soft-decision
DFE uses sample-and-hold before the summation and
latches after the summation to perform channel equalisation.
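The saving offered by the DFE-IIR examined earlier can be illustrated with a discrete-time sketch. It assumes that the post-cursor tail decays exponentially, h_k = a·r^(k−1), so that a single first-order IIR feedback filter reproduces the whole tail; the design in [20] is an analog continuous-time circuit, and the names and values below are hypothetical.

```python
def iir_feedback(decisions, a, r):
    """First-order IIR feedback: y[n] = r*y[n-1] + a*d[n-1]."""
    y, prev_d, out = 0.0, 0.0, []
    for d in decisions:
        y = r * y + a * prev_d
        out.append(y)
        prev_d = d
    return out


def tail_isi(decisions, a, r, n_taps):
    """ISI produced by the first n_taps post-cursors h_k = a * r**(k-1)."""
    out = []
    for n in range(len(decisions)):
        s = 0.0
        for k in range(1, n_taps + 1):
            if n - k >= 0:
                s += a * r ** (k - 1) * decisions[n - k]
        out.append(s)
    return out


bits = [1, -1, -1, 1, 1, 1, -1, 1, -1, -1, 1, -1]
a, r = 0.3, 0.7                                   # hypothetical tail parameters
full = tail_isi(bits, a, r, n_taps=len(bits))     # complete post-cursor tail
iir = iir_feedback(bits, a, r)                    # one IIR tap
fir3 = tail_isi(bits, a, r, n_taps=3)             # conventional 3-tap DFE
err_iir = max(abs(x - y) for x, y in zip(full, iir))
err_fir3 = max(abs(x - y) for x, y in zip(full, fir3))
print(f"residual ISI: IIR {err_iir:.2e}, 3-tap FIR {err_fir3:.3f}")
```

Under this assumption the single IIR tap cancels the entire tail, whereas the three-tap DFE leaves the post-cursors from the fourth onwards uncancelled.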
4
Adaptive decision feedback equalisation
The variation of wire channels requires adaptive DFE so
that the tap coefficients of DFE can be set in accordance with
the characteristics of the channels. In this section, we examine
the adaptive DFE algorithms that are widely used for
multi-Gbps serial links.
4.1
Least-mean-square (LMS) adaptive DFE
LMS adaptation updates the DFE tap coefficients in such a
way that the power of the error between the output and
input of the slicer is minimised, that is, minimise
||Dj − vs,j||, as shown in Fig. 10. The tap coefficients cj,k in
step k of DFE are updated using [75]
$$c_{j,k+1} = c_{j,k} + h\, e_k\, v_{k-j} \qquad (7)$$
where h is the step size used to adjust the tap coefficients.
LMS is difficult to implement because of the need for the
value of ek and signal vk − j that can only be obtained using
ADCs. Sign-sign LMS (SS-LMS), where only the signs of ek
and vk − j are used, is proven to be an effective alternative
$$c_{j,k+1} = c_{j,k} + h\, \mathrm{sign}(e_k)\, \mathrm{sign}(v_{k-j}) \qquad (8)$$
sign(ek) and sign(vk − j) can be obtained conveniently using
comparators. Since SS-LMS searches for the optimal tap
coefficients based on the binary decision of the sign of ek
and vk − j, the final value of the tap coefficients will
fluctuate in the vicinity of the optimal taps. In practice, a
smaller h is typically used by SS-LMS to reduce the
fluctuation. The convergence time of SS-LMS, however,
will be longer as compared with regular LMS.
Fig. 10 Configuration of SS-LMS adaptive DFE
4.2
Eye-opening adaptive DFE
The opening of data eyes can be used to guide the search for
the optimal parameters of DFE [6, 60]. Eye-opening can be
captured using an eye-opening monitor (EOM), as shown in
Fig. 11. A one-dimensional EOM quantifies the opening
of data eyes by either the vertical or horizontal dimension
of the eye, with the vertical opening being the most widely used
(Fig. 12) [21, 35, 76–79]. The horizontal eye-opening can
be determined from oversampling received data symbols.
This, however, is at the cost of high silicon and power
consumption [61]. Improvements were made in [24, 80]
where both the edges and centre of the eye are used to
quantify the opening of the eye. A two-dimensional EOM
measures the dimension of the eye in both the vertical and
horizontal directions [81–84]. Since the edge of the eye tc is
determined by CDR, tH and tL can be chosen for the
desired horizontal eye-opening using delay blocks.
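The operation of a two-dimensional EOM can be sketched as a count of the samples that fall inside a rectangular mask bounded by VH, VL, tL and tH. The rectangular mask shape, the normalisation of time to one symbol interval and the sample values are assumptions made for illustration only.

```python
def mask_violations(samples, v_h, v_l, t_l, t_h):
    """Count samples (t, v) that fall inside the rectangular eye mask.

    t is the sampling instant within the symbol interval (normalised to 0..1),
    v is the sampled voltage; a sample inside the mask closes the eye.
    """
    return sum(1 for t, v in samples
               if t_l <= t <= t_h and v_l < v < v_h)


# Hypothetical (time, voltage) samples of the equalised signal.
samples = [(0.50, 0.40), (0.50, -0.35), (0.45, 0.05),
           (0.20, 0.00), (0.60, -0.02)]
n_bad = mask_violations(samples, v_h=0.1, v_l=-0.1, t_l=0.4, t_h=0.6)
print(n_bad)
```

A one-dimensional EOM corresponds to the special case in which tL and tH coincide at the sampling instant, so that only VH and VL are tested.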
In one-dimensional EOM-based adaptive DFE, we are only
concerned with the cases where vin < VH and vin > VL and do
not care about the cases where vin > VH or vin < VL. LMS, on the
other hand, searches for optimal tap coefficients that
minimise ||vin − VH ||. Clearly LMS will converge more slowly
than one-dimensional EOM-based adaptive algorithms, which
only need to satisfy the relaxed constraint vin > VH
or vin < VL. This is the fundamental difference between
SS-LMS adaptive DFE and EOM-based adaptive DFE. An
attractive characteristic of EOM-based adaptive DFE is the
freedom to adjust VH and VL if a one-dimensional EOM is
used and VH, VL, tH and tL if a two-dimensional EOM is used.
Clearly the increase in the number of constraints of adaptive
DFE will result in better eye-opening and, subsequently, better BER.
Fig. 11 Eye-opening monitors
a One-dimensional EOM
b Two-dimensional EOM
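The contrast drawn in Section 4.2 between SS-LMS and one-dimensional EOM-based adaptation can be sketched as follows. The tap update is the sign-sign rule of (8); the EOM error with a dead zone outside VL < vin < VH, its sign convention and all numerical values are assumptions introduced for illustration and are not taken from the cited designs.

```python
def sign(x):
    """Return -1, 0 or +1."""
    return (x > 0) - (x < 0)


def ss_lms_update(c, e_k, v_past, h):
    """Sign-sign update of (8): c[j] <- c[j] + h*sign(e_k)*sign(v[k-j])."""
    return [c_j + h * sign(e_k) * sign(v) for c_j, v in zip(c, v_past)]


def eom_error(v_in, v_h, v_l):
    """Assumed EOM error: zero once the sample clears the mask VL < v_in < VH."""
    if v_in >= v_h or v_in <= v_l:
        return 0.0                      # relaxed constraint met, no update
    return v_h - v_in if v_in >= 0 else v_l - v_in


c = [0.20, 0.05]                        # hypothetical tap coefficients
v_past = [0.8, -0.6]                    # hypothetical v[k-1], v[k-2]
# Sample outside the mask: the taps are left unchanged.
c_ok = ss_lms_update(c, eom_error(0.45, 0.3, -0.3), v_past, h=0.01)
# Sample inside the mask: every tap moves by one step h.
c_bad = ss_lms_update(c, eom_error(0.10, 0.3, -0.3), v_past, h=0.01)
print(c_ok, c_bad)
```

An error of the form VH − vin, by contrast, is non-zero for almost every sample, so the taps keep moving even when the eye is already open.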
Fig. 12 One-dimensional eye-opening adaptive DFE [78]
4.3
Jitter-based adaptive DFE
The intrinsic relation between the vertical and horizontal
openings of data eyes reveals that minimising timing jitter
at the edges of the eye will also maximise the vertical
opening of the data eye as illustrated graphically in Fig. 13.
To illustrate this, we represent the eye-diagram with zero
jitter with a sinusoid $v_s(t) = V_m \sin(\omega_s t)$, where $\omega_s = \pi/T_s$, as
shown in Fig. 13. We further assume that the eye-diagram
with non-zero jitter is simply the down-shifted version of
the one without jitter, that is, $\hat{v}_s(t) = V_m \sin(\omega_s t) - \Delta V_m$,
where $\Delta V_m$ is the variation of the amplitude because of
timing jitter. It is straightforward to show that
$\Delta t = \frac{T_s}{\pi} \sin^{-1}\left(\frac{\Delta V_m}{V_m}\right)$.
If $\Delta t \ll T_s$, we have
$\Delta t \simeq \frac{T_s}{\pi} \frac{\Delta V_m}{V_m}$. It is evident that ΔVm is directly
proportional to Δt.
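The relation between ΔVm and Δt can be checked numerically. The sketch below evaluates the exact zero-crossing shift of the down-shifted sinusoid and its linear approximation; the symbol interval and the amplitudes are hypothetical values chosen only for the check.

```python
import math


def edge_shift(delta_vm, vm, ts):
    """Exact zero-crossing shift: dt = (Ts/pi) * asin(dVm/Vm)."""
    return ts / math.pi * math.asin(delta_vm / vm)


def edge_shift_linear(delta_vm, vm, ts):
    """Small-jitter approximation: dt ~= (Ts/pi) * (dVm/Vm)."""
    return ts / math.pi * delta_vm / vm


ts = 100e-12                 # hypothetical symbol interval (10 Gbps)
vm, delta_vm = 1.0, 0.05     # hypothetical amplitude and amplitude variation
exact = edge_shift(delta_vm, vm, ts)
approx = edge_shift_linear(delta_vm, vm, ts)
rel_err = abs(exact - approx) / exact
# The shifted waveform Vm*sin(pi*t/Ts) - dVm crosses zero at t = exact.
residual = vm * math.sin(math.pi * exact / ts) - delta_vm
print(f"exact {exact * 1e12:.3f} ps, linear {approx * 1e12:.3f} ps")
```

For an amplitude variation of 5% the two expressions differ by less than 0.1%, which supports treating ΔVm and Δt as proportional when the jitter is small.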
In [85], a jitter-based EOM was proposed. The transition
edge of data eyes is sampled by a number of samplers, as
shown in Fig. 14. XOR gates are used to determine the
location of the transition edges and counters to record the
number of transitions at each sampling position. An
edge-transition histogram is generated. Measurement results
demonstrate that the larger the eye-opening, the
narrower the histogram. Since the quality of the obtained
histogram depends upon the number of samplers,
this method is power hungry. Also, it becomes difficult
to employ multiple samplers when data rate is high.
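The construction of the edge-transition histogram can be sketched as follows. Adjacent sampler outputs are XORed to locate the transition within each symbol interval and one counter per position accumulates the result; the number of samplers and the sample patterns are hypothetical and do not reproduce the circuit of [85].

```python
def edge_histogram(sample_rows):
    """Build an edge-transition histogram from oversampled data.

    sample_rows -- one list of sampler outputs (0 or 1) per symbol interval,
                   ordered by sampling position across the transition region
    Returns one count per gap between adjacent samplers.
    """
    n = len(sample_rows[0])
    counts = [0] * (n - 1)
    for row in sample_rows:
        for i in range(n - 1):
            # XOR is 1 only when the edge lies between samplers i and i+1.
            counts[i] += row[i] ^ row[i + 1]
    return counts


# Hypothetical outputs of six samplers over six symbol intervals.
rows = [
    [0, 0, 0, 1, 1, 1],
    [1, 1, 1, 0, 0, 0],
    [0, 0, 1, 1, 1, 1],
    [1, 1, 1, 1, 0, 0],
    [0, 0, 0, 1, 1, 1],
    [1, 1, 1, 1, 1, 1],   # no transition in this interval
]
hist = edge_histogram(rows)
print(hist)
```

A narrow histogram concentrated in one bin indicates small edge jitter, while the resolution of the histogram is set by the number of samplers.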
Fig. 14 Data edge-based EOM by Gerfers et al. [85]
To simultaneously minimise the timing jitter and maximise
the vertical opening, a dual-mode adaptive DFE was
proposed [46]. The dual-mode DFE consists of a data DFE
and an edge DFE with the former maximising the vertical
opening and the latter minimising the timing jitter. The
edge adaptive DFE reduces the eye-edge timing jitter by
30% without sacrificing vertical opening.
4.4
Blind ADC-based adaptive DFE
The effectiveness of EOM-based adaptive DFE depends upon
the choice of the reference voltages used to generate the error
signals. This approach works well with phase-tracking CDR
as recovered data-sampling clock is positioned at the centre
Fig. 13 Relation between vertical opening and edge jitter
a Small eye-edge jitter
b Large eye-edge jitter
Fig. 15 Blind ADC-based adaptive DFE
v∗in : desired post-cursor response
of the data eye. For phase-picking or over-sampling clock and
data recovery, since the centre of the data eye is not known,
EOM-based adaptive DFE becomes less attractive. In [23,
86], a set of eight vertical openings per UI are used to
provide the desired post-cursor profile of the data eye, as
shown in Fig. 15. In each sub-interval, LMS is used to
adjust the tap coefficients of DFE to maximise the vertical
eye-opening in the sub-interval. The incoming data symbol
is digitised using four flash ADCs operated in an
interleaved manner to cope with high data rates [22]. All
adaptive DFE operations are performed in the digital
domain to take advantage of its flexibility. Clearly, this
flexibility comes at the price of power consumption.
4.5
Comparison
The preceding presentation of EOM-based adaptive DFE,
jitter-based adaptive DFE and ADC-based adaptive DFE
reveals the following intrinsic advantages of these adaptive
DFEs as compared with LMS adaptive DFE:
1. EOM and jitter-based adaptive DFE allow designers to
freely set the constraints that the optimisation
algorithms must satisfy. These constraints, such as vertical
and horizontal eye-openings and timing jitter, are directly
related to BER of data links.
2. Multiple constraints such as eye-opening and timing jitter
at the edges of data eyes can be imposed simultaneously to
obtain significantly improved performance, as demonstrated
in [46].
3. The optimisation constraint is entirely set by
users. For example, in [87], a hexagon two-dimensional EOM
was proposed to provide a better measure of the minimum data
eye so as to provide an improved two-dimensional EOM
adaptive DFE.
4. The step size of EOM adaptive DFE can be set adaptively
in accordance with the level of the severity of the violation of
the minimum eye-opening or timing jitter so as to provide
improved adaptivity and performance, as demonstrated in
[81].
5
Future of DFE
Although DFE is most effective in minimising ISI caused by
finite channel bandwidth, reflection and crosstalk, a number
of challenges exist in deploying DFE for multi-Gbps data
communications over highly dispersive channels with
strong reflection and crosstalk. It is highly desirable that
channel equalisation be fully portable from one technology
node to another. ADC-based DFE offers such a flexibility.
Performing analog-to-digital conversion at multi-Gbps
without excessive power consumption, however, remains
a steep cliff to climb. The recent deployment of
three-dimensional MOSFET, also known as FinFET,
reduces the minimum channel length to below 28 nm. The
speed advantage offered by FinFET equips designers with
an effective means to overcome the difficulties encountered
in meeting the timing constraint of DFE. The utilisation of
FinFET in design of the building blocks of DFE is being
carried out by leading SerDes developers worldwide. It is
expected that exciting results will soon emerge. To equalise
highly dispersive channels, a large number of taps are
needed. This is often accompanied by an excessive
amount of power and silicon consumption. Adaptive DFE is
proven to be an effective means to address this. For
channels with strong reflection and crosstalk, since the
location of the reflection and crosstalk varies from one
application to another, the post-cursors caused by the
reflection and crosstalk are often located far away from the main
cursor. As a result, DFE with a large number of taps is
needed. Algorithms that automatically determine the
location of the post-cursors, that is, the sparsity of the
post-cursors, are needed in order to minimise the number of
the taps needed and, subsequently, the silicon and power
consumption of DFE. Portability and adaptivity are
therefore the two most important issues that future DFE
must address while meeting timing requirements.
6
Conclusions
A comprehensive review of decision feedback equalisation
for multi-Gbps data links has been presented. The
imperfections of wire channels and their impact on data
transmission have been investigated. The pros and cons of
near-end and far-end channel equalisations that combat ISI
have been explored. A detailed examination of the
principle, configuration, operation and limitation of DFE
has been provided. Design challenges encountered in DFE
for multi-Gbps data links including BER, timing
constraints, error propagation, arithmetic operation,
sampling and delay cells and circuit techniques addressing
these challenges have been studied. The need for and the
principle of adaptive DFE have also been investigated.
7
Acknowledgment
The authors are deeply grateful to reviewers for their
invaluable comments. The paper could not have been in its
present form without the criticism and suggestion of the
reviewers.
This project was financially supported by Natural Sciences
and Engineering Research Council (NSERC) of Canada.
8
References
1 Hu, A., Yuan, F.: ‘Inter-signal timing skew compensation of parallel
links with voltage-mode incremental signaling’, IEEE Trans. Circuits
and Systems I, 2009, 56, (4), pp. 773–783
2 Aziz, P., Kimura, H., Malipatil, A., Kotagiri, S.: ‘A class of
down-sampled floating tap dfe architectures with application to serial
links’. Proc. IEEE Int. Symp. Circuits and Systems, 2012, pp. 325–328
3 Zhong, F., Quan, S., Liu, W., et al.: ‘A 1.0625-to-14.025 Gb/s
multimedia transceiver with full-rate source-series-terminated transmit
driver and floating-tap decision-feedback equalizer in 40 nm CMOS’.
IEEE Int. Solid-State Circuit Conf. Digest of Technical Papers, 2011,
pp. 348–349
4 Zhong, F., Quan, S., Liu, W., et al.: ‘A 1.0625–14.025 Gb/s multi-media
transceiver with full-rate source-series-terminated transmit driver and
floating-tap decision-feedback equalizer in 40 nm CMOS’, IEEE
J. Solid-State Circuits, 2011, 46, (12), pp. 3126–3139
5 Shahramian, S., Yasotharan, H., Carusone, A.: ‘Decision feedback
equalizer architectures with multiple continuous-time infinite impulse
response filters’, IEEE Trans. Circuits Syst. II, 2012, 59, (6),
pp. 226–230
6 Payne, R., Bhakta, B., Ramaswamy, S., et al.: ‘A 6.25 Gb/s binary
adaptive DFE with first post-cursor tap cancellation for serial
backplane communications’. IEEE Int. Solid-State Circuits Conf.
Digest of Technical Papers, 2005, pp. 68–69
7 Krishna, K., Yokoyama-Martin, D., Caffee, A., et al.: ‘A multi-giga-bit
backplane transceiver core in 0.13-μm CMOS with a power-efficient
equalization architecture’, IEEE J. Solid-State Circuits, 2005, 40, (12),
pp. 2658–2666
8 Beukema, T., Sorna, M., Selander, K., et al.: ‘A 6.4-Gb/s CMOS SerDes
core with feed-forward and decision-feedback equalization’, IEEE
J. Solid-State Circuits, 2005, 40, (12), pp. 2633–2645
9 Stojanovic, V., Ho, A., Garlepp, B., et al.: ‘Autonomous dual-mode
(PAM2/4) serial link transceiver with adaptive equalization and data
recovery’, IEEE J. Solid-State Circuits, 2005, 40, (4), pp. 1012–1026
10 Balan, V., Caroselli, J., Chern, J., et al.: ‘A 4.8-6.4-Gb/s serial link for
backplane applications using decision feedback equalization’, IEEE
J. Solid-State Circuits, 2005, 40, (9), pp. 1957–1967
11 Wong, K., Rylyakov, A., Yang, C.: ‘A 5 mW 6 Gb/s quarter-rate
sampling receiver with a 2-tap DFE using soft decisions’, IEEE
J. Solid-State Circuits, 2007, 42, (4), pp. 881–888
12 Park, M., Bulzacchelli, J., Beakes, M., Friedman, D.: ‘A 7 Gb/s 9.3 mW
2-tap current-integrating DFE receiver’. IEEE Int’l Solid-State Circuits
Conf. Digest of Technical Papers, February 2007, pp. 230–231
13 Leibowitz, B., Kizer, J., Lee, H., et al.: ‘A 7.5 Gb/s 10-tap DFE receiver
with first tap partial response, spectrally gated adaptation, and 2nd-order
data-filtered CDR’. IEEE Int. Solid-State Circuits Conf. Digest of
Technical Papers, February 2007, pp. 228–599
14 Bulzacchelli, J., Meghelli, M., Rylov, S., et al.: ‘A 10-Gb/s 5-tap DFE/
4-tap FFE transceiver in 90 nm CMOS technology’, IEEE J. Solid-State
Circuits, 2006, 41, (12), pp. 2885–2900
15 Hidaka, Y., Gai, W., Horie, T., Jiang, J., Koyanagi, Y., Osone, H.: ‘A
4-channel 1.25-10.3 Gb/s backplane transceiver macro with 35 dB
equalizer and sign-based zero-forcing adaptive control’, IEEE
J. Solid-State Circuits, 2009, 44, (12), pp. 3547–3559
16 Seong, C., Rhim, J., Choi, W.: ‘A 10-Gb/s adaptive look-ahead decision
feedback equalizer with an eye-opening monitor’, IEEE Trans. Circuits
Syst. II, 2012, 59, (4), pp. 209–213
17 Huang, Y., Liu, S.: ‘A 6 Gb/s receiver with 32.7 dB adaptive DFE-IIR
equalization’. IEEE Int. Solid-State Circuits Conf. Digest of Technical
Papers, February 2011, pp. 356–358
18 Turker, D., Rylyakov, A., Friedman, D., Gowda, S., Sanchez-Sinencio,
E.: ‘A 19 Gb/s 30 mW 1-tap speculative DFE receiver in 90 nm CMOS’.
Symp. VLSI Circuits Digest of Technical Papers, 2008, pp. 216–217
19 Bulzacchelli, J., Dickson, T., Deniz, Z., et al.: ‘A 78 mW 11.1 Gb/s
5-tap DFE receiver with digitally calibrated current-integrating
summers in 65 nm CMOS’. IEEE Int’l Solid-State Circuits Conf.
Digest of Technical Papers, February 2009, pp. 368–369
20 Kim, B., Liu, Y., Dickson, T., Bulzacchelli, J., Friedman, D.: ‘A 10-Gb/s
compact low-power serial I/O with DFE-IIR equalization in 65-nm
CMOS’, IEEE J. Solid-State Circuits, 2009, 44, (12), pp. 3526–3538
21 Pozzoni, M., Erba, S., Viola, P., et al.: ‘A multi-standard 1.5 to 10 Gb/s
latch-based 3-tap DFE receiver with a SSC tolerant CDR for serial
backplane communication’, IEEE J. Solid-State Circuits, 2009, 44,
(4), pp. 1306–1315
22 Harwood, M., Warke, N., Simpson, R., et al.: ‘A 12.5 Gb/s SerDes in
65 nm CMOS using a baud-rate ADC with digital receiver
equalization and clock recovery’. Proc. IEEE Custom Integrated
Circuits Conf., 2007, pp. 436–437
23 Abiri, B., Sheikholeslami, A., Tamura, H., Kibune, M.: ‘A 5 Gb/s
adaptive DFE for 2x blind ADC-based CDR in 65 nm CMOS’. IEEE
Int. Solid-State Circuit Conf. Digest of Technical Papers, February
2011, pp. 436–437
24 Sarvari, S., Tahmoureszadeh, T., Sheikholeslami, A., Tamura, H.,
Kibune, M.: ‘A 5 Gb/s speculative DFE for 2x blind ADC-based
receivers in 65-nm CMOS’. Symp. VLSI Circuits Digest of Technical
Papers, June 2010, pp. 69–70
25 Wang, H., Lee, J.: ‘A 21-Gb/s 87-mW transceiver with FFE/DFE/analog
equalizer in 65-nm CMOS technology’, IEEE J. Solid-State Circuits,
2010, 45, (4), pp. 909–920
26 Ying, Y., Liu, S.: ‘A 20 Gb/s digitally adaptive equalizer/DFE with
blind sampling’. IEEE Int. Solid-State Circuits Conf. Digest of
Technical Papers, February 2011, pp. 444–446
27 Dickson, T., Liu, Y., Rylov, S., et al.: ‘An 8x 10-Gb/s
source-synchronous I/O system based on high-density silicon carrier
interconnects’, IEEE J. Solid-State Circuits, 2012, 47, (4), pp. 884–896
28 Nazari, M., Emami-Neyestanak, A.: ‘A 15-Gb/s 0.5-mW/Gbps two-tap
DFE receiver with far-end crosstalk cancellation’, IEEE J. Solid-State
Circuits, 2012, 47, (10), pp. 2420–2432
29 Gangasani, G., Hsu, C., Bulzacchelli, J., et al.: ‘A 16-Gb/s backplane
transceiver with 12-tap current integrating DFE and dynamic adaption
of voltage offset and timing drifts in 45-nm SOI CMOS technology’,
IEEE J. Solid-State Circuits, 2012, 47, (8), pp. 1828–1841
30 Agrawal, A., Dickson, J.B.T., Liu, Y., Tierno, J., Friedman, D.: ‘A
19-Gb/s serial link receiver with both 4-tap FFE and 5-tap DFE
function in 45-nm SOI CMOS’, IEEE J. Solid-State Circuits, 2012,
47, (12), pp. 134–136
31 Amirkhany, A., Kaviani, K., Abbasfar, A., et al.: ‘A 4.1-pJ/b, 16-Gb/s
coded differential bidirectional parallel electrical link’, IEEE
J. Solid-State Circuits, 2012, 47, (12), pp. 3208–3219
32 Kaviani, K., Wu, T., Wei, J., et al.: ‘A tri-modal 20-Gbps/link
differential/DDR3/GDDR5 memory interface’, IEEE J. Solid-State
Circuits, 2012, 47, (4), pp. 926–937
33 Joy, A., Mair, H., Lee, H., et al.: ‘Analog-DFE-based 16 Gb/s SerDes in
40 nm CMOS that operates across 34 dB loss channels at Nyquist with a
baud rate CDR and 1.2 Vpp voltage-mode driver’. IEEE Int’l Solid-State
Circuit Conf. Digest of Technical Papers, February 2011, pp. 350–351
34 Cui, D., Raghavan, B., Singh, U., et al.: ‘A dual-channel 23-Gbps
CMOS transmitter/receiver for 40-Gbps RZ-DQPSK and
CS-RZ-DQPSK optical transmission’, IEEE J. Solid-State Circuits,
2012, 47, (12), pp. 3249–3260
35 Spagna, F., Chen, L., Deshpande, M., et al.: ‘A 78 mW 11.8 Gb/s serial
link transceiver with adaptive RX equalization and baud-rate CDR in 32
nm CMOS’. IEEE Int’l Solid-State Circuits Conf. Digest of Technical
Papers, February 2010, pp. 366–367
36 Toifl, T., Menolfi, C., Ruegg, M., et al.: ‘A 2.6 mW/Gb/s 12.5 Gbps RX
with 8-tap switched-capacitor DFE in 32 nm CMOS’, IEEE
J. Solid-State Circuits, 2012, 47, (4), pp. 897–910
37 Toifl, T., Menolfi, C., Ruegg, M., et al.: ‘A 3.1 mW/Gb/s 30 Gbps
quarter-rate triple-speculation 15-tap SC-DFE RX data path in 32
nm CMOS’. Symp. VLSI Circuits Digest of Technical Papers, 2012,
pp. 102–103
38 Bulzacchelli, J., Menolfi, C., Beukema, T., et al.: ‘A 28-Gb/s 4-tap FFE/
15-tap DFE serial link transceiver in 32-nm SOI CMOS technology’,
IEEE J. Solid-State Circuits, 2012, 47, (12), pp. 324–326
39 Savoj, J., Hsieh, K., Upadhyaya, P., et al.: ‘A wide common-mode
fully-adaptive multi-standard 12.5 Gb/s backplane transceiver in 28 nm
CMOS’. Symp.VLSI Circuits Digest of Technical Papers, 2012,
pp. 104–105
40 Dickson, T., Bulzacchelli, J., Friedman, D.: ‘A 12 Gb/s 11 mW half-rate
sampled 5-tap decision feedback equalizer with current-integrating
summers in 45-nm SOI CMOS technology’, IEEE J. Solid-State
Circuits, 2009, 44, (4), pp. 1298–1230
41 Zhao, Z., Wang, J., Li, S., Chen, J.: ‘A 2.5-Gb/s 0.13 μm CMOS current
mode logic transceiver with pre-emphasis and equalization’. Proc. Int.
Conf. ASIC, October 2007, pp. 368–371
42 Kao, S., Liu, S.: ‘A 20-Gb/s transmitter with adaptive pre-emphasis in
65-nm CMOS technology’, IEEE Trans. Circuits Syst. II, 2010, 57,
(5), pp. 319–323
43 Fiedler, A., Mactaggart, R., Welch, J., Krishnan, S.: ‘A 1.0625 Gbps
transceiver with 2x-oversampling and transmit signal preemphasis’.
IEEE Int. Solid-State Circuits Conf. Digest of Technical Papers,
February 1997, pp. 238–239
44 Dally, J., Poulton, J.: ‘Transmitter equalization for 4-gbps signaling’,
IEEE Micro, 1997, 17, (1), pp. 48–56
45 Yuan, F.: ‘An area-power efficient 4-PAM full-clock 10 Gb/s CMOS
pre-emphasis serial link transmitter’, Analog Integr. Circuits Signal
Process., 2009, 59, (3), pp. 257–264
46 Wong, K., Chen, E., Yang, C.: ‘Edge and data adaptive equalization of
serial-link transceivers’, IEEE J. Solid-State Circuits, 2008, 43, (9),
pp. 2157–2169
47 Lee, M., Dally, W., Chiang, P.: ‘Low-power area-efficient high-speed I/
O circuit techniques’, IEEE J. Solid-State Circuits, 2000, 35, (11),
pp. 1591–1599
48 Lee, M., Dally, W., Farjad-Rad, R., et al.: ‘CMOS high-speed I/Os present and future’. Proc. Int’l Conf. Computer Design, October 2003,
pp. 454–461
49 Balan, V., Caroselli, J., Chern, J., Desai, C., Liu, C.: ‘A 4.8-6.4 Gbps
serial link for back-plane applications using decision feedback
equalization’. Proc. IEEE Custom Integrated Circuits Conf., 2004,
pp. 331–334
50 Razavi, B.: ‘Design of integrated circuits for optical communications’
(McGraw-Hill, New York, 2003)
51 Kammeyer, K.: ‘Time truncation of channel impulse responses by linear
filtering: A method to reduce the complexity of Viterbi equalization’,
Int. J. Electron. Commun. (AEU), 1994, 48, (5), pp. 237–243
52 Gondi, S., Razavi, B.: ‘Equalization and clock and data recovery
techniques for 10 Gb/s CMOS serial-link receivers’, IEEE
J. Solid-State Circuits, 2007, 42, (9), pp. 1999–2011
53 Lee, D., Han, J., Han, G., Park, S.: ‘10 Gbit/s 0.0065 mm2 6 mW
analogue adaptive equalizer utilizing negative capacitance’, IET
Electron. Lett., 2009, 45, (17), pp. 863–865
54 Austin, M.: ‘Decision-feedback equalization for digital communication
over dispersive channels’. IEEE Int. Research Laboratory of
Electronics Technical Report 461, August 1967
55 Chen, W., Huang, G.: ‘A parallel multi-pattern PRBS generator and
BER tester for 40 Gbps SerDes application’. Proc. IEEE Asia-Pacific
Conf. Advanced System Integrated Circuits, August 2004, pp. 318–321
56 Dickson, T., Bulzacchelli, J., Friedman, D.: ‘A 12 Gb/s 11 mW half-rate
sampled 5-tap decision feedback equalizer with current-integrating
summers in 45 nm SOI CMOS technology’. Symp. VLSI Circuits
Digest of Technical Papers, 2008, pp. 58–59
57 Toifl, T., Ruegg, T.M.M., Inti, R., et al.: ‘A 3.1 mW/Gbps 30 Gbps
quarter-rate triple-speculation 15-tap SC-DFE RX data path in 32 nm
CMOS’. Symp. VLSI Circuits Digest of Technical Papers, 2012,
pp. 102–103
58 Winters, J., Kasturia, S.: ‘Adaptive nonlinear cancellation for high-speed
fiber-optic systems’, J. Lightwave Technol., 1992, 10, (7), pp. 971–977
59 Ren, J., Lee, H., Lin, Q., et al.: ‘Precursor ISI reduction in high-speed I/
O’. Symp. VLSI Circuits Digest of Technical Papers, 2007, pp. 134–135
60 Payne, R., Bhakta, B., Ramaswamy, S., et al.: ‘A 6.25-Gb/s binary
transceiver in 0.13-μm CMOS for serial data transmission across high
loss legacy backplane channels’, IEEE J. Solid-State Circuits, 2005,
40, (12), pp. 2646–2657
61 Lee, S., Hwang, M., Choi, Y., et al.: ‘A 5-Gb/s 0.25-μm CMOS
jitter-tolerant variable-interval oversampling clock/data recovery
circuit’, IEEE J. Solid-State Circuits, 2002, 37, (12), pp. 1822–1830
62 Milijevic, S., Kwasniewski, T.: ‘4 Gbit/s receiver with adaptive blind
DFE’, IEE Electron. Lett., 2005, 41, (25), pp. 1373–1374
63 Ibrahim, S., Razavi, B.: ‘Low-power CMOS equalizer design for
20-Gb/s systems’, IEEE J. Solid-State Circuits, 2011, 46, (6),
pp. 1321–1336
64 Razavi, B.: ‘Design of analog CMOS integrated circuits’ (McGraw-Hill,
New York, 2001)
65 Zhu, X., Chen, Y., Kibune, M., et al.: ‘A dynamic offset control
technique for comparator design in scaled CMOS technology’. Proc.
IEEE Custom Integrated Circuits Conf., 2008, pp. TP–011–014
66 McLeod, S., Sheikholeslami, A., Yamamoto, T., Nedovic, N., Tamura,
H., Walker, W.: ‘A digital offset-compensation scheme for an la and cdr
in 65-nm cmos’. Symp. VLSI Circuits Digest of Technical Papers, June
2009, pp. 448–449
67 Sidiropoulos, S., Horowitz, M.: ‘A 700 Mb/s/pin CMOS signaling
interface using current integrating receivers’, IEEE J. Solid-State
Circuits, 1997, 32, (5), pp. 681–690
68 Wang, T., Yuan, F.: ‘A new current-mode incremental signaling scheme
with applications to Gb/s parallel links’, IEEE Trans. Circuits Syst. I.,
2007, 54, (2), pp. 255–267
69 Carusone, T., Johns, D., Martin, K.: ‘Analog integrated circuit design’
(John Wiley and Sons, New York, 2012, 2nd edn.)
70 Fayed, A., Ismail, M.: ‘A low-voltage, low-power CMOS analog
adaptive equalizer for UTP-5 cables’, IEEE Trans. Circuits Syst. I,
2008, 55, (2), pp. 480–495
71 Bulzacchelli, J., Rylyakov, A., Friedman, D.: ‘Power-efficient
decision-feedback equalizers for multi-Gb/s CMOS serial links’. Proc.
IEEE Radio Frequency Integrated Circuits Symp., June 2007,
pp. 507–510
72 Payandehnia, P., Abbasfar, A., Sheikhaei, S., Forouzandeh, B.,
Nanbakhsh, K., Eghbali, A.: ‘A 4 mW 3-tap 10 Gb/s decision
feedback equalizer’. Proc. IEEE Mid-West Symp. Circuits and
Systems, 2011, pp. 1–4
73 Rontogiannis, A., Berberidis, K.: ‘Efficient decision feedback
equalization for sparse wireless channels’, IEEE Trans. Wirel.
Commun., 2003, 2, (3), pp. 570–581
74 Mietzner, J., Badri-Hoeher, S., Land, I., Hoeher, P.: ‘Equalization of
sparse intersymbol-interference channels revisited’, EURASIP J. Wirel.
Commun. Netw., 2006, 2006, (2), pp. 1–13
75 Winters, J., Gitlin, R.: ‘Electrical signal processing techniques in
long-haul fiber-optic systems’, IEEE Trans. Commun., 1990, 38, (9),
pp. 1439–1453
76 Bien, F., Kim, H., Hur, Y., et al.: ‘A 10 Gb/s reconfigurable CMOS
equalizer employing a transition detector-based output monitoring
technique for band-limited serial links’, IEEE Trans. Microw. Theory
Tech., 2006, 54, (12), pp. 4538–4547
77 Hyoungsoo, K., de Ginestous, J., Bien, F., et al.: ‘An electronic
dispersion compensator (EDC) with an analog eye-opening monitor
(EOM) for 1.25 Gb/s gigabit passive optical network (GPON)
upstream links’, IEEE Trans. Microw. Theory Tech., 2007, 55, (12),
pp. 2942–2950
78 Hong, D., Cheng, K.: ‘An accurate jitter estimation technique for
efficient high speed I/O testing’. Proc. Asian Test Symp., October
2007, pp. 224–229
79 Chen, L., Zhang, X., Spagna, F.: ‘A scalable 3.6-5.2 mW 5-to-10 Gb/s
4-tap DFE in 32 nm’. IEEE Int. Solid-State Circuits Conf. Digest of
Technical Papers, February 2009, pp. 180–181
80 Suttorp, T., Langmann, U.: ‘A 10-Gb/s CMOS serial-link receiver using
eye-opening monitoring for adaptive equalization and for clock and data
recovery’. Proc. IEEE Custom Integrated Circuits Conf., September
2007, pp. 277–280
81 Ellermeyer, T., Langman, U., Wedding, B., Pohlmann, W.: ‘A 10 Gb/s
eye-opening monitor IC for decision-guided adaptation of the frequency
response of an optical receiver’, IEEE J. Solid-State Circuits, 2000, 35,
(12), pp. 1958–1963
82 Analui, B., Rylyakov, A., Rylov, S., Meghelli, M., Hajimiri, A.: ‘A
10-Gb/s two-dimensional eye-opening monitor in 0.13-μm standard
CMOS’, IEEE J. Solid-State Circuits, 2005, 40, (12), pp. 2689–2699
83 Noguchi, H., Yoshida, N., Uchida, H., Ozaki, M., Kanemitsu, S., Wada,
S.: ‘A 40-Gb/s CDR circuit with adaptive decision-point control based
on eye-opening monitor feedback’, IEEE J. Solid-State Circuits, 2008,
43, (12), pp. 2929–2938
84 Bhatta, D., Kim, K., Gebara, E., Laskar, J.: ‘A 10 Gb/s two dimensional
scanning eye opening monitor in 0.18 μm CMOS process’. Proc. IEEE
Int. Microwave Symp. Digest, June 2009, pp. 1141–1144
85 Gerfers, F., Besten, G., Petkov, P., Conder, J., Koellmann, A.: ‘A 0.2-2
Gb/s 6x OSR receiver using a digitally self-adaptive equalizer’, IEEE
J. Solid-State Circuits, 2008, 43, (6), pp. 1436–1448
86 Abiri, B., Sheikholeslami, A., Tamura, H., Kibune, M.: ‘An adaptation
engine for a 2x blind ADC-based CDR in 65 nm CMOS’, IEEE
J. Solid-State Circuits, 2011, 46, (12), pp. 3140–3149
87 AL-Taee, A., Yuan, F., Ye, A., Sadr, S.: ‘A new two-dimensional
eye-opening monitor for Gbps serial links’, IEEE Trans. VLSI, 2013