www.ietdl.org
Published in IET Circuits, Devices & Systems
Received on 21st January 2013
Revised on 31st October 2013
Accepted on 1st November 2013
doi: 10.1049/iet-cds.2013.0159
ISSN 1751-858X

Design techniques for decision feedback equalisation of multi-giga-bit-per-second serial data links: a state-of-the-art review

Fei Yuan1, Alaa R. AL-Taee1, Andy Ye1, Saman Sadr2
1 Department of Electrical and Computer Engineering, Ryerson University, Toronto, ON, Canada M5B 2K3
2 Semtech Corp., Toronto, ON, Canada
E-mail: fyuan@ryerson.ca

Abstract: This study provides a comprehensive review of decision feedback equalisation (DFE) for multi-giga-bit-per-second (Gbps) data links. State-of-the-art DFE designs for multi-Gbps serial links reported in the past decade are compiled and presented. The imperfection of wire channels, in particular, finite bandwidth, reflection and cross-talk, and their impact on data transmission are investigated. The fundamentals of both near-end and far-end channel equalisation to combat the effect of the imperfection of wire channels at high frequencies are explored. A detailed examination of the principle, configuration, operation and limitation of DFE follows. Design challenges encountered in the design of DFE for multi-Gbps data links, including timing constraints, sampling, error propagation, arithmetic operation, highly dispersive channels and power consumption, are studied together with the techniques and circuit implementations that address them. The need for adaptive DFE and the principles of adaptive DFE are investigated. Finally, the performance of various adaptive DFEs is examined and their pros and cons are compared.

1 Introduction

The explosive growth of data processed by integrated circuits (ICs) demands that data be transmitted over wire channels (interconnects, vias, connectors, package pins, printed circuit boards and coaxial cables) at multiple giga-bit-per-second (Gbps).
Although increasing the number of wire channels directly improves the total data bandwidth, a large number of parallel channels not only increases the cost of routing but also exposes the overall data rate to clock and data skews caused by the mismatch of the channels [1]. As a result, parallel links are only attractive for short-range data communications such as multi-processor systems, processor-to-memory interfaces and network switches. Unlike parallel links, serial links transmit data and clock using a single wire channel, typically a differential pair to minimise electromagnetic interference with neighbouring devices. The elimination of a dedicated channel for clock transmission removes the difficulties associated with clock skew. The use of only a single wire channel also eliminates the bottleneck associated with data skew. Moreover, it greatly reduces the cost associated with routing. As a result, serial links are very attractive in applications such as block-to-block (on-chip), chip-to-chip, chassis-to-chassis and computer-to-computer links where the distance over which data are transmitted is large and the number of channels available is small.

Although the maximum transit frequency of metal-oxide semiconductor (MOS) transistors has well exceeded 100 GHz, the data rate of serial links is much lower, as evident in Table 1, despite the nearly linear improvement of the maximum data rate with technology scaling, as shown in Fig. 1. The low data rate is mainly because of inter-symbol interference (ISI) arising from channel imperfections, of which limited bandwidth, reflection and cross-talk are the most critical. The limited bandwidth of channels, caused by the rising resistive and dielectric loss of the channels at high frequencies, gives rise to a long channel impulse response or equivalently frequency-dependent attenuation, as shown in Fig. 2a.

© The Institution of Engineering and Technology 2014
Reflection caused by the impedance mismatch of channels, largely because of the inclusion of vias, connectors and branches in the channels, results in crests and troughs that are non-uniformly distributed over a large number of symbol intervals in the channel impulse response or equivalently sharp troughs in the frequency-domain response, as shown in Fig. 2b [2–4]. Note that troughs are because of capacitive impedance mismatches. For channels with severe reflection, deep troughs exist, as shown in Figs. 2c and d. Crosstalk is primarily because of capacitive and inductive coupling with neighbouring devices and manifests itself as crests and troughs in the channel impulse response. As a result, data symbols received at the far end of the channel consist of pre-cursors, main cursor and post-cursors with the number of post-cursors significantly larger than that of the pre-cursors, as shown in Fig. 2d. The main cursor is used for data recovery while pre-cursors and post-cursors need to be removed. They can be removed using near-end and far-end channel equalisation.

IET Circuits Devices Syst., 2014, Vol. 8, Iss. 2, pp. 118–130 doi: 10.1049/iet-cds.2013.0159

Table 1 Data rate of serial links utilising decision feedback equalisation
Channel loss is measured at half baud-rate frequency. Columns are listed in row order.

Ref.: Shahramian et al. [5]; Payne et al. [6]; Krishna et al. [7]; Beukema et al. [8]; Stojanovic et al. [9]; Balan et al. [10]; Wong et al. [11]; Park et al. [12]; Leibowitz et al. [13]; Bulzacchelli et al. [14]; Hidaka et al. [15]; Seong et al. [16]; Huang et al. [17]; Turker et al. [18]; Bulzacchelli et al. [19]; Kim et al. [20]; Pozzoni et al. [21]; Harwood et al. [22]; Abiri et al. [23]; Sarvari et al. [24]; Wang et al. [25]; Y. Ying et al. [26]; Dickson et al. [27]; Nazari et al. [28]; Gangasani et al. [29]; Agrawal et al. [30]; Amirkhany et al. [31]; Kaviani et al. [32]; Quan et al. [3]; Joy et al. [33]; Cui et al. [34]; Spagna et al. [35]; Toifl et al. [36]; Toifl et al. [37]; Bulzacchelli et al. [38]; J. Savoj et al. [39]

Tech.: 130 nm; 130 nm; 130 nm; 130 nm; 130 nm; 130 nm; 90 nm; 90 nm; 90 nm; 90 nm; 90 nm; 90 nm; 90 nm; 90 nm; 65 nm; 65 nm; 65 nm; 65 nm; 65 nm; 65 nm; 65 nm; 65 nm; 45 nm SOI; 45 nm SOI; 45 nm SOI; 45 nm SOI; 40 nm; 40 nm; 40 nm; 40 nm; 40 nm; 32 nm; 32 nm; 32 nm; 32 nm SOI; 28 nm

Channel loss: −8 dB; −21 dB (36″ FR4); −18 dB (33″ FR4); −18 dB (30″ FR4); −12 dB (26″ FR4); −(40″ FR4); −6.2 dB (10″ SMA); −12 dB (16″ Tyco); −(18″ BP); −33 dB (16″ Legacy BP); −25.4 dB (29″ FR4); −25.4 dB (15″ FR4); −32.7 dB (5.5″ FR4); −14 dB (20″ Nelco); −16 dB (30″ PCB); −21 dB (50″ Nelco); −24 dB (28″ FR4); −24 dB (12″ PCB); −13.3 dB (34″ FR4); −13.3 dB (34″ FR4); −13.3 dB (16″ FR4); −22.3 dB (14″ FR4); −21 dB (50″ Nelco); −25 dB (18″ FR4); −32 dB (40″ Nelco); −25 dB (20″ PCB); −10 dB (3″ FR4); −15 dB (3″ FR4); −26 dB; −34 dB (24″ FR4); −20 dB (8″ FR4); −25 dB (14″ PCB); −27 dB (39″ PCB); −36 dB; −35 dB (15″ PCB); −33 dB BP

Data rate, Gbps: 3.7 6.25 9.6 6.4 5 6.4 6.0 7.0 −7.5 10 10.3 10 6 15 11 10 8.5 12.5 5 5 21 20 10 15 16 19 16 20 14 16 23 11.8 12.5 30 28 12.5

Tx: – – 2-tap 4-tap – 2-tap – – – 4-tap 2/3-tap – – – 3-tap – – 4-tap – – 3-tap – – – 3-tap – – 1-tap 4-tap 3-tap 2-tap 3-tap – 4-tap 4-tap 5-tap

Rx: 3 IIR 5-tap CTLE/1-tap 5-tap 1-tap 4-tap 2-tap 2-tap 10-tap 5-tap CTLE/1-tap 2-tap 1-tap & IIR 1-tap 5-tap 1-tap & IIR 3-tap 2-tap FFE/5-tap 1-tap 1-tap 1-tap CTLE/1-tap DFE-IIR 2-tap 12-tap 4-tap FFE/5-tap 1-tap CTLE/1-tap 10-tap 14-tap CTLE/1-tap 4-tap CTLE/8-tap CTLE/15-tap CTLE/15-tap CTLE/3-tap

BER: $10^{-12}$ $10^{-15}$ $10^{-15}$ $10^{-15}$ – tL $10^{-12}$ $10^{-13}$ $10^{-12}$ $10^{-12}$ $10^{-13}$ $10^{-12}$ $10^{-12}$ $10^{-13}$ $10^{-15}$ $10^{-12}$ – – $10^{-12}$ $10^{-12}$ $10^{-12}$ $10^{-12}$ $10^{-12}$ $10^{-12}$ $10^{-15}$ $10^{-13}$ $10^{-12}$ $10^{-12}$ – $10^{-12}$ $10^{-12}$ $10^{-12}$ $10^{-12}$ $10^{-12}$ $10^{-13}$ $10^{-15}$

This paper provides a comprehensive review of decision feedback equalisation (DFE) for multi-Gbps data links.
The remainder of the paper is organised as follows: Section 2 investigates the imperfection of wire channels, in particular, finite bandwidth, reflection and cross-talk, and their impact on data transmission. The fundamentals of both near-end and far-end channel equalisation to combat the effect of the imperfection of wire channels at high frequencies are explored. Section 3 provides a detailed examination of the principle, configuration, operation and limitation of DFE. Design challenges encountered in the design of DFE for multi-Gbps data links, including timing constraints, sampling, error propagation, arithmetic operation, highly dispersive channels and power consumption, are studied together with the techniques and circuit implementations that address them. Section 4 investigates the need for adaptive DFE and the principles of adaptive DFE. The performance of various adaptive DFEs is examined and their pros and cons are compared. Section 5 explores the direction of future research and development of DFE. The paper is concluded in Section 6.

Fig. 1 Dependence of data rate on the minimum channel length of MOS transistors

2 Channel equalisation

2.1 Near-end channel equalisation

Pre-cursors and post-cursors can be removed by boosting the high-frequency components [41, 42] or attenuating the low-frequency components of data symbols [43, 44] prior to their transmission. The former increases cross-talk as crosstalk intensifies at high frequencies. The latter reduces the power of the transmitted symbols as the power of non-return-to-zero data is largely concentrated at half baud-rate frequency. Since it increases the relative strength of the high-frequency components of the transmitted signals, crosstalk is also reduced [8].
Near-end channel equalisation is often implemented using finite impulse response (FIR) filters that introduce zeros to offset the effect of the poles of the channels [45, 46]. For example, the first-order pre-emphasis FIR filter shown in Fig. 3 and given by $y(n) = x(n) - a_1 x(n-1)$, where $x(n)$ and $y(n)$ are the input and output of the FIR filter, respectively, has the transfer function $H_{FIR}(z) = 1 - a_1 z^{-1}$. Clearly it introduces a zero at $z = a_1$ that will impact all the poles of the channels. To demonstrate this, since $\omega T_s \ll 1$, where $T_s$ is the symbol time, we have $z = e^{sT_s} \simeq 1 + sT_s$ and hence $z^{-1} \simeq 1 - sT_s$. As a result

$$H_{FIR}(s) \simeq (1 - a_1)\left(\frac{s}{\omega_z} + 1\right) \tag{1}$$

where ωz = (1 − a1)/(a1Ts). If we model the channel as a first-order low-pass, that is, Hch(s) = 1/((s/ωch) + 1) where ωch is the channel bandwidth, the transfer function of the equalised channel is given by

$$H_{eq}(s) = (1 - a_1)\,\frac{s/\omega_z + 1}{s/\omega_{ch} + 1} \tag{2}$$

It becomes apparent that if we choose ωz = ωch, the pole of the channel will be cancelled by the zero of the pre-emphasis FIR, resulting in a desirable all-pass. Also observed is that since 1 − a1 < 1, there is a loss of signal energy in pre-emphasis. Since the loss of wire channels is typically much deeper than −20 dB/dec, a high-order pre-emphasis FIR filter is needed. The order of pre-emphasis FIR filters, however, is usually limited to four (Table 1) as further increasing the order improves the performance only marginally [44, 47–49]. This observation also reveals that wire channels can be modelled using a fourth-order low-pass adequately.

Fig. 2 Limited bandwidth of channels caused by the rising resistive and dielectric loss of the channels at high frequencies and reflection due to the impedance mismatch
a Frequency response of a 30″ trace on a Nelco4000-13SI board [40]
b Frequency response of a 16″ Tyco legacy backplane with two daughter cards [40]
c and d Frequency response and impulse response of a highly reflective backplane [4]
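As a rough numerical illustration of the first-order pre-emphasis filter and of the zero location in (1), the sketch below applies y(n) = x(n) − a1x(n − 1) to a short symbol sequence and evaluates the filter gain at DC and at half baud-rate. The values a1 = 0.25 and Ts = 100 ps (10 Gbps) and the function names are assumptions chosen for illustration only; they are not taken from any of the surveyed designs.

```python
import cmath
import math


def pre_emphasis(x, a1):
    """Apply y(n) = x(n) - a1 * x(n-1), taking x(-1) = 0."""
    y = []
    prev = 0.0
    for sample in x:
        y.append(sample - a1 * prev)
        prev = sample
    return y


def fir_gain(a1, f, Ts):
    """|H_FIR| at frequency f for H_FIR(z) = 1 - a1 * z^-1, z = exp(j*2*pi*f*Ts)."""
    z_inv = cmath.exp(-2j * math.pi * f * Ts)
    return abs(1 - a1 * z_inv)


a1 = 0.25        # assumed tap weight (hypothetical)
Ts = 100e-12     # assumed symbol time, i.e. 10 Gbps (hypothetical)

symbols = [1.0, 1.0, 1.0, -1.0, -1.0, 1.0]
shaped = pre_emphasis(symbols, a1)

dc_gain = fir_gain(a1, 0.0, Ts)             # low-frequency gain, 1 - a1
nyquist_gain = fir_gain(a1, 0.5 / Ts, Ts)   # gain at half baud-rate, 1 + a1
omega_z = (1 - a1) / (a1 * Ts)              # zero of Eq. (1), valid for w*Ts << 1

print("shaped symbols:", shaped)
print("DC gain:", dc_gain, "half-baud gain:", nyquist_gain)
print("zero frequency (GHz):", omega_z / (2 * math.pi) / 1e9)
```

Transitions are emphasised relative to repeated symbols, and the low-frequency gain equals 1 − a1, which is the loss of signal energy noted above.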
Fig. 3 Near-end channel equalisation with first-order pre-emphasis
The added pre-emphasis tap shortens the duration of the received symbol thereby improving data rate

Since the characteristics of the channel are not known prior to data transmission, the optimal tap coefficients of pre-emphasis FIR filters can only be obtained if a back channel exists. This constraint undermines the robustness of pre-emphasis channel equalisation. Another limitation of pre-emphasis channel equalisation is its inability to remove ISI caused by reflection and crosstalk, as this ISI manifests itself as crests and troughs rather than uniformly sloped attenuation, as shown earlier in Fig. 2. ISI caused by reflection and crosstalk is typically significant when the data rate is high and channels contain multiple vias, connectors and branches (highly reflective channels).

2.2 Far-end channel equalisation

Far-end channel equalisation, also known as post-equalisation, combats ISI by either amplifying the high-frequency components of received data symbols in the analog domain or removing post-cursors in the digital domain prior to clock and data recovery (CDR). As compared with near-end equalisation, post-equalisation offers the ability to combat ISI caused by reflection and crosstalk.

(1) Linear post-equalisation: Linear post-equalisation boosts the high-frequency components of received symbols with a continuous-time linear equaliser (CTLE). CTLE provides zeros to cancel out the poles of the channels so that the equalised channel exhibits an all-pass transfer characteristic. To demonstrate this, consider the CTLE in Fig. 4 and neglect the capacitance of MOSFETs.
We examine three cases:

• If only Cx is considered (neglect Rx, L and CL), the transfer function is given by

$$\frac{V_o(s)}{V_{in}(s)} = -\frac{sRC_x}{sC_x/g_m + 1} \tag{3}$$

where gm is the transconductance of the MOSFETs. The feedback provided by Cx adds a zero at frequency ωz = 0. The pole provided by Cx is at frequency ωp = gm/Cx. ωp must be sufficiently higher than the half baud-rate frequency so that its impact is negligible. The domain in which the added zero is effective in compensating the effect of the poles of the channel is given by ωz ≤ ω ≤ ωp.

• If we consider both Cx and Rx (neglect L and CL), the transfer function becomes

$$\frac{V_o(s)}{V_{in}(s)} = -\frac{Rg_m}{R_x g_m + 1}\,\frac{sR_xC_x + 1}{sR_xC_x/(R_x g_m + 1) + 1} \tag{4}$$

The zero is now located at ωz = 1/(RxCx) and the pole is at $\omega_p = (R_x g_m + 1)/(R_x C_x) \simeq g_m/C_x$ provided $R_x g_m \gg 1$. It is evident that ωz is now tunable by varying Cx and Rx.

• If L, Rx, Cx and CL are all considered, the transfer function becomes

$$\frac{V_o(s)}{V_{in}(s)} = -\frac{Rg_m}{(R_x g_m + 1)LC_L} \times \frac{sR_xC_x + 1}{sR_xC_x/(R_x g_m + 1) + 1} \times \frac{s(L/R) + 1}{s^2 + s(R/L) + 1/(LC_L)} \tag{5}$$

It is seen from (5) that the addition of the inductor peaking introduces another zero at ωz2 = R/L. This is in addition to the zero introduced by Cx and Rx at ωz1 = 1/(RxCx). It also introduces complex conjugate poles with natural resonant frequency $\omega_n = 1/\sqrt{LC_L}$. It is well understood that complex conjugate poles improve bandwidth [50]. The zeros are used to cancel the effect of the poles of the channel so as to increase the bandwidth while the complex conjugate poles improve the bandwidth through resonance. The higher the quality factor, the larger the bandwidth improvement. The addition of the negative capacitors reduces CL, which in turn boosts the natural resonant frequency ωn and subsequently the bandwidth.

Fig. 4 Continuous-time linear equalisers with inductor series peaking, source degeneration and negative capacitors
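To make the zero and pole placement of (4) concrete, the following sketch evaluates the source-degenerated CTLE transfer function for one set of component values and reports the zero frequency, the pole frequency and the high-frequency boost. The component values (R, Rx, Cx, gm) and the function name are hypothetical and serve only to illustrate the relations ωz = 1/(RxCx) and ωp = (Rxgm + 1)/(RxCx).

```python
import math


def ctle_response(f, R, Rx, Cx, gm):
    """Evaluate Eq. (4) at frequency f (Hz); L and CL are neglected."""
    s = 2j * math.pi * f
    dc_gain = R * gm / (Rx * gm + 1)
    zero_term = s * Rx * Cx + 1
    pole_term = s * Rx * Cx / (Rx * gm + 1) + 1
    return -dc_gain * zero_term / pole_term


# Hypothetical component values, chosen only for illustration
R = 200.0      # load resistance, ohm
Rx = 400.0     # degeneration resistance, ohm
Cx = 200e-15   # degeneration capacitance, F
gm = 10e-3     # transconductance, S

omega_z = 1.0 / (Rx * Cx)               # zero of Eq. (4)
omega_p = (Rx * gm + 1) / (Rx * Cx)     # pole of Eq. (4)
low_freq_gain = abs(ctle_response(0.0, R, Rx, Cx, gm))
high_freq_gain = abs(ctle_response(1e13, R, Rx, Cx, gm))  # far above the pole
boost_db = 20 * math.log10(high_freq_gain / low_freq_gain)

print("zero (GHz):", omega_z / (2 * math.pi) / 1e9)
print("pole (GHz):", omega_p / (2 * math.pi) / 1e9)
print("boost (dB):", boost_db)
```

With these numbers the boost between the zero and the pole approaches Rxgm + 1, which is why tuning Rx and Cx sets both the location and the amount of high-frequency peaking.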
The use of zeros to offset the effect of the poles of wire channels bears a strong resemblance to the use of filtering mechanisms to compensate for the loss of wireless channels so as to shorten the channel impulse response or equivalently improve the channel bandwidth, for example, the time truncation of channel impulse response by filtering proposed in [51]. The computational cost of these mechanisms, however, makes it difficult for them to meet the ever more stringent timing constraints of multi-Gbps serial links.

As the received symbol is severely attenuated by the channel upon arriving at the CTLE, input offset voltage compensation is also required in CTLE [15, 20]. The order of CTLE is determined by the attenuation of the channel and the sensitivity of the slicer. High-order CTLE can be obtained by cascading low-order CTLEs at the cost of more power consumption [21]. CTLE is often used in conjunction with non-linear post-equalisation with the former providing secondary channel equalisation. As a result, low-order DFE can be used without sacrificing performance [26, 49]. CTLE has also been used as a solo post-equaliser for channels with negligible reflection and cross-talk. The absence of feedback in this case allows CTLE to support higher data rates. For example, CTLE in 130 nm CMOS enables 10 Gbps transmission over a 30″ FR4 channel of −21 dB loss at half baud-rate frequency and achieves $10^{-13}$ bit-error-rate (BER) [52]. Similarly, CTLE implemented in 130 nm CMOS supports 10 Gbps transmission over a 34″ FR4 channel with −14 dB loss and consumes 6 mW [53]. It should be emphasised that CTLE is only effective in removing channel loss-induced ISI and ineffective in eliminating crosstalk/reflection-induced ISI [52].
(2) Non-linear post-equalisation: Unlike CTLE, non-linear post-equalisation mitigates the effect of channel loss, reflection and crosstalk by removing the post-cursors of the received symbol in the digital domain. The most widely used non-linear equalisation is decision feedback equalisation (DFE) introduced by Austin in 1967 [54]. Unlike other channel equalisation mechanisms, DFE does not amplify the received data symbol and therefore does not worsen crosstalk with neighbouring devices. Moreover, since the number, the weight and the order of the taps of DFE can be adjusted in accordance with the characteristics of the channel to be equalised, DFE is not only most effective in eliminating ISI caused by the finite bandwidth of the channel, it is also most effective in eliminating ISI caused by reflection and crosstalk. Since DFE only utilises the past decisions, it has no effect on pre-cursors. As a result, the only effective means available for us to combat pre-cursors is pre-emphasis.

Fig. 5 shows the basic configuration and operation of DFE. The slicer clocked by the recovered clock senses the difference between the partially equalised symbol vin − vf and reference vref. The output of the slicer passes through N delay stages of unit delay where N is the number of the post-cursors to be removed. The output of each delay stage is multiplied by weight factor $c_k$ such that $H_k = c_k D_{j-k}$, where $H_k$ is the kth post-cursor and $D_{j-k}$ is the (j − k)th past decision of the slicer. It is seen that

$$D_j = v_{in,j} - \sum_{n=1}^{N} \underbrace{c_n D_{j-n}}_{v_{f,n}}$$

The operation of DFE is illustrated using four consecutive logic-1 symbols S1–S4. The received symbols are overlapped because of ISI. Let $H_{m,n}$ denote the nth post-cursor of symbol-m with $H_{m,0}$ the main cursor.
Observe that vin[4] contains four components: the main cursor of symbol-4 $H_{4,0}$, the first post-cursor of symbol-3 $H_{3,1}$, the second post-cursor of symbol-2 $H_{2,2}$ and the third post-cursor of symbol-1 $H_{1,3}$, that is, $v_{in}[4] = H_{4,0} + H_{3,1} + H_{2,2} + H_{1,3}$. If the feedback is $v_f = H_{3,1} + H_{2,2} + H_{1,3} = c_1 D_3 + c_2 D_2 + c_3 D_1$, then when it is subtracted from vin[4], we have $v_s[4] = v_{in}[4] - v_f = H_{4,0}$. $H_{4,0}$ can therefore be safely digitised regardless of the presence of ISI. It is evident that DFE has no impact on either the noise present in the current symbol or pre-cursors.

Fig. 5 Basic configuration and operation of decision feedback equalisation
Legends: UI: delay cell with one unit delay, vs = vin − vf is the symbol before the slicer, vref is the threshold of the slicer, $D_{j-1}$ is the delayed version of $D_j$, c1, …, cN are the weighting factors of feedback taps, Ts is the symbol time, $H_{j,k}$ denotes the kth post-cursor of symbol-j and $H_{j,0}$ the main cursor of symbol-j

3 Design challenges in decision feedback equalisation

3.1 Bit error rate test

The performance of serial links is primarily quantified by BER, obtained by transmitting a pseudo-random bit stream (PRBS) to the channel and recording the number of transmission errors. Typically BER = $10^{-12}$ is required. Transmission errors are obtained using a PRBS checker that compares the transmitted bits with the corresponding received bits. Although PRBS7 (7-bit PRBS) has been used [20], it is primarily for testing serial links with 8 B/10 B encoded data. PRBS31 (31-bit PRBS) that provides a sufficient transition density is preferred especially for those using 64 B/66 B encoded data. PRBS can be generated using linear feedback-shift registers, although parallel PRBS generators are also available [55]. Since eye-opening is typically maximised at the centre of the data eye where BER is minimised and gradually levels off towards the edges of the data eye where BER climbs, the horizontal eye-opening at a given BER, for example $10^{-8}$ [56], $10^{-9}$ [14, 20, 40], or $10^{-10}$ [36, 37, 57], is usually used as a figure-of-merit to quantify the performance, as shown in Fig. 6. The bathtub curves are obtained by varying the sampling instant within one UI while evaluating BER for each sampling instant [14]. It is seen that BER is minimised at the centre of the data eye and gradually rises when the sampling instant moves away from the centre towards the edge of the data eye.

Fig. 6 Dependence of horizontal eye-opening on BER

3.2 Timing constraints

DFE operations, including data slicing, multiplication and subtraction, must be completed within one UI. Since there is only one UI between the current symbol and tap-1 feedback, the tap-1 loop has the most stringent constraint. An effective way to overcome this difficulty is to feed the error signal for all possible feedback signals (+1 or −1) to two identical slicers, as shown in Fig. 7. The decisions of the slicers are 2-to-1 multiplexed with the output selected by the previous decision [7, 8, 13, 14]. This approach was originally proposed by Kasturia et al. [58] and is known as loop unrolling, speculation, look-ahead, or partial-response [59]. The use of loop unrolling is often limited to tap-1 as the number of slicers and summers increases exponentially with the number of taps. In recent DFE implementations, loop unrolling has also been used for the first two taps to cope with a 28 Gbps data rate [38] and the first three taps to cope with a 30 Gbps data rate [36, 37, 57]. As pointed out in [6, 60], the delay of the slicer could be long if its input is small. To overcome this, the insertion of an auxiliary amplifier between the CTLE and the slicer has been shown to be effective. This approach, however, becomes less effective once the data rate is high.
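The equivalence between direct feedback and tap-1 loop unrolling can be checked with a small behavioural model. The sketch below is only an idealised, noise-free illustration: the channel cursors (1.0, 0.7, 0.4), the 7-bit LFSR data pattern and all function names are assumptions introduced here, and the tap weights are set equal to the post-cursors. It compares a plain slicer, a two-tap direct-feedback DFE, and a DFE whose first tap is resolved speculatively by two slicers and a multiplexer.

```python
def lfsr_bits(n, state=0x7F):
    """Bits from a 7-bit LFSR with polynomial x^7 + x^6 + 1 (PRBS7-style)."""
    bits = []
    for _ in range(n):
        new = ((state >> 6) ^ (state >> 5)) & 1
        state = ((state << 1) | new) & 0x7F
        bits.append(new)
    return bits


def sign(v):
    return 1 if v >= 0 else -1


H = [1.0, 0.7, 0.4]   # assumed main cursor and two post-cursors
PREAMBLE = [-1, -1]   # symbols sent (and known) before the data starts


def channel(symbols):
    """Received samples: main cursor plus ISI from the two previous symbols."""
    padded = PREAMBLE + symbols
    return [sum(H[k] * padded[n + 2 - k] for k in range(3))
            for n in range(len(symbols))]


def dfe_direct(r, c1, c2):
    """Both taps subtracted before a single slicer."""
    d1, d2 = PREAMBLE[1], PREAMBLE[0]
    out = []
    for v in r:
        d = sign(v - c1 * d1 - c2 * d2)
        out.append(d)
        d1, d2 = d, d1
    return out


def dfe_unrolled(r, c1, c2):
    """Tap-1 loop unrolled: two speculative slicers plus a 2-to-1 mux."""
    d1, d2 = PREAMBLE[1], PREAMBLE[0]
    out = []
    for v in r:
        partial = v - c2 * d2                 # tap-2 still uses direct feedback
        if_prev_plus = sign(partial - c1)     # slicer assuming previous bit = +1
        if_prev_minus = sign(partial + c1)    # slicer assuming previous bit = -1
        d = if_prev_plus if d1 == 1 else if_prev_minus
        out.append(d)
        d1, d2 = d, d1
    return out


data = [1 if b else -1 for b in lfsr_bits(254)]
received = channel(data)
raw = [sign(v) for v in received]
direct = dfe_direct(received, H[1], H[2])
unrolled = dfe_unrolled(received, H[1], H[2])

errors_raw = sum(a != b for a, b in zip(raw, data))
errors_direct = sum(a != b for a, b in zip(direct, data))
print("errors without DFE:", errors_raw)
print("errors with DFE:", errors_direct)
print("unrolled matches direct:", unrolled == direct)
```

In the unrolled version the tap-1 subtraction no longer sits in the critical feedback path; only the multiplexer select has to settle within one UI, at the cost of a second slicer.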
The remaining DFE taps are generated using clocked delay stages and their outputs are weighted and subtracted from the output of the CTLE. This approach is known as direct feedback [8] or dynamic feedback [46]. To relax the timing constraint and lower power consumption, the half-rate approach, where the remaining taps are operated at half the data rate, as depicted graphically in Fig. 7, is widely favoured [8]. The quarter-rate approach has also been used to further relax the timing constraint in recent designs [36, 37, 46, 57].

Fig. 7 Half-rate decision feedback equalisation
Tap-1 is implemented using loop-unrolling while the remaining taps are implemented using direct feedback with a complementary clock whose frequency is only half that required for the data rate

3.3 Sampling

The slicer is typically implemented using a re-generative sense amplifier for speed, as shown in Fig. 8 [13, 15, 61, 62]. When φ = 0, the re-generative mechanism is disabled and the input and output of the cross-coupled inverters are set to be equal, forcing their operating point to the transition region where a maximum transconductance exists. The input and reference voltage, in the meantime, are sampled by the input capacitors of the slicer. M6 and M11 are turned on, equalising the voltages of the two branches. In the following φ = 1 phase, the re-generative mechanism is activated, forcing the output of the slicer to respond to the voltage sampled by M2 and M5 in the previous phase. Since the signal at the far end of the channel is attenuated, the input offset voltage of the slicer must be sufficiently small in order to minimise slicer error.

Fig. 8 Clocked re-generative sense amplifier as slicer [13, 15, 61, 62]

It was shown in [63] that the BER of a slicer with an input offset voltage Vos is given by

$$\mathrm{BER} \simeq \frac{1}{2}\,\mathrm{Err}\!\left(\frac{V_m - V_{os}}{\sqrt{\overline{V_n^2}}}\right) \tag{6}$$

where $\overline{V_n^2}$ is the input-referred noise power, $V_m$ is the voltage of the input and

$$\mathrm{Err}(x) = \frac{1}{\sqrt{2\pi}} \int_x^{\infty} e^{-u^2/2}\,du$$

is the error function. Clearly the input offset voltage directly affects the BER. Large input devices are preferred from a low input offset voltage point of view [64]. Small input devices, on the other hand, have the advantage of a low input capacitance and therefore high speed, but at the cost of a larger input offset voltage. The calibration of the slicer prior to any data-slicing operation is mandatory. In [7], a background calibration method was used to calibrate slicers. Specifically, two identical slicers operated in an interleaved manner are employed. One slicer is connected to the input for sensing (on-line slicer) while the other slicer is connected to a reference voltage for calibration (off-line slicer) [9]. The calibration of the off-line slicer can be accomplished using conventional auto-zero techniques or dynamic offset control techniques [65] to avoid the power penalty of current array-based offset compensation [47] and the speed penalty of capacitor array-based offset compensation [48]. The digital offset compensation technique proposed in [66] detects the effect of offset voltage on the duty-cycle of the output and utilises the detected duty-cycle imbalance to adjust the biasing currents so as to eliminate the duty-cycle imbalance and subsequently the offset effect. The method avoids the deployment of compensation capacitors at the input of the slicer, which has a detrimental impact on speed.

3.4 Error propagation

If the output of the slicer does not correspond to the received data symbol, a slicer error occurs. The slicer error will not only affect the current decision, it will also affect the next decision. Moreover, the error will propagate through the entire delay chain. This error propagation characteristic is a fundamental drawback of DFE.
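Error propagation can be observed in a simple behavioural model. The sketch below is an assumption-laden illustration rather than a model of any surveyed design: it uses a noise-free channel with hypothetical cursors (1.0, 0.7, 0.4), an alternating data pattern and a two-tap DFE with ideal tap weights, then deliberately flips one decision and records which later decisions go wrong because the corrupted decision is fed back.

```python
H = [1.0, 0.7, 0.4]   # assumed main cursor and two post-cursors


def sign(v):
    return 1 if v >= 0 else -1


def run_dfe(data, forced_error_at=None):
    """Return the indices of wrong decisions; optionally flip one decision."""
    sent = [-1, -1]      # symbols transmitted before data[0] (assumed known)
    decided = [-1, -1]   # the matching past decisions held in the delay chain
    errors = []
    for n, d in enumerate(data):
        v_in = H[0] * d + H[1] * sent[-1] + H[2] * sent[-2]
        v_s = v_in - H[1] * decided[-1] - H[2] * decided[-2]
        decision = sign(v_s)
        if n == forced_error_at:
            decision = -decision   # emulate a single slicer error
        if decision != d:
            errors.append(n)
        sent.append(d)
        decided.append(decision)
    return errors


data = [1 if n % 2 == 0 else -1 for n in range(40)]   # alternating pattern
clean = run_dfe(data)
burst = run_dfe(data, forced_error_at=10)

print("errors without a forced slip:", clean)
print("errors after one forced slip:", burst)
```

For this pattern the single forced error at symbol 10 causes a second error at symbol 11, because the wrong decision doubles the first post-cursor instead of cancelling it. The length of such a burst depends on the data pattern and on the tap weights.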
Clearly, to minimise the possibility of slicer error, the input of the slicer prior to slicing must be sufficiently large and disturbance-free. There are a number of factors that contribute to slicer error. The first is the input offset voltage of the slicer. As detailed in Section 3.3, the input offset voltage of the slicer must be compensated in order to improve the sensitivity of the slicer unless the signal from the preceding CTLE is large enough, revealing the importance of having a CTLE preceding the slicer. The feedback signal from the summer must also be stable prior to the slicing operation. To achieve this, a stringent timing constraint is imposed on the feedback path where a number of arithmetic blocks exist, as detailed in Section 3.5. Since the slicer is differentially configured, a proper common-mode voltage must be in place so that a small differential input from the preceding CTLE can be sensed and latched correctly.

Current integration, where the incoming signal is integrated over a capacitor and the resultant capacitor voltage is sampled at the end of the integration phase, is an effective means to minimise the effect of transient disturbances so as to minimise slicer error [67, 68]. This approach is effective if the duration of the disturbance is much smaller than the symbol time. Difficulty also arises when the data rate is high unless the integrating capacitor is sufficiently small. A small integrating capacitor, however, magnifies the effect of the parasitic capacitances of devices.

The error of the slicer is minimised if the incoming signal is sufficiently large. This can be achieved using CTLE prior to slicing, as shown in Fig. 4 earlier [26]. Note that CTLE also provides partial channel equalisation. Some implementations also employ a pre-amplifier preceding CTLE to boost the received signal prior to equalisation so as to minimise the possibility of slicer error [3, 4].
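The sensitivity of slicer error to signal amplitude and input offset can be put into numbers with the BER expression (6) of Section 3.3. The sketch below assumes that (6) reads BER ≃ ½ Err((Vm − Vos)/√(Vn²)) with Err(x) the Gaussian tail integral, which is expressed here through the standard complementary error function. The voltages used (50 mV input, 5 mV rms noise, 0 and 15 mV offset) and the function names are hypothetical.

```python
import math


def err(x):
    """Err(x) = (1/sqrt(2*pi)) * integral from x to infinity of exp(-u^2/2) du."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def slicer_ber(v_m, v_os, v_n_rms):
    """BER of Eq. (6) as read here; v_n_rms is the square root of the noise power."""
    return 0.5 * err((v_m - v_os) / v_n_rms)


v_m = 50e-3       # assumed slicer input amplitude, V
v_n_rms = 5e-3    # assumed input-referred rms noise, V

ber_no_offset = slicer_ber(v_m, 0.0, v_n_rms)
ber_with_offset = slicer_ber(v_m, 15e-3, v_n_rms)

print("BER with zero offset:", ber_no_offset)
print("BER with 15 mV offset:", ber_with_offset)
```

Under these assumptions an uncompensated offset of 15 mV raises the BER by many orders of magnitude, which is consistent with the emphasis placed on offset calibration and on a sufficiently large slicer input.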
The effect of the kick-back of the slicer should also be taken into consideration [69, 70].

3.5 Arithmetic operation

Both multiplication and summation operations are needed in DFE. Not only must these operations be completed within one UI, the result must also be stable prior to any slicing operation. Multiplication of a past Boolean decision by a weighting factor is most efficiently implemented using current-steering configurations, as shown in Fig. 9. The delay of the summer usually dominates the speed of arithmetic operation because of the large number of taps [8]. Current summation is widely favoured over its voltage counterpart because of its ease of implementation and high-speed operation. The speed of the current-mode resistor-load summer shown in Fig. 9 (without inductors) is set by the time constant of the current summation node. Since vin is attenuated, the transistors driven by vin are operated in saturation whereas those in the tap stages are operated in an ON/OFF mode. Lowering the load resistance R reduces the time constant, however, at the cost of reduced output voltage swing. As the output of the summer directly feeds the slicer, a large output voltage of the summer is essential to minimise slicer error. Increasing the dimension of the input transistors improves the output voltage swing; it, however, also reduces the speed.

Fig. 9 Current-mode summer with resistor loads
When the dotted portion replaces the upper portion of the current-mode summer with resistor loads, it becomes a current-integrating summer
The weight of tap-1, … , tap-N is tuned by varying the tail currents c1, … , cN, respectively
When the dotted portion replaces the lower portion of the current-mode summer with resistor loads, it becomes a capacitive charge feedback summer
One effective way to speed up the summer without reducing the load resistance is shunt inductor peaking, as shown in Fig. 9 [26]. In [12, 40, 71], a current-integrating summer shown in Fig. 9 was proposed. To improve speed, source degeneration is widely adopted. The load resistors are replaced with pMOS transistors operated in an ON/OFF mode. In the reset phase, the pMOS transistors are switched on and the load capacitors are charged to the supply voltage. During the following integration phase, the pMOS transistors are switched off and the capacitors are discharged by the tail current sources representing the feedback taps. To eliminate the effect of vin during discharge, vin is disconnected from the gates of the input transistors [40]. The current-integrating summer offers the key advantage of reduced power consumption because no static current flows from VDD to ground in either the reset or the integration phase [19, 56, 72]. The speed of the current-integrating summer can be further increased by replacing current feedback taps with capacitive charge feedback, as shown in Fig. 9, with capacitance proportional to the DFE tap coefficients [36, 37, 57]. To further reduce power consumption, switched-capacitor summers were proposed [72]. Since complex clock schemes are needed for their proper operation, switched-capacitor summers are typically used to perform the summation of the first tap, with the rest of the taps implemented using the current-integrating approach described earlier.

3.6 Channels with severe dispersion

The impulse response of severely dispersive channels stretches over a large number of symbol intervals, as seen in Fig. 2 [4]. To equalise these channels, a large number of taps is needed, resulting in excessive power and silicon consumption. An efficient DFE with a small number of taps that does not sacrifice performance is highly desirable. DFE with an analog infinite-impulse-response (IIR) filter uses the filter to mimic the response of the channel such that, when the filter output is subtracted from the response of the channel, the tail is removed without using a large number of DFE taps [20]. This approach works well for highly dispersive channels.

The characteristics of channels with severe reflection, however, differ from those of channels with high loss but insignificant reflection. The impulse responses of these channels typically have post-cursors that reside far away from the main cursor. The post-cursors between the dominant post-cursors, which typically immediately follow the main cursor, and the reflection-induced post-cursors are often insignificant, leading to sparsity in the post-cursor distribution. Although there are many effective means to equalise sparse wireless channels [73, 74], these approaches cannot be adopted for wire channels because of the need for excessive computation and subsequently long latency. Equalisation of these channels with a fixed-tap DFE requires a long DFE even though many of its taps, namely those corresponding to the post-cursors between the main cursor and the remotely located post-cursors, are insignificant, resulting in excessive power and silicon consumption. Floating-tap DFE proposed by Zhong et al. is an elegant technique effective in combating reflection-induced post-cursors located far away from the main cursor [2–4]. In this approach, a number of fixed taps are used to remove the dominant post-cursors located close to the main cursor. In addition, a number of floating taps are used to remove reflection-induced post-cursors. The locations of the floating taps are not fixed but rather determined by an optimisation algorithm that yields the largest tap coefficients and subsequently the best performance. Although extra computation is needed, this additional cost is well justified by the elimination of the remote post-cursors.

3.7 Power consumption

The power consumption of a decision feedback equaliser consists of the power consumption of the slicer, the delay units and the summer. The power consumption of the summers is significant because of their current-mode configuration. When loop-unrolling is employed, additional power is consumed. The DFE-IIR examined earlier offers an attractive means to reduce power consumption, especially for highly dispersive channels. In [46], a soft-decision DFE was proposed to replace loop-unrolling and dynamic feedback without sacrificing speed. Instead of employing two slicers and other logic circuits, soft-decision DFE uses sample-and-hold before the summation and latches after the summation to perform channel equalisation.

4 Adaptive decision feedback equalisation

The variation of wire channels calls for adaptive DFE so that the tap coefficients of DFE can be set in accordance with the characteristics of the channels. In this section, we examine the adaptive DFE algorithms that are widely used for multi-Gbps serial links.

4.1 Least-mean-square (LMS) adaptive DFE

LMS adaptation updates the DFE tap coefficients in such a way that the power of the error between the output and input of the slicer is minimised, that is, it minimises $\|D_j - v_{s,j}\|$, as shown in Fig. 10. The coefficient $c_{j,k}$ of tap $j$ at step $k$ is updated using [75]

$$c_{j,k+1} = c_{j,k} + \eta\, e_k\, v_{k-j} \qquad (7)$$

where $\eta$ is the step size used to adjust the tap coefficients. LMS is difficult to implement because of the need for the values of $e_k$ and $v_{k-j}$, which can only be obtained using ADCs. Sign-sign LMS (SS-LMS), where only the signs of $e_k$ and $v_{k-j}$ are used, is proven to be an effective alternative

$$c_{j,k+1} = c_{j,k} + \eta\, \mathrm{sign}(e_k)\, \mathrm{sign}(v_{k-j}) \qquad (8)$$

$\mathrm{sign}(e_k)$ and $\mathrm{sign}(v_{k-j})$ can be obtained conveniently using comparators. Since SS-LMS searches for the optimal tap coefficients based on the binary decision of the signs of $e_k$ and $v_{k-j}$, the final value of the tap coefficients will fluctuate in the vicinity of the optimal taps. In practice, a smaller $\eta$ is typically used by SS-LMS to reduce the fluctuation. The convergence time of SS-LMS, however, will be longer as compared with regular LMS.

Fig. 10 Configuration of SS-LMS adaptive DFE

4.2 Eye-opening adaptive DFE

The opening of data eyes can be used to guide the search for the optimal parameters of DFE [6, 60]. Eye-opening can be captured using an eye-opening monitor (EOM), as shown in Fig. 11. A one-dimensional EOM quantifies the opening of data eyes by either the vertical or the horizontal dimension of the eye, with the vertical opening the most widely used (Fig. 12) [21, 35, 76–79]. The horizontal eye-opening can be determined by oversampling received data symbols. This, however, is at the cost of high silicon and power consumption [61]. Improvements were made in [24, 80] where both the edges and the centre of the eye are used to quantify the opening of the eye. A two-dimensional EOM measures the dimension of the eye in both the vertical and horizontal directions [81–84]. Since the edge of the eye tc is determined by CDR, tH and tL can be chosen for the desired horizontal eye-opening using delay blocks.

In one-dimensional EOM-based adaptive DFE, we are only concerned with the cases where vin < VH and vin > VL and do not care about the cases where vin > VH or vin < VL. LMS, on the other hand, searches for optimal tap coefficients that minimise $\|v_{in} - V_H\|$. Clearly, LMS will converge more slowly as compared with one-dimensional EOM-based adaptive algorithms that only need to satisfy the relaxed constraint vin > VH or vin < VL. This is the fundamental difference between SS-LMS adaptive DFE and EOM-based adaptive DFE.
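The sign-sign LMS update of (8) can be sketched in a few lines of Python. This is an illustrative behavioural model under stated assumptions rather than any of the cited implementations: the channel is assumed noiseless with a unit main cursor and only post-cursor ISI, the feedback is subtracted from the slicer input, the error is taken as the slicer input minus the decision (with the opposite error convention the sign of the update flips), and the sign of the regressor is taken to be the past decision. The function name, the channel coefficients and the step size are hypothetical.

```python
import random


def ss_lms_dfe(post_cursors, n_symbols=20000, eta=1e-3, seed=1):
    """Adapt DFE taps with the sign-sign LMS rule on a toy channel.

    post_cursors  assumed post-cursor ISI coefficients (main cursor = 1)
    eta           step size of the tap update
    Returns the adapted tap coefficients.
    """
    rng = random.Random(seed)
    n_taps = len(post_cursors)
    taps = [0.0] * n_taps
    sent = [0] * n_taps        # past transmitted symbols, most recent first
    decided = [0] * n_taps     # past slicer decisions, most recent first

    for _ in range(n_symbols):
        d = rng.choice((-1, 1))                                  # NRZ symbol
        v_in = d + sum(h * s for h, s in zip(post_cursors, sent))
        v_s = v_in - sum(c * q for c, q in zip(taps, decided))   # slicer input
        decision = 1 if v_s >= 0 else -1
        err = v_s - decision                                     # residual ISI
        sign_err = (err > 0) - (err < 0)
        for j in range(n_taps):
            # decided[j] is already +1 or -1 (0 only before start-up)
            taps[j] += eta * sign_err * decided[j]
        sent = [d] + sent[:-1]
        decided = [decision] + decided[:-1]
    return taps


if __name__ == "__main__":
    print(ss_lms_dfe([0.4, 0.2]))
```

Because each update moves a tap by a fixed amount $\eta$ regardless of the size of the error, the taps in this model keep dithering around the post-cursor values instead of settling exactly, which is the fluctuation attributed to SS-LMS in Section 4.1.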
An attractive characteristic of EOM-based adaptive DFE is the freedom to adjust VH and VL if a one-dimensional EOM is used, and VH, VL, tH and tL if a two-dimensional EOM is used. Clearly, the increase in the number of constraints of adaptive DFE will result in a better eye-opening and subsequently a better BER.

Fig. 11 Eye-opening monitors
a One-dimensional EOM
b Two-dimensional EOM

Fig. 12 One-dimensional eye-opening adaptive DFE [78]

4.3 Jitter-based adaptive DFE

The intrinsic relation between the vertical and horizontal openings of data eyes reveals that minimising timing jitter at the edges of the eye will also maximise the vertical opening of the data eye, as illustrated graphically in Fig. 13. To illustrate this, we represent the eye-diagram with zero jitter with a sinusoid $v_s(t) = V_m \sin(\omega_s t)$ where $\omega_s = \pi/T_s$, as shown in Fig. 13. We further assume that the eye-diagram with non-zero jitter is simply the down-shifted version of the one without jitter, that is, $\hat{v}_s(t) = V_m \sin(\omega_s t) - \Delta V_m$, where $\Delta V_m$ is the variation of the amplitude because of timing jitter. Setting $\hat{v}_s(\Delta t) = 0$ gives $\sin(\omega_s \Delta t) = \Delta V_m / V_m$, so that $\Delta t = (T_s/\pi)\sin^{-1}(\Delta V_m/V_m)$. If $\Delta t \ll T_s$, we have $\Delta t \simeq (T_s/\pi)(\Delta V_m/V_m)$. It is evident that $\Delta V_m$ is directly proportional to $\Delta t$.

In [85], a jitter-based EOM was proposed. The transition edge of data eyes is sampled by a number of samplers, as shown in Fig. 14. XOR gates are used to determine the location of the transition edges and counters are used to record the number of transitions at each sampling position. An edge-transition histogram is generated. Measurement results demonstrate that the larger the eye-opening, the narrower the histogram. Since the quality of the obtained histogram depends upon the number of samplers, this method is power hungry. Also, it becomes difficult to employ multiple samplers when the data rate is high.
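The edge-transition histogram described above can be mimicked in software to make the roles of the XOR gates and counters concrete. This is a behavioural sketch under assumptions, not the circuit of [85]: each row is assumed to hold the binary outputs of the samplers placed across one data edge, and the function names and the width metric are hypothetical choices for illustration.

```python
def edge_histogram(sample_rows):
    """Count data transitions between adjacent sampling positions.

    sample_rows  list of rows, each row holding the 0/1 outputs of the
                 samplers placed across one data edge (assumed format)
    Returns one counter per pair of adjacent samplers.
    """
    n_bins = len(sample_rows[0]) - 1
    counts = [0] * n_bins
    for row in sample_rows:
        for i in range(n_bins):
            # XOR of neighbouring samplers is 1 only where the edge falls
            counts[i] += row[i] ^ row[i + 1]
    return counts


def histogram_width(counts):
    """Number of bins spanned by the non-empty part of the histogram."""
    hit = [i for i, c in enumerate(counts) if c > 0]
    return hit[-1] - hit[0] + 1 if hit else 0


if __name__ == "__main__":
    rows = [
        [0, 0, 1, 1],   # edge between samplers 1 and 2
        [0, 1, 1, 1],   # edge between samplers 0 and 1 (early edge)
        [0, 0, 1, 1],
        [1, 1, 1, 1],   # no transition in this bit
    ]
    counts = edge_histogram(rows)
    print(counts, histogram_width(counts))
```

A narrower histogram in this model corresponds to edges clustered around fewer sampling positions, that is, smaller edge jitter; the resolution is limited by the number of sampling positions, which mirrors the power cost noted above.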
Fig. 14 Data edge-based EOM by Gerfers et al. [85]

To simultaneously minimise the timing jitter and maximise the vertical opening, a dual-mode adaptive DFE was proposed [46]. The dual-mode DFE consists of a data DFE and an edge DFE, with the former maximising the vertical opening and the latter minimising the timing jitter. The edge adaptive DFE reduces the eye-edge timing jitter by 30% without sacrificing vertical opening.

Fig. 13 Relation between vertical opening and edge jitter
a Small eye-edge jitter
b Large eye-edge jitter

4.4 Blind ADC-based adaptive DFE

The effectiveness of EOM-based adaptive DFE depends upon the choice of the reference voltages used to generate the error signals. This approach works well with phase-tracking CDR as the recovered data-sampling clock is positioned at the centre of the data eye. For phase-picking or over-sampling clock and data recovery, since the centre of the data eye is not known, EOM-based adaptive DFE becomes less attractive. In [23, 86], a set of eight vertical openings per UI is used to provide the desired post-cursor profile of the data eye, as shown in Fig. 15. In each sub-interval, LMS is used to adjust the tap coefficients of DFE to maximise the vertical eye-opening in the sub-interval. The incoming data symbol is digitised using four flash ADCs operated in an interleaved manner to cope with high data rates [22]. All adaptive DFE operations are performed in the digital domain to take advantage of its flexibility. Clearly, this flexibility comes at the price of power consumption.

Fig. 15 Blind ADC-based adaptive DFE
v∗in: desired post-cursor response

4.5 Comparison

The preceding presentation of EOM-based adaptive DFE, jitter-based adaptive DFE and ADC-based adaptive DFE reveals the following intrinsic advantages of these adaptive DFEs as compared with LMS adaptive DFE:

1. EOM and jitter-based adaptive DFE allow designers to freely set the constraints that the optimisation algorithms must satisfy. These constraints, such as vertical and horizontal eye-openings and timing jitter, are directly related to the BER of data links.
2. Multiple constraints such as eye-opening and timing jitter at the edges of data eyes can be imposed simultaneously to obtain significantly improved performance, as demonstrated in [46].
3. The optimisation constraint is entirely set by users. For example, in [87], a hexagon two-dimensional EOM was proposed to provide a better measure of the minimum data eye so as to provide an improved two-dimensional EOM adaptive DFE.
4. The step size of EOM adaptive DFE can be set adaptively in accordance with the severity of the violation of the minimum eye-opening or timing jitter so as to provide improved adaptivity and performance, as demonstrated in [81].

5 Future of DFE

Although DFE is most effective in minimising ISI caused by finite channel bandwidth, reflection and crosstalk, a number of challenges exist in deploying DFE for multi-Gbps data communications over highly dispersive channels with strong reflection and crosstalk. It is highly desirable that channel equalisation be fully portable from one technology node to another. ADC-based DFE offers such flexibility. Performing analog-to-digital conversion at multi-Gbps without excessive power consumption, however, remains a steep cliff to climb. The recent deployment of three-dimensional MOSFET, also known as FINFET, reduces the minimum channel length to below 28 nm. The speed advantage offered by FINFET equips designers with an effective means to overcome the difficulties encountered in meeting the timing constraint of DFE. The utilisation of FINFET in the design of the building blocks of DFE is being carried out by leading SerDes developers worldwide. It is expected that exciting results will soon emerge.

To equalise highly dispersive channels, a large number of taps are needed. This is often accompanied by excessive power and silicon consumption. Adaptive DFE is proven to be an effective means to achieve this. For channels with strong reflection and crosstalk, since the location of the reflection and crosstalk varies from one application to another, the post-cursors caused by the reflection and crosstalk are often located far away from the main cursor. As a result, DFE with a large number of taps is needed. Algorithms that automatically determine the location of the post-cursors, that is, the sparsity of the post-cursors, are needed in order to minimise the number of taps needed and subsequently the silicon and power consumption of DFE. Portability and adaptivity are therefore the two most important issues that future DFE must address while meeting the timing requirement.

6 Conclusions

A comprehensive review of decision feedback equalisation for multi-Gbps data links has been presented. The imperfections of wire channels and their impact on data transmission have been investigated. The pros and cons of near-end and far-end channel equalisations that combat ISI have been explored. A detailed examination of the principle, configuration, operation and limitation of DFE has been provided. Design challenges encountered in DFE for multi-Gbps data links including BER, timing constraints, error propagation, arithmetic operation, sampling and delay cells, and circuit techniques addressing these challenges, have been studied. The need for and the principle of adaptive DFE have also been investigated.

7 Acknowledgment

The authors are deeply grateful to the reviewers for their invaluable comments. The paper could not have been in its present form without the criticism and suggestions of the reviewers. This project was financially supported by the Natural Sciences and Engineering Research Council (NSERC) of Canada.
8 References

1 Hu, A., Yuan, F.: ‘Inter-signal timing skew compensation of parallel links with voltage-mode incremental signaling’, IEEE Trans. Circuits and Systems I, 2009, 56, (4), pp. 773–783
2 Aziz, P., Kimura, H., Malipatil, A., Kotagiri, S.: ‘A class of down-sampled floating tap DFE architectures with application to serial links’. Proc. IEEE Int. Symp. Circuits and Systems, 2012, pp. 325–328
3 Zhong, F., Quan, S., Liu, W., et al.: ‘A 1.0625-to-14.025 Gb/s multimedia transceiver with full-rate source-series-terminated transmit driver and floating-tap decision-feedback equalizer in 40 nm CMOS’. IEEE Int. Solid-State Circuit Conf. Digest of Technical Papers, 2011, pp. 348–349
4 Zhong, F., Quan, S., Liu, W., et al.: ‘A 1.0625–14.025 Gb/s multi-media transceiver with full-rate source-series-terminated transmit driver and floating-tap decision-feedback equalizer in 40 nm CMOS’, IEEE J. Solid-State Circuits, 2011, 46, (12), pp. 3126–3139
5 Shahramian, S., Yasotharan, H., Carusone, A.: ‘Decision feedback equalizer architectures with multiple continuous-time infinite impulse response filters’, IEEE Trans. Circuits Syst. II, 2012, 59, (6), pp. 226–230
6 Payne, R., Bhakta, B., Ramaswamy, S., et al.: ‘A 6.25 Gb/s binary adaptive DFE with first post-cursor tap cancellation for serial backplane communications’. IEEE Int. Solid-State Circuits Conf. Digest of Technical Papers, 2005, pp. 68–69
7 Krishna, K., Yokoyama-Martin, D., Caffee, A., et al.: ‘A multi-giga-bit backplane transceiver core in 0.13-μm CMOS with a power-efficient equalization architecture’, IEEE J. Solid-State Circuits, 2005, 40, (12), pp. 2658–2666
8 Beukema, T., Sorna, M., Selander, K., et al.: ‘A 6.4-Gb/s CMOS SerDes core with feed-forward and decision-feedback equalization’, IEEE J. Solid-State Circuits, 2005, 40, (12), pp. 2633–2645
9 Stojanovic, V., Ho, A., Garlepp, B., et al.: ‘Autonomous dual-mode (PAM2/4) serial link transceiver with adaptive equalization and data recovery’, IEEE J. Solid-State Circuits, 2005, 40, (4), pp. 1012–1026
10 Balan, V., Caroselli, J., Chern, J., et al.: ‘A 4.8-6.4-Gb/s serial link for backplane applications using decision feedback equalization’, IEEE J. Solid-State Circuits, 2005, 40, (9), pp. 1957–1967
11 Wong, K., Rylyakov, A., Yang, C.: ‘A 5 mW 6 Gb/s quarter-rate sampling receiver with a 2-tap DFE using soft decisions’, IEEE J. Solid-State Circuits, 2007, 42, (4), pp. 881–888
12 Park, M., Bulzacchelli, J., Beakes, M., Friedman, D.: ‘A 7 Gb/s 9.3 mW 2-tap current-integrating DFE receiver’. IEEE Int’l Solid-State Circuits Conf. Digest of Technical Papers, February 2007, pp. 230–231
13 Leibowitz, B., Kizer, J., Lee, H., et al.: ‘A 7.5 Gb/s 10-tap DFE receiver with first tap partial response, spectrally gated adaptation, and 2nd-order data-filtered CDR’. IEEE Int. Solid-State Circuits Conf. Digest of Technical Papers, February 2007, pp. 228–599
14 Bulzacchelli, J., Meghelli, M., Rylov, S., et al.: ‘A 10-Gb/s 5-tap DFE/4-tap FFE transceiver in 90 nm CMOS technology’, IEEE J. Solid-State Circuits, 2006, 41, (12), pp. 2885–2900
15 Hidaka, Y., Gai, W., Horie, T., Jiang, J., Koyanagi, Y., Osone, H.: ‘A 4-channel 1.25-10.3 Gb/s backplane transceiver macro with 35 dB equalizer and sign-based zero-forcing adaptive control’, IEEE J. Solid-State Circuits, 2009, 44, (12), pp. 3547–3559
16 Seong, C., Rhim, J., Choi, W.: ‘A 10-Gb/s adaptive look-ahead decision feedback equalizer with an eye-opening monitor’, IEEE Trans. Circuits Syst. II, 2012, 59, (4), pp. 209–213
17 Huang, Y., Liu, S.: ‘A 6 Gb/s receiver with 32.7 dB adaptive DFE-IIR equalization’. IEEE Int. Solid-State Circuits Conf. Digest of Technical Papers, February 2011, pp. 356–358
18 Turker, D., Rylyakov, A., Friedman, D., Gowda, S., Sanchez-Sinencio, E.: ‘A 19 Gb/s 30 mW 1-tap speculative DFE receiver in 90 nm CMOS’. Symp. VLSI Circuits Digest of Technical Papers, 2008, pp. 216–217
19 Bulzacchelli, J., Dickson, T., Deniz, Z., et al.: ‘A 78 mW 11.1 Gb/s 5-tap DFE receiver with digitally calibrated current-integrating summers in 65 nm CMOS’. IEEE Int’l Solid-State Circuits Conf. Digest of Technical Papers, February 2009, pp. 368–369
20 Kim, B., Liu, Y., Dickson, T., Bulzacchelli, J., Friedman, D.: ‘A 10-Gb/s compact low-power serial I/O with DFE-IIR equalization in 65-nm CMOS’, IEEE J. Solid-State Circuits, 2009, 44, (12), pp. 3526–3538
21 Pozzoni, M., Erba, S., Viola, P., et al.: ‘A multi-standard 1.5 to 10 Gb/s latch-based 3-tap DFE receiver with a SSC tolerant CDR for serial backplane communication’, IEEE J. Solid-State Circuits, 2009, 44, (4), pp. 1306–1315
22 Harwood, M., Warke, N., Simpson, R., et al.: ‘A 12.5 Gb/s SerDes in 65 nm CMOS using a baud-rate ADC with digital receiver equalization and clock recovery’. Proc. IEEE Custom Integrated Circuits Conf., 2007, pp. 436–437
23 Abiri, B., Sheikholeslami, A., Tamura, H., Kibune, M.: ‘A 5 Gb/s adaptive DFE for 2x blind ADC-based CDR in 65 nm CMOS’. IEEE Int. Solid-State Circuit Conf. Digest of Technical Papers, February 2011, pp. 436–437
24 Sarvari, S., Tahmoureszadeh, T., Sheikholeslami, A., Tamura, H., Kibune, M.: ‘A 5 Gb/s speculative DFE for 2x blind ADC-based receivers in 65-nm CMOS’. Symp. VLSI Circuits Digest of Technical Papers, June 2010, pp. 69–70
25 Wang, H., Lee, J.: ‘A 21-Gb/s 87-mW transceiver with FFE/DFE/analog equalizer in 65-nm CMOS technology’, IEEE J. Solid-State Circuits, 2010, 45, (4), pp. 909–920
26 Ying, Y., Liu, S.: ‘A 20 Gb/s digitally adaptive equalizer/DFE with blind sampling’. IEEE Int. Solid-State Circuits Conf. Digest of Technical Papers, February 2011, pp. 444–446
27 Dickson, T., Liu, Y., Rylov, S., et al.: ‘An 8x 10-Gb/s source-synchronous I/O system based on high-density silicon carrier interconnects’, IEEE J. Solid-State Circuits, 2012, 47, (4), pp. 884–896
28 Nazari, M., Emami-Neyestanak, A.: ‘A 15-Gb/s 0.5-mW/Gbps two-tap DFE receiver with far-end crosstalk cancellation’, IEEE J. Solid-State Circuits, 2012, 47, (10), pp. 2420–2432
29 Gangasani, G., Hsu, C., Bulzacchelli, J., et al.: ‘A 16-Gb/s backplane transceiver with 12-tap current integrating DFE and dynamic adaption of voltage offset and timing drifts in 45-nm SOI CMOS technology’, IEEE J. Solid-State Circuits, 2012, 47, (8), pp. 1828–1841
30 Agrawal, A., Dickson, J.B.T., Liu, Y., Tierno, J., Friedman, D.: ‘A 19-Gb/s serial link receiver with both 4-tap FFE and 5-tap DFE function in 45-nm SOI CMOS’, IEEE J. Solid-State Circuits, 2012, 47, (12), pp. 134–136
31 Amirkhany, A., Kaviani, K., Abbasfar, A., et al.: ‘A 4.1-pJ/b, 16-Gb/s coded differential bidirectional parallel electrical link’, IEEE J. Solid-State Circuits, 2012, 47, (12), pp. 3208–3219
32 Kaviani, K., Wu, T., Wei, J., et al.: ‘A tri-modal 20-Gbps/link differential/DDR3/GDDR5 memory interface’, IEEE J. Solid-State Circuits, 2012, 47, (4), pp. 926–937
33 Joy, A., Mair, H., Lee, H., et al.: ‘Analog-DFE-based 16 Gb/s SerDes in 40 nm CMOS that operates across 34 dB loss channels at Nyquist with a baud rate CDR and 1.2 Vpp voltage-mode driver’. IEEE Int’l Solid-State Circuit Conf. Digest of Technical Papers, February 2011, pp. 350–351
34 Cui, D., Raghavan, B., Singh, U., et al.: ‘A dual-channel 23-Gbps CMOS transmitter/receiver for 40-Gbps RZ-DQPSK and CS-RZ-DQPSK optical transmission’, IEEE J. Solid-State Circuits, 2012, 47, (12), pp. 3249–3260
35 Spagna, F., Chen, L., Deshpande, M., et al.: ‘A 78 mW 11.8 Gb/s serial link transceiver with adaptive RX equalization and baud-rate CDR in 32 nm CMOS’. IEEE Int’l Solid-State Circuits Conf. Digest of Technical Papers, February 2010, pp. 366–367
36 Toifl, T., Menolfi, C., Ruegg, M., et al.: ‘A 2.6 mW/Gb/s 12.5 Gbps RX with 8-tap switched-capacitor DFE in 32 nm CMOS’, IEEE J. Solid-State Circuits, 2012, 47, (4), pp. 897–910
37 Toifl, T., Menolfi, C., Ruegg, M., et al.: ‘A 3.1 mW/Gb/s 30 Gbps quarter-rate triple-speculation 15-tap SC-DFE RX data path in 32 nm CMOS’. Symp. VLSI Circuits Digest of Technical Papers, 2012, pp. 102–103
38 Bulzacchelli, J., Menolfi, C., Beukema, T., et al.: ‘A 28-Gb/s 4-tap FFE/15-tap DFE serial link transceiver in 32-nm SOI CMOS technology’, IEEE J. Solid-State Circuits, 2012, 47, (12), pp. 324–326
39 Savoj, J., Hsieh, K., Upadhyaya, P., et al.: ‘A wide common-mode fully-adaptive multi-standard 12.5 Gb/s backplane transceiver in 28 nm CMOS’. Symp. VLSI Circuits Digest of Technical Papers, 2012, pp. 104–105
40 Dickson, T., Bulzacchelli, J., Friedman, D.: ‘A 12 Gb/s 11 mW half-rate sampled 5-tap decision feedback equalizer with current-integrating summers in 45-nm SOI CMOS technology’, IEEE J. Solid-State Circuits, 2009, 44, (4), pp. 1298–1230
41 Zhao, Z., Wang, J., Li, S., Chen, J.: ‘A 2.5-Gb/s 0.13 μm CMOS current mode logic transceiver with pre-emphasis and equalization’. Proc. Int. Conf. ASIC, October 2007, pp. 368–371
42 Kao, S., Liu, S.: ‘A 20-Gb/s transmitter with adaptive pre-emphasis in 65-nm CMOS technology’, IEEE Trans. Circuits Syst. II, 2010, 57, (5), pp. 319–323
43 Fiedler, A., Mactaggart, R., Welch, J., Krishnan, S.: ‘A 1.0625 Gbps transceiver with 2x-oversampling and transmit signal preemphasis’. IEEE Int. Solid-State Circuits Conf. Digest of Technical Papers, February 1997, pp. 238–239
44 Dally, J., Poulton, J.: ‘Transmitter equalization for 4-Gbps signaling’, IEEE Micro, 1997, 17, (1), pp. 48–56
45 Yuan, F.: ‘An area-power efficient 4-PAM full-clock 10 Gb/s CMOS pre-emphasis serial link transmitter’, Analog Integr. Circuits Signal Process., 2009, 59, (3), pp. 257–264
46 Wong, K., Chen, E., Yang, C.: ‘Edge and data adaptive equalization of serial-link transceivers’, IEEE J. Solid-State Circuits, 2008, 43, (9), pp. 2157–2169
47 Lee, M., Dally, W., Chiang, P.: ‘Low-power area-efficient high-speed I/O circuit techniques’, IEEE J. Solid-State Circuits, 2000, 35, (11), pp. 1591–1599
48 Lee, M., Dally, W., Farjad-Rad, R., et al.: ‘CMOS high-speed I/Os present and future’. Proc. Int’l Conf. Computer Design, October 2003, pp. 454–461
49 Balan, V., Caroselli, J., Chern, J., Desai, C., Liu, C.: ‘A 4.8-6.4 Gbps serial link for back-plane applications using decision feedback equalization’. Proc. IEEE Custom Integrated Circuits Conf., 2004, pp. 331–334
50 Razavi, B.: ‘Design of integrated circuits for optical communications’ (McGraw-Hill, New York, 2003)
51 Kammeyer, K.: ‘Time truncation of channel impulse responses by linear filtering: A method to reduce the complexity of Viterbi equalization’, Int. J. Electron. Commun. (AEU), 1994, 48, (5), pp. 237–243
52 Gondi, S., Razavi, B.: ‘Equalization and clock and data recovery techniques for 10 Gb/s CMOS serial-link receivers’, IEEE J. Solid-State Circuits, 2007, 42, (9), pp. 1999–2011
53 Lee, D., Han, J., Han, G., Park, S.: ‘10 Gbit/s 0.0065 mm2 6 mW analogue adaptive equalizer utilizing negative capacitance’, IET Electron. Lett., 2009, 45, (17), pp. 863–865
54 Austin, M.: ‘Decision-feedback equalization for digital communication over dispersive channels’. IEEE Int. Research Laboratory of Electronics Technical Report 461, August 1967
55 Chen, W., Huang, G.: ‘A parallel multi-pattern PRBS generator and BER tester for 40 Gbps SerDes application’. Proc. IEEE Asia-Pacific Conf. Advanced System Integrated Circuits, August 2004, pp. 318–321
56 Dickson, T., Bulzacchelli, J., Friedman, D.: ‘A 12 Gb/s 11 mW half-rate sampled 5-tap decision feedback equalizer with current-integrating summers in 45 nm SOI CMOS technology’. Symp. VLSI Circuits Digest of Technical Papers, 2008, pp. 58–59
57 Toifl, T., Ruegg, T.M.M., Inti, R., et al.: ‘A 3.1 mW/Gbps 30 Gbps quarter-rate triple-speculation 15-tap SC-DFE RX data path in 32 nm CMOS’. Symp. VLSI Circuits Digest of Technical Papers, 2012, pp. 102–103
58 Winters, J., Kasturia, S.: ‘Adaptive nonlinear cancellation for high-speed fiber-optic systems’, J. Lightwave Technol., 1992, 10, (7), pp. 971–977
59 Ren, J., Lee, H., Lin, Q., et al.: ‘Precursor ISI reduction in high-speed I/O’. Symp. VLSI Circuits Digest of Technical Papers, 2007, pp. 134–135
60 Payne, R., Bhakta, B., Ramaswamy, S., et al.: ‘A 6.25-Gb/s binary transceiver in 0.13-μm CMOS for serial data transmission across high loss legacy backplane channels’, IEEE J. Solid-State Circuits, 2005, 40, (12), pp. 2646–2657
61 Lee, S., Hwang, M., Choi, Y., et al.: ‘A 5-Gb/s 0.25-μm CMOS jitter-tolerant variable-interval oversampling clock/data recovery circuit’, IEEE J. Solid-State Circuits, 2002, 37, (12), pp. 1822–1830
62 Milijevic, S., Kwasniewski, T.: ‘4 Gbit/s receiver with adaptive blind DFE’, IEE Electron. Lett., 2005, 41, (25), pp. 1373–1374
63 Ibrahim, S., Razavi, B.: ‘Low-power CMOS equalizer design for 20-Gb/s systems’, IEEE J. Solid-State Circuits, 2011, 46, (6), pp. 1321–1336
64 Razavi, B.: ‘Design of analog CMOS integrated circuits’ (McGraw-Hill, New York, 2001)
65 Zhu, X., Chen, Y., Kibune, M., et al.: ‘A dynamic offset control technique for comparator design in scaled CMOS technology’. Proc. IEEE Custom Integrated Circuits Conf., 2008, pp. TP–011–014
66 McLeod, S., Sheikholeslami, A., Yamamoto, T., Nedovic, N., Tamura, H., Walker, W.: ‘A digital offset-compensation scheme for an LA and CDR in 65-nm CMOS’. Symp. VLSI Circuits Digest of Technical Papers, June 2009, pp. 448–449
67 Sidiropoulos, S., Horowitz, M.: ‘A 700 Mb/s/pin CMOS signaling interface using current integrating receivers’, IEEE J. Solid-State Circuits, 1997, 32, (5), pp. 681–690
68 Wang, T., Yuan, F.: ‘A new current-mode incremental signaling scheme with applications to Gb/s parallel links’, IEEE Trans. Circuits Syst. I., 2007, 54, (2), pp. 255–267
69 Carusone, T., Johns, D., Martin, K.: ‘Analog integrated circuit design’ (John Wiley and Sons, New York, 2012, 2nd edn.)
70 Fayed, A., Ismail, M.: ‘A low-voltage, low-power CMOS analog adaptive equalizer for UTP-5 cables’, IEEE Trans. Circuits Syst. I, 2008, 55, (2), pp. 480–495
71 Bulzacchelli, J., Rylyakov, A., Friedman, D.: ‘Power-efficient decision-feedback equalizers for multi-Gb/s CMOS serial links’. Proc. IEEE Radio Frequency Integrated Circuits Symp., June 2007, pp. 507–510
72 Payandelhnia, P., Abbasfar, A., Sheikhaei, S., Forouzandeh, B., Nanbakhsh, K., Eghbali, A.: ‘A 4 mW 3-tap 10 Gb/s decision feedback equalizer’. Proc. IEEE Mid-West Symp. Circuits and Systems, 2011, pp. 1–4
73 Rontogiannis, A., Berberidis, K.: ‘Efficient decision feedback equalization for sparse wireless channels’, IEEE Trans. Wirel. Commun., 2003, 2, (3), pp. 570–581
74 Mietzner, J., Badri-Hoeher, S., Land, I., Hoeher, P.: ‘Equalization of sparse intersymbol-interference channels revisited’, EURASIP J. Wirel. Commun. Netw., 2006, 2006, (2), pp. 1–13
75 Winters, J., Gitlin, R.: ‘Electrical signal processing techniques in long-haul fiber-optic systems’, IEEE Trans. Commun., 1990, 38, (9), pp. 1439–1453
76 Bien, F., Kim, H., Hur, Y., et al.: ‘A 10 Gb/s reconfigurable CMOS equalizer employing a transition detector-based output monitoring technique for band-limited serial links’, IEEE Trans. Microw. Theory Tech., 2006, 54, (12), pp. 4538–4547
77 Hyoungsoo, K., de Ginestous, J., Bien, F., et al.: ‘An electronic dispersion compensator (EDC) with an analog eye-opening monitor (EOM) for 1.25 Gb/s gigabit passive optical network (GPON) upstream links’, IEEE Trans. Microw. Theory Tech., 2007, 55, (12), pp. 2942–2950
78 Hong, D., Cheng, K.: ‘An accurate jitter estimation technique for efficient high speed I/O testing’. Proc. Asian Test Symp., October 2007, pp. 224–229
79 Chen, L., Zhang, X., Spagna, F.: ‘A scalable 3.6-5.2 mW 5-to-10 Gb/s 4-tap DFE in 32 nm’. IEEE Int. Solid-State Circuits Conf. Digest of Technical Papers, February 2009, pp. 180–181
80 Suttorp, T., Langmann, U.: ‘A 10-Gb/s CMOS serial-link receiver using eye-opening monitoring for adaptive equalization and for clock and data recovery’. Proc. IEEE Custom Integrated Circuits Conf., September 2007, pp. 277–280
81 Ellermeyer, T., Langman, U., Wedding, B., Pohlmann, W.: ‘A 10 Gb/s eye-opening monitor IC for decision-guided adaptation of the frequency response of an optical receiver’, IEEE J. Solid-State Circuits, 2000, 35, (12), pp. 1958–1963
82 Analui, B., Rylyakov, A., Rylov, S., Meghelli, M., Hajimiri, A.: ‘A 10-Gb/s two-dimensional eye-opening monitor in 0.13-μm standard CMOS’, IEEE J. Solid-State Circuits, 2005, 40, (12), pp. 2689–2699
83 Noguchi, H., Yoshida, N., Uchida, H., Ozaki, M., Kanemitsu, S., Wada, S.: ‘A 40-Gb/s CDR circuit with adaptive decision-point control based on eye-opening monitor feedback’, IEEE J. Solid-State Circuits, 2008, 43, (12), pp. 2929–2938
84 Bhatta, D., Kim, K., Gebara, E., Laskar, J.: ‘A 10 Gb/s two dimensional scanning eye opening monitor in 0.18 μm CMOS process’. Proc. IEEE Int. Microwave Symp. Digest, June 2009, pp. 1141–1144
85 Gerfers, F., Besten, G., Petkov, P., Conder, J., Koellmann, A.: ‘A 0.2-2 Gb/s 6x OSR receiver using a digitally self-adaptive equalizer’, IEEE J. Solid-State Circuits, 2008, 43, (6), pp. 1436–1448
86 Abiri, B., Sheikholeslami, A., Tamura, H., Kibune, M.: ‘An adaptation engine for a 2x blind ADC-based CDR in 65 nm CMOS’, IEEE J. Solid-State Circuits, 2011, 46, (12), pp. 3140–3149
87 AL-Taee, A., Yuan, F., Ye, A., Sadr, S.: ‘A new two-dimensional eye-opening monitor for Gbps serial links’, IEEE Trans. VLSI, 2013

IET Circuits Devices Syst., 2014, Vol. 8, Iss. 2, pp. 118–130 doi: 10.1049/iet-cds.2013.0159