AN OVERVIEW OF HIGH-SPEED SERIAL I/O TRENDS, TECHNIQUES AND STANDARDS

Farhad Zarkeshvari, Peter Noel, Sergei Uhanov and Tad Kwasniewski
Department of Electronics, Carleton University
{fzarkes, pnoel, suhanov, tak}@doe.carleton.ca

Abstract

The goal of this paper is to provide the reader with a brief overview of the basic building blocks within a high-speed serial transceiver, to outline the major interconnect standards that use high-speed serial I/O circuitry, and to give the basics of the design techniques required to successfully design and implement a typical multi-GHz serial I/O device. Several major design obstacles are presented, followed by a discussion of the design techniques that may be used to overcome such implementation issues. The paper covers two main design approaches: low-swing differential signaling and multilevel signaling.

Keywords: High-Speed Data, Serial I/O, RapidIO, InfiniBand, HyperTransport.

1. INTRODUCTION

High-speed data transport and device integration are two main requirements of network development and installation. A network can be as small as the motherboard of a microprocessor or as large as the most complex telecommunication network. All such applications require reliable high-speed serial interconnection. The increasing demand for more bandwidth to support inter-device communication is driving the development of high-speed serial transceivers. A typical networking installation utilizing 10 Gigabit Ethernet, for example, must incorporate multi-GHz serial I/O. The use of such interconnect speeds has become so commonplace in intellectual property (IP) development that the leading FPGA suppliers provide such cores as standard offerings on their higher-performance programmable devices. This is not to say that the design struggle to implement a more reliable, yet even faster, interconnect has been solved.
To the contrary, the device community is viewing such offerings as an indication that even higher serial data rates must be imminent. Thus, the design challenge continues.

2. HIGH-SPEED I/O TRENDS

To understand the trends of high-speed I/O, the designer must appreciate the current requirements and limitations of existing telecommunications and networking infrastructures. The shared multi-drop bus has been exploited to its full potential. Many techniques have been applied, such as increasing frequency, widening the interface, pipelining transactions, splitting transactions, and allowing out-of-order completion. Continuing to work with a bus in this manner creates several design issues. Increasing bus width, for example, reduces the maximum achievable frequency due to skew between signals. More signals also result in more pins on a device, more traces on boards and larger connectors, causing a higher product cost and a reduction in the number of interfaces a system or device can provide. Worsening the situation is the desire to provide point-to-multipoint interconnections. As frequency and width increase, attaching more than a few devices to a shared bus becomes a difficult design challenge.

To solve the connection problem, major technology companies and standardization bodies such as the Optical Internetworking Forum (OIF) and the Network Processing Forum (NPF) have developed strategies, implemented in standards like RapidIO, HyperTransport, InfiniBand and PCI Express, that utilize three approaches: 1) packet switching, 2) point-to-point unidirectional connections and 3) minimal pin count.

With the packet switching approach, a shared bus is replaced with a switch fabric that controls the flow of the packetized data between devices. One data link may carry the traffic between many devices concurrently, with each packet being delivered to the appropriate destination (just like in the Internet world with its IP packets and routing switches).
For example, with the HyperTransport protocol, commands, addresses and data are contained in all packets. The data link of such interconnects provides flow control and error management to ensure high reliability and exhibits very low latency due to the simplicity of the protocol.

CCECE 2004 - CCGEI 2004, Niagara Falls, May/mai 2004. 0-7803-8253-6/04/$17.00 ©2004 IEEE. Authorized licensed use limited to: Carleton University. Downloaded on July 13, 2009 at 14:27 from IEEE Xplore. Restrictions apply.

The second approach utilizes point-to-point serial unidirectional connections. This dramatically reduces the number of connections, the pin count and the cost. These connections, relieved from the skew constraints, can run at high speeds through serial interfaces, further increasing the bandwidth of the system. To achieve very high data rates, I/O interfaces use low-swing differential signaling such as LVDS with on-die differential termination.

The third fundamental trend concentrates on minimizing pin count. This allows smaller packages, reduced power consumption and better thermal characteristics. At first glance, differential signaling would seem to increase pin count, as it requires two pins per bit of width and separate upstream and downstream data paths. However, the increase in signal pins is offset by two factors: a) the use of separate data paths permits operation at higher frequencies; and b) differential signaling provides a return current path for each signal, thereby reducing the number of power and ground pins.

An emerging trend is built-in scalability of the interfaces in both frequency and data width. This can guarantee a longer life for a particular I/O standard in the fast-changing standards world.

3. TRANSCEIVER TECHNOLOGY

Conventional I/O methods use full-swing unterminated signaling and massive parallelism and have the disadvantage of increased packaging and PCB manufacturing costs.
Frequency-dependent distortion arising from skin effect, dielectric losses and reflections has become increasingly problematic at high data rates. The major performance-limiting factor for digital systems is the interconnection bandwidth between chips, boards and cabinets. Full-swing CMOS must ring up the line, and is bandwidth limited by the length of the line rather than by the performance of the semiconductor technology. As VLSI technology scales, the pin bandwidth does not scale accordingly but remains limited by board and cable geometry, making off-chip bandwidth an even more critical bottleneck. There are two main approaches to overcoming these problems and producing reliable high-speed interconnects. One is low-swing differential signaling and the other is multilevel signaling.

3.1 Differential Signaling

The noise margin on digital chip-to-chip interconnects has been decreasing for two main reasons. First, supply voltages in digital CMOS processes are decreasing, reducing the voltage available for driving I/Os. Second, small signal swings are being used to reduce dynamic power dissipation on high-speed buses. Fully differential signals effectively reject common-mode noise and even-order distortion terms. Since common-mode noise is prevalent on matched PCB traces, differential signaling is effective for both voltage-mode and current-mode interfaces. The differential signaling schemes used for Gb/s interconnect are emitter-coupled logic (ECL), pseudo emitter-coupled logic (PECL) and low-voltage differential signaling (LVDS).

3.1.1 ECL and PECL

The benefits of cost reduction and increased density make it desirable to implement I/O cells in low-voltage digital CMOS technology, thus avoiding any additional masks for bipolar or 5-V devices. Emitter-coupled logic is a low-swing standard that has been the dominant technique in the implementation of high-speed digital systems and is normally associated with bipolar technology.
With advances in CMOS technology, sub-micrometer CMOS circuits are becoming competitive with high-speed ECL. Low-swing signaling in CMOS features increased speed, lower power consumption and higher integration density. Positive-biased ECL signals, also known as pseudo-ECL (PECL), differ from true ECL in that PECL uses positive voltages instead of negative voltages. PECL transmitters may be implemented as switched-current or switched-voltage drivers, as shown in Figure 1, in order to provide standard PECL levels on the external termination resistor.

Figure 1: Implementation of a line driver as (a) a switched-voltage or (b) a switched-current source [1]

Although the switched-current architecture was the preferred approach, it supports only one termination scheme. Switched-voltage architectures are more flexible and allow several termination schemes, if the output resistance is sufficiently low. The constraints posed on the output resistance lead to large transistor sizes and additional power consumption due to multiple buffer stages. This may be a disadvantage if the transmitter has to be embedded in an ASIC with a large number of outputs.

In the dynamically biased driver of Figure 4, two different values of bias current are used in the falling and rising transitions so that both are optimized for minimum power consumption. Dynamic biasing has the advantage of maintaining the PMOS device bias current at a fairly constant level between logic state transitions. This leads to a constant output resistance, which is useful for a series termination. The reduced current through the PMOS device leads to a lower size ratio and therefore lower area and parasitics.

Figure 2: Termination schemes for single-ended ECL drivers [1]: (a) canonical (parallel), (b) Thevenin, (c) series

An overview of typical termination schemes used in ECL or PECL circuits is shown in Figure 2.
Besides the canonical (parallel) termination scheme of Figure 2(a), a Thevenin termination, as in Figure 2(b), can be used at the load end, with the advantage that the additional supply is avoided. With an appropriate choice of R1 and R2, the same loading characteristic as for the parallel termination is obtained, but at the expense of higher power consumption. A series termination at the source end, as in Figure 2(c), is an attractive alternative because of its lower power consumption. It provides better suppression of reflected waves caused by line-to-line crosstalk. Both the Thevenin and series terminations increase the sensitivity of the output levels to the supply voltage.

3.1.2 Switched-voltage Configuration

A block diagram of a switched-voltage configuration transmitter with a dc biasing block and an external termination resistor is shown in Figure 3. The required voltage references are obtained by means of a scaled replica circuit and several feedback loops. The loading effect of the external termination resistor RT is replicated by the internal resistors Rm connected toward an internally developed voltage reference equal to VDD − VT, with VT = 2 V.

Figure 3: A PECL transmitter [1]

The schematic of a PECL driver is shown in Figure 4. This circuit uses the technique of dynamic biasing to minimize the transition time of the output voltage without increasing the output stage power consumption. M1 and M2 make up a differential pair. The dynamic biasing is forced by M5 and M8.

Figure 4: A switched-voltage PECL transmitter [1]

3.1.3 Switched-current Configuration

The block diagram of a switched-current configuration, introduced in [2], is shown in Figure 5. An open-drain configuration is used, which suffers from a slow rise time. Two active pull-up circuits and a pulse-biasing scheme are used to improve edge transition rates. The circuit diagram is shown at the bottom of Figure 5.
M1a and M1b make up the differential pair and M2 provides the dc bias. M3, together with the overlap generator and edge detector, provides a pulse bias current in each transient. This improves the circuit rise and fall times.

Figure 5: Switched-current configuration [2]

3.1.4 LVDS

Low-voltage differential signaling (LVDS) is a low-noise, low-power, low-amplitude method for high-speed (Gb/s) data transmission over copper wire. LVDS achieves significant power savings by means of a differential scheme for transmission and termination, in conjunction with a low voltage swing. LVDS uses a transmitter configured as a switched-polarity current generator.

In order to achieve higher precision and lower circuit complexity, a simple low-power common-mode feedback control is implemented in the transmitter of Figure 7. The common-mode output voltage is sensed by means of a high-resistance divider RA-RB and compared with a 1.25-V reference by the differential amplifier M5-M8. The fraction of the tail current IT flowing across M7 and M8 is mirrored by MU and ML, respectively, thus forcing VCM = 1.25 V.

Figure 6 shows different termination topologies. Figure 6(a) shows a differential load resistor at the receiver that provides current-to-voltage conversion and optimum line matching. For operation in the Gb/s range, an additional termination resistor is usually placed at the source end, as seen in Figure 6(b). This serves to suppress reflected waves caused by crosstalk or by imperfect termination due to package parasitics and component tolerance. Differential transmission greatly improves the robustness of the link to common-mode voltage bouncing (if a cable is used as the medium) and to crosstalk, thereby improving the tolerance to a reduced noise margin.
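The common-mode rejection described above can be illustrated with a small numerical sketch. The Python below is not taken from the cited designs: it assumes an ideal receiver, the 1.25 V common-mode level mentioned in the text, a hypothetical swing of ±175 mV on each wire, and an arbitrary sinusoidal disturbance that couples equally onto both wires.

```python
import math

# Assumed values for illustration only (not from the cited designs).
V_CM = 1.25      # common-mode level in volts
V_SWING = 0.175  # hypothetical swing on each wire in volts


def differential_receive(v_p, v_n):
    """Ideal differential receiver: returns v_p - v_n sample by sample."""
    return [p - n for p, n in zip(v_p, v_n)]


bits = [1, 0, 1, 1, 0, 0, 1, 0]

# Disturbance that appears identically on both wires (common-mode noise).
noise = [0.3 * math.sin(2 * math.pi * k / 5) for k in range(len(bits))]

# The two wires carry complementary signals plus the same disturbance.
v_p = [V_CM + (V_SWING if b else -V_SWING) + n for b, n in zip(bits, noise)]
v_n = [V_CM - (V_SWING if b else -V_SWING) + n for b, n in zip(bits, noise)]

# Differential decision: the common disturbance cancels in the subtraction.
decoded_differential = [1 if v > 0 else 0 for v in differential_receive(v_p, v_n)]

# Single-ended decision on one wire against a fixed threshold, for comparison.
decoded_single_ended = [1 if v > V_CM else 0 for v in v_p]

print("sent:          ", bits)
print("differential:  ", decoded_differential)
print("single-ended:  ", decoded_single_ended)
```

In this sketch the disturbance is larger than the per-wire swing, so the single-ended decision makes errors while the differential decision recovers every bit, because only the difference between the two wires is examined.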
LVDS uses a lower voltage swing that further reduces crosstalk and radiated electromagnetic interference (EMI). LVDS requires less power than either differential or single-ended PECL. PECL exhibits an open-emitter output stage and requires either a resistor to VDD − 2 V at the receiver for line termination and biasing, or a pull-down resistor toward ground. Whichever termination is used, the open-emitter configuration and the larger voltage swing lead to higher power consumption in a PECL link. The circuit diagram of an LVDS transmitter is shown in Figure 7. Either M1 and M3 or M2 and M4 are turned on, resulting in a different output voltage polarity depending on the active combination.

Figure 6: Different solutions for high-speed data: (a) LVDS link with termination at the receiver end, (b) LVDS link with termination at both the source and the receiver, (c) single-ended and (d) differential PECL links [3]

3.2 Non-differential High-speed Circuit Techniques

Despite the use of matched terminations and carefully controlled line and connector impedance, even the better differential (current-mode) signaling methods are still limited to a data rate of about 1.6 GHz, due to the frequency-dependent attenuation of copper lines. Skin-effect resistance increases the attenuation of a conventional transmission line with frequency. For a broadband signal, the superposition of un-attenuated low-frequency signal components with attenuated high-frequency signal components causes intersymbol interference (ISI). This interference degrades noise margins and reduces the maximum frequency at which the system can operate. The main problem here is not the magnitude of the attenuation, but rather the interference caused by the frequency-dependent nature of the attenuation.
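The mechanism can be reproduced with a toy model. The sketch below is only an illustration: it assumes a first-order low-pass filter as a stand-in for the lossy line (a real skin-effect response is not first order), and the names lowpass_channel, transmit and alpha are hypothetical. It compares the received level of an isolated 1 after a long run of 0s with the level reached at the end of a long run of 1s.

```python
def lowpass_channel(samples, alpha):
    """First-order low-pass: y[n] = alpha*x[n] + (1 - alpha)*y[n-1], y[-1] = 0."""
    out, y = [], 0.0
    for x in samples:
        y = alpha * x + (1.0 - alpha) * y
        out.append(y)
    return out


def transmit(bits, samples_per_bit=4):
    """NRZ encoding: each bit becomes +1.0 or -1.0 held for samples_per_bit samples."""
    return [1.0 if b else -1.0 for b in bits for _ in range(samples_per_bit)]


def sample_at_bit_end(waveform, samples_per_bit=4):
    """Take one sample per bit, at the last sample of each bit period."""
    return waveform[samples_per_bit - 1::samples_per_bit]


# Long run of 0s, one isolated 1, a few 0s, then a long run of 1s.
bits = [0] * 8 + [1] + [0] * 3 + [1] * 8
rx = sample_at_bit_end(lowpass_channel(transmit(bits), alpha=0.2))

isolated_one = rx[8]   # high-frequency content: heavily attenuated
settled_one = rx[-1]   # low-frequency content: almost full amplitude

print(f"isolated 1 after a run of 0s: {isolated_one:.3f}")
print(f"1 at the end of a run of 1s:  {settled_one:.3f}")
```

Both samples represent the same transmitted symbol, yet the isolated 1 arrives with only a fraction of the settled amplitude because the line is still recovering from the preceding symbols. That dependence on history, rather than the attenuation itself, is what closes the eye.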
Equalization eliminates the problem of frequency-dependent attenuation by filtering the transmitted or received waveform so that the concatenation of the equalizing filter and the transmission line gives a flat frequency response. Equalization can be performed digitally by a discrete-time finite impulse response (FIR) filter or in the analog domain by a continuous-time passive or active filter.

Figure 7: LVDS transmitter circuit [3]

Equalizing with an FIR filter is most popular. It is more easily realized in a standard CMOS process and it is easier to make adaptive. Equalizing with an FIR filter at the receiver requires either an analog-to-digital converter with at least a few bits of resolution or a high-speed analog delay line, both of which are difficult to design. It is therefore more common and much simpler to equalize at the transmitter than at the receiver. Equalizing at the transmitter permits the use of a simple receiver that just samples a binary value.

In an adaptive equalizer, the coefficients are initially calculated in the receiver during a training sequence. The calculated coefficients are fed back to the transmitter. As the FIR coefficients are updated at the beginning of communication, the equalizer can adapt to the characteristics of the line and may be used for a wide variety of interconnections. Adding more taps to the filter could widen the bandwidth. The number of filter taps chosen is a compromise between bandwidth and equalization cost. Two configurations describing this technique are given in [6] and [11].

3.3 Multi-level Signaling

Multi-level voltage coding is used to lower the baud rate or the frequency content of the signal in order to reduce the ISI and to improve the bit error rate (BER).
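The baud-rate reduction is simple arithmetic: an N-level symbol carries log2(N) bits, so the symbol rate is the bit rate divided by log2(N), and the Nyquist frequency is half the symbol rate. The following sketch works this out for a 10 Gb/s link; the function name pam_symbol_rate and the chosen bit rate are illustrative assumptions.

```python
import math


def pam_symbol_rate(bit_rate_bps, levels):
    """Symbol (baud) rate of an N-level PAM link carrying bit_rate_bps."""
    bits_per_symbol = math.log2(levels)
    return bit_rate_bps / bits_per_symbol


BIT_RATE = 10e9  # assumed link rate of 10 Gb/s, for illustration

for levels in (2, 4, 8):
    baud = pam_symbol_rate(BIT_RATE, levels)
    nyquist_hz = baud / 2  # highest fundamental frequency of the symbol stream
    print(f"{levels}-PAM: {baud / 1e9:.2f} Gbaud, "
          f"Nyquist frequency {nyquist_hz / 1e9:.2f} GHz")
```

The trade-off is that, for a fixed total voltage swing, the spacing between adjacent levels shrinks as the number of levels grows, so the lower frequency content is paid for with a smaller noise margin per level.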
Multi-level signaling uses lower fundamental frequencies than binary signaling at the same data rate, thereby offering the potential of higher performance in bandwidth-limited systems. In a non-return-to-zero (NRZ) N-PAM communication system, the spectral efficiency is 2 × log2(N) bit/s/Hz, which increases logarithmically with the number of PAM levels (N). All techniques mentioned previously can be applied to multilevel signaling, especially equalization in the transmitter. Optimal detection can also be performed by a simple peak detector, or by using maximum-likelihood sequence detection or sampled matched filtering at the receiver. Coding can be used to improve the system error rate.

Recently, many high-speed I/O designs using N-PAM for chip-to-chip communications have been documented [4-10]. Several approaches use an equalizer on the receiver side [5] (accompanied by pre-emphasis in the transmitter), while others use a more complicated self-adaptive equalizer only in the transmitter [6]. A common feature of multilevel signaling transceivers is multiplexing and demultiplexing the data at the transmitter output and receiver input. This avoids any on-chip requirement for high clocking frequencies.

3.3.1 Transmitter

Serial link transmitters fabricated in CMOS typically use multilevel signaling (2^n-PAM) and M-tap pre-emphasis filters to reduce the ISI. A common practice is to design the transmitter output driver as a k:1 multiplexer, which reduces the clock frequency to 1/k of the symbol rate and so raises the bit rate above what the process-limited on-chip clock frequency would otherwise allow. The pre-emphasis and PAM encoder circuits are usually implemented using DACs. A transmitter-equalizer architecture is shown in Figure 8 [6]. In adaptive schemes, the coefficients are updated with information from the receiver and converted to a voltage or current by the DACs.
In Figure 8, the n-th DAC creates the main symbol, the (n−i)-th and (n+i)-th DACs produce the i-th preceding and i-th trailing symbols, and the last DAC cancels any significant echo that is too far from the main symbol for the main taps to cancel. The outputs of all DACs are summed and connected to the output line.

Compared with the PECL techniques, LVDS consumes less power but requires a larger supply voltage, as there are more series transistors in this configuration. Multi-level signaling techniques achieve a higher bit rate at the price of more power consumption. Using an equalization technique [4], a good trade-off between power consumption and speed is achieved for short channels (less than 1 m). A speed of 10 Gb/s was reported with 0.4-µm technology and 1 W power consumption [7].

Figure 8: Common N-bit transmitter-equalizer architecture

3.3.2 Receiver

A main issue in receiver design is sampling the incoming data. It can be performed by adaptively setting the phase of a sampler [10]. Over-sampling is an alternative but is not feasible when the serial link operates at the maximum possible technology speed. In [9], three ADCs are activated by three equally spaced clock phases for sampling the incoming data. This is equivalent to three-times over-sampling of the channel data rate and is similar to using the multiplexer in the transmitter. After correct sampling, the ADC converts the incoming continuous-time signal to digital data. The bit resolution of the ADC depends on the number of levels in the signaling scheme. If the transmitter and the receiver have different reference voltages, then a resistor ladder may be required. This ladder may not be optimally centered and might not cover the complete input voltage range. The design in [9] solves this problem by using a calibration circuit.
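The transmitter pre-emphasis of Section 3.3.1 and the level decision of Section 3.3.2 can be tied together in a short simulation. The sketch below is illustrative only: the Gray-coded level mapping, the two-tap channel [1.0, 0.4] and the three-tap trailing pre-emphasis filter are assumptions, not values from the cited designs, and unlike Figure 8 no preceding-symbol tap or peak-swing limit is modelled.

```python
# Assumed Gray-coded 4-PAM mapping of bit pairs to levels (hypothetical).
LEVELS = {(0, 0): -3, (0, 1): -1, (1, 1): 1, (1, 0): 3}
INVERSE = {level: pair for pair, level in LEVELS.items()}

# Hypothetical channel: main cursor plus one trailing echo of 40 %.
CHANNEL = [1.0, 0.4]
# Truncated inverse of CHANNEL: 1 - 0.4 z^-1 + 0.16 z^-2.
PRE_EMPHASIS = [1.0, -0.4, 0.16]


def encode_pam4(bits):
    """Map consecutive bit pairs to 4-PAM levels."""
    return [LEVELS[pair] for pair in zip(bits[0::2], bits[1::2])]


def fir(samples, taps):
    """y[n] = sum_k taps[k] * x[n-k], with x[n] = 0 for n < 0."""
    return [
        sum(t * samples[n - k] for k, t in enumerate(taps) if n - k >= 0)
        for n in range(len(samples))
    ]


def slice_pam4(samples):
    """2-bit decision with thresholds at -2, 0 and +2, then level-to-bits."""
    bits = []
    for s in samples:
        level = -3 if s < -2 else -1 if s < 0 else 1 if s < 2 else 3
        bits.extend(INVERSE[level])
    return bits


bits = [1, 0, 0, 0, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 1, 0]
symbols = encode_pam4(bits)

# Without equalization the trailing echo pushes samples across thresholds.
rx_raw = slice_pam4(fir(symbols, CHANNEL))

# With pre-emphasis the transmitter pre-distorts the symbols so that the
# cascade of filter and channel is nearly a single impulse.
rx_equalized = slice_pam4(fir(fir(symbols, PRE_EMPHASIS), CHANNEL))

print("bit errors without pre-emphasis:", sum(a != b for a, b in zip(bits, rx_raw)))
print("bit errors with pre-emphasis:   ", sum(a != b for a, b in zip(bits, rx_equalized)))
```

With these assumed numbers the cascade of filter and channel leaves only a small residual echo three symbols after the main cursor, well inside the decision margin, so the simple three-threshold slicer recovers all bits. This is the sense in which transmitter equalization permits a simple receiver.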
4.0 COMPARISON OF CONFIGURATIONS

Both PECL configurations use a bias generator with scaled replica feedback and dynamic biasing to improve the transient edges and to maintain the proper output voltage. The core of the PECL circuit is a differential pair. In the switched-current configuration, the pair is connected directly to the output through its drains (open-drain configuration). In the switched-voltage configuration, the differential pair output is connected to the line through a source-follower stage. LVDS also uses a semi-differential configuration with a form of bridge. LVDS consumes less power than the PECL techniques.

References

[1] A. Boni, "1.2-Gb/s True PECL 100K Compatible I/O Interface in 0.35-µm CMOS," IEEE Journal of Solid-State Circuits, vol. 36, pp. 979-986, 2001.
[2] H. Djahanshahi, F. Hansen and C. A. T. Salama, "Gigabit per Second ECL Compatible I/O Interface in 0.35-micron CMOS," IEEE Journal of Solid-State Circuits, vol. 34, pp. 1074-1083, 1999.
[3] A. Boni, A. Pierazzi and D. Vecchi, "LVDS I/O Interface for Gb/s-per-pin Operation in 0.35-µm CMOS," IEEE Journal of Solid-State Circuits, vol. 36, pp. 706-711, 2001.
[4] M.-J. E. Lee, W. J. Dally and P. Chiang, "Low-power Area-efficient High-speed I/O Circuit Techniques," IEEE Journal of Solid-State Circuits, vol. 35, pp. 1591-1599, 2000.
[5] R. Farjad-Rad, C.-K. K. Yang and M. A. Horowitz, "A 0.3-µm CMOS 8-Gb/s 4-PAM serial link transceiver," IEEE Journal of Solid-State Circuits, vol. 35, pp. 757-764, 2000.
[6] J. T. Stonick, W. Gu-Yeon, J. L. Sonntag and D. K. Weinlader, "An Adaptive PAM-4 5 Gb/s Backplane Transceiver in 0.25 µm CMOS," IEEE Journal of Solid-State Circuits, vol. 38, pp. 436-443, 2003.
[7] R. Farjad-Rad, C.-K. K. Yang, M. A. Horowitz and T. H. Lee, "A 0.4 µm CMOS 10 Gb/s 4-PAM Pre-Emphasis Serial Link Transmitter," IEEE Journal of Solid-State Circuits, vol. 34, pp. 436-443, 1999.
[8] W. J. Dally and J. Poulton, "Transmitter equalization for 4-Gbps signaling," IEEE Micro, vol. 17, pp. 48-56, 1997.
[9] J. L. Zerbe, P. S. Chau, C. W.
Werner, T. P. Thrush, H. J. Liaw, B. W. Garlepp and K. S. Donnelly, "1.6 Gb/s/pin 4-PAM signaling and circuits for a multidrop bus," IEEE Journal of Solid-State Circuits, vol. 36, pp. 752-760, 2001.
[10] D. J. Foley and M. P. Flynn, "A low-power 8-PAM serial transceiver in 0.5-µm digital CMOS," IEEE Journal of Solid-State Circuits, vol. 37, pp. 310-316, 2002.
[11] Lei Lin, Peter Noel and Tad Kwasniewski, "Implementing a Digitally Synthesized Adaptive Pre-emphasis Algorithm for use in a High-speed Backplane Interconnection," Canadian Conference on Computer and Electrical Engineering, May 2004.