AN OVERVIEW OF HIGH-SPEED SERIAL I/O
TRENDS, TECHNIQUES AND STANDARDS
Farhad Zarkeshvari, Peter Noel, Sergei Uhanov and Tad Kwasniewski
Department of Electronics, Carleton University
<fzarkes, pnoel, suhanov, tak> @doe.carleton.ca
Abstract
The goal of this paper is to provide the reader with
a brief overview of the basic building blocks within a
high-speed serial transceiver, to provide an outline of
the major interconnect standards utilizing high-speed
serial I/O circuitry and to give the basics behind
the design techniques required to successfully design
and implement a typical multi-GHz serial I/O device.
Several major design obstacles will be presented,
followed by a discussion of the potential design
techniques that may be used to overcome such
implementation issues. The paper will cover two main
design approaches: low swing differential signaling
and multilevel signaling.
Keywords: High-Speed Data, Serial I/O, RapidIO,
InfiniBand, HyperTransport.
1. INTRODUCTION
High-speed data transport and device integration
are two main requirements of network development
and installation. A network can be as small as the
motherboard of a microprocessor system or as large
as the most complex telecommunication
network. All such applications require reliable high-speed serial interconnection.
The increasing demand for more bandwidth to
support inter-device communication is driving the
development of high-speed serial transceivers. A
typical networking installation utilizing 10 Gigabit
Ethernet, for example, must incorporate multi-GHz
serial I/O. The use of such interconnect speeds has
become so commonplace in intellectual property (IP)
development that the leading FPGA suppliers provide
such cores as standard offerings on the higher
performance programmable devices. This is not to say
that the design struggle to implement a more reliable,
yet even faster interconnect, has been solved. To the
contrary, the device community is viewing such
offerings as an indication that even higher serial data
rates must be imminent. Thus, the design challenge
continues.
2. HIGH-SPEED I/O TRENDS
To understand the trends of high-speed I/O, the
designer must appreciate the current requirements and
limitations of existing telecommunications and
networking infrastructures. The shared multi-drop bus
has been exploited to its full potential. Many
techniques have been applied, such as increasing
frequency, widening the interface, pipelining
transactions, splitting transactions, and allowing out of
order completion. Continuing to work with a bus in
this manner creates several design issues. Increasing
bus width, for example, reduces the maximum
achievable frequency due to skew between signals.
More signals also result in more pins on a device,
more traces on boards and larger connectors, causing a
higher product cost and a reduction in the number of
interfaces a system or device can provide. Worsening
the situation is the desire to provide point-to-multipoint interconnections. As frequency and width
increase, the ability to have more than a few devices
attached to a shared bus becomes a difficult design
challenge.
To solve the connection problem, major technology
companies and standardization bodies such as the Optical
Internetworking Forum (OIF) and the Network Processing
Forum (NPF) have developed strategies,
implemented in standards like RapidIO,
HyperTransport, InfiniBand and PCI Express, that
utilize three approaches: 1) packet switching, 2) point-to-point
unidirectional connections and 3) minimal pin
count.
With the packet switching approach, a shared bus
is replaced with a switch fabric that controls the flow
of the packetized data between devices. One data link
may carry the traffic between many devices
concurrently, with each packet being delivered to the
appropriate destination (just like in the Internet world
with its IP packets and routing switches). For example,
with the HyperTransport protocol, commands, addresses
and data are contained in all packets. The data link of
such interconnects provides flow control and error
management to ensure high reliability and exhibits
very low latency due to the simplicity of the protocol.
CCECE 2004 - CCGEI 2004, Niagara Falls, May/mai 2004
0-7803-8253-6/04/$17.00 ©2004 IEEE
Authorized licensed use limited to: Carleton University. Downloaded on July 13, 2009 at 14:27 from IEEE Xplore. Restrictions apply.
The second approach utilizes point-to-point serial
unidirectional connections. This dramatically reduces
the number of connections, the pin count and the cost.
These connections, relieved from the skew constraints,
can run at high speeds through serial interfaces, further
increasing the bandwidth of the system. To achieve
very high data rates, I/O interfaces use low-swing
differential signaling (such as LVDS) with on-die differential
termination.
The third fundamental trend concentrates on
minimizing pin count. This allows smaller packages,
reduced power consumption and better thermal
characteristics. At first glance, differential signaling
would seem to increase pin count, as it requires two
pins per bit of width and separate upstream and
downstream data paths. However, the increase in
signal pins is offset by two factors: a) usage of
separate data paths permits operation at higher
frequencies; and b) differential signaling provides a
return current path for each signal, thereby reducing
the number of power and ground pins.
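The pin-count argument above can be made concrete with a small back-of-the-envelope sketch. All numbers here are hypothetical and chosen only for illustration: a 64-bit single-ended bus at 133 MHz with one assumed ground-return pin per four signals, compared with differential serial lanes at 2.5 Gb/s that need no dedicated return pins. The function names are likewise illustrative.

```python
import math

def parallel_bus_pins(width_bits, signals_per_ground=4):
    """Signal pins plus assumed ground-return pins for a single-ended bus."""
    return width_bits + math.ceil(width_bits / signals_per_ground)

def lanes_needed(target_gbps, lane_rate_gbps):
    """Number of serial lanes required to carry the target bandwidth."""
    return math.ceil(target_gbps / lane_rate_gbps)

def serial_link_pins(lanes):
    """Each lane has one transmit pair and one receive pair (4 pins)."""
    return 4 * lanes

# Hypothetical comparison: 64-bit bus at 133 MHz vs. 2.5 Gb/s serial lanes.
bus_bandwidth_gbps = 64 * 0.133
bus_pins = parallel_bus_pins(64)
serial_pins = serial_link_pins(lanes_needed(bus_bandwidth_gbps, 2.5))
print(bus_pins, serial_pins)
```

Under these assumptions the serial interface needs far fewer pins even though every signal uses two wires, because each pair runs at a much higher rate and carries its own return current.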
An emerging trend is that of built-in scalability of
the interfaces: of both frequency and data width. This
can guarantee a longer life of the particular I/O
standard in the fast changing standards world.
3. TRANSCEIVER TECHNOLOGY
Conventional I/O methods use full-swing unterminated signaling and massive parallelism and have
the disadvantage of increased costs of packaging and
PCB manufacturing. Frequency-dependent distortion
arising from skin effect, dielectric losses and
reflections has become increasingly problematic at
high data rates. The major performance-limiting factor
for digital systems is the interconnection bandwidth
between chips, boards and cabinets.
Full-swing CMOS must ring-up the line, and is
bandwidth limited by the length of the line rather than
the performance of the semiconductor technology. As
VLSI technology scales, the pin bandwidth does not
scale accordingly, but remains limited by board and
cable geometry, making off-chip bandwidth an even
more critical bottleneck. There are two main
approaches to overcoming these problems and for
producing reliable high-speed interconnects. One is
using low swing differential signaling and the other is
using multilevel signaling.
3.1 Differential Signaling
The noise margin on digital chip-to-chip
interconnects has been decreasing for two main
reasons. First, supply voltages in digital CMOS processes
are decreasing, reducing the voltage available for
driving I/Os. Second, small signal swings are being used to
reduce dynamic power dissipation on high-speed
buses. Fully differential signals effectively reject
common-mode noise and even-order distortion terms.
Since common-mode noise is prevalent on matched
PCB traces, differential signaling is effective for both
voltage and current-mode interfaces.
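The common-mode rejection described above can be illustrated with a minimal behavioural sketch. It is not a circuit model: the 0.35 V swing, the 1.25 V common-mode level and the noise amplitude are assumed values, and the noise is idealized as perfectly identical on both wires. The function names are hypothetical.

```python
import random

def differential_receive(bits, swing=0.35, common_mode=1.25,
                         noise_amplitude=0.5, seed=0):
    """Decode bits from the difference of two wires sharing the same noise."""
    rng = random.Random(seed)
    decoded = []
    for bit in bits:
        half = swing / 2 if bit else -swing / 2
        noise = rng.uniform(-noise_amplitude, noise_amplitude)
        v_p = common_mode + half + noise   # true wire
        v_n = common_mode - half + noise   # complement wire, same noise
        decoded.append(1 if v_p - v_n > 0 else 0)
    return decoded

def single_ended_receive(bits, swing=0.35, common_mode=1.25,
                         noise_amplitude=0.5, seed=0):
    """Decode bits from one wire against a fixed threshold."""
    rng = random.Random(seed)
    decoded = []
    for bit in bits:
        half = swing / 2 if bit else -swing / 2
        noise = rng.uniform(-noise_amplitude, noise_amplitude)
        decoded.append(1 if common_mode + half + noise > common_mode else 0)
    return decoded

data_rng = random.Random(42)
bits = [data_rng.randint(0, 1) for _ in range(1000)]
diff_errors = sum(a != b for a, b in zip(bits, differential_receive(bits)))
se_errors = sum(a != b for a, b in zip(bits, single_ended_receive(bits)))
print(diff_errors, se_errors)
```

Because the noise term cancels in the subtraction, the differential receiver recovers every bit even when the noise exceeds the signal swing, whereas the single-ended receiver does not.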
The differential signaling schemes used for Gb/s
interconnect are emitter-coupled logic (ECL), pseudo
emitter-coupled logic (PECL) and low-voltage
differential signaling (LVDS).
3.1.1 ECL and PECL
The benefits of cost reduction and increased
density make it desirable to implement I/O cells in
low-voltage digital CMOS technology, thus avoiding
any additional masks for bipolar or 5-V devices.
Emitter-coupled logic is a low-swing standard that has
been the dominant technique in the implementation of
high-speed digital systems and is normally associated
with bipolar technology. With advances in CMOS
technology, sub-micrometer CMOS circuits are
becoming competitive with high-speed ECL. Low
swing signaling in CMOS features increased speed
performance, lower power consumption and higher
integration density.
Positive-biased ECL signals, also known as
pseudo-ECL (PECL), differ from true ECL in that
PECL uses positive voltages instead of negative
voltages. PECL transmitters may be implemented as
switched-current or switched-voltage drivers as shown
in Figure 1, in order to provide standard PECL levels
on the external termination resistor.
Figure 1: Implementation of a line driver as (a) a
switched-voltage or (b) switched-current source [1]
Although the switched-current architecture was the
preferred approach, it supports only one termination
scheme. Switched-voltage architectures are more
flexible and allow several termination schemes, if the
output resistance is sufficiently low. The constraints
posed on the output resistance lead to large transistor
sizes and additional power consumption due to
multiple buffer stages. This may be a disadvantage if
the transmitter has to be embedded in an ASIC with a
large number of outputs.
Figure 2: Termination scheme for single-ended
ECL drivers [1]: (a) Canonical (parallel) (b)
Thevenin (c) Series
An overview of typical termination schemes used
in ECL or PECL circuits is shown in Figure 2. Besides
the canonical (parallel) termination scheme of Figure
2(a), a Thevenin termination, as in Figure 2(b), can be
used at the load end with the advantage that the
additional supply is avoided. With an appropriate
choice of R1 and R2, the same loading characteristic as
for the parallel termination is obtained but at the
expense of higher power consumption. A series
termination at the source end as in Figure 2(c) is an
attractive alternative because of the lower power
consumption. It provides better suppression of
reflected waves caused by line-to-line crosstalk. Both
the Thevenin and series terminations increase the
sensitivity of the output levels to the supply voltage.
3.1.2 Switched-voltage Configuration
A block diagram of a switched-voltage
configuration transmitter with a dc biasing block and
an external termination resistor is shown in Figure 3.
The required voltage references are obtained by means
of a scaled replica circuit and several feedback loops.
The loading effect of the external termination resistor
RT is replicated by the internal resistors Rm connected
toward an internally developed voltage reference equal
to VDD - VT, with VT = 2 V.
Figure 3: A PECL transmitter [1]
The schematic of a PECL driver is shown in Figure
4. This circuit uses the technique of dynamic biasing
to minimize the transition time of the output voltage
without increasing output stage power consumption.
M1 and M2 make a differential pair. The dynamic
biasing is forced by M5 and M8. Two different values
of bias current are used in the falling and rising
transitions so that both are optimized for minimum
power consumption. Dynamic biasing has the
advantage of maintaining the PMOS device bias
current at a fairly constant level between logic state
transitions. This leads to a constant output resistance,
which is useful for a series termination. The reduced
current through the PMOS device leads to a lower size
ratio and therefore lower area and parasitics.
Figure 4: A switched-voltage PECL transmitter [1]
3.1.3 Switched-current Configuration
The block diagram of a switched-current
configuration, introduced in [2], is shown in Figure 5.
An open-drain configuration is used that suffers from a
slow rise time. Two active pull-up circuits and a pulse-biasing scheme are used to improve edge transition
rates. The circuit diagram is shown at the bottom of
Figure 5. M1a and M1b make up the differential pair
and M2 provides the dc bias. M3 with the overlap
generator and edge detector provide a pulse bias
current in each transient. This improves circuit rise
and fall time.
Figure 5: Switched Current Configuration [2]
3.1.4 LVDS
Low Voltage Differential Signaling (LVDS) is a
low noise, low power, and low amplitude method for
high-speed (Gb/s) data transmission over copper wire.
LVDS achieves significant power savings by means of
a differential scheme for transmission and termination,
in conjunction with a low voltage swing. LVDS uses a
transmitter configured as a switched-polarity current
generator. Figure 6 shows different termination
topologies. Figure 6(a) shows a differential load
resistor at the receiver that provides current-to-voltage
conversion and optimum line matching. For operation
in the Gb/s range, an additional termination resistor is
usually placed at the source end as seen in Figure 6(b).
This serves to suppress reflected waves caused by
crosstalk or by imperfect termination, due to package
parasitics and component tolerance. Differential
transmission greatly improves the robustness of the
link to common-mode voltage bouncing (if using a
cable as the medium) and crosstalk, thereby improving
the tolerance to a reduced noise margin.
LVDS uses a lower voltage swing that further
reduces crosstalk and radiated electro-magnetic
interference (EMI). LVDS requires less power than
either differential or single-ended PECL. PECL
exhibits an open-emitter output stage and requires a
resistor to VDD - 2 V at the receiver for line
termination and biasing or a pull-down resistor toward
ground. Whichever termination is used, the open-emitter
configuration and the larger voltage swing lead
to higher power consumption in a PECL link.
The circuit diagram of an LVDS transmitter is
shown in Figure 7. Either M1 and M3 or M2 and M4
are turned on, resulting in a different output voltage
polarity depending on the active combination. In
order to achieve higher precision and lower circuit
complexity, a simple low-power common-mode
feedback control is implemented in the transmitter.
The common-mode output voltage is sensed by means
of a highly resistive divider RA-RB and compared with a
1.25-V reference by the differential amplifier M5-M8.
The fraction of the tail current IT flowing across M7
and M8 is mirrored by MU and ML, respectively, thus
forcing VCM = 1.25 V.
Figure 6: Different solutions for high-speed data
(a) LVDS with termination at the receiver end. (b)
LVDS link with termination at the receiver. (c)
Single-ended and (d) differential PECL links. [3]
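As a rough numerical illustration of the current-mode LVDS scheme described in this section, the sketch below computes the differential swing developed across the termination and the power dissipated in it. The 3.5 mA drive current and 100 ohm resistors are assumed, commonly quoted LVDS values and are not figures taken from this paper; the function names are hypothetical.

```python
def parallel(r1, r2):
    """Equivalent resistance of two resistors in parallel."""
    return r1 * r2 / (r1 + r2)

def lvds_swing(drive_current_a, receiver_term_ohm, source_term_ohm=None):
    """Return (differential swing in V, power dissipated in the termination in W).

    With a source-end resistor the switched current sees both
    terminations in parallel (dc approximation).
    """
    if source_term_ohm is None:
        load = receiver_term_ohm
    else:
        load = parallel(receiver_term_ohm, source_term_ohm)
    swing_v = drive_current_a * load
    return swing_v, drive_current_a * swing_v

# Receiver-only termination (assumed 3.5 mA into 100 ohm).
single_swing, single_power = lvds_swing(3.5e-3, 100.0)
# Added source-end termination: same current, half the swing.
double_swing, double_power = lvds_swing(3.5e-3, 100.0, 100.0)
print(single_swing, single_power, double_swing, double_power)
```

Under these assumptions the receiver-terminated link develops about 0.35 V while dissipating roughly 1.2 mW in the load. Adding the source-end resistor halves the swing for the same current, so a doubly terminated driver would need about twice the current to keep the same swing.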
3.2 Non-differential High-speed Circuit Techniques
Despite the use of matched terminations and
carefully controlled line and connector impedance, the
better differential (current-mode) signaling methods
are still limited to a data rate of about 1.6 GHz, due to
the frequency-dependent attenuation of copper lines.
Skin-effect resistance increases the attenuation of a
conventional transmission line with frequency. For a
broadband signal the superposition of unattenuated
low-frequency signal components with attenuated
high-frequency signal components causes Inter-Symbol Interference (ISI). This interference degrades
noise margins and reduces the maximum frequency at
which the system can operate. The main problem here
is not the magnitude of the attenuation, but rather the
interference caused by the frequency-dependent nature
of the attenuation. Equalization eliminates the problem
of frequency-dependent attenuation by filtering the
transmitted or received waveform so the concatenation
of the equalizing filter and the transmission line gives
a flat frequency response. Equalization can be
performed digitally by a discrete-time finite impulse
response (FIR) filter or in the analog domain by a
continuous-time passive or active filter.
Figure 7: LVDS Transmitter Circuit [3]
Equalizing with a FIR filter is most popular. It is
more easily realized in a standard CMOS process and
it is easier to make adaptive. Equalizing using a FIR
filter requires either an analog-to-digital converter
with at least a few bits of resolution or a high-speed
analog delay line, both difficult to design. It is more
common and much simpler to equalize at the
transmitter than at the receiver. Equalizing at the
transmitter permits the use of a simple receiver that
just samples a binary value.
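A minimal sketch of transmitter-side FIR equalization follows. It assumes a deliberately simple channel, a symbol-spaced one-pole low-pass filter with a hypothetical coefficient alpha, for which a two-tap FIR filter is the exact inverse. Real copper channels are more complex and need more taps, so this only illustrates the idea that the cascade of equalizer and line has a flat response.

```python
def fir_filter(symbols, taps):
    """Symbol-spaced FIR filter: out[n] = sum_k taps[k] * symbols[n - k]."""
    out = []
    for n in range(len(symbols)):
        acc = 0.0
        for k, tap in enumerate(taps):
            if n - k >= 0:
                acc += tap * symbols[n - k]
        out.append(acc)
    return out

def one_pole_channel(samples, alpha):
    """Assumed lossy line model: y[n] = (1 - alpha) * x[n] + alpha * y[n - 1]."""
    out, prev = [], 0.0
    for x in samples:
        prev = (1.0 - alpha) * x + alpha * prev
        out.append(prev)
    return out

def preemphasis_taps(alpha):
    """Two-tap inverse of the one-pole channel above."""
    return [1.0 / (1.0 - alpha), -alpha / (1.0 - alpha)]

symbols = [1.0, -1.0, -1.0, 1.0, 1.0, 1.0, -1.0, 1.0]
alpha = 0.6  # hypothetical channel coefficient
raw = one_pole_channel(symbols, alpha)
equalized = one_pole_channel(fir_filter(symbols, preemphasis_taps(alpha)), alpha)
print([round(v, 3) for v in raw])
print([round(v, 3) for v in equalized])
```

Without the filter the received levels depend on the preceding symbols, which is the ISI; with it the receiver sees the original binary levels and can simply sample them. A practical driver has a limited output swing, so the taps would normally be scaled down so that the low-frequency content is attenuated rather than the high-frequency content boosted.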
In an adaptive equalizer, the coefficients are
initially calculated in the receiver during a training
sequence. The calculated coefficients are fed back to
the transmitter. As the FIR coefficients are updated at
the beginning of communication, the equalizer can
adapt to the characteristics of the line and may be used
for a wide variety of interconnections.
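The training step can be sketched as follows. The paper does not state which adaptation algorithm is used; the least-mean-squares (LMS) update below is an assumption chosen for its simplicity, and the one-pole channel, step size and sequence length are hypothetical. Because the channel and the FIR filter are both linear, taps trained on the received signal can be handed back to the transmitter and applied there as pre-emphasis.

```python
import random

def one_pole_channel(samples, alpha):
    """Assumed lossy line model: y[n] = (1 - alpha) * x[n] + alpha * y[n - 1]."""
    out, prev = [], 0.0
    for x in samples:
        prev = (1.0 - alpha) * x + alpha * prev
        out.append(prev)
    return out

def lms_train(received, training, num_taps=2, step=0.5):
    """Adapt FIR taps so the filtered received signal matches the training data."""
    taps = [0.0] * num_taps
    history = [0.0] * num_taps  # most recent received sample first
    for sample, desired in zip(received, training):
        history = [sample] + history[:-1]
        estimate = sum(w * x for w, x in zip(taps, history))
        error = desired - estimate
        taps = [w + step * error * x for w, x in zip(taps, history)]
    return taps

rng = random.Random(1)
training = [rng.choice((-1.0, 1.0)) for _ in range(2000)]  # known sequence
received = one_pole_channel(training, alpha=0.6)
taps = lms_train(received, training)
print([round(w, 3) for w in taps])
```

For this noise-free toy channel the taps should approach the exact inverse, 1/(1 - alpha) and -alpha/(1 - alpha). On a real link the result would depend on noise, the number of taps and the step size.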
Adding more taps to the filter could widen the
bandwidth. The number of filter taps chosen is a
compromise between bandwidth and equalization cost.
Two configurations describing this technique are in [6]
and [11].
3.3 Multi-level Signaling
Multi-level voltage coding is used to lower the
baud rate or the frequency content of the signal in
order to reduce the ISI and to improve the Bit Error
Rate (BER). It uses lower fundamental frequencies
than does binary signaling at the same data rate,
thereby offering the potential of higher performance in
limited bandwidth systems.
In a non-return-to-zero (NRZ) N-PAM
communication system, the spectral efficiency is
2 x log2(N) bits/s/Hz, which increases logarithmically with the
number of PAM levels (N). All techniques mentioned
previously are applied to multilevel signaling,
especially when equalizing in the transmitter. Optimal
detection can also be performed by a simple peak
detector or by using maximum-likelihood sequence
detection or sampled matched filtering at the receiver.
Coding can be used to improve the system error rate.
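The trade-off behind N-PAM can be put in numbers with a short sketch. The 10 Gb/s bit rate and 1.2 V swing used below are hypothetical. The level-spacing function reflects the fact that, for a fixed peak-to-peak swing, N levels leave only 1/(N - 1) of the swing between adjacent levels.

```python
import math

def bits_per_symbol(levels):
    """Number of bits carried by one N-PAM symbol."""
    return math.log2(levels)

def symbol_rate(bit_rate, levels):
    """Baud rate needed to carry bit_rate with N-PAM."""
    return bit_rate / bits_per_symbol(levels)

def spectral_efficiency(levels):
    """Spectral efficiency in bits/s/Hz: 2 * log2(N)."""
    return 2 * bits_per_symbol(levels)

def level_spacing(peak_to_peak_swing, levels):
    """Voltage between adjacent levels for a fixed total swing."""
    return peak_to_peak_swing / (levels - 1)

for n in (2, 4, 8):
    print(n, symbol_rate(10e9, n), spectral_efficiency(n), level_spacing(1.2, n))
```

Moving from binary to 4-PAM halves the symbol rate and doubles the spectral efficiency, but leaves one third of the swing between adjacent levels, which is why multilevel links tend to need larger swings or better noise control.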
Recently, many high-speed I/O designs using N-PAM for chip-to-chip communications have been
documented [4-10]. Several approaches describe using
an equalizer in the receiver side [5] (accompanied with
pre-emphasis in transmitter) while others indicate the
use of a more complicated self-adaptive equalizer only
in the transmitter [6].
A common feature of multilevel signaling
transceivers includes multiplexing and demultiplexing
the data at the transmitter output and receiver input.
This avoids any on-chip requirement for high clocking
frequencies.
3.3.1 Transmitter
Serial link transmitters fabricated in CMOS
typically use multilevel signaling (2^n-PAM) and M-tap
pre-emphasis filters to reduce the ISI. A common
practice is to design the transmitter output driver as a
k:1 multiplexer to reduce the clock frequency to 1/k of
the symbol rate and to increase the bit rate beyond a
process-limited on-chip frequency. The pre-emphasis
and PAM encoder circuits are usually implemented
using DACs. A transmitter-equalizer architecture is
shown in Figure 8 [6]. In adaptive schemes, the
coefficients are updated by information from the
receiver and changed to a voltage/current by the
DACs. In this diagram, the n-th DAC creates the main
symbol, the (n-i)-th and (n+i)-th DACs produce the i-th
preceding and i-th trailing symbols, and the last DAC
cancels any significant echo that is too far from the
main symbol for the main taps to cancel. The outputs
of all DACs are summed and connected to the output
line.
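The encoder and the summed DAC outputs can be sketched behaviourally. The Gray-coded 4-PAM mapping and the tap weights below are assumptions made for illustration and are not taken from [6]. Each entry in the tap table stands for one DAC whose output is added onto the line.

```python
# Assumed Gray-coded mapping of bit pairs to 4-PAM levels.
PAM4_LEVELS = {(0, 0): -3, (0, 1): -1, (1, 1): 1, (1, 0): 3}

def pam4_encode(bits):
    """Group bits in pairs and map each pair to one of four levels."""
    if len(bits) % 2:
        raise ValueError("4-PAM needs an even number of bits")
    return [PAM4_LEVELS[(bits[i], bits[i + 1])] for i in range(0, len(bits), 2)]

def driver_output(symbols, taps):
    """Sum the DAC contributions for each symbol slot.

    taps maps a symbol offset i to a weight: offset 0 is the main symbol,
    and the weight at offset i multiplies symbol n + i. Symbols outside
    the sequence are treated as zero.
    """
    out = []
    for n in range(len(symbols)):
        total = 0.0
        for offset, weight in taps.items():
            index = n + offset
            if 0 <= index < len(symbols):
                total += weight * symbols[index]
        out.append(total)
    return out

symbols = pam4_encode([0, 0, 0, 1, 1, 1, 1, 0])
line = driver_output(symbols, {-1: -0.1, 0: 0.8, 1: -0.1})  # hypothetical weights
print(symbols, [round(v, 2) for v in line])
```

In an adaptive scheme the weights in the tap table would be the values fed back from the receiver rather than constants.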
Figure 8: Common N-bit transmitter-equalizer
architecture
3.3.2 Receiver
A main issue in receiver design is in sampling the
incoming data. It can be performed by adaptively
setting the phase of a sampler [10]. Over-sampling is
an alternative but is not feasible as the serial link
operates at the maximum possible technology speed.
In [9], three ADCs are activated by three equally
spaced clock phases for sampling the incoming data.
This is equivalent to three times over-sampling of the
channel data rate and is similar to using the
multiplexer in the transmitter. After correct sampling,
the ADC converts the incoming continuous-time
signal to digital data. The bit resolution of the ADC is
dependent on the number of levels in the signaling
scheme. If the transmitter and the receiver have
different reference voltages then a resistor ladder may
be required. This ladder may not be optimally centered
and might not cover the complete input voltage range.
The design in [9] solves this problem by using a calibration circuit.
4. COMPARISON OF CONFIGURATIONS
Both PECL configurations use a bias generator
with scaled replica feedback and dynamic biasing to
improve the transient edges and to maintain the proper
output voltage. The core of the PECL circuit is a
differential pair that is used in the switched-current
configuration directly connected to the output through
its drain (open-drain configuration). In the switched-voltage
configuration, the differential pair output is
connected to the line through a source follower stage.
LVDS also uses a semi-differential configuration
with a form of bridge. LVDS consumes less power
than the PECL techniques but requires a larger power
supply, as there are more series transistors in this
configuration.
Multi-level signaling techniques achieve a higher
bit rate at the price of more power consumption.
Using an equalization technique [4], a good trade-off
between power consumption and speed is achieved for
short channels (less than 1 m). A 10 Gb/s speed was
reported with 0.4 µm technology and 1 W power
consumption [7].
References
[1] A. Boni, "1.2-Gb/s True PECL 100K Compatible I/O Interface in 0.35-µm CMOS", IEEE Journal of Solid-State Circuits, vol. 36, pp. 979-986, 2001.
[2] H. Djahanshahi, F. Hansen and C.A.T. Salama, "Gigabit per Second ECL Compatible I/O Interface in 0.35-micron CMOS", IEEE Journal of Solid-State Circuits, vol. 34, pp. 1074-1083, 1999.
[3] A. Boni, A. Pierazzi and D. Vecchi, "LVDS I/O Interface for Gb/s-per-pin Operation in 0.35-µm CMOS", IEEE Journal of Solid-State Circuits, vol. 36, pp. 706-711, 2001.
[4] M.-J. E. Lee, W. J. Dally and P. Chiang, "Low-power Area-efficient High-speed I/O Circuit Techniques", IEEE Journal of Solid-State Circuits, vol. 35, pp. 1591-1599, 2000.
[5] R. Farjad-Rad, C.-K. K. Yang and M. A. Horowitz, "A 0.3-µm CMOS 8-Gb/s 4-PAM serial link transceiver", IEEE Journal of Solid-State Circuits, vol. 35, pp. 757-764, 2000.
[6] J. T. Stonick, W. Gu-Yeon, J. L. Sonntag and D. K. Weinlader, "An Adaptive PAM-4 5 Gb/s Backplane Transceiver in 0.25 µm CMOS", IEEE Journal of Solid-State Circuits, vol. 38, pp. 436-443, 2003.
[7] R. Farjad-Rad, C.-K. K. Yang, M. A. Horowitz and T. H. Lee, "A 0.4 µm CMOS 10 Gb/s 4-PAM Pre-Emphasis Serial Link Transmitter", IEEE Journal of Solid-State Circuits, vol. 34, pp. 436-443, 1999.
[8] W. J. Dally and J. Poulton, "Transmitter equalization for 4-Gbps signaling", IEEE Micro, vol. 17, pp. 48-56, 1997.
[9] J. L. Zerbe, P. S. Chau, C. W. Werner, T. P. Thrush, H. J. Liaw, B. W. Garlepp and K. S. Donnelly, "1.6 Gb/s/pin 4-PAM signaling and circuits for a multidrop bus", IEEE Journal of Solid-State Circuits, vol. 36, pp. 752-760, 2001.
[10] D. J. Foley and M. P. Flynn, "A low-power 8-PAM serial transceiver in 0.5-µm digital CMOS", IEEE Journal of Solid-State Circuits, vol. 37, pp. 310-316, 2002.
[11] Lei Lin, Peter Noel and Tad Kwasniewski, "Implementing a Digitally Synthesized Adaptive Pre-emphasis Algorithm for use in a High-speed Backplane Interconnection", Canadian Conference on Computer and Electrical Engineering, May 2004.