Low-Power High-Speed Links Gu-Yeon Wei Division of Engineering and Applied Sciences Harvard University 6.976 Guest Lecture, Spring 2003 Outline • • • • • Wei Motivation Brief Overview of High-Speed Links Design Considerations for Low Power DVS Link Design Example Summary Low-Power High-Speed Links 2 Motivation • • Demand for high bandwidth communications Advancements in IC fabrication technology – – Æ Æ • Higher performance More complex functionality Chip I/O becomes performance bottleneck Increasing power consumption Network router example network card switch card digital crosspoint transceivers backplane PCB • Wei 10’s to 100’s of links on a single crosspoint switch Low-Power High-Speed Links 50-Ω traces 3 High-Speed Links Overview • High-speed data communication between chips across an impedance controlled channel – Shared communication bus (memories) – Point-to-point links • Types of link architectures and implementations – – – – • Parallel vs. serial Differential vs. single-ended Low-impedance vs. high-impedance driver Transmitter-only vs. receiver-only vs. double termination We will focus on point-to-point serial links using differential highimpedance drivers with double termination for network routers – Techniques to reduce power applicable to other link types Wei Low-Power High-Speed Links 4 Link Components 10010 10010 TX RX channel data in data out • High-speed links consist of 4 main components timing recovery – – – – Wei Serializing transmitter driver Communication channel De-serializing receiver samplers Timing recovery Low-Power High-Speed Links 5 Performance Limitations • • Important to look at performance because higher performance can lead to lower power by trading off performance for energy reduction Several factors limit the performance of high-speed links – Non-ideal channel characteristics – Bandwidth limits of transceiver circuitry – Noise from power supply, cross talk, clock jitter, device mismatches, etc. • Eye diagrams – a qualitative measure of link performance random bit stream Tbit ideal data eye Wei realistic data eye Low-Power High-Speed Links 6 Channel Impairments and ISI • One of the dominant causes of eye closure is inter-symbol interference (ISI) due to channel bandwidth limitations – Two ways to view channel impairments Wei Low-Power High-Speed Links 7 Equalization • Placing a high-pass filter in the signal path can counter the rolloff effects of the channel – Preemphasis or transmit-side equalization is commonly used Wei Low-Power High-Speed Links 8 Critical Path in Links • The critical path in links can be as short as 1~2 gate delays – Φ Φ Transmit path Rterm – D Q D Q Zo=50Ω Receive path Rterm Deven Zo=50Ω D Q D Q Φ Dodd Vref Φ Wei Low-Power High-Speed Links 9 Clock Frequency Limit • While a symbol time can be short, there is a limit to the maximum on-chip clock frequency – Must distribute a clock driven by a buffer chain CLKIN • Wei CLKOUT Overcome this limitation with parallelism Low-Power High-Speed Links 10 Parallelism • Parallelism can increase bit rate even with limited clock frequency – Time-interleaved multiplexing – Multi-level signaling clk[n] 10 clk[n+1] 11 01 00 data • Some performance issues to be wary of – Static timing offsets in multi-phase clock generator (DLL or PLL) – Requires higher voltage dynamic range in transmitter and receiver • Wei Parallelism can also be low power Low-Power High-Speed Links 11 Sources of Noise • Power supply noise – Translates into voltage and timing uncertainty • Cross talk – Near- and Far-End Cross Talk (NEXT and FEXT) – High-frequency coupling • Clock Jitter – Timing uncertainty in transmitted and sampled symbol – Probabilistic distribution of timing edges (bounded and unbounded components) • Device mismatches and systematic offsets – Deterministic or systematic variation in timing edges from multi-phase clock generators Wei Low-Power High-Speed Links 12 Considerations for Low Power • Low noise Æ low power – Target some signal to noise ratio (SNR) – Reducing noise allows for lower signal power • Trade speed for lower power – Reducing bit rate improves SNR – Many noise sources are fixed Æ ratio of timing uncertainty to bit time improves (have longer bit times) • Let’s look at a few design choices for low power – Circuit level – Architecture level – System level Wei Low-Power High-Speed Links 13 Offset Calibration • Two sets of offsets that can manifest itself as voltage and timing uncertainty to close the eye and may require higher power to overcome them: – Multi-phase clock generator timing offsets – Receiver input voltage offsets ideal Tx timing higher margins zero Rx offset lower margins non-zero Rx offset w/ offsets sampling edge • Static offsets due to systematic (layout) and random (device) mismatches – Calibration enables more timing and voltage margins (i.e., lower noise) Wei Low-Power High-Speed Links 14 Differential Signaling • Differential communication can lead to a lower power solution – – • Immunity to common mode noise Injects less noise into the supplies But aren’t there now are two channels that switch? Yes, but… – – Signal amplitudes can be smaller on both channels Alternative is pseudo-differential signaling but needs a reference voltage which can be noisy and require larger Vswing Rterm Zo=50Ω Rterm D Vswingdiff pk2pk = 2(I*R) Rterm Zo=50Ω Vswing = I*R Zo=50Ω Vref • What does it cost? – Wei Requires two pins per link Low-Power High-Speed Links 15 Signal Multiplexing • There are different options for choosing where to combine pulses to create sub-clock period symbols – Combine at the final transmitter stage vs. farther up stream Rterm Rterm Zo=50Ω high-speed path D0 D1 D2 Zo=50Ω D3 – Cload for the clocks higher when combined at the final transmitter vs. – Need faster signal path after the multiplexer • Wei Best choice depends on implementation (see Zerbe, ISSCC2003) Low-Power High-Speed Links 16 Multiplexing Parallelized Transmitter Parallelized Receiver TX RX TX channel TX bitrate = M.fclk RX RX Multiphase Clocks Wei Low-Power High-Speed Links 17 Multiplexing = Low Power? With M:1 multiplexing, fCLK = bit rate/M • Power = M·CV2·f = M·CV2 ·(BR/M) = C·V2·BR • With fixed supply, power does not vary with M But wait, at lower frequencies, I can lower voltage! • With lower supply voltage (V ∝ fCLK = BR/M), • Power decreases as ∝ 1/M2 ! Wei Low-Power High-Speed Links 18 Power vs. M • Larger M Fixed Supply B Can reduce voltage B Lower power B Less accurate timing Lower Supply ∝ 1/M or fCLK – static phase offsets – jitter • • Cannot make M arbitrarily large b/c there is a lower limit to Vdd Choice: M= 4~6 choice This begs the questions… What if we make Vdd adaptive w/ fCLK? Wei Low-Power High-Speed Links 19 DVS Links • Dynamic Voltage Scaling (DVS) – Technique first introduced for digital systems (e.g., uP, DSP chips) • Lot’s of work done in both academia and industry (e.g., Intel, Transmeta) – Allows trade off between speed and power • Let’s investigate DVS for high-speed links – Motivation and potential benefits – Design example from Dr. Jaeha Kim (ISSCC2002, JSSC2002, PhD thesis 2002) Wei Low-Power High-Speed Links 20 DVS Links • Dynamic Voltage Scaling (DVS) can reduce power consumption in two ways 1) Digital circuits operate at their most energy-efficient point in the presence of PVT variations by eliminating extra performance margins Normalized Frequency 1.2 1 Pdynamic = αCswitched V 2F 88% 0 0.8 50 130 0.6 0.5 V 0.4 0.2 0 0.8 V 1 1.5 2 2.5 3 3.5 Supply Voltage (V) Wei Low-Power High-Speed Links 21 Trade Performance for Energy Savings 2) DVS enables trade off between performance and energy • Reducing frequency alone reduces power but not energy per bit Fixed Vdd = 3.3V Normalized energy / bit 1 0.8 Energy Savings 0.6 0.4 Dynamically scaled Vdd 0.2 0 Wei E ∝ C ⋅V 2 0 0.2 0.4 0.6 Normalized bit rate Low-Power High-Speed Links 0.8 1 22 DVS Link Components • DVS links require two additional components – Mechanism to measure circuit critical path to appropriately adjust voltage with respect to frequency • Use an on-chip performance monitor circuit (inverter delay elements of core DLL) – Efficient supply-voltage regulator (buck converter) • Overall Block diagram (Wei et al, ISSCC2000) FCLK VCTRL core DLL Digital Controller Buck Converter I/O Transceiver DTX RVdd DRX Wei Low-Power High-Speed Links 23 Performance Monitoring DLL CP 0 A O UP VCP DN PD VCTRL 180 1 O A clk • Reduces design complexity by enabling one to replace precision analog components with simple digital gates. How? – Inverters of the delay line model the critical path (clock distribution) – Delay of gates in I/O circuitry are fixed relative to clock period Wei Low-Power High-Speed Links 24 Adaptive Receiver Filter • • Filter signal frequencies beyond the Nyquist rate at the receiver (helps for dealing with cross talk) Receiver example IN IN Vbias Φ0 AdB Preamplifier fsymbol Φ1 f Regenerative Latch – Filter’s corner frequency tracks fsymbol Wei Low-Power High-Speed Links 25 Example: Adaptive Supply Serial Links • Jaeha Kim (Ph.D. defense 2002) Multiphase Clock Recovery 1:5 Demultiplexing Receiver fref Wei Multiphase Clock Generation Adaptive Supply, V 5:1 Multiplexing Transmitter Adaptive Power Supply Regulator Low-Power High-Speed Links 26 Multi-Phase Clock Generation • Must minimize static offsets between phases • Generate multiphase clocks locally at each pin, but watch out for power and area overhead Static Offsets Jitter TX TX TX Wei Low-Power High-Speed Links 27 Dual-Loop Clock Generation Adaptive Power Supply Regulator fref Coarse Control f Reference VCO Local Multiphase Clock Recovery Wei Adaptive Supply, V Local Multiphase Clock Generation Fine Control Fine Control Local VCO Local VCO 1:5 Demultiplexing Receiver 5:1 Multiplexing Transmitter Low-Power High-Speed Links 28 Dual-Loop Clock Generation (2) • Global loop brings the local VCO frequency close to lock • Narrow local tuning range (+/-15%) is sufficient to compensate for on-chip mismatches • Narrow tuning range leads to low VCO gain – Small loop capacitor area (2.5pF) – Low sensitivity on Vctrl noise Wei Low-Power High-Speed Links 29 Clock Recovery • Optimal receiver timing is recovered from the incoming data stream Data RX PD Clock Recovery Wei Filter Low-Power High-Speed Links VCO 30 Phase Detection • Phase detector made of an identical set of receivers minimizes timing error PD RX PD RX PD RX PD RX PD Ψ[4:0] 5 RX PD Edge Detection / Majority Voting Early/Late /None 5 Data RX Wei Received Data Low-Power High-Speed Links 31 TX data gen Testing Interface TX/RX Feedback Biasing RX- PRBS PLL RX Wei TX Digital Sliding Controller Power Transistors data gen TX TX-DLL TX TX-PLL Chip Prototype RX-DLL RX • • • • • • • • PRBS Low-Power High-Speed Links 0.25µm CMOS 2.5V / 0.55Vth 3.1×2.9mm2 0.4~5.0Gb/s 0.9~2.5V 5.6~375mW BER < 10-15 Reg. Efficiency: 83-94% 32 Power and Performance 3.1Gb/s, 113mW Wei Low-Power High-Speed Links 33 Power Breakdown Wei Low-Power High-Speed Links 34 Summary • Higher performance links require low noise Æ low noise solutions lead to lower power – Trade performance (speed) for power reduction • DVS links enable energy-efficient link operation and also have some nice properties • Outstanding issues with using DVS links – Communication during frequency and voltage transitions – Supply voltage regulator slew rate limits – Overhead of multiple regulators Wei Low-Power High-Speed Links 35