Low-Power High-Speed Links

advertisement
Low-Power High-Speed Links
Gu-Yeon Wei
Division of Engineering and Applied Sciences
Harvard University
6.976 Guest Lecture, Spring 2003
Outline
•
•
•
•
•
Wei
Motivation
Brief Overview of High-Speed Links
Design Considerations for Low Power
DVS Link Design Example
Summary
Low-Power High-Speed Links
2
Motivation
•
•
Demand for high bandwidth communications
Advancements in IC fabrication technology
–
–
Æ
Æ
•
Higher performance
More complex functionality
Chip I/O becomes performance bottleneck
Increasing power consumption
Network router example
network card
switch card
digital
crosspoint
transceivers
backplane PCB
•
Wei
10’s to 100’s of links on a single crosspoint switch
Low-Power High-Speed Links
50-Ω traces
3
High-Speed Links Overview
•
High-speed data communication between chips across an
impedance controlled channel
– Shared communication bus (memories)
– Point-to-point links
•
Types of link architectures and implementations
–
–
–
–
•
Parallel vs. serial
Differential vs. single-ended
Low-impedance vs. high-impedance driver
Transmitter-only vs. receiver-only vs. double termination
We will focus on point-to-point serial links using differential highimpedance drivers with double termination for network routers
– Techniques to reduce power applicable to other link types
Wei
Low-Power High-Speed Links
4
Link Components
10010
10010
TX
RX
channel
data in
data out
• High-speed links consist of 4 main components
timing
recovery
–
–
–
–
Wei
Serializing transmitter driver
Communication channel
De-serializing receiver samplers
Timing recovery
Low-Power High-Speed Links
5
Performance Limitations
•
•
Important to look at performance because higher performance can lead
to lower power by trading off performance for energy reduction
Several factors limit the performance of high-speed links
– Non-ideal channel characteristics
– Bandwidth limits of transceiver circuitry
– Noise from power supply, cross talk, clock jitter, device mismatches, etc.
•
Eye diagrams – a qualitative measure of link performance
random bit stream
Tbit
ideal data eye
Wei
realistic data eye
Low-Power High-Speed Links
6
Channel Impairments and ISI
•
One of the dominant causes of eye closure is inter-symbol
interference (ISI) due to channel bandwidth limitations
– Two ways to view channel impairments
Wei
Low-Power High-Speed Links
7
Equalization
•
Placing a high-pass filter in the signal path can counter the rolloff effects of the channel
– Preemphasis or transmit-side equalization is commonly used
Wei
Low-Power High-Speed Links
8
Critical Path in Links
•
The critical path in links can be as short as 1~2 gate delays
–
Φ Φ
Transmit path
Rterm
–
D
Q
D
Q
Zo=50Ω
Receive path
Rterm
Deven
Zo=50Ω
D
Q
D
Q
Φ
Dodd
Vref
Φ
Wei
Low-Power High-Speed Links
9
Clock Frequency Limit
•
While a symbol time can be short, there is a limit to the maximum
on-chip clock frequency
– Must distribute a clock driven by a buffer chain
CLKIN
•
Wei
CLKOUT
Overcome this limitation with parallelism
Low-Power High-Speed Links
10
Parallelism
•
Parallelism can increase bit rate even with limited clock
frequency
– Time-interleaved multiplexing
– Multi-level signaling
clk[n]
10
clk[n+1]
11
01
00
data
•
Some performance issues to be wary of
– Static timing offsets in multi-phase clock generator (DLL or PLL)
– Requires higher voltage dynamic range in transmitter and receiver
•
Wei
Parallelism can also be low power
Low-Power High-Speed Links
11
Sources of Noise
• Power supply noise
– Translates into voltage and timing uncertainty
• Cross talk
– Near- and Far-End Cross Talk (NEXT and FEXT)
– High-frequency coupling
• Clock Jitter
– Timing uncertainty in transmitted and sampled symbol
– Probabilistic distribution of timing edges (bounded and
unbounded components)
• Device mismatches and systematic offsets
– Deterministic or systematic variation in timing edges from
multi-phase clock generators
Wei
Low-Power High-Speed Links
12
Considerations for Low Power
• Low noise Æ low power
– Target some signal to noise ratio (SNR)
– Reducing noise allows for lower signal power
• Trade speed for lower power
– Reducing bit rate improves SNR
– Many noise sources are fixed Æ ratio of timing uncertainty to
bit time improves (have longer bit times)
• Let’s look at a few design choices for low power
– Circuit level
– Architecture level
– System level
Wei
Low-Power High-Speed Links
13
Offset Calibration
•
Two sets of offsets that can manifest itself as voltage and timing
uncertainty to close the eye and may require higher power to overcome
them:
– Multi-phase clock generator timing offsets
– Receiver input voltage offsets
ideal Tx timing
higher
margins
zero
Rx offset
lower
margins
non-zero
Rx offset
w/ offsets
sampling edge
•
Static offsets due to systematic (layout) and random (device)
mismatches
– Calibration enables more timing and voltage margins (i.e., lower noise)
Wei
Low-Power High-Speed Links
14
Differential Signaling
•
Differential communication can lead to a lower power solution
–
–
•
Immunity to common mode noise
Injects less noise into the supplies
But aren’t there now are two channels that switch? Yes, but…
–
–
Signal amplitudes can be smaller on both channels
Alternative is pseudo-differential signaling but needs a reference voltage which can be
noisy and require larger Vswing
Rterm
Zo=50Ω
Rterm
D
Vswingdiff pk2pk = 2(I*R)
Rterm
Zo=50Ω
Vswing = I*R
Zo=50Ω
Vref
•
What does it cost?
–
Wei
Requires two pins per link
Low-Power High-Speed Links
15
Signal Multiplexing
•
There are different options for choosing where to combine pulses to
create sub-clock period symbols
– Combine at the final transmitter stage vs. farther up stream
Rterm
Rterm
Zo=50Ω
high-speed path
D0
D1
D2
Zo=50Ω
D3
– Cload for the clocks higher when combined at the final transmitter
vs.
– Need faster signal path after the multiplexer
•
Wei
Best choice depends on implementation (see Zerbe, ISSCC2003)
Low-Power High-Speed Links
16
Multiplexing
Parallelized Transmitter
Parallelized Receiver
TX
RX
TX
channel
TX
bitrate = M.fclk
RX
RX
Multiphase Clocks
Wei
Low-Power High-Speed Links
17
Multiplexing = Low Power?
With M:1 multiplexing, fCLK = bit rate/M
• Power = M·CV2·f = M·CV2 ·(BR/M) = C·V2·BR
• With fixed supply, power does not vary with M
But wait, at lower frequencies, I can lower voltage!
• With lower supply voltage (V ∝ fCLK = BR/M),
• Power decreases as ∝ 1/M2 !
Wei
Low-Power High-Speed Links
18
Power vs. M
•
Larger M
Fixed Supply
B Can reduce voltage
B Lower power
B Less accurate timing
Lower Supply ∝ 1/M or fCLK
– static phase offsets
– jitter
•
•
Cannot make M
arbitrarily large b/c there
is a lower limit to Vdd
Choice: M= 4~6
choice
This begs the questions… What if we make Vdd adaptive
w/ fCLK?
Wei
Low-Power High-Speed Links
19
DVS Links
• Dynamic Voltage Scaling (DVS)
– Technique first introduced for digital systems (e.g., uP, DSP
chips)
• Lot’s of work done in both academia and industry
(e.g., Intel, Transmeta)
– Allows trade off between speed and power
• Let’s investigate DVS for high-speed links
– Motivation and potential benefits
– Design example from Dr. Jaeha Kim
(ISSCC2002, JSSC2002, PhD thesis 2002)
Wei
Low-Power High-Speed Links
20
DVS Links
•
Dynamic Voltage Scaling (DVS) can reduce power consumption in
two ways
1) Digital circuits operate at their most energy-efficient point in the
presence of PVT variations by eliminating extra performance margins
Normalized Frequency
1.2
1
Pdynamic = αCswitched V 2F
88%
0
0.8
50
130
0.6
0.5 V
0.4
0.2
0
0.8 V
1
1.5
2
2.5
3
3.5
Supply Voltage (V)
Wei
Low-Power High-Speed Links
21
Trade Performance for Energy Savings
2) DVS enables trade off between performance and energy
•
Reducing frequency alone reduces power but not energy per bit
Fixed Vdd = 3.3V
Normalized energy / bit
1
0.8
Energy Savings
0.6
0.4
Dynamically scaled Vdd
0.2
0
Wei
E ∝ C ⋅V 2
0
0.2
0.4
0.6
Normalized bit rate
Low-Power High-Speed Links
0.8
1
22
DVS Link Components
•
DVS links require two additional components
– Mechanism to measure circuit critical path to appropriately adjust
voltage with respect to frequency
• Use an on-chip performance monitor circuit (inverter delay elements of
core DLL)
– Efficient supply-voltage regulator (buck converter)
•
Overall Block diagram (Wei et al, ISSCC2000)
FCLK
VCTRL
core DLL
Digital
Controller
Buck Converter
I/O Transceiver
DTX
RVdd
DRX
Wei
Low-Power High-Speed Links
23
Performance Monitoring DLL
CP
0
A
O
UP
VCP
DN
PD
VCTRL
180
1
O
A
clk
•
Reduces design complexity by enabling one to replace precision analog
components with simple digital gates. How?
– Inverters of the delay line model the critical path (clock distribution)
– Delay of gates in I/O circuitry are fixed relative to clock period
Wei
Low-Power High-Speed Links
24
Adaptive Receiver Filter
•
•
Filter signal frequencies beyond the Nyquist rate at the receiver (helps
for dealing with cross talk)
Receiver example
IN
IN
Vbias
Φ0
AdB
Preamplifier
fsymbol
Φ1
f
Regenerative
Latch
– Filter’s corner frequency tracks fsymbol
Wei
Low-Power High-Speed Links
25
Example: Adaptive Supply Serial Links
• Jaeha Kim (Ph.D. defense 2002)
Multiphase Clock
Recovery
1:5 Demultiplexing
Receiver
fref
Wei
Multiphase Clock
Generation
Adaptive
Supply, V
5:1 Multiplexing
Transmitter
Adaptive Power
Supply Regulator
Low-Power High-Speed Links
26
Multi-Phase Clock Generation
• Must minimize static offsets between phases
• Generate multiphase clocks locally at each pin, but
watch out for power and area overhead
Static Offsets
Jitter
TX
TX
TX
Wei
Low-Power High-Speed Links
27
Dual-Loop Clock Generation
Adaptive Power Supply Regulator
fref
Coarse Control
f
Reference VCO
Local Multiphase Clock Recovery
Wei
Adaptive Supply, V
Local Multiphase Clock Generation
Fine Control
Fine Control
Local VCO
Local VCO
1:5 Demultiplexing
Receiver
5:1 Multiplexing
Transmitter
Low-Power High-Speed Links
28
Dual-Loop Clock Generation (2)
• Global loop brings the local VCO frequency close
to lock
• Narrow local tuning range (+/-15%) is sufficient to
compensate for on-chip mismatches
• Narrow tuning range leads to low VCO gain
– Small loop capacitor area (2.5pF)
– Low sensitivity on Vctrl noise
Wei
Low-Power High-Speed Links
29
Clock Recovery
• Optimal receiver timing is recovered from the
incoming data stream
Data
RX
PD
Clock
Recovery
Wei
Filter
Low-Power High-Speed Links
VCO
30
Phase Detection
• Phase detector made of an identical set of
receivers minimizes timing error
PD
RX
PD
RX
PD
RX
PD
RX
PD
Ψ[4:0]
5
RX
PD
Edge Detection
/ Majority Voting
Early/Late
/None
5
Data
RX
Wei
Received Data
Low-Power High-Speed Links
31
TX
data gen
Testing
Interface
TX/RX
Feedback
Biasing
RX- PRBS
PLL RX
Wei
TX
Digital Sliding
Controller
Power
Transistors
data gen
TX
TX-DLL
TX
TX-PLL
Chip Prototype
RX-DLL
RX
•
•
•
•
•
•
•
•
PRBS
Low-Power High-Speed Links
0.25µm CMOS
2.5V / 0.55Vth
3.1×2.9mm2
0.4~5.0Gb/s
0.9~2.5V
5.6~375mW
BER < 10-15
Reg. Efficiency:
83-94%
32
Power and Performance
3.1Gb/s,
113mW
Wei
Low-Power High-Speed Links
33
Power Breakdown
Wei
Low-Power High-Speed Links
34
Summary
•
Higher performance links require low noise Æ low noise
solutions lead to lower power
– Trade performance (speed) for power reduction
•
DVS links enable energy-efficient link operation and also have
some nice properties
•
Outstanding issues with using DVS links
– Communication during frequency and voltage transitions
– Supply voltage regulator slew rate limits
– Overhead of multiple regulators
Wei
Low-Power High-Speed Links
35
Download