Design of High-Speed Links: A Look at Modern VLSI Design
Vladimir Stojanović
Integrated Systems Group, Massachusetts Institute of Technology

Chip design is changing
  Becoming constrained by power, not so much by area/density

  Processor   Transistors  Power density  Technology  Power   Clock
  Pentium     3M           30 mW/mm2      0.6 um      4 W     0.1 GHz
  Pentium 4   125M         850 mW/mm2     90 nm       103 W   3.4 GHz

  The best systems trade off circuit, architecture, and system issues

Power-performance system optimization
  Complex: many levels of hierarchy and many variables
  Individual components
    Flops and latches (power and timing critical)
    [Figure: a logic block between two clocked flip-flops]
    V. Stojanović, V. G. Oklobdzija, "Comparative Analysis of MS Latches and Flip-Flops for High-Performance and Low-Power Systems," IEEE Journal of Solid-State Circuits, April 1999.
  System level, VLSI blocks and circuits
    Physical (Vdd, Vth, sizing)
    Logic
    Microarchitecture (parallelism, pipelining)
    [Figure: logic blocks A and B between clocked flip-flops, drawn in several arrangements, with per-block supply and threshold voltages Vdd1..Vdd5, Vth1..Vth5]
    V. Stojanovic, D. Markovic, B. Nikolic, M. A. Horowitz and R. W.
    Brodersen, "Energy-Delay Tradeoffs in Combinational Logic using Gate Sizing and Supply Voltage Optimization," European Solid-State Circuits Conference, September 2002.

Look at system-level problem: links
  Seems pretty simple
    [Figure: transmitter, channel, receiver]
  Challenging multi-disciplinary area
    Circuits
    Communications
    Optimization

What makes it challenging
  High-speed link chip: > 2 GHz signals
  Now, the bandwidth limit is in the wires

New link design
  Dealing with bandwidth-limited channels
    This is an old research area
      Textbooks on digital communications
      Think modems, DSL
    But can't directly apply their solutions
      Standard approach requires high-speed A/Ds and digital signal processing
      20 Gs/s A/Ds are expensive
  (Un)fortunately, need to rethink issues

Outline
  Show system level optimization for links
    Create a framework to evaluate trade-offs
  Background on high-speed links
  High-speed link modeling
  System level optimization
  Practical implementation issues
  Current / future work

Backplane environment
  [Figure: signal path through package, package via, line card trace, line card via, backplane connector, backplane via, and backplane trace]
  On-chip parasitic (termination resistance and device loading capacitance)
  Line attenuation
  Reflections from stubs (vias)

Backplane channel
  Loss is variable
    Same backplane
    Different lengths
    Different stubs (top vs. bottom)
  >30 dB at 3 GHz. But is that bad?
  Attenuation is large; the required signal amplitude is set by noise
  [Figure: attenuation [dB] vs frequency [GHz], 0 to 10 GHz, for 9" FR4, 26" FR4, 9" FR4 with via stub, and 26" FR4 with via stub]

Inter-symbol interference (ISI)
  Channel is low pass
  Our nice short pulse gets spread out
  [Figure: pulse response, amplitude vs time in ns, Tsymbol = 160 ps]
  Dispersion: short latency (skin effect, dielectric loss)
  Reflections: long latency (impedance mismatches: connectors, via stubs, device parasitics, package)

ISI
  [Figure: amplitude vs symbol time for three consecutive symbols, with the middle sample marked "Error!"]
  The middle sample is corrupted by 0.2 trailing ISI (from the previous symbol) and 0.1 leading ISI (from the next symbol), for 0.3 total ISI
  As a result, the middle symbol is detected in error

Prior state of high-speed links
  [Figure: link block diagram: dataIn, serializer, driver/equalizer, channel, data slicer, deserializer, dataOut; ref Clk and PLL; clock and data recovery]
  Link components well developed
    Fast multiplexed transmitters and receivers
    Precise timing generation and data recovery
  Starting to use equalization (1 to 2 taps)
    Few taps, set manually at the transmitter

Barriers to improving link performance
  No good link system and noise models
    Maximum achievable data rates are unknown
    Cannot predict the "right" architecture for a given set of channels
    Need to make performance/power tradeoff
  Limited link communication system design
    Peak power constraint in the transmitter
    No solution for optimal transmit equalization
    No solution for automatic equalization

Previous system models
  Mostly non-existent
  Borrowed from computer systems
    Worst case analysis
    Can be too pessimistic in links
  Borrowed from data communications
    Gaussian distributions
    Works well near mean
    Often way off at tails
      ISI distribution is bounded
  Need accurate models
    To relate the power/complexity to performance

How bad is the Gaussian model?
  [Figure, left panel: cumulative ISI distribution, log10 probability (cdf) vs residual ISI [mV]; annotation: 40 mV error at 10^-10, 25% of eye height]
  [Figure, right panel: impact on CDR phase, log10 steady-state phase probability vs phase count; annotations: 9% Tsymbol and 4% Tsymbol, error at 10^-10]
  Gaussian model only good down to 10^-3 probability
  Way pessimistic for much lower probabilities

A new model
  Use direct noise and interference statistics
  Main system impairments
    Interference
    Voltage noise (thermal, supply, offsets, quantization)
    Timing noise: always looked at separately
      Key to integrate with voltage noise sources
      Need to map from time to voltage

Effect of timing noise
  Voltage noise when receiver clock is off
  [Figure: ideal sampling vs jittered sampling, and the resulting voltage noise]
  The effect depends on the size of the jitter, the input sequence, and the channel
  Need effective voltage noise distribution

Example: Effect of transmitter jitter
  [Figure: symbol b_k with jittered edges at kT + ε_k^TX and (k+1)T + ε_{k+1}^TX, decomposed into the ideal symbol between kT and (k+1)T plus noise pulses of widths ε_k^TX and ε_{k+1}^TX at the two edges, each approximated by a delta of area proportional to b_k ε^TX]
  Decompose output into ideal and noise
  The noise is a pair of pulses at the front and end of the symbol
    Width of each pulse is equal to the jitter
    Approximate with deltas on bandlimited channels
  V. Stojanović, M. Horowitz, "Modeling and Analysis of High-Speed Links," IEEE Custom Integrated Circuits Conference, September 2003.
  (invited)

Jitter effect on voltage noise
  Transmitter jitter
    High frequency (cycle-to-cycle) jitter is bad
      Changes the energy (area) of the symbol
      No correlation of noise sources that sum
    Low frequency jitter is less bad
      Effectively shifts the waveform
      Correlated noise gives partial cancellation
  Receive jitter ε_k^RX
    Modeled by a shift of the transmit sequence
    Same as low frequency transmitter jitter
  Bandwidth of the jitter is critical
    It sets the magnitude of the noise created

Jitter source from PLL clocks
  [Figure: PLL block diagram: RefClk, phase detector (Kpd), charge pump (Icp), loop filter (R, C), VCO (Kvco/s), clock buffer, divide-by-N feedback]
  [Figure: noise transfer functions [dB] vs frequency [Hz], 10^5 to 10^10 Hz: from input clock, from clock buffer supply, from VCO supply]
  Noise sources
    Reference clock phase noise
    VCO supply noise
    Clock buffer supply noise
  M. Mansuri, C-K. K. Yang, "Jitter optimization based on phase-locked loop design parameters," IEEE Journal of Solid-State Circuits, Nov. 2002.
  E. Alon, V. Stojanovic, M. Horowitz, "Circuits and Techniques for High-Resolution Measurement of On-Chip Power Supply Noise," IEEE Symposium on VLSI Circuits, June 2004.

2x oversampled bang-bang CDR
  [Figure: data slicer (data Clk) and edge slicer (edge Clk) feeding the deserializer; phase detector (PD), phase control, and phase mixer driven by ref Clk and PLL; signals d_n, d_{n-1}, e_n (late), dataOut]
  Generate early/late from d_n, d_{n-1}, e_n
  Simple 1st order loop, cancels receiver setup time
  Now need jitter on the data Clk, not the PLL output
    Base linear PLL jitter
    Add non-linear phase selector noise from the CDR

Bang-bang CDR model
  Model the CDR loop as a state machine: a Markov chain
  [Figure: phase states φ_{i-1}, φ_i, φ_{i+1} with transition probabilities p_dn,i, p_hold,i, p_up,i; plot of log10 steady-state probability vs phase count]
  Gives the probability distribution of phase
    Which is the CDR jitter distribution
  A. E. Payzin, "Analysis of a Digital Bit Synchronizer," IEEE Transactions on Communications, April 1983.
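
The Markov-chain view of the bang-bang CDR can be sketched in a few lines. This is a minimal illustration, not the model used in the talk: it assumes a non-wrapping chain of phase positions, a hypothetical lock point `lock`, Gaussian edge jitter of `sigma` phase counts, and a data transition density of 0.5. The names `transition_probs` and `steady_state` are made up for this sketch. Because only neighbouring phases are connected, the steady state follows from the balance condition pi[i] * p_up[i] = pi[i+1] * p_dn[i+1].

```python
import math

def normal_cdf(x):
    """Standard normal CDF, written with erfc so the far tails stay nonzero."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))

def transition_probs(n_states, lock, sigma, density=0.5):
    """Per-state up/down probabilities for a first-order bang-bang loop.

    Assumed model: at phase i the sampler looks early with probability
    Phi((lock - i) / sigma); a vote is only produced when the data has a
    transition (probability `density`), otherwise the phase holds.
    """
    p_up = [density * normal_cdf((lock - i) / sigma) for i in range(n_states)]
    p_dn = [density * normal_cdf((i - lock) / sigma) for i in range(n_states)]
    return p_up, p_dn

def steady_state(p_up, p_dn):
    """Stationary distribution of the nearest-neighbour chain.

    Uses pi[i] * p_up[i] = pi[i+1] * p_dn[i+1], accumulated in the log
    domain so that probabilities far from lock do not underflow early.
    """
    log_pi = [0.0]
    for i in range(len(p_up) - 1):
        log_pi.append(log_pi[-1] + math.log(p_up[i]) - math.log(p_dn[i + 1]))
    peak = max(log_pi)
    weights = [math.exp(v - peak) for v in log_pi]
    total = sum(weights)
    return [w / total for w in weights]

# Example with hypothetical numbers: 65 phase positions, lock at position 32.
p_up, p_dn = transition_probs(n_states=65, lock=32, sigma=3.0)
pi = steady_state(p_up, p_dn)
mean = sum(i * p for i, p in enumerate(pi))
rms_jitter = math.sqrt(sum(p * (i - mean) ** 2 for i, p in enumerate(pi)))
print("rms phase jitter [counts]:", rms_jitter)
```

Plotting `log10(pi[i])` against `i` gives a curve of the same kind as the steady-state probability plot on the slide; the tails come from the chain itself rather than from a Gaussian fit.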

Outline
  Show system level optimization for links
    Create a framework to evaluate trade-offs
  Background on high-speed links
  High-speed link modeling
  System level optimization
    Limits: what is the capacity of these links?
    Improving today's baseband signaling
  Practical implementation issues
  Current / future work

Baseline channels
  [Figure: attenuation [dB] vs frequency [GHz], 0 to 20 GHz, for 26" NELCO with no stub and 26" FR4 with via stub]
  Legacy (FR4): lots of reflections
  Microwave engineered (NELCO)

Capacity with link-specific noise
  [Figure, two panels (FR4 and NELCO): capacity [Gb/s] vs log10(clipping probability), with curves for thermal noise, thermal noise and LC PLL phase noise, and thermal noise and ring PLL phase noise]
  Effective noise from phase noise
    Proportional to signal energy
    Decreases expected gains
  Still, capacity is much higher than the data rates in today's links

Removing ISI
  [Figure: linear transmit equalizer (Tx data through anticausal and causal taps, differential 50 Ω outputs outP/outN) driving the channel, and a decision-feedback equalizer at the receiver (sampled data, deadband, feedback taps, TapSel logic)]
  Transmit and receive equalization
    Changes the signal to correct for ISI
  Often easier to work at the transmitter
    DACs are easier than ADCs
  J. Zerbe et al., "Design, Equalization and Clock Recovery for a 2.5-10Gb/s 2-PAM/4-PAM Backplane Transceiver Cell," IEEE Journal of Solid-State Circuits, Dec. 2003.
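
A small numeric sketch of what the two equalizers do to a sampled pulse response. Everything in it is illustrative: the pulse response and tap values are hypothetical, the transmit taps are normalized so that the sum of their magnitudes is 1 (standing in for a fixed transmitter swing), the decision-feedback taps are assumed to cancel their post-cursor samples perfectly, and the eye is the simple worst-case (peak distortion) bound, cursor minus the sum of the remaining |ISI| samples.

```python
def convolve(taps, pulse):
    """Discrete convolution: the pulse response seen after a transmit FIR."""
    out = [0.0] * (len(taps) + len(pulse) - 1)
    for i, t in enumerate(taps):
        for j, p in enumerate(pulse):
            out[i + j] += t * p
    return out

def worst_case_eye(pulse, dfe_taps=0):
    """Cursor minus the sum of |ISI| samples (peak-distortion bound).

    The cursor is taken as the largest sample. The first `dfe_taps`
    post-cursor samples are assumed to be cancelled by an ideal
    decision-feedback equalizer, so they are left out of the sum.
    """
    cursor = max(range(len(pulse)), key=lambda k: abs(pulse[k]))
    isi = 0.0
    for k, sample in enumerate(pulse):
        cancelled = cursor < k <= cursor + dfe_taps
        if k != cursor and not cancelled:
            isi += abs(sample)
    return abs(pulse[cursor]) - isi

# Hypothetical symbol-spaced pulse response: precursor, cursor, two postcursors.
channel = [0.1, 0.5, 0.3, 0.1]

# Hypothetical tap sets, each with sum(|w|) = 1 for a fixed peak swing.
tx_only = [-0.1, 0.6, -0.3]   # precursor, main, postcursor tap
tx_for_dfe = [-0.15, 0.85]    # precursor and main; the DFE takes the postcursor

eye_raw = worst_case_eye(channel)
eye_tx = worst_case_eye(convolve(tx_only, channel))
eye_dfe = worst_case_eye(convolve(tx_for_dfe, channel), dfe_taps=1)
print(eye_raw, eye_tx, eye_dfe)
```

With these numbers the unequalized eye is closed, transmit equalization alone opens it to about 0.14 of the full swing, and moving the first post-cursor tap into the DFE gives about 0.27, because the transmitter no longer spends swing on that tap.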

Transmit equalization: headroom constraint
  [Figure: Tx data through anticausal and causal taps into the channel, with a peak power constraint; attenuation [dB] vs frequency [GHz], 0 to 2.5 GHz, unequalized and equalized]
  Amplitude of the equalized signal depends on the channel
  Transmit DAC has limited voltage headroom
  Unknown target signal levels
    Hard to formulate an error or objective function
  Need to tune the equalizer and the receive comparator levels

Optimization example: Power constrained linear precoding
  [Figure: a_k into precoder w (power constraint), channel pulse response P, additive noise, gain g, output â_k; error e_k between a_k and â_k]

  \[ MSE(w, g) = E_a \left( 1 - 2 g\, w^T P 1_\Delta + g^2\, w^T P P^T w \right) + g^2 \sigma^2 \]

  \[ SINR_{unbiased}(w) = \frac{E_a \left( w^T P 1_\Delta \right)^2}{E_a\, w^T P \left( I - 1_\Delta 1_\Delta^T \right) \left( I - 1_\Delta 1_\Delta^T \right)^T P^T w + \sigma^2} \]

  Add variable gain to amplify to a known target level
  Formulate the objective function from the error
    SINR is not concave in w in general
    Change the objective to the quasiconcave SINR_unbiased
  V. Stojanović, A. Amirkhany, M. Horowitz, "Optimal Linear Precoding with Theoretical and Practical Data Rates in High-Speed Serial-Link Backplane Communication," IEEE International Conference on Communications, June 2004.

Optimal linear precoding
  Still, does this objective really relate to link performance?
    Need to look at noise and interference distributions

  \[ \underset{w}{\text{maximize}} \quad \gamma = \frac{0.5\, d_{min}\, w^T P 1_\Delta - V_{peak} \left\| w^T P I_{PD} \right\|_1 - \text{offset}}{\left( E_a\, w^T P \left( I - 1_\Delta 1_\Delta^T - I_{PD} \right) \left( I - 1_\Delta 1_\Delta^T - I_{PD} \right)^T P^T w + \sigma^2 \right)^{1/2}} \]

  \[ \text{s.t.} \quad \| w \|_1 \le 1 \]

  \[ \sigma^2 = w^T S_0^{TX} w + w^T S_0^{RX} w + \sigma^2_{thermal} \]

  Minimize BER
  Residual dispersion into peak distortion
  Reflections into mean distortion
  Includes all link-specific noise sources

Including feedback equalization
  Feedback equalization (DFE)
    Subtracts error from input
    No attenuation
  Problem with DFE
    Need to know interfering bits
    ISI must be causal
  [Figure: feedback equalization, amplitude vs symbol time]
  Problem: latency in the decision circuit
    Receive latency + DAC settling < bit time
  Can increase allowable time by loop unrolling
    Receive next bit before the previous is resolved

One-tap DFE with loop unrolling
  [Figure: pulse response with cursor 1 and first post-cursor α; received levels +1+α, +1-α, -1+α, -1-α, with slicer thresholds at +α and -α around 0]
  [Figure: input x_n sliced by two flip-flops clocked by dClk, one at threshold +α giving d_n | d_{n-1} = 1 and one at threshold -α giving d_n | d_{n-1} = 0, with d_{n-1} selecting between them]
  Instead of subtracting the error
    Move the slicer level to include the noise
    Slice for each possible level, since the previous value is unknown
  K. K. Parhi, "High-Speed architectures for algorithms with quantizer loops," IEEE International Symposium on Circuits and Systems, May 1990.

BER contours
  [Figure, two panels (5 tap Tx Eq; 5 tap Tx Eq + 1 tap DFE): contours of log10(BER) from -5 to -30, margin [mV] from -150 to 150 vs time [ps] from 0 to 160]
  Voltage margin: minimum
  distance between the receiver threshold and the contours with the same BER

Pulse amplitude modulation
  Binary (NRZ)
    1 bit / symbol
    Symbol rate = bit rate
  PAM4
    2 bits / symbol
    Symbol rate = bit rate / 2
  [Figure: binary signal with levels 1 and 0; PAM4 signal with levels 00, 01, 11, 10]

Multi-level: Offset and jitter are crucial
  [Figure, three panels (thermal noise; thermal noise + offset; thermal noise + offset + jitter): data rate [Gb/s] vs symbol rate [Gs/s], 0 to 20 Gs/s, for PAM2, PAM4, PAM8, PAM16]
  To make better use of the available bandwidth, need better circuits
  PAM2/PAM4 robust candidate for next generation links

Full ISI compensation too costly
  [Figure, three panels (thermal noise; thermal noise + offset; thermal noise + offset + jitter): data rate [Gb/s] vs symbol rate [Gs/s], 0 to 16 Gs/s, for PAM2, PAM4, PAM8, PAM16]
  Today's links cannot afford to compensate all ISI
  Limits today's maximum achievable data rates

Outline
  Show system level optimization for links
    Create a framework to evaluate trade-offs
  Background on high-speed links
  High-speed link modeling
  System level optimization
  Practical implementation issues
    Low-cost adaptation
    Dual-mode link (hardware re-use)
  Current / future work

Fully adaptive dual-mode link
  [Figure: chip photo with labeled blocks: config registers, CDR logic, phase mixers, PLL, receiver, reflection canceller, transmitter, backchannel RX, backchannel TX]
  PAM2/PAM4, 2-10 Gb/s, 0.13 µm, 40 mW/Gb/s
  Reconfigurable dual-mode PAM2/PAM4 link
  Adaptive equalization
    Transmit and receive equalization
    DFE with loop unrolling
  V. Stojanović et al.
  "Adaptive Equalization and Data Recovery in Dual-Mode (PAM2/4) Serial Link Transceiver," IEEE Symposium on VLSI Circuits, June 2004.

Adaptation with minimum overhead
  [Figure: Tx data through the channel to the Rx data sampler (dClk), edge sampler (eClk) feeding the CDR, and adaptive sampler (aClk) comparing against dLev to produce the error; an adaptive macro generates thresholds and tap updates]
  Adaptive sampler
    Generates the error signal at the reference level
    Monitors the link
  Adjustable voltage and time reference
    On-chip sampling scope
    Can replace any other sampler: calibration

Dual-loop adaptive algorithm
  Data level reference loop

  \[ dLev_{n+1} = dLev_n - step_{dataLev}\, \mathrm{sign}(e_n), \quad \hat{x}_n > 0 \]

  Equalizer loop

  \[ w_{n+1} = w_n + step_w\, \mathrm{sign}(e_n)\, \mathrm{sign}(\hat{x}_n) \]

  [Figure: eye diagrams for the initial eye, mid-way equalized, and equalized, with dLev_init, dLev_mid, dLev_end, the initial error, and the sign(e_n) and sign(\hat{x}_n) sequences]
  Scale the equalizer output: Tx constraint

Dual loop convergence: 4 tap example
  PAM2, 5 Gb/s, 4 taps Tx equalization
  [Figure, left: dLev [mV] vs number of updates, 0 to 200]
  [Figure, right: tap weight [mV] vs number of updates, 0 to 200, for main tap, pre1, post1, post2]
  Hard to estimate analytically
  Experimental results show
    Both loops are stable within a wide range, 0.1 to 10x, of relative speeds

Hardware re-use: Dual-mode receiver
  [Figure: receiver slicer array clocked by dClk with thresholds thresh(+) and thresh(-), prDFE enable multiplexers, and outputs msb, lsb(+), lsb(-), shown in four configurations: PAM4; PAM4 with detail of the pre-amp with offset and the comparator; PAM2; PAM2 with loop-unrolled DFE tap]
  Leverage multi-level properties of signals in loop-unrolling
  Re-use PAM4 receiver hardware (slicers and CDR)

Improvements with loop-unrolling
  [Figure: signal as seen by the receiver (on-chip scope), voltage vs time [ps], for unequalized, fully transmit equalized, and transmit equalized with one tap DFE; color scale is log10(voltage probability distribution)]

Model and measurements
  [Figure: log10(BER) vs voltage margin [mV], from 80 to -80 mV]
  PAM4, 3 taps of transmit equalization, 5 Gb/s, 26" FR4 channel

Outline
  Show system level optimization for links
    Create a framework to evaluate trade-offs
  Background on high-speed links
  High-speed link modeling
  System level optimization
  Practical implementation issues
  Current / future work
    Bridging the gap to link capacity
    Other similar system optimizations

Bridging the gap: Multi-tone link
  [Figure: bits/dimension vs frequency, 0 to 12 GHz]
  Multi-tone data rates with thermal noise
    Nelco: 64 Gb/s
    FR4: 38 Gb/s
  A. Amirkhany, V. Stojanovic, M. A. Horowitz, "Multi-tone Signaling for High-speed Backplane Electrical Links," IEEE Global Telecommunications Conference, November 2004.

Bridging the gap: Multi-tone link
  [Figure: transmitter with data0..dataN, each through an LPF and a mixer (e^{jw1t} ... e^{jwNt}) and BPF; receiver with matching BPFs, mixers, and LPFs recovering data0..dataN; number of levels per tone vs frequency f]
  Challenge: balancing the inter-symbol and inter-channel interference
    Microwave filter techniques
    Custom signal processing

The Problem with Multi-Mode Fiber
  [Figure: multi-modal dispersion of a pulse in multimode fiber. Source: Corning]
  Modal dispersion limits the data rates to ~3 Gb/s/km

Example Fiber Modes
  [Figure: field profiles of several fiber modes over a cross-section of about ±5 x 10^-5]

SLM's for Equalization
  Shape the E-field projected on the fiber
    Lens performs a Fourier transform
    SLM's adjust the spatial frequency of the light
  [Figure: MEMS spatial light modulator and lens in front of the fiber]
  Optimize to reduce modal dispersion (d_min)
    Objective is intensity, which makes the optimization challenging
  E. Alon, V. Stojanovic, J. M. Kahn, S. Boyd, M. Horowitz, "Equalization of Modal Dispersion in Multimode Fiber using Spatial Light Modulators," IEEE Global Telecommunications Conference, November 2004.

Conclusions
  Interfaces are challenging system designs
    Good space to explore system level optimization
    Optimization leads to novel approaches
  Baseband links
    Still far from the capacity of these links
    PAM4 and simple DFE reduce the effect of ISI
    Low cost adaptive, self calibrating link
    Looking into multi-tone to bridge the gap
  Multimode fiber optics
    Leverage multiple propagation modes rather than being limited by them
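
As a companion to the low-cost adaptive link summarized above, the two sign-sign update rules of the dual-loop adaptive algorithm (data-level loop and equalizer loop) can be exercised in a toy simulation. This is a sketch under stated assumptions, not the implementation from the talk: the channel samples, step sizes, noise level, and three-tap transmit filter are hypothetical; the transmitted data is used as the decision (training mode); each tap is paired with the data symbol it mainly affects; and "scale the equalizer" is interpreted here as renormalizing the taps to unit sum of magnitudes after every update.

```python
import random

def convolve(taps, pulse):
    """Pulse response after the transmit FIR."""
    out = [0.0] * (len(taps) + len(pulse) - 1)
    for i, t in enumerate(taps):
        for j, p in enumerate(pulse):
            out[i + j] += t * p
    return out

def worst_case_eye(pulse):
    """Largest sample minus the sum of all other |samples|."""
    cursor = max(range(len(pulse)), key=lambda k: abs(pulse[k]))
    isi = sum(abs(s) for k, s in enumerate(pulse) if k != cursor)
    return abs(pulse[cursor]) - isi

def sign(v):
    return 1.0 if v >= 0 else -1.0

def adapt(channel, chan_cursor, n_symbols=20000, step_w=0.001,
          step_dlev=0.002, noise_rms=0.02, seed=1):
    """Run both loops on random PAM2 data; returns (taps, dLev)."""
    rng = random.Random(seed)
    w = [0.0, 1.0, 0.0]            # pre, main, post taps; start unequalized
    main = 1
    dlev = 0.0
    data = [rng.choice((-1.0, 1.0)) for _ in range(n_symbols)]
    delay = main + chan_cursor     # cursor position of the equalized pulse
    span = len(w) + len(channel) - 1
    for n in range(span, n_symbols - span):
        pulse = convolve(w, channel)
        x = sum(c * data[n + delay - k] for k, c in enumerate(pulse))
        x += rng.gauss(0.0, noise_rms)
        e = dlev * data[n] - x     # error against the data-level reference
        if data[n] > 0:            # data-level loop, only for positive symbols
            dlev -= step_dlev * sign(e)
        # Equalizer loop: tap j is driven by the symbol it mainly affects.
        w = [wj + step_w * sign(e) * data[n + main - j]
             for j, wj in enumerate(w)]
        norm = sum(abs(v) for v in w)   # assumed Tx swing constraint
        w = [v / norm for v in w]
    return w, dlev

channel = [0.05, 0.6, 0.25, 0.1]   # hypothetical pulse response, cursor at 1
w, dlev = adapt(channel, chan_cursor=1)
eye_before = worst_case_eye(channel)
eye_after = worst_case_eye(convolve(w, channel))
print("taps:", w, "dLev:", dlev, "eye:", eye_before, "->", eye_after)
```

The added noise matters in this toy setup: with purely discrete ISI and no noise, the sign-sign correlations can vanish over a range of tap values, so the taps may stall before the residual ISI is fully removed.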