Power Reduction Techniques in SoC Bus Interconnects
Low Power Design for SoCs, ASIC Tutorial: SoC Buses
©M.J. Irwin, PSU, 1999

Bus Power
• Buses are a significant source of power dissipation because of their high switching activity and large capacitive loading
  » 15% of total power in the Alpha 21064
  » 30% of total power in the Intel 80386
[Figure: a bus connecting Ain, Bin, Cin, Din and Wout, Xout, Yout, Zout through bus drivers and bus receivers]

Bus Power Reduction
$P_{bus} = n C V_{dd}^2 f$ for an n-bit bus
• Minimize the bit switching activity ($f$) of buses by encoding the data
• Minimize the voltage swing ($V^2$) using differential signaling
• Alternative bus structures
  » charge recovery buses
  » bus multiplexing (lower $f$, maybe)
  » segmented buses (lower $C$)
• Minimizing bus traffic ($n$)
  » code compression
  » instruction loop buffers

Signal Encoding
Binary code              Gray code
Sequence   # toggles     Sequence   # toggles
000        3             000        1
001        1             001        1
010        2             011        1
011        1             010        1
100        3             110        1
101        1             111        1
110        2             101        1
111        1             100        1
(# toggles is the number of bits that flip on entering each code word from the previous one, with the sequence wrapping around.)

Toggle Rates
# bits   Binary toggles $B_n = 2(2^n - 1)$   Gray toggles $G_n = 2^n$   $B_n/G_n$
1        2                                   2                          1
2        6                                   4                          1.5
3        14                                  8                          1.75
4        30                                  16                         1.88
5        62                                  32                         1.94
6        126                                 64                         1.97
∞        -                                   -                          2.00

Bus Signal Encoding
• Different encodings lead to different area, delay, and power trade-offs
  » What is the power and latency cost of the encoding/decoding logic?
  » What if the bus stream is not sequential?
• Can really pay off in buses with large capacitive loading (off-chip buses and high-level on-chip buses)
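
The toggle counts in the Signal Encoding and Toggle Rates slides can be reproduced with a short script. This is a minimal sketch, assuming a counter that steps through every code word once and wraps around; the function names are illustrative and do not come from the slides.

```python
def binary_to_gray(value: int) -> int:
    """Standard reflected-binary Gray code of a non-negative integer."""
    return value ^ (value >> 1)


def hamming(a: int, b: int) -> int:
    """Number of bit positions in which a and b differ."""
    return bin(a ^ b).count("1")


def cycle_toggles(codes: list[int]) -> int:
    """Total bit flips over one full pass, including the wrap from last to first."""
    # codes[i - 1] with i == 0 is codes[-1], which accounts for the wrap-around.
    return sum(hamming(codes[i - 1], codes[i]) for i in range(len(codes)))


def toggle_counts(n_bits: int) -> tuple[int, int]:
    """Return (binary toggles B_n, Gray toggles G_n) for an n-bit counter."""
    sequence = list(range(1 << n_bits))
    binary = cycle_toggles(sequence)
    gray = cycle_toggles([binary_to_gray(v) for v in sequence])
    return binary, gray


if __name__ == "__main__":
    for n in range(1, 7):
        b, g = toggle_counts(n)
        print(f"n={n}: binary={b}, gray={g}, ratio={b / g:.2f}")
```

The ratio approaches 2 as the width grows, which is why Gray coding is attractive for strictly sequential streams such as address buses, provided the encoder and decoder overhead is acceptable.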
Bus Invert Encoding
• At each cycle, decide whether sending the true or the complement signal leads to fewer toggles
• Need an additional polarity signal on the bus to tell the bus receiver whether to invert the signal or not
• Only makes sense for groups of signals (buses) that can share the polarity signal
• Works for both sequential and random bus streams

Bus Invert Coding Logic
[Figure: source data passes through an invert/pass stage onto the data bus and through a second invert/pass stage to the received data; polarity decision logic compares the source data with the bus register (Hamming distance) and drives the polarity signal]

Efficiency of Bus Invert Encoding
• Have overhead in area, power, and delay of the additional logic to encode/decode
• Maximum number of toggling bits reduced from $n$ to $n/2$
• Under uniform random signal conditions (non-correlated data sequence), the toggle reduction has an upper bound of 25%

Efficiency of Bus Encoding
# bits   Reg bus E[P]   Invert bus E[Q]   Invert/Reg E[Q]/E[P]
2        1              0.75              0.75
4        2              1.56              0.781
8        4              3.27              0.817
16       8              6.83              0.854
32       16             14.19             0.886
64       32             29.27             0.915
∞        -              -                 1.00
$E[P] = n/2$
$E[Q] = \sum_{k=0}^{n/2} k\,Q_k$ where $Q_k = \frac{1}{2^n}\binom{n+1}{k}$
From Stan, 1995

Bus Encoding Extensions
• For sequential data (e.g., generated on address buses)
  » Gray code encoding (except for overhead)
  » T0 code by Benini
    – add address incrementer circuitry to receiver
    – add INC line to address bus
    – for consecutive addresses, just assert the INC line without sending the second address
    – reduces address bus transitions by 36% over binary
    – outperforms Gray code when probability of consecutive addresses is > 0.5
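
The bus-invert scheme and the E[Q] formula from the efficiency slides above can be sketched as follows. This is an illustrative model only: it assumes an even bus width, a bus and polarity line that both start at 0, and a decision rule that counts the polarity line as one of the toggling wires. All function names are hypothetical.

```python
from math import comb


def hamming(a: int, b: int) -> int:
    """Number of bit positions in which a and b differ."""
    return bin(a ^ b).count("1")


def bus_invert_encode(words: list[int], n_bits: int) -> list[tuple[int, int]]:
    """Encode words as (bus_value, polarity); polarity 1 means 'inverted'."""
    mask = (1 << n_bits) - 1
    bus, polarity = 0, 0  # assumed initial bus state
    encoded = []
    for word in words:
        # Toggles on the n data lines plus the polarity line if sent true.
        toggles_if_true = hamming(bus, word) + polarity
        if toggles_if_true > n_bits / 2:
            bus, polarity = ~word & mask, 1
        else:
            bus, polarity = word, 0
        encoded.append((bus, polarity))
    return encoded


def bus_invert_decode(encoded: list[tuple[int, int]], n_bits: int) -> list[int]:
    """Receiver side: re-invert whenever the polarity line is set."""
    mask = (1 << n_bits) - 1
    return [(~bus & mask) if polarity else bus for bus, polarity in encoded]


def toggles_per_cycle(encoded: list[tuple[int, int]]) -> list[int]:
    """Wire toggles (data lines plus polarity line) at each bus cycle."""
    prev_bus, prev_pol = 0, 0
    result = []
    for bus, polarity in encoded:
        result.append(hamming(prev_bus, bus) + (prev_pol ^ polarity))
        prev_bus, prev_pol = bus, polarity
    return result


def expected_invert_toggles(n_bits: int) -> float:
    """E[Q] = sum_{k=0}^{n/2} k * C(n+1, k) / 2^n for uniform random data."""
    total = sum(k * comb(n_bits + 1, k) for k in range(n_bits // 2 + 1))
    return total / 2 ** n_bits
```

For example, `expected_invert_toggles(8)` evaluates to about 3.27 against E[P] = 4 for the unencoded bus, in line with the table, and no cycle of an encoded 8-bit stream toggles more than 4 wires.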
Low Swing Buses
• Minimize voltage swing ($V^2$) using differential signaling
  » bus contains multiple bits -> relatively low overhead
  » all signals on the bus operate in sync -> creative circuit techniques for differential circuits
• Two basic approaches
  » Additional reference voltage lines
    – driver circuit responsible for generating $V_{ref}$
    – sense-amplifier (SA) bus receiver circuit required
  » Charge recycling

Additional Reference Lines
• Introduce an additional reference voltage line between the sender and receiver
[Figure: driver circuit (send data) and receiver circuit (received data) connected by the Vbus and Vref lines, with bus capacitance Cbus; waveforms for logic 0 and logic 1 compare a conventional bus with a low swing bus where $\Delta V \cong 0.1 V_{dd}$ between Vbus and Vref]

Bus Driver Circuit
[Figure: driver circuit with source data input, capacitors Cn, Cbus, and Cref, and nodes Vbus and Vref]
$C_{ref} \gg C_n, C_{bus}$
$C_n V_{ref} = C_{bus}(V_{dd} - V_{ref})$
$V_{ref} = \frac{C_{bus}}{C_n + C_{bus}} V_{dd}$

Power Efficiency
• Depends on the extent of voltage swing reduction (which depends on the required noise immunity and the sensitivity of the sensing circuit)
  » $0.1 V_{dd}$ reduced swing -> 99% savings (power scales with the square of the swing, and $0.1^2 = 0.01$)
• Also must consider
  » additional power of driver and receiver circuits
  » additional timing delays of circuits (but reduced swing improves signal switching time)
  » reduced swing -> smaller transistors at driver -> reduced short circuit currents

Limitations
• Susceptible to noise and cross-talk
• Producing a large on-chip capacitance $C_{ref}$ is difficult
• Sensing circuit is difficult to design for very low operating voltages
• Ratio of $C_{bus}$ to $C_n$ may be difficult to control (sensitive to process variations)
• Driver circuit is inherently dynamic, so it cannot stay dormant for long periods (what if the data signal contains a long series of identical values?)
• Takes time for $V_{ref}$ to recover if the bus is deactivated

Charge Recycling Bus
• High order bit discharges to lower bit, recycling charge (need 2 wires per bit)
[Figure: three stacked differential line pairs (CD0+/CD0-, CD1+/CD1-, CD2+/CD2-) carrying D0 = 0, D1 = 1, D2 = 0 between the levels Vdd, 2/3 Vdd, 1/3 Vdd, and 0, with switches S1 and S2; timing phases are precharge (S1 closed) and bus valid (S2 closed)]

Power Efficiency
• Depends on the number of bits stacked
• For n bits, the voltage swing of each line is $\Delta V = V_{dd}/(2n)$
• So the power dissipation of the recycling bus is
  $P_{crb} = 2n\,C \left(\frac{V_{dd}}{2n}\right)^2 (2f) = P_{conv}/n^2$
• However, because of the precharge there is no gain from data correlation, so the efficiency is reduced to
  $P_{crb} = 2 P_{conv}/n^2$

Limitations
• Larger values of n improve power efficiency but decrease noise immunity
• Must maintain all line capacitances at an equal value (may limit the scheme to on-chip buses -> have to be careful in layout to balance capacitances)
• Requires a precharge phase -> reduces data transfer rate

Comparisons
        E (pJ)   D (ns)   E*D    Swing (V)   Robustness          Complexity
CMOS    11.6     2.1      24.5   2.0         Excellent           Least
CISB    3.5      4.4      15.4   0.25        Small margin        Extra timings, sense amps
CRB     3.1      3.5      10.9   0.25        Not reliable        Extra timings, dual rail
LCR     1.78     2.43     4.32   0.6         Good noise margin   Extra timings, 2 ref voltages
$V_{dd}$ = 2 V, $C_L$(bus) = 1 pF, 0.6µ
From Zhang, 1998

Charge Recovery Bus
• Recover charge from falling bit lines to precharge rising bit lines
[Figure: bit lines CD0 to CD3 between transmit control and receive control, with a short control that connects the lines together]
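
The charge recovery idea in the slide above, where falling lines share their charge with rising lines before the supply finishes the job, can be sketched numerically. This is a simplified model under stated assumptions: every line has the same capacitance, the shorting switches are ideal, and only lines that change value are shorted. The function name and the default values (`vdd=2.5`, `c_line=1.0`) are illustrative and are not taken from the slides.

```python
def charge_recovery_transfer(old_bits: str, new_bits: str,
                             vdd: float = 2.5, c_line: float = 1.0):
    """Model one bus transfer with charge sharing between changing lines.

    Returns (v_shared, e_recovery, e_conventional), where the energies are
    what the supply delivers with and without charge recovery.
    """
    pairs = list(zip(old_bits, new_bits))
    rising = sum(1 for old, new in pairs if old == "0" and new == "1")
    falling = sum(1 for old, new in pairs if old == "1" and new == "0")

    if rising + falling == 0:
        return 0.0, 0.0, 0.0  # nothing changes, nothing is shorted

    # Charge conservation: F lines at Vdd and R lines at 0 settle to a
    # common voltage once they are shorted together.
    v_shared = vdd * falling / (rising + falling)

    # The supply only lifts the rising lines from v_shared up to Vdd.
    e_recovery = rising * c_line * vdd * (vdd - v_shared)
    # A conventional bus lifts each rising line all the way from 0.
    e_conventional = rising * c_line * vdd * vdd
    return v_shared, e_recovery, e_conventional


if __name__ == "__main__":
    for old, new in [("0001", "1110"), ("0011", "1100")]:
        v, e_rec, e_conv = charge_recovery_transfer(old, new)
        print(f"{old}->{new}: shared={v:.3f} V, saving={1 - e_rec / e_conv:.0%}")
```

In this model the saving for a single transfer equals the fraction of changing lines that are falling, so balanced patterns recover more charge than lopsided ones.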
Energy Savings
• The amount of energy savings depends on the number of lines shorted, the control circuitry, and the data length and pattern
• For a single transfer charge recovery
  $E = R\,C\,V_{dd}\,\Delta V$
  where $R$ is the number of rising bit lines, $F$ is the number of falling bit lines, and $\Delta V$ is the voltage change after charge transfer
  $E = R\,C\,V_{dd}\left(V_{dd} - V_{dd}\frac{F}{R+F}\right) = C\,V_{dd}^2\,\frac{R^2}{R+F}$

Reported Savings
• For random data, 32-bit bus
  » single transfer energy savings of 47%
  » maximum optimal energy savings of 72%
[Figure: average energy savings (%) versus width of databus, for widths up to 64 bits. From Khoo, 1995]

Single Transfer Charge Recovery Bus
[Figure: bit lines CD0 to CD3 between transmit control and receive control; each line has an element L and a comparator (≠?)]
A line participates in charge sharing if its data bit is different from the last data bit transmitted

Data Patterns Affect Savings
Trace A: 0001 -> 1110 -> 0001
Trace B: 0011 -> 1100 -> 0011
Bit-line voltage at each step:
Line   S1 (charge)   S2 (short)   S3 (charge)   S4 (short)   S5 (charge)
A3     0.0           0.6          2.5           1.9          0.0
A2     0.0           0.6          2.5           1.9          0.0
A1     0.0           0.6          2.5           1.9          0.0
A0     2.5           0.6          0.0           1.9          2.5
B3     0.0           1.2          2.5           1.2          0.0
B2     0.0           1.2          2.5           1.2          0.0
B1     2.5           1.2          0.0           1.2          2.5
B0     2.5           1.2          0.0           1.2          2.5
[Table: Trace A versus Trace B at Step 3, Step 5, and Total; values missing in the source]

Impacts of Signal Encoding on Charge Recovery
[Figure: relative energy consumption (0 to 1) of regular and charge recovery buses for 2s'c, Gray, SM, and bus-invert encodings; average of 15 Mediabench benchmarks. From Bishop, 1999]

Bus Multiplexing
• Share long data buses with time multiplexing (S1 uses even cycles, S2 odd)
[Figure: sources S1, S2 and destinations D1, D2 connected by dedicated buses and by a shared multiplexed bus]
• But what if data samples are correlated (e.g., sign bits)?
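
The question raised in the Bus Multiplexing slide can be explored with a small simulation. The sketch below is an assumption-laden illustration, not the slides' experiment: it builds two slowly varying 16-bit two's complement streams, one always positive and one always negative (an extreme case chosen so the sign bits are perfectly correlated within each stream), and compares per-bit switching probability on dedicated buses against a shared, time-multiplexed bus. All names and parameters are hypothetical.

```python
import random


def to_twos_complement(value: int, n_bits: int) -> int:
    """Map a signed integer onto its n-bit two's complement bus pattern."""
    return value & ((1 << n_bits) - 1)


def bit_switch_probability(words: list[int], n_bits: int) -> list[float]:
    """Per-bit toggle probability between successive words (index 0 = LSB)."""
    counts = [0] * n_bits
    for prev, cur in zip(words, words[1:]):
        diff = prev ^ cur
        for bit in range(n_bits):
            counts[bit] += (diff >> bit) & 1
    transitions = max(len(words) - 1, 1)
    return [count / transitions for count in counts]


def slow_signal(rng: random.Random, start: int, low: int, high: int,
                length: int, step: int = 3) -> list[int]:
    """Bounded random walk: successive samples are strongly correlated."""
    value, samples = start, []
    for _ in range(length):
        value = min(high, max(low, value + rng.randint(-step, step)))
        samples.append(value)
    return samples


def compare_buses(n_bits: int = 16, length: int = 2000, seed: int = 0):
    """Return (dedicated, muxed) per-bit switching probabilities."""
    rng = random.Random(seed)
    s1 = [to_twos_complement(v, n_bits)
          for v in slow_signal(rng, 100, 1, 200, length)]
    s2 = [to_twos_complement(v, n_bits)
          for v in slow_signal(rng, -100, -200, -1, length)]
    p1 = bit_switch_probability(s1, n_bits)
    p2 = bit_switch_probability(s2, n_bits)
    dedicated = [(a + b) / 2 for a, b in zip(p1, p2)]
    # Shared bus: S1 on even cycles, S2 on odd cycles.
    muxed_words = [word for pair in zip(s1, s2) for word in pair]
    muxed = bit_switch_probability(muxed_words, n_bits)
    return dedicated, muxed
```

With these streams the upper bits never toggle on the dedicated buses but toggle on every cycle of the shared bus, which is the loss of correlation the slide warns about.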
Correlated Data Streams
[Figure: bit switching probability (0 to 1) versus bit position, from bit 14 (MSB) down to bit 0 (LSB), for muxed and dedicated buses]

Disadvantage of Bus Multiplexing
• If the data bus is shared, the advantages of data correlation are lost (the bus carries samples from two uncorrelated data streams, giving more random switching)
• Bus sharing should not be used for positively correlated data streams
• Bus sharing may prove advantageous in a negatively correlated data stream (where successive samples switch sign bits)

Segmented Buses
• Partition the bus into several segments, which reduces the capacitance per segment
[Figure: bus split into segments joined by a TIE under TIE control; Ain, Bin, Cin, Din and Wout, Xout, Yout, Zout attach to the segments]
• Try to group often-communicating circuits on the same segment

TIE Design
• To connect the segments
[Figure: tie circuits joining bus segments]
• Delay/power models for the t-gate solution show a 60%-70% reduction in power and a 10%-30% improvement in bus delay

Code Compression
• Assuming only a subset of instructions is used, replace them with a shorter encoding to reduce memory bandwidth
[Figure: the core sends addresses to memory; memory returns log N-bit instructions; the instruction decompression table (IDT) restores the original k-bit format]

Instruction Loop Buffer
• Temporarily store decoded instructions from small loops in a buffer
• DIB stores decoded instructions for a whole loop, so instruction fetch and decode can be skipped
[Figure: pipeline with Fetch, Decode, Execute, Memory, and WriteBack stages, showing PC, I$, DIB, MAR, D$, and MDR]
• Can achieve a 40% power savings in a DSP or RISC processor
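
The code compression scheme above can be illustrated with a toy dictionary model. This is a minimal sketch, assuming each distinct k-bit instruction word used by the program gets one entry in the instruction decompression table (IDT) and memory stores only the table index. The function names and the sample instruction words are hypothetical.

```python
def build_idt(program: list[int]) -> tuple[list[int], list[int]]:
    """Return (idt, codes): the table of distinct words and the index stream."""
    idt: list[int] = []
    index_of: dict[int, int] = {}
    codes = []
    for word in program:
        if word not in index_of:
            index_of[word] = len(idt)
            idt.append(word)
        codes.append(index_of[word])
    return idt, codes


def decompress(codes: list[int], idt: list[int]) -> list[int]:
    """IDT lookup: restore each original instruction word from its index."""
    return [idt[code] for code in codes]


def code_width(n_distinct: int) -> int:
    """Bits needed to index N distinct instructions, i.e. ceil(log2 N)."""
    return max(1, (n_distinct - 1).bit_length())


if __name__ == "__main__":
    # Arbitrary 32-bit words standing in for a small loop body.
    program = [0x8C010000, 0x20210001, 0xAC010000, 0x20210001, 0x8C010000]
    idt, codes = build_idt(program)
    width = code_width(len(idt))
    print(f"{len(idt)} distinct instructions -> {width}-bit codes instead of 32")
    assert decompress(codes, idt) == program
```

The bus between memory and the IDT then carries log N bits per fetch rather than k, at the cost of the table lookup on every instruction.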
Key References
Bajwa, Stage-skip pipeline, Proc. of ISLPED, pp. 353-358, 1996.
Bellaouar, An ultra-low power CMOS on-chip interconnect architecture, Proc. of SLPE, pp. 52-53, 1995.
Benini, Address bus encoding techniques for system-level power optimization, Proc. of DATE, pp. 861-866, 1998.
Bishop, Databus charge recovery: practical considerations, Proc. SLPED, 1999.
Chen, Segmented bus design for low power systems, IEEE Trans. on VLSI Systems, 7(1):25-29, Mar 1999.
Hiraki, Data dependent logic swing internal bus architecture for ultralow power LSI, IEEE Journal of SSC, 30(4):397-402, Apr 1995.
Khoo, Charge recovery on a databus, Proc. SLPED, pp. 185-189, Aug 1995.
Stan, Bus-invert coding for low power I/O, IEEE Trans. on VLSI Systems, 3(1):49-58, 1995.
Yamauchi, An asymptotically zero power charge recycling bus architecture, IEEE Journal of SSC, 30(4):423-431, Apr 1995.
Yoshida, An object code compression approach to embedded processors, Proc. of ISLPED, pp. 265-268, 1997.
Zhang, Low-swing interconnect interface circuits, Proc. SLPED, pp. 161-166, Aug 1998.