Power Reduction Techniques in the SoC Bus Interconnects

Power Reduction Techniques in SoC Bus Interconnects
Low Power Design for SoCs
ASIC Tutorial SoC Buses.1
Buses.1
©M.J. Irwin, PSU, 1999
Bus Power
• Buses are a significant source of power dissipation due to high switching activities and large capacitive loading
  » 15% of total power in Alpha 21064
  » 30% of total power in Intel 80386
[Figure: shared bus with bus drivers (Wout, Xout, Yout, Zout) and bus receivers (Ain, Bin, Cin, Din)]

Bus Power Reduction
$P_{bus} = n\,C\,V_{dd}^2\,f$ for an n-bit bus
• Minimize bit switching activity (f) of buses by encoding the data
• Minimize voltage swing ($V^2$) using differential signaling
• Alternative bus structures
  » charge recovery buses
  » bus multiplexing (lower f, maybe)
  » segmented buses (lower C)
• Minimizing bus traffic (n)
  » code compression
  » instruction loop buffers

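As a rough numerical illustration of the relation Pbus = n C Vdd^2 f, the sketch below evaluates the formula and shows how each lever scales the result. The function name and the example values (32 lines, 1 pF per line, 2 V, 50 MHz toggle rate) are assumptions chosen for illustration, not figures from the tutorial.

```python
def bus_power(n_bits, c_line, v_swing, f_toggle):
    """Dynamic bus power in watts: P = n * C * V^2 * f."""
    return n_bits * c_line * v_swing ** 2 * f_toggle

# Hypothetical baseline: 32 lines, 1 pF each, 2 V swing, 50 MHz toggle rate.
baseline = bus_power(32, 1e-12, 2.0, 50e6)

# Each technique on the slide attacks one factor of the product.
half_activity = bus_power(32, 1e-12, 2.0, 25e6)     # encoding lowers f
half_capacitance = bus_power(32, 0.5e-12, 2.0, 50e6)  # segmentation lowers C
half_swing = bus_power(32, 1e-12, 1.0, 50e6)        # low swing lowers V

print(f"baseline         {baseline * 1e3:.2f} mW")
print(f"half activity    {half_activity / baseline:.2f} x baseline")
print(f"half capacitance {half_capacitance / baseline:.2f} x baseline")
print(f"half swing       {half_swing / baseline:.2f} x baseline")
```

Only the voltage term enters quadratically, which is why swing reduction appears as its own family of techniques later in the deck.
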
Signal Encoding
Binary Code              Gray Code
Sequence  # toggles      Sequence  # toggles
000       3              000       1
001       1              001       1
010       2              011       1
011       1              010       1
100       3              110       1
101       1              111       1
110       2              101       1
111       1              100       1

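The toggle counts for the 3-bit binary and Gray sequences can be reproduced with a short sketch. It assumes that the count listed for each code word is the number of bits that change on entering it from the previous word, with the first word entered from the last (the counter wraps around). The helper names are illustrative.

```python
def toggles(a, b):
    """Number of bit positions in which a and b differ."""
    return bin(a ^ b).count("1")

def binary_sequence(n_bits):
    return list(range(2 ** n_bits))

def gray_sequence(n_bits):
    # Standard reflected Gray code: g = i XOR (i >> 1).
    return [i ^ (i >> 1) for i in range(2 ** n_bits)]

def toggles_per_step(seq):
    # seq[i - 1] with i == 0 is the last element, which models wrap-around.
    return [toggles(seq[i - 1], seq[i]) for i in range(len(seq))]

binary_toggles = toggles_per_step(binary_sequence(3))
gray_toggles = toggles_per_step(gray_sequence(3))

for b, bt, g, gt in zip(binary_sequence(3), binary_toggles,
                        gray_sequence(3), gray_toggles):
    print(f"{b:03b} {bt}    {g:03b} {gt}")
print("totals:", sum(binary_toggles), sum(gray_toggles))
```
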
Toggle Rates
# bits   Binary toggles           Gray toggles      Bn/Gn
         ($B_n = 2(2^n - 1)$)     ($G_n = 2^n$)
1        2                        2                 1
2        6                        4                 1.5
3        14                       8                 1.75
4        30                       16                1.88
5        62                       32                1.94
6        126                      64                1.97
∞        -                        -                 2.00

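The closed forms for Bn and Gn follow from counting toggles over one full wrap-around cycle of 2^n code words. The derivation below is a sketch under that assumption: in binary counting, bit i (with i = 0 the LSB) changes once every 2^i counts, while a Gray sequence changes exactly one bit per count.

```latex
\begin{align*}
B_n &= \sum_{i=0}^{n-1} \frac{2^n}{2^i}
     = \sum_{j=1}^{n} 2^{j}
     = 2^{n+1} - 2
     = 2\,(2^n - 1), \\
G_n &= 2^n, \\
\frac{B_n}{G_n} &= 2 - 2^{\,1-n} \;\longrightarrow\; 2
     \quad \text{as } n \to \infty .
\end{align*}
```

So for sequential streams a Gray-coded bus approaches half the switching activity of a binary-coded bus as the width grows.
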
Bus Signal Encoding
• Different encodings lead to different area, delay, and power trade-offs
  » What is the power and latency cost of the encoding/decoding logic?
  » What if the bus stream is not sequential?
• Can really pay off in buses with large capacitive loading (off-chip buses and high level on-chip buses)

Bus Invert Encoding
• At each cycle decide whether sending the true or complement signal leads to fewer toggles
• Need an additional polarity signal on the bus to tell the bus receiver whether to invert the signal or not
• Only makes sense for groups of signals (buses) that can share the polarity signal
• Works for both sequential and random bus streams

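A minimal software model of bus-invert coding is sketched below. The decision rule used here (pick whichever of the true or complemented word causes fewer toggles, counting the polarity line itself) and the all-zero reset state are assumptions made for illustration; a hardware implementation would typically compare a Hamming distance against n/2. All function names are hypothetical.

```python
def popcount(x):
    return bin(x).count("1")

def bus_invert_encode(words, n_bits):
    """Return a list of (bus_value, polarity) pairs for n_bits-wide words."""
    mask = (1 << n_bits) - 1
    prev_bus, prev_pol = 0, 0              # assumed reset state of the bus
    encoded = []
    for word in words:
        inv_word = ~word & mask
        # Toggles on the data lines plus the polarity line for each choice.
        true_cost = popcount(prev_bus ^ word) + (prev_pol != 0)
        inv_cost = popcount(prev_bus ^ inv_word) + (prev_pol != 1)
        if inv_cost < true_cost:
            prev_bus, prev_pol = inv_word, 1
        else:
            prev_bus, prev_pol = word, 0
        encoded.append((prev_bus, prev_pol))
    return encoded

def bus_invert_decode(encoded, n_bits):
    """Receiver side: invert the bus value whenever polarity is asserted."""
    mask = (1 << n_bits) - 1
    return [(~bus & mask) if pol else bus for bus, pol in encoded]

def count_toggles(encoded):
    """Total toggles on data plus polarity lines, starting from all zeros."""
    total, prev_bus, prev_pol = 0, 0, 0
    for bus, pol in encoded:
        total += popcount(prev_bus ^ bus) + (prev_pol != pol)
        prev_bus, prev_pol = bus, pol
    return total

words = [0x00, 0xFF, 0x0F, 0xF0]
encoded = bus_invert_encode(words, 8)
plain = [(w, 0) for w in words]            # same stream without encoding
print(count_toggles(plain), count_toggles(encoded))
```

With this rule the two candidate costs always sum to n + 1, so no cycle toggles more than about half of the lines.
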
Bus Invert Coding Logic
[Figure: block diagram. Source data -> invert/pass -> data bus -> invert/pass -> received data. Polarity decision logic, fed by the Hamming distance between the source data and the bus register, drives the polarity signal.]

Efficiency of Bus Invert Encoding
• Incurs overhead in area, power, and delay for the additional encode/decode logic
• Maximum number of toggling bits reduced from n to n/2
• Under uniform random signal conditions (non-correlated data sequence), the toggle reduction has an upper bound of 25%

Efficiency of Bus Encoding
# bits   Reg bus E[P]   Invert bus E[Q]   Invert/Reg E[Q]/E[P]
2        1              0.75              0.75
4        2              1.56              0.781
8        4              3.27              0.817
16       8              6.83              0.854
32       16             14.19             0.886
64       32             29.27             0.915
∞        -              -                 1.00

$E[P] = n/2$
$E[Q] = \sum_{k=0}^{n/2} k\,Q_k$ where $Q_k = \frac{1}{2^n}\binom{n+1}{k}$
From Stan, 1995

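The expected toggle counts can be evaluated directly from E[P] = n/2 and E[Q] = sum over k of k Qk with Qk = C(n+1, k) / 2^n. The sketch below assumes Python 3.8 or later for `math.comb`; the function names are illustrative.

```python
from math import comb

def expected_toggles_regular(n_bits):
    """E[P] = n/2 for uniformly random data on an unencoded bus."""
    return n_bits / 2

def expected_toggles_invert(n_bits):
    """E[Q] = sum_{k=0}^{n/2} k * C(n+1, k) / 2^n for a bus-invert bus."""
    total = sum(k * comb(n_bits + 1, k) for k in range(n_bits // 2 + 1))
    return total / 2 ** n_bits

for n_bits in (2, 4, 8, 16, 32, 64):
    p = expected_toggles_regular(n_bits)
    q = expected_toggles_invert(n_bits)
    print(f"{n_bits:3d}  E[P]={p:5.1f}  E[Q]={q:6.2f}  ratio={q / p:.3f}")
```

The ratio climbs toward 1 as the bus widens, which is one reason wide buses are sometimes split into narrower groups that each carry their own polarity line.
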
Bus Encoding Extensions
• For sequential data (e.g., generated on address buses)
  » Gray code encoding (except for overhead)
  » T0 code by Benini
    – add address incrementer circuitry to receiver
    – add INC line to address bus
    – for consecutive addresses, just assert the INC line without sending the second address
    – reduces address bus transitions by 36% over binary
    – outperforms Gray code when probability of consecutive addresses is > 0.5

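The T0 idea can be modeled in a few lines. This is a sketch under stated assumptions: the address lines hold their previous value while INC is asserted, the stride is 1, and the bus starts at all zeros. The function names and the example address trace are hypothetical.

```python
def popcount(x):
    return bin(x).count("1")

def t0_encode(addresses, stride=1):
    """Return (bus_value, inc) pairs; the bus is frozen while INC is asserted."""
    encoded, prev_addr, bus = [], None, 0
    for addr in addresses:
        if prev_addr is not None and addr == prev_addr + stride:
            encoded.append((bus, 1))       # consecutive: only INC is driven
        else:
            bus = addr
            encoded.append((bus, 0))       # jump: send the full address
        prev_addr = addr
    return encoded

def t0_decode(encoded, stride=1):
    """Receiver side: increment locally whenever INC is asserted."""
    decoded, prev_addr = [], None
    for bus, inc in encoded:
        addr = prev_addr + stride if inc else bus
        decoded.append(addr)
        prev_addr = addr
    return decoded

def transitions(encoded):
    """Toggles on address plus INC lines, starting from all zeros."""
    total, prev_bus, prev_inc = 0, 0, 0
    for bus, inc in encoded:
        total += popcount(prev_bus ^ bus) + (prev_inc != inc)
        prev_bus, prev_inc = bus, inc
    return total

addresses = [0x100, 0x101, 0x102, 0x103, 0x200, 0x201]
encoded = t0_encode(addresses)
binary = [(a, 0) for a in addresses]       # plain binary bus for comparison
print(transitions(binary), transitions(encoded))
```
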
Low Swing Buses
• Minimize voltage swing ($V^2$) using differential signaling
  » bus contains multiple bits -> relatively low overhead
  » all signals on the bus operate in sync -> creative circuit techniques for differential circuits
• Two basic approaches
  » Additional reference voltage lines
    – driver circuit responsible for generating Vref
    – sense amp (SA) bus receiver circuit required
  » Charge recycling

Additional Reference Lines
• Introduce an additional reference voltage line between the sender and receiver
[Figure: driver circuit (send data) and receiver circuit (received data) linked by the bus line Vbus (load Cbus) and the reference line Vref. Waveforms compare logic 0 and logic 1 levels on a conventional bus with a low swing bus, where ∆V ≅ 0.1Vdd between Vbus and Vref.]

Bus Driver Circuit
[Figure: driver circuit with source data input, bus node Vbus (Cbus), reference node Vref (Cref), and capacitor Cn]
$C_{ref} \gg C_n, C_{bus}$
$C_n V_{ref} = C_{bus}(V_{dd} - V_{ref})$
$V_{ref} = \frac{C_{bus}}{C_n + C_{bus}} V_{dd}$

Power Efficiency
• Depends on the extent of voltage swing reduction (depends on required noise immunity and sensitivity of sensing circuit)
  » 0.1Vdd reduced swing -> 99% savings
• Also must consider
  » additional power of driver and receiver circuits
  » additional timing delays of circuits (but reduced swing improves signal switching time)
  » reduced swing -> smaller transistors at driver -> reduced short circuit currents

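The 99% figure corresponds to power scaling with the square of the swing. The sketch below makes that assumption explicit and contrasts it with the case where charge is still drawn from the full Vdd supply, in which case power scales only linearly with the swing. Both function names are illustrative, and driver and receiver overheads are ignored.

```python
def relative_power_quadratic(swing_fraction):
    """Relative power when the bus is charged from the reduced level: P ~ dV^2."""
    return swing_fraction ** 2

def relative_power_linear(swing_fraction):
    """Relative power when charge still comes from full Vdd: P ~ Vdd * dV."""
    return swing_fraction

savings_quadratic = 1.0 - relative_power_quadratic(0.1)
savings_linear = 1.0 - relative_power_linear(0.1)
print(f"quadratic: {savings_quadratic:.0%}  linear: {savings_linear:.0%}")
# prints "quadratic: 99%  linear: 90%"
```
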
Limitations
• Susceptible to noise and cross-talk
• Producing large on-chip capacitance Cref difficult
• Sensing circuit difficult to design for very low operating voltages
• Ratio of Cbus to Cn may be difficult to control (sensitive to process variations)
• Driver circuit inherently dynamic so cannot stay dormant for long periods (what if data signal contains long series of identical values?)
• Takes time for Vref to recover if bus deactivated

Charge Recycling Bus
• High order bit discharges to lower bit recycling charge (need 2 wires per bit)
[Figure: three stacked differential line pairs (CD0+/CD0-, CD1+/CD1-, CD2+/CD2-) carrying D0 = 0, D1 = 1, D2 = 0 between Vdd and 0, with intermediate levels at 2/3 Vdd and 1/3 Vdd. Timing: precharge (S1 closed), then bus valid (S2 closed).]

Power Efficiency
• Depends on the number of bits stacked
• For n bits, voltage swing of each line is $\Delta V = V_{dd}/(2n)$
• So power dissipation of recycling bus is $P_{crb} = 2n\,C\,(V_{dd}/(2n))^2\,(2f) = P_{conv}/n^2$
• However, due to precharge don't gain from data correlation, so efficiency reduced to $P_{crb} = 2P_{conv}/n^2$

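The step from 2n C (Vdd/(2n))^2 (2f) to Pconv/n^2 can be checked numerically. The sketch below takes Pconv = n C Vdd^2 f as the conventional reference, which is the bus power relation used earlier in this deck. The function names and example values are assumptions for illustration.

```python
def p_conventional(n_bits, c_line, vdd, f):
    """Conventional full-swing bus: P = n * C * Vdd^2 * f."""
    return n_bits * c_line * vdd ** 2 * f

def p_recycling(n_bits, c_line, vdd, f):
    """Charge recycling bus: 2n wires, swing Vdd/(2n), switching at 2f."""
    delta_v = vdd / (2 * n_bits)
    return 2 * n_bits * c_line * delta_v ** 2 * (2 * f)

for n_bits in (2, 4, 8):
    conv = p_conventional(n_bits, 1e-12, 2.0, 50e6)
    crb = p_recycling(n_bits, 1e-12, 2.0, 50e6)
    print(f"n={n_bits}: Pcrb/Pconv = {crb / conv:.4f} (1/n^2 = {1 / n_bits ** 2:.4f})")
```
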
Limitations
• Larger values of n improve power efficiency but decrease noise immunity
• Must maintain all line capacitances at an equal value (may limit scheme to on-chip buses -> have to be careful in layout to balance capacitances)
• Requires precharge phase -> reduces data transfer rate

Comparisons
Scheme  E (pJ)  D (ns)  E*D    Swing (V)  Robustness         Complexity
CMOS    11.6    2.1     24.5   2.0        Excellent          Least
CISB    3.5     4.4     15.4   0.25       Small margin       Extra timings, sense amps
CRB     3.1     3.5     10.9   0.25       Not reliable       Extra timings, dual rail
LCR     1.78    2.43    4.32   0.6        Good noise margin  Extra timings, 2 ref voltages

Vdd = 2V, CL(bus) = 1pF, 0.6µ
From Zhang, 1998

Charge Recovery Bus
• Recover charge from falling bit lines to precharge rising bit lines
[Figure: bit lines CD0 to CD3 running between transmit control and receive control, with a short control that connects the lines together]

Energy Savings
• The amount of energy savings depends on the number of lines shorted, the control circuitry, and the data length and pattern
• For a single transfer charge recovery
  $E = R\,C\,V_{dd}\,\Delta V$
  where R is the number of rising bit lines and ∆V is the voltage change after charge transfer. With F falling bit lines, the shorted lines settle at $V_{dd}\,F/(R+F)$, so
  $E = R\,C\,V_{dd}\,(V_{dd} - V_{dd}\,\frac{F}{R+F}) = C\,V_{dd}^2\,\frac{R^2}{R+F}$

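The single-transfer energy E = C Vdd^2 R^2/(R+F) can be compared against a conventional bus, which draws C Vdd^2 for every rising line. The sketch below assumes ideal charge sharing with equal line capacitances and ignores control circuitry. The function names and the example values (1 pF, 2.5 V) are illustrative.

```python
def recovery_energy(rising, falling, c_line, vdd):
    """Supply energy with charge recovery: E = C * Vdd^2 * R^2 / (R + F)."""
    if rising == 0:
        return 0.0
    shared_v = vdd * falling / (rising + falling)   # level after shorting
    return rising * c_line * vdd * (vdd - shared_v)

def conventional_energy(rising, c_line, vdd):
    """Supply energy without recovery: every rising line draws C * Vdd^2."""
    return rising * c_line * vdd ** 2

for rising, falling in [(2, 2), (3, 1), (1, 3)]:
    e_rec = recovery_energy(rising, falling, 1e-12, 2.5)
    e_conv = conventional_energy(rising, 1e-12, 2.5)
    print(f"R={rising} F={falling}: savings = {1 - e_rec / e_conv:.0%}")
```

Under these assumptions the fractional savings reduce to F/(R+F), so a transfer saves most when many lines fall and few rise.
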
Reported Savings
• For random data, 32-bit bus
  » single transfer energy savings of 47%
  » maximum optimal energy savings of 72%
[Figure: average energy savings versus width of databus]
From Khoo, 1995

Single Transfer Charge Recovery Bus
[Figure: bit lines CD0 to CD3 between transmit control and receive control; each line has an L block and a ≠? comparison]
A bit line participates in charge sharing if its data bit is different from the last data bit transmitted

Data Patterns Affect Savings
Trace A: 0001->1110->0001
Trace B: 0011->1100->0011

              Trace A                 Trace B
Step #        A3    A2    A1    A0    B3    B2    B1    B0
S1 (charge)   0.0   0.0   0.0   2.5   0.0   0.0   2.5   2.5
S2 (short)    0.6   0.6   0.6   0.6   1.2   1.2   1.2   1.2
S3 (charge)   2.5   2.5   2.5   0.0   2.5   2.5   0.0   0.0
S4 (short)    1.9   1.9   1.9   1.9   1.2   1.2   1.2   1.2
S5 (charge)   0.0   0.0   0.0   2.5   0.0   0.0   2.5   2.5

[Table: Trace A and Trace B compared at Step 3, Step 5, and Total; values missing in source]

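The two traces can be replayed with an idealized model to see why the data pattern matters. The sketch assumes equal line capacitances, a 2.5 V supply, that every changing line takes part in the short, and the single-transfer relation E = C Vdd^2 R^2/(R+F) from the Energy Savings slide. Energies are reported in units of C Vdd^2, and the function name is hypothetical.

```python
VDD = 2.5  # assumed supply voltage, matching the full-rail values in the traces

def replay(trace, vdd=VDD):
    """For each transition return (short_voltage, energy in units of C*Vdd^2)."""
    results = []
    for old, new in zip(trace, trace[1:]):
        rising = sum(o == "0" and n == "1" for o, n in zip(old, new))
        falling = sum(o == "1" and n == "0" for o, n in zip(old, new))
        changing = rising + falling
        v_short = vdd * falling / changing if changing else 0.0
        energy = rising ** 2 / changing if changing else 0.0
        results.append((v_short, energy))
    return results

trace_a = replay(["0001", "1110", "0001"])
trace_b = replay(["0011", "1100", "0011"])
total_a = sum(e for _, e in trace_a)
total_b = sum(e for _, e in trace_b)
print("Trace A:", trace_a, "total", total_a)
print("Trace B:", trace_b, "total", total_b)
```

Both traces would cost 4 C Vdd^2 on a conventional bus (four rising transitions each), but in this model the balanced Trace B recovers more charge than the lopsided Trace A.
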
Impacts of Signal Encoding on Charge Recovery
[Figure: bar chart of relative energy consumption (0 to 1) for regular and charge recovery buses under 2's complement (2s'c), Gray, SM, and bus-invert (Businv) encodings]
Average of 15 Mediabench benchmarks
From Bishop, 1999

Bus Multiplexing
• Share long data buses with time multiplexing (S1 uses even cycles, S2 odd)
[Figure: two dedicated buses (S1 -> D1, S2 -> D2) versus one shared bus carrying both S1 -> D1 and S2 -> D2]
• But what if data samples are correlated (e.g., sign bits)?

Correlated Data Streams
[Figure: bit switching probabilities (0 to 1) versus bit position (14 at the MSB down to 0 at the LSB) for muxed and dedicated buses]

Disadvantage of Bus Multiplexing
• If data bus is shared, advantages of data correlation are lost (bus carries samples from two uncorrelated data streams)
• Bus sharing should not be used for positively correlated data streams
• Bus sharing may prove advantageous in a negatively correlated data stream (where successive samples switch sign bits, giving more random switching)

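The loss of correlation on a shared bus can be seen with two slowly varying streams of opposite sign. This is a toy sketch: the 16-bit two's complement width and the sample values are assumptions chosen so that the effect is easy to see, and the function names are illustrative.

```python
def popcount(x):
    return bin(x).count("1")

def bus_toggles(samples, n_bits=16):
    """Total bit toggles when samples are sent in order as two's complement."""
    mask = (1 << n_bits) - 1
    words = [s & mask for s in samples]
    return sum(popcount(a ^ b) for a, b in zip(words, words[1:]))

def interleave(s1, s2):
    """Time multiplexing: S1 on even cycles, S2 on odd cycles."""
    return [x for pair in zip(s1, s2) for x in pair]

s1 = [3, 4, 5, 6]          # slowly varying, positive (sign bits stay 0)
s2 = [-3, -4, -5, -6]      # slowly varying, negative (sign bits stay 1)

dedicated = bus_toggles(s1) + bus_toggles(s2)
muxed = bus_toggles(interleave(s1, s2))
print(dedicated, muxed)
```

On dedicated buses the upper bits never move, while on the muxed bus they flip on nearly every cycle.
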
Segmented Buses
• Partition the bus into several segments to reduce the capacitance per segment
[Figure: bus split into segments joined by TIE blocks under TIE control, with drivers Wout, Xout, Yout, Zout and receivers Ain, Bin, Cin, Din]
• Try to group often communicating circuits on the same segment

TIE Design
• To connect the segments
[Figure: alternative tie circuits controlled by Tie and tie signals]
• Delay/power models for t-gate solution show a 60%-70% reduction in power and a 10%-30% improvement in bus delay

Code Compression
• Assuming only a subset of instructions is used, replace them with a shorter encoding to reduce memory bandwidth
[Figure: the core sends addresses to the instruction memory, which returns logN-bit compressed instructions; the instruction decompression table (IDT) restores the original k-bit format]

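A table-based scheme of this kind can be sketched as follows: each distinct instruction word used by the program gets an index into an instruction decompression table (IDT), and memory stores only the indices. The function names and the example instruction words are hypothetical, and the index width is taken as the smallest whole number of bits that can address N table entries.

```python
def build_idt(program):
    """Return (table, codes): distinct words in first-use order, and indices."""
    table, index = [], {}
    for word in program:
        if word not in index:
            index[word] = len(table)
            table.append(word)
    codes = [index[word] for word in program]
    return table, codes

def code_width(table):
    """Bits needed to index the table (at least 1)."""
    return max(1, (len(table) - 1).bit_length())

def decompress(table, codes):
    """IDT lookup: restore the original instruction words."""
    return [table[c] for c in codes]

# Hypothetical 32-bit instruction words; only four distinct values occur.
program = [0xE3A00000, 0xE2800001, 0xE3500064, 0x1AFFFFFC,
           0xE2800001, 0xE3500064]
table, codes = build_idt(program)
print(f"{len(table)} distinct words -> {code_width(table)} bits per fetch instead of 32")
```
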
Instruction Loop Buffer
• Temporarily store decoded instructions from small loops in a buffer
[Figure: pipeline (Instruction Fetch, Decode, Execute, Memory, WriteBack) with PC, I$, MAR, MDR, and D$. A decoded instruction buffer (DIB) stores decoded instructions for a whole loop, so Ifetch and decode can be skipped.]
• Can achieve a 40% power savings in a DSP or RISC processor

Key References
Bajwa, Stage-skip pipeline, Proc. of ISLPED, pp. 353-358, 1996.
Bellaouar, An ultra-low power CMOS on-chip interconnect architecture, Proc. of SLPE, pp. 52-53, 1995.
Benini, Address bus encoding techniques for system-level power optimization, Proc. of DATE, pp. 861-866, 1998.
Bishop, Databus charge recovery: practical considerations, Proc. SLPED, 1999.
Chen, Segmented bus design for low power systems, IEEE Trans. on VLSI Systems, 7(1):25-29, Mar 1999.
Hiraki, Data dependent logic swing internal bus architecture for ultralow power LSI, IEEE Journal of SSC, 30(4):397-402, Apr 1995.
Khoo, Charge recovery on a databus, Proc. SLPED, pp. 185-189, Aug 1995.
Stan, Bus-invert coding for low power I/O, IEEE Trans. on VLSI Systems, 3(1):49-58, 1995.
Yamauchi, An asymptotically zero power charge recycling bus architecture, IEEE Journal of SSC, 30(4):423-431, Apr 1995.
Yoshida, An object code compression approach to embedded processors, Proc. of ISLPED, pp. 265-268, 1997.
Zhang, Low-swing interconnect interface circuits, Proc. SLPED, pp. 161-166, Aug 1998.