Power Reduction Techniques in the Processor Core

Low Power Design for SoCs
ASIC Tutorial
©M.J. Irwin, PSU, 1999
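Power Usage Stats
[Pie chart: power breakdown of a 1995 5V notebook PC. Components: Motherboard, Hard Disk, Floppy Disk, LCD/VGA, Power Supply. Slice values: 16%, 18%, 52%, 2%, 12% (pairing of values with components not shown)]
From Roy, 1997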
Processor Power Budgets
Clock
Datapath
Memory
I/O (pads)
Inner circle: low end embedded microprocessor
Next circle: high end CPU with on-chip cache
Next circle: MPEG2 decoder ASIC
Outer circle: ATM switch ASIC
Basic Principles of Low Power Design
$P = C_L V_{dd}^2 f + \frac{t_r + t_f}{2} V_{dd} I_{peak} f + V_{dd} I_{leakage}$
• Reduce switching (supply) voltage
  » quadratic effect -> dramatic savings
  » negative effect on performance
• Reduce capacitance
• Reduce switching frequency
• Reduce glitching
• Reduce leakage and static currents
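As a rough illustration of the power equation, the sketch below evaluates its three terms (switching, short-circuit, leakage) and the quadratic effect of supply scaling on the switching term. The function names and the operating point are hypothetical, chosen only for illustration. In a real circuit, lowering Vdd also changes Ipeak, the rise/fall times, and the achievable frequency, which this sketch ignores.

```python
def total_power(c_load, vdd, freq, t_rise, t_fall, i_peak, i_leak):
    """Return (switching, short_circuit, leakage) power in watts.

    Mirrors P = CL*Vdd^2*f + (tr + tf)/2 * Vdd*Ipeak*f + Vdd*Ileakage.
    """
    switching = c_load * vdd ** 2 * freq
    short_circuit = (t_rise + t_fall) / 2 * vdd * i_peak * freq
    leakage = vdd * i_leak
    return switching, short_circuit, leakage


def switching_ratio(v_new, v_old):
    """Relative switching power after scaling Vdd, all else held equal."""
    return (v_new / v_old) ** 2


if __name__ == "__main__":
    # Hypothetical operating point (illustrative values, not from the slides).
    terms = total_power(c_load=1e-9, vdd=2.5, freq=100e6,
                        t_rise=0.2e-9, t_fall=0.2e-9,
                        i_peak=1e-3, i_leak=1e-6)
    print("switching, short-circuit, leakage (W):", terms)
    print("switching power at 1.5V relative to 2.5V:", switching_ratio(1.5, 2.5))
```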
Design Levels
Abstraction levels: Algorithm, Software/system, Architecture, Functional unit, Gate, Circuit
Abstraction Level   Power Savings   Analysis Resources   Analysis Accuracy
Highest             Most            Least                Worst
Lowest              Least           Most                 Best
Circuit and Logic Gate Techniques
Transistor Sizing for Dynamic Power Reduction
• Use the smallest transistors that satisfy the delay constraints
  » slack time - difference between required time and arrival time of a signal at a gate output
    – Positive slack - size down
    – Negative slack - size up
• Make gates that toggle more frequently smaller
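The slack rule can be sketched in a few lines. The gate names and timing values below are hypothetical, and the function only classifies gates. A real sizing flow would iterate, because resizing a gate changes the arrival times downstream.

```python
def sizing_hints(gates):
    """Map gate name -> (slack, hint), where slack = required - arrival.

    gates: dict of name -> (required_time, arrival_time) at the gate output.
    """
    hints = {}
    for name, (required, arrival) in gates.items():
        slack = required - arrival
        if slack > 0:
            hint = "size down"   # timing margin can be traded for lower capacitance
        elif slack < 0:
            hint = "size up"     # gate is too slow for its delay constraint
        else:
            hint = "keep"
        hints[name] = (slack, hint)
    return hints


if __name__ == "__main__":
    # Hypothetical gates with (required, arrival) times in arbitrary units.
    example = {"g1": (10, 7), "g2": (5, 6), "g3": (4, 4)}
    for gate, (slack, hint) in sizing_hints(example).items():
        print(gate, slack, hint)
```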
Equivalent Pin Ordering
• Logically equivalent pins may not have identical delay/power characteristics
• To conserve power (and improve speed), connect inputs so that the most active input is nearest the output
• Need to know signal stats
[Figure: gate schematic with labels A, B, Ci, Out, and Cout]
Gate Restructuring
• Logically equivalent gates may not have identical power/delay characteristics
Network Restructuring
• Logically equivalent gate networks may not have identical power/delay characteristics
[Figure: alternative gate networks for F = ABCD. Technology mapping: delay, area, power]
Dual Supply Voltages
• Use two Vdd’s (e.g., 2.5V and 1.5V)
  » use the higher supply for gates on the critical path
  » use the lower supply for gates off the critical path
• Reduces power without a performance loss
• Cons
  » slight area penalty
  » increased design time
  » need level converters to interconnect gates on different supplies (to avoid static currents)
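A back-of-the-envelope sketch of the dual-supply saving follows, assuming dynamic power scales with Vdd squared and that a known share of the activity-weighted switched capacitance sits off the critical path. The function name and the 50% share are hypothetical, and the overhead of level converters is not modeled.

```python
def dual_vdd_power_ratio(v_high, v_low, low_fraction):
    """Dynamic power with two supplies, relative to everything at v_high.

    low_fraction: share (0..1) of activity-weighted switched capacitance
    moved to the lower supply. Level-converter overhead is ignored.
    """
    if not 0.0 <= low_fraction <= 1.0:
        raise ValueError("low_fraction must be between 0 and 1")
    return (1.0 - low_fraction) + low_fraction * (v_low / v_high) ** 2


if __name__ == "__main__":
    # Supplies from the slide (2.5V and 1.5V); the 50% share is an assumption.
    print(dual_vdd_power_ratio(2.5, 1.5, 0.5))
```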
Functional Unit Techniques
Latches and Flipflops
• Consume a lot of power because they are clocked every cycle
  » Clock energy (Ec)
    – energy dissipated when the ff is clocked with stable data
  » Data energy (Ed)
    – energy dissipated when the ff is clocked and the data has changed so that the ff changes state
  » Typically the data rate (fd) is much lower than the clock rate (fc)
• Also impacts clock power since a large portion of clock power is used to drive the sequential elements
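One way to combine the two energies into an average power figure is sketched below. It assumes that every clock cycle costs either Ec (data stable) or Ed (data changed), so P = (fc - fd)·Ec + fd·Ed. That combination is an assumption made for illustration, not a formula stated on the slide, and the numbers in the example are hypothetical.

```python
def ff_power(f_clk, f_data, e_clk, e_data):
    """Average flipflop power under a simple two-energy model.

    f_clk:  clock rate (cycles per second)
    f_data: rate at which the stored value changes (<= f_clk)
    e_clk:  energy per clocked cycle with stable data (Ec)
    e_data: energy per clocked cycle in which the state changes (Ed)
    """
    if f_data > f_clk:
        raise ValueError("data rate cannot exceed clock rate in this model")
    return (f_clk - f_data) * e_clk + f_data * e_data


if __name__ == "__main__":
    # Hypothetical numbers: with fd much lower than fc, clock energy dominates.
    print(ff_power(f_clk=100, f_data=10, e_clk=1, e_data=3))
```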
Power Consumption in Latches
[Figure: latch schematic (D, Q, CLK, CLKB) and plot of % power, split into data and clock components, versus latch data AF from 0 to 0.5]
From Tiwari, 1998
Some Typical CMOS FFs
[Figure: schematics of four flipflops, each with D, CLK, and Q: Static TG FF, Dynamic C2MOS FF, Dyn Precharged TSPC FF, Dyn Non-Precharged TSPC FF]
Relative Power Consumption
[Figure: FF Power Comparison. Power (0 to 30) versus latch data AF (0.05 to 0.45) for TGFF, GFF, C2MOS, PTSPC, NPTSPC, and RSLATCH]
From Svenson, 1996
Some Low Power FFs
[Figure: schematics of the Power PC 603 FF and the StrongArm SA110 FF, with D, Q, CLK, and CLKB]
PDP of Some Low Power FFs
[Figure: bar chart of PDPtot (fJ), 0 to 80, with high, low, and average values for K6 ETL, SA110 FF, mC2MOS, PowerPC, SDFF, and HLFF]
From Stojanovic, 1998
Self-Gating FF
• When ff input is equal to its output, suppress internal clocking to conserve power
  » gating function is derived within the FF
• Strict rules on when D can change wrt CLK
[Figure: self-gating FF schematic with D, Q, CLK, and internal clock Φ]
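A purely behavioral sketch of the self-gating idea follows: the internal clock fires only in cycles where D differs from Q. The function and the data stream are hypothetical. The model counts internal clock events only and ignores the power of the comparison logic and the timing rules on D.

```python
def simulate_ff(data_stream, self_gated, q0=0):
    """Return (final_q, internal_clock_events) for one D value per cycle."""
    q = q0
    events = 0
    for d in data_stream:
        if self_gated and d == q:
            continue      # input equals output: internal clocking suppressed
        events += 1       # internal clock fires and the FF loads D
        q = d
    return q, events


if __name__ == "__main__":
    stream = [0, 0, 1, 1, 1, 0]   # hypothetical data, low switching activity
    print("regular FF:   ", simulate_ff(stream, self_gated=False))
    print("self-gated FF:", simulate_ff(stream, self_gated=True))
```

Both variants end in the same state. Only the number of internal clock events differs, and the gap shrinks as the data switching rate rises.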
Power of Self-Gated FF
[Figure: power dissipation versus data switching rate fd/fc (0 to 2) for the self-gated FF (SG FF) and a regular FF (Reg FF)]
From Reyes, 1996
Double Edge Triggered FF
Loads data at both rising and falling clock edges
[Figure: double edge triggered FF schematic with D, Q, CLK, and CLKB]
DETFF Pros and Cons
• Advantages
  » Clock frequency can be halved to achieve the same computational throughput: Pd = 0.84Ps
  » Also get a 2X power savings in the clock network
• Disadvantages
  » About 15% larger in transistor count
  » Lower maximum operating frequency
  » Strict requirements on clock skew
  » Requires a strict 50% duty cycle
  » Larger clock load
Arithmetic Components
• Many techniques for lowering power consumption of arithmetic components
  » adders, ALUs
  » barrel shifters, multipliers, MACs
• PDP of different architectures
• Delay balancing to reduce glitching
• Precomputation
• Common case computation
PDP of Different Adders
[Figure: PDP (0 to 100) at 8, 16, 32, 48, and 64 bits for RCA, MCCA, CSkA, VSkA, CSlA, CLA, BKA, and ELMA adders]
From Nagendra, 1996
Array Multiplier
[Figure: 4x4 array multiplier with inputs A0-A3 and B0-B3, cells M00-M33, and outputs Y0-Y7]
Longest delay path: 2i+j+1
Multiplier Cell Structure
[Figure: multiplier cell built around a full adder, with inputs Ai, Bj, sum input, and carry in, and outputs carry out and sum output]
Add delay elements to minimize glitching
Precomputation Logic
[Figure: precomputed inputs are held in register R1 and gated inputs in register R2; both feed the combinational logic f(X) that drives the outputs; precomputation logic g(X) drives the load disable of R2]
• Identify logical conditions at inputs that are invariant to the output
  » since those inputs don’t affect output, disable input transitions
  » trade area for power
Binary Comparator Example
[Figure: n-bit binary value comparator computing A>B. The most significant bits An and Bn are held in R1; the remaining bits An-1, Bn-1, ..., A1, B1 are held in R2, whose load disable is derived from comparing An and Bn (An = Bn)]
Can achieve up to 75% power reduction with 3% area overhead and 1 to 5 additional gate delays in worst case path
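The comparator example can be checked with a small behavioral sketch. When the most significant bits differ they decide A>B on their own, so the lower bits need not be loaded. The function names are hypothetical, and the enumeration assumes uniformly distributed inputs, which real input statistics may not match. The sketch counts how often the lower-bit register load is disabled; it does not estimate power.

```python
from itertools import product


def compare_with_precomputation(a, b, n):
    """Return (a > b, lower_bits_loaded) for n-bit unsigned a and b."""
    msb_a = (a >> (n - 1)) & 1
    msb_b = (b >> (n - 1)) & 1
    if msb_a != msb_b:
        # MSBs alone decide the result: the load of the lower bits is disabled.
        return msb_a > msb_b, False
    mask = (1 << (n - 1)) - 1
    return (a & mask) > (b & mask), True


def disabled_fraction(n):
    """Fraction of all n-bit input pairs for which the lower bits are not loaded."""
    total = 0
    disabled = 0
    for a, b in product(range(1 << n), repeat=2):
        result, loaded = compare_with_precomputation(a, b, n)
        assert result == (a > b)   # precomputation must not change the function
        total += 1
        if not loaded:
            disabled += 1
    return disabled / total


if __name__ == "__main__":
    print(disabled_fraction(4))
```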
Design Issues in Precomputation
• Design steps
  1. Select precomputation architecture
  2. Determine the precomputed and gated inputs (R1 should be much smaller than R2)
  3. Find (good implementation for) g(X)
  4. Evaluate potential power savings based on input statistics (if savings not sufficient go to step 2 or 3 and try again)
• Also works for multiple output functions where g(X) is the product of gj(X) over all j
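Common Case Computation
[Figure: the inputs feed the original circuit, the CC detection circuit, and the CC execution circuit; a CCC controller receives the "common case detected" and "common case completed" signals and drives sleep1, sleep2, and sleep3; the circuits produce the outputs]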
Activity of CCC Circuit Over Time
[Figure: activity of the original circuit, CC detection circuit, and CC execution circuit over time, with time points tp, tc, and te]
• Several (possibly conflicting) factors involved in choosing the CC circuit leading to maximal energy and/or time savings
• Dependent on input data statistics
CCC Performance
Circuit    Area (% Increase)   Cycles (% Decrease)   Power (mW) (% Decrease)
GCD        29.0                76.6                  59.8
Poly       14.5                17.9                  58.2
Test1      21.9                42.5                  48.6
Linegen    23.5                43.3                  39.7
Graphics   29.7                27.4                  12.4
From Lakshminarayana, 1999
Control Unit Design
[Figure: control unit with inputs and outputs connected to combinational logic, with state FFs in the feedback path; example state graph with states 00, 01, and 11 and edges labeled 0/0, 0/0, 0,1/1, 1/X, and 1/X]
State Encoding
• One of the most important factors determining area, speed, and power of resulting control logic
• n! different possible encodings (n states)
Power State Encoding Heuristic
• Area driven -> try to reduce the distance in Boolean n-space between related states
• Power driven -> try to minimize number of bit transitions in the state register
  » fewer transitions in state register
  » fewer transitions propagated to combinational logic
[Figure: state graph with states 00, 01, and 11; each edge is labeled with the probability that the transition will occur (0.1, 0.3, 0.1, 0.4, 0.1; sum of all edges equals unity)]
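The power-driven heuristic can be illustrated by computing the expected number of state-register bit transitions, taken here as the sum over edges of the edge probability times the Hamming distance between the two state codes, and searching all encodings by brute force. The three-state graph and its edge placement below are hypothetical (only the probability values echo the figure), and exhaustive search is practical only for very small machines.

```python
from itertools import permutations


def expected_transitions(encoding, edge_probs):
    """Sum over edges of p(s -> t) * HammingDistance(code_s, code_t)."""
    return sum(p * bin(encoding[s] ^ encoding[t]).count("1")
               for (s, t), p in edge_probs.items())


def best_encoding(states, edge_probs, n_bits):
    """Exhaustively search code assignments; return (cost, encoding)."""
    best = None
    for codes in permutations(range(1 << n_bits), len(states)):
        enc = dict(zip(states, codes))
        cost = expected_transitions(enc, edge_probs)
        if best is None or cost < best[0]:
            best = (cost, enc)
    return best


if __name__ == "__main__":
    # Hypothetical state graph; edge probabilities sum to 1.
    edges = {("A", "B"): 0.4, ("B", "A"): 0.3,
             ("A", "C"): 0.1, ("C", "A"): 0.1, ("B", "C"): 0.1}
    cost, enc = best_encoding(["A", "B", "C"], edges, n_bits=2)
    print(cost, {s: format(c, "02b") for s, c in enc.items()})
```

The search places the two states joined by the most probable edges one bit apart. A low transition count in the state register still says nothing about the switching in the combinational logic that the encoding implies.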
Caveat
• Lowest E[M] may not be lowest in power -> it could require more gates and/or signal transitions in the combinational logic
• Experiments show that the area and power dissipation of a state machine are correlated when the state encoding is varied
State Encoding Effects
[Figure: plot of power (500 to 750) versus area (3300 to 4100) for different state encodings]
From Yeap, 1997
Practical Considerations
• Balance area-power by forced encoding of only a subset of states that span the high probability edges
  » leave assignment of remaining states to the logic synthesis system for area optimization
  » fortunately, in practice, most state machines have this characteristic
• Unlike area encoding, power encoding requires knowledge of probabilities of state transitions and input signals
Architecture Techniques
Glitch Reduction by Pipelining
• Glitches are dependent on the logic depth of the circuit
• Nodes logically deeper are more prone to glitching
  » arrival times of the gate inputs are more spread due to delay imbalances
  » usually affected by more PI switching
• Reduce depth by adding pipeline registers
Typical RISC Datapath
• Five stage pipeline (originally for performance, but also helps with power)
[Figure: pipeline stages Fetch, Decode, Execute, Memory, WriteBack, with PC, I$, Instruction, MAR, MDR, and D$]
Pipelined Multiplier
[Figure: 4x4 array multiplier with CLK-driven pipeline registers; inputs A0-A3 and B0-B3, cells M00-M33, outputs Y0-Y7]
Signal Gating
• Mask unwanted switching activity from propagating
[Figure: source signal, gating element (latch/FF), gated signal; control signal to suppress source signal]
• Generation of control signals requires additional logic circuitry (more power)
Signal Gating, cont’d
• Signal gating saves power if the relative enable/disable frequency of control signal is much lower than the frequency of source signal (so many signal activities blocked)
• Savings even greater if a group of source signals can share a control signal
• Good candidates - clock signals, address or data buses, signals with high frequency or high glitching
Guarded Evaluation
• Reduce switching activity by adding latches at the inputs if outputs are not used
[Figure: multiplier with inputs A, B, C and a condition signal, shown without and with a latch at the multiplier inputs]
• Latch preserves previous value of inputs to suppress activity
  – could also use AND gates to mask one or both inputs to zero -> forced zero (good if zero-out condition changes infrequently compared to data rate)
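The effect of the guarding latch can be sketched by counting bit toggles seen at a functional unit's input. With the latch in place, values arriving while the result is unused never reach the unit. The function name and the sample operand stream are hypothetical, and the count covers input toggles only, not the power of the latch or of the guard logic.

```python
def count_input_toggles(samples, used, guarded):
    """Count bit toggles at the functional-unit input over a run of cycles.

    samples: operand value presented in each cycle (ints)
    used:    per-cycle flags, True when the unit's output is actually needed
    guarded: if True, a latch holds the previous value whenever used is False
    """
    seen = samples[0]          # value currently visible to the functional unit
    toggles = 0
    for value, needed in zip(samples[1:], used[1:]):
        if guarded and not needed:
            continue           # latch keeps the old value: no input activity
        toggles += bin(seen ^ value).count("1")
        seen = value
    return toggles


if __name__ == "__main__":
    # Hypothetical 4-bit operand stream; the result is needed in the first and last cycle.
    data = [0b0000, 0b1111, 0b0000, 0b1111, 0b0101]
    needed = [True, False, False, False, True]
    print("unguarded:", count_input_toggles(data, needed, guarded=False))
    print("guarded:  ", count_input_toggles(data, needed, guarded=True))
```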
Sleep Modes
• Software power control - power management
  » DOZE - most functional units stopped except on-chip cache memory (cache coherency)
  » NAP - cache also turned off, time out or external interrupt to resume
  » SLEEP - clock off, external interrupt to resume
• Deeper sleep mode saves more power
• Deeper sleep mode requires more latency to resume
PowerPC Sleep Modes
Mode                  66MHz    80MHz
No power mgmt         2.18W    2.54W
Dynamic power mgmt    1.89W    2.20W
DOZE                  307mW    366mW
NAP                   113mW    135mW
SLEEP                 89mW     105mW
SLEEP without PLL     18mW     19mW
SLEEP without clock   2mW      2mW
10 cycles to wake up from SLEEP
100us to wake up from SLEEP+
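To see what the sleep modes mean for average power, the sketch below weights the 66MHz figures from the table by the share of time spent in each mode. The dictionary keys, the function name, and the 25%/75% split are hypothetical, and the energy and latency cost of entering and leaving a mode is not modeled.

```python
# 66MHz column of the table, in milliwatts.
POWER_66MHZ_MW = {
    "no_mgmt": 2180,
    "dynamic": 1890,
    "doze": 307,
    "nap": 113,
    "sleep": 89,
    "sleep_no_pll": 18,
    "sleep_no_clock": 2,
}


def average_power_mw(time_share, table=POWER_66MHZ_MW):
    """Time-weighted average power; time_share maps mode -> fraction of time."""
    if abs(sum(time_share.values()) - 1.0) > 1e-9:
        raise ValueError("time shares must sum to 1")
    return sum(table[mode] * share for mode, share in time_share.items())


if __name__ == "__main__":
    # Hypothetical workload: active a quarter of the time, napping otherwise.
    print(average_power_mw({"dynamic": 0.25, "nap": 0.75}), "mW")
```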
Keeper Circuits
• A floating node (not driven by any gates) can suffer charge decay resulting in short-circuit currents
• Keeper circuits can
  » slightly increase power dissipation
  » slightly increase delay
• Essential in circuits with sleep modes
[Figure: powered down block, weak keeper, power down control]
A Low Power Processor Core Example
M•CORE Architecture
[Figure: block diagram with GP reg file (32bitx16), Alt reg file (32bitx16), Control reg file (32bitx13), X port, Y port, Immed, Scale, Sign ext, Barrel shift, FF1, ALU, priority encode, 0 detect, PC increment, Branch adder, Instr pipeline, Instr decoder, Address bus, Data bus, Writeback bus, and H/W acc bus]
M•CORE Power Distribution
[Figure: pie charts of power distribution. Categories: Datapath, Clock, Control; Reg File, Addr/Data Bus, Inst Reg, Barrel Shifter, X MUX, Y MUX, Addr Gen, Other. Slice values: 28%, 36%, 9%, 5%, 6%, 42%, 7%, 36%, 8%, 9%, 14% (pairing of values with categories not shown)]
Key References
Alidina, Precomputation-based sequential logic optimization for low power, IEEE Trans. on VLSI Systems, 2(4):426-436, 1994.
Hossain, Low power design using double edge triggered flipflop, IEEE Trans. on VLSI Systems, 2(2):261-265, 1994.
Lakshminarayana, et al., Common-Case Computation, Proc. of DAC, pp. 56-61, 1999.
Motorola, M•CORE Architecture microRISC Engine, MCORE 1/D, www.mot.com/SPS/MCORE/info_documentation.htm
Mutsunori, Low power design method using multiple supply voltages, Proc. of SLPED, pp. 36-41, 1997.
Rabaey, Digital Integrated Circuits, Prentice-Hall, 1996.
Reyes, Low Power FF Circuit and Method Thereof, Patent No 5,498,988, 1996.
Roy, Power analysis and design at the system level, Low Power Design in Deep Submicron Electronics, Nebel and Mermet, Ed., Kluwer, 1997.
Sakuta, Delay balanced multipliers for low power, Proc. of SLPE, pp. 36-37, 1995.
Scott, Designing the Low-Power M•CORE Architecture, Proc. Inter. Symp. Computer Architecture Power Driven Microarchitecture Workshop, June 1998.
Stojanovic, A unified approach in the analysis of latches and FFs for low power systems, Proc. of ISLPED, pp. 227-232, 1998.
Tiwari, Reducing power in high-performance microprocessors, Proc. of DAC, pp. 732-737, 1998.
Tiwari, Guarded evaluation, Proc. ISLPD, pp. 221-226, 1995.
Yeap, CPU controller optimization for HDL logic synthesis, Proc. of CICC, pp. 127-130, 1997.
Yeap, Practical Low Power Digital VLSI Design, KAP, 1998.