ELEC 5970-001/6970-001 (Fall 2005)
Special Topics in Electrical Engineering
Low-Power Design of Electronic Circuits
Power Aware Microprocessors
Vishwani D. Agrawal
James J. Danaher Professor
Department of Electrical and Computer Engineering
Auburn University
http://www.eng.auburn.edu/~vagrawal
vagrawal@eng.auburn.edu
11/15/05
ELEC 5970-001/6970-001 Lecture 19
SIA Roadmap for Processors (1999)
Year                     1999   2002   2005   2008   2011   2014
Feature size (nm)         180    130    100     70     50     35
Logic transistors/cm²    6.2M    18M    39M    84M   180M   390M
Clock (GHz)              1.25    2.1    3.5    6.0   10.0   16.9
Chip size (mm²)           340    430    520    620    750    900
Power supply (V)          1.8    1.5    1.2    0.9    0.6    0.5
High-perf. power (W)       90    130    160    170    175    183
Source: http://www.semichips.org
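As a rough illustration of what these roadmap numbers imply, the sketch below derives two quantities that are not in the table: average supply current (I = P/V) and average power density (power divided by chip area). Both are plain arithmetic on the table values, assuming all of the high-performance power is drawn at the listed supply voltage; the function names are illustrative.

```python
# Values copied from the SIA 1999 roadmap table.
roadmap = {
    # year: (power_W, supply_V, chip_area_mm2)
    1999: (90, 1.8, 340),
    2002: (130, 1.5, 430),
    2005: (160, 1.2, 520),
    2008: (170, 0.9, 620),
    2011: (175, 0.6, 750),
    2014: (183, 0.5, 900),
}


def supply_current_a(power_w: float, supply_v: float) -> float:
    """Average supply current, assuming all power is drawn at supply_v."""
    return power_w / supply_v


def power_density_w_per_cm2(power_w: float, area_mm2: float) -> float:
    """Average power density; 1 cm^2 = 100 mm^2."""
    return power_w / (area_mm2 / 100.0)


if __name__ == "__main__":
    for year, (p, v, a) in roadmap.items():
        print(f"{year}: {supply_current_a(p, v):6.1f} A, "
              f"{power_density_w_per_cm2(p, a):5.1f} W/cm^2")
```

Under this simple model the supply current grows from 50 A in 1999 to about 366 A in 2014, while the average power density stays roughly between 20 and 31 W/cm², so falling supply voltage, not rising power density, dominates the power-delivery trend.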
Power Reduction in Processors
• Just about everything is used.
• Hardware methods:
  • Voltage reduction for dynamic power
  • Dual-threshold devices for leakage reduction
  • Clock gating, frequency reduction
  • Sleep mode
• Architecture:
  • Instruction set
  • Hardware organization
• Software methods
SPEC CPU2000 Benchmarks
• Twelve integer (CINT2000) and 14 floating-point (CFP2000) programs.
• Each program's run time is normalized to its run time on a Sun Ultra 5_10 with a 300MHz processor to obtain a SPEC ratio.
• The CINT2000 and CFP2000 summary measurements are the geometric means of the SPEC ratios.
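A minimal sketch of how a SPEC ratio and a summary score could be computed, assuming the CPU2000 convention that the reference machine scores 100 (the `scale` parameter). The reference and measured run times in the example are hypothetical, not published SPEC data.

```python
import math


def spec_ratio(ref_time_s: float, run_time_s: float, scale: float = 100.0) -> float:
    """SPEC ratio of one program: scale * reference time / measured time.

    scale=100 assumes the CPU2000 convention that the reference
    machine (Sun Ultra 5_10, 300MHz) scores 100.
    """
    return scale * ref_time_s / run_time_s


def spec_summary(ratios: list[float]) -> float:
    """Geometric mean of per-program SPEC ratios, computed via logarithms."""
    if not ratios:
        raise ValueError("need at least one ratio")
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))


if __name__ == "__main__":
    # Hypothetical reference and measured run times, in seconds.
    ref_times = [1400.0, 1800.0]
    run_times = [100.0, 200.0]
    ratios = [spec_ratio(r, t) for r, t in zip(ref_times, run_times)]
    print(ratios)  # [1400.0, 900.0]
    print(spec_summary(ratios))
```

The geometric mean keeps a single program with an unusually large ratio from dominating the summary the way an arithmetic mean would.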
Reference CPU: Sun Ultra 5_10, 300MHz Processor
[Figure: per-program reference values (y-axis 0 to 3500) for the CINT2000 programs gzip, vpr, gcc, mcf, crafty, parser, eon, perlbmk, gap, vortex, bzip2, twolf and the CFP2000 programs wupwise, swim, mgrid, applu, mesa, galgel, art, equake, facerec, ammp, lucas, fma3d, sixtrack, apsi.]
CINT2000: 3.4GHz Pentium 4, HT Technology (D850MD Motherboard)
SPECint2000_base = 1341
SPECint2000 = 1389
[Figure: Base ratio and Opt. ratio for each CINT2000 program, y-axis 0 to 2500.]
Source: www.spec.org
Two Benchmark Results
• Baseline: a uniform configuration not optimized for any specific program (reported as SPECint2000_base and SPECfp2000_base):
  • The same compiler with the same settings and flags is used for all benchmarks.
  • Other restrictions apply.
• Peak: each run is optimized to obtain the peak performance for that benchmark program (reported as SPECint2000 and SPECfp2000).
CFP2000: 3.6GHz Pentium 4, HT Technology (D925XCV/AA-400 Motherboard)
SPECfp2000_base = 1627
SPECfp2000 = 1630
[Figure: Base ratio and Opt. ratio for each CFP2000 program, y-axis 0 to 3000.]
Source: www.spec.org
CINT2000: 1.7GHz Pentium 4 (D850MD Motherboard)
SPECint2000_base = 579
SPECint2000 = 588
[Figure: Base ratio and Opt. ratio for each CINT2000 program, y-axis 0 to 1000.]
Source: www.spec.org
CFP2000: 1.7GHz Pentium 4 (D850MD Motherboard)
SPECfp2000_base = 648
SPECfp2000 = 659
[Figure: Base ratio and Opt. ratio for each CFP2000 program, y-axis 0 to 1400.]
Source: www.spec.org
Energy SPEC Benchmarks
• Energy efficiency mode: besides execution time, the energy efficiency of SPEC benchmark programs is also measured. The energy efficiency of a benchmark program is given by:

$$\text{Energy efficiency} = \frac{1/(\text{execution time})}{\text{joules consumed}}$$
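Because the definition divides 1/(execution time) by the joules consumed, it equals 1/(energy × execution time), the reciprocal of the energy-delay product. A minimal sketch with hypothetical measurements for the same program on two machines:

```python
def energy_efficiency(exec_time_s: float, energy_j: float) -> float:
    """Energy efficiency = (1 / execution time) / joules consumed = 1 / (E * t)."""
    if exec_time_s <= 0 or energy_j <= 0:
        raise ValueError("time and energy must be positive")
    return (1.0 / exec_time_s) / energy_j


if __name__ == "__main__":
    # Hypothetical measurements of one program on two machines.
    fast = energy_efficiency(exec_time_s=50.0, energy_j=2000.0)   # 40 W average
    slow = energy_efficiency(exec_time_s=100.0, energy_j=1200.0)  # 12 W average
    print(fast, slow, fast > slow)
```

Here the faster machine scores higher even though it consumes more energy, because the metric weighs a halving of run time as heavily as a halving of energy.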
Energy Efficiency
• Efficiency averaged over n benchmark programs:

$$\text{Efficiency} = \left(\prod_{i=1}^{n} \text{Efficiency}_i\right)^{1/n}$$

where $\text{Efficiency}_i$ is the efficiency for program i.
• Relative efficiency:

$$\text{Relative efficiency} = \frac{\text{Efficiency of a computer}}{\text{Efficiency of reference computer}}$$
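A short sketch of the averaged and relative efficiency, using hypothetical per-program efficiencies for the same three programs on a candidate and a reference computer. The geometric mean is computed through logarithms to avoid overflow or underflow of the raw product.

```python
import math


def mean_efficiency(efficiencies: list[float]) -> float:
    """Geometric mean of per-program efficiencies: (prod Efficiency_i)^(1/n)."""
    n = len(efficiencies)
    if n == 0:
        raise ValueError("need at least one program")
    return math.exp(sum(math.log(e) for e in efficiencies) / n)


def relative_efficiency(computer: list[float], reference: list[float]) -> float:
    """Averaged efficiency of a computer divided by that of the reference."""
    return mean_efficiency(computer) / mean_efficiency(reference)


if __name__ == "__main__":
    # Hypothetical per-program efficiencies (arbitrary units).
    reference = [1.0, 4.0, 16.0]
    candidate = [2.0, 4.0, 32.0]
    print(mean_efficiency(reference))                 # about 4.0
    print(relative_efficiency(candidate, reference))  # about 1.587
```

A ratio of geometric means equals the geometric mean of the per-program ratios (here 2, 1, 2), so the relative efficiency does not depend on which computer's values are used to normalize each program.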
SPEC2000 Relative Energy Efficiency
[Figure: relative energy efficiency (y-axis 0 to 6) on SPECINT2000 and SPECFP2000 for Pentium III-M @1.2GHz, Pentium 4-M @2.4GHz (reference), and Pentium M @1.6/0.6GHz (energy-efficient processor), in three laptop operating modes: always max. clock, adaptive clock, and min. power (min. clock).]
Voltage Scaling
• Dynamic: Reduce voltage and frequency
during idle or low activity periods.
• Static: Clustered voltage scaling
• Logic on non-critical paths is given a lower supply voltage.
• A 47% power reduction with a 10% area increase has been reported:
• M. Igarashi et al., “Clustered Voltage Scaling
Techniques for Low-Power Design,” Proc. IEEE
Symp. Low Power Design, 1997.
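Dynamic voltage scaling relies on dynamic power scaling as CV²f: when a low-activity period lets the clock slow down, the supply voltage can often be lowered with it. A sketch of the resulting savings, assuming the switched capacitance is unchanged and the slower clock still meets the workload's needs; the scale factors in the example are hypothetical.

```python
def scaled_dynamic_power(v_scale: float, f_scale: float) -> float:
    """Dynamic power relative to nominal: (V'/V)^2 * (f'/f), same capacitance."""
    return v_scale ** 2 * f_scale


def scaled_energy_per_op(v_scale: float) -> float:
    """Energy per operation relative to nominal: C V^2 does not depend on f."""
    return v_scale ** 2


if __name__ == "__main__":
    # Hypothetical idle-period operating point: 70% voltage, 50% frequency.
    print(scaled_dynamic_power(0.7, 0.5))  # about 0.245 of nominal power
    print(scaled_dynamic_power(1.0, 0.5))  # 0.5: frequency reduction alone
    print(scaled_energy_per_op(0.7))       # about 0.49 of nominal energy per op
```

Lowering frequency alone halves power but leaves the energy per operation unchanged; only the voltage reduction cuts the energy needed to finish the same work.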
Pipeline Gating
• A pipelined processor uses speculative execution.
• Incorrect branch predictions result in pipeline stalls and waste energy on wrongly fetched instructions.
• Idea: stop fetching instructions when a branch hazard is expected:
• If the count (M) of in-flight branch predictions likely to be incorrect exceeds a prespecified number (N), suspend instruction fetch for some k cycles.
• Ref.: S. Manne, A. Klauser and D. Grunwald,
“Pipeline Gating: Speculation Control for Energy
Reduction,” Proc. 25th Annual International
Symp. Computer Architecture, June 1998.
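The gating rule as stated above can be sketched as a per-cycle decision. This is a simplified model, not the Manne et al. hardware: it takes M for each cycle as given (how predictions are judged likely to be wrong is outside its scope), and the function and parameter names are hypothetical.

```python
def gated_fetch_schedule(low_conf_counts, n_threshold, k_cycles):
    """Per-cycle fetch enable under a simple pipeline-gating rule.

    low_conf_counts[t] is M at cycle t: the number of in-flight branches
    whose predictions are judged likely to be wrong. When M exceeds
    n_threshold, fetch is suspended for k_cycles cycles, including the
    current one. Returns a list of booleans; True means fetch is allowed.
    """
    if k_cycles < 1:
        raise ValueError("k_cycles must be at least 1")
    enabled = []
    gated_left = 0  # gated cycles still pending after the current one
    for m in low_conf_counts:
        if gated_left > 0:
            enabled.append(False)
            gated_left -= 1
        elif m > n_threshold:
            enabled.append(False)
            gated_left = k_cycles - 1
        else:
            enabled.append(True)
    return enabled


if __name__ == "__main__":
    counts = [0, 1, 3, 3, 0, 0, 4, 0]  # hypothetical M per cycle
    schedule = gated_fetch_schedule(counts, n_threshold=2, k_cycles=2)
    print(schedule)  # [True, True, False, False, True, True, False, False]
    print("gated cycles:", schedule.count(False))  # gated cycles: 4
```

N and k trade energy against performance: a low N or long k saves more wrong-path fetch energy but also stalls fetch when the predictions would have been correct.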
Slack Scheduling
• Application: Superscalar, out-of-order execution:
• An instruction is executed as soon as the data and resources it needs become available.
• A commit unit reorders the results.
• Delay the execution of instructions whose results are not immediately needed.
• Example of RISC instructions:
  add r0, r1, r2;    (A)
  sub r3, r4, r5;    (B)
  and r9, x1, r9;    (C)
  or r5, r9, r10;    (D)
  xor r2, r10, r11;  (E)
J. Casmira and D. Grunwald, “Dynamic Instruction Scheduling Slack,” Proc. ACM Kool Chips Workshop, Dec. 2000.
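One way to flag candidates for delayed execution is to mark an instruction as having slack when none of the next few instructions reads its result. This window-based heuristic is an illustrative assumption, not the Casmira and Grunwald mechanism; it assumes the destination register is the first operand, tracks only true (read-after-write) dependences, and only sets slack bits rather than choosing an issue order. The helper names are hypothetical.

```python
def parse(line: str) -> tuple[str, str, tuple[str, ...]]:
    """Split 'op dest, src1, src2;' into (op, dest, sources)."""
    op, rest = line.split(None, 1)
    dest, *srcs = [tok.strip() for tok in rest.strip().rstrip(";").split(",")]
    return op, dest, tuple(srcs)


def slack_bits(program: list[str], window: int = 1) -> list[bool]:
    """True if no instruction in the next `window` slots reads the result."""
    decoded = [parse(line) for line in program]
    bits = []
    for i, (_, dest, _) in enumerate(decoded):
        following = decoded[i + 1 : i + 1 + window]
        needed_soon = any(dest in srcs for _, _, srcs in following)
        bits.append(not needed_soon)
    return bits


if __name__ == "__main__":
    program = [
        "add r0, r1, r2;",    # A
        "sub r3, r4, r5;",    # B
        "and r9, x1, r9;",    # C
        "or r5, r9, r10;",    # D
        "xor r2, r10, r11;",  # E
    ]
    for label, bit in zip("ABCDE", slack_bits(program)):
        print(label, "slack" if bit else "no slack")
```

For this sequence only C lacks slack, because D reads r9 immediately; A, B, D and E are candidates for slower, low-power execution units.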
Slack Scheduling Example
Standard scheduling: A, B, C, D, E
Slack scheduling: B, C, A, D, E
Slack Scheduling
[Figure: block diagram showing scheduling logic, a re-order buffer with a slack bit, and low-power execution units.]
Parallel Architecture
• Reference: Input → Processor → Output, clocked at f.
  Capacitance = C, Voltage = V, Frequency = f, Power = CV²f
• Parallel: the input feeds two processors, each clocked at f/2; the output is produced at f.
  Capacitance = 2.2C, Voltage = 0.6V, Frequency = 0.5f, Power = 0.396CV²f

Pipeline Architecture
• Reference: a single processor between input and output registers, clocked at f.
  Capacitance = C, Voltage = V, Frequency = f, Power = CV²f
• Pipeline: the processor is split into two halves (½ Proc.) separated by pipeline registers, clocked at f.
  Capacitance = 1.2C, Voltage = 0.6V, Frequency = f, Power = 0.432CV²f
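Both power figures follow from P = CV²f with the capacitance, voltage, and frequency factors given for each architecture. A minimal sketch that recomputes them, normalized to the single-processor reference (C = V = f = 1); the function name is illustrative.

```python
def dynamic_power(c: float, v: float, f: float) -> float:
    """Dynamic power C * V^2 * f, in units of the reference C, V and f."""
    return c * v ** 2 * f


if __name__ == "__main__":
    reference = dynamic_power(1.0, 1.0, 1.0)
    # Parallel: capacitance, voltage and frequency factors as given above.
    parallel = dynamic_power(2.2, 0.6, 0.5)
    # Pipeline: capacitance, voltage and frequency factors as given above.
    pipeline = dynamic_power(1.2, 0.6, 1.0)
    print(f"parallel: {parallel / reference:.3f}")  # parallel: 0.396
    print(f"pipeline: {pipeline / reference:.3f}")  # pipeline: 0.432
```

The voltage term does the work: 0.6² = 0.36, which more than offsets the 2.2× or 1.2× increase in switched capacitance.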
Approximate Trend
                  n-parallel proc.   n-stage pipeline proc.
Capacitance       nC                 C
Voltage           V/n                V/n
Frequency         f/n                f
Power             CV²f/n²            CV²f/n²
Chip area         n times            10-20% increase

G. K. Yeap, Practical Low Power Digital VLSI Design, Boston: Kluwer Academic Publishers, 1998.
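Substituting the trend entries into P = CV²f shows where the common 1/n² factor comes from. This is the idealized estimate; it ignores capacitance overheads and assumes the supply can be scaled all the way to V/n:

$$
P_{\text{parallel}} = (nC)\left(\frac{V}{n}\right)^{2}\left(\frac{f}{n}\right) = \frac{CV^{2}f}{n^{2}},
\qquad
P_{\text{pipeline}} = C\left(\frac{V}{n}\right)^{2} f = \frac{CV^{2}f}{n^{2}}
$$

For n = 2 the trend predicts 0.25CV²f; the two-processor and two-stage examples land higher (0.396CV²f and 0.432CV²f) because they carry 2.2C and 1.2C of capacitance and scale the supply only to 0.6V.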
Clock Distribution
[Figure: clock distribution network]
Clock Power
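$$P_{clk} = C_L V_{DD}^2 f + \frac{C_L V_{DD}^2 f}{\lambda} + \frac{C_L V_{DD}^2 f}{\lambda^2} + \cdots = C_L V_{DD}^2 f \sum_{n=0}^{\text{stages}-1} \frac{1}{\lambda^n}$$

where $C_L$ = total load capacitance and $\lambda$ = constant fanout at each stage in the distribution network.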
Clock consumes about 40% of total processor power.
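The clock-power sum can be evaluated directly. A minimal sketch, where `fanout` stands for λ and `stages` for the number of buffer stages; the load, voltage, and frequency values in the example are hypothetical.

```python
def clock_power(c_load_f, vdd_v, freq_hz, fanout, stages):
    """P_clk = C_L * VDD^2 * f * sum_{n=0}^{stages-1} 1 / fanout^n."""
    if fanout <= 1 or stages < 1:
        raise ValueError("need fanout > 1 and at least one stage")
    buffer_factor = sum(1.0 / fanout ** n for n in range(stages))
    return c_load_f * vdd_v ** 2 * freq_hz * buffer_factor


if __name__ == "__main__":
    # Hypothetical values: 1 nF total clock load, 1.0 V, 1 GHz, fanout 4.
    for s in (1, 2, 4, 8):
        print(s, clock_power(1e-9, 1.0, 1e9, fanout=4, stages=s))
```

The series is geometric, so the buffer stages can add at most a factor of λ/(λ − 1) on top of the load itself (4/3 for a fanout of 4); the total load capacitance $C_L$ dominates clock power.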
Clock Network Examples
                    Alpha 21064    Alpha 21164    Alpha 21264
Technology          0.75μ CMOS     0.5μ CMOS      0.35μ CMOS
Frequency (MHz)     200            300            600
Total capacitance   12.5nF
Clock load          3.25nF         3.75nF
Clock power                        20W
Max. clock skew     200ps (<10%)   90ps

D. W. Bailey and B. J. Benschneider, “Clocking Design and Analysis for a 600-MHz Alpha Microprocessor,” IEEE J. Solid-State Circuits, vol. 33, no. 11, pp. 1627-1633, Nov. 1998.
Power Reduction Example
• Alpha 21064: 200MHz @ 3.45V, power dissipation = 26W
• Reduce voltage to 1.5V, power (5.3x) = 4.9W
• Eliminate FP, power (3x) = 1.6W
• Scale 0.75→0.35μ, power (2x) = 0.8W
• Reduce clock load, power (1.3x) = 0.6W
• Reduce frequency 200→160MHz, power (1.25x) = 0.5W
J. Montanaro et al., “A 160-MHz, 32-b, 0.5-W CMOS RISC
Microprocessor,” IEEE J. Solid-State Circuits, vol. 31, no.
11, pp. 1703-1714, Nov. 1996.
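The chain of reductions can be checked by dividing the starting power by each factor in turn. In this sketch the voltage factor is computed as (3.45/1.5)² ≈ 5.29 and the frequency factor as 200/160 = 1.25; the other factors are taken as listed, and the helper name is illustrative.

```python
# Reduction factors from the Alpha 21064 example; the voltage and frequency
# factors are computed from the listed values rather than rounded.
STEPS = [
    ("Reduce voltage 3.45 V -> 1.5 V", (3.45 / 1.5) ** 2),  # about 5.3x
    ("Eliminate FP", 3.0),
    ("Scale 0.75 -> 0.35 um", 2.0),
    ("Reduce clock load", 1.3),
    ("Reduce frequency 200 -> 160 MHz", 200 / 160),  # 1.25x
]


def apply_reductions(start_w, steps):
    """Divide the starting power by each step's factor; return (step, watts) pairs."""
    trace = []
    power_w = start_w
    for name, factor in steps:
        power_w /= factor
        trace.append((name, power_w))
    return trace


if __name__ == "__main__":
    for name, watts in apply_reductions(26.0, STEPS):  # 26 W at 200 MHz, 3.45 V
        print(f"{name:34s} -> {watts:5.2f} W")
```

Carrying unrounded values through gives about 0.50 W at the end, in line with the listed 0.5 W; intermediate values differ slightly from the list (for example 0.63 W rather than 0.6 W after the clock-load step) because the list rounds at each step.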