ELEC 7770 Advanced VLSI Design Spring 2007 Power Aware Microprocessors

Vishwani D. Agrawal
James J. Danaher Professor
ECE Department, Auburn University
Auburn, AL 36849
vagrawal@eng.auburn.edu
http://www.eng.auburn.edu/~vagrawal/COURSE/E7770_Spr07
Spring 07, Feb 22
ELEC 7770: Advanced VLSI Design (Agrawal)
SIA Roadmap for Processors (1999)
Year                         1999   2002   2005   2008   2011   2014
Feature size (nm)             180    130    100     70     50     35
Logic transistors/cm²        6.2M    18M    39M    84M   180M   390M
Clock (GHz)                  1.25    2.1    3.5    6.0   10.0   16.9
Chip size (mm²)               340    430    520    620    750    900
Power supply (V)              1.8    1.5    1.2    0.9    0.6    0.5
High-perf. power (W)           90    130    160    170    175    183
Source: http://www.semichips.org
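The roadmap rows can be combined with simple arithmetic. The sketch below is a derived calculation, not a roadmap figure: it divides the high-performance power by the supply voltage to estimate supply current, and by the chip area to estimate average power density. It assumes all power is delivered at the single listed supply voltage; the function names are illustrative.

```python
# Rows transcribed from the 1999 SIA roadmap table above.
YEARS = [1999, 2002, 2005, 2008, 2011, 2014]
POWER_W = [90, 130, 160, 170, 175, 183]
VDD_V = [1.8, 1.5, 1.2, 0.9, 0.6, 0.5]
CHIP_MM2 = [340, 430, 520, 620, 750, 900]


def supply_current_a(power_w, vdd_v):
    # I = P / V, assuming all power is drawn from the single listed supply
    return power_w / vdd_v


def power_density_w_per_cm2(power_w, chip_mm2):
    # Average over the whole die; 1 cm^2 = 100 mm^2
    return power_w / (chip_mm2 / 100.0)


for year, p, v, area in zip(YEARS, POWER_W, VDD_V, CHIP_MM2):
    print(f"{year}: {supply_current_a(p, v):6.1f} A, "
          f"{power_density_w_per_cm2(p, area):5.1f} W/cm^2")
```

Under these assumptions the supply current rises from 50 A in 1999 to about 366 A in 2014, while the average power density stays between roughly 20 and 31 W/cm²; the falling supply voltage accounts for most of the current increase.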
Power Reduction in Processors
• Just about everything is used.
• Hardware methods:
  - Voltage reduction for dynamic power
  - Dual-threshold devices for leakage reduction
  - Clock gating, frequency reduction
  - Sleep mode
• Architecture:
  - Instruction set
  - Hardware organization
• Software methods
SPEC CPU2000 Benchmarks
• Twelve integer and 14 floating-point programs: CINT2000 and CFP2000.
• The run time of each program is normalized to the run time on a Sun Ultra 5_10 with a 300MHz processor, giving a SPEC ratio.
• The CINT2000 and CFP2000 summary measurements are the geometric means of the SPEC ratios.
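The normalization and averaging steps above can be sketched as follows. The run times are hypothetical, and the factor of 100 reflects the CPU2000 convention that the reference machine scores 100; treat the function names as illustrative.

```python
import math


def spec_ratio(ref_time_s, run_time_s):
    # Assumed CPU2000 convention: scaled by 100, so the reference machine scores 100
    return 100.0 * ref_time_s / run_time_s


def geometric_mean(values):
    # exp(mean(log x)) avoids overflow when multiplying many ratios
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Hypothetical reference and measured run times (seconds) for three programs
ref_times = [1400.0, 1800.0, 1100.0]
run_times = [140.0, 120.0, 110.0]

ratios = [spec_ratio(r, t) for r, t in zip(ref_times, run_times)]
print(ratios)                             # [1000.0, 1500.0, 1000.0]
print(round(geometric_mean(ratios), 1))   # 1144.7
```

A geometric mean has the useful property that the ratio of two machines' summary scores does not depend on which machine is chosen as the reference.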
Reference CPU: Sun Ultra 5_10, 300MHz Processor
[Figure: bar chart (vertical axis 0 to 3500) for the CINT2000 programs gzip, vpr, gcc, mcf, crafty, parser, eon, perlbmk, gap, vortex, bzip2, twolf and the CFP2000 programs wupwise, swim, mgrid, applu, mesa, galgel, art, equake, facerec, ammp, lucas, fma3d, sixtrack, apsi.]
CINT2000: 3.4GHz Pentium 4, HT Technology (D850MD Motherboard)
SPECint2000_base = 1341
SPECint2000 = 1389
[Figure: base ratio and optimized (peak) ratio, vertical axis 0 to 2500, for gzip, vpr, gcc, mcf, crafty, parser, eon, perlbmk, gap, vortex, bzip2, twolf.]
Source: www.spec.org
Two Benchmark Results
• Baseline: a uniform configuration not optimized for any specific program:
  - The same compiler, with the same settings and flags, is used for all benchmarks.
  - Other restrictions also apply.
• Peak: each run is optimized to obtain the peak performance of each benchmark program.
CFP2000: 3.6GHz Pentium 4, HT Technology (D925XCV/AA-400 Motherboard)
SPECfp2000_base = 1627
SPECfp2000 = 1630
[Figure: base ratio and optimized (peak) ratio, vertical axis 0 to 3000, for wupwise, swim, mgrid, applu, mesa, galgel, art, equake, facerec, ammp, lucas, fma3d, sixtrack, apsi.]
Source: www.spec.org
CINT2000: 1.7GHz Pentium 4 (D850MD Motherboard)
SPECint2000_base = 579
SPECint2000 = 588
[Figure: base ratio and optimized (peak) ratio, vertical axis 0 to 1000, for gzip, vpr, gcc, mcf, crafty, parser, eon, perlbmk, gap, vortex, bzip2, twolf.]
Source: www.spec.org
CFP2000: 1.7GHz Pentium 4 (D850MD Motherboard)
SPECfp2000_base = 648
SPECfp2000 = 659
[Figure: base ratio and optimized (peak) ratio, vertical axis 0 to 1400, for wupwise, swim, mgrid, applu, mesa, galgel, art, equake, facerec, ammp, lucas, fma3d, sixtrack, apsi.]
Source: www.spec.org
Energy SPEC Benchmarks
• Energy-efficiency mode: besides the execution time, the energy efficiency of each SPEC benchmark program is also measured. The energy efficiency of a benchmark program is given by:
$$\text{Energy efficiency} = \frac{1/(\text{Execution time})}{\text{Joules consumed}}$$
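Since the joules consumed equal average power times execution time, this metric is the reciprocal of the energy-delay product, so it rewards speed as well as low energy. A minimal sketch with hypothetical run data (the function name is illustrative):

```python
def energy_efficiency(exec_time_s, joules):
    # Slide definition: (1 / execution time) / joules consumed,
    # i.e. the reciprocal of the energy-delay product
    return (1.0 / exec_time_s) / joules


# Hypothetical runs of one benchmark program on two machines
fast = energy_efficiency(10.0, 200.0)   # 20 W average for 10 s
slow = energy_efficiency(20.0, 150.0)   # 7.5 W average for 20 s
print(fast, slow)
```

Here the slower run uses 25% less energy yet scores lower (about 3.3e-4 versus 5e-4), because doubling the execution time outweighs the energy saving.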
Energy Efficiency
• Efficiency averaged over n benchmark programs:
$$\text{Efficiency} = \left( \prod_{i=1}^{n} \text{Efficiency}_i \right)^{1/n}$$
where Efficiency_i is the efficiency for program i.
• Relative efficiency:
$$\text{Relative efficiency} = \frac{\text{Efficiency of a computer}}{\text{Efficiency of reference computer}}$$
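The two formulas above can be combined in a short sketch. The per-program efficiencies are hypothetical, and the geometric mean is computed through logarithms to avoid underflow when many small efficiencies are multiplied.

```python
import math


def mean_efficiency(efficiencies):
    # Geometric mean: (product of Efficiency_i) ** (1/n), computed via logs
    n = len(efficiencies)
    return math.exp(sum(math.log(e) for e in efficiencies) / n)


def relative_efficiency(efficiencies, reference_efficiencies):
    # Efficiency of a computer divided by that of the reference computer
    return mean_efficiency(efficiencies) / mean_efficiency(reference_efficiencies)


# Hypothetical per-program efficiencies for three benchmark programs
reference = [2e-4, 5e-4, 1e-3]
candidate = [4e-4, 5e-4, 4e-3]

print(round(relative_efficiency(candidate, reference), 6))   # 2.0
```

Because both averages are geometric, the result equals the geometric mean of the per-program ratios (here 2, 1 and 4).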
SPEC2000 Relative Energy Efficiency
[Figure: relative energy efficiency (vertical axis 0 to 6) on SPECFP2000 and SPECINT2000 under three operating modes: Always max. clock, Laptop (adaptive clock), and Min. power (min. clock). Processors: Pentium M @1.6/0.6GHz (energy-efficient processor), Pentium 4-M @2.4GHz (reference), and Pentium III-M @1.2GHz.]
Voltage Scaling
• Dynamic: reduce voltage and frequency during idle or low-activity periods.
• Static: clustered voltage scaling.
  - Logic on non-critical paths is given a lower supply voltage.
  - A 47% power reduction with a 10% area increase has been reported:
    M. Igarashi et al., “Clustered Voltage Scaling Techniques for Low-Power Design,” Proc. IEEE Symp. Low Power Design, 1997.
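The benefit of dynamic scaling follows from the first-order switching model P = CV²f used elsewhere in these slides. The sketch below compares full speed with a setting where both voltage and frequency are halved; the capacitance, task length, and the assumption that half the frequency permits half the voltage are all hypothetical.

```python
def dynamic_power(c_farads, vdd_volts, f_hz):
    # First-order switching power P = C * Vdd^2 * f (activity folded into C)
    return c_farads * vdd_volts ** 2 * f_hz


def task_energy(c_farads, vdd_volts, cycles):
    # Energy for a fixed amount of work: C * Vdd^2 per cycle, independent of f
    return c_farads * vdd_volts ** 2 * cycles


C = 1e-9          # hypothetical switched capacitance per cycle (F)
CYCLES = 1e9      # hypothetical task length (clock cycles)

# Full speed versus a low-activity setting with both V and f halved
# (assumes the lower frequency permits the lower voltage).
p_full, p_low = dynamic_power(C, 1.2, 1e9), dynamic_power(C, 0.6, 0.5e9)
e_full, e_low = task_energy(C, 1.2, CYCLES), task_energy(C, 0.6, CYCLES)
print(f"power ratio  {p_low / p_full:.3f}")   # power ratio  0.125
print(f"energy ratio {e_low / e_full:.3f}")   # energy ratio 0.250
```

Power drops 8x, but because the task now takes twice as long, the energy per task drops 4x.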
Pipeline Gating
• A pipelined processor uses speculative execution.
• Incorrect branch predictions cause pipeline stalls and waste energy on instructions that are later discarded.
• Idea: stop fetching instructions when a branch hazard is expected:
  - If the count (M) of incorrect predictions exceeds a prespecified number (N), suspend instruction fetch for k cycles.
• Ref.: S. Manne, A. Klauser and D. Grunwald, “Pipeline Gating: Speculation Control for Energy Reduction,” Proc. 25th Annual International Symp. Computer Architecture, June 1998.
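The gating rule can be sketched as a cycle-by-cycle loop. The misprediction trace, the function name, and the choice to reset M to zero after each gating event are assumptions for illustration; the slide does not say how M is maintained.

```python
def gated_fetch(mispredicts, n_threshold, k_cycles):
    """Return one boolean per cycle: True if instruction fetch is enabled.

    mispredicts[c] is True when an incorrect prediction is detected in cycle c.
    When the count M exceeds N, fetch is suspended for the next k cycles.
    Assumption: M is reset to 0 after each gating event.
    """
    enabled = []
    m = 0              # count of incorrect predictions (M)
    gated_until = -1   # last cycle in which fetch is suspended
    for cycle, wrong in enumerate(mispredicts):
        enabled.append(cycle > gated_until)
        if wrong:
            m += 1
        if m > n_threshold:
            gated_until = cycle + k_cycles  # suspend cycles cycle+1 .. cycle+k
            m = 0
    return enabled


trace = [True, False, True, False, False, False, False, False]
print(gated_fetch(trace, n_threshold=1, k_cycles=3))
# [True, True, True, False, False, False, True, True]
```

With N = 1, the second misprediction (cycle 2) pushes M past the threshold, so fetch is off for cycles 3 to 5.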
Slack Scheduling
• Application: superscalar, out-of-order execution:
  - An instruction is executed as soon as the data and resources it needs become available.
  - A commit unit reorders the results.
• Delay the execution of instructions whose results are not immediately needed.
Example of RISC instructions:
  add r0, r1, r2;     (A)
  sub r3, r4, r5;     (B)
  and r9, x1, r9;     (C)
  or r5, r9, r10;     (D)
  xor r2, r10, r11;   (E)
J. Casmira and D. Grunwald, “Dynamic Instruction Scheduling Slack,” Proc. ACM Kool Chips Workshop, Dec. 2000.
Slack Scheduling Example
Standard scheduling: A, B, C, D, E
Slack scheduling:    B, C, A, D, E
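A much simplified, hypothetical test for which instructions in the example have slack: an instruction is marked if no later instruction in the same window reads its destination register. This is not the mechanism of Casmira and Grunwald; real slack also depends on timing and on consumers outside the window. The tuples transcribe the example instructions, destination first.

```python
# Example window from the slide: (label, destination, sources).
WINDOW = [
    ("A", "r0", ("r1", "r2")),
    ("B", "r3", ("r4", "r5")),
    ("C", "r9", ("x1", "r9")),
    ("D", "r5", ("r9", "r10")),
    ("E", "r2", ("r10", "r11")),
]


def slack_bits(window):
    """Hypothetical slack test: True if no later instruction in the window
    reads this instruction's destination register."""
    bits = {}
    for i, (label, dest, _) in enumerate(window):
        later_sources = {src for _, _, srcs in window[i + 1:] for src in srcs}
        bits[label] = dest not in later_sources
    return bits


print(slack_bits(WINDOW))
# {'A': True, 'B': True, 'C': False, 'D': True, 'E': True}
```

Only C must run promptly, since D reads r9; A, which the slack schedule delays, is among the flagged instructions.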
Slack Scheduling
[Figure: block diagram showing the scheduling logic, re-order buffer, slack bit, and low-power execution units.]
Clock Distribution
[Figure: clock distribution network.]
Clock Power
$$P_{clk} = C_L V_{DD}^2 f + \frac{C_L V_{DD}^2 f}{\lambda} + \frac{C_L V_{DD}^2 f}{\lambda^2} + \cdots = C_L V_{DD}^2 f \sum_{n=0}^{\text{stages}-1} \frac{1}{\lambda^n}$$
where C_L = total load capacitance and λ = constant fanout at each stage in the distribution network.
The clock consumes about 40% of total processor power.
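The clock-power sum can be evaluated directly. The load, supply, frequency, fanout, and stage count below are hypothetical values chosen for illustration.

```python
def clock_power(c_load, vdd, f, fanout, stages):
    """P_clk = C_L * Vdd^2 * f * sum_{n=0}^{stages-1} fanout^-n."""
    tree_factor = sum(1.0 / fanout ** n for n in range(stages))
    return c_load * vdd ** 2 * f * tree_factor


# Hypothetical values: 1 nF clock load, 1.0 V supply, 1 GHz, fanout 4, 4 stages
p = clock_power(1e-9, 1.0, 1e9, fanout=4, stages=4)
print(f"{p:.4f} W")  # 1.3281 W
```

The series approaches λ/(λ - 1), so with λ = 4 the buffer stages add at most one third to the power of the final load; the leaf capacitance C_L dominates.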
Clock Network Examples
                     Alpha 21064    Alpha 21164    Alpha 21264
Technology           0.75μ CMOS     0.5μ CMOS      0.35μ CMOS
Frequency (MHz)      200            300            600
Total capacitance    12.5nF
Clock load           3.25nF         3.75nF
Clock power          40%            40% (20W)
Max. clock skew      200ps (<10%)   90ps
D. W. Bailey and B. J. Benschneider, “Clocking Design and Analysis for a 600-MHz Alpha Microprocessor,” IEEE J. Solid-State Circuits, vol. 33, no. 11, pp. 1627-1633, Nov. 1998.
Power Reduction Example
• Alpha 21064: 200MHz @ 3.45V, power dissipation = 26W
• Reduce voltage to 1.5V: power reduced 5.3x, to 4.9W
• Eliminate FP: power reduced 3x, to 1.6W
• Scale 0.75→0.35μ: power reduced 2x, to 0.8W
• Reduce clock load: power reduced 1.3x, to 0.6W
• Reduce frequency 200→160MHz: power reduced 1.25x, to 0.5W
J. Montanaro et al., “A 160-MHz, 32-b, 0.5-W CMOS RISC Microprocessor,” IEEE J. Solid-State Circuits, vol. 31, no. 11, pp. 1703-1714, Nov. 1996.
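The chain of reductions can be checked step by step. The voltage and frequency factors are recomputed from P ∝ V²f; the other three factors are taken as quoted on the slide.

```python
# Reduction steps from the Montanaro et al. example.
start_w = 26.0
steps = [
    ("reduce VDD 3.45 V -> 1.5 V", (3.45 / 1.5) ** 2),  # P ~ VDD^2, about 5.3x
    ("eliminate FP", 3.0),                              # as quoted
    ("scale 0.75 -> 0.35 um", 2.0),                     # as quoted
    ("reduce clock load", 1.3),                         # as quoted
    ("reduce f 200 -> 160 MHz", 200 / 160),             # P ~ f, 1.25x
]

power = start_w
for name, factor in steps:
    power /= factor
    print(f"{name:28s} /{factor:5.2f} -> {power:5.2f} W")
```

The computed voltage factor is 2.3² ≈ 5.29, and the chain ends at about 0.50 W, matching the slide's 0.5W.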
Parallel Architecture
[Figure: a single processor clocked at f, compared with two processors, each clocked at f/2, between the same input and output.]
Single processor: Capacitance = C, Voltage = V, Frequency = f, Power = CV²f
Two parallel processors: Capacitance = 2.2C, Voltage = 0.6V, Frequency = 0.5f, Power = 0.396CV²f
Pipeline Architecture
[Figure: a processor between input and output registers clocked at f, compared with the same logic split into two half-processor stages (½ Proc.) separated by an additional register, also clocked at f.]
Single processor: Capacitance = C, Voltage = V, Frequency = f, Power = CV²f
Two-stage pipeline: Capacitance = 1.2C, Voltage = 0.6V, Frequency = f, Power = 0.432CV²f
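The power figures for the parallel and pipeline examples follow from P = CV²f with the listed factors; this sketch simply recomputes them. The capacitance overheads (2.2C and 1.2C) are taken from the slides, not derived.

```python
def relative_power(cap_factor, volt_factor, freq_factor):
    # Dynamic power relative to the baseline C*V^2*f, from P = C * V^2 * f
    return cap_factor * volt_factor ** 2 * freq_factor


# Factors from the parallel and pipeline examples (same throughput as baseline)
parallel = relative_power(2.2, 0.6, 0.5)  # two processors, each at f/2
pipeline = relative_power(1.2, 0.6, 1.0)  # two-stage pipeline at f
print(f"parallel {parallel:.3f}, pipeline {pipeline:.3f}")
# parallel 0.396, pipeline 0.432
```

Both designs keep the original throughput: the parallel version completes two results every 2/f, and the pipeline completes one result per cycle at f.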
Approximate Trend
               n-parallel proc.    n-stage pipeline proc.
Capacitance    nC                  C
Voltage        V/n                 V/n
Frequency      f/n                 f
Power          CV²f/n²             CV²f/n²
Chip area      n times             10-20% increase
G. K. Yeap, Practical Low Power Digital VLSI Design, Boston: Kluwer Academic Publishers, 1998.
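The table's idealized entries can be evaluated for a few values of n. This sketch assumes, as the table does, that the supply voltage can be scaled as V/n; in practice the threshold voltage limits how far the supply can be lowered, which is why the trend is only approximate.

```python
def ideal_relative_power(n):
    """Power relative to one processor (C, V, f) under the table's idealization."""
    parallel = n * (1.0 / n) ** 2 * (1.0 / n)  # capacitance nC, voltage V/n, frequency f/n
    pipeline = 1.0 * (1.0 / n) ** 2 * 1.0      # capacitance C, voltage V/n, frequency f
    return parallel, pipeline


for n in (2, 3, 4):
    par, pipe = ideal_relative_power(n)
    print(f"n={n}: parallel {par:.4f}, pipeline {pipe:.4f}")
```

For n = 2 the ideal is 0.25 in both cases. The two-way examples earlier (0.396CV²f and 0.432CV²f) are higher because the voltage is lowered only to 0.6V rather than 0.5V and the capacitance grows beyond nC or C.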
For More on Microprocessors
• T. D. Burd and R. W. Brodersen, Energy Efficient Microprocessor Design, Springer, 2002.
• R. Graybill and R. Melhem, Power Aware Computing, New York: Plenum Publishers, 2002.