Low Power Architecture and Implementation of Multicore Design
Khushboo Sheth, Kyungseok Kim
Fan Wang, Siddharth Dantu
Advisor: Dr. V Agrawal
ELEC6270 Low Power Design of Electronic Circuits
Team Project
VLSI D&T Seminar
Nov. 8 2006
Project Objectives
• Design and verify a 16-bit ALU with synchronous clocked inputs and outputs.
• Study the low-voltage power and delay characteristics of the design.
• Redesign the ALU for minimum power and highest speed.
Components of Power Dissipation
• Dynamic
  - Power due to signal transitions:
    logic power (due to logic transitions) and glitch power (due to glitches).
  - Short-circuit power
• Static
  - Leakage power (due to leakage currents).
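Power components in CMOS circuit
[Figure: CMOS stage between VDD and ground with input vi(t), output vo(t), on-resistance Ron, a large off-resistance (R = large), and load capacitance CL; the dynamic, short-circuit, and leakage power paths are marked. Annotation: Power = $C V_{DD}^{2}$.]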
1-bit ALU Design
[Figure: 1-bit ALU core with registers Reg A, Reg B, and Reg C.]
Simulation Specification
Technology:                       TSMC 0.25 um
Application Voltage:              2.5 V
N-MOS Vth:                        0.365 V
P-MOS Vth:                        -0.5625 V
Temperature:                      90 °C
Spice Simulator:                  Eldo ver. 6.3.1.1
Supply Voltage Sweep (6 points):  0, 0.5, 1.0, 1.5, 2.0, 2.5 V
1-bit ALU Core Timing (Vdd = 2.5 V)
[Figure: combinational logic with inputs A, B, CYIN, and opcode[3:0], and outputs C, CY, Z, and COMPOUT latched by DFFs on CLK; labelled nets NX156, NX60, NX16, NX80.]
Opcode   Operation
0000     a+b
0001     a-b
0010     equal
0011     not equal
0100     xor
0101     nor
0110     or
0111     and
1000     c<=a
1001     c<=b
1010     nand
others   all-zero output
Longest Path in Combinational Logic: c <= a+b (Opcode 0000)
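The opcode table can be read as a small behavioural model. The following is a minimal sketch, not the authors' netlist: the function name `alu_1bit`, the returned tuple, the choice to report comparisons on a separate `compout` value, and the treatment of subtraction as a + (not b) + carry-in are all assumptions made for illustration. The Z output is omitted because the slide does not say how it is derived.

```python
def alu_1bit(opcode, a, b, cyin=0):
    """Behavioural sketch of the opcode table; returns (c, cy, compout).

    a, b and cyin are single bits (0 or 1). Output assignment is assumed.
    """
    c = cy = compout = 0
    if opcode == 0b0000:                 # a + b (full adder)
        total = a + b + cyin
        c, cy = total & 1, total >> 1
    elif opcode == 0b0001:               # a - b, assumed as a + (not b) + cyin
        total = a + (b ^ 1) + cyin
        c, cy = total & 1, total >> 1
    elif opcode == 0b0010:               # equal
        compout = int(a == b)
    elif opcode == 0b0011:               # not equal
        compout = int(a != b)
    elif opcode == 0b0100:               # xor
        c = a ^ b
    elif opcode == 0b0101:               # nor
        c = (a | b) ^ 1
    elif opcode == 0b0110:               # or
        c = a | b
    elif opcode == 0b0111:               # and
        c = a & b
    elif opcode == 0b1000:               # c <= a
        c = a
    elif opcode == 0b1001:               # c <= b
        c = b
    elif opcode == 0b1010:               # nand
        c = (a & b) ^ 1
    # any other opcode leaves all outputs at zero
    return c, cy, compout
```

For example, `alu_1bit(0b0000, 1, 1)` exercises the adder path, which the slide identifies as the longest path in the combinational logic.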
1-bit ALU Core Sweep Vdd from 2.5V to 0V
[Figure: analog-mode waveforms of the C (NX156) output for Vdd = 2.5 V, 2.0 V, 1.5 V, 1.0 V, 0.5 V, and 0.0 V.]
1-bit ALU Core Logic Operation Voltage @ 200 MHz
Supply voltage sweep near PMOS Vth = -0.5625 V (vs. NMOS Vth = 0.365 V)
Sweep from Vsupply = 0.50 V to 1.00 V (linear increment 0.05 V, 11 points)
[Figure: analog-domain input and output waveforms for opcode 1000 (c<=a), with overshoot and ripples marked. Vsupply = 0.80 V: wrong operation. Vsupply = 0.85 V: correct operation.]
1-bit ALU Average Power vs. Delay @ 200 MHz
[Figure: average power (uW) and delay (ns) plotted against Vsupply (V) from 0 to 2.5 V, for the total 1-bit ALU block and the 1-bit ALU core. Annotation: Power = $C V_{DD}^{2}$. Data labels shown on the plot: 354.563, 179.9153, 82.8828, 31.0283; 2.2493, 1.4203, 0.5427; 0.4955, 0.7204, 0.4123.]
16 Bit ALU (Single Core) Design
[Figure: input register, combinational logic (16-bit ALU), and output register, clocked by CK; total switched capacitance Cref.]
Supply voltage = Vref
Total capacitance switched per cycle = Cref
Clock frequency = f
Power consumption: $P_{ref} = C_{ref} V_{ref}^{2} f$
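As a worked instance of the power expression above, assume the switched capacitance and the clock frequency stay fixed while only the supply is scaled. This is an idealization that ignores leakage and short-circuit power. Halving the supply voltage then gives:

```latex
P\!\left(\tfrac{V_{ref}}{2}\right)
  = C_{ref}\left(\tfrac{V_{ref}}{2}\right)^{2} f
  = \tfrac{1}{4}\, C_{ref} V_{ref}^{2} f
  = \tfrac{P_{ref}}{4}
```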
16-BIT ALU Vectors
Vector    a                  b                  Opcode        cyin
Vector1   1010101010101010   0001010101010101   0001 (sub)    0
Vector2   0101010101010101   1010101010101010   0011 (comp)   0
Vector3   0101010101010101   1010101010101010   0100 (xor)    0
Vector4   1111111111111111   0000000000000001   0000 (add)    0
Vector5   0110011001100110   0000000000000000   1010 (nand)   0
Vector6   0001011001101101   0101010010101010   0001 (sub)    0
*Vector4 activates the critical path, carryout = 1
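To see why Vector4 exercises the longest path, the sketch below adds its two operands bit by bit and counts how many stages produce a carry. It assumes a ripple-carry structure for illustration only. The slides do not state the adder architecture, and the name `ripple_carry_add` is hypothetical.

```python
def ripple_carry_add(a, b, cyin=0, width=16):
    """Add two width-bit operands stage by stage.

    Returns (sum, carry_out, stages_with_carry). Ripple-carry structure
    is an assumption; the real ALU's adder is not described on the slide.
    """
    carry, result, rippled = cyin, 0, 0
    for i in range(width):
        abit = (a >> i) & 1
        bbit = (b >> i) & 1
        total = abit + bbit + carry
        result |= (total & 1) << i   # sum bit of stage i
        carry = total >> 1           # carry into stage i + 1
        rippled += carry             # count stages that emit a carry
    return result, carry, rippled


a = int("1111111111111111", 2)   # Vector4 operand a
b = int("0000000000000001", 2)   # Vector4 operand b
total, carry_out, rippled = ripple_carry_add(a, b)
print(total, carry_out, rippled)  # prints: 0 1 16
```

The carry generated in bit 0 propagates through all 16 stages, which matches the carryout = 1 noted for Vector4.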
16-Bit ALU Simulation Result
Circuit information: # 694 Gates
Temperature: 27 °C
Clock Frequency applied: 10 MHz
Vectors Applied: 6 vectors
Simulation Time: 700 ns
TSMC025 Technology: Vthn = 0.365 V, Vthp = -0.562 V
By ELDO, SPICE simulation
Voltage (V)          2.5      1.25    0.85    0.625   0.45
Static Power (nW)    24.55    6.02    3.05    1.84    1.71
Average Power (uW)   391.16   62.62   26.66   14.57   3.56
Delay (ns)           2.83     7.14    18.88   73.21   Ckt failed
The 16-bit ALU operates correctly at 2.5 V, 1.25 V, 0.85 V, and 0.625 V for all 6 vectors.
The circuit fails at 0.45 V (< Vth).
[Figure: Simulated Single Vector Pair]
16-Bit ALU Power Savings and Delay Increase with Reference @ 2.5 Volts
Voltage (V)          2.5 (VDD, reference)   1.25 (VDD/2)   0.85 (VDD/3)   0.625 (VDD/4)
Average Power (uW)   391.16                 62.62          26.66          14.67
Power ratio          -                      P2.5/6.24      P2.5/14.67     P2.5/26.66
Power savings        -                      84%            93%            96%
Delay (ns)           2.83                   7.14           18.87          73.21
Delay ratio          -                      2.57*D2.5      6.67*D2.5      25.87*D2.5
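The ratio and percentage rows of this table follow from two simple formulas. The sketch below applies them to the 0.85 V operating point (26.66 uW and 18.87 ns, as listed in the simulation results) against the 2.5 V reference. The helper names `power_savings` and `delay_increase` are hypothetical.

```python
def power_savings(p_ref, p):
    """Return (reduction factor, fractional savings) relative to p_ref."""
    return p_ref / p, 1.0 - p / p_ref


def delay_increase(d_ref, d):
    """Return the factor by which delay grows relative to d_ref."""
    return d / d_ref


# 2.5 V reference versus the 0.85 V operating point of the single-core ALU
factor, saved = power_savings(391.16, 26.66)
slowdown = delay_increase(2.83, 18.87)
print(f"power: P2.5/{factor:.2f} ({saved:.0%} saved), delay: {slowdown:.2f}*D2.5")
```

The same two calls reproduce the other columns when given the corresponding power and delay values.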
16 Bit ALU Power Savings and Delay Increase with Reference @ 1.25 Volts
Voltage (V)          1.25 (VDD, reference)   0.85 (VDD/1.5)   0.625 (VDD/2)
Average Power (uW)   62.62                   26.66            14.67
Power ratio          -                       P1.25/2.35       P1.25/4.27
Power savings        -                       57%              77%
Delay (ns)           7.14                    18.87            73.21
Delay ratio          -                       2.63*D1.25       10.25*D1.25
Different Technology Impact On Power Saving
16 Bit ALU
Simulation Setup:
• Supply Voltage: 2.5 V
• Simulation Transient Time: 700 ns
• 6 vectors
• Temperature: 27 °C
Technology              TSMC035     TSMC025
Gates after synthesis   734         694
Voltage                 2.5 V       2.5 V
Static Power            24.555 nW   24.550 nW
Average Power           381.60 uW   391.16 uW
Delay                   3.12 ns     2.83 ns
Temperature Influence On Power
• Circuit information: # 734 Gates
• Clock Frequency applied: 10 MHz; Vdd = 2.5 V
• Vectors Applied: 6 vectors
• Simulation Time: 700 ns
• TSMC035 Technology
Temperature (°C)     0        27       60       90       120      900
Static Power (nW)    12.7     24.5     75.51    357.36   4803.3   3.38 mW
Average Power (uW)   404.23   381.60   378.15   367.48   363.15   70.43 W
Delay (ns)           2.58     3.12     3.18     3.53     3.91     Ckt fail!!
Multicore Design Methodology
• Lower the supply voltage.
  - This slows down circuit speed.
  - Use parallel computing to gain the speed back.
• Multi-core means placing two or more complete cores within a single module.
• This architecture is a "divide and conquer" strategy. By splitting the work between multiple execution cores, a multi-core design can perform more work within a given clock cycle.
• A power reduction of more than 60% is observed.
Source: http://www.eng.auburn.edu/~vagrawal/D&TSEMINAR_SPR06/SLIDES/Agrawal_DTSem06.ppt
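The trade-off in this methodology can be made concrete with the dynamic-power expression $P = C V^{2} f$. As an idealized sketch, assume $N$ identical copies, each switching capacitance $C_{ref}$ per operation and clocked at $f/N$ from a reduced supply $V$, and ignore the added multiplexer, extra registers, and leakage. These symbols and simplifications are assumptions, not results from the project.

```latex
P_{multi} = N \cdot C_{ref} V^{2} \cdot \frac{f}{N} = C_{ref} V^{2} f,
\qquad
\frac{P_{multi}}{P_{ref}} = \left(\frac{V}{V_{ref}}\right)^{2}
```

Under these assumptions throughput stays at $f$ while power falls with the square of the supply voltage. The overhead ignored here reduces the saving in a real design.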
Parallel Architecture
[Figure: an input register clocked at f feeds four copies of the 16-bit ALU combinational logic (Copy 1 to Copy 4). Each copy has its own register clocked at f/4 by one of Ck0 to Ck3, all derived from CK. A 4-to-1 multiplexer, driven by the mux control, delivers the output at rate f.]
Control Signals, N = 4
[Timing diagram: CK, the four clock phases (Phase 1 to Phase 4), and the mux control cycling through 00, 01, 10, 11, ...]
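The round-robin behaviour implied by the control signals can be sketched functionally. This is a behavioural illustration only, with no timing model. The function `run_parallel` and its arguments are hypothetical names, and the operation passed in stands in for the ALU.

```python
from itertools import cycle


def run_parallel(inputs, op, n=4):
    """Dispatch inputs to n copies in turn and read them back through a mux.

    Returns a list of (mux_select_bits, output) pairs, one per input.
    """
    latched = [None] * n              # one output register per copy
    outputs = []
    select = cycle(range(n))          # mux control: 0, 1, ..., n-1, 0, ...
    for value in inputs:
        k = next(select)
        latched[k] = op(value)        # copy k has n full-rate cycles to finish
        outputs.append((format(k, "02b"), latched[k]))
    return outputs


for sel, out in run_parallel([1, 2, 3, 4, 5], lambda x: x + 1):
    print(sel, out)
```

Each copy receives every fourth input, so it can run at one quarter of the input clock rate while the multiplexer restores the full-rate output stream.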
16 Bit ALU Multi-core Power Savings and Delay Increase with Reference @ 2.5 Volts
Circuit information: # 2617 Gates
Clock Frequency applied: 10 MHz
Temperature: 27 °C
Vectors Applied: 6 vectors
TSMC025 Technology: Vthn = 0.365 V, Vthp = -0.562 V
Simulator: ELDO (Spice)
Simulation Setup: Simulation Time: 700 ns
Voltage (V)          2.5 (reference)   1.25 (VDD/2)   0.85 (VDD/3)   0.625 (VDD/4)   0.45
Static Power (nW)    96.35             23.56          11.94          7.21            6.37
Average Power (uW)   687.86            95.64          40.93          21.13           7.26
Power ratio          -                 P2.5/7.19      P2.5/16.8      P2.5/32.55      -
Power savings        -                 86%            94%            94.75%          -
Delay (ns)           0.11              0.57           1.52           30.70           Ckt failed
Delay ratio          -                 5.18*D2.5      13.8*D2.5      279.1*D2.5      -
16 Bit ALU Multicore Power Savings and Delay Increase with Reference @ 1.25 Volts
Voltage (V)          1.25 (VDD, reference)   0.85 (VDD/1.5)   0.625 (VDD/2)
Average Power (uW)   95.64                   40.93            21.13
Power ratio          -                       P1.25/2.33       P1.25/4.52
Power savings        -                       57%              78%
Delay (ns)           0.57                    1.52             30.7
Delay ratio          -                       2.67*D1.25       53.86*D1.25
Power and Delay Comparison: Reference Design @ 2.5 V with Multicore Design at Different Voltages
Design               Reference   Multicore    Multicore    Multicore    Multicore    Multicore
Voltage (V)          2.5 (VDD)   1.25         0.85         0.725        0.7          0.625
                                 (VDD/2)      (VDD/3)      (VDD/3.5)    (VDD/3.6)    (VDD/4)
Average Power (uW)   391.16      95.64        40.93        25.6         22.35        21.14
Power ratio          -           P2.5/4.09    P2.5/9.56    P2.5/15.23   P2.5/17.5    P2.5/18.5
Power savings        -           76%          89.5%        93.45%       94.3%        94.6%
Delay (ns)           2.83        0.57         1.52         2.61         3.04         30.7
Delay ratio          -           D2.5/4.96    D2.5/1.86    D2.5/1.08    D2.5/0.93    D2.5/0.09
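One way to read this comparison is to pick the lowest-power multicore operating point whose delay does not exceed that of the 2.5 V reference design. The sketch below does that with the values from the table. The selection rule and the name `best_operating_point` are illustrative assumptions, not a procedure stated in the slides.

```python
# (voltage in V, average power in uW, delay in ns) for the multicore design
MULTICORE_POINTS = [
    (1.25, 95.64, 0.57),
    (0.85, 40.93, 1.52),
    (0.725, 25.6, 2.61),
    (0.7, 22.35, 3.04),
    (0.625, 21.14, 30.7),
]
REF_POWER_UW = 391.16   # single-core reference design at 2.5 V
REF_DELAY_NS = 2.83


def best_operating_point(points, max_delay_ns):
    """Return the lowest-power point with delay <= max_delay_ns, or None."""
    feasible = [p for p in points if p[2] <= max_delay_ns]
    return min(feasible, key=lambda p: p[1]) if feasible else None


voltage, power, delay = best_operating_point(MULTICORE_POINTS, REF_DELAY_NS)
savings = 100 * (1 - power / REF_POWER_UW)
print(f"{voltage} V: {power} uW, {delay} ns, {savings:.1f}% power saved")
```

With these numbers the selected point is 0.725 V, the lowest supply in the table at which the multicore design is still at least as fast as the reference.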
Summary
• For the single-core ALU design we get more than 60% power savings at reduced voltage, but at the cost of performance.
• With a reference of 2.5 V, power drops faster than the $V^{2}$ dependence predicts.
• With a reference of 1.25 V, the power drop is almost equal to the $V^{2}$ prediction.
• Multi-core design helps regain speed at reduced voltage while consuming less power.
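The two observations about voltage-squared scaling can be checked by comparing the ideal factor $(V_{ref}/V)^{2}$ with the power-reduction factors reported for the single-core ALU. A minimal sketch follows. The helper `v_squared_ratio` is a hypothetical name, and the measured factors are copied from the single-core tables.

```python
def v_squared_ratio(v_ref, v):
    """Ideal dynamic-power reduction factor when scaling v_ref down to v."""
    return (v_ref / v) ** 2


# (reference voltage, scaled voltage, reported single-core reduction factor)
cases = [
    (2.5, 1.25, 6.24),
    (2.5, 0.625, 26.66),
    (1.25, 0.625, 4.27),
]
for v_ref, v, measured in cases:
    predicted = v_squared_ratio(v_ref, v)
    print(f"{v_ref} V -> {v} V: predicted {predicted:.2f}x, measured {measured}x")
```

From the 2.5 V reference the measured reduction clearly exceeds the ideal factor (6.24 against 4, 26.66 against 16), while from the 1.25 V reference it stays close to it (4.27 against 4).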
References
• ELEC6270 Low Power Design of Electronic Circuits, class slides from Dr. Agrawal.
• Dr. Agrawal, "Multi-Core Parallelism for Low-Power Design," presentation at the VLSI D&T Seminar, Spring 06.
• www.tomshardware.com
• N. H. E. Weste and D. Harris, CMOS VLSI Design, Third Edition, Reading, Massachusetts: Addison-Wesley, 2005.
• L. Shang and R. P. Dick, "Thermal crisis: challenges and potential solutions," IEEE Potentials, vol. 25, issue 5, 2006.
• International Technology Roadmap for Semiconductors, http://public.itrs.net
• Alokik Kanwal, "A Review of Carbon Nanotube Field Effect Transistors," Version 2.0, 2003.
• K. K. Likharev, "Single Electron Devices and their applications," Proc. IEEE, vol. 87, no. 4, pp. 606-632, Apr. 1999.
• A. P. Chandrakasan and R. W. Brodersen, Low Power Digital CMOS Design, Boston: Kluwer Academic Publishers (now Springer), 1995.
• Alexander Wolfe, "Quad-core processor forecas", TechWeb.
Thank You !!!