Minshu Zhao
Power Management in Multicores
Outline
Introduction
 Review of Power management technique
 Power management in Multicore

◦ Identify Multicores Characteristics
◦ Apply power management technique

Future of multicore
Review on low power technique

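Clock gating
◦ + Gating can be applied at fine granularity
◦ + Saves dynamic power
◦ - Does not affect static power
[Figure: flip-flop (FF) whose clock input (CK) is gated by an enable signal (EN)]
Power gating
◦ + Saves both dynamic and static power
◦ - Needs microseconds to power up again
◦ - Data is lost unless some form of state retention is used
[Figure: flip-flop (FF) whose Vdd supply is gated by an enable signal (EN)]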
Review on low power technique

Voltage (frequency) scaling
◦ Scale down frequency and/or voltage, sacrificing performance for power
 I ∝ (Vdd - Vt) ~ Vdd
 f ∝ Vdd
 P ∝ C·Vdd²·f ∝ Vdd³
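
The cubic relation above can be checked numerically. The following is a minimal sketch, assuming the slide's proportionalities hold exactly (f ∝ Vdd and P ∝ C·Vdd²·f with constant C); the function name is hypothetical.

```python
def relative_dynamic_power(voltage_scale: float) -> float:
    """Relative dynamic power when Vdd is multiplied by voltage_scale.

    Assumes P ∝ C * Vdd^2 * f with f ∝ Vdd, so P ∝ Vdd^3.
    """
    if voltage_scale <= 0:
        raise ValueError("voltage_scale must be positive")
    frequency_scale = voltage_scale  # f ∝ Vdd
    return voltage_scale ** 2 * frequency_scale  # capacitance C is unchanged


if __name__ == "__main__":
    for scale in (1.0, 0.9, 0.8, 0.5):
        power = relative_dynamic_power(scale)
        print(f"Vdd x{scale:.1f} -> dynamic power x{power:.3f}")
```

Under this model, lowering Vdd to 80% costs 20% of the frequency but cuts dynamic power to about half (0.8³ ≈ 0.51).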

Variable device threshold
◦ Use high-Vt transistors to reduce leakage
◦ + Reduces leakage
◦ - Vt is generally fixed for a given transistor
Identify Multicore Characteristics

Half of the chip is cores
◦ Large dynamic power
◦ Unbalanced power consumption among cores

The other half of the chip is cache
◦ Large leakage power
Outline
Introduction
 Review of Power management technique
 Power management in Multicore

◦ Identify Multicores Characteristics
◦ Apply power management technique
 To Cores
 To Caches

Future of multicore
Traditional DVFS

Motivation
◦ Large computation/memory gap

Problems when applied to multicore
◦ Slow
 Microsecond timescales
◦ Coarse-grained adjustment
 Done in the operating system
◦ All cores end up at a single chip-wide VF setting
 Loses potential power savings
[Figure: power supply and off-chip regulator feeding Core0, Core1, Core2, Core3]
Per-core DVFS & on-chip regulator
On-chip vs. off-chip regulator
◦ Tens of nanoseconds vs. microseconds

Per-core vs. chip-wide DVFS
◦ Benefits heterogeneous workloads
[Figure: power supply, off-chip regulator, and on-chip regulator feeding Core0, Core1, Core2, Core3]
Wonyoung Kim, M. S. Gupta, Gu-Yeon Wei, and D. Brooks. 2008. System level analysis of fast, per-core DVFS using on-chip switching regulators. In High Performance Computer Architecture, 2008 (HPCA 2008).
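
To illustrate why per-core DVFS benefits heterogeneous workloads, here is a rough sketch that compares the two schemes. It assumes the simplified model P ∝ f³ per core and normalized frequencies; the function names and the example frequencies are hypothetical and are not taken from the cited paper.

```python
from typing import Sequence


def relative_power(freq: float) -> float:
    """Relative dynamic power of one core at normalized frequency freq (P ∝ f^3)."""
    return freq ** 3


def chip_wide_power(required_freqs: Sequence[float]) -> float:
    """Chip-wide DVFS: every core runs at the highest frequency any core requires."""
    shared = max(required_freqs)
    return len(required_freqs) * relative_power(shared)


def per_core_power(required_freqs: Sequence[float]) -> float:
    """Per-core DVFS: each core runs at exactly the frequency it requires."""
    return sum(relative_power(f) for f in required_freqs)


if __name__ == "__main__":
    demands = [1.0, 0.5, 0.5, 0.8]  # hypothetical per-core frequency needs
    print("chip-wide:", chip_wide_power(demands))
    print("per-core :", round(per_core_power(demands), 3))
```

When all cores need the same frequency the two schemes cost the same in this model; the gap appears only when the demands differ.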
Per-core DVFS & on-chip regulator

Application
◦ Multi-Core Global Power Management
 Monitor power & performance
 Apply policies by per-core DVFS

Problem
◦ Overhead is large
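
The monitor-then-apply loop above could look roughly like the sketch below. It is one possible greedy policy, not the policy of any particular system: the VF levels, the P ∝ f³ power model, the use of IPC as the performance metric, and all names are assumptions made for illustration.

```python
from typing import Dict

# Hypothetical normalized frequency levels available to every core.
VF_LEVELS = [0.6, 0.8, 1.0]


def core_power(freq: float) -> float:
    """Relative dynamic power at a normalized frequency, assuming P ∝ f^3."""
    return freq ** 3


def assign_levels(ipc: Dict[str, float], budget: float) -> Dict[str, float]:
    """Greedy policy: start every core at the top level, then slow down the
    lowest-IPC core one step at a time until the power budget is met."""
    level = {core: len(VF_LEVELS) - 1 for core in ipc}

    def total_power() -> float:
        return sum(core_power(VF_LEVELS[i]) for i in level.values())

    while total_power() > budget:
        candidates = [c for c in level if level[c] > 0]
        if not candidates:
            break  # budget cannot be met even at the lowest level
        victim = min(candidates, key=lambda c: ipc[c])
        level[victim] -= 1
    return {core: VF_LEVELS[i] for core, i in level.items()}


if __name__ == "__main__":
    monitored_ipc = {"core0": 1.8, "core1": 0.4, "core2": 1.2, "core3": 0.6}
    print(assign_levels(monitored_ipc, budget=3.0))
```

Slowing the lowest-IPC cores first is a heuristic: those cores are assumed to be memory-bound and therefore to lose the least performance from a lower frequency.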
Thread Motion
[Figure: activity of App A and App B over time, showing low-IPC and high-IPC phases on high-VF and low-VF cores]
 Cores have different voltage-frequency settings
 Migrate threads between cores
 Apply DVFS-like benefits to program variability by observing microarchitectural events
 Fast movement creates effective intermediate voltage levels

Krishna K. Rangan, Gu-Yeon Wei, and David Brooks. 2009. Thread motion: fine-grained power management for multi-core systems. In Proceedings of the 36th annual international symposium on Computer architecture (ISCA '09).
Thread Motion

Application
◦ Thread Motion framework
 Evaluation driven by microarchitectural events
 Time-driven
 Miss-driven
 Predict IPC for the next interval
 Move the thread if needed

Problem
◦ Potential cache penalty
 Clustered multicore with a shared L1 cache within each cluster
◦ Register file transfer penalty
 Store the registers in the shared cache
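
The predict-then-move step can be sketched as follows. This is an illustration only: it assumes a last-value IPC predictor and a rule that places the threads with the highest predicted IPC on the fastest cores, neither of which is claimed to match the cited paper. All names are hypothetical.

```python
from typing import Dict, List


def predict_ipc(history: List[float]) -> float:
    """Last-value predictor: assume the next interval repeats the latest IPC."""
    return history[-1]


def plan_motion(
    placement: Dict[str, str],
    ipc_history: Dict[str, List[float]],
    core_freq: Dict[str, float],
) -> Dict[str, str]:
    """Return a thread -> core placement that puts the threads with the
    highest predicted IPC on the cores with the highest frequency."""
    threads = sorted(
        placement, key=lambda t: predict_ipc(ipc_history[t]), reverse=True
    )
    cores = sorted(placement.values(), key=lambda c: core_freq[c], reverse=True)
    return dict(zip(threads, cores))


if __name__ == "__main__":
    placement = {"A": "core0", "B": "core1"}
    core_freq = {"core0": 1.0, "core1": 0.6}  # high-VF and low-VF core
    history = {"A": [1.5, 0.4], "B": [0.5, 1.6]}  # A slows down, B speeds up
    new_placement = plan_motion(placement, history, core_freq)
    moved = [t for t in placement if placement[t] != new_placement[t]]
    print(new_placement, "moved:", moved)
```

A real implementation would also weigh the cache and register-transfer penalties listed above before deciding to move a thread.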
Heterogeneous Cores

Motivation
◦ Different applications have different resource requirements
 Large ILP -> VLIW
◦ Different power conditions
 Full battery vs. low battery

Combine existing processor architectures and select the core that minimizes energy
Rakesh Kumar, Dean M. Tullsen, Parthasarathy Ranganathan, Norman P. Jouppi, and Keith I. Farkas. 2004. Single-ISA Heterogeneous Multi-Core Architectures for Multithreaded Workload Performance. In Proceedings of the 31st annual international symposium on Computer architecture (ISCA '04).
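
Core selection for minimum energy can be sketched as a small lookup. The core names, power figures, and run times below are invented for illustration; the only idea taken from the slide is choosing the core that minimizes energy, here with a run-time bound added as an assumption.

```python
from typing import Dict, Tuple


def pick_core(profile: Dict[str, Tuple[float, float]], max_runtime: float) -> str:
    """Pick the core type with the lowest energy (power * runtime) among
    those that finish within max_runtime seconds.

    profile maps core type -> (average power in watts, runtime in seconds).
    """
    energy = {
        core: power * runtime
        for core, (power, runtime) in profile.items()
        if runtime <= max_runtime
    }
    if not energy:
        raise ValueError("no core type meets the runtime bound")
    return min(energy, key=energy.get)


if __name__ == "__main__":
    # Hypothetical measurements of one application on three core types.
    profile = {"big": (10.0, 1.0), "medium": (4.0, 1.6), "little": (1.0, 4.0)}
    print(pick_core(profile, max_runtime=5.0))  # relaxed bound, e.g. low battery
    print(pick_core(profile, max_runtime=2.0))  # tighter performance bound
```

Relaxing the run-time bound corresponds to the low-battery case, where a slower but more energy-efficient core becomes acceptable.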
Gated-Vdd cache
Use a high-Vt transistor to turn off the power supply
 + Reduces power when turned off
 - Data stored in a cell is lost in low-power mode
[Figure: SRAM cell between Vdd and Gnd with a gated-Vdd control transistor]
Michael Powell, Se-Hyun Yang, Babak Falsafi, Kaushik Roy, and T. N. Vijaykumar. 2000. Gated-Vdd: a circuit technique to reduce leakage in deep-submicron cache memories. In Proceedings of the 2000 international symposium on Low power electronics and design (ISLPED '00). ACM, New York, NY, USA, 90-95.
Gated-Vdd cache

Application
◦ Dynamically resizable i-cache
 Evaluate the miss rate at every time interval and upsize/downsize the cache using gated-Vdd

Problem
◦ Data remapping on the fly
S. Yang, M. D. Powell, B. Falsafi, K. Roy, and T. N. Vijaykumar. 2001. An integrated circuit/architecture approach to reducing leakage in deep-submicron high-performance I-caches. In High-Performance Computer Architecture, 2001 (HPCA).
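
The per-interval resize decision can be sketched as a small controller. This is a simplified illustration: the miss bound, the size limits, and the factor-of-two resizing step are assumptions chosen for the example, not parameters quoted from the cited paper.

```python
def next_cache_size(
    size_kb: int,
    misses: int,
    miss_bound: int,
    min_kb: int = 1,
    max_kb: int = 64,
) -> int:
    """Decide the i-cache size for the next interval.

    Upsize (double) when the interval's miss count exceeds miss_bound,
    downsize (halve) when it is below the bound, within [min_kb, max_kb].
    The portion of the cache that is switched off is assumed to be gated-Vdd.
    """
    if misses > miss_bound:
        return min(size_kb * 2, max_kb)
    if misses < miss_bound:
        return max(size_kb // 2, min_kb)
    return size_kb


if __name__ == "__main__":
    size = 64
    for misses in (20, 30, 40, 500, 90):  # hypothetical per-interval miss counts
        size = next_cache_size(size, misses, miss_bound=100)
        print(f"misses={misses:3d} -> next size {size} KB")
```

Every size change alters which address bits index the cache, which is the on-the-fly remapping problem noted above.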
Gated-Vdd cache

Application
◦ Cache decay
 Turn a cache line off if a set number of cycles has elapsed since its last access
 The decay interval can adapt to the program

Problem
◦ Data in a sleeping cache line is lost, so the next access to it suffers a cache miss
S. Kaxiras, Zhigang Hu, and M. Martonosi. 2001. Cache decay: exploiting generational behavior to reduce cache leakage power. In Proceedings of the 28th Annual International Symposium on Computer Architecture, pp. 240-251.
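
The trade-off between leakage savings and extra misses can be sketched with a toy trace. The sketch assumes a fixed decay interval and ignores normal replacement (a line that decayed might have been evicted anyway), so it overstates the extra misses; the trace and all names are hypothetical.

```python
from typing import Dict, List, Tuple


def simulate_decay(
    accesses: List[Tuple[int, int]], decay_interval: int
) -> Tuple[int, int]:
    """Simulate cache decay on a trace of (cycle, line) pairs sorted by cycle.

    A line is switched off once decay_interval cycles pass without an access.
    Returns (decay_induced_misses, line_cycles_spent_off).
    """
    last_access: Dict[int, int] = {}
    decay_misses = 0
    off_cycles = 0
    for cycle, line in accesses:
        if line in last_access:
            idle = cycle - last_access[line]
            if idle > decay_interval:
                decay_misses += 1  # line was off, so its data is gone
                off_cycles += idle - decay_interval
        last_access[line] = cycle
    return decay_misses, off_cycles


if __name__ == "__main__":
    trace = [(0, 0), (10, 0), (500, 0), (505, 1), (2000, 1)]
    for interval in (100, 1000):
        print(interval, simulate_decay(trace, interval))
```

A shorter decay interval keeps lines off for longer but turns more re-accesses into misses, which is why an adaptive interval is attractive.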
ABB multi-threshold CMOS
Increase Vsb in sleep mode
 Effectively increases Vth to reduce leakage
 + State is preserved in sleep mode
 - Needs a long time to switch out of sleep mode
[Figure: circuit with node voltages labeled 1.0V, 1.0V / 3.3V, 0V / 1.0V, and 0V]
K. Nii et al. 1998. A low power SRAM using auto-backgate-controlled MT-CMOS. In Proc. of Int. Symp. Low Power Electronics and Design, pp. 293-298.
Drowsy Caches
Apply DVFS-style voltage scaling to the cache
 + Wake-up cost is small
 + State is preserved
 - Does not save as much leakage power
[Figure: SRAM cell whose supply is switched between Vdd (1V) and 0.3V by the drowsy control signals]
Krisztián Flautner, Nam Sung Kim, Steve Martin, David Blaauw, and Trevor Mudge. 2002. Drowsy caches: simple techniques for reducing leakage power. In Proceedings of the 29th annual international symposium on Computer architecture (ISCA '02). IEEE Computer Society, Washington, DC, USA, 148-157.
Drowsy Caches

Application
◦ Simple policy
 Periodically put all lines to sleep and wake them up afterwards
◦ No-access policy
 Put to sleep the lines that were not accessed in the window
◦ 90% of the lines can be in drowsy mode

Average results
 Normalized total energy: 0.46
 Normalized leakage energy: 0.29
 Run-time increase: 0.41%

Problem
◦ Leakage power
 Drowsy cache: 6.24nW
 Gated-Vdd: 0.02nW
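
The simple and no-access policies can be compared on a toy access pattern. The sketch below assumes that a drowsy line wakes up when it is accessed and measures, per window, how many lines stay drowsy for the whole window; the access pattern and all names are hypothetical, and the numbers it produces are unrelated to the measured results above.

```python
from typing import List, Set, Tuple


def simple_policy(windows: List[Set[int]], num_lines: int) -> Tuple[float, int]:
    """All lines go drowsy at the start of each window; an accessed line
    wakes up once. Returns (drowsy fraction, number of wake-ups)."""
    wakeups = sum(len(accessed) for accessed in windows)
    drowsy = sum(num_lines - len(accessed) for accessed in windows)
    return drowsy / (num_lines * len(windows)), wakeups


def no_access_policy(windows: List[Set[int]], num_lines: int) -> Tuple[float, int]:
    """Only lines that were not accessed in the previous window go drowsy.
    Returns (drowsy fraction, number of wake-ups)."""
    awake = set(range(num_lines))  # every line starts awake
    wakeups = 0
    drowsy = 0
    for accessed in windows:
        wakeups += len(accessed - awake)  # drowsy lines that had to wake
        drowsy += num_lines - len(awake | accessed)  # drowsy all window long
        awake = set(accessed)  # untouched lines go drowsy at the boundary
    return drowsy / (num_lines * len(windows)), wakeups


if __name__ == "__main__":
    pattern = [{0, 1}, {0, 1}, {0, 2}, {0, 1}]  # lines touched per window
    print("simple   :", simple_policy(pattern, num_lines=8))
    print("no-access:", no_access_policy(pattern, num_lines=8))
```

On this pattern the simple policy keeps more lines drowsy but pays more wake-ups, while the no-access policy does the opposite.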
Future multicore

Dark silicon (transistor under-utilization)
◦ Power constraints
 Transistors are powered down to reduce power
◦ Memory wall
 Computation waits for memory before it can continue
◦ Lack of parallelism
 Not enough work to keep the transistors busy
Hadi Esmaeilzadeh, Emily Blem, Renee St. Amant, Karthikeyan Sankaralingam, and Doug Burger. 2011. Dark silicon and the end of multicore scaling. In Proceedings of the 38th annual international symposium on Computer architecture (ISCA '11).
Future multicore

Power constraints
◦ New device: FinFET

Memory wall
◦ New technology: 3D IC

Lack of parallelism
◦ Automatic parallelization
Thank you !