Wide Operational Range Processor Power Delivery Design for Both

advertisement
He X, Yan GH, Han YH et al. Wide operational range processor power delivery design for both super-threshold voltage and
near-threshold voltage computing. JOURNAL OF COMPUTER SCIENCE AND TECHNOLOGY 31(2): 253–266 Mar.
2016. DOI 10.1007/s11390-016-1625-7
Wide Operational Range Processor Power Delivery Design for Both
Super-Threshold Voltage and Near-Threshold Voltage Computing
Xin He 1,2 , Student Member, CCF, IEEE, Gui-Hai Yan 1 , Member, CCF, ACM, IEEE
Yin-He Han 1 , Senior Member, CCF, IEEE, Member, ACM, and
Xiao-Wei Li 1 , Senior Member, CCF, IEEE
1
State Key Laboratory of Computer Architecture, Institute of Computing Technology, Chinese Academy of Sciences
Beijing 100190, China
2
University of Chinese Academy of Sciences, Beijing 100049, China
E-mail: {hexin, yan, yinhes, lxw}@ict.ac.cn
Received January 4, 2015; revised June 19, 2015.
Abstract
The load power range of modern processors is greatly enlarged because many advanced power management
techniques are employed, such as dynamic voltage frequency scaling, Turbo Boosting, and near-threshold voltage (NTV)
technologies. However, because the efficiency of power delivery varies greatly with different load conditions, conventional
power delivery designs cannot maintain high efficiency over the entire voltage spectrum, and the gained power saving may be
offset by power loss in power delivery. We propose SuperRange, a wide operational range power delivery unit. SuperRange
complements the power delivery capability of on-chip voltage regulator and off-chip voltage regulator. On top of SuperRange,
we analyze its power conversion characteristics and propose a voltage regulator (VR) aware power management algorithm.
Moreover, as more and more cores have been integrated on a singe chip, multiple SuperRange units can serve as basic
building blocks to build, in a highly scalable way, more powerful power delivery subsystem with larger power capacity.
Experimental results show SuperRange unit offers 1x and 1.3x higher power conversion efficiency (PCE) than other two
conventional power delivery schemes at NTV region and exhibits an average 70% PCE over entire operational range. It also
exhibits superior resilience to power-constrained systems.
Keywords
1
voltage regulator, power delivery, near-threshold computing, multicore processor
Introduction
Nowadays processor design has been gradually
changed from performance-oriented paradigm to power
efficiency oriented paradigm because of the approaching end of Dennard’s scaling[1] . To boost efficiency,
the state-of-the-art processors provide multiple performance states (P-states) to enable flexible power management, which has even been marked as a selling point
by processor vendors.
Processor operational range has been greatly enlarged. Implementing performance states (P-states) requires dynamic voltage frequency scaling (DVFS) techniques which dictate a power delivery subsystem sup-
porting a wide range of voltage levels. For example,
to enable all P-states of an Intelr Pentiumr processor,
the power delivery has to provide from 0.9 V at P5,
the lowest performance state, to 1.4 V at P0, the highest performance state[2] . The operational voltage range
will be further expanded when incorporating some advanced power management techniques such as Turbo
Boost[3] and near-threshold voltage (NTV)[4] technology. Turbo Boost requires higher-than-nominal voltage to support instant over-clocking, while NTV uses
lower-than-nominal voltage to achieve aggressive power
saving.
Wide operational range processor design brings new
challenges to power delivery design. However, the
Regular Paper
This work is supported by the National Natural Science Foundation of China under Grant Nos. 61572470, 61532017, 61522406,
61432017, 61376043, and 61221062.
©2016 Springer Science + Business Media, LLC & Science Press, China
254
J. Comput. Sci. & Technol., Mar. 2016, Vol.31, No.2
power delivery subsystem is never ideal. It draws power
when transferring power from power source to the loading processor. This decreases the power conversion efficiency (PCE), i.e., the ratio of processor power to
the total power drawn from the board power source.
Within optimal range, PCE can reach up to 90% with
well-designed on-board voltage regulator (VR), referred
to as off-chip inductor based VR (Off-VR in this paper).
However, out of that optimal range, PCE decreases significantly. This can greatly dampen and even totally
eat up the power benefit achieved by wide operational
range power management. This argument can be confirmed by Fig.1(a). The result shows PCE of a typical
Off-VR on different output voltage levels ranging from
1.1 V to 2.3 V[5] . The PCE reaches the optimal point
(89%) at the highest voltage of 2.3 V, but plunges to
merely 12% at the lowest 1.1 V. Clearly, in the lowvoltage states, the power efficiency cannot be high because the power loss in power delivery has dominated
the total power.
Power Efficiency (%)
100
80
60
40
20
Off-VR
0
1.1
1.4
1.7
2.0
Output Voltage (V)
2.3
Power Efficiency (%)
80
60
40
20
Off-VR
On-VR
LDO-VR
0
0.4
0.6
0.8
1.0
1.2
Output Voltage (V)
Fig.1. Power conversion efficiency at different output voltage
levels.
The optimal PCE of conventional Off-VR resides at
the high voltage end, but gradually decreases at low
voltage levels. This is not surprising since the power
delivery is designed for peak performance. This conventional wisdom becomes less effective when it comes
to the state-of-the-art processors with wide operational
voltage range. The future processors are likely required
to work in both traditional super-threshold voltage
(STV) mode and aggressive power-saving NTV mode
to achieve ultra-high efficiency[6] . This requirement,
however, cannot be served well with current power delivery designs. The looming challenge is how to design
a power delivery subsystem that can provide wide operational range and, more importantly, high PCE across
the whole range.
One intuitive solution is to design multiple Off-VRs
whose optimal points are evenly located in the operational range. However, we find that designing an OffVR based power delivery subsystem targeting low voltage range is impractical because of the degraded regulation quality and voltage scaling speed. Also, it is neither physically practical nor scalable because today’s
processors have been very power-intensive. Even one
Off-VR solution has greatly burdened the board design, not to mention multiple Off-VR solutions. Another traditional solution is to use an on-chip LDOVR to collaborate with Off-VR. Although LDO-VR
can be easily integrated on chip, PCE still remains a
big problem since LDO-VR’s PCE is limited to output/input voltage ratio, as confirmed in Fig.1(b). Note
that Off-VR exhibits higher PCE in Fig.1(b) than the
one in Fig.1(a). This is because Off-VR in Fig.1(a)
employs larger inductor to aggressively remove output
ripple, and thus incurs larger power loss, and Off-VR in
Fig.1(b) uses much smaller inductor to maintain output
ripple at an acceptable level based on the observation
in Subsection 4.1. Although the inductors of two OffVRs vary in size and inductances, the behaviors of PCE
share same trends.
We resort to on-chip switched capacitor based VR
(On-VR in this paper) to complement the Off-VR at
the lower-end voltage. The rationale is twofold. 1)
On-VR does not dictate on-board resources, but only
on-chip silicon. At 20 nm technology node, due to thermal constraint it is estimated that roughly 20% chip
area will become dark silicon, this number will grow
to more than 50% at 8 nm technology node[7] . On the
other side, on-chip switch capacitor voltage regulator
has gone increasing mature in recent years[8], and integrated capacitors can be used to implement voltage regulator in current CMOS processes without additional
fabrication steps. These imply we could trade increas-
Xin He et al.: Wide Operational Range Processor Power Delivery Design
ingly cheaper silicon to build On-VR for power delivery. 2) On-VR’s optimal point usually resides in low
voltage range compared with Off-VR, and hence it is
just right to complement Off-VR in terms of optimal
PCE. As Fig.1(b) shows the efficiencies of Off-VR and
On-VR change with varying load levels under fixed input voltages, note that the load current is scaled with
output voltage. Off-VR can achieve high conversion efficiency at high load levels, but when the load shifts
to low levels, Off-VR’s PCE degrades significantly. By
contrast, On-VR maintains high PCE at the low load
levels, but cannot feed high load levels. This observation provides a unique opportunity to achieve wide operational range delivery by judiciously building the synergy between Off-VR and On-VR. Based on this observation, we explore the design space of wide operational
range and examine several design candidates. Finally
we propose SuperRange, a wide operation range power
delivery unit, to maximize PCE over the whole range.
It is worth to note that the power delivery capacity of a SuperRange unit is limited, and under heavy
load condition, the PCE of SuperRange would degrade
significantly even if SuperRange could afford it. In
this case, we propose to use SuperRange units as basic building blocks to form a distributed power delivery
network. For example, the typical power of a 64-core
processor is 120 W, and the capacity of a SuperRange
unit is 30 W. In order to power up the manycore processor, the power delivery design should group the cores
into four power domains powered by SuperRange units.
In this way, high PCE could be still maintained. The
reasons are twofold: 1) processor cores rarely operate at
full speed and usually remain under-utilized, and thus
power consumption is far less than the typical power;
2) large load can be shared by several SuperRange domains. Taking an 8-thread application as an example,
each thread should be assigned to an active core. The
thread scheduling algorithm could map each thread to
one core in a different SuperRange domain, and thus
the burden of large load can be relieved. We resort to
hardware and software co-design to optimize PCE for
a manycore processor. Specifically, we find an optimal
core count of SuperRange domains and propose a load
sharing scheduling algorithm.
In particular, we make the following contributions.
1) We explore the design space of wide operational
range power delivery design. We thoroughly evaluate
the feasibility of possible design options, which motivates an optimal design.
255
2) We propose a wide operation range power delivery scheme, called SuperRange, to maximize PCE over
the whole range. SuperRange builds the synergy between Off-VR and On-VR.
3) On top of SuperRange, we further propose a VR
aware power management algorithm. This algorithm
is used to search for optimal processor power states to
maximize performance under given power budget.
4) We propose to use SuperRange unit as a basic
building block to form a distributed power delivery network design for wide operational range processor. Incorporated with load sharing scheduling algorithm, the
proposed design could keep high PCE and exhibit good
salability.
The rest of the paper is organized as follows. Section 2 gives the background. Section 3 analyzes the optional power delivery scheme and Section 4 details the
proposed SuperRange scheme. Section 5 proposes the
distributed power delivery network design. Section 6
describes the experiment. We introduce related work
in Section 7. Finally Section 8 summarizes this paper.
2
Background
Voltage regulator is the key component for delivering power to microprocessors. It has always been challenging to achieve high PCE. The widely used VRs are
1) linear VR and 2) switch-mode VR.
The most commonly used linear VR is low-dropout
regulator (LDO). Fig.2(a) shows a typical LDO regulator circuit. When the input voltage is higher than the
dropout voltage, LDO maintains a specified voltage.
In the dropout region, the PMOS pass element is simply a resistor. Voltage regulation is achieved through
dropout voltage tuning in LDO part. LDO-VR only
provides lower-than-input voltage levels, and its efficiency is limited to the ratio of output to input voltage.
Thus LDO-VR can achieve high PCE only when the
output voltage is close to the input voltage, and linearly degrades with the increasing gap between input
and output voltage.
The traditional switch-mode VRs refer to switching inductors VRs. Fig.2(b) shows a typical Off-VR
model[9] . The voltage conversion is achieved by two
parts: the bridge and the inductor. In the bridge
part, the two power switches switch on and off asynchronously at a specified switching frequency. This periodical operation generates a square wave of voltage
to charge and discharge the inductor. In the inductor
part, a low-pass output filter LC is employed to filter the square wave. From Fourier analysis, finally the
256
J. Comput. Sci. & Technol., Mar. 2016, Vol.31, No.2
output voltage value is equal to the average value of
the square wave, Vout = D × Vin , where D is duty cycle which is tunable to support different output voltage
levels.
D
S
PMOS Pass
Element
Vin
G
R
Vp
Vo
Error
Amplifier
R
Vref
(a)
Bridge
Inductor
Vin
Re
11 kg
Rb
Ri
Vout
Switch
Li
Cdcap
Cb
GND
GND
(b)
M
Vin
M
C1
C1
Vout
C1
M
GND
M
3
M
C1
C1
M
Vout
C1
C2
M
Vout
GND
C1
achieves high efficiency at high load level, but degrades
fast at low load where the switching loss dominates the
total VR power. Moreover, such type of VRs usually is
a discrete component and can only reside off chip due
to large form factor.
In recent years, another type of switch-mode VR
design, switch capacitor VR, goes increasingly mature. Switch capacitor VR consists of multiple switches
and capacitors organized into specific topologies. The
regulation is achieved by using capacitors as energy
storage elements and periodically charging/discharging
them. Fig.2(c) represents a typical switch capacitor
VR model. Basically, it is implemented into a serialparallel topology, and achieves 3:1 voltage conversion
ratio. In Fig.2(c), during charging phase, switches M 1,
M 5, M 7 are on, others are off, and capacitors C1 and
C2 are serially connected and charged, and in discharging phase, switches M 1, M 5, M 7 are off, others are
on, and capacitors C1 and C2 are parallel connected
and discharged. The circuit is periodically switched
between two configurations at a specified switch frequency.
Switch capacitor VR can gain high efficiency at a
few voltage levels near its target conversion ratio. Compared with inductor mode VRs, switch capacitor VRs
can be optimized to deliver power efficiently at low
load levels, and they can be built on chip thanks to
small physical size and compatible manufacturing process with the host chips. Even though switch capacitor
VR lacks the ability to deliver power at all load levels,
it is still a promising scheme to complement with other
schemes.
C1
(c)
Fig.2. (a) LDO-VR model. (b) Off-VR model. (c) On-VR
model.
This type of VRs is still imperfect in conversion efficiency. It suffers from three kinds of power losses:
switching loss, the resistive loss in the bridge, and the
conductive loss in the inductor. Similar to LDO-VRs, it
SuperRange Design Space Exploration
In this section, we aim to find an efficient power
delivery design towards wide operational range. This
design would have the capability to convert the board
voltage to a wide output voltage range. In this design,
the board voltage is set to 3.7 V which follows conventional cases[10] . The output voltage levels range from
NTV 0.4 V to STV 1.2 V[6] .
To derive the optimal design is challenging. We
therefore first explore the design space and analyze optional designs. These designs are evaluated from two
aspects: 1) power conversion efficiency and 2) regulation quality, explained as follows.
The power conversion efficiency (η) of a voltage regulator is defined as the ratio of the loading processor
power (Pload ) to the power drawn from power source,
257
Xin He et al.: Wide Operational Range Processor Power Delivery Design
that is
η=
Pload
1
=
.
loss
Pload + Ploss
1 + PPload
Clearly, we cannot achieve high efficiency if the
power loss (Ploss ) is comparable to and even overwhelming the load power. The PCE is obtained based on
existing Off-VR and On-VR models[8,11] .
The regulation quality refers to output voltage ripple. To maintain high voltage integrity, the upper
bound of voltage ripple is usually no more than 10%.
3.1
Off-VR Scheme
One intuitive solution is to design multiple Off-VRs
whose optimal output points are evenly located in the
operational range, as shown in Fig.3(a). There are three
sources of power loss: 1) switching loss Pcap , 2) resistive loss Pres in the bridge, and 3) conductive loss Pind
in the inductors:
ripple VR , because VR ∝ 1/f ; 2) longer response time,
i.e., processor power state transition would take longer,
as the switching frequency is reduced; 3) increased Pres
and Pind , because the effective current Irms and the resistance Ri will increase with the reducing frequency.
These trade-offs can be clearly observed in Fig.4.
The spice simulation and power modeling results show
how voltage ripple and efficiency change with reducing frequency. The initial switch frequency is 500 KHz
given suggested f is several hundreds of KHz. To shift
the range to low end, f has to be reduced to 133 KHz.
Unfortunately, the output ripple goes over 10% guard
line, but PCE merely increases 10%∼15%. The power
saving is doomed to be offset by the increased load
power due to higher voltage to tolerate the large ripple. Moreover, using one more Off-VR further burdens
PCB board design and incurs high board area overhead. Hence, the Off-VR scheme is not a recommended
design.
30
Ploss = Pcap + Pres + Pind ,
f=500 KHz
where
f=133 KHz
25
2
Pind = Ri Irms
Board Power
Supply
3.7 V
Off-VR
Off-VR
0.4 V~0.6 V
0.7 V~1.2 V
I2
= Rc (IL2 + R ),
3
IR2
2
= Ri (IL +
).
3
Board Power
Supply
3.7 V
0.4 V~1.2 V
0.4
0.5
0.6
Output Voltage (V)
(a)
On-VR
CPU
CPU
Chip
Chip
Chip
(b)
0
0.4 V~0.6 V 0.7 V~1.2 V
CPU
(a)
15 Max
Allowed
Ripple
10
Off-VR
Off-VR
2 V
LDO-VR
20
5
Board Power
Supply
3.7 V
(c)
Fig.3. Top view of optional power delivery designs. (a) Off-VR.
(b) LDO-VR. (c) On-VR + Off-VR.
In this design, we find the main culprit of low efficiency at low load levels is high switching loss Pcap ,
while Pres and Pind are relative small. (To reduce
the switching loss Pcap , we need to reduce switch capacitance C, input voltage Vin or switching frequency
f .) The most effective approach to reducing Pcap is
to reduce switching frequency (f ). However, decreasing switching frequency will incur new problems: 1) reduced quality of regulation, i.e., larger output voltage
Power Conversion Efficiency (%)
2
Pres = Rc Irms
Voltage Ripple (%)
Pcap = CVin2 f,
80
f=500 KHz
f=133 KHz
70
60
50
40
30
20
10
0
0.4
0.5
Output Voltage (V)
0.6
(b)
Fig.4. Voltage ripple and efficiency comparison while reducing
frequency from 500 KHz to 133 KHz. (a) Output voltage ripple
comparison. (b) PCE comparison.
258
3.2
J. Comput. Sci. & Technol., Mar. 2016, Vol.31, No.2
LDO-VR Scheme
Another solution is to use a large range LDO-VR.
LDO has the ability to deliver voltage levels below its
input voltage. However, the achievable PCE of LDO
regulator is limited by the output/input voltages ratio
(Vo /Vi ) and the quiescent current (Iq ) as follows:
η=
Io Vo
.
(Io + Iq )Vi
In this analysis, we ignore the quiescent current’s impact on PCE, because it is two orders of magnitude
smaller than the output current Io . Fig.3(b) shows this
power delivery scheme. A high efficiency Off-VR serves
as the frontend and feeds the step-down voltage to an
LDO regulator. The LDO regulator can be built on
chip or off chip, and we follow the most conventional
design and build it on chip. In this baseline, LDO delivers power from 2 V to the voltage from 0.4 V to 1.2 V.
The PCE is limited because of the low ratio of output
voltage to input voltage. Especially at near-threshold
region, the PCE is less than 30%. Therefore, using
LDO regulator is not an efficient solution to support
low load levels.
3.3
On-VR Scheme
We propose to use On-VR to deliver low output voltage levels as Fig.3(c) shows. To serve NTV efficiently,
one should reduce the gap between input and output
voltage. An Off-VR first steps 3.7 V power source down
to an intermediate value, then the On-VR further converts the voltage to 0.4 V∼0.6 V. Because the PCE is
the product of PCEs of On-VR and Off-VR, both of
them should be taken into careful consideration. The
On-VR model follows a high efficiency design[8] . The
power loss is shown as follows:
Ploss = PCfly + PRsw + Pbott-cap + Pgate-cap ,
where PCfly is switch capacitor loss, PRsw is switch conductance loss, Pbott-cap is parasitic capacitor switching
loss and Pgate-cap is gate parasitic capacitance switching loss. Fig.1(b) shows PCE of On-VR when OnVR converts voltage from the intermediate level 2 V
to 0.4 V∼0.6 V. Clearly, it achieves high conversion efficiency at low output voltage levels because it uses
lossless low power MOSFETs. However, On-VR alone
lacks the ability to support high voltage range. We
therefore need to use the frontend Off-VR as the complement in high voltage range. Specifically, the power
delivery steers to Off-VR and shuts down On-VR when
the loading processor needs STV, or steers to On-VR
when NTV is engaged. The on-chip power switches
have been well studied[12-13] . In this scheme, STV is
readily supported, but NTV is worthy to be further
clarified.
The voltage transfer function of On-VR can be formulated as follows[8] :
Vo = αVin − Io Ri , where Ri ∝ 1/f.
In this equation, Vo is output voltage, Vin is input voltage, Ri is inner resistance, f is switching frequency and
α is a topology specific parameter which defines the
conversion ratio which is constant given an On-VR design. According to the equation, the output voltage can
be modulated by 1) tuning the resistance of On-VR by
changing f and keeping the Vin constant, or 2) tuning
the input voltage produced by Off-VR and keeping the
resistance constant. However, these two approaches are
not equivalent to each other, and we find the second one
is preferred. The reason is explained as follows.
The first way is to adjust the operation state of
On-VR fed with fixed input voltage from frontend OffVR. The voltage is firstly converted from 3.7 V to a
lower value 2 V by Off-VR. Its PCE can be optimized
to 80%∼87%. Then On-VR converts the voltage from
fixed voltage 2 V to 0.6 V. Further voltage scaling, i.e.,
0.4 V∼0.6 V, is realized by tuning the switching frequency of On-VR. This design can regulate the voltage to near-threshold region with high regulation quality, but the PCE of On-VR suffers from the decreasing
switching frequency. The optimal operation point of
On-VR is limited in a narrow range near 0.6 V. Simulation result shows this design has an average efficiency
58% to support NTV. Although it is 20% higher than
conventional design, the efficiency is still very low at
the lowest 0.4 V output voltage.
Another alternative is to tune the Off-VR state to
generate a variable voltage Vin and use fixed On-VR
to step this intermediate voltage to NTV. For example, the Off-VR first changes the duty cycle and steps
the source voltage to a value between 1.3 V and 2 V,
and then the voltage value is converted to 0.4 V∼0.6 V
range by On-VR. Because On-VR does not need to decrease its switching frequency, the PCE can stay around
90%. However, given Vo Io ≈ 90% × Iin Vin in On-VR,
the output current of Off-VR, Iout , is very low, which
renders the Off-VR less efficient. In our case, the efficiency drops to 44% on average. Although such solution
is still not optimal, it sheds light on the way to further
259
Xin He et al.: Wide Operational Range Processor Power Delivery Design
optimization: to improve the efficiency of frontend OffVR under a low current mode, which is the key design
consideration in SuperRange scheme.
Proposed SuperRange Scheme
In this section, we use a multi-phase Off-VR (four
phases in this paper) as the frontend to build SuperRange. This design solves the low efficiency problem
of Off-VR at low current condition when feeding the
On-VR.
4.1
Proposed Power Delivery Design
Fig.5 shows the top view of the SuperRange scheme
and loading processor. The processor working voltage
covers NTV and STV. The STV levels are directly provided by 4-phase Off-VR, while the NTV levels are produced by On-VR which uses the Off-VR as its frontend.
The On-VR is a serial-parallel single topology VR with
3 : 1 conversion ratio. The Off-VR is a conventional Off1
VR, similar to commercial regulator LTC3733○
.
35
Vo=0.4 V
Output Voltage Ripple (%)
4
over 70%. Note that reducing the number of working
phase inevitably incurs larger ripple, but the increased
current ripple can be easily removed by using a relative
large inductance inductor.
The increased inductor size will not burden board
design since we use only one Off-VR in this design. In
our case, we find a 1.5 µH inductor is competent to ease
the extra ripple which is confirmed in Fig.6. An inductor with inductances larger than 1.5 µH can reduce the
ripple to a level lower than 7%. Thus we can dynamically reduce the number of working phase when more
load current is required.
30
Vo=0.5 V
Vo=0.6 V
25
20
Max Allowed Ripple
15
10
5
Vin
Board Power
Supply
Vout
0
3.7 V
0.5
1.0
1.5
2.0
2.5
Inductor Inductance (mH)
Cdcap
Off-VR
0
VID Code
Fig.6. Output ripple with varying inductor size.
Power Switch
0.4 V~0.6 V
Vout
0.7 V~1.2 V
PMU
Multicore
Processor
Chip
Voltage (V)
On-VR
1.3
1.1
STV 1.2 V
0.9
0.7
NTV 0.5 V
0.5
0.3
3.90 3.94 3.98 4.02 4.06 4.10
ms
Fig.5. Implementation of proposed power delivery.
The multi-phase Off-VR provides an opportunity
to improve the efficiency of frontend Off-VR under a
low current mode. Multi-phase technology is commonly
used to increase the load current. Each phase delivers
one equal portion of total load current. Modern Off-VR
has the capability to dynamically change the number
of working phases[14] . Therefore some phases can be
disabled to adapt to the low current mode, without degrading PCE. In our design, when delivering to NTV,
we find that a single working phase can afford the load,
and other phases can be disabled. The PCE increases
Basically, SuperRange has two working modes.
1) Supporting STV. Voltage conversion to STV is
performed by Off-VR. Off-VR receives and decodes
VID, and the power delivery is directly steered to OffVR. It changes output voltage based on VID.
2) Supporting NTV. The support to NTV region is
achieved by a two-step conversion. Off-VR first decodes
VID and the power delivery is steered to On-VR. Then
Off-VR sets to a single working phase, and the output
voltage is initially regulated to a variable intermediate
level based on VID. In our case, the intermediate level
is 1.3 V, 1.6 V or 2.0 V with the careful consideration
of On-VR’s resistance. After this, On-VR steps the
voltage to a corresponding near-threshold value.
By doing so, the proposed scheme can provide high
conversion efficiency over the entire load spectrum.
4.2
VR-Aware Power Management Algorithm
Supply voltage scaling decision is made by on-chip
power management unit (PMU). PMU is a micro-
1
○
LTC3733: 3-phase, buck controllers for AMD CPUs. http://cds.linear.com/docs/en/datasheet/3733f.pdf, Nov. 2015.
ferent load conditions; 3) processor core count; 4) power
budget.
100
Power Conversion Efficiency (%)
controller and runs power management routine which
makes decisions on core power states (voltage and frequency) and active core count selections. PMU collects power data from on-chip current sensors and performance statistics from performance counters. Power
management routine then uses data collected as input
and selects supply voltage level to achieve certain power
management goals. Once PMU makes a decision on the
selection of supply voltage level, PMU sends the corresponding VID (Voltage Identification) code to Off-VR.
The detailed PMU design is beyond the scope of this
paper.
To demonstrate the potential of SuperRange, we
present a VR-aware power management algorithm employed by PMU. The goal of this algorithm is to maximize system performance under given power budget. It
is worth to note that SuperRange can also be applied to
other management algorithms, not limited to this one.
We first exploit the efficiency of SuperRange at different load conditions as Fig.7 shows. Figs.7(a) and
7(b) show the PCEs when supporting NTV and STV
levels, respectively. Clearly, the efficiency of the SuperRange strongly correlates with output voltage and
current. Usually excessively low current cannot yield
high efficiency; hence the low current condition should
be avoided as much as possible.
Then we consider system power efficiency. The
power efficiency is defined as the ratio of performance
(billion instruction per second) to total power, a.k.a.
BIPS/Watt. Without considering the imperfect PCE,
the power efficiency increases quadratically with lower
voltage. This is because core power consumption reduces cubically while the performance degrades linearly
with lower voltage. However, there is a trade-off when
taking the imperfect PCE into account. Although high
voltage benefits high PCE, it degrades the application
power efficiency more. The following algorithm is used
to tackle this trade-off and the goal is to maximize the
performance under power budget.
The load condition, the efficiency of power delivery
system and the application performance serve as inputs to the algorithm. The load condition and performance can be measured through on-chip current sensor
and performance counter. We assume a homogeneous
multi-core processor, but the basic principle is also applicable to heterogenous processors, which is supposed
to be future work.
The problem can be formulated as follows:
• Given: 1) application power and performance in
different voltage levels; 2) SuperRange PCE under dif-
J. Comput. Sci. & Technol., Mar. 2016, Vol.31, No.2
Vo=0.4 V
Vo=0.5 V
80
Vo=0.6 V
60
40
20
0
0
1
2
3
Output Current (A)
4
5
(a)
80
Vo=0.7 V
Power Conversion Efficiency (%)
260
Vo=0.9 V
60
Vo=1.1 V
40
20
0
0
5
10
15
Output Current (A)
(b)
20
Fig.7. Per-phase efficiency at different load conditions. (a) OnVR & Off-VR model. (b) Off-VR model.
• Determine: the number of active cores and the
supply voltage level to maximize performance under
power budget.
The problem is solved with Algorithm 1. Basically,
it is a two-step process.
Step 1. The algorithm first computes the total
power Pv when all cores are active at each voltage level.
The power is the sum of measured core power and VR
power loss. Then it compares the power values with
given power budget, and then selects the lowest level
VL under which Pv is higher than PB and the highest
level VH under which Pv is lower than PB . The optimal
voltage setting will be chosen between the two voltage
levels.
Step 2. The algorithm then calculates the maximum
number of active core at VL which satisfies power budget, and gets the corresponding performance. Then it
compares the performance with the performance BVH at
261
Xin He et al.: Wide Operational Range Processor Power Delivery Design
VH , and picks the configuration which gives the higher
performance.
Algorithm 1. Searching for Optimal VF Setting and
Active Core Count
Input: core power Pvall , performance Bv at voltage level
v, processor core count Nall , power budget PB ; SuperRange PCE ηv
Output: output voltage level Vo ; active core number No
1: for each v do
Pv
2:
Pv = all //total power when all cores are active
ηv
3: end for
4: Select the highest voltage VH at which Pv is smaller than
PB , and get BVH ;
5: Select the lowest voltage VL at which Pv is higher than
PB ;
6: for i = Nall ; i > 0; i − − do
PLi
7:
P =
; //PLi is obtained by current sensor
ηv
8:
Collect Bi through performance counter;
9:
if P <= PB then
10:
BVL = Bi ;
11:
break;
12:
end if
13: end for
14: (Vo , No ) = max(BVH , BVL ); //final configuration gives
higher Bips
Distributed Power Delivery Network Design
Using SuperRange
In Section 3 and Section 4, we propose SuperRange
design, aiming at delivering power for wide operational
range multicore processor at high PCE. However, the
power delivering capacity of SuperRange unit is limited. When large load condition is incurred, PCE of
SuperRange would degrade significantly although SuperRange could afford such load. In this section, we
propose to use SuperRange unit as a basic building
block to form a distributed power delivery network. In
order to achieve high PCE, we analyze the correlation
between PCE and underlying load condition, and observe that optimal PCE could only be attained at a
modest load condition. We resort to hardware and software co-design to optimize PCE. PCE should be kept
at an acceptable level under proposed power delivery
organization, and it could be further increased through
load balancing.
Firstly we try to find the optimal core organization
of SuperRange, specifically, the number of cores that
SuperRange delivers to. When applying power delivery
to load processor, the most important goal is to maintain PCE an acceptable value, and usually the differ-
75
Power Conversion Efficiency (%)
5
ence of the lowest value to the highest value should be
kept less than 25%. We firstly exploit the relation between the power delivery PCE and the underlying core
count and take a manycore processor as an example,
and the result is shown in Fig.8. The underlying cores
are all running in their highest voltage/frequency setting. From this figure, the PCE first rises to an optimal
point (nearly 75%) with increasing core count. When
core count goes across a certain extent (four cores in
this case), PCE drops gradually, to 57% with 16 active
cores. Inspired by the result, the most straightforward
decision is that each SuperRange unit delivers to four
cores to meet the highest PCE point. However, for a
manycore system, equipping every a small number of
cores with one SuperRange unit will incur large area
overhead and lead to over-design. We observe that SuperRange can still keep high PCE even if SuperRange
delivers to more cores. This is because conventionally a
large number of cores are underutilized in a manycore
system, and some cores in SuperRange can be slowed
down to regain the PCE. Therefore, we find that deploying each SuperRange for 16 cores is a reasonable
solution. It cannot only maintain PCE above 57%, but
also incur negligible area overhead and exhibit superior
scalability.
ferret
bodytrack
dedup
swaptions
70
65
Acceptable PCE
60
55
50
0
5
10
Core Count
15 16
20
Fig.8. SuperRange PCE with increasing active core count.
Secondly we propose to use load sharing scheduling algorithm to further increase PCE. The distributed
power delivery network for a manycore processor is
shown in Fig.9. The processor cores are grouped into
several power domains, and each power domain is supported by SuperRange power delivery. This scheme
provides a unique opportunity to increase PCE. As we
mentioned above, PCE would suffer from excess load
262
J. Comput. Sci. & Technol., Mar. 2016, Vol.31, No.2
SuperRange
Domain 1
SuperRange
SuperRange
Domain 2
SuperRange
Domain 2
Domain 1
...
Active Region
...
Load
Sharing
Active Region
...
Domain 3
SuperRange
...
Domain 4
Domain 3
SuperRange
SuperRange
Domain 4
SuperRange
Fig.9. Application of SuperRange to multicore processor.
condition. In this scheme, excess load current can be
avoided by load sharing. As shown in Fig.9, without
loss of generality, let us assume a multi-programmed
workload running on 16 cores. Running 16 cores in
one domain would result in a reluctantly accepted PCE
value. Fortunately, as shown in this figure, load current
can be shared by four power domains. In each domain,
four cores are active and the others are shut down, thus
the PCE shifts to the high end. In this way, we can also
achieve high PCE at large load condition. Note that the
load can also be shared by other domains, and the decision making process should compare these results and
find the optimal setting.
6
6.1
Experiment
Experimental Setup
Baseline Architecture. The baseline is a multi-core
processor consisting of SuperRange domains. Each domain delivers power to 16 OoO (Out-of-Order) cores,
though SuperRange essentially can support any type
of architectures. Each domain baseline is deployed
on a 32 MB last level cache (LLC). The LLC is organized in eight banks and each bank is 8-way associative. Distributed directory based MESI cache coherence is enabled to maintain the shared data consistency.
The processor connects to the main memory through a
DDR3-1333 memory controller with 10.6 Gb/s bandwidth. Detailed information is provided in Table 1.
Voltage Regulators. The efficiency model of offchip inductor based VR is based on converter model[11] ,
and inductor and switches parameters are derived from
LTC3733’s datasheet. The switch capacitor VR follows an existing high efficiency model[8] , and uses a
3 : 1 serial-parallel topology voltage regulator. We also
build their LTspice models to simulate their behavior.
Detailed configurations including VR parameters and
area cost are listed in Table 2. The area overhead of
On-VR is 0.084 mm2 , which is less than 0.5% area of
the baseline domain.
Table 1. Baseline Architecture Configurations
Parameter
Core number
Power delivery
LLC capacity
LLC feature
Cache coherence
On chip interconnection
Memory controller
Memory bandwidth
Area (mm2 )
Value
16
Hybird
32 MB
32 B block, 8-way
Distribute directory-based MESI
Mesh + router
1
10 Gb/s
253.03
Table 2. Voltage Regulator Parameters
Parameter
Topology
Vin
Vout
Freq (Mhz)
Number of phases
L per ph (µH)
Intrinsic resistance (mohm)
Cfly (nF)
Area (mm2 )
Off-VR
Buck
3.7
0.7∼2.0
0.5
4
1.5
32
N/A
3.8
On-VR
Switch Cap
1.2∼2.0
0.4∼0.7
300
20
N/A
56
20
0.084
Simulation Infrastructure. We use a full-system
2
simulator gem5○
as our simulation infrastructure.
In addition, McPAT[15] is plugged in gem5 to evaluate power consumption and area overhead. The
technology is configured to 32 nm technology node.
2
○
The gem5 simulator system. http://www.m5sim.org, Nov. 2015.
263
Xin He et al.: Wide Operational Range Processor Power Delivery Design
Table 3. Microarchitectural Parameters
Parameter
Processor
Fetch/decode width
Branch-predictor type
Reorder buffer size
Unified load/store
FP/INT register file size
Supply voltage (V)
Clock rate (GHz)
Inst/data cache
Access time
Miss penalty
Area (mm2 )
Value
Out-of-Order core
4
Hybrid 2-Level
128
64
64/64
0.4 : 0.1 : 1.2
0.3 : 0.2 : 1.9
32 KB, 2-way
2 ns
12 ns
13.65
Benchmarks. We use Parsec benchmark suite[16] to
evaluate our design, because it targets the general purpose processors and aims to represent emerging workloads in the near future.
Power Domains. The power domains are powered
by SuperRange schemes. For other power consuming
components, we follow Intel Nehalem family processor design[17] , i.e., the LLC and memory controllers
are powered by their own off-chip VR. Because LLC
and memory controller contribute to fixed and relative
small portion of system power, our work focuses on core
power delivery.
6.2
Experimental Results
First, we show the overall power conversion efficiency of SuperRange domain in Fig.10. From the results, we can see that PCE at NTV increases by 40%
compared with conventional Off-VR design and the average PCE over entire operational range can be nearly
70%. This is because SuperRange not only implements
the synergy between Off-VR and On-VR to improve
PCE, but also proposes to optimize the PCE of frontend
Off-VR while maintaining On-VR high PCE (around
90%) at NTV levels. Since our scheme fully explores
the efficiency benefit of both Off-VR and On-VR, we
can conclude that the wide operational range now can
be well supported.
100
Power Conversion Efficiency (%)
We extend McPAT to calculate the power in different performance states. We build the Out-of-Order
(OoO) core resembling Alpha-EV6. The core configuration is detailed in Table 3. The cores have
nine performance states, and each state corresponds
to a specified voltage/frequency setting: Pstate1(1.2 V,
1.9 GHz), Pstate2(1.1 V, 1.7 GHz), Pstate3(1.0 V,
1.5 GHz), Pstate4(0.9 V, 1.3 GHz), Pstate5(0.8 V,
1.1 GHz), Pstate6(0.7 V, 0.9 GHz), Pstate7(0.6 V,
0.7 GHz), Pstate8(0.5 V, 0.5 GHz), Pstate9(0.4 V,
0.3 GHz).
NTV Region
STV Region
80
60
40
20
0
SuperRange
Off-VR
LDO-VR
0.4
0.6
0.8
1.0
1.2
Output Voltage (V)
Fig.10. Power conversion efficiency over the entire operational
range.
Second, we study the performance potential under a constant power budget. To simulate a powerconstrained system, the highest power budget is able
to power up eight cores at Pstate1, the highest performance state, and half of the cores have to stay in dark.
If more active cores turn out to yield higher performance, we need to reduce the voltage to enable lower
performance states, at the risk of large power loss in the
delivery. Fig.11 shows the maximum performance delivered by the target processor under three power delivery schemes (as shown in Fig.3) with 25% of the highest
power budget. The results show that the proposed SuperRange obviously outperforms the other two schemes.
This is because with the highest conversion efficiency
over the large voltage range, the processor has more
flexibility to tune between the single core performance
and the multiple cores parallelism. This flexibility provides the applications more opportunities to successfully enable the optimal multi-threading configurations.
By contrast, the LDO-VR and the Off-VR lead to less
flexibility because the efficiency loss when delivering to
low voltage range may exclude some multi-threading
configurations which turn out to be performance optimal.
The proposed SuperRange scheme not only outperforms the LDO-VR and the Off-VR at the constant
power budget, but also shows superior resilience on even
tighter power constraints. Fig.12 shows the normalized
performance of the target processor under shrinking
264
J. Comput. Sci. & Technol., Mar. 2016, Vol.31, No.2
power budget. X axis sets the system power budget,
Y axis shows the corresponding maximum achievable
performance under three delivery schemes. Each box
shows the distribution of the performance running Parsec Benchmarks. The results show that at loose power
budget, SuperRange behaves as well as Off-VR scheme.
When the budget gets tighter, the system prefers NTV
levels. The result shows the achievable performance
under SuperRange is far higher than the performance
in LDO-VR and Off-VR by 171% and 52% on average,
respectively. The result demonstrates that SuperRange
is a more promising solution in the dark silicon era.
2.5
Τ104
LDO-VR
Off-VR
SuperRange
Performance (MIPS)
2.0
1.5
The experiment is performed on four workloads. We assume each workload is made up of four Parsec applications, and each application has four threads. The workload configuration is detailed in Table 4. And active
cores are running on their highest power/performance
state. We compare the PCE under four different ways
of load sharing. 1) The load is shared by four SuperRange domains, referred to as “4 domain”. 2) The
load is balanced in two SuperRange domains, referred
to as “2 domain + balance”. 3) The load is imbalanced in two SuperRange domains, referred to as “2
domain + imbalance”. 4) The load is delivered by one
SuperRange domain, referred to as “1 domain”. The result is shown in Fig.13. We find that four-domain sharing method outperforms others and two-domain sharing
methods behave better than single domain delivering.
Under a heavy load condition, load sharing can always
reduce resistive loss in power delivery. There is a slight
difference of PCE about whether load is balanced or not
because of the small absolute difference of load condition.
1.0
Table 4. Workload Formulation
0.5
ps
Application
Workload 1
blackscholes, bodytrack, canneal, dedup
Workload 2
ferret, fluidanimate, freqmine, dedup
Workload 3
streamcluster, swaptions, vips, freqmine
Workload 4
blackscholes, canneal, swaptions, x264
vi
flu
fe
rr
id
et
an
im
st
at
re
e
am
cl
us
te
r
fr
eq
m
in
sw
e
ap
ti
on
s
de
du
p
bo
bl
ac
ks
ch
ol
es
dy
tr
ac
k
0
Workload
Fig.11. Performance comparison in power-constrained system.
80
75
Power Conversion Efficiency (%)
Normalized Performance
1.0
0.8
0.6
0.4
0.2
0
SuperRange
Off-VR
LDO-VR
100
90
80
70 60 50 40
Power Budget (%)
20
2 Domain+Balance
1 Domain
70
65
60
55
50
45
40
35
30
30
4 Domain
2 Domain+Imbalance
Workload 1
Workload 2
Workload 3
Workload 4
10
Fig.12. Performance under increasingly tighter power constraint.
Finally we evaluate the effectiveness of load sharing
in the distributed SuperRange power delivery design.
Fig.13. PCE comparison of different load sharing schemes.
7
Related Work
The design for high efficiency power delivery has
been a hot topic for years. Yan et al.[18] presented a
Xin He et al.: Wide Operational Range Processor Power Delivery Design
hybrid power delivery scheme to explore different power
phases, but they did not take the varying PCE into consideration. Ng and Sanders[19] and Le et al.[8] proposed
a high efficiency switched-capacitor DC-DC converter.
Ng and Sanders’ design aims at large input voltage
range, but the output voltage level is limited. Le et al.’s
design implements a multiple topology converter with
increased design complexity and on-chip area overhead.
It is unsuitable for powering microprocessors. Kim et
al.[20] designed an integrated 3-level DC-DC converter,
and their design targets at fast DVFS and does not
have high efficiency at a low load level. He et al.[21]
proposed a wide range power delivery design, but like
any other power delivery circuit design, the power delivering capability of their design is limited. All the
designs stated above lack the ability to support a wide
operational range for manycore processor efficiently.
Another line of prior work considers system level
strategies. Cho et al.[13] proposed a system level power
management method based on dynamic voltage regulator scheduling to achieve high conversion efficiency
over a large power range. They chose the most efficient voltage regulator with changing load levels. Sinkar
et al.[22] proposed a workload aware voltage regulator
configuration tuning method to optimize system power.
Amelifard and Pedram[23] introduced a reconfigurable
power delivery network design by dynamically changing
the topology of voltage regulators to support different
load levels. Ghasemi et al.[24] implemented per-core
voltage domains by using on-chip low dropout converters. Their method imposes low area overhead since the
converters share components with power gating circuit.
However, the LDO converter has a low conversion efficiency at low load levels. Yan et al.[25] applied pathgrained voltage adaptation to relieve the violation incurred by low voltage, and Han et al.[26] proposed the
redundancy-based data salvaging technique for fault recovery to enable near-threshold voltage. Both of Yan
et al.’s and Wang et al.’s design are orthogonal to design in this paper. Differing from these designs, our
work addresses the efficiency problem through a hybrid
power delivery scheme and highlights wide operation
range which is missed in prior work.
8
Conclusions
In this paper, we proposed a large operational range
power delivery scheme, SuperRange, by exploring the
advantages of both on-chip and off-chip voltage regulator. We thoroughly analyzed the efficiency beha-
265
vior of existing voltage regulators, and ensure that SuperRange can provide high power conversion efficiency
over the entire voltage range. Moreover, we proposed
a voltage regulator aware power management algorithm. This algorithm heuristically finds optimal processor configurations (number of active core and voltage/frequency setting) to maximize performance under given power budget. Experimental results showed
that SuperRange is well suitable to support wide power
range for future power-constrained systems.
References
[1] Esmaeilzadeh H, Blem E, Amant R S, Sankaralingam K,
Burger D. Dark silicon and the end of multicore scaling. In
Proc. ISCA, June 2011, pp.365-376.
[2] Intel. Enhanced Intelr SpeedStepr Technology for
the Intelr Pentiumr M Processor. http://download.intel.com/design/network/papers/30117401.pdf, Aug. 2015.
[3] Rotem E, Naveh A, Rajwan D, Ananthakrishnan A,
Weissmann E. Power management architecture of the 2nd
generation Intelr CoreTM microarchitecture, formerly
codenamed Sandy Bridge. http://120.52.72.36/www.hotchips.org/c3pr90ntcsf0/wp-content/uploads/hc archives/hc23/HC23.19.9-Desktop-CPUs/HC23.19.921.SandyBridge Power 10-Rotem-Intel.pdf, August 2011.
[4] Dreslinski R G, Wieckowski M, Blaauw D et al. Nearthreshold computing: Reclaiming Moore’s law through energy efficient integrated circuits. Proceedings of the IEEE,
2010, 98(2): 253-266.
[5] Kim J, Horowitz M. An efficient digital sliding controller for
adaptive power-supply regulation. IEEE Journal of SolidState Circuits (JSSC), 2002, 37(5): 639-647.
[6] Jain S, Khare S, Yada S et al. A 280mV-to-1.2V wideoperating-range IA-32 processor in 32nm CMOS. In Proc.
IEEE ISSCC, Feb. 2012, pp.66-68.
[7] Esmaeilzadeh H, Blem E, Amant R S, Sankaralingam K,
Burger D. Power limitations and dark silicon challenge the
future of multicore. ACM Transactions on Computer Systems (TOCS), 2012, 30(3): 11:1-11:27.
[8] Le H P, Sanders S R, Alon E. Design techniques for fully integrated switched-capacitor DC-DC converters. IEEE Journal of Solid State Circuits (JSSC), 2011, 46(9): 2120-2131.
[9] Schrom G, Hazucha P, Paillet F, Gardner D S, Moon S T,
Karnik T. Optimal design of monolithic integrated DC-DC
converters. In Proc. IEEE ICICDT, May 2006.
[10] Kim W, Gupta M S, Wei G Y, Brooks D. System level analysis of fast, per-core DVFS using on-chip switching regulators. In Proc. the 14th HPCA, Feb. 2008, pp.123-134.
[11] Choi Y, Chang N, Kim T. DC-DC converter-aware
power management for low-power embedded systems. IEEE
Trans. Comput.-Aid. Des. Integr. Circ. Syst., 2007, 26(8):
1367-1381.
[12] Amelifard B, Pedram M. Design of an efficient power delivery network in an SoC to enable dynamic power management. In Proc. ACM/IEEE ISLPED, August 2007, pp.328333.
[13] Cho Y, Kim Y, Joo Y, Lee K, Chang N. Simultaneous
optimization of battery-aware voltage regulator scheduling with dynamic voltage and frequency scaling. In Proc.
ACM/IEEE ISLPED, August 2008, pp.309-314.
266
[14] Zumel P, Fernandez C, de Castro A, Garcia O. Efficiency
improvement in multiphase converter by changing dynamically the number of phases. In Proc. the 37th PESC, June
2006.
[15] Li S, Ahn J H, Strong R D, Brockman J B, Tullsen D M,
Jouppi N P. McPAT: An integrated power, area, and timing
modeling framework for multicore and manycore architectures. In Proc. the 42nd Annual IEEE/ACM MICRO, Dec.
2009, pp.469-480.
[16] Bienia C, Kumar S, Singh J P, Li K. The PARSEC benchmark suite: Characterization and architectural implications. In Proc. the 17th PACT, October 2008, pp.72-81.
[17] Gunther S, Deval A, Burton T, Kumar R. Energy-efficient
computing: Power management system on the Nehalem
family of processors. Intel Technology Journal, 2010, 14(3):
50-65.
[18] Yan G, Li Y, Han Y, Li X, Guo M, Liang X. AgileRegulator: A hybrid voltage regulator scheme redeeming dark
silicon for power efficiency in a multicore architecture. In
Proc. the 18th IEEE HPCA, Feb. 2012, pp.287-298.
[19] Ng V, Sanders S. A 92%-efficiency wide-input-voltagerange switched-capacitor DC-DC converter. In Proc. IEEE
ISSCC, Feb. 2012, pp.282-284.
[20] KimW, Brooks D, Wei G Y. A fully-integrated 3-level
DCDC converter for nanosecond-scale DVFS. IEEE Journal of Solid State Circuits (JSSC), 2012, 47(1): 206-219.
[21] He X, Yan G, Han Y, Li X. SuperRange: Wide operational
range power delivery design for both STV and NTV computing. In Proc. DATE, March 2014.
[22] Sinkar A A,Wang H, Kim N S. Workload-aware voltage
regulator optimization for power efficient multi-core processors. In Proc. DATE, March 2012, pp.1134-1137.
[23] Amelifard B, Pedram M. Optimal selection of voltage regulator modules in a power delivery network. In Proc. the
44th ACM/IEEE DAC, June 2007, pp.168-173.
[24] Ghasemi H R, Sinkar A A, Schulte M J, Kim N S. Costeffective power delivery to support per-core voltage domains for power-constrained processors. In Proc. the 49th
ACM/IEEE DAC, June 2012, pp.56-61.
[25] Yan G, Han Y, Liu H, Liang X, Li X. MicroFix: Exploiting path-grained timing adaptability for improving powerperformance efficiency. In Proc. ACM/IEEE ISLPED, August 2009, pp.395-400.
[26] Han Y, Wang Y, Li H, Li X. Enabling near-threshold voltage (NTV) operation in multi-VDD cache for power reduction. In Proc. IEEE ISCAS, May 2013, pp.337-340.
Xin He received his B.E. degree
in software engineering from Sichuan
University, Chengdu, in 2011, and
his M.E. degree in computer architecture from Institute of Computing
Technology (ICT), Chinese Academy
of Sciences (CAS), Beijing, in 2013.
He is currently a Ph.D. candidate in
ICT, CAS. His research interests include computer
architecture especially on near-threshold computing,
low power architecture and approximate computing.
He is a student member of CCF/IEEE.
J. Comput. Sci. & Technol., Mar. 2016, Vol.31, No.2
Gui-Hai Yan is an associate professor of ICT, CAS. He received his B.S.
degree in electronic information science
and technology from Peking University,
Beijing, in 2005, and Ph.D. degree in
computer architecture from ICT, CAS,
in 2010. His research focuses on energyefficient and resilient architectures. He
was awarded the CCF Outstanding
Doctoral Dissertation and Chinese Academy of Sciences
Outstanding Doctoral Dissertation for contributions to
reliable computer architecture. He is a member of CCF,
ACM, and IEEE.
Yin-He Han received his B.E. degree
from the College of Automatization Engineering, Nanjing University of Aeronautics and Astronautics, Nanjing, in
2001, and his M.E. and Ph.D. degrees in
computer science from the Institute of
Computing Technology (ICT), Chinese
Academy of Sciences (CAS), Beijing, in
2003 and 2006, respectively. He is currently a professor
at the State Key Laboratory of Computer Architecture,
ICT, CAS. His research interests include computer architecture especially on fault-tolerant and low power architecture, VLSI design and test. Prof. Han was a recipient
of Best Paper Award at ATS 2003. He is program chair of
ATS 2014, finance chair of HPCA 2013, program co-chair of
WRTLT 2009, and serves on the Technical Program Committees of several IEEE and ACM conferences, including
PACT 2014, HPCA 2013, ASPDAC 2013, Cool Chip 2013,
ATS 2008∼2010, GVLSI 2009∼2010, etc. He is a senior
member of CCF and IEEE, and a member of ACM.
Xiao-Wei Li received his B.E. and
M.E. degrees in computer science from
Hefei University of Technology, Hefei,
in 1985 and 1988 respectively, and his
Ph.D. degree in computer science from
the Institute of Computing Technology
(ICT), Chinese Academy of Sciences
(CAS), Beijing, in 1991. Currently,
he is a professor and deputy director
of the State Key Laboratory of Computer Architecture,
ICT, CAS. His research interests include VLSI testing and
design verification, dependable computing, and wireless
sensor networks. He is an associate editor-in-chief of the
Journal of Computer Science and Technology, and a member of editorial board of Journal of Electronic Testing and
Journal of Low Power Electronics. In addition, he serves
on the Technical Program Committees of several IEEE
and ACM conferences, including VTS, DATE, ASP-DAC,
PRDC, etc. He was program co-chair of IEEE Asian
Test Symposium in 2003, and general co-chair of ATS 2007.
Download