Andrey Mokhov School of Computing Science

Hardware/Software Mechanisms for Cross-Layer Power Proportionality
“Power Prop”
Alex Yakovlev, Andrey Mokhov, Sascha Romanovsky, Max Rykunov, Alexei Iliasov and Danil Sokolov
Schools of EEE and CS, Newcastle University
Power Prop: The more you get, the more you give!
Moore’s law and Power trends
Part I: Power Proportionality
Power Proportionality
Issues reported in literature:
• The performance-power tradeoff for commodity systems is linear; the best strategy is “Race to sleep”; additional “run” power states are of little use; changes in existing commodity operating systems have little influence
• The focus should be on the time to transition to and from sleep!
• For new types of systems such as WSN there is a non-linear region; the slogan is: learn how to run CMOS slowly and exploit scheduling optimizations
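The “Race to sleep” argument can be illustrated with a small energy calculation. This is a minimal sketch under a linear power model; the function name, parameters and all numeric values are illustrative assumptions and are not taken from the cited measurements.

```python
def task_energy(freq_hz, cycles, deadline_s, p_static_w, k_w_per_hz, p_sleep_w):
    """Energy (joules) to run `cycles` at `freq_hz`, then sleep until the deadline.

    Assumed model (hypothetical): active power = p_static_w + k_w_per_hz * freq_hz,
    sleep power = p_sleep_w, and sleep transitions cost nothing.
    """
    busy_s = cycles / freq_hz
    if busy_s > deadline_s:
        raise ValueError("task does not meet the deadline at this frequency")
    run_j = (p_static_w + k_w_per_hz * freq_hz) * busy_s
    sleep_j = p_sleep_w * (deadline_s - busy_s)
    return run_j + sleep_j


# Illustrative numbers only: 1e9 cycles of work, 2 s deadline.
slow = task_energy(1e9, 1e9, 2.0, p_static_w=10.0, k_w_per_hz=2e-8, p_sleep_w=1.0)
fast = task_energy(2e9, 1e9, 2.0, p_static_w=10.0, k_w_per_hz=2e-8, p_sleep_w=1.0)
print(f"slow: {slow:.1f} J, fast: {fast:.1f} J")
```

Under this model the frequency-dependent part of the energy is the same at any speed, so whenever the static active power exceeds the sleep power, finishing early and sleeping costs less. The sketch ignores transition time, which is why the slide stresses the time to enter and leave sleep.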
[Figure: Core i7 power drawn at different frequencies. Source: S. Dawson-Haggerty et al., Power Optimization – Reality Check, UC Berkeley, 2009]
Power proportionality
[Figure labels: service-modulated processing; energy-modulated processing]
Part II: Reconfigurable Processors
Achieving Power Proportionality
• Support for wide range of voltages
– Asynchronous design
– Unstable voltage supply (energy harvesting)
• Components optimised for different modes
– Survival mode (power)
– Mission mode (energy efficiency)
– Emergency mode (performance)
• Reconfigurable instructions
– Altering instruction behaviour at runtime
Pathway from a high-level specification to a low-level MCU implementation
[Figure: design pathway with stages labelled CS + EE, Chip design, Chip tapeout, EE and Chip testing]
Reconfigurable Instructions
DP3(x, y) = x1y1 + x2y2 + x3y3
Resource-level refinement
Functionality: DP3(x, y) = x1y1 + x2y2 + x3y3
Abstract specification:
Initialisation: c := 0
Invariant: (c = 1) => (res = x1y1 + x2y2 + x3y3)
Event: if (c = 0) then (res := x1y1 + x2y2 + x3y3 & c := 1)
Open the black box and show what is inside:
- Perform multiplications by 2-input fast multipliers
- Perform addition by 3-input adder
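The abstract specification and its resource-level refinement can be sketched as a tiny state machine. The class and helper names below (`DP3Machine`, `mul2`, `add3`) are hypothetical and introduced only for illustration; they model the one-shot event and check the invariant against the functional definition of DP3.

```python
def dp3(x, y):
    """Functional definition: DP3(x, y) = x1y1 + x2y2 + x3y3."""
    return x[0] * y[0] + x[1] * y[1] + x[2] * y[2]


def mul2(p, q):
    """Stand-in for a 2-input fast multiplier (hypothetical helper)."""
    return p * q


def add3(p, q, r):
    """Stand-in for a 3-input adder (hypothetical helper)."""
    return p + q + r


class DP3Machine:
    def __init__(self, x, y):
        self.x, self.y = x, y
        self.c = 0        # Initialisation: c := 0
        self.res = None   # result is undefined until the event fires

    def invariant(self):
        # Invariant: (c = 1) => (res = x1y1 + x2y2 + x3y3)
        return self.c != 1 or self.res == dp3(self.x, self.y)

    def event(self):
        # Event guard: fires only while c = 0.
        if self.c == 0:
            # Refinement: three multiplications feed one 3-input adder.
            products = [mul2(a, b) for a, b in zip(self.x, self.y)]
            self.res = add3(*products)
            self.c = 1


m = DP3Machine((1, 2, 3), (4, 5, 6))
assert m.invariant()   # holds trivially while c = 0
m.event()
assert m.invariant()   # holds after the event because res equals DP3(x, y)
print(m.res)
```

The refinement changes only how `res` is computed inside the event; the guard and the invariant stay the same, which is what lets different resource allocations implement the same instruction.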
Reconfigurable Instructions
[Figure: DP3 implementation variants labelled 2 multipliers, Balanced, Fastest, Least peak power and Dedicated component; configuration codes 111, 001, 011, 101 and 000]
Reconfigurable Instructions
[Figure label: x=1 y=0 z=1]
Part III: Intel 8051
Final remarks
• Towards power proportionality
– Voltage range: 0.2V – 1.5V
– Performance range: 2.7K – 67M instructions/sec
• Survival of components
– Full capability mode: 0.89V – 1.5V
– RAM fails at 0.89V
– Program counter unreliable below 0.74V
– Asynchronous control survives until 0.2V
PCB board for evaluation
PCB board with FPGA
Project outcomes:
• Conference and journal papers:
– Towards Reconfigurable Processors for Power-Proportional
Computing, A. Mokhov, M. Rykunov, D. Sokolov and A. Yakovlev,
Proceedings of the 12th IEEE Low Voltage Low Power Conference
(FTFC), Paris, France, 2013.
– Design-for-Adaptivity of Microarchitectures, M. Rykunov, A.
Mokhov, D. Sokolov, A. Yakovlev and A. Koelmans, Proceedings of
the 24th IEEE International Conference on Application-specific
Systems, Architectures and Processors, Washington D.C., USA,
2013.
– Synthesis of processor instruction sets from high-level ISA
specifications, A. Mokhov, M. Rykunov, D. Sokolov, A. Yakovlev, A.
Iliasov, and A. Romanovsky. IEEE Transactions on Computers, 2013.
– Design of Processors with Reconfigurable Microarchitecture,
A. Mokhov, M. Rykunov, D. Sokolov, and A. Yakovlev, Journal of
Low Power Electronics and Applications, 2013. (Under review).
Project outcomes (cont.):
• Several MSc projects
• PhD thesis – “Design of Asynchronous Microprocessor for
Power Proportionality” (Nov. 2013).
• The PowerProp project established several important industrial
connections, e.g. Maxeler Technologies, IBM Research, etc.
• Some PowerProp theory, tool support and software ideas have
moved to a new Programme Grant -- PRiME (EP/K034448/1).
• CPU design ideas will be used in the SAVVIE project (EP/K012908/1).
• Helped to promote joint CS+EE developments in Workcraft
(graph-based EDA environment), used in several EPSRC projects.
Thank you!
Parameterised Graphs for formal
specification of Multi-modal systems
DP3 instruction computes dot product x·y = x1·y1 + x2·y2 + x3·y3.
– declaration of the functional units
a = unit "2-input adder"
b = unit "3-input adder"
c = unit "2-input multiplier"
d = unit "fast 2-input multiplier"
e = unit "dedicated DP3 unit"
– specification of each instruction
inst_a = (d1 + d2 + d3) -> b
inst_b = c1 -> c1 -> c1 -> b
inst_c = e
inst_d = (c2 + c1) -> a + c1 -> c1 -> a
inst_e = d1 -> d1 -> (a + c1) -> a
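The two operators used in these specifications, overlay (`+`) and sequence (`->`), can be sketched as operations on directed graphs. This is a minimal illustration and not the project's tooling: the `Graph` class and `unit` helper are hypothetical, `>>` stands in for `->`, and explicit parentheses are needed because Python's `+` binds tighter than `>>`, whereas on the slide `->` binds tighter than `+`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Graph:
    vertices: frozenset = frozenset()
    edges: frozenset = frozenset()

    def __add__(self, other):
        # Overlay: union of vertices and union of edges.
        return Graph(self.vertices | other.vertices, self.edges | other.edges)

    def __rshift__(self, other):
        # Sequence: overlay plus an edge from every left vertex to every right vertex.
        cross = frozenset((u, v) for u in self.vertices for v in other.vertices)
        return Graph(self.vertices | other.vertices,
                     self.edges | other.edges | cross)


def unit(name):
    """A single functional unit as a one-vertex graph (hypothetical helper)."""
    return Graph(frozenset([name]))


a, b = unit("a"), unit("b")
c1, c2 = unit("c1"), unit("c2")
d1, d2, d3 = unit("d1"), unit("d2"), unit("d3")

# inst_a = (d1 + d2 + d3) -> b
inst_a = (d1 + d2 + d3) >> b
# inst_d = (c2 + c1) -> a + c1 -> c1 -> a
inst_d = ((c2 + c1) >> a) + (c1 >> c1 >> a)

print(sorted(inst_a.edges))
print(sorted(inst_d.edges))
```

In this simplified model a unit that appears twice collapses into one vertex, so reusing `c1` in sequence shows up as a self-loop. That makes resource sharing visible, but it is an assumption of the sketch, not a statement about how the original formalism treats repeated units.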
Intel 8051 Instruction Set
CJNE Instruction
Branch taken
Branch not taken
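For readers unfamiliar with CJNE (compare and jump if not equal), the two outcomes can be sketched behaviourally. This is a simplified model of the standard 8051 semantics, not of the asynchronous implementation presented here; the function name and the tuple return value are assumptions made for illustration.

```python
def cjne(pc, op1, op2, rel):
    """Behavioural sketch of 8051 CJNE op1, op2, rel.

    pc  : address of the CJNE instruction (16-bit)
    op1 : first operand value (8-bit, unsigned)
    op2 : second operand value (8-bit, unsigned)
    rel : raw 8-bit relative offset, interpreted as two's complement
    Returns (next_pc, carry).
    """
    next_pc = (pc + 3) & 0xFFFF          # CJNE is a 3-byte instruction
    carry = 1 if op1 < op2 else 0        # carry is set when op1 < op2
    if op1 != op2:                       # branch taken
        offset = rel - 256 if rel >= 128 else rel
        next_pc = (next_pc + offset) & 0xFFFF
    return next_pc, carry


taken = cjne(0x0100, 5, 7, 0x10)       # operands differ: branch taken
not_taken = cjne(0x0100, 5, 5, 0x10)   # operands equal: fall through
print(taken, not_taken)
```

The branch-taken path does extra work (offset sign extension and a second addition), which is one reason the two outcomes can differ in latency and energy.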
Measurements: Current & Latency
Measurements: Power
Measurements: Energy Efficiency
Some measurements…
• 0.89V to 1.5V: full capability mode.
• 0.74V to 0.89V: the RAM starts to fail at 0.89V, so the chip can only operate using internal registers.
• 0.22V to 0.74V: the program counter starts to fail at 0.74V, but the control logic synthesised using the CPOG model continues to operate correctly down to 0.22V.
• 67 MIPS at 1.2V.
• ~2700 instructions per second at 0.25V.
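The measured voltage ranges can be summarised as a lookup from supply voltage to surviving capability. This is a sketch using only the thresholds listed above; the function name and mode labels are hypothetical, and assigning the boundary voltages 0.89V and 0.74V to the degraded mode is an assumption, based on the statement that failures start at those voltages.

```python
def operating_mode(vdd):
    """Map a supply voltage (volts) to the capability reported for the chip.

    Boundary handling is an assumption: 0.89 V and 0.74 V are treated as
    already degraded because failures are reported to start there.
    """
    if vdd > 1.5:
        return "above measured range"
    if vdd > 0.89:
        return "full capability"
    if vdd > 0.74:
        return "internal registers only (RAM failing)"
    if vdd >= 0.22:
        return "control logic only (program counter failing)"
    return "below measured range"


for v in (1.2, 0.8, 0.5, 0.1):
    print(f"{v:.2f} V -> {operating_mode(v)}")
```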