Architecture Exploration For Ambient Energy Harvesting
Nonvolatile Processors
Introduction
• Future: devices powered by technology that harvests ambient energy
• Battery-free systems
• Ambient energy sources:
Solar Energy
Wi-Fi and Radio Frequency (RF) energy
Motion energy – Piezoelectric devices
E.g., a wirelessly powered smart contact lens
Application Categories
Applications vary in complexity, throughput constraints and computational
demands.
Based on their demand for nonvolatility, applications are categorized into:
1. Signal detection and sensing: detection and relaying. E.g., UV radiation,
blood pressure, blood sugar level, temperature
2. Signal detection and analysis: computation is carried out to analyze the
signal for diagnosis. E.g., wearable EEG/ECG
3. Signal prediction: predicts future patterns. E.g., wearable systems that warn
against seizures
Ambient energy sources are unreliable. Category 1 is easier to implement.
Categories 2 and 3 require QoS (work must be completed within a fixed time).
Energy Harvesting System Structure
• Energy Harvesting and Management:
Determines the total power used for signal
sensing, processing and transmission
• Digital Signal Processor: More about it
later
• I/O Interface and analog RF frontend:
Digital interfaces, antennas, etc
Processor Design: Volatile Vs Nonvolatile
• Volatile processor with periodic checkpointing: a power failure forces
rollback to the previously checkpointed state
• NV processor: enables more complex state-dependent signal processing that
tolerates power source insufficiency and unreliability, but consumes more
power for reads and writes
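The contrast between the two designs can be illustrated with a small Python sketch of forward progress under an intermittent supply. The power trace, the checkpoint interval and the function names are illustrative assumptions, not taken from the slides, and backup and restore costs are ignored.

```python
def volatile_progress(power_trace, checkpoint_interval):
    """Work retained by a volatile core that checkpoints periodically.

    power_trace: iterable of booleans, True when power suffices for one cycle.
    On a power failure the core rolls back to the last checkpointed count.
    """
    done = 0   # cycles of useful work completed so far
    saved = 0  # progress recorded at the last checkpoint
    for powered in power_trace:
        if powered:
            done += 1
            if done % checkpoint_interval == 0:
                saved = done
        else:
            done = saved  # forced rollback
    return done


def nonvolatile_progress(power_trace):
    """Idealized NV core: every powered cycle is kept (backup cost ignored)."""
    return sum(1 for powered in power_trace if powered)


# Hypothetical trace: 7 powered cycles, one outage, then 5 powered cycles.
trace = [True] * 7 + [False] + [True] * 5
print(volatile_progress(trace, checkpoint_interval=4))
print(nonvolatile_progress(trace))
```

In this toy trace the volatile core loses the three cycles of work done after its last checkpoint, while the idealized NV core keeps all twelve.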
Architectural Exploration
Parameters to be analyzed:
•Number of pipeline stages
•Data to be backed up
•Frequency of backup
Assumptions:
•MIPS ISA
•Clock frequency: 8 kHz, because of the limited strength of the Wi-Fi signal used
•Instruction memory (ROM) and ICache (SRAM, NVM)
•Data memory (nonvolatile) and DCache (NV write-back)
Non-pipelined Configuration (NP)
Entire state of the processor can be characterized by a single instruction state
• Program Counter (PC): identifies the instruction being executed and needs to be stored
• Register File (RegFile): a volatile RegFile is energy efficient because it is
used frequently, with a large number of reads and writes
Tradeoff between energy consumed in backing up and recovering data and
the overall performance
Which data to save? When to save? 3 policies:
• Backup Every Cycle (BEC)
• On Demand All Backup (ODAB)
• On Demand Selective Backup (ODSB)
NV – Backup Every Cycle (BEC)
• Employs an NVM RegFile in spite of the significant energy penalty; otherwise
both volatile and nonvolatile copies would need to be updated every cycle
• PC and a few registers in the RegFile are written every cycle
• Instructions like StoreWord and Jump do not require RegFile write
NV – On Demand All Backup (ODAB)
• All RegFile entries are backed up in the event of a reduced power state
• If input power < preset threshold, the power warning signal is activated
• Control unit backs up the PC and resets the atomic flag
• When power is restored, energy is accumulated in the capacitor
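The ODAB flow can be sketched in Python as below. This is a minimal illustration, not the processor's actual control logic: the class name OdabCore, the threshold value and the dictionary standing in for the nonvolatile backup block are all assumptions, and the atomic flag is not modeled.

```python
from dataclasses import dataclass, field

NUM_REGS = 32  # MIPS general-purpose registers


@dataclass
class OdabCore:
    threshold_uw: float                       # hypothetical warning threshold
    pc: int = 0
    regs: list = field(default_factory=lambda: [0] * NUM_REGS)
    nvm: dict = field(default_factory=dict)   # stands in for the NV backup block

    def power_warning(self, input_power_uw):
        """Warning is raised when input power falls below the threshold."""
        return input_power_uw < self.threshold_uw

    def backup(self):
        """Copy the PC and every register to NVM; return words written."""
        self.nvm = {"pc": self.pc, "regs": list(self.regs)}
        return 1 + NUM_REGS

    def restore(self):
        """Reload the PC and all registers from NVM after power returns."""
        self.pc = self.nvm["pc"]
        self.regs = list(self.nvm["regs"])


core = OdabCore(threshold_uw=50.0)
core.pc, core.regs[3] = 0x40, 7
if core.power_warning(input_power_uw=30.0):
    words_written = core.backup()
```

Note that the number of words written is the same on every backup, whether or not the registers changed since the last one.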
NV – On Demand Selective Backup (ODSB)
• Synchronous power warning signal ensures that the instruction at the current
PC finishes executing and writing back. PC + 4 is stored to avoid re-execution
• A change flag identifies whether a register has been written
• Control unit doesn't generate addresses for unchanged data
• Reduces backup time and energy penalty
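A minimal Python sketch of the change-flag idea follows. The class and function names are hypothetical and the NVM is modeled as a plain list; the point is only that registers whose flag is clear are skipped during backup.

```python
INSTR_BYTES = 4  # MIPS instructions are 4 bytes wide


class OdsbRegFile:
    def __init__(self, num_regs=32):
        self.regs = [0] * num_regs
        self.changed = [False] * num_regs   # one change flag per register
        self.nvm = [0] * num_regs           # stands in for the NV backup block

    def write(self, idx, value):
        self.regs[idx] = value
        self.changed[idx] = True

    def backup(self):
        """Write only flagged registers to NVM; return how many were written."""
        written = 0
        for idx, dirty in enumerate(self.changed):
            if dirty:
                self.nvm[idx] = self.regs[idx]
                self.changed[idx] = False
                written += 1
        return written

    def restore(self):
        self.regs = list(self.nvm)
        self.changed = [False] * len(self.regs)


def pc_to_store(current_pc):
    """The current instruction has completed, so PC + 4 is stored."""
    return current_pc + INSTR_BYTES


rf = OdsbRegFile()
rf.write(2, 10)
rf.write(5, 20)
rf.write(2, 11)           # same register again: still one flagged entry
first = rf.backup()       # writes registers 2 and 5 only
second = rf.backup()      # nothing changed since the last backup
```

Compared with backing up all 32 registers, the cost of each backup here scales with the number of registers written since the previous backup.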
Simulation Results And Comparison
• Total area is similar across the policies because the NVM cache and backup
blocks are much bigger than the logic
• BEC has lowest peak frequency due to frequent backups
• Recovery time: Time from activation of Energy OK signal to the time all
backup operations are complete
• ODSB backup time < ODAB backup time
Simulation Results And Comparison
• ODSB is more energy efficient with
stable source like solar
• ODSB can reduce backup energy
penalty by 69% with 0.002% area
overhead
• BEC doesn't need time to
accumulate energy in the capacitor;
viable when power failure is extremely
frequent (less than 1 in 10 cycles)
N-stage-pipeline
• Increased circuit complexity and activity factor result in a higher power
threshold compared to the non-pipelined processor
• 5 Stage Pipeline (5SP) under study
• Two backup schemes proposed:
1) Shifted PC and Volatile Flip-flops (SPC/VFF)
2) Nonvolatile Flip-flops Solution (NVFF)
Shifted PC & Volatile FF (SPC/VFF)
• Pipelined data flow with bypass and forward, complex control flow to
handle hazard
• Shifter buffer stores the PC value in each pipeline stage
• When power goes down, the instruction in the write back stage is allowed to
finish; the unfinished PC to be backed up is the one in the data memory stage
• Shifter used instead of rolling back since different
PC needs to be backed up for jump and branch
• An extra 4 clock cycles are needed to re-execute
the last 4 instructions lost from the latter pipeline stages after recovery
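The shifter buffer can be sketched in Python as a five-entry queue of PCs, one per stage. The class name ShiftedPC and the stage labels are assumptions made for illustration; hazards, stalls and flushes are not modeled.

```python
from collections import deque

STAGES = ("IF", "ID", "EX", "MEM", "WB")


class ShiftedPC:
    def __init__(self):
        # One slot per pipeline stage; None marks an empty stage.
        self.slots = deque([None] * len(STAGES), maxlen=len(STAGES))

    def clock(self, fetched_pc):
        """One cycle: the new PC enters IF and older PCs shift toward WB."""
        self.slots.appendleft(fetched_pc)

    def pc_in(self, stage):
        return self.slots[STAGES.index(stage)]

    def pc_to_back_up(self):
        """WB finishes before power-down, so the oldest unfinished PC is in MEM."""
        return self.pc_in("MEM")


spc = ShiftedPC()
for pc in (0, 4, 64, 68, 72):   # the instruction at PC 4 jumps to PC 64
    spc.clock(pc)
backed_up = spc.pc_to_back_up()
```

With a jump in flight, subtracting a fixed offset from the fetch PC (72 - 3 * 4 = 60) would not give the PC in the data memory stage (4), which is why each stage's PC is tracked explicitly. Resuming from the backed-up PC re-executes the four instructions that were in the MEM, EX, ID and IF stages.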
Nonvolatile FF Solution (NVFF)
• This solution uses NVM flip-flops
• SPC/VFF requires 11% less time and 57% less energy than NVFF
Out-of-order Processor (OoO)
• More complex than NP and
5SP
• System state is broadly
distributed across structures
such as PC, ROB, RegFile, Map
Table, Issue Queue, Load Store
Queue, BHT and BTB
• Larger power requirement →
fewer periods where the input
power exceeds the min
threshold. Which structures
need to be backed up?
Resource Selection Strategies
The resource selection strategies proposed are:
1)Minimum State Resource backup solution (MinR)
2)Low-latency Backup solution (LLB)
3)Middle-level Backup solution (MLB)
4)Min-state-lost Backup solution (MPL)
5)Integrated Flexible Atomic Backup Solution (IFA)
Resource Selection Strategies
• Minimum State Resource backup solution (MinR):
Backs up min number of bits required to preserve functionality
Depends on branch misprediction mechanism to minimize the number of
valid/ relevant state bits prior to backup.
ROB and PC: Backs up the first uncommitted PC at the head of ROB
ARegFile is backed up as it is small
Map Table: Pseudo-Misprediction is used to restore Map table
PRegFile, Ready Table, Free List, BHT, BTB can be recovered
Resource Selection Strategies
• Low Latency Backup solution (LLB): Aims to minimize the number of bits to
store if backup begins immediately
Backs up the entire ROB, IQ, ARegFile, Map Table and PRegFile
• Middle-level Backup solution (MLB): Backs up Ready table and Free List as
well
• Min-state-lost Backup solution (MPL): All structures including BHT, BTB backed
up
• Integrated Flexible Atomic Backup Solution (IFA): Even if the power is below
threshold, it can optimistically attempt to store optional state (such as the
BHT)
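The first four strategies differ mainly in which structures they save, which can be written down as sets. In the Python sketch below the structure names follow the slides, but every bit count is a hypothetical placeholder chosen only to make the example run; IFA is left out because its optional state depends on the energy available at backup time.

```python
# Bit counts are HYPOTHETICAL placeholders, not figures from the paper.
STRUCTURE_BITS = {
    "PC": 32, "ARegFile": 1024, "ROB": 4096, "IQ": 2048, "LSQ": 1024,
    "MapTable": 256, "PRegFile": 2048, "ReadyTable": 64, "FreeList": 384,
    "BHT": 2048, "BTB": 8192,
}

LLB = {"ROB", "IQ", "ARegFile", "MapTable", "PRegFile"}
MLB = LLB | {"ReadyTable", "FreeList"}

STRATEGIES = {
    "MinR": {"PC", "ARegFile"},      # first uncommitted PC plus ARegFile
    "LLB": LLB,
    "MLB": MLB,
    "MPL": set(STRUCTURE_BITS),      # all structures, including BHT and BTB
}


def backup_bits(strategy):
    """Total bits written to NVM for the given strategy."""
    return sum(STRUCTURE_BITS[name] for name in STRATEGIES[strategy])


for name in STRATEGIES:
    print(name, backup_bits(name))
```

Whatever the real sizes are, the set inclusion LLB ⊂ MLB ⊂ MPL means backup volume can only grow along that sequence; what is gained in exchange is less state to reconstruct after recovery.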
OoO Strategies Comparison
In MinR, the pseudo-misprediction operation for the Map Table requires extra
backup clock cycles. During recovery, extra clock cycles are needed to restore
the PRegFile, Ready Table and Free List
OoO Strategies Comparison
• LLB: ROB, PRegFile are large → increased backup time and energy. Recovery
energy is smaller as instructions in ROB are backed up (no re-execution)
• MPL incurs largest backup and recovery penalties, but backing up all structures
incurs min latency to return to peak performance after a power failure
• OoO needs higher threshold, but periods of sufficient power are common
enough to allow superior performance to pay for lost clock cycles
Simulation Results
• The configurations are compared with
baseline non-pipelined volatile processor
without checkpointing or data backup
• The volatile processor’s progress returns to
zero when power drops to below threshold
• Nonvolatile NP and 5SP have higher power
threshold
• OoO runs for only a small fraction of time but
its performance can be up to 4x faster than NP
and 5SP
Validation
• The non-pipelined On Demand strategy was explored using an actual
fabricated processor (THU1010N)
• It has an Intel 8051-like CISC architecture
• The saved state includes the state machine that captures the current instruction
• PC and RegFiles are FeRAM-based FFs. The FFs have an additional backup FeCap
• NV processor based system is interfaced to a solar panel and UV sensor
Operation
• Upon power failure detection, NV control logic backs up DFFs to FeCaps
• When power resumes, data is restored from FeCaps to DFFs
• Internal RC oscillator is used. An external oscillator becomes unstable at low power
Simulator calibration:
• Several kernels executed both on platform and simulator
• Intermittent power supply modeled by a 1 kHz square waveform
• Processor frequency: 3 MHz
• Each kernel is executed 1000 times to obtain completion time
• Stable power case: No mismatch; Unstable power case: mismatch < 5%
• Simulator averages energy consumed by instruction to estimate remaining
energy
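Two of the calibration points above lend themselves to a quick worked example in Python. The 50% duty cycle of the square wave, the nanojoule figures and all names are assumptions for illustration; the slides give only the 1 kHz and 3 MHz values and state that the simulator averages per-instruction energy.

```python
def cycles_per_on_phase(clock_hz, supply_hz, duty=0.5):
    """Processor cycles that fit in one powered phase of the square wave."""
    return int(clock_hz * duty / supply_hz)


class EnergyEstimator:
    """Tracks remaining energy using one average cost per instruction."""

    def __init__(self, stored_nj, avg_nj_per_instr):
        self.remaining_nj = stored_nj
        self.avg_nj_per_instr = avg_nj_per_instr

    def retire(self, count=1):
        """Charge the average cost for each retired instruction."""
        self.remaining_nj -= count * self.avg_nj_per_instr

    def can_run(self, count):
        """True if the remaining energy covers `count` more instructions."""
        return self.remaining_nj >= count * self.avg_nj_per_instr


on_cycles = cycles_per_on_phase(clock_hz=3_000_000, supply_hz=1_000)

est = EnergyEstimator(stored_nj=100.0, avg_nj_per_instr=2.0)  # hypothetical
est.retire(10)
```

Under the 50% duty-cycle assumption, each powered phase lasts 0.5 ms, which at 3 MHz is 1500 cycles between interruptions.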
Dependence On Input Power
• Input signal characteristics play
a major role in determining
the optimal design.
• Performance of backup
schemes with home and office
Wi-Fi sources for harvesting
• At home, the NP ODSB architecture
performs best; in the office, OoO
MPL is most desirable
Dependence On Nature Of Input Source
• Input energy sources differ in magnitude
• For each case, the best performing
backup policy is adopted
• For same input power source, the actual
execution time for NP and 5SP is almost
same
• Higher power threshold in OoO results in
longer Off time
Meeting QoS Requirements
• Some applications (like ECG)
require periodic outputs within
fixed time periods (QoS constraints)
• Ambient energy - unreliable
• Piezo and solar can provide almost
100% QoS
• QoS can be improved by:
Shrinking size and using FinFETs
Power reduction techniques: dark
silicon aware architecture, clock
gating, DVFS, DATS, Tunnel FET, low
power sub-threshold circuits
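One simple way to put a number on QoS is the fraction of periodic jobs that finish within their period. The Python sketch below assumes that definition, which the slides do not state explicitly; the function name and the sample completion times are hypothetical.

```python
def qos(completion_times_ms, period_ms):
    """Fraction of periodic jobs that completed within their period.

    completion_times_ms: time each job took, or None if it never finished
    (for example because power did not return in time).
    """
    met = sum(
        1 for t in completion_times_ms
        if t is not None and t <= period_ms
    )
    return met / len(completion_times_ms)


# Hypothetical run: four jobs with a 10 ms period.
score = qos([8.0, 12.5, 9.9, None], period_ms=10.0)
```

Here two of the four jobs meet the deadline, so the score is 0.5; a source that always delivers enough energy would score 1.0.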
Conclusion
• Explored various factors in battery-less systems powered by ambient energy
• Intermittent energy source: Different nonvolatile processor configurations,
techniques to conserve state while maximizing forward progress
• Examined tradeoffs between performance and energy for different
architectures
• Compared and validated simulation results with nonvolatile solar energy
harvesting processor platform
• The video of HPCA 2015 Best Paper Competition Demo
References
• KaiSheng Ma, Yang Zheng, Shuangchen Li, Karthik Swaminathan, Xueqing Li,
Yongpan Liu, Jack Sampson, Yuan Xie, Vijaykrishnan Narayanan.
"Architecture Exploration for Ambient Energy Harvesting Nonvolatile
Processors". In International Symposium on High-Performance Computer
Architecture (HPCA-21).
• A. Parks, A. Sample, Z. Yi, and J. Smith. A wireless sensing platform utilizing
ambient RF energy. In IEEE Radio and Wireless Symposium (RWS), 2013.
• S. Kannan, A. Gavrilovska, K. Schwan, and D. Milojicic. Optimizing
checkpoints using NVM as virtual memory. In IPDPS, 2013.
• X. Dong, C. Xu, Y. Xie, and N. Jouppi. NVSim: A circuit-level performance,
energy, and area model for emerging nonvolatile memory. IEEE Transactions
on Computer-Aided Design of Integrated Circuits and Systems, 31(7):994–
1007, 2012.
Questions?
Thank You