Architecture Exploration For Ambient Energy Harvesting Nonvolatile Processors

Introduction
• Future devices: powered by technology that harvests ambient energy
• Battery-free systems
• Ambient energy sources:
  – Solar energy
  – Wi-Fi and radio frequency (RF) energy
  – Motion energy (piezoelectric devices)
E.g. wirelessly powered smart contact lens

Application Categories
Applications vary in complexity, throughput constraints and computational demands. Based on their demand for nonvolatility, they are categorized into:
1. Signal detection and sensing: detection and relaying. E.g. UV radiation, blood pressure, blood sugar level, temperature
2. Signal detection and analysis: computation is carried out to analyze the signal for diagnosis. E.g. wearable EEG/ECG
3. Signal prediction: predicts future patterns. E.g. wearable systems that warn against seizures
Ambient energy sources are unreliable.
Category 1 is easier to implement.
Categories 2 and 3 require QoS (computation must complete within a fixed time).

Energy Harvesting System Structure
• Energy Harvesting and Management: determines the total power used for signal sensing, processing and transmission
• Digital Signal Processor: more about it later
• I/O Interface and analog RF frontend: digital interfaces, antennas, etc.

Processor Design: Volatile Vs Nonvolatile
• Volatile processor with periodic checkpointing
  – Forced rollback to the previously checkpointed state
• NV processor: enables more complex state-dependent signal processing that tolerates power source insufficiency and unreliability
  – Consumes more power for reads and writes

Architectural Exploration
Parameters to be analyzed:
• Number of pipeline stages
• Data to be backed up
• Frequency of backup
Assumptions:
• MIPS ISA
• Clock frequency: 8 kHz, because of the limited strength of the Wi-Fi signal used
• Instruction memory (ROM) and ICache (SRAM, NVM)
• Data memory (nonvolatile) and DCache (NV write-back)

Non-pipelined Configuration (NP)
The entire state of the processor can be characterized by a single instruction state
• Program Counter
(PC): identifies the instruction being executed and needs to be stored
• Register File (RegFile): a volatile RegFile is more energy efficient, given its frequent use and the large number of reads and writes
There is a tradeoff between the energy consumed in backing up and recovering data and the overall performance.
Which data to save? When to save? 3 policies:
• Backup Every Cycle (BEC)
• On Demand All Backup (ODAB)
• On Demand Selective Backup (ODSB)

NV – Backup Every Cycle (BEC)
• Employs an NVM RegFile in spite of the significant energy penalty; otherwise both a volatile and a nonvolatile copy would need to be updated every cycle
• The PC and a few registers in the RegFile are written every cycle
• Instructions like StoreWord and Jump do not require a RegFile write

NV – On Demand All Backup (ODAB)
• All RegFile entries are backed up in the event of a reduced power state
• If input power < preset threshold, the power warning signal is activated
• The control unit backs up the PC and resets the atomic flag
• Upon power restore, energy is accumulated in the capacitor

NV – On Demand Selective Backup (ODSB)
• A synchronous power warning signal ensures that the instruction at the current PC finishes executing and writing back.
PC + 4 is stored to avoid re-execution
• A change flag identifies whether a register has been written to
• The control unit doesn't generate addresses for unchanged data
• Reduces backup time and energy penalty

Simulation Results And Comparison
• Total area is similar, as the NVM cache and backup blocks are much bigger than the logic
• BEC has the lowest peak frequency due to frequent backups
• Recovery time: time from activation of the Energy OK signal to the time all backup operations are complete
• ODSB backup time < ODAB backup time
• ODSB is more energy efficient with a stable source like solar
• ODSB can reduce the backup energy penalty by 69% with 0.002% area overhead
• BEC doesn't need time to accumulate energy in the capacitor, so it is viable when power failure is extremely frequent (less than 1 in 10 cycles)

N-stage Pipeline
• Increased circuit complexity and activity factor result in a higher power threshold compared to the non-pipelined processor
• 5 Stage Pipeline (5SP) under study
• Two backup schemes proposed:
1) Shifted PC and Volatile Flip-flops (SPC/VFF)
2) Nonvolatile Flip-flops Solution (NVFF)

Shifted PC & Volatile FF (SPC/VFF)
• Pipelined data flow with bypass and forwarding, and complex control flow to handle hazards
• A shifter buffer stores the PC value of each pipeline stage
• When power goes down, the instruction in the write-back stage is finished; the unfinished PC to be backed up is the one in the data memory stage
• A shifter is used instead of rolling back, since a different PC needs to be backed up for jumps and branches
• An extra 4 clock cycles are needed after recovery to re-execute the last 4 instructions lost from the latter pipeline stages

Nonvolatile FF Solution (NVFF)
• This solution uses NVM flip-flops
• SPC/VFF requires 11% less time and 57% less energy than NVFF

Out-of-order Processor (OoO)
• More complex than NP and 5SP
• System state is broadly distributed across structures such as the PC, ROB, RegFile, Map Table, Issue Queue, Load Store Queue, BHT and BTB
• Larger power requirement means fewer
periods where the input power exceeds the minimum threshold.
Which structures need to be backed up?

Resource Selection Strategies
The resource selection strategies proposed are:
1) Minimum State Resource backup solution (MinR)
2) Low-latency Backup solution (LLB)
3) Middle-level Backup solution (MLB)
4) Min-state-lost Backup solution (MPL)
5) Integrated Flexible Atomic Backup Solution (IFA)
• Minimum State Resource backup solution (MinR): backs up the minimum number of bits required to preserve functionality. Depends on the branch misprediction mechanism to minimize the number of valid/relevant state bits prior to backup.
  – ROB and PC: backs up the first uncommitted PC at the head of the ROB
  – ARegFile is backed up as it is small
  – Map Table: pseudo-misprediction is used to restore the Map Table
  – PRegFile, Ready Table, Free List, BHT and BTB can be recovered
• Low Latency Backup solution (LLB): aims to minimize the number of bits to store if backup begins immediately. Backs up the entire ROB, IQ, ARegFile, Map Table and PRegFile
• Middle-level Backup solution (MLB): backs up the Ready Table and Free List as well
• Min-state-lost Backup solution (MPL): all structures, including the BHT and BTB, are backed up
• Integrated Flexible Atomic Backup Solution (IFA): even when the power is below the threshold, optional state (the BHT) may still be stored as an optimistic attempt

OoO Strategies Comparison
• MinR: the pseudo-misprediction operation for the Map Table requires extra backup clock cycles. During recovery, extra clock cycles are needed to restore the PRegFile, Ready Table and Free List
• LLB: the ROB and PRegFile are large, which increases backup time and energy.
Recovery energy is smaller as instructions in the ROB are backed up (no re-execution)
• MPL incurs the largest backup and recovery penalties, but backing up all structures gives the minimum latency to return to peak performance after a power failure
• OoO needs a higher threshold, but periods of sufficient power are common enough for its superior performance to pay for the lost clock cycles

Simulation Results
• The configurations are compared with a baseline non-pipelined volatile processor without checkpointing or data backup
• The volatile processor's progress returns to zero when power drops below the threshold
• Nonvolatile NP and 5SP have a higher power threshold
• OoO runs for only a small fraction of the time, but its performance can be up to 4x faster than NP and 5SP

Validation
• The non-pipelined On Demand strategy was explored using an actual fabricated processor (THU1010N)
• It has an Intel 8051-like CISC architecture
• The saved state includes the state machine that captures the current instruction
• The PC and RegFiles are built from FeRAM-based flip-flops (FFs). Each FF has an additional backup FeCap
• The NV processor based system is interfaced to a solar panel and a UV sensor

Operation
• Upon power failure detection, the NV control logic backs up DFFs to FeCaps
• When power resumes, data is restored from FeCaps to DFFs
• An internal RC oscillator is used. An external oscillator becomes unstable at low power

Simulator calibration:
• Several kernels executed both on the platform and on the simulator
• Intermittent power supply modeled by a 1 kHz square waveform
• Processor frequency: 3 MHz
• Each kernel is executed 1000 times to obtain the completion time
• Stable power case: no mismatch; unstable power case: mismatch < 5%
• The simulator averages the energy consumed per instruction to estimate remaining energy

Dependence On Input Power
• Input signal characteristics play a major role in determining the optimal design.
• Performance of backup schemes is compared with home and office Wi-Fi sources for harvesting
• At home, the NP ODSB architecture performs best; in the office, OoO MPL is most desirable

Dependence On Nature Of Input Source
• Input energy sources differ in magnitude
• For each case, the best performing backup policy is adopted
• For the same input power source, the actual execution time for NP and 5SP is almost the same
• The higher power threshold in OoO results in a longer Off time

Meeting QoS Requirements
• Some applications (like ECG) require periodic outputs within fixed time periods (QoS constraints)
• Ambient energy is unreliable
• Piezo and solar can provide almost 100% QoS
• QoS can be improved by:
  – Shrinking size and using FinFETs
  – Power reduction techniques: dark silicon aware architecture, clock gating, DVFS, DATS, Tunnel FET, low power sub-threshold circuits

Conclusion
• Explored various factors in a battery-less system powered by ambient energy
• Intermittent energy source: different nonvolatile processor configurations and techniques to conserve state while maximizing forward progress
• Examined tradeoffs between performance and energy for different architectures
• Compared and validated simulation results with a nonvolatile solar energy harvesting processor platform
• The video of HPCA 2015 Best Paper Competition Demo

References
• KaiSheng Ma, Yang Zheng, Shuangchen Li, Karthik Swaminathan, Xueqing Li, Yongpan Liu, Jack Sampson, Yuan Xie, Vijaykrishnan Narayanan. "Architecture Exploration for Ambient Energy Harvesting Nonvolatile Processors". In The International Symposium on High-Performance Computer Architecture (HPCA-21), 2015.
• A. Parks, A. Sample, Z. Yi, and J. Smith. A wireless sensing platform utilizing ambient RF energy. In IEEE Radio and Wireless Symposium (RWS), 2013.
• S. Kannan, A. Gavrilovska, K. Schwan, and D. Milojicic. Optimizing checkpoints using NVM as virtual memory. In IPDPS, 2013.
• X. Dong, C. Xu, Y. Xie, and N. Jouppi.
NVSim: A circuit-level performance, energy, and area model for emerging nonvolatile memory. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 31(7):994–1007, 2012.

Questions?
Thank You
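
Backup Slide: ODAB Vs ODSB (Illustrative Sketch)
The difference between On Demand All Backup and On Demand Selective Backup can be sketched in a few lines of Python. This is a toy model, not the design from the paper: the class `NVBackupModel`, its method names, the 32-entry register file and the "one NVM write per stored word" cost metric are all assumptions made for illustration. It only mirrors the ideas stated in the slides: ODAB saves every RegFile entry plus the PC, while ODSB saves PC + 4 and only the registers whose change flag is set.

```python
from dataclasses import dataclass, field

NUM_REGS = 32  # assumed MIPS-style register count (illustrative)


@dataclass
class NVBackupModel:
    """Hypothetical toy model of a volatile RegFile with a nonvolatile shadow."""
    regs: list = field(default_factory=lambda: [0] * NUM_REGS)
    changed: list = field(default_factory=lambda: [False] * NUM_REGS)
    pc: int = 0
    nv_regs: list = field(default_factory=lambda: [0] * NUM_REGS)
    nv_pc: int = 0

    def write_reg(self, idx, value):
        """Write a register and set its change flag."""
        self.regs[idx] = value
        self.changed[idx] = True

    def backup_odab(self):
        """ODAB: store the PC and every register. Returns the NVM write count."""
        self.nv_pc = self.pc
        self.nv_regs = list(self.regs)
        self.changed = [False] * NUM_REGS
        return NUM_REGS + 1

    def backup_odsb(self):
        """ODSB: store PC + 4 and only flagged registers. Returns the NVM write count."""
        self.nv_pc = self.pc + 4  # current instruction has completed
        writes = 1
        for i, dirty in enumerate(self.changed):
            if dirty:
                self.nv_regs[i] = self.regs[i]
                self.changed[i] = False
                writes += 1
        return writes

    def restore(self):
        """Reload volatile state from the nonvolatile copy after power returns."""
        self.regs = list(self.nv_regs)
        self.pc = self.nv_pc
        self.changed = [False] * NUM_REGS


def demo():
    cpu = NVBackupModel()
    cpu.backup_odab()            # establish a full image in NVM
    cpu.pc = 0x100
    cpu.write_reg(8, 42)
    cpu.write_reg(9, 7)
    selective_writes = cpu.backup_odsb()
    cpu.regs = [0] * NUM_REGS    # power loss wipes the volatile state
    cpu.pc = 0
    cpu.restore()
    return selective_writes, cpu.regs[8], cpu.regs[9], cpu.pc
```

In this sketch `demo()` returns `(3, 42, 7, 0x104)`: the selective backup performs 3 writes (PC + 4 and two flagged registers), where a full ODAB backup of the same model would perform 33.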