Design Flows and Tools
Peter A. Beerel
University of Southern California
USC Asynchronous CAD/VLSI Group (async.usc.edu)

Part II - Agenda
Design Flows
• Design via decomposition
• Modeling design using SystemVerilog
Design Automation – The Proteus-A flow
• Legacy RTL
• Added SystemVerilog CSP front-end
• Asynchronous optimizations
Final Flow Considerations
• Analog Verification
• Design for Test and Debug

Design via Process Decomposition
• A design is a collection of processes linked by channels
• Channels pass messages with guaranteed delivery
• Processes synchronize through their channel communications
• Processes can be decomposed into smaller processes

Modeling Asynchronous Design via SystemVerilogCSP (SVC)
• A SystemVerilog interface abstracts the channel wires as well as the communication protocol
• Send/Receive
• Blocking tasks (flow control)
[Figure: Sender and Receiver connected through an SVC interface (abstract communication)]

    module Sender (interface R);
      parameter WIDTH = 8;
      logic [WIDTH-1:0] data;
      always begin
        // produce data
        R.Send(data);
      end
    endmodule

    module Receiver (interface L);
      parameter WIDTH = 8;
      logic [WIDTH-1:0] data;
      always begin
        L.Receive(data);
        // consume data
      end
    endmodule

SVC - Waveform view

    // Sender (DataGen)
    always begin
      #Delay;
      R.Send(data);
    end

    // Receiver
    always begin
      L.Receive(data);
      #FL;
      R.Send(data);
      #BL;
    end

Waveform annotations:
• Receiver pending on Receive
• Sender performs Send, communication happens
• No one is sending or receiving
• Sender pending on Send
• Receiver performs Receive, communication happens

The Proteus-A Flow – Legacy RTL
Key Features
• Re-uses synchronous EDA tools
• Seamless integration into existing flows
• Back-end design style agnostic
• Up to 2X higher performance
Tool Status
• Commercialized version in production for 2+ years
• Uses proprietary QDI library
• Academic version (Proteus-A) enhanced significantly at USC
Recent Advances
• Power optimization algorithms
[Figure: flow diagram. Labels: RTL, Design Goals, Constraints, Synthesis, Image Netlist, Proteus/Sync Library, Clock Gating, Clock Tree Synthesis, ClockFree, Async Netlist, Physical Design, Final Layout]

Flow Demo – Legacy RTL
[Figure: Legacy RTL Specification, Synthesis, Synthesized Image Netlist, ClockFree, Asynchronous Gate-level Netlist, Physical Design, Final Layout]

Amber23 – Proteus-A Case Study
• Download from http://opencores.com/project,amber
• ARM-compatible 32-bit RISC processor
• 3 stages: FETCH, DECODE and EXECUTE
[Figure: Amber23 block diagram. Labels: cache, bus interface, instruction decode, state machine control, register bank, barrel shifter, ALU, multiplexer, read data, address/write data]
Zhang, USC Summer Research, 2012

Amber23 – Performance Comparison
[Figure: Amber23 performance comparison]
Zhang, USC Summer Research, 2012

The Proteus-A Flow – SVCRTL
Key New Features
• Supports SystemVerilog CSP front-end
• Enables user-defined conditional communication
  • Saves power at architectural level
Tool Status
• Proprietary version starting from CAST developed at Fulcrum
• SystemVerilog version subsequently developed at USC
• Used in current research at USC and Technion and in a 40+ person async class
[Figure: flow diagram. Labels: SystemVerilog, Design Goals, SVC2RTL, Synth. RTL, Constraints, Synthesis, Image Netlist, Proteus/Sync Library, Clock Gating, Clock Tree Synthesis, ClockFree, Async Netlist, Physical Design, Final Layout]
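As a rough illustration of the blocking Send/Receive behavior described on the SVC slides, here is a minimal Python sketch. It is not the SVC package and not part of the Proteus-A flow: the `Channel` class, `buffer_stage`, and `run_pipeline` are hypothetical names introduced only for this example, and threads stand in for concurrent processes. The point it shows is that a sender stalls until the receiver takes the token, and a receiver stalls until a token is offered.

```python
import queue
import threading


class Channel:
    """Hypothetical point-to-point channel with blocking send/receive."""

    def __init__(self):
        # One slot: at most one token is in flight on the channel.
        self._slot = queue.Queue(maxsize=1)

    def send(self, value):
        self._slot.put(value)  # offer the token
        self._slot.join()      # stall until the receiver has taken it

    def receive(self):
        value = self._slot.get()  # stall until a sender offers a token
        self._slot.task_done()    # acknowledge, releasing the sender
        return value


def sender(ch, items):
    """Produce each item and send it, like the Sender module."""
    for item in items:
        ch.send(item)


def buffer_stage(left, right, count):
    """Receive on the left channel, forward on the right channel."""
    for _ in range(count):
        right.send(left.receive())


def run_pipeline(items):
    """Sender -> buffer stage -> receiver (the calling thread)."""
    a, b = Channel(), Channel()
    threads = [
        threading.Thread(target=sender, args=(a, items)),
        threading.Thread(target=buffer_stage, args=(a, b, len(items))),
    ]
    for t in threads:
        t.start()
    received = [b.receive() for _ in items]
    for t in threads:
        t.join()
    return received


if __name__ == "__main__":
    print(run_pipeline([10, 20, 30]))  # [10, 20, 30]
```

Because every send waits for its matching receive, tokens arrive in order and none are dropped, which is the "guaranteed delivery" property the decomposition slide relies on.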
Key to Low-Power: Conditional Communication
[Figure: ALU datapath. Labels: op, A,B, DEMUX, Add/Sub, Mult, MUX, S, R]
Conditional communication reduces token flow, saving power
• Traditionally: manually introduced via user-created decomposition
• Recent research: automatically introduced via Operand Isolation
Saifhashemi, PATMOS 2012

SVC2RTL – Enables User-Defined Conditional Communication
[Figure: conditional send/receive example. Annotations: dummy value, not received, not sent]

Power Optimization Overview
• Conditioning
  • Automatically add conditional communication
• Reconditioning
  • Optimize the existing conditionality

Power Saving - The Opportunity
[Figure: datapath with an adder performing an unnecessary calculation]

Our Solution - Adding Isolation Cells
• All inputs/outputs are unconditional
• Operand Isolation
  • AND-based isolation cells
  • Generated by synchronous RTL synthesizer
  • Does not prevent switching in asynchronous circuits
Isolation cells are not effective in asynchronous circuits

Our Solution - Conditioning
[Figure: conditioned datapath. Annotation: no activity]

Power Optimization Results
• Case study: 32-bit ALU, placed and routed
• Back-annotated switching activity using a VCD file
• Results:
  • Isolating ADD and SUB is detrimental for rADD and rSUB > 0.2
  • 53% power reduction when only isolating MUL (rf=0.25)
  • Area cost of isolating MUL is about 4%, with no performance penalty
Saifhashemi, PATMOS 2012

Power Savings – The Opportunity
[Figure: datapath with regions of unnecessary activity]
Conditional communication is explicit and only at primary IO

The Reconditioning Problem
Definition (The Reconditioning Problem): Rearrange the location of RECEIVE and SEND cells to minimize power consumption while preserving functional behavior.

Power Results
[Figure: three bar charts titled "Power Comparison: 32 bit". Y-axis: power. X-axis: operational factor (0.25, 0.5, 0.75). Series: Original, Greedy0, MILP. Panels: RECON1 (dual-mode arithmetic unit), RECON2 (conditional multiplier), ALU-OI (ALU after operand isolation)]
Saifhashemi, PhD Thesis, 2012

Mode Based Conditional Slack Matching
[Figure: ALU datapath. Labels: op, A,B, DEMUX, Add/Sub, Mult, MUX, S, R]

Conditional Slack Matching
• Advantage: conditional behavior yields fewer stalls, so fewer pipeline buffers are needed
• Previously ignored: conditional behavior was conservatively modeled as unconditional
Najibii, 2012

Conditional Slack Matching - Results
33% fewer buffers on average
Najibii, 2012

Design Flow Demo
[Figure: SystemVerilog, Design Goals, SVC2RTL, Synth. RTL, Constraints, Synthesis, Image Netlist, Proteus/Sync Library, ClockFree, Async Netlist, Physical Design, Final Layout]
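The claim that conditional communication reduces token flow can be made concrete with a toy count. The Python sketch below is an assumption-laden illustration, not the conditioning or reconditioning algorithm from the slides: `count_tokens` and the unit names are hypothetical, and the number of tokens delivered to each unit is used only as a crude proxy for switching activity. It does not attempt to reproduce the reported 53% power figure.

```python
def count_tokens(ops, conditional):
    """Count operand tokens delivered to each functional unit.

    ops: sequence of operation names, each "add" or "mul".
    conditional: if False, every operation sends operands to every unit
                 (unconditional communication); if True, operands go only
                 to the unit selected by the operation.
    """
    counts = {"add": 0, "mul": 0}
    for op in ops:
        if conditional:
            counts[op] += 1          # only the selected unit sees a token
        else:
            for unit in counts:
                counts[unit] += 1    # every unit receives and computes
    return counts


if __name__ == "__main__":
    ops = ["mul", "add", "add", "add"]  # multiplier used 25% of the time
    print(count_tokens(ops, conditional=False))  # {'add': 4, 'mul': 4}
    print(count_tokens(ops, conditional=True))   # {'add': 3, 'mul': 1}
```

In this toy stream the multiplier sees one token instead of four once communication is conditional, which is the kind of activity reduction the slides attribute to isolating MUL.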
Final Flow Considerations
Static Timing Analysis
• Verifying timing constraints and performance is a must
• Trick traditional tools into working with asynchronous circuits
Analog Verification
• Domino logic used in QDI flows is sensitive to charge sharing
• Asynchronous channels cannot tolerate crosstalk glitches
• Special SPICE-based tools developed
Asynchronous Scan
• Asynchronous scan is a must but doable
Design for Silicon Debug
• Chip deadlock is still difficult to debug

Conclusions
The Asynchronous Design Flow/CAD Landscape
• Synchronous design rigidity continues to hamper quality design
• Asynchronous design offers solutions but has many design flow challenges
Design Flow Requirements
• Design flows must easily integrate into synchronous designs
• Circuit quality must compete very well to warrant switching design styles
Our approach
• Proteus provides a good design framework for automation of both legacy RTL and SystemVerilog CSP
• Final considerations of analog and timing verification, scan, and debug should not be overlooked

Acknowledgements
http://ee.usc.edu/async2013