Observability Conditions and Automatic Operand Isolation in High-Throughput Asynchronous Pipelines

Arash Saifhashemi, Peter A. Beerel
University of Southern California
USC Asynchronous CAD/VLSI Group (async.usc.edu)
(Thanks to a grant from Intel and NSF)
Patmos 2012, Sep 2012, Newcastle upon Tyne

Asynchronous Circuit Design - Today

Applications
• 3D network-on-chips (STMicroelectronics)
• Ethernet switches (Intel SRD)
• Ultra high-speed FPGAs (Achronix)
• Process variation
• Low-power chip design (Encryption – Tiempo, …)

Basic challenges: Automation

Proteus design flow (USC)
• Uses commercial synchronous CAD tools
• Starts from a high-level specification written in SVC (SystemVerilogCSP)

[Figure: STMicroelectronics WIOMING 3D-IC (July 2012)]
[Figure: Achronix FPGA. 1.7 M LUTs. 2.1 Gbps IO]
[Figure: Tiempo TAM16 clockless 16-bit microcontroller]
[Figure: Fulcrum Microsystems Ethernet switch chip (up to 72 10G ports, 40G), 1.2 B transistors, 90% asynchronous, 13% Proteus]

The Proteus Flow

Key Features / Goals
• Re-uses synchronous EDA tools
• Seamless integration into existing flows
• Up to 2X higher performance

Status
• Started at USC Async CAD/VLSI
• Commercialized by TimeLess (2008)
• Acquired by Fulcrum (2010)
• Intel acquired Fulcrum (2011)
• Used in Intel Ethernet Alta FM6000 chip

The Problem
• Limited and manual power optimization

[Figure: Proteus flow diagram. Labels: SystemVerilog design, SVC2RTL, Verilog synth. RTL, synthesis tool, image netlist, sync library, constraints, ClockFree, clock gating, clock tree synthesis, async netlist, physical design, final layout]

Conditional Communication in Proteus

[Figure: conditional communication example. Labels: dummy value, not received, not sent]

Example: ALU

SVC description
• No conditionality in the high-level description
• Reconverging fanouts + unnecessary calculation

Adding Isolation Cells

• All inputs/outputs are unconditional
• Operand isolation
  • AND-based isolation cells
  • Generated by the synchronous RTL synthesizer
  • Does not prevent switching in asynchronous circuits

Isolation cells are not effective in asynchronous circuits

Three-valued logic

• Formal justification of conditioning
• Three-valued logic image model
  • Each iteration is modeled by a clock cycle
  • Each variable can be 0, 1, or N (no token)

[Figure: one iteration; status of each channel]

3VL Unconditional Functions

Unconditional functions
• Can be represented using only the ∧, ∨, ¬ operators
• Example: functions represented by combinational gates in a typical cell library: NAND, NOR, AOI, XOR, …

Lemma 1: the output is N iff at least one of the inputs is N.
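A minimal sketch of the three-valued model behind Lemma 1, assuming a hypothetical encoding in which the string "N" stands for "no token" and an unconditional gate is lifted from its Boolean function. The names `unconditional`, `satisfies_lemma1`, and `network` are illustrative and do not come from the slides. The check enumerates every input combination over {0, 1, N}, for single gates and for a small network composed of them.

```python
from itertools import product

N = "N"                 # "no token" value (encoding assumed for this sketch)
VALUES = (0, 1, N)

def unconditional(gate):
    """Lift a Boolean gate to 3VL: the output is N if any input is N."""
    def lifted(*inputs):
        if any(v == N for v in inputs):
            return N
        return gate(*inputs)
    return lifted

# Typical library gates, lifted to three-valued logic.
nand2 = unconditional(lambda a, b: 1 - (a & b))
xor2 = unconditional(lambda a, b: a ^ b)
aoi21 = unconditional(lambda a, b, c: 1 - ((a & b) | c))

def network(a, b, c):
    """A small network built only from unconditional gates."""
    return xor2(nand2(a, b), c)

def satisfies_lemma1(fn, arity):
    """True if fn outputs N exactly when at least one input is N."""
    return all((fn(*ins) == N) == (N in ins)
               for ins in product(VALUES, repeat=arity))

for name, fn, arity in [("nand2", nand2, 2), ("aoi21", aoi21, 3),
                        ("network", network, 3)]:
    print(name, satisfies_lemma1(fn, arity))
```

Under this encoding the property carries over from single gates to the composed network, because an N on any primary input propagates through every gate it feeds.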
SEND/RECEIVE Operators

• Conditional communication
• RECEIVE and SEND are modeled as β and β operators
• Behave like buffers when E=1

SEND Reconditioning

Assuming y=f(x) is unconditional and e ∉ TFO(y)

Lemma 2: [equation lost in extraction]

Application: SEND cells can be moved through logic
• Similar to retiming in synchronous circuits
• Fewer SENDs
• Less switching when e=0

Observability in 3V Networks

Local Observability Partial Care (LOPC)
• OPC(f, C, xj) of input xj of a node representing a function f is the condition under which f's output is not affected as xj changes in C ⊆ {0,1,N}

Global Observability Partial Care (GOPC)
• GOPC(C, x) of a variable x is the condition under which the value of no primary output is affected as the value of x changes in C ⊆ {0,1,N}

OPC(f, C, x) implies GOPC(C, x)

• Example: OPC(mux, {0,1}, i1) = s
  i1 changes in {0,1} are not observable when…

[Figure: mux example. Labels: s = 1, i1, i2 = 0 or i2 = 1]

GOPC Conditioning

When xj is not observable…
• Add a SEND followed by a RECEIVE
• Move the SENDs using SEND reconditioning

Lemma 3: If [condition lost in extraction] → GOPC({0,1}, x1), then: [equation lost in extraction]

[Figure: conditioning followed by SEND reconditioning. Labels: N, 0 or 1, no activity]

Inserting Isolating Nodes and Recognizing Enable Domains

Synchronous synthesis tools can insert isolating nodes
• Constrained to insert isolating nodes only on non-critical paths

Node u is in e's Enable Domain OIED(e) if
• All paths starting from a primary input and ending at u include an isolating node controlled by e
• Detected using a DFS search

Pre-layout Analysis

• Wu: power of receiving data on all inputs and sending the output (unconditional nodes)
• K: power of conditional nodes
• rf: activity factor

[Equations: domain power after isolation (n inputs); benefit of isolating each domain; total power; power of each domain]

Post-layout Experimental Results

• Case study: 32-bit ALU, placed and routed
• Back-annotated switching activity using a VCD file
• Results:
  • Isolating ADD and SUB is detrimental for rADD and rSUB > 0.2
  • 53% power reduction when isolating only MUL (rf=0.25)
  • Area cost of isolating MUL is about 4%, with no performance penalty

Conclusions and Future Work

Conditional communication in async. circuits is not free
• Creates area and performance overheads
• Requires manual or automatic optimization

Asynchronous circuits can/should leverage sync. tools
• This paper is the first to use 3-valued logic and observability don't cares for power optimization of asynchronous circuits

Our future work
• Evaluate the proposed method on bigger designs
• Adopt other sync. power optimization techniques such as clock gating
• Optimize the location of SEND/RECEIVE nodes (reconditioning)
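The enable-domain test from the "Inserting Isolating Nodes and Recognizing Enable Domains" slide can be sketched as a depth-first search. This is an illustration under stated assumptions, not the Proteus implementation: the netlist is a hypothetical fanout dictionary, `enable_domain` is an invented name, and the toy ALU below (with isolating nodes `iso_a` and `iso_b` controlled by one enable) is made up for the example. The search walks forward from the primary inputs and refuses to cross isolating nodes; whatever it cannot reach has every input path cut by an isolating node.

```python
def enable_domain(fanout, primary_inputs, isolating_nodes):
    """Return the nodes whose every path from a primary input
    includes an isolating node controlled by the enable."""
    exposed = set()               # reachable without crossing an isolating node
    stack = list(primary_inputs)
    while stack:                  # iterative DFS
        node = stack.pop()
        if node in exposed or node in isolating_nodes:
            continue              # already visited, or path cut by isolation
        exposed.add(node)
        stack.extend(fanout.get(node, []))
    all_nodes = set(fanout) | {v for outs in fanout.values() for v in outs}
    # Read literally, the definition also places the isolating nodes
    # themselves (and nodes unreachable from any input) in the domain.
    return all_nodes - exposed

# Hypothetical ALU fragment: MUL is isolated, ADD is not.
fanout = {
    "a": ["iso_a", "add"],
    "b": ["iso_b", "add"],
    "op": ["mux"],
    "iso_a": ["mul"],
    "iso_b": ["mul"],
    "mul": ["mux"],
    "add": ["mux"],
    "mux": [],
}
domain = enable_domain(fanout, ["a", "b", "op"], {"iso_a", "iso_b"})
print(sorted(domain))
```

In this toy netlist the multiplier falls inside the domain, while the output mux does not, because it is still reachable through the non-isolated adder.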