Paper review: High Speed Dynamic Asynchronous Pipeline: Self Precharging Style
Name: Chi-Chuan Chuang
Date: 2013/03/20

Outline
• Introduction
• Dual-rail asynchronous pipelines
  – PS0 pipeline
  – Lookahead pipelines
• Self precharging pipeline
• Timing constraints
• Simulation results
• Conclusion
• Comments

Introduction
• Functional blocks in an asynchronous circuit communicate through a handshake protocol
• Advantages of asynchronous pipelines
  – No global clock distribution problem
  – No clock skew
  – Lower power consumption
  – Automatically adapt to the environment

Introduction (cont.)
• Asynchronous pipelines come in two types
  – Single-rail topology
  – Dual-rail topology
• Single-rail topology
  – Less area and wiring load
  – But always takes the worst-case delay plus additional timing margins
• Dual-rail topology
  – More robust, data-dependent completion detection
  – Low throughput

Dual-rail asynchronous pipelines
• Williams' PS0 pipeline
• Lookahead pipelines
  – LP3/1 pipeline
  – LP2/2 pipeline
  – LP2/1 pipeline
• Enhanced lookahead pipelines
  – Enhanced LP3/1 pipeline
  – Enhanced LP2/1 pipeline

Dual-rail asynchronous pipelines (cont.)

PS0 pipeline
• Tcycle = 3Teval + 2Tcd + Tprech, Tfl = Teval
[Figure: PS0 cycle path; labels: Tcd, precharge, Tcd, precharge, Tprech, Teval, Teval, Teval]

LP3/1 pipeline
• Tcycle = 3Teval + Tcd + TNANDB
[Figure: LP3/1 cycle path; labels: Tcd + TNANDB, precharge, Teval, Teval, Teval]

LP2/2 pipeline
• Tcycle = 2Teval + 2Tcd (wrong!)
[Figure: LP2/2 cycle path; labels: 2Tcd, precharge, Teval, Teval; component label: Asymmetric C element (aC)]

LP2/1 pipeline
• Tcycle = 2Teval + Tcd + TNANDB (wrong!)
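To get a feel for how the cycle-time expressions quoted so far compare, the following is a minimal Python sketch. The delay values are hypothetical placeholders chosen only for illustration (they are not from the paper or its simulations), and the LP2/2 and LP2/1 entries simply evaluate the expressions as quoted, even though the review marks them as wrong.

```python
# Rough comparison of the cycle-time expressions quoted on the slides.
# All delay values below are HYPOTHETICAL (picoseconds), not measured data.

def cycle_times(t_eval, t_prech, t_cd, t_nandb):
    """Evaluate each quoted Tcycle expression for the given stage delays."""
    return {
        "PS0": 3 * t_eval + 2 * t_cd + t_prech,
        "LP3/1": 3 * t_eval + t_cd + t_nandb,
        # The next two are evaluated exactly as quoted; the review flags
        # both expressions as wrong, so treat these numbers with caution.
        "LP2/2 (as quoted)": 2 * t_eval + 2 * t_cd,
        "LP2/1 (as quoted)": 2 * t_eval + t_cd + t_nandb,
    }

if __name__ == "__main__":
    # Assumed delays: evaluation, precharge, completion detection, TNANDB term.
    results = cycle_times(t_eval=100.0, t_prech=80.0, t_cd=60.0, t_nandb=30.0)
    for name, t_cycle in results.items():
        # 1000 / (cycle time in ps) gives throughput in G data items/s.
        print(f"{name}: Tcycle = {t_cycle:.0f} ps, "
              f"throughput = {1000.0 / t_cycle:.2f} G items/s")
```

With these assumed delays, dropping one evaluation term or one completion-detection term from the cycle is what separates the lookahead styles from PS0; real ratios depend on the actual gate delays.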
[Figure: LP2/1 cycle path; labels: Tcd + TNANDB, precharge, Teval, Teval]

Enhanced lookahead pipelines
• LP3/1 and LP2/1 pipelines suffer from higher wiring load and a larger number of inter-stage control signals
• They are difficult to interface with the environment
• Enhanced lookahead pipelines reduce the wiring load but increase the cycle time

Enhanced LP3/1 pipeline
• Tcycle = 3Teval + Tcd + 2TNANDB
[Figure: Enhanced LP3/1 cycle path; labels: Tcd + 2TNANDB, precharge, Teval, Teval, Teval]

Enhanced LP2/1 pipeline
• Tcycle = 2Teval + Tcd + 2TNANDB
[Figure: Enhanced LP2/1 cycle path; labels: Tcd + 2TNANDB, precharge, Teval, Teval]

Comparison
• Reduced inter-stage control signals
• Communication with the environment

Self precharging pipeline
• Has all the properties of a dual-rail pipeline
• Each pipeline stage consists of
  – A functional block built from domino gates
  – A completion detector
  – A special asymmetric C-element
• The completion detectors are moved to just after the previous functional block
  – Inter-stage wiring load is reduced

Self precharging pipeline (cont.)
• The completion detector's done signal precharges both the special aC and the functional block; this is called self precharging

Special asymmetric C element
• This completion detector has less area, delay and power consumption

Special asymmetric C element (cont.)
• Has two inputs, coming from the CDs of the current stage N and the next stage N+1
• Its functionality
  – When N+1 is 1 => output is 1
  – When N+1 is 0 and N is 1 => output is 0
  – Otherwise, hold the previous value

Self precharging pipeline (cont.)
• Tcycle = 2Teval + Tcd + TaC, Tfl = Teval
[Figure: self precharging pipeline cycle path; labels: Tcd + TaC, precharge, Teval, Teval]

Timing constraints
• Assume all stages, completion detectors and aCs are similar
• Three timing constraints
  – Input hold time: Thold ≤ Tcd + TaC + Tprech
  – Precharge signal width: Tprech ≤ Teval + 2TINV
  – No safe takeover timing constraint

Simulation results
• Layout in a 90 nm UMC process at 1.2 V supply, temperature 300 K, normal process corner

Simulation results (cont.)
• Power and area
  – Enhanced LP3/1 has the highest power consumption and area, followed by PS0
  – LP2/2 has the lowest power consumption and area
  – Enhanced LP2/1 and SP have almost the same area and power consumption (SP is slightly higher)

Conclusion
• Self precharging protocol
• CDs are placed just after the previous stage
• The aC removes the safe takeover timing constraint of the LP family, which makes the pipeline simpler to design
• High throughput (2.227 G data items/s)
• Area and power consumption are comparable with the LP2/1 pipeline
• Low latency, high robustness, low power, avoidance of explicit latches, etc. compared with synchronous counterparts

Comments
• The Tcycle expressions for LP2/1 and LP2/2 are wrong in [14]
• Consider including the power consumption and area results
• Describe the timing constraints in more detail

Thanks for your attention
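Backup: the three rules listed on the "Special asymmetric C element (cont.)" slide can be written as a small behavioural model. This is only a sketch in Python, not the transistor-level circuit from the paper; the function names, the 0/1 encoding of the CD signals, and the initial output value are assumptions made for illustration.

```python
# Behavioural sketch of the special asymmetric C-element (aC) rules.
# Names, signal encoding and the initial output value are assumptions.

def special_ac(cd_n, cd_next, prev_out):
    """Return the next aC output from the CD signals of stages N and N+1."""
    if cd_next == 1:
        return 1            # N+1 is 1 => output is 1
    if cd_n == 1:
        return 0            # N+1 is 0 and N is 1 => output is 0
    return prev_out         # both inputs 0 => hold the previous value

def run(trace, initial=1):
    """Apply a sequence of (cd_n, cd_next) pairs and collect the outputs."""
    out = initial
    outputs = []
    for cd_n, cd_next in trace:
        out = special_ac(cd_n, cd_next, out)
        outputs.append(out)
    return outputs

if __name__ == "__main__":
    trace = [(0, 0), (1, 0), (0, 0), (0, 1), (0, 0)]
    print(run(trace))  # [1, 0, 0, 1, 1]
```

The asymmetry shows up in the first rule: the next stage's CD alone forces the output to 1, while the current stage's CD only takes effect when the next stage's CD is 0.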