TAU 2014 Contest and more …
Igor Keller

TAU 2014: Timing Workshop
- STA and related areas
  - Analog circuits … also related
- The right forum for top STA experts
- Covers both mainstream and controversial topics
- Goals:
  - Re-discover STA
  - Bring STA science to design houses
  - Bring design experience to academics and EDA
  - Exchange ideas and build networking
  - Entice students to stay in EDA
  - Have fun …

Technical Program
- 48 registered attendees
- EDA, design houses, foundries, academia
  - Altera, IBM, Intel, Cisco, Qualcomm, Broadcom, ADI, TI, Oracle, ARM, Swatch Group, Samsung, TSMC, NVIDIA, Cadence, Synopsys
  - 8 universities
  - Countries: Switzerland, Greece, Ireland, Netherlands, Taiwan, Japan, Korea, USA
- 6 technical sessions squeezed into 2 days
- 17 papers (2 invited)
- 2 emotional panel discussions
- Timing Contest: 8 teams

The TAU 2014 Contest: Removing Pessimism during Timing Analysis
Jin Hu (IBM Corp.), Debjit Sinha (IBM Corp.), Igor Keller (Cadence) [Speaker]
Sponsors: (logos)
TAU 2014 Workshop, March 6th-7th, 2014

Past and Present Timing Contests
- Goal of coordinated academic-industry contests:
  - Guide awareness of challenging projects at earlier academic stages
  - Encourage novel parallelization techniques (including multi-threading)
  - Facilitate infrastructure/benchmarks for future research
- Develop clever methods for solving difficult problems:
  - Gain insight from other perspectives and approaches
  - Allow algorithm development through a focused problem statement
- Previous contests

Focused Problem Statement
Develop an algorithm to perform Common Path Pessimism Removal (CPPR) during timing analysis.
CPPR: the process of removing inherent but artificial pessimism from timing tests and paths.

CPPR Relevance
Variability causes many sources of timing uncertainty:
- Manufacturing variations: metal thickness (CMP), random dopant effects (Vt), line-edge roughness [Sinha et al., TAU 2013]
- Voltage & temperature variations: across the surface of the chip, from cycle to cycle
- Electrical effects: potential coupling noise, simultaneous signal switching
- *Global chip-to-chip variations
These effects are difficult to model accurately and quickly for all variation sources, so lower (early) and upper (late) delay bounds [lb, ub] are created, commonly by derating the original delay, e.g., by ±5%. Any unknown, difficult-to-model effect can be accounted for this way.
- Good news: additional pessimism is introduced (desirable for safe chip operation)
- Bad news: additional pessimism is introduced (unnecessary)
CPPR prevents over-optimization of the design due to false timing fails.
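The derating step mentioned above can be sketched in a few lines. This is an illustrative example only; the 5% factor matches the slide's example, while the nominal delay value is just a placeholder, not a contest value.

    # Illustrative sketch: turn a nominal arc delay into [lb, ub] early/late bounds
    # by derating, e.g., +/-5%. The input value is made up for illustration.
    def derate(nominal_delay, factor=0.05):
        """Return (early, late) delay bounds for a nominal delay."""
        lb = nominal_delay * (1.0 - factor)   # early (lower) bound
        ub = nominal_delay * (1.0 + factor)   # late (upper) bound
        return lb, ub

    early, late = derate(5e-11)               # -> (4.75e-11, 5.25e-11)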
Sequential Timing Analysis: Hold Tests (Same Cycle)
Details provided in contest_education.pdf. [Data must be stable for the hold time tHOLD after the clock arrives.]
- Pre-CPPR slack: slackHOLD = atE(D) - atL(CK) - tHOLD
  (early arrival time at data pin D, late arrival time at clock pin CK, hold time)
- Timing tests are checked against the data pin D and clock pin CK of a flip-flop, at opposite modes (early data, late clock).
(Diagram: IN feeds launching FF1 and combinational logic driving the D pin of capturing FF2, which drives OUT; CLOCK drives both flip-flops. Data path (DP), clock path (CP).)
- The signal cannot be both early and late in the common portion of the clock path: this is inherent but artificial pessimism.

Common Path Pessimism Removal: Hold Tests (Same Cycle)
Details provided in contest_education.pdf.
- Post-CPPR slack: slackHOLD = atE(D) - atL(CK) - tHOLD + [atL(cp) - atE(cp)]
  The bracketed term is the hold CPPR credit: the late arrival time minus the early arrival time at the common point cp, the last point shared by the launching and capturing clock paths.
(Diagram: same launch/capture structure as above, with the common point cp marked on the clock path.)

Potential Impact of CPPR
- *If done correctly, CPPR can only improve test slacks (it is never overly optimistic).
- Example: pre-CPPR slack ≈ -55, post-CPPR slack ≈ +275.
(Scatter plot: post-CPPR test slack vs. pre-CPPR test slack; all points lie on or above the pre-CPPR = post-CPPR diagonal, so no post-CPPR slack is worse than its pre-CPPR slack.)

Why is CPPR Important?
- Timing optimization
- Power optimization
(Figure: slack distributions drive the timing and power optimization loops; with CPPR applied, the slack targets are met and optimization is done.)

Why is CPPR Difficult?
- Exponential complexity
- Analysis of the graph for re-converging paths
- High runtime and memory
- Grows fast with the number of clock nets in the design
- Possible but not easy to parallelize
- The devil is in the details …
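A minimal numeric sketch of the hold-test arithmetic above; the arrival times below are made up for illustration and are not taken from any contest benchmark.

    # Hold test (same cycle): pre- and post-CPPR slack as defined above.
    # atE_* / atL_* are early/late arrival times; cp is the common clock point.
    def hold_slack_pre_cppr(atE_D, atL_CK, t_hold):
        return atE_D - atL_CK - t_hold

    def hold_cppr_credit(atL_cp, atE_cp):
        # The signal cannot be both early and late on the common portion,
        # so the early/late spread at cp is given back as credit.
        return atL_cp - atE_cp

    def hold_slack_post_cppr(atE_D, atL_CK, t_hold, atL_cp, atE_cp):
        return hold_slack_pre_cppr(atE_D, atL_CK, t_hold) + hold_cppr_credit(atL_cp, atE_cp)

    # Made-up numbers (seconds): pre-CPPR = -3.0e-11, credit = 2.0e-11, post-CPPR = -1.0e-11
    pre  = hold_slack_pre_cppr(atE_D=1.0e-10, atL_CK=1.2e-10, t_hold=1.0e-11)
    post = hold_slack_post_cppr(1.0e-10, 1.2e-10, 1.0e-11, atL_cp=6.0e-11, atE_cp=4.0e-11)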
TAU 2014 Contest Motivation
- CPPR challenges:
  - The analysis is path-based and can have exponential runtime → CPPR can be overly optimistic if not enough paths are considered
  - Existing literature and research is limited
- Contest / topic scope:
  - Timeline spans roughly 2.5 months (*not accounting for holidays)
  - Only hold + setup tests considered
  - No latches (flush segments) considered
  - Limited design topologies, e.g., clock tree reconvergence
  - Limited to deterministic timing (no statistical)
- Lessons learned from previous contests:
  - Simplify input/output processing → focus on algorithm development and performance optimizations
  - Provide adequate documentation → assume no prior knowledge of timing analysis or CPPR

TAU 2014 Contest Guidelines
(Flow diagram)
- Pre-processing from TAU 2013: the design and library are run through a delay converter to produce the delay file (topology, tests, etc.); an industrial tool produces the timing file (assertions, etc.) and the golden result* (*from an industrial tool).
- Provided to contestants: TAU 2013 benchmarks and detailed documentation.
- Evaluation: the path-based CPPR output and the (relative) runtime are compared against the golden results, over the Phase 1 and Phase 2/3 benchmarks.

Inputs: Delay File
Details provided in contest_file_formats.pdf.
- Specifies primary inputs and outputs
- Provides the early and late propagation delay for every source-to-sink timing arc
- Provides the setup and hold times for every data-to-clock timing test
Example entries:
- Primary ports: input IN1, input IN2, input CLOCK, output OUT
- Timing arc: source pin OR2:A, sink pin OR2:Y, early delay 5e-11, late delay 5e-11
- Timing test: test type setup, data pin FF3:D, clock pin FF3:CK, test time 3e-11
(Example circuit: primary inputs IN1, IN2, CLOCK; output OUT; flip-flops FF1-FF3; gate OR2; clock buffers B1-B4.)

Inputs: Timing File
Details provided in contest_file_formats.pdf.
- Provides the early and late arrival times for each primary input
- Provides the clock period for the clock source
Example entries:
- Arrival times: at IN1, early 0e+0, late 1e-11
- Clock: clock CLOCK, clock period 1.2e-10

Benchmarks
- Phase 1 [6-42 tests]: based on the TAU 2013 v1.0 benchmarks (sequential circuits), with a more complex (randomized) clock tree added:
  BRANCH(CLOCK, initial FF)
  for each remaining FF:
      select a random location L in the current tree
      BRANCH(L, FF)
  where BRANCH(src, sink) creates a buffer chain from src to sink
- Phase 2 [380-50.1K tests]: based on the TAU 2013 v2.0 benchmarks (OpenCores)
- Phase 3 (evaluation) [8.2K-109.6K tests]: composed from Phase 2 benchmarks
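The BRANCH pseudocode above is not runnable as written; below is a minimal runnable sketch of the same idea. The netlist representation, buffer names, and chain length are assumptions made for illustration, not the contest's actual benchmark generator.

    import random

    # Illustrative sketch of the randomized clock-tree construction above.
    # The "netlist" is just a list of (driver, load) edges.
    def branch(src, sink, netlist, tree_nodes, chain_len=3):
        """BRANCH(src, sink): create a buffer chain from src to sink."""
        prev = src
        for i in range(chain_len):
            buf = f"buf_{src}_{sink}_{i}"           # hypothetical buffer name
            netlist.append((prev, buf))
            tree_nodes.append(buf)                  # buffers become candidate branch points
            prev = buf
        netlist.append((prev, sink))

    def build_random_clock_tree(clock, flops, seed=0):
        random.seed(seed)
        netlist, tree_nodes = [], [clock]
        branch(clock, flops[0], netlist, tree_nodes)   # BRANCH(CLOCK, initial FF)
        for ff in flops[1:]:                           # for each remaining FF
            location = random.choice(tree_nodes)       # random location L in the current tree
            branch(location, ff, netlist, tree_nodes)  # BRANCH(L, FF)
        return netlist

    edges = build_random_clock_tree("CLOCK", ["FF1", "FF2", "FF3"])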
Output File
Details provided in contest_file_formats.pdf and contest_rules.pdf.
- Requires the pre-CPPR slack [from timing analysis] and the post-CPPR slack [after CPPR] for each test and path
- Controllable options: <testType> [setup/hold/both], -numTests <int> [number of tests], -numPaths <int> [number of paths per test]
Example report entries:
- A test entry gives the test type and its pre-CPPR and post-CPPR test slacks, e.g., setup -3e-11 -1e-11
- A path entry gives its pre-CPPR and post-CPPR path slacks and its path length, e.g., -1.5e-11 -1e-11 10, followed by the pin-to-pin data path from the data pin of the test to a primary input, e.g., FF3:D OR2:Y OR2:A FF2:Q FF2:CK B3:Y B3:A B1:Y B1:A CLOCK

Evaluation Metrics
T: the set of all tests in testcase D; P: the set of all paths in D.
Slack accuracy (absolute difference to the "golden" results):
- [0,1] ps → 100
- (1,3] ps → 80
- (3,5] ps → 50
- (5,∞) ps → 0
Accuracy (compared to "golden" results):
- Test slack accuracy A(t): from the pre-CPPR and post-CPPR test slacks of test t
- Path slack accuracy A(p): from the pre-CPPR and post-CPPR path slacks of path p, and the correctness of the path
(Raw) testcase accuracy A(D) combines:
- the average of A(t) over all tests t in T
- the average of A(p) over all paths p in P
- the average of A(dp(t)) over all tests t in T, where dp(t) is the critical path of t
- the minimum of {A(t)} over all tests t in T
- A(critT), where critT is the most critical test
The first three capture overall tool quality; the last two capture worst-case tool quality.
Runtime factor (relative): RF(D) = runtime(D) / (average runtime of all contestants).
Composite testcase score: score(D) = A(D) × (0.5 + 0.5 × RF(D)).
Overall contestant score: the average of score(D) over all designs.
Memory usage is not considered.
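A sketch of the scoring arithmetic as it is stated on the slide. The bucket thresholds are the ones listed above; combining the five A(D) components with equal weights is an assumption (the slide does not give weights), as is reading the runtime factor exactly as written.

    # Slack-accuracy buckets from the slide: |reported - golden| in ps -> credit.
    def slack_accuracy(reported_ps, golden_ps):
        diff = abs(reported_ps - golden_ps)
        if diff <= 1.0:
            return 100
        if diff <= 3.0:
            return 80
        if diff <= 5.0:
            return 50
        return 0

    # Composite testcase score, as written on the slide:
    #   RF(D)    = runtime(D) / average contestant runtime
    #   score(D) = A(D) * (0.5 + 0.5 * RF(D))
    # Equal weighting of the A(D) components is an assumption for this sketch.
    def testcase_score(accuracy_components, runtime_s, avg_runtime_s):
        a_d = sum(accuracy_components) / len(accuracy_components)
        rf_d = runtime_s / avg_runtime_s
        return a_d * (0.5 + 0.5 * rf_d)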
TAU 2014 Contestants
University | Country | Team name
National Chiao Tung University | Taiwan | iTimerC
University of Thessaly | Greece | The TimeKeepers
National Tsing Hua University | Taiwan | TTT
Indian Institute of Technology, Madras | India | ElecEnthus
University of Illinois at Urbana-Champaign | USA | UI-Timer
Indian Institute of Technology, Madras | India | LightSpeed
Missouri University of Science and Technology | USA | MST_CAD
Peking University | China | PKU-HappyTimer

Contestant Performance
- The overall quality of the submitted binaries was superb.
- One testcase comprises <benchmark, testType, -numTests, -numPaths>; 24 testcases in total.
- For each Combo benchmark, 4 settings were used: {-setup, -hold} × {-numTests N -numPaths 1, -numTests n -numPaths m}, with n < N < 50K and m < 20.
  Example: Combo7 -setup -numTests 35000 -numPaths 1
- 5 of 7 final submissions had no crashes; 1 of 7 crashed on only 5 testcases.
- 6 of 7 final submissions had full accuracy on 12 designs.
- Evaluation machine: 8x Intel(R) Xeon CPU E7-8837 @ 2.67 GHz.
- 6 of 7 final submissions used 8 threads [the maximum allowed]; 1 of 7 used 2 threads.
(Chart: normalized average performance per team, raw accuracy and runtime factor, scaled from worst = 0 to best = 1.)
Total runtime (hours): C1 6.27, C2 4.27, C3 1.44, C4 7.8, C5 4.96, C6 2.42
*The final evaluation is design-specific, not based on total runtime or overall averages.

Acknowledgments
- Jobin Kavalam, Nitin Chandrachoodan [IITimer, TAU 2013]: provided the timer source code and helped with the initial input file conversions.
- Jin Hu (IBM), Debjit Sinha (IBM), Chirayu Amin (Intel).
- Special thanks to the TAU 2014 contestants: this contest would not have been successful without your hard work and dedication.

Winners

TAU 2014 Timing Contest: Removing Common Path Pessimism
Third Place Award presented to Yu-Ming Yang, Yu-Wei Chang and Iris Hui-Ru Jiang, National Chiao Tung University, Taiwan, for iTimerC.
(Chirayu Amin, General Chair; Igor Keller, Technical Chair; Jin Hu, Contest Chair)

TAU 2014 Timing Contest: Removing Common Path Pessimism
Honorable Mention presented to Christos Kalonakis, Charalampos Antoniadis, Panagiotis Giannakou, Dimos Dioudis, George Pinitas and George Stamoulis, University of Thessaly, Greece, for The TimeKeepers.
(Chirayu Amin, General Chair; Igor Keller, Technical Chair; Jin Hu, Contest Chair)

TAU 2014 Timing Contest: Removing Common Path Pessimism
Second Place Award presented to M S Santosh Kumar and Sireesh N, IIT Madras, India, for LightSpeed.
(Chirayu Amin, General Chair; Igor Keller, Technical Chair; Jin Hu, Contest Chair)

TAU 2014 Timing Contest: Removing Common Path Pessimism
First Place Award presented to Tsung-Wei Huang, Pei-Ci Wu and Martin D. F. Wong, University of Illinois at Urbana-Champaign, USA, for UI-Timer.
(Chirayu Amin, General Chair; Igor Keller, Technical Chair; Jin Hu, Contest Chair)

Thank you!

Backup

Contest Timeline
Date | Activity
10/13/2013 | Contest release date: https://sites.google.com/site/taucontest2014
  - Timing analysis and CPPR tutorial [contest_education.pdf]
  - Contest overview and guidelines [contest_rules.pdf]
  - Contest input and output specifications [contest_file_formats.pdf]
  - Source code from the winners of the TAU 2013 Contest (IITimer)
11/22/2013 | End of contest registration
12/02/2013 | Phase 1 benchmark set [9 testcases]
01/06/2014 | Phase 2 benchmark set [6 testcases]
01/15/2014 | Alpha binary submission
02/01/2014 | Final binary + short report submission [~2.5 months after release]
03/07/2014 | Winners announced (today!)

Common Path Pessimism Removal (Backup)
Details provided in contest_education.pdf.
Hold tests (same cycle) [data must be stable for tHOLD after the clock arrives]:
- Post-CPPR slack: slackHOLD = atE(D) - atL(CK) - tHOLD + [atL(cp) - atE(cp)]
  The bracketed term is the hold CPPR credit at the common point cp.
Setup tests (next cycle, with clock period P) [data must be stable for tSETUP before the clock arrives]:
- Post-CPPR slack: slackSETUP = atE(CK) + P - atL(D) - tSETUP + [delayL(OL) - delayE(OL)]
  The bracketed term is the setup CPPR credit: the late delay minus the early delay of the overlap OL = CP ∩ DP, the portion shared by the clock path and the data path.
Timing tests are checked against the data pin D and clock pin CK of the capturing flip-flop at opposite modes.
(Diagram: IN feeds launching FF1 and combinational logic driving the D pin of capturing FF2, which drives OUT; CLOCK drives both flip-flops through the clock path (CP); data path (DP); common point (cp).)
The signal cannot be both early and late in the common portion: this is inherent but artificial pessimism.
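To mirror the hold-test sketch shown earlier, here is a minimal illustration of the setup formula above; all numbers are made up and are not from a contest benchmark.

    # Setup test (next cycle, clock period P): post-CPPR slack as defined above.
    # OL is the overlap of the clock path and the data path; delays are illustrative.
    def setup_slack_post_cppr(atE_CK, P, atL_D, t_setup, delayL_OL, delayE_OL):
        pre_cppr = atE_CK + P - atL_D - t_setup
        setup_cppr_credit = delayL_OL - delayE_OL   # late minus early delay of the overlap
        return pre_cppr + setup_cppr_credit

    # Made-up numbers (seconds): pre-CPPR part = 0.0, credit = 4.0e-12, post-CPPR = 4.0e-12
    slack = setup_slack_post_cppr(atE_CK=5.0e-11, P=1.2e-10, atL_D=1.5e-10,
                                  t_setup=2.0e-11, delayL_OL=4.2e-11, delayE_OL=3.8e-11)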