New Ways of Generating Large Realistic Benchmarks for Testing Synthesis Tools

Petr Fišer, Jan Schmidt
Faculty of Information Technology, Czech Technical University in Prague
fiserp@fit.cvut.cz, schmidt@fit.cvut.cz
IWSBP 2010, Freiberg

Outline
- Motivation
- New benchmark generation methods
- Experimental results
- Conclusions

Motivation

… why another artificial benchmark generator?
- To test logic synthesis tools
  - Capabilities of synthesis processes
  - Immunity to “bad” structures
  - Ability to discover “good” structures
  - Iterative power
  - Scalability
  - …

J. Cong, K. Minkovich: Optimality study of logic synthesis for LUT-based FPGAs, IEEE Trans. on CAD, vol. 26, 2007
- They created artificially large circuits, functionally equivalent to their small origins (70 LUTs)
- Synthesis produced 10k to 30k LUTs

P. Fišer, J. Schmidt: Small but Nasty Logic Synthesis Examples, IWSBP'08
- An XOR tree is appended to the circuit outputs and the circuit is collapsed
- Synthesis produced >400 LUTs instead of 11

Will my synthesis tool produce the same result for different descriptions (versions) of one particular circuit? (a.k.a. iterative power)
- Most probably not! (if things go bad)
- What went wrong?
- What descriptions are bad for me?
- What structures caused my failure?
- What should I do to perform better?
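
One contributing factor in the XOR-tree example above, offered here as an illustration rather than as the authors' analysis: a two-level SOP of an n-input XOR needs 2^(n-1) product terms, because no two on-set minterms are adjacent and therefore none can be merged into a larger cube. The minimal Python sketch below checks this by brute force for small n. It models only the XOR tree, not a core circuit, and the function names are hypothetical.

```python
from itertools import product

def parity_onset(n):
    """Return all input vectors for which the n-input XOR evaluates to 1."""
    return [bits for bits in product((0, 1), repeat=n) if sum(bits) % 2 == 1]

def hamming(u, v):
    """Number of positions in which two bit vectors differ."""
    return sum(a != b for a, b in zip(u, v))

def can_merge_any(onset):
    """True if two on-set minterms differ in exactly one bit.

    Such a pair could be merged into one larger cube; if no pair
    exists, every minterm needs its own product term in the SOP.
    """
    return any(
        hamming(u, v) == 1
        for i, u in enumerate(onset)
        for v in onset[i + 1:]
    )

if __name__ == "__main__":
    for n in (2, 4, 6, 8):
        onset = parity_onset(n)
        # Prints n, the number of SOP terms, and whether any merge is possible.
        print(n, len(onset), can_merge_any(onset))
```

The number of terms doubles with every added XOR input, which is consistent with collapsing being able to destroy a compact structure even when the multi-level upper bound stays small.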

Proposed Benchmarks
- Starting with a seed circuit (could be small)
- A functionally equivalent “big” circuit is created
- The size of the benchmark circuit is adjustable

Ideal case (one common result):
  Seed circuit -> Transformation 1 -> Bench circuit 1 -> Synthesis -> Result
  Seed circuit -> Transformation 2 -> Bench circuit 2 -> Synthesis -> Result
  Seed circuit -> Transformation 3 -> Bench circuit 3 -> Synthesis -> Result

Real case (a different result for each version):
  Seed circuit -> Transformation 1 -> Bench circuit 1 -> Synthesis -> Result 1
  Seed circuit -> Transformation 2 -> Bench circuit 2 -> Synthesis -> Result 2
  Seed circuit -> Transformation 3 -> Bench circuit 3 -> Synthesis -> Result 3

Cong’s LEKU Benchmarks
J. Cong, K. Minkovich: Optimality study of logic synthesis for LUT-based FPGAs, IEEE Trans. on CAD, vol. 26, 2007
- LEKU = Logic Examples with Known Upper Bound
- Based on elimination of the original circuit structure
- … and bad decomposition

Flow:
  G5 (7 LUTs) -> Replicate -> G25 (70 LUTs) -> Collapse -> SOP (19K terms)
  SOP -> ABC balance -> LEKU-CB (814 gates)
  SOP -> SIS tech_decomp -> LEKU-CD (>1M gates)

1. Realistic LEKU Benchmarks
- Any circuit may be used as a seed (instead of G25)
  - Possible chance of success
- Global BDDs may be used instead of collapsing
- Upper bound = size of the original circuit

Flow:
  Original circuit -> Collapse -> possibly large SOP -> SIS tech_decomp -> possibly large circuit
  Original circuit -> Global BDD -> possibly large SOP -> SIS tech_decomp -> possibly large circuit

[Figure: Size increase by collapsing, 250 ISCAS and IWLS benchmarks. x-axis: Gates; y-axis: Size increase factor. Size increase in 61% of circuits.]

1. Realistic LEKU Benchmarks: Experimental results
The ABC, #1 and #2 columns give the synthesis result in 4-LUTs.

Bench  Inp.  Out.  Process          Gates      ABC       #1       #2
c432   36    7     original           145       84       77      118
c432   36    7     global BDD       2,017    1,031    1,023    1,333
c432   36    7     ABC collapse     2,658    1,246    1,548    1,648
c432   36    7     SIS collapse     7,075    3,361    3,872    4,738
c880   60    26    original           208      113      110      122
c880   60    26    global BDD     407,098   93,190  174,983      N/A
c880   60    26    ABC collapse    13,727    7,437    8,109    9,460
c880   60    26    SIS collapse    30,015   19,787   20,487   28,017

2. Parity Benchmark Circuits
- An XOR tree is appended to the circuit outputs, then the structure is destroyed (collapsing, BDD)
- No guarantee of circuit size increase
- Upper bound = size of the core circuit + XOR tree

Flow:
  Inputs x1 … xn -> core circuit -> outputs y1 … ym -> XOR
  Collapse -> possibly large SOP -> SIS tech_decomp -> possibly large circuit
  Global BDD -> possibly large MUX tree -> SIS tech_decomp -> possibly large circuit

[Figure: Size increase by appending parity & collapsing, 100 ISCAS and IWLS benchmarks. x-axis: Gates; y-axis: Size increase. Size increase in 25% of circuits.]

2. Parity Benchmark Circuits: Experimental results
The ABC, #1 and #2 columns give the synthesis result in 4-LUTs.

Bench  Inp.  Out.  Process          Gates      ABC       #1       #2
s1238  32    1     original           493      229      241      263
s1238  32    1     global BDD       6,282    3,849    4,055    3,839
s1238  32    1     ABC collapse    31,839   19,741   21,875   25,793
s1238  32    1     SIS collapse    39,636   26,313   28,254      N/A
b4     33    1     original           267      110      108      116
b4     33    1     BDD             16,963    6,347    6,099    4,285
b4     33    1     ABC collapse     1,405      730      841      884
b4     33    1     SIS collapse     4,087    2,036    2,422    1,627

3. Tautology Benchmarks
- A large random SOP is generated
- When the number of terms exceeds some threshold, the SOP is a tautology
- Then, the big SOP is mapped into 2-input gates (SIS tech_decomp) -> big network
- Upper bound = 0
- The benchmark size may be adjusted by
  1. Number of input variables
  2. Dimension of SOP terms

4. Partial Collapsing
Only parts of the network are collapsed:
1. Choose one pivot gate
2. Extract its transitive fan-in and fan-out to a given radius
3. Collapse the extracted network part
4. Decompose into 2-input gates
5. Put it back
6. Iterate several times

- Upper bound = size of the original circuit
- The benchmark size may be adjusted by
  1. Size of the extracted circuit
  2. Number of iterations

[Figure: Partial Collapsing, example c432. x-axis: Part size; y-axis: Gates.]

[Figure: Partial Collapsing, example big tautology. x-axis: Part size; y-axis: Gates. A second version of the plot adds a zoomed detail with the same axes.]

4. Partial Collapsing: Experimental results
The ABC, #1 and #2 columns give the synthesis result in 4-LUTs.

Bench  Inp.  Out.  Process                   Gates      ABC       #1       #2
c432   36    7     original                    145       84       77      118
c432   36    7     Part. coll., size 98      1,247      626      782      916
c432   36    7     Part. coll., size 109     3,077    1,445    1,699    2,422
c432   36    7     Part. coll., size 138     5,026    2,598    2,761    3,727
c432   36    7     Part. coll., size 140    11,531    6,647    6,844    9,255
c880   60    26    original                    208      113      110      122
c880   60    26    Part. coll., size 129     1,008      485      601      597
c880   60    26    Part. coll., size 171     5,034    2,950    2,394    3,769
c880   60    26    Part. coll., size 201    10,423    6,224    5,010    7,887

5. Replicating Shared Logic
Duplicate a part of the logic that is shared:
1. Find a branching signal
2. Duplicate its transitive fan-in, to a given depth

[Figure: example network (gates G1, G2, G3, G6, G7, G10, G11, G16, G19, G22, G23) before and after replication; the replicated version adds the copies G11’ and G16’.]

- Upper bound = size of the original circuit
- The benchmark size may be adjusted by
  1. Number of duplicated branches
  2. Depth of duplication

5. Replicating Shared Logic: Experimental results
The ABC, #1 and #2 columns give the synthesis result in 4-LUTs.

Bench  Inp.  Out.  Process               Gates      ABC       #1       #2
c432   36    7     original                145       84       77      118
c432   36    7     10k dup., depth 1     1,428       84      244      333
c432   36    7     10k dup., depth 2     4,905       84      447      586
c432   36    7     10k dup., depth 3     8,389       84      396      637
c432   36    7     10k dup., depth 4    11,349       84      452      739
c432   36    7     10k dup., depth 5    16,040       84      472      771

6. Adding Inverters
(special bonus, not included in the proceedings)
- Add pairs of inverters to random locations
- The network size may be arbitrarily expanded
- And all the synthesis tools… are completely immune to this!

Summary Experiments

[Figure: c432. LUTs after synthesis (series #1, #2, ABC) versus source circuit gates.]

[Figure: s1238_p. LUTs after synthesis (series #1, #2, ABC) versus source circuit gates.]

Conclusions
- Several new benchmark generation methods proposed
- Artificially “big” circuits are generated from seed circuits
- Benchmarks are functionally equivalent to the seed circuits
  - the complexity upper bound is known
- Tested on ABC and 2 commercial tools
- Unfortunate result: the bigger the circuit going to synthesis, the bigger the result
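
To illustrate method 5 (Replicating Shared Logic), here is a minimal Python sketch of duplicating the fan-in cone of a branching signal to a given depth and checking by exhaustive simulation that the function is preserved. The netlist format, the toy seed circuit, and the function names (`evaluate`, `copy_cone`, `replicate`, `equivalent`) are assumptions made for this illustration; this is not the authors' generator.

```python
import itertools

# Netlist: gate name -> (operator, list of input signal names).
# Signals that are not keys of the dict are primary inputs.
OPS = {
    "AND": lambda a, b: a & b,
    "OR": lambda a, b: a | b,
    "XOR": lambda a, b: a ^ b,
    "NAND": lambda a, b: 1 - (a & b),
}

def evaluate(netlist, inputs, output):
    """Evaluate one output signal for a dict of primary-input values."""
    def value(sig):
        if sig not in netlist:
            return inputs[sig]
        op, fanin = netlist[sig]
        return OPS[op](*(value(s) for s in fanin))
    return value(output)

def copy_cone(netlist, sig, depth, suffix):
    """Copy the fan-in cone of `sig` down to `depth` gate levels.

    Returns the name of the copied signal. Below the depth limit
    (and at primary inputs) the copy connects to the original signals.
    """
    if depth == 0 or sig not in netlist:
        return sig
    op, fanin = netlist[sig]
    new_name = sig + suffix
    netlist[new_name] = (op, [copy_cone(netlist, s, depth - 1, suffix) for s in fanin])
    return new_name

def replicate(netlist, branch_sig, consumer, depth, suffix="_dup"):
    """Give `consumer` a private copy of the fan-in cone of `branch_sig`."""
    dup = copy_cone(netlist, branch_sig, depth, suffix)
    op, fanin = netlist[consumer]
    netlist[consumer] = (op, [dup if s == branch_sig else s for s in fanin])

def equivalent(net_a, net_b, input_names, output):
    """Compare two netlists on every input combination (small circuits only)."""
    for bits in itertools.product((0, 1), repeat=len(input_names)):
        inputs = dict(zip(input_names, bits))
        if evaluate(net_a, inputs, output) != evaluate(net_b, inputs, output):
            return False
    return True

# Hypothetical seed circuit: n2 is a branching signal feeding o1 and o2.
seed = {
    "n1": ("AND", ["a", "b"]),
    "n2": ("XOR", ["n1", "c"]),
    "o1": ("OR", ["n2", "a"]),
    "o2": ("NAND", ["n2", "b"]),
    "out": ("XOR", ["o1", "o2"]),
}

big = dict(seed)  # replicate() replaces entries, so a shallow copy is enough
replicate(big, "n2", "o2", depth=2)

print(len(seed), len(big), equivalent(seed, big, ["a", "b", "c"], "out"))
```

In this toy case the seed has 5 gates and the replicated version has 7, with the same function on all 8 input combinations. Repeating the step on many branches and with larger depths is what the two size parameters of method 5 control; exhaustive simulation is only practical for very small circuits.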