Synthesis of the Optimal 4-bit Reversible Circuits

Oleg Golubitsky, Google Inc., Waterloo, ON, Canada
Sean Falconer, Stanford University, Stanford, CA
Dmitri Maslov (speaker), University of Waterloo, Waterloo, ON, Canada

Basic Definitions

[Figure: the NOT, CNOT, Toffoli, and Toffoli-4 gates, with wire labels x, y, z.]

A reversible circuit is a string of gates.
A reversible n-bit function is a permutation of 2^n elements.

Problem

Synthesize optimal 4-bit reversible circuits, i.e., circuits containing the minimal number of gates.

Complexity
-- There are 16! = 20,922,789,888,000 reversible functions.
-- There are 32 gates.
-- An average optimal circuit requires 11.94 gates.
:: 20,922,789,888,000 * log2(32) * 11.94 bits > 100 TB.

Murphy, David. "Western Digital Launches World-First 2TB Hard Drive". PC World. Retrieved 2009-01-27.

Importance

-- Library for physicists interested in performing a small experiment but having very limited control over their system.
-- Indispensable for peephole optimization methods. Peephole optimizations are an important part of any modern compiler.
-- Mathematical curiosity: computing the value of Shannon's complexity function. L(3) = 8, L(4) in [14, 17], L(5) = ?

Solution

Denote N = 16! (formally, N := (2^n)!).

Rough complexity analysis
-- time: O(N)
-- space: O(N)

Next, reduce these complexity figures to something manageable.

Solution
Optimization 1

Synthesize and save only halves of all optimal circuits. An optimal circuit for any function may be found by searching for both of its halves.

Rough complexity analysis
-- space: O(sqrt(N))
-- time: O(sqrt(N) * sqrt(N)) = O(N)

Solution
Optimization 2

Store optimal halves in a hash table.

Rough complexity analysis
-- time: soft-O(sqrt(N))
-- space: O(sqrt(N))

Actual complexity is closer to
-- time: soft-O(SpaceRequiredToStoreHalves(N))
-- space: O(SpaceRequiredToStoreHalves(N))

Solution
Optimization 3

Simultaneous input/output relabeling does not change the optimality of a circuit. Thus, we store a single canonical representative (the binary string with the least lexicographic order).
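A minimal sketch of the canonical-representative idea, assuming a 4-bit function is stored as a tuple of its 16 output values. The helper names (permute_bits, relabel, canonical) are illustrative and are not taken from the talk. The sketch enumerates all simultaneous input/output wire relabelings and keeps the lexicographically smallest truth vector.

```python
from itertools import permutations

N_BITS = 4  # assumption: 4-bit functions, stored as tuples of 16 values


def permute_bits(x, sigma):
    """Move bit i of x to position sigma[i]."""
    y = 0
    for i, j in enumerate(sigma):
        if (x >> i) & 1:
            y |= 1 << j
    return y


def relabel(f, sigma):
    """Apply the same wire relabeling sigma to the inputs and outputs of f."""
    g = [0] * len(f)
    for x, fx in enumerate(f):
        g[permute_bits(x, sigma)] = permute_bits(fx, sigma)
    return tuple(g)


def canonical(f):
    """Lexicographically least truth vector among all relabelings of f."""
    return min(relabel(f, sigma) for sigma in permutations(range(N_BITS)))


if __name__ == "__main__":
    not_on_wire_0 = tuple(x ^ 1 for x in range(16))
    not_on_wire_3 = tuple(x ^ 8 for x in range(16))
    # Both functions are the same gate up to wire names, so they share a key.
    print(canonical(not_on_wire_0) == canonical(not_on_wire_3))
```

Functions that differ only by wire names, such as NOT on wire 0 and NOT on wire 3, map to the same key, so a table needs only one entry for the whole class.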
In practice, there are almost 24 = 4! different relabelings, which reduces the storage complexity by a factor of almost 24 and helps to reduce the runtime.

Solution
Optimization 4

If an optimal circuit is found for a function f, an optimal circuit for the inverse function, f^-1, can be obtained by reversing the optimal circuit for f.
In practice, a random f usually differs from f^-1, which reduces the storage requirement by an additional factor of almost 2 and helps to further reduce the runtime.

Performance

Parameters of the linear hash table storing canonical representatives.

k    Size    Memory usage    Load factor
7    2^25    256 MB          0.58
8    2^28    2 GB            0.91
9    2^32    32 GB           0.51

Using a high-performance server with 16 AMD Opteron 2300 MHz processors, 64 GB RAM, and a Seagate Barracuda ES2 SCSI 7200 RPM HDD running Linux, it took 10,549 seconds (under 3 hours) to synthesize all optimal circuits with up to 9 gates.

Performance

Size    Functions
14      17,191
13      2,371,039
12      5,110,943
11      2,051,507
10      392,108
9       50,861
8       5,269
7       455
6       24
5       3

Synthesis of 10,000,000 random functions (Fisher-Yates shuffle over the Mersenne twister random number generator) took 104,616.716 seconds (about 29 hours) of user time, with a maximal memory usage of 43.04 GB.
Loading optimal circuits with up to 9 gates into RAM took 1111 seconds.
On average, it took only 0.01035 seconds to synthesize an optimal circuit. A 5400-RPM HDD access time may be expected to be on the order of 0.01 to 0.02 seconds.

Performance

[Figure: distribution of the number of functions requiring a circuit of a specified size (gate count).]

Performance

Distribution of the number of linear functions requiring a circuit of a specified size (gate count).

Size    Functions
10      138
9       13,555
8       84,225
7       118,424
6       72,062
5       26,182
4       6,589
3       1,206
2       162
1       16
0       1

Weighted average (WA): 6.8816
Total: 322,560

It took under 2 seconds to synthesize all these circuits.

Future directions
Larger circuits

-- There are 80 transformations resulting from the application of all possible Toffoli-type gates on 5 bits.
-- 80^6 * (log2 80) / 5! / 2 ~ 7.1 billion bits, which fits into RAM.
-- 6 + 6 = 12.
That is, it is reasonable to expect that extending the search for optimal 4-bit reversible circuits will make it possible to find optimal 5-bit reversible circuits with up to 12 gates.

Future directions
Optimal circuits using other cost metrics

This search can be easily extended to account for other cost metrics:
-- Weighted gate count optimal circuits: organize the breadth-first search such that a gate with cost G is assigned to a circuit of cost C at iteration number G+C.
-- Depth optimal circuits: choose a different set of elementary transformations, e.g., the circuit NOT(a)CNOT(b,c) is now an elementary transformation.
-- Depth optimal weighted gate circuits: combine the previous two modifications.

END

Questions?
(2^10)! ~ 5.418528796 * 10^2639

(2^10)! =
541852879605885728307692194468385473800155396353801344448287027068321061207337660373314098413621458671907918845
7089807539319941657701873682604541333337219391083675280127649937697682925169378911657556806596637479473145184048866
7767255612518869433525121367727452196343077013371320579624843312887008843617165469023751839045294473227780840293215
8722061853806162806063925435310822186848239287130261690914211362251144684713888587881629252104046295315949943900357
8824102439343150374441138908061814062108639532752353758850185984515822295996545585412427891309024869442986109231533
0757913167574514643630402489082044290773456182736903050225279692655307296737099075874779312763510470246988966796146
2133026237158973227857814631807156427767644064591085076564783456324457736853810336981776080498707767046394272605341
4167791256977333745680374751866762659616656158846814502633370425226641418621570468256847733609443267374936766749150
9895376811294583162664385647902781638573029154266772566564227682605826439388451491197641967550929020859271315636298
3290989441052732125187249527501314071676405516936190781821236701912295767363117054126589929916482008515781751955466
9109028387292322245099063886381477712552277826313223857569488193936588899089936708745168606530984110202998538162815
6433498184710577783953474253149962210348880758451370576983976399310392966504604612116665134513114951365740086905633
4867859885025601787284982567787314407216524272262997319791568603629406624740101482697559533155736658800562921274680
6572852015704019406922855578006114290557553245497940089398491468126398607500852632988202247195855053447737115906566
8282104141726504065860068384494510435499881288680131655155171467338832334085176381971359131237254867373478353731634
1517369387565212899726597964903241208727348690699802996369265070088758384854547542272771024255049902319275830918157
4482051964210728372049372935161753419577754224531524422803913724077178916612030610402558300550338867900521160254087
4045462093838436763788665876991279092232371737134317606748335251362912336288589362713229418356588401041872786935443
9077085278288558308427090461075019007184933139915558212752392329879780649639075333845719173822840501869570463626600
2352655875023355954893116375093802191198604713357716524039994032963602455772579636732866543489573257409997105671316
2327234576676193765140810399919363390828642051009857745452406810689739249313828736222625792000000000000000000000000
0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000

Classically: (2^10)! / (Lifespan_of_universe_in_Planck_time_units * Estimated_number_of_atoms) ~ 10^2452 universes!

Work in progress

Synthesize all optimal 4-bit circuits
-- Store circuits with up to 9 gates, as is done now.
-- Store a bit vector (~250 GB) for canonical representatives of circuits with 10, 11, 12, 13, and 14 gates, one gate count at a time.
-- Use a minimal number of uploads/downloads of parts of each such vector into and out of RAM.
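The size figures quoted in these slides can be checked with exact integer arithmetic. The sketch below recomputes the number of reversible 4-bit functions, the naive storage bound, and the magnitude of (2^10)!. The function names are illustrative, and the conversion to terabytes assumes 1 TB = 10^12 bytes, which the slides do not specify.

```python
import math


def count_reversible_functions(n_bits):
    """Number of reversible n-bit functions: permutations of 2**n elements."""
    return math.factorial(2 ** n_bits)


def naive_storage_terabytes(n_functions, n_gates=32, avg_gates=11.94):
    """Storage for one optimal circuit per function, log2(n_gates) bits per gate.

    Assumption: 1 TB = 10**12 bytes.
    """
    bits = n_functions * math.log2(n_gates) * avg_gates
    return bits / 8 / 1e12


def scientific(value, mantissa_digits=10):
    """Return (leading digits, decimal exponent); digits are truncated, not rounded."""
    digits = str(value)
    return digits[:mantissa_digits], len(digits) - 1


if __name__ == "__main__":
    n4 = count_reversible_functions(4)
    print("16! =", n4)
    print("naive storage (TB):", round(naive_storage_terabytes(n4), 1))
    lead, exponent = scientific(count_reversible_functions(10))
    print(f"(2^10)! ~ {lead[0]}.{lead[1:]} * 10^{exponent}")
```

The naive bound comes out well above 100 TB, which is why the optimizations store only canonical halves rather than one circuit per function.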