Reversible Computing: A Brief Introduction

Dr. Michael P. Frank
mpf@cise.ufl.edu
Dept. of Computer & Information Science & Engineering
(Affil. Dept. of Electrical & Computer Engineering)
University of Florida, Gainesville, Florida

Presented at: 2004 Computing Beyond Silicon Summer School (Week 4)
California Institute of Technology, Pasadena, California, July 6-8, 2004

Abstract
• The performance of power-limited computing systems is directly limited by the energy efficiency of logic operations.
    Performance (ops / time) = Power (energy dissipated / time) × Energy efficiency (ops / energy dissipated)
• Traditional logic techniques are approaching a number of very general physical limits on energy efficiency.
  – Due to quite fundamental thermodynamic considerations.
• The only potential way to circumvent all of these limits is through (logically & physically) reversible computing (RC).
  – It is related to quantum computing, but easier in some ways.
• RC appears to be doable, but it is still very challenging…
  – But, it is a challenge that we must meet, for continued progress.
• In this talk, we survey fundamental concepts, available technologies, and outstanding problems of RC.

Moore's Law – Devices per IC
[Figure: "Moore's Law - Transistors per Chip". Transistors per chip (log scale, 1 to 1,000,000,000) vs. year, 1950-2010. Points run from early Fairchild ICs through Intel µPs: 4004, 8086, 286, 386, 486DX, Pentium, P2, P3, P4, Itanium 2, Madison. Average increase of 57%/year.]

Device Size Scaling Trends
[Figure: "ITRS '03 Feature Lengths", based on ITRS '97-03 roadmaps. Feature length (nm, log scale from 1000 nm = 1 µm down to 0.1 nm) vs. year of production, 1990-2045. Series: DRAM hp, MPU M1 hp, poly hp, printed GL, physical GL, node, EOT. Node labels: 350, 250, 180, 130, 100, 90, 65, 45, 32, 22. Reference scales: virus, protein molecule, DNA/CNT radius, silicon atom, hydrogen atom.]

Trend of Minimum Transistor Switching Energy
[Figure: "ITRS '97-'03 Gate Energy Trends", based on ITRS '97-03 roadmaps. CVV/2 energy (J, log scale from 1.E-14 to 1.E-22, with fJ, aJ and zJ marked) vs. year, 1995-2045. Series: LP min gate energy, HP min gate energy. Node labels: 250, 180, 130, 90, 65, 45, 32, 22. Reference levels: "Practical limit for CMOS?"; room-temperature 100 kT reliability limit; one electron volt; room-temperature kT thermal energy; room-temperature von Neumann - Landauer limit, k(300 K) ln 2.]

The Leakage Problem
• The primary traditional approach to decrease energy dissipation per logic-op has been:
  – Simply decrease the magnitude of the ½CV² energy that is stored per bit.
    • This is done by moving to smaller transistor structures, which decreases C and usable V.
  – However, as V decreases, there is a problem.
• An upper bound on the on/off ratio R_on/off = I_on/I_off of transistors is given by the relation log R_on/off ≲ V/s.
  – The parameter s is called the subthreshold slope.
    • Typical units: mV/decade (decade = log 10)
    • The exact value of s depends on the precise device geometry.
      – It is reduced by going to multi-gate or surround-gate structures.
• But, s has a fundamental room-temperature T minimum of s ≥ T/q = ((kT/q) ln 10)/decade ≈ 60 mV/decade in FETs, independent of materials! (Whether carbon nanotubes, Si nanowires, etc.)
  – This is just due to the ratio between above-barrier state occupancy probabilities for a change in barrier height of V. (From Boltzmann distrib.)
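As a rough numerical check of the bound just stated, the sketch below evaluates the thermal limit s = (kT/q) ln 10 at 300 K and the resulting ceiling 10^(V/s) on the on/off ratio. The function names and the sample voltages are illustrative assumptions, not part of the original slides.

```python
import math

K_B = 1.380649e-23      # Boltzmann constant, J/K
Q_E = 1.602176634e-19   # elementary charge, C


def min_subthreshold_slope(temp_k=300.0):
    """Thermal lower bound on the subthreshold slope, in volts per decade."""
    return (K_B * temp_k / Q_E) * math.log(10)


def max_on_off_ratio(v_swing, temp_k=300.0):
    """Upper bound on I_on/I_off for a voltage swing v_swing (in volts)."""
    return 10 ** (v_swing / min_subthreshold_slope(temp_k))


if __name__ == "__main__":
    s = min_subthreshold_slope()
    print(f"s_min at 300 K: {s * 1e3:.1f} mV/decade")
    # Sample voltages chosen only for illustration.
    for v in (1.0, 0.5, 0.2):
        print(f"V = {v:.1f} V -> on/off ratio <= {max_on_off_ratio(v):.2e}")
```

Under these assumptions, a swing of a few hundred mV leaves only a few decades of on/off ratio, which is the leakage problem in numerical form.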
• At low voltages (e.g., a few hundred mV), transistors can't turn off effectively, and there is substantial continuous power dissipation.
  – Leakage already accounts for as much as 40% of total power in many designs!

A Fairly Conventional "Optimistic" Technology Scenario for CMOS
• Suppose device lengths are cut in half every 3 years…
  – From 90 nm today down to 22 nm node in 2010 (then stop).
  – Node capacitances, gate delays also decrease accordingly…
• "Technology boosters" such as high-κ dielectrics & novel FET structures (FinFET, surround-gate, etc.) keep leakage power manageable, for a little while…
  – However, the absolute minimum room-T subthreshold slope for FETs will remain 60 mV/decade! (= (kT/q) ln 10)
    • Assume this point is also reached by around 2007.
• Voltages then reach a minimum of ~0.5 V in 2007.
  – Can't go lower while keeping on/off ratio above 10⁸ level!
    • A minimum level chosen so as to keep leakage small
• Now, consider what all this implies about future chip performance, given a 100 W maximum power level…
  – Let max raw performance = 100 W / (½CV² gate energy)

Not much life left for standard CMOS…
[Figure: "CMOS Raw Performance - 'Optimistic' Scenario". Device-ops/second per 100 W chip (log scale, 1.00E+17 to 1.00E+19) vs. year, 2004-2020. Annotations: "e.g., 67 million devices actively switching @ 3 GHz"; "e.g. 825 million devices actively switching @ 4 GHz, ~7,000 kT dissip. per device-op".]
Now, even if the leakage problem were solved, the ~100 kT limit for reliable switching is only another factor of 70 beyond this point!

Reversible Computing: Motivation & Basic Concepts

Landauer's (1961) principle: The minimum energy cost of oblivious bit erasure
(Was hinted at by von Neumann '49)
[Figure: Before bit erasure, the system has N possible distinct states s_0 … s_{N−1} with the bit equal to 0, and N possible distinct states s′_0 … s′_{N−1} with the bit equal to 1. Under unitary (one-to-one) evolution these map to 2N possible distinct states t_0 … t_{2N−1} after bit erasure, all with the bit equal to 0.]
Increase in entropy: ∆S = log 2 = k ln 2.
Energy dissipated to heat: T∆S = kT ln 2

Non-oblivious "erasure" (by decomputing known bits) avoids the von Neumann–Landauer bound
[Figure: Before decomputing B, the system has N possible distinct states s_0 … s_{N−1} with A = 0, B = 0, and N possible distinct states s′_0 … s′_{N−1} with A = 1, B = 1. Under unitary (one-to-one) evolution these map, after decomputing B, to N possible distinct states t_0 … t_{N−1} with A = 0, B = 0, and N possible distinct states t′_0 … t′_{N−1} with A = 1, B = 0.]
Increase in entropy: ∆S → 0. Energy dissipated to heat: T∆S → 0

Reversible Computing
• A reversible digital logic operation is:
  – Any operation that performs an invertible (one-to-one) transformation of the device's local digital state space.
    • Or at least, of that subset of states that are actually used in a design.
• Landauer's principle only limits the energy dissipation of ordinary irreversible (many-to-one) logic operations.
  – Reversible logic operations can dissipate much less energy,
    • Since they can be implemented in a thermodynamically reversible way.
• In 1973, Charles Bennett (IBM Research) showed how any desired computation can in fact be performed using only reversible operations (with basically no bit erasure).
  – This opened up the possibility of a vastly more energy-efficient alternative paradigm for digital computation.
• After 30 years of (sporadic) research, this idea is finally approaching the realm of practical implementability…
  – Making it happen is the goal of the RevComp project at UF.

Requirements for Reversible vs. Quantum Computing
(Table columns: Property of Computing Mechanism / Approximate Meaning / Required for Quantum Computing? / Required for Reversible Computing?)

Property: (Treated As) Unitary
  Approximate meaning: System's full invertible quantum evolution, w. all phase information, is modeled & tracked
  Required for quantum computing? Yes, device & system evolution must be modeled as ~unitary, within threshold
  Required for reversible computing? No, only reversible evolution of classical state variables must be modeled & tracked

Property: Coherent
  Approximate meaning: Pure quantum states don't decohere (for us) into statistical mixtures
  Required for quantum computing? Yes, must maintain full global coherence, locally within threshold
  Required for reversible computing? No, only maintain stability of local pointer states & transitions

Property: Adiabatic
  Approximate meaning: No heat flow in/out of computational subsystem
  Required for quantum computing? Yes, must be above a certain threshold
  Required for reversible computing? Yes, adiabaticity as high as possible

Property: Isentropic / Thermodynamically Reversible
  Approximate meaning: No new entropy generated by mechanism
  Required for quantum computing? Yes, must be above a certain threshold
  Required for reversible computing? Yes, isentropicity as high as possible

Property: Time-Independent Hamiltonian, Self-Controlled
  Approximate meaning: Closed system, evolves autonomously w/o external control
  Required for quantum computing? No, transitions can be externally timed & controlled
  Required for reversible computing? Yes, if we care about energy dissipation in the driving system

Property: Ballistic
  Approximate meaning: System evolves w. net forward momentum
  Required for quantum computing? No, transitions can be externally driven
  Required for reversible computing? Yes, if we care about performance

Some Doubts and Their Answers
(Table columns: Some Claims Against Reversible Computing / Eventual Resolution of Claim)

Claim: John von Neumann, 1949 – Offhandedly claims during a lecture that computing requires kT ln 2 dissipation per "elementary act of decision" (bit-operation).
Resolution: No proof provided. Twelve years later, Rolf Landauer of IBM tries valiantly to prove it, but succeeds only for logically irreversible operations.

Claim: Rolf Landauer, 1961 – Proposes that the logically irreversible operations which necessarily cause dissipation are unavoidable.
Resolution: Landauer's argument for unavoidability of logically irreversible operations was conclusively refuted by Bennett's 1973 paper.

Claim: Bennett's 1973 construction is criticized for using too much memory.
Resolution: Bennett devises a more space-efficient version of the algorithm in 1989.

Claim: Bennett's models are criticized by various parties for depending on random Brownian motion, and not making steady forward progress.
Resolution: Fredkin and Toffoli at MIT, 1980, provide a ballistic "billiard ball" model of reversible computing that makes steady progress.

Claim: Various parties note that Fredkin's original classical-mechanical billiard-ball model is chaotically unstable.
Resolution: Zurek, 1984, shows that quantum models can avoid the chaotic instabilities. (Though there are workable classical ways to fix the problem also.)

Claim: Various parties propose that classical reversible logic principles won't work at the nanoscale, for unspecified or vaguely-stated reasons.
Resolution: Drexler, 1980s, designs various mechanical nanoscale reversible logics and carefully analyzes their energy dissipation.

Claim: Carver Mead, Caltech, 1980 – Attempts to show that the kT bound is unavoidable in electronic devices, via a collection of counter-examples.
Resolution: No general proof provided. Later he asked Feynman about the issue; in 1985 Feynman provided a quantum-mechanical model of reversible computing.

Claim: Various parties point out that Feynman's model only supports serial computation.
Resolution: Margolus at MIT, 1990, demonstrates a parallel quantum model of reversible computing, but only with 1 dimension of parallelism.

Claim: People question whether the various theoretical models can be validated with a working electronic implementation.
Resolution: Seitz and colleagues at Caltech, 1985, demonstrate circuits using adiabatic switching principles.

Claim: Seitz, 1985 – Has some working circuits, unsure if arbitrary logic is possible.
Resolution: Koller & Athas, Hall, and Merkle (1992) separately devise general reversible combinational logics.

Claim: Koller & Athas, 1992 – Conjecture reversible sequential feedback logic impossible.
Resolution: Younis & Knight @MIT do reversible sequential, pipelineable circuits in 1993-94.

Claim: Some computer architects wonder whether the constraint of reversible logic leads to unreasonable design convolutions.
Resolution: Vieri, Frank and coworkers at MIT, 1995-99, refute these qualms by demonstrating straightforward designs for fully-reversible, scalable gate arrays, microprocessors, and instruction sets.
Claim: Some computer science theorists suggest that the algorithmic overheads of reversible computing might outweigh their practical benefits.
Resolution: Frank, 1997-2003, publishes a variety of rigorous theoretical analyses refuting these claims for the most general classes of applications.

Claim: Various parties point out that high-quality power supplies for adiabatic circuits seem difficult to build electronically.
Resolution: Frank, 2000, suggests microscale/nanoscale electromechanical resonators for high-quality energy recovery with desired waveform shape and frequency.

Claim: Frank, 2002 – Briefly wonders if synchronization of parallel reversible computation in 3 dimensions (not covered by Margolus) might not be possible.
Resolution: Later that year, Frank devises a simple mechanical model showing that parallel reversible systems can indeed be synchronized locally in 3 dimensions.

[Slide annotation: "working energy recovery"]

Adiabatic Circuits
• Reversible logic can be implemented today using fairly ordinary voltage-coded CMOS VLSI circuits.
  – With a few changes to the logic-gate/circuit architecture.
• We avoid dissipating most of the circuit node energy when switching, by transferring charges in a nearly adiabatic (literally, "without flow of heat") fashion.
  – I.e., asymptotically thermodynamically reversible.
    • In the limit, as various low-level technology parameters are scaled.
• There are many designs for purported "adiabatic" circuits in the literature, but most of them contain fatal flaws and are not truly adiabatic.
  – Many past designers were unaware of (or accidentally failed to meet) all the requirements for true thermodynamic reversibility.

Reversible and/or Adiabatic VLSI Chips Designed @ MIT, 1996-1999
By Frank and other then-students in the MIT Reversible Computing group, under CS/AI lab members Tom Knight and Norm Margolus.

Transition Tables
• Recall how a truth table for Boolean logic (e.g., for AND) lists all possible input combinations on the left, and the corresponding output(s) on the right.
    AND truth table:
      A B | Q
      0 0 | 0
      0 1 | 0
      1 0 | 0
      1 1 | 1
• A transition table is a similar device designed to allow us to easily distinguish reversible operations from irreversible ones.
  – We list each combination of all local bits once in both "before" and "after" columns.
    • Corresponding to just before the operation begins, and just after it is completely finished.
  – We draw an arrow from each before state to the particular after state that it transforms to.
    • Red if the transition is dissipative, green otherwise.
  – Must obey the following rule: Only one of the arrows going into any given after state may be green.
    • It is convenient to order the after column so that all the green arrows go straight horizontally.
  – It may be that only a subset of the input and/or output states arise in the context of a given circuit design.
    • We may "fade away" the particular states and transitions which never arise.
  – An operation is always reversible iff there are no red arrows in the table.
    • This means the operation is one-to-one.
  – An operation is reversible in context iff there are no un-faded red arrows in the resulting table.
    • I.e., the operation is 1-1 on the states that arise.
• We will find these tables to be very useful.

Example: Standard inverter (present-day "NOT gate") operation.
  Function: out := ¬in
  Usually irreversible. Only reversible in the context that its input never changes!
  Transitions, with states written as (in out): 00 → 01, 01 → 01, 10 → 10, 11 → 10

Example: cNOT (controlled-NOT) "gate" (operation).
  Function: D := D ⊕ C
  Always reversible.
  Transitions, with states written as (C D): 00 → 00, 01 → 01, 10 → 11, 11 → 10

Bistable Potential-Energy Wells
A Technology-Independent Model of Digital Devices (Landauer '61)
• Consider any system having an (adjustable) potential energy surface (PES) in its configuration space.
  – The PES should have at least two local minima (or wells)
  – Therefore the system is bistable
    • It has two stable (or at least metastable) configurations
      – Located at well bottoms
      – One state can represent 0, the other 1.
    • This picture can also be easily generalized to larger numbers of stable states.
• The two stable states form a natural bit.
• Consider now the PES having two adjustable parameters:
  – (1) "Height" (energy) of the potential energy barrier between wells, relative to well bottoms
  – (2) Relative height of the left and right states in the well (call this "bias")
[Figure: potential energy vs. generalized configuration coordinate, showing a double well with states 0 and 1.]

Possible Parameter Settings
• In the following slides, we will distinguish six qualitatively different settings of the well parameters, as shown below…
[Figure: grid of six well shapes. Barrier height: raised or lowered. Direction of bias force: left, neutral, or right.]

One Mechanical Implementation
[Figure: mechanical bistable device with labeled parts: box spring, bias rod, fixed sleeve bearing, gate rod, state knob, barrier wedge. Configurations shown: rightward bias, leftward bias, barrier up, barrier down.]

MOSFET Implementation
• The logical state is in the location of a charge packet (excess of electrons) on either side terminal of a FET.
  – The charge packet might even consist of just a single excess electron in a sufficiently small (nanoscale) logic node.
• The potential energy barrier is provided by the built-in voltage across the PN junctions in the FET.
  – The barrier height is lowered when the device is turned on by adjusting the voltage on the gate electrode.
• Bias forces can be provided by (e.g.) capacitive coupling to nearby electrodes.
[Figure: n-p-n FET cross-section with electrons (e) in one n region.]

Possible Well Transitions
• Catalog of all the possible transitions in the bistable wells, adiabatic & not... (Ignoring superposition states.)
  – We can characterize a wide variety of digital logic and memory styles in terms of how their operation corresponds to subgraphs of this diagram.
[Figure: transition graph over the well states ("0" states, "1" states, and the neutral state N), laid out by barrier height and direction of bias force. Edge annotations: "leak", ∆E, k ln 2.]

Logic & Memory Styles
All describable within the potential-well paradigm!
• Irreversible styles:
  – Input-barrier, fixed-bias logic.
    • E.g., standard static CMOS inverters & combinational gates.
  – Input-bias, clocked-barrier latching.
    • E.g., standard static CMOS latches, dynamic RAM cells, etc.
• Reversible styles:
  – Type 1: Input-bias, clocked-barrier latching.
  – Type 2: Input-barrier, clocked-bias logic.
  – Type 3: Input-barrier, clocked-bias latching logic.
• All of these are available in a very wide variety of different physical instantiations of the bistable well.
  – E.g., CMOS, superconducting, quantum-dot, Y-branch switches, mechanical implementations, etc.

Ordinary Irreversible Logics
• Principle of operation: Lower a barrier, or not, based on input. Series/parallel combinations of barriers do logic. Major dissipation in at least one of the possible transitions.
• Can amplify input signals.
• Example: Ordinary CMOS logics
[Figure: well-transition diagram. Annotations: "Input changes, barrier lowered"; "Output irreversibly changed to 0".]

Irreversible SET/CLR operations
• Irreversible SET: Turn on a pFET connecting node B to a high voltage source.
  – Transitions of B (before → after): 0 → 1, 1 → 1
• Irreversible CLR: Turn on an nFET connecting node B to a low voltage source.
  – Transitions of B (before → after): 0 → 0, 1 → 0
[Figure: SET and CLR circuits acting on node B, each annotated with ½CV² dissipation. Voltage color scheme: Low / High.]

Conventional Logic is Irreversible
Even a simple NOT gate, as it's traditionally implemented!
• Here's what all of today's logic gates (including NOT) do continually, i.e., every time their input changes:
  – They overwrite the previous output with a function of their input.
  – They perform a many-to-one transformation of the local digital state!
  – They are therefore required to dissipate ≳kT on avg., by Landauer's principle.
  – They incur ½CV² energy dissipation when the output changes.
• Example: Static CMOS inverter (in → out). Inverter transition table, with states written as (in out):
    Just before transition: 00, 01, 10, 11
    After transition: 01, 10

Example: Standard CMOS Inverter
[Figure: static CMOS inverter between Power (Vdd) and Ground (0V), drawn next to a simplified picture of the PES. Initially In = 0, Out = 1, with one transistor on and the other off. When the input goes high, the barrier between Out and Ground is lowered and charge "falls" to the lower energy level, giving Out = 0. When the input goes low, that barrier is raised, the barrier to Vdd is lowered, and charge falls in from Vdd to Out. Voltage color scheme: Low / High.]

Spacetime Logic Network Diagrams
• In this general class of diagrams (popular in reversible & quantum logic),
  – Time is plotted in one direction, often left→right,
  – Horizontal lines denote locations (nodes, bits of state).
  – Operations (potential change events) are denoted by icons on and/or connections between bit-lines.
• Please keep in mind: These diagrams do not directly depict the spatial structure of how a physical circuit is wired!
  – E.g., a long horizontal line denotes the evolution of a localized node in a physical circuit over a long period of time, not a long, spatially extended wire.
  – A vertical connection between lines or an icon on a line (often called a "gate") denotes a momentary interaction event, not a perpetual physical link, or a physical object.
[Figure: spacetime diagram with axes Location and Time, showing nodes I and O. Annotations:
  – An icon denotes that O potentially changes (whether spontaneously or under external control) at this time.
  – This arrow denotes that some external event causes the value of node I to change at this time.
  – The change in I is propagated so as to cause node O to change a moment later.]

Inverter action in spacetime diagram
• Note: This notation makes it explicit that an ordinary inverter's real semantics is that it should carry out a logically irreversible transformation of its output node.
[Figure: spacetime diagram with axes Location and Time, showing nodes In and Out. Annotations:
  – Some outside influence causes In to possibly change here.
  – The "×" icon denotes that the old value of Out gets obliviously overwritten.
  – This (standard) icon denotes that In's value gets copied (with gain & delay) & inverted to produce the new Out.]

Possible Well Transitions
• Catalog of all the possible transitions in the bistable wells, adiabatic & not... (Ignoring superposition states.)
  – We can characterize a wide variety of digital logic and memory styles in terms of how their operation corresponds to subgraphs of this diagram.
[Figure: transition graph over the well states ("0" states, "1" states, and the neutral state N), laid out by barrier height and direction of bias force. Edge annotations: "leak", ∆E, k ln 2.]

Ordinary Irreversible Memory
• (1) Lower a barrier, obliviously erasing stored information. (2) Apply an input bias. (3) Raise the barrier to latch the new information into place. (4) Remove input bias.
• (1) and (2) can also be in the opposite order
• Examples: ordinary DRAM cell, rod logic register
[Figure: well-transition diagram tracing steps (1) through (4), with branches for Input "0" and Input "1", "Barrier up" after step (3), and "Retract input" at step (4). Annotation: "Dissipation here can be made as low as kT ln 2".]

Example: NMOS latch / DRAM cell
• Sequence corresponds exactly to general picture illustrated on previous slide.
[Figure: NMOS pass transistor between input node I and memory node M, shown on or off at each step: (1) Oblivious erasure; (2) Apply input bias (could also do these in the other order); (3) Raise barrier; (4) Remove input bias (& back to start). Voltage color scheme: Low / Medium / High.]

Irreversible latch in spacetime diagram
• Again, this notation makes it clear that irreversible behavior is occurring.
[Figure: spacetime diagram with axes Location and Time, showing nodes I and M. Annotations:
  – Outside influence causes I to possibly change here.
  – The "×" & arrow denotes that the old value of M gets obliviously erased or overwritten by I when barrier is lowered.
  – Later arrow denotes that I gets reflected (without gain) in location M with a small delay.
  – Barrier is raised shortly afterwards (end of shaded area).
  – I may change again later without necessarily affecting value of M.]

Conventional vs. Adiabatic Charging
For charging a capacitive load C through a voltage swing V
• Conventional charging:
  – Constant voltage source V, delivering charge Q = CV to C
  – Energy dissipated: $E_{\mathrm{diss}} = \tfrac{1}{2} C V^2$
• Ideal adiabatic charging:
  – Constant current source I, delivering charge Q = CV to C through resistance R over time t
  – Energy dissipated: $E_{\mathrm{diss}} = I^2 R t = \frac{Q^2 R}{t} = \frac{RC}{t}\, C V^2$
Note: Adiabatic beats conventional by advantage factor A = t/2RC.
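The comparison on this slide can be tried numerically. The sketch below is a minimal illustration, not a circuit model: the function names and component values (1 fF, 1 V, 10 kΩ, 1 ns) are assumptions chosen for the example. It also includes the linear-ramp expression in terms of the speed fraction s = RC/t that the slides quote next, which interpolates between the two limits.

```python
import math


def conventional_dissipation(c, v):
    """Energy lost charging C to V from a constant-voltage source: (1/2) C V^2."""
    return 0.5 * c * v**2


def ideal_adiabatic_dissipation(c, v, r, t):
    """Energy lost charging C to V at constant current over time t: (RC/t) C V^2."""
    return (r * c / t) * c * v**2


def advantage_factor(r, c, t):
    """Ratio of conventional to ideal adiabatic dissipation: A = t / (2 R C)."""
    return t / (2.0 * r * c)


def ramp_dissipation(c, v, r, t):
    """Energy lost when C is driven through R by a linear voltage ramp of
    duration t, with s = RC/t: E = s * (1 + s * (exp(-1/s) - 1)) * C V^2."""
    s = r * c / t
    return s * (1.0 + s * math.expm1(-1.0 / s)) * c * v**2


if __name__ == "__main__":
    # Illustrative (assumed) values: 1 fF load, 1 V swing, 10 kOhm path.
    c, v, r = 1e-15, 1.0, 1e4
    t = 1e-9  # ramp time, 100x the RC time constant
    print("conventional  :", conventional_dissipation(c, v))
    print("ideal adiabatic:", ideal_adiabatic_dissipation(c, v, r, t))
    print("linear ramp    :", ramp_dissipation(c, v, r, t))
    print("advantage A    :", advantage_factor(r, c, t))
```

With these assumed values the ramp time is 100 RC, so the advantage factor is 50, and the linear-ramp result sits within about 1% of the ideal constant-current figure.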
Adiabatic Switching with MOSFETs
• Use a voltage ramp to approximate an ideal current source.
• Switch conditionally, if MOSFET gate voltage V_g > V + V_T during ramp.
• Can discharge the load later using a similar ramp.
  – Either through the same path, or a different path.
[Figure: voltage ramp source V driving load C (Q = CV) through a MOSFET of on-resistance ~R with gate voltage V_g; ramp time t.]
• Limiting cases:
  – $t \gg RC \Rightarrow E_{\mathrm{diss}} \to \frac{RC}{t}\, C V^2$
  – $t \ll RC \Rightarrow E_{\mathrm{diss}} \to \tfrac{1}{2} C V^2$
• Exact formula, given speed fraction $s :\equiv RC/t$:
    $E_{\mathrm{diss}} = s \left[ 1 + s \left( e^{-1/s} - 1 \right) \right] C V^2$
  (Athas '96, Tzartzanis '98)

Requirements for True Adiabatic Logic in Voltage-coded, FET-based circuits
• Avoid passing current through diodes.
  – Crossing the "diode drop" leads to irreducible dissipation.
• Follow a "dry switching" discipline (in the relay lingo):
  – Never turn on a transistor when V_DS ≠ 0.
  – Never turn off a transistor when I_DS ≠ 0.
  [Callout: Important but often neglected!]
• Together these rules imply:
  – The logic design must be logically reversible
    • There is no way to erase information under these rules!
  – Transitions must be driven by a quasi-trapezoidal waveform
    • It must be generated resonantly, with high Q
• Of course, leakage power must also be kept manageable.
  – Because of this, the optimal design point will not necessarily use the smallest devices that can ever be manufactured!
    • Since the smallest devices may have insoluble problems with leakage.

Possible Well Transitions
• Catalog of all the possible transitions in the bistable wells, adiabatic & not... (Ignoring superposition states.)
  – We can characterize a wide variety of digital logic and memory styles in terms of how their operation corresponds to subgraphs of this diagram.
[Figure: transition graph over the well states ("0" states, "1" states, and the neutral state N), laid out by barrier height and direction of bias force. Edge annotations: "leak", ∆E, k ln 2.]

Erasing Digital Entropy
• Note that if the information in a bit-system is already entropy,
  – Then erasing it just moves this entropy to the surroundings.
  – This can be done with a thermodynamically reversible process, and does not necessarily increase total entropy!
• However, if/when we take a bit that is known, and irrevocably commit ourselves to thereafter treating it as if it were unknown,
  – that is the true irreversible step,
  – and that is when the entropy is effectively generated!!
[Figure: well states for a known bit (0 or 1), an unknown bit (0 or 1?), and the states with lowered barrier including the neutral state N. Annotations:
  – This state contains 1 bit of decomputable information, in a stable, "digital" form.
  – This state contains 1 bit of physical entropy, but in a stable, "digital" form.
  – Note: This transformation is reversible!!
  – In these 3 states, there is no entropy in the digital state; it has all been pushed out into the environment.]

Reversible Set (rSET) & Clear (rCLR)
• rSET operation semantics: Given assurance that a bit is initially 0, unconditionally change it to 1.
  – To implement: Traverse the adiabat (reversible trajectory) shown below.
    • Reverse this path to perform rCLR.
[Figure: trajectory through the well-transition diagram (axes: barrier height, direction of bias force; "0" states, "1" states, neutral state N), passing through steps (1) to (6) from 0 to 1. Annotations: "Get work out", "Put work back in".]

Taking rSET & rCLR out of context
• What happens if we attempt to perform rSET on a bit that is already a 1?
  – It still ends up with the right value (1), but…
  – Irreversible dissipation occurs in step 2 (when barrier is lowered), as shown below.
    • Similarly if we try to rCLR a 0.
[Figure: the same trajectory, steps (1) to (6), started from state 1. Annotations: "(takes work to raise 1)", "(dissipates it as heat)".]

rSET/rCLR transition tables
• Note that these tables are not reversible according to the strict traditional definition…
  – Since they don't represent a 1-1 transformation of all possible input states.
• However, if we restrict our use of these operations so as to always avoid the input states that actually result in dissipation,
  – Then, we obtain a 1-1 transformation of the subset of the input states that are actually used,
  – And that is the correct statement of the true logical requirement for avoiding the dissipation that Landauer's principle would otherwise require!
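The "reversible in context" criterion used here can be stated as a one-line check: an operation is acceptable if it is one-to-one on the states that actually arise. The sketch below is a minimal illustration under that reading; the helper name `is_reversible` and the dictionary encoding of transition tables are assumptions made for the example, while the tables themselves follow the rSET/rCLR semantics above and the inverter and cNOT examples from the earlier transition-table slide.

```python
def is_reversible(transitions, used_states=None):
    """Return True if the operation is one-to-one on the states in use.

    transitions: dict mapping each 'before' state to its 'after' state.
    used_states: the 'before' states that actually arise in the design;
                 None means every state listed in the table may arise.
    """
    states = list(transitions) if used_states is None else list(used_states)
    after = [transitions[s] for s in states]
    return len(set(after)) == len(after)


# rSET leaves the bit at 1 whatever it was; rCLR leaves it at 0.
RSET = {0: 1, 1: 1}
RCLR = {0: 0, 1: 0}
# cNOT on (C, D): D is flipped exactly when C is 1.
CNOT = {(c, d): (c, d ^ c) for c in (0, 1) for d in (0, 1)}
# Conventional inverter on (in, out): out is overwritten with NOT in.
INVERTER = {(i, o): (i, 1 - i) for i in (0, 1) for o in (0, 1)}

if __name__ == "__main__":
    print(is_reversible(RSET))                        # False: 0 and 1 both map to 1
    print(is_reversible(RSET, used_states=[0]))       # True: only the assured input
    print(is_reversible(RCLR, used_states=[1]))       # True
    print(is_reversible(CNOT))                        # True: always reversible
    print(is_reversible(INVERTER))                    # False: many-to-one
    print(is_reversible(INVERTER, [(0, 1), (1, 0)]))  # True: input never changed
```

Restricting `used_states` plays the role of "fading away" the rows of a transition table that never occur in a given design.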
  rSET transition table (before → after):
    0 → 1
    1 → 1 (dissipative; avoided in context)
  rCLR transition table (before → after):
    1 → 0
    0 → 0 (dissipative; avoided in context)

Type 1: Input-Bias Clocked-Barrier Reversible Latching (& Logic)
• Cycle of operation:
  – (1) Data input applies bias
    • Add forces to do majority logic
  – (2) Clock signal raises barrier
  – (3) Data input bias removed
  – (4) Can reset latch reversibly, given copy of contents.
• (Can amplify/restore input signal in the barrier-raising step.)
• Examples: Adiabatic QCA, SCRL latch, Rod logic latch, PQ logic, Buckled logic, Helical logic
[Figure: well-transition diagram with the transitions for steps (1) to (4) marked, for both the 0 and 1 branches from the neutral state N.]

Type 1 Example: Adiabatic NMOS latch / DRAM cell
• Same as irrev. latch, just skip the erasure step!
• Can similarly use a CMOS transmission gate (nFET/pFET pair) to latch a full-swing signal if necessary.
[Figure: NMOS pass transistor between input node I and memory node M, shown on or off at each step: (1) Apply input bias; (2) Raise barrier; (3) Remove input bias. (Reverse steps to reversibly unlatch M.) Voltage color scheme: Low / Medium / High.]

A Simple Reversible CMOS Latch
• Uses a single standard CMOS transmission gate (T-gate).
• Sequence of operation:
  (0) input level initially tied to latch 'contents' (output);
  (1) input changes gradually → output follows closely;
  (2) latch closes, charge is stored dynamically (node floats);
  (3) afterwards, the input signal can be removed.
[Figure: "Reversible latch": T-gate between in and out, controlled by P and its complement; steps (0) to (3).]
  States, written as (in out):
    Before input: 0 0
    Input arrived: 0 0 or 1 1
    Input removed: 0 0 or 0 1
• Later, we can reversibly "unlatch" the data with an exactly time-reversed sequence of steps.
Reversible latch in spacetime diagram
[Figure: spacetime diagram with axes Location and Time, showing nodes I and M. Annotations:
  – Outside influence causes I to possibly change here.
  – Barrier is lowered some time in here (start of shaded area).
  – Arrow to dotted line denotes that change to I is reversibly carried through (without gain) to location M at this time (energy transferred into I is also fanned out to M).
  – Barrier is raised some time afterwards (end of shaded area).
  – I may be restored to neutral again later without necessarily affecting value of M.
  – Dotted lines denote that these nodes contain no information at these times (they are in a predetermined state).]
Unlatching sequence:
[Figure: time-reversed spacetime diagram for nodes I and M.]
Note this operation is reversible only if I and M match up exactly when they are first connected together!

Simplified Version of Diagram
• Suppose the signal on the input node I was produced as a temporary copy of some origin node O.
  – We will see how to implement this reversibly later.
• Then for simplicity of our diagrams, we may wish to omit explicit representation of the intermediate node I.
  – However, we must keep in mind that there is then a small additional space usage not explicitly shown in the diagram.
[Figure: two spacetime diagrams side by side: the full version with nodes O, I, M, and the simplified "Reversible copy" version with nodes O and M only.]

Type 2: Input-Barrier, Clocked-Bias Reversible Retractile Logic
• Cycle of operation:
  – (1) Inputs raise or lower barriers
    • Do logic w. series/parallel barriers
  – (2) Clock applies bias force, which changes state, or not
• Barrier signal is amplified! Gain, restoring logic, fan-out.
• Must reset output prior to changing input.
• Combinational logic only!
• Examples: Hall's logic, SCRL gates, Rod logic interlocks
[Figure: well-transition diagram with steps "(1) Input barrier height" and "(2) Clocked bias force applied", starting from state 0.]

Type 2 example: Adiabatic CMOS "buffer" (really, a cSET/cCLR gate)
• Controlled-SET / controlled-CLEAR.
• Structure: Essentially just a pair of CMOS transmission gates
  – 2 transistors each, an nFET and a pFET in parallel
• Using dual-rail signaling, we can reversibly set or clear a bit on an unoccupied logic node (pair of voltage nodes), conditionally on an input node.
  – Amplifies input signal.
  – Fully restores logic levels.
[Figure: transmission gate controlled by In_N and In_P connecting Drive_N to Out_N, shown on or off over the operation sequence (and similarly for Out_P). Voltage color scheme: Low / High.]

Spacetime diagram for buffer
• Subscript NP notation denotes shorthand for dual-rail NP pair of wires.
  – Still denotes a single logical bit.
• Diagram emphasizes that the buffer copies In_NP's value to a new location.
  – The value simultaneously remains available in the old location.
• Dotted horizontal line shows that Out_NP is empty prior to the operation.
  – The absence of "×" icon shows that the operation is reversible.
• Buffer icon indicates that the input signal is being amplified and restored.
  – Note that the input comes from In_NP, not from previous value of Out_NP.
• Downward wedges remind us the output remains dependent on the input.
  – Input can't be changed without (possibly) irreversibly destroying output.
• Fortunately, the buffer's entire operation sequence is reversible!
  – So, sometime later on, we can unbuffer the output,
    • and then we are free to change the input.
[Figure: two spacetime diagrams for nodes In_NP and Out_NP: buffering, then unbuffering. Annotations: "Restored to null."; "Input value can be changed afterwards."]

A Reversible Buffered Latch
• Uses two dual-rail T-gates.
• Combines a buffer and latch.
  – Reversibly copies In_NP to Mem_NP when operated.
[Figure: icon for a CMOS transmission gate (T-gate) between nodes A and B with control C_NP. Caption: This is our icon for a CMOS transmission gate (T-gate). It says that nodes A and B are connected whenever the control signal C_NP has logic value 1.]
Spacetime diagram for operation sequence:
In NP Physical structure: IntNP DriveNP MemNP InNP LatchNP MemNP IntNP Implements “reversible copy”: InNP MemNP Transition Table for cSET • It is not always reversible, – Not a one-to-one transformation of all possible local states, • But, it is reversible in context – I.e., in the context that input state 1,1 is avoided. Before cSET After cSET Source Destination Source Destination 0 0 0 0 0 1 0 1 1 0 1 1 1 1 1 0 Type 2 example: SCRL inverter (w/o latch) • Same structure as static CMOS inverter, but used reversibly. • Produces a fully-restored, amplified output signal. • Inverters can be cascaded, but need latches to get feedback. driveH driveH In In off In off Out on driveL driveL driveH driveH on on In Voltage color scheme: Low / Medium / High off Out on off Out driveL driveH Out In Out off off driveL driveL SCRL Inverter Transition Table Before After SCRL-Inv SCRL-Inv In Out In Out 0 0 0 ½ 0 1 0 1 ½ 0 ½ ½ ½ 1 1 0 1 ½ 1 1 1 0 • Reversible in context, if input is valid and output is ½ just before drivers do their thing. • No point in even listing the table entries that don’t occur; can summarize operation below. Before After SCRL-Inv SCRL-Inv In Out In Out 0 ½ 0 1 1 ½ 1 0 Spacetime Diagram for SCRL Inverter • Note that the notation shows that Out is being computed from In on a separate wire. – In is explicitly not being inverted “in place.” • Wedge symbols show ongoing dependence. – Of course, we can always undo the op later. In Out … Example: Adiabatic NMOS OR gate • Together A A Out Drive B A A B B A B A Out Drive B Out Drive Out Drive B • Reverse sequence decomputes Out. • Can’t change A,B freely until then. B A Out Drive Out Drive B A Out Drive Out = A B B A Out Drive Out Drive • With NMOS, Out is weak (orange). • Can use an SCRL inverter to restore the signal levels. • If appropriately biased… • Or, just use CMOS transmission gates instead (8T OR) Type 3: Input-Barrier, Clocked-Bias Latching Logic ● Cycle of operation: 1. 
Input conditionally lowers barrier
   • Do logic w. series/parallel barriers
2. Clock applies bias force; conditional bit flip
3. Input removed, raising the barrier & locking in the state-change
4. Clock bias can retract
Examples: Mike's 4-cycle 2-level adiabatic CMOS logic (2LAL)
[Figure: state diagram with transitions labeled (1) to (4) among states 0, N, and 1.]

2LAL: 2-level Adiabatic Logic
A pipelined fully-adiabatic logic invented at UF (Spring 2000), implementable using ordinary CMOS transistors.
• Use simplified T-gate symbol: [symbol with rails TN and TP, abbreviated T]
• Basic buffer element:
  – cross-coupled T-gates:
    • need 8 transistors to buffer 1 dual-rail signal
    [Figure: buffer element with terminals in and out and clock labels 1 and 0.]
• Only 4 timing signals φ0-3 are needed. Only 4 ticks per cycle:
  – φi rises during ticks t≡i (mod 4)
  – φi falls during ticks t≡i+2 (mod 4)
(implicit dual-rail encoding everywhere)
[Animation: waveforms φ0, φ1, φ2, φ3 versus Tick # 0 1 2 3…]

2LAL Cycle of Operation
[Figure: animation frames for Tick #0 through Tick #3, showing the in (in0, in1) and out (out0, out1) rails for the case in=0, out=0.]

A Schematic Notation for 2LAL
[Figure: panels (a) to (h) defining schematic shorthand for 2LAL elements: T-gates with PP/PN controls, buffers clocked by φ(t mod 4), AND (AB) and OR (A+B) gates, and complemented signals ~A, with example signal values.]

2LAL Shift Register Structure
• 1-tick delay per logic stage:
[Animation: chain of stages clocked by φ0 to φ3, from in@0 to out@4.]
• Logic pulse timing and signal propagation:
[Figure: inN and inP pulses versus ticks 0 1 2 3 ...]

More Complex Logic Functions
• Non-inverting multi-input Boolean functions:
[Figure: AND gate (plus delayed A) with inputs A0, B0 and outputs A1, (AB)1; OR gate with inputs A0, B0.]
• One way to do inverting functions in pipelined logic is to use a quad-rail logic encoding:
  – To invert, just swap the rails!
    • Zero-transistor "inverters."
[Figure: rails AN and AP for A=0 and A=1.]

Minimum Losses w.
Leakage
$E_{tot} = E_{adia} + E_{leak}$
$E_{adia} = c_E / t_r$
$E_{leak} = P_{leak} \cdot t_r$
$t_{opt} = \sqrt{c_E / P_{leak}} = \sqrt{c_S / S_{leak}}$
$E_{tot}(t_{opt}) = 2\sqrt{P_{leak}\,c_E} = 2T\sqrt{S_{leak}\,c_S}$

UF CONFIDENTIAL – PATENT PENDING

MEMS Resonator Concept
A potential approach for efficiently driving adiabatic logic transitions

The Power Supply Problem
• In adiabatics, the factor of reduction in energy dissipated per switching event is limited to (at most) the Q factor of the clock/power supply.
  $Q_{overall} = (Q_{logic}^{-1} + Q_{supply}^{-1})^{-1}$
• Electronic resonator designs typically have low Q factors, due to considerations such as:
  – Energy overhead of switching a clamping power MOSFET to limit the voltage swing of a sinusoidal LC oscillator.
  – Low coil count, substrate coupling in integrated inductors.
  – Unfavorable scaling of inductor Q with frequency.
• Our proposed solution:
  – Use electromechanical resonators instead!

MEMS (& NEMS) Resonators
• State of the art of technology demonstrated in lab:
  – Frequencies up to the 100s of MHz, even GHz
  – Q's >10,000 in vacuum, several thousand even in air!
• An important emerging technology being explored for use in RF filters, etc., in communications SoCs, e.g. for cellphones.
[Figure: resonator micrograph, 34 µm scale. U. Mich., poly, f=156 MHz, Q=9,400.]

Original Concept
• Imagine a set of charged plates whose horizontal position oscillates between two sets of interdigitated fixed plates.
  – Structure forms a variable capacitor and voltage divider with the load.
• Capacitance changes substantially only when crossing border.
  – Produces nearly flat-topped (quasi-trapezoidal) output waveforms.
  – The two output signals have opposite phases (2 of the 4 φ's in 2LAL)
[Figure: moving plates between Logic load #1 and Logic load #2 (each modeled as RL, CL), with waveforms of position x and output voltages V1 and V2 versus t.]

MEMS Resonant Power Supply for Ultra-Low-Power Adiabatic Circuits
A.k.a. The "AdiaMEMS" Project
• Part of CISE's Reversible & Quantum Computing group
  – Collab. with Huikai Xie (MEMS, ECE dept.)
• Goal: Demonstrate orders-of-magnitude improvement in power-performance efficiency of digital CMOS circuits.
  – Based on reversible logic in adiabatic circuits powered by high-quality custom microelectromechanical resonators.
• Funding: $40K seed grant from SRC's Cross-Disciplinary Semiconductor Research (CSR) Program
MEMS Designer: Maojiao He
VLSI designer: Krishna Natarajan

Key Characteristics of Resonator
• Goal: Produce a near-ideal trapezoidal output voltage waveform resonantly, with high Q.
• To be optimized with logic: Resonant frequency f.
• Key resonator figures of merit:
  – Effective quality factor: Qeff = Etrans/Ediss.
  – Area efficiency: EA = Etrans/A.
• Key resonator figures of demerit:
  – Maximum relative transition slope: smax = (dC/dt)max / (∆Cmax/∆ttrans)
  – Fractional capacitance variation: vC = ∆Cvar / ∆Cmax
[Figure: capacitance waveform annotated with (dC/dt)max, ∆ttrans, ∆Cvar, and ∆Cmax.]

First MEMS Technology Tried
• MEMS process donated by Robert Bosch corp.
• It is a thin-film technology
  – We have since moved to a multi-layer, bulk single-crystal process which can be expected to do better.
• Integrated CMOS/MEMS devices will eventually be available in this process.
  – However our initial design was dual-die
    • CMOS side was not mature yet in this process
• Minimum etched structure width: λ = 0.5 µm
• Minimum etched gap size: d = 0.1 µm

Some Early Resonator Designs
By Ph.D. student Maojiao He, under supervision of Huikai Xie
[Figure: resonator layouts showing the drive comb and sense comb; close-up of sense fingers; another finger design.]

Resonator Schematic
[Figure: schematic with two actuators and four sensors; labels Vc, Vb, Vp, vac, Ca, Cs, Cr.]

Sensor Design
[Figure: sensor finger geometry with dimensions d, ds, Ls, Lst, Ws, Wst, Wsst and the associated sizing relations, which are not recoverable from the extracted text; sensor capacitance 4Csf: $8 \times 10^{-16}$ F.]
Four-finger sensor (Early design w.
thin fingers)
[Figure: Simulated Output Waveform. Capacitance (×10⁻¹⁶ F) versus t.]

Dissipation in Resonator
Ways to minimize some major sources of dissipation:
• Air damping:
  – Vacuum packaging, small size, or optimize airflow
• Clamping losses to the substrate:
  – Locate support at a nodal point of vibration mode
  – Use impedance-mismatched supports to reflect energy back
• Thermoelastic dissipation (heat flow resulting from nonuniform strain):
  – Small size
  – Use stiff, high thermal conductivity materials (Si, diamond?)
  – Utilize modes with uniform compression/expansion
• Surface loss mechanisms:
  – Avoid layered structures (thin-film interfaces) at surfaces
• Intrinsic material losses:
  – Prefer single-crystal materials

Status / Plans for Near Future
• Improved resonator designs afforded by a suitably modified post-CMOS process flow are being developed.
  – I will briefly review some aspects of the new process.
• A small prototype resonator design was taped out in a post-CMOS MEMS process (TSMC .35)
  – Parts were just received last week; they are presently being etched.
• Process donation has been obtained from MOSIS for fabricating an integrated CMOS/MEMS test chip (~$20k).
  – Resonator driving a simple 2LAL shift register or adder pipeline
  – Tape-out for this chip is scheduled for July 26.
• Test the various parts separately, & together.
  – Characterize power dissipation using sensitive calorimetry techniques.

Post CMOS-MEMS Process (DRIE)
(a) Backside etch. STS: 12-sec etching 130-sccm SF6, 13-sccm O2, 23 mT, 600 W coil power, 12 W platen power; 8-sec passivation 85-sccm C4F8, 12 mT, 600 W coil power, 0 platen power.
(b) Oxide etch. PlasmaTherm-790: 22.5-sccm CHF3, 16-sccm O2, 100 W, 125 mT for 125 minutes and then 100 mT for 10 minutes.
[Figure: cross-sections for steps (a) and (b), showing the CMOS region, single-crystal Si (SCS) membrane, metal-3, metal-2, metal-1, oxide, and poly-Si layers.]

Post CMOS-MEMS Process (DRIE)
(c) Deep Si etch. STS: same as Step (a).
(d) Si undercut. STS: 130-sccm SF6, 13-sccm O2, 23 mT, 600 W coil power, and 0 platen power.
[Figure: cross-sections for steps (c) and (d), showing the CMOS layer, flat structure, thin-film structure, and SCS layer (20~100 µm).]
H. Xie et al, J. MEMS, Vol.11, no.2, 2002

Electrical Isolation of Silicon
[Figure: electrically isolated silicon island; electrically isolated comb fingers; using n-well to improve undercut yield (labels: n-well, Al, Oxide).]

DRIE CMOS-MEMS Resonators
[Figure: front-side view (serpentine spring, proof mass, comb drive) and back-side view. 150 kHz Resonators.]

Post-TSMC35 AdiaMEMS Resonator
Taped out April '04
[Figure: layout showing drive comb, sense comb, and flex arm.]

Close-Up View, Drive/Sense Combs

Side View, Showing Si Undercut

New Comb Finger Shape Concepts
For improved waveform shape and area efficiency

New Comb Finger Shape I
[Figure (cut-away view): load electrode, fixed plates, moving plate, moving plate support arm/electrode, and moving plate range of motion. Color key: metal/oxide layers, silicon substrate material.]
• Maximum vertical (z) thickness for maximum overlap capacitance per planar area
• Minimum thickness to minimize undesired arm-load capacitance
• Minimum gap size for maximum overlap capacitance per-area
Note that the new configuration increases the magnitude of the capacitance variation while reducing the magnitude of departures from the desired trapezoidal wave shape.

New Comb Finger Shape II
[Figure (cut-away view): fixed plates, moving plate, moving plate support arm/electrode, and moving plate range of motion.]
• Maximum vertical (z) thickness for maximum overlap capacitance per planar area
• Minimum gap size for maximum overlap capacitance per-area
Note that the new configuration increases the magnitude of the capacitance variation while reducing the magnitude of departures from the desired trapezoidal wave shape.
In addition, the structures are made of silicon
[Color key: metal/oxide layers, silicon substrate material.]

New Comb Finger Shape III
[Figure: moving plate support arm/electrode, load electrode, fixed plates, moving plate, and plate range of motion. Color key: metal/oxide layers, silicon substrate material.]
• High vertical (z) thickness for large overlap capacitance per planar area
• Note separation to reduce undesired arm-load capacitance
• Minimum gap size for maximum overlap capacitance per-area
Note that the new configuration increases the magnitude of the capacitance variation while reducing the magnitude of departures from the desired trapezoidal wave shape.

New Comb Finger Shape IV
[Figure: x, y, z axes; moving metal plate support arm/electrode; moving plate range of motion; Phase 0° electrode and Phase 180° electrode, each with a plot of C(θ) for θ from 0° to 360°.]
• Arm anchored to nodal points of fixed-fixed beam flexures, located a little ways away, in both directions (for symmetry)
• Repeat interdigitated structure arbitrarily many times along y axis, all anchored to the same flexure
Or, if we can do the structure on the previous slide, then why not this one too? Or, will there be a problem etching the intervening silicon out from in between the metal/oxide layers and the bulk substrate?

New Comb Finger Shape V
[Figure: four fixed plates and a moving plate.]
In this design, the plates are attached directly to a support arm which extends in the y direction instead of x. This arm can be the flexure, or it can be attached to a surrounding frame anchored to a flexure. Note that in the initial position, at all points, we only need to etch from top and/or bottom, with no undercuts. Also, the flexure can be single-crystal Si. Requires accurate, variable-depth backside etch (not presently available).
New finger: One Candidate Layout

New finger simulation results
[Figure: two simulated waveform plots; only axis ticks survive in the extracted text.]

Cadence simulation results
Work by AdiaMEMS project students: Krishna Natarajan, Venkiteswaran Anantharam (UF ECE Dept., under supervision of Dr. Frank, CISE/ECE)

2LAL 8-stage circular shift register
[Figure: shift register layout, in progress; pulse propagation in 8-stage circuit.]

Simulation Results from Cadence
Power vs. freq., TSMC 0.18, Std. CMOS vs. 2LAL
[Figure: log-log plot versus Frequency, Hz (1.E+09 down to 1.E+03). Left axis: average power dissipation per nFET, W. Right axis: energy dissipated per nFET per cycle. Curves: Standard CMOS and 2LAL. Annotations: "<.01× the power @ 1 MHz" and ">100× faster @ 1 pW/T".]
Assumptions & caveats:
• Assumes ideal trapezoidal power/clock waveform.
• Minimum-sized devices, 2λ×3λ
  * .18 µm (L) × .24 µm (W)
• nFET data is shown
  * pFET data is very similar
• Various body biases tried
  * Higher Vth suppresses leakage
• Room temperature operation.
• Interconnect parasitics have not yet been included.
• Activity factor (transitions per device-cycle) is 1 for CMOS, 0.5 for 2LAL in this graph.
• Hardware overhead from fully-adiabatic design style is not yet reflected
  * ≥2× transistor-tick hardware overhead in known reversible CMOS design styles

O(log n)-time carry-skip adder (8 bit segment shown)
[Figure: adder cells with ports S, A, B, G, P, Cin, Cout, grouped into MS and LS halves (Gms/Pms, Gls/Pls), with the 2nd, 3rd, and 4th carry ticks marked.]
With this structure, we can do a $2^n$-bit add in 2(n+1) logic levels → 4(n+1) reversible ticks → n+1 clock cycles. Hardware overhead is <2× regular ripple-carry.
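The latency claim for the carry-skip adder can be tabulated with a short sketch. It reads the operand width on the slide as 2^n bits (which is what the O(log n) title implies), takes the 2 ticks per logic level implied by "2(n+1) logic levels → 4(n+1) reversible ticks", and uses the 4 ticks per clock cycle of 2LAL. The function and constant names are illustrative, not part of the original design.

```python
TICKS_PER_LEVEL = 2  # assumption read off the slide: 2(n+1) levels -> 4(n+1) ticks
TICKS_PER_CYCLE = 4  # 2LAL uses 4 ticks per clock cycle


def carry_skip_latency(bits):
    """Return (logic_levels, ticks, clock_cycles) for a 2^n-bit add.

    `bits` must be a power of two; n is its base-2 logarithm.
    """
    n = bits.bit_length() - 1
    if bits < 1 or (1 << n) != bits:
        raise ValueError("operand width must be a power of two")
    levels = 2 * (n + 1)
    ticks = TICKS_PER_LEVEL * levels
    cycles = ticks // TICKS_PER_CYCLE
    return levels, ticks, cycles


if __name__ == "__main__":
    for width in (8, 16, 32, 64):
        levels, ticks, cycles = carry_skip_latency(width)
        print(f"{width:3d}-bit add: {levels} levels, {ticks} ticks, {cycles} cycles")
```

Under these assumptions a 32-bit add (n = 5) takes 12 logic levels, 24 ticks, and 6 clock cycles, and doubling the width adds only one more cycle.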
[Figure, continued: further adder cells with ports S, A, B, G, P, Cin, Cout in MS and LS halves.]

Adder Schematic – High 16 Bits

32-bit Adder Simulation Results
[Figure: two log-log plots versus Add Frequency (Hz), 1.E+08 down to 1.E+04. "32-bit adder power vs. frequency": Power (W), curves CMOS pwr and Adia. pwr, annotation "20x better perf. @ 3 nW/adder". "32-bit adder energy vs. frequency": Energy/Add (J), curves CMOS energy (1V CMOS, 0.5V CMOS) and Adia. enrgy.]
(All results normalized to a throughput level of 1 add/cycle)

Power vs. freq., alt. device techs.
[Figure: "Power per device, vs. frequency". Power per device (W), 1.E-03 down to 1.E-31, versus Frequency (Hz), 1.E+12 down to 1.E+03. Curves: .18um CMOS, .18um 2LAL, nSQUID, QCA cell, Quantum FET, Rod logic, Param. quantron, Helical logic, and the kT ln 2 limit; the lower region is labeled "Various reversible device proposals".]

Plenty of Room for Device Improvement
• Recall, irreversible device technology has at most ~3-4 orders of magnitude of power-performance improvements remaining.
  – And then, the firm kT ln 2 limit is encountered.
• But, a wide variety of proposed reversible device technologies have been analyzed by physicists.
  – With theoretical power-performance up to 10-12 orders of magnitude better than today's CMOS!
    • Ultimate limits are unclear.

A Potential Scaling Scenario for Reversible Computing Technology
Make same assumptions as previously, except:
• Assume energy coefficient (energy diss. / freq.) of reversible technology continues declining at historical rate of 16× / 3 years, through 2020.
  – For adiabatic CMOS, $c_E = CV^2 \cdot RC = C^2V^2R$.
    • This has been going as ~$\ell^4$ (fourth power of feature length $\ell$) under constant-field scaling.
  – But, requires new devices after CMOS scaling stops.
    • However, many candidates are waiting in the wings…
• Assume number of affordable layers of active circuitry per chip (or per package, e.g., stacked dies) doubles every 3 years, through 2020.
  – Competitive pressures will tend to ensure this will happen, esp. if device-size scaling stops, as assumed.

Result of Scenario
A Potential Scenario for CMOS vs. Reversible Raw Affordable Chip Performance
[Figure: device-ops/second per affordable 100W chip (1.00E+17 to 1.00E+23) versus Year (2004 to 2020), with curves for CMOS and Reversible. CMOS annotation: "e.g. 1 billion devices actively switching at 3.3 GHz, ~7,000 kT dissip. per device-op". Reversible annotation: "40 layers, ea. w. 8 billion active devices, freq. 180 GHz, 0.4 kT dissip. per device-op".]
Note that by 2020, there could be a factor of 20,000× difference in raw performance per 100W package. (E.g., a 100× overhead factor from reversible design could be absorbed while still showing a 200× boost in performance!)

Conclusions
• Standard CMOS is approaching imminent limits on raw performance per unit power consumed.
  – Due to various lower bounds on the energy dissipated by conventional irreversible switching.
• Only mostly-reversible logic architectures have the potential to bypass all of the known energy limits!
  – Via migration to an increasingly adiabatic, ballistic mode of operation, and an increasingly reversible logic design.
    • With increasingly high-Q energy transfers during logic.
• UF's AdiaMEMS project is refining techniques for near-term reversible computing in CMOS/MEMS.
  – Potentially viable technology for ultra-low-power products.
• Long-term, digital circuit architectures that are designed in a mostly-reversible logic style will be the only ones that can be easily ported to future ultra-high-performance reversible logic-device nanotechnologies.
  – We need to start paying more attention to these issues!

AdiaMEMS Project Members – Thanks!
Left to Right: Venki, Mike, Maojiao, Krishna, & Huikai
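
The headline numbers of the "Result of Scenario" slide can be sanity-checked with a few lines of arithmetic. This is a rough sketch, assuming room temperature (300 K) and that every active device performs one device-op per clock cycle; the function name `chip` and the variable names are illustrative only.

```python
K_BOLTZMANN = 1.380649e-23  # J/K
T_ROOM = 300.0              # K (assumption: room-temperature operation)


def chip(devices, freq_hz, kt_per_op):
    """Return (device-ops per second, dissipated power in W).

    Assumes each active device performs one op per cycle and
    dissipates `kt_per_op` units of kT per op.
    """
    ops_per_s = devices * freq_hz
    power_w = ops_per_s * kt_per_op * K_BOLTZMANN * T_ROOM
    return ops_per_s, power_w


# CMOS point from the slide: 1 billion devices at 3.3 GHz, ~7,000 kT per op.
cmos_ops, cmos_w = chip(1e9, 3.3e9, 7000)
# Reversible point for 2020: 40 layers of 8 billion devices at 180 GHz, 0.4 kT per op.
rev_ops, rev_w = chip(40 * 8e9, 180e9, 0.4)
ratio = rev_ops / cmos_ops

if __name__ == "__main__":
    print(f"CMOS:       {cmos_ops:.2e} ops/s at {cmos_w:.1f} W")
    print(f"Reversible: {rev_ops:.2e} ops/s at {rev_w:.1f} W")
    print(f"Raw performance ratio: {ratio:.0f}x")
```

Under these assumptions both design points land just under the 100 W budget, and the raw performance ratio comes out a little above 17,000×, in line with the rounded 20,000× quoted on the slide.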