“Information Friction” and Minimum Power Consumed in Data Center and Big Data Networks The hope: Pulkit Grover CMUdistortion? y probes = significant near-field the core idea: coil in or around the circuit coil is probed, detect probing by measuring field Acknowledgments: NSF ECCS 1343324 NSF CSoI !1 /15 Energy consumed in communication & computing In the body - 20% of total energy In the world - 8% of electricity in 2011 - Projection: 15% by 2020 !2 /15 Moore’s law, and the difficulty of reducing computation power Moore’s law is saturating Higher frequencies cost more power But this is the era of BigData! !3 /15 Solution to Moore’s law saturation: Parallelize! Use communication to compute Communication has become a bottleneck “We need a “Moore’s Law” technology for networking” [Byrant, Katz, Lazowska ’08] !4 /15 ! How much power must communication consume? Transmitter PT A soft-spoken teacher’s dilemma Speak slower? louder? Channel c M Receiver “Shannon’s solution” Neither! Have a lot to say; “Mix” your information −5 log (P(P ) log 10 e e ) 10 M −10 fixed R, distance −15 path loss Shannon limit ✓ ◆ −25 (Tx power) PT C = W log 1 + >R N0 −20 −30 0 0.05 0.1 0.15 Total power (Watts) Total power (watts) Transmit 0.2 !5 /15 How can we reduce communication power? log10(P(P ) log 10 e e ) −5 −10 fixed R, distance −15 path loss Shannon limit ✓ ◆ −25 (Tx power) PT C = W log 1 + >R N0 −20 −30 0 0.05 0.1 0.15 Total power (Watts) Total power (watts) Transmit 0.2 Shannon limit is already achieved [Turbo Codes ’93] [LDPC codes ’98] [Polar Codes ’08] We are already at the theoretical minimum transmit power /27 !6 /15 Communication systems: then and now !7 /15 The problem has changed We now want to minimize total (transmit + circuit) power !8 /15 What could fundamental limits on total power look like? M log10(P(P ) ) log e e 10 −5 −10 Transmitter PT Channel c M Receiver Total power behavior fixed rate, distance this? ... or this? −15 −20 Shannon limit −25 (Tx power) −30 0 0.05 0.1 0.15 Total power (watts) 0.2 Total power (Watts) !9 /15 Total power: Hints from experiments Circuit-simulation-based modeling of decoding power [Ganesan, Grover, Rabaey SiPS’11][Ganesan, Wen, Grover, Rabaey’12] Regular LDPCs (3,4) (3,6) (4,8) (5,10) log10 (Pe ) −5 −10 Ptotal ⇤ PT −15 −20 0 0.025 0.05 0.075 Power (Watts) R = 7 Gbps W = 7 GHz path-loss coefft = 3 d = 3 meters fc = 60 GHz STM CMOS 90 nm Cadence Encnter (Layout) Synopsys HSPICE (post-lyt) 0.1 !10 /15 “Information-friction” energy model Circuit implementation model: Computational nodes that can talk to one another over circuit links Min distance between nodes = “Info-friction” model of energy consumed in circuit-links: weight w w B bits d Efriction = µ w d Example applications: Einfo d friction =µBd Coefficient of informational friction !11 /15 Info-friction losses in encoding/decoding M Transmitter PT c M Receiver Channel Theorem [Grover, ISIT ’13] 0 s ⌦ @k Eenc,dec log 1 Pe PT 1 for any code, and any encoding & A decoding algorithm implemented in the circuit model log10 (Pe ) −5 Ptotal −10 −15 ⇠ −20 r 3 PT⇤ 1 log Pe −25 0 Ptotal r 1 3 2 1 PT ⇠ log Pe PT with bounded r ⇠ ⇠ −30 In comparison, uncoded transmission: 1 log Pe 1 log Pe 3 Power (Watts) 4 5 −5 x 10 Experimental corroboration [Ganesan, Grover, Goldsmith, Rabaey ‘12] !12 /15 Why does this matter? Traditional view Total-power perspective 41 Code choice (10-Gbase-T): (6,32) - “RS-LDPC”, length 2048 10 FER/BER FER/BER 10 10 10 10 10 10 10 0 20 30 −2 -5 −4 log10 (Pe ) 10 Distance (meters) −6 -10 −8 −10 −12 −14 2.5 -15 10 iterations 20 iterations 50 iterations 100 iterations 200 iterations 3 3.5 (6,32) - “RS-LDPC” Code-choice must length 2048 depend on distance -20 4 4.5 5 5.5 6 Eb/No (dB) SNR (dB) FER (dotted lines) and BER (solid lines) performance of the Q4.2 sum-product the (2048,1723) RS-LDPC code using different number of decoding iterations. New codes that adapt and minimize total power [Mahzoon, Yang, Grover in prep] !13 /15 Towards fundamental limits on energy of computing networks ! ! Total computation energy on-chip energy + cable-energy µon-chip Bon-chip don-chip + µcable Bcable dcable Total energy for sorting, classification, … !14 /15 Can noisy or approximate computing help? The Shannon waterfall has inspired a study of noisy computing [von Neumann’56] [Pippenger’88] [Hajek, Weller’91] [Evans, Schulman’99] … [Shanbhag, Kumar, Jones’10] Theorem [Grover, ISIT ’14 Sub.] Etotal per-bit ⌦ ✓r 3 1 log Pe ◆ for any implementation with Gaussian noise in circuit elements … i.e., the “Shannon-capacity” of noisy computing is zero! [Grover, ISIT’14 Sub.] Nevertheless, noisy computing could help, . . . but don’t expect Shannon waterfall’s “magic” !15 /15 Summary 1) Traditional info theory ignores energy of encoding/decoding - useful in long distance communication - misleading in computation −5 log10 (Pe ) log10(P(P ) log 10 e e ) −5 −10 −10 −15 −15 −20 −20 −25 PT⇤ Shannon limit (Tx power) Ptotal −25 −30 −30 0 Total power with info-friction 0.05 0.1 0.15 Total power (Watts) Total Total power (watts) 0.2 0 1 2 3 4 Power (Watts) 5 −5 x 10 2) Info-friction model: fundamental limits on energy of various computations - sorting/ranking, Fourier transforms, classification ! ! 3) Insights towards “good” computational network designs 4) Still need to understand noise in computing and its impact !16 /15 !17 /15 Extra slides !18 /15 cellationof wave functionsin re- above use active channels 8. C. M. Cav J. J. Hall (Cambrid 89; W. Zu R. Landa C. H. Ben __, I D. Frank Requirements Internati Minimal Energy sociation ... in Communication 1995), pp Worksho '94 (IEE ’96] the wave function within that eliminated.[Landauer the chaoticnature Furthermore, CA, 1994 NOTES literature describing the energy needs for a communications channel has been 13. C. M. Cav atREFERENCES a much AND In it higher energy.The the of billiard ball collisions makes extremely dominated by analyses of linear electromagnetic transmission, often without awareness 66, 481 ( . J. Betley, V. L. Miller,J. J. Mekalanos, Annu. Rev. energy that an amount of the conclusion case leads to special case. This that this is a the mit, approaches in the that sensitive to flaws machinery. Using crobiol.40, 577 behavior (1986); J. Morschhauser, V. Vet14. L. Brillou , L. Emody, J. Hacker, Mol. Microbiol. 11, 555 equal to kTln 2, where kT is the thermal noise per unit bandwidth, is needed to transmit wo-level such motion forphoton a communications as an elecR. Taylor, C. system, P. Spears, 994); Shaw, K. Peterson, energies hv > kT. Alternative demic Pr if quantized ball channels are used with a bit, and morebilliard Mekalanos, Vaccine 6,151 (1988); J. Reidl and J. The physicist noframework unavoidable minimal to done show that there is a methods are proposed communication 15. L. B. Le where noMicrobiol. details the link most within beyond spinis easily Mekalanos, Mol. 18, 685 (1995). energy requirement per transmitted bit. These methods are invoked as part of an analysis J. Mekalanos, Cell 35, 253 (1983). Physics states net rate of billiardball transfer in-down exist. where D. N. Pearson, thesis, Harvard UniversityMedical procedures. of ultimate limits and not the as practical Compute hool (1989). smatter inA. two an in- doesnot dependon the informationcontent First, D. N. Pearson, Woods, ways. S. L. Chiang, J. J. pp. 210-2 ekalanos, Proc. Natl. Acad. Sci. U.S.A. 90, 3750 t993). can be transmitted.This is not a of the link. For example,0 is denotedby a 16. H. J. Br Fasano et a/, ibid. 88, 5242 (1991). Informationis inevitablytied to a physical with a known and specifiablelowerbound (1982); H and a 1 isIt was J. Michalski, the of error is ball an A. Fasano, J. B. Trucksis, J. if Galen, oblem, followed empty slot, probability by information. a paper, a that discard as a mark on are those such representation, aper, ibid. 90, 5267 (1993). 17. E. Fredk path loss that operalong ago (9) also understood spin emptyslot followedby a in a punched card, an electron hole B. Kaper, J. Michalski, J. M. mall error rate a Ketley, M. M. Levine, small represented by an requires only 18. R. Landa fect. Immun. 62,1480 (1994). or down, or a chargepresentor tions that do throwawayinformation,such pointingup (channel “frictional” loss) fK.redundancy use of athereturn Taylor,V. L. Miller,D. B. to J. J. Mekalaerror free ball. This scheme Furlong, regain permits logicalOR, can Noise in AND and logical This representation as the a capacitor. absent on s, Proc. Natl. Acad. Sci. U.S.A. 84,✓ 2833 (1987). ◆ that per- Mountain operations larger be imbeddedin of physics whether the laws leads us to asklink L. Miller andThe J. J. Mekalanos, sion if are ibid. 81,that we opti3471 Prestores (3). (Fig. 5). Alternatively, decoding Trestrictthe handlingof informationand in form a logical 1:1 mapping and do not National 984); V. J. DiRita,Mol. Microbiol.6, 451 (1992). C = W log 1 + >R L. Miller and J. J. Mekalanos, J. Bacteriol. nal 170, message is not about lifetime of a then photon, dissipationless; mistic the a real 1981), pp Nevertheless, discard information. enerminimal whether there are particular N 75 (1988); V. J. DiRita,C. Parsot, G. Jander, J. J. 0 is now calledreversof what with are understanding associated gy dissipationrequirements ekalanos, Proc. Natl. Acad. Sci.away U.S.A. 88, information 5403 we are inventions there throwing possible conceivably ,A 19. information handling. The subject has ible computationcame from the work of 991); G. Jonson, J. Holmgren, A.M. Svennerholm, 20. E. Goto, J fect. Immun.(4). 60, 4278However, (1992). error the the minimal make use of timing that compupolarization showed branches (10, 11 ), whoor Bennett but interrelated three distinct that Noise Goldberg and J. J. Mekalanos, J. Bacteriol. 165, [Shannon The engineer conductedthrougha J. von Ne can alwaysbe ’48] measurement tation with the dealing,respectively, 3 (1986). is to the Or there couldBennett be furdissipation proportional of single photons. perhaps /15 !19exam process,the communicationschannel, and seriesof logical 1:1 mappings. 21. For e primers used to determine the sequence of the 1994, Genetics Computer Group, 575 Science grateful to Mekalanos laboratory members, espeof geneticelementsmediatingthe transDrive, Madison, WI 5371 1, USA). ciallyE. Rubin,J. Tobias, S. Chiang, and G. Pearson, 0f virulence or 1 state channel.Such a (Fig. 4A). The two all along the length offorthe genes with the pathogenic 22. S. L. Chiang, R. K. Taylor,M. Koomey, J. J. Mekalavaluable support, suggestions, and strains. All nos, Mol. Microbiol.17,1133 (1995). experiments involvinganimals were done in accorderialeigenstates species they infect. (Fig. Thus, a virued as 4B) have these schemes setup makes ance with theunappealing guidelines for animal experimentation e factor (TCP) is the receptorfor a 23. D. A. Herrington et al., J. Exp. Med. 168, 1487 of HarvardMedical School. (1988). fferent energies. The most 9. Their exact technological candidates. eriophageencoding another virulence 24. wave R. H. Hall,P. A.serious Vial,J. B. Kaper, J. J. Mekalanos, M. 22 April1996; accepted 28 May 1996 M. Levine, Microb. Pathogen. 4, 257 (1988). or(CT), both of which arecoordinately within a difpocket active link, for the 10. are obvious alternative to an slightly latedby the same virulenceregulatory 11. In this case, the natural habitatand can(toxR). ey cannot cancel exactly classicalcase, is the use of particleswith 12. th phageand pathogenis the gastroinanal totally This sufficientlywell-definedspeed and directract. It isunoccupied apparentthat thispocket. host partment the necessary envican beprovides minimized by using very tion, as in the well-knowncollidingbilliard mentalsignalsrequiredforthe expression in which tential ball computer(29). This scheme assumes sential genepockets, productsmediating interac-the next between all threethe participants, namely, ate within pocket, a that friction can be completely having and noise Rolf Landauer [1996] erium,phage,and mammalianhost. How much power must communication consume? zero! Total power: Hints from experiments Circuit-simulation-based modeling of decoding power [Ganesan, Grover, Rabaey SiPS’11][Ganesan, Wen, Grover, Rabaey’12] Regular LDPCs (3,4) (3,6) (4,8) (5,10) log10 (Pe ) −5 −10 Ptotal ⇤ PT −15 −20 0 0.025 0.05 0.075 Power (Watts) R = 7 Gbps W = 7 GHz path-loss coefft = 3 d = 3 meters fc = 60 GHz STM CMOS 90 nm Cadence Encnter (Layout) Synopsys HSPICE (post-lyt) 0.1 !20 /15 Decoding complexity/power: Fundamental limits Theorem*[Grover, Woyach, Sahai ISIT ’08, JSAC ’11] # of clock-cycles, ⇥ & 1 log( 1) log log P1e (C(PT ) R)2 /V (PT ) ! . . . for any code and any decoding algorithm. ↵ : Max. node degree V (PT ) : “channel dispersion” Node mode: Enode joules per clock-cycle * precise result for any Pe and any gap appears in [Grover, Woyach, Sahai ’11] !21 /15 Moving information: a model of implementation Output nodes Input nodes Helper nodes 1) Asynchronous 2) No connectivity limitations 3) No constraints on interconnect material 4) but fixed schedule [Grover ’13] !22 /15 Looking forward: Neuronal and neuro-morphic processing Nodes of Ranvier (provides insulation) friction + noise in computing? !23 /15