Defect and Fault Tolerant Architectures for Nanoscale Devices
David Newell, BSEE '07
Taylor Johnson, BSEE '08
ELEC527
March 22, 2007

Motivation
“As silicon manufacturing technology reaches the nanoscale, architectural designs need to accommodate the uncertainty inherent at such scales. These uncertainties are germane in the miniscule dimension of the devices, quantum physical effects, reduced noise margins, system energy levels reaching computing thermal limits, manufacturing defects, aging and many other factors. Defect tolerant architectures and their reliability measures will gain importance for logic and micro-architecture designs based on nanoscale substrates.”
Debayan Bhaduri, Sandeep Shukla, NANOLAB: A Tool for Evaluating Reliability of Defect-Tolerant Nano Architectures

State of the Art Yesterday
[Figure]
http://www.rpi.edu/~schubert/Educational%20resources/Educational%20resources.htm

State of the Art Yesterday (cont)
- Intel 4004, 1971: max clock speed 740kHz; process 10um PMOS; 2250 transistors
- Intel 8008, 1972: max clock speed 800kHz; process 10um PMOS; 3500 transistors
- Intel 8080, 1974: max clock speed 2MHz; process 6um NMOS; 6000 transistors
- Intel 80286, 1982: max clock speed 12.5MHz; process 1.5um CMOS; 134,000 transistors
- Intel 80386, 1985: max clock speed 16MHz; process 1um CMOS; 275,000 transistors
- Intel 80486, 1989: max clock speed 25MHz; process 1um CMOS; 1.2 million transistors
- Pentium, 1993: max clock speed 66MHz; process 0.8um CMOS; 3.1 million transistors
- Pentium Pro, 1995: max clock speed 200MHz; process 0.6um CMOS; 5.5 million transistors
- Pentium II, 1997: max clock speed 300MHz; process 0.35um CMOS; 7.5 million transistors
- Pentium III, 1999: max clock speed 600MHz; process 0.25um CMOS; 9.5 million transistors
- Pentium 4, 2000: max clock speed 1.5GHz; process 0.18um CMOS; 42 million transistors
- Pentium 4HT, 2002: max clock speed 3.06GHz; process 0.13um CMOS; 55 million transistors
http://www.cpu-world.com/CPUs/CPU.html
- Pentium 4EE, 2003: max clock speed 3.2GHz; process 0.13um CMOS; 178 million transistors
- Pentium M, 2005: max clock speed 2.13GHz; process 90nm CMOS; 140 million transistors
- Core Duo, 2006: max clock speed 2.33GHz; process 65nm CMOS; 291 million transistors
www.wikipedia.org

Transistors
[Chart: log(Number of Transistors) by Model and Year, Intel 4004 (1971) through Core 2 Duo (2007); vertical axis from 1.0E+00 to 1.0E+09]

Process Size
[Chart: log(Process Size) by Model and Year, Intel 4004 (1971) through Core 2 Duo (2007); vertical axis from 0.01 to 10]

State of the Art Today
- Core 2 Duo, 2006-2007: max clock speed 2.66GHz; process 65nm CMOS; 376 million transistors
www.wikipedia.org

State of the Art Tomorrow: Evolutionary
- Fabrication (<45nm)
  - Extreme ultraviolet lithography
  - Electron projection lithography
- Interconnect problems
INTERNATIONAL TECHNOLOGY ROADMAP FOR SEMICONDUCTORS, http://www.sia-online.org

State of the Art Tomorrow: Revolutionary
- Molecular Electronics
  - Self-assembly
  - Carbon nanotubes
- Issues
  - Nanotube transistors are only a few atoms across
  - More transistors means more chances for failure

Traditional Full Adder
[Figure]
Ellenbogen, J.C., Love, J.C., Architectures for molecular electronic computers, Proceedings of the IEEE, Vol. 88, No. 3, March 2000.

Molecular Electronics: Full Adder using Molecular Diodes
[Figure]
Ellenbogen, J.C., Love, J.C., Architectures for molecular electronic computers, Proceedings of the IEEE, Vol. 88, No. 3, March 2000.

[Figure]
Ellenbogen, J.C., Love, J.C., Architectures for molecular electronic computers, Proceedings of the IEEE, Vol. 88, No. 3, March 2000.

Architecture Tolerance Types
- Defect Tolerance
  - Manufacture-time defect detection and reconfiguration
  - Ex: controlling placement of wires, orientation of wires, and interconnects
- Fault Tolerance
  - Operation-time fault detection, reconfiguration, recovery, etc.
Shukla, Goldstein, et al, Nano, Quantum, and Molecular Computing: Are We Ready for the Validation and Test Challenges. In Eighth IEEE International High-Level Design Validation and Test Workshop, pages 3-7, November, 2003.

Defect Tolerant Architecture
- An architecture that uses techniques to mitigate the effects of defects in its constituent devices and guarantees a given level of reliability
- So, what are some of these techniques?
Shukla, et al, Evaluating the Reliability of Defect-Tolerant Architectures for Nanotechnology, Proceedings of the 17th International Conference on VLSI Design, 2004.
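The earlier point that more transistors means more chances for failure can be made concrete with a short calculation. This is a minimal sketch, assuming each device fails independently with the same probability p; the function name and the value p = 1e-9 are illustrative assumptions, not figures from the cited sources. The transistor counts are the ones quoted on the processor slides.

```python
import math

def prob_all_devices_work(num_devices, p_fail):
    """Probability that every one of num_devices works, i.e. (1 - p_fail) ** num_devices.

    Assumes independent, identically distributed device failures (illustrative model).
    log1p keeps the result accurate when p_fail is tiny.
    """
    return math.exp(num_devices * math.log1p(-p_fail))

# Hypothetical per-device failure probability, chosen only for illustration.
P_FAIL = 1e-9

for name, count in [("Intel 4004", 2250), ("Core Duo", 291_000_000)]:
    print(f"{name}: P(no defective device) = {prob_all_devices_work(count, P_FAIL):.6f}")
```

Under this model, a chip that needs every device to work becomes steadily less likely to be usable as the device count grows, which is the case for building tolerance into the architecture.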

Building on Traditional Tolerance Methods: Teramac (1998)
- Massively parallel experimental computer built at Hewlett-Packard Laboratories to investigate a wide range of computational architectures
- Teramac's defect-tolerant architecture incorporates a high communication bandwidth that lets it easily route around defects, which has significant implications for any future nanometer-scale computational paradigm
- It may be feasible to chemically synthesize individual electronic components with less than 100 percent yield, assemble them into systems with appreciable uncertainty in their connectivity, and still create a powerful and reliable data communications network
- Future nanoscale computers may consist of extremely large configuration memories that are programmed for specific tasks by a tutor that locates and tags the defects in the system
Heath, J. R., et al, A Defect-Tolerant Computer Architecture: Opportunities for Nanotechnology, Science, Vol. 280, June 1998

Building on Traditional Tolerance Methods: Teramac (cont)
- Consists of 65,536 LUTs connected via crossbars in a fat-tree network
- Extremely flexible architecture with few critical paths
- Highly redundant connectivity
- Contains about 220,000 hardware defects, any one of which could prove fatal to a conventional computer
- Despite the defects, some of its configurations operated 100 times faster than a high-end single-processor workstation
- Functions normally despite defects in 10% of cells and interconnects
Heath, et al, Science, Vol. 280, June 1998

Fault Tolerance: Teramac Overview
- Operates successfully because defects are learned after fabrication
- Avoids defects thanks to extremely high connectivity via a high-bandwidth bus
- Redundancy
- Tree architecture gives an intrinsic ability to find paths to an end node
Heath, et al, Science, Vol. 280, June 1998

Fault Tolerance: Teramac – Lesson #1
- It is possible to build a very powerful computer that contains defective components and wiring, given sufficient communication bandwidth in the system to find and use the healthy resources
- The machine is built cheaply but imperfectly, a map of the defective resources is prepared, and then the computer is configured with only the healthy resources
Heath, et al, Science, Vol. 280, June 1998

Fault Tolerance: Teramac – Lesson #2
- Resources in a computer do not have to be regular, but they must have a sufficiently high degree of connectivity
- A system at the nanoscale that has some random character can still be functional if there is enough local intelligence to locate resources, either through the laws of physics or through the ability to reach down through random but fixed local connections
Heath, et al, Science, Vol. 280, June 1998

Fault Tolerance: Teramac – Lesson #3
- Wires are by far the most plentiful resource; the most important are the address lines that control the settings of the configuration switches and the data lines that link the LUTs to perform the calculations
- In a nanotechnology paradigm, these wires may be physical or logical, but they will be essential for the enormous amount of communication bandwidth that will be required
Heath, et al, Science, Vol. 280, June 1998

Fault Tolerance: Teramac – Lesson #4
- The conventional paradigm for computation is to design the computer, build it perfectly, compile the program, and then run the algorithm
- The Teramac paradigm is to build the computer (however imperfectly), find the defects, configure the resources with software, compile the program, and then run it
- This moves what is difficult to do in hardware into a software task, continuing a trend that has accompanied the development of electronic computers from their first appearance
Heath, et al, Science, Vol. 280, June 1998

Tolerance Methods in Traditional Silicon Architectures
- Von Neumann Defect
  - Expect a 0 and see a 1
  - Expect a 1 and see a 0
- Byzantine Defect
  - Unknown number of faulty inputs
  - Given full communication, if fewer than 1/3 of inputs are faulty, the correct output can still be determined
Shukla, et al, Evaluating the Reliability of Defect-Tolerant Architectures for Nanotechnology, Proceedings of the 17th International Conference on VLSI Design, 2004.

Traditional Methods Applied: NAND Multiplexing
- Proposed by von Neumann in 1952
- Idea: if the failure probabilities of the gates are sufficiently small and failures are independent, then computations may be done with a high probability of correctness
Shukla, et al, Proceedings of the 17th International Conference on VLSI Design, 2004.

Traditional Methods Applied: NAND Multiplexing
[Figure]
Shukla, et al, Proceedings of the 17th International Conference on VLSI Design, 2004.
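The NAND multiplexing idea can be illustrated numerically. The following is a rough sketch rather than the model used in the cited paper: it tracks only the expected fraction of logic-1 wires in a bundle, assumes very large bundles with random pairing so that paired wires can be treated as independent, and assumes each gate inverts its output with probability eps. The function names `nand_stage` and `restore` are hypothetical.

```python
def nand_stage(x, y, eps):
    """Expected fraction of logic-1 output wires of one multiplexed NAND stage.

    x, y : fractions of logic-1 wires in the two input bundles (paired at random)
    eps  : assumed probability that a gate inverts its correct output
    """
    both_high = x * y                      # a pair is (1, 1) with probability x * y
    correct_output_high = 1.0 - both_high  # a fault-free NAND outputs 1 otherwise
    return (1.0 - eps) * correct_output_high + eps * both_high

def restore(z, eps):
    """Restorative unit: two NAND stages, each fed the bundle and a permuted copy of it."""
    once = nand_stage(z, z, eps)
    return nand_stage(once, once, eps)

# Both inputs are nominally 1, but 5% of the wires in each bundle are wrong.
z = nand_stage(0.95, 0.95, eps=0.01)   # ideal output is 0, so z is the error fraction
restored = restore(z, eps=0.01)        # error fraction after the restorative unit
print(f"after executive stage: {z:.4f}, after restoration: {restored:.4f}")
```

With these assumed numbers, the restorative unit moves the bundle closer to its nominal value, which is the sense in which small, independent gate failures can still yield a correct result with high probability.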

Traditional Methods Applied: NAND Multiplexing
[Figure]
Shukla, et al, Evaluating the Reliability of Defect-Tolerant Architectures for Nanotechnology, Proceedings of the 17th International Conference on VLSI Design, 2004.

Fault Tolerance: Modern Solutions
- Pair and Spare
  - 2 pairs of circuits
  - Choose the pair that agrees
- Triple Modular Redundancy
  - 3 circuits take a majority vote

Fault Tolerance: Fault Protection
ACID
- Atomicity: either all of the tasks of a transaction are performed or none of them is
- Consistency: refers to being in a legal state when the transaction begins and when it ends
- Isolation: refers to the ability of the application to make operations in a transaction appear isolated from all other operations
- Durability: refers to the guarantee that once the user has been notified of success, the transaction will persist and not be undone

Fault Tolerance: Safe Failures
- Fail-Safe
  - Should a function fail, it will not cause harm to other areas
- Graceful Degradation
  - Operating quality degrades in proportion to the severity of the failure

Defect Tolerance: Failure
- Detecting failures in transistors becomes more complex as size decreases
- Rather than detect and replace failures, accept and overcome them

Defect Tolerance: Accounting for Failure
- Architecture that does not require a large number of working cells
  - Find other ways to reach cells
  - Find ways to avoid failed cells
  - Find logically equivalent circuits
Will Knight, Y-shaped nanotubes are ready-made transistors, http://www.newscientist.com/article.ns?id=dn7847, 15 August 2005.

Defect Tolerance: DNA Self-Assembly
- Control over nanoscale devices is exceedingly difficult
- Exercising more control reduces the speed of self-assembly
- Exercising less control reduces the possible size of self-assembly
- Which methods of control allow the greatest speed and size?
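Returning to the Triple Modular Redundancy scheme from the Modern Solutions slide, a majority voter can be sketched in a few lines. This is an illustrative sketch only: the function names are hypothetical, the voter itself is assumed to be perfect, and module failures are assumed to be independent.

```python
from collections import Counter

def majority_vote(outputs):
    """Return the value produced by a strict majority of the redundant modules."""
    value, count = Counter(outputs).most_common(1)[0]
    if 2 * count <= len(outputs):
        raise ValueError("no majority among module outputs")
    return value

def tmr(module_a, module_b, module_c, x):
    """Run three copies of a circuit on the same input and vote on the result."""
    return majority_vote([module_a(x), module_b(x), module_c(x)])

def tmr_reliability(r):
    """Probability that at least two of three modules are correct.

    r is the reliability of one module; the sum of "all three correct" (r**3)
    and "exactly two correct" (3 * r**2 * (1 - r)) simplifies to 3r^2 - 2r^3.
    """
    return 3 * r**2 - 2 * r**3

def parity(x):
    """A healthy example module: parity of the low bit."""
    return x % 2

def stuck_at_zero(x):
    """A faulty example module whose output is stuck at 0."""
    return 0

print(tmr(parity, parity, stuck_at_zero, 3))  # the two healthy modules outvote the faulty one
print(tmr_reliability(0.9))
```

Under these assumptions, the formula shows that voting helps only when each module is already more reliable than a coin flip: at r = 0.5 the voted result is no better than a single module.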

Defect Tolerance: Controlled Parameters
- Placement
  - All nodes are set up in a grid format
- Orientation
  - All nodes are aligned in the same direction
- Interconnect
  - All interconnects are straight and at right angles to the node
Jaidev P. Patwardhan, Chris Dwyer, and Alvin R. Lebeck, Self-Assembled Networks: Control vs. Complexity, Duke University

Defect Tolerance: Controlled Parameters
[Figure]
Patwardhan, et al, A Defect Tolerant Self-organizing Nanoscale SIMD Architecture, Twelfth International Conference on Architectural Support for Programming Languages and Operating Systems

Defect Tolerance: Network Organization
[Figure]
Patwardhan, et al, A Defect Tolerant Self-organizing Nanoscale SIMD Architecture, Twelfth International Conference on Architectural Support for Programming Languages and Operating Systems

Defect Tolerance: Results
- Shows the percent of nodes reachable for each combination of control
- With infinite backoff, there can only be one receiver and one broadcaster
- Infinite backoff is not shown if fewer than 10% of nodes are reachable
- Device reliability ranges from 99.99% to 100%
Patwardhan, et al, A Defect Tolerant Self-organizing Nanoscale SIMD Architecture, Twelfth International Conference on Architectural Support for Programming Languages and Operating Systems

Defect Tolerance: Reachable Nodes
- Control of orientation and placement (N6) allows many more reachable nodes at lower device reliability
- Control of interconnects and one other parameter (N3, N5) leads to fewer reachable nodes
Patwardhan, et al, A Defect Tolerant Self-organizing Nanoscale SIMD Architecture, Twelfth International Conference on Architectural Support for Programming Languages and Operating Systems

Defect Tolerance: Methods of Control
- Orientation and placement are controlled through DNA placement
  - Control of one implies control of the other
  - Better placement of DNA allows more control of both parameters
- Lack of control of interconnect matters much less than the other parameters
  - More productive to focus on device reliability
Gaia Vince, Nano-transistor self-assembles using biology, http://www.newscientist.com/article.ns?id=dn4406, 20 November 2003.

Motivation Revisited
“With the continuing advances in the miniaturization of devices, we are already at the deep submicron scale of device manufacture. However, nanotechnology is emerging as the technology of the not too distant future. In the nano era, device sizes will be in the range of several nanometres, leading to a high degree of failures, due to manufacturing defects, transient faults resulting from reduced noise tolerance at low voltage and current levels, and faults due to ageing because of molecular and other kinds of techniques for creating nano-devices. Although nano-scale manufacturing will allow us to pack more devices on a chip, we have to live with the possibilities of defects in the nano-substrate. As a result, ‘defect-tolerant architecture’ is being posed as a way to mitigate the challenge of the inherent unreliability at the nano-scale. Defect-tolerance is built into the architecture in the form of redundancy of devices and functional units.”
Shukla, et al, Evaluating the Reliability of Defect-Tolerant Architectures for Nanotechnology, Proceedings of the 17th International Conference on VLSI Design, 2004.
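The "percent of nodes reachable" metric from the Results and Reachable Nodes slides can be illustrated with a toy simulation. This sketch is not the model of Patwardhan et al: it assumes a fully controlled square grid (nodes on a grid, straight interconnects to the four neighbors), treats reliability per node rather than per device, and assumes the anchor node at (0, 0) always works. The function name and parameters are hypothetical.

```python
import random
from collections import deque

def percent_reachable(side, reliability, seed=0):
    """Percent of grid positions reachable from the anchor node through working nodes.

    side        : the grid is side x side nodes
    reliability : assumed probability that any single node works
    seed        : seed for the pseudo-random defect pattern
    """
    rng = random.Random(seed)
    working = {(r, c) for r in range(side) for c in range(side)
               if rng.random() < reliability}
    working.add((0, 0))  # assumption: the anchor node is always functional

    # Breadth-first search from the anchor, stepping only onto working neighbors.
    seen = {(0, 0)}
    queue = deque([(0, 0)])
    while queue:
        r, c = queue.popleft()
        for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if nxt in working and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return 100.0 * len(seen) / (side * side)

for rel in (1.0, 0.9, 0.7, 0.5):
    print(f"node reliability {rel:.2f}: {percent_reachable(30, rel):.1f}% reachable")
```

A breadth-first search is used because it visits exactly the set of nodes connected to the anchor through working nodes, which is the quantity the slides plot against device reliability.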

Conclusions
- Evolutionary Advances
  - Traditional semiconductor technologies are reaching their limits
- Revolutionary Advances
  - Mandate some form of effective defect and fault tolerance to behave within desired error limits
  - Currently researched methods are primarily probabilistic, with varying levels of effectiveness depending on the model
  - Much more research is needed in this arena, especially using fabricated devices instead of solely modeled ones

References
1. Debayan Bhaduri, Sandeep Shukla, NANOLAB: A Tool for Evaluating Reliability of Defect-Tolerant Nano Architectures
2. http://www.rpi.edu/~schubert/Educational%20resources/Educational%20resources.htm
3. http://www.cpu-world.com/CPUs/CPU.html
4. www.wikipedia.org
5. INTERNATIONAL TECHNOLOGY ROADMAP FOR SEMICONDUCTORS, http://www.sia-online.org
6. Ellenbogen, J.C., Love, J.C., Architectures for molecular electronic computers, Proceedings of the IEEE, Vol. 88, No. 3, March 2000.
7. Shukla, Goldstein, et al, Nano, Quantum, and Molecular Computing: Are We Ready for the Validation and Test Challenges. In Eighth IEEE International High-Level Design Validation and Test Workshop, pages 3-7, November, 2003.
8. Heath, J. R., et al, A Defect-Tolerant Computer Architecture: Opportunities for Nanotechnology, Science, Vol. 280, June 1998
9. Will Knight, Y-shaped nanotubes are ready-made transistors, http://www.newscientist.com/article.ns?id=dn7847, 15 August 2005.
10. Jaidev P. Patwardhan, Chris Dwyer, and Alvin R. Lebeck, Self-Assembled Networks: Control vs. Complexity, Duke University
11. Patwardhan, et al, A Defect Tolerant Self-organizing Nanoscale SIMD Architecture, Twelfth International Conference on Architectural Support for Programming Languages and Operating Systems
12. Gaia Vince, Nano-transistor self-assembles using biology, http://www.newscientist.com/article.ns?id=dn4406, 20 November 2003.
13. Shukla, et al, Evaluating the Reliability of Defect-Tolerant Architectures for Nanotechnology, Proceedings of the 17th International Conference on VLSI Design, 2004.

Thank You

Questions?