Fabrizio Lombardi
ITC Endowed Chair Professor, Dept. of ECE, Northeastern University, Boston

• CMOS: currently at 28/22 nm, soon to move further down in scaling (ITRS)
• New commercial markets: GPU, tablet, massive external storage (mostly portable)
• Emerging paradigms: multi-value operation, non-volatile RAM, processing-in-memory
• Challenges: new designs abound, but there is not yet a clear winner; CMOS is not going away any time soon

Figure: More Moore, More-than-Moore and Beyond CMOS versus year.

Elements Beyond CMOS
Extending MOSFETs to the End of the Roadmap: CNTFETs, Graphene nanoribbons, III-V Channel MOSFETs, Ge Channel MOSFETs, Nanowire FETs, Tunnel FET, Non-conventional Geometry Devices
Unconventional FETs, Charge-based Extended CMOS Devices: Spin FET & Spin MOSFET, Negative Cg MOSFET, NEMS switch, Excitonic FET, Mott FET, Tunnel FET, I-MOS, SET
Non-FET, Non-Charge-based 'Beyond CMOS' Devices: Spin Transfer Torque Logic, Moving domain wall devices, Pseudo-spintronic Devices, Nanomagnetic (M:QCA), Negative Cg MOSFET, All Spin Logic, Molecular Switch, Atomic Switch, BiSFET

Resistive Memories:
• Spin Transfer Torque MRAM
• Nanoelectromechanical
• Nanowire PCM
• Macromolecular (Polymer)
• Electronic Effects Memory
  − Charge trapping
  − Metal-Insulator Transition
  − FE barrier effects
• Redox Memory
  − Nanoionic memory
  − Electrochemical memory
  − Fuse/Antifuse memory
• Molecular Memory
Capacitive Memory:
• FeFET Memory

NVM cost/gigabyte: ~$1 (Intel)

• PVT variations
• Stability (SNM) concerns
• Power dissipation
• Charge diffusion and collection in the layout
• Basic binary operation (supply voltage requirements)
• Inability to meet large storage needs
• Likely soft errors

• Avoid large capital investment; selectively use new/compatible technologies
• Preferably, hybrid circuits
• Multi-level (multi-bit) operation
• Processing in memory (PIM)
• Problematic endurance

Move to higher radix bases than binary: ternary, quaternary or eventually octal.
Bases:
1. Ternary: used for CAM processing, mostly in routers but also in GPUs (cache)
2. Quaternary/Octal: increase capacity for massive storage (to replace flash memories)
Not efficiently done in CMOS (additional voltage rails and a high area/power penalty): use radically new technologies.

ITRS: memory has always met its stated objectives in the past. Late 2014 is a crucial initial milestone with respect to performance (power dissipation and density) and design fundamentals.

Discuss new (emerging) directions:
• Unorthodox technologies (briefly)
• Material-based technologies
• Focus on non-volatile memories

Innovative operational paradigms for memory using new physical storage phenomena:
1. QCA (memory in motion): the challenges are room-temperature operation and CMOS-compatible manufacturing
2. SET (controlled transfer of electrons for memory operation)
Long-term opportunities abound, but so do grand challenges; these paradigms are currently applicable mostly to academic investigation.

Exploit new materials and fabrication methods (CMOS compatible) to meet the challenges.
Additional criteria:
1. Hybrid operation is usually sought
2. Robustness to PVT variations/endurance
3. New design realms: multi-level (resistance) operation for increased capacity; ambipolar operation for control
APPLICATION: non-volatile storage

2011 Memory Application (ITRS)
Emerging Research Memory Technology            Stand-Alone / Embedded
Ferroelectric-gate FET                          X
Nanoelectromechanical RAM                       X
Spin Transfer Torque MRAM                       X X
Nanoionic or Redox Memory                       X X
Nanowire Phase Change Memory (PCM)              X X
Electronic Effects (Charge trapping, Mott)      X
Macromolecular memory                           X X
Molecular memory                                X X

Also known as Resistive RAMs: add (programmable) resistive element(s) to active device(s) (usually 1T1R for the simplest non-volatile cell design).
Issues:
1. Resistance range (Rmax − Rmin)
2. Power dissipation and leakage
3. Programmability and universal memory feature
4. Error/defect models (soft and drift)
5. Endurance (related to read/write operations)
6. Testing

Feature                 NOR      NAND         PCM     MRAM    FRAM
Capacity                256MB    16GB         32MB    2MB     1MB
Random read             Yes      No           Yes     Yes     Yes
Random write            No       No           Yes     Yes     Yes
Endurance (cycles)      10^5     10^5-10^3    10^6    10^15   10^14
Management              High     High         Mod     No      No
Error correction        No       1-72 bits    *       No      No
Retention (years)       10       1-10         15      20      5-20
Read access (ns)        60       60           10      35      60
Program access (µs)     200      200          20      35      60
Erase access (ms)       1-100    1-100        50      35      60
Power                   Mid      Mid          Mid     Low     Low
Cell size (F²)          10       4            4       6-20    4-15
Universal memory        No       No           Yes     Yes     Yes

Flash memory is seen as a mature technology, unable to capitalize on scaling and unable to meet high-density storage needs for mobile applications.
• Low lifetime due to the high-voltage-based process
• Apple and Anobit (2012)
• Additional players: Samsung, Micron, IBM

• Does not require many transistors or other access devices
• Removes silicon requirements:
  − Improve density
  − Reduce power consumption
  − Integrate with processors
  − Reduce total area
• Crossbar Inc. (August 2013): 3D stacking, 1-TB on-chip prototype (using RRAM)
Feature size = lithography node F; pitch P = 2F for crossbars; cell size = 4F² (a 2F × 2F footprint per cross point)

Figure: The Memristor, predicted fourth fundamental two-terminal circuit element (Leon Chua, U.C. Berkeley). With dφ/dt = v and dq/dt = i, the four elements relate v, i, q and φ:
• Resistor (Ohm, 1827): dv = R di
• Capacitor (von Kleist, 1745): dq = C dv
• Inductor (Faraday, 1831): dφ = L di
• Memristor (Chua, 1971): dφ = M dq

• Resistance depends on the direction of the voltage or current across it (dφ = M dq)
• Titanium dioxide film sandwiched between two platinum electrodes; doped operation (HP Labs), 5-10 nm in length
• Resistance range: between Ron and Roff
  − Roff: highest resistance
  − Ron: lowest resistance
• Excellent linearity in switching
• Resistive range is good
• I-V characteristics are also very good
• Nanometric dimensions (10 nm in 2011, 5 nm in 2013): very high density potential at extremely low power consumption
• Manufacturing compatibility with CMOS
• Problems: endurance and leakage (on read)

Ambipolar control of a single memristor:
• No standby power, no direct path from VDD to GND, only dynamic power dissipation
• Fewer transistors than a 6T SRAM cell
• The memristor changes its value when reading the Roff state, so a refresh operation is required
• Write time is significantly higher than read time

Node     VDD (V)   Write time (ns)   Read time (ns)
32 nm    0.9       160               0.8
32 nm    1.0       150               0.75
45 nm    0.9       195               0.975
45 nm    1.0       180               0.9
65 nm    0.9       235               1.175
65 nm    1.0       200               1

Figure: Ron and Roff resistance (Ω, 10^2 to 10^4) versus number of switching cycles (10^0 to 10^6) for a Ti (1 nm)/Pt (100 nm)/TiOx (29 nm)/Ti4O7 (100 nm) device.

Phase change memory uses the phases of GST (a chalcogenide alloy): a high-current-based process switches between two phases, amorphous (high R) and crystalline (low R).
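Returning to the memristor: its behaviour as summarized above (resistance bounded by Ron and Roff, set by the charge that has flowed, dependent on bias polarity, and disturbed slightly even by a read) can be illustrated with the linear ion-drift model often used to describe the HP TiO2 device. This is a minimal sketch under stated assumptions: the model choice, the parameter values `R_ON`, `R_OFF`, `D`, `MU_V` and the function names are hypothetical illustrations, not data or methods from the slides.

```python
# Minimal sketch of a linear ion-drift memristor model (an assumed model; the
# parameter values below are hypothetical, not measured device data).
R_ON = 100.0    # lowest resistance Ron, ohms (assumed)
R_OFF = 16e3    # highest resistance Roff, ohms (assumed)
D = 10e-9       # TiO2 film length, m (within the 5-10 nm range on the slides)
MU_V = 1e-14    # dopant mobility, m^2/(V*s) (assumed)


def memristance(w: float) -> float:
    """Doped region of width w (at Ron) in series with the undoped rest (at Roff)."""
    x = w / D
    return R_ON * x + R_OFF * (1.0 - x)


def apply_voltage(w: float, v: float, duration: float, steps: int = 1000) -> float:
    """Forward-Euler integration of dw/dt = MU_V * R_ON / D * i(t) at constant bias v."""
    dt = duration / steps
    for _ in range(steps):
        i = v / memristance(w)             # current through the device
        w += MU_V * R_ON / D * i * dt      # w tracks the charge q that has flowed
        w = min(max(w, 0.0), D)            # clamp so that Ron <= M <= Roff
    return w


if __name__ == "__main__":
    w0 = 0.1 * D
    w_set = apply_voltage(w0, +1.0, 0.5)       # positive bias lowers the resistance
    w_back = apply_voltage(w_set, -1.0, 0.5)   # reversed polarity raises it again
    w_read = apply_voltage(w0, +0.1, 1e-3)     # a short, small read bias still moves the state
    for label, w in [("initial", w0), ("after +1 V", w_set),
                     ("after -1 V", w_back), ("after read", w_read)]:
        print(f"{label:>11}: M = {memristance(w):8.1f} ohm")
```

Because w moves in proportion to the charge, M is a function of q, which is the dφ = M dq relation; the small shift after a read is the kind of disturbance that motivates the refresh operation mentioned on the slides.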
• No erase-write cycle as in NAND flash (at most 100,000 cycles for enterprise products)
• Ron, programming (write) region: the intersection of the Ron curve with the voltage axis is Vh (holding voltage)
• Roff, read region: Roff can be changed by a current or voltage pulse; Roff = Ron·exp(toff/τ), where τ = effective recombination time (constant) and toff = non-programming time
• Vx is the intersection point of the Ron curve (I = (V − Vh)/Ron) and the Rset line (I = V/Rset): Vx = Vh·Rset/(Rset − Ron)
• Typical values: Rset = 7 kΩ, Rreset = 200 kΩ, Ron = 1 kΩ, Vh = 0.45 V, Rset < Roff < Rreset, τ = 5 ns; these give Vx = 0.45 V × 7/6 ≈ 0.53 V

• Mobile devices (Samsung)
• PCM is likely to be a depository (for less frequently accessed data) next to DRAM in processor design (IBM)
• Networking/communication systems: CAM/TCAM designs
• Massive storage for data acquisition systems

• ISSCC11: Samsung (1 Gbit, 58-nm manufacturing process, low-power double-data-rate nonvolatile memory interface)
• ISSCC12: Samsung (8 Gbit, 20-nm device)
• IEDM11: Macronix/IBM (128 Mbit, 39-nm device with 30-µA reset current and 10^9 cycling endurance)
• July 2012: Micron/Numonyx (45-nm PCM for mobile devices in 1-Gb and 512-Mb multichip packages); commercially available

• Low voltage and moderate current as operational characteristics
• Multiple-bit operation (at least 2 bits): higher resistance range (MΩ) than other RRAMs
• Read time: 12 ns; write time: 85 ns (at 45 nm)
• Soft errors are highly unlikely to occur for GST
• Good endurance (IBM: 1 million cycles) and density

• Use a 1T1P core for both CAM and TCAM
• Functionality is in the support circuitry
• Voltage-based sensing for the comparison outcome in search
• Use of a circuit with ambipolar properties for comparison and control

IBM (1/2 PCMs per core), current-based operation:
Stored        Search           IML (A)
0 (200 kΩ)    0 (VSL = 0)      -1.38×10^-9
0 (200 kΩ)    1 (VSL = 0.4)    -1.97×10^-6
1 (7 kΩ)      0 (VSL = 0)      -1.38×10^-9
1 (7 kΩ)      1 (VSL = 0.4)    -4.15×10^-5

New cell (1 PCM per core), voltage-based operation:
Circuit          Write time (ns)   Search time (ns)   Transistors/core   PCMs/core   PDP of search (fJ)
CAM [20]         199.34            1.326              1                  1           46.6886
CAM proposed     199.34            1.092              1                  1           36.4296
TCAM [20]        209.53            1.346              2                  2           48.41
TCAM proposed    199.34            2.447              1                  1           43.4518

Practical problem: drift of resistance and threshold voltage (when the cell is not read or programmed).
• Related to the crystalline fraction (Cx) in GST: Rpcm = (1 − Cx)·Ra + Cx·Rc (Ra >> Rc), with Ra = Rreset and Rc = Rset
• Level drift is more pronounced for high-resistance states and is nonlinear with respect to time
• Problematic for MVL storage (i.e., more than one bit per cell)
• The order of resistivity of the states remains the same in the short term, so overlap must be avoided in the long term
• Use advanced modulation coding techniques to address short-term drift (analogous to NAND flash, where electrons leak through the thin walls of cells and create data read errors)

• Apply a voltage pulse based on the deviation from the desired level and measure the resistance; if the desired resistance is not achieved, apply another voltage pulse and measure again, until the exact level is reached
• Only suitable for binary cell storage
• It may reduce endurance (multiple writes)

Assume cell independence in drift errors (?). Data are encoded not in the programmed state but in the relative order of the states in a small group of cells.
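The iterative write described above (pulse according to the deviation, measure, repeat until the desired level is reached) can be sketched in Python on top of the crystalline-fraction model Rpcm = (1 − Cx)·Ra + Cx·Rc, with Ra = Rreset = 200 kΩ and Rc = Rset = 7 kΩ taken from the typical values. The cell's response to a pulse is an assumption of this sketch: a hypothetical `PcmCell` achieves only a fixed fraction (`efficiency`) of each requested change in Cx, so several pulses are needed. The class, function names and tolerance are illustrative, not the author's circuit.

```python
R_A = 200e3   # amorphous resistance Ra, taken as Rreset (typical value on the slides)
R_C = 7e3     # crystalline resistance Rc, taken as Rset (typical value on the slides)


def r_pcm(cx: float) -> float:
    """Cell resistance for crystalline fraction cx: (1 - Cx)*Ra + Cx*Rc."""
    return (1.0 - cx) * R_A + cx * R_C


def cx_from_r(r: float) -> float:
    """Invert the mixing rule to estimate Cx from a measured resistance."""
    return (R_A - r) / (R_A - R_C)


class PcmCell:
    """Hypothetical cell: each pulse achieves only a fraction of the requested Cx change."""

    def __init__(self, cx: float = 0.0, efficiency: float = 0.6):
        self.cx = cx
        self.efficiency = efficiency  # assumed; unknown to the write controller

    def pulse(self, delta_cx: float) -> None:
        self.cx = min(1.0, max(0.0, self.cx + self.efficiency * delta_cx))

    def read(self) -> float:
        return r_pcm(self.cx)


def program_and_verify(cell: PcmCell, target_r: float, rel_tol: float = 0.01,
                       max_pulses: int = 20) -> int:
    """Pulse and re-read until |R - target| <= rel_tol * target; return the pulse count."""
    target_cx = cx_from_r(target_r)
    pulses = 0
    while abs(cell.read() - target_r) > rel_tol * target_r:
        if pulses == max_pulses:
            raise RuntimeError("target level not reached")
        # Pulse strength is based on the measured deviation from the desired level.
        cell.pulse(target_cx - cx_from_r(cell.read()))
        pulses += 1
    return pulses


if __name__ == "__main__":
    cell = PcmCell()                      # starts fully amorphous (Cx = 0, R = Rreset)
    n = program_and_verify(cell, 50e3)    # an intermediate (multi-level) target
    print(f"{n} pulses, final R = {cell.read():.0f} ohm")
```

Each extra loop iteration is another write pulse on the same cell, which is the endurance cost noted on the slides.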
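Encoding data in the relative order of the levels within a small group of cells is commonly known as rank modulation. The Python sketch below assumes a group of n = 4 cells, hypothetical resistance levels, and a Lehmer-code (factorial number system) mapping between integers and permutations; these choices and the function names are illustrative assumptions rather than the scheme evaluated on the slides, and the sketch's rate, log2(n!)/n bits per cell, depends on the group size chosen. It shows that an order-preserving drift, even one that is stronger for high-resistance states, leaves the decoded value unchanged, while two levels that cross produce an error.

```python
import math

N = 4                               # cells per group (assumed)
LEVELS = [7e3, 20e3, 60e3, 200e3]   # hypothetical programmed resistances, one per rank


def int_to_perm(value: int, n: int = N) -> list[int]:
    """Map 0 <= value < n! to a permutation of range(n) (Lehmer code)."""
    if not 0 <= value < math.factorial(n):
        raise ValueError("value out of range")
    remaining = list(range(n))
    perm = []
    for i in range(n - 1, -1, -1):
        idx, value = divmod(value, math.factorial(i))
        perm.append(remaining.pop(idx))
    return perm


def perm_to_int(perm: list[int]) -> int:
    """Inverse of int_to_perm."""
    remaining = sorted(perm)
    value = 0
    for pos, p in enumerate(perm):
        idx = remaining.index(p)
        value += idx * math.factorial(len(perm) - 1 - pos)
        remaining.pop(idx)
    return value


def write_group(value: int) -> list[float]:
    """perm[k] is the cell that receives the k-th highest resistance level."""
    cells = [0.0] * N
    for rank, cell in enumerate(int_to_perm(value)):
        cells[cell] = LEVELS[N - 1 - rank]
    return cells


def read_group(cells: list[float]) -> int:
    """Decode from the relative order only: sort cells by resistance, highest first."""
    order = sorted(range(len(cells)), key=lambda c: cells[c], reverse=True)
    return perm_to_int(order)


if __name__ == "__main__":
    cells = write_group(17)
    # Drift that grows with resistance but keeps the order of the levels intact.
    drifted = [r * (1.0 + 0.5 * r / LEVELS[-1]) for r in cells]
    # Two adjacent levels crossing each other.
    crossed = list(cells)
    i, j = cells.index(LEVELS[1]), cells.index(LEVELS[2])
    crossed[i], crossed[j] = crossed[j], crossed[i]
    print(read_group(cells), read_group(drifted), read_group(crossed))
    print(f"{math.log2(math.factorial(N)) / N:.3f} bits per cell in this sketch")
```

Only comparisons between cells of the same group are needed, so no absolute reference levels have to track the drift; the price is capacity, since n cells with n distinct levels store log2(n!) bits instead of n·log2(n).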
With this relative-order encoding, an error is only seen when the resistivity levels of states cross each other; software-based error correction methodologies are then applied (slow). Reduction in capacity: from 2 bits/cell to 1.57 bits/cell.

• Octal base for MVL (noise, crosstalk) and/or single vs. multiple storage elements
• MVL implications on error detection/correction
• Dynamic models of RRAM operation in HSPICE (as related to drift evaluation and mitigation)
• At system level, improve endurance by reducing the maximum number of writes to a cell
• System-level application modeling (for example, "normally-off instantly-on" operation combining SRAM with PCM)

Emergence of new paradigms: resistive RAMs, non-volatile operation, multi-bit storage.
Nearly all future memories will utilize new phenomena, moving away from the 6T configuration.
TECHNOLOGY TIME SCALE: hybrid implementations will be dominant in the next 5-10 years; 4Q-2014/1Q-2015 is a crucial time frame for PCM.