2011 11th Non-Volatile Memory Technology Symposium @ Shanghai, China, Nov. 9, 20112 MTJ-Based Nonvolatile Logic-in-Memory Architecture and Its Application Takahiro Hanyu1,3, S. Matsunaga1, D. Suzuki1, M. Natsui1,3, S. Ikeda1,2, T. Endoh1,4, N. Kasai1, and H. Ohno1,2 1Center for Spintronics Integrated Systems, Tohoku University, JAPAN for Nanoelectronics and Spintronics, RIEC, Tohoku University, JAPAN 3Laboratory for Brainware Systems, RIEC, Tohoku University, JAPAN 4Center for Interdisciplinary Research, Tohoku University, JAPAN 2Laboratory http://www.csis.tohoku.ac.jp/ Acknowledgement: This work was supported by the project “Research and Development of Ultra-Low Power Spintronics-Based VLSIs” under the FIRST program of JSPS (Leader: Prof. H. Ohno). Outline • Nonvolatile Logic-in-Memory Architecture Overview • NV GP-Logic: Nonvolatile-FPGA • NV SP-Logic: Nonvolatile-TCAM • Conclusions & Future Prospects 2 Background: Increasing delay & power Leakage current Logic and Memory modules are separated Many interconnections between modules On-chip memory modules are volatile. Wire delay dominates chip performance Global wires requires large drivers. Power supply must be continuously applied in memory modules. Delay: Long Power: Large Static power: Large 3 Nonvolatile logic-in-memory architecture Logic-in-Memory Architecture (proposed in 1969): Storage elements are distributed over a logic-circuit plane. • Magnetic Tunnel Junction (MTJ) device •No volatility MTJ •Unlimited endurance layer •Fast writability •Scalability CMOS •CMOS compatibility layer •3-D stack capability ●Storage is nonvolatile: (Leakage current is cut off) ●MTJ devices are put on the CMOS layer ●Storage/logic are merged: (global-wire count is reduced) Static power is cut off. Chip area is reduced. Wire delay is reduced. Dynamic power is reduced. 4 Implementation of MTJ Device MTJ Device Fabricated CMOS back-end process UE MTJ Layer LE TH M3 M3 TH M2 TH TH M1 M1 C Gate C Substrate MTJ Metal Layers TH M2 MTJ M1 Gate C CMOS Layer 300nm D. Suzuki, et al., VLSI Circuit Symp. 2009 Diffusion MTJ device stacked over MOS Plane Cross-sectional SEM image . area cost using MTJ device is small. The 5 Use of Charge and Spin Storage “0” state Logic Spin is used as an information carrier. Data is still kept when VDD is OFF (Nonvolatile) Spintronics High resistance “1” state Low resistance Logic operation is possible Logic Charge is used as an information carrier. Storage Silicon CMOS Use of both charge and spin “0” state Very high resistance “1” state Very low resistance Charge is gone when VDD is OFF (Volatile) Logic operation is done 6 Realize no volatility and rich logic functionality Power-Gating Suitability Power switch Power switch External Nonvolatile storage Leakage current Volatile storage Power switch Power VDD Active Standby Escape to NVM Active Reload from NVM Logic VDD Power switch (PMOS) Power Time Active Standby Active Nonvolatile storage (MTJ device) Logic Time NV logic-in-memory architecture Power gating is performed without data backup/reload. 7 Nonvolatile Processor Architecture Flash GP-Logic: General-purpose logic SP-Logic: Special-purpose logic DRAM Spin RAM Spin RAM NV SRAM NV NV FF GP-Logic FF SP-Logic Si 1st -step Nonvolatile Processor NVLIM NVLIM GP-Logic SP-Logic Si 2nd-step Nonvolatile Processor NV SRAM Nonvolatile Field-Programmable Gate Array (FPGA) Nonvolatile Ternary ContentAddressable Memory (TCAM) 8 Outline • Nonvolatile Logic-in-Memory Architecture Overview • NV GP-Logic: Nonvolatile-FPGA • NV SP-Logic: Nonvolatile-TCAM • Conclusions & Future Prospects 9 Nonvolatile Field-Programmable Gate Array (FPGA) NV LUT (Lookup Table) NV device -- Arbitrary logic functions are performed and programmed by FPGA -- Power dissipation and hardware overhead are two major issues. -- NV storage elements are distributed over the NV-FPGA (no external NVM). ☺ NVM NV FPGA Leakage current elimination and short latency are possible. How to design? Nonvolatile logic-inmemory architecture Not required! 10 Conventional nonvolatile FPGA CMOS logic circuit requires high-voltage input swing. MTJ MTJ MTJ MTJ SA SA SA SA Combinational logic (CMOS) Output (SA: Sense Amplifier) Low voltage High Voltage How do we perform logic operation by using low swing signal from MTJ device directly? 11 MOS/MTJ-hybrid circuitry (Proposed) Current-mode logic (CML) Logic operation is performed even low swing voltage by using the small difference of the current value. MTJ MTJ MTJ MTJ Combinational logic (Current-Mode) Low voltage SA Output High voltage Device count is reduced to 28% with less performance degradation. 12 Operation example (XOR) Z=0 IF > IREF IF RAP ‘1’ ‘0’ ‘1’ ‘0’ Z=1 Sense Amplifier ‘1’ RP ‘0’ RP ‘1’ RAP ‘0’ IREF Truth table ‘0’ A B Z 0 0 1 ‘1’ 0 1 0 1 0 0 1 1 1 RREF Logic operation in low swing voltage is performed by using a MOS/MTJ-hybrid network. 13 Test chip features Fabricated 2-input LUT Selection Transistor Tree 4 MTJ devices are stacked over MOS layer D. Suzuki, et al., VLSI Circuit Symposium, June 2009. Process 0.14m MTJ/MOS 1-Poly, 3-Metal Area 287m2 MTJ Size 50nm 150nm TMR Ratio Current Write Time Standby Current 100% 150A 10ns 0A 14 Measured waveforms (Basic operations) P E P E P E P E P: Pre-Charge E: Evaluate Input A Input B Output Z ‘1’ ‘0’ ‘0’ ‘0’ ‘1’ ‘1’ ‘1’ ‘0’ Output Z NAND NOR A 0.78V/div B Z 100s/div ‘0’ ‘1’ ‘1’ ‘0’ ‘1’ ‘0’ ‘0’ ‘1’ Z XOR XNOR 15 Immediate wakeup behavior Active Standby Active VDD= 0 A B CLK VDD A B 00 01 10 11 00 01 10 11 Z Z 0.78V/div 50s/div Immediate wakeup behavior has also measured successfully. 16 Comparison of performances Device Counts Standby Proposed 29 MOSs + 4 MTJs 702 m2 287m2 Delay *3) 140 ps 185 ps Power*3) 26.7 W 17.5 W Power 0 W 0 W Area *2) Active Nonvolatile SRAM *1) 102 MOSs + 8 MTJs *1) W. Zhao, et al., Physica Status SOLIDI a Application and Materials Science, 205, 6, 1373/1377, May 2008. *2) Estimation based on a 0.14m process *3) HSPICE simulation based on a 0.14m MOS/MTJ-hybrid process 17 Outline • Nonvolatile Logic-in-Memory Architecture Overview • NV GP-Logic: Nonvolatile-FPGA • NV SP-Logic: Nonvolatile-TCAM • Conclusions & Future Prospects 18 Ternary Content-Addressable Memory (TCAM) 0 1 0 0 1 ・・・ 1 0 Fully parallel masked equality search Search-line / Word-line driver 2 2 BL1 BL1’ BL2 BL2’ 1 0 2 1 1 2 0 0 2 2 0 0 0 1 Stored words 2 BLn BLn’ 1 0 1 0 1 2 ・・・ ・・・ 1 X 2 X X OUT1 OUT2 ・・・ Bit-line driver 2 ・・・ X X OUTn Output driver Input key 0 (Mismatch) 1 (Match) 0 (Mismatch) Fully parallel search and fully parallel comparison can be done. TCAM is a “functional memory.” TCAM is the powerful data-search engine useful for various applications such as database machine and virus checker in network router TCAM must be implemented more compactly with lower power dissipation. 19 NV-TCAM Cell Function Stored data B 0 1 X don’t care (b1,b2 ) Search input S Current comparison Match result ML 1 0 IZ < IZ’ 1 IZ > IZ’ (Mismatch) 0 IZ > IZ’ (Mismatch) 1 IZ < IZ’ (Match) 0 IZ < IZ’ (Match) 1 IZ < IZ’ (0,1) (1,0) (0,0) (Match) 0 0 1 1 1 (Match) 20 CMOS-based TCAM cell circuit 1-bit storage Equality-detection (ED) circuit 1-bit storage ML VDD Leakage current WL VSS BL1 Leakage current SL’ SL BL2 Transistor counts : 12 (ED;4T, 2-bit storage;8T) Input/output wires : 8 (BL;2, WL;1, VDD&VSS;2, SL;2, ML;1) Always supply the power : Many leakage current path How to realize compact & cut off the leakage current ? 21 MOS/MTJ-hybrid TCAM cell circuit S. Matsunaga, et al. Applied Physics Express (APEX), 2, 2, 023004, Feb. 2009. ML/BL 2-bit storage (MTJs) Logic (MTJs & MOSs) SL’/WL1 SL/WL1 •Merge storage into logic circuit : Compact (2T-2MTJ) •Share wires : 4 (ML/BL, SL/WL, No-VDD) •3-D stack structure : Great reduction of circuit area Compact & nonvolatile TCAM cell with MTJ devices 22 Power-Gating Scheme of Bit-Serial NV-TCAM S. Matsunaga, et al., JJAP 49 (2010) 04DM05. 1st-bit search 1 1 1 Search word 0 0 X SA ACC 0 1 0 SA 0 X 1 1 0 1 2nd-bit search 1 1 1 Search word Mismatch 0 0 X SA ACC ACC Mismatch 0 1 0 SA SA ACC Mismatch 0 X 1 X SA ACC Match 1 0 1 0 SA ACC Match 1 1 X 1 SA ACC Match X 0 X SA ACC X 1 0 SA X X 1 SA 3rd-bit search 1 1 1 Search word Mismatch 0 0 X SA ACC Mismatch ACC Mismatch 0 1 0 SA ACC Mismatch SA ACC Mismatch 0 X 1 SA ACC Mismatch X SA ACC Mismatch 1 0 X SA ACC Mismatch 1 0 SA ACC Match 1 1 0 SA ACC Mismatch 1 X 1 SA ACC Match 1 X 1 SA ACC Match Match X 0 X SA ACC Mismatch X 0 X SA ACC Mismatch ACC Match X 1 0 SA ACC Match X 1 0 SA ACC Mismatch ACC Match X X 1 SA ACC Match X X 1 SA ACC Match TCAM cell in standby mode (Static power is suppressed.) TCAM cell in active mode SA Sense amplifier in active mode SA ACC Accumulator in active mode Sense amplifier in standby mode (Static power is suppressed.) According to the word length of the TCAM, the effectiveness of the standby-power reduction is increased. 23 TCAM cell circuit test chip 3.0 m Chip features 9.8 m Output generator in MLSA TCAM cell Ref. cell Dynamic current comparator in MLSA Process 0.14m CMOS/MTJ 1-Poly, 3-Metal Total area 29.4 m2 TCAM cell size 3.15 m2 (2.1 m×1.5 m) a) Cell structure 2MOSs-2MTJs MTJ size 50 nm×200 nm TMR ratio 167 % Average write current 274 A (p = 10 s) b) Standby current 0A (Power off) CMOS-based TCAM cell with 12 transistors, whose cell size is 17.54 m2 under a 0.18 m CMOS process, has been reported.8) The size of the conventional TCAM cell can be estimated as 10.61 m2 under a 0.14 m CMOS process by scaling down. Thus, the size of the fabricated TCAM cell is reduced to 30 % compared to that of the conventional one. Moreover, minimum size of the proposed TCAM cell can be considered as 1/6 of the conventional one. b) More high-speed write operation is possible with increase of write current. For example, with the average current 24 of 327 A at 10 ns write. a) A Waveforms of equality-search operations P : Precharge phase E : Evaluate phase Bit-level equality-search is successfully demonstrated. 25 Waveforms of sleep/wake-up operations VDD Power-off Power-off Active P Active E Standby P E P Active E Standby P E CLK Stored data B=0 S S=0 OUTbefore=1 Stored data B=0 S=0 OUTafter=1 S=1 OUTbefore=0 S=1 OUTafter=0 OUT 780mV 10s Match Match Mismatch Mismatch Instant sleep/wake-up behavior is successfully demonstrated. 26 Outline • Nonvolatile Logic-in-Memory Architecture Overview • NV GP-Logic: Nonvolatile-FPGA • NV SP-Logic: Nonvolatile-TCAM • Conclusions & Future Prospects 27 Conclusions Propose a MOS/MTJ-hybrid circuit (nonvolatile logic-inmemory circuit using MTJ devices) style Two kinds of typical applications with logic-in-memory architecture; NV-LUT circuit and NV- TCAM Compact and no static power dissipation Confirm basic behavior with fabricated test chips under an MTJ/CMOS process. It could open an ultra-low-power logic-circuit paradigm Future Prospects and Issues: 1. Establish the fabrication line 2. Establish the CAD tools 3. Explore the appropriate application fields (Impact towards “Reliability Enhancement”) 28 29