MTJ-Based Nonvolatile Logic-in-Memory Architecture and Its

advertisement
2011 11th Non-Volatile Memory Technology Symposium @ Shanghai, China, Nov. 9, 20112
MTJ-Based Nonvolatile
Logic-in-Memory Architecture
and Its Application
Takahiro Hanyu1,3,
S. Matsunaga1, D. Suzuki1, M. Natsui1,3, S. Ikeda1,2, T. Endoh1,4, N. Kasai1,
and H. Ohno1,2
1Center
for Spintronics Integrated Systems, Tohoku University, JAPAN
for Nanoelectronics and Spintronics, RIEC, Tohoku University, JAPAN
3Laboratory for Brainware Systems, RIEC, Tohoku University, JAPAN
4Center for Interdisciplinary Research, Tohoku University, JAPAN
2Laboratory
http://www.csis.tohoku.ac.jp/
Acknowledgement:
This work was supported by the project “Research and
Development of Ultra-Low Power Spintronics-Based VLSIs”
under the FIRST program of JSPS (Leader: Prof. H. Ohno).
Outline
• Nonvolatile Logic-in-Memory
Architecture Overview
• NV GP-Logic: Nonvolatile-FPGA
• NV SP-Logic: Nonvolatile-TCAM
• Conclusions & Future Prospects
2
Background: Increasing delay & power
Leakage
current
Logic and Memory modules are separated
Many interconnections between modules
On-chip memory modules are volatile.
Wire delay dominates chip performance
Global wires requires large drivers.
Power supply must be continuously
applied in memory modules.
Delay: Long
Power: Large
Static power: Large
3
Nonvolatile logic-in-memory architecture
Logic-in-Memory Architecture (proposed in 1969):
Storage elements are distributed over a logic-circuit plane.
•
Magnetic Tunnel Junction
(MTJ) device
•No volatility
MTJ
•Unlimited endurance
layer
•Fast writability
•Scalability
CMOS
•CMOS compatibility
layer
•3-D stack capability
●Storage is nonvolatile:
(Leakage current is cut off)
●MTJ devices are put
on the CMOS layer
●Storage/logic are merged:
(global-wire count is reduced)
Static power is cut off.
Chip area is reduced.
Wire delay is reduced.
Dynamic power is reduced.
4
Implementation of MTJ Device
MTJ Device
Fabricated
CMOS back-end
process
UE
MTJ
Layer
LE
TH
M3
M3
TH
M2
TH
TH
M1
M1
C
Gate C
Substrate
MTJ
Metal
Layers
TH
M2
MTJ
M1
Gate
C
CMOS
Layer
300nm
D. Suzuki, et al., VLSI Circuit Symp. 2009
Diffusion
MTJ device stacked over MOS Plane
Cross-sectional SEM image
. area cost using MTJ device is small.
The
5
Use of Charge and Spin
Storage
“0” state
Logic
Spin is used as an
information carrier.
Data is still kept
when VDD is OFF
(Nonvolatile)
Spintronics
High resistance
“1” state
Low resistance
Logic operation
is possible
Logic
Charge is used as an
information carrier.
Storage
Silicon CMOS
Use of both charge and spin
“0” state
Very high
resistance
“1” state
Very low
resistance
Charge is gone
when VDD is OFF
(Volatile)
Logic operation
is done
6
Realize no volatility and rich logic functionality
Power-Gating Suitability
Power
switch
Power
switch
External
Nonvolatile
storage
Leakage
current
Volatile
storage
Power
switch
Power
VDD
Active
Standby
Escape
to NVM
Active
Reload
from NVM
Logic
VDD
Power switch
(PMOS)
Power
Time
Active
Standby
Active
Nonvolatile
storage
(MTJ device)
Logic
Time
NV logic-in-memory architecture
Power gating is performed without data backup/reload.
7
Nonvolatile Processor Architecture
Flash
GP-Logic: General-purpose logic
SP-Logic: Special-purpose logic
DRAM
Spin RAM
Spin RAM
NV
SRAM
NV
NV
FF GP-Logic FF SP-Logic
Si
1st -step Nonvolatile Processor
NVLIM
NVLIM
GP-Logic SP-Logic
Si
2nd-step Nonvolatile Processor
NV
SRAM
Nonvolatile Field-Programmable
Gate Array (FPGA)
Nonvolatile Ternary ContentAddressable Memory (TCAM)
8
Outline
• Nonvolatile Logic-in-Memory Architecture
Overview
• NV GP-Logic: Nonvolatile-FPGA
• NV SP-Logic: Nonvolatile-TCAM
• Conclusions & Future Prospects
9
Nonvolatile Field-Programmable Gate Array (FPGA)
NV LUT (Lookup Table)
NV device
-- Arbitrary logic functions are performed
and programmed by FPGA
-- Power dissipation and hardware overhead
are two major issues.
-- NV storage elements are distributed
over the NV-FPGA (no external NVM).
☺
NVM
NV
FPGA
Leakage current elimination
and short latency are possible.
 How to design?
Nonvolatile logic-inmemory architecture
Not required!
10
Conventional nonvolatile FPGA
 CMOS logic circuit requires
high-voltage input swing.
MTJ
MTJ
MTJ
MTJ
SA
SA
SA
SA
Combinational
logic
(CMOS)
Output
(SA: Sense Amplifier)
Low voltage
High Voltage
How do we perform logic operation by using
low swing signal from MTJ device directly?
11
MOS/MTJ-hybrid circuitry (Proposed)
Current-mode logic (CML)
 Logic operation is performed even low swing voltage by
using the small difference of the current value.
MTJ
MTJ
MTJ
MTJ
Combinational
logic
(Current-Mode)
Low voltage
SA
Output
High voltage
Device count is reduced to 28% with less
performance degradation.
12
Operation example (XOR)
Z=0
IF > IREF
IF
RAP
‘1’
‘0’
‘1’
‘0’
Z=1
Sense Amplifier
‘1’
RP
‘0’
RP
‘1’
RAP
‘0’
IREF
Truth table
‘0’
A
B
Z
0
0
1
‘1’
0
1
0
1
0
0
1
1
1
RREF
Logic operation in low swing voltage is performed
by using a MOS/MTJ-hybrid network.
13
Test chip features
Fabricated 2-input LUT
Selection
Transistor Tree
4 MTJ devices are
stacked over MOS layer
D. Suzuki, et al., VLSI Circuit Symposium, June 2009.
Process
0.14m
MTJ/MOS
1-Poly,
3-Metal
Area
287m2
MTJ Size
50nm
150nm
TMR Ratio
Current
Write
Time
Standby Current
100%
150A
10ns
0A
14
Measured waveforms (Basic operations)
P E P E P E P E
P: Pre-Charge
E: Evaluate
Input A
Input B
Output Z
‘1’
‘0’
‘0’
‘0’
‘1’
‘1’
‘1’
‘0’
Output Z
NAND
NOR
A
0.78V/div
B
Z
100s/div
‘0’
‘1’
‘1’
‘0’
‘1’
‘0’
‘0’
‘1’
Z
XOR
XNOR
15
Immediate wakeup behavior
Active
Standby
Active
VDD= 0
A B
CLK
VDD
A B
00 01 10 11
00 01 10 11
Z
Z
0.78V/div
50s/div
Immediate wakeup behavior
has also measured successfully.
16
Comparison of performances
Device Counts
Standby
Proposed
29 MOSs
+ 4 MTJs
702 m2
287m2
Delay *3)
140 ps
185 ps
Power*3)
26.7 W
17.5 W
Power
0 W
0 W
Area *2)
Active
Nonvolatile
SRAM *1)
102 MOSs
+ 8 MTJs
*1) W. Zhao, et al., Physica Status SOLIDI a Application and Materials Science, 205, 6,
1373/1377, May 2008.
*2) Estimation based on a 0.14m process
*3) HSPICE simulation based on a 0.14m MOS/MTJ-hybrid process
17
Outline
• Nonvolatile Logic-in-Memory Architecture
Overview
• NV GP-Logic: Nonvolatile-FPGA
• NV SP-Logic: Nonvolatile-TCAM
• Conclusions & Future Prospects
18
Ternary Content-Addressable Memory (TCAM)
0
1
0
0
1
・・・
1
0
Fully parallel
masked equality
search
Search-line / Word-line driver
2
2
BL1
BL1’
BL2
BL2’
1
0
2
1
1
2
0
0
2
2
0
0
0
1
Stored words
2
BLn
BLn’
1
0
1
0
1
2
・・・
・・・
1
X
2
X
X
OUT1
OUT2
・・・
Bit-line driver
2
・・・
X
X
OUTn
Output driver
Input key
0 (Mismatch)
1 (Match)
0 (Mismatch)
Fully parallel search and fully parallel comparison can be done.
TCAM is a “functional memory.”
TCAM is the powerful data-search engine
useful for various applications such as database machine and virus checker in
network router
TCAM must be implemented more compactly with lower power dissipation.
19
NV-TCAM Cell Function
Stored data
B
0
1
X
don’t
care
(b1,b2 )
Search
input
S
Current
comparison
Match
result
ML
1
0
IZ < IZ’
1
IZ > IZ’
(Mismatch)
0
IZ > IZ’
(Mismatch)
1
IZ < IZ’
(Match)
0
IZ < IZ’
(Match)
1
IZ < IZ’
(0,1)
(1,0)
(0,0)
(Match)
0
0
1
1
1
(Match)
20
CMOS-based TCAM cell circuit
1-bit storage
Equality-detection
(ED) circuit
1-bit storage
ML
VDD
Leakage
current
WL
VSS
BL1
Leakage
current
SL’
SL
BL2
Transistor counts : 12 (ED;4T, 2-bit storage;8T)
Input/output wires : 8 (BL;2, WL;1, VDD&VSS;2, SL;2, ML;1)
Always supply the power : Many leakage current path
How to realize compact & cut off the leakage current ?
21
MOS/MTJ-hybrid TCAM cell circuit
S. Matsunaga, et al. Applied Physics Express (APEX), 2, 2, 023004, Feb. 2009.
ML/BL
2-bit storage
(MTJs)
Logic
(MTJs & MOSs)
SL’/WL1
SL/WL1
•Merge storage into logic circuit : Compact (2T-2MTJ)
•Share wires : 4 (ML/BL, SL/WL, No-VDD)
•3-D stack structure : Great reduction of circuit area
Compact & nonvolatile TCAM cell with MTJ devices
22
Power-Gating Scheme of Bit-Serial NV-TCAM
S. Matsunaga, et al., JJAP 49 (2010) 04DM05.
1st-bit search
1
1
1
Search word
0
0
X
SA
ACC
0
1
0
SA
0
X
1
1
0
1
2nd-bit search
1
1
1
Search word
Mismatch
0
0
X
SA
ACC
ACC
Mismatch
0
1
0
SA
SA
ACC
Mismatch
0
X
1
X
SA
ACC
Match
1
0
1
0
SA
ACC
Match
1
1
X
1
SA
ACC
Match
X
0
X
SA
ACC
X
1
0
SA
X
X
1
SA
3rd-bit search
1
1
1
Search word
Mismatch
0
0
X
SA
ACC
Mismatch
ACC
Mismatch
0
1
0
SA
ACC
Mismatch
SA
ACC
Mismatch
0
X
1
SA
ACC
Mismatch
X
SA
ACC
Mismatch
1
0
X
SA
ACC
Mismatch
1
0
SA
ACC
Match
1
1
0
SA
ACC
Mismatch
1
X
1
SA
ACC
Match
1
X
1
SA
ACC
Match
Match
X
0
X
SA
ACC
Mismatch
X
0
X
SA
ACC
Mismatch
ACC
Match
X
1
0
SA
ACC
Match
X
1
0
SA
ACC
Mismatch
ACC
Match
X
X
1
SA
ACC
Match
X
X
1
SA
ACC
Match
TCAM cell in standby mode
(Static power is suppressed.)
TCAM cell in active mode
SA Sense amplifier in active mode
SA
ACC Accumulator in active mode
Sense amplifier in standby mode
(Static power is suppressed.)
According to the word length of the TCAM,
the effectiveness of the standby-power reduction is increased.
23
TCAM cell circuit test chip
3.0 m
Chip features
9.8 m
Output
generator
in
MLSA
TCAM
cell
Ref.
cell
Dynamic
current
comparator
in
MLSA
Process
0.14m CMOS/MTJ
1-Poly, 3-Metal
Total area
29.4 m2
TCAM cell size
3.15 m2
(2.1 m×1.5 m) a)
Cell structure
2MOSs-2MTJs
MTJ size
50 nm×200 nm
TMR ratio
167 %
Average
write current
274 A (p = 10 s) b)
Standby current
0A (Power off)
CMOS-based TCAM cell with 12 transistors, whose cell size is 17.54 m2 under a 0.18 m CMOS process,
has been reported.8) The size of the conventional TCAM cell can be estimated as 10.61 m2 under a 0.14 m
CMOS process by scaling down. Thus, the size of the fabricated TCAM cell is reduced to 30 % compared to that
of the conventional one. Moreover, minimum size of the proposed TCAM cell can be considered as 1/6 of the
conventional one.
b) More high-speed write operation is possible with increase of write current. For example, with the average current
24
of 327 A at 10 ns write.
a) A
Waveforms of equality-search operations
P : Precharge phase
E : Evaluate phase
Bit-level equality-search is successfully demonstrated.
25
Waveforms of sleep/wake-up operations
VDD
Power-off
Power-off
Active
P
Active
E
Standby
P
E
P
Active
E
Standby
P
E
CLK
Stored data B=0
S
S=0
OUTbefore=1
Stored data B=0
S=0
OUTafter=1
S=1
OUTbefore=0
S=1
OUTafter=0
OUT
780mV
10s
Match
Match
Mismatch
Mismatch
Instant sleep/wake-up behavior is successfully demonstrated.
26
Outline
• Nonvolatile Logic-in-Memory Architecture
Overview
• NV GP-Logic: Nonvolatile-FPGA
• NV SP-Logic: Nonvolatile-TCAM
• Conclusions & Future Prospects
27
Conclusions
Propose a MOS/MTJ-hybrid circuit (nonvolatile logic-inmemory circuit using MTJ devices) style
Two kinds of typical applications with logic-in-memory
architecture; NV-LUT circuit and NV- TCAM
Compact and no static power dissipation
Confirm basic behavior with fabricated test chips
under an MTJ/CMOS process.
It could open an ultra-low-power logic-circuit paradigm
Future Prospects and Issues:
1. Establish the fabrication line
2. Establish the CAD tools
3. Explore the appropriate application fields
(Impact towards “Reliability Enhancement”)
28
29
Download