Resistance-Based Memory Circuit Design
0251808 周德玉
Abstract
Decreasing read cell current (ICELL) has become a major trend in nonvolatile memory (NVM). However, a reduced ICELL leaves the operation of sense amplifiers (SAs) vulnerable to bitline (BL) level offset and SA input offset. Thus, small-ICELL NVMs suffer from slow read speed or low read yield. In this study, we propose a new current-sampling-based SA (CSB-SA) that suppresses the offset due to device mismatch while maintaining tolerance for insufficient precharge time. These features enable CSB-SA to achieve a read speed 6.3× to 8.1× faster than previous SAs when sensing a 100 nA ICELL on a 2K-cell bitline.
Introduction
The ultimate nonvolatile data memory (NVM) should
display characteristics such as high density and low cost,
fast write and read access, low-energy operation, and high
performance with respect to endurance (write cyclability)
and retention.[41] Today, Si-based Flash memory devices
represent the most prominent NVM because of their high
density and low fabrication costs. However, Flash suffers
from low endurance, low write speed, and high voltages
required for the write operations. In addition, further
scaling, i.e., a continuation in increasing the density of
Flash is expected to run into physical limits in the near
future. Ferroelectric random access memory (FeRAM)
and magnetoresistive random access memory (MRAM)
cover niche markets for special applications. One reason
among several others is that FeRAM as well as
conventional MRAM exhibit technological and inherent
problems in the scalability, i.e., in achieving the same
density as Flash today. To overcome the problems of
current NVM concepts, a variety of alternative memory
concepts are being explored. Most notably, NVMs based on
electrically switchable resistance have attracted
considerable attention, often summarized under the
umbrella term resistance (switching) random access
memory, or RRAM for short. This review covers particularly
interesting classes of RRAM in which redox reactions and
nano-ionic transport processes play the key role. It should
be noted, however, that despite the fairly consistent
picture painted in this review, many details and many
variants are still completely unknown, and our current
pictures are more working hypotheses
than well-founded physical models.
General Requirements of RRAM
Memory cells in a RAM are organized in a matrix. The
rows and columns of the matrix are called word lines and
bit lines, respectively, connecting to the electronic
amplifiers in the periphery of the matrix which conduct
the write and read operations. In the simplest case,
resistively switching memory cells may be organized in a
passive cross-bar matrix, just connecting the word and bit
lines at each node (Fig. 1a). In order to avoid the so-called
parasitic-path-problem, i.e., signal bypasses by cells in
their low resistance state, serial elements with a particular
non-linearity must be added at each node. Depending on
the switching scheme of the memory cell, these can be
diodes or varistor-type elements with a specific degree of
non-linearity.[42] Alternatively, a RAM is organized in an
active matrix comprising a select transistor at each node,
which decouples the memory cell if it is not addressed
(Fig. 1b). This concept significantly reduces crosstalk and
disturb signals in the matrix at the expense of some
additional area required for the footprint of the transistor
contacts.
A resistive switching memory cell in a RRAM is generally
built as a capacitor-like MIM structure, composed of an
insulating or resistive material ‘I’ sandwiched between
two (possibly different) electron conductors ‘M’. In the
framework of this review, the materials ‘I’ are oxides or
higher chalcogenides which typically show some ion
conductivity. These MIM cells can be electrically
switched between at least two different resistance states,
after an initial electroforming cycle which is usually
required to activate the switching property. By applying
appropriate programming or write voltage pulses Vwr, a
cell in its high-resistance (OFF) state can be SET to a
low-resistance (ON) state or RESET back into the OFF
state. In the literature, the RESET is sometimes called an
‘erase’ operation. In a number of cases multilevel
switching has been demonstrated, i.e., more than two
resistance states have been established in order to realize,
for example, multiple bits per cell. The state of the RRAM
cell is detected by applying a read voltage Vrd.
Based on the circuit requirements of high-density NVM
today such as Flash and taking predictions about
technology scaling of the next 15 years into account,[43]
one can collect a number of requirements for RRAM cells:
Write operation
Write voltages Vwr should range from a few hundred mV (to be compatible with scaled CMOS) to a few V (to give an advantage over Flash, which suffers from high programming voltages). The length of write voltage pulses twr should be <100 ns to compete with DRAM specifications and to outperform Flash, which has a programming speed of some 10 μs, or even <10 ns to approach high-performance SRAM.
Read operation
Read voltages Vrd need to be significantly smaller than
write voltages Vwr in order to prevent a change of the
resistance during the read operation. Because of
circuit-design constraints, Vrd cannot be less than
approximately one tenth of Vwr. An additional requirement
concerns the minimum read current Ird. In the
ON state, Ird should not be less than approximately 1 μA
to allow fast detection of the state by reasonably
small sense amplifiers. The read time trd must be on the
order of twr or preferably shorter.
Resistance ratio
Although an ROFF/RON ratio of only 1.2 to 1.3 can be exploited by dedicated circuit design, as shown in MRAM, ROFF/RON ratios >10 are required to allow small and highly efficient sense amplifiers and, hence, RRAM devices that are cost-competitive with Flash.
Endurance
Contemporary Flash shows a maximum number of write
cycles between 10^3 and 10^7, depending on the type.
RRAM should provide at least the same endurance,
preferably a better one.
Retention
A data retention time tret of >10 years is required for
universal NVM. This retention time must be maintained under
thermal stress up to 85 ℃ and under small electrical stress, such as
a constant stream of Vrd pulses.
The combination of requirements on the write operation, the read operation, and the retention creates a voltage-time dilemma that most papers published on resistive switching so far do not address. A voltage ratio Vwr/Vrd of at most ten must produce an acceleration of the switching kinetics by a factor of tret/twr, i.e., approx. 10^16! Only a few physical mechanisms show such a huge non-linearity.
Fig. 1. Circuit diagram of a storage node in the matrix of a
resistance random access memory (RRAM), where RS
denotes the resistive switching cell. a) Passive matrix, in
which NLE is a serial element with a specific
non-linearity. b) Active matrix with the select transistor T.
Fig. 2. The two basic operation schemes of resistance
switching memory cells. I–V curves recorded for a
triangular shaped voltage signal. cc denotes the
compliance current. Dashed lines indicate that the real
voltage at the system will differ from the control voltage
because of the cc in action. a) Unipolar switching. The
SET voltage is always higher than the RESET voltage,
and the RESET current is always higher than the cc during
SET operation. b) Bipolar switching. The SET operation
occurs on one polarity of the voltage or current, the
RESET operation requires the opposite polarity. In some
systems, no cc is used. Please note that the I–V curves of
real systems may deviate considerably from these sketches,
for both operation schemes.
NONVOLATILE MEMORY (NVM) suffers from a decrease in read cell current as device size and VDD shrink while the threshold voltage stays the same. This vulnerability to decreased ICELL is exacerbated in the following situations: 1) multiple-level-cell (MLC) [1]–[8] or cross-point structures [9] that aim for a smaller area-per-bit; 2) lowering VDD [10]–[14] to reduce power consumption; 3) logic-process-compatible one-time/multiple-time programming memories (OTP/MTP) [15], [16] for embedding into mobile chips.
The sensing margin required by NVM is dominated by the
sense amplifier (SA) offset and bitline (BL) level offsets.
SA offset is caused by device mismatch resulting from
process variations. The BL offset is the result of noise,
bias, and load (CBL) mismatches between BLs. With
continued efforts to reduce the size and BL-pitch,
overcoming these issues has become a major challenge in
the read operation of NVMs with a smaller ICELL.
Due to these offsets, small-ICELL NVMs suffer from slow
read speed or high read fail probability. Thus, developing
an SA with greater offset tolerance is a prerequisite to
achieving high-yield small-ICELL NVMs with faster read
operations.
Many small-ICELL memories (NVMs and low-voltage
SRAMs) employ voltage-mode SA (VSA) [2]–[5],
[17]–[22] with a long BL developing time to provide
tolerance for BL and SA offset; however, this is
accomplished at the cost of reduced read speed.
Current-mode SA (CSA) achieves faster read speeds than VSA [1], [13]. Cascode-current-load (or resistive-divider-like) CSAs (CCL-CSAs) [9], [23]–[27] require long BL settling times and have a small 1st-stage voltage difference when reading a small ICELL.
Current-mirror CSA (CM-CSA) [28]–[32] has fast read
speeds but cannot sense small ICELL, due to mismatch in
the mirror-stage device. Global-clamping-local-discharge
CSA (GCLD-CSA) [1], [33], [34] achieves sub-100 nA
sensing, but requires long BL precharge and settling times
to prevent false read. Threshold-voltage (VTH) nulling
inverter-offset-compensated SA (IOC-SA) [35] reduces
SA offset; however, BL offset and settling time still limit
the advantages it provides with regard to speed, compared
to VSA/CSA. Thus, there remains a need for a new SA
capable of sensing small-ICELL, while providing fast read
speed.
In this study, we propose a new offset tolerant
current-sampling-based SA (CSB-SA) [36] capable of
detecting sub-100 nA ICELL against BL offsets and process
variations, while providing a read speed faster than that of
other SAs. A 90 nm CSB-SA prototype achieves sub-100 nA sensing, and a 512 Kb logic-process OTP macro with sub-200 nA ICELL achieves a 26 ns random access time. Fig. 3 illustrates the superior access-time-versus-ICELL performance compared to previously reported NVMs.
Fig. 4. (a) Concept, and (b) circuits and (c) waveforms of
conventional voltage-mode sense amplifier (VSA) and
charge-transfer (CT) based VSA (CT-VSA).
Fig. 3. Read performance of recent reported NVMs.
Challenges of Small-Cell-Current Sensing
A. Voltage-Mode SA (VSA)
1) Conventional VSA: Fig. 4 outlines the concept,
simplified circuit, and waveform of VSAs. In
conventional VSAs, selected BLs are precharged to a target voltage (VPRE) in the precharge phase. When the WL is on, the BL is then discharged by the cell current (ICELL-0) of a 0-cell, which has a low VTH or is in the low-resistance state (LRS). In read-1 (R1) operations, involving a high-VTH or high-resistance-state (HRS) cell, the BL is maintained at VPRE when the current of the “1” cell (ICELL-1) is smaller than the BL-load current of the BL keeper. The voltage comparator then compares the dataline (DL) voltage (VDL) or the sense-node voltage (VS1) with a reference voltage (VREF) before outputting a digital result.
The IV-conversion behavior of VSAs suffers from BL
offset, due to BL-load mismatch and BL noise, such as
crosstalk and WL-to-BL coupling [1], [12]. Moreover,
SAs and comparators have input offsets due to
mismatches between transistors in VTH, width, length, and
oxide thickness. Therefore, a large BL voltage swing is required to provide tolerance against both BL and SA offsets. Employing a longer BL developing time (TBL) increases the BL voltage swing (VBLS). However, if ICELL is small and the BL load is large, an excessively long TBL is required, which inevitably results in slow read speeds for high-yield sensing operations.
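The speed penalty of VSA follows from charge balance. With a roughly constant cell current, the BL needs TBL ≈ CBL·ΔV/ICELL to develop a swing ΔV. The minimal sketch below assumes a hypothetical 1 pF BL load and a 100 mV swing budget; neither value comes from the text.

```python
def bl_develop_time(c_bl: float, swing: float, i_cell: float) -> float:
    """Time for a constant cell current to move a bitline by `swing` volts: T_BL = C_BL * dV / I."""
    return c_bl * swing / i_cell


C_BL = 1e-12   # hypothetical bitline load (F) for a long BL
V_SWING = 0.1  # hypothetical swing (V) needed to cover BL and SA offsets

for i_cell in (1e-6, 100e-9):
    t_bl = bl_develop_time(C_BL, V_SWING, i_cell)
    print(f"I_CELL = {i_cell * 1e9:.0f} nA -> T_BL = {t_bl * 1e9:.0f} ns")
```

TBL scales as CBL/ICELL. Cutting ICELL by 10× on the same bitline therefore stretches TBL by 10× for the same offset tolerance.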
2) Charge-Transfer (CT)-Based VSA (CT-VSA): Many small-ICELL NVMs use charge-transfer (CT)-based VSA (CT-VSA) [2]–[5] to achieve read speeds faster than conventional VSA. This is made possible by a larger local signal swing on the sensing node (VS1), while requiring less VBLS/TBL, as shown in Fig. 4. Moreover, CT-VSA uses the same CT transistor (MCT) to perform the BL precharge/clamping and charge-transfer sensing operations in order to cancel the fluctuations in sensing margin caused by VTH variation of the MCT. The large signal swing of VS1 also enables CT-VSA to use a single transistor as the 1st-stage comparator to drive the page buffer, rather than the latch-type comparator used in conventional VSA. This helps to save area and suppress peak current for wide-IO/column applications.
However, like conventional VSA, CT-VSA is still vulnerable to BL offset and suffers from a long BL developing time when reading small ICELL.
3) Capacitor-Based Inverter-Offset-Compensated SA
(IOC-SA): Fig. 5 presents the concept and circuits of
previous capacitor-based inverter-offset-compensated SAs
(IOC-SAs). IOC-SA uses cross-coupled capacitors to store the NMOS threshold voltage and perform VTH-nulling operations. This scheme only reduces the SA offset associated with VTH mismatch. However, two MOS transistors with the same VTH may have different drain currents if their widths or lengths differ. IOC-SA does not cancel the SA offset resulting from other variations, including mismatches in transistor width, length, and TOX.
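The long-channel square-law model shows why VTH nulling alone leaves a residual offset. The sketch below assumes an ideal nulling step that biases each device at its own VTH plus an identical overdrive. The process constant, overdrive, and mismatch values are hypothetical.

```python
def drain_current(kp: float, w_over_l: float, vgs: float, vth: float) -> float:
    """Long-channel square-law saturation current (illustrative model)."""
    vov = max(vgs - vth, 0.0)
    return 0.5 * kp * w_over_l * vov ** 2


KP = 200e-6  # process transconductance (A/V^2), assumed
V_OV = 0.1   # overdrive applied after ideal VTH nulling (V), assumed

# Two nominally matched NMOS devices: VTH differs by 30 mV and W/L by 5 %.
vth_a, vth_b = 0.40, 0.43
wl_a, wl_b = 2.0, 2.1

# Ideal VTH nulling biases each gate at its own VTH plus the same overdrive,
# so the VTH term cancels exactly ...
i_a = drain_current(KP, wl_a, vth_a + V_OV, vth_a)
i_b = drain_current(KP, wl_b, vth_b + V_OV, vth_b)

# ... but the W/L mismatch (and, in a fuller model, TOX via KP) still reaches the current.
print(f"residual current mismatch: {(i_b / i_a - 1) * 100:.1f} %")
```

The 30 mV VTH difference cancels completely. The 5% W/L difference, however, passes straight through as a 5% current mismatch.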
Fig. 5. Traditional NMOS-VTH-nulling
inverter-offset-compensated SA (IOC-SA).
B. Current-Mode SA (CSA)
1) Current-Mirror-Based CSA (CM-CSA):
Differential-input current-mirror-type CSA (CM-CSA) is
commonly used in NOR-type NVMs to achieve fast read
speeds when there is a large BL-load and small cell
current, as shown in Fig. 6. The CM-CSA scheme uses a
fixed bias voltage on BL (VPRE) to induce ICELL for reading.
When the WL is on, the 0-cell generates a larger ICELL
than that of a 1-cell. The current comparator then
compares the to-be-sensed ICELL (ICELL-0 and ICELL-1) with a
reference-current (IREF) to determine the sensing result.
However, transistor mismatch between the mirror-stage
circuit, M1-M3 and M2-M4 pairs, results in offset
between the source current (ICELL/IREF) and the mirrored
currents (I3/I4). This I3/I4 current offset leads to voltage
offset between sense nodes S1 (VS1) and S2 (VS2). If ICELL and IREF are small, these offsets may result in low sensing yield. Fig. 7 plots the simulated I3 and I4 distributions of two 90 nm CM-CSAs, using 1× and 10× transistor sizes for M1-M4. The 1×-sized CM-CSA suffers low read yield when sensing ICELL = 1 μA and IREF = 500 nA, due to significant overlap between I3 and I4 resulting from VTH variations in M1-M4. The 10×-sized CM-CSA has a narrower distribution, which enables it to sense a smaller ICELL than the 1×-sized CM-CSA. However, it still suffers from I3-I4 overlap when sensing ICELL = 500 nA and IREF = 250 nA. Thus, CM-CSA cannot provide high yield when reading sub-500 nA ICELL.
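A small Monte-Carlo sketch reproduces the CM-CSA yield loss qualitatively. It makes three assumptions. The mirror devices operate in subthreshold, so a VTH shift scales the mirrored current by exp(-ΔVTH/(n·kT/q)). The mismatch sigma for 1× devices is a hypothetical 30 mV. The 10× devices follow Pelgrom-style 1/√(W·L) scaling. This is not the foundry statistical model used for Fig. 7.

```python
import math
import random

N_VT = 1.5 * 0.0259  # subthreshold slope factor n times kT/q (V), assumed


def mirror(i_in: float, dvth: float) -> float:
    """Subthreshold mirror: the output device shares the input's VGS but its VTH is shifted by dvth."""
    return i_in * math.exp(-dvth / N_VT)


def cm_csa_yield(i_cell: float, i_ref: float, sigma_dvth: float,
                 trials: int = 10000, seed: int = 1) -> float:
    """Fraction of Monte-Carlo trials in which mirrored I3 (from I_CELL) still exceeds I4 (from I_REF)."""
    rng = random.Random(seed)
    correct = 0
    for _ in range(trials):
        i3 = mirror(i_cell, rng.gauss(0.0, sigma_dvth))  # M1-M3 pair mismatch
        i4 = mirror(i_ref, rng.gauss(0.0, sigma_dvth))   # M2-M4 pair mismatch
        correct += i3 > i4
    return correct / trials


SIGMA_1X = 0.030                      # assumed sigma of the pair VTH mismatch for 1x devices (V)
SIGMA_10X = SIGMA_1X / math.sqrt(10)  # Pelgrom scaling: sigma ~ 1/sqrt(W*L)

y_1x = cm_csa_yield(1e-6, 500e-9, SIGMA_1X)
y_10x = cm_csa_yield(1e-6, 500e-9, SIGMA_10X)
print(f"1x devices:  yield = {y_1x:.3f}")
print(f"10x devices: yield = {y_10x:.3f}")
```

In this subthreshold model only the ratio ICELL/IREF matters. It therefore captures the 1× versus 10× sizing trend, but not the additional loss of margin that the text attributes to smaller absolute currents.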
Fig. 6. (a) Concept, (b) circuit, and (c) waveform of a
conventional current-mirror-based CSA (CM-CSA).
Fig. 7. Simulated current (I3 and I4) distribution of
CM-CSA: (a) 1× transistor size for ICELL = 1μA and IREF
= 500nA; (b) 10× transistor size for ICELL = 500nA and
IREF = 250nA.
2) Cascode-Current-Load CSA (CCL-CSA): Fig. 8 shows the concept, structure, and waveform of the cascode-current-load CSA (CCL-CSA). CCL-CSA comprises a load (M3), a clamp transistor (M1), and a voltage comparator.
The bias current generated by M3 (ILOAD) is set midway between ICELL-0 and ICELL-1. In R1 operations, ILOAD exceeds ICELL-1, so the voltage at node S1 (VS1) remains high during the sensing phase. In read-0 (R0) operations, ICELL-0 is larger than ILOAD; VS1 is pulled down, and the output datum (DOUT) is “0”.
When the difference in current between ICELL-0 and ICELL-1
is small, the CCL-CSA requires a long BL settling time,
resulting in a small difference between reading 1-cell and
0-cell. Moreover, the mismatch (or variation) between the
path and significantly influences the minimum sensible
ICELL and read speed. Thus, CCL-CSA is unsuitable
for reading ultra-small ICELL for long BL applications.
Fig. 8. (a) Concept, (b) circuit, and (c) waveform of a
cascode-current-load CSA (CCL-CSA).
3) Global-Clamping-Local-Discharging CSA (GCLD-CSA):
Fig. 9 shows the concept, simplified circuit, and waveform of the global-clamping-local-discharging CSA (GCLD-CSA). In the precharge phase, the local sense nodes S0/S1 (VS0/VS1) of GCLD-CSA are precharged to (VBLC3 - VTH3)/VDD, which is higher than the BL precharge voltage (VPRE = VBLC1 - VTH1). In the sensing phase, the local sensing nodes float and hold their precharged voltages when reading a 1-cell (ICELL-1 ≈ 0). To read a 0-cell, ICELL-0 discharges the local sensing nodes to generate a sufficient input voltage swing (VDD - (VBLC2 - VTH1)) on S1 for high-yield voltage comparison. At the same time, node S0 is clamped at VBLC2 - VTH2 (> VPRE) to clamp the global BL at VPRE. This prevents BL swing and the coupling noise it would inject into neighboring BLs, as in nominal current-sensing schemes.
GCLD-CSA achieves sub-100 nA sensing, but requires that almost no residual IPRE remain, to avoid a false drop in voltage at the floating sense nodes (S0 and S1). If the precharge time is insufficient, residual IPRE flows through the precharge PMOS, M3, and M1. Thus, VS1 falls between VBL and (VBLC3 - VTH3) at the end of the precharge phase. This insufficiency results in M3 being mistakenly turned on, which leads to a false drop in voltage at S0 during R1 operation. However, when IPRE is reduced and VBL nears the target VPRE, the precharge/clamping operation is in the near/sub-threshold range, which results in a long BL settling time. Thus, GCLD-CSA suffers slow read speed when reading small ICELL.
Fig. 9. (a) Concept, (b) circuit, and (c) waveform of a
global-clamping-local-discharging CSA (GCLD-CSA).
Proposed Current-Sampling-Based Sense Amplifier
To achieve small-ICELL sensing capabilities, while
maintaining fast read speeds, this work proposes a
current-sampling-based sense amplifier (CSB-SA) capable
of overcoming process variations, while remaining
tolerant of residual IPRE to reduce the required BL
precharge time.
Concept of CSB-SA
Fig. 10(a) shows the concept of the proposed CSB-SA.
This approach uses the same MOS device (M1/M2) for
current sampling and current-ratio amplification across
operating phases. Thus, VTH-independent current sampling
schemes can be implemented for different ICELL and IREF
inputs. This approach differs significantly from that of CM-CSA, which uses different MOS devices for current mirroring or I-V conversion, resulting in increased vulnerability to mismatch between devices.
Conventional VSA or CCL-CSA must develop 1st-stage
voltage on the heavily loaded BL using continuous ICELL
driving. Thus, the read access time of VSA and CCL-CSA
is sensitive to BL load and mismatch. CSB-SA uses
sampled current to rapidly generate the 1st-stage voltage
difference at small-load internal nodes, thanks to its
BL-decoupled behavior. In addition, the sampled currents
are insensitive to transistor and CBL mismatch and the
current sampling operation is embedded in the BL
precharge operation.
Thanks to the tolerance for process variation, CSB-SA can
tolerate residual IPRE, which reduces the time required for
BL precharge. Compared to other SAs, CSB-SA achieves
faster read speeds, while providing tolerance for transistor/BL offset when sensing small ICELL on a heavily loaded BL, as shown in Fig. 10(c).
Operation of CSB-SA
The CSB-SA operation is divided into three phases. In
phase-1 (BL precharge and current sampling phase),
S1-S4 are turned on to connect the BL and dummy BL
(DBL, at ICELL side) to the diode-connected M1 and M2,
respectively. M1 and M2 provide large precharge
currents IPRE (IPRE1/IPRE2) to BL/DBL at the beginning of phase-1.
After a sufficient precharge time (TPRE), voltages at nodes
SA1 (VSA1) and SA2 (VSA2) are high, while IPRE is low.
The drain currents of M1 and M2 are (IM1=ICELL+IPRE1)
and (IM2=IREF+IPRE2), respectively. At the end of phase-1,
the gate voltages for M1 (VG1) and M2 (VG2) are stored in
C2 and C1, respectively. When IPRE is low enough to be disregarded, IM1/IM2 equals ICELL/IREF regardless of any M1-M2 VTH mismatch. Fig. 11 shows the equivalent circuit and waveform of phase-1.
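Phase-1 is VTH-independent because one device both samples and reproduces the current. The sketch below models M1 in subthreshold, which is an assumption; I0, n·kT/q, and the currents are hypothetical. The device settles as a diode at ICELL + IPRE and its gate voltage is stored. The sketch then compares reusing that voltage on the same device with applying it to a different device, as a mirror would.

```python
import math

N_VT = 1.5 * 0.0259  # n*kT/q (V), assumed
I0 = 1e-7            # drain current at VGS = VTH (A), assumed


def i_sub(vgs: float, vth: float) -> float:
    """Subthreshold drain current of a single device (illustrative model)."""
    return I0 * math.exp((vgs - vth) / N_VT)


def diode_settle(i_forced: float, vth: float) -> float:
    """Gate voltage at which a diode-connected device carries i_forced."""
    return vth + N_VT * math.log(i_forced / I0)


i_cell, i_pre = 100e-9, 5e-9  # hypothetical cell current and residual precharge current
vth_m1 = 0.55                 # M1 with a +150 mV VTH shift (hypothetical)
vth_other = 0.40              # a nominal device elsewhere in the SA (hypothetical)

vg_stored = diode_settle(i_cell + i_pre, vth_m1)  # held on the capacitor at the end of phase-1
i_same = i_sub(vg_stored, vth_m1)                 # CSB-SA style: M1 reuses its own stored VG
i_other = i_sub(vg_stored, vth_other)             # mirror style: a different device gets the same VG

print(f"same device:      {i_same * 1e9:.1f} nA")
print(f"different device: {i_other * 1e9:.1f} nA")
```

The same-device current equals ICELL + IPRE for any value of vth_m1. This also explains why the sampled current equals ICELL only once IPRE is negligible.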
Fig. 12 shows the structure of the circuit and the
waveform in phase-2. In phase-2 (current-ratio
amplification), the S1/S2 is switched off to disconnect the
SA1/SA2 from the heavy-load BL/DBL. At the beginning
of phase-2, the M1/M2 charges SA1/SA2 with the current
(IM1/IM2) sampled in phase-1. For a given period in
phase-2 (TP2), IM1/IM2 increases the VSA1/VSA2 by
ΔVSA1/ΔVSA2.
Due to the AC-coupling behavior of C1/C2, the ΔVSA1
increases the gate voltage (VG2) and reduces the
gate-source voltage difference (VGS2) of M2 by ΔVG2, and
ΔVSA2 reduces the VGS1 by ΔVG1. The reduction in VGS1
and VGS2 causes the IM1 and IM2 to decrease differentially
by ΔIM1 and ΔIM2, respectively. A longer TP2 increases the
difference in current (ΔIM1-M2= IM1-IM2) between M1
and M2, when using the same devices (M1/M2) as in
phase-1. This results in an amplification of the current
ratio (CR= IM2/IM1), as shown in Fig. 13. CR-amplification
accelerates the development of the difference (ΔVSA)
between VSA1 and VSA2.
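A behavioural model can illustrate the current-ratio amplification of phase-2. The sketch below assumes subthreshold M1/M2 whose sampled currents are scaled by exp(-α·ΔV/(n·kT/q)) when the opposite SA node rises by ΔV. The node capacitance c_sa and the coupling fraction alpha are hypothetical. In this model CR = (IREF/ICELL)·exp(-α(ΔVSA1 - ΔVSA2)/(n·kT/q)). The faster rise of SA1 therefore keeps shrinking CR, which is the positive feedback described above.

```python
import math

N_VT = 1.5 * 0.0259  # n*kT/q (V), assumed subthreshold operation of M1/M2


def phase2(i_cell: float, i_ref: float, t_p2: float,
           c_sa: float = 20e-15, alpha: float = 0.3, steps: int = 4000):
    """Forward-Euler integration of a behavioural phase-2 model.

    i_cell, i_ref: currents IM1/IM2 sampled in phase-1.
    c_sa: capacitance of node SA1/SA2 (hypothetical value).
    alpha: fraction of a dV_SA step AC-coupled onto the opposite gate (hypothetical).
    Returns (IM1, IM2, dV_SA1, dV_SA2) after t_p2 seconds.
    """
    dv1 = dv2 = 0.0
    dt = t_p2 / steps
    for _ in range(steps):
        # Cross-coupling: the rise on SA2 lowers M1's gate drive, the rise on SA1 lowers M2's.
        i1 = i_cell * math.exp(-alpha * dv2 / N_VT)
        i2 = i_ref * math.exp(-alpha * dv1 / N_VT)
        dv1 += i1 * dt / c_sa
        dv2 += i2 * dt / c_sa
    # Currents consistent with the final node voltages.
    i1 = i_cell * math.exp(-alpha * dv2 / N_VT)
    i2 = i_ref * math.exp(-alpha * dv1 / N_VT)
    return i1, i2, dv1, dv2


for t_ns in (0, 5, 10, 20):
    i1, i2, dv1, dv2 = phase2(100e-9, 50e-9, t_ns * 1e-9)
    print(f"TP2 = {t_ns:2d} ns: CR = IM2/IM1 = {i2 / i1:.3f}, "
          f"VSA1 - VSA2 = {(dv1 - dv2) * 1e3:.1f} mV")
```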
Fig. 11. Operating circuit and waveform in phase-1
(current-sampling).
Fig. 10. (a) Concept and (b) circuit of the proposed CSB-SA.
(c) Conceptual read-speed comparison of CSB-SA with
previous SAs.
Fig. 12. Operating circuit and waveform in phase-2.
Fig. 13. Current ratio vs. TP2.
In phase-3 (2nd-stage amplification), the EN turns on the
NMOS-latch and pulls down the SA2, while M1
continually charges the SA1 to VDD for read-1 operations.
Finally, the digital output is generated at nodes SA1 and
SA2, as shown in Fig. 14.
Fig. 14. Equivalent circuit and waveform of phase-3
(current-amplification).
Analysis and Comparison
CSB-SA vs. Process Variation
In CSB-SA, the sampled IM1/IM2 acts as a DC element and ΔIM1-M2 acts as an AC element. Variations in ΔVSA at the end of phase-1 (ΔVSA-P1) arise from VTH mismatch between M1 and M2 and from differences in the ICELL being sensed. The phase-2 AC-coupling behavior built on these DC elements enables VSA1-VSA2 “cross-over”, even if a significant proportion of the initial VSA1 - VSA2 offset (ΔVSA-P1) remains. This is because ΔVSA-P1 can be overcome by ΔIM2-M1 if TP2 is sufficient, unlike the unrecoverable sensing error caused by VTH mismatch in CM-CSA. Hence, delaying the SE signal, which activates the NMOS latch, until VSA1-VSA2 cross-over enables “offset cancel” for CSB-SA against process variation.
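A similar toy model can illustrate why delaying SE until cross-over cancels offset. The sketch makes one modelling simplification: the phase-1 offset ΔVSA-P1 sits as a DC level on C1/C2 and does not feed back, while only phase-2 changes are AC-coupled. The numbers are hypothetical and aim only at the qualitative trend of Fig. 16, not its values.

```python
import math

N_VT = 1.5 * 0.0259  # n*kT/q (V), assumed


def crossover_time(i_cell, i_ref, v_off, c_sa=20e-15, alpha=0.3, dt=1e-12, t_max=200e-9):
    """Phase-2 time until VSA1 overtakes VSA2 when SA2 starts v_off higher (dV_SA-P1).

    Model assumption: only the phase-2 changes dv1/dv2 are AC-coupled to the
    opposite gates; the phase-1 offset is a DC level on C1/C2 and does not feed back.
    Returns None if no cross-over occurs within t_max.
    """
    dv1 = dv2 = 0.0
    for step in range(int(t_max / dt) + 1):
        if dv1 - dv2 >= v_off:  # VSA1 - VSA2 = (dv1 - dv2) - v_off
            return step * dt
        i1 = i_cell * math.exp(-alpha * dv2 / N_VT)
        i2 = i_ref * math.exp(-alpha * dv1 / N_VT)
        dv1 += i1 * dt / c_sa
        dv2 += i2 * dt / c_sa
    return None


for v_off_mv in (0, 50, 150):
    t = crossover_time(100e-9, 50e-9, v_off_mv * 1e-3)
    print(f"initial offset {v_off_mv:3d} mV -> cross-over after {t * 1e9:.1f} ns")
```

In this model ΔVSA1 - ΔVSA2 keeps growing, so the cross-over always arrives eventually. A larger initial offset simply costs a longer TP2.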
Fig. 15 shows two simulated VSA1 - VSA2 crossover
behaviors for sensing ICELL = 100 nA and IREF = 50 nA:
with and without a 150 mV VTH-mismatch between
transistors M1 and M2. In the case of VTH-mismatch, the
VSA1-VSA2 crossover point occurs later than in the case
without mismatch. However, with a VTH mismatch of 150 mV, CSB-SA still samples the same current and shows current-ratio amplification similar to the case without mismatch. Thus, the functionality of CSB-SA is insensitive to device mismatch.
Although ΔIM2-M1 is small when sensing small ICELL, the TP2 required to tolerate large device mismatches is short and independent of CBL, because nodes SA1 and SA2 carry only a small parasitic load during phase-2, when they are disconnected from the BL.
Fig. 16 shows the TP2 versus M1-M2 VTH mismatch for
CSB-SA required to maintain functionality. When larger
M1-M2 mismatch occurs, the CSB-SA requires a longer
TP2. Fortunately, the TP2 penalty needed to compensate for
a 150 mV VTH-mismatch is only 0.4% of the access time
of a macro with 2048 cells per BL. Unlike CM-CSA,
which suffers from low yield due to variations in VTH, the CSB-SA can achieve 100% yield when sensing sub-100 nA ICELL despite device variations, as long as sufficient TP2 is provided.
The variation in capacitance of C1/C2 does not affect the
behavior of current sampling. This is because the sampled
current in phase-1 depends on the voltages (VG1/ VG2)
stored on C1/C2, which are insensitive to the capacitance
of C1/C2. In phase-2, ΔVSA1/ΔVSA2 depends on the
capacitance of node SA1/SA2, which includes that of
C1/C2 as well as the parasitic capacitance of S1/S2 and
S3/S4. ΔVG1/ΔVG2 depends on the capacitance of node
G1/G2, which includes that of C1/C2 as well as the
parasitic capacitance of M1/M2 and S3/S4. Because the parasitic capacitance of (S1+S3)/(S2+S4) or (M1+S3)/(M2+S4) far exceeds that of C1/C2, the variation in capacitance of C1/C2 does not significantly influence the behavior of phase-2.
The CSB-SA scheme uses an inactive sub-array to provide
the DBLs with IREF, as shown in Fig. 17. This improves
the common-mode BL precharge behavior of the ICELL and
IREF branches. Fig. 18 shows the switch-point analysis
under two different IM2 conditions: IM2 = 50nA and IM2 =
150 nA. This analysis includes variations to all transistors
in the CSB-SA using 10000-point Monte-Carlo simulation
with the foundry’s statistical SPICE model. It should be
noted that IM2 is the sum of IREF and IPRE2. The CSB-SA achieves 100% yield if ΔIM1-M2 exceeds 3.8 nA when IM2 is only 50 nA (IREF = 50 nA and IPRE = 0 nA). When the residual IPRE increases to 100 nA while IREF is maintained at 50 nA, the CSB-SA still has a small dead zone (ΔIM1-M2 = 7.8 nA). This indicates that CSB-SA is not overly sensitive to near-common-mode residual IPRE. The
small dead-zone implies that the CSB-SA is capable of
tolerating a larger residual IPRE and employing a shorter
BL precharge time to achieve faster read speeds without a
significant influence on the yield of CSB-SA, particularly
in long BL applications.
Fig. 15. SA1-SA2 cross-over behavior: (a) No mismatch
between M1 and M2; (b) 150 mV VTH mismatch between
M1 and M2.
BL Precharge Time vs. Speed
Fig. 19 shows the TPRE of two BL structures (256-cell-per-BL and 2048-cell-per-BL) required to read ICELL = 100 nA and IREF = 50 nA across various levels of residual IPRE. Tolerating a residual IPRE of 50 nA reduces TPRE by 76% and 82% for the 256-cell-per-BL and 2048-cell-per-BL structures, respectively, compared to the case without residual IPRE. Therefore, using a shorter TPRE is an effective approach for improving the macro read access time, provided the SA functions correctly.
However, using a shorter (insufficient) TPRE increases the residual IPRE (IPRE1 and IPRE2) of most CSAs. Fortunately, as discussed in Section IV.B, our CSB-SA can tolerate near-common-mode residual IPRE at the expense of an increased delay time (TP2) in phase-2.
Fig. 20 plots TP2 and macro access time (TAC) versus IPRE for a 512 Kb macro with 256 cells per BL in two cases: 1) IREF = 50 nA and ICELL = 100 nA; 2) IREF = 100 nA and ICELL = 200 nA. As expected, larger IREF and ICELL reduce the TP2 penalty for tolerating a given level of IPRE. Although TP2 increases slightly with IPRE to generate the target ΔVSA-P2 needed for high-yield phase-3 operation, the overall read access time (TAC) of CSB-SA is significantly reduced by using a shorter TPRE with a higher IPRE.
Fig. 16. TP2 versus VTH mismatch between M1 and M2 transistors.
Fig. 19. BL precharge time (TPRE) vs. residual IPRE.
Fig. 17. Macro-level sensing structure for CSB-SA.
Fig. 20. Simulated TP2 and macro access time (BL length = 256) vs. residual IPRE for two cases: 1) IREF = 50 nA and ICELL = 100 nA; 2) IREF = 100 nA and ICELL = 200 nA.
Fig. 18. Switch-point analysis of CSB-SA: (a) IM2 = 50 nA; (b) IM2 = 150 nA.
Comparison With Other SAs
1) Comparison With Conventional SAs: Fig. 21 shows yield versus ICELL (ICELL-0) for the higher-speed SAs, namely the proposed CSB-SA and CM-CSA, using a 10000-point Monte-Carlo simulation. As mentioned in Section II, CM-CSA suffers from low yield for sub-1 μA ICELL. Conversely, CSB-SA achieves high yield even when ICELL is below 100 nA, particularly when using a longer TP2 phase (50 ns).
Fig. 22 compares the random read access speed of various SAs for an NVM macro with 2048 cells per BL. Without residual IPRE, the CSB-SA achieves 1.1× to 1.4× faster macro read speeds than VSA, CT-VSA, CCL-CSA, and GCLD-CSA when reading ICELL = 100 nA. Tolerance for 50 nA IPRE enables the CSB-SA to achieve macro read speeds 6.3× to 8.1× faster than those of the other SAs when reading a 100 nA ICELL.
2) Comparison With Capacitor-Based Inverter-Offset-Compensated SA (IOC-SA): CSB-SA and IOC-SA have similar circuit structures, using cross-coupled capacitors to suppress the input offset of the SA. However, the circuit behavior and its influence on offset-suppression performance differ between CSB-SA and IOC-SA from the following two perspectives:
1) Usage of capacitors for offset suppression
2) Area and design complexity
a) Usage of Capacitors for Offset Suppression: Unlike IOC-SA, the CSB-SA uses capacitors to store the gate voltage (VG) for input current sampling. Because each device sets its own stored VG, the sampled current inherently accounts for VTH variation as well as transistor width, length, and TOX mismatches between the M1/M2 transistors. Unlike the voltage-mode amplification in IOC-SA, our proposed CSB-SA uses current-mode operations (current sampling and current-ratio amplification) for 1st-stage input-signal amplification. In addition, delaying SE until VSA1-VSA2 cross-over in phase-2 enables “offset cancel”, even if the initial VSA1 - VSA2 offset (ΔVSA-P1) remains. This enables CSB-SA to employ relaxed internal timing to improve its tolerance for variation without requiring a larger difference between the input signals on the BLs.
b) Area and Design Complexity: The VTH or auto-zero
point storing operation in IOC-SAs requires a complex
multi-step offset nulling process and numerous switches.
The use of many switches degrades offset-cancelling
performance, due to the charge-injection effect. In contrast,
the CSB-SA requires only four switches sharing the same
control signal, PRE. The IOC-SA requires twice as many switches and four times as many control signals as CSB-SA, as shown in Table I. The VTH-nulling operation thus increases the control complexity and area overhead of IOC-SA beyond those of the proposed CSB-SA.
Fig. 21. Yields of CSB-SA and CM-CSA.
Fig. 22. Comparison of speed between various SAs.
This paper proposes a current-sampling-based (CSB) SA to achieve robust and fast read operations for NVM with small cell current. This CSB-SA is insensitive to device mismatch and BL offset, while achieving a 6.3× to 8.1× faster read speed than other SAs when reading a 2048-cell BL with a 100 nA cell current.[40]
[1] R.-A. Cernea et al., “A 34 MB/s MLC write
throughput 16 Gb NAND with all bit line architecture on
56 nm technology,” IEEE J. Solid-State Circuits, vol. 44,
no. 1, pp. 186–194, Jan. 2009.
[2] K. Takeuchi et al., “A 56-nm CMOS 99-mm² 8-Gb
multi-level NAND flash memory with 10 MB/s program
throughput,” in IEEE Int. Solid-State Circuits Conf.
(ISSCC) Dig. Tech. Papers, 2006, pp. 507–516.
[3] D.-S. Byeon et al., “An 8 Gb multi-level NAND flash
memory with 63 nm STI CMOS process technology,” in
IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech.
Papers, 2005, pp. 46–47.
[4] S.-H. Chang et al., “A 48 nm 32 Gb 8-level NAND flash
memory with 5.5 MB/s program throughput,” in IEEE Int.
Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers,
2009, pp. 240–241.
[5] G. G. Marotta et al., “A 3 bit/cell 32 Gb NAND flash
memory at 34 nm with 6 MB/s program throughput and
with dynamic 2 b/cell blocks configuration mode for a
program throughput increase up to 13 MB/s,” in IEEE Int.
Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers,
2010, pp. 444–445.
[6] K. Fukuda et al., “A 151 mm² 64 Gb MLC NAND
flash memory in 24 nm CMOS technology,” in IEEE Int.
Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers,
2011, pp. 198–199.
[7] D. Lee et al., “A 64 Gb 533 Mb/s DDR interface MLC
NAND flash in sub-20 nm technology,” in IEEE Int.
Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers,
2012, pp. 430–431.
[8] Y. Li et al., “128 Gb 3b/cell NAND flash memory in
19 nm technology with 18 MB/s write rate and 400 Mb/s
toggle mode,” in IEEE Int. Solid-State Circuits Conf.
(ISSCC) Dig. Tech. Papers, 2012, pp. 436–437.
[9] C. J. Chevallier et al., “A 0.13 µm 64 Mb multi-layered
conductive metal-oxide memory,” in IEEE Int. Solid-State
Circuits Conf. (ISSCC) Dig. Tech. Papers, 2010, pp.
260–261.
[10] A. Wang and A. Chandrakasan, “A 180-mV FFT
processor using subthreshold circuit techniques,” in IEEE
Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers,
2004, pp. 292–529.
[11] M. Seok, S. Hanson, J.-S. Seo, D. Sylvester, and D.
Blaauw, “Robust ultra-low voltage ROM design,” in Proc.
IEEE 2008 Custom Integrated Circuits Conf. (CICC),
2008, pp. 423–426.
[12] M.-F. Chang, S.-M. Yang, C.-W. Liang, C.-C.
Chiang, P.-F. Chiu, and K.-F. Lin, “Noise-immune
embedded NAND-ROM using a dynamic split source-line
scheme for VDDmin and speed improvements,” IEEE J.
Solid-State Circuits, vol. 45, no. 10, pp. 2142–2155, Oct.
2010.
[13] M.-F. Chang et al., “A 0.5 V 4 Mb logic-process
compatible embedded resistive RAM (ReRAM) in 65 nm
CMOS using low-voltage current-mode sensing scheme
with 45 ns random-read time,” in IEEE Int. Solid-State
Circuits Conf. (ISSCC) Dig. Tech. Papers, 2012, pp.
434–435.
[14] M.-F. Chang et al., “A 0.29 V embedded
NAND-ROM in 90 nm CMOS for ultra-low-voltage
applications,” in IEEE Int. Solid-State Circuits Conf.
(ISSCC) Dig. Tech. Papers, 2010, pp. 266–267.
[15] H.-C. Lai, K.-Y. Cheng, Y. C. King, and C. J. Lin,
“A 0.26-µm² U-shaped nitride-based programming cell on
pure 90-nm CMOS technology,” IEEE Electron Device
Lett., vol. 28, no. 9, pp. 837–839, Sep. 2007.
[16] C. E. Huang et al., “A new self-aligned nitride MTP
cell with 45 nm CMOS fully compatible process,” in
IEDM Tech. Dig., Dec. 2007, pp. 91–94.
[17] B. H. Calhoun and A. P. Chandrakasan, “A 256 kb
subthreshold SRAM in 65 nm CMOS,” in IEEE Int.
Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers,
2006, pp. 2592–2601.
[18] T. H. Kim, J. Liu, and C. H. Kim, “A voltage scalable
0.26 V, 64 Kb 8T SRAM with Vmin lowering techniques
and deep sleep mode,” IEEE J. Solid-State Circuits, vol.
44, no. 6, pp. 1785–1795, Jun. 2009.
[19] B. Zhai, D. Blaauw, D. Sylvester, and S. Hanson, “A
variation-tolerant sub-200 mV 6 T subthreshold SRAM,”
IEEE J. Solid-State Circuits, vol. 43, no. 10, pp.
2338–2348, Oct. 2008.
[20] I. J. Chang, J. J. Kim, S. P. Park, and K. Roy, “A 32
kb 10 T subthreshold SRAM array with bit-interleaving
and differential read scheme in 90 nm CMOS,” IEEE J.
Solid-State Circuits, vol. 44, no. 2, pp. 650–658, Feb.
2009.
[21] Y. Morita et al., “An area-conscious
low-voltage-oriented 8T-SRAM design under DVS
environment,” in Symp. VLSI Circuits Dig. Papers, 2007,
pp. 256–257.
[22] J. Chen, L. T. Clark, and T.-H. Chen, “An
ultra-low-power memory with a subthreshold power
supply voltage,” IEEE J. Solid-State Circuits, vol. 41, no.
10, pp. 2344–2353, Oct. 2006.
[23] J.-K. Kim et al., “A 120-mm² 64-Mb NAND flash
memory achieving 180 ns/byte effective program speed,”
IEEE J. Solid-State Circuits, vol. 32, no. 5, pp. 670–680,
May 1997.
[24] T. Tanzawa, Y. Takano, T. Taura, and S. Atsumi,
“Design of a sense circuit for low-voltage flash
memories,” IEEE J. Solid-State Circuits, vol. 35, no. 10,
pp. 1415–1421, Oct. 2000.
[25] K.-J. Lee, B.-H. Cho, et al., “A 90 nm 1.8 V 512
Mb diode-switch PRAM with 266 MB/s read throughput,”
in IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech.
Papers, 2007, pp. 472–616.
[26] F. Bedeschi et al., “A bipolar-selected phase change
memory featuring multi-level cell storage,” IEEE J.
Solid-State Circuits, vol. 44, pp. 217–227, 2009.
[27] S. Chung, J.-T. Huang, P. Chen, and F.-L. Hsueh, “A
512 × 8 electrical fuse memory with 15 µm² cells using 8-sq
asymmetric fuse and core devices in 90 nm CMOS,” in
IEEE Symp. VLSI Circuits Dig. Tech. Papers, 2007, pp.
74–75.
[28] M.-K. Seo et al., “A 130-nm 0.9 V 66-MHz 8-Mb
(256K × 32) local SONOS embedded flash EEPROM,”
IEEE J. Solid-State Circuits, vol. 40, no. 4, pp. 877–883,
Apr. 2005.
[29] T. Ogura et al., “A 1.8-V 256-Mb multilevel cell
NOR flash memory with BGO function,” IEEE J.
Solid-State Circuits, vol. 41, no. 11, pp. 2589–2600, Nov.
2006.
[30] M.-F. Chang et al., “A process variation tolerant
embedded split-gate flash memory using pre-stable
current sensing scheme,” IEEE J. Solid-State Circuits, vol.
44, no. 3, pp. 987–994, Mar. 2009.
[31] A. Conte, “A high-performance very low-voltage
current sense amplifier for nonvolatile memories,” IEEE J.
Solid-State Circuits, vol. 40, no. 2, pp. 507–514, Feb.
2005.
[32] R. Micheloni, “The flash memory read path: Building
blocks and critical aspects,” Proc. IEEE, vol. 91, no. 4,
Apr. 2003.
[33] T. Futatsuyama et al., “A 113 mm² 32 Gb 3b/cell
NAND flash memory,” in IEEE Int. Solid-State Circuits
Conf. (ISSCC) Dig. Tech. Papers, 2009, pp. 242–243.
[34] Y. Li et al., “A 16 Gb 3-bit per cell (X3) NAND flash
memory on 56 nm technology with 8 MB/s write rate,”
IEEE J. Solid-State Circuits, pp. 195–207, Feb. 2009.
[35] Javanifard et al., “A 45 nm self-aligned-contact
process 1 Gb NOR flash with 5 MB/s program speed,” in
IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech.
Papers, 2008, pp. 424–425.
[36] M.-F. Chang et al., “An offset tolerant
current-sampling-based sense amplifier for sub-100
nA-cell-current nonvolatile memory,” in IEEE Int.
Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers,
2011, pp. 206–207.
[37] S. H. Kulkarni et al., “High-density 3-D metal-fuse
PROM featuring 1.37 µm² 1T1R bit cell in 32 nm high-k
metal-gate CMOS technology,” in Symp. VLSI Circuits
Dig. Tech. Papers, 2009, pp. 28–29.
[38] G. Uhlmann et al., “A commercial
field-programmable dense eFUSE array memory with
99.999% sense yield for 45 nm SOI CMOS,” in IEEE Int.
Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers,
2008, pp. 406–407.
[39] S. Chung et al., “A 1.25 µm² cell 32 Kb electrical fuse
memory in 32 nm CMOS with 700 mV Vddmin and
parallel/serial interface,” in IEEE Symp. VLSI Circuits Dig.
Tech. Papers, 2009, pp. 30–31.
[40] M.-F. Chang et al., “An Offset-Tolerant
Fast-Random-Read Current-Sampling-Based Sense
Amplifier for Small-Cell-Current Nonvolatile Memory,”
IEEE J. Solid-State Circuits, vol. 48, no. 3, pp. 864–877,
Mar. 2013.
[41] R. Waser, Ed., Nanotechnology, vol. 3. Weinheim,
Germany: Wiley-VCH, 2008.
[42] A. Flocke, T. G. Noll, C. Kügeler, C. Nauenheim, and
R. Waser, in Proc. IEEE Non-Volatile Memory Technology
Symp., 2008, p. 319.
[43] The International Technology Roadmap for
Semiconductors – ITRS 2007 Edition (http://www.itrs.net),
2007.