
A 128kb Stochastic Computing Chip based on RRAM Flicker Noise with High Noise Density and Nearly Zero Autocorrelation on 28-nm CMOS Platform

Tiancheng Gong†1, Qiao Hu†1, Danian Dong1, Haijun Jiang2, Jianguo Yang*1,2, Xiaoxin Xu*1, Xiaoming Chen3,
Qing Luo1, Qi Liu4, Steve S. Chung5, Hangbing Lyu1, Ming Liu1
2021 IEEE International Electron Devices Meeting (IEDM) | 978-1-6654-2572-8/21/$31.00 ©2021 IEEE | DOI: 10.1109/IEDM19574.2021.9720588
1Key Laboratory of Microelectronics Devices and Integrated Technology, Institute of Microelectronics of Chinese Academy of Sciences, Beijing,
China; 2Zhejiang Lab, Hangzhou, China; 3Institute of Computing Technology of Chinese Academy of Sciences, Beijing, China;
4Frontier Institute of Chip and System, Fudan University, China; 5National Yang Ming Chiao Tung University, Taiwan.
Email: yangjianguo@ime.ac.cn; xuxiaoxin@ime.ac.cn
Abstract—In this work, a 128kb RRAM flicker noise based
Stochastic Computing (SC) chip is demonstrated on 28-nm
HKMG CMOS platform for the first time. Flicker noise in
RRAM is selected as the entropy source owing to the higher
noise density compared with the conventional logic devices,
even the advanced 14-nm FinFET. The reliability of the flicker
noise in RRAM is further improved by an optimized weak
forming scheme at the array level towards accurate and stable
SC applications. Moreover, a probability modulated true
random number generator (PM-TRNG) circuit is proposed to
generate stochastic bit-streams (SBS) with the desired
probability and eliminated correlation. The autocorrelation
function (ACF) and NIST test results show that the
correlation of the output SBS is nearly zero. Finally, this RRAM
flicker noise based SC chip is successfully applied to
edge detection in an image processing system, achieving a low
error rate (3.13% at a BSL of 128).
I. INTRODUCTION
Arithmetic operations based on deterministic computing,
such as multiplication, division, addition and subtraction,
usually consume large logical resources. Compared with
deterministic computing, stochastic computing (SC) only needs
a simple single logical unit to complete the above operations as
shown in Fig. 1 [1]. Stochastic computing, which is based on
the operation of stochastic bit-streams (SBS) generated by
stochastic number generator (SNG), consumes less hardware
resources, and has the advantages of high error-tolerance and
no memory bottleneck [2]. Recently, robust SNGs based on
RRAM have attracted great attention owing to their lower energy
consumption and simpler structure [3-4]. However, because of the
limited reliability and the autocorrelation of the entropy source,
no RRAM-based SC chip has been demonstrated so far.
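The single-gate arithmetic described above can be illustrated in software. The following is a minimal sketch of unipolar SC multiplication with one AND gate, assuming a software pseudo-random generator as a stand-in for a hardware SNG; the function names and the bit-stream length are illustrative choices, not taken from this work.

```python
import random

def to_sbs(p, length, rng):
    """Encode a probability p in [0, 1] as a unipolar stochastic bit-stream."""
    return [1 if rng.random() < p else 0 for _ in range(length)]

def sbs_value(bits):
    """Decode a bit-stream: its value is the fraction of '1' bits."""
    return sum(bits) / len(bits)

def sc_multiply(x_bits, y_bits):
    """Multiply two independent unipolar streams with one AND gate per bit."""
    return [x & y for x, y in zip(x_bits, y_bits)]

rng = random.Random(0)   # software PRNG, only a stand-in for a hardware SNG
BSL = 4096               # bit-stream length (illustrative value)
x = to_sbs(0.5, BSL, rng)
y = to_sbs(0.4, BSL, rng)
product = sbs_value(sc_multiply(x, y))
print(f"0.5 * 0.4 ~= {product:.3f}")
```

The estimate converges toward 0.2 as the bit-stream length grows, and it relies on the two input streams being statistically independent, which is why the correlation of the entropy source matters.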
For the first time, we fabricated a 128kb SC chip based on
RRAM flicker noise on 28-nm HKMG CMOS platform.
Through systematic comparison of the flicker noise in 40-nm
MOSFET, 28-nm MOSFET, 14-nm FinFET and RRAM,
RRAM shows the highest flicker noise density, which will
ensure the accuracy of SC applications. By modulating the
noise density with an optimized operation
scheme, highly reliable flicker noise is achieved, which can
be used for stable SC applications. SBS with near-zero
autocorrelation is generated by the probability modulated
TRNG circuit. Using this SC chip, a low error rate
(3.13% at a BSL of 128) is achieved for an edge detection application.
II. SELECTION OF THE ENTROPY SOURCE
Noise of electron devices is widely adopted as the entropy
source as a result of its randomness [7]. Fig. 2(a) shows the
classification of noise in electron devices. Thermal noise,
whose spectral density is too tiny to be observed, can be easily
affected by the testing environment. RTN, which is caused by
a single trap, is difficult to activate and control. Also, the noise
density of RTN is relatively low. Among all types of noise,
flicker noise, which is regarded as the influence of many local
traps, can be easily measured and has the highest noise density.
The slope of the flicker noise spectrum is -1 and the current
fluctuation is shown in Fig. 2(b). To ensure the accuracy
of SC, the noise density of flicker noise (S/I²) should be as
high as possible.
Flicker noise of MOSFET/FinFET devices is shown in
Fig. 3. The normalized drain current noise (W×L×S/I²),
extracted at a fixed frequency (f = 10 Hz), is plotted as a
function of the device area (W×L) in Fig. 3(a) for 28-nm
PolySiON MOSFET devices. The average
normalized noise level on a log scale is roughly constant with
area for all devices, indicating that the 1/f noise density
scales as the inverse of the device area. For MOSFET/FinFET
devices at different process nodes (40-nm, 28-nm, 14-nm),
the W×L×S/I² value remains the same at the same ID, as shown
in Fig. 3(b). Thus, the maximum 1/f noise density is
obtained at the smallest area. The W×L×S/I² value as a
function of VG is shown in Fig. 3(c); it
decreases with increasing gate bias VG.
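The area scaling stated above can be written compactly. This is only a sketch of the relation implied by the text: K is a symbol introduced here for illustration (a bias-dependent constant, not a quantity reported in this work), and an ideal spectral slope of -1 is assumed.

```latex
\frac{S_I(f)}{I_D^{2}} \approx \frac{K}{W L \, f}
\quad\Longrightarrow\quad
\left. W L \cdot \frac{S_I(f)}{I_D^{2}} \right|_{f = 10\,\mathrm{Hz}}
\approx \frac{K}{10\,\mathrm{Hz}} = \text{const.}
```

Under this assumption, halving the device area at a fixed bias and frequency doubles the noise density S/I², which is why the smallest device gives the largest 1/f noise density.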
Flicker noise of RRAM devices is shown in Fig. 4. The
normalized noise does not change with the device area, as shown
in Fig. 4(a). Fig. 4(b) shows the noise density as a function of
the resistance: as the resistance of the RRAM
device increases, the noise density also increases. The
schematic mechanism of LRS and HRS 1/f noise is shown in
Fig. 4(c). The maximum 1/f noise of RRAM (3.2MΩ) and that of
FinFET (VG=Vth, 2 Fins, 1 Finger) are compared in
Fig. 5(a). The maximum 1/f noise density of RRAM is much
higher than that of FinFET. Fig. 5(b) shows the benchmark of
noise in MOSFET/FinFET and RRAM. Based on the above
results, 1/f noise in the HRS of RRAM devices is suitable for use
as the entropy source owing to its high noise density.
III. TEST CHIP MEASUREMENT RESULTS
A. Optimization of 1/f Noise in RRAM at the Array Level
Based on the above analysis, the RRAM flicker noise based
128kbit SC test chip is implemented on SMIC 28nm HKMG
CMOS platform. The die micrograph is shown in Fig. 6(b). The
cross-section of RRAM cell in array and the zoom-in view of
the cell structure with the EDX line scan across the single cell
are shown in Fig. 6(c). Fig. 6(a) is the testing setup, including
microcontroller, pulse and bias generator and FPGA with level
shift interface circuit. Fig. 7(a) shows the distributions of LRS
and HRS in 10kb mini-array. 1/f noise of HRS is used as the
entropy source owing to its high noise density. However, the
noise density of HRS decreases with the retention time as
shown in Fig. 7(b), which will reduce the accuracy of stable SC
applications. Based on the corresponding relationship between
resistance and noise density, the decrease of noise density can
be attributed to the reduction of resistance. To solve this
problem, we propose a weak forming scheme to suppress the
drift of HRS resistance toward lower
values. Fig. 8(a) shows the operation procedure. Different VWLs
are adopted during the forming process for three mini-arrays,
and then all the devices are reset to the same HRS. In the
subsequent retention measurement, the memory cells
undergoing the strong forming operation tend to drift towards
lower resistance values as shown in Fig. 8(b). This
phenomenon could be explained by the percolation model [8]
as shown in Figs. 8(c)(d). As the VWL increases, the size of the
conductive filament becomes larger, leaving a larger cross-sectional
area of the residual filament after reset. In this case,
percolation paths are more likely to form by connecting the
oxygen vacancies (VO) in the gap region, resulting in lower resistance.
B. The Probability Modulated TRNG Circuit for Independent
SBS
A schematic of the PM-TRNG circuit is shown in Fig. 9(a).
The noise of the selected RRAM cell is amplified. A passive
band pass filter is implemented to remove the undesired
frequency noise and set a common DC-bias for the noise signal
(Vbias). The probability of a bit ‘1’ at the comparator output is
modulated by the bias voltage Vbias which is controlled by the
DAC circuit. An SBS with the desired probability of logic ‘1’ is
obtained from the value held in a register. Vref_cmp is initially
set equal to Vbias and then remains unchanged.
Owing to the high-density flicker noise from the RRAM, only
a single amplifier and a comparator are needed in this circuit,
which greatly decreases the circuit area. Figs. 9(b)(c)(d) show
the measured time-varying SBS value for specific input
voltages: the output is stochastic when the input change is 0
mV, but becomes deterministic as the input changes by about
±60 mV. Fig. 10(a) shows the measured output probability of
‘1’ in the random data (120k samples) versus the dc bias voltage
of the noise signal. The granularity of the output probability of ‘1’ is set
by the resolution of the D/A converter. For a desired probability
of 0.5, the randomness of the sequence was validated by the
autocorrelation function (ACF) and NIST tests. The ACF
result in Fig. 10(b) verifies that the flicker noise in RRAM is
an independent variable. The proposed PM-TRNG passed
all NIST randomness tests, as summarized in Table 1.
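The bias-to-probability modulation and the ACF check described above can be mimicked with a simple behavioral model. This is a sketch under loud assumptions: the amplified noise is replaced by Gaussian white noise from a software generator (it is not a model of RRAM 1/f noise), the band pass filter is not modeled, and the rms amplitude `sigma` is an arbitrary illustrative value chosen so that a ±60 mV shift saturates the output.

```python
import math
import random

def pm_trng(v_bias, n_bits, rng, sigma=0.015, v_ref=0.0):
    """Comparator model: emit '1' when noise + bias exceeds the reference.

    sigma is an ASSUMED rms noise amplitude in volts, not a measured value.
    """
    return [1 if rng.gauss(0.0, sigma) + v_bias > v_ref else 0
            for _ in range(n_bits)]

def prob_one(bits):
    """Fraction of '1' bits in the stream."""
    return sum(bits) / len(bits)

def acf(bits, lag):
    """Sample autocorrelation of a bit-stream at the given lag."""
    n = len(bits)
    mean = sum(bits) / n
    var = sum((b - mean) ** 2 for b in bits)
    cov = sum((bits[i] - mean) * (bits[i + lag] - mean) for i in range(n - lag))
    return cov / var

rng = random.Random(1)
N = 20000
# Probability of '1' for three bias offsets relative to the reference (volts).
probs = {dv: prob_one(pm_trng(dv, N, rng)) for dv in (-0.06, 0.0, 0.06)}
print(probs)

# ACF at p = 0.5 compared with the 95% confidence band of white noise.
stream = pm_trng(0.0, N, rng)
bound = 1.96 / math.sqrt(N)
inside = sum(abs(acf(stream, k)) <= bound for k in range(1, 21))
print(f"{inside} of 20 lags fall inside +/-{bound:.4f}")
```

In this model the output probability follows the Gaussian cumulative distribution of the bias offset, so finer DAC steps give finer probability steps, consistent with the granularity remark above.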
C. Error-Tolerant Stochastic Computing for Edge Detection
Application
The proposed flicker noise SC chip can be used for edge
detection in an image processing system based on the Roberts cross
algorithm, as shown in Fig. 11. By exploiting the correlation between
two SBSs [9], one XOR gate can perform absolute-value
subtraction, so a Roberts operator can be realized with only two
XOR gates and one MUX, provided that the SBSs used by the
three logic units are mutually uncorrelated. In this way, one Roberts
operator saves two random number generators, which
greatly reduces hardware resources in massively parallel detection.
In Fig. 12, the Cameraman image with a size of 320×320 (P×Q)
pixels was used for Roberts cross edge detection at different
BSLs (bit-stream lengths) and noise levels (percentage of SBS bits
flipped by external disturbance). Fig. 13 shows the
measured MAE for different BSLs and noise levels, compared
with software results that use binary arithmetic for the Roberts
operator. The MAE decreases with
increasing BSL (and hence increasing computing time), which
enables a trade-off between BSL (computing time)
and accuracy according to the application. For example,
the BSL can be reduced for real-time processing applications
when only the rough contour of the object needs to be
located initially. For medical imaging applications, which require high
image recognition accuracy, the BSL can be increased to
improve the detection accuracy. Meanwhile, Figs. 12(f) and 13(b)
show that the detection result remains good
when the noise level is 5%, which demonstrates excellent error-tolerant performance.
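The two-XOR, one-MUX structure described above can be sketched in software. The sketch assumes that each XOR receives a pair of streams generated from the same random sequence (so the XOR yields the absolute difference) and that the MUX select stream has probability 0.5 and is independent of both pairs; the pixel pairing follows our reading of Fig. 11, and a software pseudo-random generator stands in for the three RRAM-based TRNGs.

```python
import random

def correlated_pair(a, b, length, rng):
    """Encode a and b against the SAME random sequence (maximally correlated)."""
    rs = [rng.random() for _ in range(length)]
    return [int(r < a) for r in rs], [int(r < b) for r in rs]

def roberts_sc(r00, r01, r10, r11, length, rng):
    """Stochastic Roberts cross for one 2x2 pixel block; inputs lie in [0, 1]."""
    a, d = correlated_pair(r00, r11, length, rng)   # random source 1
    b, c = correlated_pair(r01, r10, length, rng)   # random source 2
    sel = [int(rng.random() < 0.5) for _ in range(length)]  # random source 3
    diag1 = [x ^ y for x, y in zip(a, d)]   # XOR -> |r00 - r11|
    diag2 = [x ^ y for x, y in zip(b, c)]   # XOR -> |r01 - r10|
    # MUX performs the scaled addition 0.5 * (diag1 + diag2).
    out = [p if s else q for p, q, s in zip(diag1, diag2, sel)]
    return sum(out) / length

rng = random.Random(2)
exact = 0.5 * (abs(0.9 - 0.2) + abs(0.3 - 0.6))   # binary reference value
for bsl in (8, 128, 4096):
    estimate = roberts_sc(0.9, 0.3, 0.6, 0.2, bsl, rng)
    print(bsl, round(estimate, 3), "reference", round(exact, 3))
```

Longer bit-streams bring the estimate closer to the binary reference at the cost of more clock cycles, which is the BSL versus accuracy trade-off discussed above.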
IV. CONCLUSION
In summary, a 128kb SC chip based on RRAM flicker noise
with high noise density and near-zero autocorrelation on a
28-nm CMOS platform is demonstrated for accurate and stable
SC applications. Table 2 summarizes this work and compares
it to state-of-the-art SNGs. The presented test chip reduces the
area by 100× and the power by 770× compared with
traditional CMOS implementations, which shows its potential
for deployment in error-tolerant and energy-efficient SC
designs to mitigate the bottleneck challenges of deterministic
computing.
ACKNOWLEDGMENTS
This work was supported in part by the MOST of China under Grants
2019YFB2205100, 2018YFA0701500, and in part by the National Natural
Science Foundation of China under Grants 61904200, 62025406, 61834009 and
the Strategic Priority Research Program of the Chinese Academy of Sciences
under Grant No. XDB44000000. T. Gong and Q. Hu contributed equally to this
work.
REFERENCES
[1] B. R. Gaines, “Stochastic computing systems,” Advances in Information
Systems Science, Springer, pp. 37-172, 1969.
[2] Y. Zhao, et al., “A Physics-based Model of RRAM Probabilistic Switching for
Generating Stable and Accurate Stochastic Bit-streams,” IEDM Tech., pp.
32.4.1-32.4.1, 2019.
[3] P. Knag, et al., “RRAM solutions for stochastic computing,” Stochastic
Computing: Techniques and Applications, Springer, pp. 153-164, 2019.
[4] D. Ielmini, et al., “In-memory computing with resistive switching devices,”
Nature Electronics, vol. 1, no. 6, pp. 333-343, 2018.
[5] J. Hu, et al., “Spin-Hall-Effect-Based Stochastic Number Generator for Parallel
Stochastic Computing,” IEEE TED, vol. 66, no. 8, pp. 3620-3627, 2019.
[6] H. Ichihara, et al., “Compact and accurate stochastic circuits with shared
random number sources,” ICCD, pp. 361-366, 2014.
[7] A. A. Balandin, “Low-frequency 1/f noise in graphene devices,” Nature Nano.,
vol. 8, no. 8, pp. 549-555, 2013.
[8] T. Ninomiya, et al., “Conductive filament scaling of TaOx bipolar ReRAM for
long retention with low current operation,” VLSI, pp. 73-74, 2012.
[9] A. Alaghi, et al., “Stochastic circuits for real-time image-processing
applications,” DAC, pp. 1-6, 2013.
Fig. 1 Stochastic computing uses SBS generated by SNG to do arithmetic
operations, consumes less hardware resources, and has the advantages of
high error-tolerance and no memory bottleneck. Motivation of SNG
based on RRAM: The SNG based on RRAM has lower energy and
simple structural complexity.
Fig. 2 (a) Classification of different noise types in electron devices.
Compared with thermal noise and RTN, flicker noise, which is the
influence of many local traps, has high noise density and can be
easily measured. (b) Current fluctuation caused by 1/f noise has a
positive correlation with noise density.
Fig. 3 Flicker noise in MOSFET. (a) 1/f noise density (S/I²) at
f=10Hz is inversely proportional to the device area for 28nm
MOSFET. (b) For different logic devices, the W×L×S/I² value keeps
constant under Vth, indicating the higher 1/f noise density (S/I²) can
be obtained by device area (W×L) scaling. (c) For FinFET with
same area, 1/f noise decreases with increasing VG.
Fig. 4 Flicker 1/f noise in RRAM. (a) Normalized noise
level (S/I²) is independent of the size of the RRAM cell.
(b) Measured S/I² at 10Hz as a function of the
resistance. 1/f noise level of RRAM increases linearly
with R. (c) The mechanism of LRS and HRS 1/f noise.
Fig. 5 (a) Comparison of the maximum 1/f noise of logic devices and
RRAM. (b) Benchmark of noise in MOSFET/FinFET and RRAM. 1/f
noise in RRAM is more suitable to be used as the entropy source.
Fig. 6 (a) Testing setup for the 28-nm RRAM based stochastic computing system;
(b) Die micrograph of 128kb RRAM test chip in 28-nm HKMG CMOS process;
(c) The cross-section of RRAM cell in array and the zoom-in view of the cell
structure with the EDX line scan across the single cell.
Fig. 7 (a) Distributions of LRS and HRS in 10kb array. 1/f
noise of HRS is used as entropy source owing to the high
noise density. (b) The retention test of 1/f noise in HRS. The
noise density decreases with time, which will reduce the
accuracy of long-term SC applications. The decrease of noise
density with time can be attributed to resistance reduction.
Fig. 8 (a) The testing procedure: forming with different VWL (1.2V, 1.5V, 1.8V),
RESET to HRS, READ HRS, bake, READ HRS; 10kb cells per condition. (b) The weak
forming scheme suppresses the drift
of HRS resistance values to lower resistance values. Inset: Detailed version of
the resistance distribution below 120KΩ. The simulated morphology of filament
in HRS for different forming conditions (c) VWL=1.8V and (d) VWL=1.2V. The
probability of percolation paths formation increases for the larger cross-sectional
area of the residual filament.
Fig. 9 (a) Schematic illustration of the probability modulated
TRNG circuit. (b, c, d) The probability of ‘1’ in SBS for
three different Vbias. The SBS value displays stochastic
behavior at input change by 0mV, and becomes
deterministic as the input changes by about ±60 mV.
Fig. 10 (a) The output probability
of ‘1’ in the random data versus the
change of dc bias voltage. (b) ACF
verified results at p=0.5, which
verifies that the flicker noise in
RRAM is an independent variable.
Fig. 11 Implementation of hardware-saving edge detection circuit based on
stochastic computing. The Robert
operator can be realized by only two
XOR gates and one MUX, in which the 2-input
bit-streams of the XOR gate are
correlated.
Fig. 12 The edge detection result. (a) Original image. (b)
Software (binary) accurate output image. (c)-(d) Output image
with different BSLs (BSL=8 and BSL=128). (e)-(f) Output image with different noise
levels (20% and 5% at 256 bits). The MAE decreases with increasing BSL, there is still
a good detection result when the noise level is 5%, which
shows an excellent Error-Tolerant performance.
Fig. 13 (a) MAE (Mean Absolute
Error) decreases with increasing BSL.
(b) The MAE of different injected noise
ratios when BSL is 256.
Data labels in (a): MAE = 7.69%, 6.73%, 5.52%, 4.49%, 3.70%, 3.13%, 2.76%
for BSL = 4, 8, 16, 32, 64, 128, 256.
Data labels in (b): MAE = 5.89%, 9.80%, 14.61%, 20.23%, 25.93%, 31.68%, 37.48%, 43.32%
for noise levels of 5% to 40% in steps of 5%.
Table 1. NIST verified results at
Vbias=0. PM-TRNG has passed all
NIST randomness tests.
NIST tests (result PASS for all): Frequency; Block Frequency; Cumulative Sums-1;
Cumulative Sums-2; Runs; Longest Runs of Ones; FFT; Rank; Universal Statistical;
Approximate Entropy; Non Overlapping Template; Overlapping Template;
Random Excursions; Random Excursions Var.; Linear Complexity.
P-values in extraction order (assignment to individual tests not recoverable):
0.207, 0.633, 0.469, 0.426, 0.827, 0.576, 0.343, 0.611, 0.491, 0.357, 0.438, 0.715.
Table 2. Comparison of this work with other SNGs
                             This work            IEDM'19 [2]                  TED'19 [5]                   ICCD'14 [6]
Technology                   28 nm                130 nm                       45 nm                        0.35 μm
Stochastic Source            RRAM Flicker Noise   RRAM Switching               MTJ                          CMOS FSR
P-bit Capacity               128Kb Test Chip      Single Cell with Simulation  Single Cell with Simulation  8bits
NIST Verified (@ p=0.5)      Passed All           N/A                          N/A                          N/A
SNG Area                     1×                   1×                           10×                          >100×
SNG Power                    1×                   1000×                        10×                          >770×
SNG Delay                    1×                   2×                           0.4×                         5×
Edge Detection Error Rate    3.13%                15%                          4%                           21%
(@128bits SBS)