A 128kb Stochastic Computing Chip based on RRAM Flicker Noise with High Noise Density and Nearly Zero Autocorrelation on 28-nm CMOS Platform

Tiancheng Gong†1, Qiao Hu†1, Danian Dong1, Haijun Jiang2, Jianguo Yang*1,2, Xiaoxin Xu*1, Xiaoming Chen3, Qing Luo1, Qi Liu4, Steve S. Chung5, Hangbing Lyu1, Ming Liu1

2021 IEEE International Electron Devices Meeting (IEDM) | 978-1-6654-2572-8/21/$31.00 ©2021 IEEE | DOI: 10.1109/IEDM19574.2021.9720588

1Key Laboratory of Microelectronics Devices and Integrated Technology, Institute of Microelectronics of Chinese Academy of Sciences, Beijing, China; 2Zhejiang Lab, Hangzhou, China; 3Institute of Computing Technology of Chinese Academy of Sciences, Beijing, China; 4Frontier Institute of Chip and System, Fudan University, China; 5National Yang Ming Chiao Tung University, Taiwan.
Email: yangjianguo@ime.ac.cn; xuxiaoxin@ime.ac.cn

Abstract: In this work, a 128kb stochastic computing (SC) chip based on RRAM flicker noise is demonstrated on a 28-nm HKMG CMOS platform for the first time. Flicker noise in RRAM is selected as the entropy source because its noise density is higher than that of conventional logic devices, including the advanced 14-nm FinFET. The reliability of the RRAM flicker noise is further improved by an optimized weak forming scheme at the array level, enabling accurate and stable SC applications. Moreover, a probability modulated true random number generator (PM-TRNG) circuit is proposed to generate stochastic bit-streams (SBS) with the desired probability and without correlation. The autocorrelation function (ACF) and NIST test results show that the correlation of the output SBS is nearly zero. Finally, the RRAM flicker noise based SC chip is applied to edge detection in an image processing system, where a low error rate (3.13% at a bit-stream length (BSL) of 128) is achieved.

I. INTRODUCTION

Arithmetic operations in deterministic computing, such as multiplication, division, addition and subtraction, usually consume large logic resources. In contrast, stochastic computing (SC) needs only a single simple logic unit to complete each of these operations, as shown in Fig. 1 [1]. Stochastic computing operates on stochastic bit-streams (SBS) generated by a stochastic number generator (SNG); it consumes fewer hardware resources and offers high error tolerance and no memory bottleneck [2]. Recently, robust RRAM-based SNGs have attracted great attention owing to their lower energy and simple structure [3-4]. However, because of the reliability and autocorrelation problems of the entropy source, no RRAM-based SC chip has been demonstrated.

For the first time, we fabricated a 128kb SC chip based on RRAM flicker noise on a 28-nm HKMG CMOS platform. A systematic comparison of flicker noise in 40-nm MOSFET, 28-nm MOSFET, 14-nm FinFET and RRAM devices shows that RRAM has the highest flicker noise density, which helps ensure the accuracy of SC applications. By modulating the noise density with an optimized operation scheme, highly reliable flicker noise is achieved, which can be used for stable SC applications. SBS with near-zero autocorrelation are generated by the probability modulated TRNG circuit. Using this SC chip, a low error rate (3.13% at 128 BSL) is achieved for an edge detection application.

II. SELECTION OF THE ENTROPY SOURCE

Noise in electron devices is widely adopted as an entropy source because of its randomness [7]. Fig. 2(a) shows the classification of noise in electron devices. Thermal noise has a spectral density too small to be observed readily and is easily affected by the testing environment. RTN, which is caused by a single trap, is difficult to activate and control.
Also, the noise density of RTN is relatively low. Among all types of noise, flicker noise, which is attributed to the combined influence of many local traps, can be easily measured and has the highest noise density. The flicker noise spectrum has a slope of -1 on a log-log plot, and the corresponding current fluctuation is shown in Fig. 2(b). To ensure the accuracy of SC, the flicker noise density (S/I²) should be as high as possible.

Flicker noise of MOSFET/FinFET devices is shown in Fig. 3. The normalized drain current noise (W×L×S/I²), extracted at a fixed frequency (f = 10 Hz), is plotted as a function of the device area (W×L) in Fig. 3(a) for 28-nm PolySiON MOSFET devices. On a log scale, the average normalized noise level is nearly constant with area for all devices, indicating that the 1/f noise density scales as the inverse of the device area. For MOSFET/FinFET devices at different process nodes (40-nm, 28-nm, 14-nm), the W×L×S/I² value remains the same at the same drain current (ID), as shown in Fig. 3(b). Thus, the maximum 1/f noise density is obtained with the smallest area. The W×L×S/I² value as a function of VG is shown in Fig. 3(c); it decreases as the gate bias VG increases.

Flicker noise of RRAM devices is shown in Fig. 4. The normalized noise does not change with the device area, as shown in Fig. 4(a). Fig. 4(b) shows the noise density as a function of the resistance: as the resistance of the RRAM device increases, the noise density also increases. The schematic mechanism of 1/f noise in the LRS and HRS is shown in Fig. 4(c). The maximum 1/f noise of RRAM (3.2 MΩ) and of FinFET (VG = Vth, 2 fins, 1 finger) are compared in Fig. 5(a). The maximum 1/f noise density of RRAM is much higher than that of FinFET. Fig. 5(b) shows the benchmark of noise in MOSFET/FinFET and RRAM. Based on these results, 1/f noise in the HRS of RRAM devices is well suited as the entropy source owing to its high noise density.
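The description of flicker noise as the combined effect of many local traps, and the area scaling of the noise density, can be illustrated numerically. The sketch below is an illustration under stated assumptions, not the authors' model: it assumes each trap contributes a Lorentzian spectrum with time constant tau, that the time constants are log-uniformly spaced (a common textbook assumption that the text does not state), and that the normalized noise K = W×L×S/I² is a constant with an arbitrary value. All function names are hypothetical.

```python
import math

def lorentzian(f, tau):
    """Spectrum (arbitrary units) of a single trap with time constant tau."""
    return tau / (1.0 + (2.0 * math.pi * f * tau) ** 2)

def summed_spectrum(f, taus):
    """Total spectrum of many independent traps (sum of Lorentzians)."""
    return sum(lorentzian(f, tau) for tau in taus)

def log_spaced(lo, hi, per_decade=10):
    """Time constants spaced uniformly on a log axis (assumption)."""
    n = int(round(math.log10(hi / lo) * per_decade))
    return [lo * 10 ** (k / per_decade) for k in range(n + 1)]

def loglog_slope(f1, f2, taus):
    """Slope of the spectrum between f1 and f2 on a log-log plot."""
    s1 = summed_spectrum(f1, taus)
    s2 = summed_spectrum(f2, taus)
    return (math.log10(s2) - math.log10(s1)) / (math.log10(f2) - math.log10(f1))

def noise_density(k_norm, width_um, length_um):
    """S/I^2 implied by a constant normalized noise K = W*L*S/I^2."""
    return k_norm / (width_um * length_um)

# Many traps with time constants from 10 us to 100 s: slope is close to -1.
many_traps_slope = loglog_slope(1.0, 100.0, log_spaced(1e-5, 1e2))

# One trap, evaluated far above its corner frequency: slope is close to -2.
single_trap_slope = loglog_slope(1e3, 1e5, [1e-2])

if __name__ == "__main__":
    print("many traps slope:", round(many_traps_slope, 3))
    print("single trap slope:", round(single_trap_slope, 3))
    # Halving both W and L raises S/I^2 by a factor of 4 for the same K.
    print(noise_density(1e-9, 0.05, 0.05) / noise_density(1e-9, 0.1, 0.1))
```

The single-trap case reproduces the steeper RTN-like roll-off, while the many-trap envelope gives the 1/f behavior; the last line shows why, for MOSFET/FinFET devices, the highest noise density is obtained at the smallest area.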
12.5.1 Authorized licensed use limited to: Zhejiang University. Downloaded on June 02,2022 at 08:46:18 UTC from IEEE Xplore. Restrictions apply. IEDM21-266

III. TEST CHIP MEASUREMENT RESULTS

A. Optimization of 1/f Noise in RRAM at the Array Level

Based on the above analysis, the 128kb SC test chip based on RRAM flicker noise is implemented on the SMIC 28-nm HKMG CMOS platform. The die micrograph is shown in Fig. 6(b). The cross-section of the RRAM cell in the array, and a zoom-in view of the cell structure with the EDX line scan across a single cell, are shown in Fig. 6(c). Fig. 6(a) shows the testing setup, which includes a microcontroller, a pulse and bias generator, and an FPGA with a level-shift interface circuit.

Fig. 7(a) shows the distributions of LRS and HRS in a 10kb mini-array. The 1/f noise of the HRS is used as the entropy source owing to its high noise density. However, the noise density of the HRS decreases with retention time, as shown in Fig. 7(b), which reduces the accuracy of long-term SC applications. Given the relationship between resistance and noise density, the decrease in noise density can be attributed to a reduction in resistance.

To solve this problem, we propose a weak forming scheme that suppresses the drift of HRS resistance toward lower values. Fig. 8(a) shows the operation procedure. Different VWL values are applied during the forming process for three mini-arrays, and then all devices are reset to the same HRS. In the subsequent retention measurement, the memory cells that underwent strong forming tend to drift toward lower resistance values, as shown in Fig. 8(b). This phenomenon can be explained by the percolation model [8], as shown in Figs. 8(c)(d). As VWL increases, the conductive filament becomes larger, leaving a larger cross-sectional area of residual filament after reset.
In this case, percolation paths are more likely to form by connecting the oxygen vacancies (VO) in the gap region, resulting in lower resistance.

B. The Probability Modulated TRNG Circuit for Independent SBS

A schematic of the PM-TRNG circuit is shown in Fig. 9(a). The noise of the selected RRAM cell is amplified. A passive band-pass filter removes noise at undesired frequencies and sets a common DC bias (Vbias) for the noise signal. The probability of a bit '1' at the comparator output is modulated by the bias voltage Vbias, which is controlled by the DAC circuit. An SBS with the desired probability of logic '1' is thus obtained from the value held in a register. Vref_cmp is initially chosen to be equal to Vbias and then remains unchanged. Owing to the high-density flicker noise from the RRAM, only a single amplifier and a single comparator are needed in this circuit, which greatly decreases the circuit area.

Figs. 9(b)(c)(d) show the measured time-varying SBS value for specific input voltages: the output is stochastic when the input change is 0 mV and becomes deterministic when the input changes by about ±60 mV. Fig. 10(a) shows the measured output probability of '1' in the random data (120k samples) versus the DC bias voltage of the noise signal. The granularity of the output probability of '1' is set by the resolution of the D/A converter. For a desired probability of 0.5, the randomness of the sequence was validated by the autocorrelation function (ACF) and NIST tests. The ACF result in Fig. 10(b) shows nearly zero correlation, verifying that the flicker noise in RRAM behaves as an independent variable. The proposed PM-TRNG passed all NIST randomness tests, as summarized in Table 1.

C. Error-Tolerant Stochastic Computing for Edge Detection Application

The proposed flicker noise SC chip can be used for edge detection in an image processing system based on the Roberts cross algorithm, as shown in Fig. 11.
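As a concrete reference for the Roberts cross operator on one 2×2 pixel window, the sketch below compares a binary (software) evaluation with a stochastic one built from two XOR gates and one MUX, the construction from [9] that the text goes on to describe. It is a minimal model under stated assumptions, not the chip's implementation: each TRNG is modeled as an ideal uniform random source compared against the pixel value, which stands in for the RRAM noise, comparator and DAC chain; the three sources mirror the three TRNGs drawn in Fig. 11; pixel values are assumed to be normalized to [0, 1]. All function names are hypothetical.

```python
import random

def roberts_binary(a, b, c, d):
    """Binary reference for one 2x2 window, with
    a = r[i][j], b = r[i][j+1], c = r[i+1][j], d = r[i+1][j+1] in [0, 1]."""
    return 0.5 * (abs(a - d) + abs(b - c))

def correlated_pair(p, q, length, rng):
    """Two bit-streams driven by ONE shared random source, so they are
    maximally correlated (model assumption for a shared TRNG)."""
    xs, ys = [], []
    for _ in range(length):
        u = rng.random()              # one shared sample per clock cycle
        xs.append(1 if u < p else 0)
        ys.append(1 if u < q else 0)
    return xs, ys

def roberts_stochastic(a, b, c, d, length, seed=0):
    """Stochastic estimate using two XOR gates and one MUX."""
    # Three separately seeded sources stand in for TRNG1, TRNG2 and TRNG3.
    rng1, rng2, rng3 = (random.Random(seed + k) for k in range(3))
    x1, y1 = correlated_pair(a, d, length, rng1)
    x2, y2 = correlated_pair(b, c, length, rng3)
    ones = 0
    for k in range(length):
        diag = x1[k] ^ y1[k]          # XOR of correlated streams: P(1) = |a - d|
        anti = x2[k] ^ y2[k]          # P(1) = |b - c|
        select = rng2.random() < 0.5  # select stream with probability 0.5
        ones += diag if select else anti  # MUX performs scaled addition
    return ones / length

if __name__ == "__main__":
    window = (0.9, 0.2, 0.4, 0.1)
    print("binary    :", roberts_binary(*window))
    for bsl in (8, 128, 4096):
        print("BSL", bsl, ":", roberts_stochastic(*window, length=bsl))
```

Because both inputs of each XOR gate share one random source, five bit-streams are produced from three sources, which is where the saving of two random number generators comes from. Longer bit-streams reduce the estimation error at the cost of computing time.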
By using the correlation between two SBSs [9], one XOR gate can compute the absolute value of a subtraction, so a Roberts operator can be realized with only two XOR gates and one MUX. This requires the SBSs used by the three logic units to be uncorrelated with one another, while the two input bit-streams of each XOR gate are correlated. In this way, one Roberts operator saves two random number generators, which greatly reduces hardware resources in massively parallel detection.

In Fig. 12, the Cameraman image with a size of 320×320 (P×Q) pixels is used for Roberts cross edge detection at different bit-stream lengths (BSLs) and noise levels (the percentage of SBS bits flipped by external disturbance). Fig. 13 shows the measured mean absolute error (MAE) for different BSLs and noise levels, relative to software results that compute the Roberts operator in binary. The MAE decreases with increasing BSL (and therefore increasing computing time), which allows a trade-off between BSL (computing time) and accuracy according to the application. For example, the BSL can be reduced for real-time processing when only the rough contour of an object needs to be located initially. For medical imaging applications, which require high recognition accuracy, the BSL can be increased to improve the detection accuracy. Meanwhile, Figs. 12(f) and 13(b) show that a good detection result is still obtained at a noise level of 5%, which demonstrates excellent error tolerance.

IV. CONCLUSION

In summary, a 128kb SC chip based on RRAM flicker noise with high noise density and near-zero autocorrelation is demonstrated on a 28-nm CMOS platform for accurate and stable SC applications. Table 2 summarizes this work and compares it with state-of-the-art SNGs.
The presented test chip reduces the area by 100× and the power by 770× compared with traditional CMOS implementations, which shows its potential for deployment in error-tolerant and energy-efficient SC designs that mitigate the bottlenecks of deterministic computing.

ACKNOWLEDGMENTS

This work was supported in part by the MOST of China under Grants 2019YFB2205100 and 2018YFA0701500, in part by the National Natural Science Foundation of China under Grants 61904200, 62025406 and 61834009, and in part by the Strategic Priority Research Program of the Chinese Academy of Sciences under Grant No. XDB44000000. T. Gong and Q. Hu contributed equally to this work.

REFERENCES

[1] B. R. Gaines, "Stochastic computing systems," Advances in Information Systems Science, Springer, pp. 37-172, 1969.
[2] Y. Zhao, et al., "A Physics-based Model of RRAM Probabilistic Switching for Generating Stable and Accurate Stochastic Bit-streams," IEDM Tech., pp. 32.4.1-32.4.1, 2019.
[3] P. Knag, et al., "RRAM solutions for stochastic computing," Stochastic Computing: Techniques and Applications, Springer, pp. 153-164, 2019.
[4] D. Ielmini, et al., "In-memory computing with resistive switching devices," Nature Electronics, vol. 1, no. 6, pp. 333-343, 2018.
[5] J. Hu, et al., "Spin-Hall-Effect-Based Stochastic Number Generator for Parallel Stochastic Computing," IEEE TED, vol. 66, no. 8, pp. 3620-3627, 2019.
[6] H. Ichihara, et al., "Compact and accurate stochastic circuits with shared random number sources," ICCD, pp. 361-366, 2014.
[7] A. A. Balandin, "Low-frequency 1/f noise in graphene devices," Nature Nano., vol. 8, no. 8, pp. 549-555, 2013.
[8] T. Ninomiya, et al., "Conductive filament scaling of TaOx bipolar ReRAM for long retention with low current operation," VLSI, pp. 73-74, 2012.
[9] A. Alaghi, et al., "Stochastic circuits for real-time image-processing applications," DAC, pp. 1-6, 2013.

FIGURES AND TABLES

Fig. 1. Stochastic computing uses SBS generated by an SNG to perform arithmetic operations, consumes fewer hardware resources, and has the advantages of high error tolerance and no memory bottleneck. Motivation for an RRAM-based SNG: it has lower energy and lower structural complexity. [Panels: arithmetic operations based on deterministic computing versus stochastic computing; comparison of energy (pJ/bit) and structural complexity of different SNG implementations.]

Fig. 2. (a) Classification of different noise types in electron devices. Compared with thermal noise and RTN, flicker noise, which is the influence of many local traps, has high noise density and can be easily measured. (b) Current fluctuation caused by 1/f noise has a positive correlation with noise density. [Panel labels: single Lorentzian, ~1/f²; envelope of multiple Lorentzians, ~1/f; corner frequency (fc).]

Fig. 3. Flicker noise in MOSFET. (a) 1/f noise density (S/I²) at f = 10 Hz is inversely proportional to the device area for 28-nm MOSFET. (b) For different logic devices, the W×L×S/I² value remains constant at Vth, indicating that a higher 1/f noise density (S/I²) can be obtained by scaling the device area (W×L). (c) For FinFETs with the same area, 1/f noise decreases as VG increases.

Fig. 4. Flicker (1/f) noise in RRAM. (a) The normalized noise level (S/I²) is independent of the size of the RRAM cell. (b) Measured S/I² at 10 Hz as a function of the resistance. The 1/f noise level of RRAM increases linearly with R. (c) The mechanism of LRS and HRS 1/f noise.

Fig. 5. (a) Comparison of the maximum 1/f noise of logic devices and RRAM. (b) Benchmark of noise in MOSFET/FinFET and RRAM. 1/f noise in RRAM is more suitable as the entropy source.

Fig. 6. (a) Testing setup for the 28-nm RRAM based stochastic computing system. (b) Die micrograph of the 128kb RRAM test chip in the 28-nm HKMG CMOS process. (c) Cross-section of the RRAM cell in the array and zoom-in view of the cell structure with the EDX line scan across a single cell. [Panel label: TiN/TiON/Ta2O5/TiN.]

Fig. 7. (a) Distributions of LRS and HRS in the 10kb array. 1/f noise of the HRS is used as the entropy source owing to its high noise density. (b) Retention test of 1/f noise in the HRS. The noise density decreases with time, which reduces the accuracy of long-term SC applications. The decrease of noise density with time can be attributed to resistance reduction.

Fig. 8. (a) The testing procedure: forming with different VWL (1.2 V, 1.5 V, 1.8 V), RESET to HRS, READ HRS, bake, READ HRS; 10kb cells per condition. (b) The weak forming scheme suppresses the drift of HRS resistance toward lower values. Inset: detailed view of the resistance distribution below 120 KΩ. Simulated filament morphology in the HRS for different forming conditions: (c) VWL = 1.8 V and (d) VWL = 1.2 V. The probability of percolation path formation increases with a larger cross-sectional area of the residual filament.

Fig. 9. (a) Schematic illustration of the probability modulated TRNG circuit. (b, c, d) The probability of '1' in the SBS for three different Vbias values. The SBS value is stochastic when the input change is 0 mV and becomes deterministic when the input changes by about ±60 mV.

Fig. 10. (a) The output probability of '1' in the random data versus the change of DC bias voltage. (b) ACF results at p = 0.5, which verify that the flicker noise in RRAM is an independent variable. [Panel label: white noise ACF with 95% confidence.]

Table 1. NIST results at Vbias = 0. The PM-TRNG passed all NIST randomness tests.
Tests listed (result PASS for each): Frequency, Block Frequency, Cumulative Sums-1, Cumulative Sums-2, Runs, Longest Runs of Ones, FFT, Rank, Universal Statistical, Approximate Entropy, Non-Overlapping Template, Overlapping Template, Random Excursions, Random Excursions Var., Linear Complexity.
P-values as extracted (the assignment of values to tests is not recoverable from the source): 0.207, 0.633, 0.469, 0.426, 0.827, 0.576, 0.343, 0.611, 0.491, 0.357, 0.438, 0.715.

Fig. 11. Implementation of the hardware-saving edge detection circuit based on stochastic computing. The Roberts operator can be realized with only two XOR gates and one MUX, in which the two input bit-streams of each XOR gate are correlated. [Panel labels: RRAM based TRNG1, TRNG2 (r = 0.5) and TRNG3; modulation circuits; inputs r(i,j), r(i+1,j+1), r(i,j+1), r(i+1,j); output Z(i,j).]

Fig. 12. The edge detection result. (a) Original image. (b) Software (binary) accurate output image. (c) BSL = 8 and (d) BSL = 128: output images with different BSLs. (e) Noise = 20% at 256 bits and (f) Noise = 5% at 256 bits: output images with different noise levels. The MAE decreases with increasing BSL, and a good detection result is still obtained at a noise level of 5%, which shows excellent error tolerance.

Fig. 13. (a) MAE (mean absolute error) decreases with increasing BSL. Data labels: 7.69%, 6.73%, 5.52%, 4.49%, 3.70%, 3.13%, 2.76% for BSL = 4, 8, 16, 32, 64, 128, 256. (b) MAE for different injected noise ratios at BSL = 256. Data labels range from 5.89% to 43.32% over noise levels of 5% to 40%.

Table 2. Comparison of this work with other SNGs.

                                  This work             IEDM'19 [2]                   TED'19 [5]                    ICCD'14 [6]
Technology                        28 nm                 130 nm                        45 nm                         0.35 μm
Stochastic Source                 RRAM Flicker Noise    RRAM Switching                MTJ                           CMOS FSR
P-bit Capacity                    128Kb Test Chip       Single Cell with Simulation   Single Cell with Simulation   8 bits
NIST Verified (@ p=0.5)           Passed All            N/A                           N/A                           N/A
SNG Area                          1×                    1×                            10×                           >100×
SNG Power                         1×                    1000×                         10×                           >770×
SNG Delay                         1×                    2×                            0.4×                          5×
Edge Detection Error Rate
(@128 bits SBS)                   3.13%                 15%                           4%                            21%