Journal of Semiconductors, Volume 45, Issue 1, 012301 (2024)
Optimized operation scheme of flash-memory-based neural network online training with ultra-high endurance
With the rapid development of machine learning, the demand for highly efficient computing has become increasingly urgent. To break the bottleneck of the traditional von Neumann architecture, computing-in-memory (CIM) has attracted increasing attention in recent years. In this work, to provide a feasible CIM solution for large-scale neural networks (NNs) that require continuous weight updating during online training, a flash-based CIM with high endurance (10⁹ cycles) and ultra-fast programming speed is investigated. On the one hand, the proposed programming scheme of channel hot electron injection (CHEI) and hot hole injection (HHI) demonstrates high linearity and symmetric potentiation and depression processes, which helps to improve the training speed and accuracy. On the other hand, the low-damage programming scheme and memory window (MW) optimizations effectively suppress cell degradation while improving computing accuracy. Even after 10⁹ cycles, the leakage current (Ioff) of the cells remains below 10 pA, ensuring the large-scale computing capability of the memory. Further read-disturb characterizations demonstrate its robust reliability. On CIFAR-10 tasks, ~90% accuracy is achieved after 10⁹ cycles in both ResNet50 and VGG16 NNs. Our results suggest that flash-based CIM has great potential to overcome the limitations of traditional von Neumann architectures and enable high-performance NN online training, paving the way for further development of artificial intelligence (AI) accelerators.
1. Introduction
To address the energy consumption and latency caused by frequent data shuttling, computing-in-memory (CIM) applications in neural networks (NNs) have attracted much attention. Previous works have demonstrated CIM-based NN inference[
2. Background
2.1. Flash-based CIM architecture
Figure 1.(Color online) Schematic of the flash-based CIM architecture. The pulse time of Vg and the threshold voltage are mapped as the input vector and the weight matrix, respectively, so that the amount of accumulated charge represents the result of the matrix-vector multiplication (MVM).
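To make the mapping concrete, the following is a minimal sketch of the charge-domain MVM described in Fig. 1. The linear cell-current model and all parameter values are illustrative assumptions, not measured characteristics of the 55-nm array.

```python
import numpy as np

# Sketch of the charge-domain MVM: weights are stored as threshold voltages,
# inputs are applied as WL pulse widths, and each bit line accumulates charge
# Q_i = sum_j I(Vth_ij) * t_j. The linear I(Vth) model below is an assumption.

G_MAX = 1e-6      # assumed maximum read current of a fully erased cell (A)

def cell_current(vth, vth_min=1.0, vth_max=3.0):
    """Map a cell threshold voltage to a read current (higher Vth -> lower current)."""
    level = (vth_max - vth) / (vth_max - vth_min)   # normalize to [0, 1]
    return G_MAX * np.clip(level, 0.0, 1.0)

def charge_domain_mvm(vth_matrix, pulse_times):
    """Accumulate Q_i = sum_j I(Vth_ij) * t_j on each bit line (charge in C)."""
    currents = cell_current(vth_matrix)             # shape (rows, cols)
    return currents @ pulse_times

# Example: a 4x3 weight matrix (as Vth values) times a 3-element input vector
# (as WL pulse widths in seconds, on the 10 ns scale used in this work).
vth = np.random.uniform(1.0, 3.0, size=(4, 3))
t_pulse = np.array([10e-9, 20e-9, 5e-9])
print(charge_domain_mvm(vth, t_pulse))
```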
2.2. Flash operation by CHEI and HHI
The CHEI programming scheme and the HHI erasing scheme are adopted in this work. The separate word lines (WLs) and bit lines (BLs) of NOR flash allow single-cell selectivity and individual programming operations with CHEI. Unlike the traditional Fowler-Nordheim (FN) tunneling operation, HHI can also tune cells individually, benefiting from the independent WL and BL[
Figure 2.(Color online) (a) Schematic of the adopted CHEI and HHI programming scheme. (b) Energy band diagrams of the CHEI and HHI programming schemes.
2.3. ResNet50 and VGG16 neural networks
We choose the representative ResNet50 and VGG16 neural networks to test the performance of the proposed online training architecture. The system frame diagrams of the two architectures are shown in Fig. 3.
Figure 3.(Color online) The architectures of the (a) ResNet50 and (b) VGG16 convolutional neural networks.
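For reference, below is a minimal sketch of instantiating these two networks for CIFAR-10 (10 classes). The paper does not specify its software stack, so the use of torchvision and the final-layer adaptation shown here are assumptions for illustration only.

```python
import torch
import torch.nn as nn
from torchvision import models

# Sketch: build the two evaluated CNNs with a 10-class output head for
# CIFAR-10. Library choice and head replacement are assumptions, not the
# paper's stated implementation.

def build_cifar10_model(arch="resnet50"):
    if arch == "resnet50":
        model = models.resnet50(weights=None)
        model.fc = nn.Linear(model.fc.in_features, 10)   # 10 CIFAR-10 classes
    elif arch == "vgg16":
        model = models.vgg16(weights=None)
        model.classifier[-1] = nn.Linear(model.classifier[-1].in_features, 10)
    else:
        raise ValueError(arch)
    return model

# Example: a forward pass on a dummy CIFAR-10-sized batch (3x32x32 images).
x = torch.randn(8, 3, 32, 32)
for arch in ("resnet50", "vgg16"):
    logits = build_cifar10_model(arch)(x)
    print(arch, logits.shape)   # torch.Size([8, 10])
```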
3. Reliability analysis and discussion
A 55-nm NOR flash array is used to construct the CIM matrix for large-scale NN online training, wherein the CHEI and the HHI[
Figure 4.(Color online) (a) The proposed scheme to improve both endurance and speed by optimizing the operation scheme for NN online training. (b) Comparison of the Vth tuning speed of FN tunneling and HHI. (c) The highly linear and symmetric potentiation and depression processes using the combined CHEI–HHI method.
Unlike the FN tunneling operation, which requires a 7.7 V substrate voltage and a −8 to −9.5 V gate voltage, HHI operates with a 0 V substrate voltage, which makes the peripheral circuit design much simpler. More importantly, pulses as short as 10 ns can be adopted for Vth tuning. As a result, HHI achieves roughly 10⁴ times faster erasing than FN tunneling (a 10 ns HHI pulse versus the ~100 μs scale implied for an equivalent FN operation), as shown in Fig. 4(b).
In addition to the adjusted programming and erasing schemes, the trade-off between the MW and endurance is investigated in this work for CIM applications, especially for NN online training. Impressively, by adopting the combined CHEI–HHI programming scheme and lowering the MW, the flash cells achieve record-high endurance, exceeding 10⁹ cycles with a 0.2−0.5 V MW, which is sufficient for 1-bit/cell and even 2-bit/cell operation in CIM applications. This trade-off is shown in Fig. 5(b).
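As a concrete illustration of how a small MW can still encode multi-bit weights, here is a minimal sketch of 2-bit/cell quantization inside a 0.4 V window. The evenly spaced Vth levels and all parameter values are assumptions for illustration.

```python
import numpy as np

# Sketch: map real-valued NN weights onto 4 evenly spaced Vth levels
# (2 bits/cell) inside a small memory window, as the 0.2-0.5 V MW results
# above suggest is feasible. Base Vth and window size are assumptions.

VTH_BASE = 1.0          # assumed erased-state threshold voltage (V)
MW = 0.4                # assumed memory window (V)
LEVELS = 4              # 2 bits/cell -> 4 Vth levels

def quantize_weights_to_vth(weights):
    """Map weights in [-1, 1] to one of 4 Vth levels spanning the MW."""
    w = np.clip(weights, -1.0, 1.0)
    codes = np.round((w + 1.0) / 2.0 * (LEVELS - 1)).astype(int)  # 0..3
    return VTH_BASE + codes / (LEVELS - 1) * MW                   # V

w = np.array([-1.0, -0.2, 0.3, 0.9])
print(quantize_weights_to_vth(w))   # [1.0, 1.133..., 1.266..., 1.4]
```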
After cycling, the subthreshold swing (SS) and the leakage current (Ioff) are also tested, since on/off-ratio degradation would deteriorate the computing accuracy. Although both traditional FN tunneling and HHI increase the SS value, the SS increment is actually conducive to precise device programming and thus more precise weight updating in CIM applications, because larger SS values result in larger memory windows between adjacent programming states. Therefore, only the degradation of Ioff significantly impacts the computational performance, through the deteriorated on/off ratio. However, unlike FN tunneling erasing, which causes serious Ioff degradation, HHI erasing keeps Ioff suppressed to sub-10 pA even after 10⁹ cycles (Fig. 5(a)).
This can be observed from the statistical data of SS and Ioff in Figs. 5(c) and 5(d).
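To see why the sub-10 pA bound matters at array scale, consider a back-of-the-envelope sketch: on a bit line shared by many unselected cells, the leakage currents sum and compete with the selected cell's read current. All values below are assumptions for illustration.

```python
# Sketch: accumulated bit-line leakage from unselected cells versus the
# read current of one selected on-cell. Array size and on-current are
# assumed values, not figures from the paper.

N_CELLS_PER_BL = 1024      # assumed number of cells sharing one bit line
I_OFF = 10e-12             # leakage per unselected cell (A), sub-10 pA bound
I_ON = 1e-6                # assumed read current of a selected on-cell (A)

total_leakage = (N_CELLS_PER_BL - 1) * I_OFF
print(f"accumulated leakage: {total_leakage * 1e9:.2f} nA")   # ~10.23 nA
print(f"signal/leakage ratio: {I_ON / total_leakage:.0f}")    # ~98
```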
4. Performance in neural network
The comparison between different programming schemes is then analyzed. Aiming at CIM applications, the test results of the 55 nm flash memory are summarized in Fig. 6(a).
Figure 5.(Color online) (a) The I–V curves of the programmed/erased states before and after 10⁹ cycles. (b) Enhancement of endurance at lower MW, showing the trade-off between MW and endurance. (c) SS value and (d) Ioff for different MWs and cycle counts, compared with the traditional programming scheme, wherein each box contains 15 different memory cells.
Figure 6.(Color online) (a) Comparison between the proposed scheme and the traditional scheme. (b) Read disturbance of different states after 10⁹ cycles. (c) Applications to CIFAR-10 using ResNet50 and VGG16. Even after 10⁹ cycles, ~90% accuracy can be achieved on the CIFAR-10 task.
The read disturb (RD) characteristic is then tested to evidence the robust reliability of the flash cells. Well-controlled RD lasting for 1 ks can be observed in Fig. 6(b).
To further evaluate the performance of the compact flash-based NN system, a chip tester is designed to characterize the flash-based CIM; it supports fast programming as well as matrix-vector multiplication (MVM) operations. To demonstrate the feasibility of flash CIM, the standard ResNet50 and VGG16 convolutional neural networks (CNNs) are implemented for image classification on the CIFAR-10 dataset with 10 object classes, as shown in Fig. 6(c).
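As an illustration of the software side of such an evaluation, the sketch below scores a trained CIFAR-10 model, as one would after mapping its weights onto the flash array (using, e.g., the hypothetical build_cifar10_model from the earlier sketch). The data pipeline is an assumption; the chip-tester interface itself is not modeled here.

```python
import torch
from torchvision import datasets, transforms

# Sketch of the accuracy-evaluation flow only. The dataset pipeline and
# batch size are assumptions; hardware MVM execution is not modeled.

def evaluate(model, device="cpu", batch_size=256):
    tfm = transforms.Compose([transforms.ToTensor()])
    test_set = datasets.CIFAR10(root="./data", train=False,
                                download=True, transform=tfm)
    loader = torch.utils.data.DataLoader(test_set, batch_size=batch_size)
    model.eval().to(device)
    correct = total = 0
    with torch.no_grad():
        for x, y in loader:
            pred = model(x.to(device)).argmax(dim=1)
            correct += (pred == y.to(device)).sum().item()
            total += y.numel()
    return correct / total   # fraction of the 10 000 test images classified correctly

# Usage: accuracy = evaluate(build_cifar10_model("resnet50"))
```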
5. Conclusion
This work shows the potential of flash-based computing-in-memory (CIM) devices to achieve high endurance (10⁹ cycles) and ultra-fast programming speed (10 ns) through the combined CHEI–HHI programming scheme and MW optimizations. Utilizing this optimized operation scheme, a compact flash-based NN online training CIM system is proposed. Our results demonstrate that even after 10⁹ cycles, a high accuracy of approximately 90% can be attained on CIFAR-10 tasks. Further read-disturb characterizations evidence robust reliability, highlighting the scheme's significant potential for online training of real NN tasks. This work provides a comprehensive assessment of a flash-based online training network, with implications for the advancement of AI accelerators.
[2] W S Khwa, K Akarvardar, Y S Chen et al. MLC PCM techniques to improve neural network inference retention time by 10⁵× and reduce accuracy degradation by 10.8×. Proc IEEE Symp VLSI Technol, 1 (2020).
[3] W Y Zhang, S C Wang, Y Li et al. Few-shot graph learning with robust and energy-efficient memory-augmented graph neural network (MAGNN) based on homogeneous computing-in-memory. 2022 IEEE Symposium on VLSI Technology and Circuits (VLSI Technology and Circuits), 224 (2022).
[8] T Ravsher, D Garbin, A Fantini et al. Enhanced performance and low-power capability of SiGeAsSe-GeSbTe 1S1R phase-change memory operated in bipolar mode. 2022 IEEE Symposium on VLSI Technology and Circuits (VLSI Technology and Circuits), 312 (2022).
[13] W Choi, M Kwak, S Heo et al. Hardware neural network using hybrid synapses via transfer learning: WOx nano-resistors and TiOx RRAM synapse for energy-efficient edge-AI sensor. 2021 IEEE International Electron Devices Meeting (IEDM), 23.1.1 (2021).
Yang Feng, Zhaohui Sun, Yueran Qi, Xuepeng Zhan, Junyu Zhang, Jing Liu, Masaharu Kobayashi, Jixuan Wu, Jiezhi Chen. Optimized operation scheme of flash-memory-based neural network online training with ultra-high endurance[J]. Journal of Semiconductors, 2024, 45(1): 012301
Category: Articles
Received: Jul. 14, 2023
Accepted: --
Published Online: Mar. 13, 2024
The Author Email: Wu Jixuan (JXWu), Chen Jiezhi (JZChen)