Chinese Journal of Lasers, Vol. 51, Issue 23, 2311003 (2024)
Quantitative Prediction of Heavy Metal Elements in white peony root Using Laser‐Induced Breakdown Spectroscopy and Semi‐Supervised Sequential Learning
With the rapid development of traditional Chinese medicine (TCM) and increasing attention paid to dietary therapy and health preservation, the market for Chinese patent medicines, medicinal materials, and medicated diets is expanding rapidly. Consequently, there is a growing demand for TCM raw materials, with concerns about the quality and heavy metal contamination of TCM materials attracting widespread attention. Although China has intensified its supervision of heavy metal content in commonly sold TCM raw materials and TCM decoction pieces, issues related to excessive heavy metal content remain, especially in rhizome and root TCM materials, which are more prone to adsorbing heavy metal elements from the environment. Therefore, real-time accurate monitoring of heavy metal concentration, to abide by TCM material market regulations, is of great significance in mitigating pollution risk and enhancing the safety of TCM materials.
Laser-induced breakdown spectroscopy (LIBS), which is capable of real-time, rapid, and simultaneous detection of multiple components, has emerged as a metal detection technology and is considered one of the most promising analytical methods in the field of TCM material quality monitoring. However, in practice, regional differences and inherent genetic characteristics of TCM materials make it difficult to obtain samples with a comparable matrix for calibration, complicating the accurate quantification of the elements to be tested and placing higher demands on the processing of LIBS spectral data. In recent years, deep learning methods, with their powerful capacity for representation learning and for discovering intrinsic patterns in high-dimensional data, have become a research hotspot in LIBS quantitative analysis. Successful deep learning, however, presupposes model training on a large amount of labeled data. In LIBS spectral detection and quantitative analysis of TCM materials, the complexity of the TCM material matrix complicates the collection and accurate calibration of such labeled data, so the data available for training quantitative analysis models are insufficient. In contrast, obtaining unlabeled LIBS spectral data from TCM materials is relatively easy. Thus, methods that use unlabeled data to improve the performance of deep learning are urgently required.
In TCM, white peony root (WPR) is a common medicinal and dietary rhizome that can be contaminated with heavy metals during its growth, processing, storage, and sale. This study proposes a semi-supervised deep learning framework for predicting the content of two heavy metal elements, Pb and Cd, based on WPR LIBS data. The framework includes a sequence modeling module based on multilayer multichannel causal convolution, which transforms one-dimensional discrete LIBS data into a two-dimensional dense embedding matrix. Additionally, a feature extraction module based on multiresolution one-dimensional sequential convolution is designed to extract semantic features for Pb and Cd content prediction. An autoencoder module reconstructs the LIBS data, guiding the network to learn the topological information of the original data and thereby making full use of the unlabeled data. This framework addresses the poor prediction performance for heavy metal content in WPR that results from the high cost of obtaining labeled data in practical applications.
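The idea of turning a one-dimensional spectrum into a two-dimensional embedding matrix with causal convolution can be illustrated with a minimal sketch. It assumes a single layer with fixed, hand-picked kernels; the function names, kernel values, and toy spectrum are hypothetical and are not taken from the paper, whose module is multilayer and learns its kernels during training.

```python
def causal_conv1d(signal, kernel):
    """Causal 1-D convolution: output[t] depends only on signal[t] and earlier samples.

    The input is left-padded with zeros so the output has the same length as the input.
    """
    k = len(kernel)
    padded = [0.0] * (k - 1) + list(signal)
    # kernel[0] weights the current sample, kernel[j] the sample j steps back.
    return [
        sum(kernel[j] * padded[t + k - 1 - j] for j in range(k))
        for t in range(len(signal))
    ]


def embed_spectrum(spectrum, kernels):
    """Stack one causal convolution per channel into a (channels x length) matrix."""
    return [causal_conv1d(spectrum, kernel) for kernel in kernels]


# Toy spectrum (one intensity per wavelength bin) and two hypothetical channel kernels.
spectrum = [1.0, 2.0, 3.0, 4.0]
kernels = [[1.0, 0.0], [0.5, 0.5]]
embedding = embed_spectrum(spectrum, kernels)
print(embedding)
```

Each row of `embedding` is one channel, so a spectrum of length N with C channels yields a C x N matrix. Because of the left padding, changing a later bin never alters the output at an earlier bin.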
This study proposes a semi-supervised sequential learning framework for predicting the Pb and Cd contents in WPR based on LIBS data. A parameter-shared dual-branch network is employed to model labeled and unlabeled WPR LIBS data in a unified network using end-to-end training. First, discrete LIBS data were transformed into dense embedding matrices through multilayer multichannel causal convolutions for sequential modeling. Then, a multiresolution one-dimensional sequential convolution module was utilized to extract semantic feature representations for labeled and unlabeled WPR LIBS data from the dense embedding matrices. The parameters of both the multilayer multichannel causal convolutions and the multiresolution one-dimensional sequential convolution module were shared between the labeled and unlabeled branches. Subsequently, the semantic feature representations of the labeled WPR LIBS data were fed into a prediction module composed of a deep neural network (DNN) for regression prediction of Pb and Cd contents. To fully utilize the unlabeled data in guiding the training of the feature extraction module, an autoencoder module reconstructed the original discrete LIBS spectra from the features of both labeled and unlabeled samples.
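One way to read the dual-branch training objective is as a regression loss computed on labeled samples only, plus a reconstruction loss computed on all samples. The sketch below is an assumption for illustration: the abstract does not state the loss functions or their weighting, so the use of mean squared error and the `recon_weight` parameter are hypothetical.

```python
def mse(pred, target):
    """Mean squared error between two equal-length sequences."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def semi_supervised_loss(pred_contents, true_contents,
                         reconstructions, spectra, recon_weight=1.0):
    """Regression loss on labeled samples plus reconstruction loss on all samples.

    pred_contents / true_contents: per-labeled-sample [Pb, Cd] values.
    reconstructions / spectra: per-sample spectra, labeled and unlabeled together.
    recon_weight: hypothetical balance factor between the two terms.
    """
    regression = sum(
        mse(p, t) for p, t in zip(pred_contents, true_contents)
    ) / len(pred_contents)
    reconstruction = sum(
        mse(r, s) for r, s in zip(reconstructions, spectra)
    ) / len(spectra)
    return regression + recon_weight * reconstruction


# One labeled sample (Pb, Cd) and two spectra (one labeled, one unlabeled).
loss = semi_supervised_loss(
    pred_contents=[[1.0, 2.0]],
    true_contents=[[1.0, 4.0]],
    reconstructions=[[1.0, 2.0], [0.0, 0.0]],
    spectra=[[1.0, 2.0], [2.0, 0.0]],
    recon_weight=0.5,
)
print(loss)
```

Because the reconstruction term needs no labels, unlabeled spectra still contribute gradients to the shared feature extractor, which is the mechanism the paragraph above describes.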
The impact of different optimizers on model training was compared. The proposed multiresolution one-dimensional sequential convolution, combined with the Adam optimizer, was then compared with four commonly used deep learning feature extraction models for sequence data, namely DNN, long short-term memory (LSTM), bidirectional LSTM (BiLSTM), and Transformer, and the effectiveness of the semi-supervised learning strategy was examined. The results indicate that, compared with the other feature extraction methods, the proposed multiresolution one-dimensional sequential convolution combined with the Adam optimizer converges faster, achieves a loss convergence value closer to 0, and exhibits smaller prediction errors on the test set (Fig.6, Table 2). After incorporating semi-supervised learning with unlabeled data, the average relative errors in predicting the Pb and Cd contents on the test set decreased to 4.12% and 3.32%, respectively (Table 3). The reconstructed spectra closely reproduced the originals, with an average relative error of 0.22% between the original and reconstructed spectral data (Fig.11). The integration of LIBS technology with semi-supervised learning effectively alleviated the dependency of deep learning models on labeled data during training, reducing the cost and difficulty of heavy metal detection in WPR using LIBS and thereby providing a more effective method for the quantitative analysis of complex elemental compositions in traditional Chinese medicinal materials.
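The average relative errors quoted above can be made concrete with a short sketch. It assumes the common definition, the mean of |predicted - reference| / |reference| expressed as a percentage; the abstract does not spell out its formula, and the numbers below are made up for illustration rather than taken from the paper.

```python
def average_relative_error(predicted, reference):
    """Mean of |predicted - reference| / |reference|, returned as a percentage."""
    errors = [abs(p - r) / abs(r) for p, r in zip(predicted, reference)]
    return 100.0 * sum(errors) / len(errors)


# Hypothetical predicted and reference contents for two test samples.
predicted = [12.0, 15.0]
reference = [10.0, 20.0]
are_percent = average_relative_error(predicted, reference)
print(are_percent)
```

Here the per-sample relative errors are 20% and 25%, so the average is 22.5%. The same function applies to spectrum reconstruction if the predicted and reference lists hold reconstructed and original intensities, provided no reference value is zero.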
To address the high cost of acquiring labeled LIBS data for predicting heavy metal content in WPR in practical applications, a semi-supervised sequential learning method was proposed to predict the Pb and Cd contents of WPR, a complex-matrix material, from LIBS data. The algorithm consists of a parameter-shared dual-branch deep learning network that models both labeled and unlabeled LIBS data in a unified framework for end-to-end training. Discrete LIBS data were transformed into dense embedding vector matrices using multilayer multichannel causal convolutions. By combining the local features of LIBS spectral characteristic lines with features related to sample element content, multiresolution one-dimensional sequential convolutions were employed to extract local contextual semantic features from the dense embedding vector matrices. The regression module for Pb and Cd contents was trained using only the labeled data. To fully utilize the knowledge embedded in the unlabeled data and enhance the feature extraction capability of the model, an autoencoder was utilized to reconstruct the discrete LIBS data of all WPR samples. In experimental comparisons using only labeled data, the impact of different optimizers on model training was investigated, and the quantitative analysis results of the multiresolution one-dimensional sequential convolution were compared with those of four other feature extraction structures for sequence data: DNN, LSTM, BiLSTM, and Transformer. The results indicate that the proposed multiresolution one-dimensional sequential convolution model combined with the Adam optimizer converges faster and achieves smaller average relative prediction errors on the test set (Pb: 5.54%, Cd: 5.16%). After incorporating unlabeled data for semi-supervised learning, the average relative errors for the Pb and Cd predictions on the test set decreased to 4.12% and 3.32%, respectively.
Fudong Nian, Yujie Hu, Fuqiang Chen, Zhao Cheng, Yanhong Gu. Quantitative Prediction of Heavy Metal Elements in white peony root Using Laser‐Induced Breakdown Spectroscopy and Semi‐Supervised Sequential Learning[J]. Chinese Journal of Lasers, 2024, 51(23): 2311003
Category: spectroscopy
Received: Apr. 19, 2024
Accepted: May 27, 2024
Published Online: Dec. 10, 2024
The Author Email: Gu Yanhong (guyh@hfuu.edu.cn)
CSTR:32183.14.CJL240790