Spectroscopy and Spectral Analysis, Vol. 44, Issue 6, 1546 (2024)
Research on the Twin Check Abnormal Sample Detection Method of Mid-Infrared Spectroscopy
Mid-infrared absorption spectroscopy is one of the most promising techniques for non-invasive blood glucose measurement. The accuracy of the measured glucose concentration depends closely on the reliability of the spectral signal. However, the acquisition of mid-infrared spectra is susceptible to environmental and human factors, which can produce anomalous spectra containing a large amount of interference. Such spectra reduce the effectiveness and reliability of the prediction model, so detecting and removing abnormal samples is crucial. This study proposes a twin check abnormal sample detection method that screens and eliminates abnormal samples accurately. The algorithm has two stages. First, a Monte Carlo cross-validation abnormal sample detection method preliminarily screens abnormal samples and improves the stability of the spectral sample set. Second, based on the theory that the squared Mahalanobis distance approximately follows a chi-square distribution, the optimal threshold is determined adaptively and the remaining data set is screened again for abnormal samples. The method was evaluated on 64 samples of a simulated glucose mixture solution containing glucose, albumin, urea, lactic acid, fructose and cholesterol. In the first stage, the twin check method exploits the sensitivity of the sum of squared prediction errors to abnormal samples to make a preliminary judgment on the spectral data set, detecting 3 abnormal samples. A PLS calibration model built after removing these samples has a correlation coefficient of 0.91 and an RMSECV of 60.17 mg·dL⁻¹. In the second stage, the chi-square approximation of the squared Mahalanobis distance enables adaptive identification of abnormal samples.
A total of 12 abnormal samples were detected. The PLS model built after removing all abnormal samples performed better, with a correlation coefficient of 0.99 and an RMSECV of 57.77 mg·dL⁻¹. Comparison with no abnormal sample removal, the PCA-MD method and the Monte Carlo method demonstrates the superiority of the twin check method for abnormal sample detection. Compared with the PLS model built without removing abnormal samples, the correlation coefficient increased from 0.86 to 0.99 and the RMSECV decreased from 67.51 to 57.77 mg·dL⁻¹, improvements of 15.12% and 14.42%, respectively. Existing abnormal sample detection methods are sensitive to the choice of threshold and therefore tend to flag normal samples falsely or to miss abnormal ones; this study offers a strategy for that problem, supporting accurate detection and elimination of abnormal samples and thereby improving the accuracy and predictive performance of the model. The method provides a way to eliminate abnormal samples from mid-infrared absorption spectra accurately.
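The second stage rests on the squared Mahalanobis distance approximately following a chi-square distribution, so a cutoff can be taken from a chi-square quantile instead of being set by hand. The sketch below illustrates that idea only and is not the authors' implementation. The abstract does not say in which space the distance is computed or which significance level is used, so the PCA score space, the helper names `pca_scores` and `mahalanobis_outliers`, the choice of 3 components, `alpha = 0.05` and the synthetic data are all assumptions made for illustration.

```python
import numpy as np
from scipy.stats import chi2


def pca_scores(spectra, n_components):
    """Project mean-centered spectra onto the leading principal components (SVD)."""
    spectra = np.asarray(spectra, dtype=float)
    centered = spectra - spectra.mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    return u[:, :n_components] * s[:n_components]


def mahalanobis_outliers(scores, alpha=0.05):
    """Return squared distances, the chi-square cutoff, and a boolean outlier mask."""
    scores = np.asarray(scores, dtype=float)
    centered = scores - scores.mean(axis=0)
    cov = np.atleast_2d(np.cov(scores, rowvar=False))
    # The pseudo-inverse guards against a near-singular covariance matrix.
    inv_cov = np.linalg.pinv(cov)
    # Squared Mahalanobis distance of every sample to the sample mean.
    d2 = np.einsum("ij,jk,ik->i", centered, inv_cov, centered)
    # Assumed rule: cutoff is the (1 - alpha) quantile of chi-square
    # with degrees of freedom equal to the number of score dimensions.
    threshold = chi2.ppf(1.0 - alpha, df=scores.shape[1])
    return d2, threshold, d2 > threshold


# Synthetic stand-in for a spectral data set (assumption: 60 spectra, 50 channels).
rng = np.random.default_rng(0)
spectra = rng.normal(size=(60, 50))
spectra[0] += 10.0  # inject one grossly shifted spectrum

scores = pca_scores(spectra, n_components=3)
d2, threshold, flags = mahalanobis_outliers(scores, alpha=0.05)
print("cutoff:", threshold)
print("flagged sample indices:", np.flatnonzero(flags))
```

Because the cutoff depends only on `alpha` and the number of score dimensions, it does not need to be retuned per data set, which is the sense in which such a threshold can be called adaptive. A lower `alpha` flags fewer samples.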
ZHANGZHU Shan-ying, ZHANG Ruo-jing, GU Han-wen, XIE Qin-lan, ZHANG Xian-wen, SA Ji-ming, LIU Yi. Research on the Twin Check Abnormal Sample Detection Method of Mid-Infrared Spectroscopy[J]. Spectroscopy and Spectral Analysis, 2024, 44(6): 1546
Received: May 31, 2023
Accepted: --
Published Online: Aug. 28, 2024
The Author Email: Qin-lan XIE (xieqinlan@126.com)