Spectroscopy and Spectral Analysis, Vol. 44, Issue 3, 737 (2024)
Characteristic Wavelength Selection Method and Application of Near Infrared Spectrum Based on Lasso Huber
In near-infrared spectroscopy (NIRS) wavelength screening, selecting characteristic wavelengths is a challenging problem when the number of variables is much larger than the sample size. The Lasso and Elastic Net algorithms are used for variable selection in high-dimensional, small-sample data, but both use the least-squares error as the loss function when selecting characteristic variables. Consequently, when the samples contain outliers, models built with Lasso or Elastic Net are sensitive to them: the fit shifts toward the outliers and robustness is reduced. To address this problem, this paper uses the Huber function as the loss function and proposes the Lasso-Huber method for near-infrared characteristic wavelength selection. Combined with the partial least squares (PLS) method, a quantitative calibration model of the quality control index components of Antai pills is established and compared with full-wavelength modeling and with wavelength selection by the Lasso and Elastic Net methods. In this experiment, 116 NIR spectra from 21 batches of Antai pills were collected, of which 101 were used as the calibration set. The model was internally verified by leave-one-out cross-validation, and the other 15 spectra were used as the validation set for external verification. The Mahalanobis distance (MD) method based on principal component analysis (PCA) was used to detect outliers in the calibration set. Taking ferulic acid, one of the quality control index components of Antai pills, as an example, the Lasso, Elastic Net and Lasso-Huber methods screened 69, 155 and 87 characteristic wavelength points, respectively, from the normal spectra of Antai pill samples. The prediction model established by the Lasso-Huber method combined with PLS was the best, with an R²p and SEP on the prediction set of 0.9531 and 0.0587, respectively.
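For orientation, a Lasso-type estimator with the Huber loss is usually written in the form below. The abstract does not state the authors' exact objective, so the notation (β for the regression coefficients, β₀ for the intercept, δ for the Huber threshold, λ for the penalty weight) and the 1/n scaling are assumptions, not the paper's definition.

```latex
\hat{\beta} = \arg\min_{\beta_0,\,\beta}\;
  \frac{1}{n}\sum_{i=1}^{n} H_\delta\!\left(y_i - \beta_0 - x_i^{\top}\beta\right)
  + \lambda \lVert \beta \rVert_1,
\qquad
H_\delta(r) =
\begin{cases}
  \tfrac{1}{2}\, r^2, & |r| \le \delta,\\[2pt]
  \delta\left(|r| - \tfrac{1}{2}\delta\right), & |r| > \delta.
\end{cases}
```

Because H_δ grows only linearly once |r| exceeds δ, the pull of a single outlying sample on the fit is bounded, whereas under the squared-error loss it grows in proportion to the residual. Wavelengths whose coefficients are driven exactly to zero by the L1 term are the ones discarded.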
In addition, comparing the prediction performance of calibration models built from normal spectra with that of models built from a calibration set containing outliers showed that the Lasso-Huber method is more advantageous when outliers are present. In that case, the optimal number of wavelength points selected by the Lasso-Huber algorithm was 88, and the R²v of the model combined with PLS was 0.9673, while the R²v was 0.8405 for the Lasso method, 0.8347 for the Elastic Net method, and 0.8520 for full-wavelength modeling. Thus, for samples with outliers, the Lasso-Huber method not only reduces the number of characteristic bands but also reduces the algorithm's sensitivity to outliers, improving the accuracy and robustness of the model. In terms of model simplicity, the modeling times of Lasso and Elastic Net were 61.8260 s and 79.9599 s, respectively, while that of Lasso-Huber was only 1.3608 s. The algorithm is therefore expected to be integrated into near-infrared spectroscopy modeling software for practical production applications in the future.
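The idea of selecting variables with a Huber loss and an L1 penalty can be illustrated with a small, dependency-free sketch. This is not the authors' implementation: the abstract does not name a solver, so proximal gradient descent is swapped in here, and the function names (`huber`, `soft_threshold`, `lasso_huber`), the parameters `lam` and `delta`, and the synthetic data are all hypothetical.

```python
import random


def huber(r, delta):
    """Huber loss: quadratic for |r| <= delta, linear beyond it."""
    a = abs(r)
    return 0.5 * r * r if a <= delta else delta * (a - 0.5 * delta)


def soft_threshold(v, t):
    """Proximal operator of t * |v| (the L1 shrinkage step)."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def lasso_huber(X, y, lam=0.05, delta=1.0, n_iter=1000):
    """Minimise mean Huber loss + lam * ||w||_1 by proximal gradient descent."""
    n, p = len(X), len(X[0])
    # Step size 1/L; L is bounded by the squared Frobenius norm of [X, 1] over n.
    lipschitz = (sum(v * v for row in X for v in row) + n) / n
    step = 1.0 / lipschitz
    w, b = [0.0] * p, 0.0
    for _ in range(n_iter):
        # The derivative of the Huber loss is the residual clipped to [-delta, delta].
        psi = []
        for row, yi in zip(X, y):
            r = yi - b - sum(wj * xj for wj, xj in zip(w, row))
            psi.append(max(-delta, min(delta, r)))
        grad_w = [-sum(ps * row[j] for ps, row in zip(psi, X)) / n for j in range(p)]
        grad_b = -sum(psi) / n
        # Gradient step followed by L1 shrinkage; the intercept is not penalised.
        w = [soft_threshold(wj - step * gj, step * lam) for wj, gj in zip(w, grad_w)]
        b -= step * grad_b
    return w, b


# Synthetic stand-in for spectra: 40 samples, 6 "wavelengths", 2 informative.
rng = random.Random(0)
n_samples, n_vars = 40, 6
true_w = [2.0, 0.0, 0.0, -1.5, 0.0, 0.0]
X = [[rng.gauss(0, 1) for _ in range(n_vars)] for _ in range(n_samples)]
y = [sum(wj * xj for wj, xj in zip(true_w, row)) for row in X]
for i in (0, 1, 2):
    y[i] += 25.0  # three gross outliers in the reference values

w, b = lasso_huber(X, y)
selected = [j for j, wj in enumerate(w) if wj != 0.0]
print("selected variable indices:", selected)
```

In a workflow like the one described, the selected columns would then be passed to a PLS regression. The values of `lam` and `delta` here are arbitrary and would normally be tuned, for example by cross-validation.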
GUO Tuo, XU Feng-jie, MA Jin-fang, XIAO Huan-xian. Characteristic Wavelength Selection Method and Application of Near Infrared Spectrum Based on Lasso Huber[J]. Spectroscopy and Spectral Analysis, 2024, 44(3): 737
Received: Jul. 10, 2022
Accepted: --
Published Online: Aug. 6, 2024
The Author Email: Jin-fang MA (majf0351@126.com)