Spectroscopy and Spectral Analysis, Vol. 45, Issue 1, 52 (2025)
A Combinatorial Optimization Strategy for Near-Infrared Spectral Data Preprocessing
Preprocessing is an important step in constructing a near-infrared (NIR) spectroscopy detection model and significantly affects detection accuracy. Various preprocessing methods are available, each designed to address specific types of noise and irrelevant information, thereby improving the signal-to-noise ratio. Optimizing the combination of preprocessing methods is therefore essential for achieving the desired model performance. This study proposes a combinatorial optimization strategy for selecting preprocessing methods when calibrating NIR spectroscopy models. Eight commonly used preprocessing methods are collected into a preprocessing library, and a quantitative model is built using partial least squares (PLS). Preprocessing combinations with strong calibration capability are then selected from the library, using the model's root mean square error of cross-validation (RMSECV) as the iterative criterion. The strategy is structured as a greedy algorithm: it aims for a globally optimal combination by selecting the best preprocessing method at each step, so that preprocessing combinations for spectral data can be chosen simply and efficiently. Tests were conducted on publicly available datasets, including wheat and meat, and the proposed strategy was compared with a similar stacked strategy (Stacked) and a sequential orthogonal fusion strategy for multi-block data (SPORT). On the wheat dataset, the proposed strategy reduced the root mean square error of calibration (RMSEC) by 12% and 6%, and the root mean square error of prediction (RMSEP) by 32% and 17%, relative to the Stacked and SPORT strategies, respectively. On the meat dataset, it reduced the RMSEC by 49% and 48%, and the RMSEP by 46% and 41%, relative to the Stacked and SPORT strategies, respectively. These results demonstrate good calibration performance. Finally, the study analyzes the contribution of the preprocessing methods selected by the strategy to model calibration, and discusses the strategy's potential in terms of model interpretability and prevention of overfitting. The strategy offers a new approach to selecting preprocessing methods for NIR spectroscopy.
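The greedy selection described in the abstract can be illustrated with a minimal, standard-library Python sketch. It is not the authors' implementation, and every name in it is hypothetical. The three preprocessing functions stand in for the paper's eight-method library, whose contents the abstract does not list. The PLS model is simplified to a single latent variable, and the rules that a method is used at most once and that the search stops when RMSECV no longer improves are assumptions.

```python
import math
import random


def snv(X):
    """Standard normal variate: center and scale each spectrum (row)."""
    out = []
    for row in X:
        mean = sum(row) / len(row)
        std = math.sqrt(sum((v - mean) ** 2 for v in row) / (len(row) - 1)) or 1.0
        out.append([(v - mean) / std for v in row])
    return out


def first_derivative(X):
    """First difference along the wavelength axis."""
    return [[b - a for a, b in zip(row, row[1:])] for row in X]


def moving_average(X, window=3):
    """Moving-average smoothing with shrinking windows at the edges."""
    half = window // 2
    out = []
    for row in X:
        smoothed = []
        for j in range(len(row)):
            chunk = row[max(0, j - half): j + half + 1]
            smoothed.append(sum(chunk) / len(chunk))
        out.append(smoothed)
    return out


# Illustrative library only; the paper's eight methods are not named in the abstract.
LIBRARY = {"snv": snv, "first_derivative": first_derivative, "moving_average": moving_average}


def pls1_fit_predict(X_train, y_train, X_test):
    """One-latent-variable PLS1 regression (a simplification of a full PLS model)."""
    n, p = len(X_train), len(X_train[0])
    x_mean = [sum(r[j] for r in X_train) / n for j in range(p)]
    y_mean = sum(y_train) / n
    Xc = [[r[j] - x_mean[j] for j in range(p)] for r in X_train]
    yc = [v - y_mean for v in y_train]
    w = [sum(Xc[i][j] * yc[i] for i in range(n)) for j in range(p)]  # weights ~ X'y
    norm = math.sqrt(sum(v * v for v in w)) or 1.0
    w = [v / norm for v in w]
    t = [sum(a * b for a, b in zip(row, w)) for row in Xc]  # scores
    q = sum(a * b for a, b in zip(t, yc)) / (sum(v * v for v in t) or 1.0)
    return [y_mean + q * sum((r[j] - x_mean[j]) * w[j] for j in range(p)) for r in X_test]


def rmsecv(X, y, k=5):
    """Root mean square error of k-fold cross-validation."""
    n = len(X)
    sq_err = 0.0
    for fold in range(k):
        test = set(range(fold, n, k))
        train = [i for i in range(n) if i not in test]
        preds = pls1_fit_predict([X[i] for i in train], [y[i] for i in train],
                                 [X[i] for i in sorted(test)])
        sq_err += sum((p - y[i]) ** 2 for p, i in zip(preds, sorted(test)))
    return math.sqrt(sq_err / n)


def greedy_select(X, y, library, max_steps=3):
    """At each step, append the method that lowers RMSECV the most; stop when none does."""
    chosen, best_X, best_score = [], X, rmsecv(X, y)
    for _ in range(max_steps):
        candidates = [(rmsecv(fn(best_X), y), name)
                      for name, fn in library.items() if name not in chosen]
        if not candidates:
            break
        score, name = min(candidates, key=lambda c: c[0])
        if score >= best_score:
            break
        chosen.append(name)
        best_X, best_score = library[name](best_X), score
    return chosen, best_score


def make_synthetic_spectra(n=30, p=40, seed=0):
    """Toy spectra: one Gaussian band scaled by y, plus scatter, baseline, and noise."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        conc, gain, offset = rng.uniform(0, 1), rng.uniform(0.8, 1.2), rng.uniform(-0.5, 0.5)
        X.append([gain * (1.0 + conc * math.exp(-((j - p / 2) ** 2) / 20.0))
                  + offset + rng.gauss(0, 0.01) for j in range(p)])
        y.append(conc)
    return X, y
```

Calling `greedy_select(*make_synthetic_spectra(), LIBRARY)` returns the ordered list of selected method names and the final RMSECV. All three stand-in methods act on each spectrum independently, so applying them before the cross-validation split does not leak information between folds. A method that estimates parameters across samples would instead need to be fitted inside each fold.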
ZHOU Yu-kun, CHEN Xiao-jing, XIE Zhong-hao, SHI Wen, YUAN Lei-ming, CHEN Xi, HUANG Guang-zao. A Combinatorial Optimization Strategy for Near-Infrared Spectral Data Preprocessing[J]. Spectroscopy and Spectral Analysis, 2025, 45(1): 52
Received: Jan. 18, 2024
Accepted: Feb. 28, 2025
Published Online: Feb. 28, 2025
Author email: Wen SHI (shiwen@wzu.edu.cn)