Spectroscopy and Spectral Analysis, Vol. 41, Issue 4, 1097 (2021)

Quantitative Analysis of Near Infrared Spectroscopy Based on Orthogonal Matching Pursuit Algorithm

LI Si-hai1,* and LIU Dong-ling2
Author Affiliations
  • 1[in Chinese]
  • 2[in Chinese]

    Compressed sensing (CS) is a recent technique for signal compression and sampling. Orthogonal Matching Pursuit (OMP), a greedy pursuit algorithm, is widely used for sparse signal reconstruction in compressed sensing. Near-infrared (NIR) spectra are high-dimensional, small-sample signals with a sparse prior. To exploit these characteristics and to improve the flexibility and reliability of NIR variable selection, a new variable selection method based on compressed sensing theory, Orthogonal Matching Pursuit Based Variable Selection (OMPBVS), is proposed. By sparsely reconstructing the original spectral signal, OMPBVS shrinks the regression coefficients of most variables to zero and thereby selects spectral variables indirectly. Specifically, the spectral matrix serves as the sensing matrix and the variable to be predicted serves as the observation vector. In each iteration, the inner product between the current residual and every atom is computed, and the atom with the largest inner product is selected. The signal is then projected onto the subspace spanned by all selected atoms, and the coefficients of all selected atoms are updated so that the residual is orthogonal to every selected atom. The residual calculation is in essence a Gram-Schmidt orthogonalization; the orthogonal projection reduces the number of iterations and ensures accurate signal reconstruction. OMPBVS can reduce the spectral dimension to the scale of the sample size, and its variable selection capability is comparable to that of LASSO. Unlike LASSO, however, OMPBVS optimizes its loss function with a forward selection algorithm, which reduces the number of iterations and allows the number of selected variables to be controlled precisely.
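The iteration described above can be sketched in a few lines of NumPy. This is an illustrative sketch of generic OMP, not the authors' OMPBVS implementation: the function name `omp_select`, the parameter `n_nonzero`, the column normalisation, the use of the absolute inner product, and the synthetic data are all assumptions made for the example.

```python
import numpy as np

def omp_select(X, y, n_nonzero):
    """Generic OMP sketch: greedily pick columns (atoms) of X that explain y.

    X: (n_samples, n_variables) spectral matrix, used as the sensing matrix.
    y: (n_samples,) observation vector (the property to be predicted).
    n_nonzero: number of variables to select (hypothetical parameter name).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Assumption: normalise atoms so inner products are comparable.
    norms = np.linalg.norm(X, axis=0)
    norms[norms == 0] = 1.0
    Xn = X / norms

    residual = y.copy()
    selected = []
    coef_sel = np.zeros(0)
    for _ in range(n_nonzero):
        # Inner product of the residual with every atom (absolute value).
        corr = np.abs(Xn.T @ residual)
        if selected:
            corr[selected] = -np.inf  # never pick the same atom twice
        selected.append(int(np.argmax(corr)))
        # Refit ALL selected atoms by least squares: this is the orthogonal
        # projection of y onto the span of the selected atoms.
        coef_sel, *_ = np.linalg.lstsq(X[:, selected], y, rcond=None)
        # The new residual is orthogonal to every selected atom.
        residual = y - X[:, selected] @ coef_sel

    coef = np.zeros(X.shape[1])
    coef[selected] = coef_sel
    return selected, coef, residual

# Synthetic demo (not data from the paper): 36 samples, 500 variables,
# and a response that depends on only two of them.
rng = np.random.default_rng(0)
X_demo = rng.standard_normal((36, 500))
y_demo = 2.0 * X_demo[:, 40] - 3.0 * X_demo[:, 310]
sel_demo, coef_demo, res_demo = omp_select(X_demo, y_demo, n_nonzero=2)
print("selected variables:", sorted(sel_demo))
print("residual norm:", np.linalg.norm(res_demo))
```

Because the loop runs exactly `n_nonzero` times and adds one variable per pass, the number of selected variables is fixed in advance, which is the control over model size that the abstract contrasts with LASSO.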
Variable selection experiments were performed on the beer dataset and the wheat kernels dataset to compare six variable selection methods: PLS, MCUVE, CARS, WMSCVS, LASSOLarsCV, and OMPBVS. The beer dataset contains 60 samples, divided by the Kennard-Stone (KS) method into 36 training samples and 24 test samples; the predicted variable is the original extract concentration. The wheat kernels dataset consists of 523 samples, 415 for training and 108 for testing; the predicted variable is protein content. On the beer dataset, OMPBVS selected 2 variables, with an RMSEC of 0.2052 and an RMSEP of 0.1598. On the wheat kernels dataset, it selected 9 variables, with an RMSEC of 0.4502 and an RMSEP of 0.4125. Its variable selection ability and model performance were better than those of the other five methods, indicating that OMPBVS is an effective method for NIR spectral variable selection and quantitative analysis. OMPBVS generalizes well in the small-sample case, reducing the number of selected variables and improving the robustness of variable selection. In addition, spectral preprocessing with SNV and MSC can reduce the number of selected variables to some extent and improve the interpretability of the model.
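For readers unfamiliar with the evaluation setup, the Kennard-Stone split and the two error measures can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' code: the function names `kennard_stone` and `rmse`, the Euclidean distance, and the toy data are all hypothetical. RMSEC is taken here in its usual sense, the root mean square error on the calibration (training) set, and RMSEP as the same quantity on the prediction (test) set.

```python
import math

def kennard_stone(samples, n_train):
    """Return indices of n_train samples chosen by the Kennard-Stone rule.

    Assumes n_train >= 2 and Euclidean distance between samples.
    """
    n = len(samples)
    dist = [[math.dist(a, b) for b in samples] for a in samples]
    # Start with the two samples that are farthest apart.
    i0, j0 = max(
        ((i, j) for i in range(n) for j in range(i + 1, n)),
        key=lambda p: dist[p[0]][p[1]],
    )
    chosen = [i0, j0]
    while len(chosen) < n_train:
        rest = [i for i in range(n) if i not in chosen]
        # Add the sample whose nearest already-chosen neighbour is farthest.
        nxt = max(rest, key=lambda i: min(dist[i][c] for c in chosen))
        chosen.append(nxt)
    return chosen

def rmse(y_true, y_pred):
    """Root mean square error; RMSEC on training data, RMSEP on test data."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)

# Toy demo (not data from the paper): five two-point "spectra".
spectra = [[0.0, 0.1], [0.2, 0.1], [0.9, 1.0], [1.0, 1.1], [0.5, 0.5]]
train_idx = kennard_stone(spectra, n_train=3)
test_idx = [i for i in range(len(spectra)) if i not in train_idx]
print("training indices:", train_idx)
print("test indices:", test_idx)
```

The split picks training samples that cover the spectral space as evenly as possible, so the remaining test samples lie inside the range spanned by the calibration set.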

    LI Si-hai, LIU Dong-ling. Quantitative Analysis of Near Infrared Spectroscopy Based on Orthogonal Matching Pursuit Algorithm[J]. Spectroscopy and Spectral Analysis, 2021, 41(4): 1097

    Paper Information

    Received: Mar. 2, 2020

    Accepted: --

    Published Online: Apr. 12, 2021

    The Author Email: Si-hai LI (lshroom@163.com)

    DOI:10.3964/j.issn.1000-0593(2021)04-1097-05

    Topics