Spectroscopy and Spectral Analysis, Volume 43, Issue 1, 239 (2023)

Research on Quantitative Regression Method of IR Spectra of Organic Compounds Based on Ensemble Learning With Wavelength Selection

JU Wei1, LU Chang-hua2,3, ZHANG Yu-jun3, CHEN Xiao-jing1, and JIANG Wei-wei2
Author Affiliations
  • 1[in Chinese]
  • 2[in Chinese]
  • 3[in Chinese]

    This work studies the application of ensemble learning to the quantitative analysis of organic infrared spectra, and the influence of characteristic wavelength selection on the modeling efficiency and prediction accuracy of such ensemble models. The cetane number and total aromatic hydrocarbon content of diesel, predicted from its infrared spectra, serve as the test case. First, a two-layer stacking ensemble learning framework is built with extremely randomized trees (ERT), linear kernel support vector machine (LinearSVM), radial basis function kernel support vector machine (RBFSVM) and polynomial kernel support vector machine (polySVM) as base learners, and LinearSVM as the meta-learner. The quantitative regression accuracy of the individual base learners and of the ensemble model on the diesel infrared spectra is analyzed and compared. Compared with the partial least squares (PLS) regression model, the Stacking ensemble learning model improves prediction accuracy for both diesel properties. On the full spectra, the ERT model gives the best prediction of cetane number (r=0.848, RMSEP=1.603, RDP=2.627), and the Stacking model gives the best prediction of total aromatic content (r=0.991, RMSEP=0.645, RDP=9.243). Next, characteristic wavelengths of the infrared spectra are selected using synergy interval partial least squares (SiPLS) and the successive projections algorithm (SPA), and ensemble learning regression models are built on the selected wavelengths. Among these, the SiPLS-ERT model gives the best prediction of cetane number (r=0.893, RMSEP=1.013, RDP=3.051), and the SiPLS-Stacking model gives the best prediction of total aromatic content (r=0.998, RMSEP=0.354, RDP=11.475). The average model training time is reduced by more than 50% compared with training on the full spectra.
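
The figures of merit quoted above (r, RMSEP, RDP) can be computed as in the following minimal sketch. It assumes that r is the Pearson correlation between reference and predicted values, that RMSEP is the root mean square error on the prediction set, and that RDP is the ratio of the standard deviation of the reference values to RMSEP (a quantity more commonly abbreviated RPD). The abstract does not spell out these definitions, and the sample values below are made up purely for illustration.

```python
import math
import statistics


def pearson_r(y_true, y_pred):
    """Pearson correlation between reference and predicted values."""
    mean_t = statistics.fmean(y_true)
    mean_p = statistics.fmean(y_pred)
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    return cov / math.sqrt(var_t * var_p)


def rmsep(y_true, y_pred):
    """Root mean square error of prediction."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(statistics.fmean(squared_errors))


def rdp(y_true, y_pred):
    """Assumed definition: sample standard deviation of the reference values / RMSEP."""
    return statistics.stdev(y_true) / rmsep(y_true, y_pred)


# Hypothetical reference and predicted values (not data from the paper).
y_ref = [48.0, 50.0, 52.0, 54.0, 56.0]
y_hat = [49.0, 49.0, 53.0, 53.0, 57.0]

print(f"r = {pearson_r(y_ref, y_hat):.3f}")
print(f"RMSEP = {rmsep(y_ref, y_hat):.3f}")
print(f"RDP = {rdp(y_ref, y_hat):.3f}")
```

Under this definition, a larger RDP means the prediction error is small relative to the natural spread of the reference values, which is consistent with the abstract reporting higher RDP alongside lower RMSEP.
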
The results show that characteristic wavelength selection combined with ensemble learning regression can be used for the quantitative analysis of organic infrared spectra. Compared with the traditional quantitative regression method, this approach improves both modeling efficiency and prediction accuracy, and it provides a methodological basis for further study of machine learning in the quantitative analysis of spectra.
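
A two-layer stacking framework of the kind described in the abstract could be sketched with scikit-learn as follows. This is an illustration under stated assumptions, not the authors' implementation: the hyperparameters, the standardization step, the 5-fold cross-validation, and the synthetic "spectra" are all assumptions, and the hard-coded index set `selected` merely stands in for the output of a wavelength selection step such as SiPLS or SPA, which is not implemented here.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesRegressor, StackingRegressor
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR


def scaled_svr(kernel):
    """SVR with the given kernel, preceded by feature standardization (an assumption)."""
    return make_pipeline(StandardScaler(), SVR(kernel=kernel))


def build_stacking_model(random_state=0):
    """Layer 1: ERT plus three SVMs as base learners. Layer 2: linear SVM meta-learner."""
    base_learners = [
        ("ert", ExtraTreesRegressor(n_estimators=100, random_state=random_state)),
        ("linear_svm", scaled_svr("linear")),
        ("rbf_svm", scaled_svr("rbf")),
        ("poly_svm", scaled_svr("poly")),
    ]
    # The meta-learner is trained on cross-validated predictions of the base learners.
    return StackingRegressor(
        estimators=base_learners,
        final_estimator=scaled_svr("linear"),
        cv=5,
    )


# Synthetic stand-in for spectra: rows are samples, columns are wavelengths.
rng = np.random.default_rng(0)
n_samples, n_wavelengths = 120, 60
X = rng.normal(size=(n_samples, n_wavelengths))
selected = np.arange(10, 20)  # hypothetical "characteristic wavelengths"
y = X[:, selected].sum(axis=1) + 0.1 * rng.normal(size=n_samples)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Model on the full spectra.
full_model = build_stacking_model().fit(X_train, y_train)
full_pred = full_model.predict(X_test)

# Model on the selected wavelengths only (fewer features to train on).
subset_model = build_stacking_model().fit(X_train[:, selected], y_train)
subset_pred = subset_model.predict(X_test[:, selected])


def rmsep(y_true, y_pred):
    """Root mean square error of prediction."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


print("RMSEP, full spectra:        ", rmsep(y_test, full_pred))
print("RMSEP, selected wavelengths:", rmsep(y_test, subset_pred))
```

Restricting the columns before fitting is the only change needed to compare full-spectrum and selected-wavelength models, which mirrors the comparison the abstract reports. The numbers printed for this synthetic data say nothing about the diesel results.
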

    JU Wei, LU Chang-hua, ZHANG Yu-jun, CHEN Xiao-jing, JIANG Wei-wei. Research on Quantitative Regression Method of IR Spectra of Organic Compounds Based on Ensemble Learning With Wavelength Selection[J]. Spectroscopy and Spectral Analysis, 2023, 43(1): 239

    Paper Information

    Received: Oct. 13, 2021

    Published Online: Mar. 28, 2023

    DOI:10.3964/j.issn.1000-0593(2023)01-0239-09
