Spectroscopy and Spectral Analysis, Volume 43, Issue 4, 1030 (2023)

Prediction of Oil Content in Oil Shale by Near-Infrared Spectroscopy Based on Stacking Ensemble Learning

LI Quan-lun1,*, CHEN Zheng-guang1, and JIAO Feng2
Author Affiliations
  • 1[in Chinese]
  • 2[in Chinese]

    To overcome the difficulty of further improving the prediction accuracy of a single model, this study adopted a heterogeneous ensemble learning model based on the Stacking framework, combined with near-infrared spectroscopy, to determine the oil content of oil shale. A total of 230 oil shale core samples collected from a block in the Songliao Basin were taken as the research object; their oil content was measured by the low-temperature dry distillation method, and the near-infrared spectrum of each sample was recorded at the same time. The Monte Carlo algorithm was employed to eliminate outlier samples, and the remaining 213 samples were randomly divided into a training set and a test set at a ratio of 3:1. Detrending coupled with baseline correction was used to eliminate the influence of noise and baseline drift in the spectral data. The random forest (RF) algorithm was then used to extract characteristic wavelengths according to wavelength importance. To further reduce the data dimension, the CARS algorithm was then applied to extract characteristic wavelengths. Finally, PLS, SVM, RF and GBDT, with parameters optimized by grid search, were adopted as primary learners, and a PLS regression model was adopted as the secondary learner to build the Stacking ensemble learning model. The accuracy of the single models and the ensemble learning models in predicting oil shale oil content was compared using R² and RMSE as evaluation indicators. The results show that the RF-CARS method can effectively screen important wavelengths and thereby improve model efficiency. The heterogeneous ensemble learning model based on Stacking has better predictive performance and greater stability than the single models (SVM, PLS) and the homogeneous ensemble learning models (RF, GBDT). Over multiple random divisions of the data set, the average R² of the Stacking ensemble learning model is 0.8942, an average increase of 0.0623 over the other models, and its RMSEP of 0.5869 is on average 0.1474 lower than that of the other models. These results show that the heterogeneous ensemble learning model based on Stacking can combine the advantages of its primary learners to predict the oil content of oil shale quickly and accurately, providing a new, fast and portable method for determining oil shale oil content.

    LI Quan-lun, CHEN Zheng-guang, JIAO Feng. Prediction of Oil Content in Oil Shale by Near-Infrared Spectroscopy Based on Stacking Ensemble Learning[J]. Spectroscopy and Spectral Analysis, 2023, 43(4): 1030

    Paper Information

    Received: Feb. 7, 2022

    Published Online: May 3, 2023

    The Author Email: Quan-lun LI (18663065663@163.com)

    DOI:10.3964/j.issn.1000-0593(2023)04-1030-07
