Spectroscopy and Spectral Analysis, Volume 45, Issue 1, 125 (2025)

Comparative Study of Hyperspectral Preprocessing Methods and Multiple Models in Classification and Discrimination

JU Lei1, YU Jie1, WU Yan-miao2, LI Li2, LU Tian3, DING Ya-ping2 and SHU Ru-xin1,*
Author Affiliations
  • 1Technology Center of Shanghai Tobacco Group Co., Ltd., Shanghai 200082, China
  • 2Department of Chemistry, Shanghai University, Shanghai 200444, China
  • 3Shanghai Shuzhiwei Information Technology Co., Ltd., Shanghai 200444, China

    Identifying the different parts of Solanaceae plants is crucial for product formulation design and quality control. Hyperspectral technology, which acquires rich information quickly and non-destructively, has become a widely used tool in plant research and monitoring, and Solanaceae plants, as important economic crops, are a promising target for it. This study uses hyperspectral technology to classify different parts of Solanaceae plant leaves after initial roasting. First, hyperspectral data were collected from 293 powder samples taken from different parts of Solanaceae plants using a FieldSpec 3 spectroradiometer. The spectra were then preprocessed with Savitzky-Golay (S-G) smoothing and first-order and second-order derivatives to enhance information and remove noise. To reduce redundant features, partial least squares (PLS) was used for dimensionality reduction. Finally, six machine learning classification models (support vector machine (SVM), logistic regression, K-nearest neighbors (KNN), decision tree, random forest, and gradient boosting decision tree) were built on the dimensionality-reduced data and compared. For the classification task, the SVM model performed best after first-order derivative processing, achieving an accuracy of 100.0% on the training set and 84.7% on the test set. After grid-search parameter optimization, the optimal parameters were determined to be no restriction on maximum depth, a minimum sample split of 4, and 200 estimators. With these parameters, the five-fold cross-validation accuracy was 88.1%, the training set accuracy was 100%, and the test set accuracy was 86.4%. The results indicate that preprocessing combined with dimensionality reduction enhances the information in the data, enabling classification models to better capture the characteristics of Solanaceae plant samples. This study is of great significance for the rapid, accurate, and non-destructive differentiation of the parts of Solanaceae plants.

    JU Lei, YU Jie, WU Yan-miao, LI Li, LU Tian, DING Ya-ping, SHU Ru-xin. Comparative Study of Hyperspectral Preprocessing Methods and Multiple Models in Classification and Discrimination[J]. Spectroscopy and Spectral Analysis, 2025, 45(1): 125

    Paper Information

    Received: Jun. 18, 2024

    Accepted: Feb. 28, 2025

    Published Online: Feb. 28, 2025

    The Author Email: Ru-xin SHU (shurx@sh.tobacco.com.cn)

    DOI:10.3964/j.issn.1000-0593(2025)01-0125-08
