Spectroscopy and Spectral Analysis, Vol. 45, Issue 9, 2585 (2025)
Near-Infrared Modeling for Total Flavonoid and Protein Contents in Buckwheat Leaves Based on CARS Feature Extraction
To meet the needs of buckwheat quality determination and breeding, this study used the Competitive Adaptive Reweighted Sampling (CARS) algorithm to extract characteristic spectral variables and combined it with partial least squares (PLS) regression to rapidly determine the total flavonoid and protein contents of buckwheat leaves. First, the Kennard-Stone (KS) algorithm was used to split the samples into training and test sets. The average, maximum, and minimum total flavonoid contents were 55.8, 92.5 and 28.1 mg·g⁻¹ in the training set and 71.0, 99.8 and 31.5 mg·g⁻¹ in the test set. The average, maximum, and minimum protein contents were 169.6, 331.0 and 121.2 mg·g⁻¹ in the training set and 158.2, 183.0 and 129.1 mg·g⁻¹ in the test set. The spectra in the wavenumber range from 4 000 to 12 000 cm⁻¹ were then preprocessed with six schemes: Normalization alone, Normalization + Multiplicative Scatter Correction (MSC), Normalization + Standard Normal Variate Transform (SNV), Normalization + First Derivative, Normalization + Second Derivative, and Normalization + Savitzky-Golay Smoothing Filter (SG). The CARS algorithm was then used to extract the characteristic spectral variables, and PLS was used to build the prediction models. The best models for predicting total flavonoid and protein contents in buckwheat leaves were selected through a comprehensive analysis of the coefficient of determination of the training model (Rc), the coefficient of determination of the test model (Rp), the root mean square error of cross-validation (RMSECV), the root mean square error of the test model (RMSEP), and the residual predictive deviation (RPD).
Three usable prediction models were constructed for total flavonoids. The best one used 46 characteristic wavenumber points out of 1102, with normalization + first derivative as the preprocessing method; its Rc, Rp, RMSECV, RMSEP, and RPD were 0.997, 0.933, 0.170, 0.829 and 2.893, respectively. Four usable prediction models were constructed for protein. The best one used 42 characteristic wavenumber points, with normalization + second derivative as the preprocessing method; its Rc, Rp, RMSECV, RMSEP, and RPD were 0.998, 0.965, 0.202, 0.353 and 3.849, respectively. The results show that applying the KS and CARS algorithms when building near-infrared spectroscopy models makes it possible to build reliable prediction models from fewer samples, enables the rapid determination of total flavonoids and protein in buckwheat leaves, and provides a powerful tool for buckwheat breeding.
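The abstract names the Kennard-Stone split, SNV preprocessing, and the RMSEP and RPD metrics without defining them, so a minimal sketch may help readers who are new to these steps. It is written in plain Python for readability and is not the authors' implementation, since the abstract does not state which software was used. All function names (`kennard_stone`, `snv`, `rmse`, `rpd`) and the toy data are hypothetical. The RPD here follows the common convention of dividing the standard deviation of the reference values by the prediction RMSE; the paper's exact formula is an assumption.

```python
import math
import statistics


def euclidean(a, b):
    """Euclidean distance between two spectra of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kennard_stone(spectra, n_train):
    """Return (train_idx, test_idx) selected by the Kennard-Stone rule."""
    n = len(spectra)
    if not 2 <= n_train <= n:
        raise ValueError("n_train must be between 2 and the number of samples")
    dist = [[euclidean(spectra[i], spectra[j]) for j in range(n)] for i in range(n)]
    # Seed the training set with the two samples that are farthest apart.
    i0, j0 = max(
        ((i, j) for i in range(n) for j in range(i + 1, n)),
        key=lambda pair: dist[pair[0]][pair[1]],
    )
    selected = [i0, j0]
    remaining = [k for k in range(n) if k not in (i0, j0)]
    while len(selected) < n_train:
        # Add the candidate whose nearest already-selected sample is farthest away.
        best = max(remaining, key=lambda k: min(dist[k][s] for s in selected))
        selected.append(best)
        remaining.remove(best)
    return selected, remaining


def snv(spectrum):
    """Standard normal variate: centre and scale one spectrum by its own mean and SD."""
    mean = statistics.mean(spectrum)
    sd = statistics.stdev(spectrum)  # sample SD (n - 1), an assumed convention
    return [(x - mean) / sd for x in spectrum]


def rmse(y_true, y_pred):
    """Root mean square error; on a test set this corresponds to RMSEP."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def rpd(y_true, y_pred):
    """Residual predictive deviation, assumed here to be SD(reference) / RMSE."""
    return statistics.stdev(y_true) / rmse(y_true, y_pred)


# Toy two-point "spectra"; real NIR spectra here would have 1102 wavenumber points.
toy_spectra = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [5.0, 5.0], [0.5, 0.5]]
train_idx, test_idx = kennard_stone(toy_spectra, n_train=3)
```

Because Kennard-Stone picks samples that span the spectral space, the training set tends to cover the extremes of the data, which is one reason it is often paired with small calibration sets. A full pipeline would additionally need the CARS variable selection and PLS regression steps, which are omitted from this sketch.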
ZHU Li-wei, DU Qian-xi, TANG Guo-hong, LI Hong-you, ZHANG Xiao-na, CHEN Qing-fu, SHI Tao-xiong. Near-Infrared Modeling for Total Flavonoid and Protein Contents in Buckwheat Leaves Based on CARS Feature Extraction[J]. Spectroscopy and Spectral Analysis, 2025, 45(9): 2585
Received: Oct. 24, 2024
Accepted: Sep. 19, 2025
Published Online: Sep. 19, 2025
The Author Email: SHI Tao-xiong (shitaoxiong@126.com)