Spectroscopy and Spectral Analysis, Volume 41, Issue 11, 3331 (2021)

Variable Selection Methods in Spectral Data Analysis

Yan-kun LI1,*, Ru-nan DONG1, Jin ZHANG2, Ke-nan HUANG3, and Zhi-yi MAO4
Author Affiliations
  • 1. Department of Environmental Science and Engineering, North China Electric Power University, Hebei Key Lab of Power Plant Flue Gas Multi-Pollutants Control, Baoding 071003, China
  • 2. School of Food Science, Guizhou Medical University, Guiyang 550025, China
  • 3. The 82nd Army Group Hospital of the Chinese People's Liberation Army, Baoding 071000, China
  • 4. Tianjin Building Material Science Research Academy, Tianjin 300110, China
    Figures & Tables (6)
      • An overview of related works on variable/wavelength selection
      • The methods of WS and WIS
      • Comparisons of variable selection methods in the NIR-protein model for corn data
      • Illustration of filter (F), wrapper (W), and embedded (E) methods
    Table 1. PLS parameter-based variable selection methods

      | Method | First appearance [Ref.] | Characteristic (Merit and Drawback) |
      |---|---|---|
      | UVE (uninformative variable elimination) | Massart, 1996 [6] | Intuitive and practical; effectively eliminates the influence of non-objective factors. Random noise variables make the result unstable, and LOOCV makes the calculation inefficient. |
      | MC-UVE (Monte Carlo UVE) | Shao, 2008 [7] | Uses the Monte Carlo technique instead of LOOCV and adds no noise variables, giving high stability. Requires a predefined threshold and tends to select more variables. |
      | iPLS (interval PLS) | Norgaard, 2000 [8] | Focuses on choosing better sub-intervals. Tests only a series of adjacent, non-overlapping intervals, so it can miss more informative ones. |
      | MWPLS (moving window PLS) | Jiang, 2002 [9] | Considers all possible continuous intervals, but these may not be the optimal intervals. |
      | CARS (competitive adaptive reweighted sampling)-PLS | Liang, 2009 [10] | Selects fewer variables and latent variables. The reliability of PLS model parameters based on the full spectrum needs strengthening; low stability. |
      | VIP (variable importance in projection) | Wold, 1993 [11] | Accumulates the importance of each variable as reflected by the loading weight of each component; applicable when the number of independent variables exceeds the sample size. Requires probabilistic considerations regarding the VIP cutoff. |
      | RT (randomization test)-PLS | Fisher, 1935 [12] | Combines permutation with a statistical test, so the result is more reliable. Inefficient and time-consuming on large datasets. |
      | IVS (interactive variable selection) | Lindgren & Wold, 1994 [13] | Dimension-wise instead of model-wise: variable selection is carried out interactively for each PLS component. Large elements in the weight vector can sometimes suppress smaller values. |
      | IPW (iterative predictor weighting) [15] | Forina, 1999 [14] | The importance measure is used both to re-scale the original X-variables and to eliminate the least important ones. Time-consuming with many variables. |
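
      As an illustration of the PLS parameter-based methods in Table 1, the VIP scores of Wold [11] can be computed directly from the weights, scores, and y-loadings of a fitted PLS model. The sketch below uses scikit-learn's PLSRegression on synthetic data; the cutoff of 1.0 is a common rule of thumb, not something prescribed by the table.

      ```python
      import numpy as np
      from sklearn.cross_decomposition import PLSRegression

      def vip_scores(pls):
          """VIP score per variable from a fitted PLSRegression model."""
          T = pls.x_scores_    # (n_samples, n_components) latent scores
          W = pls.x_weights_   # (n_features, n_components) weights
          Q = pls.y_loadings_  # (n_targets, n_components) y-loadings
          p = W.shape[0]
          # Variance of y explained by each component: SS_a = q_a^2 * (t_a' t_a)
          ss = np.sum(Q ** 2, axis=0) * np.sum(T ** 2, axis=0)
          # Squared, column-normalized weights (w_ja / ||w_a||)^2
          w_norm2 = (W / np.linalg.norm(W, axis=0)) ** 2
          return np.sqrt(p * (w_norm2 @ ss) / ss.sum())

      rng = np.random.default_rng(0)
      X = rng.normal(size=(50, 20))
      # Only variables 3 and 7 carry signal in this synthetic example
      y = X[:, 3] + 0.5 * X[:, 7] + 0.1 * rng.normal(size=50)

      pls = PLSRegression(n_components=2).fit(X, y)
      vip = vip_scores(pls)
      selected = np.flatnonzero(vip > 1.0)  # keep variables with VIP above 1
      print(selected)
      ```

      A useful sanity check on any VIP implementation is that the mean of the squared scores equals exactly 1, since the p-weighted sum of normalized squared weights telescopes to p.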
    Table 2. Other common methods of spectral variable selection

      | Selection strategy | Representative methods [Ref.] | First appearance [Ref.] | Characteristic (Merit and Drawback) |
      |---|---|---|---|
      | Intelligent optimizing algorithms (IOA)-based | GA (genetic algorithm) | Holland, 1975 [43] | Returns to the mathematical essence of variable-combination optimization and retains the advantages of variable combinations. Too many combinations to optimize; usually needs more preset parameters; sometimes falls into local optima. |
      | | SA (simulated annealing) | Metropolis, 1953 [44] | |
      | | PSO (particle swarm optimization) | Eberhart & Kennedy, 1995 [45] | |
      | | ACO (ant colony optimization) | Colorni, 1991 [46] | |
      | | GWO (grey wolf optimizer) | Mirjalili, 2014 [47] | |
      | Model population analysis (MPA)-based | BOSS (bootstrapping soft shrinkage) | Liang, 2016 [48] | The traditional strategy of rigidly eliminating variables by a single index becomes a flexible strategy of changing weights, which preserves effective variables more safely. The random algorithm helps preserve combination effects among spectral variables, but it also makes the calculation more complicated. |
      | | VCPA (variable combination population analysis) | Liang, 2015 [49] | |
      | | VISSA (variable iterative space shrinkage approach) | Liang, 2014 [50] | |
      | | ICO (interval combination optimization) | Xiong & Min, 2016 [51] | |
      | | iRF (interval random frog) | Liang, 2013 [52] | |
      | Collinearity minimization-based | SPA (successive projections algorithm) [53, 54] | Araujo, 2001 [55] | Minimizes the influence of multicollinear variables on the model. Since every variable serves as a starting point in the optimization, the computational load is too large to suit small samples. |
      | | SR (stepwise regression) [56] | | |
      | Category model-based | LDA (linear discriminant analysis) | Fisher, 1936 [57] | The correlation between variables and model is preserved, and overall prediction accuracy is improved by combining different classification algorithms. Computational complexity is small, but the result is limited by the performance of the classification model. |
      | | ULDA (uncorrelated linear discriminant analysis) [58] | Jin, 2001 [59] | |
      | | RF (random forest) [60-62] | Breiman, 2001 [63] | |
      | | SVM (support vector machine) | Vapnik, 1995 [64] | |
      | Regularization method | LASSO (least absolute shrinkage and selection operator) [65] | Tibshirani, 1996 [66] | Parameter estimation and variable selection are realized simultaneously, and the method is fast; over-fitting can be avoided when the number of variables is large. A suitable penalty parameter must be chosen. |
      | | EN (elastic net) | Zou, 2003 [67] | |
      | | RR (ridge regression) | Hoerl & Kennard, 1998 [68] | |
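
      To make the "Regularization method" row of Table 2 concrete: the L1 penalty of LASSO drives the coefficients of uninformative wavelengths to exactly zero, so parameter estimation and variable selection happen in one fit. The sketch below uses scikit-learn's LassoCV on synthetic spectra; the data, sizes, and variable indices are illustrative assumptions, not from the paper.

      ```python
      import numpy as np
      from sklearn.linear_model import LassoCV

      rng = np.random.default_rng(1)
      X = rng.normal(size=(60, 100))   # 60 "spectra" x 100 "wavelengths"
      # Only wavelengths 10 and 40 are informative in this synthetic example
      y = 2.0 * X[:, 10] - 1.5 * X[:, 40] + 0.1 * rng.normal(size=60)

      # Cross-validation picks the penalty strength; selection falls out of coef_
      lasso = LassoCV(cv=5).fit(X, y)
      selected = np.flatnonzero(lasso.coef_)  # wavelengths with nonzero coefficients
      print(selected)
      ```

      The drawback noted in the table shows up here as the `cv` choice and the resulting penalty path: too weak a penalty lets noise wavelengths enter, too strong a penalty shrinks away genuine ones.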
    Paper Information

    Category: Original Article

    Received: Nov. 1, 2020

    Accepted: --

    Published Online: Dec. 17, 2021

    DOI: 10.3964/j.issn.1000-0593(2021)11-3331-08
