Acta Optica Sinica, Volume. 44, Issue 20, 2030001(2024)

Dimensionality Reduction Method for Compact Sample Distribution Using Multi-Step Diffusion Mapping

Zhonghai He1,2,*, Qiong Jia1, Zhanbo Feng1, and Xiaofang Zhang3
Author Affiliations
  • 1School of Control Engineering, Northeastern University at Qinhuangdao, Qinhuangdao 066004, Hebei, China
  • 2Hebei Key Laboratory of Micro-Nano Sensing, Qinhuangdao 066004, Hebei, China
  • 3School of Optics and Photonics, Beijing Institute of Technology, Beijing 100081, China
    Figures & Tables(9)
    Typical bandwidth-aggregate similarity plot
    Typical Shannon entropy variation curve with diffusion step
    Spectrogram of soybean meal
    Scatter plots of soybean meal samples with different values of diffusion step t. (a) t=1; (b) t=5; (c) t=150; (d) t=21
    Scatter plots produced by the different dimensionality reduction methods show that the point cloud from the multi-step diffusion mapping method is more conducive to showing the specificity of the new samples. (a) Soybean meal (PCA); (b) soybean meal (multi-step diffusion mapping method); (c) corn gluten meal (PCA); (d) corn gluten meal (multi-step diffusion mapping method)
    • Table 1. Automatic method for optimal bandwidth selection of kernel function based on the best linear fit degree

      Algorithm 1: bandwidth selection algorithm

      Input: sample set $\{X_N\}$ of N samples

      Output: the optimal bandwidth value of the kernel function, $\delta_{\mathrm{opt}}$

      Initialization procedure: plot the range of bandwidth values on the X-axis using a logarithmic scale, $\delta \in [10^{-4}, 10^{4}]$, and select 81 bandwidth value points with ratio $q = 10^{0.1}$: $\delta = \delta_1, \delta_2, \ldots, \delta_{81}$

      % Construct the similarity matrix and calculate the total similarity

      1 for a = 1:81

      2   $W_{i,j} = \exp\left(-\|x_i - x_j\|^2 / \delta_a\right)$

      3   $L(\delta_a) = \sum_{i}^{N} \sum_{j}^{N} W_{i,j}$

      4 end

      % Calculate the range of the bandwidth value, $\delta \in [\delta_L, \delta_U]$, where the subscripts L and U are the sequence numbers of the lower and upper bound elements in the $\delta$ set, respectively

      5 $\delta_L = \arg\{L(\delta) = N + 0.1(N^2 - N)\}$, $\delta_U = \arg\{L(\delta) = N + 0.9(N^2 - N)\}$

      % Fit a line by the two-point method and calculate the fitting error value of each data point

      6 for i = L+1 : U-1

      7   $\hat{y}_i = \dfrac{L(\delta_{i+1}) - L(\delta_{i-1})}{\lg \delta_{i+1} - \lg \delta_{i-1}} \left(\lg \delta_i - \lg \delta_{i-1}\right) + L(\delta_{i-1})$

      8   $C_i = \left| L(\delta_i) - \hat{y}_i \right|$

      9 end

      % Select the point with the smallest error value

      10 $\delta_{\mathrm{opt}} = \arg\min_i C_i$
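The steps of Algorithm 1 can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `bandwidth_grid` and `select_bandwidth` are hypothetical, samples are assumed to be equal-length sequences of floats, the error in step 8 is taken as an absolute value, and the bounds L and U are assumed to be the grid indices whose total similarity lies closest to N + 0.1(N² − N) and N + 0.9(N² − N).

```python
import math


def bandwidth_grid():
    """81 log-spaced bandwidths from 10^-4 to 10^4 with ratio 10^0.1."""
    return [10.0 ** (-4.0 + 0.1 * k) for k in range(81)]


def select_bandwidth(samples):
    """Return the grid bandwidth where L(delta) vs. lg(delta) is most linear."""
    n = len(samples)
    # Pairwise squared Euclidean distances, computed once and reused.
    d2 = [[sum((a - b) ** 2 for a, b in zip(xi, xj)) for xj in samples]
          for xi in samples]
    deltas = bandwidth_grid()
    lg = [math.log10(d) for d in deltas]

    # Steps 1-4: total similarity L(delta) for every candidate bandwidth.
    total = [sum(math.exp(-v / d) for row in d2 for v in row) for d in deltas]

    # Step 5 (assumption): nearest grid indices to the 10% and 90% levels.
    lo_level = n + 0.1 * (n * n - n)
    hi_level = n + 0.9 * (n * n - n)
    idx_l = min(range(len(deltas)), key=lambda k: abs(total[k] - lo_level))
    idx_u = min(range(len(deltas)), key=lambda k: abs(total[k] - hi_level))

    # Steps 6-10: two-point line through the neighbours of each point,
    # then keep the point with the smallest absolute fitting error.
    best_i, best_c = None, math.inf
    for i in range(idx_l + 1, idx_u):
        slope = (total[i + 1] - total[i - 1]) / (lg[i + 1] - lg[i - 1])
        y_hat = slope * (lg[i] - lg[i - 1]) + total[i - 1]
        c = abs(total[i] - y_hat)
        if c < best_c:
            best_i, best_c = i, c
    if best_i is None:
        raise ValueError("no grid point lies strictly between the bounds")
    return deltas[best_i]
```

For example, `select_bandwidth([[0, 0], [1, 0], [0, 1], [3, 3], [4, 3], [3, 4]])` returns one of the 81 grid values. For spectra with many samples, the double loop over pairs would normally be replaced by a vectorized distance computation.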

    • Table 2. Selection algorithm of optimal diffusion step number topt

      Algorithm 2: optimal diffusion step t selection algorithm

      Input: Markov probability transition matrix P, the maximum number of diffusion steps $t_{\max}$

      Output: the most appropriate number of diffusion steps $t_{\mathrm{opt}}$

      % Eigendecomposition of the matrix P

      1 $P x = \lambda x$ % calculate the N eigenvalues $\lambda_1, \lambda_2, \ldots, \lambda_N$

      % Normalize the powers of the eigenvalues of P and calculate the Shannon entropy function $H(t)$

      2 for t = 1 : $t_{\max}$

      3   $A_t = \sum_{j=1}^{N} \lambda_j^{t}$ % sum of the t-th powers of the N eigenvalues

      4   for i = 1:N

      5     $\eta_t^{i} = \lambda_i^{t} / A_t$

      6   end % calculate $\eta_t^{1}, \eta_t^{2}, \ldots, \eta_t^{N}$

      7   $H(t) = -\sum_{i=1}^{N} \eta_t^{i} \log \eta_t^{i}$

      8 end

      % Split H(t) at each point j in turn, fit the left and right sides of each split point linearly, and obtain the fitting error value

      9 for j = 2 : $t_{\max}$-1

      10   for m = 1:j

      11     calculate $y_j^{l}(t)$ by linear fitting with the least squares method

      12     $C_j^{l} = \sum_{m=1}^{j} \left| H(m) - y_j^{l}(m) \right|$

      13   end

      14   for n = j : $t_{\max}$-1

      15     calculate $y_j^{r}(t)$ by linear fitting with the least squares method

      16     $C_j^{r} = \sum_{n=j}^{t_{\max}} \left| H(n) - y_j^{r}(n) \right|$

      17   end

      18   $C_j = C_j^{l} + C_j^{r}$

      19 end

      % Select the point with the smallest sum of fitting error values

      20 $t_{\mathrm{opt}} = \arg\min_j C_j$
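Algorithm 2 can be illustrated with the short Python sketch below. It is an illustration under assumptions rather than the authors' code: the names `entropy_curve`, `line_fit_abs_error`, and `select_diffusion_step` are hypothetical, the eigenvalues of P are assumed to be given and non-negative, the logarithm is taken as natural (the base does not change the selected step), terms with zero weight are skipped using the convention 0·log 0 = 0, the fitting error is summed as absolute residuals, and the right-hand segment is assumed to run from j to t_max.

```python
import math


def entropy_curve(eigenvalues, t_max):
    """Shannon entropy H(t), t = 1..t_max, of the normalized eigenvalue powers."""
    curve = []
    for t in range(1, t_max + 1):
        powers = [lam ** t for lam in eigenvalues]
        a_t = sum(powers)                      # A_t = sum_j lambda_j^t
        eta = [p / a_t for p in powers]        # eta_t^i = lambda_i^t / A_t
        curve.append(-sum(e * math.log(e) for e in eta if e > 0.0))
    return curve


def line_fit_abs_error(ts, ys):
    """Sum of absolute residuals of an ordinary least-squares line through (ts, ys)."""
    n = len(ts)
    mean_t = sum(ts) / n
    mean_y = sum(ys) / n
    sxx = sum((t - mean_t) ** 2 for t in ts)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(ts, ys))
    slope = sxy / sxx if sxx > 0.0 else 0.0
    intercept = mean_y - slope * mean_t
    return sum(abs(y - (slope * t + intercept)) for t, y in zip(ts, ys))


def select_diffusion_step(curve):
    """Return the split point j (1-based step) with the smallest total fit error."""
    t_max = len(curve)
    ts = list(range(1, t_max + 1))
    best_j, best_c = None, math.inf
    for j in range(2, t_max):                  # j = 2 .. t_max - 1
        c_left = line_fit_abs_error(ts[:j], curve[:j])            # t = 1 .. j
        c_right = line_fit_abs_error(ts[j - 1:], curve[j - 1:])   # t = j .. t_max
        c = c_left + c_right
        if c < best_c:
            best_j, best_c = j, c
    if best_j is None:
        raise ValueError("need at least three entropy values")
    return best_j
```

A typical call would be `select_diffusion_step(entropy_curve(eigs, t_max))`, where `eigs` holds the eigenvalues of P, obtained for instance with a routine such as `numpy.linalg.eigvals`. The selected step is the knee of the entropy curve, where two straight lines describe the curve best.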

    • Table 3. Statistical parameters of the data set

      Data set | Component | Minimum value /% | Maximum value /% | Average value /% | Standard deviation /%
      Soybean meal | Moisture | 11.96 | 15.21 | 13.15 | 0.91
      Corn gluten meal | Moisture | 5.10 | 11.10 | 7.92 | 1.01
    • Table 4. RMSE of spectral prediction model

      Data set | Old sample | New sample 1 | New sample 2 | New sample 3
      Soybean meal | 0.2064 | 0.3261 | 0.4540 | 0.6391
      Corn gluten meal | 0.5935 | 0.6204 | 0.6878 | 0.8358

    Zhonghai He, Qiong Jia, Zhanbo Feng, Xiaofang Zhang. Dimensionality Reduction Method for Compact Sample Distribution Using Multi-Step Diffusion Mapping[J]. Acta Optica Sinica, 2024, 44(20): 2030001

    Paper Information

    Category: Spectroscopy

    Received: Apr. 8, 2024

    Accepted: Jun. 4, 2024

    Published Online: Oct. 12, 2024

    Author Email: Zhonghai He (professorhe@qq.com)

    DOI:10.3788/AOS240820

    CSTR:32393.14.AOS240820
