Laser & Optoelectronics Progress, Volume 61, Issue 22, 2237010 (2024)

Fine-Grained Image Classification Based on Feature Fusion and Ensemble Learning

Wenli Zhang1,2 and Wei Song1,2,*
Author Affiliations
  • 1School of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi 214122, Jiangsu, China
  • 2Jiangsu Provincial Engineering Laboratory of Pattern Recognition and Computational Intelligence, Jiangnan University, Wuxi 214122, Jiangsu, China
    Figures & Tables (17)
    Illustration of different mixup methods. (a) Input; (b) Cutout; (c) Mixup; (d) CutMix; (e) AGMix
    Framework of ViT
    Framework of FFEL-Net
    Framework of part relationship modeling module
    Framework of multi-expert part voting module
    PR curves of different ablation methods. (a) CUB-200-2011; (b) Stanford Dogs; (c) NABirds; (d) IP102
    Ablation results with different γ and η values. (a) CUB-200-2011; (b) Stanford Dogs; (c) NABirds; (d) IP102
    Visualization results with different γ values
    t-SNE visualization results on four datasets. (a) CUB-200-2011; (b) Stanford Dogs; (c) NABirds; (d) IP102
    Feature visualization results. (a) CUB-200-2011; (b) Stanford Dogs; (c) NABirds; (d) IP102
    • Table 1. Introduction of four datasets

      Dataset | Classes | Train | Validation
      CUB-200-2011 | 200 | 5994 | 5794
      Stanford Dogs | 120 | 12000 | 8000
      NABirds | 555 | 23929 | 24633
      IP102 | 102 | 45095 | 22619
    • Table 2. Ablation results of different modules on four datasets

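      MLFF | MPV | AGMix | CUB-200-2011 | Stanford Dogs | NABirds | IP102
      - | - | - | 90.52 | 91.57 | 89.93 | 73.41
      ✓ | - | - | 91.25 | 92.34 | 90.32 | 74.89
      - | ✓ | - | 91.16 | 92.18 | 90.25 | 74.62
      - | - | ✓ | 91.39 | 92.42 | 90.46 | 74.98
      ✓ | ✓ | - | 91.53 | 92.68 | 90.57 | 75.57
      ✓ | - | ✓ | 91.77 | 92.83 | 90.82 | 75.84
      - | ✓ | ✓ | 91.66 | 92.76 | 90.64 | 75.74
      ✓ | ✓ | ✓ | 91.92 | 93.10 | 90.98 | 76.21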
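As a quick arithmetic check on Table 2, the snippet below computes the accuracy gain of the full configuration (all three modules enabled) over the plain ViT baseline on each dataset. The values are copied from the table; the dictionary layout and variable names are only an illustrative choice.

```python
# Accuracy (%) copied from Table 2: ViT baseline (no modules) vs. full model
baseline = {"CUB-200-2011": 90.52, "Stanford Dogs": 91.57, "NABirds": 89.93, "IP102": 73.41}
full = {"CUB-200-2011": 91.92, "Stanford Dogs": 93.10, "NABirds": 90.98, "IP102": 76.21}

# Absolute gain in percentage points, rounded to the table's precision
gains = {name: round(full[name] - baseline[name], 2) for name in baseline}

for name, gain in gains.items():
    print(f"{name}: +{gain:.2f} pp")
```

The largest gain (2.80 percentage points) is on IP102, the dataset with the lowest baseline accuracy; the smallest (1.05 points) is on NABirds.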
    • Table 3. Ablation results with different K values

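      Hyper-parameter | Accuracy on CUB-200-2011 /% | Accuracy on Stanford Dogs /% | Accuracy on NABirds /% | Accuracy on IP102 /% | Parameters /10^6
      K=2 | 91.53 | 92.70 | 90.50 | 75.74 | 93.21
      K=3 | 91.92 | 93.10 | 90.98 | 76.21 | 93.79
      K=4 | 91.66 | 92.83 | 90.62 | 75.83 | 94.38
      K=5 | 91.35 | 92.52 | 90.35 | 75.44 | 94.97
      K=6 | 91.09 | 92.30 | 90.22 | 75.07 | 95.57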
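Table 3 also lets one estimate how model size grows with K. The sketch below takes successive differences of the parameter column and locates the best K on CUB-200-2011; it only restates the table's numbers and makes no assumption about which components K controls.

```python
# Parameter counts (x10^6) and CUB-200-2011 accuracy (%) copied from Table 3
params = {2: 93.21, 3: 93.79, 4: 94.38, 5: 94.97, 6: 95.57}
cub_acc = {2: 91.53, 3: 91.92, 4: 91.66, 5: 91.35, 6: 91.09}

ks = sorted(params)
# Extra parameters (x10^6) added by each unit increase of K
deltas = [round(params[b] - params[a], 2) for a, b in zip(ks, ks[1:])]
# Average cost per unit of K over the whole range K=2..6
mean_delta = (params[ks[-1]] - params[ks[0]]) / (len(ks) - 1)

best_k = max(cub_acc, key=cub_acc.get)
print(deltas, round(mean_delta, 2), best_k)
```

Each step in K costs roughly 0.59 × 10^6 parameters, while accuracy peaks at K = 3 on every dataset in the table and declines afterwards; the K = 3 row matches the full-model row of Table 2 and the FFEL-Net parameter count in Table 7 (93.79 × 10^6).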
    • Table 4. Comparative results of different mixup methods

      Method | CUB-200-2011 | Stanford Dogs | NABirds | IP102
      ViT | 90.52 | 91.57 | 89.93 | 73.41
      + Cutout | 90.74 | 91.86 | 90.01 | 73.97
      + Mixup | 91.12 | 92.15 | 90.22 | 74.48
      + CutMix | 91.28 | 92.23 | 90.31 | 74.75
      + AGMix | 91.39 | 92.42 | 90.46 | 74.98
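Table 4 and Fig. 1 compare AGMix with three standard augmentations. For reference, the following is a minimal NumPy sketch of Cutout, Mixup and CutMix as they are usually described in their original papers, operating on a single pair of H×W×C images with one-hot labels. It does not reproduce AGMix, whose mixing rule is defined in the paper body rather than in this section; the box size, the Beta(1, 1) draw for λ, and the helper names (`_box`, `cutout`, `mixup`, `cutmix`) are illustrative assumptions.

```python
import numpy as np


def _box(h, w, box_h, box_w, rng):
    """Random box centred at a uniformly drawn pixel, clipped to the image."""
    cy, cx = int(rng.integers(h)), int(rng.integers(w))
    top, bottom = max(cy - box_h // 2, 0), min(cy + box_h // 2, h)
    left, right = max(cx - box_w // 2, 0), min(cx + box_w // 2, w)
    return top, bottom, left, right


def cutout(x, size, rng):
    """Cutout: zero out one square region; the label is unchanged."""
    top, bottom, left, right = _box(*x.shape[:2], size, size, rng)
    out = x.copy()
    out[top:bottom, left:right] = 0
    return out


def mixup(xa, ya, xb, yb, lam):
    """Mixup: pixel-wise convex combination of two images and their labels."""
    return lam * xa + (1 - lam) * xb, lam * ya + (1 - lam) * yb


def cutmix(xa, ya, xb, yb, lam, rng):
    """CutMix: paste a box from xb into xa; mix labels by the pasted area."""
    h, w = xa.shape[:2]
    ratio = np.sqrt(1 - lam)  # side ratio so the box area is about (1 - lam)
    top, bottom, left, right = _box(h, w, int(h * ratio), int(w * ratio), rng)
    out = xa.copy()
    out[top:bottom, left:right] = xb[top:bottom, left:right]
    # Recompute lambda from the clipped box so labels match the actual pixels
    lam_adj = 1 - (bottom - top) * (right - left) / (h * w)
    return out, lam_adj * ya + (1 - lam_adj) * yb


rng = np.random.default_rng(0)
xa, xb = np.ones((32, 32, 3)), np.zeros((32, 32, 3))
ya, yb = np.array([1.0, 0.0]), np.array([0.0, 1.0])
lam = rng.beta(1.0, 1.0)

x_cut = cutout(xa, 8, rng)
x_mix, y_mix = mixup(xa, ya, xb, yb, lam)
x_cm, y_cm = cutmix(xa, ya, xb, yb, lam, rng)
```

With an all-ones and an all-zeros image, the CutMix label weight for the first class equals the fraction of pixels that still come from `xa`, which makes the area-based relabelling easy to verify.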
    • Table 5. Comparison results on CUB-200-2011, Stanford Dogs and NABirds datasets

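      Method | Backbone | CUB-200-2011 | Stanford Dogs | NABirds
      API-Net [27] | ResNet-50 | 87.7 | 88.3 | 86.2
      FDL [7] | ResNet-50 | 88.6 | 85.0 | –
      MGE-CNN [9] | ResNet-50 | 88.5 | – | 88.0
      MRDMN [10] | ResNet-50 | 88.8 | 89.1 | –
      PMG-V2 [28] | ResNet-50 | 89.9 | 89.1 | 87.6
      ViT [11] | ViT-B/16 | 90.5 | 91.6 | 89.9
      TransFG [12] | ViT-B/16 | 91.7 | 92.3 | 90.8
      TPSKG [14] | ViT-B/16 | 91.3 | 92.5 | 90.3
      FFVT [29] | ViT-B/16 | 91.6 | 91.5 | –
      SM-ViT [30] | ViT-B/16 | 91.6 | 92.3 | 90.5
      AA-Trans [15] | ViT-B/16 | 91.4 | 90.8 | 90.2
      IELT [5] | ViT-B/16 | 91.8 | 91.8 | 90.8
      FFEL-Net | ViT-B/16 | 91.92 | 93.10 | 90.98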
    • Table 6. Comparison results on IP102 dataset

      Method | Backbone | IP102 /%
      API-Net [27] | ResNet-50 | 56.9
      DMF-ResNet [31] | ResNet-50 | 59.2
      SGDL-Net [32] | DenseNet121 | 72.7
      MS-ALN+DL [33] | ResNet-50 | 74.6
      ViT [11] | ViT-B/16 | 73.4
      FFVT [29] | ViT-B/16 | 74.6
      TransFG [12] | ViT-B/16 | 74.8
      AA-Trans [15] | ViT-B/16 | 75.0
      IELT [5] | ViT-B/16 | 75.4
      FFEL-Net | ViT-B/16 | 76.21
    • Table 7. Analysis of intensity values before

      Method | Backbone | Parameters /10^6 | Inference time /s | FLOPs /10^9 | ChinaBirds /%
      ViT [11] | ViT-B/16 | 86.42 | 0.011 | 67.46 | 95.86
      TransFG [12] | ViT-B/16 | 86.42 | 0.147 | 108.40 | 97.35
      FFVT [29] | ViT-B/16 | 86.42 | 0.010 | 67.73 | 97.12
      AA-Trans [15] | ViT-B/16 | 86.42 | 0.011 | 65.97 | 96.53
      IELT [5] | ViT-B/16 | 99.98 | 0.013 | 72.60 | 97.09
      FFEL-Net | ViT-B/16 | 93.79 | 0.012 | 68.74 | 97.83
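From Table 7, the cost of FFEL-Net relative to the plain ViT baseline can be read off directly. The snippet below restates those differences using only the table's values; the dictionary keys are illustrative names.

```python
# Values copied from Table 7 (both models use a ViT-B/16 backbone)
vit = {"params_m": 86.42, "gflops": 67.46, "acc": 95.86}
ffel = {"params_m": 93.79, "gflops": 68.74, "acc": 97.83}

extra_params = ffel["params_m"] - vit["params_m"]   # x10^6 parameters
extra_flops = ffel["gflops"] - vit["gflops"]        # x10^9 FLOPs
rel_params = 100 * extra_params / vit["params_m"]   # % increase in parameters
rel_flops = 100 * extra_flops / vit["gflops"]       # % increase in FLOPs
acc_gain = ffel["acc"] - vit["acc"]                 # percentage points on ChinaBirds

print(f"+{extra_params:.2f}M params ({rel_params:.1f}%), "
      f"+{extra_flops:.2f} GFLOPs ({rel_flops:.1f}%), +{acc_gain:.2f} pp")
```

In other words, about 8.5% more parameters and 1.9% more FLOPs accompany a 1.97-point gain on ChinaBirds, with inference time nearly unchanged (0.012 s vs. 0.011 s).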
    Citation

    Wenli Zhang, Wei Song. Fine-Grained Image Classification Based on Feature Fusion and Ensemble Learning[J]. Laser & Optoelectronics Progress, 2024, 61(22): 2237010

    Paper Information

    Category: Digital Image Processing

    Received: Feb. 28, 2024

    Accepted: Apr. 11, 2024

    Published Online: Nov. 19, 2024

    Corresponding author email: Wei Song (songwei@jiangnan.edu.cn)

    DOI: 10.3788/LOP240759

    CSTR: 32186.14.LOP240759
