Laser & Optoelectronics Progress, Vol. 61, Issue 8, 0837015 (2024)

Few-Shot Object Detection Based on Association and Discrimination

Jianli Jia1,2,3, Huiyan Han1,2,3,*, Liqun Kuang1,2,3, Fangzheng Han1,2,3, Xinyi Zheng1,2,3, and Xiuquan Zhang1,2,3
Author Affiliations
  • 1School of Computer Science and Technology, North University of China, Taiyuan 030051, Shanxi, China
  • 2Shanxi Key Laboratory of Machine Vision and Virtual Reality, Taiyuan 030051, Shanxi, China
  • 3Shanxi Province’s Vision Information Processing and Intelligent Robot Engineering Research Center, Taiyuan 030051, Shanxi, China
    Figures & Tables (21)
    Basic structure of Faster R-CNN
    Structure of TFA two-stage fine-tuning method
    Decision boundary of TFA fine-tuning stage
    Decision boundary of association step
    Decision boundary of discrimination step
    Steps of the association stage
    Steps of the discrimination stage
    Dynamic R-CNN. (a) DLA; (b) DSL
    ECA module
    Prediction results of FSAD and TFA. (a) FSAD; (b) TFA
    Prediction results of different algorithms. (a) MPSR; (b) Retentive R-CNN; (c) DiGeo; (d) HTRPN; (e) FSAD
    Coexisting instances (left: semantic similarity; right: visual similarity)
  • Table 1. Experimental parameters and their values

      | Parameter | Learning rate | Momentum | Weight decay | Batch size |
      | Value     | 0.001         | 0.9      | 0.0001       | 16         |
  • Table 2. Number of training iterations under different K values

      | K                    | 1    | 2    | 3     | 5     | 10    |
      | Number of iterations | 4000 | 8000 | 12000 | 16000 | 20000 |
  • Table 3. nAP50 of different methods on PASCAL VOC dataset

      | Method              | Backbone    | Novel Split 1            | Novel Split 2            | Novel Split 3            |
      |                     |             | K=1  K=2  K=3  K=5  K=10 | K=1  K=2  K=3  K=5  K=10 | K=1  K=2  K=3  K=5  K=10 |
      | LSTD[28]            | VGG-16      | 8.2  1.0  12.4 29.1 38.5 | 11.4 3.8  5.0  15.7 31.0 | 12.6 8.5  15.0 27.3 36.3 |
      | YOLOv2-ft[29]       | YOLO V2     | 6.6  10.7 12.5 24.8 38.6 | 12.5 4.2  11.6 16.1 33.9 | 13.0 15.9 15.0 32.2 38.4 |
      | FSRW[3]             | YOLO V2     | 14.8 15.5 26.7 33.9 47.2 | 15.7 15.3 22.7 30.1 40.5 | 21.3 25.6 28.4 42.8 45.9 |
      | MetaDet[29]         | YOLO V2     | 17.1 19.1 28.9 35.0 48.8 | 18.2 20.6 25.9 30.6 41.5 | 20.1 22.3 27.9 41.9 42.9 |
      | RepMet[6]           | InceptionV3 | 26.1 32.9 34.4 38.6 41.3 | 17.2 22.1 23.4 28.3 35.8 | 27.5 31.1 31.5 34.4 37.2 |
      | FRCN-ft[29]         | FRCN-R101   | 13.8 19.6 32.8 41.5 45.6 | 7.9  15.3 26.2 31.6 39.1 | 9.8  11.3 19.1 35.0 45.1 |
      | FRCN+FPN-ft[7]      | FRCN-R101   | 8.2  20.3 29.0 40.1 45.5 | 13.4 20.6 28.6 32.4 38.8 | 19.6 20.8 28.7 42.2 42.1 |
      | MetaDet[29]         | FRCN-R101   | 18.9 20.6 30.2 36.8 49.6 | 21.8 23.1 27.8 31.7 43.0 | 20.6 23.9 29.4 43.9 44.1 |
      | Meta R-CNN[4]       | FRCN-R101   | 19.9 25.5 35.0 45.7 51.5 | 10.4 19.4 29.6 34.8 45.4 | 14.3 18.2 27.5 41.2 48.1 |
      | TFA w/fc[7]         | FRCN-R101   | 36.8 29.1 43.6 55.7 57.0 | 18.2 29.0 33.4 35.5 39.0 | 27.7 33.6 42.5 48.7 50.2 |
      | TFA w/cos[7]        | FRCN-R101   | 39.8 36.1 44.7 55.7 56.0 | 23.5 26.9 34.1 35.1 39.1 | 30.8 34.8 42.8 49.5 49.8 |
      | MPSR[8]             | FRCN-R101   | 41.7 —    51.4 55.2 61.8 | 24.4 —    39.2 39.9 47.8 | 35.6 —    42.3 48.0 49.7 |
      | SRR-FSD[30]         | FRCN-R101   | 47.8 50.5 51.3 55.2 56.8 | 32.5 35.3 39.1 40.8 43.8 | 40.1 41.5 44.3 46.9 46.4 |
      | DiGeo[26]           | FRCN-R101   | 37.9 39.4 48.5 58.6 61.5 | 26.6 28.9 41.9 42.1 49.1 | 30.4 40.1 46.9 52.7 54.7 |
      | FSCE[9]             | FRCN-R101   | 44.2 43.8 51.4 61.9 63.4 | 27.3 29.5 43.5 44.2 50.2 | 37.2 41.9 47.5 54.6 58.5 |
      | Retentive R-CNN[25] | FRCN-R101   | 42.4 45.8 45.9 53.7 56.1 | 21.7 27.8 35.2 37.0 40.3 | 30.2 37.8 43.0 49.7 50.1 |
      | HTRPN[27]           | FRCN-R101   | 47.0 44.8 53.4 62.9 65.2 | 29.8 32.6 46.3 47.7 53.0 | 40.1 45.9 49.6 57.0 59.7 |
      | FSAD (ours)         | FRCN-R101   | 50.5 54.7 54.6 57.6 62.2 | 31.4 35.5 39.2 42.5 45.2 | 46.1 46.3 47.3 54.8 59.0 |
  • Table 4. Parameter quantity comparison

      | Method | Total params | Trainable params | Non-trainable params |
      | TFA    | 60.3         | 0.1              | 60.2                 |
      | FSCE   | 60.3         | 60.1             | 0.2                  |
      | DiGeo  | 76.4         | 15.0             | 61.4                 |
      | HTRPN  | 76.5         | 76.3             | 0.2                  |
      | FSAD   | 60.4         | 17.9             | 42.5                 |
  • Table 5. Effectiveness of different components of FSAD

      | Association | Disentangling | Margin | nAP50 /%       |
      |             |               |        | K=1  K=3  K=5  |
      | ×           | ×             | ×      | 41.3 46.3 53.7 |
      | √           | ×             | ×      | 42.4 46.8 55.2 |
      | ×           | √             | ×      | 42.4 47.3 54.1 |
      | √           | √             | ×      | 44.9 50.3 56.8 |
      | ×           | ×             | √      | 46.3 48.8 56.4 |
      | √           | √             | √      | 50.5 54.6 57.6 |
  • Table 6. Effectiveness of modules in the association and discrimination stages

      | Dynamic RoI head | ECA | nAP50 /%       |
      |                  |     | K=1  K=3  K=5  |
      | ×                | ×   | 43.2 49.4 54.4 |
      | √                | ×   | 44.9 51.3 56.0 |
      | ×                | √   | 46.7 53.0 56.8 |
      | √                | √   | 50.5 54.6 57.6 |
  • Table 7. Comparison of different allocation strategies (without using margin loss); each cell gives the base class associated with the novel class in the column header

      | Strategy     | bird      | bus   | cow   | motorbike | sofa        | nAP50 /% |
      | random       | person    | boat  | horse | aeroplane | sheep       | 39.6     |
      | human        | aeroplane | train | sheep | bicycle   | chair       | 44.1     |
      | visual       | dog       | car   | horse | person    | chair       | 43.4     |
      | top2         | dog       | car   | sheep | tv        | diningtable | 41.2     |
      | top1         | horse     | train | horse | bicycle   | chair       | 44.3     |
      | top1 w/o dup | dog       | train | horse | bicycle   | chair       | 44.9     |
  • Table 8. Performance comparison of different margin losses

      | Margin          | nAP50 /% |
      | TFA             | 41.3     |
      | CosFace         | 38.9     |
      | ArcFace         | 37.9     |
      | CosFace (novel) | 44.2     |
      | ArcFace (novel) | 44.3     |
      | Ours            | 46.3     |
  • Table 9. Comparison of visual similarity and semantic similarity

      | Metric   | Novel Split 1  | Novel Split 2  | Novel Split 3  |
      |          | K=1  K=3  K=5  | K=1  K=3  K=5  | K=1  K=3  K=5  |
      | Visual   | 43.3 49.3 56.4 | 22.5 37.2 39.3 | 31.8 43.1 50.7 |
      | Semantic | 44.9 50.3 56.8 | 26.1 38.5 40.1 | 37.1 45.0 51.5 |
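    The best-performing allocation strategy in Table 7, "top1 w/o dup", pairs each novel class with its most similar base class while forbidding a base class from being reused by two novel classes. A minimal greedy sketch of that idea follows; the function name and the similarity scores are illustrative assumptions, not taken from the paper.

    ```python
    def associate_top1_without_duplicates(similarity):
        """Greedy 'top1 without duplicates' association sketch.

        similarity: dict mapping (novel_class, base_class) -> similarity score.
        Returns a dict novel_class -> base_class where no base class is reused:
        the most similar available pair is committed first, then the next, etc.
        """
        # Consider all candidate pairs from most to least similar.
        pairs = sorted(similarity.items(), key=lambda kv: kv[1], reverse=True)
        assignment, used_bases = {}, set()
        for (novel, base), _score in pairs:
            # Skip pairs whose novel class is already matched or whose
            # base class was claimed by a more similar novel class.
            if novel not in assignment and base not in used_bases:
                assignment[novel] = base
                used_bases.add(base)
        return assignment

    # Placeholder similarity scores for illustration only.
    sim = {
        ("bird", "aeroplane"): 0.90,
        ("cow", "horse"): 0.85,
        ("bus", "train"): 0.80,
        ("bus", "aeroplane"): 0.70,
        ("cow", "dog"): 0.60,
    }
    result = associate_top1_without_duplicates(sim)
    ```

    With these placeholder scores, "bus" cannot take "aeroplane" because the more similar pair ("bird", "aeroplane") claims it first, so "bus" falls back to "train"; a plain top-1 strategy (Table 7, row "top1") would allow such collisions.
    
    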
    Citation

    Jianli Jia, Huiyan Han, Liqun Kuang, Fangzheng Han, Xinyi Zheng, Xiuquan Zhang. Few-Shot Object Detection Based on Association and Discrimination[J]. Laser & Optoelectronics Progress, 2024, 61(8): 0837015

    Paper Information

    Category: Digital Image Processing

    Received: Jul. 5, 2023

    Accepted: Aug. 22, 2023

    Published Online: Apr. 16, 2024

    Author email: Huiyan Han (hhy980344@163.com)

    DOI: 10.3788/LOP231658
