Acta Photonica Sinica, Vol. 52, Issue 1, 0110002 (2023)

Object Detection Algorithm Based on Dual-modal Fusion Network

Ying SUN1,2, Zhiqiang HOU1,2,*, Chen YANG1,2, Sugang MA1,2, and Jiulun FAN1
Author Affiliations
  • 1. School of Computer Science and Technology, Xi'an University of Posts & Telecommunications, Xi'an 710121, China
  • 2. Shaanxi Provincial Key Laboratory of Network Data Analysis and Intelligent Processing, Xi'an 710121, China
    Figures & Tables (13)
    Overall algorithm architecture
    Dual-modal encoder structure
    Gated fusion network structure
    P-R curves of the two models with different modal inputs
    Detection results on the KAIST dataset
    Detection results on the GIR dataset
    • Table 1. Detector performance for different input image-pair sizes on the n-model

      Algorithm | Resolution | AP0.5:0.95 | AP0.5
      Ours-n    | 416×416    | 30.5       | 70
      Ours-n    | 512×512    | 32.5       | 73.1
      Ours-n    | 608×608    | 32.9       | 73.3
      Ours-n    | 640×640    | 33.3       | 73.8
    • Table 2. Detector performance for different input image-pair sizes on the s-model

      Algorithm | Resolution | AP0.5:0.95 | AP0.5
      Ours-s    | 416×416    | 31.1       | 71
      Ours-s    | 512×512    | 31.9       | 72.7
      Ours-s    | 608×608    | 34.3       | 73.9
      Ours-s    | 640×640    | 35.2       | 74.5
    • Table 3. Ablation experimental results of different models on the KAIST dataset

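      Method       | Encoder-VS | Encoder-IR | Gated Fusion | Input | AP0.5:0.95 | AP0.5 | FPS
      YOLOv5-n     |            |            |              | VS    | 24.8       | 58.7  | 158.7
      YOLOv5-n     |            |            |              | IR    | 31.6       | 71    | 158.7
      YOLOv5-n-EVS |            |            |              | VS    | 25         | 59.1  | 125
      YOLOv5-n-EIR |            |            |              | IR    | 31.8       | 71.3  | 125
      Ours-n       |            |            |              | VS+IR | 33.3       | 73.8  | 117.6
      YOLOv5-s     |            |            |              | VS    | 26.7       | 59.8  | 112.4
      YOLOv5-s     |            |            |              | IR    | 32         | 71.5  | 112.4
      YOLOv5-s-EVS |            |            |              | VS    | 26.9       | 60.2  | 107.5
      YOLOv5-s-EIR |            |            |              | IR    | 32.2       | 71.9  | 107.5
      Ours-s       |            |            |              | VS+IR | 35.2       | 74.5  | 102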
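
As a reading aid for Table 3, the following minimal sketch computes how much the fused n-model gains over its single-modality YOLOv5-n baselines on KAIST, and what that costs in frame rate. The numbers are copied from the table; the dictionary layout and the helper name `gain_over` are illustrative assumptions and do not come from the paper.

```python
# Values copied from Table 3 (KAIST): (AP0.5:0.95, AP0.5, FPS).
# The data layout and the helper name are illustrative, not from the paper.
KAIST_N = {
    "YOLOv5-n (VS)": (24.8, 58.7, 158.7),
    "YOLOv5-n (IR)": (31.6, 71.0, 158.7),
    "Ours-n (VS+IR)": (33.3, 73.8, 117.6),
}


def gain_over(results, fused, baseline):
    """Return (delta AP0.5:0.95, delta AP0.5, FPS ratio) of fused vs. baseline."""
    f_ap, f_ap50, f_fps = results[fused]
    b_ap, b_ap50, b_fps = results[baseline]
    return (
        round(f_ap - b_ap, 1),      # gain in AP0.5:0.95, percentage points
        round(f_ap50 - b_ap50, 1),  # gain in AP0.5, percentage points
        round(f_fps / b_fps, 2),    # fraction of the baseline frame rate retained
    )


if __name__ == "__main__":
    for name in ("YOLOv5-n (VS)", "YOLOv5-n (IR)"):
        d_ap, d_ap50, fps_ratio = gain_over(KAIST_N, "Ours-n (VS+IR)", name)
        print(f"Ours-n vs {name}: +{d_ap} AP0.5:0.95, +{d_ap50} AP0.5, {fps_ratio:.2f}x FPS")
```

The same helper can be applied to the s-model rows of the table by swapping in their values.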
    • Table 4. Ablation experimental results of different models on the GIR dataset

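      Method       | Encoder-VS | Encoder-IR | Gated Fusion | Input | AP0.5:0.95 | AP0.5 | FPS
      YOLOv5-n     |            |            |              | VS    | 48.4       | 88.8  | 158.7
      YOLOv5-n     |            |            |              | IR    | 36.3       | 75.5  | 158.7
      YOLOv5-n-EVS |            |            |              | VS    | 49.4       | 89.1  | 105.3
      YOLOv5-n-EIR |            |            |              | IR    | 36.4       | 76.3  | 105.3
      Ours-n       |            |            |              | VS+IR | 49.7       | 89.8  | 101
      YOLOv5-s     |            |            |              | VS    | 51.4       | 89.9  | 111.1
      YOLOv5-s     |            |            |              | IR    | 36.6       | 76.8  | 111.1
      YOLOv5-s-EVS |            |            |              | VS    | 51.9       | 90.1  | 91.7
      YOLOv5-s-EIR |            |            |              | IR    | 36.7       | 77    | 91.7
      Ours-s       |            |            |              | VS+IR | 52.2       | 90.5  | 85.5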
    • Table 5. Detection accuracy of the proposed algorithm and the baseline algorithms (AP0.5, %)

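      Class      | Ours-n | YOLOv5-n-VS | YOLOv5-n-IR | Ours-s | YOLOv5-s-VS | YOLOv5-s-IR
      Person     | 90.7   | 91.2        | 84.0        | 91.7   | 91.7        | 85.4
      Dog        | 99.5   | 99.5        | 99.5        | 99.5   | 99.5        | 91.6
      Car        | 95.4   | 95.2        | 94.3        | 95.8   | 95.1        | 94.7
      Bicycle    | 80.4   | 83.7        | 70.7        | 80.8   | 84.7        | 72.8
      Plant      | 85.6   | 84.5        | 79.1        | 86.4   | 87.0        | 76.0
      Motorcycle | 82.8   | 82.0        | 76.1        | 83.9   | 82.4        | 77.7
      Umbrella   | 86.0   | 87.8        | 70.5        | 85.7   | 86.6        | 76.1
      Kite       | 93.6   | 82.9        | 64.6        | 94.4   | 89.2        | 67.6
      Toy        | 95.6   | 96.3        | 86.7        | 96.4   | 97.0        | 83.7
      Ball       | 88.5   | 84.7        | 29.5        | 90.7   | 85.5        | 42.1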
    • Table 6. Comparative experimental results on the KAIST dataset

      Input | Algorithm           | Backbone        | Resolution | AP0.5:0.95 | AP0.5 | FPS
      VS    | Faster R-CNN (2015) | ResNet-50       | 1000×600   | 24.2       | 58.3  | 15.2
      VS    | SSD (2016)          | VGG-16          | 512×512    | 18.1       | 48.2  | 38.1
      VS    | RetinaNet (2017)    | ResNet-50       | 1333×800   | 22.5       | 57.7  | 16.6
      VS    | YOLOv3 (2018)       | DarkNet-53      | 416×416    | 18.3       | 46.7  | 56.2
      VS    | FCOS (2019)         | ResNet-50       | 1333×800   | 22.7       | 56.7  | 18.3
      VS    | ATSS (2020)         | ResNet-50       | 1333×800   | 24.3       | 57.8  | 17
      VS    | YOLOv4 (2020)       | CSPDarkNet-53   | 416×416    | 23.7       | 57.4  | 55
      VS    | YOLOX-s (2021)      | Modified CSP v5 | 416×416    | 27         | 61.1  | 48.4
      VS    | YOLOX-m (2021)      | Modified CSP v5 | 416×416    | 27.7       | 61.8  | 40.3
      VS    | YOLOF (2021)        | ResNet-50       | 1333×800   | 22.2       | 54.1  | 25.7
      VS    | YOLOv5-n (2020)     | Modified CSP v5 | 640×640    | 24.8       | 58.7  | 158.7
      VS    | YOLOv5-s (2020)     | Modified CSP v5 | 640×640    | 26.4       | 59.8  | 112.4
      VS    | YOLOv5-n-EVS        | Modified CSP v5 | 640×640    | 25         | 59.1  | 125
      VS    | YOLOv5-s-EVS        | Modified CSP v5 | 640×640    | 26.9       | 60.2  | 107.5
      IR    | Faster R-CNN (2015) | ResNet-50       | 1000×600   | 28.8       | 68.6  | 12
      IR    | SSD (2016)          | VGG-16          | 512×512    | 23.2       | 60.9  | 34
      IR    | RetinaNet (2017)    | ResNet-50       | 1333×800   | 27.8       | 68.2  | 14.1
      IR    | YOLOv3 (2018)       | DarkNet-53      | 416×416    | 25.3       | 63.6  | 37
      IR    | FCOS (2019)         | ResNet-50       | 1333×800   | 29.6       | 69.4  | 14
      IR    | ATSS (2020)         | ResNet-50       | 1333×800   | 29         | 69    | 13.8
      IR    | YOLOv4 (2020)       | CSPDarkNet-53   | 416×416    | 27.4       | 68.5  | 52.6
      IR    | YOLOX-s (2021)      | Modified CSP v5 | 416×416    | 32.8       | 72.1  | 45
      IR    | YOLOX-m (2021)      | Modified CSP v5 | 416×416    | 33.5       | 73.1  | 40
      IR    | YOLOF (2021)        | ResNet-50       | 1333×800   | 27.3       | 65.6  | 25
      IR    | YOLOv5-n (2020)     | Modified CSP v5 | 640×640    | 31.6       | 71    | 158.7
      IR    | YOLOv5-s (2020)     | Modified CSP v5 | 640×640    | 32         | 71.5  | 112.4
      IR    | YOLOv5-n-EIR        | Modified CSP v5 | 640×640    | 31.8       | 71.3  | 125
      IR    | YOLOv5-s-EIR        | Modified CSP v5 | 640×640    | 32.2       | 71.9  | 107.5
      VS+IR | MMTOD (2019) [18]   | ResNet-101      | 1000×600   | 31.1       | 70.7  | 13.2
      VS+IR | CMDet (2021) [37]   | ResNet-101      | 640×512    | 28.3       | 68.4  | 25.3
      VS+IR | RISNet (2022) [38]  | DarkNet-53      | 416×416    | 33.1       | 72.7  | 23
      VS+IR | Ours-n              | Modified CSP v5 | 640×640    | 33.3       | 73.8  | 117.6
      VS+IR | Ours-s              | Modified CSP v5 | 640×640    | 35.2       | 74.5  | 102
    • Table 7. Comparative experimental results on the GIR dataset

      Input | Algorithm          | Backbone        | Resolution | AP0.5:0.95 | AP0.5 | FPS
      VS    | YOLOv3 (2018)      | DarkNet-53      | 416×416    | 41.2       | 85.7  | 50
      VS    | FCOS (2019)        | ResNet-50       | 1333×800   | 40.4       | 84    | 16
      VS    | ATSS (2020)        | ResNet-50       | 1333×800   | 47.1       | 87.1  | 14
      VS    | YOLOv4 (2020)      | CSPDarkNet-53   | 416×416    | 44.5       | 87.9  | 53
      VS    | YOLOX-s (2021)     | Modified CSP v5 | 416×416    | 51.7       | 90.3  | 52
      VS    | YOLOv5-n (2020)    | Modified CSP v5 | 640×640    | 48.4       | 88.8  | 158.7
      VS    | YOLOv5-s (2020)    | Modified CSP v5 | 640×640    | 51.4       | 89.8  | 111.1
      VS    | YOLOv5-n-EVS       | Modified CSP v5 | 640×640    | 49.4       | 89.1  | 105.3
      VS    | YOLOv5-s-EVS       | Modified CSP v5 | 640×640    | 51.9       | 90.1  | 91.7
      IR    | YOLOv3 (2018)      | DarkNet-53      | 416×416    | 35.6       | 74.2  | 48.4
      IR    | FCOS (2019)        | ResNet-50       | 1333×800   | 34.5       | 72.3  | 12
      IR    | ATSS (2020)        | ResNet-50       | 1333×800   | 35.2       | 73.4  | 11.7
      IR    | YOLOv4 (2020)      | CSPDarkNet-53   | 416×416    | 35.8       | 74.7  | 49
      IR    | YOLOX-s (2021)     | Modified CSP v5 | 416×416    | 36.9       | 76.3  | 53
      IR    | YOLOv5-n (2020)    | Modified CSP v5 | 640×640    | 36.3       | 75.5  | 158.7
      IR    | YOLOv5-s (2020)    | Modified CSP v5 | 640×640    | 36.6       | 76.8  | 111.1
      IR    | YOLOv5-n-EIR       | Modified CSP v5 | 640×640    | 36.4       | 76.3  | 105.3
      IR    | YOLOv5-s-EIR       | Modified CSP v5 | 640×640    | 36.7       | 77    | 91.7
      VS+IR | MMTOD (2019) [18]  | ResNet-101      | 1000×600   | 40.7       | 84.3  | 11.2
      VS+IR | CMDet (2021) [37]  | ResNet-101      | 640×512    | 48.6       | 88.9  | 22.7
      VS+IR | RISNet (2022) [38] | DarkNet-53      | 416×416    | 49.3       | 89.2  | 23.3
      VS+IR | Ours-n             | Modified CSP v5 | 640×640    | 49.7       | 89.8  | 101
      VS+IR | Ours-s             | Modified CSP v5 | 640×640    | 52.2       | 90.5  | 85.5
    Citation

    Ying SUN, Zhiqiang HOU, Chen YANG, Sugang MA, Jiulun FAN. Object Detection Algorithm Based on Dual-modal Fusion Network[J]. Acta Photonica Sinica, 2023, 52(1): 0110002

    Paper Information

    Received: Jul. 13, 2022

    Accepted: Aug. 2, 2022

    Published Online: Feb. 27, 2023

    Author Email: Zhiqiang HOU (hou-zhq@sohu.com)

    DOI: 10.3788/gzxb20235201.0110002