Acta Photonica Sinica, Vol. 53, Issue 3, 0310001 (2024)

Object Detection Algorithm Based on CNN-Transformer Dual Modal Feature Fusion

Chen YANG1,2, Zhiqiang HOU1,2,*, Xinyue LI1,2, Sugang MA1,2, and Xiaobao YANG1,2
Author Affiliations
  • 1 School of Computer Science and Technology, Xi'an University of Posts & Telecommunications, Xi'an 710121, China
  • 2 Shaanxi Provincial Key Laboratory of Network Data Analysis and Intelligent Processing, Xi'an 710121, China
    Figures & Tables (11)
    Overall network architecture
    Infrared feature extraction module
    Visible feature extraction module
    Infrared-visible dual modal fusion module
    Qualitative analysis results on the KAIST dataset
    Qualitative analysis results on the FLIR dataset
    Qualitative analysis results on the GIR dataset
    • Table 1. Ablation experiment on the KAIST dataset

      Method            CFE  TFE  IRF  Input     AP0.5  AP0.5:0.95  FPS
      YOLOv5            -    -    -    IR        71.5   32          112.4
      YOLOv5            -    -    -    RGB       59.8   26.7        112.4
      YOLOv5+CFE        ✓    -    -    IR        72.2   32.4        103.5
      YOLOv5+TFE        -    ✓    -    RGB       60.4   26.8        98
      YOLOv5+CFE+IRF    ✓    -    ✓    RGB + IR  76.3   33.7        94.5
      YOLOv5+TFE+IRF    -    ✓    ✓    RGB + IR  76.5   33.9        90.1
      CTDMDet (ours)    ✓    ✓    ✓    RGB + IR  77.2   34.6        88.7
    • Table 2. Ablation experiment on the FLIR dataset

      Method            CFE  TFE  IRF  Input     AP0.5  AP0.5:0.95  FPS
      YOLOv5            -    -    -    IR        73.9   35.7        132.6
      YOLOv5            -    -    -    RGB       67.8   25.9        132.6
      YOLOv5+CFE        ✓    -    -    IR        82.4   43.9        124.7
      YOLOv5+TFE        -    ✓    -    RGB       80     40.4        119.5
      YOLOv5+CFE+IRF    ✓    -    ✓    RGB + IR  85.3   46.2        116.9
      YOLOv5+TFE+IRF    -    ✓    ✓    RGB + IR  84.9   45.0        111
      CTDMDet (ours)    ✓    ✓    ✓    RGB + IR  85.5   46.6        108.3
    • Table 3. Ablation experiment on the GIR dataset

      Method            CFE  TFE  IRF  Input     AP0.5  AP0.5:0.95  FPS
      YOLOv5            -    -    -    IR        76.8   36.6        111.1
      YOLOv5            -    -    -    RGB       89.9   51.4        111.1
      YOLOv5+CFE        ✓    -    -    IR        84.4   47.9        106.7
      YOLOv5+TFE        -    ✓    -    RGB       91.1   52.7        103
      YOLOv5+CFE+IRF    ✓    -    ✓    RGB + IR  91.6   55.9        95.2
      YOLOv5+TFE+IRF    -    ✓    ✓    RGB + IR  91.3   55.8        93.8
      CTDMDet (ours)    ✓    ✓    ✓    RGB + IR  91.7   56.4        91.6
    • Table 4. Quantitative analysis results on the KAIST, FLIR, and GIR datasets

      Input     Algorithm             KAIST                     FLIR                      GIR
                                      AP0.5  AP0.5:0.95  FPS    AP0.5  AP0.5:0.95  FPS    AP0.5  AP0.5:0.95  FPS
      IR        Faster-RCNN (2015)    68.6   28.8        12     78.4   37.9        16     77.9   39.2        10.9
      IR        SSD (2016)            60.9   23.2        34     40     13.2        37.8   75.2   36.9        32.6
      IR        RetinaNet (2017)      68.2   27.8        14.1   76.1   32.3        16.2   78.1   38.3        11.3
      IR        YOLOv3 (2018)         63.6   25.3        37     72.6   30.4        39.6   74.2   35.6        48.4
      IR        FCOS (2019)           69.4   29.6        14     82.4   42.6        17.3   72.3   34.5        12
      IR        ATSS (2020)           69     29          13.8   71.5   38.6        15     73.4   35.2        11.7
      IR        YOLOv4 (2020)         68.5   27.4        52.6   48.5   20.4        55.6   74.7   35.8        49
      IR        YOLOv5-s (2020)       71.5   32          112.4  73.9   35.7        132.6  76.8   36.6        111.1
      IR        YOLOF (2021)          65.6   27.3        25     51.2   19.3        30     68.3   30.7        22
      IR        YOLOv7 (2022)         72.1   30.9        110.7  74.8   33.6        113.5  75.9   30.6        110.7
      IR        YOLOv8 (2023)         68.7   28          107    70.6   31.6        110.1  73.2   32.9        106.4
      IR        CTDMDet-IR            72.2   32.4        103.5  82.4   43.9        124.7  84.4   47.9        106.7
      RGB       Faster-RCNN (2015)    58.3   24.2        15.2   65.6   22.8        16.8   88.9   45.8        12.6
      RGB       SSD (2016)            48.2   18.1        38.1   57.4   18.6        40.2   85.4   38.8        34.2
      RGB       RetinaNet (2017)      57.7   22.5        16.6   65.2   22.3        19.3   87.6   43.9        12.8
      RGB       YOLOv3 (2018)         46.7   18.3        56.2   56.9   16.8        58.8   85.7   41.2        50
      RGB       FCOS (2019)           56.7   22.7        18.3   67.1   26.6        25.1   84     40.4        16
      RGB       ATSS (2020)           57.8   24.3        17     57.8   23.9        18.9   87.1   47.1        14
      RGB       YOLOv4 (2020)         57.4   23.7        56     65.3   22.7        58     87.9   44.5        53
      RGB       YOLOv5-s (2020)       59.8   26.4        112.4  67.8   25.9        132.6  89.8   51.4        111.1
      RGB       YOLOF (2021)          54.1   22.2        25.7   44.8   16.8        29.3   76.1   42.8        21.3
      RGB       YOLOv7 (2022)         59.6   23.8        101.7  66.3   23.5        108.9  88.2   50.1        98.2
      RGB       YOLOv8 (2023)         56.2   21.5        100.4  63.9   21          103.4  84.6   44.6        97.7
      RGB       CTDMDet-RGB           60.4   26.8        98     80     40.4        119.5  91.1   52.7        103
      IR + RGB  MMTOD [45] (2019)     70.7   31.3        13.2   76.3   37.2        13.2   84.3   40.7        11.2
      IR + RGB  CMDet [46] (2021)     68.4   28.3        25.3   70.5   35.3        25.3   88.9   48.6        22.7
      IR + RGB  GAFF [37] (2021)      67.1   24.4        70.9   72.9   33.4        70.9   82.1   54.4        70.9
      IR + RGB  CFT [31] (2021)       71.2   29.3        88     77.7   36.8        88     88     60.5        41.6
      IR + RGB  ProbEn [32] (2021)    -      -           -      75.5   37.9        -      -      -           -
      IR + RGB  RISNet [33] (2022)    72.7   33.1        23     78.5   40.1        23     89.2   49.3        23.3
      IR + RGB  CSAA [34] (2023)      -      -           -      79.2   41.6        -      -      -           -
      IR + RGB  CTDMDet (Ours)        77.2   34.6        88.7   85.5   46.6        108.3  91.7   56.4        91.6
    Citation

    Chen YANG, Zhiqiang HOU, Xinyue LI, Sugang MA, Xiaobao YANG. Object Detection Algorithm Based on CNN-Transformer Dual Modal Feature Fusion[J]. Acta Photonica Sinica, 2024, 53(3): 0310001

    Paper Information

    Received: Aug. 21, 2023

    Accepted: Sep. 26, 2023

    Published Online: May 16, 2024

    Author Email: HOU Zhiqiang (hou-zhq@sohu.com)

    DOI: 10.3788/gzxb20245303.0310001
