Optics and Precision Engineering, Vol. 32, Issue 5, 727 (2024)

Position-sensitive Transformer aerial image object detection model

Daxiang LI, Jiani XIN* and Ying LIU
Author Affiliations
  • College of Communication and Information Engineering, Xi'an University of Posts and Telecommunication, Xi'an 710121, China
    Figures & Tables(12)
    Schematic diagram of PS-TOD model
    Fusion scheme of PCE3DA cross layer feature map
    Flow chart of position channel embedding 3D attention
    Position sensitive self-attention mechanism
    Encoder-decoder structure
    Partial detection results of PS-TOD on VisDrone test set
    Comparison of small object detection results
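    • Table 1. Ablation experiment results on VisDrone test set

      | Method | MSFF | PSSA | Loss | AP_S | AP_M | AP_L | AP | Params (M) |
      |---|---|---|---|---|---|---|---|---|
      | Baseline | - | - | - | 13.8 | 36.8 | 47.5 | 24.7 | 41.30 |
      | | ✓ | - | - | 16.4 | 38.9 | 49.4 | 26.4 | 42.36 |
      | | - | ✓ | - | 15.0 | 37.6 | 48.7 | 25.8 | 41.45 |
      | | - | - | ✓ | 15.6 | 39.1 | 48.9 | 26.0 | 41.30 |
      | | ✓ | ✓ | - | 17.1 | 39.7 | 49.8 | 27.2 | 42.51 |
      | | - | ✓ | ✓ | 16.5 | 40.0 | 49.1 | 26.9 | 41.45 |
      | | ✓ | - | ✓ | 18.5 | 39.6 | 50.1 | 28.1 | 42.36 |
      | Ours | ✓ | ✓ | ✓ | 19.4 | 40.1 | 50.9 | 28.8 | 42.51 |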
    • Table 2. Experimental results for different attention mechanisms and using multi-scale features

      | Group | Method | AP_S | AP_M | AP_L | AP |
      |---|---|---|---|---|---|
      | A | Baseline | 13.8 | 36.8 | 47.5 | 24.7 |
      | B | Baseline-SE | 13.9 | 37.0 | 47.5 | 24.9 |
      | C | Baseline-SA | 14.5 | 38.1 | 47.7 | 25.2 |
      | D | Baseline-CA | 14.3 | 37.7 | 48.3 | 25.4 |
      | E | Baseline-CBAM | 14.6 | 37.5 | 48.1 | 25.2 |
      | F | Baseline-PCE3DA | 15.2 | 38.4 | 48.7 | 25.7 |
      | G | F+MSFF | 16.4 | 38.9 | 49.4 | 26.4 |
    • Table 3. Experimental results of different relative position calculation methods

      | Method | AP_S | AP_M | AP_L | AP |
      |---|---|---|---|---|
      | Baseline model | 13.8 | 36.8 | 47.5 | 24.7 |
      | Ref. [27] | 14.3 | 37.0 | 48.3 | 25.0 |
      | Ref. [28] | 14.6 | 37.4 | 48.1 | 25.1 |
      | PSSA | 15.0 | 37.6 | 48.7 | 25.8 |
    • Table 4. Performance comparison of different algorithms on VisDrone test set

      | Method | AP50 | AP75 | AP | FPS |
      |---|---|---|---|---|
      | Faster R-CNN [3] | 21.7 | / | / | 15.9 |
      | Cascade R-CNN [4] | 38.6 | 25.0 | 23.5 | 9.0 |
      | YOLOv4 [6] | 31.2 | 16.7 | 16.8 | 28.8 |
      | QueryDet [7] | 48.1 | 28.8 | 28.3 | 2.8 |
      | CornerNet [10] | 34.1 | 15.8 | 17.4 | 15.5 |
      | RetinaNet [20] | 28.4 | 12.3 | 11.3 | 16 |
      | Double-Head RCNN [29] | 38.3 | 24.8 | 23.8 | 6.5 |
      | IterDet [30] | 36.8 | 20.3 | 20.4 | 11.4 |
      | RSOD [31] | 43.3 | 27.1 | 25.4 | 28 |
      | YOLOv8 [32] | 46.4 | 27.5 | 26.5 | 30.1 |
      | PVTv2 [33] | 34.1 | 21.4 | 20.6 | 10.9 |
      | PS-TOD (Ours) | 51.8 | 28.3 | 28.8 | 22.7 |
    • Table 5. Experimental results of different categories on VisDrone test set

      | Object category | Pedestrian | People | Car | Bus | Bicycle | Truck | Tricycle | Awning-tricycle | Van | Motorcycle |
      |---|---|---|---|---|---|---|---|---|---|---|
      | Baseline model | 24.8 | 18.7 | 61.6 | 35.2 | 12.1 | 23.3 | 15.2 | 4.6 | 28.6 | 24.9 |
      | PS-TOD | 29.0 | 22.4 | 64.3 | 45.9 | 14.7 | 27.1 | 21.4 | 9.0 | 31.7 | 28.4 |
    Citation

    Daxiang LI, Jiani XIN, Ying LIU. Position-sensitive Transformer aerial image object detection model[J]. Optics and Precision Engineering, 2024, 32(5): 727
    Paper Information

    Received: May 30, 2023

    Accepted: --

    Published Online: Apr. 2, 2024

    Corresponding author email: Jiani XIN (xjn_2000@163.com)

    DOI: 10.37188/OPE.20243205.0727