Infrared and Laser Engineering, Volume 53, Issue 9, 20240253 (2024)

Review of advances in small object detection technology based on deep learning (invited)

Genghuan LIU1,2,3, Xiangjin ZENG1,2,3, Jiazhen DOU1,2,3, Zhenbo REN4,*, Liyun ZHONG1,2,3, Jianglei DI1,2,3, and Yuwen QIN1,2,3
Author Affiliations
  • 1School of Information Engineering, Guangdong University of Technology, Guangzhou 510006, China
  • 2Key Laboratory of Photonic Technology for Integrated Sensing and Communication, Ministry of Education, Guangzhou 510006, China
  • 3Guangdong Provincial Key Laboratory of Information, Guangzhou 510006, China
  • 4School of Physical Science and Technology, Northwestern Polytechnical University, Xi'an 710129, China
    Figures & Tables(21)
    Examples of small and tiny objects in the AI-TOD dataset (green boxes represent small objects, while red boxes represent tiny objects)[12]
    The complex background leads to a low signal-to-noise ratio and low detectability[6]
    Low tolerance of small targets to bounding box perturbations (the top-left, bottom-left, and right images show small, medium, and large targets, respectively; black boxes are the ground truth, while blue and red boxes are predictions slightly offset along the diagonal)
    Four methods of multi-scale representation learning[76]. (a) Single feature map; (b) Image pyramid; (c) Pyramid feature levels; (d) Feature pyramid network
    PANet network structure[81]
    GCWNet network structure[114]
    Module structure of LSKNet[127]
    Detection methods of four anchor-free mechanisms. (a) CornerNet; (b) CenterNet; (c) ExtremeNet; (d) FCOS
    DETR network structure[150]
    Anchor DETR network structure[157]
    Four image fusion strategies. (a) Early fusion; (b) Mid-level fusion; (c) Late fusion; (d) Confidence fusion[169]
    YOLOFusion network structure[182]
    Examples of various datasets. (a) DOTA[13]; (b) AI-TOD[12]; (c) DIOR[8]; (d) VisDrone2019[22]; (e) TT100K[218]; (f) BSTID[219]; (g) TinyPerson[14]; (h) CityPersons[25]; (i) WiderPerson[220]; (j) BIRDSAI[221]; (k) VEDAI[222]; (l) MS COCO[1]
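
The figure caption on bounding box perturbations can be made concrete with a small calculation. The sketch below assumes axis-aligned square boxes and uses intersection over union (IoU) as the overlap measure; it shifts a predicted box diagonally by the same number of pixels for three box sizes. The side lengths (6, 36, and 192 pixels), the shift values, and the helper names are illustrative assumptions, not values taken from the figure.

```python
def iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


def iou_after_diagonal_shift(side, shift):
    """IoU between a square ground-truth box and the same box moved diagonally."""
    gt = (0.0, 0.0, float(side), float(side))
    pred = (float(shift), float(shift), float(side + shift), float(side + shift))
    return iou(gt, pred)


# Hypothetical side lengths standing in for small, medium, and large targets.
for side in (6, 36, 192):
    for shift in (1, 3):
        value = iou_after_diagonal_shift(side, shift)
        print(f"side={side:3d}px shift={shift}px IoU={value:.3f}")
```

For a fixed pixel offset, the overlap falls off much faster for the smallest box, which is why a localization error that is harmless for a large target can push a small target below a typical matching threshold.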
    • Table 1. Data augmentation methods

      | Number | Method | Main content | Year | Publication |
      |---|---|---|---|---|
      | 1 | CutOut[41] | | 2017 | arXiv |
      | 2 | Adaptive Resampling[47] | | 2019 | ICCV |
      | 3 | Mosaic[45] | | 2019 | arXiv |
    • Table 2. Super-resolution methods

      | Number | Method | Main content | Year | Publication |
      |---|---|---|---|---|
      | 1 | CARAFE[58] | | 2019 | CVPR |
      | 2 | Perceptual GAN[68] | | 2017 | CVPR |
      | 3 | MTGAN[71] | | 2020 | IJCV |
    • Table 3. Summary of advantages and disadvantages of small object detection methods

      | Method | Model | Advantage | Disadvantage |
      |---|---|---|---|
      | Data Augmentation | MixUp[42], CutMix[43], Mosaic[45] | Increases the number of small-object samples, compensating for the limited visual information of small targets | Relies heavily on the specific dataset; may introduce new noise that impairs feature extraction |
      | Super Resolution | CARAFE[58], Perceptual GAN[68], MTGAN[71] | Restores some small-object details by learning the relationship between small and large targets | Faces a trade-off between high computational load and performance gains; GANs may generate false artifacts |
      | Multi-scale Feature Perception and Fusion | FPN[76], PANet[78], AFF[88] | Adds deep, semantically rich features while retaining the spatial detail of shallow features | Prone to noise interference and added computational burden |
      | Contextual Information Learning | CoupleNet[103], PyramidBox[104], GCWNet[114] | Uses the relationships between the target, surrounding objects, and the environment to give the network more information | Redundant contextual information can introduce noise |
      | Large Kernel Convolution | ConvNeXt[124], LSKNet[127], YOLO-MS[129] | A larger receptive field effectively captures long-range dependencies and contextual information | Introduces heavy computational overhead, which hinders real-time detection |
      | Anchor-free | CenterNet[138], FCOS[141], YOLOX[143] | Avoids complex anchor box calculations | Often yields inaccurate bounding boxes |
      | DETR | DETR[151], CF-DETR[154], RT-DETR[19] | Avoids complex convolutional-network-based designs and post-processing | Training is slow |
      | Dual-mode | Wagner et al.[170], Liu et al.[174], YOLOFusion[182] | Improves detection performance and robustness, especially in complex environments | Increases computational cost and system complexity |
    • Table 4. Brief performance evaluation on the MS COCO dataset

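      | Model | Backbone | $\mathrm{AP}$ | $\mathrm{AP}_{0.50}$ | $\mathrm{AP}_{0.75}$ | $\mathrm{AP}_{S}$ | $\mathrm{AP}_{M}$ | $\mathrm{AP}_{L}$ | Year |
      |---|---|---|---|---|---|---|---|---|
      | FPN[76] | ResNet101 | 36.2 | 59.1 | 39.0 | 18.2 | 39.0 | 48.2 | 2017 |
      | PANet[84] | ResNeXt101 | 40.0 | 62.8 | 43.1 | 18.8 | 42.3 | 57.2 | 2018 |
      | FCOS[140] | ResNet101 | 41.5 | 60.7 | 45.0 | 24.4 | 44.8 | 51.6 | 2019 |
      | YOLOX-L[143] | Modified CSP v5 | 50.0 | 68.5 | 54.5 | 29.8 | 54.5 | 64.4 | 2021 |
      | QueryDet[209] | ResNeXt101 | 44.7 | 65.6 | 47.4 | 29.1 | 47.5 | 53.1 | 2022 |
      | RTMDet-m[128] | CSPDarkNet | 49.3 | 66.9 | 53.9 | 30.5 | 53.6 | 66.1 | 2022 |
      | DN-DETR[162] | ResNet101+DC5 | 47.3 | 67.5 | 50.8 | 28.6 | 51.5 | 65.0 | 2022 |
      | YOLO-MS[129] | CSPDarkNet | 51.0 | 68.6 | 55.7 | 33.1 | 56.1 | 66.5 | 2023 |
      | RT-DETR[19] | ResNet101 | 54.3 | 72.7 | 58.6 | 36.0 | 58.8 | 72.1 | 2023 |

      Note: bold marks the best result for each metric, underline the second best, and wavy underline the third best.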
    • Table 5. Brief performance evaluation on the DOTA dataset

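      | Model | Backbone | $\mathrm{AP}_{0.50}$ | Year |
      |---|---|---|---|
      | YOLOv2[40] | DarkNet19 | 25.4 | 2017 |
      | CenterNet[138] | ResNet101 | 59.1 | 2019 |
      | CADNet[106] | ResNet101 | 69.9 | 2019 |
      | SLA[201] | ResNet50 | 76.3 | 2021 |
      | PP-YOLOE-R[149] | CSPRepResNet | 80.7 | 2022 |
      | RTMDet-L[128] | CSPDarkNet53 | 81.3 | 2022 |
      | Info-FPN[98] | ResNet50 | 80.9 | 2023 |
      | PCI[115] | ReResNet50 | 80.2 | 2023 |

      Note: bold marks the best result for each metric, underline the second best, and wavy underline the third best.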
    • Table 6. Brief performance evaluation on the AI-TOD dataset

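      | Model | Backbone | $\mathrm{AP}$ | $\mathrm{AP}_{0.50}$ | $\mathrm{AP}_{0.75}$ | $\mathrm{AP}_{vt}$ | $\mathrm{AP}_{t}$ | $\mathrm{AP}_{s}$ | $\mathrm{AP}_{m}$ | Year |
      |---|---|---|---|---|---|---|---|---|---|
      | Faster R-CNN[17] | ResNet50 | 12.4 | 28.3 | 8.1 | 0.0 | 8.4 | 26.3 | 36.2 | 2015 |
      | Cascade R-CNN[207] | ResNet50 | 14.4 | 32.7 | 10.6 | 0.0 | 9.9 | 28.3 | 39.9 | 2018 |
      | FSAF[140] | ResNet50 | 14.4 | 35.3 | 8.4 | 3.4 | 14.4 | 19.9 | 24.2 | 2019 |
      | TOOD[145] | ResNet50 | 18.6 | 43.0 | 12.7 | 3.2 | 16.5 | 26.9 | 39.2 | 2021 |
      | M-CenterNet[13] | DLA-34 | 14.5 | 40.7 | 6.4 | 6.1 | 15.0 | 19.4 | 20.4 | 2021 |
      | Faster R-CNN/NWD[199] | ResNet50 | 20.5 | 51.5 | 12.4 | 5.8 | 20.3 | 25.4 | 35.7 | 2021 |
      | Faster R-CNN/RFLA[202] | ResNet50 | 21.1 | 51.6 | 13.1 | 9.5 | 21.2 | 26.1 | 31.5 | 2022 |
      | FSANet[95] | ResNet50 | 16.3 | 41.4 | 9.8 | 4.4 | 14.6 | 23.4 | 33.3 | 2022 |
      | Faster R-CNN/ADAS-GPM[203] | ResNet50 | 22.3 | 53.7 | 13.5 | 7.1 | 21.9 | 27.5 | 35.1 | 2023 |

      Note: bold marks the best result for each metric, underline the second best, and wavy underline the third best.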
    • Table 7. Brief performance evaluation on the TinyPerson dataset

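      | Model | $\mathrm{AP}_{50}^{\mathrm{tiny}1}$ | $\mathrm{AP}_{50}^{\mathrm{tiny}2}$ | $\mathrm{AP}_{50}^{\mathrm{tiny}3}$ | $\mathrm{AP}_{50}^{\mathrm{tiny}}$ | APall | APy | APy | Year |
      |---|---|---|---|---|---|---|---|---|
      | Cascade R-CNN[207] | 45.21 | 60.06 | 65.06 | 57.19 | 70.71 | 76.99 | 8.56 | 2018 |
      | FCOS[141] | 3.39 | 12.39 | 29.25 | 16.90 | 35.75 | 40.49 | 1.45 | 2019 |
      | Faster RCNN-SPPNet[90] | 47.56 | 62.36 | 66.15 | 59.13 | 71.17 | 79.47 | 8.62 | 2021 |
      | FPN-SM[14] | 33.91 | 55.16 | 62.58 | 51.33 | 66.96 | 71.55 | 6.46 | 2021 |
      | Faster R-CNN-RFLA[202] | 32.80 | 55.60 | 60.60 | 50.10 | 65.30 | 69.90 | 5.90 | 2022 |
      | SODNet[116] | 40.53 | 59.52 | 64.62 | 55.55 | 66.22 | 75.98 | 7.61 | 2022 |
      | FENet[97] | 37.02 | 55.03 | 62.44 | 51.33 | 66.92 | 72.81 | 6.20 | 2023 |

      Note: bold marks the best result for each metric, underline the second best, and wavy underline the third best.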
    • Table 8. Brief performance evaluation on the TT100K dataset

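      | Model | Small Rec | Small Acc | Small F1 | Medium Rec | Medium Acc | Medium F1 | Large Rec | Large Acc | Large F1 | Year |
      |---|---|---|---|---|---|---|---|---|---|---|
      | Perceptual GAN[68] | 89.0 | 84.0 | 86.4 | 96.0 | 91.0 | 93.4 | 89.0 | 91.0 | 89.9 | 2017 |
      | FPN[76] | 86.4 | 80.1 | 83.1 | 93.9 | 94.0 | 93.3 | 92.2 | 92.2 | 92.2 | 2017 |
      | Noh et al.[70] | 92.6 | 84.9 | 88.6 | 97.5 | 94.5 | 96.0 | 97.5 | 93.3 | 95.4 | 2019 |
      | EFPN[63] | 92.3 | 85.7 | 88.9 | 96.7 | 95.7 | 96.2 | 97.1 | 94.3 | 95.7 | 2021 |
      | SODNet[116] | 90.0 | 85.5 | 87.6 | 96.6 | 95.8 | 96.2 | - | - | - | 2022 |
      | AFPN[94] | 92.7 | 85.1 | 88.7 | 97.7 | 95.3 | 96.5 | 97.7 | 94.3 | 96.0 | 2022 |

      Note: bold marks the best result for each metric, underline the second best, and wavy underline the third best.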
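
The Rec, Acc, and F1 columns of Table 8 appear to be related by the usual harmonic mean. Treating Acc as precision is an assumption here, since the table does not define its columns. The sketch below recomputes F1 for the small-object columns of a few rows under that assumption; `f1_score` and `small_rows` are hypothetical names introduced for illustration.

```python
def f1_score(recall: float, precision: float) -> float:
    """Harmonic mean of recall and precision (both given in percent)."""
    total = recall + precision
    if total == 0:
        return 0.0
    return 2 * recall * precision / total


# (Rec, Acc, reported F1) for small objects, copied from Table 8.
# Assumption: the Acc column plays the role of precision.
small_rows = {
    "Perceptual GAN": (89.0, 84.0, 86.4),
    "FPN": (86.4, 80.1, 83.1),
    "Noh et al.": (92.6, 84.9, 88.6),
    "EFPN": (92.3, 85.7, 88.9),
}

for name, (rec, acc, reported) in small_rows.items():
    computed = f1_score(rec, acc)
    print(f"{name}: computed F1 = {computed:.1f}, reported F1 = {reported}")
```

Under this assumption, the recomputed values agree with the reported small-object F1 to one decimal place for these rows.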

    Genghuan LIU, Xiangjin ZENG, Jiazhen DOU, Zhenbo REN, Liyun ZHONG, Jianglei DI, Yuwen QIN. Review of advances in small object detection technology based on deep learning (invited)[J]. Infrared and Laser Engineering, 2024, 53(9): 20240253

    Paper Information

    Category: Special issue—Computational optical imaging and application Ⅱ

    Received: Jun. 4, 2024

    Accepted: --

    Published Online: Oct. 22, 2024

    The Author Email: REN Zhenbo (zbren@nwpu.edu.cn)

    DOI:10.3788/IRLA20240253
