Optoelectronics Letters, Volume 20, Issue 6, 372 (2024)

Fusion network for small target detection based on YOLO and attention mechanism

Caie XU1,2,3,*, Zhe DONG1, Shengyun ZHONG1, Yijiang CHEN1, Sishun PAN1, and Mingyang WU1
Author Affiliations
  • 1School of Information and Electronic Engineering, Zhejiang University of Science and Technology, Hangzhou 310012, China
  • 2Research Development Department, Hangzhou Xinhe Data Technology Co., Ltd., Hangzhou 311202, China
  • 3College of Mechanical Engineering, Zhejiang University, Hangzhou 310013, China

    XU Caie, DONG Zhe, ZHONG Shengyun, CHEN Yijiang, PAN Sishun, and WU Mingyang. Fusion network for small target detection based on YOLO and attention mechanism[J]. Optoelectronics Letters, 2024, 20(6): 372

    Paper Information

    Received: Aug. 28, 2023

    Accepted: Nov. 11, 2023

    Published Online: Aug. 23, 2024

    The Author Email: Caie XU (caiexu@163.com)

    DOI:10.1007/s11801-024-3177-3
