Optoelectronics Letters, Volume 20, Issue 6, 372 (2024)
Fusion network for small target detection based on YOLO and attention mechanism
XU Caie, DONG Zhe, ZHONG Shengyun, CHEN Yijiang, PAN Sishun, and WU Mingyang. Fusion network for small target detection based on YOLO and attention mechanism[J]. Optoelectronics Letters, 2024, 20(6): 372
Received: Aug. 28, 2023
Accepted: Nov. 11, 2023
Published Online: Aug. 23, 2024
Author email: Caie XU (caiexu@163.com)