Infrared Technology, Vol. 46, Issue 8, 912 (2024)

FVIT-YOLO v8: Improved YOLO v8 Small Object Detection Based on Multi-scale Fusion Attention Mechanism

Fukuan LIU, Suyun LUO*, Jia HE, and Chaoneng ZHA

    Citation

    LIU Fukuan, LUO Suyun, HE Jia, ZHA Chaoneng. FVIT-YOLO v8: Improved YOLO v8 Small Object Detection Based on Multi-scale Fusion Attention Mechanism[J]. Infrared Technology, 2024, 46(8): 912

    Paper Information

    Received: Apr. 26, 2023

    Published Online: Sep. 10, 2024

    Author Email: Suyun LUO (lsyluo@163.com)
