Optoelectronics Letters, Vol. 21, Issue 2, 113 (2025)

Detection using mask adaptive transformers in unmanned aerial vehicle imagery

Huibiao YE, Weiming FAN, Yuping GUO, Xuna WANG, and Dalin ZHOU
References (33)

[1] LECUN Y, BENGIO Y, HINTON G. Deep learning[J]. Nature, 2015, 521(7553): 436-444.

[2] REDMON J, FARHADI A. YOLOv3: an incremental improvement[EB/OL]. (2018-04-08) [2024-03-12]. https://arxiv.org/abs/1804.02767.

[3] CARION N, MASSA F, SYNNAEVE G, et al. End-to-end object detection with transformers[C]//European Conference on Computer Vision, August 23-28, 2020, Glasgow, UK. Berlin: Springer, 2020: 213-229.

[4] HUANG G, LIU Z, VAN DER MAATEN L, et al. Densely connected convolutional networks[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, July 21-26, 2017, Hawaii, USA. New York: IEEE, 2017: 4700-4708.

[5] YU J, GAO H, CHEN Y, et al. Deep object detector with attentional spatiotemporal LSTM for space human-robot interaction[J]. IEEE transactions on human-machine systems, 2022, 52(4): 784-793.

[6] CHEN X, FAN H, GIRSHICK R, et al. Improved baselines with momentum contrastive learning[EB/OL]. (2020-03-09) [2024-03-12]. https://arxiv.org/abs/2003.04297.

[7] ZHANG H, LU C, CHEN E. Obstacle detection: improved YOLOX-S based on swin transformer-tiny[J]. Optoelectronics letters, 2023, 19(11): 698-704.

[8] DU B, HUANG Y, CHEN J, et al. Adaptive sparse convolutional networks with global context enhancement for faster object detection on drone images[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, June 18-22, 2023, Vancouver, Canada. New York: IEEE, 2023: 13435-13444.

[9] BAO F, NIE S, XUE K, et al. All are worth words: a ViT backbone for diffusion models[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, June 18-22, 2023, Vancouver, Canada. New York: IEEE, 2023: 22669-22679.

[10] LIU Z, LIN Y, CAO Y, et al. Swin transformer: hierarchical vision transformer using shifted windows[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision, October 11-17, 2021, Montreal, Canada. New York: IEEE, 2021: 10012-10022.

[11] LI T, WANG J, ZHANG T. L-DETR: a light-weight detector for end-to-end object detection with transformers[J]. IEEE access, 2022, 10: 105685-105692.

[12] DONG X, BAO J, CHEN D, et al. CSWin transformer: a general vision transformer backbone with cross-shaped windows[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, June 19-24, 2022, Louisiana, USA. New York: IEEE, 2022: 12124-12134.

[13] WANG W, XIE E, LI X, et al. PVT v2: improved baselines with pyramid vision transformer[J]. Computational visual media, 2022, 8(3): 415-424.

[14] WANG W, XIE E, LI X, et al. Pyramid vision transformer: a versatile backbone for dense prediction without convolutions[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision, October 11-17, 2021, Montreal, Canada. New York: IEEE, 2021: 568-578.

[15] HSIEH M R, LIN Y L, HSU W H. Drone-based object counting by spatially regularized regional proposal network[C]//Proceedings of the IEEE International Conference on Computer Vision, October 22-29, 2017, Venice, Italy. New York: IEEE, 2017: 4145-4153.

[16] DU D, ZHU P, WEN L, et al. VisDrone-DET2019: the vision meets drone object detection in image challenge results[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision Workshops, October 27-November 2, 2019, Seoul, South Korea. New York: IEEE, 2019.

[17] MO N, YAN L. Oriented vehicle detection in high-resolution remote sensing images based on feature amplification and category balance by oversampling data augmentation[J]. The international archives of the photogrammetry, remote sensing and spatial information sciences, 2020, 43: 153-159.

[18] TANG T, ZHOU S, DENG Z, et al. Arbitrary-oriented vehicle detection in aerial imagery with single convolutional neural networks[J]. Remote sensing, 2017, 9(11): 1170.

[19] LONG J, SHELHAMER E, DARRELL T. Fully convolutional networks for semantic segmentation[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, June 7-12, 2015, Boston, USA. New York: IEEE, 2015: 3431-3440.

[20] BADRINARAYANAN V, KENDALL A, CIPOLLA R. SegNet: a deep convolutional encoder-decoder architecture for image segmentation[J]. IEEE transactions on pattern analysis and machine intelligence, 2017, 39(12): 2481-2495.

[21] RONNEBERGER O, FISCHER P, BROX T. U-Net: convolutional networks for biomedical image segmentation[C]//Medical Image Computing and Computer-Assisted Intervention-MICCAI 2015, October 5-9, 2015, Munich, Germany. Berlin: Springer International Publishing, 2015: 234-241.

[22] YU J, GAO H, SUN J, et al. Spatial cognition-driven deep learning for car detection in unmanned aerial vehicle imagery[J]. IEEE transactions on cognitive and developmental systems, 2021, 14(4): 1574-1583.

[23] YANG C, HUANG Z, WANG N. QueryDet: cascaded sparse query for accelerating high-resolution small object detection[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, June 19-24, 2022, Louisiana, USA. New York: IEEE, 2022: 13668-13677.

[24] MEETHAL A, GRANGER E, PEDERSOLI M. Cascaded zoom-in detector for high resolution aerial images[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, June 18-22, 2023, Vancouver, Canada. New York: IEEE, 2023: 2046-2055.

[25] NGUYEN D L, VO X T, PRIADANA A, et al. Car detector based on YOLOv5 for parking management[C]//Conference on Information Technology and Its Applications, July 28-29, 2023, Da Nang, Vietnam. Cham: Springer Nature Switzerland, 2023: 102-113.

[26] ZHU C, HE Y, SAVVIDES M. Feature selective anchor-free module for single-shot object detection[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, June 16-20, 2019, Long Beach, USA. New York: IEEE, 2019: 840-849.

[27] ZHANG H, WANG Y, DAYOUB F, et al. VarifocalNet: an IoU-aware dense object detector[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, June 19-25, 2021, Nashville, TN, USA. New York: IEEE, 2021: 8514-8523.

[28] FENG C, ZHONG Y, GAO Y, et al. TOOD: task-aligned one-stage object detection[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision, October 11-17, 2021, Montreal, Canada. New York: IEEE, 2021: 3490-3499.

[29] CHEN Z, YANG C, LI Q, et al. Disentangle your dense object detector[C]//Proceedings of the 29th ACM International Conference on Multimedia, October 21-25, 2021, Chengdu, China. New York: ACM, 2021: 4939-4948.

[30] JOCHER G, CHAURASIA A, QIU J. YOLO by Ultralytics[EB/OL]. (2023-01-01) [2024-03-12]. https://github.com/ultralytics/ultralytics/blob/main/CITATION.cff.

[31] WANG X, YAO F, LI A, et al. DroneNet: rescue drone-view object detection[J]. Drones, 2023, 7(7): 441.

[32] WEI Z, DUAN C, SONG X, et al. AMRNet: chips augmentation in aerial images object detection[EB/OL]. (2020-09-15) [2024-03-12]. https://arxiv.org/abs/2009.07168.

[33] ZHANG H, LI F, LIU S, et al. DINO: DETR with improved denoising anchor boxes for end-to-end object detection[EB/OL]. (2022-03-07) [2024-03-12]. https://arxiv.org/abs/2203.03605.

YE Huibiao, FAN Weiming, GUO Yuping, WANG Xuna, ZHOU Dalin. Detection using mask adaptive transformers in unmanned aerial vehicle imagery[J]. Optoelectronics Letters, 2025, 21(2): 113

Paper Information

Received: Jul. 27, 2024

Accepted: Jan. 24, 2025

Published Online: Jan. 24, 2025

DOI:10.1007/s11801-025-4185-7
