Laser & Optoelectronics Progress, Volume 61, Issue 2, 0211023 (2024)
DETR with Improved DeNoising Training for Multi-Scale Oriented Object Detection in Optical Remote Sensing Images (Invited)
Ruijiao Jin1,2,†, Kun Wang1,2,†, Minhao Liu1,2, Xichao Teng1,2, Zhang Li1,2,*, and Qifeng Yu1,2
Author Affiliations
1College of Aerospace Science and Engineering, National University of Defense Technology, Changsha 410000, Hunan, China
2Hunan Key Laboratory of Image Measurement and Vision Navigation, Changsha 410000, Hunan, China
Oriented object detection is an important task in remote sensing image interpretation. It faces typical problems such as arbitrary object orientation, dense arrangements of small targets, and the angular periodicity introduced by the oriented-box representation. To address these problems, this paper proposes the arbitrary-oriented object detection Transformer with improved deNoising anchor boxes (AO2DINO), a method based on the DEtection TRansformer (DETR) and improved denoising training. First, a multi-scale rotated deformable attention (MS-RDA) module is proposed. MS-RDA introduces angle information, in the form of a rotation matrix, into the calculation of attention weights, which improves the model's adaptability to oriented objects. Second, a self-adaptive assigner (SAA) is proposed. It uses the rotated intersection over union (IoU) and an adaptive threshold to accurately separate densely arranged targets, which improves small-target detection in dense scenes. Finally, the Kalman filtering IoU (KFIoU) is introduced as the regression loss to resolve the angular periodicity problem caused by the oriented-box representation. The proposed method is compared with typical oriented bounding box (OBB) methods on two public datasets, DOTAv1.0 and DIOR-R. It achieves the highest detection accuracy among the DETR-based OBB methods in the comparison, and it also converges faster during training. With only 12 training epochs it reaches detection accuracy comparable to that of other methods trained for 36 epochs.
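The abstract says only that MS-RDA brings the object angle into the attention computation through a rotation matrix. It does not say exactly where the angle enters. In deformable attention, each query predicts both sampling offsets and attention weights. One plausible reading, sketched below under that assumption, is that the learned sampling offsets are rotated by the reference box angle θ before features are sampled, so the sampling pattern turns with the object. The function name `rotate_sampling_points` and the convention that offsets are expressed in units of box size are hypothetical. This is not the authors' implementation.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def rotate_sampling_points(
    center: Point,
    size: Tuple[float, float],
    theta: float,
    offsets: List[Point],
) -> List[Point]:
    """Hypothetical MS-RDA step: map learned sampling offsets into the frame
    of an oriented reference box before (bilinear) feature sampling.

    center  -- reference box center (cx, cy) in feature-map coordinates
    size    -- reference box (w, h); offsets are in units of box size (assumption)
    theta   -- reference box angle in radians
    offsets -- learned offsets (dx, dy) in the box's local, unrotated frame
    """
    cx, cy = center
    w, h = size
    c, s = math.cos(theta), math.sin(theta)
    points = []
    for dx, dy in offsets:
        # scale the offset to the box size in the local (unrotated) frame
        lx, ly = dx * w, dy * h
        # apply R(theta) = [[cos, -sin], [sin, cos]], then translate to the center
        points.append((cx + c * lx - s * ly, cy + s * lx + c * ly))
    return points


# A point half a box width along the local x-axis moves onto the +y direction
# when the box is rotated by 90 degrees.
print(rotate_sampling_points((10.0, 10.0), (4.0, 2.0), math.pi / 2, [(0.5, 0.0)]))
# → [(10.0, 12.0)]
```

With θ = 0 the rotation is the identity, and the sketch reduces to ordinary box-relative deformable sampling.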
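The abstract describes the self-adaptive assigner (SAA) only as combining rotated IoU with an adaptive threshold. The sketch below makes two assumptions to show how such a rule can work. First, it computes rotated IoU exactly by clipping one rectangle's polygon against the other (Sutherland-Hodgman clipping). Second, it uses an ATSS-style adaptive threshold: a candidate is positive when its IoU with a ground-truth box reaches the mean plus the standard deviation of all candidate IoUs for that box. The names `rotated_iou` and `adaptive_positives` are hypothetical. The text above does not say which candidates (anchors or decoder queries) the paper scores, and its exact threshold rule may differ.

```python
import math
import statistics
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float, float]  # (cx, cy, w, h, theta in radians)
Pt = Tuple[float, float]


def obb_to_polygon(box: Box) -> List[Pt]:
    """Corners of an oriented box in counter-clockwise order."""
    cx, cy, w, h, t = box
    c, s = math.cos(t), math.sin(t)
    local = [(-w / 2, -h / 2), (w / 2, -h / 2), (w / 2, h / 2), (-w / 2, h / 2)]
    return [(cx + c * x - s * y, cy + s * x + c * y) for x, y in local]


def polygon_area(poly: Sequence[Pt]) -> float:
    """Shoelace formula; returns 0 for degenerate polygons."""
    n = len(poly)
    if n < 3:
        return 0.0
    twice = sum(poly[i][0] * poly[(i + 1) % n][1] - poly[(i + 1) % n][0] * poly[i][1]
                for i in range(n))
    return abs(twice) / 2.0


def _cross(o: Pt, a: Pt, p: Pt) -> float:
    # z-component of (a - o) x (p - o); >= 0 means p is left of (or on) o->a
    return (a[0] - o[0]) * (p[1] - o[1]) - (a[1] - o[1]) * (p[0] - o[0])


def clip_convex(subject: List[Pt], clipper: List[Pt]) -> List[Pt]:
    """Sutherland-Hodgman clipping of `subject` by a convex CCW `clipper`."""
    out = subject
    for i in range(len(clipper)):
        a, b = clipper[i], clipper[(i + 1) % len(clipper)]
        inp, out = out, []
        if not inp:
            break
        prev = inp[-1]
        for cur in inp:
            dp, dc = _cross(a, b, prev), _cross(a, b, cur)
            if (dc >= 0) != (dp >= 0):
                # edge prev->cur crosses line a->b; dp - dc is nonzero here
                t = dp / (dp - dc)
                out.append((prev[0] + t * (cur[0] - prev[0]), prev[1] + t * (cur[1] - prev[1])))
            if dc >= 0:
                out.append(cur)
            prev = cur
    return out


def rotated_iou(b1: Box, b2: Box) -> float:
    inter = polygon_area(clip_convex(obb_to_polygon(b1), obb_to_polygon(b2)))
    union = b1[2] * b1[3] + b2[2] * b2[3] - inter
    return inter / union if union > 0 else 0.0


def adaptive_positives(gt: Box, candidates: Sequence[Box]) -> List[int]:
    """Assumed ATSS-style rule: positive if IoU >= mean + population std."""
    ious = [rotated_iou(gt, c) for c in candidates]
    threshold = statistics.mean(ious) + statistics.pstdev(ious)
    return [i for i, v in enumerate(ious) if v >= threshold]
```

A fixed threshold such as 0.5 can reject every candidate for a small object whose best IoU is low. A statistics-based threshold adapts to each ground truth, which is one way an assigner can keep densely packed small targets separated.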
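KFIoU, as originally proposed for rotated detection, models an oriented box (cx, cy, w, h, θ) as a 2-D Gaussian with covariance Σ = R(θ) diag(w²/4, h²/4) R(θ)ᵀ. It fuses two Gaussians with a Kalman-filter update Σ₃ = Σ₁ − Σ₁(Σ₁ + Σ₂)⁻¹Σ₁ and defines KFIoU = V₃ / (V₁ + V₂ − V₃), where V = 4·√det Σ. Σ does not change when θ shifts by π, or when w and h are swapped and θ shifts by π/2. For that reason, equivalent box parameterizations produce the same loss, which is the angular-periodicity property the abstract relies on. The pure-Python sketch below follows this formulation. The pairing with a smooth-L1 center term and the 1 − KFIoU shape term is one common choice and an assumption here, because the abstract gives no loss weights. All function names are hypothetical.

```python
import math
from typing import Tuple

Mat2 = Tuple[float, float, float, float]  # row-major 2x2 matrix (a, b, c, d)
Box = Tuple[float, float, float, float, float]  # (cx, cy, w, h, theta in radians)


def mat_mul(m: Mat2, n: Mat2) -> Mat2:
    a, b, c, d = m
    e, f, g, h = n
    return (a * e + b * g, a * f + b * h, c * e + d * g, c * f + d * h)


def mat_add(m: Mat2, n: Mat2) -> Mat2:
    return (m[0] + n[0], m[1] + n[1], m[2] + n[2], m[3] + n[3])


def mat_sub(m: Mat2, n: Mat2) -> Mat2:
    return (m[0] - n[0], m[1] - n[1], m[2] - n[2], m[3] - n[3])


def det(m: Mat2) -> float:
    return m[0] * m[3] - m[1] * m[2]


def inv(m: Mat2) -> Mat2:
    d = det(m)
    return (m[3] / d, -m[1] / d, -m[2] / d, m[0] / d)


def box_to_cov(box: Box) -> Mat2:
    """Gaussian covariance of an oriented box: R diag(w^2/4, h^2/4) R^T."""
    _, _, w, h, t = box
    c, s = math.cos(t), math.sin(t)
    rot, rot_t = (c, -s, s, c), (c, s, -s, c)
    return mat_mul(mat_mul(rot, (w * w / 4, 0.0, 0.0, h * h / 4)), rot_t)


def kfiou(b1: Box, b2: Box) -> float:
    """KFIoU = V3 / (V1 + V2 - V3) with V = 4 * sqrt(det(Sigma)); ignores centers."""
    s1, s2 = box_to_cov(b1), box_to_cov(b2)
    gain = mat_mul(s1, inv(mat_add(s1, s2)))  # Kalman gain K = S1 (S1 + S2)^-1
    s3 = mat_sub(s1, mat_mul(gain, s1))        # fused covariance S1 - K S1
    v1, v2, v3 = (4.0 * math.sqrt(max(det(s), 0.0)) for s in (s1, s2, s3))
    return v3 / (v1 + v2 - v3)


def smooth_l1(x: float, beta: float = 1.0) -> float:
    x = abs(x)
    return 0.5 * x * x / beta if x < beta else x - 0.5 * beta


def kfiou_loss(pred: Box, target: Box, beta: float = 1.0) -> float:
    """Assumed combination: smooth-L1 center term + (1 - KFIoU) shape term."""
    center = smooth_l1(pred[0] - target[0], beta) + smooth_l1(pred[1] - target[1], beta)
    return center + (1.0 - kfiou(pred, target))
```

For identical boxes Σ₃ = Σ₁/2, so KFIoU reaches its maximum of 1/3 rather than 1. The shape term therefore bottoms out at 2/3, a constant offset that does not change the gradients. Because the covariance ignores box centers, the separate center term is what pulls predicted centers toward their targets.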