Optics and Precision Engineering, Vol. 32, Issue 5, 727 (2024)
Position-sensitive Transformer aerial image object detection model
To address the challenge of detecting numerous small objects in UAV-captured aerial images, this paper introduces the Position-Sensitive Transformer Object Detection (PS-TOD) model. First, a multi-scale feature fusion (MSFF) module incorporating a Positional Channel Embedded 3D Attention (PCE3DA) mechanism is presented. PCE3DA exploits the interplay between spatial and channel information to generate 3D attention that enhances feature representation in regions of interest, and it underpins a bottom-up, cross-layer MSFF strategy that enriches the semantics of the fused features. Second, a novel Position-Sensitive Self-Attention (PSSA) mechanism is proposed, from which a position-sensitive Transformer encoder-decoder is built. This design heightens the model's sensitivity to target position and helps it capture long-range dependencies within the image's global context. Comparative experiments on the VisDrone dataset show that PS-TOD achieves an average precision (AP) of 28.8%, a 4.1% improvement over the baseline model (DETR), and that it detects objects accurately in UAV aerial imagery with complex backgrounds, significantly improving the detection accuracy of small targets.
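The abstract describes PCE3DA only at a high level. The sketch below is a minimal, hypothetical PyTorch illustration of how a joint spatial-channel ("3D") attention map of the kind described might be formed and applied to a backbone feature map; the class name, the bottleneck MLP, and the 7×7 spatial convolution are assumptions for illustration, not the authors' implementation.

```python
# Hypothetical sketch of a spatial + channel ("3D") attention block in the
# spirit of PCE3DA as summarized in the abstract. Layer choices and shapes
# are assumptions, not the paper's actual design.
import torch
import torch.nn as nn


class PositionalChannelAttention3D(nn.Module):
    """Combines channel statistics with spatial (positional) cues to build a
    full C x H x W attention tensor, then reweights the input feature map."""

    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        # Channel branch: global pooling followed by a bottleneck MLP.
        self.channel_mlp = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, kernel_size=1),
        )
        # Spatial branch: compress channels but keep the H x W layout.
        self.spatial_conv = nn.Conv2d(channels, 1, kernel_size=7, padding=3)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, C, H, W)
        channel_logits = self.channel_mlp(x)    # (B, C, 1, 1)
        spatial_logits = self.spatial_conv(x)   # (B, 1, H, W)
        # Broadcasting the two branches yields a 3D attention map (B, C, H, W).
        attention = self.sigmoid(channel_logits + spatial_logits)
        return x * attention


if __name__ == "__main__":
    feat = torch.randn(2, 256, 32, 32)              # e.g. a backbone feature map
    out = PositionalChannelAttention3D(256)(feat)
    print(out.shape)                                # torch.Size([2, 256, 32, 32])
```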
Daxiang LI, Jiani XIN, Ying LIU. Position-sensitive Transformer aerial image object detection model[J]. Optics and Precision Engineering, 2024, 32(5): 727
Received: May 30, 2023
Accepted: --
Published Online: Apr. 2, 2024
The Author Email: XIN Jiani (xjn_2000@163.com)