Optics and Precision Engineering, Vol. 32, Issue 5, 727 (2024)
Position-sensitive Transformer aerial image object detection model
To address the challenge of detecting numerous small objects in UAV-captured aerial images, this paper introduces the Position-Sensitive Transformer Object Detection (PS-TOD) model. First, a multi-scale feature fusion (MSFF) module incorporating a Positional Channel Embedded 3D Attention (PCE3DA) mechanism is presented. PCE3DA exploits the interplay between spatial and channel information to generate 3D attention that enhances feature representation in regions of interest, and it underpins a bottom-up, cross-layer MSFF strategy that enriches the semantics of the fused features. Second, a novel Position-Sensitive Self-Attention (PSSA) mechanism is proposed, from which a position-sensitive Transformer encoder-decoder is built. This design heightens the model's sensitivity to target position and helps it capture long-range dependencies within the image's global context. Comparative experiments on the VisDrone dataset show that PS-TOD achieves an average precision (AP) of 28.8%, a 4.1% improvement over the baseline model (DETR), and that it detects objects accurately in UAV aerial imagery with complex backgrounds, significantly improving the detection accuracy of small targets.
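The abstract describes PCE3DA only at a high level. The sketch below is a minimal, hypothetical PyTorch illustration of how a joint spatial-channel ("3D") attention map of the kind described might be formed and applied to a backbone feature map; the class name, the bottleneck MLP, and the 7×7 spatial convolution are assumptions for illustration, not the authors' implementation.

```python
# Hypothetical sketch of a spatial + channel ("3D") attention block in the
# spirit of PCE3DA as summarized in the abstract. Layer choices and shapes
# are assumptions, not the paper's actual design.
import torch
import torch.nn as nn


class PositionalChannelAttention3D(nn.Module):
    """Combines channel statistics with spatial (positional) cues to build a
    full C x H x W attention tensor, then reweights the input feature map."""

    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        # Channel branch: global pooling followed by a bottleneck MLP.
        self.channel_mlp = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, kernel_size=1),
        )
        # Spatial branch: compress channels but keep the H x W layout.
        self.spatial_conv = nn.Conv2d(channels, 1, kernel_size=7, padding=3)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, C, H, W)
        channel_logits = self.channel_mlp(x)    # (B, C, 1, 1)
        spatial_logits = self.spatial_conv(x)   # (B, 1, H, W)
        # Broadcasting the two branches yields a 3D attention map (B, C, H, W).
        attention = self.sigmoid(channel_logits + spatial_logits)
        return x * attention


if __name__ == "__main__":
    feat = torch.randn(2, 256, 32, 32)              # e.g. a backbone feature map
    out = PositionalChannelAttention3D(256)(feat)
    print(out.shape)                                # torch.Size([2, 256, 32, 32])
```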
Daxiang LI, Jiani XIN, Ying LIU. Position-sensitive Transformer aerial image object detection model[J]. Optics and Precision Engineering, 2024, 32(5): 727
Received: May 30, 2023
Accepted: --
Published Online: Apr. 2, 2024
The Author Email: XIN Jiani (xjn_2000@163.com)