Position-sensitive Transformer aerial image object detection model

Method	MSFF	PSSA	Loss	AP_S	AP_M	AP_L	AP	Params (M)
Baseline	-	-	-	13.8	36.8	47.5	24.7	41.30
	√	-	-	16.4	38.9	49.4	26.4	42.36
	-	√	-	15.0	37.6	48.7	25.8	41.45
	-	-	√	15.6	39.1	48.9	26.0	41.30
	√	√	-	17.1	39.7	49.8	27.2	42.51
	-	√	√	16.5	40.0	49.1	26.9	41.45
	√	-	√	18.5	39.6	50.1	28.1	42.36
Ours	√	√	√	19.4	40.1	50.9	28.8	42.51
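
The contribution of each component in the ablation rows above can be read off as a difference against the baseline row. The following is a minimal Python sketch of that arithmetic, not part of the original article: the dictionary layout and the helper name `gain_over_baseline` are illustrative assumptions, and the numbers are transcribed directly from the table.

```python
# Ablation rows transcribed from the table above.
# Key: (MSFF, PSSA, Loss) switches; value: (AP_S, AP_M, AP_L, AP).
ABLATION = {
    (False, False, False): (13.8, 36.8, 47.5, 24.7),  # baseline
    (True, False, False): (16.4, 38.9, 49.4, 26.4),
    (False, True, False): (15.0, 37.6, 48.7, 25.8),
    (False, False, True): (15.6, 39.1, 48.9, 26.0),
    (True, True, False): (17.1, 39.7, 49.8, 27.2),
    (False, True, True): (16.5, 40.0, 49.1, 26.9),
    (True, False, True): (18.5, 39.6, 50.1, 28.1),
    (True, True, True): (19.4, 40.1, 50.9, 28.8),  # full model ("Ours")
}
METRICS = ("AP_S", "AP_M", "AP_L", "AP")


def gain_over_baseline(config):
    """Return the per-metric gain of `config` over the baseline row.

    Hypothetical helper for illustration; values are rounded to one
    decimal place to match the precision reported in the table.
    """
    base = ABLATION[(False, False, False)]
    row = ABLATION[config]
    return {m: round(r - b, 1) for m, r, b in zip(METRICS, row, base)}


if __name__ == "__main__":
    for config in ABLATION:
        print(config, gain_over_baseline(config))
```

Under these assumptions, calling `gain_over_baseline((True, True, True))` gives the gain of the full model over the baseline on each metric; the gains on small objects (AP_S) can then be compared with those on medium and large objects.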

Table 2. Experimental results for different attention mechanisms and using multi-scale features

Group	Method	AP_S	AP_M	AP_L	AP
A	Baseline	13.8	36.8	47.5	24.7
B	Baseline-SE	13.9	37.0	47.5	24.9
C	Baseline-SA	14.5	38.1	47.7	25.2
D	Baseline-CA	14.3	37.7	48.3	25.4
E	Baseline-CBAM	14.6	37.5	48.1	25.2
F	Baseline-PCE3DA	15.2	38.4	48.7	25.7
G	F+MSFF	16.4	38.9	49.4	26.4

Table 3. Experimental results of different relative position calculation methods
Method	AP_S	AP_M	AP_L	AP
Baseline	13.8	36.8	47.5	24.7
Ref. [27]	14.3	37.0	48.3	25.0
Ref. [28]	14.6	37.4	48.1	25.1
PSSA	15.0	37.6	48.7	25.8

Table 4. Performance comparison of different algorithms on VisDrone test set

Method	AP₅₀	AP₇₅	AP	FPS
Faster R-CNN [3]	21.7	/	/	15.9
Cascade R-CNN [4]	38.6	25.0	23.5	9.0
YOLOv4 [6]	31.2	16.7	16.8	28.8
QueryDet [7]	48.1	28.8	28.3	2.8
CornerNet [10]	34.1	15.8	17.4	15.5
RetinaNet [20]	28.4	12.3	11.3	16
Double-Head RCNN [29]	38.3	24.8	23.8	6.5
IterDet [30]	36.8	20.3	20.4	11.4
RSOD [31]	43.3	27.1	25.4	28
YOLOv8 [32]	46.4	27.5	26.5	30.1
PVTv2 [33]	34.1	21.4	20.6	10.9
PS-TOD (Ours)	51.8	28.3	28.8	22.7

Table 5. Experimental results of different categories on VisDrone test set
Category	Pedestrian	People	Car	Bus	Bicycle	Truck	Tricycle	Awning-tricycle	Van	Motorcycle
Baseline	24.8	18.7	61.6	35.2	12.1	23.3	15.2	4.6	28.6	24.9
PS-TOD	29.0	22.4	64.3	45.9	14.7	27.1	21.4	9.0	31.7	28.4

Daxiang LI, Jiani XIN, Ying LIU. Position-sensitive Transformer aerial image object detection model[J]. Optics and Precision Engineering, 2024, 32(5): 727
