Review of advances in small object detection technology based on deep learning (invited)

Genghuan LIU; Xiangjin ZENG; Jiazhen DOU; Zhenbo REN; Liyun ZHONG; Jianglei DI; Yuwen QIN

doi:10.3788/IRLA20240253

Infrared and Laser Engineering, Vol. 53, Issue 9, 20240253 (2024)

Genghuan LIU1...2,3, Xiangjin ZENG1,2,3, Jiazhen DOU1,2,3, Zhenbo REN4,*, Liyun ZHONG1,2,3, Jianglei DI1,2,3, and Yuwen QIN1,2,3

Author Affiliations

¹School of Information Engineering, Guangdong University of Technology, Guangzhou 510006, China

²Key Laboratory of Photonic Technology for Integrated Sensing and Communication, Ministry of Education, Guangzhou 510006, China

³Guangdong Provincial Key Laboratory of Information, Guangzhou 510006, China

⁴School of Physical Science and Technology, Northwestern Polytechnical University, Xi'an 710129, China

Figures & Tables(21)

Fig. 1. Examples of small and tiny objects in the AI-TOD dataset (green boxes represent small objects, while red boxes represent tiny objects)^[12]

Fig. 2. A complex background leads to a low signal-to-noise ratio and low detectability^[6]

Fig. 3. Low tolerance of small targets to bounding box perturbations (the top-left, bottom-left, and right images show small, medium, and large targets, respectively; black indicates the ground-truth boxes, while blue and red indicate predicted bounding boxes slightly offset along the diagonal)
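
The effect described in Fig. 3 can be reproduced with a little arithmetic. The sketch below is illustrative only: the box side lengths (16, 64, and 256 px) and the 4-pixel diagonal offset are assumed values, not taken from the article, and `iou` and `iou_after_shift` are hypothetical helper names.

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    # Clamp at zero so disjoint boxes give an empty intersection.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)


def iou_after_shift(side, shift):
    """IoU between a square ground-truth box and the same box shifted diagonally."""
    gt = (0.0, 0.0, float(side), float(side))
    pred = (float(shift), float(shift), float(side + shift), float(side + shift))
    return iou(gt, pred)


if __name__ == "__main__":
    # Assumed sizes standing in for small, medium, and large targets.
    for side in (16, 64, 256):
        print(f"{side:>3} px box, 4 px diagonal shift: IoU = {iou_after_shift(side, 4):.3f}")
```

With these assumed numbers the same 4-pixel offset gives an IoU of about 0.39 for the 16 px box, 0.78 for the 64 px box, and 0.94 for the 256 px box, so only the small target would fall below a typical 0.5 matching threshold.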

Fig. 4. Four methods of multi-scale representation learning^[76]. (a) Single feature map; (b) Image pyramid; (c) Pyramid feature levels; (d) Feature pyramid network

Fig. 5. PANet network structure^[81]

Fig. 6. GCWNet network structure^[114]

Fig. 7. Module structure of LSKNet^[127]

Fig. 8. Detection methods of four anchor-free mechanisms. (a) CornerNet; (b) CenterNet; (c) ExtremeNet; (d) FCOS

Fig. 9. DETR network structure^[150]

Fig. 10. Anchor DETR network structure^[157]

Fig. 11. Four image fusion strategies. (a) Early fusion; (b) Mid-level fusion; (c) Late fusion; (d) Confidence fusion^[169]

Fig. 12. YOLOFusion network structure^[182]

Fig. 13. Examples of various datasets. (a) DOTA^[13]; (b) AI-TOD^[12]; (c) DIOR^[8]; (d) VisDrone2019^[22]; (e) TT100K^[218]; (f) BSTID^[219]; (g) TinyPerson^[14]; (h) CityPersons^[25]; (i) WiderPerson^[220]; (j) BIRDSAI^[221]; (k) VEDAI^[222]; (l) MS COCO^[1]

Table 1. Data augmentation methods

Number	Method	Main content	Year	Publication
1	CutOut^[41]	-	2017	arXiv
2	Adaptive Resampling^[47]	-	2019	ICCV
3	Mosaic^[45]	-	2019	arXiv

Table 2. Super-resolution methods

Number	Method	Main content	Year	Publication
1	CARAFE^[58]	-	2019	CVPR
2	Perceptual GAN^[68]	-	2017	CVPR
3	MTGAN^[71]	-	2020	IJCV

Table 3. Summary of advantages and disadvantages of small object detection methods

Method	Model	Advantage	Disadvantage
Data Augmentation	MixUp^[42], CutMix^[43], Mosaic^[45]	Increases the number of small-object samples, compensating for the limited visual information that small targets carry	Relies heavily on the specific dataset; may introduce new noise that impairs feature extraction
Super Resolution	CARAFE^[58], Perceptual GAN^[68], MTGAN^[71]	Restores some small-object details by learning the relationship between small and large targets	Faces a trade-off between high computational load and performance gains; GANs may generate false artifacts
Multi-scale Feature Perception and Fusion	FPN^[76], PANet^[78], AFF^[88]	Combines semantically rich deep features with the spatial detail of shallow features	Prone to noise interference and adds computational burden
Contextual Information Learning	CoupleNet^[103], PyramidBox^[104], GCWNet^[114]	Uses the relationship between a target and its surrounding objects and environment to give the network more information	Redundant contextual information can introduce noise
Large Kernel Convolution	ConvNeXt^[124], LSKNet^[127], YOLO-MS^[129]	A larger receptive field effectively captures long-range dependencies and contextual information	Introduces large computational overhead, which hinders real-time detection
Anchor-free	CenterNet^[138], FCOS^[141], YOLOX^[143]	Avoids complex anchor-box calculations	Often yields inaccurate bounding boxes
DETR	DETR^[151], CF-DETR^[154], RT-DETR^[19]	Avoids complex convolutional neural network-based designs and post-processing	Training is slow
Dual-mode	Wagner et al.^[170], Liu et al.^[174], YOLOFusion^[182]	Improves detection performance and robustness, especially in complex environments	Increases computational cost and system complexity

Table 4. Brief performance evaluation on the MS COCO dataset

Note: bold indicates that the model ranks first on the given metric, underline indicates second, and wavy underline indicates third.

Model	Backbone	AP	AP0.50	AP0.75	APS	APM	APL	Year
FPN^[76]	ResNet101	36.2	59.1	39.0	18.2	39.0	48.2	2017
PANet^[84]	ResNeXt101	40.0	62.8	43.1	18.8	42.3	57.2	2018
FCOS^[141]	ResNet101	41.5	60.7	45.0	24.4	44.8	51.6	2019
YOLOX-L^[143]	Modified CSP v5	50.0	68.5	54.5	29.8	54.5	64.4	2021
QueryDet^[209]	ResNeXt101	44.7	65.6	47.4	29.1	47.5	53.1	2022
RTMDet-m^[128]	CSPDarkNet	49.3	66.9	53.9	30.5	53.6	66.1	2022
DN-DETR^[162]	ResNet101+DC5	47.3	67.5	50.8	28.6	51.5	65.0	2022
YOLO-MS^[129]	CSPDarkNet	51.0	68.6	55.7	33.1	56.1	66.5	2023
RT-DETR^[19]	ResNet101	54.3	72.7	58.6	36.0	58.8	72.1	2023
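
The table does not define APS, APM, and APL. Under the standard MS COCO convention they are AP restricted to objects with area below 32² pixels, between 32² and 96² pixels, and above 96² pixels. The sketch below shows that bucketing; the function name `coco_size_bucket` and the handling of objects that fall exactly on a threshold are assumptions made for illustration.

```python
# Area thresholds (in pixels) of the standard MS COCO size split.
SMALL_MAX_AREA = 32 ** 2   # 1024
MEDIUM_MAX_AREA = 96 ** 2  # 9216


def coco_size_bucket(area):
    """Return the COCO size category for an object of the given pixel area.

    Boundary handling (half-open intervals) is an assumption of this sketch.
    """
    if area < SMALL_MAX_AREA:
        return "small"
    if area < MEDIUM_MAX_AREA:
        return "medium"
    return "large"


if __name__ == "__main__":
    for width, height in ((12, 20), (50, 60), (200, 150)):
        print(f"{width}x{height}: {coco_size_bucket(width * height)}")
```

Read this way, the APS column is the one that speaks most directly to small object detection, which is why it is reported separately from the overall AP.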

Table 5. Brief performance evaluation on the DOTA dataset

Note: bold indicates that the model ranks first on the given metric, underline indicates second, and wavy underline indicates third.

Model	Backbone	AP0.50	Year	Model	Backbone	AP0.50	Year
YOLOv2^[40]	DarkNet19	25.4	2017	PP-YOLOE-R^[149]	CSPRepResNet	80.7	2022
CenterNet^[138]	ResNet101	59.1	2019	RTMDet-L^[128]	CSPDarkNet53	81.3	2022
CADNet^[106]	ResNet101	69.9	2019	Info-FPN^[98]	ResNet50	80.9	2023
SLA^[201]	ResNet50	76.3	2021	PCI^[115]	ReResNet50	80.2	2023

Table 6. Brief performance evaluation on the AI-TOD dataset

Note: bold indicates that the model ranks first on the given metric, underline indicates second, and wavy underline indicates third.

Model	Backbone	AP	AP0.50	AP0.75	APvt	APt	APs	APm	Year
Faster R-CNN^[17]	ResNet50	12.4	28.3	8.1	0.0	8.4	26.3	36.2	2015
Cascade R-CNN^[207]	ResNet50	14.4	32.7	10.6	0.0	9.9	28.3	39.9	2018
FSAF^[140]	ResNet50	14.4	35.3	8.4	3.4	14.4	19.9	24.2	2019
TOOD^[145]	ResNet50	18.6	43.0	12.7	3.2	16.5	26.9	39.2	2021
M-CenterNet^[13]	DLA-34	14.5	40.7	6.4	6.1	15.0	19.4	20.4	2021
Faster R-CNN/NWD^[199]	ResNet50	20.5	51.5	12.4	5.8	20.3	25.4	35.7	2021
Faster R-CNN/RFLA^[202]	ResNet50	21.1	51.6	13.1	9.5	21.2	26.1	31.5	2022
FSANet^[95]	ResNet50	16.3	41.4	9.8	4.4	14.6	23.4	33.3	2022
Faster R-CNN/ADAS-GPM^[203]	ResNet50	22.3	53.7	13.5	7.1	21.9	27.5	35.1	2023
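
Several rows above replace IoU-based matching with a metric that is less sensitive to object size; NWD (normalized Wasserstein distance) is one example. As a rough sketch of the idea, each box (cx, cy, w, h) is modeled as a 2D Gaussian and the distance between the two Gaussians is mapped to a similarity in (0, 1]. The formula below follows the commonly cited NWD formulation as best understood here, not this article; the function name `nwd` and the normalizing constant `c=12.8` are assumptions for illustration.

```python
import math


def nwd(box_a, box_b, c=12.8):
    """Normalized Wasserstein distance between two (cx, cy, w, h) boxes.

    Each box is treated as a Gaussian with mean (cx, cy) and standard
    deviations (w / 2, h / 2). The constant c is dataset dependent; the
    default used here is an assumed value.
    """
    cxa, cya, wa, ha = box_a
    cxb, cyb, wb, hb = box_b
    # Squared 2-Wasserstein distance between the two axis-aligned Gaussians.
    w2_squared = (
        (cxa - cxb) ** 2
        + (cya - cyb) ** 2
        + ((wa - wb) / 2.0) ** 2
        + ((ha - hb) / 2.0) ** 2
    )
    return math.exp(-math.sqrt(w2_squared) / c)


if __name__ == "__main__":
    small = nwd((8, 8, 16, 16), (12, 12, 16, 16))
    large = nwd((128, 128, 256, 256), (132, 132, 256, 256))
    print(f"16 px box, 4 px diagonal shift:  NWD = {small:.3f}")
    print(f"256 px box, 4 px diagonal shift: NWD = {large:.3f}")
```

For a pure translation the value depends only on the offset, not on the box size, so a few pixels of localization error are not penalized more heavily on tiny targets than on large ones, unlike IoU.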

Table 7. Brief performance evaluation on the TinyPerson dataset

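Note: bold indicates that the model ranks first on the given metric, underline indicates second, and wavy underline indicates third.

Model	$\mathrm{AP}_{50}^{\mathrm{tiny1}}$	$\mathrm{AP}_{50}^{\mathrm{tiny2}}$	$\mathrm{AP}_{50}^{\mathrm{tiny3}}$	$\mathrm{AP}_{50}^{\mathrm{tiny}}$	$\mathrm{AP}_{50}^{\mathrm{small}}$	$\mathrm{AP}_{25}^{\mathrm{tiny}}$	$\mathrm{AP}_{75}^{\mathrm{tiny}}$	Year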
Cascade R-CNN^[207]	45.21	60.06	65.06	57.19	70.71	76.99	8.56	2018
FCOS^[141]	3.39	12.39	29.25	16.90	35.75	40.49	1.45	2019
Faster R-CNN-SPPNet^[90]	47.56	62.36	66.15	59.13	71.17	79.47	8.62	2021
FPN-SM^[14]	33.91	55.16	62.58	51.33	66.96	71.55	6.46	2021
Faster R-CNN-RFLA^[202]	32.80	55.60	60.60	50.10	65.30	69.90	5.90	2022
SODNet^[116]	40.53	59.52	64.62	55.55	66.22	75.98	7.61	2022
FENet^[97]	37.02	55.03	62.44	51.33	66.92	72.81	6.20	2023

Table 8. Brief performance evaluation on the TT100K dataset

Note: bold indicates that the model ranks first on the given metric, underline indicates second, and wavy underline indicates third.

Model	Small			Medium			Large			Year
Model	Rec	Acc	F1	Rec	Acc	F1	Rec	Acc	F1	Year
Perceptual GAN^[68]	89.0	84.0	86.4	96.0	91.0	93.4	89.0	91.0	89.9	2017
FPN^[76]	86.4	80.1	83.1	93.9	94.0	93.3	92.2	92.2	92.2	2017
Noh, et al^[70]	92.6	84.9	88.6	97.5	94.5	96.0	97.5	93.3	95.4	2019
EFPN^[63]	92.3	85.7	88.9	96.7	95.7	96.2	97.1	94.3	95.7	2021
SODNet^[116]	90.0	85.5	87.6	96.6	95.8	96.2	-	-	-	2022
AFPN^[94]	92.7	85.1	88.7	97.7	95.3	96.5	97.7	94.3	96.0	2022
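
The F1 columns appear to be the harmonic mean of the Rec and Acc columns, with Acc playing the role of precision; the table does not state this, so it is treated here as an assumption. The sketch below recomputes F1 for the small-object columns and prints it next to the reported value. The name `f1_score` is a hypothetical helper, and the numbers are copied from the table above.

```python
def f1_score(recall, precision):
    """Harmonic mean of recall and precision (both given in percent)."""
    return 2.0 * recall * precision / (recall + precision)


# (model, Rec, Acc, reported F1) taken from the "Small" columns of Table 8.
SMALL_ROWS = [
    ("Perceptual GAN", 89.0, 84.0, 86.4),
    ("FPN", 86.4, 80.1, 83.1),
    ("Noh et al.", 92.6, 84.9, 88.6),
    ("EFPN", 92.3, 85.7, 88.9),
    ("SODNet", 90.0, 85.5, 87.6),
    ("AFPN", 92.7, 85.1, 88.7),
]

if __name__ == "__main__":
    for name, rec, acc, reported in SMALL_ROWS:
        computed = f1_score(rec, acc)
        print(f"{name:<15} computed F1 = {computed:.1f}, reported F1 = {reported}")
```

Under this assumption the recomputed values agree with the reported small-object F1 scores to within about 0.1, which is consistent with rounding of the underlying recall and precision figures.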

Genghuan LIU, Xiangjin ZENG, Jiazhen DOU, Zhenbo REN, Liyun ZHONG, Jianglei DI, Yuwen QIN. Review of advances in small object detection technology based on deep learning (invited)[J]. Infrared and Laser Engineering, 2024, 53(9): 20240253
