Object Detection Algorithm Based on Dual-modal Fusion Network

Table 1. Detector performance of the n-model for different input image-pair sizes
Algorithm	Resolution	AP_0.5:0.95	AP_0.5
Ours-n	416×416	30.5	70
Ours-n	512×512	32.5	73.1
Ours-n	608×608	32.9	73.3
Ours-n	640×640	33.3	73.8

Table 2. Detector performance of the s-model for different input image-pair sizes
Algorithm	Resolution	AP_0.5:0.95	AP_0.5
Ours-s	416×416	31.1	71
Ours-s	512×512	31.9	72.7
Ours-s	608×608	34.3	73.9
Ours-s	640×640	35.2	74.5

Table 3. Ablation experimental results of different models on the KAIST dataset

Method	Encoder-VS	Encoder-IR	Gated Fusion	Input	AP_0.5:0.95	AP_0.5	FPS
YOLOv5-n				VS	24.8	58.7	158.7
YOLOv5-n				IR	31.6	71	158.7
YOLOv5-n-EVS	√			VS	25	59.1	125
YOLOv5-n-EIR		√		IR	31.8	71.3	125
Ours-n	√	√	√	VS+IR	33.3	73.8	117.6
YOLOv5-s				VS	26.7	59.8	112.4
YOLOv5-s				IR	32	71.5	112.4
YOLOv5-s-EVS	√			VS	26.9	60.2	107.5
YOLOv5-s-EIR		√		IR	32.2	71.9	107.5
Ours-s	√	√	√	VS+IR	35.2	74.5	102
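
The accuracy and speed columns of Table 3 can be read together as a trade-off. The sketch below uses only the KAIST values from Table 3 to compute how much AP_0.5:0.95 each fused model gains over its stronger single-modal YOLOv5 baseline, and what share of the baseline frame rate it keeps. The function and variable names are illustrative and do not come from the paper.

```python
# Values copied from Table 3 (KAIST): (AP_0.5:0.95, FPS).
# All names below are illustrative and do not come from the paper.
BASELINES = {
    "n": {"VS": (24.8, 158.7), "IR": (31.6, 158.7)},
    "s": {"VS": (26.7, 112.4), "IR": (32.0, 112.4)},
}
FUSED = {"n": (33.3, 117.6), "s": (35.2, 102.0)}


def tradeoff(size):
    """Return (AP gain over the best single-modal baseline, % of its FPS kept)."""
    # Tuples compare on AP first, so max() picks the more accurate baseline.
    best_ap, best_fps = max(BASELINES[size].values())
    fused_ap, fused_fps = FUSED[size]
    return round(fused_ap - best_ap, 1), round(100 * fused_fps / best_fps, 1)


for size in ("n", "s"):
    gain, fps_kept = tradeoff(size)
    print(f"Ours-{size}: +{gain} AP_0.5:0.95, {fps_kept}% of baseline FPS")
```

On these numbers the n-model gains 1.7 points while keeping about 74% of the baseline frame rate, and the s-model gains 3.2 points while keeping about 91%.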

Table 4. Ablation experimental results of different models on the GIR dataset

Method	Encoder-VS	Encoder-IR	Gated Fusion	Input	AP_0.5:0.95	AP_0.5	FPS
YOLOv5-n				VS	48.4	88.8	158.7
YOLOv5-n				IR	36.3	75.5	158.7
YOLOv5-n-EVS	√			VS	49.4	89.1	105.3
YOLOv5-n-EIR		√		IR	36.4	76.3	105.3
Ours-n	√	√	√	VS+IR	49.7	89.8	101
YOLOv5-s				VS	51.4	89.9	111.1
YOLOv5-s				IR	36.6	76.8	111.1
YOLOv5-s-EVS	√			VS	51.9	90.1	91.7
YOLOv5-s-EIR		√		IR	36.7	77	91.7
Ours-s	√	√	√	VS+IR	52.2	90.5	85.5

Table 5. Per-class detection accuracy of the proposed algorithm and the baseline algorithms (AP_0.5, %)

Class	Ours-n	YOLOv5-n-VS	YOLOv5-n-IR	Ours-s	YOLOv5-s-VS	YOLOv5-s-IR
Person	90.7	91.2	84.0	91.7	91.7	85.4
Dog	99.5	99.5	99.5	99.5	99.5	91.6
Car	95.4	95.2	94.3	95.8	95.1	94.7
Bicycle	80.4	83.7	70.7	80.8	84.7	72.8
Plant	85.6	84.5	79.1	86.4	87.0	76.0
Motorcycle	82.8	82.0	76.1	83.9	82.4	77.7
Umbrella	86.0	87.8	70.5	85.7	86.6	76.1
Kite	93.6	82.9	64.6	94.4	89.2	67.6
Toy	95.6	96.3	86.7	96.4	97.0	83.7
Ball	88.5	84.7	29.5	90.7	85.5	42.1
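
Averaging each column of Table 5 over the ten classes gives one mean AP_0.5 per detector. The sketch below does that arithmetic on the values copied from the table; the dictionary keys mirror the column headers, and the remaining names are illustrative. The resulting means agree to one decimal place with the AP_0.5 column of Table 4, which suggests that Table 5 reports GIR results, although the caption does not say so.

```python
# Per-class AP_0.5 (%) copied from Table 5, in row order Person ... Ball.
# Dictionary keys mirror the column headers; other names are illustrative.
TABLE5 = {
    "Ours-n":      [90.7, 99.5, 95.4, 80.4, 85.6, 82.8, 86.0, 93.6, 95.6, 88.5],
    "YOLOv5-n-VS": [91.2, 99.5, 95.2, 83.7, 84.5, 82.0, 87.8, 82.9, 96.3, 84.7],
    "YOLOv5-n-IR": [84.0, 99.5, 94.3, 70.7, 79.1, 76.1, 70.5, 64.6, 86.7, 29.5],
    "Ours-s":      [91.7, 99.5, 95.8, 80.8, 86.4, 83.9, 85.7, 94.4, 96.4, 90.7],
    "YOLOv5-s-VS": [91.7, 99.5, 95.1, 84.7, 87.0, 82.4, 86.6, 89.2, 97.0, 85.5],
    "YOLOv5-s-IR": [85.4, 91.6, 94.7, 72.8, 76.0, 77.7, 76.1, 67.6, 83.7, 42.1],
}


def class_mean(scores):
    """Unweighted mean over classes, rounded to one decimal like the tables."""
    return round(sum(scores) / len(scores), 1)


MEANS = {name: class_mean(scores) for name, scores in TABLE5.items()}

for name, mean in MEANS.items():
    print(f"{name}: mean AP_0.5 = {mean}")
```

The means come out as 89.8, 88.8, and 75.5 for the n-models and 90.5, 89.9, and 76.8 for the s-models, in the column order of Table 5.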

Table 6. Comparative experimental results on the KAIST dataset

Input	Algorithm	Backbone	Resolution	AP_0.5:0.95	AP_0.5	FPS
VS	Faster R-CNN (2015)	ResNet-50	1000×600	24.2	58.3	15.2
	SSD (2016)	VGG-16	512×512	18.1	48.2	38.1
	RetinaNet (2017)	ResNet-50	1333×800	22.5	57.7	16.6
	YOLOv3 (2018)	DarkNet-53	416×416	18.3	46.7	56.2
	FCOS (2019)	ResNet-50	1333×800	22.7	56.7	18.3
	ATSS (2020)	ResNet-50	1333×800	24.3	57.8	17
	YOLOv4 (2020)	CSPDarkNet-53	416×416	23.7	57.4	55
	YOLOX-s (2021)	Modified CSP v5	416×416	27	61.1	48.4
	YOLOX-m (2021)	Modified CSP v5	416×416	27.7	61.8	40.3
	YOLOF (2021)	ResNet-50	1333×800	22.2	54.1	25.7
	YOLOv5-n (2020)	Modified CSP v5	640×640	24.8	58.7	158.7
	YOLOv5-s (2020)	Modified CSP v5	640×640	26.4	59.8	112.4
	YOLOv5-n-EVS	Modified CSP v5	640×640	25	59.1	125
	YOLOv5-s-EVS	Modified CSP v5	640×640	26.9	60.2	107.5
IR	Faster R-CNN (2015)	ResNet-50	1000×600	28.8	68.6	12
	SSD (2016)	VGG-16	512×512	23.2	60.9	34
	RetinaNet (2017)	ResNet-50	1333×800	27.8	68.2	14.1
	YOLOv3 (2018)	DarkNet-53	416×416	25.3	63.6	37
	FCOS (2019)	ResNet-50	1333×800	29.6	69.4	14
	ATSS (2020)	ResNet-50	1333×800	29	69	13.8
	YOLOv4 (2020)	CSPDarkNet-53	416×416	27.4	68.5	52.6
	YOLOX-s (2021)	Modified CSP v5	416×416	32.8	72.1	45
	YOLOX-m (2021)	Modified CSP v5	416×416	33.5	73.1	40
	YOLOF (2021)	ResNet-50	1333×800	27.3	65.6	25
	YOLOv5-n (2020)	Modified CSP v5	640×640	31.6	71	158.7
	YOLOv5-s (2020)	Modified CSP v5	640×640	32	71.5	112.4
	YOLOv5-n-EIR	Modified CSP v5	640×640	31.8	71.3	125
	YOLOv5-s-EIR	Modified CSP v5	640×640	32.2	71.9	107.5
VS+IR	MMTOD (2019) [18]	ResNet-101	1000×600	31.1	70.7	13.2
	CMDet (2021) [37]	ResNet-101	640×512	28.3	68.4	25.3
	RISNet (2022) [38]	DarkNet-53	416×416	33.1	72.7	23
	Ours-n	Modified CSP v5	640×640	33.3	73.8	117.6
	Ours-s	Modified CSP v5	640×640	35.2	74.5	102

Table 7. Comparative experimental results on the GIR dataset

Input	Algorithm	Backbone	Resolution	AP_0.5:0.95	AP_0.5	FPS
VS	YOLOv3 (2018)	DarkNet-53	416×416	41.2	85.7	50
	FCOS (2019)	ResNet-50	1333×800	40.4	84	16
	ATSS (2020)	ResNet-50	1333×800	47.1	87.1	14
	YOLOv4 (2020)	CSPDarkNet-53	416×416	44.5	87.9	53
	YOLOX-s (2021)	Modified CSP v5	416×416	51.7	90.3	52
	YOLOv5-n (2020)	Modified CSP v5	640×640	48.4	88.8	158.7
	YOLOv5-s (2020)	Modified CSP v5	640×640	51.4	89.8	111.1
	YOLOv5-n-EVS	Modified CSP v5	640×640	49.4	89.1	105.3
	YOLOv5-s-EVS	Modified CSP v5	640×640	51.9	90.1	91.7
IR	YOLOv3 (2018)	DarkNet-53	416×416	35.6	74.2	48.4
	FCOS (2019)	ResNet-50	1333×800	34.5	72.3	12
	ATSS (2020)	ResNet-50	1333×800	35.2	73.4	11.7
	YOLOv4 (2020)	CSPDarkNet-53	416×416	35.8	74.7	49
	YOLOX-s (2021)	Modified CSP v5	416×416	36.9	76.3	53
	YOLOv5-n (2020)	Modified CSP v5	640×640	36.3	75.5	158.7
	YOLOv5-s (2020)	Modified CSP v5	640×640	36.6	76.8	111.1
	YOLOv5-n-EIR	Modified CSP v5	640×640	36.4	76.3	105.3
	YOLOv5-s-EIR	Modified CSP v5	640×640	36.7	77	91.7
VS+IR	MMTOD (2019) [18]	ResNet-101	1000×600	40.7	84.3	11.2
	CMDet (2021) [37]	ResNet-101	640×512	48.6	88.9	22.7
	RISNet (2022) [38]	DarkNet-53	416×416	49.3	89.2	23.3
	Ours-n	Modified CSP v5	640×640	49.7	89.8	101
	Ours-s	Modified CSP v5	640×640	52.2	90.5	85.5

Ying SUN, Zhiqiang HOU, Chen YANG, Sugang MA, Jiulun FAN. Object Detection Algorithm Based on Dual-modal Fusion Network[J]. Acta Photonica Sinica, 2023, 52(1): 0110002
