Object Detection Algorithm Based on CNN-Transformer Dual Modal Feature Fusion

Table 1. Ablation experiment on the KAIST dataset

Method	CFE	TFE	IRF	RGB input	IR input	AP_0.5 (%)	AP_0.5:0.95 (%)	FPS
YOLOv5					√	71.5	32	112.4
YOLOv5				√		59.8	26.7	112.4
YOLOv5+CFE	√				√	72.2	32.4	103.5
YOLOv5+TFE		√		√		60.4	26.8	98
YOLOv5+CFE+IRF	√		√	√	√	76.3	33.7	94.5
YOLOv5+TFE+IRF		√	√	√	√	76.5	33.9	90.1
CTDMDet (ours)	√	√	√	√	√	77.2	34.6	88.7
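
The AP_0.5 and AP_0.5:0.95 columns are not defined in these tables. Assuming they follow the common COCO-style convention (an assumption, since the evaluation protocol is not restated here), a detection counts as correct when its IoU with a ground-truth box reaches a threshold t, AP_t is the area under the precision-recall curve at that threshold, and AP_0.5:0.95 averages AP_t over ten evenly spaced thresholds. The box symbols B_p (predicted) and B_g (ground truth) are introduced only for illustration:

```latex
\mathrm{IoU}(B_p, B_g) = \frac{|B_p \cap B_g|}{|B_p \cup B_g|}, \qquad
\mathrm{AP}_{0.5} = \mathrm{AP}_{t=0.5}, \qquad
\mathrm{AP}_{0.5:0.95} = \frac{1}{10} \sum_{t \in \{0.50,\, 0.55,\, \dots,\, 0.95\}} \mathrm{AP}_t
```

Under this reading, AP_0.5:0.95 rewards tighter box localization than AP_0.5 alone, which fits its lower values in every row.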

Table 2. Ablation experiment on the FLIR dataset

Method	CFE	TFE	IRF	RGB input	IR input	AP_0.5 (%)	AP_0.5:0.95 (%)	FPS
YOLOv5					√	73.9	35.7	132.6
YOLOv5				√		67.8	25.9	132.6
YOLOv5+CFE	√				√	82.4	43.9	124.7
YOLOv5+TFE		√		√		80	40.4	119.5
YOLOv5+CFE+IRF	√		√	√	√	85.3	46.2	116.9
YOLOv5+TFE+IRF		√	√	√	√	84.9	45.0	111
CTDMDet (ours)	√	√	√	√	√	85.5	46.6	108.3

Table 3. Ablation experiment on the GIR dataset

Method	CFE	TFE	IRF	RGB input	IR input	AP_0.5 (%)	AP_0.5:0.95 (%)	FPS
YOLOv5					√	76.8	36.6	111.1
YOLOv5				√		89.9	51.4	111.1
YOLOv5+CFE	√				√	84.4	47.9	106.7
YOLOv5+TFE		√		√		91.1	52.7	103
YOLOv5+CFE+IRF	√		√	√	√	91.6	55.9	95.2
YOLOv5+TFE+IRF		√	√	√	√	91.3	55.8	93.8
CTDMDet (ours)	√	√	√	√	√	91.7	56.4	91.6
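
Reading the three ablation tables largely comes down to differencing rows. The minimal sketch below copies the AP_0.5 and FPS values of the single-modality YOLOv5 baselines and of the full CTDMDet model from Tables 1 to 3, then reports the AP_0.5 gain over the better single-modality baseline and the FPS given up for it. The dictionary layout and the helper name `fusion_gain` are hypothetical illustration choices, not part of the paper.

```python
# AP_0.5 (%) and FPS copied from Tables 1-3 (ablation experiments).
# "ir"/"rgb": single-modality YOLOv5 baselines; "full": CTDMDet (all modules, both inputs).
ABLATION = {
    "KAIST": {"ir": (71.5, 112.4), "rgb": (59.8, 112.4), "full": (77.2, 88.7)},
    "FLIR": {"ir": (73.9, 132.6), "rgb": (67.8, 132.6), "full": (85.5, 108.3)},
    "GIR": {"ir": (76.8, 111.1), "rgb": (89.9, 111.1), "full": (91.7, 91.6)},
}


def fusion_gain(rows):
    """Return (AP_0.5 gain, FPS drop) of the full model over the better single-modality baseline."""
    # Tuples compare element by element, so max() picks the baseline with the higher AP_0.5.
    best_ap, best_fps = max(rows["ir"], rows["rgb"])
    full_ap, full_fps = rows["full"]
    # Round to one decimal, matching the precision of the tables.
    return round(full_ap - best_ap, 1), round(best_fps - full_fps, 1)


for name, rows in ABLATION.items():
    gain, fps_drop = fusion_gain(rows)
    print(f"{name}: +{gain} AP_0.5, -{fps_drop} FPS")
```

Comparing against the better single-modality baseline, rather than always the IR one, matters on GIR, where the RGB-only YOLOv5 (89.9 AP_0.5) already outperforms the IR-only one (76.8).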

Table 4. Quantitative analysis results on the KAIST, FLIR, and GIR datasets

Input	Algorithm	KAIST AP_0.5 (%)	KAIST AP_0.5:0.95 (%)	KAIST FPS	FLIR AP_0.5 (%)	FLIR AP_0.5:0.95 (%)	FLIR FPS	GIR AP_0.5 (%)	GIR AP_0.5:0.95 (%)	GIR FPS
IR	Faster R-CNN (2015)	68.6	28.8	12	78.4	37.9	16	77.9	39.2	10.9
	SSD (2016)	60.9	23.2	34	40	13.2	37.8	75.2	36.9	32.6
	RetinaNet (2017)	68.2	27.8	14.1	76.1	32.3	16.2	78.1	38.3	11.3
	YOLOv3 (2018)	63.6	25.3	37	72.6	30.4	39.6	74.2	35.6	48.4
	FCOS (2019)	69.4	29.6	14	82.4	42.6	17.3	72.3	34.5	12
	ATSS (2020)	69	29	13.8	71.5	38.6	15	73.4	35.2	11.7
	YOLOv4 (2020)	68.5	27.4	52.6	48.5	20.4	55.6	74.7	35.8	49
	YOLOv5-s (2020)	71.5	32	112.4	73.9	35.7	132.6	76.8	36.6	111.1
	YOLOF (2021)	65.6	27.3	25	51.2	19.3	30	68.3	30.7	22
	YOLOv7 (2022)	72.1	30.9	110.7	74.8	33.6	113.5	75.9	30.6	110.7
	YOLOv8 (2023)	68.7	28	107	70.6	31.6	110.1	73.2	32.9	106.4
	CTDMDet-IR	72.2	32.4	103.5	82.4	43.9	124.7	84.4	47.9	106.7
RGB	Faster R-CNN (2015)	58.3	24.2	15.2	65.6	22.8	16.8	88.9	45.8	12.6
	SSD (2016)	48.2	18.1	38.1	57.4	18.6	40.2	85.4	38.8	34.2
	RetinaNet (2017)	57.7	22.5	16.6	65.2	22.3	19.3	87.6	43.9	12.8
	YOLOv3 (2018)	46.7	18.3	56.2	56.9	16.8	58.8	85.7	41.2	50
	FCOS (2019)	56.7	22.7	18.3	67.1	26.6	25.1	84	40.4	16
	ATSS (2020)	57.8	24.3	17	57.8	23.9	18.9	87.1	47.1	14
	YOLOv4 (2020)	57.4	23.7	56	65.3	22.7	58	87.9	44.5	53
	YOLOv5-s (2020)	59.8	26.4	112.4	67.8	25.9	132.6	89.8	51.4	111.1
	YOLOF (2021)	54.1	22.2	25.7	44.8	16.8	29.3	76.1	42.8	21.3
	YOLOv7 (2022)	59.6	23.8	101.7	66.3	23.5	108.9	88.2	50.1	98.2
	YOLOv8 (2023)	56.2	21.5	100.4	63.9	21	103.4	84.6	44.6	97.7
	CTDMDet-RGB	60.4	26.8	98	80	40.4	119.5	91.1	52.7	103
IR + RGB	MMTOD [45] (2019)	70.7	31.3	13.2	76.3	37.2	13.2	84.3	40.7	11.2
	CMDet [46] (2021)	68.4	28.3	25.3	70.5	35.3	25.3	88.9	48.6	22.7
	GAFF [37] (2021)	67.1	24.4	70.9	72.9	33.4	70.9	82.1	54.4	70.9
	CFT [31] (2021)	71.2	29.3	88	77.7	36.8	88	88	60.5	41.6
	ProbEn [32] (2021)	-	-	-	75.5	37.9	-	-	-	-
	RISNet [33] (2022)	72.7	33.1	23	78.5	40.1	23	89.2	49.3	23.3
	CSAA [34] (2023)	-	-	-	79.2	41.6	-	-	-	-
	CTDMDet (ours)	77.2	34.6	88.7	85.5	46.6	108.3	91.7	56.4	91.6
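
A similar row-by-row comparison works for the IR + RGB block of Table 4, where ProbEn and CSAA report results only on FLIR. The minimal sketch below stores each "-" entry as `None`, skips those entries when picking the strongest competing fusion method per dataset, and prints the AP_0.5 margin of CTDMDet over it. The names `FUSION_AP50`, `CTDMDET_AP50`, and `strongest_baseline` are hypothetical, and only AP_0.5 is compared.

```python
# AP_0.5 (%) of the IR + RGB methods in Table 4; None marks a "-" (not reported) entry.
FUSION_AP50 = {
    "MMTOD": {"KAIST": 70.7, "FLIR": 76.3, "GIR": 84.3},
    "CMDet": {"KAIST": 68.4, "FLIR": 70.5, "GIR": 88.9},
    "GAFF": {"KAIST": 67.1, "FLIR": 72.9, "GIR": 82.1},
    "CFT": {"KAIST": 71.2, "FLIR": 77.7, "GIR": 88.0},
    "ProbEn": {"KAIST": None, "FLIR": 75.5, "GIR": None},
    "RISNet": {"KAIST": 72.7, "FLIR": 78.5, "GIR": 89.2},
    "CSAA": {"KAIST": None, "FLIR": 79.2, "GIR": None},
}
CTDMDET_AP50 = {"KAIST": 77.2, "FLIR": 85.5, "GIR": 91.7}


def strongest_baseline(dataset):
    """Return (method, AP_0.5) of the best competing fusion method, ignoring missing entries."""
    reported = {m: s[dataset] for m, s in FUSION_AP50.items() if s[dataset] is not None}
    best = max(reported, key=reported.get)
    return best, reported[best]


for ds, ours in CTDMDET_AP50.items():
    name, ap = strongest_baseline(ds)
    print(f"{ds}: CTDMDet {ours} vs {name} {ap} (margin {round(ours - ap, 1)})")
```

Filtering out missing values before calling `max` keeps the unreported ProbEn and CSAA cells from being treated as zeros or raising a comparison error on `None`.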

Chen YANG, Zhiqiang HOU, Xinyue LI, Sugang MA, Xiaobao YANG. Object Detection Algorithm Based on CNN-Transformer Dual Modal Feature Fusion[J]. Acta Photonica Sinica, 2024, 53(3): 0310001
