Infrared and visible image fusion based on cross-modal feature interaction and multi-scale reconstruction

Fig. 1. Network structure of the proposed algorithm: convolutional attention enhancement module (CEM), encoder network (Encoder), cross-modal feature interactive fusion module (CFIM), and decoder network based on multi-scale reconstruction (Decoder)

Fig. 2. CFIM structure diagram

Fig. 3. Comparison of fusion results on the TNO dataset

Fig. 4. Comparison of fusion results on the LLVIP dataset

Fig. 5. Comparison of fusion results on the MSRS dataset

Table 1. Parameters of the encoder and decoder networks

Network	Layer	Kernel size	Stride	Input channels	Output channels	Activation
Encoder	Conv0	3×3	1	1	16	ReLU
	E_Conv1	3×3	1	16	16	ReLU
	E_Conv2	3×3	2	16	32	ReLU
	E_Conv3	3×3	2	32	64	ReLU
	E_Conv4	3×3	2	64	128	ReLU
Decoder	D_Conv1	3×3	1	256	256	ReLU
	D_Conv2	3×3	1	128	128	ReLU
	D_Conv3	3×3	1	64	64	ReLU
	D_Conv4	3×3	1	32	1	ReLU
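
The encoder rows of Table 1 can be traced to see how the three stride-2 layers shrink the feature map while the channel count grows. The following is a minimal sketch, not the authors' implementation: it assumes a padding of 1 and a bias term on every convolution (neither is stated in the table), and the 256×256 input size is hypothetical. The decoder rows are left out because the table does not say how features are merged or upsampled between layers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConvSpec:
    name: str
    kernel: int
    stride: int
    in_ch: int
    out_ch: int


# Encoder rows copied from Table 1 (all layers use ReLU).
ENCODER = [
    ConvSpec("Conv0", 3, 1, 1, 16),
    ConvSpec("E_Conv1", 3, 1, 16, 16),
    ConvSpec("E_Conv2", 3, 2, 16, 32),
    ConvSpec("E_Conv3", 3, 2, 32, 64),
    ConvSpec("E_Conv4", 3, 2, 64, 128),
]


def conv_out_size(size, kernel, stride, padding=1):
    """Standard convolution output-size formula (padding=1 is an assumption)."""
    return (size + 2 * padding - kernel) // stride + 1


def param_count(spec, bias=True):
    """Weights plus optional bias for one layer (bias=True is an assumption)."""
    weights = spec.kernel * spec.kernel * spec.in_ch * spec.out_ch
    return weights + (spec.out_ch if bias else 0)


def trace_encoder(size, specs=ENCODER):
    """Return (layer name, output channels, output side length, parameters)."""
    rows = []
    for spec in specs:
        size = conv_out_size(size, spec.kernel, spec.stride)
        rows.append((spec.name, spec.out_ch, size, param_count(spec)))
    return rows


if __name__ == "__main__":
    # Hypothetical 256x256 single-channel input.
    for name, channels, side, params in trace_encoder(256):
        print(f"{name:8s} -> {channels:3d} x {side} x {side}, {params} parameters")
```

Under these assumptions, each stride-2 layer halves the side length of an even-sized input, so the four encoder scales differ by factors of two.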

Table 2. Objective metric results for fused images on the TNO dataset (bold: best; underline: second best)

Method	AG	EI	MI	PSNR	SD	SF	SSIM	VIF
GANMcC	2.44	26.53	0.60	27.87	33.43	5.38	0.80	0.12
SwinFuse	3.91	49.43	0.53	27.89	52.83	8.37	0.53	0.10
U2 Fusion	4.21	51.53	1.35	27.98	37.19	8.42	1.31	0.79
LapH	4.01	50.77	1.30	27.65	44.74	8.21	1.28	0.83
MUFusion	4.21	56.11	0.45	27.90	47.47	7.75	0.53	0.12
CMRFusion	3.41	39.11	2.52	27.82	36.66	7.44	1.30	0.87
TUFusion	1.86	21.06	0.52	27.00	28.93	3.85	0.91	0.10
Ours	4.77	66.35	1.91	27.92	63.28	9.92	1.08	1.01
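
The bold and underline markers named in the caption do not survive in plain text, but the best and second-best entries can be recovered from the numbers themselves. The following is a minimal sketch, assuming that a higher value is better for all eight metrics (the usual convention for these indices); the dictionary layout and the helper name `top_two` are illustrative and not part of the paper.

```python
METRICS = ["AG", "EI", "MI", "PSNR", "SD", "SF", "SSIM", "VIF"]

# Rows copied from Table 2 (TNO dataset), in the column order of METRICS.
TNO = {
    "GANMcC":    [2.44, 26.53, 0.60, 27.87, 33.43, 5.38, 0.80, 0.12],
    "SwinFuse":  [3.91, 49.43, 0.53, 27.89, 52.83, 8.37, 0.53, 0.10],
    "U2 Fusion": [4.21, 51.53, 1.35, 27.98, 37.19, 8.42, 1.31, 0.79],
    "LapH":      [4.01, 50.77, 1.30, 27.65, 44.74, 8.21, 1.28, 0.83],
    "MUFusion":  [4.21, 56.11, 0.45, 27.90, 47.47, 7.75, 0.53, 0.12],
    "CMRFusion": [3.41, 39.11, 2.52, 27.82, 36.66, 7.44, 1.30, 0.87],
    "TUFusion":  [1.86, 21.06, 0.52, 27.00, 28.93, 3.85, 0.91, 0.10],
    "Ours":      [4.77, 66.35, 1.91, 27.92, 63.28, 9.92, 1.08, 1.01],
}


def top_two(table, metric):
    """Return (best, second best) method names for one metric.

    Assumes higher is better. Ties keep the row order of the table,
    so a tied second place reports only the first tied method.
    """
    col = METRICS.index(metric)
    ranked = sorted(table, key=lambda m: table[m][col], reverse=True)
    return ranked[0], ranked[1]


if __name__ == "__main__":
    for metric in METRICS:
        best, second = top_two(TNO, metric)
        print(f"{metric:5s} best: {best:10s} second: {second}")
```

The same function applies unchanged to the LLVIP and MSRS tables once their rows are entered in the same layout.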

Table 3. Objective metric results for fused images on the LLVIP dataset (bold: best; underline: second best)

Method	AG	EI	MI	PSNR	SD	SF	SSIM	VIF
GANMcC	1.60	19.91	0.58	27.99	31.24	4.19	0.91	0.24
SwinFuse	1.07	17.00	0.48	28.28	26.52	3.63	0.58	0.13
U2 Fusion	1.82	26.37	1.64	28.64	29.46	4.91	1.34	0.70
LapH	2.35	37.73	1.58	28.65	44.35	5.67	1.35	1.02
MUFusion	1.95	27.73	0.50	27.79	33.58	4.74	0.78	0.24
CMRFusion	1.89	28.00	1.91	28.12	33.93	4.87	1.11	0.70
TUFusion	1.12	12.66	0.57	26.77	29.60	2.85	0.97	0.22
Ours	2.16	38.23	2.13	28.75	48.55	5.68	1.10	1.09

Table 4. Objective metric results for fused images on the MSRS dataset (bold: best; underline: second best)

Method	AG	EI	MI	PSNR	SD	SF	SSIM	VIF
GANMcC	1.64	19.70	0.49	28.20	26.55	3.97	0.87	0.11
SwinFuse	1.00	16.34	0.30	28.59	28.47	3.18	0.49	0.05
U2 Fusion	1.46	19.22	1.36	27.24	18.56	3.78	0.98	0.43
LapH	2.47	37.19	1.53	28.49	40.75	5.52	1.30	0.78
MUFusion	2.04	29.61	0.32	27.88	27.00	4.65	0.76	0.09
CMRFusion	2.07	31.53	3.56	31.65	42.96	4.87	1.43	1.02
TUFusion	1.39	16.62	0.47	26.78	26.13	3.50	0.86	0.09
Ours	2.29	40.96	2.41	28.65	56.20	5.70	0.93	0.95

Table 5. Ablation experiment results for CFIMRFusion (bold: highest value)

Dataset	Methods	AG	EI	SD	SF	VIF
TNO	CEM+CFIM	3.12	40.79	43.62	6.15	0.65
	CEM+JCDE	3.71	45.77	44.39	7.69	0.89
	CFIM+JCDE	4.56	58.56	52.86	9.45	0.90
	CEM+CFIM+JCDE	4.77	66.35	63.28	9.92	1.01
LLVIP	CEM+CFIM	1.85	27.92	36.50	4.76	0.98
	CEM+JCDE	1.86	28.51	37.34	4.87	0.98
	CFIM+JCDE	2.08	34.31	44.54	5.45	1.05
	CEM+CFIM+JCDE	2.16	38.23	48.55	5.68	1.09
MSRS	CEM+CFIM	2.04	31.39	41.93	4.81	0.91
	CEM+JCDE	2.09	32.00	42.87	4.96	0.94
	CFIM+JCDE	2.22	35.92	48.30	5.42	0.94
	CEM+CFIM+JCDE	2.29	40.96	56.20	5.70	0.95

Table 6. Comparison of average running time of different algorithms on three datasets

Method	TNO (s)	LLVIP (s)	MSRS (s)
GANMcC	4.8812	22.9348	5.2128
SwinFuse	5.1853	17.6771	5.3423
U2 Fusion	4.2032	11.8176	2.5972
LapH	2.0856	24.5213	1.9659
MUFusion	5.2808	33.3692	4.3691
CMRFusion	8.4142	42.8038	8.3331
TUFusion	3.8353	3.9438	3.8626
CFIMRFusion	0.5029	2.8176	0.4963
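
The running times in Table 6 are easier to compare as ratios against CFIMRFusion. The following is a minimal sketch of that arithmetic using only the values in the table; the names `speedup` and `fastest_baseline` are illustrative helpers, not part of the paper.

```python
DATASETS = ("TNO", "LLVIP", "MSRS")

# Average running time in seconds, copied from Table 6.
TIMES = {
    "GANMcC":      (4.8812, 22.9348, 5.2128),
    "SwinFuse":    (5.1853, 17.6771, 5.3423),
    "U2 Fusion":   (4.2032, 11.8176, 2.5972),
    "LapH":        (2.0856, 24.5213, 1.9659),
    "MUFusion":    (5.2808, 33.3692, 4.3691),
    "CMRFusion":   (8.4142, 42.8038, 8.3331),
    "TUFusion":    (3.8353, 3.9438, 3.8626),
    "CFIMRFusion": (0.5029, 2.8176, 0.4963),
}


def speedup(method, dataset, reference="CFIMRFusion"):
    """How many times longer `method` takes than `reference` on `dataset`."""
    i = DATASETS.index(dataset)
    return TIMES[method][i] / TIMES[reference][i]


def fastest_baseline(dataset, reference="CFIMRFusion"):
    """Name of the quickest method on `dataset`, excluding `reference`."""
    i = DATASETS.index(dataset)
    others = {m: t[i] for m, t in TIMES.items() if m != reference}
    return min(others, key=others.get)


if __name__ == "__main__":
    for dataset in DATASETS:
        rival = fastest_baseline(dataset)
        ratio = speedup(rival, dataset)
        print(f"{dataset}: fastest baseline {rival}, {ratio:.2f}x slower than CFIMRFusion")
```

Comparing against the fastest baseline on each dataset gives the most conservative estimate of the speed advantage.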

Rui YAO, Kai WANG, Haofan GUO, Wentao HU, Xiangrui TIAN. Infrared and visible image fusion based on cross-modal feature interaction and multi-scale reconstruction[J]. Infrared and Laser Engineering, 2025, 54(8): 20250210

Paper Information

Category: Optical imaging, display and information processing

Received: Apr. 3, 2025

Accepted: --

Published Online: Aug. 29, 2025

The Author Email: Rui YAO (yaorui@nuaa.edu.cn)

DOI:10.3788/IRLA20250210
