Objective
The goal of infrared and visible light image fusion is to integrate image data from disparate sensors into a unified representation that preserves the complementary information and salient features inherent to both modalities. The autoencoder framework offers significant advantages for image fusion: its encoder can efficiently extract image features, and its decoder can precisely reconstruct the image. Moreover, by introducing mechanisms such as attention, the issues of detail loss and information retention can be effectively addressed. However, existing fusion methods still have notable shortcomings: 1) conventional autoencoder frameworks struggle to learn deep features effectively, leading to gradient vanishing during image reconstruction; 2) in low-light conditions, existing algorithms are frequently impeded by the poor quality of visible light images, which significantly degrades the overall fusion quality; 3) existing deep learning-based algorithms are often computationally intensive and thus unsuitable for real-time applications. Therefore, a novel infrared and visible light image fusion algorithm based on cross-modal feature interaction and multi-scale reconstruction, CFIMRFusion, has been proposed.
Methods
The proposed algorithm comprises four key components: a convolutional attention enhancement module, an encoder network, a cross-modal feature interaction fusion module, and a decoder network based on multi-scale reconstruction (Fig.1). The convolutional attention enhancement module extracts features through convolutional operations and improves the contrast and texture visibility of degraded visible light images, thereby restoring their detailed features. The encoder module employs convolutions to extract deep features from both the infrared images and the enhanced visible light images. To fully exploit the multi-modal features of infrared and visible light images, a cross-modal feature interaction fusion module has been developed, which performs complementary fusion of the infrared and visible light features via a channel-spatial attention mechanism. Additionally, considering that an encoder structure with direct connections is prone to feature vanishing during training, a decoder network based on multi-scale reconstruction has been developed, in which the fused features extracted by encoders at different levels are skip-connected to the decoder network.
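As a rough illustration of the channel-spatial attention fusion described above, the following PyTorch sketch shows one plausible way to interact and fuse infrared and visible feature maps. The class name, channel widths, and layer choices are illustrative assumptions, not the configuration used in the paper.

import torch
import torch.nn as nn

class ChannelSpatialAttentionFusion(nn.Module):
    """Illustrative channel-spatial attention fusion of infrared and visible features."""
    def __init__(self, channels, reduction=8):
        super().__init__()
        # Channel attention: squeeze spatial dimensions, then weight each channel.
        self.channel_mlp = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(2 * channels, (2 * channels) // reduction, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d((2 * channels) // reduction, 2 * channels, kernel_size=1),
            nn.Sigmoid(),
        )
        # Spatial attention: weight each location from pooled channel statistics.
        self.spatial_conv = nn.Sequential(
            nn.Conv2d(2, 1, kernel_size=7, padding=3),
            nn.Sigmoid(),
        )
        # Project the interacted features back to the encoder channel width.
        self.project = nn.Conv2d(2 * channels, channels, kernel_size=1)

    def forward(self, feat_ir, feat_vis):
        x = torch.cat([feat_ir, feat_vis], dim=1)        # cross-modal concatenation
        x = x * self.channel_mlp(x)                       # channel attention
        avg_map = torch.mean(x, dim=1, keepdim=True)
        max_map, _ = torch.max(x, dim=1, keepdim=True)
        x = x * self.spatial_conv(torch.cat([avg_map, max_map], dim=1))  # spatial attention
        return self.project(x)                            # fused feature map

In this sketch, channel attention re-weights the concatenated infrared and visible channels, while spatial attention emphasizes salient regions, so that complementary information from both modalities is preserved in the fused feature map.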
Results and Discussions
In the objective comparison of the proposed algorithm with GANMcC, SwinFuse, U2Fusion, LapH, MUFusion, CMRFusion, and TUFusion, the proposed algorithm obtained six optimal values and one suboptimal value on the TNO dataset (Tab.2) and the LLVIP dataset (Tab.3), and three optimal values and four suboptimal values on the MSRS dataset (Tab.4). The subjective evaluations on the TNO, LLVIP, and MSRS datasets are shown in Fig.3-Fig.5, respectively. The fused images exhibit superior detail features and overall visual effects compared with the benchmark algorithms. Moreover, the average fusion time is only 24.1%, 23.86%, and 25.2% of that of the fastest comparison algorithm on the three datasets, respectively.
Conclusions
A novel infrared and visible light image fusion algorithm based on cross-modal feature interaction and multi-scale reconstruction, CFIMRFusion, has been developed to enhance information acquisition in low-light conditions and produce fused images with clear details and improved visual effects. Initially, the degraded visible light image is fed into a convolutional attention enhancement module to obtain enhanced features. Subsequently, the enhanced visible light image and the infrared image are fed into an autoencoder network to extract multi-scale features, which are then complementarily fused via a cross-modal feature interaction fusion module. Finally, a decoder network based on multi-scale reconstruction is employed to generate the fused image with rich details and clear structures. Experimental results indicate that, compared with the best-performing comparison algorithm, the fused images of CFIMRFusion achieved increases of 15.8% and 18.2% in average gradient (AG) and edge intensity (EI), respectively, on the TNO dataset; increases of 11.5% and 9.5% in mutual information and standard deviation (SD), respectively, on the LLVIP dataset; and a 10.1% increase in edge intensity on the MSRS dataset. Moreover, ablation studies further demonstrate the effectiveness of each module of the algorithm: the complete model outperforms partial module combinations in terms of the AG, EI, SD, SF, and VIF metrics. Regarding computational efficiency, the proposed CFIMRFusion algorithm achieves the shortest average runtime and is thus capable of satisfying the real-time requirements of low-light applications with strict temporal constraints.
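To make the multi-scale reconstruction step above concrete, the sketch below outlines one plausible skip-connected decoder in PyTorch; the class name, channel widths (256/128/64), and up-sampling choices are assumptions for illustration rather than the paper's actual configuration.

import torch
import torch.nn as nn

class MultiScaleDecoder(nn.Module):
    """Illustrative decoder that reconstructs the fused image from three encoder levels."""
    def __init__(self, channels=(256, 128, 64)):
        super().__init__()
        c3, c2, c1 = channels
        self.up3 = nn.Sequential(
            nn.Upsample(scale_factor=2, mode='bilinear', align_corners=False),
            nn.Conv2d(c3, c2, 3, padding=1), nn.ReLU(inplace=True))
        self.up2 = nn.Sequential(
            nn.Upsample(scale_factor=2, mode='bilinear', align_corners=False),
            nn.Conv2d(2 * c2, c1, 3, padding=1), nn.ReLU(inplace=True))
        self.out = nn.Conv2d(2 * c1, 1, 3, padding=1)   # single-channel fused image

    def forward(self, fused_l1, fused_l2, fused_l3):
        # fused_l1..fused_l3: fused features from shallow to deep encoder levels.
        x = self.up3(fused_l3)
        x = self.up2(torch.cat([x, fused_l2], dim=1))    # skip connection, level 2
        x = self.out(torch.cat([x, fused_l1], dim=1))    # skip connection, level 1
        return torch.sigmoid(x)

Concatenating each decoder stage with the fused features of the matching encoder scale, rather than decoding from the deepest features alone, is what keeps shallow detail features from vanishing during reconstruction.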