Laser & Optoelectronics Progress, Vol. 62, Issue 14, 1439003 (2025)
Infrared-Visible Image Fusion Network Based on Dual-Branch Feature Decomposition
Multimodal image fusion integrates information from different sensors to obtain complementary modal features, and infrared-visible image fusion is a popular topic among multimodal tasks. However, existing methods struggle to effectively integrate these distinct modal features and to generate comprehensive feature representations. To address this issue, we propose a dual-branch feature-decomposition network (DBDFuse). A dual-branch feature extraction structure is introduced, in which an Outlook Attention Transformer (OAT) block extracts high-frequency local features, while newly designed fold-and-unfold modules in the Stoken Transformer (ST) efficiently capture low-frequency global dependencies. The ST decomposes the original global attention into the product of a sparse correlation map and a low-dimensional attention, thereby capturing low-frequency global features at reduced cost. Experimental results demonstrate that DBDFuse outperforms state-of-the-art (SOTA) methods for infrared-visible image fusion: the fused images exhibit higher visual clarity and better detail retention while strengthening the complementarity between modalities. The fused images also improve downstream performance, achieving a mean average precision (mAP) of 80.98% on the M3FD object detection task and a mean intersection over union (mIoU) of 63.9% on the LLVIP semantic segmentation task.
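The dual-branch decomposition described above can be illustrated with a minimal PyTorch sketch. This is an assumption-laden illustration, not the authors' released DBDFuse implementation: the class names (`LocalHighFreqBranch`, `SuperTokenLowFreqBranch`, `DualBranchBlock`), the outlook-style unfold aggregation, the super-token grid size, and the 1×1 fusion convolution are all hypothetical stand-ins for the paper's OAT and ST blocks. The high-frequency branch weights a k×k neighbourhood per pixel via unfold; the low-frequency branch pools to a small token grid (playing the role of the sparse correlation map), runs full attention only among those few tokens (the low-dimensional attention), and broadcasts the result back.

```python
# Minimal sketch of the dual-branch idea, assuming standard PyTorch components.
# Not the authors' code; names and hyperparameters are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F


class LocalHighFreqBranch(nn.Module):
    """High-frequency branch: outlook-style local aggregation.

    Approximates local window attention by unfolding k x k neighbourhoods
    and applying a learned per-pixel weighting over the window positions.
    """
    def __init__(self, dim, k=3):
        super().__init__()
        self.k = k
        self.attn = nn.Conv2d(dim, k * k, kernel_size=1)  # per-pixel window weights
        self.proj = nn.Conv2d(dim, dim, kernel_size=1)

    def forward(self, x):                                  # x: (B, C, H, W)
        B, C, H, W = x.shape
        pad = self.k // 2
        # Gather k*k neighbours for every pixel: (B, C, k*k, H*W)
        nbrs = F.unfold(x, self.k, padding=pad).view(B, C, self.k * self.k, H * W)
        w = self.attn(x).view(B, 1, self.k * self.k, H * W).softmax(dim=2)
        out = (nbrs * w).sum(dim=2).view(B, C, H, W)       # weighted local average
        return self.proj(out)


class SuperTokenLowFreqBranch(nn.Module):
    """Low-frequency branch: attention over a coarse 'super token' grid.

    Pooling to an s x s grid stands in for the sparse correlation map; full
    attention is computed only among the few pooled tokens (low-dimensional
    attention) before broadcasting back to the full resolution.
    """
    def __init__(self, dim, grid=8, heads=4):
        super().__init__()
        self.grid = grid
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x):                                  # x: (B, C, H, W)
        B, C, H, W = x.shape
        t = F.adaptive_avg_pool2d(x, self.grid)            # (B, C, s, s)
        t = t.flatten(2).transpose(1, 2)                   # (B, s*s, C)
        t, _ = self.attn(t, t, t)                          # cheap global mixing
        t = t.transpose(1, 2).view(B, C, self.grid, self.grid)
        return F.interpolate(t, size=(H, W), mode="bilinear", align_corners=False)


class DualBranchBlock(nn.Module):
    """Fuse high-frequency local detail with low-frequency global context."""
    def __init__(self, dim):
        super().__init__()
        self.high = LocalHighFreqBranch(dim)
        self.low = SuperTokenLowFreqBranch(dim)
        self.fuse = nn.Conv2d(2 * dim, dim, kernel_size=1)

    def forward(self, x):
        return self.fuse(torch.cat([self.high(x), self.low(x)], dim=1)) + x


if __name__ == "__main__":
    feats = torch.randn(1, 32, 64, 64)          # e.g. shallow features of one modality
    print(DualBranchBlock(32)(feats).shape)     # torch.Size([1, 32, 64, 64])
```

The design choice mirrored here is the cost argument: full attention over an H×W feature map costs O((HW)²), whereas attending among s² pooled tokens costs O(s⁴) with s ≪ H, W, leaving the unfold-based branch to recover the high-frequency detail that pooling discards.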
Xundong Gao, Hui Chen, Yaning Yao, Chengcheng Zhang. Infrared-Visible Image Fusion Network Based on Dual-Branch Feature Decomposition[J]. Laser & Optoelectronics Progress, 2025, 62(14): 1439003
Category: AI for Optics
Received: Dec. 23, 2024
Accepted: Mar. 2, 2025
Published Online: Jul. 16, 2025
The Author Email: Xundong Gao (xundonggao@guet.edu.cn), Hui Chen (Chenhui02@guet.edu.cn)
CSTR:32186.14.LOP242481