Acta Optica Sinica, Vol. 45, Issue 8, 0810001 (2025)
Dual‑Branch Multimodal Medical Image Fusion Based on Local and Global Information Collaboration
Medical image fusion is a crucial technology for assisting doctors in making accurate diagnoses. However, existing medical image fusion techniques suffer from blurred lesion boundaries, loss of detailed information, and high texture similarity between normal tissues and lesion regions. To address these problems, we propose a dual-branch multimodal medical image fusion method based on the collaboration of local and global information. The method not only reduces the loss of detailed information but also improves the clarity and accuracy of the fused images, enabling more precise identification of lesion regions and thereby providing more reliable support for medical image diagnosis.
We propose a dual-branch multimodal medical image fusion model based on the collaboration of local and global information. First, the model uses a multi-scale depthwise-separable convolutional network to extract feature information with different receptive fields from the input images. The extracted features are then fed into a dual-branch structure consisting of two modules: a deep local feature enhancement module and a global information extraction module. The local feature enhancement module focuses on enhancing image details, especially those in lesion areas, to improve the clarity of these regions. The global information extraction module, in contrast, captures the global structural context of the input images, ensuring overall consistency and the integrity of tissue structures during fusion. To further optimize the feature fusion process, we introduce two fusion units: the Multidimensional Joint Local Fusion Unit (MLFU) and the Cross-Global Fusion Unit (CGFU). The MLFU efficiently fuses the local features extracted by the two branches, ensuring that important fine-grained information is retained and enhanced during fusion. The CGFU fuses the global features, facilitating information sharing and complementarity between modalities. Finally, a convolutional layer adjusts and reconstructs the fused features to output a fused image with richer details and higher clarity.
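The abstract does not specify the exact layer configuration, but the overall data flow can be illustrated with a minimal PyTorch-style sketch. The channel widths, kernel sizes, and the internals of the MLFU and CGFU below are placeholder assumptions for illustration only, not the configuration reported in the paper:

# Minimal sketch of the dual-branch fusion pipeline described above.
# Channel widths, kernel sizes, and the MLFU/CGFU internals are assumptions
# made for illustration; they are not the paper's actual configuration.
import torch
import torch.nn as nn


class DepthwiseSeparableConv(nn.Module):
    """Depthwise convolution followed by a pointwise (1x1) convolution."""
    def __init__(self, channels, kernel_size):
        super().__init__()
        self.depthwise = nn.Conv2d(channels, channels, kernel_size,
                                   padding=kernel_size // 2, groups=channels)
        self.pointwise = nn.Conv2d(channels, channels, 1)

    def forward(self, x):
        return self.pointwise(self.depthwise(x))


class DualBranchFusion(nn.Module):
    """Multi-scale extraction -> local/global branches -> fusion -> reconstruction."""
    def __init__(self, channels=32):
        super().__init__()
        self.embed = nn.Conv2d(1, channels, 3, padding=1)
        # Multi-scale depthwise-separable convolutions with different receptive fields.
        self.scales = nn.ModuleList(
            [DepthwiseSeparableConv(channels, k) for k in (3, 5, 7)]
        )
        # Local branch: stand-in for the deep local feature enhancement module.
        self.local_branch = nn.Sequential(
            nn.Conv2d(3 * channels, channels, 3, padding=1), nn.ReLU(),
            nn.Conv2d(channels, channels, 3, padding=1),
        )
        # Global branch: stand-in for the global information extraction module.
        self.global_branch = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(3 * channels, channels, 1), nn.ReLU(),
        )
        # Placeholder fusion units (MLFU / CGFU internals are not given in the
        # abstract): features of both modalities are concatenated and mixed.
        self.local_fuse = nn.Conv2d(2 * channels, channels, 3, padding=1)
        self.global_fuse = nn.Conv2d(2 * channels, channels, 1)
        # Reconstruction layer producing the single-channel fused image.
        self.reconstruct = nn.Conv2d(2 * channels, 1, 3, padding=1)

    def extract(self, x):
        f = self.embed(x)
        ms = torch.cat([s(f) for s in self.scales], dim=1)  # multi-scale features
        return self.local_branch(ms), self.global_branch(ms)

    def forward(self, img_a, img_b):
        la, ga = self.extract(img_a)
        lb, gb = self.extract(img_b)
        local = self.local_fuse(torch.cat([la, lb], dim=1))
        glob = self.global_fuse(torch.cat([ga, gb], dim=1))
        glob = glob.expand_as(local)  # broadcast global context over spatial dims
        return self.reconstruct(torch.cat([local, glob], dim=1))


if __name__ == "__main__":
    model = DualBranchFusion()
    mri, pet = torch.rand(1, 1, 256, 256), torch.rand(1, 1, 256, 256)
    print(model(mri, pet).shape)  # torch.Size([1, 1, 256, 256])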
The effectiveness of the proposed model in medical image fusion tasks is validated through extensive comparison and ablation experiments. The experimental results demonstrate that our model significantly outperforms nine other mainstream methods in several key objective evaluation metrics on the Harvard public medical image dataset. Specifically, our model achieves improvements of 3.14%, 0.95%, 13.66%, 16.81%, and 1.12% in EN, SD, SF, AG, and CC, respectively (Table 4). The model enhances local feature extraction through the deep local feature enhancement module, which accurately captures subtle differences in the input images and significantly improves the clarity of lesion boundaries. To optimize the fusion results, the model applies different fusion strategies to different feature types, effectively integrating local features with global information and achieving more efficient information complementarity and collaboration between modalities. As a result, the fused images exhibit richer texture details and clearer structural features, significantly enhancing image readability and diagnostic value (Figs. 6, 7, and 8). Ablation experiments further validate the effectiveness of each module. The results show that removing the deep local feature enhancement module leads to a noticeable decline in lesion boundary clarity and texture detail, particularly in high-contrast lesion areas, where fusion quality deteriorates. In addition, removing the global information fusion module causes a significant loss of global consistency and information complementarity between modalities, leaving the fused result without the necessary global coherence (Figs. 11, 12, and 13).
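For reference, the objective metrics cited above can be computed with the standard formulations used in the image fusion literature; the sketch below shows EN, SD, SF, and AG (CC is omitted because it additionally requires the source images), and the paper's exact implementations may differ:

# Standard definitions of EN (entropy), SD (standard deviation), SF (spatial
# frequency), and AG (average gradient) as commonly used in fusion studies;
# these are illustrative and may differ from the paper's exact formulations.
import numpy as np


def entropy(img):
    """Shannon entropy of the 8-bit grayscale histogram."""
    hist, _ = np.histogram(img, bins=256, range=(0, 255))
    p = hist / hist.sum()
    p = p[p > 0]
    return -np.sum(p * np.log2(p))


def spatial_frequency(img):
    """SF = sqrt(RF^2 + CF^2) from row/column first differences."""
    rf = np.sqrt(np.mean(np.diff(img, axis=1) ** 2))
    cf = np.sqrt(np.mean(np.diff(img, axis=0) ** 2))
    return np.sqrt(rf ** 2 + cf ** 2)


def average_gradient(img):
    """AG: mean magnitude of the local intensity gradient."""
    gx = np.diff(img, axis=1)[:-1, :]
    gy = np.diff(img, axis=0)[:, :-1]
    return np.mean(np.sqrt((gx ** 2 + gy ** 2) / 2))


fused = np.random.rand(256, 256) * 255  # stand-in for a fused image
print(f"EN={entropy(fused):.3f}, SD={fused.std():.3f}, "
      f"SF={spatial_frequency(fused):.3f}, AG={average_gradient(fused):.3f}")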
The proposed algorithm effectively integrates local and global information, achieving excellent detail preservation and structural representation in medical image fusion tasks. It not only accurately fuses normal tissue structures to ensure global consistency but also highlights the details of abnormal lesion areas, improving the visibility and recognizability of lesions. By combining deep local feature extraction with global context information, the algorithm preserves local details while enhancing the texture features and boundary clarity of lesions, so that the fused images exhibit richer texture details and clearer structural features. Extensive comparative experiments demonstrate that the algorithm effectively improves the diagnostic accuracy of medical images. Compared with other mainstream methods, it performs outstandingly in multiple key objective evaluation metrics, especially in detail preservation, structural clarity, and lesion prominence.
Yu Shen, Jiaying Liu, Jiarong Yan, Ruoxuan Wang, Yukun Ma, Jiangcheng Li, Shan Bai, Ziyi Wei, Yangyang Li, Zhenkai Qiang. Dual‑Branch Multimodal Medical Image Fusion Based on Local and Global Information Collaboration[J]. Acta Optica Sinica, 2025, 45(8): 0810001
Category: Image Processing
Received: Jan. 6, 2025
Accepted: Feb. 18, 2025
Published Online: Apr. 27, 2025
Author Email: Liu Jiaying (1640264144@qq.com)
CSTR:32393.14.AOS250443