Laser & Infrared, Vol. 54, Issue 3, 457 (2024)
Infrared and visible image fusion based on transformer and spatial attention model
Convolutional neural networks (CNNs) have achieved good results in infrared and visible image fusion. Many of these methods are built on autoencoder architectures that are trained in a self-supervised manner and rely on hand-designed fusion strategies to fuse features in the testing phase. However, existing autoencoder-based methods rarely make full use of both shallow and deep features, and CNNs are limited by their receptive field, which makes it difficult to establish long-range dependencies and leads to a loss of global information. In contrast, the Transformer uses a self-attention mechanism to establish long-range dependencies and effectively capture global contextual information. As for fusion strategies, most are designed coarsely and do not specifically consider the characteristics of the different image modalities. Therefore, this paper combines a CNN and a Transformer in the encoder so that it can extract more comprehensive features, and applies an attention model to the fusion strategy to refine the features. Experimental results show that the proposed fusion algorithm achieves excellent results in both subjective and objective evaluations compared with other image fusion algorithms.
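The abstract does not give the details of the attention-based fusion strategy, so the following is only a minimal sketch of one common form of spatial attention fusion, not the authors' model. It assumes that each encoder outputs a feature map of shape (C, H, W), that the per-pixel activity of a feature map is measured by its L1 norm across channels, and that the two activity maps are normalized into fusion weights. The function name `spatial_attention_fuse` and the parameter `eps` are hypothetical.

```python
import numpy as np


def spatial_attention_fuse(feat_ir, feat_vis, eps=1e-8):
    """Fuse two (C, H, W) feature maps using per-pixel attention weights.

    Illustrative assumption: activity is the channel-wise L1 norm; this is
    not taken from the paper.
    """
    if feat_ir.shape != feat_vis.shape:
        raise ValueError("feature maps must have the same shape")

    # Activity map: L1 norm across channels at each position -> (H, W)
    act_ir = np.abs(feat_ir).sum(axis=0)
    act_vis = np.abs(feat_vis).sum(axis=0)

    # Normalize the two activity maps into weights that sum to
    # (almost exactly) 1 at each pixel; eps avoids division by zero.
    total = act_ir + act_vis + eps
    w_ir = act_ir / total
    w_vis = act_vis / total

    # Broadcast the (H, W) weights over the channel axis and blend.
    return w_ir[None] * feat_ir + w_vis[None] * feat_vis


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    ir = rng.standard_normal((4, 8, 8))   # stand-in for infrared features
    vis = rng.standard_normal((4, 8, 8))  # stand-in for visible features
    fused = spatial_attention_fuse(ir, vis)
    print(fused.shape)
```

Under this scheme, a position where the infrared features are strongly activated (for example, a thermal target) draws most of its fused value from the infrared branch, while textured regions with strong visible-image activations lean toward the visible branch.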
GENG Jun, WU Zi-hao, LI Wen-hai, LI Xiao-yu. Infrared and visible image fusion based on transformer and spatial attention model[J]. Laser & Infrared, 2024, 54(3): 457
Received: Mar. 20, 2023
Accepted: Jun. 4, 2025
Published Online: Jun. 4, 2025
The Author Email: WU Zi-hao (761545864@qq.com)