Multi-scale Transformer Fusion Method for Infrared and Visible Images

Yanlin CHEN; Zhishe WANG; Wenyu SHAO; Fan YANG; Jing SUN

Infrared Technology, Volume. 45, Issue 3, 266(2023)

Yanlin CHEN, Zhishe WANG^*, Wenyu SHAO, Fan YANG, and Jing SUN

Author Affiliations

[in Chinese]

show less

Abstract Get PDF(in Chinese)

Mainstream fusion methods based on deep learning employ a convolutional operation to extract local image features; however, the interaction between an image and convolution kernel is content-independent, and the long-range dependency cannot be well modeled. Consequently, the loss of important contextual information may be unavoidable and further limit the fusion performance of infrared and visible images. To this end, we present a simple and effective fusion network for infrared and visible images, namely, the multiscale transformer fusion method (MsTFusion). We first designed a novel Conv Swin Transformer block to model long-range dependency. A convolutional layer was used to improve the representative ability of the global features. Subsequently, we constructed a multiscale self-attentional encoding-decoding network to extract and reconstruct global features without the help of local features. Moreover, we designed a learnable fusion layer for feature sequences that employed softmax operations to calculate the attention weight of the feature sequences and highlight the salient features of the source image. The proposed method is an end-to-end model that uses a fully attentional model to interact with image content and attention weights. We conducted a series of experiments on TNO and road scene datasets, and the experimental results demonstrated that the proposed MsTFusion transcended other methods in terms of subjective visual observations and objective indicator comparisons. By integrating the self-attention mechanism, our method built a fully attentional fusion model for infrared and visible image fusion and modeled the long-range dependency for global feature extraction and reconstruction to overcome the limitations of deep learning-based models. Compared with other state-of-the-art traditional and deep learning methods, MsTFusion achieved remarkable fusion performance with strong generalization ability and competitive computational efficiency.

Keywords

image fusion infrared image multi-scale self-attentional mechanism Swin Transformer

Tools

Get Citation

Copy Citation Text

CHEN Yanlin, WANG Zhishe, SHAO Wenyu, YANG Fan, SUN Jing. Multi-scale Transformer Fusion Method for Infrared and Visible Images[J]. Infrared Technology, 2023, 45(3): 266

Download Citation

EndNote(RIS)BibTex Plain Text

Set citation alerts for article

Save article for my favorites

Paper Information

Category:

Received: Aug. 23, 2022

Accepted: --

Published Online: Apr. 7, 2023

The Author Email: Zhishe WANG (wangzs@tyust.edu.cn)

DOI:

Topics

laser devices and laser physics

Lasers and Laser Optics

Laser physics

laser manufacturing

Instrumentation, Measurement and Metrology