Acta Photonica Sinica, Vol. 52, Issue 11, 1110001 (2023)

Remote Sensing Image Fusion Method Based on Improved Swin Transformer

Zitong LI, Jiankang ZHAO*, Jingran XU, Haihui LONG, and Chuanqi LIU
Author Affiliations
  • School of Electronic Information and Electrical Engineering, School of Perceptual Science and Engineering, Shanghai Jiao Tong University, Shanghai 200240, China

    Remote sensing images are widely used in land monitoring, environmental perception, disaster prediction, and urban analysis. Most commercial satellites, such as WorldView-4, QuickBird, and WorldView-2, carry sensors that acquire panchromatic and multispectral images at the same time. Panchromatic images have high spatial resolution but only a single band, whereas multispectral images have low spatial resolution because of the bandwidth limitation of the equipment. To obtain more accurate details of the observed object, the panchromatic and multispectral images can be fused into an image with both high spatial resolution and high spectral resolution. Methods for fusing multispectral and panchromatic images fall into four categories: multi-resolution analysis, component substitution, variational optimization, and deep learning. Deep learning has stronger feature extraction ability than the traditional methods and is therefore widely used, and recent remote sensing image fusion methods have begun to adopt the transformer architecture. Because existing transformer-based methods fail to fully integrate the multi-scale features of remote sensing images, this paper proposes MSCANet, a multispectral-panchromatic fusion network based on an improved Swin transformer. The model uses two-stream branches to extract features from the multispectral and panchromatic images separately. The downsampled feature maps are cascaded and fed into the fusion network. To make feature extraction more robust across complex ground scenes, a Multiscale Swin-transformer with Channel Attention (MSCA) unit is integrated into the fusion part.
The unit replaces the MLP part of the Swin transformer with a cascaded module of multi-scale convolution and channel attention, which better fuses the feature information of ground objects of different sizes in remote sensing images and exploits the long-range dependence between regions. The fusion network focuses on predicting the high-frequency details lost in the multispectral image; these details are then added to the original image to restore a high-resolution multispectral image. Simulated and real-data experiments were conducted on datasets from three commercial satellites. In the simulated-data experiments, the fusion results were evaluated by computing their difference from the reference images. MSCANet outperformed the other methods in both visual quality and quantitative metrics: compared with the second-best method, its ERGAS index on the three datasets was lower by 11.99%, 0.4%, and 3.43%, respectively. In the experiments on the three real datasets, MSCANet also gave the best results when visual quality and quantitative metrics are considered together. Ablation experiments were conducted on the three fusion strategies proposed in this paper. The results show that the detail-injection model used in this paper outperforms the non-injection model, that replacing the MLP module in the MSCA unit and adding the attention mechanism both improve fusion performance, and that adding a spectral loss and a spatial structure loss to the MAE loss improves spectral fidelity and spatial resolution. In conclusion, the comparison and ablation experiments verify the effectiveness of the proposed method.
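The abstract reports ERGAS without defining it. A commonly used definition is ERGAS = 100 · (d_h / d_l) · sqrt((1/K) · Σ_k (RMSE_k / μ_k)²), where d_h / d_l is the ratio of panchromatic to multispectral pixel size, K is the number of bands, and μ_k is the mean of band k of the reference image. The NumPy sketch below assumes this form; the function name `ergas`, the `(bands, H, W)` array layout, and the default `ratio=4` are assumptions made for illustration and are not taken from the paper.

```python
import numpy as np


def ergas(reference, fused, ratio=4):
    """ERGAS between a reference and a fused image, both shaped (bands, H, W).

    `ratio` is the assumed multispectral-to-panchromatic pixel-size ratio,
    so d_h / d_l = 1 / ratio. A perfect match gives 0.
    """
    reference = np.asarray(reference, dtype=np.float64)
    fused = np.asarray(fused, dtype=np.float64)
    if reference.shape != fused.shape:
        raise ValueError("reference and fused must have the same shape")

    # Per-band mean squared error (RMSE_k squared).
    mse = np.mean((reference - fused) ** 2, axis=(1, 2))
    # Per-band mean of the reference image (mu_k).
    mean_ref = np.mean(reference, axis=(1, 2))

    return 100.0 / ratio * np.sqrt(np.mean(mse / mean_ref ** 2))
```

Under this definition lower values indicate a closer match to the reference, so a percentage decrease in ERGAS is an improvement.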
In future work, MSCANet is expected to be extended to the fusion of multispectral and hyperspectral images, of visible and infrared images, and to other similar tasks, in order to improve the generalization of the proposed model.
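The detail-injection idea in the abstract (predict the missing high-frequency details, then add them back to the multispectral image) can be illustrated with a small sketch. This is not the authors' network: `highpass_stand_in` is a hypothetical placeholder that uses the panchromatic image minus its block average in place of the learned fusion network, nearest-neighbour upsampling is an assumption, and the same detail map is shared across all bands for simplicity.

```python
import numpy as np


def upsample_nearest(ms, ratio=4):
    """Nearest-neighbour upsampling of (bands, h, w) to (bands, h*ratio, w*ratio)."""
    return np.repeat(np.repeat(ms, ratio, axis=1), ratio, axis=2)


def highpass_stand_in(pan, ratio=4):
    """Hypothetical stand-in for the fusion network.

    Returns PAN minus its ratio x ratio block average, i.e. a crude
    high-frequency detail map of shape (H, W).
    """
    h, w = pan.shape
    low = pan.reshape(h // ratio, ratio, w // ratio, ratio).mean(axis=(1, 3))
    return pan - upsample_nearest(low[None], ratio)[0]


def fuse_by_injection(ms, pan, ratio=4, predict_details=highpass_stand_in):
    """Residual fusion: upsampled multispectral image plus predicted details."""
    ms_up = upsample_nearest(ms, ratio)      # (bands, H, W)
    details = predict_details(pan, ratio)    # (H, W), shared across bands here
    return ms_up + details[None, :, :]


# Example: a 3-band 2x2 multispectral image and an 8x8 panchromatic image.
ms = np.ones((3, 2, 2))
pan = np.arange(64, dtype=np.float64).reshape(8, 8)
fused = fuse_by_injection(ms, pan)
```

Because the network only has to output a residual, the low-frequency spectral content of the multispectral input passes through unchanged; in this sketch, block-averaging `fused` back to 2x2 recovers `ms`.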

    Zitong LI, Jiankang ZHAO, Jingran XU, Haihui LONG, Chuanqi LIU. Remote Sensing Image Fusion Method Based on Improved Swin Transformer[J]. Acta Photonica Sinica, 2023, 52(11): 1110001

    Paper Information

    Received: May 8, 2023

    Accepted: Jun. 26, 2023

    Published Online: Dec. 22, 2023

    Author Email: ZHAO Jiankang (zhaojiankang@sjtu.edu.cn)

    DOI:10.3788/gzxb20235211.1110001
