Infrared and Laser Engineering, Vol. 53, Issue 11, 20240305 (2024)

Triple multi-modal image fusion algorithm based on mixed difference convolution and efficient vision Transformer network

Kunyu SI and Chunhui NIU
Author Affiliations
  • School of Instrument Science and Photoelectric Engineering, Beijing Information Science & Technology University, Beijing 100192, China
    Figures & Tables(19)
    Multi-stage image fusion process
    Central difference convolution and angular difference convolution
    Horizontal difference convolution and vertical difference convolution
    The overall structure of mixed difference convolution
    Overview of EfficientViT[14]. (a) Architecture of EfficientViT; (b) Sandwich layout block; (c) Cascaded group attention
    Examples of differential images. (a) Visible image; (b) Infrared image; (c) Difference image
    Overall framework of network structure
    The framework of CNN branch
    The framework of Transformer branch
    The framework of improved coordinate attention
    The framework of MCA
    The framework of CMCA attention
    Feature fusion layer based on multi-dimensional coordinate co-attention mechanism
    The framework of decoder
    Network framework of the training stage
    Fusion results of three typical pairs of images
    Fusion results of ablation experiment
    • Table 1. Objective evaluation results of two datasets

      Dataset     Methods       MI↑         VIF↑        SF↑          SD↑          QAB/F
      TNO         CBF           2.1505      0.3869      13.9590      34.4111      0.4164
      TNO         ADF           1.8188      0.5856      9.1992       23.7598      0.4973
      TNO         Nestfuse      3.3970      0.8320      9.3330       39.8297      0.4980
      TNO         Densefuse     2.1647      0.5454      6.0054       22.9860      0.3373
      TNO         Swinfusion    3.3085      0.7240      10.2622      37.7969      0.5098
      TNO         MTDfusion     3.2989      0.8084      9.7339       40.6157      0.4900
      TNO         MFST          2.1994      0.4827      7.3691       38.3156      0.3081
      TNO         Ours          5.5457(1)   0.9889(1)   10.3439(2)   42.7316(1)   0.5374(1)
      RoadScene   CBF           3.6105      0.4711      20.9379      45.4704      0.5064
      RoadScene   ADF           2.6547      0.5593      11.3885      31.8737      0.4709
      RoadScene   Nestfuse      3.9554      0.7931      12.4771      48.5475      0.5558
      RoadScene   Densefuse     2.6512      0.5108      8.4125       28.6338      0.3580
      RoadScene   Swinfusion    3.3085      0.7240      10.2622      37.7969      0.5098
      RoadScene   MTDfusion     3.5576      0.5684      12.8786      43.0012      0.3667
      RoadScene   MFST          2.7160      0.4758      9.8468       41.3027      0.3230
      RoadScene   Ours          5.7924(1)   0.8934(1)   14.4354(2)   53.4160(1)   0.5739(1)
    • Table 2. Objective evaluation results of ablation experiment

      Methods           MI↑         VIF↑        SF↑          SD↑          QAB/F
      Normal_conv       1.8691      0.3612      10.3282      31.1717      0.2320
      Non_transformer   3.8702      0.4053      5.4223       14.2829      0.1858
      Two_input         2.2087      0.6298      8.5797       32.7618      0.4268
      No_denseblock     4.5572      0.5852      6.9222       21.5530      0.3749
      Ours              5.5457(1)   0.9889(1)   10.3439(1)   42.7316(1)   0.5374(1)
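
    The SF (spatial frequency) and SD (standard deviation) columns in Tables 1 and 2 are widely used no-reference fusion metrics, where higher values indicate more gradient detail and more contrast. The sketch below shows one common way to compute them for a single grayscale image. It is an illustration under stated assumptions, not the authors' evaluation code: the function names are hypothetical, the image is a plain list of rows, and SF is normalized by the total pixel count (some implementations divide by the number of differences instead, which changes the values slightly).

```python
import math


def standard_deviation(img):
    """SD: population standard deviation of all pixel intensities."""
    pixels = [p for row in img for p in row]
    mean = sum(pixels) / len(pixels)
    return math.sqrt(sum((p - mean) ** 2 for p in pixels) / len(pixels))


def spatial_frequency(img):
    """SF = sqrt(RF^2 + CF^2).

    RF^2 and CF^2 are the mean squared first differences along rows
    (horizontal) and columns (vertical). Assumption: both are divided
    by the total number of pixels, as in the common textbook form.
    """
    h, w = len(img), len(img[0])
    rf2 = sum((img[i][j] - img[i][j - 1]) ** 2
              for i in range(h) for j in range(1, w)) / (h * w)
    cf2 = sum((img[i][j] - img[i - 1][j]) ** 2
              for i in range(1, h) for j in range(w)) / (h * w)
    return math.sqrt(rf2 + cf2)


# Toy 2x2 image with vertical stripes: all variation is horizontal.
stripes = [[0, 2], [0, 2]]
print(standard_deviation(stripes), spatial_frequency(stripes))
```

    A perfectly flat image gives zero for both metrics. Reported scores are usually averaged over all fused images of a dataset, so absolute values also depend on the intensity range (for example 0 to 255 versus 0 to 1).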

    Kunyu SI, Chunhui NIU. Triple multi-modal image fusion algorithm based on mixed difference convolution and efficient vision Transformer network[J]. Infrared and Laser Engineering, 2024, 53(11): 20240305

    Paper Information

    Category: Image processing

    Received: Jul. 8, 2024

    Accepted: --

    Published Online: Dec. 13, 2024

    DOI:10.3788/IRLA20240305