Opto-Electronic Engineering, Volume 51, Issue 12, 240237-1 (2024)

Multi-scale feature enhanced Transformer network for efficient semantic segmentation

Yan Zhang, Chunming Ma, Shudong Liu, and Yemei Sun
Author Affiliations
  • College of Computer and Information Engineering, Tianjin Chengjian University, Tianjin 300380, China
    Figures & Tables (13)
    An efficient Transformer-based semantic segmentation network enhanced by multi-scale features. (a) Multi-scale pooling self-attention module; (b) Cross-spatial feed-forward network module
    Multi-scale pooling operation
    Cross-spatial attention
    Visualization of segmentation results of different algorithms on the ADE20K dataset
    • Table 1. Detailed settings of the MFE-Former. Output size is the resolution of the output at each stage, and operation lists the operations used in the i-th stage. The parameter settings give the channel reduction ratio r and the pooling factors p_l used by each MFE-Transformer block


      Stage | Output size | Operation | Parameter settings
      1 | 128×128×48 | Patch Embed, MFE-Transformer block ×2 | r=1, {p1=8, p2=16, p3=24, p4=32, p5=40}
      2 | 64×64×96 | Patch Embed, MFE-Transformer block ×2 | r=2, {p1=4, p2=8, p3=12, p4=16, p5=20}
      3 | 32×32×260 | Patch Embed, MFE-Transformer block ×6 | r=4, {p1=2, p2=4, p3=6, p4=8, p5=10}
      4 | 16×16×384 | Patch Embed, MFE-Transformer block ×3 | r=8, {p1=1, p2=2, p3=3, p4=4, p5=5}
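The pooling factors in Table 1 parameterize the multi-scale pooling that shortens the key/value sequence seen by the attention module. A minimal NumPy sketch of this idea (the helper name is hypothetical, and for simplicity it assumes average pooling with kernel = stride = p and factors that divide the spatial size):

```python
import numpy as np

def multi_scale_avg_pool_tokens(x, factors):
    """Average-pool x of shape (H, W, C) at each factor p (kernel = stride = p),
    flatten every pooled map into tokens, and concatenate the results."""
    H, W, C = x.shape
    tokens = []
    for p in factors:
        h, w = H // p, W // p
        # reshape into (h, p, w, p, C) blocks and average over each p×p block
        pooled = x[:h * p, :w * p].reshape(h, p, w, p, C).mean(axis=(1, 3))
        tokens.append(pooled.reshape(h * w, C))
    return np.concatenate(tokens, axis=0)

# Stage-3 resolution from Table 1, with a divisible subset of its factors:
x = np.random.rand(32, 32, 64)
kv = multi_scale_avg_pool_tokens(x, [2, 4, 8])   # (16*16 + 8*8 + 4*4, 64) = (336, 64)
```

Attending over the shortened key/value sequence (336 tokens here instead of 32×32 = 1024) is what reduces the attention cost relative to full self-attention.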
    • Table 2. Model evaluation results of different segmentation models on the ADE20K dataset


      Method | #Param/M | FLOPs/G | mIoU/%
      FCN[19] | 9.8 | 39.6 | 19.7
      ResNet18[11] | 15.5 | 32.2 | 32.9
      PSPNet[9] | 13.7 | 52.2 | 29.6
      DeepLabV3+[6] | 15.4 | 25.9 | 38.1
      ViT[13] | 10.2 | 24.6 | 37.4
      PVT-Tiny[17] | 17.0 | 33.2 | 35.7
      PoolFormer-S12[42] | 15.7 | — | 37.2
      Conv-PVT-Tiny[43] | 16.4 | — | 37.2
      EdgeViT-XS[44] | 10.3 | 27.7 | 41.4
      Swin-Tiny[14] | 31.9 | 46.0 | 41.5
      PVT-Large[17] | 65.1 | 79.6 | 42.1
      Segformer-B1[29] | 13.7 | 15.9 | 42.2
      PoolFormer-M48[42] | 77.1 | — | 42.7
      Twins-SVT-S[45] | 28.3 | 37.0 | 43.2
      Xcit-T12/16[46] | 33.7 | — | 43.5
      TBFormer-T[47] | 20.5 | 24.3 | 42.8
      SCTNet-B[48] | 17.4 | — | 43.0
      MFE-Former (Ours) | 15.9 | 31.1 | 44.1
    • Table 3. The model evaluation results on the Cityscapes dataset (the FLOPs test was performed at a resolution of 1024×2048)


      Method | #Param/M | FLOPs/G | mIoU/%
      FCN[19] | 9.8 | 317 | 61.5
      PSPNet[9] | 13.7 | 423 | 70.2
      DeepLabV3+[6] | 15.4 | 555 | 75.2
      SwiftNetRN[49] | 11.8 | 104 | 75.5
      EncNet[50] | 55.1 | 1748 | 76.9
      PVT-Tiny[17] | 17.0 | — | 71.7
      MLT[51] | 20.1 | — | 77.4
      RTFormer-Base[52] | 16.8 | — | 79.3
      SFNet(ResNet-18)[53] | 12.87 | 247.0 | 78.9
      DDRNet-39[54] | 32.3 | 281.2 | 80.4
      PIDNet-L[55] | 36.9 | 275.8 | 80.6
      MFE-Former (Ours) | 15.9 | 238.6 | 80.6
    • Table 4. Comparison with state-of-the-art models on the COCO-stuff dataset (tested at an input resolution of 512×512)


      Method | #Param/M | FLOPs/G | mIoU/%
      PSPNet[9] | 13.7 | 52.9 | 30.1
      DeepLabV3+[6] | 15.4 | 25.9 | 29.9
      LR-ASPP[56] | 2.3 | 7 | 25.2
      MaskFormer[57] | 41 | 53 | 37.1
      TBFormer-T[47] | 20.5 | 37.5 | 37.9
      SCTNet-B[48] | 17.4 | — | 35.9
      MFE-Former (Ours) | 15.9 | 31.1 | 38.0
    • Table 5. The segmentation performance metrics of 3 models on the ADE20K, Cityscapes, and COCO-stuff datasets


      Method | ADE20K mIoU/aAcc/mAcc/mDice (%) | Cityscapes mIoU/aAcc/mAcc/mDice (%) | COCO-stuff mIoU/aAcc/mAcc/mDice (%)
      Segformer-B1 | 42.1 / 79.6 / 52.7 / 56.3 | 78.5 / 95.8 / 83.4 / 85.4 | —
      SCTNet-B | 43.0 / 80.2 / 56.1 / 57.2 | 80.1 / 96.4 / 86.6 / 88.3 | 35.9 / 67.1 / 48.5 / 47.9
      MFE-Former (Ours) | 44.1 / 81.0 / 56.1 / 57.4 | 80.6 / 96.5 / 87.7 / 87.5 | 38.0 / 69.1 / 49.1 / 48.1
    • Table 6. Ablation results for different pooling methods


      Pooling method | FLOPs/G | mIoU/%
      Max pooling | 30.7 | 40.8
      Multi-scale max pooling | 31.1 | 43.1
      Average pooling | 30.7 | 42.5
      Multi-scale average pooling | 31.1 | 44.1
    • Table 7. Ablation results for the multi-scale pooling factor settings on the ADE20K dataset


      Pooling factor p_l | R | FLOPs/G | mIoU/%
      4 | 16 | 31.5 | 42.3
      8 | 64 | 30.9 | 42.5
      16 | 256 | 30.6 | 41.9
      24 | 576 | 30.5 | 41.3
      32 | 1024 | 30.5 | 40.9
      40 | 1600 | 30.5 | 40.4
      {8, 16, 24} | 47 | 31.1 | 43.7
      {8, 16, 24, 32, 40} | 43 | 31.1 | 44.1
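In the single-factor rows of Table 7, R equals p_l², i.e. the ratio between the H×W input tokens and the (H/p_l)(W/p_l) pooled tokens. The multi-scale rows are consistent with reading R as the overall token-reduction ratio HW / Σ_l(HW/p_l²) — an inferred interpretation, not stated in the caption — which can be checked at the 128×128 stage-1 resolution:

```python
def reduction_ratio(factors, hw=128 * 128):
    """Ratio of input tokens to the total number of pooled key/value tokens."""
    pooled = sum(hw / (p * p) for p in factors)
    return hw / pooled

r_single = reduction_ratio([4])                   # 16.0, i.e. R = p^2
r_three = reduction_ratio([8, 16, 24])            # ~47.0
r_five = reduction_ratio([8, 16, 24, 32, 40])     # ~43.7
```

Truncating these values reproduces the tabulated R of 16, 47, and 43.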
    • Table 8. Ablation results for different components of cross-spatial attention on the ADE20K dataset


      Setting | #Param/M | FLOPs/G | mIoU/%
      Without cross-spatial attention | 15.8 | 30.3 | 43.1
      With channel-interaction branch only | 15.8 | 30.3 | 43.3
      With cross-spatial attention | 15.9 | 31.1 | 44.1
    • Table 9. Ablation study on the hyperparameter N in the CSA module on the ADE20K dataset


      N | #Param/M | FLOPs/G | mIoU/%
      4 | 16.3 | 31.8 | 43.9
      8 | 15.9 | 31.1 | 44.1
      16 | 15.8 | 30.8 | 43.3
    Paper Information

    Category: Article

    Received: Oct. 10, 2024

    Accepted: Nov. 19, 2024

    Published Online: Feb. 21, 2025

    DOI:10.12086/oee.2024.240237
