Opto-Electronic Engineering, Vol. 51, Issue 12, 240237-1 (2024)

Multi-scale feature enhanced Transformer network for efficient semantic segmentation

Yan Zhang, Chunming Ma, Shudong Liu, and Yemei Sun
Author Affiliations
  • College of Computer and Information Engineering, Tianjin Chengjian University, Tianjin 300380, China

    Yan Zhang, Chunming Ma, Shudong Liu, Yemei Sun. Multi-scale feature enhanced Transformer network for efficient semantic segmentation[J]. Opto-Electronic Engineering, 2024, 51(12): 240237-1

    Paper Information

    Category: Article

    Received: Oct. 10, 2024

    Accepted: Nov. 19, 2024

    Published Online: Feb. 21, 2025

    DOI:10.12086/oee.2024.240237
