Opto-Electronic Engineering, Volume 51, Issue 12, 240237-1 (2024)
Multi-scale feature enhanced Transformer network for efficient semantic segmentation
Yan Zhang, Chunming Ma, Shudong Liu, Yemei Sun. Multi-scale feature enhanced Transformer network for efficient semantic segmentation[J]. Opto-Electronic Engineering, 2024, 51(12): 240237-1
Category: Article
Received: Oct. 10, 2024
Accepted: Nov. 19, 2024
Published Online: Feb. 21, 2025