Research on urban street view semantic segmentation method based on Transformer architecture

XIONG Wei; ZHAO Di; SUN Peng; LIU Yue

doi:10.16136/j.joel.2024.12.0229

Journal of Optoelectronics · Laser, Vol. 35, Issue 12, 1240 (2024)

XIONG Wei¹,², ZHAO Di¹, SUN Peng¹, and LIU Yue¹

Author Affiliations

¹School of Electrical and Electronic Engineering, Hubei University of Technology, Wuhan, Hubei 430068, China

²Department of Computer Science and Engineering, University of South Carolina, Columbia, SC 29201, USA

References (20)

[1] VASWANI A, SHAZEER N, PARMAR N, et al. Attention is all you need[C]//Advances in Neural Information Processing Systems 30, December 4-9, 2017, Long Beach, California, USA. La Jolla: NIPS, 2017: 5998-6008.

[2] BITTER C, ELIZONDO D A, YANG Y J. Natural language processing: a prolog perspective[J]. Artificial Intelligence Review, 2010, 33(1-2): 151-173.

[3] DOSOVITSKIY A, BEYER L, KOLESNIKOV A, et al. An image is worth 16×16 words: transformers for image recognition at scale[EB/OL].(2021-06-03)[2023-05-08]. https://arxiv.org/abs/2010.11929.

[4] ZHENG S, LU J, ZHAO H, et al. Rethinking semantic segmentation from a sequence-to-sequence perspective with transformers[C]//2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2021), June 19-25, 2021, Virtual. New York: IEEE, 2021: 6877-6886.

[5] LIU Z, LIN Y, CAO Y, et al. Swin transformer: hierarchical vision transformer using shifted windows[C]//2021 IEEE/CVF International Conference on Computer Vision (ICCV 2021), October 11-17, 2021, Virtual. New York: IEEE, 2021: 9992-10002.

[6] ZHANG B, GU S, ZHANG B, et al. StyleSwin: transformer-based GAN for high-resolution image generation[C]//2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2022), June 18-24, 2022, New Orleans, LA, USA. New York: IEEE, 2022: 11294-11304.

[7] XIA Z, PAN X, SONG S, et al. Vision transformer with deformable attention[C]//2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2022), June 18-24, 2022, New Orleans, LA, USA. New York: IEEE, 2022: 4784-4793.

[8] LIN T Y, DOLLAR P, GIRSHICK R, et al. Feature pyramid networks for object detection[C]//2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2017), July 21-26, 2017, Honolulu, HI, United States. New York: IEEE, 2017: 936-944.

[9] RONNEBERGER O, FISCHER P, BROX T. U-Net: convolutional networks for biomedical image segmentation[C]//International Conference on Medical Image Computing and Computer-Assisted Intervention, October 5-9, 2015, Munich, Germany. Cham: Springer, 2015: 234-241.

[10] WU X, IRIE G, HIRAMATSU K, et al. Weighted generalized mean pooling for deep image retrieval[C]//25th IEEE International Conference on Image Processing (ICIP 2018), October 7-10, 2018, Athens, Greece. New York: IEEE, 2018: 495-499.

[11] ELHASSAN M A M, YANG C, HUANG C, et al. SPINet: subspace pyramid fusion network for semantic segmentation[EB/OL].(2022-04-04)[2023-05-08]. https://arxiv.org/abs/2204.01278.

[12] ZHANG X, ZHOU X, LIN M, et al. ShuffleNet: an extremely efficient convolutional neural network for mobile devices[C]//2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2018), June 18-23, 2018, Salt Lake City, UT, United States. New York: IEEE, 2018: 6848-6856.

[13] BADRINARAYANAN V, KENDALL A, CIPOLLA R. SegNet: a deep convolutional encoder-decoder architecture for image segmentation[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, 39(12): 2481-2495.

[14] PASZKE A, CHAURASIA A, KIM S, et al. ENet: a deep neural network architecture for real-time semantic segmentation[EB/OL].(2016-06-07)[2023-05-08]. https://arxiv.org/abs/1606.02147.

[15] CHEN L C, ZHU Y, PAPANDREOU G, et al. Encoder-decoder with atrous separable convolution for semantic image segmentation[C]//15th European Conference on Computer Vision (ECCV 2018), September 8-14, 2018, Munich, Germany. Cham: Springer, 2018: 833-851.

[16] YU C, WANG J, PENG C, et al. BiSeNet: bilateral segmentation network for real-time semantic segmentation[C]//15th European Conference on Computer Vision (ECCV 2018), September 8-14, 2018, Munich, Germany. Cham: Springer, 2018: 334-349.

[17] SUN K, XIAO B, LIU D, et al. Deep high-resolution representation learning for human pose estimation[C]//2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2019), June 16-20, 2019, Long Beach, CA, United States. New York: IEEE, 2019: 5686-5696.

[18] YANG Z, YU H, FU Q, et al. NDNet: narrow while deep network for real-time semantic segmentation[J]. IEEE Transactions on Intelligent Transportation Systems, 2020, 22(9): 5508-5519.

[19] YU C, GAO C, WANG J, et al. BiSeNet V2: bilateral network with guided aggregation for real-time semantic segmentation[J]. International Journal of Computer Vision, 2021, 129(11): 3051-3068.

[20] DONG G, YAN Y, SHEN C, et al. Real-time high-performance semantic image segmentation of urban street scenes[J]. IEEE Transactions on Intelligent Transportation Systems, 2020, 22(6): 3258-3274.

Paper Information

Received: May 8, 2023

Accepted: Dec. 31, 2024

Published Online: Dec. 31, 2024
