Opto-Electronic Engineering, Volume 51, Issue 1, 230304-1 (2024)

Design of Swin Transformer for semantic segmentation of road scenes

Hao Hang, Yingping Huang*, Xurui Zhang, and Xin Luo
Author Affiliations
  • School of Optical-Electrical and Computer Engineering, University of Shanghai for Science and Technology, Shanghai 200093, China

    Road scene semantic segmentation is a crucial task in environment perception for autonomous driving. In recent years, Transformer neural networks have been applied to computer vision and have shown excellent performance. To address the low segmentation accuracy of existing methods on complex scene images and their poor recognition of small objects, this paper proposes a road scene semantic segmentation algorithm based on the Swin Transformer with multiscale feature fusion. The network adopts an encoder-decoder structure: the encoder uses an improved Swin Transformer feature extractor to extract features from road scene images, while the decoder consists of an attention fusion module and a feature pyramid network, effectively integrating semantic features at multiple scales. Validation tests on the Cityscapes urban road scene dataset show that, compared with various existing semantic segmentation algorithms, the proposed approach achieves a significant improvement in segmentation accuracy.
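    The abstract does not detail the decoder's fusion scheme, but the general idea of a feature-pyramid-style decoder over hierarchical encoder stages (such as the four stages of a Swin Transformer, each halving spatial resolution) can be sketched as follows. This is a minimal illustrative example with hypothetical shapes and a plain top-down sum fusion, not the paper's actual attention fusion module:

```python
import numpy as np

def upsample_nearest(x, factor):
    """Nearest-neighbor upsampling of a (C, H, W) feature map."""
    return x.repeat(factor, axis=1).repeat(factor, axis=2)

def fpn_fuse(features):
    """Top-down fusion of multiscale features, as in a feature pyramid network.

    features: list of (C, H_i, W_i) arrays, finest scale first, with each
    successive scale at half the resolution of the previous one (mirroring
    the hierarchical stages of a Swin Transformer encoder).
    Returns a single fused map at the finest resolution.
    """
    fused = features[-1]                       # start from the coarsest map
    for f in reversed(features[:-1]):          # walk toward finer scales
        fused = f + upsample_nearest(fused, 2) # upsample and merge
    return fused

# Toy multiscale features: 64x64, 32x32, 16x16 maps with 4 channels each.
feats = [np.ones((4, 64, 64)), np.ones((4, 32, 32)), np.ones((4, 16, 16))]
out = fpn_fuse(feats)
print(out.shape)  # (4, 64, 64)
```

    In the paper's decoder, the element-wise sum here is replaced by a learned attention fusion module that weights the contributions of each scale before merging.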

    Citation

    Hao Hang, Yingping Huang, Xurui Zhang, Xin Luo. Design of Swin Transformer for semantic segmentation of road scenes[J]. Opto-Electronic Engineering, 2024, 51(1): 230304-1

    Paper Information

    Category: Article

    Received: Dec. 14, 2023

    Accepted: Jan. 24, 2024

    Published Online: Apr. 19, 2024

    Corresponding author: Yingping Huang (黄影平)

    DOI: 10.12086/oee.2024.230304
