Real-time urban street view semantic segmentation based on cross-layer aggregation network

Zhiqiang HOU; Minjie CHENG; Sugang MA; Minjie QU; Xiaobao YANG

doi:10.37188/OPE.20243208.1212

Optics and Precision Engineering, Volume. 32, Issue 8, 1212(2024)

Zhiqiang HOU1...2, Minjie CHENG1,2,*, Sugang MA1,2, Minjie QU1,2, and Xiaobao YANG1,2

Author Affiliations

¹Xi'an University of Posts and Telecommunications, Institute of Computer, Xi'an702, China

²Xi'an University of Posts and Telecommunications, Shaanxi Key Laboratory of Network Data Analysis and Intelligent Processing, Xi'an71011, China

Abstract

With the rapid development of autonomous driving technology, precise and efficient scene understanding has become increasingly important. Urban street scene semantic segmentation aims to accurately identify and segment elements such as pedestrians, obstacles, roads, and signs, providing the road information that autonomous driving requires. However, current semantic segmentation algorithms still struggle with urban street scenes, chiefly because of insufficient discrimination between pixels of different categories, inaccurate understanding of complex scene structures, and inaccurate segmentation of small-scale objects and large-scale structures. To address these issues, this paper proposes a real-time urban street scene semantic segmentation algorithm based on a cross-layer aggregation network. First, a pyramid pooling module combined with cross-layer aggregation is placed at the end of the encoder to efficiently extract multi-scale context information. Second, a cross-layer aggregation module is placed between the encoder and decoder; it strengthens feature representation by introducing a channel attention mechanism and progressively aggregates the encoder-stage features to achieve feature reuse. Finally, a multi-scale fusion module in the decoder stage aggregates global and local information along the channel dimension to promote the fusion of deep and shallow features. The proposed algorithm was validated on two common urban street scene datasets. On an RTX 3090 graphics card (speed measured in a TensorRT environment), the algorithm achieves 73.0% mIoU on the Cityscapes test set at 294 FPS, and 75.8% mIoU on higher-resolution images at 164 FPS; on the CamVid dataset, it achieves 74.8% mIoU at 239 FPS. Experimental results show that the proposed algorithm effectively balances accuracy and real-time performance and improves semantic segmentation performance over the algorithms it was compared with.
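The accuracy figures above are reported as mIoU (mean intersection over union). As a reading aid, the following is a minimal sketch of how mIoU is commonly computed from a confusion matrix; it is not the authors' evaluation code. The function names, the toy labels, and the ignore label of 255 are assumptions made for illustration only.

```python
from typing import List, Sequence


def confusion_matrix(pred: Sequence[int], target: Sequence[int],
                     num_classes: int, ignore_index: int = 255) -> List[List[int]]:
    """Count pixels per (ground-truth class, predicted class) pair.

    ignore_index is an assumed "void" label; pixels carrying it are skipped.
    """
    cm = [[0] * num_classes for _ in range(num_classes)]
    for p, t in zip(pred, target):
        if t == ignore_index:
            continue
        cm[t][p] += 1
    return cm


def mean_iou(cm: List[List[int]]) -> float:
    """Average IoU = TP / (TP + FP + FN) over classes that appear at all."""
    n = len(cm)
    ious = []
    for c in range(n):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp                        # class c pixels predicted as another class
        fp = sum(cm[r][c] for r in range(n)) - tp   # other pixels predicted as class c
        denom = tp + fp + fn
        if denom > 0:                               # skip classes absent from both pred and target
            ious.append(tp / denom)
    return sum(ious) / len(ious) if ious else 0.0


# Toy example: flattened label maps with three classes and one ignored pixel.
target = [0, 0, 1, 1, 2, 255]
pred = [0, 1, 1, 1, 2, 0]
cm = confusion_matrix(pred, target, num_classes=3)
print(round(mean_iou(cm), 4))
```

In this toy case the per-class IoU values are 1/2, 2/3, and 1, so the mean is 13/18. Benchmark scores are normally accumulated over the whole evaluation set before averaging, rather than averaged per image.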

Keywords

convolutional neural network; encoder-decoder structure; pyramid pooling module; semantic segmentation; urban street view

Zhiqiang HOU, Minjie CHENG, Sugang MA, Minjie QU, Xiaobao YANG. Real-time urban street view semantic segmentation based on cross-layer aggregation network[J]. Optics and Precision Engineering, 2024, 32(8): 1212
