Chinese Journal of Liquid Crystals and Displays, Volume 38, Issue 7, 975 (2023)
Multimodal image semantic segmentation based on attention mechanism
Many current semantic segmentation models are trained only on RGB images, so their stability degrades in extreme environments and they cannot meet the practical demands of autonomous driving at night. To address this, a multimodal dual encoder-decoder model integrating a lightweight attention module is constructed, with ResNet-152 as the feature extraction network. The dual encoder extracts key information from the RGB and thermal (RGB-T) modalities and fuses it through the attention module, and the fused features are then passed to the decoder. At each stage, the upsampled feature map is concatenated with the corresponding encoder feature map, features are extracted by convolutional layers, and the resolution is restored by upsampling, with semantic segmentation performed at the final layer. Experimental results show that the mean accuracy and mean intersection over union of the proposed model on the MFNet test set are 76% and 55.7%, respectively, an improvement over other network models. The model essentially meets the requirement of accurate semantic segmentation of RGB-T images in both daytime and nighttime scenes.
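The abstract does not specify the attention module's design, the fusion points, or the decoder widths, so the following is only a minimal sketch of the described dual-encoder RGB-T architecture. The SE-style channel attention, the per-stage fusion, the replication of the single thermal channel to three channels, and the class count of 9 (the MFNet label set) are all illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision.models import resnet152

class ChannelAttentionFusion(nn.Module):
    """Fuse RGB and thermal features with lightweight channel attention (assumed SE-style)."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels), nn.Sigmoid())

    def forward(self, rgb_feat, th_feat):
        fused = rgb_feat + th_feat                      # element-wise sum of the two modalities
        w = self.fc(F.adaptive_avg_pool2d(fused, 1).flatten(1))
        return fused * w.view(w.size(0), -1, 1, 1)      # reweight channels by learned attention

class RGBTSegNet(nn.Module):
    def __init__(self, num_classes=9):                  # assumed: 9 classes as in MFNet
        super().__init__()
        def encoder():
            m = resnet152(weights=None)                 # ResNet-152 backbone per the abstract
            return nn.ModuleList([
                nn.Sequential(m.conv1, m.bn1, m.relu, m.maxpool, m.layer1),
                m.layer2, m.layer3, m.layer4])
        self.enc_rgb, self.enc_th = encoder(), encoder()
        chs = [256, 512, 1024, 2048]                    # ResNet-152 stage output widths
        self.fuse = nn.ModuleList(ChannelAttentionFusion(c) for c in chs)
        # Decoder: upsample, splice with the fused skip feature, then convolve (stage-wise).
        self.dec = nn.ModuleList(
            nn.Sequential(nn.Conv2d(chs[i] + chs[i - 1], chs[i - 1], 3, padding=1),
                          nn.BatchNorm2d(chs[i - 1]), nn.ReLU(inplace=True))
            for i in range(3, 0, -1))
        self.head = nn.Conv2d(chs[0], num_classes, 1)

    def forward(self, rgb, thermal):
        x_r, x_t = rgb, thermal.repeat(1, 3, 1, 1)      # assume 1-channel thermal; tile to 3
        skips = []
        for stage_r, stage_t, fuse in zip(self.enc_rgb, self.enc_th, self.fuse):
            x_r, x_t = stage_r(x_r), stage_t(x_t)
            skips.append(fuse(x_r, x_t))                # attention-fused feature per stage
        x = skips[-1]
        for dec, skip in zip(self.dec, reversed(skips[:-1])):
            x = F.interpolate(x, size=skip.shape[-2:], mode='bilinear', align_corners=False)
            x = dec(torch.cat([x, skip], dim=1))        # concatenate skip, extract features
        logits = self.head(x)                           # predictions at 1/4 input resolution
        return F.interpolate(logits, scale_factor=4, mode='bilinear', align_corners=False)

if __name__ == "__main__":
    net = RGBTSegNet()
    out = net(torch.randn(1, 3, 480, 640), torch.randn(1, 1, 480, 640))
    print(out.shape)  # torch.Size([1, 9, 480, 640])
```

The two encoders share an architecture but not weights, so each modality learns its own features; fusing at every stage (rather than only at the bottleneck) lets the decoder's skip connections carry attention-weighted information from both modalities.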
Ji-you ZHANG, Rong-fen ZHANG, Yu-hong LIU, Wen-hao YUAN. Multimodal image semantic segmentation based on attention mechanism[J]. Chinese Journal of Liquid Crystals and Displays, 2023, 38(7): 975
Category: Research Articles
Received: Sep. 16, 2022
Accepted: --
Published Online: Jul. 31, 2023
Author Email: Rong-fen ZHANG (rfzhang@gzu.edu.cn)