Laser & Optoelectronics Progress, Volume 62, Issue 16, 1628004 (2025)
High-Resolution Remote Sensing Semantic Segmentation Method Coupling ResNet and Transformer
Convolutional neural networks (CNNs) and vision Transformers struggle to fuse global and local features effectively in semantic segmentation of high-resolution remote sensing images, which limits segmentation accuracy. This paper proposes a deeply fused hybrid network, RTHNet. RTHNet adopts an encoder-decoder structure and uses ResNet50 as the backbone in the encoding stage to extract local features from remote sensing images. An attention adaptive fusion module (AAFM) is designed to efficiently integrate multi-level attention features between the encoder and decoder. In the decoding stage, a global-local context Transformer block (GLCTB) is designed to attend to global context and local details simultaneously. A detail enhancement module (DEM) is placed at the end of the decoder to refine semantic consistency and spatial detail across features, ensuring precise and accurate segmentation results. Experimental results on the Potsdam, Vaihingen, and WHDLD datasets show that the mean intersection over union (mIoU) of RTHNet reaches 79.58%, 73.61%, and 60.37%, respectively. Compared with current mainstream segmentation networks such as MAResU-Net and UNetFormer, RTHNet significantly improves segmentation accuracy.
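To make the architecture described above concrete, the following is a minimal PyTorch sketch of an RTHNet-style encoder-decoder, based only on this abstract: a ResNet50 encoder, a GLCTB applied to the deepest features, AAFM-fused skip connections during decoding, and a DEM before the classifier head. The internal designs of AAFM, GLCTB, and DEM, the common channel width, and the class count are all assumptions for illustration, not the authors' implementation.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet50


class AAFM(nn.Module):
    """Attention adaptive fusion module (assumed design): fuses an encoder
    skip feature with an upsampled decoder feature via channel attention."""
    def __init__(self, channels):
        super().__init__()
        self.gap = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Sequential(
            nn.Conv2d(channels, channels // 4, 1), nn.ReLU(inplace=True),
            nn.Conv2d(channels // 4, channels, 1), nn.Sigmoid())

    def forward(self, skip, up):
        x = skip + up
        return x * self.fc(self.gap(x))  # reweight fused features per channel


class GLCTB(nn.Module):
    """Global-local context Transformer block (assumed design): global
    multi-head self-attention plus a parallel depthwise-conv local branch."""
    def __init__(self, channels, heads=8):
        super().__init__()
        self.norm = nn.LayerNorm(channels)
        self.attn = nn.MultiheadAttention(channels, heads, batch_first=True)
        self.local = nn.Conv2d(channels, channels, 3, padding=1, groups=channels)

    def forward(self, x):
        b, c, h, w = x.shape
        seq = self.norm(x.flatten(2).transpose(1, 2))   # (B, HW, C) tokens
        g, _ = self.attn(seq, seq, seq)                 # global context
        g = g.transpose(1, 2).reshape(b, c, h, w)
        return x + g + self.local(x)                    # global + local details


class DEM(nn.Module):
    """Detail enhancement module (assumed design): light residual refinement."""
    def __init__(self, channels):
        super().__init__()
        self.refine = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels), nn.ReLU(inplace=True))

    def forward(self, x):
        return x + self.refine(x)


class RTHNetSketch(nn.Module):
    def __init__(self, num_classes=6, channels=256):
        super().__init__()
        r = resnet50(weights=None)
        self.stem = nn.Sequential(r.conv1, r.bn1, r.relu, r.maxpool)
        self.stages = nn.ModuleList([r.layer1, r.layer2, r.layer3, r.layer4])
        # project ResNet50 stage outputs (256/512/1024/2048 ch) to one width
        self.proj = nn.ModuleList(
            nn.Conv2d(c, channels, 1) for c in (256, 512, 1024, 2048))
        self.fuse = nn.ModuleList(AAFM(channels) for _ in range(3))
        self.glctb = GLCTB(channels)
        self.dem = DEM(channels)
        self.head = nn.Conv2d(channels, num_classes, 1)

    def forward(self, x):
        size = x.shape[2:]
        feats = []
        x = self.stem(x)
        for stage, proj in zip(self.stages, self.proj):
            x = stage(x)
            feats.append(proj(x))
        y = self.glctb(feats[-1])            # global-local context at 1/32 scale
        for skip, fuse in zip(reversed(feats[:-1]), self.fuse):
            y = nn.functional.interpolate(
                y, size=skip.shape[2:], mode="bilinear", align_corners=False)
            y = fuse(skip, y)                # attention-fused skip connection
        y = self.head(self.dem(y))           # refine details, then classify
        return nn.functional.interpolate(
            y, size=size, mode="bilinear", align_corners=False)


if __name__ == "__main__":
    net = RTHNetSketch(num_classes=6)
    out = net(torch.randn(1, 3, 256, 256))
    print(out.shape)  # torch.Size([1, 6, 256, 256])
```

The six output classes mirror the typical Potsdam/Vaihingen label set, but any of these hyperparameters would need to be checked against the full paper.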
Lei Zhang, Xue Ding, Jinliang Wang, Shuangyun Peng, Rongxiang Luo. High-Resolution Remote Sensing Semantic Segmentation Method Coupling ResNet and Transformer[J]. Laser & Optoelectronics Progress, 2025, 62(16): 1628004
Category: Remote Sensing and Sensors
Received: Feb. 5, 2025
Accepted: Mar. 21, 2025
Published Online: Jul. 25, 2025
The Author Email: Xue Ding (4228@ynnu.edu.cn)
CSTR:32186.14.LOP250591