Laser & Optoelectronics Progress, Volume 62, Issue 16, 1628004 (2025)
High-Resolution Remote Sensing Semantic Segmentation Method Coupling ResNet and Transformer
Convolutional neural networks (CNNs) and vision Transformers struggle to fuse global and local features effectively in semantic segmentation of high-resolution remote sensing images, which limits segmentation accuracy. This paper proposes a deeply fused hybrid network, RTHNet. RTHNet adopts an encoder-decoder structure and uses ResNet50 as the backbone in the encoding stage to extract local features from remote sensing images. An attention adaptive fusion module (AAFM) is designed to efficiently integrate multi-level attention features between the encoder and decoder. In the decoding stage, a global-local context Transformer block (GLCTB) is designed to attend to global context and local details simultaneously. A detail enhancement module (DEM) is placed at the end of the decoder to refine semantic consistency and spatial detail across features, ensuring precise and accurate segmentation results. Experimental results on the Potsdam, Vaihingen, and WHDLD datasets show that the mean intersection over union (mIoU) of RTHNet reaches 79.58%, 73.61%, and 60.37%, respectively. Compared with current mainstream segmentation networks such as MAResU-Net and UNetFormer, RTHNet significantly improves segmentation accuracy.
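To make the architecture described above concrete, the following is a minimal PyTorch sketch of an RTHNet-style encoder-decoder, based only on this abstract: a ResNet50 encoder, a GLCTB applied to the deepest features, AAFM-fused skip connections during decoding, and a DEM before the classifier head. The internal designs of AAFM, GLCTB, and DEM, the common channel width, and the class count are all assumptions for illustration, not the authors' implementation.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet50


class AAFM(nn.Module):
    """Attention adaptive fusion module (assumed design): fuses an encoder
    skip feature with an upsampled decoder feature via channel attention."""
    def __init__(self, channels):
        super().__init__()
        self.gap = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Sequential(
            nn.Conv2d(channels, channels // 4, 1), nn.ReLU(inplace=True),
            nn.Conv2d(channels // 4, channels, 1), nn.Sigmoid())

    def forward(self, skip, up):
        x = skip + up
        return x * self.fc(self.gap(x))  # reweight fused features per channel


class GLCTB(nn.Module):
    """Global-local context Transformer block (assumed design): global
    multi-head self-attention plus a parallel depthwise-conv local branch."""
    def __init__(self, channels, heads=8):
        super().__init__()
        self.norm = nn.LayerNorm(channels)
        self.attn = nn.MultiheadAttention(channels, heads, batch_first=True)
        self.local = nn.Conv2d(channels, channels, 3, padding=1, groups=channels)

    def forward(self, x):
        b, c, h, w = x.shape
        seq = self.norm(x.flatten(2).transpose(1, 2))   # (B, HW, C) tokens
        g, _ = self.attn(seq, seq, seq)                 # global context
        g = g.transpose(1, 2).reshape(b, c, h, w)
        return x + g + self.local(x)                    # global + local details


class DEM(nn.Module):
    """Detail enhancement module (assumed design): light residual refinement."""
    def __init__(self, channels):
        super().__init__()
        self.refine = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels), nn.ReLU(inplace=True))

    def forward(self, x):
        return x + self.refine(x)


class RTHNetSketch(nn.Module):
    def __init__(self, num_classes=6, channels=256):
        super().__init__()
        r = resnet50(weights=None)
        self.stem = nn.Sequential(r.conv1, r.bn1, r.relu, r.maxpool)
        self.stages = nn.ModuleList([r.layer1, r.layer2, r.layer3, r.layer4])
        # project ResNet50 stage outputs (256/512/1024/2048 ch) to one width
        self.proj = nn.ModuleList(
            nn.Conv2d(c, channels, 1) for c in (256, 512, 1024, 2048))
        self.fuse = nn.ModuleList(AAFM(channels) for _ in range(3))
        self.glctb = GLCTB(channels)
        self.dem = DEM(channels)
        self.head = nn.Conv2d(channels, num_classes, 1)

    def forward(self, x):
        size = x.shape[2:]
        feats = []
        x = self.stem(x)
        for stage, proj in zip(self.stages, self.proj):
            x = stage(x)
            feats.append(proj(x))
        y = self.glctb(feats[-1])            # global-local context at 1/32 scale
        for skip, fuse in zip(reversed(feats[:-1]), self.fuse):
            y = nn.functional.interpolate(
                y, size=skip.shape[2:], mode="bilinear", align_corners=False)
            y = fuse(skip, y)                # attention-fused skip connection
        y = self.head(self.dem(y))           # refine details, then classify
        return nn.functional.interpolate(
            y, size=size, mode="bilinear", align_corners=False)


if __name__ == "__main__":
    net = RTHNetSketch(num_classes=6)
    out = net(torch.randn(1, 3, 256, 256))
    print(out.shape)  # torch.Size([1, 6, 256, 256])
```

The six output classes mirror the typical Potsdam/Vaihingen label set, but any of these hyperparameters would need to be checked against the full paper.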
Lei Zhang, Xue Ding, Jinliang Wang, Shuangyun Peng, Rongxiang Luo. High-Resolution Remote Sensing Semantic Segmentation Method Coupling ResNet and Transformer[J]. Laser & Optoelectronics Progress, 2025, 62(16): 1628004
Category: Remote Sensing and Sensors
Received: Feb. 5, 2025
Accepted: Mar. 21, 2025
Published Online: Jul. 25, 2025
The Author Email: Xue Ding (4228@ynnu.edu.cn)
CSTR:32186.14.LOP250591