Acta Optica Sinica, Volume. 43, Issue 24, 2428010(2023)

Remote Sensing Image Segmentation Based on Attention Guidance and Multi-Feature Fusion

Yinhui Zhang1, Feng Zhang1, Zifen He1,*, Xiaogang Yang2, Ruitao Lu2, and Guangchen Chen1
Author Affiliations
  • 1Faculty of Mechanical and Electrical Engineering, Kunming University of Science and Technology, Kunming 650500, Yunnan, China
  • 2College of Missile Engineering, Rocket Force Engineering University, Xi'an 710025, Shaanxi, China

    Objective

    Remote sensing images cover a large detection range, support long-term dynamic monitoring, and carry a large amount of information, so the ground feature information they provide is comprehensive and rich. By extracting ground object targets from remote sensing images, more detailed and accurate ground object information in the imaging area can be obtained, providing data support for high-altitude reconnaissance, precision guidance, and terrain matching. However, as data volumes grow rapidly, current target extraction methods, with their low level of intelligence and automation, struggle to meet the demand. Traditional image extraction techniques include edge detection, threshold segmentation, and region segmentation. These methods segment remote sensing targets with distinct contour boundaries well but lack the ability to adapt to complex and ever-changing remote sensing targets. Convolutional neural networks offer stronger representation ability, scalability, and robustness than traditional methods by providing multi-level semantic information in images. However, because ground objects in remote sensing images are unevenly distributed, have blurred edges, and vary in scale, convolutional neural networks are prone to losing edge information and multi-scale feature information during feature extraction. In addition, cloud cover over remote sensing targets in complex scenes exacerbates the loss of target edge and multi-scale information, making it even more difficult for convolutional neural networks to segment remote sensing ground objects accurately. To address these problems, we propose a segmentation method that uses a deep residual network as the backbone and combines attention guidance with multi-feature fusion to enhance the network's ability to segment ground object edges and multi-scale objects in remote sensing images.

    Methods

    We propose a remote sensing image semantic segmentation network called AMSNet, which combines attention guidance and multi-feature fusion. In the encoder, D_Resnet50 is applied as the backbone network to extract the main feature information from remote sensing images, which enhances the acquisition of detailed information such as edges and small-scale targets. A category-guided channel attention module is inserted into the backbone to enhance the network's segmentation ability for difficult-to-distinguish and irregularly shaped areas in remote sensing images. A feature reuse module is added to the backbone network to counteract the loss of edge detail information and the disappearance of scattered small-scale targets during feature extraction. In the decoder, a cross-regional feature fusion module fuses the multi-feature information, improving the acquisition of multi-scale target information, and a multi-scale loss fusion module is added to further enhance the network's segmentation performance on multi-scale targets.
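    The paper does not include source code, so the PyTorch sketch below only illustrates the general idea of channel attention used to suppress channel noise along the lines described above; the class name, reduction ratio, and squeeze-and-excitation-style wiring are assumptions for illustration, not the exact category-guided design of AMSNet.

```python
# Illustrative sketch only: module name, reduction ratio, and wiring are
# assumptions in the spirit of a squeeze-and-excitation-style channel
# attention block, not the paper's exact category-guided module.
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Re-weights feature channels so informative channels are emphasized
    and channel noise is suppressed."""
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)  # global spatial context per channel
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),                    # per-channel weights in (0, 1)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        w = self.fc(self.pool(x).view(b, c)).view(b, c, 1, 1)
        return x * w                         # channel-wise re-weighting

# Example: attend over a ResNet-50 stage-3 feature map (batch 2, 1024 channels).
feat = torch.randn(2, 1024, 32, 32)
out = ChannelAttention(1024)(feat)
assert out.shape == feat.shape
```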

    Results and Discussions

    Analysis of the experimental results on the plateau-region remote sensing image dataset and on the same dataset under cloud interference shows that, compared with other semantic segmentation networks, the proposed network has better segmentation performance (Table 6 and Table 7) regardless of cloud interference. In addition, its segmentation performance is less affected by cloud interference: even under cloud interference, the segmentation accuracy for ground targets drops by only 1.10 percentage points in mIoU, 0.58 percentage points in mPA, and 0.71 percentage points in mF1 relative to the cloud-free case, a smaller degradation than the other semantic segmentation networks suffer under the same cloud interference conditions. To verify the generalization of AMSNet's segmentation performance, the International Society for Photogrammetry and Remote Sensing (ISPRS) dataset of the Vaihingen region in Germany is also selected. To better fit the image size, the number of grouped convolutions in the feature reuse module of AMSNet is reduced to four. The experimental results in Table 8 show that the network still outperforms the other networks: compared with PspNet and OCNet, mIoU increases by 5.09 and 5.57 percentage points, respectively, and compared with the Deeplabv3+ network, mIoU, mPA, and mF1 increase by 3.47, 3.56, and 2.78 percentage points, respectively. The segmentation maps in Fig. 8 show that the network has a lower error rate, fewer omissions, and more accurate segmentation boundaries for building edges and small-scale cars than the other networks.
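    For reference, the three reported indicators (mIoU, mPA, and mF1) can all be derived from a per-class confusion matrix. The NumPy sketch below is a minimal illustration; the 3-class confusion matrix in the example is invented and unrelated to the paper's data.

```python
# Minimal sketch of the three reported metrics (mIoU, mPA, mF1) computed
# from a per-class confusion matrix; the example matrix is illustrative.
import numpy as np

def segmentation_metrics(conf: np.ndarray):
    """conf[i, j] = number of pixels of true class i predicted as class j."""
    tp = np.diag(conf).astype(float)
    fp = conf.sum(axis=0) - tp           # predicted as the class but wrong
    fn = conf.sum(axis=1) - tp           # pixels of the class that were missed
    iou = tp / (tp + fp + fn + 1e-10)
    pa = tp / (tp + fn + 1e-10)          # per-class pixel accuracy (recall)
    precision = tp / (tp + fp + 1e-10)
    f1 = 2 * precision * pa / (precision + pa + 1e-10)
    return iou.mean(), pa.mean(), f1.mean()

# Example with 3 classes.
conf = np.array([[50, 2, 1],
                 [3, 40, 5],
                 [0, 4, 45]])
miou, mpa, mf1 = segmentation_metrics(conf)
print(f"mIoU={miou:.4f}  mPA={mpa:.4f}  mF1={mf1:.4f}")
```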

    Conclusions

    We propose a network model based on an encoder-decoder structure, AMSNet. In the encoding part, the D_Resnet50 network is applied as the backbone to extract the main feature information of remote sensing images. We also use a category-guided channel attention module to reduce the interference of channel noise on segmented objects and improve the segmentation of targets in difficult-to-distinguish areas, and we embed a feature reuse module to compensate for the loss of target edges and small-scale targets during feature extraction. In the decoding part, a cross-regional feature fusion module is designed to integrate multi-layer features, combined with a multi-scale loss fusion module that calculates the feature loss at different scales to improve the network's segmentation of multi-scale targets. Experiments are conducted on the plateau-region remote sensing image dataset, the same dataset under cloud interference, and a public dataset. Compared with semantic segmentation networks such as BiseNetv2, PspNet, and Deeplabv3+, the proposed network achieves better results on the mIoU, mPA, and mF1 evaluation indicators. The visualization results show that the proposed network can effectively segment ground object targets in interlaced and hard-to-distinguish areas as well as scattered multi-scale targets in remote sensing images, with good segmentation performance and robustness under cloud interference.
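    As a rough illustration of the multi-scale loss fusion idea (supervising decoder predictions at several scales and fusing the losses), the sketch below applies cross-entropy at each scale after upsampling to the label resolution; the equal weighting and the choice of cross-entropy are assumptions for illustration, not the paper's exact formulation.

```python
# Hedged sketch of multi-scale loss fusion: loss terms and weights are
# assumed (equal-weight cross-entropy), not taken from the paper.
import torch
import torch.nn.functional as F

def multi_scale_loss(preds, target, weights=None):
    """preds: list of logits at different spatial scales, each (B, C, h, w);
    target: (B, H, W) integer labels at full resolution."""
    weights = weights or [1.0] * len(preds)
    total = 0.0
    for w, p in zip(weights, preds):
        # Upsample each prediction to the label resolution before the loss.
        p = F.interpolate(p, size=target.shape[-2:], mode="bilinear",
                          align_corners=False)
        total = total + w * F.cross_entropy(p, target)
    return total / sum(weights)

# Example: three decoder stages predicting 6 classes at 1/8, 1/4, 1/2 scale.
target = torch.randint(0, 6, (2, 256, 256))
preds = [torch.randn(2, 6, s, s, requires_grad=True) for s in (32, 64, 128)]
loss = multi_scale_loss(preds, target)
loss.backward()  # gradients flow back through every scale
```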

    Citation

    Yinhui Zhang, Feng Zhang, Zifen He, Xiaogang Yang, Ruitao Lu, Guangchen Chen. Remote Sensing Image Segmentation Based on Attention Guidance and Multi-Feature Fusion[J]. Acta Optica Sinica, 2023, 43(24): 2428010

    Paper Information

    Category: Remote Sensing and Sensors

    Received: Mar. 6, 2023

    Accepted: Apr. 24, 2023

    Published Online: Dec. 8, 2023

    The Author Email: Zifen He (zyhhzf1998@168.com)

    DOI: 10.3788/AOS230631
