Laser & Optoelectronics Progress, Volume 61, Issue 14, 1428002 (2024)

Classification of High-Resolution Remote Sensing Image Based on Swin Transformer and Convolutional Neural Network

Xiaoying He1,2,3, Weiming Xu1,2,3,*, Kaixiang Pan1,2,3, Juan Wang1,2,3, and Ziwei Li1,2,3
Author Affiliations
  • 1The Academy of Digital China, Fuzhou University, Fuzhou 350108, Fujian, China
  • 2Key Laboratory of Spatial Data Mining & Information Sharing, Ministry of Education, Fuzhou University, Fuzhou 350002, Fujian, China
  • 3National Engineering Research Center of Geospatial Information Technology, Fuzhou University, Fuzhou 350002, Fujian, China

    Existing deep learning-based remote sensing interpretation methods struggle to directly capture global information, which leads to blurred object edges and low classification accuracy between similar classes. This study proposes a semantic segmentation model called SRAU-Net, which combines a Swin Transformer with a convolutional neural network. SRAU-Net adopts a U-Net-shaped encoder-decoder framework based on Swin Transformer and introduces several improvements to address the limitations of previous methods. First, a Swin Transformer and a convolutional neural network form a dual-branch encoder, which captures spatial details at different scales and complements the context features, yielding higher classification accuracy and sharper object edges. Second, a feature fusion module serves as a bridge between the two encoder branches; it fuses global and local features along the channel and spatial dimensions, improving segmentation accuracy for small target objects. Moreover, SRAU-Net incorporates a feature enhancement module that uses attention mechanisms to adaptively fuse features from the encoder and decoder and to strengthen the aggregation of spatial and semantic features, further improving the model's ability to extract features from remote sensing images. The effectiveness of SRAU-Net is demonstrated on the ISPRS Vaihingen dataset for land cover classification. The results show that SRAU-Net outperforms the compared models in overall accuracy and F1 score, achieving 92.06% and 86.90%, respectively. Notably, SRAU-Net excels at extracting object edge information and accurately classifying small-scale regions, improving overall classification accuracy by 2.57 percentage points over the original model.
Furthermore, it effectively distinguishes remote sensing objects with similar characteristics, such as trees and low vegetation.
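The abstract describes fusing the transformer branch's global features with the CNN branch's local features through channel- and spatial-dimension attention. The paper's exact fusion module is not specified here; the following is a minimal NumPy sketch of one common channel-then-spatial attention fusion pattern, with all function names and the sigmoid-of-pooled-features weighting chosen purely for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def channel_spatial_fusion(global_feat, local_feat):
    """Fuse two (C, H, W) feature maps — e.g. a transformer-branch map and a
    CNN-branch map — by summing them, then reweighting first per channel
    (global average pooling -> sigmoid) and then per pixel (channel mean ->
    sigmoid). A hypothetical sketch, not the paper's actual module."""
    x = global_feat + local_feat                   # element-wise merge of branches
    chan = sigmoid(x.mean(axis=(1, 2)))            # channel attention, shape (C,)
    x = x * chan[:, None, None]
    spat = sigmoid(x.mean(axis=0))                 # spatial attention, shape (H, W)
    return x * spat[None, :, :]

# Toy inputs standing in for the two encoder branches
feat_global = np.ones((4, 8, 8))
feat_local = np.ones((4, 8, 8))
fused = channel_spatial_fusion(feat_global, feat_local)
print(fused.shape)  # (4, 8, 8)
```

In a real network the pooled descriptors would pass through small learned layers before the sigmoid; this sketch keeps only the pooling-and-gating structure that gives channel and spatial attention their selectivity.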



    Xiaoying He, Weiming Xu, Kaixiang Pan, Juan Wang, Ziwei Li. Classification of High-Resolution Remote Sensing Image Based on Swin Transformer and Convolutional Neural Network[J]. Laser & Optoelectronics Progress, 2024, 61(14): 1428002

    Paper Information

    Category: Remote Sensing and Sensors

    Received: Aug. 29, 2023

    Accepted: Nov. 21, 2023

    Published Online: Jul. 8, 2024

    The Author Email: Weiming Xu (xwming2@126.com)

    DOI:10.3788/LOP232003

    CSTR:32186.14.LOP232003
