Laser & Optoelectronics Progress, Volume 61, Issue 14, 1428002 (2024)

Classification of High-Resolution Remote Sensing Image Based on Swin Transformer and Convolutional Neural Network

Xiaoying He1,2,3, Weiming Xu1,2,3,*, Kaixiang Pan1,2,3, Juan Wang1,2,3, and Ziwei Li1,2,3
Author Affiliations
  • 1The Academy of Digital China, Fuzhou University, Fuzhou 350108, Fujian, China
  • 2Key Laboratory of Spatial Data Mining & Information Sharing, Ministry of Education, Fuzhou University, Fuzhou 350002, Fujian, China
  • 3National Engineering Research Center of Geospatial Information Technology, Fuzhou University, Fuzhou 350002, Fujian, China

    Existing deep learning-based remote sensing interpretation methods struggle to directly capture global information, which leads to blurred object edges and low classification accuracy between similar classes. This study proposes a semantic segmentation model called SRAU-Net, which combines a Swin Transformer with a convolutional neural network. SRAU-Net adopts a U-Net-shaped encoder-decoder framework based on Swin Transformer and introduces several improvements to address the limitations of previous methods. First, a Swin Transformer and a convolutional neural network form a dual-branch encoder, which captures spatial details at different scales and complements the context features, yielding higher classification accuracy and sharper object edges. Second, a feature fusion module serves as a bridge between the two encoder branches; it fuses global and local features along the channel and spatial dimensions, improving segmentation accuracy for small target objects. Moreover, SRAU-Net incorporates a feature enhancement module that uses attention mechanisms to adaptively fuse features from the encoder and decoder and to strengthen the aggregation of spatial and semantic features, further improving the model's ability to extract features from remote sensing images. The effectiveness of SRAU-Net is demonstrated on the ISPRS Vaihingen dataset for land cover classification. The results show that SRAU-Net outperforms the compared models in overall accuracy and F1 score, achieving 92.06% and 86.90%, respectively. Notably, SRAU-Net excels at extracting object edge information and accurately classifying small-scale regions, improving overall classification accuracy by 2.57 percentage points over the original model.
Furthermore, it effectively distinguishes remote sensing objects with similar characteristics, such as trees and low vegetation.
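The abstract describes fusing the transformer branch's global features with the CNN branch's local features through channel- and spatial-dimension attention. The paper's exact fusion module is not specified here; the following is a minimal NumPy sketch of one common channel-then-spatial attention fusion pattern, with all function names and the sigmoid-of-pooled-features weighting chosen purely for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def channel_spatial_fusion(global_feat, local_feat):
    """Fuse two (C, H, W) feature maps — e.g. a transformer-branch map and a
    CNN-branch map — by summing them, then reweighting first per channel
    (global average pooling -> sigmoid) and then per pixel (channel mean ->
    sigmoid). A hypothetical sketch, not the paper's actual module."""
    x = global_feat + local_feat                   # element-wise merge of branches
    chan = sigmoid(x.mean(axis=(1, 2)))            # channel attention, shape (C,)
    x = x * chan[:, None, None]
    spat = sigmoid(x.mean(axis=0))                 # spatial attention, shape (H, W)
    return x * spat[None, :, :]

# Toy inputs standing in for the two encoder branches
feat_global = np.ones((4, 8, 8))
feat_local = np.ones((4, 8, 8))
fused = channel_spatial_fusion(feat_global, feat_local)
print(fused.shape)  # (4, 8, 8)
```

In a real network the pooled descriptors would pass through small learned layers before the sigmoid; this sketch keeps only the pooling-and-gating structure that gives channel and spatial attention their selectivity.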



    Xiaoying He, Weiming Xu, Kaixiang Pan, Juan Wang, Ziwei Li. Classification of High-Resolution Remote Sensing Image Based on Swin Transformer and Convolutional Neural Network[J]. Laser & Optoelectronics Progress, 2024, 61(14): 1428002

    Paper Information

    Category: Remote Sensing and Sensors

    Received: Aug. 29, 2023

    Accepted: Nov. 21, 2023

    Published Online: Jul. 8, 2024

    The Author Email: Weiming Xu (xwming2@126.com)

    DOI:10.3788/LOP232003

    CSTR:32186.14.LOP232003
