Acta Optica Sinica, Vol. 45, Issue 15, 1528001 (2025)

Remote Sensing Image Classification Based on Grouped Spatial Coordinate Attention and Mamba

Hui Chen and Zixu Li*
Author Affiliations
  • School of Computer Science and Engineering, Anhui University of Science and Technology, Huainan 232001, Anhui, China

    Objective

    With the rapid advancement of remote sensing imaging technology, remote sensing image classification has become a critical research focus because of its foundational role in tasks such as agricultural management, urban planning, and disaster monitoring. However, existing methods still extract insufficiently discriminative features, struggle to capture global relationships and long-range dependencies, and have low computational efficiency. To address these limitations, this study proposes a novel remote sensing image classification model, GCDM-Mamba, which integrates attention mechanisms with the Mamba architecture to improve both accuracy and efficiency.

    Methods

    This paper presents GCDM-Mamba, a remote sensing image classification model that combines attention mechanisms with the Mamba architecture. The model incorporates a grouped spatial coordinate attention (GSCA) module, which uses global information along the spatial dimensions of the feature maps to generate attention maps; these maps then weight the input feature maps to enhance their expressive power. The model also uses positional encoding to capture spatial information and a class token that produces a global semantic representation of the input sequence, providing comprehensive category information. Finally, the proposed dual-stream multi-directional Mamba encoder (DMME) extracts features in parallel along the channel dimension and uses a multi-directional state space model (MDS) to capture spatial information in remote sensing images.
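    The abstract does not state which scan directions the MDS uses or how the DMME splits channels into its two streams. Purely as an illustration of multi-directional scanning, the sketch below assumes four VMamba-style traversals of an H x W token grid (row-major, reversed row-major, column-major, reversed column-major), runs a toy scalar linear recurrence h_t = a * h_(t-1) + b * x_t, y_t = c * h_t along each traversal, and sums the outputs back at their original grid positions. The function names and the fixed coefficients are hypothetical; a real Mamba block uses learned, input-dependent (selective) parameters and multi-channel hidden states.

```python
from typing import List, Tuple

Grid = List[List[float]]
Order = List[Tuple[int, int]]


def scan_orders(h: int, w: int) -> List[Order]:
    """Four assumed traversals of an h x w grid (hypothetical choice of directions)."""
    row_major = [(i, j) for i in range(h) for j in range(w)]
    col_major = [(i, j) for j in range(w) for i in range(h)]
    return [row_major, row_major[::-1], col_major, col_major[::-1]]


def linear_ssm(xs: List[float], a: float = 0.5, b: float = 1.0,
               c: float = 1.0) -> List[float]:
    """Toy scalar recurrence: h_t = a*h_(t-1) + b*x_t, y_t = c*h_t."""
    state, ys = 0.0, []
    for x in xs:
        state = a * state + b * x
        ys.append(c * state)
    return ys


def multi_directional_ssm(grid: Grid) -> Grid:
    """Scan the grid in every direction, then sum outputs at the original positions."""
    h, w = len(grid), len(grid[0])
    out = [[0.0] * w for _ in range(h)]
    for order in scan_orders(h, w):
        seq = [grid[i][j] for i, j in order]  # flatten the grid along this direction
        for (i, j), y in zip(order, linear_ssm(seq)):
            out[i][j] += y  # scatter back to the 2-D position and merge directions
    return out
```

    For example, `multi_directional_ssm([[1.0, 0.0], [0.0, 0.0]])` returns `[[4.0, 0.75], [0.75, 0.25]]`: the single nonzero token's influence reaches both its row and its column neighbours because the forward and reversed row and column scans are combined, which is the motivation for scanning a 2-D image in several directions.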

    Results and Discussions

    The GCDM-Mamba network uses the GSCA module (Fig. 3) to exploit global information along the spatial dimensions (height and width) of the feature maps, constructing attention maps that reweight the input feature maps and enhance feature representation. With the GSCA module integrated, the model's precision (P), recall (R), and F1 score (F1) improved by 2.26, 2.22, and 2.23 percentage points, respectively, on the UCM dataset; by 2.22, 2.23, and 2.13 percentage points on the AID dataset; and by 2.32, 2.41, and 2.43 percentage points on the NWPU-RESISC45 dataset (Table 4).

    The DMME module (Fig. 4) extracts channel-wise features in parallel and processes them with the multi-directional SSM, improving both feature extraction capability and computational efficiency. With the DMME module, P, R, and F1 increased by 1.75, 1.90, and 1.94 percentage points on UCM; by 1.85, 1.91, and 1.85 percentage points on AID; and by 1.52, 1.58, and 1.58 percentage points on NWPU-RESISC45 (Table 4).

    Comparative experiments show that GCDM-Mamba achieves state-of-the-art classification performance on all three datasets, exceeding the F1 score of the previous best model, RSMamba-H, by 1.88, 1.78, and 1.15 percentage points on UCM, AID, and NWPU-RESISC45, respectively (Tables 1, 2, and 3).
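    The abstract reports P, R, and F1 without stating how they are averaged over scene classes. A minimal sketch, assuming macro averaging (per-class scores averaged with equal weight per class), is shown below; `macro_prf` is a hypothetical helper, not the authors' evaluation code. A percentage-point gain is simply the difference between two such scores expressed as percentages.

```python
from typing import List, Tuple


def macro_prf(y_true: List[int], y_pred: List[int]) -> Tuple[float, float, float]:
    """Macro-averaged precision, recall, and F1 (assumed averaging scheme)."""
    classes = sorted(set(y_true) | set(y_pred))
    p_sum = r_sum = f_sum = 0.0
    for k in classes:
        # Count true positives, false positives, and false negatives for class k.
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == k and p == k)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != k and p == k)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == k and p != k)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        p_sum += prec
        r_sum += rec
        f_sum += f1
    n = len(classes)
    return p_sum / n, r_sum / n, f_sum / n
```

    For example, `macro_prf([0, 0, 1, 1], [0, 1, 1, 1])` gives macro P of about 0.833, R of 0.75, and F1 of about 0.733. Under this convention macro F1 is the mean of the per-class F1 values rather than the harmonic mean of macro P and macro R (about 0.789 here), so an F1 gain need not fall between the corresponding P and R gains.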

    Conclusions

    To address insufficient feature discrimination and low computational efficiency in remote sensing image classification, a novel method named GCDM-Mamba is proposed. The method first applies a GSCA module, in which the feature maps are grouped and processed by average pooling and max pooling along the height and width dimensions to construct attention maps. These maps use multi-dimensional global information to weight the input feature maps, thereby enhancing feature representation. Next, positional embeddings are added to capture spatial information, and a class token provides global, category-related context for the entire image. Finally, the DMME is introduced to further improve computational efficiency and strengthen the network's ability to model long-range dependencies. Experiments on the UCM, AID, and NWPU-RESISC45 datasets show that GCDM-Mamba outperforms existing methods in classification performance. With fewer parameters, the model effectively extracts image features and captures long-range dependencies, validating its effectiveness for remote sensing image classification.
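    The grouped pooling step described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' GSCA: channels are split into `groups` equal groups; within each group, average-pooled and max-pooled descriptors along the width (one value per row) and along the height (one value per column) are summed and averaged over the group's channels; a sigmoid turns them into row and column attention weights; and each input value is rescaled by the product of its row and column weights. The learned convolution or linear layers that a real attention module would apply between pooling and the sigmoid are omitted, and `gsca_sketch` is a hypothetical name.

```python
import math
from typing import List

Tensor3 = List[List[List[float]]]  # indexed as [channel][row][column]


def sigmoid(v: float) -> float:
    """Numerically stable logistic function."""
    if v >= 0:
        return 1.0 / (1.0 + math.exp(-v))
    e = math.exp(v)
    return e / (1.0 + e)


def gsca_sketch(x: Tensor3, groups: int) -> Tensor3:
    """Hypothetical grouped coordinate-attention sketch with no learned weights."""
    c, h, w = len(x), len(x[0]), len(x[0][0])
    if c % groups != 0:
        raise ValueError("channel count must be divisible by groups")
    g = c // groups  # channels per group
    out = [[[0.0] * w for _ in range(h)] for _ in range(c)]
    for start in range(0, c, g):
        chans = x[start:start + g]
        # Height descriptor: per row, avg-pool + max-pool across the width,
        # averaged over the channels of this group.
        d_h = [sum(sum(ch[i]) / w + max(ch[i]) for ch in chans) / g
               for i in range(h)]
        # Width descriptor: per column, avg-pool + max-pool across the height.
        d_w = [sum(sum(ch[i][j] for i in range(h)) / h
                   + max(ch[i][j] for i in range(h)) for ch in chans) / g
               for j in range(w)]
        a_h = [sigmoid(v) for v in d_h]  # one weight per row
        a_w = [sigmoid(v) for v in d_w]  # one weight per column
        # Reweight every channel in the group with the shared row/column weights.
        for k in range(start, start + g):
            for i in range(h):
                for j in range(w):
                    out[k][i][j] = x[k][i][j] * a_h[i] * a_w[j]
    return out
```

    Because every weight lies in (0, 1), this simplified form only rescales features downward; rows and columns with strong pooled responses receive weights closer to 1 and are attenuated least, which is how the attention map emphasizes informative spatial positions.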


    Citation

    Hui Chen, Zixu Li. Remote Sensing Image Classification Based on Grouped Spatial Coordinate Attention and Mamba[J]. Acta Optica Sinica, 2025, 45(15): 1528001

    Paper Information

    Category: Remote Sensing and Sensors

    Received: Apr. 17, 2025

    Accepted: May 12, 2025

    Published Online: Aug. 8, 2025

    Corresponding author email: Zixu Li (1074301430@qq.com)

    DOI: 10.3788/AOS250956

    CSTR: 32393.14.AOS250956
