Remote Sensing Image Classification Based on Grouped Spatial Coordinate Attention and Mamba

With the rapid advancement of remote sensing imaging technology, remote sensing image classification has become a critical research focus due to its foundational role in tasks such as agricultural management, urban planning, and disaster monitoring. However, existing methods still suffer from insufficient discriminative feature extraction, challenges in capturing global relationships and long-range dependencies, and low computational efficiency. To address these limitations, this study proposes a novel remote sensing image classification model, GCDM-Mamba, which integrates attention mechanisms and the Mamba architecture to enhance both accuracy and efficiency.

Methods

This paper presents GCDM-Mamba, a remote sensing image classification model that combines attention mechanisms with the Mamba architecture. The model incorporates a grouped spatial coordinate attention (GSCA) module, which uses global information along the spatial dimensions of the feature maps to generate attention maps. These maps then weight the input feature maps to strengthen feature representation. In addition, the model uses position encoding to capture spatial information and a class token to produce a global semantic representation of the input sequence, providing comprehensive category information. The proposed dual-stream multi-directional Mamba encoder (DMME) extracts features in parallel along the channel dimension and applies a multi-directional state space model (MDS) to capture spatial information in remote sensing images.
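The token preparation and multi-directional scanning described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the fixed scalar parameters `a`, `b`, and `c`, and the choice of one forward and one backward scan averaged together are all assumptions made for illustration. A real Mamba block uses learned, input-dependent state space parameters, and the class token and position embeddings would be learned rather than fixed.

```python
import numpy as np


def add_class_token_and_positions(patches, cls_token, pos_embed):
    """Prepend a class token to (N, D) patch embeddings, then add positions."""
    seq = np.vstack([cls_token[None, :], patches])  # shape (N + 1, D)
    return seq + pos_embed  # pos_embed must also be (N + 1, D)


def ssm_scan(x, a=0.9, b=1.0, c=1.0):
    """Linear recurrence h_t = a*h_{t-1} + b*x_t, y_t = c*h_t over (T, D)."""
    h = np.zeros(x.shape[1])
    y = np.empty(x.shape, dtype=float)
    for t in range(x.shape[0]):
        h = a * h + b * x[t]
        y[t] = c * h
    return y


def multi_directional_scan(x):
    """Average a forward and a backward scan (assumed pair of directions)."""
    forward = ssm_scan(x)
    backward = ssm_scan(x[::-1])[::-1]  # scan reversed input, restore order
    return 0.5 * (forward + backward)


rng = np.random.default_rng(0)
patches = rng.normal(size=(16, 8))    # 16 hypothetical patch tokens, 8 features
cls_token = np.zeros(8)               # stand-in for a learned class token
pos_embed = rng.normal(size=(17, 8))  # stand-in for learned position embeddings
tokens = add_class_token_and_positions(patches, cls_token, pos_embed)
features = multi_directional_scan(tokens)
```

Scanning in more than one direction lets every token aggregate context from both earlier and later positions in the flattened image sequence, which a single causal scan cannot do.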

Results and Discussions

The GCDM-Mamba network uses the GSCA module (Fig. 3) to build attention maps from global information along the spatial dimensions (height and width) of the feature maps; these attention maps then weight the input feature maps to enhance feature representation. Experimental results show that, after the GSCA module is integrated, the model’s precision (P), recall (R), and F1 score (F₁) improve by 2.26, 2.22, and 2.23 percentage points on the UCM dataset; by 2.22, 2.23, and 2.13 percentage points on the AID dataset; and by 2.32, 2.41, and 2.43 percentage points on the NWPU-RESISC45 dataset, respectively (Table 4). By extracting channel-wise features in parallel through the DMME module and the multi-directional state space model (MDS) module (Fig. 4), the model improves feature extraction capability and computational efficiency at the same time. With the DMME module, the model’s P, R, and F₁ increase by 1.75, 1.90, and 1.94 percentage points on the UCM dataset; by 1.85, 1.91, and 1.85 percentage points on the AID dataset; and by 1.52, 1.58, and 1.58 percentage points on the NWPU-RESISC45 dataset, respectively (Table 4). Comparative experiments confirm that GCDM-Mamba achieves state-of-the-art classification performance on all three datasets, outperforming the current best model, RSMamba-H, with F₁ improvements of 1.88, 1.78, and 1.15 percentage points, respectively (Tables 1, 2, and 3).
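For readers who want to see how such figures are typically derived, the sketch below computes precision, recall, and F₁ from a confusion matrix and expresses a gain in percentage points. The abstract does not state the averaging scheme, so macro averaging over classes is an assumption here, and the function names and the small confusion matrix are hypothetical.

```python
import numpy as np


def macro_prf(confusion):
    """Macro-averaged precision, recall, F1; rows = true, columns = predicted."""
    confusion = np.asarray(confusion, dtype=float)
    tp = np.diag(confusion)
    predicted = confusion.sum(axis=0)  # column sums: samples predicted per class
    actual = confusion.sum(axis=1)     # row sums: true samples per class
    precision = np.divide(tp, predicted, out=np.zeros_like(tp), where=predicted > 0)
    recall = np.divide(tp, actual, out=np.zeros_like(tp), where=actual > 0)
    denom = precision + recall
    f1 = np.divide(2 * precision * recall, denom,
                   out=np.zeros_like(tp), where=denom > 0)
    return precision.mean(), recall.mean(), f1.mean()


def percentage_point_gain(new_score, old_score):
    """Difference between two scores in [0, 1], in percentage points."""
    return 100.0 * (new_score - old_score)


# Hypothetical two-class confusion matrix, for illustration only.
p, r, f1 = macro_prf([[8, 2], [1, 9]])
```

A percentage-point gain is an absolute difference between two scores, not a relative change: moving from 0.90 to 0.92 is a gain of 2 percentage points.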

Conclusions

To address the challenges of insufficient feature discrimination and low computational efficiency in remote sensing image classification, a novel method named GCDM-Mamba is proposed. The method first applies a GSCA module, in which feature maps are grouped and then processed by average pooling and max pooling along the height and width dimensions to construct attention maps. These maps use multi-dimensional global information to weight the input feature maps, thereby enhancing feature representation. Positional embeddings are then integrated to capture spatial information, and a class token is adopted to provide global category-related context for the entire image. Finally, DMME is introduced to further improve computational efficiency and strengthen the network’s ability to model long-range dependencies. Experimental evaluations on the UCM, AID, and NWPU-RESISC45 datasets demonstrate that the proposed GCDM-Mamba achieves superior classification performance compared with existing methods. With fewer parameters, the model effectively extracts image features and captures long-range dependencies, validating its effectiveness in remote sensing image classification.
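The grouping and directional pooling steps of the GSCA module can be sketched in NumPy as follows. This is only an illustration under stated assumptions, not the published architecture: summing the average-pooled and max-pooled descriptors, the sigmoid gating, and the per-group channel-mixing matrix `mix` (a stand-in for whatever learned layers the real module uses) are all hypothetical choices.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def gsca_sketch(x, groups=4, mix=None):
    """Reweight a (C, H, W) feature map with grouped row/column attention."""
    c, h, w = x.shape
    if c % groups != 0:
        raise ValueError("channel count must be divisible by groups")
    per_group = c // groups
    g = x.reshape(groups, per_group, h, w)
    if mix is None:
        # Identity mixing: a placeholder for learned per-group weights.
        mix = np.broadcast_to(np.eye(per_group), (groups, per_group, per_group))
    # Pool along width: one descriptor per row, shape (G, C/G, H, 1).
    row = g.mean(axis=3, keepdims=True) + g.max(axis=3, keepdims=True)
    # Pool along height: one descriptor per column, shape (G, C/G, 1, W).
    col = g.mean(axis=2, keepdims=True) + g.max(axis=2, keepdims=True)
    # Mix channels within each group only (assumed grouped transformation).
    row = np.einsum("goc,gchw->gohw", mix, row)
    col = np.einsum("goc,gchw->gohw", mix, col)
    # Row and column gates broadcast to a full (G, C/G, H, W) attention map.
    attention = sigmoid(row) * sigmoid(col)
    return (g * attention).reshape(c, h, w)


rng = np.random.default_rng(0)
feature_map = rng.normal(size=(8, 6, 5))  # hypothetical 8-channel feature map
weighted = gsca_sketch(feature_map, groups=4)
```

With the identity `mix`, grouping has no effect on the result; it only matters once each group has its own learned weights, which is where grouping would reduce the parameter count relative to mixing all channels jointly.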


Keywords

attention mechanism; deep learning; image classification; Mamba; remote sensing

Hui Chen, Zixu Li. Remote Sensing Image Classification Based on Grouped Spatial Coordinate Attention and Mamba[J]. Acta Optica Sinica, 2025, 45(15): 1528001
