Acta Optica Sinica, Vol. 45, Issue 15, 1528001 (2025)
Remote Sensing Image Classification Based on Grouped Spatial Coordinate Attention and Mamba
With the rapid advancement of remote sensing imaging technology, remote sensing image classification has become a critical research focus due to its foundational role in tasks such as agricultural management, urban planning, and disaster monitoring. However, existing methods still suffer from insufficient discriminative feature extraction, challenges in capturing global relationships and long-range dependencies, and low computational efficiency. To address these limitations, this study proposes a novel remote sensing image classification model, GCDM-Mamba, which integrates attention mechanisms and the Mamba architecture to enhance both accuracy and efficiency.
GCDM-Mamba combines attention mechanisms with the Mamba architecture. It incorporates a grouped spatial coordinate attention (GSCA) module, which uses global information along the spatial dimensions of the feature maps to generate attention maps; these maps then weight the input feature maps to strengthen feature representation. The model also adds positional encoding to capture spatial information and a class token that produces a global semantic representation of the input sequence, supplying category-level information. The proposed dual-stream multi-directional Mamba encoder (DMME) extracts features in parallel along the channel dimension and applies a multi-directional state space model (MDS) to capture spatial information in remote sensing images.
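To make the idea of multi-directional scanning concrete, the following is a minimal sketch and not the authors' implementation. It assumes four scan orders over an H x W token grid (row-major and column-major, each forward and backward); the abstract does not say which directions the MDS uses, and the names `scan_orders`, `reorder`, and `restore` are hypothetical.

```python
def scan_orders(height, width):
    """Return four assumed traversal orders over an H x W token grid.

    Each order is a list of flat indices (row * width + col).
    """
    row_major = [r * width + c for r in range(height) for c in range(width)]
    col_major = [r * width + c for c in range(width) for r in range(height)]
    return {
        "row_forward": row_major,
        "row_backward": row_major[::-1],
        "col_forward": col_major,
        "col_backward": col_major[::-1],
    }


def reorder(tokens, order):
    """Arrange tokens in the sequence a given scan direction would visit them."""
    return [tokens[i] for i in order]


def restore(scanned, order):
    """Undo reorder(): put each scanned token back at its original grid index."""
    out = [None] * len(order)
    for position, index in enumerate(order):
        out[index] = scanned[position]
    return out


# Usage: a 2 x 3 grid of patch tokens, one sequence per direction.
tokens = ["a", "b", "c", "d", "e", "f"]
sequences = {name: reorder(tokens, order)
             for name, order in scan_orders(2, 3).items()}
```

In a model of this kind, each reordered sequence would be passed through a sequence model and the outputs mapped back with `restore` before being merged; how GCDM-Mamba merges them is not described here.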
The GCDM-Mamba network uses the GSCA module (Fig. 3) to draw on global information along the spatial dimensions (height and width) of the feature maps when constructing attention maps, which then weight the input feature maps to enhance feature representation. Experimental results show that integrating the GSCA module improves the model’s precision (P), recall (R), and F1 score (F1) by 2.26, 2.22, and 2.23 percentage points on the UCM dataset; 2.22, 2.23, and 2.13 percentage points on the AID dataset; and 2.32, 2.41, and 2.43 percentage points on the NWPU-RESISC45 dataset, respectively (Table 4). By extracting channel-wise features in parallel through the DMME module and its multi-directional state space model (MDS) module (Fig. 4), the model improves both feature extraction capability and computational efficiency. With the DMME module, P, R, and F1 increase by 1.75, 1.90, and 1.94 percentage points on the UCM dataset; 1.85, 1.91, and 1.85 percentage points on the AID dataset; and 1.52, 1.58, and 1.58 percentage points on the NWPU-RESISC45 dataset, respectively (Table 4). Comparative experiments confirm that GCDM-Mamba achieves state-of-the-art classification performance on all three datasets, outperforming the previous best model, RSMamba-H, with F1 improvements of 1.88, 1.78, and 1.15 percentage points, respectively (Table 1, Table 2, Table 3).
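The gains above are stated in percentage points of P, R, and F1. As an illustration of how such figures are commonly derived from a confusion matrix, here is a minimal sketch that assumes macro averaging over classes; the abstract does not state the averaging scheme, and the function names and the toy matrix are hypothetical.

```python
def macro_prf(confusion):
    """Macro-averaged precision, recall, and F1.

    confusion[i][j] is the number of samples of true class i predicted as class j.
    """
    n = len(confusion)
    precisions, recalls, f1s = [], [], []
    for k in range(n):
        tp = confusion[k][k]
        predicted_k = sum(confusion[i][k] for i in range(n))  # column sum
        actual_k = sum(confusion[k])                          # row sum
        p = tp / predicted_k if predicted_k else 0.0
        r = tp / actual_k if actual_k else 0.0
        f1 = 2 * p * r / (p + r) if (p + r) else 0.0
        precisions.append(p)
        recalls.append(r)
        f1s.append(f1)
    return sum(precisions) / n, sum(recalls) / n, sum(f1s) / n


def percentage_point_gain(baseline, improved):
    """Difference between two scores in [0, 1], expressed in percentage points."""
    return (improved - baseline) * 100.0


# Usage with a toy two-class confusion matrix (illustrative numbers only).
p, r, f1 = macro_prf([[8, 2], [1, 9]])
```

A percentage-point gain is an absolute difference between two scores, so moving from 90% to 92% is a gain of 2 percentage points rather than a 2% relative improvement.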
To address the challenges of insufficient feature discrimination and low computational efficiency in remote sensing image classification tasks, a novel method named GCDM-Mamba is proposed. The method begins by employing a GSCA module, where feature maps are grouped and processed through average pooling and max pooling along the height and width dimensions to construct attention maps. These maps utilize multi-dimensional global information to weight the input feature maps, thereby enhancing feature representation. Subsequently, positional embeddings are integrated to capture spatial information, while a class token is adopted to provide global category-related context for the entire image. Finally, DMME is introduced to further improve computational efficiency and strengthen the network’s ability to model long-range dependencies. Experimental evaluations on the UCM, AID, and NWPU-RESISC45 datasets demonstrate that the proposed GCDM-Mamba achieves superior classification performance compared to existing methods. With reduced parameters, the model effectively extracts image features and captures long-range dependencies, validating its effectiveness in remote sensing image classification tasks.
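The GSCA steps described above (grouping, average and max pooling along height and width, then reweighting) can be sketched roughly as follows. This is a simplified NumPy illustration under stated assumptions, not the published module: it has no learned convolutions, it sums the average-pooled and max-pooled descriptors, it shares one attention map per channel group, and the function names are hypothetical.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def grouped_coordinate_attention(x, groups):
    """Reweight a (C, H, W) feature map with per-group row and column attention.

    Assumption: descriptors are pooled over each channel group as a whole,
    so every channel in a group shares the same attention map.
    """
    c, h, w = x.shape
    if c % groups != 0:
        raise ValueError("channels must be divisible by groups")
    g = x.reshape(groups, c // groups, h, w)

    # Pool over group channels and width: one descriptor per row, (groups, 1, H, 1).
    row_desc = g.mean(axis=(1, 3), keepdims=True) + g.max(axis=(1, 3), keepdims=True)
    # Pool over group channels and height: one descriptor per column, (groups, 1, 1, W).
    col_desc = g.mean(axis=(1, 2), keepdims=True) + g.max(axis=(1, 2), keepdims=True)

    # Broadcasting combines the two into a (groups, 1, H, W) attention map in (0, 1).
    attention = sigmoid(row_desc) * sigmoid(col_desc)
    return (g * attention).reshape(c, h, w)


# Usage: 8 channels split into 2 groups on a 4 x 5 feature map.
features = np.random.default_rng(0).standard_normal((8, 4, 5))
weighted = grouped_coordinate_attention(features, groups=2)
```

Because the attention values lie between 0 and 1, the output keeps the input's shape and scales each position down according to how strongly its row and column respond within the group.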
Hui Chen, Zixu Li. Remote Sensing Image Classification Based on Grouped Spatial Coordinate Attention and Mamba[J]. Acta Optica Sinica, 2025, 45(15): 1528001
Category: Remote Sensing and Sensors
Received: Apr. 17, 2025
Accepted: May 12, 2025
Published Online: Aug. 8, 2025
The Author Email: Zixu Li (1074301430@qq.com)
CSTR:32393.14.AOS250956