Spectroscopy and Spectral Analysis, Volume. 45, Issue 5, 1485(2025)
Classification of Hyperspectral Remote Sensing Images by Joint Hybrid Convolution and Cascaded Group Attention Mechanisms
The rich spectral information of hyperspectral remote sensing images can provide reliable data support for their feature classification. However, the problems of high dimensionality and redundancy of spectral data, difficulty associating spatial and spectral features, and insufficient spectral feature extraction have challenged the classification of hyperspectral remote sensing images based on deep learning. Convolutional neural network (CNN) and Vision Transformer (ViT) are two deep learning architectures widely used in computer vision, and each has unique advantages and limitations. CNN is good at capturing local features and spatial hierarchies and can deal with the invariance of the image's translation. ViT can capture global dependencies and has a better understanding of complex patterns in images. To improve the classification accuracy of hyperspectral remote sensing images and give full play to the advantages of both CNN and ViT models, this paper combines the local feature extraction capability of CNN and the global context understanding capability of ViT, and innovatively introduces the 3D Efficient ViT module into the hybrid convolution, and proposes a hyperspectral remote sensing image classification algorithm combining the hybrid convolution and cascading group attention mechanism EVIT3D_HSN: This algorithm introduces 3D Efficient ViT module based on 3D convolution to extract the joint features of hyperspectral remote sensing images and 2D convolution to extract the spatial features, which improves the generalization ability to different datasets and captures the image features of hyperspectral data in a more comprehensive way, thus enhances the performance of the classification algorithm without increasing the complexity of the model. To validate the advancement of this algorithm, this paper's algorithm EVIT3D_HSN is compared with algorithms1DCNN, 2DCNN, 3DFCN, and 3DCNN and the original algorithm HybridSN for ablation experiments on hyperspectral remote sensing imagery classification datasets India Pines, Pavia University, and Salinas. The classification results of EVIT3D_HSN on the above three datasets are 97.66%, 99.00%, and 99.65% for OA and 97.3%, 98.6%, and 99.6% for the Kappa coefficient, respectively. Compared with 1DCNN, the model classification accuracies are improved by 37.12%, 25.09%, and 33.67%, respectively; compared with 2DCNN, the accuracies are improved by 59%, 57.43%, and 46.92%, respectively; compared with 3DFCN, the accuracies are improved by 45.36%, 24.5% and 29.72%, respectively; and compared with 3DCNN, the accuracies are improved by 28.05%, 14.26% and 34.29%; and compared to HybridSN, the accuracy is improved by 3.76%, 1.85% and 2.57%, respectively. In addition, EVIT3D_HSN has the highest F1 values for a total of 37 features, except stone steel towers for the IP dataset, Painted metal sheets and Shadows for the PU dataset, and Stubble features for the SA dataset. CONCLUSION The experimental results show that EVIT3D_HSN outperforms the above five hyperspectral remote sensing image classification algorithms regarding model accuracy and generalization ability, and the model has good practical value.
Get Citation
Copy Citation Text
WANG Xiao-yan, LIANG Wen-hui, BI Chu-ran, LI Jie, WANG Xi-yu. Classification of Hyperspectral Remote Sensing Images by Joint Hybrid Convolution and Cascaded Group Attention Mechanisms[J]. Spectroscopy and Spectral Analysis, 2025, 45(5): 1485
Received: Jun. 29, 2024
Accepted: May. 21, 2025
Published Online: May. 21, 2025
The Author Email: LI Jie (lijie@bucea.edu.cn)