Classification of Hyperspectral Remote Sensing Images by Joint Hybrid Convolution and Cascaded Group Attention Mechanisms

WANG Xiao-yan; LIANG Wen-hui; BI Chu-ran; LI Jie; WANG Xi-yu

doi:10.3964/j.issn.1000-0593(2025)05-1485-09

Spectroscopy and Spectral Analysis, Volume 45, Issue 5, 1485 (2025)

WANG Xiao-yan¹, LIANG Wen-hui², BI Chu-ran¹, LI Jie³,*, and WANG Xi-yu²

Author Affiliations

¹School of Systems Science and Statistics, Beijing Wuzi University, Beijing 101149, China

²School of Information, Beijing Wuzi University, Beijing 101149, China

³School of Electromechanical and Vehicle Engineering, Beijing University of Civil Engineering and Architecture, Beijing 102616, China

Abstract

The rich spectral information in hyperspectral remote sensing images provides reliable data support for land-cover classification. However, deep-learning-based classification of these images is challenged by the high dimensionality and redundancy of spectral data, the difficulty of associating spatial and spectral features, and insufficient spectral feature extraction. Convolutional neural networks (CNNs) and Vision Transformers (ViTs) are two deep learning architectures widely used in computer vision, each with its own advantages and limitations. CNNs are good at capturing local features and spatial hierarchies and can handle translation invariance in images. ViTs capture global dependencies and model complex patterns in images more effectively. To improve the classification accuracy of hyperspectral remote sensing images and exploit the strengths of both models, this paper combines the local feature extraction capability of CNNs with the global context understanding of ViTs by introducing a 3D Efficient ViT module into hybrid convolution. The resulting algorithm, EVIT3D_HSN, joins hybrid convolution with a cascaded group attention mechanism. It introduces the 3D Efficient ViT module on top of 3D convolution to extract the joint features of hyperspectral remote sensing images, and uses 2D convolution to extract spatial features. This design improves generalization across datasets and captures the image features of hyperspectral data more comprehensively, enhancing classification performance without increasing model complexity.
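The abstract names the cascaded group attention mechanism but does not spell it out. As a rough illustration only, the sketch below follows the general idea as it is usually described for EfficientViT: the channels are split into groups, each attention head works on its own group, and each head's output is added to the next head's input. This is not the authors' 3D Efficient ViT module. The function names are hypothetical, and the learned query/key/value and output projections are replaced by identity mappings, an assumption made purely to keep the example dependency-free.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    s = sum(exps)
    return [e / s for e in exps]


def self_attention(tokens):
    """Scaled dot-product self-attention.

    Assumption: Q, K and V are the tokens themselves (identity projections);
    a real module would use learned projections here.
    """
    d = len(tokens[0])
    out = []
    for q in tokens:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in tokens]
        weights = softmax(scores)
        out.append([sum(w * v[c] for w, v in zip(weights, tokens)) for c in range(d)])
    return out


def cascaded_group_attention(tokens, num_heads):
    """tokens: list of N feature vectors, each of length C (C divisible by num_heads)."""
    channels = len(tokens[0])
    if channels % num_heads != 0:
        raise ValueError("channel count must be divisible by num_heads")
    d = channels // num_heads

    outputs = []
    prev = None
    for h in range(num_heads):
        # Each head only sees its own slice of the channels.
        x = [t[h * d:(h + 1) * d] for t in tokens]
        if prev is not None:
            # Cascade: feed the previous head's output into this head's input.
            x = [[a + b for a, b in zip(xi, pi)] for xi, pi in zip(x, prev)]
        prev = self_attention(x)
        outputs.append(prev)

    # Concatenate the head outputs back to C channels per token.
    return [[v for h in range(num_heads) for v in outputs[h][n]]
            for n in range(len(tokens))]
```

With a single token, attention reduces to the identity, so `cascaded_group_attention([[1.0, 2.0, 3.0, 4.0]], 2)` returns `[[1.0, 2.0, 4.0, 6.0]]`: the second head receives its own slice `[3.0, 4.0]` plus the first head's output `[1.0, 2.0]`, which makes the cascade visible.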
To validate the algorithm, EVIT3D_HSN is compared with 1DCNN, 2DCNN, 3DFCN, and 3DCNN, and with the original HybridSN algorithm in ablation experiments, on the hyperspectral remote sensing image classification datasets Indian Pines (IP), Pavia University (PU), and Salinas (SA). On these three datasets, EVIT3D_HSN achieves an overall accuracy (OA) of 97.66%, 99.00%, and 99.65% and a Kappa coefficient of 97.3%, 98.6%, and 99.6%, respectively. Compared with 1DCNN, classification accuracy improves by 37.12%, 25.09%, and 33.67%; compared with 2DCNN, by 59%, 57.43%, and 46.92%; compared with 3DFCN, by 45.36%, 24.5%, and 29.72%; compared with 3DCNN, by 28.05%, 14.26%, and 34.29%; and compared with HybridSN, by 3.76%, 1.85%, and 2.57%, respectively. In addition, EVIT3D_HSN has the highest F1 values for a total of 37 land-cover classes, the exceptions being Stone-Steel-Towers in the IP dataset, Painted metal sheets and Shadows in the PU dataset, and Stubble in the SA dataset. The experimental results show that EVIT3D_HSN outperforms the above five hyperspectral remote sensing image classification algorithms in accuracy and generalization ability, and that the model has good practical value.
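The results above are reported as OA, Kappa coefficient, and per-class F1. The abstract does not give the formulas, so the following is a minimal sketch of the standard definitions computed from a confusion matrix. The function name `classification_metrics` and the 2×2 example matrix are illustrative assumptions and are not taken from the paper.

```python
def classification_metrics(cm):
    """Compute OA, Cohen's kappa and per-class F1 from a confusion matrix.

    cm[i][j] is the number of samples whose true class is i and whose
    predicted class is j. Assumes at least two classes actually occur,
    otherwise kappa is undefined (division by zero).
    """
    n = len(cm)
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(n))

    # Overall accuracy: fraction of samples on the diagonal.
    oa = correct / total

    row_sums = [sum(cm[i]) for i in range(n)]                        # true counts
    col_sums = [sum(cm[i][j] for i in range(n)) for j in range(n)]   # predicted counts

    # Expected agreement by chance, then Cohen's kappa.
    pe = sum(r * c for r, c in zip(row_sums, col_sums)) / (total * total)
    kappa = (oa - pe) / (1 - pe)

    # F1 = 2*TP / (2*TP + FP + FN) = 2*TP / (row_sum + col_sum).
    f1 = []
    for k in range(n):
        denom = row_sums[k] + col_sums[k]
        f1.append(2 * cm[k][k] / denom if denom else 0.0)

    return oa, kappa, f1


# Illustrative two-class example (made-up counts).
oa, kappa, f1 = classification_metrics([[8, 2], [1, 9]])
```

For the made-up matrix, 17 of 20 samples are correct, so OA is 0.85; the chance agreement is 0.5, giving a kappa of about 0.7. Kappa is often reported as a percentage, as in the figures above.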

Keywords

3D Efficient ViT; cascaded group attention; hybrid convolution; hyperspectral remote sensing image classification

Citation

WANG Xiao-yan, LIANG Wen-hui, BI Chu-ran, LI Jie, WANG Xi-yu. Classification of Hyperspectral Remote Sensing Images by Joint Hybrid Convolution and Cascaded Group Attention Mechanisms[J]. Spectroscopy and Spectral Analysis, 2025, 45(5): 1485

Paper Information

Received: Jun. 29, 2024

Accepted: May 21, 2025

Published Online: May 21, 2025

The Author Email: LI Jie (lijie@bucea.edu.cn)

DOI:10.3964/j.issn.1000-0593(2025)05-1485-09
