Optics and Precision Engineering, Vol. 32, Issue 23, 3504 (2024)
Spectral-spatial classification of hyperspectral imagery with hybrid architecture of 3D-CNN and Transformer
To address pixel-level land cover classification in hyperspectral images (HSI), a hybrid model, 3D-ConvFormer, is proposed. The model combines 3D convolutional neural networks (3D-CNN) with a self-attention mechanism to extract spectral-spatial features. In the shallow layers, 3D convolutions capture local spectral-spatial features; in the deeper layers, self-attention operates within convolutional windows, making feature extraction more flexible. The design thereby fuses the translation invariance of convolutional networks with the adaptive feature extraction of self-attention. The model was evaluated on three publicly available hyperspectral image datasets (Indian Pines, PaviaU, and WHU-Hi-Longkou) using three metrics: Overall Accuracy (OA), Average Accuracy (AA), and the Kappa coefficient. The proposed model achieved an OA of 98.41%, AA of 97.56%, and Kappa of 98.16% on Indian Pines; an OA of 99.39%, AA of 99.30%, and Kappa of 99.18% on PaviaU; and an OA of 98.53%, AA of 98.97%, and Kappa of 98.06% on WHU-Hi-Longkou. On all three datasets, 3D-ConvFormer outperformed the baseline models in classification accuracy.
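For readers unfamiliar with the three reported metrics, the following is a minimal sketch of how OA, AA, and the Kappa coefficient are conventionally computed from a confusion matrix. It is not the authors' evaluation code. The function name `classification_metrics` and the toy labels are illustrative assumptions, and the sketch assumes integer class labels in the range 0 to `num_classes - 1`.

```python
import numpy as np


def classification_metrics(y_true, y_pred, num_classes):
    """Return (OA, AA, Kappa) as fractions in [0, 1] (Kappa may be negative).

    Hypothetical helper for illustration; labels are assumed to be
    integers in 0..num_classes-1.
    """
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)

    # Confusion matrix: rows are true classes, columns are predicted classes.
    cm = np.zeros((num_classes, num_classes), dtype=np.int64)
    np.add.at(cm, (y_true, y_pred), 1)

    total = cm.sum()
    row_sums = cm.sum(axis=1)  # number of true samples per class
    col_sums = cm.sum(axis=0)  # number of predicted samples per class

    # Overall Accuracy: fraction of all pixels classified correctly.
    oa = np.trace(cm) / total

    # Average Accuracy: mean of per-class recall, skipping classes
    # that have no true samples to avoid division by zero.
    present = row_sums > 0
    aa = (np.diag(cm)[present] / row_sums[present]).mean()

    # Kappa: agreement corrected for chance agreement p_e.
    # Undefined (division by zero) if p_e equals 1, which is not handled here.
    p_e = (row_sums * col_sums).sum() / total**2
    kappa = (oa - p_e) / (1 - p_e)

    return oa, aa, kappa


# Toy example with three classes and one misclassified sample.
oa, aa, kappa = classification_metrics(
    y_true=[0, 0, 0, 1, 1, 2],
    y_pred=[0, 0, 1, 1, 1, 2],
    num_classes=3,
)
print(f"OA={oa:.4f}  AA={aa:.4f}  Kappa={kappa:.4f}")
```

The abstract reports all three values as percentages, which corresponds to multiplying these fractions by 100. AA weights every class equally regardless of its sample count, so it can differ noticeably from OA on class-imbalanced datasets.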
Haizhao JING, Lijie TAO, Haokui ZHANG. Spectral-spatial classification of hyperspectral imagery with hybrid architecture of 3D-CNN and Transformer[J]. Optics and Precision Engineering, 2024, 32(23): 3504
Received: Sep. 30, 2024
Accepted: --
Published Online: Mar. 10, 2025
Author Email: ZHANG Haokui (hkzhang@nwpu.edu.cn)