Acta Optica Sinica, Volume 44, Issue 18, 1828007 (2024)
Self-Supervised Feature Learning Method for Hyperspectral Images Based on Mixed Convolutional Networks
Hyperspectral images record the reflectance of ground objects in hundreds of narrow spectral bands, forming a three-dimensional data cube. Accurate hyperspectral image classification reveals the detailed distribution of ground objects, making it the cornerstone of many remote sensing applications. Recently, hyperspectral images with high spatial resolution have promoted the application of hyperspectral technology to various fine-grained tasks. Since hyperspectral data are highly nonlinear, feature extraction is key to accurate classification. Learning robust spatial-spectral features in real-world complex scenes with insufficient labeled samples has been a long-standing problem. We propose a self-supervised feature learning method for hyperspectral images based on mixed convolutional networks and contrastive learning. The method makes full use of the abundant spatial-spectral information in hyperspectral images and automatically learns to extract features suited to classification in a self-supervised manner. We hope our findings aid the study of small-sample hyperspectral classification and promote the generalization and practicability of deep learning methods in complex hyperspectral scenes.
We propose a self-supervised mixed feature fusion network based on mixed convolutional networks and contrastive learning. First, the dimensionality of the hyperspectral image is reduced by a factor analysis (FA) algorithm, and the neighborhood of each pixel is extracted to form image patches. Positive and negative sample pairs are then generated through random spatial and spectral augmentation. Second, an efficient cascade feature fusion encoder is constructed from 3D convolution layers and 2D depthwise separable convolution layers. Multi-scale spatial-spectral features are extracted, and fine-grained embeddings are computed by a second-order pooling (SOP) layer. By computing the contrastive loss on the features extracted from positive and negative sample pairs, the encoder can be trained in a self-supervised manner. Finally, the trained encoder is fine-tuned with a few labeled samples to produce the classification results of hyperspectral images.
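To make the encoder design concrete, the following is a minimal PyTorch sketch of a mixed convolutional encoder in the spirit described above: 3D convolutions for joint spatial-spectral features, 2D depthwise separable convolutions for efficient spatial refinement, and a second-order (covariance) pooling layer. All layer sizes, the 30 FA components, and the 9×9 patch size are illustrative assumptions, not the paper's exact configuration.

```python
# Minimal sketch of the mixed convolutional encoder (assumed shapes/sizes).
import torch
import torch.nn as nn

class SecondOrderPooling(nn.Module):
    """Covariance (second-order) pooling over spatial positions."""
    def forward(self, x):                        # x: (B, C, H, W)
        b, c, h, w = x.shape
        x = x.reshape(b, c, h * w)
        x = x - x.mean(dim=2, keepdim=True)      # center the features
        cov = torch.bmm(x, x.transpose(1, 2)) / (h * w - 1)  # (B, C, C)
        return cov.flatten(1)                    # fine-grained embedding (B, C*C)

class MixedConvEncoder(nn.Module):
    def __init__(self, fa_bands=30, mid=16, out2d=64):
        super().__init__()
        # 3D convolutions extract joint spatial-spectral features.
        self.conv3d = nn.Sequential(
            nn.Conv3d(1, 8, kernel_size=(7, 3, 3), padding=(3, 1, 1)),
            nn.BatchNorm3d(8), nn.ReLU(),
            nn.Conv3d(8, mid, kernel_size=(5, 3, 3), padding=(2, 1, 1)),
            nn.BatchNorm3d(mid), nn.ReLU(),
        )
        # 2D depthwise separable convolutions refine spatial features cheaply.
        c2d = mid * fa_bands
        self.conv2d = nn.Sequential(
            nn.Conv2d(c2d, c2d, 3, padding=1, groups=c2d),   # depthwise
            nn.Conv2d(c2d, out2d, 1),                        # pointwise
            nn.BatchNorm2d(out2d), nn.ReLU(),
        )
        self.pool = SecondOrderPooling()

    def forward(self, x):                        # x: (B, 1, bands, H, W)
        x = self.conv3d(x)
        b, c, d, h, w = x.shape
        x = x.reshape(b, c * d, h, w)            # fold spectral dim into channels
        x = self.conv2d(x)
        return self.pool(x)                      # (B, out2d**2)

enc = MixedConvEncoder()
patch = torch.randn(4, 1, 30, 9, 9)              # 4 patches, 30 FA bands, 9x9
print(enc(patch).shape)                          # torch.Size([4, 4096])
```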
To validate the proposed method, extensive experiments are conducted on four hyperspectral datasets with distinct spatial-spectral characteristics: Indian Pines, Houston, Longkou, and Hanchuan. Indian Pines and Houston are conventional benchmark datasets for algorithm verification, while Longkou and Hanchuan are recently released datasets with extremely high spatial resolution. The comparison methods include attention-based methods, transformer-based methods, and a recently proposed contrastive learning method. With only five labeled samples per ground-object class used for fine-tuning, the overall accuracy of the proposed method reaches 79.46%, 84.32%, 92.97%, and 82.31% on the four datasets, respectively, outperforming all comparison methods (Tables 2-5). The classification maps of the four datasets also show fewer misclassifications for our method (Figs. 3-6). Targeted ablation experiments confirm the efficacy of FA, SOP, and the contrastive learning scheme designed in this paper (Table 6). Further experiments on contrastive learning settings reveal three key points: first, spatial and spectral augmentation is indispensable; second, the batch normalization (BN) layer in the projection head plays a crucial role in contrastive learning; third, full fine-tuning is better suited than linear probing to hyperspectral image classification (Table 7). Additionally, computational efficiency is considered, and the proposed method achieves a balance between classification accuracy and efficiency (Table 8).
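To make the projection-head finding concrete, here is a minimal PyTorch sketch of a projection head with the BN layer the ablation found crucial, paired with an NT-Xent-style contrastive loss over two augmented views. The hidden/output dimensions, the temperature, and the exact NT-Xent form are illustrative assumptions rather than the paper's reported configuration.

```python
# Sketch of the contrastive pretext task: BN-equipped projection head + NT-Xent loss.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ProjectionHead(nn.Module):
    def __init__(self, in_dim=4096, hidden=512, out_dim=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden),
            nn.BatchNorm1d(hidden),    # the BN layer the ablation found crucial
            nn.ReLU(),
            nn.Linear(hidden, out_dim),
        )
    def forward(self, x):
        return self.net(x)

def nt_xent(z1, z2, tau=0.1):
    """NT-Xent loss: each view's positive is the other view of the same patch;
    all other samples in the batch act as negatives."""
    z = F.normalize(torch.cat([z1, z2]), dim=1)        # (2N, D)
    sim = z @ z.t() / tau                              # cosine similarities
    n = z1.size(0)
    sim.fill_diagonal_(float('-inf'))                  # exclude self-pairs
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(0, n)])
    return F.cross_entropy(sim, targets)

# Usage: project two augmented views of the same batch and contrast them.
feats1, feats2 = torch.randn(8, 4096), torch.randn(8, 4096)
head = ProjectionHead()
loss = nt_xent(head(feats1), head(feats2))
```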
We propose a self-supervised classification framework for hyperspectral images based on mixed convolutional networks and contrastive learning, combining self-supervised pretext task design with encoder design. The abundant spatial-spectral information of hyperspectral images is systematically exploited, and features suited to classification are extracted in a self-supervised manner. First, spatial and spectral augmentation adds random perturbations to hyperspectral image patches, forming positive and negative sample pairs. Then, a mixed convolutional network-based encoder extracts multi-scale features; the network consists of a cascade feature fusion structure and an SOP layer, which together extract robust fine-grained spatial-spectral features from the perturbed sample pairs. Finally, the contrastive loss is computed on the extracted features, allowing the encoder parameters to be optimized in a self-supervised way. Experiments are carried out on four hyperspectral datasets with distinctly different spatial-spectral characteristics. The classification accuracy of the proposed method is superior to that of the comparison methods, and the ablation results confirm the effectiveness of FA, SOP, and the proposed contrastive learning scheme. In addition, the method is designed to reduce parameter redundancy and improve parameter utilization, balancing computational efficiency and classification accuracy. We explore the combination of model design and self-supervised learning, and we hope the proposed method will be applied to more hyperspectral datasets and further improved for greater generalization ability.
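As a final illustration, the sketch below shows the downstream step under the two regimes the ablation compared: full fine-tuning (favored by the experiments) versus linear probing. It reuses the MixedConvEncoder sketch above; the class count, embedding dimension, and optimizer settings are assumptions, not the paper's reported values.

```python
# Illustrative downstream fine-tuning (assumed settings; reuses MixedConvEncoder).
import torch
import torch.nn as nn

def build_classifier(encoder, n_classes, embed_dim=4096, linear_probe=False):
    # Linear probe: freeze the pretrained encoder and train only the head.
    if linear_probe:
        for p in encoder.parameters():
            p.requires_grad = False
    return nn.Sequential(encoder, nn.Linear(embed_dim, n_classes))

# Full fine-tuning (the setting the ablation favored): all parameters train.
model = build_classifier(MixedConvEncoder(), n_classes=16)  # e.g. Indian Pines
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()
# Train as usual on the few labeled patches (five per class in the experiments):
# logits = model(patch_batch); loss = criterion(logits, labels); loss.backward(); ...
```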
Fan Feng, Yongsheng Zhang, Jin Zhang, Bing Liu, Ying Yu. Self-Supervised Feature Learning Method for Hyperspectral Images Based on Mixed Convolutional Networks[J]. Acta Optica Sinica, 2024, 44(18): 1828007
Category: Remote Sensing and Sensors
Received: Nov. 10, 2023
Accepted: Feb. 2, 2024
Published Online: Sep. 11, 2024
The Author Email: Feng Fan (fengrs1991@163.com)