Acta Optica Sinica, Volume 43, Issue 12, 1228010 (2023)

Lightweight Residual Network Based on Depthwise Separable Convolution for Hyperspectral Image Classification

Rongjie Cheng1, Yun Yang1,2,*, Longwei Li1, Yanting Wang1, and Jiayu Wang1
Author Affiliations
  • 1School of Geological Engineering and Geomatics, Chang'an University, Xi'an 710054, Shaanxi, China
  • 2Key Laboratory of Disaster Mechanism and Prevention of Mine Geological Disasters, Ministry of Natural Resources, Xi'an 710054, Shaanxi, China

    Objective

    Owing to its high spectral resolution, hyperspectral remote sensing imaging can describe rich spectral features of ground objects, which is of great significance for their fine classification and recognition. In the feature extraction and classification of hyperspectral images, traditional deep learning models deepen the network structure to improve classification accuracy. However, as convolution and pooling layers are stacked, vanishing and exploding gradients appear in the model, which adversely affects classification. Although some scholars have proposed residual networks with identity connections to overcome the model degradation caused by deepening the network, such models still suffer from a large number of parameters and high time cost. To this end, a lightweight multi-scale residual network model (DSC-Res14) based on depthwise separable convolution is designed and built in this paper, which not only ensures high classification accuracy but also improves training efficiency. This study provides a new solution for further promoting intelligent information extraction from hyperspectral remote sensing images.

    Methods

    In this paper, a lightweight residual network (DSC-Res14) is proposed based on three-dimensional depthwise separable convolution, which replaces the conventional convolution used in traditional deep residual networks. This alleviates the long training time caused by the large number of parameters when such networks extract features from and classify hyperspectral images, and improves object classification performance. The input of the proposed model consists of image blocks whose dimensionality has been reduced by principal component analysis. A convolution layer is first employed for initial feature extraction, and both spectral and spatial features of the image blocks are further extracted by three residual layers, each of which contains two residual structures. Finally, a fully connected layer provides a one-dimensional feature vector to the classifier for pixel-wise classification of hyperspectral images. To reduce the number of training parameters, depthwise separable convolution is used in each residual structure of the residual layers. In the depthwise separable convolution operation, two-dimensional grouped convolution is used to extract spatial features from each channel, and one-dimensional pointwise convolution is then employed to extract spectral features. After each convolution layer, a batch normalization layer is added to keep the distribution of input features consistent, and the ReLU activation function is adopted to accelerate network convergence and alleviate gradient vanishing.
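    As an illustration of the residual structure described above, the following PyTorch sketch implements one depthwise separable residual block: a grouped (depthwise) spatial convolution per channel, a pointwise convolution that mixes the spectral channels, batch normalization and ReLU after each convolution, and an identity shortcut. The channel count, kernel size, and patch size are illustrative assumptions rather than the exact DSC-Res14 configuration, and the PCA components are treated here as 2D channels for simplicity.

import torch
import torch.nn as nn

class DepthwiseSeparableResBlock(nn.Module):
    """Sketch of one residual structure: depthwise (grouped) spatial
    convolution, then pointwise spectral convolution, each followed by
    batch normalization, with an identity shortcut around the block.
    Channel count and kernel size are illustrative assumptions."""
    def __init__(self, channels, kernel_size=3):
        super().__init__()
        # Depthwise: one spatial filter per channel (groups=channels)
        self.depthwise = nn.Conv2d(channels, channels, kernel_size,
                                   padding=kernel_size // 2,
                                   groups=channels, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        # Pointwise: 1x1 convolution mixes information across spectral channels
        self.pointwise = nn.Conv2d(channels, channels, kernel_size=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        identity = x
        out = self.relu(self.bn1(self.depthwise(x)))
        out = self.bn2(self.pointwise(out))
        return self.relu(out + identity)   # identity (shortcut) connection

# Example: a patch of 30 PCA components over an 11x11 spatial window
x = torch.randn(8, 30, 11, 11)             # (batch, bands, height, width)
block = DepthwiseSeparableResBlock(channels=30)
print(block(x).shape)                       # torch.Size([8, 30, 11, 11])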

    Results and Discussions

    To verify the classification accuracy and speed of the proposed DSC-Res14 model, it is compared with three other similar models and with Res14, a model that employs traditional 3D convolution kernels but has the same network structure as the proposed model. In terms of classification accuracy, the overall accuracy of Res14 on the two public standard datasets exceeds 99.5%, indicating the rationality of the network structure designed in this paper. For categories with a small number of samples in the Indian Pines dataset, the classification accuracy of DSC-Res14, after introducing depthwise separable convolution, decreases slightly compared with Res14 but still shows clear advantages over the other similar models. On the Pavia University dataset, all accuracy indexes of the proposed DSC-Res14 model are superior to those of Res14: the overall accuracy (OA) is 0.04% higher, the average accuracy (AA) is 0.02% higher, and the kappa coefficient is 0.04% higher, which is the best performance among the compared models. Under conditions of relatively balanced and sufficient samples, the proposed DSC-Res14 model reduces the network parameters and optimizes the network structure without sacrificing classification accuracy, and it even slightly improves the accuracy compared with the traditional 3D convolution residual network. Compared with similar models that likewise use depthwise separable convolution, the proposed model has fewer parameters, and its deep residual structure also leads to higher classification accuracy. For the three categories of the Indian Pines dataset with fewer training samples, the classification accuracy of the other models is poor, whereas the accuracy of each category under the proposed model is higher and more balanced, with an average accuracy of 99.03%, indicating its ability to deal with uneven sample distributions.
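    For reference, the accuracy indexes quoted above (OA, AA, and the kappa coefficient) follow the standard definitions computed from a confusion matrix; the NumPy sketch below uses a toy three-class matrix, not the paper's data.

import numpy as np

def classification_scores(conf):
    """Overall accuracy, average (per-class) accuracy, and the kappa
    coefficient from a confusion matrix whose rows are true classes
    and columns are predicted classes."""
    conf = np.asarray(conf, dtype=float)
    total = conf.sum()
    oa = np.trace(conf) / total                          # overall accuracy
    aa = np.mean(np.diag(conf) / conf.sum(axis=1))       # mean per-class recall
    pe = (conf.sum(axis=0) @ conf.sum(axis=1)) / total**2  # chance agreement
    kappa = (oa - pe) / (1 - pe)
    return oa, aa, kappa

# Toy 3-class confusion matrix (illustrative only)
conf = np.array([[50, 1, 0],
                 [2, 47, 1],
                 [0, 1, 48]])
print(classification_scores(conf))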

    From the comparative analysis above, the following conclusions can be drawn. The introduction of depthwise separable convolution reduces the number of convolution-layer parameters and floating point operations (FLOPs) of the proposed DSC-Res14 model to only about 1/7 of those of Res14, and its training time to about 1/3 of that of Res14, while high classification accuracy is maintained. The proposed model is thus shown to be a lightweight and efficient deep residual network.
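    The order of magnitude of this parameter reduction can be seen from the weight counts of the two convolution types: a standard k x k convolution needs k*k*C_in*C_out weights, whereas a depthwise separable one needs only k*k*C_in + C_in*C_out. The short calculation below illustrates this with assumed channel counts; the paper's exact 1/7 figure depends on its own layer configuration.

def conv_params(c_in, c_out, k):
    """Weights of a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out

def dsc_params(c_in, c_out, k):
    """Depthwise (k x k per input channel) plus pointwise (1 x 1) weights."""
    return k * k * c_in + c_in * c_out

c_in = c_out = 64   # illustrative channel counts, not taken from the paper
k = 3
standard = conv_params(c_in, c_out, k)    # 36864
separable = dsc_params(c_in, c_out, k)    # 4672
print(standard, separable, standard / separable)  # ~7.9x fewer parameters here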

    Conclusions

    In this paper, a lightweight deep residual network model based on depthwise separable convolution is proposed to address the large parameter size and long training time caused by deepening the network structure to improve the classification accuracy of hyperspectral remote sensing images. First, both spectral and spatial features of hyperspectral images whose dimensionality has been reduced by principal component analysis are extracted through a three-dimensional convolution layer of the proposed network. Then, three 3D depthwise separable convolution residual layers with different spatial scales are introduced to extract deep semantic features of the images, which reduces the number of training parameters and enhances the ability to express high-dimensional, multi-scale spatial features. Experiments on the public Indian Pines and Pavia University datasets show that the classification accuracy of the proposed model reaches 99.46% and 99.65%, respectively. Compared with similar models, the proposed model guarantees high classification accuracy while having fewer parameters, lower computational cost, shorter training time, and better robustness.
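    As a minimal sketch of the preprocessing step mentioned here, the following code reduces the spectral dimension of a hyperspectral cube with principal component analysis and cuts a spatial patch around one labelled pixel as network input. The number of retained components (30) and the patch size (11 x 11) are assumptions for illustration only, not the paper's settings.

import numpy as np
from sklearn.decomposition import PCA

def pca_reduce(cube, n_components=30):
    """Reduce the spectral dimension of an (H, W, B) hyperspectral cube
    with PCA; the number of retained components is an assumption."""
    h, w, b = cube.shape
    flat = cube.reshape(-1, b)
    reduced = PCA(n_components=n_components).fit_transform(flat)
    return reduced.reshape(h, w, n_components)

def extract_patch(cube, row, col, size=11):
    """Cut the spatial neighbourhood around one labelled pixel; the
    network classifies the centre pixel from this block."""
    half = size // 2
    padded = np.pad(cube, ((half, half), (half, half), (0, 0)), mode="reflect")
    return padded[row:row + size, col:col + size, :]

cube = np.random.rand(145, 145, 200)        # Indian Pines-sized toy cube
reduced = pca_reduce(cube, n_components=30)
patch = extract_patch(reduced, row=70, col=70, size=11)
print(patch.shape)                           # (11, 11, 30)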

    Paper Information

    Category: Remote Sensing and Sensors

    Received: Oct. 19, 2022

    Accepted: Dec. 12, 2022

    Published Online: Jun. 20, 2023

    Author Email: Yun Yang (yangyunbox@chd.edu.cn)

    DOI: 10.3788/AOS221848
