Acta Photonica Sinica, Vol. 50, Issue 11, 1128002 (2021)

Scene Classification of Optical High-resolution Remote Sensing Images Using Vision Transformer and Graph Convolutional Network

Jianan WANG¹, Yue GAO¹,*, Jun SHI², and Ziqi LIU¹
Author Affiliations
  • ¹ Space Star Technology Co., Ltd., Beijing 100095, China
  • ² School of Software, Hefei University of Technology, Hefei 230601, China
    Figures & Tables (6)
    • Figure. The pipeline of the proposed method
    • Table 1. Overall accuracies (mean ± standard deviation) of different methods with 20% and 50% training ratios on the AID dataset

      Method                                20% training ratio/%   50% training ratio/%
      CaffeNet [9]                          86.86±0.47             89.53±0.31
      VGG-VD-16 [9]                         86.59±0.29             89.64±0.36
      GoogLeNet [9]                         83.44±0.40             86.39±0.55
      Two-stream deep feature fusion [11]   94.09±0.34             95.99±0.35
      MSCP [12]                             91.52±0.21             94.42±0.17
      mSmL-Gcoding [13]                     91.69±0.36             95.61±0.28
      MDFR [14]                             90.62±0.27             93.37±0.29
      SAFF [18]                             90.25±0.29             93.83±0.28
      Scale-attention network [19]          92.53±0.33             95.72±0.27
      Proposed method                       94.52±0.25             96.80±0.17
    • Table 2. Overall accuracies (mean ± standard deviation) of different methods with 10% and 20% training ratios on the NWPU-RESISC45 dataset

      Method                                10% training ratio/%   20% training ratio/%
      CaffeNet [10]                         81.22±0.19             85.16±0.18
      VGG-VD-16 [10]                        87.15±0.45             90.36±0.18
      GoogLeNet [10]                        82.57±0.12             86.02±0.18
      Two-stream deep feature fusion [11]   85.02±0.25             87.01±0.19
      MSCP [12]                             85.33±0.17             88.93±0.14
      mSmL-Gcoding [13]                     89.34±0.48             91.64±0.26
      MDFR [14]                             83.37±0.26             86.89±0.17
      SAFF [18]                             84.38±0.19             87.86±0.14
      Scale-attention network [19]          88.92±0.29             92.25±0.18
      Proposed method                       90.50±0.26             93.31±0.15
    • Table 3. Ablation experiments of the proposed method in the AID and NWPU-RESISC45 datasets

      Method                    AID, 50% training ratio/%   NWPU-RESISC45, 20% training ratio/%
      Proposed method w/o ViT   89.90±0.50                  88.10±0.21
      Proposed method w/o GCN   94.20±0.22                  91.70±0.18
      Proposed method           96.80±0.17                  93.31±0.15
    • Table 4. Overall accuracies of different methods with 20% training ratio in the GID dataset

      Method                    OA/%
      DenseNet-121              69.10
      Proposed method w/o ViT   68.50
      Proposed method w/o GCN   70.20
      Proposed method           72.30
    • Table 5. Frames-per-second (FPS) comparison of DenseNet-121 and the proposed method on the AID, NWPU-RESISC45, and GID datasets

      Method            AID, 50% training ratio (FPS)   NWPU-RESISC45, 20% training ratio (FPS)   GID, 20% training ratio (FPS)
      DenseNet-121      63.4                            63.5                                      24.4
      Proposed method   135.6                           138.1                                     139.3
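
Tables 1 to 3 report overall accuracy (OA) as a mean ± standard deviation in percent. The sketch below illustrates one common way such figures are produced: OA is computed for each repeated run and the runs are then summarized. The function names, the number of runs, and the choice of the sample standard deviation are assumptions made for illustration; the page does not state how the authors computed these values.

```python
import statistics


def overall_accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("label lists must be non-empty and of equal length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def summarize_runs(accuracies):
    """Return (mean, std) in percent over repeated runs.

    Assumption: sample standard deviation (n - 1 denominator). A paper may
    use the population form instead (statistics.pstdev).
    """
    percentages = [100.0 * a for a in accuracies]
    return statistics.mean(percentages), statistics.stdev(percentages)


def format_oa(accuracies):
    """Format run accuracies in the 'mean±std' style used in the tables."""
    mean, std = summarize_runs(accuracies)
    return f"{mean:.2f}±{std:.2f}"


if __name__ == "__main__":
    # Hypothetical accuracies from three runs; not values from the paper.
    print(format_oa([0.90, 0.92, 0.94]))
```

The accuracies passed to `format_oa` are fractions in [0, 1], one per independent train/test split; at least two runs are needed for the standard deviation to be defined.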

    Jianan WANG, Yue GAO, Jun SHI, Ziqi LIU. Scene Classification of Optical High-resolution Remote Sensing Images Using Vision Transformer and Graph Convolutional Network[J]. Acta Photonica Sinica, 2021, 50(11): 1128002

    Paper Information

    Category: Remote Sensing and Sensors

    Received: May 25, 2021

    Accepted: Jul. 26, 2021

    Published Online: Dec. 2, 2021

    Corresponding author email: GAO Yue (bjlguniversity@163.com)

    DOI: 10.3788/gzxb20215011.1128002
