Classification Method of High-Resolution Remote Sensing Scene Image Based on Dictionary Learning and Vision Transformer

Xiaojun He; Xuan Liu; Xian Wei

doi:10.3788/LOP222166

Laser & Optoelectronics Progress, Vol. 60, Issue 14, 1410019 (2023)

Xiaojun He¹, Xuan Liu¹,²,*, and Xian Wei²

Author Affiliations

¹College of Software, Liaoning Technical University, Huludao 125105, Liaoning, China

²Quanzhou Institute of Equipment Manufacturing, Haixi Institutes, Fujian Institute of Research on the Structure of Matter, Chinese Academy of Sciences, Quanzhou 362216, Fujian, China

Figures & Tables(17)

Fig. 1. Diagram of dictionary learning

Fig. 2. Flowchart of the proposed method

Fig. 3. Batch normalization and layer normalization

Fig. 4. Schematic of multilayer perceptron

Fig. 5. Flowchart of attention module method

Fig. 6. Attention module based on dictionary learning

Fig. 7. RSSCN7 dataset

Fig. 8. NWPU-RESISC45 dataset

Fig. 9. AID dataset

Fig. 10. Rate of change of classification accuracy on Gaussian noise images

Table 1. Introduction of datasets
Dataset	Number of scene classes	Total number of images	Image size	Spatial resolution /m	Year
RSSCN7	7	2800	400×400		2015
NWPU-RESISC45	45	31500	256×256	~30-0.2	2016
AID	30	10000	600×600	~8-0.5	2017

Table 2. Laboratory environment
Laboratory environment	Environment configuration
Language	Python 3.8.6
Tool	PyCharm 11.0.11
Framework	PyTorch 1.9.1
CUDA	10.2

Table 3. Accuracy of different networks on RSSCN7 dataset
Network	Accuracy /%
AlexNet	82.230
VGG	80.833
ResNet50	89.048
TNT	84.833
ViT	89.643
Proposed network	91.406

Table 4. Accuracy of different networks on NWPU-RESISC45 dataset
Network	Accuracy /%
Fine-tuned AlexNet	85.160
Fine-tuned VGGNet-16	90.360
Fine-tuned GoogLeNet	86.020
TNT	85.031
ViT	90.255
Proposed network	91.576

Table 5. Accuracy of different networks on AID dataset
Network	Accuracy /%
CaffeNet	86.860
VGG-VD-16	86.590
ResNet152	89.130
GoogLeNet	83.440
TNT	80.450
ViT	85.514
Proposed network	89.218
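
The margin of the proposed network over the plain ViT baseline can be read directly off Tables 3 to 5. The short sketch below only redoes that subtraction; the dictionary layout and the helper name `gain_over_vit` are illustrative and do not come from the paper.

```python
# Top-1 accuracies (%) copied from Tables 3-5 of this page.
accuracy = {
    "RSSCN7": {"ViT": 89.643, "Proposed": 91.406},
    "NWPU-RESISC45": {"ViT": 90.255, "Proposed": 91.576},
    "AID": {"ViT": 85.514, "Proposed": 89.218},
}


def gain_over_vit(scores):
    """Accuracy gain of the proposed network over ViT, in percentage points."""
    return round(scores["Proposed"] - scores["ViT"], 3)


gains = {name: gain_over_vit(scores) for name, scores in accuracy.items()}

for name, gain in gains.items():
    print(f"{name}: +{gain:.3f} percentage points")
```

On these figures the gain is largest on AID (about 3.7 percentage points) and smallest on NWPU-RESISC45 (about 1.3 percentage points).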

Table 6. Parameter indicators of two methods on three datasets

Metric	RSSCN7		NWPU-RESISC45		AID	
	ViT	Proposed method	ViT	Proposed method	ViT	Proposed method
Kappa	0.900	0.916	0.934	0.947	0.883	0.909
F1 /%	86.222	90.890	88.927	90.207	84.202	87.768
Recall /%	85.986	91.142	88.984	90.286	84.147	87.662
Precision /%	86.417	91.002	89.039	90.317	84.558	88.004
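
For readers unfamiliar with the four indicators in Table 6, the following is a minimal sketch of how they are commonly derived from a confusion matrix. It assumes Cohen's kappa and macro-averaged (unweighted per-class mean) precision, recall, and F1; this page does not state which averaging the authors used, and the function name `classification_metrics` and the toy matrix are hypothetical.

```python
def classification_metrics(confusion):
    """Return Cohen's kappa and macro precision, recall, F1 as fractions.

    confusion[i][j] is the number of samples whose true class is i
    and whose predicted class is j.
    """
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    row_sums = [sum(row) for row in confusion]  # samples per true class
    col_sums = [sum(confusion[i][j] for i in range(n)) for j in range(n)]  # per predicted class

    # Kappa compares observed agreement with agreement expected by chance.
    observed = sum(confusion[k][k] for k in range(n)) / total
    expected = sum(r * c for r, c in zip(row_sums, col_sums)) / total ** 2
    kappa = (observed - expected) / (1 - expected) if expected < 1 else 0.0

    precisions, recalls, f1s = [], [], []
    for k in range(n):
        tp = confusion[k][k]
        p = tp / col_sums[k] if col_sums[k] else 0.0
        r = tp / row_sums[k] if row_sums[k] else 0.0
        f = 2 * p * r / (p + r) if (p + r) else 0.0
        precisions.append(p)
        recalls.append(r)
        f1s.append(f)

    return {
        "kappa": kappa,
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


# Toy 2-class confusion matrix (made-up numbers, not data from the paper).
toy = [[8, 2], [1, 9]]
metrics = classification_metrics(toy)
print(metrics)
```

Multiplying precision, recall, and F1 by 100 gives values on the percentage scale used in Table 6; kappa is conventionally reported as a fraction.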

Table 7. Parameters of different classification frameworks
Network	Number of parameters /10⁶
AlexNet	6
VGG	13.3
ResNet50	2.55
TNT	2.25
ViT	2.6
Proposed method	1.84

Xiaojun He, Xuan Liu, Xian Wei. Classification Method of High-Resolution Remote Sensing Scene Image Based on Dictionary Learning and Vision Transformer[J]. Laser & Optoelectronics Progress, 2023, 60(14): 1410019
