Remote Sensing Image Classification Method Based on Fusion of CNN and Transformer

Fig. 5. Scene instances of three datasets. (a) Airport; (b) beach; (c) center; (d) park; (e) mountain; (f) bridge; (g) church; (h) harbor; (i) overpass; (j) cloud; (k) river; (l) intersection; (m) aeroplane; (n) chaparral; (o) storage tank

Fig. 6. Heat map comparison with other related models. (a) Airplane; (b) center; (c) pond; (d)‒(f) EfficientNet; (g)‒(i) ResNet; (j)‒(l) Swin-T; (m)‒(o) proposed method

Fig. 7. Confusion matrix of the AID dataset at 50% training scale

Fig. 8. Confusion matrix of the NWPU-RESISC45 dataset at 20% training scale

Fig. 9. Confusion matrix of the VGoogle dataset at 20% training scale
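
The confusion matrices in Figs. 7‒9 tabulate, for each true class, how the test images were distributed over the predicted classes. A minimal Python sketch of that computation follows. The function names and the toy labels are illustrative assumptions and are not taken from the paper.

```python
def confusion_matrix(y_true, y_pred, num_classes):
    """Count matrix: rows are true classes, columns are predicted classes."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1
    return matrix


def row_normalize(matrix):
    """Convert counts to per-class rates; each diagonal entry is that class's recall."""
    normalized = []
    for row in matrix:
        total = sum(row)
        normalized.append([c / total if total else 0.0 for c in row])
    return normalized


# Toy example with 3 hypothetical classes (not data from the paper).
y_true = [0, 0, 1, 1, 2, 2, 2, 2]
y_pred = [0, 1, 1, 1, 2, 2, 0, 2]
counts = confusion_matrix(y_true, y_pred, 3)
rates = row_normalize(counts)
```

Row normalization makes classes with different numbers of test images comparable, which matters for AID, where the number of images per class varies.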

Table 1. Characteristics of the datasets

Dataset	Number of classes	Number of images per class	Total number of images	Resolution /m	Image size /pixel	Year
AID	30	220‒420	10000	0.5‒8	600×600	2017
NWPU-RESISC45	45	700	31500	0.2‒30	256×256	2016
VGoogle	38	1502‒1847	59404	0.075‒9.555	256×256	2019
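
To make the training scales used in the experiments concrete, the sketch below converts each dataset's total image count into approximate training and test set sizes. The totals and ratios come from the tables in this article. Rounding to the nearest image and splitting on the dataset total (rather than per class) are assumptions, so the counts in an actual stratified split may differ slightly.

```python
# Total image counts from Table 1; training percentages from Tables 3 and 4.
DATASETS = {
    "AID": {"total": 10000, "train_percents": (20, 50)},
    "NWPU-RESISC45": {"total": 31500, "train_percents": (10, 20)},
    "VGoogle": {"total": 59404, "train_percents": (10, 20)},
}


def split_sizes(total, train_percent):
    """Return (n_train, n_test) for a given training percentage.

    Assumption: the training count is rounded to the nearest image.
    """
    n_train = round(total * train_percent / 100)
    return n_train, total - n_train


def all_split_sizes():
    """Map (dataset, percent) to (n_train, n_test) for every listed setting."""
    return {
        (name, pct): split_sizes(info["total"], pct)
        for name, info in DATASETS.items()
        for pct in info["train_percents"]
    }


if __name__ == "__main__":
    for (name, pct), (n_train, n_test) in all_split_sizes().items():
        print(f"{name} at {pct}%: {n_train} training, {n_test} test images")
```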

Table 2. Parameter settings for model training
Parameter	Value	Parameter	Value
Epoch	100	Drop rate	0.2
Batch size	64	Optimiser	AdamW
Learning rate	0.000005	Warmup	10
Weight decay	0.0005	Random seed	42
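
The settings in Table 2 can be collected into a single configuration object, as in the minimal sketch below. The table does not state the unit of the warmup value or the shape of the schedule, so treating it as 10 epochs of linear warmup followed by a constant learning rate is an assumption. `TrainConfig` and `lr_at_epoch` are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    """Hyperparameters as listed in Table 2."""
    epochs: int = 100
    batch_size: int = 64
    learning_rate: float = 5e-6   # 0.000005
    weight_decay: float = 5e-4    # 0.0005
    drop_rate: float = 0.2
    optimizer: str = "AdamW"
    warmup_epochs: int = 10       # assumption: the unit is epochs
    seed: int = 42


def lr_at_epoch(cfg, epoch):
    """Learning rate for a zero-based epoch index.

    Assumption: linear warmup to the base rate, then a constant rate.
    The paper's actual schedule may differ.
    """
    if epoch < cfg.warmup_epochs:
        return cfg.learning_rate * (epoch + 1) / cfg.warmup_epochs
    return cfg.learning_rate


cfg = TrainConfig()
schedule = [lr_at_epoch(cfg, e) for e in range(cfg.epochs)]
```

In PyTorch, for example, the learning rate and weight decay would be passed as `lr` and `weight_decay` to `torch.optim.AdamW`.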

Table 3. Accuracy of different models on three datasets

Method	Number of parameters /10⁶	AID, 20% training data	AID, 50% training data	NWPU-RESISC45, 10% training data	NWPU-RESISC45, 20% training data	VGoogle, 10% training data	VGoogle, 20% training data
VGG-16	134.4	86.59±0.29	89.64±0.36	76.47±0.18	79.79±0.65	72.41±0.22	76.74±0.16
GoogLeNet	54.4	83.44±0.40	86.39±0.55	76.19±0.38	78.48±0.26	77.33±0.57	86.79±0.47
EfficientNet-B0	4.1	83.69±0.11	86.17±0.16	79.96±0.27	82.89±0.16	78.30±0.26	88.38±0.29
ResNet-50	23.6	92.39±0.15	94.96±0.19	86.23±0.41	88.93±0.12	88.02±0.15	92.99±0.10
LGRIN	4.6	94.74±0.23	97.65±0.25	91.91±0.15	94.43±0.16
ViT-Base	85.8	91.16±0.41	94.44±0.28	87.59±0.21	90.87±0.17	86.22±0.33	91.42±0.17
PVT-Medium	43.3	92.84±0.19	95.93±0.17	90.51±0.13	92.66±0.14	86.60±0.14	92.32±0.22
Swin-Base	86.8	94.86±0.22	97.80±0.15	91.80±0.16	94.04±0.11	88.48±0.12	93.19±0.13
TRS	46.3	95.54±0.18	98.48±0.06	93.06±0.11	95.56±0.20
TSTNet	173.0	97.20±0.22	98.70±0.12	94.08±0.24	95.70±0.10
Proposed method	20.4	97.81±0.08	98.95±0.06	94.82±0.04	96.00±0.07	91.27±0.02	95.01±0.14
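
The cells of Table 3 have the form "value±spread", which is assumed here to be a mean accuracy and its variation over repeated runs. The sketch below parses two such cells and compares the proposed method with TSTNet on AID at the 20% training scale. The numbers are copied from the table; `parse_cell`, `compare`, and the dictionary keys are hypothetical names used only for illustration.

```python
def parse_cell(cell):
    """Split a 'mean±spread' table cell into two floats."""
    mean, spread = cell.split("±")
    return float(mean), float(spread)


# Values copied from Table 3 (AID, 20% training data).
ROWS = {
    "TSTNet": {"params_m": 173.0, "aid_20": "97.20±0.22"},
    "Proposed method": {"params_m": 20.4, "aid_20": "97.81±0.08"},
}


def compare(baseline, candidate, column="aid_20"):
    """Accuracy gain (percentage points) and parameter-count ratio."""
    base_mean, _ = parse_cell(ROWS[baseline][column])
    cand_mean, _ = parse_cell(ROWS[candidate][column])
    ratio = ROWS[baseline]["params_m"] / ROWS[candidate]["params_m"]
    return {
        "accuracy_gain": round(cand_mean - base_mean, 2),
        "param_ratio": round(ratio, 1),
    }


result = compare("TSTNet", "Proposed method")
```

For this pair, the proposed method is 0.61 percentage points more accurate while using roughly 8.5 times fewer parameters.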

Table 4. Accuracy of ablation experiments

Method	AID, 20% training data	AID, 50% training data	NWPU-RESISC45, 10% training data	NWPU-RESISC45, 20% training data	VGoogle, 10% training data	VGoogle, 20% training data
Without CA+Transformer	85.17±0.57	92.97±0.09	77.81±0.16	86.55±0.13	86.32±0.16	92.83±0.10
With CA	86.57±0.85	93.42±0.09	79.25±0.25	87.24±0.09	86.91±0.20	92.79±0.20
With Transformer	77.79±0.17	90.45±0.32	72.06±0.32	83.88±0.27	76.97±0.30	86.65±0.22
With CA+Transformer	97.81±0.08	98.95±0.06	94.82±0.04	96.00±0.07	91.27±0.02	95.01±0.14

Chuan Jin, Changqing Tong. Remote Sensing Image Classification Method Based on Fusion of CNN and Transformer[J]. Laser & Optoelectronics Progress, 2023, 60(20): 2028006
