Target Detection Model Based on Once Bidirectional Feature Pyramid Network

Block	Layer	Operation	Specific operational detail	Output feature size
Block 1	Conv1_1	Conv, Act	$k = 3$, $p = 1$; ReLU	$300 \times 300 \times 64$
Block 1	Conv1_2	Conv, Act	$k = 3$, $p = 1$; ReLU	$300 \times 300 \times 64$
Block 2	Pooling1	MaxPooling	$k = 2$, $s = 2$	$150 \times 150 \times 64$
Block 2	Conv2_1	Conv, Act	$k = 3$, $p = 1$; ReLU	$150 \times 150 \times 128$
Block 2	Conv2_2	Conv, Act	$k = 3$, $p = 1$; ReLU	$150 \times 150 \times 128$
Block 3	Pooling2	MaxPooling	$k = 2$, $s = 2$	$75 \times 75 \times 128$
Block 3	Conv3_1	Conv, Act	$k = 3$, $p = 1$; ReLU	$75 \times 75 \times 256$
Block 3	Conv3_2	Conv, Act	$k = 3$, $p = 1$; ReLU	$75 \times 75 \times 256$
Block 3	Conv3_3	Conv, Act	$k = 3$, $p = 1$; ReLU	$75 \times 75 \times 256$
Block 4	Pooling3	MaxPooling	$k = 2$, $s = 2$	$38 \times 38 \times 256$
Block 4	Conv4_1	Conv, Act	$k = 3$, $p = 1$; ReLU	$38 \times 38 \times 512$
Block 4	Conv4_2	Conv, Act	$k = 3$, $p = 1$; ReLU	$38 \times 38 \times 512$
Block 4	Conv4_3	Conv, Act	$k = 3$, $p = 1$; ReLU	$38 \times 38 \times 512$
Block 5	Pooling4	MaxPooling	$k = 2$, $s = 2$	$19 \times 19 \times 512$
Block 5	Conv5_1	Conv, Act	$k = 3$, $p = 1$; ReLU	$19 \times 19 \times 512$
Block 5	Conv5_2	Conv, Act	$k = 3$, $p = 1$; ReLU	$19 \times 19 \times 512$
Block 5	Conv5_3	Conv, Act	$k = 3$, $p = 1$; ReLU	$19 \times 19 \times 512$
Block 6	Pooling5	MaxPooling	$k = 3$, $s = 1$, $p = 1$	$19 \times 19 \times 512$
Block 6	Conv6	Conv, Act	$k = 3$, $p = 6$, $d = 6$; ReLU	$19 \times 19 \times 1024$
Block 6	Conv7	Conv, Act	$k = 1$; ReLU	$19 \times 19 \times 1024$
Block 7	Conv8_1	Conv, Act	$k = 1$; ReLU	$19 \times 19 \times 256$
Block 7	Conv8_2	Conv, Act	$k = 3$, $s = 2$, $p = 1$; ReLU	$10 \times 10 \times 512$
Block 8	Conv9_1	Conv, Act	$k = 1$; ReLU	$10 \times 10 \times 128$
Block 8	Conv9_2	Conv, Act	$k = 3$, $s = 2$, $p = 1$; ReLU	$5 \times 5 \times 256$
Block 9	Conv10_1	Conv, Act	$k = 1$; ReLU	$5 \times 5 \times 128$
Block 9	Conv10_2	Conv, Act	$k = 3$, $p = 0$; ReLU	$3 \times 3 \times 256$
Block 10	Conv11_1	Conv, Act	$k = 1$; ReLU	$3 \times 3 \times 128$
Block 10	Conv11_2	Conv, Act	$k = 3$, $p = 0$; ReLU	$1 \times 1 \times 256$
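
The output sizes in the table above follow from the usual convolution and pooling size arithmetic, $n_{out} = \lfloor (n + 2p - d(k - 1) - 1) / s \rfloor + 1$. The following is a minimal sketch that traces a $300 \times 300$ input through the size-changing layers. It rests on three assumptions that are not stated explicitly in the source: Pooling3 rounds up (ceil mode) so that 75 maps to 38, Pooling5 uses $k = 3$, and Conv10_2 and Conv11_2 use $p = 0$. These are the values found in common SSD300 implementations and the ones that reproduce the listed sizes; `out_size`, `LAYERS`, and `trace` are illustrative names.

```python
import math

def out_size(n, k, s=1, p=0, d=1, ceil_mode=False):
    """Output length along one spatial axis for a conv or pooling layer."""
    steps = (n + 2 * p - d * (k - 1) - 1) / s
    return (math.ceil(steps) if ceil_mode else math.floor(steps)) + 1

# (layer, k, s, p, d, ceil_mode); layers that share a configuration are listed once.
LAYERS = [
    ("Conv1_1", 3, 1, 1, 1, False),   # 3x3 conv with p=1 keeps the size
    ("Pooling1", 2, 2, 0, 1, False),
    ("Pooling2", 2, 2, 0, 1, False),
    ("Pooling3", 2, 2, 0, 1, True),   # assumption: ceil mode, 75 -> 38
    ("Pooling4", 2, 2, 0, 1, False),
    ("Pooling5", 3, 1, 1, 1, False),  # assumption: k=3 keeps 19 x 19
    ("Conv6", 3, 1, 6, 6, False),     # dilated conv, d=6 and p=6 keep the size
    ("Conv8_2", 3, 2, 1, 1, False),
    ("Conv9_2", 3, 2, 1, 1, False),
    ("Conv10_2", 3, 1, 0, 1, False),  # assumption: p=0, 5 -> 3
    ("Conv11_2", 3, 1, 0, 1, False),  # assumption: p=0, 3 -> 1
]

def trace(n=300):
    """Return the spatial size after each listed layer."""
    sizes = {}
    for name, k, s, p, d, ceil_mode in LAYERS:
        n = out_size(n, k, s, p, d, ceil_mode)
        sizes[name] = n
    return sizes

sizes = trace()
for name, n in sizes.items():
    print(f"{name}: {n} x {n}")
```

With floor rounding, Pooling3 would give 37 rather than 38, which is why the rounding mode matters for matching the Conv4_3 feature map size.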

Table 2. Number of prior boxes per grid cell on each effective feature layer
Effective feature layer	Size	Number of prior boxes per grid cell
Conv4_3	$38 \times 38$	4
Conv7	$19 \times 19$	6
Conv8_2	$10 \times 10$	6
Conv9_2	$5 \times 5$	6
Conv10_2	$3 \times 3$	4
Conv11_2	$1 \times 1$	4
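
The total number of prior boxes (default anchor boxes) per image follows directly from Table 2: each layer contributes its grid area times its boxes per cell. A short sketch of that arithmetic, with `FEATURE_LAYERS` as an illustrative name:

```python
# (layer, grid side length, prior boxes per grid cell), taken from Table 2
FEATURE_LAYERS = [
    ("Conv4_3", 38, 4),
    ("Conv7", 19, 6),
    ("Conv8_2", 10, 6),
    ("Conv9_2", 5, 6),
    ("Conv10_2", 3, 4),
    ("Conv11_2", 1, 4),
]

# Boxes contributed by each layer: grid cells times boxes per cell
per_layer = {name: side * side * boxes for name, side, boxes in FEATURE_LAYERS}
total_boxes = sum(per_layer.values())

for name, count in per_layer.items():
    print(f"{name}: {count}")
print("total:", total_boxes)  # 8732
```

Conv4_3 alone accounts for 5776 of the 8732 boxes, so most candidate boxes come from the highest-resolution layer.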

Table 3. Training strategies
Stage	Optimizer	Batch_size	Freeze_train	Initial_Lr	Lr_scheduler	Epoch
1	Adam	32	True	0.0005	ReduceLROnPlateau	50
1	Adam	16	False	0.0001	ReduceLROnPlateau	150
2	SGD-M	32	True	0.001	MultiStepLR	50
2	SGD-M	16	False	0.001	MultiStepLR	50
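
One way to keep the schedule in Table 3 in a training script is as plain configuration data. The sketch below is only an illustration: the `Phase` class and `epochs_per_stage` helper are hypothetical names, the reading of Freeze_train as "part of the network is frozen" is an assumption, and rows without a stage number are assumed to belong to the stage listed above them.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Phase:
    stage: int
    optimizer: str
    batch_size: int
    freeze_train: bool   # assumption: True means part of the network is frozen
    initial_lr: float
    lr_scheduler: str
    epochs: int

# Rows of Table 3, in order
SCHEDULE = [
    Phase(1, "Adam", 32, True, 5e-4, "ReduceLROnPlateau", 50),
    Phase(1, "Adam", 16, False, 1e-4, "ReduceLROnPlateau", 150),
    Phase(2, "SGD-M", 32, True, 1e-3, "MultiStepLR", 50),
    Phase(2, "SGD-M", 16, False, 1e-3, "MultiStepLR", 50),
]

def epochs_per_stage(schedule):
    """Sum the epochs of all phases that belong to the same stage."""
    totals = {}
    for phase in schedule:
        totals[phase.stage] = totals.get(phase.stage, 0) + phase.epochs
    return totals

stage_epochs = epochs_per_stage(SCHEDULE)
print(stage_epochs)                # {1: 200, 2: 100}
print(sum(stage_epochs.values()))  # 300
```

Under these assumptions, each stage starts with a frozen phase at batch size 32 and continues with an unfrozen phase at batch size 16.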

Table 4. Comparison results of detection accuracy and detection speed on PASCAL VOC2007 test set

Method	Dataset	Backbone	Input size	FPS	mAP /%
Faster R-CNN [4]	VOC07+12	VGG16	$600 \times 1000$	7	73.2
SSD (baseline) [10]	VOC07+12	VGG16	$300 \times 300$	59	74.3
SSD* [10]	VOC07+12	VGG16	$300 \times 300$	52.6	76.9
DSSD [11]	VOC07+12	ResNet-101	$321 \times 321$	13.6	78.6
DSOD [29]	VOC07+12	DS/64-192-48-1	$300 \times 300$	17.4	77.7
RSSD [12]	VOC07+12	VGG16	$300 \times 300$	35	78.5
FSSD [30]	VOC07+12	VGG16	$300 \times 300$	65.8	78.8
ESSD [31]	VOC07+12	VGG16	$300 \times 300$	25	79.4
FASSD [32]	VOC07+12	ResNet-50	$300 \times 300$	30	78.1
DFSSD [33]	VOC07+12	DenseNet-S-32-1	$300 \times 300$	11.6	78.9
FDSSD [17]	VOC07+12	VGG16	$300 \times 300$	12.6	79.1
OBSSD	VOC07+12	VGG16	$300 \times 300$	41.7	80.8

Table 5. Comparison of average precision results of 20 categories in PASCAL VOC2007 test set

Method	mAP /%	aero	bicycle	bird	boat	bottle	bus	car	cat	chair	cow
Faster R-CNN [4]	73.2	76.5	79.0	70.9	65.5	52.1	83.1	84.7	86.4	52.0	81.9
SSD (baseline) [10]	74.3	75.5	80.2	72.3	66.3	47.6	83.0	84.2	86.1	54.7	78.3
SSD* [10]	76.9	76.9	86.6	74.5	66.4	50.4	85.0	84.7	87.3	61.0	78.7
DSSD [11]	78.6	81.9	84.9	80.5	68.4	53.9	85.6	86.2	88.9	61.1	83.5
ESSD [31]	79.4	82.6	86.1	79.8	72.2	54.7	86.8	86.9	88.2	62.8	85.2
OBSSD	80.8	82.7	89.7	81.5	71.8	53.7	90.7	90.0	90.6	64.8	86.2
Method	mAP /%	table	dog	horse	mbike	person	plant	sheep	sofa	train	tv
Faster R-CNN [4]	73.2	65.7	84.8	84.6	77.5	76.7	38.8	73.6	73.9	83.0	72.6
SSD (baseline) [10]	74.3	73.9	84.5	85.3	82.6	76.2	48.6	73.9	76.0	83.4	74.0
SSD* [10]	76.9	78.2	86.1	89.4	86.0	79.8	48.5	76.1	80.3	86.9	76.1
DSSD [11]	78.6	78.7	86.7	88.7	86.7	79.7	51.7	78.0	80.9	87.2	79.4
ESSD [31]	79.4	78.2	87.5	88.0	87.0	80.0	56.1	80.2	80.4	88.7	78.1
OBSSD	80.8	77.3	87.9	90.0	88.1	82.0	54.2	80.5	83.1	90.2	80.0
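
The mAP column in Table 5 is the unweighted mean of the 20 per-class average precision values. As a quick consistency check, the sketch below recomputes it for the OBSSD row; `OBSSD_AP` is an illustrative name and the values are copied from the table.

```python
import math

# Per-class AP (%) for OBSSD on the PASCAL VOC2007 test set, from Table 5
OBSSD_AP = {
    "aero": 82.7, "bicycle": 89.7, "bird": 81.5, "boat": 71.8, "bottle": 53.7,
    "bus": 90.7, "car": 90.0, "cat": 90.6, "chair": 64.8, "cow": 86.2,
    "table": 77.3, "dog": 87.9, "horse": 90.0, "mbike": 88.1, "person": 82.0,
    "plant": 54.2, "sheep": 80.5, "sofa": 83.1, "train": 90.2, "tv": 80.0,
}

# mAP is the plain average over classes
mean_ap = math.fsum(OBSSD_AP.values()) / len(OBSSD_AP)
best_class = max(OBSSD_AP, key=OBSSD_AP.get)
worst_class = min(OBSSD_AP, key=OBSSD_AP.get)

print(f"mAP: {mean_ap:.2f}")
print("highest AP:", best_class, OBSSD_AP[best_class])
print("lowest AP:", worst_class, OBSSD_AP[worst_class])
```

The mean comes out at about 80.75, consistent with the reported 80.8 after rounding to one decimal place. The highest AP is for bus (90.7) and the lowest for bottle (53.7).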

Table 6. Results of ablation experiment
Model	mAP@0.3 /%	mAP@0.5 /%	Size /MB	FPS
SSD [10]	-	74.3	25.1	59
SSD* [10]	80.8	76.9	25.1	52.6
PMSSD*	82.9	78.2	25.6	48.2
OBMSSD*	84.2	80.1	25.8	44.3
OBSSD*	85.2	80.8	27.4	41.7
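
To read the ablation in Table 6 as incremental contributions, the change between consecutive rows can be computed directly. A minimal sketch using the mAP@0.5 and FPS columns, with `ABLATION` and `steps` as illustrative names:

```python
# (model, mAP@0.5 in %, FPS) for the starred rows of Table 6, in table order
ABLATION = [
    ("SSD*", 76.9, 52.6),
    ("PMSSD*", 78.2, 48.2),
    ("OBMSSD*", 80.1, 44.3),
    ("OBSSD*", 80.8, 41.7),
]

# Change of each row relative to the row directly above it
steps = [
    (name, round(m_ap - prev_map, 1), round(fps - prev_fps, 1))
    for (_, prev_map, prev_fps), (name, m_ap, fps) in zip(ABLATION, ABLATION[1:])
]

for name, d_map, d_fps in steps:
    print(f"{name}: mAP@0.5 {d_map:+.1f}, FPS {d_fps:+.1f}")

# Overall change from SSD* to OBSSD*
total_map_gain = round(ABLATION[-1][1] - ABLATION[0][1], 1)
total_fps_drop = round(ABLATION[0][2] - ABLATION[-1][2], 1)
print(total_map_gain, total_fps_drop)  # 3.9 10.9
```

Each step trades some speed for accuracy: +1.3, +1.9, and +0.7 points of mAP@0.5 against 4.4, 3.9, and 2.6 FPS respectively.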

Yunchuan Zhang, Lin Jiang, Li Lin. Target Detection Model Based on Once Bidirectional Feature Pyramid Network[J]. Laser & Optoelectronics Progress, 2023, 60(2): 0215005
