Audio object detection network with multimodal cross level feature knowledge transfer

Model	Teacher modality		mAP (higher is better)			Center distance (lower is better)
Model	RGB	Depth	mAP@Avg	mAP@0.5	mAP@0.75	CDx	CDy
StereoSoundNet^[6]	√	-	44.05	62.38	41.46	3.00	2.24
Baseline^[7]	√	-	51.45	69.22	49.07	2.97	1.72
	-	√	40.28	54.09	38.45	6.08	3.28
	√	√	51.91	75.92	47.13	2.07	1.11
Ours	√	-	57.57	77.02	55.85	2.29	1.31
	-	√	48.04	63.53	46.40	4.80	2.67
	√	√	62.23	82.63	61.49	1.95	1.05
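The mAP@0.5 and mAP@0.75 columns count a predicted box as correct when its intersection over union (IoU) with the ground-truth box reaches 0.5 or 0.75. mAP@Avg is presumably averaged over several IoU thresholds, and CDx and CDy are read here as the mean horizontal and vertical offsets between predicted and ground-truth box centers. This section does not define the metrics, so the Python sketch below is only an illustration under those assumptions, with boxes given as (x1, y1, x2, y2) pixel coordinates and hypothetical helper names `iou` and `center_distance`.

```python
from typing import List, Tuple

# Hypothetical helpers: assumes axis-aligned (x1, y1, x2, y2) boxes and an
# already known one-to-one matching between predictions and ground truths.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)  # zero if boxes do not overlap
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def center_distance(preds: List[Box], gts: List[Box]) -> Tuple[float, float]:
    """Mean absolute x and y offsets between matched box centers (assumed CDx, CDy)."""
    assert preds and len(preds) == len(gts), "expects one prediction per ground truth"
    dx = dy = 0.0
    for p, g in zip(preds, gts):
        dx += abs((p[0] + p[2]) / 2 - (g[0] + g[2]) / 2)
        dy += abs((p[1] + p[3]) / 2 - (g[1] + g[3]) / 2)
    n = len(preds)
    return dx / n, dy / n
```

For example, `iou((0, 0, 10, 10), (5, 0, 15, 10))` gives 50 / 150 ≈ 0.33, so that prediction would miss at both the 0.5 and 0.75 thresholds, while `center_distance([(0, 0, 10, 10)], [(2, 1, 12, 11)])` returns `(2.0, 1.0)`.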

Table 2. Inference speed of the proposed method compared with classical object detection networks

Model	Speed (FPS)	Model	Speed (FPS)
Faster R-CNN VGG16	18.41	Yolov3-m	94.81
Faster R-CNN ResNet	13.15	Yolov3-l	66.89
Yolov5-x (EfficientNet-B2)	43.82	Yolov5-s	118.17
SSD300 (EfficientNet-B2)	44.41	Yolov5-m	93.20
SSD300	121.39	Yolov5-l	67.04
SSD500	84.16	Yolov5-x	48.33
Yolov3-s	96.90	Ours	49.91
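FPS is the reciprocal of per-frame latency, so the proposed network's 49.91 FPS corresponds to roughly 1000 / 49.91 ≈ 20.0 ms per frame, versus about 8.5 ms for Yolov5-s at 118.17 FPS. The Python sketch below shows one generic way to measure throughput for any inference callable; the hypothetical `measure_fps` helper, its warm-up count and the dummy workload are assumptions, not the authors' benchmarking protocol.

```python
import time
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def measure_fps(infer: Callable[[T], object], frames: Sequence[T], warmup: int = 5) -> float:
    """Return frames per second of `infer` over `frames` (hypothetical helper).

    The first `warmup` calls are excluded from timing so that one-off costs
    such as lazy initialization or caching do not distort the result.
    """
    if len(frames) <= warmup:
        raise ValueError("need more frames than warm-up iterations")
    for frame in frames[:warmup]:
        infer(frame)  # untimed warm-up calls
    timed = frames[warmup:]
    start = time.perf_counter()
    for frame in timed:
        infer(frame)
    elapsed = time.perf_counter() - start
    return len(timed) / elapsed


def latency_ms(fps: float) -> float:
    """Per-frame latency in milliseconds implied by a throughput in FPS."""
    return 1000.0 / fps
```

For a GPU model, the device would also need to be synchronized (for example with `torch.cuda.synchronize()`) before each clock read; otherwise asynchronous kernel launches make the measured FPS too high.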

Table 3. Ablation study of the two losses
Model	Loss		mAP			Center distance
Model	MCFT Loss	LD Loss	mAP@Avg	mAP@0.5	mAP@0.75	CDx	CDy
M1	-	-	52.68	72.05	50.04	2.69	1.51
M2	-	√	55.96	76.68	54.87	2.51	1.41
M3	√	-	62.39	82.23	61.38	1.98	1.08
M4	√	√	62.23	82.63	61.49	1.95	1.05

Table 4. Ablation study of the hyperparameters $\delta$ and $\beta$ in the loss function

Hyperparameter		mAP			Center distance
$\delta$	$\beta$	mAP@Avg	mAP@0.5	mAP@0.75	CDx	CDy
1.0	0.003	52.88	72.85	50.77	2.65	1.57
1.0	0.005	62.39	82.23	61.38	1.98	1.08
1.0	0.008	53.86	72.49	51.74	2.81	1.61
1.0	0.01	50.43	69.12	48.44	3.11	1.80
1.0	0.03	51.55	69.56	49.82	3.06	1.75
1.0	0.05	59.29	78.97	57.52	2.25	1.25
1.0	1.0	49.97	67.24	47.87	3.22	1.82

Table 5. Ablation study of the hyperparameters $\delta$, $\beta$ and $\lambda$ in the loss function

Hyperparameter			mAP			Center distance
$\delta$	$\beta$	$\lambda$	mAP@Avg	mAP@0.5	mAP@0.75	CDx	CDy
1.0	0.005	0.005	50.13	66.40	48.27	3.52	2.06
1.0	0.005	0.06	51.17	70.89	48.76	2.87	1.65
1.0	0.005	0.01	52.22	71.67	49.80	2.86	1.64
1.0	0.005	0.25	62.23	82.63	61.49	1.95	1.05
1.0	0.005	0.3	50.95	70.24	48.97	2.92	1.71
1.0	0.005	1.0	55.79	78.13	53.40	2.28	1.29
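Tables 3 to 5 can be read against each other: the best row of Table 4 ($\delta = 1.0$, $\beta = 0.005$) reproduces M3 of Table 3 (MCFT loss only, 62.39 mAP@Avg), and the best row of Table 5 ($\lambda = 0.25$) reproduces M4 (both losses, 62.23 mAP@Avg). This suggests, as an assumption rather than the paper's stated formula, that $\beta$ weights the MCFT loss and $\lambda$ weights the LD loss, with $\delta$ taken here to weight a base detection loss $\mathcal{L}_{\text{det}}$ (a placeholder symbol), in a weighted sum of the form:

$$
\mathcal{L}_{\text{total}} = \delta\,\mathcal{L}_{\text{det}} + \beta\,\mathcal{L}_{\text{MCFT}} + \lambda\,\mathcal{L}_{\text{LD}},
\qquad \delta = 1.0,\quad \beta = 0.005,\quad \lambda = 0.25 .
$$

Under this reading the results are sharply sensitive to the weights: mAP@Avg falls from 62.39 at $\beta = 0.005$ to 53.86 at $\beta = 0.008$, and from 62.23 at $\lambda = 0.25$ to 50.95 at $\lambda = 0.3$.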

Table 6. Ablation study of different fusion methods and loss computation methods

Method			mAP			Center distance
Cross-level	Fusion method	Loss computation	mAP@Avg	mAP@0.5	mAP@0.75	CDx	CDy
-	-	KL	51.91	75.92	47.13	2.07	1.11
-	-	L2	52.68	72.05	50.04	2.69	1.51
√	-	KL	58.36	78.02	56.97	2.31	1.27
√	-	L2	56.13	75.33	54.74	2.67	1.48
√	Pairwise fusion	KL	62.15	81.84	61.13	2.04	1.13
√	Pairwise fusion	L2	58.68	77.97	56.44	2.28	1.28
√	Stacked fusion	KL	62.39	82.23	61.38	1.98	1.08
√	Stacked fusion	L2	61.74	80.54	60.45	2.05	1.11
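Table 6 compares two ways of measuring the discrepancy between teacher and student features during knowledge transfer. The Python sketch below shows generic textbook formulations of both, assuming features are flattened to plain vectors: KL divergence between temperature-softened softmax distributions with the teacher as the target, and L2 as mean squared error. The temperature `tau`, the `tau ** 2` scaling and the function names are common knowledge-distillation conventions chosen for illustration, not details confirmed by the paper.

```python
import math
from typing import List


def log_softmax(x: List[float], tau: float = 1.0) -> List[float]:
    """Temperature-softened log-softmax; subtracting the max avoids overflow."""
    m = max(x)
    shifted = [(v - m) / tau for v in x]
    log_norm = math.log(sum(math.exp(s) for s in shifted))
    return [s - log_norm for s in shifted]


def kl_distill_loss(student: List[float], teacher: List[float], tau: float = 1.0) -> float:
    """KL(teacher || student) between softened distributions, scaled by tau**2."""
    log_p = log_softmax(teacher, tau)  # teacher distribution acts as the target
    log_q = log_softmax(student, tau)
    return tau ** 2 * sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))


def l2_distill_loss(student: List[float], teacher: List[float]) -> float:
    """Mean squared error between raw student and teacher feature values."""
    return sum((s - t) ** 2 for s, t in zip(student, teacher)) / len(student)
```

One design difference worth keeping in mind: adding a constant to every student feature leaves the KL form unchanged (the softmax is shift-invariant), whereas the L2 form penalizes it, so the two losses constrain the student in different ways.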

Shibei LIU, Ying CHEN. Audio object detection network with multimodal cross level feature knowledge transfer[J]. Optics and Precision Engineering, 2024, 32(2): 237

Paper Information

Received: Jun. 8, 2023

Accepted: --

Published Online: Apr. 2, 2024

Author email: Ying CHEN (chenying@jiangnan.edu.cn)

DOI:10.37188/OPE.20243202.0237

Table 1. Comparison of the proposed method and the baseline network under different teacher modalities
