Audio object detection network with multimodal cross level feature knowledge transfer

Shibei LIU; Ying CHEN

doi:10.37188/OPE.20243202.0237

Optics and Precision Engineering, Vol. 32, Issue 2, p. 237 (2024)

Key Laboratory of Advanced Process Control for Light Industry (Ministry of Education), Jiangnan University, Wuxi 214122, China

Abstract (full text in Chinese)

As one of the inherent properties of objects, sound can provide valuable information for object detection. At present, methods that localize targets solely by monitoring environmental sound are not robust. To address this problem, a multimodal self-supervised object detection network based on cross-level feature knowledge transfer is proposed. First, because the student network has limited ability to learn the teacher network's features at the same level, a fusion-based cross-level knowledge transfer loss is designed: the student's deep and shallow features are fused through attention so that the student learns the teacher's corresponding intermediate-layer features more effectively and extracts more knowledge, and KL divergence is combined with this loss to align the teacher and student networks. In addition, to compensate for missing localization information, a localization distillation loss is added, through which the student obtains more localization information by fitting the teacher's distribution. Trained on the multimodal audio-visual detection (MAVD) dataset, the network improves mAP over the baseline by 6.71%, 14.36%, and 10.32% at IoU thresholds of 0.5 and 0.75 and averaged over IoU thresholds, respectively. The experimental results demonstrate the superiority of the proposed detection network.
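
The abstract names three ingredients: attention fusion of the student's shallow and deep features, KL-divergence alignment with a teacher intermediate layer, and a localization distillation loss that fits the teacher's box distribution. The following is a minimal sketch of those ideas on plain Python lists, not the authors' implementation. Several details are assumptions: the fusion is a two-weight softmax attention, feature vectors are turned into distributions with a temperature-softened softmax before the KL term, each box edge is represented as logits over discretized offsets (as in Generalized Focal Loss), and every function name and temperature value below is hypothetical.

```python
import math


def softmax(logits, temperature=1.0):
    # Numerically stable softmax over a list of floats, softened by `temperature`.
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    # KL(p || q) for two discrete distributions of equal length; eps guards log(0).
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def attention_fuse(shallow, deep, attn_logits):
    # Hypothetical fusion: a softmax over two attention logits yields weights
    # that blend the student's shallow and deep feature vectors element-wise.
    w_shallow, w_deep = softmax(attn_logits)
    return [w_shallow * s + w_deep * d for s, d in zip(shallow, deep)]


def cross_level_loss(student_shallow, student_deep, teacher_mid,
                     attn_logits, temperature=2.0):
    # Fuse the student's two levels, soften fused and teacher features into
    # distributions (an assumption), and align them with KL(teacher || student).
    fused = attention_fuse(student_shallow, student_deep, attn_logits)
    p_teacher = softmax(teacher_mid, temperature)
    p_student = softmax(fused, temperature)
    # Scaling by temperature**2 follows the usual Hinton-style distillation convention.
    return temperature ** 2 * kl_divergence(p_teacher, p_student)


def localization_distillation_loss(teacher_edges, student_edges, temperature=10.0):
    # Each box edge (assumed left, top, right, bottom) is a list of logits over
    # discretized offsets; the student fits the teacher's softened distribution.
    total = 0.0
    for t_logits, s_logits in zip(teacher_edges, student_edges):
        total += kl_divergence(softmax(t_logits, temperature),
                               softmax(s_logits, temperature))
    return temperature ** 2 * total / len(teacher_edges)


if __name__ == "__main__":
    teacher_mid = [0.5, 1.0, -0.2, 0.3]
    # Identical shallow, deep and teacher features with equal attention give 0.0.
    print(cross_level_loss(teacher_mid, teacher_mid, teacher_mid, [0.0, 0.0]))
    # Mismatched features give a positive alignment loss.
    print(cross_level_loss([0.0] * 4, [1.0, -1.0, 2.0, 0.0], teacher_mid, [0.3, -0.1]))
    teacher_box = [[0.1, 2.0, 0.3], [1.0, 0.0, -1.0]]
    student_box = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
    print(localization_distillation_loss(teacher_box, student_box))
```

In a full detector these terms would be computed on feature tensors and added to the detection loss with weighting factors; the abstract does not state those weights, so none are assumed here.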

Keywords

deep learning; knowledge distillation; multimodal; object detection; self-supervised

Citation

Shibei LIU, Ying CHEN. Audio object detection network with multimodal cross level feature knowledge transfer[J]. Optics and Precision Engineering, 2024, 32(2): 237

Paper Information

Received: Jun. 8, 2023

Published Online: Apr. 2, 2024

Corresponding author email: CHEN Ying (chenying@jiangnan.edu.cn)
