Optics and Precision Engineering, Volume 32, Issue 2, 237 (2024)

Audio object detection network with multimodal cross level feature knowledge transfer

Shibei LIU and Ying CHEN*
Author Affiliations
  • Key Laboratory of Advanced Process Control for Light Industry (Ministry of Education), Jiangnan University, Wuxi 214122, China
    Figures & Tables(18)
    Schematic of RGB, depth and audio information
    Multimodal knowledge distillation object detection network
    Feature heatmaps with and without cross-level fusion
    Cross-level feature knowledge transfer loss based on attentional fusion
    Attention fusion module (AFM) and KL divergence calculation module (KLD)
    Selection diagram of image and audio
    Example images of MAVD dataset
    Comparison of object detection capability under different network architectures
    Schematic diagram of different fusion modes
    Qualitative comparison of vehicle detection capability with or without LDLoss
    Loss curves for MTALoss and MCFTLoss
    Qualitative comparison of vehicle detection capability between the baseline network and the proposed method
    • Table 1. Comparison of the proposed method and the baseline network under different teacher modalities

      Model               Teacher modality   mAP (higher is better)          Center distance (lower is better)
                          RGB    Depth       mAP@Avg   mAP@0.5   mAP@0.75    CDx     CDy
      StereoSoundNet [6]  ✓      -           44.05     62.38     41.46       3.00    2.24
      Baseline [7]        ✓      -           51.45     69.22     49.07       2.97    1.72
                          -      ✓           40.28     54.09     38.45       6.08    3.28
                          ✓      ✓           51.91     75.92     47.13       2.07    1.11
      Ours                ✓      -           57.57     77.02     55.85       2.29    1.31
                          -      ✓           48.04     63.53     46.40       4.80    2.67
                          ✓      ✓           62.23     82.63     61.49       1.95    1.05
    • Table 2. Comparison of the proposed method with classical object detection networks

      Model                        Speed (FPS)    Model       Speed (FPS)
      Faster R-CNN VGG16           18.41          Yolov3-m    94.81
      Faster R-CNN ResNet          13.15          Yolov3-l    66.89
      Yolov5-x (EfficientNet-B2)   43.82          Yolov5-s    118.17
      SSD300 (EfficientNet-B2)     44.41          Yolov5-m    93.20
      SSD300                       121.39         Yolov5-l    67.04
      SSD500                       84.16          Yolov5-x    48.33
      Yolov3-s                     96.90          Ours        49.91
    • Table 3. Ablation studies for both losses

      Model   Loss                    mAP                             Center distance
              MCFT Loss   LD Loss     mAP@Avg   mAP@0.5   mAP@0.75    CDx     CDy
      M1      -           -           52.68     72.05     50.04       2.69    1.51
      M2      -           ✓           55.96     76.68     54.87       2.51    1.41
      M3      ✓           -           62.39     82.23     61.38       1.98    1.08
      M4      ✓           ✓           62.23     82.63     61.49       1.95    1.05
    • Table 4. Ablation study of the hyperparameters δ and β in the loss function

      Hyperparameters   mAP                             Center distance
      δ      β          mAP@Avg   mAP@0.5   mAP@0.75    CDx     CDy
      1.0    0.003      52.88     72.85     50.77       2.65    1.57
      1.0    0.005      62.39     82.23     61.38       1.98    1.08
      1.0    0.008      53.86     72.49     51.74       2.81    1.61
      1.0    0.01       50.43     69.12     48.44       3.11    1.80
      1.0    0.03       51.55     69.56     49.82       3.06    1.75
      1.0    0.05       59.29     78.97     57.52       2.25    1.25
      1.0    1.0        49.97     67.24     47.87       3.22    1.82
    • Table 5. Ablation study of the hyperparameters δ, β, and λ in the loss function

      Hyperparameters          mAP                             Center distance
      δ      β        λ        mAP@Avg   mAP@0.5   mAP@0.75    CDx     CDy
      1.0    0.005    0.005    50.13     66.40     48.27       3.52    2.06
      1.0    0.005    0.06     51.17     70.89     48.76       2.87    1.65
      1.0    0.005    0.01     52.22     71.67     49.80       2.86    1.64
      1.0    0.005    0.25     62.23     82.63     61.49       1.95    1.05
      1.0    0.005    0.3      50.95     70.24     48.97       2.92    1.71
      1.0    0.005    1.0      55.79     78.13     53.40       2.28    1.29
    • Table 6. Ablation studies with different fusion methods and loss calculation methods

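      Method                                             mAP                             Center distance
      Cross-level   Fusion method     Loss calculation   mAP@Avg   mAP@0.5   mAP@0.75    CDx     CDy
      -             -                 KL                 51.91     75.92     47.13       2.07    1.11
      -             -                 L2                 52.68     72.05     50.04       2.69    1.51
      ✓             -                 KL                 58.36     78.02     56.97       2.31    1.27
      ✓             -                 L2                 56.13     75.33     54.74       2.67    1.48
      ✓             Pairwise fusion   KL                 62.15     81.84     61.13       2.04    1.13
      ✓             Pairwise fusion   L2                 58.68     77.97     56.44       2.28    1.28
      ✓             Stacked fusion    KL                 62.39     82.23     61.38       1.98    1.08
      ✓             Stacked fusion    L2                 61.74     80.54     60.45       2.05    1.11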
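
Table 6 contrasts two ways of computing the transfer loss between teacher and student features: KL divergence and L2. The sketch below illustrates the general difference between the two measures on flattened feature vectors. It is not the authors' implementation: the function names, the softmax normalisation before the KL term, and the use of plain Python lists are all assumptions made for illustration.

```python
import math


def log_softmax(values):
    """Numerically stable log-softmax of a flat list of feature values."""
    m = max(values)
    log_sum = m + math.log(sum(math.exp(v - m) for v in values))
    return [v - log_sum for v in values]


def kl_transfer_loss(teacher_feat, student_feat):
    """KL(teacher || student) between softmax-normalised features.

    Assumption: both inputs are flattened feature maps of equal length.
    """
    log_p = log_softmax(teacher_feat)
    log_q = log_softmax(student_feat)
    return sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))


def l2_transfer_loss(teacher_feat, student_feat):
    """Mean squared difference between raw teacher and student features."""
    n = len(teacher_feat)
    return sum((t - s) ** 2 for t, s in zip(teacher_feat, student_feat)) / n


if __name__ == "__main__":
    teacher = [1.0, 2.0, 3.0]
    shifted = [11.0, 12.0, 13.0]  # same shape as teacher, different scale
    print(kl_transfer_loss(teacher, shifted))
    print(l2_transfer_loss(teacher, shifted))
```

Under these assumptions the KL term compares the relative distribution of activations and ignores a constant offset, whereas the L2 term penalises any difference in raw magnitude.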
    Citation

    Shibei LIU, Ying CHEN. Audio object detection network with multimodal cross level feature knowledge transfer[J]. Optics and Precision Engineering, 2024, 32(2): 237

    Paper Information

    Received: Jun. 8, 2023

    Accepted: --

    Published Online: Apr. 2, 2024

    The Author Email: CHEN Ying (chenying@jiangnan.edu.cn)

    DOI:10.37188/OPE.20243202.0237
