Journal of Applied Optics, Volume 42, Issue 5, 867 (2021)

Object detection and tracking algorithm based on audio-visual information fusion

Zhanhua HUANG, Zhilin CHEN, Hanxiao ZHANG, Yusheng CAO and Muhong SHEN
Author Affiliations
  • Key Laboratory of Opto-electronics Information Technology (Ministry of Education), School of Precision Instruments and Opto-electronics Engineering, Tianjin University, Tianjin 300072, China
    Figures & Tables(16)
    Block diagram of algorithm
    Structure diagram of YOLOv5m model
    Flow chart of YOLOv5m feature extraction
    Spatial structure of microphone array
    Block diagram of system hardware design
    Physical drawing of hardware part
    Training result chart of YOLOv5m
    Renderings of object detection
    Experimental effect of YOLOv5m + UKF
    Audio signal waveform
    Comparison of tracking effect of three algorithms
    Error curves of audio and video sequences
    • Table 1. Matching effect of tracking box and detection box

      Frame number            40      45      50      55
      GIoU of left target     0.918   0.847   0.859   0.856
      GIoU of right target    0.842   0.863   0.804   0.796
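
The GIoU values in Table 1 measure how well a tracking box matches a detection box. As a minimal sketch, assuming the standard generalized IoU definition and axis-aligned boxes given as (x1, y1, x2, y2) with positive area, the metric can be computed as follows. The function name `giou` and the box format are illustrative assumptions, not taken from the paper.

```python
def giou(box_a, box_b):
    """Generalized IoU of two axis-aligned boxes (x1, y1, x2, y2).

    Assumes x2 > x1 and y2 > y1 for both boxes (hypothetical format).
    """
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)

    # Overlap rectangle; width or height is clamped to 0 when boxes are disjoint.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = area_a + area_b - inter

    # Smallest axis-aligned box enclosing both inputs.
    enclose = (max(ax2, bx2) - min(ax1, bx1)) * (max(ay2, by2) - min(ay1, by1))

    iou = inter / union
    # Penalize the part of the enclosing box not covered by the union.
    return iou - (enclose - union) / enclose
```

GIoU ranges from -1 (far apart) to 1 (identical boxes), so values near 0.8 to 0.9 indicate closely matched boxes.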
    • Table 2. Average error of azimuth θ and φ

      Sound source coordinates / mm    Mean error of θ / (°)    Mean error of φ / (°)
      (0,0,500)                        3.418                    2.891
      (0,0,800)                        3.254                    2.724
      (0,200,400)                      2.921                    2.230
      (0,500,500)                      3.396                    2.486
      (200,0,500)                      2.855                    2.492
      (200,500,500)                    2.423                    2.362
      (150,400,200)                    2.827                    2.069
    • Table 3. Tracking coordinate values of three algorithms

      Frame number    Sound source localization    Visual tracking    Fusion tracking (weighted)    Fusion tracking (unweighted)    Calibrated position
      67              (833,366)                    (923,427)          (903,387)                     (869,378)                       (911,425)
      77              (925,353)                    (1002,426)         (972,397)                     (961,372)                       (945,422)
      80              (953,322)                    (1029,428)         (989,374)                     (982,361)                       (961,422)
      86              (950,320)                    (1058,429)         (1009,378)                    (998,369)                       (982,422)
      96              (982,501)                    (1085,476)         (1076,483)                    (1039,494)                      (1042,484)
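
One way to read Table 3 is to measure how far each method's coordinate lies from the calibrated position in the same frame. The sketch below does this with a plain Euclidean distance. It assumes the coordinates are 2D image positions in a common unit, and the distance metric and the key names (`audio`, `visual`, `fusion_weighted`, `fusion_unweighted`, `calibrated`) are illustrative choices. This is not necessarily the accuracy measure the authors report in Table 4.

```python
import math

# Rows transcribed from Table 3: frame number -> (x, y) per column.
# Key names are hypothetical labels for the table columns.
TABLE_3 = {
    67: {"audio": (833, 366), "visual": (923, 427),
         "fusion_weighted": (903, 387), "fusion_unweighted": (869, 378),
         "calibrated": (911, 425)},
    77: {"audio": (925, 353), "visual": (1002, 426),
         "fusion_weighted": (972, 397), "fusion_unweighted": (961, 372),
         "calibrated": (945, 422)},
    80: {"audio": (953, 322), "visual": (1029, 428),
         "fusion_weighted": (989, 374), "fusion_unweighted": (982, 361),
         "calibrated": (961, 422)},
    86: {"audio": (950, 320), "visual": (1058, 429),
         "fusion_weighted": (1009, 378), "fusion_unweighted": (998, 369),
         "calibrated": (982, 422)},
    96: {"audio": (982, 501), "visual": (1085, 476),
         "fusion_weighted": (1076, 483), "fusion_unweighted": (1039, 494),
         "calibrated": (1042, 484)},
}


def position_errors(table):
    """Euclidean distance from each method's position to the calibrated one."""
    errors = {}
    for frame, row in table.items():
        reference = row["calibrated"]
        errors[frame] = {
            name: math.dist(position, reference)
            for name, position in row.items()
            if name != "calibrated"
        }
    return errors


def mean_errors(errors):
    """Average each method's distance over all frames."""
    methods = next(iter(errors.values())).keys()
    return {m: sum(e[m] for e in errors.values()) / len(errors) for m in methods}


errors = position_errors(TABLE_3)
means = mean_errors(errors)
for method, value in means.items():
    print(f"{method}: {value:.1f}")
```

With only five sampled frames, such averages are indicative at best and should not be read as a replacement for the figures in Table 4.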
    • Table 4. Performance comparison of three algorithms

      Algorithm                             Accuracy / %    Mean running time per frame / ms
      Sound source localization             74.48           4.1
      Video detection and tracking          83.46           23.0
      Fusion detection and tracking         90.68           29.2
    Citation

    Zhanhua HUANG, Zhilin CHEN, Hanxiao ZHANG, Yusheng CAO, Muhong SHEN. Object detection and tracking algorithm based on audio-visual information fusion[J]. Journal of Applied Optics, 2021, 42(5): 867

    Paper Information

    Category: OE INFORMATION ACQUISITION AND PROCESSING

    Received: May 6, 2021

    Accepted: --

    Published Online: Sep. 23, 2021

    The Author Email:

    DOI: 10.5768/JAO202142.0502007
