Laser & Optoelectronics Progress, Volume 61, Issue 24, 2415004 (2024)

3D Object Detection Based on Voxel Self-Attention Auxiliary Networks

Jie Cao1, Yiqiang Peng1,2,3, Likang Fan1,2,3,*, and Longfei Wang1
Author Affiliations
  • 1School of Automobile and Transportation, Xihua University, Chengdu 610039, Sichuan, China
  • 2Vehicle Measurement Control and Safety Key Laboratory of Sichuan Province, Xihua University, Chengdu 610039, Sichuan, China
  • 3Provincial Engineering Research Center for New Energy Vehicle Intelligent Control and Simulation Test Technology of Sichuan, Chengdu 610039, Sichuan, China
    Figures & Tables(13)
    Range of attention
    Voxel feature query
    Voxel self-attention network architecture
    VA-SECOND
    VA-PVRCNN
    Visualization of long-range scene detection results. (a) Results of PV-RCNN; (b) results of VA-PVRCNN; (c) scene image
    Visualization of detection results in complex scenes. (a) Results of PV-RCNN; (b) results of VA-PVRCNN; (c) scene image
    • Table 1. Detection level classification criteria in KITTI dataset

      | Level | Minimum bounding box height /pixel | Occlusion situation | Maximum truncation /% |
      | --- | --- | --- | --- |
      | Easy | 40 | Fully visible | 15 |
      | Mod | 25 | Partial occlusion | 30 |
      | Hard | 25 | Severe occlusion | 50 |
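
      The criteria in Table 1 can be read as a cascade of threshold checks. The following is a minimal sketch, not the authors' code: it treats the truncation value as an upper limit (as in the KITTI benchmark definition), and the function name `difficulty` and the integer occlusion codes (0 = fully visible, 1 = partial occlusion, 2 = severe occlusion) are assumptions made for illustration.

```python
from typing import Optional

# (level, min box height in pixels, max occlusion code, max truncation fraction)
# Thresholds follow Table 1. The occlusion codes 0/1/2 are an assumed encoding
# for "fully visible" / "partial occlusion" / "severe occlusion".
LEVELS = [
    ("Easy", 40.0, 0, 0.15),
    ("Mod", 25.0, 1, 0.30),
    ("Hard", 25.0, 2, 0.50),
]


def difficulty(box_height_px: float, occlusion: int, truncation: float) -> Optional[str]:
    """Return the easiest level whose thresholds the object satisfies, or None."""
    for name, min_height, max_occlusion, max_truncation in LEVELS:
        if (
            box_height_px >= min_height
            and occlusion <= max_occlusion
            and truncation <= max_truncation
        ):
            return name
    # Objects that fail even the Hard thresholds are not assigned a level.
    return None
```

      Under these assumptions, a fully visible object with a 45-pixel box and 10% truncation falls into Easy, while the same object with a 30-pixel box falls into Mod.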
    • Table 2. Comparative experimental results of various algorithms on KITTI dataset

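      All values are 3D AP /%. Car, Ped, and Cyc columns give the Easy, Mod, and Hard levels followed by the per-class mAP.

      | Method | Car Easy | Car Mod | Car Hard | Car mAP | Ped Easy | Ped Mod | Ped Hard | Ped mAP | Cyc Easy | Cyc Mod | Cyc Hard | Cyc mAP | Average |
      | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
      | VoxelNet | 87.93 | 75.37 | 73.21 | 78.84 | 67.81 | 63.52 | 58.87 | 63.40 | 77.69 | 58.72 | 51.63 | 62.68 | 68.30 |
      | PointPillars | 87.50 | 77.01 | 74.77 | 79.76 | 66.73 | 61.06 | 56.50 | 61.43 | 83.65 | 63.40 | 59.71 | 68.92 | 70.03 |
      | PointRCNN | 89.01 | 78.77 | 78.10 | 81.96 | 62.69 | 55.36 | 51.60 | 56.55 | 84.48 | 65.37 | 59.83 | 69.89 | 69.46 |
      | Point-GNN | 89.33 | 79.47 | 78.29 | 82.36 | 61.92 | 53.77 | 50.14 | 55.28 | 86.60 | 67.48 | 62.58 | 72.22 | 69.95 |
      | Part-A2 | 89.56 | 79.41 | 78.84 | 82.60 | 65.69 | 60.05 | 55.45 | 60.57 | 85.50 | 68.90 | 64.53 | 72.98 | 72.05 |
      | CT3D | 92.85 | 85.82 | 83.46 | 87.37 | 65.73 | 58.56 | 53.04 | 59.11 | 91.99 | 71.60 | 67.34 | 76.97 | 74.48 |
      | SECOND | 88.61 | 78.62 | 77.22 | 81.48 | 56.00 | 50.02 | 43.64 | 49.89 | 80.97 | 63.43 | 56.67 | 67.02 | 66.13 |
      | VA-SECOND | 89.10 | 80.86 | 77.91 | 82.63 | 56.82 | 50.63 | 46.56 | 51.34 | 82.36 | 62.66 | 58.71 | 67.91 | 67.30 |
      | * | +0.49 | +2.24 | +0.69 | +1.15 | +0.82 | +0.61 | +2.92 | +1.45 | +1.39 | -0.77 | +2.04 | +0.89 | +1.16 |
      | PV-RCNN | 92.57 | 84.83 | 82.69 | 86.69 | 64.26 | 56.67 | 51.91 | 57.61 | 88.88 | 71.95 | 66.78 | 75.87 | 72.49 |
      | VA-PVRCNN | 92.12 | 85.07 | 82.61 | 86.60 | 67.85 | 60.08 | 55.47 | 61.14 | 92.03 | 71.74 | 67.34 | 77.03 | 74.93 |
      | * | -0.45 | +0.24 | -0.08 | -0.09 | +3.59 | +3.41 | +3.56 | +3.53 | +3.15 | -0.21 | +0.56 | +1.16 | +1.54 |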
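
      The summary columns of Table 2 appear to be simple means: each per-class mAP looks like the mean of the Easy, Mod, and Hard AP, and the Average looks like the mean of the three per-class mAP values. The sketch below checks that reading against the SECOND row and shows how a `*` entry is obtained as a difference from the baseline. The names `summarize` and `gain` are illustrative, and the averaging rule is an inference from the numbers, not something the table states.

```python
from statistics import mean

# Easy / Mod / Hard 3D AP (%) for the SECOND baseline, copied from Table 2.
second = {
    "Car": (88.61, 78.62, 77.22),
    "Ped": (56.00, 50.02, 43.64),
    "Cyc": (80.97, 63.43, 56.67),
}


def summarize(results):
    """Return (per-class mAP, overall average), assuming plain arithmetic means."""
    per_class = {cls: mean(aps) for cls, aps in results.items()}
    overall = mean(list(per_class.values()))
    return per_class, overall


def gain(variant_ap: float, baseline_ap: float) -> float:
    """Difference reported in a '*' row: variant minus baseline, in AP points."""
    return round(variant_ap - baseline_ap, 2)


per_class, overall = summarize(second)
car_mod_gain = gain(80.86, 78.62)  # VA-SECOND vs. SECOND, Car, Mod level
```

      Rounded to two decimals, this reproduces the SECOND row (81.48, 49.89, 67.02, and 66.13) and the +2.24 gain for Car at the Mod level. Other rows can differ from the printed values in the last digit, which would be consistent with rounding at a different step.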
    • Table 3. Experimental results with dropout layers

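      | Method | Dropout | Car-3D (mAP) /% | Ped-3D (mAP) /% | Cyc-3D (mAP) /% |
      | --- | --- | --- | --- | --- |
      | VA-SECOND | 0 | 82.63 | 51.34 | 67.91 |
      | VA-SECOND | 0.1 | 82.21 | 50.96 | 67.81 |
      | VA-SECOND | 0.3 | 81.60 | 49.92 | 67.16 |
      | VA-PVRCNN | 0 | 86.60 | 61.14 | 77.03 |
      | VA-PVRCNN | 0.1 | 86.47 | 61.03 | 76.95 |
      | VA-PVRCNN | 0.3 | 85.95 | 60.17 | 76.08 |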
    • Table 4. Experimental results with different numbers of voxels in the self-attention computation

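      | Method | Number | Car-3D (mAP) /% | Ped-3D (mAP) /% | Cyc-3D (mAP) /% |
      | --- | --- | --- | --- | --- |
      | VA-SECOND | 24 | 82.01 | 50.65 | 67.32 |
      | VA-SECOND | 48 | 82.63 | 51.34 | 67.91 |
      | VA-PVRCNN | 24 | 86.39 | 59.26 | 76.67 |
      | VA-PVRCNN | 48 | 86.60 | 61.14 | 77.03 |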
    • Table 5. Experimental results with projection layer

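      | Method | Projection layer | Car-3D (mAP) /% | Ped-3D (mAP) /% | Cyc-3D (mAP) /% |
      | --- | --- | --- | --- | --- |
      | VA-SECOND | × | 82.21 | 51.10 | 67.68 |
      | VA-SECOND | √ | 82.63 | 51.34 | 67.91 |
      | VA-PVRCNN | × | 86.49 | 60.95 | 76.92 |
      | VA-PVRCNN | √ | 86.60 | 61.14 | 77.03 |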
    • Table 6. Comparison of algorithm inference time

      | Method | SECOND | VA-SECOND | PV-RCNN | VA-PVRCNN |
      | --- | --- | --- | --- | --- |
      | Runtime /s | 0.01 | 0.03 | 0.06 | 0.09 |
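
      To put the runtimes of Table 6 in more familiar terms, the sketch below converts them to an approximate frame rate and computes the latency added by the voxel self-attention branch. It assumes the reported runtime is the time to process one point-cloud frame, which the table does not state explicitly. The helper names are illustrative.

```python
# Runtimes in seconds, copied from Table 6.
runtime_s = {
    "SECOND": 0.01,
    "VA-SECOND": 0.03,
    "PV-RCNN": 0.06,
    "VA-PVRCNN": 0.09,
}


def frames_per_second(runtime: float) -> float:
    """Approximate throughput, assuming the runtime is per frame."""
    return 1.0 / runtime


def added_latency(baseline: str, variant: str) -> float:
    """Extra seconds of the variant over its baseline, rounded to 2 decimals."""
    return round(runtime_s[variant] - runtime_s[baseline], 2)


second_overhead = added_latency("SECOND", "VA-SECOND")
pvrcnn_overhead = added_latency("PV-RCNN", "VA-PVRCNN")
```

      With these numbers, the auxiliary network adds about 0.02 s to SECOND and about 0.03 s to PV-RCNN.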
    Citation

    Jie Cao, Yiqiang Peng, Likang Fan, Longfei Wang. 3D Object Detection Based on Voxel Self-Attention Auxiliary Networks[J]. Laser & Optoelectronics Progress, 2024, 61(24): 2415004

    Paper Information

    Category: Machine Vision

    Received: Mar. 19, 2024

    Accepted: Apr. 24, 2024

    Published Online: Dec. 13, 2024

    Corresponding Author Email: Likang Fan (BITfanlikang@163.com)

    DOI: 10.3788/LOP240923

    CSTR: 32186.14.LOP240923

    Topics