Infrared and Laser Engineering, Volume 53, Issue 8, 20240206 (2024)

Semi-supervised 3D object detection based on frustum transformation and RGB voxel grid

Yan WANG1, Tiantian YUAN1, Bin HU1,2,*, and Yao LI2
Author Affiliations
  • 1Technical College for the Deaf, Tianjin University of Technology, Tianjin 300384, China
  • 2School of Microelectronics, Tianjin University, Tianjin 300072, China
    Figures & Tables (18)
    A 3D object detection framework based on frustum transformation and RGB voxel maps
    RGB voxel feature extraction module based on frustum transformation
    Channel attention module
    Teacher-Student model framework
    Spatial feature fusion module
    The RVFM addresses the directional issue, as shown in the bottom left and right figures
    The RVFM addresses the proximity issue, with the results displayed in the bottom left and right figures
    Comparison of visualization results between the CAM module and other fusion modules
    The successful scenarios demonstrate high detection accuracy for both cars and cyclists
    The successful scenarios demonstrate high detection accuracy for occluded objects
    The CLM module resolved the false detection issues, as shown in the bottom left and right images
    The complete model addressed issues of repetitive detections and missed detections
    The complete model resolved false detection issues and detected unlabelled objects
    • Table 1. Performance comparison with established methods on the KITTI test set

      Method               | Speed | Car 3D AP(R40)     | Pedestrian 3D AP(R40) | Cyclist 3D AP(R40)
                           |       | Easy  Mod.  Hard   | Easy  Mod.  Hard      | Easy  Mod.  Hard
      PointPillars[14]     | 61.2  | 79.05 74.99 68.30  | 52.08 43.53 41.49     | 75.78 59.07 52.92
      VoxelNet[2]          | 4.5   | 81.97 65.46 62.85  | 57.86 53.42 48.87     | 67.17 47.65 45.11
      SECOND[15]           | 20    | 83.13 73.66 66.20  | 51.07 42.56 37.29     | 70.51 53.85 46.90
      MV3D[3]              | 2.8   | 71.09 62.35 55.12  | -     -     -         | -     -     -
      UberATG-ContFuse[16] | 16.8  | 82.54 66.22 64.04  | -     -     -         | -     -     -
      F-PointNet[4]        | 6     | 81.20 70.39 62.19  | 51.21 44.89 40.23     | 71.96 56.77 50.39
      AVOD-FPN[17]         | 12.5  | 81.94 71.88 66.38  | 50.80 42.81 40.88     | 64.00 52.18 46.61
      MVXNet[18]           | 25.2  | 83.20 72.70 65.20  | -     -     -         | -     -     -
      PointPainting[19]    | 28.5  | 82.11 71.70 67.08  | 50.32 40.97 37.77     | 77.63 63.78 55.89
      Ours(LRFN)           | 21    | 83.26 74.51 69.76  | 48.08 40.21 37.83     | 71.55 58.15 51.94
      Baseline-VoxelNet    | 4.4   | 80.74 65.58 63.73  | 56.44 52.16 47.79     | 66.39 45.81 44.37
      Ours(LRFN-S)         | 21.4  | 88.12 75.88 71.21  | 59.67 56.30 52.61     | 72.57 56.50 52.74
    • Table 2. Influence of each module of the LRFN model on performance on the KITTI validation set

      Method       | RVFM | CAM | AP 3D
                   |      |     | Easy   Mod.   Hard
      Ours         | -    | -   | 69.93  59.44  56.81
                   | ✓    | -   | 70.49  60.17  58.07
                   | ✓    | ✓   | 74.41  62.65  59.66
      Improvements |      |     | +4.48  +3.21  +2.85
    • Table 3. Impact of different fusion modules on performance on the KITTI validation set

      Method    | AP 3D                  | mAP
                | Easy   Moderate  Hard  |
      VC[18]    | 63.32  50.39     47.23 | 53.65%
      LI-F[20]  | 70.93  59.50     56.84 | 62.42%
      Ours(CAM) | 74.41  62.65     59.66 | 65.57%
    • Table 4. Ablation study on 1% labeled data from the KITTI dataset

      Method             | Car 3D AP(R40)    | Pedestrian 3D AP(R40) | Cyclist 3D AP(R40)
                         | Easy  Mod.  Hard  | Easy  Mod.  Hard      | Easy  Mod.  Hard
      PV-RCNN[6]         | 87.7  73.5  67.7  | 32.4  28.7  26.2      | 48.1  28.4  27.1
      naive psd.-lb.[5]  | 88.4  75.2  69.5  | 32.7  29.2  26.7      | 51.4  30.7  28.7
      cls. filt. only[5] | 87.9  75.5  70.5  | 36.6  31.0  28.3      | 57.3  35.3  33.0
      3DIoUMatch[5]      | 89.0  76.0  70.8  | 37.0  31.7  29.1      | 60.4  36.4  34.3
      Ours               | 90.7  77.5  72.2  | 39.1  31.8  31.9      | 60.9  38.7  35.5
    • Table 5. Ablation study of the effectiveness of different modules

      Method   | CLM | SFF | GBV | Car-3D Detection     | mAP
               |     |     |     | Easy   Mod.   Hard   |
      Baseline | -   | -   | -   | 81.80  66.92  63.97  | 70.90%
      Ours     | ✓   | -   | -   | 83.24  68.21  66.50  | 72.65%
               | ✓   | ✓   | -   | 86.54  75.78  73.57  | 78.63%
               | ✓   | ✓   | ✓   | 87.36  76.71  74.46  | 79.51%
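Note: the figure index above lists a channel attention module (CAM), which Table 3 compares against the VC and LI-F fusion modules. Since the figure itself is not reproduced on this page, the sketch below shows a minimal, generic squeeze-and-excitation-style channel attention block in PyTorch for orientation only; the class name, reduction ratio, and tensor shapes are illustrative assumptions, not the CAM architecture described in the paper.

```python
# Illustrative sketch only: a generic squeeze-and-excitation style channel
# attention block. It is NOT the CAM design published in this paper; the
# layer sizes and reduction ratio below are placeholder assumptions.
import torch
import torch.nn as nn


class ChannelAttention(nn.Module):
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)      # squeeze: global spatial average per channel
        self.fc = nn.Sequential(                 # excitation: learn per-channel weights
            nn.Linear(channels, channels // reduction, bias=False),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels, bias=False),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        w = self.fc(self.pool(x).view(b, c)).view(b, c, 1, 1)
        return x * w                             # reweight channels of the fused feature map


if __name__ == "__main__":
    fused = torch.randn(2, 128, 176, 200)        # e.g. a fused BEV/voxel feature map (assumed shape)
    out = ChannelAttention(128)(fused)
    print(out.shape)                             # torch.Size([2, 128, 176, 200])
```

In a fusion setting such a block would reweight the channels of combined image and point-cloud features before detection, but the exact placement and dimensions of the paper's CAM should be taken from the full text.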
    Citation
    Yan WANG, Tiantian YUAN, Bin HU, Yao LI. Semi-supervised 3D object detection based on frustum transformation and RGB voxel grid[J]. Infrared and Laser Engineering, 2024, 53(8): 20240206

    Paper Information

    Received: May 16, 2024

    Accepted: --

    Published Online: Oct. 29, 2024

    DOI: 10.3788/IRLA20240206
