Laser & Optoelectronics Progress, Vol. 62, Issue 16, 1615006 (2025)

3D Object Detection Based on Fusion of Voxel Texture Information and Deep Semantic Features

Longfei Wang1, Likang Fan1,2,3,*, Yiqiang Peng1,4,5, Jie Cao1, Liu He1,2,3, Xulei Liu1,2,3, and Xiyuan Gao1
Author Affiliations
  • 1School of Automobile and Transportation, Xihua University, Chengdu 610039, Sichuan, China
  • 2Vehicle Measurement Control and Safety Key Laboratory of Sichuan Province, Xihua University, Chengdu 610039, Sichuan, China
  • 3Provincial Engineering Research Center for New Energy Vehicle Intelligent Control and Simulation Test Technology of Sichuan, Chengdu 610039, Sichuan, China
  • 4Yibin Institute in Xihua University, Yibin 644000, Sichuan, China
  • 5Sichuan Intelligent and New Energy Automobile Industry College, Yibin 644000, Sichuan, China

    Most voxel-based 3D object detection methods perform relatively poorly on small road targets such as pedestrians and cyclists. To address this, this study proposes a single-stage 3D object detection network, Voxel-AESC, which integrates voxel texture information with deep semantic features. First, to account for the spatial features of voxels under different receptive fields, a multi-scale 3D feature pyramid network module (ISC3D) is designed to improve the extraction of fine-grained local information in 3D space. Then, a residual module integrating channel attention and spatial attention (CASA) mechanisms is proposed, which enables the network to adaptively extract the most discriminative features of the targets and significantly strengthens its focus on important information. Finally, the algorithm is evaluated on the KITTI dataset. On the validation set, the average 3D detection accuracies for the three target classes (Car, Cyclist, and Pedestrian) are 81.45%, 68.59%, and 52.91%, respectively; the average bird's-eye-view detection accuracies are 89.16%, 71.90%, and 52.56%, respectively; and the inference time is 55 ms. These results indicate that the detection accuracy and efficiency of the proposed algorithm are superior to those of most existing 3D object detection algorithms. Furthermore, the algorithm is deployed on a real vehicle platform to verify its engineering value.
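
    The abstract does not specify the internals of the CASA module, so the following is only a minimal sketch of the general idea it names: channel attention followed by spatial attention, wrapped in a residual connection. It assumes a CBAM-style arrangement, uses plain Python lists instead of tensors, and replaces all learned layers with fixed stand-ins. The names `channel_attention`, `spatial_attention`, `casa_block`, `w_mean`, and `w_max` are hypothetical and do not come from the paper.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def channel_attention(x):
    """x: list of C channels, each a list of N floats. Returns C gates in (0, 1)."""
    # Global average pooling per channel. A real module would pass the pooled
    # vector through a small learned MLP; that step is omitted (assumption).
    return [sigmoid(sum(ch) / len(ch)) for ch in x]


def spatial_attention(x, w_mean=1.0, w_max=1.0):
    """Returns N gates in (0, 1), one per spatial position."""
    gates = []
    for i in range(len(x[0])):
        column = [ch[i] for ch in x]          # all channel values at position i
        mean_v = sum(column) / len(column)    # average pooling across channels
        max_v = max(column)                   # max pooling across channels
        # w_mean and w_max stand in for a learned convolution (assumption).
        gates.append(sigmoid(w_mean * mean_v + w_max * max_v))
    return gates


def casa_block(x):
    """Residual attention: out = x + (x * channel_gate) * spatial_gate."""
    ca = channel_attention(x)
    refined = [[v * ca[c] for v in ch] for c, ch in enumerate(x)]
    sa = spatial_attention(refined)
    return [
        [x[c][i] + refined[c][i] * sa[i] for i in range(len(ch))]
        for c, ch in enumerate(x)
    ]


# Toy feature map: 2 channels, 3 spatial positions.
features = [[0.0, 1.0, 2.0], [0.5, -1.0, 3.0]]
out = casa_block(features)
```

    Because both gates lie strictly between 0 and 1, the residual form keeps the original feature intact and adds at most one further copy of it, so attention can emphasize a location or channel without suppressing the input signal entirely.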

    Longfei Wang, Likang Fan, Yiqiang Peng, Jie Cao, Liu He, Xulei Liu, Xiyuan Gao. 3D Object Detection Based on Fusion of Voxel Texture Information and Deep Semantic Features[J]. Laser & Optoelectronics Progress, 2025, 62(16): 1615006

    Paper Information

    Category: Machine Vision

    Received: Dec. 31, 2024

    Accepted: Mar. 14, 2025

    Published Online: Aug. 4, 2025

    Corresponding Author Email: Likang Fan (BITfanlikang@163.com)

    DOI:10.3788/LOP242537

    CSTR:32186.14.LOP242537
