3D Object Detection Based on Fusion of Voxel Texture Information and Deep Semantic Features

Longfei Wang; Likang Fan; Yiqiang Peng; Jie Cao; Liu He; Xulei Liu; Xiyuan Gao

doi:10.3788/LOP242537

Laser & Optoelectronics Progress, Vol. 62, Issue 16, 1615006 (2025)

Longfei Wang¹, Likang Fan¹,²,³,*, Yiqiang Peng¹,⁴,⁵, Jie Cao¹, Liu He¹,²,³, Xulei Liu¹,²,³, and Xiyuan Gao¹

Author Affiliations

¹School of Automobile and Transportation, Xihua University, Chengdu 610039, Sichuan, China

²Vehicle Measurement Control and Safety Key Laboratory of Sichuan Province, Xihua University, Chengdu 610039, Sichuan, China

³Provincial Engineering Research Center for New Energy Vehicle Intelligent Control and Simulation Test Technology of Sichuan, Chengdu 610039, Sichuan, China

⁴Yibin Institute in Xihua University, Yibin 644000, Sichuan, China

⁵Sichuan Intelligent and New Energy Automobile Industry College, Yibin 644000, Sichuan, China

Abstract

Most voxel-based 3D object detection methods perform relatively poorly on small road targets such as pedestrians and cyclists. To address this, the study proposes a single-stage 3D object detection network, Voxel-AESC, which fuses voxel texture information with deep semantic features. First, to capture the spatial features of voxels under different receptive fields, a multi-scale 3D feature pyramid network module (ISC3D) is designed to improve the extraction of fine-grained local information in 3D space. Second, a residual module combining channel attention and spatial attention (CASA) is proposed, which lets the network adaptively extract the most discriminative target features and focus on important information. Finally, the algorithm is evaluated on the KITTI dataset. On the validation set, the average 3D detection accuracies for the Car, Cyclist, and Pedestrian classes are 81.45%, 68.59%, and 52.91%, respectively; the corresponding average bird's-eye-view detection accuracies are 89.16%, 71.90%, and 52.56%; and the inference time is 55 ms. These results indicate that the proposed algorithm outperforms most existing 3D object detection algorithms in both accuracy and efficiency. The algorithm is also deployed on a real vehicle platform to verify its engineering value.
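The abstract does not give the internal layout of the CASA module, so the following is only a minimal sketch of the general idea it names: channel attention followed by spatial attention, wrapped in a residual connection. It assumes a CBAM-style ordering, a 2D feature map of shape (C, H, W) rather than the paper's voxel features, random untrained weights, and a per-pixel mix of the mean and max maps standing in for a learned convolution. All function and parameter names (`channel_attention`, `spatial_attention`, `casa_block`, `w1`, `w2`, `k`) are hypothetical and not taken from the paper.

```python
import numpy as np


def sigmoid(z):
    """Element-wise logistic function."""
    return 1.0 / (1.0 + np.exp(-z))


def channel_attention(x, w1, w2):
    """Per-channel gate of shape (C, 1, 1) for a feature map x of shape (C, H, W).

    Global average- and max-pooled descriptors pass through a shared
    two-layer MLP (w1: (C//r, C), w2: (C, C//r)) and are summed.
    """
    avg = x.mean(axis=(1, 2))  # (C,)
    mx = x.max(axis=(1, 2))    # (C,)

    def mlp(v):
        return w2 @ np.maximum(w1 @ v, 0.0)  # ReLU hidden layer

    return sigmoid(mlp(avg) + mlp(mx))[:, None, None]


def spatial_attention(x, k):
    """Per-location gate of shape (1, H, W).

    Assumption: k (shape (2,)) mixes the channel-wise mean and max maps,
    a simplified stand-in for a learned convolution over those two maps.
    """
    avg = x.mean(axis=0)  # (H, W)
    mx = x.max(axis=0)    # (H, W)
    return sigmoid(k[0] * avg + k[1] * mx)[None, :, :]


def casa_block(x, w1, w2, k):
    """Channel gate, then spatial gate, then a residual (skip) connection."""
    y = x * channel_attention(x, w1, w2)
    y = y * spatial_attention(y, k)
    return x + y


# Toy input with illustrative sizes and random weights (not from the paper).
rng = np.random.default_rng(0)
C, H, W, r = 8, 4, 4, 2
x = rng.standard_normal((C, H, W))
w1 = rng.standard_normal((C // r, C)) * 0.1
w2 = rng.standard_normal((C, C // r)) * 0.1
k = np.array([0.5, 0.5])

out = casa_block(x, w1, w2, k)
print(out.shape)  # (8, 4, 4)
```

Because both gates lie strictly between 0 and 1, the residual output in this sketch equals the input scaled element-wise by a factor between 1 and 2, so the block re-weights features without ever suppressing the original signal entirely.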

Keywords

3D object detection; attention mechanism; feature fusion; LiDAR; voxel


Longfei Wang, Likang Fan, Yiqiang Peng, Jie Cao, Liu He, Xulei Liu, Xiyuan Gao. 3D Object Detection Based on Fusion of Voxel Texture Information and Deep Semantic Features[J]. Laser & Optoelectronics Progress, 2025, 62(16): 1615006
