Acta Photonica Sinica, Vol. 54, Issue 6, 0614002 (2025)
PointPillars-S 3D Object Detection Algorithm Based on Lidar
Compared with traditional 2D image detection, 3D point cloud detection offers distinct advantages that make it widely applicable in fields such as autonomous driving, drone navigation, and robotic obstacle avoidance: stronger resistance to interference from light sources and shadows, direct acquisition of 3D environmental data, and the ability to detect objects at much greater distances. A key weakness of voxel-based methods, however, is the loss of fine-grained detail during voxel encoding, which makes it difficult for the model to capture important features in the point cloud; in addition, insufficient feature extraction can prevent the model from fully exploiting the information available in the data.
To address these problems, a 3D object detection algorithm is proposed that combines multi-attention voxel encoding with a composite backbone network. The central idea is to introduce a multi-attention voxel encoder during voxel encoding, comprising three attention mechanisms: point attention, channel attention, and voxel attention. Together, these mechanisms let the model focus on the most informative parts of the point cloud and extract richer local and channel features while minimizing the loss of fine-grained information. In addition, the backbone is structured as a composite network that integrates multiple sub-networks, connected laterally by a feature fusion module so that they share feature information. This cross-network integration improves the model's ability to represent complex scenes and thereby raises detection accuracy. Finally, to tackle class imbalance in the dataset, the classification loss function is modified with a perturbation term that fine-tunes the loss value for each category, allowing the model to better handle the unequal number of samples per class and improving classification accuracy.
The proposed algorithm consists of four components: voxel encoding, the backbone network, the neck, and the detection head. Unordered 3D point cloud data are first divided spatially into pillars, then passed through voxel encoding and the multi-attention voxel encoder, producing 2D pseudo-images that the network can process efficiently. The backbone comprises two identical down-sampling branches, each built from convolutional layers, normalization, and activation functions; the branches are linked by an attention feature fusion mechanism that combines their features to strengthen feature extraction and reduce information loss. The neck uses a feature pyramid network to merge multi-scale feature information, helping the model handle objects of different sizes and shapes.
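The abstract does not give implementation details of the multi-attention voxel encoder, so the following is a minimal PyTorch-style sketch of how point, channel, and voxel attention could re-weight pillar features before pooling. All layer sizes, tensor shapes, and module names (MultiAttentionVoxelEncoder, point_att, channel_att, voxel_att) are illustrative assumptions rather than the authors' implementation.

# Minimal PyTorch sketch of a multi-attention pillar/voxel encoder.
# Layer sizes, tensor shapes, and module names are illustrative assumptions,
# not the authors' implementation.
import torch
import torch.nn as nn

class MultiAttentionVoxelEncoder(nn.Module):
    """Re-weights pillar features with point, channel, and voxel attention
    before the PointNet-style max pooling used by PointPillars."""

    def __init__(self, in_channels: int = 9, out_channels: int = 64):
        super().__init__()
        # Per-point feature lifting (the standard pillar feature network).
        self.pfn = nn.Sequential(
            nn.Linear(in_channels, out_channels, bias=False),
            nn.BatchNorm1d(out_channels),
            nn.ReLU(inplace=True),
        )
        # Point attention: one weight per point inside each pillar.
        self.point_att = nn.Sequential(nn.Linear(out_channels, 1), nn.Sigmoid())
        # Channel attention: one weight per feature channel (squeeze-and-excitation style).
        self.channel_att = nn.Sequential(
            nn.Linear(out_channels, out_channels // 4),
            nn.ReLU(inplace=True),
            nn.Linear(out_channels // 4, out_channels),
            nn.Sigmoid(),
        )
        # Voxel attention: one weight per pillar, computed from its pooled feature.
        self.voxel_att = nn.Sequential(nn.Linear(out_channels, 1), nn.Sigmoid())

    def forward(self, pillars: torch.Tensor) -> torch.Tensor:
        # pillars: (P, N, C_in) -- P pillars, N points per pillar.
        P, N, _ = pillars.shape
        x = self.pfn(pillars.reshape(P * N, -1)).reshape(P, N, -1)   # (P, N, C)
        x = x * self.point_att(x)                                    # weight each point
        x = x * self.channel_att(x.mean(dim=1, keepdim=True))        # weight each channel
        pooled = x.max(dim=1).values                                 # (P, C) pillar feature
        return pooled * self.voxel_att(pooled)                       # weight each pillar

encoder = MultiAttentionVoxelEncoder(in_channels=9, out_channels=64)
dummy_pillars = torch.randn(12000, 32, 9)      # e.g. 12 000 pillars, 32 points each
print(encoder(dummy_pillars).shape)            # torch.Size([12000, 64])

As in the original PointPillars pipeline, the pooled per-pillar features would then be scattered onto the bird's-eye-view grid to form the 2D pseudo-image consumed by the backbone.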
The detection head incorporates a perturbation term into the original classification loss function. This term applies vertical adjustments to the polynomial coefficients of the loss, so the model reaches the same loss values in fewer iterations and achieves higher classification accuracy with less training.
The effectiveness of the algorithm is validated on the KITTI public dataset. The proposed method improves the average detection accuracy for cars, cyclists, and pedestrians by 1.77%, 1.53%, and 7.68%, respectively, while maintaining real-time performance. The improvement is particularly notable for pedestrians, which are often hard to detect because of their small size and less distinctive features. Overall, the proposed method is an effective solution for improving 3D object detection in autonomous systems, especially in complex and dynamic real-world environments.
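The perturbation term described above amounts to a polynomial adjustment of the classification loss; one common realization of this idea is the Poly-1 form of focal loss, sketched below. The choice of focal loss as the base and the values of alpha, gamma, and epsilon are assumptions for illustration, not the exact formulation used in the paper.

# Sketch of a classification loss with a polynomial perturbation term, in the
# style of Poly-1 focal loss. The base loss and the values of alpha, gamma,
# and epsilon are assumptions for illustration, not the paper's exact form.
import torch
import torch.nn.functional as F

def poly1_focal_loss(logits: torch.Tensor,
                     targets: torch.Tensor,
                     alpha: float = 0.25,
                     gamma: float = 2.0,
                     epsilon: float = 1.0) -> torch.Tensor:
    """Binary focal loss plus an epsilon * (1 - p_t)**(gamma + 1) perturbation.

    logits and targets share the same shape: per-anchor, per-class scores and
    0/1 labels, as produced by a PointPillars-style detection head.
    """
    p = torch.sigmoid(logits)
    ce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    p_t = p * targets + (1.0 - p) * (1.0 - targets)              # prob. of the true class
    alpha_t = alpha * targets + (1.0 - alpha) * (1.0 - targets)
    focal = alpha_t * (1.0 - p_t) ** gamma * ce                  # standard focal loss
    poly = epsilon * (1.0 - p_t) ** (gamma + 1)                  # perturbation term
    return (focal + poly).mean()

scores = torch.randn(8, 3)                     # 8 anchors, 3 classes (car/cyclist/pedestrian)
labels = torch.zeros(8, 3)
labels[torch.arange(8), torch.randint(0, 3, (8,))] = 1.0
print(poly1_focal_loss(scores, labels).item())

In this formulation, raising epsilon increases the contribution of poorly classified examples, which is consistent with the stated goal of rebalancing the loss across under-represented categories such as pedestrians.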
Zengxu ZHAO, Lianqing HU, Bin REN, Shuai YUAN. PointPillars-S 3D Object Detection Algorithm Based on Lidar[J]. Acta Photonica Sinica, 2025, 54(6): 0614002
Received: Dec. 25, 2024
Accepted: Mar. 4, 2025
Published Online: Jul. 14, 2025
The Author Email: Lianqing HU (164066960@qq.com)