Acta Optica Sinica, Volume. 43, Issue 15, 1515001(2023)

Three-Dimensional Object Detection Technology Based on Point Cloud Data

Jianan Li1,2, Ze Wang1, and Tingfa Xu1,2,3,*
Author Affiliations
  • 1School of Optoelectronics, Beijing Institute of Technology, Beijing 100081, China
  • 2Key Laboratory of Photoelectronic Imaging Technology and System, Ministry of Education, Beijing Institute of Technology, Beijing 100081, China
  • 3Chongqing Innovation Center, Beijing Institute of Technology, Chongqing 401135, China

    Significance

    In recent years, self-driving technology has garnered considerable attention from both academia and industry. Autonomous perception, which encompasses perceiving both the vehicle's own state and its surrounding environment, is a critical component of self-driving systems, providing the input that guides the decision-making and planning modules. Accurate environmental perception requires detecting objects in three-dimensional (3D) scenes. Traditional object detection techniques, however, typically operate on image data, which lack depth information, making them difficult to apply directly to 3D scene tasks. Therefore, 3D object detection predominantly relies on point cloud data acquired by devices such as lidar and 3D scanners.

    Point cloud data consist of a collection of points, each containing coordinate information and additional attributes such as color, normal vector, and intensity. Point clouds are rich in depth information; however, in contrast to two-dimensional images, they are sparse and unordered and exhibit a complex, irregular structure, which complicates feature extraction. Traditional methods rely on local geometric cues such as curvature, normal vectors, and density, combined with models such as the Gaussian model, to manually design descriptors for processing point cloud data. However, these handcrafted methods depend heavily on a priori knowledge and fail to account for the relationships between neighboring points, resulting in low robustness and susceptibility to noise.
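
    As a concrete illustration of the handcrafted descriptors mentioned above, the sketch below estimates a per-point normal vector by principal component analysis (PCA) over each point's k nearest neighbors. This is a generic, minimal example of traditional local point cloud processing, not a method from the paper; the function name and parameters are illustrative.

```python
import numpy as np

def estimate_normals(points, k=8):
    """Estimate a unit normal for each point via PCA over its k nearest
    neighbors -- the kind of handcrafted local descriptor that traditional
    point cloud methods rely on (illustrative sketch only)."""
    n = len(points)
    normals = np.empty((n, 3))
    # Pairwise squared distances (acceptable for small clouds).
    d2 = ((points[:, None, :] - points[None, :, :]) ** 2).sum(-1)
    for i in range(n):
        nbrs = points[np.argsort(d2[i])[:k]]    # k nearest neighbors (incl. self)
        cov = np.cov(nbrs.T)                    # 3x3 local covariance matrix
        eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvalues in ascending order
        normals[i] = eigvecs[:, 0]              # direction of least variance
    return normals

# Toy cloud: points scattered on the z = 0 plane, so normals should be (0, 0, +-1).
rng = np.random.default_rng(0)
pts = np.column_stack([rng.uniform(-1, 1, 50),
                       rng.uniform(-1, 1, 50),
                       np.zeros(50)])
normals = estimate_normals(pts)
print(np.abs(normals[:, 2]).min())  # close to 1: every normal points along z
```

    The sensitivity of such descriptors to the neighborhood size k and to noisy neighbors is exactly the robustness problem noted above.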

    In recent years, deep learning methods have gained significant attention from researchers due to their robust feature representation and generalization capabilities. The effectiveness of deep learning methods relies heavily on high-quality datasets. To advance the field of point cloud object detection, numerous companies, such as Waymo and Baidu, as well as research institutes, have produced large-scale point cloud datasets. With the help of such datasets, point cloud object detection combined with deep learning has developed rapidly and demonstrated powerful performance. Despite this progress, challenges related to accuracy and real-time performance remain. Therefore, this paper reviews the research conducted in point cloud object detection and looks ahead to future developments to promote the advancement of this field.

    Progress

    The development of point cloud object detection has been significantly promoted by the recent emergence of large-scale open-source datasets. Several standard datasets for outdoor scenes, including KITTI, Waymo, and nuScenes, as well as indoor scenes, including NYU-Depth, SUN RGB-D, and ScanNet, have been released, which have greatly facilitated research in this field. The relevant properties of these datasets are summarized in Table 1.

    Point cloud data are characterized by sparsity, non-uniformity, and disorder, which distinguish them from image data. To address these unique properties, researchers have developed a range of object detection algorithms specifically designed for point clouds. Based on the method of feature extraction, point cloud-based single-modal methods can be categorized into four groups: voxel-based, point-based, graph-based, and point+voxel-based methods. Voxel-based methods divide the point cloud into regular voxel grids and aggregate point cloud features within each voxel to generate regular four-dimensional feature maps; VoxelNet, SECOND, and PointPillars are classic architectures of this kind. Point-based methods process the point cloud directly and use symmetric functions to aggregate point features while retaining the geometric information of the point cloud to the greatest extent; PointNet, PointNet++, and Point R-CNN are classic architectures of this category. Graph-based methods convert the point cloud into a graph representation and process it with a graph neural network; Point GNN and Graph R-CNN are classic architectures of this approach. Point+voxel-based methods combine point-based and voxel-based approaches, with STD and PV R-CNN as classic architectures. In addition, to enhance the semantic information of point cloud data, researchers have used image data as a complementary source to design multi-modal methods; MV3D, AVOD, and MMF are classic architectures of this kind. A chronological summary of classical methods for object detection from point clouds is presented in Fig. 4.
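
    The voxel-based aggregation described above can be sketched in a few lines: points are binned into regular voxels and their features are pooled with a symmetric (order-invariant) function such as elementwise max, the same principle VoxelNet-style detectors and PointNet's max-pooling build on. This is a minimal, assumed sketch; the function name, voxel size, and toy data are illustrative, not taken from any of the cited architectures.

```python
import numpy as np

def voxelize_max(points, feats, voxel_size=0.5):
    """Group points into regular voxels and max-pool their features.
    Max is symmetric, so the result is invariant to point order --
    the key property for unordered point cloud data."""
    idx = np.floor(points / voxel_size).astype(np.int64)  # integer voxel coordinates
    voxels = {}
    for coord, f in zip(map(tuple, idx), feats):
        if coord in voxels:
            voxels[coord] = np.maximum(voxels[coord], f)  # symmetric aggregation
        else:
            voxels[coord] = f.copy()
    return voxels

pts = np.array([[0.1, 0.1, 0.1],
                [0.2, 0.3, 0.4],    # falls in the same voxel as the first point
                [1.1, 0.0, 0.0]])   # falls in a different voxel
feats = np.array([[1.0, 0.0],
                  [0.5, 2.0],
                  [3.0, 3.0]])
out = voxelize_max(pts, feats)
print(len(out))           # 2 occupied voxels
print(out[(0, 0, 0)])     # [1. 2.] -- elementwise max of the two co-located points
```

    In a full detector, the pooled per-voxel features would feed a convolutional backbone over the voxel grid; here only the order-invariant aggregation step is shown.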

    Conclusions and Prospects

    The field of 3D object detection from point clouds is a significant research area in computer vision that is attracting increasing attention from scholars. The foundational branches of 3D object detection from point clouds have flourished, and future research may focus on several areas: multi-branch and multi-modal fusion, the integration of two-dimensional detection methods, weakly supervised and self-supervised learning, and the creation and utilization of more complex datasets.

    Paper Information

    Category: Machine Vision

    Received: Mar. 29, 2023

    Accepted: Jun. 5, 2023

    Published Online: Aug. 3, 2023

    The Author Email: Xu Tingfa (ciom_xtf1@bit.edu.cn)

    DOI: 10.3788/AOS230745
