Chinese Journal of Lasers, Volume. 51, Issue 5, 0509001(2024)

Review of 3D Point Cloud Processing Methods Based on Deep Learning

Yiquan Wu*, Huixian Chen, and Yao Zhang
Author Affiliations
  • College of Electronic and Information Engineering, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, Jiangsu, China
    Figures & Tables (16)
    Figure: PointCleanNet framework[35]
    Figure: Development route of deep learning methods commonly used in four point cloud processing tasks
    • Table 1. Comparison of performance parameters of three depth cameras

      | Performance | Structured light camera | Binocular vision camera | Time of flight camera |
      | Principle | Projects special structured patterns onto the object | Calculates depth information from two RGB images | Direct measurement based on the time of flight of light |
      | Accuracy | High precision of 0.01‒1.00 mm at short distance | Up to millimeter precision at short distance | Up to centimeter-level accuracy |
      | Range | Within 10 m | Within 2 m (baseline 10 mm) | Within 100 m |
      | Resolution | Up to 1080 pixel × 720 pixel | Up to 2000 pixel | Less than 640 pixel × 480 pixel |
      | Frame rate | 30 frame/s | From high to low | Higher, up to hundreds of frames per second |
      | Influencing factor | Reflection | Illumination changes and object textures; unavailable at night | Illumination changes and object textures; multiple reflections |
      | Software complexity | Medium | High | Low |
      | Representative | Kinect v1, Pickit, PrimeSense | PointGrey Bumblebee, ZED | Kinect v2, Terabee, Basler |
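The stereo and time-of-flight principles compared in Table 1 each reduce to a one-line formula. A minimal Python sketch (function and parameter names are illustrative assumptions, not from the paper):

```python
# Illustrative sketch of two depth-recovery principles from Table 1.
# Names and example numbers are assumptions for illustration only.

def stereo_depth(focal_px: float, baseline_m: float, disparity_px: float) -> float:
    """Binocular vision: depth Z = f * b / d from the disparity d between
    matching pixels in the two RGB images. Depth error grows as disparity
    shrinks, which explains the short useful range in Table 1."""
    return focal_px * baseline_m / disparity_px

def tof_depth(round_trip_s: float, c: float = 299_792_458.0) -> float:
    """Time of flight: depth Z = c * t / 2 from the measured round-trip
    travel time of an emitted light pulse."""
    return c * round_trip_s / 2.0

# A hypothetical 100 mm-baseline rig, 700 px focal length, 35 px disparity:
print(stereo_depth(700.0, 0.1, 35.0))  # 2.0 (meters)
```

The structured-light principle is omitted here because it requires decoding a projected pattern rather than a closed-form expression.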
    • Table 2. Comparison of point cloud denoising and filtering methods based on deep learning

      | Type | Ref. | Specific structure | Contribution | Limitation |
      | CNN-based | 18 | Fully differentiable CNN | Height-map denoising network | Poor denoising effect on larger holes |
      | | 19 | GCN | Robust to high levels of noise | Neighborhood size can affect performance |
      | | 20 | Geometric dual-domain graph convolutional networks | Real and virtual normals are defined | Longer training time |
      | | 21 | Feature-preserving normal estimation | Automatically estimates normals and updates point locations | Unsuitable for severe noise and large outliers |
      | Upsampling-based | 25 | Denoiser and upsampler combined | Effectively resists attacks from other point cloud datasets | Unsuitable for defending against black-box attacks |
      | | 27 | Networks based on discrete differential geometry | Preserves features and geometric details | Incomplete datasets are not considered |
      | | 29 | Patch correlation unit and position correction unit | Considers noise and outliers in practical applications | The patch selection strategy affects the stability of the algorithm |
      | | 30 | Graph attention convolution and edge-aware node caching | Fine-grained edge detail is preserved with high quality | GAC modules increase computational complexity |
      | Filter-based | 31 | Edge-aware integrated network | Suitable for dense point clouds with structure-invariant scale | Training time is long |
      | | 32 | Projection denoising method based on neural network | Direct point cloud denoising using deep learning techniques | Needs enough training samples |
      | | 37 | Adds repulsion term and data term to the objective function | Capable of handling fine-scale and sharp features | Depends on the quality of the input normals |
      | | 38 | Outlier recognizer and denoiser | Identifies and removes points that are far from the surface | Runtime can still be optimized |
      | Gradient-based | 39 | Score estimation network | More robust to outliers | The gradient is discontinuous |
      | | 41 | Momentum gradient ascent | The gradient field is continuous | Needs an effective global gradient field |
      | | 42 | GPCD++ network framework | Lightweight UniNet network | Cannot handle large holes |
      | Other methods | 43 | Channel attention module | Stitches local features of point clouds at multiple scales | The capture of neighborhood feature information is biased |
      | | 44 | Hybrid self-attention network | Enhances local information through Transformer | Longer training time |
      | | 48 | Unsupervised machine learning | Detects outliers by isolation forests and elliptical envelopes | High time complexity |
      | | 49 | Transformer-based | Extracts multi-scale local features | High computational complexity |
    • Table 3. Comparison of point cloud lossless compression methods based on deep learning

      | Type | Ref. | Specific structure | Contribution | Limitation |
      | Octree-based | 50 | Octree encoding | Entropy model trained by a network | Neighborhood information not used |
      | | 51 | Multi-context deep learning | Uses features of sibling nodes | Decoding speed could be accelerated |
      | Hybrid representation | 52 | Voxel-context compression with octree structure | Suitable for static and dynamic point cloud compression | Higher-resolution features are ignored |
      | | 53 | Deep autoregressive generative models | Applies autoregressive generative models to 3D | Long encoding and decoding time |
      | | 54 | Multiscale deep context model | Parallel voxel prediction | Poor effect on sparse point clouds |
      | Other methods | 55 | Learned conditional probability model | Captures features and relationships of point clouds via sparse tensors | Runtime is highly dependent on the number of occupied blocks |
      | | 56 | Combination of multi-scale and sparse convolutional networks | Uses cross-scale, cross-group, and cross-color correlations to approximate attribute probabilities | Algorithm complexity grows as prediction modules are added |
    • Table 4. Comparison of point cloud lossy compression methods based on deep learning

      | Type | Ref. | Specific structure | Contribution | Limitation |
      | Octree-based | 57 | Learned approximation model based on neural network | Uses octree partitioning to divide the point cloud into equally sized patches | Long training time |
      | | 58 | Multiscale end-to-end network | Learns point cloud features by sparse convolution | Noise can affect performance |
      | Voxel-based | 59 | Variational autoencoders based on neural networks | Applies stacked 3D convolutions in a variational autoencoder structure | Convolution efficiency needs improvement |
      | Autoencoder-based | 62 | Encoding method based on CNN | Extends deep learning coding methods | Long encoding and decoding time |
      | | 63 | Deep autoencoders with hierarchical structure | Multi-scale layered encoder obtains features at each level | Can only handle small, fixed-size point clouds |
      | | 66 | Convolutional autoencoders | Enhanced encoding robustness and more flexible decoding | Rate distortion |
      | | 67 | Compression with spatial and temporal redundancy | Increased compression ratio and compression speed | High computational cost |
      | Other methods | 69 | Folding-based network | Folds the 3D manifold onto the image | Unsuitable for point clouds with complex geometries |
      | | 73 | End-to-end TransPCC framework | Learns complex relationships between points via a self-attention structure | Computational efficiency needs improvement |
      | | 74 | Multi-scale local self-attention mechanism | Captures high-level features in dynamic local neighborhoods | Model running speed still needs optimization |
      | | 75 | Transformer network model based on attention mechanism | Uses the Transformer to enhance point spatial feature perception | Long encoding and decoding time |
    • Table 5. Comparison of point cloud super-resolution methods based on deep learning

      | Type | Ref. | Specific structure | Contribution | Limitation |
      | CNN-based | 76 | Multi-level feature aggregation | Good anti-noise performance | Unable to fill large holes |
      | | 77 | Point cloud density-enhancing convolutional network | Enhances point cloud density with SRCNN | Point cloud density increase is small |
      | | 78 | Based on single LiDAR | Eliminates dependency on a camera | Sensitive to outliers |
      | | 80 | Channel-based attention network | Uses circular filling to solve edge recovery issues | Needs more reasonable evaluation indicators |
      | GCN-based | 82 | Graph convolutional network | Fewer network parameters | Computational cost increases |
      | | 83 | Dynamic residual graph convolutional networks | Learns local geometric features by multilayer graph convolution | Sensitive to rotated point clouds |
      | | 84 | Double-channel graph convolutional network | Applies feature similarity to construct local graphs of point clouds | Computational complexity increases |
      | GAN-based | 85 | Based on GAN | Robust to noisy and sparse point clouds | Unsuitable for filling large gaps |
      | | 86 | Adversarial residual graph network | Obtains features via a graph adversarial loss function | Cannot repair large holes or missing parts |
      | | 87 | “Zero-shot” point cloud upsampling network | Reduced training time | Complex regions are still mismapped |
      | Other structures | 88 | Progressive point set upsampling network | The generated point cloud is smoother and more complete | Difficult to handle sparse, low-quality point clouds |
      | | 89 | Face point cloud super-resolution network | Predicts high-resolution data from low-resolution data | The preprocessing stage is not part of the super-resolution network |
      | | 90 | Based on the Transformer | Different types of data can be upsampled | Consumes more network parameters |
    • Table 6. Comparison of point cloud restoration, completion and reconstruction methods based on deep learning

      | Type | Ref. | Specific structure | Contribution | Limitation |
      | Image-based | 94 | Point cloud deformation network | Invariant to disordered point clouds | Lacks some details |
      | | 95 | CNN | Efficient and scalable | Lacks projection information |
      | Sampling-based | 88 | Multi-step upsampling network | Robust to noisy and sparse inputs | Unsuitable for sparse point clouds |
      | | 97 | Data-driven | Generates more accurate upsampling with lower chamfer loss | Sampling of unknown features degrades |
      | | 98 | Feature reshaping | The generated point cloud is smoother and more complete | Difficult to handle sparse input |
      | Completion-based | 100 | Learning-based shape completion | Robust to occlusion and noise | Unclear whether the output preserves the input points |
      | | 103 | Multi-scale generative network based on feature points | The spatial arrangement of the point cloud is preserved | Only part of the missing region is predicted |
      | | 104 | Cascaded refinement network | Retains more details | Occlusion leads to large errors |
      | | 105 | Skip-attention network | High-quality point cloud restoration | Computational efficiency still needs optimization |
      | | 111 | Normalized matrix attention Transformer | Integrates features from different channels and neighborhoods | High computational complexity |
    • Table 7. Common datasets for point cloud processing tasks based on deep learning

      | Dataset | Ref. | Year | Website |
      | KITTI | 113 | 2012 | http://www.cvlibs.net/datasets/kitti |
      | Paris-rue-Madame | 114 | 2014 | https://people.cmm.minesparis.psl.eu/users/serna/rueMadameDataset.html |
      | SHREC15 | 115 | 2015 | https://www.icst.pku.edu.cn/zlian/representa/3d15/index.htm |
      | ModelNet | 116 | 2015 | http://modelnet.cs.princeton.edu/ |
      | ShapeNet | 117 | 2015 | https://shapenet.org/ |
      | vKITTI | 118 | 2016 | https://europe.naverlabs.com/Research/Computer-Vision/Proxy-Virtual-Worlds/ |
      | ShapeNet Part | 119 | 2016 | https://cs.stanford.edu/~ericyi/project_page/part_annotation/ |
      | S3DIS | 120 | 2016 | http://buildingparser.stanford.edu/dataset.html |
      | MVUB | | 2016 | http://plenodb.jpeg.org/pc/microsoft/ |
      | 8iVFB | | 2017 | http://plenodb.jpeg.org/pc/8ilabs/ |
      | 3DMatch | 121 | 2017 | http://3Dmatch.cs.princeton.edu/#rgbd-reconstruction-datasets |
      | ScanNet | 122 | 2017 | http://www.scan-net.org/ |
      | Matterport3D | 123 | 2017 | https://niessner.github.io/Matterport/ |
      | PU-Net | 76 | 2018 | https://drive.google.com/file/d/1R21MD1O6q8E7ANui8FR0MaABkKc30PG4/view |
      | PCN | 100 | 2018 | https://drive.google.com/drive/folders/1M_lJN14Ac1RtPtEQxNlCV9e8pom3U6Pa |
      | PU-GAN | 85 | 2020 | https://drive.google.com/file/d/1BNqjidBVWP0_MUdMTeGy1wZiR6fqyGmC/view?pli=1 |
      | SemanticKITTI | 124 | 2019 | http://semantic-kitti.org/ |
      | MPEG PCC | 125 | 2018 | https://mpeg-pcc.org/ |
      | nuScenes | 126 | 2020 | https://nuscenes.org/ |
      | Waymo | 127 | 2020 | https://waymo.com/open/ |
      | PCNet | 35 | 2020 | https://nuage.lix.polytechnique.fr/index.php/s/xSRrTNmtgqgeLGa |
      | PU1K | 82 | 2021 | https://drive.google.com/file/d/1oTAx34YNbL6GDwHYL2qqvjmYtTVWcELg/view |
    • Table 8. Common evaluation indicators for point cloud processing tasks

      | Task | Accuracy | Distance | Similarity | Others |
      | PCD | Precision, recall, F-score, RMSE, MAE | CD, EMD, HD | PSNR | P2M |
      | PCC | Precision, recall, F-score, RMSE, MAE | CD, EMD, HD | PSNR | BPP, time |
      | PCSR | Precision, recall, F-score, RMSE, MAE | CD, EMD, HD | SSIM, PSNR | P2F, NUC |
      | PCR | Precision, recall, F-score, RMSE, MAE | CD, EMD, HD | PSNR | |
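Two of the distance indicators in Table 8, the Chamfer distance (CD) and the Hausdorff distance (HD), can be computed directly from point coordinates. A minimal NumPy sketch, assuming the common symmetric, averaged CD definition (the exact normalization varies across the cited papers):

```python
import numpy as np

def pairwise_dist(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """All Euclidean distances between an (N,3) and an (M,3) point cloud."""
    return np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)

def chamfer_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Symmetric CD: mean nearest-neighbor distance in both directions."""
    d = pairwise_dist(a, b)
    return float(d.min(axis=1).mean() + d.min(axis=0).mean())

def hausdorff_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Symmetric HD: worst-case nearest-neighbor distance in both directions."""
    d = pairwise_dist(a, b)
    return float(max(d.min(axis=1).max(), d.min(axis=0).max()))
```

EMD is omitted here because it requires solving an optimal assignment between the two clouds rather than a nearest-neighbor search; the O(NM) dense distance matrix above is also only practical for small clouds.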
    • Table 9. Performance comparison of classic point cloud denoising methods on PU-Net and PCNet datasets

      Evaluation indices for point clouds with 10000 points (sparse):

      | Dataset | Method | CD, 1% noise | CD, 2% noise | CD, 3% noise | P2M, 1% noise | P2M, 2% noise | P2M, 3% noise |
      | PU-Net | PCNet[35] | 3.515 | 7.467 | 13.067 | 1.148 | 3.965 | 8.737 |
      | | GPDNet[19] | 3.78 | 8.007 | 13.482 | 1.337 | 4.426 | 9.114 |
      | | DMR[46] | 4.482 | 4.982 | 5.892 | 1.722 | 2.115 | 2.846 |
      | | Score-based[39] | 2.521 | 3.686 | 4.708 | 0.463 | 1.074 | 1.942 |
      | | PSR[40] | 2.353 | 3.35 | 4.075 | 0.306 | 0.734 | 1.242 |
      | | GPCD++[42] | 1.881 | 2.728 | 3.433 | 0.251 | 0.654 | 1.161 |
      | PCNet | PCNet[35] | 3.847 | 8.752 | 14.525 | 1.221 | 3.043 | 5.873 |
      | | GPDNet[19] | 5.47 | 10.006 | 15.521 | 1.973 | 3.65 | 6.353 |
      | | DMR[46] | 6.602 | 7.145 | 8.087 | 2.152 | 2.237 | 2.487 |
      | | Score-based[39] | 3.369 | 5.132 | 6.776 | 0.83 | 1.195 | 1.941 |
      | | PSR[40] | 2.873 | 4.757 | 6.031 | 0.783 | 1.118 | 1.619 |
      | | GPCD++[42] | 2.813 | 4.195 | 5.385 | 0.759 | 0.893 | 1.333 |

      Evaluation indices for point clouds with 50000 points (dense):

      | Dataset | Method | CD, 1% noise | CD, 2% noise | CD, 3% noise | P2M, 1% noise | P2M, 2% noise | P2M, 3% noise |
      | PU-Net | PCNet[35] | 1.049 | 1.447 | 2.289 | 0.346 | 0.608 | 1.285 |
      | | GPDNet[19] | 1.913 | 5.021 | 9.705 | 1.037 | 3.736 | 7.998 |
      | | DMR[46] | 1.162 | 1.566 | 2.432 | 0.469 | 0.8 | 1.528 |
      | | Score-based[39] | 0.716 | 1.288 | 1.928 | 0.15 | 0.566 | 1.041 |
      | | PSR[40] | 0.649 | 0.997 | 1.344 | 0.076 | 0.296 | 0.531 |
      | | GPCD++[42] | 0.505 | 0.852 | 1.198 | 0.073 | 0.303 | 0.534 |
      | PCNet | PCNet[35] | 1.293 | 1.913 | 3.249 | 0.289 | 0.505 | 1.076 |
      | | GPDNet[19] | 5.31 | 7.709 | 11.941 | 1.716 | 2.859 | 5.13 |
      | | DMR[46] | 1.566 | 2.009 | 2.933 | 0.35 | 0.485 | 0.859 |
      | | Score-based[39] | 1.066 | 1.659 | 2.494 | 0.177 | 0.354 | 0.657 |
      | | PSR[40] | 1.01 | 1.515 | 2.093 | 0.146 | 0.34 | 0.573 |
      | | GPCD++[42] | 0.857 | 1.344 | 1.92 | 0.132 | 0.331 | 0.53 |
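The "1%/2%/3% noise" levels in Table 9 follow the convention common in the cited denoising papers (stated here as an assumption for concreteness): isotropic Gaussian perturbation whose standard deviation is that percentage of the point cloud's bounding-box diagonal. A sketch:

```python
import numpy as np

def add_gaussian_noise(points, percent, rng=None):
    """Perturb an (N,3) cloud with Gaussian noise whose standard deviation
    is `percent`% of the bounding-box diagonal (assumed convention).
    A fixed seed is used by default so results are reproducible."""
    rng = np.random.default_rng(0) if rng is None else rng
    diag = np.linalg.norm(points.max(axis=0) - points.min(axis=0))
    return points + rng.normal(scale=percent / 100.0 * diag, size=points.shape)
```

Under this convention the noise scale is shape-relative, so results are comparable across models of different physical sizes.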
    • Table 10. Average bits per point (bpp) results of classic point cloud lossless compression methods

      Microsoft Voxelized Upper Bodies (MVUB) dataset:

      | Method | Phil9 | Phil10 | Ricardo9 | Ricardo10 | Average |
      | Frame count | 245 | 245 | 216 | 216 | |
      | G-PCC[128] | 1.23 | 1.07 | 1.04 | 1.07 | 0.95 |
      | VoxelDNN[53] | 0.92 | 0.83 | 0.72 | 0.75 | 0.81 |
      | MSVoxelDNN[54] | — | 1.02 | — | 0.95 | 0.99 |
      | OctAttention[51] | 0.83 | 0.79 | 0.72 | 0.72 | 0.76 |

      8i Voxelized Full Bodies (8iVFB) dataset:

      | Method | Loot10 | Redandblack10 | Boxer9/10 | Thaidancer9/10 | Average |
      | Frame count | 300 | 300 | 1 | 1 | |
      | G-PCC[128] | 0.95 | 1.09 | 0.96/0.94 | 0.99/0.99 | 0.99 |
      | VoxelDNN[53] | 0.64 | 0.73 | 0.76/— | 0.81/— | 0.73 |
      | MSVoxelDNN[54] | 0.73 | 0.87 | —/0.70 | —/0.85 | 0.79 |
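The bpp figures in Table 10 are simply the compressed bitstream size divided by the number of points in the frame; a trivial sketch of the computation (function name is illustrative):

```python
def bits_per_point(compressed_bytes: int, num_points: int) -> float:
    """bpp = total compressed bits / number of points in the cloud."""
    return compressed_bytes * 8 / num_points

# e.g. a hypothetical 100 kB bitstream for a 1-million-point frame:
print(bits_per_point(100_000, 1_000_000))  # 0.8 bpp
```

For lossless geometry coding, lower bpp at identical reconstruction is the whole comparison; that is why Table 10 needs no distortion column.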
    • Table 11. Comparison of average encoding and decoding time for different point cloud lossy compression methods

      | Method | 8iVFB encoding time /s | 8iVFB decoding time /s | KITTI encoding time /s | KITTI decoding time /s | MVUB encoding time /s | MVUB decoding time /s |
      | G-PCC (octree)[128] | 1.6 | 0.6 | 0.73 | 0.07 | — | — |
      | G-PCC (trisoup)[128] | 8.1 | 6.6 | 2.06 | 1.10 | — | — |
      | G-PCC v8[128] | 1.30 | 0.55 | — | — | — | — |
      | Learned-PCGC[59] | 9.3 | 9.5 | — | — | — | — |
      | PCGCv2[58] | 1.6 | 5.4 | 0.53 | 0.18 | — | — |
      | SparsePCGC[72] | 1.44 | 1.32 | — | — | — | — |
      | PCGFormer[74] | — | — | 0.87 | 0.51 | — | — |
    • Table 12. Performance comparison of different point cloud super-resolution methods on PU-Net dataset

      | Method | CD /10⁻³ | HD /10⁻³ | P2F μ /10⁻³ | P2F σ /10⁻³ | NUC 0.4% /10⁻³ | Epoch | Time | Parameter quantity /10³ |
      | PU-Net[76] | 0.38 | 3.67 | 8.19 | 6.65 | 6.36 | 120 | 4.5 h | 814 |
      | AR-GCN[86] | 0.23 | 1.78 | 3.02 | 3.52 | 1.29 | 120 | 6.2 h | 822 |
      | MPU[88] | 0.21 | 1.90 | 1.72 | 2.21 | 1.32 | 400 | 27 h | 304 |
      | PU-GAN[85] | 0.17 | 1.76 | 1.05 | 1.92 | 0.55 | 100 | 25 h | 684 |
      | PU-GCN[82] | 0.26 | 2.62 | 2.15 | 3.01 | 1.75 | 100 | 9 h | 542 |
      | ZSPU[87] | 0.19 | 1.11 | 2.12 | 2.21 | 2.24 | 50 | 96 s | 310 |
    • Table 13. Performance comparison of different point cloud super-resolution methods on PU1K dataset

      | Method | CD /10⁻³ | HD /10⁻³ | P2F /10⁻³ | Epoch | Time /(10⁻³ s) | Parameter quantity /10³ | Model /MB |
      | PU-Net[76] | 1.155 | 15.170 | 4.834 | 100 | 8.4 | 812.0 | 10.1 |
      | MPU[88] | 0.935 | 13.327 | 3.551 | 100 | 8.3 | 76.2 | 6.2 |
      | PU-GCN[82] | 0.585 | 7.577 | 2.499 | 100 | 8.0 | 76.0 | 1.8 |
      | PU-Transformer[90] | 0.451 | 3.843 | 1.277 | 100 | 9.9 | 969.9 | 18.4 |
    • Table 14. Performance comparison of different point cloud restoration, completion and reconstruction methods

      Mean chamfer distance per point on PCN dataset /10⁻³:

      | Method | Average | Airplane | Cabinet | Car | Chair | Lamp | Sofa | Table | Vessel |
      | PCN[100] | 9.64 | 5.50 | 10.63 | 8.70 | 11.00 | 11.34 | 11.68 | 8.59 | 9.67 |
      | TopNet[101] | 9.89 | 6.24 | 11.63 | 9.83 | 11.50 | 9.37 | 12.35 | 9.36 | 8.85 |
      | CRN[104] | 8.51 | 4.79 | 9.97 | 8.31 | 9.49 | 8.94 | 10.69 | 7.81 | 8.05 |
      | AGFA-Net[109] | 6.76 | 3.89 | 9.03 | 7.68 | 7.18 | 5.52 | 8.72 | 6.18 | 5.91 |

      Chamfer distance per point on ShapeNet dataset /10⁻⁴:

      | Method | Average | Airplane | Cabinet | Car | Chair | Lamp | Sofa | Table | Vessel |
      | PCN[100] | 14.72 | 8.09 | 18.32 | 10.53 | 19.33 | 18.52 | 16.44 | 16.34 | 10.21 |
      | TopNet[101] | 9.72 | 5.50 | 12.02 | 8.90 | 12.56 | 9.54 | 12.20 | 9.57 | 7.51 |
      | SA-Net[105] | 7.74 | 2.18 | 9.11 | 5.56 | 8.94 | 9.98 | 7.83 | 9.94 | 7.23 |
    Citation: Yiquan Wu, Huixian Chen, Yao Zhang. Review of 3D Point Cloud Processing Methods Based on Deep Learning[J]. Chinese Journal of Lasers, 2024, 51(5): 0509001
    Paper Information

    Category: holography and information processing

    Received: Jun. 19, 2023

    Accepted: Aug. 11, 2023

    Published Online: Mar. 1, 2024

    The Author Email: Yiquan Wu (nuaaimage@163.com)

    DOI:10.3788/CJL230924

    CSTR:32183.14.CJL230924
