Acta Optica Sinica, Volume. 45, Issue 8, 0815001(2025)
Three-Dimensional Reconstruction Method of AA-HD Neural Radiation Field Based on PE-FNF and Point Cloud Prior
As a technology that transforms real-world objects or scenes into digital models, three-dimensional (3D) reconstruction plays a crucial role in today's technological landscape. With the rapid development of computer vision, graphics, machine learning, and related fields, vision-based 3D reconstruction has become widely applicable in areas such as artificial intelligence, autonomous driving, robotics, simultaneous localization and mapping (SLAM), and virtual reality, owing to its speed and excellent real-time performance. In recent years, many researchers have integrated traditional geometry-based multi-view techniques with deep-learning-based 3D reconstruction methods, using neural networks to implicitly represent 3D scenes and combining them with computer graphics rendering to complete the scene reconstruction task. However, 3D reconstruction with a neural radiance field (NeRF) can lose detail and produce jagged edges when the sampling density is insufficient, resulting in poor image quality. It is therefore crucial to develop a reconstruction method with higher quality and stronger anti-aliasing capability, enabling faster, higher-quality scene reconstruction and a better practical user experience.
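For context, the standard NeRF pipeline maps each input coordinate through a sinusoidal positional encoding before it reaches the MLP; the frequency count of this encoding is precisely what the proposed filter adjusts per viewpoint. The following is a minimal sketch of the standard encoding only (not the paper's filtered variant); the function name and NumPy implementation are illustrative choices, not taken from the paper.

```python
import numpy as np

def positional_encoding(p, num_freqs):
    """Standard NeRF positional encoding gamma(p): each coordinate is
    mapped to [sin(2^k * pi * p), cos(2^k * pi * p)] for k = 0..num_freqs-1,
    so an input of dimension D becomes dimension 2 * D * num_freqs."""
    p = np.asarray(p, dtype=np.float64)
    freqs = (2.0 ** np.arange(num_freqs)) * np.pi   # 2^k * pi for each band
    angles = p[..., None] * freqs                   # shape (..., D, num_freqs)
    enc = np.concatenate([np.sin(angles), np.cos(angles)], axis=-1)
    return enc.reshape(*p.shape[:-1], -1)           # flatten per-point features
```

With a larger `num_freqs`, the encoding captures higher-frequency detail but is more prone to aliasing when samples are sparse, which is the trade-off the viewpoint-dependent filter targets.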
Two public datasets, LLFF and Realistic Synthetic 360°, are used for experimentation. The images are arranged in order, and one out of every seven is selected as test data. Five scenes are randomly selected for the experiments, which are conducted on the AutoDL AI cloud platform with the following setup: Ubuntu 20.04 on a 12-core Xeon Silver 4214R CPU with two RTX 3080 GPUs, using the PyTorch deep learning framework in a Python 3.10 environment. The parallel computing capability of CUDA 11.5 is used to accelerate execution. The position encoding frequency filter screens the frequency numbers corresponding to each viewpoint; this filtering is performed before training and does not affect the model's inference speed. The point cloud prior replaces the coarse sampling layer in NeRF to accelerate training. In the experimental setup, 128 points are non-uniformly sampled along each ray to capture scene information efficiently and accurately, with 1024 rays sampled per batch. The mean squared error loss function and the Adam optimizer are used, with each scene trained for 150,000 iterations. To rigorously assess new-view reconstruction performance, three common image quality metrics (PSNR, SSIM, and LPIPS) are used. PSNR evaluates the pixel-level error between the reconstructed and original images, with higher values indicating better fidelity in terms of signal purity and noise suppression. SSIM measures structural similarity, with higher values indicating that the reconstructed image better retains the original image's layout and texture. LPIPS evaluates human perceptual similarity, with lower values indicating closer perceptual agreement with the original. To assess the effectiveness of the proposed method, ablation experiments are conducted for each module.
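The training loss and the first metric are directly related: PSNR is a logarithmic transform of the mean squared error that training minimizes. A minimal sketch of that relationship, for images scaled to [0, 1]:

```python
import numpy as np

def mse(img_a, img_b):
    """Mean squared error between two images of the same shape."""
    return np.mean((np.asarray(img_a, float) - np.asarray(img_b, float)) ** 2)

def psnr(img_a, img_b, max_val=1.0):
    """Peak signal-to-noise ratio in dB: 10 * log10(max_val^2 / MSE).
    Lower MSE (better reconstruction) gives a higher PSNR."""
    err = mse(img_a, img_b)
    if err == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(max_val ** 2 / err)
```

For example, a uniform pixel error of 0.1 gives an MSE of 0.01 and hence a PSNR of 20 dB; halving the pixel error adds about 6 dB. SSIM and LPIPS are more involved (windowed statistics and a learned network, respectively) and are typically taken from existing library implementations rather than reimplemented.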
In experimental group A, only the position encoding frequency filter is used; group B adds the gating channel transformation module to the multilayer perceptron (MLP); group C uses only the point cloud prior; and group D combines all three: the position encoding frequency filter, the gating channel transformation module, and the point cloud prior. To evaluate the anti-aliasing and detail improvements, comparisons are made among NeRF, MIP-NeRF, and the proposed method. Five scenes from the LLFF and Realistic Synthetic 360° datasets are selected for comparative testing under the same experimental parameters.
The anti-aliasing, high-detail (AA-HD) neural radiance field algorithm, based on the position encoding frequency number filter (PE-FNF) and a point cloud prior, is shown in Fig. 2. Sparse 3D space points are obtained with the structure-from-motion (SfM) algorithm from the given images and the corresponding pose of each dataset. The geometric center is calculated to fit the center position of the reconstructed object, and the distance between each camera pose and the geometric center is used to estimate the position encoding frequency number. This frequency number is then dynamically adjusted to enhance the detailed expression of the sampling points. In addition, the SfM algorithm generates sparse point clouds from the continuous images of the reconstructed object, which serve as the point cloud prior, eliminating the need for coarse sampling in NeRF. This accelerates reconstruction while effectively capturing spatial points at different depths; even with significant viewpoint changes, the reconstruction remains effective. During feature inference, a gated channel transformation MLP module is introduced to capture feature information among high-frequency signals. Key high-weight features are finely filtered, further enhancing the details of the reconstructed object. The ablation experiments in Table 1 and Fig. 10 demonstrate the positive effect of each module both quantitatively and qualitatively, with improvements in all three evaluation metrics. In comparative experiments, the proposed algorithm outperforms the original NeRF in detail and anti-aliasing (Figs. 11 and 12). Compared with NeRF and MIP-NeRF, it achieves the best average values for PSNR, SSIM, and LPIPS (Tables 2 and 3). In addition, its computational cost is significantly lower than that of both NeRF and MIP-NeRF (Table 4).
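The first two steps of this pipeline, computing the geometric center of the SfM point cloud and mapping the camera-to-center distance to an encoding frequency number, can be sketched as follows. The linear distance-to-frequency mapping and all parameter names (`d_near`, `d_far`, `l_min`, `l_max`) are illustrative assumptions; the paper's exact estimation rule is not reproduced here.

```python
import numpy as np

def geometric_center(points):
    """Centroid of the sparse SfM point cloud, used to fit the
    center position of the reconstructed object."""
    return np.mean(np.asarray(points, dtype=float), axis=0)

def frequency_number(cam_pos, center, d_near, d_far, l_min=4, l_max=10):
    """Illustrative mapping (not the paper's exact rule): a camera
    closer to the object resolves finer detail, so it is assigned
    more positional-encoding frequency bands; a distant camera
    gets fewer bands, which suppresses aliasing."""
    d = np.linalg.norm(np.asarray(cam_pos, dtype=float) - center)
    t = np.clip((d - d_near) / (d_far - d_near), 0.0, 1.0)  # 0 = near, 1 = far
    return int(round(l_max - t * (l_max - l_min)))
```

The returned frequency number would then select how many bands of the positional encoding are kept for rays from that viewpoint, before any per-sample dynamic adjustment.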
In this paper, we propose an optimized version of the original NeRF, improving its sampling, position encoding, and MLP structure. We introduce an AA-HD neural radiance field method based on the PE-FNF and a point cloud prior. The detailed representation of the reconstruction target is enhanced by the position encoding frequency filter and the gating channel transformation, while the point cloud prior improves the method's anti-aliasing ability and reduces computational cost. Ablation experiments demonstrate the effectiveness of each module and of the overall method. Comparative experiments on the Realistic Synthetic 360° and LLFF public datasets show that the proposed method outperforms NeRF in new-view reconstruction. Specifically, on the Realistic Synthetic 360° dataset, the three image quality metrics (PSNR, SSIM, and LPIPS) improve by 3.77%, 3.01%, and 27.59%, respectively; on the LLFF dataset, the improvements are 16.36%, 10.87%, and 31.33%, respectively, and qualitative results further confirm the effectiveness of the method.
Tao Song, Hongyao Tang, Bin Xing, Yichen Yang, Yao Xiao, Shengjie Lei, Jianxu Wang, Zourong Long. Three-Dimensional Reconstruction Method of AA-HD Neural Radiation Field Based on PE-FNF and Point Cloud Prior[J]. Acta Optica Sinica, 2025, 45(8): 0815001
Category: Machine Vision
Received: Dec. 23, 2024
Accepted: Feb. 14, 2025
Published Online: Apr. 27, 2025
The Author Email: Song Tao (tsong@cqut.edu.cn)
CSTR:32393.14.AOS241920