Acta Optica Sinica, Volume 44, Issue 19, 1915001 (2024)
Reconstruction of Dynamic Human Neural Radiance Fields Based on Monocular Vision
The three-dimensional representation and reconstruction of dynamically deformed human bodies is a significant research direction in computer graphics and computer vision. It aims to represent, reconstruct, and render the human body from dynamic videos or image sequences. Current methods for reconstructing dynamically deformed human bodies require precisely synchronized multi-camera rigs and depth cameras to capture non-rigid body deformations and perform three-dimensional reconstruction, so reconstructing a dynamically deformed human body from a monocular camera remains a challenging yet practical research problem. As a crucial component of dynamic human body reconstruction, geometric representation falls into two categories: explicit and implicit. Most existing methods focus on explicit representation but are constrained by its inherently discrete nature and often fail to capture detailed deformation information. Moreover, these methods typically rely on equipment such as synchronized multi-view acquisition systems or depth cameras, which increases technical complexity and reduces feasibility, thus limiting the advancement and application of dynamic human body reconstruction. Given the heavy reliance on multi-view synchronous acquisition and the scarcity of research on joint dynamic and static reconstruction, we propose a dynamic human neural radiance field reconstruction method based on monocular vision. By introducing neural radiance fields to implicitly represent the static background and the dynamic human body, the loss of detail inherent in discrete explicit representations is effectively addressed, and the challenge of jointly reconstructing dynamic and static models is overcome with the segment anything model (SAM), a large segmentation model.
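For reference, a neural radiance field renders images with the standard discretized volume rendering equation; the notation below follows the original NeRF formulation rather than this paper's specific symbols. Along a camera ray sampled at N points with densities \sigma_i, colors \mathbf{c}_i, and inter-sample distances \delta_i,

\hat{C}(\mathbf{r}) = \sum_{i=1}^{N} T_i \bigl(1 - e^{-\sigma_i \delta_i}\bigr)\,\mathbf{c}_i, \qquad T_i = \exp\Bigl(-\sum_{j=1}^{i-1} \sigma_j \delta_j\Bigr).

Because this compositing is differentiable with respect to the densities and colors, the static background field and the dynamic human field described below can be optimized jointly from photometric losses.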
We utilize monocular camera data to perform three-dimensional reconstruction of dynamically deformed human bodies. We propose a neural radiance field representation for dynamically deformed human bodies, a joint dynamic and static scene reconstruction of the neural radiance field, and its rendering technique. Leveraging the neural radiance field and a human body parametric model, we establish a dynamic deformation neural radiance field for the human body. The parametric model is fitted to the dynamic human body in the video, and a deformation field maps the dynamic body from camera space to a standardized static (canonical) space. A geometric correction network compensates for inaccuracies between the parametric model and the human body in the scene. The segment anything model (SAM) is employed to decompose the scene radiance field into dynamic and static parts, using two-dimensional joints as prompts for precise extraction of the human body mask (a prompting sketch is given after this paragraph). Guided by the human body mask, the scene radiance field is split into a static background neural radiance field and a dynamic human body neural radiance field. The differentiable nature of volume rendering enables the joint reconstruction of both neural radiance fields. Finally, arbitrary viewing angles and human body poses are rendered through volume rendering of the neural radiance fields.
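As a concrete illustration of the mask extraction step, the following is a minimal sketch (not the authors' released code) of how two-dimensional joints could serve as foreground point prompts to SAM; the checkpoint path, model variant, and the source of the 2-D joints are assumptions.

import numpy as np
from segment_anything import SamPredictor, sam_model_registry

def human_mask_from_joints(image: np.ndarray, joints_2d: np.ndarray) -> np.ndarray:
    """image: H x W x 3 uint8 RGB frame; joints_2d: J x 2 pixel coordinates of 2-D joints."""
    # Load a SAM backbone (checkpoint path is a placeholder).
    sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth")
    predictor = SamPredictor(sam)
    predictor.set_image(image)
    # Treat every joint as a positive (foreground) point prompt.
    masks, scores, _ = predictor.predict(
        point_coords=joints_2d.astype(np.float32),
        point_labels=np.ones(len(joints_2d), dtype=np.int64),
        multimask_output=True,
    )
    # Keep the highest-scoring mask as the human body mask for this frame;
    # rays inside the mask supervise the dynamic human field, the rest
    # supervise the static background field.
    return masks[int(np.argmax(scores))]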
We present a monocular vision-based dynamic human body neural radiance field reconstruction method that integrates the neural radiance field with a human body parametric model. A comparative analysis with existing methods is provided, with results illustrated in Figs. 5-8 and Table 1. The approach combines the neural radiance field with SAM to reconstruct the static background, effectively removing the human body. For human body reconstruction, the method not only generates free-viewpoint images but also renders novel dynamic human poses against the static background. Experimental results validate the method's ability to accurately capture details of dynamically deforming human bodies and scenes, demonstrating high fidelity and precision in reconstructing dynamic human bodies and static backgrounds.
We introduce a monocular vision-based dynamic human neural radiance field reconstruction technique that represents the static background and the dynamic human body via neural radiance fields. Using dynamic human body videos captured by a monocular camera, the method incorporates the SAM segmentation model and neural radiance fields to effectively separate the scene into static and dynamic components. By training the dynamic human body and the static background separately with neural radiance fields, joint dynamic and static reconstruction is achieved. Experimental results show that, compared with existing human body reconstruction methods, the proposed method achieves joint reconstruction of dynamic human bodies and static scenes with high fidelity and accuracy from monocular visual input. This reduces the prevalent reliance on multi-view synchronous acquisition in human body reconstruction and opens new pathways for applications in virtual reality, film production, and robotics. However, slow neural radiance field training remains a common limitation. Future work will aim to accelerate training, refine algorithm performance, and broaden the applicable scenarios.
Chao Sun, Jun Qiu, Lina Wu, Chang Liu. Reconstruction of Dynamic Human Neural Radiance Fields Based on Monocular Vision[J]. Acta Optica Sinica, 2024, 44(19): 1915001
Category: Machine Vision
Received: Apr. 7, 2024
Accepted: May 13, 2024
Published Online: Oct. 12, 2024
The Author Email: Liu Chang (liu.chang.cn@ieee.org)