Acta Optica Sinica, Volume. 44, Issue 19, 1915001(2024)

Reconstruction of Dynamic Human Neural Radiance Fields Based on Monocular Vision

Chao Sun, Jun Qiu, Lina Wu, and Chang Liu*
Author Affiliations
  • Institute of Applied Mathematics, Beijing Information Science and Technology University, Beijing 100101, China

    Objective

    The three-dimensional representation and reconstruction of dynamically deforming human bodies is a significant research direction in computer graphics and computer vision. It aims to represent, reconstruct, and render the human body from dynamic videos or image sequences. Current reconstruction methods require high-precision synchronization of multiple cameras or depth cameras to capture non-rigid body deformations and perform three-dimensional reconstruction, so reconstructing a dynamically deforming human body with a monocular camera is a challenging yet practical research problem. Geometric representation, a crucial component of dynamic human body reconstruction, falls into two categories: explicit and implicit. Most existing methods rely on explicit representation, but its inherently discrete nature means they often struggle to capture detailed deformation. Moreover, these methods typically depend on equipment such as synchronized multi-view acquisition systems or depth cameras, which increases technical complexity, reduces feasibility, and thus limits the advancement and application of dynamic human body reconstruction. Given the heavy reliance on multi-view synchronous acquisition and the scarcity of research on joint dynamic and static reconstruction, we propose a dynamic human neural radiance field reconstruction method based on monocular vision. Introducing neural radiance fields to implicitly represent the static background and the dynamic human body effectively addresses the problem of poor reconstruction quality, and the segment anything model (SAM) overcomes the challenge of jointly reconstructing the dynamic and static models.
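
    As background for the implicit representation adopted here, the sketch below shows the basic form of a neural radiance field: a small multilayer perceptron that maps a 3D position and viewing direction to a volume density and a color. The layer sizes and the positional-encoding helper are generic assumptions for illustration, not the authors' network.

```python
import torch
import torch.nn as nn

def positional_encoding(x, num_freqs=6):
    # Lift coordinates to sinusoids of increasing frequency (standard NeRF-style encoding).
    freqs = 2.0 ** torch.arange(num_freqs, dtype=x.dtype, device=x.device)
    parts = [x]
    for f in freqs:
        parts += [torch.sin(f * x), torch.cos(f * x)]
    return torch.cat(parts, dim=-1)

class TinyRadianceField(nn.Module):
    """Implicit representation: (3D position, view direction) -> (volume density, RGB color)."""
    def __init__(self, num_freqs=6, hidden=128):
        super().__init__()
        in_dim = 3 * (1 + 2 * num_freqs)                # encoded 3D position
        self.backbone = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.sigma_head = nn.Linear(hidden, 1)          # density, independent of view direction
        self.color_head = nn.Sequential(                # view-dependent color
            nn.Linear(hidden + 3, hidden // 2), nn.ReLU(),
            nn.Linear(hidden // 2, 3), nn.Sigmoid(),
        )

    def forward(self, xyz, view_dir):
        h = self.backbone(positional_encoding(xyz))
        sigma = torch.relu(self.sigma_head(h))          # keep density non-negative
        rgb = self.color_head(torch.cat([h, view_dir], dim=-1))
        return sigma, rgb

# Query the field at a batch of sample points (shapes: (N, 3) each).
field = TinyRadianceField()
sigma, rgb = field(torch.rand(1024, 3), torch.rand(1024, 3))
```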

    Methods

    We utilize monocular camera data to perform three-dimensional reconstruction of dynamically deforming human bodies. We propose a neural radiance field representation for the dynamically deforming human body, a joint dynamic and static scene reconstruction of neural radiance fields, and the corresponding rendering technique. Combining the neural radiance field with a human body parametric model, we establish a dynamically deforming neural radiance field for the human body. The parametric model is fitted to the dynamic human body in the video, and a deformation field maps the dynamic body from camera space to a standardized static (canonical) space. A geometric correction network compensates for inaccuracies between the parametric model and the human body in the scene. The segment anything model (SAM) is employed to decompose the scene radiance field into dynamic and static parts, with two-dimensional joints used as prompts for precise extraction of the human body mask. Guided by this mask, the scene radiance field is split into a static background neural radiance field and a dynamic human body neural radiance field, and the differentiable nature of volume rendering enables their joint reconstruction. Finally, images at arbitrary viewing angles and human body poses are rendered through volume rendering of the neural radiance fields.
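
    To make the joint rendering concrete, the sketch below composites samples along one camera ray from the two radiance fields: dynamic samples are first warped into the canonical space before querying the human field, and the contributions of both fields are merged inside a standard volume-rendering integral. The `static_field`, `dynamic_field`, and `warp_to_canonical` callables and the density-weighted color blend are illustrative assumptions, not the paper's exact formulation.

```python
import torch

def volume_render(sigma, rgb, deltas):
    """Standard volume rendering along one ray.

    sigma:  (N, 1) non-negative densities at the ray samples
    rgb:    (N, 3) colors at the ray samples
    deltas: (N,)   distances between consecutive samples
    """
    alpha = 1.0 - torch.exp(-sigma.squeeze(-1) * deltas)             # per-sample opacity
    trans = torch.cumprod(
        torch.cat([alpha.new_ones(1), 1.0 - alpha + 1e-10])[:-1], dim=0)
    weights = alpha * trans                                          # contribution of each sample
    return (weights[:, None] * rgb).sum(dim=0)                       # composited pixel color

def render_joint_ray(static_field, dynamic_field, warp_to_canonical,
                     xyz, view_dir, deltas, pose_params):
    """Composite the static-background and dynamic-human fields along one camera ray.

    `static_field` and `dynamic_field` are callables returning ((N, 1) density, (N, 3) color);
    `warp_to_canonical` is the deformation field mapping observation-space samples into the
    standardized static (canonical) space, given the parametric-model pose.
    """
    sigma_s, rgb_s = static_field(xyz, view_dir)                     # background field, camera space
    xyz_canonical = warp_to_canonical(xyz, pose_params)              # deformation to canonical space
    sigma_d, rgb_d = dynamic_field(xyz_canonical, view_dir)          # human field, canonical space

    # Merge the two fields per sample: densities add, colors blend by density.
    sigma = sigma_s + sigma_d
    rgb = (sigma_s * rgb_s + sigma_d * rgb_d) / (sigma + 1e-10)
    return volume_render(sigma, rgb, deltas)
```

    Density-weighted blending is one common way to combine several radiance fields along a ray; during training, the SAM-derived human mask additionally determines which field each pixel's ray supervises.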

    Results and Discussions

    We present a monocular-vision-based dynamic human neural radiance field reconstruction that integrates the neural radiance field with a human body parametric model. Comparative analysis with existing methods is provided, with results illustrated in Figs. 5, 6, 7, and 8 and Table 1. The approach combines the neural radiance field with SAM to reconstruct the static background with the human body effectively removed. For human body reconstruction, it not only generates free-viewpoint images but also renders novel dynamic human poses against the static background. Experimental results validate the method's capability to accurately capture details of dynamically deforming human bodies and scenes, demonstrating high fidelity and precision in reconstructing both the dynamic human body and the static background.
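
    The human-body mask that drives this dynamic/static separation is obtained by prompting SAM with 2D joint locations, as described in the Methods. The sketch below assumes the official `segment_anything` Python package; the checkpoint path, `frame_rgb`, and `joints_2d` are illustrative placeholders rather than values from the paper.

```python
import numpy as np
from segment_anything import sam_model_registry, SamPredictor

# Load a pretrained SAM checkpoint and wrap it in a predictor (checkpoint path is illustrative).
sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth")
predictor = SamPredictor(sam)

# Stand-ins for one video frame (H x W x 3, uint8 RGB) and its projected 2D joints (pixel coords).
frame_rgb = np.zeros((720, 1280, 3), dtype=np.uint8)
joints_2d = np.array([[640.0, 220.0], [600.0, 400.0], [680.0, 400.0]])

predictor.set_image(frame_rgb)
masks, scores, _ = predictor.predict(
    point_coords=joints_2d,
    point_labels=np.ones(len(joints_2d), dtype=int),   # 1 marks each joint as a foreground point
    multimask_output=False,
)
human_mask = masks[0]   # boolean H x W mask; pixels inside supervise the dynamic human field,
                        # pixels outside supervise the static background field
```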

    Conclusions

    We introduce a monocular-vision-based dynamic human neural radiance field reconstruction technique that represents the static background and the dynamic human body with neural radiance fields. Using dynamic human body videos captured by a monocular camera, the method incorporates the SAM segmentation model and neural radiance fields to efficiently separate the scene into static and dynamic components. By training the dynamic human body and the static background separately with neural radiance fields, joint dynamic and static reconstruction is achieved. Experimental findings show that, compared with existing human body reconstruction methods, the proposed method jointly reconstructs dynamic human bodies and static scenes with high fidelity and accuracy from monocular visual input. This reduces the prevalent reliance on multi-view synchronous acquisition in human body reconstruction and opens new pathways for applications in virtual reality, film production, and robotics. However, slow neural radiance field training remains a common limitation. Future work will aim to accelerate training, refine algorithm performance, and broaden the applicable scenarios.


    Chao Sun, Jun Qiu, Lina Wu, Chang Liu. Reconstruction of Dynamic Human Neural Radiance Fields Based on Monocular Vision[J]. Acta Optica Sinica, 2024, 44(19): 1915001

    Paper Information

    Category: Machine Vision

    Received: Apr. 7, 2024

    Accepted: May 13, 2024

    Published Online: Oct. 12, 2024

    The Author Email: Liu Chang (liu.chang.cn@ieee.org)

    DOI: 10.3788/AOS240809
