Significance The metaverse is a guiding and supporting technology for the revolution of the internet. It can enhance visual experience and interaction efficiency, with prominent economic and social benefits. Digital 3D content is a core element of the metaverse, serving as the primary medium for visual information and interactive feedback. The generation and presentation of 3D content are therefore critical to constructing the metaverse (Fig. 1 and Fig. 2). Generating 3D content through digital rendering and presenting it through holographic display is a sound combination for metaverse construction because it balances visual fidelity, device cost, and deployment complexity. However, when digitizing real-world scenes, this combination often runs into bottlenecks in calculation speed and presentation quality caused by the massive computational load. Advances in neural networks provide a powerful tool for breaking through these bottlenecks.
Progress Digital 3D rendering of 2D images, also known as depth estimation, can be categorized into multi-view estimation, motion estimation, and monocular estimation. Monocular depth estimation takes single-view 2D images as input, which offers high deployment flexibility and low device cost. Neural networks for monocular depth estimation can be categorized into supervised and unsupervised types (Fig. 3). Supervised networks require depth-labeled datasets as supervisory signals for training, so their practical application is often limited by the difficulty of obtaining labeled data. Unsupervised networks rely primarily on mathematical priors to estimate depth, which significantly reduces the dependence on labeled datasets, but their performance still needs improvement. Currently, monocular depth estimation networks suffer from insufficient robustness and inadequate calculation speed. To rapidly construct high-quality 3D content for the metaverse, the constraints used in monocular depth estimation require further in-depth investigation to overcome these challenges. Potential research directions include optimizing estimation intervals, reducing feature redundancy in depth estimation, and strengthening the correlation between monocular and multi-view estimation (Fig. 4).
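To make the link between an estimated depth map and 3D content concrete, the following minimal Python sketch back-projects a per-pixel depth map into a point cloud under a pinhole camera model. The pinhole model, the function name `backproject`, and the intrinsic parameters `fx`, `fy`, `cx`, `cy` are illustrative assumptions and are not taken from the reviewed works.

```python
def backproject(depth, fx, fy, cx, cy):
    """Convert a depth map into a list of (X, Y, Z) points.

    depth  : list of rows, each a list of depth values (e.g. in metres);
             values <= 0 are treated as invalid and skipped.
    fx, fy : focal lengths in pixels (assumed pinhole intrinsics).
    cx, cy : principal point in pixels (assumed pinhole intrinsics).
    """
    points = []
    for v, row in enumerate(depth):        # v: pixel row index
        for u, z in enumerate(row):        # u: pixel column index
            if z <= 0:
                continue                   # no valid depth for this pixel
            x = (u - cx) * z / fx          # lateral position from column offset
            y = (v - cy) * z / fy          # vertical position from row offset
            points.append((x, y, z))
    return points


# Hypothetical 2x2 depth map with one invalid pixel.
cloud = backproject([[2.0, 2.0], [0.0, 4.0]], fx=2.0, fy=2.0, cx=0.5, cy=0.5)
```

In this sketch, any error in the estimated depth is scaled directly into the lateral coordinates, which is one way to see why estimation robustness matters for the resulting 3D content.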
Holographic display is a compelling solution for presenting digital 3D content in the metaverse. The phase-only hologram, with its high energy efficiency and absence of twin-image artifacts, is a superior medium for dynamic 3D content. However, generating a phase-only hologram is an ill-posed problem, which limits both computational speed and accuracy. Neural networks, which are well suited to ill-posed problems, provide a powerful tool for calculating phase-only holograms. Generation networks for phase-only holograms can be categorized into data-driven and model-driven types (Fig. 5). Data-driven networks require 3D targets and the corresponding phase-only holograms to update the network parameters, but obtaining high-quality hologram datasets demands significant computational resources. Model-driven networks are trained with physical constraints, so their inference capability is no longer limited by dataset quality. Currently, holographic displays often suffer from a limited depth range in optical reconstructions. To extend the depth range, it is critical to address the constraints that computational strategies impose on solving ill-posed problems. Further research directions include frequency filtering of phase-only holograms, optimization of initial calculation conditions, and selection of solution paths (Fig. 6).
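The ill-posed nature of phase-only hologram generation can be illustrated with the classical Gerchberg-Saxton iteration, shown here as a minimal 1D far-field (Fourier) sketch in Python. It is a conventional iterative baseline, not one of the neural-network methods discussed here; the function names, the naive unitary DFT, and the default parameters are assumptions chosen for illustration.

```python
import cmath
import math
import random


def dft(x, inverse=False):
    """Naive unitary DFT, O(N^2); adequate for a small demonstration."""
    n = len(x)
    sign = 1 if inverse else -1
    scale = 1 / math.sqrt(n)
    return [
        scale * sum(x[k] * cmath.exp(sign * 2j * math.pi * m * k / n)
                    for k in range(n))
        for m in range(n)
    ]


def amplitude_error(field, target):
    """Euclidean distance between the field amplitude and the target amplitude."""
    return math.sqrt(sum((abs(f) - t) ** 2 for f, t in zip(field, target)))


def gerchberg_saxton(target, iterations=30, seed=0):
    """Return (phase, errors) for a 1D phase-only Fourier hologram.

    target : desired amplitude in the image plane (non-negative, not all zero).
    """
    rng = random.Random(seed)
    n = len(target)
    # Rescale the target so its energy matches a unit-amplitude hologram.
    norm = math.sqrt(n / sum(t * t for t in target))
    target = [t * norm for t in target]
    # Random initial phase: the result depends on this starting point.
    phase = [rng.uniform(-math.pi, math.pi) for _ in range(n)]
    errors = []
    for _ in range(iterations):
        hologram = [cmath.exp(1j * p) for p in phase]   # unit amplitude
        image = dft(hologram)                           # propagate to image plane
        errors.append(amplitude_error(image, target))
        # Image-plane constraint: keep the phase, impose the target amplitude.
        constrained = [t * cmath.exp(1j * cmath.phase(g))
                       for t, g in zip(target, image)]
        back = dft(constrained, inverse=True)           # propagate back
        # Hologram-plane constraint: keep the phase, discard the amplitude.
        phase = [cmath.phase(b) for b in back]
    return phase, errors
```

Calling `gerchberg_saxton` with different `seed` values generally yields different phase patterns for the same target amplitude, which reflects the non-uniqueness of the solution and the sensitivity to initial conditions.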
Conclusion and prospect The integration of metaverse technology with internet technology has the potential to revolutionize many fields, including education, social interaction, healthcare, and industry. Neural networks, as rapid and accurate calculation tools, provide an ideal solution for generating and presenting 3D content in the metaverse. Limited estimation robustness and calculation speed are the bottleneck for 3D content generation; research on the constraints in monocular depth estimation is needed to break through this bottleneck. The limited depth range of optical reconstructions is a major challenge for holographic presentation of 3D content; addressing it requires optimizing the calculation strategies for solving ill-posed problems. Building on this research, 3D acquisition and projection systems can be constructed in the foreseeable future, which would inject strong momentum into the sustainable development of virtual-real interaction in the metaverse.