Deep learning empowers generation and presentation of virtual-real fusion scenarios in holographic metaverse: development and prospects (invited)

[2] HE Zehao, CAO Liangcai. Display, interactions and applications of immersive metaverse: Progress and outlooks[J]. Science & Technology Review, 41, 6-14(2023).

[3] GOTSCH D, ZHANG X, MERRITT T et al. TeleHuman2: A cylindrical light field teleconferencing system for life-size 3D human telepresence[C]//Proceedings of the 2018 CHI Conference on Human Factors in Computing Systems, 2018, 18: 552.

[4] LAWRENCE J, GOLDMAN D B, ACHAR S et al. Project Starline: A high-fidelity telepresence system[J]. ACM Transactions on Graphics, 40, 242(2021).

[5] HE L, LIU K, HE Z et al. Three-dimensional holographic communication system for the metaverse[J]. Optics Communications, 526, 128894(2023).

[6] ZHAN Z, SUN X, HE C et al. Enhancing precision for simultaneous 3D localization and 3D orientation with structured illumination[J]. Optics Letters, 50, 2856-2859(2025).

[7] YU J, WANG R, LIU X et al. Accuracy improvements of near-field photometric stereo via light source calibration[J]. Optics Express, 33, 14207-14220(2025).

[8] KOEHLER N, GEIS M, NÖH C et al. Key parameters for performance and resilience modeling of 3D time-of-flight cameras under consideration of signal-to-noise ratio and phase noise wiggling[J]. Sensors, 25, 109(2024).

[9] LEE J, USMANI K, JAVIDI B. Polarimetric 3D integral imaging profilometry under degraded environmental conditions[J]. Optics Express, 32, 43172-43183(2024).

[10] YE W, ZHENG W, CAI J et al. Light-field imaging device based on a Fresnel lens array with composite microstructures[J]. Optics Letters, 50, 1257-1260(2025).

[11] SUN J, XIE Y, CHEN L et al. NeuralRecon: Real-time coherent 3D reconstruction from monocular video[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021: 15598-15607.

[12] WENG C Y, CURLESS B, SRINIVASAN P P et al. HumanNeRF: Free-viewpoint rendering of moving people from monocular video[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022: 16210-16220.

[13] LI Z, WANG X, LIU X et al. BinsFormer: Revisiting adaptive bins for monocular depth estimation[J]. IEEE Transactions on Image Processing, 33, 3964-3976(2024).

[14] WANG Y, LIANG Y, XU H et al. SQLdepth: Generalizable self-supervised fine-structured monocular depth estimation[C]//Proceedings of the AAAI Conference on Artificial Intelligence, 2024: 5713-5721.

[15] YANG Z, PAN J, DAI J et al. Self-supervised lightweight depth estimation in endoscopy combining CNN and transformer[J]. IEEE Transactions on Medical Imaging, 43, 1934-1944(2024).

[16] WU T, YUAN Y J, ZHANG L X et al. Recent advances in 3D Gaussian splatting[J]. Computational Visual Media, 10, 613-642(2024).

[17] FAN R, WU J, SHI X et al. Fov-GS: Foveated 3D Gaussian splatting for dynamic scenes[J]. IEEE Transactions on Visualization and Computer Graphics, 31, 2975-2985(2025).

[18] GUO Z, ZHOU W, LI L et al. Motion-aware 3D Gaussian splatting for efficient dynamic scene reconstruction[J]. IEEE Transactions on Circuits and Systems for Video Technology, 35, 3119-3133(2025).

[19] LEE Y, YOON G J, SONG J et al. Single-stage convolutional neural radiance fields[J]. Pattern Analysis and Applications, 28, 48(2025).

[20] CHEN Y, YUAN Q, LI Z et al. UPST-NeRF: Universal photorealistic style transfer of neural radiance fields for 3D scene[J]. IEEE Transactions on Visualization and Computer Graphics, 31, 2045-2057(2025).

[21] JIANG L, CHE R, HU L et al. Mixer-NeRF: Research on 3D reconstruction methods of neural radiance fields based on hybrid spatial feature information[J]. Journal of Computing and Electronic Information Management, 15, 149-155(2024).

[22] ZHANG Z, YU X, GAO X et al. High-fidelity light-field display with enhanced information utilization by modulating chrominance and luminance separately[J]. Light: Science & Applications, 14, 78(2025).

[23] JI L, SANG X, XING S et al. Text-driven light-field content editing for three-dimensional light-field display based on Gaussian splatting[J]. Optics Express, 33, 954-971(2025).

[24] ZHANG S, XING S, FU B et al. Optimized visual simulation of 3D light field display based on differentiable ray tracing[J]. Optics Communications, 583, 131728(2025).

[25] LI Hanyu, YU Xunbo, GAO Xing et al. Key technologies of high fidelity three-dimensional light field display (Invited)[J]. Acta Optica Sinica, 45, 0200005(2025).

[26] PEI X, YU X, GAO X et al. Optimization of depth of field and enhancement of viewing angle in three-dimensional light field fusion display system based on the design of aspherical symmetric compound lens[J]. Optics and Lasers in Engineering, 178, 108221(2024).

[27] KUMAGAI K, MORI T, HAYASAKI Y. Fist-sized aerial volumetric display with femtosecond laser drawing[J]. Journal of the Society for Information Display, 33, 168-174(2025).

[28] SMALLEY D E, NYGAARD E, SQUIRE K et al. A photophoretic-trap volumetric display[J]. Nature, 553, 486-490(2018).

[29] BLINDER D, BIRNBAUM T, ITO T et al. The state-of-the-art in computer generated holography for 3D display[J]. Light: Advanced Manufacturing, 3, 572-600(2022).

[30] HE Z, SUI X, JIN G et al. Progress in virtual reality and augmented reality based on holographic display[J]. Applied Optics, 58, A74-A81(2019).

[31] SHI L, LI B, KIM C et al. Towards real-time photorealistic 3D holography with deep neural networks[J]. Nature, 591, 234-239(2021).

[32] WU J, LIU K, SUI X et al. High-speed computer-generated holography using an autoencoder-based deep neural network[J]. Optics Letters, 46, 2908-2911(2021).

[33] LIU K, WU J, HE Z et al. 4K-DMDNet: Diffraction model-driven network for 4K computer-generated holography[J]. Opto-Electronic Advances, 6, 220135(2023).

[34] GOPAKUMAR M, LEE G Y, CHOI S et al. Full-colour 3D holographic augmented-reality displays with metasurface waveguides[J]. Nature, 629, 791-797(2024).

[35] SHI L, WEBB R, XIAO L et al. Neural compression for hologram images and videos[J]. Optics Letters, 47, 6013-6016(2022).

[36] ZHANG H, SHEN C, LI Y et al. Exploiting temporal consistency for real-time video depth estimation[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision, 2019: 1725-1734.

[37] YAN Huabiao, XU Fangqi, HUANG Lv’er et al. Review of multi-view stereo reconstruction methods based on deep learning[J]. Optics and Precision Engineering, 31, 2444-2464(2023).

[38] DENG H, ZHANG T, DAI Y et al. Deep non-rigid structure-from-motion: A sequence-to-sequence translation perspective[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 46, 10814-10828(2024).

[39] ARAMPATZAKIS V, PAVLIDIS G, MITIANOUDIS N et al. Monocular depth estimation: a thorough review[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 46, 2396-2414(2024).

[40] SNAVELY N, SEITZ S M, SZELISKI R. Photo tourism: exploring photo collections in 3D[J]. ACM Transactions on Graphics, 25, 835-846(2006).

[41] SCHÖNBERGER J L, FRAHM J. Structure-from-motion revisited[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2016: 4104-4113.

[42] HUANG Jun, WANG Cong, LIU Yue et al. The progress of monocular depth estimation technology[J]. Journal of Image and Graphics, 24, 2081-2097(2019).

[43] KHAN F, SALAHUDDIN S, JAVIDNIA H. Deep learning-based monocular depth estimation methods: A state-of-the-art review[J]. Sensors, 20, 2272(2020).

[44] ZHANG N, NEX F, VOSSELMAN G et al. Lite-Mono: A lightweight CNN and transformer architecture for self-supervised monocular depth estimation[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023: 18537-18546.

[45] ZHOU Z, FAN X, SHI P et al. Recurrent multiscale feature modulation for geometry consistent depth learning[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 46, 9551-9566(2024).

[46] LIU C, ZHANG C, LIANG X et al. Attention mono-depth: attention-enhanced transformer for monocular depth estimation of volatile kiln burden surface[J]. IEEE Transactions on Circuits and Systems for Video Technology, 35, 1686-1699(2025).

[47] XIAN Ke. Monocular depth prediction: algorithms and applications[D]. Wuhan: Huazhong University of Science and Technology, 2021. (in Chinese)

[48] LYU X, LIU L, WANG M et al. HR-Depth: High resolution self-supervised monocular depth estimation[C]//Proceedings of the AAAI Conference on Artificial Intelligence, 2021, 35(3): 2294-2301.

[49] EIGEN D, PUHRSCH C, FERGUS R et al. Depth map prediction from a single image using a multi-scale deep network[C]//Proceedings of the 28th International Conference on Neural Information Processing Systems, 2014, 2: 2366-2374.

[50] EIGEN D, FERGUS R. Predicting depth, surface normals and semantic labels with a common multi-scale convolutional architecture[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision, 2015: 2650-2658.

[51] LIU F Y, SHEN C H, LIN G S. Deep convolutional neural fields for depth estimation from a single image[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2015: 5162-5170.

[52] LAINA I, RUPPRECHT C, BELAGIANNIS V et al. Deeper depth prediction with fully convolutional residual networks[C]//Proceedings of the International Conference on 3D Vision, 2016: 239-248.

[53] PATIL V, SAKARIDIS C, LINIGER A et al. P3Depth: Monocular depth estimation with a piecewise planarity prior[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022: 1610-1621.

[54] HU M, YIN W, ZHANG C et al. Metric3D v2: A versatile monocular geometric foundation model for zero-shot metric depth and surface normal estimation[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 46, 10579-10596(2024).

[55] GARG R, KUMAR B G V, CARNEIRO G et al. Unsupervised CNN for single view depth estimation: Geometry to the rescue[C]//Proceedings of the European Conference on Computer Vision, 2016: 740-756.

[56] ZHOU T, BROWN M, SNAVELY N et al. Unsupervised learning of depth and ego-motion from video[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2017: 6612-6619.

[57] GODARD C, AODHA O M, FIRMAN M et al. Digging into self-supervised monocular depth estimation[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision, 2019: 3827-3837.

[58] BIAN J W, LI Z, WANG N et al. Unsupervised scale-consistent depth and ego-motion learning from monocular video[C]//Proceedings of the International Conference on Neural Information Processing Systems, 2019, 4: 35-45.

[59] BIAN J W, ZHAN H, WANG N et al. Auto-rectify network for unsupervised indoor depth estimation[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 44, 9802-9813(2022).

[60] SUN L, BIAN J W, ZHAN H et al. SC-depthV3: Robust self-supervised monocular depth estimation for dynamic scenes[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 46, 497-508(2024).

[61] AI H, CAO Z, CAO Y P et al. HRDFuse: Monocular 360° depth estimation by collaboratively learning holistic-with-regional depth distributions[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023: 13273-13282.

[62] YU S, WU M, LAM S K et al. EDS-depth: Enhancing self-supervised monocular depth estimation in dynamic scenes[J]. IEEE Transactions on Intelligent Transportation Systems, 26, 5585-5597(2025).

[63] MING Y, HONG K, WEI Q et al. Two-stage enhancement network for monocular depth estimation of indoor scenes[C]//Proceedings of the IEEE International Conference on Signal Processing, 2024: 495-498.

[64] CHEN R, LUO H, ZHAO F et al. Structure-centric robust monocular depth estimation via knowledge distillation[C]//Proceedings of the ACM Asian Conference on Computer Vision, 2024: 123-140.

[65] WANG Y, LI X, SHI M et al. Knowledge distillation for fast and accurate monocular depth estimation on mobile device[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021: 2457-2465.

[66] ZHOU Z, DONG Q. Two-in-one depth: Bridging the gap between monocular and binocular self-supervised depth estimation[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023: 9377-9387.

[67] XU Y, YANG X, YU Y et al. Depth estimation by combining binocular stereo and monocular structured-light[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022: 1746-1755.

[68] CAO Liangcai, HE Zehao, LIU Kexuan et al. Progress and challenges in dynamic holographic 3D display for the metaverse (invited)[J]. Infrared and Laser Engineering, 51, 20210935(2022).

[69] SUI X, HE Z, CHU D et al. Non-convex optimization for inverse problem solving in computer-generated holography[J]. Light: Science & Applications, 13, 158(2024).

[70] HE Z, SUI X, ZHANG H et al. Frequency-based optimized random phase for computer-generated holographic display[J]. Applied Optics, 60, A145-A154(2021).

[71] ZHANG C, ZHANG L, ZHANG R et al. Two-constraint-free dual-domain optimised random phase-only hologram[J]. Optics Communications, 552, 130065(2024).

[72] FANG Q, ZHENG H, XIA X et al. Generating high-quality phase-only holograms of binary images using global loss and stochastic homogenization training strategy[J]. Optics and Laser Technology, 181, 112059(2025).

[73] SUI X, HE Z, ZHANG H et al. Spatiotemporal double-phase hologram for complex-amplitude holographic displays[J]. Chinese Optics Letters, 18, 100901(2020).

[74] LIU K, HE Z, CAO L. Pattern-adaptive error diffusion algorithm for improved phase-only hologram generation[J]. Chinese Optics Letters, 19, 050501(2021).

[75] LIU K, WU J, CAO L. High-quality and high-speed computer-generated holography via deep-learning-assisted bidirectional error diffusion method[J]. Optics Express, 32, 37342-37354(2024).

[76] LIU K, HE Z, CAO L. Double amplitude freedom Gerchberg-Saxton algorithm for generation of phase-only hologram with speckle suppression[J]. Applied Physics Letters, 120, 061103(2022).

[77] GAO Q, HE Z, LIU K et al. Adaptive mixed-constraint Gerchberg-Saxton algorithm for phase-only holographic display[J]. Acta Physica Sinica, 72, 024203(2023).

[78] CHAKRAVARTHULA P, PENG Y, KOLLIN J et al. Wirtinger holography for near-eye displays[J]. ACM Transactions on Graphics, 38, 213(2019).

[79] PAN Y, WANG J, WU Y et al. Reconstructed quality improvement with a stochastic gradient descent optimization algorithm for a spherical hologram[J]. Applied Optics, 61, 5341-5349(2022).

[80] HUANG Y, WANG J, SU P et al. Lensless holographic dynamic projection system based on weakly supervised learning[J]. Optics and Laser Technology, 177, 111219(2024).

[81] ZHANG Y, ZHANG M, LIU K et al. Progress of the computer-generated holography based on deep learning[J]. Applied Sciences, 12, 8568(2022).

[82] QU Z, JIANG H, WANG K et al. Deep-learning-aided multi-focal hologram generation[J]. Optics and Laser Technology, 182, 112056(2025).

[83] CHEN C, NAM S, KIM D et al. Ultrahigh-fidelity full-color holographic display via color-aware optimization[J]. PhotoniX, 5, 20(2024).

[84] WANG D, LI Z, ZHENG Y et al. Liquid lens based holographic camera for real 3D scene hologram acquisition using end-to-end physical model-driven network[J]. Light: Science & Applications, 13, 62(2024).

[85] HORISAKI R, TAKAGI R, TANIDA J. Deep-learning-generated holography[J]. Applied Optics, 57, 3859-3863(2018).

[86] LEE J, JEONG J, CHO J et al. Deep neural network for multi-depth hologram generation and its training strategy[J]. Optics Express, 28, 27137-27154(2020).

[87] SHI L, LI B, MATUSIK W. End-to-end learning of 3D phase-only holograms for holographic display[J]. Light: Science & Applications, 11, 247(2022).

[88] PENG Y, CHOI S, PADMANABAN N et al. Neural holography with camera-in-the-loop training[J]. ACM Transactions on Graphics, 39, 185(2020).

[89] YU T, ZHANG S, CHEN W et al. Phase dual-resolution networks for a computer-generated hologram[J]. Optics Express, 30, 2378-2389(2022).

[90] FANG Q, ZHENG H, XIA X et al. Diffraction model-driven neural network with semi-supervised training strategy for real-world 3D holographic photography[J]. Optics Express, 32, 45406-45420(2024).

[91] YAN X, LI J, ZHANG Y et al. Generation of multiple-depth 3D computer-generated holograms from 2D-image-datasets trained CNN[J]. Advanced Science, 12, 2408610(2025).

[92] YAN X, LIU X, LI J et al. Generating multi-depth 3D holograms using a fully convolutional neural network[J]. Advanced Science, 11, 2308886(2024).

[93] ZHANG Y, CHENG D, WANG Y et al. Real-time multi-depth holographic display using complex-valued neural network[J]. Optics Express, 33, 7380-7395(2025).

[94] ZHANG Y, YU G, CHEN C et al. A depth-aware network for real-time and high-quality neural holography[J]. IEEE Signal Processing Letters, 32, 756-760(2025).

[95] HE Z, SUI X, JIN G et al. Optimal quantization for amplitude and phase in computer-generated holography[J]. Optics Express, 29, 119-133(2020).

[96] WANG X, HE Z, CAO L. Analysis of reconstruction quality for computer-generated holograms using a model free of circular-convolution error[J]. Optics Express, 31, 19021-19035(2023).

[97] SUN G, HU C, ZHANG J et al. High-speed arbitrary pure phase hologram generation method based on a specific multi-phase[J]. Applied Optics, 63, 7338-7344(2024).

[98] XU X, WANG X, LUO W et al. Efficient computer-generated holography based on mixed linear convolutional neural networks[J]. Applied Sciences, 12, 4177(2022).

[99] QIN H, HAN C, SHI X et al. Complex-valued generative adversarial network for real-time and high-quality computer-generated holography[J]. Optics Express, 32, 44437-44451(2024).

[100] YUAN G, ZHOU M, LIU F et al. Physics-aware cross-domain fusion aids learning-driven computer-generated holography[J]. Photonics Research, 12, 2747-2756(2024).

[101] HE K, ZHANG X, REN S et al. Deep residual learning for image recognition[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2016: 770-778.

[102] HE K, ZHANG X, REN S et al. Identity mappings in deep residual networks[C]//Proceedings of the European Conference on Computer Vision, 2016: 630-645.

[103] DUMAS T, GALPIN F, BORDES P et al. Iterative training of neural networks for intra prediction[J]. IEEE Transactions on Image Processing, 30, 697-711(2021).

Zehao HE, Yunhui GAO, Liangcai CAO, Yan ZHANG. Deep learning empowers generation and presentation of virtual-real fusion scenarios in holographic metaverse: development and prospects (invited)[J]. Infrared and Laser Engineering, 2025, 54(7): 20250189

Paper Information

Category: Special issue—Advanced display technology and applications

Received: Mar. 25, 2025

Accepted: --

Published Online: Aug. 29, 2025

The Author Email: Yan ZHANG (yzhang@cnu.edu.cn)

DOI:10.3788/IRLA20250189
