Infrared and Laser Engineering, Volume 54, Issue 8, 20250129 (2025)
3D face reconstruction based on multimodal expression encoding and temporal modeling
[1] DENG Y, YANG J, XU S, et al. Accurate 3D face reconstruction with weakly-supervised learning: From single image to image set [C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2019: 285-295.
[2] GUO J, ZHU X, YANG Y, et al. Towards fast, accurate and stable 3D dense face alignment [C]//European Conference on Computer Vision, Cham: Springer International Publishing, 2020: 152-168.
[3] FENG Y, FENG H, BLACK M J et al. Learning an animatable detailed 3D face model from in-the-wild images[J]. ACM Transactions on Graphics (ToG), 40, 1-13(2021).
[4] DANĚČEK R, BLACK M J, BOLKART T. Emoca: Emotion driven monocular face capture and animation [C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022: 20311-20322.
[5] RETSINAS G, FILNTISIS P P, DANECEK R, et al. 3D facial expressions through analysis-by-neural-synthesis [C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024: 2490-2501.
[6] ZHANG C, ZHAO Y, HUANG Y, et al. Facial: Synthesizing dynamic talking face with implicit attribute learning [C]//Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2021: 3867-3876.
[7] CHEN L, MADDOX R K, DUAN Z, et al. Hierarchical cross-modal talking face generation with dynamic pixel-wise loss [C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019: 7832-7841.
[8] YI R, YE Z, ZHANG J, et al. Audio-driven talking face video generation with learning-based personalized head pose [EB/OL]. (2020-02-24) [2024-05-21]. https://arxiv.org/abs/2002.10137.
[9] JI X, LIN C, DING Z, et al. Realtalk: Real-time and realistic audio-driven face generation with 3D facial prior-guided identity alignment network [EB/OL]. (2024-06-26) [2025-01-28]. https://arxiv.org/abs/2406.18284.
[10] RADFORD A, KIM J W, HALLACY C, et al. Learning transferable visual models from natural language supervision [C]//International Conference on Machine Learning, PMLR, 2021: 8748-8763.
[11] NAGRANI A, CHUNG J S, XIE W et al. Voxceleb: Large-scale speaker verification in the wild[J]. Computer Speech & Language, 60, 101027(2020).
[12] BAEVSKI A, ZHOU Y, MOHAMED A et al. Wav2vec 2.0: A framework for self-supervised learning of speech representations[J]. Advances in Neural Information Processing Systems, 33, 12449-12460(2020).
[15] MOLLAHOSSEINI A, HASANI B, MAHOOR M H. Affectnet: A database for facial expression, valence, and arousal computing in the wild[J]. IEEE Transactions on Affective Computing, 10, 18-31(2017).
[16] GERCZUK M, AMIRIPARIAN S, OTTL S et al. Emonet: A transfer learning framework for multi-corpus speech emotion recognition[J]. IEEE Transactions on Affective Computing, 14, 1472-1487(2021).
[17] CHANG F J, TRAN A T, HASSNER T, et al. ExpNet: Landmark-free, deep, 3D facial expressions [C]//2018 13th IEEE International Conference on Automatic Face & Gesture Recognition, 2018: 122-129.
[18] SHANG J, SHEN T, LI S, et al. Self-supervised monocular 3D face reconstruction by occlusion-aware multi-view geometry consistency [C]//European Conference on Computer Vision. 2020: 53-70.
[19] ZHOU H, LIU Y, LIU Z, et al. Talking face generation by adversarially disentangled audio-visual representation [C]//Proceedings of the AAAI Conference on Artificial Intelligence, 2019, 33(1): 9299-9306.
[20] ZHOU Y, HAN X, SHECHTMAN E et al. MakeItTalk: speaker-aware talking-head animation[J]. ACM Transactions on Graphics, 39, 1-15(2020).
Xiaolong HE, Feipeng DA, Shaoyan GAI. 3D face reconstruction based on multimodal expression encoding and temporal modeling[J]. Infrared and Laser Engineering, 2025, 54(8): 20250129
Category: Optical imaging, display and information processing
Received: Feb. 27, 2025
Published Online: Aug. 29, 2025