Infrared and Laser Engineering, Vol. 54, Issue 8, 20250129 (2025)

3D face reconstruction based on multimodal expression encoding and temporal modeling

Xiaolong HE1,2, Feipeng DA1,2, and Shaoyan GAI1,2
Author Affiliations
  • 1School of Automation, Southeast University, Nanjing 210096, China
  • 2Key Laboratory of Measurement and Control of Complex Systems of Engineering, Ministry of Education, Nanjing 210096, China

    Paper Information

    Category: Optical imaging, display and information processing

    Received: Feb. 27, 2025

    Published Online: Aug. 29, 2025

    DOI:10.3788/IRLA20250129