Infrared and Laser Engineering, Volume 54, Issue 8, 20250129 (2025)

3D face reconstruction based on multimodal expression encoding and temporal modeling

Xiaolong HE1,2, Feipeng DA1,2, and Shaoyan GAI1,2
Author Affiliations
  • 1School of Automation, Southeast University, Nanjing 210096, China
  • 2Key Laboratory of Measurement and Control of Complex Systems of Engineering, Ministry of Education, Nanjing 210096, China
    Figures & Tables (12)
    • 3D face reconstruction based on multimodal expression encoding and temporal modeling
    • Cross-modal feature fusion
    • Adaptive temporal modeling
    • Expression attribute descriptor coverage range
    • Comparison of reconstruction results between the image-based bimodal face reconstruction and other algorithms
    • Tri-modal fusion face reconstruction on video sequences, compared with other algorithms
    • Ablation of modal fusion methods
    • Ablation of the temporal adaptive adjustment module
    • Table 1. Expression attribute text descriptors

      Facial area & affect | Description
      Eye state            | Gazing, squinting, scowling, staring
      Eyebrow              | Twisted, frowning, furrowed, drooping
      Mouth shape          | Closed naturally, smiling, talking, shouting
      Nose state           | Slight flare, breathing normally, flared, nostrils dilated
      Cheek state          | At rest, slightly puffed, puffed, tense
      Jaw state            | Relaxed, slightly tensed, clenched, gritting
      Forehead             | Smooth, lightly wrinkled, wrinkled, deeply furrowed
      Valence              | Sad, negative, neutral, positive, happy
      Arousal              | Calm, mild, excited, intense
    • Table 2. Impact of region and descriptor counts on coverage and overlap

      Region counts/Descriptor counts | Coverage/Overlap
      7/12   | 27%/30%   50%/71%   72%/83%   72%/89%
      8/24   | 36%/35%   61%/45%   79%/60%   84%/70%
      9/36   | 40%/40%   64%/55%   91%/71%   92%/91%
      10/48  | 38%/45%   67%/50%   92%/78%   93%/92%
    • Table 3. Comparison of emotion reconstruction performance on the AffectNet test set

      Method          Valence                     Arousal
                      PCC↑  CCC↑  RMSE↓  SAGR↑   PCC↑  CCC↑  RMSE↓  SAGR↑
      EmoNet[16]      0.75  0.73  0.32   0.80    0.68  0.65  0.29   0.78
      ExpNet[17]      0.45  0.42  0.43   0.73    0.39  0.36  0.38   0.64
      MGCNet[18]      0.71  0.69  0.35   0.80    0.59  0.58  0.34   0.77
      3DDFA_V2[2]     0.63  0.62  0.39   0.75    0.53  0.50  0.34   0.73
      Deep3DFace[1]   0.75  0.73  0.33   0.80    0.66  0.65  0.31   0.78
      DECA[3]         0.70  0.69  0.37   0.76    0.59  0.57  0.33   0.74
      EMOCA[4]        0.78  0.77  0.31   0.81    0.69  0.68  0.27   0.81
      SMIRK[5]        0.74  0.72  0.35   -       0.63  0.61  0.31   -
      Ours            0.79  0.77  0.30   0.81    0.71  0.69  0.30   0.81
    • Table 4. Comparison of reconstruction performance with other reconstruction algorithms on Voxceleb test set

      Method        LMD↓   PSNR↑  SSIM↑  ECM↑
      ATVG[7]       18.25  26.22  0.89   0.45
      DAVS[19]      7.33   29.45  0.86   0.51
      ZHOU[20]      10.13  32.14  0.87   0.50
      YI[8]         8.67   30.44  0.81   0.55
      FACIAL[6]     6.98   31.41  0.94   0.59
      RealTalk[9]   6.72   30.13  0.93   0.63
      Ours          6.82   30.21  0.94   0.68
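The tables above report standard evaluation metrics for affect estimation (PCC, CCC, RMSE, SAGR) and reconstruction quality (PSNR). As a reference, the sketch below gives the conventional definitions of these metrics in plain Python; this is not the paper's own implementation, and LMD, SSIM, and ECM depend on landmark and perceptual models beyond the scope of a short sketch.

```python
import math

def pcc(x, y):
    """Pearson correlation coefficient (PCC, higher is better)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    sx = math.sqrt(sum((a - mx) ** 2 for a in x) / n)
    sy = math.sqrt(sum((b - my) ** 2 for b in y) / n)
    return cov / (sx * sy)

def ccc(x, y):
    """Concordance correlation coefficient (Lin's CCC): penalizes both
    decorrelation and systematic shifts in mean or variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    vx = sum((a - mx) ** 2 for a in x) / n
    vy = sum((b - my) ** 2 for b in y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    return 2 * cov / (vx + vy + (mx - my) ** 2)

def rmse(x, y):
    """Root-mean-square error (lower is better)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)) / len(x))

def sagr(x, y):
    """Sign agreement rate: fraction of predictions whose sign matches
    the ground-truth label (used for valence/arousal polarity)."""
    return sum((a >= 0) == (b >= 0) for a, b in zip(x, y)) / len(x)

def psnr(img_a, img_b, max_val=255.0):
    """Peak signal-to-noise ratio in dB between two flattened images
    of equal size; higher means a closer reconstruction."""
    mse = sum((a - b) ** 2 for a, b in zip(img_a, img_b)) / len(img_a)
    return float("inf") if mse == 0 else 10 * math.log10(max_val ** 2 / mse)
```

For example, a prediction that is perfectly correlated but scaled by 2 still gets PCC = 1.0 while CCC drops well below 1, which is why both are usually reported together.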

    Xiaolong HE, Feipeng DA, Shaoyan GAI. 3D face reconstruction based on multimodal expression encoding and temporal modeling[J]. Infrared and Laser Engineering, 2025, 54(8): 20250129

    Paper Information

    Category: Optical imaging, display and information processing

    Received: Feb. 27, 2025

    Accepted: --

    Published Online: Aug. 29, 2025

    DOI: 10.3788/IRLA20250129
