Infrared and Laser Engineering, Volume 54, Issue 8, 20250129 (2025)

3D face reconstruction based on multimodal expression encoding and temporal modeling

Xiaolong HE1,2, Feipeng DA1,2, and Shaoyan GAI1,2
Author Affiliations
  • 1School of Automation, Southeast University, Nanjing 210096, China
  • 2Key Laboratory of Measurement and Control of Complex Systems of Engineering, Ministry of Education, Nanjing 210096, China
    Figures & Tables (12)
    • 3D face reconstruction based on multimodal expression encoding and temporal modeling
    • Cross-modal feature fusion
    • Adaptive temporal modeling
    • Expression attribute descriptor coverage range
    • Comparison of reconstruction results between the image-based bimodal face reconstruction and other algorithms
    • Tri-modal fusion face reconstruction on video sequences, compared with other algorithms
    • Ablation of modal fusion methods
    • Ablation of the temporal adaptive adjustment module
    • Table 1. Expression attribute text descriptors

      Facial area & affect | Description
      Eye state            | Gazing, squinting, scowling, staring
      Eyebrow              | Twisted, frowning, furrowed, drooping
      Mouth shape          | Closed naturally, smiling, talking, shouting
      Nose state           | Slight flare, breathing normally, flared, nostrils dilated
      Cheek state          | At rest, slightly puffed, puffed, tense
      Jaw state            | Relaxed, slightly tensed, clenched, gritting
      Forehead             | Smooth, lightly wrinkled, wrinkled, deeply furrowed
      Valence              | Sad, negative, neutral, positive, happy
      Arousal              | Calm, mild, excited, intense
    • Table 2. Impact of region and descriptor counts on coverage and overlap

      Region counts/Descriptor counts | Coverage/Overlap
      7/12   | 27%/30%   50%/71%   72%/83%   72%/89%
      8/24   | 36%/35%   61%/45%   79%/60%   84%/70%
      9/36   | 40%/40%   64%/55%   91%/71%   92%/91%
      10/48  | 38%/45%   67%/50%   92%/78%   93%/92%
    • Table 3. Comparison of emotion reconstruction performance on the AffectNet test set

      Method          Valence                     Arousal
                      PCC↑  CCC↑  RMSE↓  SAGR↑   PCC↑  CCC↑  RMSE↓  SAGR↑
      EmoNet[16]      0.75  0.73  0.32   0.80    0.68  0.65  0.29   0.78
      ExpNet[17]      0.45  0.42  0.43   0.73    0.39  0.36  0.38   0.64
      MGCNet[18]      0.71  0.69  0.35   0.80    0.59  0.58  0.34   0.77
      3DDFA_V2[2]     0.63  0.62  0.39   0.75    0.53  0.50  0.34   0.73
      Deep3DFace[1]   0.75  0.73  0.33   0.80    0.66  0.65  0.31   0.78
      DECA[3]         0.70  0.69  0.37   0.76    0.59  0.57  0.33   0.74
      EMOCA[4]        0.78  0.77  0.31   0.81    0.69  0.68  0.27   0.81
      SMIRK[5]        0.74  0.72  0.35   -       0.63  0.61  0.31   -
      Ours            0.79  0.77  0.30   0.81    0.71  0.69  0.30   0.81
    • Table 4. Comparison of reconstruction performance with other reconstruction algorithms on Voxceleb test set

      Method        LMD↓   PSNR↑  SSIM↑  ECM↑
      ATVG[7]       18.25  26.22  0.89   0.45
      DAVS[19]      7.33   29.45  0.86   0.51
      ZHOU[20]      10.13  32.14  0.87   0.50
      YI[8]         8.67   30.44  0.81   0.55
      FACIAL[6]     6.98   31.41  0.94   0.59
      RealTalk[9]   6.72   30.13  0.93   0.63
      Ours          6.82   30.21  0.94   0.68
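The tables above report standard evaluation metrics for affect estimation (PCC, CCC, RMSE, SAGR) and reconstruction quality (PSNR). As a reference, the sketch below gives the conventional definitions of these metrics in plain Python; this is not the paper's own implementation, and LMD, SSIM, and ECM depend on landmark and perceptual models beyond the scope of a short sketch.

```python
import math

def pcc(x, y):
    """Pearson correlation coefficient (PCC, higher is better)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    sx = math.sqrt(sum((a - mx) ** 2 for a in x) / n)
    sy = math.sqrt(sum((b - my) ** 2 for b in y) / n)
    return cov / (sx * sy)

def ccc(x, y):
    """Concordance correlation coefficient (Lin's CCC): penalizes both
    decorrelation and systematic shifts in mean or variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    vx = sum((a - mx) ** 2 for a in x) / n
    vy = sum((b - my) ** 2 for b in y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    return 2 * cov / (vx + vy + (mx - my) ** 2)

def rmse(x, y):
    """Root-mean-square error (lower is better)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)) / len(x))

def sagr(x, y):
    """Sign agreement rate: fraction of predictions whose sign matches
    the ground-truth label (used for valence/arousal polarity)."""
    return sum((a >= 0) == (b >= 0) for a, b in zip(x, y)) / len(x)

def psnr(img_a, img_b, max_val=255.0):
    """Peak signal-to-noise ratio in dB between two flattened images
    of equal size; higher means a closer reconstruction."""
    mse = sum((a - b) ** 2 for a, b in zip(img_a, img_b)) / len(img_a)
    return float("inf") if mse == 0 else 10 * math.log10(max_val ** 2 / mse)
```

For example, a prediction that is perfectly correlated but scaled by 2 still gets PCC = 1.0 while CCC drops well below 1, which is why both are usually reported together.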

    Xiaolong HE, Feipeng DA, Shaoyan GAI. 3D face reconstruction based on multimodal expression encoding and temporal modeling[J]. Infrared and Laser Engineering, 2025, 54(8): 20250129

    Paper Information

    Category: Optical imaging, display and information processing

    Received: Feb. 27, 2025

    Accepted: --

    Published Online: Aug. 29, 2025

    DOI: 10.3788/IRLA20250129
