Experiment Science and Technology, Vol. 23, Issue 4, p. 1 (2025)
Experimental Design of Speech Emotion Recognition with the Multi-Task Teacher-Student Model
[1] LANGARI S, MARVI H, ZAHEDI M. Efficient speech emotion recognition using modified feature extraction[J]. Informatics in Medicine Unlocked, 20: 100424 (2020).
[2] MAI J L, XING X F, CHEN W D, et al. DropFormer: A dynamic noise-dropping transformer for speech emotion recognition[C], 2645-2649 (2024).
[3] WANG J C, ZHAO Y, LU C, et al. Boosting cross-corpus speech emotion recognition using CycleGAN with contrastive learning[C], 1605-1609 (2024).
[4] CHATTERJEE R, MAZUMDAR S, SHERRATT R S, et al. Real-time speech emotion analysis for smart home assistants[J]. IEEE Transactions on Consumer Electronics, 67: 68-76 (2021).
[5] LAGHARI M, TAHIR M J, AZEEM A, et al. Robust speech emotion recognition for Sindhi language based on deep convolutional neural network[C], 543-548 (2021).
[6] KODURU A, VALIVETI H B, BUDATI A K. Feature extraction algorithms to improve the speech emotion recognition rate[J]. International Journal of Speech Technology, 23: 45-55 (2020).
[7] ZHOU H S, DU J, TU Y H, et al. Using speech enhancement preprocessing for speech emotion recognition in realistic noisy conditions[C], 4098-4102 (2020).
[8] CHAKRABORTY R, PANDA A, PANDHARIPANDE M, et al. Front-end feature compensation and denoising for noise robust speech emotion recognition[C], 3257-3261 (2019).
[9] SUN L H, FU S, WANG F. Decision tree SVM model with Fisher feature selection for speech emotion recognition[J]. EURASIP Journal on Audio, Speech, and Music Processing, 2019: 2 (2019).
[10] BANDELA S R, KUMAR T K. Unsupervised feature selection and NMF de-noising for robust speech emotion recognition[J]. Applied Acoustics, 172: 107645 (2021).
[11] TRIANTAFYLLOPOULOS A, KEREN G, WAGNER J, et al. Towards robust speech emotion recognition using deep residual networks for speech enhancement[C], 1691-1695 (2019).
[12] SUN L H, LEI Y L, WANG S, et al. Joint enhancement and classification constraints for noisy speech emotion recognition[J]. Digital Signal Processing, 151: 104581 (2024).
[13] BUSSO C, BULUT M, LEE C C, et al. IEMOCAP: Interactive emotional dyadic motion capture database[J]. Language Resources and Evaluation, 42: 335-359 (2008).
[14] LOHRENZ T, LI Z Y, FINGSCHEIDT T. Multi-encoder learning and stream fusion for transformer-based end-to-end automatic speech recognition[C], 2846-2850 (2021).
[15] LI J Y, ZHAO R, HUANG J T, et al. Learning small-size DNN with output-distribution-based criteria[C], 1910-1914 (2014).
[16] NEUMANN M, VU N T. Improving speech emotion recognition with unsupervised representation learning on unlabeled speech[C], 7390-7394 (2019).
[17] XU M K, ZHANG F, CUI X D, et al. Speech emotion recognition with multiscale area attention and data augmentation[C], 6319-6323 (2021).
[19] XU M K, ZHANG F, KHAN S U. Improve accuracy of speech emotion recognition with attention head fusion[C], 1058-1064 (2020).
[20] ZHU W J, LI X. Speech emotion recognition with global-aware fusion on multi-scale feature representation[C], 6437-6441 (2022).
[21] YE J X, WEN X C, WANG X Z, et al. GM-TCNet: Gated multi-scale temporal convolutional network using emotion causality for speech emotion recognition[J]. Speech Communication, 145: 21-35 (2022).
[22] YE J X, WEN X C, WEI Y J, et al. Temporal modeling matters: A novel temporal emotional modeling approach for speech emotion recognition[C], 1-5 (2023).
Citation:
Linhui SUN, Ping'an LI, Yunlong LEI, Zixiao ZHANG. Experimental Design of Speech Emotion Recognition with the Multi-Task Teacher-Student Model[J]. Experiment Science and Technology, 2025, 23(4): 1.
Received: Jul. 25, 2024
Accepted: Oct. 30, 2024
Published Online: Jul. 30, 2025
Corresponding Author: Linhui SUN (sunlh@njupt.edu.cn)