Computer Applications and Software, Volume 42, Issue 4, 223 (2025)
THE SEQ2SEQ MODEL FOR ASSISTING THE SPEAKER VERIFICATION ON SHORT UTTERANCES
[5] Snyder D, Garcia-Romero D, Povey D, et al. Deep neural network embeddings for text-independent speaker verification[C]//Interspeech, 2017: 999-1003.
[6] Bai Z, Zhang X L. Speaker recognition based on deep learning: An overview[J]. Neural Networks, 2021, 140: 65-99.
[8] Snyder D, Garcia-Romero D, Sell G, et al. Speaker recognition for multi-speaker conversations using X-vectors[C]//IEEE International Conference on Acoustics, Speech and Signal Processing, 2019: 5796-5800.
[9] Peddinti V, Povey D, Khudanpur S. A time delay neural network architecture for efficient modeling of long temporal contexts[C]//16th Annual Conference of the International Speech Communication Association, 2015.
[10] Snyder D, Garcia-Romero D, Sell G, et al. X-vectors: Robust DNN embeddings for speaker recognition[C]//IEEE International Conference on Acoustics, Speech and Signal Processing, 2018: 5329-5333.
[11] Zeinali H, Wang S, Silnova A, et al. BUT system description to VoxCeleb speaker recognition challenge 2019[EB]. arXiv: 1910.12592, 2019.
[12] He K M, Zhang X Y, Ren S Q, et al. Deep residual learning for image recognition[C]//IEEE Conference on Computer Vision and Pattern Recognition, 2016: 770-778.
[13] Desplanques B, Thienpondt J, Demuynck K. ECAPA-TDNN: Emphasized channel attention, propagation and aggregation in TDNN based speaker verification[EB]. arXiv: 2005.07143, 2020.
[14] Jung J W, Heo H S, Shim H J, et al. Short utterance compensation in speaker verification via cosine-based teacher-student learning of speaker embeddings[C]//IEEE Automatic Speech Recognition and Understanding Workshop, 2019: 335-341.
[15] Hajavi A, Etemad A. A deep neural network for short-segment speaker recognition[EB]. arXiv: 1907.10420, 2019.
[16] Chen W D, Huang J, Bocklet T. Length- and noise-aware training techniques for short-utterance speaker recognition[EB]. arXiv: 2008.12218, 2020.
[17] Jung Y M, Choi Y J, Lim H J, et al. A unified deep learning framework for short-duration speaker verification in adverse environments[J]. IEEE Access, 2020, 8: 175448-175466.
[18] Liu K, Zhou H. Text-independent speaker verification with adversarial learning on short utterances[C]//IEEE International Conference on Acoustics, Speech and Signal Processing, 2020: 6569-6573.
[19] Tawara N, Ogawa A, Iwata T, et al. Frame-level phoneme-invariant speaker embedding for text-independent speaker recognition on extremely short utterances[C]//IEEE International Conference on Acoustics, Speech and Signal Processing, 2020: 6799-6803.
[20] Nagrani A, Chung J S, Zisserman A. VoxCeleb: A large-scale speaker identification dataset[EB]. arXiv: 1706.08612, 2017.
[21] Gao S, Cheng M, Zhao K, et al. Res2Net: A new multi-scale backbone architecture[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2021, 43(2): 652-662.
[22] Hu J, Shen L, Sun G. Squeeze-and-excitation networks[C]//IEEE Conference on Computer Vision and Pattern Recognition, 2018: 7132-7141.
[23] Snyder D, Chen G, Povey D. MUSAN: A music, speech, and noise corpus[EB]. arXiv: 1510.08484, 2015.
[24] Ko T, Peddinti V, Povey D, et al. A study on data augmentation of reverberant speech for robust speech recognition[C]//IEEE International Conference on Acoustics, Speech and Signal Processing, 2017: 5220-5224.
Yang Shuang, Ma Baichao, Yang Yu, Chen Dan. THE SEQ2SEQ MODEL FOR ASSISTING THE SPEAKER VERIFICATION ON SHORT UTTERANCES[J]. Computer Applications and Software, 2025, 42(4): 223
Received: Jan. 17, 2022
Accepted: Aug. 25, 2025
Published Online: Aug. 25, 2025