Laser & Optoelectronics Progress, Vol. 59, Issue 13, 1307001 (2022)
Language Identification Using Joint Voice Activity Detection and Dynamic Range Control
In language identification systems, interference from silent segments and inconsistent decibel ranges across recordings degrade identification performance. In addition, spectrogram-based language identification algorithms do not represent low-frequency information effectively, which further degrades performance. To mitigate these problems, we propose a language identification method based on joint voice activity detection and dynamic range control. First, we extract the first-dimension coefficient of the Mel-frequency cepstral coefficients (MFCCs). Second, we apply median filtering to smooth this feature and perform voice activity detection to remove the silent segments of the speech. Next, we use dynamic range control to adjust the decibel range of different speech recordings. Finally, we feed the log-scale spectrogram into a convolutional neural network for classification. Experimental results on the VoxForge public corpus with the ResNeSt network show that the proposed algorithm improves performance by 7.16 percentage points over the traditional spectrogram-based language identification algorithm. Moreover, under the same experimental settings, the log-scale spectrogram outperforms other mainstream features, which validates the effectiveness of the proposed algorithm and features.
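The median-filtering, voice activity detection, and dynamic range control steps can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract gives no thresholds, window sizes, or gain rules, so the function names, the midpoint threshold (`ratio`), the kernel size, and the static compressor curve (`threshold_db`, `ratio`) are all assumptions chosen for illustration. The per-frame feature (for example, the first MFCC coefficient) is taken as an input, and the log-scale spectrogram and CNN stages are omitted.

```python
import math
from statistics import median


def median_filter(values, kernel=5):
    """Sliding-window median with edge replication; kernel should be odd."""
    half = kernel // 2
    padded = [values[0]] * half + list(values) + [values[-1]] * half
    return [median(padded[i:i + kernel]) for i in range(len(values))]


def voiced_mask(feature, kernel=5, ratio=0.5):
    """Flag frames as speech when the smoothed feature exceeds a threshold.

    Assumption: the threshold sits `ratio` of the way between the minimum
    and maximum of the smoothed track; the paper's rule may differ.
    """
    smooth = median_filter(feature, kernel)
    lo, hi = min(smooth), max(smooth)
    threshold = lo + ratio * (hi - lo)
    return [v > threshold for v in smooth]


def drop_silence(frames, mask):
    """Keep only the frames flagged as speech."""
    return [frame for frame, keep in zip(frames, mask) if keep]


def compress_db(level_db, threshold_db=-20.0, ratio=4.0):
    """Static compressor curve: levels above the threshold are scaled down."""
    if level_db <= threshold_db:
        return level_db
    return threshold_db + (level_db - threshold_db) / ratio


def apply_drc(frames, threshold_db=-20.0, ratio=4.0, eps=1e-10):
    """Apply a per-frame gain derived from each frame's RMS level in dB.

    Assumption: a plain downward compressor stands in for the dynamic
    range control described in the abstract.
    """
    out = []
    for frame in frames:
        rms = math.sqrt(sum(s * s for s in frame) / len(frame))
        level_db = 20.0 * math.log10(max(rms, eps))
        gain_db = compress_db(level_db, threshold_db, ratio) - level_db
        gain = 10.0 ** (gain_db / 20.0)
        out.append([s * gain for s in frame])
    return out


# Toy usage: one isolated spike (frame 2) and one sustained speech burst.
feature = [0, 0, 9, 0, 0, 5, 5, 5, 5, 0, 0]
frames = [[0.5, -0.5]] * len(feature)
mask = voiced_mask(feature, kernel=3)
speech = apply_drc(drop_silence(frames, mask))
```

In the toy usage, the median filter suppresses the one-frame spike, so only the sustained burst survives voice activity detection; the compressor then lowers the gain of frames whose RMS level exceeds the assumed -20 dB threshold.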
Yankai Wang, Hua Long, Yubin Shao, Qingzhi Du, Yao Wang. Language Identification Using Joint Voice Activity Detection and Dynamic Range Control[J]. Laser & Optoelectronics Progress, 2022, 59(13): 1307001
Category: Fourier Optics and Signal Processing
Received: Jul. 12, 2021
Accepted: Aug. 13, 2021
Published Online: Jun. 9, 2022
Author Email: Long Hua (2748373869@qq.com)