Laser & Optoelectronics Progress, Volume 59, Issue 13, 1307001 (2022)

Language Identification Using Joint Voice Activity Detection and Dynamic Range Control

Yankai Wang, Hua Long*, Yubin Shao, Qingzhi Du, and Yao Wang
Author Affiliations
  • Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650500, Yunnan, China
    Figures & Tables (13)
    MFCC0 feature voice activity detection. (a) Voice waveform; (b) MFCC0 features; (c) MFCC0 feature voice activity detection result after median filtering
    DRC input/output processing unit
    Voice changes before and after DRC processing. (a) Voice waveform changes before and after DRC processing; (b) spectrogram before DRC processing; (c) spectrogram after DRC processing
    Comparison of different frequency scales. (a) Linear scale spectrogram; (b) log scale spectrogram
    Flow chart of language recognition
    Multi-classification task evaluation parameters
    Results of different frequency coordinate scales
    ResNet classification results
    ResNeSt classification results
    Language recognition result confusion matrix
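The DRC captions above describe an input/output processing unit but do not give its curve or parameters. One common form of dynamic range control is a downward compressor: below a threshold the output level follows the input, and above it the output rises by only 1/ratio dB per input dB. The sketch below assumes that form, uses hypothetical threshold, ratio and makeup-gain values rather than the paper's settings, and applies one gain per frame from the frame's RMS level. Practical DRC units usually also smooth the gain with attack and release times, which this sketch omits.

```python
import math


def drc_gain_db(level_db, threshold_db=-20.0, ratio=4.0, makeup_db=0.0):
    """Static downward-compressor curve: gain (dB) for a given input level (dB).

    Hypothetical defaults: threshold_db, ratio and makeup_db are illustrative,
    not values reported in the paper.
    """
    if level_db <= threshold_db:
        out_db = level_db  # below threshold: output follows input (slope 1)
    else:
        # above threshold: output rises 1/ratio dB per input dB
        out_db = threshold_db + (level_db - threshold_db) / ratio
    return out_db - level_db + makeup_db


def apply_drc(frames, **params):
    """Scale each (non-empty) frame by one gain derived from its RMS level."""
    out = []
    for frame in frames:
        rms = math.sqrt(sum(s * s for s in frame) / len(frame))
        level_db = 20.0 * math.log10(max(rms, 1e-10))  # floor avoids log10(0)
        gain = 10.0 ** (drc_gain_db(level_db, **params) / 20.0)
        out.append([s * gain for s in frame])
    return out


loud, quiet = [1.0] * 8, [0.01] * 8  # about 0 dBFS and -40 dBFS RMS
compressed = apply_drc([loud, quiet])
print(round(compressed[0][0], 4), round(compressed[1][0], 4))
```

With these defaults the 0 dBFS frame is attenuated by 15 dB while the -40 dBFS frame passes unchanged, so the level difference between the two frames shrinks from 40 dB to 25 dB.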
    • Table 1. Probability distribution change before and after VAD

      Probability distribution before VAD
      ξ | 1 | 0
      P | L2/S | L1/S
      Probability distribution after VAD
      ξ | 1 | 0
      P | (L2+L3)/S | (L1-L3)/S
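The MFCC0 caption and Table 1 above describe a VAD step that labels each frame as speech or non-speech from the MFCC0 feature and then median-filters the result. A minimal sketch of that decision-and-smoothing step follows, assuming the per-frame MFCC0 values are already computed. The threshold rule (a fraction alpha of the way from the minimum to the maximum MFCC0) and the window length are hypothetical choices, not values stated in the source.

```python
from statistics import median


def vad_from_mfcc0(c0, alpha=0.3, win=3):
    """Frame-level speech (1) / non-speech (0) decisions from MFCC0 values.

    Hypothetical choices (not from the paper): the threshold sits a fraction
    `alpha` between min and max MFCC0, and the raw decisions are smoothed by
    a median filter of odd length `win`.
    """
    if win % 2 == 0:
        raise ValueError("median filter window must be odd")
    lo, hi = min(c0), max(c0)
    threshold = lo + alpha * (hi - lo)
    raw = [1 if v > threshold else 0 for v in c0]

    # Median filter with edge replication so the output keeps the input length.
    half = win // 2
    padded = [raw[0]] * half + raw + [raw[-1]] * half
    return [int(median(padded[i:i + win])) for i in range(len(raw))]


c0 = [-10, -10, 6, -10, -10, -10, 8, 9, -9, 8, 9, -10, -10, -10]
print(vad_from_mfcc0(c0))
```

In this example the isolated high frame at index 2 is discarded as a spurious detection and the one-frame dip at index 8 inside the speech run is filled in. A longer median window smooths more aggressively but also erodes short speech segments.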
    • Table 2. Data allocation of training set and testing set

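      Language type | Training set wav count | Training set people count | Testing set wav count | Testing set people count | Total wav count | Duration /s
      French | 1200 | 150 | 300 | 149 | 1500 | 3
      German | 1200 | 150 | 300 | 150 | 1500 | 3
      Spanish | 1200 | 151 | 300 | 151 | 1500 | 3
      English | 1200 | 169 | 300 | 154 | 1500 | 3
      Italian | 1200 | 151 | 300 | 150 | 1500 | 3
      Russian | 1200 | 150 | 300 | 148 | 1500 | 3
      Total | 7200 | 921 | 1800 | 902 | 9000 |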
    • Table 3. Comparison of language identification results of several different features

      Feature | (Frame number, Data dimension) | Accuracy /%
      MFCC-SDC | (374, 56) | 65.72
      MFCC | (374, 39) | 80.88
      GFCC | (374, 32) | 85.44
      Log scale Fbank feature | (374, 64) | 93.05
      Linear scale spectrogram | (374, 128) | 93.66
      Log scale spectrogram (proposed) | (374, 128) | 97.94
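Table 3 lists 374 frames for every feature, and Table 2 lists a duration of 3 s, which this check assumes is per recording. The source does not state the sample rate, frame length or hop size, so the sketch below only tests which hypothetical settings are consistent with 374 frames, assuming framing without padding: the frame count is 1 + floor((N - frame_len) / hop) for N samples. For instance, 16 kHz audio gives N = 48000, and with a 256-sample frame and 128-sample hop, 1 + (48000 - 256) // 128 = 374.

```python
def num_frames(duration_s, sample_rate, frame_len, hop):
    """Number of full frames when framing a signal without padding.

    Formula: 1 + floor((N - frame_len) / hop), where N is the sample count.
    """
    n_samples = int(round(duration_s * sample_rate))
    if n_samples < frame_len:
        return 0  # not even one full frame fits
    return 1 + (n_samples - frame_len) // hop


# Hypothetical settings: the source does not give sample rate, frame length or hop.
for sr, frame_len, hop in [(16000, 256, 128), (8000, 128, 64), (16000, 400, 160)]:
    print(sr, frame_len, hop, num_frames(3, sr, frame_len, hop))
# prints 374, 374 and 298 frames respectively
```

Because more than one combination yields 374 frames, this arithmetic can only rule settings out (the common 25 ms / 10 ms framing at 16 kHz gives 298); it cannot identify the settings the authors actually used.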
    Citation

    Yankai Wang, Hua Long, Yubin Shao, Qingzhi Du, Yao Wang. Language Identification Using Joint Voice Activity Detection and Dynamic Range Control[J]. Laser & Optoelectronics Progress, 2022, 59(13): 1307001

    Paper Information

    Category: Fourier Optics and Signal Processing

    Received: Jul. 12, 2021

    Accepted: Aug. 13, 2021

    Published Online: Jun. 9, 2022

    Corresponding author email: Hua Long (2748373869@qq.com)

    DOI: 10.3788/LOP202259.1307001
