Journal of Terahertz Science and Electronic Information Technology, Volume 21, Issue 9, 1163 (2023)

Multi-agent ad-hoc speech recognition

CHEN Junqi and ZHANG Xiaolei
References (22)

[1] HEYMANN J, DRUDE L, HAEB-UMBACH R. Neural network based spectral mask estimation for acoustic beamforming[C]// 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Shanghai, China: IEEE, 2016: 196-200.

[2] XIAO Xiong, WATANABE S, ERDOGAN H, et al. Deep beamforming networks for multi-channel speech recognition[C]// 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Shanghai, China: IEEE, 2016: 5745-5749.

[3] SAINATH T N, WEISS R J, WILSON K W, et al. Multichannel signal processing with deep neural networks for automatic speech recognition[J]. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2017, 25(5): 965-979.

[4] HAEB-UMBACH R, HEYMANN J, DRUDE L, et al. Far-field automatic speech recognition[J]. Proceedings of the IEEE, 2021, 109(2): 124-148.

[5] HEYMANN J, BACCHIANI M, SAINATH T N. Performance of mask based statistical beamforming in a smart home scenario[C]// 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Calgary, AB, Canada: IEEE, 2018: 6722-6726.

[7] DORRI A, KANHERE S S, JURDAK R. Multi-agent systems: a survey[J]. IEEE Access, 2018, 6: 28573-28593.

[9] RAYKAR V C, KOZINTSEV I V, LIENHART R. Position calibration of microphones and loudspeakers in distributed computing platforms[J]. IEEE Transactions on Speech and Audio Processing, 2005, 13(1): 70-83.

[10] ZHANG Xiaolei. Deep ad-hoc beamforming[J]. Computer Speech & Language, 2021, 68: 101201.

[11] COSSALTER M, SUNDARARAJAN P, LANE I. Ad-hoc meeting transcription on clusters of mobile devices[C]// The 12th Annual Conference of the International Speech Communication Association (INTERSPEECH 2011). Florence, Italy: ISCA, 2011: 2881-2884.

[12] WOLF M, NADEU C. Channel selection measures for multi-microphone speech recognition[J]. Speech Communication, 2014, 57: 170-180.

[13] BOEDDEKER C, HEITKAEMPER J, SCHMALENSTROEER J, et al. Front-end processing for the CHiME-5 dinner party scenario[C]// The 5th International Workshop on Speech Processing in Everyday Environments (CHiME 2018). Hyderabad, India: ISCA, 2018: 35-40.

[14] WATANABE S, MANDEL M, BARKER J, et al. CHiME-6 challenge: tackling multispeaker speech recognition for unsegmented recordings[EB/OL]. (2020-05-02). https://doi.org/10.48550/arXiv.2004.09249.

[15] LI Ruizhi, WANG Xiaofei, MALLIDI S H, et al. Multi-stream end-to-end speech recognition[J]. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2020, 28: 646-655.

[16] LI Ruizhi, SELL G, WANG Xiaofei, et al. A practical two-stage training strategy for multi-stream end-to-end speech recognition[C]// ICASSP 2020-2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Barcelona, Spain: IEEE, 2020: 7014-7018.

[17] GULATI A, QIN J, CHIU C C, et al. Conformer: convolution-augmented transformer for speech recognition[EB/OL]. (2020-05-16). https://doi.org/10.48550/arXiv.2005.08100.

[18] MARTINS A F T, ASTUDILLO R F. From softmax to sparsemax: a sparse model of attention and multi-label classification[C]// Proceedings of the 33rd International Conference on Machine Learning. New York, NY, USA: JMLR.org, 2016: 1614-1623.

[19] PANAYOTOV V, CHEN Guoguo, POVEY D, et al. Librispeech: an ASR corpus based on public domain audio books[C]// 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). South Brisbane, QLD, Australia: IEEE, 2015: 5206-5210.

[20] GUAN Shanzheng, LIU Shupei, CHEN Junqi, et al. Libri-adhoc40: a dataset collected from synchronized ad-hoc microphone arrays[EB/OL]. (2021-04-07). https://doi.org/10.48550/arXiv.2103.15118.

[21] TAN Xu, ZHANG Xiaolei. Speech enhancement aided end-to-end multi-task learning for voice activity detection[EB/OL]. (2020-10-23). https://doi.org/10.48550/arXiv.2010.12484.

[22] BARKER J, MARXER R, VINCENT E, et al. The third 'CHiME' speech separation and recognition challenge: dataset, task and baselines[C]// 2015 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU). Scottsdale, AZ, USA: IEEE, 2015: 504-511.

[23] VARGA A, STEENEKEN H J M. Assessment for automatic speech recognition: II. NOISEX-92: a database and an experiment to study the effect of additive noise on speech recognition systems[J]. Speech Communication, 1993, 12(3): 247-251.

[24] PARK D S, CHAN W, ZHANG Y, et al. SpecAugment: a simple data augmentation method for automatic speech recognition[EB/OL]. (2019-12-03). https://doi.org/10.48550/arXiv.1904.08779.

    Paper Information

    Received: Jun. 14, 2021

    Accepted: --

    Published Online: Jan. 19, 2024

DOI: 10.11805/tkyda2021247
