Multi-agent ad-hoc speech recognition

CHEN Junqi; ZHANG Xiaolei

doi:10.11805/tkyda2021247

Journal of Terahertz Science and Electronic Information Technology, Vol. 21, Issue 9, 1163 (2023)

Speech perception is an important component of unmanned systems. Most existing work focuses on speech perception by a single agent, whose performance is bounded because it is degraded by noise, reverberation, and similar factors. It is therefore necessary to study multi-agent speech perception, in which agents self-organize and cooperate to improve perception performance. A multi-agent ad-hoc speech system is proposed under the assumption that each agent outputs one channel of speech stream; the system aims to exploit all channels jointly to improve perception performance. Taking speech recognition as an example, a channel selection method that can handle large-scale multi-agent speech recognition is proposed. Specifically, a stream attention mechanism for end-to-end speech recognition based on the Sparsemax operator is proposed. It forces the weights of noisy channels to zero, so that stream attention also performs channel selection. Sparsemax, however, is harsh: it drives the weights of many channels to zero. Scaling Sparsemax is therefore proposed, which penalizes channels more mildly by setting only the weights of strongly noisy channels to zero. In addition, a multilayer stream attention structure is proposed to reduce computational complexity. Experiments in an unmanned-system environment with up to 30 agents, using the Conformer speech recognition architecture, show that in test scenarios with mismatched channel numbers the Word Error Rate (WER) of the proposed Scaling Sparsemax is lower than that of Softmax by over 30% on simulated data sets and by over 20% on semi-real data sets.
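The contrast between Softmax and Sparsemax channel weights described in the abstract can be illustrated with a minimal sketch. The `sparsemax` function below follows the standard definition of the operator (Euclidean projection of the scores onto the probability simplex). The abstract does not give the formula for Scaling Sparsemax, so `scaled_sparsemax` and its fixed `scale` argument are hypothetical stand-ins, not the authors' formulation: they only show that shrinking the score differences before Sparsemax zeroes fewer channels. The channel scores are made-up values.

```python
import math


def softmax(scores):
    """Dense weights: every channel receives a strictly positive weight."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sparsemax(scores):
    """Standard sparsemax: project the scores onto the probability simplex.

    Channels whose score falls below the threshold tau get weight exactly 0.
    """
    z_sorted = sorted(scores, reverse=True)
    cumsum = 0.0
    tau = 0.0
    for k, z in enumerate(z_sorted, start=1):
        cumsum += z
        if 1.0 + k * z > cumsum:
            # z is still inside the support; update the threshold.
            tau = (cumsum - 1.0) / k
        else:
            break
    return [max(s - tau, 0.0) for s in scores]


def scaled_sparsemax(scores, scale):
    """Hypothetical milder variant (assumption, not the paper's formula):
    divide the scores by scale > 1 before sparsemax so fewer weights hit 0."""
    return sparsemax([s / scale for s in scores])


if __name__ == "__main__":
    # Made-up attention scores for four channels; the last is the noisiest.
    channel_scores = [2.0, 1.5, 0.2, -1.0]
    print("softmax         ", softmax(channel_scores))
    print("sparsemax       ", sparsemax(channel_scores))
    print("scaled sparsemax", scaled_sparsemax(channel_scores, scale=4.0))
```

With these example scores, Softmax keeps all four channels, plain Sparsemax zeroes the two lowest-scoring channels, and the scaled variant with `scale=4.0` zeroes only the lowest-scoring one. In the system described above, the scale would presumably be learned or computed from the data rather than fixed by hand.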

Keywords

attention; channel selection; multi-agent; speech recognition; Scaling Sparsemax

Paper Information

Received: Jun. 14, 2021

Published Online: Jan. 19, 2024
