Laser & Optoelectronics Progress, Volume 58, Issue 2, 0210017 (2021)
Hierarchical LSTM-Based Audio and Video Emotion Recognition With Embedded Attention Mechanism
A single-layer long short-term memory (LSTM) network generalizes poorly to complex speech emotion recognition problems. Therefore, a hierarchical LSTM model with a self-attention mechanism is proposed, and penalty terms are introduced to improve network performance. For emotion recognition from video sequences, an attention mechanism assigns each video frame a weight according to the emotional information it carries, and the weighted frames are then classified. Finally, a weighted decision fusion method combines the facial expression and speech results to produce the final emotion recognition. Experimental results demonstrate that, compared with single-modal emotion recognition, the proposed method improves recognition accuracy on the selected data by approximately 4%, indicating better recognition performance.
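The abstract does not give the exact formulas, so the following is only a minimal sketch of three of the ingredients it names, written in plain Python under stated assumptions. Frame-level attention is assumed to be a softmax over dot-product scores against a hypothetical learned query vector; the penalty term is assumed to be the Frobenius-norm penalty ||AA^T - I||_F^2 used in structured self-attention (the paper's actual penalty may differ); and weighted decision fusion is assumed to be a convex combination of per-class probabilities from the speech and expression classifiers. The hierarchical LSTM encoders are omitted; frame features and class probabilities are taken as given inputs, and all function names are illustrative.

```python
import math
from typing import List, Sequence, Tuple


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(frame_feats: Sequence[Sequence[float]],
                   query: Sequence[float]) -> List[float]:
    """Weight each frame by softmax(dot(frame, query)) and return the weighted sum.

    `query` stands in for a learned parameter (hypothetical; assumed dot-product scoring).
    """
    scores = [sum(f_i * q_i for f_i, q_i in zip(f, query)) for f in frame_feats]
    weights = softmax(scores)
    dim = len(frame_feats[0])
    return [sum(w * f[d] for w, f in zip(weights, frame_feats)) for d in range(dim)]


def attention_penalty(A: Sequence[Sequence[float]]) -> float:
    """Assumed penalty ||A A^T - I||_F^2 for an r x n attention matrix A.

    Each row of A is one attention hop; the penalty is zero when the rows are
    orthonormal, which discourages different hops from attending to the same frames.
    """
    r = len(A)
    total = 0.0
    for i in range(r):
        for j in range(r):
            dot = sum(a * b for a, b in zip(A[i], A[j]))
            target = 1.0 if i == j else 0.0
            total += (dot - target) ** 2
    return total


def weighted_decision_fusion(p_audio: Sequence[float],
                             p_video: Sequence[float],
                             w_audio: float) -> Tuple[List[float], int]:
    """Fuse per-class probabilities as w * audio + (1 - w) * video.

    Returns the fused probabilities and the index of the predicted class.
    """
    fused = [w_audio * a + (1.0 - w_audio) * v for a, v in zip(p_audio, p_video)]
    return fused, max(range(len(fused)), key=fused.__getitem__)
```

For example, `weighted_decision_fusion([0.6, 0.3, 0.1], [0.2, 0.7, 0.1], w_audio=0.4)` yields fused probabilities of approximately `[0.36, 0.54, 0.10]` and predicts class 1, whereas the speech classifier alone would have chosen class 0. In this sketch the fusion weight `w_audio` is a free hyperparameter that would typically be tuned on validation data; how the paper chooses its weights is not stated in the abstract.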
Tianbao Liu, Lingtao Zhang, Wentao Yu, Dongchuan Wei, Yijun Fan. Hierarchical LSTM-Based Audio and Video Emotion Recognition With Embedded Attention Mechanism[J]. Laser & Optoelectronics Progress, 2021, 58(2): 0210017
Category: Image Processing
Received: Jul. 6, 2020
Accepted: Sep. 8, 2020
Published Online: Jan. 11, 2021
Author Email: Zhang Lingtao (158809488@qq.com)