Laser & Optoelectronics Progress, Volume 58, Issue 2, 0210017 (2021)

Hierarchical LSTM-Based Audio and Video Emotion Recognition With Embedded Attention Mechanism

Tianbao Liu, Lingtao Zhang*, Wentao Yu, Dongchuan Wei, and Yijun Fan
Author Affiliations
  • College of Computer and Information Engineering, Central South University of Forestry and Technology, Changsha, Hunan 410004, China

    A single-layer long short-term memory (LSTM) network does not generalize well to complex speech emotion recognition problems. Therefore, a hierarchical LSTM model with a self-attention mechanism is proposed, and penalty terms are introduced to improve network performance. For emotion recognition in video sequences, an attention mechanism assigns a weight to each video frame according to its emotional information, and the frames are then classified. A weighted decision fusion method fuses facial expression and speech signals to produce the final emotion recognition result. Experimental results demonstrate that, compared with single-modal emotion recognition, the proposed method improves recognition accuracy on the selected data by approximately 4%, indicating better recognition performance.
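
The two mechanisms named in the abstract, attention weighting over video frames and weighted decision fusion of the two modalities, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the function names (`attention_pool`, `fuse_decisions`), the use of a softmax to normalize frame scores, the convex-combination form of the fusion, and the weight `w_audio` are not taken from the paper, which may define these steps differently.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(frame_features, frame_scores):
    """Weight each frame's feature vector by softmax(frame_scores) and sum.

    frame_features: list of equal-length feature vectors, one per frame.
    frame_scores:   one relevance score per frame (hypothetical; in a real
                    model these would come from a learned attention layer).
    """
    weights = softmax(frame_scores)
    dim = len(frame_features[0])
    return [
        sum(w * f[d] for w, f in zip(weights, frame_features))
        for d in range(dim)
    ]


def fuse_decisions(p_audio, p_video, w_audio=0.5):
    """Weighted decision-level fusion of two class-probability vectors.

    Returns the fused probabilities and the index of the predicted class.
    w_audio is an assumed fusion weight; the video weight is 1 - w_audio.
    """
    if not 0.0 <= w_audio <= 1.0:
        raise ValueError("w_audio must lie in [0, 1]")
    fused = [w_audio * a + (1.0 - w_audio) * v for a, v in zip(p_audio, p_video)]
    label = max(range(len(fused)), key=fused.__getitem__)
    return fused, label
```

Because the fusion is a convex combination, the fused vector remains a valid probability distribution whenever both inputs are. In practice the fusion weight would likely be tuned on validation data rather than fixed by hand.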


    Tianbao Liu, Lingtao Zhang, Wentao Yu, Dongchuan Wei, Yijun Fan. Hierarchical LSTM-Based Audio and Video Emotion Recognition With Embedded Attention Mechanism[J]. Laser & Optoelectronics Progress, 2021, 58(2): 0210017

    Paper Information

    Category: Image Processing

    Received: Jul. 6, 2020

    Accepted: Sep. 8, 2020

    Published Online: Jan. 11, 2021

    The Author Email: Zhang Lingtao (158809488@qq.com)

    DOI:10.3788/LOP202158.0210017
