Laser & Optoelectronics Progress, Volume 58, Issue 2, 0210017 (2021)

Hierarchical LSTM-Based Audio and Video Emotion Recognition With Embedded Attention Mechanism

Tianbao Liu, Lingtao Zhang*, Wentao Yu, Dongchuan Wei, and Yijun Fan
Author Affiliations
  • College of Computer and Information Engineering, Central South University of Forestry and Technology, Changsha, Hunan 410004, China
    Figures & Tables(12)
    Flow chart of audio and video emotion recognition system
    Structure of recursive neuron
    Schematic of attention mechanism
    Schematic of stacking LSTM model with attention mechanism
    Diagram of video emotion recognition system
    Relationship between LSTM layers and recognition rate
    Performance comparison of different feature fusion algorithms
    • Table 1. Comparison of recognition rate in speech emotion recognition experiment

      Network, RML, AFEW6.0, eNTERFACE'05
      SVM [23], 0.6020, 0.3790, 0.4831
      Random forest [24], 0.6528, 0.3508, 0.4711
      LSTM+CNN [25], 0.8546, 0.4915
      CNN, 0.8363, 0.4691
      CNN+LSTM, 0.8446, 0.4217, 0.4952
      Proposed network, 0.9011, 0.5473, 0.5932
    • Table 2. Recognition rate comparison of hierarchical attention mechanism

      Dataset, 3-layer LSTM (ordinary), 3-layer LSTM (add attention mechanism)
      RML, 0.8661, 0.8873
      AFEW6.0, 0.4633, 0.4965
      eNTERFACE'05, 0.5315, 0.5739
    • Table 3. Recognition rate comparison under penalty items

      Dataset, Ordinary, Add penalty
      RML, 0.8873, 0.9011
      AFEW6.0, 0.4965, 0.5473
      eNTERFACE'05, 0.5739, 0.5932
    • Table 4. Recognition rate of facial expression

      Video sequence feature, RML, AFEW6.0, eNTERFACE'05
      EF-A, 0.8653, 0.5074, 0.7458
      EF-B, 0.8812, 0.5185, 0.7974
      EF-C, 0.8232, 0.4713, 0.7515
      EF-VGG, 0.8346, 0.4882, 0.7627
    • Table 5. Weight settings on three datasets

      Dataset, Facial expression recognition, Speech expression recognition
      RML, 0.60, 0.40
      AFEW6.0, 0.75, 0.25
      eNTERFACE'05, 0.80, 0.20
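
The per-dataset weights in Table 5 suggest a weighted combination of the facial-expression and speech outputs. The listing does not state the exact fusion rule, so the sketch below assumes a simple weighted sum of per-class scores followed by an argmax. The names `FUSION_WEIGHTS`, `fuse_scores`, and `predict` are hypothetical, and the class scores are made up for illustration; only the weights come from Table 5.

```python
# Modality weights copied from Table 5: (facial expression, speech).
FUSION_WEIGHTS = {
    "RML": (0.60, 0.40),
    "AFEW6.0": (0.75, 0.25),
    "eNTERFACE'05": (0.80, 0.20),
}


def fuse_scores(face_scores, speech_scores, dataset):
    """Weighted sum of per-class scores (assumed fusion rule, not confirmed by the source)."""
    if len(face_scores) != len(speech_scores):
        raise ValueError("both modalities must score the same set of classes")
    w_face, w_speech = FUSION_WEIGHTS[dataset]
    return [w_face * f + w_speech * s for f, s in zip(face_scores, speech_scores)]


def predict(face_scores, speech_scores, dataset):
    """Index of the class with the highest fused score."""
    fused = fuse_scores(face_scores, speech_scores, dataset)
    return max(range(len(fused)), key=fused.__getitem__)


# Hypothetical three-class scores, invented purely for illustration.
face = [0.2, 0.5, 0.3]
speech = [0.7, 0.2, 0.1]

label_rml = predict(face, speech, "RML")                # speech weight 0.40
label_enterface = predict(face, speech, "eNTERFACE'05")  # speech weight 0.20
```

With these made-up scores the two modalities disagree, and the heavier facial weight used for eNTERFACE'05 can flip the fused decision relative to RML, which is the practical effect of choosing weights per dataset.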
    Citation

    Tianbao Liu, Lingtao Zhang, Wentao Yu, Dongchuan Wei, Yijun Fan. Hierarchical LSTM-Based Audio and Video Emotion Recognition With Embedded Attention Mechanism[J]. Laser & Optoelectronics Progress, 2021, 58(2): 0210017

    Paper Information

    Category: Image Processing

    Received: Jul. 6, 2020

    Accepted: Sep. 8, 2020

    Published Online: Jan. 11, 2021

    Corresponding Author Email: Lingtao Zhang (158809488@qq.com)

    DOI: 10.3788/LOP202158.0210017
