Laser & Optoelectronics Progress, Vol. 57, Issue 18, 181506 (2020)

Human Action Recognition Algorithm Based on Spatio-Temporal Interactive Attention Model

Na Pan, Min Jiang*, and Jun Kong
Author Affiliations
  • Jiangsu Provincial Engineering Laboratory of Pattern Recognition and Computational Intelligence, Jiangnan University, Wuxi, Jiangsu 214122, China
    Figures & Tables (12)
    Framework of action recognition network based on spatio-temporal interactive attention model
    Local_Mask feature maps generated from UCF101 dataset. (a) Balance beam; (b) walking with dog
    Mask guided spatial attention model
    Optical flow guided temporal attention model
    Training and testing iteration curves of each algorithm on UCF101 dataset. (a) Proposed model; (b) proposed model with OGTAM; (c) proposed model with MGSAM; (d) proposed model with OGTAM+MGSAM
    Visualization results of proposed algorithm on different datasets. (a) UCF101; (b) Penn Action
    • Table 1. Experimental parameters

      Parameter        Value
      Loss function    Categorical cross-entropy
      Optimizer        Adam
      Learning rate    0.0001
      Batch size       18
      Epochs           150 (Penn Action) / 250 (UCF101)
    • Table 2. Effects of optical flow guided temporal attention mechanism on UCF101 dataset (unit: %)

      Modality      RGB                TVNet
      Attention     With    Without    With    Without
      3D ConvNet    76.58   75.43      82.79   81.71
      Bi-LSTM       82.22   80.15      80.36   79.38
    • Table 3. Effects of mask guided spatial attention mechanism on UCF101 dataset (unit: %)

      Attention    With     Without
      RGB          85.44    80.15
      TVNet        82.62    81.71
      RGB+TVNet    92.80    91.70
    • Table 4. Comparison of proposed model and other basic models on UCF101 dataset (unit: %)

      Model                                 Accuracy
      VideoLSTM-two stream [15]             89.2
      Two-stream MLDF-3D [16]               91.3
      Two-stream HHF [17]                   91.2
      Proposed model                        91.7
      Proposed model (with OGTAM)           92.2
      Proposed model (with MGSAM)           92.8
      Proposed model (with OGTAM+MGSAM)     94.9
    • Table 5. Comparison of accuracy of different algorithms on UCF101 dataset (unit: %)

      Model                                 Accuracy
      IDT+FV [18]                           85.9
      IDT+HSV [19]                          87.9
      MIFS [20]                             89.1
      TSN (two modalities) [2]              94.0
      Hidden two-stream [21]                93.1
      MLDF-3D [16]                          94.4
      MS-NET [22]                           93.9
      Two-stream I3D [3]                    98.0
      Two-stream FCAN-comp [23]             92.0
      VideoLSTM [15]                        89.2
      JSTA [11]                             93.7
      RSTAN [24]                            94.6
      VideoYOLO [10]                        90.6
      Proposed model                        91.7
      Proposed model (with OGTAM+MGSAM)     94.9
    • Table 6. Comparison of accuracy of different algorithms on Penn Action dataset (unit: %)

      Model                                 Accuracy
      Good-practice CNN                     88.6
      JDD [25]                              87.4
      C3D [25]                              86.0
      TSN-S+T [2]                           93.8
      GLTF [26]                             86.1
      Im2Flow [27]                          77.4
      Spatial                               81.7
      Temporal                              83.4
      Proposed model                        89.3
      Proposed model (with OGTAM)           90.7
      Proposed model (with MGSAM)           90.6
      Proposed model (with OGTAM+MGSAM)     91.7
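
The experimental parameters in Table 1 can be written down as a small configuration object. The sketch below is illustrative only: the names `TrainingConfig` and `EPOCHS_BY_DATASET` are hypothetical and are not the authors' code, and the lowercase loss and optimizer strings are an assumed spelling. Only the numeric values and the per-dataset epoch counts come from Table 1.

```python
from dataclasses import dataclass

# Epoch counts per dataset, copied from Table 1.
# The dictionary name is hypothetical, chosen for this sketch.
EPOCHS_BY_DATASET = {"Penn Action": 150, "UCF101": 250}


@dataclass(frozen=True)
class TrainingConfig:
    """Hypothetical container for the Table 1 training parameters."""

    dataset: str
    loss: str = "categorical_crossentropy"  # assumed spelling of the Table 1 loss
    optimizer: str = "adam"                 # Table 1: Adam
    learning_rate: float = 1e-4             # Table 1: 0.0001
    batch_size: int = 18                    # Table 1: 18

    @property
    def epochs(self) -> int:
        # Look up the dataset-specific epoch count from Table 1.
        return EPOCHS_BY_DATASET[self.dataset]


ucf101 = TrainingConfig(dataset="UCF101")
penn = TrainingConfig(dataset="Penn Action")
print(ucf101.epochs, penn.epochs, ucf101.batch_size)  # prints: 250 150 18
```

Keeping the epoch count as a lookup rather than a fixed field mirrors the only parameter in Table 1 that differs between the two datasets.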

    Na Pan, Min Jiang, Jun Kong. Human Action Recognition Algorithm Based on Spatio-Temporal Interactive Attention Model[J]. Laser & Optoelectronics Progress, 2020, 57(18): 181506

    Paper Information

    Category: Machine Vision

    Received: Dec. 23, 2019

    Accepted: Feb. 14, 2020

    Published Online: Sep. 2, 2020

    Corresponding author email: Min Jiang (minjiang@jiangnan.edu.cn)

    DOI: 10.3788/LOP57.181506
