Laser & Optoelectronics Progress, Vol. 57, Issue 20, 201506 (2020)

Action Recognition Based on Adaptive Fusion of RGB and Skeleton Features

Fuzheng Guo, Jun Kong*, and Min Jiang
Author Affiliations
  • International Joint Laboratory for Pattern Recognition and Computational Intelligence, Jiangnan University, Wuxi, Jiangsu 214122, China
    Figures & Tables(16)
    RGB images and corresponding skeleton images
    Overall network
    Spatial-temporal feature extracting network with self-attention
    Adaptive weight computing network
    Feature fusion and classification
    Accuracy of different weight combinations
    Recognition results of using skeleton features only and fusion features
    Visualization of self-attention on skeleton and RGB images of Golf
    Visualization of self-attention on skeleton and RGB images of Baseball swing
    Visualization of adaptive weight of Golf, Baseball swing, Walk and Run
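
The figure titles above refer to fusing RGB and skeleton features with a weight, but this page does not give the fusion formula. As a rough illustration only, the following sketch shows a generic weighted late fusion of two per-class score vectors. It is not the authors' AWCN: the function names, the example scores, and the use of a single scalar weight `w_rgb` passed in by hand (rather than computed by a network) are all assumptions made for this example.

```python
import math
from typing import List, Sequence


def softmax(scores: Sequence[float]) -> List[float]:
    """Convert raw class scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_scores(rgb_probs: Sequence[float],
                skeleton_probs: Sequence[float],
                w_rgb: float) -> List[float]:
    """Convex combination of two probability vectors.

    w_rgb is the weight of the RGB stream; the skeleton stream gets 1 - w_rgb.
    Hypothetical helper: a fixed scalar stands in for a learned adaptive weight.
    """
    if not 0.0 <= w_rgb <= 1.0:
        raise ValueError("w_rgb must lie in [0, 1]")
    if len(rgb_probs) != len(skeleton_probs):
        raise ValueError("both streams must score the same set of classes")
    return [w_rgb * r + (1.0 - w_rgb) * s
            for r, s in zip(rgb_probs, skeleton_probs)]


def predict(probs: Sequence[float]) -> int:
    """Index of the highest-scoring class."""
    return max(range(len(probs)), key=lambda i: probs[i])


# Made-up scores for three classes, purely for illustration.
rgb_probs = softmax([2.0, 1.0, 0.1])
skeleton_probs = softmax([0.5, 2.5, 0.2])
fused = fuse_scores(rgb_probs, skeleton_probs, w_rgb=0.5)
```

Setting `w_rgb` to 1.0 or 0.0 reduces the fusion to a single stream, which is one way to compare fused predictions against RGB-only or skeleton-only predictions.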
    • Table 1. Experimental parameters

      Parameter          Value
      Loss function      Categorical cross entropy
      Optimizer          Adam
      Learning rate      0.0001
      Batch size         32
      Number of epochs   150
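
The settings in Table 1 can be collected into a small configuration object. The sketch below uses only the Python standard library; the class name `TrainConfig`, its field names, and the helper `total_updates` are illustrative assumptions, not identifiers from the paper, and the page does not state which framework the authors used.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    """Values copied from Table 1; field names are hypothetical."""
    loss: str = "categorical cross entropy"
    optimizer: str = "Adam"
    learning_rate: float = 1e-4
    batch_size: int = 32
    epochs: int = 150


def total_updates(num_train_samples: int, cfg: TrainConfig) -> int:
    """Optimizer steps over a full run, counting the final partial batch."""
    steps_per_epoch = math.ceil(num_train_samples / cfg.batch_size)
    return steps_per_epoch * cfg.epochs


cfg = TrainConfig()
# 1000 is a made-up training-set size, used only to exercise the helper.
updates = total_updates(1000, cfg)
```

Freezing the dataclass keeps the hyperparameters from being changed by accident partway through an experiment.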
    • Table 2. Accuracy with and without self-attention on the Penn Action dataset (unit: %)

      Attention           RGB    Skeleton   Fusion
      Without attention   90.3   83.8       92.8
      With attention      92.1   85.2       94.3
    • Table 3. Accuracy with and without self-attention on the JHMDB dataset (unit: %)

      Attention           RGB    Skeleton   Fusion
      Without attention   69.2   61.9       72.9
      With attention      71.3   63.7       74.8
    • Table 4. Comparison of AWCN and other algorithms on the Penn Action dataset (unit: %)

      Algorithm                     Accuracy
      AOG-Fine[16]                  73.4
      STIP-HoG+HoG[17]              82.8
      AOG-All[16]                   85.5
      C3D[18]                       86.0
      JDD[19]                       87.4
      MMTSN-RGB+Pose[20]            91.67
      IDT-FV[19]                    92.0
      IDT-FV+Pose[19]               92.9
      TSN[21]                       93.8
      DPI+att-DTI[22]               93.9
      DPI+att-DTIs[22]              95.8
      AWCN (Ours)                   92.8
      AWCN+self-attention (Ours)    94.3
    • Table 5. Comparison of AWCN and other algorithms on the JHMDB dataset (unit: %)

      Algorithm                     Accuracy
      P-CNN[7]                      61.1
      FAT[23]                       62.5
      MMTSN-RGB+Pose[20]            62.86
      STAR-Net[24]                  64.3
      IDT-FV[19]                    65.9
      TS R-CNN[23]                  70.5
      MR-TS R-CNN[23]               71.1
      GoogLeNet+iTF[25]             74.5
      AWCN (Ours)                   72.9
      AWCN+self-attention (Ours)    74.8
    • Table 6. Comparison of AWCN and other algorithms on the NTU RGB-D dataset (unit: %)

      Algorithm                     CS     CV
      STA-LSTM[26]                  73.4   81.2
      VA-LSTM[27]                   79.4   87.6
      ST-GCN[28]                    81.5   88.3
      Two-Stream CNN[29]            83.2   89.3
      CSTA-CNN[30]                  84.9   89.9
      HCN[31]                       86.5   91.9
      SR-TSL[32]                    84.8   92.4
      AWCN (Ours)                   85.6   88.9
      AWCN+self-attention (Ours)    87.3   90.1
    Citation

    Fuzheng Guo, Jun Kong, Min Jiang. Action Recognition Based on Adaptive Fusion of RGB and Skeleton Features[J]. Laser & Optoelectronics Progress, 2020, 57(20): 201506

    Paper Information

    Category: Machine Vision

    Received: Dec. 23, 2019

    Accepted: Feb. 25, 2020

    Published Online: Oct. 17, 2020

    Corresponding author email: Jun Kong (kongjun@jiangnan.edu.cn)

    DOI:10.3788/LOP57.201506