Chinese Journal of Liquid Crystals and Displays, Vol. 38, Issue 8, 1095 (2023)

Behavior recognition based on time-dependent attention

Kuan LIU1,2, Wei WANG1,2, Hong-ting SHEN1, Hong-tao HOU1,2, Min-zhen GUO1,2, and Zi-jiang LUO1,*
Author Affiliations
  • 1School of Information, Guizhou University of Finance and Economics,Guiyang 550025, China
  • 2Intelligent Middle, Beijing Cloud Trace Technology Co., Ltd., Beijing 100089, China
Figures & Tables (15)
    SlowFast architecture
    Example of action correlation
    Correlative attention mechanism
    Temporal attention mechanism
    Time-dependent attention mechanism
    Residual block of Res_TCAM
    Statistics of the number of UCF101 video frames
    Training results on UCF101
    Confusion matrix
    • Table 0. Algorithm 1: Automatic frame sampling algorithm
      Input: raw video RawVideo; frame sampling rate r; number of valid sampled frames γ

      Output: buffer holding the sampled frame data: DataBuffer

      Variables: start index start_idx; end index end_idx; valid-sample counter sample_count; frame counter count; number of video frames frame_count; frame sample quota frame_sample_count

      1  get the number of video frames: frame_count=GetFrameCountFun(RawVideo)
      2  initialize parameters:
      3  start_idx=0
      4  end_idx=frame_count-1
      5  sample_count=0
      6  count=0
      7  frame_sample_count=frame_count // r-1
      8  if(frame_count>300)
      9    end_idx=RandInt(300,frame_count)
      10   start_idx=end_idx-300
      11   frame_sample_count=301 // r-1
      12 initialize the data buffer: InitFun(DataBuffer)
      13 while(count<end_idx)do
      14    frame=GetFrame(RawVideo)
      15    if(count<start_idx)
      16       count+=1
      17       continue
      18    if(count>end_idx)
      19       break
      20    if(sample_count<frame_sample_count)
      21       DataBuffer[sample_count]=frame
      22       sample_count+=1
      23    count+=1
      24 if(len(DataBuffer)<γ+2):
      25    delete DataBuffer
      26 else
      27    DataBuffer←draw γ frames from DataBuffer at random,
             preserving temporal order
      28 return DataBuffer
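
      In short: videos longer than 300 frames are cropped to a random 300-frame window, frames inside the window are buffered up to the sampling quota, and γ of them are finally drawn at random in temporal order. A minimal executable Python sketch of Algorithm 1 follows (the sample_frames name and the OpenCV decoding calls are assumptions; the paper specifies only the pseudocode above):

          import random
          import cv2  # assumed video decoder; the paper does not name one

          def sample_frames(video_path, r, gamma):
              """Automatic frame sampling (sketch of Algorithm 1).
              r: frame sampling rate; gamma: number of frames kept."""
              cap = cv2.VideoCapture(video_path)
              frame_count = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))

              # Default window: the whole video.
              start_idx, end_idx = 0, frame_count - 1
              frame_sample_count = frame_count // r - 1
              if frame_count > 300:
                  # Long videos: sample inside a random 300-frame window.
                  end_idx = random.randint(300, frame_count - 1)
                  start_idx = end_idx - 300
                  frame_sample_count = 301 // r - 1

              buffer, count = [], 0
              while count <= end_idx:
                  ok, frame = cap.read()
                  if not ok:
                      break
                  if count >= start_idx and len(buffer) < frame_sample_count:
                      buffer.append(frame)
                  count += 1
              cap.release()

              if len(buffer) < gamma + 2:
                  return None  # too few frames: discard the clip
              # Randomly keep gamma frames, preserving temporal order.
              keep = sorted(random.sample(range(len(buffer)), gamma))
              return [buffer[i] for i in keep]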

    • Table 1. Parameters of network structure

      Stage     | Slow pathway                          | Fast pathway                       | Output size T×S²
      Input     | —                                     | —                                  | 64×224²
      Data      | stride 16, 1²                         | stride 2, 1²                       | slow: 4×224²; fast: 32×224²
      Conv1     | 1×7², 64, stride 1, 2²                | 5×7², 8, stride 1, 2²              | slow: 4×112²; fast: 32×112²
      Pool1     | 1×3², 64, stride 1, 2²                | 1×3², 8, stride 1, 2²              | slow: 4×56²; fast: 32×56²
      Res_TCAM1 | [1×1², 64; 1×3², 64; TCAM, 256]×3     | [3×1², 8; 1×3², 8; TCAM, 32]×3     | slow: 4×56²; fast: 32×56²
      Res_TCAM2 | [1×1², 128; 1×3², 128; TCAM, 512]×4   | [3×1², 16; 1×3², 16; TCAM, 64]×4   | slow: 4×28²; fast: 32×28²
      Res_TCAM3 | [3×1², 256; 1×3², 256; TCAM, 1 024]×6 | [3×1², 32; 1×3², 32; TCAM, 128]×6  | slow: 4×14²; fast: 32×14²
      Res_TCAM4 | [3×1², 512; 1×3², 512; TCAM, 2 048]×4 | [3×1², 64; 1×3², 64; TCAM, 256]×4  | slow: 4×7²; fast: 32×7²
      Head      | global average pool, concat, fc       |                                    | # classes
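
      Read row-wise, each Res_TCAM stage is a SlowFast-style bottleneck whose last 1×1² convolution feeds a TCAM module. The sketch below shows how one such bottleneck could be assembled in PyTorch; the TCAM class here is a bare 1×1×1 projection placeholder (this excerpt does not reproduce the attention internals), and all class and parameter names are assumptions:

          from torch import nn

          class TCAM(nn.Module):
              """Placeholder for the time-dependent attention module: only a
              1x1x1 projection here; the real TCAM also reweights features."""
              def __init__(self, in_ch, out_ch):
                  super().__init__()
                  self.proj = nn.Conv3d(in_ch, out_ch, kernel_size=1, bias=False)
                  self.bn = nn.BatchNorm3d(out_ch)

              def forward(self, x):
                  return self.bn(self.proj(x))

          class ResTCAMBlock(nn.Module):
              """One bottleneck, e.g. slow Res_TCAM1: 1x1²,64 -> 1x3²,64 -> TCAM,256."""
              def __init__(self, in_ch, mid_ch, out_ch, t_kernel=1):
                  super().__init__()
                  self.branch = nn.Sequential(
                      nn.Conv3d(in_ch, mid_ch, (t_kernel, 1, 1),
                                padding=(t_kernel // 2, 0, 0), bias=False),
                      nn.BatchNorm3d(mid_ch), nn.ReLU(inplace=True),
                      nn.Conv3d(mid_ch, mid_ch, (1, 3, 3),
                                padding=(0, 1, 1), bias=False),
                      nn.BatchNorm3d(mid_ch), nn.ReLU(inplace=True),
                      TCAM(mid_ch, out_ch),
                  )
                  self.shortcut = (nn.Identity() if in_ch == out_ch
                                   else nn.Conv3d(in_ch, out_ch, 1, bias=False))
                  self.relu = nn.ReLU(inplace=True)

              def forward(self, x):  # x: (N, C, T, H, W)
                  return self.relu(self.branch(x) + self.shortcut(x))

          # e.g. the slow-pathway Res_TCAM1 row: three blocks of (64, 64, 256)
          stage1 = nn.Sequential(*[ResTCAMBlock(64 if i == 0 else 256, 64, 256)
                                   for i in range(3)])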

    • Table 2. Effect of γ on the model

      Model         | γ   | Top-1/% | GFlops
      SlowFast      | 64  | 92.68   | 27.99
                    | 96  | 93.12   | 41.99
                    | 128 | 94.55   | 55.98
      SlowFast+TCAM | 64  | 95.15   | 28.74
                    | 96  | 96.03   | 43.08
                    | 128 | 96.11   | 57.49
    • Table 3. Effects of different frame sampling rates on the model

      T×τ  | Top-1/% | GFlops
      2×32 | 92.89   | 14.86
      4×16 | 95.15   | 28.53
      8×8  | 96.73   | 56.79
    • Table 4. Effects of time-dependent attention and lateral connectivity on experimental results

      Model               | TCAM | Lateral | Lateral (Ours) | Top-1/%
      SlowFast (baseline) |      |         |                | 92.68
      SlowFast            |      |         |                | 93.79
      Slow Only           |      |         |                | 94.70
      Fast Only (Ours)    |      |         |                | 95.83
      SlowFast            |      |         |                | 95.15
    • Table 5. Comparison of recognition accuracy of TCAM with other methods

      Class        | Method                | Pretraining       | Spatial resolution | Backbone     | UCF101/% | HMDB51/%
      Baseline     | IDT+FV[1]             |                   |                    |              | 85.90    | 57.20
                   | IDT+BOVW[1]           |                   |                    |              | 87.90    | 61.10
                   | Two-stream[7]         | ImageNet          | 224×224            | ConvNets     | 88.00    | 59.40
                   | C3D[8]                |                   | 128×171            | C3D          | 85.2     |
      No-attention | Two-stream-I3D[9]     |                   | 224×224            | Inception V1 | 93.40    | 66.40
                   | Two-stream-I3D[9]     | ImageNet+Kinetics | 224×224            | Inception V1 | 98.00    | 80.70
                   | Hidden Two-stream[26] | UCF101            | 224×224            | I3D          | 97.10    | 78.70
                   | R(2+1)D[27]           | Kinetics          | 112×112            | ResNet34     | 94.97    |
      Attention    | AR3D_V1[22]           |                   | 112×112            | R3D          | 88.39    | 51.53
                   | AR3D_V2[22]           |                   | 112×112            | R3D          | 89.28    | 52.51
                   | STA-LSTM[10]          | ImageNet          | 224×224            | GoogLeNet    | 90.80    | 59.70
                   | STA-LSTM[10]          | Kinetics          | 224×224            | I3D          | 98.66    | 81.45
                   | Chen[28]              | Kinetics          | 224×224            | Inception-v3 | 96.13    | 74.45
                   | Chen[28]              | Kinetics          | 224×224            | BN-Inception | 96.18    | 75.17
                   | NST[29]               | ImageNet+Kinetics | 224×224            | I3D          | 96.00    | 76.10
                   | Ref. [11]             | ImageNet+Kinetics | 224×224            | ResNet50     | 96.3     | 77.7
                   | Ref. [12]             | ImageNet          | 128×128            | InceptionV3  | 94.9     |
                   | Ref. [13]             | ImageNet          | 299×299            | InceptionV3  | 94.9     | 70.8
                   | TCAM (Ours)           |                   | 224×224            | ResNet50     | 95.83    | 76.10
                   | TCAM (Ours)           | Kinetics          | 224×224            | ResNet50     | 98.16    | 82.30
    Citation: Kuan LIU, Wei WANG, Hong-ting SHEN, Hong-tao HOU, Min-zhen GUO, Zi-jiang LUO. Behavior recognition based on time-dependent attention[J]. Chinese Journal of Liquid Crystals and Displays, 2023, 38(8): 1095.
    Paper Information

    Category: Research Articles

    Received: Oct. 11, 2022

    Accepted: --

    Published Online: Oct. 9, 2023

    Author Email: Zi-jiang LUO (luozijiang@mail.gufe.edu.cn)

    DOI: 10.37188/CJLCD.2022-0330
