Journal of Optoelectronics · Laser, Vol. 34, Issue 12, 1298 (2023)
Human motion recognition based on ConvGRU and attention feature fusion
In action recognition, how fully a model learns and exploits the correlation between the spatial and temporal features of a video largely determines the final recognition result. Traditional action recognition methods ignore the correlation between spatio-temporal features and overlook small features, which lowers recognition accuracy. To address this problem, this paper proposes a human action recognition method based on a convolutional GRU (ConvGRU) and attentional feature fusion (AFF). First, the Xception network serves as the spatial feature extraction network for video frames, and a spatio-temporal excitation (STE) module and a channel excitation (CE) module are introduced so that the network extracts spatial features while strengthening its ability to model temporal actions. In addition, the traditional long short-term memory (LSTM) network is replaced by a ConvGRU network, which uses convolution to further mine the spatial features of video frames while extracting temporal features. Finally, the output classifier is improved: a feature fusion module based on improved multi-scale channel attention is introduced to strengthen the recognition of small features and improve the accuracy of the model. Experimental results show that the method achieves recognition accuracies of 95.66% and 69.82% on the UCF101 and HMDB51 datasets, respectively. The algorithm obtains more complete spatio-temporal features and outperforms current mainstream models.
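To make the ConvGRU idea concrete, the following is a minimal sketch of one recurrent step in which the matrix products of an ordinary GRU are replaced by 2-D convolutions, so the hidden state stays a spatial map. It assumes the commonly used ConvGRU gate equations; the abstract does not give the paper's exact formulation. The single-channel toy setting, the 3x3 kernels, and all names (`conv2d_same`, `convgru_step`, `params`) are illustrative assumptions, not the authors' implementation.

```python
import math


def conv2d_same(x, k):
    """Zero-padded 2-D cross-correlation of a single-channel map x with an odd-sized kernel k."""
    h, w = len(x), len(x[0])
    r = len(k) // 2
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            s = 0.0
            for di in range(-r, r + 1):
                for dj in range(-r, r + 1):
                    ii, jj = i + di, j + dj
                    if 0 <= ii < h and 0 <= jj < w:
                        s += x[ii][jj] * k[di + r][dj + r]
            out[i][j] = s
    return out


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def combine(fn, *maps):
    """Apply fn elementwise across equally sized 2-D maps."""
    return [[fn(*vals) for vals in zip(*rows)] for rows in zip(*maps)]


def convgru_step(x, h, p):
    """One ConvGRU step (assumed standard form, single channel).

    z  = sigmoid(Wz * x + Uz * h)         update gate
    r  = sigmoid(Wr * x + Ur * h)         reset gate
    c  = tanh(Wh * x + Uh * (r . h))      candidate state
    h' = (1 - z) . h + z . c              new hidden state
    Here * is convolution and . is the elementwise product.
    """
    z = combine(lambda a, b: sigmoid(a + b),
                conv2d_same(x, p["Wz"]), conv2d_same(h, p["Uz"]))
    r = combine(lambda a, b: sigmoid(a + b),
                conv2d_same(x, p["Wr"]), conv2d_same(h, p["Ur"]))
    rh = combine(lambda a, b: a * b, r, h)
    c = combine(lambda a, b: math.tanh(a + b),
                conv2d_same(x, p["Wh"]), conv2d_same(rh, p["Uh"]))
    return combine(lambda zi, hi, ci: (1.0 - zi) * hi + zi * ci, z, h, c)


def constant_kernel(value, size=3):
    """Hypothetical fixed kernel; a trained model would learn these weights."""
    return [[value] * size for _ in range(size)]


# Toy usage: three 4x4 "frame feature maps" fed through the cell in sequence.
params = {name: constant_kernel(0.1) for name in ("Wz", "Uz", "Wr", "Ur", "Wh", "Uh")}
frames = [[[(t + i + j) / 10.0 for j in range(4)] for i in range(4)] for t in range(3)]
hidden = [[0.0] * 4 for _ in range(4)]
for frame in frames:
    hidden = convgru_step(frame, hidden, params)
```

Because every gate is computed by convolution, the hidden state keeps the height and width of the input feature map. This is the property that would let a ConvGRU carry spatial structure across frames, whereas a fully connected LSTM or GRU flattens it away.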
CHENG Nana, ZHANG Rongfen, LIU Yuhong, LIU Yuan, LIU Xingfei, YANG Shuang. Human motion recognition based on ConvGRU and attention feature fusion[J]. Journal of Optoelectronics · Laser, 2023, 34(12): 1298
Received: Mar. 21, 2023
Accepted: --
Published Online: Sep. 25, 2024
The Author Email: ZHANG Rongfen (rfzhang@gzu.edu.cn)