Laser & Infrared, Volume 54, Issue 3, 431 (2024)
Behavior recognition in infrared video based on global bilinear attention
OUYANG Nan-nan, KUANG Li-qun, XIE Jian-bin, HAN Hui-yan, CAO Ya-ming, WANG Fei. Behavior recognition in infrared video based on global bilinear attention[J]. Laser & Infrared, 2024, 54(3): 431
Received: May. 23, 2023
Accepted: Jun. 4, 2025
Published Online: Jun. 4, 2025
The Author Email: KUANG Li-qun (kuang@nuc.edu.cn)