Laser & Infrared, Vol. 54, Issue 3, p. 431 (2024)

Behavior recognition in infrared video based on global bilinear attention

OUYANG Nan-nan1, KUANG Li-qun1,2,3,*, XIE Jian-bin1, HAN Hui-yan1,2,3, CAO Ya-ming1,2,3, and WANG Fei1,2,3
Author Affiliations
  • 1 School of Computer Science and Technology, North University of China, Taiyuan 030051, China
  • 2 Shanxi Key Laboratory of Machine Vision and Virtual Reality, Taiyuan 030051, China
  • 3 Shanxi Province's Vision Information Processing and Intelligent Robot Engineering Research Center, Taiyuan 030051, China

    Citation

    OUYANG Nan-nan, KUANG Li-qun, XIE Jian-bin, HAN Hui-yan, CAO Ya-ming, WANG Fei. Behavior recognition in infrared video based on global bilinear attention[J]. Laser & Infrared, 2024, 54(3): 431

    Paper Information

    Received: May 23, 2023

    Accepted: Jun. 4, 2025

    Published Online: Jun. 4, 2025

    Corresponding author: KUANG Li-qun (kuang@nuc.edu.cn)

    DOI: 10.3969/j.issn.1001-5078.2024.03.015