Behavior recognition in infrared video based on global bilinear attention

OUYANG Nan-nan; KUANG Li-qun; XIE Jian-bin; HAN Hui-yan; CAO Ya-ming; WANG Fei

doi:10.3969/j.issn.1001-5078.2024.03.015

Laser & Infrared, Vol. 54, Issue 3, 431 (2024)

OUYANG Nan-nan¹, KUANG Li-qun¹,²,³,*, XIE Jian-bin¹, HAN Hui-yan¹,²,³, CAO Ya-ming¹,²,³, and WANG Fei¹,²,³

Author Affiliations

¹School of Computer Science and Technology, North University of China, Taiyuan 030051, China

²Shanxi Key Laboratory of Machine Vision and Virtual Reality, Taiyuan 030051, China

³Shanxi Province's Vision Information Processing and Intelligent Robot Engineering Research Center, Taiyuan 030051, China

OUYANG Nan-nan, KUANG Li-qun, XIE Jian-bin, HAN Hui-yan, CAO Ya-ming, WANG Fei. Behavior recognition in infrared video based on global bilinear attention[J]. Laser & Infrared, 2024, 54(3): 431

Paper Information

Received: May 23, 2023

Accepted: Jun. 4, 2025

Published Online: Jun. 4, 2025

Corresponding Author Email: KUANG Li-qun (kuang@nuc.edu.cn)
