Computer Applications and Software, Vol. 42, Issue 4, 156 (2025)

HUMAN BEHAVIOR DETECTION METHOD BASED ON SPATIO-TEMPORAL INTERACTIVE NETWORK

Tian Qing1, Zhang Haoran1, Chu Baiqing2, Zhang Zheng1, and Dou Fei2,3
Author Affiliations
  • 1School of Information, North China University of Technology, Beijing 100144, China
  • 2Beijing Mass Transit Railway Operation Co., Ltd., Beijing 100144, China
  • 3Beijing Key Laboratory of Subway Operation Safety Technology, Beijing 100144, China
    References(23)

    [4] Redmon J, Divvala S, Girshick R, et al. You only look once: Unified, real-time object detection[C]//IEEE Conference on Computer Vision and Pattern Recognition, 2016: 779-788.

    [5] Liu W, Anguelov D, Erhan D, et al. SSD: Single shot multibox detector[C]//European Conference on Computer Vision, 2016: 21-37.

    [6] Feichtenhofer C, Pinz A, Zisserman A. Convolutional two-stream network fusion for video action recognition[C]//IEEE Conference on Computer Vision and Pattern Recognition, 2016: 1933-1941.

    [7] Donahue J, Hendricks L A, Guadarrama S, et al. Long-term recurrent convolutional networks for visual recognition and description[C]//IEEE Conference on Computer Vision and Pattern Recognition, 2015: 2625-2634.

    [8] Peng X J, Schmid C. Multi-region two-stream R-CNN for action detection[C]//European Conference on Computer Vision, 2016: 744-759.

    [9] Singh G, Saha S, Sapienza M, et al. Online real-time multiple spatiotemporal action localisation and prediction[C]//IEEE International Conference on Computer Vision, 2017: 3637-3646.

    [10] Hou R, Chen C, Shah M. Tube convolutional neural network (T-CNN) for action detection in videos[C]//IEEE International Conference on Computer Vision, 2017: 5822-5831.

    [11] Duarte K, Rawat Y S, Shah M. VideoCapsuleNet: A simplified network for action detection[C]//32nd International Conference on Neural Information Processing Systems, 2018: 7621-7630.

    [12] Girdhar R, Carreira J, Doersch C, et al. Video action transformer network[C]//IEEE Conference on Computer Vision and Pattern Recognition, 2019: 244-253.

    [13] Wu J, Yang X, Xi M, et al. Research on behavior recognition algorithm based on SE-I3D-GRU network[J]. High Technology Letters, 2021, 27(2): 163-172.

    [14] Wu C Y, Feichtenhofer C, Fan H, et al. Long-term feature banks for detailed video understanding[C]//IEEE Conference on Computer Vision and Pattern Recognition, 2019: 284-293.

    [15] Sun N, Leng L, Liu J X, et al. Multi-stream SlowFast graph convolutional networks for skeleton-based action recognition[J]. Image and Vision Computing, 2021, 109: 104141.

    [16] Wei X Y, Rigoll G, Kopuklu O. You only watch once: A unified CNN architecture for real-time spatiotemporal action localization[EB]. arXiv: 1911.06644, 2019.

    [17] Wang X L, Girshick R, Gupta A, et al. Non-local neural networks[C]//IEEE Conference on Computer Vision and Pattern Recognition, 2018: 7794-7803.

    [19] Wang F, Jiang M Q, Qian C, et al. Residual attention network for image classification[C]//IEEE Conference on Computer Vision and Pattern Recognition, 2017: 6450-6458.

    [20] Chaudhari S, Polatkan G, Ramanath R, et al. An attentive survey of attention models[EB]. arXiv: 1904.02874, 2019.

    [21] Babaeizadeh M, Ghiasi G. Adjustable real-time style transfer[EB]. arXiv: 1811.08560, 2018.

    [23] Gu C H, Chen S, Ross D A, et al. AVA: A video dataset of spatio-temporally localized atomic visual actions[C]//IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2018: 6047-6056.

    [24] Sun C, Shrivastava A, Vondrick C, et al. Actor-centric relation network[EB]. arXiv: 1807.10982, 2018.

    [25] Jiang J W, Cao Y, Song L, et al. Human centric spatiotemporal action localization[EB/OL]. [2021-11-01]. https://www.skicyyu.org/AVA/AVA_report.pdf.

    [26] Kalogeiton V, Weinzaepfel P, Ferrari V, et al. Action tubelet detector for spatio-temporal action localization[C]//IEEE International Conference on Computer Vision, 2017: 4405-4413.

    [27] Girdhar R, Carreira J, Doersch C, et al. Video action transformer network[C]//IEEE Conference on Computer Vision and Pattern Recognition, 2019: 244-253.

    [28] Feichtenhofer C, Fan H, Malik J, et al. SlowFast networks for video recognition[C]//IEEE International Conference on Computer Vision, 2019: 6202-6211.

    Citation

    Tian Qing, Zhang Haoran, Chu Baiqing, Zhang Zheng, Dou Fei. HUMAN BEHAVIOR DETECTION METHOD BASED ON SPATIO-TEMPORAL INTERACTIVE NETWORK[J]. Computer Applications and Software, 2025, 42(4): 156
    Paper Information

    Received: Nov. 1, 2021

    Accepted: Aug. 25, 2025

    Published Online: Aug. 25, 2025

    DOI:10.3969/j.issn.1000-386x.2025.04.024