Acta Optica Sinica, Volume 43, Issue 15, 1510003 (2023)

Research Progress in Fundamental Architecture of Deep Learning-Based Single Object Tracking Method

Tingfa Xu1,2,*, Ying Wang1, Guokai Shi3, Tianhao Li1, and Jianan Li1,**
Author Affiliations
  • 1Key Laboratory of Photoelectronic Imaging Technology and System, Ministry of Education, School of Optics and Photonics, Beijing Institute of Technology, Beijing 100081, China
  • 2Chongqing Innovation Center, Beijing Institute of Technology, Chongqing 401120, China
  • 3North Automatic Control Technology Institute, Taiyuan 030006, Shanxi, China
    References (63)

    [1] Henriques J F, Caseiro R, Martins P et al. High-speed tracking with kernelized correlation filters[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 37, 583-596(2015).

    [2] Bertinetto L, Valmadre J, Henriques J F et al. Fully-convolutional Siamese networks for object tracking[M]. Hua G, Jégou H. Computer vision–ECCV 2016 workshops. Lecture notes in computer science, 9914, 850-865(2016).

    [3] Cui Z J, An J S, Zhang Y F et al. Light-weight Siamese attention network object tracking for unmanned aerial vehicle[J]. Acta Optica Sinica, 40, 1915001(2020).

    [4] Xie Z, Geng Z, Hu J et al. Revealing the dark secrets of masked image modeling[C]. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 14475-14485(2023).

    [5] Held D, Thrun S, Savarese S. Learning to track at 100 fps with deep regression networks[C], 749-765(2016).

    [6] Guo D, Shao Y, Cui Y et al. Graph attention tracking[C]. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 9543-9552(2021).

    [7] Xu Y D, Wang Z Y, Li Z X et al. SiamFC++: towards robust and accurate visual tracking with target estimation guidelines[J]. Proceedings of the AAAI Conference on Artificial Intelligence, 34, 12549-12556(2020).

    [8] Vaswani A, Shazeer N, Parmar N et al. Attention is all you need[C], 6000-6010(2017).

    [9] Rao Y, Zhao W, Liu B et al. DynamicViT: efficient vision transformers with dynamic token sparsification[J]. Advances in Neural Information Processing Systems, 34, 13937-13949(2021).

    [10] Carion N, Massa F, Synnaeve G et al. End-to-end object detection with transformers[M]. Vedaldi A, Bischof H, Brox T, et al. Computer vision–ECCV 2020. Lecture notes in computer science, 12346, 213-229(2020).

    [11] Wang H Y, Zhu Y K, Adam H et al. MaX-DeepLab: end-to-end panoptic segmentation with mask transformers[C], 5459-5470(2021).

    [12] Chen X, Yan B, Zhu J W et al. Transformer tracking[C], 8122-8131(2021).

    [13] Xie F, Wang C Y, Wang G T et al. Correlation-aware deep tracking[C], 8741-8750(2022).

    [14] Ye B T, Chang H, Ma B P et al. Joint feature learning and relation modeling for tracking: a one-stream framework[M]. Avidan S, Brostow G, Cissé M, et al. Computer vision–ECCV 2022. Lecture notes in computer science, 13682, 341-357(2022).

    [15] Kristan M, Leonardis A, Matas J et al. The sixth visual object tracking VOT2018 challenge results[C], 3-53(2018).

    [16] Huang L H, Zhao X, Huang K Q. GOT-10k: a large high-diversity benchmark for generic object tracking in the wild[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 43, 1562-1577(2021).

    [17] Fan H, Lin L T, Yang F et al. LaSOT: a high-quality benchmark for large-scale single object tracking[C], 5369-5378(2020).

    [18] Müller M, Bibi A, Giancola S et al. TrackingNet: a large-scale dataset and benchmark for object tracking in the wild[M]. Ferrari V, Hebert M, Sminchisescu C, et al. Computer vision–ECCV 2018. Lecture notes in computer science, 11205, 310-327(2018).

    [19] Li P X, Wang D, Wang L J et al. Deep visual tracking: review and experimental comparison[J]. Pattern Recognition, 76, 323-338(2018).

    [20] Marvasti-Zadeh S M, Cheng L, Ghanei-Yakhdan H et al. Deep learning for visual tracking: a comprehensive survey[J]. IEEE Transactions on Intelligent Transportation Systems, 23, 3943-3968(2022).

    [21] Javed S, Danelljan M, Khan F S et al. Visual object tracking with discriminative filters and Siamese networks: a survey and outlook[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 45, 6552-6574(2022).

    [22] Yang T Y, Chan A B. Recurrent filter learning for visual tracking[C], 2010-2019(2018).

    [23] Song Y B, Ma C, Wu X H et al. VITAL: visual tracking via adversarial learning[C], 8990-8999(2018).

    [24] Park E, Berg A C. Meta-tracker: fast and robust online adaptation for visual object trackers[M]. Ferrari V, Hebert M, Sminchisescu C, et al. Computer vision–ECCV 2018. Lecture notes in computer science, 11207, 587-604(2018).

    [25] Zheng J L, Ma C, Peng H W et al. Learning to track objects from unlabeled videos[C], 13526-13535(2022).

    [26] Krizhevsky A, Sutskever I, Hinton G E. ImageNet classification with deep convolutional neural networks[J]. Communications of the ACM, 60, 84-90(2017).

    [27] Liu Z, Lin Y T, Cao Y et al. Swin transformer: hierarchical vision transformer using shifted windows[C], 9992-10002(2022).

    [29] Valmadre J, Bertinetto L, Henriques J et al. End-to-end representation learning for correlation filter based tracking[C], 2805-2813(2017).

    [31] Szegedy C, Vanhoucke V, Ioffe S et al. Rethinking the inception architecture for computer vision[C], 2818-2826(2016).

    [32] He K M, Zhang X Y, Ren S Q et al. Deep residual learning for image recognition[C], 770-778(2016).

    [33] Zhang Z P, Peng H W. Deeper and wider Siamese networks for real-time visual tracking[C], 4586-4595(2020).

    [34] Li B, Wu W, Wang Q et al. SiamRPN++: evolution of Siamese visual tracking with very deep networks[C], 4277-4286(2020).

    [38] Guo M Z, Zhang Z P, Fan H et al. Learning target-aware representation for visual tracking via informative interactions[C], 927-934(2022).

    [39] Li B, Yan J J, Wu W et al. High performance visual tracking with Siamese region proposal network[C], 8971-8980(2018).

    [40] Yan B, Zhang X Y, Wang D et al. Alpha-refine: boosting tracking performance by precise bounding box estimation[C], 5285-5294(2021).

    [41] Wang Y, Xu T F, Li J N et al. Pyramid correlation based deep Hough voting for visual object tracking[C], 610-625(2021).

    [42] Fu Z H, Liu Q J, Fu Z H et al. STMTrack: template-free visual tracking with space-time memory networks[C], 13769-13778(2021).

    [43] Wang X L, Girshick R, Gupta A et al. Non-local neural networks[C], 7794-7803(2018).

    [44] Yan B, Peng H W, Fu J L et al. Learning spatio-temporal transformer for visual tracking[C], 10428-10437(2022).

    [45] Song Z K, Yu J Q, Chen Y P P et al. Transformer tracking with cyclic shifting window attention[C], 8781-8790(2022).

    [46] Zhang Z P, Liu Y H, Wang X et al. Learn to match: automatic matching network design for visual tracking[C], 13319-13328(2022).

    [47] Ren S Q, He K M, Girshick R et al. Faster R-CNN: towards real-time object detection with region proposal networks[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 39, 1137-1149(2017).

    [48] Yu Y C, Xiong Y L, Huang W L et al. Deformable Siamese attention networks for visual object tracking[C], 6727-6736(2020).

    [49] Chen Z D, Zhong B N, Li G R et al. Siamese box adaptive network for visual tracking[C], 6667-6676(2020).

    [50] Jiang B R, Luo R X, Mao J Y et al. Acquisition of localization confidence for accurate object detection[M]. Ferrari V, Hebert M, Sminchisescu C, et al. Computer vision–ECCV 2018. Lecture notes in computer science, 11218, 816-832(2018).

    [51] Guo D Y, Wang J, Cui Y et al. SiamCAR: Siamese fully convolutional classification and regression for visual tracking[C], 6268-6276(2020).

    [52] Du F, Liu P, Zhao W et al. Correlation-guided attention for corner detection based visual tracking[C], 6835-6844(2020).

    [53] Deng J, Dong W, Socher R et al. ImageNet: a large-scale hierarchical image database[C], 248-255(2009).

    [54] Cui Y T, Jiang C, Wang L M et al. MixFormer: end-to-end tracking with iterative mixed attention[C], 13598-13608(2022).

    [55] Chen B Y, Li P X, Bai L et al. Backbone is all your need: a simplified architecture for visual object tracking[M]. Avidan S, Brostow G, Cissé M, et al. Computer vision–ECCV 2022. Lecture notes in computer science, 13682, 375-392(2022).

    [56] Wu H P, Xiao B, Codella N et al. CvT: introducing convolutions to vision transformers[C], 22-31(2022).

    [58] Chen X L, Xie S N, He K M. An empirical study of training self-supervised vision transformers[C], 9620-9629(2022).

    [59] He K M, Chen X L, Xie S N et al. Masked autoencoders are scalable vision learners[C], 15979-15988(2022).

    [61] Mueller M, Smith N, Ghanem B. A benchmark and simulator for UAV tracking[M]. Leibe B, Matas J, Sebe N, et al. Computer vision–ECCV 2016. Lecture notes in computer science, 9905, 445-461(2016).

    [62] Wang X, Shu X J, Zhang Z P et al. Towards more flexible and accurate object tracking with natural language: algorithms and benchmark[C], 13758-13768(2021).

    [63] Zhang Z P, Peng H W, Fu J L et al. Ocean: object-aware anchor-free tracking[M]. Vedaldi A, Bischof H, Brox T, et al. Computer vision–ECCV 2020. Lecture notes in computer science, 12366, 771-787(2020).

    Citation

    Tingfa Xu, Ying Wang, Guokai Shi, Tianhao Li, Jianan Li. Research Progress in Fundamental Architecture of Deep Learning-Based Single Object Tracking Method[J]. Acta Optica Sinica, 2023, 43(15): 1510003

    Paper Information

    Category: Image Processing

    Received: Mar. 29, 2023

    Accepted: Jun. 15, 2023

    Published Online: Aug. 15, 2023

    Author Emails: Xu Tingfa (ciom_xtf1@bit.edu.cn), Li Jianan (lijianan@bit.edu.cn)

    DOI: 10.3788/AOS230746
