Optics and Precision Engineering, Volume 33, Issue 4, 653 (2025)

A Transformer-based visual tracker via knowledge distillation

Na LI*, Mengqiao LIU, Jinting PAN, Kai HUANG, and Xingxuan JIA
Author Affiliations
  • School of Communication and Information Engineering, Xi’an University of Posts and Telecommunications, Xi’an 710121, China
    References (39)

    [1] MITRA S, ACHARYA T. Gesture recognition: a survey[J]. IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews), 37, 311-324(2007).

    [2] COLLINS R T, LIPTON A J, KANADE T. Introduction to the special section on video surveillance[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22, 745-746(2000).

    [3] HARITAOGLU I, HARWOOD D, DAVIS L S. W4: real-time surveillance of people and their activities[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22, 809-830(2000).

    [4] KASTRINAKI V, ZERVAKIS M, KALAITZAKIS K. A survey of video processing techniques for traffic applications[J]. Image and Vision Computing, 21, 359-381(2003).

    [5] RIOS-CABRERA R, TUYTELAARS T, VAN GOOL L. Efficient multi-camera vehicle detection, tracking, and identification in a tunnel surveillance application[J]. Computer Vision and Image Understanding, 116, 742-753(2012).

    [6] DESOUZA G N, KAK A C. Vision for mobile robot navigation: a survey[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 24, 237-267(2002).

    [7] BONIN-FONT F, ORTIZ A, OLIVER G. Visual navigation for mobile robots: a survey[J]. Journal of Intelligent and Robotic Systems, 53, 263-296(2008).

    [8] ASHISH V, NOAM S, NIKI P et al. Attention is all you need[C], 5998-6008(2017).

    [9] DOSOVITSKIY A, BEYER L, KOLESNIKOV A et al. An image is worth 16×16 words: transformers for image recognition at scale[C], 1-22(2021).

    [10] LIU Z, LIN Y T, CAO Y et al. Swin transformer: hierarchical vision transformer using shifted windows[C], 9992-10002(2021).

    [11] CHEN X, YAN B, ZHU J et al. Transformer tracking[C], 8126-8135(2021).

    [12] MAYER C, DANELLJAN M, BHAT G et al. Transforming model prediction for tracking[C], 8731-8740(2022).

    [13] CHEN Q, WU Q M, WANG J et al. MixFormer: mixing features across windows and dimensions[C], 5239-5249(2022).

    [14] GAO S Y, ZHOU C L, MA C et al. AiATrack: attention in attention for transformer visual tracking[C], 146-164(2022).

    [15] CHEN X, PENG H W, WANG D et al. SeqTrack: sequence to sequence learning for visual object tracking[C], 14572-14581(2023).

    [17] ROMERO A, BALLAS N, KAHOU S E et al. Fitnets: hints for thin deep nets[C], 1-13(2015).

    [19] ZHOU G R, FAN Y, CUI R P et al. Rocket launching: a universal and efficient framework for training well-performing light net[J]. Proceedings of the AAAI Conference on Artificial Intelligence, 32, 4580-4587(2018).

    [20] FU H, ZHOU S J, YANG Q H et al. LRC-BERT: latent-representation contrastive knowledge distillation for natural language understanding[J]. Proceedings of the AAAI Conference on Artificial Intelligence, 35, 12830-12838(2021).

    [21] CHANG C W, ZHONG Z Q, LIOU J J. A FPGA implementation of farneback optical flow by high-level synthesis[C], 309-309(2019).

    [22] HE K M, CHEN X L, XIE S N et al. Masked autoencoders are scalable vision learners[J]. Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 15988(2022).

    [23] CIPOLLA R, KENDALL A. Multi-task learning using uncertainty to weigh losses for scene geometry and semantics[C], 7482-7491(2018).

    [24] FAN H, LIN L T, YANG F et al. LaSOT: a high-quality benchmark for large-scale single object tracking[C], 5369-5378(2019).

    [25] HUANG L H, ZHAO X, HUANG K Q. GOT-10k: a large high-diversity benchmark for generic object tracking in the wild[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 43, 1562-1577(2021).

    [26] WU Y, YANG M H. Online object tracking: a benchmark[C], 2411-2418(2013).

    [27] WU Y, YANG M H. Object tracking benchmark[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 37, 1834-1848(2015).

    [28] MUELLER M, SMITH N, GHANEM B. A benchmark and simulator for uav tracking[C], 445-461(2016).

    [29] MÜLLER M, BIBI A, GIANCOLA S et al. TrackingNet: a large-scale dataset and benchmark for object tracking in the wild[C], 310-327(2018).

    [30] BERTINETTO L, VALMADRE J, HENRIQUES J F et al. Fully-convolutional siamese networks for object tracking[C], 850-865(2016).

    [31] CHEN Z, ZHONG B, LI G et al. Siamese box adaptive network for visual tracking[C], 6668-6677(2020).

    [32] GUO D Y, SHAO Y Y, CUI Y et al. Graph attention tracking[C], 9543-9552(2021).

    [33] DANELLJAN M, BHAT G, KHAN F S et al. ATOM: accurate tracking by overlap maximization[C], 4655-4664(2019).

    [34] BHAT G, DANELLJAN M, VAN GOOL L et al. Learning discriminative model prediction for tracking[C], 6181-6190(2019).

    [35] BLATTER P, KANAKIS M, DANELLJAN M et al. Efficient visual tracking with exemplar transformers[C], 1571-1581(2023).

    [36] YAN B, PENG H W, WU K et al. LightTrack: finding lightweight neural networks for object tracking via one-shot architecture search[C], 15180-15189(2021).

    [38] CHEN X, KANG B, WANG D et al. Efficient visual tracking via hierarchical cross-attention transformer[C], 461-477(2023).

    [39] KANG B, CHEN X, WANG D et al. Exploring lightweight hierarchical vision transformers for efficient visual tracking[C], 9578-9587(2023).

    Paper Information

    Received: Sep. 24, 2024

    Accepted: --

    Published Online: May 20, 2025

    The Author Email: Na LI (lina114@xupt.edu.cn)

    DOI: 10.37188/OPE.20253304.0653
