Optics and Precision Engineering, Volume 33, Issue 4, 653(2025)
A Transformer-based visual tracker via knowledge distillation
[1] MITRA S, ACHARYA T. Gesture recognition: a survey[J]. IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews), 37, 311-324(2007).
[2] COLLINS R T, LIPTON A J, KANADE T. Introduction to the special section on video surveillance[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22, 745-746(2000).
[3] HARITAOGLU I, HARWOOD D, DAVIS L S. W4: real-time surveillance of people and their activities[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22, 809-830(2000).
[4] KASTRINAKI V, ZERVAKIS M, KALAITZAKIS K. A survey of video processing techniques for traffic applications[J]. Image and Vision Computing, 21, 359-381(2003).
[5] RIOS-CABRERA R, TUYTELAARS T, VAN GOOL L. Efficient multi-camera vehicle detection, tracking, and identification in a tunnel surveillance application[J]. Computer Vision and Image Understanding, 116, 742-753(2012).
[6] DESOUZA G N, KAK A C. Vision for mobile robot navigation: a survey[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 24, 237-267(2002).
[7] BONIN-FONT F, ORTIZ A, OLIVER G. Visual navigation for mobile robots: a survey[J]. Journal of Intelligent and Robotic Systems, 53, 263-296(2008).
[8] VASWANI A, SHAZEER N, PARMAR N et al. Attention is all you need[C], 5998-6008(2017).
[9] DOSOVITSKIY A, BEYER L, KOLESNIKOV A et al. An Image Is Worth 16×16 Words: transformers for image recognition at scale[C], 1-22(2021).
[10] LIU Z, LIN Y T, CAO Y et al. Swin transformer: hierarchical vision transformer using shifted windows[C], 9992-10002(2021).
[11] CHEN X, YAN B, ZHU J et al. Transformer tracking[C], 8126-8135(2021).
[12] MAYER C, DANELLJAN M, BHAT G et al. Transforming model prediction for tracking[C], 8731-8740(2022).
[13] CHEN Q, WU Q M, WANG J et al. MixFormer: mixing features across windows and dimensions[C], 5239-5249(2022).
[14] GAO S Y, ZHOU C L, MA C et al. AiATrack: attention in attention for transformer visual tracking[C], 146-164(2022).
[15] CHEN X, PENG H W, WANG D et al. SeqTrack: sequence to sequence learning for visual object tracking[C], 14572-14581(2023).
[17] ROMERO A, BALLAS N, KAHOU S E et al. Fitnets: hints for thin deep nets[C], 1-13(2015).
[19] ZHOU G R, FAN Y, CUI R P et al. Rocket launching: a universal and efficient framework for training well-performing light net[J]. Proceedings of the AAAI Conference on Artificial Intelligence, 32, 4580-4587(2018).
[20] FU H, ZHOU S J, YANG Q H et al. LRC-BERT: latent-representation contrastive knowledge distillation for natural language understanding[J]. Proceedings of the AAAI Conference on Artificial Intelligence, 35, 12830-12838(2021).
[21] CHANG C W, ZHONG Z Q, LIOU J J. An FPGA implementation of Farneback optical flow by high-level synthesis[C], 309-309(2019).
[22] HE K M, CHEN X L, XIE S N et al. Masked autoencoders are scalable vision learners[J]. Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 15988(2022).
[23] CIPOLLA R, KENDALL A. Multi-task learning using uncertainty to weigh losses for scene geometry and semantics[C], 7482-7491(2018).
[24] FAN H, LIN L T, YANG F et al. LaSOT: a high-quality benchmark for large-scale single object tracking[C], 5369-5378(2019).
[25] HUANG L H, ZHAO X, HUANG K Q. GOT-10k: a large high-diversity benchmark for generic object tracking in the wild[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 43, 1562-1577(2021).
[26] WU Y, YANG M H. Online object tracking: a benchmark[C], 2411-2418(2013).
[27] WU Y, YANG M H. Object tracking benchmark[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 37, 1834-1848(2015).
[28] MUELLER M, SMITH N, GHANEM B. A benchmark and simulator for UAV tracking[C], 445-461(2016).
[29] MÜLLER M, BIBI A, GIANCOLA S et al. TrackingNet: a large-scale dataset and benchmark for object tracking in the wild[C], 310-327(2018).
[30] BERTINETTO L, VALMADRE J, HENRIQUES J F et al. Fully-convolutional siamese networks for object tracking[C], 850-865(2016).
[31] CHEN Z, ZHONG B, LI G et al. Siamese box adaptive network for visual tracking[C], 6668-6677(2020).
[32] GUO D Y, SHAO Y Y, CUI Y et al. Graph attention tracking[C], 9543-9552(2021).
[33] DANELLJAN M, BHAT G, KHAN F S et al. ATOM: accurate tracking by overlap maximization[C], 4655-4664(2019).
[34] BHAT G, DANELLJAN M, VAN GOOL L et al. Learning discriminative model prediction for tracking[C], 6181-6190(2019).
[35] BLATTER P, KANAKIS M, DANELLJAN M et al. Efficient visual tracking with exemplar transformers[C], 1571-1581(2023).
[36] YAN B, PENG H W, WU K et al. LightTrack: finding lightweight neural networks for object tracking via one-shot architecture search[C], 15180-15189(2021).
[38] CHEN X, KANG B, WANG D et al. Efficient visual tracking via hierarchical cross-attention transformer[C], 461-477(2023).
[39] KANG B, CHEN X, WANG D et al. Exploring lightweight hierarchical vision transformers for efficient visual tracking[C], 9578-9587(2023).
Na LI, Mengqiao LIU, Jinting PAN, Kai HUANG, Xingxuan JIA. A Transformer-based visual tracker via knowledge distillation[J]. Optics and Precision Engineering, 2025, 33(4): 653
Received: Sep. 24, 2024
Published Online: May 20, 2025
The Author Email: Na LI (lina114@xupt.edu.cn)