Optics and Precision Engineering, Volume 33, Issue 4, 653(2025)
A Transformer-based visual tracker via knowledge distillation
[1] MITRA S, ACHARYA T. Gesture recognition: a survey[J]. IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews), 37, 311-324(2007).
[2] COLLINS R T, LIPTON A J, KANADE T. Introduction to the special section on video surveillance[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22, 745-746(2000).
[3] HARITAOGLU I, HARWOOD D, DAVIS L S. W4: real-time surveillance of people and their activities[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22, 809-830(2000).
[4] KASTRINAKI V, ZERVAKIS M, KALAITZAKIS K. A survey of video processing techniques for traffic applications[J]. Image and Vision Computing, 21, 359-381(2003).
[5] RIOS-CABRERA R, TUYTELAARS T, VAN GOOL L. Efficient multi-camera vehicle detection, tracking, and identification in a tunnel surveillance application[J]. Computer Vision and Image Understanding, 116, 742-753(2012).
[6] DESOUZA G N, KAK A C. Vision for mobile robot navigation: a survey[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 24, 237-267(2002).
[7] BONIN-FONT F, ORTIZ A, OLIVER G. Visual navigation for mobile robots: a survey[J]. Journal of Intelligent and Robotic Systems, 53, 263-296(2008).
[8] VASWANI A, SHAZEER N, PARMAR N et al. Attention is all you need[C], 5998-6008(2017).
[9] DOSOVITSKIY A, BEYER L, KOLESNIKOV A et al. An Image Is Worth 16×16 Words: transformers for image recognition at scale[C], 1-22(2021).
[10] LIU Z, LIN Y T, CAO Y et al. Swin transformer: hierarchical vision transformer using shifted windows[C], 9992-10002(2021).
[11] CHEN X, YAN B, ZHU J et al. Transformer tracking[C], 8126-8135(2021).
[12] MAYER C, DANELLJAN M, BHAT G et al. Transforming model prediction for tracking[C], 8731-8740(2022).
[13] CHEN Q, WU Q M, WANG J et al. MixFormer: mixing features across windows and dimensions[C], 5239-5249(2022).
[14] GAO S Y, ZHOU C L, MA C et al. AiATrack: attention in attention for transformer visual tracking[C], 146-164(2022).
[15] CHEN X, PENG H W, WANG D et al. SeqTrack: sequence to sequence learning for visual object tracking[C], 14572-14581(2023).
[17] ROMERO A, BALLAS N, KAHOU S E et al. Fitnets: hints for thin deep nets[C], 1-13(2015).
[19] ZHOU G R, FAN Y, CUI R P et al. Rocket launching: a universal and efficient framework for training well-performing light net[J]. Proceedings of the AAAI Conference on Artificial Intelligence, 32, 4580-4587(2018).
[20] FU H, ZHOU S J, YANG Q H et al. LRC-BERT: latent-representation contrastive knowledge distillation for natural language understanding[J]. Proceedings of the AAAI Conference on Artificial Intelligence, 35, 12830-12838(2021).
[21] CHANG C W, ZHONG Z Q, LIOU J J. An FPGA implementation of Farneback optical flow by high-level synthesis[C], 309-309(2019).
[22] HE K M, CHEN X L, XIE S N et al. Masked autoencoders are scalable vision learners[J]. Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 15988(2022).
[23] CIPOLLA R, KENDALL A. Multi-task learning using uncertainty to weigh losses for scene geometry and semantics[C], 7482-7491(2018).
[24] FAN H, LIN L T, YANG F et al. LaSOT: a high-quality benchmark for large-scale single object tracking[C], 5369-5378(2019).
[25] HUANG L H, ZHAO X, HUANG K Q. GOT-10k: a large high-diversity benchmark for generic object tracking in the wild[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 43, 1562-1577(2021).
[26] WU Y, YANG M H. Online object tracking: a benchmark[C], 2411-2418(2013).
[27] WU Y, YANG M H. Object tracking benchmark[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 37, 1834-1848(2015).
[28] MUELLER M, SMITH N, GHANEM B. A benchmark and simulator for UAV tracking[C], 445-461(2016).
[29] MÜLLER M, BIBI A, GIANCOLA S et al. TrackingNet: a large-scale dataset and benchmark for object tracking in the wild[C], 310-327(2018).
[30] BERTINETTO L, VALMADRE J, HENRIQUES J F et al. Fully-convolutional siamese networks for object tracking[C], 850-865(2016).
[31] CHEN Z, ZHONG B, LI G et al. Siamese box adaptive network for visual tracking[C], 6668-6677(2020).
[32] GUO D Y, SHAO Y Y, CUI Y et al. Graph attention tracking[C], 9543-9552(2021).
[33] DANELLJAN M, BHAT G, KHAN F S et al. ATOM: accurate tracking by overlap maximization[C], 4655-4664(2019).
[34] BHAT G, DANELLJAN M, VAN GOOL L et al. Learning discriminative model prediction for tracking[C], 6181-6190(2019).
[35] BLATTER P, KANAKIS M, DANELLJAN M et al. Efficient visual tracking with exemplar transformers[C], 1571-1581(2023).
[36] YAN B, PENG H W, WU K et al. LightTrack: finding lightweight neural networks for object tracking via one-shot architecture search[C], 15180-15189(2021).
[38] CHEN X, KANG B, WANG D et al. Efficient visual tracking via hierarchical cross-attention transformer[C], 461-477(2023).
[39] KANG B, CHEN X, WANG D et al. Exploring lightweight hierarchical vision transformers for efficient visual tracking[C], 9578-9587(2023).
Na LI, Mengqiao LIU, Jinting PAN, Kai HUANG, Xingxuan JIA. A Transformer-based visual tracker via knowledge distillation[J]. Optics and Precision Engineering, 2025, 33(4): 653
Received: Sep. 24, 2024
Published Online: May 20, 2025
The Author Email: Na LI (lina114@xupt.edu.cn)