Infrared Technology, Vol. 47, Issue 8, 998 (2025)
RGBT Adaptive Fusion Visual Tracking with Transformer
GUO Yong, SHEN Haiyun, CHEN Jianyu, XIAO Zhangyong. RGBT Adaptive Fusion Visual Tracking with Transformer[J]. Infrared Technology, 2025, 47(8): 998