Infrared Technology, Vol. 47, Issue 8, 998 (2025)

RGBT Adaptive Fusion Visual Tracking with Transformer

Yong GUO, Haiyun SHEN*, Jianyu CHEN, and Zhangyong XIAO
Author Affiliations
  • School of Electrical Information, Southwest Petroleum University, Chengdu 610500, China
    References (32)

    [1] LI P, WANG D, WANG L, et al. Deep visual tracking: review and experimental comparison[J]. Pattern Recognition, 2018, 76: 323-338.

    [2] BOLME D S, BEVERIDGE J R, DRAPER B A, et al. Visual object tracking using adaptive correlation filters[C]//2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. IEEE, 2010: 2544-2550.

    [3] HENRIQUES J F, CASEIRO R, MARTINS P, et al. Exploiting the circulant structure of tracking-by-detection with kernels[C]//12th European Conference on Computer Vision, 2012: 702-715.

    [4] HELD D, THRUN S, SAVARESE S. Learning to track at 100 fps with deep regression networks[C]//14th European Conference on Computer Vision, 2016: 749-765.

    [5] BERTINETTO L, VALMADRE J, HENRIQUES J F, et al. Fully-convolutional siamese networks for object tracking[C]//Proceedings of Computer Vision - ECCV 2016 Workshops, 2016: 850-865.

    [6] WU Y, BLASCH E, CHEN G, et al. Multiple source data fusion via sparse representation for robust visual tracking[C]//14th International Conference on Information Fusion. IEEE, 2011: 1-8.

    [7] NAM H, HAN B. Learning multi-domain convolutional neural networks for visual tracking[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016: 4293-4302.

    [8] ZHU Y, LI C, LUO B, et al. Dense feature aggregation and pruning for RGBT tracking[C]//Proceedings of the 27th ACM International Conference on Multimedia, 2019: 465-472.

    [9] ZHANG X, YE P, PENG S, et al. SiamFT: an RGB-infrared fusion tracking method via fully convolutional Siamese networks[J]. IEEE Access, 2019, 7: 122122-122133.

    [10] KRIZHEVSKY A, SUTSKEVER I, HINTON G E. ImageNet classification with deep convolutional neural networks[J]. Communications of the ACM, 2017, 60(6): 84-90.

    [11] GUO C, YANG D, LI C, et al. Dual siamese network for RGBT tracking via fusing predicted position maps[J]. The Visual Computer, 2022, 38(7): 2555-2567.

    [12] LI B, YAN J, WU W, et al. High performance visual tracking with siamese region proposal network[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018: 8971-8980.

    [13] ZHANG Z, WU Y, ZHANG J, et al. Efficient channel attention for deep convolutional neural networks[C]//Proceedings of the AAAI Conference on Artificial Intelligence, 2021: 11531-11539.

    [14] VASWANI A, SHAZEER N, PARMAR N, et al. Attention is all you need[J]. Advances in Neural Information Processing Systems, 2017, 30: 6000-6010.

    [15] YAN B, PENG H, FU J, et al. Learning spatio-temporal transformer for visual tracking[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision, 2021: 10448-10457.

    [16] CHEN X, YAN B, ZHU J, et al. Transformer tracking[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021: 8126-8135.

    [17] LIN H, CHENG X, WU X, et al. CAT: cross attention in vision transformer[C]//2022 IEEE International Conference on Multimedia and Expo (ICME). IEEE, 2022: 1-6.

    [18] ZHENG Z, WANG P, LIU W, et al. Distance-IoU loss: faster and better learning for bounding box regression[C]//Proceedings of the AAAI Conference on Artificial Intelligence, 2020, 34(7): 12993-13000.

    [19] DONG X, BAO J, CHEN D, et al. CSWin transformer: a general vision transformer backbone with cross-shaped windows[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022: 12124-12134.

    [20] LI X, WANG W, HU X, et al. Selective kernel networks[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019: 510-519.

    [21] XUE L, LI X, ZHANG N L. Not all attention is needed: gated attention network for sequence data[C]//Proceedings of the AAAI Conference on Artificial Intelligence, 2020, 34(4): 6550-6557.

    [22] LI C, CHENG H, HU S, et al. Learning collaborative sparse representation for grayscale-thermal tracking[J]. IEEE Transactions on Image Processing, 2016, 25(12): 5743-5756.

    [23] LI C, LIANG X, LU Y, et al. RGB-T object tracking: benchmark and baseline[J]. Pattern Recognition, 2019, 96: 106977.

    [24] LI C L, LU A D, ZHENG A H, et al. Multi-adapter RGBT tracking[C]//2019 IEEE/CVF International Conference on Computer Vision Workshop (ICCVW), 2019: 2262-2270.

    [25] GAO Y, LI C, ZHU Y, et al. Deep adaptive fusion network for high performance RGBT tracking[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision Workshops, 2019: 91-99.

    [26] LI C, ZHAO N, LU Y, et al. Weighted sparse representation regularized graph learning for RGB-T object tracking[C]//Proceedings of the 25th ACM International Conference on Multimedia, 2017: 1856-1864.

    [27] PU S, SONG Y, MA C, et al. Deep attentive tracking via reciprocative learning[J]. Advances in Neural Information Processing Systems, 2018, 31: 1935-1945.

    [28] DANELLJAN M, BHAT G, SHAHBAZ KHAN F, et al. ECO: efficient convolution operators for tracking[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017: 6638-6646.

    [29] ZHANG Z, PENG H. Deeper and wider siamese networks for real-time visual tracking[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019: 4591-4600.

    [30] DANELLJAN M, ROBINSON A, SHAHBAZ KHAN F, et al. Beyond correlation filters: learning continuous convolution operators for visual tracking[C]//14th European Conference on Computer Vision, 2016: 472-488.

    [31] VALMADRE J, BERTINETTO L, HENRIQUES J, et al. End-to-end representation learning for correlation filter based tracking[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017: 2805-2813.

    [32] WU Y, BLASCH E, CHEN G, et al. Multiple source data fusion via sparse representation for robust visual tracking[C]//14th International Conference on Information Fusion. IEEE, 2011: 1-8.

    Paper Information

    Received: Dec. 9, 2023

    Accepted: Sep. 15, 2025

    Published Online: Sep. 15, 2025

    Author Email: SHEN Haiyun (202021000155@stu.swpu.edu.cn)
