Optics and Precision Engineering, Volume 33, Issue 12, 1940 (2025)

RGB-T tracking network based on multi-modal feature fusion

Jing JIN, Jianqin LIU*, and Fengwen ZHAI
Author Affiliations
  • School of Electronic and Information Engineering, Lanzhou Jiaotong University, Lanzhou 730070, China

    In recent years, RGB-T tracking methods have been widely applied in visual tracking tasks owing to the complementarity of visible (RGB) and thermal infrared (TIR) images. However, existing RGB-T moving-target tracking methods do not yet fully exploit the complementary information between the two modalities, which limits tracker performance. In particular, existing Transformer-based RGB-T tracking algorithms still lack direct interaction between the two modalities, preventing full use of the original semantic information of the RGB and TIR streams. To address this problem, this paper proposes a Multi-Modal Feature Fusion Tracking Network for RGB-T tracking (MMFFTN). First, after preliminary features are extracted by the backbone network, a Channel Feature Fusion Module (CFFM) is introduced to realize direct interaction and fusion of RGB and TIR channel features. Second, to mitigate the unsatisfactory fusion caused by the discrepancy between the RGB and TIR modalities, a Cross-Modal Feature Fusion Module (CMFM) is designed, in which the global features of RGB and TIR are further fused through an adaptive fusion strategy to improve tracking accuracy. The proposed model is evaluated in detail on three datasets: GTOT, RGBT234, and LasHeR. Experimental results show that MMFFTN improves the success rate and precision rate by 3.0% and 4.7%, respectively, over the state-of-the-art Transformer-based tracker ViPT, and by 2.4% and 3.3%, respectively, over the Transformer-based tracker SDSTrack.
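    The abstract does not specify the exact form of the CMFM's adaptive fusion strategy. As a minimal illustrative sketch only, a common realization of adaptive cross-modal fusion is a convex combination of the two modalities' features, with weights produced by a softmax over learned (here hand-supplied) modality scores; all names and the formula below are assumptions, not the authors' implementation:

    ```python
    import math

    def adaptive_fusion(f_rgb, f_tir, score_rgb, score_tir):
        """Fuse RGB and TIR feature vectors with softmax-normalized weights.

        f_rgb, f_tir : equal-length lists of feature values, one per channel.
        score_rgb, score_tir : scalar modality-reliability scores (in a real
        network these would be predicted by a small learned branch).
        Returns the element-wise weighted sum w_rgb * f_rgb + w_tir * f_tir,
        where (w_rgb, w_tir) = softmax(score_rgb, score_tir).
        """
        e_rgb, e_tir = math.exp(score_rgb), math.exp(score_tir)
        w_rgb = e_rgb / (e_rgb + e_tir)
        w_tir = e_tir / (e_rgb + e_tir)
        return [w_rgb * r + w_tir * t for r, t in zip(f_rgb, f_tir)]

    # With equal scores the two modalities are averaged:
    fused = adaptive_fusion([1.0, 2.0], [3.0, 4.0], 0.0, 0.0)  # → [2.0, 3.0]
    ```

    Because the weights sum to one, the fused feature stays in the same scale as the inputs, and a higher reliability score for one modality smoothly shifts the output toward that modality.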


    Jing JIN, Jianqin LIU, Fengwen ZHAI. RGB-T tracking network based on multi-modal feature fusion[J]. Optics and Precision Engineering, 2025, 33(12): 1940

    Paper Information

    Received: Nov. 25, 2024

    Accepted: --

    Published Online: Aug. 15, 2025

    The Author Email: Jianqin LIU (1970477938@qq.com)

    DOI:10.37188/OPE.20253312.1940
