Optics and Precision Engineering, Volume 33, Issue 12, 1940 (2025)

RGB-T tracking network based on multi-modal feature fusion

Jing JIN, Jianqin LIU*, and Fengwen ZHAI
Author Affiliations
  • School of Electronic and Information Engineering, Lanzhou Jiaotong University, Lanzhou 730070, China

    In recent years, RGB-T tracking methods have been widely applied in visual tracking tasks owing to the complementarity of visible (RGB) and thermal infrared (TIR) images. However, existing RGB-T moving-target tracking methods do not yet fully exploit the complementary information between the two modalities, which limits tracker performance. In particular, existing Transformer-based RGB-T tracking algorithms still lack direct interaction between the two modalities, preventing full use of the original semantic information of the RGB and TIR streams. To address this problem, this paper proposes a Multi-Modal Feature Fusion Tracking Network for RGB-T tracking (MMFFTN). First, after preliminary features are extracted by the backbone network, a Channel Feature Fusion Module (CFFM) is introduced to realize direct interaction and fusion of RGB and TIR channel features. Second, to mitigate the unsatisfactory fusion caused by the discrepancy between the RGB and TIR modalities, a Cross-Modal Feature Fusion Module (CMFM) is designed, in which the global features of RGB and TIR are further fused through an adaptive fusion strategy to improve tracking accuracy. The proposed model is evaluated in detail on three datasets: GTOT, RGBT234, and LasHeR. Experimental results show that MMFFTN improves the success rate and precision rate by 3.0% and 4.7%, respectively, over the state-of-the-art Transformer-based tracker ViPT, and by 2.4% and 3.3%, respectively, over the Transformer-based tracker SDSTrack.
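    The abstract does not specify the exact form of the CMFM's adaptive fusion strategy. As a minimal illustrative sketch only, a common realization of adaptive cross-modal fusion is a convex combination of the two modalities' features, with weights produced by a softmax over learned (here hand-supplied) modality scores; all names and the formula below are assumptions, not the authors' implementation:

    ```python
    import math

    def adaptive_fusion(f_rgb, f_tir, score_rgb, score_tir):
        """Fuse RGB and TIR feature vectors with softmax-normalized weights.

        f_rgb, f_tir : equal-length lists of feature values, one per channel.
        score_rgb, score_tir : scalar modality-reliability scores (in a real
        network these would be predicted by a small learned branch).
        Returns the element-wise weighted sum w_rgb * f_rgb + w_tir * f_tir,
        where (w_rgb, w_tir) = softmax(score_rgb, score_tir).
        """
        e_rgb, e_tir = math.exp(score_rgb), math.exp(score_tir)
        w_rgb = e_rgb / (e_rgb + e_tir)
        w_tir = e_tir / (e_rgb + e_tir)
        return [w_rgb * r + w_tir * t for r, t in zip(f_rgb, f_tir)]

    # With equal scores the two modalities are averaged:
    fused = adaptive_fusion([1.0, 2.0], [3.0, 4.0], 0.0, 0.0)  # → [2.0, 3.0]
    ```

    Because the weights sum to one, the fused feature stays in the same scale as the inputs, and a higher reliability score for one modality smoothly shifts the output toward that modality.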


    Jing JIN, Jianqin LIU, Fengwen ZHAI. RGB-T tracking network based on multi-modal feature fusion[J]. Optics and Precision Engineering, 2025, 33(12): 1940

    Paper Information

    Received: Nov. 25, 2024

    Accepted: --

    Published Online: Aug. 15, 2025

    The Author Email: Jianqin LIU (1970477938@qq.com)

    DOI:10.37188/OPE.20253312.1940
