Cross-modal Feature Decoupling and Focalizing Network for Robust UAV-based Road Traffic Scenes Semantic Segmentation

Qingwang WANG; Junlin OUYANG; Pengcheng JIN; Tao SHEN

doi:10.11873/j.issn.1004-0323.2025.4.0864

Remote Sensing Technology and Application, Vol. 40, Issue 4, 864 (2025)

Author Affiliations

Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650500, China

Abstract

This study proposes a novel method for jointly exploiting visible and thermal infrared images captured from UAV perspectives. The method is a multimodal semantic segmentation model, termed CDFNet, built on cross-modal feature decoupling and attention refocusing. A cross-modal feature decoupling module explicitly disentangles and enhances the complementary discriminative features of the different modalities, thereby improving the representational capacity of the fused features in complex urban scenes. In addition, a focalizing attention decoder dynamically refines the attention scope towards small-scale objects during decoding, mitigating interference from noisy backgrounds. Extensive experiments on the Kust4K dataset show that CDFNet improves mIoU by 6.3% and 3.1% over the baseline and the current state-of-the-art multimodal method Sigma, respectively. Feature visualization and modality robustness evaluations further confirm that CDFNet yields more robust feature representations under low signal-to-noise conditions and significantly enhances segmentation accuracy for small targets in challenging urban road scenes from UAV perspectives.
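The results above are reported in mIoU (mean intersection over union). As a reminder of how that metric is usually defined, here is a minimal pure-Python sketch. It is not the authors' evaluation code: the function names `confusion_matrix` and `mean_iou` and the toy labels are hypothetical, and skipping classes that appear in neither the prediction nor the ground truth is one common convention, assumed here.

```python
def confusion_matrix(pred, target, num_classes):
    """Count (true class, predicted class) pairs over flat label lists."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for p, t in zip(pred, target):
        matrix[t][p] += 1  # rows: ground truth, columns: prediction
    return matrix


def mean_iou(matrix):
    """Average the per-class IoU = TP / (TP + FP + FN)."""
    n = len(matrix)
    ious = []
    for c in range(n):
        tp = matrix[c][c]
        fn = sum(matrix[c]) - tp                        # true c, predicted otherwise
        fp = sum(matrix[r][c] for r in range(n)) - tp   # predicted c, true otherwise
        denom = tp + fp + fn
        if denom > 0:  # assumption: ignore classes absent from both pred and target
            ious.append(tp / denom)
    return sum(ious) / len(ious) if ious else 0.0


# Toy labels for six pixels and three classes (invented for illustration).
target = [0, 0, 1, 1, 2, 2]
pred = [0, 1, 1, 1, 2, 0]
cm = confusion_matrix(pred, target, num_classes=3)
print(mean_iou(cm))
```

In this toy case the per-class IoUs are 1/3, 2/3 and 1/2, so the mean is about 0.5. Because the metric averages over classes rather than pixels, a class covering few pixels weighs as much as a large one, which is why it is sensitive to small-target accuracy.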

Keywords

Feature fusion; Multi-modal; Semantic segmentation; Unmanned Aerial Vehicle (UAV); Urban road traffic

Citation

Qingwang WANG, Junlin OUYANG, Pengcheng JIN, Tao SHEN. Cross-modal Feature Decoupling and Focalizing Network for Robust UAV-based Road Traffic Scenes Semantic Segmentation[J]. Remote Sensing Technology and Application, 2025, 40(4): 864

Paper Information

Received: May 11, 2025

Accepted: --

Published Online: Aug. 26, 2025

The Author Email: Tao SHEN (shentao@kust.edu.cn)
