Remote Sensing Technology and Application, Volume 40, Issue 4, 864 (2025)
Cross-modal Feature Decoupling and Focalizing Network for Robust UAV-based Road Traffic Scenes Semantic Segmentation
This study proposes a novel method for the joint use of visible and thermal infrared images captured from UAV perspectives. The method is a multimodal semantic segmentation model, termed CDFNet, built on cross-modal feature decoupling and attention refocusing. A cross-modal feature decoupling module explicitly disentangles and enhances the complementary discriminative features of the different modalities, improving the representational capacity of the fused features in complex urban scenes. In addition, a focalizing attention decoder dynamically refines the attention scope toward small-scale objects during decoding, effectively mitigating interference from noisy backgrounds. Extensive experiments on the Kust4K dataset show that CDFNet improves mIoU by 6.3% over the baseline and by 3.1% over Sigma, the current state-of-the-art multimodal method. Feature visualization and modality robustness evaluations further confirm that CDFNet yields more robust feature representations under low signal-to-noise conditions and significantly improves segmentation accuracy for small targets in challenging urban road scenes viewed from UAVs.
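The gains above are reported in mIoU (mean intersection over union). The abstract does not describe the evaluation protocol (for example, the Kust4K class list or any ignored label), so the following is only a minimal sketch of the standard metric, assuming flattened per-pixel class indices; the function names `confusion_matrix` and `mean_iou` are illustrative and are not taken from the paper.

```python
from typing import List, Sequence


def confusion_matrix(pred: Sequence[int], target: Sequence[int], num_classes: int) -> List[List[int]]:
    """Count (target, pred) pixel pairs: rows are ground-truth classes, columns are predictions."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for p, t in zip(pred, target):
        cm[t][p] += 1
    return cm


def mean_iou(cm: List[List[int]]) -> float:
    """Average per-class IoU = TP / (TP + FP + FN).

    Assumption: classes absent from both prediction and ground truth
    (denominator 0) are skipped rather than counted as IoU 0 or 1.
    """
    n = len(cm)
    ious = []
    for c in range(n):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp                        # pixels of class c predicted as another class
        fp = sum(cm[r][c] for r in range(n)) - tp   # pixels of other classes predicted as c
        denom = tp + fp + fn
        if denom > 0:
            ious.append(tp / denom)
    return sum(ious) / len(ious) if ious else 0.0


# Toy example with 6 "pixels" and 3 classes (made-up data, not from Kust4K).
cm = confusion_matrix(pred=[0, 0, 1, 1, 2, 2], target=[0, 1, 1, 1, 2, 0], num_classes=3)
print(mean_iou(cm))  # per-class IoU 1/3, 2/3, 1/2 -> mean 0.5
```

Because every class is weighted equally regardless of pixel count, mIoU is sensitive to performance on rare, small classes, which is consistent with the abstract's emphasis on small targets.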
WANG Qingwang, OUYANG Junlin, JIN Pengcheng, SHEN Tao. Cross-modal Feature Decoupling and Focalizing Network for Robust UAV-based Road Traffic Scenes Semantic Segmentation[J]. Remote Sensing Technology and Application, 2025, 40(4): 864
Received: May 11, 2025
Accepted: Aug. 26, 2025
Published Online: Aug. 26, 2025
Author email: SHEN Tao (shentao@kust.edu.cn)