Cross-modal Feature Decoupling and Focalizing Network for Robust UAV-based Road Traffic Scenes Semantic Segmentation

[1] JIN Mengting, XU Quan, GUO Peng, et al. Crop classification method from UAV images based on object-oriented multi-feature learning[J]. Remote Sensing Technology and Application, 2023, 38(3): 588-598.

[2] TANG Yu, ZOU Zhigang, ZHOU Xinhui, et al. Identification of photovoltaic string and assessment of carbon emission reduction effects based on Unmanned Aerial Vehicle (UAV) imagery[J]. Remote Sensing Technology and Application, 2024, 39(6): 1543-1554.

[3] DONG Xiuchun, LIU Zhongyou, JIANG Yi, et al. Winter wheat extraction of WorldView-2 image based on semantic segmentation method[J]. Remote Sensing Technology and Application, 2022, 37(3): 564-570.

[4] HU Tengyun, XIE Pengfei, WEN Yanan, et al. Research on building footprints extraction methods based on different deep learning models[J]. Remote Sensing Technology and Application, 2023, 38(4): 892-902.

[5] WANG Yun, LIAO Mengguang, CHU Nan, et al. Semantic segmentation model-based mangrove identification method and time-series variation analysis in Wenzhou city[J]. Remote Sensing Technology and Application, 2025, 40(3): 545-556.

[6] YANG X H, LI H Q, ZHU W, et al. RSHRNet: Improved HRNet-based semantic segmentation for UAV rice seedling images in mechanical transplanting quality assessment[J]. Computers and Electronics in Agriculture, 2025, 234: 110273. DOI: 10.1016/j.compag.2025.110273

[7] HA Q S, WATANABE K, KARASAWA T, et al. MFNet: Towards real-time semantic segmentation for autonomous vehicles with multi-spectral scenes[C]//Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, 2017: 5108-5115. DOI: 10.1109/IROS.2017.8206396

[8] SUN Y X, ZUO W X, LIU M. RTFNet: RGB-thermal fusion network for semantic segmentation of urban scenes[J]. IEEE Robotics and Automation Letters, 2019, 4(3): 2576-2583. DOI: 10.1109/LRA.2019.2904733

[9] SUN Y X, ZUO W X, YUN P, et al. FuseSeg: Semantic segmentation of urban scenes based on RGB and thermal data fusion[J]. IEEE Transactions on Automation Science and Engineering, 2021, 18(3): 1000-1011. DOI: 10.1109/TASE.2020.2993143

[10] ZHOU W J, LIN X Y, LEI J S, et al. MFFENet: Multiscale feature fusion and enhancement network for RGB-thermal urban road scene parsing[J]. IEEE Transactions on Multimedia, 2021, 24: 2526-2538. DOI: 10.1109/TMM.2021.3086618

[11] ZHOU W J, LIU J F, LEI J S, et al. GMNet: Graded-feature multilabel-learning network for RGB-thermal urban scene semantic segmentation[J]. IEEE Transactions on Image Processing, 2021, 30: 7790-7802. DOI: 10.1109/TIP.2021.3109518

[12] HOU Y L, JIA Y, HOU Z J, et al. IAFFNet: Illumination-aware feature fusion network for all-day RGB-thermal semantic segmentation of road scenes[J]. IEEE Access, 2022, 10: 129702-129711.

[13] CHEN Y, ZHAN W D, JIANG Y C, et al. LASNet: A light-weight asymmetric spatial feature network for real-time semantic segmentation[J]. Electronics, 2022, 11(19): 3238. DOI: 10.3390/electronics11193238

[14] WANG Q W, YIN C, SONG H H, et al. UTFNet: Uncertainty-guided trustworthy fusion network for RGB-thermal semantic segmentation[J]. IEEE Geoscience and Remote Sensing Letters, 2023, 20: 1-5. DOI: 10.1109/LGRS.2023.3322452

[15] ZHANG Q, ZHAO S L, LUO Y J, et al. ABMDRNet: Adaptive-weighted bi-directional modality difference reduction network for RGB-T semantic segmentation[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, 2021: 2633-2642. DOI: 10.1109/CVPR46437.2021.00266

[16] ZHAO S L, LIU Y C, JIAO Q, et al. Mitigating modality discrepancies for RGB-T semantic segmentation[J]. IEEE Transactions on Neural Networks and Learning Systems, 2024, 35(7): 9380-9394. DOI: 10.1109/TNNLS.2022.3233089

[17] ZHOU H, TIAN C H, ZHANG Z X, et al. Multispectral fusion transformer network for RGB-thermal urban scene semantic segmentation[J]. IEEE Geoscience and Remote Sensing Letters, 2022, 19: 1-5. DOI: 10.1109/LGRS.2022.3179721

[18] ZHANG J M, LIU H Y, YANG K L, et al. CMX: Cross-modal fusion for RGB-X semantic segmentation with transformers[J]. IEEE Transactions on Intelligent Transportation Systems, 2023, 24(12): 14679-14694. DOI: 10.1109/TITS.2023.3300537

[19] WAN Z F, ZHANG P P, WANG Y H, et al. Sigma: Siamese Mamba network for multi-modal semantic segmentation[C]//Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV). IEEE, 2025. DOI: 10.1109/WACV61041.2025.00176

[20] GUO X D, LIN Z A, HU L W, et al. Cross-modal state space modeling for real-time RGB-thermal wild scene semantic segmentation[J]. arXiv preprint, 2025. DOI: 10.48550/arXiv.2506.17869

[21] OUYANG Junlin, WANG Qingwang, SHEN Tao. Kust4K: An RGB-TIR dataset from UAV platform for robust urban traffic scenes semantic segmentation[DB/OL]. Figshare, 2025. DOI: 10.6084/m9.figshare.29476610.v3

[22] CARION N, MASSA F, SYNNAEVE G, et al. End-to-end object detection with transformers[C]//Proceedings of the European Conference on Computer Vision (ECCV). 2020. arXiv: 2005.12872

[23] CHENG B W, SCHWING A, KIRILLOV A. Per-pixel classification is not all you need for semantic segmentation[J]. Advances in Neural Information Processing Systems, 2021, 34: 17864-17875. DOI: 10.5555/3540261.3541628

[24] LI F, ZHANG H, XU H Z, et al. Mask DINO: Towards a unified transformer-based framework for object detection and segmentation[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, 2023: 3041-3050. DOI: 10.1109/CVPR52729.2023.00297

[25] LIANG M J, HU J J, BAO C Y, et al. Explicit attention-enhanced fusion for RGB-thermal perception tasks[J]. IEEE Robotics and Automation Letters, 2023, 8(7): 4060-4067. DOI: 10.1109/LRA.2023.3272269

[26] DENG F Q, FENG H, LIANG M J, et al. FEANet: Feature-enhanced attention network for RGB-thermal real-time semantic segmentation[C]//Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, 2021: 4467-4473. DOI: 10.1109/IROS51168.2021.9636084

[27] RONNEBERGER O, FISCHER P, BROX T. U-Net: Convolutional networks for biomedical image segmentation[M]//Medical Image Computing and Computer-Assisted Intervention - MICCAI 2015. Cham: Springer International Publishing, 2015: 234-241. DOI: 10.1007/978-3-319-24574-4_28

[28] XIAO T T, LIU Y C, ZHOU B L, et al. Unified perceptual parsing for scene understanding[C]//Proceedings of the European Conference on Computer Vision (ECCV). 2018. arXiv: 1807.10221

[29] ZHANG J M, LIU R P, SHI H, et al. Delivering arbitrary-modal semantic segmentation[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, 2023: 1136-1147. DOI: 10.1109/CVPR52729.2023.00116


Qingwang WANG, Junlin OUYANG, Pengcheng JIN, Tao SHEN. Cross-modal Feature Decoupling and Focalizing Network for Robust UAV-based Road Traffic Scenes Semantic Segmentation[J]. Remote Sensing Technology and Application, 2025, 40(4): 864


Paper Information

Received: May 11, 2025

Published Online: Aug. 26, 2025

Author email: Tao SHEN (shentao@kust.edu.cn)

DOI: 10.11873/j.issn.1004-0323.2025.4.0864
