Remote Sensing Technology and Application, Vol. 40, Issue 4, 864 (2025)

Cross-modal Feature Decoupling and Focalizing Network for Robust UAV-based Road Traffic Scenes Semantic Segmentation

Qingwang WANG, Junlin OUYANG, Pengcheng JIN, and Tao SHEN*
Author Affiliations
  • Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650500, China

    Qingwang WANG, Junlin OUYANG, Pengcheng JIN, Tao SHEN. Cross-modal Feature Decoupling and Focalizing Network for Robust UAV-based Road Traffic Scenes Semantic Segmentation[J]. Remote Sensing Technology and Application, 2025, 40(4): 864

    Paper Information

    Received: May 11, 2025

    Accepted: --

    Published Online: Aug. 26, 2025

    The Author Email: Tao SHEN (shentao@kust.edu.cn)

    DOI:10.11873/j.issn.1004-0323.2025.4.0864
