Acta Photonica Sinica, Volume 53, Issue 3, 0310001 (2024)

Object Detection Algorithm Based on CNN-Transformer Dual Modal Feature Fusion

Chen YANG1...2, Zhiqiang HOU1,2,*, Xinyue LI1,2, Sugang MA1,2, and Xiaobao YANG1,2
Author Affiliations
  • 1School of Computer Science and Technology, Xi'an University of Posts & Telecommunications, Xi'an 710121, China
  • 2Shaanxi Provincial Key Laboratory of Network Data Analysis and Intelligent Processing, Xi'an 710121, China

    Chen YANG, Zhiqiang HOU, Xinyue LI, Sugang MA, Xiaobao YANG. Object Detection Algorithm Based on CNN-Transformer Dual Modal Feature Fusion[J]. Acta Photonica Sinica, 2024, 53(3): 0310001

    Paper Information

    Received: Aug. 21, 2023

    Accepted: Sep. 26, 2023

    Published Online: May 16, 2024

    Corresponding Author Email: HOU Zhiqiang (hou-zhq@sohu.com)

    DOI: 10.3788/gzxb20245303.0310001
