Acta Photonica Sinica, Volume. 54, Issue 6, 0610001(2025)

Lightweight Pedestrian Vehicle Detection Algorithm Based on Visible and Infrared Bimodal Fusion

Cuixia GUO, Yongtao XU, Zhanghuang ZOU, Zhijie PAN, and Feng HUANG*
Author Affiliations
  • School of Mechanical Engineering and Automation, Fuzhou University, Fuzhou 350000, China

    With the continuous development of computer vision research, deep learning has become increasingly widespread, and its introduction into object detection has greatly improved detection performance. However, to push detection accuracy further, the depth and width of networks are continually increased, yielding models with large parameter counts and complex structures that are difficult to deploy in practice. To address the high computational cost, high memory consumption, and difficulty of efficient edge deployment that afflict visible-infrared dual-modal fusion detection models, this paper proposes a lightweight pedestrian-vehicle detection model based on visible and infrared modality fusion. The multimodal input greatly improves the stability of the algorithm in all-weather operation, so that detection remains reliable under snowy, foggy, and low-illumination conditions.

    The proposed model replaces the YOLOv7-tiny backbone with the lightweight MobileNetV2 network. MobileNetV2 employs an inverted residual structure and a linear bottleneck layer, both of which strengthen its feature representation and learning capability. It also uses depthwise separable convolution, which, unlike conventional convolution, factorizes the operation into a depthwise convolution followed by a pointwise (1×1) convolution. This paper further proposes a differential modal fusion module inspired by the principle of differential amplification circuits. The module computes the difference between the two modal feature maps, separates the difference-mode and common-mode information, and amplifies the difference-mode component to fully exploit the complementary advantages of the visible and infrared modalities. An illumination-aware module is also introduced, because infrared and visible images affect model performance differently under low illumination and bad weather. This module dynamically assigns weights to visible and infrared features according to the illumination conditions, making the most of each modality's feature information.

    Three public datasets are used for the experiments: FLIR ADAS, LLVIP, and KAIST. We compare the proposed model against lightweight single-modal algorithms such as YOLOv5s and YOLOv7-tiny, and against dual-modal algorithms such as ICAFusion and CFT. The results show that bimodal detection models outperform unimodal ones. On the FLIR ADAS dataset, the proposed model improves detection accuracy by 11.6% and 15.3% over the unimodal YOLOv5s and YOLOv7-tiny models with RGB input, respectively, and by 3.3% and 4.9% over the same models with infrared input. Compared with the baseline model, accuracy improves by 3.8%; compared with the bimodal models ICAFusion and CFT, it improves by 3.8% and 1.9%, respectively. On the LLVIP dataset, the proposed model improves detection accuracy by 6.9% and 1.1% over YOLOv7-tiny with visible and infrared unimodal inputs, respectively, and by 5.4% and 2.0% over YOLOv5s with visible and infrared inputs. Compared with the baseline model, accuracy improves by 1.3%; compared with the bimodal models ICAFusion and SLBAF, it improves by 9.6% and 1.5%, respectively. On the KAIST dataset, the proposed model improves detection accuracy by 26.9% and 7.8% over YOLOv7-tiny on visible and infrared inputs, respectively, and by 3.4% over the baseline model. Among the bimodal models compared, the proposed model achieves the highest detection accuracy, reaching 76.2%. In terms of inference speed, it achieves 208 FPS, 103 FPS, and 113 FPS on the FLIR ADAS, LLVIP, and KAIST datasets, respectively. These results demonstrate that the proposed model has significant advantages in detection accuracy, speed, and robustness.
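    To illustrate the depthwise separable convolution described in the abstract, here is a minimal PyTorch sketch. The channel counts, stride handling, and ReLU6 activation follow the general MobileNetV2 convention; the paper's exact layer configuration is not given in the abstract, so treat this as a generic example rather than the authors' implementation.

```python
# Minimal sketch of a depthwise separable convolution block,
# MobileNetV2-style. Channel counts are illustrative.
import torch
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    def __init__(self, in_ch, out_ch, stride=1):
        super().__init__()
        # Depthwise: one 3x3 filter per input channel (groups=in_ch),
        # so spatial filtering happens within each channel independently.
        self.depthwise = nn.Conv2d(in_ch, in_ch, kernel_size=3,
                                   stride=stride, padding=1,
                                   groups=in_ch, bias=False)
        # Pointwise: 1x1 convolution mixes information across channels.
        self.pointwise = nn.Conv2d(in_ch, out_ch, kernel_size=1, bias=False)
        self.bn = nn.BatchNorm2d(out_ch)
        self.act = nn.ReLU6(inplace=True)  # ReLU6, as used in MobileNetV2

    def forward(self, x):
        return self.act(self.bn(self.pointwise(self.depthwise(x))))

x = torch.randn(1, 32, 64, 64)
y = DepthwiseSeparableConv(32, 64)(x)
print(y.shape)  # torch.Size([1, 64, 64, 64])
```

    Compared with a standard 3×3 convolution, this factorization reduces the multiply-accumulate count roughly by a factor of the output channel count, which is the main source of MobileNetV2's efficiency.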
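    The differential modal fusion module is described in the abstract only through its differential-amplifier analogy, so the following is a hedged sketch of one plausible realization: split paired visible/infrared features into common-mode and difference-mode components, amplify the difference, and recombine. The learnable per-channel gain and the 1×1 recombination convolution are assumptions of this sketch, not the paper's published design.

```python
# Hedged sketch of a differential-style fusion block: common-mode
# carries the content shared by both modalities, difference-mode
# carries the complementary cues, which are amplified before fusion.
import torch
import torch.nn as nn

class DifferentialFusion(nn.Module):
    def __init__(self, channels):
        super().__init__()
        # Hypothetical learnable "gain" on the difference-mode signal;
        # a fixed or attention-derived gain would also fit the analogy.
        self.gain = nn.Parameter(torch.ones(1, channels, 1, 1))
        self.fuse = nn.Conv2d(2 * channels, channels, kernel_size=1)

    def forward(self, f_vis, f_ir):
        common = 0.5 * (f_vis + f_ir)   # common-mode information
        diff = f_vis - f_ir             # difference-mode information
        amplified = self.gain * diff    # amplify complementary cues
        return self.fuse(torch.cat([common, amplified], dim=1))

f_vis = torch.randn(1, 64, 40, 40)
f_ir = torch.randn(1, 64, 40, 40)
print(DifferentialFusion(64)(f_vis, f_ir).shape)  # torch.Size([1, 64, 40, 40])
```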
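    Likewise, here is a minimal sketch of the illumination-aware weighting idea, assuming a small classifier that predicts per-modality weights from the visible image and uses them to gate the two feature streams. The layer sizes and the softmax gating are illustrative assumptions; the abstract states only that weights are assigned dynamically based on illumination.

```python
# Hedged sketch of illumination-aware modality weighting: a tiny
# network estimates illumination from the RGB frame and outputs a
# pair of weights that re-balance visible vs. infrared features.
import torch
import torch.nn as nn
import torch.nn.functional as F

class IlluminationAware(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, 2),  # two logits: visible weight, infrared weight
        )

    def forward(self, rgb, f_vis, f_ir):
        w = F.softmax(self.net(rgb), dim=1)   # (B, 2), weights sum to 1
        w_vis = w[:, 0].view(-1, 1, 1, 1)
        w_ir = w[:, 1].view(-1, 1, 1, 1)
        # Under good light the visible weight should dominate;
        # at night the infrared weight should take over.
        return w_vis * f_vis + w_ir * f_ir

rgb = torch.randn(1, 3, 320, 320)
f_vis = torch.randn(1, 64, 40, 40)
f_ir = torch.randn(1, 64, 40, 40)
print(IlluminationAware()(rgb, f_vis, f_ir).shape)  # torch.Size([1, 64, 40, 40])
```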



    Cuixia GUO, Yongtao XU, Zhanghuang ZOU, Zhijie PAN, Feng HUANG. Lightweight Pedestrian Vehicle Detection Algorithm Based on Visible and Infrared Bimodal Fusion[J]. Acta Photonica Sinica, 2025, 54(6): 0610001

    Paper Information

    Received: Nov. 26, 2024

    Accepted: Jan. 20, 2025

    Published Online: Jul. 14, 2025

    Corresponding Author Email: Feng HUANG (huangf@fzu.edu.cn)

    DOI: 10.3788/gzxb20255406.0610001
