Objective Infrared object detection is of significant value in UAV applications, as it can enhance target recognition under low light, complex backgrounds, and extreme weather conditions. However, owing to challenges such as blurred target features, large scale differences among multiple targets, and dynamic viewing-angle changes in UAV infrared images, existing models struggle to balance high accuracy and real-time performance on resource-constrained UAV hardware. Therefore, this paper proposes a YOLOv8-based optimized model for UAV infrared object detection, aiming to improve detection performance under complex backgrounds and for dynamic targets while reducing computational resource usage, thereby better adapting to resource-constrained real-world environments.
Methods A lightweight UAV infrared object detection model, PSI-YOLO, is proposed based on multi-scale feature fusion and channel compression. First, to address the limited computational resources of UAVs and the loss of texture detail in infrared images, a multi-scale feature extraction backbone, PHGNet (Fig.2), is introduced. It integrates the HGNetV2 network with channel scaling (Fig.3) and a partial perceptual spatial attention mechanism (Fig.4), achieving a lightweight design while improving feature extraction accuracy. Second, to handle the target distortion caused by complex backgrounds and excessive angular changes in infrared images, a Slim-neck is designed that improves information flow through grouped convolutions and channel rearrangement (Fig.5), combined with cross-stage and partial residual connections (Fig.6) for feature fusion. Finally, the Inner-EIoU loss function (Fig.7) is introduced to accelerate model convergence and improve target localization accuracy, thereby strengthening object detection performance.
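The channel-rearrangement step that lets grouped convolutions in the Slim-neck exchange information across groups can be illustrated with a minimal channel-shuffle sketch. This is a pure-Python illustration of the rearrangement idea only, not the paper's implementation; the function name and the flat-list representation of channels are ours:

```python
def channel_shuffle(channels, groups):
    """Interleave channels across groups so that a subsequent grouped
    convolution mixes information from every group.
    `channels` is a flat list standing in for per-channel feature maps."""
    n = len(channels)
    assert n % groups == 0, "channel count must divide evenly into groups"
    per_group = n // groups
    # Conceptually: reshape to (groups, per_group), transpose, flatten.
    return [channels[g * per_group + i]
            for i in range(per_group)
            for g in range(groups)]

# Six channels in two groups: [0,1,2 | 3,4,5] -> interleaved [0, 3, 1, 4, 2, 5]
print(channel_shuffle(list(range(6)), 2))
```

Without this rearrangement, each group of a grouped convolution would only ever see its own subset of channels; the shuffle restores cross-group information flow at negligible cost.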
Results and Discussions Experiments are conducted on the HIT-UAV dataset (Fig.8), which is mainly used for person and vehicle detection in thermal infrared images captured by high-altitude UAVs. The contribution of each module is verified through ablation experiments (Tab.2) and comparison experiments with different lightweight backbone networks (Tab.3); the results show that PHGNet achieves a better balance between lightweight design and detection accuracy. Next, different loss functions are evaluated (Tab.4), and the results show that Inner-EIoU converges faster and with less fluctuation (Fig.10). In addition, a comparison with other detection algorithms (Tab.5) shows that PSI-YOLO outperforms the baseline model in detection performance (Fig.11) while reducing the number of parameters, model size, and FLOPs by 35.5%, 25.4%, and 28.0%, respectively. Finally, heat maps (Fig.12) and detection results (Fig.13) further verify the effectiveness of the improved model in reducing missed and false detections.
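The Inner-EIoU loss evaluated above combines EIoU's separate penalties for center distance, width gap, and height gap with an IoU term computed on ratio-scaled auxiliary ("inner") boxes that share the original centers. The following is a minimal single-box sketch under our own assumptions (center-format `(cx, cy, w, h)` boxes, a default `ratio` of 0.75); the paper's batched implementation and hyperparameters may differ:

```python
def _corners(cx, cy, w, h):
    """Center-format box to (x1, y1, x2, y2)."""
    return cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2

def _iou(b1, b2):
    """Plain IoU of two (cx, cy, w, h) boxes."""
    ax1, ay1, ax2, ay2 = _corners(*b1)
    bx1, by1, bx2, by2 = _corners(*b2)
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = b1[2] * b1[3] + b2[2] * b2[3] - inter
    return inter / union if union > 0 else 0.0

def inner_eiou_loss(pred, target, ratio=0.75):
    """EIoU penalties on the full boxes, with the IoU term replaced by
    the IoU of ratio-scaled auxiliary boxes sharing the same centers."""
    # Inner-IoU term: shrink both boxes about their centers by `ratio`.
    inner = _iou((pred[0], pred[1], pred[2] * ratio, pred[3] * ratio),
                 (target[0], target[1], target[2] * ratio, target[3] * ratio))
    px1, py1, px2, py2 = _corners(*pred)
    tx1, ty1, tx2, ty2 = _corners(*target)
    # Smallest box enclosing both full boxes.
    cw = max(px2, tx2) - min(px1, tx1)
    ch = max(py2, ty2) - min(py1, ty1)
    diag2 = cw * cw + ch * ch
    # Squared center distance.
    rho2 = (pred[0] - target[0]) ** 2 + (pred[1] - target[1]) ** 2
    loss = 1.0 - inner
    if diag2 > 0:
        loss += rho2 / diag2          # center-distance penalty
    if cw > 0:
        loss += (pred[2] - target[2]) ** 2 / (cw * cw)  # width penalty
    if ch > 0:
        loss += (pred[3] - target[3]) ** 2 / (ch * ch)  # height penalty
    return loss
```

Shrinking the auxiliary boxes sharpens the gradient for high-IoU samples, which is consistent with the faster, smoother convergence reported above.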
Conclusions A lightweight object detection model, PSI-YOLO, is developed to address the significant feature loss, low recognition accuracy, and high computational cost caused by the lack of texture detail and by target deformation in UAV infrared images. The model incorporates a lightweight backbone network, PHGNet, to alleviate the feature loss resulting from missing texture detail. To resolve target deformation and stretching in infrared images, the Slim-neck module leverages grouped convolutions and cross-stage connections for efficient feature fusion. The loss function is refined to Inner-EIoU to speed convergence and improve localization accuracy. Experimental results validate the effectiveness and superiority of the algorithm for object detection in UAV infrared scenes.