Objective: UAVs equipped with infrared cameras can efficiently acquire and continuously track ground targets without being detected. To address the contour blurring caused by long imaging distances in UAV infrared imagery and the loss of segmentation accuracy caused by changes in target scale, this paper proposes a UAV infrared target instance segmentation model with dynamic feature aggregation and multilevel synergy.
Methods: DFANet, a UAV infrared target instance segmentation model with dynamic feature aggregation and multilevel perception, is proposed on the basis of the YOLOv8n algorithm (Fig.1). First, the standard convolution in the network backbone is replaced with a regional feature adaptive convolution module (Fig.2), which improves feature extraction. Second, the original up-sampling module is replaced with a redesigned feature-aware reorganized up-sampling module (Fig.3) to better extract target edge features from infrared images. Finally, a multi-scale context-aggregated feature extraction module is embedded in the backbone network to reduce the effect of target scale variation (Fig.4).
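The idea behind content-driven up-sampling can be illustrated with a minimal 1D sketch. It is not the paper's module: the function names are hypothetical, the per-position kernel logits are passed in by hand (in a learned module they would be predicted from the feature map itself), and the local affine transformation mentioned in the conclusions is omitted.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def content_aware_upsample_1d(feature, kernel_logits, scale=2, kernel_size=3):
    """Upsample a 1D feature by `scale`.

    Each output sample is a softmax-weighted sum of the source neighbourhood
    around its parent position. `kernel_logits[j]` holds the (assumed, here
    hand-supplied) logits for output position j.
    """
    half = kernel_size // 2
    out = []
    for j in range(len(feature) * scale):
        centre = j // scale  # source position this output sample maps to
        weights = softmax(kernel_logits[j])
        value = 0.0
        for k, w in enumerate(weights):
            # Clamp neighbour indices at the borders.
            idx = min(max(centre + k - half, 0), len(feature) - 1)
            value += w * feature[idx]
        out.append(value)
    return out
```

When every kernel puts all of its weight on the centre tap, the sketch reduces to nearest-neighbour up-sampling; kernels that vary with content are what would let a learned version keep target edges sharper than a fixed interpolation does.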
Results and Discussions: To evaluate the segmentation performance of the proposed network, mAP50, mAP50-95, model size, GFLOPs, and inference time are used for a comprehensive comparison. The ablation study in Table 2 shows that the average segmentation accuracy of the improved infrared target segmentation model increases from 67.6% to 78.4%, while the inference time increases from 5.5 ms to 10.8 ms. The comparison experiments in Tables 3 and 4, which compare each proposed module with other modules of the same type, show that the proposed modules give better results for infrared target instance segmentation. Finally, the comparison of different networks in Table 5 shows that the improved model outperforms the current state-of-the-art networks YOLOv11n and YOLOv12n in segmentation accuracy.
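For readers unfamiliar with the two accuracy metrics, the sketch below shows how they relate under the common COCO-style convention, which is assumed here because the abstract does not describe its evaluation code. mAP50 counts a predicted mask as correct when its IoU with the ground truth is at least 0.50, while mAP50-95 averages over ten IoU thresholds from 0.50 to 0.95. The helper names are hypothetical.

```python
def mask_iou(pred, truth):
    """Intersection over union of two equally sized binary masks (lists of rows)."""
    inter = 0
    union = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p and t:
                inter += 1
            if p or t:
                union += 1
    return inter / union if union else 0.0


def iou_thresholds():
    """The ten thresholds 0.50, 0.55, ..., 0.95 averaged over by mAP50-95."""
    return [round(0.50 + 0.05 * i, 2) for i in range(10)]


def matched_thresholds(pred, truth):
    """Thresholds at which this prediction would count as a true positive."""
    iou = mask_iou(pred, truth)
    return [t for t in iou_thresholds() if iou >= t]
```

A mask that overlaps its target only loosely passes the 0.50 threshold but fails the stricter ones, which is why mAP50-95 is typically well below mAP50 and is more sensitive to blurred contours.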
Conclusions: This study proposes DFANet, a UAV infrared target instance segmentation model with dynamic feature aggregation and multilevel synergy, targeting two challenges: missing target details in infrared imaging and low accuracy in multi-scale target recognition. The performance gains come from three core innovations: a regional feature adaptive convolution module, which uses a spatial attention-guided dynamic weight allocation strategy to focus features on the key regions of the target; a feature-aware reorganized up-sampling module, which reconstructs high-resolution features efficiently through content-driven dynamic kernel generation and local affine transformation; and a multi-scale context aggregation module, which fuses dilated convolution with a feature pyramid structure to capture cross-level contextual dependencies. Experiments on the aerial infrared vehicle dataset show that DFANet achieves an mAP50 of 78.4% and an mAP50-95 of 51.1%, improving segmentation accuracy by 9.7% and 5.6%, respectively, over the baseline model, and achieves the best overall results in comparison with other mainstream instance segmentation networks.
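Dilated convolution (sometimes translated as cavity or atrous convolution) helps with scale variation because it widens the receptive field without adding weights. The minimal 1D sketch below illustrates this: a kernel of size k with dilation d spans k + (k - 1)(d - 1) input samples. The function names and the dilation rates used in the test are illustrative assumptions, not the configuration used in DFANet.

```python
def effective_kernel_size(kernel_size, dilation):
    """Number of input samples spanned by a dilated kernel."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)


def dilated_conv1d(signal, weights, dilation):
    """'Valid' 1D dilated convolution (cross-correlation form, no padding)."""
    span = effective_kernel_size(len(weights), dilation)
    out = []
    for start in range(len(signal) - span + 1):
        # Taps are spaced `dilation` samples apart.
        out.append(
            sum(w * signal[start + i * dilation] for i, w in enumerate(weights))
        )
    return out
```

Running several such branches with different dilation rates in parallel and merging their outputs is one common way to gather context at multiple scales from the same feature map.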