Acta Optica Sinica, Volume 45, Issue 15, 1510003 (2025)

Difference Aware Guided Boundary Transformer Network for Childhood Pneumonia CT Image Segmentation

Jia Lü1,2,*, Mingkai Yu1, Xin Chen3, and Ling He3,**
Author Affiliations
  • 1College of Computer and Information Sciences, Chongqing Normal University, Chongqing 401331, China
  • 2National Center for Applied Mathematics in Chongqing, Chongqing Normal University, Chongqing 401331, China
  • 3Intelligent Application of Big Data in Pediatrics Engineering Research Center of Chongqing Education Commission of China, Ministry of Education Key Laboratory of Child Development and Disorders, National Clinical Research Center for Child Health and Disorders, Department of Radiology, Children’s Hospital of Chongqing Medical University, Chongqing 400014, China

    Objective

    Pneumonia is a respiratory disease with high incidence and mortality in childhood, and accurate segmentation of lung computed tomography (CT) images plays a crucial role in early diagnosis and treatment planning. Manual labeling of infected regions, however, is time-intensive and significantly increases radiologists’ workload, so efficient automatic segmentation methods are of substantial practical value in alleviating medical resource constraints. Current mainstream medical image segmentation approaches primarily use U-shaped architectures, known for their semantic modeling capabilities. However, the encoder-decoder structure inherently requires multiple down-sampling operations, which lose critical spatial structure information and compromise segmentation accuracy. Furthermore, infected regions in childhood pneumonia CT images typically present scattered, fragmented, multifocal distributions, which demands a stronger ability to capture long-distance dependencies and maintain overall structural coherence. Although Transformer-based segmentation networks have recently demonstrated strong global modeling performance, their limited local spatial priors and patch size constraints often lead to inadequate segmentation of local details. Additionally, normal anatomical structures such as the lung hilum are morphologically similar to infected regions, so the network must be robust to such interference. To address these challenges, this study proposes DBTU-Net, a U-Net-based difference aware guided boundary Transformer segmentation network.

    Methods

    DBTU-Net aims to enhance both local structural detail modeling and global semantic dependency modeling, enabling accurate segmentation of fragmented and scattered multifocal regions. The network architecture builds upon the classical U-shaped structure and incorporates three key components: gated channel Transformer (GCT), difference aware fusion (DAF), and boundary Transformer (BT), as shown in Fig. 1. During the feature extraction phase, multi-scale semantic information is progressively extracted through multilayer convolution and down-sampling operations. To enhance the network’s context modeling capability, the GCT module (Fig. 2) is integrated into each encoder layer. This module dynamically models channel dependencies to adaptively adjust the importance distribution across different semantic channels, thereby strengthening global information perception. The DAF module is implemented in the skip connection path, where it explicitly enhances spatial structure by computing difference information between the features of adjacent encoder layers. This mechanism mitigates the spatial detail loss caused by down-sampling and provides the decoder with comprehensive structural priors, improving the model’s capacity to recognize small lesion regions. At the network’s bottleneck layer, the BT module (Fig. 3) further enhances global modeling capability. Guided by the multi-scale difference maps from the encoder, this module uses a Transformer architecture to establish potential connections between distant lesion regions, which improves the recognition of distant lesions and refines boundary segmentation while maintaining global consistency. The decoder ultimately produces a high-quality segmentation map through up-sampling operations.
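
    The difference step described for the DAF module can be illustrated with a minimal sketch. The abstract only states that difference information is computed between adjacent encoder layer features; the single-channel maps, the nearest-neighbour up-sampling, the absolute difference, the residual addition, and all function names below are assumptions made for illustration and are not the published design.

```python
# Illustrative sketch only: the up-sampling rule, the absolute difference, and
# the residual addition are assumptions, not the published DAF module.


def upsample_nearest_2x(feature):
    """Repeat every element twice along both axes (nearest-neighbour up-sampling)."""
    upsampled = []
    for row in feature:
        wide_row = [value for value in row for _ in range(2)]
        upsampled.append(wide_row)
        upsampled.append(list(wide_row))
    return upsampled


def difference_map(shallow, deep):
    """Absolute difference between a shallow map and the up-sampled deeper map."""
    deep_up = upsample_nearest_2x(deep)
    if len(deep_up) != len(shallow) or len(deep_up[0]) != len(shallow[0]):
        raise ValueError("shallow map must be exactly twice the size of the deep map")
    return [
        [abs(s - d) for s, d in zip(shallow_row, deep_row)]
        for shallow_row, deep_row in zip(shallow, deep_up)
    ]


def fuse_skip_feature(shallow, deep):
    """Add the difference map to the shallow feature (hypothetical fusion rule)."""
    diff = difference_map(shallow, deep)
    return [
        [s + d for s, d in zip(shallow_row, diff_row)]
        for shallow_row, diff_row in zip(shallow, diff)
    ]


# A 4x4 shallow map with one bright pixel (a stand-in for a tiny lesion) and the
# 2x2 map that 2x2 average pooling would produce from it.
shallow = [
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 4.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
]
deep = [
    [1.0, 0.0],
    [0.0, 0.0],
]

fused = fuse_skip_feature(shallow, deep)
for row in fused:
    print(row)
```

    In this toy case the difference map is largest exactly where down-sampling blurred the bright pixel, so the fused skip feature emphasizes that location. A real module would operate on multi-channel tensors and would likely use learned layers in place of the fixed addition.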

    Results and Discussions

    Experimental analyses are conducted on a private childhood pneumonia CT dataset (Child-P) and two public COVID-19 CT datasets (COVID, MosMed) to validate DBTU-Net’s effectiveness. Eight ablation schemes are designed to evaluate the performance of the three key modules: GCT, DAF, and BT. The results demonstrate that each module enhances network segmentation performance (Table 2), with the DAF module showing notable improvements of 8.21 percentage points, 12.34 percentage points, and 13.94 percentage points in the Dice similarity coefficient (Dice), Jaccard index (JI), and sensitivity (SE) metrics, respectively, confirming its effectiveness in enhancing spatial detail expression, preserving structural information, and improving lesion integrity. Module combination experiments further validate the DAF module’s importance. Without DAF, retaining only GCT and BT leads to performance decreases of 1.09 percentage points and 0.99 percentage points in the JI and SE metrics compared to the full model. The DAF and BT combination achieves a Hausdorff distance (HD) of 25.24 pixels while maintaining high overall performance, demonstrating their synergistic effect on boundary detail extraction. In comparison experiments, DBTU-Net achieves the best results across all five metrics on the Child-P dataset (Table 3), with Dice and JI reaching 89.61% and 81.17%, representing improvements of 8.17 percentage points and 12.48 percentage points over the baseline network and surpassing the second-best CASCADE network. Visualization results indicate DBTU-Net’s superior sensitivity in identifying scattered and tiny lesions, reducing missed segmentation and over-segmentation instances (Fig. 6). The decreased HD metric demonstrates the model’s advantage in lesion boundary modeling, validating the effectiveness of the cross-scale difference perception and boundary modeling mechanisms.

    On the COVID dataset, DBTU-Net leads in all five metrics, with JI, SE, and Matthews correlation coefficient (MCC) reaching 69.66%, 77.02%, and 81.04%, significantly outperforming U-Net++ in balancing lesion pixel identification and background differentiation (Table 4). On the MosMed dataset, despite slightly lower SE than TransDeepLab, DBTU-Net achieves the best results in the Dice, JI, and MCC metrics (Table 5), demonstrating robust lesion structure modeling. Visualization results show DBTU-Net’s advantage in reducing mis-segmentation and false positives, attributed to its contextual modeling mechanism integrating GCT and BT (Fig. 7). Multiple local detail visualizations confirm that DBTU-Net maintains consistent segmentation in regions with blurred edges or irregular lesion morphology, validating its local segmentation accuracy and robustness under complex structures (Fig. 8).
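
    The five metrics reported above (Dice, JI, SE, MCC, and HD) follow standard definitions, which the following minimal sketch computes for a pair of binary masks. The function names and the toy masks are hypothetical, and the Hausdorff distance here is the plain maximum variant over all foreground pixels; the abstract does not state which HD variant or boundary extraction the authors use, so the sketch should not be read as their evaluation code.

```python
import math


def confusion_counts(pred, truth):
    """Count true/false positives and negatives over two binary masks."""
    tp = fp = fn = tn = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p and t:
                tp += 1
            elif p:
                fp += 1
            elif t:
                fn += 1
            else:
                tn += 1
    return tp, fp, fn, tn


def overlap_metrics(pred, truth):
    """Dice, Jaccard index, sensitivity, and MCC.

    Assumes the ground truth contains at least one foreground pixel, so the
    Dice, Jaccard, and sensitivity denominators are non-zero.
    """
    tp, fp, fn, tn = confusion_counts(pred, truth)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "dice": 2 * tp / (2 * tp + fp + fn),
        "jaccard": tp / (tp + fp + fn),
        "sensitivity": tp / (tp + fn),
        "mcc": (tp * tn - fp * fn) / denom if denom else 0.0,
    }


def hausdorff_distance(pred, truth):
    """Symmetric Hausdorff distance (in pixels) between foreground pixel sets."""
    def foreground(mask):
        return [(r, c) for r, row in enumerate(mask) for c, v in enumerate(row) if v]

    a, b = foreground(pred), foreground(truth)
    if not a or not b:
        raise ValueError("both masks need at least one foreground pixel")

    def directed(src, dst):
        # Largest distance from any source pixel to its nearest target pixel.
        return max(min(math.dist(p, q) for q in dst) for p in src)

    return max(directed(a, b), directed(b, a))


# Toy masks: a shared 2x2 lesion, one missed pixel, one false-positive pixel.
pred = [
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 1],
]
truth = [
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 0],
]

metrics = overlap_metrics(pred, truth)
hd = hausdorff_distance(pred, truth)
print(metrics, hd)
```

    The overlap metrics reward agreement on lesion area, whereas HD is driven by the single worst-placed pixel, which is why an isolated false positive far from the lesion can leave Dice nearly unchanged while increasing HD sharply.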

    Conclusions

    This research focuses on lesion segmentation in childhood pneumonia CT images. DBTU-Net addresses the limitations of traditional U-shaped networks in segmenting small, fragmented, and complex lesions, which stem from spatial information loss and limited global feature extraction. The network enhances spatial structure expression by using the DAF module to mine difference features between layers, while incorporating the BT module to guide high-level semantic information for boundary enhancement. This combination improves the modeling capability for distant lesions and local boundary segmentation accuracy, reducing mis-segmentation in complex lesion regions. Experimental results on the private childhood pneumonia dataset demonstrate DBTU-Net’s superior performance compared to existing mainstream methods across multiple evaluation metrics. Additionally, its strong performance on the two public COVID-19 datasets supports the method’s generalization capability.

    Get Citation

    Jia Lü, Mingkai Yu, Xin Chen, Ling He. Difference Aware Guided Boundary Transformer Network for Childhood Pneumonia CT Image Segmentation[J]. Acta Optica Sinica, 2025, 45(15): 1510003

    Paper Information

    Category: Image Processing

    Received: Mar. 18, 2025

    Accepted: May 6, 2025

    Published Online: Aug. 8, 2025

    The Author Email: Jia Lü (lvjia@cqnu.edu.cn), Ling He (heling508@sina.com)

    DOI:10.3788/AOS250760

    CSTR:32393.14.AOS250760