Acta Optica Sinica, Volume 43, Issue 13, 1310001 (2023)

Lightweight Directional Transformer for X-Ray-Aided Pneumonia Diagnosis

Tao Zhou1,2, Xinyu Ye1,2,*, Fengzhen Liu1,2, and Huiling Lu3
Author Affiliations
  • 1School of Computer Science and Engineering, North Minzu University, Yinchuan 750021, Ningxia, China
  • 2The Key Laboratory of Images and Graphics Intelligent Processing of State Ethnic Affairs Commission, North Minzu University, Yinchuan 750021, Ningxia, China
  • 3School of Medical information and Engineering, Ningxia Medical University, Yinchuan 750004, Ningxia, China

    Objective

    Computer-aided pneumonia diagnosis from chest X-rays based on convolutional neural networks (CNNs) is an important research direction. Factors such as patient position and inspiratory depth can cause pneumonia to be confused with other diseases in chest X-ray images, and existing methods ignore the directional and spatial features of these images, such as the common onset of pneumonia in the middle and lower lobes of the lung. A CNN alone has difficulty extracting the directional information and global semantic information of pneumonia X-rays. Additionally, existing models are not sufficiently lightweight, and their time and space complexity is high. Hence, this paper proposes a lightweight directional Transformer (LDTransformer) model for pneumonia X-rays to assist in diagnosis.

    Methods

    First, a densely connected architecture that combines a CNN with the Transformer is constructed. It alternately stacks local feature extraction and global feature extraction, and its dense connections combine local and global information across deep and shallow layers. Next, a directional convolution with lateral, vertical, and dilated convolutions in parallel is designed to capture spatial and directional information of different shapes and sizes. The directional convolution is used to compress feature scales in the Transformer and to learn global and directional features of images with low computational complexity. After that, the lightweight convolution in the CNN is designed. It employs a dedicated convolution kernel for each sample feature, learns features in chunks, and fuses them by a channel-noted blender, which reduces the number of model parameters and keeps computation efficient while increasing the feature extraction capability of the network. Finally, a balanced focal loss function is constructed to increase the weight of small and misclassified samples and decrease the weight of overclassified samples.
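
    The loss weighting described above can be illustrated with a minimal sketch. It uses the standard class-weighted focal loss, -alpha_t (1 - p_t)^gamma log(p_t), which is not necessarily the exact balanced form constructed in the paper. The inverse-frequency weights, the function names, and gamma = 2 are assumptions made here for illustration.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of raw class scores."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def inverse_frequency_weights(class_counts):
    """Assumed weighting scheme: weights proportional to 1/count, scaled to mean 1."""
    inv = [1.0 / c for c in class_counts]
    scale = len(inv) / sum(inv)
    return [w * scale for w in inv]


def focal_loss(logits, target, alpha, gamma=2.0):
    """Class-weighted focal loss for a single sample.

    p_t is the predicted probability of the true class. The factor
    (1 - p_t) ** gamma down-weights confidently correct samples, and
    alpha[target] up-weights classes with few samples.
    """
    p_t = softmax(logits)[target]
    return -alpha[target] * (1.0 - p_t) ** gamma * math.log(p_t)


# Hypothetical two-class setting: 900 samples of class 0, 100 of class 1.
alpha = inverse_frequency_weights([900, 100])
well_classified = focal_loss([4.0, 0.0], target=0, alpha=alpha)
misclassified = focal_loss([0.0, 4.0], target=0, alpha=alpha)
print(alpha, well_classified, misclassified)
```

    With class counts of 900 and 100, the assumed weighting gives 0.2 and 1.8, so errors on the rarer class cost more, and the (1 - p_t)^gamma factor shrinks the loss of samples that are already classified confidently. Setting gamma to 0 and all weights to 1 recovers ordinary cross-entropy.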

    Results and Discussions

    The LDTransformer model achieves high recognition accuracy with good robustness and generalization on three X-ray datasets that differ in size, number of categories, and difficulty. Smaller datasets make it difficult for high-performance CNN and Transformer models to learn sufficiently, while the lightweight model that combines both obtains high recognition accuracy (Table 6). Compared with various lightweight CNN and Transformer models (Table 4), the proposed model has advantages in the number of parameters, computation, and training time. In particular, its lightweight design with a dedicated convolution kernel for each sample feature makes its operating efficiency significantly better than that of existing models. Finally, each component of the model is evaluated separately through ablation experiments and loss function comparison experiments, and the model's regions of interest and accuracy are visualized with heat maps in the ablation experiments (Fig. 4).

    Conclusions

    To address inadequate feature extraction and insufficiently lightweight models, this paper proposes a model for X-ray-aided pneumonia diagnosis that combines local and global information in deep and shallow layers. The directional convolution learns spatial and directional information of different shapes and sizes. The lightweight convolution with a dedicated convolution kernel for each sample feature is designed to reduce resource consumption, and a balanced focal loss function is constructed to optimize training. On the pneumonia X-ray dataset, the proposed model achieves an accuracy of 98.87% and an AUC value of 98.85% with a small number of model parameters (2.53×10⁵), the lowest model computation (3.98×10⁷), and the shortest total time (12647 s). It effectively extracts the directional and global features of pneumonia X-ray images while remaining highly lightweight.
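
    As a rough illustration of why directional kernels suit a lightweight design, the sketch below applies a lateral (1×3), a vertical (3×1), and a dilated (3×3, dilation 2) kernel to the same small image. The kernel sizes, kernel values, and function names are assumptions chosen for illustration. This is plain Python written for clarity, not the authors' implementation.

```python
def conv2d(image, kernel, dilation=1):
    """Valid-mode 2D cross-correlation of a list-of-lists image with a kernel."""
    kh, kw = len(kernel), len(kernel[0])
    span_h = (kh - 1) * dilation + 1  # rows covered by the dilated kernel
    span_w = (kw - 1) * dilation + 1  # columns covered by the dilated kernel
    out_h = len(image) - span_h + 1
    out_w = len(image[0]) - span_w + 1
    return [
        [
            sum(
                kernel[i][j] * image[r + i * dilation][c + j * dilation]
                for i in range(kh)
                for j in range(kw)
            )
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def kernel_stats(kh, kw, dilation=1):
    """Number of weights in a kernel and the receptive field it covers."""
    field = ((kh - 1) * dilation + 1, (kw - 1) * dilation + 1)
    return {"weights": kh * kw, "field": field}


# Toy 6x6 image whose intensity changes only along the horizontal direction.
image = [[0, 0, 0, 1, 1, 1] for _ in range(6)]

lateral = conv2d(image, [[-1, 0, 1]])          # 1x3 kernel, difference along a row
vertical = conv2d(image, [[-1], [0], [1]])     # 3x1 kernel, difference along a column
dilated = conv2d(image, [[1] * 3] * 3, dilation=2)  # 3x3 kernel covering a 5x5 field

print(lateral[0], vertical[0], dilated)
print(kernel_stats(1, 3), kernel_stats(3, 1), kernel_stats(3, 3, dilation=2))
```

    On this image only the lateral branch responds to the intensity change, while the vertical branch stays at zero, which shows how each branch picks up structure along one direction. The 1×3 and 3×1 kernels together hold 6 weights against 9 for a square 3×3 kernel, and dilation widens the 3×3 field to 5×5 without adding weights.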

    Tao Zhou, Xinyu Ye, Fengzhen Liu, Huiling Lu. Lightweight Directional Transformer for X-Ray-Aided Pneumonia Diagnosis[J]. Acta Optica Sinica, 2023, 43(13): 1310001

    Paper Information

    Category: Image Processing

    Received: Jan. 6, 2023

    Accepted: Feb. 21, 2023

    Published Online: Jul. 12, 2023

    Author Email: Ye Xinyu (3303626778@qq.com)

    DOI:10.3788/AOS230447
