Acta Photonica Sinica, Vol. 54, Issue 4, 0410003 (2025)

MSP-YOLACT: Instance Segmentation Model for Multimodal PET/CT Medical Images of Lung Tumors

Tao ZHOU1,3, Wenwen CHAI1,3,*, Yaxing WANG1,3, Kaixiong CHEN1,3, Huiling LU2, and Daozong SHI1,3
Author Affiliations
  • 1School of Computer Science and Engineering, North Minzu University, Yinchuan 750021, China
  • 2School of Medical Information & Engineering, Ningxia Medical University, Yinchuan 750004, China
  • 3Key Laboratory of Image and Graphics Intelligent Processing of State Ethnic Affairs Commission, North Minzu University, Yinchuan 750021, China

    With the development of medical imaging technology, instance segmentation of multimodal medical images has become a research hotspot, yet existing instance segmentation models for multimodal medical images do not fully exploit the complementary lesion information across modalities. To address the low contrast and blurred lesion boundaries in lung tumor medical images, this paper proposes MSP-YOLACT, an instance segmentation model for multimodal PET/CT lung tumor medical images. The model makes the following three contributions.

    Firstly, a multimodal feature mixer is designed to fully utilize the lesion features shared by the different modal images for morphological enhancement of the lesion. The module adaptively learns features of the lesion area through two branches, PET and CT. Specifically, it first normalizes the input PET and CT feature maps to stabilize their data distributions; it then applies a self-attention mechanism to extract the PET and CT branch features, allowing the model to attend to different parts of the features and capture more discriminative information; finally, it fuses the lesion-area features learned by the PET and CT branches into the PET/CT branch pixel by pixel. This weighted fusion emphasizes the important features, so the lesion areas stand out more clearly in the images.

    Secondly, an enhanced feature pyramid is designed to increase attention to the lesion area; it comprises an enhanced feature fusion module and a multi-scale feature fusion device. In the top-down fusion process, the enhanced feature fusion module focuses on the semantic information of the high-level feature maps while suppressing noise, leveraging self-attention to selectively emphasize relevant features. The multi-scale feature fusion device receives the coarse and fine information of the PET and CT branch features, effectively fuses the salient foreground and background features, fills in the information of the lowest pyramid level, and strengthens the learning of image morphological information through dedicated convolutional operations for better feature extraction.

    Finally, a parallel feature enhancement prediction head is designed to strengthen the model's localization and boundary characterization ability. This structure reconstructs the anchor box and mask coefficient branches: the anchor box branch generates anchor boxes of different aspect ratios for each pixel based on the learned feature distribution, while the mask coefficient branch predicts the coefficients of each mask to establish a one-to-one correspondence between anchors and masks, so that the lesion area can be located precisely. In addition, global and local feature enhancement modules further enhance the lesion areas in the feature maps, significantly improving the recognition of lesion regions and lesion boundaries.
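    The abstract describes the multimodal feature mixer's steps (per-branch normalization, per-branch self-attention, and pixel-wise weighted fusion into the PET/CT branch) without giving the implementation, so the sketch below is only one plausible reading. The specific layer choices here (LayerNorm, multi-head self-attention, a 1×1 convolution producing the fusion weights) are assumptions for illustration, not the paper's verified design:

```python
import torch
import torch.nn as nn

class MultimodalFeatureMixer(nn.Module):
    """Minimal sketch, assuming the steps stated in the abstract:
    normalize each modality, run self-attention per branch, then fuse
    both into the PET/CT branch with learned per-pixel weights."""

    def __init__(self, channels: int, num_heads: int = 4):
        super().__init__()
        self.norm_pet = nn.LayerNorm(channels)
        self.norm_ct = nn.LayerNorm(channels)
        self.attn_pet = nn.MultiheadAttention(channels, num_heads, batch_first=True)
        self.attn_ct = nn.MultiheadAttention(channels, num_heads, batch_first=True)
        # 1x1 conv predicting two per-pixel fusion weights (an assumption)
        self.fusion_weight = nn.Conv2d(2 * channels, 2, kernel_size=1)

    def _branch(self, x, norm, attn):
        # (B, C, H, W) -> (B, HW, C): treat spatial positions as tokens
        b, c, h, w = x.shape
        t = norm(x.flatten(2).transpose(1, 2))
        t, _ = attn(t, t, t)  # self-attention over spatial positions
        return t.transpose(1, 2).reshape(b, c, h, w)

    def forward(self, pet, ct, petct):
        f_pet = self._branch(pet, self.norm_pet, self.attn_pet)
        f_ct = self._branch(ct, self.norm_ct, self.attn_ct)
        # per-pixel weights decide how much each modality contributes
        w = torch.softmax(self.fusion_weight(torch.cat([f_pet, f_ct], dim=1)), dim=1)
        return petct + w[:, 0:1] * f_pet + w[:, 1:2] * f_ct

# Usage: three spatially aligned (B, C, H, W) feature maps
mixer = MultimodalFeatureMixer(channels=256)
fused = mixer(torch.randn(1, 256, 32, 32),
              torch.randn(1, 256, 32, 32),
              torch.randn(1, 256, 32, 32))
```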
    The validity of the model is verified on a clinical multimodal lung tumor medical image dataset. Using the PET/CT mode alone to detect and segment the lung tumor lesion area, the mAPdet and mAPseg are 58.25 and 59.45, respectively; using the PET/CT and CT modes, they are 57.59 and 59.18; using the PET/CT and PET modes, they are 58.31 and 59.32. The experimental results show that the APdet, APseg, ARdet, ARseg, mAPdet, and mAPseg of the proposed model for lung tumor lesion detection and segmentation are 64.55%, 65.53%, 51.47%, 52.28%, 64.37%, and 65.41%, respectively. The model achieves accurate detection and segmentation of the lung tumor lesion area, which is of positive significance for automated, computer-aided diagnosis of lung tumors.
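    To make the prediction-head contribution concrete, the sketch below follows the YOLACT scheme the model builds on: parallel convolutional branches predict class scores, anchor-box offsets, and per-anchor mask coefficients, and each anchor's coefficient vector linearly combines shared prototype masks into exactly one instance mask. Channel sizes, anchor counts, and the tanh/sigmoid activations are borrowed from YOLACT as assumptions; the paper's exact head (including its global and local feature enhancement modules) is not specified in this abstract:

```python
import torch
import torch.nn as nn

class ParallelPredictionHead(nn.Module):
    """Sketch of a YOLACT-style head with parallel branches; the mask
    coefficient branch gives each anchor its own coefficient vector, so
    every predicted box corresponds one-to-one with an assembled mask."""

    def __init__(self, in_ch=256, num_classes=2, num_anchors=3, num_protos=32):
        super().__init__()
        self.na, self.num_protos = num_anchors, num_protos
        self.shared = nn.Sequential(nn.Conv2d(in_ch, in_ch, 3, padding=1), nn.ReLU())
        self.cls = nn.Conv2d(in_ch, num_anchors * num_classes, 3, padding=1)
        self.box = nn.Conv2d(in_ch, num_anchors * 4, 3, padding=1)
        self.coef = nn.Conv2d(in_ch, num_anchors * num_protos, 3, padding=1)

    def forward(self, x, prototypes):
        # x: (B, C, H, W) pyramid feature; prototypes: (B, P, Hm, Wm)
        f = self.shared(x)
        b, _, h, w = f.shape
        n = h * w * self.na  # one prediction per anchor per pixel
        cls = self.cls(f).permute(0, 2, 3, 1).reshape(b, n, -1)
        box = self.box(f).permute(0, 2, 3, 1).reshape(b, n, 4)
        coef = torch.tanh(self.coef(f)).permute(0, 2, 3, 1).reshape(b, n, self.num_protos)
        # one-to-one assembly: each anchor's coefficients weight the prototypes
        masks = torch.sigmoid(torch.einsum('bnp,bphw->bnhw', coef, prototypes))
        return cls, box, masks
```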



    Citation: Tao ZHOU, Wenwen CHAI, Yaxing WANG, Kaixiong CHEN, Huiling LU, Daozong SHI. MSP-YOLACT: Instance Segmentation Model for Multimodal PET/CT Medical Images of Lung Tumors[J]. Acta Photonica Sinica, 2025, 54(4): 0410003

    Paper Information

    Received: Sep. 19, 2024

    Accepted: Dec. 20, 2024

    Published Online: May 15, 2025

    Author Email: Wenwen CHAI (chaiwenwen@stu.nmu.edu.cn)

    DOI: 10.3788/gzxb20255404.0410003
