Acta Photonica Sinica, Vol. 54, Issue 4, 0410003 (2025)

MSP-YOLACT: Instance Segmentation Model for Multimodal PET/CT Medical Images of Lung Tumors

Tao ZHOU1,3, Wenwen CHAI1,3,*, Yaxing WANG1,3, Kaixiong CHEN1,3, Huiling LU2, and Daozong SHI1,3
Author Affiliations
  • 1School of Computer Science and Engineering, North Minzu University, Yinchuan 750021, China
  • 2School of Medical Information & Engineering, Ningxia Medical University, Yinchuan 750004, China
  • 3Key Laboratory of Image and Graphics Intelligent Processing of State Ethnic Affairs Commission, North Minzu University, Yinchuan 750021, China

    With the development of medical imaging technology, instance segmentation of multimodal medical images has become a research hotspot, yet existing instance segmentation models for multimodal medical images do not fully exploit the complementary lesion information across modalities. To address the low contrast and blurred lesion boundaries in lung tumor medical images, this paper proposes MSP-YOLACT, an instance segmentation model for multimodal PET/CT lung tumor medical images. The model makes the following three contributions.

    Firstly, a multimodal feature mixer is designed to fully utilize the lesion features shared by the different modal images for morphological enhancement of the lesion. The module adaptively learns features of the lesion area through two branches, PET and CT. Specifically, it first normalizes the input PET and CT feature maps to stabilize their data distributions; it then applies a self-attention mechanism to extract the PET and CT branch features, allowing the model to attend to different parts of the features and capture more discriminative information; finally, it fuses the lesion-area features learned by the PET and CT branches into the PET/CT branch pixel by pixel. This weighted fusion emphasizes the important features, so the lesion areas stand out more clearly in the images.

    Secondly, an enhanced feature pyramid is designed to increase attention to the lesion area; it comprises an enhanced feature fusion module and a multi-scale feature fusion device. In the top-down fusion process, the enhanced feature fusion module focuses on the semantic information of the high-level feature maps while suppressing noise, leveraging self-attention to selectively emphasize relevant features. The multi-scale feature fusion device receives the coarse and fine information of the PET and CT branch features, effectively fuses the salient foreground and background features, fills in the information of the lowest pyramid level, and strengthens the learning of image morphological information through dedicated convolutional operations for better feature extraction.

    Finally, a parallel feature enhancement prediction head is designed to strengthen the model's localization and boundary characterization ability. This structure reconstructs the anchor box and mask coefficient branches: the anchor box branch generates anchor boxes of different aspect ratios for each pixel based on the learned feature distribution, while the mask coefficient branch predicts the coefficients of each mask to establish a one-to-one correspondence between anchors and masks, so that the lesion area can be located precisely. In addition, global and local feature enhancement modules further enhance the lesion areas in the feature maps, significantly improving the recognition of lesion regions and lesion boundaries.
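    The abstract describes the multimodal feature mixer's steps (per-branch normalization, per-branch self-attention, and pixel-wise weighted fusion into the PET/CT branch) without giving the implementation, so the sketch below is only one plausible reading. The specific layer choices here (LayerNorm, multi-head self-attention, a 1×1 convolution producing the fusion weights) are assumptions for illustration, not the paper's verified design:

```python
import torch
import torch.nn as nn

class MultimodalFeatureMixer(nn.Module):
    """Minimal sketch, assuming the steps stated in the abstract:
    normalize each modality, run self-attention per branch, then fuse
    both into the PET/CT branch with learned per-pixel weights."""

    def __init__(self, channels: int, num_heads: int = 4):
        super().__init__()
        self.norm_pet = nn.LayerNorm(channels)
        self.norm_ct = nn.LayerNorm(channels)
        self.attn_pet = nn.MultiheadAttention(channels, num_heads, batch_first=True)
        self.attn_ct = nn.MultiheadAttention(channels, num_heads, batch_first=True)
        # 1x1 conv predicting two per-pixel fusion weights (an assumption)
        self.fusion_weight = nn.Conv2d(2 * channels, 2, kernel_size=1)

    def _branch(self, x, norm, attn):
        # (B, C, H, W) -> (B, HW, C): treat spatial positions as tokens
        b, c, h, w = x.shape
        t = norm(x.flatten(2).transpose(1, 2))
        t, _ = attn(t, t, t)  # self-attention over spatial positions
        return t.transpose(1, 2).reshape(b, c, h, w)

    def forward(self, pet, ct, petct):
        f_pet = self._branch(pet, self.norm_pet, self.attn_pet)
        f_ct = self._branch(ct, self.norm_ct, self.attn_ct)
        # per-pixel weights decide how much each modality contributes
        w = torch.softmax(self.fusion_weight(torch.cat([f_pet, f_ct], dim=1)), dim=1)
        return petct + w[:, 0:1] * f_pet + w[:, 1:2] * f_ct

# Usage: three spatially aligned (B, C, H, W) feature maps
mixer = MultimodalFeatureMixer(channels=256)
fused = mixer(torch.randn(1, 256, 32, 32),
              torch.randn(1, 256, 32, 32),
              torch.randn(1, 256, 32, 32))
```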
    The validity of the model is verified on a clinical multimodal lung tumor medical image dataset. Using the PET/CT mode alone to detect and segment the lung tumor lesion area, the mAPdet and mAPseg are 58.25 and 59.45, respectively; using the PET/CT and CT modes, they are 57.59 and 59.18; using the PET/CT and PET modes, they are 58.31 and 59.32. The experimental results show that the APdet, APseg, ARdet, ARseg, mAPdet, and mAPseg of the proposed model for lung tumor lesion detection and segmentation are 64.55%, 65.53%, 51.47%, 52.28%, 64.37%, and 65.41%, respectively. The model achieves accurate detection and segmentation of the lung tumor lesion area, which is of positive significance for automated, computer-aided diagnosis of lung tumors.
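    To make the prediction-head contribution concrete, the sketch below follows the YOLACT scheme the model builds on: parallel convolutional branches predict class scores, anchor-box offsets, and per-anchor mask coefficients, and each anchor's coefficient vector linearly combines shared prototype masks into exactly one instance mask. Channel sizes, anchor counts, and the tanh/sigmoid activations are borrowed from YOLACT as assumptions; the paper's exact head (including its global and local feature enhancement modules) is not specified in this abstract:

```python
import torch
import torch.nn as nn

class ParallelPredictionHead(nn.Module):
    """Sketch of a YOLACT-style head with parallel branches; the mask
    coefficient branch gives each anchor its own coefficient vector, so
    every predicted box corresponds one-to-one with an assembled mask."""

    def __init__(self, in_ch=256, num_classes=2, num_anchors=3, num_protos=32):
        super().__init__()
        self.na, self.num_protos = num_anchors, num_protos
        self.shared = nn.Sequential(nn.Conv2d(in_ch, in_ch, 3, padding=1), nn.ReLU())
        self.cls = nn.Conv2d(in_ch, num_anchors * num_classes, 3, padding=1)
        self.box = nn.Conv2d(in_ch, num_anchors * 4, 3, padding=1)
        self.coef = nn.Conv2d(in_ch, num_anchors * num_protos, 3, padding=1)

    def forward(self, x, prototypes):
        # x: (B, C, H, W) pyramid feature; prototypes: (B, P, Hm, Wm)
        f = self.shared(x)
        b, _, h, w = f.shape
        n = h * w * self.na  # one prediction per anchor per pixel
        cls = self.cls(f).permute(0, 2, 3, 1).reshape(b, n, -1)
        box = self.box(f).permute(0, 2, 3, 1).reshape(b, n, 4)
        coef = torch.tanh(self.coef(f)).permute(0, 2, 3, 1).reshape(b, n, self.num_protos)
        # one-to-one assembly: each anchor's coefficients weight the prototypes
        masks = torch.sigmoid(torch.einsum('bnp,bphw->bnhw', coef, prototypes))
        return cls, box, masks
```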



    Citation: Tao ZHOU, Wenwen CHAI, Yaxing WANG, Kaixiong CHEN, Huiling LU, Daozong SHI. MSP-YOLACT: Instance Segmentation Model for Multimodal PET/CT Medical Images of Lung Tumors[J]. Acta Photonica Sinica, 2025, 54(4): 0410003

    Paper Information

    Received: Sep. 19, 2024

    Accepted: Dec. 20, 2024

    Published Online: May 15, 2025

    Author Email: Wenwen CHAI (chaiwenwen@stu.nmu.edu.cn)

    DOI: 10.3788/gzxb20255404.0410003
