Medical Image Segmentation Method Combining a Pyramid Vision Transformer and a Kolmogorov-Arnold Network

Zhongan Huang; Xinyu Li; Qiaohong Liu; Min Lin; Huayuan Yang

doi:10.3788/LOP242398

Laser & Optoelectronics Progress, Vol. 62, Issue 14, 1417002 (2025)

Zhongan Huang¹, Xinyu Li², Qiaohong Liu³, Min Lin⁴,**, and Huayuan Yang¹,*

¹School of Acupuncture-Moxibustion and Tuina, Shanghai University of Traditional Chinese Medicine, Shanghai 201203, China

²School of Health Science and Engineering, University of Shanghai for Science and Technology, Shanghai 200093, China

³College of Medical Instruments, Shanghai University of Medicine and Health Sciences, Shanghai 201318, China

⁴Shanghai Tongji Hospital, Shanghai 200065, China

Abstract

The integration of the Kolmogorov-Arnold network (KAN) structure into the traditional U-Net yields the U-KAN model, which demonstrates exceptional nonlinear modeling capability and interpretability in medical image segmentation tasks. To address the limitations of U-KAN, namely inadequate global feature capture, poor recognition of complex edges, and limited multiscale feature fusion, this study proposes a novel segmentation network that combines a pyramid vision Transformer (PVT) with U-KAN, referred to as PVT-KANet. While preserving the interpretability of the KAN structure, PVT-KANet addresses these limitations through three key innovations. First, the PVTv2 module is incorporated into the encoder to enhance the global modeling of lesion features at different scales. This module improves the perception of complex lesion areas by leveraging a pyramid structure and a self-attention mechanism. Second, a multiscale convolutional attention Tok-KAN module is introduced, which integrates a multiscale convolutional attention mechanism with the KAN structure. This module substantially enhances the recognition of fuzzy boundaries and fine details. Third, an Inception depthwise convolution decoding module is employed to achieve adaptive multiscale feature fusion, thereby improving segmentation accuracy. On the CVC-ClinicDB polyp dataset, the proposed model improves the intersection over union (IoU) by 4.66 percentage points compared with U-KAN. Further, the model outperforms existing models on the BUSI breast ultrasound and GlaS glandular tissue datasets. Thus, both the experimental results and the theoretical analysis confirm the substantial advantages of PVT-KANet in improving the accuracy and generalization of medical image segmentation.
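
The intersection over union (IoU) used above as the headline metric is the ratio between the overlap of the predicted and reference masks and the area covered by either. A gain stated in percentage points is the absolute difference between two IoU values expressed in percent, not a relative change. The following is a minimal sketch in plain Python, not the authors' evaluation code; the function name `iou`, the toy masks, and the convention for two empty masks are assumptions made for illustration only.

```python
from typing import Sequence


def iou(pred: Sequence[Sequence[int]], target: Sequence[Sequence[int]]) -> float:
    """Intersection over union of two binary masks given as nested 0/1 lists."""
    intersection = 0
    union = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            # A pixel is in the intersection if both masks mark it,
            # and in the union if at least one mask marks it.
            intersection += 1 if (p and t) else 0
            union += 1 if (p or t) else 0
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if union == 0 else intersection / union


# Toy 3x3 masks with made-up values (not taken from any dataset).
target = [
    [0, 1, 1],
    [0, 1, 1],
    [0, 0, 0],
]
pred = [
    [0, 1, 1],
    [0, 1, 0],
    [0, 1, 0],
]

score = iou(pred, target)  # 3 overlapping pixels out of 5 covered pixels
print(f"IoU = {score:.2f}")  # prints "IoU = 0.60"
```

In practice the metric is usually computed on thresholded network outputs and averaged over a test set; the exact averaging protocol used in the paper is not stated in this abstract.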

Keywords

image segmentation; Inception depthwise convolution; KAN; pyramid vision Transformer

Citation

Zhongan Huang, Xinyu Li, Qiaohong Liu, Min Lin, Huayuan Yang. Medical Image Segmentation Method Combining a Pyramid Vision Transformer and a Kolmogorov-Arnold Network[J]. Laser & Optoelectronics Progress, 2025, 62(14): 1417002
