Laser & Optoelectronics Progress, Vol. 62, Issue 14, 1417002 (2025)
Medical Image Segmentation Method Combining a Pyramid Vision Transformer and a Kolmogorov-Arnold Network
Integrating the Kolmogorov-Arnold network (KAN) structure into the traditional U-Net yields the U-KAN model, which demonstrates strong nonlinear modeling capability and interpretability in medical image segmentation tasks. To address the limitations of U-KAN, namely inadequate global feature capture, poor recognition of complex edges, and limited multiscale feature fusion, this study proposes a novel segmentation network, PVT-KANet, which combines a pyramid vision Transformer (PVT) with U-KAN. While preserving the interpretability of the KAN structure, PVT-KANet addresses these limitations through three key innovations. First, the PVTv2 module is incorporated into the encoder to enhance global modeling of lesion features at different scales. By leveraging a pyramid structure and a self-attention mechanism, this module improves the perception of complex lesion areas. Second, a multiscale convolutional attention Tok-KAN module is introduced, which integrates a multiscale convolutional attention mechanism with the KAN structure and substantially enhances the recognition of fuzzy boundaries and fine details. Third, an inception deep convolution decoding module is employed to achieve adaptive multiscale feature fusion, thereby improving segmentation accuracy. On the CVC-ClinicDB polyp dataset, the proposed model improves the intersection over union (IoU) by 4.66 percentage points compared with U-KAN. The model also outperforms existing models on the BUSI breast ultrasound and GlaS glandular tissue datasets. Both the experimental results and the theoretical analysis thus confirm the advantages of PVT-KANet in improving the accuracy and generalization of medical image segmentation.
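The abstract reports its headline result as a percentage-point gain in intersection over union. As a minimal sketch of how that metric and that kind of gain are typically computed for binary segmentation masks, the following may help; the function names, the toy masks, and the convention for two empty masks are illustrative assumptions, not taken from the paper, and the numbers are not the paper's results.

```python
from typing import Sequence

Mask = Sequence[Sequence[int]]  # binary mask as rows of 0/1 values


def iou(pred: Mask, target: Mask) -> float:
    """Intersection over union of two binary masks of equal shape."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of rows")
    intersection = 0
    union = 0
    for pred_row, target_row in zip(pred, target):
        if len(pred_row) != len(target_row):
            raise ValueError("masks must have the same number of columns")
        for p, t in zip(pred_row, target_row):
            p_on, t_on = bool(p), bool(t)
            intersection += p_on and t_on  # pixel is foreground in both masks
            union += p_on or t_on          # pixel is foreground in either mask
    # Assumed convention: two empty masks count as a perfect match.
    return intersection / union if union else 1.0


def percentage_point_gain(iou_new: float, iou_base: float) -> float:
    """Absolute difference between two IoU scores, in percentage points."""
    return (iou_new - iou_base) * 100.0


# Toy 3x4 masks, made up for illustration only.
target = [[0, 1, 1, 0],
          [0, 1, 1, 0],
          [0, 0, 0, 0]]
pred_a = [[0, 1, 1, 0],
          [0, 1, 0, 0],
          [0, 0, 0, 0]]
pred_b = [[0, 1, 1, 1],
          [0, 1, 0, 0],
          [1, 0, 0, 0]]

iou_a = iou(pred_a, target)  # 3 shared pixels, 4 in the union
iou_b = iou(pred_b, target)  # 3 shared pixels, 6 in the union
gain = percentage_point_gain(iou_a, iou_b)
```

A percentage-point gain is an absolute difference between two scores, so it is not the same as a relative improvement: here the gap between 0.75 and 0.50 is 25 percentage points, although the relative improvement is 50%.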
Zhongan Huang, Xinyu Li, Qiaohong Liu, Min Lin, Huayuan Yang. Medical Image Segmentation Method Combining a Pyramid Vision Transformer and a Kolmogorov-Arnold Network[J]. Laser & Optoelectronics Progress, 2025, 62(14): 1417002
Category: Medical Optics and Biotechnology
Received: Dec. 11, 2024
Accepted: Feb. 7, 2025
Published Online: Jul. 16, 2025
Author Email: Min Lin (linm_doc@163.com), Huayuan Yang (yhyabcd@sina.com)
CSTR:32186.14.LOP242398