Laser & Optoelectronics Progress, Volume 62, Issue 14, 1417002 (2025)

Medical Image Segmentation Method Combining a Pyramid Vision Transformer and a Kolmogorov-Arnold Network

Zhongan Huang1, Xinyu Li2, Qiaohong Liu3, Min Lin4,**, and Huayuan Yang1,*
Author Affiliations
  • 1School of Acupuncture-Moxibustion and Tuina, Shanghai University of Traditional Chinese Medicine, Shanghai 201203, China
  • 2School of Health Science and Engineering, University of Shanghai for Science and Technology, Shanghai 200093, China
  • 3College of Medical Instruments, Shanghai University of Medicine and Health Sciences, Shanghai 201318, China
  • 4Shanghai Tongji Hospital, Shanghai 200065, China

    Integrating the Kolmogorov-Arnold network (KAN) structure into the traditional U-Net yields the U-KAN model, which demonstrates exceptional nonlinear modeling capability and interpretability in medical image segmentation tasks. To address the limitations of U-KAN, namely inadequate global feature capture, poor recognition of complex edges, and limited multiscale feature fusion, this study proposes a novel segmentation network, PVT-KANet, which combines a pyramid vision Transformer (PVT) with U-KAN. While preserving the interpretability of the KAN structure, PVT-KANet addresses these limitations through three key innovations. First, the PVTv2 module is incorporated into the encoder to enhance the global modeling of lesion features at different scales. By leveraging a pyramid structure and a self-attention mechanism, this module improves the perception of complex lesion areas. Second, a multiscale convolutional attention Tok-KAN module is introduced, which integrates a multiscale convolutional attention mechanism with the KAN structure and substantially improves the recognition of fuzzy boundaries and fine details. Third, the inception deep convolution decoding module is employed to achieve adaptive multiscale feature fusion, thereby improving segmentation accuracy. On the CVC-ClinicDB polyp dataset, the proposed model improves the intersection over union (IoU) by 4.66 percentage points compared to U-KAN. The model also outperforms existing models on the BUSI breast ultrasound and GlaS glandular tissue datasets. Both the experimental results and the theoretical analysis confirm the substantial advantages of PVT-KANet in improving the accuracy and generalization of medical image segmentation.
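The abstract reports its headline result as a gain in intersection over union, measured in percentage points. As a minimal sketch of what that metric measures for a single pair of binary masks, the following pure-Python example may help. The function names `iou` and `percentage_point_gain` are hypothetical, the convention of returning 1.0 when both masks are empty is an assumption, and the paper's actual evaluation protocol (thresholding, averaging over images) is not given in this abstract.

```python
from typing import Sequence


def iou(pred: Sequence[Sequence[int]], target: Sequence[Sequence[int]]) -> float:
    """Intersection over union of two binary masks with the same shape.

    Any non-zero pixel counts as foreground.
    """
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of rows")
    intersection = 0
    union = 0
    for row_p, row_t in zip(pred, target):
        if len(row_p) != len(row_t):
            raise ValueError("mask rows must have the same length")
        for p, t in zip(row_p, row_t):
            p_fg, t_fg = bool(p), bool(t)
            intersection += int(p_fg and t_fg)  # foreground in both masks
            union += int(p_fg or t_fg)          # foreground in either mask
    # Assumption: two empty masks are treated as a perfect match.
    return 1.0 if union == 0 else intersection / union


def percentage_point_gain(new_score: float, baseline_score: float) -> float:
    """Absolute difference between two scores in [0, 1], in percentage points."""
    return (new_score - baseline_score) * 100.0


# Toy 2x3 masks: 2 pixels overlap, 4 pixels are foreground in at least one mask.
pred = [[0, 1, 1],
        [0, 1, 0]]
target = [[0, 1, 0],
          [0, 1, 1]]
print(iou(pred, target))  # 0.5
```

A percentage-point gain is an absolute difference between two scores, not a relative one: under this reading, the reported 4.66 corresponds to a difference of 0.0466 between the two models' IoU values on the 0-to-1 scale.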


    Zhongan Huang, Xinyu Li, Qiaohong Liu, Min Lin, Huayuan Yang. Medical Image Segmentation Method Combining a Pyramid Vision Transformer and a Kolmogorov-Arnold Network[J]. Laser & Optoelectronics Progress, 2025, 62(14): 1417002

    Paper Information

    Category: Medical Optics and Biotechnology

    Received: Dec. 11, 2024

    Accepted: Feb. 7, 2025

    Published Online: Jul. 16, 2025

    Author Email: Min Lin (linm_doc@163.com), Huayuan Yang (yhyabcd@sina.com)

    DOI:10.3788/LOP242398

    CSTR:32186.14.LOP242398
