Laser & Optoelectronics Progress, Volume 61, Issue 16, 1611013 (2024)

Single-View Endoscopic Surgical Light Field Reconstruction Combining Vision Transformer and Diffusion Model (Invited)

Chenming Han and Gaochang Wu*
Author Affiliations
  • State Key Laboratory of Synthetical Automation for Process Industries, Northeastern University, Shenyang 110819, Liaoning, China

    To address challenges in 3D perception for endoscopic surgery, such as depth-estimation uncertainty and occlusion arising from single-view input, this paper proposes a single-view light field reconstruction method based on the multi-plane image (MPI) representation, which combines a vision transformer with a conditional diffusion model. First, a vision transformer tokenizes the single-view input image, decomposing it into image patches and extracting locally and globally correlated features through a multi-head attention mechanism. Next, a multi-scale convolutional decoder reassembles the patch features and fuses them from coarse to fine to generate an initial MPI. Finally, to handle occlusion between tissues in single-view endoscopic surgery, a background prediction module based on a conditional diffusion model is introduced. This module derives an occlusion mask from the initial MPI and, conditioned on this mask and the input view, predicts the distribution of the occluded regions. This addresses the inconsistency between views of the light field caused by single-view input. The initial MPI produced by the vision transformer is then combined with the background predicted by the diffusion model to form an optimized MPI, from which the sub-view images of the endoscopic surgical light field are rendered. Experimental results on a real endoscopic surgical dataset from the da Vinci surgical robot show that the proposed method outperforms existing single-view light field reconstruction methods in both visual quality and objective evaluation metrics.
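    To make the MPI representation concrete, the following is a minimal sketch of how a single pixel can be rendered from a stack of fronto-parallel RGBA planes using standard back-to-front "over" compositing. It is illustrative only and is not taken from the paper: the function names, the far-to-near layer ordering, and the transmittance threshold used to flag occluded content are all assumptions, and the abstract does not state how the occlusion mask is actually derived.

```python
from typing import List, Tuple

Color = Tuple[float, float, float]
Layer = Tuple[Color, float]  # (rgb, alpha), alpha in [0, 1]


def composite_pixel(layers: List[Layer]) -> Color:
    """Back-to-front 'over' compositing of one pixel.

    `layers` is ordered from the farthest plane to the nearest plane
    (an assumed convention for this sketch).
    """
    out: Color = (0.0, 0.0, 0.0)
    for rgb, alpha in layers:
        # Each nearer plane covers what lies behind it in proportion to alpha.
        out = tuple(alpha * c + (1.0 - alpha) * o for c, o in zip(rgb, out))
    return out


def transmittance(layers: List[Layer]) -> List[float]:
    """Fraction of each plane's light that passes through all nearer planes."""
    result = [0.0] * len(layers)
    t = 1.0
    # Walk from the nearest plane to the farthest, accumulating (1 - alpha).
    for i in range(len(layers) - 1, -1, -1):
        result[i] = t
        t *= 1.0 - layers[i][1]
    return result


def occluded(layers: List[Layer], threshold: float = 0.5) -> List[bool]:
    """Hypothetical occlusion flag: a plane counts as occluded at this pixel
    when less than `threshold` of its light reaches the camera."""
    return [t < threshold for t in transmittance(layers)]


if __name__ == "__main__":
    # Far plane: opaque red background. Near plane: mostly opaque blue tissue.
    pixel = [((1.0, 0.0, 0.0), 1.0), ((0.0, 0.0, 1.0), 0.9)]
    print(composite_pixel(pixel))
    print(occluded(pixel))
```

    In this sketch, content on a far plane that is flagged as occluded is what a background prediction module would need to fill in so that it is available when rendering shifted sub-views.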

    Chenming Han, Gaochang Wu. Single-View Endoscopic Surgical Light Field Reconstruction Combining Vision Transformer and Diffusion Model(Invited)[J]. Laser & Optoelectronics Progress, 2024, 61(16): 1611013
    Paper Information

    Category: Imaging Systems

    Received: May 13, 2024

    Accepted: Jul. 18, 2024

    Published Online: Aug. 12, 2024

    Author Email: Wu Gaochang (wugc@mail.neu.edu.cn)

    DOI:10.3788/LOP241272

    CSTR:32186.14.LOP241272