Single-View Endoscopic Surgical Light Field Reconstruction Combining Vision Transformer and Diffusion Model (Invited)

Chenming Han; Gaochang Wu

doi:10.3788/LOP241272

Laser & Optoelectronics Progress, Vol. 61, Issue 16, 1611013 (2024)

State Key Laboratory of Synthetical Automation for Process Industries, Northeastern University, Shenyang 110819, Liaoning, China

Abstract

Three-dimensional perception in endoscopic surgery is hampered by depth ambiguity and occlusion when only a single-view image is available. To address these problems, this paper proposes a single-view light field reconstruction method for endoscopic surgery that is built on the multi-plane image (MPI) representation and combines a vision transformer with a conditional diffusion model. First, a vision transformer tokenizes the single-view input image by splitting it into image patches, and a multi-head attention mechanism extracts features that capture both local and global associations among the patches. Next, a multi-scale convolutional decoder reassembles the patch features and fuses them from coarse to fine to generate an initial MPI. Finally, to handle occlusion between tissues, a background prediction module based on a conditional diffusion model is introduced. This module derives an occlusion mask from the initial MPI and, conditioned on the mask and the input view, predicts the distribution of the occluded regions, which mitigates the inconsistency across viewing angles that a single-view input causes in the reconstructed light field. The initial MPI produced by the vision transformer is then combined with the background predicted by the diffusion model to form an optimized MPI, from which the sub-view images of the endoscopic surgical light field are rendered. Experimental results on a real endoscopic surgery dataset captured with the da Vinci surgical robot show that the proposed method outperforms existing single-view light field reconstruction methods in both visual quality and objective evaluation metrics.
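The final rendering step, in which sub-view images are produced from an MPI, can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes the MPI is stored as per-plane color and alpha arrays ordered from far to near, and it approximates the per-plane warp by an integer horizontal shift proportional to a hypothetical plane disparity. The names `composite_mpi`, `render_subview`, `disparities`, and `du` are illustrative and do not come from the paper.

```python
import numpy as np

def composite_mpi(colors, alphas):
    """Back-to-front 'over' compositing of MPI planes.

    colors: (D, H, W, 3) array, alphas: (D, H, W, 1) array in [0, 1].
    Assumption: plane 0 is the farthest plane, plane D-1 the nearest.
    """
    colors = np.asarray(colors, dtype=float)
    alphas = np.asarray(alphas, dtype=float)
    out = np.zeros_like(colors[0])
    for c, a in zip(colors, alphas):
        # Each nearer plane covers what is behind it in proportion to its alpha.
        out = c * a + out * (1.0 - a)
    return out

def render_subview(colors, alphas, disparities, du):
    """Render a horizontally offset sub-view (simplified, assumed model).

    Each plane is shifted by round(disparity * du) pixels before compositing.
    np.roll wraps pixels around the border; a real renderer would pad
    or use a proper homography warp instead.
    """
    shifted_c, shifted_a = [], []
    for c, a, d in zip(colors, alphas, disparities):
        s = int(round(d * du))
        shifted_c.append(np.roll(c, s, axis=1))
        shifted_a.append(np.roll(a, s, axis=1))
    return composite_mpi(np.stack(shifted_c), np.stack(shifted_a))
```

Regions that become visible after the shift but had no content in the farther planes are exactly the occluded areas that a background prediction step, such as the diffusion-based module described above, would need to fill in.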

Keywords

conditional diffusion model; light field reconstruction; multi-plane image representation; vision transformer

Chenming Han, Gaochang Wu. Single-View Endoscopic Surgical Light Field Reconstruction Combining Vision Transformer and Diffusion Model (Invited)[J]. Laser & Optoelectronics Progress, 2024, 61(16): 1611013
