Laser & Optoelectronics Progress, Volume 61, Issue 16, 1611013 (2024)
Single-View Endoscopic Surgical Light Field Reconstruction Combining Vision Transformer and Diffusion Model (Invited)
To address the challenges of 3D perception in endoscopic surgery, such as uncertain depth estimation and inter-tissue occlusion when only a single-view image is available, this paper proposes a light field reconstruction method for endoscopic surgery based on a single-view multi-plane image (MPI) representation that fuses a vision transformer with a conditional diffusion model. First, a vision transformer tokenizes the single-view input image by decomposing it into image patches, and a multi-head attention mechanism extracts features that capture both local and global associations between patches. Next, a multi-scale convolutional decoder reassembles the patch features and fuses them from coarse to fine to generate an initial MPI. Finally, to handle occlusions between tissues in single-view endoscopic surgery, a background prediction module based on a conditional diffusion model is introduced. This module derives an occlusion mask from the initial MPI and, conditioned on this mask and the input view, predicts the content of the occluded regions, which addresses the inconsistency between light field views caused by the single-view input. The initial MPI decomposed by the vision transformer is combined with the background regions predicted by the diffusion model to produce an optimized MPI, from which the sub-view images of the endoscopic surgical light field are rendered. Experimental results on a real endoscopic surgical dataset acquired with the Da Vinci surgical robot show that the proposed method outperforms existing single-view light field reconstruction methods in both visual quality and objective evaluation metrics.
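The first stage described above, splitting the image into patches and relating them through multi-head attention, is the standard vision transformer front end. The following is a minimal NumPy sketch of that generic technique, not the authors' network: the patch size (8), embedding width (64), head count (4), and random weights are arbitrary assumptions chosen only to make the shapes concrete.

```python
import numpy as np


def patchify(img: np.ndarray, p: int) -> np.ndarray:
    """Split an (H, W, C) image into non-overlapping p x p patches, flattened to (N, p*p*C)."""
    H, W, C = img.shape
    assert H % p == 0 and W % p == 0, "image size must be divisible by the patch size"
    x = img.reshape(H // p, p, W // p, p, C).transpose(0, 2, 1, 3, 4)
    return x.reshape(-1, p * p * C)  # patches in row-major order


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def multi_head_self_attention(tokens, wq, wk, wv, wo, heads):
    """Scaled dot-product self-attention over all tokens, split into `heads` heads.

    tokens: (N, d); wq, wk, wv, wo: (d, d).
    Returns (N, d) output features and (heads, N, N) attention weights.
    """
    n, d = tokens.shape
    assert d % heads == 0, "embedding width must be divisible by the head count"
    dh = d // heads

    def split(x):  # (N, d) -> (heads, N, dh)
        return x.reshape(n, heads, dh).transpose(1, 0, 2)

    q, k, v = split(tokens @ wq), split(tokens @ wk), split(tokens @ wv)
    # every patch attends to every other patch, giving global context
    weights = softmax(q @ k.transpose(0, 2, 1) / np.sqrt(dh))
    out = (weights @ v).transpose(1, 0, 2).reshape(n, d)  # concatenate the heads
    return out @ wo, weights


rng = np.random.default_rng(0)
img = rng.random((32, 32, 3))  # hypothetical stand-in for an endoscopic frame
patches = patchify(img, 8)     # (16, 192)
d = 64
w_embed = rng.normal(scale=0.02, size=(patches.shape[1], d))  # linear patch embedding
pos = rng.normal(scale=0.02, size=(patches.shape[0], d))      # position embedding (learned in practice)
tokens = patches @ w_embed + pos
wq, wk, wv, wo = (rng.normal(scale=d ** -0.5, size=(d, d)) for _ in range(4))
features, attn = multi_head_self_attention(tokens, wq, wk, wv, wo, heads=4)
```

In a trained model the weights are learned, and the resulting patch features would then be reassembled into a spatial grid for the convolutional decoder.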
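The MPI stages, deriving an occlusion mask, merging in a predicted background, and rendering a light field sub-view, can be illustrated with a minimal NumPy sketch. Several details are assumptions rather than the paper's method: planes are ordered far to near, each sub-view is rendered by shifting every plane by an integer disparity and compositing back to front with the "over" operator, the occlusion mask marks pixels of a plane whose transmittance through nearer planes in the input view falls below a threshold, and a fixed array (`pred_rgb`, `pred_alpha`) stands in for the diffusion model's output. All function names are hypothetical.

```python
import numpy as np


def shift_plane(plane: np.ndarray, dx: int, dy: int) -> np.ndarray:
    """Translate an (H, W, C) array by (dx, dy) pixels; uncovered pixels become 0."""
    h, w = plane.shape[:2]
    out = np.zeros_like(plane)
    if abs(dx) >= w or abs(dy) >= h:
        return out  # shifted completely out of frame
    ys, yd = (slice(0, h - dy), slice(dy, h)) if dy >= 0 else (slice(-dy, h), slice(0, h + dy))
    xs, xd = (slice(0, w - dx), slice(dx, w)) if dx >= 0 else (slice(-dx, w), slice(0, w + dx))
    out[yd, xd] = plane[ys, xs]
    return out


def occlusion_mask(alpha: np.ndarray, thresh: float = 0.5) -> np.ndarray:
    """(D, H, W) mask of pixels on each plane hidden by nearer planes in the input view.

    Assumed definition: the transmittance in front of plane i is the product of
    (1 - alpha_j) over all nearer planes j > i (index 0 = farthest plane).
    """
    D = alpha.shape[0]
    trans = np.ones_like(alpha)
    for i in range(D - 2, -1, -1):
        trans[i] = trans[i + 1] * (1.0 - alpha[i + 1])
    return trans < thresh


def merge_background(rgb, alpha, mask, pred_rgb, pred_alpha):
    """Replace masked (occluded) MPI pixels with predicted background content."""
    return np.where(mask[..., None], pred_rgb, rgb), np.where(mask, pred_alpha, alpha)


def render_subview(rgb, alpha, disparities, u, v):
    """Render the sub-view at offset (u, v) from an MPI.

    rgb: (D, H, W, 3); alpha: (D, H, W); disparities: (D,) pixel shift per unit offset
    (larger for nearer planes). Returns (H, W, 3) color and (H, W) accumulated alpha.
    """
    D, H, W, _ = rgb.shape
    color = np.zeros((H, W, 3))
    acc = np.zeros((H, W))
    for i in range(D):  # back-to-front "over" compositing
        dx, dy = int(round(u * disparities[i])), int(round(v * disparities[i]))
        c = shift_plane(rgb[i], dx, dy)
        a = shift_plane(alpha[i][..., None], dx, dy)[..., 0]
        color = c * a[..., None] + color * (1.0 - a[..., None])
        acc = a + acc * (1.0 - a)
    return color, acc


# Toy two-plane MPI (1 x 4 pixels): opaque red far plane, one green pixel on the near plane.
rgb = np.zeros((2, 1, 4, 3)); rgb[0, ..., 0] = 1.0; rgb[1, 0, 1, 1] = 1.0
alpha = np.zeros((2, 1, 4)); alpha[0] = 1.0; alpha[1, 0, 1] = 1.0
mask = occlusion_mask(alpha)
pred_rgb = np.zeros_like(rgb); pred_rgb[..., 2] = 1.0  # stand-in "predicted" background (blue)
pred_alpha = np.ones_like(alpha)
rgb2, alpha2 = merge_background(rgb, alpha, mask, pred_rgb, pred_alpha)
color, acc = render_subview(rgb2, alpha2, np.array([0.0, 1.0]), u=1.0, v=0.0)
```

In this toy case the far-plane pixel behind the green one is flagged as occluded, filled with the stand-in prediction, and becomes visible once the near plane shifts in the side view; without the fill, that pixel would show whatever the initial MPI happened to contain there.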
Chenming Han, Gaochang Wu. Single-View Endoscopic Surgical Light Field Reconstruction Combining Vision Transformer and Diffusion Model (Invited)[J]. Laser & Optoelectronics Progress, 2024, 61(16): 1611013
Category: Imaging Systems
Received: May. 13, 2024
Accepted: Jul. 18, 2024
Published Online: Aug. 12, 2024
Author Email: Gaochang Wu (wugc@mail.neu.edu.cn)
CSTR:32186.14.LOP241272