Photonics Research, Volume 13, Issue 2, 550 (2025)

Monocular depth estimation based on deep learning for intraoperative guidance using surface-enhanced Raman scattering imaging

Aniwat Juhong1,2, Bo Li1,2, Yifan Liu1,2, Cheng-You Yao2,3, Chia-Wei Yang2,4, A. K. M. Atique Ullah2,4, Kunli Liu2,4, Ryan P. Lewandowski5, Jack R. Harkema5, Dalen W. Agnew5, Yu Leo Lei6, Gary D. Luker7, Xuefei Huang2,3,4, Wibool Piyawattanametha2,8, and Zhen Qiu1,2,3,*
Author Affiliations
  • 1Department of Electrical and Computer Engineering, Michigan State University, East Lansing, Michigan 48824, USA
  • 2Institute for Quantitative Health Science and Engineering, Michigan State University, East Lansing, Michigan 48824, USA
  • 3Department of Biomedical Engineering, Michigan State University, East Lansing, Michigan 48824, USA
  • 4Department of Chemistry, Michigan State University, East Lansing, Michigan 48824, USA
  • 5Department of Pathobiology and Diagnostic Investigation, College of Veterinary Medicine, Michigan State University, East Lansing, Michigan 48824, USA
  • 6Department of Periodontics and Oral Medicine, University of Michigan, Ann Arbor, Michigan 48104, USA
  • 7Departments of Radiology and Biomedical Engineering, University of Michigan, Ann Arbor, Michigan 48109, USA
  • 8Department of Biomedical Engineering, School of Engineering, King Mongkut’s Institute of Technology Ladkrabang (KMITL), Bangkok 10520, Thailand
    Figures & Tables (9)
    Schematic of the custom-made Raman imaging system, together with the visualization system. (a) Optical diagram of the Raman spectroscopy system. A 785 nm laser beam is delivered through a single-mode fiber, collimated by a plano-convex lens (L1), and used to illuminate the sample. The scattered light is collected by the Raman probe and coupled into the spectrometer by relay optics (lenses L2 and L3), with an interchangeable mirror (IM) and a long-pass filter (LPF) in between. The spectrometer consists of a rotatable grating, three mirrors (M1, reflection mirror; M2, collimating mirror; M3, focusing mirror), and a back-illuminated deep-depletion CCD. To perform 2D Raman imaging, the Raman probe is translated by a two-axis motorized stage. (b) Photographs of the distal and proximal ends of the custom-made fiber bundle. (c) Schematic of the visualization system for generating the 2D and 3D co-registered SERS images.
    Synthesis of the SERS NPs. (a) SERS NP synthesis and HA/PEG conjugation procedure. First, 17 nm gold seeds (Au NPs) are formed. Second, the NPs are grown to 50 nm while different Raman reporters (S420 and S481) are attached to the gold surface. Lastly, the SERS NPs are functionalized with HA or PEG. (b) TEM image of the SERS NPs, which have a diameter of approximately 50 nm. (c) DLS result for the corresponding SERS NPs; the measured diameter is 56.16 nm. (d) Normalized Raman spectra of the stock SERS NP solutions of both flavors (S420 and S481).
    (a) Overview of the MiDaS v3.1 architecture. The input image is split into patches, which are embedded together with a positional embedding; a patch-independent readout token (orange) is also included. These tokens are fed through four BEiT stages. At each stage, the output tensor is passed through the Reassemble and Fusion blocks to reconstruct image-like feature maps. (b) BEiT transformer architecture used in the encoder part of (a). (c) Reassemble block, which assembles the tokens into feature maps with 1/s the spatial resolution of the input image. (d) Fusion block, which combines the features and upsamples the feature maps by a factor of two.
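The Reassemble step in (c) can be sketched in a few lines. The NumPy-only version below is an illustrative assumption, not the MiDaS implementation: the channel projection is a fixed random matrix standing in for learned convolutions, and the resampling is a plain nearest-neighbor repeat; only the tensor bookkeeping (drop readout token, fold tokens into a 2D grid, resample to 1/s resolution) mirrors the published design.

```python
import numpy as np

def reassemble(tokens, img_hw, patch=16, s=8, d_out=256, rng=None):
    """Sketch of the Reassemble block: tokens -> feature map at 1/s
    of the input resolution (handles s <= patch only)."""
    rng = rng or np.random.default_rng(0)
    h, w = img_hw[0] // patch, img_hw[1] // patch      # patch-grid size
    feat = tokens[1:].reshape(h, w, -1)                # drop readout token
    proj = rng.standard_normal((feat.shape[-1], d_out))
    feat = feat @ proj                                 # 1x1 "projection"
    scale = patch // s                                 # e.g. 16/8 -> 2x upsample
    feat = feat.repeat(scale, axis=0).repeat(scale, axis=1)
    return feat                                        # (H/s, W/s, d_out)

# Example: 384x384 input, 16x16 patches -> 24*24 = 576 patch tokens (+1 readout)
tokens = np.zeros((577, 768))
fmap = reassemble(tokens, (384, 384), patch=16, s=8)
print(fmap.shape)   # (48, 48, 256)
```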
    Validation of depth map imaging and Raman spectra at different distances from the camera and the Raman catheter, respectively. (a) Depth maps of a step-wedge phantom generated by MiDaS models with three different backbones (CNN, ViT, and BEiT), and a comparison of the depth map intensity profiles of each model. (b) Depth maps of a tumor phantom at different distances from the camera. (c) Raman spectra of the S420 SERS NPs acquired at different distances from the Raman catheter using the step-wedge phantom. (d) Linearity plot of the peak intensity of S420 (1614 cm−1) versus distance from the Raman catheter.
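The linearity check in (d) amounts to a least-squares line fit of peak intensity against catheter distance. A minimal sketch, using hypothetical intensity values (not the paper's data) and NumPy's `polyfit`:

```python
import numpy as np

# Hypothetical S420 peak intensities (1614 cm^-1) at increasing
# catheter-to-sample distances; values are illustrative only.
distance_mm = np.array([2.0, 4.0, 6.0, 8.0, 10.0])
peak_counts = np.array([980.0, 855.0, 742.0, 610.0, 495.0])

slope, intercept = np.polyfit(distance_mm, peak_counts, 1)  # linear fit
pred = slope * distance_mm + intercept
ss_res = np.sum((peak_counts - pred) ** 2)
ss_tot = np.sum((peak_counts - peak_counts.mean()) ** 2)
r_squared = 1.0 - ss_res / ss_tot                            # goodness of fit
print(f"slope = {slope:.1f} counts/mm, R^2 = {r_squared:.3f}")
```

An R² close to 1 over the working distance range indicates the intensity falls off linearly, which is what the linearity plot is meant to verify.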
    (a) Multiplexed Raman images of tissues topically stained with a mixture of SERS-HA (CD44-targeting) and SERS-PEG (control) solutions. (a1) Photographs of the mouse tumor tissue and spleen connective tissue (control); (a2)–(a4) Raman images of the individual channels and the ratiometric result. (b) H&E and IHC-CD44 images of the corresponding tissues. (c) Representative enlarged IHC images from (b) of the breast tumor and normal tissues. Scale bars in (a) and (b) are 5 mm; scale bars in (c) are 50 µm.
    SERS-image-guided surgery for resection of a breast tumor in a mouse. (a) Photographs of the tumor during intraoperative SERS-image-guided surgery, from the first resection to complete removal. (b) Corresponding SERS images (weight of S420-HA) reconstructed by the demultiplexing algorithm. The scale bar is 5 mm, and the white boundaries depict the resection regions.
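The per-pixel weights behind these SERS images come from spectral demultiplexing against the two reference flavors. The exact algorithm is not reproduced here; a common approach, sketched below under that assumption, is ordinary least squares of each measured spectrum against the S420 and S481 reference spectra (toy Gaussian references and synthetic noise stand in for real measurements):

```python
import numpy as np

# Toy reference spectra for the two SERS flavors (illustrative Gaussians,
# not measured data), sampled on a 200-point wavenumber grid.
wavenumbers = np.linspace(800, 1800, 200)
ref_s420 = np.exp(-((wavenumbers - 1614) / 15) ** 2)
ref_s481 = np.exp(-((wavenumbers - 1200) / 15) ** 2)

# One "measured" pixel spectrum: 0.7 parts S420-HA + 0.3 parts S481-PEG + noise
rng = np.random.default_rng(1)
measured = 0.7 * ref_s420 + 0.3 * ref_s481 + 0.01 * rng.standard_normal(200)

A = np.column_stack([ref_s420, ref_s481])          # design matrix of references
weights, *_ = np.linalg.lstsq(A, measured, rcond=None)
w_s420, w_s481 = np.clip(weights, 0, None)         # enforce nonnegative weights
ratio = w_s420 / (w_s481 + 1e-12)                  # HA/PEG ratiometric value
print(f"w_S420 = {w_s420:.2f}, w_S481 = {w_s481:.2f}, ratio = {ratio:.2f}")
```

Repeating the fit at every scan position yields the weight map of S420-HA shown in (b); dividing by the control weight gives the ratiometric image that cancels nonspecific accumulation.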
    (a) 2D SERS image during Raman spectra acquisition (see Visualization 1): (a1) before scanning, (a2) during scanning, and (a3) completed scan. (b) 3D images of the sample, the SERS map, and the co-registered SERS result, reconstructed using an affine transformation and the MiDaS v3.1 DL model with the BEiT backbone architecture (see Visualization 2). The scale bars of (a1) and (b1) are 10 mm and 8 mm, respectively.
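Co-registration by affine transformation can be sketched as estimating a 2×3 matrix from control-point pairs and then warping SERS-map coordinates into the camera/depth frame. The point coordinates below are hypothetical, and this conceptual sketch is not the authors' calibration procedure:

```python
import numpy as np

src = np.array([[0, 0], [100, 0], [0, 100]], dtype=float)      # SERS-map coords
dst = np.array([[20, 30], [220, 35], [25, 230]], dtype=float)  # camera pixels

# Solve dst = A @ [x, y, 1]^T for the 2x3 affine matrix A (least squares).
ones = np.ones((3, 1))
M = np.hstack([src, ones])                       # 3x3 homogeneous coordinates
A = np.linalg.lstsq(M, dst, rcond=None)[0].T     # 2x3 affine matrix

def warp(points):
    """Map SERS-map coordinates into camera-image coordinates."""
    pts = np.hstack([points, np.ones((len(points), 1))])
    return pts @ A.T

print(warp(np.array([[50.0, 50.0]])))            # -> [[122.5 132.5]]
```

With three or more non-collinear point pairs the affine matrix captures translation, rotation, scale, and shear, which is enough to drape the 2D SERS map onto the monocular depth surface.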
    • Table 1. Depth Map Intensity Characterization Result (Average Absolute Error ± Standard Deviation) of MiDaS Models with Three Different Architectures: CNN, ViT, and BEiT

      Step Number | CNN         | ViT          | BEiT
      Step 1      | 0.074±0.560 | 0.318±0.140  | 0.051±0.560
      Step 2      | 0.070±0.046 | 0.252±0.010  | 0.032±0.040
      Step 3      | 0.135±0.088 | 0.018±0.016  | 0.024±0.012
      Step 4      | 0.092±0.077 | 0.161±0.100  | 0.087±0.083
      Average     | 0.0927±0.070 | 0.1872±0.0665 | 0.0485±0.1737
    • Table 2. Tumor Phantom Characterization Result of the Three Different MiDaS Models

      Model                | CNN        | ViT        | BEiT
      IoU                  | 0.139±0.026 | 0.241±0.018 | 0.272±0.033
      F1-score             | 0.244±0.041 | 0.389±0.024 | 0.426±0.042
      Recall               | 0.262±0.024 | 0.370±0.027 | 0.402±0.029
      Precision            | 0.234±0.058 | 0.421±0.074 | 0.466±0.088
      Execution time (s)   | 0.861      | 0.998      | 1.175
      Number of parameters | 1.05 × 10^8 | 3.34 × 10^8 | 3.45 × 10^8
    Aniwat Juhong, Bo Li, Yifan Liu, Cheng-You Yao, Chia-Wei Yang, A. K. M. Atique Ullah, Kunli Liu, Ryan P. Lewandowski, Jack R. Harkema, Dalen W. Agnew, Yu Leo Lei, Gary D. Luker, Xuefei Huang, Wibool Piyawattanametha, Zhen Qiu, "Monocular depth estimation based on deep learning for intraoperative guidance using surface-enhanced Raman scattering imaging," Photonics Res. 13, 550 (2025)

    Paper Information

    Category: Medical Optics and Biotechnology

    Received: Jul. 29, 2024

    Accepted: Dec. 8, 2024

    Published Online: Feb. 10, 2025

    The Author Email: Zhen Qiu (qiuzhen@msu.edu)

    DOI: 10.1364/PRJ.536871
