Optics and Precision Engineering, Vol. 31, Issue 20, 2993 (2023)

Indoor self-supervised monocular depth estimation based on level feature fusion

Deqiang CHENG1, Huaqiang ZHANG1, Qiqi KOU2, Chen LÜ1, and Jiansheng QIAN1,*
Author Affiliations
  • 1School of Information and Control Engineering, China University of Mining and Technology, Xuzhou 221116, China
  • 2School of Computer Science and Technology, China University of Mining and Technology, Xuzhou 221116, China
    Figures & Tables (15)
    Depth estimation network model in this paper
    Main steps to compute the brightness mapping function ν
    Comparison of image enhancement effect of MCIE module
    Structure of feature adjustment in single level
    Structure of feature adjustment in three levels
    Cross-Level Feature Adjustment module
    Steps to calculate Gram matrix similarity
    Iterative optimization curve of loss function
    Comparison of predicted depth maps between proposed model and existing main methods on NYU Depth V2 dataset
    Comparison of predicted depth maps between proposed model and existing main methods on ScanNet dataset
    • Table 1. Relationship between CLFA module related parameters in Decoder network

      | Module | Input | Channels (Input) | Channels (Result) | Output | Channels (Output) | Ti | Channels (Ti) |
      |--------|-------|------------------|-------------------|--------|-------------------|----|---------------|
      | CLFA3  | F1    | 64               | 32                | O5     | 256               | T5 | 256           |
      |        | F2    | 64               | 32                |        |                   |    |               |
      |        | F3    | 128              | 64                |        |                   |    |               |
      |        | F4    | 256              | 128               |        |                   |    |               |
      | CLFA4  | F1    | 64               | 32                | O4     | 128               | T4 | 128           |
      |        | F2    | 64               | 32                |        |                   |    |               |
      |        | F3    | 128              | 64                |        |                   |    |               |
      | CLFA5  | F1    | 64               | 32                | O3     | 64                | T3 | 64            |
      |        | F2    | 64               | 32                |        |                   |    |               |
      | CLFA6  | F1    | 64               | 64                | O2     | 64                | T2 | 64            |
    • Table 2. Comparison of experimental results between proposed model and existing main methods on NYU Depth V2 dataset
      (Error indicators RMSE, Abs Rel, Log10: lower is better; accuracy indicators δ: higher is better.)

      | Method                     | Supervision     | RMSE  | Abs Rel | Log10 | δ<1.25 | δ<1.25² | δ<1.25³ |
      |----------------------------|-----------------|-------|---------|-------|--------|---------|---------|
      | DORN [18]                  | Supervised      | 0.509 | 0.115   | 0.051 | 0.828  | 0.965   | 0.992   |
      | Hu et al. [46]             | Supervised      | 0.530 | 0.115   | 0.050 | 0.866  | 0.975   | 0.993   |
      | Yin et al. [47]            | Supervised      | 0.416 | 0.108   | 0.048 | 0.875  | 0.976   | 0.994   |
      | AdaBins [48]               | Supervised      | 0.364 | 0.103   | 0.044 | 0.903  | 0.984   | 0.997   |
      | Niklaus et al. [49]        | Supervised      | 0.300 | 0.080   | 0.030 | 0.940  | 0.990   | 1.000   |
      | MovingIndoor [22]          | Self-supervised | 0.712 | 0.208   | 0.086 | 0.674  | 0.900   | 0.968   |
      | TrainFlow [50]             | Self-supervised | 0.686 | 0.208   | 0.086 | 0.701  | 0.912   | 0.978   |
      | Monodepth2 [10]            | Self-supervised | 0.600 | 0.161   | 0.068 | 0.771  | 0.948   | 0.987   |
      | SC-Depth [51]              | Self-supervised | 0.608 | 0.159   | 0.068 | 0.772  | 0.939   | 0.982   |
      | P2Net [12]                 | Self-supervised | 0.561 | 0.150   | 0.064 | 0.796  | 0.948   | 0.986   |
      | P2Net (5 frames + PP) [12] | Self-supervised | 0.553 | 0.147   | 0.062 | 0.801  | 0.951   | 0.987   |
      | Bian et al. [52]           | Self-supervised | 0.536 | 0.147   | 0.062 | 0.804  | 0.950   | 0.986   |
      | PLNet (5 frames) [53]      | Self-supervised | 0.540 | 0.144   | 0.061 | 0.807  | 0.957   | 0.990   |
      | Zhan et al. [54]           | Self-supervised | 0.538 | 0.143   | 0.060 | 0.812  | 0.951   | 0.986   |
      | StructDepth [24]           | Self-supervised | 0.540 | 0.142   | 0.060 | 0.813  | 0.954   | 0.988   |
      | Ours                       | Self-supervised | 0.530 | 0.138   | 0.059 | 0.819  | 0.959   | 0.990   |
    • Table 3. Comparison of experimental results between the model in this paper and existing main methods on ScanNet dataset
      (Error indicators RMSE, Abs Rel, Log10: lower is better; accuracy indicators δ: higher is better.)

      | Method               | Supervision     | RMSE  | Abs Rel | Log10 | δ<1.25 | δ<1.25² | δ<1.25³ |
      |----------------------|-----------------|-------|---------|-------|--------|---------|---------|
      | MovingIndoor [22]    | Self-supervised | 0.483 | 0.212   | 0.088 | 0.650  | 0.905   | 0.976   |
      | Monodepth2 [10]      | Self-supervised | 0.451 | 0.191   | 0.080 | 0.693  | 0.926   | 0.983   |
      | P2Net [12]           | Self-supervised | 0.420 | 0.175   | 0.074 | 0.740  | 0.932   | 0.982   |
      | P2Net-finetune [24]  | Self-supervised | 0.412 | 0.172   | 0.073 | 0.743  | 0.935   | 0.984   |
      | StructDepth [24]     | Self-supervised | 0.400 | 0.165   | 0.070 | 0.754  | 0.939   | 0.985   |
      | Ours                 | Self-supervised | 0.391 | 0.162   | 0.069 | 0.760  | 0.946   | 0.987   |
    • Table 4. Ablation experiments with several cross-level feature fusion structures on NYU Depth V2 dataset
      (Error indicators RMSE, Abs Rel, Log10: lower is better; accuracy indicators δ: higher is better.)

      | Method        | Position | RMSE  | Abs Rel | Log10 | δ<1.25 | δ<1.25² | δ<1.25³ |
      |---------------|----------|-------|---------|-------|--------|---------|---------|
      | Baseline      | /        | 0.540 | 0.142   | 0.060 | 0.813  | 0.954   | 0.988   |
      | Single level  | F1       | 0.543 | 0.143   | 0.061 | 0.810  | 0.953   | 0.988   |
      |               | F2       | 0.543 | 0.142   | 0.061 | 0.811  | 0.954   | 0.988   |
      |               | F3       | 0.540 | 0.142   | 0.061 | 0.812  | 0.954   | 0.988   |
      |               | F4       | 0.540 | 0.141   | 0.060 | 0.814  | 0.955   | 0.989   |
      | Double levels | F2+F3    | 0.540 | 0.140   | 0.061 | 0.815  | 0.956   | 0.988   |
      |               | F2+F4    | 0.539 | 0.140   | 0.060 | 0.816  | 0.955   | 0.989   |
      |               | F3+F4    | 0.537 | 0.139   | 0.060 | 0.817  | 0.955   | 0.989   |
      | Three levels  | F2+F3+F4 | 0.540 | 0.140   | 0.060 | 0.815  | 0.955   | 0.988   |
    • Table 5. Ablation experiments with different proposed modules on NYU Depth V2 dataset
      (✓ = module enabled, × = module disabled. Error indicators: lower is better; accuracy indicators δ: higher is better.)

      | Method   | CLFA | GMSL | MCIE | RMSE  | Abs Rel | Log10 | δ<1.25 | δ<1.25² | δ<1.25³ |
      |----------|------|------|------|-------|---------|-------|--------|---------|---------|
      | Baseline | ×    | ×    | ×    | 0.540 | 0.142   | 0.060 | 0.813  | 0.954   | 0.988   |
      | 1        | ✓    | ×    | ×    | 0.537 | 0.139   | 0.060 | 0.817  | 0.955   | 0.989   |
      | 2        | ×    | ✓    | ×    | 0.536 | 0.140   | 0.060 | 0.816  | 0.956   | 0.989   |
      | 3        | ×    | ×    | ✓    | 0.539 | 0.141   | 0.061 | 0.815  | 0.955   | 0.988   |
      | 4        | ✓    | ✓    | ×    | 0.533 | 0.138   | 0.059 | 0.818  | 0.957   | 0.990   |
      | 5        | ✓    | ×    | ✓    | 0.536 | 0.139   | 0.059 | 0.817  | 0.956   | 0.989   |
      | 6        | ×    | ✓    | ✓    | 0.537 | 0.139   | 0.060 | 0.817  | 0.955   | 0.988   |
      | 7        | ✓    | ✓    | ✓    | 0.530 | 0.138   | 0.059 | 0.819  | 0.959   | 0.990   |
    Citation

    Deqiang CHENG, Huaqiang ZHANG, Qiqi KOU, Chen LÜ, Jiansheng QIAN. Indoor self-supervised monocular depth estimation based on level feature fusion[J]. Optics and Precision Engineering, 2023, 31(20): 2993

    Paper Information

    Category: Information Sciences

    Received: Mar. 1, 2023

    Accepted: --

    Published Online: Nov. 28, 2023

    The Author Email: Jiansheng QIAN (qianjsh@cumt.edu.cn)

    DOI: 10.37188/OPE.20233120.2993
