Laser & Optoelectronics Progress, Vol. 62, Issue 8, 0811005 (2025)

Lightweight Unsupervised Monocular Depth Estimation Framework Using Attention Mechanisms

Xiyu Li, Yilihamu Yaermaimaiti*, Lirong Xie, and Shuoqi Cheng
Author Affiliations
  • College of Electrical Engineering, Xinjiang University, Urumqi 830017, Xinjiang, China
    Figures & Tables (16)
    • Complete architecture of the proposed model
    • Model architectures of cross-covariance Transformer and coordinate attention
    • Qualitative results on the KITTI dataset
    • Detail presentation on the KITTI dataset
    • Qualitative results on the Make3D dataset
    • Qualitative results on the Cityscapes dataset
    • Qualitative results on the NYUDepth-v2 dataset
    • Predicted trajectories for sequences 01, 07, 09, 10
    • Table 1. Details of the composition of the depth estimation network

      Depth encoder:
      Stage | Network structure | Output size
      Stage 1 | 7×7, 64, stride 2 | 96×320
      Stage 2 | Max Pool (3×3 max pool, stride 2); CCT-Block (3×3, 64, stride 1; 3×3, 64, stride 1; CCT); CA-Block (3×3, 64, stride 1; 3×3, 64, stride 1; CA) | 48×160
      Stage 3 | CCT-Block (3×3, 128, stride 2; 3×3, 128, stride 1; 1×1, 128, stride 2; CCT); CA-Block (3×3, 128, stride 1; 3×3, 128, stride 1; CA) | 24×80
      Stage 4 | CCT-Block (3×3, 256, stride 2; 3×3, 256, stride 1; 1×1, 256, stride 2; CCT); CA-Block (3×3, 256, stride 1; 3×3, 256, stride 1; CA) | 12×40

      Depth decoder:
      Stage | Network structure | Output size
      Stage 1 | UP-Block (3×3, 32, stride 1; upsample; concatenate; 3×3, 32, stride 1); BR-Block (1×1, 1, stride 1; upsample; Sigmoid) | 192×640
      Stage 2 | UP-Block (3×3, 64, stride 1; upsample; concatenate; 3×3, 64, stride 1); BR-Block (1×1, 1, stride 1; upsample; Sigmoid) | 96×320
      Stage 3 | UP-Block (3×3, 128, stride 1; upsample; concatenate; 3×3, 128, stride 1); BR-Block (1×1, 1, stride 1; upsample; Sigmoid) | 48×160
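
      The CCT and CA blocks in Table 1 name two published attention designs: cross-covariance attention computes the attention map between channels rather than between spatial tokens, and coordinate attention pools along the height and width axes separately so the channel weights retain positional information in each direction. Below is a minimal PyTorch sketch of both operations; the head count, reduction ratio, and module names are illustrative assumptions, not the paper's exact configuration.

      ```python
      # Minimal sketches of the two attention operations named in Table 1.
      # Head count, reduction ratio, and names are illustrative assumptions.
      import torch
      import torch.nn as nn
      import torch.nn.functional as F

      class CrossCovarianceAttention(nn.Module):
          """Cross-covariance attention (as in XCiT): the attention map is
          computed between channels (C x C) instead of between tokens (N x N),
          so its cost grows linearly with the number of tokens."""
          def __init__(self, dim: int, num_heads: int = 4):
              super().__init__()
              self.num_heads = num_heads
              self.temperature = nn.Parameter(torch.ones(num_heads, 1, 1))
              self.qkv = nn.Linear(dim, dim * 3, bias=False)
              self.proj = nn.Linear(dim, dim)

          def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (B, N, C)
              B, N, C = x.shape
              qkv = self.qkv(x).reshape(B, N, 3, self.num_heads, C // self.num_heads)
              q, k, v = qkv.permute(2, 0, 3, 4, 1)  # each: (B, heads, C/h, N)
              q = F.normalize(q, dim=-1)            # L2-normalize along tokens
              k = F.normalize(k, dim=-1)
              attn = (q @ k.transpose(-2, -1)) * self.temperature  # (B, h, C/h, C/h)
              attn = attn.softmax(dim=-1)
              out = (attn @ v).permute(0, 3, 1, 2).reshape(B, N, C)
              return self.proj(out)

      class CoordinateAttention(nn.Module):
          """Coordinate attention: average-pool along H and W separately so
          the channel weights keep positional information per direction."""
          def __init__(self, channels: int, reduction: int = 8):
              super().__init__()
              mid = max(8, channels // reduction)
              self.conv1 = nn.Conv2d(channels, mid, kernel_size=1)
              self.bn = nn.BatchNorm2d(mid)
              self.conv_h = nn.Conv2d(mid, channels, kernel_size=1)
              self.conv_w = nn.Conv2d(mid, channels, kernel_size=1)

          def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (B, C, H, W)
              b, c, h, w = x.shape
              x_h = x.mean(dim=3, keepdim=True)                  # (B, C, H, 1)
              x_w = x.mean(dim=2, keepdim=True).transpose(2, 3)  # (B, C, W, 1)
              y = F.relu(self.bn(self.conv1(torch.cat([x_h, x_w], dim=2))))
              y_h, y_w = torch.split(y, [h, w], dim=2)
              a_h = torch.sigmoid(self.conv_h(y_h))                  # (B, C, H, 1)
              a_w = torch.sigmoid(self.conv_w(y_w.transpose(2, 3)))  # (B, C, 1, W)
              return x * a_h * a_w        # reweight features per position
      ```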
    • Table 2. Quantitative results on the KITTI dataset

      Method | Dataset | Params /10^6 | Abs_rel↓ | Sq_rel↓ | Rmse↓ | Rmselog↓ | a1↑ | a2↑ | a3↑
      SfMLearner[7] | M | 16.5 | 0.183 | 1.595 | 6.709 | 0.270 | 0.734 | 0.902 | 0.959
      Monodepth2-18[8] | M | 14.3 | 0.132 | 1.044 | 5.142 | 0.210 | 0.845 | 0.948 | 0.977
      Monodepth2-50[8] | M | 32.5 | 0.131 | 1.023 | 5.064 | 0.206 | 0.849 | 0.951 | 0.979
      DeepMatchVO[9] | M | 32.8 | 0.156 | 1.309 | 5.730 | 0.236 | 0.797 | 0.929 | 0.969
      SGDepth[25] | M+Se | 16.3 | 0.128 | 0.973 | 5.085 | 0.206 | 0.853 | 0.951 | 0.978
      R-MSFM3[26] | M | 3.5 | 0.128 | 0.965 | 5.019 | 0.207 | 0.853 | 0.951 | 0.977
      Lite-mono-tiny[18] | M | 2.2 | 0.125 | 0.935 | 4.986 | 0.204 | 0.853 | 0.950 | 0.978
      Proposed method | M | 4.9 | 0.122 | 0.937 | 4.942 | 0.199 | 0.858 | 0.953 | 0.981
      Proposed method (4-layers) | M | 15.8 | 0.121 | 0.935 | 4.945 | 0.197 | 0.858 | 0.954 | 0.981
      Monodepth2-18[8] | M† | 14.3 | 0.115 | 0.903 | 4.863 | 0.193 | 0.877 | 0.959 | 0.981
      R-MSFM3[26] | M† | 3.5 | 0.114 | 0.815 | 4.841 | 0.190 | 0.866 | 0.857 | 0.982
      Lite-mono-tiny[18] | M† | 2.2 | 0.110 | 0.837 | 4.710 | 0.187 | 0.880 | 0.960 | 0.982
      Proposed method | M† | 4.9 | 0.107 | 0.839 | 4.674 | 0.183 | 0.883 | 0.963 | 0.982
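
      For reference, the seven error and accuracy columns in Tables 2 and 4-7 are the standard monocular-depth evaluation metrics: mean absolute relative error, squared relative error, RMSE, RMSE of log depth, and the three threshold accuracies (fraction of pixels whose depth ratio is within 1.25^k). A self-contained sketch of their computation, assuming `pred` and `gt` are depth values at valid (masked) pixels and omitting median scaling, cropping, and depth capping:

      ```python
      # Standard depth-estimation metrics reported in Tables 2 and 4-7.
      # `pred` and `gt` are 1-D arrays of depths at valid (masked) pixels.
      import numpy as np

      def depth_metrics(pred: np.ndarray, gt: np.ndarray) -> dict:
          thresh = np.maximum(gt / pred, pred / gt)
          a1 = (thresh < 1.25).mean()        # a1/a2/a3: fraction of pixels
          a2 = (thresh < 1.25 ** 2).mean()   # whose ratio to ground truth is
          a3 = (thresh < 1.25 ** 3).mean()   # within 1.25^k (higher is better)
          abs_rel = np.mean(np.abs(gt - pred) / gt)             # Abs_rel
          sq_rel = np.mean((gt - pred) ** 2 / gt)               # Sq_rel
          rmse = np.sqrt(np.mean((gt - pred) ** 2))             # Rmse
          rmse_log = np.sqrt(np.mean((np.log(gt) - np.log(pred)) ** 2))  # Rmselog
          return {"abs_rel": abs_rel, "sq_rel": sq_rel, "rmse": rmse,
                  "rmse_log": rmse_log, "a1": a1, "a2": a2, "a3": a3}
      ```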
    • Table 3. Model parametric quantities and computational analysis

      Method | Encoder Params /10^6 | Encoder FLOPs /10^9 | Decoder Params /10^6 | Decoder FLOPs /10^9 | Full model Params /10^6 | Full model FLOPs /10^9
      Monodepth2-18[8] | 11.2 | 4.5 | 3.1 | 3.5 | 14.3 | 8.0
      DeepMatchVO[9] | 29.4 | 11.7 | 3.4 | 3.7 | 32.8 | 15.4
      Lite-mono-tiny[18] | 2.0 | 2.4 | 0.2 | 0.44 | 2.2 | 2.84
      R-MSFM3[26] | 1.7 | 2.4 | 3.8 | 14.1 | 5.5 | 16.5
      Proposed method | 4.1 | 5.0 | 0.8 | 2.8 | 4.9 | 7.8
      Proposed method (4-layers) | 12.6 | 6.0 | 3.2 | 3.6 | 15.8 | 9.6
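
      Table 3's parameter counts (units of 10^6) follow directly from summing tensor sizes, while FLOPs (units of 10^9) are normally measured with a profiler at the training resolution, which Table 1 suggests is 192×640. A short sketch; `model` and the thop profiler are assumptions for illustration, not the authors' tooling:

      ```python
      # Reproducing Table 3's Params column for any PyTorch module.
      import torch
      import torch.nn as nn

      def params_m(module: nn.Module) -> float:
          """Parameter count in units of 10^6."""
          return sum(p.numel() for p in module.parameters()) / 1e6

      # FLOPs in units of 10^9 can be profiled at the 192x640 training
      # resolution, e.g. with thop (`model` is a placeholder; counting
      # 2 FLOPs per multiply-accumulate is one common convention):
      #   from thop import profile
      #   macs, _ = profile(model, inputs=(torch.randn(1, 3, 192, 640),))
      #   flops_g = 2 * macs / 1e9
      ```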
    • Table 4. Quantitative results on the Make3D dataset

      Method | Abs_rel↓ | Sq_rel↓ | Rmse↓ | Rmselog↓ | a1↑ | a2↑ | a3↑
      Monodepth2-18[8] | 0.322 | 3.589 | 7.417 | 0.178 | 0.483 | 0.757 | 0.855
      DeepMatchVO[9] | 0.436 | 5.101 | 8.748 | 0.197 | 0.443 | 0.684 | 0.817
      Lite-mono-tiny[18] | 0.318 | 3.214 | 7.232 | 0.164 | 0.543 | 0.790 | 0.894
      Proposed method | 0.307 | 3.195 | 7.158 | 0.160 | 0.559 | 0.801 | 0.913
    • Table 5. Quantitative results on the Cityscapes dataset

      Method | Abs_rel↓ | Sq_rel↓ | Rmse↓ | Rmselog↓ | a1↑ | a2↑ | a3↑
      Monodepth2-18[8] | 0.386 | 3.752 | 8.556 | 0.278 | 0.451 | 0.717 | 0.843
      DeepMatchVO[9] | 0.617 | 5.235 | 9.735 | 0.362 | 0.324 | 0.569 | 0.660
      Lite-mono-tiny[18] | 0.406 | 3.992 | 8.752 | 0.292 | 0.377 | 0.654 | 0.735
      Proposed method | 0.360 | 3.711 | 7.873 | 0.274 | 0.482 | 0.734 | 0.868
    • Table 6. Quantitative results on the NYUDepth-v2 dataset

      Method | Abs_rel↓ | Sq_rel↓ | Rmse↓ | Rmselog↓ | a1↑ | a2↑ | a3↑
      Monodepth2-18[8] | 0.517 | 1.971 | 2.228 | 0.530 | 0.349 | 0.610 | 0.788
      DeepMatchVO[9] | 0.682 | 4.676 | 4.317 | 0.824 | 0.186 | 0.355 | 0.537
      Lite-mono-tiny[18] | 0.745 | 3.383 | 2.547 | 0.754 | 0.214 | 0.418 | 0.603
      Proposed method | 0.419 | 0.984 | 1.617 | 0.468 | 0.381 | 0.665 | 0.840
    • Table 7. Results of ablation experiments

      Method | Abs_rel↓ | Sq_rel↓ | Rmse↓ | Rmselog↓ | a1↑ | a2↑ | a3↑
      Base/Monodepth2 | 0.132 | 1.044 | 5.142 | 0.210 | 0.845 | 0.948 | 0.977
      Base + CCT-Block | 0.128 | 0.978 | 5.013 | 0.205 | 0.856 | 0.952 | 0.979
      Base + CA-Block | 0.128 | 1.102 | 5.149 | 0.207 | 0.855 | 0.847 | 0.977
      Base + CCT-Block + CA-Block | 0.126 | 0.938 | 5.051 | 0.203 | 0.855 | 0.952 | 0.980
      Base + CCT-Block + CA-Block + SURF | 0.122 | 0.937 | 4.942 | 0.199 | 0.858 | 0.953 | 0.981
      Method-1 | 0.136 | 1.115 | 5.233 | 0.213 | 0.839 | 0.947 | 0.976
      Method-2 | 0.136 | 1.071 | 5.239 | 0.211 | 0.838 | 0.945 | 0.978
    • Table 8. Odometry trajectory errors and standard deviations on the KITTI-odometry dataset

      Method | Sequence 01 | Sequence 07 | Sequence 09 | Sequence 10
      Base/Monodepth2 | 0.046±0.020 | 0.024±0.014 | 0.061±0.032 | 0.039±0.025
      Base + CCT-Block | 0.030±0.016 | 0.014±0.009 | 0.016±0.009 | 0.016±0.011
      Base + CA-Block | 0.031±0.017 | 0.014±0.010 | 0.018±0.011 | 0.017±0.012
      Base + CCT-Block + CA-Block | 0.025±0.017 | 0.013±0.008 | 0.016±0.008 | 0.014±0.010
      Base + CCT-Block + CA-Block + SURF | 0.017±0.010 | 0.010±0.007 | 0.011±0.004 | 0.012±0.008
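
      Table 8 reports trajectory error as mean ± standard deviation per sequence. In the common Monodepth2-style odometry evaluation, this is the absolute trajectory error (ATE) of short fixed-length snippets after a least-squares scale alignment; the sketch below follows that protocol, and the 5-frame snippet convention is an assumption from it rather than something shown on this page.

      ```python
      # Sketch of snippet-level absolute trajectory error (ATE) behind
      # Table 8's mean+/-std numbers, Monodepth2-style (assumption).
      import numpy as np

      def snippet_ate(gt_xyz: np.ndarray, pred_xyz: np.ndarray) -> float:
          """gt_xyz, pred_xyz: (N, 3) camera positions of one short snippet,
          both expressed relative to the snippet's first frame."""
          # Least-squares scale aligning the scale-ambiguous monocular
          # prediction to the ground-truth trajectory
          scale = np.sum(gt_xyz * pred_xyz) / np.sum(pred_xyz ** 2)
          return np.sqrt(np.mean(np.sum((gt_xyz - scale * pred_xyz) ** 2, axis=1)))

      # Each cell of Table 8 would then be the mean and standard deviation
      # of snippet_ate over all overlapping snippets of a sequence.
      ```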
    Citation

    Xiyu Li, Yilihamu Yaermaimaiti, Lirong Xie, Shuoqi Cheng. Lightweight Unsupervised Monocular Depth Estimation Framework Using Attention Mechanisms[J]. Laser & Optoelectronics Progress, 2025, 62(8): 0811005

    Paper Information

    Category: Imaging Systems

    Received: Jul. 15, 2024

    Accepted: Oct. 12, 2024

    Published Online: Apr. 2, 2025

    The Author Email: Yilihamu Yaermaimaiti (65891080@qq.com)

    DOI: 10.3788/LOP241688

    CSTR: 32186.14.LOP241688
