Laser & Optoelectronics Progress, Vol. 62, Issue 8, 0811005 (2025)

Lightweight Unsupervised Monocular Depth Estimation Framework Using Attention Mechanisms

Xiyu Li, Yilihamu Yaermaimaiti*, Lirong Xie, and Shuoqi Cheng
Author Affiliations
  • College of Electrical Engineering, Xinjiang University, Urumqi 830017, Xinjiang, China
    Figures & Tables (16)
    • Complete architecture of the proposed model
    • Model architectures of cross-covariance Transformer and coordinate attention
    • Qualitative results on the KITTI dataset
    • Detail presentation on the KITTI dataset
    • Qualitative results on the Make3D dataset
    • Qualitative results on the Cityscapes dataset
    • Qualitative results on the NYUDepth-v2 dataset
    • Predicted trajectories for sequences 01, 07, 09, 10
    • Table 1. Details of the composition of the depth estimation network

      Depth encoder:
      Stage | Network structure | Output size
      Stage 1 | 7×7, 64, stride 2 | 96×320
      Stage 2 | Max Pool (3×3 max pool, stride 2); CCT-Block (3×3, 64, stride 1; 3×3, 64, stride 1; CCT); CA-Block (3×3, 64, stride 1; 3×3, 64, stride 1; CA) | 48×160
      Stage 3 | CCT-Block (3×3, 128, stride 2; 3×3, 128, stride 1; 1×1, 128, stride 2; CCT); CA-Block (3×3, 128, stride 1; 3×3, 128, stride 1; CA) | 24×80
      Stage 4 | CCT-Block (3×3, 256, stride 2; 3×3, 256, stride 1; 1×1, 256, stride 2; CCT); CA-Block (3×3, 256, stride 1; 3×3, 256, stride 1; CA) | 12×40

      Depth decoder:
      Stage | Network structure | Output size
      Stage 1 | UP-Block (3×3, 32, stride 1; upsample; concatenate; 3×3, 32, stride 1); BR-Block (1×1, 1, stride 1; upsample; Sigmoid) | 192×640
      Stage 2 | UP-Block (3×3, 64, stride 1; upsample; concatenate; 3×3, 64, stride 1); BR-Block (1×1, 1, stride 1; upsample; Sigmoid) | 96×320
      Stage 3 | UP-Block (3×3, 128, stride 1; upsample; concatenate; 3×3, 128, stride 1); BR-Block (1×1, 1, stride 1; upsample; Sigmoid) | 48×160
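
      The CCT and CA blocks in Table 1 name two published attention designs: cross-covariance attention computes the attention map between channels rather than between spatial tokens, and coordinate attention pools along the height and width axes separately so the channel weights retain positional information in each direction. Below is a minimal PyTorch sketch of both operations; the head count, reduction ratio, and module names are illustrative assumptions, not the paper's exact configuration.

      ```python
      # Minimal sketches of the two attention operations named in Table 1.
      # Head count, reduction ratio, and names are illustrative assumptions.
      import torch
      import torch.nn as nn
      import torch.nn.functional as F

      class CrossCovarianceAttention(nn.Module):
          """Cross-covariance attention (as in XCiT): the attention map is
          computed between channels (C x C) instead of between tokens (N x N),
          so its cost grows linearly with the number of tokens."""
          def __init__(self, dim: int, num_heads: int = 4):
              super().__init__()
              self.num_heads = num_heads
              self.temperature = nn.Parameter(torch.ones(num_heads, 1, 1))
              self.qkv = nn.Linear(dim, dim * 3, bias=False)
              self.proj = nn.Linear(dim, dim)

          def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (B, N, C)
              B, N, C = x.shape
              qkv = self.qkv(x).reshape(B, N, 3, self.num_heads, C // self.num_heads)
              q, k, v = qkv.permute(2, 0, 3, 4, 1)  # each: (B, heads, C/h, N)
              q = F.normalize(q, dim=-1)            # L2-normalize along tokens
              k = F.normalize(k, dim=-1)
              attn = (q @ k.transpose(-2, -1)) * self.temperature  # (B, h, C/h, C/h)
              attn = attn.softmax(dim=-1)
              out = (attn @ v).permute(0, 3, 1, 2).reshape(B, N, C)
              return self.proj(out)

      class CoordinateAttention(nn.Module):
          """Coordinate attention: average-pool along H and W separately so
          the channel weights keep positional information per direction."""
          def __init__(self, channels: int, reduction: int = 8):
              super().__init__()
              mid = max(8, channels // reduction)
              self.conv1 = nn.Conv2d(channels, mid, kernel_size=1)
              self.bn = nn.BatchNorm2d(mid)
              self.conv_h = nn.Conv2d(mid, channels, kernel_size=1)
              self.conv_w = nn.Conv2d(mid, channels, kernel_size=1)

          def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (B, C, H, W)
              b, c, h, w = x.shape
              x_h = x.mean(dim=3, keepdim=True)                  # (B, C, H, 1)
              x_w = x.mean(dim=2, keepdim=True).transpose(2, 3)  # (B, C, W, 1)
              y = F.relu(self.bn(self.conv1(torch.cat([x_h, x_w], dim=2))))
              y_h, y_w = torch.split(y, [h, w], dim=2)
              a_h = torch.sigmoid(self.conv_h(y_h))                  # (B, C, H, 1)
              a_w = torch.sigmoid(self.conv_w(y_w.transpose(2, 3)))  # (B, C, 1, W)
              return x * a_h * a_w        # reweight features per position
      ```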
    • Table 2. Quantitative results on the KITTI dataset

      Method | Dataset | Params /10^6 | Abs_rel↓ | Sq_rel↓ | Rmse↓ | Rmselog↓ | a1↑ | a2↑ | a3↑
      SfMLearner[7] | M | 16.5 | 0.183 | 1.595 | 6.709 | 0.270 | 0.734 | 0.902 | 0.959
      Monodepth2-18[8] | M | 14.3 | 0.132 | 1.044 | 5.142 | 0.210 | 0.845 | 0.948 | 0.977
      Monodepth2-50[8] | M | 32.5 | 0.131 | 1.023 | 5.064 | 0.206 | 0.849 | 0.951 | 0.979
      DeepMatchVO[9] | M | 32.8 | 0.156 | 1.309 | 5.730 | 0.236 | 0.797 | 0.929 | 0.969
      SGDepth[25] | M+Se | 16.3 | 0.128 | 0.973 | 5.085 | 0.206 | 0.853 | 0.951 | 0.978
      R-MSFM3[26] | M | 3.5 | 0.128 | 0.965 | 5.019 | 0.207 | 0.853 | 0.951 | 0.977
      Lite-mono-tiny[18] | M | 2.2 | 0.125 | 0.935 | 4.986 | 0.204 | 0.853 | 0.950 | 0.978
      Proposed method | M | 4.9 | 0.122 | 0.937 | 4.942 | 0.199 | 0.858 | 0.953 | 0.981
      Proposed method (4-layers) | M | 15.8 | 0.121 | 0.935 | 4.945 | 0.197 | 0.858 | 0.954 | 0.981
      Monodepth2-18[8] | M† | 14.3 | 0.115 | 0.903 | 4.863 | 0.193 | 0.877 | 0.959 | 0.981
      R-MSFM3[26] | M† | 3.5 | 0.114 | 0.815 | 4.841 | 0.190 | 0.866 | 0.857 | 0.982
      Lite-mono-tiny[18] | M† | 2.2 | 0.110 | 0.837 | 4.710 | 0.187 | 0.880 | 0.960 | 0.982
      Proposed method | M† | 4.9 | 0.107 | 0.839 | 4.674 | 0.183 | 0.883 | 0.963 | 0.982
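
      For reference, the seven error and accuracy columns in Tables 2 and 4-7 are the standard monocular-depth evaluation metrics: mean absolute relative error, squared relative error, RMSE, RMSE of log depth, and the three threshold accuracies (fraction of pixels whose depth ratio is within 1.25^k). A self-contained sketch of their computation, assuming `pred` and `gt` are depth values at valid (masked) pixels and omitting median scaling, cropping, and depth capping:

      ```python
      # Standard depth-estimation metrics reported in Tables 2 and 4-7.
      # `pred` and `gt` are 1-D arrays of depths at valid (masked) pixels.
      import numpy as np

      def depth_metrics(pred: np.ndarray, gt: np.ndarray) -> dict:
          thresh = np.maximum(gt / pred, pred / gt)
          a1 = (thresh < 1.25).mean()        # a1/a2/a3: fraction of pixels
          a2 = (thresh < 1.25 ** 2).mean()   # whose ratio to ground truth is
          a3 = (thresh < 1.25 ** 3).mean()   # within 1.25^k (higher is better)
          abs_rel = np.mean(np.abs(gt - pred) / gt)             # Abs_rel
          sq_rel = np.mean((gt - pred) ** 2 / gt)               # Sq_rel
          rmse = np.sqrt(np.mean((gt - pred) ** 2))             # Rmse
          rmse_log = np.sqrt(np.mean((np.log(gt) - np.log(pred)) ** 2))  # Rmselog
          return {"abs_rel": abs_rel, "sq_rel": sq_rel, "rmse": rmse,
                  "rmse_log": rmse_log, "a1": a1, "a2": a2, "a3": a3}
      ```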
    • Table 3. Model parametric quantities and computational analysis

      Method | Encoder Params /10^6 | Encoder FLOPs /10^9 | Decoder Params /10^6 | Decoder FLOPs /10^9 | Full model Params /10^6 | Full model FLOPs /10^9
      Monodepth2-18[8] | 11.2 | 4.5 | 3.1 | 3.5 | 14.3 | 8.0
      DeepMatchVO[9] | 29.4 | 11.7 | 3.4 | 3.7 | 32.8 | 15.4
      Lite-mono-tiny[18] | 2.0 | 2.4 | 0.2 | 0.44 | 2.2 | 2.84
      R-MSFM3[26] | 1.7 | 2.4 | 3.8 | 14.1 | 5.5 | 16.5
      Proposed method | 4.1 | 5.0 | 0.8 | 2.8 | 4.9 | 7.8
      Proposed method (4-layers) | 12.6 | 6.0 | 3.2 | 3.6 | 15.8 | 9.6
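
      Table 3's parameter counts (units of 10^6) follow directly from summing tensor sizes, while FLOPs (units of 10^9) are normally measured with a profiler at the training resolution, which Table 1 suggests is 192×640. A short sketch; `model` and the thop profiler are assumptions for illustration, not the authors' tooling:

      ```python
      # Reproducing Table 3's Params column for any PyTorch module.
      import torch
      import torch.nn as nn

      def params_m(module: nn.Module) -> float:
          """Parameter count in units of 10^6."""
          return sum(p.numel() for p in module.parameters()) / 1e6

      # FLOPs in units of 10^9 can be profiled at the 192x640 training
      # resolution, e.g. with thop (`model` is a placeholder; counting
      # 2 FLOPs per multiply-accumulate is one common convention):
      #   from thop import profile
      #   macs, _ = profile(model, inputs=(torch.randn(1, 3, 192, 640),))
      #   flops_g = 2 * macs / 1e9
      ```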
    • Table 4. Quantitative results on the Make3D dataset

      Method | Abs_rel↓ | Sq_rel↓ | Rmse↓ | Rmselog↓ | a1↑ | a2↑ | a3↑
      Monodepth2-18[8] | 0.322 | 3.589 | 7.417 | 0.178 | 0.483 | 0.757 | 0.855
      DeepMatchVO[9] | 0.436 | 5.101 | 8.748 | 0.197 | 0.443 | 0.684 | 0.817
      Lite-mono-tiny[18] | 0.318 | 3.214 | 7.232 | 0.164 | 0.543 | 0.790 | 0.894
      Proposed method | 0.307 | 3.195 | 7.158 | 0.160 | 0.559 | 0.801 | 0.913
    • Table 5. Quantitative results on the Cityscapes dataset

      Method | Abs_rel↓ | Sq_rel↓ | Rmse↓ | Rmselog↓ | a1↑ | a2↑ | a3↑
      Monodepth2-18[8] | 0.386 | 3.752 | 8.556 | 0.278 | 0.451 | 0.717 | 0.843
      DeepMatchVO[9] | 0.617 | 5.235 | 9.735 | 0.362 | 0.324 | 0.569 | 0.660
      Lite-mono-tiny[18] | 0.406 | 3.992 | 8.752 | 0.292 | 0.377 | 0.654 | 0.735
      Proposed method | 0.360 | 3.711 | 7.873 | 0.274 | 0.482 | 0.734 | 0.868
    • Table 6. Quantitative results on the NYUDepth-v2 dataset

      Method | Abs_rel↓ | Sq_rel↓ | Rmse↓ | Rmselog↓ | a1↑ | a2↑ | a3↑
      Monodepth2-18[8] | 0.517 | 1.971 | 2.228 | 0.530 | 0.349 | 0.610 | 0.788
      DeepMatchVO[9] | 0.682 | 4.676 | 4.317 | 0.824 | 0.186 | 0.355 | 0.537
      Lite-mono-tiny[18] | 0.745 | 3.383 | 2.547 | 0.754 | 0.214 | 0.418 | 0.603
      Proposed method | 0.419 | 0.984 | 1.617 | 0.468 | 0.381 | 0.665 | 0.840
    • Table 7. Results of ablation experiments

      Method | Abs_rel↓ | Sq_rel↓ | Rmse↓ | Rmselog↓ | a1↑ | a2↑ | a3↑
      Base/Monodepth2 | 0.132 | 1.044 | 5.142 | 0.210 | 0.845 | 0.948 | 0.977
      Base + CCT-Block | 0.128 | 0.978 | 5.013 | 0.205 | 0.856 | 0.952 | 0.979
      Base + CA-Block | 0.128 | 1.102 | 5.149 | 0.207 | 0.855 | 0.847 | 0.977
      Base + CCT-Block + CA-Block | 0.126 | 0.938 | 5.051 | 0.203 | 0.855 | 0.952 | 0.980
      Base + CCT-Block + CA-Block + SURF | 0.122 | 0.937 | 4.942 | 0.199 | 0.858 | 0.953 | 0.981
      Method-1 | 0.136 | 1.115 | 5.233 | 0.213 | 0.839 | 0.947 | 0.976
      Method-2 | 0.136 | 1.071 | 5.239 | 0.211 | 0.838 | 0.945 | 0.978
    • Table 8. Odometry trajectory errors and standard deviations on the KITTI-odometry dataset

      Method | Sequence 01 | Sequence 07 | Sequence 09 | Sequence 10
      Base/Monodepth2 | 0.046±0.020 | 0.024±0.014 | 0.061±0.032 | 0.039±0.025
      Base + CCT-Block | 0.030±0.016 | 0.014±0.009 | 0.016±0.009 | 0.016±0.011
      Base + CA-Block | 0.031±0.017 | 0.014±0.010 | 0.018±0.011 | 0.017±0.012
      Base + CCT-Block + CA-Block | 0.025±0.017 | 0.013±0.008 | 0.016±0.008 | 0.014±0.010
      Base + CCT-Block + CA-Block + SURF | 0.017±0.010 | 0.010±0.007 | 0.011±0.004 | 0.012±0.008
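
      Table 8 reports trajectory error as mean ± standard deviation per sequence. In the common Monodepth2-style odometry evaluation, this is the absolute trajectory error (ATE) of short fixed-length snippets after a least-squares scale alignment; the sketch below follows that protocol, and the 5-frame snippet convention is an assumption from it rather than something shown on this page.

      ```python
      # Sketch of snippet-level absolute trajectory error (ATE) behind
      # Table 8's mean+/-std numbers, Monodepth2-style (assumption).
      import numpy as np

      def snippet_ate(gt_xyz: np.ndarray, pred_xyz: np.ndarray) -> float:
          """gt_xyz, pred_xyz: (N, 3) camera positions of one short snippet,
          both expressed relative to the snippet's first frame."""
          # Least-squares scale aligning the scale-ambiguous monocular
          # prediction to the ground-truth trajectory
          scale = np.sum(gt_xyz * pred_xyz) / np.sum(pred_xyz ** 2)
          return np.sqrt(np.mean(np.sum((gt_xyz - scale * pred_xyz) ** 2, axis=1)))

      # Each cell of Table 8 would then be the mean and standard deviation
      # of snippet_ate over all overlapping snippets of a sequence.
      ```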
    Citation

    Xiyu Li, Yilihamu Yaermaimaiti, Lirong Xie, Shuoqi Cheng. Lightweight Unsupervised Monocular Depth Estimation Framework Using Attention Mechanisms[J]. Laser & Optoelectronics Progress, 2025, 62(8): 0811005

    Paper Information

    Category: Imaging Systems

    Received: Jul. 15, 2024

    Accepted: Oct. 12, 2024

    Published Online: Apr. 2, 2025

    The Author Email: Yilihamu Yaermaimaiti (65891080@qq.com)

    DOI: 10.3788/LOP241688

    CSTR: 32186.14.LOP241688
