Optics and Precision Engineering, Vol. 31, Issue 20, 2993 (2023)

Indoor self-supervised monocular depth estimation based on level feature fusion

Deqiang CHENG1, Huaqiang ZHANG1, Qiqi KOU2, Chen LÜ1, and Jiansheng QIAN1,*
Author Affiliations
  • 1School of Information and Control Engineering, China University of Mining and Technology, Xuzhou 221116, China
  • 2School of Computer Science and Technology, China University of Mining and Technology, Xuzhou 221116, China
    Figures & Tables (15)
    Depth estimation network model in this paper
    Main steps to compute the brightness mapping function ν
    Comparison of image enhancement effect of MCIE module
    Structure of feature adjustment in single level
    Structure of feature adjustment in three levels
    Cross-Level Feature Adjustment module
    Steps to calculate Gram matrix similarity
    Iterative optimization curve of loss function
    Comparison of predicted depth maps between proposed model and existing main methods on NYU Depth V2 dataset
    Comparison of predicted depth maps between proposed model and existing main methods on ScanNet dataset
    • Table 1. Relationship between CLFA module related parameters in Decoder network

      | Module | Input | Channels (Input) | Channels (Result) | Output | Channels (Output) | Ti | Channels (Ti) |
      |--------|-------|------------------|-------------------|--------|-------------------|----|---------------|
      | CLFA3  | F1    | 64               | 32                | O5     | 256               | T5 | 256           |
      |        | F2    | 64               | 32                |        |                   |    |               |
      |        | F3    | 128              | 64                |        |                   |    |               |
      |        | F4    | 256              | 128               |        |                   |    |               |
      | CLFA4  | F1    | 64               | 32                | O4     | 128               | T4 | 128           |
      |        | F2    | 64               | 32                |        |                   |    |               |
      |        | F3    | 128              | 64                |        |                   |    |               |
      | CLFA5  | F1    | 64               | 32                | O3     | 64                | T3 | 64            |
      |        | F2    | 64               | 32                |        |                   |    |               |
      | CLFA6  | F1    | 64               | 64                | O2     | 64                | T2 | 64            |
    • Table 2. Comparison of experimental results between proposed model and existing main methods on NYU Depth V2 dataset
      (Error indicators RMSE, Abs Rel, Log10: lower is better; accuracy indicators δ: higher is better.)

      | Method                     | Supervision     | RMSE  | Abs Rel | Log10 | δ<1.25 | δ<1.25² | δ<1.25³ |
      |----------------------------|-----------------|-------|---------|-------|--------|---------|---------|
      | DORN [18]                  | Supervised      | 0.509 | 0.115   | 0.051 | 0.828  | 0.965   | 0.992   |
      | Hu et al. [46]             | Supervised      | 0.530 | 0.115   | 0.050 | 0.866  | 0.975   | 0.993   |
      | Yin et al. [47]            | Supervised      | 0.416 | 0.108   | 0.048 | 0.875  | 0.976   | 0.994   |
      | AdaBins [48]               | Supervised      | 0.364 | 0.103   | 0.044 | 0.903  | 0.984   | 0.997   |
      | Niklaus et al. [49]        | Supervised      | 0.300 | 0.080   | 0.030 | 0.940  | 0.990   | 1.000   |
      | MovingIndoor [22]          | Self-supervised | 0.712 | 0.208   | 0.086 | 0.674  | 0.900   | 0.968   |
      | TrainFlow [50]             | Self-supervised | 0.686 | 0.208   | 0.086 | 0.701  | 0.912   | 0.978   |
      | Monodepth2 [10]            | Self-supervised | 0.600 | 0.161   | 0.068 | 0.771  | 0.948   | 0.987   |
      | SC-Depth [51]              | Self-supervised | 0.608 | 0.159   | 0.068 | 0.772  | 0.939   | 0.982   |
      | P2Net [12]                 | Self-supervised | 0.561 | 0.150   | 0.064 | 0.796  | 0.948   | 0.986   |
      | P2Net (5 frames + PP) [12] | Self-supervised | 0.553 | 0.147   | 0.062 | 0.801  | 0.951   | 0.987   |
      | Bian et al. [52]           | Self-supervised | 0.536 | 0.147   | 0.062 | 0.804  | 0.950   | 0.986   |
      | PLNet (5 frames) [53]      | Self-supervised | 0.540 | 0.144   | 0.061 | 0.807  | 0.957   | 0.990   |
      | Zhan et al. [54]           | Self-supervised | 0.538 | 0.143   | 0.060 | 0.812  | 0.951   | 0.986   |
      | StructDepth [24]           | Self-supervised | 0.540 | 0.142   | 0.060 | 0.813  | 0.954   | 0.988   |
      | Ours                       | Self-supervised | 0.530 | 0.138   | 0.059 | 0.819  | 0.959   | 0.990   |
    • Table 3. Comparison of experimental results between the model in this paper and existing main methods on ScanNet dataset
      (Error indicators RMSE, Abs Rel, Log10: lower is better; accuracy indicators δ: higher is better.)

      | Method               | Supervision     | RMSE  | Abs Rel | Log10 | δ<1.25 | δ<1.25² | δ<1.25³ |
      |----------------------|-----------------|-------|---------|-------|--------|---------|---------|
      | MovingIndoor [22]    | Self-supervised | 0.483 | 0.212   | 0.088 | 0.650  | 0.905   | 0.976   |
      | Monodepth2 [10]      | Self-supervised | 0.451 | 0.191   | 0.080 | 0.693  | 0.926   | 0.983   |
      | P2Net [12]           | Self-supervised | 0.420 | 0.175   | 0.074 | 0.740  | 0.932   | 0.982   |
      | P2Net-finetune [24]  | Self-supervised | 0.412 | 0.172   | 0.073 | 0.743  | 0.935   | 0.984   |
      | StructDepth [24]     | Self-supervised | 0.400 | 0.165   | 0.070 | 0.754  | 0.939   | 0.985   |
      | Ours                 | Self-supervised | 0.391 | 0.162   | 0.069 | 0.760  | 0.946   | 0.987   |
    • Table 4. Ablation experiments with several cross-level feature fusion structures on NYU Depth V2 dataset
      (Error indicators RMSE, Abs Rel, Log10: lower is better; accuracy indicators δ: higher is better.)

      | Method        | Position | RMSE  | Abs Rel | Log10 | δ<1.25 | δ<1.25² | δ<1.25³ |
      |---------------|----------|-------|---------|-------|--------|---------|---------|
      | Baseline      | /        | 0.540 | 0.142   | 0.060 | 0.813  | 0.954   | 0.988   |
      | Single level  | F1       | 0.543 | 0.143   | 0.061 | 0.810  | 0.953   | 0.988   |
      |               | F2       | 0.543 | 0.142   | 0.061 | 0.811  | 0.954   | 0.988   |
      |               | F3       | 0.540 | 0.142   | 0.061 | 0.812  | 0.954   | 0.988   |
      |               | F4       | 0.540 | 0.141   | 0.060 | 0.814  | 0.955   | 0.989   |
      | Double levels | F2+F3    | 0.540 | 0.140   | 0.061 | 0.815  | 0.956   | 0.988   |
      |               | F2+F4    | 0.539 | 0.140   | 0.060 | 0.816  | 0.955   | 0.989   |
      |               | F3+F4    | 0.537 | 0.139   | 0.060 | 0.817  | 0.955   | 0.989   |
      | Three levels  | F2+F3+F4 | 0.540 | 0.140   | 0.060 | 0.815  | 0.955   | 0.988   |
    • Table 5. Ablation experiments with different proposed modules on NYU Depth V2 dataset
      (✓ = module enabled, × = module disabled. Error indicators: lower is better; accuracy indicators δ: higher is better.)

      | Method   | CLFA | GMSL | MCIE | RMSE  | Abs Rel | Log10 | δ<1.25 | δ<1.25² | δ<1.25³ |
      |----------|------|------|------|-------|---------|-------|--------|---------|---------|
      | Baseline | ×    | ×    | ×    | 0.540 | 0.142   | 0.060 | 0.813  | 0.954   | 0.988   |
      | 1        | ✓    | ×    | ×    | 0.537 | 0.139   | 0.060 | 0.817  | 0.955   | 0.989   |
      | 2        | ×    | ✓    | ×    | 0.536 | 0.140   | 0.060 | 0.816  | 0.956   | 0.989   |
      | 3        | ×    | ×    | ✓    | 0.539 | 0.141   | 0.061 | 0.815  | 0.955   | 0.988   |
      | 4        | ✓    | ✓    | ×    | 0.533 | 0.138   | 0.059 | 0.818  | 0.957   | 0.990   |
      | 5        | ✓    | ×    | ✓    | 0.536 | 0.139   | 0.059 | 0.817  | 0.956   | 0.989   |
      | 6        | ×    | ✓    | ✓    | 0.537 | 0.139   | 0.060 | 0.817  | 0.955   | 0.988   |
      | 7        | ✓    | ✓    | ✓    | 0.530 | 0.138   | 0.059 | 0.819  | 0.959   | 0.990   |
    Citation

    Deqiang CHENG, Huaqiang ZHANG, Qiqi KOU, Chen LÜ, Jiansheng QIAN. Indoor self-supervised monocular depth estimation based on level feature fusion[J]. Optics and Precision Engineering, 2023, 31(20): 2993

    Paper Information

    Category: Information Sciences

    Received: Mar. 1, 2023

    Accepted: --

    Published Online: Nov. 28, 2023

    The Author Email: Jiansheng QIAN (qianjsh@cumt.edu.cn)

    DOI: 10.37188/OPE.20233120.2993
