Optics and Precision Engineering, Vol. 32, Issue 24, 3603(2024)
Multi-frame self-supervised monocular depth estimation with multi-scale feature enhancement
[1] WU X R, XUE Q W. 3D vehicle detection for unmanned driving system based on lidar[J]. Opt. Precision Eng., 30, 489-497(2022).
[2] SHI X G, XUE Z H, LI H H. Review of augmented reality display technology[J]. Chinese Optics, 14, 1146-1161(2021).
[3] YAN H B, XU F Q, HUANG L E. Review of multi-view stereo reconstruction methods based on deep learning[J]. Opt. Precision Eng., 31, 2444-2464(2023).
[4] GARG R, CARNEIRO G et al. Unsupervised CNN for single view depth estimation: geometry to the rescue[C], 740-756(2016).
[5] ZHOU T H, BROWN M, SNAVELY N et al. Unsupervised learning of depth and ego-motion from video[C], 6612-6619(2017).
[6] GODARD C, MAC AODHA O, BROSTOW G J. Unsupervised monocular depth estimation with left-right consistency[C], 6602-6611(2017).
[7] BIAN J W, LI Z C, WANG N et al. Unsupervised scale-consistent depth and ego-motion learning from monocular video[C], 1-12(2019).
[8] GODARD C, MAC AODHA O, FIRMAN M et al. Digging into self-supervised monocular depth estimation[C], 3827-3837(2019).
[9] YAO Y, LUO Z X, LI S W et al. MVSNet: depth inference for unstructured multi-view stereo[C], 785-801(2018).
[10] WIMBAUER F, YANG N, VON STUMBERG L et al. MonoRec: semi-supervised dense reconstruction in dynamic environments from a single moving camera[C], 6108-6118(2021).
[11] SCHÖPS T, SCHÖNBERGER J L, GALLIANI S et al. A multi-view stereo benchmark with high-resolution images and multi-camera videos[C], 2538-2547(2017).
[12] KNAPITSCH A, PARK J, ZHOU Q Y et al. Tanks and temples: benchmarking large-scale scene reconstruction[J]. ACM Transactions on Graphics, 36, 1-13(2017).
[13] WATSON J, MAC AODHA O, PRISACARIU V et al. The temporal opportunist: self-supervised multi-frame monocular depth[C], 1164-1174(2021).
[14] FENG Z Y, YANG L, JING L L et al. Disentangling object motion and occlusion for unsupervised multi-frame monocular depth[C], 228-244(2022).
[16] WANG X F, ZHU Z, HUANG G et al. Crafting monocular cues and velocity guidance for self-supervised multi-frame depth learning[J]. Proceedings of the AAAI Conference on Artificial Intelligence, 37, 2689-2697(2023).
[17] LI R, GONG D, YIN W et al. Learning to fuse monocular and multi-view cues for multi-frame depth estimation in dynamic scenes[C], 21539-21548(2023).
[18] GUO M H, LU C Z, LIU Z N et al. Visual attention network[J]. Computational Visual Media, 9, 733-752(2023).
[19] LIU W Z, LU H, FU H T et al. Learning to upsample by learning to sample[C], 6004-6014(2023).
[20] HE K M, ZHANG X Y, REN S Q et al. Deep residual learning for image recognition[C], 770-778(2016).
[21] DOSOVITSKIY A, FISCHER P et al. FlowNet: learning optical flow with convolutional networks[C], 2758-2766(2015).
[22] WANG W H, XIE E Z, LI X et al. PVT v2: improved baselines with pyramid vision transformer[J]. Computational Visual Media, 8, 415-424(2022).
[23] YAN J X, ZHAO H, BU P H et al. Channel-wise attention-based network for self-supervised monocular depth estimation[C], 464-473(2021).
[24] EIGEN D, FERGUS R. Predicting depth, surface normals and semantic labels with a common multi-scale convolutional architecture[C], 2650-2658(2015).
[25] EIGEN D, PUHRSCH C, FERGUS R. Depth map prediction from a single image using a multi-scale deep network[C](2014).
[26] JOHNSTON A, CARNEIRO G. Self-supervised monocular trained depth estimation using self-attention and discrete disparity volume[C], 4755-4764(2020).
[27] GUIZILINI V, AMBRUS R, PILLAI S et al. 3D packing for self-supervised monocular depth estimation[C], 2482-2491(2020).
[28] XIANG J, WANG Y, AN L F et al. Visual attention-based self-supervised absolute depth estimation using geometric priors in autonomous driving[J]. IEEE Robotics and Automation Letters, 7, 11998-12005(2022).
[29] SURI Z K. Pose constraints for consistent self-supervised monocular depth and ego-motion[C], 340-353(2023).
[31] PATIL V, VAN GANSBEKE W, DAI D X et al. Don’t forget the past: recurrent depth estimation from monocular video[J]. IEEE Robotics and Automation Letters, 5, 6813-6820(2020).
[32] SAUNDERS K, VOGIATZIS G, MANSO L J. Self-supervised monocular depth estimation: let's talk about the weather[C], 8873-8883(2023).
[34] SHU C, YU K, DUAN Z X et al. Feature-metric loss for self-supervised learning of depth and egomotion[C], 572-588(2020).
[35] LI H H, GORDON A, ZHAO H et al. Unsupervised monocular depth learning in dynamic scenes[C], 1908-1917(2021).
Qiqi KOU, Weichen WANG, Chenggong HAN, Chen LÜ, Deqiang CHENG, Yucheng JI. Multi-frame self-supervised monocular depth estimation with multi-scale feature enhancement[J]. Optics and Precision Engineering, 2024, 32(24): 3603
Received: Jun. 8, 2024
Accepted: --
Published Online: Mar. 11, 2025
The Author Email: Yucheng JI (j.yc@outlook.com)