Infrared and Laser Engineering, Volume 54, Issue 7, 20250157 (2025)

Recent progress in research and applications of monocular and binocular depth estimation (invited)

Haiyang HU, Chaoping CHEN, Tianmu GAO, Baoen HAN, Yunfan YANG, Yi LIU, and Xiaojun WU
Author Affiliations
  • State Key Laboratory of Avionics Integration and Aviation System-of-Systems Synthesis, Shanghai Jiao Tong University, Shanghai 200240, China

    Paper Information

    Category: Special issue—Advanced display technology and applications

    Received: Mar. 10, 2025

    Published Online: Aug. 29, 2025

    DOI:10.3788/IRLA20250157
