Optics and Precision Engineering, Volume 31, Issue 16, 2444 (2023)

Review of multi-view stereo reconstruction methods based on deep learning

Huabiao YAN1, Fangqi XU1, Lü'er HUANG2,*, Cibo LIU1, and Chuxin LIN1
Author Affiliations
  • 1 School of Science, Jiangxi University of Science and Technology, Ganzhou 341000, China
  • 2 School of Electrical Engineering and Automation, Jiangxi University of Science and Technology, Ganzhou 341000, China

    Huabiao YAN, Fangqi XU, Lü'er HUANG, Cibo LIU, Chuxin LIN. Review of multi-view stereo reconstruction methods based on deep learning[J]. Optics and Precision Engineering, 2023, 31(16): 2444

    Paper Information

    Category: Information Sciences

    Received: Nov. 14, 2022

    Accepted: --

    Published Online: Sep. 5, 2023

    Corresponding Author Email: Lü'er HUANG (9320080310@jxust.edu.cn)

    DOI: 10.37188/OPE.20233116.2444
