Optics and Precision Engineering, Vol. 31, Issue 16, 2444 (2023)

Review of multi-view stereo reconstruction methods based on deep learning

Huabiao YAN1, Fangqi XU1, Lü'er HUANG2,*, Cibo LIU1 and Chuxin LIN1
Author Affiliations
  • 1School of Science, Jiangxi University of Science and Technology, Ganzhou 341000, China
  • 2School of Electrical Engineering and Automation, Jiangxi University of Science and Technology, Ganzhou 341000, China
    References(124)

    [1] Y FURUKAWA, C HERNÁNDEZ. Multi-view stereo: a tutorial. Foundations and Trends® in Computer Graphics and Vision, 9, 1-148(2015).

    [2] M W SMITH, J L CARRIVICK, D J QUINCEY. Structure from motion photogrammetry in physical geography. Progress in Physical Geography: Earth and Environment, 40, 247-275(2016).

    [3] D S LIU, J L CHEN, D FEI et al. Three-dimensional reconstruction of large-scale scene based on depth camera. Opt. Precision Eng., 28(1), 234-243(2020) (in Chinese). doi: 10.3788/ope.20202801.0234

    [4] J L SCHÖNBERGER, E L ZHENG, J M FRAHM et al. Pixelwise View Selection for Unstructured Multi-View Stereo. Computer Vision - ECCV 2016, 501-518(2016).

    [5] Q S XU, W B TAO. Multi-scale geometric consistency guided multi-view stereo, 5478-5487(15).

    [6] B X ZHANG, Z M YU, Q H YANG. Research on 3D reconstruction of microscope imaging based on Harris-SIFT algorithm and full convolution depth prediction. Opt. Precision Eng., 30(14), 1669-1681(2022) (in Chinese). doi: 10.37188/OPE.20223014.1669

    [7] M Q JI, J GALL, H T ZHENG et al. SurfaceNet: an End-to-End 3D neural network for multiview stereopsis, 2326-2334(22).

    [9] H AANÆS, R R JENSEN, G VOGIATZIS et al. Large-scale data for multiple-view stereopsis. International Journal of Computer Vision, 120, 153-168(2016).

    [10] A KNAPITSCH, J PARK, Q Y ZHOU et al. Tanks and temples: benchmarking large-scale scene reconstruction. ACM Transactions on Graphics, 36, 1-13(2017).

    [11] Y YAO, Z X LUO, S W LI et al. MVSNet: Depth Inference for Unstructured Multi-View Stereo. Computer Vision - ECCV 2018, 785-801(2018).

    [12] L Y LI, X Y LI, L Y JIANG et al. A review on deep learning techniques for cloud detection methodologies and challenges. Signal, Image and Video Processing, 15, 1527-1535(2021).

    [14] X WANG, C WANG, B LIU et al. Multi-view stereo in the deep learning era: a comprehensive review. Displays, 70, 102102(2021).

    [15] D GALLUP, J M FRAHM, P MORDOHAI et al. Real-time plane-sweeping stereo with multiple sweeping directions, 1-8(17).

    [16] Y YAO, Z X LUO, S W LI et al. Recurrent MVSNet for high-resolution multi-view stereo depth inference, 5520-5529(15).

    [17] R CHEN, S F HAN, J XU et al. Point-based multi-view stereo network, 1538-1547.

    [18] Y Z XUE, J S CHEN, W T WAN et al. MVSCRF: learning multi-view stereo with conditional random fields, 4311-4320.

    [19] X D GU, Z W FAN, S Y ZHU et al. Cascade cost volume for high-resolution multi-view stereo and stereo matching, 2492-2501(13).

    [20] J Y YANG, W MAO, J M ALVAREZ et al. Cost volume pyramid based depth inference for multi-view stereo, 4876-4885(13).

    [21] O RONNEBERGER, P FISCHER, T BROX. U-Net: Convolutional Networks for Biomedical Image Segmentation. Lecture Notes in Computer Science, 234-241(2015).

    [23] Y F SHI, J H XI, D W HU et al. RayMVSNet: learning ray-based 1D implicit fields for accurate multi-view stereo, 1-17(2023).

    [24] H W YI, Z Z WEI, M Y DING et al. Pyramid Multi-View Stereo Net with Self-Adaptive View Aggregation. Computer Vision - ECCV 2020, 766-782(2020).

    [26] T Y LIN, P DOLLÁR, R GIRSHICK et al. Feature pyramid networks for object detection, 936-944(21).

    [27] S CHENG, Z X XU, S L ZHU et al. Deep stereo using adaptive thin volume representation with uncertainty awareness, 2521-2531(13).

    [28] Y LI, W Y LI, Z J ZHAO et al. DRI-MVSNet: a depth residual inference network for multi-view stereo images. PLoS One, 17(2022).

    [30] K ZHANG, M Y LIU, J L ZHANG et al. PA-MVSNet: sparse-to-dense multi-view stereo with pyramid attention. IEEE Access, 9, 27908-27915(2021).

    [31] ANZHU YU et al. Attention aware cost volume pyramid based multi-view stereo network for 3D reconstruction. ISPRS Journal of Photogrammetry and Remote Sensing, 175, 448-460(2021).

    [32] J J LI, Z Y BAI, W CHENG et al. Feature pyramid multi-view stereo network based on self-attention mechanism, 226-233(9).

    [33] N PARMAR, P RAMACHANDRAN, A VASWANI et al. Stand-alone self-attention in vision models, 13(2019).

    [34] X D ZHANG, Y T HU, H C WANG et al. Long-range attention network for multi-view stereo, 3781-3790(2021).

    [35] W J LIU, J K WANG, H C QU et al. Hierarchical MVSNet with cost volume separation and fusion based on U-shape feature extraction. Multimedia Systems, 29, 377-387(2023).

    [36] S WOO, J PARK, J Y LEE et al. CBAM: Convolutional Block Attention Module. Computer Vision - ECCV 2018, 3-19(2018).

    [37] C CAO, X REN, Y FU. MVSFormer: multi-view stereo with pre-trained vision transformers and temperature-based depth. arXiv preprint arXiv:, 2022.

    [39] S SAEED, S LEE, Y CHO et al. ASPPMVSNet: a high-receptive-field multiview stereo network for dense three-dimensional reconstruction. ETRI Journal, 44, 1034-1046(2022).

    [40] L C CHEN, G PAPANDREOU, I KOKKINOS et al. DeepLab: semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs. IEEE Transactions on Pattern Analysis and Machine Intelligence, 40, 834-848(2018).

    [41] Z Z WEI, Q T ZHU, C MIN et al. AA-RMVSNet: adaptive aggregation recurrent multi-view stereo network, 6167-6176(10).

    [42] J E N MASSON, M R PETRY, D F COUTINHO et al. Deformable convolutions in multi-view stereo. Image and Vision Computing, 118, 104369(2022).

    [43] W CHENG, Z Y BAI, J J LI et al. ADIM-MVSNet: adaptive depth interval multi-view stereo network for 3d reconstruction, 281-287(2022).

    [44] J F DAI, H Z QI, Y W XIONG et al. Deformable convolutional networks, 764-773(22).

    [45] Y K DING, W T YUAN, Q T ZHU et al. TransMVSNet: global context-aware multi-view stereo network with transformers, 8575-8584(18).

    [46] K T GIANG, S SONG. Curvature-guided dynamic scale networks for multi-view stereo. arXiv preprint arXiv:, 2022.

    [47] J F YAN, Z Z WEI, H W YI et al. Dense hybrid recurrent multi-view stereo net with dynamic consistency checking, 674-689(23).

    [48] Z H YU, S H GAO. Fast-MVSNet: sparse-to-dense multi-view stereo with learned propagation and gauss-newton refinement, 1946-1955(13).

    [49] R CHEN, S F HAN, J XU et al. Visibility-aware point-based multi-view stereo network. IEEE Transactions on Pattern Analysis and Machine Intelligence, 43, 3695-3708(2020).

    [50] R WEILHARTER, F FRAUNDORFER. HighRes-MVSNet: a fast multi-view stereo network for dense 3D reconstruction from high-resolution images. IEEE Access, 9, 11306-11315(2021).

    [52] A VASWANI, N SHAZEER, N PARMAR et al. Attention is all you need, 6000-6010(9).

    [53] X F WANG, Z ZHU, G HUANG et al. MVSTER: Epipolar Transformer for Efficient Multi-View Stereo. Lecture Notes in Computer Science, 573-591(2022).

    [56] Y H HE, R YAN, K FRAGKIADAKI et al. Epipolar transformers, 7776-7785(13).

    [57] P H CHEN, H C YANG, K W CHEN et al. MVSNet++: learning depth-based attention pyramid features for multi-view stereo. IEEE Transactions on Image Processing, 29, 7261-7273(2020).

    [58] Y BENGIO, J LOURADOUR, R COLLOBERT et al. Curriculum learning, 41-48(18).

    [59] X Y GUO, K YANG, W K YANG et al. Group-wise correlation stereo network, 3268-3277(15).

    [60] Q S XU, W B TAO. Learning inverse depth regression for multi-view stereo with correlation cost volume. Proceedings of the AAAI Conference on Artificial Intelligence, 34, 12508-12515(2020).

    [62] F WANG, S GALLIANI, C VOGEL et al. PatchmatchNet: learned multi-view patchmatch stereo, 14189-14198(20).

    [63] B SONG, X HU, J XIAO et al. Implicit neural refinement based multi-view stereo network with adaptive correlation. Image and Vision Computing, 124, 104511(2022).

    [64] Y C CAI, L LI, D WANG et al. MFNet: Multi-level fusion aware feature pyramid based multi-view stereo network for 3D reconstruction. Applied Intelligence, 53, 4289-4301(2023).

    [65] S Y GAO, Z X LI, Z Q WANG. Cost Volume Pyramid Network with Multi-Strategies Range Searching for Multi-View Stereo. Advances in Computer Graphics, 157-169(2022).

    [66] K Y LUO, T GUAN, L L JU et al. P-MVSNet: learning patch-wise matching confidence aggregation for multi-view stereo, 10451-10460(2019).

    [67] X J MA, Y GONG, Q R WANG et al. EPP-MVSNet: epipolar-assembling based depth prediction for multi-view stereo, 5712-5720(10).

    [68] R PENG, R J WANG, Z Y WANG et al. Rethinking depth estimation for multi-view stereo: a unified representation, 8635-8644(18).

    [69] H F XU, J Y ZHANG. AANet: adaptive aggregation network for efficient stereo matching, 1956-1965(13).

    [70] C SORMANN, P KNÖBELREITER, A KUHN et al. BP-MVSNet: belief-propagation-layers for multi-view-stereo, 394-403(2021).

    [71] Y QI, W SU, Q XU et al. Sparse prior guided deep multi-view stereo. Computers & Graphics, 107, 1-9(2022).

    [72] J LIU, S P JI. A novel recurrent encoder-decoder structure for large-scale multi-view stereo reconstruction from an open aerial dataset, 6049-6058(13).

    [73] Q XU, M R OSWALD, W TAO et al. Non-local recurrent regularization networks for multi-view stereo. arXiv preprint arXiv:, 2021.

    [74] F WANG, S GALLIANI, C VOGEL et al. IterMVS: iterative probability estimation for efficient multi-view stereo, 8596-8605(18).

    [75] Z X MI, C DI, D XU. Generalized binary search network for highly-efficient multi-view stereo, 12981-12990(18).

    [76] J Y LEE, J DEGOL, C H ZOU et al. PatchMatch-RL: deep mvs with pixelwise depth, normal, and visibility, 6138-6147(10).

    [77] J Y YANG, J M ALVAREZ, M M LIU. Non-parametric depth distribution modelling based depth inference for multi-view stereo, 8616-8624(18).

    [78] S Q WANG, B LI, Y C DAI. Efficient multi-view stereo by iterative dynamic cost volume, 8645-8654(18).

    [79] Y LI, Z ZHAO, J FAN et al. ADR-MVSNet: a cascade network for 3D point cloud reconstruction with pixel occlusion. Pattern Recognition, 125, 108516(2022).

    [80] J LAFFERTY, A MCCALLUM, F C PEREIRA. Conditional random fields: probabilistic models for segmenting and labeling sequence data(2001).

    [81] S ZHENG, S JAYASUMANA, B ROMERA-PAREDES et al. Conditional random fields as recurrent neural networks, 1529-1537(7).

    [82] P KNÖBELREITER, C SORMANN, A SHEKHOVTSOV et al. Belief propagation reloaded: learning BP-Layers for labeling problems, 7897-7906(13).

    [83] K Y LUO, T GUAN, L L JU et al. Attention-aware multi-view stereo, 1587-1596(13).

    [84] Z Z WEI, Q T ZHU, C MIN et al. Bidirectional hybrid LSTM based recurrent neural network for multi-view stereo. IEEE Transactions on Visualization and Computer Graphics(2022).

    [85] T Y LIN, P GOYAL, R GIRSHICK et al. Focal loss for dense object detection. IEEE Transactions on Pattern Analysis and Machine Intelligence, 42, 318-327(2020).

    [86] H Z ZHOU, H L ZHAO, Q WANG et al. Miper-MVS: multi-scale iterative probability estimation with refinement for efficient multi-view stereo. Neural Networks, 162, 502-515(2023).

    [87] Y K DING, Z Y LI, D H HUANG et al. Enhancing Multi-View stereo with contrastive matching and weighted focal loss, 821-825(2022).

    [89] T KHOT, S AGRAWAL, S TULSIANI et al. Learning unsupervised multi-view stereopsis via robust photometric consistency. arXiv preprint arXiv:1905.02706, 2019.

    [90] Y C DAI, Z D ZHU, Z B RAO et al. MVS2: Deep unsupervised multi-view stereo with multi-view symmetry, 1-8(2019).

    [91] A MALLICK, J STÜCKLER, H LENSCH. Learning to adapt multi-view stereo by self-supervision. arXiv preprint arXiv, 2020.

    [92] C FINN, P ABBEEL, S LEVINE. Model-agnostic meta-learning for fast adaptation of deep networks, 1126-1135(11).

    [93] B C HUANG, H W YI, C HUANG et al. M3VSNET: unsupervised multi-metric multi-view stereo network, 3163-3167(19).

    [94] H B XU, Z P ZHOU, Y QIAO et al. Self-supervised multi-view stereo via effective co-segmentation and data-augmentation. Proceedings of the AAAI Conference on Artificial Intelligence, 35, 3030-3038(2021).

    [95] J Y YANG, J M ALVAREZ, M M LIU. Self-supervised learning of depth inference for multi-view stereo, 7522-7530(20).

    [96] H B XU, Z P ZHOU, Y L WANG et al. Digging into uncertainty in self-supervised multi-view stereo, 6058-6067(10).

    [97] S QI, X SANG, B YAN et al. Unsupervised multi-view stereo network based on multi-stage depth estimation. Image and Vision Computing, 122, 104449(2022).

    [98] H DONG, J YAO. PatchMVSNet: patch-wise unsupervised multi-view stereo for weakly-textured surface reconstruction. arXiv preprint arXiv:, 2022.

    [99] D CHANG, A BOŽIČ, T ZHANG et al. RC-MVSNet: Unsupervised Multi-View Stereo with Neural Rendering. Lecture Notes in Computer Science, 665-680(2022).

    [100] B MILDENHALL, P P SRINIVASAN, M TANCIK et al. NeRF: representing scenes as neural radiance fields for view synthesis. Communications of the ACM, 65, 99-106(2022).

    [101] A P CHEN, Z X XU, F Q ZHAO et al. MVSNeRF: fast generalizable radiance field reconstruction from multi-view stereo, 14104-14113(10).

    [102] Q G XU, Z X XU, J PHILIP et al. Point-nerf: point-based neural radiance fields, 5428-5438(18).

    [103] J Z ZHANG, M Q JI, G Y WANG et al. SurRF: unsupervised multi-view stereopsis by learning surface radiance field. IEEE Transactions on Pattern Analysis and Machine Intelligence, 44, 7912-7927(2022).

    [105] Y K DING, Q T ZHU, X Y LIU et al. KD-MVS: Knowledge Distillation Based Self-Supervised Learning for Multi-View Stereo. Lecture Notes in Computer Science, 630-646(2022).

    [106] V KOLTUN et al. Adaptive surface reconstruction with multiscale convolutional kernels. 2021 IEEE/CVF International Conference on Computer Vision (ICCV), 5631-5640(2021).

    [107] T SCHÖPS, T SATTLER, M POLLEFEYS. BAD SLAM: bundle adjusted direct RGB-D SLAM, 134-144(15).

    [108] Y YAO, Z X LUO, S W LI et al. BlendedMVS: a large-scale dataset for generalized multi-view stereo networks, 1787-1796(13).

    [109] J N ZHANG, J Z ZHANG, S MAO et al. GigaMVS: a benchmark for ultra-large-scale gigapixel-level 3D reconstruction. IEEE Transactions on Pattern Analysis and Machine Intelligence, 44, 7534-7550(2022).

    [110] Z Y MA, Z TEED, J DENG. Multiview Stereo with Cascaded Epipolar RAFT. Lecture Notes in Computer Science, 734-750(2022).

    [111] Z X LI, W M ZUO, Z Q WANG et al. Confidence-based large-scale dense multi-view stereo. IEEE Transactions on Image Processing, 29, 7176-7191(2020).

    [112] W LI, D ZHU, Q WANG. A single view leaf reconstruction method based on the fusion of ResNet and differentiable render in plant growth digital twin system. Computers and Electronics in Agriculture, 193, 106712(2022).

    [113] X P DENG, S QIU, W Q JIN et al. Three-dimensional reconstruction method for bionic compound-eye system based on MVSNet network. Electronics, 11, 1790(2022).

    [114] W HAO, W J ZHANG, W LIANG et al. Scene recognition for 3D point clouds: a review. Opt. Precision Eng., 30(16), 1988-2005(2022) (in Chinese). doi: 10.37188/OPE.20223016.1988

    [115] T EBNER, I FELDMANN, S RENAULT et al. Multi-view reconstruction of dynamic real-world objects and their integration in augmented and virtual reality applications. Journal of the Society for Information Display, 25, 151-157(2017).

    [116] Z X LI, H JIANG, Y Q LIU et al. Research on multi-view stereo 3D reconstruction in virtual reality system of silk road cultural inheritance. Chinese Journal of Computers, 45(3), 500-512(2022) (in Chinese). doi: 10.11897/SP.J.1016.2022.00500

    [117] J Y YU, X K XUE, C F CHEN et al. Three-dimensional reconstruction and disaster identification of highway slope using unmanned aerial vehicle-based oblique photography technique. China Journal of Highway and Transport, 35(4), 77-86(2022) (in Chinese). doi: 10.3969/j.issn.1001-7372.2022.04.005

    [118] Z HU, Y HOU, P TAO et al. IMGTR: Image-triangle based multi-view 3D reconstruction for urban scenes. ISPRS Journal of Photogrammetry and Remote Sensing, 181, 191-204(2021).

    [119] M ORSINGHER, P ZANI, P MEDICI et al. Revisiting patchmatch multi-view stereo for urban 3D reconstruction, 190-196(4).

    [120] Y X ZHOU, R L EIMEN, E J SEIBEL et al. Cost-efficient video synthesis and evaluation for development of virtual 3D endoscopy. IEEE Journal of Translational Engineering in Health and Medicine, 9, 1-11(2021).

    [121] D J HE, H T XIONG, Z Z LU et al. Predictive model of maize moisture stress during jointing stage based on multi-view stereo vision. Transactions of the Chinese Society for Agricultural Machinery, 51(6), 248-257(2020) (in Chinese). doi: 10.6041/j.issn.1000-1298.2020.06.026

    [122] S Q WANG, J Q ZHANG, L Y LI et al. Application of MVSNet in 3D reconstruction of space objects. Chinese Journal of Lasers, 49(23), 2310003(2022) (in Chinese).

    [123] A GÓMEZ, G RANDALL, G FACCIOLO et al. An Experimental comparison of multi-view stereo approaches on satellite images, 707-716(3).

    [124] J C LU, Y X LI, Z C ZUO. SatMVS: A Novel 3D Reconstruction Pipeline for Remote Sensing Satellite Imagery. Lecture Notes in Electrical Engineering, 521-538(2022).

    CLP Journals

    [1] Qiqi KOU, Weichen WANG, Chenggong HAN, Chen LÜ, Deqiang CHENG, Yucheng JI. Multi-frame self-supervised monocular depth estimation with multi-scale feature enhancement[J]. Optics and Precision Engineering, 2024, 32(24): 3603

    [2] Wenjie LI, Wulang LIU, Beibei WANG, Yuyuan HUANG, Guijie LIU, Fuquan LI. Precision three-dimensional measurement method of telecentric imaging based on calibration parameter correction[J]. Optics and Precision Engineering, 2025, 33(2): 165

    Citation

    Huabiao YAN, Fangqi XU, Lü'er HUANG, Cibo LIU, Chuxin LIN. Review of multi-view stereo reconstruction methods based on deep learning[J]. Optics and Precision Engineering, 2023, 31(16): 2444

    Paper Information

    Category: Information Sciences

    Received: Nov. 14, 2022

    Accepted: --

    Published Online: Sep. 5, 2023

    The Author Email: HUANG Lü'er (9320080310@jxust.edu.cn)

    DOI:10.37188/OPE.20233116.2444