Optics and Precision Engineering, Vol. 31, Issue 16, 2444 (2023)
Review of multi-view stereo reconstruction methods based on deep learning
[1] FURUKAWA Y, HERNÁNDEZ C. Multi-view stereo: a tutorial[J]. Foundations and Trends® in Computer Graphics and Vision, 9, 1-148(2015).
[2] SMITH M W, CARRIVICK J L, QUINCEY D J. Structure from motion photogrammetry in physical geography[J]. Progress in Physical Geography: Earth and Environment, 40, 247-275(2016).
[3] LIU D S, CHEN J L, FEI D et al. Three-dimensional reconstruction of large-scale scene based on depth camera[J]. Opt. Precision Eng., 2020, 28(1): 234-243. (in Chinese). doi: 10.3788/ope.20202801.0234
[4] SCHÖNBERGER J L, ZHENG E L, FRAHM J M et al. Pixelwise View Selection for Unstructured Multi-View Stereo[M]. Computer Vision - ECCV 2016, 501-518(2016).
[5] XU Q S, TAO W B. Multi-scale geometric consistency guided multi-view stereo[C], 5478-5487(15).
[6] ZHANG B X, YU Z M, YANG Q H. Research on 3D reconstruction of microscope imaging based on Harris-SIFT algorithm and full convolution depth prediction[J]. Opt. Precision Eng., 2022, 30(14): 1669-1681. (in Chinese). doi: 10.37188/OPE.20223014.1669
[7] JI M Q, GALL J, ZHENG H T et al. SurfaceNet: an End-to-End 3D neural network for multiview stereopsis[C], 2326-2334(22).
[9] AANÆS H, JENSEN R R, VOGIATZIS G et al. Large-scale data for multiple-view stereopsis[J]. International Journal of Computer Vision, 120, 153-168(2016).
[10] KNAPITSCH A, PARK J, ZHOU Q Y et al. Tanks and temples: benchmarking large-scale scene reconstruction[J]. ACM Transactions on Graphics, 36, 1-13.
[11] YAO Y, LUO Z X, LI S W et al.
[12] LI L Y, LI X Y, JIANG L Y et al. A review on deep learning techniques for cloud detection methodologies and challenges[J]. Signal, Image and Video Processing, 15, 1527-1535(2021).
[14] WANG X, WANG C, LIU B et al. Multi-view stereo in the deep learning era: a comprehensive review[J]. Displays, 70, 102102(2021).
[15] GALLUP D, FRAHM J M, MORDOHAI P et al. Real-time plane-sweeping stereo with multiple sweeping directions[C], 1-8(17).
[16] YAO Y, LUO Z X, LI S W et al. Recurrent MVSNet for high-resolution multi-view stereo depth inference[C], 5520-5529(15).
[17] CHEN R, HAN S F, XU J et al. Point-based multi-view stereo network[C], 1538-1547.
[18] XUE Y Z, CHEN J S, WAN W T et al. MVSCRF: learning multi-view stereo with conditional random fields[C], 4311-4320.
[19] GU X D, FAN Z W, ZHU S Y et al. Cascade cost volume for high-resolution multi-view stereo and stereo matching[C], 2492-2501(13).
[20] YANG J Y, MAO W, ALVAREZ J M et al. Cost volume pyramid based depth inference for multi-view stereo[C], 4876-4885(13).
[21] RONNEBERGER O, FISCHER P, BROX T.
[23] SHI Y F, XI J H, HU D W et al. RayMVSNet: learning ray-based 1D implicit fields for accurate multi-view stereo[C], 1-17(2023).
[24] YI H W, WEI Z Z, DING M Y et al. Pyramid Multi-View Stereo Net with Self-Adaptive View Aggregation[M]. Computer Vision - ECCV 2020, 766-782(2020).
[26] LIN T Y, DOLLÁR P, GIRSHICK R et al. Feature pyramid networks for object detection[C], 936-944(21).
[27] CHENG S, XU Z X, ZHU S L et al. Deep stereo using adaptive thin volume representation with uncertainty awareness[C], 2521-2531(13).
[28] LI Y, LI W Y, ZHAO Z J et al. DRI-MVSNet: a depth residual inference network for multi-view stereo images[J]. PLoS One, 17(2022).
[30] ZHANG K, LIU M Y, ZHANG J L et al. PA-MVSNet: sparse-to-dense multi-view stereo with pyramid attention[J]. IEEE Access, 9, 27908-27915(2021).
[31] YU A Z et al. Attention aware cost volume pyramid based multi-view stereo network for 3D reconstruction[J]. ISPRS Journal of Photogrammetry and Remote Sensing, 175, 448-460(2021).
[32] LI J J, BAI Z Y, CHENG W et al. Feature pyramid multi-view stereo network based on self-attention mechanism[C], 226-233(9).
[33] PARMAR N, RAMACHANDRAN P, VASWANI A et al. Stand-alone self-attention in vision models[C], 13(2019).
[34] ZHANG X D, HU Y T, WANG H C et al. Long-range attention network for multi-view stereo[C], 3781-3790(2021).
[35] LIU W J, WANG J K, QU H C et al. Hierarchical MVSNet with cost volume separation and fusion based on U-shape feature extraction[J]. Multimedia Systems, 29, 377-387(2023).
[36] PARK J, LEE J Y et al.
[37] CAO C, REN X, FU Y. MVSFormer: multi-view stereo with pre-trained vision transformers and temperature-based depth[J]. arXiv preprint arXiv:, 2022.
[39] SAEED S, LEE S, CHO Y et al. ASPPMVSNet: a high-receptive-field multiview stereo network for dense three-dimensional reconstruction[J]. ETRI Journal, 44, 1034-1046(2022).
[40] CHEN L C, PAPANDREOU G, KOKKINOS I et al. DeepLab: semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 40, 834-848(2018).
[41] WEI Z Z, ZHU Q T, MIN C et al. AA-RMVSNet: adaptive aggregation recurrent multi-view stereo network[C], 6167-6176(10).
[42] MASSON J E N, PETRY M R, COUTINHO D F et al. Deformable convolutions in multi-view stereo[J]. Image and Vision Computing, 118, 104369(2022).
[43] CHENG W, BAI Z Y, LI J J et al. ADIM-MVSNet: adaptive depth interval multi-view stereo network for 3d reconstruction[C], 281-287(2022).
[44] DAI J F, QI H Z, XIONG Y W et al. Deformable convolutional networks[C], 764-773(22).
[45] DING Y K, YUAN W T, ZHU Q T et al. TransMVSNet: global context-aware multi-view stereo network with transformers[C], 8575-8584(18).
[46] GIANG KT, SONG S. Curvature-guided dynamic scale networks for multi-view stereo[J]. arXiv preprint arXiv:, 2022.
[47] YAN J F, WEI Z Z, YI H W et al. Dense hybrid recurrent multi-view stereo net with dynamic consistency checking[C], 674-689(23).
[48] YU Z H, GAO S H. Fast-MVSNet: sparse-to-dense multi-view stereo with learned propagation and gauss-newton refinement[C], 1946-1955(13).
[49] CHEN R, HAN S F, XU J et al. Visibility-aware point-based multi-view stereo network[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 43, 3695-3708(2020).
[50] WEILHARTER R, FRAUNDORFER F. HighRes-MVSNet: a fast multi-view stereo network for dense 3D reconstruction from high-resolution images[J]. IEEE Access, 9, 11306-11315(2021).
[52] VASWANI A, SHAZEER N, PARMAR N et al. Attention is all you need[C], 6000-6010(9).
[53] WANG X F, ZHU Z, HUANG G et al.
[56] HE Y H, YAN R, FRAGKIADAKI K et al. Epipolar transformers[C], 7776-7785(13).
[57] CHEN P H, YANG H C, CHEN K W et al. MVSNet++: learning depth-based attention pyramid features for multi-view stereo[J]. IEEE Transactions on Image Processing, 29, 7261-7273(2020).
[58] BENGIO Y, LOURADOUR J, COLLOBERT R et al. Curriculum learning[C], 41-48(18).
[59] GUO X Y, YANG K, YANG W K et al. Group-wise correlation stereo network[C], 3268-3277(15).
[60] XU Q S, TAO W B. Learning inverse depth regression for multi-view stereo with correlation cost volume[J]. Proceedings of the AAAI Conference on Artificial Intelligence, 34, 12508-12515(2020).
[62] WANG F, GALLIANI S, VOGEL C et al. PatchmatchNet: learned multi-view patchmatch stereo[C], 14189-14198(20).
[63] SONG B, HU X, XIAO J et al. Implicit neural refinement based multi-view stereo network with adaptive correlation[J]. Image and Vision Computing, 124, 104511(2022).
[64] CAI Y C, LI L, WANG D et al. MFNet: Multi-level fusion aware feature pyramid based multi-view stereo network for 3D reconstruction[J]. Applied Intelligence, 53, 4289-4301(2023).
[65] GAO S Y, LI Z X, WANG Z Q. Cost Volume Pyramid Network with Multi-Strategies Range Searching for Multi-View Stereo[M]. Advances in Computer Graphics, 157-169(2022).
[66] LUO K Y, GUAN T, JU L L et al. P-MVSNet: learning patch-wise matching confidence aggregation for multi-view stereo[C], 10451-10460(2019).
[67] MA X J, GONG Y, WANG Q R et al. EPP-MVSNet: epipolar-assembling based depth prediction for multi-view stereo[C], 5712-5720(10).
[68] PENG R, WANG R J, WANG Z Y et al. Rethinking depth estimation for multi-view stereo: a unified representation[C], 8635-8644(18).
[69] XU H F, ZHANG J Y. AANet: adaptive aggregation network for efficient stereo matching[C], 1956-1965(13).
[70] SORMANN C, KNÖBELREITER P, KUHN A et al. BP-MVSNet: belief-propagation-layers for multi-view-stereo[C], 394-403(2021).
[71] QI Y, SU W, XU Q et al. Sparse prior guided deep multi-view stereo[J]. Computers & Graphics, 107, 1-9(2022).
[72] LIU J, JI S P. A novel recurrent encoder-decoder structure for large-scale multi-view stereo reconstruction from an open aerial dataset[C], 6049-6058(13).
[73] XU Q, OSWALD M R, TAO W et al. Non-local recurrent regularization networks for multi-view stereo[J]. arXiv preprint arXiv:, 2021.
[74] WANG F, GALLIANI S, VOGEL C et al. IterMVS: iterative probability estimation for efficient multi-view stereo[C], 8596-8605(18).
[75] MI Z X, DI C, XU D. Generalized binary search network for highly-efficient multi-view stereo[C], 12981-12990(18).
[76] LEE J Y, DEGOL J, ZOU C H et al. PatchMatch-RL: deep mvs with pixelwise depth, normal, and visibility[C], 6138-6147(10).
[77] YANG J Y, ALVAREZ J M, LIU M M. Non-parametric depth distribution modelling based depth inference for multi-view stereo[C], 8616-8624(18).
[78] WANG S Q, LI B, DAI Y C. Efficient multi-view stereo by iterative dynamic cost volume[C], 8645-8654(18).
[79] LI Y, ZHAO Z, FAN J et al. ADR-MVSNet: a cascade network for 3D point cloud reconstruction with pixel occlusion[J]. Pattern Recognition, 125, 108516(2022).
[80] LAFFERTY J, MCCALLUM A, PEREIRA FC. Conditional random fields: probabilistic models for segmenting and labeling sequence data[C](2001).
[81] ZHENG S, JAYASUMANA S, ROMERA-PAREDES B et al. Conditional random fields as recurrent neural networks[C], 1529-1537(7).
[82] KNÖBELREITER P, SORMANN C, SHEKHOVTSOV A et al. Belief propagation reloaded: learning BP-Layers for labeling problems[C], 7897-7906(13).
[83] LUO K Y, GUAN T, JU L L et al. Attention-aware multi-view stereo[C], 1587-1596(13).
[84] WEI Z Z, ZHU Q T, MIN C et al. Bidirectional hybrid LSTM based recurrent neural network for multi-view stereo[J]. IEEE Transactions on Visualization and Computer Graphics(2022).
[85] LIN T Y, GOYAL P, GIRSHICK R et al. Focal loss for dense object detection[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 42, 318-327(2020).
[86] ZHOU H Z, ZHAO H L, WANG Q et al. Miper-MVS: multi-scale iterative probability estimation with refinement for efficient multi-view stereo[J]. Neural Networks, 162, 502-515(2023).
[87] DING Y K, LI Z Y, HUANG D H et al. Enhancing Multi-View stereo with contrastive matching and weighted focal loss[C], 821-825(2022).
[89] KHOT T, AGRAWAL S, TULSIANI S et al. Learning unsupervised multi-view stereopsis via robust photometric consistency[J]. arXiv preprint arXiv:1905.02706, 2019.
[90] DAI Y C, ZHU Z D, RAO Z B et al. MVS2: Deep unsupervised multi-view stereo with multi-view symmetry[C], 1-8(2019).
[91] MALLICK A, STÜCKLER J, LENSCH H. Learning to adapt multi-view stereo by self-supervision[J]. arXiv preprint arXiv, 2020.
[92] FINN C, ABBEEL P, LEVINE S. Model-agnostic meta-learning for fast adaptation of deep networks[C], 1126-1135(11).
[93] HUANG B C, YI H W, HUANG C et al. M3VSNET: unsupervised multi-metric multi-view stereo network[C], 3163-3167(19).
[94] XU H B, ZHOU Z P, QIAO Y et al. Self-supervised multi-view stereo via effective co-segmentation and data-augmentation[J]. Proceedings of the AAAI Conference on Artificial Intelligence, 35, 3030-3038(2021).
[95] YANG J Y, ALVAREZ J M, LIU M M. Self-supervised learning of depth inference for multi-view stereo[C], 7522-7530(20).
[96] XU H B, ZHOU Z P, WANG Y L et al. Digging into uncertainty in self-supervised multi-view stereo[C], 6058-6067(10).
[97] QI S, SANG X, YAN B et al. Unsupervised multi-view stereo network based on multi-stage depth estimation[J]. Image and Vision Computing, 122, 104449(2022).
[98] DONG H, YAO J. PatchMVSNet: patch-wise unsupervised multi-view stereo for weakly-textured surface reconstruction[J]. arXiv preprint arXiv:, 2022.
[99] CHANG D, BOŽIČ A, ZHANG T et al.
[100] MILDENHALL B, SRINIVASAN P P, TANCIK M et al. NeRF: representing scenes as neural radiance fields for view synthesis[J]. Communications of the ACM, 65, 99-106(2022).
[101] CHEN A P, XU Z X, ZHAO F Q et al. MVSNeRF: fast generalizable radiance field reconstruction from multi-view stereo[C], 14104-14113(10).
[102] XU Q G, XU Z X, PHILIP J et al. Point-nerf: point-based neural radiance fields[C], 5428-5438(18).
[103] ZHANG J Z, JI M Q, WANG G Y et al. SurRF: unsupervised multi-view stereopsis by learning surface radiance field[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 44, 7912-7927(2022).
[105] DING Y K, ZHU Q T, LIU X Y et al.
[106] KOLTUN V et al. Adaptive surface reconstruction with multiscale convolutional kernels[C]. 2021 IEEE/CVF International Conference on Computer Vision (ICCV), 5631-5640(2021).
[107] SCHÖPS T, SATTLER T, POLLEFEYS M. BAD SLAM: bundle adjusted direct RGB-D SLAM[C], 134-144(15).
[108] YAO Y, LUO Z X, LI S W et al. BlendedMVS: a large-scale dataset for generalized multi-view stereo networks[C], 1787-1796(13).
[109] ZHANG J N, ZHANG J Z, MAO S et al. GigaMVS: a benchmark for ultra-large-scale gigapixel-level 3D reconstruction[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 44, 7534-7550(2022).
[110] MA Z Y, TEED Z, DENG J. Multiview Stereo with Cascaded Epipolar RAFT[M]. Lecture Notes in Computer Science, 734-750(2022).
[111] LI Z X, ZUO W M, WANG Z Q et al. Confidence-based large-scale dense multi-view stereo[J]. IEEE Transactions on Image Processing, 29, 7176-7191(2020).
[112] LI W, ZHU D, WANG Q. A single view leaf reconstruction method based on the fusion of ResNet and differentiable render in plant growth digital twin system[J]. Computers and Electronics in Agriculture, 193, 106712(2022).
[113] DENG X P, QIU S, JIN W Q et al. Three-dimensional reconstruction method for bionic compound-eye system based on MVSNet network[J]. Electronics, 11, 1790(2022).
[114] HAO W, ZHANG W J, LIANG W et al. Scene recognition for 3D point clouds: a review[J]. Opt. Precision Eng., 2022, 30(16): 1988-2005. (in Chinese). doi: 10.37188/OPE.20223016.1988
[115] EBNER T, FELDMANN I, RENAULT S et al. Multi-view reconstruction of dynamic real-world objects and their integration in augmented and virtual reality applications[J]. Journal of the Society for Information Display, 25, 151-157(2017).
[116] LI Z X, JIANG H, LIU Y Q et al. Research on multi-view stereo 3D reconstruction in virtual reality system of silk road cultural inheritance[J]. Chinese Journal of Computers, 2022, 45(3): 500-512. (in Chinese). doi: 10.11897/SP.J.1016.2022.00500
[117] YU J Y, XUE X K, CHEN C F et al. Three-dimensional reconstruction and disaster identification of highway slope using unmanned aerial vehicle-based oblique photography technique[J]. China Journal of Highway and Transport, 2022, 35(4): 77-86. (in Chinese). doi: 10.3969/j.issn.1001-7372.2022.04.005
[118] HU Z, HOU Y, TAO P et al. IMGTR: Image-triangle based multi-view 3D reconstruction for urban scenes[J]. ISPRS Journal of Photogrammetry and Remote Sensing, 181, 191-204(2021).
[119] ORSINGHER M, ZANI P, MEDICI P et al. Revisiting patchmatch multi-view stereo for urban 3D reconstruction[C], 190-196(4).
[120] ZHOU Y X, EIMEN R L, SEIBEL E J et al. Cost-efficient video synthesis and evaluation for development of virtual 3D endoscopy[J]. IEEE Journal of Translational Engineering in Health and Medicine, 9, 1-11(2021).
[121] HE D J, XIONG H T, LU Z Z et al. Predictive model of maize moisture stress during jointing stage based on multi-view stereo vision[J]. Transactions of the Chinese Society for Agricultural Machinery, 2020, 51(6): 248-257. (in Chinese). doi: 10.6041/j.issn.1000-1298.2020.06.026
[122] WANG S Q, ZHANG J Q, LI L Y et al. Application of MVSNet in 3D reconstruction of space objects[J]. Chinese Journal of Lasers, 2022, 49(23): 2310003. (in Chinese)
[123] GÓMEZ A, RANDALL G, FACCIOLO G et al. An Experimental comparison of multi-view stereo approaches on satellite images[C], 707-716(3).
[124] LU J C, LI Y X, ZUO Z C.
Huabiao YAN, Fangqi XU, Lü'er HUANG, Cibo LIU, Chuxin LIN. Review of multi-view stereo reconstruction methods based on deep learning[J]. Optics and Precision Engineering, 2023, 31(16): 2444
Category: Information Sciences
Received: Nov. 14, 2022
Accepted: --
Published Online: Sep. 5, 2023
The Author Email: Lü'er HUANG (9320080310@jxust.edu.cn)