Laser & Optoelectronics Progress, Volume 60, Issue 8, 0811010 (2023)

Research Progress on Binocular Stereo Vision Applications

Xiaoli Yang1,†, Yuhua Xu1,†,*, Lejia Ye1, Xin Zhao1, Fei Wang2, and Zhenzhong Xiao1,**
Author Affiliations
  • 1Orbbec Inc, Shenzhen 518062, Guangdong, China
  • 2Shenzhen Oxin Technology Co., Ltd., Shenzhen 518062, Guangdong, China
    References (136)

    [1] Scharstein D, Szeliski R. A taxonomy and evaluation of dense two-frame stereo correspondence algorithms[J]. International Journal of Computer Vision, 47, 7-42(2002).

    [2] Se S, Jasiobedzki P. Stereo-vision based 3D modeling and localization for unmanned vehicles[J]. International Journal of Intelligent Control and Systems, 13, 47-58(2008).

    [3] Geiger A, Lenz P, Urtasun R. Are we ready for autonomous driving? The KITTI vision benchmark suite[C], 3354-3361(2012).

    [4] Murray D, Little J J. Using real-time stereo vision for mobile robot navigation[J]. Autonomous Robots, 8, 161-171(2000).

    [5] Fan R, Jiao J H, Pan J et al. Real-time dense stereo embedded in a UAV for road inspection[C], 535-543(2020).

    [6] Li R X, Squyres S W, Arvidson R E et al. Initial results of rover localization and topographic mapping for the 2003 Mars exploration rover mission[J]. Photogrammetric Engineering & Remote Sensing, 71, 1129-1142(2005).

    [7] Wang B F, Zhou J L, Tang G S et al. Research on visual localization method of lunar rover[J]. Scientia Sinica: Informationis, 44, 452-460(2014).

    [8] Luo C C, Li Y M, Lin K M et al. Wavelet synthesis net for disparity estimation to synthesize DSLR calibre bokeh effect on smartphones[C], 2404-2412(2020).

    [9] Zhu D X. Research on location and crawling of workpiece based on binocular vision[J]. Computer Measurement & Control, 19, 92-94(2011).

    [10] Gong J Y, Ji S P. Photogrammetry and deep learning[J]. Acta Geodaetica et Cartographica Sinica, 47, 693-704(2018).

    [11] Gu C, Qian W X, Chen Q et al. Rapid head detection method based on binocular stereo vision[J]. Chinese Journal of Lasers, 41, 0108001(2014).

    [12] Bao W, Wang W, Xu Y H et al. InStereo2K: a large real dataset for stereo matching in indoor scenes[J]. Science China Information Sciences, 63, 212101(2020).

    [13] Xu H F, Zhang J Y. AANet: adaptive aggregation network for efficient stereo matching[C], 1956-1965(2020).

    [14] Teed Z, Deng J. RAFT: recurrent all-pairs field transforms for optical flow[M]. Vedaldi A, Bischof H, Brox T, et al. Computer vision-ECCV 2020, 12347, 402-419(2020).

    [15] Lipson L, Teed Z, Deng J. RAFT-Stereo: multilevel recurrent field transforms for stereo matching[C], 218-227(2022).

    [16] Li J K, Wang P S, Xiong P F et al. Practical stereo matching via cascaded recurrent network with adaptive correlation[C], 16242-16251(2022).

    [17] Xu G W, Cheng J D, Guo P et al. Attention concatenation volume for accurate and efficient stereo matching[C], 12971-12980(2022).

    [18] Shen Z L, Dai Y C, Rao Z B. CFNet: cascade and fused cost volume for robust stereo matching[C], 13901-13910(2021).

    [20] Xu H F, Zhang J, Cai J F et al. GMFlow: learning optical flow via global matching[C], 8111-8120(2022).

    [21] Menze M, Heipke C, Geiger A. Joint 3D estimation of vehicles and scene flow[J]. ISPRS Annals of the Photogrammetry, Remote Sensing and Spatial Information Sciences, II-3/W5, 427-434(2015).

    [22] Scharstein D, Hirschmüller H, Kitajima Y et al. High-resolution stereo datasets with subpixel-accurate ground truth[M]. Jiang X Y, Hornegger J, Koch R. Pattern recognition, 8753, 31-42(2014).

    [23] Zhang F H, Prisacariu V, Yang R G et al. GA-Net: guided aggregation net for end-to-end stereo matching[C], 185-194(2020).

    [24] Chang J R, Chen Y S. Pyramid stereo matching network[C], 5410-5418(2018).

    [25] Xu B, Xu Y H, Yang X L et al. Bilateral grid learning for stereo matching networks[C], 12492-12501(2021).

    [26] Birchfield S, Tomasi C. A pixel dissimilarity measure that is insensitive to image sampling[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 20, 401-406(1998).

    [27] Mei X, Sun X, Zhou M C et al. On building an accurate stereo matching system on graphics hardware[C], 467-474(2012).

    [28] Hirschmüller H. Stereo processing by semiglobal matching and mutual information[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 30, 328-341(2008).

    [29] Hirschmüller H, Scharstein D. Evaluation of cost functions for stereo matching[C](2007).

    [30] Bai M, Zhuang Y, Wang W. Progress in binocular stereo matching algorithms[J]. Control and Decision, 23, 721-729(2008).

    [31] Hong L, Chen G. Segment-based stereo matching using graph cuts[C](2004).

    [32] Zhang L T, Qu D K, Xu F. An improved stereo matching algorithm based on graph cuts[J]. Robot, 32, 104-108(2010).

    [33] Sun J, Zheng N N, Shum H Y. Stereo matching using belief propagation[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 25, 787-800(2003).

    [34] Yang Q X, Wang L, Yang R G et al. Stereo matching with color-weighted correlation, hierarchical belief propagation, and occlusion handling[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 31, 492-504(2009).

    [35] Boykov Y, Veksler O, Zabih R. Fast approximate energy minimization via graph cuts[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 23, 1222-1239(2001).

    [36] Yoon K J, Kweon I S. Adaptive support-weight approach for correspondence search[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 28, 650-656(2006).

    [37] Tomasi C, Manduchi R. Bilateral filtering for gray and color images[C], 839-846(2002).

    [38] Wang L, Liao M, Gong M L et al. High-quality real-time stereo using adaptive cost aggregation and dynamic programming[C], 798-805(2007).

    [39] He K M, Sun J, Tang X O. Guided image filtering[M]. Daniilidis K, Maragos P, Paragios N. Computer vision-ECCV 2010, 6311, 1-14(2010).

    [40] Hosni A, Rhemann C, Bleyer M et al. Fast cost-volume filtering for visual correspondence and beyond[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35, 504-511(2012).

    [41] Zhang K, Lu J B, Lafruit G. Cross-based local stereo matching using orthogonal integral images[J]. IEEE Transactions on Circuits and Systems for Video Technology, 19, 1073-1079(2009).

    [42] Keselman L, Woodfill J I, Grunnet-Jepsen A et al. Intel(R) RealSense(TM) stereoscopic depth cameras[C], 1267-1276(2017).

    [43] Bleyer M, Rhemann C, Rother C. PatchMatch stereo: stereo matching with slanted support windows[C], 1-11(2011).

    [44] Barnes C, Shechtman E, Finkelstein A et al. PatchMatch: a randomized correspondence algorithm for structural image editing[J]. ACM Transactions on Graphics, 28, 1-11(2009).

    [45] Yang Q X. A non-local cost aggregation method for stereo matching[C], 1402-1409(2012).

    [46] Yang Q X. Stereo matching using tree filtering[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 37, 834-846(2015).

    [47] Li L C, Yu X, Zhang S L et al. 3D cost aggregation with multiple minimum spanning trees for stereo matching[J]. Applied Optics, 56, 3411-3420(2017).

    [48] Zbontar J, LeCun Y. Stereo matching by training a convolutional neural network to compare image patches[J]. Journal of Machine Learning Research, 17, 2287-2318(2016).

    [49] Zagoruyko S, Komodakis N. Learning to compare image patches via convolutional neural networks[C], 4353-4361(2015).

    [50] Park H, Lee K M. Look wider to match image patches with convolutional neural networks[J]. IEEE Signal Processing Letters, 24, 1788-1792(2017).

    [51] Shaked A, Wolf L. Improved stereo matching with constant highway networks and reflective confidence learning[C], 6901-6910(2017).

    [52] Seki A, Pollefeys M. SGM-Nets: semi-global matching with neural networks[C], 6640-6649(2017).

    [53] Knöbelreiter P, Reinbacher C, Shekhovtsov A et al. End-to-end training of hybrid CNN-CRF models for stereo[C], 1456-1465(2017).

    [54] Gidaris S, Komodakis N. Detect, replace, refine: deep structured prediction for pixel-wise labeling[C], 7187-7196(2017).

    [55] Güney F, Geiger A. Displets: resolving stereo ambiguities using object knowledge[C], 4165-4175(2015).

    [56] Mayer N, Ilg E, Häusser P et al. A large dataset to train convolutional networks for disparity, optical flow, and scene flow estimation[C], 4040-4048(2016).

    [57] Vaswani A, Shazeer N, Parmar N et al. Attention is all you need[C], 6000-6010(2017).

    [58] Wang X L, Girshick R, Gupta A et al. Non-local neural networks[C], 7794-7803(2018).

    [60] Liu Z, Lin Y T, Cao Y et al. Swin transformer: hierarchical vision transformer using shifted windows[C], 9992-10002(2022).

    [61] Dosovitskiy A, Fischer P, Ilg E et al. FlowNet: learning optical flow with convolutional networks[C], 2758-2766(2016).

    [62] Liang Z F, Feng Y L, Guo Y L et al. Learning for disparity estimation through feature constancy[C], 2811-2820(2018).

    [63] Pang J H, Sun W X, Ren J S et al. Cascade residual learning: a two-stage convolutional neural network for stereo matching[C], 878-886(2018).

    [64] Song X, Zhao X, Fang L J et al. EdgeStereo: an effective multi-task learning network for stereo matching and edge detection[J]. International Journal of Computer Vision, 128, 910-930(2020).

    [65] Yang G R, Zhao H S, Shi J P et al. SegStereo: exploiting semantic information for disparity estimation[M]. Ferrari V, Hebert M, Sminchisescu C, et al. Computer vision-ECCV 2018, 11211, 660-676(2018).

    [66] Dai J F, Qi H Z, Xiong Y W et al. Deformable convolutional networks[C], 764-773(2017).

    [67] Tankovich V, Häne C, Zhang Y D et al. HITNet: hierarchical iterative tile refinement network for real-time stereo matching[C], 14357-14367(2021).

    [68] Badki A, Troccoli A, Kim K et al. Bi3D: stereo depth estimation via binary classifications[C], 1597-1605(2020).

    [69] Tosi F, Liao Y Y, Schmitt C et al. SMD-Nets: stereo mixture density networks[C], 8938-8948(2021).

    [70] Kendall A, Martirosyan H, Dasgupta S et al. End-to-end learning of geometry and context for deep stereo regression[C], 66-75(2017).

    [71] Guo X Y, Yang K, Yang W K et al. Group-wise correlation stereo network[C], 3268-3277(2020).

    [72] Khamis S, Fanello S, Rhemann C et al. StereoNet: guided hierarchical refinement for real-time edge-aware depth prediction[M]. Ferrari V, Hebert M, Sminchisescu C, et al. Computer vision-ECCV 2018, 11219, 596-613(2018).

    [73] Duggal S, Wang S L, Ma W C et al. DeepPruner: learning efficient stereo matching via differentiable PatchMatch[C], 4383-4392(2020).

    [74] Zhang F H, Qi X J, Yang R G et al. Domain-invariant stereo matching networks[M]. Vedaldi A, Bischof H, Brox T, et al. Computer vision-ECCV 2020, 12347, 420-439(2020).

    [75] Yao C T, Jia Y D, Di H J et al. A decomposition model for stereo matching[C], 6087-6096(2021).

    [76] He K M, Zhang X Y, Ren S Q et al. Deep residual learning for image recognition[C], 770-778(2016).

    [77] Li Z S, Liu X T, Drenkow N et al. Revisiting stereo depth estimation from a sequence-to-sequence perspective with transformers[C], 6177-6186(2022).

    [78] Zhang Y M, Chen Y M, Bai X et al. Adaptive unimodal cost volume filtering for deep stereo matching[J]. Proceedings of the AAAI Conference on Artificial Intelligence, 34, 12926-12934(2020).

    [79] Yu J J, Harley A W, Derpanis K G. Back to basics: unsupervised learning of optical flow via brightness constancy and motion smoothness[M]. Hua G, Jégou H. Computer vision-ECCV 2016 workshops, 9915, 3-10(2016).

    [80] Ahmadi A, Patras I. Unsupervised convolutional neural networks for motion estimation[C], 1629-1633(2016).

    [81] Flynn J, Neulander I, Philbin J et al. DeepStereo: learning to predict new views from the world’s imagery[C], 5515-5524(2016).

    [82] Xie J Y, Girshick R, Farhadi A. Deep3D: fully automatic 2D-to-3D video conversion with deep convolutional neural networks[M]. Leibe B, Matas J, Sebe N, et al. Computer vision-ECCV 2016, 9908, 842-857(2016).

    [83] Garg R, B G V K, Carneiro G et al. Unsupervised CNN for single view depth estimation: geometry to the rescue[M]. Leibe B, Matas J, Sebe N, et al. Computer vision-ECCV 2016, 9912, 740-756(2016).

    [84] Ren Z, Yan J C, Ni B B et al. Unsupervised deep learning for optical flow estimation[C], 1495-1501(2017).

    [85] Godard C, Mac Aodha O, Brostow G J. Unsupervised monocular depth estimation with left-right consistency[C], 6602-6611(2017).

    [86] Tonioni A, Poggi M, Mattoccia S et al. Unsupervised adaptation for deep stereo[C], 1614-1622(2017).

    [87] Zabih R, Woodfill J. Non-parametric local transforms for computing visual correspondence[M]. Eklundh J O. Computer vision-ECCV '94, 801, 151-158(1994).

    [88] Poggi M, Mattoccia S. Learning from scratch a confidence measure[C], 1-13(2016).

    [89] Kuznietsov Y, Stückler J, Leibe B. Semi-supervised deep learning for monocular depth map prediction[C], 2215-2223(2017).

    [90] Zhou C, Zhang H, Shen X Y et al. Unsupervised learning of stereo matching[C], 1576-1584(2017).

    [91] Aleotti F, Tosi F, Zhang L et al. Reversing the cycle: self-supervised deep stereo through enhanced monocular distillation[M]. Vedaldi A, Bischof H, Brox T, et al. Computer vision-ECCV 2020, 12356, 614-632(2020).

    [92] Wang Y, Lai Z H, Huang G et al. Anytime stereo image depth estimation on mobile devices[C], 5893-5900(2019).

    [93] Yee K, Chakrabarti A. Fast deep stereo with 2D convolutional processing of cost signatures[C], 183-191(2020).

    [94] Shamsafar F, Woerz S, Rahim R et al. MobileStereoNet: towards lightweight deep networks for stereo matching[C], 677-686(2022).

    [96] Sandler M, Howard A, Zhu M L et al. MobileNetV2: inverted residuals and linear bottlenecks[C], 4510-4520(2018).

    [97] LeCun Y, Denker J, Solla S. Optimal brain damage[M](1989).

    [98] Hassibi B, Stork D G, Wolff G et al. Optimal brain surgeon: extensions and performance comparisons[C], 263-270(1993).

    [100] Han S, Liu X Y, Mao H Z et al. EIE: efficient inference engine on compressed deep neural network[C], 243-254(2016).

    [101] Anwar S, Sung W. Coarse pruning of convolutional neural networks with random masks[M](2017).

    [103] Liu B Y, Wang M, Foroosh H et al. Sparse convolutional neural networks[C], 806-814(2015).

    [104] Liu Z, Li J G, Shen Z Q et al. Learning efficient convolutional networks through network slimming[C], 2755-2763(2017).

    [107] Luo J H, Wu J X, Lin W Y. ThiNet: a filter level pruning method for deep neural network compression[C], 5068-5076(2017).

    [108] Yu R C, Li A, Chen C F et al. NISP: pruning networks using neuron importance score propagation[C], 9194-9203(2018).

    [111] He Y H, Lin J, Liu Z J et al. AMC: AutoML for model compression and acceleration on mobile devices[M]. Ferrari V, Hebert M, Sminchisescu C, et al. Computer vision-ECCV 2018, 11211, 815-832(2018).

    [112] Yang T J, Howard A, Chen B et al. NetAdapt: platform-aware neural network adaptation for mobile applications[M]. Ferrari V, Hebert M, Sminchisescu C, et al. Computer vision-ECCV 2018, 11214, 289-304(2018).

    [114] Jacob B, Kligys S, Chen B et al. Quantization and training of neural networks for efficient integer-arithmetic-only inference[C], 2704-2713(2018).

    [115] Banner R, Nahshan Y, Hoffer E. Post training 4-bit quantization of convolutional networks for rapid-deployment[C], 7948-7956(2019).

    [117] Gong J, Shen H H, Zhang G M et al. Highly efficient 8-bit low precision inference of convolutional neural networks with IntelCaffe[C], 1-2(2018).

    [118] Lin X F, Zhao C, Pan W. Towards accurate binary convolutional neural network[C], 344-352(2017).

    [120] Chmiel B, Banner R, Shomron G et al. Robust quantization: one model to rule them all[C], 5308-5317(2020).

    [123] Cao Z J, Long M S, Wang J M et al. HashNet: deep learning to hash by continuation[C], 5609-5618(2017).

    [124] Hwang K, Sung W. Fixed-point feedforward deep neural network design using weights 1, 0, and -1[C](2014).

    [127] Zhang S J, Du Z D, Zhang L et al. Cambricon-X: an accelerator for sparse neural networks[C](2016).

    [128] Chen T S, Du Z D, Sun N H et al. DianNao: a small-footprint high-throughput accelerator for ubiquitous machine-learning[J]. ACM SIGARCH Computer Architecture News, 42, 269-284(2014).

    [129] Scharstein D, Szeliski R. High-accuracy stereo depth maps using structured light[C](2003).

    [130] Ramirez P Z, Tosi F, Poggi M et al. Open challenges in deep stereo: the Booster dataset[C], 21136-21146(2022).

    [131] Yang G R, Song X, Huang C Q et al. DrivingStereo: a large-scale dataset for stereo matching in autonomous driving scenarios[C], 899-908(2020).

    [133] Schöps T, Schönberger J L, Galliani S et al. A multi-view stereo benchmark with high-resolution images and multi-camera videos[C], 2538-2547(2017).

    [134] Tremblay J, To T, Birchfield S. Falling things: a synthetic dataset for 3D object detection and pose estimation[C], 2119-21193(2018).

    [135] Yang G S, Manela J, Happold M et al. Hierarchical deep stereo matching on high-resolution images[C], 5510-5519(2020).

    [136] Huang X Y, Cheng X J, Geng Q C et al. The ApolloScape dataset for autonomous driving[C], 1067-10676(2018).

    Paper Information

    Category: Imaging Systems

    Received: Jan. 5, 2023

    Accepted: Feb. 22, 2023

    Published Online: Apr. 17, 2023

    Author emails: Yuhua Xu (xyh_nudt@163.com), Zhenzhong Xiao (xiaozhenzhong@orbbec.com)

    DOI: 10.3788/LOP230457
