Opto-Electronic Engineering, Volume 50, Issue 4, 220246 (2023)

STransMNet: a stereo matching method with swin transformer fusion

Gaoping Wang1, Xun Li1,2,*, Xuefang Jia1, Zhewen Li1 and Wenjie Wang1
Author Affiliations
  • 1School of Electronics and Information, Xi'an Polytechnic University, Xi'an, Shaanxi 710600, China
  • 2Xi'an Polytechnic University Branch of Shaanxi Artificial Intelligence Joint Laboratory, Xi'an, Shaanxi 710600, China


Citation: Gaoping Wang, Xun Li, Xuefang Jia, Zhewen Li, Wenjie Wang. STransMNet: a stereo matching method with swin transformer fusion[J]. Opto-Electronic Engineering, 2023, 50(4): 220246

    Paper Information

    Category: Article

    Received: Oct. 8, 2022

    Accepted: Jan. 19, 2023

    Published Online: Jun. 15, 2023

Author Email: Xun Li (lixun@xpu.edu.cn)

DOI: 10.12086/oee.2023.220246
