Opto-Electronic Engineering, Volume 50, Issue 4, 220246 (2023)
STransMNet: a stereo matching method with Swin Transformer fusion
Gaoping Wang, Xun Li, Xuefang Jia, Zhewen Li, Wenjie Wang. STransMNet: a stereo matching method with Swin Transformer fusion[J]. Opto-Electronic Engineering, 2023, 50(4): 220246
Category: Article
Received: Oct. 8, 2022
Accepted: Jan. 19, 2023
Published Online: Jun. 15, 2023
Author Email: Xun Li (lixun@xpu.edu.cn)