Optoelectronics Letters, Volume 20, Issue 10, 599 (2024)
2SWUNet: small window SWinUNet based on transformer for building extraction from high-resolution remote sensing images
YU Jiamin, CHAN Sixian, LEI Yanjing, WU Wei, WANG Yuan, ZHOU Xiaolong. 2SWUNet: small window SWinUNet based on transformer for building extraction from high-resolution remote sensing images[J]. Optoelectronics Letters, 2024, 20(10): 599
Received: Aug. 29, 2023
Accepted: Apr. 3, 2024
Published Online: Sep. 20, 2024
The Author Email: Sixian CHAN (sxchan@zjut.edu.cn)