Optoelectronics Letters, Volume 20, Issue 10, 599 (2024)

2SWUNet: small window SWinUNet based on transformer for building extraction from high-resolution remote sensing images

Jiamin YU1, Sixian CHAN1,2,3,*, Yanjing LEI1, Wei WU1,3, Yuan WANG1, and Xiaolong ZHOU4
Author Affiliations
  • 1College of Computer Science and Technology, Zhejiang University of Technology, Hangzhou 310023, China
  • 2Key Laboratory of Meteorological Disaster (KLME), Ministry of Education & Collaborative Innovation Center on Forecast and Evaluation of Meteorological Disasters (CIC-FEMD), Nanjing University of Information Science & Technology, Nanjing 210044, China
  • 3College of Geographic Information Modern Industry, Zhejiang University of Technology, Hangzhou 310023, China
  • 4College of Electrical and Information Engineering, Quzhou University, Quzhou 324000, China
    YU Jiamin, CHAN Sixian, LEI Yanjing, WU Wei, WANG Yuan, ZHOU Xiaolong. 2SWUNet: small window SWinUNet based on transformer for building extraction from high-resolution remote sensing images[J]. Optoelectronics Letters, 2024, 20(10): 599

    Paper Information

    Received: Aug. 29, 2023

    Accepted: Apr. 3, 2024

    Published Online: Sep. 20, 2024

    The Author Email: Sixian CHAN (sxchan@zjut.edu.cn)

    DOI:10.1007/s11801-024-3179-1
