Journal of Optoelectronics · Laser, Volume 35, Issue 6, p. 570 (2024)

Scene text detection based on dual attention and multi-scale feature fusion

QIANG Guanchen¹, YANG Qian¹, ZHANG Lizhen¹, XIONG Wei¹,²,³,⁴,*, and LI Lirong¹,²
Author Affiliations
  • ¹School of Electrical and Electronic Engineering, Hubei University of Technology, Wuhan, Hubei 430068, China
  • ²Hubei Key Laboratory of Solar Energy Efficient Utilization and Energy Storage Operation Control, Hubei University of Technology, Wuhan, Hubei 430068, China
  • ³Hubei Engineering Research Center for Safety Monitoring of New Energy and Power Grid Equipment, Hubei University of Technology, Wuhan, Hubei 430068, China
  • ⁴Department of Computer Science and Engineering, University of South Carolina, Columbia, South Carolina 29201, USA

    QIANG Guanchen, YANG Qian, ZHANG Lizhen, XIONG Wei, LI Lirong. Scene text detection based on dual attention and multi-scale feature fusion[J]. Journal of Optoelectronics · Laser, 2024, 35(6): 570

    Paper Information


    Received: Sep. 4, 2023

    Accepted: Dec. 13, 2024

    Published Online: Dec. 13, 2024

    Corresponding Author Email: XIONG Wei (xw@mail.hbut.edu.cn)

    DOI: 10.16136/j.joel.2024.06.0468
