Journal of Optoelectronics · Laser, Volume 35, Issue 5, 525 (2024)

Text recognition algorithm based on multimodal iteration and correction

QIANG Guanchen1, ZHANG Lizhen1, YANG Qian1, XIONG Wei1,2,3,4,*, and LI Lirong1,2
Author Affiliations
  • 1[in Chinese]
  • 2[in Chinese]
  • 3[in Chinese]
  • 4[in Chinese]
    References(28)

    [1] HENG H R,LI P J,GUAN T X,et al.Scene text recognition via context modeling for low-quality image in logistics industry[J].Complex & Intelligent Systems,2023,9:3229-3248.

    [2] ZHANG C S,DING W P,PENG G W,et al.Street view text recognition with deep learning for urban scene understanding in intelligent transportation systems[J].IEEE Transactions on Intelligent Transportation Systems,2021,22(7):4727-4743.

    [3] GUO Z Y,HUANG Y P,HU X,et al.A survey on deep learning based approaches for scene understanding in autonomous driving[J].Electronics,2021,10(4):471.

    [4] PHAN T Q,SHIVAKUMARA P,TIAN S,et al.Recognizing text with perspective distortion in natural scenes[C]//2013 14th IEEE International Conference on Computer Vision,ICCV 2013,December 1-8,2013,Sydney,NSW,Australia.New York:IEEE,2013:569-576.

    [5] YAO C,BAI X,SHI B,et al.Strokelets:A learned multi-scale representation for scene text recognition[C]//27th IEEE Conference on Computer Vision and Pattern Recognition,CVPR 2014,June 23-28,2014,Columbus,OH,United States.New York:IEEE,2014:4042-4049.

    [6] GORDO A.Supervised mid-level features for word image representation[C]//IEEE Conference on Computer Vision and Pattern Recognition,CVPR 2015,June 7-12,2015,Boston,MA,United States.New York:IEEE,2015:2956-2964.

    [7] MISHRA A,ALAHARI K,JAWAHAR C V.Scene text recognition using higher order language priors[C]//2012 23rd British Machine Vision Conference,BMVC 2012,September 3-7,2012,Guildford,Surrey,United Kingdom.Durham:BMVA Press,2012:11.

    [8] GOODFELLOW I J,WARDE-FARLEY D,MIRZA M,et al.Maxout networks[C]//30th International Conference on Machine Learning,ICML 2013,June 16-21,2013,Atlanta,GA,United States.JMLR,2013:1319-1327.

    [9] LUO C,LIN Q,LIU Y,et al.Separating content from style using adversarial learning for recognizing text in the wild[J].International Journal of Computer Vision,2021,129(4):960-976.

    [10] WANG W,XIE E,LIU X,et al.Scene text image super-resolution in the wild[C]//16th European Conference on Computer Vision,ECCV 2020,August 23-28,2020,Glasgow,United Kingdom.Cham:Springer,2020:650-666.

    [11] YANG M,GUAN Y,LIAO M,et al.Symmetry-constrained rectification network for scene text recognition[C]//17th IEEE/CVF International Conference on Computer Vision,ICCV 2019,October 27-November 2,2019,Seoul,Korea.New York:IEEE,2019:9146-9155.

    [12] ZHANG H,YAO Q,YANG M,et al.Autostr:Efficient backbone search for scene text recognition[C]//16th European Conference on Computer Vision,ECCV 2020,August 23-28,2020,Glasgow,United Kingdom.Cham:Springer,2020:751-767.

    [13] LIAO M,LYU P,HE M,et al.Mask textspotter:An end-to-end trainable neural network for spotting text with arbitrary shapes[J].IEEE Transactions on Pattern Analysis and Machine Intelligence,2021,43(2):532-548.

    [14] YU D,LI X,ZHANG C,et al.Towards accurate scene text recognition with semantic reasoning networks[C]//2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition,CVPR 2020,June 14-19,2020,Virtual,Online.New York:IEEE,2020:12110-12119.

    [15] QIAO Z,ZHOU Y,YANG D,et al.Seed:Semantics enhanced encoder-decoder framework for scene text recognition[C]//2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition,CVPR 2020,June 14-19,2020,Virtual,Online.New York:IEEE,2020:13525-13534.

    [16] WAN Z,XIE F,LIU Y,et al.2D-CTC for scene text recognition[EB/OL].(2019-07-23)[2023-04-17].https://arxiv.org/abs/1907.09705v1.

    [17] BAHDANAU D,CHO K,BENGIO Y.Neural machine translation by jointly learning to align and translate[C]//3rd International Conference on Learning Representations,ICLR 2015,May 7-9,2015,San Diego,CA,United States,2015.

    [18] YUE X,KUANG Z,LIN C,et al.Robustscanner:Dynamically enhancing positional clues for robust text recognition[C]//16th European Conference on Computer Vision,ECCV 2020,August 23-28,2020,Glasgow,United Kingdom.Cham:Springer,2020:135-151.

    [19] KE W J,WEI J G,HOU Q Z,et al.Rethinking text rectification for scene text recognition[J].Expert Systems with Applications:An International Journal,2023,219:119647.

    [20] HE K,ZHANG X,REN S,et al.Deep residual learning for image recognition[C]//29th IEEE Conference on Computer Vision and Pattern Recognition,CVPR 2016,June 26-July 1,2016,Las Vegas,NV,United States.New York:IEEE,2016:770-778.

    [21] LI Y H,YAO T,PAN Y W,et al.Contextual transformer networks for visual recognition[J].IEEE Transactions on Pattern Analysis and Machine Intelligence,2023,45(2):1489-1500.

    [22] SUNKARA R,LUO T.No more strided convolutions or pooling:a new CNN building block for low-resolution images and small objects[EB/OL].(2022-08-07)[2023-04-17].https://arxiv.org/abs/2208.03641v1.

    [23] VASWANI A,SHAZEER N,PARMAR N,et al.Attention is all you need[C]//31st International Conference on Neural Information Processing Systems,NIPS 2017,December 4-9,2017,Long Beach,CA,United States.Red Hook:Curran Associates Inc.,2017:6000-6010.

    [24] JADERBERG M,SIMONYAN K,VEDALDI A,et al.Reading text in the wild with convolutional neural networks[J].International Journal of Computer Vision,2016,116(1):1-20.

    [25] GUPTA A,VEDALDI A,ZISSERMAN A.Synthetic data for text localisation in natural images[C]//Proceedings of the 29th IEEE Conference on Computer Vision and Pattern Recognition,CVPR 2016,June 26-July 1,2016,Las Vegas,NV,United States.New York:IEEE,2016:2315-2324.

    [26] KARATZAS D,SHAFAIT F,UCHIDA S,et al.ICDAR 2013 robust reading competition[C]//12th International Conference on Document Analysis and Recognition,ICDAR 2013,August 25-28,2013,Washington,DC,United States.New York:IEEE,2013:1484-1493.

    [27] KARATZAS D,GOMEZ-BIGORDA L,NICOLAOU A,et al.ICDAR 2015 competition on robust reading[C]//Proceedings of the 13th International Conference on Document Analysis and Recognition,ICDAR 2015,August 23-26,2015,Tunis,Tunisia.New York:IEEE,2015:1156-1160.

    [28] WANG K,BABENKO B,BELONGIE S.End-to-end scene text recognition[C]//2011 IEEE International Conference on Computer Vision,ICCV 2011,November 6-13,2011,Barcelona,Spain.New York:IEEE,2011:1457-1464.

    QIANG Guanchen, ZHANG Lizhen, YANG Qian, XIONG Wei, LI Lirong. Text recognition algorithm based on multimodal iteration and correction[J]. Journal of Optoelectronics · Laser, 2024, 35(5): 525

    Paper Information

    Received: Apr. 17, 2023

    Accepted: --

    Published Online: Sep. 24, 2024

    The Author Email: XIONG Wei (xw@mail.hbut.edu.cn)

    DOI:10.16136/j.joel.2024.05.0193
