Text recognition algorithm based on multimodal iteration and correction

QIANG Guanchen, ZHANG Lizhen, YANG Qian, XIONG Wei, LI Lirong. Text recognition algorithm based on multimodal iteration and correction[J]. Journal of Optoelectronics · Laser, 2024, 35(5): 525

Paper Information

Received: Apr. 17, 2023

Accepted: --

Published Online: Sep. 24, 2024

The Author Email: XIONG Wei (xw@mail.hbut.edu.cn)

DOI:10.16136/j.joel.2024.05.0193
