Journal of Optoelectronics · Laser, Volume 35, Issue 5, 525 (2024)
Text recognition algorithm based on multimodal iteration and correction
Scene text recognition is prone to information loss when modeling long-range dependencies and produces weak feature representations for low-resolution text images. To address these problems, a text recognition algorithm based on multimodal iteration and correction is proposed. The visual model of the algorithm combines contextual transformer networks for visual recognition (CoTNet), a dynamic convolutional attention module (DCAM), an external attention encoder (EA-Encoder), and a positional attention mechanism. CoTNet effectively alleviates the information loss that arises from long-distance modeling. The DCAM enhances representation by focusing on the essential features and passing the critical components to the EA-Encoder, which strengthens the connection between CoTNet and the EA-Encoder. The EA-Encoder learns the most discriminative features across the entire dataset and captures the most semantically informative parts, further enhancing representation. After the visual model, the text correction and fusion modules produce the final recognition result. Experiments show that the proposed algorithm performs well on several public scene text datasets, most notably reaching an accuracy of 85.9% on the irregular-text dataset ICDAR2015.
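The abstract does not specify how the EA-Encoder is built, so the following is only a minimal sketch of the general external-attention idea it refers to: token features attend to two small learnable memories shared across all samples, which is what allows dataset-level features to be learned. The function name, shapes, and the double-normalization step are assumptions taken from the commonly used formulation of external attention, not from this paper.

```python
import numpy as np

def external_attention(features, mem_k, mem_v, eps=1e-9):
    """Hypothetical single-head external attention (illustrative only).

    features: (n, d) token features from the visual backbone
    mem_k, mem_v: (s, d) external memories shared across the dataset
    returns: (n, d) refined features
    """
    scores = features @ mem_k.T                       # (n, s) token-to-memory similarity
    scores = scores - scores.max(axis=0, keepdims=True)  # numerical stability
    attn = np.exp(scores)
    attn = attn / attn.sum(axis=0, keepdims=True)     # softmax over tokens
    attn = attn / (attn.sum(axis=1, keepdims=True) + eps)  # L1-normalize over memory slots
    return attn @ mem_v                               # (n, d)

# Example: 5 tokens of dimension 8 attending to 4 memory slots.
rng = np.random.default_rng(0)
tokens = rng.normal(size=(5, 8))
memory_k = rng.normal(size=(4, 8))
memory_v = rng.normal(size=(4, 8))
refined = external_attention(tokens, memory_k, memory_v)
```

Because the memories are independent of the input, the cost is linear in the number of tokens, unlike self-attention, where every token attends to every other token.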
QIANG Guanchen, ZHANG Lizhen, YANG Qian, XIONG Wei, LI Lirong. Text recognition algorithm based on multimodal iteration and correction[J]. Journal of Optoelectronics · Laser, 2024, 35(5): 525
Received: Apr. 17, 2023
Accepted: --
Published Online: Sep. 24, 2024
The Author Email: XIONG Wei (xw@mail.hbut.edu.cn)