Journal of Optoelectronics · Laser, Volume 35, Issue 5, 525 (2024)

Text recognition algorithm based on multimodal iteration and correction

QIANG Guanchen1, ZHANG Lizhen1, YANG Qian1, XIONG Wei1,2,3,4,*, and LI Lirong1,2
Author Affiliations
  • 1[in Chinese]
  • 2[in Chinese]
  • 3[in Chinese]
  • 4[in Chinese]

    To address two weaknesses of scene text recognition, namely information loss in long-range modeling and weak feature representation for low-resolution text images, a text recognition algorithm based on multimodal iteration and correction is proposed. The visual model combines a contextual transformer network for visual recognition (CoTNet), a dynamic convolutional attention module (DCAM), an external attention encoder (EA-Encoder), and a positional attention mechanism. CoTNet effectively alleviates the information loss that arises in long-range modeling. The DCAM enhances the representation by focusing on essential features and passes the critical components to the EA-Encoder, strengthening the connection between CoTNet and the EA-Encoder. The EA-Encoder learns the most discriminative features over the entire dataset and captures the most semantically informative parts, further enhancing the representation. After the visual model, the text correction and fusion modules produce the final recognition result. Experiments show that the proposed algorithm performs well on several public scene text datasets, notably reaching an accuracy of 85.9% on the irregular-text dataset ICDAR2015.
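
The abstract says the EA-Encoder learns features over the entire dataset. In the generic formulation of external attention, this is done with two small learnable memories shared by all samples, in place of per-image keys and values. The sketch below illustrates that generic formulation only; it is not the authors' implementation. The function name `external_attention`, the memory names `mem_k` and `mem_v`, and all sizes are assumptions made for illustration.

```python
import math
import random


def external_attention(features, mem_k, mem_v):
    """Generic external attention (illustrative, not the paper's code).

    features: N x d token features for one image
    mem_k:    S x d shared "key" memory (hypothetical name)
    mem_v:    S x d shared "value" memory (hypothetical name)
    returns:  N x d refined features
    """
    n, s = len(features), len(mem_k)
    # Similarity of every token to every memory slot: N x S.
    attn = [[sum(f * k for f, k in zip(tok, key)) for key in mem_k]
            for tok in features]
    # Double normalization, step 1: softmax over tokens for each slot.
    for j in range(s):
        col_max = max(attn[i][j] for i in range(n))
        exps = [math.exp(attn[i][j] - col_max) for i in range(n)]
        total = sum(exps)
        for i in range(n):
            attn[i][j] = exps[i] / total
    # Step 2: L1-normalize over memory slots for each token.
    for i in range(n):
        row_sum = sum(attn[i]) + 1e-9
        attn[i] = [a / row_sum for a in attn[i]]
    # Read out from the value memory: (N x S) times (S x d).
    d = len(mem_v[0])
    return [[sum(attn[i][j] * mem_v[j][c] for j in range(s))
             for c in range(d)] for i in range(n)]


# Toy sizes chosen arbitrarily: 4 tokens, 3 channels, 2 memory slots.
random.seed(0)
features = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(4)]
mem_k = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(2)]
mem_v = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(2)]
out = external_attention(features, mem_k, mem_v)
```

Because the memories do not depend on the input, they would be trained once and reused for every image, which is one plausible reading of "features on the entire dataset". The cost is linear in the number of tokens rather than quadratic.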

    QIANG Guanchen, ZHANG Lizhen, YANG Qian, XIONG Wei, LI Lirong. Text recognition algorithm based on multimodal iteration and correction[J]. Journal of Optoelectronics · Laser, 2024, 35(5): 525

    Paper Information

    Received: Apr. 17, 2023

    Accepted: --

    Published Online: Sep. 24, 2024

    The Author Email: XIONG Wei (xw@mail.hbut.edu.cn)

    DOI:10.16136/j.joel.2024.05.0193
