Laser & Optoelectronics Progress, Vol. 58, Issue 2, 0210008 (2021)

Text Line Segmentation of Tibetan Historical Documents Based on Text Core Regions Combined with Expansion Growth

Jincheng Li1, Xiaojuan Wang2, Weilan Wang1,*, Qiang Lin2, and Pengfei Hu2
Author Affiliations
  • 1Key Laboratory of China's Ethnic Languages and Information Technology of Ministry of Education, Northwest Minzu University, Lanzhou, Gansu 730030, China
  • 2College of Mathematics and Computer Science, Northwest Minzu University, Lanzhou, Gansu 730030, China

    In Tibetan historical document images, adjacent text lines often touch or overlap, which makes text line segmentation difficult. We propose a line segmentation method for Tibetan historical document images that combines text core regions with expansion growth. First, connected components that are not syllable points are removed according to the area and roundness of the connected components in the binarized document image, yielding a syllable point image. Second, horizontal projection of the syllable point image and vertical projection of the binarized original image give the range of the text line baselines and the number of text lines, from which the text core regions are generated. The text core regions are then combined with the binarized original image by a pixel-wise OR operation to obtain the pseudo-text connected regions. Finally, a breadth-first search grows the text core regions into the pseudo-text connected regions, yielding the pseudo-text-line connected regions. Non-text regions are removed to obtain the pseudo-text lines, and the final text lines are obtained by an algorithm that assigns broken strokes to the lines they belong to. The experimental results show that the proposed method segments text lines relatively well and effectively handles the main problems in text line segmentation of Tibetan historical documents: overlapping between text lines, partial adhesion between lines, and broken strokes.
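The OR combination and the breadth-first expansion growth described in the abstract can be sketched as follows. This is a minimal illustration and not the authors' implementation. The abstract does not give the growth rules, so the function names `combine_with_cores` and `expand_core_regions`, the 4-connected neighborhood, and the first-come tie-breaking between competing lines are all assumptions.

```python
from collections import deque


def combine_with_cores(binary, cores):
    """Pixel-wise OR of the binary image and the text core regions.

    binary: 2D list of 0/1 (1 = ink pixel).
    cores:  2D list of ints (0 = no core, k > 0 = core region of line k).
    Returns a 0/1 mask of the pseudo-text connected regions.
    """
    return [
        [1 if (b or k > 0) else 0 for b, k in zip(brow, krow)]
        for brow, krow in zip(binary, cores)
    ]


def expand_core_regions(mask, cores):
    """Multi-source BFS that grows each core label into the mask.

    Every mask pixel reachable from a core receives the label of the
    core that reaches it first (assumed tie-breaking: queue order).
    Mask pixels not connected to any core keep label 0.
    """
    rows, cols = len(mask), len(mask[0])
    labels = [row[:] for row in cores]
    # Seed the queue with every core pixel, in row-major order.
    queue = deque(
        (r, c) for r in range(rows) for c in range(cols) if cores[r][c] > 0
    )
    while queue:
        r, c = queue.popleft()
        # Assumed 4-connected neighborhood.
        for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            nr, nc = r + dr, c + dc
            if (
                0 <= nr < rows
                and 0 <= nc < cols
                and mask[nr][nc]
                and labels[nr][nc] == 0
            ):
                labels[nr][nc] = labels[r][c]
                queue.append((nr, nc))
    return labels
```

Because all cores expand simultaneously, a stroke that touches two lines is split roughly where the two growth fronts meet, which is one plausible way to handle the partial adhesion mentioned above. Ink that is not connected to any core keeps label 0 and would be left for a later step such as the broken-stroke attribution.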

    Jincheng Li, Xiaojuan Wang, Weilan Wang, Qiang Lin, Pengfei Hu. Text Line Segmentation of Tibetan Historical Documents Based on Text Core Regions Combined with Expansion Growth[J]. Laser & Optoelectronics Progress, 2021, 58(2): 0210008

    Paper Information

    Category: Image Processing

    Received: Jun. 8, 2020

    Accepted: Jul. 7, 2020

    Published Online: Jan. 8, 2021

    Author Email: Weilan Wang (wangweilan@xbmu.edu.cn)

    DOI: 10.3788/LOP202158.0210008
