Optoelectronics Letters, Volume 13, Issue 6, 457 (2017)

Text extraction method for historical Tibetan document images based on block projections

Li-juan DUAN1,2,*, Xi-qun ZHANG1,3, Long-long MA4, and Jian WU4
Author Affiliations
  • 1Faculty of Information Technology, Beijing University of Technology, Beijing 100124, China
  • 2Beijing Key Laboratory on Integration and Analysis of Large-scale Stream Data, Beijing University of Technology, Beijing 100124, China
  • 3Beijing Key Laboratory of Trusted Computing, Beijing University of Technology, Beijing 100124, China
  • 4Chinese Information Processing Laboratory, Institute of Software, Chinese Academy of Sciences, Beijing 100190, China

    Text extraction is an important first step in digitizing historical documents. In this paper, we present a text extraction method for historical Tibetan document images based on block projections. Text extraction is treated as a text area detection and localization problem. Each image is divided into equal-sized blocks, and the blocks are filtered using the categories of their connected components and their corner point density. By analyzing the projections of the filtered blocks, the approximate text areas are located and the text regions are extracted. Experiments on a dataset of historical Tibetan documents demonstrate the effectiveness of the proposed method.
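
The pipeline described in the abstract (split into blocks, filter the blocks, project the surviving blocks, read off the text area) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the filtering rules, so the sketch substitutes a simple ink-density threshold for the connected-component and corner-point-density filter. The function names, the block size, and the `low`/`high` thresholds are all hypothetical.

```python
def block_grid(image, block):
    """Split a binary image (list of rows of 0/1) into block x block tiles
    and return a grid holding the ink density of each tile."""
    rows, cols = len(image), len(image[0])
    grid = []
    for top in range(0, rows, block):
        grid_row = []
        for left in range(0, cols, block):
            tile = [image[r][c]
                    for r in range(top, min(top + block, rows))
                    for c in range(left, min(left + block, cols))]
            grid_row.append(sum(tile) / len(tile))
        grid.append(grid_row)
    return grid


def filter_blocks(grid, low=0.05, high=0.6):
    """Keep tiles whose density looks text-like (1 = kept, 0 = dropped).
    ASSUMPTION: this threshold test is a stand-in for the paper's filter,
    which uses connected component categories and corner point density."""
    return [[1 if low <= d <= high else 0 for d in row] for row in grid]


def longest_run(profile):
    """Return (start, end) of the longest run of non-zero entries in a
    projection profile (end exclusive), or None if every entry is zero."""
    best, start = None, None
    for i, v in enumerate(list(profile) + [0]):  # sentinel closes last run
        if v and start is None:
            start = i
        elif not v and start is not None:
            if best is None or i - start > best[1] - best[0]:
                best = (start, i)
            start = None
    return best


def locate_text_area(image, block=4):
    """Return (top, left, bottom, right) in pixels for the approximate text
    area, or None if no tile survives filtering."""
    mask = filter_blocks(block_grid(image, block))
    row_profile = [sum(row) for row in mask]        # horizontal projection
    col_profile = [sum(col) for col in zip(*mask)]  # vertical projection
    rows, cols = longest_run(row_profile), longest_run(col_profile)
    if rows is None or cols is None:
        return None
    bottom = min(rows[1] * block, len(image))
    right = min(cols[1] * block, len(image[0]))
    return (rows[0] * block, cols[0] * block, bottom, right)
```

Projecting the block mask rather than raw pixels means isolated noise that fails the filter contributes nothing to either profile, which is the apparent motivation for working at block level. A real filter would replace `filter_blocks` and leave the projection step unchanged.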

    DUAN Li-juan, ZHANG Xi-qun, MA Long-long, WU Jian. Text extraction method for historical Tibetan document images based on block projections[J]. Optoelectronics Letters, 2017, 13(6): 457

    Paper Information

    Received: Aug. 30, 2017

    Accepted: Sep. 18, 2017

    Published Online: Sep. 13, 2018

    The Author Email: Li-juan DUAN (ljduan@bjut.edu.cn)

    DOI:10.1007/s11801-017-7197-0
