Laser & Optoelectronics Progress, Vol. 58, Issue 20, 2010020 (2021)

Character Segmentation for Historical Uchen Tibetan Document Based on Structure Attributes

Ce Zhang (1, 2) and Weilan Wang (1, *)
Author Affiliations
  • (1) Key Laboratory of China's Ethnic Languages and Information Technology of Ministry of Education, Northwest Minzu University, Lanzhou, Gansu 730030, China
  • (2) School of Mathematics and Information Engineering, Chongqing University of Education, Chongqing 400065, China
    Figures & Tables (32)
    Syllable of the Tibetan. (a) Structure of the Tibetan syllable; (b) example of the Tibetan syllable; (c) examples of Tibetan transliteration of Sanskrit
    Original image of the historical Tibetan document
    Tibetan document after pre-processing
    Vertical segmentation process by projection of the historical Tibetan document. (a) Document line and its vertical projection; (b) character blocks in rectangular area
    Part of character block dataset
    Examples of the character segmentation challenges. (a) Segmentation challenges above the baseline; (b) segmentation challenges below the baseline
    Flow chart of the character segmentation for historical Tibetan document
    Flow chart of the local baseline detection
    Touching type above the baseline
    Schematic diagram of coordinate system and segmentation direction
    Examples of multipath segmentation. (a) Combination example; (b) marked skeleton diagram; (c) segmentation path
    Stroke types and geometric characteristics above the baseline
    Broken strokes type below the baseline. (a) Cross left and right; (b) cross up and down; (c) separate up and down; (d) contain
    Examples of strokes attribution classification. (a) With no stroke above the baseline and with no broken stroke below the baseline; (b) with strokes above the baseline and with no broken stroke below the baseline; (c) with no stroke above the baseline and with broken strokes below the baseline; (d) with strokes above the baseline and with broken strokes below the baseline
    Process of local baseline detection and horizontal segmentation of character block. (a) Character blocks with syllable points; (b) character blocks with no syllable point and with no stroke above the baseline; (c) character blocks with no syllable point and with strokes above the baseline
    Touching stroke and type detection above the baseline
    Character segmentation with a touching stroke. (a) Character direction is D1; (b) character direction is D2
    Character segmentation with multiple touching strokes
    Character segmentation with multiple crossing strokes
    Statistical results of broken strokes below the baseline. (a) Cross left and right; (b) cross up and down; (c) separate up and down; (d) contain
    Attribution based on the horizontal distance of the centroid. (a) Character block; (b) centroid of strokes after attribution
    Attribution analysis of the stroke. (a) No. 1; (b) No. 9
    Attribution analysis of “ ” stroke
    Attribution analysis of both “ ” and “ ” strokes
    Results of character segmentation. (a) Character block; (b) block after character segmentation
    Wrong character segmentation caused by strokes attribution. (a) Character block; (b) local baseline and horizontal segmentation; (c) broken stroke mark; (d) result of character segmentation
    Wrong character segmentation caused by the baseline detection. (a) Character block; (b) horizontal projection; (c) Hough straight line detection; (d) local baseline; (e) result of character segmentation
    • Table 1. Classification of the character segmentation challenges

      Label   Description
      C1      overlapping strokes above the baseline
      C2      crossing strokes above the baseline
      C3      touching strokes above the baseline
      C4      broken strokes above the baseline
      C5      overlapping strokes below the baseline
      C6      touching strokes below the baseline
      C7      broken strokes below the baseline
    • Table 2. Correct segmentation data of character block dataset

      N_CSC    N_TSC    N_Recall/%
      109354   109603   99.77
    • Table 3. Data of the correct segmentation in the character segmentation stage

      N_CSC    N_TC     N_TSC    N_Recall/%   N_Precision/%   N_F-Measure/%
      176802   183987   180379   96.09        98.02           97.05
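The percentages in Table 3 and Table 5 can be reproduced from the three counts in each row. The sketch below is only an illustration: this listing does not state the metric definitions, so it assumes the usual ones (recall = N_CSC / N_TC, precision = N_CSC / N_TSC, F-measure = harmonic mean of the two). The function name `segmentation_metrics` is hypothetical and not taken from the paper.

```python
def segmentation_metrics(n_csc, n_tc, n_tsc):
    """Return (recall, precision, f_measure) in percent.

    Assumed reading of the table headers (not spelled out in the listing):
      n_csc -- correctly segmented characters
      n_tc  -- total characters in the ground truth
      n_tsc -- total characters produced by the segmenter
    """
    recall = 100.0 * n_csc / n_tc
    precision = 100.0 * n_csc / n_tsc
    # Harmonic mean of precision and recall.
    f_measure = 2.0 * precision * recall / (precision + recall)
    return recall, precision, f_measure


# Counts copied from Table 3 and Table 5.
table3 = segmentation_metrics(176802, 183987, 180379)
table5 = segmentation_metrics(199220, 206405, 202797)

for name, row in (("Table 3", table3), ("Table 5", table5)):
    print(name, " ".join(f"{value:.2f}" for value in row))
```

Under these assumed definitions, the computed values agree with the published percentages to two decimal places, which suggests that the column reading above is the intended one.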
    • Table 4. N_Error for each step during character segmentation

      Character segmentation steps                             N_WSC   N_Proportion/%   N_Error/%
      Build character block dataset                            249     3.46             0.14
      Detect the local baseline and horizontal segmentation    962     13.39            0.52
      Detection of touching strokes type                       267     3.72             0.15
      Segmentation of touching strokes                         25      0.35             0.01
      Strokes attribution                                      5682    79.08            3.09
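The two percentage columns of Table 4 appear to follow from the wrong-segmentation counts alone. The sketch below rests on two assumptions that the listing does not state: the proportion is each step's share of all wrongly segmented characters, and the error rate divides by the Table 3 total of 183987 characters. The step counts sum to 7185, which equals 183987 − 176802 from Table 3, so the second assumption is at least consistent with the data. The names `wrong_counts` and `error_breakdown` are hypothetical.

```python
# Wrongly segmented character counts per step, copied from Table 4.
wrong_counts = {
    "Build character block dataset": 249,
    "Detect the local baseline and horizontal segmentation": 962,
    "Detection of touching strokes type": 267,
    "Segmentation of touching strokes": 25,
    "Strokes attribution": 5682,
}

# Assumption: the error-rate denominator is the total character count of Table 3.
N_TC = 183987


def error_breakdown(counts, total_chars):
    """Map each step to (share of all errors in %, error rate in %)."""
    total_wrong = sum(counts.values())
    return {
        step: (100.0 * n / total_wrong, 100.0 * n / total_chars)
        for step, n in counts.items()
    }


breakdown = error_breakdown(wrong_counts, N_TC)
for step, (share, error_rate) in breakdown.items():
    print(f"{step:<55}{share:7.2f}{error_rate:7.2f}")
```

The computed values match the published columns to within 0.01 percentage points; the share for the first step evaluates to about 3.47 rather than the printed 3.46, a difference that may come from rounding the column so that it sums to 100.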
    • Table 5. Correctly segmented data

      N_CSC    N_TCC    N_TSC    N_Recall/%   N_Precision/%   N_F-Measure/%
      199220   206405   202797   96.52        98.24           97.37

    Ce Zhang, Weilan Wang. Character Segmentation for Historical Uchen Tibetan Document Based on Structure Attributes[J]. Laser & Optoelectronics Progress, 2021, 58(20): 2010020

    Paper Information

    Category: Image Processing

    Received: Jan. 8, 2021

    Accepted: Mar. 3, 2021

    Published Online: Nov. 3, 2021

    The Author Email: Weilan Wang (wangweilan@xbmu.edu.cn)

    DOI:10.3788/LOP202158.2010020
