Character Segmentation for Historical Uchen Tibetan Document Based on Structure Attributes

Laser & Optoelectronics Progress, Vol. 58, Issue 20, 2010020 (2021)

Ce Zhang¹,² and Weilan Wang¹,*

Author Affiliations

¹Key Laboratory of China's Ethnic Languages and Information Technology of Ministry of Education, Northwest Minzu University, Lanzhou, Gansu 730030, China

²School of Mathematics and Information Engineering, Chongqing University of Education, Chongqing 400065, China

Figures & Tables (32)

Fig. 1. Tibetan syllable. (a) Structure of a Tibetan syllable; (b) example of a Tibetan syllable; (c) examples of Tibetan transliterations of Sanskrit

Fig. 2. Original image of the historical Tibetan document

Fig. 3. Tibetan document after pre-processing

Fig. 4. Vertical segmentation process by projection of the historical Tibetan document. (a) Document line and its vertical projection; (b) character blocks in rectangular area
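
The caption above names vertical projection as the way a document line is cut into character blocks. A minimal sketch of that general idea follows, assuming a binarized line image stored as a list of rows with 1 for ink and 0 for background. The function names, the `min_ink` threshold, and the toy image are illustrative assumptions, not the authors' implementation.

```python
def vertical_projection(line_image):
    """Count ink pixels (value 1) in each column of a binary text-line image."""
    return [sum(column) for column in zip(*line_image)]


def split_into_blocks(line_image, min_ink=1):
    """Return (start, end) column ranges, end exclusive, where the projection is >= min_ink."""
    blocks, start = [], None
    for x, count in enumerate(vertical_projection(line_image)):
        if count >= min_ink and start is None:
            start = x                      # an ink run begins
        elif count < min_ink and start is not None:
            blocks.append((start, x))      # the ink run ended at the previous column
            start = None
    if start is not None:                  # the run reaches the right edge of the line
        blocks.append((start, len(line_image[0])))
    return blocks


# Hypothetical 3x8 text line: two ink runs separated by blank columns.
toy_line = [
    [0, 1, 1, 0, 0, 1, 0, 0],
    [0, 1, 0, 0, 0, 1, 1, 0],
    [0, 1, 1, 0, 0, 0, 1, 0],
]
blocks = split_into_blocks(toy_line)
print(vertical_projection(toy_line))  # [0, 3, 2, 0, 0, 2, 2, 0]
print(blocks)                         # [(1, 3), (5, 7)]
```

Each returned range corresponds to one rectangular character block. Strokes that touch or overlap across a gap would not be separated by this step alone, which is the kind of case the later figures address.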

Fig. 5. Part of character block dataset

Fig. 6. Examples of the character segmentation challenges. (a) Segmentation challenges above the baseline; (b) segmentation challenges below the baseline

Fig. 7. Flow chart of the character segmentation for historical Tibetan document

Fig. 8. Flow chart of the local baseline detection

Fig. 9. Touching type above the baseline

Fig. 10. Schematic diagram of coordinate system and segmentation direction

Fig. 11. Examples of multipath segmentation. (a) Combination example; (b) marked skeleton diagram; (c) segmentation path

Fig. 12. Stroke types and geometric characteristics above the baseline

Fig. 13. Types of broken strokes below the baseline. (a) Cross left and right; (b) cross up and down; (c) separate up and down; (d) contain

Fig. 14. Examples of stroke attribution classification. (a) With no stroke above the baseline and with no broken stroke below the baseline; (b) with strokes above the baseline and with no broken stroke below the baseline; (c) with no stroke above the baseline and with broken strokes below the baseline; (d) with strokes above the baseline and with broken strokes below the baseline

Fig. 15. Process of local baseline detection and horizontal segmentation of character block. (a) Character blocks with syllable points; (b) character blocks with no syllable point and with no stroke above the baseline; (c) character blocks with no syllable point and with strokes above the baseline

Fig. 16. Touching stroke and type detection above the baseline

Fig. 17. Character segmentation with a touching stroke. (a) Character direction is D₁; (b) character direction is D₂

Fig. 18. Character segmentation with multiple touching strokes

Fig. 19. Character segmentation with multiple crossing strokes

Fig. 20. Statistical results of broken strokes below the baseline. (a) Cross left and right; (b) cross up and down; (c) separate up and down; (d) contain

Fig. 21. Attribution based on the horizontal distance of the centroid. (a) Character block; (b) centroid of strokes after attribution

Fig. 22. Attribution analysis of the stroke. (a) No. 1; (b) No. 9

Fig. 23. Attribution analysis of “ ” stroke

Fig. 24. Attribution analysis of both “ ” and “ ” strokes

Fig. 25. Results of character segmentation. (a) Character block; (b) block after character segmentation

Fig. 26. Wrong character segmentation caused by stroke attribution. (a) Character block; (b) local baseline and horizontal segmentation; (c) broken stroke mark; (d) result of character segmentation

Fig. 27. Wrong character segmentation caused by the baseline detection. (a) Character block; (b) horizontal projection; (c) Hough straight line detection; (d) local baseline; (e) result of character segmentation

Table 1. Classification of the character segmentation challenges

Label	Description
C1	overlapping strokes above the baseline
C2	crossing strokes above the baseline
C3	touching strokes above the baseline
C4	broken strokes above the baseline
C5	overlapping strokes below the baseline
C6	touching strokes below the baseline
C7	broken strokes below the baseline

Table 2. Correct segmentation data of character block dataset

N_CSC	N_TSC	N_Recall/%
109354	109603	99.77

Table 3. Data of the correct segmentation in the character segmentation stage

N_CSC	N_TC	N_TSC	N_Recall/%	N_Precision/%	N_F-Measure/%
176802	183987	180379	96.09	98.02	97.05
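
The percentages in Table 3 can be reproduced from its three counts. The sketch below assumes that recall is N_CSC / N_TC, that precision is N_CSC / N_TSC, and that the F-measure is their harmonic mean. These definitions are not stated on this page and are inferred from the column values; the function name is illustrative.

```python
def segmentation_metrics(n_csc, n_tc, n_tsc):
    """Return (recall, precision, F-measure) in percent, rounded to two decimals.

    Assumed definitions (not given on this page):
      recall    = N_CSC / N_TC
      precision = N_CSC / N_TSC
      F-measure = harmonic mean of precision and recall
    """
    recall = n_csc / n_tc
    precision = n_csc / n_tsc
    f_measure = 2 * precision * recall / (precision + recall)
    return tuple(round(100 * value, 2) for value in (recall, precision, f_measure))


# Counts copied from Table 3.
table3 = segmentation_metrics(n_csc=176802, n_tc=183987, n_tsc=180379)
print(table3)  # (96.09, 98.02, 97.05)
```

Under these assumed definitions the computed values match the reported 96.09, 98.02, and 97.05.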

Table 4. N_Error for each step during character segmentation

Character segmentation steps	N_WSC	N_Proportion/%	N_Error/%
Build character block dataset	249	3.46	0.14
Detect the local baseline and horizontal segmentation	962	13.39	0.52
Detection of touching strokes type	267	3.72	0.15
Segmentation of touching strokes	25	0.35	0.01
Strokes attribution	5682	79.08	3.09
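
The two percentage columns of Table 4 can be checked against its counts. The sketch below assumes that N_Proportion is each step's share of all wrongly segmented characters and that N_Error divides each step's count by the N_TC value from Table 3 (183987). Both definitions are inferred from the numbers rather than stated on this page, and the variable names are illustrative.

```python
N_TC = 183987  # total from Table 3, assumed to be the N_Error denominator

# Step name -> N_WSC, copied from Table 4.
wrong_counts = {
    "Build character block dataset": 249,
    "Detect the local baseline and horizontal segmentation": 962,
    "Detection of touching strokes type": 267,
    "Segmentation of touching strokes": 25,
    "Strokes attribution": 5682,
}

total_wrong = sum(wrong_counts.values())  # 7185

# Step name -> (share of all errors in %, error rate over N_TC in %).
breakdown = {
    step: (100 * n_wsc / total_wrong, 100 * n_wsc / N_TC)
    for step, n_wsc in wrong_counts.items()
}

for step, (proportion, error) in breakdown.items():
    print(f"{step}: proportion {proportion:.2f}%, error {error:.2f}%")
```

The total of 7185 equals N_TC minus N_CSC in Table 3 (183987 - 176802), which supports the assumed denominators. The computed values agree with the table to within 0.01 percentage points.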

Table 5. Correctly segmented data

N_CSC	N_TCC	N_TSC	N_Recall/%	N_Precision/%	N_F-Measure/%
199220	206405	202797	96.52	98.24	97.37

Ce Zhang, Weilan Wang. Character Segmentation for Historical Uchen Tibetan Document Based on Structure Attributes[J]. Laser & Optoelectronics Progress, 2021, 58(20): 2010020

Paper Information

Category: Image Processing

Received: Jan. 8, 2021

Accepted: Mar. 3, 2021

Published Online: Nov. 3, 2021

Corresponding author email: Weilan Wang (wangweilan@xbmu.edu.cn)

DOI: 10.3788/LOP202158.2010020
