Laser & Optoelectronics Progress, Vol. 58, Issue 20, 2010020 (2021)
Character Segmentation for Historical Uchen Tibetan Document Based on Structure Attributes
Character segmentation is an important step in the image analysis and recognition of historical Tibetan documents. Historical Uchen Tibetan documents exhibit slanted text lines, overlapping, crossing, and touching strokes between characters, broken strokes, and noise interference. To address these problems, this paper proposes a character segmentation method for historical Uchen Tibetan documents based on structure attributes. First, a character block dataset of historical Uchen Tibetan documents is established. Then, the local baseline of each character block is detected, either from the position of the syllable points or by combining horizontal projection with line detection, and the character block is divided horizontally into two parts, one above and one below the baseline. An improved template matching algorithm is used to detect touching strokes and the touching type above the baseline. A multi-direction, multi-path touching character segmentation algorithm is used to segment crossing and touching strokes. Finally, the attribution of each stroke is completed according to the Tibetan structure attributes. Experimental results show that the proposed method effectively handles these challenges in character segmentation. The recall, precision, and F-measure of character segmentation reach 96.52%, 98.24%, and 97.37%, respectively.
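The baseline step and the reported scores can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes a binarized character block stored as nested lists (1 = ink, 0 = background) and assumes the baseline is the row where the horizontal projection peaks; the syllable-point cue and the line detection mentioned in the abstract are omitted. All function names are hypothetical. The F-measure is computed with the standard formula F = 2PR / (P + R).

```python
from typing import List, Tuple

Image = List[List[int]]  # binary image: 1 = ink, 0 = background


def horizontal_projection(block: Image) -> List[int]:
    """Count the ink pixels in each row of a binary character block."""
    return [sum(row) for row in block]


def detect_baseline(block: Image) -> int:
    """Return the index of the row with the most ink (first row on ties).

    Assumption for this sketch: the baseline appears as the strongest peak
    of the horizontal projection. The paper also uses syllable-point
    positions and line detection, which are not modeled here.
    """
    if not block:
        raise ValueError("empty character block")
    profile = horizontal_projection(block)
    return max(range(len(profile)), key=profile.__getitem__)


def split_at_baseline(block: Image, baseline: int) -> Tuple[Image, Image]:
    """Split the block into rows above the baseline and rows from it downward."""
    return block[:baseline], block[baseline:]


def f_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Toy character block (hypothetical data): a mark above a long horizontal stroke.
toy_block: Image = [
    [0, 0, 1, 0, 0, 0],  # mark above the baseline
    [0, 0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1, 0],  # long horizontal stroke, the projection peak
    [0, 1, 0, 0, 1, 0],
    [0, 1, 0, 0, 1, 0],
]

baseline_row = detect_baseline(toy_block)
upper_part, lower_part = split_at_baseline(toy_block, baseline_row)
reported_f = f_measure(precision=0.9824, recall=0.9652)
```

On the toy block the projection peaks at row 2, so two rows fall above the baseline and three rows fall at or below it. Plugging the reported precision (98.24%) and recall (96.52%) into the formula gives approximately 97.37%, which agrees with the reported F-measure.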
Ce Zhang, Weilan Wang. Character Segmentation for Historical Uchen Tibetan Document Based on Structure Attributes[J]. Laser & Optoelectronics Progress, 2021, 58(20): 2010020
Category: Image Processing
Received: Jan. 8, 2021
Accepted: Mar. 3, 2021
Published Online: Nov. 3, 2021
Author Email: Weilan Wang (wangweilan@xbmu.edu.cn)