Laser & Optoelectronics Progress, Vol. 58, Issue 14, 1410006 (2021)
Layout Segmentation and Description of Tibetan Document Images Based on Adaptive Run Length Smoothing Algorithm
Layout segmentation is a fundamental step in document image analysis and recognition. To find a method suited to the layout segmentation and description of Tibetan document images, an approach based on the adaptive run length smoothing algorithm is proposed. First, according to the layout structure of the Tibetan document image, K-means clustering is used to obtain a run length threshold suited to the layout; the runs are then smoothed with this threshold and connected components are extracted to segment the layout. Next, text and non-text areas are coarsely distinguished by the external contour characteristics of each layout element. Finally, the text areas are recognized by a Tibetan text recognizer, and Extensible Markup Language (XML) is used to record the layout information, yielding the layout description. Experiments on Tibetan primary and secondary school teaching materials and on stereotyped Tibetan document images show that the method achieves good layout analysis results.
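The segmentation and description steps summarized above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how K-means is configured, so the choice of two clusters, the midpoint threshold, horizontal-only smoothing, 4-connectivity, and the XML element and attribute names (`page`, `region`, `x0`, ...) are all assumptions made for illustration. The text/non-text classification and the Tibetan recognizer are not covered.

```python
import xml.etree.ElementTree as ET
from collections import deque

def white_runs(row):
    """Lengths of background (0) runs enclosed by foreground (1) pixels."""
    runs, start, seen_fg = [], None, False
    for i, v in enumerate(row):
        if v == 1:
            if start is not None and seen_fg:
                runs.append(i - start)
            start, seen_fg = None, True
        elif start is None:
            start = i
    return runs

def two_means_threshold(values, iters=50):
    """1-D K-means with k=2 (assumed); returns the midpoint of the two centres."""
    if not values:
        return 0.0
    lo, hi = float(min(values)), float(max(values))
    for _ in range(iters):
        small = [v for v in values if abs(v - lo) <= abs(v - hi)]
        large = [v for v in values if abs(v - lo) > abs(v - hi)]
        if not large:
            break
        new_lo, new_hi = sum(small) / len(small), sum(large) / len(large)
        if (new_lo, new_hi) == (lo, hi):
            break
        lo, hi = new_lo, new_hi
    return (lo + hi) / 2

def smooth_row(row, threshold):
    """Fill background gaps between foreground pixels no longer than threshold."""
    out, last_fg = list(row), None
    for i, v in enumerate(row):
        if v == 1:
            if last_fg is not None and 0 < i - last_fg - 1 <= threshold:
                for j in range(last_fg + 1, i):
                    out[j] = 1
            last_fg = i
    return out

def components(img):
    """Bounding boxes (x0, y0, x1, y1) of 4-connected foreground regions."""
    h, w = len(img), len(img[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for y in range(h):
        for x in range(w):
            if img[y][x] != 1 or seen[y][x]:
                continue
            seen[y][x] = True
            queue = deque([(y, x)])
            x0 = x1 = x
            y0 = y1 = y
            while queue:
                cy, cx = queue.popleft()
                x0, x1 = min(x0, cx), max(x1, cx)
                y0, y1 = min(y0, cy), max(y1, cy)
                for ny, nx in ((cy + 1, cx), (cy - 1, cx), (cy, cx + 1), (cy, cx - 1)):
                    if 0 <= ny < h and 0 <= nx < w and img[ny][nx] == 1 and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            boxes.append((x0, y0, x1, y1))
    return boxes

def segment(img):
    """Derive the threshold from the image's own gaps, smooth, then label regions."""
    runs = [r for row in img for r in white_runs(row)]
    threshold = two_means_threshold(runs)
    smoothed = [smooth_row(row, threshold) for row in img]
    return threshold, components(smoothed)

def to_xml(boxes):
    """Record each region's bounding box in a hypothetical XML layout schema."""
    root = ET.Element("page")
    for i, (x0, y0, x1, y1) in enumerate(boxes):
        ET.SubElement(root, "region", id=str(i),
                      x0=str(x0), y0=str(y0), x1=str(x1), y1=str(y1))
    return ET.tostring(root, encoding="unicode")

# One-row toy image: three narrow gaps (intra-block) and one wide gap (inter-block).
image = [[1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 1]]
threshold, boxes = segment(image)
print(threshold, boxes)
print(to_xml(boxes))
```

In this toy input the narrow gaps fall in one cluster and the wide gap in the other, so the threshold lands between them and smoothing merges pixels within a block without bridging the two blocks. A full version would also smooth vertically and work on real binarized page images.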
Yuanyuan Chen, Weilan Wang, Huaming Liu, Zhengqi Cai, Penghai Zhao. Layout Segmentation and Description of Tibetan Document Images Based on Adaptive Run Length Smoothing Algorithm[J]. Laser & Optoelectronics Progress, 2021, 58(14): 1410006
Category: Image Processing
Received: Sep. 21, 2020
Accepted: Nov. 12, 2020
Published Online: Jul. 14, 2021
Author Email: Wang Weilan (wangweilan@xbmu.edu.cn)